在子线程中创建BeautifulSoup对象将打印编码错误

问题描述 投票:1回答:2

我写了一个示例代码:

import requests
from bs4 import BeautifulSoup
from threading import Thread
def test():
    r = requests.get('http://zhuanlan.sina.com.cn/')
    soup = BeautifulSoup(r.content,'lxml')

print('run test on main thread')
test()

print('run test on child thread')
t = Thread(target=test)
t.start()
t.join()

输出是:

run test on main thread
run test on child thread
encoding error : input conversion failed due to input error, bytes 0x95 0x50 0x22 0x20
encoding error : input conversion failed due to input error, bytes 0x95 0x50 0x22 0x20
encoding error : input conversion failed due to input error, bytes 0x95 0x50 0x22 0x20

我编写了一个测试函数,并在主线程和子线程中运行它。如输出中所示,测试函数在子线程打印encoding error: input conversion failed due to input error中运行,我无法阻止它。为什么会这样?

python multithreading beautifulsoup thread-safety lxml
2个回答
0
投票

我建议这来自xml解析器......因为使用HTML解析器,错误消失了......

def test():
    r = requests.get('http://zhuanlan.sina.com.cn/')
    soup = BeautifulSoup(r.text, 'html.parser')

我得到了这个:

run test on main thread
run test on child thread

0
投票

这有点晚了,但只是让任何人再次碰到这个,这对我有用:

soup = BeautifulSoup(r.content,'lxml',from_encoding="iso-8859-1")
© www.soinside.com 2019 - 2024. All rights reserved.