I wrote a small example script:
import requests
from bs4 import BeautifulSoup
from threading import Thread

def test():
    # Fetch the page and parse the raw response bytes with the lxml parser
    r = requests.get('http://zhuanlan.sina.com.cn/')
    soup = BeautifulSoup(r.content, 'lxml')

print('run test on main thread')
test()
print('run test on child thread')
# Run the same function again, this time in a child thread
t = Thread(target=test)
t.start()
t.join()
The output is:
run test on main thread
run test on child thread
encoding error : input conversion failed due to input error, bytes 0x95 0x50 0x22 0x20
encoding error : input conversion failed due to input error, bytes 0x95 0x50 0x22 0x20
encoding error : input conversion failed due to input error, bytes 0x95 0x50 0x22 0x20
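I wrote a test function and ran it in both the main thread and a child thread. As the output shows, the function prints "encoding error : input conversion failed due to input error" only when it runs in the child thread, and I can't suppress it. Why does this happen?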
I suspect this comes from the XML parser (lxml), because the error disappears when I use the HTML parser instead:
def test():
    r = requests.get('http://zhuanlan.sina.com.cn/')
    soup = BeautifulSoup(r.text, 'html.parser')
With that change I get:
run test on main thread
run test on child thread
This is a bit late, but in case anyone else runs into this, here is what worked for me:
soup = BeautifulSoup(r.content, 'lxml', from_encoding="iso-8859-1")
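
A possible reason the forced encoding helps: ISO-8859-1 maps every byte value to a character, so the conversion can never fail. The catch is that text in another encoding will come out garbled. Here is a minimal standard-library sketch of the difference. The payload is made up (it only reuses the four bytes from the error message), and both `decode_lenient` and the UTF-8 default are my own assumptions, not something taken from the page in the question:

```python
# Hypothetical payload: ASCII markup plus the four bytes from the error
# message (0x95 0x50 0x22 0x20). This is NOT the real page content.
raw = b'<a title="x\x95\x50\x22\x20href="#">link</a>'

def decode_lenient(raw_bytes, encoding="utf-8"):
    """Decode bytes, replacing undecodable sequences with U+FFFD.

    The utf-8 default is an assumption; pass the page's real encoding
    if you know it.
    """
    return raw_bytes.decode(encoding, errors="replace")

# Strict decoding raises on the stray 0x95 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

lenient = decode_lenient(raw)       # bad byte is replaced, nothing raised
latin1 = raw.decode("iso-8859-1")   # one byte in, one character out

print("strict decode succeeded:", strict_ok)
print("replacement characters:", lenient.count("\ufffd"))
print("latin-1 kept every byte:", len(latin1) == len(raw))
```

If you would rather not pretend the page is Latin-1, decoding the bytes yourself like this and passing the resulting string to BeautifulSoup in place of r.content may also avoid the byte-level conversion, though I have not checked that against the page in the question.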