I want to convert the response of a GET request into a string, and I came up with the following code:
import socket

target_host = "www.google.com"
target_port = 80
# create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# connect the client
s.connect((target_host, target_port))
# send some data
request = "GET / HTTP/1.1\r\nHost:%s\r\n\r\n" % target_host
s.send(request.encode("utf-8"))
full_msg = ""
# Set a timeout so recv() does not block the script forever waiting for data that never arrives.
s.settimeout(1)
flag = True
while flag:
    # receive some data
    try:
        response = s.recv(4096)
        full_msg = full_msg + str(response)
        print("Adding msg")
    except Exception as e:
        print(full_msg)
        flag = False
        print(e)
print("Loop ended")
print(type(full_msg))
When I instead decode each chunk with response = s.recv(4096).decode("utf-8"), I get the following exception:
'utf-8' codec can't decode byte 0xe8 in position 1025: invalid continuation byte
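I don't know how to solve this, because I can't change the characters that come back in the response, and if I don't decode each chunk, I have to strip out the "b" characters that clutter my full_msg string inside the loop.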
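Also, the documentation says that .recv() returns a string, but I seem to be getting a bytes-like object. Any ideas are welcome, and I would also be glad to hear of any way to improve my code.

According to the data in the response, the header is Content-Type: text/html; charset=ISO-8859-1. The response is not UTF-8, so use .decode('iso-8859-1') instead.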
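To make that concrete: in Python 3, recv() returns bytes, and str() on a bytes object gives its repr (the b'...' text), which is where the stray "b" characters come from. One way around both problems is to collect the raw chunks, join them, and decode once with the charset named in the Content-Type header. The sketch below is only an illustration under some assumptions: decode_http_response is a hypothetical helper, the chunks are hard-coded stand-ins for what recv() would return, ISO-8859-1 is assumed as the fallback charset, and chunked transfer encoding is not handled.

```python
import re


def decode_http_response(raw, default_charset="iso-8859-1"):
    """Decode a complete raw HTTP response (hypothetical helper, not a library API)."""
    # The header block ends at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so decoding the headers cannot fail.
    headers = head.decode("iso-8859-1")
    # Look for "charset=..." on the Content-Type line; fall back to the assumed default.
    match = re.search(
        r'^content-type:.*?charset="?([\w-]+)',
        headers,
        re.IGNORECASE | re.MULTILINE,
    )
    charset = match.group(1) if match else default_charset
    return headers + "\r\n\r\n" + body.decode(charset, errors="replace")


# Stand-ins for successive recv() results (assumed data, not a real server reply).
latin1_chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=ISO-8859-1\r\n\r\n",
    b"caf",
    b"\xe9",
]
# Here a two-byte UTF-8 character is split across two chunks.
utf8_chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n",
    b"caf\xc3",
    b"\xa9",
]

# Join the bytes first, then decode once, instead of calling str() or decode() per chunk.
latin1_text = decode_http_response(b"".join(latin1_chunks))
utf8_text = decode_http_response(b"".join(utf8_chunks))
print(latin1_text)
print(utf8_text)
```

In the original loop this would mean appending each response to a list (or a bytes buffer) and decoding only after the loop ends. Decoding once also avoids errors when a multi-byte character happens to be split between two recv() calls.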
Better still, use requests: