Python socket UTF-8 decoding

Question (votes: 0, answers: 1)

I want to convert the response to a GET request into a string, and I came up with the following code:

import socket

target_host = "www.google.com"

target_port = 80

# create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# connect the client
s.connect((target_host, target_port))

# send some data
request = "GET / HTTP/1.1\r\nHost:%s\r\n\r\n" % target_host
s.send(request.encode("utf-8"))
full_msg = ""

# Set a timeout so recv() does not block forever waiting for more data when there is none.
s.settimeout(1)
flag = True

while flag:
    # receive some data
    try:
        response = s.recv(4096)
        full_msg = full_msg + str(response)
        print("Adding msg")
    except Exception as e:
        print(full_msg)
        flag = False
        print(e)

print("Loop ended")
print(type(full_msg))
The problem is that when I try to decode the response from s.recv(4096) by replacing that line with the following:

response = s.recv(4096).decode("utf-8")

I get the following exception:

'utf-8' codec can't decode byte 0xe8 in position 1025: invalid continuation byte

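I don't know how to fix this, because I can't change the characters I receive in the response, and if I don't decode each chunk, I have to strip out the "b" prefix characters that clutter my full_msg string inside the loop.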

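Also, the documentation says that the .recv() method returns a string, but I seem to be getting a bytes-like object. Any ideas are welcome, and I'd also be glad to hear about any way to improve my code.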
python sockets http utf-8 decode
1 Answer
0 votes

According to the data in the response, the header is Content-Type: text/html; charset=ISO-8859-1. The response is not UTF-8, so use .decode('iso-8859-1') instead. Better still, use requests.
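To make the suggestion above concrete, here is a minimal sketch rather than a definitive implementation. In Python 3, recv() returns bytes, so one option is to collect the raw chunks, join them, and decode once using the charset named in the Content-Type header. Decoding once also avoids splitting a multi-byte character across two recv() calls. The helper name decode_http_response, its default_charset parameter, and the sample bytes are assumptions made for illustration; the sketch does not handle chunked transfer encoding or compressed bodies.

```python
import re


def decode_http_response(raw: bytes, default_charset: str = "iso-8859-1") -> str:
    """Decode a raw HTTP response using the charset from its Content-Type header.

    Hypothetical helper: falls back to default_charset when no charset is
    declared or when Python does not know the declared codec.
    """
    # The header block ends at the first blank line.
    head, _, _body = raw.partition(b"\r\n\r\n")
    match = re.search(
        rb'^content-type:.*?charset="?([\w.-]+)',
        head,
        re.IGNORECASE | re.MULTILINE,
    )
    charset = match.group(1).decode("ascii") if match else default_charset
    try:
        # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The declared charset is not a codec Python knows about.
        return raw.decode(default_charset, errors="replace")


# Made-up sample response; in the question's loop you would instead do
# chunks.append(s.recv(4096)) and pass b"".join(chunks) after the loop ends.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"\r\n"
    b"<p>caf\xe8</p>"
)
text = decode_http_response(sample)
print(type(text))
```

In the original loop, recv() returning b"" means the server closed the connection, which is a cleaner stop condition than waiting for the timeout exception.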
