Problem decoding content from an httplib2 request in Python

Question

Good morning,

I have the following code, which uses httplib2 to fetch the content of a URL:

from __future__ import unicode_literals
import httplib2
import requests
import subprocess
from bs4 import BeautifulSoup

def initialize():
    global url
    url = "http://nottherealurl.com"
    global header
    header = set_header()


def set_header():
    return {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Content-Type": "text/html; charset=utf-8",
        "Accept-Language": "en-US,en;q=0.5",
        "Connection": "keep-alive",
        "DNT": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "cross-site",
        "Sec-Fetch-User": "?1",
        "Sec-GPC": "1",
        "Upgrade-Insecure-Requests": "1",
        "TE": "trailers",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:122.0) Gecko/20100101 Firefox/122.0",
    }


def get_url():
    initialize()
    h = httplib2.Http()
    (resp, content) = h.request(url, "GET", headers=header)
    print(content)

I call

get_url()

to get the content of the URL; however, it returns binary data like this:

b'\x90\x03\x02\x80\xfc-\xd5\xfe\xec\\N

I do not run into the same problem when I test the URL with curl in Cygwin.

Answer 1

The problem is in the following part:

"Accept-Encoding":"gzip, deflate, br"

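According to the answer to "Python requests response encoded in utf-8 but cannot be decoded":

br requests Brotli compression, a new compression standard that Google is pushing (see RFC 7932) to replace gzip on the web. Chrome asks for Brotli because recent versions of Chrome understand it natively. You are asking for Brotli because you copied the headers from Chrome. But requests does not understand Brotli natively.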
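One way to check this diagnosis is to look at the Content-Encoding header on the response. The sketch below assumes the response object behaves like a mapping with lowercase header names, and that, as far as I know, httplib2 removes content-encoding once it has decoded gzip or deflate itself, so a value still present there suggests the body was left compressed. The helper name and the sample dictionary are made up for illustration.

```python
def leftover_content_encoding(resp):
    """Return the Content-Encoding still present on a response mapping, or None.

    `resp` is assumed to be dict-like with lowercase header names.
    """
    value = resp.get("content-encoding")
    return value.strip().lower() if value else None


# Stand-in for the `resp` returned by h.request(); the values are invented.
sample_resp = {"status": "200", "content-encoding": "br"}
print(leftover_content_encoding(sample_resp))
```

If this reports br for the real response, the bytes printed in the question are most likely a Brotli-compressed body rather than a text encoding problem.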

Change it to "Accept-Encoding":"gzip, deflate"

httplib2 can handle those encodings, so the data can then be displayed without trouble.
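A minimal sketch of that fix, assuming the page is UTF-8 text and that only gzip and deflate are decoded automatically by the client. The helper names (without_unsupported_encodings, fetch_text) and the charset default are my own assumptions, not part of the original code.

```python
SUPPORTED_ENCODINGS = ("gzip", "deflate")


def without_unsupported_encodings(headers, supported=SUPPORTED_ENCODINGS):
    """Return a copy of headers whose Accept-Encoding lists only supported codings."""
    cleaned = dict(headers)
    requested = cleaned.get("Accept-Encoding", "")
    # Compare only the coding name, ignoring any ";q=..." weight.
    kept = [
        coding.strip()
        for coding in requested.split(",")
        if coding.split(";")[0].strip().lower() in supported
    ]
    if kept:
        cleaned["Accept-Encoding"] = ", ".join(kept)
    else:
        # Nothing usable was requested, so drop the header entirely.
        cleaned.pop("Accept-Encoding", None)
    return cleaned


def fetch_text(url, headers, charset="utf-8"):
    """Fetch url with httplib2 and decode the body (charset is an assumption)."""
    import httplib2  # imported here so the helper above works without httplib2

    h = httplib2.Http()
    resp, content = h.request(
        url, "GET", headers=without_unsupported_encodings(headers)
    )
    return content.decode(charset, errors="replace")
```

With the headers from the question, the filtered Accept-Encoding becomes "gzip, deflate", so the server should no longer reply with a Brotli body.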
