HTTP request works with curl but fails with 403 when using Python


I'm trying to download the RSS feed from https://www.straitstimes.com/news/singapore/rss.xml. I have the following Python script:

import requests

r = requests.get('https://www.straitstimes.com/news/singapore/rss.xml')

for k, v in r.headers.items():
    print("{}: {}".format(k, v))
    
print(r.content)
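To see how this request differs from the curl one, it helps to look at the headers requests adds by itself. The following is a minimal sketch, assuming a standard requests install. `requests.utils.default_headers()` returns the defaults that `requests.get()` sends unless you override them. The exact Accept-Encoding value depends on the installed version and on optional compression packages.

```python
import requests

# Defaults every requests.Session starts with; requests.get() uses a
# temporary Session, so these go out unless overridden.
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")
# Expect a User-Agent of the form "python-requests/<version>" and "Accept: */*".
```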

When I run it, I get the following response:

Cache-Control: max-age=0, no-cache, no-store
Content-Type: text/html
Date: Wed, 13 Dec 2023 03:06:00 GMT
Expires: Wed, 13 Dec 2023 03:05:59 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: ECD (sgc/56B1)
Set-Cookie: sph_user_country=SG;Path=/;
X-EC-Security-Audit: 403
x-vmg-version: v10.5.70
Content-Length: 345
b'<?xml version="1.0" encoding="iso-8859-1"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\t<head>\n\t\t<title>403 - Forbidden</title>\n\t</head>\n\t<body>\n\t\t<h1>403 - Forbidden</h1>\n\t</body>\n</html>\n'

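When I fetch the same URL with curl using the request below (forcing HTTP/1.1 and removing the User-Agent and Accept headers), I get the XML without any problem. What am I doing wrong in my request?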

curl https://www.straitstimes.com/news/singapore/rss.xml -v --http1.1 -H 'User-Agent:' -H 'Accept:'
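In curl, `-H 'User-Agent:'` with nothing after the colon removes that header entirely. The sketch below does not touch the network; it only prepares the requests. It contrasts the two ways to override a default in requests. A value of `None` drops the header from the merged set, while `''` keeps the header and sends it with an empty value. One caveat, stated as an assumption and not verified against this server: urllib3 1.26 and later adds its own `python-urllib3/<version>` User-Agent at send time when none is present. If so, `None` does not fully reproduce curl's behaviour.

```python
import requests

url = "https://www.straitstimes.com/news/singapore/rss.xml"
session = requests.Session()

# None: the key is deleted when session and request headers are merged.
removed = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": None, "Accept": None})
)
# Empty string: the header stays, but with no value.
emptied = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": "", "Accept": ""})
)

print(dict(removed.headers))
print(dict(emptied.headers))
```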
python curl python-requests
1 Answer

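By default, requests sends `User-Agent: python-requests/<version>` and `Accept: */*`, and this server appears to reject that. Overriding both headers, as the curl command does, should work: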

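import requests

# Replace requests' default headers (User-Agent: python-requests/<version>,
# Accept: */*). Empty strings are sent as empty header values.
headers = {
    'User-Agent': '',
    'Accept': ''
}

url = 'https://www.straitstimes.com/news/singapore/rss.xml'
r = requests.get(url, headers=headers, timeout=10)
print(r.status_code)
if r.status_code == 200:
    print(r.text)  # the RSS XML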
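Below is a slightly more defensive variant of the answer. It is sketched under the same assumption, namely that the server rejects the default python-requests User-Agent. The headers are set once on a `requests.Session`, and `raise_for_status()` turns a 403 into a `requests.HTTPError`. The body is also checked for an `<rss>` root element, because the 403 page in the question is itself XML-declared XHTML. `build_session`, `looks_like_rss` and `fetch_feed` are hypothetical helper names. The root-tag check assumes an RSS 2.0 feed rather than Atom.

```python
import xml.etree.ElementTree as ET

import requests

FEED_URL = "https://www.straitstimes.com/news/singapore/rss.xml"


def build_session() -> requests.Session:
    """Hypothetical helper: a Session that overrides the default headers.

    Assumption: the server blocks the default python-requests User-Agent.
    Empty values are sent as empty headers, as in the answer above.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": "", "Accept": ""})
    return session


def looks_like_rss(body: bytes) -> bool:
    """Return True if body parses as XML whose root element is <rss>.

    Checking only for '<?xml' is not enough: the 403 page also starts with it.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag == "rss"


def fetch_feed(url: str = FEED_URL) -> bytes:
    """Download the feed; raise on HTTP errors or a non-RSS body."""
    with build_session() as session:
        r = session.get(url, timeout=10)
        r.raise_for_status()  # a 403 becomes requests.HTTPError
        if not looks_like_rss(r.content):
            content_type = r.headers.get("Content-Type")
            raise ValueError(f"unexpected body, Content-Type: {content_type}")
        return r.content
```

Calling `fetch_feed()` performs the network request and returns the raw feed bytes. The other two helpers can be exercised offline.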