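I am trying to search Google for some products, but the language of the results Google returns depends on the proxy. I tried setting 'accept-language': 'en-US,en;q=0.9' in my headers
to fix it, but it still does not work.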
import requests
from bs4 import BeautifulSoup

products = ["Majestic Pet Stairs Steps", "Ball Jars Wide Mouth Lids 12/Pack", "LED Duck Color Changing Floating Speaker"]
for product in products:
    headers = {
        'authority': 'www.google.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
        'accept-language': 'en-US,en;q=0.9'}
    url = 'https://google.com/search?q={}'.format(product)
    PROXY = None
    res = requests.get(url, headers=headers, proxies=PROXY)
    if res.status_code != 200:
        print("bad proxy")
        break
    soup = BeautifulSoup(res.text, "lxml")
    print(soup.title.text)
What I want is to always get results in English, regardless of the proxy.
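They provide a search API: https://developers.google.com/custom-search/v1/overview
If you make a large number of automated queries by web scraping, they may start presenting CAPTCHAs or block you.
There is a handy library I use for my searches; here is a snippet from my app: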
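As a rough sketch of what a call to that API could look like with the language pinned to English: the Custom Search JSON API accepts an `hl` parameter (interface language) and an `lr` parameter (restrict results to documents in a language). The helper names `build_search_url` and `search_titles` are made up for this example, and the API key and search engine ID are placeholders you would have to create yourself.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, lang="en"):
    """Build a request URL that asks for results in one language.

    api_key and engine_id are placeholders: they come from your own
    Google API project and Programmable Search Engine.
    """
    params = {
        "key": api_key,        # your API key (placeholder)
        "cx": engine_id,       # your search engine ID (placeholder)
        "q": query,
        "hl": lang,            # interface language
        "lr": "lang_" + lang,  # restrict results to this language
    }
    # urlencode escapes spaces and slashes in the product name.
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def search_titles(query, api_key, engine_id, lang="en"):
    """Run the query and return the result titles (performs a network call)."""
    url = build_search_url(query, api_key, engine_id, lang)
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    # "items" is absent when there are no results.
    return [item["title"] for item in payload.get("items", [])]
```

Calling `search_titles("Majestic Pet Stairs Steps", "YOUR_API_KEY", "YOUR_ENGINE_ID")` should then not depend on where the proxy is located, since the language is part of the request itself rather than inferred from the IP address. I have not measured how strictly `lr` filters results, so treat this as a starting point.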
pip install google
from googlesearch import search
# tag, intitle and SITE.page_size come from the rest of my app
results = list(search(str(tag) + ' ' + str(intitle), domains=['stackoverflow.com'], stop=SITE.page_size))
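For the language problem specifically, a minimal sketch along the same lines, assuming the `googlesearch` module is the one shipped by the `google` package on PyPI, whose `search()` takes a `lang` argument alongside `domains`, `stop` and `pause`. Other packages expose a module with the same name but a different signature, so check the one you have installed. The helpers `english_search_kwargs` and `search_in_english` are hypothetical names for this example only.

```python
def english_search_kwargs(query, site=None, limit=10):
    """Collect the arguments for googlesearch.search() with English forced.

    Returns the query and a dict of keyword arguments, so the settings
    can be inspected without touching the network.
    """
    kwargs = {
        "lang": "en",   # ask Google for English results
        "stop": limit,  # stop after this many results
        "pause": 2.0,   # seconds between requests, to be gentle on Google
    }
    if site:
        kwargs["domains"] = [site]  # restrict results to one site
    return query, kwargs


def search_in_english(query, site=None, limit=10):
    """Run the search and return the result URLs (performs network calls)."""
    # Imported here so the helper above works without the package installed.
    from googlesearch import search

    query, kwargs = english_search_kwargs(query, site, limit)
    return list(search(query, **kwargs))
```

For the products in the question this would be something like `search_in_english("Majestic Pet Stairs Steps", limit=5)`. The library still scrapes Google under the hood, so the warning about CAPTCHAs and blocking applies here too.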