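I want to scrape web search results from google.com. I followed the first answer to this question, Google Search Web Scraping with Python. Unfortunately, I get a connection error. I also tried other websites, and it cannot connect to them either. Is this because of my company's proxy settings?
Please note that I am using a virtualenv, "Web Scraping".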
from urllib.parse import urlencode, urlparse, parse_qs
from lxml.html import fromstring
from requests import get
raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)
for result in page.cssselect(".r a"):
    url = result.get("href")
    if url.startswith("/url?"):
        url = parse_qs(urlparse(url).query)['q']
    print(url[0])
raw = get("https://www.google.com/search?q=StackOverflow").text
Traceback (most recent call last):
  File "", line 1, in <module>
    raw = get("https://www.google.com/search?q=StackOverflow").text
  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\sessions.py", line 524, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\sessions.py", line 637, in send
    r = adapter.send(request, **kwargs)
  File "c:\users\appdata\local\programs\python\python37\webscrapping\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: /search?q=StackOverflow (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
Please advise. Thanks.
Edit: I tried pinging google.com, and it failed.
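import os
import platform

hostname = "www.google.com"  # example; ping expects a host name, not a URL
# Windows ping takes -n for the packet count; Unix-like systems take -c
count_flag = "-n" if platform.system() == "Windows" else "-c"
response = os.system("ping " + count_flag + " 1 " + hostname)
# os.system returns the command's exit status; 0 means the host replied
if response == 0:
    print(hostname, 'is up!')
else:
    print(hostname, 'is down!')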
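I think you are getting this error because of your proxy settings. Try running one of the following commands in the command prompt, then run your script again from that same prompt: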
set http_proxy=http://proxy_address:port
set http_proxy=http://user:password@proxy_address:port
set https_proxy=https://proxy_address:port
set https_proxy=https://user:password@proxy_address:port
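If it helps, here is a rough sketch for checking from inside Python whether those variables were actually picked up, and for handing the proxy to requests explicitly instead of relying on the environment. The function names `proxies_seen_by_python` and `fetch` are my own for illustration, and `proxy_address:port` remains a placeholder for whatever your network team gives you.

```python
import urllib.request


def proxies_seen_by_python():
    """Return the scheme -> proxy URL mapping that Python detects."""
    # getproxies() reads variables such as http_proxy and https_proxy;
    # if none are set, on Windows it may fall back to the system proxy settings.
    return urllib.request.getproxies()


def fetch(url, proxies=None, timeout=10):
    """Fetch url with requests, optionally through an explicit proxy mapping."""
    # requests is third-party; it is imported here so that the check above
    # still runs in an environment where requests is not installed.
    from requests import get
    return get(url, proxies=proxies, timeout=timeout).text


if __name__ == "__main__":
    # An empty dict here means Python did not find any proxy configuration.
    print("Proxies visible to Python:", proxies_seen_by_python())
```

If the printed mapping is empty, the `set` commands were probably run in a different prompt than the one running Python. To bypass the environment entirely, you could call something like `fetch("https://www.google.com/search?q=StackOverflow", proxies={"http": "http://proxy_address:port", "https": "https://proxy_address:port"})` with your real proxy values filled in.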