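I ran into a problem with my code and found what looked like the perfect solution on StackOverflow. However, after making the necessary changes and running it, I get no output.
Code: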
from bs4 import BeautifulSoup
import urllib.parse
import requests
r = requests.get('https://duckduckgo.com/html/?q=test')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('a', attrs={'class':'result__url'}, href=True)
for link in results:
    url = link['href']
    o = urllib.parse.urlparse(url)
    d = urllib.parse.parse_qs(o.query)
    print(d['uddg'][0])
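The explanation that came with it: "urlparse() splits the URL into its components. Get the query string from it and pass it to parse_qs() to process it further. You can then extract the link using the uddg name."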
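The quoted steps can be tried offline on a single href, which helps separate parsing problems from request problems. This is a minimal sketch: the helper name extract_target and the sample redirect URL are assumptions made for illustration, not taken from a real DuckDuckGo response. It also falls back to the href itself when there is no uddg parameter, since d['uddg'] raises KeyError on a direct link.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the URL stored in the 'uddg' query parameter, or href itself if absent."""
    query = urlparse(href).query      # the part after '?'
    params = parse_qs(query)          # maps each name to a list of decoded values
    values = params.get('uddg')
    return values[0] if values else href


# Hypothetical redirect-style href (assumed format, for illustration only).
sample = "//duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.speedtest.net%2F"
print(extract_target(sample))
print(extract_target("https://en.wikipedia.org/wiki/Test"))
```

parse_qs percent-decodes the values, so the first call prints the plain target URL and the second prints the direct link unchanged.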
These should be the first few results:
http://www.speedtest.net/
https://www.merriam-webster.com/dictionary/test
https://en.wikipedia.org/wiki/Test
https://www.thefreedictionary.com/test
https://www.dictionary.com/browse/test
I get no output. Output:
In [14]:
You are getting a 403 response, so there are no results to parse. To fix this, send headers with a browser-like User-Agent.
Here's how:
import requests
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0",
}
page = requests.get('https://duckduckgo.com/html/?q=test', headers=headers).text
soup = BeautifulSoup(page, 'html.parser').find_all("a", class_="result__url", href=True)
for link in soup:
    print(link['href'])
Output:
https://www.merriam-webster.com/dictionary/test
https://www.speedtest.net/
https://www.dictionary.com/browse/test
https://www.thefreedictionary.com/test
https://www.thesaurus.com/browse/test
https://en.wikipedia.org/wiki/Test
https://www.tests.com/
http://speedtest.xfinity.com/
https://fast.com/
https://www.spectrum.com/internet/speed-test
https://projectstream.google.com/speedtest
https://dictionary.cambridge.org/dictionary/english/test
http://www.act.org/content/act/en/products-and-services/the-act.html
...
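
The original code failed silently because a 403 page contains no matching links, so the loop simply ran zero times. Checking the status before parsing makes that visible. The following is a rough sketch only: page_text_or_raise is a hypothetical helper name, and a SimpleNamespace object stands in for the response so the example runs without network access.

```python
from types import SimpleNamespace


def page_text_or_raise(response):
    """Return response.text, or raise if the status code is not 200."""
    if response.status_code != 200:
        raise RuntimeError(f"request failed with HTTP {response.status_code}")
    return response.text


# Stand-in for a blocked request (assumption: only status_code and text are needed).
blocked = SimpleNamespace(status_code=403, text="")
try:
    page_text_or_raise(blocked)
except RuntimeError as err:
    print(err)
```

With the real library, the same helper could wrap the call as page_text_or_raise(requests.get(url, headers=headers)). requests also provides response.raise_for_status(), which raises on 4xx and 5xx responses.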