Can't get a loop to work when parsing HTML with Beautiful Soup 4


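I'm using the Beautiful Soup documentation to help me understand how to implement it. I'm fairly new to Python as a whole, so I may have made a syntax error, but I don't think so. The code below should print every link on the Etsy homepage, but it doesn't. The documentation shows something similar, so perhaps I'm missing something. Here is my code: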

#!/usr/bin/python3

# import library
from bs4 import BeautifulSoup
import requests
import os.path
from os import path

# Request to website and download HTML contents
url='https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
req=requests.get(url)
content=req.text

soup=BeautifulSoup(content, 'html.parser')

for x in soup.head.find_all('a'):
    print(x.get('href'))

The HTML prints if I set the script up to do that, but I can't get the for loop to work.

python parsing beautifulsoup html-parsing
1 answer

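If you want all the a tags (links) from the specified URL, search the whole document instead of only soup.head. Anchor elements normally live in the page body, so soup.head.find_all('a') typically returns an empty list and the body of the for loop never runs: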

from bs4 import BeautifulSoup
import requests

# Request to website and download HTML contents
url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'

with requests.get(url) as r:
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'lxml')
    for a in soup.find_all('a', href=True):
        print(a['href'])
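
To see the difference without making a network request, here is a minimal sketch that runs both searches against a small inline document. The markup in SAMPLE_HTML is a made-up stand-in, not Etsy's real page, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in page; the real Etsy markup is not reproduced here.
SAMPLE_HTML = """
<html>
  <head>
    <title>Sample shop</title>
    <link rel="stylesheet" href="/static/site.css">
  </head>
  <body>
    <a href="/cart">Cart</a>
    <a href="https://example.com/help">Help</a>
    <a name="top">No href here</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching only inside <head> finds no anchors, so a for loop over
# this result would never execute its body.
head_links = soup.head.find_all("a")

# Searching the whole document and requiring an href attribute
# returns the links and skips the anchor that has no href.
page_links = [a["href"] for a in soup.find_all("a", href=True)]

print(head_links)
print(page_links)
```

Passing href=True also avoids printing None for anchors that have no href attribute, which x.get('href') in the question's loop would do.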