Parsing HTML with BeautifulSoup in Python

Question

My code so far is as follows:

 from bs4 import BeautifulSoup
 import requests

 main_url = 'http://www.foodnetwork.com/recipes/a-z'
 response = requests.get(main_url)
 soup = BeautifulSoup(response.text, "html.parser")
 mylist = [t for tags in soup.find_all(class_='m-PromoList o-Capsule__m-PromoList')
           for t in tags if t != '\n']

So far I get a list containing the right information, but it is still wrapped in HTML tags. An example element of the list is shown below:

 <li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>

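From this item I want to extract the href link and the text inside the link separately, but I'm having trouble doing so, and I don't think getting this information should require a whole new set of operations. How can I do it?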

Answer 1

You can do the following to get the href and text of a single element:

href = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a')['href']
text = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a').text

For a list of items:

my_list = soup.find_all('li', attrs={'class':'m-PromoList__a-ListItem'})
for el in my_list:
    href = el.find('a')['href']
    text = el.find('a').text
    print(href)
    print(text)

Edit: an important tip for reducing run time: don't search for the same tag more than once. Instead, save the tag in a variable and reuse it.

a = soup.find('li', attrs={'class':'m-PromoList__a-ListItem'}).find('a')
href = a.get('href')
text = a.text

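In a large HTML document, each tag lookup has to walk the parse tree and can take noticeable time, so storing the result means the search runs only once instead of once per attribute you read.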
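
The loop and the "look it up once" tip can be combined. Below is a minimal sketch that runs without network access: `SAMPLE_HTML` is a hypothetical snippet modeled on the list element shown in the question (the wrapping `<ul>` is an assumption about the real page), and `extract_links` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the element in the question; the real page
# may be structured differently (the <ul> wrapper is an assumption).
SAMPLE_HTML = """
<ul class="m-PromoList o-Capsule__m-PromoList">
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900">"21" Apple Pie</a></li>
</ul>
"""


def extract_links(html):
    """Return a list of (text, href) pairs, one per list item."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for li in soup.find_all("li", class_="m-PromoList__a-ListItem"):
        a = li.find("a")  # search for the <a> tag once, then reuse it
        if a is None:     # skip list items that contain no link
            continue
        results.append((a.get_text(), a.get("href")))
    return results


links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(text)
    print(href)
```

Returning pairs rather than printing inside the loop keeps the extracted data available for later steps, such as following each link.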

Answer 2

There are several ways to achieve the same result. Here is another approach using a CSS selector:

from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.foodnetwork.com/recipes/a-z')
soup = BeautifulSoup(response.text, "lxml")
for item in soup.select(".m-PromoList__a-ListItem a"):
    print("Item_Title: {}\nItem_Link: {}\n".format(item.text,item['href']))

Partial output:

Item_Title: "16 Bean" Pasta E Fagioli
Item_Link: //www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570

Item_Title: "16 Bean" Pasta e Fagioli
Item_Link: //www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-1-3753755

Item_Title: "21" Apple Pie
Item_Link: //www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900
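
The links in the output above are protocol-relative (they start with `//`). If absolute URLs are needed, one possible approach is to resolve each href against the page URL with the standard library's `urljoin`. The sketch below assumes the same markup as the question; `SAMPLE_HTML` and `select_recipes` are hypothetical names for illustration, and it uses the built-in `html.parser` instead of `lxml` so that nothing beyond `bs4` has to be installed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# URL of the page the hrefs came from; used to resolve relative links.
BASE_URL = "http://www.foodnetwork.com/recipes/a-z"

# Hypothetical sample modeled on the element in the question.
SAMPLE_HTML = """
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/ina-garten/16-bean-pasta-e-fagioli-3612570">"16 Bean" Pasta E Fagioli</a></li>
<li class="m-PromoList__a-ListItem"><a href="//www.foodnetwork.com/recipes/21-apple-pie-recipe-1925900">"21" Apple Pie</a></li>
"""


def select_recipes(html, base_url=BASE_URL):
    """Return a list of dicts with the title and absolute link of each recipe."""
    soup = BeautifulSoup(html, "html.parser")
    recipes = []
    # "a[href]" only matches links that actually carry an href attribute.
    for a in soup.select(".m-PromoList__a-ListItem a[href]"):
        recipes.append({
            "title": a.get_text(strip=True),
            # urljoin fills in the scheme of base_url for "//host/path" hrefs.
            "link": urljoin(base_url, a["href"]),
        })
    return recipes


recipes = select_recipes(SAMPLE_HTML)
for recipe in recipes:
    print("Item_Title: {}\nItem_Link: {}\n".format(recipe["title"], recipe["link"]))
```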
