Narrowing down what I scrape from a website with Python

Question

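I'm trying to scrape a website with Python, but I'm having trouble narrowing the results down to a reasonable size, and Python doesn't pick up what I'm asking for. For example, here is my code: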

import bs4
import requests

url = requests.get('https://ballotpedia.org/Alabama_Supreme_Court')
soup = bs4.BeautifulSoup(url.text, 'html.parser')
y = soup.find('table')
print(y)

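I'm trying to scrape the names of the Alabama Supreme Court justices, but this code gives me far too much information. I have tried things like this (on line 6):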

y = soup.find('table', {'class': 'wikitable sortable'})

but then I get a message saying the search returned no results.

Here is an image of the page in the browser's inspector. My goal was to get thead to work in my code, but it failed!

How can I tell Python that I only want the judges' names?

Thanks a lot!

Answer 1

Put simply, this is how I would do it:

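import pandas as pd

# read_html() returns a list with one DataFrame per <table> on the page;
# the answer takes the third one ([2]) as the judges table and keeps its "Judge" column.
df = pd.read_html("https://ballotpedia.org/Alabama_Supreme_Court")[2]["Judge"]

print(df.to_list())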

Output:

['Brad Mendheim', 'Kelli Wise', 'Michael Bolin', 'William Sellers', 'Sarah Stewart', 'Greg Shaw', 'Tommy Bryan', 'Jay Mitchell', 'Tom Parker']
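Picking the table by position ([2]) breaks as soon as the page gains or loses a table. A minimal sketch of an alternative, assuming lxml (or bs4 plus html5lib) is installed for read_html: the match parameter keeps only tables whose text matches a regex. The inline HTML below is hypothetical markup standing in for the live page.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: an unrelated table followed by a judges table.
HTML = """
<table>
  <thead><tr><th>Office</th></tr></thead>
  <tbody><tr><td>Governor</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Judge</th></tr></thead>
  <tbody>
    <tr><td>Brad Mendheim</td></tr>
    <tr><td>Kelli Wise</td></tr>
  </tbody>
</table>
"""

# match= returns only the tables whose text matches the regex "Judge",
# so the result no longer depends on where the table sits on the page.
tables = pd.read_html(StringIO(HTML), match="Judge")
judges = tables[0]["Judge"].tolist()
print(judges)
```

On the live page the equivalent would be pd.read_html(url, match="Judge"), assuming the judges table is the first one whose text contains "Judge"; that has not been checked against the real page.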

Now, back to the original issue, because I personally prefer to fix the actual problem rather than steer toward a different solution.

Note the difference between find, which returns only the first matching element, and find_all, which returns a list of all matching elements. Check the documentation.
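A small sketch of that difference, run on hypothetical inline markup rather than the live page. It also shows that find returns None when nothing matches, which is what a "no results" search gives you.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables and no <thead>.
HTML = """
<table id="first"><tr><td>A</td></tr></table>
<table id="second"><tr><td>B</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

first = soup.find("table")      # a single Tag: only the first match
every = soup.find_all("table")  # a list-like ResultSet with every match
missing = soup.find("thead")    # None, because nothing matches

print(first["id"])               # first
print([t["id"] for t in every])  # ['first', 'second']
print(missing)                   # None
```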

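Import the class directly with from bs4 import BeautifulSoup instead of import bs4, in the spirit of the DRY ("Don't Repeat Yourself") principle: you no longer repeat the bs4. prefix.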

Let bs4 handle decoding the content, since that is one of the tasks it does in the background. So use r.content (raw bytes) instead of r.text.
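A minimal sketch of why the bytes matter, using hypothetical inline bytes instead of a real response. The mis-decoded string simulates what r.text can hand you when requests picks the wrong encoding (for example when the HTTP header carries no charset); it is an illustration, not a capture from the Ballotpedia page.

```python
from bs4 import BeautifulSoup

# Hypothetical raw bytes: UTF-8 encoded, with the charset declared only
# inside the HTML itself.
raw = '<meta charset="utf-8"><p>café</p>'.encode("utf-8")

# Bytes (like r.content): BeautifulSoup detects the encoding itself,
# here from the <meta charset> declaration.
from_bytes = BeautifulSoup(raw, "html.parser")

# A str that was already decoded with the wrong codec before parsing;
# BeautifulSoup cannot undo that.
mis_decoded = raw.decode("latin-1")
from_text = BeautifulSoup(mis_decoded, "html.parser")

print(from_bytes.p.get_text())  # café
print(from_text.p.get_text())   # cafÃ©
```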

Now let's dig into the HTML and select exactly what we need:

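from bs4 import BeautifulSoup
import requests

r = requests.get("https://ballotpedia.org/Alabama_Supreme_Court")
# Pass raw bytes so BeautifulSoup works out the encoding itself
soup = BeautifulSoup(r.content, 'html.parser')

# Select every <a> inside a <table> that has all three classes
print([item.text for item in soup.select(
    "table.wikitable.sortable.jquery-tablesorter a")])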

Now you should read up on CSS selectors.

Output:

['Brad Mendheim', 'Kelli Wise', 'Michael Bolin', 'William Sellers', 'Sarah Stewart', 'Greg Shaw', 'Tommy Bryan', 'Jay Mitchell', 'Tom Parker']
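This also explains why the question's {'class': 'wikitable sortable'} lookup found nothing. A minimal sketch on hypothetical markup, assuming the served table carries a third class, as the answer's selector implies: a class string containing a space is compared against the whole class attribute, while a CSS compound selector only requires the listed classes to be present.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the table has an extra class beyond the two searched for.
HTML = """
<table class="wikitable sortable jquery-tablesorter">
  <tr><td><a href="#">Brad Mendheim</a></td></tr>
  <tr><td><a href="#">Kelli Wise</a></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "wikitable sortable" must equal the full class value, which also
# contains "jquery-tablesorter", so nothing matches.
exact = soup.find("table", {"class": "wikitable sortable"})

# table.wikitable.sortable matches any table having both classes,
# whatever other classes it also has.
names = [a.get_text(strip=True) for a in soup.select("table.wikitable.sortable a")]

print(exact)  # None
print(names)  # ['Brad Mendheim', 'Kelli Wise']
```

Leaving jquery-tablesorter out of the selector is a small hedge: if that class is only added by JavaScript in the browser, the shorter selector still matches the HTML that requests receives.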
