Why doesn't scraping the second table work in Python?


I want to scrape two tables, but I only get results for the first one. Why? I'm using the same logic for both tables.

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL to scrape
url = "https://fbref.com/en/comps/9/keepers/Premier-League-Stats"

# Send a GET request
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Scrape the first table
table1 = soup.find_all('table', attrs={'id': 'stats_squads_keeper_for'})[0]
df1 = pd.read_html(str(table1))[0]

# Scrape the second table
table2 = soup.find_all('table', attrs={'id': 'stats_keeper'})[0]
df2 = pd.read_html(str(table2))[0]

# Print data frames
print(df1) # works fine
print(df2) # comes back empty
Tags: python, pandas, web-scraping, beautifulsoup
1 Answer

If you don't strictly have to do this with bs4, you can also use Selenium, which is well suited to scraping dynamic content. And if you don't want to give up bs4, you can combine Selenium with it.

Here is a working version:

import pandas as pd
from io import StringIO
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://fbref.com/en/comps/9/keepers/Premier-League-Stats"
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get(url)

pd.set_option('display.width', None)

# Selenium renders the page, so both tables are present in the live DOM.
table1 = driver.find_element(By.ID, "stats_squads_keeper_for").get_attribute("outerHTML")
df1 = pd.read_html(StringIO(table1))[0]  # wrap in StringIO: passing literal HTML to read_html is deprecated
print(df1)

table2 = driver.find_element(By.ID, "stats_keeper").get_attribute("outerHTML")
df2 = pd.read_html(StringIO(table2))[0]
print(df2)

df1.to_csv("tab1.csv", encoding="utf-8")
df2.to_csv("tab2.csv", encoding="utf-8")

driver.quit()
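Selenium is one fix, but it may also help to know why the static HTML seems to lack the second table. On fbref-style pages, a plausible explanation is that the per-player table is shipped inside an HTML comment and only uncommented by JavaScript, so requests + BeautifulSoup never sees it as an element. A minimal sketch of recovering a commented-out table with bs4 alone (toy HTML standing in for the real page, not the actual fbref markup):

```python
import pandas as pd
from bs4 import BeautifulSoup, Comment

# Toy HTML standing in for the page: the player table sits inside an
# HTML comment, so a normal find()/find_all() cannot see it.
html = """
<div id="all_stats_keeper">
<!--
<table id="stats_keeper">
  <thead><tr><th>Player</th><th>Saves</th></tr></thead>
  <tbody><tr><td>Alisson</td><td>3</td></tr></tbody>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")
assert soup.find("table", id="stats_keeper") is None  # hidden inside the comment

# Re-parse each comment's text; the commented-out markup becomes real elements.
df = None
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    table = BeautifulSoup(comment, "html.parser").find("table", id="stats_keeper")
    if table is not None:
        rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
                for tr in table.find_all("tr")]
        df = pd.DataFrame(rows[1:], columns=rows[0])

print(df)
```

If this is the cause, you could apply the same comment-reparsing loop to the real `response.content` and keep the original requests-based approach, with no browser needed.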