How to scrape a table from a website and create a DataFrame

Question (votes: 0, answers: 1)

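I am practicing web scraping on a table with Python. I can print a column of the table successfully, but I cannot create a DataFrame from it. The append method that was suggested cannot be used because it is no longer available in pandas. The concat method should work, but I could not get it to work, and I don't know how to use [i].text with it.

Can you help me?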

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys


url = "https://worldpopulationreview.com/countries"
PATH = 'C:/chromedriver_win32/chromedriver.exe'

driver = webdriver.Chrome(service=Service(PATH))  # Selenium 4 takes the driver path through Service
driver.get(url)
driver.find_element(By.XPATH, "/html/body/div[1]/div/div/div/div[2]/div/button[2]").click() #copy 

countries = driver.find_elements(By.XPATH, ("//*[@id='__next']/div/div[3]/section[2]/div[1]/div/div/div/div[3]/div[2]//tbody/tr/td[1]"))
for x in countries:
    print(x.text)

The tutorial suggests the following, but the append method does not work because it has been deprecated:

for i in range(len(countries)):
    # only runs on pandas < 2.0; appending a dict also requires ignore_index=True
    df_population = df_population.append({"Countries": countries[i].text}, ignore_index=True)

This is what I am trying, but it does not work:

df_population = pd.DataFrame(columns=["Countries"]) # does not work
df_population = pd.concat([pd.DataFrame([countries[i].text], columns=['Countries']) for i in range(len(countries))], ignore_index=True) # does not work
python selenium-webdriver concatenation
1 Answer
0 votes

Use Selenium to get driver.page_source and pass it to pd.read_html to create a pandas.DataFrame:

from io import StringIO

import chromedriver_autoinstaller
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


# auto install the chrome driver or pass a path like you currently are doing
chromedriver_autoinstaller.install()

# run headless
chrome_options = Options()
chrome_options.add_argument("--headless")
# create the driver object
driver = webdriver.Chrome(options=chrome_options)
# go to the website
driver.get('https://worldpopulationreview.com/countries')
# get the first table on the page; wrapping the HTML in StringIO avoids the
# deprecation warning newer pandas versions emit for literal HTML strings
df = pd.read_html(StringIO(driver.page_source))[0]
# close the driver
driver.quit()


   Flag        Country  2023 (Live)  2022 Population Area (km²)  \
0   NaN          India   1427068453       1417173173       3.3M   
1   NaN          China   1425713463       1425887337       9.7M   
2   NaN  United States    339780047        338289857       9.4M   
3   NaN      Indonesia    277263198        275501339       1.9M   
4   NaN       Pakistan    239928531        235824862     881.9K   

  Land Area (km²)  Density (/km²) Growth Rate World %  Rank  
0              3M           481.0       0.81%  17.85%     1  
1            9.4M           151.0      -0.02%  17.81%     2  
2            9.1M            37.0        0.5%   4.25%     3  
3            1.9M           148.0       0.74%   3.47%     4  
4          770.9K           312.0       1.98%      3%     5  
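
On the narrower question of turning the list of elements itself into a DataFrame, here is a minimal sketch. It assumes `countries` is the list returned by `driver.find_elements(...)` in the question. The helper names `elements_to_frame` and `elements_to_frame_concat` are hypothetical, and `SimpleNamespace` objects stand in for the Selenium WebElements so the snippet runs without a browser.

```python
from types import SimpleNamespace

import pandas as pd


def elements_to_frame(elements, column="Countries"):
    """Build a one-column DataFrame from objects that expose a .text attribute."""
    # Collect every cell's text first, then build the frame in a single step.
    return pd.DataFrame({column: [el.text for el in elements]})


def elements_to_frame_concat(elements, column="Countries"):
    """Same result via pd.concat, using one single-row frame per element."""
    frames = [pd.DataFrame([el.text], columns=[column]) for el in elements]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame(columns=[column])
    return pd.concat(frames, ignore_index=True)


# Stand-ins (assumption) for the WebElements returned by driver.find_elements(...).
countries = [SimpleNamespace(text=name) for name in ("India", "China", "United States")]

df_population = elements_to_frame(countries)
print(df_population)
```

With a real driver, `elements_to_frame(countries)` would be called on the `find_elements` result before `driver.quit()`, since `.text` needs a live session. Building the frame once from a list is usually simpler and faster than concatenating many one-row frames, though both should give the same column.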