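I'm doing a web-scraping exercise on a table with Python. I can print a column of the table successfully, but I can't create a DataFrame from it. The suggested append method can't be used because it no longer exists in pandas. The concat method should work, but I can't get it working, and I don't know how to use [i].text with it.
Can you help me?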
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
url = "https://worldpopulationreview.com/countries"
PATH = 'C:/chromedriver_win32/chromedriver.exe'
# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(PATH))
driver.get(url)
driver.find_element(By.XPATH, "/html/body/div[1]/div/div/div/div[2]/div/button[2]").click() #copy
countries = driver.find_elements(By.XPATH, ("//*[@id='__next']/div/div[3]/section[2]/div[1]/div/div/div/div[3]/div[2]//tbody/tr/td[1]"))
for x in countries:
    print(x.text)
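The tutorial suggests the following, but the append method doesn't work because it has been deprecated (DataFrame.append was removed in pandas 2.0):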
for i in range(len(countries)):
    df_population = df_population.append({"Countries": countries[i].text}, ignore_index=True)
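Since DataFrame.append is gone in pandas 2.0, a common replacement for the loop above is to collect the values in an ordinary Python list and build the DataFrame once after the loop. A minimal sketch, assuming the scraped texts are already plain strings; the texts list below is hypothetical sample data standing in for [x.text for x in countries]:

```python
import pandas as pd

# Hypothetical stand-in for the Selenium elements' .text values;
# in the real script this would be [x.text for x in countries]
texts = ["India", "China", "United States"]

# Collect one dict per row in a plain list instead of calling DataFrame.append
rows = []
for i in range(len(texts)):
    rows.append({"Countries": texts[i]})

# Build the DataFrame once, after the loop has finished
df_population = pd.DataFrame(rows)
print(df_population)
```

Building the frame once at the end also avoids creating a new DataFrame on every iteration, which is what the old append call did.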
This is what I'm trying instead, but it doesn't work:
df_population = pd.DataFrame(columns=["Countries"]) # does not work
df_population = pd.concat([pd.DataFrame(**[countries[i].text]**, columns=['Countries']) for i in range(len(countries))], ignore_index=True) # does not work
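For the concat attempt, each item handed to pd.concat has to be a DataFrame, so every text value goes into a one-element list before it is wrapped in pd.DataFrame. A sketch under the same assumption (texts is hypothetical sample data in place of the Selenium elements' .text values), next to the simpler one-step construction:

```python
import pandas as pd

# Hypothetical stand-in for the scraped cell texts (x.text for x in countries)
texts = ["India", "China", "United States"]

# pd.concat takes a list of DataFrames: build one single-row frame per text
df_population = pd.concat(
    [pd.DataFrame([t], columns=["Countries"]) for t in texts],
    ignore_index=True,  # renumber the rows 0..n-1 instead of 0, 0, 0
)

# Simpler equivalent: build the frame in one step from the whole list
df_simple = pd.DataFrame({"Countries": texts})
print(df_population.equals(df_simple))
```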
Use Selenium to get driver.page_source and pass it to pd.read_html to create a pandas.DataFrame:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import chromedriver_autoinstaller
import pandas as pd
from io import StringIO
# auto install the chrome driver or pass a path like you currently are doing
chromedriver_autoinstaller.install()
# run headless
chrome_options = Options()
chrome_options.add_argument("--headless")
# create the driver object
driver = webdriver.Chrome(options=chrome_options)
# go to the website
driver.get('https://worldpopulationreview.com/countries')
# get the first table on the page (recent pandas versions want literal HTML wrapped in StringIO)
df = pd.read_html(StringIO(driver.page_source))[0]
# close the driver
driver.quit()
Output (first five rows):
   Flag        Country  2023 (Live)  2022 Population  Area (km²)  \
0   NaN          India   1427068453       1417173173        3.3M   
1   NaN          China   1425713463       1425887337        9.7M   
2   NaN  United States    339780047        338289857        9.4M   
3   NaN      Indonesia    277263198        275501339        1.9M   
4   NaN       Pakistan    239928531        235824862      881.9K   

   Land Area (km²)  Density (/km²)  Growth Rate  World %  Rank
0               3M           481.0        0.81%   17.85%     1
1             9.4M           151.0       -0.02%   17.81%     2
2             9.1M            37.0         0.5%    4.25%     3
3             1.9M           148.0        0.74%    3.47%     4
4           770.9K           312.0        1.98%       3%     5
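If only the country names are needed, as in the question, the Country column can be selected from the table and renamed. A minimal sketch; the df below is a hypothetical two-column stand-in built from the values shown in the output, not the real read_html result:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(...)[0]; only two of its columns
# are reproduced here, using the values from the output above
df = pd.DataFrame({
    "Country": ["India", "China", "United States", "Indonesia", "Pakistan"],
    "Rank": [1, 2, 3, 4, 5],
})

# Keep just the country names, under the column name used in the question
df_population = df[["Country"]].rename(columns={"Country": "Countries"})
print(df_population.head())
```

Selecting with a list (df[["Country"]]) keeps the result a DataFrame; df["Country"] would return a Series instead.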