How can I scrape full names from a site using Selenium in Python?

Question (votes: 1, answers: 2)

I'm fairly new to coding in Python and to learning Selenium WebDriver. I've had a lot of help so far and am very close to the output I need.

So far I can get each player's abbreviated name, the over/under odds, and the line. For example, my current output looks like this:

Player                              Over       Line       Under

A. Radulov                          +127       2.5         -167  
G. Landeskog                        -130       2.5         +100
etc.

However, I want the final output to show every player's full name:

Player                               Over        Line       Under

Alexander Radulov                     +127       2.5         -167  
Gabriel Landeskog                     -130       2.5         +100
etc.

Here is my current code:

import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time

# Raw string so the backslashes in the Windows path are not treated as escape sequences
driver=webdriver.Chrome(r"C:\webdrivers\chromedriver.exe")
driver.maximize_window()
driver.get("https://www.betonline.ag/sportsbook/player-props")
WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"builder")))
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//li[@class='one-third one-third-remove']//a[./b[contains(.,'Over / Under')]]"))).click()
time.sleep(2)




WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"div[ng-if='selected.league']"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//li[@ng-repeat='league in leagues']/a[.//span[text()='NHL']]"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"div[ng-if^='selected.game']"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//li/a[.//div[text()='All Available']]"))).click()

WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//span[contains(.,'Shots on goal')]"))).click()



player=[]
Over=[]
line=[]
Under=[]
Playersname=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='div-table__row__cell hard--bottom hard--right ng-scope']//a[@class='ng-binding']")))
for players in Playersname:
    player.append(players.text)

OverAndUnder=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a>b.milli.caps.ng-binding")))
count=int(len(OverAndUnder)/2)
x=0
for i in range(count):
    Over.append(OverAndUnder[x].text)
    Under.append(OverAndUnder[x+1].text)
    x=x+2

lines=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"div[ng-class*='overUnder']>b")))
for l in lines:
    line.append(l.text)

df = pd.DataFrame({"Player":player,"Over":Over,"Line":line, "Under":Under})
print(df)
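
The index-stepping loop that separates the alternating Over/Under values can also be written with slices. A minimal sketch, assuming the scraped values arrive as a flat list in over, under, over, under order; the sample strings stand in for the `.text` of the located elements, and `split_over_under` is a hypothetical helper name:

```python
def split_over_under(values):
    """Split a flat [over, under, over, under, ...] list into two lists."""
    # Ignore a trailing unpaired value, matching the original count = len // 2 logic
    usable = len(values) - len(values) % 2
    overs = values[0:usable:2]   # even positions
    unders = values[1:usable:2]  # odd positions
    return overs, unders


sample = ["+127", "-167", "-130", "+100"]
overs, unders = split_over_under(sample)
print(overs)   # ['+127', '-130']
print(unders)  # ['-167', '+100']
```

In the question's script the input would be `[el.text for el in OverAndUnder]`.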

To find the full name, I clicked on a player's name and inspected the page. I did find the right data, but I'm not sure how to parse it correctly.


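I'd like the final output to contain all the same data, just with the players' full names instead of the abbreviated ones. Thanks in advance for any help or insight.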

python pandas selenium web-scraping webdriverwait
2 Answers

0 votes

The desired element is an Angular element, so to extract its text you have to induce WebDriverWait for visibility_of_element_located(). You can use either of the following solutions:

  • Using CSS_SELECTOR and the text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "player-stats-content.ng-isolate-scope[data='pStats']>p span:nth-of-type(2)"))).text)
    
  • Using XPATH and get_attribute():

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//player-stats-content[@class='ng-isolate-scope' and @data='pStats']/p[@class='text--center beta cap']//following::span[2]"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
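
To apply either lookup to every row of the table, one option is to wrap it in a function and fall back to the abbreviated name whenever the lookup fails. A minimal sketch: `fetch_full_name` is a hypothetical callable (not part of Selenium) that would click a player's link and return the text found by one of the locators above; here a dictionary stub replaces it so the example runs without a browser.

```python
def resolve_full_names(abbreviated, fetch_full_name):
    """Return full names, keeping the abbreviated name when the lookup fails."""
    full_names = []
    for short in abbreviated:
        try:
            name = fetch_full_name(short)
        except Exception:
            # With Selenium this would typically be a TimeoutException
            name = None
        cleaned = name.strip() if name else ""
        full_names.append(cleaned or short)
    return full_names


# Stub data standing in for the values read from the page (assumption)
stub = {"A. Radulov": " Alexander Radulov ", "G. Landeskog": "Gabriel Landeskog"}


def fake_fetch(short):
    return stub[short]  # raises KeyError for an unknown player


print(resolve_full_names(["A. Radulov", "G. Landeskog", "X. Unknown"], fake_fetch))
# ['Alexander Radulov', 'Gabriel Landeskog', 'X. Unknown']
```

Keeping the fallback means the Player column stays the same length as the Over, Line and Under columns even when one lookup times out.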
    

0 votes

Try getting the element's innerHTML:

for player_name in Playersname:
    player.append(player_name.get_attribute("innerHTML"))

This gets all the HTML/text inside the element, which in this case is just the player's name.
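
Because innerHTML returns markup as well as text, the value may need cleaning if the element ever contains nested tags or stray whitespace. A minimal sketch using only the standard library; the sample string is made up, and `inner_html_to_text` is a hypothetical helper name:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html):
    """Strip tags from an innerHTML string and collapse runs of whitespace."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()
    return " ".join("".join(collector.parts).split())


print(inner_html_to_text("  <span>A.</span> Radulov\n"))  # A. Radulov
```

In the loop above this would be used as `player.append(inner_html_to_text(player_name.get_attribute("innerHTML")))`.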
