Pulling td/HTML elements with the correct code - Selenium and Python


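I have been trying to extract the ticker symbol (the stock's short name), stock name, price, sector, and market cap columns from https://www.tradingview.com/markets/stocks-turkey/market-movers-all-stocks/, but I am struggling to pull the right HTML elements with the right code. I tried using Selector Gadget to identify the XPath, but I am not very confident with the HTML tree and its rules. I noticed that the first 3 columns are treated as a single td on the page. My code is pasted below; at the moment it pulls the entire row. Thanks.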

from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import pandas as pd

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
website = 'https://www.tradingview.com/markets/stocks-turkey/market-movers-all-stocks/'
driver.get(website) #to open the website

while True:
    try:
        loadMoreButton = driver.find_element(By.XPATH,'//*[contains(concat( " ", @class, " " ), concat( " ", "content-D4RPB3ZC", " " ))]')
        time.sleep(2)
        loadMoreButton.click()
        time.sleep(5)
    except Exception as e:
        print (e)
        break
print ("Complete")
time.sleep(10)

matches = driver.find_elements(By.TAG_NAME,'tr')

ticker_symbol = []
ticker_name = []
ticker_price =[]
ticker_sector =[]
ticker_marketcap =[]

for match in matches:
    print(match.text)

driver.quit()

python html selenium-webdriver xpath html-table
1 Answer

I fixed a couple of issues:

  1. Replaced the .sleep() calls with proper WebDriverWait waits
  2. Updated the locators
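
As background on the first point: a fixed `time.sleep()` always waits the full duration, while an explicit wait polls a condition and returns as soon as it holds. The sketch below only illustrates that polling idea in plain Python. `wait_until`, `WaitTimeout`, and their parameters are hypothetical names made up for this example, not Selenium's API; in a real script you would use `WebDriverWait(driver, timeout).until(condition)`.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition does not become truthy in time."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Hypothetical helper that mimics the idea behind an explicit wait:
    return as soon as the condition holds instead of sleeping a fixed time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Condition met: hand back whatever it returned.
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a helper like this, a page that is ready after one second costs about one second, whereas `time.sleep(10)` always costs ten.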

The working code is below.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

url = 'https://www.tradingview.com/markets/stocks-turkey/market-movers-all-stocks/'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

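# Keep clicking "Load More" until the button goes stale or is no longer found.
while True:
    try:
        driver.find_element(By.XPATH,'//span[text()="Load More"]').click()
    except (NoSuchElementException, StaleElementReferenceException):
        break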

# Wait up to 10 seconds until every loaded row is visible.
wait = WebDriverWait(driver, 10)
rows = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,'table[class="table-Ngq2xrcG"] tr.listRow')))
for row in rows:
    # The first td holds both the symbol (in an <a>) and the name (in a <sup>).
    ticker_symbol = row.find_element(By.XPATH, './td[1]//a').text
    ticker_name = row.find_element(By.XPATH, './td[1]//sup').text
    ticker_price = row.find_element(By.XPATH, './td[2]').text
    ticker_marketcap = row.find_element(By.XPATH, './td[6]').text
    try:
        ticker_sector = row.find_element(By.XPATH, './td[11]/a').text
    except NoSuchElementException:
        ticker_sector = "—"

    print(ticker_symbol, ticker_name, ticker_price, ticker_marketcap, ticker_sector)

driver.quit()

The output is

A1CAP A1 CAPITAL YATIRIM 24.76 TRY 3.38B TRY Finance
ACSEL ACIPAYAM SELULOZ 99.7 TRY 1.104B TRY Process Industries
ADEL ADEL KALEMCILIK 322.50 TRY 7.69B TRY Consumer Durables
...
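
To see why row-relative XPaths such as `./td[1]//a` solve the "several columns in one td" problem, here is a small offline sketch that runs the same expressions against a static snippet with the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The markup in `SAMPLE_TABLE` is a simplified assumption for illustration, not the real TradingView HTML, and `extract_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the real page has more columns and attributes.
SAMPLE_TABLE = """
<table>
  <tr class="listRow">
    <td><span><a>A1CAP</a><sup>A1 CAPITAL YATIRIM</sup></span></td>
    <td>24.76 TRY</td>
  </tr>
  <tr class="listRow">
    <td><span><a>ACSEL</a><sup>ACIPAYAM SELULOZ</sup></span></td>
    <td>99.7 TRY</td>
  </tr>
</table>
"""


def extract_rows(table_xml):
    """Return one dict per table row, using XPaths relative to each row."""
    table = ET.fromstring(table_xml)
    records = []
    for row in table.findall("./tr"):
        records.append({
            # Symbol and name live in the same first td, in different tags.
            "symbol": row.find("./td[1]//a").text,
            "name": row.find("./td[1]//sup").text,
            # The price is the text of the second td.
            "price": row.find("./td[2]").text,
        })
    return records


records = extract_rows(SAMPLE_TABLE)
for record in records:
    print(record["symbol"], record["name"], record["price"])
```

Collecting each row into a dict like this, instead of only printing it, also gives you a list that can be passed to `pandas.DataFrame(records)` if you want the columns in a table.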