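I have been trying to extract the ticker symbol (the stock's short name), stock name, price, sector, and market cap columns from https://www.tradingview.com/markets/stocks-turkey/market-movers-all-stocks/, but I am struggling to target the right HTML elements with the right code. I tried using Selector Gadget to identify the XPath, but I am not very confident with the HTML tree and its rules. I noticed that the first 3 columns are treated as a single td in the page. My code is pasted below; at the moment it pulls the entire row. Thanks.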
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import pandas as pd
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
website = 'https://www.tradingview.com/markets/stocks-turkey/market-movers-all-stocks/'
driver.get(website)  # open the website

# Keep clicking "Load More" until the button can no longer be found or clicked
while True:
    try:
        loadMoreButton = driver.find_element(By.XPATH, '//*[contains(concat( " ", @class, " " ), concat( " ", "content-D4RPB3ZC", " " ))]')
        time.sleep(2)
        loadMoreButton.click()
        time.sleep(5)
    except Exception as e:
        print(e)
        break
print("Complete")
time.sleep(10)

matches = driver.find_elements(By.TAG_NAME, 'tr')
ticker_symbol = []
ticker_name = []
ticker_price = []
ticker_sector = []
ticker_marketcap = []
for match in matches:
    print(match.text)  # prints the whole row rather than individual columns
driver.quit()
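Because the first three columns share a single td, one way to pull them apart is to query each row with paths relative to that row. The sketch below illustrates the idea offline with the standard library's xml.etree.ElementTree. The row markup is made up and heavily simplified (it is an assumption, not TradingView's actual HTML), and `split_row` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified row: NOT TradingView's real markup.
# Assumption: the symbol sits in an <a> and the name in a <sup>,
# both nested somewhere inside the first <td>.
ROW_HTML = """
<tr class="listRow">
  <td><span><a href="#">A1CAP</a><sup>A1 CAPITAL YATIRIM</sup></span></td>
  <td>24.76 TRY</td>
</tr>
"""


def split_row(row):
    """Return (symbol, name, price) using paths relative to the row element."""
    symbol = row.find("./td[1]//a").text    # first cell, any nested <a>
    name = row.find("./td[1]//sup").text    # first cell, any nested <sup>
    price = row.find("./td[2]").text        # second cell
    return symbol, name, price


row = ET.fromstring(ROW_HTML.strip())
print(split_row(row))
```

The leading `./` keeps each lookup scoped to the current row, so the same expressions can be reused for every row in the table.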
I fixed a few problems, including replacing the .sleep() calls with WebDriverWait. The working code is below.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

url = 'https://www.tradingview.com/markets/stocks-turkey/market-movers-all-stocks/'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

# Click "Load More" until the button goes stale or is no longer in the page
while True:
    try:
        driver.find_element(By.XPATH, '//span[text()="Load More"]').click()
    except (NoSuchElementException, StaleElementReferenceException):
        break

# Wait (up to 10 seconds) for the table rows to become visible
wait = WebDriverWait(driver, 10)
rows = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'table[class="table-Ngq2xrcG"] tr.listRow')))

for row in rows:
    # Symbol and name both live inside the first td
    ticker_symbol = row.find_element(By.XPATH, './td[1]//a').text
    ticker_name = row.find_element(By.XPATH, './td[1]//sup').text
    ticker_price = row.find_element(By.XPATH, './td[2]').text
    ticker_marketcap = row.find_element(By.XPATH, './td[6]').text
    try:
        ticker_sector = row.find_element(By.XPATH, './td[11]/a').text
    except NoSuchElementException:
        # Some rows have no sector link
        ticker_sector = "—"
    print(ticker_symbol, ticker_name, ticker_price, ticker_marketcap, ticker_sector)
driver.quit()
The output is:
A1CAP A1 CAPITAL YATIRIM 24.76 TRY 3.38B TRY Finance
ACSEL ACIPAYAM SELULOZ 99.7 TRY 1.104B TRY Process Industries
ADEL ADEL KALEMCILIK 322.50 TRY 7.69B TRY Consumer Durables
...
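The price and market cap come back as strings such as 24.76 TRY and 3.38B TRY. If numeric values are needed later, for example for sorting, a small parser along these lines might help. This is a minimal sketch: `parse_amount` is a hypothetical helper, and the meaning of the K/M/T suffixes is an assumption, since only B appears in the output above. Strings that do not have exactly a number and a currency raise ValueError.

```python
from decimal import Decimal

# Assumed suffix meanings; only "B" is actually seen in the sample output.
_MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_amount(text):
    """Split a string like '3.38B TRY' into (numeric value, currency code)."""
    number, currency = text.split()          # ValueError if not two parts
    suffix = number[-1].upper()
    if suffix in _MULTIPLIERS:
        # Decimal avoids binary floating-point rounding on values like 3.38
        value = Decimal(number[:-1]) * _MULTIPLIERS[suffix]
    else:
        value = Decimal(number)
    return value, currency


for sample in ("24.76 TRY", "3.38B TRY", "1.104B TRY"):
    print(sample, "->", parse_amount(sample))
```

Decimal is used here instead of float so that scaled values compare exactly; converting to float afterwards is straightforward if exactness does not matter.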