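I'm working on a school project in which I need to get analyst price target estimates for certain stocks from the Yahoo Finance website (using this source is mandatory).
When I tried Beautiful Soup, I couldn't scrape the data (I believe JavaScript modifies the page as it loads), so I switched to Selenium. However, when I try to get the element by XPath, it raises an error as if the element did not exist. I'm using EC (expected_conditions) in case the element needs time to load, but that doesn't work. I also tried increasing the wait time to 2 minutes, so the timeout isn't the problem.
The code is below: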
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument("--headless")
chrome_options.add_argument(f'user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36')
chrome_options.add_argument("window-size=1920,1080")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://finance.yahoo.com/quote/BBAJIOO.MX?.tsrc=fin-srch")
driver.delete_all_cookies()
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="Col2-11-QuoteModule-Proxy"]/div/section/div')))
Does anyone know why this happens, and how I can get these ratings?
The image below shows the desired ratings.
Here is a sample of the HTML:
<div aria-label="Low 60 Current 64.59 Average 69.25 High 76.8" class="Px(10px)">
<div class="Pos(r) Pb(30px) H(1em)">
<div class="Start(75%) W(25%) Pos(a) H(1em) Bdbc($seperatorColor) Bdbw(2px) Bdbs(d) T(30px)"></div>
<div class="Pos(a) H(1em) Bdbc($seperatorColor) Bdbw(2px) Bdbs(s) T(30px) W(100%)"></div>
<div class="Pos(a) D(ib) T(35px)" data-test="analyst-cur-tg" style="left: 27.3214%;">
<div class="W(7px) H(7px) Bgc($linkActiveColor) Bdrs(50%) Z(1) B(-5px) Translate3d($half3dTranslate) Pos(r)"></div>
<div class="Bgc($linkActiveColor) Start(0) T(5px) W(1px) H(17px) Z(0) Pos(r)"></div>
<div class="Miw(100px) T(6px) C($linkActiveColor) Pos(r) Fz(s) Fw(500) D(ib) Ta(c) Translate3d($half3dTranslate)"><span>Current</span> <span>64.59</span></div>
</div>
<div class="Pos(a) D(ib) T(-1px)" data-test="analyst-avg-tg" style="left: 55.0595%;">
<div class="Pos(r) T(5px) Miw(100px) Fz(s) Fw(500) D(ib) C($primaryColor)Ta(c) Translate3d($half3dTranslate)"><span>Average</span> <span>69.25</span></div>
<div class="Pos(r) Bgc($tertiaryColor) W(1px) H(17px) Z(0) T(6px) Start(-1px)"></div>
<div class="W(8px) H(8px) Bgc(t) Bd Bdc($seperatorColor) Bdrs(50%) Z(1) B(-6px) Pos(r) Translate3d($half3dTranslate)"></div>
</div><span class="W(6px) H(6px) Bgc($tertiaryColor) Bdrs(50%) Z(0) B(-5px) Start(0) Pos(a) Translate3d($half153dTranslate)"></span><span class="W(6px) H(6px) Bgc($tertiaryColor) Bdrs(50%) Z(0) B(-5px) Pos(a) Translate3d($zero153dTranslate) Start(100%)"></span></div>
<div class="Ov(a) Fz(xs) Mt(10px) C($tertiaryColor)">
<div class="Pos(r) Fl(start) Fz(xs) C($tertiaryColor) "><span>Low</span> <span>60.00</span></div>
<div class="Pos(r) Fl(end) Fz(xs) C($tertiaryColor) "><span>High</span> <span>76.80</span></div>
</div>
</div>
I guess the XPath you are using was copied from the browser's developer tools, but in this case the element it points to is empty.
driver.delete_all_cookies() # modify below this line
l = driver.find_element(By.ID, 'app')
f = open("s.txt", "w", encoding='utf-8')
str_content = l.get_attribute("innerHTML")
f.write(str_content)
f.close() # save the log file
l = driver.find_element(By.ID, 'Col2-11-QuoteModule-Proxy')
print(l.get_attribute("innerHTML")) # empty since <span></span>
Open the log file and note this 👇
if (window.performance) {window.performance.mark && window.performance.mark('Col2-11-QuoteModule');window.performance.measure && window.performance.measure('Col2-11-QuoteModuleDone','PageStart','Col2-11-QuoteModule');}
So the XPath target is empty. Empty can mean two things: the element does not exist, or it does not exist yet 👇
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)") # scroll to the bottom; the module seems to render only after that
# wait until the module's content is present before querying inside it
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="Col2-11-QuoteModule-Proxy"]/div/section/div')))
xp = '//*[@id="Col2-11-QuoteModule-Proxy"]/div/section/div/div[1]/div[3]/div[3]/span[2]'
l = driver.find_element(By.XPATH, xp)
print(l.text) # the current value
print("you can continue project from here")
driver.quit()
To be safe, always quit the driver once you are done.
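As a possible shortcut, the HTML sample in the question carries all four values in a single aria-label attribute ("Low 60 Current 64.59 Average 69.25 High 76.8"). The sketch below parses a string of that shape into a dictionary. It assumes the page still exposes the label in this format; parse_price_targets is a hypothetical helper name, not part of Selenium. With a loaded element, the string could be read with element.get_attribute("aria-label").

```python
import re


def parse_price_targets(aria_label):
    """Parse a label like 'Low 60 Current 64.59 Average 69.25 High 76.8'.

    Returns a dict mapping lower-cased names to floats.
    Hypothetical helper: it assumes alternating word/number pairs.
    """
    # Each match is a (word, number) pair, e.g. ('Low', '60').
    pairs = re.findall(r"([A-Za-z]+)\s+(-?\d+(?:\.\d+)?)", aria_label)
    return {name.lower(): float(value) for name, value in pairs}


# Sample string copied from the HTML in the question.
label = "Low 60 Current 64.59 Average 69.25 High 76.8"
targets = parse_price_targets(label)
print(targets)
```

This avoids depending on the long positional XPath, which is likely to break whenever the page layout changes.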