Scraping data shown when hovering the mouse over a plot


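I'm interested in automatically scraping web pages such as https://www.hltv.org/team/7532/big. More precisely, I want to extract the date and #rank from the box that appears when you hover the mouse over the plot (see the screenshot below).

I have tried using Python with Selenium, but despite going through several tutorials I really don't know how to proceed. I feel I need to change the top and left values in the style attribute, but I don't know how to do that, or whether I should use XPath, CSS selectors, or something else. Here is my piece of code, which returns the WebElement I'm interested in (presumably), but I can't extract anything from it :(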

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
executable_path=r'C:/Users/fabbe/Documents/Python Scripts/hltv/chromedriver/chromedriver.exe'
driver = webdriver.Chrome(executable_path, chrome_options=options)

driver.get("https://www.hltv.org/team/7532/big")

elements = driver.find_elements_by_xpath("//*[@id='fusioncharts-tooltip-element']")

screenshot

python selenium screen-scraping
1 Answer

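I would take a different approach to get the graph data, so that you don't have to hover the mouse over every part of the graph.

You'll need to add the following imports.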

import json
from lxml import html

Code:

url = "https://www.hltv.org/team/7532/BIG"
driver.get(url)
# The chart configuration is read as a JSON string from a data attribute of the graph element.
graph_data = driver.find_element_by_css_selector('.chart-container.core-chart-container .border-box .graph').get_attribute('data-fusionchart-config')
# Each data point carries its tooltip as an HTML string under 'tooltext'.
graph_text = json.loads(graph_data)['dataSource']['dataset'][0]['data']
for graph_item in graph_text:
    # Parse the tooltip HTML and pull out the date and the rank.
    tree = html.fromstring(graph_item['tooltext'])
    print("Date:" + tree.xpath("//div[@class='subtitle']//text()")[0])
    print("Rank:" + tree.xpath("(//div[@class='ranking-development-top-info']//div[@class='title'])[2]//text()")[0])
driver.close()

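This reads the graph configuration from the page and parses it, then iterates over all the graph items and extracts only the data we are interested in.

Below is the output.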

Date:24th December 2018
Rank:#11
Date:31st December 2018
Rank:#11
Date:7th January 2019
Rank:#11
Date:14th January 2019
Rank:#12
Date:21st January 2019
Rank:#13
Date:28th January 2019
Rank:#13
Date:4th February 2019
Rank:#15
Date:11th February 2019
Rank:#12
Date:18th February 2019
Rank:#14
Date:25th February 2019
Rank:#15
Date:4th March 2019
Rank:#18
Date:11th March 2019
Rank:#16
Date:18th March 2019
Rank:#18
Date:25th March 2019
Rank:#18
Date:1st April 2019
Rank:#18
Date:8th April 2019
Rank:#18
Date:15th April 2019
Rank:#18
Date:22nd April 2019
Rank:#19
Date:29th April 2019
Rank:#19
Date:6th May 2019
Rank:#18
Date:13th May 2019
Rank:#18
Date:20th May 2019
Rank:#20
Date:27th May 2019
Rank:#22
Date:3rd June 2019
Rank:#22
Date:10th June 2019
Rank:#22
Date:17th June 2019
Rank:#26
Date:24th June 2019
Rank:#30
Date:1st July 2019
Rank:#34
Date:8th July 2019
Rank:#23
Date:15th July 2019
Rank:#27
Date:22nd July 2019
Rank:#22
Date:29th July 2019
Rank:#23
Date:5th August 2019
Rank:#28
Date:12th August 2019
Rank:#25
Date:19th August 2019
Rank:#24
Date:26th August 2019
Rank:#26
Date:2nd September 2019
Rank:#28
Date:9th September 2019
Rank:#24
Date:16th September 2019
Rank:#22
Date:23rd September 2019
Rank:#22
Date:30th September 2019
Rank:#21
Date:7th October 2019
Rank:#27
Date:14th October 2019
Rank:#24
Date:21st October 2019
Rank:#26
Date:28th October 2019
Rank:#24
Date:4th November 2019
Rank:#24
Date:11th November 2019
Rank:#24
Date:18th November 2019
Rank:#28
Date:25th November 2019
Rank:#26
Date:2nd December 2019
Rank:#26
Date:9th December 2019
Rank:#29
Date:16th December 2019
Rank:#33
Date:23rd December 2019
Rank:#40
Date:30th December 2019
Rank:#39
Date:6th January 2020
Rank:#46
Date:13th January 2020
Rank:#46
Date:20th January 2020
Rank:#46
Date:27th January 2020
Rank:#22
Date:3rd February 2020
Rank:#22
Date:10th February 2020
Rank:#23
Date:17th February 2020
Rank:#25
Date:24th February 2020
Rank:#26
Date:2nd March 2020
Rank:#21
Date:9th March 2020
Rank:#20
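
If you want to try the parsing step without a browser, or turn the printed strings into real dates and integers, something like the following might help. It is only a sketch under assumptions: the tooltip HTML in `make_tooltext` is made up for illustration (only the class names and the JSON key path come from the code above; the real markup on the page may differ), and `TooltipParser`, `make_tooltext` and `parse_rankings` are hypothetical names. It uses only the standard library instead of lxml.

```python
import json
import re
from datetime import datetime
from html.parser import HTMLParser


class TooltipParser(HTMLParser):
    """Collect text from div.subtitle and from div.title elements that sit
    inside div.ranking-development-top-info (class names from the XPath above)."""

    def __init__(self):
        super().__init__()
        self.stack = []       # class attribute of every currently open <div>
        self.subtitles = []   # text found directly in div.subtitle
        self.titles = []      # text found in div.title under the top-info div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        if self.stack[-1] == "subtitle":
            self.subtitles.append(text)
        elif self.stack[-1] == "title" and "ranking-development-top-info" in self.stack:
            self.titles.append(text)


def make_tooltext(day, rank):
    # ASSUMPTION: invented markup that merely satisfies the XPath expressions above.
    return ("<div class='ranking-development-top-info'>"
            "<div class='title'>Rank</div>"
            f"<div class='title'>{rank}</div>"
            "</div>"
            f"<div class='subtitle'>{day}</div>")


# Stand-in for the string returned by get_attribute('data-fusionchart-config').
sample_config = json.dumps({"dataSource": {"dataset": [{"data": [
    {"tooltext": make_tooltext("24th December 2018", "#11")},
    {"tooltext": make_tooltext("14th January 2019", "#12")},
]}]}})


def parse_rankings(config_json):
    """Return a list of (date, rank) tuples from a chart config JSON string."""
    records = []
    for item in json.loads(config_json)["dataSource"]["dataset"][0]["data"]:
        parser = TooltipParser()
        parser.feed(item["tooltext"])
        parser.close()
        # Drop the ordinal suffix ("24th" -> "24") so strptime can read the day.
        day_text = re.sub(r"(?<=\d)(st|nd|rd|th)", "", parser.subtitles[0])
        day = datetime.strptime(day_text, "%d %B %Y").date()
        rank = int(parser.titles[1].lstrip("#"))  # second title holds the rank
        records.append((day, rank))
    return records


records = parse_rankings(sample_config)
for day, rank in records:
    print(day.isoformat(), rank)
```

In a real run you would pass the `graph_data` string from the code above to `parse_rankings` instead of `sample_config`. Note that `%B` matches English month names only under the default C locale.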