Python: scraping a script-populated canvas using Selenium/BS4

Question (votes: 0, answers: 1)

I am trying to implement a price-tracking page for my PySide6 application, and I want to do it by web scraping with Selenium and bs4.

My code navigates to a page on the Cardmarket website, and I am now trying to scrape the chart that shows the price fluctuations of a Yu-Gi-Oh! card. The chart is a canvas whose data is filled in by a script, and I am struggling to scrape anything about it. I have tried:

1) scraping the canvas's information
2) capturing the canvas as an image
3) plain web scraping with requests/BeautifulSoup

All of these returned empty strings or None.

This is the part of the code I am using:

import requests
from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC



options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)  # pass the options, or "detach" has no effect
driver.maximize_window()
driver.get("https://www.cardmarket.com/it/YuGiOh")
print(driver.title)

'''WRITE IN THE SEARCHBAR'''
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "ProductSearchInput")))
search = driver.find_element(By.ID, "ProductSearchInput")
search.click()
search.send_keys("Ultimate conductor Tyranno")
search.send_keys(Keys.RETURN)
#-----------------------------------------------------------------------------------------------------------------------#

'''TURN IT INTO GRID VIEW'''
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "/html/body/main/section/div[2]/div[2]/div/a[2]")))
search = driver.find_element(By.XPATH, "/html/body/main/section/div[2]/div[2]/div/a[2]")
search.click()
#-----------------------------------------------------------------------------------------------------------------------#


#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'h2.card-title.h3')))
results = driver.find_elements(By.CSS_SELECTOR, 'h2.card-title.h3')
clickable_elements = [element.text for element in results]
for name in clickable_elements:
    print(name)
i = len(clickable_elements)


###User chooses one of the results###
'''choice = input("Choose one of the options, with a number between 1 and %i: " % i)'''
choice = 1  # hard-coded for testing

while True:
    try:
        choice = int(choice)
        if not 1 <= choice <= i:
            raise ValueError
        break
    except ValueError:
        print("That's not a valid option!")
        choice = input("Choose one of the options, with a number between 1 and %i: " % i)
#######################################

print(choice)
print(clickable_elements[choice - 1])
search = driver.find_element(By.XPATH, "/html/body/main/section/div[3]/div[%i]/a/div/h2" % choice)
search.click()

'''
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "chartjs-render-monitor")))
search = driver.find_element(By.CLASS_NAME, 'chart-init-script')
print(search.text)
'''

URL = driver.current_url
response = requests.get(URL) 
soup = BeautifulSoup(response.text, 'html.parser')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "chart-init-script")))
script = soup.find('script')#, attrs = {'class':'chart-init-script'}) 
print(script.text)

I would like to get an image of the chart, since Google lets me download it like this: (https://i.sstatic.net/cWEUHGog.png)

Or at least scrape what the script contains, so that I can feed the data to a PySide6 widget and populate it.
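One pitfall in the last block of the code above is that `requests.get(URL)` fetches a fresh copy of the page outside the browser session, so the response never contains the JavaScript-rendered state (and may be a bot-check page); parsing `driver.page_source` keeps the rendered DOM instead. Once the inline `chart-init-script` text is in hand, a Chart.js-style config can often be mined with a regex. A minimal sketch, assuming the data is embedded as JSON-like `labels`/`data` arrays (the `script_text` below is a hypothetical stand-in, not what Cardmarket actually embeds):

```python
import json
import re

# In the real flow this text would come from the rendered page, e.g.:
#   soup = BeautifulSoup(driver.page_source, "html.parser")
#   script_text = soup.find("script", class_="chart-init-script").string
script_text = """
new Chart(ctx, {type: 'line', data: {
    labels: ["01/06", "02/06", "03/06"],
    datasets: [{label: "Avg. price", data: [12.5, 13.1, 12.9]}]
}});
"""

def extract_series(text):
    """Pull the first labels/data JSON arrays out of an inline chart script."""
    labels = re.search(r"labels:\s*(\[[^\]]*\])", text)
    values = re.search(r"data:\s*(\[[^\]]*\])", text)
    if not labels or not values:
        return None
    return json.loads(labels.group(1)), json.loads(values.group(1))

labels, values = extract_series(script_text)
print(labels)   # x-axis labels of the chart
print(values)   # price series, ready to feed a PySide6 widget
```

The exact keys and quoting in the real script will differ, so inspect the actual `chart-init-script` contents first and adjust the patterns accordingly.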

This is the part of the HTML I am trying to scrape: (https://i.sstatic.net/kZEU1p2b.png)
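For the image itself, Selenium can ask the browser to serialize the canvas: `driver.execute_script` can call the standard `canvas.toDataURL()` DOM API and return a base64 PNG data URL, which is then decoded in Python. A hedged sketch, assuming the chart is drawn on a plain (untainted) 2D canvas; the decode helper below is pure Python and independent of Selenium:

```python
import base64

def data_url_to_bytes(data_url):
    """Decode a data: URL (as returned by canvas.toDataURL()) into raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)

# In the live session the data URL would come from the browser, e.g.:
#   canvas = driver.find_element(By.CSS_SELECTOR, "canvas")
#   data_url = driver.execute_script(
#       "return arguments[0].toDataURL('image/png');", canvas)
#   with open("chart.png", "wb") as f:
#       f.write(data_url_to_bytes(data_url))

# Demonstration with a tiny synthetic payload (the PNG magic bytes):
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode()
print(data_url_to_bytes(sample))  # b'\x89PNG'
```

A simpler alternative is Selenium's own element screenshot, `canvas_element.screenshot("chart.png")`, which captures whatever is visible in the viewport; `toDataURL` reads the canvas backing store directly but raises a SecurityError in the browser if the canvas was tainted by cross-origin images.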

python selenium-webdriver web-scraping beautifulsoup html5-canvas
1 Answer

0 votes

You can't scrape the CANVAS tag, because it is like an application embedded in the page; there is no HTML inside it to scrape. If you have access to the developers of the site / the canvas widget, they could add hooks for you, but otherwise you're out of luck.
