Python: scraping a script-populated canvas using Selenium/BS4

Question (votes: 0, answers: 1)

I am trying to implement a price-tracking page for my PySide6 application, and I want to do it by web scraping with Selenium and bs4.

My code navigates to a page on the Cardmarket website, and I am now trying to scrape the chart that shows the price fluctuations of a Yu-Gi-Oh! card. The chart is a canvas whose data is filled in by a script, and I am struggling to scrape anything about it. I have tried:

1) scraping the canvas's information
2) capturing the canvas as an image
3) plain web scraping with requests/BeautifulSoup

All of these returned empty strings or None.

This is the part of the code I am using:

import requests
from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC



options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)  # pass the options, or "detach" has no effect
driver.maximize_window()
driver.get("https://www.cardmarket.com/it/YuGiOh")
print(driver.title)

'''WRITE IN THE SEARCHBAR'''
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "ProductSearchInput")))
search = driver.find_element(By.ID, "ProductSearchInput")
search.click()
search.send_keys("Ultimate conductor Tyranno")
search.send_keys(Keys.RETURN)
#-----------------------------------------------------------------------------------------------------------------------#

'''TURN IT INTO GRID VIEW'''
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "/html/body/main/section/div[2]/div[2]/div/a[2]")))
search = driver.find_element(By.XPATH, "/html/body/main/section/div[2]/div[2]/div/a[2]")
search.click()
#-----------------------------------------------------------------------------------------------------------------------#


#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'h2.card-title.h3')))
results = driver.find_elements(By.CSS_SELECTOR, 'h2.card-title.h3')
clickable_elements = [element.text for element in results]
for name in clickable_elements:
    print(name)
i = len(clickable_elements)


###User chooses one of the results###
'''choice = input("Choose one of the options, with a number between 1 and %i: " % i)'''
choice = 1  # hard-coded for testing

while True:
    try:
        choice = int(choice)
        if not 1 <= choice <= i:
            raise ValueError
        break
    except ValueError:
        print("That's not a valid option!")
        choice = input("Choose one of the options, with a number between 1 and %i: " % i)
#######################################

print(choice)
print(clickable_elements[choice - 1])
search = driver.find_element(By.XPATH, "/html/body/main/section/div[3]/div[%i]/a/div/h2" % choice)
search.click()

'''
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "chartjs-render-monitor")))
search = driver.find_element(By.CLASS_NAME, 'chart-init-script')
print(search.text)
'''

URL = driver.current_url
response = requests.get(URL) 
soup = BeautifulSoup(response.text, 'html.parser')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "chart-init-script")))
script = soup.find('script')#, attrs = {'class':'chart-init-script'}) 
print(script.text)

I would like to get an image of the chart, since Google lets me download it like this: (https://i.sstatic.net/cWEUHGog.png)

Or at least scrape what the script contains, so that I can feed the data to a PySide6 widget and populate it.
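One pitfall in the last block of the code above is that `requests.get(URL)` fetches a fresh copy of the page outside the browser session, so the response never contains the JavaScript-rendered state (and may be a bot-check page); parsing `driver.page_source` keeps the rendered DOM instead. Once the inline `chart-init-script` text is in hand, a Chart.js-style config can often be mined with a regex. A minimal sketch, assuming the data is embedded as JSON-like `labels`/`data` arrays (the `script_text` below is a hypothetical stand-in, not what Cardmarket actually embeds):

```python
import json
import re

# In the real flow this text would come from the rendered page, e.g.:
#   soup = BeautifulSoup(driver.page_source, "html.parser")
#   script_text = soup.find("script", class_="chart-init-script").string
script_text = """
new Chart(ctx, {type: 'line', data: {
    labels: ["01/06", "02/06", "03/06"],
    datasets: [{label: "Avg. price", data: [12.5, 13.1, 12.9]}]
}});
"""

def extract_series(text):
    """Pull the first labels/data JSON arrays out of an inline chart script."""
    labels = re.search(r"labels:\s*(\[[^\]]*\])", text)
    values = re.search(r"data:\s*(\[[^\]]*\])", text)
    if not labels or not values:
        return None
    return json.loads(labels.group(1)), json.loads(values.group(1))

labels, values = extract_series(script_text)
print(labels)   # x-axis labels of the chart
print(values)   # price series, ready to feed a PySide6 widget
```

The exact keys and quoting in the real script will differ, so inspect the actual `chart-init-script` contents first and adjust the patterns accordingly.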

This is the part of the HTML I am trying to scrape: (https://i.sstatic.net/kZEU1p2b.png)
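For the image itself, Selenium can ask the browser to serialize the canvas: `driver.execute_script` can call the standard `canvas.toDataURL()` DOM API and return a base64 PNG data URL, which is then decoded in Python. A hedged sketch, assuming the chart is drawn on a plain (untainted) 2D canvas; the decode helper below is pure Python and independent of Selenium:

```python
import base64

def data_url_to_bytes(data_url):
    """Decode a data: URL (as returned by canvas.toDataURL()) into raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)

# In the live session the data URL would come from the browser, e.g.:
#   canvas = driver.find_element(By.CSS_SELECTOR, "canvas")
#   data_url = driver.execute_script(
#       "return arguments[0].toDataURL('image/png');", canvas)
#   with open("chart.png", "wb") as f:
#       f.write(data_url_to_bytes(data_url))

# Demonstration with a tiny synthetic payload (the PNG magic bytes):
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode()
print(data_url_to_bytes(sample))  # b'\x89PNG'
```

A simpler alternative is Selenium's own element screenshot, `canvas_element.screenshot("chart.png")`, which captures whatever is visible in the viewport; `toDataURL` reads the canvas backing store directly but raises a SecurityError in the browser if the canvas was tainted by cross-origin images.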

python selenium-webdriver web-scraping beautifulsoup html5-canvas
1 Answer

0 votes

You can't scrape the CANVAS tag, because it is like an application embedded in the page; there is no HTML inside it to scrape. If you have access to the developers of the site / the canvas widget, they could add hooks for you, but otherwise you're out of luck.
