I'm trying to implement a price-tracking page for my PySide6 application, and I want to do it by web scraping with Selenium and bs4.
My code navigates to a page of the Cardmarket website, and now I'm trying to scrape the chart that shows the price fluctuations of a Yu-Gi-Oh! card. The chart is a canvas whose data is filled in by a script, and I'm having a hard time scraping anything from it. I've tried: 1) scraping the canvas's contents, 2) capturing the canvas as an image, 3) plain web scraping of the page.
All of them return empty strings or None.
Here is the relevant part of my code:
import requests
from bs4 import BeautifulSoup
from lxml import html
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)  # keep the browser open after the script ends
driver = webdriver.Chrome(options=options)  # pass the options, otherwise "detach" has no effect
driver.maximize_window()
driver.get("https://www.cardmarket.com/it/YuGiOh")
print(driver.title)
'''WRITE IN THE SEARCHBAR'''
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "ProductSearchInput")))
search = driver.find_element(By.ID, "ProductSearchInput")
search.click()
search.send_keys("Ultimate conductor Tyranno")
search.send_keys(Keys.RETURN)
#-----------------------------------------------------------------------------------------------------------------------#
'''TURN IT INTO GRID VIEW'''
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "/html/body/main/section/div[2]/div[2]/div/a[2]")))
search = driver.find_element(By.XPATH, "/html/body/main/section/div[2]/div[2]/div/a[2]")
search.click()
#-----------------------------------------------------------------------------------------------------------------------#
#-----------------------------------------------------------------------------------------------------------------------#
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'h2.card-title.h3')))
search = driver.find_elements(By.CSS_SELECTOR, 'h2.card-title.h3')
#attr = search.get_attribute()
i = 0
clickable_elements = []
for element in search:
    i += 1
    print(element.text)
    clickable_elements.append(element.text)
###User chooses one of the results###
'''choice = input("Choose one of the options, with a number between 1 and %i: " % i)'''
choice = 1 ##############################
while True:
    try:
        choice = int(choice)
        if choice < 1 or choice > i:
            raise ValueError("That's not a valid option!")
        else:
            break
    except ValueError:
        print("That's not a valid option!")
        choice = input("Choose one of the options, with a number between 1 and %i: " % i)
#######################################
print(choice)
print(clickable_elements[choice - 1])
search = driver.find_element(By.XPATH, "/html/body/main/section/div[3]/div[%i]/a/div/h2" % int(choice))
search.click()
'''
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "chartjs-render-monitor")))
search = driver.find_element(By.CLASS_NAME, 'chart-init-script')
print(search.text)
'''
URL = driver.current_url
response = requests.get(URL)
soup = BeautifulSoup(response.text, 'html.parser')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "chart-init-script")))
script = soup.find('script')#, attrs = {'class':'chart-init-script'})
print(script.text)
I would like to get an image of the chart, the way Google lets me download it: (https://i.sstatic.net/cWEUHGog.png)
Or at least scrape the data contained in the script, so I can feed it to a PySide6 widget and populate the chart there.
This is the part of the HTML I want to scrape: (https://i.sstatic.net/kZEU1p2b.png)
You can't scrape the CANVAS tag, because it is like an application embedded in the page; there is no HTML inside it to scrape. If you have access to the developers of the site/canvas, they could add hooks for you, but otherwise you're out of luck.
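That said, two workarounds may help here. First, Selenium can screenshot the canvas element itself (`driver.find_element(By.TAG_NAME, "canvas").screenshot("chart.png")`), which gives you the rendered image without reading the canvas data. Second, since the chart is drawn by an inline `<script class="chart-init-script">`, that script's text is part of the DOM Selenium has already rendered — so parse `driver.page_source` instead of re-fetching the URL with `requests` (a fresh `requests.get` returns the page without the rendered state). Below is a minimal sketch of the parsing step; the `sample_html` markup is a made-up stand-in, and the real structure of Cardmarket's script may differ:

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical sample of the rendered HTML. In the real script you would
# use: sample_html = driver.page_source
sample_html = """
<div>
  <canvas id="priceChart"></canvas>
  <script class="chart-init-script">
    const data = {"labels": ["1.6.", "2.6.", "3.6."],
                  "datasets": [{"label": "Avg. price",
                                "data": [4.10, 4.25, 4.05]}]};
  </script>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
script = soup.find("script", attrs={"class": "chart-init-script"})

# Pull the outermost JSON object literal out of the script text
# (greedy match from the first "{" to the last "}").
match = re.search(r"\{.*\}", script.string, re.DOTALL)
chart = json.loads(match.group(0))

labels = chart["labels"]
prices = chart["datasets"][0]["data"]
print(labels)  # ['1.6.', '2.6.', '3.6.']
print(prices)  # [4.1, 4.25, 4.05]
```

If the real script mixes JSON with other JavaScript, the regex will need to be tightened to the specific variable assignment, but the idea stays the same: the chart data lives in the script text, not in the canvas.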