This is the code I wrote:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
# List of URLs to scrape
urls = ["https://www.monolithai.com/blog/4-ways-ai-is-changing-the-packaging-industry"
,"https://mitsubishisolutions.com/the-role-of-artificial-intelligence-in-smart-packaging-lines"
, "https://thedatascientist.com/how-artificial-intelligence-is-revolutionizing-the-packaging-industry/"
,"https://packagingeurope.com/comment/ai-and-the-future-of-packaging/9665.article"]
# Initialize the WebDriver
driver = webdriver.Chrome() # Use appropriate WebDriver for your browser
wait = WebDriverWait(driver,10)
# Initialize empty lists to store scraped data
all_text = []
all_images = []
all_links = []
# Iterate over each URL and scrape text, images, and links
for url in urls:
    driver.get(url)
    body = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'body')))
    # Scrape text
    page_text = driver.find_element_by_tag_name('body').text
    all_text.append(page_text)
    # Scrape images
    images = driver.find_elements_by_tag_name('img')
    image_urls = [img.get_attribute('src') for img in images]
    all_images.append(image_urls)
    # Scrape links
    links = driver.find_elements_by_tag_name('a')
    link_urls = [link.get_attribute('href') for link in links]
    all_links.append(link_urls)
# Close the WebDriver when finished
#driver.quit()
# Create a DataFrame from the scraped data
data = {
    'URL': urls,
    'Text': all_text,
    'Images': all_images,
    'Links': all_links
}
df = pd.DataFrame(data)
# Save the DataFrame to an Excel file
df.to_excel('scraped_data.xlsx', index=False)
I get the following error:
DevTools listening on ws://127.0.0.1:56991/devtools/browser/8be11b91-e7ec-4f18-949e-7319a4341af5
Traceback (most recent call last):
  File "c:\Users\PRADEEP BIRARE\Desktop\web3.py", line 29, in <module>
    page_text = driver.find_element_by_tag_name('body').text
AttributeError: 'WebDriver' object has no attribute 'find_element_by_tag_name'
PS C:\Users\PRADEEP BIRARE>
The error occurs because the find_element_by_* and find_elements_by_* methods, deprecated since Selenium 4.0, were removed in Selenium 4.3.0.
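Fix: replace them with find_element(By.TAG_NAME, 'body'), and likewise use find_elements(By.TAG_NAME, 'img') for the images and find_elements(By.TAG_NAME, 'a') for the links.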
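Also consider calling driver.quit() once scraping is finished so the browser is closed; in the script above that call is commented out.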
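To make the replacement concrete, here is a minimal sketch that rewrites the legacy calls as plain text. It is not part of Selenium: the names modernize_locators and LEGACY_TO_BY are made up for this illustration, and the helper only does a regex substitution on source code, so the output should be reviewed by hand. The By constants on the right-hand side are the real Selenium locator names.

```python
import re

# Suffix of each removed helper mapped to the matching By constant.
# LEGACY_TO_BY is a hypothetical name used only in this sketch.
LEGACY_TO_BY = {
    "id": "By.ID",
    "name": "By.NAME",
    "xpath": "By.XPATH",
    "link_text": "By.LINK_TEXT",
    "partial_link_text": "By.PARTIAL_LINK_TEXT",
    "tag_name": "By.TAG_NAME",
    "class_name": "By.CLASS_NAME",
    "css_selector": "By.CSS_SELECTOR",
}

# Matches ".find_element_by_<suffix>(" and ".find_elements_by_<suffix>(".
LEGACY_CALL = re.compile(
    r"\.find_(elements?)_by_(" + "|".join(LEGACY_TO_BY) + r")\("
)


def modernize_locators(source):
    """Rewrite find_element(s)_by_* calls to find_element(s)(By.*, ...)."""

    def _swap(match):
        # group(1) is "element" or "elements"; group(2) is the locator suffix.
        return f".find_{match.group(1)}({LEGACY_TO_BY[match.group(2)]}, "

    return LEGACY_CALL.sub(_swap, source)


if __name__ == "__main__":
    # The three failing calls from the script in the question.
    legacy = "\n".join([
        "page_text = driver.find_element_by_tag_name('body').text",
        "images = driver.find_elements_by_tag_name('img')",
        "links = driver.find_elements_by_tag_name('a')",
    ])
    print(modernize_locators(legacy))
    # page_text = driver.find_element(By.TAG_NAME, 'body').text
    # images = driver.find_elements(By.TAG_NAME, 'img')
    # links = driver.find_elements(By.TAG_NAME, 'a')
```

The rewritten lines rely on `from selenium.webdriver.common.by import By`, which the script in the question already imports, so pasting the three printed lines over the old ones should be enough for this case.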