Python 3.5 - How to web-scrape a JavaScript-rendered page


I am trying to extract a JavaScript-rendered table using Python 3 and Selenium WebDriver.

My code is as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))

# And grab the page HTML source
html = driver.page_source
driver.quit()
print(html)

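Now, when I print the page source, the JavaScript-rendered content is missing from the output. How can I extract the table I want (the table's entire HTML code)?

Thank you very much.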

javascript python selenium web-scraping webdriver
1 Answer

To solve your problem, I used the BeautifulSoup library to parse the page source.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

import bs4

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))

# And grab the page HTML source
html = driver.page_source

# Turns html into a beautifulsoup object
bs4_html = bs4.BeautifulSoup(html, 'lxml')

# Finds the table
table = bs4_html.find_all('table')

driver.quit()

print(table)

The console output is a mile long, so I can't post it here.
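Since the raw table HTML is so long, it may be easier to reduce it to rows of cell text before printing. The following is only a minimal sketch of that idea, not part of the code above: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, and the `TableRowExtractor` class, the `extract_rows` helper, and the `sample_html` string (a stand-in for `driver.page_source`) are all hypothetical names made up for illustration. It also assumes the tables are not nested.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed


def extract_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical stand-in for driver.page_source
sample_html = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Alpha</td><td>1999</td></tr>
  <tr><td>Beta</td><td>2004</td></tr>
</table>
"""

for row in extract_rows(sample_html):
    print(row)
```

In the real script you would pass `html` (the value of `driver.page_source`) to `extract_rows` instead of `sample_html`.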

Hope this helps!
