Python 3.5 - How to web scrape a JavaScript-rendered page

Question

I am trying to extract a JavaScript-rendered table using Python 3 and WebDriver.

My code is as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))

# And grab the page HTML source
html = driver.page_source
driver.quit()
print(html)

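Now, when I print the body, the JavaScript-rendered content is missing from the output. How can I extract the table I want (the table's entire HTML code)?

Thank you very much.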

Answer 1

To solve your problem, I used the BeautifulSoup library to parse the page source.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

import bs4

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))

# And grab the page HTML source
html = driver.page_source

# Parse the HTML into a BeautifulSoup object (the 'lxml' parser requires the lxml package)
bs4_html = bs4.BeautifulSoup(html, 'lxml')

# Find every <table> element; find_all returns a list of matches
table = bs4_html.find_all('table')

driver.quit()

print(table)

The console output is a mile long, so I can't include it here.

Hope that helps!
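
Once the rendered page source is in hand, the table still has to be turned into usable rows. The sketch below is not the method used in the answer above: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The names TableExtractor, extract_tables, and sample are made up for illustration, the sample markup is invented rather than taken from the real site, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects cell text grouped by table and row (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row is a list of strings
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text fragments of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    """Return every table in the HTML string as a list of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical markup standing in for driver.page_source
sample = (
    "<html><body><table>"
    "<tr><th>Name</th><th>Year</th></tr>"
    "<tr><td>Alpha</td><td>1990</td></tr>"
    "</table></body></html>"
)

for row in extract_tables(sample)[0]:
    print(row)
```

In the Selenium script, the same function could be called as extract_tables(html) right after html = driver.page_source, assuming the wait condition has already confirmed that the table is present.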
