Extracting generated content with Python requests

Question

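I want to extract dynamically generated content from a web page.

I am using the requests library in Python 3 to fetch the page, as follows: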

import requests

url = "https://app.updateimpact.com/treeof/org.json4s/json4s-native_2.11/3.5.2"

# requests returns only the initial HTML; no JavaScript is executed
html_doc = requests.get(url)
print(html_doc.text)

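The retrieved text appears to be nothing but filler. Which tools should I use to dig into the content and extract the information from it?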

Answer 1

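JavaScript has to run on the page to produce most of the content, and requests does not execute it. A browser automation tool such as Selenium can. Note that an extra wait condition is needed to make sure the content you want has loaded. You can then extract the information with Selenium's own syntax, or dump the HTML from page_source into BeautifulSoup.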

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs

d = webdriver.Chrome()
d.get('https://app.updateimpact.com/treeof/org.json4s/json4s-native_2.11/3.5.2')
# Wait up to 5 seconds for the JavaScript-rendered element to appear
dependencies = WebDriverWait(d, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.stats-list')))
print(dependencies.text)  # text of the located element
# Hand the fully rendered HTML to BeautifulSoup for further parsing
soup = bs(d.page_source, 'lxml')
print(soup.select_one('#tree').text)  # example
d.quit()  # close the browser when done

Answer 2

If the content is HTML, you can look at:

If it is JSON, you would use:

https://docs.python.org/3/library/json.html
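
As a rough illustration of the HTML-versus-JSON distinction above, the sketch below picks a parser based on the response's Content-Type header. The names `parse_body` and `TextCollector` are hypothetical helpers, not part of any library, and the standard-library `html.parser` is used only to keep the example dependency-free; BeautifulSoup would serve the same purpose.

```python
import json
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def parse_body(body, content_type):
    """Return ("json", parsed_object) or ("html", list_of_text_chunks)."""
    if "json" in content_type.lower():
        return "json", json.loads(body)
    collector = TextCollector()
    collector.feed(body)
    collector.close()
    return "html", collector.chunks
```

With requests, this could be called as `parse_body(resp.text, resp.headers.get("Content-Type", ""))`. If the HTML branch returns very little text, that suggests the page builds its content with JavaScript, which is the case Answer 1 addresses.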
