Unable to retrieve a website's hidden content

Question description  Votes: 0  Answers: 1

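I want to scrape a website with the help of BeautifulSoup. I can't get the site's content in my response, even though it shows up in the source when I inspect the page.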

import requests
from bs4 import BeautifulSoup


url1 = 'https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc'

# Fetch the HTML as served; requests does not run any JavaScript
response1 = requests.get(url1)

print(response1.text[:500])
html_soup1 = BeautifulSoup(response1.text, 'html.parser')
type(html_soup1)

# Container div bound to the list of opportunities
all_info1 = html_soup1.find("div", {"data-bind": "foreach: opportunities"})
all_info1

# Individual job entries inside that container
all_automation1 = all_info1.find_all("div", {"data-automation": "opportunity"})

all_automation1

The source shows details such as the job title, location, and description, but I can't see those details in the HTML I get back.

beautifulsoup python-3.5 spyder
1 answer
0
votes

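The job listings are most likely filled in by JavaScript after the page loads, so requests only receives the page as it looks before that happens. Driving a real browser with Selenium lets the scripts run first. You could try this, or something similar, to get the titles from that page: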

import time
from selenium import webdriver
from bs4 import BeautifulSoup

# Start a Chrome session that Selenium can control
driver = webdriver.Chrome()
driver.get('https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc')
time.sleep(3)  # let the browser load its content
# Parse the rendered page (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(driver.page_source,'lxml')
for item in soup.select("h3 .opportunity-link"):
    print(item.text)
driver.quit()
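
The fixed time.sleep(3) above may be too short on a slow connection and is wasted time on a fast one. Selenium ships its own WebDriverWait for this purpose. As a rough, library-independent sketch of the same idea, the helper below polls a condition until it returns something truthy or a timeout expires. The name wait_until and its parameters are made up for this illustration and are not part of Selenium or BeautifulSoup.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value, then return that value.

    wait_until is a hypothetical helper written for this example.
    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Possible use with the answer's code (assumes `driver` and BeautifulSoup
# are set up as above, in place of the time.sleep(3) line):
#
#   links = wait_until(
#       lambda: BeautifulSoup(driver.page_source, 'lxml').select("h3 .opportunity-link")
#   )
#   for item in links:
#       print(item.text)
```

An empty list from select() is falsy, so the loop keeps polling until at least one matching link has been rendered.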