Why am I not getting the correct response text when scraping this website?


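Basically, I'm trying to scrape a website, but the response doesn't contain the data I need. I printed response.text, but it doesn't include the dynamic data, only the non-dynamic content. I also printed the response, but got nothing useful. Does anyone know a solution to my problem? Also, when I disable JS and then re-enable it, no requests show up under XHR/Fetch in the Network tab after refreshing the page, even though it's a dynamic site.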

import requests
from bs4 import BeautifulSoup

# Set up the URL
url = "https://www.amazon.jobs/en/search?base_query=&loc_query="

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find all job titles
    job_titles = soup.find_all('h3', class_='job-title')

    # Print the job titles
    for title in job_titles:
        print(title.text.strip())
else:
    print("Failed to retrieve the page. Status code:", response.status_code)

I tried Beautiful Soup, Scrapy, and plain requests, but nothing worked.

web-scraping beautifulsoup get
1 Answer

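The job listings are loaded by a background request that returns JSON, so we can extract the data from that request directly instead of parsing the HTML.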

API URL: https://www.amazon.jobs/en/search.json?radius=24km&facets%5B%5D=normalized_country_code&facets%5B%5D=normalized_state_name&facets%5B%5D=normalized_city_name&facets%5B%5D=location&facets%5B%5D=business_category&facets%5B%5D=category&facets%5B%5D=schedule_type_id&facets%5B%5D=employee_class&facets%5B%5D=normalized_location&facets%5B%5D=job_function_id&facets%5B%5D=is_manager&facets%5B%5D=is_intern&offset=0&result_limit=100&sort=relevant&latitude=&longitude=&loc_group_id=&loc_query=&base_query=&city=&country=&region=&county=&query_options=&

import requests


# Set up the URL
url = "https://www.amazon.jobs/en/search.json?radius=24km&facets%5B%5D=normalized_country_code&facets%5B%5D=normalized_state_name&facets%5B%5D=normalized_city_name&facets%5B%5D=location&facets%5B%5D=business_category&facets%5B%5D=category&facets%5B%5D=schedule_type_id&facets%5B%5D=employee_class&facets%5B%5D=normalized_location&facets%5B%5D=job_function_id&facets%5B%5D=is_manager&facets%5B%5D=is_intern&offset=0&result_limit=100&sort=relevant&latitude=&longitude=&loc_group_id=&loc_query=&base_query=&city=&country=&region=&county=&query_options=&"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the JSON content
    data = response.json()
    
    # Find all job titles
    jobs_list = data['jobs']

    # Print the job titles
    for job in jobs_list:
        print(job['title'].strip())
else:
    print("Failed to retrieve the page. Status code:", response.status_code)
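
The URL above carries offset and result_limit parameters, which suggests the results can be paged. Below is a minimal sketch of that idea, not a confirmed description of the API: it assumes the endpoint accepts a reduced parameter set without the facets entries, and that raising offset by result_limit returns the next page. The helper names build_params, extract_titles, and fetch_titles are hypothetical and exist only for this illustration.

```python
BASE_URL = "https://www.amazon.jobs/en/search.json"


def build_params(offset=0, limit=100):
    """Return query parameters for one page of results.

    Assumption: the endpoint works without the facets[] parameters
    that appear in the full URL above.
    """
    return {
        "offset": offset,
        "result_limit": limit,
        "sort": "relevant",
        "base_query": "",
        "loc_query": "",
    }


def extract_titles(data):
    """Pull the stripped job titles out of a decoded JSON response."""
    return [job["title"].strip() for job in data.get("jobs", [])]


def fetch_titles(max_pages=3, limit=100):
    """Fetch up to max_pages pages, stopping at the first empty page.

    Assumption: offset counts individual jobs, so page n starts at n * limit.
    """
    # Imported here so the two helpers above work without requests installed.
    import requests

    titles = []
    for page in range(max_pages):
        response = requests.get(
            BASE_URL,
            params=build_params(page * limit, limit),
            timeout=10,
        )
        # Raise an exception on 4xx/5xx instead of checking status_code by hand.
        response.raise_for_status()
        page_titles = extract_titles(response.json())
        if not page_titles:
            break
        titles.extend(page_titles)
    return titles
```

Calling fetch_titles() and printing each returned title should give the same kind of output as the script above, spread over several requests. Passing the parameters as a dict lets requests handle the URL encoding, which avoids hand-editing the long query string.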