Why do I get status code 403 when I try to scrape Glassdoor?


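I'm trying to scrape job listings from Glassdoor. A similar code structure works fine for LinkedIn, but here I'm running into problems. I'm getting a 403 status code, so I'm guessing I'm hitting some kind of block on Glassdoor's back end. Can anyone help me figure out why?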
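A likely explanation, offered as an assumption rather than anything Glassdoor documents: unless you override it, `requests` sends a `User-Agent` header of the form `python-requests/<version>`, and many sites reject requests that identify themselves as scripts. The minimal sketch below builds the same GET request as the question without sending it, using `requests.Session.prepare_request`, so you can see which headers would actually go out.

```python
import requests

URL = "https://www.glassdoor.com/Job/united-states-it-entry-level-jobs-SRCH_IL.0,13_IN1_KO14,28.html"

# A Session carries requests' default headers (User-Agent, Accept, ...).
session = requests.Session()

# prepare_request() merges the session defaults into the request the same
# way requests.get() does, but nothing is sent over the network.
prepared = session.prepare_request(requests.Request("GET", URL))

# Assumption: this default User-Agent is what triggers the 403.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

The printed `User-Agent` line should start with `python-requests/`; the exact version depends on your installation.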

    import requests
    from bs4 import BeautifulSoup

    # Define URL
    url = "https://www.glassdoor.com/Job/united-states-it-entry-level-jobs-SRCH_IL.0,13_IN1_KO14,28.html"

    # Retrieve HTML content
    response = requests.get(url)
    html = response.content

    # Parse HTML
    soup = BeautifulSoup(html, 'html.parser')

    # Pull job titles
    jobListing = soup.find_all('div', {'class': 'jobCard JobCard_jobCardContent__X81Ew'})
    for div in jobListing:
        # Using this to test if it finds the div
        print("Found Div")
        # for a in div.find_all('a', {'class': 'JobCard_jobTitle___7I6y'}):
        #     print(a.text.strip())

    # A for/else "else" branch runs whenever the loop ends without break,
    # so check for an empty result directly instead
    if not jobListing:
        print("Didn't find Div")

    if response.status_code == 200:
        print("Yes")
    else:
        print(response.status_code)

I expected to find the div and extract the data, but that isn't happening.
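Separately from the 403, two details can make the div check misleading. A Python `for ... else` runs its `else` branch whenever the loop finishes without `break`, so it cannot detect "nothing found"; testing the result list directly is safer. Also, passing a multi-class string such as `'jobCard JobCard_jobCardContent__X81Ew'` to `find_all` matches only an identical `class` attribute string (same classes, same order), while a CSS selector matches the classes in any order. The HTML below is a made-up stand-in, not real Glassdoor markup, and it is an assumption worth checking that hashed class names like `__X81Ew` stay stable between site builds.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes, but in a different order on Card B
html = """
<div class="jobCard JobCard_jobCardContent__X81Ew">Card A</div>
<div class="JobCard_jobCardContent__X81Ew jobCard">Card B</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the class attribute: only Card A has this exact value
exact = soup.find_all("div", {"class": "jobCard JobCard_jobCardContent__X81Ew"})

# CSS selector: any div carrying both classes, regardless of order
css = soup.select("div.jobCard.JobCard_jobCardContent__X81Ew")

print(len(exact), len(css))  # 1 2

# Check for "nothing found" explicitly instead of relying on for ... else
if not css:
    print("Didn't find Div")
else:
    for div in css:
        print("Found Div:", div.get_text(strip=True))
```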

python web-scraping beautifulsoup screen-scraping scrape
1 Answer

Try setting the `User-Agent` header when you send the request to the server:

    import requests
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent instead of requests' default one
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"
    }

    url = "https://www.glassdoor.com/Job/united-states-it-entry-level-jobs-SRCH_IL.0,13_IN1_KO14,28.html"

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")

    # Job title links have ids that begin with "job-title"
    for a in soup.select('a[id^="job-title"]'):
        print(a.text)
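To check that the header override takes effect without contacting the site, you can prepare the request and inspect it. This sketch sets the header once on a `requests.Session`, so every later request from that session reuses it; the Firefox string is simply the example value from the answer, and there is no guarantee Glassdoor will keep accepting it.

```python
import requests

URL = "https://www.glassdoor.com/Job/united-states-it-entry-level-jobs-SRCH_IL.0,13_IN1_KO14,28.html"
BROWSER_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"

session = requests.Session()
# Replace the default python-requests/<version> User-Agent for every request
session.headers.update({"User-Agent": BROWSER_UA})

# Build the request exactly as session.get(URL) would, without sending it
prepared = session.prepare_request(requests.Request("GET", URL))
print(prepared.headers["User-Agent"])

# To actually fetch the page (needs network access):
# response = session.get(URL, timeout=10)
# response.raise_for_status()  # raises requests.HTTPError on a 403
```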

This prints:

    After Hours NOC Analyst - Entry Level IT
    IT Security Governance Analyst Level I
    IT Audit Temp (Entry Level - 3 month temp paid position)
    Restaurant IT Analyst Level I
    Junior ServiceNow HRSD Developer
    IT Analyst I
    Junior IT Operations Engineer
    Network Engineers, Entry Level
    Entry-Level Unix Systems Administrator
    IT Support Specialist
    IT Engineer(Entry-Mid Level)
    Urgent IT Telecaller Executive
    Product Manager - Entry Level
    IT Support Generalist
    IT Analyst I
    Junior IT Administrator
    UWB Systems Engineer - Entry Level
    Entry level IT Specialist
    Entry Level IT Support Technician
    Helpdesk Engineer (Entry-Level)
    IT Project Coordinator Intern
    Entry-Level Cyber Security Analyst
    Onsite - Remote IT Support
    Level 1 IT Help Desk Analyst -GA
    Electronic Systems Engineer (Entry-Level)
    IT Support Level 1
    Systems Analyst
    IT Help Desk Technician
    IT Field Tech I
    Entry-Level ERP Analyst
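The selector `a[id^="job-title"]` used in the answer is a CSS attribute selector: `^=` matches elements whose `id` begins with `job-title`, which suits ids that carry a per-listing suffix. The sketch below runs it on made-up markup (ids such as `job-title-1001` are hypothetical, not copied from Glassdoor) and uses `get_text(strip=True)` to trim surrounding whitespace from each title.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the job list
html = """
<ul>
  <li><a id="job-title-1001" href="#">  IT Analyst I </a></li>
  <li><a id="job-title-1002" href="#">IT Support Specialist</a></li>
  <li><a id="employer-1001" href="#">Some Employer</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# ^= is the CSS "attribute value starts with" operator, so the
# employer link is skipped
titles = [a.get_text(strip=True) for a in soup.select('a[id^="job-title"]')]
print(titles)  # ['IT Analyst I', 'IT Support Specialist']
```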
© www.soinside.com 2019 - 2024. All rights reserved.