XPath problem with nested divs


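I'm new to Python and Scrapy. I'm testing XPath expressions against a response in the Scrapy shell, and as a test I was able to print the h1 header with the code below. Now I'm trying to find XPaths that extract (1) the job title and (2) the job URL.

Here is my console code: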

    r = scrapy.Request(url='https://www.northropgrumman.com/jobs?remote=yes-may-consider-full-time-teleworking-for-this-position&country=united-states-of-america&_job_category=global-supply-chain,business-management,program-management')
    
    fetch(r)

    # this works and pulls the "Job Search" header at the top of the page
    response.xpath('//h1/text()').getall()
    
    # broken, tried many combos of xpaths to get job title and url
    response.xpath("/html/body/div[1]/main/div[2]/div/div/div[3]/div[2]/div/div/div/div/div[1]/div[1]/div/div/div/div/div/div/div[1]/a/text()").getall() 

What are the XPaths for the job title and job URL of the positions listed on this page?

https://www.northropgrumman.com/jobs?remote=yes-may-consider-full-time-teleworking-for-this-position&country=united-states-of-america&_job_category=global-supply-chain,business-management,program-management

python xpath scrapy
1 Answer

An XPath for the job URL could be:

//div[@class="col-sm-9"]/a/@href

For the job title:

//div[@class="col-sm-9"]/a/h2/text()

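Both in a single expression (the `|` union returns the matches in document order, so URLs and titles alternate):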

//div[@class="col-sm-9"]/a/@href|//div[@class="col-sm-9"]/a/h2/text()

Result:

href="/jobs/Business-Management/Contract/United-States-of-America/Virginia/Fairfax/R10151186/principal-sr-principal-contract-administrator"
#text "Principal / Sr Principal Contract Administrator"
href="/jobs/Business-Management/Contract/United-States-of-America/California/Sunnyvale/R10153611/principal-senior-principal-contract-administrator-hybrid-or-full-time-remote-schedule"
#text "Principal / Senior Principal Contract Administrator (Hybrid or Full Time Remote Schedule)"
href="/jobs/Business-Management/Multi-Function/United-States-of-America/Maryland/Linthicum/R10150106/principal-pricing-analyst"
#text "Principal Pricing Analyst"
...