How can I find a tag by its content? This is how I locate the elements I need, but the structure differs on some pages, so it doesn't always work.
yield {
...
'Education': response.css('.provider-item:nth-child(3) .h2-style+ span::text').get(),
'Training': response.css('.provider-item:nth-child(4) .h2-style+ span::text').get(),
...
}
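Take a look at this example, which finds the label span by its text and then reads the span that follows it: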
In [4]: i = response.xpath('.//span[contains(text(),"Education")]')
In [5]: i
Out[5]: [<Selector xpath='.//span[contains(text(),"Education")]' data='<span class="listing-h2 h2-style">Edu...'>]
In [6]: i.xpath('following-sibling::span[1]/text()').extract()
Out[6]:
['A.B. in Economics with a minor in Asian Studies, ',
'Occidental College',
'Masters in Chinese Medicine, Tai Hsuan Foundation']
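If you want to extract all the data points from the div.provider-item tags at once, you can try this (assuming the key is in the span.listing-h2.h2-style tag and the value is in a span tag with an itemprop attribute):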
data = {}
for item in response.css("div.provider-item"):
    key = item.css("span.listing-h2.h2-style::text").extract_first()
    value = item.css("span[itemprop]::text").extract()
    # value = item.css("span::text").extract()[1:]
    data[key] = value
If every div.provider-item tag contains exactly 2 span tags, you can try the following:
data = {}
for item in response.css("div.provider-item"):
    key, value = item.css("span::text").extract()
    data[key] = value
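Note that the two-name unpacking raises ValueError whenever an item yields anything other than exactly two text nodes, which can happen when a value is split into several fragments, as in the Education output above. A more tolerant variant might look like the sketch below. `pair_from_texts` is a hypothetical helper, and joining the fragments with a space is an assumption about the desired output.

```python
def pair_from_texts(texts):
    """Split a list of extracted text nodes into (key, value).

    The first non-empty node is treated as the key; the rest are joined
    into one value string. Returns None if there is no usable text.
    """
    cleaned = [text.strip() for text in texts if text.strip()]
    if not cleaned:
        return None
    key, *values = cleaned
    return key, " ".join(values)


# Example with fragments like the ones in the shell output above.
texts = ["Education", "A.B. in Economics with a minor in Asian Studies, ", "Occidental College"]
print(pair_from_texts(texts))
```

In the loop this would replace the unpacking line: call `pair_from_texts(item.css("span::text").extract())` and skip the item when the result is `None`.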