Selecting a group of elements and text with CSS selectors

Question

I have an HTML page like this:

<div>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
</div>

I need to select one group like this:

<a href='link'>
<u class>name</u>
</a>
text
<br>

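From each group I need three values: the link, the name and the text. Is there a way to select groups like this and extract those values from each one in Scrapy, using CSS selectors, XPath or anything else?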

Answer 1

Scrapy provides a mechanism to yield multiple values from an HTML page as Items, which are Python objects that define key-value pairs.

You can extract each value separately, but you can also yield them together as key-value pairs.

To extract the value of an element's attribute, use ::attr(), for example a::attr(href).
To extract an element's inner text, use ::text.

For example, you can define a parse function in Scrapy like this:

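def parse(self, response):
    # The container path is specific to the page being scraped; adjust it to yours.
    # href attribute of every <a> inside the container
    for_link = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8) a::attr(href)').getall()

    # Text inside every <u> nested in an <a>
    for_name = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8) a u::text').getall()

    # Text nodes that are direct children of the container
    for_text = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8)::text').getall()

    # Yield all values together as a single item
    yield {"link": for_link, "name": for_name, "text": for_text}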

Then open the items.py file and define the item fields:

# Define here the models for your scraped items
import scrapy


# Define the fields for your Scrapy item in this class.
# Replace "YourSpider" with a name that fits your project.
class YourSpiderItem(scrapy.Item):
    # href of the <a> element
    link = scrapy.Field()

    # Text inside the <u> element
    name = scrapy.Field()

    # Text node that follows the </a>
    text = scrapy.Field()

For more details, read this tutorial.

Answer 2

If you can wrap the text in an element like this:

<a href='link'>
<u class>name</u>
</a>
<span>text</span>
<br>

then you can select everything in CSS like this:

a, a + span {}

Or you can style the two separately:

a {}

a + span {}

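The + is the adjacent sibling combinator, meaning "immediately following": a + span matches a span that comes directly after an a element.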
