XPath works in the console but not in Scrapy

Question

I am learning web scraping, and I am trying to scrape this site: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights

When I run "scrapy crawl basket", the result is empty even though the XPath expression is correct.

So it only gives me Name, Team, Position, Height(inches), Weight(pounds) instead of the data.

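If you have an answer, please explain what the problem is and how you figured it out, and whether there is a way to spot it, because I don't want to fail on the same issue again. Thanks.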

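I have seen similar questions where people said the table does not actually contain the tbody, and I tried removing it, but the result is the same.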

It does not work in the shell either. Here is the code:

import scrapy


class BasketSpider(scrapy.Spider):
    name = "basket"
    allowed_domains = ["wiki.stat.ucla.edu"]
    start_urls = ["http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights"]

    def parse(self, response):
        for row in response.xpath('//table[@class="wikitablet"/body/tr]'):
            yield {
                'name': row.xpath('.//th[1]/text()').get(),
                'team': row.xpath('.//tr/th[2]/text()').get(),
                'position': row.xpath('.//tr/th[3]/text()').get(),
                'height': row.xpath('.//tr/th[4]/text()').get(),
                'weight': row.xpath('.//tr/th[5]/text()').get(),
            }

Answer 1

You selected the wrong class in your XPath.

It should be //table[@class="wikitable"]

You used: [@class="wikitablet"

Answer 2

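The tbody element that browsers inject does not exist in the actual response. You are also using th in the XPath expression for each field when you should be using td.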

You only need to remove that part from the initial XPath expression. For example:

import scrapy


class BasketSpider(scrapy.Spider):
    name = "basket"
    allowed_domains = ["wiki.stat.ucla.edu"]
    start_urls = ["http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights"]

    def parse(self, response):
        # No tbody step: in the raw response, tr sits directly under table.
        for row in response.xpath('//table[@class="wikitable"]/tr'):
            # Data cells are td; the header row only has th, so it yields None values.
            yield {
                'name': row.xpath('.//td[1]/text()').get(),
                'team': row.xpath('.//td[2]/text()').get(),
                'position': row.xpath('.//td[3]/text()').get(),
                'height': row.xpath('.//td[4]/text()').get(),
                'weight': row.xpath('.//td[5]/text()').get(),
            }
