How to access text nodes when scraping a website with Nokogiri

Question

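I'm scraping data from two sites. The first site scrapes everything else correctly but repeats the price twice. The second site returns the correct data, but with a whitespace problem that I'm not sure how to fix.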

```
class DailyDealz::Deal
  attr_accessor :name, :price, :availability, :url

  def self.today
    # Scrape woot and meh and then return deals based on that data
    self.scrape_deals
  end

  def self.scrape_deals
    deals = []

    deals << self.scrape_woot
    deals << self.scrape_meh
    # deals << self.scrape_steepandcheap

    deals
  end

  def self.scrape_woot
    doc = Nokogiri::HTML(open("https://www.woot.com/"))

    deal = self.new
    deal.name = doc.search("h2.main-title").text.strip
    deal.price = doc.search("#todays-deal span.price").text.strip
    deal.url = doc.search("a.wantone").first.attr("href").strip
    deal.availability = true
    deal.website

    deal
  end

  def self.scrape_meh
    doc = Nokogiri::HTML(open("https://meh.com/"))

    deal = self.new
    deal.name = doc.search("section.features h2").text.strip
    deal.price = doc.search("#button.buy-button").text.gsub("Buy it.", "").strip
    deal.url = "https://meh.com/"
    deal.availability = true

    deal
  end
end
```

This returns:

```
// ♥  ./bin/daily-dealz
Todays Daily Deals
1. Apple Watch Blowout! - $129.99–$279.99$129.99$279.99 - true - 
2. 12-For-Tuesday: Fun Putty 1.8oz Tins

                                - 12 for $19 -  - true - 
Enter the number of the deal you'd like more info on or type list to see deals again or exit to exit program.
```

How can I remove the duplicated price and the unwanted whitespace?

Answer 1

There are two problems:

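#todays-deal span.price: three elements match this selector. Make it more specific by changing it to:
```
#todays-deal .price-holder > span.price
```
This selects the price-holder div and only the span.price directly under it.

The text contains newlines. Apply gsub(/\s+/, ' ') in addition to strip so that each run of whitespace collapses into a single space.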

See this example.

Another note: #button.buy-button looks for an element whose ID is "button", not an element of type button. Change it to button.buy-button.

Answer 2

Don't use Kernel's open, which open-uri overrides; opening URLs this way is deprecated.
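A minimal sketch of the replacement, assuming the intended alternative is URI.open from open-uri (opening URLs through Kernel#open was deprecated in Ruby 2.7 and removed in Ruby 3.0). The helper name fetch_html and its opener parameter are hypothetical, added only so the helper can be exercised without network access.

```ruby
require "open-uri"

# Hypothetical helper: returns the body of the page at url as a String.
# opener defaults to URI.open; a stand-in can be passed for offline use.
def fetch_html(url, opener: URI.method(:open))
  # The block form ensures the underlying IO is closed after reading.
  opener.call(url, &:read)
end
```

In the scraper, this would replace the open(...) call, for example Nokogiri::HTML(fetch_html("https://www.woot.com/")).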
