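I'm scraping data from two sites. The first scrape (woot) works, except that the price comes back duplicated. The second scrape (meh) gets the right data but returns it with an awkward spacing problem that I'm not sure how to fix. I'd appreciate a review of both issues.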
require "nokogiri"
require "open-uri"

class DailyDealz::Deal
  attr_accessor :name, :price, :availability, :url

  def self.today
    # Scrape woot and meh and then return deals based on that data
    self.scrape_deals
  end

  def self.scrape_deals
    deals = []
    deals << self.scrape_woot
    deals << self.scrape_meh
    # deals << self.scrape_steepandcheap
    deals
  end

  def self.scrape_woot
    # URI.open comes from open-uri; a bare open(url) no longer works on Ruby 3+
    doc = Nokogiri::HTML(URI.open("https://www.woot.com/"))
    deal = self.new
    deal.name = doc.search("h2.main-title").text.strip
    deal.price = doc.search("#todays-deal span.price").text.strip
    deal.url = doc.search("a.wantone").first.attr("href").strip
    deal.availability = true
    deal.website
    deal
  end

  def self.scrape_meh
    doc = Nokogiri::HTML(URI.open("https://meh.com/"))
    deal = self.new
    deal.name = doc.search("section.features h2").text.strip
    deal.price = doc.search("#button.buy-button").text.gsub("Buy it.", "").strip
    deal.url = "https://meh.com/"
    deal.availability = true
    deal
  end
end
The output is:
// ♥ ./bin/daily-dealz
Todays Daily Deals
1. Apple Watch Blowout! - $129.99–$279.99$129.99$279.99 - true -
2. 12-For-Tuesday: Fun Putty 1.8oz Tins
- 12 for $19 - - true -
Enter the number of the deal you'd like more info on or type list to see deals again or exit to exit
program.
How do I remove the duplicated pricing from woot? How do I remove the awkward spacing from meh?
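There are two problems:
The selector #todays-deal span.price matches three elements, so the text of all three is concatenated. Make it more specific by changing it to #todays-deal .price-holder > span.price, which selects only the span.price that sits directly under the price-holder div.
The text contains newlines. Collapse them with gsub(/\s+/, ' ') and follow that with strip.
See this example.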
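Independently of the example linked above, the whitespace fix can be tried on its own without scraping anything. This is a minimal sketch: the `raw` string is a made-up stand-in for the text Nokogiri returns, since the real page text is not shown in the question.

```ruby
# Hypothetical stand-in for scraped text that contains a newline and
# indentation in the middle (the real meh.com text is not shown here).
raw = "12-For-Tuesday: Fun Putty 1.8oz Tins\n        12 for $19\n"

# Replace every run of whitespace (spaces, tabs, newlines) with a single
# space, then trim whatever is left at the ends.
clean = raw.gsub(/\s+/, ' ').strip

puts clean
```

Calling strip alone only trims the ends of the string, which is why the newline in the middle survives in the original output.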
One more note: #button.buy-button looks for an element whose id is "button", not for an element of type button. Change it to button.buy-button.