Getting specific items from poorly structured HTML with Nokogiri


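I am using Nokogiri to scrape this page for a list of events, but the page is more or less one big block of content. I can't seem to access the specific h3 items (with class="news") using the following code: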

def scrape_broadway_books
  base_url = "https://broadwaybookshophackney.com"
  slug = "/events/?event=archive"
  url = base_url + slug
  unparsed_page = HTTParty.get(url)
  parsed_page = Nokogiri::HTML(unparsed_page)
  events_list = parsed_page.at_css("div#content")
  # binding.pry
  events = Array.new
  events_list.each do |item|
    puts item.css("h3.news").text
  end
end

This gives me the error:

undefined method `css' for ["id", "content"]:Array (NoMethodError)

Why can't I iterate over the content div?

ruby web-scraping nokogiri
1 Answer

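at_css returns a single element (a Nokogiri::XML::Node), not a collection of nodes. Calling each on a single node iterates over that node's attributes as name/value pairs, which is why item is ["id", "content"] and has no css method. Use parsed_page.css("div#content") instead: css returns a NodeSet, so each yields nodes that respond to css.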
