Regular expression for finding the href in an <a> tag

Question  Votes: -1  Answers: 2

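I need to find the distance between two websites using Ruby's open-uri. Using

def check(url)
  site = open(url.base_url)
  link = %r{^<([a])([^"]+)*([^>]+)*(?:>(.*)<\/\1>|\s+\/>)$}
  site.each_line {|line| puts $&,$1,$2,$3,$4 if (line=~link)}
  p url.links
end

to find the links does not work properly. Any ideas?

ruby regex open-uri
2 Answers
3
votes

If you want to find the href parameters of a tags, use the right tool, which usually isn't a regular expression. You are more likely to want an HTML/XML parser.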

Nokogiri is the parser of choice for Ruby.
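
To illustrate why a regular expression tends to be fragile for this job, here is a minimal sketch using only the Ruby standard library. The pattern `naive_href` and the sample strings are hypothetical and are not taken from the question; they only show how small, perfectly valid variations in markup can defeat a pattern that looks reasonable at first glance.

```ruby
# Hypothetical pattern: expects exactly <a href="..."> with double quotes,
# lowercase names, and href as the first and only attribute.
naive_href = /<a href="([^"]*)">/

# Four ways of writing the same link, all valid HTML.
samples = [
  '<a href="/about/">About</a>',             # the form the pattern expects
  "<a href='/about/'>About</a>",             # single-quoted attribute value
  '<a class="nav" href="/about/">About</a>', # another attribute comes first
  '<A HREF="/about/">About</A>'              # uppercase tag and attribute
]

results = samples.map { |html| naive_href.match?(html) }

samples.zip(results).each do |html, matched|
  puts "#{matched}\t#{html}"
end
# Only the first sample matches; the other three are missed.
```

An HTML parser handles all four forms the same way, which is the main reason to prefer one over a hand-written pattern.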

1
vote

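I see a few problems with this regular expression:

  • Whitespace is not required before the trailing slash in an empty tag, yet your regex requires it
  • Your regex is very verbose and redundant

Try the following instead; it extracts the URLs from <a> tags: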
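
The first point can be checked in isolation. The sketch below copies only the `\s+\/>` fragment from the question's pattern and compares it with a relaxed variant; the names `strict_close` and `relaxed_close` and the two sample tags are assumptions made for illustration, not part of the original answer.

```ruby
# Fragment from the question's pattern: at least one whitespace character
# is required before the closing "/>".
strict_close = %r{\s+/>\z}

# Illustrative alternative: whitespace before "/>" is optional.
relaxed_close = %r{\s*/>\z}

with_space    = '<a href="/about/" />'
without_space = '<a href="/about/"/>'

puts strict_close.match?(with_space)      # true
puts strict_close.match?(without_space)   # false: no whitespace before "/>"
puts relaxed_close.match?(with_space)     # true
puts relaxed_close.match?(without_space)  # true
```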

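require 'nokogiri'
require 'open-uri'

# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open from open-uri.
doc = Nokogiri::HTML(URI.open('http://www.example.org/index.html'))

# Collect the href attribute of every <a> element in the document.
hrefs = doc.search('a').map { |a| a['href'] }

pp hrefs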
# => [
# =>  "/",
# =>  "/domains/",
# =>  "/numbers/",
# =>  "/protocols/",
# =>  "/about/",
# =>  "/go/rfc2606",
# =>  "/about/",
# =>  "/about/presentations/",
# =>  "/about/performance/",
# =>  "/reports/",
# =>  "/domains/",
# =>  "/domains/root/",
# =>  "/domains/int/",
# =>  "/domains/arpa/",
# =>  "/domains/idn-tables/",
# =>  "/protocols/",
# =>  "/numbers/",
# =>  "/abuse/",
# =>  "http://www.icann.org/",
# =>  "mailto:[email protected]?subject=General%20website%20feedback"
# => ]
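
Most of the hrefs in the output above are relative to the page they came from. If absolute URLs are needed, for example to follow links from one site to another, one possible approach is to resolve each href against the page URL with `URI.join` from the standard library. This is a minimal sketch: the variable names are assumptions, and the three hrefs are copied from the output above.

```ruby
require 'uri'

# URL of the page the links were extracted from.
base = 'http://www.example.org/index.html'

# A few hrefs taken from the output above: two relative, one absolute.
hrefs = ['/', '/domains/', 'http://www.icann.org/']

# URI.join resolves a relative reference against the base URL and
# leaves an already absolute URL unchanged.
absolute = hrefs.map { |href| URI.join(base, href).to_s }

puts absolute
# http://www.example.org/
# http://www.example.org/domains/
# http://www.icann.org/
```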