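def check(url)
  site = open(url.base_url)
  link = %r{^<([a])([^"]+)*([^>]+)*(?:>(.*)<\/\1>|\s+\/>)$}
  site.each_line { |line| puts $&, $1, $2, $3, $4 if line =~ link }
  p url.links
end

If you want to find the href parameters of <a> tags, use the right tool, which usually isn't a regular expression. You are more likely to want an HTML/XML parser.

Nokogiri is the parser of choice for Ruby.

I see several problems with this regex. Try the following instead; it extracts your URLs from the <a> tags: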
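require 'nokogiri'
require 'open-uri'

# Fetch and parse the page. On Ruby 3.0+ use URI.open, because Kernel#open no longer accepts URLs.
doc = Nokogiri.HTML(URI.open('http://www.example.org/index.html'))

# Collect the href attribute of every <a> tag and pretty-print the list.
pp doc.search('a').map { |a| a['href'] }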
# => [
# => "/",
# => "/domains/",
# => "/numbers/",
# => "/protocols/",
# => "/about/",
# => "/go/rfc2606",
# => "/about/",
# => "/about/presentations/",
# => "/about/performance/",
# => "/reports/",
# => "/domains/",
# => "/domains/root/",
# => "/domains/int/",
# => "/domains/arpa/",
# => "/domains/idn-tables/",
# => "/protocols/",
# => "/numbers/",
# => "/abuse/",
# => "http://www.icann.org/",
# => "mailto:[email protected]?subject=General%20website%20feedback"
# => ]
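
Most of the hrefs in the output above are relative paths. If you need full URLs, one possible follow-up step is to resolve each href against the address of the page it came from. The sketch below uses only Ruby's standard uri library. The helper name absolutize and the sample href list are assumptions made up for illustration and are not part of the original code.

```ruby
require 'uri'

# Hypothetical helper (not from the original answer): resolve each href
# against the URL of the page it was found on. nil entries, which come
# from <a> tags without an href attribute, are dropped first.
def absolutize(base_url, hrefs)
  hrefs.compact.map { |href| URI.join(base_url, href).to_s }
end

base  = 'http://www.example.org/index.html'
hrefs = ['/', '/domains/', '/about/presentations/', 'http://www.icann.org/', nil]

puts absolutize(base, hrefs)
# http://www.example.org/
# http://www.example.org/domains/
# http://www.example.org/about/presentations/
# http://www.icann.org/
```

URI.join leaves absolute URLs unchanged and raises URI::InvalidURIError for strings it cannot parse, so real-world pages may need a rescue around the call.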