How to resolve a 403 Forbidden error with Ruby Mechanize

Question

403 => Net::HTTPForbidden for https://www.state.gov/countries-areas-archive/tunisia/page/2/ -- unhandled response (Mechanize::ResponseCodeError)

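This is what I get in the console. I am trying to scrape a 9-page archive of U.S. State Department statements on Tunisia. What is going wrong?
The Ruby Mechanize code looks correct to me: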

require 'mechanize'

agent = Mechanize.new

9.times do |i|
  page = agent.get("https://www.state.gov/countries-areas-archive/tunisia/page/#{i + 1}/")
  page.search('a.collection-result_link').each do |link|
    agent.click(link)
    url = agent.page.search('link[rel = "canonical"]').attr('href').text
    wrapped_url = "<a href='#{url}'>الرابط</a>"
    # Classes on the same element are chained with dots; separated by spaces
    # they would be parsed as descendant element names and match nothing.
    title = agent.page.search('h1.featured-content__headline.report-header__headline.stars-above').text
    statements = [wrapped_url, title]
    puts statements
  end
end

ruby web-scraping mechanize http-status-code-403
1 Answer

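Require the logger library and set user_agent_alias, so that requests carry a browser-like User-Agent header instead of Mechanize's default one. The Mechanize RubyDoc demonstrates this as follows: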

require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari'   