Scrapy InitSpider cannot click the correct login button

I am trying to log in to a Tor forum using Scrapy's InitSpider, but I have run into the following problem.

Here is my code that handles the login:

from scrapy import FormRequest, Request

# Methods of the InitSpider subclass (class definition omitted).

def init_request(self):
  # Called before crawling starts.
  return Request(url=self.login_page, callback=self.login)

def login(self, response):
  # Generates a login request.
  return FormRequest.from_response(
    response,
    formdata={
      'user': 'username',
      'passwrd': 'password',
      'cookielength': '9999'
    },
    clickdata={
      'type': 'submit',
      'class': 'button_submit',
      'value': 'Login'
    },
    callback=self.check_login_response
  )

def check_login_response(self, response):
  # Check the response returned by the login request to see whether we are logged in.
  # response.text is the decoded body (str); response.body is bytes.
  print(response.text)

  if "Hello" in response.text:
    self.log("Successfully logged in. Let's start crawling!")
    # Now the crawling can begin.
    self.initialized()

I get the following error:

raise ValueError('No clickable element matching clickdata: %r' % (clickdata,))
ValueError: No clickable element matching clickdata: {'type': 'submit', 'class': 'button_submit', 'value': 'Login'}

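As soon as I remove the 'value': 'Login' attribute from the clickdata dict, the spider picks the first clickable element on the page instead of the login button, and the login fails.

Here is the relevant HTML for the login section of the page: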

<div class="roundframe"><br class="clear">
  <dl>
    <dt>Username:</dt>
    <dd>
      <input type="text" name="user" size="20" value="" class="input_text">
    </dd>
    <dt>Password:</dt>
    <dd>
      <input type="password" name="passwrd" value="" size="20" class="input_password">
    </dd>
  </dl>
  <dl>
    <dt>Minutes to stay logged in:</dt>
    <dd>
      <input type="text" name="cookielength" size="4" maxlength="4" value="120" class="input_text">
    </dd>
    <dt>Always stay logged in:</dt>
    <dd>
      <input type="checkbox" name="cookieneverexp" class="input_check" onclick="this.form.cookielength.disabled = this.checked;">
    </dd>
  </dl>
  <p>
    <input type="submit" value="Login" class="button_submit">
  </p>
  <p class="smalltext">
    <a href="http://thehub7xbw4dc5r2.onion/index.php?action=reminder">
      Forgot your password?
    </a>
  </p>
  <input type="hidden" name="d49bd52b3" value="f29d515eca1c0199840161f01d940973">
  <input type="hidden" name="hash_passwrd" value="">
</div>

Can anyone tell me how to fix this? Thanks!

python web-scraping scrapy
1 Answer

I don't know why clickdata isn't working (it is poorly documented and may only work with the name attribute), but this code works for me:

import scrapy

def login(self, response):
    # Generates a login request.
    return scrapy.FormRequest.from_response(
        response,
        # Select the login form by its id instead of relying on clickdata.
        formid='frmLogin',
        formdata={
            'user': 'username',
            'passwrd': 'password',
            'cookielength': '9999'
        },
        # clickdata={
        #     'type': 'submit',
        #     'class': 'button_submit',
        #     'value': 'Login'
        # },
        callback=self.check_login_response
    )