Web scraping with login

Question (votes: 0, answers: 2)

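I am trying to scrape a page on www.roblox.com that requires login. I have already done this using the .ROBLOSECURITY cookie, but that cookie changes every few days. I want to log in through the login form using Python instead. The form and what I have so far are below. I do not want to use any additional libraries such as mechanize or requests.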

The form:

<form action="/newlogin" id="loginForm" method="post" novalidate="novalidate" _lpchecked="1">
    <div id="loginarea" class="divider-bottom" data-is-captcha-on="False">
        <div id="leftArea">
            <div id="loginPanel">
                <table id="logintable">
                    <tbody>
                        <tr id="username">
                            <td><label class="form-label" for="Username">Username:</label></td>
                            <td><input class="text-box text-box-medium valid" data-val="true" data-val-required="The Username field is required." id="Username" name="Username" type="text" value="" autocomplete="off" aria-required="true" aria-invalid="false" style="cursor: auto; background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=);"></td>
                        </tr>
                        <tr id="password">
                            <td><label class="form-label" for="Password">Password:</label></td>
                            <td><input class="text-box text-box-medium" data-val="true" data-val-required="The Password field is required." id="Password" name="Password" type="password" autocomplete="off" style="cursor: auto; background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=);"></td>
                        </tr>
                    </tbody>
                </table>
                <div>
                </div>
                <div>
                    <div id="forgotPasswordPanel">
                        <a class="text-link" href="/Login/ResetPasswordRequest.aspx" target="_blank">Forgot your password?</a>
                    </div>
                    <div id="signInButtonPanel" data-use-apiproxy-signin="False" data-sign-on-api-path="https://api.roblox.com/login/v1">
                        <a roblox-js-onclick="" class="btn-medium btn-neutral">Sign In</a>
                        <a roblox-js-oncancel="" class="btn-medium btn-negative">Cancel</a>
                    </div>
                    <div class="clearFloats">
                    </div>
                </div>
                <span id="fb-root">
                    <div id="SplashPageConnect" class="fbSplashPageConnect">
                        <a class="facebook-login" href="/Facebook/SignIn?returnTo=/home" ref="form-facebook">
                            <span class="left"></span>
                            <span class="middle">Login with Facebook<span>Login with Facebook</span></span>
                            <span class="right"></span>
                        </a>
                    </div>
                </span>
            </div>
        </div>
        <div id="rightArea" class="divider-left">
            <div id="signUpPanel" class="FrontPageLoginBox">
                <p class="text">Not a member?</p>
                <h2>Sign Up to Build &amp; Make Friends</h2>
                <a roblox-js-onsignup="" class="btn-medium btn-primary">Sign Up</a>
            </div>
        </div>
    </div>
    <input id="ReturnUrl" name="ReturnUrl" type="hidden" value="">
</form>

What I have so far:

import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

opener.addheaders = [('User-agent', 'Mozilla/5.0')]

urllib2.install_opener(opener)

authentication_url = 'http://www.roblox.com/newlogin'

payload = {
    'ReturnUrl' : 'http://www.roblox.com/home',
    'Username' : 'usernamehere',
    'Password' : 'passwordhere'
    }

data = urllib.urlencode(payload)

req = urllib2.Request(authentication_url, data)

resp = urllib2.urlopen(req)
contents = resp.read()
print contents

What is wrong with my code? When I print contents, all I see is the login page.

PS: The login page is HTTPS.

python forms cookies authentication web-scraping
2 Answers

1 vote

OP's solution.

I finished the script myself; the code is below:

import cookielib
import urllib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

opener.addheaders = [('User-agent', 'Mozilla/5.0')]

urllib2.install_opener(opener)

authentication_url = 'https://www.roblox.com/newlogin'

payload = {
    'username' : 'YourUsernameHere',
    'password' : 'YourPasswordHere',
    '' : 'Log In',
    }

data = urllib.urlencode(payload)

req = urllib2.Request(authentication_url, data)

resp = urllib2.urlopen(req)
PageYouWantToOpen = urllib2.urlopen("http://www.roblox.com/develop").read()
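
The script above is Python 2 (cookielib, urllib2). As a rough sketch, the same cookie-jar approach might look like this in Python 3 using only the standard library. The helper names make_opener and build_login_request are hypothetical, the form field names are copied from the payload above, and whether the site still accepts this login request is not confirmed here.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return (opener, cookie_jar); the jar keeps cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    return opener, jar


def build_login_request(url, username, password):
    """Build the POST request carrying the same form fields as the payload above."""
    payload = {"username": username, "password": password, "": "Log In"}
    # In Python 3 the POST body must be bytes, not str.
    data = urllib.parse.urlencode(payload).encode("utf-8")
    return urllib.request.Request(url, data=data)


# Usage (performs real network requests, so it is left commented out):
# opener, jar = make_opener()
# opener.open(build_login_request("https://www.roblox.com/newlogin",
#                                 "YourUsernameHere", "YourPasswordHere"))
# page = opener.open("http://www.roblox.com/develop").read()
```

Returning the opener instead of calling install_opener keeps the session cookies tied to one object, so later requests made through the same opener reuse whatever cookies the login response set.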

1 vote

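A few weeks ago I did some web scraping and automated tab opening using only urllib.request, and wrote this class along the way. It may help you, or at least put you on the right path.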

import urllib.request
class Log_in:
    def __init__(self, loginURL, username, password):
        self.loginURL = loginURL
        self.username = username
        self.password = password
    def log_in_to_site(self):
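        # Handles HTTP Basic auth challenges only, not HTML login forms.
        # realm=None matches any realm only with the default-realm manager.
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(realm=None,
                                  uri=self.loginURL,
                                  user=self.username,
                                  passwd=self.password)
        auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)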
        opener = urllib.request.build_opener(auth_handler)
        urllib.request.install_opener(opener)
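
Note that HTTPBasicAuthHandler only answers HTTP Basic authentication challenges (a 401 response with a WWW-Authenticate header); it does not submit an HTML form like the one in the question. As a minimal illustration of what Basic authentication amounts to, the sketch below builds the Authorization header value by hand. The function name basic_auth_header is hypothetical and is not part of urllib.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    # Basic auth is just "user:password", base64-encoded (not encrypted).
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


print(basic_auth_header("Aladdin", "open sesame"))
# prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

If the target page answers with a login form rather than a 401 challenge, this header is never requested, and a cookie-based form post is the approach to look at instead.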