Web scraping a website that requires login [duplicate]

Question (votes: 1, answers: 1)

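First of all, I am not a Python expert. I am learning Python by scraping data from a particular gaming website. The data I want is only visible after logging in; unless you log in to the site, you cannot see it. (I have attached a screenshot of the page you see on that site after logging in.) I tried running the following code: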

import requests
from bs4 import BeautifulSoup

page = requests.get('<website url>')
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)

With this, the result I get looks as if I were not logged in to the website. Can someone guide me on what I need to do?


python-3.x web-scraping beautifulsoup
1 Answer
Votes: 1

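You can log in with requests.session() and then make the next request through the same session, which keeps the cookies set at login.

For example: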

import requests
from bs4 import BeautifulSoup

data = {'lEmail': '<YOUR EMAIL HERE>',
        'lPass': '<YOUR PASSWORD HERE>',
        'fbSig': 'web'}

url = 'https://www.airline4.net/research_main.php?mode=search&rwy=1000&dist=25000&depId=3982&arr=0&arrId=0&fbSig=false'
login_url = 'https://www.airline4.net/weblogin/login.php'

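# a session keeps the cookies that the login response sets
with requests.Session() as s:
    # send the login form; fail early on an HTTP error status
    s.post(login_url, data=data).raise_for_status()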

    # now you are logged in, just print some information:
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    print(soup.get_text(strip=True, separator='\n'))
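
A failed login often still returns an ordinary page, so it can help to confirm that the login worked before parsing anything. The following is only a sketch under assumptions: `fetch_after_login` and its `marker` parameter are hypothetical names, and the idea that some piece of text appears only for logged-in users has to be checked against the actual site. The function accepts any session-like object with `post` and `get` methods, such as a `requests.Session()`.

```python
def fetch_after_login(session, login_url, credentials, page_url, marker):
    """Log in through `session`, fetch `page_url`, and check for `marker`.

    Hypothetical helper: `marker` is assumed to be text that the site
    shows only to logged-in users.
    """
    # Post the login form; the session keeps any cookies the server sets.
    login_response = session.post(login_url, data=credentials)
    login_response.raise_for_status()

    # Reuse the same session so those cookies are sent with this request.
    page = session.get(page_url)
    page.raise_for_status()

    # If the marker is missing, the page is probably the logged-out version.
    if marker not in page.text:
        raise RuntimeError("login marker not found; the login probably failed")
    return page.text
```

With the names from the example, a call could look like `fetch_after_login(requests.Session(), login_url, data, url, 'Market:')`, assuming that "Market:" does not appear on the logged-out page. The returned HTML can then be passed to BeautifulSoup as before.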

The print call in the example outputs:

Distance
Y class
J class
F class
Rwy
OPIS
-
SCIP
Pakistan, Islamabad
-
Chile, Isla De Pascua
19,273 km
10,827ft rwy
Market:
55%
Y class
473
J class
221
F class
129
OPIS
-
NTGJ
Pakistan, Islamabad
-
French Polynesia, Totegegie
17,075 km
6,562ft rwy
Market:
67%
Y class
286
J class
161
F class
21
OPIS
-

... and so on.
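
The flat text above is hard to work with directly, so here is a rough sketch of how it could be grouped into one record per route. It assumes that every route follows exactly the line layout shown in the output (eight lines before each "Market:" line and seven after it), which may not hold for every page. `parse_routes` and the dictionary keys are made-up names for illustration, and `SAMPLE` is simply the output copied from above.

```python
SAMPLE = """\
Distance
Y class
J class
F class
Rwy
OPIS
-
SCIP
Pakistan, Islamabad
-
Chile, Isla De Pascua
19,273 km
10,827ft rwy
Market:
55%
Y class
473
J class
221
F class
129
OPIS
-
NTGJ
Pakistan, Islamabad
-
French Polynesia, Totegegie
17,075 km
6,562ft rwy
Market:
67%
Y class
286
J class
161
F class
21
OPIS
-
"""


def to_int(text):
    """Turn a string such as '19,273' into the integer 19273."""
    return int(text.replace(",", "").strip())


def parse_routes(text):
    """Group the flat text into one dict per route, anchored on 'Market:'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    routes = []
    for i, line in enumerate(lines):
        # Skip everything except the anchor line, and skip incomplete records.
        if line != "Market:" or i < 8 or i + 7 >= len(lines):
            continue
        routes.append({
            "from_code": lines[i - 8],
            "to_code": lines[i - 6],
            "from_name": lines[i - 5],
            "to_name": lines[i - 3],
            "distance_km": to_int(lines[i - 2].replace("km", "")),
            "runway_ft": to_int(lines[i - 1].split("ft")[0]),
            "market_pct": to_int(lines[i + 1].rstrip("%")),
            # Map each class label to the number printed on the next line.
            "classes": {
                lines[i + 2]: to_int(lines[i + 3]),
                lines[i + 4]: to_int(lines[i + 5]),
                lines[i + 6]: to_int(lines[i + 7]),
            },
        })
    return routes


for route in parse_routes(SAMPLE):
    print(route["from_code"], "->", route["to_code"], route["distance_km"], "km")
```

Parsing the text dump like this is fragile. If the page markup is stable, selecting the elements with BeautifulSoup (for example with `soup.select`) is usually more robust, but that needs the real HTML structure, which is not shown here.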