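I have just started learning web scraping with Python 3, and I want to log in to this website: https://dienynas.tamo.lt/Prisijungimas/Login
The form data it requires is: UserName: username, Password: password, IsMobileUser: false, ReturnUrl: '', RequireCaptcha: false, Timestamp: 2020-03-31 14:11:21, SToken: 17a48bd154307fe36dcadc6359681609f4799034ad5cade3e1b31864f25fe12f
Here is my code: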
from bs4 import BeautifulSoup
import requests
from lxml import html
from datetime import datetime
data = {'UserName': 'username',
        'Password': 'password',
        'IsMobileUser': 'false',
        'ReturnUrl': '',
        'RequireCaptcha': 'false'
        }
login_url = 'https://dienynas.tamo.lt/Prisijungimas/Login'
url = 'https://dienynas.tamo.lt/Pranesimai'
with requests.Session() as s:
    r = s.get(login_url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=SToken]")["value"]
    now = datetime.now()
    data['Timestamp'] = f'{now.year}-{now.month}-{now.day} {now.hour}:{now.minute}:{now.second}'
    data["SToken"] = AUTH_TOKEN
    r = s.post(login_url, data=data)
    r = s.get(url)
    print(r.text)
But I can't log in to the page. I think I am getting the timestamp wrong? Please help :)
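One detail worth checking in the code above: the f-string does not zero-pad its fields, so its output does not have the same shape as the 2020-03-31 14:11:21 value listed in the form data. Here is a minimal sketch comparing the two formats. The function names are made up for illustration, and whether the server actually rejects the unpadded form is an assumption.

```python
from datetime import datetime


def naive_timestamp(now: datetime) -> str:
    # Same formatting as the f-string in the question: no zero padding.
    return f'{now.year}-{now.month}-{now.day} {now.hour}:{now.minute}:{now.second}'


def padded_timestamp(now: datetime) -> str:
    # strftime zero-pads month, day, hour, minute and second.
    return now.strftime('%Y-%m-%d %H:%M:%S')


sample = datetime(2020, 3, 31, 9, 5, 7)
print(naive_timestamp(sample))   # 2020-3-31 9:5:7
print(padded_timestamp(sample))  # 2020-03-31 09:05:07
```

Only the strftime version reproduces the padded layout seen in the browser's form data.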
Edit: I made some changes today, because I found that most of the data I need is in hidden inputs, so:
data = {'UserName': 'username',
        'Password': 'password',
        }
AUTH_TOKEN = soup.find("input",{'name':"SToken"}).get("value")
Timestamp = soup.find("input",{'name':"Timestamp"}).get("value")
IsMobileUser = soup.find("input",{'name':"IsMobileUser"}).get("value")
RequireCaptcha = soup.find("input", {'name': "RequireCaptcha"}).get("value")
ReturnUrl = soup.find("input", {'name': "ReturnUrl"}).get("value")
and added them to the data dictionary. I also tried setting headers:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
r = s.post(login_url, data=data, headers=headers)
That did not help at all. Is there some way to find out why I can't log in?
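One rough way to investigate is to list the input fields a page contains, both to compare them with what is being posted and to see whether the response to the POST is still the login form. The sketch below uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The helper names and the sample HTML are invented for illustration, and treating "the page has an input named Password" as "still on the login page" is an assumption about this site.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collects name -> value for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attr = dict(attrs)
        if attr.get('name'):
            # Inputs without a value attribute are stored as an empty string.
            self.fields[attr['name']] = attr.get('value') or ''


def form_fields(page_html: str) -> dict:
    parser = InputCollector()
    parser.feed(page_html)
    return parser.fields


def looks_like_login_page(page_html: str) -> bool:
    # Assumption: only the login form has an input named "Password".
    return 'Password' in form_fields(page_html)


# Invented sample markup, not copied from the real site.
sample_login = '''
<form>
  <input type="text" name="UserName">
  <input type="password" name="Password">
  <input type="hidden" name="SToken" value="abc123">
  <input type="hidden" name="Timestamp" value="2020-03-31 15:36:37">
</form>
'''
print(form_fields(sample_login))
print(looks_like_login_page(sample_login))           # True
print(looks_like_login_page('<h1>Pranesimai</h1>'))  # False
```

With the session from the question, calling looks_like_login_page(r.text) on the response to s.post, and printing r.status_code, r.url and r.history, should show whether the server accepted the login or sent you back to the form.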
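I agree with you. It looks like you are not sending the correct timestamp. The site includes it in an input, so you can either scrape it and send it back just like the token, or generate the same timestamp in the time zone the website uses.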
from bs4 import BeautifulSoup
import requests
from lxml import html
from datetime import datetime
from pytz import timezone
data = {'UserName': 'username',
        'Password': 'password',
        'IsMobileUser': 'false',
        'ReturnUrl': '',
        'RequireCaptcha': 'false'
        }
login_url = 'https://dienynas.tamo.lt/Prisijungimas/Login'
url = 'https://dienynas.tamo.lt/Pranesimai'
with requests.Session() as s:
    r = s.get(login_url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.find("input",{'name':"SToken"}).get("value")
    Timestamp = soup.find("input",{'name':"Timestamp"}).get("value")  # e.g. 2020-03-31 15:36:37
    # 'Etc/GMT-3' is a fixed UTC+3 offset (the sign in Etc/ zone names is inverted)
    now = datetime.now(timezone('Etc/GMT-3'))
    data['Timestamp'] = now.strftime('%Y-%m-%d %H:%M:%S')  # e.g. 2020-03-31 15:36:36
    print('Timestamp from website',Timestamp)
    print('Timestamp from python',data['Timestamp'])
    data["SToken"] = AUTH_TOKEN
    r = s.post(login_url, data=data)
    r = s.get(url)
    print(r.text)
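If you would rather not depend on pytz, the same fixed offset can be built with the standard library alone. This is only a sketch: site_timestamp and SITE_TZ are names made up here, and the UTC+3 offset is an assumption taken from the timestamps printed above. A fixed offset does not follow daylight saving time, so if the server uses Lithuanian local time (also an assumption), a named zone such as Europe/Vilnius would be the safer choice.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the site expects local time at a fixed UTC+3 offset,
# which is what 'Etc/GMT-3' means.
SITE_TZ = timezone(timedelta(hours=3))


def site_timestamp(now_utc: datetime) -> str:
    """Format an aware datetime as 'YYYY-MM-DD HH:MM:SS' in the site's offset."""
    return now_utc.astimezone(SITE_TZ).strftime('%Y-%m-%d %H:%M:%S')


sample = datetime(2020, 3, 31, 12, 36, 36, tzinfo=timezone.utc)
print(site_timestamp(sample))  # 2020-03-31 15:36:36

# In the login script this would be:
# data['Timestamp'] = site_timestamp(datetime.now(timezone.utc))
```

Sending back the Timestamp value scraped from the page remains the simpler option, since it does not depend on your local clock or on guessing the server's time zone.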