有一个网站包含从 API 中提取的数据。每页最多可以有 100 行。如果您要检查第 1、2、3 等页面的 API URL,它们每次都会发生变化。到目前为止,我每次都采用相同的脚本,只是切换了 URL,但每次我还必须将其保存在不同的 excel 文件中,否则它会删除数据。
我希望有一个脚本能够从该表中提取所有信息,然后将它们全部放入同一张纸上的 Excel 中,而不会覆盖值。
我使用的主页是http://www.nhl.com/stats/teams?aggregate=0&report=days Betweengames&reportType=game&dateFrom=2021-10-12&dateTo=2021-11-30&gameType=2&filter=gamesPlayed,gte, 1&sort=a_teamFullName,daysRest&page=0&pageSize=50 但请记住,该页面上的所有信息都是从 API 中提取的。
这是我正在使用的代码:
import requests
import json
import pandas as pd
url = ('https://api.nhle.com/stats/rest/en/team/daysbetweengames? `isAggregate=false&isGame=true&sort=%5B%7B%22property%22:%22teamFullName%22,%22direction%22:%22ASC%22%7D,%7B%22property%22:%22daysRest%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22teamId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=500&factCayenneExp=gamesPlayed%3E=1&cayenneExp=gameDate%3C=%222021-11-30%2023%3A59%3A59%22%20and%20gameDate%3E=%222021-10-12%22%20and%20gameTypeId=2')`
resp = requests.get(url).text
resp = json.loads(resp)
df = pd.DataFrame(resp['data'])
df.to_excel('Master File.xlsx', sheet_name = 'Info')
任何帮助将不胜感激。
url
有start=...
- 所以你可以使用for
循环并替换这个值0
,100
,200
等并为不同的url运行代码,以及append()
到DataFrame
如果将 url 中的所有参数(在 char
?
之后)放入字典中,然后以 get(url, params=...)
的形式运行,会更简单
requests
有 response.json
所以不需要 json.loads(response.text)
import requests
import pandas as pd
# --- before loop ---
url = 'https://api.nhle.com/stats/rest/en/team/daysbetweengames'
payload = {
'isAggregate': 'false',
'isGame': 'true',
'start': 0,
'limit': 100,
'sort': '[{"property":"teamFullName","direction":"ASC"},{"property":"daysRest","direction":"DESC"},{"property":"teamId","direction":"ASC"}]',
'factCayenneExp': 'gamesPlayed>=1',
'cayenneExp': 'gameDate<="2021-11-30 23:59:59" and gameDate>="2021-10-12" and gameTypeId=2',
}
df = pd.DataFrame()
# --- loop ---
for start in range(0, 1000, 100):
print('start:', start)
payload['start'] = start
response = requests.get(url, params=payload)
data = response.json()
df = df.append(data['data'], ignore_index=True)
# --- after loop ---
print(df)
df.to_excel('Master File.xlsx', sheet_name='Info')
结果:
daysRest faceoffWinPct gameDate ... ties timesShorthandedPerGame wins
0 4 0.47169 2021-10-13 ... None 5.0 1
1 3 0.50847 2021-11-22 ... None 4.0 0
2 2 0.45762 2021-10-26 ... None 1.0 0
3 2 0.56666 2021-11-05 ... None 2.0 1
4 2 0.54716 2021-11-14 ... None 1.0 1
.. ... ... ... ... ... ... ...
675 1 0.37209 2021-10-28 ... None 2.0 1
676 1 0.48000 2021-10-21 ... None 3.0 1
677 0 0.57692 2021-11-06 ... None 1.0 0
678 0 0.32727 2021-11-19 ... None 3.0 0
679 0 0.47169 2021-11-27 ... None 4.0 1