如何更新python抓取的有效负载信息

问题描述 投票:0回答:1

我有一个适用于该网站的python刮板:

https://dhhr.wv.gov/COVID-19/Pages/default.aspx

它将通过单击上面URL中的“正病例趋势”链接从其中一个图表中刮取工具提示。

这是我的代码:

import re
import requests
import json
from datetime import date
url4 = 'https://wabi-us-gov-virginia-api.analysis.usgovcloudapi.net/public/reports/querydata?synchronous=true'
# payload:
x=r'{"version":"1.0.0","queries":[{"Query":{"Commands":[{"SemanticQueryDataShapeCommand":{"Query":{"Version":2,"From":[{"Name":"c","Entity":"Case Data"}],"Select":[{"Column":{"Expression":{"SourceRef":{"Source":"c"}},"Property":"Lab Report Date"},"Name":"Case Data.Lab Add Date"},{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"c"}},"Property":"Daily Confirmed Cases"}},"Function":0},"Name":"Sum(Case Data.Daily Confirmed Cases)"},{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"c"}},"Property":"Daily Probable Cases"}},"Function":0},"Name":"Sum(Case Data.Daily Probable Cases)"}]},"Binding":{"Primary":{"Groupings":[{"Projections":[0,1,2]}]},"DataReduction":{"DataVolume":4,"Primary":{"BinnedLineSample":{}}},"Version":1}}}]},"CacheKey":"{\"Commands\":[{\"SemanticQueryDataShapeCommand\":{\"Query\":{\"Version\":2,\"From\":[{\"Name\":\"c\",\"Entity\":\"Case Data\"}],\"Select\":[{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"c\"}},\"Property\":\"Lab Report Date\"},\"Name\":\"Case Data.Lab Add Date\"},{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"c\"}},\"Property\":\"Daily Confirmed Cases\"}},\"Function\":0},\"Name\":\"Sum(Case Data.Daily Confirmed Cases)\"},{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"c\"}},\"Property\":\"Daily Probable Cases\"}},\"Function\":0},\"Name\":\"Sum(Case Data.Daily Probable Cases)\"}]},\"Binding\":{\"Primary\":{\"Groupings\":[{\"Projections\":[0,1,2]}]},\"DataReduction\":{\"DataVolume\":4,\"Primary\":{\"BinnedLineSample\":{}}},\"Version\":1}}}]}","QueryId":"","ApplicationContext":{"DatasetId":"fb9b182d-de95-4d65-9aba-3e505de8eb75","Sources":[{"ReportId":"dbabbc9f-cc0d-4dd0-827f-5d25eeca98f6"}]}}],"cancelQueries":[],"modelId":339580}'
x=x.replace("\\\'","'")
json_data = json.loads(x)
final_data2 = requests.post(url4, json=json_data, headers={'X-PowerBI-ResourceKey': 'ab4e5874-7bbf-44c9-9443-0701abdee612'}).json()
print(json.dumps(final_data2))

问题是,由于有效负载和X-PowerBI-ResourceKey标头参数值更改,并且我不得不从浏览器检查网络部分中查找并手动复制和粘贴新值并将其粘贴到我的源中,因此几天后它停止工作。有没有办法从网页上以编程方式获取它们并在我的代码中构建它们?

web-scraping beautifulsoup scrapy web-crawler
1个回答
0
投票

我很确定资源密钥是编码为base64的iframe网址的一部分。

from base64 import b64decode
from bs4 import BeautifulSoup
import json
import requests


resp = requests.get('https://dhhr.wv.gov/COVID-19/Pages/default.aspx')
soup = BeautifulSoup(resp.text)

data = soup.find_all('iframe')[0]['src'].split('=').pop()
decoded = json.loads(b64decode(data).decode())
© www.soinside.com 2019 - 2024. All rights reserved.