Getting JSON data from a website using BeautifulSoup

Question

Sorry, I'm fairly new to this. I want to extract one piece of JSON data, "getMe":"IneedThisData", from the page below.

from bs4 import BeautifulSoup
import json

html_doc = """
<!DOCTYPE html>
<html>
<head>
    <title>Sample</title>
</head>
<body>
<script type="text/javascript">utag_cfg_ovrd = window.utag_cfg_ovrd || {};utag_cfg_ovrd.noview = true;
</script>
<script async="" src="/assets/AppMeasurement.js">
</script>
<script>
    window.REDUX_STATE = {"appConfig":
    {"dataLab":"energy","minimum":"maximum":"getMe":"IneedThisData"}}
</script>

</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
data = json.loads(soup.find('script', 'window.REDUX_STATE').text)

I get the error AttributeError: 'NoneType' object has no attribute 'text', and I'm still stuck on loading the data into a variable.

python python-3.x
1 Answer

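Assuming "minimum":"maximum":"getMe" is a typo and the intended text is "minimum":"maximum","getMe" (as written, the string is not valid JSON and json.loads would reject it), you can use the following code: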

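from bs4 import BeautifulSoup
import json

# html_doc is the string from the question, with the typo in the JSON fixed
soup = BeautifulSoup(html_doc, 'html.parser')
data = None
for t in soup.find_all("script"):
    text = t.string  # None when the script tag has no content at all
    if text and "window.REDUX_STATE" in text:
        # Split only on the first "=" so any "=" inside the JSON is preserved
        raw = text.split("=", 1)[1].strip().rstrip(";")
        data = json.loads(raw)
        break

print(data)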
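
Splitting on "=" works for this sample, but it would fail if the script contained further statements after the object. The following is only a sketch of one alternative, assuming the state is a JSON object literal assigned to window.REDUX_STATE. The helper name extract_state and the sample string are made up for illustration; json.JSONDecoder.raw_decode is part of the standard library and parses a single JSON value while ignoring whatever follows it.

```python
import json


def extract_state(script_text, marker="window.REDUX_STATE"):
    """Return the JSON object assigned to `marker`, or None if it is absent.

    Hypothetical helper for illustration, not part of BeautifulSoup.
    """
    pos = script_text.find(marker)
    if pos == -1:
        return None
    # The JSON object starts at the first "{" after the marker.
    start = script_text.find("{", pos)
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at `start` and
    # ignores any trailing text such as ";" or further statements.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Made-up sample: an "=" inside the JSON and trailing JavaScript after it.
sample = 'window.REDUX_STATE = {"appConfig": {"getMe": "IneedThisData", "q": "a=b"}}; init();'
state = extract_state(sample)
print(state["appConfig"]["getMe"])
```

With BeautifulSoup, you would pass the script tag's string to extract_state instead of the sample string.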

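In your code, soup.find('script', 'window.REDUX_STATE') does not match any tag. A string passed as the second positional argument is treated as a CSS class filter, so the call looks for <script class="window.REDUX_STATE">, finds nothing, and returns None. Calling .text on None then raises the AttributeError.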
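
To make the difference concrete, here is a small sketch that contrasts the failing call with a lookup on the script's text content. It assumes a shortened version of the page with the JSON typo already fixed, and it uses the string= argument of find (available in Beautiful Soup 4.4.0 and later) together with a regular expression. The variable names are illustrative only.

```python
import json
import re

from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question, with valid JSON.
html = """
<html><body>
<script src="/assets/AppMeasurement.js"></script>
<script>
    window.REDUX_STATE = {"appConfig": {"dataLab": "energy", "getMe": "IneedThisData"}}
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A string as the second positional argument is a CSS class filter,
# so this searches for <script class="window.REDUX_STATE"> and finds nothing.
by_class = soup.find("script", "window.REDUX_STATE")
print(by_class)

# Filtering on the tag's text content locates the right script instead.
tag = soup.find("script", string=re.compile(r"window\.REDUX_STATE"))

# Keep everything after the first "=" and parse it as JSON.
payload = tag.string.split("=", 1)[1].strip()
data = json.loads(payload)
print(data["appConfig"]["getMe"])
```

This avoids looping over every script tag, at the cost of relying on the marker text appearing in exactly one script.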
