Unable to extract JavaScript-rendered data with BeautifulSoup?

Question description  Votes: 1  Answers: 1

Hi, I am trying to extract data from https://newslab.malaysiakini.com/covid-19/en

import requests
from bs4 import BeautifulSoup

# Download the raw HTML (no JavaScript is executed at this point)
page = requests.get("https://newslab.malaysiakini.com/covid-19/en")

soup = BeautifulSoup(page.content, 'html.parser')

# The string is a list of CSS classes, so it is matched with class_ rather than id
option_tags = soup.find(class_="uk-grid uk-grid-small uk-width-auto uk-flex uk-flex-middle uk-flex-center")

patient_items = option_tags.find_all(class_="patient")

first = patient_items[0]
print(first.prettify())

I can't extract the results. It seems html.parser does not get the data the way I see it in the Google Chrome console. Can anyone help?

python web-scraping beautifulsoup html-parsing
1 Answer
0
votes

After the initial request to https://newslab.malaysiakini.com/covid-19/en, the site makes many further requests. Those other URLs may contain the content you are looking for.

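The link below may contain all the information you are looking for, except the GPS coordinates. The location data is harder to get: it appears to have been compiled into some JavaScript and data tags.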

https://m5.malaysiakini.com/en/tag/covid-19?alt=json contains, in JSON format, all the stories shown on the Google map/list. For example:

{
            "title": "Tabligh particiapants: Foreigners the cause of Covid-19 spread, not fair to blame locals",
            "sid": 514832,
            "image_feat": ["https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg"],
            "image_feat_single": "https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg",
            "summary": "<p>Most of us went to the hospital for testing as soon we were given the directive, says a participant.</p>",
            "author": "",
            "author_array": [],
            "author_display": "no",
            "date_pub": 1584321043,
            "date_pub2": "1584321043000",
            "date_pubh": "2020-03-16 09:10:43+08:00",
            "category": "news",
            "comment_count": 0,
            "tags": ["health", "coronavirus", "covid-19", "tabligh gathering", "infection"],
            "free": false,
            "redirect": "",
            "date_modh": "2020-03-16 09:10:43+08:00"
        }
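
A minimal sketch of reading that JSON feed directly instead of parsing the rendered page. The top-level layout of the response is not shown above, so the hypothetical helper `find_stories` walks the whole structure and collects any object that has both "title" and "sid" keys, which is an assumption based on the sample record. The names `fetch_feed`, `find_stories`, `summarize` and `main` are illustrative only. The standard library's `urllib` is used just to keep the sketch dependency-free; `requests.get(url).json()` would work equally well.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

FEED_URL = "https://m5.malaysiakini.com/en/tag/covid-19?alt=json"


def fetch_feed(url=FEED_URL):
    """Download the feed and decode it as JSON (performs a network request)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def find_stories(node):
    """Recursively collect every dict that has both "title" and "sid" keys.

    Assumption: story records look like the sample above. The surrounding
    layout is unknown, so no particular key path is hard-coded.
    """
    found = []
    if isinstance(node, dict):
        if "title" in node and "sid" in node:
            found.append(node)
        else:
            for value in node.values():
                found.extend(find_stories(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_stories(item))
    return found


def summarize(story):
    """Reduce one story record to a few fields, with the date as ISO 8601 UTC."""
    published = datetime.fromtimestamp(story["date_pub"], tz=timezone.utc)
    return {
        "sid": story["sid"],
        "title": story["title"],
        "published_utc": published.isoformat(),
        "tags": story.get("tags", []),
    }


def main():
    """Fetch the live feed and print one summary per story."""
    for story in find_stories(fetch_feed()):
        print(summarize(story))
```

Calling `main()` prints one dictionary per story. For the sample record above, `date_pub` 1584321043 becomes 2020-03-16T01:10:43+00:00, which matches the `date_pubh` value once the +08:00 offset is taken into account.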