Hi, I am trying to extract data from https://newslab.malaysiakini.com/covid-19/en
import requests
from bs4 import BeautifulSoup

page = requests.get("https://newslab.malaysiakini.com/covid-19/en")
soup = BeautifulSoup(page.content, 'html.parser')
# The string below is a list of CSS classes, not an id, so match it with a class selector.
option_tags = soup.select_one(".uk-grid.uk-grid-small.uk-width-auto.uk-flex.uk-flex-middle.uk-flex-center")
patient_items = option_tags.find_all(class_="patient")
first = patient_items[0]
print(first.prettify())
I cannot extract the results. It seems html.parser does not get the data the way I see it in the Google Chrome console. Can anyone help?
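After the initial request to https://newslab.malaysiakini.com/covid-19/en, the site makes many further requests. requests only downloads the initial HTML and does not run JavaScript, so the content loaded by those later requests never reaches BeautifulSoup. Those other URLs may contain what you are looking for.
The link below may contain all the information you are looking for, except the GPS coordinates. The locations are harder to get; they seem to have been compiled into some JavaScript and data tags.
https://m5.malaysiakini.com/en/tag/covid-19?alt=json
This returns, in JSON format, all the stories shown on the Google map/list. For example: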
{
"title": "Tabligh particiapants: Foreigners the cause of Covid-19 spread, not fair to blame locals",
"sid": 514832,
"image_feat": ["https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg"],
"image_feat_single": "https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg",
"summary": "<p>Most of us went to the hospital for testing as soon we were given the directive, says a participant.</p>",
"author": "",
"author_array": [],
"author_display": "no",
"date_pub": 1584321043,
"date_pub2": "1584321043000",
"date_pubh": "2020-03-16 09:10:43+08:00",
"category": "news",
"comment_count": 0,
"tags": ["health", "coronavirus", "covid-19", "tabligh gathering", "infection"],
"free": false,
"redirect": "",
"date_modh": "2020-03-16 09:10:43+08:00"
}
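To work with records like the one above, something along these lines might be a starting point. It is only a sketch: I have not verified the top-level layout of the feed, so the hypothetical helper find_stories simply walks whatever nesting comes back and picks out dicts that have both "title" and "sid". The function names, the User-Agent header and the choice of output fields are my own assumptions, not anything the site documents. It uses only the standard library; requests.get(url).json() would do the same job for the download step.

```python
import json
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.request import Request, urlopen

FEED_URL = "https://m5.malaysiakini.com/en/tag/covid-19?alt=json"

# Trimmed copy of the example record, used to try the helpers offline.
SAMPLE_STORY = {
    "title": "Tabligh particiapants: Foreigners the cause of Covid-19 spread, not fair to blame locals",
    "sid": 514832,
    "summary": "<p>Most of us went to the hospital for testing as soon we were given the directive, says a participant.</p>",
    "date_pub": 1584321043,
    "tags": ["health", "coronavirus", "covid-19", "tabligh gathering", "infection"],
}


def fetch_feed(url=FEED_URL):
    """Download the feed and decode it as JSON (makes a network request)."""
    # Assumption: a browser-like User-Agent is sent in case the default is rejected.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def find_stories(node):
    """Yield every dict that looks like a story, whatever the nesting is."""
    if isinstance(node, dict):
        if "title" in node and "sid" in node:
            yield node
        else:
            for value in node.values():
                yield from find_stories(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_stories(item)


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_fragment):
    """Return the plain text of an HTML fragment such as the 'summary' field."""
    parser = _TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.parts).strip()


def summarize_story(story):
    """Reduce one story record to a few plain fields."""
    return {
        "sid": story["sid"],
        "title": story["title"],
        "summary": strip_tags(story.get("summary", "")),
        # 'date_pub' is a Unix timestamp in seconds.
        "published_utc": datetime.fromtimestamp(story["date_pub"], tz=timezone.utc),
        "tags": story.get("tags", []),
    }
```

Calling summarize_story(SAMPLE_STORY) exercises the parsing without touching the network. For the live feed, a loop such as `for story in find_stories(fetch_feed()): print(summarize_story(story))` should work as long as the records keep the fields shown in the example.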