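I am trying to scrape data from a website that tracks coronavirus cases. The site is https://www.coronatracker.com/
The table I want to scrape looks like this: Corona record table
If we look at its HTML, it has a table element containing a thead and a tbody. I am trying to read the whole table, but my attempts only produce the headings. I want to read the contents of the table as well.
Here is the code I wrote, hoping to read the table: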
import requests
from bs4 import BeautifulSoup

url = "https://www.coronatracker.com/"
html_page = requests.get(url)
soup = BeautifulSoup(html_page.text, 'html.parser')

# pointing to div that is parent to table
data = soup.find('div', {'class': 'w-full block md:hidden mt-4 mb-8'})

# pointing to table
tables = data.find_all('table', {'class': 'table-auto w-full'})

# printing out the headings
for table in tables:
    print(table.text)

# printing out the contents
body = table.find('tbody')
for data in body.find_all('tr'):
    print(data)
The problem is with reading the contents of the table; the headings are read fine.
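from bs4 import BeautifulSoup
import requests

url = "https://www.coronatracker.com/"  # same URL as in the question
data = requests.get(url)
scraped = BeautifulSoup(data.text, 'html.parser')
# locate the table body in the parsed page
tbody = scraped.find('tbody')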
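Finding the tbody element does not by itself mean it contains any rows. A minimal sketch of how to check, using a small inline HTML string as a hypothetical stand-in for the downloaded page (the helper name count_body_rows and the sample markup are illustrative assumptions, not taken from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: headings present, body empty.
SAMPLE_HTML = """
<table class="table-auto w-full">
  <thead><tr><th>Country</th><th>Confirmed</th></tr></thead>
  <tbody></tbody>
</table>
"""

def count_body_rows(html):
    """Return the number of <tr> rows inside the first <tbody>, or 0 if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:
        return 0
    return len(tbody.find_all("tr"))

print(count_body_rows(SAMPLE_HTML))
```

If the same check on html_page.text from the question reports no rows, the rows are not part of the HTML that requests receives, which would match the symptom of seeing only the headings.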
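The table content you are interested in is generated dynamically, so it is not in the HTML that requests downloads. However, you can use this link to fetch and process the content.
Here is how you can do it: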
import requests
import pandas as pd

URL = "https://api.coronatracker.com/v3/stats/worldometer/topCountry?limit=15&sort=-confirmed"

res = requests.get(URL, headers={'User-Agent': 'Mozilla/5.0'})

# Collect one dict per country, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0).
rows = []
for item in res.json():
    rows.append({
        'country': item['country'],
        'confirmed': item['totalConfirmed'],
        'recovered': item['totalRecovered'],
        'deaths': item['totalDeaths'],
    })

df = pd.DataFrame(rows, columns=['country', 'confirmed', 'recovered', 'deaths'])
print(df)
Output:
    country  confirmed  recovered  deaths
0       USA    1170184     162653   68002
1     Spain     247122     148558   25264
2     Italy     210717      81654   28884
3        UK     186599        135   28446
4    France     168396      50562   24760
5   Germany     165183     130600    6812
6    Russia     134687      16639    1280
7    Turkey     126045      63151    3397
8      Iran      97424      78422    6203
9    Brazil      97100      40937    6761
10    China      82877      77713    4633
11   Canada      57148      24416    3606
12  Belgium      49906      12309    7844
13     Peru      42534      12434    1200
14    India      42490      11775    1391
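As a possible variation, the JSON-to-table step can be separated from the network call so it can be tried offline. This is only a sketch: the helper name records_to_frame and the COLUMN_MAP dictionary are illustrative, the field names are the ones used in the code above, and the sample records are copied from the first two rows of the output shown.

```python
import pandas as pd

# Assumed mapping from the API field names used above to the column names wanted.
COLUMN_MAP = {
    "country": "country",
    "totalConfirmed": "confirmed",
    "totalRecovered": "recovered",
    "totalDeaths": "deaths",
}

def records_to_frame(records):
    """Build a DataFrame from a list of dicts, keeping only the mapped fields."""
    df = pd.DataFrame(records, columns=list(COLUMN_MAP))
    return df.rename(columns=COLUMN_MAP)

# Sample records taken from the output above; extra keys are ignored.
sample = [
    {"country": "USA", "totalConfirmed": 1170184, "totalRecovered": 162653, "totalDeaths": 68002},
    {"country": "Spain", "totalConfirmed": 247122, "totalRecovered": 148558, "totalDeaths": 25264},
]

print(records_to_frame(sample))
```

With a live response, the call would be records_to_frame(res.json()), assuming the endpoint still returns a list of objects with these fields.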