How do I convert a list to a DataFrame in Pandas?

Question  Votes: 0  Answers: 3

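I scraped a table from Wikipedia using Pandas and BeautifulSoup and ended up with a list. I want to convert it to a DataFrame, but when I call pd.DataFrame() the result is not what I expect. Please help.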

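import pandas as pd
import numpy as np
import requests
from io import StringIO
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
# read_html returns a list of DataFrames, even for a single table;
# StringIO avoids the literal-HTML deprecation warning in pandas 2.1+
df = pd.read_html(StringIO(str(table)))
print(df[0].to_json(orient='records'))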

Everything works up to this point, but after that, when I try the following code

neigh = pd.DataFrame(df) 

it returns output with only one row and one column.

python pandas dataframe beautifulsoup
3 Answers
2
votes

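You already have a pandas DataFrame; it is just wrapped in a list, because pd.read_html returns a list of DataFrames. You only need to take the first element: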

neigh = df[0]
print(neigh)
    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
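
To make the list-versus-DataFrame distinction concrete, here is a minimal sketch that does not touch the network. The two small frames are hypothetical stand-ins for whatever pd.read_html returns; the names first, second, tables and combined are made up for illustration. It shows that indexing the list already gives a DataFrame, and that pd.concat is one way to stack several frames if they share columns.

```python
import pandas as pd

# Hypothetical stand-ins for the list that pd.read_html would return.
first = pd.DataFrame(
    {"Postcode": ["M3A", "M4A"], "Borough": ["North York", "North York"]}
)
second = pd.DataFrame({"Postcode": ["M5A"], "Borough": ["Downtown Toronto"]})
tables = [first, second]

print(type(tables))     # a plain Python list
print(type(tables[0]))  # each element is already a DataFrame

# No conversion is needed: just pick the element you want.
neigh = tables[0]

# If several frames share the same columns, pd.concat stacks them row-wise.
combined = pd.concat(tables, ignore_index=True)
print(combined)
```

Wrapping the list in pd.DataFrame() is not needed in either case; indexing or pd.concat operates on the frames that are already there.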

2
votes

You can use the pandas read_html function to read the tables directly from the URL:

>>> url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
>>> tables = pd.read_html(url)
>>> len(tables)
3
>>> tables[0]
    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
>>> type(tables[0])
<class 'pandas.core.frame.DataFrame'>

read_html reads every table tag at the URL and returns a list of DataFrames.
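
Since a page can contain several tables, relying on position 0 can break if the page layout changes. A possible sketch, assuming you know some of the column names you expect: the helper pick_table below is hypothetical (it is not part of pandas), and the sample frames stand in for the list read_html would return.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names.

    Hypothetical helper for illustration; not a pandas function.
    """
    wanted = set(required_columns)
    for table in tables:
        if wanted.issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(wanted)}")


# Hypothetical stand-ins for the list returned by pd.read_html.
navigation = pd.DataFrame({0: ["Canadian postal codes"], 1: ["NL NS PE"]})
postal = pd.DataFrame(
    {
        "Postcode": ["M3A", "M4A"],
        "Borough": ["North York", "North York"],
        "Neighbourhood": ["Parkwoods", "Victoria Village"],
    }
)
tables = [navigation, postal]

neigh = pick_table(tables, ["Postcode", "Borough"])
print(neigh)
```

read_html also accepts a match argument that keeps only tables containing matching text, which may be simpler when the table has a distinctive string in it.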


1
votes

df already holds the DataFrame, as its first element:

print(df[0])