I have a csv file that looks like this:
ID ; name; location; level; DATE19970901; DATE19970902; ...;DATE20201031;survey;person
001; foo; east; 500; 123.1; 342.5; ...; 234.5; A; John
002; bar; west; 50; 67.8; 98.3; ...; 76.6; A; Jenn
003; baz; north; 5000; 535.7; 99.9; ...; 432.6; B; John
I need to turn it into a data frame like this:
ID 001 002 003
name foo bar baz
location east west north
level 500 50 5000
survey A A B
person John Jenn John
date
1997-09-01 123.1 67.8 535.7
1997-09-02 342.5 98.3 99.9
...
2020-10-31 234.5 76.6 432.6
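Now, in my mind the easiest way would be to read it in and then tell it to convert the data rows 0,1,2,8443,8444 into multi-index rows, but I am missing the function for that. `.transpose()` only seems to take a complete df to turn into a multi-index. I could probably split my df into a multi-index df and a data df and merge them, but that seems complicated and error-prone to me. There is a simple way, which is to read in the csv, transpose the df, export it to csv and read it in again, but that seems rather hacky (and slow, although that is not really a problem in my case).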
`pd.MultiIndex.from_frame` is not needed here. Set the non-date columns as the index, transpose, and convert the remaining index with `to_datetime`:
import pandas as pd
# read dataset as semicolon-separated data, ignoring spaces
df = pd.read_csv('inpt_data.csv', sep=r'\s*;\s*', engine='python')
# identify non-DATE columns
cols = ['ID', 'name', 'location', 'level', 'survey', 'person']
# or, programmatically
# cols = list(df.columns.difference(list(df.filter(like='DATE')), sort=False))
# reshape, convert to dates
out = df.set_index(cols).T.rename_axis('date')
out.index = pd.to_datetime(out.index, format='DATE%Y%m%d')
Output:
ID 1 2 3
name foo bar baz
location east west north
level 500 50 5000
survey A A B
person John Jenn John
date
1997-09-01 123.1 67.8 535.7
1997-09-02 342.5 98.3 99.9
2020-10-31 234.5 76.6 432.6
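For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same reshape. It assumes the sample rows from the question with the elided `...` columns simply left out (so only three date columns), and it reads them from an in-memory string via `io.StringIO`; the variable name `raw` is just for illustration.

```python
import io
import pandas as pd

# Sample rows from the question; the elided "..." columns are omitted (assumption).
raw = """ID ; name; location; level; DATE19970901; DATE19970902; DATE20201031;survey;person
001; foo; east; 500; 123.1; 342.5; 234.5; A; John
002; bar; west; 50; 67.8; 98.3; 76.6; A; Jenn
003; baz; north; 5000; 535.7; 99.9; 432.6; B; John
"""

# A regex separator swallows the spaces around ';' (requires the python engine).
df = pd.read_csv(io.StringIO(raw), sep=r'\s*;\s*', engine='python')

# Every column that is not a DATE... column becomes one level of the MultiIndex.
cols = [c for c in df.columns if not c.startswith('DATE')]

# Move those columns into the row index, then transpose so they end up as column levels.
out = df.set_index(cols).T.rename_axis('date')
out.index = pd.to_datetime(out.index, format='DATE%Y%m%d')

# Example lookup: keep only the columns whose 'person' level is 'John'.
john = out.xs('John', level='person', axis=1)
```

Once the metadata lives in the column levels, selections such as `out.xs('John', level='person', axis=1)` or `out.loc['1997-09-01']` work without any further reshaping.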
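Alternatively, set the index columns directly in `pd.read_csv`:
import pandas as pd
# index_col matches header names exactly, so this assumes no spaces around ';'
cols = ['ID', 'name', 'location', 'level', 'survey', 'person']
df = pd.read_csv('data.csv', sep=';', index_col=cols).T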
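Note that after this one-liner the row index still holds the raw header strings. A rough sketch of the remaining step, assuming a version of the sample data without spaces around `;` and with the elided `...` columns left out (the in-memory `raw` string stands in for `data.csv`):

```python
import io
import pandas as pd

# Sample data without spaces around ';' (assumption), elided columns omitted.
raw = """ID;name;location;level;DATE19970901;DATE19970902;DATE20201031;survey;person
001;foo;east;500;123.1;342.5;234.5;A;John
002;bar;west;50;67.8;98.3;76.6;A;Jenn
003;baz;north;5000;535.7;99.9;432.6;B;John
"""

cols = ['ID', 'name', 'location', 'level', 'survey', 'person']

# Build the MultiIndex while reading, then transpose so it becomes the column index.
out = pd.read_csv(io.StringIO(raw), sep=';', index_col=cols).T

# The transposed index still contains strings such as 'DATE19970901'; parse them.
out.index = pd.to_datetime(out.index, format='DATE%Y%m%d').rename('date')
```

If the real file does have spaces around the separators, as the sample in the question suggests, the header names will not match `cols` exactly, and the regex separator with `engine='python'` from the first answer is probably the safer choice.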