如何将具有多个索引列的熊猫数据框传输到R?最好使用CSV文件?
import pandas as pd
df = pd.DataFrame({'foo':[1,1,2], 'baz':['a', 'c', 'b'], 'bar':[0.2, 0.6, 0.9], 'x':[4, 0.6, 0.9]})
display(df)
df = df.groupby(['foo', 'baz']).describe()
display(df)
df.to_csv("test.csv")
生成以下CSV文件:
,,bar,bar,bar,bar,bar,bar,bar,bar,x,x,x,x,x,x,x,x
,,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
foo,baz,,,,,,,,,,,,,,,,
1,a,1.0,0.2,,0.2,0.2,0.2,0.2,0.2,1.0,4.0,,4.0,4.0,4.0,4.0,4.0
1,c,1.0,0.6,,0.6,0.6,0.6,0.6,0.6,1.0,0.6,,0.6,0.6,0.6,0.6,0.6
2,b,1.0,0.9,,0.9,0.9,0.9,0.9,0.9,1.0,0.9,,0.9,0.9,0.9,0.9,0.9
我如何保留这种层次结构?或者,如果不可能的话,至少让R自动连接名称以保留标题中的信息
# as_index=False
df = df.groupby(['foo', 'baz'], as_index=False).describe()
# combine the column names with a .
df.columns = df.columns.map('.'.join)
foo.count foo.mean foo.std foo.min foo.25% foo.50% foo.75% foo.max bar.count bar.mean bar.std bar.min bar.25% bar.50% bar.75% bar.max x.count x.mean x.std x.min x.25% x.50% x.75% x.max
1.0 1.0 NaN 1.0 1.0 1.0 1.0 1.0 1.0 0.2 NaN 0.2 0.2 0.2 0.2 0.2 1.0 4.0 NaN 4.0 4.0 4.0 4.0 4.0
1.0 1.0 NaN 1.0 1.0 1.0 1.0 1.0 1.0 0.6 NaN 0.6 0.6 0.6 0.6 0.6 1.0 0.6 NaN 0.6 0.6 0.6 0.6 0.6
1.0 2.0 NaN 2.0 2.0 2.0 2.0 2.0 1.0 0.9 NaN 0.9 0.9 0.9 0.9 0.9 1.0 0.9 NaN 0.9 0.9 0.9 0.9 0.9