I have a pandas DataFrame (df) that looks like this:
Close Close Close Close Close
Date
2000-01-03 00:00:00 NaN NaN NaN NaN -0.033944
2000-01-04 00:00:00 NaN NaN NaN NaN 0.0351366
2000-01-05 00:00:00 -0.033944 NaN NaN NaN -0.0172414
2000-01-06 00:00:00 0.0351366 -0.033944 NaN NaN -0.00438596
2000-01-07 00:00:00 -0.0172414 0.0351366 -0.033944 NaN 0.0396476
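In R, if I want to select the fifth column:
five=df[,5]
and everything except the fifth column:
rest=df[,-5]
How can I do the equivalent with a pandas DataFrame?
I tried this in pandas: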
five=df.ix[,5]
but it gave this error:
File "", line 1
df.ix[,5]
^
SyntaxError: invalid syntax
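If you want the fifth column:
df.iloc[:, 4]
The colon selects every row of that column. Positions are 0-based, so the fifth column is at position 4. This uses .iloc because the older .ix indexer was removed in pandas 1.0.
To exclude the fifth column, you can try:
df.iloc[:, [x for x in range(len(df.columns)) if x != 4]]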
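As a minimal sketch of both R idioms side by side, assuming a small made-up frame (the column names a to e are hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical 3x5 frame; the column names are made up for illustration.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15]],
    columns=["a", "b", "c", "d", "e"],
)

# R: five = df[, 5]   (pandas positions are 0-based, so use 4)
five = df.iloc[:, 4]

# R: rest = df[, -5]  (keep every position except 4)
keep = [i for i in range(df.shape[1]) if i != 4]
rest = df.iloc[:, keep]

print(five.tolist())       # [5, 10, 15]
print(list(rest.columns))  # ['a', 'b', 'c', 'd']
```

Note that .iloc wants a list (or array) of positions here, so the comprehension is wrapped in square brackets rather than passed as a generator.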
To select a column by its position:
In [19]: df
Out[19]:
Date Close Close.1 Close.2 Close.3 Close.4
0 2000-01-0300:00:00 NaN NaN NaN NaN -0.033944
1 2000-01-0400:00:00 NaN NaN NaN NaN 0.035137
2 2000-01-0500:00:00 -0.033944 NaN NaN NaN -0.017241
3 2000-01-0600:00:00 0.035137 -0.033944 NaN NaN -0.004386
4 2000-01-0700:00:00 -0.017241 0.035137 -0.033944 NaN 0.039648
In [20]: df.ix[:, 5]
Out[20]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
In [21]: df.icol(5)
/usr/bin/ipython:1: FutureWarning: icol(i) is deprecated. Please use .iloc[:,i]
#!/usr/bin/python2
Out[21]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
In [22]: df.iloc[:, 5]
Out[22]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
To select all columns except the one at a given position:
In [29]: df[[df.columns[i] for i in range(len(df.columns)) if i != 5]]
Out[29]:
Date Close Close.1 Close.2 Close.3
0 2000-01-0300:00:00 NaN NaN NaN NaN
1 2000-01-0400:00:00 NaN NaN NaN NaN
2 2000-01-0500:00:00 -0.033944 NaN NaN NaN
3 2000-01-0600:00:00 0.035137 -0.033944 NaN NaN
4 2000-01-0700:00:00 -0.017241 0.035137 -0.033944 NaN
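If every column carries the same label, as in the question's original frame where all five are named Close, selecting by label can match more than one column. A purely positional variant avoids that. The following is only a sketch under that assumption, using made-up integer data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose five columns all share the label "Close".
df = pd.DataFrame(np.arange(15).reshape(3, 5), columns=["Close"] * 5)

# Positions 0..4 with position 4 removed -> array([0, 1, 2, 3])
positions = np.delete(np.arange(df.shape[1]), 4)

# Positional selection never looks at the (duplicated) labels.
rest = df.iloc[:, positions]

print(rest.shape)             # (3, 4)
print(rest.iloc[0].tolist())  # [0, 1, 2, 3]
```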
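If your DataFrame has no column labels and you want to select specific columns, you should use the iloc method.
For example, to select the first column with all of its rows: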
df = dataset.iloc[:,0]
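Here the df variable will hold the values stored in the first column of the DataFrame.
Keep in mind that the result is a Series, not a DataFrame: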
type(df) -> pandas.core.series.Series
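If a one-column DataFrame is wanted instead of a Series, passing a list of positions to iloc keeps the result two-dimensional. A small sketch, where `dataset` is a hypothetical stand-in for the frame used above:

```python
import pandas as pd

# Hypothetical stand-in for the `dataset` frame; no column labels given.
dataset = pd.DataFrame([[1, 2], [3, 4]])

as_series = dataset.iloc[:, 0]   # scalar position -> Series
as_frame = dataset.iloc[:, [0]]  # list of positions -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```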
Hope this helps.