与Python中的多索引熊猫进行交互

问题描述 投票:0回答:2

我具有以下多索引数据框:

df
Out[44]: 
Attributes     Adj Close                            ...      Volume             new
Symbols    ADANIPORTS.NS ASIANPAINT.NS AXISBANK.NS  ...    WIPRO.NS     ZEEL.NS    
Date                                                ...                            
2015-06-12    304.541199    678.997864  536.043945  ...   5907771.0    779489.0 NaN
2015-06-15    302.589355    672.165955  534.683044  ...   2754255.0   1515607.0 NaN
2015-06-16    305.663574    678.324219  535.752380  ...   3010103.0   2013937.0 NaN
2015-06-17    303.516510    683.809143  533.905396  ...   3640391.0   2795132.0 NaN
2015-06-18    303.516510    701.417847  534.683044  ...   4764172.0   1354886.0 NaN
                 ...           ...         ...  ...         ...         ...  ..
2020-06-04    339.200012   1633.750000  394.350006  ...   7397468.0  47567706.0 NaN
2020-06-05    341.149994   1638.550049  405.299988  ...   7245584.0  36525490.0 NaN
2020-06-08    343.299988   1634.250000  430.250000  ...  12213696.0  27555715.0 NaN
2020-06-09    346.049988   1642.050049  420.049988  ...   7845252.0  27072444.0 NaN
2020-06-10    346.549988   1635.949951  427.450012  ...   7391699.0  17115912.0 NaN

[1227 rows x 301 columns]

所以当我跑步时

df.columns.levels
Out[45]: FrozenList([['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume', 'new'], ['ADANIPORTS.NS', 'ASIANPAINT.NS', 'AXISBANK.NS', 'BAJAJ-AUTO.NS', 'BAJFINANCE.NS', 'BAJAJFINSV.NS', 'BHARTIARTL.NS', 'INFRATEL.NS', 'BPCL.NS', 'BRITANNIA.NS', 'CIPLA.NS', 'COALINDIA.NS', 'DRREDDY.NS', 'EICHERMOT.NS', 'GAIL.NS', 'GRASIM.NS', 'HCLTECH.NS', 'HDFC.NS', 'HDFCBANK.NS', 'HEROMOTOCO.NS', 'HINDALCO.NS', 'HINDUNILVR.NS', 'ICICIBANK.NS', 'INDUSINDBK.NS', 'INFY.NS', 'IOC.NS', 'ITC.NS', 'JSWSTEEL.NS', 'KOTAKBANK.NS', 'LT.NS', 'M&M.NS', 'MARUTI.NS', 'NESTLEIND.NS', 'NTPC.NS', 'ONGC.NS', 'POWERGRID.NS', 'RELIANCE.NS', 'SHREECEM.NS', 'SBIN.NS', 'SUNPHARMA.NS', 'TCS.NS', 'TATAMOTORS.NS', 'TATASTEEL.NS', 'TECHM.NS', 'TITAN.NS', 'ULTRACEMCO.NS', 'UPL.NS', 'VEDL.NS', 'WIPRO.NS', 'ZEEL.NS', '']])

最后当我跑步时:

df['Close']
Out[46]: 
Symbols     ADANIPORTS.NS  ASIANPAINT.NS  ...    WIPRO.NS     ZEEL.NS
Date                                      ...                        
2015-06-12     312.049988     705.650024  ...  201.505997  333.700012
2015-06-15     310.049988     698.549988  ...  202.837997  339.250000
2015-06-16     313.200012     704.950012  ...  202.613007  345.000000
2015-06-17     311.000000     710.650024  ...  204.955994  351.100006
2015-06-18     311.000000     728.950012  ...  208.537994  347.000000
                  ...            ...  ...         ...         ...
2020-06-04     339.200012    1633.750000  ...  216.250000  205.600006
2020-06-05     341.149994    1638.550049  ...  218.750000  206.199997
2020-06-08     343.299988    1634.250000  ...  226.449997  197.050003
2020-06-09     346.049988    1642.050049  ...  218.350006  192.399994
2020-06-10     346.549988    1635.949951  ...  217.000000  189.800003

到目前为止很好。我的问题是当我运行以下命令时:

df['new'] =df['Close'].pct_change()

我明白了:

df['new']
Out[40]: 
Date
2015-06-12   NaN
2015-06-15   NaN
2015-06-16   NaN
2015-06-17   NaN
2015-06-18   NaN
              ..
2020-06-04   NaN
2020-06-05   NaN
2020-06-08   NaN
2020-06-09   NaN
2020-06-10   NaN
Name: new, Length: 1227, dtype: float64

我为什么得到NaN?这很奇怪,因为当我运行时:

df['Close'].pct_change()
Out[50]: 
Symbols     ADANIPORTS.NS  ASIANPAINT.NS  ...  WIPRO.NS   ZEEL.NS
Date                                      ...                    
2015-06-12            NaN            NaN  ...       NaN       NaN
2015-06-15      -0.006409      -0.010062  ...  0.006610  0.016632
2015-06-16       0.010160       0.009162  ... -0.001109  0.016949
2015-06-17      -0.007024       0.008086  ...  0.011564  0.017681
2015-06-18       0.000000       0.025751  ...  0.017477 -0.011678
                  ...            ...  ...       ...       ...
2020-06-04      -0.001031      -0.048236  ...  0.023184  0.050588
2020-06-05       0.005749       0.002938  ...  0.011561  0.002918
2020-06-08       0.006302      -0.002624  ...  0.035200 -0.044374
2020-06-09       0.008010       0.004773  ... -0.035769 -0.023598
2020-06-10       0.001445      -0.003715  ... -0.006183 -0.013513

[1227 rows x 50 columns]

我该怎么办才能获得pct_change栏?我期望获得一个新列,该列将计算每个股票收盘价的pct_change。我该怎么办?

enter image description here

python python-3.x pandas multi-index
2个回答
0
投票

您可以从Close的列生成列的列表,例如:

df[[('New',tc) for tc in df['Close'].columns]] = df['Close'].pct_change()

带有示例:

# random values and similar structure
np.random.seed(1)
df = pd.DataFrame(np.random.random(40).reshape(-1, 4), 
                  columns=pd.MultiIndex.from_tuples([['Open', 'a'], ['Open', 'b'], 
                                                     ['Close', 'a'], ['Close', 'b']], 
                                                   names=('lv1', 'lv2')), 
                 index=pd.date_range(end='2020-06-12', periods=10, freq='D'))
#add the columns
df[[('New',tc) for tc in df['Close'].columns]] = df['Close'].pct_change()

print (df)
lv1             Open               Close                    New          
lv2                a         b         a         b            a         b
2020-06-03  0.417022  0.720324  0.000114  0.302333          NaN       NaN
2020-06-04  0.146756  0.092339  0.186260  0.345561  1627.507181  0.142982
2020-06-05  0.396767  0.538817  0.419195  0.685220     1.250585  0.982921
2020-06-06  0.204452  0.878117  0.027388  0.670468    -0.934666 -0.021529
2020-06-07  0.417305  0.558690  0.140387  0.198101     4.125932 -0.704532
2020-06-08  0.800745  0.968262  0.313424  0.692323     1.232574  2.494788
2020-06-09  0.876389  0.894607  0.085044  0.039055    -0.728661 -0.943589
2020-06-10  0.169830  0.878143  0.098347  0.421108     0.156420  9.782485
2020-06-11  0.957890  0.533165  0.691877  0.315516     6.035073 -0.250748
2020-06-12  0.686501  0.834626  0.018288  0.750144    -0.973567  1.377519

#see it is well structure
print (df.columns.levels)
FrozenList([['Close', 'Open', 'New'], ['a', 'b']])

0
投票

您可以使用pd.concat()为所有股票添加一个具有接近百分比变化的新水平,即:

df = pd.concat([df, df[['Close']].pct_change().rename(columns={'Close': 'Close_pct'})], axis=1)

注意双括号:df[['Close']]-这将保留切片的列级别。

© www.soinside.com 2019 - 2024. All rights reserved.