How to reassign multiple MultiIndex columns at once in pandas?

Given two versions of the same dataset, one stacked and one unstacked:

>>> import pandas_datareader
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume            
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date                                                                                                                                                      
2015-06-01   42.744289  120.306801   47.230000  130.539993   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255   46.919998  129.960007   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   46.849998  130.119995   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307   46.360001  129.360001   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   46.139999  128.649994   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015  183.509995  318.890015  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  181.570007  316.730011  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  181.809998  318.109985  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000  181.399994  318.250000  184.149994  323.440002  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002  183.250000  317.940002  184.270004  321.149994  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0

>>> b = a.stack()
>>> b
Attributes           Adj Close       Close        High         Low        Open      Volume
Date       Symbols                                                                        
2015-06-01 MSFT      42.744289   47.230000   47.770000   46.619999   47.060001  28837300.0
           AAPL     120.306801  130.539993  131.389999  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726   46.919998   47.349998   46.619999   46.930000  21498300.0
           AAPL     119.772255  129.960007  130.660004  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375   46.849998   47.740002   46.820000   47.369999  28002200.0
...                        ...         ...         ...         ...         ...         ...
2020-05-26 AAPL     316.730011  316.730011  324.239990  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998  181.809998  181.990005  176.600006  180.199997  39492600.0
           AAPL     318.109985  318.109985  318.709991  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002  181.580002  182.470001  180.389999  180.740005   9760951.0
           AAPL     319.850006  319.850006  321.070007  315.630005  316.769989  10119124.0

I'm trying to take a few columns from a, transform them, and assign them back to the dataset. With b this works fine:

>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes           Adj Close     Close      High         Low        Open      Volume
Date       Symbols                                                                    
2015-06-01 MSFT      42.744289  0.000000  0.000000   46.619999   47.060001  28837300.0
           AAPL     120.306801  1.763921  1.750471  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726 -0.640570 -0.639623   46.619999   46.930000  21498300.0
           AAPL     119.772255  1.769821  1.759451  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375 -0.639504 -0.634624   46.820000   47.369999  28002200.0
...                        ...       ...       ...         ...         ...         ...
2020-05-26 AAPL     316.730011  0.744396  0.738552  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998 -0.425978 -0.438718  176.600006  180.199997  39492600.0
           AAPL     318.109985  0.749684  0.751250  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002 -0.429191 -0.427473  180.389999  180.740005   9760951.0
           AAPL     319.850006  0.761483  0.759577  315.630005  316.769989  10119124.0

[2516 rows x 6 columns]
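One thing worth double-checking before relying on this: in b the rows alternate between MSFT and AAPL, so pct_change() compares each row with the row directly before it, regardless of symbol. For example, the 2015-06-01 AAPL Close of 1.763921 above is 130.539993 / 47.230000 - 1, i.e. AAPL relative to MSFT. If a per-symbol change is what's wanted, one option is to group by the Symbols level first. A minimal sketch on a small hypothetical frame shaped like b (the prices are made up):

```python
import pandas as pd

# Hypothetical frame shaped like `b`: (Date, Symbols) rows, flat columns, made-up prices.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-06-01", "2015-06-02", "2015-06-03"]), ["MSFT", "AAPL"]],
    names=["Date", "Symbols"],
)
b = pd.DataFrame(
    {
        "Close": [100.0, 200.0, 150.0, 300.0, 75.0, 150.0],
        "High": [100.0, 200.0, 200.0, 400.0, 100.0, 200.0],
    },
    index=idx,
)

# Row-to-row change: each AAPL row is compared with the MSFT row before it.
naive = b[["Close", "High"]].pct_change().fillna(0)

# Per-symbol change: each symbol is compared only with its own previous date.
per_symbol = b[["Close", "High"]].groupby(level="Symbols").pct_change().fillna(0)

b[["Close", "High"]] = per_symbol
print(b)
```

The assignment line itself is unchanged; only the computation of the right-hand side differs.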

But it does not work for a:

>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
    self._setitem_array(key, value)
  File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
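The traceback comes from a length check in `_setitem_array`: the key ["Close", "High"] has two labels, but on MultiIndex columns a[["Close", "High"]] expands to every (Attributes, Symbols) pair under those labels, so the value has four columns. A small reproduction on a hypothetical frame with the same column layout (the numbers are made up):

```python
import pandas as pd

# Hypothetical stand-in for `a`: two attributes x two symbols, made-up prices.
cols = pd.MultiIndex.from_product(
    [["Close", "High"], ["MSFT", "AAPL"]], names=["Attributes", "Symbols"]
)
a = pd.DataFrame(
    [[100.0, 200.0, 100.0, 200.0], [150.0, 300.0, 200.0, 400.0]],
    columns=cols,
)

key = ["Close", "High"]
value = a[key].pct_change().fillna(0)

# The key has 2 labels, but the selection expands to 4 (attribute, symbol) columns.
print(len(key), len(value.columns))  # 2 4
print(list(value.columns))

# Assigning a 4-column frame to a 2-label key fails the length check.
error = None
try:
    a[key] = value
except ValueError as exc:
    error = exc
print(type(error).__name__, error)
```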

Doing it one column at a time works. I'm using a for loop over the columns as a temporary workaround, but it seems inefficient and unclean to me.

>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes   Adj Close                 Close                  High                     Low                    Open                  Volume            
Symbols           MSFT        AAPL      MSFT      AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date                                                                                                                                                  
2015-06-01   42.744289  120.306801  0.000000  0.000000   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255 -0.006564 -0.004443   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716 -0.001492  0.001231   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307 -0.010459 -0.005841   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957 -0.004745 -0.005489   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...       ...       ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-21  183.429993  316.850006 -0.012011 -0.007455  186.669998  320.890015  183.289993  315.869995  185.399994  318.660004  29119500.0  25672200.0
2020-05-22  183.509995  318.890015  0.000436  0.006438  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011 -0.010572 -0.006774  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  0.001322  0.004357  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39492600.0  28211100.0
2020-05-28  183.561005  322.510010  0.009631  0.013832  183.820007  323.000000  180.389999  315.630005  180.740005  316.769989  15009134.0  16107365.0

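I'm writing this as part of a program that should work regardless of whether the columns are a MultiIndex. Is there a cleaner/faster way to do this without looping over the columns?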

python pandas multi-index
2 Answers
0 votes

You can get the MultiIndex of the selected columns and adjust the assignment to use it:

a[["Close", "High"]].columns

MultiIndex([('Close', 'MSFT'),
            ('Close', 'AAPL'),
            ( 'High', 'MSFT'),
            ( 'High', 'AAPL')],
           names=['Attributes', 'Symbols'])

cols = [('Close', 'MSFT'), ('Close', 'AAPL'), ('High', 'MSFT'), ('High', 'AAPL')]
a[cols] = a[cols].pct_change().fillna(0)

Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date
2015-06-01   42.744289  120.306801    0.000000    0.000000    0.000000    0.000000   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255        -inf        -inf   -0.008792   -0.005556   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   -0.772704   -1.277080    0.008237    0.002143   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307    6.010459   -5.744469   -0.012149   -0.002749   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   -0.546270   -0.060285   -0.013571   -0.006816   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015   -1.036311   -1.863583   -0.011839   -0.005173  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  -25.238713   -2.052047    0.011059    0.015694  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985   -1.125029   -1.643233   -0.024182   -0.017055  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000   -2.706163   -0.898978    0.011869    0.014841  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002   -5.522368   -3.213063    0.000652   -0.007080  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0
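1259 rows × 12 columns

(The Close values in this output are not single percentage changes: the frame used here already had its Close columns converted by the question's a["Close"] = a["Close"].pct_change().fillna(0), so pct_change was applied to them a second time, which is why -inf appears right after the leading 0. The High columns show the expected result, e.g. -0.008792 = 47.349998 / 47.770000 - 1.)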
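The tuple list in this answer is exactly what a[["Close", "High"]].columns returns, so it does not have to be typed by hand. A minimal sketch of that idea, using a hypothetical helper named assign_transformed (not a pandas function): the selection's own .columns becomes the key, so the key always has one entry per column of the value, whether the columns are flat (like b) or a MultiIndex (like a). The data below is made up.

```python
import pandas as pd


def assign_transformed(df, labels, func):
    """Hypothetical helper: replace df[labels] with func(df[labels]), in place.

    Reusing the selection's own .columns as the key keeps the key and the
    value the same length, for flat and MultiIndex columns alike.
    """
    cols = df[labels].columns  # full (attribute, symbol) tuples on a MultiIndex
    df[cols] = func(df[cols])
    return df


def pct(frame):
    return frame.pct_change().fillna(0)


# MultiIndex columns, shaped like `a` (made-up numbers).
mi = pd.MultiIndex.from_product(
    [["Close", "High", "Volume"], ["MSFT", "AAPL"]], names=["Attributes", "Symbols"]
)
a = pd.DataFrame(
    [[100.0, 200.0, 100.0, 200.0, 1.0, 2.0], [150.0, 300.0, 200.0, 400.0, 3.0, 4.0]],
    columns=mi,
)
assign_transformed(a, ["Close", "High"], pct)

# Flat columns, shaped like `b`.
flat = pd.DataFrame({"Close": [100.0, 150.0], "High": [100.0, 200.0], "Volume": [1.0, 3.0]})
assign_transformed(flat, ["Close", "High"], pct)

print(a)
print(flat)
```

Because the helper only relies on df[labels].columns, the calling code does not need to check which kind of column index it is handling.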

0 votes

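With MultiIndexes, it is much safer to use loc to get your result. In the code below, loc is set to work on the columns (axis=0 would mean working on the rows) and selects "Close" and "High". You can then safely put the replacement values on the other side of the equals sign without getting an error. I'd also suggest reading the pandas documentation on MultiIndexes; I believe it will help you when working with them: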

a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
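A small self-contained check of that loc line on hypothetical data shaped like a (made-up numbers). Setting through .loc aligns the right-hand DataFrame with the selected rows and columns by label instead of requiring the key and the value to have the same length; a.loc[:, ["Close", "High"]] selects the same columns as a.loc(axis=1)[["Close", "High"]].

```python
import pandas as pd

# Hypothetical frame shaped like `a` (made-up numbers).
mi = pd.MultiIndex.from_product(
    [["Close", "High", "Volume"], ["MSFT", "AAPL"]], names=["Attributes", "Symbols"]
)
a = pd.DataFrame(
    [
        [100.0, 200.0, 100.0, 200.0, 1.0, 2.0],
        [150.0, 300.0, 200.0, 400.0, 3.0, 4.0],
        [75.0, 150.0, 100.0, 200.0, 5.0, 6.0],
    ],
    columns=mi,
)

# The answer's line: the level-0 labels select all four (attribute, symbol)
# columns, and .loc aligns the right-hand frame on index and column labels.
a.loc(axis=1)[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
print(a)
```

Volume is left untouched, and the same line works unchanged on a frame with flat columns.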
© www.soinside.com 2019 - 2024. All rights reserved.