ValueError: cannot join with no overlapping index names on the same DataFrame

Question  Votes: 0  Answers: 3

I ran into a strange problem with a pandas DataFrame: where() fails, complaining that it cannot join with no overlapping index names.

To reproduce the problem, try the following:

import yfinance as yf
from datetime import datetime
startdate=datetime(2022,12,1)
enddate=datetime(2022,12,6)
y_symbols = ['GOOG', 'AAPL', 'MSFT']
data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
data[data['Close'] > 100]

The error raised looks like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
..
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 229, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4658, in join
    return self._join_multi(other, how=how)
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4782, in _join_multi
    raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names

Here, data looks like:

                 Close                                High                          ...        Open                            Volume
                  AAPL        GOOG        MSFT        AAPL        GOOG        MSFT  ...        AAPL        GOOG        MSFT      AAPL      GOOG      MSFT
Date                                                                                ...
2022-12-01  148.309998  101.279999  254.690002  149.130005  102.589996  256.119995  ...  148.210007  101.400002  253.869995  71250400  21771500  26041500
2022-12-02  147.809998  100.830002  255.020004  148.000000  101.150002  256.059998  ...  145.960007   99.370003  249.820007  65421400  18812200  21522800
2022-12-05  146.630005   99.870003  250.199997  150.919998  101.750000  253.820007  ...  147.770004   99.815002  252.009995  68826400  19955500  23435300

What could be missing from the DataFrame that keeps this from working?

python pandas dataframe valueerror
3 Answers
1
vote

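This is probably caused by the multi-level columns: where() has to align the condition, which has single-level columns, with the frame's multi-level columns, and that alignment fails here. Try flattening the columns first.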

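import yfinance as yf
from datetime import datetime

startdate = datetime(2022, 12, 1)
enddate = datetime(2022, 12, 6)
y_symbols = ['GOOG', 'AAPL', 'MSFT']
data = yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
# Move the symbol level from the columns into the row index
data = data.stack()
filtered_cond = data['Close'] > 100
# Rows failing the condition become NaN; unstack restores the wide layout
filtered_data = data.where(filtered_cond).unstack()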
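The same stack/where/unstack idea can be tried without a network call. The sketch below uses a small hand-made frame as a stand-in for the yfinance result; the values are copied from the question's output and trimmed to two attributes and two symbols, and the variable names `long_data` and `cond` are only illustrative.

```python
import pandas as pd

# Synthetic stand-in for the yfinance result: two unnamed column levels.
dates = pd.to_datetime(["2022-12-01", "2022-12-02", "2022-12-05"])
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "GOOG"]])
data = pd.DataFrame(
    [
        [148.31, 101.28, 149.13, 102.59],
        [147.81, 100.83, 148.00, 101.15],
        [146.63, 99.87, 150.92, 101.75],
    ],
    index=dates,
    columns=columns,
)

# Move the symbol level into the row index: one row per (date, symbol),
# one plain column per attribute.
long_data = data.stack()

# The condition is now a boolean Series that shares the row index.
cond = long_data["Close"] > 100

# Rows where the condition is False become NaN; unstack restores the wide layout.
filtered = long_data.where(cond).unstack()
print(filtered)
```

Recent pandas versions may emit a FutureWarning about the stack() implementation; that does not affect the result for data without missing values.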

1
vote

This helps: set the column level names after getting the result from yfinance.

Ideally, yfinance would handle this itself.

>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
[*********************100%***********************]  3 of 3 completed
>>> data.columns.names = ["Attributes", "Symbols"]
>>> data[data['Close'] > 100]
Attributes       Close                                High                                 Low                                Open                            Volume
Symbols           AAPL        GOOG        MSFT        AAPL        GOOG        MSFT        AAPL        GOOG        MSFT        AAPL        GOOG        MSFT      AAPL        GOOG      MSFT
Date
2022-12-01  148.309998  101.279999  254.690002  149.130005  102.589996  256.119995  146.610001  100.669998  250.919998  148.210007  101.400002  253.869995  71250400  21771500.0  26041500
2022-12-02  147.809998  100.830002  255.020004  148.000000  101.150002  256.059998  145.649994   99.169998  249.690002  145.960007   99.370003  249.820007  65421400  18812200.0  21522800
2022-12-05  146.630005         NaN  250.199997  150.919998         NaN  253.820007  145.770004         NaN  248.059998  147.770004         NaN  252.009995  68826400         NaN  23435300
>>>

>>> data.where(data['Close'] > 100).where(data['High'] > 120)
Attributes       Close                         High                          Low                         Open                     Volume
Symbols           AAPL GOOG        MSFT        AAPL GOOG        MSFT        AAPL GOOG        MSFT        AAPL GOOG        MSFT      AAPL GOOG      MSFT
Date
2022-12-01  148.309998  NaN  254.690002  149.130005  NaN  256.119995  146.610001  NaN  250.919998  148.210007  NaN  253.869995  71250400  NaN  26041500
2022-12-02  147.809998  NaN  255.020004  148.000000  NaN  256.059998  145.649994  NaN  249.690002  145.960007  NaN  249.820007  65421400  NaN  21522800
2022-12-05  146.630005  NaN  250.199997  150.919998  NaN  253.820007  145.770004  NaN  248.059998  147.770004  NaN  252.009995  68826400  NaN  23435300
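To see the effect of naming the column levels without downloading anything, here is a minimal sketch on a hand-made frame. The data is a trimmed copy of the question's values, and the level names simply follow the ones used above. Whether the first attempt actually raises may depend on the pandas version, so it is wrapped in try/except rather than assumed.

```python
import pandas as pd

# Synthetic stand-in for the yfinance result: two unnamed column levels.
dates = pd.to_datetime(["2022-12-01", "2022-12-02", "2022-12-05"])
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "GOOG"]])
data = pd.DataFrame(
    [
        [148.31, 101.28, 149.13, 102.59],
        [147.81, 100.83, 148.00, 101.15],
        [146.63, 99.87, 150.92, 101.75],
    ],
    index=dates,
    columns=columns,
)

# With unnamed levels, pandas has no level name to join the condition's
# single-level columns against, which is what the question's traceback reports.
try:
    data[data["Close"] > 100]
except ValueError as exc:
    print("unnamed levels:", exc)

# After naming the levels, data["Close"] carries the name "Symbols" on its
# columns, so the condition can be aligned on that level.
data.columns.names = ["Attributes", "Symbols"]
filtered = data[data["Close"] > 100]
print(filtered)
```

Every attribute of a symbol is masked on dates where that symbol's Close is not above 100, which matches the NaN pattern for GOOG on 2022-12-05 in the output above.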

0
votes

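The where method accepts a list-like argument containing the boolean values used for filtering. You have to pass it a pandas Series, a NumPy array, a Python list, etc. But you are passing it a DataFrame (df['close']>500), and the where method raises the error. You can read the pandas documentation for more information.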
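One way to read this suggestion is to hand where() a plain NumPy array with the same shape as the frame, so that no label alignment takes place. The sketch below is only an illustration on hand-made data; it assumes the columns are grouped by attribute with the symbols in the same order inside every group, and `close_mask`, `n_attributes` and `full_mask` are made-up names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the yfinance result: two unnamed column levels.
dates = pd.to_datetime(["2022-12-01", "2022-12-02", "2022-12-05"])
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "GOOG"]])
data = pd.DataFrame(
    [
        [148.31, 101.28, 149.13, 102.59],
        [147.81, 100.83, 148.00, 101.15],
        [146.63, 99.87, 150.92, 101.75],
    ],
    index=dates,
    columns=columns,
)

# Boolean array with one column per symbol, shape (3, 2).
close_mask = (data["Close"] > 100).to_numpy()

# Repeat the per-symbol mask once per attribute so it matches data.shape.
# Assumption: columns are ordered attribute by attribute, symbols in the same order.
n_attributes = data.columns.get_level_values(0).nunique()
full_mask = np.tile(close_mask, (1, n_attributes))

# An array condition is applied positionally, so no index join is attempted.
filtered = data.where(full_mask)
print(filtered)
```

Because the mask is positional, this is more fragile than naming the column levels: reordering the columns would silently apply the mask to the wrong cells.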
