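I have run into a strange problem with a pandas DataFrame: boolean filtering, which goes through where(), fails and complains that it cannot join with no overlapping index names.
To reproduce the problem, try the following: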
import yfinance as yf
from datetime import datetime
startdate=datetime(2022,12,1)
enddate=datetime(2022,12,6)
y_symbols = ['GOOG', 'AAPL', 'MSFT']
data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
data[data['Close'] > 100]
The error raised then looks like this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
..
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 229, in join
join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4658, in join
return self._join_multi(other, how=how)
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4782, in _join_multi
raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names
Here, data looks like:
Close High ... Open Volume
AAPL GOOG MSFT AAPL GOOG MSFT ... AAPL GOOG MSFT AAPL GOOG MSFT
Date ...
2022-12-01 148.309998 101.279999 254.690002 149.130005 102.589996 256.119995 ... 148.210007 101.400002 253.869995 71250400 21771500 26041500
2022-12-02 147.809998 100.830002 255.020004 148.000000 101.150002 256.059998 ... 145.960007 99.370003 249.820007 65421400 18812200 21522800
2022-12-05 146.630005 99.870003 250.199997 150.919998 101.750000 253.820007 ... 147.770004 99.815002 252.009995 68826400 19955500 23435300
What could be missing from the DataFrame that makes this fail?
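This is probably caused by the multi-level columns, since where() has trouble aligning the condition against them. Try flattening them to a single level first.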
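import yfinance as yf
from datetime import datetime

startdate = datetime(2022, 12, 1)
enddate = datetime(2022, 12, 6)
y_symbols = ['GOOG', 'AAPL', 'MSFT']
data = yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
# Move the symbol level from the columns into the index, leaving single-level columns
data = data.stack()
# Boolean Series with one entry per (date, symbol) row
filtered_cond = data['Close'] > 100
# Mask the rows where the condition is False, then restore the wide layout
filtered_data = data.where(filtered_cond).unstack()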
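The same stack/where/unstack round trip can be tried offline. This is a minimal sketch, assuming a small hand-built frame in place of the yf.download() result. The values are rounded from the output above, and the names long_data and keep are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the yfinance result: two-level columns
# (attribute, symbol) whose levels have no names.
dates = pd.to_datetime(["2022-12-01", "2022-12-02", "2022-12-05"])
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "GOOG"]])
data = pd.DataFrame(
    [
        [148.31, 101.28, 149.13, 102.59],
        [147.81, 100.83, 148.00, 101.15],
        [146.63, 99.87, 150.92, 101.75],
    ],
    index=dates,
    columns=columns,
)

# stack() moves the innermost column level (the symbols) into the index,
# so the columns are now just "Close" and "High".
long_data = data.stack()

# A boolean Series with one entry per (date, symbol) row.
keep = long_data["Close"] > 100

# Rows where keep is False become NaN; unstack() restores the wide layout.
filtered_data = long_data.where(keep).unstack()
```

Note that where() blanks a whole (date, symbol) row here, so every attribute of GOOG on 2022-12-05 ends up NaN, not only its Close value.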
This helps: set the names of the column levels after getting the result from yfinance.
Ideally, yfinance would take care of this itself.
>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
[*********************100%***********************] 3 of 3 completed
>>> data.columns.names = ["Attributes", "Symbols"]
>>> data[data['Close'] > 100]
Attributes Close High Low Open Volume
Symbols AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT
Date
2022-12-01 148.309998 101.279999 254.690002 149.130005 102.589996 256.119995 146.610001 100.669998 250.919998 148.210007 101.400002 253.869995 71250400 21771500.0 26041500
2022-12-02 147.809998 100.830002 255.020004 148.000000 101.150002 256.059998 145.649994 99.169998 249.690002 145.960007 99.370003 249.820007 65421400 18812200.0 21522800
2022-12-05 146.630005 NaN 250.199997 150.919998 NaN 253.820007 145.770004 NaN 248.059998 147.770004 NaN 252.009995 68826400 NaN 23435300
>>>
>>> data.where(data['Close'] > 100).where(data['High'] > 120)
Attributes Close High Low Open Volume
Symbols AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT
Date
2022-12-01 148.309998 NaN 254.690002 149.130005 NaN 256.119995 146.610001 NaN 250.919998 148.210007 NaN 253.869995 71250400 NaN 26041500
2022-12-02 147.809998 NaN 255.020004 148.000000 NaN 256.059998 145.649994 NaN 249.690002 145.960007 NaN 249.820007 65421400 NaN 21522800
2022-12-05 146.630005 NaN 250.199997 150.919998 NaN 253.820007 145.770004 NaN 248.059998 147.770004 NaN 252.009995 68826400 NaN 23435300
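The effect of naming the levels can be checked without a network call. This is a minimal sketch, assuming a hand-built frame with unnamed column levels in place of the download. The values are rounded from the output above, and unnamed_error and masked are illustrative names. Whether the first attempt raises may depend on the pandas version, so the sketch records the outcome instead of assuming it:

```python
import pandas as pd

# Hypothetical stand-in for the yfinance result, with unnamed column levels.
dates = pd.to_datetime(["2022-12-01", "2022-12-02", "2022-12-05"])
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "GOOG"]])
data = pd.DataFrame(
    [
        [148.31, 101.28, 149.13, 102.59],
        [147.81, 100.83, 148.00, 101.15],
        [146.63, 99.87, 150.92, 101.75],
    ],
    index=dates,
    columns=columns,
)

# First attempt: no level names, so pandas has no shared name on which to
# align the condition's columns with the two-level columns of data.
try:
    data[data["Close"] > 100]
    unnamed_error = None
except ValueError as exc:
    unnamed_error = str(exc)

# Second attempt: name the levels, then filter again.
data.columns.names = ["Attributes", "Symbols"]
masked = data[data["Close"] > 100]
```

Once the levels are named, data["Close"] keeps "Symbols" as the name of its columns, so the condition and the frame share a level name to align on, and GOOG on 2022-12-05 is masked in every attribute group.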
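The where() method takes list-like boolean values to filter on, such as a pandas Series, a NumPy array, or a Python list. Here, though, you are passing it a DataFrame (data['Close'] > 100 has one column per symbol), which it has to align against the multi-level columns, and that is where the error is raised. See the pandas documentation for more information.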
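To see what is actually being passed, and to try the array route mentioned here, consider a minimal sketch on a hand-built frame. The frame is an assumption standing in for the yfinance result, with columns ordered attribute by attribute as in the output above. A NumPy array carries no labels, so where() only requires that its shape match the frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the yfinance result, with unnamed column levels.
dates = pd.to_datetime(["2022-12-01", "2022-12-02", "2022-12-05"])
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "GOOG"]])
data = pd.DataFrame(
    [
        [148.31, 101.28, 149.13, 102.59],
        [147.81, 100.83, 148.00, 101.15],
        [146.63, 99.87, 150.92, 101.75],
    ],
    index=dates,
    columns=columns,
)

# Selecting the top-level key "Close" returns one column per symbol,
# so the comparison yields a DataFrame, not a Series.
cond = data["Close"] > 100
cond_type = type(cond).__name__

# Repeat the per-symbol mask once per attribute group so that the plain
# array has the same shape as data. This relies on the column order
# being (attribute, symbol) with symbols in the same order in each group.
n_attributes = data.columns.get_level_values(0).nunique()
mask = np.tile(cond.to_numpy(), (1, n_attributes))

result = data.where(mask)
```

Because the array is matched by position only, this approach silently misaligns if the symbols are ordered differently within each attribute group, so it is worth checking data.columns before relying on it.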