Pandas index match with column numbers and multiple conditions

Problem description

I am trying to populate a dataframe that looks like this:

      Name   Origin      Date Open High  Low Close    Date+1  Open+1 High+1 Low+1 Close+1
0  Bananas     Bali  20200108  NaN  NaN  NaN   NaN  20200109     NaN    NaN   NaN     NaN
1  Coconut  Bahamas  20200110  NaN  NaN  NaN   NaN  20200111     NaN    NaN   NaN     NaN

using data found in a second dataframe that looks like this:

      Name   Origin      Date      Time  Open  High  Low  Close
0  Bananas     Bali  20200108  15:30:00  1.58  1.85  1.4   1.50
1  Bananas     Bali  20200108  22:00:00  1.68  1.78  1.5   1.60
2  Bananas     Bali  20200109  15:30:00  1.88  1.95  1.7   1.86
3  Bananas     Bali  20200109  22:00:00  1.78  1.88  1.6   1.65
4  Coconut  Bahamas  20200110  15:30:00  2.58  2.85  2.4   2.50
5  Coconut  Bahamas  20200110  22:00:00  2.68  2.78  2.5   2.60
6  Coconut  Bahamas  20200111  15:30:00  2.88  2.95  2.7   2.86
7  Coconut  Bahamas  20200111  22:00:00  2.78  2.88  2.6   2.65

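Because the columns in the first dataframe have different names (for example, 'Open' and 'Open+1'), I can't think of a simple way to do an index match without duplicating code and renaming the columns of the second dataframe. So I thought it would be easier to index-match by column number, but I'm having trouble working out how to do that. The match conditions are 'Name', 'Origin' and 'Date' ('Date+1' is the date to use for 'Open+1', and so on).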

I tried using the following code:

ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))

to get the correct column values, but I get a 'KeyError: 0', which refers to the column number.
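
The `KeyError: 0` appears to come from `groupby([0,1,2])`: `iloc` selects columns by position, but the frame it returns still carries the original string labels, so `groupby` and the `(column, function)` tuples in `agg` look for columns literally named `0`, `4`, and so on. A minimal sketch of one workaround, assuming positions are translated into labels through `df2.columns` (the four-row `df2` and the names `cols`, `keys`, and `daily` are illustrative, not from the original post):

```python
import pandas as pd

# Cut-down stand-in for the second sample dataframe (Bananas rows only).
df2 = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.50, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.40, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.70, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.60, 1.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

cols = df2.columns
# Translate column positions into the labels that groupby/agg expect.
keys = list(cols[[0, 1, 2]])  # ['Name', 'Origin', 'Date']

daily = df2.groupby(keys, as_index=False).agg(
    Open=(cols[4], "first"),   # first price of the day (rows assumed time-ordered)
    High=(cols[5], "max"),
    Low=(cols[6], "min"),
    Close=(cols[7], "last"),   # last price of the day
)
print(daily)
```

Note that `first` and `last` depend on the row order inside each group, so this assumes the rows are already sorted by `Time` within each date.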

I have put together sample code below that reproduces the same dataframes.

import pandas as pd

#Creating first sample dataframe
lst1 = [['Bananas', 'Bali', '20200108', 'NaN', 'NaN', 'NaN', 'NaN', '20200109', 'NaN', 'NaN', 'NaN', 'NaN'],
   ['Coconut', 'Bahamas', '20200110', 'NaN', 'NaN', 'NaN', 'NaN', '20200111', 'NaN', 'NaN', 'NaN', 'NaN']]

df1 = pd.DataFrame(lst1, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('First Dataframe')
print(df1)

#Creating second sample dataframe
lst2 = [['Bananas', 'Bali', '20200108', '15:30:00', 1.58, 1.85, 1.50, 1.50],
    ['Bananas', 'Bali', '20200108', '22:00:00', 1.68, 1.78, 1.40, 1.60],
    ['Bananas', 'Bali', '20200109', '15:30:00', 1.88, 1.95, 1.70, 1.86],
    ['Bananas', 'Bali', '20200109', '22:00:00', 1.78, 1.88, 1.60, 1.65],
    ['Coconut', 'Bahamas', '20200110', '15:30:00', 2.58, 2.85, 2.50, 2.50],
    ['Coconut', 'Bahamas', '20200110', '22:00:00', 2.68, 2.78, 2.40, 2.60],
    ['Coconut', 'Bahamas', '20200111', '15:30:00', 2.88, 2.95, 2.70, 2.86],
    ['Coconut', 'Bahamas', '20200111', '22:00:00', 2.78, 2.88, 2.60, 2.65]]

df2 = pd.DataFrame(lst2, columns =['Name', 'Origin', 'Date', 'Time', 'Open', 'High', 'Low', 'Close'])
print('Second Dataframe')
print(df2)

#Index Match

ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))


print("Printing first index")
print(ColOpen)

#Desired Output
lst3 = [['Bananas', 'Bali', '20200108', 1.58, 1.85, 1.4, 1.6, '20200109', 1.88, 1.95, 1.6, 1.65],
   ['Coconut', 'Bahamas', '20200110', 2.58, 2.85, 2.4, 2.6, '20200111', 2.88, 2.95, 2.6, 2.65]]

df3 = pd.DataFrame(lst3, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('Desired Output')
print(df3)

Can someone help me figure out how to do this?

Edit: added the desired output. Also updated the code a little.

      Name   Origin      Date  Open  ...  Open+1  High+1  Low+1 Close+1
0  Bananas     Bali  20200108  1.58  ...    1.88    1.95    1.6    1.65
1  Coconut  Bahamas  20200110  2.58  ...    2.88    2.95    2.6    2.65


python python-3.x pandas
1 Answer

Edit:
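
One possible way to reach the desired output, sketched under a few assumptions: the second dataframe is first aggregated to one row per ('Name', 'Origin', 'Date'), and the result is merged into the first dataframe twice, once on 'Date' and once on 'Date+1'. The names `daily`, `daily_next`, `order`, and `result` are illustrative. The sketch starts from the key columns of `df1` only, because the NaN placeholder columns would otherwise collide with the merged ones.

```python
import pandas as pd

# Key columns of the first dataframe; the NaN placeholders are left out on purpose.
df1 = pd.DataFrame({
    "Name": ["Bananas", "Coconut"],
    "Origin": ["Bali", "Bahamas"],
    "Date": ["20200108", "20200110"],
    "Date+1": ["20200109", "20200111"],
})

# Second dataframe, same values as the sample code in the question.
df2 = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.50, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.40, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.70, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.60, 1.65],
        ["Coconut", "Bahamas", "20200110", "15:30:00", 2.58, 2.85, 2.50, 2.50],
        ["Coconut", "Bahamas", "20200110", "22:00:00", 2.68, 2.78, 2.40, 2.60],
        ["Coconut", "Bahamas", "20200111", "15:30:00", 2.88, 2.95, 2.70, 2.86],
        ["Coconut", "Bahamas", "20200111", "22:00:00", 2.78, 2.88, 2.60, 2.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

# One row per Name/Origin/Date (rows assumed time-ordered within each day).
keys = ["Name", "Origin", "Date"]
daily = df2.groupby(keys, as_index=False).agg(
    Open=("Open", "first"),
    High=("High", "max"),
    Low=("Low", "min"),
    Close=("Close", "last"),
)

# Same table with every column except Name/Origin relabelled with a '+1' suffix.
daily_next = daily.rename(
    columns=lambda c: c if c in ("Name", "Origin") else c + "+1"
)

# Match once on Date and once on Date+1, then restore the desired column order.
order = ["Name", "Origin", "Date", "Open", "High", "Low", "Close",
         "Date+1", "Open+1", "High+1", "Low+1", "Close+1"]
result = (
    df1.merge(daily, on=keys, how="left")
       .merge(daily_next, on=["Name", "Origin", "Date+1"], how="left")
)[order]
print(result)
```

Using `how="left"` keeps every row of `df1` even when no matching day exists in `df2`; those rows would simply keep NaN in the price columns.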
