Add the values of the previous rows to the current row

Votes: 0, answers: 3

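When grouping by name, I want each row to pick up the values of the preceding rows in its group. For example, on the last occurrence of the name Walter (row 2), I want Col1 to contain Dog + ", " + Cat, and Col3 to contain Beer + ", " + Wine. There are many columns, so I want to do this by index/column position rather than by column name.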

+------+---------+-------+
| Col1 |  Name   | Col3  |
+------+---------+-------+
| Dog  | Walter  | Beer  |
| Cat  | Walter  | Wine  |
| Dog  | Alfonso | Cider |
| Dog  | Alfonso | Cider |
| Dog  | Alfonso | Vodka |
+------+---------+-------+

This is the output I want:

+---------------+---------------------------+---------------------+
|     Col1      |           Name            |        Col3         |
+---------------+---------------------------+---------------------+
| Dog           | Walter                    | Beer                |
| Dog, Cat      | Walter, Walter            | Beer, Wine          |
| Dog           | Alfonso                   | Cider               |
| Dog, Dog      | Alfonso, Alfonso          | Cider, Cider        |
| Dog, Dog, Dog | Alfonso, Alfonso, Alfonso | Cider, Cider, Vodka |
+---------------+---------------------------+---------------------+

Here is what I tried (it doesn't work):

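for i in df:  # iterates over column labels, not row positions
    if df.loc[i,1] == df.loc[i+1,1]:  # .loc looks up labels; positional access needs .iloc
        df.loc[i,0] + ", " + df.loc[i+1,0]  # result is computed but never assigned
    else:
        df.loc[i+1,0]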

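I've read that iterating over rows in pandas with a for loop is frowned upon, so I'd like to get this output with vectorization or apply (or some other efficient approach).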
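One non-loop way to get the desired output is sketched below. It is a minimal sketch, not taken from the thread, and it assumes the layout shown above: the group key sits in column position 1, and every column (including the key) should be accumulated, as in the desired output. Columns are picked by position through `df.columns`, and `SeriesGroupBy.transform` is combined with `itertools.accumulate` to build the running, comma-separated strings. The helper name `running_join` is hypothetical.

```python
from itertools import accumulate

import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
})


def running_join(s: pd.Series, sep: str = ", ") -> pd.Series:
    """Hypothetical helper: running concatenation of one group's values."""
    # accumulate yields v0, v0+sep+v1, v0+sep+v1+sep+v2, ...
    joined = accumulate(s.astype(str), lambda acc, val: acc + sep + val)
    return pd.Series(list(joined), index=s.index)


key = df.columns[1]  # group key chosen by position (assumed: position 1 is "Name")

out = df.copy()
for col in df.columns:  # every column, including the key, as in the desired output
    # transform returns one value per input row, in the original row order
    out[col] = df.groupby(key)[col].transform(running_join)

print(out)
```

To process only some columns, the loop can iterate over a positional selection such as `df.columns[[0, 2]]` instead. For string data the accumulation still calls Python's `+` once per element, so this mainly removes the manual row bookkeeping rather than making the concatenation itself vectorized.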

python pandas transformation data-wrangling
3 Answers
1 vote

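You can use groupby and cumsum. If you don't mind a trailing comma/space at the end of each value (whether that matters depends on how you use the result), you can do the following: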

print(df.groupby('Name', group_keys=False)[['Col1', 'Col3']].apply(lambda x: (x + ', ').cumsum()))
              Col1                   Col3
0            Dog,                  Beer, 
1       Dog, Cat,            Beer, Wine, 
2            Dog,                 Cider, 
3       Dog, Dog,          Cider, Cider, 
4  Dog, Dog, Dog,   Cider, Cider, Vodka, 

But if you want to remove the trailing comma/space, just apply .str[:-2] to each column, for example:

print(df.groupby('Name', group_keys=False)[['Col1', 'Col3']].apply(lambda x: (x + ', ').cumsum())\
         .apply(lambda x: x.str[:-2]))
            Col1                 Col3
0            Dog                 Beer
1       Dog, Cat           Beer, Wine
2            Dog                Cider
3       Dog, Dog         Cider, Cider
4  Dog, Dog, Dog  Cider, Cider, Vodka
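The index of a transform-like `apply` result differs across pandas versions: since pandas 2.0 the group keys are prepended unless `group_keys=False` is passed. The same cumsum-of-strings idea can also be written with `transform`, which always returns one row per input row in the original order. A minimal sketch, assuming the columns hold plain Python strings (`object` dtype), because `cumsum` relies on `+` between elements; the helper name `cum_join` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
}, dtype=object)  # object dtype so cumsum concatenates with Python's +


def cum_join(s: pd.Series) -> pd.Series:
    # Append ", " to each value, take the running sum (string concatenation),
    # then strip the trailing ", " from every result.
    return (s + ", ").cumsum().str[:-2]


out = df.groupby("Name")[["Col1", "Col3"]].transform(cum_join)
print(out)
```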

1 vote

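What you basically want is to run a cumulative aggregation function over each group. pandas provides cumsum for ordinary addition, but it doesn't support custom cumulative functions. For that, you may need to use some NumPy functions: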

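import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["D", "C", "D", "D", "D"], "Name": ["W", "W", "A", "A", "A"],
                   "col3": ["B", "W", "C", "C", "V"]})

def ser_accum(op, ser):
    u_op = np.frompyfunc(op, 2, 1)  # two inputs, one output
    # Run op cumulatively over the values, keeping Python objects (strings)
    return u_op.accumulate(ser.to_numpy(dtype=object), dtype=object)

def plus(x, y):
    return x + "," + y

def accum(group):
    group = group.copy()  # work on a copy instead of mutating the frame pandas passes in
    for col in group.columns:
        group[col] = ser_accum(plus, group[col])
    return group

# group_keys=False keeps the original row index and order in the result
df.groupby("Name", group_keys=False).apply(accum)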

This is the result:

    col1   Name   col3
0      D      W      B
1    D,C    W,W    B,W
2      D      A      C
3    D,D    A,A    C,C
4  D,D,D  A,A,A  C,C,V
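To see what ser_accum does outside of any groupby, here is a small NumPy-only sketch using the same plus function: np.frompyfunc wraps the two-argument Python function as a ufunc, and the ufunc's accumulate method applies it as a running scan, so element k of the result is plus folded left to right over elements 0..k. Because the order is strictly left to right, this also works for non-commutative operations such as string concatenation. The sample array is illustrative only.

```python
import numpy as np


def plus(x, y):
    return x + "," + y


# Wrap plus as a ufunc taking two inputs and returning one output.
u_plus = np.frompyfunc(plus, 2, 1)

values = np.array(["D", "D", "V"], dtype=object)
# accumulate produces [a, plus(a, b), plus(plus(a, b), c)]
running = u_plus.accumulate(values, dtype=object)
print(running.tolist())
```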

-1 vote

If you only care about the combined result for Col1 and Col3, try the following:

df.groupby('Name').agg(', '.join)

Result:

                  Col1                 Col3
Name                                       
Alfonso  Dog, Dog, Dog  Cider, Cider, Vodka
Walter        Dog, Cat           Beer, Wine
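This collapses each name to a single row, so it reproduces only the last row of each group in the desired output, not the running values. A quick sketch to check that relationship, assuming the question's data stored as object-dtype strings: the one-row-per-name join matches the last row per name of a running join built with cumsum and transform.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
}, dtype=object)  # object dtype so cumsum concatenates strings with +

# One row per name: every value in the group joined together.
final = df.groupby("Name").agg(", ".join)

# One row per input row: the running join within each name.
running = df.groupby("Name")[["Col1", "Col3"]].transform(
    lambda s: (s + ", ").cumsum().str[:-2]
)

# Keep only the last running row of each name, indexed by name.
last = running.groupby(df["Name"]).last()

print(final.to_dict() == last.to_dict())
```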