Aggregation in Pandas based on each item of a column separated by a special character


My input data is as follows:

Date        Investment Type                                    Medium
1/1/2000    Mutual Fund, Stocks, Fixed Deposit, Real Estate    Own, Online,Through Agent
1/2/2000    Mutual Fund, Stocks, Real Estate                   Own
1/3/2000    Fixed Deposit                                      Online
1/3/2000    Mutual Fund, Fixed Deposit, Real Estate            Through Agent
1/2/2000    Stocks                                             Own, Online, Through Agent
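A minimal sketch that loads this sample into a DataFrame, assuming the column names are exactly `Date`, `Investment Type` and `Medium` and that every value is kept as a plain string (dates are not parsed):

```python
import pandas as pd

# Sample input; multi-valued cells stay as comma-separated strings
df = pd.DataFrame({
    'Date': ['1/1/2000', '1/2/2000', '1/3/2000', '1/3/2000', '1/2/2000'],
    'Investment Type': [
        'Mutual Fund, Stocks, Fixed Deposit, Real Estate',
        'Mutual Fund, Stocks, Real Estate',
        'Fixed Deposit',
        'Mutual Fund, Fixed Deposit, Real Estate',
        'Stocks',
    ],
    'Medium': [
        'Own, Online,Through Agent',
        'Own',
        'Online',
        'Through Agent',
        'Own, Online, Through Agent',
    ],
})
print(df.shape)  # (5, 3)
```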

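The input to my function is Medium, which can be a single value or a list. I want to filter the data by the given Medium values and then aggregate it as shown below: for the rows that match the Medium values, look at the Investment Type and aggregate the data for each investment type. For example, with Medium = ['Own', 'Online'] the expected output is: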

Medium                                Investment Type           Date
Own,Online                            Mutual Fund               1/1/2000,1/2/2000 
Own,Online                            Stocks                    1/1/2000,1/2/2000
Own,Online                            Fixed Deposit             1/1/2000,1/3/2000
Own,Online                            Real Estate               1/1/2000
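Since Medium may be passed either as a single string or as a list, one option is to normalise it to a list before building the search pattern. This is a sketch assuming exact, case-sensitive matching; the helper name `build_medium_pattern` is hypothetical, and `re.escape` is an addition of this sketch so that values containing regex metacharacters are matched literally.

```python
import re


def build_medium_pattern(medium):
    """Hypothetical helper: regex alternation matching any requested Medium as a whole word.

    `medium` may be a single string (e.g. 'Own') or a list of strings.
    """
    values = [medium] if isinstance(medium, str) else list(medium)
    # \b anchors stop 'Own' from matching inside a longer word such as 'Owner'
    return '|'.join(r'\b{}\b'.format(re.escape(v)) for v in values)


print(build_medium_pattern('Own'))              # \bOwn\b
print(build_medium_pattern(['Own', 'Online']))  # \bOwn\b|\bOnline\b
```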
python pandas csv aggregate
1 Answer

You can use:

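import pandas as pd
from itertools import product

# df is the input DataFrame with columns Date, Investment Type and Medium
L = ['Online', 'Own']
# match each requested Medium only as a whole word
pat = '|'.join(r"\b{}\b".format(x) for x in L)
# keep only the matching media, in the order they appear in each cell
df['New_Medium'] = df.pop('Medium').str.findall('(' + pat + ')').str.join(', ')
# remove rows where no requested Medium was found (empty string)
df = df[df['New_Medium'].astype(bool)]

# split every column on commas and build all combinations for each row
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(r',\s*')).values
                      for j in product(*i)], columns=df.columns)
print(df1)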
        Date Investment Type New_Medium
0   1/1/2000     Mutual Fund        Own
1   1/1/2000     Mutual Fund     Online
2   1/1/2000          Stocks        Own
3   1/1/2000          Stocks     Online
4   1/1/2000   Fixed Deposit        Own
5   1/1/2000   Fixed Deposit     Online
6   1/1/2000     Real Estate        Own
7   1/1/2000     Real Estate     Online
8   1/2/2000     Mutual Fund        Own
9   1/2/2000          Stocks        Own
10  1/2/2000     Real Estate        Own
11  1/3/2000   Fixed Deposit     Online
12  1/2/2000          Stocks        Own
13  1/2/2000          Stocks     Online
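The row expansion can also be written with `DataFrame.explode` (available since pandas 0.25) instead of `itertools.product`. This is a sketch of that swapped-in technique, not the answer's method: it rebuilds the filtered frame by hand, splits the two multi-valued columns into lists and explodes them one after the other, which yields every (Investment Type, Medium) pair for each row in the same order as the `product` version.

```python
import pandas as pd

# Filtered frame as it looks after the findall step (rebuilt by hand here)
df = pd.DataFrame({
    'Date': ['1/1/2000', '1/2/2000', '1/3/2000', '1/2/2000'],
    'Investment Type': ['Mutual Fund, Stocks, Fixed Deposit, Real Estate',
                        'Mutual Fund, Stocks, Real Estate',
                        'Fixed Deposit',
                        'Stocks'],
    'New_Medium': ['Own, Online', 'Own', 'Online', 'Own, Online'],
})

# Turn both multi-valued columns into lists, then explode them one after the other;
# the second explode multiplies each row, giving every (type, medium) pair
df1 = (df.assign(**{'Investment Type': df['Investment Type'].str.split(r',\s*'),
                    'New_Medium': df['New_Medium'].str.split(r',\s*')})
         .explode('Investment Type')
         .explode('New_Medium')
         .reset_index(drop=True))
print(len(df1))  # 14
```

Exploding the columns one at a time keeps the work inside pandas rather than in a Python list comprehension; which version is faster depends on the data, so measure before choosing on performance grounds.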

# aggregate per Investment Type, joining the unique values of the other columns
df = df1.groupby('Investment Type').agg(lambda x: ', '.join(x.unique())).reset_index()
print(df)
  Investment Type                Date   New_Medium
0   Fixed Deposit  1/1/2000, 1/3/2000  Own, Online
1     Mutual Fund  1/1/2000, 1/2/2000  Own, Online
2     Real Estate  1/1/2000, 1/2/2000  Own, Online
3          Stocks  1/1/2000, 1/2/2000  Own, Online
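Putting the steps together, a hypothetical wrapper `aggregate_by_medium` could accept either a single Medium or a list and return the columns in the order used in the question. It is a sketch that swaps `explode` in for `itertools.product`, adds `re.escape` to the pattern, and keeps ', ' as the separator (the question shows 'Own,Online' without a space; change the `join` separator if that matters).

```python
import re

import pandas as pd


def aggregate_by_medium(df, medium):
    """Hypothetical wrapper: filter by Medium (str or list), then aggregate per Investment Type."""
    values = [medium] if isinstance(medium, str) else list(medium)
    pat = '|'.join(r'\b{}\b'.format(re.escape(v)) for v in values)
    out = df.copy()  # do not modify the caller's frame
    # Keep only the requested media, in the order they appear in each cell
    out['Medium'] = out['Medium'].str.findall('(' + pat + ')').str.join(', ')
    out = out[out['Medium'].astype(bool)]
    # One row per (Investment Type, Medium) pair
    out = (out.assign(**{'Investment Type': out['Investment Type'].str.split(r',\s*'),
                         'Medium': out['Medium'].str.split(r',\s*')})
              .explode('Investment Type')
              .explode('Medium'))
    # Join the unique media and dates for each investment type
    res = (out.groupby('Investment Type')[['Medium', 'Date']]
              .agg(lambda s: ', '.join(s.unique()))
              .reset_index())
    return res[['Medium', 'Investment Type', 'Date']]


df = pd.DataFrame({
    'Date': ['1/1/2000', '1/2/2000', '1/3/2000', '1/3/2000', '1/2/2000'],
    'Investment Type': ['Mutual Fund, Stocks, Fixed Deposit, Real Estate',
                        'Mutual Fund, Stocks, Real Estate',
                        'Fixed Deposit',
                        'Mutual Fund, Fixed Deposit, Real Estate',
                        'Stocks'],
    'Medium': ['Own, Online,Through Agent', 'Own', 'Online',
               'Through Agent', 'Own, Online, Through Agent'],
})

print(aggregate_by_medium(df, ['Own', 'Online']))
print(aggregate_by_medium(df, 'Online'))
```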
© www.soinside.com 2019 - 2024. All rights reserved.