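Ranking the transaction trend per customer per year

Question  Votes: -2  Answers: 2

I'm working in Jupyter. My DataFrame holds the number of transactions per customer per year, plus a field that says whether there were more transactions than the previous year ("up") or fewer ("down"). The field is empty for each customer's first year (NaN in the sample below).

I want to create a counter that, for each customer, goes up by 1 for every "up" and down by 1 for every "down".

I know I first need to sort the df and then build a loop that runs over the customers, with an inner loop over each customer's years, but I need help.

Sample df: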

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group number': [1,1,1,1,3,3,3],
    'year': ['2012','2013','2014','2015','2011','2012','2013'],
    'trend': [np.nan,'down','up','up',np.nan,'down','up']
})

This is what I have done so far:

df =pd.read_excel('totals_new.xlsx',sheet_name='Sheet1').sort_values(['group number', 'year'])

noofgroups = len(df['group number'].unique())
yearspergroup = df.groupby('group number')['year'].nunique()

vtrend =0

for i in noofgroups:
    for j in yearspergroup:
        if df["trend"] == "up":
            vtrend = vtrend+1
        if df["trend"] == "down":
            vtrend = vtrend-1
python pandas loops for-loop
2 Answers
0 votes

IIUC, you can convert your trend column with a nested np.where() and then do a groupby() followed by agg(). Take this sample DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'group number': [1,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,2,2,1,2,1,2],
    'year': ['2017','2016','2018','2017','2016','2018','2017','2016','2018','2017','2016','2018',
        '2017','2016','2018','2017','2016','2018','2017','2016','2018','2017'],
    'trend': ['up','down','up',np.nan,'up','down',np.nan,'up','up','up','down',
        'up',np.nan,'up','up','up','down','up','up','up',np.nan,'down']
    })

Which yields:

    group number  year trend
0              1  2017    up
1              1  2016  down
2              1  2018    up
3              1  2017   NaN
4              1  2016    up
5              1  2018  down
6              1  2017   NaN
7              2  2016    up
8              2  2018    up
9              2  2017    up
10             2  2016  down
11             2  2018    up
12             2  2017   NaN
13             1  2016    up
14             1  2018    up
15             1  2017    up
16             2  2016  down
17             2  2018    up
18             1  2017    up
19             2  2016    up
20             1  2018   NaN
21             2  2017  down

Then:

df['trend'] = np.where(df['trend']=='up', 1, np.where(df['trend']=='down', -1, 0))

df.groupby(['group number','year']).agg({'trend': 'sum'})

Returns:

                   trend
group number year       
1            2016      1
             2017      3
             2018      1
2            2016      0
             2017      0
             2018      3
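
If a single net counter per customer is wanted rather than a total per year, the same mapping idea can be applied with one groupby on 'group number'. A minimal sketch, assuming the question's sample frame and counting a missing trend as 0; the names score_map, step and counter are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame from the question (first year of each customer has no trend).
df = pd.DataFrame({
    'group number': [1, 1, 1, 1, 3, 3, 3],
    'year': ['2012', '2013', '2014', '2015', '2011', '2012', '2013'],
    'trend': [np.nan, 'down', 'up', 'up', np.nan, 'down', 'up'],
})

# Hypothetical mapping: 'up' counts +1, 'down' counts -1.
score_map = {'up': 1, 'down': -1}

df = df.sort_values(['group number', 'year'])
# Values missing from the mapping (NaN here) become 0 (assumption).
df['step'] = df['trend'].map(score_map).fillna(0).astype(int)

# Net counter per customer: one number for each 'group number'.
counter = df.groupby('group number')['step'].sum()
print(counter)
```

A dictionary passed to Series.map() keeps the scoring rule in one place, which may be easier to extend than a nested np.where() if more labels are added later.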

0 votes

This question may well be closed by now, but since it never reached a conclusion, here is a possible solution.

import numpy as np
import pandas as pd

# The sample frame is already sorted by group number and year.
# If yours is not, sort on those two columns first.
df = pd.DataFrame({
    'group number': [1,1,1,1,3,3,3],
    'year': ['2012','2013','2014','2015','2011','2012','2013'],
    'trend': [np.nan,'down','up','up', np.nan,'down','up']
})

# Map 'down' to -1 and 'up' to 1; rows with no trend stay NaN.
df['trend_val'] = df.loc[df['trend'].notna(), 'trend'].map(lambda x: -1 if x == 'down' else 1)
# Running total per group, joined back as 'trend_val_cumulative'.
df = df.join(df.groupby('group number')['trend_val'].cumsum(), rsuffix='_cumulative')

>>> df
   group number  year trend  trend_val  trend_val_cumulative
0             1  2012   NaN        NaN                   NaN
1             1  2013  down       -1.0                  -1.0
2             1  2014    up        1.0                   0.0
3             1  2015    up        1.0                   1.0
4             3  2011   NaN        NaN                   NaN
5             3  2012  down       -1.0                  -1.0
6             3  2013    up        1.0                   0.0
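
For comparison, the nested loop the question was aiming for can also be written out explicitly, which may help when checking the vectorised result. This is only a sketch: the rows are the question's sample shuffled on purpose to show the sort step, the column name vtrend is borrowed from the question's attempt, and unlike the cumsum version above it reports 0 rather than NaN for each customer's first year.

```python
import numpy as np
import pandas as pd

# The question's sample rows, shuffled on purpose to show the sort step.
df = pd.DataFrame({
    'group number': [3, 1, 1, 3, 1, 3, 1],
    'year': ['2013', '2014', '2012', '2011', '2015', '2012', '2013'],
    'trend': ['up', 'up', np.nan, np.nan, 'up', 'down', 'down'],
})

df = df.sort_values(['group number', 'year']).reset_index(drop=True)

vtrend_values = []
for _, rows in df.groupby('group number'):  # outer loop: one customer at a time
    vtrend = 0                               # counter restarts for each customer
    for trend in rows['trend']:              # inner loop: that customer's years, in order
        if trend == 'up':
            vtrend += 1
        elif trend == 'down':
            vtrend -= 1
        vtrend_values.append(vtrend)         # NaN rows leave the counter unchanged

df['vtrend'] = vtrend_values
print(df)
```

On large frames the groupby/cumsum approach is likely to be faster; the loop is mainly useful for reading the logic step by step.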