Pandas DataFrame sum function with various column conditions

Question (votes: 1, answers: 3)

This function sums the rows of a DataFrame based on 5 different criteria (start date, end date, fund, account and analysis):

import pandas as pd

df = pd.DataFrame(
    [
        ['02-09-2019', 20190902, 20.00, 'F1', 'B1', 'I2'],
        ['23-09-2019', 20190923, 237.36, 'F1', 'B1', 'I1'],
        ['15-11-2019', 20191115, 200.00, 'F1', 'B1', 'I1'],
        ['16-11-2019', 20191116, 2045.00, 'F1', 'B1', 'I2'],
        ['05-05-2020', 20200505, 205.00, 'F2', 'B2', 'I1'],
    ],
    columns=['Datestr', 'Datenum', 'Cost', 'Fund', 'Account', 'Analysis'])


def per_sum(startdate, enddate, fund, account, analysis):
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              (df.Fund == fund) &
              (df.Account == account) &
              (df.Analysis == analysis)
              ].Cost.sum()


per_sum(20190000,20200000,'F1','B1','I1')

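How would I adapt this function so that it still totals the cost when no fund, account or analysis is given?

For example: I want the total for Analysis 'I2' across all funds and accounts.

Something like this doesn't work: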

per_sum(20190000,20200000,'','','I2')
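A short sketch of why that call fails, reusing the question's DataFrame and function unchanged: no row has an empty string in `Fund`, so that mask is all False, and AND-ing it with the other masks leaves no rows at all. The variable name `empty_fund_mask` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['02-09-2019', 20190902, 20.00, 'F1', 'B1', 'I2'],
        ['23-09-2019', 20190923, 237.36, 'F1', 'B1', 'I1'],
        ['15-11-2019', 20191115, 200.00, 'F1', 'B1', 'I1'],
        ['16-11-2019', 20191116, 2045.00, 'F1', 'B1', 'I2'],
        ['05-05-2020', 20200505, 205.00, 'F2', 'B2', 'I1'],
    ],
    columns=['Datestr', 'Datenum', 'Cost', 'Fund', 'Account', 'Analysis'])

# No Fund value equals the empty string, so every entry of this mask is False.
empty_fund_mask = df.Fund == ''
print(empty_fund_mask.any())  # False


def per_sum(startdate, enddate, fund, account, analysis):
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              (df.Fund == fund) &
              (df.Account == account) &
              (df.Analysis == analysis)
              ].Cost.sum()


# AND-ing an all-False mask selects no rows, and the sum of no rows is 0.0.
print(per_sum(20190000, 20200000, '', '', 'I2'))  # 0.0
```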

Thanks

python pandas dataframe criteria
3 Answers
1
vote

This may not be very elegant, but it is transparent and foolproof:

def per_sum_2(startdate, enddate, fund=None, account=None, analysis=None):
    # Always filter on the date range (both bounds exclusive).
    df2 = df[(df.Datenum > startdate) &
             (df.Datenum < enddate)]
    # Only filter on a column when a value was actually given.
    if fund is not None:
        df2 = df2[df2.Fund == fund]
    if account is not None:
        df2 = df2[df2.Account == account]
    if analysis is not None:
        df2 = df2[df2.Analysis == analysis]

    return df2.Cost.sum()

per_sum_2(20190000,20200000,analysis='I2')

2065.0
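The same "skip the filter when the value is None" idea can be generalized to any set of columns with keyword arguments. This is a minimal sketch, not part of the answer above; the name `per_sum_kw` is hypothetical, and it assumes each keyword is an existing column name (for example `Fund='F1'`).

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['02-09-2019', 20190902, 20.00, 'F1', 'B1', 'I2'],
        ['23-09-2019', 20190923, 237.36, 'F1', 'B1', 'I1'],
        ['15-11-2019', 20191115, 200.00, 'F1', 'B1', 'I1'],
        ['16-11-2019', 20191116, 2045.00, 'F1', 'B1', 'I2'],
        ['05-05-2020', 20200505, 205.00, 'F2', 'B2', 'I1'],
    ],
    columns=['Datestr', 'Datenum', 'Cost', 'Fund', 'Account', 'Analysis'])


def per_sum_kw(startdate, enddate, **filters):
    """Hypothetical helper: sum Cost strictly between two Datenum bounds,
    adding an equality filter for every keyword whose value is not None.
    Keyword names must be column names, e.g. Fund='F1' or Analysis='I2'."""
    mask = (df.Datenum > startdate) & (df.Datenum < enddate)
    for column, value in filters.items():
        if value is not None:  # None means "do not filter on this column"
            mask &= df[column] == value
    return df.loc[mask, 'Cost'].sum()


print(per_sum_kw(20190000, 20200000, Analysis='I2'))  # 2065.0
print(per_sum_kw(20190000, 20200000, Fund='F1', Analysis='I1'))
```

Because the filters are looked up with `df[column]`, a misspelled keyword raises a `KeyError` instead of being silently ignored.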

1
vote

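The idea is to chain an extra condition with `|` (bitwise OR) that checks whether the argument is an empty string; when it is, that column's condition is True for every row: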

def per_sum(startdate, enddate, fund, account, analysis):
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              ((df.Fund == fund) | (fund == '')) &
              ((df.Account == account) | (account == '')) &
              ((df.Analysis == analysis) | (analysis == ''))
              ].Cost.sum()

print(per_sum(20190000,20200000,'','',''))
2502.36

print(per_sum(20190000,20200000,'','','I2'))
2065.0
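To illustrate why the `| (fund == '')` trick works: `fund == ''` is an ordinary Python bool, and pandas broadcasts it across the whole Series when it is OR-ed with the element-wise comparison. A minimal sketch on a toy Series (the name `fund_col` is only for illustration):

```python
import pandas as pd

fund_col = pd.Series(['F1', 'F1', 'F2'])

fund = ''
# fund == '' is the scalar True, so OR-ing it makes every row True.
print(((fund_col == fund) | (fund == '')).tolist())  # [True, True, True]

fund = 'F2'
# fund == '' is now False, so only the element-wise comparison matters.
print(((fund_col == fund) | (fund == '')).tolist())  # [False, False, True]
```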

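Edit:

If you also want the start and end dates to be optional, one possible solution is to add if-else expressions that replace an empty start or end date with -np.inf or np.inf: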

import numpy as np

def per_sum(startdate, enddate, fund, account, analysis):
    startdate = -np.inf if startdate == '' else startdate
    enddate = np.inf if enddate == '' else enddate
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              ((df.Fund == fund) | (fund == '')) &
              ((df.Account == account) | (account == '')) &
              ((df.Analysis == analysis) | (analysis == ''))
              ].Cost.sum()

print(per_sum('','','','',''))
2707.36
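A possible variation, not from the answer itself: move the "no filter" values into default arguments, so callers can simply omit any criterion instead of passing `''` placeholders. This sketch assumes unbounded dates default to `-np.inf`/`np.inf` and that `''` still means "any value" for the text columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['02-09-2019', 20190902, 20.00, 'F1', 'B1', 'I2'],
        ['23-09-2019', 20190923, 237.36, 'F1', 'B1', 'I1'],
        ['15-11-2019', 20191115, 200.00, 'F1', 'B1', 'I1'],
        ['16-11-2019', 20191116, 2045.00, 'F1', 'B1', 'I2'],
        ['05-05-2020', 20200505, 205.00, 'F2', 'B2', 'I1'],
    ],
    columns=['Datestr', 'Datenum', 'Cost', 'Fund', 'Account', 'Analysis'])


def per_sum(startdate=-np.inf, enddate=np.inf, fund='', account='', analysis=''):
    # Dates default to an unbounded range; '' for a text criterion matches any value.
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              ((df.Fund == fund) | (fund == '')) &
              ((df.Account == account) | (account == '')) &
              ((df.Analysis == analysis) | (analysis == ''))
              ].Cost.sum()


print(per_sum(analysis='I2'))     # 2065.0 (every I2 row, no date limit)
print(per_sum(enddate=20200000))  # every row dated before 2020
```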

0
votes

This:

per_sum(20190000,20200000,'','','I2') 

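doesn't work because '' is not a wildcard that matches every value in the column. You could use a regular expression that matches all of the column's values instead.

You can change the function declaration to give each column a default value (a pattern), so that when you want to ignore a column you simply leave out that argument when calling the function.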

def per_sum(startdate, enddate, fund='somepattern', account='otherpattern', analysis='anotherpattern'):
    ...  # filter each column with its pattern, then sum Cost
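A minimal sketch of that regex-default idea, assuming pandas 1.1 or later (for `Series.str.fullmatch`). Here `'.*'` is used as the "match anything" default; `fullmatch` is chosen over `str.match` so that a pattern like `'F1'` cannot also match a longer value such as `'F10'`.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['02-09-2019', 20190902, 20.00, 'F1', 'B1', 'I2'],
        ['23-09-2019', 20190923, 237.36, 'F1', 'B1', 'I1'],
        ['15-11-2019', 20191115, 200.00, 'F1', 'B1', 'I1'],
        ['16-11-2019', 20191116, 2045.00, 'F1', 'B1', 'I2'],
        ['05-05-2020', 20200505, 205.00, 'F2', 'B2', 'I1'],
    ],
    columns=['Datestr', 'Datenum', 'Cost', 'Fund', 'Account', 'Analysis'])


def per_sum(startdate, enddate, fund='.*', account='.*', analysis='.*'):
    # Each text criterion is a regular expression that must match the whole
    # cell; the default '.*' matches any value, so omitted arguments don't filter.
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              df.Fund.str.fullmatch(fund) &
              df.Account.str.fullmatch(account) &
              df.Analysis.str.fullmatch(analysis)
              ].Cost.sum()


print(per_sum(20190000, 20200000, analysis='I2'))  # 2065.0
```

If a literal value could contain regex metacharacters, it would need escaping first (for example with `re.escape`).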
© www.soinside.com 2019 - 2024. All rights reserved.