此函数根据5个不同的标准(开始日期,结束日期,资金,帐户和分析)对数据框中的行求和:
df = pd.DataFrame(
[
['02-09-2019',20190902, 20.00, 'F1','B1','I2'],
['23-09-2019',20190923, 237.36, 'F1','B1','I1'],
['15-11-2019',20191115, 200.00, 'F1','B1','I1'],
['16-11-2019',20191116, 2045.00, 'F1','B1','I2'],
['05-05-2020',20200505, 205.00, 'F2','B2','I1'],
],
columns= ['Datestr','Datenum','Cost','Fund','Account','Analysis'])
def per_sum(startdate, enddate, fund, account, analysis):
return df[(df.Datenum > startdate) &
(df.Datenum < enddate) &
(df.Fund == fund) &
(df.Account == account) &
(df.Analysis == analysis)
].Cost.sum()
per_sum(20190000,20200000,'F1','B1','I1')
如果未提供资金,帐户或分析数据,我将如何调整此功能以继续总成本。
例如:如果我想在所有资金和帐户中找到总计Analysis'I2'。
这种事情不起作用:
per_sum(20190000,20200000,'','','I2')
谢谢
这可能不是很优雅,但是透明且万无一失:
def per_sum_2(startdate, enddate, fund = None, account=None, analysis=None):
df2 = df[(df.Datenum > startdate) &
(df.Datenum < enddate) ]
if not fund is None:
df2 = df2[df2.Fund == fund]
if not account is None:
df2 = df2[df2.Account == account]
if not analysis is None:
df2 = df2[df2.Analysis == analysis]
return df2.Cost.sum()
per_sum_2(20190000,20200000,analysis='I2')
2065.0
[Idea是按|
链接的bitwise OR
新概念,要按空格进行比较:
def per_sum(startdate, enddate, fund, account, analysis):
return df[(df.Datenum > startdate) &
(df.Datenum < enddate) &
((df.Fund == fund) | (fund == '')) &
((df.Account == account) | (account == '')) &
((df.Analysis == analysis) | (analysis == ''))
].Cost.sum()
print(per_sum(20190000,20200000,'','',''))
2502.36
print(per_sum(20190000,20200000,'','','I2'))
2065.0
编辑:
如果还希望用日期时间进行过滤,一种可能的解决方案是为更改开始和结束日期时间添加if-else
语句:
def per_sum(startdate, enddate, fund, account, analysis):
startdate = -np.inf if startdate == '' else startdate
enddate = np.inf if enddate == '' else enddate
return df[(df.Datenum > startdate) &
(df.Datenum < enddate) &
((df.Fund == fund) | (fund == '')) &
((df.Account == account) | (account == '')) &
((df.Analysis == analysis) | (analysis == ''))
].Cost.sum()
print(per_sum('','','','',''))
2707.36
此:
per_sum(20190000,20200000,'','','I2')
无效,因为''不是适合该列所有大小写/值的通配符。您可以使用正则表达式来匹配列值的所有值。
您可以将函数声明更改为包含列的默认值,因此,当您要忽略一个默认值时,在调用函数时不要给它指定参数。
def per_sum(startdate, enddate, fund='somepattern', account='otherpattern', analysis):