My colleague wrote 8 similar functions that each filter the data and then run an operation. For example, one function looks like this:
def operation_at_country_kind_level(df):
    # there are other countries in the data,
    # but we only want these two hardcoded ones
    for country in ["USA", "UK"]:
        # again, hardcoded values for kind
        for kind in ["A", "B"]:
            df_filter = df[(df["Country"] == country) & (df["Kind"] == kind)]
            results = operation(df_filter)
            store_results(results, country=country, kind=kind)
Another looks like this:
def operation_at_country_level(df):
    for country in ["USA", "UK"]:
        df_filter = df[df["Country"] == country]
        results = operation(df_filter)
        store_results(results, country=country, kind="-")
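There is also a function that does no filtering at all. It just computes the results for the input dataframe and ends with the call
store_results(results, country="-", kind="-")
In the real data, there are 4 columns the user can choose to filter on.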
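I would like to simplify this into a single function, with the user passing in an argument for the filter level they want. My idea was to use a group-by statement, as in the following example, which contains some pseudocode: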
def operation_at_level(df, groupby_columns):
    # check if we're doing any for loop
    if len(groupby_columns) > 0:
        for index, data in df.groupby(groupby_columns):
            # check that the values in the index are
            # valid, something like 'if "Country" in
            # groupby_columns and index[index_of_country_
            # in_groupby_columns] in ["USA", "UK"]'
            results = operation(data)
            # work out what should go here
            # based on groupby columns
            store_results(results, ...)
    else:
        results = operation(df)
        store_results(results, "-", "-")
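It feels like I would be doing so many checks inside the function that it would be just as much work as having 8 separate functions. Any suggestions on how to merge these similar functions, with 4 possible filter columns, into a single function?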
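You can use the getattr function to fetch the function you want each time: let the user give the name of the function they want to use, then call that function.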
For example:
import funcs  # Suppose your file with all the functions is called funcs

func_to_call = input("Tell me which function you want to use: ")
function = getattr(funcs, func_to_call)
result = function(df)
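Since getattr will happily return any attribute of the module, it may be worth checking the name before calling it. Below is a minimal sketch of that idea. The `funcs` namespace here is a hypothetical stand-in for the real module, and `ALLOWED_NAMES` and `call_by_name` are names made up for illustration:

```python
import types

# Hypothetical stand-in for the `funcs` module; in real code this
# would be `import funcs` and the functions would take a dataframe.
funcs = types.SimpleNamespace(
    operation_at_country_level=lambda df: f"country-level result for {df}",
    operation_at_country_kind_level=lambda df: f"country/kind-level result for {df}",
)

# Hypothetical whitelist of names the user is allowed to request.
ALLOWED_NAMES = {
    "operation_at_country_level",
    "operation_at_country_kind_level",
}


def call_by_name(module, name, df):
    """Look up `name` on `module` and call it with `df`, rejecting unknown names."""
    if name not in ALLOWED_NAMES:
        raise ValueError(f"Unknown function name: {name!r}")
    function = getattr(module, name)
    return function(df)
```

The whitelist keeps a typo or an unexpected input from reaching getattr, so the caller gets a clear error rather than an AttributeError or an unintended attribute.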
Otherwise, you could pass an extra argument and use it to map/check which function you want to call.
def global_operation_filter(df, check):
    if check == 1:
        # write the code for the first check
        pass
    elif check == 2:
        # write the code for the second check, etc.
        pass
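Applied to the question, the extra argument could be the list of columns to filter on rather than a numeric check. The sketch below is one possible way to do that, not a definitive implementation. It uses itertools.product over the allowed values instead of the groupby idea from the question, so an empty column list falls through to the unfiltered case without a separate branch. `ALLOWED_VALUES` is a hypothetical name, and `operation` and `store_results` are placeholder stand-ins for the real functions:

```python
from itertools import product

import pandas as pd

# Allowed values per filter column, copied from the hardcoded lists
# in the question. The real data would have 4 such columns.
ALLOWED_VALUES = {"Country": ["USA", "UK"], "Kind": ["A", "B"]}

STORED = []  # placeholder storage, for illustration only


def operation(df):
    # Placeholder for the real operation (assumption): count the rows.
    return len(df)


def store_results(results, country, kind):
    # Placeholder for the real storage call (assumption).
    STORED.append((country, kind, results))


def operation_at_level(df, filter_columns):
    """Run `operation` once per combination of allowed values in `filter_columns`."""
    value_lists = [ALLOWED_VALUES[column] for column in filter_columns]
    # product() of zero lists yields one empty combination, which
    # covers the "no filtering" case.
    for combination in product(*value_lists):
        labels = dict.fromkeys(ALLOWED_VALUES, "-")
        mask = pd.Series(True, index=df.index)
        for column, value in zip(filter_columns, combination):
            mask &= df[column] == value
            labels[column] = value
        results = operation(df[mask])
        store_results(results, country=labels["Country"], kind=labels["Kind"])
```

Calling `operation_at_level(df, ["Country"])` would then play the role of `operation_at_country_level(df)`, and `operation_at_level(df, [])` the role of the unfiltered function. Like the original loops, this calls `operation` even when a combination matches no rows, whereas a groupby would skip empty groups.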