Creating conditions and storing them in a list for later indexing


I have created a set of conditions on a pandas DataFrame:

cat_A100 = df['Mr_Diag_Icd10_Code'].str.startswith(('A','B'))

sub_A101 = df['Mr_Diag_Icd10_Code'].str.startswith(tuple([f"A0{i}" for i in range(9)]))
sub_A102 = df['Mr_Diag_Icd10_Code'].str.startswith(tuple([f"A09"]))
sub_A103 = (df['Mr_Diag_Icd10_Code'].str.startswith(tuple([f"A{i}" for i in range(15,20)]))
            | df['Mr_Diag_Icd10_Code'].str.startswith(tuple([f"B90"])))
sub_A104 = df['Mr_Diag_Icd10_Code'].str.startswith(tuple([f"A{i}" for i in range(40,42)]))
sub_A105 = df['Mr_Diag_Icd10_Code'].str.startswith(tuple([f"B24"]))

I then use these conditions to create new variables:

df.loc[cat_A100, 'diagcat'] = 'A100: Certain infectious and parasitic diseases'
df.loc[cat_A100 & sub_A101, 'diagsub'] = 'A101: Intestinal infectious diseases except diarrhoea'
df.loc[cat_A100 & sub_A102, 'diagsub'] = 'A102: Diarrhoea and gastroenteritis of presumed infectious origin'
df.loc[cat_A100 & sub_A103, 'diagsub'] = 'A103: Tuberculosis'
df.loc[cat_A100 & sub_A104, 'diagsub'] = 'A104: Septicaemia'
df.loc[cat_A100 & sub_A105, 'diagsub'] = 'A105: HIV disease'
df.loc[cat_A100 & ~sub_A101 & ~sub_A102 & ~sub_A103 & ~sub_A104 & ~sub_A105, 'diagsub'] = 'A106: Other infectious and parasitic diseases'

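Is there a way to make my code more concise? I would like to create a tuple or list of conditions and then refer to them when creating the variables (the second block of code).

Is that a viable approach, or is there a cleaner way to structure this code?

Any suggestions are welcome. Thanks!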

Tags: pandas, tuples

1 Answer

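You can use a dictionary to store the subcategory conditions (as prefix tuples) together with their labels, then iterate over the dictionary to apply each condition and assign its label.

This removes the repetition and makes the conditions easier to update or extend later.

Here is how: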

# == Necessary imports =========================================================
import pandas as pd

# == Create example DataFrame ==================================================
df = pd.DataFrame({'Mr_Diag_Icd10_Code': ['A01', 'A09', 'A10', 'A16', 'A41', 'B24', 'C01']})

# == Define Conditions ==========================================================
# Define the category condition
cat_A100 = df['Mr_Diag_Icd10_Code'].str.startswith(('A', 'B'))

# Dictionary for subcategories and their corresponding conditions
subcat_conditions = {
    ('A', 'B'): 'A106: Other infectious and parasitic diseases',
    tuple(f"A{i:02d}" for i in range(9)): 'A101: Intestinal infectious diseases except diarrhoea',
    ('A09',): 'A102: Diarrhoea and gastroenteritis of presumed infectious origin',
    tuple(f"A{i}" for i in range(15, 20)) + ('B90',): 'A103: Tuberculosis',
    tuple(f"A{i}" for i in range(40, 42)): 'A104: Septicaemia',
    ('B24',): 'A105: HIV disease',
}

# == Create 'diagcat' and 'diagsub' columns ====================================

# Apply the category and subcategory conditions
df.loc[cat_A100, 'diagcat'] = 'A100: Certain infectious and parasitic diseases'

for condition, label in subcat_conditions.items():
    df.loc[cat_A100 & df['Mr_Diag_Icd10_Code'].str.startswith(condition), 'diagsub'] = label

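Note

The order in which you define the conditions in the subcat_conditions dictionary matters, because each pass through the loop overwrites any label written by an earlier pass. If conditions overlap, define the less specific ones first. For example, 'A106' uses the prefixes ('A', 'B') and so matches every row in the category; it has to come first so that the more specific subcategories can overwrite it. If it were defined last, it would overwrite all of them.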
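If relying on dictionary order feels fragile, one possible alternative is numpy.select, which takes the first matching condition for each row and falls back to a default value when nothing matches. The sketch below is only an illustration on the same example data: the names codes, subcat_prefixes, conditions and labels are my own, and na=False is an assumption about how missing codes should be treated (as non-matching).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mr_Diag_Icd10_Code': ['A01', 'A09', 'A10', 'A16', 'A41', 'B24', 'C01']})
codes = df['Mr_Diag_Icd10_Code']

# Category condition, as in the answer above
cat_A100 = codes.str.startswith(('A', 'B'), na=False)

# Only the specific subcategories; the catch-all 'A106' becomes the default below
subcat_prefixes = {
    'A101: Intestinal infectious diseases except diarrhoea': tuple(f"A{i:02d}" for i in range(9)),
    'A102: Diarrhoea and gastroenteritis of presumed infectious origin': ('A09',),
    'A103: Tuberculosis': tuple(f"A{i}" for i in range(15, 20)) + ('B90',),
    'A104: Septicaemia': tuple(f"A{i}" for i in range(40, 42)),
    'A105: HIV disease': ('B24',),
}

# One boolean array per subcategory, in the same order as the labels
conditions = [codes.str.startswith(prefixes, na=False).to_numpy()
              for prefixes in subcat_prefixes.values()]
labels = list(subcat_prefixes.keys())

# For each row, np.select picks the label of the first condition that is True
diagsub = np.select(conditions, labels,
                    default='A106: Other infectious and parasitic diseases')

df.loc[cat_A100, 'diagcat'] = 'A100: Certain infectious and parasitic diseases'
# Keep subcategory labels only for rows inside the category; others stay NaN
df['diagsub'] = pd.Series(diagsub, index=df.index).where(cat_A100)

print(df)
```

With this layout the catch-all no longer depends on its position, although the order of the specific entries would still matter if their prefixes overlapped, since the first match wins.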
