Filtering a DataFrame based on conditions in YAML

I'm trying to filter a DataFrame by conditions stored in YAML. There are probably 100+ conditions to filter on; these are just a few of them.

general_1:
  condition_1:
    'A': 1
    'B': 5
    'C': range(0, 8)
    'D': 21

  condition_2:
    'A': 1
    'B': 4
    'C': range(9, 200)
    'D': 22

  condition_3:
    'A': 1
    'B': 3
    'C': range(3, 200)
    'D': 22

  condition_4:
    'A': 1
    'B': 6
    'C': range(3, 200)
    'D': [21, 101, 102, 241, 242, 341, 342, 343, 344, 345, 346, 347, 348, 349, 351, 352, 353, 354, 355, 356, 357, 551, 552, 553, 554, 555, 556, 665, 667, 767, 861, 862]

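My goal is to match these conditions against the DataFrame and create a new column holding the result, so that I can flag the rows that don't match.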
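One thing worth noting about the config above: YAML has no range type, so `yaml.safe_load` returns a value such as `range(0, 8)` as the plain string `"range(0, 8)"`, not as a Python `range`. A minimal sketch of one way to convert it, assuming the strings always have the form `range(start, stop)` with integer bounds (the helper name `parse_expected` is hypothetical and not part of the original post):

```python
import re

# Matches strings like "range(0, 8)" or "range(3, 200)" (integer bounds only).
_RANGE_RE = re.compile(r"^range\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)$")


def parse_expected(value):
    """Return a range for 'range(a, b)' strings; return any other value unchanged.

    Hypothetical helper: the 'range(a, b)' string format is an assumption
    taken from the YAML shown in the question.
    """
    if isinstance(value, str):
        m = _RANGE_RE.match(value.strip())
        if m:
            return range(int(m.group(1)), int(m.group(2)))
    return value


print(parse_expected("range(0, 8)"))        # range(0, 8)
print(5 in parse_expected("range(0, 8)"))   # True
print(parse_expected(21))                   # 21
```

Parsing with a regular expression avoids calling `eval` on text that comes from a config file.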

import csv
import yaml

# Define the path to the input CSV file

input_file = data_file

# Define the path to the YAML config file

config_file = yaml_file

def filter_columns(df, yaml_file):
    with open(yaml_file) as f:
        config = yaml.safe_load(f)
    for row in df:
        if (row['A'] == config['general_1']['condition_1']['A'] and
            row['B'] == config['general_1']['condition_1']['B'] and
            row['C'] == config['general_1']['condition_1']['C'] and
            row['D'] in config['general_1']['condition_1']['D']):
            row['matched'] = 1
        elif (row['A'] == config['general_1']['condition_2']['A'] and
              row['B'] == config['general_1']['condition_2']['B'] and
              row['C'] == config['general_1']['condition_2']['C'] and
              row['D'] in config['general_1']['condition_2']['D']):
            row['matched'] = 1
        elif (row['A'] == config['general_1']['condition_3']['A'] and
              row['B'] == config['general_1']['condition_3']['B'] and
              row['C'] == config['general_1']['condition_3']['C'] and
              row['D'] in config['general_1']['condition_3']['D']):
            row['matched'] = 1
        else:
            row['matched'] = 0
    return df

# Read the input CSV file into a list of dictionaries

with open(input_file, 'r') as f:
    reader = csv.DictReader(f)
    data = [row for row in reader]

# Filter the columns according to the config file

filtered_data = filter_columns(data, config_file)

I don't know what I'm doing wrong. The function doesn't create a new column with the results.
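A few things in the code above probably explain the behaviour. `csv.DictReader` yields every value as a string, so `row['A'] == 1` is never true. The `C` values arrive from YAML as strings such as `"range(0, 8)"`, so an equality test against them cannot succeed. `row['D'] in 21` would raise a `TypeError` for the conditions where `D` is a single number. Finally, `data` is a list of dictionaries rather than a pandas DataFrame; with a real DataFrame, `for row in df` would iterate over column labels, not rows. Below is a minimal sketch of a generic matcher that loops over all conditions instead of spelling each one out. It assumes every compared column holds integers; the names `value_matches` and `flag_rows`, the inline `config` dictionary (shaped like what `yaml.safe_load` would return) and the sample CSV are all made up for illustration.

```python
import csv
import io
import re

RANGE_RE = re.compile(r"^range\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)$")


def value_matches(actual, expected):
    """Check one integer cell against one expected value from the config."""
    if isinstance(expected, str):
        m = RANGE_RE.match(expected.strip())
        if m:  # "range(a, b)" arrives from YAML as a plain string
            expected = range(int(m.group(1)), int(m.group(2)))
    if isinstance(expected, (list, range)):
        return actual in expected   # membership test for lists and ranges
    return actual == expected       # plain equality for scalars


def flag_rows(rows, conditions):
    """Set row['matched'] to 1 if the row satisfies any condition, else 0."""
    for row in rows:
        # csv.DictReader yields strings, so convert to int before comparing
        # (assumption: all compared columns contain integers).
        row["matched"] = int(any(
            all(value_matches(int(row[col]), exp) for col, exp in cond.items())
            for cond in conditions.values()
        ))
    return rows


# Illustrative stand-in for yaml.safe_load(...) on a config like the one above.
config = {
    "general_1": {
        "condition_1": {"A": 1, "B": 5, "C": "range(0, 8)", "D": 21},
        "condition_4": {"A": 1, "B": 6, "C": "range(3, 200)", "D": [21, 101, 102]},
    }
}

# Made-up sample data standing in for the input CSV file.
sample_csv = "A,B,C,D\n1,5,3,21\n1,6,50,101\n1,5,9,21\n2,6,50,101\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

flagged = flag_rows(rows, config["general_1"])
print([r["matched"] for r in flagged])  # [1, 1, 0, 0]
```

Because the loop runs over `conditions.values()`, adding the remaining 100+ conditions to the YAML file needs no code changes. If the data is loaded into a pandas DataFrame instead, the same idea could be expressed with boolean masks, for example with `Series.isin` for list values; note that `Series.between` includes both endpoints by default, whereas `range` excludes its stop value.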

python pandas dataframe bigdata