Determining Pass/Fail for Matching Row Pairs in a DataFrame and Showing the Result


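I have a DataFrame made up of row pairs (a Source row followed by a Target row), and I want to determine whether each pair matches. I need to add a new column that indicates whether the pair passes or fails the match condition.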

Obs | Dataset | Col1 | Col2 | Col3
----------------------------------
1   | Source  | A    | 10   | X
2   | Target  | A    | 10   | X
3   | Source  | B    | 20   | Y
4   | Target  | B    | 20   | Y
5   | Source  | C    | 30   | Z
6   | Target  | D    | 30   | Z

Desired output:

Obs | Dataset | Result | Col1 | Col2 | Col3
--------------------------------------------
1   | Source  | Pass   | A    | 10   | X
2   | Target  |        | A    | 10   | X
3   | Source  | Pass   | B    | 20   | Y
4   | Target  |        | B    | 20   | Y
5   | Source  | Fail   | C    | 30   | Z
6   | Target  |        | D    | 30   | Z

Code:

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill

class ExcelHighlighter:
    def __init__(self, file_path, sheet_name):
        self.file_path = file_path
        self.sheet_name = sheet_name
        self.light_green_fill = PatternFill(start_color='00FF00', end_color='00FF00', fill_type='solid')
        self.light_coral_fill = PatternFill(start_color='FF8080', end_color='FF8080', fill_type='solid')

    def highlight_and_save(self, output_path='output.xlsx'):
        df = pd.read_excel(self.file_path, sheet_name=self.sheet_name)

        with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
            df.to_excel(writer, index=False, sheet_name=self.sheet_name)
            sheet = writer.sheets[self.sheet_name]

            # Rows come in (Source, Target) pairs; i is the 0-based index of the Source row
            for i in range(0, df.shape[0] - 1, 2):
                # Sheet row 1 is the header, so DataFrame row i is sheet row i + 2
                source_row, target_row = i + 2, i + 3
                # Skip Obs and Dataset (columns 0 and 1); compare the value columns
                for col in range(2, df.shape[1]):
                    match = df.iloc[i, col] == df.iloc[i + 1, col]
                    fill = self.light_green_fill if match else self.light_coral_fill
                    sheet.cell(row=source_row, column=col + 1).fill = fill
                    sheet.cell(row=target_row, column=col + 1).fill = fill
            # The workbook is saved automatically when the with-block exits

from Highlighter import ExcelHighlighter
highlighter = ExcelHighlighter('input.xlsx', 'Sheet1')
highlighter.highlight_and_save()

How can I add the "Result" column (as the third column)?

python openpyxl
1 Answer

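The solution is to identify all the "Pass" rows (i.e. those whose values are present in both "Source" and "Target") and to treat everything else as "Fail".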

# break up the input into 2 separate dataframes
source_df = df[df['Dataset'] == 'Source']
target_df = df[df['Dataset'] == 'Target']

# apply pd.merge with an inner join, which keeps only the records that are present in both
pass_df = pd.merge(source_df, target_df, how='inner', on=['Col1', 'Col2', 'Col3'])

Everything else is a "Fail".
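To get from the merge to the "Result" column the question asks for, something along these lines might work. This is only a sketch built on the sample data from the question: the names `key_cols`, `merged` and `result` are made up for illustration, and it swaps the inner join for a left join with `indicator=True` so that failing Source rows are kept and can be labelled.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Obs": [1, 2, 3, 4, 5, 6],
    "Dataset": ["Source", "Target"] * 3,
    "Col1": ["A", "A", "B", "B", "C", "D"],
    "Col2": [10, 10, 20, 20, 30, 30],
    "Col3": ["X", "X", "Y", "Y", "Z", "Z"],
})

key_cols = ["Col1", "Col2", "Col3"]  # assumption: these are the columns to compare

source_df = df[df["Dataset"] == "Source"]
target_df = df[df["Dataset"] == "Target"]

# Left join keeps every Source row; "_merge" is "both" when the same
# values also exist in Target, and "left_only" otherwise.
# drop_duplicates() keeps the join one row per Source row.
merged = source_df.merge(
    target_df[key_cols].drop_duplicates(),
    how="left",
    on=key_cols,
    indicator=True,
)

# Blank for Target rows, Pass/Fail for Source rows
result = pd.Series("", index=df.index)
result.loc[source_df.index] = np.where(merged["_merge"] == "both", "Pass", "Fail")

# Insert as the third column (position 2 is 0-based)
df.insert(2, "Result", result)
print(df)
```

Note that, like the merge in the answer, this marks a Source row as "Pass" if any Target row has the same values, not only the Target row directly below it. If pairs must be compared strictly by position, the rows would have to be compared pair by pair instead.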
