Compare rows of two pandas DataFrames and apply conditional formatting to matching values

Question

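I think this is a minor problem, but I haven't managed to solve it in code. I have two DataFrames, df_diff and df_all. Both share the same key column (e.g. "Key"), but their other column names differ.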

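The code should iterate over the rows of df_diff, take each key value, find the row in df_all with that key value, and then iterate over all cells of that df_diff row, checking whether any cell matches a cell in the corresponding row of df_all.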

If there is a match, that cell should get a red background color.

Note that the column names differ between the two DataFrames, except for the "Key" column.
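
A minimal sketch of the lookup described above, assuming the key values are unique in df_all and that the cells of df_diff are the ones to highlight. The names `diff_match_styles`, `lookup` and `styles` are hypothetical. The function only builds a frame of CSS strings shaped like df_diff; passing that frame to `Styler.apply(..., axis=None)` applies it (rendering a Styler requires the optional jinja2 dependency).

```python
import pandas as pd

# Sample frames from the example input (values translated to English)
df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})


def diff_match_styles(df_diff, df_all, key="Key", css="background-color: red"):
    """Hypothetical helper: return a frame shaped like df_diff holding `css` for
    every cell whose value also appears in the df_all row with the same key.
    The key column itself is never styled."""
    # key -> set of non-key values in that df_all row (assumes unique keys)
    lookup = {row[key]: set(row.drop(key)) for _, row in df_all.iterrows()}
    styles = pd.DataFrame("", index=df_diff.index, columns=df_diff.columns)
    for idx, row in df_diff.iterrows():
        # Keys missing from df_all get an empty set, so nothing is styled
        values = lookup.get(row[key], set())
        for col in df_diff.columns:
            if col != key and row[col] in values:
                styles.at[idx, col] = css
    return styles


styles = diff_match_styles(df_diff, df_all)
# In a notebook (requires jinja2): df_diff.style.apply(lambda _: styles, axis=None)
```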

Here is an example input:

df_diff

Key     Column_1    Column_2
Key2    Value2      Value3
Key3    Value3      Value4
Key4    Value5      Value6

df_all

Key     Column_all_A    Column_all_B
Key2    Value8          Value2
Key3    Value3          Value10
Key6    Value0          Value11

Expected output:

Answer 1

Here is my answer to my own question:

import pandas as pd

# Sample data for df_diff
data_diff = {
    'Key': ['Key2', 'Key3', 'Key4'],
    'Column_1': ['Value2', 'Value3', 'Value5'],
    'Column_2': ['Value3', 'Value4', 'Value6']
}
df_diff = pd.DataFrame(data_diff)

# Sample data for df_all
data_all = {
    'Key': ['Key2', 'Key3', 'Key6'],
    'Column_all_A': ['Value8', 'Value3', 'Value0'],
    'Column_all_B': ['Value2', 'Value10', 'Value11']
}
df_all = pd.DataFrame(data_all)

# Function to find matching cells and apply a red background to df_all
def highlight_matching_cells(row_all):
    # Get the key value from the current row in df_all
    key_value = row_all['Key']

    # Filter the corresponding row in df_diff using the key value
    row_diff = df_diff[df_diff['Key'] == key_value]

    # If no matching row is found in df_diff, return no background color for all cells
    if row_diff.empty:
        return ['' for _ in row_all.index]

    # Values of the matching df_diff row (except the 'Key' column)
    diff_values = set(row_diff.drop(columns='Key').iloc[0])

    # Red background for every df_all cell (except 'Key') whose value appears in that row,
    # so several matches in the same row are all highlighted
    return ['background-color: red' if col != 'Key' and row_all[col] in diff_values else ''
            for col in row_all.index]

# Apply the function to each row in df_all
df_highlighted = df_all.style.apply(highlight_matching_cells, axis=1)

# Display the highlighted DataFrame
df_highlighted

This gives me the output I wanted:

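However, does anyone have a more elegant, shorter approach? I would like to define a styler() function containing the formatting condition and apply it to each matching cell with df.style.apply() or df.style.applymap().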
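
One possible shape for such a styler() function, as a sketch: `Styler.applymap` (renamed `Styler.map` in pandas 2.1) calls its function once per cell with only the cell value, so it cannot see which key the row has. A row-wise `Styler.apply(..., axis=1)` is a closer fit. The names `lookup`, `styler` and `highlight` are illustrative, and the keys in df_diff are assumed to be unique.

```python
import pandas as pd

df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})

# Precomputed once: key -> set of non-key values of that df_diff row
lookup = {k: set(v) for k, v in df_diff.set_index("Key").T.to_dict("list").items()}


def styler(row, color="red"):
    """Row-wise styler for Styler.apply(axis=1): one CSS string per cell of `row`."""
    matches = lookup.get(row["Key"], set())
    return [
        f"background-color: {color}" if col != "Key" and val in matches else ""
        for col, val in row.items()
    ]


def highlight(df):
    """Return a Styler (requires jinja2) with the matching cells highlighted."""
    return df.style.apply(styler, axis=1)
```

In a notebook, `highlight(df_all)` returns the styled frame. Building `lookup` up front avoids filtering df_diff once for every row.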

Answer 2

Here is one option, using a mapper with a list comprehension to build the styles:

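import pandas as pd

# Build the key -> values mapper once instead of once per cell
mapper = df_diff.set_index("Key").T.to_dict("list")

lstyles = [
    ["background-color:lightcoral" # <-- adjust the color here
    if v in mapper.get(k, []) else ""
    for v in vals] for k, *vals in df_all.values  # assumes "Key" is the first column
]

# Keep df_all's column order (Index.difference would sort the labels)
use_cols = [c for c in df_all.columns if c != "Key"]

out = (
    df_all.style.apply(lambda _: pd.DataFrame(lstyles, index=df_all.index, columns=use_cols),
                       axis=None, subset=use_cols)
)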

Output:
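
A vectorized variant, sketched under two assumptions: keys are unique in df_diff (`reindex` rejects duplicate labels), and comparing every df_all column against every df_diff column is affordable. It aligns df_diff to df_all by key and then compares with NumPy broadcasting. `match_styles` and `highlight_all` are hypothetical names; the `lightcoral` color is taken from the answer above.

```python
import numpy as np
import pandas as pd

df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})


def match_styles(df_all, df_diff, key="Key", css="background-color: lightcoral"):
    """Hypothetical helper: CSS frame for df_all's non-key columns. A cell is
    styled when its value appears anywhere in the df_diff row with the same key."""
    value_cols = [c for c in df_all.columns if c != key]
    # One df_diff row per df_all row, matched by key; missing keys become all-NaN rows
    aligned = df_diff.set_index(key).reindex(df_all[key]).to_numpy()
    values = df_all[value_cols].to_numpy()
    # Shapes (rows, n_all, 1) vs (rows, 1, n_diff) -> (rows, n_all, n_diff)
    hits = (values[:, :, None] == aligned[:, None, :]).any(axis=2)
    return pd.DataFrame(np.where(hits, css, ""), index=df_all.index, columns=value_cols)


def highlight_all(df_all, df_diff, key="Key"):
    """Return a Styler for df_all (requires jinja2) with matching cells highlighted."""
    value_cols = [c for c in df_all.columns if c != key]
    return df_all.style.apply(
        lambda _: match_styles(df_all, df_diff, key), axis=None, subset=value_cols
    )
```

The broadcast builds a boolean array of size rows × df_all columns × df_diff columns. For very wide frames, the row-wise approaches may therefore use less memory.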
