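I think this is a small problem, but I haven't managed to solve it in code. I have two DataFrames, df_diff and df_all. Both pandas DataFrames share the same key column (e.g. "Key"), but their other columns have different names.

The code should iterate over the rows of df_diff, take the key value, look up the row in df_all with that key, then go through every cell of that df_diff row and check whether it matches any cell in the corresponding df_all row.

If there is a match, that cell should get a red background.

Note that, apart from the "Key" column, the column names differ between the two DataFrames.

Here is an example input:

df_diff
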
Key | Column_1 | Column_2 |
---|---|---|
Key2 | Value2 | Value3 |
Key3 | Value3 | Value4 |
Key4 | Value5 | Value6 |

df_all
Key | Column_all_A | Column_all_B |
---|---|---|
Key2 | Value8 | Value2 |
Key3 | Value3 | Value10 |
Key6 | Value0 | Value11 |

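Before any styling, the matching rule can be pinned down as a boolean mask: a cell matches if its value appears anywhere in the other frame's row with the same key, regardless of column names. The sketch below is one vectorised way to compute that with NumPy broadcasting. `match_mask` is a hypothetical helper name, and it assumes each key occurs at most once per frame. The prose above asks to colour cells of df_diff, while the code that follows colours df_all, so the helper takes the frame to mark as its first argument and works in either direction.

```python
import pandas as pd

# Example data from the question
df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})


def match_mask(target: pd.DataFrame, other: pd.DataFrame, key: str = "Key") -> pd.DataFrame:
    """Hypothetical helper: True where a non-key cell of `target` equals any
    non-key cell of the `other` row that has the same key."""
    # Rows of `other` reordered to follow target's keys; keys missing from
    # `other` become all-NaN rows (assumes keys are unique in `other`)
    aligned = other.set_index(key).reindex(target[key]).to_numpy()
    return pd.DataFrame(
        {
            # Broadcast (n, 1) against (n, m): compare each target cell with
            # every cell of its aligned row, then ask "any match in this row?"
            col: (target[col].to_numpy()[:, None] == aligned).any(axis=1)
            for col in target.columns.drop(key)
        },
        index=target.index,
    )


print(match_mask(df_diff, df_all))  # cells of df_diff found in df_all
print(match_mask(df_all, df_diff))  # cells of df_all found in df_diff
```

The mask contains only the non-key columns, which is the shape you would pass as `subset` when styling.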
Here is my own solution:
```python
import pandas as pd

# Sample data for df_diff
data_diff = {
    'Key': ['Key2', 'Key3', 'Key4'],
    'Column_1': ['Value2', 'Value3', 'Value5'],
    'Column_2': ['Value3', 'Value4', 'Value6']
}
df_diff = pd.DataFrame(data_diff)

# Sample data for df_all
data_all = {
    'Key': ['Key2', 'Key3', 'Key6'],
    'Column_all_A': ['Value8', 'Value3', 'Value0'],
    'Column_all_B': ['Value2', 'Value10', 'Value11']
}
df_all = pd.DataFrame(data_all)

# Function to find matching cells and apply a red background to them in df_all
def highlight_matching_cells(row_all):
    # Start with no background color for every cell in this row of df_all
    styles = ['' for _ in row_all.index]
    # Get the key value from the current row in df_all
    key_value = row_all['Key']
    # Filter the corresponding row in df_diff using the key value
    row_diff = df_diff[df_diff['Key'] == key_value]
    # Check if a matching row is found in df_diff
    if not row_diff.empty:
        # Iterate over columns in df_all (except the 'Key' column, assumed to come first)
        for i, col_all in enumerate(row_all.index[1:], start=1):
            # Iterate over columns in the matching row of df_diff (except the 'Key' column)
            for col_diff in row_diff.columns[1:]:
                # Check if the cell value in df_all matches any cell value in the matching row of df_diff
                if row_all[col_all] == row_diff[col_diff].iloc[0]:
                    # Mark this df_all cell red, then move on to the next df_all cell
                    styles[i] = 'background-color: red'
                    break
    # Return one CSS string per cell (an empty string means no styling)
    return styles

# Apply the function to each row in df_all
df_highlighted = df_all.style.apply(highlight_matching_cells, axis=1)
# Display the highlighted DataFrame (renders when it is the last expression in a Jupyter cell)
df_highlighted
```
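However, does anyone know a more elegant, shorter approach? I would like to define a styler() function that contains the formatting condition and apply it to every matching cell with df.style.apply() or df.style.applymap().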
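One shorter pattern, sketched here as an assumption rather than a canonical recipe, is a single function for `Styler.apply(..., axis=None)` that returns a DataFrame of CSS strings with the same index and columns as the data. Instead of nested loops, it reshapes the lookup frame into (key, value) pairs with `melt` and tests each column's (key, cell) pairs with `MultiIndex.isin`. The function name `styler` and its `lookup`/`css` parameters are hypothetical; values are compared by plain equality.

```python
import pandas as pd

# Example data from the question
df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})


def styler(df: pd.DataFrame, lookup: pd.DataFrame, key: str = "Key",
           css: str = "background-color: red") -> pd.DataFrame:
    """Hypothetical styler: `css` wherever a cell's (key, value) pair also
    occurs in some non-key column of `lookup`, empty string elsewhere."""
    # Every (key, value) pair found in any non-key column of `lookup`
    long = lookup.melt(id_vars=key, value_name="_value")
    known_pairs = pd.MultiIndex.from_frame(long[[key, "_value"]])
    # Start with an empty style for every cell, including the key column
    styles = pd.DataFrame("", index=df.index, columns=df.columns)
    for col in df.columns.drop(key):
        # Pair each cell with its row's key and test membership in one call
        hit = pd.MultiIndex.from_arrays([df[key], df[col]]).isin(known_pairs)
        styles.loc[hit, col] = css
    return styles


print(styler(df_all, df_diff))
```

With Jinja2 installed (the Styler needs it for rendering), `df_all.style.apply(styler, axis=None, lookup=df_diff)` colours df_all like the loop version, because `Styler.apply` forwards extra keyword arguments to the function; `df_diff.style.apply(styler, axis=None, lookup=df_all)` colours df_diff instead. `Styler.applymap` was deprecated in pandas 2.1 in favour of `Styler.map`, but an element-wise function never sees the row's key, so `apply` with `axis=None` is the better fit for this lookup.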
Here is one option, which builds the styles with a mapper and a list comprehension:
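```python
# Map each key to the list of its values in df_diff, e.g. {"Key2": ["Value2", "Value3"], ...}
mapper = df_diff.set_index("Key").T.to_dict("list")

# One list of CSS strings per df_all row (assumes "Key" is the first column of df_all)
lstyles = [
    ["background-color:lightcoral"  # <-- adjust the color here
     if v in mapper.get(k, []) else ""
     for v in vals]
    for k, *vals in df_all.values
]

# Keep the original column order (Index.difference() would sort the labels)
use_cols = df_all.columns.drop("Key")

out = (
    df_all.style.apply(lambda _: pd.DataFrame(lstyles, index=df_all.index, columns=use_cols),
                       axis=None, subset=use_cols)
)
```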
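Output: the Key2 cell in Column_all_B (Value2) and the Key3 cell in Column_all_A (Value3) are highlighted; the Key6 row has no counterpart in df_diff, so it stays unstyled.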
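To see what the answer's comprehension works with, its two intermediate objects can be built and inspected without a Styler (rendering a Styler requires the optional Jinja2 dependency). The sketch below reuses the question's example data; `mapper_rows` and `css` are names introduced here for illustration only.

```python
import pandas as pd

# Example data from the question
df_diff = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key4"],
    "Column_1": ["Value2", "Value3", "Value5"],
    "Column_2": ["Value3", "Value4", "Value6"],
})
df_all = pd.DataFrame({
    "Key": ["Key2", "Key3", "Key6"],
    "Column_all_A": ["Value8", "Value3", "Value0"],
    "Column_all_B": ["Value2", "Value10", "Value11"],
})

# What the answer's lookup expression builds: key -> that row's other values
mapper = df_diff.set_index("Key").T.to_dict("list")
# {'Key2': ['Value2', 'Value3'], 'Key3': ['Value3', 'Value4'], 'Key4': ['Value5', 'Value6']}

# The same mapping without transposing, by unpacking each row tuple
mapper_rows = {k: list(vals) for k, *vals in df_diff.itertuples(index=False)}

# The answer's nested comprehension, with the colour pulled into a variable
css = "background-color:lightcoral"
lstyles = [
    [css if v in mapper.get(k, []) else "" for v in vals]
    for k, *vals in df_all.values  # k is the first column, so "Key" must come first
]
# [['', css], [css, ''], ['', '']]
```

Building the mapping with `itertuples` avoids the transpose, which on frames with mixed column dtypes first upcasts every value to `object`. Keys missing from df_diff (Key6 here) fall back to the empty list, so their rows stay unstyled.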