I have a Pandas DataFrame with several columns.
import pandas as pd

# Example DataFrame
df = pd.DataFrame({'A': [1, 15, 10, 47, 35],
                   'B': ["Mac", "Mac", "Mac", "Mac", "Mac"],
                   'C': ["Dog", "Dog", "Cat", "Dog", "Tiger"],
                   'D': ["CDN", "USD", "CDN", "Pe", "Dr"]
                   })
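I want to color each element in columns "B", "C", and "D" according to its relative frequency within its own column. For example, the relative frequency of "CDN" in column "D" is 2/5 = 0.4.
These are my color criteria, based on relative frequency:
Relative frequency | Color |
---|---|
Greater than or equal to 0.90 | Green |
Less than 0.90 and greater than or equal to 0.30 | Yellow |
Less than 0.30 | Red |
Since the relative frequency of "CDN" in column "D" is 0.4, that cell would get a yellow background.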
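
The rule in the table can be sketched in a few lines. This is only an illustration of the criteria, not the code from the question: frequency_color is a hypothetical helper name, and the sketch assumes value_counts(normalize=True) is an acceptable way to get each value's share of the column.

```python
import pandas as pd

def frequency_color(freq: float) -> str:
    """Map a relative frequency to a color name (hypothetical helper)."""
    if freq >= 0.90:
        return "green"
    if freq >= 0.30:
        return "yellow"
    return "red"

df = pd.DataFrame({
    "A": [1, 15, 10, 47, 35],
    "B": ["Mac", "Mac", "Mac", "Mac", "Mac"],
    "C": ["Dog", "Dog", "Cat", "Dog", "Tiger"],
    "D": ["CDN", "USD", "CDN", "Pe", "Dr"],
})

# Share of each distinct value in column "D" (the shares sum to 1)
rel_freq = df["D"].value_counts(normalize=True)

# Look up every cell's frequency, then translate it into a color name
colors = df["D"].map(rel_freq).map(frequency_color)
```

Here "CDN" occurs in 2 of 5 rows, so both "CDN" cells fall into the yellow band and the remaining cells of column "D" fall into the red band.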
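I know how to find the relative frequency of each element in a column and how to color an element.
My problem is that the style of one column keeps getting overwritten by the style of another. Here is my code: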
RemvColOfInterest = ['B', 'C', 'D']  # These are the columns whose elements we want to color
lstcollectionOverallRelFreqs = ['some relative frequencies']  # You don't have to worry about this
colIndexList = []  # This is the index of each of the columns in RemvColOfInterest
s = 0
while (s < len(RemvColOfInterest)):
    colIndexList.append(s)
    s = s + 1
tempdf = copy.copy(df)
for g, h in zip( RemvColOfInterest, colIndexList ):
    df = tempdf.style.applymap(highlight_cell, lstFreq = lstcollectionOverallRelFreqs, colIndex = h, subset = pd.IndexSlice[:, [g]])
# If I output my df to an excel file:
df.to_excel("My file path", index = False)

def highlight_cell(value, lstFreq, colIndex):
    Freq = determine_Freq(lstFreq[colIndex])  # All you need to know is that this is the function that finds the relative frequency associated with the element/cell
    threshold1 = 0.90
    threshold2 = 0.30
    if (Freq >= threshold1):
        return 'background-color: green;'
    elif ((Freq < threshold1) and (Freq >= threshold2)):
        return 'background-color: yellow;'
    else:
        return 'background-color: red;'
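In the Excel file, only the elements in column "D" have a background color. Columns "B" and "C" just have the usual white background. This leads me to believe that the styles for columns "B" and "C" are being overwritten by the style for column "D". How can I prevent this?
I believe this is the problematic line (inside the for loop, it causes the styling held in df to be replaced by a new style on every iteration):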
df = tempdf.style.applymap(highlight_cell, lstFreq = lstcollectionOverallRelFreqs, colIndex = h, subset = pd.IndexSlice[:, [g]])
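The thing is, when applying the style I only consider one column at a time (via the subset argument). So why do the styles of different columns overwrite each other? If I do this instead: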
df[g] = tempdf.style.applymap(highlight_cell, lstFreq = lstcollectionOverallRelFreqs, colIndex = h, subset = pd.IndexSlice[:, [g]])
then for every cell in columns "B", "C", and "D" I get pandas.io.formats.style.Styler object at 0x00000... as the cell value. Any pointers/suggestions?
The output Excel file for the example should look like this:

One way to do this with conditional formatting (using xlsxwriter):
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 15, 10, 47, 35],
        "B": ["Mac", "Mac", "Mac", "Mac", "Mac"],
        "C": ["Dog", "Dog", "Cat", "Dog", "Tiger"],
        "D": ["CDN", "USD", "CDN", "Pe", "Dr"],
    }
)

writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Sheet1", index=False)

workbook = writer.book
worksheet = writer.sheets["Sheet1"]

# Cell formats for the three frequency bands: green, yellow, red
format1 = workbook.add_format({"bg_color": "#98fb98", "font_color": "#111111"})
format2 = workbook.add_format({"bg_color": "#ffff31", "font_color": "#111111"})
format3 = workbook.add_format({"bg_color": "#fe2712", "font_color": "#111111"})

# The column names double as Excel column letters here (A..D, in order)
for c in ['B', 'C', 'D']:
    # Relative frequency of each distinct value in the column
    vals = df[c].value_counts() / len(df)
    for i, v in zip(vals.index, vals):
        # >= so that the boundaries match the thresholds in the question
        f = {
            "type": "cell",
            "criteria": "==",
            "value": f'"{i}"',
            "format": format1 if v >= 0.9 else (format2 if v >= 0.3 else format3),
        }
        # Data starts in row 2 because row 1 holds the header
        r = f"{c}2:{c}{len(df)+1}"
        worksheet.conditional_format(r, f)

writer.close()
This creates out.xlsx (screenshot from LibreOffice):
EDIT: openpyxl version:
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 15, 10, 47, 35],
        "B": ["Mac", "Mac", "Mac", "Mac", "Mac"],
        "C": ["Dog", "Dog", "Cat", "Dog", "Tiger"],
        "D": ["CDN", "USD", "CDN", "Pe", "Dr"],
    }
)

# CSS for the three frequency bands: green, yellow, red
format1 = "background-color: #98fb98; color: #111111"
format2 = "background-color: #ffff31; color: #111111"
format3 = "background-color: #fe2712; color: #111111"


def fn(x):
    # Called once per column; column "A" is left unstyled
    if x.name == "A":
        return [""] * len(x)

    # Relative frequency of each distinct value in the column
    vals = x.value_counts() / len(x)
    # >= so that the boundaries match the thresholds in the question
    return [
        format1 if v >= 0.9 else (format2 if v >= 0.3 else format3)
        for v in map(vals.get, x)
    ]


with pd.ExcelWriter("out.xlsx", engine="openpyxl") as writer:
    df.style.apply(fn).to_excel(writer, index=False, sheet_name="Sheet1")
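
A note on why the loop in the question colors only column "D": tempdf.style builds a new Styler every time it is accessed, so each iteration starts again from an unstyled Styler and df ends up holding only the last one. Below is a minimal sketch of one way around that, assuming the relative frequency is computed per column with value_counts. The names color_by_relative_frequency and style_columns are made up for illustration and are not from the question.

```python
import pandas as pd

GREEN = "background-color: green"
YELLOW = "background-color: yellow"
RED = "background-color: red"

def color_by_relative_frequency(column: pd.Series) -> list:
    """Return one CSS string per cell, based on the value's share of the column."""
    rel_freq = column.map(column.value_counts(normalize=True))
    return [
        GREEN if f >= 0.90 else YELLOW if f >= 0.30 else RED
        for f in rel_freq
    ]

def style_columns(df: pd.DataFrame, columns: list):
    """Style several columns on ONE Styler so that no column's colors are lost."""
    styler = df.style  # access .style once; every access creates a fresh Styler
    for col in columns:
        # Styler.apply returns the same Styler, so the styles accumulate
        styler = styler.apply(color_by_relative_frequency, subset=[col])
    return styler
```

With the example DataFrame, style_columns(df, ["B", "C", "D"]).to_excel("out.xlsx", index=False) should then write the colors for all three columns, provided an Excel engine such as openpyxl is installed. The loop is kept only to mirror the question; passing subset=["B", "C", "D"] to a single apply call would do the same thing.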