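I want to add a new column to df based on the values of other columns. If columns c1-c3 contain only one unique value, that value should go into column c4. If columns c1-c3 contain two different values, "both" should go into c4. NaN should not count as a valid value. Only c2 and c3 contain NaN.
Minimal example: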
import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"],
})
Desired df:
answerdf = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"],
    "c4": ["left", "right", "both", "both", "left", "right"],
})
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", np.nan, "right"],
    "c3": [np.nan, np.nan, "left", np.nan, "left", "right"],
})

def worker(row):
    if "left" in row.values and "right" in row.values:
        return "both"
    if "left" in row.values:
        return "left"
    if "right" in row.values:
        return "right"
    return np.nan

df["c4"] = df[["c1", "c2", "c3"]].apply(worker, axis=1)
This returns NaN if a row contains neither left nor right, and it may be easier to follow.
Output:
      c1     c2     c3     c4
0   left   left    NaN   left
1  right  right    NaN  right
2  right  right   left   both
3   left  right    NaN   both
4   left    NaN   left   left
5  right  right  right  right
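To illustrate the NaN fallback mentioned above, here is a small sketch that applies the same worker function to a hypothetical frame, demo, whose second row contains no valid value at all. The demo data is an assumption made for illustration; it is not part of the original question.

```python
import numpy as np
import pandas as pd

def worker(row):
    # Same logic as above: NaN entries never match "left" or "right"
    if "left" in row.values and "right" in row.values:
        return "both"
    if "left" in row.values:
        return "left"
    if "right" in row.values:
        return "right"
    return np.nan

# Hypothetical data: row 0 has both labels, row 1 has only NaN
demo = pd.DataFrame({
    "c1": ["left", np.nan],
    "c2": ["right", np.nan],
    "c3": [np.nan, np.nan],
})
demo["c4"] = demo[["c1", "c2", "c3"]].apply(worker, axis=1)
print(demo)
```

Row 0 should come out as "both", while row 1 falls through every check and stays NaN.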
Use apply with a lambda function to create column c4 by checking the number of unique values.
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", np.nan, "right"],
    "c3": [np.nan, np.nan, "left", np.nan, "left", "right"],
})
# Series.map with a dict does not call functions stored as values, so the
# single-value case is resolved inside apply instead.
# c1 never contains NaN, so row.dropna() is never empty here.
df["c4"] = df[["c1", "c2", "c3"]].apply(
    lambda row: row.dropna().iloc[0] if row.nunique(dropna=True) == 1 else "both",
    axis=1,
)
print(df)
Output:
      c1     c2     c3     c4
0   left   left    NaN   left
1  right  right    NaN  right
2  right  right   left   both
3   left  right    NaN   both
4   left    NaN   left   left
5  right  right  right  right
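The same unique-count idea can also be written without a row-wise apply. The following is only a sketch under one assumption: every row has at least one non-NaN value (true here, since c1 has no NaN). Rows with no valid value at all would be labelled "both" by this version. The helper names cols, n_unique and first_valid are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", np.nan, "right"],
    "c3": [np.nan, np.nan, "left", np.nan, "left", "right"],
})

cols = ["c1", "c2", "c3"]
# Number of distinct non-NaN values per row (NaN is dropped by default)
n_unique = df[cols].nunique(axis=1)
# Back-fill along each row so the first column holds the first non-NaN value
first_valid = df[cols].bfill(axis=1).iloc[:, 0]
# Keep that value where the row has exactly one distinct value, else "both"
df["c4"] = first_valid.where(n_unique.eq(1), "both")
print(df)
```

On a large frame this may be faster than apply with axis=1, because the work is done column-wise rather than one Python call per row.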
Try this:
# convert the string 'NaN' to a real np.nan
df = df.where(df.ne('NaN'))
# take the max value per row; rows with more than one unique value become 'both'
df.stack().groupby(level=0).agg('max').where(df.nunique(axis=1).eq(1), 'both')
Or:
g = df.where(df.ne('NaN')).stack().groupby(level=0)
np.where(g.nunique().ne(2), g.first(), 'both')
Or:
(
    pd.get_dummies(df.where(df.ne('NaN')).stack())
    .groupby(level=0).any()
    .mul([1, 2])
    .sum(axis=1)
    .map(dict(enumerate(['neither', 'left', 'right', 'both'])))
)
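The get_dummies variant is dense, so here is a step-by-step sketch of what it appears to do, starting from the question's frame with the string 'NaN'. It assumes both "left" and "right" occur somewhere in the data, so that both indicator columns exist. The intermediate names (cleaned, stacked, present, code, labels) are invented for the example, and the explicit dropna() is added as a precaution in case stack() keeps missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"],
})

# 1. Replace the string 'NaN' with a real missing value
cleaned = df.where(df.ne("NaN"))
# 2. Long format: one entry per (row, column) pair, missing values removed
stacked = cleaned.stack().dropna()
# 3. Indicator columns 'left' and 'right' for every entry
dummies = pd.get_dummies(stacked)
# 4. Per original row: does 'left' occur? does 'right' occur?
present = dummies.groupby(level=0).any()
# 5. Encode the two flags as 0..3 (left counts 1, right counts 2)
code = present["left"].astype(int) + 2 * present["right"].astype(int)
# 6. Translate the code back into a label
labels = {0: "neither", 1: "left", 2: "right", 3: "both"}
df["c4"] = code.map(labels)
print(df)
```

Unlike the nunique-based variants, this encoding depends on the two labels being exactly "left" and "right", but it makes the "neither" case explicit.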