Create a new column based on values in other columns (pandas)

Question

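I want to add a new column to df based on the values of other columns. If columns c1-c3 contain only one unique value, that value goes into column c4. If columns c1-c3 contain two different values, "both" goes into column c4. NaN should not be counted as a valid value. Only c2 and c3 contain some NaN.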

Minimal example:

import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"]})

Desired df:

answerdf = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"],
    "c4": ["left", "right", "both", "both", "left", "right"]})
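Note that the sample frame stores missing entries as the literal string "NaN", which pandas treats as ordinary text, not as a missing value. A minimal sketch, assuming the question's data as given, of converting those strings to real NaN with `DataFrame.replace` before applying any of the answers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"],
})

# The string "NaN" is just text, so nothing is missing yet
print(int(df.isna().sum().sum()))     # 0

# Replace the literal string with a real missing value
clean = df.replace("NaN", np.nan)
print(int(clean.isna().sum().sum()))  # 4 (one in c2, three in c3)
```

Answers 1 and 2 build their frames with `np.nan` directly, which has the same effect.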

Answer 1

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", np.nan, "right"],
    "c3": [np.nan, np.nan, "left", np.nan, "left", "right"]
})

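def worker(row):
    # row.values holds the c1-c3 entries of one row; NaN never matches "left" or "right"
    if "left" in row.values and "right" in row.values:
        return "both"
    if "left" in row.values:
        return "left"
    if "right" in row.values:
        return "right"
    return np.nan

# axis=1 passes each row (as a Series) to worker
df["c4"] = df[["c1", "c2", "c3"]].apply(worker, axis=1)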

This returns NaN if neither left nor right is present, and it is probably easier to understand.
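`apply(worker, axis=1)` calls a Python function once per row. A vectorized sketch of the same left/right/both/NaN rule, using boolean masks built with `DataFrame.eq`, `any(axis=1)` and `Series.mask` (a swapped-in technique, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", np.nan, "right"],
    "c3": [np.nan, np.nan, "left", np.nan, "left", "right"],
})

cols = df[["c1", "c2", "c3"]]
has_left = cols.eq("left").any(axis=1)    # row contains at least one "left"
has_right = cols.eq("right").any(axis=1)  # row contains at least one "right"

# Start from NaN (neither) and overwrite; the last mask wins, so "both" is applied last
c4 = pd.Series(np.nan, index=df.index, dtype=object)
c4 = c4.mask(has_left, "left").mask(has_right, "right").mask(has_left & has_right, "both")

df["c4"] = c4
print(df["c4"].tolist())  # ['left', 'right', 'both', 'both', 'left', 'right']
```

NaN cells compare unequal to both strings, so they never set either mask, matching the behavior of `worker`.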

Output

      c1     c2     c3     c4
0   left   left    NaN   left
1  right  right    NaN  right
2  right  right   left   both
3   left  right    NaN   both
4   left    NaN   left   left
5  right  right  right  right

Answer 2

Use apply with a lambda function to count the unique values in each row, and create column c4 from that count.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", np.nan, "right"],
    "c3": [np.nan, np.nan, "left", np.nan, "left", "right"]
})

cols = df[["c1", "c2", "c3"]]

# Number of distinct non-NaN values in each row
unique_values = lambda x: x.nunique(dropna=True)
n_unique = cols.apply(unique_values, axis=1)

# First non-NaN value in each row (the only value when n_unique == 1)
first_value = cols.bfill(axis=1).iloc[:, 0]

# Keep that value where the row has exactly one unique value, otherwise "both"
df["c4"] = first_value.where(n_unique.eq(1), "both")

print(df)

Output:

      c1     c2     c3     c4
0   left   left    NaN   left
1  right  right    NaN  right
2  right  right   left   both
3   left  right    NaN   both
4   left    NaN   left   left
5  right  right  right  right
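Passing a dict such as `{1: lambda x: x.index[0], 2: "both"}` to `Series.map` does not call the lambda: with a dict mapper, `map` substitutes the stored values as-is, so rows whose count is 1 receive the function object itself. A small demonstration of the difference between a dict mapper and a function mapper:

```python
import pandas as pd

counts = pd.Series([1, 2, 1])

# Dict mapper: values are inserted verbatim, so the lambda itself lands in the result
as_dict = counts.map({1: lambda x: x.index[0], 2: "both"})
print(callable(as_dict[0]))   # True

# Function mapper: the callable is applied to each element instead
as_func = counts.map(lambda n: "single" if n == 1 else "both")
print(as_func.tolist())       # ['single', 'both', 'single']
```

Even as a function mapper, the lambda would only see the count, not the row, which is why the row's value has to be taken from the frame separately.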

Answer 3

Try this:

# convert the string 'NaN' to a real missing value (np.nan)
df = df.where(df.ne('NaN'))

# take the max value per row; in rows with more than one unique value, use 'both' instead
df.stack().groupby(level=0).agg('max').where(df.nunique(axis=1).eq(1), 'both')

Or

g = df.where(df.ne('NaN')).stack().groupby(level=0)

np.where(g.nunique().ne(2),g.first(),'both')

Or

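(pd.get_dummies(df.where(df.ne('NaN')).stack())  # indicator columns 'left' and 'right'
   .groupby(level=0).any()                         # does the row contain left / right?
   .mul([1, 2]).sum(axis=1)                        # 0 = neither, 1 = left, 2 = right, 3 = both
   .map(dict(enumerate(['neither', 'left', 'right', 'both']))))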
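The snippets in this answer assume the question's `df` and return the result without assigning it. A runnable sketch that combines the first two variants with the question's data and assigns c4 (the setup, the `.dropna()` call and the assignment are additions; `.dropna()` keeps the result the same whether or not the installed pandas version's `stack()` drops missing values):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": ["left", "right", "right", "left", "left", "right"],
    "c2": ["left", "right", "right", "right", "NaN", "right"],
    "c3": ["NaN", "NaN", "left", "NaN", "left", "right"],
})

# Turn the literal string "NaN" into a real missing value
df = df.where(df.ne("NaN"))

# Long format: one entry per (row label, column) cell, missing cells removed
g = df.stack().dropna().groupby(level=0)

# Variant 1: the per-row max is the single value when the row has one unique value
v1 = g.agg("max").where(df.nunique(axis=1).eq(1), "both")

# Variant 2: first non-missing value unless the row has two distinct values
v2 = np.where(g.nunique().ne(2), g.first(), "both")

df["c4"] = v1
print(df["c4"].tolist())     # ['left', 'right', 'both', 'both', 'left', 'right']
print(list(v2) == list(v1))  # True
```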

Votes: 0, Answers: 3