Python pandas: dropping duplicates with an if/else condition

Question

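Note that I have simplified the example dataset. In reality I have about 60 attributes, and they are not numeric.

I have a dataframe with two columns (location, attribute). Some locations are duplicated, and for those duplicates I want to keep all rows that do not have a specific value in the attribute column. However, if no value other than that specific one is available, keep the row with that specific value.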

import pandas as pd

location = ['A', 'B', 'B', 'C', 'C']
attribute = ['1', '2', '3', '2', '2']

df = pd.DataFrame({'location':location, 'attribute':attribute})

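Expected output:

In other words, since I have many attributes, I want to prefer every attribute value over one specific value, and only take the rows with the "not preferred" value when no other value is available. In the table above, that is attribute value 2.

How can I do this?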

Answer 1

I've added another attribute column, since you have multiple attributes.

Sample

import pandas as pd

location = ['A', 'B', 'B', 'C', 'C']
attribute = ['1', '2', '3', '2', '2']
attribute2 = ['1', '2', '10', '10', '10']

df = pd.DataFrame({'location':location, 
                   'attribute':attribute,
                   'attribute2':attribute2
                   })

df

  location attribute attribute2
0        A         1          1
1        B         2          2 # attr == 2, not preferred
2        B         3         10 # attr2 == 10, not preferred
3        C         2         10
4        C         2         10

Code

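# Weight per attribute, in column order: 'attribute' counts twice as much as 'attribute2'.
weights = [2, 1]

# Score each row: a column contributes its weight when its value is not the unwanted one.
attributes = (pd.concat([df['attribute'].ne('2'),
                         df['attribute2'].ne('10')
                         ], 
                        axis=1)
              .mul(weights)
              .sum(axis=1)
              )

# Index label of the highest-scoring row per location (the first row wins ties).
idx_values = attributes.groupby(df.location).idxmax()

# iloc works here because df has the default 0..n-1 index, so labels equal positions.
out = df.iloc[idx_values, :]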

Output

  location attribute attribute2
0        A         1          1
2        B         3         10
3        C         2         10

Explanation

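First, use `pd.concat` to join the boolean Series for all attributes (on `axis=1`).
If you prefer some attributes over others, adjust the `weights` list. In this example, `attribute` has a weight of 2, which means that `attribute != 2` is considered more important than `attribute2 != 10`. In other words: when two rows of the same location each hold one unwanted value, you prefer the row with `attribute2 == 10` over the row with `attribute == 2`. These `weights` are applied by multiplying with `df.mul`. If you have no particular sub-preference, skip this step and call `df.sum` on `axis=1` right away.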
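The unweighted variant (skipping the `weights` step) can be sketched as follows. This is a minimal illustration that assumes the answer's sample frame; the names `not_preferred`, `flags`, `scores`, and `out_unweighted` are made up for this sketch and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'location':   ['A', 'B', 'B', 'C', 'C'],
    'attribute':  ['1', '2', '3', '2', '2'],
    'attribute2': ['1', '2', '10', '10', '10'],
})

# Hypothetical mapping: column name -> value that is not preferred in that column.
not_preferred = {'attribute': '2', 'attribute2': '10'}

# One boolean column per attribute: True where the value is acceptable.
flags = pd.concat([df[col].ne(val) for col, val in not_preferred.items()], axis=1)

# No weights: every acceptable attribute simply counts as 1.
scores = flags.sum(axis=1)

# Index label of the highest-scoring row per location.
best = scores.groupby(df['location']).idxmax()
out_unweighted = df.loc[best]
```

Without weights, the two B rows tie at a score of 1, and `idxmax` keeps the first of them (index 1) rather than index 2. That tie is the case the `weights` list is meant to settle.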
Next, take the resulting `Series`, apply `Series.groupby` with `df.location` as the key, and call `groupby.idxmax`. This returns, for each location, the index value of the row with the highest score.
Finally, use these index values to select the matching rows from the original `df` with `df.iloc`.
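One caveat about the last step: `idxmax` returns index labels, not positions, so positional selection only lines up when the frame has the default integer index. The sketch below assumes a hypothetical frame with a non-default index (10 to 14, invented for illustration) and a single attribute column, and selects by label with `df.loc` instead.

```python
import pandas as pd

df = pd.DataFrame(
    {'location':  ['A', 'B', 'B', 'C', 'C'],
     'attribute': ['1', '2', '3', '2', '2']},
    index=[10, 11, 12, 13, 14],  # hypothetical non-default index
)

# 1 where the attribute is acceptable, 0 where it holds the unwanted value '2'.
scores = df['attribute'].ne('2').astype(int)

# idxmax yields labels from df.index (10, 12, 13), not row positions.
labels = scores.groupby(df['location']).idxmax()

# Label-based selection matches those labels regardless of the index type.
out_loc = df.loc[labels]
```

On this frame, `df.iloc[labels]` would look for positions 10, 12, and 13 and fail, because the frame only has five rows.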
