Index of duplicated values in pandas

Question

I am using pandas in Python, and with

duplicated(keep=False)

I checked for duplicates among the values of a single DataFrame column. The result I get is:

5063     True
5064     True
5065     True
5066    False
5067    False
        ...  
5310    False
5311    False
5312     True
5313    False
5314    False

Now I want to find the index labels of the True values. I thought this would work:

duplicateRowsDF = df.duplicated(keep=False)
for j in range (0, len(duplicateRowsDF)):
    attempt=str(duplicateRowsDF.iloc[j])
    if attempt=="True":
        duplicated_index=duplicateRowsDF.index

But I am getting:

Index([5063, 5064, 5065, 5066, 5067, 5068, 5069, 5070, 5071, 5072,
   ...
   5305, 5306, 5307, 5308, 5309, 5310, 5311, 5312, 5313, 5314],
  dtype='int64', length=252)

My goal, however, is to build an "array" containing 5063, 5064, 5065 and 5312. I also tried the code

df[df.index.duplicated(keep=False)]

but it does not seem to give what I expect.

Answer 1

You can filter the index values with boolean indexing:

out = df.index[df.index.duplicated(keep=False)]
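One thing worth checking with this form: df.index.duplicated flags repeated index labels, while df.duplicated flags repeated row values, and the question seems to be about the latter. The sketch below uses made-up data (the column name "a" and the labels 10 to 14 are assumptions for illustration) to show how the two masks can differ.

```python
import pandas as pd

# Hypothetical data: unique index labels, but repeated values in column "a".
df = pd.DataFrame({"a": [1, 1, 2, 3, 3]}, index=[10, 11, 12, 13, 14])

# Mask built from the index: True only where a *label* occurs more than once.
# All labels are unique here, so nothing is selected.
by_label = df.index[df.index.duplicated(keep=False)]

# Mask built from the rows: True where the row *values* occur more than once.
by_value = df.index[df.duplicated(keep=False)]

print(by_label.tolist())  # []
print(by_value.tolist())  # [10, 11, 13, 14]
```

If the index labels themselves are what repeats, the first form is the right one; if the column values repeat, the second form is probably what is wanted.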

Answer 2

Simply perform boolean indexing on the index:

out = duplicateRowsDF.index[duplicateRowsDF]

Or, without the intermediate variable:

out = df.index[df.duplicated(keep=False)]
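As a minimal end-to-end sketch, assuming a small made-up frame whose labels resemble those in the question (the column name "value" and its contents are hypothetical), the boolean mask can be applied to the index and then converted to a NumPy array. The loop in the question returned every label because duplicateRowsDF.index is the whole index regardless of j; the mask picks out only the matching labels.

```python
import pandas as pd

# Hypothetical column of values; labels chosen to resemble the question.
df = pd.DataFrame(
    {"value": ["x", "x", "x", "y", "z", "x", "w"]},
    index=[5063, 5064, 5065, 5066, 5067, 5312, 5313],
)

# True for every row whose values appear more than once in the frame.
mask = df.duplicated(keep=False)

# Keep only the index labels where the mask is True.
dup_index = df.index[mask]

# Convert to a plain NumPy array if an "array" is what is needed.
dup_array = dup_index.to_numpy()

print(dup_array.tolist())  # [5063, 5064, 5065, 5312]
```

To check a single column rather than whole rows, the same pattern should work with df["value"].duplicated(keep=False) as the mask.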
