Differentiate duplicate column names by adding integers using pandas

Question

I have some columns that share the same name. I would like to append a 1 to the repeated column names.

Data

Date        Type    hi  hello   stat    hi  hello   
1/1/2022    a       0   0       1       1   0

Desired

Date        Type    hi  hello   stat    hi1     hello1  
1/1/2022    a       0   0       1       1       0

Doing

mask = df['col2'].duplicated(keep=False)

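I believe I can use a mask, but I am not sure how to do this efficiently without referring to the actual columns by name. I would like to pass in the full dataset and let the algorithm rename the dupes.

Any suggestion is appreciated.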

Answer 1

Note that as of pandas 2.x, tdy's solution no longer works, because the function it relied on was refactored into pandas.io.common.dedup_names (see https://github.com/pandas-dev/pandas/issues/50371):

import pandas as pd

df.columns = pd.io.common.dedup_names(df.columns, is_potential_multiindex=False)

Answer 2

New in pandas 2.0

Use the new built-in io.common.dedup_names():

df.columns = pd.io.common.dedup_names(df.columns, is_potential_multiindex=False)

#        Date  Type  hi  hello  stat  hi.1  hello.1
# 0  1/1/2022     a   0      0     1     1        0

Note that it scales to any number of duplicate names:

cols = ['hi']*3 + ['hello']*5
pd.io.common.dedup_names(cols, is_potential_multiindex=False)

# ['hi', 'hi.1', 'hi.2', 'hello', 'hello.1', 'hello.2', 'hello.3', 'hello.4']

For pandas < 2.0

The previous method was io.parsers.base_parser._maybe_dedup_names():

df.columns = pd.io.parsers.base_parser.ParserBase({'usecols': None})._maybe_dedup_names(df.columns)

For pandas < 1.3

The original method was io.parsers._maybe_dedup_names():

df.columns = pd.io.parsers.ParserBase({})._maybe_dedup_names(df.columns)
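
The pandas helpers above appear to be internal and have moved between releases, so a version-independent alternative may be useful. The following is a minimal sketch in plain Python, not part of the original answer: dedup_column_names and its sep parameter are hypothetical names chosen for illustration, and the function assumes none of the generated names (for example hi.1) already exists among the columns.

```python
from collections import Counter


def dedup_column_names(names, sep="."):
    """Append '<sep><n>' to the 2nd, 3rd, ... occurrence of each name.

    Hypothetical helper for illustration; it does not check whether a
    generated name collides with a name that is already present.
    """
    seen = Counter()  # how many times each name has been seen so far
    result = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(f"{name}{sep}{count}" if count else name)
        seen[name] += 1
    return result


cols = ["Date", "Type", "hi", "hello", "stat", "hi", "hello"]
print(dedup_column_names(cols))
# ['Date', 'Type', 'hi', 'hello', 'stat', 'hi.1', 'hello.1']
print(dedup_column_names(cols, sep=""))
# ['Date', 'Type', 'hi', 'hello', 'stat', 'hi1', 'hello1']
```

On a DataFrame this would be applied as df.columns = dedup_column_names(df.columns). Passing sep="" gives the hi1 / hello1 form requested in the question.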

Answer 3

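You need to apply the duplicated operation to the column names, then map the resulting booleans to a string suffix that you can append to the original column names.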

df.columns = df.columns+[{False:'',True:'1'}[x] for x in df.columns.duplicated()]
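
Because duplicated() marks every repeat after the first as True, the one-liner above gives a third hi column the same name, hi1, as the second. If more than two columns can share a name, one possible variation is to number each repeat with groupby and cumcount. The sketch below is an assumption-laden illustration rather than this answer's method; the DataFrame is rebuilt from the question's sample row so the block runs on its own.

```python
import pandas as pd

# Sample frame from the question, with 'hi' and 'hello' repeated.
df = pd.DataFrame(
    [["1/1/2022", "a", 0, 0, 1, 1, 0]],
    columns=["Date", "Type", "hi", "hello", "stat", "hi", "hello"],
)

names = pd.Series(df.columns)
# 0 for the first occurrence of a name, 1 for the second, 2 for the third, ...
occurrence = names.groupby(names).cumcount()
# Keep the counter only for repeats; first occurrences get an empty suffix.
suffix = occurrence.astype(str).where(occurrence > 0, "")
df.columns = (names + suffix).tolist()

print(df.columns.tolist())
# ['Date', 'Type', 'hi', 'hello', 'stat', 'hi1', 'hello1']
```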

Answer 4

We can do this with groupby and cumcount:

s = df.columns.to_series().groupby(df.columns).cumcount().replace({0:''}).astype(str).radd('.')
df.columns = (df.columns + s).str.strip('.')
df
Out[153]: 
       Date Type  hi  hello  stat  hi.1  hello.1
0  1/1/2022    a   0      0     1     1        0
