Replace strings in a column with names numbered in ascending order

Question

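I have a DataFrame with a column named ID that contains strings such as "10387-14_S91_L001" and "1590-13H-caso14_S76". There are 20 distinct IDs in total, repeated across the 107 rows of the DataFrame.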

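I want to replace every string that contains the substring "caso" with just "caso00", where the attached number must increase in ascending order; strings that do not contain "caso" must be replaced with "controle00", also numbered in ascending order.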

I tried listing only the column's unique() values, but could not get any further.

Answer 1

If I understand correctly, you can use str.replace with a regular expression:

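# Zero-pad with zfill: the ':02' format spec on a string pads on the right
# (and raises ValueError before Python 3.10).
df['out'] = df['ID'].str.replace(r'([a-zA-Z]+)(\d*)(?=_[^_]*$)',
                                 lambda m: f'caso{m.group(2).zfill(2)}'
                                 if m.group(1) == 'caso' else 'controle00',
                                 regex=True)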

Output:

                    ID                      out
0     0387-14_S91_L001  0387-14_controle00_L001
1  1590-13H-caso14_S76      1590-13H-caso14_S76
2         123-caso_456           123-caso00_456

Regex demo:

([a-zA-Z]+)  # match (and capture) letters 
(\d*)        # match (and capture) optional digits
(?=_[^_]*$)  # ensure there is a single _ and no other until the end
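
The regex above rewrites part of each string in place; it does not assign a running number to each distinct ID. If the goal is instead to relabel the 20 unique IDs as caso01, caso02, ... and controle01, controle02, ..., one possible approach is to build a lookup table from the unique values and map it back onto the column. The following is only a minimal sketch under assumptions: "ascending order" is taken to mean order of first appearance, numbering starts at 01, and the sample IDs and the helper number_ids are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real frame has 20 unique IDs over 107 rows.
df = pd.DataFrame({"ID": [
    "10387-14_S91_L001",
    "1590-13H-caso14_S76",
    "10387-14_S91_L001",
    "2201-15_S12_L001",      # made-up control ID
    "1711-13H-caso03_S80",   # made-up case ID
    "1590-13H-caso14_S76",
]})

# True where the ID contains the literal substring "caso".
is_caso = df["ID"].str.contains("caso", regex=False)


def number_ids(ids: pd.Series, prefix: str) -> dict:
    """Map each unique ID to prefix + two-digit counter (order of first appearance)."""
    return {uid: f"{prefix}{i:02d}" for i, uid in enumerate(ids.unique(), start=1)}


# One lookup table covering both groups, each numbered independently.
mapping = {
    **number_ids(df.loc[is_caso, "ID"], "caso"),
    **number_ids(df.loc[~is_caso, "ID"], "controle"),
}

# Repeated IDs receive the same label on every row.
df["out"] = df["ID"].map(mapping)
print(df)
```

To number the IDs in sorted order rather than by first appearance, sorted(ids.unique()) could be passed to enumerate instead.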
