How do I remove a specific part of the text from a string?

Question

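I'm using Python. I have a DataFrame with a string column 'description'. The strings contain clause references such as "п.5.6.2 ГОСТ 2.114-2016", "п.4.1 ГОСТ 2.102-2013", "п.5 ГОСТ Р 51672-2000", and so on. I want to remove them.

I tried this code: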

import pandas as pd
import re

# Sample DataFrame
data = {'description': ["п. 5.6.2 ГОСТ 2.114-2016",
                        "п. 4.1 ГОСТ 2.102-2013",
                        "п.5 ГОСТ Р 51672-2000"]}

df = pd.DataFrame(data)

# Define the regex pattern to match the specified format
pattern = r'п\.\s*\d+(?:\.\d+)*\s*ГОСТ(?:\s*Р)?\s*\d+(?:-\d+)*'

# Remove the matched patterns from the 'description' column
df['description'] = df['description'].apply(lambda x: re.sub(pattern, '', x))

print(df)

But the result is:

              description
0               .114-2016
1               .102-2013
2

What am I doing wrong?

Answer 1

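The pattern stops partway through the standard number. After "ГОСТ" it expects \d+(?:-\d+)*, that is, digits followed by optional hyphenated groups, so for "ГОСТ 2.114-2016" the match ends after "2" and ".114-2016" is left behind. The third string has no dot in its number, which is why it is removed completely. Allow dot-separated groups in the standard number as well. Here is the modified code:

import pandas as pd
import re

# Sample DataFrame
data = {'description': ["п. 5.6.2 ГОСТ 2.114-2016",
                        "п. 4.1 ГОСТ 2.102-2013",
                        "п.5 ГОСТ Р 51672-2000"]}

df = pd.DataFrame(data)

# Clause number, then "ГОСТ" (optionally "ГОСТ Р"), then the standard number.
# The added (?:\.\d+)* lets the standard number contain dots, e.g. "2.114-2016".
pattern = r'п\.\s*\d+(?:\.\d+)*\s*ГОСТ(?:\s*Р)?\s*\d+(?:\.\d+)*(?:-\d+)*'

# Remove every match and trim any leftover surrounding whitespace
df['description'] = df['description'].apply(lambda x: re.sub(pattern, '', x).strip())

print(df)

This removes the references from the 'description' column of the DataFrame. All three sample rows become empty strings.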
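
To see where the leftover ".114-2016" comes from, it can help to print what each pattern actually matches. The following is a minimal sketch using only the re module. The names OLD_PATTERN, FIXED_PATTERN, and remove_references are illustrative and not part of the question, and the fixed pattern assumes that a standard number is made of dot-separated digit groups followed by optional hyphenated groups.

```python
import re

# Pattern from the question: the standard number may not contain a dot.
OLD_PATTERN = r'п\.\s*\d+(?:\.\d+)*\s*ГОСТ(?:\s*Р)?\s*\d+(?:-\d+)*'
# Assumed fix: also allow dot-separated groups in the standard number.
FIXED_PATTERN = r'п\.\s*\d+(?:\.\d+)*\s*ГОСТ(?:\s*Р)?\s*\d+(?:\.\d+)*(?:-\d+)*'


def remove_references(text, pattern=FIXED_PATTERN):
    """Remove every substring matching `pattern` and trim outer whitespace."""
    return re.sub(pattern, '', text).strip()


samples = ["п. 5.6.2 ГОСТ 2.114-2016",
           "п. 4.1 ГОСТ 2.102-2013",
           "п.5 ГОСТ Р 51672-2000"]

for s in samples:
    # Show how far each pattern gets through the string.
    old = re.search(OLD_PATTERN, s).group(0)
    new = re.search(FIXED_PATTERN, s).group(0)
    print(f"{s!r}: old matched {old!r}, fixed matched {new!r}")
```

For the first two samples the old pattern matches only up to the first digit group of the standard number, while the fixed one covers the whole reference. On a DataFrame column, the same substitution can also be written without apply as df['description'].str.replace(FIXED_PATTERN, '', regex=True).str.strip().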
