Remove a dash surrounded by spaces (" - ") while keeping a dash without spaces ("-")

Question

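I have a txt file that I open in Python. I'm trying to remove symbols and sort the remaining words alphabetically. Removing periods, commas, etc. is not a problem. However, when I add the dash to the list along with the other symbols, I can't seem to remove the dash that has spaces around it.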

Here is an example of what I open:

content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."

This is what I want (period removed, and dashes that are not attached to a word removed):

content = "The quick brown fox who was hungry jumps over the 7-year old lazy dog"

But I either get this (all dashes removed):

content = "The quick brown fox who was hungry jumps over the 7year old lazy dog"

or this (dashes not removed):

content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog"

Here is my full code. Adding content.replace() makes it work, but that's not the approach I want:

f = open("article.txt", "r")

# Create variable (Like this removing " - " works)
content = f.read()
content = content.replace(" - ", " ")

# Create list
wordlist = content.split()

# Which symbols (If I remove the line "content = content.replace(" - ", " ")", the " - " in this list doesn't get removed here)
chars = [",", ".", "'", "(", ")", "‘", "’", " - "]

# Remove symbols
words = []
for element in wordlist:
    temp = ""
    for ch in element:
        if ch not in chars:
            temp += ch
    words.append(temp)

# Print words, sort alphabetically and do not print duplicates
for word in sorted(set(words)):
    print(word)

It works like this. However, when I remove content = content.replace(" - ", " "), the "whitespace + dash + whitespace" entry in chars does not get removed.

And if I change it to "-" (without spaces), I get what I don't want:

content = "The quick brown fox who was hungry jumps over the 7year old lazy dog"

Is it possible to do this with a list like chars, or is .replace() my only option?

Also, is there a particular reason why Python first sorts the capitalized words alphabetically, and then sorts the lowercase words separately?

Something like this (letters A, B and C added to emphasize what I mean):

7-year
A
B
C
The
brown
dog
fox
hungry
jumps
lazy
old
over
quick
the
was
who
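On the sorting question: Python compares strings character by character by Unicode code point. Digits (48 to 57) come before uppercase ASCII letters (65 to 90), which come before lowercase ones (97 to 122), so `7-year` and `The` land ahead of `brown`. A minimal sketch of a case-insensitive sort using `str.casefold` in the sort key (an illustration, not taken from the question or answers); the original word is used as a tie-breaker so that `The`/`the` always come out in the same order:

```python
# Sample words as produced by the question's cleanup step.
words = ["The", "quick", "brown", "fox", "who", "was", "hungry",
         "jumps", "over", "the", "7-year", "old", "lazy", "dog"]

# Default ordering compares code points: digits < uppercase < lowercase.
print(ord("7"), ord("T"), ord("b"))  # 55 84 98
default_order = sorted(set(words))
print(default_order)

# Case-insensitive ordering; ties ("The" vs "the") are broken by the
# original string, so the result does not depend on set iteration order.
case_insensitive = sorted(set(words), key=lambda w: (w.casefold(), w))
print(case_insensitive)
```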

Answer 1

After

wordlist = content.split()

your list no longer contains anything with leading or trailing spaces.

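str.split()

with no arguments splits on runs of whitespace and discards the whitespace itself. So there is no ' - ' in your split list; the dash survives only as a separate '-' element.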

Docs: https://docs.python.org/3/library/stdtypes.html#str.split

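str.split(sep=None, maxsplit=-1)
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.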
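A short sketch of this behaviour on the question's sample sentence. It also shows a second reason the " - " entry never fires: the question's loop compares one character `ch` at a time, so a three-character entry in `chars` can never equal it, and a lone "-" is kept.

```python
content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
wordlist = content.split()

print(wordlist[3:6])                     # ['fox', '-', 'who']
print(any(" " in w for w in wordlist))   # False: no element contains a space
print(" - " in wordlist)                 # False: the spaced dash never survives

# sep=None (default) vs. an explicit single-space separator
text = "  fox  -  who "
print(text.split())      # ['fox', '-', 'who']
print(text.split(" "))   # ['', '', 'fox', '', '-', '', 'who', '']

# The question's loop tests single characters, so the
# three-character entry " - " can never equal a single ch.
chars = [",", ".", "'", "(", ")", "‘", "’", " - "]
kept = "".join(ch for ch in "-" if ch not in chars)
print(kept)              # -
```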

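Replacing ' - ' seems like the right approach. Another way that stays close to your code is to drop the standalone '-' elements from the split list entirely: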

chars = [",", ".", "'", "(", ")"]   # modified

# Remove symbols
words = []
for element in wordlist:
    temp = ""
    if element == '-':             # skip pure -
        continue
    for ch in element:             # handle characters to be removed
        if ch not in chars:
            temp += ch
    words.append(temp)
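A self-contained sketch of the same approach, run on the sample sentence instead of reading `article.txt`. The function name `clean_words` is hypothetical, and skipping words that become empty after stripping (e.g. a lone ".") is an addition not in the answer above.

```python
def clean_words(content: str) -> list[str]:
    """Strip punctuation, drop standalone dashes, return sorted unique words."""
    chars = [",", ".", "'", "(", ")", "‘", "’"]  # single characters only
    words = []
    for element in content.split():
        if element == "-":  # a dash that had whitespace on both sides
            continue
        # Keep every character not in chars; dashes inside words survive.
        cleaned = "".join(ch for ch in element if ch not in chars)
        if cleaned:  # skip tokens that were pure punctuation
            words.append(cleaned)
    return sorted(set(words))

sample = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
print(clean_words(sample))
```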

Answer 2

You can use re.sub like this:

>>> import re
>>> strip_chars = re.compile(r"(?:[,.'()‘’])|(?:[-,]\s)")
>>> content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
>>> strip_chars.sub("", content)
'The quick brown fox who was hungry jumps over the 7-year old lazy dog'
>>> strip_chars.sub("", content).split()
['The', 'quick', 'brown', 'fox', 'who', 'was', 'hungry', 'jumps', 'over', 'the', '7-year', 'old', 'lazy', 'dog']
>>>
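To answer the "list like chars" part with the same regex idea: because a pattern runs on the whole string before splitting, a multi-character entry such as " - " can be matched directly. A minimal sketch, assuming the question's `chars` list; the function name `words_from_text` is hypothetical. Entries containing a space are replaced by a single space so neighbouring words do not merge, while single punctuation characters are deleted.

```python
import re

# Same list as in the question, including the spaced dash " - ".
chars = [",", ".", "'", "(", ")", "‘", "’", " - "]

# One alternation built from the list; re.escape makes "." and "(" literal.
# Longer entries go first so " - " is tried before shorter alternatives.
pattern = re.compile("|".join(re.escape(c) for c in sorted(chars, key=len, reverse=True)))

def words_from_text(content: str) -> list[str]:
    # " - " becomes " " (so "fox - who" does not turn into "foxwho");
    # single punctuation characters are removed outright.
    cleaned = pattern.sub(lambda m: " " if " " in m.group() else "", content)
    return sorted(set(cleaned.split()))

sample = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
print(words_from_text(sample))
```

Note that the literal " - " entry only matches a dash with a plain space on each side; a dash at the very start or end of the text, or next to a tab or newline, would need a pattern such as r"\s+-\s+" instead.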

Votes: 1 | Answers: 2