Trying to remove dashes ("-") surrounded by spaces while keeping dashes ("-") that have no spaces around them

Question (votes: 1, answers: 2)

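I have a txt file that I open with Python. I am trying to remove symbols and sort the remaining words alphabetically. Removing periods, commas and so on is not a problem. However, when I add a dash to the list along with the other symbols, I can't seem to remove only the dashes that have spaces around them.

Here is a sample of what I open: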

content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."

This is what I want (the period removed, along with the dashes that are not attached to a word):

content = "The quick brown fox who was hungry jumps over the 7-year old lazy dog"

But I either get this (all dashes removed):

content = "The quick brown fox who was hungry jumps over the 7year old lazy dog"

or this (no dashes removed):

content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog"

Here is my full code. Adding content.replace() works, but that is not what I want:

# Read the file (removing " - " here with replace() works)
with open("article.txt", "r") as f:
    content = f.read()
content = content.replace(" - ", " ")

# Create list
wordlist = content.split()

# Which symbols (If I remove the line "content = content.replace(" - ", " ")", the " - " in this list doesn't get removed here)
chars = [",", ".", "'", "(", ")", "‘", "’", " - "]

# Remove symbols
words = []
for element in wordlist:
    temp = ""
    for ch in element:
        if ch not in chars:
            temp += ch
    words.append(temp)

# Print words, sort alphabetically and do not print duplicates
for word in sorted(set(words)):
    print(word)

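This works. But when I remove content = content.replace(" - ", " "), the "whitespace + dash + whitespace" entry in chars does not get removed.

And if I replace that entry with "-" (no spaces), I get what I don't want: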

content = "The quick brown fox who was hungry jumps over the 7year old lazy dog"

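Is there a way to do this with a list like chars, or is .replace() my only option?

Also, is there a particular reason why Python sorts capitalized words first and lowercase words separately after them?

Like this (I added the letters A, B, C to emphasize what I mean):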

7-year
A
B
C
The
brown
dog
fox
hungry
jumps
lazy
old
over
quick
the
was
who
python python-3.x whitespace symbols alphabetical-sort
2 Answers

0 votes

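After

wordlist = content.split()

your list no longer contains anything with leading or trailing whitespace.

str.split() with no arguments splits on runs of consecutive whitespace and discards them, so the split list contains no ' - ' element, only a bare '-'.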

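Docs: https://docs.python.org/3/library/stdtypes.html#str.split

  • str.split(sep=None, maxsplit=-1)

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.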
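To see why a " - " entry in chars can never match, here is a small sketch using the sample sentence from the question. The variable names are only for illustration. After splitting, no token contains whitespace, and the inner loop compares one character at a time, so a three-character string like " - " can never equal a token or a single character.

```python
content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."

# sep=None (the default): split on runs of whitespace and discard them
tokens = content.split()

has_spaced_dash = " - " in tokens        # a token never contains spaces
has_bare_dash = "-" in tokens            # the dash survives as its own token
has_hyphenated_word = "7-year" in tokens

# Iterating over a token yields single characters only
single_chars_only = all(len(ch) == 1 for token in tokens for ch in token)

print(has_spaced_dash, has_bare_dash, has_hyphenated_word)  # False True True
```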


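Replacing ' - ' seems like the right approach. Another way that stays close to your code is to drop the bare '-' elements from the split list entirely: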

chars = [",", ".", "'", "(", ")"]   # modified

# Remove symbols
words = []
for element in wordlist:
    temp = ""
    if element == '-':             # skip pure -
        continue
    for ch in element:             # handle characters to be removed
        if ch not in chars:
            temp += ch
    words.append(temp)
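The loop above can be wrapped into a self-contained sketch. The function name clean_words and its chars default (taken from the question's list, including the curly quotes) are assumptions made for illustration, not part of the original code. The sketch also touches on the sorting part of the question: Python compares strings by Unicode code point, and digits come before uppercase letters, which come before lowercase letters, so capitalized words sort first by default. One possible way around that is a case-insensitive sort key.

```python
def clean_words(content, chars=",.'()‘’"):
    """Return the words of content without the given symbols or standalone dashes."""
    words = []
    for element in content.split():
        if element == "-":          # a dash that had whitespace on both sides
            continue
        temp = "".join(ch for ch in element if ch not in chars)
        if temp:                    # skip tokens that consisted only of symbols
            words.append(temp)
    return words


content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
words = clean_words(content)

# Default sort compares code points: digits < uppercase < lowercase
default_order = sorted(set(words))

# Case-insensitive sort; the original word breaks ties so the order is deterministic
folded_order = sorted(set(words), key=lambda w: (w.casefold(), w))

for word in folded_order:
    print(word)
```

With the default sort, "The" lands right after "7-year"; with the case-insensitive key it lands next to "the".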

0 votes

You can use re.sub like this:

>>> import re
>>> strip_chars = re.compile(r"(?:[,.'()‘’])|(?:[-,]\s)")
>>> content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
>>> strip_chars.sub("", content)
'The quick brown fox who was hungry jumps over the 7-year old lazy dog'
>>> strip_chars.sub("", content).split()
['The', 'quick', 'brown', 'fox', 'who', 'was', 'hungry', 'jumps', 'over', 'the', '7-year', 'old', 'lazy', 'dog']
>>> 
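One thing to keep in mind with the pattern above is that [-,]\s also removes a dash that is attached to the end of a word ("well- known" becomes "wellknown"). A possible variation, sketched below, uses lookarounds so that only a dash with whitespace or a string boundary on both sides is removed. The names SYMBOLS, LONE_DASH and strip_symbols are hypothetical, and the symbol set is taken from the question's chars list.

```python
import re

# Symbols to drop everywhere (from the question's chars list)
SYMBOLS = re.compile(r"[,.'()‘’]")
# A dash with whitespace or a string boundary on both sides
LONE_DASH = re.compile(r"(?<!\S)-(?!\S)")


def strip_symbols(text):
    """Remove SYMBOLS and standalone dashes, then collapse runs of whitespace."""
    text = SYMBOLS.sub("", text)
    text = LONE_DASH.sub("", text)
    return " ".join(text.split())


content = "The quick brown fox - who was hungry - jumps over the 7-year old lazy dog."
result = strip_symbols(content)
print(result)
```

Hyphenated words such as "7-year" are left alone because their dash has a non-space character on each side.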