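I need to write a function that converts a word appearing in several different capitalizations to lowercase.
For example, a paragraph contains the word 'something' in different capitalizations, such as 'Something', 'SomeThing', 'SOMETHING', and 'SomeTHing', and every variant needs to be converted to the lowercase 'something'.
How can I write a function that does this replacement?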
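You can split the paragraph into words, use the slugify module to generate a slug for each word, and compare that slug with "something". If it matches, replace the word with its lowercase form.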
In [1]: text = "This paragraph contains Something, SOMETHING, AND SomeTHing"
In [2]: from slugify import slugify
In [3]: for word in text.split(" "):  # Split the text on spaces and iterate over the words
   ...:     if slugify(word) == "something":  # Compare the word's slug with "something"
   ...:         text = text.replace(word, word.lower())
   ...:
In [4]: text
Out[4]: 'This paragraph contains something, something, AND something'
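Split the text into individual words and check whether each word, written in lowercase, is "something". If it is, replace it with the lowercase form:
for word in text.split():
    if word.lower() == "something":  # a word with attached punctuation, e.g. "Something,", will not match
        text = text.replace(word, "something")
To learn how to split a text into words, see this question.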
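Building on the split-and-compare idea above, here is a minimal sketch that also handles words with attached punctuation such as "Something,". The function name `lowercase_matches` and its `target` parameter are hypothetical names chosen for illustration, and the sketch assumes the words are separated by spaces.

```python
import string


def lowercase_matches(text, target="something"):
    """Lowercase every space-separated word that equals `target`, ignoring case."""
    fixed = []
    for word in text.split(" "):
        # Ignore leading/trailing punctuation, e.g. the comma in "Something,"
        core = word.strip(string.punctuation)
        if core.lower() == target:
            word = word.lower()
        fixed.append(word)
    return " ".join(fixed)


print(lowercase_matches("This paragraph contains Something, SOMETHING, AND SomeTHing"))
# This paragraph contains something, something, AND something
```

Rebuilding the text word by word, rather than calling `text.replace` inside the loop, avoids accidentally rewriting the same characters where they occur inside a longer word.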
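Another approach is to iterate over the individual characters and check whether the nine characters starting at each position spell "something":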
text = "Many words: SoMeThInG, SOMEthING, someTHing"
for n in range(len(text) - 8):  # every start index of a 9-character slice
    if text[n:n+9].lower() == "something":  # check whether "something" is here
        text = text.replace(text[n:n+9], "something")
print(text)
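The same sliding-window idea can be wrapped in a function that works for any target word instead of a hard-coded length of 9. This is only a sketch: `lowercase_substring` is a hypothetical name, and it assumes plain ASCII text, where lowercasing never changes the length of a string. Note that, like the loop above, it matches the target anywhere, including inside longer words.

```python
def lowercase_substring(text, target="something"):
    """Lowercase every case-insensitive occurrence of `target` in `text`."""
    size = len(target)
    chars = list(text)
    # Slide a window of len(target) characters across the text
    for start in range(len(text) - size + 1):
        if text[start:start + size].lower() == target:
            # Overwrite the window in place with the lowercase target
            chars[start:start + size] = target
    return "".join(chars)


print(lowercase_substring("Many words: SoMeThInG, SOMEthING, someTHing"))
# Many words: something, something, something
```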
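You can also use re.findall to split the paragraph into words and punctuation, and then replace every differently cased variant of "Something" with the lowercase version: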
import re
text = "Something, Is: SoMeThInG, SOMEthING, someTHing."
to_replace = "something"
words_punct = re.findall(r"[\w']+|[.,!?;: ]", text)
new_text = "".join(to_replace if x.lower() == to_replace else x for x in words_punct)
print(new_text)
Which outputs:
something, Is: something, something, something.
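Note: re.findall needs a hard-coded regular expression describing what to look for in the string. Your actual text may contain characters that the regular expression above does not cover, such as tabs, newlines, or quotation marks. Those characters are silently dropped from the result, so add them to the pattern as needed.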