Here is my earlier post on this:
Here is the code I'm using:
import sys
import re
pattern = re.compile("^[a-z]+$")  # matches purely alphabetic words
starting_vowels = re.compile("(^[aeiouAEIOU])")  # matches starting vowels
ending_vowels = re.compile("[aeiouAEIOU]$")  # matches ending vowels
starting_vowel_match = 0
ending_vowel_match = 0
for line in sys.stdin:
    line = line.strip()  # removes leading and trailing whitespace
    words = line.lower().split()  # splits the line into words and converts to lowercase
    for word in words:
        if len(word) == 1:
            print(word[0], 1, *((1, 1) if word[0] in 'aeiou' else (0, 0)))  # * unpacks startVowel 1 endVowel 1 if word[0] is a vowel
        else:
            print(word[0], 1, 1 if word[0] in 'aeiou' else 0, 0)
            print(*(f'{letter} 1 0 0' for letter in word[1: -1]), sep='\n')
            print(word[-1], 1, 0, 1 if word[-1] in 'aeiou' else 0)
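I want it to print a character only if it is a letter of the alphabet. For example, for a text file containing the string "It's a beautiful life", the output I want is: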
i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1
What I'm getting now is:
i 1 1 0
' 1 0 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1
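I'd like to know how to remove the special characters from the output. I've tried a few things, including adding
for letter in word:
    if pattern.match(letter):
inside the "for letter in word" block, but it didn't give me the output I want.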
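For reference, the question's own `pattern` can act as a per-character filter when it is applied to each character separately. A minimal sketch, where `keep_letters` is a hypothetical helper name; note that the question's code prints `word[0]` and `word[-1]` without any check, so a filter has to cover those positions as well as the `[1:-1]` slice:
```python
import re

# Same pattern as in the question: one or more lowercase ASCII letters, nothing else.
pattern = re.compile("^[a-z]+$")


def keep_letters(word):
    """Hypothetical helper: keep only the characters that `pattern` accepts on their own."""
    return ''.join(ch for ch in word if pattern.match(ch))


print(keep_letters("it's"))   # its
print(keep_letters("life."))  # life
```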
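I'm not sure why the original code uses re at all, since the compiled patterns are never used.
When processing words longer than one letter, every character in the [1:-1] slice has to be checked individually.
Something like this: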
import sys
from string import ascii_lowercase as LOWER
VOWELS = set('aeiouAEIOU')
def isvowel(c):
    return int(c in VOWELS)
for line in sys.stdin:
    for word in line.strip().lower().split():
        if len(word) == 1:
            # a one-letter word is both the first and the last letter
            print(word, 1, isvowel(word), isvowel(word))
        else:
            print(word[0], 1, isvowel(word[0]), 0)
            for letter in word[1:-1]:
                if letter in LOWER:  # skip apostrophes and other non-letters
                    print(f'{letter} 1 0 0')
            print(word[-1], '1 0', isvowel(word[-1]))
Output:
i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1
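A variation on the approach above, sketched under the assumption that only ASCII letters should count: clean each word with re.sub before looking at its first and last characters, so punctuation at either end of a word (for example "life.") cannot be printed as the first or last letter. `letter_rows` is a hypothetical name, and the rows follow the question's "letter 1 start end" layout:
```python
import re

VOWELS = set('aeiou')


def letter_rows(text):
    """Hypothetical helper: build 'letter 1 start end' rows, skipping non-letters.

    Each word is cleaned before its first/last letters are examined, so
    punctuation at either end of a word never becomes the first or last letter.
    """
    rows = []
    for word in text.lower().split():
        letters = re.sub(r'[^a-z]', '', word)  # keep ASCII letters only
        if not letters:
            continue  # the token was pure punctuation, e.g. "-"
        last = len(letters) - 1
        for i, ch in enumerate(letters):
            start = int(i == 0 and ch in VOWELS)
            end = int(i == last and ch in VOWELS)
            rows.append(f'{ch} 1 {start} {end}')
    return rows


for row in letter_rows("It's a beautiful life"):
    print(row)
```
To read from standard input as in the question, call letter_rows on each line of sys.stdin and print the rows it returns.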
So you want to split a string into words and then split each word into letters. For each letter, you want to print:
[letter] [starting_vowel_match] [letter_vowel_match] [ending_vowel_match]
Here's how I would approach it:
import re
test = "It's a beautiful life"
for line in test.split("\n"):
    line = line.strip()  # removes leading and trailing whitespace
    words = line.lower().split()  # splits the line into words and converts to lowercase
    for word in words:
        for letter in re.sub(r'[^a-z]', '', word):  # letters only; word is already lowercase
            print(
                letter,
                1 if word[0] in 'aeiou' else 0,
                1 if letter in 'aeiou' else 0,
                1 if word[-1] in 'aeiou' else 0)
The result looks different from your sample output, but I assume you want the first column after the letter to hold starting_vowel_match!
i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 0 0 0
e 0 1 0
a 0 1 0
u 0 1 0
t 0 0 0
i 0 1 0
f 0 0 0
u 0 1 0
l 0 0 0
l 0 0 1
i 0 1 1
f 0 0 1
e 0 1 1
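If non-ASCII letters such as "é" should also survive the filter, str.isalpha is an alternative to the re.sub call. The sketch below (vowel_flags is a hypothetical name) also takes the start and end flags from the cleaned word rather than the raw one, so trailing punctuation such as in "life." does not clear ending_vowel_match; it assumes only the ASCII vowels aeiou count as vowels:
```python
VOWELS = 'aeiou'  # assumption: accented vowels are not counted as vowels


def vowel_flags(text):
    """Return (letter, starting_vowel_match, letter_vowel_match, ending_vowel_match) tuples."""
    rows = []
    for word in text.lower().split():
        letters = ''.join(filter(str.isalpha, word))  # Unicode-aware letter test
        if not letters:
            continue  # pure punctuation such as "-"
        start = int(letters[0] in VOWELS)
        end = int(letters[-1] in VOWELS)
        for ch in letters:
            rows.append((ch, start, int(ch in VOWELS), end))
    return rows


for row in vowel_flags("It's a beautiful life"):
    print(*row)
```
For the sample sentence this prints the same 17 rows as above; it only behaves differently for accented letters and for punctuation at the start or end of a word.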