I have been trying to get my code to count the number of words in each sentence (before testing it on a .txt file), but it gives me this result:
Mr. Blah has a lot of Sr?
and Mrs. blah does not care.
lol
[[1, 1.0], [1, 1.0], [1, 1.0]]
instead of the following result:
Mr. Blah has a lot of Sr?
and Mrs. blah does not care.
lol
[[7, 2.4], [6, 3.5], [1, 3.0]]
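The code below is what I have been working on so far. It should count the number of words in a sentence, then count the number of letters used in the sentence, and finally compute the average number of letters per word in the sentence.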
    terminators = ["?", "!"] #Characters that always end a sentence other than a period
    abrevs = ["Mrs", "Mr", "Dr", "Fr", "Jr", "Sr"] #Abbreviations that prevent a period from ending a sentence

    #Replaced the word_length_list function from 1a. with this new one
    def word_length_list(sentence):
        print(sentence)
        return [1]

    #Once a sentence is found, this will calculate statistics for it
    def collect_statistics(sentence):
        word_lengths = word_length_list(sentence)
        words_in_sentence = len(word_lengths) #Get word count
        #Average word length
        sum_of_word_lengths = 0
        for length in word_lengths:
            sum_of_word_lengths = sum_of_word_lengths + length
        average_word_length = sum_of_word_lengths/words_in_sentence;
        return [words_in_sentence, average_word_length]

    # Replaced given text with this to test if it does work for the abbreviations and ellipses
    story_text = "Mr. Blah has a lot of Sr? and Mrs. blah does not care. lol"
    story_length = len(story_text)
    statistics = []
    sentence = ""
    for i in range(story_length):
        sentence_over = False # Assumption that this sentence will continue after the next character
        nextchar = story_text[i] # Look at the next character in the story
        if nextchar in terminators:
            sentence_over = True #Change assumption.
        #If it is a period, we have some special handling to do.
        elif nextchar == ".": #End the sentence after this if-else block.
            #But if it is a period, we have to deal with ellipsis and abbreviations
            #If the period is followed by another period, probably an ellipsis & want to include in the sentence.
            is_part_of_elipse = i+1 < story_length and story_text[i+1] == "."
            is_part_of_abbrev = False # Assumption that this sentence will continue after a period, an abbreviation
            for ab in abrevs: #Then check for abbreviation
                if sentence.endswith(ab):
                    is_part_of_abbrev = True
            if not (is_part_of_elipse or is_part_of_abbrev): # If not part of abbreviation and not part of ellipsis,
                sentence_over = True # end of sentence by (period)
        sentence = sentence + nextchar;
        # Calculate the sentence statistcs
        if sentence_over:
            statistics.append(collect_statistics(sentence))
            # Clear the sentence variable to make room for the next
            sentence = ""
    #Incase the last sentence was not terminated, add it to the stats
    if len(sentence)>0:
        statistics.append(collect_statistics(sentence))
    print(statistics)
This function always returns the same result:

    def word_length_list(sentence):
        print(sentence)
        return [1]

You may want to look at how you count the words in a sentence.
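As a minimal sketch of one way to do that, assuming words are separated by whitespace and that punctuation should not count as letters (which is what the expected output in the question implies), word_length_list could return one length per word. The use of str.split and str.isalpha here is my own assumption, not something taken from the original code:

```python
def word_length_list(sentence):
    """Return the letter count of each whitespace-separated word."""
    lengths = []
    for word in sentence.split():
        # Count only alphabetic characters, so "Mr." counts as 2 and "Sr?" as 2
        letters = sum(1 for ch in word if ch.isalpha())
        if letters > 0:  # skip tokens that are pure punctuation, e.g. "..."
            lengths.append(letters)
    return lengths


print(word_length_list("Mr. Blah has a lot of Sr?"))  # [2, 4, 3, 1, 3, 2, 2]
```

Because this version returns a list of numbers, the existing collect_statistics from the question can keep summing the lengths as it already does.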
You need to fix a few things.

First, word_length_list returns [1] and nothing else. Change that function to:
    def word_length_list(sentence):
        return sentence.split()
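Next, we need to make some changes in collect_statistics to get the desired result, because word_length_list now returns the words themselves rather than their lengths. Change that function to: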
    def collect_statistics(sentence):
        word_lengths = word_length_list(sentence)
        words_in_sentence = len(word_lengths)
        sum_of_word_lengths = 0
        for word in word_lengths:
            sum_of_word_lengths += len(word)
        average_word_length = sum_of_word_lengths/words_in_sentence
        return [words_in_sentence, average_word_length]
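That said, the division will produce some long decimal results, so you will need to compensate for that, for example by rounding. I think the numbers I get are slightly higher than yours because the code is still counting punctuation, such as the . after the abbreviations and the ? in Sr?, so the 2.4 you expect actually comes out as 2.7.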
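If you want the exact numbers from the question, here is a rough sketch of how both issues could be handled. It assumes that trimming leading and trailing punctuation with string.punctuation and rounding to one decimal place with round() is acceptable for your data; the empty-sentence guard is also my own addition:

```python
import string


def word_length_list(sentence):
    # Split on whitespace and trim leading/trailing punctuation from each word
    words = [w.strip(string.punctuation) for w in sentence.split()]
    return [w for w in words if w]  # drop tokens that were only punctuation


def collect_statistics(sentence):
    words = word_length_list(sentence)
    if not words:  # guard against dividing by zero on an empty sentence
        return [0, 0.0]
    total_letters = sum(len(w) for w in words)
    # Round to one decimal place to avoid long floating-point results
    return [len(words), round(total_letters / len(words), 1)]


for s in ["Mr. Blah has a lot of Sr?", " and Mrs. blah does not care.", " lol"]:
    print(collect_statistics(s))
# [7, 2.4]
# [6, 3.5]
# [1, 3.0]
```

Note that strip only removes punctuation at the ends of a word, so something like "don't" would still count the apostrophe as a character.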