Python 3 word count for Unicode languages


My text contains several languages. Using Python 3, I want a word count that only counts the words written in the Greek Unicode character set.

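wordCount = 0
# Read the whole file into a string (UTF-8 is assumed here)
with open('file.txt', 'r', encoding='utf-8') as f:
    theText = f.read()
for word in theText.split():
    # GreekUnicodeCheck is a placeholder for the check being asked about
    if GreekUnicodeCheck(word):
        wordCount += 1
print(wordCount)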
I have considered checking whether each word contains a Greek letter:

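wordCount = 0
# Read the whole file into a string (UTF-8 is assumed here)
with open('file.txt', 'r', encoding='utf-8') as f:
    theText = f.read()
greekChars = ['α', 'β', 'γ', 'δ', 'ε']  # ... and so on for the rest of the alphabet
for word in theText.split():
    # Count the word if it contains at least one listed Greek letter
    if any(letter in word for letter in greekChars):
        wordCount += 1
print(wordCount)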
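I think this would work, but once you account for all the combinations of capitals, diacritics, and so on, the character set becomes very large. (I am working with Classical Greek.) Is there a more elegant solution?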


python python-3.x unicode python-unicode word-count

1 Answer

0 votes

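It is not that many, though. Take a look at this website. All the Greek letters are listed there, and I used a Python script to convert them into this list: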
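As an alternative to a hand-built list, one possible sketch is to ask Python's standard `unicodedata` module for each character's Unicode name and treat any letter whose name starts with `GREEK` as Greek. This should also cover the polytonic letters used in Classical Greek, since their names begin the same way. The helper names `is_greek_letter`, `contains_greek` and `count_greek_words`, as well as the sample string, are made up for illustration and are not part of the original answer.

```python
import unicodedata


def is_greek_letter(ch):
    """Return True if ch is a letter whose Unicode name starts with 'GREEK'."""
    # unicodedata.name(ch, "") returns "" for characters that have no name
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")


def contains_greek(word):
    """Return True if at least one character in word is a Greek letter."""
    return any(is_greek_letter(ch) for ch in word)


def count_greek_words(text):
    """Count whitespace-separated tokens that contain a Greek letter."""
    return sum(1 for word in text.split() if contains_greek(word))


# Hypothetical sample text mixing English and Greek
sample = "The word λόγος and ἄνθρωπος appear next to English text"
print(count_greek_words(sample))  # prints 2
```

To run this on a file, the text could be read with `open('file.txt', encoding='utf-8').read()` and passed to `count_greek_words`. Note that a token counts as Greek as soon as it contains one Greek letter, which matches the `any(...)` check in the question; a stricter rule would need `all(...)` plus some handling for punctuation attached to words.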
