How do I compute letter frequencies?

Question

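This problem requires me to do a frequency analysis of a .txt file.

Here is my code so far. It finds the frequency of the words, but how would I get the frequency of the actual letters?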

f = open('cipher.txt', 'r')
word_count = []

for c in f:
  word_count.append(c)


word_count.sort()

decoding = {}

for i in word_count:
  decoding[i] = word_count.count(i)

for n in decoding:
  print(decoding)

This outputs (a short sample, since the txt file is very long):

{'\n': 12, 'vlvf zev jvg jrgs gvzef\n': 1, 'z uvfgriv sbhfv bu wboof!\n': 1, "gsv yrewf zoo nbhea zaw urfsvf'\n": 1, 'xbhow ube gsv avj bjave yv\n': 1, '    gsv fcerat rf czffrat -\n': 1, 'viva gsrf tezff shg\n': 1, 'bph ab sbfbnrxsr (azeebj ebzw gb gsv wvvc abegs)\n': 1, 'cbfg rafrwv gsv shg.\n': 1, 'fb gszg lvze -- gsv fvxbaw lvze bu tvaebph [1689] -- r szw fhwwvaol gzpva\n': 1, 'fb r czgxsvw hc nl gebhfvef, chg avj xbewf ra nl fgezj szg, zaw\n': 1, 'fcrergf bu gsv ebzw yvxpbavw nv, zaw r xbhow abg xbaxvagezgv ba zalgsrat.\n': 1, 'fgbbw zg gsv xebffebzwf bu czegrat, r jvcg tbbwylv.\n': 1,

It gives me the words, but how do I get the letters, for example how many 'a's or how many 'b's there are?

Answer 1

Counter is a very useful class from Python's standard library (the collections module) that solves your problem elegantly.

# count the letter frequency
from collections import Counter

with open('cipher.txt', 'r') as f:
    s = f.read()

c = Counter(s)  # the type of c is collections.Counter

# if you want dict as your output type
decoding = dict(c)
print(decoding)

If you put "every parting from you is like a little eternity" in your cipher.txt, the code above gives the following result:

{'e': 6, 'v': 1, 'r': 4, 'y': 3, ' ': 8, 'p': 1, 'a': 2, 't': 5, 'i': 5, 'n': 2, 'g': 1, 'f': 1, 'o': 2, 'm': 1, 'u': 1, 's': 1, 'l': 3, 'k': 1}

However, if you want to implement the counting yourself, here is one possible solution that gives the same result as using Counter.

# count the letter frequency manually, without using collections.Counter

with open('cipher.txt', 'r') as f:
    s = f.read()

decoding = {}
for c in s:
    if c in decoding:
        decoding[c] += 1
    else:
        decoding[c] = 1

print(decoding)
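Note that both versions above count every character, including spaces and newlines. Since the question asks specifically about letters, here is a minimal sketch that keeps only alphabetic characters and ignores case. The function name letter_counts and the sample string are just for illustration, and it assumes that treating 'E' and 'e' as the same letter is what you want:

```python
from collections import Counter


def letter_counts(text):
    """Count only alphabetic characters, folding upper case into lower case."""
    # str.isalpha() is also True for non-ASCII letters; add a stricter
    # check if only a-z should be counted.
    return Counter(ch.lower() for ch in text if ch.isalpha())


sample = "Every parting from you is like a little eternity\n"
counts = letter_counts(sample)
print(counts["e"])    # how many 'e's (including the capital 'E')
print(" " in counts)  # False: spaces are skipped
```

For the real file you would pass the result of f.read() instead of the sample string.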

Answer 2

You can use Counter from the collections module in the standard library, which produces the result as a dictionary-like object:

from collections import Counter
s = """

This problem requires me to find the frequency analysis of a .txt file.

This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?"""

c = Counter(s)
print(c.most_common(5))

This prints:

[(' ', 35), ('e', 20), ('t', 13), ('s', 11), ('o', 10)]

Edit: without Counter, we can use a plain dictionary and keep incrementing the counts:

c = {}
for character in s:
    try:
        c[character] += 1
    except KeyError:
        c[character] = 1
print(c)

This prints:

{'\n': 4, 'T': 3, 'h': 9, 'i': 9, 's': 11, ' ': 35, 'p': 1, 'r': 9, 'o': 10, 'b': 2, 'l': 6, 'e': 20, 'm': 3, 'q': 4, 'u': 7, 't': 13, 'f': 10, 'n': 6, 'd': 5, 'c': 5, 'y': 5, 'a': 6, '.': 2, 'x': 1, ':': 1, 'w': 3, ',': 1, 'I': 1, 'g': 1, '?': 1}
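For a frequency analysis you may want each character's share of the text rather than the raw count. A rough sketch, building on most_common(); the helper name relative_frequencies and the tiny input are made up for the example:

```python
from collections import Counter


def relative_frequencies(text):
    """Return (character, share) pairs, most common first."""
    counts = Counter(text)
    total = sum(counts.values())
    # most_common() with no argument returns every entry, sorted by count.
    return [(ch, n / total) for ch, n in counts.most_common()]


freqs = relative_frequencies("aaab")
for ch, share in freqs:
    print(f"{ch!r}: {share:.2%}")
```

With the input "aaab" this reports 'a' at 75% and 'b' at 25%.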

Answer 3

If you don't want to use Counter, here is a shorter alternative:

with open('cipher.txt', 'r') as f:
    s = f.read()

decoding = {}
for c in s:
    # get() returns decoding[c] if the key exists, or the default value (0) if it doesn't.
    decoding[c] = decoding.get(c, 0) + 1

print(decoding)
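Another option along the same lines, if the get() call feels awkward, is collections.defaultdict, which supplies the default for missing keys automatically. This is only a sketch; count_chars is a hypothetical helper name and "hello" stands in for the file contents:

```python
from collections import defaultdict


def count_chars(text):
    """Count every character in text using a defaultdict."""
    counts = defaultdict(int)  # a missing key starts at int() == 0
    for ch in text:
        counts[ch] += 1
    return dict(counts)  # convert back to a plain dict for printing


result = count_chars("hello")
print(result)
```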
