How do I compute letter frequencies?

Question (votes: 0, answers: 3)

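This problem requires me to perform a frequency analysis of a .txt file.

Here is my code so far. It finds the frequency of the words, but how would I get the frequency of the actual letters?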

f = open('cipher.txt', 'r')
word_count = []

for c in f:
  word_count.append(c)


word_count.sort()

decoding = {}

for i in word_count:
  decoding[i] = word_count.count(i)

for n in decoding:
  print(decoding)

This outputs the following (only a short sample, because the txt file is very long):

{'\n': 12, 'vlvf zev jvg jrgs gvzef\n': 1, 'z uvfgriv sbhfv bu wboof!\n': 1, "gsv yrewf zoo nbhea zaw urfsvf'\n": 1, 'xbhow ube gsv avj bjave yv\n': 1, '    gsv fcerat rf czffrat -\n': 1, 'viva gsrf tezff shg\n': 1, 'bph ab sbfbnrxsr (azeebj ebzw gb gsv wvvc abegs)\n': 1, 'cbfg rafrwv gsv shg.\n': 1, 'fb gszg lvze -- gsv fvxbaw lvze bu tvaebph [1689] -- r szw fhwwvaol gzpva\n': 1, 'fb r czgxsvw hc nl gebhfvef, chg avj xbewf ra nl fgezj szg, zaw\n': 1, 'fcrergf bu gsv ebzw yvxpbavw nv, zaw r xbhow abg xbaxvagezgv ba zalgsrat.\n': 1, 'fgbbw zg gsv xebffebzwf bu czegrat, r jvcg tbbwylv.\n': 1,

It gives me the words, but how do I get the letters, for example how many 'a's or how many 'b's there are?

python for-loop encoding cryptography frequency-analysis
3 Answers
0
votes

Counter is a very useful class from Python's standard library (collections.Counter) that solves your problem elegantly.

# count how often each character occurs
from collections import Counter

with open('cipher.txt', 'r') as f:
    s = f.read()

c = Counter(s)  # c is a collections.Counter

# convert to a plain dict if you need one
decoding = dict(c)
print(decoding)

If you put "every parting from you is like a little eternity" in your cipher.txt, the code above gives the following result:

{'e': 6, 'v': 1, 'r': 4, 'y': 3, ' ': 8, 'p': 1, 'a': 2, 't': 5, 'i': 5, 'n': 2, 'g': 1, 'f': 1, 'o': 2, 'm': 1, 'u': 1, 's': 1, 'l': 3, 'k': 1}
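The difference from the code in the question is that iterating over a file object yields whole lines, while iterating over the string returned by read() yields single characters. A minimal sketch of that difference, using io.StringIO as a stand-in for the real file (the sample text is made up for illustration):

```python
import io
from collections import Counter

# io.StringIO behaves like an open text file; the content is a made-up sample.
fake_file = io.StringIO("ab\nab\n")

# Iterating over the file object yields lines, so whole lines get counted.
line_counts = Counter(fake_file)

# Rewind, read everything into one string, and count its characters instead.
fake_file.seek(0)
char_counts = Counter(fake_file.read())

print(dict(line_counts))
print(dict(char_counts))
```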

However, if you want to implement the counting yourself, here is one possible solution that gives the same result as using Counter.

# count how often each character occurs, manually, without using collections.Counter

with open('cipher.txt', 'r') as f:
    s = f.read()

decoding = {}
for c in s:
    if c in decoding:
        decoding[c] += 1
    else:
        decoding[c] = 1

print(decoding)
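Both versions above count every character, including spaces and newlines. If only letters are wanted (how many 'a's, how many 'b's), one option is to filter before counting. This is a sketch, not part of the original answer: letter_counts is a hypothetical helper name, and the sample line is taken from the output shown in the question.

```python
from collections import Counter

def letter_counts(text):
    """Count alphabetic characters only, ignoring case (hypothetical helper)."""
    # str.isalpha() is also True for non-ASCII letters; for an English-only
    # cipher text that makes no difference.
    return Counter(ch for ch in text.lower() if ch.isalpha())

# One line of the cipher text from the question, used as sample input.
sample = "    gsv fcerat rf czffrat -\n"
counts = letter_counts(sample)

print(counts["f"])           # occurrences of the letter 'f'
print(counts.most_common())  # all letters, most frequent first
```

Looking up a letter that never occurs, such as counts["q"] here, returns 0 instead of raising KeyError, which is convenient when walking through the whole alphabet.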

0
votes

You can use Counter from the collections module of the standard library, which produces the resulting dictionary:

from collections import Counter
s = """

This problem requires me to find the frequency analysis of a .txt file.

This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?"""

c = Counter(s)
print(c.most_common(5))

This prints:

[(' ', 35), ('e', 20), ('t', 13), ('s', 11), ('o', 10)]

Edit: without using Counter, we can use a dictionary and keep incrementing the counts:

c = {}
for character in s:
    try:
        c[character] += 1
    except KeyError:
        c[character] = 1
print(c)

This prints:

{'\n': 4, 'T': 3, 'h': 9, 'i': 9, 's': 11, ' ': 35, 'p': 1, 'r': 9, 'o': 10, 'b': 2, 'l': 6, 'e': 20, 'm': 3, 'q': 4, 'u': 7, 't': 13, 'f': 10, 'n': 6, 'd': 5, 'c': 5, 'y': 5, 'a': 6, '.': 2, 'x': 1, ':': 1, 'w': 3, ',': 1, 'I': 1, 'g': 1, '?': 1}
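For frequency analysis of a cipher text, relative frequencies can be easier to compare against a reference table than raw counts. A minimal sketch, assuming only letters matter and case is ignored; relative_frequencies is a hypothetical helper name, not something provided by collections:

```python
from collections import Counter

def relative_frequencies(text):
    """Return (letter, share) pairs, most frequent first (hypothetical helper)."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return []  # no letters at all, so avoid dividing by zero
    return [(letter, n / total) for letter, n in counts.most_common()]

# Made-up sample text for illustration.
freqs = relative_frequencies("Aab b!")
for letter, share in freqs:
    print(f"{letter}: {share:.1%}")
```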

0
votes

If you don't want to use Counter, here is a shorter alternative:

with open('cipher.txt', 'r') as f:
    s = f.read()

decoding = {}
for c in s:
    # get() returns decoding[c] if the key exists, otherwise the default value 0
    decoding[c] = decoding.get(c, 0) + 1

print(decoding)
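Another variation on the same idea, offered as a sketch and not taken from the answers above, is collections.defaultdict, which supplies the starting value 0 automatically. count_chars is a hypothetical helper name and the input string is a made-up sample.

```python
from collections import defaultdict

def count_chars(text):
    """Count every character in text using defaultdict (hypothetical helper)."""
    counts = defaultdict(int)  # missing keys start at int() == 0
    for ch in text:
        counts[ch] += 1
    return dict(counts)  # plain dict, so later lookups do not create keys

result = count_chars("abca")
print(result)
```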