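This problem requires me to do a frequency analysis of a .txt file.
This is my code so far. It finds the frequency of the words, but how would I get the frequency of the actual letters?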
f = open('cipher.txt', 'r')
word_count = []
for c in f:
    word_count.append(c)
word_count.sort()
decoding = {}
for i in word_count:
    decoding[i] = word_count.count(i)
for n in decoding:
    print(decoding)
This outputs (as a short example, since the txt file is very long):
{'\n': 12, 'vlvf zev jvg jrgs gvzef\n': 1, 'z uvfgriv sbhfv bu wboof!\n': 1, "gsv yrewf zoo nbhea zaw urfsvf'\n": 1, 'xbhow ube gsv avj bjave yv\n': 1, ' gsv fcerat rf czffrat -\n': 1, 'viva gsrf tezff shg\n': 1, 'bph ab sbfbnrxsr (azeebj ebzw gb gsv wvvc abegs)\n': 1, 'cbfg rafrwv gsv shg.\n': 1, 'fb gszg lvze -- gsv fvxbaw lvze bu tvaebph [1689] -- r szw fhwwvaol gzpva\n': 1, 'fb r czgxsvw hc nl gebhfvef, chg avj xbewf ra nl fgezj szg, zaw\n': 1, 'fcrergf bu gsv ebzw yvxpbavw nv, zaw r xbhow abg xbaxvagezgv ba zalgsrat.\n': 1, 'fgbbw zg gsv xebffebzwf bu czegrat, r jvcg tbbwylv.\n': 1,
It gives me the words, but how would I get the letters, for example how many 'a's or how many 'b's there are?
Counter is a very useful class from Python's standard library (collections.Counter) that solves your problem elegantly.
# count the letter frequency
from collections import Counter
with open('cipher.txt', 'r') as f:
    s = f.read()
c = Counter(s)  # the type of c is collections.Counter
# if you want a plain dict as your output type
decoding = dict(c)
print(decoding)
If you put "every parting from you is like a little eternity" in your cipher.txt, the Counter code above gives the following result:
{'e': 6, 'v': 1, 'r': 4, 'y': 3, ' ': 8, 'p': 1, 'a': 2, 't': 5, 'i': 5, 'n': 2, 'g': 1, 'f': 1, 'o': 2, 'm': 1, 'u': 1, 's': 1, 'l': 3, 'k': 1}
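Note that the result above also counts the space character, and a real file would add newlines and punctuation as well. If only letters matter, one option is to filter before counting. This is a minimal sketch, assuming upper and lower case should be treated as the same letter; the helper name letter_frequencies is made up for illustration.

```python
from collections import Counter

def letter_frequencies(text):
    """Count only alphabetic characters, ignoring case (hypothetical helper)."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

sample = "every parting from you is like a little eternity"
counts = letter_frequencies(sample)

# Spaces are no longer part of the result
print(counts)
```

With a file, the same helper could be called on the string returned by f.read().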
However, if you want to implement the counting yourself, here is a possible solution that gives the same result as using Counter.
# count the letter frequency manually, without using collections.Counter
with open('cipher.txt', 'r') as f:
    s = f.read()
decoding = {}
for c in s:
    if c in decoding:
        decoding[c] += 1
    else:
        decoding[c] = 1
print(decoding)
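Both versions differ from the code in the question in one important way: they call f.read() first. Iterating over a file object yields whole lines, while iterating over the string returned by read() yields single characters. The sketch below illustrates the difference; it uses io.StringIO with made-up content as a stand-in for cipher.txt so that it runs without the file.

```python
import io
from collections import Counter

# Stand-in for open('cipher.txt'); the content is made up for illustration
fake_file = io.StringIO("gsv shg\nab\n")

# Iterating the file object yields lines, so whole lines get counted
by_line = Counter(fake_file)

fake_file.seek(0)  # rewind before reading the content again

# Iterating the string from read() yields individual characters
by_char = Counter(fake_file.read())

print(by_line)
print(by_char)
```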
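You can use Counter from the collections module of the standard library; it produces the resulting dictionary (Counter is a dict subclass):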
from collections import Counter
s = """
This problem requires me to find the frequency analysis of a .txt file.
This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?"""
c = Counter(s)
print(c.most_common(5))
This prints:
[(' ', 35), ('e', 20), ('t', 13), ('s', 11), ('o', 10)]
Edit: without using Counter, we can use a dictionary and keep incrementing the counts:
c = {}
for character in s:
    try:
        c[character] += 1
    except KeyError:
        c[character] = 1
print(c)
This prints:
{'\n': 4, 'T': 3, 'h': 9, 'i': 9, 's': 11, ' ': 35, 'p': 1, 'r': 9, 'o': 10, 'b': 2, 'l': 6, 'e': 20, 'm': 3, 'q': 4, 'u': 7, 't': 13, 'f': 10, 'n': 6, 'd': 5, 'c': 5, 'y': 5, 'a': 6, '.': 2, 'x': 1, ':': 1, 'w': 3, ',': 1, 'I': 1, 'g': 1, '?': 1}
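For frequency analysis of a cipher text, the share of each letter is often easier to compare than the raw count. The following is only a sketch under a few assumptions: case is ignored, non-letters are dropped, and the helper name relative_frequencies and the sample string (taken from the question's output) are chosen for illustration.

```python
from collections import Counter

def relative_frequencies(text):
    """Return {letter: share of all letters}, most frequent first (hypothetical helper)."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}  # no letters at all, avoid dividing by zero
    return {letter: n / total for letter, n in counts.most_common()}

freqs = relative_frequencies("gsv fcerat rf czffrat")
for letter, share in freqs.items():
    print(f"{letter}: {share:.1%}")
```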
If you don't want to use Counter, here is a shorter alternative:
with open('cipher.txt', 'r') as f:
    s = f.read()
decoding = {}
for c in s:
    # get() returns decoding[c] if the key exists, or the default value (0) if it doesn't
    decoding[c] = decoding.get(c, 0) + 1
print(decoding)