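My approach is to compute the percentage of each letter in the ciphertext, compare it against a text file of English letter frequencies, and calculate the difference between the ciphertext frequencies and the reference frequencies. I then normalize the differences between the two files (I have tried various schemes; this time with a constant of 50 and a divisor of 3, which is quite possibly wrong) and take the average as the key.
Main program: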
import os

# ManageFile, Caesar and FrequencyAnalyzer are defined elsewhere in the project.
attempts = 0
max_attempts = 3
while attempts < max_attempts:
    input_file_name = input("Please enter the file you want to analyze: ")
    if input_file_name.endswith('.txt') and os.path.exists(input_file_name):
        input_text = ManageFile.openFile(input_file_name)
        if input_text is None:
            print("Error: Unable to read the input file.")
            attempts += 1
        else:
            reference_file_name = input("Please enter the reference frequencies file: ")
            if reference_file_name.endswith('.txt') and os.path.exists(reference_file_name):
                reference_frequencies = ManageFile.openFile(reference_file_name)
                if reference_frequencies is None:
                    print("Error: Unable to read the reference frequencies file.")
                    attempts += 1
                else:
                    analyzer = FrequencyAnalyzer(input_text, reference_frequencies)
                    inferred_cipher_key = analyzer.analyze_text()
                    print("Inferred Caesar Cipher Key (Number of Shifts):", inferred_cipher_key)
                    decrypt_file = input("Do you want to decrypt the file? (y/n): ").lower()
                    if decrypt_file == 'y':
                        decrypted_text = Caesar(inferred_cipher_key).decrypt(input_text)
                        print("Decrypted Text:")
                        print(decrypted_text)
                        inner_attempts = 0
                        while inner_attempts < max_attempts:
                            output_file_name = input("Please enter an output file: ")
                            if output_file_name.endswith('.txt'):
                                if os.path.exists(output_file_name):
                                    overwrite = input(f"The file '{output_file_name}' already exists. Do you want to overwrite it? (y/n): ").lower()
                                    if overwrite != 'y':
                                        print("Decryption canceled.")
                                        break  # Exit the loop if user doesn't want to overwrite and return to the main menu
                                    else:
                                        print(f"Overwriting the file '{output_file_name}'...")
                                ManageFile(input_text, inferred_cipher_key).toFile(output_file_name, decrypted_text)
                                print(f"Decrypted text has been saved to '{output_file_name}'")
                                break  # Exit the loop if a valid output file name is provided
                            else:
                                print("Invalid output file name. Please include '.txt' extension.")
                                inner_attempts += 1
                        else:
                            # Runs only when the inner loop ends without a break.
                            print("Failed to get a valid output file name after 3 attempts. Returning to the main menu.")
                    else:
                        print("Decryption canceled.")
                    break
            else:
                print("Invalid reference frequencies file name or file does not exist. Please include '.txt' extension.")
                attempts += 1
    else:
        print("Invalid input file name or file does not exist. Please include '.txt' extension.")
        attempts += 1
if attempts >= max_attempts:
    print("Failed to read the file(s) after 3 attempts. Returning to the main menu.")
Functions:
import collections
import string


class TextAnalyzer:
    def __init__(self, text):
        self._text = text
        self._processed_text = ""
        self._all_letters = string.ascii_uppercase
        self._cleaned_letters = []
        self._cleaned_letters_count = 0
        self._individual_letter_counts = {}
        self._top_5_letter_counts = []

    def _preprocess_text(self):
        # Keep letters only, uppercased, then count them.
        self._cleaned_letters = list(filter(str.isalpha, self._text.upper()))
        self._cleaned_letters_count = len(self._cleaned_letters)
        self._individual_letter_counts = collections.Counter(self._cleaned_letters)
        self._top_5_letter_counts = self._individual_letter_counts.most_common(5)
        self._processed_text = " ".join(self._cleaned_letters)

    def analyze_text(self):
        self._preprocess_text()


class FrequencyAnalyzer(TextAnalyzer):
    def __init__(self, text, frequency_data):
        super().__init__(text)
        self._frequency_data = {}
        # Each line of the reference file looks like "E,12.7".
        for line in frequency_data.splitlines():
            if not line.strip():
                continue  # skip blank lines such as a trailing newline
            char, percent = line.strip().split(',')
            self._frequency_data[char] = float(percent)
        self._letter_percentages = {}
        self._inferred_cipher_key = None

    def _calculate_letter_percentages(self):
        super()._preprocess_text()
        total_letters = sum(self._individual_letter_counts.values())
        letter_percentages = {
            letter: (count / total_letters) * 100 for letter, count in self._individual_letter_counts.items()
        }
        return letter_percentages

    def _normalize_letter_percentages(self, letter_percentages):
        # Constant 50 and divisor 3 are the values being experimented with.
        normalized_diff = {
            letter: (letter_percentages[letter] - self._frequency_data[letter] + 50) % 50 / 3
            for letter in letter_percentages
        }
        return normalized_diff

    def _generate_report(self):
        letter_percentages = self._calculate_letter_percentages()
        normalized_diff = self._normalize_letter_percentages(letter_percentages)
        average_normalized_diff = sum(normalized_diff.values()) / len(normalized_diff)
        inferred_cipher_key = round(average_normalized_diff)
        # print("Inferred Caesar Cipher Key (Number of Shifts):", inferred_cipher_key)
        # print("Final Letter Percentages:")
        # for char, percentage in letter_percentages.items():
        #     print(f"{char}: {percentage:.3f}%")
        return inferred_cipher_key

    def analyze_text(self):
        inferred_cipher_key = self._generate_report()
        return inferred_cipher_key
I have 2 encrypted files and 1 frequency reference file (in percentages):
English reference frequency text file:
A,8.2
B,1.5
C,2.8
D,4.3
E,12.7
F,2.2
G,2.0
H,6.1
I,7.0
J,0.15
K,0.77
L,4.0
M,2.4
N,6.7
O,7.5
P,1.9
Q,0.095
R,6.0
S,6.3
T,9.1
U,2.8
V,0.98
W,2.4
X,0.15
Y,2.0
Z,0.074
When I run the first encrypted file, which contains:
Beqvstm, beqvstm tqbbtm abiz,
pwe Q ewvlmz epib gwc izm.
Cx ijwdm bpm asg aw pqop,
tqsm i lqiuwvl qv bpm asg.
together with the frequency reference file:
Please enter the file you want to analyze: Mystery.txt
Please enter the reference frequencies file: englishtext.txt
Inferred Caesar Cipher Key (Number of Shifts): 8
I get 8, which is correct, because the decrypted text is
Twinkle, twinkle little star,
how I wonder what you are.
Up above the sky so high,
like a diamond in the sky.
But when I try the second ciphertext (which is a proper sentence):
Bpm nikba qv bpib kwuxtmf kiam qa ycmabqwvijtm.
which should give
The facts in that complex case is questionable.
as the decrypted text, the inferred key should be 8, but I get 5.
Please enter the file you want to analyze: facts.txt
Please enter the reference frequencies file: englishtext.txt
Inferred Caesar Cipher Key (Number of Shifts): 5
I even tried other proper sentences, such as
Wkh shrsoh duh zhdulqj irupdo dwwluh
which should give
The people are wearing formal attire
as the decrypted text, the inferred key should be 3, but I get 4.
Please enter the file you want to analyze: attire.txt
Please enter the reference frequencies file: englishtext.txt
Inferred Caesar Cipher Key (Number of Shifts): 4
What is wrong with my approach, and is there a better way?
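Evidently, the sentences you tested do not exhibit the character distribution you would expect from longer English text. The approach you are using is inherently brittle for short texts.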
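To see how far a short sentence can stray from the reference table, a minimal sketch along the following lines compares the observed percentages for one of the test sentences with a handful of the reference values from the question. The helper name `letter_percentages` is made up for illustration and is not part of the original code.

```python
from collections import Counter

# A few reference percentages copied from the frequency file in the question.
REFERENCE = {"E": 12.7, "T": 9.1, "A": 8.2, "C": 2.8, "S": 6.3}


def letter_percentages(text):
    """Return {letter: percentage} for the A-Z letters in text (hypothetical helper)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    return {letter: 100 * count / len(letters) for letter, count in counts.items()}


plaintext = "The facts in that complex case is questionable."
observed = letter_percentages(plaintext)

for letter, expected in REFERENCE.items():
    print(f"{letter}: observed {observed.get(letter, 0.0):5.1f}%  reference {expected:5.1f}%")
```

In this sentence, C accounts for 3 of the 39 letters (about 7.7%) against 2.8% in the reference table, so any scheme that relies on the observed percentages lining up with the table has very little to work with.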
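A more robust approach might be to collect statistics on adjacent letter pairs (or triples, and so on); incidentally, Andrey Markov applied his Markov chains to just this kind of letter-sequence statistics, in the text of Pushkin's Eugene Onegin.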
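As a rough sketch of the letter-pair idea, the code below builds bigram counts from a sample of English text, scores each of the 26 possible decryptions by summed log-probability (with add-one smoothing), and ranks the keys. All function names and the tiny `SAMPLE_CORPUS` are assumptions made for illustration; a usable model would need a far larger training text than the few lines shown here.

```python
import math
import re
from collections import Counter

# Tiny stand-in corpus (an assumption); use a large English text in practice.
SAMPLE_CORPUS = (
    "Twinkle, twinkle little star, how I wonder what you are. "
    "Up above the sky so high, like a diamond in the sky."
)


def bigrams(text):
    """Yield adjacent letter pairs inside each word of text, uppercased."""
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - 1):
            yield word[i:i + 2]


def train_bigram_model(corpus):
    """Return a log-probability function for letter pairs, with add-one smoothing."""
    counts = Counter(bigrams(corpus))
    total = sum(counts.values())
    vocabulary = 26 * 26  # number of possible letter pairs

    def log_prob(pair):
        return math.log((counts[pair] + 1) / (total + vocabulary))

    return log_prob


def shift_back(text, key):
    """Undo a Caesar shift of `key` positions, leaving non-letters untouched."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def rank_keys(ciphertext, log_prob):
    """Return all 26 keys, most plausible first, by total bigram log-probability."""
    scored = []
    for key in range(26):
        candidate = shift_back(ciphertext, key)
        score = sum(log_prob(pair) for pair in bigrams(candidate))
        scored.append((score, key))
    scored.sort(reverse=True)
    return [key for _, key in scored]


model = train_bigram_model(SAMPLE_CORPUS)
ranking = rank_keys("Wkh shrsoh duh zhdulqj irupdo dwwluh", model)
print("Most plausible keys first:", ranking[:3])
```

How well this ranks the true key depends almost entirely on the size and quality of the training text, so the result from the stand-in corpus should not be read as representative.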
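For a less invasive change, you could try the top three (or top five, and so on) rotation keys suggested by the frequency analysis you already have, and check against a dictionary which one looks most plausible, or perhaps show them all to the user and let them choose.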
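One way this could look is sketched below. Note that it swaps in a different frequency score from the one in the question: each of the 26 shifts is scored with a chi-squared statistic against the reference table, the best few are kept, and a dictionary check picks among them. The `WORDS` set is a tiny stand-in for a real dictionary, and all function names are hypothetical.

```python
import re
from collections import Counter

# Reference percentages copied from the frequency file in the question.
ENGLISH = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.77, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.095, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 0.98, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.074,
}

# Tiny stand-in word list (an assumption); a real check would load a full dictionary.
WORDS = {"the", "people", "are", "is", "in", "that", "and", "of", "to"}


def shift_back(text, key):
    """Undo a Caesar shift of `key` positions, leaving non-letters untouched."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text):
    """Chi-squared distance between text's letter counts and ENGLISH (lower is better)."""
    letters = [ch for ch in text.upper() if ch in ENGLISH]
    total = len(letters)
    if total == 0:
        return float("inf")
    counts = Counter(letters)
    return sum(
        (counts[letter] - total * pct / 100) ** 2 / (total * pct / 100)
        for letter, pct in ENGLISH.items()
    )


def candidate_keys(ciphertext, top_n=3):
    """Return the top_n keys whose decryptions look most like English by frequency."""
    ranked = sorted(range(26), key=lambda k: chi_squared(shift_back(ciphertext, k)))
    return ranked[:top_n]


def dictionary_hits(text):
    """Count how many words of text appear in WORDS."""
    return sum(word in WORDS for word in re.findall(r"[a-z]+", text.lower()))


def best_key(ciphertext, top_n=3):
    """Pick, among the top_n frequency candidates, the key with the most dictionary hits."""
    candidates = candidate_keys(ciphertext, top_n)
    return max(candidates, key=lambda k: dictionary_hits(shift_back(ciphertext, k)))


ciphertext = "Wkh shrsoh duh zhdulqj irupdo dwwluh"
for key in candidate_keys(ciphertext, top_n=5):
    print(key, shift_back(ciphertext, key))
```

Printing the candidates, as in the last loop, also covers the option of showing them to the user and letting them choose; `best_key` automates the choice when a word list is available.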