Inferred Caesar key is inaccurate on proper sentences

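My approach is to compute the percentage of each letter in the ciphertext, compare it with a file of English reference frequencies, and calculate the difference between the ciphertext frequency and the reference frequency for each letter. I then normalize those differences (I tried various schemes; this time with a constant of 50 and a divisor of 3, which is quite likely wrong) and take the average as the key.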

Main program:

import os

attempts = 0
max_attempts = 3

while attempts < max_attempts:
    input_file_name = input("Please enter the file you want to analyze: ")

    if input_file_name.endswith('.txt') and os.path.exists(input_file_name):
        input_text = ManageFile.openFile(input_file_name)

        if input_text is None:
            print("Error: Unable to read the input file.")
            attempts += 1
        else:
            reference_file_name = input("Please enter the reference frequencies file: ")

            if reference_file_name.endswith('.txt') and os.path.exists(reference_file_name):
                reference_frequencies = ManageFile.openFile(reference_file_name)

                if reference_frequencies is None:
                    print("Error: Unable to read the reference frequencies file.")
                    attempts += 1
                else:
                    analyzer = FrequencyAnalyzer(input_text, reference_frequencies)
                    inferred_cipher_key = analyzer.analyze_text()
                    print("Inferred Caesar Cipher Key (Number of Shifts):", inferred_cipher_key)

                    decrypt_file = input("Do you want to decrypt the file? (y/n): ").lower()

                    if decrypt_file == 'y':
                        decrypted_text = Caesar(inferred_cipher_key).decrypt(input_text)
                        print("Decrypted Text:")
                        print(decrypted_text)
                        inner_attempts = 0
                        while inner_attempts < max_attempts:
                            output_file_name = input("Please enter a output file: ")

                            if output_file_name.endswith('.txt'):
                                if os.path.exists(output_file_name):
                                    overwrite = input(f"The file '{output_file_name}' already exists. Do you want to overwrite it? (y/n): ").lower()
                                    if overwrite != 'y':
                                        print("Decryption canceled.")
                                        break # Exit the loop if user doesn't want to overwrite and return to the main menu
                                    else:
                                        print(f"Overwriting the file '{output_file_name}'...")

                                ManageFile(input_text, inferred_cipher_key).toFile(output_file_name, decrypted_text)
                                print(f"Decrypted text has been saved to '{output_file_name}'")
                                break  # Exit the loop if a valid output file name is provided
                            else:
                                print("Invalid output file name. Please include '.txt' extension.")
                                inner_attempts += 1
                        else:
                            print("Failed to read the file after 3 attempts. Returning to the main menu.")
                    else:
                        print("Decryption canceled.")
                    break
            else:
                print("Invalid reference frequencies file name or file does not exist. Please include '.txt' extension.")
                attempts += 1
    else:
        print("Invalid input file name or file does not exist. Please include '.txt' extension.")
        attempts += 1

if attempts >= max_attempts:
    print("Failed to read the file(s) after 3 attempts. Returning to the main menu.")

Classes:

import collections
import string

class TextAnalyzer:
    def __init__(self, text):
        self._text = text
        self._processed_text = ""
        self._all_letters = string.ascii_uppercase
        self._cleaned_letters = []
        self._cleaned_letters_count = 0
        self._individual_letter_counts = {}
        self._top_5_letter_counts = []

    def _preprocess_text(self):
        self._cleaned_letters = list(filter(str.isalpha, self._text.upper()))
        self._cleaned_letters_count = len(self._cleaned_letters)
        self._individual_letter_counts = collections.Counter(self._cleaned_letters)
        self._top_5_letter_counts = self._individual_letter_counts.most_common(5)
        self._processed_text = " ".join(self._cleaned_letters)

    def analyze_text(self):
        self._preprocess_text()

class FrequencyAnalyzer(TextAnalyzer):
    def __init__(self, text, frequency_data):
        super().__init__(text)
        self._frequency_data = {}
        lines = frequency_data.split("\n")
        for line in lines:
            char, percent = line.strip().split(',')
            self._frequency_data[char] = float(percent)
        self._letter_percentages = {}
        self._inferred_cipher_key = None

    def _calculate_letter_percentages(self):
        super()._preprocess_text()
        total_letters = sum(self._individual_letter_counts.values())
        letter_percentages = {
            letter: (count / total_letters) * 100 for letter, count in self._individual_letter_counts.items()
        }
        return letter_percentages

    def _normalize_letter_percentages(self, letter_percentages):
        normalized_diff = {
            letter: (letter_percentages[letter] - self._frequency_data[letter] + 50) % 50 / 3
            for letter in letter_percentages
        }
        return normalized_diff

    def _generate_report(self):
        letter_percentages = self._calculate_letter_percentages()
        normalized_diff = self._normalize_letter_percentages(letter_percentages)
        average_normalized_diff = sum(normalized_diff.values()) / len(normalized_diff)
        inferred_cipher_key = round(average_normalized_diff)
        #print("Inferred Caesar Cipher Key (Number of Shifts):", inferred_cipher_key)
        return inferred_cipher_key

        # print("Inferred Caesar Cipher Key (Number of Shifts):", inferred_cipher_key)

        # print("Final Letter Percentages:")
        # for char, percentage in letter_percentages.items():
        #     print(f"{char}: {percentage:.3f}%")

    def analyze_text(self):
        inferred_cipher_key = self._generate_report()
        return inferred_cipher_key

I have 2 encrypted files and 1 frequency reference file (in percent):

English reference frequencies text file:

A,8.2
B,1.5
C,2.8
D,4.3
E,12.7
F,2.2
G,2.0
H,6.1
I,7.0
J,0.15
K,0.77
L,4.0
M,2.4
N,6.7
O,7.5
P,1.9
Q,0.095
R,6.0
S,6.3
T,9.1
U,2.8
V,0.98
W,2.4
X,0.15
Y,2.0
Z,0.074

When I run the program on the first encrypted file, which contains:

Beqvstm, beqvstm tqbbtm abiz,
pwe Q ewvlmz epib gwc izm.

Cx ijwdm bpm asg aw pqop,
tqsm i lqiuwvl qv bpm asg.

together with the frequency reference file:

Please enter the file you want to analyze: Mystery.txt
Please enter the reference frequencies file: englishtext.txt
Inferred Caesar Cipher Key (Number of Shifts): 8

I get 8, which is correct, because the decrypted text is

Twinkle, twinkle little star,
how I wonder what you are.

Up above the sky so high,
like a diamond in the sky.

But when I try the second ciphertext (which is a proper sentence):

Bpm nikba qv bpib kwuxtmf kiam qa ycmabqwvijtm.

which should give

The facts in that complex case is questionable.

as the decrypted text, the inferred key should be 8, but I get 5.

Please enter the file you want to analyze: facts.txt
Please enter the reference frequencies file: englishtext.txt
Inferred Caesar Cipher Key (Number of Shifts): 5

I even tried other proper sentences, such as

Wkh shrsoh duh zhdulqj irupdo dwwluh

which should give

The people are wearing formal attire

as the decrypted text, the inferred key should be 3, but I get 4.

Please enter the file you want to analyze: attire.txt
Please enter the reference frequencies file: englishtext.txt
Inferred Caesar Cipher Key (Number of Shifts): 4

What is wrong with the method I am using, and is there a better one?

python encryption caesar-cipher
1 Answer

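Evidently, the sentences you tested do not exhibit the letter distribution you would expect from longer English text. The method you are using is inherently fragile for short texts.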
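To make the short-text problem concrete, here is a small sketch that computes the letter percentages of the 39-letter plaintext from the question. The helper name `letter_percentages` is made up for this illustration and is not part of the original program.

```python
import collections

def letter_percentages(text):
    """Return {letter: percentage} over the alphabetic characters of text."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = collections.Counter(letters)
    return {letter: 100 * n / len(letters) for letter, n in counts.items()}

plaintext = "The facts in that complex case is questionable."
observed = letter_percentages(plaintext)

# With only 39 letters, a single occurrence already weighs 100/39, about 2.56%.
# Q and X each appear once, against 0.095% and 0.15% in the reference table.
for letter in "EQX":
    print(letter, round(observed[letter], 2))
# E 12.82
# Q 2.56
# X 2.56
```

In a sentence this short, one stray Q or X is overrepresented by more than an order of magnitude, so any statistic built on per-letter differences is dominated by noise.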

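A potentially more reliable approach is to collect statistics on adjacent letter pairs (or triples, and so on). Incidentally, this kind of letter-sequence analysis is closely tied to Andrey Markov's early work on Markov chains, which he applied to sequences of letters in text.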
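A rough sketch of the letter-pair idea, under loud assumptions: the function names are invented here, the pair counts are smoothed with simple add-one smoothing, and `SAMPLE_TEXT` is only a stand-in (it is the decrypted rhyme from the question). A usable model would need pair counts from a much larger English corpus; with a sample this small the result on unseen sentences is not trustworthy.

```python
import collections
import math
import string

# Stand-in training text (assumption): far too small for real use.
SAMPLE_TEXT = (
    "Twinkle, twinkle little star, how I wonder what you are. "
    "Up above the sky so high, like a diamond in the sky."
)

def shift_back(text, key):
    """Undo a Caesar shift of `key` positions, leaving non-letters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def letters_only(text):
    """Uppercase the text and drop everything except A-Z."""
    return [ch for ch in text.upper() if ch in string.ascii_uppercase]

def train_bigram_model(sample_text):
    """Count adjacent letter pairs (spaces ignored) in the sample text."""
    letters = letters_only(sample_text)
    pairs = collections.Counter(zip(letters, letters[1:]))
    return pairs, sum(pairs.values())

def bigram_log_likelihood(text, model):
    """Sum of log probabilities of each letter pair, with add-one smoothing."""
    pairs, total = model
    letters = letters_only(text)
    return sum(
        math.log((pairs[pair] + 1) / (total + 26 * 26))
        for pair in zip(letters, letters[1:])
    )

def best_key_by_bigrams(ciphertext, model):
    """Return the key whose decryption has the highest pair likelihood."""
    return max(
        range(26),
        key=lambda k: bigram_log_likelihood(shift_back(ciphertext, k), model),
    )

model = train_bigram_model(SAMPLE_TEXT)
```

The same scoring function can rank all 26 candidate decryptions instead of returning only the best one, which combines naturally with a shortlist-and-check step.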

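For a less invasive change, you could take the top three (or top five, and so on) rotation keys suggested by the frequency analysis you already have and check the resulting decryptions against a dictionary to see which looks most plausible. Alternatively, show all of them to the user and let them choose.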
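The shortlist-and-check idea could look roughly like the sketch below. Two things here are assumptions rather than part of the original program: the ranking step scores each candidate decryption by summing the reference percentage of every letter (a different calculation from the averaging in the question), and `COMMON_WORDS` is a tiny hypothetical word list standing in for a real dictionary.

```python
# Reference percentages, copied from the frequency file in the question.
ENGLISH_FREQ = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.77, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.095, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 0.98, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.074,
}

# Hypothetical mini word list; a real check would load a full dictionary.
COMMON_WORDS = {
    "THE", "IN", "THAT", "IS", "ARE", "AND", "OF", "TO",
    "HOW", "YOU", "WHAT", "LIKE", "SO", "UP", "A", "I",
}

def shift_back(text, key):
    """Undo a Caesar shift of `key` positions, leaving non-letters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def frequency_score(text):
    """Sum the reference percentage of every letter; higher looks more English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text.upper())

def word_hits(text):
    """Count how many words of the text appear in the word list."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.upper())
    return sum(word in COMMON_WORDS for word in cleaned.split())

def guess_key(ciphertext, top_n=5):
    """Shortlist top_n keys by frequency score, then pick the one with most word hits."""
    ranked = sorted(
        range(26),
        key=lambda k: frequency_score(shift_back(ciphertext, k)),
        reverse=True,
    )
    # max() keeps the first of tied candidates, so the frequency ranking breaks ties.
    return max(ranked[:top_n], key=lambda k: word_hits(shift_back(ciphertext, k)))

print(guess_key("Bpm nikba qv bpib kwuxtmf kiam qa ycmabqwvijtm."))  # 8
print(guess_key("Wkh shrsoh duh zhdulqj irupdo dwwluh"))  # 3
```

Because every one of the 26 keys is tried and scored, the frequency table is only used to rank candidates, not to compute the key directly, which makes the result much less sensitive to a few unusual letters in a short sentence.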