I wrote Python multiprocessing code that for some reason only uses one processor, please advise


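I'm trying to brute-force MD5 hashes with a multiprocessing solution, but when I run it, it only ever uses one processor. What am I doing wrong? Feedback would be appreciated, thanks.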

import hashlib
import itertools
import string
import time
import multiprocessing


def crack(ii, numb_core, char_set, cracked_hashes, solution_found, processor_lower_bound, processor_upper_bound):
    # reads hashes from text file of hashes given from passwords in assignment and assembles list with \n removed
    fout = open("cracked_matches.txt", 'w')
    fin = open("hashes.txt", 'r')
    hashes_list = fin.readlines()
    hashes_list = [hash.strip() for hash in hashes_list]
    # not needed anymore, close for security
    fin.close()
    # event allows all processes to stop once a solution is found to avoid unnecessary computation
    # starts counting time, process_time_ns used to measure strictly computation time and avoid float inaccuracies
    time_start = time.process_time_ns() * 1e-9
    # loops organized by the string length allotted to a given processor, every character in the character set
    # is iterated for every possible length and combination of characters
    while len(cracked_hashes) < 8:
        for i in range(processor_lower_bound, processor_upper_bound + 1):
            print(f"{ii} {numb_core} doing {i} {char_set}")
            for char in itertools.product(char_set, repeat=i):
                # every char is combined with every char i times to get string of i length and every variation of chars
                if solution_found.is_set():
                    return 0
                crack_attempt = ''.join(char)
                # initializes md5 hash with the combination of chars encoded as bytes (utf-8 is friendly with all OS)
                m = hashlib.md5()
                m.update(bytes(crack_attempt, encoding='utf-8'))
                # checks the char combination against the hash list, if correct and unique ends the process
                # and outputs answer with time
                if m.hexdigest() in hashes_list and (crack_attempt, m.hexdigest()) not in cracked_hashes:
                    time_end = time.process_time_ns() * 1e-9
                    print("Password Cracked: " + crack_attempt + "\tIn " + str(time_end - time_start) + " seconds"
                            "\nHash: " + m.hexdigest())
                    # hash and password pair are saved and written to file
                    cracked_hashes.add((crack_attempt, m.hexdigest()))
                    #return cracked_hashes
    # in case of failure, processes are still stopped
    print("Unable to crack")


if __name__ == "__main__":
    # hexdigits chosen because MD5 doesn't include punctuation to minimize iterations
    # replace function gets rid of spaces
    char_set = string.hexdigits.replace(string.whitespace, '')
    # initializes cracked hash list to avoid repeating findings
    cracked_hashes = set()
    solution_found = multiprocessing.Event()
    # file is created to log cracked passwords and matching hashes
    # list of processes
    processes = []
    # loop starts multiple processes per execution of crack function, every execution will crack one password,
    # log it, and then end the processes
    for num_core in range(multiprocessing.cpu_count()):
        # distributes hash lengths across processes to divide labor, when a hash is found, that length
        # is skipped to prevent redundancy in computation and concentrate cores on uncracked lengths
        processor_lower_bound = (len(cracked_hashes) + num_core + 1)
        processor_upper_bound = processor_lower_bound + 1
        # initializes process
        process = multiprocessing.Process(target=crack,
                                          args=(len(cracked_hashes), num_core, char_set, cracked_hashes, solution_found,
                                                processor_lower_bound, processor_upper_bound,))
        processes.append(process)
        process.start()
    for process in processes:
        process.join()

I've tried both Pool and Process; Process worked better. Please tell me how to fix this.

python multiprocessing python-multiprocessing cpu md5
1 answer

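Using multiprocessing for MD5 cracking in Python is the right approach, but a few key points need attention to make sure multiple processors are used effectively:

  • Shared memory: in the current implementation each process works independently and shares no state with the others. cracked_hashes is a plain set, so every process receives its own copy, and additions made in one process are invisible to the rest and to the parent. This matters because each process needs to know which hashes the others have already cracked in order to avoid redundant work.

  • Work distribution: the way work is split between processors (processor_lower_bound and processor_upper_bound) is static and may not use all cores effectively. The number of candidates grows exponentially with string length, so processes assigned short lengths finish almost immediately while those assigned long lengths carry nearly all of the work.

  • Event handling: using multiprocessing.Event for solution_found is correct, but the posted code never calls solution_found.set(), so no process is ever signalled to stop once the hashes are cracked.

  • File handling: opening files from several processes needs care to avoid conflicts. Here every process opens cracked_matches.txt in 'w' mode, which truncates it each time. It is better to return results to the main process and let it handle the file writing.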
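To make the imbalance of a length-based split concrete, here is a small arithmetic sketch. The helper name candidates_per_length is illustrative and not part of the original code; it only counts how many candidate strings exist at each length for the 22-character string.hexdigits alphabet.

```python
import string


def candidates_per_length(alphabet_size, max_length):
    """Map each string length to the number of candidate strings of that length."""
    return {n: alphabet_size ** n for n in range(1, max_length + 1)}


# string.hexdigits is '0123456789abcdefABCDEF', i.e. 22 characters
counts = candidates_per_length(len(string.hexdigits), 6)
total = sum(counts.values())
for length, count in counts.items():
    print(f"length {length}: {count:>12,} candidates ({count / total:.1%} of total)")
```

Each extra character multiplies the work by 22, so whichever process owns the longest length does almost all of the computation while the others sit idle.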

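To address these issues, consider the following changes:

  • Shared state through a Manager: use multiprocessing.Manager() to create the shared state for cracked_hashes. This lets all processes see and update one common collection of cracked hashes.

  • Dynamic work distribution: instead of dividing the work statically, consider a queue from which tasks (different string lengths or combinations) are handed to processes as they become free. This keeps all processors busy.

  • Centralized file writing: return results to the main process and do all file writing there, which avoids conflicts between processes.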
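The difference between a plain container and a Manager-backed one can be sketched as follows. This is a minimal illustration under assumptions, not the asker's code: the function name record and the sample password "hello" are made up for the example, and a manager.dict() stands in for the shared set because Manager offers no set proxy.

```python
import hashlib
import multiprocessing


def record(shared, local, digest, password):
    """Store a cracked pair in both containers, from inside a child process."""
    shared[digest] = password  # proxy object: the update goes to the manager process
    local[digest] = password   # plain dict: only the child's own copy changes


if __name__ == "__main__":
    digest = hashlib.md5(b"hello").hexdigest()
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        local = {}
        p = multiprocessing.Process(target=record,
                                    args=(shared, local, digest, "hello"))
        p.start()
        p.join()
        print("manager dict in parent:", dict(shared))  # contains the pair
        print("plain dict in parent:  ", local)         # still empty
```

Every access to a Manager proxy is a round trip to a separate server process, so it is best kept out of the inner loop: check candidates against a local set of target hashes and touch the shared object only when a match is found.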

Here is a revised version of your code that takes these points into account:


import hashlib
import itertools
import multiprocessing
import queue  # only needed for the queue.Empty exception
import string


def load_hashes(path="hashes.txt"):
    # One hex digest per line; a set gives fast membership tests
    with open(path) as fin:
        return {line.strip().lower() for line in fin if line.strip()}


def crack(task_queue, result_queue, char_set, target_hashes, solution_found):
    while not solution_found.is_set():
        length = task_queue.get()
        if length is None:  # Sentinel: no more tasks
            break
        for chars in itertools.product(char_set, repeat=length):
            if solution_found.is_set():
                return
            attempt = ''.join(chars)
            digest = hashlib.md5(attempt.encode('utf-8')).hexdigest()
            # Report only real matches; sending every attempt would flood the queue
            if digest in target_hashes:
                result_queue.put((attempt, digest))


def main():
    char_set = string.hexdigits  # contains no whitespace, so no replace() is needed
    target_hashes = load_hashes()
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    solution_found = multiprocessing.Event()
    num_workers = multiprocessing.cpu_count()

    # One task per candidate length (example range, adjust as needed),
    # followed by one sentinel per worker so idle workers can exit
    for length in range(1, 10):
        task_queue.put(length)
    for _ in range(num_workers):
        task_queue.put(None)

    # Start processes
    processes = []
    for _ in range(num_workers):
        p = multiprocessing.Process(
            target=crack,
            args=(task_queue, result_queue, char_set, target_hashes, solution_found))
        processes.append(p)
        p.start()

    # Collect results in the main process, which also owns the output file
    cracked_hashes = set()
    while len(cracked_hashes) < len(target_hashes):
        workers_alive = any(p.is_alive() for p in processes)
        try:
            attempt, digest = result_queue.get(timeout=1)  # Adjust timeout as needed
        except queue.Empty:
            if not workers_alive:
                break  # Search space exhausted without cracking everything
            continue
        if (attempt, digest) not in cracked_hashes:
            cracked_hashes.add((attempt, digest))
            print("Password Cracked:", attempt, "Hash:", digest)
            with open("cracked_matches.txt", 'a') as fout:
                fout.write(f"{attempt}: {digest}\n")

    # Signal processes to stop and wait for them to finish
    solution_found.set()
    for p in processes:
        p.join()


if __name__ == "__main__":
    main()

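This revised version uses a task queue for dynamic work assignment and a result queue for passing matches back to the main process, which should help keep all available processors busy. Be sure to test it and adjust the task range and timeout to your specific requirements.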
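If one task per string length turns out to be too coarse, the search can be cut into smaller pieces. The following is a rough sketch of one way to do that, assuming a split by first character and using multiprocessing.Pool; the names search_chunk and make_tasks and the sample passwords are illustrative and do not come from the code above.

```python
import hashlib
import itertools
import multiprocessing
import string


def search_chunk(task):
    """Check every candidate that starts with `prefix` and has `suffix_len` more chars."""
    prefix, suffix_len, char_set, targets = task
    found = []
    for tail in itertools.product(char_set, repeat=suffix_len):
        attempt = prefix + ''.join(tail)
        digest = hashlib.md5(attempt.encode('utf-8')).hexdigest()
        if digest in targets:
            found.append((attempt, digest))
    return found


def make_tasks(char_set, max_length, targets):
    """Yield one task per (length, first character) pair."""
    for length in range(1, max_length + 1):
        for prefix in char_set:
            yield (prefix, length - 1, char_set, targets)


if __name__ == "__main__":
    char_set = string.hexdigits
    # Hypothetical targets for the demo; in practice load them from hashes.txt
    targets = frozenset(hashlib.md5(w.encode('utf-8')).hexdigest()
                        for w in ("a1", "Beef"))
    with multiprocessing.Pool() as pool:
        tasks = make_tasks(char_set, 4, targets)
        # imap_unordered hands the next task to whichever worker is free
        for found in pool.imap_unordered(search_chunk, tasks):
            for attempt, digest in found:
                print(attempt, digest)
```

With 22 tasks per length instead of one, the longest length no longer lands on a single process, and the pool takes care of starting, feeding, and joining the workers.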

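Edit based on the added stack trace:

An InvalidOperationException with the message "Cross-thread access to the control is not allowed" means you are trying to access or modify a control from a thread other than the one that created it (usually the main UI thread). This is a common problem in Windows Forms applications when UI elements are updated from a background thread.

In your case, the exception is raised by the AForge.Controls.VideoSourcePlayer component, specifically in the Disconnect method of the Dermascope class. It occurs when VideoSourcePlayer's SignalToStop method is called from a thread that is not the UI thread.

To fix this, you need to make sure that code interacting with UI controls (here, VideoSourcePlayer) runs on the UI thread. You can use the control's Invoke method to marshal the call to the UI thread. Here is how you could modify the Disconnect method to do that: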


private void Disconnect()
{
    if (this.InvokeRequired)
    {
        this.Invoke(new MethodInvoker(() => {
            DisconnectInternal();
        }));
    }
    else
    {
        DisconnectInternal();
    }
}

private void DisconnectInternal()
{
    if (videoSourcePlayer.VideoSource != null)
    {
        // stop video device
        videoSourcePlayer.SignalToStop();
        videoSourcePlayer.WaitForStop();
        videoSourcePlayer.VideoSource = null;

        if (videoDevice.ProvideSnapshots)
        {
            videoDevice.SnapshotFrame -= new NewFrameEventHandler(videoDevice_SnapshotFrame);
        }
    }
}

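In this version, Disconnect checks whether it is being called from a thread other than the UI thread. If so, it uses Invoke to call DisconnectInternal on the UI thread. DisconnectInternal contains the original logic of the Disconnect method.

Apply the same pattern anywhere else in your code where a background thread interacts with UI elements. That should resolve the cross-thread operation exception.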
