Kernel dies when generating a complex schedule

Question (votes: 0, answers: 1)

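I'm trying to build a complex schedule in which, each week, 3 people present, 1 person brings lunch, and 1 person brings coffee, while spacing these duties out as much as possible. A script is preferable to simply cycling through people, because availability and the pool of potential presenters keep changing, and doing this by hand for this many people has proven tricky.

I came up with a script (below) that should be able to do this, but the kernel still crashes even after I added batching. It runs on a remote Slurm server with 60 GB of memory and 15 CPUs, which should be plenty (I use it for CNNs and the like).

Any help identifying the memory leak, or a way to fix it, would be much appreciated.

Here is the script: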

#!/usr/bin/env python

import csv
from datetime import datetime, timedelta
import itertools

def generate_weekly_schedule(people_availability, start_date, end_date):
    # Create a list of all available dates within the specified range
    available_dates = [start_date + timedelta(days=i) for i in range((end_date - start_date).days + 1)]
    # Initialize the schedule
    schedule = []
    for date in available_dates:
        # Filter out people who are not available on this date
        available_people = [person for person, unavailable_dates in people_availability.items() if date not in unavailable_dates]
        if not available_people:
            continue  # Skip the date if no one is available

        # Try to select the presenter for the main presentation
        main_presentation = None
        for person in available_people:
            if all(date + timedelta(days=i) not in people_availability[person] for i in range(7)):
                main_presentation = person
                break
        if main_presentation is None:
            continue  # Skip the date if we couldn't find a main presenter

        # Shuffle the list of available people to randomize the selection of highlights, lunch, and coffee
        random_order = list(itertools.permutations(available_people))
        for order in random_order:
            if main_presentation not in order:
                highlights = order[:2]  # Assign the first two people in the shuffled list to highlights
                lunch_and_coffee_candidates = order[2:]
                break

        # Ensure that lunch and coffee providers are not presenting
        lunch_provider = None
        coffee_provider = None

        for person in lunch_and_coffee_candidates:
            if date + timedelta(days=7) not in people_availability[person] and lunch_provider is None:
                lunch_provider = person
            elif date + timedelta(days=7) not in people_availability[person] and coffee_provider is None:
                coffee_provider = person

        # Append the schedule for this date
        schedule.append([date.strftime("%Y-%m-%d"), main_presentation, highlights[0], highlights[1], lunch_provider, coffee_provider])

    return schedule

def write_schedule_to_csv(schedule, csv_filename):
    with open(csv_filename, mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(["Date", "Main Presentation", "Highlight 1", "Highlight 2", "Lunch Provider", "Coffee Provider"])
        writer.writerows(schedule)

def batch_process_and_merge(people_availability, start_date, end_date, batch_size):
    batched_schedule = []
    current_date = start_date
    while current_date <= end_date:
        batch_end_date = min(current_date + timedelta(days=batch_size - 1), end_date)
        batch_schedule = generate_weekly_schedule(people_availability, current_date, batch_end_date)
        batched_schedule.extend(batch_schedule)
        current_date = batch_end_date + timedelta(days=1)
    return batched_schedule

if __name__ == "__main__":
    # Example input dictionary of people and their unavailable dates
    people_availability = {
        "Person 1": [(datetime(2023, 9, 8), datetime(2023, 9, 20))],
        "Person 2": [(datetime(2023, 9, 8), datetime(2023, 10, 13))],
        "Person 3": [],
        "Person 4": [],
        "Person 5":[(datetime(2023, 10, 1), datetime(2023, 12, 15))],
        "Person 6": [],
        "Person 7": [],
        "Person 8": [datetime(2023, 11, 10), datetime(2023, 11, 17)],
        "Person 9": [],
        "Person 10":[],
        "Person 11": [],
        "Person 12": [(datetime(2023, 10, 27), datetime(2023, 12, 15))],
        "Person 13": [datetime(2023, 9, 15)],
        "Person 14": [(datetime(2023, 9, 8), datetime(2023, 12, 15))],
        "Person 15": [(datetime(2023, 9, 8), datetime(2023, 12, 15))],
        "Person 16": [(datetime(2023, 9, 8), datetime(2023, 12, 15))],
        "Person 17": [],
        "Person 18": [],
        "Person 19": [datetime(2023, 9, 22), datetime(2023, 11, 3)],
        "Person 20": [],
        "Person 21": [],
        "Person 22": [],
        "Person 23": [datetime(2023, 9, 8), datetime(2023, 9, 13)],
        "Person 24": [datetime(2023, 9, 8), datetime(2023, 9, 30)],               
    }

    start_date = datetime(2023, 9, 8)
    end_date = datetime(2023, 12, 15)
    batch_size = 7 

    batched_schedule = batch_process_and_merge(people_availability, start_date, end_date, batch_size)
    write_schedule_to_csv(batched_schedule, "weekly_schedule.csv")

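I tried adding batching and increasing the number of CPUs, but there must be a memory leak somewhere that I'm not seeing, because I don't think this should need that much compute.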

python memory-leaks out-of-memory
1 answer
0 votes

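I stepped through your code with a debugger, and the line that consumes all of your memory is:

random_order = list(itertools.permutations(available_people))

This line is deceptive, because what it does is fill a list with every possible permutation. With 22 people in available_people, that list would need 22! (22 factorial) entries, roughly 1.1 × 10^21, which is far more than any machine can hold. With an iterator like this you cannot dump it into a list; you have to walk through it one item at a time.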
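To make the scale concrete, here is a small sketch that counts the orderings of 22 items (n distinct items have n! permutations) and then pulls only the first few lazily with `itertools.islice`. The names `n_people` and `first_three` are illustrative and not part of the original script.

```python
import itertools
import math

n_people = 22  # number of people available on the first date in the question

# Number of distinct orderings of 22 people is 22!
n_permutations = math.factorial(n_people)
print(f"{n_permutations:.3e}")  # 1.124e+21

# Taking a few permutations lazily is cheap: nothing is materialized
# beyond the three tuples that are actually requested.
lazy_permutations = itertools.permutations(range(n_people))
first_three = list(itertools.islice(lazy_permutations, 3))
print(len(first_three))  # 3
```

Even at one byte per permutation, that count would be on the order of a billion terabytes, so batching by date or adding CPUs cannot help.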

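I'm not sure how you want to implement your logic here, but using

random_order = itertools.permutations(available_people)

to keep it as an iterator should stop it from trying to claim more memory than you could possibly have.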
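One caveat: a lazy iterator fixes the memory use but not the run time. Every permutation of available_people contains main_presentation, so the check `if main_presentation not in order` never succeeds and the loop would try to walk all 22! orderings. If the goal is simply a random ordering of everyone except the main presenter (an assumption about the intent), a single shuffle is enough. The sketch below uses a hypothetical helper, `pick_roles`, that is not in the original script.

```python
import random

def pick_roles(available_people, main_presentation, rng=random):
    """Hypothetical helper: randomly order everyone except the main presenter.

    Returns (highlights, lunch_and_coffee_candidates), mirroring the
    variable names used in the question's script.
    """
    # Exclude the main presenter up front instead of searching permutations.
    candidates = [p for p in available_people if p != main_presentation]
    rng.shuffle(candidates)  # in-place shuffle, linear time and memory
    highlights = candidates[:2]
    lunch_and_coffee_candidates = candidates[2:]
    return highlights, lunch_and_coffee_candidates

people = [f"Person {i}" for i in range(1, 23)]
highlights, rest = pick_roles(people, "Person 3", random.Random(0))
print(len(highlights), len(rest))  # 2 19
```

Passing a seeded `random.Random` instance makes the schedule reproducible; omitting it falls back to the module-level generator.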
