Understanding PyTorch memory management and GPU-to-CPU transfers


Introduction

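I'm currently developing an application that uses PyTorch, and I've run into some interesting memory-management behavior. When I load a model and move it from the CPU to the GPU, RAM usage drops by only part of the model's size (which seems normal). However, when I move the model back from the GPU to the CPU, the full model size is added back to RAM. Neither explicitly calling the garbage collector nor using torch functions to free memory seems to release that RAM; only GPU memory is freed.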

Reproducing the issue: the following snippet demonstrates the problem:

import gc
import torch
import torch.nn as nn
from memory_profiler import profile

INT_ITERATION = 5

class LargeNet(nn.Module):
    def __init__(self):
        super(LargeNet, self).__init__()
        self.fc1 = nn.Linear(10000, 5000)
        self.fc2 = nn.Linear(5000, 1000)
        self.fc3 = nn.Linear(1000, 500)
        self.fc4 = nn.Linear(500, 100)
        self.fc5 = nn.Linear(100, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = torch.relu(self.fc4(x))
        x = self.fc5(x)
        return x


@profile
def run_test():
    # Create the network and move it to the GPU
    model = LargeNet()
    model = model.to('cuda')
    
    model = model.to('cpu')
    del model

    gc.collect()
    torch.cuda.empty_cache()


if __name__ == "__main__":
    print("PyTorch version:", torch.__version__)

    if torch.cuda.is_available():
        for i in range(INT_ITERATION):
            print(f'******* Iteration num: {i+1} *********** \n')
            run_test()

        input("Press Enter to continue...")
    
    else:
        print('CUDA is not available')
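As a sanity check on the roughly 210 MiB jumps in the profiler output, the parameter memory of LargeNet can be worked out by hand from its layer shapes. The sketch below assumes the default float32 dtype (4 bytes per value). It hard-codes the shapes from the class above instead of importing torch, and the helper name linear_param_count is hypothetical. With torch installed, `sum(p.numel() * p.element_size() for p in model.parameters())` should give the same byte count.

```python
# Layer shapes copied from LargeNet above: (in_features, out_features).
LAYER_SHAPES = [(10000, 5000), (5000, 1000), (1000, 500), (500, 100), (100, 10)]
BYTES_PER_FLOAT32 = 4  # assumption: default float32 parameters


def linear_param_count(shapes):
    """Hypothetical helper: total parameters of a stack of nn.Linear layers.

    Each nn.Linear(in, out) holds an out x in weight matrix plus an out-sized bias.
    """
    return sum(out_f * in_f + out_f for in_f, out_f in shapes)


n_params = linear_param_count(LAYER_SHAPES)
n_bytes = n_params * BYTES_PER_FLOAT32
print(n_params, n_bytes, round(n_bytes / 2**20, 1))  # 55557610 222230440 211.9
```

That is roughly 212 MiB of weights and biases, which is in line with the +214.1 MiB (`LargeNet()`) and +211.7 MiB (`model.to('cpu')`) increments in the first iteration.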
    

To run the code and reproduce the issue, you need the following packages installed in your Python environment:

torch
memory_profiler
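memory_profiler reports process-wide memory. By default, as far as I know, this is the resident set size (RSS). The numbers therefore cover everything the process has resident, including libraries loaded by PyTorch and CUDA, and not only tensor storage. For a dependency-free cross-check, the Linux-only sketch below reads the current RSS. It assumes /proc/self/status exists, which is not the case on Windows or macOS, and the function name current_rss_mib is hypothetical.

```python
def current_rss_mib() -> float:
    """Return this process's resident set size in MiB (Linux only).

    Assumption: /proc/self/status contains a 'VmRSS:' line reported in kB.
    """
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                kib = int(line.split()[1])  # second field is the value in kB
                return kib / 1024
    raise RuntimeError("VmRSS not found; this sketch only works on Linux")


if __name__ == "__main__":
    before = current_rss_mib()
    buf = b"\x01" * (64 * 1024 * 1024)  # 64 MiB of memory that is actually written
    during = current_rss_mib()
    del buf
    after = current_rss_mib()
    print(f"before={before:.1f} MiB during={during:.1f} MiB after={after:.1f} MiB")
```

Calling current_rss_mib() around each `.to(...)` line gives the same kind of number as the `Mem usage` column. Whether RSS falls again after `del` can depend on the C allocator, not only on Python's garbage collector.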

Output and observations: On my Ubuntu 20.04 machine with Torch 2.2.2 and CUDA 12.1, I observe the behavior below. I hit the same issue on a Windows PC with Torch 2.1.0 and CUDA 12.1.

******* Iteration num: 1 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    332.7 MiB    332.7 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    546.9 MiB    214.1 MiB           1       model = LargeNet()
    30    451.2 MiB    -95.6 MiB           1       model = model.to('cuda')
    31                                         
    32    662.9 MiB    211.7 MiB           1       model = model.to('cpu')
    33    472.4 MiB   -190.5 MiB           1       del model
    34                                         
    35    472.4 MiB      0.0 MiB           1       gc.collect()
    36    472.4 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 2 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    472.4 MiB    472.4 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    682.0 MiB    209.6 MiB           1       model = LargeNet()
    30    491.5 MiB   -190.5 MiB           1       model = model.to('cuda')
    31                                         
    32    682.0 MiB    190.5 MiB           1       model = model.to('cpu')
    33    491.5 MiB   -190.5 MiB           1       del model
    34                                         
    35    491.5 MiB      0.0 MiB           1       gc.collect()
    36    491.5 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 3 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    491.5 MiB    491.5 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    701.1 MiB    209.6 MiB           1       model = LargeNet()
    30    510.6 MiB   -190.5 MiB           1       model = model.to('cuda')
    31                                         
    32    720.2 MiB    209.6 MiB           1       model = model.to('cpu')
    33    529.6 MiB   -190.5 MiB           1       del model
    34                                         
    35    529.6 MiB      0.0 MiB           1       gc.collect()
    36    529.6 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 4 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    529.6 MiB    529.6 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    720.2 MiB    190.5 MiB           1       model = LargeNet()
    30    529.7 MiB   -190.5 MiB           1       model = model.to('cuda')
    31                                         
    32    682.4 MiB    152.7 MiB           1       model = model.to('cpu')
    33    491.6 MiB   -190.7 MiB           1       del model
    34                                         
    35    491.6 MiB      0.0 MiB           1       gc.collect()
    36    491.6 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 5 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    491.6 MiB    491.6 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    701.2 MiB    209.6 MiB           1       model = LargeNet()
    30    510.6 MiB   -190.6 MiB           1       model = model.to('cuda')
    31                                         
    32    720.2 MiB    209.6 MiB           1       model = model.to('cpu')
    33    529.7 MiB   -190.5 MiB           1       del model
    34                                         
    35    529.7 MiB      0.0 MiB           1       gc.collect()
    36    529.7 MiB      0.0 MiB           1       torch.cuda.empty_cache()


Press Enter to continue...

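Interestingly, after 3 to 4 iterations the memory usage levels off and stops increasing. However, the initial growth is particularly annoying. The first time I load the model, it runs with less RAM than it needs in later iterations.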
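The plateau can be checked with simple arithmetic on the profiler output. For each iteration, subtract the RSS on entry to run_test() from the RSS after `del model`. The values below are copied from the output. The helper name net_growth is hypothetical, and the comparison covers only these five runs.

```python
# MiB values copied from the memory_profiler output, one per iteration.
entry = [332.7, 472.4, 491.5, 529.6, 491.6]      # "@profile" line (function entry)
after_del = [472.4, 491.5, 529.6, 491.6, 529.7]  # "del model" line


def net_growth(entry_mib, after_mib):
    """RSS left behind by each call, rounded to the profiler's 0.1 MiB resolution."""
    return [round(a - e, 1) for e, a in zip(entry_mib, after_mib)]


print(net_growth(entry, after_del))           # [139.7, 19.1, 38.1, -38.0, 38.1]
print(round(after_del[-1] - entry[0], 1))     # 197.0
```

The first call leaves about 140 MiB behind, and later calls only oscillate by about ±38 MiB. In total, about 197 MiB stays resident, which is comparable to the jump seen when `LargeNet()` is created.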

Questions:

  1. Is this expected behavior in PyTorch, or could it be a bug?
  2. If it is expected, is there a way to release all of Torch's CPU memory without terminating the thread?
deep-learning memory-management pytorch
1 Answer

I noticed the same issue. Hoping someone can help.

© www.soinside.com 2019 - 2024. All rights reserved.