Understanding PyTorch memory management and GPU-to-CPU transfers

Question

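Introduction:

I am currently developing an application that uses PyTorch, and I have run into some interesting behavior related to memory management. Specifically, when I load a model and move it from the CPU to the GPU, only part of the model is transferred to the GPU (which seems normal). However, when I move the model back from the GPU to the CPU, the entire model size is moved back as well, which increases RAM usage. Even explicitly calling the garbage collector or using torch functions to free memory does not seem to release the RAM (only the GPU memory is released).

Reproducing the issue: The following code snippet demonstrates the problem: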

import gc
import torch
import torch.nn as nn
from memory_profiler import profile

INT_ITERATION = 5

class LargeNet(nn.Module):
    def __init__(self):
        super(LargeNet, self).__init__()
        self.fc1 = nn.Linear(10000, 5000)
        self.fc2 = nn.Linear(5000, 1000)
        self.fc3 = nn.Linear(1000, 500)
        self.fc4 = nn.Linear(500, 100)
        self.fc5 = nn.Linear(100, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = torch.relu(self.fc4(x))
        x = self.fc5(x)
        return x


@profile
def run_test():
    # Create the network and move it to the GPU
    model = LargeNet()
    model = model.to('cuda')
    
    model = model.to('cpu')
    del model

    gc.collect()
    torch.cuda.empty_cache()


if __name__ == "__main__":
    print("PyTorch version:", torch.__version__)

    if torch.cuda.is_available():
        for i in range(INT_ITERATION):
            print(f'******* Iteration num: {i+1} *********** \n')
            run_test()

        input("Press Enter to continue...")
    
    else:
        print('CUDA is not available')

To run the code and reproduce the issue, you need the torch and memory_profiler packages installed in your Python environment.

Output and observations: On my Ubuntu 20.04 machine with Torch 2.2.2 and CUDA 12.1 (I hit the same problem on a Windows PC with Torch 2.1.0 and CUDA 12.1), I observe the following behavior:

******* Iteration num: 1 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    332.7 MiB    332.7 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    546.9 MiB    214.1 MiB           1       model = LargeNet()
    30    451.2 MiB    -95.6 MiB           1       model = model.to('cuda')
    31                                         
    32    662.9 MiB    211.7 MiB           1       model = model.to('cpu')
    33    472.4 MiB   -190.5 MiB           1       del model
    34                                         
    35    472.4 MiB      0.0 MiB           1       gc.collect()
    36    472.4 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 2 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    472.4 MiB    472.4 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    682.0 MiB    209.6 MiB           1       model = LargeNet()
    30    491.5 MiB   -190.5 MiB           1       model = model.to('cuda')
    31                                         
    32    682.0 MiB    190.5 MiB           1       model = model.to('cpu')
    33    491.5 MiB   -190.5 MiB           1       del model
    34                                         
    35    491.5 MiB      0.0 MiB           1       gc.collect()
    36    491.5 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 3 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    491.5 MiB    491.5 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    701.1 MiB    209.6 MiB           1       model = LargeNet()
    30    510.6 MiB   -190.5 MiB           1       model = model.to('cuda')
    31                                         
    32    720.2 MiB    209.6 MiB           1       model = model.to('cpu')
    33    529.6 MiB   -190.5 MiB           1       del model
    34                                         
    35    529.6 MiB      0.0 MiB           1       gc.collect()
    36    529.6 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 4 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    529.6 MiB    529.6 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    720.2 MiB    190.5 MiB           1       model = LargeNet()
    30    529.7 MiB   -190.5 MiB           1       model = model.to('cuda')
    31                                         
    32    682.4 MiB    152.7 MiB           1       model = model.to('cpu')
    33    491.6 MiB   -190.7 MiB           1       del model
    34                                         
    35    491.6 MiB      0.0 MiB           1       gc.collect()
    36    491.6 MiB      0.0 MiB           1       torch.cuda.empty_cache()


******* Iteration num: 5 *********** 

Filename: test_torch_memory_leak.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26    491.6 MiB    491.6 MiB           1   @profile
    27                                         def run_test():
    28                                             # Create the network and move it to the GPU
    29    701.2 MiB    209.6 MiB           1       model = LargeNet()
    30    510.6 MiB   -190.6 MiB           1       model = model.to('cuda')
    31                                         
    32    720.2 MiB    209.6 MiB           1       model = model.to('cpu')
    33    529.7 MiB   -190.5 MiB           1       del model
    34                                         
    35    529.7 MiB      0.0 MiB           1       gc.collect()
    36    529.7 MiB      0.0 MiB           1       torch.cuda.empty_cache()


Press Enter to continue...

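Interestingly, after 3 or 4 iterations the memory usage stabilizes and does not increase any further. However, this initial behavior is particularly annoying, because the first time the model is loaded I can use it with less memory than in subsequent iterations.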
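For scale, the parameter footprint of LargeNet can be worked out by hand and compared with the increments in the profiler output. This is a minimal sketch, assuming the default float32 parameters and ignoring any allocator or framework overhead; the helper names are only for illustration.

```python
# Rough float32 parameter footprint of LargeNet (allocator overhead ignored).
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2

# (in_features, out_features) for fc1..fc5, copied from the class definition.
LAYERS = [(10000, 5000), (5000, 1000), (1000, 500), (500, 100), (100, 10)]


def linear_param_count(in_features, out_features):
    """Weights plus biases of one nn.Linear layer."""
    return in_features * out_features + out_features


def to_mib(num_params):
    """Convert a float32 parameter count to MiB."""
    return num_params * BYTES_PER_FLOAT32 / MIB


total_params = sum(linear_param_count(i, o) for i, o in LAYERS)
fc1_weight_params = 10000 * 5000

print(f"total parameters: {total_params}")                      # 55557610
print(f"all parameters:   {to_mib(total_params):.1f} MiB")      # 211.9 MiB
print(f"fc1.weight alone: {to_mib(fc1_weight_params):.1f} MiB") # 190.7 MiB
```

These figures sit close to the roughly 210 MiB and 190.5 MiB increments in the output above. Whether the remaining difference of about 19 MiB is memory the allocator keeps for reuse is a guess, not something these measurements establish.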

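Questions:

Is this behavior expected in PyTorch, or could it be a bug?
If this behavior is expected, is there a way to release all of Torch's memory on the CPU without closing the thread?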
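On Linux with glibc, one thing that may be worth trying after `del model` and `gc.collect()` is `malloc_trim(0)`, which asks the C allocator to hand free heap pages back to the operating system. The sketch below is an experiment under stated assumptions, not a confirmed fix: `read_rss_mib` and `trim_host_heap` are hypothetical helper names, the `/proc/self/status` read is Linux-only, and whether the memory freed by PyTorch on this setup is actually held in the glibc heap has not been verified.

```python
import ctypes
import ctypes.util


def read_rss_mib():
    """Return this process's resident set size in MiB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024  # value is reported in kB
    except OSError:
        pass
    return None


def trim_host_heap():
    """Ask glibc to return free heap pages to the OS.

    Returns True if memory was released, False if not, and None when
    malloc_trim is unavailable (for example on Windows, macOS, or musl).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        libc = ctypes.CDLL(libc_name)
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return bool(malloc_trim(0))


# Compare resident memory before and after the trim request.
before = read_rss_mib()
released = trim_host_heap()
after = read_rss_mib()
print(f"RSS before: {before}, after: {after}, released: {released}")
```

In the reproduction script, the call would go right after `gc.collect()` inside `run_test()`, so the profiler line for it shows whether RAM usage drops.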

Answer 1

I have noticed the same issue. Hoping someone can help.
