GPU Memory Full On Idle

I ran nvidia-smi and noticed that GPU memory was almost completely full even though nothing was running: GPU utilization was at 0%, there were no active jobs, yet nearly all of the VRAM was occupied. At first I thought something was wrong with the GPU or the drivers, but that wasn’t the case.

$ nvidia-smi
Sun Mar 22 13:01:31 2026       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5500               Off | 00000000:B3:00.0 Off |                  Off |
| 30%   35C    P8              18W / 230W |  24172MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|=======================================================================================|
|    0   N/A  N/A      2749      G   /usr/lib/xorg/Xorg                           63MiB |
|    0   N/A  N/A      2805      G   /usr/bin/gnome-shell                          7MiB |
+---------------------------------------------------------------------------------------+

After Googling

I found that sudo reboot works, but since this is a multi-user machine I would have had to ask for permission first, so I kept looking for a fix that didn’t need a restart.

Further Googling

In my case, I had been running training jobs that used a PyTorch DataLoader with num_workers > 0. PyTorch spawns multiple worker processes (pt_data_worker) for faster data loading. The problem starts when a script crashes or I interrupt it with Ctrl + C: these worker processes don’t always terminate properly and keep running silently in the background, still holding GPU memory.

$ sudo fuser -v /dev/nvidia* 
                     USER        PID    ACCESS COMMAND
/dev/nvidia0:        root       2749    F...m  Xorg
                     gdm        2805    F...m  gnome-shell
                     <user>     414634  F...m  pt_data_worker
                     <user>     414646  F...m  pt_data_worker
                     <user>     414647  F...m  pt_data_worker
                     <user>     415374  F...m  python
/dev/nvidiactl:      root       2749    F...m  Xorg
                     gdm        2805    F...m  gnome-shell
                     <user>     414634  F...m  pt_data_worker
                     <user>     414646  F...m  pt_data_worker
                     <user>     414647  F...m  pt_data_worker
                     <user>     415374  F....  python
/dev/nvidia-modeset: root       2749    F....  Xorg
                     gdm        2805    F....  gnome-shell
/dev/nvidia-uvm:     <user>     414634  F....  pt_data_worker
                     <user>     414646  F....  pt_data_worker
                     <user>     414647  F....  pt_data_worker
                     <user>     415374  F....  python

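The fix is to kill those leftover worker and python processes owned by your user (leave Xorg and gnome-shell alone). Something like the following should free the memory; the PIDs here are just the ones fuser reported above, so swap in whatever it shows on your machine:

$ kill 414634 414646 414647 415374
# if they ignore SIGTERM, force-kill them:
$ kill -9 414634 414646 414647 415374
# or kill every leftover PyTorch data worker owned by you in one go:
$ pkill -9 -u "$USER" -f pt_data_worker

After that, nvidia-smi should show the VRAM released.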
Solved. It’s working for me now.
