Purging and Reinstalling CUDA

What I did to clear up my install
Published September 30, 2023

My deep learning machine has ended up with two repositories for cuda dependencies, and cuda is periodically failing. To clean this up, I want to purge all packages and configuration related to cuda and then reinstall it from scratch.

Purging

Purging has two parts: first the packages related to cuda and nvidia, then the custom apt sources and signing keys that were added to install them.

Purging Packages

The packages on the system can be listed with dpkg -l, which prints one package per line with columns for the state, name, version, architecture and description. For example:

➜ dpkg -l | grep nvidia
ii  libnvidia-cfg1-535:amd64                   535.113.01-0ubuntu0.20.04.1              amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-535                       535.104.12-0ubuntu1                      all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-450:amd64                450.119.04-0ubuntu1                      amd64        NVIDIA libcompute package
...

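The cuda packages have either cuda or nvidia in their name, and the package name is the second column of each line. The first column is the package state: ii means installed, and rc means the package was removed but its configuration files are still present. We can extract the name using awk '{ print $2 }', which prints the second whitespace-separated field of each line: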

➜ dpkg -l | grep nvidia | awk '{ print $2 }'
libnvidia-cfg1-535:amd64
libnvidia-common-535
libnvidia-compute-450:amd64
...
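The two greps can also be folded into a single awk pattern that only looks at the package-name column, so a package is never selected just because its version or description happens to contain the word. This is a sketch of my own rather than part of the original steps; list_gpu_packages is a hypothetical name, and it assumes the dpkg -l column layout shown above.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages whose name (column 2) contains "cuda" or "nvidia".
# Both installed (ii) and removed-but-configured (rc) rows are kept, since
# `apt-get remove --purge` also clears the leftover config of rc packages.
# Header rows of dpkg -l never match, because their second field is not a
# package name containing either word.
list_gpu_packages() {
  awk '$2 ~ /cuda|nvidia/ { print $2 }'
}

# Usage on a real machine:
#   dpkg -l | list_gpu_packages
```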

We can then pipe these names into xargs, which reads them from standard input and appends them as arguments to the given command:

➜ dpkg -l | grep nvidia | awk '{ print $2 }' | xargs sudo apt-get remove --purge
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED
  libnvidia-cfg1-535* libnvidia-common-535* libnvidia-compute-450* ...

GNU xargs runs apt-get with its standard input redirected from /dev/null, so apt-get cannot read an answer to its confirmation prompt and aborts. This gives you a chance to review the packages that would be removed. Packages in the list that are not installed are skipped.
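The abort can be reproduced without touching apt at all. In this sketch (assuming GNU xargs, which Ubuntu ships), the child command tries to read a line from its own standard input. xargs has already consumed the pipe to build the argument list and hands the child /dev/null, so the read fails straight away, just as apt-get's confirmation prompt does. child_stdin_check is a hypothetical name used only for this demonstration.

```shell
# Feed two lines to xargs and let the child command try to read its own stdin.
# xargs turns the piped lines into arguments ($1, $2 for the child) and gives
# the child /dev/null as stdin, so `read` hits end-of-file immediately.
child_stdin_check() {
  printf 'pkg-a\npkg-b\n' | xargs sh -c '
    if read -r line; then
      echo "child read: $line"
    else
      echo "child stdin is empty; args were: $*"
    fi
  ' xargs-child
}

# Usage:
#   child_stdin_check
```

Recent GNU xargs also has -o (--open-tty), which reopens the terminal as the child's standard input; with it, the prompt could be answered interactively instead of passing --yes.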

To actually remove them, add the --yes option to apt-get so that it does not ask for confirmation. That gives us two commands to purge the cuda and nvidia packages:

➜ dpkg -l | grep cuda | awk '{ print $2 }' | xargs sudo apt-get remove --purge --yes
➜ dpkg -l | grep nvidia | awk '{ print $2 }' | xargs sudo apt-get remove --purge --yes

After this, there may be packages that were installed as dependencies of cuda and are no longer required. We can remove them with:

sudo apt-get autoremove
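The review-then-purge pattern can be wrapped in a small function with an explicit preview mode. This is a minimal sketch, assuming package names arrive one per line on standard input; purge_pkgs and the DRY_RUN variable are hypothetical names, not part of apt. When DRY_RUN is set, xargs runs echo in front of the real command, so you see exactly what would be executed.

```shell
# Hypothetical wrapper around the purge commands above.
# Package names are read from stdin, one per line. If DRY_RUN is non-empty,
# ${DRY_RUN:+echo} expands to "echo" and the command line is only printed;
# if it is unset or empty, the expansion disappears and apt-get really runs.
purge_pkgs() {
  xargs ${DRY_RUN:+echo} sudo apt-get remove --purge --yes
}

# Preview, purge for real, then clean up orphaned dependencies:
#   dpkg -l | awk '$2 ~ /cuda|nvidia/ { print $2 }' | DRY_RUN=1 purge_pkgs
#   dpkg -l | awk '$2 ~ /cuda|nvidia/ { print $2 }' | purge_pkgs
#   sudo apt-get autoremove
```

apt-get also has its own --simulate (-s) option, which reports what it would remove without changing anything.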

Purging Sources

The cuda installation instructions have you add a source file to /etc/apt/sources.list.d/. Searching that directory and the base source list finds the entries related to cuda:

grep nvidia /etc/apt/sources.list /etc/apt/sources.list.d/*

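Delete the matching files from /etc/apt/sources.list.d/ (and remove or comment out any matching lines in /etc/apt/sources.list itself), then refresh your apt cache: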
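grep -l prints only the names of matching files rather than the matching lines, which is what you need when deciding what to delete. The sketch below assumes the standard /etc/apt layout; gpu_source_files is a hypothetical helper, and the apt directory is a parameter only so it can be tried on a copy. Like grep, it returns non-zero when nothing matches.

```shell
# Hypothetical helper: print the apt source files that mention nvidia.
#   -r  search the sources.list.d directory recursively
#   -l  print each matching file name once instead of the matching lines
#   -s  stay quiet about files that do not exist (e.g. no sources.list)
gpu_source_files() {
  apt_dir=${1:-/etc/apt}
  grep -r -l -s nvidia "$apt_dir/sources.list" "$apt_dir/sources.list.d"
}

# Usage:
#   gpu_source_files
```

Review the list before removing anything; a match in sources.list itself should be edited out rather than deleted.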

sudo apt-get update

Purging Keyrings

The final part is to remove NVIDIA’s signing key from the apt keyring. To find it, we first list the keys:

➜ sudo apt-key list
/etc/apt/trusted.gpg
--------------------
...

pub   rsa4096 2017-09-28 [SCE]
      C95B 321B 61E8 8C18 09C4  F759 DDCA E044 F796 ECB0
uid           [ unknown] NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>

...

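Here we can see the NVIDIA Corporation key. The long hexadecimal string is the key’s fingerprint, and apt-key also accepts the short key ID, which is the last eight hex digits of the fingerprint (F796 ECB0 with the space removed). So we can refer to this key as F796ECB0. The list output does not make this relationship obvious.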

To remove this key we then run:

➜ sudo apt-key del F796ECB0
OK

Running sudo apt-key list again should show that the key has been deleted.
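The fingerprint-to-ID step can be automated. This sketch assumes the apt-key list layout shown above (a pub line followed by the fingerprint on the next line); nvidia_key_ids is a hypothetical name. It takes the last eight hex digits of each fingerprint and prints them for every key whose uid mentions NVIDIA.

```shell
# Hypothetical helper: read `apt-key list` output on stdin and print the short
# key ID of each key whose uid line mentions NVIDIA.
nvidia_key_ids() {
  awk '
    /^pub/ {
      getline fpr                    # the fingerprint is on the next line
      gsub(/[ \t]/, "", fpr)         # drop the grouping spaces
      short = substr(fpr, length(fpr) - 7)   # last 8 hex digits
    }
    /^uid/ && /NVIDIA/ { print short }
  '
}

# Usage:
#   sudo apt-key list | nvidia_key_ids
```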

Checking Removal

We can check that cuda can no longer be installed by updating the apt cache and then trying to install it. With the NVIDIA repository gone, apt should not be able to find the package:

➜ sudo apt-get update
...
➜ sudo apt-get install cuda
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package cuda

We can check that nvidia-smi is unavailable:

➜ nvidia-smi
zsh: command not found: nvidia-smi
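The PATH check can be scripted for several tools at once. command -v is the POSIX way to look a command up without running it; report_commands is a hypothetical helper name.

```shell
# Hypothetical helper: for each command name given, report whether it can
# still be found on PATH, without executing it.
report_commands() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd: found at $(command -v "$cmd")"
    else
      echo "$cmd: not found"
    fi
  done
}

# After the purge, this should print "nvidia-smi: not found":
#   report_commands nvidia-smi
```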

We can check that torch reports cuda as unavailable:

➜ poetry run python -c 'import torch; print(torch.cuda.is_available())'
False

(I’m running this in a virtual environment; pytorch is not installed globally.)

Reinstalling

NVIDIA’s installation instructions cover Ubuntu. The basic steps are:

Install the linux headers:

➜ sudo apt-get install linux-headers-$(uname -r)

Check that gcc is installed and working:

➜ gcc --version
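Both prerequisites can be checked in one go before installing anything. This is a sketch under the assumption that the headers package provides the usual /lib/modules/<release>/build directory for the running kernel; preflight is a hypothetical name, and its two optional arguments exist only so the check can be pointed at another directory or kernel release. The result variable is called rc because status is a read-only special variable in zsh.

```shell
# Hypothetical pre-flight check: is gcc on PATH, and is there a kernel build
# directory for the target kernel release? Returns non-zero if either is missing.
preflight() {
  modules_root=${1:-/lib/modules}
  release=${2:-$(uname -r)}
  rc=0
  if command -v gcc >/dev/null 2>&1; then
    echo "gcc: ok"
  else
    echo "gcc: missing"; rc=1
  fi
  if [ -d "$modules_root/$release/build" ]; then
    echo "headers for $release: ok"
  else
    echo "headers for $release: missing"; rc=1
  fi
  return $rc
}

# Usage:
#   preflight
```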

Next, install the cuda-keyring package from NVIDIA’s network repository. The URL contains $distro/$arch, which here is ubuntu2204/x86_64:

➜ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
➜ sudo dpkg -i cuda-keyring_1.1-1_all.deb
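Rather than typing $distro by hand, it can be derived from /etc/os-release. This sketch assumes NVIDIA’s tag is the os-release ID followed by VERSION_ID without the dot (ubuntu + 22.04 gives ubuntu2204, matching the URL above) and that the keyring version stays at 1.1-1; cuda_keyring_url is a hypothetical name. uname -m gives x86_64 on this machine, but other architectures may use different directory names in NVIDIA’s repository.

```shell
# Hypothetical helper: build the cuda-keyring URL from distro ID, version and
# architecture. The file name cuda-keyring_1.1-1_all.deb is copied from the
# commands above and may change over time.
cuda_keyring_url() {
  distro_id=$1     # e.g. ubuntu
  version_id=$2    # e.g. 22.04
  arch=$3          # e.g. x86_64
  distro="${distro_id}$(printf '%s' "$version_id" | tr -d '.')"
  echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-keyring_1.1-1_all.deb"
}

# On the machine itself, take the values from /etc/os-release and uname:
#   . /etc/os-release
#   wget "$(cuda_keyring_url "$ID" "$VERSION_ID" "$(uname -m)")"
```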

Then we just have to update and install:

➜ sudo apt-get update
➜ sudo apt-get install cuda-toolkit nvidia-driver-545 nvidia-utils-545

After this, a reboot is required so that the new driver is loaded.

Checking the Installation

We can check the installation using the command line tool nvidia-smi:

➜ nvidia-smi
Sat Sep 30 20:57:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA TITAN RTX               On  | 00000000:01:00.0 Off |                  N/A |
| 41%   61C    P0              78W / 280W |      1MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
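To check the versions in a script rather than by eye, the banner line can be parsed. This is a sketch assuming the banner layout shown above; smi_versions is a hypothetical name. Note that the CUDA version in that banner is the highest CUDA version the driver supports, not necessarily the toolkit version that is installed.

```shell
# Hypothetical helper: read plain `nvidia-smi` output on stdin and print
# "driver=<version> cuda=<version>" from the banner line; other lines are ignored.
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Usage:
#   nvidia-smi | smi_versions
```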

We can also check that we can use cuda within pytorch:

➜ poetry run python -c '
import torch;
print(torch.cuda.is_available());
print(torch.tensor([1,2,3], device="cuda"))'
True
tensor([1, 2, 3], device='cuda:0')

Looks good!

I really need to install that other graphics card that my mate gave me. Would be good.