Lawrence Murray on 13 February 2023
updated on 10 March 2023
Graphics Processing Units (GPUs) are now a standard feature of the numerical computing landscape, especially so in machine learning. This course teaches foundational concepts of GPU computing: hardware, memory, kernels and streams. It includes practical sessions working with a deep neural network that has been implemented in C with CUDA. The motivation for using C (and not, for example, Python) is to reinforce the foundational concepts, as it forces us to be explicit about each step: each memory allocation, each kernel call, each stream synchronization.
Not familiar with C?
The practicals focus on reading, building, running and profiling code, not writing code. You will not have to write any C code from scratch.
The course consists of four modules of about one hour each: an introductory lecture, two practical sessions, and a closing lecture.
By way of philosophy, we focus less on individual kernel performance and more on holistic program performance, less on host-device memory copies and more on unified virtual memory, less on coalescence and more on cache efficiency. We believe that this reflects a more modern approach.
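As a rough illustration of what being explicit looks like with unified virtual memory, here is a minimal sketch, not code from the course. The kernel `scale`, the helpers `check` and `blocks_for`, and the array size are all hypothetical names and values chosen for this example. A single managed allocation is written by the host, updated by a kernel on a stream, and read back by the host after synchronizing, with no explicit host-device copies.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical kernel for illustration: multiply each element of x by a. */
__global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    x[i] *= a;
  }
}

/* Hypothetical helper: number of blocks needed to cover n elements. */
static int blocks_for(int n, int threads) {
  return (n + threads - 1) / threads;
}

/* Hypothetical helper: abort with a message if a CUDA runtime call fails. */
static void check(cudaError_t err, const char* what) {
  if (err != cudaSuccess) {
    fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }
}

int main(void) {
  const int n = 1 << 20;  /* assumed problem size */
  const int threads = 256;
  float* x = NULL;
  cudaStream_t stream;

  check(cudaStreamCreate(&stream), "cudaStreamCreate");

  /* One managed allocation, addressable from both host and device. */
  check(cudaMallocManaged((void**)&x, n * sizeof(float), cudaMemAttachGlobal),
        "cudaMallocManaged");

  /* The host initializes the memory directly; no cudaMemcpy is needed. */
  for (int i = 0; i < n; ++i) {
    x[i] = 1.0f;
  }

  /* The kernel is launched explicitly on the stream. */
  scale<<<blocks_for(n, threads), threads, 0, stream>>>(x, 2.0f, n);
  check(cudaGetLastError(), "kernel launch");

  /* Wait for the stream before the host touches x again. */
  check(cudaStreamSynchronize(stream), "cudaStreamSynchronize");
  printf("x[0] = %.1f\n", x[0]);

  check(cudaFree(x), "cudaFree");
  check(cudaStreamDestroy(stream), "cudaStreamDestroy");
  return 0;
}
```

Assuming the file is saved as `scale.cu`, it could be built with `nvcc -o scale scale.cu`. The same three explicit steps (allocation, kernel call, stream synchronization) are visible here, while page migration between host and device is left to the driver.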
For the practical sessions you will need access to a machine with an Nvidia GPU and CUDA installed. This machine could be your local laptop or desktop machine if it has a discrete graphics card, or a remote machine to which you have SSH access. If you do not already have access to such a remote machine, you can set up an instance on a cloud service provider by following this guide.
The GPU should be of the Pascal generation or later (2016 onward), as we make use of more recent innovations in unified virtual memory that earlier generations of hardware (Maxwell and before) do not support.
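One way to check whether a machine meets this requirement is to query the device properties through the CUDA runtime. The following is a minimal sketch rather than part of the course material. It relies on the fact that Pascal corresponds to compute capability 6.x (Maxwell is 5.x); the helper `is_pascal_or_later` is a hypothetical name introduced for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: Pascal is compute capability 6.x, Maxwell is 5.x. */
static int is_pascal_or_later(int major) {
  return major >= 6;
}

int main(void) {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
    return 1;
  }

  for (int dev = 0; dev < count; ++dev) {
    struct cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
      fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
      return 1;
    }
    printf("device %d: %s, compute capability %d.%d (%s)\n", dev, prop.name,
           prop.major, prop.minor,
           is_pascal_or_later(prop.major) ? "Pascal or later" : "pre-Pascal");
    /* Reports whether host and device can access managed memory concurrently. */
    printf("  concurrentManagedAccess = %d\n", prop.concurrentManagedAccess);
  }

  /* Non-zero exit status if no CUDA device was found. */
  return count > 0 ? 0 : 1;
}
```

Assuming the file is saved as `check.cu`, it could be built with `nvcc -o check check.cu`. Note that `concurrentManagedAccess` may report 0 on some platforms even with recent hardware, so the compute capability is the more direct test of the hardware generation.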
Regardless of where you will run the code—local or remote—you will need the following software installed on your local laptop or desktop machine. Follow the links for instructions on how to install the software and, if relevant, make an SSH connection to a remote machine with a GPU:
While the course assumes use of Visual Studio Code throughout, if you are familiar with SSH you could instead use a terminal and text editor of your choice. Nsight Systems is used for profiling code and, for the purposes of these practicals at least, has no obvious substitute.