Compiler optimizations can break it; function attributes can fix it.
3 min read
16 Aug 23
An alternative to explicit instantiations and macros.
3 min read
11 Jun 23
A little trick using an extra namespace and cross-import.
2 min read
30 Apr 23
Zero-stride catch and a custom CUDA kernel.
10 min read
16 Mar 23
A short course with a machine learning flavor, working with a feed-forward neural network implemented in C.
3 min read
13 Feb 23
Profiling code in Nsight Systems and refactoring to improve performance.
8 min read
12 Feb 23
Working with C code that trains a deep neural network.
13 min read
12 Feb 23
Opening lecture slides, introducing GPU hardware and key concepts: kernels, streams, memory.
8 min read
12 Feb 23
Closing lecture slides, tying up some loose ends and a taste of more advanced kernel programming.
7 min read
12 Feb 23
In case of issues when switching between Wayland and X11.
1 min read
22 Jan 23
How to profile CUDA code on cloud GPU instances using Nsight Systems.
2 min read
13 Jan 23