blog

C++: Avoiding Argument Dependent Lookup

A little trick using an extra namespace and cross-import.

2 min read

30 Apr 23

blog

Sums of Discrete Random Variables as Banded Matrix Products

Zero-stride catch and a custom CUDA kernel.

10 min read

16 Mar 23

blog

Fast Enumeration of Sums of Discrete Random Variables

5 min read

22 Feb 23

blog

GPU Course: Foundations of GPU Computing

A short course with a machine learning flavor, working with a feed-forward neural network implemented in C.

3 min read

13 Feb 23

blog

Foundations of GPU Computing: Practical Exercises #2

Profiling code in Nsight Systems and refactoring to improve performance.

8 min read

12 Feb 23

blog

Foundations of GPU Computing: Practical Exercises #1

Working with C code that trains a deep neural network.

13 min read

12 Feb 23

blog

Foundations of GPU Computing: Opening Lecture

Opening lecture slides, introducing GPU hardware and key concepts: kernels, streams, memory.

8 min read

12 Feb 23

blog

Foundations of GPU Computing: Closing Lecture

Closing lecture slides, tying up some loose ends and a taste of more advanced kernel programming.

7 min read

12 Feb 23

blog

Fix Scaling Issues on KDE Plasma

In case of issues switching between Wayland and X11.

1 min read

22 Jan 23

blog

Profile in the Cloud with Nsight Systems

How to profile CUDA code on cloud GPU instances using Nsight Systems.

2 min read

13 Jan 23

blog

Develop in the Cloud with Visual Studio Code

How to develop on remote cloud instances using Visual Studio Code.

2 min read

13 Jan 23

blog

GPU Programming in the Cloud

A round-up of cloud service providers, with a how-to guide for each.

19 min read

22 Nov 22