blog Latest

Limit clock speed and memory speed on an Nvidia GPU

How to limit clock speed and memory speed for more consistent results when benchmarking and profiling CUDA kernels.

3 min read

21 Oct 24

blog

Matrix Multiplication On GPU: Part 3, Coding for Speed

Tips and tricks to help the compiler optimize

7 min read

11 Oct 24

blog

Matrix Multiplication On GPU: Part 2, Tiling

Breaking down large matrix multiplications into tiles

8 min read

4 Oct 24

blog
Matrix Multiplication on GPU: Faster than Nvidia, Sometimes

10 min read

1 Oct 24

photography
Enjoying the View

17 Aug 24

blog

Gradients of Softmax and Logsumexp

Essential functions for categorical distributions and attention mechanisms in machine learning

4 min read

4 May 24

blog

C++: Pattern Matching Template Types

How to check whether a template type matches a pattern, using something like is_like_v<T, vector<int,_>>.

6 min read

2 Mar 24

blog

C++: Overloading the Spaceship Operator, A Recipe

How to overload the three-way comparison (spaceship) operator<=>, and a reminder to overload operator== as well.

3 min read

9 Feb 24

blog

C++: Check if a type is an instantiation of a given class template

How to implement an is_instance_of type trait.

3 min read

5 Feb 24

photography
Gum Tree

3 Feb 24

blog

C++: Forwarding references, overload resolution, and taking back control

Consider merging overloads into one function with forwarding reference parameters

5 min read

27 Jan 24

blog

C++: Disable implicit conversion in specific contexts only

You get one implicit conversion, so burn it with a wrapper

2 min read

14 Nov 23