How to limit clock speed and memory speed for more consistent results when benchmarking and profiling CUDA kernels.
3 min read
21 Oct 24
Tips and tricks to help the compiler optimize
7 min read
11 Oct 24
Breaking down large matrix multiplications into tiles
8 min read
4 Oct 24
It works on my laptop
10 min read
1 Oct 24
Essential functions for categorical distributions and attention mechanisms in machine learning
4 min read
4 May 24
How to check if a template type matches a pattern? Something like is_like_v<T, vector<int,_>>.
6 min read
2 Mar 24
How to overload the three-way comparison (spaceship) operator<=>, and a reminder to overload operator== as well.
3 min read
9 Feb 24
How to implement an is_instance_of type trait.
3 min read
5 Feb 24
Consider merging overloads into one function with forwarding reference parameters
5 min read
27 Jan 24
You get one implicit conversion, so burn it with a wrapper
2 min read
14 Nov 23