blog

C++: Combinatorial instantiation of templates with std::variant

An alternative to explicit instantiations and macros.

3 min read / 11 Jun 23 (updated 16 Aug 23)

C++ templates can increase compile times dramatically. When developing a library that makes heavy use of them, a common approach is to use explicit instantiation to reduce compile times for users of the library, instantiating in the library build, instead of deferring to the user build. There are other reasons too. Consider, for example, writing a library of templated CUDA kernels, where the CUDA toolchain is required for compilation: explicit instantiation alleviates the user from setting up the CUDA toolchain, instead building their own code with their preferred toolchain and linking in the library (e.g. this is what I do with NumBirch).

But there are often many combinations of types for which a function template must be instantiated. Consider the following simple example:

template<class T, class U>
void test(T x, U y) {
  //
}

where valid types for T and U are double, float, and int (we should probably use SFINAE or concepts to restrict T and U to these, but omit the detail here). To explicitly instantiate all such combinations we would use the following:

template void test(double, double);
template void test(double, float);
template void test(double, int);
template void test(float, double);
template void test(float, float);
template void test(float, int);
template void test(int, double);
template void test(int, float);
template void test(int, int);

Compiling:

g++ -std=c++20 -c instantiate.cpp

and inspecting:

nm -C instantiate.o | grep test

gives:

0000000000000000 W void test<double, double>(double, double)
0000000000000000 W void test<double, float>(double, float)
0000000000000000 W void test<double, int>(double, int)
0000000000000000 W void test<float, double>(float, double)
0000000000000000 W void test<float, float>(float, float)
0000000000000000 W void test<float, int>(float, int)
0000000000000000 W void test<int, double>(int, double)
0000000000000000 W void test<int, float>(int, float)
0000000000000000 W void test<int, int>(int, int)

We can see that all desired instantiations have been made.

The code is quite verbose, and this is a small example: two arguments with each of three possible types, for $3^2 = 9$ instantiations. We can imagine that this can quickly get out of hand. It is realistic to think of ternary functions with three arguments and, say, eight different types, for $8^3 = 512$ instantiations. We will not want to enumerate all of those.

One option is to use the preprocessor. Something like this:

#define BINARY(f) \
    BINARY_FIRST(f, double) \
    BINARY_FIRST(f, float) \
    BINARY_FIRST(f, int)
#define BINARY_FIRST(f, T) \
    BINARY_SECOND(f, T, double) \
    BINARY_SECOND(f, T, float) \
    BINARY_SECOND(f, T, int)
#define BINARY_SECOND(f, T, U) \
    template void f(T, U);
BINARY(test)

This works, giving the same results on inspection as above, but becomes messy for large numbers of arguments or types. We might also need to exclude particular combinations of types that are incompatible, which will add to the mess.

Since C++17, another approach is to use std::variant to encode the valid types, then std::visit to enumerate all the instantiations (now implicit rather than explicit, but the end result is the same):

static void instantiate() {
  using valid = std::variant<double,float,int>;
  valid x, y;

  std::visit([](auto x, auto y) {
    test(x, y);
  }, x, y);
}

Again, same result on inspection with nm, but we might consider this a little nicer. Or we might wrap it up in a macro to make it a little nicer. And we can add instantiations of more than one template function to the one instantiate() static function.

Another advantage of this approach is that we can add logic to include or exclude particular combinations of types. For the sake of example, requiring that the two types are different, using an if constexpr statement:

static void instantiate() {
  using valid = std::variant<double,float,int>;
  valid x, y;

  std::visit([]<typename T, typename U>(T x, U y) {
    if constexpr (!std::is_same_v<T,U>) {
      test(x, y);
    }
  }, x, y);
}

This time we can inspect the result:

g++ -std=c++20 -c instantiate.cpp
nm -C instantiate.o | grep test

and see that only combinations where T and U are different are instantiated:

0000000000000000 W void test<double, float>(double const&, float const&)
0000000000000000 W void test<double, int>(double const&, int const&)
0000000000000000 W void test<float, double>(float const&, double const&)
0000000000000000 W void test<float, int>(float const&, int const&)
0000000000000000 W void test<int, double>(int const&, double const&)
0000000000000000 W void test<int, float>(int const&, float const&)

A downside is that the object file also contains a bunch of other instantiations related to std::variant and std::visit, although this seems minor.

Here’s the full code for further exploration:

#include <variant>

template<class T, class U>
void test(T x, U y) {
  //
}

/*
template void test(double, double);
template void test(double, float);
template void test(double, int);
template void test(float, double);
template void test(float, float);
template void test(float, int);
template void test(int, double);
template void test(int, float);
template void test(int, int);
*/

/*
#define BINARY(f) \
    BINARY_FIRST(f, double) \
    BINARY_FIRST(f, float) \
    BINARY_FIRST(f, int)
#define BINARY_FIRST(f, T) \
    BINARY_SECOND(f, T, double) \
    BINARY_SECOND(f, T, float) \
    BINARY_SECOND(f, T, int)
#define BINARY_SECOND(f, T, U) \
    template void f(T, U);
BINARY(test)
*/

static void instantiate() {
  using valid = std::variant<double,float,int>;
  valid x, y;

  std::visit([]<typename T, typename U>(T x, U y) {
    if constexpr (!std::is_same_v<T,U>) {
      test(x, y);
    }
  }, x, y);
}

Update 16 August 2023

This technique may not work with certain compiler optimizations. See the next post for how to fix that.