Placeholder substitution in the preprocessor

I came up with a trick that vastly simplifies some use cases of the preprocessor, and want to show it off.

This doesn’t change what can or can’t be done with the preprocessor, but makes the macro syntax much cleaner in some cases.

Demo on Godbolt | Library on GitHub

The basic syntax | Example applications | Comparison with other methods | How does it work?

The basic syntax

The macro I’m presenting takes a piece of source code with placeholders and pastes it multiple times, substituting different placeholder values.

For example, given int _1_ = _2_; and tuples (a,10), (b,20), (c,30) it produces:

int a = 10;
int b = 20;
int c = 30;

The exact syntax in this case is:

EM_CODEGEN((a,10)(b,20)(c,30),, int _1_ = _2_;)

While this might look like a basic feature to have, the C/C++ preprocessor is very limited, so implementing this isn’t trivial.

The applications are endless! For example:

Generating enums with string conversion functions.
Generating reflection information for classes.
Generating const/&/&&-qualified function overloads.
Generating repetitive classes, such as vec2/vec3/vec4 (which was the original motivation for this library).

While all this was already possible before, this library makes it much easier.

Example applications

Generating enums with string conversion functions

#define MAKE_ENUM(E, elems) \
    enum class E { \
        EM_CODEGEN(elems,, _1_ = _2_,) \
    }; \
    std::string ToString(E e) \
    { \
        switch (e) { EM_CODEGEN(elems,, case E::_1_: return EM_STR _P_(_1_);) } \
    }

MAKE_ENUM( E,
    (a,10)
    (b,20)
    (c,30)
)

This expands to:

enum class E
{
    a = 10,
    b = 20,
    c = 30,
};
std::string ToString(E e)
{
    switch (e)
    {
        case E::a: return "a";
        case E::b: return "b";
        case E::c: return "c";
    }
}

Where:

EM_STR is the classical stringification macro, such that EM_STR(foo) expands to "foo":
```
#define EM_STR(...) EM_STR_(__VA_ARGS__)
#define EM_STR_(...) #__VA_ARGS__
```
_P_ is a macro that must be added before parentheses if an _i_ placeholder appears inside of them. So EM_STR(_1_) is spelled here as EM_STR _P_(_1_).
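The two-level definition of EM_STR matters because `#` stringifies its operand before any macro expansion happens. A minimal sketch of the difference (ANSWER is a hypothetical macro used only for this illustration):

```cpp
#include <string_view>

// Same two-level definition as above.
#define EM_STR(...) EM_STR_(__VA_ARGS__)
#define EM_STR_(...) #__VA_ARGS__

// Hypothetical macro, only for this illustration.
#define ANSWER 42

// One level: `#` sees the argument before it is macro-expanded.
static_assert(std::string_view(EM_STR_(ANSWER)) == "ANSWER");
// Two levels: the argument is expanded while passing through EM_STR.
static_assert(std::string_view(EM_STR(ANSWER)) == "42");
```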

While magic_enum is a thing (and can magically convert between enums and strings without any macros), it has its limitations (slow compilation times, inability to customize the names of the constants, not detecting large enum values, etc), so preprocessor-based solutions are still relevant.

The macro above doesn’t support omitting the values of the enum elements, but adding support for that is easy:

#define MAKE_ENUM(E, elems) \
    enum class E { \
        EM_CODEGEN(elems,, _1_ _2_OPT_,) \
    }; \
    std::string ToString(E e) \
    { \
        switch (e) { EM_CODEGEN(elems,, case E::_1_: return EM_STR _P_(_1_);) } \
    }

MAKE_ENUM( E,
    (a,=10)
    (b)
    (c,=30)
)

Here _2_OPT_ is a version of _2_ that expands to nothing if the value isn’t specified, instead of erroring out.

And here’s a version with a nicer call syntax without =:

#define MAKE_ENUM(E, elems) \
    enum class E { \
        EM_CODEGEN(elems,, _1_ MAYBE_INIT _P_(_2_OPT_),) \
    }; \
    std::string ToString(E e) \
    { \
        switch (e) { EM_CODEGEN(elems,, case E::_1_: return EM_STR _P_(_1_);) } \
    }
#define MAYBE_INIT(...) __VA_OPT__(= __VA_ARGS__)

MAKE_ENUM( E,
    (a,10)
    (b)
    (c,30)
)
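MAYBE_INIT relies on `__VA_OPT__` (C++20), which emits its contents only when the variadic argument is non-empty. Here is a small standalone sketch of that behavior, with the enum written by hand instead of going through EM_CODEGEN:

```cpp
// Same helper as above: expands to `= value` only if a value was passed.
#define MAYBE_INIT(...) __VA_OPT__(= __VA_ARGS__)

// Hand-written stand-in for what the loop would generate per element.
enum class E
{
    a MAYBE_INIT(10), // a = 10
    b MAYBE_INIT(),   // b (no initializer, so it gets 11)
    c MAYBE_INIT(30), // c = 30
};

static_assert(static_cast<int>(E::a) == 10);
static_assert(static_cast<int>(E::b) == 11);
static_assert(static_cast<int>(E::c) == 30);
```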

Generating reflection information for classes

#define REFLECTED(m) \
    EM_CODEGEN(m,, _1_ _2_ _3_OPT_;) \
    auto members() {return std::tie( EM_CODEGEN(m, (,), _2_) );} \
    auto member_names() {return std::array{ EM_CODEGEN(m, (,), EM_STR _P_(_2_)) };}

struct A
{
    REFLECTED(
        (int,a)
        (float,b,=12.3f)
    )
};

This expands to:

struct A {
    int a;
    float b = 12.3f;
    auto members() {return std::tie(a, b);}
    auto member_names() {return std::array{"a", "b"};}
};

The new piece of syntax here is the second argument of EM_CODEGEN, which acts as an optional separator inserted between expansions (a comma in this case).

Astute readers might notice that this macro chokes on e.g. array types: (int[3],a) produces int[3] a;, which isn’t legal. Luckily this can be easily fixed by replacing _1_ with std::type_identity_t<_1_>.

Another problem is that types with embedded commas need special care: (std::map<int,float>,a) would be incorrectly parsed as three elements: std::map<int, then float>, then a. The type needs to be parenthesized: ((std::map<int,float>), a). EM_CODEGEN automatically removes parentheses if present, so this will produce std::map<int,float> a; (not (std::map<int,float>) a;, which would be illegal).
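Removing an optional pair of parentheses can be done with a well-known preprocessor idiom. The sketch below shows that idiom in isolation; DEPAREN, ISH, ESC, VANISH and DECL are names chosen for this illustration, and this is not necessarily how EM_CODEGEN does it internally:

```cpp
#include <map>
#include <type_traits>

// If `x` is parenthesized, `ISH x` is a macro call that strips the parentheses
// and leaves `ISH ...` behind. Otherwise `ISH` is simply left in front of `x`.
// Either way the leading `ISH` is then pasted onto `VAN`, and `VANISH` expands to nothing.
#define DEPAREN(x) ESC(ISH x)
#define ISH(...) ISH __VA_ARGS__
#define ESC(...) ESC_(__VA_ARGS__)
#define ESC_(...) VAN##__VA_ARGS__
#define VANISH

// Hypothetical member-declaring macro, only for this illustration.
#define DECL(type, name) DEPAREN(type) name;

struct S
{
    DECL(int, a)                   // int a;
    DECL((std::map<int,float>), m) // std::map<int,float> m;
};

static_assert(std::is_same_v<decltype(S::a), int>);
static_assert(std::is_same_v<decltype(S::m), std::map<int, float>>);
```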

Generating `const`/`&`/`&&`-qualified function overloads

#define MAYBE_CONST_LR(...) \
    EM_CODEGEN_LOW( \
        , \
        (      & ,         ((*this))) \
        (const & ,         ((*this))) \
        (      &&, std::move(*this) ) \
        (const &&, std::move(*this) ),, \
        (__VA_ARGS__) \
    )

#define QUAL _1_
#define FWD_SELF _2_

struct A
{
    int x;

    MAYBE_CONST_LR(
        int QUAL foo() QUAL {return FWD_SELF.x;}
    )
};

This expands to:

struct A
{
    int x;

    int &foo() & { return (*this).x; }
    int const &foo() const & { return (*this).x; }
    int &&foo() && { return std::move(*this).x; }
    int const &&foo() const && { return std::move(*this).x; }
};

This is useful for generating methods such as std::optional’s .value().

While C++23 does add a feature (explicit object parameters, aka “deducing this”) that makes this possible without macros, the feature has its limitations, meaning it’s not always viable (or is this a compiler bug?).

The new piece of syntax here is EM_CODEGEN_LOW(...). This version of the macro is necessary when the code with placeholders is passed as a macro parameter. (A piece of source code with placeholders is toxic and can’t be passed to another macro without parenthesizing it, since each _i_ expands to )(..., which would otherwise break macro expansion. Unlike EM_CODEGEN, EM_CODEGEN_LOW assumes the code is already parenthesized, and that’s it.)

Here we’re using ((*this)) instead of (*this) because, again, one set of parentheses is automatically removed from the element if present. In this case it’s undesirable (*this.x isn’t what we want), so we add another pair of parentheses (one pair is removed, leaving us with (*this).x).

Generating repetitive `vec2`/`vec3`/`vec4` classes

This was the original motivation for this library. Those small fixed-size vectors are commonly used in graphics.

template <typename T, int N> struct vec;

#define MAKE_VEC(m) \
    template <typename T> \
    struct vec<T, EM_CODEGEN(m,, +1)> \
    { \
        EM_CODEGEN(m,, T _1_{};) \
        vec() {} \
        vec(EM_CODEGEN(m,(,), T _1_)) : EM_CODEGEN(m,(,), _1_ _P_(std::move _P_(_1_))) {} \
    };

MAKE_VEC((x)(y))
MAKE_VEC((x)(y)(z))
MAKE_VEC((x)(y)(z)(w))
#undef MAKE_VEC

This expands to:

template <typename T>
struct vec<T, +1+1>
{
    T x{};
    T y{};

    vec() {}
    vec(T x, T y) : x(std::move(x)), y(std::move(y)) {}
};

template <typename T>
struct vec<T, +1+1+1>
{
    T x{};
    T y{};
    T z{};

    vec() {}
    vec(T x, T y, T z) : x(std::move(x)), y(std::move(y)), z(std::move(z)) {}
};

template <typename T>
struct vec<T, +1+1+1+1>
{
    T x{};
    T y{};
    T z{};
    T w{};

    vec() {}
    vec(T x, T y, T z, T w) : x(std::move(x)), y(std::move(y)), z(std::move(z)), w(std::move(w)) {}
};
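The `+1` pattern is just a way to count elements: each element contributes one `+1`, and the resulting `+1+1` is an ordinary constant expression equal to 2. A hand-written sketch of the two-element case (no macros involved) shows that it really specializes `vec<T, 2>`:

```cpp
#include <type_traits>

template <typename T, int N> struct vec;

// What the (x)(y) case boils down to, minus the constructors.
template <typename T>
struct vec<T, +1+1>
{
    T x{};
    T y{};
};

static_assert(+1+1 == 2);
// vec<float, 2> picks the specialization above, so it is complete and has `y`.
static_assert(std::is_same_v<decltype(vec<float, 2>::y), float>);
```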

Comparison with other methods

Let’s see how this library compares with some alternative approaches.

Here’s how else our ‘generating enums with string conversions’ example can be implemented:

X-macro

The classic solution is an X-macro:

// Shared implementation:
#define MAKE_ENUM(E, F) \
    enum class E { F(ELEM) }; \
    inline std::string ToString(E e) \
    { \
        using type = E; \
        switch (e) { F(CASE) } \
    }
#define ELEM(name, value) name = value,
#define CASE(name, value) case type::name: return #name;

// Declaring a single enum:
#define E_MEMBERS(X) \
    X(e1,10) \
    X(e2,20) \
    X(e3,30)
MAKE_ENUM(E, E_MEMBERS)

After expansion:

enum class E
{
    e1 = 10,
    e2 = 20,
    e3 = 30,
};

inline std::string ToString(E e)
{
    using type = E;
    switch (e)
    {
        case type::e1: return "e1";
        case type::e2: return "e2";
        case type::e3: return "e3";
    }
}

This is trivial to implement, but the enum definition syntax doesn’t look good (requires a separate #define per enum).

The implementation of MAKE_ENUM requires multiple #defines, and the loop bodies (ELEM, CASE) have to be defined out of line, which harms readability if many loops are needed.

Plus, as you might have noticed, we can’t use the enum name in CASE(...) (hence the type typedef), and in general can’t pass any outside information into those loops, without making the X-lists (E_MEMBERS in this case) uglier:

#define E_MEMBERS(X,Y) \
    X(e1,10,Y) \
    X(e2,20,Y) \
    X(e3,30,Y)
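For completeness, here is a sketch of how the rest of the X-macro would look with that extra parameter. Two things differ from the version above and are my own choices for this sketch: ToString returns std::string_view and is constexpr (so the result can be checked at compile time), and there is a fallback return after the switch.

```cpp
#include <string_view>

// Shared implementation. The enum name is forwarded as the extra argument `Y`.
#define MAKE_ENUM(E, F) \
    enum class E { F(ELEM, E) }; \
    constexpr std::string_view ToString(E e) \
    { \
        switch (e) { F(CASE, E) } \
        return "??"; /* Fallback for values not in the list. */ \
    }
#define ELEM(name, value, E) name = value,
#define CASE(name, value, E) case E::name: return #name;

// Declaring a single enum. Every entry has to repeat `Y`.
#define E_MEMBERS(X, Y) \
    X(e1,10,Y) \
    X(e2,20,Y) \
    X(e3,30,Y)
MAKE_ENUM(E, E_MEMBERS)

static_assert(static_cast<int>(E::e2) == 20);
static_assert(ToString(E::e2) == "e2");
```

The `using type = E;` workaround is gone, but the price is paid in every X-list.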

Looping manually

The additional #define per call is undesirable. It would be great to simplify the syntax to something along the lines of:

MAKE_ENUM( E,
    (e1,10)
    (e2,20)
    (e3,30)
)

But how do we loop over the elements? It can be done manually, without any supporting libraries:

// Shared implementation:
#define MAKE_ENUM(E, elems) \
    enum class E { END(ELEM_A elems) }; \
    inline std::string ToString(E e) \
    { \
        using type = E; \
        switch (e) { END(CASE_A elems) } \
    }

#define END(...) END_(__VA_ARGS__)
#define END_(...) __VA_ARGS__##_END

#define ELEM(name, value) name = value,
#define ELEM_A(...) ELEM(__VA_ARGS__) ELEM_B
#define ELEM_B(...) ELEM(__VA_ARGS__) ELEM_A
#define ELEM_A_END
#define ELEM_B_END

#define CASE(name, value) case type::name: return #name;
#define CASE_A(...) CASE(__VA_ARGS__) CASE_B
#define CASE_B(...) CASE(__VA_ARGS__) CASE_A
#define CASE_A_END
#define CASE_B_END

After expansion, same as before:

enum class E
{
    e1 = 10,
    e2 = 20,
    e3 = 30,
};

inline std::string ToString(E e)
{
    using type = E;
    switch (e)
    {
        case type::e1: return "e1";
        case type::e2: return "e2";
        case type::e3: return "e3";
    }
}

While the call syntax looks clean now, the implementation is ugly. Not only do we need 5 #defines worth of boilerplate per loop (and the loop bodies are still defined out of line), we also have no way of smuggling external information into the loop body (ELEM, CASE), meaning the using type = E; workaround is unavoidable.
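If the A/B pairs look mysterious: a macro can't invoke itself, so two macros take turns, each one consuming one (...) element and leaving the name of the other in front of the next element. The trailing name is then pasted with _END and disappears. Here is the same technique reduced to a toy loop that sums a sequence (the ADD names are made up for this sketch):

```cpp
// Same helpers as above: paste `_END` onto the last token of the expansion.
#define END(...) END_(__VA_ARGS__)
#define END_(...) __VA_ARGS__##_END

// Loop body and the two alternating "iterations".
#define ADD(x) + (x)
#define ADD_A(x) ADD(x) ADD_B
#define ADD_B(x) ADD(x) ADD_A
// Whichever name is left over after the last element expands to nothing.
#define ADD_A_END
#define ADD_B_END

// ADD_A (1)(2)(3)  ->  + (1) ADD_B (2)(3)  ->  + (1) + (2) ADD_A (3)
//                  ->  + (1) + (2) + (3) ADD_B  ->  (after END)  + (1) + (2) + (3)
constexpr int total = 0 END(ADD_A (1)(2)(3));
static_assert(total == 6);
```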

Looping with a library

There are libraries that help you write preprocessor loops, such as Boost.PP with its BOOST_PP_SEQ_FOR_EACH(...). Those usually come with a limit on the number of iterations they can perform (limited by the number of boilerplate macros they have pasted into their headers), but it’s possible to support an unlimited number of iterations (I have a library for that, and it’s used internally by the macro I’m presenting here).

I’m going to use my own library for this example (see the previous link for the explanation of the syntax), but all of them should be more or less similar.

// Shared implementation:
#define MAKE_ENUM(E, elems) \
    enum class E { SF_FOR_EACH(ELEM, SF_NULL, SF_NULL,, elems) }; \
    inline std::string ToString(E e) \
    { \
        switch (e) { SF_FOR_EACH(CASE, SF_STATE, SF_NULL, E, elems) } \
    }

#define ELEM(n, E, name, value) name = value,
#define CASE(n, E, name, value) case E::name: return #name;

// Declaring a single enum:
MAKE_ENUM( E,
    (e1,10)
    (e2,20)
    (e3,30)
)

After expansion:

enum class E {
    e1 = 10,
    e2 = 20,
    e3 = 30,
};

inline std::string ToString(E e)
{
    switch (e)
    {
        case E::e1: return "e1";
        case E::e2: return "e2";
        case E::e3: return "e3";
    }
}

Now we have significantly less boilerplate per loop, and we are able to use the external information inside of the loop body (the enum name in case E::e1:, in this case).

But there’s still an additional #define per loop, and the loop bodies are still defined out-of-line.

And the library I’m presenting fixes both. :P

This library

To recap:

#define MAKE_ENUM(E, elems) \
    enum class E { \
        EM_CODEGEN(elems,, _1_ = _2_,) \
    }; \
    std::string ToString(E e) \
    { \
        switch (e) { EM_CODEGEN(elems,, case E::_1_: return EM_STR _P_(_1_);) } \
    }

MAKE_ENUM( E,
    (a,10)
    (b,20)
    (c,30)
)

This finally gives us 0 extra macros per loop, and the loop bodies are now inline.

How does it work?

The basic idea is that the _i_ placeholders get replaced with )(foo,.

So int _1_ = _2_; becomes something like int )(foo, = )(bar, ;. If we parenthesize this, it becomes a list of the form (...)(...)(...), which we can then loop over using SF_FOR_EACH(...) or any other similar library.

EM_CODEGEN((a)(b)(c), sep, pattern) uses two nested loops. The outer loop iterates over (a)(b)(c), and then the inner loop iterates over the pattern (as shown above) and substitutes the values.
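To make the inner loop more concrete, here is a heavily simplified toy version of it. This is a sketch of the idea only, not the library's implementation: the placeholders expand to numbered tags instead of `foo`, the substituted values are hard-coded in VAL_1 and VAL_2 rather than coming from an outer loop, the loop is the manual A/B one from earlier rather than SF_FOR_EACH, and all names other than `_1_` and `_2_` are made up.

```cpp
// Placeholders close the current (...) element and open a new one that starts with a tag.
#define _1_ )(1,
#define _2_ )(2,

// Hard-coded values per tag. Tag 0 marks the text before the first placeholder.
#define VAL_0
#define VAL_1 answer
#define VAL_2 42

#define END(...) END_(__VA_ARGS__)
#define END_(...) __VA_ARGS__##_END

// One step: emit the value for the tag, then the plain text that followed the placeholder.
#define STEP(tag, ...) VAL_##tag __VA_ARGS__
#define STEP_A(...) STEP(__VA_ARGS__) STEP_B
#define STEP_B(...) STEP(__VA_ARGS__) STEP_A
#define STEP_A_END
#define STEP_B_END

// Parenthesizing the pattern turns it into a sequence: (0, text)(1, text)(2, text)
#define SUBST(...) END(STEP_A (0, __VA_ARGS__))

// (0, constexpr int )(1, = )(2, ;)  ->  constexpr int answer = 42 ;
SUBST(constexpr int _1_ = _2_;)
static_assert(answer == 42);
```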

Astute readers might have noticed that #define _i_ )(foo, isn’t usable inside of parentheses. E.g. after f(_1_) expands to ( f( )(foo, ) ), we have no way of skipping f and recursing into ( )(foo, ). That’s why the _P_ macro (introduced in the enum example above) is needed. _P_(...) flattens the parentheses, expanding to )(open, ... )(close,. Then the same loop that expands placeholders can turn those back into ( and ).

That’s all for today!
