The compilation procedure for C++20 modules

Intro

I finally got around to looking into C++20 modules, but I don’t just want to throw them into a black box (called CMake) that would handle them for me; I want to understand how the build process works. So I’ve looked into how different compilers handle modules, and documented it all here.

This post could be useful to build system implementers, and to makefile enjoyers who want to avoid more complex build systems.

An example makefile (of questionable readability) is provided. It handles modules on Clang, GCC, and MSVC, including all of Clang’s module compilation strategies, import std support, and non-cascading changes.

References:

Clang modules manual.
List of GCC module-related flags.
List of MSVC module-related flags. (Note that at least one flag is missing from this page, namely /ifcMap.)

Pre-C++20 compilation model

For completeness, here is the pre-modules compilation model.

Each .cpp file is compiled independently (possibly in parallel). As a byproduct of the compilation, you get a list of headers that were included directly or indirectly.

Then on a rebuild, you check the modification times of the headers from that list, and if any of the headers were modified, you recompile the respective .cpp file.

To get the list of headers, you use -MD -MP on GCC and Clang (or -MMD -MP to skip system headers), and /sourceDependencies output.json on MSVC.
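A minimal sketch of that rebuild check, assuming the dependency list has already been parsed out of the .d or JSON file into a list of paths. The names is_stale, mtime and needs_recompile are made up for this illustration, and treating a missing dependency as "rebuild" is a choice of this sketch, not something the compilers mandate.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <optional>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;
using Time = fs::file_time_type;

// Pure decision logic: a missing target, or a missing or newer dependency, means "rebuild".
bool is_stale(std::optional<Time> target, const std::vector<std::optional<Time>>& deps)
{
    if (!target)
        return true; // The object file doesn't exist yet.
    for (const std::optional<Time>& dep : deps)
        if (!dep || *dep > *target)
            return true;
    return false;
}

// Returns the modification time, or nothing if the file can't be examined.
std::optional<Time> mtime(const fs::path& path)
{
    std::error_code ec;
    Time t = fs::last_write_time(path, ec);
    if (ec)
        return std::nullopt;
    return t;
}

// `deps` is the .cpp file itself plus every header the compiler reported for it.
bool needs_recompile(const fs::path& object, const std::vector<fs::path>& deps)
{
    std::vector<std::optional<Time>> times;
    for (const fs::path& dep : deps)
        times.push_back(mtime(dep));
    return is_stale(mtime(object), times);
}
```

This is essentially what Make does for you when you include the generated .d files; the sketch only spells out the comparison.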

What are C++20 modules

TL;DR:

Modules are an alternative to headers, but they need to be precompiled (into a compiler-specific binary format) before being imported, and importing those precompiled files is much faster than including headers.

Different compilers have different conventions for what to call those precompiled module files. I’m going to call them BMI, following Clang’s convention.

Compiler	Name	Default extension
Clang	BMI (built module interface)	`.pcm` (precompiled module)
GCC	CMI (compiled module interface)	`.gcm` (GCC compiled module)
MSVC	IFC (probably stands for “interface”, the author himself doesn’t know anymore)	`.ifc`

Unlike headers, modules don’t leak macros: importing a module doesn’t give you its macros, and the imported module isn’t affected by the macros you have already defined.

In addition to a BMI, compiling a module also produces an object file (.o), which should be linked into the final executable. (If the module doesn’t contain function definitions and such, then forgetting to link the .o doesn’t seem to error.)

A simple example

// a.cppm
export module A;

export int sum(int a, int b) {return a + b;}

// 1.cpp
#include <iostream>
import A;

int main()
{
    std::cout << sum(10, 20) << '\n';
}

Here are simple instructions on how to compile this; a more detailed explanation is provided later.

Clang

clang++ a.cppm -std=c++20 -c
clang++ 1.cpp -std=c++20 -fmodule-file=A=a.pcm -c
clang++ a.o 1.o

GCC

g++ a.cppm -fmodules -std=c++20 -c
g++ 1.cpp -fmodules -std=c++20 -c
g++ a.o 1.o

MSVC

cl /nologo /EHsc a.cppm /TP /std:c++20 /interface /c
cl /nologo /EHsc 1.cpp /std:c++20 /c
cl a.obj 1.obj

Clang-CL — doesn’t seem to support MSVC-style module flags, but supports clang++-style module flags. Older Clang versions needed them prefixed with /clang:. See this discussion in CMake bug tracker.

In each of those, the first command produces both a BMI and an object file for the module. The second command consumes (imports) that BMI and produces an object file for the file consuming the module. Then the third command links the two object files together. (Notice that with Clang, we need to tell it where to find the BMI for a specific module. More on the differences between compilers later.)

Since a BMI is faster to produce than an object file, compilers can employ various tricks to inform the build systems when the BMI is ready, so that it can be consumed (by a parallel compilation command) without waiting for the object. More on that later too.

There seems to be no single convention for what the source file extension should be. I’m using Clang’s convention of .cppm. Since MSVC doesn’t understand this extension, I’m using /TP to tell it that it’s a C++ source file.

A is the module name. Module names are .-separated lists of identifiers, so A.B or A.B.C would also be valid names. . has no special meaning and is just a part of the name.

Notice that the “module declaration” (here and below bold quotes indicate the terms from the C++ standard) — the export module A; in this example — must be the first non-empty non-comment line in its source file. The module declaration is considered a preprocessor directive, despite not starting with # (but import ... isn’t one).

Source files (translation units) containing a module declaration are called the “module units”.

If you need to include any headers, then you should use something called a “global module fragment”:

// a.cppm
module;

#include <iostream>

export module A;

export void SayHello()
{
    std::cout << "Hello!\n";
}

First you announce that this is a module using module;, include any headers you need, and then add a module declaration followed by your own code.

The optional part starting with module; and until the module declaration is called the “global module fragment”.

The part starting from the module declaration (regardless of whether the global module fragment is present) is called the “module unit purview” of this module unit.

All code in global module fragments and in non-module TUs is considered to belong to a single imaginary “global module”.

Another option for including headers seems to be to wrap them in extern "C++" { ... }. While this works, Clang and MSVC warn about includes after module declarations, and don’t silence the warning even in the presence of extern "C++".

Kinds of module units

There are 4 kinds of module units, having different kinds of module declarations. They can share the same module name, and all module units sharing the name are called a single “named module”. From outside of a named module, that named module is only importable in one piece.

The kinds of module units are:

export module A; is the “primary module interface unit”. This is the only required module unit in a named module, and there can only be one per named module.

This is what is imported by import A;, and is the only thing that can be imported from outside of this named module.
module A; is the “module implementation unit” (a non-“partition” one, more on those later). You can have more than one of those.
```
// a.cppm
export module A;

export void foo();
```
```
// a_foo.cpp
module A;

void foo() {....}
```
You can put implementations into those, similar to the .h/.cpp separation prior to modules.

Those are not importable, neither from outside nor from inside the module.

module A; implicitly does import A; (to import the primary interface unit), and it’s an error to add your own redundant import A;.

Depending on how clever your build system and compiler are, I believe they may be able to avoid rebuilding dependent TUs even if you don’t move definitions to implementation units, if the module interface unit is modified in a way that e.g. only changes the function bodies and not their parameters and return types. At the time of writing, no compiler seems to support this.

Moving the definitions to an implementation unit makes this optimization guaranteed (i.e. the fact that importers of this module won’t be rebuilt when the implementation changes; as was the case with moving function definitions from headers into .cpp files). And this is more parallelizable, since we don’t have to wait for the interface unit to be rebuilt to check if it changed or not.
export module A:P; is a non-primary “module interface unit” and a “partition unit”, or informally an interface partition unit. There can be multiple of those.

The purpose of those is to split large interfaces, to avoid having everything in the primary interface unit.
```
// a.cppm
export module A;

export import :P1;
export import :P2;
```
```
// a_p1.cppm
export module A:P1;

export void foo() {....} // Define here or in a separate implementation unit.
```
```
// a_p2.cppm
export module A:P2;

export void bar() {....} // ^
```
As you can see, those can be imported inside of the same named module. But not from outside (at least not directly, but you can import the primary interface unit from outside that reexports those).

P is a “partition name”. Like a module name, it’s a .-separated list of identifiers, with . having no special meaning and just being a part of the name.

Partition names must be unique per named module.

The C++ standard requires that all interface partition units are reexported from the primary interface unit (IFNDR otherwise), but seemingly nothing bad could happen if you forget, it just makes them pointless if you do so. Notice that they can reexport each other, so the primary interface unit doesn’t have to do it directly, doing it indirectly is fine too.
module A:P; is a “module implementation unit” like 2, and a “module partition” like 3, informally an internal partition unit.

Those are in a weird place. Their intended purpose seems to be to hold internal utilities (internal to this named module), to share code between 2s (i.e. between non-partition implementation units):
```
// a.cppm
export module A;

export void foo();
```
```
// a.cpp
module A;
import :Helpers;

void foo() {helper();}
```
```
// a_helpers.cppm
module A:Helpers;

void helper();
```
```
// a_helpers.cpp
module A;
import :Helpers; // Not strictly necessary in this example, but necessary in general.

void helper() {}
```
Those are importable only inside of the same named module, like interface partitions. But unlike interface partitions those can’t be reexported from interface units.

The partition name P must be unique per named module across both interface and implementation partitions.

Interestingly, Clang warns when importing those in interface units, because allegedly some contents could leak to the consumers of the module.

In addition to the intended use, those also double as an alternative to 2 that doesn’t implicitly import the entire 1, therefore providing better incremental build times. But you also get a useless BMI for it (unlike 2 which don’t have BMIs) (unless you detect in your build system that no BMI is needed and disable its generation, but so far only Clang seems to support that).

To summarize:

	`export` (interface)	no `export` (implementation)
no `:Partition`	(1) primary interface unit	(2) implementation unit
`:Partition`	(3) interface partition unit	(4) internal partition unit

1 and 3 are “module interface unit”s.
3 and 4 are “partition unit”s.
1, 3, 4 (but not 2) are importable units, though only 1 can be imported outside of its named module.
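The table above boils down to two yes/no questions: is there an export, and is there a :Partition. Here is a small sketch that encodes this, assuming the module declaration has already been isolated as a single string. UnitKind, classify and is_importable are hypothetical names; this is not a real C++ tokenizer (it ignores comments, attributes and preprocessing), and a real build system should take this information from the scan results instead.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

enum class UnitKind
{
    PrimaryInterface,   // (1) export module A;
    Implementation,     // (2) module A;
    InterfacePartition, // (3) export module A:P;
    InternalPartition,  // (4) module A:P;
};

// Returns nothing if `decl` is not a module declaration (e.g. `module;` or `import A;`).
std::optional<UnitKind> classify(std::string_view decl)
{
    auto skip_spaces = [&] {
        while (!decl.empty() && decl.front() == ' ')
            decl.remove_prefix(1);
    };
    // Consumes `word` if `decl` starts with it followed by a space.
    auto eat_keyword = [&](std::string_view word) {
        skip_spaces();
        if (decl.size() <= word.size() || decl.substr(0, word.size()) != word
            || decl[word.size()] != ' ')
            return false;
        decl.remove_prefix(word.size());
        return true;
    };

    bool exported = eat_keyword("export");
    if (!eat_keyword("module"))
        return std::nullopt;
    skip_spaces();

    std::size_t semicolon = decl.find(';');
    if (semicolon == std::string_view::npos || semicolon == 0)
        return std::nullopt;
    bool partition = decl.substr(0, semicolon).find(':') != std::string_view::npos;

    if (exported)
        return partition ? UnitKind::InterfacePartition : UnitKind::PrimaryInterface;
    return partition ? UnitKind::InternalPartition : UnitKind::Implementation;
}

// Everything except (2) can be imported (and therefore gets a BMI).
bool is_importable(UnitKind kind)
{
    return kind != UnitKind::Implementation;
}
```

For example, classify("module A:Helpers;") yields UnitKind::InternalPartition, which is importable, but only from inside A.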

Clang’s convention is to use .cppm for importable units and .cpp for non-importable ones.

Depending on the context, a “module” seems to mean either a “named module” or an “importable module unit” or a “module unit”.

The intent seems to be to have a relatively small number of named modules in a project. In large projects, the source files naturally tend to get separated into subdirectories, and each of those subdirectories is a good candidate for being a single named module. Small projects could consist entirely of a single named module.

The build procedure for modules

To correctly handle dependencies between modules, all .cpp/.cppm need to be scanned, to determine what importable units they import, and if they are themselves importable (and if so, under what name).

The scan results for a file need to be updated when the file changes, or when any headers that it includes (maybe indirectly) are modified (because you could wrap an import in an #ifdef affected by an include). We don’t need to rescan a file when the module units imported by it are modified.

Then the TU needs to be rebuilt if it had to be rescanned, or if any of its imported module units were modified, recursively (can get away with doing it non-recursively in some cases).

This means that we no longer need to emit the header dependencies as the byproduct of the compilation (unlike pre-modules), since it can be done during scanning, and is needed to correctly rescan anyway (in theory, the alternative to emitting them during scans is to emit them during compilation, but then if said compilation (of this TU) fails, you would have to remember to rescan it; this seems unnecessarily complicated and pointless).

But if rebuilding a BMI produced a bitwise identical file (a build system should compare hashes), then TUs importing it don’t have to be rebuilt. More details later.

Scanning dependencies

The scan commands need the same flags as your compiler: include paths, macros, language standard, etc.

The compilers all use the same JSON-based format for outputting module scan results, called P1689R5 after the proposal that introduced it.

Clang

clang-scan-deps -format=p1689 -o a.json -- clang++ a.cppm -std=c++20 -M -MP -MF a.d -MQ a.htgt -o a.mtgt

Writes P1689R5 module deps to a.json, based on -o ... before --. Omit it or pass -o - to print to stdout.

Writes header deps to a.d, based on -MF .... Change to -MF - to print to stdout (omitting prints to a.mtgt per -o).

-o ... selects the target filename reported to P1689R5.

-MQ ... selects the target filename reported to header deps.

The minimal working command that ignores header deps is clang-scan-deps -format=p1689 -- clang++ src/a.cppm -std=c++20.

GCC

g++ a.cppm -std=c++20 -M -MP -fmodules -fdeps-format=p1689r5 -fdeps-file=a.json -fdeps-target=a.mtgt -MQ a.htgt -MF a.d

Writes P1689R5 module deps to a.json, based on -fdeps-file=..., change to -fdeps-file=- to print to stdout (omitting automatically chooses the filename).

Writes header deps to a.d, based on -MF .... Omit it or pass -MF - to print to stdout. (-o seems to be equivalent to -MF.)

-fdeps-target=... selects the target filename reported to P1689R5.

-MQ ... selects the target filename reported to header deps.

Don’t strictly need -std=c++20, but it’s weird to use modules before C++20.

Omitting -fdeps-format=p1689r5 will output module information in some curious GCC-specific Makefile-style format, to the same file as header deps.

The minimal working command that ignores header deps is g++ a.cppm -std=c++20 -M -MP -fmodules -fdeps-format=p1689r5 (the result is written to a.dii by default).

MSVC

cl a.cppm /nologo /EHsc /std:c++latest /scanDependencies a.json /sourceDependencies a.d /TP /Foa.mtgt

Writes P1689R5 module deps to a.json, based on /scanDependencies .... Pass - to print to stdout.

Writes header deps to a.d, based on /sourceDependencies .... Pass - to print to stdout.

/Fo... selects the target filename reported to P1689R5.

The header deps are in Microsoft’s own JSON format. It doesn’t include the target filename, so there is no flag to change it.

Don’t strictly need /nologo /EHsc.

MSVC doesn’t understand the .cppm extension by default, I’m using /TP to force it to assume the input is C++ code. You can omit this for .cpp files if you want.

The minimal working command is cl a.cppm /TP /scanDependencies- /std:c++20.

Reported target filenames

In the examples above, you can use the a.mtgt and a.htgt to carry arbitrary information to the resulting module deps files and headers deps files respectively.

Those are not strictly necessary for parsing those files. You can omit the corresponding parameters and get some default strings instead.

Scanning multiple files with a single command

The format itself seems to allow for scanning multiple source files with a single command (see the "rules": [...] array). From my testing, the only compiler that can do this is Clang, and only if instead of providing the compiler flags manually to clang-scan-deps, you write them to a compile_commands.json and pass that, as described here.

Either way, this doesn’t seem terribly useful compared to just scanning the files individually, in parallel.

P1689R5 schema summary

{
    "version": 1, // 0 on GCC
    "revision": 0,
    "rules": [
        {
            "primary-output": "a.o", // As chosen by a flag, see above.
            "outputs": [...], // MSVC only, not very useful.
            "provides": [ // This module. Only exists if this is an importable module unit.
                {
                    "logical-name": "A", // Name of this module unit.
                    "source-path": "a.cppm", // This source file. GCC doesn't report this.
                    "is-interface": true // True if `export module`, false if `module ...`. The latter only appears for partitions, since for non-partitions the entire `provides` is skipped.
                }
            ],
            "requires": [ // Imported modules. On Clang, this array is omitted if empty.
                {"logical-name": "B1"},
                {"logical-name": "B2"},
                // ...
            ]
        }
    ]
}

When partitions are involved, :... is just appended to the name string, so partitions don’t need to be special-cased.

The implicit import A; in files starting with module A; is also added to requires.

Notice that "requires": [...] can’t differentiate between imports and export imports, but this information isn’t very useful to a build system anyway (at best it could probably enable some small build optimizations on MSVC, more on that later).
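Once the scan results are parsed, the main thing a build system needs from them is an order in which providers are built before their importers. A rough sketch of that step, assuming the JSON has already been read into a plain struct (ScanResult and build_order are hypothetical names, and the JSON parsing itself is left out). Modules that no scanned file provides, such as std, would need separate handling; here they are just reported as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

struct ScanResult
{
    std::string source;                  // E.g. "a.cppm".
    std::optional<std::string> provides; // "logical-name" from "provides", if importable.
    std::vector<std::string> imports;    // "logical-name"s from "requires".
};

// Returns the sources ordered so that every imported unit comes before its importers.
std::vector<std::string> build_order(const std::vector<ScanResult>& units)
{
    std::map<std::string, std::size_t> provider; // Module name -> index into `units`.
    for (std::size_t i = 0; i < units.size(); i++)
        if (units[i].provides)
            provider[*units[i].provides] = i;

    std::vector<int> state(units.size(), 0); // 0 = unvisited, 1 = in progress, 2 = done.
    std::vector<std::string> order;

    // Depth-first traversal: emit a unit only after everything it imports.
    std::function<void(std::size_t)> visit = [&](std::size_t i) {
        if (state[i] == 2)
            return;
        if (state[i] == 1)
            throw std::runtime_error("import cycle involving " + units[i].source);
        state[i] = 1;
        for (const std::string& name : units[i].imports)
        {
            auto it = provider.find(name);
            if (it == provider.end())
                throw std::runtime_error("no scanned file provides module " + name);
            visit(it->second);
        }
        state[i] = 2;
        order.push_back(units[i].source);
    };

    for (std::size_t i = 0; i < units.size(); i++)
        visit(i);
    return order;
}
```

Because partition names are just part of the logical name ("A:P"), no special casing is needed for them here. In a Makefile the same effect is achieved by generating prerequisites between the targets rather than sorting anything by hand.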

Single-phase compilation

All three compilers can do this, and this is the simplest approach.

“Single phase” refers to the BMI and the .o being produced by the same compiler invocation. (Some sources count the scan as another phase, which adds to the confusion, since for Clang “two-phase compilation” means something else.)

I’m told that GCC is able to report in real time when the BMI is done (it can communicate this over sockets or otherwise), but implementing this in a build system sounds like too much effort, so I’m going to ignore it in this tutorial.

Clang

Clang treats .cppm and .cpp differently, as importable module units and as non-importable/non-module units respectively. This can be overridden using -xc++-module or -xc++ before the source filename respectively. Trying to compile an importable module unit as .cpp/-xc++ doesn’t error, but produces no BMI. Trying to compile a non-importable module unit or a non-module as a .cppm/-xc++-module errors.

Example compilation command for an importable unit:

clang++ a.cppm -std=c++20 -fmodules-reduced-bmi -c -o a.o -fmodule-output=a.pcm

-std=c++20 or newer is needed.

-fmodules-reduced-bmi enables a more optimal module format, which is the default since Clang 22. This flag is ignored on non-importable module units and on non-modules.

-fmodule-output=... controls where the BMI is placed. The default is to use the same location as -o with the extension changed to .pcm.

Any time you import a module unit (no matter if in .cpp or in .cppm), you must add -fmodule-file=NAME=PATH for that module unit, and for everything it imports recursively. (NAME= can be omitted, but the manual says this is deprecated.)
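Collecting those flags means walking the import graph transitively. A sketch of how a build system could do it, assuming it already knows each module's BMI path and direct imports from the scan step (ModuleInfo and clang_module_flags are names invented for this example):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct ModuleInfo
{
    std::string bmi_path;             // Where the build system decided to put the BMI.
    std::vector<std::string> imports; // Direct imports, e.g. from the scan's "requires".
};

// Returns one -fmodule-file=NAME=PATH per module reachable from `direct_imports`.
std::vector<std::string> clang_module_flags(
    const std::vector<std::string>& direct_imports,
    const std::map<std::string, ModuleInfo>& modules)
{
    std::set<std::string> needed;
    std::vector<std::string> todo = direct_imports;
    while (!todo.empty())
    {
        std::string name = std::move(todo.back());
        todo.pop_back();
        if (!needed.insert(name).second)
            continue; // Already handled.
        const ModuleInfo& info = modules.at(name); // Throws std::out_of_range for unknown modules.
        todo.insert(todo.end(), info.imports.begin(), info.imports.end());
    }

    std::vector<std::string> flags;
    for (const std::string& name : needed)
        flags.push_back("-fmodule-file=" + name + "=" + modules.at(name).bmi_path);
    return flags;
}
```

The flags come out sorted by module name, which keeps the command line stable between runs. The order itself shouldn't matter to Clang as far as I can tell, but I haven't verified that.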

There is also -fprebuilt-module-path=... to search for BMIs in a directory, but then they need to be named in a particular way (matching the module name as written in its module declaration), and unlike other compilers, Clang seems to lack a way to automatically pick the correct BMI output filenames, so we’d have to ensure the right names ourselves. So for this reason I like -fmodule-file=... more.

GCC

GCC treats .cpp and .cppm the same way.

GCC needs -fmodules to export or import modules.

GCC somehow allows modules in any language version, but passing -std=c++20 just in case is still a good idea.

You don’t need any special flags when compiling a BMI, the following command just works.

g++ a.cppm -std=c++20 -fmodules -c -o a.o

GCC automatically decides where to place the BMIs and where to import them from, it puts them in the ./gcm.cache directory. So no special flags are needed when importing the modules.

To override where the BMIs are stored and loaded from, you can provide a module map file, which is just a list of all module units, one per line: name and BMI path, separated by a space, e.g.:

A blah/a.gcm
A:P bleh/a_p.gcm

And so on. You can also add $root foo/bar as the first line, to prepend that directory (foo/bar) to every other path in the file.

This file can then be passed to -fmodule-mapper=... to any GCC command that may export or import a module. If this file is specified, then it must list every module, since it disables the automatic module-name-to-path translation.
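Since the map must list every module, it's easiest to generate it from whatever table of module names and BMI paths the build system already has. A minimal sketch, assuming the simple file format described above; gcc_module_map is a made-up helper, and names or paths containing spaces are not handled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Produces the text of a GCC module map: an optional `$root` line,
// then one "NAME PATH" line per module unit.
std::string gcc_module_map(const std::map<std::string, std::string>& bmi_paths,
                           const std::string& root = "")
{
    std::string out;
    if (!root.empty())
        out += "$root " + root + "\n";
    for (const auto& [name, path] : bmi_paths)
        out += name + " " + path + "\n";
    return out;
}
```

Write the result to a file and pass that file via -fmodule-mapper=... to every GCC command that exports or imports a module.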

GCC can also interact with programs over sockets or otherwise instead of using a simple file, but this sounds like too much work for me.

MSVC

MSVC doesn’t care about file extensions for importable vs non-importable units, like GCC.

But MSVC doesn’t understand the .cppm extension, so if you use that, you need /TP to force it to be treated as C++.

The standard needs to be set to /std:c++20 or newer.

MSVC needs different flags for different kinds of importable units: interface units need /interface, and internal partitions need /internalPartition.

/Fo... sets the object output path, and /ifcOutput... sets the BMI output path.

Like Clang, MSVC needs /reference NAME=PATH for every imported module recursively. It seems that unlike with Clang, we can skip it for indirect non-exported dependencies (so e.g. if export module A; does import B;, and then a TU does import A;, then that TU doesn’t need /reference B=... because B is not export imported). But since the scans don’t provide this information (whether something is imported or export imported), this is not useful and we have to recurse into all dependencies anyway.

MSVC also searches for BMIs in the current directory, and it seems this search can’t be disabled, and custom directories can’t be added.

MSVC does support module maps similar to GCC, but with a different file format, see the manual. Unlike GCC, it only respects the map when importing, but not when choosing the output BMI filename.

The module map syntax is:

[[module]]
name = 'A'
ifc = 'path/to/A.ifc'

[[module]]
name = 'B'
ifc = 'path/to/B.ifc'

Both '...' and "..." are allowed, but only "..." supports escape sequences (\ needs to be escaped as \\ in "...").
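Generating this file is similarly mechanical. A sketch under the assumption that names and paths contain no single quotes, so the '...' form can be used and backslashes in Windows paths need no escaping (msvc_ifc_map is a hypothetical helper; it refuses input it can't quote safely):

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Produces the text of an MSVC module map with one [[module]] entry per module unit.
std::string msvc_ifc_map(const std::map<std::string, std::string>& ifc_paths)
{
    std::string out;
    for (const auto& [name, path] : ifc_paths)
    {
        // The '...' form has no escape sequences, so a quote inside can't be represented.
        if (name.find('\'') != std::string::npos || path.find('\'') != std::string::npos)
            throw std::invalid_argument("single quotes are not supported in this sketch");
        if (!out.empty())
            out += "\n"; // Blank line between entries, as in the example above.
        out += "[[module]]\n";
        out += "name = '" + name + "'\n";
        out += "ifc = '" + path + "'\n";
    }
    return out;
}
```

The resulting file is then passed using the /ifcMap flag from the summary table.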

Compiler flags summary

Category	Clang	GCC	MSVC	Comment
Language standard	`-std=c++20` or newer	Any standard + `-fmodules`	`/std:c++20` or newer
Other flags	`-c` `-fmodules-reduced-bmi` to use a nicer BMI format (default since Clang 22)	`-c`	`/c` `/nologo /EHsc` recommended out of general sanity	Each compiler needs its usual `-c`/`/c` flag.
A module map?	No	`-fmodule-mapper=...` Needed to support custom BMI paths.	`/ifcMap...` Optional, can be used instead of passing `/reference` for every imported module.	I recommend using a module map at least in GCC to customize the BMI paths. Optionally in MSVC to avoid dealing with `/reference`, but the same logic is needed for Clang anyway. Clang has some kind of module maps, but those seem to be for the non-standard Clang modules, not for C++20 modules.
Choosing the output object filename	`-o ...`	`-o ...`	`/Fo...` (without a space)
Choosing the output BMI filename	`-fmodule-output=...` (Not setting this uses the object filename with the extension changed to `.pcm`.)	Taken from the module map. Or chosen automatically as `./gcm.cache/...` if no module map.	`/ifcOutput...` Or created in the current directory using the module name.	The filenames automatically selected by MSVC work with its implicit search for BMIs in the current directory. But the filenames automatically chosen by Clang don’t work with its `-fprebuilt-module-path=...`, as they are based on the object filenames, not on the module names.
Need to list imported BMI paths?	Yes, `-fmodule-file=NAME=PATH`. (Omitting `NAME=` is deprecated.) Can also search in directories using `-fprebuilt-module-path=...`, but that requires choosing the right BMI filenames in the build system, Clang doesn’t automate this.	No. The module map is used if provided, otherwise the default path `./gcm.cache` is searched for BMIs.	`/reference NAME=PATH` Or using the module map. Additionally always searches for BMIs in `.` regardless of anything else.	When listing BMIs using flags, must list indirect dependencies too.
Flags for different kinds of module units	`-xc++-module` for importable units (optional for `.cppm`) `-xc++` otherwise (optional for `.cpp`)	No	`/interface` for interface units `/internalPartition` for implementation partitions

Two-phase compilation

As an alternative to the single-phase compilation described above (that produces both the BMI and the .o in a single compiler invocation), Clang has several two-phase compilation models to choose from, where the compiler is called twice per importable module unit: first to produce the BMI, and then to produce the object file.

Note that some sources count the scan as another phase, so if you hear someone say that “all compilers compile modules in two phases”, they’re just counting the scan as one of the phases, they don’t refer to this strategy that Clang allows.

The two-phase process only applies to importable module units. Everything else must be built in one phase regardless.

Here are Clang’s two-phase models that you can choose from:

1.
    a.cppm  --1-->  full BMI  --2-->  a.o
                      |
                      '-->  consumers

2.
                 .-->  full BMI  --2-->  a.o
    a.cppm  --1--|
                 '-->  reduced BMI  -->  consumers

3.
       .-----2-->  a.o
    a.cppm
       '-----1-->  BMI  -->  consumers

Clang has “full BMIs” vs “reduced BMIs”. A full BMI is needed to be able to produce an .o from it, but otherwise a reduced BMI seems better (it’s smaller, so faster to import, and I’m also told that full BMIs can leak undesired things to the importers).

Single-phase compilation (and 3 here) should use reduced BMIs, as those seem to be more optimal, but full BMIs should work for those too.

Reduced BMIs seem to be required if you want to support non-cascading changes.

-fmodule-file=... must be passed to both phases.

How to perform each phase in each model:

1-2 and 2-2: To produce an .o from a full BMI, just pass the full BMI instead of the source file, along with -c. If the extension of the BMI is not .pcm, use -xpcm before the filename.
1-1: To produce a full BMI instead of an .o, pass --precompile instead of -c. (This sets the default to -fno-modules-reduced-bmi, even on newer Clang versions.) Then -o sets the BMI output path.
2-1: To produce both a full BMI and a reduced BMI, use --precompile -fmodules-reduced-bmi -fmodule-output=a_reduced.pcm -o a_full.pcm. Forgetting -fmodules-reduced-bmi on Clang 22 or newer does nothing, but on Clang 21 it silently doesn’t emit the reduced BMI.
3-2: Just do -xc++ to produce an .o without a BMI.
3-1: To produce a reduced BMI without also producing a full BMI or .o, use:
- --precompile-reduced-bmi instead of --precompile — Needs Clang 23 or newer (which wasn’t yet released at the time of writing this).
- This could be imitated using 2-1, sending the full BMI to /dev/null, but that seems slow enough to make approach 3 useless in Clang 22 and older.

It’s unclear which of those strategies is the best, or if they are better than the single-phase approach in the first place. Benchmarks are needed.

Something tells me 1 is not a good strategy, as it forces importers to consume the full BMIs instead of reduced BMIs.

I did a simple benchmark on this module:

module;
#include <iostream>
export module A;
export void foo() {}

And it seems that 1-1 and 2-1 for this file are individually slower than the entire single-phase compilation of this file, around 280ms vs 210ms, so strategies 1 and 2 seem to be pointless. After Clang 23 gets released, we can benchmark 3.

Two-phase compilation on compilers other than Clang

GCC and MSVC seem to lack the flags for this.

GCC has -fmodule-only to generate a BMI without an .o, but it doesn’t have a flag to generate an .o without a BMI, and there is no simple way of sending the BMI to /dev/null (or elsewhere) without generating a separate module map for each individual TU, that would send only that single BMI to /dev/null.

Non-cascading changes

As mentioned earlier, rebuilding a BMI can produce a bitwise identical file in the following cases:

The imported module units got changed in a way that doesn’t affect this one, or
All changes in the current file are isolated to function bodies and such (no compiler seems to implement this at the time of writing).

For each BMI, after building it, hash it and store the hash to a file. Load the old hash first and compare them. (If this is Clang’s two-phase compilation and you have both a full and a reduced BMI, hash the reduced BMI only.)

Then, in theory, if your build system has a way to back out at runtime and mark a file as unmodified, you could just do that with the BMI (and touch it). But Make doesn’t.

If your build system can’t do this, you can emulate this by having a set of “unmodified BMIs” (their filenames; don’t write the set to disk). If after building a BMI its hash didn’t change, add the BMI to the set.

Then, when building anything that imports a module (either another BMI, or just a non-importable .o file), if the only reason it’s being rebuilt is due to imported BMI changes, and all those changed BMIs are in the unmodified set, then skip rebuilding this TU and just touch the resulting file. If this is a BMI, add it to the set too.
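The bookkeeping for the “unmodified BMIs” set can be sketched like this. FNV-1a stands in for whatever hash your build system actually uses, and UnmodifiedBmis with its member functions are hypothetical names. Reading BMIs from disk, storing the hashes, and touching files are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// 64-bit FNV-1a over the BMI contents. Any stable hash would do.
std::uint64_t fnv1a(std::string_view bytes)
{
    std::uint64_t hash = 0xcbf29ce484222325ULL;
    for (unsigned char c : bytes)
    {
        hash ^= c;
        hash *= 0x100000001b3ULL;
    }
    return hash;
}

class UnmodifiedBmis
{
    std::set<std::string> unmodified_; // In memory only, never written to disk.

public:
    // Call after rebuilding a BMI. Returns true if its contents didn't change.
    bool record_rebuild(const std::string& bmi, std::uint64_t old_hash, std::uint64_t new_hash)
    {
        if (old_hash != new_hash)
            return false;
        unmodified_.insert(bmi);
        return true;
    }

    // Call before rebuilding an importer. `changed_imports` are the imported BMIs that are
    // newer than the importer's output. The caller must have established that those are
    // the ONLY reason for the rebuild (no source or header changes, no rescan).
    bool can_skip(const std::vector<std::string>& changed_imports) const
    {
        if (changed_imports.empty())
            return false;
        return std::all_of(changed_imports.begin(), changed_imports.end(),
            [&](const std::string& bmi) { return unmodified_.count(bmi) != 0; });
    }

    // Call when a BMI's own rebuild was skipped (and its file touched instead).
    void mark_skipped(const std::string& bmi)
    {
        unmodified_.insert(bmi);
    }
};
```

The mark_skipped step is what lets a skip propagate down a chain of imports: a BMI whose rebuild was skipped counts as unmodified for its own importers.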

When dealing with Clang’s two-phase compilation, with its sequential variants 1 and 2, this check is only needed for the first stage. The second stage should be skipped (touching the outputs instead) if and only if we skipped the first stage in the same way.

I’ve tested this on the three compilers, and:

GCC doesn’t seem to have reproducible BMI builds as of GCC 15, i.e. the hashes will be different even if the input files are the same.
Clang and MSVC do have reproducible BMI builds.

But if you just change a function body in an interface unit, this still changes the BMI on those compilers.

Handling indirect dependencies

Indirect BMI dependencies need special attention.

// a.cppm
export module A;
export import :P;

// ap.cppm
export module A:P;
export void foo() {}

// 1.cpp
import A;

Let’s say ap.cppm gets modified in a way that changes its BMI (e.g. foo gets removed from it).

Then ap.cppm gets rebuilt, and so does a.cppm because a hash of its dependency has changed.

We want 1.cpp to be rebuilt too, which requires one of the two things to happen:

1. Either the compiler must ensure that the hash of a.cppm’s BMI changes (when rebuilding it with a modified dependency if that can affect its consumers).
2. Or the build system has to check whether ap.cppm’s BMI has changed when considering whether to rebuild 1.cpp (which normally wouldn’t happen, since ap.cppm is not a direct dependency of it, only a.cppm is).

1 is more desirable because this lets us skip more rebuilds. Only the compiler can tell what dependency changes affect or don’t affect the consumers of this module unit, so any implementation of this in the build system (i.e. 2) has to be conservative and sometimes rebuild more things than necessary.

A few simple tests show that:

Clang seems to support 1. They promise so in the documentation.
MSVC doesn’t support 1.
GCC is moot, because it doesn’t ensure reproducible BMI builds in the first place.

Therefore at least for MSVC, we have to do 2.
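A sketch of the conservative build-system-side check, assuming the build system keeps the import graph from the scans and a set of module names whose BMI hash changed during this build (ImportGraph and any_import_changed are names invented here):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Module name -> names of the module units it imports directly.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// True if any directly or indirectly imported BMI has a changed hash.
bool any_import_changed(const std::vector<std::string>& direct_imports,
                        const ImportGraph& graph,
                        const std::set<std::string>& changed_bmis)
{
    std::set<std::string> seen;
    std::vector<std::string> todo = direct_imports;
    while (!todo.empty())
    {
        std::string name = std::move(todo.back());
        todo.pop_back();
        if (!seen.insert(name).second)
            continue; // Already checked.
        if (changed_bmis.count(name) != 0)
            return true;
        auto it = graph.find(name);
        if (it != graph.end())
            todo.insert(todo.end(), it->second.begin(), it->second.end());
    }
    return false;
}
```

In the example above, 1.cpp only imports A, but the walk reaches A:P through the graph, so a changed hash for A:P forces 1.cpp to be rebuilt even if the BMI of A itself came out identical.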

`import std;`

C++23 includes modules for the standard library: std and std.compat (the latter additionally imports the C standard library contents into the global namespace, not only into std::).

Since the BMIs depend on the compiler and its settings, the build system must build the BMIs for the standard modules. They don’t come preinstalled with the standard library.

The main issue is locating the sources for those modules. The procedure depends both on the compiler and on the chosen C++ standard library.

GCC: g++ -print-file-name=libstdc++.modules.json. This prints a path to a JSON, which in turn lists the paths to the two source files, std.cc and std.compat.cc, relative to the JSON itself.

In theory GCC can also work with libc++ instead of libstdc++, but this isn’t a popular configuration. I hope that, if this is enabled at GCC build time, g++ -print-file-name=libc++.modules.json will just work, possibly with -stdlib=libc++.

Another option for GCC is to not locate the file at all, and instead compile it using g++ -fsearch-include-path bits/std.cc ..., and similarly for std.compat. The flag -fsearch-include-path tells GCC to search for the specified source files in the include directories.
Clang: For libstdc++ and libc++: clang++ -print-library-module-manifest-path. Same style of JSON as GCC. This respects -stdlib=libstdc++ vs -stdlib=libc++, and outputs the correct JSON for each.

Clang also understands the GCC-style -print-file-name=libstdc++.modules.json, and for libc++ -print-file-name=libc++.modules.json. But this seems worse than -print-library-module-manifest-path, since now you have to manually specify which standard library to use.

Clang doesn’t understand GCC’s -fsearch-include-path.

The flag -print-library-module-manifest-path doesn’t work with MSVC STL. For MSVC STL it will print <NOT PRESENT>. I couldn’t find a proper procedure for MSVC STL, but this hack seems to work:
- Make a .cpp file containing only #include <yvals.h>. Compile it with -H -fsyntax-only.
- The first line of stderr will be along the lines of . C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\14.50.35717\include\yvals.h.
- Relative to that path, you want ....\MSVC\14.50.35717\modules\modules.json.
When it’s unclear what standard library is being used, I’d try the MSVC STL procedure first, and if that header is not found, fall back to -print-library-module-manifest-path.
MSVC: Again no standardized search procedure, but the following seems to work.

Similarly to Clang + MSVC STL, make a .cpp file containing only #include <yvals.h>, then call cl 1.cpp /nologo /sourceDependencies- /scanDependencies NUL.

Extract .Data.Includes[0] from the printed JSON, e.g. via | jq -r '.Data.Includes[0]'.

You will get something similar to C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\14.50.35717\include\yvals.h, and you want ....\MSVC\14.50.35717\modules\modules.json relative to that.

Another search approach seems to be to look relative to the location of cl.exe, which can be e.g. C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\14.50.35717\bin\Hostx64\x64\cl.exe. But this doesn’t work in msvc-wine, so I prefer the previous option.
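The yvals.h trick (both for Clang with MSVC STL and for MSVC itself) boils down to a bit of path manipulation. Below is a rough sketch of it, assuming the layout described above holds. The helper names strip_depth_prefix, parent_dir and guess_msvc_manifest are made up for illustration. It works on plain strings rather than std::filesystem::path, so that backslash-separated paths can also be handled on a non-Windows host.

```cpp
#include <optional>
#include <string>

// Removes the depth prefix that `-H` prints before each header path,
// e.g. ". C:\...\yvals.h" -> "C:\...\yvals.h". Other lines are returned as-is.
std::string strip_depth_prefix(const std::string& line)
{
    std::size_t i = line.find_first_not_of('.');
    if (i == std::string::npos || i == 0 || line[i] != ' ')
        return line;
    return line.substr(i + 1);
}

// Drops the last path component, accepting both '\\' and '/' as separators.
std::optional<std::string> parent_dir(const std::string& path)
{
    std::size_t pos = path.find_last_of("\\/");
    if (pos == std::string::npos)
        return std::nullopt;
    return path.substr(0, pos);
}

// Maps <toolset>\include\yvals.h to <toolset>\modules\modules.json.
// This relies on the observed directory layout, not on a documented interface.
std::optional<std::string> guess_msvc_manifest(const std::string& yvals_path)
{
    std::optional<std::string> include_dir = parent_dir(yvals_path);
    if (!include_dir)
        return std::nullopt;
    std::optional<std::string> toolset_dir = parent_dir(*include_dir);
    if (!toolset_dir)
        return std::nullopt;
    // Reuse whichever separator the compiler printed.
    const char sep = yvals_path.find('\\') != std::string::npos ? '\\' : '/';
    return *toolset_dir + sep + "modules" + sep + "modules.json";
}
```

For the Clang variant, pass the first line of stderr through strip_depth_prefix first. For the MSVC variant, the string extracted from .Data.Includes[0] can go into guess_msvc_manifest directly. The caller should still check that the resulting file exists.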

Then scan and compile those modules as usual, except Clang needs -Wno-reserved-module-identifier to silence the warning about the module names being reserved for the standard library.

You could even skip scanning those files, as it seems that std.compat always imports std, and they don’t import anything else.
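Since the source paths listed in the manifest are relative to the JSON file itself, they have to be resolved before being passed to the compiler. A small sketch of that step follows. The function name resolve_manifest_entry is hypothetical, and reading the JSON is left out.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Resolves a source path listed in a modules manifest against the directory
// that contains the manifest. If `listed` is already absolute, operator/
// discards the left-hand side, so the path is only normalized.
fs::path resolve_manifest_entry(const fs::path& manifest, const fs::path& listed)
{
    return (manifest.parent_path() / listed).lexically_normal();
}
```

For example, with a made-up manifest at sdk/lib/libstdc++.modules.json that lists ../share/std.cc, this gives sdk/share/std.cc. The normalization is purely lexical, so symlinks are not followed.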

Header units

There is another kind of module unit, called a header unit.

Headers (with no special module-related annotations) can be compiled into BMIs, and then imported using import "foo.h" or import <foo.h> (or even using the #include syntax via certain compiler flags).

Macros can be imported from header units (but they don’t propagate in the other direction, from the importer into the imported header).

Header units seem to not produce .o files.

The three compilers seem to already support those, at least minimally.

The big problem currently is that the module scans don’t list imported header units: some compilers ignore them entirely, and some try to import their BMIs during the scans and then error if those BMIs can’t be found.

Currently this can be worked around by building all header units before touching any other .cpp/.cppm files, but:

This prevents us from doing anything else in parallel.
We can’t support imports in header units, neither of named modules nor of other header units.

The proper solution for this seems to be for compilers to internally replace header unit imports with #includes during scans, but no compiler has implemented this yet.

To me it seems that header units are not ready yet, and we should wait for the compiler vendors to cook them more.

Private module fragment

You’re allowed to do the following in the primary interface unit:

export module A;

export void foo();

module :private;

void foo() {...}

Everything starting from module :private; is called the “private module fragment”.

The private module fragment can only be used in the primary interface unit. If it’s present, this named module must contain only one module unit (this primary interface unit).

In theory, it just guarantees that its contents don’t affect the importers and are not present in the BMI, as if they were moved to an implementation unit.

Quick tests in different compilers show that:

Clang and MSVC — support the syntax, but editing the private module fragment still changes the BMI hash.
GCC — doesn’t support the syntax yet.

It seems that no special support for it is needed in the build system, everything should just work out of the box.

An example makefile

I’ve made a makefile that implements everything described in this post: modules support on Clang, GCC, and MSVC, including all of Clang’s two-phase module compilation strategies, import std;, and non-cascading changes.

Header units are not implemented, since the compilers seem to lack the ability to scan dependencies on header units properly, at the time of writing.

Clangd

Clangd seems to support C++20 modules.

There are two different ways to make them work:

`--experimental-modules-support`

This is the proper solution.

Pass this flag to Clangd. If you’re using VS Code, it goes into the Clangd: Arguments setting.

Make sure you have compile_commands.json that lists all your files (importantly, it must list all importable module units, including that for the std module if you use that).

The JSON does not need to contain -fmodule-file=... (it seems to have no effect), meaning you can create the JSON before scanning the modules.

Clangd handles finding all importable module units and resolving imports automatically. It ignores your BMIs and works correctly even before you build the BMIs.

The manual way

This is a fallback solution, if --experimental-modules-support doesn’t work for some reason.

Don’t pass --experimental-modules-support. Include -fmodule-file=... in your compile_commands.json.

Build your BMIs with the exact same Clang version as the version of Clangd you’re using.

Clangd will consume the BMIs specified via -fmodule-file=... in compile_commands.json, meaning you have to build the BMIs first for Clangd to function.

Clangd will provide stale completions if you edit an importable file but don’t rebuild its BMI.

Then, if you have the importing file already open, you must type something in it (don’t have to save those changes) for Clangd to notice the updated imported BMIs.

Non-standard MSVC internal partition handling

Adding this for completeness.

MSVC has an alternative non-standard mode of handling internal partitions that activates if you don’t pass /internalPartition.

Then it allows module A:P; to coexist with export module A:P; (while the standard doesn’t allow duplicate partition names in the same named module), and implicitly imports the latter into the former, and doesn’t generate any BMI for the former.

This implicit import is not reported during scanning. /internalPartition has no effect on scanning (which makes sense, since you need the scan to determine whether the file is an internal partition in the first place, and therefore whether it could accept this flag).

While this mode does make some sense, it’s non-standard so probably should be avoided.

That’s all, thanks for reading!