[build2] LTO and parallelization with build2

Thu Aug 13 16:34:23 UTC 2020

Matthew Krupcale <mkrupcale at matthewkrupcale.com> writes:

> Did you get a chance to test on a larger machine?

I only smoke-tested it on my 6C/12T development machine. At least the
build2 side seems to do the right thing (i.e., I got -flto=12 for an
executable project).

> 1. It might make sense to implement the find_option{,_prefix}
> functions taking {c,}strings in terms of the new iterator variants and
> the compare_option{,_prefix} functions you wrote.

Yes, I also thought we could clean that up.
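
Roughly, the cleanup could look like the sketch below. The function
names come from this thread, but the signatures, the std::vector-based
container overloads, and the choice to return the last prefix match
(so that a later -flto=N overrides an earlier one) are assumptions made
for illustration and may not match what is actually in build2.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical comparison helpers (signatures assumed).
inline bool
compare_option (const char* o, const std::string& a)
{
  assert (o != nullptr);
  return a == o;
}

inline bool
compare_option_prefix (const char* p, const std::string& a)
{
  assert (p != nullptr);
  return a.compare (0, std::strlen (p), p) == 0;
}

// Iterator variants: return the position of the match or e if none.
template <typename I>
I
find_option (const char* o, I b, I e)
{
  for (; b != e; ++b)
    if (compare_option (o, *b))
      return b;
  return e;
}

// Search backwards so that the last matching option is returned
// (requires bidirectional iterators).
template <typename I>
I
find_option_prefix (const char* p, I b, I e)
{
  for (I i (e); i != b; )
    if (compare_option_prefix (p, *--i))
      return i;
  return e;
}

// Container variants implemented in terms of the iterator ones.
inline bool
find_option (const char* o, const std::vector<std::string>& args)
{
  return find_option (o, args.begin (), args.end ()) != args.end ();
}

inline const std::string*
find_option_prefix (const char* p, const std::vector<std::string>& args)
{
  auto i (find_option_prefix (p, args.begin (), args.end ()));
  return i != args.end () ? &*i : nullptr;
}
```

With something like this, each container flavor becomes a one-line
wrapper and the comparison logic lives in a single place.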

> 2. Investigate the use of BLAKE3 hash for file checksums. BLAKE3[1] is
> significantly faster than SHA-1 and SHA-2 (5-10 times) and is highly
> parallelizable since it uses Merkle trees internally. This could
> utilize the new scheduler thread allocator, but even without
> parallelization, it's much faster. For small inputs, this may not
> matter much, but for many, large TUs or object file checksums, this
> might be noticeable, especially if solution 1 of [2] were implemented.

The largest amount of data that we currently hash is the preprocessed
TUs during C/C++ compilation. In fact, what we actually hash are the
preprocessor tokens returned by the lexer, so that the checksum omits
ignorable changes (such as whitespace). This means it's not going to be
easily parallelizable. Also, the build2 scheduler is geared towards
more substantial tasks and my feeling is that any win from parallel
hashing will be offset by the scheduling overhead (locking, starting
threads, etc).
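
To illustrate why the hashing is tied to the (sequential) token stream,
here is a minimal sketch. Everything in it is an assumption made for
illustration: the toy lexer only distinguishes words from single
punctuation characters (it knows nothing about string literals or
multi-character operators), and FNV-1a stands in for the real checksum
algorithm. It is not how the cc module is implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a (64-bit), used here only as a simple stand-in hash.
struct fnv1a
{
  std::uint64_t h = 14695981039346656037ULL;

  void
  append (const char* s, std::size_t n)
  {
    assert (s != nullptr);
    for (std::size_t i (0); i != n; ++i)
    {
      h ^= static_cast<unsigned char> (s[i]);
      h *= 1099511628211ULL;
    }
  }
};

// Hash the tokens of the text rather than its bytes. A token is either
// a run of alphanumeric/underscore characters or a single punctuation
// character. Whitespace only separates tokens and is never hashed.
std::uint64_t
token_checksum (const std::string& text)
{
  auto word = [] (char c)
  {
    return std::isalnum (static_cast<unsigned char> (c)) || c == '_';
  };

  fnv1a cs;
  for (std::size_t i (0), n (text.size ()); i != n; )
  {
    char c (text[i]);

    if (std::isspace (static_cast<unsigned char> (c)))
    {
      ++i;
      continue;
    }

    std::size_t b (i++); // Start of the token.

    if (word (c))
      while (i != n && word (text[i]))
        ++i;

    cs.append (text.data () + b, i - b);
    cs.append ("\0", 1); // Separator so that "a b" and "ab" differ.
  }

  return cs.h;
}
```

Here token_checksum ("int x=1;") and token_checksum ("int  x = 1 ;\n")
produce the same value, which is the point. But it also shows that each
chunk fed to the hash only exists once the lexer has produced it.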

> 3. Write a Fortran language build system module. This would likely
> need a lot of similar machinery in the cc module as well as some of
> the cxx module dependency scanning logic to handle Fortran modules.
> Fortran compilers though don't have a protocol for communication
> between buildsystem and compiler like C++ modules for module name-file
> mapping. Instead, compiled Fortran module interface files are named
> according to the (lowercase) module name and searched for in the -I,
> -J, and current directories (at least that's what gfortran seems to
> do). So we just need to find the module source file and compile that
> before any module "use"s it, and gfortran should find it. gfortran can
> use the C preprocessor (in traditional mode), but it's not invoked by
> default unless the file extension is .fpp or is like ".F*", and files
> can be textually included using either "include" statements or
> "#include" directives (when cpp is invoked).

Sounds interesting, though I personally have never used Fortran so
you will have to be the expert on the compilation model, etc. I did
hear Fortran modules being used as an example of how not to do
modules ;-).

I am also planning to generalize/factor some of the make dependency
parsing and handling logic from the cc module so that it can be reused
by other modules (quite a few tools these days can produce make-style
dependency information).
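
As a rough idea of the kind of logic that could be factored out, below
is a minimal sketch of parsing a single make-style rule. The make_deps
type and the parse_make_deps() function are hypothetical names, and the
sketch assumes one target per rule and only handles backslash-newline
continuations and backslash-escaped spaces. The real parsing in the cc
module deals with more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct make_deps
{
  std::string target;
  std::vector<std::string> prereqs;
};

make_deps
parse_make_deps (const std::string& s)
{
  make_deps r;
  std::string w;      // Word being accumulated.
  bool target (true); // True until the ':' separator is seen.

  auto ws = [] (char c)
  {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
  };

  // Finish the current prerequisite, if any.
  auto flush = [&r, &w] ()
  {
    if (!w.empty ())
    {
      r.prereqs.push_back (std::move (w));
      w.clear ();
    }
  };

  for (std::size_t i (0), n (s.size ()); i != n; ++i)
  {
    char c (s[i]);

    if (c == '\\' && i + 1 != n && (s[i + 1] == '\n' || s[i + 1] == ' '))
    {
      if (s[++i] == ' ')
        w += ' ';  // Escaped space is part of the name.
      else if (!target)
        flush ();  // Line continuation separates words.
      continue;
    }

    if (target)
    {
      // Only treat ':' as the separator if followed by whitespace or
      // the end of input (so that C:\foo is not split).
      if (c == ':' && (i + 1 == n || ws (s[i + 1])))
      {
        r.target = std::move (w);
        w.clear ();
        target = false;
      }
      else if (!ws (c))
        w += c;
    }
    else if (ws (c))
      flush ();
    else
      w += c;
  }

  if (!target)
    flush ();

  // A leftover word can only belong to an unterminated target.
  assert (target || w.empty ());
  return r;
}
```

For example, given "foo.o: foo.cxx \" continued on the next line with
"foo.hxx my\ dir/bar.hxx", this yields the target foo.o and the
prerequisites foo.cxx, foo.hxx, and "my dir/bar.hxx".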

A few more areas that may pique your interest:

- Reproducible builds (-ffile-prefix-map) and separate debug info
  (-gsplit-dwarf) with a view towards distributed compilation and
  hashing.

- Assembler/linker support in the bin module.

- C++20 modules support.