[build2] LTO and parallelization with build2

Sat Aug 15 01:54:51 UTC 2020

On Thu, Aug 13, 2020 at 12:34 PM Boris Kolpackov
<boris at codesynthesis.com> wrote:
>
> The largest amount of data that we currently hash is the preprocessed
> TUs during C/C++ compilation.

Yeah, the TUs were where I suspected there might currently be some benefit.

> In fact, what we actually hash are the
> preprocessor tokens that are returned by the lexer in order to calculate
> the checksum that omits ignorable changes.

Yes, after looking at libbuild2/cc/{compile-rule,parser,lexer}.cxx I
can see this now.

I suspect this strategy is optimal when most changes are ignorable
with respect to the compiler output (including debug info). One
assumes the cumulative incremental token checksum updates take much
less time than compiling, which should be true provided the
parser/lexer and hasher are fast.

If changes are not ignorable with respect to the compiler, we will
obviously have to recompile, so we may have spent some additional time
doing the parsing/lexing and incremental hashing compared to just
going ahead and recompiling. If we again assume the parser/lexer and
hasher are fast compared to compiling, this is probably a negligible
contribution.

In either case, we want the parser/lexer and hasher to be fast, and
BLAKE3 is likely faster to update than SHA-256.

> Which means it's not going
> to be easily parallelizable. Also, the build2 scheduler is geared
> towards more substantial task and my feeling is that any win from
> parallel hashing will be offset by the scheduling overhead (locking,
> starting threads, etc).

Yeah, this probably means that the current TU hashing scheme is not
suitable for threaded parallelism. Furthermore, this is probably the
only reasonable TU hashing scheme since hashing the full TU is kind of
pointless unless you're trying to detect (only) identity transforms.

On the other hand, hashing large binary files (e.g. object files,
libraries, executables) could benefit much more from such parallelism
and single-threaded speed, and maybe one could come up with a
heuristic for determining when to use multiple hashing threads. This
hashing will likely be necessary to avoid updating e.g.
libraries/executables depending on unchanged object files[1],
re-running tests which depend on unchanged executables (i.e.
incremental testing), etc.

Although, in the case of linking shared libraries it might be possible
to do something smarter like hashing a representation of the ABI
(using e.g. libabigail)[2]. Presumably this is less total work than
full hash + re-link (in the case of mismatch), although it may depend
on the complexity of the library being analyzed.

> Sounds interesting, though I personally have never used Fortran so
> you will have to be the expert on the compilation model, etc.

Yeah, I'm not exactly an expert on Fortran, but I've worked with it
enough to probably work on a build system module for it.

> I did hear Fortran modules being used as an example of how not to do
> modules ;-).

Yeah, it's not ideal, but build2 should be more than capable of
handling it properly. Often times CMake or autotools Fortran projects
don't properly handle module dependencies because e.g. they use a
recursive design that doesn't have a full picture of (inter-directory)
module dependencies, or they don't scan for module dependencies at all
and just force you to iteratively recompile until things work.

> I am also planning to generalize/factor some of the make dependency
> parsing and handling logic from the cc module so that it can be reused
> by other modules (quite a few tools these days can produce make-style
> dependency information).

Sounds good. I had actually wondered about using a custom header
dependency scanner (similar to how modules must be handled) rather
than invoking the preprocessor and getting header info from e.g. -M*
since Clang devs showed this could be much faster than the
preprocessor[3]. But since we want the preprocessed TU anyways, this
is kind of a moot point.

> - Reproducible builds (-ffile-prefix-map)

I suppose this can already be done, but would the idea be to
automatically add something like -ffile-prefix-map=$src_root= to the
compiler args?

> and separate debug info (-gsplit-dwarf)

This is interesting, I wasn't aware of this option. It could
significantly improve link times and complement both the recent
-flto=auto work and the solution to [1]. I suppose you should already
be able to use -gsplit-dwarf and things will more or less work during
development and building.

I guess you would want to use dwp[4] or dwz[5] and link with
--gdb-index (or maybe gdb-add-index can work with dwo or dwp files?)
for installation/distribution, though. What Fedora does for its
packages is after install phase runs find-debuginfo.sh[6], which
searches for unstripped executable files, runs gdb-add-index and
strips them, then compresses the output with dwz.

> with the view towards distributed compilation and hashing.

Would the idea here be that each build node might have a different
environment as far as paths go? Or might it make sense to set up some
sort of container like e.g. bubblewrap[7] for a hermetic, consistent
build on each node?

I also noticed that Bazel uses Merkle trees for its remote caching
mechanism[8,9].

> - C++20 modules support.

This mostly just requires work on the compiler end at this point right?

[1] https://github.com/build2/build2/issues/87
[2] https://engineering.mongodb.com/post/pruning-dynamic-rebuilds-with-libabigail
[3] https://llvm.org/devmtg/2019-04/slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf
[4] https://gcc.gnu.org/wiki/DebugFissionDWP
[5] https://sourceware.org/git/?p=dwz.git;a=summary
[6] https://github.com/rpm-software-management/rpm/blob/master/scripts/find-debuginfo.sh
[7] https://github.com/containers/bubblewrap
[8] https://github.com/bazelbuild/bazel/tree/master/src/main/java/com/google/devtools/build/lib/remote/merkletree
[9] https://github.com/bazelbuild/remote-apis/issues/141