From boris at codesynthesis.com Mon Sep 7 11:49:51 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon, 7 Sep 2020 13:49:51 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References: <20200813163423.GA13783@codesynthesis.com>
Message-ID:

Matthew Krupcale writes:

> On the other hand, hashing large binary files (e.g. object files,
> libraries, executables) could benefit much more from such parallelism
> and single-threaded speed, and maybe one could come up with a
> heuristic for determining when to use multiple hashing threads. This
> hashing will likely be necessary to avoid updating e.g.
> libraries/executables depending on unchanged object files[1],
> re-running tests which depend on unchanged executables (i.e.
> incremental testing), etc.

I think in these cases we need to try hard to get away with just the
modification times (I am still going to reply to that bug report, I
promise ;-)).

> Sounds good. I had actually wondered about using a custom header
> dependency scanner (similar to how modules must be handled) rather
> than invoking the preprocessor and getting header info from e.g. -M*,
> since Clang devs showed this could be much faster than the
> preprocessor[3].

To get accurate results one still has to do proper preprocessing (and
the Clang scan-deps folks do; they just take some shortcuts that are
possible when all you need is the dependency information).

> But since we want the preprocessed TU anyways, this is kind of a
> moot point.

Right, our compilation model is to simultaneously perform partial
preprocessing (-fdirectives-only for GCC, -frewrite-includes for
Clang), dependency extraction, and hashing, and then compile the
partially preprocessed TU, if necessary, potentially by shipping it to
a different host (distributed compilation). This has been shown[1] to
result in faster compilation compared to the traditional model.
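To illustrate, a minimal sketch of the two phases with GCC could look
along these lines (the file names are made up, the real invocations
involve quite a few more options, and the hashing step is not shown):

  # Phase 1: partially preprocess (headers are included but macros are
  # not expanded) while also extracting header dependencies.
  g++ -E -fdirectives-only -MD -MF hello.d -o hello.ii hello.cxx

  # Phase 2: compile the partially preprocessed TU, potentially on a
  # different host.
  g++ -fpreprocessed -fdirectives-only -c -o hello.o hello.ii

  # With Clang, -frewrite-includes plays the role of -fdirectives-only.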
This model will also become more important with C++20 modules where a
TU might have to be recompiled due to changes to things other than the
TU "text" itself (like the BMIs of modules that the TU imports), in
which case the same partially preprocessed TU can be reused.

There is, however, one significant drawback to this model: the
partially preprocessed TUs can be quite large and building bigger
projects requires GBs of disk space. We've actually been having this
issue on our CI machines and it will only get worse. So we need to
think of a possible solution.

> > - Reproducible builds (-ffile-prefix-map)
>
> I suppose this can already be done, but would the idea be to
> automatically add something like -ffile-prefix-map=$src_root= to the
> compiler args?

This is still fuzzy and requires more thinking/experimentation. The
biggest issue here (and also for compilation result caching) is debug
information. In a sense, we are trying to sit on two chairs here:

1. Have object code with debug info that has correct references to the
   source files on the user's machine (so that the user doesn't have
   to manually specify the location of files during debugging).

2. Have object code be "location-independent" so it can be cached and
   reused by multiple builds.

> > and separate debug info (-gsplit-dwarf)
>
> This is interesting, I wasn't aware of this option. It could
> significantly improve link times and complement both the recent
> -flto=auto work and the solution to [1]. I suppose you should already
> be able to use -gsplit-dwarf and things will more or less work during
> development and building.

Since this produces another file, we would at least need to recognize
it so that things are cleaned up properly, etc.

> I guess you would want to use dwp[4] or dwz[5] and link with
> --gdb-index (or maybe gdb-add-index can work with dwo or dwp files?)
> for installation/distribution, though. What Fedora does for its
> packages is, after the install phase, run find-debuginfo.sh[6], which
> searches for unstripped executable files, runs gdb-add-index and
> strips them, then compresses the output with dwz.

Another thing that would be interesting to consider/try is to produce
just the debug information (.dwo files) without the .o files and see
how long this takes compared to producing both. The idea is to perhaps
produce the debug information on the local host and/or on demand.

> > with the view towards distributed compilation and hashing.
>
> Would the idea here be that each build node might have a different
> environment as far as paths go? Or might it make sense to set up some
> sort of container like e.g. bubblewrap[7] for a hermetic, consistent
> build on each node?

Yes, that's the issue. But I don't think containers will offer a
solution here since the goal is to produce object files that are
reusable by multiple builds with varying environments, not to make
sure the environment does not vary.

> > - C++20 modules support.
>
> This mostly just requires work on the compiler end at this point,
> right?

And the build system integration.

[1] https://build2.org/article/preprocess-compile-performance.xhtml
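For concreteness, a rough sketch of the split debug info workflow
discussed above could look something like this (the file names are
made up and the exact tool options may differ between toolchain
versions):

  # Compile with split debug info: most of the DWARF data goes into
  # hello.dwo instead of hello.o.
  g++ -g -gsplit-dwarf -c -o hello.o hello.cxx

  # Link as usual (the executable only refers to the .dwo files).
  g++ -o hello hello.o

  # For installation/distribution: package the .dwo files referenced
  # by the executable into a single hello.dwp next to it...
  dwp -e hello

  # ...and add a .gdb_index section so that GDB loads it faster
  # (linking with --gdb-index in gold/lld can achieve a similar
  # effect).
  gdb-add-index hello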