From boris at codesynthesis.com Mon Sep 7 11:49:51 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon, 7 Sep 2020 13:49:51 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References: <20200813163423.GA13783@codesynthesis.com>
Message-ID:

Matthew Krupcale writes:

> On the other hand, hashing large binary files (e.g. object files,
> libraries, executables) could benefit much more from such parallelism
> and single-threaded speed, and maybe one could come up with a
> heuristic for determining when to use multiple hashing threads. This
> hashing will likely be necessary to avoid updating e.g.
> libraries/executables depending on unchanged object files[1],
> re-running tests which depend on unchanged executables (i.e.
> incremental testing), etc.

I think in these cases we need to try hard to get away with just the
modification times (I am still going to reply to that bug report, I
promise ;-)).

> Sounds good. I had actually wondered about using a custom header
> dependency scanner (similar to how modules must be handled) rather
> than invoking the preprocessor and getting header info from e.g. -M*,
> since Clang devs showed this could be much faster than the
> preprocessor[3].

To get accurate results one still has to do proper preprocessing (and
the Clang scan-deps folks do; they just take some shortcuts that are
possible when all you need is the dependency information).

> But since we want the preprocessed TU anyways, this is kind of a
> moot point.

Right, our compilation model is to simultaneously perform partial
preprocessing (-fdirectives-only for GCC, -frewrite-includes for
Clang), dependency extraction, and hashing, and then compile the
partially preprocessed TU, if necessary, potentially by shipping it to
a different host (distributed compilation). This has been shown[1] to
result in faster compilation compared to the traditional model.
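To illustrate, a minimal sketch of the two phases with GCC could look
along these lines (the file names are made up, the real invocations
involve quite a few more options, and the hashing step is not shown):

  # Phase 1: partially preprocess (headers are included but macros are
  # not expanded) while also extracting header dependencies.
  g++ -E -fdirectives-only -MD -MF hello.d -o hello.ii hello.cxx

  # Phase 2: compile the partially preprocessed TU, potentially on a
  # different host.
  g++ -fpreprocessed -fdirectives-only -c -o hello.o hello.ii

  # With Clang, -frewrite-includes plays the role of -fdirectives-only.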
This model will also become more important with C++20 modules where a
TU might have to be recompiled due to changes to things other than the
TU "text" itself (like the BMIs of modules that the TU imports), in
which case the same partially preprocessed TU can be reused.

There is, however, one significant drawback to this model: the
partially preprocessed TUs can be quite large and building bigger
projects requires GBs of disk space. We've actually been having this
issue on our CI machines and it will only get worse. So we need to
think of a possible solution.

> > - Reproducible builds (-ffile-prefix-map)
>
> I suppose this can already be done, but would the idea be to
> automatically add something like -ffile-prefix-map=$src_root= to the
> compiler args?

This is still fuzzy and requires more thinking/experimentation. The
biggest issue here (and also for compilation result caching) is debug
information. In a sense, we are trying to sit on two chairs here:

1. Have object code with debug info that has correct references to the
   source files on the user's machine (so that the user doesn't have
   to manually specify the location of files during debugging).

2. Have object code be "location-independent" so it can be cached and
   reused by multiple builds.

> > and separate debug info (-gsplit-dwarf)
>
> This is interesting, I wasn't aware of this option. It could
> significantly improve link times and complement both the recent
> -flto=auto work and the solution to [1]. I suppose you should already
> be able to use -gsplit-dwarf and things will more or less work during
> development and building.

Since this produces another file, we would at least need to recognize
it so that things are cleaned up properly, etc.

> I guess you would want to use dwp[4] or dwz[5] and link with
> --gdb-index (or maybe gdb-add-index can work with dwo or dwp files?)
> for installation/distribution, though. What Fedora does for its
> packages is, after the install phase, run find-debuginfo.sh[6], which
> searches for unstripped executable files, runs gdb-add-index and
> strips them, then compresses the output with dwz.

Another thing that would be interesting to consider/try is to produce
just the debug information (.dwo files) without the .o files and see
how long this takes compared to producing both. The idea is to perhaps
produce the debug information on the local host and/or on demand.

> > with the view towards distributed compilation and hashing.
>
> Would the idea here be that each build node might have a different
> environment as far as paths go? Or might it make sense to set up some
> sort of container like e.g. bubblewrap[7] for a hermetic, consistent
> build on each node?

Yes, that's the issue. But I don't think containers will offer a
solution here since the goal is to produce object files that are
reusable by multiple builds with varying environments, not to make
sure the environment does not vary.

> > - C++20 modules support.
>
> This mostly just requires work on the compiler end at this point,
> right?

And the build system integration.

[1] https://build2.org/article/preprocess-compile-performance.xhtml
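For concreteness, a rough sketch of the split debug info workflow
discussed above could look something like this (the file names are
made up and the exact tool options may differ between toolchain
versions):

  # Compile with split debug info: most of the DWARF data goes into
  # hello.dwo instead of hello.o.
  g++ -g -gsplit-dwarf -c -o hello.o hello.cxx

  # Link as usual (the executable only refers to the .dwo files).
  g++ -o hello hello.o

  # For installation/distribution: package the .dwo files referenced
  # by the executable into a single hello.dwp next to it...
  dwp -e hello

  # ...and add a .gdb_index section so that GDB loads it faster
  # (linking with --gdb-index in gold/lld can achieve a similar
  # effect).
  gdb-add-index hello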