[build2] LTO and parallelization with build2

Wed Aug 5 12:21:32 UTC 2020

Matthew Krupcale <mkrupcale at matthewkrupcale.com> writes:

> Yeah, it seems like implementing the make jobserver would be rather
> complex.

One idea that came to mind: we could provide hooks in the scheduler
to allow a build system module to interface with something that
controls hardware thread allocation (we would also need some kind
of a command line option for module pre-load since such a module
wouldn't be loaded from buildfiles). One could then implement a module
that provides the jobserver functionality (either just the client or
the server, and either from scratch or by reusing GNU make). Just
putting the idea out there.
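
To make the hook idea a bit more concrete, here is a rough sketch of
what such an interface could look like. Everything in it is
hypothetical: thread_allocator, fixed_allocator, acquire(), and
release() are names made up for illustration and are not an existing
build2 API. The fixed-size pool merely stands in for whatever would
really control the allocation (for example, a jobserver client).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical hook interface (not an existing build2 API): the
// scheduler would call into this to obtain and return hardware threads.
//
class thread_allocator
{
public:
  virtual ~thread_allocator () = default;

  // Request up to n threads. Return the number granted (possibly 0).
  //
  virtual std::size_t acquire (std::size_t n) = 0;

  // Return n previously granted threads.
  //
  virtual void release (std::size_t n) = 0;
};

// Trivial in-process implementation backed by a fixed pool. A module
// providing jobserver functionality would implement the same interface
// but talk to an external server instead.
//
class fixed_allocator: public thread_allocator
{
public:
  explicit fixed_allocator (std::size_t total)
      : total_ (total), free_ (total) {}

  std::size_t acquire (std::size_t n) override
  {
    std::lock_guard<std::mutex> l (mutex_);
    std::size_t g (std::min (n, free_)); // Grant what is left, at most n.
    free_ -= g;
    return g;
  }

  void release (std::size_t n) override
  {
    std::lock_guard<std::mutex> l (mutex_);
    assert (n <= total_ - free_); // Cannot return more than was granted.
    free_ += n;
  }

  // Number of threads currently not granted to anyone.
  //
  std::size_t available () const
  {
    std::lock_guard<std::mutex> l (mutex_);
    return free_;
  }

private:
  std::size_t total_;
  mutable std::mutex mutex_;
  std::size_t free_;
};
```

The point of the non-blocking acquire() that may grant fewer threads
than requested is that the caller can start with what it gets and ask
again later, which is roughly the dynamic behavior a jobserver gives.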

> The LTO WHOPR mode[1] is enabled when -flto is passed[2] and an LTO
> partitioning algorithm is used[3]. The LGEN phase should be executed
> in parallel already by build2 since it invokes the compiler in
> parallel for the TUs. Then lto-wrapper forks and execs the two
> remaining stages, each executed with the specified parallelism:
> 
> [...]

Thanks for the overview!

> So provided that there are more partitions than allocated CPU threads
> (i.e. lto-partitions > n), both WPA and LTRANS stages of GCC LTO
> should utilize all n threads from the scheduler.

Does the number of partitions somehow correlate with the number of
TUs being linked? In a sense we have two problems:

1. We could supply too few hardware threads (e.g., because other
   threads are still being used by build2 at the start of linking
   but may become available during linking).

2. We could supply too many hardware threads that the linker cannot
   utilize but that build2 could have used for other tasks.

One interesting heuristic to guard against (2) would be to supply up
to half of the available hardware threads. With hyper-threaded CPUs
one would only waste at most 10-20% of performance in the worst-case
scenario.
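
As a minimal sketch of this heuristic (the function and parameter
names are hypothetical, and clamping to the number of threads the
linker could actually use, for example the LTO partition count, is an
added assumption rather than something established above):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of build2): decide how many hardware
// threads to hand to the linker for LTO.
//
// available: hardware threads the scheduler could give away right now
// usable:    upper bound on what the linker can use (assumption: e.g.,
//            the number of LTO partitions)
//
constexpr std::size_t
lto_thread_budget (std::size_t available, std::size_t usable)
{
  // At most half of what is available, no more than the linker can
  // use, but always at least one thread so that linking can proceed.
  //
  return std::max<std::size_t> (1, std::min (usable, available / 2));
}
```

With 16 available threads and 128 partitions this gives 8, which
could then be passed to GCC as -flto=8.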

> Hypothetically, the WPA stage might support the jobserver at some
> point, so both stages could support dynamic thread allocation from a
> build2 jobserver. But I think the above analysis at least indicates
> that the static thread approach should be capable of fully utilizing
> the threads assigned for linking. It would then be up to build2 to
> make sure it's keeping the other threads busy with other tasks while
> linking.

Yes, that's the problem: build2 may have nothing else to do and no
way to communicate to the linker that it can use more threads (e.g.,
during the second stage). But it's probably a good enough first
approximation.