[build2] LTO and parallelization with build2

Mon Aug 3 11:51:03 UTC 2020

Matthew Krupcale <mkrupcale at matthewkrupcale.com> writes:

> Fedora is looking to enable LTO by default in F33[1], and recently
> they decided to use -flto=auto for the default GCC LTO build flags[2].
> This means[3] that GCC will attempt to use the GNU make jobserver, if
> available, but otherwise fall back to using the number of CPU threads
> on the system. Since build2 has its own scheduler, though, this means
> there might be e.g. up to n^2 threads on an n CPU thread system during
> linking, which could negatively impact performance / waste CPU time.

The bigger issue is potential memory usage (I've seen translation units
that take over 1G to compile).

> So when building with -flto={auto,n} using GCC, build2 may either need
> to instead invoke the compiler/linker with -flto=m, where m is the
> number of available threads from the scheduler (I'm not sure the
> scheduler currently supports something like this), or just disable
> parallel linking with -flto[=1]. Alternatively, build2 could attempt
> to act as a GNU make jobserver itself somehow. This also might become
> relevant during compilation as GCC looks to parallelize
> compilation[4,5].
> 
> I wonder if you've given any thought on how best to handle this. As
> far as I can tell, ninja hasn't really addressed this either, but it
> looks like they're considering making ninja work as a GNU make
> jobserver client[6,7], and there's a recent PR to make ninja work with
> multiple simultaneous ninja processes, all sharing a job limit[8].

For -flto to work via jobserver, I believe build2 (and ninja) would
need to implement the server proper, not just the client. And having
been subscribed to make-alpha for the past decade, I can tell you
that the jobserver in GNU make has been a never-ending source of bugs,
corner cases, and compatibility issues (see this post[1] for a primer).
So I would like to avoid touching that can of worms if I can help it
(we could probably do the client if really necessary, but the server
is a whole different story).

Now on to how we could handle -flto in build2. It would actually
be quite easy for the link rule to allocate more than one hardware
thread (if available) in order to pass it on to the linker. There
is no such support in the scheduler now but it should be pretty
straightforward to add. With that in place, it is only a matter
of replacing -flto=auto or -flto=jobserver with -flto=N, where
N is the number of hardware threads allocated.
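
To make the rewriting step concrete, here is a minimal sketch of what
it could look like. The function name rewrite_lto() and its signature
are hypothetical (this is not existing build2 code), and it assumes the
options are already available as a list of strings:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of build2): replace -flto=auto and
// -flto=jobserver with -flto=N, where N is the number of hardware
// threads allocated to this link step. All other options, including a
// bare -flto and an explicit -flto=<number>, are passed through as is.
//
std::vector<std::string>
rewrite_lto (const std::vector<std::string>& args, std::size_t n)
{
  assert (n >= 1); // At least the link rule's own thread.

  std::vector<std::string> r;
  r.reserve (args.size ());

  for (const std::string& a: args)
  {
    if (a == "-flto=auto" || a == "-flto=jobserver")
      r.push_back ("-flto=" + std::to_string (n));
    else
      r.push_back (a);
  }

  return r;
}
```

For example, with 4 threads allocated, {"-O2", "-flto=auto"} would
become {"-O2", "-flto=4"}. Whether an explicit -flto=<number> from the
user should also be capped at N is left open in this sketch.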

There are two potential problems with this:

1. If GCC does not use all the allocated threads, then they will be
   wasted, which would be pretty bad.

   Do you know if GCC will always utilize all the threads given? It
   appears to be generating a Makefile that it then passes to make,
   so probably it depends on what's in that Makefile.

2. Theoretically, via the jobserver, the linker can utilize additional
   threads as they become available. In our case, the number of
   allocated threads would be fixed at the linker start time.
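
Both points can be seen in a rough sketch of the allocation itself.
The class below is purely illustrative (hw_thread_slots and its
members are made-up names, not the build2 scheduler interface): the
link rule would call allocate() once before starting the linker and
release() once after it exits, so the count can neither grow while
the linker runs nor be returned early if the linker leaves some of
the threads idle.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>

// Hypothetical pool of free hardware threads (an assumption made for
// illustration, not build2's scheduler API).
//
class hw_thread_slots
{
public:
  explicit
  hw_thread_slots (std::size_t total): free_ (total) {}

  // Take up to max slots without blocking. Return the number actually
  // granted, which may be 0 if none are free.
  //
  std::size_t
  allocate (std::size_t max)
  {
    std::lock_guard<std::mutex> l (m_);
    std::size_t n (std::min (max, free_));
    free_ -= n;
    return n;
  }

  // Return n previously allocated slots to the pool.
  //
  void
  release (std::size_t n)
  {
    std::lock_guard<std::mutex> l (m_);
    free_ += n;
  }

  // Number of slots currently free.
  //
  std::size_t
  available ()
  {
    std::lock_guard<std::mutex> l (m_);
    return free_;
  }

private:
  std::mutex m_;
  std::size_t free_;
};
```

With 8 slots in total, a first link step asking for up to 6 would get
6, and a second one starting at the same time would get only the
remaining 2, even if the first finishes a moment later.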

Thoughts?

[1] http://make.mad-scientist.net/papers/jobserver-implementation/