# [build2] LTO and parallelization with build2

Matthew Krupcale mkrupcale at matthewkrupcale.com
Sun Aug 9 23:45:35 UTC 2020

On Sat, Aug 8, 2020 at 9:47 AM Boris Kolpackov <boris at codesynthesis.com> wrote:
>
> I suppose we could have added -flto-jobs=N if there is -flto=thin
> and no user-supplied -flto-jobs.

This is what I ended up doing[1].
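
For anyone following along, the rule above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the helper name append_thin_lto_jobs and the flat vector of arguments are assumptions made for the example, and it ignores the case where a later -flto=... overrides an earlier -flto=thin.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not build2's actual code): append -flto-jobs=<jobs>
// if -flto=thin is present and the user has not supplied -flto-jobs=.
//
inline void
append_thin_lto_jobs (std::vector<std::string>& args, std::size_t jobs)
{
  const std::string jobs_prefix ("-flto-jobs=");

  // Is there an exact -flto=thin argument?
  //
  bool thin (std::find (args.begin (), args.end (), "-flto=thin") !=
             args.end ());

  // Did the user already pass -flto-jobs=<something>?
  //
  bool user_jobs (std::any_of (args.begin (), args.end (),
                               [&jobs_prefix] (const std::string& a)
                               {
                                 return a.compare (0,
                                                   jobs_prefix.size (),
                                                   jobs_prefix) == 0;
                               }));

  if (thin && !user_jobs)
    args.push_back (jobs_prefix + std::to_string (jobs));
}
```

Called with {"clang++", "-flto=thin"} and jobs = 4, this appends "-flto-jobs=4"; if the arguments already contain a -flto-jobs= value, or do not contain -flto=thin, they are left untouched.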

> But maybe let's leave it for if/when
> someone needs it (I am not familiar with -flto=thin and whether it
> is used in practice).

I'm not sure how widely used it is either, and Fedora will currently
default to -flto[=full] for Clang[2]. Feel free to discard that
portion or play around with it.

> Maybe we could add iterator-based versions along these lines:
>
> template <typename I>
> I
> find_option_prefix (const char* prefix, I, I, bool = false);

I added this, as well as a find_option variant of this[3]. It's not
the same API as the other versions of find_option (i.e. returns I
rather than bool), but I wanted to find the whole option (not just the
prefix) "-flto=auto" (GCC) or "-flto=thin" (Clang).
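
To make the difference between the two lookups concrete, here is a minimal sketch of what iterator-returning versions could look like. These are stand-ins in a hypothetical namespace sketch, not the functions from the commit or from build2 itself. The sketch assumes the trailing bool means "ignore case" (handled for ASCII only), that the iterators refer to std::string, and that the first match is returned.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

namespace sketch // Hypothetical namespace; not part of build2.
{
  // Compare the first n characters of x and y, optionally ignoring
  // ASCII case.
  //
  inline bool
  equal_n (const char* x, const char* y, std::size_t n, bool ic)
  {
    for (std::size_t i (0); i != n; ++i)
    {
      unsigned char a (static_cast<unsigned char> (x[i]));
      unsigned char b (static_cast<unsigned char> (y[i]));

      if (ic)
      {
        a = static_cast<unsigned char> (std::tolower (a));
        b = static_cast<unsigned char> (std::tolower (b));
      }

      if (a != b)
        return false;
    }

    return true;
  }

  // Return an iterator to the first argument that starts with prefix,
  // or e if there is none.
  //
  template <typename I>
  I
  find_option_prefix (const char* prefix, I b, I e, bool ic = false)
  {
    std::size_t n (std::strlen (prefix));

    for (; b != e; ++b)
    {
      const std::string& s (*b);
      if (s.size () >= n && equal_n (s.c_str (), prefix, n, ic))
        break;
    }

    return b;
  }

  // Return an iterator to the first argument that equals option as a
  // whole, or e if there is none.
  //
  template <typename I>
  I
  find_option (const char* option, I b, I e, bool ic = false)
  {
    std::size_t n (std::strlen (option));

    for (; b != e; ++b)
    {
      const std::string& s (*b);
      if (s.size () == n && equal_n (s.c_str (), option, n, ic))
        break;
    }

    return b;
  }
}
```

With arguments {"-O2", "-flto=auto", "-flto-jobs=2"}, find_option ("-flto=auto", ...) points at the second element, find_option ("-flto", ...) returns the end iterator, and find_option_prefix ("-flto", ...) again points at the second element. Returning the iterator rather than bool lets the caller inspect or replace the matched argument in place.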

Feel free to modify as you see fit. Having played around with it a
little, I find it's not 100% reliable yet. Sometimes it would not
schedule enough threads. This is probably the issue 1 you mentioned,
which would require dynamic interaction between the scheduler and the
linker, as previously discussed. A little more concerning are the
following two issues.

Sometimes build2 thought there was a deadlock:

$ b build2-test-build/
ld build2-test-build/libbuild2/libs{build2}
info: deadlocks are normally caused by dependency cycles
info: re-run with -s to diagnose dependency cycles
terminate called without an active exception
Aborted (core dumped)

Sometimes it would schedule more threads/jobs than I expected. With n
hardware threads and m = n/2 specified as the max for the alloc_guard,
on my 4C4T machine (n = 4, m = 2) I would sometimes see a total of
N = \sum_i N_i = 6 to 9 LTO jobs (about 1.5n to 2.25n), with -flto=N_i
specified, where 1 <= N_i <= 3. With m = 0 (allocate as many as
available), I even saw as many as N = n^2 = 16, with 1 <= N_i <= 5.

So there may be a problem with my use of the scheduler or the
scheduler itself which requires some experimentation.
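
For reference, the invariant I would expect is that the sum of N_i over concurrently running link jobs never exceeds n. The sketch below shows one way such a shared budget could behave. It is not build2's scheduler or its alloc_guard: the thread_budget and budget_guard classes are hypothetical and only model the accounting, not thread startup or waiting.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>

// Hypothetical shared pool of "thread slots" (not build2's scheduler).
//
class thread_budget
{
public:
  explicit
  thread_budget (std::size_t total): free_ (total) {}

  // Take up to max slots (max == 0 means as many as are free). Return
  // the number actually taken, which may be 0.
  //
  std::size_t
  allocate (std::size_t max)
  {
    std::size_t f (free_.load ());

    for (;;)
    {
      std::size_t n (max == 0 ? f : std::min (max, f));

      if (n == 0)
        return 0;

      // On failure f is reloaded with the current value and we retry.
      //
      if (free_.compare_exchange_weak (f, f - n))
        return n;
    }
  }

  void
  release (std::size_t n) {free_.fetch_add (n);}

private:
  std::atomic<std::size_t> free_;
};

// RAII guard: takes slots on construction, returns them on destruction.
//
class budget_guard
{
public:
  budget_guard (thread_budget& b, std::size_t max)
      : b_ (b), n (b.allocate (max)) {}

  ~budget_guard () {if (n != 0) b_.release (n);}

  budget_guard (const budget_guard&) = delete;
  budget_guard& operator= (const budget_guard&) = delete;

private:
  thread_budget& b_;

public:
  const std::size_t n; // Number of slots granted (use as N_i).
};
```

With a budget of 4 and three guards each asking for at most 2, the grants come out as 2, 2 and 0, so the total stays at 4. Totals like the ones above would then suggest that either the allocations are not all drawn from one shared count or slots are being released before the LTO jobs that use them finish.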

Best,
Matthew

[1] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/commit/?h=lto-parallelization&id=2715223d9f39ab60de52cb6573d1b30b43d9138f
[2] https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/macros#_332