From mkrupcale at matthewkrupcale.com  Sat Aug  1 16:45:09 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Sat, 1 Aug 2020 12:45:09 -0400
Subject: [build2] LTO and parallelization with build2
Message-ID:

Hello,

Fedora is looking to enable LTO by default in F33[1], and recently they
decided to use -flto=auto for the default GCC LTO build flags[2]. This
means[3] that GCC will attempt to use the GNU make jobserver, if
available, but otherwise fall back to using the number of CPU threads
on the system. Since build2 has its own scheduler, though, this means
there might be e.g. up to n^2 threads on an n CPU thread system during
linking, which could negatively impact performance / waste CPU time.

So when building with -flto={auto,n} using GCC, build2 may either need
to instead invoke the compiler/linker with -flto=m, where m is the
number of available threads from the scheduler (I'm not sure the
scheduler currently supports something like this), or just disable
parallel linking with -flto[=1]. Alternatively, build2 could attempt to
act as a GNU make jobserver itself somehow. This also might become
relevant during compilation as GCC looks to parallelize
compilation[4,5].

I wonder if you've given any thought on how best to handle this. As far
as I can tell, ninja hasn't really addressed this either, but it looks
like they're considering making ninja work as a GNU make jobserver
client[6,7], and there's a recent PR to make ninja work with multiple
simultaneous ninja processes, all sharing a job limit[8].
Best,
Matthew

[1] https://fedoraproject.org/wiki/LTOByDefault
[2] https://src.fedoraproject.org/rpms/redhat-rpm-config/c/4637e1bd5512b869bd07c58b7545d2528a9bc4c8?branch=master
[3] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
[4] https://gcc.gnu.org/wiki/ParallelGcc
[5] https://www.phoronix.com/scan.php?page=news_item&px=GCC-2020-More-Parallel-Large-S
[6] https://github.com/ninja-build/ninja/issues/1139#issuecomment-520546470
[7] https://github.com/ninja-build/ninja/pull/1140
[8] https://github.com/ninja-build/ninja/pull/1815

From bsutherland at soundhound.com  Sun Aug  2 23:53:20 2020
From: bsutherland at soundhound.com (Bruce Sutherland)
Date: Mon, 3 Aug 2020 08:53:20 +0900
Subject: [build2] Generating source code in build2 projects
In-Reply-To:
References:
Message-ID:

Thank you Boris, that's very helpful. I'll start out with ad hoc
recipes, and will reach out again if we run into trouble.

On Fri, Jul 31, 2020 at 1:42 PM Boris Kolpackov wrote:
>
> Bruce Sutherland writes:
>
> > So I assume we need to configure blubc as a tool in build2, and define
> > dependencies from .blub to .cpp and .h files.
> >
> > How should we do that with build2? The documentation mentions
> > generated source code, but doesn't go into much detail. Can anyone
> > point me to an example of a project with a similar configuration?
>
> Currently there are two ways to do this: you can provide a rule for
> compiling any .blub to .h/.cpp or you can provide an ad hoc recipe
> for compiling a specific foo.blub to foo.h/foo.cpp.
>
> To provide a rule, currently, you will have to write a build system
> module that implements it (see, for example, libbuild2-rust[1]). In
> the next release we are also planning to add support for ad hoc rules
> that can be written directly in buildfiles (similar to ad hoc recipes
> below).
> An ad hoc recipe you can write directly in the buildfile and I would
> suggest that you start with that since writing a rule/module is fairly
> involved. The release notes for 0.13.0[2] have a number of examples
> that should get you started (in particular, see the xxd example).
> There is also a NASM example[3] I posted a few days ago that shows
> how to define a custom target type (e.g., for .blub).
>
> Note also that this is still an area of active development so there
> might be rough edges, missing pieces, etc. So we are quite interested
> in usage feedback and also don't mind doing a bit of hand-holding. So
> let me know how it goes or if there are any issues. In particular, if
> you have a lot of .blub, then writing ad hoc recipes could get tedious
> and that's where support for ad hoc rules could be helpful.
>
> [1] https://github.com/build2/libbuild2-rust
> [2] https://build2.org/release/0.13.0.xhtml#adhoc-recipe
> [3] https://lists.build2.org/archives/users/2020-July/000823.html

From boris at codesynthesis.com  Mon Aug  3 11:51:03 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon, 3 Aug 2020 13:51:03 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> Fedora is looking to enable LTO by default in F33[1], and recently
> they decided to use -flto=auto for the default GCC LTO build flags[2].
> This means[3] that GCC will attempt to use the GNU make jobserver, if
> available, but otherwise fall back to using the number of CPU threads
> on the system. Since build2 has its own scheduler, though, this means
> there might be e.g. up to n^2 threads on an n CPU thread system during
> linking, which could negatively impact performance / waste CPU time.

The bigger issue is potential memory usage (I've seen translation units
that take over 1G to compile).
> So when building with -flto={auto,n} using GCC, build2 may either need
> to instead invoke the compiler/linker with -flto=m, where m is the
> number of available threads from the scheduler (I'm not sure the
> scheduler currently supports something like this), or just disable
> parallel linking with -flto[=1]. Alternatively, build2 could attempt
> to act as a GNU make jobserver itself somehow. This also might become
> relevant during compilation as GCC looks to parallelize
> compilation[4,5].
>
> I wonder if you've given any thought on how best to handle this. As
> far as I can tell, ninja hasn't really addressed this either, but it
> looks like they're considering making ninja work as a GNU make
> jobserver client[6,7], and there's a recent PR to make ninja work with
> multiple simultaneous ninja processes, all sharing a job limit[8].

For -flto to work via jobserver, I believe build2 (and ninja) would need
to implement the server proper, not just the client. And having been
subscribed to make-alpha for the past decade, I can tell you that the
jobserver in GNU make has been a never-ending source of bugs, corner
cases, and compatibility issues (see this post[1] for a primer). So I
would like to avoid touching that can of worms if I can help it (we
could probably do the client if really necessary, but the server is a
whole different story).

Now on to how we could handle -flto in build2. It would actually be
quite easy for the link rule to allocate more than one hardware thread
(if available) in order to pass it on to the linker. There is no such
support in the scheduler now but it should be pretty straightforward to
add. With this idea then it's only a matter of rewriting -flto=auto or
-flto=jobserver with -flto=N where N is the number of hardware threads
allocated.

There are two potential problems with this:

1. If GCC does not use all the allocated threads, then they will be
   wasted, which would be pretty bad.
   Do you know if GCC will always utilize all the threads given? It
   appears to be generating a Makefile that it then passes to make, so
   probably it depends on what's in that Makefile.

2. Theoretically, via the jobserver, the linker can utilize additional
   threads as they become available. In our case, the number of
   allocated threads would be fixed at the linker start time.

Thoughts?

[1] http://make.mad-scientist.net/papers/jobserver-implementation/

From mkrupcale at matthewkrupcale.com  Mon Aug  3 14:56:38 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Mon, 3 Aug 2020 10:56:38 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Mon, Aug 3, 2020 at 7:51 AM Boris Kolpackov wrote:
>
> The bigger issue is potential memory usage (I've seen translation
> units that take over 1G to compile).

Yeah, very good point. This will especially be a problem on ARM.

> For -flto to work via jobserver, I believe build2 (and ninja) would
> need to implement the server proper, not just the client.

Yes, I believe that's right. The ninja devs in issue 1139 I think were
primarily trying to address a different issue, which is what happens if
ninja is invoked by an outer make, or somehow make is the primary
scheduler.

> And having been subscribed to make-alpha for the past decade, I can
> tell you that the jobserver in GNU make has been a never-ending source
> of bugs, corner cases, and compatibility issues (see this post[1] for
> a primer). So I would like to avoid touching that can of worms if I
> can help it

Yeah, it seems like implementing the make jobserver would be rather
complex.

> Now on to how we could handle -flto in build2. It would actually
> be quite easy for the link rule to allocate more than one hardware
> thread (if available) in order to pass it on to the linker. There
> is no such support in the scheduler now but it should be pretty
> straightforward to add.
> With this idea then it's only a matter
> of rewriting -flto=auto or -flto=jobserver with -flto=N where
> N is the number of hardware threads allocated.

Yeah, that's kind of what I had in mind.

> There are two potential problems with this:
>
> 1. If GCC does not use all the allocated threads, then they will be
>    wasted, which would be pretty bad.
>
>    Do you know if GCC will always utilize all the threads given? It
>    appears to be generating a Makefile that it then passes to make,
>    so probably it depends on what's in that Makefile.

The LTO WHOPR mode[1] is enabled when -flto is passed[2] and an LTO
partitioning algorithm is used[3]. The LGEN phase should be executed in
parallel already by build2 since it invokes the compiler in parallel for
the TUs. Then lto-wrapper forks and execs the two remaining stages, each
executed with the specified parallelism:

1. The WPA stage is partitioned[4], with the output of each partition
   done by separate forks of the LTO process[5]. Currently, the WPA
   stage doesn't support the jobserver mode[6]. The default partitioning
   algorithm is balanced[7], with the number of partitions controlled by
   the lto-partitions parameter[7] (default 32[8]). It looks like
   lto-partitions should exceed the number of CPUs used for compilation.

2. The LTRANS stage also operates on each partition independently.
   Whether or not the jobserver is used, the parallel LTO mode generates
   a temporary Makefile[9], but without the jobserver, make will be
   invoked without --jobserver-fd args[10] and with a statically
   determined number of make jobs[11].

So provided that there are more partitions than allocated CPU threads
(i.e. lto-partitions > n), both the WPA and LTRANS stages of GCC LTO
should utilize all n threads from the scheduler.

> 2. Theoretically, via the jobserver, the linker can utilize additional
>    threads as they become available. In our case, the number of
>    allocated threads would be fixed at the linker start time.
Yeah, I had thought about this issue with the static thread number
approach. Without the build2 jobserver, both WPA and LTRANS stages are
limited to n threads. Only the LTRANS stage currently supports the use
of the jobserver, though. Thus, even if build2 implements a make
jobserver, the WPA stage will currently be limited to at most n threads
from the build2 scheduler when spawning the linker.

Hypothetically, the WPA stage might support the jobserver at some point,
so both stages could support dynamic thread allocation from a build2
jobserver. But I think the above analysis at least indicates that the
static thread approach should be capable of fully utilizing the threads
assigned for linking. It would then be up to build2 to make sure it's
keeping the other threads busy with other tasks while linking.

[1] https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
[2] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1521
[3] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1564
[4] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1747
[5] https://github.com/gcc-mirror/gcc/blob/d1961e648e0fedebd06e4ad786c1bfc536312ef7/gcc/lto/lto.c#L2398
[6] https://github.com/gcc-mirror/gcc/blob/d1961e648e0fedebd06e4ad786c1bfc536312ef7/gcc/lto/lto.c#L3154
[7] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[8] https://github.com/gcc-mirror/gcc/blob/51e85e64e125803502fde94b9e22037c0ccaa8b2/gcc/params.def#L1097
[9] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1877
[10] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1949
[11] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1968

From boris at codesynthesis.com  Wed Aug  5 12:21:32 2020
From: boris at codesynthesis.com (Boris
Kolpackov)
Date: Wed, 5 Aug 2020 14:21:32 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> Yeah, it seems like implementing the make jobserver would be rather
> complex.

One idea came into my head: we could provide hooks in the scheduler to
allow a build system module to interface with something that controls
hardware thread allocation (we would also need some kind of a command
line option for module pre-load since such a module wouldn't be loaded
from buildfiles). One could then implement a module that provides the
jobserver functionality, either just the client or the server, and
either from scratch or by reusing GNU make. Just putting the idea out
there.

> The LTO WHOPR mode[1] is enabled when -flto is passed[2] and an LTO
> partitioning algorithm is used[3]. The LGEN phase should be executed
> in parallel already by build2 since it invokes the compiler in
> parallel for the TUs. Then lto-wrapper forks and execs the two
> remaining stages, each executed with the specified parallelism:
>
> [...]

Thanks for the overview!

> So provided that there are more partitions than allocated CPU threads
> (i.e. lto-partitions > n), both WPA and LTRANS stages of GCC LTO
> should utilize all n threads from the scheduler.

Does the number of partitions somehow correlate with the number of TUs
being linked?

In a sense we have two problems:

1. We could supply too few hardware threads (e.g., because other
   threads are still being used by build2 at the start of linking but
   may become available during linking).

2. We could supply too many hardware threads that the linker cannot
   utilize but that build2 could have used for other tasks.

One interesting heuristic against (2) would be to supply up to half of
the available hardware threads. With hyper-threaded CPUs one would only
waste at most 10-20% of performance in the worst case scenario.
> Hypothetically, the WPA stage might support the jobserver at some
> point, so both stages could support dynamic thread allocation from a
> build2 jobserver. But I think the above analysis at least indicates
> that the static thread approach should be capable of fully utilizing
> the threads assigned for linking. It would then be up to build2 to
> make sure it's keeping the other threads busy with other tasks while
> linking.

Yes, that's the problem: build2 may have nothing else to do and no way
to communicate to the linker that it can use more threads (e.g., during
the second stage). But it's probably a good enough first approximation.

From mkrupcale at matthewkrupcale.com  Wed Aug  5 16:24:13 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Wed, 5 Aug 2020 12:24:13 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Wed, Aug 5, 2020 at 8:21 AM Boris Kolpackov wrote:
>
> One idea came into my head: we could provide hooks in the scheduler
> to allow a build system module to interface with something that
> controls hardware thread allocation (we would also need some kind
> of a command line option for module pre-load since such a module
> wouldn't be loaded from buildfiles). One could then implement a module
> that provides the jobserver functionality either just the client or
> the server and either from scratch or by reusing GNU make. Just
> putting the idea out there.

That's an interesting idea indeed. It would require implementing the
jobserver somehow as you said (either custom or reusing GNU make), but
at least it would be somewhat isolated from the existing scheduler then.

> Does the number of partitions somehow correlate with the number of
> TUs being linked?

I think for -flto-partition=balanced (the default) the number of
partitions is always statically determined by --param lto-partitions=m,
regardless of the number of TUs (by default, m=32).
For -flto-partition=1to1, m might be equivalent to the number of TUs.

> 1. We could supply too few hardware threads (e.g., because other
>    threads are still being used by build2 at the start of linking
>    but may become available during linking).

Yeah, this is the problem that would require integrating linker stages
2-3 with the build2 jobserver somehow.

> 2. We could supply too many hardware threads that the linker cannot
>    utilize but that build2 could have used for other tasks.

Based on my reading of how the partitioning works, as long as
lto-partitions=m > -flto=n, I think it should be possible for the linker
to fully utilize all given threads. This might require some
experimentation and verification, though.

> But it's probably a good enough first approximation.

Yeah, that's kind of what I was thinking. With the current understanding
of the partitioning, I was also wondering if -flto=auto might be as
likely to OOM as we previously thought (i.e. with current build2 without
changes). build2 will already spawn as many linker threads (each
currently spawning 1 process when CFLAGS has -flto or -flto=1) as the
scheduler has available (call it n). Let's say those n threads are
linking TUs with cumulative size N_i bytes (or some other metric of code
complexity, like nodes in the tree representation), 0

References:
Message-ID:

Matthew Krupcale writes:
> This of course has some assumptions about the cumulative memory of the
> m partitioned threads being the same as 1 thread doing the full link,
> but maybe it's not far off.

Could be. There is at least some fixed memory cost for each process
spawned.

In any case, I've added[1] the scheduler API to allocate extra active
threads. What's left is to translate -flto=auto. Would you like to take
a stab at it? I could give you some pointers if you are interested.

(BTW, there is now an etc/bootstrap[2] script for setting up the build2
development environment. It might still have some rough edges but we
find it pretty usable.)

[1] https://github.com/build2/build2/commit/600da2b97e
[2] https://git.build2.org/cgit/etc/tree/bootstrap

From mkrupcale at matthewkrupcale.com  Fri Aug  7 13:22:58 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Fri, 7 Aug 2020 09:22:58 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Fri, Aug 7, 2020 at 4:46 AM Boris Kolpackov wrote:
>
> Matthew Krupcale writes:
>
> > [...] (by default, m=32) [...]
>
> CPUs with more than 32 hardware threads are not that uncommon these
> days.

For some reason when I was browsing the GitHub GCC sources, it led me to
look at an older revision of gcc/params.def. The lto-partitions default
was bumped to 128[1], and lto-partitions is now set in
gcc/params.opt[2].

In any case, I think that in addition to translating -flto=auto or
-flto=n (with n>N) to -flto=N, where N is the number of available
hardware threads, we may have to also set --param lto-partitions=m,
where m>N. I'm not entirely sure what the right value for m is compared
to the number of threads. m=N+1? m=2*N? For reference, here is the
original discussion on WHOPR partitioning[3].

> In any case, I've added[1] the scheduler API to allocate extra active
> threads.

Oh that was fast.

> What's left is to translate -flto=auto. Would you like to take
> a stab at it? I could give you some pointers if you are interested.

Yeah, I might give it a try this weekend, and pointers are always
appreciated.

[1] https://github.com/gcc-mirror/gcc/commit/448af20a27c9a1706712eba8500f5f81f5f6a46d
[2] https://github.com/gcc-mirror/gcc/blob/c3f94f5786a014515c09c7852db228c74adf51e5/gcc/params.opt#L365
[3] http://patchwork.ozlabs.org/comment/152396/

From boris at codesynthesis.com  Fri Aug  7 14:28:36 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Fri, 7 Aug 2020 16:28:36 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> For some reason when I was browsing the GitHub GCC sources, it led me
> to look at an older revision of gcc/params.def. lto-partitions default
> was bumped to 128[1] [...]

Ok, that's a much more reasonable default.

> In any case, I think that in addition to translating -flto=auto or
> -flto=n (with n>N) to -flto=N, where N is the number of available
> hardware threads, we may have to also set --param lto-partitions=m,
> where m>N. I'm not entirely sure what the right value for m is
> compared to the number of threads. m=N+1? m=2*N?

I am not sure we want to mess with lto-partitions; it feels like it can
get GCC-version-dependent. Also, I would be wary of rewriting -flto=n;
we generally assume the user knows what they are doing (and if not, they
will appreciate an opportunity to learn ;-)).

> > What's left is to translate -flto=auto. Would you like to take
> > a stab at it? I could give you some pointers if you are interested.
>
> Yeah, I might give it a try this weekend, and pointers are always
> appreciated.

The place to do this seems to be libbuild2/cc/link-rule.cxx:2993, just
before printing the command line (we generally want to allocate extra
threads as late as possible and release them as early as possible; the
latter would be after run_finish() on line 3164).

We will also only want to do this if:

1. It's not a static library.

2.
   It's GCC of sufficient version. (I wonder what's the story with
   Clang here?)

You can find plenty of such tests earlier in the function.

As for finding the option itself, we have the find_option_prefix()
utility function that does almost what we need (we need the position,
not the pointer to the option value). Maybe we can add
find_option_prefix_position() or some such?

From mkrupcale at matthewkrupcale.com  Sat Aug  8 01:14:02 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Fri, 7 Aug 2020 21:14:02 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Fri, Aug 7, 2020 at 10:28 AM Boris Kolpackov wrote:
>
> I am not sure we want to mess with lto-partitions, feels like it can
> get GCC-version-dependent.

Yeah, that may make things unnecessarily complicated.

> Also, I would be wary of rewriting -flto=n;
> we generally assume the user knows what they are doing (and if not,
> they will appreciate an opportunity to learn ;-)).

Okay, that should simplify the logic slightly as well.

> 2. It's GCC of sufficient version. (I wonder what's the story with
>    Clang here?)

GCC 10 should be the first to support -flto=auto[1]. I also looked into
the Clang situation, and it looks like it only allows control of the
number of threads/jobs when -flto=thin is used[2-4]. This is controlled
with -flto-jobs=N since version 4, I think[5].

> As for finding the option itself, we have the find_option_prefix()
> utility function that does almost what we need (we need the
> position, not the pointer to the option value). Maybe we can add
> find_option_prefix_position() or some such?

I wonder if instead of adding the _position variations, it might make
sense to change the API of the existing functions to work more like
e.g. std::find{,_if}. This would allow you to use the same function for
either examining the value found (i.e. *it) or working with the position
(i.e. it).
Although, I don't know if you could adapt the functions which take a
const lookup& to this API.

[1] https://gcc.gnu.org/gcc-10/changes.html
[2] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-flto-jobs
[3] https://clang.llvm.org/docs/ThinLTO.html
[4] http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html
[5] https://github.com/llvm/llvm-project/commit/12286d22b7964fa69e44c9d8ca36cc85d4cf5225

From boris at codesynthesis.com  Sat Aug  8 13:47:15 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Sat, 8 Aug 2020 15:47:15 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> I also looked into the Clang situation, and it looks like it only
> allows control of the number of threads/jobs when -flto=thin is
> used[2-4]. This is controlled with -flto-jobs=N since version 4 I
> think[5].

I suppose we could have added -flto-jobs=N if there is -flto=thin and
no user-supplied -flto-jobs. But maybe let's leave it for if/when
someone needs it (I am not familiar with -flto=thin and whether it is
used in practice).

> I wonder if instead of adding the _position variations, it might make
> sense to change the API of the existing functions to work more like
> e.g. std::find{,_if}. This would allow you to use the same function
> for either examining the value found (i.e. *it) or working with the
> position (i.e. it). Although, I don't know if you could adapt the
> functions which take a const lookup& to this API.

No, lookup versions won't fit. Also, IMO, std::find* API's usability is
awful. It may be general and composable, but every time I write
something along these lines I cringe:

  auto i (find_if (v.begin (), v.end (), ...));

  if (i != v.end ())
    ...

Compare:

  if (const string* o = find_option_prefix (..., v))
    ...
Maybe we could add iterator-based versions along these lines:

  template <typename I>
  I
  find_option_prefix (const char* prefix, I, I, bool = false);

From mkrupcale at matthewkrupcale.com  Sun Aug  9 23:45:35 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Sun, 9 Aug 2020 19:45:35 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Sat, Aug 8, 2020 at 9:47 AM Boris Kolpackov wrote:
>
> I suppose we could have added -flto-jobs=N if there is -flto=thin
> and no user-supplied -flto-jobs.

This is what I ended up doing[1].

> But maybe let's leave it for if/when
> someone needs it (I am not familiar with -flto=thin and whether it
> is used in practice).

I'm not sure how widely used it is either, and Fedora will currently
default to -flto[=full] for Clang[2]. Feel free to discard that portion
or play around with it.

> Maybe we could add iterator-based versions along these lines:
>
>   template <typename I>
>   I
>   find_option_prefix (const char* prefix, I, I, bool = false);

I added this, as well as a find_option variant of it[3]. It's not the
same API as the other versions of find_option (i.e., it returns I
rather than bool), but I wanted to find the whole option (not just the
prefix) "-flto=auto" (GCC) or "-flto=thin" (Clang). Feel free to modify
as you see fit.

Having played around with it a little, it's not 100% reliable yet.
Sometimes it would not schedule enough threads. But this is probably
issue 1. you mentioned, which is something that would require a dynamic
scheduler interaction with the linker as previously discussed.

A little more concerning are the following two issues.
Sometimes build2 thought there was a deadlock:

$ b build2-test-build/
ld build2-test-build/libbuild2/libs{build2}
error: deadlock suspected, aborting
info: deadlocks are normally caused by dependency cycles
info: re-run with -s to diagnose dependency cycles
terminate called without an active exception
Aborted (core dumped)

Sometimes it would schedule more threads/jobs than I expected. With n
hardware threads, and m = n/2 specified as the max for the alloc_guard,
on my 4C4T machine I would sometimes see a total of N = \sum_i {N_i} =
2n to 3n = 6 to 9 LTO jobs, with -flto=N_i specified, where 1<=N_i<=3.
If m=0 (allocate as many as available), I even saw as many as N = n^2 =
16 with 1<=N_i<=5. So there may be a problem with my use of the
scheduler or the scheduler itself which requires some experimentation.

Best,
Matthew

[1] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/commit/?h=lto-parallelization&id=2715223d9f39ab60de52cb6573d1b30b43d9138f
[2] https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/macros#_332
[3] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/commit/?h=lto-parallelization&id=28537a37ea6b85eeca0a8b3b2532e9aefad1e6ee

From boris at codesynthesis.com  Mon Aug 10 14:48:19 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon, 10 Aug 2020 16:48:19 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> I'm not sure how widely used it is either, and Fedora will currently
> default to -flto[=full] for Clang[2]. Feel free to discard that
> portion or play around with it.

I read a bit on it and it seems thin LTO is actually considered the
newer/better approach compared to full. So let's keep it. I wonder why
Fedora doesn't default to that.

> Sometimes it would not schedule enough threads. But this is probably
> issue 1. you mentioned [...].
If a project has a single final link stage (e.g., an executable), then
the linker should be given all the available threads. It would be good
to confirm at least this is the case.

> A little more concerning are the following two issues.

I think these are due to a silly bug (one of those "made sure
everything is correct except the most trivial part") in my
implementation of allocate()/deallocate() which is now fixed in master.
Can you give it a try and see if you get a more sensible behavior?

From mkrupcale at matthewkrupcale.com  Tue Aug 11 03:18:09 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Mon, 10 Aug 2020 23:18:09 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Mon, Aug 10, 2020 at 10:48 AM Boris Kolpackov wrote:
>
> I wonder why Fedora doesn't default to that.

Probably because both the LTO[1] and Clang toolchain[2] changes are new
for F33. Both have obviously been supported in the distribution, but
these changes are (potentially) distribution-wide changes.

> If a project has a single final link stage (e.g., an executable), then
> the linker should be given all the available threads. It would be good
> to confirm at least this is the case.

After applying your scheduler fixes and setting m=0 in the alloc_guard,
this appears to work correctly.

> I think these are due to a silly bug (one of those "made sure
> everything is correct except the most trivial part") in my
> implementation of allocate()/deallocate() which is now fixed in
> master.

When I saw the original implementation, I was wondering about those
particular lines, but I thought maybe my understanding of the scheduler
was wrong (i.e. perhaps active_ was referring to internally [to the
scheduler] active threads or something).

> Can you give it a try and see if you get a more sensible behavior?

Yes, I've pushed changes to the same branch rebased on top of your
scheduler fix, and things are working much better.
Here are some timings (3 trials) on my 4C4T system for linking the executables/libraries in build2 (this is only the linking, compilation is already done):

$ b /tmp/build2-test-build/
$ for i in $(seq 3); do
>   find /tmp/build2-test-build/ -type f -executable -delete
>   time b /tmp/build2-test-build/
> done

master -flto=auto
real  6m33.626s  6m27.739s  6m29.363s | 6m30.243s
user 22m55.659s 22m29.588s 22m56.146s | 22m47.131s
sys   1m11.267s  1m10.142s  1m11.252s |  1m10.887s

master -flto=1
real  5m56.454s  5m53.458s  5m54.255s | 5m54.722s
user 19m22.788s 19m20.051s 19m22.424s | 19m21.754s
sys   1m2.472s   1m2.406s   1m3.169s  |  1m2.682s

lto-parallelization -flto=auto
real  5m53.139s  5m46.499s  5m49.596s | 5m49.745s
user 19m16.702s 19m15.868s 19m15.820s | 19m16.130s
sys   1m1.647s   1m1.269s   1m1.649s  |  1m1.522s

(The last column is the mean of the three trials.)

So the lto-parallelization branch with -flto=auto shows a reduction in real/user time relative to both master with -flto=auto (-10%/-15%) and master with -flto=1 (-1.4%/-0.5%). The difference will probably be greater on machines with more cores and with fewer objects that can be linked in parallel.

Best,
Matthew

[1] https://fedoraproject.org/wiki/LTOByDefault
[2] https://fedoraproject.org/wiki/Changes/CompilerPolicy

From boris at codesynthesis.com Tue Aug 11 12:53:21 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 11 Aug 2020 14:53:21 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> Yes, I've pushed changes to the same branch rebased on top of your
> scheduler fix, and things are working much better.

Thanks, I've reviewed the changes and pushed my tweaks:

https://git.build2.org/cgit/build2/log/?h=lto-parallelization

It's mostly cosmetic though there were a few issues (especially in the Clang part). Could you review them and if happy, I will merge everything to master.

Note also that I moved the option translation after the low-verbosity diagnostics.
The thinking being that in the -v output it is more useful to see -flto=auto rather than some fixed number (e.g., in case the user wants to re-run the command manually).

From mkrupcale at matthewkrupcale.com Tue Aug 11 13:16:30 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Tue, 11 Aug 2020 09:16:30 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Tue, Aug 11, 2020 at 8:53 AM Boris Kolpackov wrote:
>
> Thanks, I've reviewed the changes and pushed my tweaks:
>
> https://git.build2.org/cgit/build2/log/?h=lto-parallelization
>
> It's mostly cosmetic though there were a few issues (especially
> in the Clang part). Could you review them and if happy, I will
> merge everything to master.

Yep, everything looks good.

> Note also that I moved the option translation after the low-verbosity
> diagnostics. The thinking being that in the -v output it is more
> useful to see -flto=auto rather than some fixed number (e.g., in case
> the user wants to re-run the command manually).

Yeah, makes sense.

From boris at codesynthesis.com Wed Aug 12 12:14:32 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed, 12 Aug 2020 14:14:32 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Merged to master. Your first contribution to build2, congrats & thanks! I wonder what's next... ;-)

From mkrupcale at matthewkrupcale.com Wed Aug 12 14:02:59 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Wed, 12 Aug 2020 10:02:59 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Wed, Aug 12, 2020 at 8:14 AM Boris Kolpackov wrote:
>
> Merged to master. Your first contribution to build2, congrats & thanks!

Awesome--fortunately, it was fairly simple with your advice :). Did you get a chance to test on a larger machine?

> I wonder what's next... ;-)

Good question lol. I do have a few ideas:

1.
It might make sense to implement the find_option{,_prefix} functions taking {c,}strings in terms of the new iterator variants and the compare_option{,_prefix} functions you wrote.

2. Investigate the use of the BLAKE3 hash for file checksums. BLAKE3[1] is significantly (5-10 times) faster than SHA-1 and SHA-2 and is highly parallelizable since it uses Merkle trees internally. This could utilize the new scheduler thread allocator, but even without parallelization, it's much faster. For small inputs, this may not matter much, but for many large TUs or object file checksums, this might be noticeable, especially if solution 1 of [2] were implemented.

3. Write a Fortran language build system module. This would likely need a lot of machinery similar to the cc module as well as some of the cxx module dependency scanning logic to handle Fortran modules. Fortran compilers, though, don't have a protocol for communication between the build system and the compiler like the one C++ modules use for module name-to-file mapping. Instead, compiled Fortran module interface files are named according to the (lowercase) module name and searched for in the -I, -J, and current directories (at least that's what gfortran seems to do). So we just need to find the module source file and compile it before anything "use"s it, and gfortran should find it. gfortran can use the C preprocessor (in traditional mode), but it's not invoked by default unless the file extension is .fpp or matches .F*, and files can be textually included using either "include" statements or "#include" directives (when cpp is invoked).
[1] https://github.com/BLAKE3-team/BLAKE3
[2] https://github.com/build2/build2/issues/87

From boris at codesynthesis.com Thu Aug 13 16:34:23 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Thu, 13 Aug 2020 18:34:23 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID: <20200813163423.GA13783@codesynthesis.com>

Matthew Krupcale writes:

> Did you get a chance to test on a larger machine?

I only smoke-tested it on my 6C/12T development machine. At least the build2 side seems to do the right thing (i.e., I got -flto=12 for an executable project).

> 1. It might make sense to implement the find_option{,_prefix}
> functions taking {c,}strings in terms of the new iterator variants and
> the compare_option{,_prefix} functions you wrote.

Yes, I also thought we could clean that up.

> 2. Investigate the use of BLAKE3 hash for file checksums. BLAKE3[1] is
> significantly faster than SHA-1 and SHA-2 (5-10 times) and is highly
> parallelizable since it uses Merkle trees internally. This could
> utilize the new scheduler thread allocator, but even without
> parallelization, it's much faster. For small inputs, this may not
> matter much, but for many, large TUs or object file checksums, this
> might be noticeable, especially if solution 1 of [2] were implemented.

The largest amount of data that we currently hash is the preprocessed TUs during C/C++ compilation. In fact, what we actually hash are the preprocessor tokens that are returned by the lexer in order to calculate a checksum that omits ignorable changes. Which means it's not going to be easily parallelizable. Also, the build2 scheduler is geared towards more substantial tasks and my feeling is that any win from parallel hashing will be offset by the scheduling overhead (locking, starting threads, etc.).

> 3. Write a Fortran language build system module.
> This would likely
> need a lot of similar machinery in the cc module as well as some of
> the cxx module dependency scanning logic to handle Fortran modules.
> Fortran compilers, though, don't have a protocol for communication
> between the build system and the compiler like C++ modules have for
> module name-to-file mapping. Instead, compiled Fortran module interface
> files are named according to the (lowercase) module name and searched
> for in the -I, -J, and current directories (at least that's what
> gfortran seems to do). So we just need to find the module source file
> and compile it before anything "use"s it, and gfortran should find it.
> gfortran can use the C preprocessor (in traditional mode), but it's not
> invoked by default unless the file extension is .fpp or matches .F*,
> and files can be textually included using either "include" statements
> or "#include" directives (when cpp is invoked).

Sounds interesting, though I personally have never used Fortran, so you will have to be the expert on the compilation model, etc. I did hear Fortran modules being used as an example of how not to do modules ;-).

I am also planning to generalize/factor some of the make dependency parsing and handling logic out of the cc module so that it can be reused by other modules (quite a few tools these days can produce make-style dependency information).

A couple more areas that may pique your interest:

- Reproducible builds (-ffile-prefix-map) and separate debug info (-gsplit-dwarf) with a view towards distributed compilation and hashing.

- Assembler/linker support in the bin module.

- C++20 modules support.
From mkrupcale at matthewkrupcale.com Sat Aug 15 01:54:51 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Fri, 14 Aug 2020 21:54:51 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To: <20200813163423.GA13783@codesynthesis.com>
References: <20200813163423.GA13783@codesynthesis.com>
Message-ID:

On Thu, Aug 13, 2020 at 12:34 PM Boris Kolpackov wrote:
>
> The largest amount of data that we currently hash is the preprocessed
> TUs during C/C++ compilation.

Yeah, the TUs were where I suspected there might currently be some benefit.

> In fact, what we actually hash are the
> preprocessor tokens that are returned by the lexer in order to calculate
> the checksum that omits ignorable changes.

Yes, after looking at libbuild2/cc/{compile-rule,parser,lexer}.cxx I can see this now. I suspect this strategy is optimal when most changes are ignorable with respect to the compiler output (including debug info). One assumes the cumulative incremental token checksum updates take much less time than compiling, which should be true provided the parser/lexer and hasher are fast.

If changes are not ignorable with respect to the compiler, we will obviously have to recompile, so we may have spent some additional time doing the parsing/lexing and incremental hashing compared to just going ahead and recompiling. If we again assume the parser/lexer and hasher are fast compared to compiling, this is probably a negligible contribution. In either case, we want the parser/lexer and hasher to be fast, and BLAKE3 is likely faster to update than SHA-256.

> Which means it's not going
> to be easily parallelizable. Also, the build2 scheduler is geared
> towards more substantial tasks and my feeling is that any win from
> parallel hashing will be offset by the scheduling overhead (locking,
> starting threads, etc.).

Yeah, this probably means that the current TU hashing scheme is not suitable for threaded parallelism.
Furthermore, this is probably the only reasonable TU hashing scheme, since hashing the full TU is kind of pointless unless you're trying to detect (only) identity transforms.

On the other hand, hashing large binary files (e.g. object files, libraries, executables) could benefit much more from such parallelism and single-threaded speed, and maybe one could come up with a heuristic for determining when to use multiple hashing threads. This hashing will likely be necessary to avoid updating e.g. libraries/executables depending on unchanged object files[1], re-running tests which depend on unchanged executables (i.e. incremental testing), etc. Although in the case of linking shared libraries it might be possible to do something smarter, like hashing a representation of the ABI (using e.g. libabigail)[2]. Presumably this is less total work than a full hash + re-link (in the case of a mismatch), although it may depend on the complexity of the library being analyzed.

> Sounds interesting, though I personally have never used Fortran so
> you will have to be the expert on the compilation model, etc.

Yeah, I'm not exactly an expert on Fortran, but I've worked with it enough to probably work on a build system module for it.

> I did hear Fortran modules being used as an example of how not to do
> modules ;-).

Yeah, it's not ideal, but build2 should be more than capable of handling it properly. Oftentimes, CMake or autotools Fortran projects don't properly handle module dependencies because e.g. they use a recursive design that doesn't have a full picture of (inter-directory) module dependencies, or they don't scan for module dependencies at all and just force you to iteratively recompile until things work.

> I am also planning to generalize/factor some of the make dependency
> parsing and handling logic from the cc module so that it can be reused
> by other modules (quite a few tools these days can produce make-style
> dependency information).

Sounds good.
I had actually wondered about using a custom header dependency scanner (similar to how modules must be handled) rather than invoking the preprocessor and getting header info from e.g. -M*, since Clang devs showed this could be much faster than the preprocessor[3]. But since we want the preprocessed TU anyways, this is kind of a moot point.

> - Reproducible builds (-ffile-prefix-map)

I suppose this can already be done, but would the idea be to automatically add something like -ffile-prefix-map=$src_root= to the compiler args?

> and separate debug info (-gsplit-dwarf)

This is interesting, I wasn't aware of this option. It could significantly improve link times and complement both the recent -flto=auto work and the solution to [1]. I suppose you should already be able to use -gsplit-dwarf and things will more or less work during development and building. I guess you would want to use dwp[4] or dwz[5] and link with --gdb-index (or maybe gdb-add-index can work with dwo or dwp files?) for installation/distribution, though. What Fedora does for its packages is run find-debuginfo.sh[6] after the install phase; it searches for unstripped executable files, runs gdb-add-index and strips them, then compresses the output with dwz.

> with the view towards distributed compilation and hashing.

Would the idea here be that each build node might have a different environment as far as paths go? Or might it make sense to set up some sort of container like e.g. bubblewrap[7] for a hermetic, consistent build on each node? I also noticed that Bazel uses Merkle trees for its remote caching mechanism[8,9].

> - C++20 modules support.

This mostly just requires work on the compiler end at this point, right?
[1] https://github.com/build2/build2/issues/87
[2] https://engineering.mongodb.com/post/pruning-dynamic-rebuilds-with-libabigail
[3] https://llvm.org/devmtg/2019-04/slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf
[4] https://gcc.gnu.org/wiki/DebugFissionDWP
[5] https://sourceware.org/git/?p=dwz.git;a=summary
[6] https://github.com/rpm-software-management/rpm/blob/master/scripts/find-debuginfo.sh
[7] https://github.com/containers/bubblewrap
[8] https://github.com/bazelbuild/bazel/tree/master/src/main/java/com/google/devtools/build/lib/remote/merkletree
[9] https://github.com/bazelbuild/remote-apis/issues/141

From stankiewiczal at gmail.com Mon Aug 24 16:38:46 2020
From: stankiewiczal at gmail.com (Aleksander Stankiewicz)
Date: Mon, 24 Aug 2020 18:38:46 +0200
Subject: [build2] B2/cppget.org web frontend availability
Message-ID:

Hi,

do you have a Docker image containing a working service for private repos like cppget.org (so that I could just attach my persistent volume to an instance to store private libs)? I really like how things have been resolved in build2, but it's hard to decide on using it without support for own/private projects. It's just a no-go :( when I talk with different people about using it in the company I'm working for at the moment. It's really a pity that such a simple argument against it stops b2 adoption...

The other thing that really slows its adoption is the missing integration with (at least) VSCode. Does it exist and I just can't find it, or is there no integration with editors/IDEs out of the box at the moment (through some extensions/packages)?

--
Kind regards
Aleksander Stankiewicz
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From boris at codesynthesis.com Tue Aug 25 11:20:14 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 25 Aug 2020 13:20:14 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

Aleksander Stankiewicz writes:

> do you have docker image containing working service for private repos like
> cppget.org (so I attach my persistent volume only for instance to store
> private libs)?

Not yet but we are working on something along these lines as we speak. Specifically, it will be a VM instead of a container (we don't have the expertise to make things work reliably as a container) but there will also be a setup script if you would like to try to install things in a container. We should have something ready to try in about a week.

> The other thing that very slows adoption of it is missing integration with
> (at least) VSCode.

There is currently no integration but there is talk of writing a VSCode plugin. I will ping the person interested (Joel).

P.S. While b2 is a natural abbreviation for build2, it's ambiguous with the "Boost Build" build system which they also abbreviate as b2 (and get very upset if anyone else tries to use this name ;-)). So we try to stick to the full name (build2).

From mjklaim at gmail.com Tue Aug 25 11:37:29 2020
From: mjklaim at gmail.com (Klaim - Joël Lamotte)
Date: Tue, 25 Aug 2020 13:37:29 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

On Tue, 25 Aug 2020 at 13:20, Boris Kolpackov wrote:

> Aleksander Stankiewicz writes:
> > The other thing that very slows adoption of it is missing integration with
> > (at least) VSCode.
>
> There is currently no integration but there is talk of writing a VSCode
> plugin. I will ping the person interested (Joel).

Indeed, I have some plans to make a VSCode extension and maybe a Visual Studio (not Code) extension too (for the more advanced debugging tools).
Unfortunately I cannot say when I'll be able to begin work on this. So far I have done some research to prepare that project, but I also need to finish another project first.

Meanwhile, VSCode is still a good tool to use with build2 projects; it's what I use almost every day working with build2. The main things to know:

- You can create "debug launch" json files that will help you debug executables built with build2 (or anything else). To do that, go to the debug panel, create a new launch action, then fill in the fields. The only issue is that I haven't yet found a way to build before launching the program to test, and I don't see a way to attach a debugger to tests when `b test` is used;

- As long as you have all the code (including configurations) of your project in the directory (or directories) open in an instance of VSCode, it will find the related source code (though maybe not the include paths when you write `#include <...>`);

- Setting VSCode to use the "make" syntax highlighting for `buildfile`, `build2file`, and `*.build` files will help with editing these files (I decided not to go with syntax highlighting for manifest files because they don't have an extension, so it might be more ambiguous);

- Using the console inside VSCode helps keep the same experience whatever the OS (I use git-bash on Windows and VSCode can be set to use it; it's similar to your usual bash on Linux).

The goal of the extension would then be to automate setting all that up and generate action commands for VSCode (and similarly for VS, but it's a bit different). Also, I hope to better inform the IntelliSense in VSCode, which is currently poor: it mostly tries to understand the surrounding code but fails at basic stuff. The one in VS-not-Code works better, even in directory-project mode (if you want to use VS with build2, I recommend doing that; the debugger tools are also far better).

A. Joël Lamotte
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From per.edin at sequence-point.se Tue Aug 25 11:52:45 2020
From: per.edin at sequence-point.se (Per Edin)
Date: Tue, 25 Aug 2020 13:52:45 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

On Tue, Aug 25, 2020 at 1:38 PM Klaim - Joël Lamotte wrote:

> Indeed, I have some plans to make a VSCode extension and maybe a Visual
> Studio (not Code) extension too (for the more advanced debugging tools).
> Unfortunately I cannot say when I'll be able to begin work on this. So
> far I did some research to prepare that project, but I also need to finish
> another project first.

FYI, I'd be very happy to test such an extension for VSCode in the future.

// Per Edin

From mjklaim at gmail.com Tue Aug 25 13:43:52 2020
From: mjklaim at gmail.com (Klaim - Joël Lamotte)
Date: Tue, 25 Aug 2020 15:43:52 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

On Tue, 25 Aug 2020 at 13:52, Per Edin wrote:

> On Tue, Aug 25, 2020 at 1:38 PM Klaim - Joël Lamotte wrote:
> > Indeed, I have some plans to make a VSCode extension and maybe a Visual
> > Studio (not Code) extension too (for the more advanced debugging tools).
> > Unfortunately I cannot say when I'll be able to begin work on this. So
> > far I did some research to prepare that project, but I also need to finish
> > another project first.
>
> FYI, I'd be very happy to test such an extension for VSCode in the future.

Noted, I'll contact you when I have something testable.

A. Joël Lamotte
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stankiewiczal at gmail.com Tue Aug 25 12:54:21 2020
From: stankiewiczal at gmail.com (Aleksander Stankiewicz)
Date: Tue, 25 Aug 2020 14:54:21 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

Thanks for the quick response!
:) I will check it out when it's ready and once I have some free time.

Kind regards,
Aleksander Stankiewicz

On Tue, 25 Aug 2020 at 13:20, Boris Kolpackov wrote:

> Aleksander Stankiewicz writes:
>
> > do you have docker image containing working service for private repos like
> > cppget.org (so I attach my persistent volume only for instance to store
> > private libs)?
>
> Not yet but we are working on something along these lines as we speak.
> Specifically, it will be a VM instead of a container (we don't have the
> expertise to make things work reliably as a container) but there will
> also be a setup script if you would like to try to install things in
> a container. We should have something ready to try in about a week.
>
> > The other thing that very slows adoption of it is missing integration with
> > (at least) VSCode.
>
> There is currently no integration but there is talk of writing a VSCode
> plugin. I will ping the person interested (Joel).
>
> P.S. While b2 is a natural abbreviation for build2, it's ambiguous with
> the "Boost Build" build system which they also abbreviate as b2 (and get
> very upset if anyone else tries to use this name ;-)). So we try to stick
> to the full name (build2).

--
Kind regards
Aleksander Stankiewicz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: