From mkrupcale at matthewkrupcale.com  Sat Aug  1 16:45:09 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Sat, 1 Aug 2020 12:45:09 -0400
Subject: [build2] LTO and parallelization with build2
Message-ID:

Hello,

Fedora is looking to enable LTO by default in F33[1], and recently they
decided to use -flto=auto for the default GCC LTO build flags[2]. This
means[3] that GCC will attempt to use the GNU make jobserver, if
available, but otherwise fall back to using the number of CPU threads
on the system. Since build2 has its own scheduler, though, this means
there might be e.g. up to n^2 threads on an n CPU thread system during
linking, which could negatively impact performance / waste CPU time.

So when building with -flto={auto,n} using GCC, build2 may either need
to instead invoke the compiler/linker with -flto=m, where m is the
number of available threads from the scheduler (I'm not sure the
scheduler currently supports something like this), or just disable
parallel linking with -flto[=1]. Alternatively, build2 could attempt to
act as a GNU make jobserver itself somehow. This also might become
relevant during compilation as GCC looks to parallelize
compilation[4,5].

I wonder if you've given any thought on how best to handle this. As far
as I can tell, ninja hasn't really addressed this either, but it looks
like they're considering making ninja work as a GNU make jobserver
client[6,7], and there's a recent PR to make ninja work with multiple
simultaneous ninja processes, all sharing a job limit[8].
Best,
Matthew

[1] https://fedoraproject.org/wiki/LTOByDefault
[2] https://src.fedoraproject.org/rpms/redhat-rpm-config/c/4637e1bd5512b869bd07c58b7545d2528a9bc4c8?branch=master
[3] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
[4] https://gcc.gnu.org/wiki/ParallelGcc
[5] https://www.phoronix.com/scan.php?page=news_item&px=GCC-2020-More-Parallel-Large-S
[6] https://github.com/ninja-build/ninja/issues/1139#issuecomment-520546470
[7] https://github.com/ninja-build/ninja/pull/1140
[8] https://github.com/ninja-build/ninja/pull/1815

From bsutherland at soundhound.com  Sun Aug  2 23:53:20 2020
From: bsutherland at soundhound.com (Bruce Sutherland)
Date: Mon, 3 Aug 2020 08:53:20 +0900
Subject: [build2] Generating source code in build2 projects
In-Reply-To:
References:
Message-ID:

Thank you Boris, that's very helpful. I'll start out with ad hoc
recipes, and will reach out again if we run into trouble.

On Fri, Jul 31, 2020 at 1:42 PM Boris Kolpackov wrote:
>
> Bruce Sutherland writes:
>
> > So I assume we need to configure blubc as a tool in build2, and define
> > dependencies from .blub to .cpp and .h files.
> >
> > How should we do that with build2? The documentation mentions
> > generated source code, but doesn't go into much detail. Can anyone
> > point me to an example of a project with a similar configuration?
>
> Currently there are two ways to do this: you can provide a rule for
> compiling any .blub to .h/.cpp or you can provide an ad hoc recipe
> for compiling a specific foo.blub to foo.h/foo.cpp.
>
> To provide a rule, currently, you will have to write a build system
> module that implements it (see, for example, libbuild2-rust[1]). In
> the next release we are also planning to add support for ad hoc rules
> that can be written directly in buildfiles (similar to ad hoc recipes
> below).
> An ad hoc recipe you can write directly in the buildfile and I would
> suggest that you start with that since writing a rule/module is fairly
> involved. The release notes for 0.13.0[2] have a number of examples
> that should get you started (in particular, see the xxd example).
> There is also a NASM example[3] I posted a few days ago that shows
> how to define a custom target type (e.g., for .blub).
>
> Note also that this is still an area of active development so there
> might be rough edges, missing pieces, etc. So we are quite interested
> in usage feedback and also don't mind doing a bit of hand-holding. So
> let me know how it goes or if there are any issues. In particular, if
> you have a lot of .blub, then writing ad hoc recipes could get tedious
> and that's where support for ad hoc rules could be helpful.
>
> [1] https://github.com/build2/libbuild2-rust
> [2] https://build2.org/release/0.13.0.xhtml#adhoc-recipe
> [3] https://lists.build2.org/archives/users/2020-July/000823.html

From boris at codesynthesis.com  Mon Aug  3 11:51:03 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon, 3 Aug 2020 13:51:03 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> Fedora is looking to enable LTO by default in F33[1], and recently
> they decided to use -flto=auto for the default GCC LTO build flags[2].
> This means[3] that GCC will attempt to use the GNU make jobserver, if
> available, but otherwise fall back to using the number of CPU threads
> on the system. Since build2 has its own scheduler, though, this means
> there might be e.g. up to n^2 threads on an n CPU thread system during
> linking, which could negatively impact performance / waste CPU time.

The bigger issue is potential memory usage (I've seen translation units
that take over 1G to compile).
> So when building with -flto={auto,n} using GCC, build2 may either need
> to instead invoke the compiler/linker with -flto=m, where m is the
> number of available threads from the scheduler (I'm not sure the
> scheduler currently supports something like this), or just disable
> parallel linking with -flto[=1]. Alternatively, build2 could attempt
> to act as a GNU make jobserver itself somehow. This also might become
> relevant during compilation as GCC looks to parallelize
> compilation[4,5].
>
> I wonder if you've given any thought on how best to handle this. As
> far as I can tell, ninja hasn't really addressed this either, but it
> looks like they're considering making ninja work as a GNU make
> jobserver client[6,7], and there's a recent PR to make ninja work with
> multiple simultaneous ninja processes, all sharing a job limit[8].

For -flto to work via jobserver, I believe build2 (and ninja) would need
to implement the server proper, not just the client. And having been
subscribed to make-alpha for the past decade, I can tell you that the
jobserver in GNU make has been a never-ending source of bugs, corner
cases, and compatibility issues (see this post[1] for a primer). So I
would like to avoid touching that can of worms if I can help it (we
could probably do the client if really necessary, but the server is a
whole different story).

Now on to how we could handle -flto in build2. It would actually be
quite easy for the link rule to allocate more than one hardware thread
(if available) in order to pass it on to the linker. There is no such
support in the scheduler now but it should be pretty straightforward to
add. With this idea then it's only a matter of rewriting -flto=auto or
-flto=jobserver with -flto=N where N is the number of hardware threads
allocated.

There are two potential problems with this:

1. If GCC does not use all the allocated threads, then they will be
   wasted, which would be pretty bad.
   Do you know if GCC will always utilize all the threads given? It
   appears to be generating a Makefile that it then passes to make, so
   probably it depends on what's in that Makefile.

2. Theoretically, via the jobserver, the linker can utilize additional
   threads as they become available. In our case, the number of
   allocated threads would be fixed at the linker start time.

Thoughts?

[1] http://make.mad-scientist.net/papers/jobserver-implementation/

From mkrupcale at matthewkrupcale.com  Mon Aug  3 14:56:38 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Mon, 3 Aug 2020 10:56:38 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Mon, Aug 3, 2020 at 7:51 AM Boris Kolpackov wrote:
>
> The bigger issue is potential memory usage (I've seen translation
> units that take over 1G to compile).

Yeah, very good point. This will especially be a problem on ARM.

> For -flto to work via jobserver, I believe build2 (and ninja) would
> need to implement the server proper, not just the client.

Yes, I believe that's right. The ninja devs in issue 1139 I think were
primarily trying to address a different issue, which is what happens if
ninja is invoked by an outer make, or somehow make is the primary
scheduler.

> And having been subscribed to make-alpha for the past decade, I can
> tell you that the jobserver in GNU make has been a never-ending source
> of bugs, corner cases, and compatibility issues (see this post[1] for
> a primer). So I would like to avoid touching that can of worms if I
> can help it

Yeah, it seems like implementing the make jobserver would be rather
complex.

> Now on to how we could handle -flto in build2. It would actually
> be quite easy for the link rule to allocate more than one hardware
> thread (if available) in order to pass it on to the linker. There
> is no such support in the scheduler now but it should be pretty
> straightforward to add.
> With this idea then it's only a matter
> of rewriting -flto=auto or -flto=jobserver with -flto=N where
> N is the number of hardware threads allocated.

Yeah, that's kind of what I had in mind.

> There are two potential problems with this:
>
> 1. If GCC does not use all the allocated threads, then they will be
>    wasted, which would be pretty bad.
>
>    Do you know if GCC will always utilize all the threads given? It
>    appears to be generating a Makefile that it then passes to make,
>    so probably it depends on what's in that Makefile.

The LTO WHOPR mode[1] is enabled when -flto is passed[2] and an LTO
partitioning algorithm is used[3]. The LGEN phase should be executed in
parallel already by build2 since it invokes the compiler in parallel for
the TUs. Then lto-wrapper forks and execs the two remaining stages, each
executed with the specified parallelism:

1. The WPA stage is partitioned[4], with the output of each partition
   done by separate forks of the LTO process[5]. Currently, the WPA
   stage doesn't support the jobserver mode[6]. The default partitioning
   algorithm is balanced[7], with the number of partitions controlled by
   the lto-partitions parameter[7] (default 32[8]). It looks like
   lto-partitions should exceed the number of CPUs used for compilation.

2. The LTRANS stage also operates on each partition independently.
   Whether or not the jobserver is used, the parallel LTO mode generates
   a temporary Makefile[9], but without the jobserver, make will be
   invoked without --jobserver-fd args[10] and with a statically
   determined number of make jobs[11].

So provided that there are more partitions than allocated CPU threads
(i.e. lto-partitions > n), both the WPA and LTRANS stages of GCC LTO
should utilize all n threads from the scheduler.

> 2. Theoretically, via the jobserver, the linker can utilize additional
>    threads as they become available. In our case, the number of
>    allocated threads would be fixed at the linker start time.
Yeah, I had thought about this issue with the static thread number
approach. Without the build2 jobserver, both WPA and LTRANS stages are
limited to n threads. Only the LTRANS stage currently supports the use
of the jobserver, though. Thus, even if build2 implements a make
jobserver, the WPA stage will currently be limited to at most n threads
from the build2 scheduler when spawning the linker.

Hypothetically, the WPA stage might support the jobserver at some point,
so both stages could support dynamic thread allocation from a build2
jobserver. But I think the above analysis at least indicates that the
static thread approach should be capable of fully utilizing the threads
assigned for linking. It would then be up to build2 to make sure it's
keeping the other threads busy with other tasks while linking.

[1] https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
[2] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1521
[3] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1564
[4] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1747
[5] https://github.com/gcc-mirror/gcc/blob/d1961e648e0fedebd06e4ad786c1bfc536312ef7/gcc/lto/lto.c#L2398
[6] https://github.com/gcc-mirror/gcc/blob/d1961e648e0fedebd06e4ad786c1bfc536312ef7/gcc/lto/lto.c#L3154
[7] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[8] https://github.com/gcc-mirror/gcc/blob/51e85e64e125803502fde94b9e22037c0ccaa8b2/gcc/params.def#L1097
[9] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1877
[10] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1949
[11] https://github.com/gcc-mirror/gcc/blob/d2ae6d5c053315c94143103eeae1d3cba005ad9d/gcc/lto-wrapper.c#L1968

From boris at codesynthesis.com  Wed Aug  5 12:21:32 2020
From: boris at codesynthesis.com (Boris
Kolpackov)
Date: Wed, 5 Aug 2020 14:21:32 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> Yeah, it seems like implementing the make jobserver would be rather
> complex.

One idea came into my head: we could provide hooks in the scheduler to
allow a build system module to interface with something that controls
hardware thread allocation (we would also need some kind of a command
line option for module pre-load since such a module wouldn't be loaded
from buildfiles). One could then implement a module that provides the
jobserver functionality, either just the client or the server, and
either from scratch or by reusing GNU make. Just putting the idea out
there.

> The LTO WHOPR mode[1] is enabled when -flto is passed[2] and an LTO
> partitioning algorithm is used[3]. The LGEN phase should be executed
> in parallel already by build2 since it invokes the compiler in
> parallel for the TUs. Then lto-wrapper forks and execs the two
> remaining stages, each executed with the specified parallelism:
>
> [...]

Thanks for the overview!

> So provided that there are more partitions than allocated CPU threads
> (i.e. lto-partitions > n), both WPA and LTRANS stages of GCC LTO
> should utilize all n threads from the scheduler.

Does the number of partitions somehow correlate with the number of TUs
being linked?

In a sense we have two problems:

1. We could supply too few hardware threads (e.g., because other
   threads are still being used by build2 at the start of linking but
   may become available during linking).

2. We could supply too many hardware threads that the linker cannot
   utilize but that build2 could have used for other tasks.

One interesting heuristic against (2) would be to supply up to half of
the available hardware threads. With hyper-threaded CPUs one would only
waste at most 10-20% of performance in the worst case scenario.
> Hypothetically, the WPA stage might support the jobserver at some
> point, so both stages could support dynamic thread allocation from a
> build2 jobserver. But I think the above analysis at least indicates
> that the static thread approach should be capable of fully utilizing
> the threads assigned for linking. It would then be up to build2 to
> make sure it's keeping the other threads busy with other tasks while
> linking.

Yes, that's the problem: build2 may have nothing else to do and no way
to communicate to the linker that it can use more threads (e.g., during
the second stage). But it's probably a good enough first approximation.

From mkrupcale at matthewkrupcale.com  Wed Aug  5 16:24:13 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Wed, 5 Aug 2020 12:24:13 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Wed, Aug 5, 2020 at 8:21 AM Boris Kolpackov wrote:
>
> One idea came into my head: we could provide hooks in the scheduler
> to allow a build system module to interface with something that
> controls hardware thread allocation (we would also need some kind
> of a command line option for module pre-load since such a module
> wouldn't be loaded from buildfiles). One could then implement a module
> that provides the jobserver functionality either just the client or
> the server and either from scratch or by reusing GNU make. Just
> putting the idea out there.

That's an interesting idea indeed. It would require implementing the
jobserver somehow as you said (either custom or reusing GNU make), but
at least it would be somewhat isolated from the existing scheduler then.

> Does the number of partitions somehow correlate with the number of
> TUs being linked?

I think for -flto-partition=balanced (the default) the number of
partitions is always statically determined by --param lto-partitions=m,
regardless of the number of TUs (by default, m=32).
For -flto-partition=1to1, m might be equivalent to the number of TUs.

> 1. We could supply too few hardware threads (e.g., because other
>    threads are still being used by build2 at the start of linking
>    but may become available during linking).

Yeah, this is the problem that would require integrating linker stages
2-3 with the build2 jobserver somehow.

> 2. We could supply too many hardware threads that the linker cannot
>    utilize but that build2 could have used for other tasks.

Based on my reading of how the partitioning works, as long as
lto-partitions=m > -flto=n, I think it should be possible for the linker
to fully utilize all given threads. This might require some
experimentation and verification, though.

> But it's probably a good enough first approximation.

Yeah, that's kind of what I was thinking. With the current understanding
of the partitioning, I was also wondering if -flto=auto might be as
likely to OOM as we previously thought (i.e. with current build2 without
changes). build2 will already spawn as many linker threads (each
currently spawning 1 process when CFLAGS has -flto or -flto=1) as the
scheduler has available (call it n). Let's say those n threads are
linking TUs with cumulative size N_i bytes (or some other metric of code
complexity, like nodes in the tree representation), 0

References:
Message-ID:

Matthew Krupcale writes:
> This of course has some assumptions about the cumulative memory of the
> m partitioned threads being the same as 1 thread doing the full link,
> but maybe it's not far off.

Could be. There is at least some fixed memory cost for each process
spawned.

In any case, I've added[1] the scheduler API to allocate extra active
threads. What's left is to translate -flto=auto. Would you like to take
a stab at it? I could give you some pointers if you are interested.

(BTW, there is now an etc/bootstrap[2] script for setting up the build2
development environment. It might still have some rough edges but we
find it pretty usable.)

[1] https://github.com/build2/build2/commit/600da2b97e
[2] https://git.build2.org/cgit/etc/tree/bootstrap

From mkrupcale at matthewkrupcale.com  Fri Aug  7 13:22:58 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Fri, 7 Aug 2020 09:22:58 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Fri, Aug 7, 2020 at 4:46 AM Boris Kolpackov wrote:
>
> Matthew Krupcale writes:
>
> > [...] (by default, m=32) [...]
>
> CPUs with more than 32 hardware threads are not that uncommon these
> days.

For some reason when I was browsing the GitHub GCC sources, it led me to
look at an older revision of gcc/params.def. The lto-partitions default
was bumped to 128[1], and lto-partitions is now set in
gcc/params.opt[2].

In any case, I think that in addition to translating -flto=auto or
-flto=n (with n>N) to -flto=N, where N is the number of available
hardware threads, we may have to also set --param lto-partitions=m,
where m>N. I'm not entirely sure what the right value for m is compared
to the number of threads. m=N+1? m=2*N? For reference, here is the
original discussion on WHOPR partitioning[3].

> In any case, I've added[1] the scheduler API to allocate extra active
> threads.

Oh that was fast.

> What's left is to translate -flto=auto. Would you like to take
> a stab at it? I could give you some pointers if you are interested.

Yeah, I might give it a try this weekend, and pointers are always
appreciated.

[1] https://github.com/gcc-mirror/gcc/commit/448af20a27c9a1706712eba8500f5f81f5f6a46d
[2] https://github.com/gcc-mirror/gcc/blob/c3f94f5786a014515c09c7852db228c74adf51e5/gcc/params.opt#L365
[3] http://patchwork.ozlabs.org/comment/152396/

From boris at codesynthesis.com  Fri Aug  7 14:28:36 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Fri, 7 Aug 2020 16:28:36 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> For some reason when I was browsing the GitHub GCC sources, it led me
> to look at an older revision of gcc/params.def. lto-partitions default
> was bumped to 128[1] [...]

Ok, that's a much more reasonable default.

> In any case, I think that in addition to translating -flto=auto or
> -flto=n (with n>N) to -flto=N, where N is the number of available
> hardware threads, we may have to also set --param lto-partitions=m,
> where m>N. I'm not entirely sure what the right value for m is
> compared to the number of threads. m=N+1? m=2*N?

I am not sure we want to mess with lto-partitions; it feels like it can
get GCC-version-dependent. Also, I would be wary of rewriting -flto=n;
we generally assume the user knows what they are doing (and if not, they
will appreciate an opportunity to learn ;-)).

> > What's left is to translate -flto=auto. Would you like to take
> > a stab at it? I could give you some pointers if you are interested.
>
> Yeah, I might give it a try this weekend, and pointers are always
> appreciated.

The place to do this seems to be libbuild2/cc/link-rule.cxx:2993, just
before printing the command line (we generally want to allocate extra
threads as late as possible and release them as early as possible; the
latter would be after run_finish() on line 3164).

We will also only want to do this if:

1. It's not a static library.

2.
   It's GCC of sufficient version. (I wonder what's the story with
   Clang here?)

You can find plenty of such tests earlier in the function.

As for finding the option itself, we have the find_option_prefix()
utility function that does almost what we need (we need the position,
not the pointer to the option value). Maybe we can add
find_option_prefix_position() or some such?

From mkrupcale at matthewkrupcale.com  Sat Aug  8 01:14:02 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Fri, 7 Aug 2020 21:14:02 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Fri, Aug 7, 2020 at 10:28 AM Boris Kolpackov wrote:
>
> I am not sure we want to mess with lto-partitions, feels like it can
> get GCC-version-dependent.

Yeah, that may make things unnecessarily complicated.

> Also, I would be wary of rewriting -flto=n;
> we generally assume the user knows what they are doing (and if not,
> they will appreciate an opportunity to learn ;-)).

Okay, that should simplify the logic slightly as well.

> 2. It's GCC of sufficient version. (I wonder what's the story with
>    Clang here?)

GCC 10 should be the first to support -flto=auto[1]. I also looked into
the Clang situation, and it looks like it only allows control of the
number of threads/jobs when -flto=thin is used[2-4]. This is controlled
with -flto-jobs=N since version 4, I think[5].

> As for finding the option itself, we have the find_option_prefix()
> utility function that does almost what we need (we need the
> position, not the pointer to the option value). Maybe we can add
> find_option_prefix_position() or some such?

I wonder if instead of adding the _position variations, it might make
sense to change the API of the existing functions to work more like
e.g. std::find{,_if}. This would allow you to use the same function for
either examining the value found (i.e. *it) or working with the position
(i.e. it).
Although, I don't know if you could adapt the functions which take a
const lookup& to this API.

[1] https://gcc.gnu.org/gcc-10/changes.html
[2] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-flto-jobs
[3] https://clang.llvm.org/docs/ThinLTO.html
[4] http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html
[5] https://github.com/llvm/llvm-project/commit/12286d22b7964fa69e44c9d8ca36cc85d4cf5225

From boris at codesynthesis.com  Sat Aug  8 13:47:15 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Sat, 8 Aug 2020 15:47:15 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> I also looked into the Clang situation, and it looks like it only
> allows control of the number of threads/jobs when -flto=thin is
> used[2-4]. This is controlled with -flto-jobs=N since version 4 I
> think[5].

I suppose we could have added -flto-jobs=N if there is -flto=thin and
no user-supplied -flto-jobs. But maybe let's leave it for if/when
someone needs it (I am not familiar with -flto=thin and whether it is
used in practice).

> I wonder if instead of adding the _position variations, it might make
> sense to change the API of the existing functions to work more like
> e.g. std::find{,_if}. This would allow you to use the same function
> for either examining the value found (i.e. *it) or working with the
> position (i.e. it). Although, I don't know if you could adapt the
> functions which take a const lookup& to this API.

No, lookup versions won't fit. Also, IMO, std::find* API's usability is
awful. It may be general and composable, but every time I write
something along these lines I cringe:

  auto i (find_if (v.begin (), v.end (), ...));

  if (i != v.end ())
    ...

Compare:

  if (const string* o = find_option_prefix (..., v))
    ...
Maybe we could add iterator-based versions along these lines:

  template <typename I>
  I
  find_option_prefix (const char* prefix, I, I, bool = false);

From mkrupcale at matthewkrupcale.com  Sun Aug  9 23:45:35 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Sun, 9 Aug 2020 19:45:35 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Sat, Aug 8, 2020 at 9:47 AM Boris Kolpackov wrote:
>
> I suppose we could have added -flto-jobs=N if there is -flto=thin
> and no user-supplied -flto-jobs.

This is what I ended up doing[1].

> But maybe let's leave it for if/when
> someone needs it (I am not familiar with -flto=thin and whether it
> is used in practice).

I'm not sure how widely used it is either, and Fedora will currently
default to -flto[=full] for Clang[2]. Feel free to discard that portion
or play around with it.

> Maybe we could add iterator-based versions along these lines:
>
>   template <typename I>
>   I
>   find_option_prefix (const char* prefix, I, I, bool = false);

I added this, as well as a find_option variant of it[3]. It's not the
same API as the other versions of find_option (i.e., it returns I
rather than bool), but I wanted to find the whole option (not just the
prefix) "-flto=auto" (GCC) or "-flto=thin" (Clang). Feel free to modify
as you see fit.

Having played around with it a little, it's not 100% reliable yet.
Sometimes it would not schedule enough threads. But this is probably
issue 1. you mentioned, which is something that would require a dynamic
scheduler interaction with the linker as previously discussed.

A little more concerning are the following two issues.
Sometimes build2 thought there was a deadlock:

$ b build2-test-build/
ld build2-test-build/libbuild2/libs{build2}
error: deadlock suspected, aborting
info: deadlocks are normally caused by dependency cycles
info: re-run with -s to diagnose dependency cycles
terminate called without an active exception
Aborted (core dumped)

Sometimes it would schedule more threads/jobs than I expected. With n
hardware threads, and m = n/2 specified as the max for the alloc_guard,
on my 4C4T machine I would sometimes see a total of N = \sum_i {N_i} =
2n to 3n = 6 to 9 LTO jobs, with -flto=N_i specified, where 1<=N_i<=3.
If m=0 (allocate as many as available), I even saw as many as N = n^2 =
16 with 1<=N_i<=5. So there may be a problem with my use of the
scheduler or the scheduler itself which requires some experimentation.

Best,
Matthew

[1] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/commit/?h=lto-parallelization&id=2715223d9f39ab60de52cb6573d1b30b43d9138f
[2] https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/macros#_332
[3] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/commit/?h=lto-parallelization&id=28537a37ea6b85eeca0a8b3b2532e9aefad1e6ee

From boris at codesynthesis.com  Mon Aug 10 14:48:19 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Mon, 10 Aug 2020 16:48:19 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> I'm not sure how widely used it is either, and Fedora will currently
> default to -flto[=full] for Clang[2]. Feel free to discard that
> portion or play around with it.

I read a bit on it and it seems thin LTO is actually considered the
newer/better approach compared to full. So let's keep it. I wonder why
Fedora doesn't default to that.

> Sometimes it would not schedule enough threads. But this is probably
> issue 1. you mentioned [...].
If a project has a single final link stage (e.g., an executable), then
the linker should be given all the available threads. It would be good
to confirm at least this is the case.

> A little more concerning are the following two issues.

I think these are due to a silly bug (one of those "made sure
everything is correct except the most trivial part") in my
implementation of allocate()/deallocate() which is now fixed in master.
Can you give it a try and see if you get a more sensible behavior?

From mkrupcale at matthewkrupcale.com  Tue Aug 11 03:18:09 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Mon, 10 Aug 2020 23:18:09 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Mon, Aug 10, 2020 at 10:48 AM Boris Kolpackov wrote:
>
> I wonder why Fedora doesn't default to that.

Probably because both the LTO[1] and Clang toolchain[2] changes are new
for F33. Both have obviously been supported in the distribution, but
these changes are (potentially) distribution-wide changes.

> If a project has a single final link stage (e.g., an executable), then
> the linker should be given all the available threads. It would be good
> to confirm at least this is the case.

After applying your scheduler fixes and setting m=0 in the alloc_guard,
this appears to work correctly.

> I think these are due to a silly bug (one of those "made sure
> everything is correct except the most trivial part") in my
> implementation of allocate()/deallocate() which is now fixed in
> master.

When I saw the original implementation, I was wondering about those
particular lines, but I thought maybe my understanding of the scheduler
was wrong (i.e. perhaps active_ was referring to internally [to the
scheduler] active threads or something).

> Can you give it a try and see if you get a more sensible behavior?

Yes, I've pushed changes to the same branch rebased on top of your
scheduler fix, and things are working much better.
Here are some timings (3 trials) on my 4C4T system for linking the executables/libraries in build2 (this is only the linking, compilation is already done):

$ b /tmp/build2-test-build/
$ for i in $(seq 3); do
>   find /tmp/build2-test-build/ -type f -executable -delete
>   time b /tmp/build2-test-build/
> done

master -flto=auto
real  6m33.626s  6m27.739s  6m29.363s | 6m30.243s
user 22m55.659s 22m29.588s 22m56.146s | 22m47.131s
sys   1m11.267s  1m10.142s  1m11.252s |  1m10.887s

master -flto=1
real  5m56.454s  5m53.458s  5m54.255s | 5m54.722s
user 19m22.788s 19m20.051s 19m22.424s | 19m21.754s
sys   1m2.472s   1m2.406s   1m3.169s  |  1m2.682s

lto-parallelization -flto=auto
real  5m53.139s  5m46.499s  5m49.596s | 5m49.745s
user 19m16.702s 19m15.868s 19m15.820s | 19m16.130s
sys   1m1.647s   1m1.269s   1m1.649s  |  1m1.522s

(The last column is the mean of the three trials.)

So the lto-parallelization branch with -flto=auto shows a reduction in real/user time relative to both master with -flto=auto (-10%/-15%) and master with -flto=1 (-1.4%/-0.5%). The difference will probably be greater on machines with more cores and with fewer objects that can be linked in parallel.

Best,
Matthew

[1] https://fedoraproject.org/wiki/LTOByDefault
[2] https://fedoraproject.org/wiki/Changes/CompilerPolicy

From boris at codesynthesis.com Tue Aug 11 12:53:21 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 11 Aug 2020 14:53:21 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Matthew Krupcale writes:

> Yes, I've pushed changes to the same branch rebased on top of your
> scheduler fix, and things are working much better.

Thanks, I've reviewed the changes and pushed my tweaks:

https://git.build2.org/cgit/build2/log/?h=lto-parallelization

It's mostly cosmetic though there were a few issues (especially in the Clang part). Could you review them and if happy, I will merge everything to master.

Note also that I moved the option translation after the low-verbosity diagnostics.
The thinking being that in the -v output it is more useful to see -flto=auto rather than some fixed number (e.g., in case the user wants to re-run the command manually).

From mkrupcale at matthewkrupcale.com Tue Aug 11 13:16:30 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Tue, 11 Aug 2020 09:16:30 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Tue, Aug 11, 2020 at 8:53 AM Boris Kolpackov wrote:
>
> Thanks, I've reviewed the changes and pushed my tweaks:
>
> https://git.build2.org/cgit/build2/log/?h=lto-parallelization
>
> It's mostly cosmetic though there were a few issues (especially
> in the Clang part). Could you review them and if happy, I will
> merge everything to master.

Yep, everything looks good.

> Note also that I moved the option translation after the low-verbosity
> diagnostics. The thinking being that in the -v output it is more
> useful to see -flto=auto rather than some fixed number (e.g., in case
> the user wants to re-run the command manually).

Yeah, makes sense.

From boris at codesynthesis.com Wed Aug 12 12:14:32 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed, 12 Aug 2020 14:14:32 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

Merged to master. Your first contribution to build2, congrats & thanks! I wonder what's next... ;-)

From mkrupcale at matthewkrupcale.com Wed Aug 12 14:02:59 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Wed, 12 Aug 2020 10:02:59 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID:

On Wed, Aug 12, 2020 at 8:14 AM Boris Kolpackov wrote:
>
> Merged to master. Your first contribution to build2, congrats & thanks!

Awesome--fortunately, it was fairly simple with your advice :). Did you get a chance to test on a larger machine?

> I wonder what's next... ;-)

Good question lol. I do have a few ideas:

1.
It might make sense to implement the find_option{,_prefix} functions taking {c,}strings in terms of the new iterator variants and the compare_option{,_prefix} functions you wrote.

2. Investigate the use of the BLAKE3 hash for file checksums. BLAKE3[1] is significantly (5-10 times) faster than SHA-1 and SHA-2 and is highly parallelizable since it uses Merkle trees internally. This could utilize the new scheduler thread allocator, but even without parallelization, it's much faster. For small inputs, this may not matter much, but for many large TUs or object file checksums, this might be noticeable, especially if solution 1 of [2] were implemented.

3. Write a Fortran language build system module. This would likely need a lot of machinery similar to the cc module as well as some of the cxx module dependency scanning logic to handle Fortran modules. Fortran compilers, though, don't have a protocol for communication between the build system and the compiler like the one C++ modules use for module name-to-file mapping. Instead, compiled Fortran module interface files are named according to the (lowercase) module name and searched for in the -I, -J, and current directories (at least that's what gfortran seems to do). So we just need to find the module source file and compile it before anything "use"s it, and gfortran should find it. gfortran can use the C preprocessor (in traditional mode), but it's not invoked by default unless the file extension is .fpp or matches .F*, and files can be textually included using either "include" statements or "#include" directives (when cpp is invoked).
[1] https://github.com/BLAKE3-team/BLAKE3
[2] https://github.com/build2/build2/issues/87

From boris at codesynthesis.com Thu Aug 13 16:34:23 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Thu, 13 Aug 2020 18:34:23 +0200
Subject: [build2] LTO and parallelization with build2
In-Reply-To:
References:
Message-ID: <20200813163423.GA13783@codesynthesis.com>

Matthew Krupcale writes:

> Did you get a chance to test on a larger machine?

I only smoke-tested it on my 6C/12T development machine. At least the build2 side seems to do the right thing (i.e., I got -flto=12 for an executable project).

> 1. It might make sense to implement the find_option{,_prefix}
> functions taking {c,}strings in terms of the new iterator variants and
> the compare_option{,_prefix} functions you wrote.

Yes, I also thought we could clean that up.

> 2. Investigate the use of BLAKE3 hash for file checksums. BLAKE3[1] is
> significantly faster than SHA-1 and SHA-2 (5-10 times) and is highly
> parallelizable since it uses Merkle trees internally. This could
> utilize the new scheduler thread allocator, but even without
> parallelization, it's much faster. For small inputs, this may not
> matter much, but for many, large TUs or object file checksums, this
> might be noticeable, especially if solution 1 of [2] were implemented.

The largest amount of data that we currently hash is the preprocessed TUs during C/C++ compilation. In fact, what we actually hash are the preprocessor tokens that are returned by the lexer in order to calculate a checksum that omits ignorable changes. Which means it's not going to be easily parallelizable. Also, the build2 scheduler is geared towards more substantial tasks and my feeling is that any win from parallel hashing will be offset by the scheduling overhead (locking, starting threads, etc.).

> 3. Write a Fortran language build system module.
> This would likely
> need a lot of similar machinery in the cc module as well as some of
> the cxx module dependency scanning logic to handle Fortran modules.
> Fortran compilers, though, don't have a protocol for communication
> between the build system and the compiler like C++ modules have for
> module name-to-file mapping. Instead, compiled Fortran module interface
> files are named according to the (lowercase) module name and searched
> for in the -I, -J, and current directories (at least that's what
> gfortran seems to do). So we just need to find the module source file
> and compile it before anything "use"s it, and gfortran should find it.
> gfortran can use the C preprocessor (in traditional mode), but it's not
> invoked by default unless the file extension is .fpp or matches .F*,
> and files can be textually included using either "include" statements
> or "#include" directives (when cpp is invoked).

Sounds interesting, though I personally have never used Fortran, so you will have to be the expert on the compilation model, etc. I did hear Fortran modules being used as an example of how not to do modules ;-).

I am also planning to generalize/factor some of the make dependency parsing and handling logic out of the cc module so that it can be reused by other modules (quite a few tools these days can produce make-style dependency information).

A couple more areas that may pique your interest:

- Reproducible builds (-ffile-prefix-map) and separate debug info (-gsplit-dwarf) with a view towards distributed compilation and hashing.

- Assembler/linker support in the bin module.

- C++20 modules support.
From mkrupcale at matthewkrupcale.com Sat Aug 15 01:54:51 2020
From: mkrupcale at matthewkrupcale.com (Matthew Krupcale)
Date: Fri, 14 Aug 2020 21:54:51 -0400
Subject: [build2] LTO and parallelization with build2
In-Reply-To: <20200813163423.GA13783@codesynthesis.com>
References: <20200813163423.GA13783@codesynthesis.com>
Message-ID:

On Thu, Aug 13, 2020 at 12:34 PM Boris Kolpackov wrote:
>
> The largest amount of data that we currently hash is the preprocessed
> TUs during C/C++ compilation.

Yeah, the TUs were where I suspected there might currently be some benefit.

> In fact, what we actually hash are the
> preprocessor tokens that are returned by the lexer in order to calculate
> the checksum that omits ignorable changes.

Yes, after looking at libbuild2/cc/{compile-rule,parser,lexer}.cxx I can see this now. I suspect this strategy is optimal when most changes are ignorable with respect to the compiler output (including debug info). One assumes the cumulative incremental token checksum updates take much less time than compiling, which should be true provided the parser/lexer and hasher are fast.

If changes are not ignorable with respect to the compiler, we will obviously have to recompile, so we may have spent some additional time doing the parsing/lexing and incremental hashing compared to just going ahead and recompiling. If we again assume the parser/lexer and hasher are fast compared to compiling, this is probably a negligible contribution. In either case, we want the parser/lexer and hasher to be fast, and BLAKE3 is likely faster to update than SHA-256.

> Which means it's not going
> to be easily parallelizable. Also, the build2 scheduler is geared
> towards more substantial tasks and my feeling is that any win from
> parallel hashing will be offset by the scheduling overhead (locking,
> starting threads, etc.).

Yeah, this probably means that the current TU hashing scheme is not suitable for threaded parallelism.
Furthermore, this is probably the only reasonable TU hashing scheme, since hashing the full TU is kind of pointless unless you're trying to detect (only) identity transforms.

On the other hand, hashing large binary files (e.g. object files, libraries, executables) could benefit much more from such parallelism and single-threaded speed, and maybe one could come up with a heuristic for determining when to use multiple hashing threads. This hashing will likely be necessary to avoid updating e.g. libraries/executables depending on unchanged object files[1], re-running tests which depend on unchanged executables (i.e. incremental testing), etc. Although in the case of linking shared libraries it might be possible to do something smarter, like hashing a representation of the ABI (using e.g. libabigail)[2]. Presumably this is less total work than a full hash + re-link (in the case of a mismatch), although it may depend on the complexity of the library being analyzed.

> Sounds interesting, though I personally have never used Fortran so
> you will have to be the expert on the compilation model, etc.

Yeah, I'm not exactly an expert on Fortran, but I've worked with it enough to probably work on a build system module for it.

> I did hear Fortran modules being used as an example of how not to do
> modules ;-).

Yeah, it's not ideal, but build2 should be more than capable of handling it properly. Oftentimes, CMake or autotools Fortran projects don't properly handle module dependencies because e.g. they use a recursive design that doesn't have a full picture of (inter-directory) module dependencies, or they don't scan for module dependencies at all and just force you to iteratively recompile until things work.

> I am also planning to generalize/factor some of the make dependency
> parsing and handling logic from the cc module so that it can be reused
> by other modules (quite a few tools these days can produce make-style
> dependency information).

Sounds good.
I had actually wondered about using a custom header dependency scanner (similar to how modules must be handled) rather than invoking the preprocessor and getting header info from e.g. -M*, since Clang devs showed this could be much faster than the preprocessor[3]. But since we want the preprocessed TU anyways, this is kind of a moot point.

> - Reproducible builds (-ffile-prefix-map)

I suppose this can already be done, but would the idea be to automatically add something like -ffile-prefix-map=$src_root= to the compiler args?

> and separate debug info (-gsplit-dwarf)

This is interesting, I wasn't aware of this option. It could significantly improve link times and complement both the recent -flto=auto work and the solution to [1]. I suppose you should already be able to use -gsplit-dwarf and things will more or less work during development and building. I guess you would want to use dwp[4] or dwz[5] and link with --gdb-index (or maybe gdb-add-index can work with dwo or dwp files?) for installation/distribution, though. What Fedora does for its packages is run find-debuginfo.sh[6] after the install phase; it searches for unstripped executable files, runs gdb-add-index and strips them, then compresses the output with dwz.

> with the view towards distributed compilation and hashing.

Would the idea here be that each build node might have a different environment as far as paths go? Or might it make sense to set up some sort of container like e.g. bubblewrap[7] for a hermetic, consistent build on each node? I also noticed that Bazel uses Merkle trees for its remote caching mechanism[8,9].

> - C++20 modules support.

This mostly just requires work on the compiler end at this point, right?
[1] https://github.com/build2/build2/issues/87
[2] https://engineering.mongodb.com/post/pruning-dynamic-rebuilds-with-libabigail
[3] https://llvm.org/devmtg/2019-04/slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf
[4] https://gcc.gnu.org/wiki/DebugFissionDWP
[5] https://sourceware.org/git/?p=dwz.git;a=summary
[6] https://github.com/rpm-software-management/rpm/blob/master/scripts/find-debuginfo.sh
[7] https://github.com/containers/bubblewrap
[8] https://github.com/bazelbuild/bazel/tree/master/src/main/java/com/google/devtools/build/lib/remote/merkletree
[9] https://github.com/bazelbuild/remote-apis/issues/141

From stankiewiczal at gmail.com Mon Aug 24 16:38:46 2020
From: stankiewiczal at gmail.com (Aleksander Stankiewicz)
Date: Mon, 24 Aug 2020 18:38:46 +0200
Subject: [build2] B2/cppget.org web frontend availability
Message-ID:

Hi,

do you have a Docker image containing a working service for private repos like cppget.org (so that I could just attach my persistent volume to an instance to store private libs)? I really like how things have been resolved in build2, but it's hard to decide on using it without support for own/private projects. It's just a no-go :( when I talk with different people about using it in the company I'm working for at the moment. It's really a pity that such a simple argument against it stops b2 adoption...

The other thing that really slows its adoption is the missing integration with (at least) VSCode. Does it exist and I just can't find it, or is there no integration with editors/IDEs out of the box at the moment (through some extensions/packages)?

--
Kind regards
Aleksander Stankiewicz
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From boris at codesynthesis.com Tue Aug 25 11:20:14 2020
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 25 Aug 2020 13:20:14 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

Aleksander Stankiewicz writes:

> do you have docker image containing working service for private repos like
> cppget.org (so I attach my persistent volume only for instance to store
> private libs)?

Not yet but we are working on something along these lines as we speak. Specifically, it will be a VM instead of a container (we don't have the expertise to make things work reliably as a container) but there will also be a setup script if you would like to try to install things in a container. We should have something ready to try in about a week.

> The other thing that very slows adoption of it is missing integration with
> (at least) VSCode.

There is currently no integration but there is talk of writing a VSCode plugin. I will ping the person interested (Joel).

P.S. While b2 is a natural abbreviation for build2, it's ambiguous with the "Boost Build" build system which they also abbreviate as b2 (and get very upset if anyone else tries to use this name ;-)). So we try to stick to the full name (build2).

From mjklaim at gmail.com Tue Aug 25 11:37:29 2020
From: mjklaim at gmail.com (Klaim - Joël Lamotte)
Date: Tue, 25 Aug 2020 13:37:29 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

On Tue, 25 Aug 2020 at 13:20, Boris Kolpackov wrote:

> Aleksander Stankiewicz writes:
> > The other thing that very slows adoption of it is missing integration with
> > (at least) VSCode.
>
> There is currently no integration but there is talk of writing a VSCode
> plugin. I will ping the person interested (Joel).

Indeed, I have some plans to make a VSCode extension and maybe a Visual Studio (not Code) extension too (for the more advanced debugging tools).
Unfortunately I cannot say when I'll be able to begin work on this. So far I have done some research to prepare that project, but I also need to finish another project first.

Meanwhile, VSCode is still a good tool to use with build2 projects; it's what I use almost every day working with build2. The main things to know:

- You can create "debug launch" json files that will help you debug executables built with build2 (or anything else). To do that, go to the debug panel, create a new launch action, then fill in the fields. The only issue is that I haven't yet found a way to build before launching the program to test, and I don't see a way to attach a debugger to tests when `b test` is used;

- As long as you have all the code (including configurations) of your project in the directory (or directories) open in an instance of VSCode, it will find the related source code (though maybe not the include paths when you write `#include <...>`);

- Setting VSCode to use the "make" syntax highlighting for `buildfile`, `build2file`, and `*.build` files will help with editing these files (I decided not to go with syntax highlighting for manifest files because they don't have an extension, so it might be more ambiguous);

- Using the console inside VSCode helps keep the same experience whatever the OS (I use git-bash on Windows and VSCode can be set to use it; it's similar to your usual bash on Linux).

The goal of the extension would then be to automate setting all that up and generate action commands for VSCode (and similarly for VS, but it's a bit different). Also, I hope to better inform the IntelliSense in VSCode, which is currently poor: it mostly tries to understand the surrounding code but fails at basic stuff. The one in VS-not-Code works better, even in directory-project mode (if you want to use VS with build2, I recommend doing that; the debugger tools are also far better).

A. Joël Lamotte
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From per.edin at sequence-point.se Tue Aug 25 11:52:45 2020
From: per.edin at sequence-point.se (Per Edin)
Date: Tue, 25 Aug 2020 13:52:45 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

On Tue, Aug 25, 2020 at 1:38 PM Klaim - Joël Lamotte wrote:

> Indeed, I have some plans to make a VSCode extension and maybe a Visual
> Studio (not Code) extension too (for the more advanced debugging tools).
> Unfortunately I cannot say when I'll be able to begin work on this. So
> far I did some research to prepare that project, but I also need to finish
> another project first.

FYI, I'd be very happy to test such an extension for VSCode in the future.

// Per Edin

From mjklaim at gmail.com Tue Aug 25 13:43:52 2020
From: mjklaim at gmail.com (Klaim - Joël Lamotte)
Date: Tue, 25 Aug 2020 15:43:52 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

On Tue, 25 Aug 2020 at 13:52, Per Edin wrote:

> On Tue, Aug 25, 2020 at 1:38 PM Klaim - Joël Lamotte wrote:
> > Indeed, I have some plans to make a VSCode extension and maybe a Visual
> > Studio (not Code) extension too (for the more advanced debugging tools).
> > Unfortunately I cannot say when I'll be able to begin work on this. So
> > far I did some research to prepare that project, but I also need to finish
> > another project first.
>
> FYI, I'd be very happy to test such an extension for VSCode in the future.

Noted, I'll contact you when I have something testable.

A. Joël Lamotte
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stankiewiczal at gmail.com Tue Aug 25 12:54:21 2020
From: stankiewiczal at gmail.com (Aleksander Stankiewicz)
Date: Tue, 25 Aug 2020 14:54:21 +0200
Subject: [build2] B2/cppget.org web frontend availability
In-Reply-To:
References:
Message-ID:

Thanks for the quick response!
:) I will check it out when it's ready and once I have some free time.

Kind regards,
Aleksander Stankiewicz

On Tue, 25 Aug 2020 at 13:20, Boris Kolpackov wrote:

> Aleksander Stankiewicz writes:
>
> > do you have docker image containing working service for private repos like
> > cppget.org (so I attach my persistent volume only for instance to store
> > private libs)?
>
> Not yet but we are working on something along these lines as we speak.
> Specifically, it will be a VM instead of a container (we don't have the
> expertise to make things work reliably as a container) but there will
> also be a setup script if you would like to try to install things in
> a container. We should have something ready to try in about a week.
>
> > The other thing that very slows adoption of it is missing integration with
> > (at least) VSCode.
>
> There is currently no integration but there is talk of writing a VSCode
> plugin. I will ping the person interested (Joel).
>
> P.S. While b2 is a natural abbreviation for build2, it's ambiguous with
> the "Boost Build" build system which they also abbreviate as b2 (and get
> very upset if anyone else tries to use this name ;-)). So we try to stick
> to the full name (build2).

--
Kind regards
Aleksander Stankiewicz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: