[build2] Improvements to PostScript and PDF documentation generation

Boris Kolpackov boris at codesynthesis.com
Sat Mar 23 15:00:47 UTC 2019


Matthew Krupcale <mkrupcale at matthewkrupcale.com> writes:

> While looking at the existing PDF documentation[1], I noticed several
> glyphs were wrong e.g. in the `tree` outputs. [...] the Perl script
> used to generate the PS from HTML does not appear to support Unicode
> encoding as far as I can tell[2].

Yes, we have this problem. That Perl script appears to be no longer
maintained/developed, so it is in a sense a dead end. I was actually
thinking of dropping PDF generation altogether but now we may not
have to.


> This led me to consider alternative methods for creating PS/PDF
> documentation with the right encoding. [...]
> 
> I didn't think forcing this conversion to work would be a good
> long-term strategy, so I decided to add LaTeX output to CLI[3] and
> generate the PS/PDF using pdflatex[4-9].

Interesting. On a related note, I've been thinking of adding Markdown
as an output format (so that we can generate README.md files that can
then be shown on GitHub, etc). Do you know if there is a good tool
for converting Markdown to PDF? This is not to imply that there is
anything wrong with the LaTeX idea or your implementation (see below).
I just want to understand the alternatives.


> LaTeX is quite powerful and gives a lot more control over the
> generated PDF than any of these HTML conversion programs could ever
> hope to do, so if generating PS/PDF documentation is desirable, this
> is probably the best way to do so. To generate the documentation this
> way, however, you will need several TeX Live packages (currently)
> including:
> 
> pmboxdraw (part of oberdiek)
> upquote
> fancyhdr
> parskip
> tcolorbox

Generally, we would want to minimize the use of extra packages,
especially the less mainstream (and therefore potentially buggier)
ones.

Could you elaborate on what parts of the CLI formatting they
are used for (if at all) and how mainstream they are?

Or, to put it another way, what packages are required (and why)
by the output of the CLI compiler itself (as opposed to the
prologues/epilogues)?


> Everything for the most part seems to work: [...]

Nice. From experience, the thing that caused us the most problems
is escaping of special characters, especially in the man output
where the escaping rules depend on the context. Do you have a
sense of how comprehensively your code handles this for LaTeX?

And now thinking about it, the second most problematic area is
nested constructs. For example, lists inside lists, etc. Though
I don't believe this is an issue with LaTeX.
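
To make the escaping concern concrete, here is a minimal sketch of
what text-mode escaping for LaTeX might look like. The function name
escape_latex() is hypothetical and is not taken from the CLI sources
or from the proposed patch. It only covers the ten characters that are
special in ordinary LaTeX text and assumes it is never applied to
verbatim-like contexts, where these characters must be left alone
(which is exactly the kind of context dependence mentioned above).

```cpp
#include <string>

// Hypothetical helper (not from the CLI sources): escape the characters
// that are special in ordinary LaTeX text mode. Must not be applied to
// text destined for verbatim-like environments.
//
std::string
escape_latex (const std::string& s)
{
  std::string r;
  r.reserve (s.size ());

  for (char c: s)
  {
    switch (c)
    {
      // These only need a leading backslash.
      //
    case '#': case '$': case '%': case '&':
    case '_': case '{': case '}':
      r += '\\';
      r += c;
      break;

      // These have no backslash-escaped form in text mode, so use the
      // corresponding text commands instead.
      //
    case '\\': r += "\\textbackslash{}";   break;
    case '~':  r += "\\textasciitilde{}";  break;
    case '^':  r += "\\textasciicircum{}"; break;

    default:
      r += c;
    }
  }

  return r;
}
```

With this sketch, escape_latex ("50% of $x_1") yields the LaTeX source
text 50\% of \$x\_1, while a string without special characters passes
through unchanged.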


> One place that LaTeX handling is somewhat different from both is
> in quotations: usually, the opening quote should be specified
> as `` and closing as '', but handling this in the cli/context.cxx:
> format_line function was more complicated than it was worth,
> so quotes are just passed on like normal as for HTML/man outputs.

I think this is perfectly fine.
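
For reference, the simplest context-free version of such a conversion
could be sketched as below. The name latex_quotes() is hypothetical
and this is not what cli/context.cxx does. It assumes that double
quotes are always balanced and never appear inside code spans, which
is precisely the assumption that does not hold in general and is why
handling this inside format_line would be more involved.

```cpp
#include <string>

// Hypothetical helper (not from cli/context.cxx): replace straight
// double quotes with LaTeX opening (``) and closing ('') quotes by
// simply alternating between the two. Assumes balanced quotes and no
// quotes inside code spans.
//
std::string
latex_quotes (const std::string& s)
{
  std::string r;
  bool open (false); // True while inside a quoted run.

  for (char c: s)
  {
    if (c == '"')
    {
      r += open ? "''" : "``";
      open = !open;
    }
    else
      r += c;
  }

  return r;
}
```

For example, latex_quotes ("say \"hi\" now") produces say ``hi'' now.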


> There's obviously room for adjustment/improvement, but this is a first
> pass to get an idea if this is even something you'd like to do.

I haven't had a chance to study the implementation in detail yet, but
from a cursory look this definitely seems interesting. And the output
looks very nice as well.


> Proper code highlighting I suppose will require some change in the
> parsing / .cli files to indicate the language, though.

Yes, that idea crossed my mind and I am sure we can do something
interesting here. The initial thinking was to allow sending pre-
formatted fragments through a custom filter that is expected to
produce something in the output format.


