[build2] Improvements to PostScript and PDF documentation generation

Sat Mar 23 18:45:13 UTC 2019

On Sat, Mar 23, 2019 at 11:00 AM Boris Kolpackov
<boris at codesynthesis.com> wrote:
> Interesting. On a related note, I've been thinking of adding support
> for Markdown as output (so that we can generate README.md's that can
> be then shown on GitHub, etc). Do you know if there is a good tool
> for converting Markdown to PDF?

I haven't tried it personally, but it looks like Pandoc also supports
Markdown to PDF conversion in general. It may run into the same
problems as the HTML -> PDF conversion I tried to do with it, though
perhaps not: the troublesome area seemed to be table output, so if
tables aren't used in the Markdown, Pandoc might work.

> Could you elaborate on what parts of the CLI formatting they
> are used for (if at all) and how mainstream they are?
>
> Or, to put it another way, what packages are required (and why)
> by the output of the CLI compiler itself (as opposed to the
> prologues/epilogues).

Almost all of the CLI output can be compiled with the standard LaTeX
packages. The only exception I can think of is the `tcolorbox`
package, but this could be made an option in the CLI output
generation. I chose it for the out-of-line notes since it looked the
best, but they could also be handled using normal paragraphs.
`tcolorbox` handles very long (i.e. multi-page) notes well while still
setting them apart from the main text like a note ought to be, and it
can hold pretty much any arbitrary LaTeX inside the environment, like
a minipage environment. In-line notes are currently handled using
footnotes.
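
To illustrate what such an option could look like, here is a minimal
sketch. The function name `format_note` and the `use_tcolorbox` flag
are hypothetical and not taken from the actual patch, and the body is
assumed to be already-escaped LaTeX:

```cpp
#include <string>

// Hypothetical helper (not the actual CLI code): wrap an out-of-line note
// either in a tcolorbox environment or, if the package should be avoided,
// in an ordinary paragraph. `body` is assumed to be already-escaped LaTeX.
std::string
format_note (const std::string& body, bool use_tcolorbox)
{
  if (use_tcolorbox)
    return "\\begin{tcolorbox}\n" + body + "\n\\end{tcolorbox}\n";

  // Fallback: a normal paragraph with a bold "Note:" lead-in.
  return "\\par\\textbf{Note:} " + body + "\\par\n";
}
```

With the flag off, the generated document would no longer need
`tcolorbox` at all, at the cost of the note being less visually
distinct.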

The same can basically be said for the prologue/epilogue
requirements: most of them are provided by the base LaTeX packages,
with a few exceptions (e.g. `upquote`). Even so, some of these are
just for cosmetic improvements and could be considered optional:
`fancyhdr` to add customized headers/footers, `textcomp` and `upquote`
to fix the appearance of double and single quotes, `parskip` to
correct the spacing of `tcolorbox` notes, `calc` to perform a length
calculation (could be hard-coded), and `geometry` to set the page
margins (these could be left to the defaults, but the default margins
look relatively large if you're not used to LaTeX output).

In summary, these are the packages and what requires them:

Required by  TeX Live   Fedora                  Debian
-----------------------------------------------------------------------
CLI output   hyperref   texlive-hyperref        texlive-latex-base
CLI output   pdflatex   texlive-latex           texlive-latex-base
CLI output   tcolorbox  texlive-tcolorbox       texlive-latex-extra
Prologue     babel      texlive-babel           texlive-latex-base
Prologue     calc       texlive-tools           texlive-latex-base
Prologue     fancyhdr   texlive-fancyhdr        texlive-latex-base
Prologue     fontenc    texlive-latex           texlive-latex-base
Prologue     geometry   texlive-geometry        texlive-latex-base
Prologue     inputenc   texlive-latex           texlive-latex-base
Prologue     parskip    texlive-parskip         texlive-latex-recommended
Prologue     pmboxdraw  texlive-oberdiek        texlive-latex-base
Prologue     textcomp   texlive-latex           texlive-latex-base
Prologue     upquote    texlive-upquote         texlive-latex-extra

> Nice. From experience, the thing that caused us the most problems
> is escaping of special characters, especially in the man output
> where the escaping rules depend on the context. Do you have a
> sense of how comprehensively your code handles this for LaTeX?

As far as I can tell, I've handled all the cases[1,2] in both
latex.cxx:escape_latex and context.cxx:format_line. While making this
work with the build2 documentation I fixed several escaping errors
along the way, so it's been tested somewhat by that. One place where
the escaping is context-dependent is URLs[3], but I think this is
handled by calling format_line(ot_plain) at context.cxx:753 for the
link_target, so it shouldn't have LaTeX escaping.
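
For reference, the reserved characters from [1] can be covered by
something along these lines. This is only a sketch under my own naming
(`escape_latex_sketch` is hypothetical) and not the actual
latex.cxx:escape_latex code. It also ignores context, so it would not
be appropriate for link targets:

```cpp
#include <string>

// Hypothetical sketch, not the actual latex.cxx:escape_latex: escape the
// ten LaTeX reserved characters for use in ordinary text.
std::string
escape_latex_sketch (const std::string& s)
{
  std::string r;
  r.reserve (s.size ());

  for (char c: s)
  {
    switch (c)
    {
    // These seven only need a leading backslash.
    case '#': case '$': case '%': case '&':
    case '_': case '{': case '}':
      r += '\\';
      r += c;
      break;
    // A backslash prefix means something else for these three (line
    // break, accents), so use the named text commands instead.
    case '\\': r += "\\textbackslash{}";   break;
    case '~':  r += "\\textasciitilde{}";  break;
    case '^':  r += "\\textasciicircum{}"; break;
    default:   r += c;
    }
  }

  return r;
}
```

For example, passing `50% of $x_1` through this would give
`50\% of \$x\_1`.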

> And now thinking about it, the second most problematic area is
> nested constructs. For example, lists inside lists, etc. Though
> I don't believe this is an issue with LaTeX.

Yeah, I think LaTeX can handle most nested constructs pretty well:
nested lists are no problem. There are some limitations on nesting
things, though, depending on which environment or command you use. For
example, I think notes with complex contents like verbatim /
pre-formatted blocks cannot appear in \parbox or \fbox, which is one
reason I'm using tcolorbox.

> I haven't had a chance to study the implementation in detail yet but
> from the cursory look this definitely looks interesting. And the output
> looks very nice as well.

Thanks, take your time.

> Yes, that idea crossed my mind and I am sure we can do something
> interesting here. The initial thinking was to allow sending pre-
> formatted fragments through a custom filter that is expected to
> produce something in the output format.

Yeah, for the LaTeX output, you can just indicate the language of the
pre-formatted fragment contained in the `listings` or `minted`
environment and then let them format it (using Pygments in the case of
`minted`) during PDF generation.
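
As a rough sketch of what the generator side could emit for `minted`
(the function name `format_pre` is hypothetical, and how the language
would be specified in the CLI input is left open here):

```cpp
#include <string>

// Hypothetical helper: emit a pre-formatted fragment as a minted
// environment. `lang` is a Pygments lexer name such as "cpp". The fragment
// is passed through as is, since minted treats its contents verbatim.
std::string
format_pre (const std::string& lang, const std::string& code)
{
  std::string r ("\\begin{minted}{" + lang + "}\n");
  r += code;

  // \end{minted} has to start on its own line.
  if (code.empty () || code.back () != '\n')
    r += '\n';

  r += "\\end{minted}\n";
  return r;
}
```

Note that `minted` needs pdflatex to be run with -shell-escape so that
it can call Pygments, which `listings` does not require.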

[1] https://en.wikibooks.org/wiki/LaTeX/Basics#Reserved_Characters
[2] http://www.cespedes.org/blog/85/how-to-escape-latex-special-characters
[3] https://tex.stackexchange.com/a/207641