[build2] Improvements to PostScript and PDF documentation generation

Matthew Krupcale mkrupcale at matthewkrupcale.com
Fri Mar 15 02:35:41 UTC 2019


Hello,

While looking at the existing PDF documentation[1], I noticed that
several glyphs were wrong, e.g. in the `tree` outputs. On further
inspection, I realized that the generated PostScript was encoded in
ISO-8859-1,

$ file -i build2-build-system-manual-letter.ps
build2-build-system-manual-letter.ps: application/postscript; charset=iso-8859-1

which cannot represent many of the Unicode characters that can appear
in the UTF-8 encoded XHTML generated by CLI from the *.cli
documentation files,

$ file -i doc/build2-build-system-manual.xhtml
doc/build2-build-system-manual.xhtml: text/html; charset=utf-8
$ file -i doc/manual.cli
doc/manual.cli: text/x-c; charset=utf-8

and the Perl script used to generate the PS from HTML does not appear
to support Unicode encoding as far as I can tell[2].
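
The mismatch can be illustrated with a short Python sketch. The sample
string is an assumption modelled on typical `tree` output, not text
copied from the manual; the sketch only shows that the box-drawing
characters have no ISO-8859-1 representation.

```python
# Hypothetical sample resembling `tree` output; not copied from the manual.
sample = "├── build/\n└── buildfile"


def unencodable(text, encoding="iso-8859-1"):
    """Return the distinct characters of text that encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


if __name__ == "__main__":
    for ch in unencodable(sample):
        print(f"U+{ord(ch):04X} has no ISO-8859-1 representation")
```

Accented Latin letters such as "é" pass this check, which is consistent
with only some glyphs in the PDF being wrong.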

This led me to consider alternative methods for creating PS/PDF
documentation with the right encoding. Pandoc appeared to be a more
recent tool, but the conversion failed:

$ pandoc -f html -t latex -o build2-build-system-manual.pdf build2-build-system-manual.xhtml
Error producing PDF.
! Argument of \LT@nofcols has an extra }.
<inserted text>
                \par
l.117 \begin{longtable}[]{@{}ll@{}}

I didn't think forcing this conversion to work would be a good
long-term strategy, so I decided to add LaTeX output to CLI[3] and
generate the PS/PDF using pdflatex[4-9].

LaTeX is quite powerful and gives a lot more control over the
generated PDF than any of these HTML conversion programs could, so if
generating PS/PDF documentation is desirable, this is probably the
best way to do so. To generate the documentation this way, however,
you will currently need several TeX Live packages, including:

pmboxdraw (part of oberdiek)
upquote
fancyhdr
parskip
tcolorbox
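
One way to check for these packages before building is to ask
`kpsewhich` (shipped with TeX Live) whether it can find each style
file. The following is a minimal sketch, assuming each package installs
a file named <package>.sty; the function names are hypothetical and are
not part of build2 or CLI.

```python
import shutil
import subprocess

# Style files for the packages listed above.
# Assumption: each package provides a file named <package>.sty.
STYLE_FILES = [
    "pmboxdraw.sty",
    "upquote.sty",
    "fancyhdr.sty",
    "parskip.sty",
    "tcolorbox.sty",
]


def kpsewhich_lookup(filename):
    """Return True if kpsewhich can locate filename in the TeX search path."""
    result = subprocess.run(
        ["kpsewhich", filename], capture_output=True, text=True
    )
    return result.returncode == 0 and bool(result.stdout.strip())


def missing_style_files(files=STYLE_FILES, lookup=kpsewhich_lookup):
    """Return the entries of files that lookup cannot find."""
    return [f for f in files if not lookup(f)]


if __name__ == "__main__":
    if shutil.which("kpsewhich") is None:
        print("kpsewhich not found; is TeX Live installed?")
    else:
        missing = missing_style_files()
        print("missing:", ", ".join(missing) if missing else "none")
```

The lookup is passed in as a parameter so that the check can be
exercised without a TeX installation.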

For the most part everything seems to work: internal and external
links / references, both program options / man page type and regular
documentation, Unicode characters, lists, formatting, and notes. Many
of the CLI additions are based on, and are often quite similar to, the
code for the HTML/man output types, so they could probably be
consolidated / shared. One place where LaTeX handling differs from
both is quotations: usually, the opening quote should be written
as `` and the closing quote as '', but handling this in the
cli/context.cxx:format_line function was more complicated than it was
worth, so quotes are passed through unchanged, as for the HTML/man
outputs.
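
For illustration, the simplest form of the quote conversion described
above can be sketched in Python. This is not the code in
cli/context.cxx; `latex_quotes` is a hypothetical helper that assumes
straight double quotes always come in balanced, non-nested pairs.

```python
def latex_quotes(text):
    """Replace straight double quotes with alternating `` and ''."""
    out = []
    opening = True  # the next '"' is treated as an opening quote
    for ch in text:
        if ch == '"':
            out.append("``" if opening else "''")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(latex_quotes('the "tree" outputs'))
```

A real implementation would also have to leave quotes inside code spans
and literal blocks alone and cope with unbalanced quotes, which hints at
why doing this inside a line formatter gets complicated.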

There's obviously room for adjustment/improvement, but this is a first
pass to get an idea if this is even something you'd like to do. The
`tcolorbox` package in particular is very powerful[10], and there's
the potential to actually do proper code highlighting using either
`listings` or `minted` + `Pygments`. Proper code highlighting I
suppose will require some change in the parsing / .cli files to
indicate the language, though.

Let me know if this interests you.

Best,
Matthew

[1] https://build2.org/build2/doc/build2-build-system-manual-letter.pdf#page=6
[2] https://linux.die.net/man/1/html2ps
[3] https://fedorapeople.org/cgit/mkrupcale/public_git/cli.git/log/?h=generate-latex
[4] https://fedorapeople.org/cgit/mkrupcale/public_git/build2-style.git/log/?h=generate-latex
[5] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/log/?h=generate-latex
[6] https://fedorapeople.org/cgit/mkrupcale/public_git/bpkg.git/log/?h=generate-latex
[7] https://fedorapeople.org/cgit/mkrupcale/public_git/bdep.git/log/?h=generate-latex
[8] https://fedorapeople.org/cgit/mkrupcale/public_git/build2-toolchain.git/log/?h=generate-latex
[9] https://mkrupcale.fedorapeople.org/build2/build2-docs-a4-pdf.tar.xz
[10] http://texdoc.net/texmf-dist/doc/latex/tcolorbox/tcolorbox.pdf


