[build2] Improvements to PostScript and PDF documentation generation
Matthew Krupcale
mkrupcale at matthewkrupcale.com
Fri Mar 15 02:35:41 UTC 2019
Hello,
While looking at the existing PDF documentation[1], I noticed that
several glyphs were wrong, e.g. in the `tree` output. Upon further
inspection, I realized that the generated PostScript was encoded in
ISO-8859-1:
$ file -i build2-build-system-manual-letter.ps
build2-build-system-manual-letter.ps: application/postscript; charset=iso-8859-1
which cannot represent many of the characters (such as box-drawing
glyphs) that can appear in the UTF-8 XHTML generated by CLI from the
*.cli documentation files:
$ file -i doc/build2-build-system-manual.xhtml
doc/build2-build-system-manual.xhtml: text/html; charset=utf-8
$ file -i doc/manual.cli
doc/manual.cli: text/x-c; charset=utf-8
and the Perl script used to generate the PS from HTML does not appear
to support Unicode encoding as far as I can tell[2].
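
To check whether a given UTF-8 file contains characters that
ISO-8859-1 cannot represent, something like the following sketch could
be used. The function name fits_latin1 is made up for illustration,
and it assumes an iconv (such as the GNU one) that exits with a
non-zero status when it meets a character it cannot convert.

```shell
# Hypothetical helper: succeeds (exit status 0) only if every character
# in the UTF-8 file "$1" can also be encoded in ISO-8859-1.
# Assumes iconv fails on unconvertible characters instead of
# substituting a placeholder.
fits_latin1() {
  iconv -f UTF-8 -t ISO-8859-1 "$1" >/dev/null 2>&1
}
```

For example, `fits_latin1 doc/build2-build-system-manual.xhtml || echo
"not representable in ISO-8859-1"` should print the message if the
file contains box-drawing characters like those in the `tree` output.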
This led me to consider alternative methods for creating PS/PDF
documentation with the right encoding. Pandoc appeared to be a more
recent tool, but the conversion failed:
$ pandoc -f html -t latex -o build2-build-system-manual.pdf \
    build2-build-system-manual.xhtml
Error producing PDF.
! Argument of \LT@nofcols has an extra }.
<inserted text>
\par
l.117 \begin{longtable}[]{@{}ll@{}}
I didn't think forcing this conversion to work would be a good
long-term strategy, so I decided to add LaTeX output to CLI[3] and
generate the PS/PDF using pdflatex[4-9].
LaTeX is quite powerful and gives a lot more control over the
generated PDF than any of these HTML conversion programs could ever
hope to offer, so if generating PS/PDF documentation is desirable,
this is probably the best way to do so. To generate the documentation
this way, however, you will currently need several TeX Live packages,
including:
pmboxdraw (part of oberdiek)
upquote
fancyhdr
parskip
tcolorbox
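
A rough way to check that these packages are installed is to ask
kpsewhich, which ships with TeX Live, for each package's .sty file.
The function name missing_tex_packages below is hypothetical, and the
sketch assumes each package provides a .sty file named after it.

```shell
# Hypothetical helper: print the name of every package given as an
# argument whose .sty file kpsewhich cannot locate.
# Usage: missing_tex_packages pmboxdraw upquote fancyhdr parskip tcolorbox
missing_tex_packages() {
  for pkg in "$@"; do
    kpsewhich "$pkg.sty" >/dev/null 2>&1 || printf '%s\n' "$pkg"
  done
}
```

If the function prints nothing, all of the listed packages were found.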
For the most part everything seems to work: internal and external
links/references, both the program options/man page type and regular
documentation, Unicode characters, lists, formatting, and notes. Many
of the CLI additions are based on, and often quite similar to, the
code for the HTML/man output types, so they could probably be
consolidated or shared. One place where LaTeX handling differs from
both is quotations: normally the opening quote should be written as ``
and the closing quote as '', but handling this in the
cli/context.cxx:format_line function was more complicated than it was
worth, so quotes are simply passed through unchanged, as for the
HTML/man outputs.
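
For illustration only, the usual LaTeX quoting convention amounts to
alternating between `` and '' for each straight double quote. The
sketch below does this with awk in a shell function. The name
to_latex_quotes is hypothetical, it is not the code in
cli/context.cxx, and it deliberately ignores cases a real
implementation would have to handle, such as quotes inside code spans
or unbalanced quotes.

```shell
# Hypothetical filter: read text on stdin and replace straight double
# quotes alternately with `` (opening) and '' (closing).
# The open/closed state carries over from one input line to the next.
to_latex_quotes() {
  awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        # \047 is the octal escape for a single quote character.
        out = out (inq ? "\047\047" : "``")
        inq = !inq
      } else {
        out = out c
      }
    }
    print out
  }'
}
```

It can be tried on a single line with `printf '%s\n' 'He said "hello"'
| to_latex_quotes`.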
There's obviously room for adjustment and improvement, but this is a
first pass to find out whether this is something you'd like to do at
all. The `tcolorbox` package in particular is very powerful[10], and
there's the potential to do proper code highlighting using either
`listings` or `minted` + `Pygments`. I suppose proper code
highlighting would require some change to the parsing or to the .cli
files to indicate the language, though.
Let me know if this interests you.
Best,
Matthew
[1] https://build2.org/build2/doc/build2-build-system-manual-letter.pdf#page=6
[2] https://linux.die.net/man/1/html2ps
[3] https://fedorapeople.org/cgit/mkrupcale/public_git/cli.git/log/?h=generate-latex
[4] https://fedorapeople.org/cgit/mkrupcale/public_git/build2-style.git/log/?h=generate-latex
[5] https://fedorapeople.org/cgit/mkrupcale/public_git/build2.git/log/?h=generate-latex
[6] https://fedorapeople.org/cgit/mkrupcale/public_git/bpkg.git/log/?h=generate-latex
[7] https://fedorapeople.org/cgit/mkrupcale/public_git/bdep.git/log/?h=generate-latex
[8] https://fedorapeople.org/cgit/mkrupcale/public_git/build2-toolchain.git/log/?h=generate-latex
[9] https://mkrupcale.fedorapeople.org/build2/build2-docs-a4-pdf.tar.xz
[10] http://texdoc.net/texmf-dist/doc/latex/tcolorbox/tcolorbox.pdf