[build2] Improvements to PostScript and PDF documentation generation

Matthew Krupcale mkrupcale at matthewkrupcale.com
Tue Jun 18 18:44:34 UTC 2019


On Tue, Jun 18, 2019 at 8:07 AM Boris Kolpackov <boris at codesynthesis.com> wrote:
>
> I did a bit of reading on pandoc and here are my key takeaways:
>
> That last point made me aware of wkhtmltopdf which is a Qt WebKit-based
> converter that seems to be similar in nature to html2ps. Maybe it's
> worth a try on our XHTML output.

Yeah, pandoc or wkhtmltopdf might be able to produce PDFs that are
good enough, although you won't have as much control over the output
as you would with LaTeX.

> Your reply actually refreshed my memory on my annual experience of
> using LaTeX to make conference slides. And the more I thought about
> it the more I realized how much I actually dislike that experience.
> It normally goes like this (it's a bit of a rant, sorry):
>
> The slides look great, though. ;-)
>
> Ranting aside, LaTeX documents are crafted, not written, which makes
> it seem like a really poor choice as a machine-generated output format.

Yeah, I'm no LaTeX expert either and often have to reuse pieces I've
found on Stack Overflow. But the nice thing about LaTeX is that once
you've got it right, it works well and produces consistent results.
Slides are also a lot harder than ordinary documents, because they
usually require careful manual control over content positioning, which
is exactly what LaTeX tries to handle automatically.

For the CLI-generated LaTeX documentation, I didn't have to resort to
anything really fancy, aside from perhaps tcolorbox (which was only
necessary because note blocks can contain verbatim content).
Everything else CLI generates is pretty standard LaTeX. The packages
loaded by the prologue are optional and just make things look a
little better, except for pmboxdraw, which is needed to typeset the
Unicode box-drawing characters in directory tree listings.
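
To illustrate that split between optional and required packages, here
is a minimal sketch of how a generator might assemble the prologue. It
is in Python purely for brevity; the helper names and the choice of
microtype and hyperref as the "cosmetic" packages are hypothetical, not
what CLI actually emits. The idea it shows is to add pmboxdraw only when
the body contains characters from the Unicode Box Drawing block
(U+2500 to U+257F), while the cosmetic packages can be switched off.

```python
# Hypothetical sketch: the helper names and the cosmetic package list are
# illustrative assumptions, not the prologue CLI actually generates.

# Cosmetic packages: the document still compiles without them.
COSMETIC_PACKAGES = ["microtype", "hyperref"]


def uses_box_drawing(body: str) -> bool:
    """True if body contains a character from the Unicode Box Drawing
    block (U+2500..U+257F), such as those used in directory trees."""
    return any("\u2500" <= ch <= "\u257f" for ch in body)


def build_prologue(body: str, cosmetic: bool = True) -> str:
    lines = [r"\documentclass{article}", r"\usepackage[utf8]{inputenc}"]
    if uses_box_drawing(body):
        # Required rather than cosmetic: gives pdfLaTeX glyphs for the
        # box-drawing characters in directory tree listings.
        lines.append(r"\usepackage{pmboxdraw}")
    if cosmetic:
        # hyperref conventionally goes last, so it ends the list.
        lines += [rf"\usepackage{{{pkg}}}" for pkg in COSMETIC_PACKAGES]
    return "\n".join(lines)


def make_document(body: str, cosmetic: bool = True) -> str:
    return "\n".join(
        [build_prologue(body, cosmetic), r"\begin{document}", body, r"\end{document}"]
    )
```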

> So I am wondering if it will be better to produce CommonMark and let
> someone (e.g., pandoc) who specializes in such conversions deal with
> nuances of producing LaTeX?

Certainly possible, but as I said above, it won't give you the same
control over the PDF's appearance that generating LaTeX directly does.

> Now, Markdown is not exactly meant for program output, at least not to
> the same degree as, say, XHTML. But it feels a lot closer to the CLI
> language so hopefully the mapping should be straightforward.

Yeah, I'm sure there is a pretty straightforward mapping between the
CLI language and Markdown. I think it would mostly be a variant of the
current text-format generation.
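
To give a feel for how direct that mapping could be, here is a minimal
sketch covering only inline formatting. It assumes CLI-style markup of
the form \b{...} (bold), \i{...} (italic) and \c{...} (code); the
function name is hypothetical, and headings, lists, links and escaping
of Markdown metacharacters are deliberately left out.

```python
import re

# Assumption: CLI-style inline markup \b{...} (bold), \i{...} (italic) and
# \c{...} (code). Headings, lists, links and escaping of Markdown
# metacharacters are not handled in this sketch.
_INLINE = re.compile(r"\\([bic])\{([^{}]*)\}")

_TO_MARKDOWN = {
    "b": lambda s: "**" + s + "**",
    "i": lambda s: "*" + s + "*",
    "c": lambda s: "`" + s + "`",
}


def inline_to_markdown(text: str) -> str:
    """Rewrite the innermost spans first and repeat until nothing changes,
    so nested markup such as a bold span around an italic one also works.
    (Markup inside a code span would be rewritten too; a real
    implementation would need to protect it.)"""
    while True:
        new = _INLINE.sub(lambda m: _TO_MARKDOWN[m.group(1)](m.group(2)), text)
        if new == text:
            return text
        text = new
```

For example, "Run \c{b update} to \b{rebuild} it." becomes
"Run `b update` to **rebuild** it."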

> Again, I feel bad disregarding the work you have done but I am afraid
> it will turn into a never-ending maintenance/customization nightmare (I
> can see requests along the lines of "X does not work with Y, you should add an
> option to use fancy-Y instead").

I certainly don't want to create a maintenance burden, but I don't
think it would be necessary to let users customize CLI's LaTeX
generation itself anyway. The generated LaTeX should express the
semantics and document structure, while the style and specifics are
defined in the prologue. If someone is not satisfied with the
particular elements (i.e. LaTeX environments or commands) used by
CLI, they should be able to either post-process the LaTeX or redefine
those elements in their prologue.

For example, say that they don't want to use tcolorbox for their note
blocks. They could either post-process the generated LaTeX to use
whatever environment they want (e.g. textually replace tcolorbox with
framed), or they could define their own tcolorbox environment with
\newenvironment{tcolorbox} in the prologue and give it whatever style
they like (provided the prologue doesn't also load the tcolorbox
package, since \newenvironment won't redefine an existing environment).
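
The post-processing route can be as small as a couple of regular
expression substitutions. A minimal sketch, assuming the prologue is
changed to load the framed package and that any tcolorbox options are a
single [...] group without nested brackets (the function name is
hypothetical):

```python
import re

# Hypothetical post-processing step: swap tcolorbox note blocks for the
# framed package's environment. Assumes \usepackage{framed} in the prologue
# and at most one [...] option group without nested brackets.
_BEGIN = re.compile(r"\\begin\{tcolorbox\}(\[[^\]]*\])?")
_END = re.compile(r"\\end\{tcolorbox\}")


def tcolorbox_to_framed(latex: str) -> str:
    # Lambdas avoid re.sub interpreting backslashes in the replacement text.
    latex = _BEGIN.sub(lambda m: r"\begin{framed}", latex)  # drops options
    return _END.sub(lambda m: r"\end{framed}", latex)
```

Since framed has no equivalent of tcolorbox's options, the sketch simply
drops them; anything more elaborate is probably easier to handle with the
prologue redefinition instead.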

As I said, I think the rest of the elements are pretty standard, but
users could customize those as well, using the same post-processing or
redefinition of environments/commands in the prologue. In the worst
case, they could modify CLI's LaTeX generation itself, but I don't
think that would be necessary for such standard elements.

> What do you think?

I think it really just depends on how much control over PDF
generation you think is worthwhile. If you look at the tcolorbox or
TikZ/PGF manuals themselves, they demonstrate what LaTeX -> PDF
documentation can look like. If the Markdown -> PDF or XHTML -> PDF
documentation is good enough, then direct control over the LaTeX is
not necessary.

I know the CLI language doesn't yet support specifying the language
of preformatted source code, but if it were to, do you know whether
pandoc can produce nicely formatted (e.g. syntax-highlighted) source
code in the Markdown -> PDF conversion? I'm not sure how html2ps or
wkhtmltopdf handle language-specific source code formatting in the
XHTML -> PDF conversion. The tcolorbox manual has some nice examples
of how this looks using either the listings or minted package, and I
think it would be fairly easy to implement in CLI's LaTeX generation.
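
Since CLI has no syntax for a code block's language yet, the following
is only a sketch of what the output side of such a feature might look
like, with a hypothetical lang value standing in for whatever the CLI
language would eventually provide. The lstlisting and minted
environments are real; the language-name tables are assumptions that
cover just a few entries. The Markdown variant puts the language in the
fenced code block's info string, which is where CommonMark keeps it.

```python
# Hypothetical: `lang` stands in for a not-yet-existing CLI feature for
# naming the language of preformatted code. The name tables below are
# assumptions covering only a few languages.
LISTINGS_NAMES = {"cxx": "C++", "c": "C", "bash": "bash"}  # listings' language=
PYGMENTS_NAMES = {"cxx": "cpp", "c": "c", "bash": "bash"}  # minted lexer names
FENCE = "`" * 3  # CommonMark code fence


def code_block(code: str, lang: str, backend: str) -> str:
    if backend == "listings":  # needs \usepackage{listings} in the prologue
        return "\\begin{lstlisting}[language=%s]\n%s\n\\end{lstlisting}" % (
            LISTINGS_NAMES[lang], code)
    if backend == "minted":  # needs \usepackage{minted} and -shell-escape
        return "\\begin{minted}{%s}\n%s\n\\end{minted}" % (PYGMENTS_NAMES[lang], code)
    if backend == "markdown":  # language goes in the fence's info string
        return "%s%s\n%s\n%s" % (FENCE, PYGMENTS_NAMES[lang], code, FENCE)
    raise ValueError("unknown backend: " + backend)
```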

> Would you be willing to explore a CommonMark output implementation?

I might be willing to give this a shot, but I may not have time to
work on it right now.

> BTW, not long ago we packaged cmark-gfm[1] which is a C library for
> parsing and transforming CommonMark to various formats (including
> LaTeX). We currently use its to-HTML transformation in brep to
> display Markdown package descriptions (for example[2]). It also
> crossed my mind that we could potentially use this library in CLI
> for something interesting.

Yeah, if CommonMark can express everything you need in your
documentation in sufficient detail, then CLI could generate only that
format directly and use cmark-gfm to produce the other formats.
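
A minimal sketch of that arrangement, assuming the cmark-gfm
command-line tool is installed and on PATH: CLI would write a single
CommonMark document, and each additional output format would be one
invocation of cmark-gfm. The helper names are hypothetical; --to and -e
(for GFM extensions such as table) are options of the cmark-gfm tool
itself. An in-process version would call the library's C API instead,
but the overall shape would be the same.

```python
import shutil
import subprocess

# Sketch only: assumes the cmark-gfm binary is on PATH. The helper names
# are hypothetical; --to and -e are options of the cmark-gfm tool.
FORMATS = ("html", "latex", "man", "xml", "commonmark", "plaintext")


def cmark_gfm_args(fmt: str, extensions=("table",)) -> list:
    """Build the command line that converts CommonMark on stdin to fmt."""
    if fmt not in FORMATS:
        raise ValueError("unsupported format: " + fmt)
    args = ["cmark-gfm", "--to", fmt]
    for ext in extensions:
        args += ["-e", ext]  # enable a GitHub Flavored Markdown extension
    return args


def convert(markdown: str, fmt: str) -> str:
    """Run cmark-gfm on the given Markdown text and return its output."""
    if shutil.which("cmark-gfm") is None:
        raise RuntimeError("cmark-gfm not found on PATH")
    result = subprocess.run(cmark_gfm_args(fmt), input=markdown,
                            capture_output=True, text=True, check=True)
    return result.stdout
```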

Best,
Matthew