[build2] Improvements to PostScript and PDF documentation generation

Tue Jun 18 12:07:11 UTC 2019

Matthew Krupcale <mkrupcale at matthewkrupcale.com> writes:

> On Sat, Mar 23, 2019 at 11:00 AM Boris Kolpackov <boris at codesynthesis.com> wrote:
>
> > Interesting. On a related note, I've been thinking of adding support
> > for Markdown as output (so that we can generate README.md's that can
> > be then shown on GitHub, etc). Do you know if there is a good tool
> > for converting Markdown to PDF?
> 
> I've not tried it personally, but it looks like Pandoc does also
> support Markdown to PDF in general, although it may run into the same
> problems as the HTML -> PDF I tried to do with it. Perhaps it won't
> since it looks like the troublesome area was concerning table output,
> so if tables aren't used in the Markdown, Pandoc might work.

I did a bit of reading on pandoc and here are my key takeaways:

1. Markdown is its main input format so presumably PDF output is well
   supported for all/most of its constructs (unlike for HTML input).

2. It supports PDF output via multiple paths:

   * PDF output via PDFLaTeX
   * PDF output via XeLaTeX
   * PDF output via LuaTeX
   * PDF output via ConTeXt
   * PDF output via wkhtmltopdf

That last path made me aware of wkhtmltopdf, which is a Qt WebKit-based
converter that seems to be similar in nature to html2ps. Maybe it's
worth a try on our XHTML output.
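
For reference, the paths listed above correspond to values of pandoc's
--pdf-engine option. Below is a minimal Python sketch that only builds
the command lines and does not run pandoc. The pandoc_pdf_command()
helper and the README.md/README.pdf file names are made up for
illustration, and using the commonmark reader as the input format is an
assumption:

```python
import shlex

# Maps the path names from the list above to pandoc's --pdf-engine values.
PDF_ENGINES = {
    "PDFLaTeX": "pdflatex",
    "XeLaTeX": "xelatex",
    "LuaTeX": "lualatex",
    "ConTeXt": "context",
    "wkhtmltopdf": "wkhtmltopdf",
}


def pandoc_pdf_command(source, target, path, source_format="commonmark"):
    """Return the pandoc argv for converting source to PDF via the given path.

    This helper is hypothetical: it only assembles arguments, so nothing
    here checks that pandoc or the chosen engine is actually installed.
    """
    engine = PDF_ENGINES[path]  # raises KeyError for an unknown path
    return [
        "pandoc",
        "--from=" + source_format,
        "--pdf-engine=" + engine,
        "--output=" + target,
        source,
    ]


if __name__ == "__main__":
    # Print one command line per path (file names are placeholders).
    for path in PDF_ENGINES:
        print(shlex.join(pandoc_pdf_command("README.md", "README.pdf", path)))
```

Running the printed commands against the same input would be a quick way
to compare how each path copes with the constructs we generate.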

> Yeah I think LaTeX can handle most nested constructs pretty well:
> nested lists are no problem. There are some limitations on nesting
> things, depending on what environment or command you use, though. For
> example, notes with complex contents like verbatim / pre-formatted
> blocks I think cannot appear in \parbox or \fbox, which is one reason
> I'm using tcolorbox.

Your reply actually refreshed my memory on my annual experience of
using LaTeX to make conference slides. And the more I thought about
it the more I realized how much I actually dislike that experience.
It normally goes like this (it's a bit of a rant, sorry):

1. Look for an example of whatever I am trying to do in one of the
   previous slide decks. If found, great, use it and don't ask any
   questions.

2. Otherwise, try to do it in what seems like a sensible and
   straightforward way. Sometimes it works but often it doesn't,
   in which case I get either incomprehensible diagnostics or
   things looking off.

3. Google time: find someone having the same problem on Stack Overflow.
   Nine times out of ten, however, the answer is not "this is what you
   did wrong, that's the reason, and this is how to do it correctly".
   Rather, it is "oh, you shouldn't be using package foo for that but
   instead foox/fancyfoo/etc".

The slides look great, though. ;-)

Ranting aside, LaTeX documents are crafted, not written, which makes
it seem like a really poor choice as a machine-generated output format.

So I am wondering if it would be better to produce CommonMark and let
someone (e.g., pandoc) who specializes in such conversions deal with
the nuances of producing LaTeX?

Now, Markdown is not exactly meant for program output, at least not to
the same degree as, say, XHTML. But it feels a lot closer to the CLI
language so hopefully the mapping should be straightforward.

Again, I feel bad disregarding the work you have done but I am afraid
it will turn into a never-ending maintenance/customization nightmare (I
can see requests along the lines of "X does not work with Y, you should
add an option to use fancy-Y instead").

What do you think? Would you be willing to explore a CommonMark output
implementation?

BTW, not long ago we packaged cmark-gfm[1], which is a C library for
parsing CommonMark and transforming it to various formats (including
LaTeX). We currently use its to-HTML transformation in brep to
display Markdown package descriptions (for example[2]). It also
crossed my mind that we could potentially use this library in CLI
for something interesting.

[1] https://cppget.org/libcmark-gfm
[2] https://cppget.org/libstud-uuid