Developing with PDF — Hacker News Books

rayiner · 2022-09-30 · Original thread

The slowness seems to come from the UI. PDF engines, except pdf.js, are very fast. On my Mac Mini MuPDF can render 60-100 pages per second, which means for a typical PDF you don’t even have to cache more than a couple of pages in memory at a time. Emacs (pdf-tools) uses poppler, which is slower, but can still handle scrolling through 300 MB PDFs with no problem. And the Acrobat engine itself is fast. Prior to Acrobat DC it ran very smoothly on old hardware.

PDF has a lot of features, but it’s not complicated. If your PDF doesn’t contain a 3D model, that’s just some code that takes up room in the library but probably doesn’t even get paged in. A PDF is just a command stream, and the commands for drawing text are straightforward: https://www.oreilly.com/library/view/developing-with-pdf/978...
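
To illustrate the "command stream" point, here is a minimal sketch of the operators that draw one line of text inside a page's content stream. The operators (BT, Tf, Td, Tj, ET) are the standard PDF text operators; the helper names, the font resource name F1, and the default coordinates are assumptions made up for this example, and the page is assumed to declare F1 in its resource dictionary. It builds only the content stream, not a complete PDF file.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_content_stream(text: str, x: int = 72, y: int = 720,
                        font: str = "F1", size: int = 12) -> str:
    """Return the operators that draw one line of text at (x, y).

    Hypothetical helper: assumes the page resources define a font named
    `font` and that `text` only uses characters that font's encoding covers.
    """
    return "\n".join([
        "BT",                                # begin a text object
        f"/{font} {size} Tf",                # select the font resource and size
        f"{x} {y} Td",                       # move to the text position
        f"({escape_pdf_string(text)}) Tj",   # show the string
        "ET",                                # end the text object
    ])


print(text_content_stream("Hello (PDF)"))
```

Everything else a page draws (paths, images, colors) is expressed as similar operand-then-operator sequences in the same stream.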

As for cross-platform software, what about Qt, or GTK, or wxWidgets? There isn’t a single decent Electron app. Even VSCode uses CPU sitting there doing nothing. It’s a freaking text editor, what’s happening in the background?!

layer8 · 2022-05-04 · Original thread

The PDF spec is officially available here: http://www.adobe.com/go/pdfreference

There’s also this book which provides a good introduction and overview and is useful for understanding how the format works (although the PDF reference itself is pretty decent too, as far as specs go): https://www.oreilly.com/library/view/developing-with-pdf/978... (You can find a PDF copy if you look around.) EDIT: There’s also https://www.oreilly.com/library/view/pdf-explained/978144932... which might be even better.

However, be warned that the PDF format can be quite complex and is not exactly for the faint of heart. It’s best to use an established library to generate PDF output, like PDFBox, iText, PDFSharp, PDFKit, etc. Those tend to have their own tutorials.

For emphasis: Do not generate PDFs “by hand”! You risk inadvertently generating PDFs that do not fully conform to the spec, and not noticing it because PDF readers are quite lenient in what they accept. A lot of PDFs in the wild are non-conforming in some way or another, because their generators were written not against the spec but against “whatever Acrobat Reader accepts”. This is the bane of every piece of software on the receiving end that needs to process PDFs.
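
As one concrete example of a slip that is easy to miss: a hand-written file can carry a startxref offset that does not actually point at the cross-reference section, and many viewers will quietly repair that instead of complaining. Below is a rough sketch of a check for it. The function name and the sample bytes are made up for illustration, it only understands the classic `xref` table form (not the cross-reference streams allowed since PDF 1.5), and it is nowhere near a conformance checker.

```python
import re


def startxref_is_consistent(data: bytes) -> bool:
    """Return True if the last startxref offset lands on the `xref` keyword.

    Hypothetical helper covering only classic cross-reference tables; files
    that use cross-reference streams are reported as inconsistent.
    """
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not matches:
        return False
    offset = int(matches[-1])  # the last trailer is the authoritative one
    return data[offset:offset + 4] == b"xref"


def toy_file(offset_error: int = 0) -> bytes:
    """Build a toy file skeleton (not a usable PDF) to exercise the check."""
    body = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
    xref_pos = len(body) + offset_error  # a correct offset is len(body)
    return (body
            + b"xref\n0 1\n0000000000 65535 f \n"
            + b"trailer\n<< /Size 1 >>\n"
            + b"startxref\n" + str(xref_pos).encode("ascii") + b"\n%%EOF\n")


print(startxref_is_consistent(toy_file()))                 # offset is exact
print(startxref_is_consistent(toy_file(offset_error=3)))   # offset is off by 3
```

A miscounted offset like the second case is the kind of thing that creeps in when line endings or string lengths are tallied by hand, which is one reason to let an established library do the bookkeeping.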
