Hello, current maintainer of wkhtmltopdf here. This looks like an interesting approach, as Qt/WebKit is effectively unsupported. wkhtmltopdf is further hampered by the fact that a lot of the functionality requires a patched Qt version, which hasn't been ported to Qt 5.x.
Switching to Blink is the obvious move, but which wrapper to use (CEF, Electron, etc.) is hard to decide. Because they all use a multi-process architecture, supporting the API around which the tools are built is not really possible. Time to work on it has also been in short supply recently.
So, good work by the authors. Wish we could get something better, but as long as browsers move at the pace they do, it's going to be tough.
Thanks for all your good (and hard!) work on wkhtmltopdf - it sets a high bar for usefulness. We'd love your input on areas that you think could be improved in Athena.
It's not that bad. On my system, starting and cleaning up a Docker container costs about 800ms, whereas running wkhtmltopdf on the New York Times home page takes ~4sec.
You say this now. Just wait a few years, when this is a common pattern for distributing cross-platform applications. I'd much rather start up an image than clutter my system with a pile of dependencies that I may even need to compile. No thanks.
Using Docker means that you don't have to install any of these dependencies on your computer, and, if it's set up correctly, it runs sandboxed from the rest of your system. So it's not just about the "overhead".
Does docker have a single system to report included libraries up to management, so that when, for example, there's a major hole in openssl or nginx or a php library or the Chromium engine, I will immediately know which docker images are vulnerable and which I don't have to worry about?
Because until that happens, the only way I can run docker images is when I've created them myself... and even then I really need a tool that does the above, so that I can rebuild them when the next CVE comes around.
... so build them yourself. And keep them in your own repository. And build them on a base that you have analyzed. You can probably use whatever tooling you are using for that purpose today. They aren't voodoo.
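For what it's worth, the check being asked for above is simple once you have a package inventory per image. A minimal sketch, assuming you already export such an inventory yourself; the function name, image names, and versions below are all hypothetical, and nothing here is a Docker feature:

```python
from typing import Dict, List, Set


def images_to_rebuild(
    inventory: Dict[str, Dict[str, str]],
    package: str,
    affected_versions: Set[str],
) -> List[str]:
    """Return the images whose recorded version of `package` is affected.

    `inventory` maps image name -> {package name: installed version}.
    How that inventory is produced is out of scope for this sketch.
    """
    return sorted(
        image
        for image, packages in inventory.items()
        # Images that do not ship the package at all are never flagged.
        if packages.get(package) in affected_versions
    )


# Hypothetical inventory; names and versions are placeholders.
inventory = {
    "reports/pdf-service": {"openssl": "1.0.0", "nginx": "2.0.0"},
    "web/frontend": {"openssl": "1.0.1"},
    "tools/batch": {"nginx": "2.0.0"},
}

flagged = images_to_rebuild(inventory, "openssl", {"1.0.0"})
```

This only does exact version matching. A real tool would need version ranges and a feed of advisories, which is the hard part.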
If you already have Docker running, this is a no-brainer. The Docker daemon is lightweight, and starting new containers takes milliseconds if the image is already on your machine (it gets downloaded automatically on the first run).
wkhtmltopdf frustrates me to no end. I've even got a wrapper script to help iterate through its various major versions and option flags in an attempt to get usable PDFs. I really hope this is a significant improvement.
I get that it's hard to convert web elements (via Chrome) to a readable/nice PDF, but wkhtmltopdf really isn't the miracle tool it is made out to be. Like you, I have to tweak the options majorly, and even then it fails to support proper page breaks between text lines (i.e. a line of text that is cut across a page break looks so unprofessional).
We are in need of a true web to PDF conversion tool; it's one of those things that is overlooked in the business sector but represents quite a valuable niche.
Please stress-test and feel free to submit problem cases as issues to the GitHub repo [0].
"Aggressive mode", which leverages Chromium's "simplify page" mode, might do some of what you want, but typically you need some of the funky print CSS rules to control widows/orphans.
The main reason we built this, however, is that wkhtmltopdf just crashes all the time. Other commenters seem to agree. We'd love feedback on whether Athena works in cases where other tools fail, as well as on the quality of conversion.
We started off using wkhtmltopdf for the reporting module in our application to generate PDFs. After using that for a while and searching far and wide for an alternative, we settled on PrinceXML and haven't looked back. It's not OSS, but it's really awesome... once integrated, just set it and forget it.
We are thinking about moving to PrinceXML from Java's Flying Saucer, which doesn't handle newer HTML constructs. It looks really good, but it's good to hear other feedback.
What options did you settle on? I have used it in production for chopping up roughly 35,000 documents (and counting) and I have somewhat bare-bones options.
Typical use would be `makepdf -o URL`. This saves the PDF in a temp folder, opens it to preview, then asks to save. If it encounters any fatal errors, it iterates through the available `wkhtmltopdf` binaries on PATH (I've got a couple) until it finds the first one that works.
Then there are a few options that can be added (ignore, lowquality, disable-javascript) and it will feed the appropriate option string for that version of `wkhtmltopdf` (since option strings may vary by version).
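The fallback logic described above can be sketched roughly like this. It is not the actual `makepdf` script: `first_working_binary`, `run_command`, and the `flag_table` layout are hypothetical names, and the per-binary flag strings are supplied by the caller rather than assumed here. The only thing taken from wkhtmltopdf itself is the `[options] input output` argument order.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[List[str]], int]


def run_command(argv: List[str]) -> int:
    """Default runner: execute argv and return its exit code."""
    try:
        return subprocess.run(argv, check=False).returncode
    except OSError:
        return 127  # binary missing or not executable


def first_working_binary(
    binaries: Sequence[str],
    url: str,
    output_path: str,
    options: Sequence[str] = (),
    flag_table: Optional[Dict[str, Dict[str, List[str]]]] = None,
    runner: Runner = run_command,
) -> Optional[str]:
    """Try each binary in order; return the first one that exits with 0.

    `flag_table` maps binary -> {wrapper option name: argv fragment}, so
    each binary gets the flag spelling that its version understands.
    Options a binary has no entry for are silently skipped.
    """
    flag_table = flag_table or {}
    for binary in binaries:
        per_binary = flag_table.get(binary, {})
        argv = [binary]
        for option in options:
            argv.extend(per_binary.get(option, []))
        argv.extend([url, output_path])
        if runner(argv) == 0:
            return binary
    return None
```

Passing the runner in as a parameter keeps the loop testable without any binaries installed. The preview and "ask to save" steps are left out.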
Except when dealing with multiple pages and multiple tables, footers and headers, and splitting tables correctly. Oh yeah, and the differences in handling local files between Windows and Linux.
That's about it :)
PS. I use it to create invoices; there aren't a lot of alternatives for it, though. So I'm happy it's here! I don't think it's an easy problem to solve.
This isn't a package replacement, it's a microservice in a Docker container.
Replacing one package with three is nothing to be proud of. Replacing one with 13 plus wrapper code [1] is utterly ridiculous.
I don't know the background of this project, so there may be some important detail I'm missing, but it seems they would have done better to set up a stable wkhtmltopdf Dockerfile microservice instead, and avoid all of the wrapper code entirely. Cool as a demo of what's possible with tech today; less cool as a demo of how many hoops you can jump through to reproduce existing functionality.