Ok, I'm now banging my head against a brick wall with this one.

I have an HTML (not XHTML) document that renders fine in Firefox 3 and IE 7. It uses fairly basic CSS for styling.

I'm now after a way of converting it to PDF. I have tried:

  • DOMPDF: it had huge problems with tables. I factored out my large nested tables and it helped (before that it was just consuming up to 128M of memory, my memory limit in php.ini, and then dying) but it makes a complete mess of tables and doesn't seem to get images. The tables were just basic stuff with some border styles to add some lines at various points;
  • HTML2PDF and HTML2PS: I actually had better luck with this. It rendered some of the images (all the images are Google Chart URLs) and the table formatting was much better but it seemed to have some complexity problem I haven't figured out yet and kept dying with unknown node_type() errors. Not sure where to go from here; and
  • Htmldoc: this seems to work fine on basic HTML but has almost no support for CSS whatsoever so you have to do everything in HTML (I didn't realize it was still 2001 in Htmldoc-land...) so it's useless to me.

I tried a Windows app called Html2Pdf Pilot that actually did a pretty decent job but I need something that at a minimum runs on Linux and ideally runs on-demand via PHP on the Webserver.

I really can't believe I'm this stuck. Am I missing something?

Html2Pdf actually uses an embedded instance of IE to render the page, then converts that to PDF - probably through IE's print mechanism. – Joel Mueller Dec 24 '08 at 19:31
since it's a 2008 question, dompdf is much more mature now. ;-) – Hendra Uzia Jul 15 '11 at 9:03
dompdf now supports CSS 2.1 and can deal with @import, @media and @screen rules, and will load external stylesheets. It also comes bundled with everything required for it to work, although there are things you can install to get better performance than the default libs. code.google.com/p/dompdf – jammypeach Oct 4 '12 at 9:39
you should go for a webkit based effort to make sure your rendering is the same - have a look at [htm2pdf.co.uk/html-to-pdf-api] for a service based on wkhtmltopdf – user1914292 Mar 14 at 21:34

26 Answers

Accepted answer (115 votes)

Have a look at PrinceXML.

It's definitely the best HTML/CSS to PDF converter out there, although it's not free (But hey, your programming is not free either, so if it saves you 10 hours of work, you're home free.)

Oh yeah, did I mention that this is the first (and probably only) HTML2PDF solution that does full ACID2!?

http://princexml.com/samples/

After more testing... Prince XML is seriously cool. Nuff said. – cletus Jan 13 '09 at 1:18
PrinceXML is really awesome. If only it were not so expensive :-( – acme Sep 15 '10 at 10:03
My company wrote a web service built around Prince. Significantly cheaper upfront costs, and usable without needing to install anything: docraptor.com – Joel Meador Jan 11 '11 at 8:31
$3800 per server?! you got to be kidding... – Paktas Nov 3 '12 at 12:42
I don't know about the rest of you, but 10 hours of work != $3800USD – Andrew Fox Jan 10 at 3:10

Have a look at WKHTMLTOPDF. It is open source, free, and based on WebKit.

We wrote a small tutorial here.

Better than anything else I've used, simple and free. – MGOwen Nov 1 '09 at 23:14
This one operates on the best premise IMO. Bootstrap conversion off an existing renderer instead of writing one from scratch, which is not a trivial task. Furthermore, WebKit is written in C++ and therefore much faster and much less of a resource hog than a PHP-based implementation. – Koobz Feb 15 '10 at 12:36
Right approach. Perfect results. Tnx! – mac Jul 12 '10 at 16:46
We have had huge problems trying to get this to render fonts properly on CentOS servers. After literally weeks of messing around, it seems the only option is not to use CentOS. – Abhi Beckert Sep 18 '11 at 11:23
Wkhtmltopdf is very good, but is really suffering as a project. The Google code repo is a mess, nothing has been updated for years, there are abandoned forks all over the place, docs are out of sync, vital repos have disappeared, it's no longer possible to compile it statically with recent Qt and it's a complete nightmare to get it to compile anyway, so everyone is reliant on some limited old binaries (e.g. the latest OS X build on Google code is 32-bit for OS X 10.4!). It could really do with picking up by someone clueful in the Qt and webkit world. – Synchro Jun 3 at 13:26

After some investigation and general hair-pulling the solution seems to be HTML2PDF. DOMPDF did a terrible job with tables, borders and even moderately complex layout and htmldoc seems reasonably robust but is almost completely CSS-ignorant and I don't want to go back to doing HTML layout without CSS just for that program.

HTML2PDF looked the most promising but I kept having this weird error about null reference arguments to node_type. I finally found the solution to this. Basically, PHP 5.1.x worked fine with regex replaces (preg_replace_*) on strings of any size. PHP 5.2.1 introduced a php.ini config directive called pcre.backtrack_limit. What this config parameter does is cap the number of backtracking steps the PCRE engine may take on a match, which in practice limits the string length some patterns can handle. Why this was introduced I don't know. The default value was chosen as 100,000. Why such a low value? Again, no idea.

A bug was raised against PHP 5.2.1 for this, which is still open almost two years later.

What's horrifying about this is that when the limit is exceeded, the replace just silently fails. At least if an error had been raised and logged you'd have some indication of what happened, why and what to change to fix it. But no.
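The silent failure can at least be detected from PHP code. A minimal sketch, assuming you control the code that calls preg_replace(): the function returns NULL on error, and preg_last_error() reports why. The helper name checked_preg_replace is hypothetical and is not part of html2pdf.

```php
<?php
// Hypothetical helper (not part of html2pdf): turns a silent PCRE failure
// into an exception so that it shows up in the logs.
function checked_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_replace() returns NULL on error; preg_last_error() gives the
        // reason, e.g. PREG_BACKTRACK_LIMIT_ERROR for pcre.backtrack_limit.
        $code = preg_last_error();
        $why = ($code === PREG_BACKTRACK_LIMIT_ERROR)
            ? 'pcre.backtrack_limit exceeded'
            : 'PCRE error code ' . $code;
        throw new RuntimeException('preg_replace failed: ' . $why);
    }
    return $result;
}
```

Wrapping the replace calls this way would not fix the limit itself, but it should make the cause visible instead of leaving a null value to surface later as an unrelated error.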

So I have a 70k HTML file to turn into PDF. It requires the following php.ini settings:

  • pcre.backtrack_limit = 2000000; # probably more than I need but that's OK
  • memory_limit = 1024M; # yes, one gigabyte; and
  • max_execution_time = 600; # yes, 10 minutes.
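If editing php.ini is not an option, the same three settings can usually be applied at runtime, at the top of the script that does the conversion. This is only a sketch, assuming the host has not disabled ini_set() or set_time_limit().

```php
<?php
// Same values as the php.ini settings listed above, applied per script.
ini_set('pcre.backtrack_limit', '2000000'); // PCRE backtracking limit
ini_set('memory_limit', '1024M');           // one gigabyte
set_time_limit(600);                        // 10 minutes, counted from this call
```

Scoping the values to the one script keeps the generous limits away from the rest of the site.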

Now the astute reader may have noticed that my HTML file is smaller than 100k. The only reason I can guess as to why I hit this problem is that html2pdf does a conversion into xhtml as part of the process. Perhaps that took me over (although nearly 50% bloat seems odd). Whatever the case, the above worked.

Now, html2pdf is a resource hog. My 70k file takes approximately 5 minutes and at least 500-600M of RAM to create a 35 page PDF file. Unfortunately that is not quick enough (by far) for a real-time download, and the ratio of memory used to input size is roughly 8,500-to-1 (600M of RAM for a 70k file), which is utterly ridiculous.

Unfortunately, that's the best I've come up with.

Nice report, cletus. WTG! – Seb Oct 19 '09 at 12:49
your pcre.backtrack_limit tip saved my project. Thank you so much! – increddibelly Feb 8 '12 at 9:23
that's brilliant work! i wish i could give you more than 1 vote... – Moshe Shaham Jul 13 '12 at 23:45
Me too, "HTML2PDF" is the best PHP code that I have ever tested! – Mahoor13 Oct 3 '12 at 14:29

Why don't you try mPDF version 2.0? I have used it to create PDF documents and it works fine.

mpdf 5.0 works really well! – Dalen Nov 13 '10 at 0:56
+1 for the documentation, it's extremely clear! I'm gonna try this soon – Marco Demaio Oct 20 '11 at 15:20
It is true, mpdf really works and it is fast, it creates the pdf file on the fly. – conualfy Jan 16 at 19:55

Check out TCPDF. It has some HTML to PDF functionality that might be enough for what you need. It's also free!

Its support for rendering HTML is rather limited; you might want to read this: tcpdf.org/doc/classTCPDF.html#ac3fdf25fcd36f1dce04f92187c621407 – Hendra Uzia Jul 15 '11 at 8:17

I suggest http://docraptor.com (which uses PrinceXML as the "engine")

Unfortunately impossible to use if you want to generate large PDF-files with a lot of images. I think there is a 60 second timelimit on requests and if Docraptor needs to download a lot of files this will be exceeded, and no file will be made. – Vilhelm Jun 9 '11 at 9:55
This issue Vilhelm mentioned has been fixed. – illbzo1 Dec 28 '11 at 21:08

If you mean to create a PDF from PHP, PDFlib will help you.

Otherwise, if you want to convert an HTML page to PDF via PHP, you'll run into a bit of trouble.

So, the options I know are:

DOMPDF: a PHP class that wraps the HTML and builds the PDF. It works well, is customizable (if you know PHP), is based on PDFlib, and if I remember right it even handles some CSS. Bad news: it is slow when the HTML is big or very complex.

HTML2PS: the same as DOMPDF, but this one first converts to .ps (Ghostscript) and then to whatever format you need (PDF, JPG, PNG). For me it is a little better than DOMPDF, with better CSS compatibility, but it has the same speed problem.

Those two are PHP classes, but if you can install some software on the server and access it through passthru() or system(), take a look at these too:

wkhtmltopdf: based on WebKit (Safari's rendering engine), it is really fast and powerful. It seems to be the best one (at the moment) for converting HTML pages to PDF on the fly, taking only 2 seconds for a 3-page XHTML document with CSS2. It is a recent project, and the Google Code page is often updated.

htmldoc: this one is a tank; it really never stops or crashes. The project seems to have died in 2007, but if you don't need CSS compatibility it can work nicely for you.

TCPDF: this is an enhanced and maintained version of FPDF. See the main features of TCPDF; it also has a short execution time with great output. For a detailed tutorial on using the two most popular PDF generation classes, TCPDF and FPDF, please follow this link.

See these posts also:

  1. Converting HTML in PHP File to PDF File
  2. Best pdf generator in PHP , mpdf or fpdf?
  3. Export a html into PDF in PHP?
  4. Writing HTML with PHP variables to PDF file?
  5. How to convert html into pdf with php?
  6. Tool for exporting html as pdf
All of the products you've mentioned have already been brought up by others. Some are even in the OP's question... – Charles Aug 15 '12 at 18:10
@Charles: I have given answer in accordance with comparison between different pdf creators with description. – Somnath Muluk Aug 15 '12 at 18:30

Just to bump the thread, I've tried DOMPDF and it worked perfectly. I've used divs and other block level elements to position everything. Kept it strictly CSS2.1 and it played nicely.


There's a tutorial on Zend's devzone on generating pdf from php (part 1, part 2) without any external libraries. I never implemented this sort of solution, but since it's all php, you might find it more flexible to implement and debug.


I am using fpdf to produce pdf files using php. It's working well for me so far to produce simple outputs.


Thanks to the person who posted the "WKHTMLTOPDF" suggestion.

I was previously using "mPDF" which does a decent job of rendering HTML and CSS, however due to a recent issue with "nested tables", I am going to try "WKHTMLTOPDF".

So far, I've tested it with a few websites, and found it to be "lightning fast" and "pretty damn accurate" in terms of rendering as intended.

Given it's based on WebKit, I'm sure it'll do most websites without issue.

* One thing to note: in using it in a "web-app" you'll need to build a wrapper. For any programmer, this should be a breeze. *
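A wrapper of that kind could be sketched as follows. It assumes the wkhtmltopdf binary is on the server's PATH and is called in its basic form, wkhtmltopdf <input> <output>; the function names are hypothetical.

```php
<?php
// Hypothetical helper: builds the shell command, escaping both arguments so
// that URLs or paths with spaces or shell metacharacters stay intact.
function build_wkhtmltopdf_command($input, $outputPdf, $binary = 'wkhtmltopdf')
{
    return escapeshellcmd($binary) . ' '
         . escapeshellarg($input) . ' '
         . escapeshellarg($outputPdf);
}

// Hypothetical wrapper: runs the command and fails loudly on a non-zero exit.
function html_to_pdf($input, $outputPdf)
{
    $lines = array();
    $exitCode = 0;
    exec(build_wkhtmltopdf_command($input, $outputPdf) . ' 2>&1', $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException(
            "wkhtmltopdf exited with code $exitCode: " . implode("\n", $lines)
        );
    }
}
```

Escaping matters most when the input URL or output name comes from user data, since an unescaped value would be interpreted by the shell.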

-Peter

No, you don't need to write your own wrapper; there are PHP bindings (and for other languages, too): code.google.com/p/wkhtmltopdf/wiki/IntegrationWithPhp – Piskvor Nov 9 '11 at 7:36

Well if you want to find a perfect XHTML+CSS to PDF converter library, forget it. It's far from possible. Because it's just like finding a perfect browser (XHTML+CSS rendering engine). Do we have one? IE or FF?

I have had some success with DOMPDF. The thing is that you have to modify your HTML+CSS code to go with the way the library is meant to work. Other than that, I have pretty good results.

See below:

Original HTML

Converting HTML to PDF


Perhaps you might try using Tidy before handing the file to the converter. If one of the renderers chokes on some HTML problem (like an unclosed tag), it might help.
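With PHP's tidy extension, that pre-processing step might look like the sketch below. The function name repair_html and the two config options chosen are assumptions for illustration; if the extension is not installed, the input is passed through unchanged.

```php
<?php
// Hypothetical pre-processing step: repair markup with the tidy extension
// before passing it to whichever converter is in use.
function repair_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not installed: pass input through
    }
    // Emit XHTML and disable line wrapping; both are standard Tidy options.
    $config = array('output-xhtml' => true, 'wrap' => 0);
    $repaired = tidy_repair_string($html, $config, 'utf8');
    return ($repaired === false) ? $html : $repaired;
}
```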

Yes a valid point but I've thought of this already. There are no unmatched nor nonstandard tags in my HTML. – cletus Dec 24 '08 at 9:36
PhiLho: that remark helped me out today! – jerrygarciuh Oct 2 '09 at 3:02

I don't think a PHP class is the best way to render an XHTML page with CSS.

What happens when a new CSS rule comes out? (CSS 3.0 is coming soon...)

The best way to render an HTML page is, obviously, a browser. Firefox 3.0 can natively 'print' to PDF format, and torisugary developed an extension (command line print) to use it. Here you'll find it.

Anyway, there are still many problems with running Firefox just as a PDF converter...

At the moment, I think that wkhtmltopdf is the best (it uses the same engine as the Safari browser): fast, quick, awesome. Yes, open source as well... Give it a look.


Try grabbing the latest nightly dompdf build - I was using an older version that was a terrible resource hog and took forever to render my pdf. After grabbing a nightly from here:

http://eclecticgeek.com/dompdf/

It only took a few seconds to generate the PDF - AND it was just as nicely rendered as with PrinceXML / Docraptor. Seems like they've seriously optimized the dompdf code since I last used it!


Good news! Snappy!!

Snappy is a very easy open source PHP5 library, allowing thumbnail, snapshot or PDF generation from a url or a html page. And... it uses the excellent webkit-based wkhtmltopdf

Enjoy! ^_^


It's already been mentioned, but I'd just like to confirm that mpdf is the easiest, most powerful and most free html to pdf converter out there. The sky's really the limit. You can even generate pdfs of dynamic, user generated data.

For instance, a client wanted a CMS so he could update the track list of the music he played at his club. That was no problem, but he also wanted users to be able to download a .pdf of the playlist, and so this downloadable PDF had to be updated by the CMS too. Thanks to mpdf, with some simple loops and interspersed variables I could do just that. Something that I thought would take me weeks literally took me minutes.
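The "loops and interspersed variables" part might look something like this sketch. The track data and the function name are made up for illustration; in the real case the rows would come from the CMS database.

```php
<?php
// Hypothetical helper: builds the HTML that is later converted to PDF.
function build_playlist_html(array $tracks)
{
    $html = '<h1>Playlist</h1><table>';
    foreach ($tracks as $track) {
        // Escape user-entered text so it cannot break the markup.
        $html .= '<tr><td>' . htmlspecialchars($track['artist']) . '</td>'
               . '<td>' . htmlspecialchars($track['title']) . '</td></tr>';
    }
    return $html . '</table>';
}

// Made-up example data standing in for rows loaded from the CMS.
$html = build_playlist_html(array(
    array('artist' => 'Artist A', 'title' => 'Track 1'),
    array('artist' => 'Artist B', 'title' => 'Track 2 <live>'),
));
```

The resulting string is what gets handed to the library, in mPDF's case through its WriteHTML() method followed by Output().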

Download page http://www.mpdf1.com/mpdf/index.php

Great article that helped me get started http://www.smaizys.com/php/mpdf-html-to-pdf-introduction/


Fine rendering doesn't mean anything. Does it validate?

All browsers do the most they can to just show something on the screen, no matter how bad the input. And of course they do not all do the same thing. If you want the same rendering as Firefox, you could use its rendering engine. There are PDF generators for it. It is an awful lot of work, though.

Yes it validates. – cletus Dec 24 '08 at 12:16

This question is pretty old already, but I haven't seen anyone mention CutyCapt, so I will :)

CutyCapt

CutyCapt is a small cross-platform command-line utility to capture WebKit's rendering of a web page into a variety of vector and bitmap formats, including SVG, PDF, PS, PNG, JPEG, TIFF, GIF, and BMP


Use DOMPDF for the best results. Here is a link to examples:

http://pxd.me/dompdf/www/examples.php


Darryl Hein's mention above of TCPDF (http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf) is likely a great idea. Nicola Asuni's code is pretty handy and powerful. The only killer is if you ever plan on merging PDF files with your generated PDF it doesn't have those features. You would have to create the PDF and then merge it using something like PDFTK by Sid Steward (www.pdflabs.com/tools/pdftk-the-pdf-toolkit/).
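For the merge step, pdftk's cat operation takes the input files in order and writes one combined file. A minimal sketch of building that command from PHP, assuming pdftk is installed and on the PATH; the function name and file names are hypothetical.

```php
<?php
// Hypothetical helper: builds "pdftk <inputs...> cat output <merged>".
function build_pdftk_merge_command(array $inputs, $output, $binary = 'pdftk')
{
    $args = array_map('escapeshellarg', $inputs);
    return escapeshellcmd($binary) . ' ' . implode(' ', $args)
         . ' cat output ' . escapeshellarg($output);
}

// Example: append an existing PDF to the one generated by TCPDF.
$command = build_pdftk_merge_command(
    array('generated.pdf', 'appendix.pdf'),
    'merged.pdf'
);
```

The command string can then be run with exec() or system(), checking the exit code before serving the merged file.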


http://code.google.com/p/flying-saucer/ not PHP, but a java library, which does the thing:

Flying Saucer takes XML or XHTML and applies CSS 2.1-compliant stylesheets to it, in order to render to PDF

It is usable from PHP via system() or a similar call, although it requires the input to be well-formed XML.


pdfcrowd does the job with a simple API. Free for personal use and not that expensive for professional use.


If you are looking to convert fewer than 100 HTML pages into PDF within a month, then pdfcrowd can do the job for you. Really simple and easy to integrate.


I recommend TCPDF or DOMPDF, in that order.


I've tried a lot of different libraries for PHP, including all the ones listed here. In my opinion the TCPDF library is the best compromise between performance and usability. It's very simple to install and use, and it performs well in small to medium applications. If you need high performance and very big PDF documents, use the Zend_PDF module, but get ready for some hard coding!
