I'd like to create a Word document using Python, but I want to reuse as much of my existing document-creation code as possible. I am currently using XSLT to generate an HTML file that I programmatically convert to a PDF file. However, my client is now requesting that the same document be made available in Word (.doc) format.

So far, I haven't had much luck finding any solutions to this problem. Is anyone aware of an open source library (or *gulp* a proprietary solution) that may help resolve this issue?

NOTE: All possible solutions must run on Linux. I believe this eliminates pywin32.


5 Answers

up vote 21 down vote accepted

A couple of ways you can create Word documents using Python:

EDIT:

Since COM is out of the question, I suggest the following (inspired by @kcrumley's answer):

Use the UNO library to automate OpenOffice from Python: open the HTML file in OpenOffice Writer, then save it as .doc.
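For illustration, here is a rough sketch of what that UNO approach might look like. It assumes an office instance is already running and listening on a socket (for example, started with something like `soffice --headless --accept="socket,host=localhost,port=2002;urp;"`; older OpenOffice builds use single-dash options). The function names, the port, and the two filter names ("HTML (StarWriter)" for loading, "MS Word 97" for saving) are assumptions to check against your own installation, not something taken from the answer above.

```python
from pathlib import Path


def to_file_url(path):
    """Return an absolute file:// URL, the form UNO expects for documents."""
    return Path(path).resolve().as_uri()


def html_to_doc(html_path, doc_path, host="localhost", port=2002):
    """Ask a running office instance to load an HTML file and store it as .doc."""
    # Imported lazily: the `uno` module ships with the office suite's Python,
    # so it is only needed when a conversion is actually requested.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        # Build one UNO name/value pair for the load and store calls.
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    # Connect to the listening office process (host and port are assumptions).
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(
        "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext" % (host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    # Assumed filter names; verify them against your installation.
    load_props = (prop("Hidden", True), prop("FilterName", "HTML (StarWriter)"))
    document = desktop.loadComponentFromURL(
        to_file_url(html_path), "_blank", 0, load_props)
    try:
        document.storeToURL(
            to_file_url(doc_path), (prop("FilterName", "MS Word 97"),))
    finally:
        document.close(True)
```

A call such as `html_to_doc("report.html", "report.doc")` would then slot in as one extra step after the existing XSLT-to-HTML stage.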

EDIT2:

There is now a pure Python docx project that looks nice (I have not used it).

Wow, you hit 2 of the same 3 ideas I was going to say (COM and RTF). Thanks for saving me the time. :) – kcrumley Jun 23 '09 at 21:45
+1 for suggesting .RTF instead of .DOC – Hardwareguy Jun 23 '09 at 21:46
Unfortunately, .doc is required. No RTF. – Huuuze Jun 23 '09 at 21:52

I tried python-docx with success. It lets you create and edit .docx files from Python.

To get more attention when you answer a question it might be a nice idea to include some example code, even if its only linked from that link you provided. – Jakob Bowyer Nov 8 '11 at 15:19

1) If you just want to stick another step on the end of your current pipeline, there are several options out there now for converting PDF files to Word files. I haven't tried 123PDFConverter, but the CNET Editors recommend it (same link); it has a free trial; and it supports automation. As with any 3rd-party file converter, your mileage may vary, depending on how complicated your PDFs are and how good the software actually is.

2) Building on codeape's COM automation suggestion, if you COM automate Word, you can open your actual HTML file in Word, and call the "Save As" command, to save it as a DOC file.


Can you write it as WordML XML files and zip them up into the .docx format? All your client would need is the Word 2007 filter if they aren't on Office 2007 already.
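To give a rough idea of the zip-it-up approach, here is a small sketch that uses only the Python standard library. It writes the three parts a bare-bones .docx package needs (content types, package relationships, and the main document part) with plain, unstyled paragraphs. The function name `write_minimal_docx` is made up for illustration, and real documents with styles, tables, or images would need additional parts.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def write_minimal_docx(path, paragraphs):
    """Zip plain-text paragraphs into a minimal WordprocessingML package."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)


if __name__ == "__main__":
    write_minimal_docx("hello.docx", ["Hello from Python.", "Second paragraph."])
```

Since the body is just XML, an existing XSLT stylesheet could in principle be pointed at this vocabulary instead of HTML, although the formatting would have to be expressed in WordML terms.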

There are many examples out there.

You can also load XML directly into Word, starting with 2003, or so I've been told.

Unfortunately, this option is not ideal. From what I can tell, I'd need to convert my data into WordML to maintain the formatting of the document. – Huuuze Jun 23 '09 at 21:22

I have had to do something similar with Python as well. It is far more manual work than I want, but documents created with pyRTF were causing Word and OpenOffice to crash, and I didn't have the motivation to try to figure it out.

I have found it simplest (but not ideal) to create a Word document template with the styles I want. Then my Python creates an HTML file whose <p> styles are labeled after the Word styles. Then I open the HTML file in Word and open the template in Word. I cut and paste all text from the HTML file into the template, and Word re-formats it all according to the styles I had set up previously. That works for the occasional file in my situation. It might not work for your situation. FYI.

