Java: converting a .doc file to PDF using the Apache POI library, with all graphics, images, tables, borders, etc.

Question

I am converting a .doc file to PDF with the following Java code, which uses the Apache POI library (and iText to write the PDF):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public class TestDoc {

    public static void main(String[] args) {
        POIFSFileSystem fs = null;
        Document document = new Document();
        try {
            System.out.println("Starting the test");

            //D:\vijay\doctopdf
            fs = new POIFSFileSystem(new FileInputStream("D:/vijay/doctopdf/test.doc"));

            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);

            OutputStream file = new FileOutputStream(new File("D:/vijay/doctopdf/test.pdf"));

            PdfWriter writer = PdfWriter.getInstance(document, file);

            Range range = doc.getRange();
            document.open();
            writer.setPageEmpty(true);
            document.newPage();
            writer.setPageEmpty(true);

            String[] paragraphs = we.getParagraphText();
            for (int i = 0; i < paragraphs.length; i++) {

                org.apache.poi.hwpf.usermodel.Paragraph pr = range
                        .getParagraph(i);
                // CharacterRun run = pr.getCharacterRun(i);
                // run.setBold(true);
                // run.setCapitalized(true);
                // run.setItalic(true);
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
                System.out.println("Paragraph" + i + ": "
                        + paragraphs[i].toString());

                // add the paragraph to the document
                document.add(new Paragraph(paragraphs[i]));
            }

            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        } finally {
            // close the document
            document.close();
        }
    }

}

The code above runs successfully, but it converts only the text. When the .doc contains tables, images, or other non-text content, none of it appears in the resulting PDF. Does anyone know how I can convert a .doc to PDF with full fidelity and formatting?
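
As a side note on the text cleanup inside the loop, here is a minimal, standalone sketch of what the replaceAll pattern does. It uses only the JDK; the class name ParagraphCleanup and the method name stripLineBreaks are made up for this illustration and are not part of POI or iText. In a Java regex, \cM is the control character Ctrl-M, which is a carriage return, so the pattern removes each line feed together with up to two carriage returns directly before it.

```java
public class ParagraphCleanup {

    // Hypothetical helper wrapping the same pattern used in the question.
    // "\\cM?" is an optional carriage return (Ctrl-M, 0x0D), "\r?" is a
    // second optional carriage return, and "\n" is the line feed itself.
    static String stripLineBreaks(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        // A paragraph string that ends in CR LF.
        String cleaned = stripLineBreaks("First paragraph\r\n");
        System.out.println("[" + cleaned + "]");

        // A carriage return that is not followed by a line feed is kept,
        // because the pattern only matches when "\n" is present.
        String loneCr = stripLineBreaks("a\rb");
        System.out.println(loneCr.length());
    }
}
```

Note that this step only tidies the plain text of each paragraph. It has no bearing on tables or images, which never reach the PDF because only strings are passed to the iText Paragraph.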

user3610941 · Answer 1 · 2014-05-27 14:14:36Z

You can use the WordExtractor class from the Apache Tika parser.


