1 May 2012 05:43
Extracting PDF metadata and exploding pages
Scott Gifford <sgifford <at> suspectclass.com>
2012-05-01 03:43:43 GMT
Hello,
We're working on an application that needs to shuffle around the pages in a PDF file.
Right now it uses a hodgepodge of different programs to manipulate the PDF, and if possible I'd like to have it just use Ghostscript. That would simplify dependencies, and also simplify troubleshooting in the event something goes wrong.
First, it uses poppler's pdfinfo to extract metadata from the PDF, like this:
Title:          t10_4C
Creator:        Adobe Illustrator CS4
Producer:       Adobe PDF library 9.00
CreationDate:   Fri Dec 16 18:26:22 2011
ModDate:        Fri Dec 16 18:26:22 2011
Tagged:         no
Pages:          1
Encrypted:      no
Page size:      270 x 162 pts
File size:      955508 bytes
Optimized:      yes
PDF version:    1.4
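For illustration, here is a minimal sketch of how output in that "Key: value" shape could be turned into a dictionary. The function name parse_pdfinfo is hypothetical, and the sketch assumes one field per line, with the key ending at the first colon.

```python
def parse_pdfinfo(text):
    """Parse 'Key: value' lines (as printed by pdfinfo) into a dict.

    Assumption: one field per line, key ends at the first colon.
    Values such as dates may contain further colons; those are kept.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    sample = "Title:          t10_4C\nPages:          1\nPDF version:    1.4\n"
    print(parse_pdfinfo(sample))
```

All values come back as strings, so a field like Pages would still need an int() conversion before it is used as a count.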
Next, it splits a multi-page PDF into many single-page PDFs, with "pdftk burst".
After that, it uses Ghostscript to generate PNG thumbnails of each page.
The user then reorders the pages in a Web UI using the thumbnails. Finally, it puts them back together in the new order with Ghostscript.
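As a rough sketch of those last two steps, the Ghostscript command lines might be assembled like this. The helper names, the zero-based new_order list, the png16m device and the 72 dpi resolution are all assumptions for illustration; the real application may use different settings.

```python
def thumbnail_command(page_pdf, png_path, dpi=72):
    """Build a gs command that renders a one-page PDF to a PNG.

    Assumption: png16m device and a low resolution are good enough
    for thumbnails.
    """
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=png16m", "-r%d" % dpi,
            "-sOutputFile=" + png_path, page_pdf]


def reorder(page_files, new_order):
    """Return page_files rearranged by zero-based indices (hypothetical UI output)."""
    return [page_files[i] for i in new_order]


def merge_command(page_files, output_pdf):
    """Build a gs command that concatenates PDFs in the order given."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output_pdf] + list(page_files)


if __name__ == "__main__":
    pages = ["page_0001.pdf", "page_0002.pdf", "page_0003.pdf"]
    print(merge_command(reorder(pages, [2, 0, 1]), "reordered.pdf"))
```

Each returned list can be handed to subprocess.run(cmd, check=True) on a machine where gs is on the PATH.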
I have not been able to find a reasonable way to extract the PDF metadata with Ghostscript, or to "burst" a multi-page PDF document into many one-page PDF documents. I could use -dFirstPage and -dLastPage for every page, but that requires a separate call to gs per page, which for a big document is much, much slower than pdftk.
I would really like to load the PDF file into Ghostscript once, extract the data I need, and then convert the pages one at a time, first to individual PDF files and then to PNGs. Is it possible to drive Ghostscript like this, having it perform multiple operations on each page?
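To make the cost of the -dFirstPage/-dLastPage approach concrete, here is a minimal sketch that builds one gs command per page. The function name burst_commands and the output file pattern are hypothetical, and the page count is assumed to be known already (for example from pdfinfo).

```python
def burst_commands(input_pdf, page_count, pattern="page_{:04d}.pdf"):
    """Build one gs invocation per page, each extracting a single page.

    Assumption: page_count is already known; output names follow
    the hypothetical pattern argument.
    """
    commands = []
    for page in range(1, page_count + 1):  # gs page numbers start at 1
        commands.append([
            "gs", "-q", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=pdfwrite",
            "-dFirstPage=%d" % page,
            "-dLastPage=%d" % page,
            "-sOutputFile=" + pattern.format(page),
            input_pdf,
        ])
    return commands


if __name__ == "__main__":
    for cmd in burst_commands("input.pdf", 3):
        print(" ".join(cmd))
```

A document with N pages yields N separate gs processes, each of which has to open and parse the input file again, which is where the slowdown comes from.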
Thanks for any tips!
----Scott.
_______________________________________________
gs-devel mailing list
gs-devel <at> ghostscript.com
http://ghostscript.com/cgi-bin/mailman/listinfo/gs-devel