Friday, March 12, 2010

Putting together PDF files

Extracted from linux.com

By Scott Nesbitt

Joining PDFs the Ghostscript way

Ghostscript is a package that enables you to view or print PostScript and PDF files, or to convert those files to other formats. It's a popular tool among Linux users, but what many people don't know is that Ghostscript is also a powerful tool for combining PDF files.

To use Ghostscript to combine PDF files, type something like the following:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf

Unless you're very familiar with Ghostscript, that string of commands won't mean much to you. Here's a quick breakdown:

  • gs -- starts the Ghostscript program
  • -dBATCH -- tells Ghostscript to exit once it has processed the PDF files. If you don't include this option, Ghostscript will just keep running
  • -dNOPAUSE -- forces Ghostscript to process each page without pausing for user interaction
  • -q -- stops Ghostscript from displaying messages while it works
  • -sDEVICE=pdfwrite -- tells Ghostscript to use its built-in PDF writer to process the files
  • -sOutputFile=finished.pdf -- tells Ghostscript to save the combined PDF file with the name that you specified

When using Ghostscript to combine PDF files, you can add any PDF-related option to the command line. For example, you can compress the file, target it to an eBook reader, or encrypt it. See the Ghostscript documentation for more information.

The biggest advantage of Ghostscript is that it's a standard part of many Linux distributions. If you don't have it on your computer, it's easy to download and install it.

Using Ghostscript has its drawbacks, too. Unless you use Ghostscript's PDF options, the utility produces a barebones merged PDF file, and a large one at that, because by default Ghostscript doesn't compress PDF files. On top of that, some people may find typing long strings of options at the command line to be a bit of a chore.
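One way to take the sting out of both drawbacks is to wrap the command in a small shell function. The sketch below is only an illustration: the function name merge_pdfs and the PDF_PRESET variable are made up for this example and are not part of Ghostscript. It reuses the options explained above and, if PDF_PRESET is set, adds Ghostscript's -dPDFSETTINGS option (for example /ebook) so the output is downsampled and compressed.

```shell
# Hypothetical helper, not part of Ghostscript: merge_pdfs OUTPUT INPUT1 INPUT2 ...
# Set PDF_PRESET (an assumed variable name) to a Ghostscript quality preset
# such as "ebook" to add -dPDFSETTINGS=/ebook to the command.
merge_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_pdfs output.pdf input1.pdf input2.pdf ..." >&2
        return 1
    fi
    merge_output=$1
    shift
    # ${PDF_PRESET:+...} expands to nothing when PDF_PRESET is unset or empty,
    # so the plain merge behaves exactly like the command shown earlier.
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
        ${PDF_PRESET:+-dPDFSETTINGS=/$PDF_PRESET} \
        -sOutputFile="$merge_output" "$@"
}
```

With the function loaded into your shell (for instance from your shell startup file), "merge_pdfs finished.pdf file1.pdf file2.pdf" produces the same barebones merge as before, and "PDF_PRESET=ebook merge_pdfs finished.pdf file1.pdf file2.pdf" asks Ghostscript for a smaller file aimed at eBook readers.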

joinPDF: Quick and simple

If you want a no-muss, no-fuss way of joining two or more PDF files together, look no further than joinPDF. It's a simple but elegant little utility that consists of a script (named joinPDF) and a compiled Java file. To run it, you only need to specify at the command line the name of the output file and the files that you want to combine. To use joinPDF you type something like this:

joinpdf myFile.pdf file1.pdf file2.pdf ...

Depending on how many PDF files you're combining and their sizes, joinPDF only takes a few seconds to merge them. JoinPDF compresses the output file it generates; while writing this article, I used joinPDF to merge various combinations of files of various sizes, and each time, the resulting PDF file was several kilobytes to several tens of kilobytes smaller than the total size of the source files.
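If you want to check the savings on your own files, a rough comparison is easy to script. The helper below is a sketch, and total_bytes is a name invented for this example: it adds up the sizes of whatever files you pass it by concatenating them and counting the bytes.

```shell
# Hypothetical helper: print the combined size, in bytes, of the given files.
total_bytes() {
    # cat streams every file in turn; wc -c counts the bytes that pass through.
    # tr strips the padding that some versions of wc put around the number.
    cat -- "$@" | wc -c | tr -d ' '
}
```

Running "total_bytes file1.pdf file2.pdf" and then "total_bytes myFile.pdf" gives the two numbers to compare; the difference is what the merge saved.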

JoinPDF is a Java utility -- to use it, you need version 1.4 of the Java Runtime Environment installed. It runs on any Linux distribution, or any other operating system that supports Java. In order to use joinPDF out of the box, you have to copy the Java file to the /usr/lib directory -- that's where the joinPDF script expects to find it. If you want to put the Java file somewhere else, like the /usr/local/bin directory, you need to edit the joinPDF script to point to that directory.

The biggest advantage of joinPDF is its simplicity. There are no options to remember. Of course, some users might find joinPDF's simplicity to be a detriment. If you want options, joinPDF isn't for you. Also, joinPDF cannot join PDFs if one or more of them is encrypted.

The joinPDF package comes with another script called splitPDF. As its name implies, splitPDF is used to extract pages from PDF files. A discussion of splitPDF is beyond the scope of this article, but if you need to pull pages out of your PDF files, you'll find splitPDF useful.

Merging PDF files with pdfmeld

Do you need a lot of features in the software that you use to combine your PDF files? Then consider pdfmeld. Of the three applications discussed in this article, pdfmeld is probably the most powerful and flexible.

To use pdfmeld you type something like this at the command line:

pdfmeld file1.pdf,file2.pdf,... result.pdf [options]

pdfmeld has literally dozens of options -- for a full list, check out the documentation. These options include adding bookmarks to a PDF file, encrypting the PDF file, and adding information like title, author name, and subject. While it sounds complex and difficult to use, pdfmeld really isn't. You'll quickly find that you'll only use a handful of the options regularly, and you can forget about the rest.

pdfmeld doesn't just combine PDF files. You can use it to extract pages from a PDF file, rearrange the pages in a file, rotate pages, and even touch up text. In fact, pdfmeld packs many of the features of Adobe Acrobat into a package that weighs in at just over 1 MB.

pdfmeld's range of options is its greatest strength. But it comes at a price, albeit a small one -- $9.95. Like joinPDF, pdfmeld automatically compresses the resulting file. It's also very fast: it only took a few seconds to mash three 20-page PDF files together on my old 300MHz Linux box.

I found very little wrong with pdfmeld. One problem that I did encounter, that I didn't see with Ghostscript or joinPDF, was the error message "Page Contents Object has Wrong Type" when I tried to open a merged PDF file in Acrobat Reader. This happens when an empty page contains contents information. This only happened twice, when I added a cover followed by a blank page to a particular document.

Other tools

These three applications aren't your only choices. Some of the other tools available for merging PDF files include pdftk, Multivalent, and pdcat. I briefly looked at pdftk and Multivalent (pdcat is a commercial product), and found them to be solid applications.

So, which utility comes out on top? Just for its sheer number of features, you should give pdfmeld a serious look. While some people might balk at dropping $9.95 for software that does pretty much the same thing that Ghostscript does, I think the price is well worth it. Of course, being a long-time Ghostscript user I still have a soft spot for it. But typing those long strings of options really wears me down after a while. And joinPDF is perfect if you want to get the job done quickly and easily.

If you're adamant about using only free software, then go with Ghostscript or joinPDF. But if you can afford to drop 10 bucks, you'll find that pdfmeld is a great little application that can handle all of your PDF merging needs and then some.

Scott Nesbitt is a Toronto, Canada-based writer and the Toronto managing editor for the ScalableAir Network.
