iText PDFReader Example
In the previous example, we studied how to use iText to create and manage PDF files. In this example, we will see how to use iText to read PDF files in our application.
We will read and decrypt the PDF files created in the previous examples. The reader may download the source files from the previous example.
1. Project Set-up
Let’s get started by creating a simple Maven project. Then import the Maven dependencies using the pom.xml below:
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>ITextExample</groupId>
    <artifactId>ITextExample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.6</version>
        </dependency>
        <dependency>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcprov-jdk15on</artifactId>
            <version>1.52</version>
        </dependency>
    </dependencies>
</project>
Now the project setup is complete and we can start with reading the PDF files.
2. Read a simple PDF
Here’s a simple class that reads a PDF file, stamps a line of text onto its first page and writes the result to a separate PDF file.
ReadPdf.java
package com.jcg.examples;

import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class ReadPdf {

    public static void main(String[] args) {
        try {
            PdfReader pdfReader = new PdfReader("HelloWorld.pdf");
            PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileOutputStream("Rewritten HelloWorld.pdf"));
            PdfContentByte content = pdfStamper.getUnderContent(1); // 1 for the first page
            BaseFont bf = BaseFont.createFont(BaseFont.TIMES_ITALIC, BaseFont.CP1250, BaseFont.EMBEDDED);
            content.beginText();
            content.setFontAndSize(bf, 18);
            content.showTextAligned(PdfContentByte.ALIGN_CENTER, "JavaCodeGeeks", 250, 650, 0);
            content.endText();
            pdfStamper.close();
            pdfReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}
We create an instance of the com.itextpdf.text.pdf.PdfReader class by passing the filename of the PDF we wish to read. Then we pass this instance to com.itextpdf.text.pdf.PdfStamper, which creates a new PDF file containing the content of the existing file along with the extra text we added. It is possible to add images and files in a similar fashion. The com.itextpdf.text.pdf.PdfContentByte class identifies where the file is to be modified: the page number, whether the new content goes under or over the existing content, the x and y coordinates of the text, and so on. It also applies the font and encoding we selected to the text written to the PDF file.
[Figure: sample output of the modified PDF]
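The ReadPdf class stamps text onto a page but does not print the page content. As a minimal sketch of reading the text itself, assuming the iText 5 dependency from the pom.xml above is on the classpath, the parser class PdfTextExtractor can return the text of each page. The class name ExtractTextSketch and the helpers samplePdf and extractText are hypothetical names chosen for this illustration. The sample PDF is built in memory so that the sketch does not depend on HelloWorld.pdf.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

// Hypothetical helper class, not part of the original example project.
final class ExtractTextSketch {

    // Builds a one-page PDF in memory so the sketch runs without an input file.
    static byte[] samplePdf(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Document document = new Document();
            PdfWriter.getInstance(document, out);
            document.open();
            document.add(new Paragraph(text));
            document.close();
            return out.toByteArray();
        } catch (DocumentException e) {
            throw new IllegalStateException(e);
        }
    }

    // Concatenates the text of every page; iText numbers pages from 1.
    static String extractText(byte[] pdf) {
        try {
            PdfReader reader = new PdfReader(pdf);
            try {
                StringBuilder text = new StringBuilder();
                for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                    text.append(PdfTextExtractor.getTextFromPage(reader, page)).append('\n');
                }
                return text.toString();
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extractText(samplePdf("Hello World")));
    }
}
```

To read a file from disk instead, the same loop should work with a reader created as new PdfReader("HelloWorld.pdf").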
3. Extract a File from PDF
In the previous example, we saw how we can attach a file to the PDF document. In this section we will see how we can extract an attached file from the PDF.
Here’s the code for it:
ExtractAttachment.java
package com.jcg.examples;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Set;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;

public class ExtractAttachment {

    private static final String FILE_NAME = "HelloWorld.pdf";

    public static void main(String[] args) {
        try {
            PdfReader pdfReader = new PdfReader(FILE_NAME);
            PdfDictionary catalog = pdfReader.getCatalog();
            PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
            PdfDictionary embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
            PdfArray embeddedFilesArray = embeddedFiles.getAsArray(PdfName.NAMES);
            extractFiles(pdfReader, embeddedFilesArray);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void extractFiles(PdfReader pdfReader, PdfArray filespecs) {
        // The array alternates names and file specifications;
        // index 1 holds the file specification of the first attachment.
        PdfDictionary files = filespecs.getAsDict(1);
        PdfDictionary refs = files.getAsDict(PdfName.EF);
        Set<PdfName> keys = refs.getKeys();
        try {
            for (PdfName key : keys) {
                PRStream prStream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
                String filename = files.getAsString(key).toString();
                // try-with-resources closes the stream even if the write fails
                try (FileOutputStream outputStream = new FileOutputStream(new File(filename))) {
                    outputStream.write(PdfReader.getStreamBytes(prStream));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
We start the extraction by reading the PDF with the PdfReader class. Then we get the catalog of the document from the reader as a com.itextpdf.text.pdf.PdfDictionary object. From the document catalog, we get the array of attached documents and pass the PdfReader and this array to the extractFiles method.
This method gets a java.util.Set of keys from the embedded-file (EF) dictionary of the file specification and creates a new file with the same name as the attached file. We iterate over this Set, once for each key in it. We get the content of the attached file as a com.itextpdf.text.pdf.PRStream object using the PdfReader#getPdfObject method, passing the reference stored under the current key taken from the Set.
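The extractFiles method reads only index 1 of the names array, that is, the first attachment. The array alternates between an attachment name and its file specification, so a document with several attachments needs a loop over every second entry. The following is a minimal sketch under several assumptions: iText 5 and Bouncy Castle from the pom.xml are on the classpath, the name tree is small enough to be stored directly in a Names array (larger trees may be split into Kids, which this sketch ignores), and the names AllAttachmentsSketch, samplePdf and readAttachments are hypothetical. It returns the contents in a map rather than writing files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

// Hypothetical helper class, not part of the original example project.
final class AllAttachmentsSketch {

    // Builds a PDF with two small attachments in memory.
    static byte[] samplePdf() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, out);
            document.open();
            document.add(new Paragraph("Two attachments"));
            writer.addFileAttachment("first", "one".getBytes(StandardCharsets.US_ASCII), null, "a.txt");
            writer.addFileAttachment("second", "two".getBytes(StandardCharsets.US_ASCII), null, "b.txt");
            document.close();
            return out.toByteArray();
        } catch (DocumentException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns file name -> content for every entry in the EmbeddedFiles names array.
    static Map<String, byte[]> readAttachments(byte[] pdf) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try {
            PdfReader reader = new PdfReader(pdf);
            try {
                PdfDictionary names = reader.getCatalog().getAsDict(PdfName.NAMES);
                PdfDictionary embedded = names == null ? null : names.getAsDict(PdfName.EMBEDDEDFILES);
                PdfArray pairs = embedded == null ? null : embedded.getAsArray(PdfName.NAMES);
                if (pairs == null) {
                    return result; // no attachments, or a tree layout this sketch ignores
                }
                // Even indexes hold names, odd indexes hold file specifications.
                for (int i = 1; i < pairs.size(); i += 2) {
                    PdfDictionary filespec = pairs.getAsDict(i);
                    PdfDictionary ef = filespec.getAsDict(PdfName.EF);
                    String filename = filespec.getAsString(PdfName.F).toString();
                    PRStream stream = (PRStream) ef.getAsStream(PdfName.F);
                    result.put(filename, PdfReader.getStreamBytes(stream));
                }
                return result;
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, byte[]> entry : readAttachments(samplePdf()).entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue().length + " bytes");
        }
    }
}
```

Returning a map keeps the sketch free of file-system side effects; writing each entry to disk can reuse the FileOutputStream code from ExtractAttachment.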
4. Read an encrypted PDF
Reading an encrypted PDF is almost the same as reading a plain, non-encrypted PDF. We just need to use a different overload of the PdfReader constructor:
com.itextpdf.text.pdf.PdfReader.PdfReader(String filename, byte[] ownerPassword) throws IOException
In this constructor, we pass the owner password we used while creating the PDF document as a byte array.
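The byte array must contain exactly the bytes that were used when the document was encrypted. Calling String#getBytes() without arguments uses the platform default charset, which gives the same bytes for plain ASCII passwords but can differ for other characters. The following minimal sketch uses only the JDK and makes the charset explicit. The choice of ISO-8859-1 and the class name PasswordBytesSketch are assumptions for illustration; the charset should match whatever was used when the PDF was created.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original example project.
final class PasswordBytesSketch {

    // Encodes the password with an explicit charset instead of the platform default.
    static byte[] passwordBytes(String password) {
        return password.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // ASCII-only passwords encode identically in ISO-8859-1 and UTF-8.
        String ascii = "ownerPassword";
        System.out.println(Arrays.equals(passwordBytes(ascii),
                ascii.getBytes(StandardCharsets.UTF_8))); // prints true

        // A non-ASCII character does not: U+00E9 is one byte in ISO-8859-1, two in UTF-8.
        String accented = "caf\u00e9";
        System.out.println(passwordBytes(accented).length);                    // prints 4
        System.out.println(accented.getBytes(StandardCharsets.UTF_8).length);  // prints 5
    }
}
```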
ReadEncryptedPdf.java
package com.jcg.examples;

import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;

public class ReadEncryptedPdf {

    public static void main(String[] args) {
        try {
            byte[] ownerPassword = "ownerPassword".getBytes();
            PdfReader pdfReader = new PdfReader("EncryptedHelloWorld.pdf", ownerPassword);
            System.out.println("Is the PDF Encrypted : " + pdfReader.isEncrypted());
            System.out.println("File is opened with full permissions : " + pdfReader.isOpenedWithFullPermissions());
            System.out.println("File length is : " + pdfReader.getFileLength());
            System.out.println("File is tampered? " + pdfReader.isTampered());
            pdfReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The PdfReader#isEncrypted() method returns true if the document opened by this instance is encrypted.
The isOpenedWithFullPermissions() method checks whether the document was opened with full permissions, i.e. to read, write and modify. If the document is not encrypted, this method returns true. The isTampered() method checks whether the reader has already been modified by a com.itextpdf.text.pdf.PdfStamper. Opening such a tampered reader in a PdfStamper will throw a com.itextpdf.text.DocumentException with a message like "The original document was reused. Read it again from file." If the reader was not opened with the owner password, passing it to the PdfStamper class will lead to a BadPasswordException (com.itextpdf.text.exceptions.BadPasswordException in iText 5).
Here’s the output of the program:
Is the PDF Encrypted : true
File is opened with full permissions : true
File length is : 1393
File is tampered? false
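To see how the permission check and the password exception relate to the password supplied, the following minimal sketch encrypts a small PDF in memory and opens it with three different passwords. It assumes iText 5 and Bouncy Castle from the pom.xml above; the class name PasswordOutcomeSketch, the helpers encryptedPdf and describe, and the passwords are hypothetical. The sketch relies on the assumption that, in iText 5, a password matching neither the user nor the owner password is rejected by the PdfReader constructor itself with a BadPasswordException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.exceptions.BadPasswordException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

// Hypothetical helper class, not part of the original example project.
final class PasswordOutcomeSketch {

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Builds a one-page PDF protected by the given user and owner passwords.
    static byte[] encryptedPdf(String userPassword, String ownerPassword) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, out);
            // Encryption has to be configured before the document is opened.
            writer.setEncryption(bytes(userPassword), bytes(ownerPassword),
                    PdfWriter.ALLOW_PRINTING, PdfWriter.STANDARD_ENCRYPTION_128);
            document.open();
            document.add(new Paragraph("Hello World"));
            document.close();
            return out.toByteArray();
        } catch (DocumentException e) {
            throw new IllegalStateException(e);
        }
    }

    // Opens the PDF with the given password and summarizes the outcome.
    static String describe(byte[] pdf, String password) {
        try {
            PdfReader reader = new PdfReader(pdf, bytes(password));
            try {
                return reader.isOpenedWithFullPermissions() ? "full permissions" : "restricted";
            } finally {
                reader.close();
            }
        } catch (BadPasswordException e) {
            return "bad password";
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] pdf = encryptedPdf("userPassword", "ownerPassword");
        System.out.println("owner password : " + describe(pdf, "ownerPassword"));
        System.out.println("user password  : " + describe(pdf, "userPassword"));
        System.out.println("wrong password : " + describe(pdf, "wrong"));
    }
}
```

A reader reported as "restricted" can still be used for reading, but passing it to a PdfStamper is the case where the BadPasswordException described above is expected.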
5. Download the Source Code
We studied how to read a PDF using the PdfReader class from iText and the different operations that can be performed on the PDF document.
You can download the source code of this example here: ItextPdfReaderExample.zip