
iText PDFReader Example

In the previous example, we studied how to use iText to create and manage PDF files. In this example, we will see how to use iText to read PDF files in our application.

We will read and decrypt the PDF files created in the previous example. The reader may download the source files from that example.

1. Project Set-up

Let’s get started by creating a simple Maven project. Then import the Maven dependencies using the pom.xml below:

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>ITextExample</groupId>
	<artifactId>ITextExample</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<dependencies>
		<dependency>
			<groupId>com.itextpdf</groupId>
			<artifactId>itextpdf</artifactId>
			<version>5.5.6</version>
		</dependency>
		<dependency>
			<groupId>org.bouncycastle</groupId>
			<artifactId>bcprov-jdk15on</artifactId>
			<version>1.52</version>
		</dependency>


	</dependencies>

</project>

Now the project setup is complete and we can start with reading the PDF files.

2. Read a simple PDF

Here’s a simple class that reads a PDF file, adds a line of text to its first page and writes the result to a separate PDF file.

ReadPdf.java

package com.jcg.examples;

import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class ReadPdf
{
		public static void main(String[] args)
		{
				try
				{
					// Open the existing document
					PdfReader pdfReader = new PdfReader("HelloWorld.pdf");
					// The stamper writes the document, plus our additions, to a new file
					PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileOutputStream("Rewritten HelloWorld.pdf"));
					// Draw under the existing content of the first page
					PdfContentByte content = pdfStamper.getUnderContent(1);
					BaseFont bf = BaseFont.createFont(BaseFont.TIMES_ITALIC, BaseFont.CP1250, BaseFont.EMBEDDED);
					content.beginText();
					content.setFontAndSize(bf, 18);
					// Centre the text on x = 250, y = 650 with no rotation
					content.showTextAligned(PdfContentByte.ALIGN_CENTER, "JavaCodeGeeks", 250, 650, 0);
					content.endText();

					// Closing the stamper finishes writing the new file
					pdfStamper.close();
					pdfReader.close();
				}
				catch (IOException e)
				{
						e.printStackTrace();
				}
				catch (DocumentException e)
				{
						e.printStackTrace();
				}
		}
}

We create an instance of the com.itextpdf.text.pdf.PdfReader class by passing the file name of the PDF we wish to read. We then pass this instance to com.itextpdf.text.pdf.PdfStamper, which writes a new PDF file containing the content of the existing file along with the extra text we added. Images and files can be added in a similar fashion. The com.itextpdf.text.pdf.PdfContentByte object determines exactly where the document is modified: the page number, whether the new content goes under or over the existing content, the x and y positions, and so on. The com.itextpdf.text.pdf.BaseFont object supplies the font and the encoding used for the text written to the PDF file.
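
The x and y values passed to showTextAligned (250 and 650 in ReadPdf) are given in PDF user-space units, which by default are points of 1/72 inch measured from the lower-left corner of the page. As a minimal sketch using only the JDK, the helper below converts millimetres to points and computes a horizontally centred position. The class name PdfUnits, its methods and the assumed A4 page size of 595 x 842 points are illustrative and are not part of iText or of the example project.

```java
/** Hypothetical helper for PDF user-space arithmetic; not part of iText. */
final class PdfUnits
{
    // By default one user-space unit is one point, and 72 points make an inch.
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    /** Converts a length in millimetres to points. */
    static float mmToPoints(float mm)
    {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Returns the x coordinate of the horizontal centre of a page. */
    static float centerX(float pageWidthInPoints)
    {
        return pageWidthInPoints / 2f;
    }

    /** Returns the y coordinate that lies the given distance below the top edge. */
    static float fromTop(float pageHeightInPoints, float distanceInPoints)
    {
        return pageHeightInPoints - distanceInPoints;
    }

    public static void main(String[] args)
    {
        // Assumption: the page is A4, i.e. 595 x 842 points.
        float pageWidth = 595f;
        float pageHeight = 842f;

        float x = centerX(pageWidth);
        float y = fromTop(pageHeight, mmToPoints(50f)); // 50 mm below the top edge
        System.out.println("x = " + x + ", y = " + y);
    }
}
```

Values computed this way could be passed as the x and y arguments of showTextAligned; with PdfContentByte.ALIGN_CENTER the text is centred on x.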

Here is the sample output, the modified PDF:

Fig 1 : Read PDF Using Itext

3. Extract a File from PDF

In the previous example, we saw how we can attach a file to the PDF document. In this section we will see how we can extract an attached file from the PDF.

Here’s the code for it:

ExtractAttachment.java

package com.jcg.examples;


import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Set;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;


public class ExtractAttachment
{

		private static final String FILE_NAME = "HelloWorld.pdf";

		public static void main(String[] args)
		{
				try
				{
					PdfReader pdfReader = new PdfReader(FILE_NAME);
					PdfDictionary catalog = pdfReader.getCatalog();
					PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
					PdfDictionary embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
					PdfArray embeddedFilesArray = embeddedFiles.getAsArray(PdfName.NAMES);
					extractFiles(pdfReader, embeddedFilesArray);
					// Release the file handle once the attachment has been written
					pdfReader.close();
				}
				catch (IOException e)
				{
					e.printStackTrace();
				}
		}

		private static void extractFiles(PdfReader pdfReader, PdfArray filespecs)
		{
				// The names array alternates between a name and its file specification,
				// so index 1 holds the file specification of the first attachment.
				PdfDictionary files = filespecs.getAsDict(1);
				PdfDictionary refs = files.getAsDict(PdfName.EF);
				// A typed Set is required for the for-each loop below to compile
				Set<PdfName> keys = refs.getKeys();
				try
				{
					for (PdfName key : keys)
					{
						PRStream prStream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
						String filename = files.getAsString(key).toString();
						// try-with-resources closes the stream even if the write fails
						try (FileOutputStream outputStream = new FileOutputStream(new File(filename)))
						{
							outputStream.write(PdfReader.getStreamBytes(prStream));
						}
					}
				}
				catch (IOException e)
				{
					// Also covers FileNotFoundException, which is a subclass of IOException
					e.printStackTrace();
				}
		}
}

We start the extraction by reading the PDF with the PdfReader class. Then we obtain the catalog of the document from the reader as a com.itextpdf.text.pdf.PdfDictionary object. From the document catalog, we get the array of attached documents and pass the reader and this array to the extractFiles method.

This method takes the file specification of the first attachment from the array and gets a java.util.Set of keys from its embedded-file (EF) dictionary. We iterate over this Set, and for each key we create a new file with the same name as the attached file. We get the content of the attached file as a com.itextpdf.text.pdf.PRStream object using the PdfReader#getPdfObject method, passing the indirect reference stored under the current key, and write its bytes to the new file.
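
The name of each output file is read from the PDF itself, so a document from an untrusted source could carry a name such as ../../something. One possible safeguard, sketched here with java.nio only and independent of iText, is to resolve the name against a chosen output directory and reject anything that ends up outside it. AttachmentPaths, resolveInside and the "extracted" directory are hypothetical names introduced for this illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of iText or of the example project. */
final class AttachmentPaths
{
    /** Resolves an attachment name inside targetDir, or throws if it would escape it. */
    static Path resolveInside(Path targetDir, String attachmentName)
    {
        Path base = targetDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(attachmentName).normalize();
        // Reject absolute names, names whose ".." segments leave the directory, and empty names.
        if (!candidate.startsWith(base) || candidate.equals(base))
        {
            throw new IllegalArgumentException("Unsafe attachment name: " + attachmentName);
        }
        return candidate;
    }

    public static void main(String[] args)
    {
        Path targetDir = Paths.get("extracted"); // assumed output directory
        System.out.println(resolveInside(targetDir, "notes.txt"));
        try
        {
            resolveInside(targetDir, "../outside.txt");
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In extractFiles, the returned Path could be converted with toFile() before the FileOutputStream is opened.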

4. Read an encrypted PDF

Reading an encrypted PDF is very similar to reading a plain, non-encrypted PDF. We just need to use a different PdfReader constructor:

com.itextpdf.text.pdf.PdfReader.PdfReader(String filename, byte[] ownerPassword) throws IOException

In this constructor, we pass the owner password we used while creating the PDF document as a byte array.

ReadEncryptedPdf.java

package com.jcg.examples;

import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;

public class ReadEncryptedPdf
{
		public static void main(String[] args)
		{
				try
				{
						byte[] ownerPassword = "ownerPassword".getBytes();
						PdfReader pdfReader = new PdfReader("EncryptedHelloWorld.pdf",ownerPassword);
						System.out.println("Is the PDF Encrypted : "+pdfReader.isEncrypted());
						System.out.println("File is opened with full permissions : "+pdfReader.isOpenedWithFullPermissions());
						System.out.println("File length is : "+pdfReader.getFileLength());
						System.out.println("File is tampered? "+pdfReader.isTampered());
						
						pdfReader.close();
				}
				catch (IOException e)
				{
					e.printStackTrace();
				}
		}
}
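
The listing calls getBytes() without arguments, which uses the platform's default charset. For an ASCII-only password such as the one used here, ASCII-compatible charsets like UTF-8, ISO-8859-1 and windows-1252 all yield the same bytes, but naming the charset makes the conversion explicit. The sketch below uses only the JDK. PasswordBytes and its methods are hypothetical helpers, and ISO-8859-1 is an assumption chosen for illustration; what matters is that the same conversion is used when the document is encrypted and when it is read.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper; not part of iText or of the example project. */
final class PasswordBytes
{
    /** Converts a password to bytes with an explicitly named charset (an assumption). */
    static byte[] toBytes(String password)
    {
        return password.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Overwrites the password bytes once they are no longer needed. */
    static void wipe(byte[] secret)
    {
        Arrays.fill(secret, (byte) 0);
    }

    public static void main(String[] args)
    {
        byte[] ownerPassword = toBytes("ownerPassword");
        try
        {
            // The array would be handed to new PdfReader(fileName, ownerPassword) here.
            System.out.println("Password length in bytes: " + ownerPassword.length);
        }
        finally
        {
            wipe(ownerPassword);
        }
    }
}
```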

The PdfReader#isEncrypted() method returns true if the document opened by this instance is encrypted.

The isOpenedWithFullPermissions() method is used to check whether the document was opened with full permissions, i.e. to read, write and modify. If the document is not encrypted, this method returns true. The isTampered() method is used to check whether the document held by the reader has already been modified, for example because the reader was used in a PdfStamper.

Note: If the tampered flag is set to true, the reader cannot be used in a com.itextpdf.text.pdf.PdfStamper.

Opening such a tampered reader in a PdfStamper will throw a com.itextpdf.text.DocumentException with the message "the original document was reused read it again from file". Providing a wrong password will lead to a com.itextpdf.text.exceptions.BadPasswordException, for example when the reader is passed to the PdfStamper class.

Here’s the output of running ReadEncryptedPdf:

Is the PDF Encrypted : true
File is opened with full permissions : true
File length is : 1393
File is tampered? false
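
When isOpenedWithFullPermissions() returns false, the operations that remain allowed are described by a permission bit field stored in the encrypted document, which iText exposes through PdfReader#getPermissions(). As a hedged sketch, the JDK-only helper below decodes such a value using the bit positions that the PDF specification defines for the standard security handler. PdfPermissions and describe are hypothetical names, and the sample value in main is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of iText or of the example project. */
final class PdfPermissions
{
    // Bit values of the /P entry of the standard security handler (bit 1 is the lowest bit).
    static final int PRINT = 1 << 2;               // bit 3
    static final int MODIFY_CONTENTS = 1 << 3;     // bit 4
    static final int COPY = 1 << 4;                // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5;  // bit 6
    static final int FILL_IN = 1 << 8;             // bit 9
    static final int SCREEN_READERS = 1 << 9;      // bit 10
    static final int ASSEMBLY = 1 << 10;           // bit 11
    static final int HIGH_QUALITY_PRINT = 1 << 11; // bit 12

    /** Lists the operations whose bits are set in the given permission value. */
    static List<String> describe(long permissions)
    {
        List<String> allowed = new ArrayList<>();
        if ((permissions & PRINT) != 0) allowed.add("print");
        if ((permissions & MODIFY_CONTENTS) != 0) allowed.add("modify contents");
        if ((permissions & COPY) != 0) allowed.add("copy");
        if ((permissions & MODIFY_ANNOTATIONS) != 0) allowed.add("modify annotations");
        if ((permissions & FILL_IN) != 0) allowed.add("fill in forms");
        if ((permissions & SCREEN_READERS) != 0) allowed.add("screen readers");
        if ((permissions & ASSEMBLY) != 0) allowed.add("assemble document");
        if ((permissions & HIGH_QUALITY_PRINT) != 0) allowed.add("high-quality print");
        return allowed;
    }

    public static void main(String[] args)
    {
        long sample = PRINT | COPY; // made-up value for illustration
        System.out.println("Allowed: " + describe(sample));
    }
}
```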

5. Download the Source Code

We studied how to read a PDF using the PdfReader class from iText and the different operations that can be performed on the PDF document.

Download
You can download the source code of this example here: ItextPdfReaderExample.zip

Chandan Singh

Chandan holds a degree in Computer Engineering and is a passionate software programmer. He has good experience in Java/J2EE Web-Application development for Banking and E-Commerce Domains.