XPath Best Practices Tutorial

David Guinivere, December 9th, 2016. Last Updated: December 9th, 2016

XPath is used to retrieve and interpret information represented in XML files. This tutorial assumes that the reader has a working knowledge of XPath and does not attempt to teach it. Instead, it shows how to create a simple Java SE application that uses XPath expressions to get information about a computer inventory stored in an XML file (inventory.xml).

1. Introduction

Some of the output consists of values retrieved directly from the data file; other results are calculated using XPath expressions.

This tutorial is written using Java SE 8u111. No other frameworks or tools are used.

See the W3Schools XPath tutorial for a review of XPath.

http://www.w3schools.com/xml/xpath_intro.asp

1.1 XPath: What Is It and Why Use It?

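XPath is essentially a syntax that uses path expressions to navigate through XML data files and retrieve information. XPath also includes built-in functions for working with string, numeric, and boolean values; XPath 2.0 and later define well over a hundred of them.

The function categories in those later versions include date and time comparison, node manipulation, and sequence manipulation, to name a few. Note that the javax.xml.xpath API bundled with Java SE 8 implements XPath 1.0, whose core function library is much smaller (functions such as count(), sum(), and contains()). C++, JavaScript, PHP, Python, and many other languages (and technologies) use XPath in addition to Java.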

2. The Data

As mentioned above, our data is a single XML file, which is used by our application. The inventory.xml file describes the computers in our inventory, grouped by vendor.
NOTE: This tutorial uses contrived data. The data is not intended to be factual (or even realistic).

inventory.xml:
<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <vendor name="Dell">
        <computer>
            <model>Win 10 Laptop</model>
            <os>Windows 10</os>
            <cpu>Intel i7</cpu>
            <ram>12GB</ram>
            <price>900.00</price>
        </computer>
        <computer>
            <model>Low Cost Windows Laptop</model>
            <os>Windows 10 Home</os>
            <cpu>Intel Pentium</cpu>
            <ram>4GB</ram>
            <price>313.00</price>
        </computer>
        <computer>
            <model>64 Bit Windows Desktop Computer</model>
            <os>Windows 10 Home 64 Bit</os>
            <cpu>AMD A8-Series</cpu>
            <ram>8GB</ram>
            <price>330.00</price>
        </computer>
    </vendor>
    <vendor name="Apple">
        <computer>
            <model>Apple Desktop Computer</model>
            <os>MAC OS X</os>
            <cpu>Intel Core i5</cpu>
            <ram>8GB</ram>
            <price>1300.00</price>
        </computer>
        <computer>
            <model>Apple Low Cost Desktop Computer</model>
            <os>OS X Yosemite</os>
            <cpu>4th Gen Intel Core i5</cpu>
            <ram>8GB</ram>
            <price>700.00</price>
        </computer>
    </vendor>
    <vendor name="HP">
        <computer>
            <model>HP Low Cost Windows 10 Laptop</model>
            <os>Windows 10 Home</os>
            <cpu>AMD A6-Series</cpu>
            <ram>4GB</ram>
            <price>230.00</price>
        </computer>
        <computer>
            <model>Windows 7 Desktop</model>
            <os>Windows 7</os>
            <cpu>6th Gen Intel Core i5</cpu>
            <ram>6GB</ram>
            <price>750.00</price>
        </computer>
        <computer>
            <model>HP High End, Low Cost 64 Bit Desktop</model>
            <os>Windows 10 Home 64 Bit</os>
            <cpu>6th Gen Intel Core i7</cpu>
            <ram>12GB</ram>
            <price>800.00</price>
        </computer>
    </vendor>
</inventory>

1. There are 3 vendors; each vendor has a unique name
2. There are 8 computers defined
3. Each computer node has 5 children:

* model – Name of this configuration

* os – Name of Operating System installed

* cpu – Type of processor

* ram – Size of installed RAM

* price – Price, expressed as a decimal number

3. The Application

3.1 Parsers

The first decision when using XML is which type of XML parser to use. There are two main categories of XML parsers:

* DOM – Document Object Model – This popular class of parsers reads the entire XML file and constructs the DOM in memory. Since the DOM is memory resident, evaluation of XPath expressions is faster.

* SAX – Simple API for XML – These event-driven parsers do not require much memory and are better suited for large XML files. Because a SAX parser does not keep a model in memory, querying the data is typically slower than with a DOM. However, if the XML data is too large for the resulting model to fit in memory, or if special handling of particular characters or XML tags is required during parsing, then a SAX parser may be the only option.

For simplicity, and to keep the main focus of this tutorial on how to integrate XPath into your Java application, this tutorial uses a DOM parser.
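
To make the difference between the two parser categories concrete, here is a minimal sketch that counts computer elements both ways. The class name ParserComparisonSketch and the shortened sample XML are assumptions made for illustration; they are not part of the tutorial's application.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ParserComparisonSketch {

    // Hypothetical, shortened sample data (not the tutorial's full inventory.xml).
    static final String SAMPLE_XML =
            "<inventory><vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model></computer>"
            + "<computer><model>Low Cost Windows Laptop</model></computer>"
            + "</vendor></inventory>";

    // DOM: the whole document is loaded into memory first, then queried.
    static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("computer").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // SAX: the parser reports each element as an event; only the running
    // count is kept in memory.
    static int countWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if ("computer".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                            handler);
            return count[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM count: " + countWithDom(SAMPLE_XML));
        System.out.println("SAX count: " + countWithSax(SAMPLE_XML));
    }
}
```

Both methods return the same count for the sample data. The DOM version needs less code once the document is loaded, while the SAX version never holds the document in memory.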

3.2 Data Sources

Fortunately, the DocumentBuilder.parse() method is overloaded, so the XML data can come from a number of different input sources, as documented by Oracle for the DocumentBuilder class:

* File: Document domDocument = DocumentBuilder.parse(File f);

* InputStream: Document domDocument = DocumentBuilder.parse(InputStream is);

* URI: Document domDocument = DocumentBuilder.parse(String uri);

See the Oracle documentation for the DocumentBuilder class for more details.

https://docs.oracle.com/javase/8/docs/api/javax/xml/parsers/DocumentBuilder.html
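
The three overloads listed above can be tried side by side. The following is a minimal sketch: it writes a small XML string to a temporary file and then parses that file as a File, as an InputStream, and as a URI string. The class name DataSourcesSketch, the method name, and the temporary file are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class DataSourcesSketch {

    // Parses the same XML through three parse() overloads and returns the
    // root element name seen by each one.
    static List<String> rootNamesFromEachSource(String xml) {
        try {
            // Hypothetical temporary file standing in for inventory.xml.
            Path tempFile = Files.createTempFile("inventory-sketch", ".xml");
            tempFile.toFile().deleteOnExit();
            Files.write(tempFile, xml.getBytes(StandardCharsets.UTF_8));

            DocumentBuilder parser =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            List<String> rootNames = new ArrayList<>();

            // 1. File
            Document fromFile = parser.parse(tempFile.toFile());
            rootNames.add(fromFile.getDocumentElement().getTagName());

            // 2. InputStream (closed by try-with-resources)
            try (InputStream in = Files.newInputStream(tempFile)) {
                Document fromStream = parser.parse(in);
                rootNames.add(fromStream.getDocumentElement().getTagName());
            }

            // 3. URI given as a String
            Document fromUri = parser.parse(tempFile.toUri().toString());
            rootNames.add(fromUri.getDocumentElement().getTagName());

            return rootNames;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<inventory><vendor name=\"Dell\"/></inventory>";
        System.out.println(rootNamesFromEachSource(xml));
    }
}
```

Whichever source is used, the result is the same kind of Document object, so the rest of the application does not need to know where the XML came from.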

3.3 Set Up

After deciding which type of parser to use, the application needs 2 things to be ready for evaluating XPath expressions:

Item 1: A Document object in which the Document Object Model (DOM) is stored.

Item 2: An XPath object which compiles XPath expressions and queries the DOM (Item 1).

To create the DOM, supply an XML data source to the parse() method of the parser, which is provided by a DocumentBuilderFactory object. When multiple DOMs are required, build the DOMs sequentially using a single parser.
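
The setup described above can be sketched as follows. This is only an illustration: the class name SetupSketch, the method firstVendorNames(), and the two small XML strings are assumptions, and the XML is parsed from strings rather than from inventory.xml so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SetupSketch {

    // Hypothetical sample documents, one per vendor.
    static final String DELL_XML =
            "<inventory><vendor name=\"Dell\"><computer/></vendor></inventory>";
    static final String APPLE_XML =
            "<inventory><vendor name=\"Apple\"><computer/></vendor></inventory>";

    // Builds one DOM per XML string, sequentially, with a single parser,
    // and reads the first vendor name from each DOM with a single XPath object.
    static List<String> firstVendorNames(String... xmlDocuments) {
        try {
            // The factory supplies the DOM parser.
            DocumentBuilder parser =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Item 2: the XPath object that compiles and evaluates expressions.
            XPath xPath = XPathFactory.newInstance().newXPath();

            List<String> names = new ArrayList<>();
            for (String xml : xmlDocuments) {
                // Item 1: the Document that holds the DOM.
                Document dom = parser.parse(new InputSource(new StringReader(xml)));
                names.add(xPath.evaluate("/inventory/vendor[1]/@name", dom));
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstVendorNames(DELL_XML, APPLE_XML));
    }
}
```

The parser and the XPath object are created once and reused. Neither is documented as thread-safe, which is one reason to build the DOMs one after another.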

3.4 Querying the Data

Querying the XML data is a 3 step process:

Step 1: Call the compile() method of the XPath object, which, if successful, yields an XPathExpression.

Step 2: Call the evaluate() method of the XPathExpression and specify a member of the XPathConstants class as the return type.

Step 3: Retrieve the data from the object returned in Step 2.

NOTE: Omitting the return type in Step 2 defaults the return type to String. In this case, there is no need for Step 3.

Examples:

//Read a string value
String str = xPath.compile(expression).evaluate(domDocument);

//Read a single XPath node
Node node = (Node) xPath.compile(expression).evaluate(domDocument, XPathConstants.NODE);

//Read a set of XPath nodes
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(domDocument, XPathConstants.NODESET);
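
The one-liners above leave Step 3 implicit. The sketch below spells out all three steps for the NUMBER, BOOLEAN, and NODE return types. The class name QueryingSketch, its helper methods, and the sample XML are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class QueryingSketch {

    // Hypothetical, shortened sample data.
    static final String SAMPLE_XML =
            "<inventory><vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><price>900.00</price></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><price>313.00</price></computer>"
            + "</vendor></inventory>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document sampleDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE_XML)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // Steps 1 and 2: compile the expression, then evaluate it with a return type.
    static Object evaluate(Document doc, String expression, QName returnType) {
        try {
            XPathExpression compiled = XPATH.compile(expression);   // Step 1
            return compiled.evaluate(doc, returnType);              // Step 2
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Step 3 for NUMBER: the result is a Double.
    static int computerCount(Document doc) {
        Double count = (Double) evaluate(doc, "count(//computer)", XPathConstants.NUMBER);
        return count.intValue();
    }

    // Step 3 for BOOLEAN: the result is a Boolean.
    static boolean hasCheapComputer(Document doc) {
        return (Boolean) evaluate(doc, "boolean(//computer[price < 500])",
                XPathConstants.BOOLEAN);
    }

    // Step 3 for NODE: read the text content of the returned node.
    static String firstModel(Document doc) {
        Node node = (Node) evaluate(doc, "/inventory/vendor[1]/computer[1]/model",
                XPathConstants.NODE);
        return node.getTextContent();
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(computerCount(doc));
        System.out.println(hasCheapComputer(doc));
        System.out.println(firstModel(doc));
    }
}
```

Because a NUMBER result arrives as a Double, converting it with intValue() avoids printing a count such as 2.0.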

3.5 XPath Expressions

Components of an XPath expression can include one or more of the following:

* Paths – Specifies nodes or node-sets

* Predicates – Surrounded by square brackets ([]) and used to select specific nodes.

* Operators – Usual arithmetic and logical operators, plus the union operator (|)

* Axes – Specifies a set of nodes relative to the current node.

See the W3Schools XPath tutorial for details.

http://www.w3schools.com/xml/xpath_intro.asp
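
As a quick illustration of the four components listed above, the following sketch evaluates one expression of each kind against a shortened, hypothetical copy of the inventory data. The class name ExpressionComponentsSketch and the helpers string() and count() are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ExpressionComponentsSketch {

    // Hypothetical, shortened sample data: two Dell computers and one Apple computer.
    static final String SAMPLE_XML =
            "<inventory>"
            + "<vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><os>Windows 10</os>"
            + "<ram>12GB</ram><price>900.00</price></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><os>Windows 10 Home</os>"
            + "<ram>4GB</ram><price>313.00</price></computer>"
            + "</vendor>"
            + "<vendor name=\"Apple\">"
            + "<computer><model>Apple Desktop Computer</model><os>MAC OS X</os>"
            + "<ram>8GB</ram><price>1300.00</price></computer>"
            + "</vendor>"
            + "</inventory>";

    static final Document DOC = parse(SAMPLE_XML);
    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // String value of the first node selected by the expression.
    static String string(String expression) {
        try {
            return XPATH.evaluate(expression, DOC);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Number of nodes selected by the expression.
    static int count(String expression) {
        try {
            Double n = (Double) XPATH.evaluate("count(" + expression + ")", DOC,
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Path: every computer under every vendor.
        System.out.println(count("/inventory/vendor/computer"));
        // Predicate: only the vendor whose name attribute is Apple.
        System.out.println(string("//vendor[@name='Apple']/computer/model"));
        // Operators: comparison and logical operators, then the union operator.
        System.out.println(count("//computer[price > 500 and ram != '4GB']"));
        System.out.println(count("//model | //os"));
        // Axis: walk up from a computer to its vendor.
        System.out.println(
                string("//computer[model='Low Cost Windows Laptop']/ancestor::vendor/@name"));
    }
}
```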

3.6 Walking the DOM

Learning how to use XPath expressions involves a steep and sometimes frustrating learning curve. Walking the DOM and using org.w3c.dom.Element objects to visit and output node values is often a simpler approach. The trade-off is that this tends to involve more coding.
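
To show the trade-off, here is a minimal sketch that collects the Dell model names twice: once by walking the DOM and once with a single XPath expression. The class name WalkingSketch, the method names, and the sample XML are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WalkingSketch {

    // Hypothetical, shortened sample data.
    static final String SAMPLE_XML =
            "<inventory>"
            + "<vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model></computer>"
            + "<computer><model>Low Cost Windows Laptop</model></computer>"
            + "</vendor>"
            + "<vendor name=\"Apple\">"
            + "<computer><model>Apple Desktop Computer</model></computer>"
            + "</vendor>"
            + "</inventory>";

    static final String VENDOR = "Dell";

    static Document sampleDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE_XML)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // Walking the DOM: visit each vendor element, then each model below it.
    static List<String> modelsByWalking(Document doc) {
        List<String> models = new ArrayList<>();
        NodeList vendors = doc.getDocumentElement().getElementsByTagName("vendor");
        for (int i = 0; i < vendors.getLength(); i++) {
            Element vendor = (Element) vendors.item(i);
            if (!VENDOR.equals(vendor.getAttribute("name"))) {
                continue;
            }
            NodeList modelNodes = vendor.getElementsByTagName("model");
            for (int j = 0; j < modelNodes.getLength(); j++) {
                models.add(modelNodes.item(j).getTextContent());
            }
        }
        return models;
    }

    // XPath: one expression does the filtering and the navigation.
    static List<String> modelsByXPath(Document doc) {
        try {
            NodeList modelNodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//vendor[@name='" + VENDOR + "']/computer/model",
                            doc, XPathConstants.NODESET);
            List<String> models = new ArrayList<>();
            for (int i = 0; i < modelNodes.getLength(); i++) {
                models.add(modelNodes.item(i).getTextContent());
            }
            return models;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(modelsByWalking(doc));
        System.out.println(modelsByXPath(doc));
    }
}
```

Both methods return the same list. The walking version uses only plain loops and comparisons, while the XPath version moves that logic into the expression string.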

3.7 Putting It All Together

Now it is time to wrap up and demonstrate with a Java SE XPath application. The application is very simple and is divided into 2 main portions:

– Item 1: Setup

– Item 2: Examples using XPath expressions, and “Walking the DOM”

import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XPathBestPractices {

    public static void main(String... args) {

        ////////////////////////////////////////
        // First do the setup
        //
        // Instantiate the factory that supplies the DOM parser
        DocumentBuilderFactory builderFactory =
                DocumentBuilderFactory.newInstance();

        DocumentBuilder domParser = null;
        try {
            // Instantiate the DOM parser
            domParser = builderFactory.newDocumentBuilder();

            // Item 1: Load the DOM Document from the XML data using the parser
            Document domDocument =
                    domParser.parse(new FileInputStream("inventory.xml"));

            // Item 2: Instantiate an XPath object which compiles
            // and evaluates XPath expressions.
            XPath xPath = XPathFactory.newInstance().newXPath();

            String expr = null; // Used to hold the XPath expressions

            ////////////////////////////////////////
            // Now it's time to use the domDocument and the xPath objects,
            // repeatedly, to query the data out.

            // Use the XPath count() function to count the number of computers
            expr = "count(//computer)";
            Number computerCount = (Number) xPath.compile(expr).evaluate(domDocument,
                  XPathConstants.NUMBER);
            // XPath numbers are doubles; print the count as a whole number
            System.out.println("1. There are " + computerCount.intValue() +
                  " computers in the inventory.");
            outputSeparator();


            // Get a list of the vendors
            // The following expression selects the name attribute of every
            // vendor element that has one.
            expr = "//vendor[@name]/@name";
            NodeList resultNodeList = (NodeList) xPath.compile(expr)
                  .evaluate(domDocument, XPathConstants.NODESET);
            if (resultNodeList != null) {
                int vendorCount = resultNodeList.getLength();
                System.out.println("2. There are " + vendorCount + " vendors:");
                for (int i = 0; i < vendorCount; i++) { 
                     Node vendorNode = resultNodeList.item(i); 
                     String name = vendorNode.getNodeValue();
                     System.out.println(name); 
                }
            }
            outputSeparator();

            // Walk the DOM to print the computers in inventory
            Element rootElement = domDocument.getDocumentElement();
            NodeList modelNodeList = rootElement
                .getElementsByTagName("computer");
            System.out.println("3. Computer models in inventory:"); 
            if (modelNodeList != null && modelNodeList.getLength() > 0) {
                for (int i = 0; i < modelNodeList.getLength(); i++) {
                    Node node = modelNodeList.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element e = (Element) node;

                        displayNode(e, "model", "Model           : ");
                        displayNode(e, "os", "Operating System: ");
                        displayNode(e, "ram", "Installed RAM   : ");
                        displayNode(e, "cpu", "Processor       : ");
                        displayNode(e, "price", "Price           : $");
                        System.out.println();
                    }
                }
            }

        } catch (SAXException e) {
            // Even though we are using a DOM parser a SAXException is thrown
            // if the DocumentBuilder cannot parse the XML file
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
    }

    // Helper method to pretty up the output
    public static void outputSeparator() {
        System.out.println("=+=+=+=+=+=+=+=+");
    }

    // Helper method to output a node
    public static void displayNode(Element parent, String childName, String label) {
        NodeList nodeList = parent.getElementsByTagName(childName);
        System.out.println(label
                + nodeList.item(0).getChildNodes().item(0).getNodeValue());
    }
    
}
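
The introduction mentions results that are calculated using XPath expressions. Beyond count(), XPath 1.0 can also total and compare values. The sketch below is a hedged illustration using a shortened, hypothetical sample of the data. The class name CalculatedValuesSketch and its methods are assumptions and are not part of the downloadable source.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CalculatedValuesSketch {

    // Hypothetical, shortened sample data with three prices: 900, 313, and 1300.
    static final String SAMPLE_XML =
            "<inventory>"
            + "<vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><price>900.00</price></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><price>313.00</price></computer>"
            + "</vendor>"
            + "<vendor name=\"Apple\">"
            + "<computer><model>Apple Desktop Computer</model><price>1300.00</price></computer>"
            + "</vendor>"
            + "</inventory>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document sampleDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE_XML)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    static double number(Document doc, String expression) {
        try {
            return (Double) XPATH.evaluate(expression, doc, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Total value of the inventory.
    static double totalPrice(Document doc) {
        return number(doc, "sum(//computer/price)");
    }

    // Average price; XPath 1.0 has no avg() function, so divide sum by count.
    static double averagePrice(Document doc) {
        return number(doc, "sum(//computer/price) div count(//computer)");
    }

    // Cheapest model; XPath 1.0 has no min() function, so select the computer
    // whose price is not greater than any other price.
    static String cheapestModel(Document doc) {
        try {
            return XPATH.evaluate("//computer[not(price > //computer/price)]/model", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println("Total price  : " + totalPrice(doc));
        System.out.println("Average price: " + averagePrice(doc));
        System.out.println("Cheapest     : " + cheapestModel(doc));
    }
}
```

The same expressions could be assigned to expr in the application above and evaluated against domDocument.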

4. Download Complete Source Code and XML Data

Download
You can download the full source code and data for this example here: javaxpathbestpracticestutorial
