Java XPath Examples
1. Introduction
The previous article, Java XPath Best Practices Tutorial (https://examples.javacodegeeks.com/core-java/xpath-best-practices-tutorial/), explored how to set up a Java application that uses a DOM parser to read an XML file into a DOM (Document Object Model) document, and an XPath object to evaluate XPath expressions against that DOM.
This article dives into how to construct XPath expressions, starting with the syntax used to build them and ending with some examples that sum up the concepts explored.
The download for this article includes both the inventory.xml file used in the previous article and the complete source code for a simple Java console application called XPath Expression Explorer. More details about XPath Expression Explorer are revealed throughout this article.
2. XPath Expression Explorer
This article builds and uses a Java application (XPath Expression Explorer) to reveal facts about XPath expressions and to help shorten the learning curve encountered when learning XPath expressions.
2.1 The Data
Below is the inventory.xml file from the previous article.
inventory.xml
<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <vendor name="Dell">
        <computer>
            <model>Win 10 Laptop</model>
            <os>Windows 10</os>
            <cpu>Intel i7</cpu>
            <ram>12GB</ram>
            <price>900.00</price>
        </computer>
        <computer>
            <model>Low Cost Windows Laptop</model>
            <os>Windows 10 Home</os>
            <cpu>Intel Pentium</cpu>
            <ram>4GB</ram>
            <price>313.00</price>
        </computer>
        <computer>
            <model>64 Bit Windows Desktop Computer</model>
            <os>Windows 10 Home 64 Bit</os>
            <cpu>AMD A8-Series</cpu>
            <ram>8GB</ram>
            <price>330.00</price>
        </computer>
    </vendor>
    <vendor name="Apple">
        <computer>
            <model>Apple Desktop Computer</model>
            <os>MAC OS X</os>
            <cpu>Intel Core i5</cpu>
            <ram>8GB</ram>
            <price>1300.00</price>
        </computer>
        <computer>
            <model>Apple Low Cost Desktop Computer</model>
            <os>OS X Yosemite</os>
            <cpu>4th Gen Intel Core i5</cpu>
            <ram>8GB</ram>
            <price>700.00</price>
        </computer>
    </vendor>
    <vendor name="HP">
        <computer>
            <model>HP Low Cost Windows 10 Laptop</model>
            <os>Windows 10 Home</os>
            <cpu>AMD A6-Series</cpu>
            <ram>4GB</ram>
            <price>230.00</price>
        </computer>
        <computer>
            <model>Windows 7 Desktop</model>
            <os>Windows 7</os>
            <cpu>6th Gen Intel Core i5</cpu>
            <ram>6GB</ram>
            <price>750.00</price>
        </computer>
        <computer>
            <model>HP High End, Low Cost 64 Bit Desktop</model>
            <os>Windows 10 Home 64 Bit</os>
            <cpu>6th Gen Intel Core i7</cpu>
            <ram>12GB</ram>
            <price>800.00</price>
        </computer>
    </vendor>
</inventory>
A. There are 3 vendors; each vendor has a unique name
B. There are 8 computers defined
C. Each computer node has 5 children:
* model - Name of this configuration
* os - Name of Operating System installed
* cpu - Type of processor
* ram - Size of installed RAM
* price - Expressed as a decimal number
2.2 The Application
Below is the Java code that comprises the XPath Expression Explorer console application.
JavaXPathExpressionExplorer.java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class JavaXPathExpressionExplorer {

    public static final String DEFAULT_XML_FILENAME = "inventory.xml";

    public static void main(String... args) {

        // Setup an InputStreamReader to read from the keyboard
        InputStreamReader reader = new InputStreamReader(System.in);
        BufferedReader in = new BufferedReader(reader);

        // Instantiate the factory that supplies the DOM parser
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder domParser = null;

        try {
            // Instantiate the DOM parser
            domParser = builderFactory.newDocumentBuilder();

            // Load the DOM Document from the XML data using the parser
            Document domDocument = domParser.parse(getFileInputStreamName(in));

            // Instantiate an XPath object which compiles
            // and evaluates XPath expressions.
            XPath xPath = XPathFactory.newInstance().newXPath();

            while (true) {
                System.out.print("Enter expression (blank line to exit): ");
                String expr = in.readLine(); // Holds the XPath expression

                try {
                    if ((expr == null) || (expr.length() == 0)) {
                        System.exit(0); // User is done entering expressions
                    }
                    System.out.println(expr + " evaluates to:");

                    // See if expr evaluates to a String
                    String resString = (String) xPath.compile(expr)
                            .evaluate(domDocument, XPathConstants.STRING);
                    if (resString != null) {
                        System.out.println("String: " + resString);
                    }

                    // See if expr evaluates to a Number
                    Number resNumber = (Number) xPath.compile(expr)
                            .evaluate(domDocument, XPathConstants.NUMBER);
                    if (resNumber != null) {
                        System.out.println("Number: " + resNumber);
                    }

                    // See if expr evaluates to a Boolean
                    Boolean resBoolean = (Boolean) xPath.compile(expr)
                            .evaluate(domDocument, XPathConstants.BOOLEAN);
                    if (resBoolean != null) { // was resNumber, a copy-paste slip
                        System.out.println("Boolean: " + resBoolean);
                    }

                    // See if expr evaluates to a single Node
                    Node resNode = (Node) xPath.compile(expr)
                            .evaluate(domDocument, XPathConstants.NODE);
                    if (resNode != null) {
                        System.out.println("Node: " + resNode);
                    }

                    // See if expr evaluates to a NodeList
                    NodeList resNodeList = (NodeList) xPath.compile(expr)
                            .evaluate(domDocument, XPathConstants.NODESET);
                    if (resNodeList != null) {
                        int lenList = resNodeList.getLength();
                        System.out.println("Number of nodes in NodeList: " + lenList);
                        for (int i = 1; i <= lenList; i++) {
                            resNode = resNodeList.item(i - 1);
                            String resNodeNameStr = resNode.getNodeName();
                            String resNodeTextStr = resNode.getTextContent();
                            System.out.println(i + ": " + resNode
                                    + " (NodeName:'" + resNodeNameStr
                                    + "' NodeTextContent:'" + resNodeTextStr + "')");
                        }
                    }
                    outputSeparator();

                } catch (XPathExpressionException e) {
                    // Do nothing. This prevents output to console if
                    // expression result type is not appropriate
                    // for the XPath expression being compiled and evaluated
                }
            } // end while (true)

        } catch (SAXException e) {
            // Even though we are using a DOM parser a SAXException is thrown
            // if the DocumentBuilder cannot parse the XML file
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        }
    }

    // Helper method that asks for the name of the XML file to load into the DOM Document
    public static String getFileInputStreamName(BufferedReader inputReader) {
        System.out.print("Enter XML file name (abc.xml) "
                + "or leave blank to use " + DEFAULT_XML_FILENAME + ": ");
        String fileName = null;
        try {
            fileName = inputReader.readLine();
        } catch (IOException e) {
            e.printStackTrace();
        }
        if ((fileName == null) || (fileName.length() == 0)) {
            return DEFAULT_XML_FILENAME;
        }
        return fileName;
    }

    // Helper method to pretty up the output
    public static void outputSeparator() {
        System.out.println("=+=+=+=+=+=+=+=+");
    }
}
The application initially prompts the user for an XML filename. Respond to this prompt with a blank line to use the inventory.xml file; because the name is passed to DocumentBuilder.parse() as a relative path, the file is expected in the directory from which the application is run.
The application then takes an XPath expression entered from the keyboard, compiles and evaluates it using different return types (as determined by XPathConstants), and displays the results to the user.
The main loop in this application repeatedly prompts for XPath expressions. Entering a blank line terminates the application.
Admittedly the application is crude, but it is effective for learning about XPath expressions.
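The empty catch block in the listing matters because not every expression can be converted to every return type. The sketch below is a minimal illustration of that behavior, assuming the JDK's default XPath implementation; the class name ReturnTypeMismatchSketch, its helper methods, and the inlined two-computer document are hypothetical and are not part of the download.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReturnTypeMismatchSketch {

    // Hypothetical two-computer document, inlined so the sketch is self-contained
    static final String XML =
            "<inventory><vendor name=\"Dell\">"
            + "<computer><cpu>Intel i7</cpu></computer>"
            + "<computer><cpu>Intel Pentium</cpu></computer>"
            + "</vendor></inventory>";

    // Builds a DOM Document from the inlined XML
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true when the expression can be evaluated as a node-set,
    // false when the XPath engine rejects the requested return type
    static boolean evaluatesToNodeSet(String expr) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            xPath.evaluate(expr, parse(XML), XPathConstants.NODESET);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A location path selects nodes, so NODESET is an appropriate type
        System.out.println(evaluatesToNodeSet("//cpu"));
        // count() yields a number, which cannot be converted to a node-set
        System.out.println(evaluatesToNodeSet("count(//cpu)"));
    }
}
```

In the Explorer listing all five evaluations share one try block, so the first conversion that fails also skips the remaining evaluations and the separator line for that expression.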
3. XPath Expressions
3.1 XPathConstants Effect on XPath Expressions
The evaluate() method of an XPath object accepts an optional XPathConstants value that determines the data type of the result returned and, with it, the value of the result.
NOTE: If the optional XPathConstant is not passed to evaluate(), the data type of the result returned by evaluate() is String.
The table below shows the effects of the different XPathConstants when the XPath expression /inventory/vendor/computer/cpu[text() = "Intel Pentium"] is evaluated given a DOM built from the inventory.xml file (noted in section 2.1 The Data).
Table showing effects of different XPathConstants
XPath Constant | Java Data Type | Value Returned |
XPathConstants.STRING | String | Intel Pentium |
XPathConstants.NUMBER | Number | NaN |
XPathConstants.BOOLEAN | Boolean | true |
XPathConstants.NODE | Node | [cpu: null] |
XPathConstants.NODESET | NodeList | [cpu: null] |
It is worth noting that, for the node in the NodeList returned with XPathConstants.NODESET (the last row of the table):
- Executing the getNodeName() method returns the String “cpu”
- Executing the getTextContent() method returns the String “Intel Pentium” (namely, the same value as shown in the XPathConstants.STRING row). Executing getNodeValue() returns null instead, because cpu is an element node.
This is shown in the code below, which has been excerpted from the XPath Expression Explorer:
Excerpt from JavaXPathExpressionExplorer.java
NodeList resNodeList = (NodeList) xPath.compile(expr)
        .evaluate(domDocument, XPathConstants.NODESET);
if (resNodeList != null) {
    int lenList = resNodeList.getLength();
    System.out.println("Number of nodes in NodeList: " + lenList);
    for (int i = 1; i <= lenList; i++) {
        resNode = resNodeList.item(i - 1);
        String resNodeNameStr = resNode.getNodeName();
        String resNodeTextStr = resNode.getTextContent();
        System.out.println(i + ": " + resNode
                + " (NodeName:'" + resNodeNameStr
                + "' NodeTextContent:'" + resNodeTextStr + "')");
    }
}
Which renders the following output when executed:
Output from code excerpt, above
Number of nodes in NodeList: 1
1: [cpu: null] (NodeName:'cpu' NodeTextContent:'Intel Pentium')
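The conversions listed in the table can also be reproduced outside the Explorer. The following is a minimal sketch, not the article's code: the class name XPathConstantsSketch, its helper methods, and the cut-down document inlined as a string are assumptions made for illustration.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathConstantsSketch {

    // Hypothetical cut-down version of inventory.xml, inlined for self-containment
    static final String XML =
            "<inventory><vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><cpu>Intel i7</cpu></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><cpu>Intel Pentium</cpu></computer>"
            + "</vendor></inventory>";

    static final String EXPR =
            "/inventory/vendor/computer/cpu[text() = \"Intel Pentium\"]";

    // Evaluates EXPR against the inlined document with the requested return type
    static Object eval(QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String asString() { return (String) eval(XPathConstants.STRING); }

    static double asNumber() { return (Double) eval(XPathConstants.NUMBER); }

    static boolean asBoolean() { return (Boolean) eval(XPathConstants.BOOLEAN); }

    static String asNodeText() { return ((Node) eval(XPathConstants.NODE)).getTextContent(); }

    static int nodeSetSize() { return ((NodeList) eval(XPathConstants.NODESET)).getLength(); }

    public static void main(String[] args) {
        System.out.println("String:  " + asString());   // text of the matching cpu node
        System.out.println("Number:  " + asNumber());   // NaN: the text is not numeric
        System.out.println("Boolean: " + asBoolean());  // true: the node-set is not empty
        System.out.println("Node:    " + asNodeText());
        System.out.println("NodeSet: " + nodeSetSize() + " node(s)");
    }
}
```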
3.2 XPath Expression Syntax
DOM documents represent XML data as a tree structure. An XPath expression is a series of steps forming a path through the tree, where each step specifies a node or a set of nodes (NodeList) from the tree.
Each step comes from one of the following categories:
Node specifications
* | matches any element node |
/ | specifies the root node, which is the first node in the tree |
// | specifies nodes in the tree that match the selection regardless of location within the tree |
. | specifies the current node |
.. | specifies the parent of the current node |
nodename | specifies all child elements of the current node with the name “nodename” |
@ | specifies attributes within the node |
@* | matches any attribute of the current node |
node() | matches any node of any kind |
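To see what these node specifications select, each one can be evaluated as a node-set and the names of the matching nodes listed. The sketch below does that under stated assumptions: the class NodeSpecificationSketch, the helpers names() and first(), and the two-vendor document inlined as a string are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.util.StringJoiner;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeSpecificationSketch {

    // Hypothetical cut-down version of inventory.xml (no whitespace between tags)
    static final String XML =
            "<inventory>"
            + "<vendor name=\"Dell\"><computer><model>Win 10 Laptop</model>"
            + "<cpu>Intel i7</cpu></computer></vendor>"
            + "<vendor name=\"HP\"><computer><model>Windows 7 Desktop</model>"
            + "<cpu>6th Gen Intel Core i5</cpu></computer></vendor>"
            + "</inventory>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates expr as a node-set and returns the node names, comma separated
    static String names(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, parse(), XPathConstants.NODESET);
            StringJoiner out = new StringJoiner(", ");
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeName());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr as a String (the string value of the first matching node)
    static String first(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("/inventory/*"));     // element children of the root element
        System.out.println(names("//cpu"));            // cpu elements anywhere in the tree
        System.out.println(names("//cpu/.."));         // parents of the cpu elements
        System.out.println(names("//vendor/@*"));      // all attributes of the vendor elements
        System.out.println(first("/inventory/vendor/@name"));
        // "." is the node being tested, ".." steps back up to its parent
        System.out.println(first("//model[. = 'Windows 7 Desktop']/../cpu"));
    }
}
```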
Predicates
Predicates are used to select specific nodes and are always surrounded by square brackets ‘[]’.
Examples of some predicates are:
/inventory/vendor/computer[1] | Selects the first computer node that is a child of each vendor node |
/inventory/vendor/computer[last()] | Selects the last computer node that is a child of each vendor node |
/inventory/vendor/computer[last()-1] | Selects the computer before the last computer that is a child of each vendor node |
/inventory/vendor/computer[price>350.00] | Selects all the computer nodes of any vendor with a price value greater than 350.00 |
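Positional predicates are applied per parent, which is easiest to see with more than one vendor. The following is a rough sketch under stated assumptions: the class PredicateSketch, its models() helper, and the inlined document (two vendors, with only the model and price children kept) are hypothetical.

```java
import java.io.StringReader;
import java.util.StringJoiner;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateSketch {

    // Hypothetical cut-down version of inventory.xml: two vendors, model and price only
    static final String XML =
            "<inventory>"
            + "<vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><price>900.00</price></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><price>313.00</price></computer>"
            + "<computer><model>64 Bit Windows Desktop Computer</model><price>330.00</price></computer>"
            + "</vendor>"
            + "<vendor name=\"Apple\">"
            + "<computer><model>Apple Desktop Computer</model><price>1300.00</price></computer>"
            + "<computer><model>Apple Low Cost Desktop Computer</model><price>700.00</price></computer>"
            + "</vendor>"
            + "</inventory>";

    // Applies the predicate to the computer step and returns the matching model names
    static String models(String predicate) {
        String expr = "/inventory/vendor/computer" + predicate + "/model";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            StringJoiner out = new StringJoiner(", ");
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(models("[1]"));              // first computer of each vendor
        System.out.println(models("[last()]"));         // last computer of each vendor
        System.out.println(models("[last()-1]"));       // next-to-last computer of each vendor
        System.out.println(models("[price > 350.00]")); // computers priced above 350.00
    }
}
```

Note that the positions restart for every vendor: [1] yields one computer per vendor, not one computer for the whole document.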
Axes
XPath axes specify a set of nodes relative to the current node.
ancestor | specifies all ancestors (such as parent, or grandparent) of the current node |
ancestor-or-self | specifies all ancestors of the current node and the current node itself |
attribute | specifies all attributes of the current node |
child | specifies all children of the current node |
descendant | specifies all descendants (such as children, or grandchildren) of the current node |
descendant-or-self | specifies all descendants of the current node and the current node itself |
following | specifies everything in the document after the closing tag of the current node |
following-sibling | specifies all siblings after the current node |
namespace | specifies all namespace nodes of the current node |
parent | specifies the parent of the current node |
preceding | specifies all nodes that appear before the current node in the document except ancestors, attribute nodes and namespace nodes |
preceding-sibling | specifies all siblings before the current node |
self | specifies the current node |
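An axis is written in front of a node test as axis::test. The sketch below walks a few axes starting from a cpu node; it is only an illustration, and the class AxesSketch, its eval() helper, and the single-computer document inlined as a string are assumptions rather than part of the article's download.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AxesSketch {

    // Hypothetical single-computer document taken from the first Dell entry
    static final String XML =
            "<inventory><vendor name=\"Dell\"><computer>"
            + "<model>Win 10 Laptop</model><os>Windows 10</os><cpu>Intel i7</cpu>"
            + "<ram>12GB</ram><price>900.00</price>"
            + "</computer></vendor></inventory>";

    // Evaluates expr and returns the result converted to a String
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // parent: the element that contains cpu
        System.out.println(eval("name(//cpu/parent::*)"));
        // ancestor: computer, vendor and inventory
        System.out.println(eval("count(//cpu/ancestor::*)"));
        System.out.println(eval("//cpu/ancestor::vendor/@name"));
        // preceding-sibling: [1] is the nearest sibling before cpu
        System.out.println(eval("name(//cpu/preceding-sibling::*[1])"));
        System.out.println(eval("//cpu/preceding-sibling::model"));
        // following-sibling: [1] is the nearest sibling after cpu
        System.out.println(eval("name(//cpu/following-sibling::*[1])"));
        System.out.println(eval("count(//cpu/following-sibling::*)"));
    }
}
```

On a reverse axis such as preceding-sibling, position 1 is the node closest to the current node, which is why [1] yields os rather than model here.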
Operators
Node Set Operator | |
| | Union of two node-sets (CAUTION: The Union operator combines two node sets into one that contains every node found in either set. In most computer languages ‘|’ is a boolean OR operation) |
Arithmetic Operators | |
+ | Addition |
- | Subtraction |
* | Multiplication |
div | Division (floating-point, not integer) |
mod | Modulus (division remainder) |
Logical Operators | |
and | And |
or | Or |
= | Equal |
!= | Not equal |
< | Less than |
<= | Less than or equal to |
> | Greater than |
>= | Greater than or equal to |
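A few of these operators behave differently from their look-alikes in Java, so it can help to evaluate them directly. The sketch below is illustrative only; the class OperatorSketch, its num() helper, and the three-computer document inlined as a string are hypothetical names and data chosen for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OperatorSketch {

    // Hypothetical cut-down version of the Dell entries in inventory.xml
    static final String XML =
            "<inventory><vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><cpu>Intel i7</cpu>"
            + "<price>900.00</price></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><cpu>Intel Pentium</cpu>"
            + "<price>313.00</price></computer>"
            + "<computer><model>64 Bit Windows Desktop Computer</model><cpu>AMD A8-Series</cpu>"
            + "<price>330.00</price></computer>"
            + "</vendor></inventory>";

    // Evaluates expr with XPathConstants.NUMBER and returns the result as a double
    static double num(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(num("7 div 2"));  // 3.5: div does not truncate
        System.out.println(num("7 mod 2"));  // 1.0
        // Union merges two node-sets: 3 model nodes plus 3 cpu nodes
        System.out.println(num("count(//model | //cpu)"));    // 6.0
        // A node appears only once in a union, even if both operands select it
        System.out.println(num("count(//model | //model)"));  // 3.0
        // Comparison and logical operators inside a predicate
        System.out.println(num("count(//computer[price >= 300 and price < 800])"));         // 2.0
        System.out.println(num("count(//computer[price > 350 or cpu = 'Intel Pentium'])")); // 2.0
    }
}
```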
Functions
There is a vast array of XPath functions, far too many to cover in any detail here. If a function requires a text argument, as opposed to a Node or NodeList, use text() to retrieve the text associated with the current Node.
For information concerning XPath functions consult Section 4 of the XPath Specification.
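As a taste of what the function library offers, the sketch below calls a handful of XPath 1.0 core functions (count, sum, string-length, contains, starts-with, concat). It is a minimal illustration, not a survey; the class FunctionSketch, its eval() helper, and the three-computer document inlined as a string are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FunctionSketch {

    // Hypothetical cut-down version of the Dell entries in inventory.xml
    static final String XML =
            "<inventory><vendor name=\"Dell\">"
            + "<computer><model>Win 10 Laptop</model><cpu>Intel i7</cpu>"
            + "<price>900.00</price></computer>"
            + "<computer><model>Low Cost Windows Laptop</model><cpu>Intel Pentium</cpu>"
            + "<price>313.00</price></computer>"
            + "<computer><model>64 Bit Windows Desktop Computer</model><cpu>AMD A8-Series</cpu>"
            + "<price>330.00</price></computer>"
            + "</vendor></inventory>";

    // Evaluates expr and returns the result converted to a String
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(//computer)"));  // 3
        System.out.println(eval("sum(//price)"));       // 1543
        System.out.println(eval("string-length(/inventory/vendor/computer[1]/model)")); // 13
        System.out.println(eval("contains(/inventory/vendor/computer[2]/cpu, 'Pentium')")); // true
        // text() hands the text of the cpu node to a string function
        System.out.println(eval("//computer/cpu[starts-with(text(), 'AMD')]"));
        System.out.println(eval("concat(/inventory/vendor/@name, ': ', count(//computer))")); // Dell: 3
    }
}
```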
3.3 XPath Expression Examples
Use the sample XPath expressions below with the inventory.xml file and the XPath Expression Explorer. The download for this article includes both the inventory.xml file and the source for the XPath Expression Explorer.
- Get a list of all “AMD” processors
/inventory/vendor/computer/cpu[contains(text(),"AMD")]
- Get a list of the models of all computers with AMD processors
/inventory/vendor/computer/cpu[contains(text(),"AMD")]/preceding-sibling::model
- Get the cpu node of every computer with a cpu of “Intel Pentium”
/inventory/vendor/computer/cpu[text() = "Intel Pentium"]
- Get the ram node of every computer with 4 GB of RAM
/inventory/vendor/computer/ram[text()="4GB"]
- Get the names of all the vendors with computers with AMD processors
//computer[contains(cpu,'AMD')]/parent::node()/@name
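The same expressions can be run from code instead of being typed into the Explorer. The sketch below is one hedged way to do so: the class ExamplesSketch, its texts() helper, and the three-computer subset of inventory.xml inlined as a string are assumptions, so the results differ from what the full file produces.

```java
import java.io.StringReader;
import java.util.StringJoiner;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExamplesSketch {

    // Hypothetical three-computer subset of inventory.xml (os and price omitted)
    static final String XML =
            "<inventory>"
            + "<vendor name=\"Dell\">"
            + "<computer><model>Low Cost Windows Laptop</model><cpu>Intel Pentium</cpu>"
            + "<ram>4GB</ram></computer>"
            + "<computer><model>64 Bit Windows Desktop Computer</model><cpu>AMD A8-Series</cpu>"
            + "<ram>8GB</ram></computer>"
            + "</vendor>"
            + "<vendor name=\"HP\">"
            + "<computer><model>HP Low Cost Windows 10 Laptop</model><cpu>AMD A6-Series</cpu>"
            + "<ram>4GB</ram></computer>"
            + "</vendor>"
            + "</inventory>";

    // Evaluates expr as a node-set and returns the text of each node, comma separated
    static String texts(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            StringJoiner out = new StringJoiner(", ");
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/inventory/vendor/computer/cpu[contains(text(),\"AMD\")]",
            "/inventory/vendor/computer/cpu[contains(text(),\"AMD\")]/preceding-sibling::model",
            "/inventory/vendor/computer/cpu[text() = \"Intel Pentium\"]",
            "/inventory/vendor/computer/ram[text()=\"4GB\"]",
            "//computer[contains(cpu,'AMD')]/parent::node()/@name"
        };
        for (String expr : expressions) {
            System.out.println(expr + " -> " + texts(expr));
        }
    }
}
```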
4. Download The Source Code
This was a Java XPath example.
You can download the full source code for this article here: JavaXPathExamples.zip