XPath Unique/Distinct Value Example
In this article we will learn how to select distinct values from an XML document using XPath. We will use IntelliJ as the IDE and the Saxon API for XPath evaluation.
1. Introduction
XPath is a W3C recommendation and is a major element in the XSLT standard. It can be used to navigate through elements and attributes in an XML document. It is a syntax for defining parts of an XML document and uses path expressions to navigate in XML documents. It contains a library of standard functions. XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the expressions you see when you work with a traditional computer file system. XPath expressions can be used in JavaScript, Java, XML Schema, PHP, Python, C and C++, and lots of other languages.
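To make the idea of a path expression concrete, here is a minimal sketch that uses the XPath 1.0 engine bundled with the JDK (javax.xml.xpath). The class name, the helper method select and the inline sample document are assumptions made only for this illustration; they are not part of the example project built later in the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PathExpressionSketch {

    // Hypothetical sample document, used only for this illustration.
    static final String XML =
        "<books>"
      + "<book id=\"1\"><title>Title 1</title></book>"
      + "<book id=\"2\"><title>Title 2</title></book>"
      + "</books>";

    // Evaluates an XPath 1.0 expression against XML and returns
    // the text content of every node it selects.
    public static List<String> select(String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(
                expression,
                new InputSource(new StringReader(XML)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // A path from the root, much like a file system path.
        System.out.println(select("/books/book/title"));
        // A predicate that filters on an attribute value.
        System.out.println(select("//book[@id='2']/title"));
    }
}
```

The first expression selects both title elements, while the second selects only the title of the book whose id attribute is 2.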
2. Working with Examples
2.1 Create new project
In this section we will learn how to create a simple Java project in IntelliJ. Then we will write some simple code to test our XPath example. Please note that you can use any other Integrated Development Environment (IDE) as well, but then the steps for creating the project will be different.
Open IntelliJ and go to File=>New=>Project.
Choose Java and click Next.
In the next window IntelliJ will ask whether you want to create the project from a template. We will not do that, so leave everything as it is and click Next. In the following window give the project name and location and click Finish.
Now we will see how to create a new package and new class in this newly created project. Right click on the ‘src’ folder and choose New=>Package. Give the package name and click OK.
Now right click on the newly created package and choose New=>Java Class. Give the class name and click OK.
2.2 fn:distinct-values
We can use the distinct-values function, which is available in XPath 2.0, to find unique values. The fn:distinct-values function returns a sequence of unique atomic values from $arg. Values are compared based on their typed value. Values of different numeric types may be equal; for example, the xs:integer value 1 is equal to the xs:decimal value 1.0, so the function returns only one of these values. If two values have incomparable types, e.g. xs:string and xs:integer, they are considered distinct rather than raising an error. Untyped values are treated like strings.
The $arg sequence can contain atomic values, nodes, or a combination of the two. Nodes in the sequence have their typed values extracted using the usual function conversion rules. This means that only the contents of the nodes are compared, not any other properties of the nodes (for example, their names).
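The comparison rules above can be imitated in plain Java. The following is only a rough sketch of the semantics, not Saxon's implementation: the class and method names are hypothetical, only Integer, Long, BigDecimal and String inputs are handled, and details such as xs:double, NaN, dates and collations are ignored.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistinctValuesSemanticsSketch {

    // Maps a value to the key used for comparison. Integer, Long and
    // BigDecimal values are normalised to a BigDecimal without trailing
    // zeros, so 1 and 1.0 share a key. Everything else (for example a
    // String) is its own key, so "1" never matches the number 1.
    public static Object comparisonKey(Object value) {
        if (value instanceof Integer || value instanceof Long || value instanceof BigDecimal) {
            return new BigDecimal(value.toString()).stripTrailingZeros();
        }
        return value;
    }

    // Keeps the first occurrence of each distinct value, in input order.
    // (Which of several equal values is kept is an assumption of this sketch.)
    public static List<Object> distinctValues(List<Object> input) {
        Map<Object, Object> seen = new LinkedHashMap<>();
        for (Object value : input) {
            seen.putIfAbsent(comparisonKey(value), value);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        // The integer 1 and the decimal 1.0 compare equal; the string "1" is
        // of an incomparable type and stays distinct; "a" occurs twice.
        List<Object> input = Arrays.asList(1, new BigDecimal("1.0"), "1", "a", "a");
        System.out.println(distinctValues(input).size()); // prints 3
    }
}
```

Using a separate comparison key keeps the original values intact while still treating numerically equal values of different types as duplicates.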
2.3 Saxon
Since the distinct-values function is only available from XPath 2.0 onward, and the XPath engine bundled with the JDK supports only XPath 1.0, we will make use of the Saxon package. Saxon is a collection of tools for processing XML documents. It has an XSLT 2.0 processor, which implements the XSLT and XPath Recommendations from the World Wide Web Consortium with a number of powerful extensions. It also has an XPath 2.0 processor accessible to Java applications, and an XQuery 1.0 processor that can be used from the command line or invoked from a Java application through an API.
Download the Saxon JAR file and add it as a dependency. To add the JAR file as a project dependency, right-click on the project and choose ‘Open Module Settings’. Go to the Dependencies tab.
Click on ‘+’ and choose ‘JARs or directories…’. Select the downloaded Saxon JAR file and click OK. This example uses version 9.8.0-3.
2.4 Sample Code
In this section we will write some sample code. For simplicity we will create the XML file in the same location as the Java file. Ideally you should keep static files in a separate location. To create an XML file, right-click on the package and choose New => File. Give the file name and click OK. We will create some test entries as below:
books.xml
<books>
    <book id="123456">
        <title>Title 1</title>
        <author>Author 1</author>
        <publisher>Publisher 1</publisher>
        <isbn>ISBN1</isbn>
        <cost>56.98</cost>
    </book>
    <book id="452234">
        <title>Title 2</title>
        <author>Author 2</author>
        <publisher>United Nation 2</publisher>
        <isbn>ISBN2</isbn>
        <cost>21.32</cost>
    </book>
    <book id="897855">
        <title>Title 3</title>
        <author>Author 3</author>
        <publisher>Publisher 3</publisher>
        <isbn>ISBN3</isbn>
        <cost>107.90</cost>
    </book>
    <book id="897832">
        <title>Title 4</title>
        <author>Author 3</author>
        <publisher>Publisher 4</publisher>
        <isbn>ISBN4</isbn>
        <cost>13.90</cost>
    </book>
</books>
Now we will look at the Java code required to run the XPath query. Running the class below prints 3 rather than 4, because the third and fourth book elements in the XML above share the same author, Author 3.
XPathDistinctValueExample.java
package com.javacodegeeks;

import net.sf.saxon.Configuration;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.om.DocumentInfo;
import net.sf.saxon.xpath.XPathFactoryImpl;
import org.xml.sax.InputSource;

import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import java.io.File;
import java.util.List;

public class XPathDistinctValueExample {

    public static void main(String[] args) throws Exception {
        new XPathDistinctValueExample().execute();
    }

    private void execute() throws Exception {
        // Register Saxon as the XPath factory for its own object model.
        System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_SAXON,
                "net.sf.saxon.xpath.XPathFactoryImpl");
        XPathFactory xPathFactory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
        XPath xPath = xPathFactory.newXPath();

        // Build a Saxon document from books.xml (path is relative to the project root).
        InputSource inputSource = new InputSource(new File("src/com/javacodegeeks/books.xml").toURI().toString());
        SAXSource saxSource = new SAXSource(inputSource);
        Configuration config = ((XPathFactoryImpl) xPathFactory).getConfiguration();
        DocumentInfo document = config.buildDocument(saxSource);

        // Select the distinct author values and print how many there are.
        String xPathStatement = "distinct-values(//books/book/author)";
        XPathExpression xPathExpression = xPath.compile(xPathStatement);
        List<?> matches = (List<?>) xPathExpression.evaluate(document, XPathConstants.NODESET);
        System.out.println(matches.size());
    }
}
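For comparison, the same count can be reproduced without Saxon by using a common XPath 1.0 idiom with the engine bundled in the JDK. This is a sketch under stated assumptions rather than a replacement for distinct-values: it selects the author nodes that have no earlier author with the same string value, so it compares string values only and ignores typed values. The class name and the abbreviated inline copy of books.xml are hypothetical and used only here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DistinctAuthorsXPath1Sketch {

    // Abbreviated copy of books.xml that keeps only the author elements.
    static final String XML =
        "<books>"
      + "<book id=\"123456\"><author>Author 1</author></book>"
      + "<book id=\"452234\"><author>Author 2</author></book>"
      + "<book id=\"897855\"><author>Author 3</author></book>"
      + "<book id=\"897832\"><author>Author 3</author></book>"
      + "</books>";

    // Returns the text of each author element that is not preceded,
    // in document order, by another author with the same string value.
    public static List<String> distinctAuthors() throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = "/books/book/author[not(. = preceding::author)]";
        NodeList nodes = (NodeList) xPath.evaluate(
                expression,
                new InputSource(new StringReader(XML)),
                XPathConstants.NODESET);
        List<String> authors = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            authors.add(nodes.item(i).getTextContent());
        }
        return authors;
    }

    public static void main(String[] args) throws Exception {
        List<String> authors = distinctAuthors();
        System.out.println(authors.size()); // prints 3
        System.out.println(authors);        // prints [Author 1, Author 2, Author 3]
    }
}
```

The predicate checks every candidate against all preceding author elements, so it may become slow on large documents; distinct-values in an XPath 2.0 processor such as Saxon is the more direct choice.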
3. Conclusion
In this example we saw what XPath is and what it is used for. We created a simple Java project in IntelliJ, discussed the distinct-values function available in XPath 2.0, and looked at how to add external dependencies to the project. In the last section we looked at a working example that uses the distinct-values function.