Java XPath Using SAX Example

1. Introduction

XPath is used to retrieve and interpret information represented in XML files using either a DOM or SAX parser.

    * DOM – Document Object Model – Parsers in this popular class read the entire XML file and construct the DOM in memory. Since the DOM is memory resident, evaluation of XPath expressions is faster.
    * SAX – Simple API for XML – These parsers are usually single-pass, event-driven XML parsers that do not store the document model in memory and consequently have a much lower memory requirement, which makes them better suited for large XML files. However, they also tend to be slower than DOM parsers. If the XML data is too large for the resulting model to fit in memory, or handling of special characters or XML tags is required, then a SAX parser may be the only option.
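
To make the single-pass, event-driven model concrete, here is a minimal sketch that counts elements while keeping nothing but a counter in memory. The class name SaxCountSketch, the helper countElements, and the inline XML string are assumptions made for illustration; they are not part of the article's downloadable code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
class SaxCountSketch {

    // Counts the elements named elementName in one pass over the input.
    // No document tree is built; the only state kept is the counter.
    static int countElements(String xml, String elementName) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // The parser calls this once for every opening tag it reads.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up two-item document, not the article's inventory.xml.
        String xml = "<inventory><computer serialno=\"A1\"/><computer serialno=\"B2\"/></inventory>";
        System.out.println(countElements(xml, "computer")); // prints 2
    }
}
```

Because the handler only reacts to events as they stream past, the same approach should work on files far larger than the available heap.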

The previous articles in this series (listed in the next section) concentrated on the usage of DOM parsers.

Now attention is switched to using SAX parsers by looking at two ways to use a SAX Parser with Java to process XML files:

  • First: Use the SAX Parser from the javax.xml.parsers package to retrieve information from the inventory.xml file by defining and using a DefaultHandler to handle callback events from the parser.
  • Second: Use the SAX parser from Saxon (http://saxon.sourceforge.net/) to evaluate the same inventory.xml file using XPath expressions.

1.1. Requirements

This article assumes the reader has a working knowledge of XPath and core Java. It does not attempt to teach either.

This article and the code examples were written using Java SE 8u111. The second code example (Saxon Sample) uses the Saxon HE parser, version 9.7. IntelliJ IDEA was used to build and execute both the Default Handler and the Saxon code samples.

NOTE: The DefaultHandlerSample can be built and executed from the command line. However, due to a documented bug in saxon9he.jar, the SaxonSampler must be built and executed using IntelliJ IDEA in order to avoid a run-time error.

The Saxon HE package, documentation, and additional code samples are all available from the Saxon website (http://saxon.sourceforge.net/). It is also strongly recommended that you download the resources package, which contains sample code and user documentation.

See the W3Schools XPath tutorial for a review of XPath.

See the previous articles in this series for basic information on using XPath with Java.

2. The Data

The data used for both code samples presented in this article is a single XML file. The inventory.xml file describes the computers in a small inventory.

inventory.xml

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <computer serialno="12345">
        <model>Win 10 Laptop</model>
        <os>Windows 10</os>
        <cpu>Intel i7</cpu>
        <ram>12GB</ram>
        <price>900.00</price>
    </computer>
    <computer serialno="P2233">
        <model>Low Cost Windows Laptop</model>
        <os>Windows 10 Home</os>
        <cpu>Intel Pentium</cpu>
        <ram>4GB</ram>
        <price>313.00</price>
    </computer>
    <computer serialno="X01985">
        <model>64 Bit Windows Desktop Computer</model>
        <os>Windows 10 Home 64 Bit</os>
        <cpu>AMD A8-Series</cpu>
        <ram>8GB</ram>
        <price>330.00</price>
    </computer>
    <computer serialno="APL888">
        <model>Apple Desktop Computer</model>
        <os>MAC OS X</os>
        <cpu>Intel Core i5</cpu>
        <ram>8GB</ram>
        <price>1300.00</price>
    </computer>
    <computer serialno="AB1C48">
        <model>Apple Low Cost Desktop Computer</model>
        <os>OS X Yosemite</os>
        <cpu>4th Gen Intel Core i5</cpu>
        <ram>8GB</ram>
        <price>700.00</price>
    </computer>
    <computer serialno="HP1C48">
        <model>HP Low Cost Windows 10 Laptop</model>
        <os>Windows 10 Home</os>
        <cpu>AMD A6-Series</cpu>
        <ram>4GB</ram>
        <price>230.00</price>
    </computer>
    <computer serialno="W7D001">
        <model>Windows 7 Desktop</model>
        <os>Windows 7</os>
        <cpu>6th Gen Intel Core i5</cpu>
        <ram>6GB</ram>
        <price>750.00</price>
    </computer>
    <computer serialno="HPHELC555">
        <model>HP High End, Low Cost 64 Bit Desktop</model>
        <os>Windows 10 Home 64 Bit</os>
        <cpu>6th Gen Intel Core i7</cpu>
        <ram>12GB</ram>
        <price>800.00</price>
    </computer>
</inventory>

  1. There are 8 computers defined
  2. Each computer node has a serial number (serialno) attribute
  3. Each computer node has 5 children:
  • model – Name of this configuration
  • os – Name of Operating System installed
  • cpu – Type of processor
  • ram – size of installed RAM
  • price – expressed as a decimal number
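
The three observations above can be checked mechanically. The following is a minimal sketch that uses the JDK's built-in DOM-based XPath (not the SAX approach shown later in this article) against a trimmed, two-computer copy of the data. The class name InventoryShapeSketch and the helper eval are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
class InventoryShapeSketch {

    // Parses the XML into a DOM and evaluates a numeric XPath expression against it.
    static double eval(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // Trimmed sample: the first two computers from inventory.xml.
        String xml =
              "<inventory>"
            + "<computer serialno=\"12345\"><model>Win 10 Laptop</model><os>Windows 10</os>"
            + "<cpu>Intel i7</cpu><ram>12GB</ram><price>900.00</price></computer>"
            + "<computer serialno=\"P2233\"><model>Low Cost Windows Laptop</model>"
            + "<os>Windows 10 Home</os><cpu>Intel Pentium</cpu><ram>4GB</ram>"
            + "<price>313.00</price></computer>"
            + "</inventory>";

        // Number of computer nodes
        System.out.println(eval(xml, "count(/inventory/computer)"));
        // Computers missing the serialno attribute (0 if every node has one)
        System.out.println(eval(xml, "count(/inventory/computer[not(@serialno)])"));
        // Computers that do not have exactly 5 child elements (0 if all do)
        System.out.println(eval(xml, "count(/inventory/computer[count(*) != 5])"));
    }
}
```

Run against the full inventory.xml, the first expression should report 8 and the other two should report 0.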

3. The Code Samples Using SAX Parsers

3.1. Using a Default Handler

In this code sample there are 3 classes:

  1. Computer.java – This class defines the Computer object with all of its getters and setters.
  2. MyHandler.java – A class to define how to handle startElement, endElement, and characters events from the SAX parser.
  3. JavaSAXParse.java – This is the main driving class for this simple application. It initializes the SAX parser with a reference to an instance of MyHandler, and a reference to the inventory.xml file, then gets a list of the computer nodes found by the parser, and displays the results.

Computer.java

package com.javacodegeeks.DefaultHandlerSample;

import java.text.DecimalFormat;
import java.text.NumberFormat;

/**
 * Computer object definition
 */


public class Computer {

    private String serialNo;
    private String model;
    private String os;
    private String cpu;
    private String ram;
    private Double price;

    private static final NumberFormat formatter = new DecimalFormat("#0.00");

    Computer() {
        serialNo = "";
        model = "";
        os = "";
        cpu = "";
        ram = "";
        price = 0.0;
    }

    public String getSerialNo() {
        return serialNo;
    }

    public void setSerialNo(String serialNo) {
        this.serialNo = serialNo;
    }

    public String getModel() {
        return model;
    }

    public void setModel(String model) {
        this.model = model;
    }

    public String getOs() {
        return os;
    }

    public void setOs(String os) {
        this.os = os;
    }

    public String getCpu() {
        return cpu;
    }

    public void setCpu(String cpu) {
        this.cpu = cpu;
    }

    public String getRam() {
        return ram;
    }

    public void setRam(String ram) {
        this.ram = ram;
    }

    public Double getPrice() { return price; }

    public void setPrice(Double price) {
        this.price = price;
    }

    @Override
    public String toString() {
        return "Computer:  SerialNo:" + this.serialNo + ", Model:" + this.model +
                ", OS:" + this.os + ", CPU:=" + this.cpu + ",  RAM:" + this.ram +
                ", Price:" + formatter.format(this.price);
    }

}

MyHandler.java

package com.javacodegeeks.DefaultHandlerSample;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;


/**
 * MyHandler class defines the actions to be taken
 * in response to SAX Parser callback events.
 */

public class MyHandler extends DefaultHandler {

    // List to hold the Computer objects built during parsing
    private List<Computer> compList = null;
    private Computer comp = null;


    // Getter method for the list of computers
    public List<Computer> getCompList() {
        return compList;
    }

    boolean bModel;
    boolean bOs;
    boolean bCpu;
    boolean bRam;
    boolean bPrice;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {

        if (qName.equalsIgnoreCase("Inventory")) {
            // If the list of computers is null, then initialize it
            if (compList == null)
                compList = new ArrayList<>();
        } else if (qName.equalsIgnoreCase("Computer")) {
            // Create a new Computer object, and set the serial number from the attribute
            comp = new Computer();
            // Get the serialNo attribute
            String serialNumber = attributes.getValue("serialno");
            comp.setSerialNo(serialNumber);

        // Set boolean flags for the child fields; they are used when setting the Computer fields
        } else if (qName.equalsIgnoreCase("model")) {
            bModel = true;
        } else if (qName.equalsIgnoreCase("os")) {
            bOs = true;
        } else if (qName.equalsIgnoreCase("cpu")) {
            bCpu = true;
        } else if (qName.equalsIgnoreCase("ram")) {
            bRam = true;
        } else if (qName.equalsIgnoreCase("price")) {
            bPrice = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase("Computer")) {
            // Add the Computer object to the list
            compList.add(comp);
        }
    }

    @Override
    public void characters(char ch[], int start, int length) throws SAXException {

        if (bModel) {
            // Set the computer model
            comp.setModel(new String(ch, start, length));
            bModel = false;
        } else if (bOs) {
            comp.setOs(new String(ch, start, length));
            bOs = false;
        } else if (bCpu) {
            comp.setCpu(new String(ch, start, length));
            bCpu = false;
        } else if (bRam) {
            comp.setRam(new String(ch, start, length));
            bRam = false;
        } else if (bPrice) {
            comp.setPrice(Double.parseDouble(new String(ch, start, length)));
            bPrice = false;
        }
    }
}
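
One caveat about the handler above: the SAX API allows a parser to deliver the text of a single element through several characters() calls, so setting a field on the first call can truncate a value. A minimal sketch of a more defensive variant follows. It buffers text in a StringBuilder and reads it in endElement. The class name BufferedTextSketch, the helper parseModels, and the sample XML are hypothetical and are not part of the article's source code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
class BufferedTextSketch {

    // Collects the text of every <model> element, tolerating split characters() calls.
    static class ModelHandler extends DefaultHandler {
        private final List<String> models = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        List<String> getModels() {
            return models;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            // Start with an empty buffer for each new element.
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Append every chunk; the parser may call this more than once per element.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The complete text is only known once the closing tag is reached.
            if (qName.equals("model")) {
                models.add(text.toString().trim());
            }
        }
    }

    static List<String> parseModels(String xml) throws Exception {
        ModelHandler handler = new ModelHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getModels();
    }

    public static void main(String[] args) throws Exception {
        // Made-up value containing an entity reference, which many parsers report in pieces.
        String xml = "<inventory><computer><model>HP &amp; Co Laptop</model></computer></inventory>";
        System.out.println(parseModels(xml));
    }
}
```

The flag-based handler works for the inventory.xml data shown here; the buffered form is simply safer when values are long or contain entity references.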

JavaSAXParse.java

package com.javacodegeeks.DefaultHandlerSample;

import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;


public class JavaSAXParse {

    // Define the file path for the XML data file
    //    Default to project root
    static final String XML_DATA_FILE_PATH = "inventory.xml";


    public static void main(String[] args) {
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxParserFactory.newSAXParser();
            MyHandler handler = new MyHandler();
            saxParser.parse(new File(XML_DATA_FILE_PATH), handler);
            // Get Computer list
            List<Computer> compList = handler.getCompList();
            // Display it to the user
            for (Computer comp : compList)
                System.out.println(comp);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

Below is the output from the above code:

Computer:  SerialNo:12345, Model:Win 10 Laptop, OS:Windows 10, CPU:=Intel i7,  RAM:12GB, Price:900.00
Computer:  SerialNo:P2233, Model:Low Cost Windows Laptop, OS:Windows 10 Home, CPU:=Intel Pentium,  RAM:4GB, Price:313.00
Computer:  SerialNo:X01985, Model:64 Bit Windows Desktop Computer, OS:Windows 10 Home 64 Bit, CPU:=AMD A8-Series,  RAM:8GB, Price:330.00
Computer:  SerialNo:APL888, Model:Apple Desktop Computer, OS:MAC OS X, CPU:=Intel Core i5,  RAM:8GB, Price:1300.00
Computer:  SerialNo:AB1C48, Model:Apple Low Cost Desktop Computer, OS:OS X Yosemite, CPU:=4th Gen Intel Core i5,  RAM:8GB, Price:700.00
Computer:  SerialNo:HP1C48, Model:HP Low Cost Windows 10 Laptop, OS:Windows 10 Home, CPU:=AMD A6-Series,  RAM:4GB, Price:230.00
Computer:  SerialNo:W7D001, Model:Windows 7 Desktop, OS:Windows 7, CPU:=6th Gen Intel Core i5,  RAM:6GB, Price:750.00
Computer:  SerialNo:HPHELC555, Model:HP High End, Low Cost 64 Bit Desktop, OS:Windows 10 Home 64 Bit, CPU:=6th Gen Intel Core i7,  RAM:12GB, Price:800.00

3.2. Using XPath Expressions With the Saxon Parser

The Saxon SAX parser is a SAX parser which also supports XPath expressions.

When downloading Saxon HE from the Saxon website (http://saxon.sourceforge.net/), it is also strongly recommended that you download the resources package, which contains sample code and user documentation.

This sample code consists of a single class, XPathSAXExample, which parses the inventory.xml file and evaluates XPath expressions.

XPathSAXExample.java

import net.sf.saxon.Configuration;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.om.TreeInfo;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.xpath.XPathFactoryImpl;
import org.xml.sax.InputSource;

import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.*;
import java.io.File;
import java.util.List;

/**
 * Class XPathSAXExample - Parses the inventory.xml file and uses
 * the JAXP XPath API to evaluate XPath expressions.
 */

public class XPathSAXExample {


    public static void main (String args[]) throws Exception {
        XPathSAXExample xpsexample = new XPathSAXExample();
        xpsexample.runApp("inventory.xml");
    }

    /**
     * Run the application
     */

    public void runApp(String filename) throws Exception {

        /////////////////////////////////////////////
        // The following initialization code is specific to Saxon
        // Please refer to SaxonHE documentation for details
        System.setProperty("javax.xml.xpath.XPathFactory:"+
                NamespaceConstant.OBJECT_MODEL_SAXON,
                "net.sf.saxon.xpath.XPathFactoryImpl");

        XPathFactory xpFactory = XPathFactory.
                newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
        XPath xpExpression = xpFactory.newXPath();
        System.err.println("Loaded XPath Provider " + xpExpression.getClass().getName());

        // Build the source document.
        InputSource inputSrc = new InputSource(new File(filename).toURI().toString());
        SAXSource saxSrc = new SAXSource(inputSrc);
        Configuration config = ((XPathFactoryImpl) xpFactory).getConfiguration();
        TreeInfo treeInfo = config.buildDocumentTree(saxSrc);
        // End Saxon specific code
        /////////////////////////////////////////////

        XPathExpression findComputers =
                xpExpression.compile("count(//computer)");

        Number countResults = (Number)findComputers.evaluate(treeInfo, XPathConstants.NUMBER);
        System.out.println("1. There are " + countResults + " computers in the inventory.");
        outputSeparator();


        // Get a list of the serial numbers
        // The following expression gets a set of nodes that have a serialno attribute,
        // then extracts the serial numbers from the attribute and finally creates a
        // list of nodes that contain the serial numbers.
        XPathExpression findSerialNos =
                xpExpression.compile("//computer[@serialno]/@serialno");

        List resultNodeList = (List) findSerialNos.evaluate(treeInfo, XPathConstants.NODESET);
        if (resultNodeList != null) {
            int count = resultNodeList.size();
            System.out.println("2. There are " + count + " serial numbers:");

            // Go through each node in the list and display the serial number.
            for (int i = 0; i < count; i++) {
                NodeInfo cNode = (NodeInfo) resultNodeList.get(i);
                String name = cNode.getStringValue();
                System.out.println("Serial Number:" + name);
            }
        }
        outputSeparator();


        // All expressions have been evaluated
        System.out.println("Finished.");
    }


    // Helper method to pretty up the output
    public static void outputSeparator() {
        System.out.println("=+=+=+=+=+=+=+=+");
    }

}

Below is the output from the above code:

1. There are 8.0 computers in the inventory.
=+=+=+=+=+=+=+=+
2. There are 8 serial numbers:
Serial Number:12345
Serial Number:P2233
Serial Number:X01985
Serial Number:APL888
Serial Number:AB1C48
Serial Number:HP1C48
Serial Number:W7D001
Serial Number:HPHELC555
=+=+=+=+=+=+=+=+
Finished.
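
The first line of the output reads "8.0" because XPath's count() function returns a number, and the JAXP XPath API maps XPathConstants.NUMBER to a java.lang.Double. If a whole-number display is preferred, the result can be converted before printing. This is a minimal sketch: the class name CountFormatSketch and the helper describe are hypothetical, and the value 8.0 stands in for the result of findComputers.evaluate(...).

```java
// Hypothetical class, for illustration only.
class CountFormatSketch {

    // Formats a count returned as a Number without the trailing ".0".
    static String describe(Number count) {
        return "There are " + count.intValue() + " computers in the inventory.";
    }

    public static void main(String[] args) {
        // Stand-in for the Double returned when evaluating count(//computer).
        Number countResults = Double.valueOf(8.0);
        System.out.println(describe(countResults)); // prints: There are 8 computers in the inventory.
    }
}
```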

4. Conclusion

SAX parsers are most commonly used by subclassing the DefaultHandler class and overriding its callback methods to suit your needs. Some SAX parsers also provide a JAXP API interface that allows for the evaluation of XPath expressions.

5. Download the Source Code

This was a Java XPath Using SAX Example.

Download
You can download the full source code for this article here: JavaXPathUsingSAX.zip

David Guinivere

David graduated from University of California, Santa Cruz with a Bachelor’s degree in Information Science. He has been actively working with Java since the days of J2SE 1.2 in 1998. His work has largely involved integrating Java and SQL. He has worked on a wide range of projects including e-commerce, CRM, Molecular Diagnostic, and Video applications. As a freelance developer he is actively studying Big Data, Cloud and Web Development.