Java XPath Performance Tips

1. Introduction

This is the third article in a series dealing with using XPath within a Java application. This article focuses attention on tips for improving XPath performance.

The first two articles were:

The code used in this article uses Java SE 8u111. There are no other frameworks nor tools are referenced in this article.

2. Parser Choice

Essentially there are two types of parsers used to parse XML data:

3. Performance Tests

This section evaluates several, but similar XPath expressions for performance in regards to compilation and execution.

3.1 XML Data Used

The XPath expressions used in the performance tests were run against two XML files. Both files conform to the following structure:

Data files: entryLevel_Short.xml and entryLevel_Big.xml

<?xml version="1.0" encoding="UTF-8"?>
<TheRoot>
    <Entry>
        <Kid1></Kid1>
        <Kid2></Kid2>
        <Kid3></Kid3>
        <Kid4></Kid4>
        <Kid5></Kid5>
        <Kid6></Kid6>
        <Kid7></Kid7>
        <Kid8></Kid8>
        <Kid9></Kid9>
        <Kid10></Kid10>
    </Entry>
    <Entry>
        <Kid1></Kid1>
        <Kid2></Kid2>
        <Kid3></Kid3>
        <Kid4></Kid4>
        <Kid5></Kid5>
        <Kid6></Kid6>
        <Kid7></Kid7>
        <Kid8></Kid8>
        <Kid9></Kid9>
        <Kid10></Kid10>
    </Entry>
    <Entry>
        <Kid1></Kid1>
        <Kid2></Kid2>
        <Kid3></Kid3>
        <Kid4></Kid4>
        <Kid5></Kid5>
        <Kid6></Kid6>
        <Kid7></Kid7>
        <Kid8></Kid8>
        <Kid9></Kid9>
        <Kid10></Kid10>
    </Entry>
              .
              .
              .
A. The file enrtyLevel_Short.xml has 2,000 ‘Entry’ nodes
B. The file entryLevel_Big.xml has 20,000 ‘Entry’ nodes
C. Each ‘Entry’ node, in both files, has 10 children nodes.
    Kid1 through Kid10

3.2 Code Used

The code used to compile and evaluate the XPath expressions using XPath objects, previously presented in the Java XPath Examples article (https://examples.javacodegeeks.com/core-java/xml/xpath/java-xpath-examples/). The code was updated with a loop to perform the evaluation, compilation, and execution of the XPath expression a preset number of times.

Code adopted from Java XPath Best Practices Tutorial article (https://examples.javacodegeeks.com/core-java/xpath-best-practices-tutorial/) has been added to evaluate ‘predicate’ based XPath expressions by ‘Walking the DOM’, with the addition of a loop to validate and execute the expression.

The complete code used is contained in the class XPathPerformanceTester and is included, along with the XML data files (entryLevel_Short.xml and entryLevel_Big.xml), as part of the download attached to this article.

3.3 Performance Comparison Results

Below is a table that compares the same XPath expression when evaluated against XML data files with:

Table showing performance when compiling and evaluating different XPath expressions with smaller and larger XML files

                       Average time using       Average time using     
Expression             entryLevel_Short.xml     entryLevel_Big.xml     Comment
/TheRoot/Entry          82ms                       319ms                First Entry node
//Kid1/..               88ms                       550ms                Parent of first Kid1
//Kid1/parent::node()   42ms                       416ms                Parent of first Kid1
/TheRoot/Entry/Kid1     32ms                       322ms                Returns first Kid1 node

Rows 5 and 6 both return the parent of the first Kid1 node. Row 5 locates the parent by using the relative location operator ‘..’ to get the parent, where Row 6 uses the ‘parent::’ axes to get the same result. Notice when executed against the smaller XML file, the expression ‘//Kid1/..’ expression takes over twice as long to execute as the ‘//Kid1/parent::node()’ expression. Compare that to the execution times for a file with 10% more nodes, the average execution for the ‘//Kid1/parent:: node()’ expression grows by roughly 10% as expected, while the execution time for the ‘//Kid1/..’ expression extends by approximately 20%.

Table showing performance when using predicates with XPath objects versus ‘Walking the DOM’ as applied to smaller and larger XML files

                          Average time using       Average time using     
Expression                entryLevel_Short.xml     entryLevel_Big.xml     Comment
/TheRoot/Entry[last()]    847ms                    88,794ms               Last Entry node
~Entry[last()]             15ms                        30ms               Last Entry node
/TheRoot/Entry[1000]       34ms                       203ms               1,000th Entry Node
~Entry[1000]                2ms                        15ms               1,000th Entry Node
/TheRoot/Entry[2000]       39ms                       218ms               2,000th Entry Node
~Entry[2000]                2ms                        14ms               2,000th Entry Node
/TheRoot/Entry[20000]      97ms                       392ms               20,000th Entry Node
~Entry[20000]             undefined-Not Enough Data    16ms               20,000th Entry Node

Expressions in that begin with ‘~’ are evaluated using ‘Walking the DOM’ code.

The expression ‘//Entry[last()]’ returns the last ‘Entry’ node in the data file. Notice that the timing results for Row 4, takes over an order of magnitude longer to execute over any other expression when used with the smaller XML file, and over 2 orders of magnitude when applied to the larger XML file.

The table shows that the ‘Walk the DOM’ code can determine the same result as the XPath Object code in a fraction of the time.

However, the comparison is not fair since the ‘Walk the DOM’ code is optimized for expressions using a predicate to find either the nth or last occurrence of a single node. The ‘Walk the DOM’ code is very specific and fails if used with more complicated expressions.

Granted this timing data is anecdotal at best. However, it does show some relative performance relationships between XPath expressions and the data.

4. Conclusion

XPath performance is a matter of tradeoffs of the different components.

DOM parsers are generally faster than SAX parsers. The tradeoff is that SAX parsers are more complicated to use since they are single pass, and event driven, but they can handle much larger XML files than DOM parsers.

Using ‘Walk the DOM’ code can be blazingly fast but is not very robust. Small changes to the XML schema can require massive code changes to accommodate. The tradeoff is that using XPath Objects usually only requires a change in the XPath expression to accommodate variations in the XML schema, but has poorer performance.

5. Alternative Products

Here is a partial list of XPath related products which address different aspects of performance or functionality:

6. Download the Source Code

Download the XML data and source code used in his article.

Download
You can download the full source code for this article here: JavaXPathPerformanceTips.zip
Exit mobile version