How to use XPath in Python
1. Introduction
In this example, we will learn about XPath and how to use it in Python to traverse XML and HTML files.
2. What is XPath?
XPath stands for XML Path Language. It is a major element in the XSLT standard. XPath can be used to navigate through elements and attributes in an XML document. XPath uses path expressions to navigate in XML documents. It also contains a library of standard functions and plays a major role in XSLT and in XQuery.
3. XPath Path Expressions
XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the paths you use in a traditional file system. They are used by XSLT to perform transformations and by XPointer for addressing. The XPath data model defines seven types of nodes that an expression can select:
- Root
- Element
- Text
- Attribute
- Comment
- Processing Instruction
- Namespace
The most useful path expressions are:
| Expression | Description |
| --- | --- |
| nodename | Selects all child elements of the current node named "nodename" |
| / | Selects from the root node |
| // | Selects nodes anywhere in the document below the current node that match the selection, no matter where they are |
| . | Selects the current node |
| .. | Selects the parent of the current node |
| @ | Selects attributes |
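The expressions in the table can be tried out with lxml, the library installed later in this article. The following is a minimal sketch; the `library` document and its element names are made up for illustration only.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
root = etree.fromstring(
    "<library>"
    "<book id='b1'><title>Dune</title></book>"
    "<book id='b2'><title>Emma</title></book>"
    "</library>"
)

# "/" starts at the root: an absolute path to every title element.
titles = root.xpath("/library/book/title")

# "//" matches nodes at any depth in the document.
all_books = root.xpath("//book")

# "@" selects attributes; lxml returns their values as strings.
ids = root.xpath("//book/@id")

# "." is the current node and ".." is its parent.
first_title = titles[0]
same_node = first_title.xpath(".")[0]
parent = first_title.xpath("..")[0]

print([t.text for t in titles])  # ['Dune', 'Emma']
print(ids)                       # ['b1', 'b2']
print(parent.tag)                # book
```

Note that `xpath()` returns a list for node-set results, even when only one node matches.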
4. XPath Standard Functions
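Recent versions of XPath (2.0 and later) include over 200 built-in functions. There are functions for string values, numeric values, booleans, date and time comparison, node manipulation, sequence manipulation, and much more. XPath 1.0, the version that lxml implements through libxml2, offers a smaller core set, and the functions listed below all belong to it. XPath expressions can be used from JavaScript, Java, XML Schema, Python, and many other languages.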
4.1 Node Functions
- node(): node test that matches any node.
- text(): node test that matches the text nodes of a specific node.
- comment(): node test that matches comment nodes.
- last(): returns the number of nodes in the current context. Despite its name, it does not return the last node's value; used in a predicate such as item[last()], it selects the last node.
- position(): returns the position (counting from 1) of a node within the set of nodes in the current context.
- id(dtd_id): returns the elements whose DTD-declared unique ID matches the argument.
- name(node_expression): returns the qualified name of the first node in the given node-set.
- local-name(node_expression): returns the local name (without namespace prefix) of the first node in the given node-set.
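A short sketch of these node tests and functions with lxml follows. The `menu` document and the `urn:example` namespace are hypothetical and exist only to give the expressions something to match.

```python
from lxml import etree

# Hypothetical sample document with a comment and three elements.
root = etree.fromstring(
    "<menu><!-- daily specials -->"
    "<item>soup</item><item>salad</item><item>bread</item>"
    "</menu>"
)

children = root.xpath("/menu/node()")      # comment + 3 elements
texts = root.xpath("/menu/item/text()")    # text nodes as strings
comments = root.xpath("/menu/comment()")   # comment nodes

# last() is the context size, so item[last()] is the final item.
last_item = root.xpath("/menu/item[last()]")[0]

# position() counts from 1.
second_item = root.xpath("/menu/item[position() = 2]")[0]

# name() reports the name of the first node in the node-set.
first_name = root.xpath("name(/menu/*)")

# With a namespace prefix, name() and local-name() differ.
ns_root = etree.fromstring("<x:a xmlns:x='urn:example'/>")
qualified = ns_root.xpath("name(.)")
local = ns_root.xpath("local-name(.)")

print(texts)                        # ['soup', 'salad', 'bread']
print(last_item.text)               # bread
print(second_item.text)             # salad
print(first_name, qualified, local) # item x:a a
```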
4.2 Numeric functions
- count(node_expression): returns the number of nodes in the given node-set.
- sum(node_expression): returns the sum of the numeric values of the nodes in the given node-set.
- div: an operator rather than a function. It takes no parameters; it is written between two numeric expressions and returns the left value divided by the right one, e.g. 10 div 4 returns 2.5.
- number(): converts its argument, such as a string, to a number.
- floor(): returns the largest integer that is less than or equal to the argument, e.g. floor(10.45) returns 10.
- ceiling(): returns the smallest integer that is greater than or equal to the argument, e.g. ceiling(10.45) returns 11.
- round(): returns the integer nearest to the argument, e.g. round(10.45) returns 10.
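The numeric functions can be evaluated directly through lxml's `xpath()` method, which returns XPath numbers as Python floats. This is a minimal sketch; the `order` document and its prices are invented for the example.

```python
from lxml import etree

# Hypothetical sample document with two numeric values.
root = etree.fromstring(
    "<order><price>10.5</price><price>4.5</price></order>"
)

count = root.xpath("count(/order/price)")
total = root.xpath("sum(/order/price)")

# div is an operator placed between two numeric expressions.
average = root.xpath("sum(/order/price) div count(/order/price)")

converted = root.xpath("number('42')")
floored = root.xpath("floor(10.45)")
ceiled = root.xpath("ceiling(10.45)")
rounded = root.xpath("round(10.45)")

print(count, total, average)        # 2.0 15.0 7.5
print(converted)                    # 42.0
print(floored, ceiled, rounded)     # 10.0 11.0 10.0
```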
4.3 String functions
- string(): converts its argument, such as a number, to a string.
- concat(string, string, …): concatenates its arguments and returns the resulting string.
- starts-with(string, string): returns true if the first argument starts with the second argument, otherwise false.
- contains(string, string): returns true if the first argument contains the second argument, otherwise false.
- substring(string, number, number): returns the part of the first argument that starts at the position given by the second argument (counting from 1) and has the length given by the third argument.
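The string functions behave as follows when called through lxml. This is only a sketch; the `user` document is a hypothetical sample.

```python
from lxml import etree

# Hypothetical sample document.
root = etree.fromstring(
    "<user><first>Ada</first><last>Lovelace</last></user>"
)

as_string = root.xpath("string(42)")
full_name = root.xpath("concat(/user/first, ' ', /user/last)")

# The first argument is the string being tested.
starts = root.xpath("starts-with(/user/last, 'Love')")
has = root.xpath("contains(/user/last, 'lace')")

# Positions count from 1: take 4 characters starting at the first.
part = root.xpath("substring(/user/last, 1, 4)")

# String functions are often used inside predicates to filter nodes.
matches = root.xpath("/user/*[contains(., 'Ada')]")

print(as_string)         # 42
print(full_name)         # Ada Lovelace
print(starts, has)       # True True
print(part)              # Love
print(matches[0].tag)    # first
```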
4.4 Boolean functions
XPath Boolean functions return either true or false. The boolean() function converts an argument such as a number, string, or node-set to a Boolean.
- boolean(number|string|node-expression|object): returns true for a non-zero number, a non-empty string, or a non-empty node-set, otherwise false.
- true(): takes no arguments and always returns true.
- false(): takes no arguments and always returns false.
- lang(string): returns true if the language of the context node (as set by xml:lang) is the same as, or a sublanguage of, the specified string argument.
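As a quick illustration, the Boolean functions can be checked with lxml, which maps XPath Booleans to Python `True` and `False`. The `doc` document below is a hypothetical sample.

```python
from lxml import etree

# Hypothetical sample document; xml:lang is inherited by child elements.
root = etree.fromstring(
    "<doc xml:lang='en-GB'><p>hello</p></doc>"
)

found = root.xpath("boolean(/doc/p)")          # non-empty node-set
missing = root.xpath("boolean(/doc/missing)")  # empty node-set
zero = root.xpath("boolean(0)")                # zero is false
text = root.xpath("boolean('text')")           # non-empty string

always_true = root.xpath("true()")
always_false = root.xpath("false()")

# lang() is evaluated against the context node, here the <p> element.
p = root.xpath("/doc/p")[0]
is_english = p.xpath("lang('en')")   # 'en-GB' is a sublanguage of 'en'
is_french = p.xpath("lang('fr')")

print(found, missing, zero, text)    # True False False True
print(always_true, always_false)     # True False
print(is_english, is_french)         # True False
```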
5. Using XPath with XSLT in Python
5.1 Installation
XPath is a major element in the XSLT standard. With XPath knowledge, you will be able to take great advantage of XSL. We will use XPath through the lxml toolkit. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt and is generally distributed through PyPI. Most Linux platforms come with some version of lxml readily packaged, usually named python-lxml for the Python 2.x version and python3-lxml for Python 3.x. If you can use that version, the quickest way to install lxml is to use the system package manager.
- Debian/Ubuntu: sudo apt-get install python3-lxml
- macOS (a MacPorts port of lxml, here for Python 2.7): sudo port install py27-lxml
If your system does not provide binary packages or you want to install a newer version, the best way is to get the pip package management tool and run the following:
- pip install lxml
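One way to confirm that the installation worked is to import lxml and print the version constants it exposes. This is just a quick check, not a required step.

```python
from lxml import etree

# Version tuples for lxml itself and the C libraries it binds to.
print("lxml:", etree.LXML_VERSION)
print("libxml2:", etree.LIBXML_VERSION)
print("libxslt:", etree.LIBXSLT_VERSION)
```

If the import fails, lxml is not available to the Python interpreter you are running.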
5.2 Example
In this example, we will see how to use XPath with XSLT to transform an XML document.
xpath.py
    from io import StringIO
    from lxml import etree

    f = StringIO('<foo><bar>POTATO</bar></foo>')  # XML source as a file-like object
    tree = etree.parse(f)  # parse it into an element tree
    print(tree.xpath('/foo/bar/text()'))  # ['POTATO']

    r = tree.xpath('/foo/bar')  # list of matching elements
    print(len(r))  # 1
    print(r[0].text)  # POTATO
    xsl = etree.XML('''\
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" encoding="UTF-8" />
      <xsl:template match="/">
        <foo><xsl:value-of select="/a/b/text()" /></foo>
      </xsl:template>
    </xsl:stylesheet>''')
    transform = etree.XSLT(xsl)  # compile the stylesheet
    f = StringIO('<a><b>tomato</b></a>')  # input shaped to match /a/b in the stylesheet
    doc = etree.parse(f)
    result = transform(doc)  # apply the transformation
    result.write_output("output.xml")  # write the result to a file
- Line 4: We wrap our XML string in a StringIO object so that it can be read like a file.
- Line 5: To run XPath queries, we first parse it into an lxml element tree.
- Line 6: Now we can query the tree with XPath; this expression returns the text of every /foo/bar element.
- Line 11: Here we define the XSL stylesheet.
- Line 19: We compile the stylesheet into an XSLT transformer.
- Line 22: We apply the transformer to the parsed document, producing a new XML tree built according to the stylesheet we just created.
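The transformation result does not have to be written to a file; it can also be inspected in memory. The sketch below repeats the stylesheet and assumes an input document of the form `<a><b>tomato</b></a>`, chosen here so that it matches the `/a/b` path in the stylesheet.

```python
from lxml import etree

xsl = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" />
  <xsl:template match="/">
    <foo><xsl:value-of select="/a/b/text()" /></foo>
  </xsl:template>
</xsl:stylesheet>''')
transform = etree.XSLT(xsl)

# Assumed input document, shaped to match the stylesheet's /a/b path.
doc = etree.fromstring("<a><b>tomato</b></a>")
result = transform(doc)

# The result is itself a tree, so it can be queried with XPath again.
root_tag = result.getroot().tag
value = result.xpath("/foo/text()")

print(root_tag)  # foo
print(value)     # ['tomato']
```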
6. Summary
In this article, we used a simple example to convert an XML file into a different structure. With the usage of simple XPath expressions, we traversed different nodes and transformed them. XPath is underappreciated but is a very strong tool to use. It gives you flexibility and a fast way to search XML files. You can also very easily find nodes in an HTML file.
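The point about HTML can be sketched with the `lxml.html` module, which parses HTML into the same kind of tree that `xpath()` works on. The page content below is a made-up sample.

```python
from lxml import html

# Hypothetical HTML snippet, used only for illustration.
page = html.fromstring(
    "<html><body><ul>"
    "<li><a href='/a'>First</a></li>"
    "<li><a href='/b'>Second</a></li>"
    "</ul></body></html>"
)

# The same XPath syntax applies to HTML elements and attributes.
links = page.xpath("//li/a/@href")
labels = page.xpath("//li/a/text()")

print(links)   # ['/a', '/b']
print(labels)  # ['First', 'Second']
```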
This was an example of how we can use XPath with XSLT in Python to parse and transform an XML file.