Remove nodes from DOM document recursively
In this example we shall show you how to remove Nodes from a DOM Document recursively. We have implemented two methods: removeRecursively(Node node, short nodeType, String name), which recursively removes matching Nodes from a DOM Document, and prettyPrint(Document xml), which converts a DOM into a formatted XML String. To remove Nodes from a DOM Document recursively one should perform the following steps:
- Obtain a new instance of a DocumentBuilderFactory, a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
- Configure the factory so that the parser it produces does not validate documents as they are parsed, using the setValidating(boolean validating) API method of DocumentBuilderFactory, with validating set to false.
- Create a new instance of a DocumentBuilder, using the newDocumentBuilder() API method of DocumentBuilderFactory.
- Parse the FileInputStream with the content to be parsed, using the parse(InputStream is) API method of DocumentBuilder. This method parses the content of the given InputStream as an XML document and returns a new DOM Document object.
- Call the removeRecursively(Node node, short nodeType, String name) method of the example. This method takes a Node, a short nodeType and a String name as parameters. It checks whether the type of the given node is equal to the specified nodeType and whether the name of the node is equal to the given name (a null name matches any node of that type), using the getNodeType() and getNodeName() API methods of Node. If both conditions hold, it removes the node from the DOM Document, by taking the parent node of the node, with the getParentNode() API method of Node, and then removing the specified child, with the removeChild(Node oldChild) API method of Node. Otherwise, it gets the children of this Node, using the getChildNodes() API method of Node, and repeats the same steps for each one of them.
- Use the normalize() API method of Document to normalize the DOM tree. The method merges adjacent Text nodes and removes empty ones throughout the full depth of the sub-tree underneath this node, so that removed nodes do not leave fragmented text behind.
- Call the prettyPrint(Document xml) method of the example. The method takes the xml Document, converts it into a formatted XML String after transforming it with specific parameters, such as encoding, and prints the result. The method uses a Transformer, created using the newTransformer() API method of TransformerFactory. The Transformer is used to transform a source tree into a result tree. After setting specific output properties on the transformer, using the setOutputProperty(String name, String value) API method of Transformer, the method uses it to make the transformation, with the transform(Source xmlSource, Result outputTarget) API method of Transformer. The parameters are the DOMSource with the DOM node and the result, which is a StreamResult created from a StringWriter,
as described in the code snippet below.
package com.javacodegeeks.snippets.core;

import java.io.File;
import java.io.FileInputStream;
import java.io.StringWriter;
import java.io.Writer;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RemoveNodesFromDOMDocumentRecursively {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new FileInputStream(new File("in.xml")));

        // remove all elements named 'item'
        removeRecursively(doc, Node.ELEMENT_NODE, "item");

        // remove all comment nodes
        removeRecursively(doc, Node.COMMENT_NODE, null);

        // Normalize the DOM tree: merges adjacent text nodes and removes empty
        // ones in the full depth of the sub-tree underneath this node
        doc.normalize();

        prettyPrint(doc);
    }

    public static void removeRecursively(Node node, short nodeType, String name) {
        if (node.getNodeType() == nodeType && (name == null || node.getNodeName().equals(name))) {
            node.getParentNode().removeChild(node);
        } else {
            // check the children recursively; iterate backwards because the
            // NodeList is live and shrinks whenever a child is removed
            NodeList list = node.getChildNodes();
            for (int i = list.getLength() - 1; i >= 0; i--) {
                removeRecursively(list.item(i), nodeType, name);
            }
        }
    }

    public static final void prettyPrint(Document xml) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        Writer out = new StringWriter();
        tf.transform(new DOMSource(xml), new StreamResult(out));
        System.out.println(out.toString());
    }
}
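One detail worth keeping in mind when removing nodes inside a loop over getChildNodes(): the returned NodeList is live, so every removal shifts the remaining children down by one index. The sketch below is a small illustration of that effect and is not part of the original example; the class LiveNodeListDemo, its helper methods and the inline XML string are assumptions made up for the demonstration. It compares a forward loop with a backward loop on sibling elements that have no whitespace between them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LiveNodeListDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    private static boolean isElement(Node n, String name) {
        return n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name);
    }

    // Forward loop: after a removal the next sibling slides into index i,
    // the loop then moves on to i + 1, so that sibling is never inspected.
    public static void removeForward(Node parent, String name) {
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node child = list.item(i);
            if (isElement(child, name)) {
                parent.removeChild(child);
            }
        }
    }

    // Backward loop: a removal only shifts indices that were already visited.
    public static void removeBackward(Node parent, String name) {
        NodeList list = parent.getChildNodes();
        for (int i = list.getLength() - 1; i >= 0; i--) {
            Node child = list.item(i);
            if (isElement(child, name)) {
                parent.removeChild(child);
            }
        }
    }

    // Returns how many <item> elements survive the chosen removal strategy.
    public static int remainingItems(String xml, boolean backward) throws Exception {
        Document doc = parse(xml);
        Node root = doc.getDocumentElement();
        if (backward) {
            removeBackward(root, "item");
        } else {
            removeForward(root, "item");
        }
        return doc.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) throws Exception {
        // Adjacent siblings with no text nodes in between (assumed input).
        String xml = "<channel><item/><item/><item/><item/></channel>";
        System.out.println("forward loop leaves:  " + remainingItems(xml, false));
        System.out.println("backward loop leaves: " + remainingItems(xml, true));
    }
}
```

With indented input such as the in.xml file of this example, whitespace Text nodes sit between the item elements, which hides the problem for a forward loop; compact XML without that whitespace exposes it.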
Input:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Java Tutorials and Examples</title>
    <item>
      <title><![CDATA[Java Tutorials]]></title>
      <link>http://www.javacodegeeks.com/</link>
    </item>
    <item>
      <title><![CDATA[Java Examples]]></title>
      <link>http://examples.javacodegeeks.com/</link>
    </item>
  </channel>
</rss>
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<rss version="2.0">
<channel>
<title>Java Tutorials and Examples</title>
</channel>
</rss>
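The normalize() call in the example matters because removing a node can leave two Text nodes sitting next to each other. The following is a minimal sketch of that situation, not taken from the original example: the class NormalizeDemo, the method textChildCounts and the inline XML string are hypothetical names and data chosen for illustration. It removes one element from between two pieces of text and counts the child nodes before and after normalize().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NormalizeDemo {

    // Removes the first element called nameToRemove and returns the number of
    // child nodes of the root before and after Document.normalize().
    public static int[] textChildCounts(String xml, String nameToRemove) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        Node target = root.getElementsByTagName(nameToRemove).item(0);
        target.getParentNode().removeChild(target);

        // The two Text nodes that surrounded the removed element are now adjacent.
        int before = root.getChildNodes().getLength();

        // normalize() merges adjacent Text nodes into a single one.
        doc.normalize();
        int after = root.getChildNodes().getLength();

        return new int[] { before, after };
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: text, an empty <b/> element, then more text.
        int[] counts = textChildCounts("<title>Java <b/>Tutorials</title>", "b");
        System.out.println("children before normalize(): " + counts[0]);
        System.out.println("children after normalize():  " + counts[1]);
    }
}
```

The serialized XML looks the same in both states, but code that later walks the tree and expects one Text node per run of text will only behave as intended after normalization.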
This was an example of how to remove Nodes from a DOM Document recursively in Java.
Dear Sir It is indeed a very good effort, which makes it easy to process XML files. I have two questions: 1. How can we modify/add the code that can process multiple XML files at once that have similar structure/XML tags. 2. The program currently gives a NullPointerException if a similar XML file is given but with one/few missing tags. While processing multiple files (as in Question 01), “Can we ignore such exception without affecting the normal flow of the program?” In other words, is it possible to process multiple similar XML files with varying/missing number of nodes? Thanks and…