Trie tutorial in java
In this tutorial, we are going to discuss a data structure called Trie. Trie is a tree structure representing words. It could be used for implementing dictionary or spell checker. The following tutorial is implemented in java using junit 4.12 for unit test.
1. Introduction to Trie
Trie is an ordered tree structure which takes advantage of the structure of the keys that it stores. The idea behind a Trie is that it uses the key itself to navigate the search through the structure. The word Trie comes from the word reTRIEval. Because these keys help you retrieve the keys themselves.
Here is the representation of a Trie for a small dictionary of words. There are six words represented b, be, bear, best, bean, a, at, ate.
As you can see in this example, not every node in this Trie represents a word. Some of the nodes are just internal nodes and they don’t actually represent any word in this dictionary. Also, these Trie can have more than two children at each node. In binary search tree, each node always has two children, but these Trie can have more than two children.
Now, let’s find a word e.g. bean in this Trie. First start by looking at the first character in the word, which is b. Then, we can see there is a link to the next node down to the Trie that we could continue to that node. Then look ahead at our next character which is e. We repeat the process and find the next node. Afterwards, we look for the next character in the word, which is a. Similar we find the link to the next node. We continue looking for the characters until no more characters left. The last character is n, we can see there is a link from the current node to the next node which represents our word bean and we find the word. This is just a quick look at Trie to find a word. Let’s see how to implement a Trie.
2. Implementing a Trie
In the following example, TrieNode represents each node in Trie. It includes text, children and a flag to specify the node from non-word in the Trie. The children field is a key-value map. The key represents the character which links to another TrieNode. e.g. b which links to be. This is how we could track inside a Trie.
TrieNode.java
public class TrieNode { private HashMap<Character, TrieNode> children; private String text; private boolean isWord; public TrieNode() { children = new HashMap<Character, TrieNode>(); text = ""; isWord = false; } public TrieNode(String text) { this(); this.text = text; } public HashMap<Character, TrieNode> getChildren() { return children; } public String getText() { return text; } public boolean isWord() { return isWord; } public void setIsWord(boolean word) { isWord = word; } @Override public String toString() { return text; } }
In the following class add()
, find()
and delete()
methods are implemented. There are also some private methods which could help us to simplify those methods such as findNode()
or insertChar()
.
Here are the algorithm steps how to add()
a word into a Trie:
- Set current node to root node. The root node does not contain any letter (initialized to the null character for convenience).
- Convert the word to a char array.
- Set the current letter to the first letter in the word.
- If the current node already has an existing reference to the current letter (through one of the elements in the “children” field) then set current node to that referenced node. Otherwise, create a new node, set the letter to current letter, and set current node to this new node. Repeat step 3 until all letters in the current word has been processed.
Trie.java
public class Trie { private TrieNode root; private int size; public Trie() { root = new TrieNode(); size = 0; } public boolean add(String word) { TrieNode trie = root; if (trie == null || word == null) return false; char[] chars = word.toCharArray(); int counter = 0; while (counter < chars.length) { Set childs = trie.getChildren().keySet(); if (!childs.contains(chars[counter])) { insertChar(trie, chars[counter]); if (counter == chars.length - 1) { getChild(trie, chars[counter]).setIsWord(true); size++; return true; } } trie = getChild(trie, chars[counter]); if (trie.getText().equals(word) && !trie.isWord()) { trie.setIsWord(true); size++; return true; } counter++; } return false; } public boolean find(String str) { Map<Character, TrieNode> children = root.getChildren(); TrieNode t = null; for (int i = 0; i < str.length(); i++) { char c = str.charAt(i); if (children.containsKey(c)) { t = children.get(c); children = t.getChildren(); } else return false; } return true; } public boolean remove(String str) { return findNode(root, str); } private TrieNode getChild(TrieNode trie, Character c) { return trie.getChildren().get(c); } private TrieNode insertChar(TrieNode trie, Character c) { if (trie.getChildren().containsKey(c)) { return null; } TrieNode next = new TrieNode(trie.getText() + c.toString()); trie.getChildren().put(c, next); return next; } private boolean findNode(TrieNode trie, String s) { Map<Character, TrieNode> children = root.getChildren(); TrieNode parent = null; for (int i = 0; i < s.length(); i++) { char c = s.charAt(i); if (children.containsKey(c)) { parent = trie; trie = children.get(c); children = trie.getChildren(); if (trie.getText().equals(s)) { parent.getChildren().remove(c); trie = null; return true; } } } return false; } public int getSize() { return size; } }
Algorithm steps for how to find()
a word in a Trie:
- Set current children to root node’s children.
- Set the current letter to the first letter in the word.
- If children map contains the current letter, then set the current node to that node including its children.
- Repeat steps 2 and 3 until all letters in the word have been processed.
- Now there are two possibilities that may indicate the letter is not there in the tree:
- the current letter is the last letter and there is no valid node containing this letter.
- there is a valid node containing the last letter but the node does not indicate it contains a full word.
- If step the conditions in step 5 are not met, then we have a match for the word in the Trie.
Algorithm steps for remove()
is pretty similar to what we have described for find()
. The only difference is that when we find the word, we remove it from the Trie.
In the following test class, there are test cases for add()
, find()
and delete()
methods in Trie.
TrieTest.java
public class TrieTest { private Trie trie; @Before public void setUp() { trie = new Trie(); trie.add("at"); trie.add("Hello"); trie.add("Been"); trie.add("yes"); trie.add("water"); trie.add("be"); } @Test public void testInsert() { assertTrue(trie.find("water")); assertTrue(trie.find("at")); assertFalse(trie.find("Beat")); assertFalse(trie.find("Test")); trie.remove("water"); assertFalse(trie.find("water")); } @Test public void testDelete() { assertTrue(trie.find("Hello")); assertTrue(trie.find("at")); trie.remove("Hello"); trie.remove("at"); assertFalse(trie.find("Hello")); assertFalse(trie.find("at")); } }
Now, you should be able to add more functionalities and work with Trie. Have a look at the code and implement other methods such as replace()
or update()
.
3. Download the source code
You can download the full source code of this example here: Trie tutorial in Java
Whats the difference between a HashMap and LinkedList solution?