|
1 | 1 | # Trie |
2 | 2 |
|
3 | | -##What is a Trie? |
4 | | -A trie (also known as a prefix tree, or radix tree in some other (but different) implementations) is a special type of tree used to store associative data structures where the key item is normally of type String. Each node in the trie is typically not associated with a value containing strictly itself, but more so is linked to some common prefix that precedes it in levels above it. Oftentimes, true key-value pairs are associated with the leaves of the trie, but they are not limited to this. |
| 3 | +## What is a Trie? |
5 | 4 |
|
6 | | -##Why a Trie? |
7 | | -Tries are very useful simply for the fact that it has some advantages over other data structures, like the binary tree or a hash map. These advantages include: |
8 | | -* Looking up keys is typically faster in the worst case when compared to other data structures. |
9 | | -* Unlike a hash map, a trie need not worry about key collisions |
10 | | -* No need for hasing, as each key will have a unique path in the trie |
11 | | -* Tries, by implementation, can be by default alphabetically ordered. |
| 5 | +A `Trie` (also known as a prefix tree, or a radix tree in some related but different implementations) is a special type of tree used to store associative data, where the keys are normally strings. A `Trie` for a dictionary might look like this: |
12 | 6 |
|
| 7 | + |
13 | 8 |
|
14 | | -##Common Algorithms |
| 9 | +Storing the English language is a primary use case for a `Trie`. Each node in the `Trie` represents a single character of a word. A chain of nodes then makes up a word. |
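
As a rough illustration of "one node per character", here is a minimal sketch of what such a node could look like. The type name `TrieNode` and its members (`value`, `parent`, `children`, `isTerminating`, `add(child:)`) are assumptions inferred from how the snippets in this README use them, not a copy of the repository's implementation.

```swift
// Hypothetical node type for illustration only.
final class TrieNode {
  var value: Character?                      // nil for the root node
  weak var parent: TrieNode?                 // weak to avoid a retain cycle
  var children: [Character: TrieNode] = [:]
  var isTerminating = false                  // true when a word ends at this node

  init(value: Character? = nil, parent: TrieNode? = nil) {
    self.value = value
    self.parent = parent
  }

  // Adds a child for `character` unless one already exists.
  func add(child character: Character) {
    guard children[character] == nil else { return }
    children[character] = TrieNode(value: character, parent: self)
  }
}

// Build the chain for "app" by hand: root -> a -> p -> p
let root = TrieNode()
var node = root
for character in "app" {
  node.add(child: character)
  node = node.children[character]!
}
node.isTerminating = true
```

Only the last node in the chain is marked as terminating, which is what distinguishes the stored word "app" from its prefixes "a" and "ap".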
15 | 10 |
|
16 | | -###Find (or any general lookup function) |
17 | | -Tries make looking up keys a trivial task, as all one has to do is walk over the nodes until we either hit a null reference or we find the key in question. |
| 11 | +## Why a Trie? |
18 | 12 |
|
19 | | -The algorithm would be as follows: |
20 | | -``` |
21 | | - let node be the root of the trie |
22 | | - |
23 | | - for each character in the key |
24 | | - if the child of node with value character is null |
25 | | - return false (key doesn't exist in trie) |
26 | | - else |
27 | | - node = child of node with value character (move to the next node) |
28 | | - return true (key exists in trie and was found |
29 | | -``` |
| 13 | +Tries are very useful for certain situations. Here are some of the advantages: |
30 | 14 |
|
31 | | -And in swift: |
32 | | -```swift |
33 | | -func find(key: String) -> (node: Node?, found: Bool) { |
34 | | - var currentNode = self.root |
35 | | - |
36 | | - for c in key.characters { |
37 | | - if currentNode.children[String(c)] == nil { |
38 | | - return(nil, false) |
39 | | - } |
40 | | - currentNode = currentNode.children[String(c)]! |
41 | | - } |
| 15 | +* Looking up a value typically has a better worst-case time complexity than in other data structures, such as a binary tree or a hash map. |
| 16 | +* Unlike a hash map, a `Trie` does not need to worry about key collisions. |
| 17 | +* No hashing is needed, because each key has a unique path in the `Trie`. |
| 18 | +* `Trie` structures can be alphabetically ordered by default. |
42 | 19 |
|
43 | | - return(currentNode, currentNode.isValidWord()) |
44 | | - } |
45 | | -``` |
| 20 | +## Common Algorithms |
46 | 21 |
|
47 | | -###Insertion |
48 | | -Insertion is also a trivial task with a Trie, as all one needs to do is walk over the nodes until we either halt on a node that we must mark as a key, or we reach a point where we need to add extra nodes to represent it. |
| 22 | +### Contains (or any general lookup method) |
49 | 23 |
|
50 | | -Let's walk through the algorithm: |
| 24 | +`Trie` structures are great for lookup operations. For `Trie` structures that model the English language, finding a particular word is a matter of a few pointer traversals: |
51 | 25 |
|
52 | | -``` |
53 | | - let S be the root node of our tree |
54 | | - let word be the input key |
55 | | - let length be the length of the key |
56 | | - |
| 26 | +```swift |
| 27 | +func contains(word: String) -> Bool { |
| 28 | + guard !word.isEmpty else { return false } |
| 29 | + |
| 30 | + // 1 |
| 31 | + var currentNode = root |
57 | 32 |
|
58 | | - find(word) |
59 | | - if the word was found |
60 | | - return false |
61 | | - else |
62 | | - |
63 | | - for each character in word |
64 | | - if child node with value character does not exist |
65 | | - break |
66 | | - else |
67 | | - node = child node with value character |
68 | | - decrement length |
69 | | - |
70 | | - if length != 0 |
71 | | - let suffix be the remaining characters in the key defined by the shortened length |
72 | | - |
73 | | - for each character in suffix |
74 | | - create a new node with value character and let it be the child of node |
75 | | - node = newly created child now |
76 | | - mark node as a valid key |
77 | | - else |
78 | | - mark node as valid key |
| 33 | + // 2 |
| 34 | + let characters = Array(word.lowercased()) |
| 35 | + var currentIndex = 0 |
| 36 | + |
| 37 | + // 3 |
| 38 | + while currentIndex < characters.count, |
| 39 | + let child = currentNode.children[characters[currentIndex]] { |
| 40 | + |
| 41 | + currentNode = child |
| 42 | + currentIndex += 1 |
| 43 | + } |
| 44 | + |
| 45 | + // 4 |
| 46 | + if currentIndex == characters.count && currentNode.isTerminating { |
| 47 | + return true |
| 48 | + } else { |
| 49 | + return false |
| 50 | + } |
| 51 | +} |
79 | 52 | ``` |
80 | 53 |
|
81 | | -And the corresponding swift code: |
| 54 | +The `contains` method is fairly straightforward: |
82 | 55 |
|
83 | | -```swift |
84 | | - func insert(w: String) -> (word: String, inserted: Bool) { |
85 | | - |
86 | | - let word = w.lowercaseString |
87 | | - var currentNode = self.root |
88 | | - var length = word.characters.count |
| 56 | +1. Create a reference to the `root`. This reference will allow you to walk down a chain of nodes. |
| 57 | +2. Keep track of the characters of the word you're trying to match. |
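| 58 | +3. Walk the pointer down the nodes, one character at a time, stopping as soon as a character has no matching child. |
| 59 | +4. `isTerminating` is a boolean flag for whether or not this node is the end of a word. If this `if` condition is satisfied, every character was matched and the last node ends a word, so the word exists in the `Trie`. |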
89 | 60 |
|
90 | | - if self.contains(word) { |
91 | | - return (w, false) |
92 | | - } |
| 61 | +### Insertion |
93 | 62 |
|
94 | | - var index = 0 |
95 | | - var c = Array(word.characters)[index] |
| 63 | +Insertion into a `Trie` requires you to walk over the nodes until you either halt on a node that must be marked as `terminating`, or reach a point where you need to add extra nodes. |
96 | 64 |
|
97 | | - while let child = currentNode.children[String(c)] { |
98 | | - currentNode = child |
99 | | - length -= 1 |
100 | | - index += 1 |
| 65 | +```swift |
| 66 | +func insert(word: String) { |
| 67 | + guard !word.isEmpty else { return } |
101 | 68 |
|
102 | | - if(length == 0) { |
103 | | - currentNode.isWord() |
104 | | - wordList.append(w) |
105 | | - wordCount += 1 |
106 | | - return (w, true) |
107 | | - } |
| 69 | + // 1 |
| 70 | + var currentNode = root |
| 71 | + |
| 72 | + // 2 |
| 73 | + let characters = Array(word.lowercased()) |
| 74 | + var currentIndex = 0 |
| 75 | + |
| 76 | + // 3 |
| 77 | + while currentIndex < characters.count { |
| 78 | + let character = characters[currentIndex] |
108 | 79 |
|
109 | | - c = Array(word.characters)[index] |
| 80 | + // 4 |
| 81 | + if let child = currentNode.children[character] { |
| 82 | + currentNode = child |
| 83 | + } else { |
| 84 | + currentNode.add(child: character) |
| 85 | + currentNode = currentNode.children[character]! |
110 | 86 | } |
| 87 | + |
| 88 | + currentIndex += 1 |
111 | 89 |
|
112 | | - let remainingChars = String(word.characters.suffix(length)) |
113 | | - for c in remainingChars.characters { |
114 | | - currentNode.children[String(c)] = Node(c: String(c), p: currentNode) |
115 | | - currentNode = currentNode.children[String(c)]! |
| 90 | + // 5 |
| 91 | + if currentIndex == characters.count { |
| 92 | + currentNode.isTerminating = true |
116 | 93 | } |
117 | | - |
118 | | - currentNode.isWord() |
119 | | - wordList.append(w) |
120 | | - wordCount += 1 |
121 | | - return (w, true) |
122 | 94 | } |
123 | | - |
| 95 | +} |
124 | 96 | ``` |
125 | 97 |
|
126 | | -###Removal |
127 | | -Removing keys from the trie is a little more tricky, as there a few more cases that we have to take into account the fact that keys may exist that are actually sub-strings of other valid keys. That being said, it isn't as simple a process to just delete the nodes for a specific key, as we could be deleting references/nodes necessary for already exisitng keys! |
| 98 | +1. Once again, you create a reference to the root node. You'll move this reference down a chain of nodes. |
| 99 | +2. Keep track of the word you want to insert. |
| 100 | +3. Begin walking through your word letter by letter. |
| 101 | +4. Sometimes, the required node to insert already exists. That is the case for two words inside the `Trie` that share letters (e.g. "Apple" and "App"). If a letter already exists, you'll reuse it, and simply traverse deeper down the chain. Otherwise, you'll create a new node representing the letter. |
| 102 | +5. Once you get to the end, you set `isTerminating` to `true` to mark that specific node as the end of a word. |
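
To see `insert` and `contains` working together on words that share a prefix, here is a compact, self-contained sketch. It follows the same steps as the methods above, but the `TrieNode` and `Trie` types shown here are simplified assumptions for illustration, not the repository's classes.

```swift
// Simplified, hypothetical types for illustration only.
final class TrieNode {
  var children: [Character: TrieNode] = [:]
  var isTerminating = false
}

final class Trie {
  let root = TrieNode()

  func insert(word: String) {
    guard !word.isEmpty else { return }
    var currentNode = root
    for character in word.lowercased() {
      if let child = currentNode.children[character] {
        currentNode = child                  // reuse the shared prefix
      } else {
        let child = TrieNode()               // extend the chain
        currentNode.children[character] = child
        currentNode = child
      }
    }
    currentNode.isTerminating = true         // mark the end of the word
  }

  func contains(word: String) -> Bool {
    guard !word.isEmpty else { return false }
    var currentNode = root
    for character in word.lowercased() {
      guard let child = currentNode.children[character] else { return false }
      currentNode = child
    }
    return currentNode.isTerminating
  }
}

let trie = Trie()
trie.insert(word: "Apple")
trie.insert(word: "App")
print(trie.contains(word: "app"))    // true
print(trie.contains(word: "appl"))   // false: a prefix, but not a stored word
print(trie.contains(word: "apple"))  // true
```

Inserting "App" after "Apple" creates no new nodes; it only flips `isTerminating` on the second "p".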
128 | 103 |
|
129 | | -The algorithm would be as follows: |
130 | | - |
131 | | -``` |
132 | | - |
133 | | - let word be the key to remove |
134 | | - let node be the root of the trie |
135 | | - |
136 | | - find(word) |
137 | | - if word was not found |
138 | | - return false |
139 | | - else |
140 | | - |
141 | | - for each character in word |
142 | | - node = child node with value character |
143 | | - |
144 | | - if node has more than just 1 child node |
145 | | - Mark node as an invalid key, since removing it would remove nodes still in use |
146 | | - else |
147 | | - while node has no valid children and node is not the root node |
148 | | - let character = node's value |
149 | | - node = the parent of node |
150 | | - delete node's child node with value character |
151 | | - return true |
152 | | -``` |
| 104 | +### Removal |
153 | 105 |
|
| 106 | +Removing keys from the trie is a little tricky, as there are a few more cases you'll need to take into account. Nodes in a `Trie` may be shared between different words. Consider the two words "Apple" and "App". Inside a `Trie`, the chain of nodes representing "App" is shared with "Apple". |
154 | 107 |
|
155 | | - |
156 | | -and the corresponding swift code: |
| 108 | +If you'd like to remove "Apple", you'll need to take care to leave the "App" chain intact. |
157 | 109 |
|
158 | 110 | ```swift |
159 | | - func remove(w: String) -> (word: String, removed: Bool){ |
160 | | - let word = w.lowercaseString |
161 | | - |
162 | | - if(!self.contains(w)) { |
163 | | - return (w, false) |
164 | | - } |
165 | | - var currentNode = self.root |
| 111 | +func remove(word: String) { |
| 112 | + guard !word.isEmpty else { return } |
166 | 113 |
|
167 | | - for c in word.characters { |
168 | | - currentNode = currentNode.getChildAt(String(c)) |
169 | | - } |
170 | | - |
171 | | - if currentNode.numChildren() > 0 { |
172 | | - currentNode.isNotWord() |
173 | | - } else { |
174 | | - var character = currentNode.char() |
175 | | - while(currentNode.numChildren() == 0 && !currentNode.isRoot()) { |
176 | | - currentNode = currentNode.getParent() |
177 | | - currentNode.children[character]!.setParent(nil) |
178 | | - currentNode.children[character]!.update(nil) |
179 | | - currentNode.children[character] = nil |
180 | | - character = currentNode.char() |
181 | | - } |
182 | | - } |
183 | | - |
184 | | - wordCount -= 1 |
185 | | - |
186 | | - var index = 0 |
187 | | - for item in wordList{ |
188 | | - if item == w { |
189 | | - wordList.removeAtIndex(index) |
190 | | - } |
191 | | - index += 1 |
| 114 | + // 1 |
| 115 | + var currentNode = root |
| 116 | + |
| 117 | + // 2 |
| 118 | + let characters = Array(word.lowercased()) |
| 119 | + var currentIndex = 0 |
| 120 | + |
| 121 | + // 3 |
| 122 | + while currentIndex < characters.count { |
| 123 | + let character = characters[currentIndex] |
| 124 | + guard let child = currentNode.children[character] else { return } |
| 125 | + currentNode = child |
| 126 | + currentIndex += 1 |
| 127 | + } |
| 128 | + |
| 129 | + // 4 |
| 130 | + if currentNode.children.count > 0 { |
| 131 | + currentNode.isTerminating = false |
| 132 | + } else { |
| 133 | + currentNode.isTerminating = false |
| 134 | + while currentNode.children.isEmpty, !currentNode.isTerminating, let parent = currentNode.parent, let character = currentNode.value { |
| 135 | + // Detach the unused node from its parent, then keep pruning upward. |
| 136 | + parent.children[character] = nil |
| 137 | + currentNode = parent |
192 | 138 | } |
193 | | - |
194 | | - return (w, true) |
195 | 139 | } |
196 | | - |
| 140 | +} |
197 | 141 | ``` |
198 | 142 |
|
| 143 | +1. Once again, you create a reference to the root node. |
| 144 | +2. Keep track of the word you want to remove. |
| 145 | +3. Attempt to walk to the terminating node of the word. The `guard` statement returns early if it can't find one of the letters, since it's possible to call `remove` on a nonexistent entry. |
| 146 | +4. If you reach the node representing the last letter of the word you want to remove, you'll have two cases to deal with: either it's a leaf node, or it has children. If it has children, the node is used by other words, so you just set `isTerminating` to `false`. Otherwise, you delete the node, along with any ancestors that are left without children and don't end another word. |
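
The pruning step is easy to get subtly wrong, so here is a self-contained sketch of one way to do it. It clears the flag on the target node first, then walks upward and detaches every node that has no children and does not end another word. All type and member names are assumptions for illustration.

```swift
// Simplified, hypothetical types for illustration only.
final class TrieNode {
  let value: Character?                      // nil for the root node
  weak var parent: TrieNode?
  var children: [Character: TrieNode] = [:]
  var isTerminating = false

  init(value: Character? = nil, parent: TrieNode? = nil) {
    self.value = value
    self.parent = parent
  }
}

final class Trie {
  let root = TrieNode()

  func insert(word: String) {
    var node = root
    for character in word.lowercased() {
      if node.children[character] == nil {
        node.children[character] = TrieNode(value: character, parent: node)
      }
      node = node.children[character]!
    }
    if node !== root { node.isTerminating = true }
  }

  func contains(word: String) -> Bool {
    var node = root
    for character in word.lowercased() {
      guard let child = node.children[character] else { return false }
      node = child
    }
    return node.isTerminating
  }

  func remove(word: String) {
    var node = root
    for character in word.lowercased() {
      guard let child = node.children[character] else { return }
      node = child
    }
    // The path exists, but it may not be a stored word.
    guard node.isTerminating else { return }
    node.isTerminating = false

    // Prune upward while the node is unused by any other word.
    while node.children.isEmpty, !node.isTerminating,
          let parent = node.parent, let character = node.value {
      parent.children[character] = nil
      node = parent
    }
  }
}

let trie = Trie()
trie.insert(word: "App")
trie.insert(word: "Apple")
trie.remove(word: "Apple")
print(trie.contains(word: "apple"))  // false
print(trie.contains(word: "app"))    // true
```

After removing "Apple", the "l" and "e" nodes are gone, while the "App" chain is untouched because its last node still terminates a word.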
199 | 147 |
|
200 | | -###Running Times |
| 148 | +### Time Complexity |
201 | 149 |
|
202 | | -Let n be the length of some key in the trie |
| 150 | +Let n be the length of some value in the `Trie`. |
203 | 151 |
|
204 | | -* Find(...) : In the Worst case O(n) |
205 | | -* Insert(...) : O(n) |
206 | | -* Remove(...) : O(n) |
| 152 | +* `contains` - Worst case O(n) |
| 153 | +* `insert` - O(n) |
| 154 | +* `remove` - O(n) |
207 | 155 |
|
208 | | -###Other Notable Operations |
| 156 | +### Other Notable Operations |
209 | 157 |
|
210 | | -* Count: Returns the number of keys in the trie ( O(1) ) |
211 | | -* getWords: Returns a list containing all keys in the trie ( *O(1) ) |
212 | | -* isEmpty: Returns true f the trie is empty, false otherwise ( *O(1) ) |
213 | | -* contains: Returns true if the trie has a given key, false otherwise ( O(n) ) |
214 | | - |
215 | | -`* denotes that running time may vary depending on implementation |
| 158 | +* `count`: Returns the number of keys in the `Trie` - O(1) |
| 159 | +* `words`: Returns a list containing all the keys in the `Trie` - O(1) if the list is cached; otherwise the running time depends on the implementation |
| 160 | +* `isEmpty`: Returns `true` if the `Trie` is empty, `false` otherwise - O(1) |
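
How these operations are implemented can vary. As one possible approach, the sketch below keeps a running `count` that is updated on insertion and builds `words` with a depth-first traversal that visits children in sorted order, which is also what yields the alphabetical ordering mentioned earlier. This traversal touches every node, so in this sketch `words` is proportional to the size of the `Trie` rather than O(1). All names are assumptions for illustration.

```swift
// Simplified, hypothetical types for illustration only.
final class TrieNode {
  var children: [Character: TrieNode] = [:]
  var isTerminating = false
}

final class Trie {
  let root = TrieNode()
  private(set) var count = 0                 // number of stored words

  var isEmpty: Bool { return count == 0 }

  func insert(word: String) {
    guard !word.isEmpty else { return }
    var node = root
    for character in word.lowercased() {
      if node.children[character] == nil {
        node.children[character] = TrieNode()
      }
      node = node.children[character]!
    }
    // Only count the word the first time it is inserted.
    if !node.isTerminating {
      node.isTerminating = true
      count += 1
    }
  }

  // Collects every stored word by walking the whole trie depth-first.
  var words: [String] {
    var result: [String] = []
    func visit(_ node: TrieNode, prefix: String) {
      if node.isTerminating { result.append(prefix) }
      for character in node.children.keys.sorted() {
        visit(node.children[character]!, prefix: prefix + String(character))
      }
    }
    visit(root, prefix: "")
    return result
  }
}

let trie = Trie()
trie.insert(word: "cute")
trie.insert(word: "cut")
trie.insert(word: "car")
print(trie.words)    // ["car", "cut", "cute"]
print(trie.count)    // 3
print(trie.isEmpty)  // false
```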
216 | 161 |
|
217 | 162 | See also [Wikipedia entry for Trie](https://en.wikipedia.org/wiki/Trie). |
218 | 163 |
|
219 | | -*Written for the Swift Algorithm Club by Christian Encarnacion* |
220 | | - |
| 164 | +*Written for the Swift Algorithm Club by Christian Encarnacion. Refactored by Kelvin Lau* |