Prefix tree

The trie, or prefix tree, is a data structure for storing strings or other sequences in a way that allows fast lookup. In its simplest form it can be used as a list of keywords or a dictionary.
By associating each string with an object it can be used as an alternative to a hash map. The name 'trie' comes from the word 'retrieval'.

The basic idea behind a trie is that each successive letter is stored as a separate node. To find out if the word 'cat' is in the list you start at the root and look up the 'c' node. Having found
the 'c' node you search the list of c's children for an 'a' node, and so on. To differentiate between 'cat' and 'catalog' each word is ended by a special delimiter.
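
The lookup described above can be sketched in a few lines. This is only an illustration: the class name SketchTrie, the END marker and the use of a HashMap for the children are assumptions made for the example, not the implementation given later in this article.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: each node maps a character to a child node.
class SketchTrie {
    // Hypothetical end-of-word marker; assumed never to occur in a stored word.
    static final char END = '\u0001';

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Walks the word plus the end marker, creating any nodes that are missing.
    void add(String word) {
        Node node = root;
        for (char c : (word + END).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
    }

    // Follows one child per character; the word is present only if the
    // end marker can be reached after the last letter.
    boolean contains(String word) {
        Node node = root;
        for (char c : (word + END).toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SketchTrie trie = new SketchTrie();
        trie.add("catalog");
        System.out.println(trie.contains("catalog")); // true
        System.out.println(trie.contains("cat"));     // false: only a prefix so far
        trie.add("cat");
        System.out.println(trie.contains("cat"));     // true
    }
}
```

Without the end marker, 'cat' would be reported as present as soon as 'catalog' had been added.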

[Figure: schematic representation of a partial trie]

Implementation

The fastest way to implement this is with fixed-size arrays, where each node holds one child slot per possible character. Unfortunately this only works if you know which characters can show up in the
sequences. For keywords drawn from a 26-letter alphabet it's a fast but space-consuming option; for Unicode strings it's pretty much impossible.
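
As a rough sketch of the fixed-size array approach, assuming keys contain only the lowercase letters 'a' to 'z' (the class name ArrayTrie and the isWord flag, used here in place of a delimiter node, are choices made for this example):

```java
// Sketch of a trie whose nodes hold a fixed array of 26 child slots.
class ArrayTrie {
    private static final int ALPHABET = 26; // assumption: keys use only 'a'..'z'

    private static final class Node {
        final Node[] children = new Node[ALPHABET];
        boolean isWord; // end-of-word flag, used here instead of a delimiter node
    }

    private final Node root = new Node();

    // Maps a character to its array slot and rejects anything outside 'a'..'z'.
    private static int slot(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("Unsupported character: " + c);
        }
        return c - 'a';
    }

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int s = slot(word.charAt(i));
            if (node.children[s] == null) node.children[s] = new Node();
            node = node.children[s];
        }
        node.isWord = true;
    }

    // One array access per character: no searching among siblings.
    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children[slot(word.charAt(i))];
            if (node == null) return false;
        }
        return node.isWord;
    }
}
```

Every node allocates all 26 slots whether or not they are used, which is where the space cost comes from.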

Instead of fixed-size arrays you can use a linked list at each node. This has obvious space advantages, since empty slots are no longer stored. Unfortunately searching a long linked list is rather
slow. For example, to find the word 'zzz' you might need 3 times 26 steps.

Faster trie algorithms have been devised that lie somewhere between these two extremes in terms of speed and space consumption. Double-array and triple-array tries are good terms to start a search with.

Fun & games with prefix trees

Prefix trees are a bit of an overlooked data structure with lots of interesting possibilities.

Storage

By storing a value at each leaf node you can use a trie as a kind of alternative hash map, although when working with Unicode strings a hash map will generally outperform a trie.
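
A minimal sketch of a trie used as a map follows. The names TrieMap, put and get are hypothetical, and for brevity the value is stored on the node where the key ends rather than on a separate delimiter leaf.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a trie that associates a value with each stored key.
class TrieMap<V> {
    private static final class Node<V> {
        final Map<Character, Node<V>> children = new HashMap<>();
        V value;          // payload stored on the node where a key ends
        boolean hasValue; // distinguishes "no entry" from a stored null
    }

    private final Node<V> root = new Node<V>();

    void put(String key, V value) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), k -> new Node<V>());
        }
        node.value = value;
        node.hasValue = true;
    }

    // Returns the value stored for the key, or null if the key is absent.
    V get(String key) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.get(key.charAt(i));
            if (node == null) return null;
        }
        return node.hasValue ? node.value : null;
    }
}
```

For example, after put("cat", 1) and put("catalog", 2), get("cat") returns 1 while get("ca") returns null.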

As a dictionary

Looking up whether a word is in a trie takes O(n) operations, where n is the length of the word. Thus, for array implementations, the lookup speed doesn't change with increasing trie size.

Word completion

Word completion is straightforward to implement using a trie: simply find the node corresponding to the first few letters, and then collapse the subtree into a list of possible endings.

This can be used for autocompleting user input in text editors or for the T9 dictionary on your phone.
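
The two steps (walk to the prefix node, then collapse its subtree) might look like the sketch below. CompletionTrie and complete are hypothetical names, and a TreeMap is assumed for the children purely so that results come out in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of prefix completion on a small map-based trie.
class CompletionTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted children
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        node.isWord = true;
    }

    // Returns every stored word starting with the prefix (empty list if none).
    List<String> complete(String prefix) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) return results; // no word has this prefix
        }
        collect(node, new StringBuilder(prefix), results);
        return results;
    }

    // Depth-first walk that rebuilds each word from the path taken so far.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            char c = e.getKey();
            path.append(c);
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        CompletionTrie trie = new CompletionTrie();
        for (String w : new String[] {"cat", "catalog", "car", "dog"}) trie.add(w);
        System.out.println(trie.complete("ca")); // [car, cat, catalog]
    }
}
```

The cost of the second step depends on the size of the subtree under the prefix, so very short prefixes can return long lists.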

Censoring strings

Given a large list of swear words and a string to censor, a trie offers a speed advantage over a simple array of strings. If a swear word can appear anywhere in the string, you'll need to attempt
to match it from every possible starting offset. With a string of m characters and a list of n words this would mean m*n string comparisons.

Using a trie you can attempt to find a match from each offset in the string, which means m trie lookups. Since the speed of a trie lookup scales well with an increasing number of words, this is
considerably faster than the array lookup.
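
A compact sketch of the "one trie walk per offset" idea is given below. The class name CensorSketch is hypothetical, matching is case-sensitive to keep the example short, and the longest matching word at each offset is the one that gets replaced.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: replace every stored word found in a text with asterisks.
class CensorSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        node.isWord = true;
    }

    // Length of the longest stored word starting at the given offset, or 0.
    private int longestMatch(String text, int offset) {
        Node node = root;
        int best = 0;
        for (int i = offset; i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
            if (node == null) break;            // no stored word continues this way
            if (node.isWord) best = i - offset + 1;
        }
        return best;
    }

    // Performs one trie walk per offset instead of one comparison per word.
    String censor(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int match = longestMatch(text, i);
            if (match == 0) {
                out.append(text.charAt(i++));
            } else {
                for (int j = 0; j < match; j++) out.append('*');
                i += match;
            }
        }
        return out.toString();
    }
}
```

Each walk stops as soon as no stored word can continue, so its length is bounded by the longest stored word rather than by the number of words.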

Java linked list implementation

Just for fun, here's a Java linked list implementation. Keep in mind that this is a fairly slow implementation. For serious speed boosts you'll need to investigate double- or triple-array tries.

Please note: the version below is a simplified version intended only to give some insight into the workings of the trie. For the full version please see the Downloads
section.

    import java.util.ArrayList;

    public class Trie {

        /**
         * The delimiter used in this trie to tell where words end. Without a proper delimiter either
         * A. a lookup for 'win' would return false if the list also contained 'windows', or
         * B. a lookup for 'mag' would return true if the only word in the list was 'magnolia'.
         *
         * The delimiter should never occur in a word added to the trie.
         */
        public final static char DELIMITER = '\u0001';

        /**
         * Creates a new Trie.
         */
        public Trie() {
            root = new Node('r');
            size = 0;
        }

        /**
         * Adds a word to the list.
         *
         * @param word The word to add.
         * @return True if the word wasn't in the list yet
         */
        public boolean add(String word) {
            if (add(root, word + DELIMITER, 0)) {
                size++;
                int n = word.length();
                if (n > maxDepth) maxDepth = n;
                return true;
            }
            return false;
        }

        /*
         * Does the real work of adding a word to the trie
         */
        private boolean add(Node root, String word, int offset) {
            // Every character (including the delimiter) already exists: duplicate word
            if (offset == word.length()) return false;
            int c = word.charAt(offset);

            // Search for node to add to
            Node last = null, next = root.firstChild;
            while (next != null) {
                if (next.value < c) {
                    // Not found yet, continue searching
                    last = next;
                    next = next.nextSibling;
                } else if (next.value == c) {
                    // Match found, add remaining word to this node
                    return add(next, word, offset + 1);
                }
                // Because of the ordering of the list getting here means we won't
                // find a match
                else break;
            }

            // No match found, create a new node and insert
            Node node = new Node(c);
            if (last == null) {
                // Insert node at the beginning of the list (works for next == null too)
                root.firstChild = node;
                node.nextSibling = next;
            } else {
                // Insert between last and next
                last.nextSibling = node;
                node.nextSibling = next;
            }

            // Add remaining letters
            for (int i = offset + 1; i < word.length(); i++) {
                node.firstChild = new Node(word.charAt(i));
                node = node.firstChild;
            }
            return true;
        }

        /**
         * Searches for a word in the list.
         *
         * @param word The word to search for.
         * @return True if the word was found.
         */
        public boolean isEntry(String word) {
            if (word.length() == 0)
                throw new IllegalArgumentException("Word can't be empty");
            // Fixed: the original listing passed an undefined variable 'w' here
            return isEntry(root, word + DELIMITER, 0);
        }

        /*
         * Does the real work of determining if a word is in the list
         */
        private boolean isEntry(Node root, String word, int offset) {
            if (offset == word.length()) return true;
            int c = word.charAt(offset);

            // Search for the child node matching the current character
            Node next = root.firstChild;
            while (next != null) {
                if (next.value < c) next = next.nextSibling;
                else if (next.value == c) return isEntry(next, word, offset + 1);
                else return false;
            }
            return false;
        }

        /**
         * Returns the size of this list.
         */
        public int size() {
            return size;
        }

        /**
         * Returns all words in this list starting with the given prefix.
         *
         * @param prefix The prefix to search for.
         * @return All words in this list starting with the given prefix, or if no such words
         *         are found, an array containing only the suggested prefix.
         */
        public String[] suggest(String prefix) {
            return suggest(root, prefix, 0);
        }

        /*
         * Recursive function for finding all words starting with the given prefix
         */
        private String[] suggest(Node root, String word, int offset) {
            if (offset == word.length()) {
                ArrayList<String> words = new ArrayList<String>(size);
                char[] chars = new char[maxDepth];
                for (int i = 0; i < offset; i++)
                    chars[i] = word.charAt(i);
                getAll(root, words, chars, offset);
                return words.toArray(new String[words.size()]);
            }
            int c = word.charAt(offset);

            // Search for the child node matching the current character
            Node next = root.firstChild;
            while (next != null) {
                if (next.value < c) next = next.nextSibling;
                else if (next.value == c) return suggest(next, word, offset + 1);
                else break;
            }
            return new String[] { word };
        }

        /*
         * Collects every word stored below the given node. The simplified listing calls
         * getAll without defining it; this body is one possible reconstruction, not the
         * version from the full class.
         */
        private void getAll(Node root, ArrayList<String> words, char[] chars, int depth) {
            for (Node next = root.firstChild; next != null; next = next.nextSibling) {
                if (next.value == DELIMITER) {
                    // A delimiter closes the word spelled out by chars[0..depth)
                    words.add(new String(chars, 0, depth));
                } else {
                    chars[depth] = (char) next.value;
                    getAll(next, words, chars, depth + 1);
                }
            }
        }

        /**
         * Searches a string for words present in the trie and replaces them with stars
         * (asterisks).
         *
         * @param s The string to censor
         */
        public String censor(String s) {
            if (size == 0) return s;
            String z = s.toLowerCase();
            int n = z.length();
            StringBuilder buffer = new StringBuilder(n);
            int match;
            char star = '*';
            for (int i = 0; i < n;) {
                match = longestMatch(root, z, i, 0, 0);
                if (match > 0) {
                    for (int j = 0; j < match; j++) {
                        buffer.append(star);
                        i++;
                    }
                } else {
                    buffer.append(s.charAt(i++));
                }
            }
            return buffer.toString();
        }

        /*
         * Finds the longest matching word in the trie that starts at the given offset...
         */
        private int longestMatch(Node root, String word, int offset, int depth, int maxFound) {
            // Uses delimiter = first in the list!
            Node next = root.firstChild;
            // Null check added: a delimiter node has no children, so firstChild can be null
            // if the input text itself contains the delimiter character
            if (next != null && next.value == DELIMITER) maxFound = depth;
            if (offset == word.length()) return maxFound;
            int c = word.charAt(offset);

            while (next != null) {
                if (next.value < c) next = next.nextSibling;
                else if (next.value == c)
                    return longestMatch(next, word, offset + 1, depth + 1, maxFound);
                else return maxFound;
            }
            return maxFound;
        }

        /*
         * Represents a node in the trie. Because a node's children are stored in a linked list
         * this data structure takes the odd structure of node with a firstChild and a nextSibling.
         */
        private static class Node {
            public int value;
            public Node firstChild;
            public Node nextSibling;

            public Node(int value) {
                this.value = value;
                firstChild = null;
                nextSibling = null;
            }
        }

        private Node root;
        private int size;
        private int maxDepth; // Not exact, but bounding for the maximum
    }

Please note: the code given above is intended only to give some insight into the workings of the trie. For the full version of the class please see the Downloads
section.
