The Trie Data Structure – A Prefix Tree for Autocompletes

After covering the GADDAG Data Structure it seems like a good idea to follow up on the structure it’s based on, the Trie. The Trie is a very special case structure where that’s optimized for prefix searches and was first proposed in 1960 by Edward Fredkin, which has the advantage of using less memory than a GADDAG but has the downside where it’s only efficient for one purpose. It just so happens though that purpose is very common on websites: autocompletion.

A Trie for any one word looks like a linked list, as seen here:

1

No matter how complex a Trie gets, every word will only appear once and it will appear like it does above somewhere in the tree.

Let’s look at a more complex structure containing the words “Call”, “Cat”, “Cater”, “Bat”, “Bake”, “Cake”, and “Cope”:

2

Based on the example above, if you wanted to know every word that contained the prefix “CA” you can simply walk to the node “C” from the root, then down to “A”, and then crawl the rest of the tree to grab all the words that a hanging off of that node.

A Trie has a very fast prefix search time of O(M), where M is the number of letters in the prefix, and it has return time of O(M+N) where N is the number of nodes hanging off of that prefix. This makes the absolute worst case time for a Trie search O(N) where N is the number of nodes in the Trie, which happens when you search for an empty string as the prefix.

The worst case time can be avoided in situations like autocompletes where you will likely not return every possible word hanging off the the prefix, but you will instead only return the top N results. This is shown in the code below, and drastically reduces the time needed to return when dealing with large tries and short prefixes, for example when you want to autocomplete against a dictionary and the user just entered their first keystroke.

Here is a sample of a Trie implemented in .NET 4.5, I’m licensing it under the MIT License so it is free to use however you wish. This implementation does not include optimizations like using binary arrays to store characters or node compression (which would turn this into a radix tree), it was written to be simple to follow for those that want to walk through the operation of a Trie.

/*
    The MIT License (MIT)
    Copyright (c) 2013

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"),
    to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
    and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;

public class Trie
{
    public Node RootNode { get; private set; }

    public Trie()
    {
        RootNode = new Node { Letter = Node.Root };
    }

    public void Add(string word)
    {
        word = word.ToLower() + Node.Eow;
        var currentNode = RootNode;
        foreach(var c in word)
        {
            currentNode = currentNode.AddChild(c);
        }
    }

    public List<string> Match(string prefix, int? maxMatches)
    {
        prefix = prefix.ToLower();

        var set = new HashSet<string>();

        _MatchRecursive(RootNode, set, "", prefix, maxMatches);
        return set.ToList();
    }

    private static void _MatchRecursive(Node node, ISet<string> rtn, string letters, string prefix, int? maxMatches)
    {
        if (maxMatches != null && rtn.Count == maxMatches)
            return;

        if (node == null)
        {
            if (!rtn.Contains(letters)) rtn.Add(letters);
            return;
        }

        letters += node.Letter.ToString();

        if (prefix.Length > 0)
        {
            if (node.ContainsKey(prefix[0]))
            {
                _MatchRecursive(node[prefix[0]], rtn, letters, prefix.Remove(0, 1), maxMatches);
            }
        }
        else
        {
            foreach (char key in node.Keys)
            {
                _MatchRecursive(node[key], rtn, letters, prefix, maxMatches);
            }
        }
    }
}
public class Node
{
    public const char Eow = '$';
    public const char Root = ' ';

    public char Letter { get; set; }
    public HybridDictionary Children { get; private set; }

    public Node() { }

    public Node(char letter)
    {
        this.Letter = letter;
    }

    public Node this[char index]
    {
        get { return (Node)Children[index]; }
    }

    public ICollection Keys
    {
        get { return Children.Keys; }
    }

    public bool ContainsKey(char key)
    {
        return Children.Contains(key);
    }

    public Node AddChild(char letter)
    {
        if (Children == null)
            Children = new HybridDictionary();

        if (!Children.Contains(letter))
        {
            var node = letter != Eow ? new Node(letter) : null;
            Children.Add(letter, node);
            return node;
        }

        return (Node)Children[letter];
    }

    public override string ToString()
    {
        return this.Letter.ToString();
    }
}

Leave a comment