## CS 2 (Winter 2021)Project 04: Movie Finder

This project focuses on implementing the trie data structure, creating an extensible API, and designing performance improvements to tries to better fit a specific use-case.

# Goals and Outcomes

In this project, you will implement one data structure (called a trie) and two versions of an autocomplete algorithm (one that uses Java’s HashMap and one that uses your Trie). Your autocomplete algorithms will then be used to be the backend of a system that complete the names of movies in a search field.

By the end of this project, you will…

• be able to run an efficient autocomplete algorithm on a variety of datasets
• understand how to implement a trie data structure and some algorithms that correspond to them
• have gained a better understanding of API design, as it relates to making projects usable with a variety of clients

# Connection to Lecture

This project covers lecture08 (trees) and lecture09 (tries). You will likely find the code from these two lectures to be very useful in the later parts of this project.

# Overview and Implementation Strategy

In the previous project, you wrote three implementations of the IDeque interface. This time, you will continue our mission of implementing all the data structures ourselves by writing a trie implementation of the IDictionary interface. In addition to building a trie data structure, you will write an autocomplete algorithm which can switch between using a backing java.util.HashMap and your very own TrieMap.

# HashMovieAutoCompleter Requirements

First, we will implement a movie autocompleter algorithm backed by a HashMap. Our autocomplete implementation will be case-insensitive, and will suggest a list of movie names, given a set of words. Consider the movie database that includes the following movies:

• The Avengers
• The Avengers Age of Ultron
• Captain America
• Black Panther
• Transformers Age of Extinction

Now consider the autocomplete query "age of". We want to match on any set of consecutive words in a movie title; so, we should suggest ["The Avengers Age of Ultron", "Transformers Age of Extinction"].

However, if the query is "of age", we should not suggest anything, because none of the movies have “of” and “age” in that order. Similarly, if our autocomplete query is "the", then we should suggest ["The Avengers", "The Avengers Age of Ultron"].

In our first AutoCompleter, HashMovieAutoCompleter, the backing data structure will be a java.util.HashMap which maps each normalized movie title to an IDeques of “suffixes” that can be formed from that title.

For example, the above database would have the following mappings:

• "the avengers" $$\mapsto$$ ["the avengers", "avengers"]
• "the avengers age of ultron" $$\mapsto$$ ["the avengers age of ultron", "avengers age of ultron", "age of ultron", "of ultron", "ultron"]
• "captain america" $$\mapsto$$ ["captain america", "america"]
• "black panther" $$\mapsto$$ ["black panther", "panther"]
• "transformers age of extinction" $$\mapsto$$ ["transformers age of extinction", "age of extinction", "of extinction", "extinction"]

public static void populateTitles()

Populates the autocompleter title map with all suffixes for each movie title.

You will find the pre-populated field ID_MAP which maps movie titles to their IMDB ids useful. Do NOT attempt to open the data files directly; this has already been done for you in the super class AbstractMovieAutoCompleter.

public static IDeque<String> complete(String term)

Given a search term (i.e., words separated by spaces), returns an IDeque of movie titles that are possible completions of the search term. Make sure there are no duplicate values in the result.

The general algorithm here is to check if the search term is a prefix of any of the entries in the map. If so, it is a valid completion. Note that “This!” does not match “this” because symbols are significant.

# Running the AutoCompleter

If all your D tests pass, you should be able to run the “Movie Autocomplete” target of the project. A browser window should automatically open with the movie finder interface. The algorithm you just wrote backs the search for completions. We recommend trying a query like “the “ which we expect to have lots of results. You will likely notice that the interface is noticably slow! This is all because of the data structure we chose. In the next part of the project, we will write a new data structure, called a trie, which is more suited to this type of application.

# Trie Requirements

A Trie is a dictionary which maps “strings” to some type. Unlike other trees that we’ve seen previously, each node only contains a single “letter”. For example, here is a trie that represents the words {adam, add, app, bad, bag, bags, beds, bee, cab}:

You should be familiar with using a HashMap and a TreeMap at this point. So, we’ll start with a comparison to those.

## Comparing ITrieMap to HashMap and TreeMap

It helps to compare it with dictionaries you’ve already seen: HashMap and TreeMap. Each of these types of maps takes two generic parameters K (for the “key” type) and V (for the “value” type). It is important to understand that Tries are NOT a general purpose data structure. There is an extra restriction on the “key” type; namely, it must be made up of characters (it’s tempting to think of the Character type here, but really, we mean any alphabet–chars, alphabetic letters, bytes, etc.). Example of types that Tries are good for include: String, byte[], List<E>. Examples of types that Tries cannot be used for include int and Dictionary<E>.

In our implementation of Tries, we encode this restriction in the generic type parameters:

• A: An “alphabet type”. For a String, it would be Character. For a byte[], it would be Byte.
• K: A “key type”. We insist that all the “key types” extend Iterable<A> which encodes exactly the restriction that there is an underlying alphabet.
• V: A “value type”. There are no special restrictions on this type.

Because the key type is iterable, there is a clean way to take apart keys (i.e., you can use a foreach loop), but we also need a way to put keys back together. To do this, we use Java’s anonymous functions.

For our purposes, what this means is that the constructor will look like this:

public ITrieMap(Function<IDeque<A>, K> collector) {
...
}


And, eventually, when we create a trie to use for autocompleting, the instantiation will look like this:

private static ITrieMap<String, IDeque<String>, IDeque<String>> titles =
new TrieMap<>((IDeque<String> s) -> s);


## TrieMap

A TrieMap is an implementation of a trie where the “pointers” are made up of a HashMap. An array would work as well, but you should think about why that might not be a good idea if the alphabet size is large (like 5 or more). Consider the following ITrieMap:

We could manually construct this as a TrieMap as follows:

this.root = new TrieNode();
this.root.pointers.put('a', new TrieNode("hello"));
this.root.pointers.get('a').pointers.put('d', new TrieNode());
this.root.pointers.get('a').pointers.get('d').pointers.put('d', new TrieNode("1 + 1"));
this.root.pointers.get('a').pointers.put('p', new TrieNode());
this.root.pointers.get('a').pointers.get('p').pointers.put('p', new TrieNode("for a phone"));


Notice that the pointers variables in each of the nodes are just standard HashMaps!

You will implement all the standard IDictionary operations. For each of these, read the comments for IDictionary for the particulars. There are two methods (getCompletions and findPrefix) which are special for ITrieMaps and have additional information/restrictions:

• isPrefix(K k) should return true iff k is a prefix of some key in the trie. For example, if “add” were a key in the trie, then:
isPrefix(") = isPrefix("a”) = isPrefix("ad”) = isPrefix("add”) = true.
This method is arguably one of the major reasons to use a ITrieMap over another implementation of Dictionary. (We saw a similar trade-off between HashMap (faster) and TreeMap (ordered) a few weeks ago.) Unlike in a normal IDictionary, it is possible (and in fact, easy) to implement this method.

• getCompletions(K prefix) should return an IDeque filled with the values of all descendants of the prefix. For example, in the example above:
getCompletions("") = getCompletions("a") = ["hello", "1 + 1", "for a phone"]

Your implementations of get, put, isPrefix, and containsKey must have time complexity $$\Theta(d)$$ where $$d$$ is the number of letters in the key argument of these methods. These methods work on the entire key (the whole “string” of “letters”); make sure to only remove/add/find the exact key asked for. You will also have to implement other methods like keys and values, but the runtime of these methods may be worse. Notably, we are not implementing the remove method right now–it’s not needed for the application; so, we’ll implement it later in the A tests.

# TrieMovieAutoCompleter Requirements

Next, we’ll implement an autocompleter that uses our trie. The specification is identical to the D part, except the main dictionary will be a trie which maps suffixes to IDeques of titles instead of a HashMap that goes in the opposite direction. Furthermore, you will find the getCompletions method you wrote to be very useful when implementing complete().

After you’ve implemented this, go to Main.java and swap out line 28. You should see a significant speed-up from the HashMap version!

# TrieMapremove()

Finally, to wrap things up, go ahead and implement the remove method for your trie. remove(K k) should delete k from the trie as well as all of the nodes that become unnecessary. One implementation of remove (called lazy deletion) would be to find k in the map and set its value to null (since null is not a valid value in the map). You may not implement remove as lazy deletion. Instead, you must ensure that all leaves of your trie have values.