** **
This will be the last solo project in CS 2. The reason this project is solo is because the material in this project is the most important part of CS 2.
You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. Dean’s Tutors **may not** read your code.

# Goals and Outcomes

In this project, you will build a Markov Model using various data structures.

By the end of this project, you will have…

- implemented a hashtable and a binary search tree
- written a word suggester which predicts what the next word might be given some number of previous words

# Connection to Lecture

This project covers lecture08 (trees) and lecture12/lecture13 (hash tables). You will likely find the code from these lectures to be very useful in this project.

# Overview and Implementation Strategy

This week, the test “grade-levels” are a little weird. Instead of having what would normally be C-B-A, we have B-A. The B tests are incredibly important to the learning outcomes of the course which is why they’re rated “B” instead of “C”.

In previous projects, you wrote several “list-like” data structures (the `IDeque`

s) and one `IDictionary`

implementation (the trie).
This time, you will continue our mission of implementing all the data structures ourselves by
writing several more `IDictionary`

implementations. This time, we will use the data structures to back Markov text generation, which
“mimics” the style of a provided corpus.

** **
Just to be super clear: **you may not use ANY data structures in java.util in this project**. The whole point is to implement
your own versions of the fundamental data structures.

# Looking Back to Move Forward

** **
This time, you should copy forward at least one `IDeque`

implementation and your `TrieMap`

(if it passed the C tests or better in project04).
You should place these data structures in the `edu.caltech.cs2.datastructures`

package.

#
`NGram`

: A client of your data structures

Data structures store…data. Different data structures require the presence of different operations.

- A dictionary made out of a list requires
**equality**testing on the key type. - A dictionary made out of a hash table requires
**equality**testing*and***hashcode calculation**on the key type. - A dictionary made out of a tree requires
**equality**testing*and***comparison**on the key type.

In Java, we implement these operations directly in the class that represents the key type. For this project, we will mostly use the `NGram`

type. An `NGram`

is a list of \(n\) words appearing in order in a text.
Before implementing your data structures, it is imperative that equality, hashcode, and comparisons all work. Otherwise, you will run into weird
errors with the tests that are not due to your data structures.

** Task 0.**
Implement the `equals`

method in `NGram`

. Make sure to compare the sizes of the `NGram`

s before bothering to loop through them.
If you look in `IterableString`

, you can see an example of a working `equals`

method. Note that, unlike the `equals`

method
in `IterableString`

, you can’t just delegate to the internal data structure (because your `IQueue`

implementations *also* don’t have an `equals`

method
implemented). You may not use `toString`

(nor may you convert to a `String`

manually either!) to implement `equals`

; we expect you to build it from scratch.

Your `equals`

function must not modify the elements in the `data`

field nor their order.

** Task 1.**
Implement the `hashCode`

method in `NGram`

in the way discussed in class. **DO NOT USE THE pow FUNCTION!** To make this efficient, you might want to use Horner’s Rule. Your

`hashCode`

function must not modify the elements in the `data`

field or their order.**What multiplier should I use for Horner’s method?**

As discussed in class, 31 and 37 are good multiplier to use for Horner’s method because they’re both prime.

**How should I represent each String when computing my hash function?**

Hint: in Java, Strings have a built-in hashCode() function that you can call!

**I’ve done the above, but my hashCode method still fails!**

Our tests require that you multiply the first item in the sequence by 31 (or 37) to the nth power, and so on down to 0. Using the reverse order (e.g., 31^0*a + … 31^n*z, as at the above link) is technically correct, but not currently supported.

** Task 2.**
Implement the `compareTo`

method in `NGram`

. You may not use `toString`

to implement `compareTo`

; we expect you to build it from scratch.
Your `compareTo`

function must not modify the elements in the `data`

field or their order. If the `NGram`

s are equal, we expect an output of 0. If the first argument is shorter, a negative output is expected and vice versa if the second argument is shorter. Otherwise, the result should be the comparison of the first pair of unequal elements.

#
`MoveToFrontDictionary`

: Another Dictionary

In this part, you will implement `MoveToFrontDictionary`

, a new type of `Dictionary`

.

`MoveToFrontDictionary`

is a type of linked list where new items are inserted at the front of the list, and an existing item gets
moved to the front whenever it is referenced (e.g., when `get`

is called). Although it has \(\mathcal{O}(n)\) worst-case time operations, it has a very good amortized analysis.
We will not discuss this data structure in class. `MoveToFrontDictionary`

should **only** rely on *equality* testing of keys.

** **
In the past, students have tried to use doubly-linked lists for this data structure. **DO NOT DO THIS**. It is unnecessary and makes debugging far harder than it needs to be.

** **
Because in `MoveToFrontDictionary`

calls to `get`

change the ordering of the dictionary, your iterator should use `keys()`

instead.

** **
Rather than writing the same logic in multiple methods, we strongly recommend writing `get()`

and calling it in `put()`

and `remove()`

.

** Task 3.**
Implement `MoveToFrontDictionary`

as described above.

#
`ChainingHashDictionary`

: Another Another Dictionary

In this part, you will implement `ChainingHashDictionary`

. Your hash table must use separate chaining–not probing. Furthermore, you must make the type of chain
generic. In particular, you should be able to use *any* dictionary implementation as the type inside the buckets. To do this, you should take in a `Supplier<IDictionary<K, V>>`

in the constructor and call `chain.get()`

whenever you want to create a chain.

**What is this Supplier thing?**

As discussed in class, the buckets of a hash table must use some data structure to deal with collisions. The “Supplier” is a constructor which indicates *which* data
structure to use (e.g., MoveToFrontDictionary, BSTDictionary, etc.).

Your `ChainingHashDictionary`

should rehash as appropriate (use an appropriate load factor as discussed in class), and its capacity should always be a prime number.
Your `ChainingHashDictionary`

should be able to work with the provided corpora which means there shouldn’t be a hard cap on how much it can grow; though, it doesn’t
have to use primes past 500000. Recall that all Hash Tables rely on a reasonable definition of hash function as well as equality testing.

**Should I calculate primes in my constructor?**

You should not calculate primes in your constructor! That would be far too slow. Instead, hard-code a list of primes in a constant at the top of your class!

At some point, you will want to test various types of chains in your `ChainingHashDictionary`

. It is confusing to do this initially; so, we have provided some examples
in the `MarkovTextGenerator`

class.

** Task 4.**
Implement `ChainingHashDictionary`

as described above.

#
`BSTDictionary`

: Another Another Another Dictionary

In this part, you will implement `BSTDictionary`

. `put`

, `get`

, and `containsKey`

should have average \(\mathcal{O}(\lg(n))\) behavior, as discussed in class.

`remove`

is pretty tricky. You should look at the slides we posted that describe how it should work.

`remove`

MUST be written recursively to recieve any credit.

Recall that all binary search trees rely on a reasonable definition of comparison, in Java, this is the `compareTo`

method.

** Task 5.**
Implement `BSTDictionary`

as described above.

#
`NGram`

s and Generating Text

An `NGram`

is a list of \(n\) words appearing in order in a text. They are often used in textual analysis to see how frequent patterns
are. We have written a library which uses one of your `IDictionary`

s to generate text that sounds like the author of *an original text*.
You can check it out by running `MarkovTextGenerator`

. We recommend using the `reddit.corpus`

corpus which can be downloaded
here.