Setup
Register for the lab using grinch
: https://grinch.caltech.edu/register. For this lab, you will need to clone the repository for some starter code. The button
at the bottom of this page will automatically submit your answers, and gitlab will check them for you.
Manhattan Walks
Consider the following problem:
A Manhattan Walk from \((0, 0)\) to \((n, m)\) is a series of moves starting at \((n, m)\) with \(m\) Norths and \(n\) Easts that eventually reaches \((0, 0)\). There are many such Manhattan Walks (differentiated by a different order of Norths and Easts). The following programs both compute how many Manhattan Walks there are from \((n, m)\) to \((0, 0)\).
def walks(n, m):
if n < 0 or m < 0:
return 0
if n == 0 and m == 0:
return 1
return walks(n - 1, m) + walks(n, m - 1)
walks_answers = {}
def walks_memo(n, m):
global walks_answers
if n < 0 or m < 0:
return 0
if n == 0 and m == 0:
return 1
if (n, m) in walks_answers:
return walks_answers[(n, m)]
walks_answers[(n, m)] = walks_memo(n - 1, m) + walks_memo(n, m - 1)
return walks_answers[(n, m)]
The program on the left naively uses recursion to calculate the answer. Unfortunately, it repeats many of the same calls over and over.
For example, walks(2, 2)
calls walks(1, 1)
twice. (Once from walks(2, 2)
\(\mapsto\) walks(1, 2)
\(\mapsto\) walks(1, 1)
and again
from walks(2, 2)
\(\mapsto\) walks(2, 1)
\(\mapsto\) walks(1, 1)
.)
The program on the right “memoizes” the results, which is a fancy way of saying it saves answers to calls its already seen. First, it initializes a global map from arguments to answers. Then, before computing the recursive calls from scratch, it checks if they were already computed and in the map. Finally, before returning a new answer, it saves the result into that map.
static
in Java
As we’ve discussed previously, static
in Java is a way of indicating that a field “transcends” any particular instance. To put it another way, the field belongs to the blueprint of the
class itself rather than to any particular instance created with the new
keyword. This allows us to make “functions” and global variables that we can use over multiple iterations of
calls or instantiations of classes. For today’s purposes, we will use static
variables to count and collect things across multiple calls to the same “function”.
Counting the Number of Method Calls
We would like to begin by figuring out just how many repeated calls are made to walks
when calculating a single input. To do this, we’ll write some Java
code to
automate tracing through the code and finding the right numbers.
Task 0.
Modify Walks.java
in the following three ways:
- Add a new static field to the class to represent the number of recursive calls made by
walks
. - Modify the
walks
method to modify the existing count every time a call towalks
is made (including the original, first call). - Write a
main
method that, for each argument pair in the following table, resets your static field, calls the method, and checks the resulting number.
Counting the Number of Unique Method Calls
This time, we will try to count the number of unique method calls made when walks(n, n)
is called for various n
’s.
Java has a (relatively) recent feature called record
classes which allows you to write a whole class (including reasonable toString
, hashCode
, compareTo
, etc. methods) without explicitly
writing them yourself. In this lab, we will need to make use of a simple record
class called Pair
which represents a tuple of two items. You can declare this record
class with the following
line in your code:
public record Pair<K, V>(K key, V value) {}
Task 1.
Modify Walks.java
in the following three ways:
- Add another new static field to keep track of which arguments have previously been given to a call to
walks
. Since the recursive calls might have different \(n\) and \(m\) values, you should use thePair
class here, as described above. - Modify the
walks
method to modify the set of prior arguments every time a call towalks
is made (including the original, first call). - Write a
main
method that, for each argument pair in the following table, resets your static field, calls the method, and checks the size of the resulting set.
Calculating the Results (without memoization)
Finally, we will actually calculate the results.
Task 2.
Use the existing Walks.java
(which does not use memoization) to fill in the values in the following table,
You should notice that, as the argument values increase, the speed of the function slows down significantly.
Calculating the Results (with memoization)
Now, attempt to calculate walks(20, 20)
with your existing code. By our calculations, this call would take around 2.5 hours to finish on Adam’s new 2024 MacBook Pro.
Let’s see if we can speed this up using memoization. That is, using a hash table to save the results of any previous calculations to avoid spending time re-calculating the
same thing over and over.
Task 3.
Modify Walks.java
in the following three ways:
- Add another new static field to keep track of the answers for each
Pair
provided as argments to a call towalks
. This field must be static. Think about why. - Modify the
walks
method to check the set of prior arguments every time a call towalks
is made before doing any further calculations. If the pair is in your field, just use whatever the previous result was instead of calculating it again. - Modify the
walks
method to add to your result to the static field after calculating it (i.e., right before returning to the user). - Write a
main
method that, for each argument pair in the following table, calls your method, and prints out the resulting answer.
Memoization and Recurrence Relations
So when is memoization useful? As we saw in the earlier parts, there is a recurring pattern of recursive calls. At each step, we make calls by adjusting one or more of the arguments in a step-wise function, approaching the base case through various paths. This pattern is called a recurrence relation, defined in terms of a few variables. The recurrence relation in the above problems can be phrased as: f(n,m) = f(n-1, m) + f(n, m-1)
.
In the next portion, you will work on defining a recurrence relation.
Longest Common Subsequence
Now that you’ve seen how memoization works, let’s move onto a more difficult problem:
The longest common subsequence (LCS) of two strings is the longest sequence of characters common to both sequences. The characters in the subsequence must be in order, but do not have to be consecutive in either string (i.e., there can be skipped characters in the middle).
Let lcs(n, m)
be the length of the longest common subsequence of the first \(n\) characters of \(s_1\) and the first \(m\) characters of \(s_2\).
Consider the following “greedy” algorithm that attempts to compute lcs(n, m)
:
def badlcs(s1, s2):
# If there's no string left...
if len(s1) == 0 or len(s2) == 0:
return 0
# If the first characters are a match...
elif s1[0] == s2[0]:
return 1 + lcs(s1[1:], s2[1:])
# Reduce the problem since we didn't find a match...
else:
return lcs(s1[1:], s2)
The reason badlcs
doesn’t work is that it attempts to always remove from the left. There is, however, a correct algorithm that always chooses the
best path rather than assuming which one it will be. Think through what that algorithm might look like and fill in the blanks below to complete it.