CS 2 (Winter 2024) Lab 06: Memoization

This lab focuses on using memoization (e.g., saving values to avoid repeated computation) to speed up code.

Setup

Register for the lab using grinch: https://grinch.caltech.edu/register. For this lab, you will need to clone the repository for some starter code. The button at the bottom of this page will automatically submit your answers, and gitlab will check them for you.

Manhattan Walks

Consider the following problem:

A Manhattan Walk from \((0, 0)\) to \((n, m)\) is a series of moves starting at \((n, m)\) with \(m\) Norths and \(n\) Easts that eventually reaches \((0, 0)\). There are many such Manhattan Walks (differentiated by a different order of Norths and Easts). The following programs both compute how many Manhattan Walks there are from \((n, m)\) to \((0, 0)\).


def walks(n, m):
   if n < 0 or m < 0:
      return 0
   if n == 0 and m == 0:
      return 1
   return walks(n - 1, m) + walks(n, m - 1)
walks_answers = {}
def walks_memo(n, m):
   global walks_answers
   if n < 0 or m < 0:
      return 0
   if n == 0 and m == 0:
       return 1
   if (n, m) in walks_answers:
      return walks_answers[(n, m)]
   walks_answers[(n, m)] = walks_memo(n - 1, m) + walks_memo(n, m - 1)
   return walks_answers[(n, m)]

The program on the left naively uses recursion to calculate the answer. Unfortunately, it repeats many of the same calls over and over.

For example, walks(2, 2) calls walks(1, 1) twice. (Once from walks(2, 2) \(\mapsto\) walks(1, 2) \(\mapsto\) walks(1, 1) and again from walks(2, 2) \(\mapsto\) walks(2, 1) \(\mapsto\) walks(1, 1).)

The program on the right “memoizes” the results, which is a fancy way of saying it saves answers to calls its already seen. First, it initializes a global map from arguments to answers. Then, before computing the recursive calls from scratch, it checks if they were already computed and in the map. Finally, before returning a new answer, it saves the result into that map.

static in Java

As we’ve discussed previously, static in Java is a way of indicating that a field “transcends” any particular instance. To put it another way, the field belongs to the blueprint of the class itself rather than to any particular instance created with the new keyword. This allows us to make “functions” and global variables that we can use over multiple iterations of calls or instantiations of classes. For today’s purposes, we will use static variables to count and collect things across multiple calls to the same “function”.

Counting the Number of Method Calls

We would like to begin by figuring out just how many repeated calls are made to walks when calculating a single input. To do this, we’ll write some Java code to automate tracing through the code and finding the right numbers.

Counting the Number of Unique Method Calls

This time, we will try to count the number of unique method calls made when walks(n, n) is called for various n’s.

Java has a (relatively) recent feature called record classes which allows you to write a whole class (including reasonable toString, hashCode, compareTo, etc. methods) without explicitly writing them yourself. In this lab, we will need to make use of a simple record class called Pair which represents a tuple of two items. You can declare this record class with the following line in your code:

        public record Pair<K, V>(K key, V value) {}

Calculating the Results (without memoization)

Finally, we will actually calculate the results.

Calculating the Results (with memoization)

Now, attempt to calculate walks(20, 20) with your existing code. By our calculations, this call would take around 2.5 hours to finish on Adam’s new 2024 MacBook Pro. Let’s see if we can speed this up using memoization. That is, using a hash table to save the results of any previous calculations to avoid spending time re-calculating the same thing over and over.

Memoization and Recurrence Relations

So when is memoization useful? As we saw in the earlier parts, there is a recurring pattern of recursive calls. At each step, we make calls by adjusting one or more of the arguments in a step-wise function, approaching the base case through various paths. This pattern is called a recurrence relation, defined in terms of a few variables. The recurrence relation in the above problems can be phrased as: f(n,m) = f(n-1, m) + f(n, m-1).

In the next portion, you will work on defining a recurrence relation.

Longest Common Subsequence

Now that you’ve seen how memoization works, let’s move onto a more difficult problem:

The longest common subsequence (LCS) of two strings is the longest sequence of characters common to both sequences. The characters in the subsequence must be in order, but do not have to be consecutive in either string (i.e., there can be skipped characters in the middle).


Let lcs(n, m) be the length of the longest common subsequence of the first \(n\) characters of \(s_1\) and the first \(m\) characters of \(s_2\).

Consider the following “greedy” algorithm that attempts to compute lcs(n, m):

def badlcs(s1, s2):
    # If there's no string left...
    if len(s1) == 0 or len(s2) == 0:
        return 0
    # If the first characters are a match...
    elif s1[0] == s2[0]:
        return 1 + lcs(s1[1:], s2[1:])
    # Reduce the problem since we didn't find a match...
    else:
        return lcs(s1[1:], s2)

The reason badlcs doesn’t work is that it attempts to always remove from the left. There is, however, a correct algorithm that always chooses the best path rather than assuming which one it will be. Think through what that algorithm might look like and fill in the blanks below to complete it.

Submit Responses For Credit