CS 2 (Winter 2024) Project 01: Ciphers

This project serves as a transition between CS 1 and CS 2. It will focus on helping you learn Java, use built-in data structures, and object-oriented programming.

Projects and Labs in CS 2

For projects in CS 2, you will sometimes be allowed to work in groups of two. This project, however, is solo. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. The projects are intended to be more difficult than the labs; they test how much of the material that lecture and lab introduced that you have actually learned. You will notice that labs are much more straight-forward about how to implement things, whereas projects will have a much larger design component where you have to decide how to set things up. We will ramp up slowly on this; so, for this project, we will still give you a lot of guidance. However, if you get stuck for more than 30 minutes on a bug, you should come to office hours. We want this experience to be an enjoyable one, and nobody really enjoys debugging for hours on end.

Setup

If you haven’t already, do the setup. Otherwise, just register for the project using grinch: https://grinch.caltech.edu/register and clone the repository as explained in the setup instructions.

From CS 1 to CS 2

Caesar Ciphers

A cipher is an algorithm for encrypting or decrypting text. Generally, we call the unencrypted text “plain text” and encrypted version “cipher text”.

Here is an example of text that uses a Caesar cipher (which you will implement in this project):

encdec

Notably, the original text is readable, but the encrypted version is not.

A Caesar Cipher is computed by “shifting” each letter of the text by a constant number, \(k\), as if they were in a circle. For example, the following “ring” represents a mapping for a Caesar Cipher where \(k = 4\):

circle

Caesar Cipher Creator (D Tests)

In this part, you will be editing the src/CaesarCipher.java file which has empty methods for you to fill in. As we will be providing you with Python code for these methods, you might find the handout that compares Python and Java syntax on the course website to be helpful.

Implementing the Class

Ultimately, the goal for this part is to take in a piece of text (the “plain text”) and a number of letters to rotate by (the “key”), and output the encrypted text (the “cipher text”). You should begin by reading the main method, which we have written for you, to get a sense of how Java can read from and write to the console.

In Python, there is only one way to represent Strings of length one; however, in Java, there are two. First, we can represent them as a string like in Python. Second, they constitute a separate type called char (or Character). The reason for this distinction is that, because we have types in Java, it is useful to be able to specify that we’re working with a single letter. Unlike in Python, Strings in Java must use double quotes and chars must use single quotes.

The first function you should implement is findIndexInAlphabet. Note that these purple boxes in the guide always indicate a function you will have to write.

public static int findIndexInAlphabet(char c)

You should loop through the valid indices of ALPHABET and return the index equal to the input character c if one exists. If no such index exists, you should return -1.


In Python, this method would approximately look like:

def findIndexInAlphabet(c):
    for i in range(len(ALPHABET)):
        if ALPHABET[i] == c:
            return i
    return -1


In Python, you can’t have two methods named the same thing, even if they have different arguments; however, in Java, this is allowed and called method overloading. Next, you will implement two versions of the rot method: one for if we want to rotate a character and another for rotating a String. Second, you should implement the char version.

public static char rot(char c, int amount)

You should convert the input character into a number. Then, add the rotation amount to that number, making sure to wrap the number around so it is between 0 and 25. Then, convert it back to a character.


In Python, this method would approximately look like:

def rot(c, n):
    idx = findIndexInAlphabet(c)
    newIdx = (idx + n) % len(ALPHABET)
    return ALPHABET[newIdx]


Finally, you will write the String version which gets called directly from main.

public static String rot(String line, int amount)

You should rotate each character by the rotation amount and return a new string with all of the new characters.


In Python, this method would approximately look like:

def rot(line, n):
    output = ""
    for c in line:
        output += rot(c, n)
    return output

Running the Tests

Those three functions are the only code you have to write for the CaesarCipher class. The next step is testing the code. Normally, you would test it by running main which you are welcome to do, but we’ve provided you with our tests which do this automatically. On the top right of the IntelliJ interface is a green “play” button which will run the tests whenever the “configuration” directly to the left is set to a “Tests” option. If you make sure it says “D Tests” and hit the play button, the JUnit testing interface should pop up on the bottom.

Caesar Cipher Solver (C Tests)

In the previous part, we encrypted a piece of text using a Caesar Cipher. In this part, we will go in the opposite direction: that is, we will decrypt the cipher text back into the plain text. For a Caesar cipher, this means we need to rotate every letter by \(26 - k\), where \(k\) is the key used to encrypt the text originally. Since there are only \(26\) possible values of \(k\), we can just try all of them and see which one “works best”; this technique is called brute forcing a solution.

For this part, you may not use any fancy String or Collection methods that do all the work for you. (In fact, the automated tests will catch if you try to do this and fail.)

Our general strategy for this part (which you will implement in main) is the following:

I don’t understand the difference between == and .equals()?

When dealing with primitives (e.g. int, char, any type that starts with a lowercase letter), you can freely use == to compare their values. However, with objects in Java (e.g. Strings, any type starting with uppercase letters), the ‘==’ only compares the addresses in memory, not the values themselves, which can lead to faulty comparisons.

Instead, objects like Strings have a built in method ‘equals’ that you can use for value comparison. For example, for strings str1 and str2, you would want to write str1.equals(str2) if you want to check their equality, not str1 == str2.

My splitBySpaces method is failing on the first test case, but passes the rest?

The first test passes in the empty string “”, and wants you to return a list containing the empty string as the single value. Make sure that your code handles this case accordingly.

As an aside, if you use string comparison to handle this case, be sure to use s.equals(t) instead of s == t, as strings are Objects and must be compared with .equals().

Substitution Cipher Solver (B Tests)

For the last part of this project, we will write a solver for a more complicated cipher in which the mapping between source letters and target letters can be more than just a shift (like in a Caesar Cipher). In particular, we will look at “simple monoalphabetic substitution ciphers” in which the letters are shuffled to create a substitution alphabet and each letter in the source text is replaced with its corresponding letter in the substitution alphabet. For example:

a b c d e f g h i j k l m n o p q r s t u v w x y z
q r z d f e y j h i n m l k o p a b s t u v w x g c

Unlike with Caesar Ciphers, where we can just try every possible key, there are far too many possibilities for substitution ciphers to try them all. Instead, we’ll we’ll use a probabilistic method to only try keys that are likely to be better. To make it even more interesting, we will remove the spaces from the cipher texts for this part as well.

Setting Up A Substitution Cipher

Most of the work in this part will be in implementing a SubstitutionCipher class which will represent a single instance of a substitution cipher key and the corresponding cipher text to decrypt. Your class should store exactly these two things as fields. We represent a key with a Map from Character to Character and the cipher text as just a String. You are now ready to implement the following methods.

public SubstitutionCipher(String ciphertext, Map<Character, Character> key)

Initializes a SubstitutionCipher to use the provided cipher text and key.

public String getCipherText()

Returns the cipher text for this SubstitutionCipher.

public String getPlainText()

Applies the current key to the current cipher text to get a proposed plain text to return.

public SubstitutionCipher randomSwap()

Returns a new SubstitutionCipher with the same cipher text as this one, but with a modified key such that exactly one pair of keys have exchanged their values.


Note that you must use the Random RANDOM defined at the top of the class rather than creating a new one for our tests to work correctly. Your algorithm should start by generating two characters to swap using RANDOM. Make sure to handle the case where both generated characters are the same by repeatedly generating a second new character until they are different. This method should not modify the original cipher.

public SubstitutionCipher(String ciphertext)

Initializes a SubstitutionCipher to use the provided cipher text and a random key.


The random key should be generated by first creating a SubstitutionCipher that uses the “identity” key (everything maps to itself), and then using randomSwap \(10000\) times to generate a new key to be used.

“Scoring” Phrases and Quadgrams

Ultimately, we will transform an initial random key into keys that fit the text better and better. To figure out if a candidate key is better, we need a way of comparing the “english-ness” of two phrases. We call this idea scoring. To get a score for a phrase, we will split it into rolling chunks of four letters and evaluate their probabilities separately. (Formally, we are making the assumption that each of these chunks is independent, which is not true in general, but is good enough for our purposes.) For example, if we were working with the phrase DOGGIES, we would split it into the chunks: DOGG, OGGI, GGIE, GIES and get the probabilities of each of these. Each of these four letter pieces is called a quadgram.

Computationally, dealing with very small numbers is difficult; so, we will deal with “log probabilities” (also called “log likelihoods”) instead. We have provided you with a class QuadGramLikelihoods which reads in a file of probabilities and provides its user with log likelihoods for all quadgrams. You do not need to understand how it works, but, you are, of course, welcome to read the code. We also used this class in lecture; so, you could look back to those notes.

public double getScore(QuadGramLikelihoods likelihoods)

Returns the sum of the log likelihoods (which you can get from the parameter likelihoods) of each consecutive quadgram in the plain text. Limit the usage of getPlainText if your implementation is too slow.

Finding a “Good” Solution

To find a “good solution”, we will put together all the previous pieces into an algorithm called hill climbing which successively improves a random solution. You should implement it as follows.

public SubstitutionCipher getSolution(QuadGramLikelihoods likelihoods)

We have implemented a client for your SubstitutionCipher that reads a cipher from the console and uses your class to decrypt it.

My SubstitutionCipherSolverTests are running forever?

One potential cause of this could be in your getScore method. Note, above we say:

Avoid using getPlainText more than once if your implementation is too slow.

Running getPlainText many times (e.g. within a loop) within getScore may result in slowdown because you are recomputing the same value over and over. If you are doing this, try to rewrite your code such that getPlainText only gets called once.

My SubstitutionCipherSolverTests are failing, but everything else passes?

This could be because of several things, but one thing to check is that when you are calling your randomSwap, be sure to call it on the cipher you want. Calling it on this, i.e. by writing “randomSwap()” or “this.randomSwap()”, will perform a random swap on the current cipher (which under the provided algorithm for getSolution in the spec is never used). Instead, write “curr.randomSwap()” to call it on a cipher curr.

Another potential issue could be that you are not resetting your counter after you make a replacement. Recall from the spec that the condition for continuing is that fewer than 1000 trials in a row have not resulted in a replacement, so when a replacement occurs, it breaks the streak. Be sure your code reflects that.

I am getting an UnsupportedOperatorException and I don’t understand why?

One cause of this could be that you are attempting to modify the immutable key field of SubstitutionCipher. The key field, aka “this.key”, is immutable because it is part of the SubstitutionCipher object. Thus, if you attempt to change its keys or values, it will throw the UnsupportedOperatorException error.

To avoid this error, if you need to modify keys, consider making a copy of the this.keys hashmap with the following syntax:

Map newKey = new HashMap<>(this.key)

newKey will now hold all the same keys and values of this.key, but you can modify it freely.

My random cipher constructor tests are failing?

One thing this could be is that you are not calling randomSwap enough times. While writing your second SubstitutionCipher constructor, please double check that you are calling randomSwap() 10,000 times, as per the spec.

The idea here is to start with an identity key, meaning that every letter maps to itself. When we call randomSwap() only once, we only switch one of the mappings of the letters, meaning that 24 of the letters still map to themselves. We want to call randomSwap() 10,000 times to create a unique key where we ensure randomness and key complexity and so that we can randomly initialize a key for the cipher text.

My randomSwap tests are failing? I’m using this.key.keySet()

When you use this.key.keySet() within your code to generate the list of characters to randomize from, this will cause nondeterministic behavior in our tests. This is because the call this.key.keySet() will make a set out of the keys, however, sets are non-ordered and so the list will not be ordered deterministically. Thus, it will fail our tests which set specific random seeds, as it will randomize differently than expected.

Instead of using this.key.keySet() to get the letters to randomize from, consider using ALPHABET from CaesarCipher. You can either copy it over manually, or use by indexing into “CaesarCipher.ALPHABET”.

Decrypting cryptogram.txt (A Tests)

Committing and Pushing to the Repository

A “commit” is version control language for a chunk of changes that mean something when grouped together. Importantly, commits will only stay locally on your machine until you push them to the gitlab server.

Checking the Results on gitlab

When you go to https://gitlab.caltech.edu, you should see your project01 repository listed. Click on it. All the way on the right, there should be a red “X”, an orange “pause” symbol, a blue “time” symbol, or a green “check” symbol. This symbol indicates the status of the automated tests which will check the correctness of your code. Click on it. On the new page, you should see a table, and under “status”, there should be another symbol. Click on it. The new page will list the “categories” of tests for this repository. These will be some combination of “A”, “B”, “C”, and “D”. These letters indicate what grade you earned on the assessment for passing those tests. If you click on one of them, it will detail for you which sub-tests you passed and failed which might help you debug.

Note that the “A Tests” for project01 are only on gitlab. So, you must check there to make sure you are passing them. It is good practice to always check gitlab after submitting a project or lab in this course.

My A tests are failing but my output looks correct?

If you notice that you are outputting what seems to be correct English text into your plaintext file but the A tests are failing on git, one thing to check is that you are reading every line of the input. If you only read the first line, you will not pass the tests because you do not have the complete output, even though it will look correct (assuming everything is correct in SubstitutionCipher).

Thus, you should take advantage of the .hasNextLine() and .nextLine() methods of the Scanner to make sure that you read the entire input in your function.

Another thing that could cause this behavior is decoding each line of the file by itself. The whole cryptogram file is encrypted with a single key mapping, so the key should be the same for all lines–if you get different keys for different lines, then your decoded text will almost certainly be incorrect in some places. Using all of the encrypted text when searching for the correct key ensures that your likelihood sum calculation is accounting for the likelihoods of all of the words in the cryptogram (versus just those in one line), which makes it more likely that you find the right key as well.

Submit OpenEndeds For Credit