Projects and Labs in CS 2

For projects in CS 2, you will sometimes be allowed to work in groups of two. This project, however, is solo. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. The projects are intended to be more difficult than the labs; they test how much of the material that lecture and lab introduced that you have actually learned. You will notice that labs are much more straight-forward about how to implement things, whereas projects will have a much larger design component where you have to decide how to set things up. We will ramp up slowly on this; so, for this project, we will still give you a lot of guidance. However, if you get stuck for more than 30 minutes on a bug, you should come to office hours. We want this experience to be an enjoyable one, and nobody really enjoys debugging for hours on end.

Setup

If you haven’t already, do the setup. Otherwise, just register for the project using grinch: https://grinch.caltech.edu/register and clone the repository as explained in the setup instructions.

From CS 1 to CS 2

Please note that there is quite a jump in complexity between CS 1 and CS 2, and, while the beginning of this project will resemble CS 1, the end will more closely resemble what CS 2 will look like.

Caesar Ciphers

A cipher is an algorithm for encrypting or decrypting text. Generally, we call the unencrypted text “plain text” and encrypted version “cipher text”.

Here is an example of text that uses a Caesar cipher (which you will implement in this project):

encdec

Notably, the original text is readable, but the encrypted version is not.

A Caesar Cipher is computed by “shifting” each letter of the text by a constant number, \(k\), as if they were in a circle. For example, the following “ring” represents a mapping for a Caesar Cipher where \(k = 4\):

circle

Caesar Cipher Creator (D Tests)

In this part, you will be editing the src/CaesarCipher.java file which has empty methods for you to fill in. As we will be providing you with Python code for these methods, you might find the handout that compares Python and Java syntax on the course website to be helpful.

Implementing the Class

Ultimately, the goal for this part is to take in a piece of text (the “plain text”) and a number of letters to rotate by (the “key”), and output the encrypted text (the “cipher text”). You should begin by reading the main method, which we have written for you, to get a sense of how Java can read from and write to the console.

Task 1. Seriously, read the main method. You will need to understand Scanners for later in the project.

In Python, there is only one way to represent Strings of length one; however, in Java, there are two. First, we can represent them as a string like in Python. Second, they constitute a separate type called char (or Character). The reason for this distinction is that, because we have types in Java, it is useful to be able to specify that we’re working with a single letter. Unlike in Python, Strings in Java must use double quotes and chars must use single quotes.

The first function you should implement is findIndexInAlphabet. Note that these purple boxes in the specification always indicate a function you will have to write.

public static int findIndexInAlphabet(char c)

You should loop through the valid indices of ALPHABET and return the index equal to the input character c if one exists. If no such index exists, you should return -1.

In Python, this method would approximately look like:

def findIndexInAlphabet(c):
    for i in range(len(ALPHABET)):
        if ALPHABET[i] == c:
            return i
    return -1

In Python, you can’t have two methods named the same thing, even if they have different arguments; however, in Java, this is allowed and called method overloading. Next, you will implement two versions of the rot method: one for if we want to rotate a character and another for rotating a String. Second, you should implement the char version.

public static char rot(char c, int amount)

You should convert the input character into a number. Then, add the rotation amount to that number, making sure to wrap the number around so it is between 0 and 25. Then, convert it back to a character.

In Python, this method would approximately look like:

def rot(c, n):
    idx = findIndexInAlphabet(c)
    newIdx = (idx + n) % len(ALPHABET)
    return ALPHABET[newIdx]

Finally, you will write the String version which gets called directly from main.

public static String rot(String line, int amount)

You should rotate each character by the rotation amount and return a new string with all of the new characters.

In Python, this method would approximately look like:

def rot(line, n):
    output = ""
    for c in line:
        output += rot(c, n)
    return output

Running the Tests

Those three functions are the only code you have to write for the CaesarCipher class. The next step is testing the code. Normally, you would test it by running main which you are welcome to do, but we’ve provided you with our tests which do this automatically. On the top right of the IntelliJ interface is a green “play” button which will run the tests whenever the “configuration” directly to the left is set to a “Tests” option. If you make sure it says “D Tests” and hit the play button, the JUnit testing interface should pop up on the bottom.

Task 2. Implement findIndexInAlphabet and the two variants of rot. Then, run the D Tests as described above. If you correctly implemented everything so far, the D tests will pass with a green checkmark.

Caesar Cipher Solver (C Tests)

In the previous part, we encrypted a piece of text using a Caesar Cipher. In this part, we will go in the opposite direction: that is, we will decrypt the cipher text back into the plain text. For a Caesar cipher, this means we need to rotate every letter by \(26 - k\), where \(k\) is the key used to encrypt the text originally. Since there are only \(26\) possible values of \(k\), we can just try all of them and see which one “works best”; this technique is called brute forcing a solution.

In the previous part, you may have noticed that a lot of the information that was in this specification was also in comments above the individual methods. Rather than repeating the exact same thing in the specification and the comments, we will often split the information across the two sources. Usually, the high-level requirements and hints will be in the specification and the details and examples will be in the code itself. We recognize that you are not used to this, but, this is the way the real world works: there is no one source of information in real projects.

For this part, you may not use any fancy String or Collection methods that do all the work for you. (In fact, the automated tests will catch if you try to do this and fail.)

Our general strategy for this part (which you will implement in main) is the following:

Create a Scanner for the console
Print out “Type a sentence to decrypt: “ (with no newline)
Get a line of input from the user
Break up the line into a list of words
For each possible rotation of the letters:
If more than half of the words in the sentence are also in the dictionary
- Print out the sentence with rotated letters

Task 3. In src/CaesarCipherSolver.java, we have provided you with a required “method decomposition” for this program. We strongly recommend you implement the methods in the order they appear in the file. The header comments contain specifications for each of the methods we want you to write.

Don’t forget to run the C Tests when you’re done.

Substitution Cipher Solver (B Tests)

For the last part of this project, we will write a solver for a more complicated cipher in which the mapping between source letters and target letters can be more than just a shift (like in a Caesar Cipher). In particular, we will look at “simple monoalphabetic substitution ciphers” in which the letters are shuffled to create a substitution alphabet and each letter in the source text is replaced with its corresponding letter in the substitution alphabet. For example:

↕

Unlike with Caesar Ciphers, where we can just try every possible key, there are far too many possibilities for substitution ciphers to try them all. Instead, we’ll we’ll use a probabilistic method to only try keys that are likely to be better. To make it even more interesting, we will remove the spaces from the cipher texts for this part as well.

Setting Up A Substitution Cipher

Most of the work in this part will be in implementing a SubstitutionCipher class which will represent a single instance of a substitution cipher key and the corresponding cipher text to decrypt. Your class should store exactly these two things as fields. We represent a key with a Map from Character to Character and the cipher text as just a String. You are now ready to implement the following methods.

public SubstitutionCipher(String ciphertext, Map<Character, Character> key)

Initializes a SubstitutionCipher to use the provided cipher text and key.

public String getCipherText()

Returns the cipher text for this SubstitutionCipher.

public String getPlainText()

Applies the current key to the current cipher text to get a proposed plain text to return.

public SubstitutionCipher randomSwap()

Returns a new SubstitutionCipher with the same cipher text as this one, but with a modified key such that exactly one pair of keys have exchanged their values.

Note that you must use the Random RANDOM defined at the top of the class rather than creating a new one for our tests to work correctly. Your algorithm should start by generating two characters to swap using RANDOM. Make sure to handle the case where both generated characters are the same by repeatedly generating a second new character until they are different. This method should not modify the original cipher.

public SubstitutionCipher(String ciphertext)

Initializes a SubstitutionCipher to use the provided cipher text and a random key.

The random key should be generated by first creating a SubstitutionCipher that uses the “identity” key (everything maps to itself), and then using randomSwap \(10000\) times to generate a new key to be used.

“Scoring” Phrases and Quadgrams

Ultimately, we will transform an initial random key into keys that fit the text better and better. To figure out if a candidate key is better, we need a way of comparing the “english-ness” of two phrases. We call this idea scoring. To get a score for a phrase, we will split it into rolling chunks of four letters and evaluate their probabilities separately. (Formally, we are making the assumption that each of these chunks is independent, which is not true in general, but is good enough for our purposes.) For example, if we were working with the phrase DOGGIES, we would split it into the chunks: DOGG, OGGI, GGIE, GIES and get the probabilities of each of these. Each of these four letter pieces is called a quadgram.

Computationally, dealing with very small numbers is difficult; so, we will deal with “log probabilities” (also called “log likelihoods”) instead. We have provided you with a class QuadGramLikelihoods which reads in a file of probabilities and provides its user with log likelihoods for all quadgrams. You do not need to understand how it works, but, you are, of course, welcome to read the code. We also used this class in lecture; so, you could look back to those notes.

public double getScore(QuadGramLikelihoods likelihoods)

Returns the sum of the log likelihoods (which you can get from the parameter likelihoods) of each consecutive quadgram in the cipher text. Avoid using getPlainText if your implementation is too slow.

Finding a “Good” Solution

To find a “good solution”, we will put together all the previous pieces into an algorithm called hill climbing which successively improves a random solution. You should implement it as follows.

public SubstitutionCipher getSolution(QuadGramLikelihoods likelihoods)

Create a cipher with this cipher’s ciphertext and a random key \(C\)
While fewer than 1000 trials in a row have not resulted in a replacement:
- Randomly swap two letters in \(C\) to make a new key \(M\)
- If \(M\) scores better than \(C\):
  - Replace \(C\) with \(M\)

We have implemented a client for your SubstitutionCipher that reads a cipher from the console and uses your class to decrypt it.

Task 4. Implement all the methods, as explained above, in the SubstitutionCipher class. Then, run the B Tests to check your work.

Decrypting `cryptogram.txt` (A Tests)

Task 5. Finally, you can use all your hard work to decrypt the secret message hidden in the file cryptogram.txt. Write a program to remove all non-upper-case characters (including spaces and newlines) from cryptogram.txt, run SubstitutionCipherSolver on the result, and put the final decrypted cryptogram in plaintext.txt. You do not need to submit this extra program; just submit the plaintext.txt file.

Please note that the cryptogram is multiple lines.

Committing and Pushing to the Repository

A “commit” is version control language for a chunk of changes that mean something when grouped together. Importantly, commits will only stay locally on your machine until you push them to the gitlab server.

Task 6. To commit to the repository, click the green checkmark on the top-right of IntelliJ’s interface. A window will pop up and ask you to type in a message. Choose some meaningful description of your feature, then click the little down arrow next to the “Commit” button on the bottom right of the dialog.
Choose “Commit and Push” in the little menu that pops up. If IntelliJ asks you if you’re sure you want to commit and push, say yes.

Checking the Results on `gitlab`

When you go to https://gitlab.caltech.edu, you should see your project01 repository listed. Click on it. All the way on the right, there should be a red “X”, an orange “pause” symbol, a blue “time” symbol, or a green “check” symbol. This symbol indicates the status of the automated tests which will check the correctness of your code. Click on it. On the new page, you should see a table, and under “status”, there should be another symbol. Click on it. The new page will list the “categories” of tests for this repository. These will be some combination of “A”, “B”, “C”, and “D”. These letters indicate what grade you earned on the assessment for passing those tests. If you click on one of them, it will detail for you which sub-tests you passed and failed which might help you debug.

Note that the “A Tests” for project01 are only on gitlab. So, you must check there to make sure you are passing them. It is good practice to always check gitlab after submitting a project or lab in this course.

Task 7. Check to make sure you are passing all the tests on gitlab. If you are, you’re done!

Projects and Labs in CS 2

Setup

From CS 1 to CS 2

Caesar Ciphers

Caesar Cipher Creator (D Tests)

Implementing the Class

Running the Tests

Caesar Cipher Solver (C Tests)

Substitution Cipher Solver (B Tests)

Setting Up A Substitution Cipher

“Scoring” Phrases and Quadgrams

Finding a “Good” Solution

Decrypting cryptogram.txt (A Tests)

Committing and Pushing to the Repository

Checking the Results on gitlab

Decrypting `cryptogram.txt` (A Tests)

Checking the Results on `gitlab`