Projects and Labs in CS 2
For projects in CS 2, you will sometimes be allowed to work in groups of two. This project, however, is solo. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. The projects are intended to be more difficult than the labs; they test how much of the material that lecture and lab introduced that you have actually learned. You will notice that labs are much more straight-forward about how to implement things, whereas projects will have a much larger design component where you have to decide how to set things up. We will ramp up slowly on this; so, for this project, we will still give you a lot of guidance. However, if you get stuck for more than 30 minutes on a bug, you should come to office hours. We want this experience to be an enjoyable one, and nobody really enjoys debugging for hours on end.
From CS 1 to CS 2
Please note that there is quite a jump in complexity between CS 1 and CS 2, and, while the beginning of this project will resemble CS 1, the end will more closely resemble what CS 2 will look like.
A cipher is an algorithm for encrypting or decrypting text. Generally, we call the unencrypted text “plain text” and encrypted version “cipher text”.
Here is an example of text that uses a Caesar cipher (which you will implement in this project):
Notably, the original text is readable, but the encrypted version is not.
A Caesar Cipher is computed by “shifting” each letter of the text by a constant number, \(k\), as if they were in a circle. For example, the following “ring” represents a mapping for a Caesar Cipher where \(k = 4\):
Caesar Cipher Creator (D Tests)
In this part, you will be editing the
src/CaesarCipher.java file which has empty methods for you to fill in. As we will be providing you with Python
code for these methods, you might find the handout that compares Python and Java syntax on the course website to be helpful.
Implementing the Class
Ultimately, the goal for this part is to take in a piece of text (the “plain text”) and a number of letters to rotate by (the “key”), and output the encrypted text (the “cipher text”).
You should begin by reading the
main method, which we have written for you, to get a sense of how Java can read from and write to the console.
Seriously, read the
main method. You will need to understand
Scanners for later in the project.
In Python, there is only one way to represent
Strings of length one; however, in Java, there are two. First, we can represent them as a string like in Python. Second, they
constitute a separate type called
Character). The reason for this distinction is that, because we have types in Java, it is useful to
be able to specify that we’re working with a single letter. Unlike in Python,
Strings in Java must use double quotes and
chars must use single quotes.
The first function you should implement is
findIndexInAlphabet. Note that these purple boxes in the guide always indicate a function you will have to write.
public static int findIndexInAlphabet(char c)
You should loop through the valid indices of
ALPHABET and return the index equal to the input character
c if one exists. If no such index exists, you should
In Python, this method would approximately look like:
for i in range(len(ALPHABET)):
if ALPHABET[i] == c:
In Python, you can’t have two methods named the same thing, even if they have different arguments; however, in Java, this is allowed and called method overloading. Next, you will implement two versions of the
rot method: one for if we want to rotate a character and another for rotating a
String. Second, you should implement the
public static char rot(char c, int amount)
You should convert the input character into a number. Then, add the rotation
amount to that number, making sure to wrap the number around so it is between
Then, convert it back to a character.
In Python, this method would approximately look like:
def rot(c, n):
idx = findIndexInAlphabet(c)
newIdx = (idx + n) % len(ALPHABET)
Finally, you will write the
String version which gets called directly from
public static String rot(String line, int amount)
You should rotate each character by the rotation
amount and return a new string with all of the new characters.
In Python, this method would approximately look like:
def rot(line, n):
output = ""
for c in line:
output += rot(c, n)
Running the Tests
Those three functions are the only code you have to write for the
CaesarCipher class. The next step is testing the code. Normally, you would test it by running
main which you are welcome
to do, but we’ve provided you with our tests which do this automatically. On the top right of the IntelliJ interface is a green “play” button which will run the tests whenever
the “configuration” directly to the left is set to a “Tests” option. If you make sure it says “D Tests” and hit the play button, the
JUnit testing interface should pop
up on the bottom.
findIndexInAlphabet and the two variants of
rot. Then, run the
D Tests as described above.
If you correctly implemented everything so far, the
D tests will pass with a green checkmark. If a test is failing, take a look at the test description and hints in the console to help debug!
Caesar Cipher Solver (C Tests)
In the previous part, we encrypted a piece of text using a Caesar Cipher. In this part, we will go in the opposite direction: that is, we will decrypt the cipher text back into the plain text. For a Caesar cipher, this means we need to rotate every letter by \(26 - k\), where \(k\) is the key used to encrypt the text originally. Since there are only \(26\) possible values of \(k\), we can just try all of them and see which one “works best”; this technique is called brute forcing a solution.
In the previous part, you may have noticed that a lot of the information that was in this guide was also in comments above the individual methods. Rather than repeating the exact same thing in the guide and the comments, we will often split the information across the two sources. Usually, the high-level requirements and hints will be in the guide and the details and examples will be in the code itself. We recognize that you are not used to this, but, this is the way the real world works: there is no one source of information in real projects.
For this part, you may not use any fancy
Collection methods that do all the work for you. (In fact, the automated tests will catch if you try to do this and
Our general strategy for this part (which you will implement in
main) is the following:
- Create a Scanner for the console
- Print out “Type a sentence to decrypt: “ (with no newline)
- Get a line of input from the user
- Break up the line into a list of words
- For each possible rotation of the letters:
- If more than half of the words in the sentence are also in the dictionary
- Print out the sentence with rotated letters
src/CaesarCipherSolver.java, we have provided you with a required “method decomposition” for this program. We strongly recommend you implement
the methods in the order they appear in the file. The header comments contain specifications for each of the methods we want you to write.
Don’t forget to run the
C Tests when you’re done.
When dealing with primitives (e.g. int, char, any type that starts with a lowercase letter), you can freely use == to compare their values. However, with objects in Java (e.g. Strings, any type starting with uppercase letters), the ‘==’ only compares the addresses in memory, not the values themselves, which can lead to faulty comparisons.
Instead, objects like Strings have a built in method ‘equals’ that you can use for value comparison. For example, for strings str1 and str2, you would want to write str1.equals(str2) if you want to check their equality, not str1 == str2.
The first test passes in the empty string “”, and wants you to return a list containing the empty string as the single value. Make sure that your code handles this case accordingly.
As an aside, if you use string comparison to handle this case, be sure to use s.equals(t) instead of s == t, as strings are Objects and must be compared with .equals().
Substitution Cipher Solver (B Tests)
For the last part of this project, we will write a solver for a more complicated cipher in which the mapping between source letters and target letters can be more than just a shift (like in a Caesar Cipher). In particular, we will look at “simple monoalphabetic substitution ciphers” in which the letters are shuffled to create a substitution alphabet and each letter in the source text is replaced with its corresponding letter in the substitution alphabet. For example:
Unlike with Caesar Ciphers, where we can just try every possible key, there are far too many possibilities for substitution ciphers to try them all. Instead, we’ll we’ll use a probabilistic method to only try keys that are likely to be better. To make it even more interesting, we will remove the spaces from the cipher texts for this part as well.
Setting Up A Substitution Cipher
Most of the work in this part will be in implementing a
SubstitutionCipher class which will represent a single instance of a substitution cipher key
and the corresponding cipher text to decrypt. Your class should store exactly these two things as fields. We represent a key with a
Character and the cipher text as just a
String. You are now ready to implement the following methods.
public SubstitutionCipher(String ciphertext, Map<Character, Character> key)
SubstitutionCipher to use the provided cipher text and key.
public String getCipherText()
Returns the cipher text for this
public String getPlainText()
Applies the current key to the current cipher text to get a proposed plain text to return.
public SubstitutionCipher randomSwap()
Returns a new
SubstitutionCipher with the same cipher text as this one,
but with a modified key such that exactly one pair of keys have exchanged their values.
Note that you must use the
Random RANDOM defined at the top of the class rather than creating a new one for our tests to work correctly.
Your algorithm should start by generating two characters to swap using
RANDOM. Make sure to handle the case where both generated characters are
the same by repeatedly generating a second new character until they are different. This method should not modify the original cipher.
public SubstitutionCipher(String ciphertext)
SubstitutionCipher to use the provided cipher text and a random key.
The random key should be generated by first creating a
SubstitutionCipher that uses the “identity” key (everything maps to itself), and then
randomSwap \(10000\) times to generate a new key to be used.
“Scoring” Phrases and Quadgrams
Ultimately, we will transform an initial random key into keys that fit the text better and better. To figure out if a candidate key is better, we need a way of comparing the
“english-ness” of two phrases. We call this idea scoring. To get a score for a phrase, we will split it into rolling chunks of four letters and evaluate their probabilities
separately. (Formally, we are making the assumption that each of these chunks is independent, which is not true in general, but is good enough for our purposes.) For example, if we were
working with the phrase
DOGGIES, we would split it into the chunks:
GIES and get the probabilities of each of these. Each of these four
letter pieces is called a quadgram.
Computationally, dealing with very small numbers is difficult; so, we will deal with “log probabilities” (also called “log likelihoods”) instead.
We have provided you with a class
QuadGramLikelihoods which reads in a file of probabilities and provides its user with log likelihoods for all quadgrams. You do not need to understand
how it works, but, you are, of course, welcome to read the code. We also used this class in lecture; so, you could look back to those notes.
public double getScore(QuadGramLikelihoods likelihoods)
Returns the sum of the log likelihoods (which you can get from the parameter
likelihoods) of each consecutive quadgram in the plain text.
Limit the usage of
getPlainText if your implementation is too slow.
Finding a “Good” Solution
To find a “good solution”, we will put together all the previous pieces into an algorithm called hill climbing which successively improves a random solution. You should implement it as follows.
public SubstitutionCipher getSolution(QuadGramLikelihoods likelihoods)
- Create a cipher with this cipher’s ciphertext and a random key \(C\)
- While fewer than 1000 trials in a row have not resulted in a replacement:
- Randomly swap two letters in \(C\) to make a new key \(M\)
- If \(M\) scores better than \(C\):
- Replace \(C\) with \(M\)
We have implemented a client for your
SubstitutionCipher that reads a cipher from the console and uses your class to decrypt it.
Implement all the methods, as explained above, in the
Then, run the
B Tests to check your work.
One potential cause of this could be in your
getScore method. Note, above we say:
getPlainText more than once if your implementation is too slow.
getPlainText many times (e.g. within a loop) within getScore may result in slowdown because you are recomputing the same value over and over. If you are doing this, try to rewrite your code such that getPlainText only gets called once.
This could be because of several things, but one thing to check is that when you are calling your
randomSwap, be sure to call it on the cipher you want. Calling it on
this, i.e. by writing “randomSwap()” or “this.randomSwap()”, will perform a random swap on the current cipher (which under the provided algorithm for getSolution in the spec is never used). Instead, write “curr.randomSwap()” to call it on a cipher
Another potential issue could be that you are not resetting your counter after you make a replacement. Recall from the spec that the condition for continuing is that fewer than 1000 trials in a row have not resulted in a replacement, so when a replacement occurs, it breaks the streak. Be sure your code reflects that.
One cause of this could be that you are attempting to modify the immutable key field of SubstitutionCipher. The key field, aka “this.key”, is immutable because it is part of the SubstitutionCipher object. Thus, if you attempt to change its keys or values, it will throw the UnsupportedOperatorException error.
To avoid this error, if you need to modify keys, consider making a copy of the this.keys hashmap with the following syntax:
Map newKey = new HashMap<>(this.key)
newKey will now hold all the same keys and values of this.key, but you can modify it freely.
One thing this could be is that you are not calling
randomSwap enough times. While writing your second SubstitutionCipher constructor, please double check that you are calling
randomSwap() 10,000 times, as per the spec.
The idea here is to start with an identity key, meaning that every letter maps to itself. When we call
randomSwap() only once, we only switch one of the mappings of the letters, meaning that 24 of the letters still map to themselves. We want to call
randomSwap() 10,000 times to create a unique key where we ensure randomness and key complexity and so that we can randomly initialize a key for the cipher text.
When you use
this.key.keySet() within your code to generate the list of characters to randomize from, this will cause nondeterministic behavior in our tests. This is because the call
this.key.keySet() will make a set out of the keys, however, sets are non-ordered and so the list will not be ordered deterministically. Thus, it will fail our tests which set specific random seeds, as it will randomize differently than expected.
Instead of using
this.key.keySet() to get the letters to randomize from, consider using
ALPHABET from CaesarCipher. You can either copy it over manually, or use by indexing into “CaesarCipher.ALPHABET”.
cryptogram.txt (A Tests)
Finally, you can use all your hard work to decrypt the secret message hidden in the file
cryptogram.txt. Write a program to remove
all non-upper-case characters (including spaces and newlines) from
SubstitutionCipherSolver on the result, and put the final decrypted cryptogram in
plaintext.txt. You do not need to submit this extra program; just submit the
Please note that the cryptogram is multiple lines.
Committing and Pushing to the Repository
A “commit” is version control language for a chunk of changes that mean something when grouped together. Importantly, commits will only stay locally on your machine
until you push them to the
To commit to the repository, click the green checkmark on the top-right of IntelliJ’s interface. A window will pop up and
ask you to type in a message. Choose some meaningful description of your feature, then click the little down arrow next to the “Commit” button on the bottom right of the dialog.
Choose “Commit and Push” in the little menu that pops up. If IntelliJ asks you if you’re sure you want to commit and push, say yes.
Checking the Results on
When you go to https://gitlab.caltech.edu, you should see your
project01 repository listed. Click on it. All the way on the right, there should be a red “X”,
an orange “pause” symbol, a blue “time” symbol, or a green “check” symbol. This symbol indicates the status of the automated tests which will
check the correctness of your code. Click on it. On the new page, you should see a table, and under “status”, there should be another symbol. Click on it. The new page will list
the “categories” of tests for this repository. These will be some combination of “A”, “B”, “C”, and “D”. These letters indicate what grade you earned on the assessment for
passing those tests. If you click on one of them, it will detail for you which sub-tests you passed and failed which might help you debug.
Note that the “A Tests” for
project01 are only on
gitlab. So, you must check there to make sure you are passing them. It is good practice to
gitlab after submitting a project or lab in this course.
Task 7. Check to make sure you are passing all the tests on gitlab. If you are, you’re done!
If you notice that you are outputting what seems to be correct English text into your plaintext file but the A tests are failing on git, one thing to check is that you are reading every line of the input. If you only read the first line, you will not pass the tests because you do not have the complete output, even though it will look correct (assuming everything is correct in SubstitutionCipher).
Thus, you should take advantage of the
.nextLine() methods of the Scanner to make sure that you read the entire input in your function.
Another thing that could cause this behavior is decoding each line of the file by itself. The whole cryptogram file is encrypted with a single key mapping, so the key should be the same for all lines–if you get different keys for different lines, then your decoded text will almost certainly be incorrect in some places. Using all of the encrypted text when searching for the correct key ensures that your likelihood sum calculation is accounting for the likelihoods of all of the words in the cryptogram (versus just those in one line), which makes it more likely that you find the right key as well.
Task 8. Answer all the OpenEnded questions below this task. These will be graded entirely on completion/effort, but they will really help us improve the course. Then, click the “Submit OpenEndeds” button below.