Projects and Labs in CS 2
For projects in CS 2, you will sometimes be allowed to work in groups of two. This project, however, is solo. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. The projects are intended to be more difficult than the labs; they test how much of the material that lecture and lab introduced that you have actually learned. You will notice that labs are much more straight-forward about how to implement things, whereas projects will have a much larger design component where you have to decide how to set things up. We will ramp up slowly on this; so, for this project, we will still give you a lot of guidance. However, if you get stuck for more than 30 minutes on a bug, you should come to office hours. We want this experience to be an enjoyable one, and nobody really enjoys debugging for hours on end.
Setup
If you haven’t already, do the setup. Otherwise, just register for the project using grinch
: https://grinch.caltech.edu/register and clone the repository
as explained in the setup instructions.
From CS 1 to CS 2
Please note that there is quite a jump in complexity between CS 1 and CS 2, and, while the beginning of this project will resemble CS 1, the end will more closely resemble what CS 2 will look like.
Caesar Ciphers
A cipher is an algorithm for encrypting or decrypting text. Generally, we call the unencrypted text “plain text” and encrypted version “cipher text”.
Here is an example of text that uses a Caesar cipher (which you will implement in this project):
Notably, the original text is readable, but the encrypted version is not.
A Caesar Cipher is computed by “shifting” each letter of the text by a constant number, \(k\), as if they were in a circle. For example, the following “ring” represents a mapping for a Caesar Cipher where \(k = 4\):
Caesar Cipher Creator (D Tests)
In this part, you will be editing the src/CaesarCipher.java
file which has empty methods for you to fill in. As we will be providing you with Python
code for these methods, you might find the handout that compares Python and Java syntax on the course website to be helpful.
Implementing the Class
Ultimately, the goal for this part is to take in a piece of text (the “plain text”) and a number of letters to rotate by (the “key”), and output the encrypted text (the “cipher text”).
You should begin by reading the main
method, which we have written for you, to get a sense of how Java can read from and write to the console.
Task 1.
Seriously, read the main
method. You will need to understand Scanner
s for later in the project.
In Python, there is only one way to represent String
s of length one; however, in Java, there are two. First, we can represent them as a string like in Python. Second, they
constitute a separate type called char
(or Character
). The reason for this distinction is that, because we have types in Java, it is useful to
be able to specify that we’re working with a single letter. Unlike in Python, String
s in Java must use double quotes and char
s must use single quotes.
The first function you should implement is findIndexInAlphabet
. Note that these purple boxes in the guide always indicate a function you will have to write.
public static int findIndexInAlphabet(char c)
You should loop through the valid indices of ALPHABET
and return the index equal to the input character c
if one exists. If no such index exists, you should
return -1
.
In Python, this method would approximately look like:
def findIndexInAlphabet(c):
for i in range(len(ALPHABET)):
if ALPHABET[i] == c:
return i
return -1
In Python, you can’t have two methods named the same thing, even if they have different arguments; however, in Java, this is allowed and called method overloading. Next,
you will implement two versions of the rot
method: one for if we want to rotate a character and another for rotating a String
. Second, you should implement the char
version.
public static char rot(char c, int amount)
You should convert the input character into a number. Then, add the rotation amount
to that number, making sure to wrap the number around so it is between 0
and 25
.
Then, convert it back to a character.
In Python, this method would approximately look like:
def rot(c, n):
idx = findIndexInAlphabet(c)
newIdx = (idx + n) % len(ALPHABET)
return ALPHABET[newIdx]
Finally, you will write the String
version which gets called directly from main
.
public static String rot(String line, int amount)
You should rotate each character by the rotation amount
and return a new string with all of the new characters.
In Python, this method would approximately look like:
def rot(line, n):
output = ""
for c in line:
output += rot(c, n)
return output
Running the Tests
Those three functions are the only code you have to write for the CaesarCipher
class. The next step is testing the code. Normally, you would test it by running main
which you are welcome
to do, but we’ve provided you with our tests which do this automatically. On the top right of the IntelliJ interface is a green “play” button which will run the tests whenever
the “configuration” directly to the left is set to a “Tests” option. If you make sure it says “D Tests” and hit the play button, the JUnit
testing interface should pop
up on the bottom.
Task 2.
Implement findIndexInAlphabet
and the two variants of rot
. Then, run the D
Tests as described above.
If you correctly implemented everything so far, the D
tests will pass with a green checkmark. If a test is failing, take a look at the test description and hints in the console to help debug!
Caesar Cipher Solver (C Tests)
In the previous part, we encrypted a piece of text using a Caesar Cipher. In this part, we will go in the opposite direction: that is, we will decrypt the cipher text back into the plain text. For a Caesar cipher, this means we need to rotate every letter by \(26 - k\), where \(k\) is the key used to encrypt the text originally. Since there are only \(26\) possible values of \(k\), we can just try all of them and see which one “works best”; this technique is called brute forcing a solution.
In the previous part, you may have noticed that a lot of the information that was in this guide was also in comments above the individual methods. Rather than repeating the exact same thing in the guide and the comments, we will often split the information across the two sources. Usually, the high-level requirements and hints will be in the guide and the details and examples will be in the code itself. We recognize that you are not used to this, but, this is the way the real world works: there is no one source of information in real projects.
For this part, you may not use any fancy String
or Collection
methods that do all the work for you. (In fact, the automated tests will catch if you try to do this and
fail.)
Our general strategy for this part (which you will implement in main
) is the following:
- Create a Scanner for the console
- Print out “Type a sentence to decrypt: “ (with no newline)
- Get a line of input from the user
- Break up the line into a list of words
- For each possible rotation of the letters:
- If more than half of the words in the sentence are also in the dictionary
- Print out the sentence with rotated letters
Task 3.
In src/CaesarCipherSolver.java
, we have provided you with a required “method decomposition” for this program. We strongly recommend you implement
the methods in the order they appear in the file. The header comments contain specifications for each of the methods we want you to write.
Don’t forget to run the C Tests
when you’re done.
When dealing with primitives (e.g. int, char, any type that starts with a lowercase letter), you can freely use == to compare their values. However, with objects in Java (e.g. Strings, any type starting with uppercase letters), the ‘==’ only compares the addresses in memory, not the values themselves, which can lead to faulty comparisons.
Instead, objects like Strings have a built in method ‘equals’ that you can use for value comparison. For example, for strings str1 and str2, you would want to write str1.equals(str2) if you want to check their equality, not str1 == str2.
The first test passes in the empty string “”, and wants you to return a list containing the empty string as the single value. Make sure that your code handles this case accordingly.
As an aside, if you use string comparison to handle this case, be sure to use s.equals(t) instead of s == t, as strings are Objects and must be compared with .equals().
Substitution Cipher Solver (B Tests)
For the last part of this project, we will write a solver for a more complicated cipher in which the mapping between source letters and target letters can be more than just a shift (like in a Caesar Cipher). In particular, we will look at “simple monoalphabetic substitution ciphers” in which the letters are shuffled to create a substitution alphabet and each letter in the source text is replaced with its corresponding letter in the substitution alphabet. For example:
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ | ↕ |
q | r | z | d | f | e | y | j | h | i | n | m | l | k | o | p | a | b | s | t | u | v | w | x | g | c |
Unlike with Caesar Ciphers, where we can just try every possible key, there are far too many possibilities for substitution ciphers to try them all. Instead, we’ll we’ll use a probabilistic method to only try keys that are likely to be better. To make it even more interesting, we will remove the spaces from the cipher texts for this part as well.
Setting Up A Substitution Cipher
Most of the work in this part will be in implementing a SubstitutionCipher
class which will represent a single instance of a substitution cipher key
and the corresponding cipher text to decrypt. Your class should store exactly these two things as fields. We represent a key with a Map
from Character
to Character
and the cipher text as just a String
. You are now ready to implement the following methods.
public SubstitutionCipher(String ciphertext, Map<Character, Character> key)
Initializes a SubstitutionCipher
to use the provided cipher text and key.
public String getCipherText()
Returns the cipher text for this SubstitutionCipher
.
public String getPlainText()
Applies the current key to the current cipher text to get a proposed plain text to return.
public SubstitutionCipher randomSwap()
Returns a new SubstitutionCipher
with the same cipher text as this one,
but with a modified key such that exactly one pair of keys have exchanged their values.
Note that you must use the Random RANDOM
defined at the top of the class rather than creating a new one for our tests to work correctly.
Your algorithm should start by generating two characters to swap using RANDOM
. Make sure to handle the case where both generated characters are
the same by repeatedly generating a second new character until they are different. This method should not modify the original cipher.
public SubstitutionCipher(String ciphertext)
Initializes a SubstitutionCipher
to use the provided cipher text and a random key.
The random key should be generated by first creating a SubstitutionCipher
that uses the “identity” key (everything maps to itself), and then
using randomSwap
\(10000\) times to generate a new key to be used.
“Scoring” Phrases and Quadgrams
Ultimately, we will transform an initial random key into keys that fit the text better and better. To figure out if a candidate key is better, we need a way of comparing the
“english-ness” of two phrases. We call this idea scoring. To get a score for a phrase, we will split it into rolling chunks of four letters and evaluate their probabilities
separately. (Formally, we are making the assumption that each of these chunks is independent, which is not true in general, but is good enough for our purposes.) For example, if we were
working with the phrase DOGGIES
, we would split it into the chunks: DOGG
, OGGI
, GGIE
, GIES
and get the probabilities of each of these. Each of these four
letter pieces is called a quadgram.
Computationally, dealing with very small numbers is difficult; so, we will deal with “log probabilities” (also called “log likelihoods”) instead.
We have provided you with a class QuadGramLikelihoods
which reads in a file of probabilities and provides its user with log likelihoods for all quadgrams. You do not need to understand
how it works, but, you are, of course, welcome to read the code. We also used this class in lecture; so, you could look back to those notes.
public double getScore(QuadGramLikelihoods likelihoods)
Returns the sum of the log likelihoods (which you can get from the parameter likelihoods
) of each consecutive quadgram in the plain text.
Limit the usage of getPlainText
if your implementation is too slow.
Finding a “Good” Solution
To find a “good solution”, we will put together all the previous pieces into an algorithm called hill climbing which successively improves a random solution. You should implement it as follows.
public SubstitutionCipher getSolution(QuadGramLikelihoods likelihoods)
- Create a cipher with this cipher’s ciphertext and a random key \(C\)
- While fewer than 1000 trials in a row have not resulted in a replacement:
- Randomly swap two letters in \(C\) to make a new key \(M\)
- If \(M\) scores better than \(C\):
- Replace \(C\) with \(M\)
We have implemented a client for your SubstitutionCipher
that reads a cipher from the console and uses your class to decrypt it.
Task 4.
Implement all the methods, as explained above, in the SubstitutionCipher
class.
Then, run the B Tests
to check your work.
One potential cause of this could be in your getScore
method. Note, above we say:
Avoid using getPlainText
more than once if your implementation is too slow.
Running getPlainText
many times (e.g. within a loop) within getScore may result in slowdown because you are recomputing the same value over and over. If you are doing this, try to rewrite your code such that getPlainText only gets called once.
This could be because of several things, but one thing to check is that when you are calling your randomSwap
, be sure to call it on the cipher you want. Calling it on this
, i.e. by writing “randomSwap()” or “this.randomSwap()”, will perform a random swap on the current cipher (which under the provided algorithm for getSolution in the spec is never used). Instead, write “curr.randomSwap()” to call it on a cipher curr
.
Another potential issue could be that you are not resetting your counter after you make a replacement. Recall from the spec that the condition for continuing is that fewer than 1000 trials in a row have not resulted in a replacement, so when a replacement occurs, it breaks the streak. Be sure your code reflects that.
One cause of this could be that you are attempting to modify the immutable key field of SubstitutionCipher. The key field, aka “this.key”, is immutable because it is part of the SubstitutionCipher object. Thus, if you attempt to change its keys or values, it will throw the UnsupportedOperatorException error.
To avoid this error, if you need to modify keys, consider making a copy of the this.keys hashmap with the following syntax:
Map newKey = new HashMap<>(this.key)
newKey
will now hold all the same keys and values of this.key, but you can modify it freely.
One thing this could be is that you are not calling randomSwap
enough times. While writing your second SubstitutionCipher constructor, please double check that you are calling randomSwap()
10,000 times, as per the spec.
The idea here is to start with an identity key, meaning that every letter maps to itself. When we call randomSwap()
only once, we only switch one of the mappings of the letters, meaning that 24 of the letters still map to themselves. We want to call randomSwap()
10,000 times to create a unique key where we ensure randomness and key complexity and so that we can randomly initialize a key for the cipher text.
this.key.keySet()
When you use this.key.keySet()
within your code to generate the list of characters to randomize from, this will cause nondeterministic behavior in our tests. This is because the call this.key.keySet()
will make a set out of the keys, however, sets are non-ordered and so the list will not be ordered deterministically. Thus, it will fail our tests which set specific random seeds, as it will randomize differently than expected.
Instead of using this.key.keySet()
to get the letters to randomize from, consider using ALPHABET
from CaesarCipher. You can either copy it over manually, or use by indexing into “CaesarCipher.ALPHABET”.
Decrypting cryptogram.txt
(A Tests)
Task 5.
Finally, you can use all your hard work to decrypt the secret message hidden in the file cryptogram.txt
. Write a program to remove
all non-upper-case characters (including spaces and newlines) from cryptogram.txt
, run
SubstitutionCipherSolver
on the result, and put the final decrypted cryptogram in plaintext.txt
. You do not need to submit this extra program; just submit the plaintext.txt
file.
Please note that the cryptogram is multiple lines.
Committing and Pushing to the Repository
A “commit” is version control language for a chunk of changes that mean something when grouped together. Importantly, commits will only stay locally on your machine
until you push them to the gitlab
server.
Task 6.
To commit to the repository, click the green checkmark on the top-right of IntelliJ’s interface. A window will pop up and
ask you to type in a message. Choose some meaningful description of your feature, then click the little down arrow next to the “Commit” button on the bottom right of the dialog.
Choose “Commit and Push” in the little menu that pops up. If IntelliJ asks you if you’re sure you want to commit and push, say yes.
Checking the Results on gitlab
When you go to https://gitlab.caltech.edu, you should see your project01
repository listed. Click on it. All the way on the right, there should be a red “X”,
an orange “pause” symbol, a blue “time” symbol, or a green “check” symbol. This symbol indicates the status of the automated tests which will
check the correctness of your code. Click on it. On the new page, you should see a table, and under “status”, there should be another symbol. Click on it. The new page will list
the “categories” of tests for this repository. These will be some combination of “A”, “B”, “C”, and “D”. These letters indicate what grade you earned on the assessment for
passing those tests. If you click on one of them, it will detail for you which sub-tests you passed and failed which might help you debug.
Note that the “A Tests” for project01
are only on gitlab
. So, you must check there to make sure you are passing them. It is good practice to
always check gitlab
after submitting a project or lab in this course.
Task 7. Check to make sure you are passing all the tests on gitlab. If you are, you’re done!
If you notice that you are outputting what seems to be correct English text into your plaintext file but the A tests are failing on git, one thing to check is that you are reading every line of the input. If you only read the first line, you will not pass the tests because you do not have the complete output, even though it will look correct (assuming everything is correct in SubstitutionCipher).
Thus, you should take advantage of the .hasNextLine()
and .nextLine()
methods of the Scanner to make sure that you read the entire input in your function.
Another thing that could cause this behavior is decoding each line of the file by itself. The whole cryptogram file is encrypted with a single key mapping, so the key should be the same for all lines–if you get different keys for different lines, then your decoded text will almost certainly be incorrect in some places. Using all of the encrypted text when searching for the correct key ensures that your likelihood sum calculation is accounting for the likelihoods of all of the words in the cryptogram (versus just those in one line), which makes it more likely that you find the right key as well.
Task 8. Answer all the OpenEnded questions below this task. These will be graded entirely on completion/effort, but they will really help us improve the course. Then, click the “Submit OpenEndeds” button below.