[Vigenère Cipher] [Nate Hardison - Harvard University] [This is CS50. - CS50.TV] Meet Alice. Alice has a crush on Bob. Fortunately for Alice, Bob also has eyes for her. Unfortunately for their budding romance, not only do Alice's parents disapprove of Bob, but Alice's best friend, Evelyn, has a secret crush on Bob and selfishly wants to keep them apart at all costs. To send secret messages to each other that Alice's parents can't understand, Alice and Bob have been using a Caesar cipher, which works by shifting the alphabet by a certain number of letters as a way to generate a new alphabet. Each letter in the original alphabet is then substituted by its corresponding letter in the new shifted alphabet. Alice's favorite number is 3, which Bob knows, so she uses 3 as her key. When she shifts the English alphabet by 3 letters, A becomes D, B becomes E, C becomes F, and so forth. When she gets to the end of the alphabet--to the letters X, Y, and Z-- she just wraps around back to the beginning of the alphabet and substitutes X with A, Y with B, and Z with C. So when Alice goes to encrypt her secret message to Bob, namely "Meet me at the park at eleven a.m.," she just makes the appropriate substitutions. M becomes P, E becomes H, and so on until her unencrypted plain text message is turned into encrypted cipher text: "Phhw ph dw wkh sdun dw hohyhq dp" is definitely not the most romantic sounding, but Alice believe that it'll do. Alice gives the message to Evelyn to deliver to Bob's house. But Evelyn instead takes it back to her room and tries to crack the code. One of the first things Evelyn notices is that the letter H occurs 7 times in the message, many more times than any other letter. Knowing that the letter E is the most common in the English language, occurring almost 13% of the time, Evelyn guesses that H has been substituted for E in order to make the secret message and tries using a key of 3 to decrypt it. Within minutes, Evelyn figures out Alice's plans and evilly calls Alice's parents. Had Alice and Bob taken CS50, they would have known of this frequency-analysis attack on the Caesar cipher, which allows it to be broken quite quickly. They would also have known that the cipher is easily subject to a brute-force attack, whereby Evelyn could have tried all of the possible 25 keys, or shifts of the English alphabet, in order to decipher the message. Why 25 keys and not 26? Well, try shifting any letter by 26 positions, and you'll see why. Anyway, a brute-force attack would have taken Evelyn a bit longer but not long enough to keep her from thwarting Alice and Bob's plans, especially if Evelyn has the aid of a computer which could rip through all 25 cases in an instant. So, this problem also plagued others who used the Caesar cipher, and therefore people began experimenting with more complex substitution ciphers that use multiple shift values instead of just one. One of the most well-known of these is called Vigenère cipher. How do we get multiple shift values? Well, instead of using a number as the key, we use a word for the key. We'll use each letter in the key to generate a number, and the effect is that we'll have multiple Caesar cipher-style keys for shifting letters. Let's see how this works by encrypting Alice's message to Bob: Meet me at the park at eleven a.m. I, personally, think bacon is delicious, so let's use that as the key. If we take the message in its unencrypted, plain-text format, we see that it's 25 letters long. Bacon has only 5 letters, so we need to repeat it 5 times to make it match the length of the plain text. Bacon bacon bacon bacon bacon. As a brief aside, if the number of letters in the plain text didn't divide cleanly by the number of letters in the key, we just end the final repetition of our key early, using only the letters we needed to make everything match up. Now we go about finding the shift values. We're going to do this by using the position of each letter of our key--bacon-- in the A to Z alphabet. Since we're computer scientists, we like to start counting at zero instead of 1, so we're going to say that the position of the first letter of bacon--B-- is in position 1 in the zero-indexed A to Z alphabet, not 2, and the position of A is zero, not 1. Using this algorithm, we can find the shift values for each letter. To encrypt the plain text and generate cipher text, we just shift each letter in the plain text by the specified amount, just like we do with the Caesar cipher, wrapping from Z back to A if necessary. M gets shifted by 1 place to become N. The first E doesn't shift at all, but we shift the second E by 2 places to G and T by 14 places to H. If we work through the plain text, we end up with, "Negh zf av huf pcfx bt gzrwep oz." Again, not very romantic-sounding but definitely cryptic. If Alice and Bob had known about Vigenère cipher, would they have been safe from Evelyn's prying eyes? What do you think? Would you want to log into your bank account if your bank decided to use Vigenère cipher to encrypt your communication using your password as your key? If I were you, I wouldn't. And while Evelyn might be kept busy long enough for Alice and Bob to have their meet-up, it's not worth it for Alice and Bob to chance it. Vigenère cipher is relatively easy to break if you know the length of the key because then you can treat the encrypted cipher text as the product of a few interwoven Caesar ciphers. Finding the length of the key isn't terribly hard, either. If the original plain-text message is long enough that some words occur multiple times, eventually you'll see repetition cropping up in the encrypted cipher text, as in this example, where you see MONCY appear twice. Additionally, you can perform a brute-force attack on the cipher. This does take significantly longer than a brute-force attack on the Caesar cipher, which can be done almost instantaneously with a computer since instead of 25 cases to check you've got 26ⁿ - 1 possibilities, where n is the length of the unknown key. This is because each letter in the key could be any of the 26 letters, A through Z, and a smart person would try to use a key that can't be found in a dictionary, which means that you'd have to test all of the weird letter combinations, like ZXXXFF, and not just a couple hundred thousand words in the dictionary. The minus 1 comes into the math because you wouldn't want to use a key with only A's, since with our zero-indexed alphabet that would give you the same effect as using a Caesar cipher with a key of zero. Anyway, 26ⁿ - 1 does get large rather quickly, but while you definitely wouldn't want to try breaking a cipher by hand this way, this is definitely doable with a computer. Fortunately for Alice and Bob, and for online banking, cryptographers have developed more secure ways to encrypt secret messages from prying eyes. However, that's a topic for another time. My name is Nate Hardison. This is CS50.