[RSA] [Rob Bowden] [Tommy MacWilliam] [Harvard University] [This is CS50.] [CS50.TV] Let's take a look at RSA, a widely used algorithm for encrypting data. Encryption algorithms like Caesar and Vigenère ciphers aren't very secure. With the Caesar cipher, an attacker only needs to try 25 different keys to get the message's plain text. While the Vigenère cipher is more secure than the Caesar cipher because of the larger search space for keys, once an attacker knows the length of the key in a Vigenère cipher, which can be determined via an analysis of patterns in the encrypted text, the Vigenère cipher is not that much more secure than the Caesar cipher. RSA, on the other hand, isn't vulnerable to attacks like this. The Caesar cipher and Vigenère cipher use the same key to both encrypt and decrypt a message. This property makes these ciphers symmetric key algorithms. A fundamental problem with symmetric key algorithms is that they rely on the one encrypting and sending the message and the one receiving and decrypting the message to have already agreed upfront on the key they will both use. But we have a bit of a startup problem here. How do 2 computers that want to communicate establish a secret key between them? If the key must be secret, then we need a way to encrypt and decrypt the key. If all we have is symmetric key cryptography then we've just come back to the same problem. RSA, on the other hand, uses a pair of keys, one for encryption and another for decryption. One is called the public key, and the other is the private key. The public key is used to encrypt messages. As you might guess by its name, we can share our public key with anyone we want without compromising the security of an encrypted message. Messages encrypted using a public key can only be decrypted with its corresponding private key. While you can share your public key, you should always keep your private key secret. Since the private key should be kept a secret and only the private key can be used to decrypt messages, if 2 users want to send messages encrypted with RSA back and forth both users need to have their own public and private key pair. Messages from user 1 to user 2 only use user 2's key pair, and messages from user 2 to user 1 only use user 1's key pair. The fact that there are 2 separate keys to encrypt and decrypt messages makes RSA an asymmetric key algorithm. We don't need to encrypt the public key in order to send it to another computer since the key is public anyway. This means that RSA doesn't have the same startup problem as a symmetric key algorithm. How do 2 computers that want to communicate establish a secret key between them? If the key must be secret, then we need a way to encrypt and decrypt the key. If all we have is symmetric key cryptography then we've just come back to the same problem. RSA, on the other hand, uses a pair of keys, one for encryption and another for decryption. One is called the public key, and the other is the private key. The public key is used to encrypt messages. As you might guess by its name, we can share our public key with anyone we want without compromising the security of an encrypted message. Messages encrypted using a public key can only be decrypted with its corresponding private key. While you can share your public key, you should always keep your private key secret. Since the private key should be kept a secret and only the private key can be used to decrypt messages if 2 users want to send messages encrypted with RSA back and forth both users need to have their own public and private key pair. Messages from user 1 to user 2 only use user 2's key pair, and messages from user 2 to user 1 only use user 1's key pair. The fact that there are 2 separate keys to encrypt and decrypt messages makes RSA an asymmetric key algorithm. We don't need to encrypt the public key in order to send it to another computer since the key is public anyway. This means that RSA doesn't have the same startup problem as the symmetric key algorithms. So if I want to send a message using RSA encryption to Rob, I'll first need Rob's public key. To generate a pair of keys, Rob needs to pick 2 large prime numbers. These numbers will be used in both the public and private keys, but the public key will only use the product of these 2 numbers, not the numbers themselves. Once I've encrypted the message using Rob's public key I can send the message to Rob. For a computer, factoring numbers is a hard problem. The public key, remember, used the product of 2 prime numbers. This product must then have only 2 factors, which happen to be the numbers that make up the private key. In order to decrypt the message, RSA will use this private key or the numbers multiplied together in the process of creating the public key. Because it's computationally hard to factor the number used in a public key into the 2 numbers used in the private key it's difficult for an attacker to figure out the private key that will be necessary to decrypt the message. Now let's go into some lower level details of RSA. Let's first see how we can generate a pair of keys. First, we'll need 2 prime numbers. We'll call these 2 numbers p and q. In order to pick p and q, in practice we would pseudorandomly generate large numbers and then use a test for determining whether or not those numbers are probably prime. We can keep generating random numbers over and over again until we have 2 primes that we can use. Here let's pick p = 23 and q = 43. Remember, in practice, p and q should be much larger numbers. As far as we know, the larger the numbers, the harder it is to crack an encrypted message. But it's also more expensive to encrypt and decrypt messages. Today it's often recommended that p and q are at least 1024 bits, which puts each number at over 300 decimal digits. But we'll pick these small numbers for this example. Now we'll multiply p and q together to get a 3rd number, which we'll call n. In our case, n = 23 * 43, which = 989. We have n = 989. Next we'll multiply p - 1 with q - 1 to obtain a 4th number, which we'll call m. In our case, m = 22 * 42, which = 924. We have m = 924. Now we'll need a number e that is relatively prime to m and less than m. Two numbers are relatively prime or coprime if the only positive integer that divides them both evenly is 1. In other words, the greatest common divisor of e and m must be 1. In practice, it's common for e to be the prime number 65537 as long as this number doesn't happen to be a factor of m. For our keys, we'll pick e = 5 since 5 is relatively prime to 924. Finally, we'll need one more number, which we'll call d. D must be some value that satisfies the equation de = 1(mod m). This mod m signifies we'll use something called modular arithmetic. In modular arithmetic, once a number gets higher than some upper bound it will wrap back around to 0. A clock, for example, uses modular arithmetic. One minute after 1:59, for example, is 2:00, not 1:60. The minute hand has wrapped around to 0 upon reaching an upper bound of 60. So, we can say 60 is equivalent to 0 (mod 60) and 125 is equivalent to 65 is equivalent to 5 (mod 60). Our public key will be the pair e and n where in this case e is 5 and n is 989. Our private key will be the pair d and n, which in our case is 185 and 989. Notice that our original primes p and q don't appear anywhere in our private or public keys. Now that we have our pair of keys, let's take a look at how we can encrypt and decrypt a message. I want to send a message to Rob, so he will be the one to generate this key pair. Then I'll ask Rob for his public key, which I'll use to encrypt a message to send to him. Remember, it's totally okay for Rob to share his public key with me. But it would not be okay to share his private key. I don't have any idea what his private key is. We can break our message m up into several chunks all smaller than n and then encrypt each of those chunks. We'll encrypt the string CS50, which we can break up into 4 chunks, one per letter. In order to encrypt my message, I'll need to convert it into some kind of numeric representation. Let's concatenate the ASCII values with the characters in my message. In order to encrypt a given message m I'll need to compute c = m to the e (mod n). But m must be smaller than n, or else the full message can't be expressed modulo n. We can break m up into several chunks, all of which are smaller than n, and encrypt each of those chunks. Encrypting each of these chunks, we get c1 = 67 to the 5 (mod 989) which = 658. For our second chunk we have 83 to the 5 (mod 989) which = 15. For our third chunk we have 53 to the 5 (mod 989) which = 799. And finally, for our last chunk we have 48 to the 5 (mod 989) which = 975. Now we can send over these encrypted values to Rob. Here you go, Rob. While our message is in flight, let's take another look at how we got that value for d. Our number d needed to satisfy 5d = 1 (mod 924). This makes d the multiplicative inverse of 5 modulo 924. Given 2 integers, a and b, the extended Euclidean algorithm can be used to find the greatest common divisor of these 2 integers. It will also give us 2 other numbers, x and y, that satisfy the equation ax + by = the greatest common divisor of a and b. How does this help us? Well, plugging in e = 5 for a and m = 924 for b we already know that these numbers are coprime. Their greatest common divisor is 1. This gives us 5x + 924y = 1 or 5x = 1 - 924y. But if we only care about everything modulo 924 then we can drop the - 924y. Think back to the clock. If the minute hand is on 1 and then exactly 10 hours pass, we know the minute hand will still be on the 1. Here we start at 1 and then wrap around exactly y times, so we'll still be at 1. We have 5x = 1 (mod 924). And here this x is the same as the d we were looking for before, so if we use the extended Euclidean algorithm to get this number x, that's the number we should use as our d. Now let's run the extended Euclidean algorithm for a = 5 and b = 924. We'll use a method called the table method. Our table will have 4 columns, x, y, d, and k. Our table starts off with 2 rows. In the first row we have 1, 0, then our value of a, which is 5, and our second row is 0, 1, and our value for b, which is 924. The value of the 4th column, k, will be the result of dividing the value of d in the row above it with the value of d on the same row. We have 5 divided by 924 is 0 with some remainder. That means we have k = 0. Now the value of every other cell will be the value of the cell 2 rows above it minus the value of the row above it times k. Let's start with d in the 3rd row. We have 5 - 924 * 0 = 5. Next we have 0 - 1 * 0 which is 0 and 1 - 0 * 0 which is 1. Not too bad, so let's move on to the next row. First we need our value of k. 924 divided by 5 = 184 with some remainder, so our value for k is 184. Now 924 - 5 * 184 = 4. 1 - 0 * 184 is 1 and 0 - 1 * 184 is -184. All right, let's do the next row. Our value of k will be 1 because 5 divided by 4 = 1 with some remainder. Let's fill in the other columns. 5 - 4 * 1 = 1. 0 - 1 * 1 = -1. And 1 - -184 * 1 is 185. Let's see what our next value of k would be. Well, it looks like we have 4 divided by 1, which is 4. In this case where we're dividing by 1 such that k is equal to the value of d in the above row means that we're done with our algorithm. We can see here that we have x = 185 and y = -1 in the last row. Let's now come back to our original goal. We said that the value of x as a result of running this algorithm would be the multiplicative inverse of a (mod b). That means that 185 is the multiplicative inverse of 5 (mod 924) which means that we have a value of 185 for d. The fact that d = 1 in the last row verifies that e was coprime to m. If it weren't 1 then we would have to pick a new e. Now let's see if Rob has received my message. When someone sends me an encrypted message as long as I've kept my private key a secret I'm the only one who can decrypt the message. To decrypt a chunk c I can calculate the original message is equal to the chunk to d power (mod n). Remember that d and n are from my private key. To get a full message from its chunks we decrypt each chunk and concatenate the results. Exactly how secure is RSA? The truth is, we don't know. Security is based on how long it would take an attacker to crack a message encrypted with RSA. Remember that an attacker has access to your public key, which contains both e and n. If the attacker manages to factor n into its 2 primes, p and q, then she could calculate d using the extended Euclidean algorithm. This gives her the private key, which can be used to decrypt any message. But how quickly can we factor integers? Again, we don't know. Nobody has found a fast way of doing it, which means that given large enough n it would take an attacker unrealistically long to factor the number. If someone revealed a fast way of factoring integers RSA would be broken. But even if integer factorization is inherently slow the RSA algorithm could still have some flaw in it that allows for easy decryption of messages. Nobody has found and revealed such a flaw yet, but that doesn't mean one doesn't exist. In theory, someone could be out there reading all data encrypted with RSA. There's another bit of a privacy issue. If Tommy encrypts some message using my public key and an attacker encrypts the same message using my public key the attacker will see that the 2 messages are identical and thus know what Tommy encrypted. In order to prevent this, messages are typically padded with random bits before being encrypted so that the same message encrypted multiple times will look different as long as the padding on the message is different. But remember how we have to split messages into chunks so that each chunk is smaller than n? Padding the chunks means that we might have to split things up into even more chunks since the padded chunk must be smaller than n. Encryption and decryption are relatively expensive with RSA, and so needing to break up a message into many chunks can be very costly. If a large volume of data needs to be encrypted and decrypted we can combine the benefits of symmetric key algorithms with those of RSA to get both security and efficiency. Although we won't go into it here, AES is a symmetric key algorithm like the Vigenère and Caesar ciphers but much harder to crack. Of course, we can't use AES without establishing a shared secret key between the 2 systems, and we saw the problem with that before. But now we can use RSA to establish the shared secret key between the 2 systems. We'll call the computer sending the data the sender and the computer receiving the data the receiver. The receiver has an RSA key pair and sends the public key to the sender. The sender generates an AES key, encrypts it with the receiver's RSA public key, and sends the AES key to the receiver. The receiver decrypts the message with its RSA private key. Both the sender and the receiver now have a shared AES key between them. AES, which is much faster at encryption and decryption than RSA, can now be used to encrypt the large volumes of data and send them to the receiver, who can decrypt using the same key. AES, which is much faster at encryption and decryption than RSA, can now be used to encrypt the large volumes of data and send them to the receiver, who can decrypt using the same key. We just needed RSA to transfer the shared key. We no longer need to use RSA at all. It looks like I've got a message. It doesn't matter if anyone read what's on the paper airplane before I caught it because I'm the only one with the private key. Let's decrypt each of the chunks in the message. The first chunk, 658, we raise to the d power, which is 185, mod n, which is 989, is equal to 67, which is the letter C in ASCII. Now, onto the second chunk. The second chunk has value 15, which we raise to the 185th power, mod 989, and this is equal to 83 which is the letter S in ASCII. Now for the third chunk, which has value 799, we raise to 185, mod 989, and this is equal to 53, which is the value of the character 5 in ASCII. Now for the last chunk, which has value 975, we raise to 185, mod 989, and this is equal to 48, which is value of the character 0 in ASCII. My name is Rob Bowden, and this is CS50. [CS50.TV] RSA at all. RSA at all. [laughter] At all.