1 00:00:00,000 --> 00:00:00,760 2 00:00:00,760 --> 00:00:02,270 >> ZAMYLA: Let's tackle Caesar. 3 00:00:02,270 --> 00:00:06,110 In Caesar, we allow the user to encode a secret message. 4 00:00:06,110 --> 00:00:09,780 So let's dive right in and look at our to-dos for this problem. 5 00:00:09,780 --> 00:00:12,210 So first, we get the key from the user. 6 00:00:12,210 --> 00:00:15,210 Then we get the plaintext that they want to encode. 7 00:00:15,210 --> 00:00:21,380 After that, we encipher it for them, and finally we print their ciphertext. 8 00:00:21,380 --> 00:00:23,600 >> So let's start with an example. 9 00:00:23,600 --> 00:00:26,920 Say you wanted to encode the entire alphabet with a key of two. 10 00:00:26,920 --> 00:00:31,360 Well, your entire alphabet would just be shifted to letters. 11 00:00:31,360 --> 00:00:37,060 So A would encode to C, B to D, C to E, so on and so forth, 12 00:00:37,060 --> 00:00:42,470 until you get to X, which encodes to Z or zed, depending on where you're from. 13 00:00:42,470 --> 00:00:47,445 Then Y would then shift all the way, wrap around the alphabet to get to A, 14 00:00:47,445 --> 00:00:53,256 and then finally the last letter of the alphabet, Z, zed, would encode to B. 15 00:00:53,256 --> 00:00:54,660 >> You got that? 16 00:00:54,660 --> 00:00:56,380 Let's look at some examples. 17 00:00:56,380 --> 00:01:00,540 The first example here is very similar to what we just explained above. 18 00:01:00,540 --> 00:01:05,560 So if I encode some section of the alphabet, A through L, by a key of two, 19 00:01:05,560 --> 00:01:09,760 then I just get my entire alphabet shifted two letters. 20 00:01:09,760 --> 00:01:12,230 >> Then, in my next example, the key is still two, 21 00:01:12,230 --> 00:01:15,080 so you should still know which letters to expect. 22 00:01:15,080 --> 00:01:16,400 But here it's a phrase. 23 00:01:16,400 --> 00:01:18,100 This is CS50. 24 00:01:18,100 --> 00:01:21,090 So you'll notice that I preserve the case of my letters, 25 00:01:21,090 --> 00:01:25,600 so any upper case letters are also upper case letters in the ciphertext. 26 00:01:25,600 --> 00:01:27,800 And any lowercase letters in the plaintext 27 00:01:27,800 --> 00:01:30,640 are also lowercase letters in the ciphertext. 28 00:01:30,640 --> 00:01:34,020 But I keep the letters and any exclamation marks 29 00:01:34,020 --> 00:01:37,610 or any other punctuation the same. 30 00:01:37,610 --> 00:01:40,360 >> So now that we have a sense for how the program works, 31 00:01:40,360 --> 00:01:43,890 feel free to go run some more examples of your own, if you wish. 32 00:01:43,890 --> 00:01:47,072 Let's start with getting the key from the user. 33 00:01:47,072 --> 00:01:48,780 Traditionally, with other problems, we've 34 00:01:48,780 --> 00:01:51,450 been accustomed to getting any numbers that we 35 00:01:51,450 --> 00:01:54,710 need by prompting the user with the function getint. 36 00:01:54,710 --> 00:01:58,850 But this time we're actually going to use the command line argument 37 00:01:58,850 --> 00:02:01,760 and a new function called atoi. 38 00:02:01,760 --> 00:02:05,130 >> When you run the main program in C, then it 39 00:02:05,130 --> 00:02:08,500 takes in two parameters-- int argc, which 40 00:02:08,500 --> 00:02:11,670 is the number of arguments passed in, and then 41 00:02:11,670 --> 00:02:15,920 argv, an array of strings which contains the list of all of the arguments 42 00:02:15,920 --> 00:02:16,740 passed. 43 00:02:16,740 --> 00:02:19,740 You don't explicitly have to declare these variables. 44 00:02:19,740 --> 00:02:22,700 They're computed for you by the compiler. 45 00:02:22,700 --> 00:02:28,160 Correct usage for this would be for argc to be two in this case, 46 00:02:28,160 --> 00:02:32,630 because the user has to pass in the call to the program, ./caesar, 47 00:02:32,630 --> 00:02:35,570 and then the key, whatever number they wish. 48 00:02:35,570 --> 00:02:39,920 So that means that argc must be two in order for a valid use of Caesar 49 00:02:39,920 --> 00:02:41,680 to be executed. 50 00:02:41,680 --> 00:02:43,590 >> So let's look at an example. 51 00:02:43,590 --> 00:02:47,760 Say I've already written and compiled a program called blastoff. 52 00:02:47,760 --> 00:02:52,670 So if I ran in the command line ./blastoff Team Rocket, well, then, 53 00:02:52,670 --> 00:02:57,750 argc would be three because I passed in three distinct arguments. 54 00:02:57,750 --> 00:02:59,830 Then argv would look like this. 55 00:02:59,830 --> 00:03:03,750 It's an array, and it would contain each of the three strings. 56 00:03:03,750 --> 00:03:09,640 ./blastoff in the first index, team in the next, and rocket in the last. 57 00:03:09,640 --> 00:03:11,610 >> Let's talk about arrays for a sec. 58 00:03:11,610 --> 00:03:15,560 Arrays are data structures that hold multiple values of the same type. 59 00:03:15,560 --> 00:03:19,980 This can come in handy when we have lists of integers or strings. 60 00:03:19,980 --> 00:03:23,030 Just remember they have to be the same type. 61 00:03:23,030 --> 00:03:25,310 In computer science, we love counting from zero, 62 00:03:25,310 --> 00:03:29,260 so remember that arrays are also zero-indexed. 63 00:03:29,260 --> 00:03:34,360 So the first element of my array is going to be at index zero. 64 00:03:34,360 --> 00:03:37,580 So in this case, when I have an array of length four, 65 00:03:37,580 --> 00:03:41,350 then the last index of the last element of my array 66 00:03:41,350 --> 00:03:44,970 is actually going to be at index three, not four. 67 00:03:44,970 --> 00:03:48,880 Because remember, we start counting at zero. 68 00:03:48,880 --> 00:03:52,530 >> Here's an example of how you might create an array of your own. 69 00:03:52,530 --> 00:03:56,440 So say I wanted to store my three favorite dog names. 70 00:03:56,440 --> 00:03:59,030 Then I would create an array of strings. 71 00:03:59,030 --> 00:04:04,450 So I would declare the type, string, and then put the name of the array, dogs, 72 00:04:04,450 --> 00:04:06,450 and then in those square brackets put the size 73 00:04:06,450 --> 00:04:09,260 of the array-- in this case, three. 74 00:04:09,260 --> 00:04:12,690 >> So my first entry is going to be dogs at index zero, 75 00:04:12,690 --> 00:04:14,750 and that's going to be Milo. 76 00:04:14,750 --> 00:04:17,850 Then dogs at index one is going to be goofy, 77 00:04:17,850 --> 00:04:23,060 darling Mochi, and then the last entry, the third entry at index two, 78 00:04:23,060 --> 00:04:26,130 is going to be cute, sweet Elphie. 79 00:04:26,130 --> 00:04:28,610 You'll notice that the format for filling in this array 80 00:04:28,610 --> 00:04:32,150 is very much like how you might declare any other variable where 81 00:04:32,150 --> 00:04:36,307 you have the variable name followed by the value that you want stored in it. 82 00:04:36,307 --> 00:04:38,140 The only difference in this case is that you 83 00:04:38,140 --> 00:04:42,700 have to remember to put the index of the value in square brackets. 84 00:04:42,700 --> 00:04:46,420 And there we have our three favorite dogs. 85 00:04:46,420 --> 00:04:48,780 >> But alas, it's time to get back to Caesar. 86 00:04:48,780 --> 00:04:52,910 Remember that correct usage for the user is going to be passing in not only 87 00:04:52,910 --> 00:04:57,430 the name of the program ./caesar, but also the key that they want to shift 88 00:04:57,430 --> 00:04:58,850 their plaintext by. 89 00:04:58,850 --> 00:05:01,540 So that means that argc must be two. 90 00:05:01,540 --> 00:05:07,580 They must pass in two-- no more, no less than two command line arguments. 91 00:05:07,580 --> 00:05:09,050 >> Now, what about argv? 92 00:05:09,050 --> 00:05:12,880 Well, we already know that the array is going to be of length two, 93 00:05:12,880 --> 00:05:15,270 but what's contained in each element? 94 00:05:15,270 --> 00:05:19,330 Well, the first element is going to be ./caesar, 95 00:05:19,330 --> 00:05:24,190 and then the next element is going to contain the key that the user typed in. 96 00:05:24,190 --> 00:05:27,480 Now, if they used it correctly for the usage of Caesar, 97 00:05:27,480 --> 00:05:29,350 then they're going to type in a number. 98 00:05:29,350 --> 00:05:33,260 But even though the character that they type is a number, 99 00:05:33,260 --> 00:05:35,790 it's of data type string. 100 00:05:35,790 --> 00:05:40,390 >> So how do we convert that string to an integer? 101 00:05:40,390 --> 00:05:46,680 So say I have num, a string, containing the string 50. 102 00:05:46,680 --> 00:05:49,000 If I want to convert that to an integer, then I simply 103 00:05:49,000 --> 00:05:53,340 declare a new variable, an integer i, calling atoi. 104 00:05:53,340 --> 00:06:01,160 I pass in my string variable, num, and then i will then contain the number 50. 105 00:06:01,160 --> 00:06:04,350 Make sure to check the man pages for the atoi function 106 00:06:04,350 --> 00:06:07,990 to check which library it's in, as well as what value it 107 00:06:07,990 --> 00:06:14,550 will return if the string passed in doesn't contain all numbers. 108 00:06:14,550 --> 00:06:16,950 >> So now that we've gotten the key, the next step 109 00:06:16,950 --> 00:06:19,430 is to get the plaintext from the user. 110 00:06:19,430 --> 00:06:21,170 Now, this is going to be less complicated 111 00:06:21,170 --> 00:06:23,410 than navigating around the command line arguments. 112 00:06:23,410 --> 00:06:26,190 All we have to do is call the getstring function 113 00:06:26,190 --> 00:06:29,660 to prompt the user to give us a string, but remember 114 00:06:29,660 --> 00:06:34,090 to check the specifications for how we might want to prompt the user for that. 115 00:06:34,090 --> 00:06:36,420 >> Now we come to the heart of the problem-- 116 00:06:36,420 --> 00:06:38,860 how to encipher the plaintext. 117 00:06:38,860 --> 00:06:42,830 Well, first, let's talk about how to encipher one character at a time, 118 00:06:42,830 --> 00:06:47,370 and then we'll address how to iterate over the entire plaintext. 119 00:06:47,370 --> 00:06:50,440 I've written some pseudocode for the Caesar problem. 120 00:06:50,440 --> 00:06:52,310 I encourage you to write your own as well. 121 00:06:52,310 --> 00:06:55,900 It might not look identical to mine, and that's OK, but as long 122 00:06:55,900 --> 00:06:58,640 as the general idea is the same. 123 00:06:58,640 --> 00:07:00,780 >> The first three steps we've already done. 124 00:07:00,780 --> 00:07:03,100 We've gotten the key from the command line argument, 125 00:07:03,100 --> 00:07:05,510 we've turned that key into an integer, and we've 126 00:07:05,510 --> 00:07:09,320 prompted the user for the plaintext that they want to encipher. 127 00:07:09,320 --> 00:07:12,420 So then the next big chunk is that for each character 128 00:07:12,420 --> 00:07:15,070 in the plaintext string, if it's alphabetic, 129 00:07:15,070 --> 00:07:17,750 we want to preserve its case and shift it. 130 00:07:17,750 --> 00:07:20,900 By preserve case, what I mean is that all upper case 131 00:07:20,900 --> 00:07:23,580 letters should remain upper case and all lowercase letters 132 00:07:23,580 --> 00:07:25,640 should remain lowercase. 133 00:07:25,640 --> 00:07:29,110 So then after we shift those, then we print the ciphertext. 134 00:07:29,110 --> 00:07:33,100 >> Here are three functions that are going to come in handy for this problem. 135 00:07:33,100 --> 00:07:38,010 Remember up above when I gave the example for shifting this is CS50? 136 00:07:38,010 --> 00:07:41,800 Remember that the 50 and the exclamation mark didn't shift? 137 00:07:41,800 --> 00:07:45,680 So how can we tell whether we need to shift a letter or not? 138 00:07:45,680 --> 00:07:48,650 Well, "is alpha," if you pass it a character, 139 00:07:48,650 --> 00:07:54,850 will return true if that character is a letter and false otherwise. 140 00:07:54,850 --> 00:07:56,870 To help you with preserving capitalization 141 00:07:56,870 --> 00:07:59,750 are the functions "is upper" and "is lower." 142 00:07:59,750 --> 00:08:03,350 >> These two functions also take in a single character as input 143 00:08:03,350 --> 00:08:06,580 and return you a Boolean, either true or false 144 00:08:06,580 --> 00:08:11,280 depending on whether that character is upper case or lower case. 145 00:08:11,280 --> 00:08:14,610 Because "is upper" and "is lower" are Boolean functions, 146 00:08:14,610 --> 00:08:18,660 meaning that they return you a Boolean, you can use these in your conditions. 147 00:08:18,660 --> 00:08:23,490 So here's a snippet of code that only prints a letter if it's upper case. 148 00:08:23,490 --> 00:08:27,790 So I've declared my character letter to be the upper case zed 149 00:08:27,790 --> 00:08:33,440 and then if "is upper" returns true, then I print that letter. 150 00:08:33,440 --> 00:08:38,200 >> Going back to our simple example of shifting the alphabet by a key of two, 151 00:08:38,200 --> 00:08:41,049 how do we actually get that to work? 152 00:08:41,049 --> 00:08:45,770 Well, it turns out that characters and integers are very closely related. 153 00:08:45,770 --> 00:08:48,840 Each character has an integer value associated 154 00:08:48,840 --> 00:08:53,260 with it found in the ASCII chart, where each character's ASCII 155 00:08:53,260 --> 00:08:55,380 value is displayed. 156 00:08:55,380 --> 00:08:58,940 So an upper case A corresponds to an ASCII value of 65 157 00:08:58,940 --> 00:09:02,270 and a lowercase a to an ASCII value of 97. 158 00:09:02,270 --> 00:09:04,940 >> Feel free to look up any ASCII chart online 159 00:09:04,940 --> 00:09:07,720 to see these values for yourself. 160 00:09:07,720 --> 00:09:12,096 So what this means is that we can take the character of upper case A, 161 00:09:12,096 --> 00:09:18,200 add the integer two to it, and then get the character upper case C as a result. 162 00:09:18,200 --> 00:09:23,720 That's because 65, the ASCII value for capital A, plus 2, 163 00:09:23,720 --> 00:09:29,960 gives us 67, which corresponds to the character of upper case C. 164 00:09:29,960 --> 00:09:33,480 >> Unfortunately, things won't quite be so simple. 165 00:09:33,480 --> 00:09:36,980 We have an equation that we have to consider. 166 00:09:36,980 --> 00:09:43,590 Here it tells us that the ith ciphertext letter corresponds to the ith plaintext 167 00:09:43,590 --> 00:09:48,900 letter, plus the key-- all of that, modular 26. 168 00:09:48,900 --> 00:09:50,810 Why is that the case? 169 00:09:50,810 --> 00:09:55,430 Let's go back to our example from before, where capital A, plus 2, 170 00:09:55,430 --> 00:09:57,590 gives us capital C. 171 00:09:57,590 --> 00:10:01,870 >> So applying the equation that the specification gives us, 172 00:10:01,870 --> 00:10:06,660 then let's take capital A plus 2 and mod that by 26. 173 00:10:06,660 --> 00:10:10,730 So capital A, when I put it in those single quotation marks, 174 00:10:10,730 --> 00:10:14,010 allows me to treat it as an integer, so that allows 175 00:10:14,010 --> 00:10:17,500 me to cast it to its ASCII value, 65. 176 00:10:17,500 --> 00:10:20,080 65 plus 2 is 67. 177 00:10:20,080 --> 00:10:25,210 67 mod 26 gives us 15, but that doesn't really 178 00:10:25,210 --> 00:10:32,590 make sense because we know that the capital C ASCII value is 67, not 15. 179 00:10:32,590 --> 00:10:35,580 So how do we reconcile that? 180 00:10:35,580 --> 00:10:39,840 >> Well, here I'd like to posit the notion of an alphabetical index. 181 00:10:39,840 --> 00:10:44,010 So we've already talked about how each character has its ASCII value, 182 00:10:44,010 --> 00:10:48,810 but I'd like to say, well, let's think about each character also having 183 00:10:48,810 --> 00:10:52,230 an alphabetical index, where A for instance, 184 00:10:52,230 --> 00:10:58,800 as the first letter of the alphabet, has an alphabetical index of zero. 185 00:10:58,800 --> 00:11:02,070 So now let's apply the same equation as before, 186 00:11:02,070 --> 00:11:05,040 but using an alphabetical index. 187 00:11:05,040 --> 00:11:07,810 >> So A is zero, as we've defined. 188 00:11:07,810 --> 00:11:15,640 So then taking zero plus two, mod 26, that's two, mod 26, which gives us two. 189 00:11:15,640 --> 00:11:18,780 And well, in terms of an alphabetical index, 190 00:11:18,780 --> 00:11:23,930 C is the third letter in the alphabet, so that would correspond 191 00:11:23,930 --> 00:11:26,290 to an alphabetical index of two. 192 00:11:26,290 --> 00:11:29,850 So it seems that using the alphabetical index in this case 193 00:11:29,850 --> 00:11:32,840 actually gives us the correct result. 194 00:11:32,840 --> 00:11:35,020 >> So now let's check to see if the equation works 195 00:11:35,020 --> 00:11:37,210 with an alphabetical index. 196 00:11:37,210 --> 00:11:42,540 Y's alphabetical index is 24 as the second to last letter in the alphabet. 197 00:11:42,540 --> 00:11:46,520 So then 24 plus our key of two gives us 26. 198 00:11:46,520 --> 00:11:54,750 26 mod 26 gives us 0, which, lucky for us, is the alphabetical index for A. 199 00:11:54,750 --> 00:11:59,100 So hopefully that's proof enough that the alphabetical index method works. 200 00:11:59,100 --> 00:12:03,180 If not, feel free to try out some examples of your own. 201 00:12:03,180 --> 00:12:08,030 >> In order to properly wrap around the alphabet and apply the Caesar equation, 202 00:12:08,030 --> 00:12:11,280 then we know that we need to deal with alphabetical indices. 203 00:12:11,280 --> 00:12:15,130 But we start with ASCII values, and only then 204 00:12:15,130 --> 00:12:18,530 do we then convert to the alphabetical index. 205 00:12:18,530 --> 00:12:23,970 From there, in order to print, we need to deal with the ASCII values again. 206 00:12:23,970 --> 00:12:28,350 So we need to figure out how to go from ASCII to alphabetical 207 00:12:28,350 --> 00:12:31,080 and from alphabetical to ASCII. 208 00:12:31,080 --> 00:12:34,910 >> So I leave it to you to figure out the pattern between a character 209 00:12:34,910 --> 00:12:38,590 and its alphabetical index and its ASCII value. 210 00:12:38,590 --> 00:12:41,530 Now, remember that even though this table right on the slide 211 00:12:41,530 --> 00:12:45,630 shows the capital letters, we also have to consider whether or not 212 00:12:45,630 --> 00:12:48,915 a different pattern applies for the lower case characters. 213 00:12:48,915 --> 00:12:52,070 214 00:12:52,070 --> 00:12:55,840 >> So now that we've figured out how to shift a single character, 215 00:12:55,840 --> 00:13:02,200 then all we have to do is scale that up to go across the entire plaintext. 216 00:13:02,200 --> 00:13:04,260 So the plaintext is a string. 217 00:13:04,260 --> 00:13:08,210 Lucky for us, a string is really just an array of characters, 218 00:13:08,210 --> 00:13:12,150 so in order to access every character of a string, all you have to do 219 00:13:12,150 --> 00:13:14,270 is to use array notation. 220 00:13:14,270 --> 00:13:20,270 Say I have a variable of type string called "text='this is CS50.'" 221 00:13:20,270 --> 00:13:22,730 >> Well, then, in order to access each character, 222 00:13:22,730 --> 00:13:25,880 all I have to do with the variable text is 223 00:13:25,880 --> 00:13:31,660 to say well, text at index zero, that corresponds to capital T. Text at index 224 00:13:31,660 --> 00:13:35,100 one corresponds to the lower case h. 225 00:13:35,100 --> 00:13:38,110 Another useful function is the string length function. 226 00:13:38,110 --> 00:13:40,760 So passing in a string to that function will return 227 00:13:40,760 --> 00:13:44,610 an integer, the length of that string. 228 00:13:44,610 --> 00:13:47,060 >> Now that we've talked about all these different elements, 229 00:13:47,060 --> 00:13:48,540 let's put them back together. 230 00:13:48,540 --> 00:13:52,210 So return to either my pseudocode code or your pseudocode 231 00:13:52,210 --> 00:13:55,920 and go through and make sure that you know how to do every single thing. 232 00:13:55,920 --> 00:14:01,520 Getting the key using argc and argv, turning the key into an integer, a 233 00:14:01,520 --> 00:14:06,840 to i, prompting for plaintext, getstring, and then iterating 234 00:14:06,840 --> 00:14:09,590 over every character in the plaintext string, 235 00:14:09,590 --> 00:14:14,910 preserving the case of each character and shifting that character by the key, 236 00:14:14,910 --> 00:14:17,520 ensuring that you're wrapping around the alphabet, 237 00:14:17,520 --> 00:14:20,850 finally printing that ciphertext. 238 00:14:20,850 --> 00:14:25,470 My name is Amila, and this was Caesar. 239 00:14:25,470 --> 00:14:28,448