## Week 2 Wednesday Andrew Sellergren ### Announcements and Demos (0:00-5:00) + This is CS50. + Sign up for [CS50 Lunch](http://cs50.net/lunch) if you're interested! + If you're interested in helping us translate CS50's content into a language you are fluent in, [let us know](http://cs50.net/translate)! + Hopefully we didn't scare you too much with our discussion of academic honesty last time. Know that you are free to show each other snippets of code, but you should draw the line at directly copying them. + Problem Set 2 will dive into the world of cryptography, the art and science of encrypting or scrambling information in order to protect it. The notion of password strength is one aspect of cryptography that we'll talk about. + A word on pass/fail: if you're feeling overwhelmed and you're thinking about leaving the course altogether, consider taking it pass/fail instead. It's a wonderful way of trying something out and learning a lot along the way. + We've rigged Sanders with wifi! The SSID is CS50 and the password is 12345678. Har, har, har. ### Real World Security (5:00-13:00) + Speaking of password strength, you might want to consider changing your password if it's the same as [King Roland's](http://www.youtube.com/watch?v=a6iW-8xPw3k) in Spaceballs. + The following are both examples of encrypted text: > Or fher gb qevax lbhe Binyvar. > {crypt}$1$LlBcWwQn$pxTB3yAjbVS/HTD2xuXFI0 + You can see that the first one resembles English text while the second one, an entry from the Linux passwords file, isn't even readable. + As another concrete example of security (or insecurity) in the real world, take a look at [`iUnlock.c`](http://cdn-local.cs50.net/2012/fall/lectures/2/src2w/iUnlock.c), the very first iPhone jailbreaker. This program exploits some flaws in Apple's software and unlocks the operating system to enable installation of third-party apps and other customization. Although it might seem at first glance to be way more complex than anything we've done so far in this course, if you look more closely you can see that it follows all of the same rules and logic. ### I Demand A Positive Integer! (13:00-19:00) + Recall our use of do-while loops to get user input: /**************************************************************************** * positive1.c * * David J. Malan * malan@harvard.edu * * Demands that user provide a positive number. * * Demonstrates use of do-while. ***************************************************************************/ #include #include int main(void) { // loop until user provides a positive integer int n; do { printf("I demand that you give me a positive integer: "); n = GetInt(); } while (n < 1); printf("Thanks for the %d!\n", n); return 0; } + Why did we declare `n` outside of the curly braces? If we didn't, then we would only be able to access the value of `n` while inside the curly braces. This is the issue of scope that we discussed last time. + `positive2.c` illustrates the same functionality but with the use of a Boolean variable: /**************************************************************************** * positive2.c * * David J. Malan * malan@harvard.edu * * Demands that user provide a positive number. * * Demonstrates use of bool. ***************************************************************************/ #include #include int main(void) { // loop until user provides a positive integer bool thankful = false; do { printf("I demand that you give me a positive integer: "); if (GetInt() > 0) thankful = true; } while (thankful == false); printf("Thanks for the positive integer!\n"); return 0; } + The `bool` type doesn't come built in to C; you must include the CS50 Library to use it. The variable `thankful` initially takes the value of false and only takes the value of true when the user provides a positive integer. The while loop continues executing until `thankful` is true, so ultimately we get the same result as `positive1.c`. Arguably, though, `positive2.c` is more human readable. + Notice that in this case, we're not saving the return value of `GetInt`. Rather, we're immediately comparing it to 0 without assigning it to a variable. + We can use some syntactic sugar to make this program more compact: /**************************************************************************** * positive3.c * * David J. Malan * malan@harvard.edu * * Demands that user provide a positive number. * * Demonstrates use of !. ***************************************************************************/ #include #include int main(void) { // loop until user provides a positive integer bool thankful = false; do { printf("I demand that you give me a positive integer: "); if (GetInt() > 0) thankful = true; } while (!thankful); printf("Thanks for the positive integer!\n"); return 0; } + The `!` or bang operator stands for "not." Thus, we can read `while(!thankful)` as "while not thankful." ### Functions (19:00-57:00) + Last time, we wrote a program which uses a function to cube a value: /**************************************************************************** * return.c * * David J. Malan * malan@harvard.edu * * Cubes a variable. * * Demonstrates use of parameter and return value. ***************************************************************************/ #include // function prototype int cube(int a); int main(void) { int x = 2; printf("x is now %d\n", x); printf("Cubing...\n"); x = cube(x); printf("Cubed!\n"); printf("x is now %d\n", x); return 0; } /** * Cubes argument. */ int cube(int a) { return a * a * a; } + Whereas `printf` didn't return anything and only had a side effect, `cube` actually returns a value we're interested in. + To better understand functions, we can simulate them onstage with a volunteer. David writes "hello, world" on a piece of paper and passes it to our volunteer Ken who then writes it on his iPad to show us all. This is analogous to `main` passing "hello, world" as an argument to `printf`, which then writes it out to the screen. + Consider the line of code: `x = cube(x)`. David writes down 2, the value of `x`, and hands it to Ken. Ken cubes it and writes down 8 on the iPad. This time, he hands the iPad back to David. But what happened to our original value of `x`? Ken still has it. Turns out when you pass a value to a function, you're actually just passing a copy of the value. + `buggy3.c` demonstrates how passing by value can be problematic: /**************************************************************************** * buggy3.c * * David J. Malan * malan@harvard.edu * * Should swap two variables' values, but doesn't! * Can you find the bug? ***************************************************************************/ #include // function prototype void swap(int a, int b); int main(void) { int x = 1; int y = 2; printf("x is %d\n", x); printf("y is %d\n", y); printf("Swapping...\n"); swap(x, y); printf("Swapped!\n"); printf("x is %d\n", x); printf("y is %d\n", y); return 0; } /** * Swap arguments' values. */ void swap(int a, int b) { int tmp = a; a = b; b = tmp; } + In this program, we initialize two variables, `x` and `y`, to the values 1 and 2, respectively. First, we print out those values. Second, we call a function named `swap` which, as its name implies, should swap the values of `x` and `y`. Third, we print out the values of `x` and `y` to verify that the swap has occurred. + If we compile and run this program, we see that the values of `x` and `y` are not actually swapped. What's the problem with `swap`? When we call `swap(x, y)`, we're actually passing copies of `x` and `y`. `swap` stores these copies in variables `a` and `b` and actually does swap them. However, when `swap` returns, the values of `a` and `b` are thrown away. `a` and `b` are limited to the scope of `swap`. + The fact that we call `swap`'s temporary variables `a` and `b` has no bearing on this bug. Even if we called them `x` and `y`, the bug would persist. The temporary variables used within functions can be completely different in name from the variables which are passed to the function as arguments. This makes sense since the person who wrote the function isn't always the same person who calls the function. + Maybe we can fix `swap` by writing it like so: /** * Swap arguments' values. */ void swap(int a, int b) { a = b; b = a; } + Actually, this would introduce another bug. After the first line of `swap` executes, the value of `a` is lost entirely. After the second line, both `a` and `b` will have the value of `b`. The original version of `swap` avoided this problem by using a third temporary variable to store the original value of `a`. + How do we fix the original bug with `swap`? We can't use return values to do so because we would need to return two, which C doesn't allow. We'll need to introduce some new concepts to solve this problem, so we'll come back to it later. + `buggy4.c` demonstrates the same bug as `buggy3.c`, but perhaps more simply: /**************************************************************************** * buggy4.c * * David J. Malan * malan@harvard.edu * * Should increment a variable, but doesn't! * Can you find the bug? ***************************************************************************/ #include // function prototype void increment(void); int main(void) { int x = 1; printf("x is now %d\n", x); printf("Incrementing...\n"); increment(); printf("Incremented!\n"); printf("x is now %d\n", x); return 0; } /** * Tries to increment x. */ void increment(void) { x++; } + `x` exists only within `main`, not within `increment`. This program won't even compile. The error it gives is as follows: buggy4.c:32:5: error: use of undeclared identifier 'x' + We've seen this "undeclared identifier" error before, but note also that the first part of the message tells us the file name, line number, and character position of the error. Whenever you see compiler errors, you should aim to fix them from top to bottom, since the errors may be compounding each other. + One way to fix `buggy4.c` is to make `x` a global variable: /**************************************************************************** * buggy4.c * * David J. Malan * malan@harvard.edu * * Should increment a variable, but doesn't! * Can you find the bug? ***************************************************************************/ #include // function prototype void increment(void); int x; int main(void) { x = 1; printf("x is now %d\n", x); printf("Incrementing...\n"); increment(); printf("Incremented!\n"); printf("x is now %d\n", x); return 0; } /** * Tries to increment x. */ void increment(void) { x++; } + Now that `x` is declared outside the scope of both `main` and `increment`, it can be accessed within both functions. However, global variables are generally considered to be bad design, in part because it's hard to keep track of them as programs get more complicated. + Another way to fix this program would be to change the definition of `increment` to take in an argument and return a value: /**************************************************************************** * buggy4.c * * David J. Malan * malan@harvard.edu * * Should increment a variable, but doesn't! * Can you find the bug? ***************************************************************************/ #include // function prototype int increment(int); int main(void) { int x = 1; printf("x is now %d\n", x); printf("Incrementing...\n"); x = increment(x); printf("Incremented!\n"); printf("x is now %d\n", x); return 0; } /** * Tries to increment x. */ int increment(int number) { return number + 1; } + We have to change the prototype of `increment` so that it takes in an `int` and returns an `int`. We also now need to assign the return value of `increment` to a variable. Once we've done so, our program actually increments the value of `x` as intended. + Question: how can we change `x` to `number` without the program getting confused? Think back to when Ken was onstage. Once David passed the value to Ken, he doesn't care what Ken calls the value. Many of you probably know how to drive a car, but you don't have to know how exactly the car works in order to do so. This is just an implementation detail that you don't need to concern yourself with. + Question: does order matter when calling functions? Yes. The first argument you pass will become the first argument in the function's prototype. #### The Stack + So far, we haven't given much thought to representing a program's memory. We've talked about how much memory different variable types take up, but that's about it. To get more explicit about how a program's memory is laid out, let's consider the following diagram: !["A program's memory."](program_memory.png "A program's memory.") + At the top of the program's memory is the text segment, which contains the actual 0's and 1's that make up your program. The sections for initialized and uninitialized data are used for storing global variables, if your program has any. Most of a program's memory is reserved for the stack and heap. We won't talk about the heap today, but the stack consists of chunks of memory piled on top of each other. Just like the stack of trays in the dining hall, the stack of variables in a program requires that the one on top be taken off before any of the others can be taken off. Let's zoom in on the stack: !["The stack."](stack.png "The stack.") + By convention, the memory at the bottom of the stack is numbered lowest and the memory at the top of the stack is numbered highest. The memory at the bottom of the stack is first used for `main`. Any functions that `main` calls have their memory stacked on top. As those functions return, their chunks of memory are popped off the stack. + Since a program's memory is finite, it's easy to imagine how it might run out: #include void chorus(void); int main(void) { chorus(); } void chorus(void) { chorus(); } + `chorus` is a function which does nothing but call itself. If we compile and run this program, we get a segmentation fault almost immediately. If we add a call to `printf` within `chorus`, we can see that the function is being called infinitely. Going back to our diagram of the stack, this would imply that an infinite number of chunks of memory are being added on top of `main`. At some point, we run out of places to put those chunks of memory. + It turns out that storing a string in memory is simply a matter of storing each of its characters next to each other. Because the characters are stored next to each other, we can iterate over them using loops. As we iterate over them, we might decide to change them so that the end result is an encrypted string. #### Short Recap + To define a function, we have to first specify the types and names of its arguments as well as the type of its return value: int increment(int number) + Here, the return type is `int` and the single argument is an `int` named `number`. + We can divide a program's memory into sections. The topmost section is the text, which contains the 0's and 1's of the program. `main`'s variables are stored at the bottom of the section of memory known as the stack. As other functions are called, their arguments and variables are stacked on top of `main`'s. The function might modify the contents of its own memory, but ultimately that memory is throw away when the function returns. + Question: where are global variables stored? In the initialized data (if it is immediately given a value) or in the uninitialized data (if it is not immediately given a value) sections. ### Intro to Cryptography (57:00-64:00) + As its name implies, secret-key cryptography depends on a secret key. In grade school, if you decided to pass a note to your crush, you might have first scrambled it. You might have, for example, translated each letter in the note to a different letter. The catch is that your crush would have needed to know how you translated those letters. This knowledge of how you translated the message is the secret key in this case. + But there's a catch-22 here. If this is the first time you've communicated with your crush, how can you tell him or her the secret key in a secure way? Maybe you agreed upon it beforehand. But in the case where you are purchasing something on Amazon, you probably don't know anyone that works there so you haven't agreed on a secret key ahead of time. The security of your Amazon transactions, then, must rely on something other than secret-key cryptography. + Little Orphan Annie actually made [use of secret-key cryptography](http://www.youtube.com/watch?v=zdA__2tKoIU) in order to communicate a message to Ralphie in A Christmas Story. ### Arrays (64:00-76:00) + When we want to store a set of related variables of the same type, we can make use of an array: /**************************************************************************** * array.c * * David J. Malan * malan@harvard.edu * * Computes a student's average across 2 quizzes. * * Demonstrates C's math library. ***************************************************************************/ #include #include #include // number of quizzes per term #define QUIZZES 2 int main(void) { // ask user for grades float grades[QUIZZES]; printf("\nWhat were your quiz scores?\n\n"); for (int i = 0; i < QUIZZES; i++) { printf("Quiz #%d of %d: ", i+1, QUIZZES); grades[i] = GetFloat(); } // compute average float sum = 0; for (int i = 0; i < QUIZZES; i++) sum += grades[i]; int average = (int) round(sum / QUIZZES); // report average printf("\nYour average is: %d\n\n", average); return 0; } + We could, in fact, implement this same program using two `int` variables rather than an array of `int`. However, consider that if the number of quizzes changes to 3, 4, 5, or 10. Suddenly we have a lot more variables to keep track of. + `QUIZZES` is a constant. The convention for constants in C is to write them in all capital letters. To define them, we use the `#define` keyword, which is a *preprocessor directive*. + The line `float grades[QUIZZES]` declares an array of `float` with the name `grades` and of size 2 (because `QUIZZES` is actually just the number 2). + To access elements of an array, we use square bracket notation, e.g. `grades[i]`. If `i` is 0, then `grades[0]` accesses the first element of the array. Note that the indices of an array start from 0. + Interestingly, strings can be treated as arrays of characters: /**************************************************************************** * string1.c * * David J. Malan * malan@harvard.edu * * Prints a given string one character per line. * * Demonstrates strings as arrays of chars and use of strlen. ***************************************************************************/ #include #include #include int main(void) { // get line of text string s = GetString(); // print string, one character per line if (s != NULL) { for (int i = 0; i < strlen(s); i++) { char c = s[i]; printf("%c\n", c); } } return 0; } + Here, `s[i]` gives the character at index `i` in the string. `strlen` is a function which gives the number of characters in a string. Thus, this program loops through the characters in a user-inputted string and prints them one to a line. + We're also doing some error checking. There's a special value called `NULL` which could be returned by `GetString` if the user gives us some bad input. We check that the user hasn't given us bad input before looping through it. + `string2.c` has one optimization over `string1.c`: /**************************************************************************** * string2.c * * David J. Malan * malan@harvard.edu * * Prints a given string one character per line. * * Demonstrates strings as arrays of chars with slight optimization. ***************************************************************************/ #include #include #include int main(void) { // get line of text string s = GetString(); // print string, one character per line if (s != NULL) { for (int i = 0, n = strlen(s); i < n; i++) { printf("%c\n", s[i]); } } return 0; } + Here, we call `strlen` in the initialization condition rather than in the termination condition. This way, we don't call it on every single iteration of the loop. Instead, we call it once at the beginning of the loop and store its value in a temporary variable `n`. + In Problem Set 2, you'll be looping through strings in order to encrypt and decrypt them. #### Command-line Arguments + Just as we can pass arguments to functions, we can pass arguments to programs at the command line. These are received by a program as variables to `main` named `argc`, the number of command-line arguments, and `argv`, an array of the arguments themselves. The program [`argv1.c`](http://cdn-local.cs50.net/2012/fall/lectures/2/src2w/argv1.c) is a simple proof of concept for accessing these command-line arguments.