### Announcements and Demos (0:00-1:00) + This is CS50. + Our focus over the next few weeks will be much more on the higher-level concepts rather than on the lower-level details of syntax. Whew! ### From Last Time (1:00-12:00) + We've looked at arrays in both Scratch and C. In the context of the actual computer, an array is simply a collection of contiguous chunks of memory. Last Wednesday, we saw that an array could be useful for storing quiz grades. If we had instead used individual variables for each quiz grade and there had been 10, 20, or 30 quizzes, the number of variables we would need to keep track of would be unwieldy. Instead, we were able to store all of our quiz grades in a single variable using this syntax: float grades[QUIZZES] + Recall that `QUIZZES` is actually just a number that we defined as a constant. So, the above is equivalent to: float grades[2] + The motivation for abstracting away this number as a constant is that if we ever want to change, we only need to change it in one place: the top of the file. If we didn't use a constant, the number 2 would appear periodically throughout our program and if we wanted to change it to 3, we would have to change every single instance of it. + We use the square brackets to declare an array, but also to assign values to it: grades[i] = GetFloat(); + This line assigns a `float` from the user in the `i`th location of the array. In the case of the declaration, the line begins with a data type, e.g. `float` above. + What if we try to assign to `grades[2]` when the array is only of size 2? In an array of size 2, `grades[0]` and `grades[1]` are both valid locations, but `grades[2]` is one off the end. In some other languages, arrays can be dynamic in size, but in C, they are fixed in size. Trying to access a location beyond the end of the array can cause serious problems, as we'll see later. + In `array.c`, we use the following line to compute the average of the quiz grades: int average = (int) round(sum / QUIZZES); + Writing `(int)` is a way of casting, or converting, another data type (in this case, the return value of `round` which is a `float`) to an `int`. + Question: what happened to the `\0` at the end of the array? This only appears at the ends of strings, i.e. arrays of `char`, not arrays of other data types. + David often uses the phrase "ask the operating system." This is just a colloquial description of what's going on when we declare a variable. Effectively, we're requesting memory to store that variable and it's the operating system that can provide that memory. ### Things as Other Things (12:00-31:00) #### Strings as Arrays + Last week, we briefly touched upon the idea that strings are actually just arrays of characters. Given this knowledge, we can iterate through the characters in strings just as we can the elements of any array: /**************************************************************************** * string1.c * * David J. Malan * malan@harvard.edu * * Prints a given string one character per line. * * Demonstrates strings as arrays of chars and use of strlen. ***************************************************************************/ #include #include #include int main(void) { // get line of text string s = GetString(); // print string, one character per line for (int i = 0; i < strlen(s); i++) { char c = s[i]; printf("%c\n", c); } return 0; } + In our for loop, we iterate from 0 to `strlen(s)`, the length of the string. In each iteration of the loop, we access `s[i]`, the `i`th character of string `s`, and store it in a temporary variable `c`. If our string `s` is "hello," then `s[0]` is "h," `s[1]` is "e," `s[2]` is "l," and so on. Each of these characters is printed on its own line by this program. + One optimization we can make here is to pass `s[i]` directly to `printf` rather than using the temporary variable `c`: printf("%c\n", s[i]); + Notice that our formatting character is `%c`, not `%s`, because we're printing a `char`, not a `string`. + Another optimization we can make is to avoid calling the `strlen` function on every iteration of the loop: /**************************************************************************** * string2.c * * David J. Malan * malan@harvard.edu * * Prints a given string one character per line. * * Demonstrates strings as arrays of chars with slight optimization. ***************************************************************************/ #include #include #include int main(void) { // get line of text string s = GetString(); // print string, one character per line for (int i = 0, n = strlen(s); i < n; i++) { printf("%c\n", s[i]); } return 0; } + The length of the string isn't going to change between iterations of the loop, so we really should only calculate it once at the beginning of the loop and store its value in a temporary variable, in this case `n`. This speed optimization is not without a memory cost, however. We have to store an extra 32 bits now. In this case, it's probably worth it. In other cases, you'll need to make a design decision as to which is more important. #### Characters as Numbers + Ralphie taught us that the text "Or fher gb qevax lbhe Binygvar" actually means "Be sure to drink your Ovaltine" if you decrypt it using the ROT-13 cipher in which every character is rotated around the alphabet halfway. In Problem Set 2, you'll play around with this kind of encryption, but to do so, you'll need to know one important fact: a `char` in C can be treated as an `int` and vice versa. We take advantage of this in `ascii.c`: /**************************************************************************** * ascii.c * * David J. Malan * malan@harvard.edu * * Displays the mapping between alphabetical ASCII characters and * their decimal equivalents using one column. * * Demonstrates casting from int to char. ***************************************************************************/ #include int main(void) { // display mapping for uppercase letters for (int i = 65; i < 65 + 26; i++) printf("%c: %d\n", (char) i, i); // separate uppercase from lowercase printf("\n"); // display mapping for lowercase letters for (int i = 97; i < 97 + 26; i++) printf("%c: %d\n", (char) i, i); return 0; } + This program prints out an ASCII chart with only a few lines of code. We start our loops iterating at 65 and 97, the numbers for "A" and "a." In each iteration of each loop, we substitute `(char) i` into a `%c` placeholder. It seems that simply by casting an `int` to a `char`, we get the ASCII letter that the number stands for. Actually, we don't even need to explicitly cast the `int`. We can substitute `i` into `%c` and `printf` will implicitly cast the number for us. + To get "O" from "B" as we did in Ralphie's encrypted text, we take 13 steps from "B" to "C" to "D" and so on. In ASCII, "B" is represented by the number 66. Adding 13 to this gives us 79, which sure enough represents "O" in ASCII. What is the letter "Z" encrypted to by ROT-13? In ASCII, "Z" is represented by the number 91. Adding 13 to this gives us 104, which does not represent an uppercase letter in ASCII. Thus, when we encrypt letters closer to the end of the alphabeth than the front, we want to wrap around to the front of the alphabet. The modulo operator (`%`) will accomplish this for us: int x = y % z; + Modulo is a fancy way of saying "remainder." The above means "divide `y` by `z` and store the remainder in `x`." If `y` is 3 and `z` is 2, then the value 1 will be stored in `x`. + Question: what does it mean when a variable has an asterisk in front of it? This is what's called a *pointer*, but we'll discuss it in more detail later. + Question: how do you represent a negative number with an `int`? To oversimplify things for now, just remember that the leftmost bit is a 1 if it's negative and 0 if it's positive. In reality, the representation is accomplished via *two's complement* and is a little more complicated. ### Command-line Arguments (31:00-42:00) + Programs, like functions, can take in arguments. In fact, you've already passed arguments to a program even if you weren't aware of it. Whenever you ran `clang`, you passed in the name of your source code file as the first argument. The value of this argument influences the behavior of `clang`. + So far, most of our programs have started with the following: int main(void) + However, another legitimate declaration of `main` is as follows: int main(int argc, string argv[]) + `argc` is an argument to `main` which stands for "argument count," that is, how many command-line arguments the program was passed. If you ran `clang mario.c`, `argc` would take the value of 2 because the name of the program is counted as an argument. `argv` is an array of strings, each of which is a command-line argument that was passed to the program. Note that the square brackets have no number in them since the function `main` can take an array of any size. + With command-line arguments, you can accept user input when the program is first run instead of having to use `GetString` to accept it while the program is running. Imagine if you had to run `clang`, hit Enter, then type the name of the program you want to compile, then hit Enter again. Would be more annoying than e-mails from David! + As it turns out, `string` is a shorthand we granted you for `char *`, so the more proper way to declare `main` is: int main(int argc, char* argv[]) + Take a look at this program which simply prints out the command-line arguments it's been passed: /**************************************************************************** * argv1.c * * David J. Malan * malan@harvard.edu * * Prints command-line arguments, one per line. * * Demonstrates use of argv. ***************************************************************************/ #include #include int main(int argc, string argv[]) { // print arguments printf("\n"); for (int i = 0; i < argc; i++) printf("%s\n", argv[i]); printf("\n"); return 0; } + `argv[i]` gives us the `i`th argument that was passed to `argv1`. + As an aside, the words "foo," "bar," "baz," and "qux" are placeholders that computer scientists use when they need a random word to use in a demonstration. + If `argv` is an array of strings, and strings are arrays of characters, then `argv` is an array of arrays of characters. Interesting. If we use two loops, then, we can iterate over every character in every argument passed to our program: /**************************************************************************** * argv2.c * * David J. Malan * malan@harvard.edu * * Prints command-line arguments, one character per line. * * Demonstrates argv as a two-dimensional array. ***************************************************************************/ #include #include #include int main(int argc, string argv[]) { // print arguments printf("\n"); for (int i = 0; i < argc; i++) { for (int j = 0, n = strlen(argv[i]); j < n; j++) printf("%c\n", argv[i][j]); printf("\n"); } return 0; } + `argv[i][j]` gives us the `j`th character of the `i`th command-line argument. + Note that because we're using the `strlen` function, we have to include the `string.h` library. Because we're using the keyword `string`, we have to include the CS50 Library. ### Strings in Memory (42:00-58:00) + What's really going on underneath the hood when `GetString` returns user input? If the user types "hello," `GetString` has to ask the operating system for at least 5 bytes of memory (1 per character) in which to store this string. But let's say that we call `GetString` again and the user types "bye" this time. If "bye" is stored directly next to "hello" in memory, how do we know where one string ends and the other begins? We need a marker to tell us that we're at the end of a string. This marker is called the *null terminator* and is represented as `\0`. The null terminator is literally the number 0 (not the ASCII character "0"). + Even with the null terminator in place, `strlen` returns 5 for the string "hello." However, the operating system is using 6 bytes, not 5, to store this 5-character string. If you had to implement `strlen` yourself, you might try something like the following: #include #include int strlen(string input) { int i = 0; int length = 0; while (input[i] != '\0') length++; return length; } int main(void) { printf("Give me a string: "); string s = GetString(); printf("The length is %d\n", strlen(s)); return 0; } + Note that when writing single characters like `\0`, we have to use single quotes, not double quotes. + Here, we've defined our own function named `strlen`; we're not including `string.h` at all. But, oops, it turns out that this program won't compile because `clang` already knows there's a function named `strlen`. Let's rename our function to `length`: #include #include int length(string input) { int i = 0; int length = 0; while (input[i] != '\0') length++; return length; } int main(void) { printf("Give me a string: "); string s = GetString(); printf("The length is %d\n", length(s)); return 0; } + When we run this program and enter a string at the prompt, it appears to hang. Why? Our while loop is infinite because we're never updating `i`. Know that Ctrl + C will kill your program when it's hung. #include #include int length(string input) { int i = 0; int length = 0; while (input[i] != '\0') { length++; i++; } return length; } int main(void) { printf("Give me a string: "); string s = GetString(); printf("The length is %d\n", length(s)); return 0; } + Woohoo, it works! But let's optimize it: #include #include int length(string input) { int length = 0; while (input[length] != '\0') length++; return length; } int main(void) { printf("Give me a string: "); string s = GetString(); printf("The length is %d\n", length(s)); return 0; } + Even better! We got rid of the redundant variable `i`. + Keep in mind that strings can certainly contain multiple words. The space character is not to be confused with the null terminator. + Even though earlier we preached the value of not calling `strlen` on every iteration of a loop, this might not have been necessary. Compilers are often smart enough to know that the return value of a function like this won't change, so they'll optimize it for us by only calling it once and storing the result. Still, for the sake of your problem sets, it's best not to assume! ### The Size of the Problem (58:00-63:00) + A few weeks ago, we talked about solving the problem of counting the number of students in the lecture hall. Here, we talked about solving the problem of counting the number of letters in a string. In the former case, the size of the problem was a few hundred. In the latter case, the size of the problem was 5. More generally, we can say that the size of both problems was `n`, or some arbitrary number. + If we counted the students in the lecture hall one at a time, how many steps would it take to solve the problem? It would take *n*, the same number of steps as there are students. If we instead used the algorithm in which we divided the problem in half at each step, it would take log *n* steps. + We don't just want to solve problems, we want to solve problems efficiently. Let's say we have 8 doors behind which are random numbers. How can we go about finding the number 7? As a human, we would probably just open the doors one at a time. In the worst case (assuming the number 7 is actually behind one of the doors), it would take us 8 steps to solve this problem. Next time, we'll talk more about generalizing this notion of the worst-case solution to problems.