### Announcements and Demos (0:00-3:00) + This is CS50. + [Good news, everyone](http://www.youtube.com/watch?v=1D1cap6yETA)! No lecture on Monday, no problem set next week. Buuuuuut, Quiz 0 is on Wednesday and we have lecture on Friday to keep us on track. + Quiz 0 will mostly be short answer, multiple choice, true/false questions along with some short coding exercises. The best guide for the format is [previous quizzes](https://www.cs50.net/quizzes/). Note that the curriculum has changed slightly from year to year, so if there's a question that covers a topic you don't think we've covered, ignore it. + A course-wide review will be held in Northwest B103 this Sunday, October 7, at 7:30 p.m. Sections will also be focused on review this week. + If you're feeling overwhelmed, don't worry, you're not alone! We have ample support structure in place to help you out! If you really want to take the edge off the course, though, consider taking it pass/fail. You have until this coming Tuesday to change your grading status. If you're thinking of dropping the course altogether, please reach out to David or one of us on the staff to discuss it. ### Strings as Pointers (3:00-65:00) + As we told you before, the `string` data type is one that we provided to you with the CS50 Library and it's really just a synonym for `char *`. The `*` implies that this data type is a pointer, or a memory address. A memory address is simply a number that designates a particular byte among the billions that make up your RAM. + Consider this simple example: #include #include int main(void) { printf("Give me a string: "); string s = GetString(); printf("Give me another string: "); string t = GetString(); if (s == t) printf("You typed the same thing!\n"); else printf("You typed something different!\n"); return 0; } + However, if we type the same string twice as input to this program, we still get the "You typed something different!" message. The problem is that we're comparing the memory addresses in which the strings are stored, not the strings themselves. + In most systems, pointers take up 32 bits of memory. If you have a 64-bit system, it implies that pointers take up 64 bits of memory. If you have 32 bits with which to represent memory addresses, then you can represent some 4 billion addresses. This implies that 32-bit systems can only support 4 gigabytes of RAM. + Before we initialize `s`, it contains a garbage value from when its memory was being used by another function or program. `GetString` does the work of finding memory in which to put the string the user types. Then it returns the address of this memory (specifically the address of othe first character in the string) to be stored in `s`. Given this address of the first `char`, the rest of the string can be found by iterating until the null terminator character `\0` is encountered. Considering this, `strlen` is pretty simple: it takes a pointer, iterates to the null terminator, and counts the number of steps it takes. + Although the strings we typed were the same in value, they were stored in separate chunks of memory. Thus the value of `s` might be 123 whereas the value of `t` might be 456. Thus, we when we compare `s` and `t` we're really comparing 123 and 456. In general, we don't care what these values are, so when we represent pointers pictorially, we usually just draw an arrow to the memory that they point to. + Instead of the `==` operator, we really should use the `strcmp` function: /**************************************************************************** * compare2.c * * David J. Malan * malan@harvard.edu * * Compares two strings. * * Demonstrates strings as pointers to characters. ***************************************************************************/ #include #include #include int main(void) { // get line of text printf("Say something: "); char* s = GetString(); // get another line of text printf("Say something: "); char* t = GetString(); // try to compare strings if (s != NULL && t != NULL) { if (strcmp(s, t) == 0) printf("You typed the same thing!\n"); else printf("You typed different things!\n"); } return 0; } + `strcmp` returns 0 if two strings are equal, and -1 or 1 depending on the alphabetical ordering of the strings. This is useful in searching and sorting strings. + Now that we know how to compare strings, let's try copying them: /**************************************************************************** * copy1.c * * David J. Malan * malan@harvard.edu * * Tries and fails to copy two strings. * * Demonstrates strings as pointers to arrays. ***************************************************************************/ #include #include #include #include #include int main(void) { // get line of text printf("Say something: "); string s = GetString(); if (s == NULL) return 1; // try (and fail) to copy string string t = s; // change "copy" printf("Capitalizing copy...\n"); if (strlen(t) > 0) t[0] = toupper(t[0]); // print original and "copy" printf("Original: %s\n", s); printf("Copy: %s\n", t); return 0; } + We check if `s == NULL` because when we ask the operating system for memory, it might say no. For example, maybe we asked the operating system for trillions of bytes that it just doesn't have. `GetString` returns the special value `NULL` if something has gone wrong. `NULL` is actually a pointer that represents the memory address 0. The operating system will never use this memory, it is reserved for signifying an error. + Assigning the value of `s` to a new variable `t` does not actually make a copy of `s`. To demonstrate this, the rest of the program purports to capitalize the first letter of only the copy, but if we run the program we see that both the original and the copy are capitalized. Thinking of `s` and `t` pictorially, they are both arrows to the same memory. + To fix this problem, we need to create a new chunk of memory in which we can store a true copy of the string which `s` points to: /**************************************************************************** * copy2.c * * David J. Malan * malan@harvard.edu * * Copies a string. * * Demonstrates strings as pointers to arrays. ***************************************************************************/ #include #include #include #include #include int main(void) { // get line of text printf("Say something: "); char* s = GetString(); if (s == NULL) return 1; // allocate enough space for copy char* t = malloc((strlen(s) + 1) * sizeof(char)); if (t == NULL) return 1; // copy string int n = strlen(s); for (int i = 0; i < n; i++) t[i] = s[i]; t[n] = '\0'; // change copy printf("Capitalizing copy...\n"); if (strlen(t) > 0) t[0] = toupper(t[0]); // print original and copy printf("Original: %s\n", s); printf("Copy: %s\n", t); // free memory free(s); free(t); return 0; } + Here, we introduce a function named `malloc`, short for "memory allocation." This function takes a single argument, the number of bytes we're asking the operating system for. In this case, we calculate the number of bytes to be `strlen(s) + 1`, enough to store all of the characters of `s` plus one additional byte for the null terminator character. `sizeof` is a function which returns the size in bytes of a data type. Although a `char` will almost always be 1 byte, we want to make sure our program works on any systems. If `s` holds the string "hello," then we ask for 6 bytes. Assuming all goes well, `malloc` returns the address of the first byte of a chunk of memory. If all does not go well, `malloc` returns `NULL`, which is why we check for this special value. + After we have enough space to store a copy of `s`, we actually iterate over the characters of `s` and copy them one by one to `t`. We have to make sure to add the `\0` at the end of `t`. Alternatively, we could iterate to `i <= n` instead of `i < n` so that the `\0` from `s` would also be copied. + As before, we capitalize the first letter of `t`. This time, however, it works! The original remains uncapitalized. + At the end of this program, we call `free` to return the memory used by `s` and `t` to the operating system. Interestingly, this has no effect on the contents of the memory. They will still contain the strings as before. Similarly, when you drag a file into the trash can on your computer, it has no effect on the contents of the file: they're still stored on your hard disk somewhere. When you empty the trash can on your computer, the contents of the file remain, but the operating system deletes its record of where the contents are acctually stored. Still, the file itself can actually be recovered by some forensic methods that we'll make use of in a future problem set. If you really want to make sure that the contents of the file go way, you need to overwrite the 0's and 1's on the hard drive. Choosing the "Secure Empty Trash" option on a Mac does this for you. #### Pointer Arithmetic + One thing to note: we generally write `char*`, but this is equivalent to `char *`. + Thus far, we have accessed the characters in a string using square bracket notation. This program demonstrates an alternative way to do so: /**************************************************************************** * pointers.c * * David J. Malan * malan@harvard.edu * * Prints a given string one character per line. * * Demonstrates pointer arithmetic. ***************************************************************************/ #include #include #include #include int main(void) { // get line of text char* s = GetString(); if (s == NULL) return 1; // print string, one character per line for (int i = 0, n = strlen(s); i < n; i++) printf("%c\n", *(s+i)); // free string free(s); return 0; } + What exactly does `*(s+i)` evaluate to? Let's consider the first iteration of the loop, when `i` is 0. Then `*(s+i)` is really just `*s`. In this context, `*` is the *dereferencing operator*, which means "go to the memory address of." Thus, if `s` points to the string "hello," then `*s` is the letter "h." On the second iteration of the loop, `i` is 1 and the expression is `*(s+1)`, which means "go to the memory address 1 byte after `s`." Functionally, `*(s+i)` is equivalent to `s[i]`. + From now on, don't forget to `free` any memory allocated by `GetString` or `malloc`! If you allocate memory within a loop, you have to free it within the loop since the variable won't exist outside the scope of the loop. Soon we'll introduce a program called Valgrind that will help you track down memory leaks in your code. + Interestingly, pointer arithmetic works no matter the size of the underlying data type that the pointer is pointing to. In the case of a `char*`, the underlying data type is a single-byte `char` so it makes sense that adding 1 would advance to the next `char`. However, if `s` were an `int[]`, the example above would still work. C is smart enough to evaluate `i` to 4 bytes instead of 1 because the underlying data type is a four-byte `int`. #### Back to Swap + Recall our buggy `swap` function: void swap(int a, int b) { int tmp = a; a = b; b = tmp; } + Now that we know about pointers, we can fix this function quite easily: void swap(int* a, int* b) { int tmp = *a; *a = *b; *b = tmp; } + `a` and `b` are now pointers, not integers. Our function has changed so that when we call `swap`, we have to pass in addresses, not integers. So if `x` and `y` are stored at memory addresses 123 and 124, respectively, then `a` and `b` will have the values 123 and 124, respectively. + On the first line of the function, when we write `*a`, we're getting the contents of memory at address `a` and assigning them to `tmp`. `*a` evaluates to 1 (the value of `x`) so this is what gets stored in `tmp`. The second line of the function reads as "take the contents of memory at `b` and assign them to the contents of memory at `a`." Thus `x` becomes 2. Finally, we assign to the contents of memory at `b` the value of `tmp`. Thus, `y` becomes 1. + The final working version is as follows: /**************************************************************************** * swap.c * * David J. Malan * malan@harvard.edu * * Swaps two variables' values. * * Demonstrates passing by reference. ***************************************************************************/ #include // function prototype void swap(int* a, int* b); int main(void) { int x = 1; int y = 2; printf("x is %d\n", x); printf("y is %d\n", y); printf("Swapping...\n"); swap(&x, &y); printf("Swapped!\n"); printf("x is %d\n", x); printf("y is %d\n", y); return 0; } /** * Swap arguments' values. */ void swap(int* a, int* b) { int tmp = *a; *a = *b; *b = tmp; } + One last change is the call to `swap`. Previously, we wrote `swap(x, y)`, but as we said, we have to pass memory addresses, not integers to `swap` now. `&` is the "address-of" operator, so `swap(&x, &y)` means "pass the addresses of `x` and `y` to `swap`." + Where does all this memory come from? Recall our diagram of a program's memory: ![A program's memory](program_memory.png "A progroam's memory.") + The text segment contains the actual 0's and 1's of your program. The initialized data and unitialized data segments contain global variables. The stack is used to store local variables and function parameters. The heap is used to allocate memory which persists until it is explicitly given back to the operating system. Whereas a function's local variables disappear when the function returns, memory allocated from the heap will remain until it is passed to `free`. The heap is where `GetString` and `malloc` get their memory from. The memory itself is not intrinsically different from the memory on the stack; it's simply in a different location which follows different conventions. + Consider the following simple program: int main(void) { int* x; int* y; x = malloc(sizeof(int)); *x = 42; *y = 13; y = x; *y = 13; } + The first two lines declare two pointers to integers. `malloc` returns memory addresses, so we can assign its return value to `x`. Writing `*x = 42` places the integer 42 in the memory which `x` points to. `*y = 13` is problematic, though. We haven't initialized `y`, so it still contains some garbage value. When we try to access that garbage value as a memory address, we'll probably get a segmentation fault. + To explain this program better than David ever could, let's pass the mic to [Binky](http://www.youtube.com/watch?v=6pmWojisM_E)!