Announcements and Demos

  • On your seats you’ll find a real-life plastic version of the Game of Fifteen! Consider it your reward for implementing it in C.

  • Problem Set 4 involves writing a GUI for the game of Breakout. In the staff solution, you’ll see that God mode allows you to play with no hands. This mode is actually trivial to implement, as you simply need to set the paddle’s horizontal position to be the same as the ball’s.

  • If you haven’t already, check out the Shorts!

  • So much is possible with computer graphics! Check out what The Great Gatsby looks like before and after visual effects.

From Last Time

  • GDB is a debugger that allows you to walk through your programs step by step, printing variables and examining the stack as you go. Although the interface is somewhat arcane, it will save you hours of time in the long run. Note that you can use this for Problem Set 4, as the GUI part of Breakout will pop up in a GWindow separate from your terminal.

  • A string is really just a char*. This pointer stores the address of the first character of the string. We only need to know the address of the first character because we can iterate through the rest of the characters until we hit the null terminator. Memory addresses are often prefixed with 0x, which indicates that the number is in hexadecimal, or base 16. Hexadecimal makes use of 16 possible digits, 0-9 as well as A-F, to represent very large numbers quite concisely.

  • With compare-0.c, we discovered that we can’t compare strings using ==. This operator only compares their memory addresses. For example, a string s might have the value 0x123 and a string t might have the value 0x456 even though they both point to strings that read "hello." In that case, "hello" is actually stored twice in memory. We can represent pointers as arrows pointing to chunks of memory. To compare strings as a human would, character for character, we can use a function like strcmp.

  • Segmentation faults are memory errors that can result from iterating past the bounds of an array or dereferencing pointers that haven’t been initialized.

  • copy-0.c demonstrated that we can’t copy a string simply by assigning it to a new variable. In effect, this only creates two pointers that store the same memory address and thus point to the same string. Changes to that string, then, will be visible from both pointers. We fixed this in copy-1.c by actually allocating memory using malloc and copying the string character by character into that new memory.

  • Note that when you allocate memory, you can’t trust what it contains until you’ve initalized it. Before it is initialized, newly allocated memory contains garbage values that we’ll often denote with question marks.

  • Question: why aren’t we using the dereferencing operator to access the characters of a string? The square bracket notation is just syntactic sugar. When we write *s, it’s equivalent to s[0]. However, if we wanted to access character i of s, we’d have to write *(s+i), which is less intuitive but equivalent to s[i]. s is just a number representing the memory address of the first character, so adding i to it gives us the address of character i.

  • Our first attempt at implementing a swap function failed because, by default, arguments to functions are passed as copies. When we defined swap so that it took two pointers as arguments, it worked as intended.

Memory

The Stack

  • To help visualize your program’s memory, consider the following diagram:

    A program's memory.
  • The text segment contains the actual 0’s and 1’s of your program. The initialized data and unitialized data segments contain global variables. The stack is used to store local variables and function parameters.

  • Each time a function is called in a program, a new frame, or section of memory, is added to the stack. This is where the stack gets its name, as it’s just like a stack of trays in the dining hall.

  • In noswap.c, main would have the bottommost frame and swap would have a frame on top of it. Inside swap's frame, there is space for the variables a, b, and tmp. As we go through the logic of swap, the values of a and b are indeed switched. However, as soon as swap returns, its entire frame is wiped away from the stack, so all our work was for nothing! x and y, which live in main's frame, are unchanged.

  • In swap.c, the version that actually worked, a and b still exist on swap's frame, but now they point to the values of x and y. When we dereference them, we are actually changing memory on main's frame.

Binky

  • We left off last time with some code that put Binky in a tough spot:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    int main(void)
    {
        int* x;
        int* y;
    
        x = malloc(sizeof(int));
    
        *x = 42;
    
        *y = 13;
    
        y = x;
    
        *y = 13;
    }
    
  • What gets stored in x at line 6? The address of the first byte of memory allocated by malloc.

  • Unfortunately, in line 10, we dereference the pointer y before it has been initialized. y contains some garbage value that we interpret as a memory address, so when we try to access it, bad things happen. In the video, this meant decapitation for Binky. In C programs, this usually means a segmentation fault.

Stack Overflow

  • You may know this as a popular website, but it actually has a specific technical meaning. If a programmer forgets to check the boundaries of an array, he or she leaves his program vulnerable to an attack that can take over control of the program. Consider the following code:

    #include <string.h>
    
    void foo(char* bar)
    {
        char c[12];
        memcpy(c, bar, strlen(bar));
    }
    
    int main(int argc, char* argv[])
    {
        foo(argv[1]);
    }
    
  • For a thorough discussion of this attack, check out the Wikipedia article.

  • In short, this program passes the first command-line argument to a function foo that writes it into an array of size 12. If the first command-line argument is less than 12 characters long, everything works fine. If the first command-line argument is greater than 12 characters long, then it will overwrite memory past the bounds of c. If the first command-line argument is greater than 12 characters long and actually contains the address in memory of some malicious code, then it could potentially overwrite the return address of foo. When foo returns, then, it will give control of the program over to this malicious code rather than main.

  • Instead of ending on a scary note, let’s end with a joke.