Week 6 Monday Andrew Sellergren Announcements and Demos (0:00-6:00)

Week 6 Monday

Andrew Sellergren

Announcements and Demos (0:00-6:00)

This is CS50.
We have a few new tools available for you!
- Not sure if you’re curly braces are indented correctly? Run style50 to correct them automatically! It’s not perfect, but it will help you clean up your style.
- Even more compelling is check50 which will check your code for correctness. For example, run check50 -c 2012/pset4/recover recover.c to check your solution to one of Problem Set 4’s assignments. A red sad face indicates that a test failed, a green smiley face indicates that a test passed, and a yellow straight face indicates that a test wasn’t run. Note that the tests aren’t exhaustive: your code isn’t perfect just because it passes all the tests.

The CS50 Library (6:00-20:00)

We now know that a string is actually a char*. A char* is a pointer to the first character in a string. The rest of the string can be found by iterating until the null terminator (\0) is encountered.

Under the hood, GetString was actually using a function named sscanf to get input from the user. Let’s use a similar function on our own to get an integer from the user:

  /****************************************************************************
   * scanf1.c
   *
   * David J. Malan
   * malan@harvard.edu
   *
   * Reads a number from the user into an int.
   *
   * Demonstrates scanf and address-of operator.
   ***************************************************************************/

  #include <stdio.h>

  int main(void)
  {
      int x;
      printf("Number please: ");
      scanf("%d", &x);
      printf("Thanks for the %d!\n", x);
      return 0;
  }

Like printf, scanf takes formatting characters like %d as input. We use &x as the second argument instead of just x because we want to pass by reference, not by value. When we pass &x, we’re giving scanf the address of x so that it can fill in the memory with whatever integer it gets from the user. Think of it as drawing a map to a chunk of memory which scanf can follow. To access the memory that a pointer points to, we use the dereferencing operator (*).

Oops, we screwed up in scanf2.c:

  /****************************************************************************
   * scanf2.c
   *
   * David J. Malan
   * malan@harvard.edu
   *
   * Reads a string from the user into memory it shouldn't.
   *
   * Demonstrates possible attack!
   ***************************************************************************/

  #include <stdio.h>

  int main(void)
  {
      char* buffer;
      printf("String please: ");
      scanf("%s", buffer);
      printf("Thanks for the \"%s\"!\n", buffer);
      return 0;
  }

What is the first line of main doing? We’re declaring a variable named buffer that will point to the address of the first character of a string. Two lines later, we’re reading a string into buffer. But wait, we never actually allocated any valid memory to buffer, so we have nowhere to put the string. Not only that, we’re not telling scanf how big buffer is. Without knowing this, it might try to write too much into buffer’s memory and thus overflow it.

scanf3.c does a little better:

  /****************************************************************************
   * scanf3.c
   *
   * David J. Malan
   * malan@harvard.edu
   *
   * Reads a string from the user into an array (dangerously).
   *
   * Demonstrates potential buffer overflow!
   ***************************************************************************/

  #include <stdio.h>

  int main(void)
  {
      char buffer[16];
      printf("String please: ");
      scanf("%s", buffer);
      printf("Thanks for the \"%s\"!\n", buffer);
      return 0;
  }

Here, buffer is actually a chunk of memory (of size 16 bytes). Unfortunately, one of the problems from scanf2.c remains. What if the user types in 17 characters? Or 1 million characters? There’s a good chance it will fail with a segmentation fault. Bugs like these are often hard to track down because they only manifest themselves in certain scenarios. When a segmentation fault occurs, a file named core will be dumped into your present working directory. You can actually open this file with GDB to identify the line on which your program failed. More on this in section.
Question: why didn’t we put an & next to buffer? For our purposes, arrays are effectively pointers. This isn’t true in all contexts, but it’s a reasonable simplification.

Linked Lists (20:00-50:00)

Arrays are useful because they enable the storage of similar variables in contiguous memory. Recall our program from a few weeks ago that stored information about students. Had we chosen to store IDs, names, and houses in separate variables, the program would quickly have become very messy. However, because we chose to store the information in an array of structs, our program remained quite clean.
One downside of arrays is that they have a fixed size. There’s also no indicator of how big they are: you have to keep track of their size yourself.
To solve this problem of fixed size, we’ll relax the constraint that the memory we use be contiguous. We can take a little bit of memory from here and a little bit of memory from there just so long as we can connect them together. This new data structure is called a linked list.
Each element of a linked list contains not only the data we want to store, but also a pointer to the next element. The final element in the list has the NULL pointer

Linked list.

To store a linked list, we just need to remember where the first element is. We can find the rest by following the pointers.
Linked lists have the advantage of being dynamic in size. They have the disadvantage, though, of taking up more space. In the example above, we have to use 64 bits (32 for the actual integer and 32 for the pointer) to store an integer. They also have the disadvantage of being unidirectional (unless they’re doubly linked lists). Another disadvantage they have is the lack of random access. In an array, you can access the 10th element in a single step. In a linked list, it takes 10 steps to access the 10th element. Algorithms like binary search depend on random access to the data.
To implement a linked list, we’ll borrow some of the syntax we used for structs:
```
  typedef struct node
  {
      int n;
      struct node *next;
  }
  node;
```
Each element of this linked list we’ll call a node. In the above, struct node *next defines a pointer to another node. Because the compiler reads from top to bottom, a node won’t be defined until after all of these lines are executed. Thus, we have to write struct node instead of just node, both before the curly braces and inside the curly braces.

Search

Let’s depict the linked list above using volunteers onstage. Each volunteer holds a piece of paper with a number on it (9, 17, 22, 26, or 34) and points to the next node of the list with his or her left hand. One volunteer will hold a piece of paper that reads “first” and will point to the first node of the list. One volunteer (named Anita) will represent a pointer that we’ll use to search the list. To begin with, Anita points to the first node of the list, which has already been allocated:
```
  node* first;

  //  it's been allocated a list…

  node* anita = first;
```

To search the list for the value 22, Anita points to each node successively and checks if the node’s value equals 22. We might write this prorammatically as follows:

  bool find(node* list, int n)
  {
      node* anita = list;
      while (anita != NULL)
      {
          if ((*anita).n == n)
              return true;
          anita = (*anita).next;
      }
  }

Recall that the dot notation allows us to access the information inside a struct. But Anita is not a struct herself, she’s a pointer to a struct. So we have to write *anita to dereference her and get to the struct she’s pointing to. To point her to the next node in the list, we can’t simply write anita++. The memory in a linked list is not contiguous, so the next node in the list is not necessarily right next to where Anita is pointing. Instead, Anita needs to point where the next pointer is pointing.

This syntax is a bit of a mess. Thankfully the -> operator is actually a shortcut for * plus .:

  bool find(node* list, int n)
  {
      node* anita = list;
      while (anita != NULL)
      {
          if (anita->n == n)
              return true;
          anita = anita->next;
      }
  }

The arrow both dereferences the pointer and accesses the underlying struct it points to. One last line of code needs to be added to this function, though, to ensure its correctness:

  bool find(node* list, int n)
  {
      node* anita = list;
      while (anita != NULL)
      {
          if (anita->n == n)
              return true;
          anita = anita->next;
      }
      return false;
  }

If the while loop terminates, it means that Anita got all the way to the end of the list and never found the number n we were looking for.

Insertion

Let’s walk through the process of inserting a node into our linked list. To begin with, we need to request memory for a new node, named after our intrepid volunteer Rebecca:
```
  node* rebecca = malloc(sizeof(node));
```

We should also initialize her:

  rebecca->n = 55;
  rebecca->next = NULL;

Rebecca belongs at the end of the list, so it’s pretty easy to insert her. We just point the last node of the list to Rebecca and leave Rebecca pointed to NULL.
Next, let’s try inserting Saad, who represents the number 20. As before, we initialize him:
```
  saad->n = 22;
  saad->next = NULL;
```
This time, we need to be careful when updating pointers. If we first update a node in the list to point to NULL, then we orphan the remainder of the linked list. Instead, we should point Saad to the node that contains 22, then point the node that contains 17 to Saad.
The final case we need to handle is inserting to the head of the list. In this case, we point the new node to the same element that first points to. Then we update first to point to the new node.

Stacks and Queues (50:00-60:00)

We’ve already seen the stack as a representation of a program’s memory. Stacks demonstrate last-in, first-out (LIFO) storage: the last element inserted is the first element to be removed.
Wouldn’t it be great if the line for new iPhones was implemented as a stack? Then the last person to show up (i.e. you) would be the first to get a new iPhone! Unfortunately, the line is actually implemented as a queue, which demonstrates first-in, first-out (FIFO) storage.

In C, we might represent a stack like so:

  typedef struct
  {
      int numbers[CAPACITY];
      int size;
  }
  stack;

Initially, size is set to 0 and numbers is filled with garbage values. When we insert a value into the 0th element of numbers, we set size to 1. When we insert a value into the 1st element of numbers, we set size to 2. And so on. When we want to pop a number off the stack, we simply decrement size. We don’t actually have to change any of the values of numbers. When we go to insert another value into numbers, though, we’ll overwrite the previous last value.
Stacks and queues are examples of abstract data structures: that is, the details of their implementations don’t matter. Stacks simply provide the operations of push and pop: adding and removing elements, respectively. Given the way we’ve implemented a stack above, we have to raise an error if the user tries to insert more than CAPACITY elements.
Our implementation of a queue resembles a stack except for one additional member of the struct:
```
  typedef struct
  {
      int head;
      int numbers[CAPACITY];
      int size;
  }
```
By storing a number that represents the start of the queue (i.e. head), we avoid having to shift all the elements of the array down every time we remove an element. We don’t actually need to keep the first element of the queue at the 0th index of the array; instead, we know that the first element of the queue is actually stored at the index equal to head, which we constantly update. The price of not having to shift all the elements of the array every time we remove an element is the additional storage required for head.

Teaser (60:00-63:00)

Stacks and queues both have their advantages and disadvantages, but soon we’ll look at a data structure called a hash table that offers the holy grail of lookups: constant time. The important decision we’ll have to make is how big our hash table should be. And to make this decision, we’ll need to answer the following question:

In a room of n CS50 students, what’s the probability that at least 2 students have the > same birthday?