Andrew Sellergren
style50
is a tool we introduced to clean up the style of your code automatically.Another tool available to you is Valgrind, which helps diagnose memory errors. The command to run it is as follows:
valgrind -v --leak-check=full ./a.out
a.out
should be replaced with the name of your compiled program. The -v
means “verbose” to get the maximum possible output. --leak-check=full
asks Valgrind to be rigorous in checking for all memory leaks. Note that command-line flags are passed to programs with a single dash if they are one letter (e.g. -v
) but are passed with two dashes if they are longer than one letter (e.g. --leak-check
).Here’s a sample program that we’ll run through GDB:
/****************************************************************************
* memory.c
*
* David J. Malan
* malan@harvard.edu
*
* Demonstrates memory-related errors.
*
* problem 1: heap block overrun
* problem 2: memory leak -- x not freed
*
* Adapted from
* http://valgrind.org/docs/manual/quick-start.html#quick-start.prepare.
***************************************************************************/
#include <stdlib.h>
void f(void)
{
int *x = malloc(10 * sizeof(int));
x[10] = 0;
}
int main(void)
{
f();
return 0;
}
f
do? First, it declares a pointer x
. The pointer itself will take up 32 bits of memory, but it will point to a chunk of memory that is 10 times the size of an int
, i.e. 40 bytes, which we’ve allocated using malloc
. Note that we have to keep track of the size of this chunk of memory, there’s no programmatic way to know where it ends.x
is a pointer, we can use the array-style bracket notation to access its 10th element, as we do by writing x[10]
. However, the problem is that x
doesn’t have a 10th index: we only allocated 10 int
-size chunks, so 9 is the largest valid index. Interestingly, this program may not outright fail. Sometimes the operating system will be okay with us using one more chunk of memory than we asked for and sometimes it won’t. Regardless, this is bad practice.free(x)
inside of f
because x
will be out of scope once we return to main
.In a program of this size, these memory-related bugs may be obvious. In programs of much larger size, memory-related bugs may not be so obvious. Thankfully, Valgrind will help us identify them. When we run Valgrind on this program, we get the following messages (among a lot of other output):
HEAP SUMMARY:
in use at exit: 40 bytes in 1 blocks
total heap usage: 1 allocs, 0 frees, 40 bytes allocated
LEAK SUMMARY:
definitely lost: 40 bytes in 1 blocks
x
.Another message we get from Valgrind is the following:
1 errors in context 1 of 2:
Invalid write of size 4
at 0x80484A0: f (memory.c:21)
by 0x8048C1: main (memory.c:26)
x
.realloc
, this is a very expensive operation.Recall the definition of a linked list node:
/****************************************************************************
* list1.h
*
* David J. Malan
* malan@harvard.edu
*
* Defines a node for a linked list of integers.
***************************************************************************/
typedef struct node
{
int n;
struct node* next;
}
node;
The node contains an integer to store the actual data of interest as well as a pointer to another node. By convention, we write this definition in a header file named with the .h
extension. This keeps our .c
files more organized.
The actual linked list is defined as a single pointer to a node declared as a global variable at the top of the file:
// linked list
node* first = NULL;
When we compile and run list1.c
, we get a prompt giving us options to delete, find, insert, and traverse the list. To test it out, we can choose the insert option a few times and add the numbers 9, 17, and 22 to our linked list. How is this insertion actually happening? Here’s the definition of the insert
function:
/**
* Tries to insert a number into list.
*/
void insert(void)
{
// try to instantiate node for number
node* newptr = malloc(sizeof(node));
if (newptr == NULL)
return;
// initialize node
printf("Number to insert: ");
newptr->n = GetInt();
newptr->next = NULL;
// check for empty list
if (first == NULL)
first = newptr;
// else check if number belongs at list's head
else if (newptr->n < first->n)
{
newptr->next = first;
first = newptr;
}
// else try to insert number in middle or tail
else
{
node* predptr = first;
while (true)
{
// avoid duplicates
if (predptr->n == newptr->n)
{
free(newptr);
break;
}
// check for insertion at tail
else if (predptr->next == NULL)
{
predptr->next = newptr;
break;
}
// check for insertion in middle
else if (predptr->next->n > newptr->n)
{
newptr->next = predptr->next;
predptr->next = newptr;
break;
}
// update pointer
predptr = predptr->next;
}
}
// traverse list
traverse();
}
node
is since we can use the sizeof
function to figure it out. As always, we have to check the return value of malloc
for NULL
in case the memory couldn’t be allocated for some reason.n
using the arrow syntax. Recall that writing newptr->n = GetInt()
is equivalent to writing (*newptr).n = GetInt()
.Inserting into a linked lists can be divided into four cases. If first
is NULL
, then the list is empty, so we point first
to the new node and we’re done. To insert at the head of the list, we point the new node to where first
is pointing and we point first
to the new node:
// else check if number belongs at list's head
else if (newptr->n < first->n)
{
newptr->next = first;
first = newptr;
}
To insert at the tail of the list, we point the last node to the new node:
// check for insertion at tail
else if (predptr->next == NULL)
{
predptr->next = newptr;
break;
}
NULL
since we did that earlier when we initialized newptr
.predptr
) which will keep track of the node before the node we’re considering. We can’t move backwards in singly linked lists.The last case, inserting in the middle of the list, is a little tricky, so we’ll wave our hands at it for now. Here it is in code, though:
// check for insertion in middle
else if (predptr->next->n > newptr->n)
{
newptr->next = predptr->next;
predptr->next = newptr;
break;
}
list2.c
gives example code for creating a linked list of structs rather than integers.n
since the next available slot might be the very last in the hash table.How often does this kind of collision occur? Last time, we asked this question in a slightly different way:
In a room of n CS50 students, what’s the probability that at least 2 students have the same birthday?
To answer this question, we’ll consider the opposite: what’s the probability that no 2 students have the same birthday. If there’s only 1 student in the room, then the probability that no 2 students have the same birthday is 1. If there are 2 students in the room, then there are 364 possible birthdays out of 365 which the second student could have that would be different from the first student’s. Thus, the probability that no 2 students have the same birthday in a room of 2 is 364⁄365. The probability that no 2 students have the same birthday in a room of 3 is 363⁄365. And so on. To get the total probability, we multiple all of these probabilities together. You can see this math here, courtesy of Wikipedia:
This is much easier to interpret in the form of a graph, however:
How do we implement a hash table in code? We might declare an array of linked lists like so:
node* table[31];
If we wanted each node to store a string, we might define a node like so:
typedef struct node
{
char word[LENGTH + 1];
struct node* next;
}
node;
word
instead of a char *
? Just so we don’t have to use malloc
. Instead, we define an array which has enough indices to store the longest possible word.More generally, we can talk about a data structure called a tree which defines nodes that have parents, children, and siblings:
This diagram shows a tree which has varying numbers of children for each node. But if we are a little more strict in how many children a node can have, we create a structure that’s easily searchable. Consider a binary search tree:
In a binary search tree, a node can have at most two children. We could define this in code like so:
typedef struct node
{
int n;
struct node* left;
struct node* right;
}
node;
A trie is another type of tree structure. The word “trie” comes from the word “retrieval,” but is usually pronounced like “try.” For our purposes, the nodes in a trie are arrays. We might use a trie to store a dictionary of words, as this diagram suggests:
In this trie, each index in the array stands for a letter of the alphabet. Each of those indices also points to another array of letters. In the above, we’re storing the names of famous scientists, e.g. Mendel, Mendeleev, Pavlov, Pasteur, and Turing. The Δ symbol denotes the end of a name. We have to keep track of where words end so that if one word actually contains another word (e.g. Mendeleev and Mendel), we know that both words exist. In code, the Δ symbol could be a Boolean flag in each node:
typedef struct node
{
bool is_word;
struct node* children[27];
}
node;