• One of the topics for today is digital forensics, recovering information, and Problem Set 4 will be in this domain.

• David used to work in the district attorney’s office, attempting to find evidence from hard drive and memory cards that the police brought in.

• The portrayal of this process in TV and movies might look like this, where characters, in clips from various shows and films, say phrases like "zoom" and "enhance," that magically cause images to reveal details previously unseen.

• But when we really try to "enhance" images, like that of Daven, we eventually see the pixels that compose the image, because there are only a finite of bits in the image. (The bad guy in the reflection in his eye will only be 6 pixels, no matter how far we try to zoom!)

• We’ll explore this in Problem Set 4 with file I/O, where I/O just means input/output. None of the programs we’ve worked with so far have been saving to disk by creating or changing files.

• An example of a file is a JPEG, which is simply an image file.

• What’s interesting is that files can typically be identified by certain patterns of bits. Different files, like a JPEG or a PNG (image file) or a GIF (image file) or a Word document or a Excel spreadsheet, will have different patterns of bits, and those patterns are usually at the top of the file, so when a computer opens it, it can recognize, say, a JPEG as an image, and display it to the user as a graphic. Or, it might look like a Word doc, so let’s show it to the user as an essay.

• For instance, the first three bytes of a JPEG are:

``              255  216  255``
• Next week we’ll be poking more deeply into files to see what’s always been there.

• But first, realize that "what’s there" generally isn’t written in decimal numbers like above.

• Computer scientists actually tend to express numbers in hexadecimal, as opposed to decimal or binary.

• Recall that decimal uses 10 digits, 0-9, while binary is composed of 2 digits, 0 and 1.

• Hexadecimal means that we will have 16 such digits, 0-9 and a, b, c, d, e, f.

• "a" is 10, "b" is 11, and so on.

• How can this be useful? Well let’s write out the bits that represent these numbers:

``````             255    216     255
11111111  11011000  11111111``````
• This is interesting because a byte has 8 bits, and if we break each byte into two chunks of 4 bits, each set of 4 bits will correspond to exactly one hexadecimal digit:

``````              255     216     255
1111 1111  1101 1000  1111 1111
f    f     d 8     f    f``````
• To make this more readable, we remove the whitespace and add `0x`, just to signify that the characters in the last row are in hexadecimal:

``````              255     216     255
1111 1111  1101 1000  1111 1111
f    f     d 8     f    f
0xff     0xd8    0xff``````
• Note that we can also convert two hexadecimal digits to 8 bits in binary, or one byte, making it especially useful for representing binary data.

• Another image file format is a bitmap file, BMP. One example of an image in that format is `bliss.bmp`, a very familiar rolling green hill set against a blue cloudy sky (the default Windows XP wallpaper on millions of PCs).

• A bitmap is just a pattern of pixels, or dots, a "map of bits," if you will.

• What’s interesting, though, is that its beginnings are more than just a few bytes. Its header has a whole bunch of numbers, bytes, with their orders and values determined years ago by its author, Microsoft. Indeed, Microsoft has named the types of those values things like `WORD` and `DWORD` and `LONG`, but those are simply data types like `int`, different names for the same thing.

• So when someone clicks on a BMP file, the image is only shown because the operating system (or image-viewing program, really) noticed all of these bits at the beginning of the file and recognized that it was a BMP. More on this later.

## Structs09:50

• Let’s look at a simpler file first:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5#include "structs.h"
6
7// number of students
8#define STUDENTS 3
9
10int main(void)
11{
12    // TODO
13}``````
• Let’s say we want to start creating a database of every student, and start by saving the name of a student and their house.

• We see that there are `3` such `STUDENTS`, so we can probably use something like a `for` loop to `GetString` a name and house for each of them. And we can start by instinctively:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5#include "structs.h"
6
7// number of students
8#define STUDENTS 3
9
10int main(void)
11{
12    string names[STUDENTS];
13    string houses[STUDENTS];
14
15    for (int i = 0; i < STUDENTS; i++)
16    {
17        names[i] = GetString();
18        houses[i] = GetString();
19    }
20
21    // TODO later...
22
23}``````
• So this is correct, as we create arrays to store the `names` and `houses`, and iterate through for the number of students, even if it’s not very user-friendly.

• But this is not very good design. What if they had an ID number or email or Twitter handle, or more details to associate? To add it we’d have to do something like this:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5#include "structs.h"
6
7// number of students
8#define STUDENTS 3
9
10int main(void)
11{
12    string names[STUDENTS];
13    string houses[STUDENTS];
14    int ids[STUDENTS];
16    ...
17}``````
• But soon, if we keep adding, we’ll have something pretty big and unwieldy.

• We can use a higher-level data structure to hold something of a type `student`, and we see an example of this in `structs.h`:

`````` 1#include <cs50.h>
2
3// structure representing a student
4typedef struct
5{
6    string name;
7    string house;
8}
9student;``````
• In fact, we’ve already been using `structs` in Problem Set 3, since there’s no such thing as a `GRect` or `GOval` in C. The SPL has those data types, implemented with the approach shown above.

• The keywords typedef and struct on line 4 just mean define a type — a structure — that is a container for multiple things, and inside that structure will be a `string` called `name` and a `string` called `house`, and the entire structure will be called `student` for convenience.

• `student` is now a data type just like `int` and `string` and `GRect` and others.

• Now we can do something like this, in `structs-0.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5#include "structs.h"
6
7// number of students
8#define STUDENTS 3
9
10int main(void)
11{
12    // declare students
13    student students[STUDENTS];
14...``````
• Note that we have one `array` named `students`, with each element of the type `student`. There are `STUDENTS` (which we’ve defined in line 8 to be `3`) elements in the `students` array.

• How do we access `name` and `house` and other fields, or items, in a `struct`? Well we scroll down in `structs-0.c`:

`````` 1...
2int main(void)
3{
4    // declare students
5    student students[STUDENTS];
6
7    // populate students with user's input
8    for (int i = 0; i < STUDENTS; i++)
9    {
10        printf("Student's name: ");
11        students[i].name = GetString();
12
13        printf("Student's house: ");
14        students[i].house = GetString();
15    }
16...``````
• We index into the array in line 11, and use a new syntax of `.name` to get the field called `name`.

• We’ll return to `struct`s soon!

## Quick Reminder15:21

• Speaking of images, here’s another one of Daven at Fire and Ice, a reminder that CS50 Lunch is Friday at 1:15pm as usual, RSVP at http://cs50.harvard.edu/rsvp.

## GDB15:31

• So where did we leave off on Monday? We introduced this problem of swapping two variables `a` and `b` with a temporary variable called `tmp`:

``````void swap(int a, int b)
{
int tmp = a;
a = b;
b = tmp;
}``````
• But remember that the problem is that it only swaps the variables locally, in the function’s own memory (recall the trays from Annenberg that represent each function’s slice of memory, and how the `swap` function doesn’t have access to the variables in `main`, but rather copies).

• Let’s open our friend `noswap.c`:

`````` 1/**
2 * noswap.c
3 *
4 * David J. Malan
5 * malan@harvard.edu
6 *
7 * Should swap two variables' values, but doesn't!  How come?
8 */
9
10#include <stdio.h>
11
12void swap(int a, int b);
13
14int main(void)
15{
16    int x = 1;
17    int y = 2;
18
19    printf("x is %i\n", x);
20    printf("y is %i\n", y);
21    printf("Swapping...\n");
22    swap(x, y);
23    printf("Swapped!\n");
24    printf("x is %i\n", x);
25    printf("y is %i\n", y);
26}
27
28/**
29 * Fails to swap arguments' values.
30 */
31void swap(int a, int b)
32{
33    int tmp = a;
34    a = b;
35    b = tmp;
36}``````
• In lines 16 and 17 we initialized `x` and `y`, had `printfs`, and then called `swap` on line 22. `swap`, on line 31, might look correct on first glance, but isn’t.

• Let’s warm up by investigating with our new friend `gdb`:

``````jharvard@appliance (~/Dropbox/src4w): gdb ./noswap
• Now let’s just run it:

``````(gdb) run
Starting program: /home/jharvard/noswap
x is 1
y is 2
Swapping...
Swapped!
x is 1
y is 2
[Inferior 1 (process 4922) exited normally]``````
• That wasn’t quite so useful, since it just ran the program. Let’s pause execution with `break`, and we can specify a line like `break 10`, but let’s just `break` at the `main` function:

``````(gdb) break main
Breakpoint 1 at 0x80484ac: file noswap.c, line 16.``````
• Indeed, if we glance back at our source code, line 16 is the first command in `main`.

• Now we can run the program again:

``````(gdb) run
Starting program: /home/jharvard/noswap
Breakpoint 1, main () at noswap.c:16
16      int x = 1;``````
• And it’s paused. Let’s `print x`:

``````(gdb) print x
\$1 = 0``````
• But it’s `0`. `gdb` has paused execution to just before line 16, so `x` has no assigned value but rather whatever was left in memory prior (a garbage value), and here we got lucky with a clean value of `0`.

• Let’s say `next` and `print x` again:

``````(gdb) next
17      int y = 2;
(gdb) print x
\$2 = 1``````
• And indeed we see `1`. What if we `print y`?

``````(gdb) print y
\$3 = 134514064``````
• We see another garbage value as expected, with those bits from some other program that used the memory last to store something. Not to worry, we can proceed:

``````(gdb) next
19      printf("x is %i\n", x);
(gdb) print y
\$4 = 2``````
• And `y` is `2` as expected. Moving on:

``````(gdb) n
x is 1
20      printf("y is %i\n", y);
(gdb) n
y is 2
21      printf("Swapping...\n");
(gdb) n
Swapping...
22      swap(x, y);``````
• But now we want to go inside `swap`, so we use the `step` command:

``````(gdb) step
swap (a=1, b=2) at noswap.c:33
33      int tmp = a;
(gdb) print tmp
\$5 = -1209908752``````
• `tmp` has a garbage value again, but we can see it’s correct after we initialize it.

• And remember we have `a` and `b` from the source code where `swap` is declared as `void swap(int a, int b)`, so it refers to the values passed in as `a` and `b`, and remember that those are copies of `x` and `y` held by `main`:

``````(gdb) next
34      a = b;
(gdb) print tmp
\$6 = 1
(gdb) print a
\$7 = 1
(gdb) next
35      b = tmp;
(gdb) print a
\$8 = 2``````
• Now we see that `a` is `2`, after we said `next` and executed line 34. We can also check that `b` is `1` and `tmp` is still there:

``````(gdb) next
36  }
(gdb) print a
\$9 = 2
(gdb) print b
\$10 = 1
(gdb) print tmp
\$11 = 1``````
• Let’s say `continue` to finish the program:

``````(gdb) continue
Continuing.
Swapped!
x is 1
y is 2
[Inferior 1 (process 4946) exited normally]
(gdb)``````
• `gdb` didn’t fix the problem, but helped us realize that our code didn’t have an impact.

## Strings20:35

• So let’s solve this problem. We’re now peeling back a layer of abstraction, and realizing that a `string` doesn’t actually exist, and instead is a `char*` with a name of `string`.

• Let’s open `compare-0.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3
4int main(void)
5{
6    // get line of text
7    printf("Say something: ");
8    string s = GetString();
9
10    // get another line of text
11    printf("Say something: ");
12    string t = GetString();
13
14    // try (and fail) to compare strings
15    if (s == t)
16    {
17        printf("You typed the same thing!\n");
18    }
19    else
20    {
21        printf("You typed different things!\n");
22    }
23}``````
• We get two strings, and try to compare them. We make it and try it out:

``````jharvard@appliance (~/Dropbox/src4w): make compare-0
clang -ggdb3 -O0 -std=c99 -Wall -Werror    compare-0.c  -lcs50 -lm -o compare-0
jharvard@appliance (~/Dropbox/src4w): ./compare-0
Say something: Daven
Say something: Rob
You typed different things!
jharvard@appliance (~/Dropbox/src4w): ./compare-0
Say something: Gabe
Say something: Gabe
You typed different things!
jharvard@appliance (~/Dropbox/src4w): ./compare-0
Say something: Zamyla
Say something: Zamyla
You typed different things!
jharvard@appliance (~/Dropbox/src4w):``````
• So what’s going on? The first time we got what we expected, but then we passed in two strings that were the same! All we’re doing in the source code is getting them and checking if they are `==`.

• Well it turns out that, when we say something like:

``````string s = GetString();
-----
|   |
-----``````
• where the box below `s` is storing `s`.

• Now `GetString`, in our first attempt, returned to us `Daven`, and also a `\0`, like so:

``````string s = GetString();
-----    -------------------------
|   |    | D | a | v | e | n |\0 |
-----    -------------------------``````
• It looks like `Daven\0` is made up of many bytes, and if a `string` is only 4 bytes, 32 bits, how can we fit the entire string inside `s`? Well, the row of squares containing `Daven\0` are just bytes in memory. We can think of each byte as having a certain address, just as buildings might have an address like 33 Oxford Street, 34 Oxford Street, or 35 Oxford Street, and here, as an example, we start numbering each byte in memory:

``````string s = GetString();
-----    -------------------------
|   |    | D | a | v | e | n |\0 |
-----    -------------------------
0x1 0x2 0x3 0x4 0x5 0x6``````
• And now, we can sort of guess that `GetString` returns not the entire `string`, but rather the address to the `string`: `Daven` "lives" at `0x1`. So `s` only contains the address to that string:

``````string s = GetString();
-----    -------------------------
|0x1|    | D | a | v | e | n |\0 |
-----    -------------------------
0x1 0x2 0x3 0x4 0x5 0x6``````
• And this has been going on since we first introduced a `string`. All we get when we ask for a `string` is the location of where it begins.

• Incidentally, programmers can put any address into a variable and try to jump to that area in memory, and we’ll see how that could be problematic next time.

• But how do we know where a `string` ends, and the next `string` begins? Well, the `\0`, special null character, tells us when a `string` ends.

• So now if we look at the code of `compare.c`:

``````...
// try (and fail) to compare strings
if (s == t)
{
printf("You typed the same thing!\n");
}
...``````
• we see that this fails since `s` and `t` are pointing to different addresses, since `t` is another `string`, and we’re comparing the locations rather than the first character of each one, then the next, and so on:

``````string s = GetString();
-----    -------------------------
|0x1|    | D | a | v | e | n |\0 |
-----    -------------------------
0x1 0x2 0x3 0x4 0x5 0x6

string t = GetString();
-----    -------------------------
|...|    |   |   |   |   |   |   |
-----    -------------------------
...``````
• So let’s fix this problem. If we had to implement it ourselves, we might compare letters in the two strings, one at a time, until we reached the end of one or both of them. But we don’t need to, thanks to the `strcmp` function as shown in `compare-1.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5int main(void)
6{
7    // get line of text
8    printf("Say something: ");
9    char* s = GetString();
10
11    // get another line of text
12    printf("Say something: ");
13    char* t = GetString();
14
15    // try to compare strings
16    if (s != NULL && t != NULL)
17    {
18        if (strcmp(s, t) == 0)
19        {
20            printf("You typed the same thing!\n");
21        }
22        else
23        {
24            printf("You typed different things!\n");
25        }
26    }
27}``````
• Notice that we use `strcmp` in line 18, which will return a negative number, or a positive number, or zero. Zero would mean that they are equal, and a positive or negative number would mean something like greater than or less than, if you wanted to alphabetize those strings.

• We’ve also switched to `char*` in lines 9 and 13, and really that’s the same as `string`, which we made up. For now, just know that the `*` means the address of something, and so a `char*`, as opposed to `int*`, means an address of a `char`.

• Going back to the board,

``````string s = GetString();
-----    -------------------------
|0x1|    | D | a | v | e | n |\0 |
-----    -------------------------
0x1 0x2 0x3 0x4 0x5 0x6

string t = GetString();
-----    -------------------------
|...|    |   |   |   |   |   |   |
-----    -------------------------
...``````
• that box with `0x1` is really a `char*`.

• Let’s open `copy-0.c`:

`````` 1#include <cs50.h>
2#include <ctype.h>
3#include <stdio.h>
4#include <string.h>
5
6int main(void)
7{
8    // get line of text
9    printf("Say something: ");
10    string s = GetString();
11    if (s == NULL)
12    {
13        return 1;
14    }
15
16    // try (and fail) to copy string
17    string t = s;
18
19    // change "copy"
20    printf("Capitalizing copy...\n");
21    if (strlen(t) > 0)
22    {
23        t[0] = toupper(t[0]);
24    }
25
26    // print original and "copy"
27    printf("Original: %s\n", s);
28    printf("Copy:     %s\n", t);
29
30    // success
31    return 0;
32}``````
• So in lines 10-14 we get a `string s` and checks that it’s not `NULL` in case something went wrong. Otherwise, we might start going to invalid addresses in memory, and cause more and more problems.

• Let’s try to copy the string in line 17, and capitalize the first character of `t`, `t[0]`, in line 23. And then we print them.

• But what’s really happening, and where is the bug? Let’s go back to line 17, where we set `string t` to `s`:

``````string t  =  s;
------     ------
|0x50|     |0x50|
------     ------``````
• So we’re setting `t` to point to the same address as `s`, but that just means when we change `t[0]`, the first letter in `t`, we also change `s[0]` since `s` points to the same thing:

``````string t  =  s;
------     ------
|0x50|     |0x50|
------     ------

---------------------------------
...  | g | a | b | e |\0 |   |   |   |
---------------------------------
0x50``````
• Let’s run `copy-0`:

``````jharvard@appliance (~/Dropbox/src4w): ./copy-0
Say something: gabe
Capitalizing copy...
Original: Gabe
Copy:     Gabe``````

## Memory Allocation33:32

• Hm, we can address this problem with `copy-1.c`:

`````` 1#include <cs50.h>
2#include <ctype.h>
3#include <stdio.h>
4#include <string.h>
5
6int main(void)
7{
8    // get line of text
9    printf("Say something: ");
10    char* s = GetString();
11    if (s == NULL)
12    {
13        return 1;
14    }
15
16    // allocate enough space for copy
17    char* t = malloc((strlen(s) + 1) * sizeof(char));
18    if (t == NULL)
19    {
20        return 1;
21    }
22
23    // copy string, including '\0' at end
24    for (int i = 0, n = strlen(s); i <= n; i++)
25    {
26        t[i] = s[i];
27    }
28
29    // change copy
30    printf("Capitalizing copy...\n");
31    if (strlen(t) > 0)
32    {
33        t[0] = toupper(t[0]);
34    }
35
36    // print original and copy
37    printf("Original: %s\n", s);
38    printf("Copy:     %s\n", t);
39
40    // free memory
41    free(s);
42    free(t);
43
44    // success
45    return 0;
46}``````
• This looks really complicated, but let’s talk about the concept first. We’ll use a loop to copy it character by character, but now we need to explicitly allocate memory for `t`:

``````string s = GetString();
------   ---------------------
|0x50|   | g | a | b | e |\0 |
------   ---------------------
0x50

string t
------   ---------------------
|    |   |                   |
------   ---------------------``````
• We declared a `string t`, but how do we assign a value to it? Well, let’s look at line 17, reproduced below:

``char* t = malloc((strlen(s) + 1) * sizeof(char));``
• We started by writing `char* t`, which is creating a variable `t` that will store an address to a character, creating that left-hand box for `t`. Then we call `malloc`, a function that allocates a certain amount of memory. The argument we pass in is the size of the memory that we want, which is `strlen(s)`, the number of characters in `s`, `+ 1` for the `\0` character ending the string, all multiplied by the `sizeof` a `char`. (We know a character is one byte, but that may, rarely, vary from computer to computer.) Finally, `malloc` will return the address of that chunk of memory, which might be anywhere:

``````string s = GetString();
------   ---------------------
|0x50|   | g | a | b | e |\0 |
------   ---------------------
0x50

string t
------   ---------------------
|0x88|   |                   |
------   ---------------------
0x88``````
• Now we can access the memory as an array in a `for` loop, reproduced below:

``````1    // copy string, including '\0' at end
2    for (int i = 0, n = strlen(s); i <= n; i++)
3    {
4        t[i] = s[i];
5    }``````
• We can do this because each string is stored with characters next to one another, so we can access them with this array notation.

• To recap, a `string` all this time was just an address of a character, a pointer, which in turn is just a number, that we conventionally write in hexadecimal.

• We also check if `t == NULL` because we might ask for more memory than `malloc` is able to give.

• And one final thing, if we return to what we were just looking at, we can replace line 4 below with line 5:

``````1    // copy string, including '\0' at end
2    for (int i = 0, n = strlen(s); i <= n; i++)
3    {
4        // t[i] = s[i];
5        *(t + i) = *(s + i);
6    }``````
• The `*` symbol can actually be used for two purposes. We’ve seen `char* t = …​` which is declaring that `t` is a pointer to a `char`, but if we use `*` without a word like `char` in front of it, it becomes a dereference operator. That just means "go there" - if an address, like 33 Oxford Street, was written on paper like *(33 Oxford Street), then we would just go there.

• `t` is the address of the new piece of memory, and `s` is the address of the original piece, and `i` goes from `0` to `1` to `2` to `3` etc, so `t + i` is just another number, since these are all addresses with number values.

• So on the first pass of the loop, with `i = 0`, we’re going to copy `g` from `0x50` to `0x88`:

``````string s = GetString();
------   ---------------------
|0x50|   | g | a | b | e |\0 |
------   ---------------------
0x50

string t
------   ---------------------
|0x88|   | g |   |   |   |   |
------   ---------------------
0x88``````
• On the next pass, `i = 1`, we’ll copy `a` from `0x50 + 1`, `0x51`, to `0x88 + 1`, `0x89`, and you can see how it’s going to proceed:

``````string s = GetString();
------   ---------------------
|0x50|   | g | a | b | e |\0 |
------   ---------------------
0x50

string t
------   ---------------------
|0x88|   | g | a |   |   |   |
------   ---------------------
0x88``````
• Let’s look at a final program:

``````int main(void)
{
int* x;
int* y;

x = malloc(sizeof(int));

*x = 42;

*y = 13;

y = x;

*y = 13;
}``````
• It first declares two variables, `x` and `y` that aren’t integers, but pointers to integers. Then we say `x = malloc(sizeof(int));`, or "give me enough memory to store an `int`", and the address returned by `malloc` will be stored in `x`.

• Meanwhile, `*x = 42` is going to the address stored in `x`, and putting `42` in it.

• Then we do the same with `y`, going to its address and putting `13` in it. But wait, `y` is probably a garbage value, some number left over from previous programs, but not contain an address to memory we should use to store an `int`. It’s like trying to go into a building you don’t own or have permission to enter, and bad things will happen.

• As an aside, David still remembers where he was when he understood pointers, sitting with his TF in the back of Eliot dining hall. So don’t worry if none of this makes sense just yet (though I hope these notes are helpful)!

• Let’s watch Pointer Fun with Binky.

• Binky is a clay …​ figure that talks about this code with a narrator, using a "magic wand of dereferencing" to show what we just explained, in a different way.

• There are three basic rules:

• "Pointer and pointee are separate - don’t forget to set up the pointee." (Don’t forget to `malloc` something for `y`!)

• "Dereference a pointer to access its pointee." (Use `*x` to go to the address stored in `x`!)

• "Assignment (=) between pointers makes them point to the same pointee." (`x = y` sets them to the same address.)