Introduction00:00

• Ask questions online at http://cs50.harvard.edu/discuss.

• Today we’ll talk more about data representation and cryptography, or scrambling information, but first a story from yesteryear.

• Radio Orphan Annie’s Secret Decoder Ring is a child-friendly form of cryptography, with two discs that rotates independently, each with a set of letters, such that `A` might now map to `B`, `B` to something like `C`, etc.

• This clip from A Christmas Story shows a child, Ralphie, excitedly decoding the secret message from the radio, only to find that it is an advertisement for Ovaltine, a beverage popular many years ago.

• CS50 Lunch will again be Friday 1:15pm, RSVP at http://cs50.harvard.edu/rsvp.

Data Representation04:43

Strings04:43

• Let’s say we want to represent a `string` for Zamyla’s name. We can place each character, or `char`, in its own box:

``````-------------------------
| Z | a | m | y | l | a |
-------------------------``````
• It’s useful to think of strings as composed of `char` puzzle pieces since we can access individual characters more easily. Let’s look at `string-0.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3
4int main(void)
5{
6    string s = GetString();
7
8    for (int i = 0; i < 6; i++)
9    {
10        printf("%c\n", s[i]); (10)
11    }
12}``````
• In line 10 we use `%c\n` to print each character on its own line, and to get each character, we use `s[i]`, as in the "box number" of the string `s`:

``````    -------------------------
s: | Z | a | m | y | l | a |
-------------------------
0   1   2   3   4   5``````
• So `s[0]` would get us `Z`, `s[1]` `a`, and so on, and as `i` is increased by the `for` loop, we will move through the string.

• But wait, this code won’t compile, because we’re missing header files. So let’s add `stdio.h` for `printf`, and `cs50.h` to have `string` and `GetString`.

`````` 1#include <cs50.h>
2#include <stdio.h>
3
4int main(void)
5{
6    string s = GetString();
7
8    for (int i = 0; i < 6; i++)
9    {
10        printf("%c\n", s[i]);
11    }
12}``````
• Now let’s run this program:

``````jharvard@appliance (~/Dropbox): ./string-0
Zamyla
Z
a
m
y
l
a
jharvard@appliance (~/Dropbox):``````
• Looks good. Let’s try it again with Daven’s full name:

``````jharvard@appliance (~/Dropbox): ./string-0
Davenport
D
a
v
e
n
p
jharvard@appliance (~/Dropbox):``````
• This bug happened because we put the 6 in the for loop, and we can actually determine the length of a string with `strlen`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3
4int main(void)
5{
6    string s = GetString();
7
8    for (int i = 0; i < strlen(s); i++)
9    {
10        printf("%c\n", s[i]);
11    }
12}``````
• Now we get a compiler error:

``````jharvard@appliance (~/Dropbox): make string-0
clang -ggdb3 -O0 -std=c99 -Wall -Werror    string-0.c  -lcs50 -lm -o string-0
string-0.c:8:25: error: implicitly declaring library function 'strlen' with type 'unsigned int (const char *)'
[-Werror]
for (int i = 0; i < strlen(s); i++)
^
string-0.c:8:25: note: please include the header <string.h> or explicitly provide a declaration for 'strlen'
1 error generated.
make: *** [string-0] Error 1
jharvard@appliance (~/Dropbox):``````
• We focus on `implicitly declaring library function`, which tells us we need to find the `strlen` function in another library.

• If we wanted to find out which library has this function, we can go into the terminal and run `man` for manual:

``````jharvard@appliance (~): man strlen
...
STRLEN(3)                  Linux Programmer's Manual                 STRLEN(3)

NAME
strlen - calculate the length of a string

SYNOPSIS
#include <string.h>
...``````
• So now we know to include `string.h`, and compiling and running now seem to work.

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5int main(void)
6{
7    string s = GetString();
8
9    for (int i = 0; i < strlen(s); i++)
10    {
11        printf("%c\n", s[i]);
12    }
13}``````
• We’ve been taking for granted that our laptop, and the CS50 Appliance, has a large amount of memory, but if we type for long enough, we can type more characters than we have memory, much like integers running out of bits.

• So we need to anticipate this problem by making sure `s` has some value that was indeed returned by `GetString`. If something goes wrong, `GetString` will return a special value, `NULL`, as in there is no value. We check if `s` is `NULL` before we use it, in `string-1.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5int main(void)
6{
7    string s = GetString();
8
9    if (s != NULL)
10    {
11        for (int i = 0; i < strlen(s); i++)
12        {
13            printf("%c\n", s[i]);
14        }
15    }
16}``````
• `!=` means not equal, so we only print the characters in `s` if there is a string in `s`.

• Let’s look at `string-2.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5int main(void)
6{
7    // get line of text
8    string s = GetString();
9
10    // print string, one character per line
11    if (s != NULL)
12    {
13        for (int i = 0; i < strlen(s); i++)
14        {
15            printf("%c\n", s[i]);
16        }
17    }
18}``````
• Remember in line 13 we initialize an `int i = 0` and increment it by `i++` every time. The middle part checks that `i < strlen(s)` is true in order to continue the loop, meaning we keep printing the character for the length of the string. But every time `i` is changing while `s` is not, so every time we are calculating the length of `s` over and over again unnecessarily. We can do a little better:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5int main(void)
6{
7    // get line of text
8    string s = GetString();
9
10    // print string, one character per line
11    if (s != NULL)
12    {
13        for (int i = 0, n = strlen(s); i < n; i++)
14        {
15            printf("%c\n", s[i]);
16        }
17    }
18}``````
• Now we initialize two variables, `i` and `n`, with `n` holding the length of the string. Though this version is equally as correct as the first, it is better design as we don’t need to answer the same question multiple times, and thus improved its efficiency.

• Note that we don’t have to say `int n` since it’s the same type as `i` and it is in the same statement.

• As an aside, there is no functional difference in this particular context between `i++` and `++i`, but `i++` is more clear that `i` is being incremented. Alternatively, you can write `i += 1`.

Typecasting18:50

• Typecasting is the ability to convert one datatype to another. Recall that ASCII maps letters to numbers. Let’s look at `ascii-0.c`:

`````` 1#include <stdio.h>
2
3int main(void)
4{
5    // display mapping for uppercase letters
6    for (int i = 65; i < 65 + 26; i++)
7    {
8        printf("%c: %i\n", (char) i, i);
9    }
10
11    // separate uppercase from lowercase
12    printf("\n");
13
14    // display mapping for lowercase letters
15    for (int i = 97; i < 97 + 26; i++)
16    {
17        printf("%c: %i\n", (char) i, i);
18    }
19}``````
• Let’s go through this program. Line 6 runs 26 times, starting from `65` because `A` is `65` in ASCII, and in line 8 we print a `char` and an `int`. It turns out by using `(char) i` we can print `i` out as a char. The loop below, starting at `97`, prints the lowercase characters in a similar way:

``````jharvard@appliance (~/Dropbox/src2w): make ascii-0
clang -ggdb3 -O0 -std=c99 -Wall -Werror    ascii-0.c  -lcs50 -lm -o ascii-0
jharvard@appliance (~/Dropbox/src2w): ./ascii-0
A: 65
B: 66
C: 67
...
X: 88
Y: 89
Z: 90

a: 97
b: 98
c: 99
...
x: 120
y: 121
z: 122``````
• Let’s look at `ascii-1.c`:

`````` 1#include <stdio.h>
2
3int main(void)
4{
5    // display mapping for uppercase letters
6    for (char c = 'A'; c <= 'Z'; c++)
7    {
8        printf("%c: %i\n", c, (int) c);
9    }
10
11    // separate uppercase from lowercase
12    printf("\n");
13
14    // display mapping for lowercase letters
15    for (char c = 'a'; c <= 'z'; c++)
16    {
17        printf("%c: %i\n", c, (int) c);
18    }
19}``````
• We can compare `c` to a character directly in lines 6 and 15 since the underlying data is still stored in bits and the computer will just compare the numbers. We can also see that `char c` can be converted back to an `int` in lines 8 and 17.

• Let’s look at `capitalize-0.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3#include <string.h>
4
5int main(void)
6{
7    // get line of text
8    string s = GetString();
9
10    // capitalize text
11    for (int i = 0, n = strlen(s); i < n; i++)
12    {
13        if (s[i] >= 'a' && s[i] <= 'z')
14        {
15            printf("%c", s[i] - ('a' - 'A'));
16        }
17        else
18        {
19            printf("%c", s[i]);
20        }
21    }
22    printf("\n");
23}``````
• First, we get a string from the user. In line 11 we iterate over the string character by character, storing the length of `s` in `n`. In line 13 we access each character, and determine if it’s lowercase by comparing its value to the boundaries of the values of lowercase characters. In line 15, we notice that `a` is `97`, `A` is `65`, `b` is `98`, `B` is `66`, and so on, meaning the difference between lowercase and uppercase is a constant 32. So we subtract that difference from the lowercase `s[i]`, which then gives us an uppercase character.

Abstraction28:16

• Now let’s look at `capitalize-1.c`:

`````` 1#include <cs50.h>
2#include <ctype.h>
3#include <stdio.h>
4#include <string.h>
5
6int main(void)
7{
8    // get line of text
9    string s = GetString();
10
11    // capitalize text
12    for (int i = 0, n = strlen(s); i < n; i++)
13    {
14        if (islower(s[i]))
15        {
16            printf("%c", toupper(s[i]));
17        }
18        else
19        {
20            printf("%c", s[i]);
21        }
22    }
23    printf("\n");
24}``````
• Here we can use a `toupper` function declared in `ctype.h` which we also included, and we call it by passing it `s[i]` within the parentheses. We also use `islower` to check if a character is lowercase. Notice that these functions were probably implemented with code similar to the previous example, but are nicely named and already exist for us to use. This follows along with the idea of abstracting away lower level details and using these functions to help us.

• If we look at the man page for `toupper`, we see this:

``````...
RETURN VALUE
The value returned is that of the converted letter, or c if the conversion was not possible.
...``````
• So now we can improve the code in `capitalize-2.c` by removing the `if` condition and allowing `toupper` to do the work:

`````` 1#include <cs50.h>
2#include <ctype.h>
3#include <stdio.h>
4#include <string.h>
5
6int main(void)
7{
8    // get line of text
9    string s = GetString();
10
11    // capitalize text
12    for (int i = 0, n = strlen(s); i < n; i++)
13    {
14        printf("%c", toupper(s[i]));
15    }
16    printf("\n");
17}``````
• Remember that we should also make sure that `s` is not `NULL`, or have a `do-while` loop prompt the user for a string until an acceptable one is given.

• For functions in the various libraries, `stdio.h`, `cs50.h`, `string.h`, `ctype.h`, reference.cs50.net has user-friendly explanations that are quite helpful.

Arrays33:57

• A volunteer from the audience, Alex, writes `Zamyla` on the screen, simulating `GetString`. He then writes `Belinda`, and then `Gabe`. If we think about the screen as all the memory we have, we notice that Alex wrote them with some spacing between the names. A computer has a grid of memory as well:

``````---------------------------------
| Z | a | m | y | l | a |   |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------``````
• The computer wants to be efficient, and use as much as the memory as possible:

``````---------------------------------
| Z | a | m | y | l | a |\0 |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------``````
• Strings end with a `\0` which, in binary, is eight 0s in a row. And this tells a computer that this is the end of a string in memory. So Belinda is added like so:

``````---------------------------------
| Z | a | m | y | l | a |\0 | B |
---------------------------------
| e | l | i | n | d | a |\0 |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------
|   |   |   |   |   |   |   |   |
---------------------------------``````
• And we can continue with even more names:

``````---------------------------------
| Z | a | m | y | l | a |\0 | B |
---------------------------------
| e | l | i | n | d | a |\0 | G |
---------------------------------
| a | b | e |\0 | D | a | v | e |
---------------------------------
| n |\0 |   |   |   |   |   |   |
---------------------------------``````
• So this general idea of storing items in boxes is known as an `array`. An `array` is a type of data structure, with a continguous number of the same type of data, back-to-back. A `string` is just an `array` of `char` variables, and we can even put numbers in an array. Arrays will generally have this format:

``type name[size];``
• Now say we wanted to get the ages of a number of people in the room. We might start with:

`````` 1#include <cs50.h>
2#include <stdio.h>
3
4int main(void)
5{
6    int age1 = GetInt();
7    int age2 = GetInt();
8    int age3 = GetInt();
9
10    // do something with those numbers ...
11}``````
• But this will limit us to only ever storing exactly 3 ages. So we can solve this problem with `ages.c`:

`````` 1#include <cs50.h>
2#include <stdio.h>
3
4int main(void)
5{
6    // determine number of people
7    int n;
8    do
9    {
10        printf("Number of people in room: ");
11        n = GetInt();
12    }
13    while (n < 1);
14
15    // declare array in which to store everyone's age
16    int ages[n];
17
18    // get everyone's age
19    for (int i = 0; i < n; i++)
20    {
21        printf("Age of person #%i: ", i + 1);
22        ages[i] = GetInt();
23    }
24
25    // report everyone's age a year hence
26    printf("Time passes...\n");
27    for (int i = 0; i < n; i++)
28    {
29        printf("A year from now, person #%i will be %i years old.\n", i + 1, ages[i] + 1);
30    }
31}``````
• In line 16, we declare an array that stores exactly `n` integers. The number in this case is how big we want the array to be, whereas earlier when we used `s[i]` we were retrieving that particular item in the array since it was already declared.

• Then we `GetInt` for each person, storing it in `ages[i]` as we go through the loop, meaning the ages will be placed in the first box, second box, and so on of the `ages[]` array.

• Finally, we iterate through the array again and print out each age, with 1 added to demonstrate what we can do after we retrieve the `int` from the array.

Command-Line Arguments46:00

• It turns out, instead of using `GetInt` or `GetString` to get input, we can use command-line arguments to get words typed into the blinking prompt, after your program’s name. For example, you might type:

``````jharvard@appliance (~): ./caesar 13
...
jharvard@appliance (~): ./caesar 7``````
• The number after `./caesar` is the command-line argument that can change every time you run the program, and the program will use it as input.

Cryptography46:36

• In Problem Set 2 we introduce you to cryptography, scrambling information. In particular, Caesar and Vigenere ciphers will rotate letters to another letter. Caesar adds a particular number to all the letters, while Vigenere adds a different number to each letter. In the end, you’ll see that you use the same key to turn plaintext into ciphertext, and back to plaintext.

• In the Hacker Edition, we’ll give you some usernames and encrypted (well, "hashed") passwords that look like `{crypt}\$1\$LlBcWwQn\$pxTB3yAjbVS/HTD2xuXFI0`, challenging you to crack them and finding the original passwords.

• A clip from Spaceballs humorously has "the stupidest combination I’ve ever heard in my life": `12345`.