Introduction00:00
-
Ask questions online at http://cs50.harvard.edu/discuss.
-
Today we’ll talk more about data representation and cryptography, or scrambling information, but first a story from yesteryear.
-
Radio Orphan Annie’s Secret Decoder Ring is a child-friendly form of cryptography, with two discs that rotates independently, each with a set of letters, such that
A
might now map toB
,B
to something likeC
, etc. -
This clip from A Christmas Story shows a child, Ralphie, excitedly decoding the secret message from the radio, only to find that it is an advertisement for Ovaltine, a beverage popular many years ago.
-
-
CS50 Lunch will again be Friday 1:15pm, RSVP at http://cs50.harvard.edu/rsvp.
Data Representation04:43
Strings04:43
-
Let’s say we want to represent a
string
for Zamyla’s name. We can place each character, orchar
, in its own box:------------------------- | Z | a | m | y | l | a | -------------------------
-
It’s useful to think of strings as composed of
char
puzzle pieces since we can access individual characters more easily. Let’s look atstring-0.c
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 string s = GetString(); 7 8 for (int i = 0; i < 6; i++) 9 { 10 printf("%c\n", s[i]); (10) 11 } 12}
-
In line 10 we use
%c\n
to print each character on its own line, and to get each character, we uses[i]
, as in the "box number" of the strings
:------------------------- s: | Z | a | m | y | l | a | ------------------------- 0 1 2 3 4 5
-
So
s[0]
would get usZ
,s[1]
a
, and so on, and asi
is increased by thefor
loop, we will move through the string. -
But wait, this code won’t compile, because we’re missing header files. So let’s add
stdio.h
forprintf
, andcs50.h
to havestring
andGetString
.1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 string s = GetString(); 7 8 for (int i = 0; i < 6; i++) 9 { 10 printf("%c\n", s[i]); 11 } 12}
-
Now let’s run this program:
jharvard@appliance (~/Dropbox): ./string-0 Zamyla Z a m y l a jharvard@appliance (~/Dropbox):
-
Looks good. Let’s try it again with Daven’s full name:
jharvard@appliance (~/Dropbox): ./string-0 Davenport D a v e n p jharvard@appliance (~/Dropbox):
-
This bug happened because we put the 6 in the for loop, and we can actually determine the length of a string with
strlen
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 string s = GetString(); 7 8 for (int i = 0; i < strlen(s); i++) 9 { 10 printf("%c\n", s[i]); 11 } 12}
-
Now we get a compiler error:
jharvard@appliance (~/Dropbox): make string-0 clang -ggdb3 -O0 -std=c99 -Wall -Werror string-0.c -lcs50 -lm -o string-0 string-0.c:8:25: error: implicitly declaring library function 'strlen' with type 'unsigned int (const char *)' [-Werror] for (int i = 0; i < strlen(s); i++) ^ string-0.c:8:25: note: please include the header <string.h> or explicitly provide a declaration for 'strlen' 1 error generated. make: *** [string-0] Error 1 jharvard@appliance (~/Dropbox):
-
We focus on
implicitly declaring library function
, which tells us we need to find thestrlen
function in another library. -
If we wanted to find out which library has this function, we can go into the terminal and run
man
for manual:jharvard@appliance (~): man strlen ... STRLEN(3) Linux Programmer's Manual STRLEN(3) NAME strlen - calculate the length of a string SYNOPSIS #include <string.h> ...
-
So now we know to include
string.h
, and compiling and running now seem to work.1#include <cs50.h> 2#include <stdio.h> 3#include <string.h> 4 5int main(void) 6{ 7 string s = GetString(); 8 9 for (int i = 0; i < strlen(s); i++) 10 { 11 printf("%c\n", s[i]); 12 } 13}
-
We’ve been taking for granted that our laptop, and the CS50 Appliance, has a large amount of memory, but if we type for long enough, we can type more characters than we have memory, much like integers running out of bits.
-
So we need to anticipate this problem by making sure
s
has some value that was indeed returned byGetString
. If something goes wrong,GetString
will return a special value,NULL
, as in there is no value. We check ifs
isNULL
before we use it, instring-1.c
:1#include <cs50.h> 2#include <stdio.h> 3#include <string.h> 4 5int main(void) 6{ 7 string s = GetString(); 8 9 if (s != NULL) 10 { 11 for (int i = 0; i < strlen(s); i++) 12 { 13 printf("%c\n", s[i]); 14 } 15 } 16}
-
!=
means not equal, so we only print the characters ins
if there is a string ins
.
-
-
-
Let’s look at
string-2.c
:1#include <cs50.h> 2#include <stdio.h> 3#include <string.h> 4 5int main(void) 6{ 7 // get line of text 8 string s = GetString(); 9 10 // print string, one character per line 11 if (s != NULL) 12 { 13 for (int i = 0; i < strlen(s); i++) 14 { 15 printf("%c\n", s[i]); 16 } 17 } 18}
-
Remember in line 13 we initialize an
int i = 0
and increment it byi++
every time. The middle part checks thati < strlen(s)
is true in order to continue the loop, meaning we keep printing the character for the length of the string. But every timei
is changing whiles
is not, so every time we are calculating the length ofs
over and over again unnecessarily. We can do a little better:1#include <cs50.h> 2#include <stdio.h> 3#include <string.h> 4 5int main(void) 6{ 7 // get line of text 8 string s = GetString(); 9 10 // print string, one character per line 11 if (s != NULL) 12 { 13 for (int i = 0, n = strlen(s); i < n; i++) 14 { 15 printf("%c\n", s[i]); 16 } 17 } 18}
-
Now we initialize two variables,
i
andn
, withn
holding the length of the string. Though this version is equally as correct as the first, it is better design as we don’t need to answer the same question multiple times, and thus improved its efficiency. -
Note that we don’t have to say
int n
since it’s the same type asi
and it is in the same statement.
-
-
-
As an aside, there is no functional difference in this particular context between
i++
and++i
, buti++
is more clear thati
is being incremented. Alternatively, you can writei += 1
.
Typecasting18:50
-
Typecasting is the ability to convert one datatype to another. Recall that ASCII maps letters to numbers. Let’s look at
ascii-0.c
:1#include <stdio.h> 2 3int main(void) 4{ 5 // display mapping for uppercase letters 6 for (int i = 65; i < 65 + 26; i++) 7 { 8 printf("%c: %i\n", (char) i, i); 9 } 10 11 // separate uppercase from lowercase 12 printf("\n"); 13 14 // display mapping for lowercase letters 15 for (int i = 97; i < 97 + 26; i++) 16 { 17 printf("%c: %i\n", (char) i, i); 18 } 19}
-
Let’s go through this program. Line 6 runs 26 times, starting from
65
becauseA
is65
in ASCII, and in line 8 we print achar
and anint
. It turns out by using(char) i
we can printi
out as a char. The loop below, starting at97
, prints the lowercase characters in a similar way:jharvard@appliance (~/Dropbox/src2w): make ascii-0 clang -ggdb3 -O0 -std=c99 -Wall -Werror ascii-0.c -lcs50 -lm -o ascii-0 jharvard@appliance (~/Dropbox/src2w): ./ascii-0 A: 65 B: 66 C: 67 ... X: 88 Y: 89 Z: 90 a: 97 b: 98 c: 99 ... x: 120 y: 121 z: 122
-
-
Let’s look at
ascii-1.c
:1#include <stdio.h> 2 3int main(void) 4{ 5 // display mapping for uppercase letters 6 for (char c = 'A'; c <= 'Z'; c++) 7 { 8 printf("%c: %i\n", c, (int) c); 9 } 10 11 // separate uppercase from lowercase 12 printf("\n"); 13 14 // display mapping for lowercase letters 15 for (char c = 'a'; c <= 'z'; c++) 16 { 17 printf("%c: %i\n", c, (int) c); 18 } 19}
-
We can compare
c
to a character directly in lines 6 and 15 since the underlying data is still stored in bits and the computer will just compare the numbers. We can also see thatchar c
can be converted back to anint
in lines 8 and 17.
-
-
Let’s look at
capitalize-0.c
:1#include <cs50.h> 2#include <stdio.h> 3#include <string.h> 4 5int main(void) 6{ 7 // get line of text 8 string s = GetString(); 9 10 // capitalize text 11 for (int i = 0, n = strlen(s); i < n; i++) 12 { 13 if (s[i] >= 'a' && s[i] <= 'z') 14 { 15 printf("%c", s[i] - ('a' - 'A')); 16 } 17 else 18 { 19 printf("%c", s[i]); 20 } 21 } 22 printf("\n"); 23}
-
First, we get a string from the user. In line 11 we iterate over the string character by character, storing the length of
s
inn
. In line 13 we access each character, and determine if it’s lowercase by comparing its value to the boundaries of the values of lowercase characters. In line 15, we notice thata
is97
,A
is65
,b
is98
,B
is66
, and so on, meaning the difference between lowercase and uppercase is a constant 32. So we subtract that difference from the lowercases[i]
, which then gives us an uppercase character.
-
Abstraction28:16
-
Now let’s look at
capitalize-1.c
:1#include <cs50.h> 2#include <ctype.h> 3#include <stdio.h> 4#include <string.h> 5 6int main(void) 7{ 8 // get line of text 9 string s = GetString(); 10 11 // capitalize text 12 for (int i = 0, n = strlen(s); i < n; i++) 13 { 14 if (islower(s[i])) 15 { 16 printf("%c", toupper(s[i])); 17 } 18 else 19 { 20 printf("%c", s[i]); 21 } 22 } 23 printf("\n"); 24}
-
Here we can use a
toupper
function declared inctype.h
which we also included, and we call it by passing its[i]
within the parentheses. We also useislower
to check if a character is lowercase. Notice that these functions were probably implemented with code similar to the previous example, but are nicely named and already exist for us to use. This follows along with the idea of abstracting away lower level details and using these functions to help us. -
If we look at the man page for
toupper
, we see this:... RETURN VALUE The value returned is that of the converted letter, or c if the conversion was not possible. ...
-
So now we can improve the code in
capitalize-2.c
by removing theif
condition and allowingtoupper
to do the work:1#include <cs50.h> 2#include <ctype.h> 3#include <stdio.h> 4#include <string.h> 5 6int main(void) 7{ 8 // get line of text 9 string s = GetString(); 10 11 // capitalize text 12 for (int i = 0, n = strlen(s); i < n; i++) 13 { 14 printf("%c", toupper(s[i])); 15 } 16 printf("\n"); 17}
-
Remember that we should also make sure that
s
is notNULL
, or have ado-while
loop prompt the user for a string until an acceptable one is given.
-
-
-
For functions in the various libraries,
stdio.h
,cs50.h
,string.h
,ctype.h
, reference.cs50.net has user-friendly explanations that are quite helpful.
Arrays33:57
-
A volunteer from the audience, Alex, writes
Zamyla
on the screen, simulatingGetString
. He then writesBelinda
, and thenGabe
. If we think about the screen as all the memory we have, we notice that Alex wrote them with some spacing between the names. A computer has a grid of memory as well:--------------------------------- | Z | a | m | y | l | a | | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | ---------------------------------
-
The computer wants to be efficient, and use as much as the memory as possible:
--------------------------------- | Z | a | m | y | l | a |\0 | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | ---------------------------------
-
Strings end with a
\0
which, in binary, is eight 0s in a row. And this tells a computer that this is the end of a string in memory. So Belinda is added like so:--------------------------------- | Z | a | m | y | l | a |\0 | B | --------------------------------- | e | l | i | n | d | a |\0 | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | ---------------------------------
-
And we can continue with even more names:
--------------------------------- | Z | a | m | y | l | a |\0 | B | --------------------------------- | e | l | i | n | d | a |\0 | G | --------------------------------- | a | b | e |\0 | D | a | v | e | --------------------------------- | n |\0 | | | | | | | ---------------------------------
-
So this general idea of storing items in boxes is known as an
array
. Anarray
is a type of data structure, with a continguous number of the same type of data, back-to-back. Astring
is just anarray
ofchar
variables, and we can even put numbers in an array. Arrays will generally have this format:type name[size];
-
Now say we wanted to get the ages of a number of people in the room. We might start with:
1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 int age1 = GetInt(); 7 int age2 = GetInt(); 8 int age3 = GetInt(); 9 10 // do something with those numbers ... 11}
-
But this will limit us to only ever storing exactly 3 ages. So we can solve this problem with
ages.c
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 // determine number of people 7 int n; 8 do 9 { 10 printf("Number of people in room: "); 11 n = GetInt(); 12 } 13 while (n < 1); 14 15 // declare array in which to store everyone's age 16 int ages[n]; 17 18 // get everyone's age 19 for (int i = 0; i < n; i++) 20 { 21 printf("Age of person #%i: ", i + 1); 22 ages[i] = GetInt(); 23 } 24 25 // report everyone's age a year hence 26 printf("Time passes...\n"); 27 for (int i = 0; i < n; i++) 28 { 29 printf("A year from now, person #%i will be %i years old.\n", i + 1, ages[i] + 1); 30 } 31}
-
In line 16, we declare an array that stores exactly
n
integers. The number in this case is how big we want the array to be, whereas earlier when we useds[i]
we were retrieving that particular item in the array since it was already declared. -
Then we
GetInt
for each person, storing it inages[i]
as we go through the loop, meaning the ages will be placed in the first box, second box, and so on of theages[]
array. -
Finally, we iterate through the array again and print out each age, with 1 added to demonstrate what we can do after we retrieve the
int
from the array.
-
Command-Line Arguments46:00
-
It turns out, instead of using
GetInt
orGetString
to get input, we can use command-line arguments to get words typed into the blinking prompt, after your program’s name. For example, you might type:jharvard@appliance (~): ./caesar 13 ... jharvard@appliance (~): ./caesar 7
-
The number after
./caesar
is the command-line argument that can change every time you run the program, and the program will use it as input.
-
Cryptography46:36
-
In Problem Set 2 we introduce you to cryptography, scrambling information. In particular, Caesar and Vigenere ciphers will rotate letters to another letter. Caesar adds a particular number to all the letters, while Vigenere adds a different number to each letter. In the end, you’ll see that you use the same key to turn plaintext into ciphertext, and back to plaintext.
-
In the Hacker Edition, we’ll give you some usernames and encrypted (well, "hashed") passwords that look like
{crypt}$1$LlBcWwQn$pxTB3yAjbVS/HTD2xuXFI0
, challenging you to crack them and finding the original passwords.
-
-
A clip from Spaceballs humorously has "the stupidest combination I’ve ever heard in my life":
12345
.