Introduction
-
Rob Bowden teaches today, and makes fun of David for sending a vertical video to apologize for missing today’s lecture.
-
Today we’ll talk more about data representation and cryptography, or scrambling information, but first a story from yesteryear.
-
Radio Orphan Annie’s Secret Decoder Ring is a child-friendly form of cryptography, with two discs that rotates independently, the outer ring containing the letters
A
-Z
and the inner ring the numbers1
-26
to encode a message by mapping letters to numbers. -
This clip from A Christmas Story shows a child, Ralphie, excitedly decoding the secret message from the radio, only to find that it is an advertisement for Ovaltine, a beverage popular many years ago.
-
More on Strings
-
Recall from Monday the following representation of a string containing Zamyla’s name, with the individual `char`s split up into separate boxes.
------------------------- | Z | a | m | y | l | a | -------------------------
-
We can access these individual characters using bracket notation, so if we assign this string to a variable called
s
, we can get the first character usings[0]
, the second withs[1]
, and so on up tos[5]
(the finala
). -
This string is of length
6
, but positions in a string are 0-indexed, so it contains indices0
through5
.
-
-
Now let’s look at this string in a larger context of your computer’s memory:
--------------------------------- | Z | a | m | y | l | a | | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | --------------------------------- | | | | | | | | | ---------------------------------
-
The computer’s memory is basically one long string of bytes, but we’ll represent it as a grid to save space.
-
-
A volunteer from the audience acts as a computer implementing the following code and storing the strings in memory:
#include <stdio.h> #include <cs50.h> int main(void) { // get four strings from the user string s1 = GetString(); string s2 = GetString(); string s3 = GetString(); string s4 = GetString(); // print the first string entered printf("string s1 is %s\n", s1); }
-
Rob, as the user, provides the strings DEAN, HANNAH, MARIA, and ROB, and our volunteer fills in the computer’s memory like so:
------------------------- |...| D | E | A | N | | ------------------------- | H | A | N | N | A | H | ------------------------- | | M | A | R | I | A | ------------------------- | | R | O | B | |...| -------------------------
-
Our volunteer left spaces between each string so the computer can tell where each string ends (otherwise, when we try to print just
s1
, we would getDEANHANNAHMARIAROB
rather than justDEAN
!). -
Rather than spaces, the computer actually uses a special terminator character,
\0
, to represent the end of a string:------------------------- |...| D | E | A | N |\0 | ------------------------- | H | A | N | N | A | H | ------------------------- |\0 | M | A | R | I | A | ------------------------- |\0 | R | O | B |\0 |...| -------------------------
-
-
The
strlen
function also relies on the presence of\0
- it just iterates over the characters of the string until it finds a\0
. -
In memory, the character
\0
is actually represented by a byte of all0
s (so 8 consecutive 0 bits). So what about the character0
? Remember ASCII, the system that maps characters to underlying byte values, whereA
maps to65
and so on; the number0
is represented by ASCII48
. -
We’ll refer to the
\0
character as a nul terminator.
Arrays
-
So this general idea of storing items in boxes, as we do under the hood in a string, is known as an array. An array is a type of data structure, with a continguous number of the same type of data, back-to-back. A
string
is just an array ofchar
variables, but we can put any of our other data types in an array as well. -
Now say we wanted to get the ages of a number of people in the room. We might start with:
1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 int age0 = GetInt(); 7 int age1 = GetInt(); 8 int age2 = GetInt(); 9 10 // do something with those numbers ... 11}
-
But this will force us to completely rewrite our code, and copy-paste that
GetInt
line for every person we want to get an age from. -
We can solve this problem by using an array. The general format for declaring an array of a given
type
andsize
as a variable calledname
is:type name[size];
-
Let’s look at how we do this in
ages.c
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(void) 5{ 6 // determine number of people 7 int n; 8 do 9 { 10 printf("Number of people in room: "); 11 n = GetInt(); 12 } 13 while (n < 1); 14 15 // declare array in which to store everyone's age 16 int ages[n]; 17 18 // get everyone's age 19 for (int i = 0; i < n; i++) 20 { 21 printf("Age of person #%i: ", i + 1); 22 ages[i] = GetInt(); 23 } 24 25 // report everyone's age a year hence 26 printf("Time passes...\n"); 27 for (int i = 0; i < n; i++) 28 { 29 printf("A year from now, person #%i will be %i years old.\n", i + 1, ages[i] + 1); 30 } 31}
-
In line 16, we declare an array that stores exactly
n
integers. The number in this case is how big we want the array to be, whereas earlier when we useds[i]
we were retrieving that particular item in the array since it was already declared.-
This means that somewhere in memory, we have
n
integer-sized (32-bit, or 4-byte) boxes in a row.
-
-
Then we
GetInt
for each person, storing it inages[i]
as we go through the loop, meaning the ages will be placed in the first box, second box, and so on of theages[]
array.-
Now it should make a little more sense why the convention is to start
for
loops from0
rather than1
- we very often use afor
loop to iterate over the indices of an array, and arrays are 0-indexed.
-
-
Finally, we iterate through the array again and print out each age, with 1 added to demonstrate what we can do after we retrieve the
int
from the array. -
What happens if we try to store another
int
inages[n+1]
? Just as we saw when we tried to read many bytes past the end of a string, this results in a segmentation fault, an error that indicates we’ve touched a segment of memory that doesn’t belong to us. -
What if we didn’t check whether the user entered a positive number for the number of people in the room? An array cannot have a negative size - if you declare an array with a negative size directly in your code, it will not compile - so if the user gets away with entering a negative size for the array, it also results in a segmentation fault.
-
Command-Line Arguments
-
Now that we’ve seen arrays, we can start to work with command-line arguments, which we’ll need for Problem Set 2.
-
Commands like
cd
don’t ask for input; instead they take arguments from the command-line, socd pset1
changes the directory topset1
without a separate prompt for input.mkdir pset2
makes a directory calledpset2
, andmake hello
builds a program calledhello
.-
clang -o hello hello.c
has three such arguments (-o
,hello
, andhello.c
).
-
-
We’ve been writing programs that look like this, whereby
main
does not take any arguments (as implied by the presence ofvoid
):#include <cs50.h> #include <stdio.h> int main(void) { // TODO }
-
This means that no other words can be typed after the program’s name and accessed within
main
, and the only way to provide input is by a function running after the program is started, like withGetString
.
-
-
We will start adding code like this:
#include <cs50.h> #include <stdio.h> int main(int argc, string argv[]) { // TODO }
-
We see that
main
now takes two arguments,argc
which is anint
, andargv
which is an array of strings. We specify the name,argv
(short for argument vector, or array of arguments), but not the size, so any array can be passed in tomain.
-
This is a slightly different use of the bracket notation we haven’t seen before - rather than indicating the size of an array or a position within an array, these empty brackets just mean that
argv
is an array (of unspecified size), and we know it’s an array of strings because it’s declared asstring argv[]
. -
We need the size of
argv
to be unspecified because many programs (such asclang
) can take different numbers of command-line arguments depending on what you’re trying to do with them.
-
-
So command-line arguments look like this in memory:
argc ------------ | | ------------ argv ------------------------------------------------ | | | | | ... ------------------------------------------------
-
argv
is a chunk of memory that stores onestring
after another, andargc
is a single chunk of memory that holds anint
. -
We can access each string individually:
argc ------------ | | ------------ argv[0] argv[1] argv[2] argv[3] ------------------------------------------------ | | | | | ... ------------------------------------------------
-
If we run a program with
./hello
, the contents ofargc
andargv[0]
would be as follows:argc ------------ | 1 | ------------ argv[0] argv[1] argv[2] argv[3] ------------------------------------------------- | ./hello | | | | ... -------------------------------------------------
-
If we ran
clang -o hello hello.c
, however, we get:argc ------------ | 4 | ------------ argv[0] argv[1] argv[2] argv[3] ------------------------------------------------- | clang | -o | hello | hello.c | ... -------------------------------------------------
-
Since we don’t know where
argv
will end by itself, we needargc
to tell us where to stop looking. -
Let’s write a program that uses these arguments. What about a program that says
hello
without usingGetString
? Instead, it’ll take arguments like this:argc ------------ | 2 | ------------ argv[0] argv[1] argv[2] argv[3] ------------------------------------------------- | ./hello | Zamyla | | | ... -------------------------------------------------
-
We’ll call this
hello-3.c
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(int argc, string argv[]) 5{ 6 printf("hello, %s\n", argv[1]); 7}
-
argv[1]
contains whateverstring
is passed in after the name of our program.
-
-
But what happens if we don’t type someone’s name in?
jharvard@ide50:~/workspace/src2w $ make hello-3 clang -ggdb3 -O0 -std=c99 -Wall -Werror hello-3.c -lcs50 -lm -o hello-3 jharvard@ide50:~/workspace/src2w $ ./hello-3 hello, (null)
-
printf
is printing(null)
because there’s nothing (well, technically,NULL
) inargv[1]
. -
When we run just
./hello-3
,argc
is 1, so the length of the arrayargv
is 1, so the only valid index inargv
isargv[0]
.
-
-
So let’s look at how we can prevent something like this happening in
hello-4.c
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(int argc, string argv[]) 5{ 6 if (argc == 2) 7 { 8 printf("hello, %s\n", argv[1]); 9 } 10 else 11 { 12 printf("hello, you\n"); 13 } 14}
-
On line 6 we make sure that
argc
has a value of2
, and if so, we know we have a single name string to plug in and say hello to the user. If not, we instead substitute a generic message of "hello, you".jharvard@ide50:~/workspace/src2w $ ./hello-4 hello, you jharvard@ide50:~/workspace/src2w $ ./hello-4 Rob hello, Rob jharvard@ide50:~/workspace/src2w $ ./hello-4 Rob Maria hello, you jharvard@ide50:~/workspace/src2w $
-
-
Let’s look at
argv-1.c
:1#include <cs50.h> 2#include <stdio.h> 3 4int main(int argc, string argv[]) 5{ 6 // print arguments 7 for (int i = 0; i < argc; i++) 8 { 9 printf("%s\n", argv[i]); 10 } 11}
-
We’re iterating over the
argv
array, using its length,argc
, to know when to stop. -
This will print each argument, one per line:
jharvard@ide50:~/workspace/src2w $ make argv-1 clang -ggdb3 -O0 -std=c99 -Wall -Werror argv-1.c -lcs50 -lm -o argv-1 jharvard@ide50:~/workspace/src2w $ ./argv-1 ./argv-1 jharvard@ide50:~/workspace/src2w $ ./argv-1 Rob ./argv-1 Rob jharvard@ide50:~/workspace/src2w $ ./argv-1 Rob Maria Hannah ./argv-1 Rob Maria Hannah
-
Note that
argc
can never be less than 1, because there will always at least be the name of the program itself.
-
-
We can take this further in
argv-2.c
:1#include <cs50.h> 2#include <stdio.h> 3#include <string.h> 4 5int main(int argc, string argv[]) 6{ 7 // print arguments 8 for (int i = 0; i < argc; i++) 9 { 10 for (int j = 0, n = strlen(argv[i]); j < n; j++) 11 { 12 printf("%c\n", argv[i][j]); 13 } 14 printf("\n"); 15 } 16}
-
Now we go through each argument with line 8, but in line 10 we check the length of the argument stored in
argv[i]
, store it inn
, and usej
as a counter to iterate throughargv[i]
sincei
was already used. -
Then in line 12 we use this new syntax,
argv[i][j]
that gets the i’th string and the j’th character in that string. This relies on the fact thatargv
is an array of strings, which are themselves arrays of characters, soargv
is a nested array of arrays.jharvard@ide50:~/workspace/src2w $ make argv-2 clang -ggdb3 -O0 -std=c99 -Wall -Werror argv-2.c -lcs50 -lm -o argv-2 jharvard@ide50:~/workspace/src2w $ ./argv-2 . / a r g v - 2 jharvard@ide50:~/workspace/src2w $ ./argv-2 foo bar . / a r g v - 2 f o o b a r
-
-
One thing to note is that command-line arguments are generally separated by spaces, as you’d expect, but if you want to pass an argument that itself contains a space, you can do it as follows:
jharvard@ide50:~/workspace/src2w $ ./argv-2 Rob Maria 'Hannah Blumberg'
-
By enclosing
Hannah Blumberg
in quotes, we tell the program that it’s not two separate arguments, but rather only one.
-
Cryptography
-
In Problem Set 2 we introduce you to cryptography, specifically secret-key crypto, which can only be decoded by someone who knows the secret key that was used to encode the message.
-
In the Hacker Edition, we’ll give you some usernames and encrypted (well, "hashed") passwords that look like
{crypt}$1$LlBcWwQn$pxTB3yAjbVS/HTD2xuXFI0
, challenging you to crack them and finding the original passwords. -
After this problem set, you’ll be able to decode what this means:
or fher gb qevax lbhe Binygvar
-
As well as this URL, which you may remember from Week 0:
uggcf://lbhgh.or/bUt5FWLEUN0