[MUSIC PLAYING] DAVID MALAN: All right, so this is CS50. And this is week 2, wherein we're going to dive in a little more deeply to see this new language. And we're also going to take a look back at some of the concepts we looked at last week so that you can better understand some of the features of C and some of the steps you've been taking to make your code work. So we'll peel back some of the layers of abstraction from last week so that you better understand really what's going on underneath the hood of the computer. 

So, of course, last week, we began with perhaps the most canonical of programs in C, the most canonical of programs you can write pretty much in any language, which is that which says, quite simply, "hello, world." But recall that before actually running this program, we have to convert it into the language that computers themselves speak, which we defined last week as binary, 0's and 1's, otherwise known as machine language in this context. So we have to go somehow from this source code to something more like this machine code, the 0's and 1's that the computer actually understands. 

Now, you may recall too that we introduced a command for this. And that command was called make. And literally via this command, "make hello," could we make a program called hello. And make was a little fancy. It assumed that if you want to make a program called hello, it would look for a file called hello.c. That just happens automatically for you. And the end result, of course, was an additional file called hello that would end up getting put into your current directory. So you could then do ./hello and be on your way. 

But it turns out that make is actually automating a more specific set of steps for us that we'll see a little more closely now instead. So on the screen here is exactly the same code that we wrote last week to say, quite simply, "hello, world." And recall that any time you run "make hello" or "make mario" or "make cash" or "make credit," any of the problems that you might have tackled more recently, you see some cryptic output on the screen. Hopefully, no red or yellow error messages, but even when all is well, you see this white text which is indicative of all having been well. 

And last week, we just kind of ignored this and moved on and immediately did something like ./hello. But today, let's actually better understand what it is that we've been turning a blind eye to so that each week, as it passes, there's less and less that you don't understand the entirety of with respect to what's going on on your screen. 

So again, if I do ls here, we'll see not only hello.c, but also the executable program called hello that I actually created via make. But look at this output. There's some mention of something called Clang here. And then there's a lot of other words or cryptic phrases, something in computer speak here that has all of these hyphens in front of them. 

And it turns out that what make is doing for us is it's automating execution of a command more specifically called clang. Clang is actually the compiler that we alluded to last week, a compiler being a program that converts source code to machine code. We've actually been using Clang this whole time. But notice that Clang requires a bit more sophistication. You have to understand a bit more about what's going on in order to use it. 

So let me go ahead and remove the program called hello. I'm going to use the rm command that we saw briefly last time. I'm going to confirm by hitting y. And if I type ls again now, hello.c is the only file that remains. 

Well, temporarily, let me take away the ability to use make. And let's now use Clang directly. Clang is another program installed in CS50 IDE. It's a very popular compiler that you can download onto your own Macs and PCs as well. But to run it is a little different. I'm going to go ahead and say clang and then the name of the file that I want to compile, hello.c being this one. I'm going to go ahead and hit Enter. And now nothing happens, seemingly. But frankly, as you've probably gleaned already, when nothing bad seems to happen, that implicitly tends to mean that something good happened. Your program compiled successfully. 

But curiously, if I type ls now, you don't see the program, hello. You see this weird file name called a.out. And this is actually a historical remnant. Years ago, when humans would use a compiler to compile their code, the default file name that every program was given was a.out, for assembler output. More on that in a moment. But this is kind of a stupid name for a program. It's not at all descriptive of what it does. 

So it turns out that programs like Clang can be configured at the command line. The command line, again, refers to the blinking prompt where you can type commands. So indeed, I'm going to go ahead and remove this file now-- rm space a.out, and then confirm with y. And now I'm back to where I began with just hello.c. 

And let me go ahead now and do something a little different. I'm going to do "clang -o hello" and then the word "hello.c." And what I'm doing here is actually providing what we're going to start calling a command-line argument. So these commands, like make and rm, sometimes can just be run all by themselves. You just type a single word and hit Enter. 

But very often, we've seen that they take inputs in some sense. You type, "make hello." You type, "rm hello." And the second word, "hello," in those cases, is kind of an input to the command, otherwise now known as a command-line argument. It's an input to the command. 

So here, we have more command-line arguments. We've got the word "clang," which is the compiler we're about to run, "-o," which it turns out is shorthand notation for "output," so please output the following. What do you want to output? Well, the next word is "hello." And then the final word is "hello.c." 

So long story short, this command, more verbose though it is, is saying: run Clang, output a file called hello, and take as input a file called hello.c. So when I run this command after hitting Enter, nothing again seems to happen. But if I type ls, I don't see that stupid default file name of a.out. Now I see the file name, hello. 

So this is how ultimately Clang is helping me compile my code. It's kind of automating all of those processes. But recall that that's not the only type of program we ran last week or wrote last week. We rather took code like this and began to enhance it with some additional lines. So version 2 of Hello, World actually involved prompting the user for input using CS50's get_string function, storing the output in a variable called name. But recall that we also had to add cs50.h at the top of the file. 

So let me go ahead and do that. Let me go ahead and remove hello because that's now the old version. Let me go in now and start updating my code here and go into my hello.c file, include cs50.h, now get myself a string called name, but we could call it anything, call the function get_string, and ask, "What's your name," question mark with a space at the very end just to create a gap. 

And then down here, instead of printing out "hello, world" always, let me print out "Hello, %s," which is a placeholder recall, and output the person's name. So last week, the way we compiled this program was just "make hello," no different from now. But this week, suppose I were to instead get rid of make, only because it's sort of automating steps for me that I now want to understand in more detail. I could compile this program again with clang -o hello hello.c, so just a reapplication of that same idea of passing in three arguments, -o, hello, and hello.c. 

But the catch now is that I'm actually going to see one of these red error messages. And let's consider what this is actually saying. There's still going to be a bunch of cryptic stuff here. But notice, as always, we're going to see, hopefully, something that's a little familiar. So "undefined reference to get_string." I don't yet know what an undefined reference is, necessarily. I don't know what a linker command is. But I at least recognize there's something going on with get_string. 

And there's a reason for this. It turns out that when using a library, whether it's CS50's library or others' as well, it's sometimes not sufficient only to include the header file at the top of your own code. Sometimes, you additionally have to tell the computer where to find the 0's and 1's that someone has written to implement a function like get_string. 

So the header file, like cs50.h, just tells the compiler that the function exists. But there's a second mechanism that, up until now, has been automated for us, that tells the computer where to find the actual 0's and 1's that implement the functions in that header file. So with that said, I'm going to need to actually add another command-line argument to this command. And instead of doing clang -o hello hello.c, I'm going to additionally, and admittedly, cryptically, do -lcs50 at the end of this command, which quite simply refers to link in the CS50 library. 

So "link" is a term of art that we'll see what it means in more detail in just a moment. But this additional final command-line argument tells Clang, you already know that a function like get_string exists. -lcs50 means when compiling hello.c, make sure to incorporate all of the machine code from CS50's library into your program as well. In short, it's something you have to do when you use certain libraries. 

So now when I hit Enter, all seems to be well because nothing bad got printed. If I type ls, I see hello. And voila, I can do ./hello, type in my name, David. And voila, "hello, David." 

So why didn't we do all of this last week? And frankly, we've made no fundamental progress. All we've done is reveal what's going on underneath the hood. But I'll claim that, frankly, compiling your code by typing out all of these verbose command-line arguments just gets tedious quickly. And so computer scientists and programmers, more specifically, tend to automate monotonous steps. 

So what's happening ultimately with make is that all of this is being automated for us. So when you typed "make hello" last week-- and henceforth, you're welcome to continue using make as well-- notice that it generates this extra long command, some of which we haven't even talked about. But I do recognize clang at the beginning. I recognize hello.c here. I recognize -lcs50 here. 

But notice there's a bunch of other stuff as well, not only the -o hello, but also -lm, which refers to a math library, -lcrypt, which refers to a cryptography or an encryption library. In short, we the staff have preconfigured make to just make sure that when you compile your code, all of the requisite dependencies, libraries, and so forth, are available to you without having to worry about all of these command-line arguments. 

So henceforth, you can certainly compile your code in this way using Clang directly. Or you can come back full circle to where we were last week and just run "make hello." But there's a reason we run make hello, because executing all of those steps manually tends to just get tedious quickly. 

And so indeed, what we've done here is compile our code. And compiling means going from source code to machine code. But today, we revealed that there's a little more, indeed, going on underneath the hood, this "linking" that I referred to and a couple of other steps as well. So it turns out when you compile your code from source code to machine code, there's a few more steps that are ultimately involved. 

And when we say "compiling," we actually mean these four steps. And we're not going to dwell on these kinds of low-level details. But it's perhaps enlightening just to see a brief tour of what's going on when you start with your source code and end up trying to produce machine code. 

So let's consider this. This is step 1 that the computer is doing for you when you compile your code. So step 1 takes your own source code that looks a little something like this. And it preprocesses your code, top to bottom, left to right. And to preprocess your code essentially means that it looks for any lines that start with a hash symbol, so #include cs50.h, #include stdio.h. 

And what the preprocessing step does is it's kind of like a find and replace. It notices, oh, here's a #include line. Let me go ahead and copy the contents of that file, cs50.h, into your own code. Similarly, when I encounter #include stdio.h, let me, the so-called preprocessor, open that file, stdio.h, and copy/paste the contents of that file so that what's in the file now looks more like this. So this is happening automatically. You never have to do this manually. 

But why is there this preprocessing step? If you recall our discussion last week of these lines of code that tend to go at the top of your file, does anyone perceive what the preprocessor is doing for me and why? Why do I write code that has these hash symbols, like #include cs50.h and #include stdio.h, but this preprocessor apparently is automatically replacing those lines with the actual contents of those files? What are these things here in yellow now? Yeah, Jack, what do you think? 

JACK: Is it defining all the functions for you to use in your code, otherwise the computer wouldn't know what to do? 

DAVID MALAN: Exactly. It's defining all of the functions in my code so that the computer knows what to do. Because remember that we ran into that sort of annoying bug last week, whereby I was trying to implement a function called, I think, get_positive_int. And recall that when I implemented that function at the bottom of my file, the compiler was kind of dumb in that it didn't realize that it existed because it was implemented all the way at the bottom of my file. 

So to Jack's point, by putting a mention of this function, a hint, if you will, at the very top, it's like training the compiler to know in advance that I don't know how it's implemented yet, but I know get_string is going to exist. I don't know how it's implemented yet, but I know printf is going to exist. 

So these header files that we've been including for the past week essentially contain all of the prototypes-- that is, all of the hints for all the functions that exist in the library-- so that your code, when compiled, knows from the top down that those functions will indeed exist. So the preprocessor just saves us the trouble of having to copy and paste all of these prototypes, if you will, all of these hints, ourselves. 

So what happens after that step there? What comes next? Well, there might very well be other header files. There might very well be other contents in those files. But for now, let's just assume that only in there is the prototype. So now compiling actually has a more precise meaning that we'll define today. To compile your code now means to take this C code and to convert it from source code here to another type of source code here. 

Now, this is probably going to be the most cryptic stuff we ever see. And this is not code you need to understand. But what's on the screen here is what's called assembly code. So long story short, there's a lot of different computers in the world. And specifically, there's a lot of different types of CPUs in the world, Central Processing Units, the brains of a computer. And a CPU understands certain commands. And those commands tend to be expressed in this language called assembly code. 

Now, I honestly don't really understand most of this myself. It's certainly been a while even since I thought hard about assembly code. But if I highlight a few operative characters here, notice that there's mention of main, get_string, printf. So this is kind of like a lower-level implementation of main, of get_string and printf, in a different language called assembly. So you write the C code. The computer, though, converts it to a more computer-friendly language called assembly code. 

And decades ago, humans wrote this stuff. Humans wrote assembly code. But nowadays, we have C. And nowadays, we have languages like Python-- more on that in a few weeks-- that are just more user friendly, even if it didn't feel like that this past week. Assembly code is a little closer to what the computer itself understands. 

But there's still another step. There's this step called assembling. And again, all of this is happening when you simply run make and, in turn, this command, clang. To assemble your code means to take this assembly code and finally convert it to machine code, 0's and 1's. So you write the source code. The compiler preprocesses it. Then it compiles it into assembly code. Then it assembles it into machine code until we have the actual 0's and 1's. 

But there's actually one final step. Just because your code that you wrote has been converted into 0's and 1's, it still needs to be linked in with the 0's and 1's that CS50 wrote and that the designers of the C language wrote years ago when implementing the CS50 library in our case, and the printf function in their case. 

So this is to say that when you have code like this that's not only including the prototypes for functions like get_string and printf at the very top, these lines here in yellow are what are ultimately converted into 0's and 1's. We now have to combine those 0's and 1's with the 0's and 1's from cs50.c, which the staff wrote some time ago, and even a file called stdio.c, which the designers of C wrote years ago. And technically, it might be called something different underneath the hood. 

But there's really three files that are getting combined when you write your program. The first, I just claimed, once it's preprocessed and compiled and assembled, it's then in this form of all 0's and 1's. Somewhere on the CS50 IDE, there's a whole bunch of 0's and 1's representing cs50.c. Somewhere in CS50 IDE, there's another file representing the 0's and 1's for stdio.c. So this final fourth step, a.k.a. linking, just takes all of my 0's and 1's, all of CS50's 0's and 1's, all of printf's 0's and 1's, and links them all together into one big blob, if you will, that collectively represents your program, hello. 

So, my god, like, that's quite a mouthful and so many steps. And none of the steps I have described are really germane to you implementing Mario's pyramid or cash or credit, because what we've really been doing over the past week is taking all four of these fairly low-level, sophisticated concepts and, if you will, abstracting them away so that we just refer to this whole process as compiling. 

So even though, yes, technically, compiling is just one of the four steps, what a programmer typically does when saying compiling is they're, just with a wave of the hand, referring to all of those lower-level details. But it is the case that there's multiple steps happening underneath the hood. 

And this is what make and, in turn, Clang are doing for you, automating this process of going from source code to assembly code to machine code and then linking it all together with any libraries you might have used. So no longer take for granted what's happening. Hopefully, that offers you a glimpse a bit more of what's actually happening when you compile your own code. 

Well, let me pause there, because that's quite a mouthful, and see if there's any questions on preprocessing, compiling, or assembling, or linking, a.k.a. compiling. And again, we won't dwell at this low level. We'll tend to now just abstract this all away if we can sort of agree that, OK, yes, there's those steps. But what's really important is the whole process, not the minutiae. Sophia? 

SOPHIA: I had a question about with the first step, when we're replacing all the information at the top, is that information contained within the IDE? Or where do we-- are there files saved somewhere in that IDE, like, where it's getting all this information from? 

DAVID MALAN: Yeah, really good question. Where are all these files coming from? So yes, when you are using CS50 IDE, or frankly, if you're using your own Mac or your own PC, and you have preinstalled a compiler into your Mac or PC just like we have to CS50 IDE, what you get is a whole bunch of .h files somewhere on the computer system. 

You might also have a whole bunch of .c files, or compiled versions thereof, somewhere on the system. So yes, when you download and install a compiler, you are getting all of these libraries added for you. And we preinstalled an additional library called CS50's library that additionally comes with its own .h file and its own machine code as well. 

So all of those files are somewhere in CS50 IDE, or equivalently, in your own Mac or PC if you're working locally. And the compiler, Clang, in this case, just knows how to find that because one of the steps involved in installing your own compiler is making sure it's configured to know, per Sophia's question, where all those files are. [? Basili? ?] I'm sorry if I'm mispronouncing it. [? Basili? ?] 

[? BASILI: ?] So whenever we're compiling hello, for example, is the compiler also compiling, for example, CS50? Or does CS50 already exist in machine code somewhere beneath? 

DAVID MALAN: Yeah, really good question too. So I was kind of skirting this part of Sophia's question because technically speaking, probably cs50.c is not installed on the system. And technically, stdio.c is probably not installed in the system. Why? It just doesn't need to be. It would be kind of inefficient, that is, slow, if every time you compiled your own program, you had to additionally compile CS50's program, and stdio's program, and so forth. 

So it actually stands to reason that what computers typically do is they precompile all of those library files for you so that more efficiently they can just be linked in. And you don't have to keep preprocessing, compiling, and assembling third-party code. You only perform those steps on your own code and then link everything together. And indeed, that's the case. It's all done in advance. Iris, question from you. 

IRIS: When we replace the header files with prototypes, are we only replacing it with the prototypes that get used? Or are all the prototypes technically substituted? 

DAVID MALAN: Yeah, so I was kind of sweeping that detail under the rug with my dot, dot, dot. There's a whole lot of other stuff in those files. You're getting the entire contents of those files, even if the only thing you need is the prototype. 

But, and this is why I alluded to the fact too that technically, there probably isn't a stdio.c file, because there would be so much stuff in it. There's probably not just one stdio.h file with everything in it. There's probably some smaller files that get magically included as well. But yes, there are many more lines of code in those files. But that's OK. Your compiler is only going to use the lines that it actually cares about. Good question. 

All right, so with that said, this past week undoubtedly was a bit frustrating in some ways because you probably ran into problems. You ran into bugs, mistakes in your own code. You probably saw one or more yellow or red error messages. And you might have struggled a little bit just to get your code to compile. And again, that's normal. That will go away over time. 

But honestly, whenever I write C, let's say 20% of the time, I still have a compilation error, let alone logical errors, in my own code. So this is just part of the experience of writing code. Humans make mistakes in all forms of life. And that's ever more true in the context of code, where again, per our first two weeks, precision is important, as is correctness. And it's hard sometimes to achieve both of those goals. 

So let's consider now how you might be more empowered to debug your own code-- that is, find problems in your own code. And this word actually has some etymology. This isn't necessarily the first bug. But perhaps the most famous bug is this one pictured here from the research notebook of Grace Hopper, a famous computer scientist, who had discovered that there were some problems with the Harvard Mark II computer, a very famous computer nowadays that will soon live in the new engineering school on campus-- it used to live in the Science Center. 

The computer was having problems. And sure enough, when the engineers took a look inside of this big mainframe computer, there was actually a bug, pictured here and taped to Grace Hopper's notebook. So this wasn't necessarily the first use of the term "bug," but it is a very well-known example of an actual bug in an actual computer. Nowadays, we speak a little more metaphorically that a bug is just a mistake in one's program. 

And we did give you a few tools last week for troubleshooting bugs. Help50 allows you to better understand some of the cryptic error messages. And that's just because the staff wrote this program that analyzes the problem you're having, and we try to translate it to just more human-friendly speak. 

We saw a tool called style50, which helps you not with your correctness, but just with the aesthetics of your code, helping you better indent things and add white space-- that is, blank lines or space characters-- so it's a little more user friendly to the human to read. And then check50, which, of course, the staff write so that we can give you immediate feedback on whether or not your code is correct per the problem sets or the lab specification. 

But there's some other tools that you should have in your toolkit. And we'll give those to you today. And one, frankly, is this universal debugging tool just called, in the context of C, printf. So printf, of course, is just this function that prints stuff out onto the screen. But that in and of itself is a wonderfully powerful tool via which you can chase down problems in your code. 

And even after we leave C in a few weeks and introduce Python and other languages, almost every programming language out there has some form of printf. Maybe it's called print. Maybe it's called say, as it was in Scratch, but some ability to display information or present information to a human. 

So let's try to use this primitive, this notion of printf, to chase down a bug in one's code. So let me go ahead and deliberately write a buggy program. I'm going to even call the file buggy0.c. And at the top of this file, I'm going to go ahead and #include stdio.h. No need for the CS50 library for this one. And then I'm going to do int main(void), which we saw last week, and we'll explain in more detail today. 

And then I'm going to give myself a quick loop. I just want to go ahead and print out, oh, I don't know, like, 10 hashes on the screen. So I want to print a vertical column, kind of like one of those screenshots from Super Mario Bros., not a pyramid, just a single column of hashes, and 10 of them. 

So I'm going to do something like, int i = 0, because I feel like I learned in class that I generally should start counting from 0. Then I'm going to have my condition in this for loop. And I want to do this 10 times. I'm going to do it less than or equal to 10. Then I'm going to go ahead and have my increment, which quite simply can be expressed as i++. And then inside this loop, I'm just going to go ahead and print out a single hash followed by a new line. 

I'm going to save the program. I'm going to compile it with clang -o buggy0 buggy0-- I mean, no. You don't have to use Clang manually in this way. It's a lot simpler to just abstract that away-- that's not a command-- to abstract that away and run make buggy0. And make will take care of the process of invoking Clang for you. 

I'm going to go ahead and run it. Seems to be compiling successfully, so no need for help50. It's already pretty well styled. In fact, if I run style50 on this buggy0, I don't have any comments yet. But at least it looks very nicely indented. So I think I'm OK with that. But let me add that comment and do "Print 10 hashes" just to remind myself of my goal. 

And now let me go ahead and run this, ./buggy0, Enter. And I see, OK, good. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, I think. All right, so it's a stupid bug. And maybe it's jumped out obviously to some of you. But maybe it's a little more subtle to others of you. But where do you begin? Suppose I were to run check50. And check50 were to say, nope, you printed out 11 hashes instead of 10. But my code looks right to me, at least at first glance. 

Well, how can I go about debugging this or solving this? Well, again, printf is your friend. If you want to understand more about your own program, use printf to temporarily print more information to the screen, not that we want in the final version, not that your TF wants to see, but that you, the programmer, can temporarily see. 

So before I print this hash, let me print something a little more pedantic like this-- "i is now %i backslash n." So I literally want to know, just for my own mental math, what is the value of i at this point before I print that hash? Now I'm going to go ahead and paste in the value of i. So I'm using %i as a placeholder. I'm plugging in the value of the variable i. 

I'm going to save my code now. I'm going to recompile it with make buggy0. And I'm going to rerun it now. And let me go ahead and increase the size of my window just so we can focus now on the output. And I'm going to go ahead and ./buggy0, Enter. 

OK, so now I see not only my output, but also commingled with that output, some diagnostic output, if you will, some debugging output. And it's just more pedantically telling me, "i is now 0," "i is now 1," "i is now 2," dot, dot, dot, "i is now 9," "i is now 10." OK, I don't hate the fact that i is 10. But I'm not loving the fact that if I started at 0 and printed a hash, and I'm hitting 10 and printing another hash, well, obviously, there's my problem. 

So it might not have been all that much more obvious than looking at the code itself. But by using printf, you can just be a lot more clear to yourself what's going on. So if now I see, OK, well, if I start at 0, I have to go up to but not through 10. I could change my code to say less than 10. I could leave that alone and go from 1 through 10. But again, programmer convention would be to go from 0 up to, but not including, 10. So I think I'm good now. 

And in fact, now I'll go ahead and recompile this, make buggy0. Let me go ahead and increase the size of the window again just so I can temporarily see this and ./buggy0. OK, I start now at 0, 1, 2, dot, dot, dot. Now I stop at 9. And that, of course, gives me 10 hashes. So again, I don't need this in the final output. And I'm going to go ahead and delete this now. It's temporary output. 

But again, having those instincts-- if you don't quite understand why your code is compiling but not running properly, and you want to better see what the computer is clearly seeing in its mind's eye, use printf to just tell yourself what the value of some variable or variables is anywhere in your code that you want to see a little more detail. 

All right, let me pause for just a moment to see if there's any questions on this technique of just using printf to begin to debug your code and to see the values of variables in a way that's a little more explicit. No? All right. 

Well, let me propose an even more powerful tool that admittedly takes a little getting used to. But this is kind of one of those lessons, trust me, if you will, that if you spend a few more minutes, maybe even an hour or so this week, learning the following tool, you will save yourself hours, plural, maybe even tens of hours over the course of the next many weeks because this tool can help you truly see what's going on inside of your code. 

So this tool we're going to add to the list today is called debug50. And while this one does end with 50, implying that it's a CS50 tool, it's built on top of an industry standard tool known as GDB, the GNU DeBugger, that's a standard tool that a lot of different computer systems use to provide you with the ability to debug your code in a more sophisticated way than just using printf alone. 

So let's go ahead and do this. Let me go back to the buggy version of this program which, recall, had me going from 0 through 10, which was too many steps. A moment ago, I proposed that we just use printf to see the value of i. But frankly, the bigger our programs get, the more complicated they get, the more output they need to have on the screen. It's just going to get very messy quickly if you're printing out stuff that shouldn't be there, right? 

Think back to Mario. Mario's pyramid is this sort of graphical output. And it would very quickly get ugly and kind of hard to understand your pyramid if you're commingling that pyramid with actual textual output from printf as well. So debug50, and in turn a debugger in any language, is a tool that allows you to run your code step by step and look inside of variables and other pieces of memory inside of the computer while your program is running. 

Right now, pretty much every program we run takes a split second to run. That's way too fast for me, the human, to wrap my mind around what's going on step by step. A debugger allows you to run your program, but much more slowly, step by step, so you can see what's going on. 

So I'm going to go ahead now and run debug50 ./hello. No, sorry, debug50 ./buggy0. So I write debug50 first, a space, and then dot slash and the name of the program that's already compiled that I want to debug. So I'm going to go ahead and hit Enter. And notice that, oh, it was smart. It noticed that I changed my code. And I did a moment ago. I reverted it back to the buggy version. So let me fix this-- make buggy0. All right, no errors. 

Now let me go ahead and run debug50 again. And if you haven't noticed this already, sometimes I seem to type crazy fast. I'm not necessarily typing that fast. I'm going through my history in CS50 IDE. Using your arrow keys, Up and Down, you can scroll back in time for all of the commands you've typed over the past few minutes or hours or even days. And this will just start to save you keystrokes. So I'm going to go ahead and hit Up. And now I don't have to bother typing this whole command again. It's a helpful way to just save time. 

I'm going to go ahead now and hit Enter. And now notice this error message-- I haven't set any breakpoints. "Set at least one breakpoint by clicking to the left of a line number and then re-run debug50!" Well, what's going on here? Well, debug50 needs me to tell the computer in advance at what line I want to break into and step through step by step. 

So, I can do that. I'm going to go over to the side of the file here, as it says. And you know what? The first interesting line is this one here, line 6. So I clicked in the so-called gutter, the left-hand side of the screen, on line 6. And that automatically put a red dot there, like a stop sign. 

Now, one last time, I'm going to go ahead and run debug50 ./buggy0 and hit Enter. And now notice this fancy new panel opens up on the right-hand side. And it's going to look a little cryptic at first. But let's consider what has changed on the screen. Notice now that highlighted in this sort of off-yellow color is line 6. And that's because what debug50 is doing is it's running my program, but it has paused execution on line 6. So it's done everything from line 1 through 5, but now it's waiting for me on line 6. 

And what's interesting over here is this-- let me zoom in on this window over here. And there's a lot going on here, admittedly. But let's focus for just a moment not on Watch Expressions, not on Call Stack, but only on Local Variables. And notice, I have a variable called i whose initial value is 0, and it's of type int. 

Now, this is kind of interesting because watch what I can do via these icons up here. I can click on this Step Over line and start to step through my code line by line. So let me go ahead and zoom out. Let me go ahead and click Step Over. And watch what happens to the yellow highlighting. It moves down to the next line. But notice, if I zoom in again up here, the value of i has not changed. Now let me go ahead and step over again. And notice the yellow highlighting doubles back. That makes sense because I'm in a loop. So it should be going back and forth, back and forth. 

But what next happens in a loop? Every time you go back to the beginning of the loop, remember that your incrementation happens, like the i++. So watch now closely in the top right-hand corner, when I Step Over now, notice that the value of i in my debugger has just been changed to 1. So I didn't have to use printf. I didn't have to mess up the output of my screen. I can literally see in this GUI, this Graphical User Interface on the right-hand side, what the value of i is. 

Now if I just start clicking a little more quickly, notice that as the loop is executing, again and again, the value of i keeps getting updated. And you know what? I bet, even though we started at 0, if I do this enough times, I will see that the value is 10 now, thereby giving me another printf at the bottom, thereby explaining the 11 total hashes that I saw. 

So I haven't gotten any new information here. But notice I've gotten unperturbed information. I've not messily and sloppily printed out all of these printf statements on the screen. I'm just kind of watching a little more methodically what's happening to the state of my variable over on the top right there. 

All right, let me pause here too to see if there's any questions on what this debugger does. Again, you compile your code. You run debug50 on your code, but only after setting a so-called breakpoint, where you decide in advance where do you want to pause execution of your code. Even though here I did it pretty much at the beginning of my program, for bigger programs, it's going to be super convenient to be able to pause halfway through your code and not have to go through the whole thing. Peter, question. 

PETER: About the debugger, what's the difference between Step Over and Step Into and Step Out and-- 

DAVID MALAN: Really good question. Let me come back to that in just a moment, because we'll do one other example where Step Into and Step Out actually are germane. But before we do that. Any other questions about debug50 before we reveal what Step Into and Step Over do for us as well? 

Oh, all right. Well, let's take Peter's question right there. Let me go ahead now and get out of the debugger. And honestly, I don't see an obvious way to get out of the debugger at the moment. But Control-C is your new friend today too. Pretty much any time you lose control of a program because the debugger's running and you've lost interest in it, or maybe because, like last week, you wrote a program that has an infinite loop that just keeps going and going and going, Control-C will break out of that program. 

But let's now write quickly another program that, this time, has a second function. And we'll see one other feature of the debugger today. I'm going to go ahead and create a new file now called buggy1.c. Again, it's going to be deliberately flawed. 

But I'm first going to go ahead and #include cs50.h this time. And I'm going to #include stdio.h. I'm going to do int main void. And I'm going to go ahead and do the following-- give myself a variable called i. And I'm going to try to get a negative int by calling a function called get_negative_int. And then quite simply, I'm going to print out this value, "%i backslash n", i, semicolon. 

Now, there's only one problem-- get_negative_int does not exist. So like last week, where we implemented get_positive_int, this week, I'll implement get_negative_int. But I'm going to do it incorrectly at first. Now, get_negative_int, as the name implies, needs to return an integer. And even though we only spent brief time on this last week, recall that you can specify the output of a function, a custom function that you wrote, by putting its so-called return value first on this line. And then you can put the name of the function, like get_negative_int, and then in parentheses, you can put the input to the function. But if it takes no input, you can literally write the word "void," which is a term of art that just means, nothing goes here. 

I'm going to go ahead now and implement get_negative_int. And frankly, I think it's going to be pretty similar to last week. But my memory is a little hazy. So again, it will be deliberately flawed. But I'm going to go ahead and declare a variable called n. Then I'm going to do the following-- I'm going to set n equal to get_int. And I'm just going to explicitly ask the user for "Negative integer" followed by a space. And then I'm going to keep doing this while n is less than 0. And then at the very last line, I'm going to return n. 

So again, I claim that this function will get me a negative int from the user. And it's going to keep doing it again and again until the user cooperates. However, there is a bug. And there's a couple of bugs, in fact. Right now, let me go ahead and make a deliberate mistake-- make buggy1, Enter. And I see a whole bunch of errors here. 

I could use help50 on this. But based on last week, does anyone recall what the error here might be? "Error-- implicit declaration of function 'get_negative_int' is invalid in C99." So I don't know all of that, but implicit declaration of function is something you're going to start to see more often if you make this mistake. Anyone recall what this means and what the fix is without resorting to help50? Yeah, Jasmine, what do you think? 

JASMINE: So basically, since you declared it after you already used it in your code, it doesn't know what to read that as when it's processing it. So you have to move the first line above when you actually start the code. 

DAVID MALAN: Perfect. And this is the only time I will claim that copy/paste is acceptable and encouraged. I'm going to copy the very first line only of that function. And as Jasmine proposed, I'm going to paste it at the very top of the file, thereby giving myself a hint otherwise known as a prototype. So I'll even label it as such to remind myself why it's there-- prototype of that function. And here, I'm going to go ahead and "Get negative integer from user." And then this function is left as written. 

So I now have this prototype at the very top of my file, which I think will indeed get rid of this error. Let me go to make buggy1 again. Now I see that indeed compiled OK. But when I run it now, ./buggy1-- let me go ahead and input a negative integer, negative 1. Hm. Negative 2, negative 3-- I feel like the function should be happy with this, and it's obviously not. So there's a bug. I'm going to go ahead and hit Control-C to get out of my program because otherwise, it would run potentially forever. 

And now I'm going to use debug50. But debug50 just got really interesting, to Peter's question earlier, because now I have things I can step into. I'm not writing all of my code in main. There's this other function now called get_negative_int. So let's see what happens now. 

Let me go ahead and set a breakpoint on the first interesting line of code, line 10. And it's interesting only in the sense that everything else is kind of boilerplate at this point. You just have to do it to get your program started. I'm going to now go down here. And I'm going to do debug50 ./buggy1. And in a moment, it's going to open up that sidebar. And I'm going to focus now not only on local variables-- like I did before, notice that i is again equal to 0 here by default. But I'm also going to reveal this option here, Call Stack. 

So Call Stack is a fancy way of referring to all of the functions that your program at this point in time has executed and not yet returned from. So right now, there's only one thing on the call stack because the only function that is currently executing is, of course, main, because why? I set a breakpoint at line 10, which is, by definition, inside of main. 

But to Peter's question earlier, I feel like lines 10 and 11-- frankly, they look pretty correct, right? It's hard at this point to have screwed up lines 10 and 11 except syntactically, because I'm getting a negative int. I'm storing it in i, and then I'm printing out the value of i on those two lines. But what if instead, I'm curious about get_negative_int? I feel like the bug-- logically, it's got to be in there because that's the harder code that I wrote. 

Notice this time, instead of clicking Step Over, let me go ahead and click on Step Into, which is one of the buttons Peter alluded to. And when I click Step Into, notice that you sort of go down the rabbit hole. And debug50 jumps into the function get_negative_int, and it focuses on the first interesting line of code. So do, in and of itself, really isn't that interesting. Int n isn't that interesting because it's not assigning a value to it even yet. The first juicy line of code seems to be line 19. And that's why the debugger has jumped to that line. 

Now, n = get_int feels pretty correct. It's hard to misuse get_int. But notice now on the right-hand side what has happened. Under Call Stack, you now see two things, not only main, but also get_negative_int in a stack. It's like a stack of trays in a cafeteria. The first tray at the bottom is like main. The second tray on the stack in the cafeteria is now get_negative_int. 

And what's cool about this is that notice that right now, I can see my local variables, n. And that's indeed the variable I used. So I no longer see i. I see n because I'm into the get_negative_int function. And now if I keep clicking Step Over again and again after typing in a number. Let me type in negative 1 here. 

Now notice on the top right of the screen, you can see in the debugger that n equals negative 1. I'm going to now go ahead and click Step Over. And I think I'm going to end up in line 22. If the human has typed in a negative integer like negative 1, obviously, that's negative. Let's proceed to line 22. But watch what happens when I click Step Over. It actually seems to be going back to the do loop again and again and again, as it will as long as I keep providing negative integers. 

So my logic then should be, well, OK, if n is negative 1, but my loop is still running, what should your logical takeaway here be? If n is negative 1, and that is by definition a negative integer, but my loop is still running, what could be your diagnostic conclusion if the debugger is essentially revealing this hint to you? n is negative 1, but the loop is still going. Omar, what would you conclude? 

OMAR: Either the condition is wrong, or maybe some sort of Boolean logic could be flawed. 

DAVID MALAN: Perfect. So obviously, either the condition is wrong, or there's something wrong with my Boolean logic. And Boolean logic just refers to true or false. So somewhere, I'm saying true instead of false, or I'm saying false instead of true. And frankly, the only place where I have code that's going to make this loop go again and again must logically be on line 21. So even if you're not quite sure how to fix it yet, just by deduction, you should realize that, OK, negative 1 is what's in the variable. But that's not good enough. The loop is still going. I must have screwed up the loop. 

And indeed, let me just now call it out. Line 21 is indeed the source of the bug. So we've isolated it. Out of 23 lines, we've at least found the one line where I know the solution has to be. What's the solution? How do I fix the logic now thanks to the debugger having led me down this road? How do I fix line 21 here? What's the fix? What do you propose? Yeah, Jacob? 

JACOB: You would have to change it from while n is less than 0 to while n is greater than 0. 

DAVID MALAN: Exactly. So instead of n less than 0, I want to say n greater than 0. And I think-- slight clarification, I think I want to include 0 here because 0 is not negative. And if I want a negative int, I think what I'm probably going to want to say is while n is greater than or equal to 0, keep doing the loop. So I very understandably sort of just inverted the logic. No big deal. I'm thinking negatives, and I did less than. But the fix is easy. The point is the debugger led you to this point. 

Now, those of you who have programmed before probably saw the bug jumping out at you. Those of you who haven't programmed before, probably with some time, would have figured out what the bug was, because out of 23 lines, it's got to be one of those. But as our programs get more sophisticated, and we start writing more lines of code, debug50 and debuggers in general will be your friend. 

And I realize that this is easier said than done because at first, when using a debugger, you're going to feel like, ah, I'm just going to use printf. Ah, I'm just going to fight through this. Because there's a bit of a learning curve, you will gain back that time and more by just using a debugger as your first instinct when chasing down problems like this. All right, so that's it for debug50, a new tool in your toolkit in addition to printf. But debug50 is hands down the more powerful of the two. 

Now, some of you have wondered over the past couple of weeks why there's this little rubber duck here. And there actually is a reason for this too. And there's one final debugging technique that, in all seriousness, we'll introduce you today to known as rubber duck debugging. And you can google this. There's a whole Wikipedia article about it. And this is kind of a thing in computer science circles for computer scientists or programmers to have rubber ducks on their desk. 

And the point here is that sometimes, when trying to understand what is wrong in your code, it helps to just talk it through. And in an ideal world, we would just talk to our colleague or our partner on some project. And just in hearing yourself vocalize what it is your code is supposed to do, very often, that proverbial light bulb goes off. And you're like, oh, wait a minute, never mind, I got it, just because you heard yourself speaking illogically when you intended something actually logical. 

Now, we don't often all have colleagues or partners or friends with whom we're working on a project. And we don't often have family members or friends who want to hear about our code of all things. And so a wonderful proxy for that conversant partner would be literally a rubber duck. And so here in healthier times, we would be giving all of you rubber ducks. Here on stage, we brought a larger one for us all to share. If you've noticed in some of the wide shots on camera, there's a duck who's been watching this whole time. So that any time I screw up, I literally have someone I can sort of talk to nonverbally, in this case. 

But we can't emphasize enough that in addition to printf, in addition to the more sophisticated debug50, talking through your problems with code is a wonderfully valuable thing. And if your friends or family are willing to hear about some low-level code you're writing and some bug you're trying to solve, great. But in the absence of that, talk to a stuffed animal in your room. Talk to an actual rubber duck if you have one. Talk even aloud or think aloud. It's just a wonderful compelling habit to get into because just in hearing yourself vocalize what you think is logical will the illogical very often jump out at you instead. 

All right, so with that said, that's been a lot. Let's go ahead here and take a five-minute break, give everyone a bit of a breather. And when we come back, we'll take a look now at some of the more powerful features of C now that we can trust that we can solve any problems with all of these new tools. So we'll be back in five. 

All right, we are back. So let's take a look underneath the hood, so to speak, of a computer, because as fancy as these devices are and as powerful as they seem, they're relatively simple in their capabilities and what they can actually do. And let's reveal as much by way of last week's discussion of type. So recall that C supports different data types. So we saw char, and string, and int, and so forth. So to recap, we had all of these. 

Well, it turns out that each of these data types is defined on a typical computer system as taking up a fixed amount of space. And it depends on the computer, whether it's Mac or PC, or old or new, just how much space is used typically by these data types. But on CS50 IDE, the sizes of all of these types are as follows-- a bool, true or false, uses just 1 byte. Now, that's actually a little wasteful because 1 byte is 8 bits, and gosh, for a bool, you should only need 1 bit. You can't work at the single-bit level easily in C. And so we just typically spend 1 whole byte on a bool. 

Char is going to be 1 byte as well. And that might sound familiar, because last week when we talked about ASCII, we proposed that the total number of possible characters you can represent with a char was 256 because of 8 bits and 2 to the eighth power. So one char is 1 byte. And that's fixed in C, no matter what. 

Then there were all of these other data types. There was float, which is a real number with a decimal point. That happens to use 4 bytes. A double is also a real number with a decimal point, but it uses 8 bytes, which gives you even more precision. You can have more significant digits after the decimal point, for instance. Ints, we've used a bunch. Those are 4 bytes, typically. A long is twice as big, and that just allows you to represent an even bigger number. And some of you might have done that exactly on credit when storing a whole credit card number. 

Strings, for now, are a variable number of bytes. It could be a short string of text, a long string of text, a whole paragraph. So that's going to vary. So we'll come back to this notion of string next time. But today, focus on just these primitive types, if you will. 

And here is a picture of what is inside of your computer. So this is a piece of memory or RAM, Random Access Memory. And it might be a little smaller. It might be a little bigger depending on whether it's a laptop, or desktop, or phone, or the like. But it's in memory, or RAM, that programs are stored while they're running. And it's where files are stored when they are open. So typically, if you save, install programs, or save files, those are saved on what's generally called your hard drive, or hard disk, or solid-state disk, or CD, or some other physical medium. And that, the [INAUDIBLE] of which is that they don't require electricity to store your data long term. 

RAM is different. It's volatile, so to speak. But it's much faster than a hard disk or a solid-state disk, even. It's much faster because it's purely electronic. And indeed, there are no moving parts. It's purely electronic, as pictured here. And so with RAM, you have the ability to open files and run programs more quickly because when you double-click a program to run it, or you open a file in order to view or edit it, it's stored temporarily in RAM. 

And long story short, if your laptop battery has ever died, or your computer's gotten unplugged, or your phone dies, the reason that you and I tend to lose data, the paragraph that you just wrote in the essay that you hadn't yet saved, is because RAM, memory, is volatile. That is, it requires electricity to continue powering it. But for our purposes, we're only going to focus on RAM, not so much long-term disk space yet, because when you're running a program in C, it is indeed, by definition, running in your computer's memory. 

But the funny thing about something as simple as this picture is that each of these black rectangles is kind of a chip. And in those chips are stored all of the 0's and 1's, the little switches that we alluded to in week 0. So let's focus on and just zoom in on just one of these chips. Now, it stands to reason that I don't know how big this stick of RAM is. Maybe it's 1 gigabyte, a billion bytes. Maybe it's 4 gigabytes. Maybe it's even smaller or bigger. There's some number of bytes represented physically by this hardware. 

So if we zoom in further, let me propose that, all right, I don't know how many bytes are here. But if there's some number of bytes, whether it's a billion or 2 billion, or fewer or more, it stands to reason that we could just number all of these bytes. We could sort of think of this physical device, this memory, as just being a grid, top to bottom, left to right. 

And each of the squares I've just overlaid on this physical device might represent an individual byte. And again, in reality, maybe there's more of them. Maybe there's fewer of them. But it stands to reason, no matter how many there are, we can think of each of these as having a location. Like, this is the first byte, second byte, third byte, and so forth. 

Well, what does it mean, then, for a char to take up 1 byte? That means that if your computer's memory is running a program maybe that you wrote or I wrote that's using a char variable somewhere in it, the char you're storing in that variable may very well be stored in the top left-hand corner physically of this piece of RAM. Maybe it's there. Maybe it's elsewhere. But it's just one physical square. 

If you're storing something like an int, which takes up 4 bytes, well, that frankly might take up all four squares along the top there or somewhere else. If you're using a long, that's going to take up twice as much space. So representing an even bigger number in your computer's memory is going to require that you use all of the 0's and 1's comprising these 8 bytes instead. 

But let's now move away from physical hardware. Let's abstract it away, if you will, and just now start to think of our memory as just this grid. And technically, it's not a two-dimensional structure. I could just as easily draw all of these bytes from left to right. I could just fit fewer of them on the screen. So we'll take the physical metaphor a bit further and just think of our computer's memory as this grid, this grid of bytes. And those bytes are each 8 bits. Those bits are just 0's and 1's. 

So what we've really done is zoom in metaphorically on our computer's memory to start thinking about where things are going to end up in memory when you double-click on a program on your Mac or PC or, in CS50 IDE, when you do ./hello or ./buggy0 or ./buggy1, it's these bytes in your computer's memory that are filled with all of your variables' values. 

So let's consider an example here. Suppose I had written some code that involved declaring three scores. Maybe it's a class that's got, like, three tests. And you want to average the student's grade across all three of those tests. Well, let's go ahead and write a quick program that does exactly this. In CS50 IDE, I'm going to create a program called scores.c. And in scores.c, I'm going to go ahead and #include stdio.h. I'm going to then do my int main(void) as usual. 

And then inside of here, I'm going to keep it very simple. I'm going to give myself one int called score1. And just to be a little playful, I'm going to set it equal to 72, like last week. I'm going to give myself a second score and set it equal to 73, and then a third score whose value is going to be 33. 

And then let me go ahead and print out the average of those three values by plugging in a placeholder for floating point value, right? If you add three integers together and divide them by 3, I may very well get a fraction or a real number with a decimal point. So I'm going to use %f instead of %i because I don't want to truncate someone's grade. Otherwise, if they have, like, a 99.9%, they're not being rounded up to 100%. They're going to get the 99% because of truncation, as we discussed last week. 

So how do I do now the math of an average? Well, it's pretty straightforward-- score1 plus score2 plus score3 in parentheses, just like in math, divided by 3, semicolon. Let me save that file. Let me do make scores at the bottom. Again, we're not going to use Clang manually. No need to, because it's a lot easier to run make. 

But I did mess up here. "Format specifies type 'double', but the argument has type 'int'." So I don't quite understand that. But it's drawing my attention to the %f and the fact that my math looks like this. So any thoughts here? I don't think printf is going to help me here because the bug is within the printf line. I don't think that debug50 is going to really help me here because I already know what line of code the bug is in. This feels like an opportunity to talk to the physical duck or some other inanimate object. Or we can perhaps think about what errors we ran into even last week. [? Arpan, ?] what do you think? 

[? ARPAN: ?] I think it's telling you this because it's receiving in all the values are integer type, but you are telling it to be in float. 

DAVID MALAN: Indeed. So score1, score2, score3 are all integers, and the number 3 is literally an integer. And so this time, the compiler is smart enough to realize, wait a minute, you're trying to coerce an integer result into a floating point value, but you haven't done any floating point arithmetic, if you will. 

So you know what? There's a few ways to fix this. Last week, recall we proposed that you could use a cast, and you could explicitly cast one or more of those values to a float. So I could do this, for instance. Or I could cast all of these to floats or one of these to floats. There's many different possibilities. 

But frankly, the simplest fix is just to divide, for instance, by 3.0. I can avoid some of the headaches of casting from one to another by just making sure that there's at least one floating point value involved in this arithmetic. So now let me recompile scores. This time, it compiles OK. Let me do ./scores, and voila, my average isn't so high, 59.333333. 

All right, so what is actually going on inside of the computer irrespective of the floating point arithmetic, which was, again, a topic of last week? Well, let's consider these three variables, score1, score2, score3-- where are they actually being stored in the computer's memory? Well, let's consider that grid again. And again, I'm going to start at top left for convenience. But technically speaking-- we'll see this down the road-- your computer's memory is just like this big canvas. And values can end up in all different places. But for today, we'll keep it clean. 

The first variable, score1, I claim is going to be here, top left, for simplicity. But what's important about where score1-- that is, 72-- is being stored, is it's taking up four of these boxes. Each of these boxes, recall, represents 1 byte. And an integer, recall, in CS50 IDE is 4 bytes. Therefore, I have used 4 bytes of space to represent the number 72. The number 73 in score2 similarly is going to take up four boxes, as is score3 going to take up four boxes as well. 

But what's really going on underneath the hood here? Well, if each of these squares represents a byte, and each of those bytes is 8 bits, and a bit is just a 0 or 1, what's really going on underneath the hood is something like this. Somehow, this electronic memory is storing electricity in just the right way so that it's storing this pattern of 0's and 1's, a.k.a. 72 in decimal, this pattern of 0's and 1's, a.k.a. 73 in decimal, this pattern of 0's and 1's, a.k.a. 33 in decimal. 

But again, we don't have to keep thinking about or dwelling on the binary level. But this is only to say that everything we've discussed thus far is coming together now in this one picture because a computer is just storing these patterns for us, and we are allocating space now thanks to our programming language via code like this. 

But this code, correct though it may be, indeed 59.333333 and so forth was my average if my test scores were 72, 73, and 33. But I feel like there's an opportunity for better design here. So not just correctness, not just style, recall that design is this other metric of code quality. And it's a little more subjective, and it's a little more subject to debate among reasonable people. 

But I don't really love what I was doing with this naming scheme. And in fact, if we look at the code, there really wasn't much more to my program than these three lines. I worry this program isn't particularly well designed. What rubs you the wrong way, perhaps, about those three lines of code? What could be better? 

And even if you don't know the solution, especially if you've never programmed before, what kind of smells about those three lines? This is actually a term of art. "Code smell" is like something-- not loving that for some reason. If you can't put your finger on it, it's not the best design. The code smells. What's smelly, if you will, about score1, score2, score3? Ryan, what do you think? 

RYAN: If you're doing an average calculation, you don't need to add them up all together in the code. You can add them up beforehand and store it as one variable. 

DAVID MALAN: Absolutely. If I'm computing the average, I don't need to keep all three around. I can just keep a sum and then divide the whole sum by the total number. I like that, that instinct. What else might you not like about the design of this code now? Score1, score2, score3. Score1, score2, score3. Might there be opportunity still for improvement? I feel like any time you start to see this repetition, maybe. Andrew, your thoughts? 

ANDREW: Not hard code the three scores together? 

DAVID MALAN: OK, so not hard code the three scores. And what would you do instead? 

ANDREW: Maybe take an input, or I would-- yeah, I wouldn't write out the scores themselves. 

DAVID MALAN: Yeah, another good instinct. It's kind of stupid that I've written a program, compiled a program, that only computes the average for some student who literally got those three test scores and no others. Like, there's no dynamism here. Moreover, it's a little lazy too that I called my variables score1, score2, score3. I mean, where does it end after that? If I want to have a fourth test next semester, now I have to go and have score4. If I've got a fifth, score5. That starts to be reminiscent of last week's copy/paste, which really wasn't the best practice. 

And so let me propose that we clean this up. And it turns out we can clean this up by way of another topic, another feature of C that's also present in other languages, known as arrays. And if you happened to use something called a list in Scratch, very similar in spirit to Scratch's lists. But we didn't see those in lecture that first week. 

An array in C, as in other languages, is a sequence of values stored in memory back to back to back, a sequence of contiguous values, so to speak, back to back to back. So in that sense, it's like a list of values from left to right if we use the metaphor of the picture we've been drawing. 

So how might this be germane here? Well, it turns out that if you want to store a whole bunch of values, but they're all kind of interrelated, like they're all scores, you don't have to resort to this sort of lazy, score1, score2, score3, score4, score5, up to score99, depending on how many scores there are. Why don't you just call all of those numbers scores, but use a slightly different syntax? 

And that syntax gives you access to what are called arrays. So the syntax here on the screen is an example of declaring space for three integers all at once and collectively referring to all of them as the word "scores." So there's no more scores 1, 2, and 3. All three of those scores are in a variable called "scores." And what's new here is the square brackets, inside of which is a number that literally connotes how many integers do you want to store under the name "scores." 

So what does this allow me to do? It allows me still to define three integers in that array. So this array is going to be a chunk of memory back to back to back that I can put values in. And the way I put those values is going to look syntactically like this. I still use numbers, but now I'm using a new notation. And it's similar to what I resorted to before, but it's a little more generalized now and dynamic. 

Now if I want to update the very first score in that array, I literally write the name of the variable scores, bracket[0] and then assign it the value. If I want to get at the second score, I do scores[1]. If I want the third score, it's scores[2]. And the only thing that's a little weird and takes some getting used to is the fact that we are "zero-indexing" our arrays. 

So in past examples, like for loops and while loops, I've sort of said, eh, it's a convention in programming to start counting from 0. When it comes to arrays, which are contiguous sequences of values in a computer's memory, they have to start at 0. So otherwise, if you don't start counting at 0, you're literally going to be wasting space by overlooking one value. 

So now if we were to rename things on the screen, instead of calling these three rectangles score1, score2, score3, they're all called scores. But if you want to refer specifically to the first one, you use this fancy bracket notation, and the second one, this bracket notation, and the third one, this bracket notation. 

But notice the dichotomy. When declaring the array, when creating the array, saying, give me three ints, you use [3] where [3] is the total number of values. When you index into the array-- that is, when you go to a specific location in that chunk of memory-- you similarly use numbers. But now those are referring to their relative positions, position 0, position 1, position 2. This is the total number of spaces. This is the specific space first, second, and third. 

All right, so pictorially, nothing has changed, just our nomenclature really has. So let me go ahead and start to improve this program, taking in the advice that was offered too on how we can improve the design and get rid of the smelliness of it. Let me take the first-- let me take the easiest of these approaches first by just getting rid of these three separate variables and instead giving me one variable called scores, an array of size 3. And then I don't need to declare score1, score2. Again, that's all going away. That's all going away. That's all going away. 

Now if I want to initialize that array with these three values, I say scores[0]. And down here, I say scores[1]. And down here, I say scores[2]. So I've added one line of code. But notice the dynamism now. If I want to have a fourth one, I can just allocate here and then put in the value with another line of code, or 5, or 6, or 7, or 8. I don't have to start copying and pasting all of these different variable names by convention. 

But I think if we take some of the advice that was offered a moment ago, we can also clean this up by way of a loop or such as well. So let's do that. Let me go ahead and give myself, actually, first the CS50 library so that I can use get_int. And let's take this first piece of advice, which is, let's start asking for a score using get_int. And I'm going to do this three times. And yeah, I'm getting a little lazy. I'm getting a little bored already. So I'm going to copy/paste. And again, that does not bode well in general. When copying and pasting, we can probably do better still. 

But now I think I need to change just one more thing here. When doing the math, I want scores[0] plus scores[1] plus scores[2]. But before I solve this problem here-- the logic is still the same, but I'm now taking in dynamically three integers-- there's still a smell to it as well. It's still not as well designed. And so just to make clear, what could I be doing better now? How could I clean up this code and make it not just correct, not just well styled, but better designed? What remains here? Nina? What do you think? 

NINA: The code is specific for only three scores. So you could, as an input, [INAUDIBLE] how many scores it wants at the very beginning. And then instead of having scores[0], scores[1], you could use a for loop that goes through from 0 to n minus 1 or less than n that will ask, and it should be one line of code instead. 

DAVID MALAN: Yeah, really good. It's the fact that we have get_int, get_int, get_int. That's the first sign that you're probably doing something suboptimally. It might be correct, but it's probably not well designed because I did literally resort to copy/paste. There's sort of a pattern here that I could certainly integrate into something like a loop. 

So let me do that. Let me actually get rid of two of these lines of code. Let me go up here and do something like for int i gets 0, i less than 3 for now, i++. Let me open up this for loop. Let me indent that remaining line of code. And instead of scores[0]-- this is where arrays get really powerful-- you can use a variable to index into an array-- that is, to go to a specific location. What do I want to use for my variable? Well, I would think i here. So now I've whittled my lines of code down from all three triplicate, three nearly identical lines, into just one really inside of a loop that's going to do the same thing for me again and again. 

And as Nina proposed too, I don't have to hard code these 3's all over the place. Maybe I could do something like this. I could say something like, int total gets get_int. And I might ask, "Total number of scores." And I could literally ask the human from the get-go how many total scores are there. Then I can even more powerfully use this variable, total, in multiple places so that now, I'm doing my math much more dynamically. 

This, though-- I'm afraid, Nina, this broke a bit. I'm going to be a little more-- I need to exert a little more effort here on line 14 because now I can't hard code scores[0], [1], and [2] because if the total number of scores is more than that, I need to do more addition. If it's fewer than that, I need to do less addition. So I think we've introduced a bug, but we can fix that. 

But let me propose for just a moment. Let's not make it dynamic because I worry that's just made my life harder. Let's at least introduce one other feature here first. I'm going to go ahead up here and define a new feature of C today, which is known as a constant. If I know in advance that I want to declare a number that I want to use again and again and again without copying and pasting literally that number 3, I can give myself a constant int by a const int total = 3. 

This declares what's called a constant in programming, which is a feature of many languages whereby you declare a variable of sorts whose value can never change. Once you set it, you cannot change it. And that's a good thing because, one, it shouldn't change in the context of this program. And two, just in case you, the human, are fallible, you don't want to accidentally change it when you don't intend. So this is a feature of a programming language that sort of protects you from yourself. 

So now I can sort of take an amalgam of my instincts and Nina's and use this variable, total. And actually, another convention when declaring constants is to capitalize them just to make visually clear that there's something different or special about this variable. So I'm going to change this to TOTAL, and I'm going to use that value here and here and also down here. 

But I'm afraid both Nina and I have a little bit of cleanup here to do in that I still have hard coded scores[0], scores[1], and scores[2]. And I want to add a changing number of values together. So you know what? I've got an idea. Let me go ahead and create a function that's going to compute an average for me. So if I want to create my own function that computes an average, I want it to return a floating point value, just so that we don't truncate any math. I'm going to call this average. 

And the input to this function is going to be the length of an array and the actual array. And this is the last piece of funky syntax for now. It turns out that when you want to pass an array as input to a custom function, you literally use those square brackets again, but you don't specify the size. And the upside of this is that your function then can support an array that's got one space in it, two spaces, three, a hundred. It's more dynamic this way. 

So how do I compute an average here? I can do this a few different ways. But I think what was suggested earlier makes sense, where I can do some kind of summation. So let me do int sum = 0. Because how do you compute the average of a bunch of numbers? Well, you add them all together, and you divide by the total. 

Well, let's see how I might do that. Let me do for int i gets 0, i less than-- what should this be? Well, if I'm being passed as this custom function the length of the array and the actual array, I think I can iterate from i up to length, and then i++ on each iteration. And then on each iteration, I think I want to do sum plus whatever is in the array's i-th location, so to speak. So again, this is shorthand notation per last week for this. Sum equals whatever sum is plus whatever is in location i of the array. 

And once I've done all of that, I think what I can do is return the total sum divided by the length of the array. And what I like about this whole approach-- assuming my code's correct, and I don't think it is just yet-- notice what I can do back up in main. Now I can abstract away the notion of calculating an average and just do something like this with this line of code here. 

So what did I just do? A lot's going on, but let's focus for a moment on line 14 here. On line 14, I'm still just printing the average of some floating point placeholder. But what I'm passing as input now is this function, average, whose inputs are going to be TOTAL, which again is just this constant at the very top-- oh, sorry, I goofed. I should have capitalized it, which is just that constant at the very top. And I'm passing in scores, which again, is just this array of all of those scores. 

Meanwhile, in the function, in the context of the function, notice that the names of the inputs to a function do not need to match the names of the variables being passed into that function. So even though in main, they're called TOTAL and scores, in the context of my function, average, I can call them x and y, a and b, or more generically, length and array. I don't know what the array is, but it's an array of ints. And I don't know how long it is, but that answer is going to be in length. 

But there's still a bug here. There's still a bug. And if we ignore main for a moment, this is a subtle one. Does anyone see a mistake that I've made probably for the third time now over the past two weeks? What subtle mistake have I made here with my code only in this average function? This one's a little more subtle. But the goal is to compute the average of a whole bunch of integers and return the answer. Nicholas? 

NICHOLAS: You've declared the variable within the function. 

DAVID MALAN: I've declared the variable within the function. That's OK because I've declared my variable sum here, I think you mean. But that's inside of the average function. And I'm using sum inside of the outermost curly braces that was defined. And so that's OK. That's OK. Let's take another thought here. Olivia, where might the bug still be? 

OLIVIA: The return type's a float, but you're returning an int divided by an int. 

DAVID MALAN: Perfect. So I again made that same stupid mistake that's just going to get more obvious as time goes on that if I want to do floating point arithmetic, just like the Ariane rocket discussion, the Patriot missile-- like, these kinds of details matter in a program. Now it's correct because I'm actually going to ensure that even though the context here is much less important than those real-world contexts, just computing some average of scores, I'm not going to accidentally truncate any of my values. 

So again, in the context here of this function, average is just applying some of last week's principles. I've got a variable. I've got a loop. And I'm doing some floating point arithmetic, ultimately. And I'm now creating a function that takes two inputs. One is length, and one is the array itself, and the return type, as Olivia notes, is a float so that my output is also well defined. 

But what's nice about this is, again, you can think of these functions as abstractions. Now I don't need to worry about how I calculate an average because I now have this helper function, a custom function I wrote that can help me answer that question. And here, notice that the output of this average function will become an input into printf. 

And the only other feature I've added to the mix here now are not only arrays, which allow us to create multiple variables, a variable number of variables, if you will, but also this notion of a constant. If I find myself using the same number again and again and again, this constant can help me keep my code clean. 

And notice this. If next year, maybe another semester, there's four scores or four tests, I change it in one place. I recompile. Boom, I'm done. A well-designed program does not require that you go reading through the entirety of it, fixing numbers here, numbers there. Changing it in one place can allow me to improve this program, make it support four tests next year instead of just the three. But better still would be to take, I think, Nina's advice before, which was to maybe just use get_int and ask the human for how many tests they actually have. That too would work. 

Well, let me pause here to see if there's any questions then about arrays or about constants or passing them around as inputs and outputs in this way. Yeah, over to Sophia. 

SOPHIA: I had a question about the use of float and why the use of one float causes the whole output to be a float. Why does that occur? 

DAVID MALAN: Yeah, really good question. That's just how C behaves. So long as there is one or more floating point values involved in a mathematical formula, it is going to use that data type, which is the more powerful one, if you will, rather than risk truncating anything. So you just need one float to be participating in the formula in question. Good question. Other questions on arrays or constants or this passing around of them? Yeah, over to Alexandra. 

ALEXANDRA: I have a question about the declaring of the array, scores. When you declared it in main, you said int scores. And in the brackets, you have TOTAL. Can you declare it without the TOTAL-- 

DAVID MALAN: Really good question. 

ALEXANDRA: --only the brackets? 

DAVID MALAN: Short answer, no. So the way I did it is the way you do have to do it. And in fact, if I highlight what I did here, now it currently says TOTAL. If I get rid of that, and I go back to our first version where I said something like 3 and 3 and 3 over here, you cannot do this, which I think, Alexandra, is what you were proposing. The computer needs to know how big the array is when you are creating it. 

The exception to that is that when you're passing an array from one function to another, you do not need to tell that custom function how big the array is because, again, you don't know in advance. You're writing a fairly generic, dynamic function whose purpose in life is to take any array as input of integers and any length and respond accordingly with an average that matches the size of that thing. 

And those of you, as an aside, who have programmed before, especially in Java, unlike in Java and certain other languages, the length of an array is not built into the array itself. If you do not pass in the length of an array to another function, there is no way to determine how big the array is. This is different from Java and other languages, where you can ask the array, in some sense, what is its length. In C, you have to pass both the array itself and its length around separately. [? Sina? ?] 

[? SINA: ?] I just-- I'm still a little bit confused about how, when we write that second command, when is it void in the parentheses? And when do we define the int? Because as I remember when we did the-- get a negative number, we get a positive number, it was void, but we still kind of gave it an input. I'm just not completely sold on that. 

DAVID MALAN: Sure, good question. Let me go ahead and open up that previous example, which was a little buggy, but it has the right syntax here. So here was the get_negative_int function from before. And, [? Sina, ?] you'll note it was void as input. So there was one comment you made where it still took input. That was not so. So get_negative_int did not take any input. And case in point, if we scroll up to main, notice that when I called it on line 10, I said get_negative_int, open parenthesis, close parenthesis, with no inputs inside of those parentheses. 

This keyword "void," which we've seen a few times now last week and this week, is just an explicit keyword in C that says, do not put anything here, which is to say, it would be incorrect for me up here to do something like this, like to pass in a number, or to pass in a prompt, or anything inside of those parentheses. The fact that this function, get_negative_int takes void as its input means it does not take any inputs whatsoever. That's fine. For get_negative_int, the name of the function says it all. Like, there's no need to parameterize or customize the behavior of getting negative int itself. You just want to get a negative int. 

By contrast, though, with the function we just wrote, average, this function does make conceptual sense to take inputs, because you can't just say, give me the average. Like, average of what? Like, it needs to take input so as to answer that question for you. And the input, in this case, is the array itself of numbers and the length of that array so you can do the arithmetic. And so, [? Sina, ?] hopefully, that helps make the distinction. You use void when you don't want to take input. And you actually specify a comma-separated list of arguments when you do want to take input. 

All right, so we focused up until now on integers, really. But let's simplify a little bit because it turns out that arrays and memory actually intersect to create some very familiar features of most any computer program, namely text or strings more generally. So suppose we simplify further, no more integers, no more arrays of integers. Let's just start for a moment with a single character and write a program that just creates a single brick from that Mario game. 

Let me go ahead and create a program here called brick.c. And in brick.c, I'm just going to #include stdio.h, int main(void). And more on this void a little later today. Char c gets, quote unquote, '#'. And then down here, let me just go ahead and print very simply a placeholder, %c, backslash n, and then output c. So this is a pretty stupid program. Its sole purpose in life is to print a single hash as you might have in a Mario pyramid of height 1, so very simple. 

Let me go ahead and make brick. It seems to compile OK. Let me run it with ./brick. And voila, we get a single brick. But let's consider for just a moment exactly what just happened here and what actually was going on underneath the hood. 

Well, you know what? I'm kind of curious. I remember from last week, we could cast values from one thing to another. What if I got a little curious, and I didn't print out c, which is this hash character, as %c, which is a placeholder for a character? What if I got a little crazy and said %i? I think I could probably coerce this char by casting it to an int so I can see its decimal equivalent. I could see its actual ASCII code. 

So let me rebuild this with make brick. Now let me do ./brick. And what number might we see? Last week, we saw 72 a lot, 73, and 33 for "HI!" This week, you can see 35. It turns out it's the code for an ASCII hash. And you can see this, for instance, if I go to a website like-- let's go to asciichart.com. And sure enough, if I go to the same chart from last week, and I look for the hash symbol here, its ASCII code is 35. 

And it turns out, in C, if it's pretty straightforward to the computer that, yes, if this is a character, I know I can convert it to an int, you don't have to explicitly cast it. You can instead implicitly cast one data type to another just from context here. So printf and C are smart enough here to know, OK, you're giving me a character in the form of variable c. But you want to display it as a %i, an integer. That's going to be OK. And indeed, I still see the number 35. So that's just simple casting. 

But let's now put this into the context of today's picture. How is that character laid out? Well, quite simply, if this is my memory again, and we've gotten rid of all of the numbers, c, otherwise storing this hash, is just being stored in one of these bytes. It only requires one square because, again, a char is a single byte. But equivalently, 35 is the number that's actually being stored there. 

But I wonder, I wonder. Last week, we spent quite a bit of time storing not just single characters, but actual words like "hi" and other expressions. And so what if I were to do something like this? Let me go back to my code. And let me not quite yet practice what I just preached. And let me give myself three variables this time-- c1, c2, and c3. And let me deliberately store in those three variables H, I, in all caps, followed by an exclamation point. 

And per last week, when you're dealing with individual characters, you must, in C, use single quotes. When you're dealing with multiple characters, otherwise known last week as strings, use double quotes. But that's why I'm using single quotes, because we're only playing at the moment with single characters. 

Now let me go ahead and print these values out. Let me print out %c, %c, %c, and output c1, c2, c3. So this is perhaps the stupidest way you could print out a full word like "HI!" in C by storing every single character in its own variable, but so be it. I'm just using these first principles here. I'm using %c as my placeholder. I'm printing out these characters. 

So let me do make brick now. Compiles OK. And if I do a dot slash-- you know, I really should have renamed this file, but we'll rename it in a moment-- ./brick, "HI!" And let me go ahead and do this. Let me go ahead now and actually close the file. And recall from last week, if I want to rename my file from brick.c, let's say, to hi.c, I can use the move command, mv. And now if I open up this file, sure enough, there's hi.c. And I've fixed my renaming mistake. 

All right, so again, if I now do make hi, and I do ./hi, voila, I see the "HI!" But again, this is kind of a stupid way of implementing a string. But let's still look underneath the hood. Let me go ahead and get curious. Let me print out %i, %i, and %i. And let me include spaces this time just so I can see separation between the numbers. Let me make hi again, ./hi. OK, there's that 72. There's that 73. And there's that 33 from last week. So that's interesting too. 

So what's going on underneath the hood in the computer's memory? Well, when I'm storing these three characters, now I'm just storing them in three different boxes, so c1, c2, c3. And when you look at it collectively, it kind of looks like a whole word even though it's, of course, just these individual characters. So what's underneath the hood, of course, though, is 72, 73, 33. Or equivalently, in binary, just this. So the story is the same even though we're now talking about chars instead of integers. 

But what happens when I do this? What happens when I do string s gets, quote unquote, "HI!" using double quotes? Well, let's change this program accordingly. Let me go ahead and do what we would have done last week, string-- I'll call it s just for s for string-- "HI!" in all caps. I can simplify this next line. I'm going to use %s as a placeholder for string s. 

But let's, for now, reveal what a string really is, because string is a term of art. Every programming language has "strings" even if it doesn't technically have a data type called string. C does not technically have a data type called string. We have added this type to C by way of CS50's library. 

But now if I do make hi, notice that my code compiles OK. And if I do ./hi Enter, voila, I still see "HI!", which is what I would have seen last week as well. And if we depict this in the computer's memory, because "HI!" is three letters, it's kind of like saying, well, give me three boxes, and let me call this string s. So this feels like a reasonable artist's rendition of what s is if it's storing a three-letter word like "HI!" 

But any time we have sequences of characters like this, I feel like we're now seeing the capability of a proper programming language. We introduced a little bit ago the notion of a string. So maybe could someone redefine string as we've been using it in terms of some of today's nomenclature? Like, what is a string? There's an example of one, "HI!", taking up three boxes. But how did we, CS50, maybe implement string underneath the hood, would you say? What is it? Tucker? 

TUCKER: Well, it's an array of characters and integers. Well, it's integers are used in the string, but it's an array of basically single characters. 

DAVID MALAN: Perfect. If we now have the ability to express-- very nicely done, Tucker. If we now have the ability to represent sequences of things, integers, for instance, like scores, well, it stands to reason that we can take another primitive, a very basic data type like a char. And if we want to spell things with those chars, like English words, well, let's just think of a string really as an array of characters, an array of chars. And indeed, that's exactly what string actually is. 

So this thing here, "HI!", technically speaking is an array called s. And this is s[0]. This is s[1]. This is s[2]. It's just an array called s. Now, we didn't use the word array last week because it's not as familiar as the notion of a "string of text," for instance. But a string is apparently just an array. 

And if it's an array, that means we can access, if we want to, the individual characters of that array by way of the square bracket notation from today. But it turns out there's something a little special about strings as they're implemented. Recall in our example involving scores, the only way we knew how long that array was was because I had a second variable called length or TOTAL that stored the total number of integers in that array. That is to say in our scores example, not only did we allocate the array itself. We also kept track of how many things were in that array with two variables. 

However, up until now, every time you and I have used the printf function, and we have passed to that printf function a string like s, we have only provided printf with the string itself. Or logically, we have only provided printf with the array of characters itself. 

And yet somehow, printf is magically figuring out how long the string is. After all, when printf prints the value of s, it is printing H, I, exclamation point, and that's it. It's not going and printing 4 characters or 5 or 20, right? It stands to reason that there's other stuff in your computer's memory if you've got other variables or other programs running. Yet printf seems to be smart enough to know, given an array, how long the array is because, quite simply, it only prints out that single word. 

So how then does a computer know where a string ends in memory if all a string is is a sequence of characters? Well, it turns out that if your string is length 3, as is this one, H, I, exclamation point, technically a string, implemented underneath the hood, uses 4 bytes. It uses 4 bytes. It uses a fourth byte to be initialized to what we would describe as backslash 0, which is a weird way of describing it. 

But this just represents a special character, otherwise known as the null character, which is just a special value that represents the end of a string. So that is to say when you create a string, quote unquote with double quotes, "HI!"-- yes, the string is length 3. But you're wasting or spending 4 total bytes on it. 

Why? Because this is a clue to the computer as to where "HI!" ends and where the next string maybe begins. It is not sufficient to just start printing characters inside of printf one at a time, left to right. There needs to be this sort of equivalent of a stop sign at the end of the string, saying, that's it for this string. 

Well, what are these values? Well, let's convert them back to decimal-- 72, 73, 33. That fancy backslash 0 was just a way of saying, in character form, it's 0. More specifically, it is eight 0 bits inside of that square. So to store a string, the computer, unbeknownst to you, has been using one extra byte, all 0 bits, otherwise written as backslash 0, but otherwise known as literally the value 0. 

So this thing, otherwise colloquially known as null, is just a special character. And we can actually see it again. If I go back to my asciichart.com from before, notice number 0 is known as NUL, N-U-L in all caps. 

All right, so with that said, what is powerful then about strings once we have this capability? Well, let me go ahead and do this. Let me go back into my code from a moment ago. And let me go ahead and enhance this program a little bit just to get a little curious as to what's going on. 

You know what I can do? I bet what I can do here in this version here is this. You know what? If I want to print out all of these characters of s, I can get a little curious again and print out %c, %c, %c. And if s is an array, per today's syntax, I can technically do s[0], s[1], s[2]. And then if I save this, recompile my code with make hi, OK, ./hi, I still see "HI!" 

But you know what? Let me get a little more curious. Let me use %i so I can actually see those ASCII codes. Let me go ahead and recompile with make hi, ./hi. There's the 72, 73, 33. Now let me get even more curious. Let me print a fourth value like this here, s[3], which is the fourth location, mind you. So if I now do make hi and ./hi, voila, now you see 0. 
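To make those four values concrete, here is a minimal sketch that collects the numeric value of every byte in a string, including the terminator. The function name byte_values is made up for illustration, and the sketch uses plain `const char s[]` rather than CS50's string type so that it compiles without cs50.h. The 72, 73, 33 values assume an ASCII system.

```c
#include <stddef.h>

// Copy the numeric value of every byte of s into codes, including the
// terminating '\0'. Returns the number of bytes copied.
// Assumption: the caller has made codes large enough to hold them all.
size_t byte_values(const char s[], int codes[])
{
    size_t i = 0;
    while (s[i] != '\0')
    {
        codes[i] = s[i];
        i++;
    }
    codes[i] = s[i]; // the terminator itself, which is 0
    return i + 1;
}
```

Called on "HI!", this fills codes with 72, 73, 33, 0 and returns 4: three visible characters plus the one extra byte.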

And what this hints at is actually a very dangerous feature of C. You know, suppose I'm curious at seeing what's beyond that. I could technically do s[4], the fifth location, even though according to my picture, there really shouldn't be anything at the fifth location, at least not that I know about just yet. But I can do it in C. Nothing's stopping me. 

So let me do make hi, ./hi. And that's interesting. Apparently there's the number 37. What is the number 37? Well, let me go back to my ASCII chart. And let me conclude that number 37 is a percent sign. So that's kind of weird because I didn't print out an explicit percent. Now I'm kind of poking around the computer's memory in places I shouldn't be looking, in some sense. 

In fact, if I get really curious, let's look not at location 4. How about location 40, like way off into that picture? Make hi, ./hi, 24, whatever that is. I can look at location 400, recompile my code, make hi, ./hi. And now it's 0 again. 

So this is what's both powerful and also dangerous about C. You can touch, look at, change any memory you want. You're essentially just on the honor system not to touch memory that doesn't belong to you. And invariably, especially next week, are we going to start accidentally touching memory that doesn't belong to us. And you'll see that it actually can cause computer programs to crash, including programs on your own Mac and PC, yet another source of common bugs. 
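Since C itself will not stop you, any bounds check has to be written by hand. The following is only a sketch of one way to do that, not something from the lecture: the helper name checked_char_at and the choice of -1 as the "out of range" answer are assumptions made for illustration.

```c
#include <string.h>

// Return the value stored at s[i] if i lies within the string (the
// terminating '\0' at index strlen(s) counts as part of it), or -1 if
// i falls outside. C performs no such check on its own.
int checked_char_at(const char s[], int i)
{
    if (i < 0 || (size_t) i > strlen(s))
    {
        return -1;
    }
    // Cast so that a valid character can never be mistaken for -1.
    return (unsigned char) s[i];
}
```

With "HI!", indexes 0 through 3 are answered normally, while 4, 40, or 400 yield -1 instead of whatever happens to be sitting in memory.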

But now that we have this ability to store different strings or to think about strings as arrays, well, let's go ahead and consider how you might have multiple strings in a program. So for instance, if you were to store two strings in a program-- let's call them s and t respectively. Another programmer convention-- if you need two strings, call the first one s then the second one t. Maybe I'm storing "HI!" then "BYE!" 

Well, what's the computer's memory going to look like? Well, let's do some digging. "HI!", as before, is going to be stored here. So this whole thing refers to s, and it's taking 4 bytes because the last one is that special null character that just is the stop sign that demarcates the end of the string. 

"BYE!", meanwhile, is going to take up another B, Y, E, exclamation point, five bytes because I need a fifth byte to represent another null character. And this one deliberately wraps around. Though again, this is just an artist's rendition. There's not necessarily a grid in reality. B, Y, E, exclamation point, backslash 0 now represents t. 

So this is to say, if I had a program like this, where I had "HI!" and then "BYE!", and I started poking around the computer's memory just using the square bracket notation, I bet I could start accessing the value of B or Y or E just by looking a little past the string s. So again, as complicated as our programs get, all that's going on underneath the hood is you just plop things down in memory in locations like these. 

And so now that we have this ability or maybe this mental model for what's going on inside of a computer, we can consider some of the features that you might want to now use in programs that you write. So let me go ahead here and whip up a quick program, for instance, that goes ahead and, let's say, prints out the total length of a string. 

Let me go ahead and do this. I'm going to go ahead and create a new program here in CS50's IDE. And I'm going to call this one string.c. And I'm going to very quickly at the top include as usual cs50.h. And I'm going to go ahead and #include stdio.h. And I'm going to give myself int main(void). And then in here, I'm going to get myself a string. So string s equals get_string. Let me just ask the human for some input, whatever it is. Then let me go ahead and print out literally the word "Output" just so that I can actually see the result. 

And then down here, let me go ahead and print out that string, for int i get 0, i is less than-- huh, I don't know what the length of the string is yet. So let me just put a question mark there, which is not valid code, but we'll come back to this-- i++. And then inside of the loop, I want to go ahead and print out every character one at a time by using my new array notation. And then at the very end of this program, I'm going to print a new line just to make sure the cursor is on its own line. 

So this is a complete program that is now, as of this week, going to treat a string as an array, ergo, my syntax in line 10 that's using my new fancy square bracket notation. But the only question I haven't answered yet is this-- how do I know when to stop printing the string? How do I know when to stop? 

Well, it turns out, thus far, when we're using for loops, we've typically done something like just count from 0 on up to some number. This condition, though, is any Boolean expression. I just need to have a yes/no or a true/false answer. So you know what I could do? Keep looping so long as the character at location i in s does not equal backslash 0. 

So this is now definitely some new syntax. Let me zoom in here. But s[i] just means the i-th character in s, or more specifically, the character at position i in s. Bang equals-- so bang is how a programmer pronounces exclamation point because it's a little faster-- bang equals means does not equal. So this is how you would do an equal sign with a slash through it in math. It's, in code, exclamation point, equals sign. 

And then notice this funkiness-- backslash 0 is again, the "null character," but it's in single quotes because, again, it is by definition a character. And for reasons we'll get into another time, backslash 0 is how you express it. Just like backslash n is kind of a weird escape character for the new line, backslash 0 is the character that is all 0's. 

So this is kind of a different for loop. I'm still starting at 0 for i. I'm still incrementing i as always. But I'm now not checking for some preordained length because just like a computer, I do not know a priori where these strings end. I only know that they end once I see backslash 0. 

So when I now go down here and do make string-- it compiles OK-- ./string, let me type in something like "HELLO" in all caps. Voila, the output is "HELLO" again. Let me do it again-- "BYE" in all caps, and the output is "BYE." So it's kind of a useless program in that it's just printing the same thing that I typed in. But I'm conditionally using this Boolean expression to decide whether or not to keep printing characters. 

Now thankfully, C comes with a function that can answer this for me. It turns out there is a function called strlen so I can literally just say, well, figure out what the length of the string is. The function is called strlen for string length. And it exists in a file called, not surprisingly, perhaps, string.h, string.h. So now let me go ahead down here and do make string-- compiles OK-- ./string. Type in "HELLO," and it still works. 

So this function strlen that does exist in a library via the header file string.h already exists. Someone else wrote it. But how did they write it? Odds are they wrote the first version that I did by checking for that backslash 0. 
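As a guess at what such a function might look like inside, here is a minimal sketch that counts characters until it reaches backslash 0. It is named string_length to make clear that it is an illustration and not the library's actual source.

```c
#include <stddef.h>

// Count the characters in s by walking left to right until the
// terminating '\0' is found. The terminator itself is not counted.
size_t string_length(const char s[])
{
    size_t n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

For "HI!" the loop stops at index 3, so the answer is 3 even though 4 bytes are in use.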

But let me ask a subtle question here. This program is correct. It iterates over the whole length of the string, and it prints out every character therein. Can anyone observe a poor design decision in this function? This one's subtle, but there's something I don't like about my for loop in particular. And I'll isolate it to line 9. I've not done something optimally on line 9. There's an opportunity for better design. Any thoughts here on what I might do better? Yeah, Jonathan? 

JONATHAN: Yeah, to create basically another variable for the string length and to remember it. 

DAVID MALAN: Yeah, and why are you suggesting that? 

JONATHAN: If you want to use a different value for the string length, or if it might fluctuate or change, you want to just have a different variable as a sort of placeholder value for it. 

DAVID MALAN: OK, potentially. But I will claim in this case that because the human has typed in the word, once you type in the word, it's not going to change. But I think you're going down the right direction because in this Boolean expression here, i less than the string length of s, recall that this expression gets evaluated again and again and again. Every time through a for loop, recall that you're constantly checking the condition. The condition in this case is i less than the length of s. 

The problem is that strlen in this case is a function, which means there's some piece of code someone wrote, probably similar to what I wrote a few minutes ago, that you're constantly asking, what's the length of the string? What's the length of the string? And recall from our picture, the way you figure out the length of a string is you start at the beginning of the string, and you keep checking, am I at backslash 0? OK. Am I at backslash 0? OK. 

So to figure out the length of "HI!", it's going to take me 1, 2, 3, 4 steps, right, because I have to start at the beginning. And I iterate from location 0 on to the end. To find out the length of "BYE!", it's going to take me five steps because that's how long it's going to take me from left to right to find that backslash 0. 

So what I don't like about this line of code is, why are you asking for the string length of s again and again and again and again? It's not going to change in this context. So Jonathan's point is taken if we keep asking the user for more input. But in this case, we've only asked the human once. 

So you know what? Let's take Jonathan's advice and do int n equals the string length of s. And then maybe you know what we could do? Put n in this condition instead. So now I'm asking the same question, but I'm not foolishly, inefficiently asking the same question again and again, whereby the same question requires a good amount of work to find the backslash 0 again and again and again. 

Now, there's some cleaning up we can do here too. It turns out there's this other subtle feature of for loops. If you want to initialize another variable to a value, you can actually do this all at once. And you can do so before the semicolon. You can do comma n equals strlen of s. And then you can use n, just as I have here. So it's not all that much better, but it's a little cleaner in that now I've taken two lines of code and collapsed them into one. They both have to be of the same data type, but that's OK here because both i and n are ints. 

So again, the inefficiency here is that it was foolish before that I kept asking the same question again and again and again. But now I'm asking the question once, remembering it in a variable called n, and only comparing i against that integer which does not actually change. 
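To put rough numbers on that inefficiency, the sketch below counts how many characters get examined when the length is recomputed in the loop condition versus computed once. Everything here is hypothetical scaffolding for illustration: counted_strlen is a hand-written stand-in for strlen that bumps a global counter, and it measures only this naive version, not what a real library or compiler does.

```c
#include <stddef.h>

// Running total of characters examined (a global purely for illustration).
static long steps = 0;

// A stand-in for strlen that counts every character it looks at,
// including the terminating '\0'.
static size_t counted_strlen(const char s[])
{
    size_t n = 0;
    for (;;)
    {
        steps++;
        if (s[n] == '\0')
        {
            return n;
        }
        n++;
    }
}

// Length is asked for again on every evaluation of the condition.
long steps_uncached(const char s[])
{
    steps = 0;
    for (size_t i = 0; i < counted_strlen(s); i++)
    {
        (void) s[i]; // stand-in for doing something with each character
    }
    return steps;
}

// Length is asked for once and remembered in n.
long steps_cached(const char s[])
{
    steps = 0;
    for (size_t i = 0, n = counted_strlen(s); i < n; i++)
    {
        (void) s[i];
    }
    return steps;
}
```

For "HI!" the first version examines 16 characters (4 condition checks at 4 steps each), while the second examines 4.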

All right, I know that too was a lot. Let's go ahead here and take a 3-minute break just to stretch legs and whatnot. In 3 minutes, we'll come back and start to see applications now of all of these features ultimately to some problems that are going to lie ahead this week on the readability of language and also on cryptography. So we'll see you in 3 minutes. 

All right, so we're back. And this has been a whole bunch of low-level details, admittedly. And where we're going with this ultimately this week and beyond is applications of some of these building blocks. And one of those applications this coming week and the next problem set is going to be that of cryptography, the art of scrambling or encrypting information. 

And if you're trying to encrypt information, like messages, well, those messages might very well be written in English or in ASCII, if you will. And you might want to convert some of those ASCII characters from one thing to another so that if your message is intercepted by some third party, they can't actually decipher or figure out what it is that you've sent. So I feel like we're almost toward-- we're almost at the ability where, in code, we can start to convert one word to another or to scramble our text. But we do need a couple of more building blocks. 

So recall that we left off with this picture here, where we had two words in the computer's memory, "HI!" and "BYE!", both with exclamation points, but also both with these backslash 0's that you and I do not put there explicitly. They just happen for you any time you use the double quotes and any time you use the get_string function. 

So once we have those in memory, you can think of them as s and t respectively. But a string, s or t, is just an array. So again, you can also refer to all of these individual characters or chars via the new square bracket notation of today, s[0], s[1], s[2], s[3], and then t[0], t[1], [2], [3], and [4], and then whatever else is in the computer's memory. 

But you know what you can even do is this-- suppose that instead we wanted to have an array of words. So before, we had an array of scores, an array of integers. But now suppose we wanted in the context of some other program to have an array of words. You can totally do that. There's nothing stopping you from having an array of words. 

And the syntax is going to be identical. Notice, if I want an array called words that has room for two strings, I literally just say, string words[2]. This means, hey, computer, give me an array of size 2, each of whose members is going to be a string. How do I populate that array? Same as before with the scores-- words[0] gets, quote unquote, "HI!" Words[1] gets, quote unquote, "BYE!" 

So that is to say with this code, could we create a picture similar to the one previously? But I'm not calling these strings s and t. Now I'm calling them both "words" at two different locations, 0 and 1 respectively. So we could redraw that same picture like this. Now this word is technically named words[0]. And this one is referred to by words[1]. 

But again, what is a string? A string is an array. And yet, here we have an array of strings. So we kind of sort of have an array of arrays. So we've got an array of words, but a word is just a string. And a string is an array of characters. So what I really have on the board is an array of arrays. 

And so here-- and this will be the last weird syntax for today-- you can actually have multiple square brackets back to back. So if your variable's called words, and that variable's an array, if you want to get the first word in the array, you do words[0]. Once you're at that word, "HI!", and you want to get the first character in that word, you can similarly do [0]. So the first bracket refers to what word do you want in the array. The second bracket refers to what character do you want in that word. So now the I is at words[0][1]. The exclamation point is at words[0][2]. And the null character's at words[0][3]. 

Meanwhile, the B is at words[1][0], then [1][1], [1][2], [1][3], [1][4]. So it's almost kind of like a coordinate system, if you will. It's a two-dimensional array, or an array of arrays. So this is only to say that if we wanted to think of arrays of strings as individual characters, we can. We have that expressiveness now in code. 
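Here is a small sketch of that double-bracket indexing. The helper name char_in_word is made up, and the parameter is written as standard C's `const char *words[]`, standing in for CS50's `string words[]`, so that the example compiles without cs50.h.

```c
// Return the character at position c within word number w.
// The first bracket picks the word, the second picks the character.
// Assumption: w and c are valid indexes; nothing here checks them.
char char_in_word(const char *words[], int w, int c)
{
    return words[w][c];
}
```

With an array holding "HI!" and "BYE!", char_in_word(words, 0, 1) is the I and char_in_word(words, 1, 4) is the backslash 0 that ends "BYE!".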

So what more can I do now that I can manipulate things at this level? Let me do a program that'll be pretty applicable, I think, with some of our upcoming programs as well. Let me call this one uppercase. Let me quickly write a program whose purpose in life is just to convert an input word to uppercase. And let's see how we can do this. 

So let me go ahead and #include cs50.h. Let me go ahead and #include stdio.h. Let me also include this time string.h, which is going to give us functions like strlen. And then let me do int main(void). 

And then let me go ahead here and get a string from the user like before. So I'm just going to ask the user for a string. And I want them to give me whatever the string should be before I uppercase everything. Then I'm just going to go ahead and print out literally "After," just so I can see what happens after I capitalize everything in the string. 

And now let me go ahead and do this-- for int i get 0, i less than string length of s, i++. Wait a minute, I made that mistake before. Let's not repeat this question. Let's give myself a second variable-- n gets string length of s, i less than n, i++. So again, this is now becoming boilerplate. Any time you want to iterate over all of the characters in the string, this probably is a reasonable place to start. 

And then let me ask the question-- I want to iterate over every character in the string that the human has typed in. And I want to ask myself a question, just as we've done with any algorithm. Specifically, I want to ask if the current letter is lowercase, let me somehow convert it to uppercase. Else, let me just print it out unchanged. So how can I express that using last week and this week's building blocks? 

Well, let me say something like this-- if the character at location i in s, or if the i-th character in s is greater than or equal to a lowercase a, and the i-th character in s is less than or equal to a lowercase z, what do I want to do? Let me go ahead and print out a character. But that character should be what? s bracket i, but I'm not sure what to do here yet. But let me come back to that. Else, let me go ahead and just print out that character unchanged, s[i]. 

So minus the placeholder, the question marks I've put, I'm kind of all the way there. Line 10 initializes i to 0. It's going to count all the way up to n, where n is the length of the string. And it's going to keep incrementing i. So we've seen that before. And again, that's going to become muscle memory before long. 

Line 12 is a little new, but it uses building blocks from last week and this. This week, we have the new square bracket notation to get the i-th character in the string s. Greater than or equal to, less than or equal to-- we saw at least one of those last week. That just means greater than or equal to, less than or equal to. 

I mentioned && last week, which is the logical AND operator, which means you can check one condition and another. And the whole thing is true if both of those are true. This is a bit weird today. But if you want to express, is the current character between lowercase a and lowercase z, totally fine to implicitly treat a and z as numbers, which they really are. Because again, if we come back to our favorite ASCII chart, you'll see again that lowercase a has a number associated with it, 97. Lowercase z has a number associated with it, 122. 

So if I really wanted to be pedantic, I could go back into my code and do something like, well, if this is greater than or equal to 97, and it's less than or equal to 122, but bad design. Like, I'm never going to remember that lowercase z is 122. Like, no one is going to know that. It makes the code less obvious. Go ahead and write it in a way that's a little more friendly to humans like this. 

But notice this question mark. How do I fill in this blank? Well, let me go back to the ASCII chart. This is subtle, but this is kind of cool. And humans were definitely thinking ahead. Notice that lowercase a is 97. Capital A is 65. Lowercase b is 98. Capital B is 66. And notice these two numbers-- 65 to 97, 66 to 98, 67 to 99. It would seem that no matter what letters we compare, lowercase and uppercase, they're always 32 apart. And that's consistent. We could do it for all 26 English letters. 

So if they're always 32 apart, you know what I could do-- if I want to take a lowercase letter, which is what I'm thinking about in line 14, I could just subtract off 32 in this case. It's not the cleanest, because again, I'm probably going to forget that math at some point. But at least mathematically, I think that'll do the trick because 97 will become 65. 98 will become 66, which is forcing those characters to uppercase. But they're not being printed as numbers. I'm still using %c to coerce it to be a char. 
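The subtraction can be sketched as a single-character helper. The name to_upper_ascii is made up for illustration, and the sketch writes the gap as 'a' - 'A' rather than a bare 32 so that nobody has to remember the number. It assumes an ASCII system, where that gap is indeed 32.

```c
// Convert one ASCII lowercase letter to uppercase by subtracting the
// distance between 'a' and 'A' (32 in ASCII); leave anything else alone.
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - ('a' - 'A');
    }
    return c;
}
```

Here 97 becomes 65 and 98 becomes 66, while characters outside a through z, like an exclamation point, pass through unchanged.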

So if I didn't mess any syntax up here, let me make uppercase. OK, ./uppercase. And let me go ahead and type in, for instance, my name in all lowercase. And voila, uppercase. Now, it's a little ugly. I forgot my backslash n, so let me go ahead and add one of those real quick just to fix the cursor. Let me recompile the code with make uppercase. Let me rerun the program with ./uppercase and now type in my name, David. Let me do it again with Brian. And notice that it's capitalizing everything character by character using only today's building blocks. 

This is correct. It's pretty well styled because everything's nicely indented. It's very readable even though it might look a little cryptic at first glance. But I think I can do better. And I can do better by using yet another library. And here's where C, and really programming in general, gets powerful. The whole point of using popular languages is because so many other people before you have solved problems that you don't need to solve again. And I'm sure over the past, like, 50 years, someone has probably written a function that capitalizes letters for me. I don't have to do this myself. 

And indeed, there is another library that I'm going to include by way of its header file, ctype.h, named for the language C and a bunch of type-related things. And in ctype.h, it turns out there's a function called-- there's a couple of functions. Specifically, let me get rid of all of this code. And let me call a function called islower and pass to islower s[i]. And islower, as you might guess, its purpose in life is to return essentially a Boolean value, true or false, depending on whether that character is lowercase. And if so, well, let me go ahead and print out a placeholder followed by the capitalization of that letter. 

Now, before I had to do that annoying math with minus 32 and figure it out, uh-uh, toupper of parentheses s[i]. And now I can otherwise just print out that character unchanged, just as before, s[i]. But now notice my program-- honestly, it's definitely a little shorter. It's a little simpler in that there's just less code. And hopefully, if the person that wrote islower and toupper did a good job, I know it's correct. I'm just standing on their shoulders. And frankly, my code's more readable because I understand what islower means, whereas that crazy && syntax and all of the additional code-- that was just a lot harder to wrap your mind around, arguably. 

So now if I go ahead and compile this-- make uppercase. OK, that seemed to work well. And now I'm going to go ahead and do ./uppercase and type in my name in all lowercase again. David seems to work. Brian seems to work. And I could do this all day long. It seems to still work. 

But you know what? I don't think I have to be even this explicit. You know what? I bet if the human who wrote toupper was smart, I bet I can just blindly pass in any character to toupper, and it's only going to uppercase it if it can be converted to uppercase. Otherwise, it'll pass it through unchanged. 

So you know what? Let me get rid of all of this stuff and really tighten this program up and print out a placeholder for c and then toupper of s[i]. And sure enough, if you read the documentation for this function, it will handle the case where it's either lowercase or not lowercase. And it will do the right thing. 

So now if I recompile my code, make uppercase, so far so good. ./uppercase, David again. Voila, it still works. And notice truly just how much tighter, how much cleaner, how much shorter my code is. And it's more readable in the sense that this function is pretty well named. Toupper is what it's indeed called. 

But there is an important detail here. Toupper expects as input a character. You cannot pass a whole word to it. It is still necessary at this point for me to be using this loop and doing it character by character. Now, how would you know this? Well, you'll see multiple examples of this over the weeks to come. 
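A minimal sketch of that character-by-character loop, using toupper from ctype.h, might look like this. The function name uppercase_copy and the idea of writing into a second array (rather than printing) are assumptions made so that the result is easy to inspect.

```c
#include <ctype.h>
#include <string.h>

// Write an uppercased copy of src into dst, one character at a time.
// Assumption: dst has room for strlen(src) + 1 bytes.
void uppercase_copy(const char src[], char dst[])
{
    for (size_t i = 0, n = strlen(src); i <= n; i++)
    {
        // i <= n so that the terminating '\0' is copied too;
        // toupper passes it, and any non-lowercase character, through.
        // The cast keeps toupper's argument in the range it is defined for.
        dst[i] = (char) toupper((unsigned char) src[i]);
    }
}
```

Passing "david" produces "DAVID", and punctuation or already-uppercase letters come through unchanged, which is why no islower check is needed.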

But if I go to what's called the manual pages for the language C, we have our own web-based version of them. And we'll link this for you in the course's labs and problem sets as needed. You can see a list of all of the available functions in C at least that are frequently used in CS50. And if we uncheck a box at the top, we can see even more functions. There's dozens, maybe hundreds of functions, most of which we will not need or use in CS50. 

But this is going to be true in any language. You sort of pick up the building blocks that you need over time. So we'll refer you to these kinds of resources so that you don't rely only on what we show in section and lecture, but you have at your disposal these other functions and toolkits as well. And we'll do the same with Python and SQL and other languages as well. So those are what we call, again, manual pages. 

All right, a final feature before we even think about cryptography and scrambling information as for problem set 2. So a command-line argument I mentioned by name before-- it's like a word you can type after a program's name in order to provide it input at the command line. So make hello-- hello is a command-line argument to the program make. Rm space a.out-- a.out was an argument, a command-line argument to the program rm when I wanted to remove it. 

So we've already seen command-line arguments in action. But we haven't actually written any programs that allow you to accept words or other inputs from the so-called command line. Up until now, all of the input you and I have gotten in our programs comes from get_string, get_int, and so forth. We have never been able to look at words that the human might very well have typed at the prompt when running your program. But that's all about to change now. 

Let me go ahead and create a program called argv.c, and it'll become clear why in just a moment. I'm going to go ahead and include, shall we say, stdio.h. And then I'm going to give myself int main(void). And then I'm just going to very simply go back and change the void. So just as our own custom functions can take inputs-- and we saw that with get_negative_int. We saw that with average today-- so does main potentially take inputs. Up till now though, we've been saying void. And we told you to say void last week. And we told you to say void in problem set 1. 

But now it turns out that C does allow you to put other inputs into main. You can either say, nope, main does not take any command-line arguments. But if it does, you can say literally, int argc and string argv with square brackets. So it's a little cryptic. And technically, you don't have to type it precisely this way. But human convention would have you do it, at least for now, in this way. 

This says that main, your function, main, takes an integer as one input and not a string but an array of strings as input. And argc is shorthand notation for argument count. Argument count is an integer that's going to represent the number of words that your users type at the prompt. Argv is short for argument vector. Vector is a fancy way of saying list. It is a variable that's going to store in an array all of the strings that a human types at the prompt after your own program's name. 

So we can use this, for instance, as follows. Suppose that I want to let the user type their own name at the command prompt. I don't want to use get_string. I don't want to have to prompt the human later for their name. I want them to be able to run my program and give me their name all at once, just like make, just like rm, and Clang, and other programs we've seen. 

So I'm going to do this-- if argc == 2-- so if the number of arguments to my program is 2-- go ahead and print out, "hello, %s", and plug in whatever is at argv[1]. So more on this in just a moment. Else, if argc is not equal to 2, let's just go with last week's default, "hello, world." 

So what is this program's purpose in life? If the human types two words at the prompt, I want to say, "hello, David," "hello, Brian," "hello, so-and-so." Otherwise, if they don't type two words at the prompt, I'm just going to say the default "hello, world." 

So let me compile this, make argv. And, hm, I didn't get it right here-- unknown type string, unknown type string. All right, I goofed. If I'm using string, recall that now I need to start using the CS50 library. And again, we'll see all the more why in the coming weeks as we take those training wheels off. But now I'm going to do this again, make argv. There we go. Now it works-- ./argv, Enter, "hello, world." That's pretty much equivalent to what we did last week. 

But notice if I type in, for instance, ./argv David, Enter, it says, "hello, David." If I type in ./argv Brian, it says "hello, Brian." If I type in ./argv Brian Yu, it says "hello, world." So what's going on? Well, the way you write programs in C that accept zero or more command-line arguments-- that is, words at the prompt after your program's name-- is you change what we have been doing all this time from void to be this, int argc, string argv with square brackets. 

And what the computer is going to do for you automatically is it's going to store in argc the total number of words that the human typed in, not just the arguments, technically all of the words, including your own program's name. It's then going to fill this array of strings, a.k.a. argv, with all of the words the human typed at the prompt, so not just the arguments like Brian or David, but also the name of your program. 

So if the human typed in two total words, which they did, argv Brian, argv David, then I want to print out, "hello" followed by a placeholder and then whatever value is at argv[1]. And I'm deliberately not doing 0. If I did 0, based on the verbal definition I just gave, if I recompile this program, I don't want to see this, hello, ./argv. 

So the program's own name is automatically always stored for you at the first location in that array. But if you want the first useful piece of information, you actually would, after recompiling the code here, access it at [1]. And so in this way do we see in argv that we can actually access individual words. 
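The decision this program makes can be sketched as a small helper. The name name_to_greet is hypothetical, and argv is declared as standard C's `char *argv[]`, standing in for `string argv[]`, so that the sketch compiles without cs50.h.

```c
// Choose whom to greet: argv[1] when exactly one word follows the
// program's name (so argc is 2), otherwise the default "world".
// argv[0] is deliberately skipped because it holds the program's own name.
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }
    return "world";
}
```

Inside main, a call like printf("hello, %s\n", name_to_greet(argc, argv)) would then greet David for ./argv David and fall back to "hello, world" for ./argv alone or for ./argv Brian Yu.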

But notice this too-- suppose I want to print out all of the individual characters in someone's input. You know what? I bet I could even do this. Let me go ahead and do this. Instead of just printing out "hello," let me do for int i gets 0, n equals the string length of argv[1]. And then over here, I'm going to do i is less than n, i++. All right, so I'm going to iterate over all of the characters in the first real word in argv. 

And what am I going to do? Well, let me go ahead and print out a character that's at argv[1] but at location i. So I said a moment ago with our picture that we could think of an array of strings as really just being an array of arrays. And so I can employ that syntax here by going into argv[1] to get me the word like "David" or "Brian" or so forth, and then further index into it with more square brackets that get me the D, the A, the V, the I, the D, and so forth. 

And just to be super clear, let me put a new line character there just so we can see explicitly what's going on. And let me go ahead now and just delete this "hello, world" because I don't want to see any hellos. I just want to see the word the human typed in. Make argv-- whoops, what did I do wrong? Oh, I used strlen when I shouldn't have because I haven't included string.h at the top. 

OK, now if I recompile this code with make argv-- there we go-- ./argv David, you'll see one character per line. And if I do the same with Brian's name or anyone's name and change it to Brian, I'm printing one character at a time. So again, I'm not sure why you would want to do that. But in this case, my goal simply was to not only iterate over the characters in that first word, but print them out. So again, just by applying twice over this time this principle, can we actually see that a program has access to the individual characters in each of these strings. 

All right, and one last explanation before we introduce the crypto and application thereof. This thing here, this thing here-- does anyone have any idea as to why main, last week and this week, seems to return an int even though it's not an average function? It's not a get_positive_int function. It's not get_negative_int. Somehow, for some reason, main keeps returning an int even though we have never seen this int in action. 

What might this mean? This is the one last piece that we promised last week we would eventually explain. What might this mean? And this one's a tough one. Brian, who do we have? How about [? Gred, ?] is it? 

[? GRED: ?] Usually, the functions in the end have returned 0. And that means that the function stops. And the 0 is the integer that pops out of the main function. 

DAVID MALAN: Yeah, and this one's subtle in that if you had programmed before, odds are-- and I'm guessing you have, [? Gred-- ?] you've seen this in use before. We humans, though, in the real world of using Macs and PCs-- you've actually seen numbers, integers in weird places. Frankly, almost any time your computer freezes or you see an error message, odds are you see English or some other spoken language in the error message. But you very often see a numeric code. 

For instance, if you're having Zoom trouble, you'll often see the number 5 in the error window in Zoom's program. And 5 just means you're having network issues. So programmers often associate integers with things that can go wrong in a program. And as [? Gred ?] notes, they use 0 to connote that nothing has gone wrong, that all is well. 

So let me write one final program here just called exit.c that puts this to the test. Let me go ahead and write a program in a file called exit.c that's going to introduce what we're going to call an exit status. This is a subtlety that will be useful as our programs get a little more complicated. I'm going to go in here and do #include cs50.h. And I'm going to go ahead and #include stdio.h. 

And I'm going to give myself the longer version of main, so int argc, string argv with the square brackets. And in here, I'm going to say, if argc does not equal 2, uh-uh, the human is not doing what I want them to, and I'm going to yell at them in some way. I'm going to say missing command-line argument. So any kind of error message that I want the human to see on the screen, I'm just going to tell them with that message. 

But I'm going to very subtly return the number 1. I'm going to return an error code. And the human is not necessarily going to see this code. But if we were to have a graphical user interface or some other feature to this program, that would be the number they see in the error window that pops up, just like Zoom might show you the number 5 if something has gone wrong. Similarly, if you've ever visited a page, frankly, and the web page doesn't exist, you see the integer 404. That's not technically the exact same incarnation of this, but it is representative of programmers using numbers to represent errors. So that one, you probably have seen. 

Here, I'm going to go ahead, though, and by default, say, "hello, %s," just like before, passing in whatever's in argv[1]. So same program as before, but I'm not going to do any of this lame, "hello, world" if the human doesn't type in their name as I expect. Instead, I am going to check, did the human give me two words at the command line? If not, I'm going to print, "missing command-line argument," and then return this exit code. 

Otherwise, if all is well, I'm going to go ahead and return explicitly 0. This is another number that the human, you and I, are never going to see, but we could have access to it. And frankly, for course purposes, check50 can have access to this. And graphical user interfaces, when we get to those, can have access to these values. So 0, as [? Gred ?] notes, just means all is well. But 1 would mean that something has gone wrong. 

So let me go ahead and make exit, which is kind of appropriate, as we're wrapping up here. And let me go ahead and do ./exit. "Missing command-line argument" is what's displayed. If I go ahead and say, ./exit David, now I see "hello, David." Or ./exit Brian, I'll see "hello, Brian." 

Now, this is not a technique you'll need to use often, but you can actually see these return values if you want. If I run exit, and I see this error message, I can very weirdly say, echo $?, which is a very admittedly cryptic way of saying, what was my exit status? And if you hit Enter, you'll see 1. By contrast, if I run exit of David, and I actually see "hello, David," and I do echo $?, now I will see 0. 

So again, this is not a technique you and I will use very frequently. But it's a capability of a program, and it's a capability of C, that you do now have access to. And so in writing programs moving forward, what we will often do in labs and in problem sets and the like is ask you to return from main either 0 or 1 or maybe 2 or 3 or 4 based on the problems that might have gone wrong in your program that you have detected and responded to appropriately. So it's a very effective way of handling errors in a standard way so that you know that you are being proactive about detecting mistakes. 

So what kinds of mistakes might we handle this week? And what kinds of problems might we solve? Well, today was entirely about deconstructing what a string is. Last week, it was just a sequence of text, a chunk of text. Today, it's now an array of characters. And we have new syntax in C for accessing those characters. 

We also today have access to more libraries, more header files, the documentation, therefore, so that we can actually solve problems without writing as much code ourselves. We can use other people's code in the form of these libraries. 

So one problem we will solve this coming week by way of problem set 2 is that of readability. Like, when you're reading a book or an essay or a paper or anything, what is it that makes it like a 3rd-grade reading level or a 12th-grade reading level or university reading level? Well, all of us probably have an intuitive sense, right? Like, if it's big font and short words, it's probably for younger kids. And if it's really complicated words with big vocabulary and things we don't know, maybe it's meant for university audiences. 

But we can quantify this a little more formulaically, not necessarily the only way, but we'll give you a few definitions. So for instance, here's a famous sentence-- "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much," and so forth. 

Well, what is it about this text that puts Harry Potter at grade seven reading level? Well, it probably has to do with the vocabulary words. But it probably has to do with the lengths of the sentences, the amount of punctuation perhaps, the total number of characters that you might count up. You can imagine quantifying it just based generically on the look and the aesthetics of the text. 

What about this? "In computational linguistics, authorship attribution is the task of predicting the author of document of unknown authorship. This task is generally performed by the analysis of stylometric features-- particular"-- this is Brian's senior thesis. So this is not a seventh-grade reading level. This was actually rated at grade 16. So Brian's pretty sophisticated when it comes to writing theses. 

But there too, you could perhaps glean from the sophistication of the sentences, the length thereof, and the words therein-- there's something we could perhaps quantify so as to apply numbers. And indeed, that's one way you could assess the readability of a text even if you don't have access to a dictionary with which to figure out which are the actual big or small words. 

And what about cryptography? So it's incredibly common these days and so important these days for you and I to use cryptography, not necessarily using algorithms we ourselves come up with, but rather using software, like WhatsApp and Signal and Telegram and Messenger and others, that support encryption between you and the third party or friend or family, or at least minimally the website with which you're interacting. 

So cryptography is the art of scrambling information, or hiding information. And if that information is text, well, frankly, as of this third week of CS50, we already have the requisite building blocks for not only representing text, but we saw today manipulating it. Even just uppercasing characters allows us to start mutating text. 

Well, what does it mean to encrypt information? Well, it's like our black box from last week. You have some input. You want some output. The input, we're going to start calling plaintext, the message you want to send from yourself to someone else. Ciphertext is the output that you want. 

And so in between there, there's going to be what we're going to call a cipher. A cipher is an algorithm that encrypts or scrambles its input so as to produce output that a third party can't understand. And hopefully, that cipher, that algorithm, is a reversible process so that when you receive the scrambled ciphertext, you can figure out what it was that the person sent to you. 

But the key to using cryptography-- pun intended-- is to also have a secret key. So if you think back to grade school, maybe you were flirting with someone in class, and you sent them a note on a piece of paper. Well, hopefully, you didn't just say, like, I love you, on the piece of paper and then pass it through all of your friends, or let alone the teacher, to the ultimate recipient. 

Maybe you did something like, an A becomes a B. A B becomes a C. A C becomes a D. Like, you kind of apply an algorithm to add 1 to all of the letters so that if the teacher does intercept it and look at it, they probably don't have enough care in the world to figure out what this is. It's just going to look like nonsense. But if your friend knows that you changed A to B, B to C by adding 1 to every letter, they could reverse that process and decrypt it. 

So the key, for instance, might be literally the number 1. The message literally might be, "I LOVE YOU." But what would the ciphertext be, or the output? Well, let's consider "I LOVE YOU" is a string which, as of today, is an array of characters. So what use is that? Well, let's consider exactly that phrase as though it's an array. It's an array of characters. We know from last week, characters are just integers, decimal integers, thanks to ASCII, and in turn, Unicode. 

So it turns out I, we already know, is 73. And if we looked up all the others on a chart, L is 76, 79, 86, 69, 89, 79, 85. So we could relatively easily in C-- you might have to check your notes and check my sample code and so forth-- convert "I LOVE YOU" to the corresponding integers by just casting, so to speak, chars to integers. 

I could very easily mathematically, using the plus operator in C, start to add 1 to every one of these characters, thereby encrypting my message. Now, I could send my friend these numbers. But I might as well make it a little more user friendly and cast it back from integers to chars. 

So now it would seem that the ciphertext for "I LOVE YOU," if using a key of 1-- and 1 just means change A to B, not A to C, just move it by one place-- this is the ciphertext for an encrypted message of, "I LOVE YOU." And so the whole process becomes 1 is the input as the key. "I LOVE YOU" is the input as the plaintext. And the output ultimately is this unpronounceable phrase that, again, if the teacher or some friend intercepts, they probably don't know what's going on. 

And indeed, this is the essence of cryptography. The algorithms that protect our emails and texts and financial information and health information are hopefully way more sophisticated than that particular algorithm as it is. But it reduces to the same process-- an input key and an input text followed by some output, the so-called ciphertext. 

And this has been with us for decades now in some form, sometimes even mechanical form. Back in the day, you could actually get these little circular devices that have letters of the alphabet on one side, other letters of the alphabet on the other side. And if you rotate one or the other, A might line up with B, B might line up with C. 

So you can have even a physical incarnation of cryptography, just as was popular in a movie that seems to play endlessly on TV, at least here in the US around Christmas time. And you might recognize if you've seen A Christmas Story one such look. So we'll use just a couple of minutes of our final moments together to take a look at this real-world incarnation of cryptography that undoubtedly you can probably see on TV this fall. 

[VIDEO PLAYBACK] 

- "Be it known to all and sundry that Ralph Parker is hereby appointed a member of the Little Orphan Annie secret circle and is entitled to all the honors and benefits occurring thereto." 

- "Signed, Little Orphan Annie." "Countersigned, Pierre Andre," in ink. Honors and benefits already at the age of nine. 

[RADIO CHATTER] 

- (ON RADIO) Attention! [INAUDIBLE] overboard! 

[CLANGING] 

- (ON RADIO) Come [INAUDIBLE] Gone overboard! 

- (ON RADIO) [INAUDIBLE] 

- Come on, let's get on with it. I don't need all that jazz about smugglers and pirates. 

[BARKING] 

- (ON RADIO) Listen tomorrow night for the concluding adventure of the Black Pirate Ship. Now it's time for Annie's secret message for you members of the secret circle. Remember kids, only members of Annie's secret circle can decode Annie's secret message. Remember, Annie is depending on you. Set your pins to B-2. Here is the message. 12, 11, 2, 8-- 

- I am in my first secret meeting. 

- (ON RADIO) --25, 14, 11, 18, 16, 23-- 

- Old Pierre was in great voice tonight. 

- (ON RADIO) --12, 23-- 

- I could tell that tonight's message was really important. 

- (ON RADIO) --21, 3, 25. That's a message from Annie herself. Remember, don't tell anyone. 

[FOOTSTEPS AND PANTING] 

- 90 seconds later, I'm in the only room in the house where a boy of nine could sit in privacy and decode. [CHUCKLES] Aha, B. [CHUCKLES] I went to the next, E. The first word is "be." S, it was coming easier now. U. [CHUCKLES] 25, that's R. 

- Aw, come on, Ralphie, I gotta go. 

- Come on. 

- I'll be right down, Ma! 

- Gee whiz. 

- T, O. "Be sure to." Be sure to what? What was Little Orphan Annie trying to say? Be sure to what? 

- Ralphie, Randy has got to go. Will you please come out? 

- All right, Ma! I'll be right out! 

- I was getting closer now. The tension was terrible. What was it? The fate of the planet may hang in the balance. 

[KNOCKING] 

- Ralphie! Randy's got to go! 

- I'll be right out, for crying out loud! 

- [CHUCKLES] Almost there. My fingers flew. My mind was a steel trap, every pore vibrated. It was almost clear. Yes, yes, yes, yes. 

- "Be sure to drink your Ovaltine." Ovaltine? A crummy commercial? 

[MUSIC PLAYING] 

Son of a bitch. 

[END PLAYBACK] 

DAVID MALAN: All right, that's it for CS50. We will see you next time. 

[MUSIC PLAYING]