DAVID MALAN: All right. This is CS50, and today we look all the more underneath the hood, so to speak, of programming, which we've been doing the past couple of weeks, and of C in particular. And indeed, we're going to try to focus today in addition on some new programming techniques, really on first principles, so that what you've been seeing over the past couple of weeks no longer feels quite as much like magic. 

If you're sort of typing these magical incantations and you're not quite sure why things work, know that you will understand and appreciate all the more with practice and with application of these ideas, what it is you're doing. But today, we're going to go back to first principles, sort of week 0 material, to make sure that you understand that what we're doing now in week 2 is little different from what we did back in week 0. 

So in fact, let's take a look at one of the first programs we saw in C, which was a little something like this. This is our source code, so to speak. There were a few salient characteristics from last week that dovetailed with the first week, week 0. And that was this thing called main, which is just the main function. It's the main entry point to your program. It's the equivalent of Scratch's "when green flag clicked."

This of course is an example of another function, one that comes with C that allows you to print on the screen. It can take inputs, at least one input here, which is typically a string in double quotes, like the message "hello world." But of course, in order to use printf in the first place, you needed this thing up here. And Standard io.h represents what, as you understand it now? 

Any thoughts on what Standard io.h is? Yeah? 

AUDIENCE: A library on how [INAUDIBLE]. 

DAVID MALAN: Yeah, it's a manifestation of what's called a library, code that someone else wrote years ago. Specifically, Standard io.h is a header file. It's a file written in C but with a file extension ending in dot h that, among other things, declares the prototype, so to speak, for printf so that Clang, when you're compiling your code, knows what printf actually is.

And of course this little thing back here, which you've probably now gotten in the habit of using: this \n is a new line, and it forces the cursor to go to the next line. So those were some of the uglier characteristics of code last week, and we'll tease apart int and void and a few other things over the course of today and beyond.

So when you compile your code with Clang, hello.c, and then run that program, ./a.out, which you probably haven't done on your own since, because we gave you a simpler way to do this, that process was all about creating a file containing zeros and ones that the computer understands, called a.out that you can run. Of course, a.out is a pretty stupid name for a program. It's hardly descriptive, even though it's the default. 

So the next program we wrote and compiled, we used -o hello, which is a so-called command line argument to Clang. It's like an option it comes with that just lets you specify the name of the file to output. So you did this past week with the problem set, with a couple of programs you yourself wrote.

But what is actually going on when you compile your code via that process? Well, it turns out that if we make this program a little more interesting, this becomes even more important with code like this. Now I've added a couple of lines of code. CS50.h, which is representative of the CS50 library. Again, code that other people wrote, in this case the staff some years ago, that declares that it has prototypes for the one liners for functions like GetString so that you can use more features than came with C by default. 

And it has things like String itself, a data type. So GetString is declared in that file. Name is, of course, a variable in which we stored my name last week. String is the type of variable in which we stored a name. And all of that is then outputted as hello comma something, where the percent s, recall, was a placeholder, name is the variable we plugged in to that format code, and then all of that is possible because of CS50.h, which declares string and also gives us GetString.

So that's a paradigm that's at the moment CS50 specific, but it's representative of any number of other functions we're going to start using today and in the weeks to come. The process now is going to be the same. However, when you compiled that program that used the CS50 library, you might recall and you might have gotten hung up on this past week if you used Clang and not another program, you need this -lcs50, and you need it at the end just because. That's the way Clang expects it. 

This is a special flag that we'll tease apart in just a couple of minutes, an argument to Clang that tells it to link in, so to speak, link in all of the zeros and ones from CS50's library. But we'll see that in just a moment. This, of course, is how you should probably be compiling your code here on out. It's just super simple, but it automates everything we just saw more pedantically, step by step. 

So we've been compiling our code for the past week now, and we're going to keep doing that for the next several weeks, until-- spoiler-- we get to Python, and you're not going to have to compile anything anymore. It's just going to happen automatically for you. But until then, compilation is actually kind of an oversimplification of what's been happening the past week. Turns out there are actually four distinct steps that you all had been inducing by running Make or even by running Clang manually at the command prompt.

And just so that, again, we can sort of understand what it is you are doing when you run these commands, let's go to first principles, understand these four steps, but then we'll move on just like in week 0 and stipulate, OK, I got that. I don't need to think at this low level after today. But hopefully you'll understand from the bottom up these four steps. 

So let's take a look at pre-processing. This is a term of art in programming that refers to the following. When you have source code that looks like this, you have a couple of lines at the top that say hash include two files, two library files. Well, when you actually run Clang or you induce Clang to run by using Make, what happens is those lines that start with the hash symbol are actually sort of replaced with the actual contents of that file. 

So instead of this code remaining include CS50.h, literally what Clang does is go into CS50.h, grab the relevant lines of code, and essentially copy-paste them into your file, hello.c or whatever it's called. The next line here, standard io.h similarly gets replaced with whatever the lines of code are in that file, standard io.h. Doesn't matter to us what they are, but they look a little something like this, though I've simplified on the slide here. 

And there's a whole bunch of other stuff above and below those lines certainly in those files. What then happens after that? Well, compiling, even though this is the word we use and we'll continue using to describe taking source code to machine code, it's actually a more precise step than that. When a computer-- when a program is compiled, it technically starts like this after having been pre-processed-- again, that was step 1. 

This code is then converted by a compiler, like Clang, to something that looks even scarier than C. This is something called assembly code, and you can actually take entire courses on assembly code. And it wasn't all that many decades ago that humans were manually programming code that looked like this, so it wasn't quite zeros and ones. But my god, C is looking pretty good now, if this is the alternative language back in the day. 

So this is an example of assembly language. But even though it's pretty arcane looking, if I highlight in yellow a few characteristics, there's some things that are familiar. Main is up here. Get string is down here. Printf is down here. So when your code is compiled by Clang, it goes from your source code in C to this intermediate step assembly code, and that's just a little closer to what the CPU, the brain of your computer, actually understands. 

In fact, now highlighted in yellow are what are called instructions. So if you've ever heard of Intel or AMD or a bunch of companies that make CPUs, central processing units, the brains of a computer, what those CPUs understand is these very, very low level operations like this. And these relate to moving things around in memory and copying things and reading things and putting things onto the screen. 

And they do so much more arcanely than C does. But again, we don't have to care about this, because Clang does all of this for us. But once you're at that point of having assembly code, you need to get it to machine code, the actual zeros and ones. And that's where Clang does what's called assembling. There's another part of Clang, like some built-in functionality, that takes as input that assembly code and converts it from this to the zeros and ones that we talked about in week 0.

But a program like hello.c involved a few different files. For instance, this code again involved my code that we wrote last week. It involves the CS50 library, which the staff wrote years ago. And it involves standard io.h. That's yet another file. That's like three different files that Clang frankly has to compile for you.

Now it would be super tedious if we had to run Clang like three times to do all this compilation. Thankfully we don't. It all happens automatically. So the last step in compiling a program after it's been pre-processed, after it's been compiled, after it's been assembled, is to combine all of the zeros and ones from the files involved into one big file, like Hello or a.out. 

So hello.c started as source code, as did cs50.c, somewhere on the computer's hard drive, as did stdio.c, somewhere on the computer's hard drive. It turns out that printf is actually in its own file within the standard I/O library. But these are the three files involved for the program I just described.

So once we actually go ahead and assemble this one, it becomes a whole bunch of zeros and ones. We assemble this one, a whole bunch of zeros and ones. This one, a whole bunch of zeros and ones. That's like three separate files that then get linked together, sort of commingled, into one big file called Hello, or called a.out. 

And my god, like that's a lot of complexity. But that's what humans have been building and developing for the past many decades when it comes to writing software. Back in the day, it started off as zeros and ones. That was no fun. Assembly language, scary though it looks, was actually a little easier, a little more accessible for humans to write. 

But eventually we humans got tired of that, and thus were born languages like C and C++ and Python and PHP and Ruby and others. It's been an evolution of languages along the way. So this now we can just abstract away into compiling. When you compile your code, all of that stuff happens. But all we really care about at the end of the day is the input, your source code, the output as machine code. 

But those are the various steps happening. And if you ever see cryptic-looking commands on the screen, it might relate indeed to some of those intermediate steps. All right, any questions then on what compiling is or pre-processing, compiling, assembling, or linking? Anything at all? All right. 

So beyond that, I'm sure you've encountered now, after just one week, bugs in your software. And in fact, one of the greatest skills you can acquire from programming class is not only how to write code, but how to debug code, most likely your own. And if you've ever wondered where this phrase comes from, this notion of debugging, so this is actually part of the mythology. 

So this is actually a notebook kept by Grace Hopper, a very famous computer scientist, working years ago with some colleagues on what was called the Mark 2 system. If you've ever walked through Harvard Science Center, there's a big part of a machine in the ground floor of the Science Center. That's the Mark 1, the precursor. 

Well, the Mark 2 at some point was discovered as having literally a bug inside of it, which was causing a problem. A moth of sorts. And Grace Hopper actually made this record here, if we zoom in, the first actual case of bug being found. And even though other people had used the expression bug before to refer to mistakes or problems in systems, this is really sort of the lore that folks in computer science look back on. 

So bugs are just mistakes in programs, things that you surely did not intend. And we'll consider today now how we can empower you, much more so than this past week, to solve your own problems and actually debug your software. So what are the mechanisms via which we can do this? 

So Help 50 is one of the tools that CS50 itself provides you with. And let's go ahead and take a look at a quick example that allows us to use this tool. I'm going to go ahead and open up my CS50 Sandbox here. I'm going to go ahead and create a program called Buggy 0.C, knowing in advance that I'm going to make a mistake here. 

And I'm going to go ahead and do int main void, as all of my programs begin. And I'm going to go ahead and do printf hello world backslash n semicolon. All right, so that's buggy 0.c. And again, even though I could run the Clang commands, henceforth I'm just going to run things like Make. So make buggy 0 Enter. And all right, here's the first of my errors.

Let me just increase the size of my terminal window, focusing as always, always on the first error, which is the one in red here. Implicitly declaring library function printf with type int (const char *, ...), error-- I mean, there's a lot there. There's a lot to digest, even though by now, you might recognize at least some of these symbols. But suppose you don't, and you want help understanding this message. Short of asking a human for help, someone who's more familiar, you can instead do this. Rerun the same command as before, but prefix it with help 50 and hit Enter. And what will happen is we will run make for you again. We will look at the output of make, cryptic though it might be to you, run it through our own Help 50 software and look for messages we understand.

And if we recognize one of the error messages in your output, we're going to highlight in yellow a message like this-- buggy0.c:3:5, error, implicitly declaring library function printf with type, dot, dot, dot. Did you forget to include standard io dot h, in which printf is declared, atop your file? So that's, in this case, the exact answer. And so now, you'll just see that not only are we still showing you the error, we're highlighting where it is. And in fact, buggy0.c, line 3, character 5, or column 5, is just one way of now homing in on what the issue is.

Let me go ahead and open up another file here, or enhance this as buggy one dot c, and make a similar mistake, but one that triggers a different error message. In this case, I'm going to go ahead and get this right this time, include standard Io dot h. And then I'm going to go ahead and do int main void, and then just as before, I'm going to do this canonical program. String name gets get string. And ask the user, what's your name-- backslash, n. And then I'm going to go ahead and say hello to them with a %s comma name. 

So that too looks good. I'm going to go ahead and scroll back up here, do make buggy one this time. But of course, it looks like, my god, as before, I have two lines of code, yet somehow, five or six errors. Always focus on the top. So it probably relates to something like this, but this one's more confusing. Use of undeclared identifier string-- did you mean standard in? Well, no. So if you don't quite grok that, go ahead and run the same command, help 50, make buggy one. And this time, we'll see in the output of this command, hopefully, after asking for help, a clue as to what it is that we're actually looking for.

And indeed, now we notice that oh, by undeclared identifier, clang means you've used a name string on line five of buggy one dot c, which hasn't been defined. Did you forget to include cs50 dot h, at this point. So in short, anytime you're having a problem running a command and you're seeing cryptic messages, reach for help 50 as a command for actually explaining it to you. And thereafter, probably you won't have to run that same command again. 

But what about another? Let me go ahead and open up a program I wrote in advance here, and go ahead and open this one. Yeah? Sure. 

AUDIENCE: [INAUDIBLE] just press more buttons. 

DAVID MALAN: To rerun the same command? 

AUDIENCE: Not to delete that, but to [INAUDIBLE] 

DAVID MALAN: Oh, yes, so just to keep things neat in class, I'm in the habit of hitting Control l a lot, which just clears my terminal window. It has no functional impact. It just gets the clutter off of the screen. You can also literally type, for instance, clear, Enter. That's just a little more verbose than hitting Control l. So there's a lot of little keyboard shortcuts, and interrupt at any point if you have questions about those. 

So here's a program that also is buggy. I wrote it in advance, and it's called buggy two dot c. It's got a for loop. It's printing some hashes. And the goal of this program is to print something 10 times. So I've got my for loop from zero on up to 10. I'm printing a hash with a backslash n. So let's go ahead and run this, make buggy two. Oops. I'm not in this directory. Let me go ahead and make buggy two-- seems to compile. So this is not a problem for help 50 yet, because that would be when the command itself isn't working. Buggy two-- all right, it looks good, but let's just be super sure-- one, two, three, four, five, six, seven, eight, nine, 10, 11. 

So it is flawed, if my goal is to print just 10 hashes. And obviously, this is very contrived. Odds are, you can just reason through what the problem here is, but this is representative of another type of problem that's not a bug syntactically, whereby you typed some wrong symbol or Command. This is more of a logical error. My goal is to print something 10 times. It's obviously not. It's printing something 11 times. And suppose that the goal at hand is to wrap your mind around, why is that happening? 

Well, the next debugging tool that we'll propose that you consider, is actually quite simply printf. It's perhaps the simplest tool you can use to actually understand what's going on inside of your program, and we might use it in this case as follows. I'm obviously printing out already the hash symbol, but let me go ahead and say something more deliberate, just to myself, something like i is now, %i, and then let's go ahead and just put a space, and then in there, output i semicolon. So this is not the goal of the program. It's just a temporary diagnostic message, so that now, if I go ahead and increase my terminal window, recompile buggy two, and rerun dot slash buggy two-- [LAUGHS] buffy two-- buggy two-- I'll now see, oh, a little more interesting information. 

Not only am I still seeing the hashes, I'm now seeing, in real time, the value of i. And now, it should probably jump out at you, if it didn't already in the for loop alone, what's the mistake I've made in my code? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Say again. 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, my first value for i was zero, and that's normally OK. Programmers do tend to start counting from zero, but if you do that, you can't keep counting through 10. You have to make a couple of tweaks here. So what can we do to fix?

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, so this would be the canonical way of doing this. It's not the only way, but generally start at zero and go up to less than the value you care about. So now if I rerun this, I can go ahead and run make buggy two again, clear my screen, dot slash buggy two, Enter. And now I indeed have 10, even though it never says 10, but that's OK, because I'm starting at zero, and now that I found my logical error, where it's just not working as I intended, now I can go ahead and delete that line. 

I can go ahead and make buggy two once more, dot slash buggy two, Enter. And voila, I can now submit my program, or ship it out to my actual user. So printf is sort of a very old-school way of just wrapping your mind around what's going on in your program by just poking around. Use printf to see what's going on inside of your program, so you're not just staring at a screen trying to reason through without the help of the computer. 

But of course, that's about as versatile as cs50 sandbox gets when it comes to solving problems. You can write code up here. You can compile and run code down here. And there are commands like help 50 and a few others we'll see that you can run to improve your code, but the sandbox itself is actually pretty limited. And so today, we're going to introduce another programming environment that fundamentally is the same thing, it just has additional features, particularly ones related to debugging. 

So here now is what is called CS50 IDE. IDE is a term of art for integrated development environment. You might have used, if you programmed before in high school, things like Eclipse or Visual Studio or NetBeans or a bunch of other tools as well. If you've never used any of these tools, that's fine. Most students have not. But CS50 IDE is just sort of a fancier version of CS50 sandbox that adds some additional tools, like debugging tools.

And so here I've gone ahead and logged in advance to CS50 IDE, and it's pretty much the same layout. On the top of the window is where my tabs with my code will go. On the bottom is my terminal window. It happens to be blue instead of black, but that's just an aesthetic detail. But you'll see a teaser over here of other features, including what's called the debugger, a program that's going to let me actually step through my code, step by step. 

So let's go ahead and do this after introducing one other command that exists in the IDE, and that's called debug 50. Suffice it to say that any command this semester that ends in 50 is a training wheel of sorts that's CS50 specific. But by term's end, we'll have essentially taken away all of those CS50 specific tools so that everything you're using is industry standard, so to speak. So if we look now at CS50 IDE, let's go ahead and maybe run that same program.

So if I click this folder icon up here, you'll see a whole bunch of files, just like in the sandbox. And I've pre downloaded all of today's source code from CS50's website and just uploaded it to the IDE, just like you can in the sandbox. And we'll do this in section or in super section, manually, if you'd like. I'm going to go ahead and open up that same program buggy two, that's now in the IDE instead of the sandbox, and you'll see it looks pretty much the same. 

The color coding might be a little different, but that's just an aesthetic detail. And I can still run this. Make buggy two down here. But notice here, this error, I could use help 50 on this, but notice in advance, I've downloaded all of my code into a folder called source two. That's what's in the zip file, on the course's website. So again, just like we did briefly last week, if you know your code is not just in the default location, but is in another directory, what does cd stand for? 

AUDIENCE: Change directory. 

DAVID MALAN: OK. So change directory-- so not that hard. It changes directory. And now notice what the IDE does. It's a little more powerful, even though it's a little more cryptic. It always puts a constant reminder of where you are in the folders in your IDE, whereas the sandbox hid this detail altogether. So again, we're removing a training wheel by just reminding you, you are in source two and the tilde is just a computer convention, meaning that is your home directory, that is your personal folder with your CS50 files, demarcated with just a tilde.

So now I'm going to go ahead and do make buggy two. It does compile, because again, this is not a syntax error. This is a logical problem. I'm going to go ahead now and dot slash buggy two. And if I count these up, I've still got 11 hashes on the screen. So I could go in and add printf, but that's not really taking advantage of any new tools. But watch what I can instead do. Let me scroll this down just a little bit so I can see all of my code.

Let me go ahead and click to the left of the line numbers in the IDE, like in main, and it puts a red dot, like a stop sign that says stop here. This is what's called a breakpoint. This is a feature of a lot of integrated development environments, like CS50 IDE that's telling the computer in advance, when I run this program, don't just run it like usual, stop there, and allow me, the human, to step through my code, step by step by step. 

So to do this, you do not just run buggy two again. You instead run debug 50. So just like help 50 helps you understand error messages, debug 50 lets you walk through your program step by step by step. So let me go ahead and hit Enter. You'll notice now on the right-hand side a new window that the sandbox did not have opened up. And there's a lot going on there, but we'll soon see the pieces that matter. That is the debugger. 

And you'll see that this line here, line seven, is highlighted, because that's the first real piece of code inside of main that's potentially going to get executed. Nothing really happens with the curly braces. Seven is the first real line of code. So what this yellow or greenish bar means is that the debugger has paused your program at that moment in time, has not run all the way through, so we can start to poke around. And in fact, if I zoom in on the right, let's focus today pretty much on variables, you'll notice a nice little visual clue that you have a variable called i. 

At the moment, its value is zero. What is its type? Integer. So watch what happens now when I take advantage of some of the icons that are slightly higher up. I'm just going to scroll up on the debugger, and most of this we'll ignore for today, but there's some icons here. So if I were to hit Play, that will just resume my program and run it all the way to the end-- not very useful if my goal was to step through it. But if you hover over these other icons instead, step over, this will step over one line of code at a time, and execute it one by one by one, so literally allowing you to walk through your own code. 

And so let's try this. When I go ahead and click Step Over, notice that the color moves. Watch my terminal window now, the big blue window at the bottom. I'm going to see hash. Now notice that line seven is highlighted again, because just with a for loop, something's going to happen again and again. So what should we see happen though when I click step over once more? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: i should become one. So it's a little small, but watch the right-hand side of the screen where it says variable i, and I click Step Over-- voila, now we see one. And if I continue doing this, not much of interest really happens. I've just really slowed down the same program. But you'll notice that i is incrementing again and again and again. But what's interesting here is I didn't have to go in and change my code by adding a bunch of messy printf statements that I'm going to have to delete later just to submit my code or ship it on the internet. Instead, I can kind of watch what's going on inside of my computer's memory while I'm executing this program. 

And the fact now that the value of i is 10, and yet I'm about to print another hash, therein lies the same logical error. So we're seeing just graphically the same problem as before. So now at this point, the program is pretty much done. If I keep clicking Step Over, it's just going to terminate. If at this point, I'm like, oh my god, now I know it's wrong, you can exit out of most any program in the IDE or in sandbox by hitting Control c, for cancel, and that will kill the debugger, close the window, and get you back to your terminal window. 

And I can't emphasize this enough, moving forward even this week, use help 50 when you have a bug compiling your code, some error message that you don't understand. It will just help you like a member of the staff could. And then certainly reach out to us if you don't understand that. But debug 50 should, moving forward, be your first instinct. If you have a bug where something's not working, the amount of change you're computing is wrong, the credit card numbers you're analyzing are wrong, use debug 50, starting this week, not two weeks from now, to develop that muscle memory of using a debugger. And it is truly a lifelong skill, not just for C, but for other languages as well.

Any questions on that? You'll see more of it in section and beyond. So what else do we have in the way of tools in our toolkit here? Let's go ahead and introduce one other now, one that you've probably used this past week, called check 50. This is a tool that allows you to analyze the correctness of your code. And you might recall with check 50, you did a little something like this. If I went ahead and whipped up a program, like my typical hello dot c-- so I've gone ahead and clicked Save, saving this file as hello dot c. Let me go ahead and include standard io dot h, int main void. Let me go ahead now and printf. Hello comma world backslash n semicolon.

And I know from the problem sets, that the way to check the correctness of this code with CS50-- check 50 and then a slug, a unique identifier. I'm using a shorter one just for lecture today called CS50 problems hello. That is just the unique set of tests that I want to run on my code called hello dot c. So what's happening here is I'm being prompted to authenticate. GitHub is what this uses, as you've seen. I'm going to go ahead and use my student account. I'm going to go ahead and log in. 

You'll notice a star represents your password, so it kind of sort of masks it, even though everyone in the world now knows how long my password is. And now we're preparing, we're uploading the submission, and in just a few seconds, we'll get some feedback from CS50's server that tells us, hopefully, that my code is perfectly correct-- perfectly correct. But no, it's not in this case. And if you recall from problem set one, you weren't supposed to just print hello world. You were supposed to print hello so and so, whatever the human's name is. 

So you'll see two green smileys here saying hello dot c exists. So I got that one right. I named the file correctly. Step two, it compiled, so there were no error messages when we ran make on your code. But we did get unhappy twice. We expected when passing in the name Emma, for you to say hello Emma. And when we expected to pass in Rodrigo, we expected hello Rodrigo, so you did not pass these two tests. 

So check 50 happens to be a CS50-specific tool that the TFs and I use to grade and provide automated feedback on code, but it's representative of what in the real world are just quite simply called tests. Whenever you work for a company or write software, part of that process is typically not just to write the code that solves your problem, but to write tests that make sure that your own code is correct, especially so that if you add features to your programs down the road or someone else tries to add features to your code, they and you don't break it-- you constantly have the ability to make sure your code is still working as expected.

So while we do use it in academic context to score problems sets, it's fundamentally representative of a real-world process of testing one's own code repeatedly. And then lastly, there's this thing-- style 50. So it's not uncommon when learning how to program, especially in a language like C, to be a little sloppy when it comes to writing your code. Technically speaking, this same program here, I could just make it look like this. And frankly, if I really wanted to, I can make it look like this, and the computer's not going to care. It's smart enough to be able to distinguish the various curly braces from parentheses and semicolons. 

But my god, this is not very pleasant to look at. Or if it is right now, break that mindset. This is not very pleasant to look at. You should be writing code that's easier for you to read, for other people to read, and honestly, easier for you to maintain. There is nothing worse than writing really bad code, coming back to it weeks or months later to fix something, add something, and you don't even know what you're looking at, even though it's your own code. So style 50 is a tool that just helps you develop muscle memory for writing prettier code. Style has nothing to do with your code's correctness. It's more of the nit picky aesthetics that just make it pleasant to look at. 

And reasonable people will disagree as to what constitutes pretty code. With style 50, we, like a company, have standardized on what we would propose your C code looks like, so that we can have an objective measure of how clean it is. So if I go ahead and run, after saving my file, style 50 on hello dot c, Enter, you'll see some output like this. You'll see your same code in black and white at the bottom, but you'll see green text telling you where you should add space. So you should literally hit the spacebar four times and that will make style 50 happy. 

By contrast, if I instead do something like this, let me go ahead and correct it incorrectly. There are people in the world that write code that looks like this. This is frowned upon. But if I go ahead and run style 50 now on this file-- Enter-- you'll see the opposite. And it gets a little scarier with this syntax, because we're doing our best to explain what it is we want you to do. But we want you to delete the new line, the Enter key that you hit here, and we want you to pull it up to the top here, and we want you to delete what's in red here. 

So admittedly, it's sometimes hard for the computer to give you very straightforward advice as to what's going on. So you'll see over time, certain patterns. So in fact, if I go to CS50's own website here, let me go ahead and pull up what's called a style guide. And this is the authoritative answer when it comes to what your code should look like in a class or in a company. You'll see throughout this style guide that's online a lot of examples of what good code, pretty code, readable code should look like. And there, too, reasonable people will disagree, but it's part of the programming process to have good style for your code, and style 50 allows you to develop that muscle memory as well. 

And one aside, whereas the sandbox tool used to auto save your file, the IDE does not do that. So notice I just hit Enter a couple of times in this file, or suppose I said something like Goodbye World more explicitly, and suppose I now move my cursor to the terminal window, you'll see a big red alert saying, hey, you did not save your file. That's because the IDE is meant to be a little more powerful and a little more of the onus now is on you to actually know OK, red dot up there means I should save. So file, Save, or you can hit Control s or Command s. So just realize that is now on you. 

And lastly, a summary of what all these tools really figure into. Pretty much, the first four of these tools all relate to writing correct code, code that works the way you want it to, code the way we want it to, code the way that some problem to be solved wants you to implement it. Style is the last of those, and that's really the best categorization thereof. Of course, not always do these tools solve all of your problems. And undoubtedly, if you didn't experience this, this past week already, you will get frustrated. 

You will get incredibly frustrated sometimes by some bug in your code and you might be staring at it. You might be thinking it through. You might try all of these darn tools, go to office hours or tutorials, and it's still not working out for you. Frankly, the solution there is to take a step back. And I can't emphasize enough the value of going for a jog, taking a break, doing something else, changing your mental model and coming back to it later. I have literally, and I'm sure many of the TF's and TA's have, solved code while falling asleep, because there, you're sort of thoughtfully thinking through what it is you did, what it is you're trying to do. 

But undoubtedly, it helps to talk through your problems sometimes. And there's this other term of art in computer science called rubber duck debugging. The idea being that if you don't have a TF at your side or CA at your side or roommate who has any idea what you're talking about when it comes to programming, you can have one of these little things on your desk that you can literally, probably with the door closed, start talking to, to explain to the duck, just like you would a teaching fellow, what it is you think your code is doing, walking through it line-by-line verbally, until hopefully, you have that self-induced aha moment, like oh, wait a minute, it's supposed to be 10 not 11, at which point, you discreetly put the duck back down and go about your work. 

But it is meant to be this proxy for just a very deliberate thoughtful process to which everyone is welcome. You're welcome to take a duck today on your way out and we have lots more tutorials and office hours, because this is not enough here today. This is just because it exists. But the goal with rubber duck debugging is just that additional human mechanism for solving problems by taking the emphasis off of tools and putting it really back on the human. So if a little socially awkwardly, consider deploying that tool as needed as well. 

So that's all focusing on correctness and style, and that's indeed what every problem set here on out is going to have as one component. Does it work correctly and is it well styled? But the third axis of quality, when it comes to writing software, not just for CS50 but really in general with programming in the real world, is this notion of design. And design isn't quite something that we can assess yet with software, and say you designed that well or you did not design that well, it's more of a subjective measure. And here, too, reasonable people can disagree. 

So what we'll focus on, not only today, but in the weeks to come, is also the process of writing well-designed software and making more intelligent decisions to not just get the problem solved, but to get it solved well. And this is what full-time software engineers at the Facebooks and Googles and Microsofts and others of the world do every day, especially when they have huge amounts of data and many, many users. Every design decision they make matters and might cost money or CPU cycles or memory. And indeed, think back to week zero, finding Mike Smith was possible in three different ways, but that third way, the divide and conquer, was hands down the most efficient. That was better designed than the first couple. 

So let's now consider this in the context of programming and how we can use a few new features today in C to solve problems better and to write better designed code. And we'll do that first by way of something that is called an array. So an array is something that allows us to solve a problem, in perhaps, the following way. So in our computers-- in our programs in C, we have choices of bunches of data types. We've seen that there's chars, there's ints, there's floats, there's longs, there's doubles, there's bool, there's now string, and there's actually a few others as well. And each of those, depending on the computer system you're using, does take up a specific amount of space, on CS50 IDE, on the sandbox, and most likely on your own personal Macs and PCs. 

These days, each one of these data types, if you're writing a program in C, takes up this much space, where one byte is 8 bits, 4 bytes is 32 bits, 8 bytes is 64 bits, to tie it back to week zero. So these are data types that we have at our disposal for any variables in our computer's memory. So why is that germane here? Well, this is that thing I showed a couple of weeks ago too, which is representative of RAM, random access memory. It's one of the pieces of hardware in your Mac or PC or even phone these days. 

And each of these black chips represents some number of bytes. Odds are, small although it is in reality, it might represent a billion bytes if you have one gigabyte of memory, or maybe even more than that these days. But this little black chip, inside of your Mac, PC, or phone, is where information is stored when you're running software, whether it's on a desktop, or laptop, or mobile device. And we can actually think of this chip as just being divided into a bunch of different individual bytes. In fact, let's just arbitrarily zoom in on it and sort of divide it into rows and columns, and just claim that the top left here is going to be the first byte. This is the second byte, the third byte, and way down here is like the billionth byte of memory in my computer, obviously not drawn to scale, which is to say we can just number these bytes. So one, two, three, four, five, six, seven, eight, or to be really computer science like zero, one, two, three, four, five, six, seven, and so forth. 

So we don't have to know anything about how RAM works, electrically or physically, but let's just stipulate that if you've got some amount of RAM, we can surely think of each byte as having a number. So what does that do for us? Well if you write a program that has a char in it, a character, how big was a char according to the chart a moment ago? So just one byte. So if you allocate a char, called c, or called anything in your program, you will be asking the computer to use just one of these tiny little squares physically inside of your computer's memory. 

By contrast, how about an int-- how big was an int? Four bytes. So if you want to store a number as an integer, you're actually going to consume four of these bytes in your computer's memory instead. And if you're using a double or long, you might use as many as eight of them. So what is inside each of these boxes? There's eight bits here, eight bits here, eight bits here, or maybe it's eight little transistors, or even eight little light bulbs. Whatever they are, they're some way of representing zeros and ones. And that's what each of those boxes represents. 

So what can we do with this information? Well, let's go ahead and get rid of the hardware and abstract away, so to speak, as we keep doing, and consider if we zoom in here, how the computer, last week and this week and forever from here on out, is storing the information in the programs that you write. Suppose for instance, that we've got a program like this, with just three characters in it. I'm going to go ahead and whip this up in a file called, let's say, hi dot c. And I'm going to go ahead and do include standard io dot h, int main void-- learning. 

Now in here, I'm going to go ahead and have those three lines of code. So give me one char called c1 arbitrarily and set it equal to a capital H. Give me another one called c2, set it equal to capital I. Give me a third called c3, and set that equal to the exclamation point. Now you'll notice one detail that I've not emphasized before, I don't think. What types of punctuation am I clearly using here? So single quotes or apostrophes here. Single quotes in C are necessary for chars. Chars or single characters, just one byte. 

Whenever you want to hardcode them into a program like this, like I've done here, use single quotes. Of course for strings we used double quotes. Why? Just because. Like C requires that we distinguish those two. So let me just do something a little silly here. Now that I've got three variables, let me just go ahead and print them all out. What is the format code I can print-- I can use to print a char? Yeah, a percent-- 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Percent c for char, so percent c, and I want three of them. So I'm going to print all three at once, followed by a new line. And then if I want to print c1 first, c2, c3, that's the syntax with printf for just plugging in three place holders followed by three values, respectively left to right, and hopefully it's going to print presumably hi on the screen followed by a new line. So let me save the file. Let me do make hi. OK, no errors, which is good. Let me do dot slash hi, and indeed I see hi exclamation point, however with a space in between each character. But you know what? hi exclamation point are indeed chars, but what is a char, or a character? What is an ASCII character underneath the hood? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: It's ultimately binary. Everything is binary. And what's one step in between there, in some sense? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: It's just a number, an integer. Thanks to ASCII and Unicode in week zero, there's just a mapping from characters to numbers. So how do I print numbers? What format code do I use for printf? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Percent i, for integer. So suppose I want to actually see those values? Notice what I can do. I can tell the computer, you know what? Even though c1 is a char, please go ahead and treat it as an integer. And I can literally write int in parentheses before the variable, which is what's known as casting, C-A-S-T, which is just a verb describing the act of converting one data type to another so that I can actually see those numbers. So let me go ahead and save the file. Let me go ahead now and do make hi again. That seems to work fine. Dot slash hi, and now this old familiar 72, 73, 33. 

And frankly, I don't need to be so pedantic here. Frankly, clang is smart enough to just know that if I pass it a char, but I ask it to format it as an int, it's going to implicitly, not explicitly, cast it for me. So if I go ahead and run make hi again, and do dot slash hi, I'm going to see the exact same thing. So this understanding of what's going on underneath the hood can allow me to kind of tinker now and play around with what's going on inside of my computer's memory. But let's now see this more visually. 

If this is my computer's memory really magnified, such that there's like a billion squares somewhere available to me and this is zero, this is one, this is two. Suppose I have a program with three variables-- c1, c2, and c3-- what the computer is going to do is going to put the h in one of those boxes. It's going to put the i in another box, and it's going to put the exclamation point in a third box, and somehow or other it's going to label those with the names of the variables. It's going to sort of jot down as with a virtual pencil, this is c1, this is c2, this is c3. But it's the H-I exclamation point that's actually stored at that location. 

But of course, it's not just a char. It's really technically a number. So really what's going on inside of my computer's memory is that 72, 73, and 33 is stored. But someone called out earlier it's actually binary. So what's really underneath the hood is this. Those zeros and ones are somehow implemented with transistors or light bulbs or whatever the technology is, but it's just storing a pattern of zeros and ones. And I did out the math before class. This indeed represents 72 in decimal, 73, and 33. 

But here, too, we're getting to a low-level implementation detail that we generally don't need to care about. Abstraction, per week zero, is this beautiful thing because we could just, meh, tune all that out and just think of it at any higher level that we want, whether it's decimal or whether it's actual ASCII characters. But that's all that's going on underneath the hood. Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Really good question. If you declared three variables as integers and stored 72, 73, 33 in them and tried to print them then with percent c, yes, you could coerce that behavior as well, and literally do the opposite. At that point, you need to know what the ASCII codes are-- 72, 73, 33. And mostly, programmers don't care about that. All they do is know that there is some mapping underneath the hood, but absolutely. 

Well let's consider another example now, this time involving three scores, so three integers, instead of something like three characters. What might I actually do with values like this? Well, let me go ahead and write some code, this time in a file called scores dot c. I'm going to go ahead and clean up my terminal here and create a new file called scores dot c. And let's go ahead and do a few similar lines here. Let me go ahead and include say, CS50 dot h, include standard io dot h, int main void, and now go ahead and start declaring some variables. 

Give me int score one. And I'm going to declare my score on some assignment to be 72, another score on an assignment to be about the same, 73, and another regrettable assignment to be, say, 33. So now I have three variables, all integers, and suppose I just want to do something like print the average. I can certainly do this with printf and some math. So I might go ahead and say the average is percent i, where that's going to be a placeholder, then a new line. And then the average, of course, is going to be something like score one, plus score two, plus score three, divided by three total, and then semicolon. So again, that's just the average. Add three numbers together, divide by the total number, and voila, we should get an average. 

Let me go ahead and save the file, compile this with make scores, Enter. Seems to compile OK-- dot slash scores. And I should get an average of 59 for those three quiz scores, or assignment scores, in this context. But this isn't the best design now. Now that we're dealing with numbers and scores, especially in the context of like a class where maybe you're going to have four scores or five scores or more scores, ultimately, week to week. What rubs you perhaps the wrong way about this design so far? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Say again. 

AUDIENCE: I 

DAVID MALAN: Yeah, it's very fixed. This is like writing a program at the beginning of the semester and deciding in advance there's only going to be three assignments, and if you want to have a fourth, too bad. The software does not support it. So that's not the best design. And what else might you critique about this code, simple as it is. Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, I'm potentially cheating students out of a partial score, especially if their average was like 59.5. I would like to be rounded up to 60, for instance. So we're also having some imprecision issues. And we'll come back to that as well. Any other critiques? Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, even though I typed it out manually, this is dangerously close to just copying and pasting the same code again and again and again. So just as with the hi example, as with this one, as with our cough example last week and the week before, just doing this thing again and again and again is really an opportunity for a better design. So it turns out, there is that opportunity. And in C, if you know that you want to have more than just one value, but they're all kind of related, what might be a nice name for a variable containing multiple scores? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Scores plural in English. So how can we do that? Well unfortunately, if I just say int scores, I need to decide which score it gets as a value. Now those of you who have prior programming experience, might know where we're going with this, and we're about to get there. It turns out in C, if you want to have one variable that can store multiple values, you use what's called an array. An array is a list of values that can be all the same type in a variable of the same name. So if you want three scores, each of which is an int in C, you literally use square brackets, the number of scores you want, and then a semicolon. That will say to the computer, give me enough memory for three integers. 

Down here now, I get to change my syntax. I don't want score one, score two, score three. I want to put these scores inside of the array by simply saying its name, using square brackets, albeit a little differently this time, and put them at locations one, two, three, but that's actually my first mistake. Computer scientists typically start counting at one-- no-- computer scientists typically start counting at zero, so I need to zero index my array. Arrays are zero indexed, which just means the first location is zero, the second is one, the third is two. So this now, is equivalent code to giving me three variables, but now I've gotten rid of the messiness that you identified by copying and pasting the name again and again, and I can store them all together. 

AUDIENCE: On the scores, the number three stands for three variables, right? It doesn't stand for four? 

DAVID MALAN: Does the three stand for three variables? It stands for enough space for three values in one variable. Good question. Others, questions? Yeah? 

AUDIENCE: [INAUDIBLE] bringing equals and then [INAUDIBLE] 

DAVID MALAN: Really good question. Can you do this all in one line? Yes, but let me just tease you by saying something like this involving curly braces, but we won't go there today. But yes, there are ways to get around this. So let me go ahead and fix this now. If I want to compute the average now, I need to add these three values in this array, scores zero, scores one, and scores two. But arithmetically, the answer-- the code is still the same, so if I now make scores and do dot slash scores, my average is still 59. And I do disclaim, there's still probably a mathematical bug because we're using integers, as was noted, but we'll come back to that in just a little bit. 

So let's push a little harder. Even if you've never programmed before, what might still be a little bad about the design. The program works, but we can do it better. 

AUDIENCE: Still only stores three. 

DAVID MALAN: Still only stores three. So we haven't even solved the very first problem. Other critiques? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: I have too much code in the last line. Yeah, it's getting a little wordy, so it's going to be a little harder to read-- quite fair. Yeah? 

AUDIENCE: I 

DAVID MALAN: Sorry, say it a little louder. 

AUDIENCE: The scores are hardcoded into the program. 

DAVID MALAN: Yeah, the scores are hardcoded into the program, which means it doesn't matter what you get on your assignments, we're all getting 59's. So that's another problem as well. And any other critiques? Yeah? 

AUDIENCE: If it could read the input data, it might be better. 

DAVID MALAN: If it could read input data-- yeah, so let me combine those suggestions. It'd be great if, eventually, this program is dynamic. And anything else? Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Definitely. We can pull a loop into the situation and actually get multiple values from the user. 

AUDIENCE: Always dividing by three, so [INAUDIBLE] 

DAVID MALAN: Yeah, it's also always dividing by three. And this is subtle, and it's not a huge problem yet, but there is this principle I'm kind of violating here known as don't repeat yourself. And I have repeated myself in at least two locations. What values appear in two locations? So three up here, and then also three down here. And minor though this detail seems, this is the source of so many common bugs because if you just kind of decide by yourself, well, I'm going to hard code three up here, I'm going to hard code three down here, odds are, tomorrow morning, next week, next month, next year, let alone a colleague of yours, is never going to notice the subtlety that this three just by social contract has to be the same as this three. 

That is not a code constraint. That's just sort of a little thing you knew and decided at the time. So let me fix this in the following way. It turns out that in C we can have variables that just have numbers like this, so maybe int n gets three. I can now just use my variable here and here. That's a little better. It's a little better. But there's this other feature in C, as with other languages too, where if you know you want to hard code some value, at least for now, but you don't want it to change, you will not change it and you want to make sure you don't accidentally change it, you can actually do something like this and even make it global if we want, at the top of the file, I can say not just int n, but const int n, and just because of human convention, I'm also going to now capitalize the variable, just because. 

And now I'm going to change this n to capital, this n to capital. The reason being, I have just created for myself what's called a constant. A constant is exactly what the word implies. Even though you just say const, and then the type of the variable, the compiler, clang, will make sure that neither you nor some friend or colleague accidentally changes the value of N. So now you can use N here, here, and any number of other places. It will always be the same. And what I'm using at the moment is what's called a global variable. Global variables are often frowned upon: even though you can put variables outside of your functions, as we may eventually see, it tends to be sloppy, except with constants. 

When a constant is a value that you want to set and then forget about, if you come back to this program weeks or months later, and you're like oh, this semester we have four assignments, or five, it's just handy to put the values you might want to change before recompiling your code at the very top so you don't have to go fishing for them visually lower in your code. So just a convention. It goes at the top of the file, quite often, and you declare it as const, and you capitalize it, and then you can use that value, N, throughout the code. 

But now let's tie together those other suggestions and make this program even better, such that it's not just hard coding this one value, n, everywhere. Let me go ahead and get rid of this. Let me go ahead now and take your suggestion that we do this dynamically, and we can use arrays for this too. If I know in advance that I want to ask the user for how many assignments there are this semester, well I can do something like this. Int n gets get int, and I'll say number of scores, and then prompt them for their input. And then what I'm going to do after that is give myself an array called scores of size n as step two. 

And then what I might do is something like this. For int i gets zero, i less than n, i plus plus, which even though I'm typing it fast, is exactly the same paradigm we've used before, for for loops. And here, I could do something like scores bracket i gets get int score semicolon, prompting the user again and again and again in a loop for the i-th score, so to speak. And because I start counting at zero, and on up to, but not through n, I will end up filling this with exactly as many scores as the human requested. 

Let's go ahead now and leave this as a to do for a moment. Let me just, because the math's about to change-- let me go ahead and delete that and we'll just not do the average yet just so I can compile this first. I'm going to go ahead and make scores again-- seems to compile. Dot slash scores, number of scores-- let's do three, so 72, 73, 33, Enter, and my average is still to do. So we'll come back to that. But you know what? It would be nice to make this a little prettier. Why don't I tell the human what score I want from them, so I can say, give me score number such and such, i. So let me just use get int, like this. 

Now let me go ahead and make scores, dot slash scores. Give me three scores again. Score zero, 72, 73, 33. Now this is kind of stupid, right? At least for normal people who might use my program, what is score zero? What is score one? We can fix this for normal people, and just do that. We're not changing where we're putting the value, but we can certainly change the aesthetics of what we're doing. So let's remake scores. Dot slash scores, and now it's more human friendly-- 72, 73, 33. 

So one piece remains. How do I now compute the average in a way that's dynamic and I'm not hard coding score one, score two, score three again, or even the array version? And you know what? This is a nice opportunity to maybe come up with a helper function that also solves the int issue from before. So let me go ahead and say, you know what? The average could perhaps have a fraction. So what data type do I want to use if my average might have a fraction? So a double or float. So we'll go with either. I'll keep it simple because the scores aren't going to be crazy big or precise. 

I'm going to create a function called average. And if I want to average all of the numbers that the human has typed in, turns out I need to know two things. I need to know the length of the array that they've been accumulating and I need to have the array itself, so I'm going to denote it with these square brackets here. I don't have to know, at this point, how big it is. The compiler will figure that out for me. But I can now declare a function like this. Well how do you go about averaging some number of values, if you're handed them in a list, otherwise known as an array, but I'm telling you the length of that list, what's this sort of intuition for taking an average here? Yeah? 

AUDIENCE: You could take the sum and then divide it by [INAUDIBLE] number. 

DAVID MALAN: Yeah. Yeah, the average of a bunch of numbers is just add all the numbers together and then divide by the total number of numbers. And I have all of those ingredients. I have the length of the array, apparently, and I have the array of numbers itself, as follows. So let me go ahead and say something like sum is zero, because I'm just going to start counting from zero, and then I'm going to do for int i gets zero, i less than length, i plus plus. So again, I typed it fast, but it's identical to my for loop from before. I'm just using the length as the condition. And now what do I want to do here? On each iteration, what do I want to add to the sum? Sum equals sum plus what? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: The next item in the array. And I can express that, it turns out, just like before the name of the array, which happens to be literally array, just for convenience. And then how do I get the appropriate value from it? Bracket i, because i is going to start in this loop at zero, going to go up to, but not through its length. So this is just a way of getting bracket zero, bracket one, bracket two, and just adding it to sum on each iteration. Now this is unnecessarily wordy. Recall, that this is shorthand notation for that. I can't just use plus, plus here though, because I want to add the actual scores not just one. So I can use either this syntax or the more verbose syntax, but I'll go with this one. 

And now at the end of this function, notice I have to make a decision. And we haven't seen terribly many functions of our own, but if this is what my function looks like, its name is average, it takes two inputs, one of which is an int called length, the other of which is an array of integers, and I know it's an array not by its name, which I could have called anything, but I know it because of these new square brackets today. However, what does this mention of float mean on the left-hand side of line 18? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: That's what it returns. The return value of a function is what it hands back to whoever is using it. So get string, returns a string. Get int, returns an int. Average I want to return a float. And so how do I return this value? Well, let me go ahead and return the sum divided by the length, as I think you proposed? Now there's actually one bug here, but we'll come back to that in a moment. Now let me just go ahead and plug in the average. What's the format code for a floating point value? Percent f, yeah. And then if I want to plug in the average, I can call my function called average. 

And what two inputs do I need to give it? n, which is the length of the array, and scores, which is the name of the array. So again, even though arrays are new, this is not. We have last week called functions that take one or more arguments and it's certainly fine to nest them. However, if you don't like that, you can certainly do something like this-- float average gets that, and then you can plug in average. But again, in the spirit of good design, you're just doubling the number of lines unnecessarily. 

So I'm going to go ahead and nest it just like this. All right, let me save that. And I feel really good about this so far. I feel like everything's making sense. So make scores. And oh, my god. Line 15 seems to be at fault. So we can certainly use help50, but let's see if we can't reason through. What mistake have I made? It's highlighted here, even though it's very non-obvious. Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Exactly. My function is at the bottom of my file and C is kind of dumb. It only does what it's told, top to bottom, left to right. And if your function average is at the bottom, but you're trying to use it in main, that's too late. So we can fix this in a couple of ways, just as we did last week. I can kind of sloppily just say, all right, well let's just move it to the top. That will solve that problem. But frankly, that moves main farther down and it's a good human convention to keep main at the top so you can see the main part of your program. 

This is why, last week, we introduced the notion of a prototype, where you literally-- and this is the only time where the copy-paste is OK-- you copy-paste the first line of your function and end it with a semicolon without any more curly braces. That's now a clue to solve that problem. Hey clang, here's a function. I'm not going to get around to implementing it yet, but you at least know what it's called. Now there's still a slight logical bug in here. Let me try re-saving and recompiling scores. It compiled this time-- nice. 
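The prototype idea can be sketched as follows. This is a minimal reconstruction rather than the exact file from lecture: the name `average` and its parameters `length` and `array` come from the lecture, while `average_of_three` is a hypothetical caller standing in for `main`. The body is the first version, so it still carries the "slight logical bug" just mentioned.

```c
// Prototype: the function's first line, ended with a semicolon.
// It tells the compiler that average exists before any code calls it.
float average(int length, int array[]);

// Hypothetical caller standing in for the lecture's main function.
float average_of_three(void)
{
    int scores[3] = {72, 73, 33};
    return average(3, scores);
}

// The definition can now live below its caller.
float average(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    // int divided by int: the version that still has the logical bug.
    return sum / length;
}
```

Without the prototype line, a compiler reading top to bottom would reach the call to `average` before knowing what it is.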

Let me go ahead and run scores. Number of scores will be three, 72, 73, 33. OK, that's pretty good. Let me try another one. How about two scores. 100 and suppose you get a 99 on the other, you probably want your grade to be what? 100, right. If it's 99.5, you'd prefer we round up. So where is that bug? Well let me scroll down here, and this is what you were alluding to earlier when you identified this early on. So I'm doing a couple of things incorrectly here. 

One, I'm adding the sum here. I'm using an int and initializing sum to zero, and then I'm dividing an integer by an integer. And this is subtle, but in C, if you divide an integer by an integer, just take a guess-- what do you get as the answer? 

AUDIENCE: An integer. 

DAVID MALAN: An integer. Integers can't store decimal points. So even if your score is 99.900000 ad nauseam, what's going to get thrown away is literally everything after the decimal point. So your grade is actually a 99. So there's a couple of ways we can fix this, but perhaps the simplest is this. I can use that casting feature from before. I can tell the computer, don't treat length as an int, actually treat it as a float, and you know, just for good measure, also treat sum as a float. And there's different ways to do this, but now, I'm telling the computer divide a float by a float, which will allow me to return a float, and let's see what happens now. 
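A small sketch of the difference the casts make, using the lecture's two scores (100 and 99, so a sum of 199 and a length of 2). The helper names `divide_ints` and `divide_with_casts` are hypothetical and exist only to isolate the division.

```c
// Hypothetical helper: both operands are ints, so the fraction is discarded
// before the result is converted to float for the return value.
float divide_ints(int sum, int length)
{
    return sum / length;
}

// Hypothetical helper: the casts turn this into a float-by-float division.
float divide_with_casts(int sum, int length)
{
    return (float) sum / (float) length;
}
```

With a sum of 199 and a length of 2, the first helper yields 99 and the second yields 99.5.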

Let me save that. Make scores. It compiled. Dot slash scores. Number of scores is two. 100 is the first. 99 is the second. Nice, now I've gotten the grade I deserved. Heck, we could even bring in the round function if we want, which you might have used for p-set one, but we'll leave it as this. But I am going to go ahead and just do a 0.1 there. Recall that with format codes you can really start to get precise and say only show me one digit. So if I recompile this now, make scores, and do dot slash scores-- two scores-- 100, 99. There's my 99.5%. Any questions then on these arrays and the use thereof? Yeah? 

AUDIENCE: [INAUDIBLE] the average [INAUDIBLE] income scores by [INAUDIBLE] 

DAVID MALAN: Explain the average-- this part here? 

AUDIENCE: Yeah. 

DAVID MALAN: Sure, can I explain this? So, let me just show more of the code. The last line of this program's purpose in life is just to print the average of all of my scores. And I decided, partly for design purposes, but also today to illustrate a point, to relegate the computation of an average to a custom function. This is handy, because now if I ever work on another problem that needs to average, I've got a function I can use in that code too. But in this case, average takes two arguments, apparently the length of the array and the array itself, but I could call these two things anything I want-- x and y, length and array, anything else, but I chose this for clarity. 

But up here, I want to use that function. So just like in Scratch, recall that you can nest blocks and you can join something and then say it. So can we call the average function, passing in the length of the array and the array itself, that gives me back my average 99.5, and then I'm plugging that in to this format code in printf. So just like in math, when you have lots of parentheses, work from the inside out. Look at the innermost parentheses, figure out what that is, then work your way outward. And if you've programmed in Java, or Python, or other languages, you might be wondering why we need to tell the function the length of an array. 

In C, the arrays do not remember their own length. So if you have programmed before, this is necessary. You do not get that feature for free in C. Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Correct, if you do percent 0.1 you get one decimal point, so 99.5%. 
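The format code spoken here as "percent 0.1" is written in code as `%.1f`. The sketch below uses `snprintf`, which takes the same format codes as `printf` but writes into a buffer, so the result can be inspected. The helper name `format_one_decimal` is an assumption for illustration.

```c
#include <stdio.h>

// Hypothetical helper: writes value into buffer with one digit after the
// decimal point, using the same format code printf would take.
void format_one_decimal(float value, char buffer[], int size)
{
    snprintf(buffer, size, "%.1f", value);
}
```

Passing 99.5 produces the text `99.5`, matching what the lecture's program prints.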

AUDIENCE: Suppose that the answer was 99.49 [INAUDIBLE] 

DAVID MALAN: Really good question. If the answer is mathematically 99.49, but you do 0.1 here, it will round up for you. It will-- good question as well. Yeah? 

AUDIENCE: What happens [INAUDIBLE]? 

DAVID MALAN: Really good question. What happens if you divide an int by a float or something else? You will typically up cast it to whatever the more powerful type is. So if you divide an int by a float, you will actually get back a float. So strictly speaking, I did not need to cast both the numerator and the denominator to a float. I just did it for consistency and demonstration's sake. 
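As a quick illustration of that answer, here only the denominator is a float and the int numerator is converted automatically. The helper name `int_divided_by_float` is hypothetical.

```c
// Hypothetical helper: the int numerator is converted to float before the
// division, so no explicit cast is needed to keep the fraction.
float int_divided_by_float(int numerator, float denominator)
{
    return numerator / denominator;
}
```

Dividing 199 by 2.0 this way gives 99.5, the same result as casting both sides.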

So it turns out, while we've been looking at numbers here alone and scores, it turns out that there's actually an intricate relationship with all of the h's and the i's and the exclamation points we've been looking at, and all of the strings we've been typing in too, however this was a mouthful, and frankly I feel like a brownie as well, so why don't we take our five minute break here and we'll come back. 

We are back. So thus far, we've introduced arrays as an opportunity to improve the design of our code. So we're going to hear a lot of squeaking now, I think. So thus far, we've introduced arrays as the-- we're going to do my best to keep a straight face. Thus far, we have introduced arrays as a solution to a design problem so that we can actually store multiple values, but in the guise of one variable so as to avoid the copy-paste tendency that we might otherwise have. And those arrays ultimately started from trying to clean this kind of code up. 

But what is it that was ultimately going on inside of the computer's memory we can still consider, because it's actually not all that different. However, when we have three integers, score one, score two, score three, how many bytes is each of those-- it's going to take up? So four, if you think back to the chart from before, char is one, an int is four, at least on most systems, and so the number 72 in the variable called score one, we can draw in our computer's memory as taking up four of these boxes. Because again, each box represents one byte, therefore four bytes requires four boxes. 

Score two and score three would similarly be laid out in my computer's memory. If I had three variables, score one, two, and three, as follows, like this. Of course what's underneath the hood is actually bits, but again, we don't need to worry about that level of abstraction anymore. But that's indeed all that's going on there. But we can clean this up. We can instead get rid of this copy-paste approach to variable names and just introduce an array called scores, plural, and then initialize those three values, as in the program I wrote here. 

And then, this picture is similar in spirit, but the names of these boxes, so to speak, become scores zero, scores one, and scores two. So the array is now independent of the number of bytes being consumed. Just because an int is four bytes, doesn't mean you do scores zero, scores four, scores eight, and so forth. It's still zero, one, two. The computer will figure out exactly how much space to give each of those values based on its type, which is an int. 
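A minimal sketch of that point: the indices count elements, not bytes. The helper names are hypothetical, and the total size is written in terms of `sizeof(int)` because an int is four bytes on most systems but not necessarily all.

```c
#include <stddef.h>

// Hypothetical helper: indexes the lecture's three scores with 0, 1, 2,
// regardless of how many bytes each int occupies.
int score_at(int index)
{
    int scores[3] = {72, 73, 33};
    return scores[index];
}

// Hypothetical helper: the whole array occupies three ints' worth of bytes.
size_t bytes_in_scores(void)
{
    int scores[3] = {72, 73, 33};
    return sizeof(scores);
}
```

On a system where an int is four bytes, `bytes_in_scores()` would be 12 even though the last valid index is 2.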

But it turns out that there's actually a relationship now to where we began this story when we looked at characters. H-I exclamation point was implemented with three lines of code using c1, c2, and c3. But last week, we already saw the notion of a string, and it turns out strings and chars are fundamentally interrelated in ways that we can now literally see. If we had a string called s, for instance, and that string contains three characters, H-I and an exclamation point, well it turns out you can actually get at the individual letters in a string by doing the name of the string, bracket, zero, close bracket, or s bracket one, or s bracket two. 

If the name of my variable is s, and s is a string, I can actually access the individual characters there in just like an array, which is to say then, what is a string as of this week versus last? It's just an array of chars. It's just an array of characters. So even though it's a data type, thanks to CS50's library and CS50 dot h, and we're going to take this training wheel off within a few weeks, we've essentially just created a string to be for now, at this point in the story, just an array of characters. Why? Because being able to have multiple characters is certainly way more useful than having to spell things out one variable at a time with one char at a time. 

So string is a data type in the CS50 library that is, for today's purposes, indeed just an array of characters. And we'll see before long that, that too is actually kind of a bit of a white lie, but we'll see why before long as well. So if I declare a string in C, I can actually literally do something like this. String s equals quote unquote hi, this time using double quotes, and not single quotes, because it's three characters and not just a single char. So in memory, that's actually going to look pretty much the same. If the variable's called s, it's going to have h i and an exclamation point. And just for simplicity, I'll label the first box as s and just assume that we can get everywhere else. 

But it turns out that strings are a little special, because unlike a char, which is one byte, unlike an int, which is four bytes, unlike a long, which is eight bytes, how long should a string be? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, I mean as many characters as you need, because if I want to store H-I I need-- H-I exclamation point, I need strings to be at least three bytes, it would seem-- for my name David, at least five bytes, for D-A-V-I-D-- Brian, as well, and much longer names in the room, too. So strings can't really have a preordained length associated with them, which is why I put a question mark on the board before when I first summarized the sizes of these types. But the catch is that if a variable only has a name, like s, or name, or any of the variables you use for p-set one's problems, it turns out we all need to decide as human programmers how do we know where the string ends? 

The name of the variable, suffice it to say, lets us know where the variable begins, just as I've drawn here. If you reference a variable in a program and call it s, the computer will just know to go to the first character in that string. But there needs to be a little clue to the computer as to where the string ends, and that clue is what's called a null character. It's a little funky to look at, but it's just a backslash zero, which might remind you of backslash n, which too is a little funky, and that's a special symbol that says move the cursor to the next line, give a new line. Backslash zero is the so-called null character or the null terminating character. And all that is special syntax for eight zero bits. 

So each of these boxes represents eight bits. This is number 72. This is the number 73. This is the number 33. This backslash zero is just the way of drawing all eight bits as zeros. So that's what a computer uses in C to demarcate the end of a string. It just wastes one byte as all zero bits. And I say waste, because you know what? How much space does H-I exclamation point actually take up accordingly? How many bytes do you need to store hi? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Three, well, four, because you need to know where the string ends, otherwise you won't be able to distinguish the beginnings of other variables, potentially, in your computer's memory. And we'll see this in just a moment. So if my string is called s, it turns out that at s bracket zero is the first character. S bracket one is the second character. s bracket two is the third. And that null character, so to speak, the invisible backslash zero or eight zero bits happens to be at the end. So a string that's of length three, actually takes up four bytes. Any string you have typed into a computer yet, whether it's hi, or David, or Brian, or Emma, or Rodrigo, takes up as many characters as are in those names, plus one byte for this special null terminating character. 
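The three-versus-four distinction can be checked directly. This sketch uses a plain `char` array rather than CS50's `string` type (an assumption made so that `sizeof` reports the array's full size), and the helper names are hypothetical.

```c
#include <string.h>

// Hypothetical helper: strlen counts characters up to, but not including,
// the null terminating character.
size_t characters_in_hi(void)
{
    return strlen("HI!");
}

// Hypothetical helper: the array itself also holds the trailing '\0'.
size_t bytes_for_hi(void)
{
    char s[] = "HI!";
    return sizeof(s);
}
```

The first helper reports a length of 3, while the second reports 4 bytes: `H`, `I`, `!`, and the terminating `\0`.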

So let's see that. If we were to write a program using these four names, let me go ahead and whip that up really quickly here. I'm going to create a file called names dot c, and I'm going to go ahead and do include standard io dot h. Then I'm going to go ahead and do int main void. Inside of here, I'm going to give myself four strings, using my new array syntax, as before. So I could call this name one, name two, name three, name four, but I'm not going to repeat that bad habit. I'm going to give myself a name-- a variable called names, plural, and store four strings in it, as follows. 

Let's give Emma the first spot there. Let's give Rodrigo the second spot there. I'm using all caps just because we've seen some of those ASCII codes before, but I could use lowercase as well. Let's add Brian. And then I'll go ahead and add myself lastly. So the array is of size four, but I count from zero on up through three. And now just for demonstration's sake, let's go ahead and print out, say, Emma's name. So if I want to print out Emma's name, the type of variable in which she is stored, is what? What is the type that I want to print? String. So that's percent s, just like last week. And I'm going to go ahead and put a backslash n. And if I want to print Emma's name, what do I type here to plug into that placeholder? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Names brackets zero. It's a little bad that I'm hard coding it here, but again, I'm just demonstrating how this all works for now. Let me go ahead and save that. Let me do make names. Bit of an error here. What did I do wrong? Oh my god, all of this is wrong. Does anyone see it yet? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, I forgot the CS50 library. So even though I'm not using get string, I am using string, so I do need the CS50 library up here. So let me go ahead and clear that. Make names. OK better. Dot slash names, and I should just see Emma's name. But watch this, what I can do too. I know that Emma's name is a string, and I now know that a string is an array of characters, so I can also do this. Let me go ahead and print out one, two, three, four characters, and then a new line. And the characters I'm going to print out are going to be Emma's name's first character, Emma's name's second character, Emma's name's third character, and Emma's name's fourth character. 

So you can have what's essentially a two-dimensional array, where you have two sets of square brackets. The first one indexes me into the array of names. And to index into an array means go to a certain location in an array. So names, bracket zero, so to speak. This part here means go get Emma's name from the array of four names. This square bracket after says within that string, treat it as an array of characters and get the zeroth character, the first character, which is hopefully e and an m and an m and then a. 
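The two sets of square brackets might look like this in code. The `typedef` is a stand-in for the `string` type that `cs50.h` provides, included here as an assumption so the sketch compiles without the CS50 library; `letter_at` is a hypothetical helper.

```c
// Stand-in for the string type from cs50.h, so this compiles on its own.
typedef char *string;

// Hypothetical helper: the first index picks a name, the second picks a
// character within that name.
char letter_at(int name_index, int letter_index)
{
    string names[4] = {"EMMA", "RODRIGO", "BRIAN", "DAVID"};
    return names[name_index][letter_index];
}
```

So `letter_at(0, 0)` is `E`, `letter_at(0, 3)` is `A`, and `letter_at(1, 0)` is the `R` in Rodrigo.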

So I'm going to go ahead and save this file now. Make names again. It compiled, dot slash names, and voila, Emma, Emma, I see twice. Now, I'm never again going to print any string like this. This is just ridiculous, plus I had to know in advance how long her name is. However, it is equivalent to printing the string itself. It's just C and printf knows when you use percent s and you pass on the name of a variable, all printf is probably doing under the hood is some kind of loop and it's iterating over your string from the first character and it's checking, is this the null character? If not, print it. Is this the null character? If not, print it. If this is the null character-- is this the null character? If not, print it. And that's how we get, E-M-M-A stop, because printf, in this line 12, presumably noticed, oh, wait a minute, the fifth byte in Emma's names zero array is backslash zero, or all eight bits as zero. Yeah? 

AUDIENCE: That's just part of [INAUDIBLE] 

DAVID MALAN: That is all part of the underneath the hood stuff of printf and it's what humans decided decades ago with C how strings would work. They could have come up with a different system, but this is the system that they decided to use. Other questions? Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: I didn't go further. So I deliberately did not touch bracket four, even though it's there. But I can try to print this. Let's see. So let me go ahead and change this program real quick. I'm going to go ahead and print out percent c a fifth time. And let's go ahead and see if we can see Emma's null terminating character at location four, which is her fifth location, so after the E-M-M-A. Let me save that. Make names, dot slash names, Emma Emma. So I don't see it there. But you know what? Let me try changing this last one just for kicks to percent i. 

And again, this is where printf is your friend. You can use it powerfully to see what's going on. Or we could whip out debug50. Let me go ahead and make names, dot slash names. And voila, there's the zero. I'm printing it literally as an int just to see it. I would never do this in the real world. But it's indeed there. And now, this doesn't often work, but just for kicks-- I'm getting a little crazy-- suppose that I want to look well past Emma's name to like location 400, like let's start poking around in the computer's memory, one of those other boxes. Make names, dot slash names. OK, there's a negative three down there as well, or technically a hyphen and then a three. 
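A sketch of the experiment with percent i, formatting into a buffer with `snprintf` so the result can be examined. The `typedef` stands in for CS50's `string` type and `emma_with_terminator` is a hypothetical name. The peek at location 400 is deliberately left out, because reading that far past the end of a string has no defined result in C.

```c
#include <stdio.h>

// Stand-in for the string type from cs50.h, so this compiles on its own.
typedef char *string;

// Hypothetical helper: formats Emma's four letters with %c and the byte
// after them with %i, so the null terminator shows up as a number.
void emma_with_terminator(char buffer[], int size)
{
    string names[4] = {"EMMA", "RODRIGO", "BRIAN", "DAVID"};
    snprintf(buffer, size, "%c%c%c%c%i",
             names[0][0], names[0][1], names[0][2], names[0][3],
             names[0][4]);
}
```

The buffer ends up holding `EMMA0`: the four letters followed by the digit zero that represents the terminating byte.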

So we'll come back to this in a couple of weeks' time. We can actually start hacking around and looking around my computer's memory at any location, because it's just numbers of boxes on the screen. Yeah? 

AUDIENCE: Is there any limit to the length of the string? 

DAVID MALAN: Is there any limit to the length of the string? Short answer-- yes, the amount of memory that the computer has. So like 2 billion, 4 billion-- it's long. 

AUDIENCE: What happens if you try to type in [INAUDIBLE] 

DAVID MALAN: Really good question. What happens if you try to type that in hypothetically? It depends on the function you use. Let me come back to that in like two weeks' time. Get string will not crash. Other C functions will crash, if you give them more input than they expect, and we'll come back to the reasons why. 

So what's actually going on underneath this hood, then, if we have these four names-- Emma, Rodrigo, Brian, and David. Well, if we consider our memory again, we know that Emma's up at this first location, E-M-M-A, followed by this null terminating character. But if the second name we stored in a variable was Rodrigo, turns out he's going to end up sort of back to back with that memory as well. And again, it's wrapping only because this is an artist's rendition of what memory looks like. There's no notion of left, right, up, or down in RAM. But he is R-O-D-R-I-G-O, and his null terminating character there. Brian might end up there. I might end up after it. And this is what's really going on underneath the hood of your computer. 

Each of these values isn't technically a character. It's technically a number. And frankly, it's not even a number. It's eight bits at a time. But again, we don't have to worry about that level of detail now that we're operating at this level of abstraction. And I put up the wrong code a moment ago. This is the code that I actually implemented using an array from the get go, as opposed to an actual-- as opposed to four separate variables. So just to highlight, then, what's going on, per the example I just did with printing out Emma's characters, if this is a variable called names, and there's four names in it, zero, one, two, three, you can think of every character as being kind of addressable using square bracket notation. 

The first set of square brackets picks the name in question. The second set of square brackets picks the character within the name. So e is the first character, so that's zero. m is the next one, so that's one. m is the third, so that's two. a is the fourth, and so that's three. And then with Rodrigo, he's at names one, and his r is in brackets zero. So again, we're really getting into the weeds. And this is not what programming ultimately is, but this is just to say, there's no magic when you use printf and get string and get int, and so forth. All that's going on underneath the hood is manipulation of values like these. 

So let's now see what a string really is and we'll ultimately conclude today with some domain specific problems. Indeed with problem set two will you be exploring a number of real-world problems, like assessing just how readable some text is, what grade level might a certain book or another be, and two, implementing some notion of cryptography, the art of scrambling information. And suffice it to say, in both of those domains, reading texts and also cryptography, strings are going to be the ingredient that we need. So let's take a look now at a few examples involving more and more strings. 

I'm going to go ahead and create a program here called string dot c, just so I can play with this notion. I'm going to go ahead and include CS50 dot h. I'm going to go ahead and include standard io dot h. I'll fix this up here-- int main void. And now let me go ahead and just play around with some strings for a moment. Let me go ahead and get myself a string from the user. So get string and ask for their input. Trying to type too fast now. 

So let me go ahead and ask the user for their input via get string, and store the answer in a variable called s. Then let me go ahead and preemptively say that their output is going to be the following. And what I want to do is just print out the individual characters in that string. So for int i gets zero, I don't know what my condition is yet, so I'll come back to that-- i plus plus. I'm going to go ahead and print out the individual character at the i-th location in that string, and I'm going to end this whole program with a new line. 

So I still have a blank to fill in, these question marks, but I ultimately just want to take as input a string, and then print it out as output, but not using percent s. I'm going to use percent c, one character at a time. So my question mark here is what question could I ask on every iteration before deciding whether or not I've printed every character in the string? Yeah? 

AUDIENCE: Length of the string. 

DAVID MALAN: Length of string. So I could say while i is less than the length of string. What else? 

AUDIENCE: The null character. 

DAVID MALAN: Or if it's equal to the null character. Let's try both of these. So if I know how strings are represented, I can just say while s bracket i does not equal backslash zero. Now this is a bit of a funky syntax, because even though it's two characters, I still have to use single quotes, because those two characters, just like backslash n, represent one idea, not two literal characters. But this is a literal translation of what we just discussed. Initialize i to zero, increment it on every iteration, but every time you do that check does the i-th character in the string equal the special null character, and if so, that's it for the loop. We only want to iterate through this for loop so long as it's not that special backslash zero. 
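The loop just described might be sketched like this. The input is passed in as a parameter instead of coming from get string, the `typedef` stands in for CS50's `string` type, and `print_until_null` (with its return value, added so the behavior can be checked) is a hypothetical name.

```c
#include <stdio.h>

// Stand-in for the string type from cs50.h, so this compiles on its own.
typedef char *string;

// Hypothetical helper: prints s one character at a time with %c, stopping
// at the null character, and reports how many characters were printed.
int print_until_null(string s)
{
    int printed = 0;
    for (int i = 0; s[i] != '\0'; i++)
    {
        printf("%c", s[i]);
        printed++;
    }
    printf("\n");
    return printed;
}
```

Calling `print_until_null("EMMA")` prints the name on its own line and returns 4; an empty string prints only the new line.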

So if I go ahead now and save this file and make string and run dot slash string and my input for instance is Emma, Enter, I'm going to see literally her name back. So this is kind of my way of reimplementing the idea of percent s, but using only percent c. But I liked your suggestion. Why don't we use the string-- the length of the string, rather than this low-level implementation detail? It would be really nice if I could just say while i is less than the length of s-- so how do I express this? Well, it turns out there's another file called string dot h inside of which are a bunch of string-related functions that I might like to use. 

One of those is a function called strlen, for short, which means the length of a string. So I can take your suggestion and just say, I don't care how a string is implemented. I mean, my god, the whole point of programming ultimately is to abstract away those lower level implementation details. Let me just ask the computer what is your length, so that I don't count past it. Let me go ahead now and make string, dot slash string. Let's type in Emma again. And the output is the same. 

But now, this is correct perhaps, but I argue it's not very well-designed. I'm being a little inefficient and I bet I can do this better. What do you see? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Go ahead. 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, exactly. Remember in a for loop that the condition in the middle, in between the semicolons, is a question, a Boolean expression, that you ask again and again and again. And it turns out that calling a function is not without cost. It might take a split second, because computers are super fast, but why are you asking the same question again and again and again and again. The answer is never going to change, because Emma's name is not growing or shrinking, it's just Emma. So I can solve this in a couple of ways. 

I could do something like this. int n gets strlen of s, and then I could just plug in n. My program is just as correct, but it's a little better designed now because I'm asking the question of string length once, remembering the answer, and then using that answer again and again. Now, yes, technically, now I'm wasting some space, because I now have another variable called n. So something's gotta give. I'm going to use more space or maybe more time, but that's a theme we'll come back to next week especially. 

But it turns out there's some special syntax for this, too. If you know in a loop that you want to ask a question once and remember the answer, you can actually just say this and do this all in one line. It's no better or worse, it's just a little more succinct, stylistically. This has the same effect of initializing i to zero, and n to the length of string, and then never again asking that question. So I can save this. I can make string. I can then do dot slash string, and I'm going to see hopefully, Emma, Emma again. So a third and final version of this idea, but a little better designed. Yeah? 
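A sketch of that one-line form, in which `i` and `n` are both initialized in the first part of the for loop so that `strlen` is called only once. As before, the `typedef` stands in for CS50's `string` type, and `print_each_character` and its return value are assumptions added for illustration.

```c
#include <stdio.h>
#include <string.h>

// Stand-in for the string type from cs50.h, so this compiles on its own.
typedef char *string;

// Hypothetical helper: asks for the length once, remembers it in n, and
// then compares i against n on every iteration.
int print_each_character(string s)
{
    int printed = 0;
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", s[i]);
        printed++;
    }
    printf("\n");
    return printed;
}
```

Writing `i < strlen(s)` as the condition would produce the same output, but would ask for the length again on every pass through the loop.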

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: In this case, it's OK. This would be a common convention. When you are doing something especially to minimize the number of questions you're asking, this is OK, so long as it's still pretty tight. But there, too, reasonable people might disagree. Yeah? 

AUDIENCE: Is the prototype string in library [INAUDIBLE]? 

DAVID MALAN: Really good question. The prototype for strlen, its declaration, is in string dot h. I would get one of those cryptic error messages if I forgot to include string dot h, because clang would not know that strlen actually exists. Let me try another example here and see what kind of power we have now that we actually are controlling-- now that we actually understand what a string actually is. Let me go ahead and whip this up real fast. So up here in my program, called uppercase dot c, let me give myself the CS50 library. Let me give myself standard io dot h. And now let me give me string dot h, just so I can use strlen. Let me give myself the name of a function main. 

And then in here, let's do the same thing. String s gets get string. But this time, let me just ask the human for the string before I'm going to do something to it. Then I'm going to go ahead and say after I want the following to happen. And I'm going to do this-- for int i gets zero, n equals strlen of s as before. Do this so long as i is less than n, and on each iteration, i plus plus. So copy-paste from before. I just retyped out the same thing. Now let me go ahead and in this for loop, let me change this string, whatever it is, all to uppercase. 

So how might I do this? So let me go ahead and say, well, if the current character at s bracket i is greater than or equal to lower case a, and that same character is less than or equal to lowercase z. So I'm using some week one style stuff, even though we didn't really use this much syntax last week. I'm just asking a simple question. Is the i-th character in s greater than or equal to lowercase a and-- double ampersand means and-- logically, is that character less than or equal to z? So is it a, b, c, all the way through z-- is it a lowercase letter? If so, I want to do something like convert to uppercase. But we'll come back to that in just a moment. Else what do I want to do if the character is not lowercase and my goal is to uppercase the whole input? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, just leave it alone. So you know what? I'm just-- fine, I'm just going to leave it alone. I'm going to print it back out, just as I would with printf like that. So now even though this is not obvious from the get go how I'm going to solve this, I've now left myself a placeholder, pseudocode if you will. I just now need to answer this question. Well, it turns out a popular place to go for this answer would be AsciiChart.com. And there's different ways to solve this, but this is just a free website that shows us all of the decimal numbers that correspond to letters. And recall from week zero, 65 is a, 66 is b, and so forth. 

Notice that 65 is-- capital A is 65. What is lowercase a? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: 97. And then look-- 66 to 98, 67 to 99, 68 to 100-- what's the difference between these? Yeah, it's 32. If you add 32 to 65, you get 97. If you add 32 to 66, you get 98, and so forth. So it seems that the lowercase letters, wonderfully conveniently, are all 32 values away from the uppercase letters. Or conversely, if I have a lowercase letter, logically, what could I do to it in order to convert it from uppercase to lowercase-- Sorry-- from lowercase to uppercase? Subtract, right? 
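The arithmetic can be checked in a few lines. This sketch assumes the ASCII encoding discussed in the lecture, and the helper names `case_gap` and `lower_to_upper` are hypothetical.

```c
// Hypothetical helper: the distance between a lowercase letter and its
// uppercase counterpart in ASCII (97 - 65).
int case_gap(void)
{
    return 'a' - 'A';
}

// Hypothetical helper: assumes c is a lowercase ASCII letter and moves it
// 32 positions down to its uppercase counterpart.
char lower_to_upper(char c)
{
    return c - 32;
}
```

Under ASCII, `case_gap()` is 32, and `lower_to_upper('a')` is `'A'`, that is, 97 minus 32 equals 65.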

So why don't I try printing out printf, percent c, then go ahead and print out not the actual character, but just subtract 32 from it. I know these are integers underneath the hood. And frankly, if I want to be really explicit, I can convert it to an integer, the ASCII code, and then subtract 32, but that can be done implicitly-- we saw earlier. So let me go ahead and save this file and run uppercase, make uppercase, dot slash uppercase. And this time, let me write Emma's name in all lowercase, and voila, I see it here. Now it's a little ugly. What did I forget? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: A new line. So I'm going to go ahead and do that at the very end of the program, so I get it only once at the very end. Let me rerun-- make uppercase, dot slash uppercase, Emma in lowercase. Voila, I've got it uppercase. So this is like a very low-level implementation of the notion of uppercasing something. So if you've ever done this in Google Docs or Microsoft Word-- convert this all to uppercase for whatever reason-- that's all the computer is doing underneath the hood, iterating over the characters and presumably subtracting 32 from the lowercase ones. But this, too, is a low-level detail that we probably don't want to have to think about too much, and so it turns out there's functions that can solve this problem for us. 

And you might have discovered these last week or used them yourself. But on CS50's website is an example of what are called manual pages. If I go to the course's web page and click on manual pages, you'll see the CS50 Programmer's Manual, which is a simplified version of a very popular tool that's available on most computer systems that support programming. And suppose I want to do something like convert something to uppercase, I can search up there. 

And notice, there's a few functions available in C that relate to uppercase: isupper, which asks a question, tolower, and toupper. I'm going to go ahead and use toupper. And if I click on this, I'll see essentially its documentation. And it's a little cryptic at first glance. But what you're seeing in the documentation is its required header file and its prototype. What file do I apparently need to include to use toupper? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Yeah, ctype.h. I don't really know what else is in there, but this is my hint that I should use that file. And what kind of input does toupper take? Well technically, it takes an int, for reasons that are explained in the documentation. But even if the documentation is not obvious, it turns out it's actually pretty easy to use. I'm going to go ahead and rip out most of this logic, and I'm just going to do this-- printf, percent c, toupper, s bracket i, semicolon. And up here, I'm going to go ahead and include ctype.h, because in reading the documentation, I realize that oh, I can pass in any character to toupper, and if it's lowercase, it's going to return it in uppercase, and if it's not a lowercase letter, it's just going to return it unchanged. 

So if I save this file now, make uppercase, and then rerun this program, this time typing in Emma's name again in lowercase, voila, I've now used another helper function, something someone else wrote. But you can imagine that all the person did who wrote this function for us is what? Like an if else, checking the ASCII mathematics to see if the character is indeed lowercase. Any questions then on this? Again, now the goal is to move away from caring about 32 or the ASCII codes and just use helper functions someone else wrote. Yeah? 

AUDIENCE: Why [INAUDIBLE] 

DAVID MALAN: Why do you not need to-- 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: The type-- Ah, why do you not need to declare the type of n? I am. This only works if it's the same type as i. Good question. So I get away with it because both i and n are meant to be integers. Yeah? 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: Are there any limitations? No, you may use any functions you want on CS50 problem sets, whether or not we've used them in class. That's certainly fine, unless otherwise specified, which will rarely be the case. So what else then can we do? Well turns out, we've just empowered ourselves with a couple of new features, one of which is, again, called command line arguments. We've seen these before. What did I describe previously today and last week as a command line argument? What was an example? Anyone-- I heard here. 

AUDIENCE: Dash o. 

DAVID MALAN: Dash o. Remember that clang can have its default behavior, which was a little annoying, whereby it outputs a file called a dot out, overridden by saying dash o hello, or dash o anything, to change the output to a file of your choice. That was an example of a command line argument. You literally typed it after the command, on a line, and it's an argument in the sense that it's an input to the program. So a command line argument, more generally, is just one or more words that you type at the prompt after the program you care about running. 

So where are these germane here? Well finally, we can now explain a little more of what this canonical program is about. We already discussed earlier today that include standard io.h. It just contains your prototypes for things like printf, and that gets copied and pasted during preprocessing into the file, and so forth. But what we've not explained yet is what void is here, let alone what int is here. We've just been copying and pasting this now for just over a week. 

Well it turns out that in C, you do not need to write only the word void inside of those parentheses. You can also write, wonderfully, int argc, string argv, open bracket, close bracket. Now why is that compelling? Well notice there's a pattern here, and it's quite similar to my average function a moment ago. Main takes two arguments, apparently. One is an int, and one is what? It's not a string, per se. It's-- 

AUDIENCE: [INAUDIBLE] 

DAVID MALAN: --an array of strings. Now argv is a human convention. It means argument vector, which is a fancy way of saying an array of arguments. And the way you know this is an array is by the fact that you have open bracket, close bracket. And it's an array of strings because to the left is the word string. This is just an old-school integer called argc, which stands for, by convention, argument count. However, we could call these arguments anything we want. Humans for decades have just called them argc and argv, just like my average function took in the length of an array and the actual scores inside of it. 

So what can we do with this information? Well it turns out, we can now write programs that take words from the human, not via get_string, but at the actual command prompt. We can implement features like clang has. So let me go ahead and write a program called argv in a file called argv.c. Let me go ahead and include the CS50 library. Let me go ahead and include standard io.h. Voila. 

Now let me go ahead and do int main, not void, but int argc, string argv, open bracket, close bracket. So it actually looks worse than it has been, but now it's useful. We'll see. And now I'm going to go ahead and do this. Let me go ahead and say if argc equals two, that's going to mean that the human has typed two words at their prompt. And I'm going to go ahead and print this, hello, percent s, new line, and then I'm going to plug in argv bracket one, for reasons we'll soon see. Else, if argc does not equal two, I'm just going to hard code this and say hello, world, backslash n. So what am I doing? 

I'm trying to write a program that allows the human now to write their name at the command prompt, instead of waiting for the program to run and use get_string [INAUDIBLE] like a blinking prompt. So what I can do now is this, make argv. It compiles. Dot slash argv, Enter. Hello, world. So presumably, what does argc equal when I run it in that way? 

DAVID MALAN: Maybe one-- I mean, not two, at least, it stands to reason. It's not two, because I didn't see my own name. So if I go ahead and rerun it now with, say, David, what's it going to say, hopefully? Like, hello comma David? And indeed, it does. Why? Well when you run a program that you have written in C and you specify one or more words after your program's name, you are handed those words in an array called argv, and you are told how many words the human typed in argc. 

So the clang program, the make program, help50, style50, check50, all of the programs we've seen thus far that take words after the program's name, literally are implemented with code that's similar in spirit to this. Some programmer checked, oh, did the human type any words? If so, maybe I want to output a different name than a dot out. Maybe I want to output the name hello. When you run make something, well what do you want to make? That's a command line argument that the human programmer checked argv for to know what program it is you want to make. 

So it's a simple idea, even though the syntax is admittedly pretty ugly. But it's the same idea. And the only two forms then, for main moving forward, are either this new one, which lets you accept command line arguments, or the old one, which is for when you know in advance you don't need any command line arguments. It's entirely up to you which to use, depending on whether you actually want to accept command line arguments. Now there's one last detail that we've not explained yet and that's this one here. 

Why the heck does main have a return value? And there's not really a super compelling reason here, but we can see that there's a low-level reason that this is useful, but it's not something to stress over much. It turns out that main by default in C does have a return value. And even though we have never returned anything from main yet, by default, main returns zero. Zero in computers typically means all is well. It's a little paradoxical, because you would think zero-- false-- bad. But no, zero tends to be good. 

The reason for this is that main can return non-zero values, like one, or negative one, or 2 billion, or negative 2 billion. In fact, if you've ever seen an error message on your Mac or PC, sometimes there's a little window that pops up and it's a cryptic looking code, like an error has happened, negative 42, or whatever. That number is just an arbitrary number some human decided that their main program will return if something went wrong. And we can do this as follows. I can write a program like this in a file called exit.c that includes, say, the CS50 library, that includes standard io.h, int main void-- I'm going to go back to void, because I'm not going to take any-- or actually, no, I'm going to do int argc, and then string argv brackets, so I can take a command line argument, and I'm going to start to error check. 

Suppose this is a program for which the human is supposed to provide a command line argument. I'm going to do this. If argc does not equal two, you know what I'm going to do? I'm going to yell at the user, say missing command line argument, backslash n, but now I want to quit from the program. I want to do the equivalent of exit. So how do you do that in C? You actually return a value. And if all was well, you would return zero. However, if something went wrong, the sky's the limit, up to 2 billion or negative 2 billion. However, we'll keep it simple, and just return one, if something went wrong. 

Meanwhile, I might then say printf, hello, percent s. Type in argv one, just as before. And then, if all is well, return zero. So not much new is happening here. This program is very similar to the last, except instead of saying hello world by default, I'm going to yell at the user with this, missing command line argument, and then return one to signal to the computer, this program did not succeed. And I'm going to return zero, if and only if, it did. Yeah? 

AUDIENCE: Why is argc unequal to zero? 

DAVID MALAN: Why is argc not equal-- really good question. So let me go ahead and change this. What is in argv zero that makes it have two things instead of one, if I run David-- if I run my name, David. Well, hello-- let me recompile. Make argv one, or make argv, dot slash argv, hello-- no, wrong program. Make exit. Sorry. There's no program to detect that mistake. Dot slash exit, missing command line argument. However, if I do exit David, now I see-- oh, did I run argv before? Check the tape. 

Hello, dot slash exit. So in argv, the first word you type, the program's name, is stored at argv zero. The second word you type, the first argument you care about, is in argv one. And that's why argc is two. I literally typed two words at the prompt, even though only one of them is technically an argument I care about. So where can we go from this? 

So we're going to use this now to solve a number of problems, that of readability, for instance. You might recall this paragraph here. "Mr. and Mrs. Dursley of number 4 Privet Drive were proud to say that they were perfectly normal, thank you very much. They were the last people you'd expect to be involved in anything strange or mysterious, because they just didn't hold with such nonsense," and so forth. So that's from the very first Harry Potter, Harry Potter and the Philosopher's Stone. If you were to run the entirety of that book through a program written in C that analyzes its readability, you would be informed that the grade level for that book is estimated at grade 7. So you can read it well and comfortably if you're a human in grade 7. Why is that the case? 

Well, the program, as is conventional in software, would analyze things like the number of words in the sentence, the lengths of your words, how big the words are that you're using. There's a number of heuristics that are not perfectly aligned with readability, but they do correlate with readability. So the bigger the words, the bigger the sentences, the more likely the older you should be to actually read that text effectively. 

Now something like this. "In computational linguistics, authorship attribution is the task of predicting the author of a document of unknown authorship. This task is generally performed by the analysis of stylometric features, particular characteristics of an author's writing that can be used to identify his or her works in contrast with the works of other authors." If you were to run that through the same program in C-- that text otherwise known as Brian's senior thesis-- you would get grade 16, because he uses a lot bigger words, longer sentences, more elegant prose. 

It turns out that this program in C to which I allude will exist in a week, because for one of the problems on the problem set, you will implement a readability analysis. But it all boils down to taking in text as input, such as Harry Potter or Brian's text, analyzing the lengths of the words, looking for the spaces, and so forth, and deciding how advanced that text is. But we're also going to challenge you with this, this notion of cryptography, the art of scrambling information to keep it private. 

And cryptography might work, just like in week zero, as having inputs and outputs, where the input is the message you want to send safely to someone else. The output is some kind of scrambled version thereof, the equivalent of, like in grade school, maybe writing a little love note to someone and passing it through the class to the recipient. And you don't want the teacher, if they intercept it, to be able to understand the message, so it's somehow scrambled or encrypted, so to speak. 

In cryptography, the input is called plaintext, and the output is called ciphertext. So if we were, for instance, to say something like HI, exclamation point, recall that that of course can be represented in ASCII as three numbers-- 72, 73, and 33. Well, it turns out, if we want to send a fancier message, a longer one, we can just look at all of those numeric equivalents, do some mathematics on them, and effectively scramble them. But we need a key. You and I need to decide in advance, sender and recipient, what is the secret we're going to use to kind of jumble the letters up so as to encrypt it without a teacher or a classmate intercepting and decrypting it. 

Suppose, very simply and probably foolishly, our secret number is one. You and I both agree one is our secret, and we're going to use one to scramble the information as follows. If I want to say, I love you, and send this across an insecure medium, like a roomful of people, well I might first convert each of these letters to their ASCII equivalents just by looking them up on AsciiChart.com or doing it in code. Then I might go ahead and start adding one to each of those letters, because that is the secret on which you and I have agreed. And then I'll convert it back to the characters, as by casting it from an int to a char, so that the message I actually write on my piece of paper, or send in my program, looks like this. So if a teacher or a classmate intercepts it, they see this, but you know it says, I love you. 

And so, with that said, you will be doing readability and cryptography and more. That's it for week two, and we'll see you next time.