>> All right, welcome -- welcome to CS 50. This is the end of Week 4, because you all got here a little bit early we have a little bit of a treat for you. This is perhaps going to be among the most awkward six minutes, though, of your life. [ Applause ] >> This -- the following, is not a spoof. [ Background noise ] >> Hey, welcome to the party. The four of us, along with host [Inaudible] and you, are launching Windows 7 ultimate software. So you know what, let's take a minute or so to tell you about how great it is to host a launch party. You can use house party tools to build your guest list, up load your pictures which [Inaudible] and you can even get a party pack. Though you're in your own home, you'll be able to participate with others in this exciting event around the world. >> In a lot of ways, you're just throwing the party with Windows 7 as an honored guest. Sounds easy. And it is. But we thought you would probably like to know what to do to get ready, and [Inaudible] >> Yeah. The four of us got to have our parties a little ahead of schedule, you know, try everything out. So we thought we'd be able to tell you some of the tips that help make our parties really fun. >> Of course the first thing you want to do is install Windows 7. >> Duh! >> Make sure you do that a couple days in advance of the party. >> Call customer service if you have any questions. >> Exactly. >> Play with Windows 7 before the party. >> Second, look at the activities you and your guests can try at your party, and choose the ones that seem to you to be the most fun. >> There's a video of each activity from one of our parties and we have tried them all, right? So you can see how the activity runs. And you know what, there's a handy page of notes to print out that tells you what you need ahead of time, I mean, it helps guide you along the party, right? Print these out, because these are called host notes, and they're at the web site. 
>> Hey, again, you don't have to do the activities listed under your party favor. You just look at them all and decide which one seems to be the most fun for your guests. >> And some of the host notes, they list bonus activities. [ Multiple voices speaking ] >> You want to try them, but you want to make sure that you have the right devices on hand. >> Right. Now how the party flows is totally up to you. Now here's a sample agenda, though, that we each more or less followed. >> Right. For the first half hour or so at my party the guests just came in, had a drink and a snack, and mingled like they do at any good party. But many of the activities suggest that you have photos from the party. So we each made a point of shooting 20 or so photos on a digital camera at the start of the party too. >> Oh my gosh, when everyone was there and settled, I ran an overview of some of my favorite Windows 7 features. I showed my guests things from 7.2, the Windows orientation video, and it took like ten minutes. You know it was great, it was totally informal, like everyone just kind of crowded around the computer in the kitchen. >> We also started with some of the basic Windows 7 features, right? >> Yeah. I mean, it's a good way to get things going, right? Whatever your party is. We've got four separate videos of each of us doing bits and pieces of this kind of thing at our own parties. Now, after my review, I went straight to an activity. >> Oh, you went straight to the activity? I let everybody fool around with snap for a minute -- [ Multiple voices speaking ] >> And then we started an activity maybe 30 minutes later. >> Well, either way works, right? You figure out what your guests want, and just play it by ear. In any event, we each did an activity or two. >> I did three activities, and -- [ Multiple voices speaking ] >> That's great. The activities, each have you talk for a minute or so, and then you set something up so your guests can try. 
So for the rest of the party you just leave your computer on and running, and then you let folks mess around with it, right? And the guests may have a question for you or something, or you may want to show them some things, but it was all really informal. Oh, and then when you're close to the end of the party, >> wanting everyone to leave -- [ Multiple voices speaking ] >> Towards the end I showed the guests windows.com/help, and it's a great site for people to get more information. And I found it to be a nice wrap-up. >> I agree, Help and How-to is a great way to bring it all together. >> Yeah. >> And it's a great resource for you hosts too. You can find out pretty much anything you want to know about the features we're using for the activities. >> That's one way to flow the party, but again, it's all up to you, it's what's right for you and your guests. >> Yeah, that's exactly right. >> You know, the four of us learned quite a few things to help make our parties a lot of fun. >> Absolutely. Here's mine. Make the thing you're demonstrating personal to someone at the party. Like the way I made Chip's files get transferred by Windows Easy Transfer. >> Or the way I showed my guests Web Slices by talking about Frank's online auction shop -- [ Multiple voices speaking ] >> Or having folks edit photos of themselves to e-mail home. >> Bottom line, guests love it when the activity is about them. >> Hey, another thing, I found that it really helped to name the person to be the first with the hands-on activity. And have them pick the next person, and so on and so on. >> Yeah, I think you'll see that in a lot of our videos. It really helped get the guests onto the computer. >> Okay. On a more serious note. Decide what activities you're going to do at least a day or two in advance, and watch those videos and read the handouts, and some activities have modest setup, you know? They require certain things for you to have at the house. >> Sure. 
Like you have to have two computers to do the web chatting activity. >> That's right. >> In any case, none of the setup is too hard, right? You need to make sure you're ready to go when your guests arrive, and there are bonus activities in some cases, and you may want to go deeper, perhaps, into it, and you have to have the equipment to do that. >> Right. >> Hey, it helped me to remember I'm not a salesman at this party. I'm not supposed to be a total expert either. This is a brand new product. And part of the point of a launch party is seeing what you already know and what you can figure out. It's just so simple anyone can do it. >> That is one of the great things about ending with Help and How-to. It's nice to be able to say throughout the party that you'll wrap it up with a great resource for any unanswered questions -- >> Exactly. I think the biggest thing is to be totally creative with the party and the activities. I mean, this is your party. >> Exactly. >> Can you believe that Microsoft put the launch of Windows 7 in our hands? They nuts or what? >> Maybe by letting you be involved. >> That was -- [ Multiple voices speaking ] >> I mean, I don't know, really, it does make total sense. Windows 7 is all about the computer user, making everyday tasks more simple, working the way we all really wanted to, and making new things possible. This really is our launch. >> Yeah. You're right. So it ought to be a party. Have fun out there. >> Cheers. >> Have a good one, guys. [ Multiple voices speaking ] >> So I had six minutes to come up with wise-ass comments, and I've got nothing. Like -- that is a genuine commercial for the release of a new operating system. And I realize this is CS 50 and we do some geeky things in this class, we encourage you to start your P sets on Fridays, I did it myself. My God, if you all start having Windows 7 parties, like, that is over the line. So -- let me counterbalance this by saying two things. 
One is if you do have a PC and would like to install Windows 7, CS 50 does have access to stuff like this for free. So you can actually go to the software page of CS 50's web site and follow the directions there for downloading Windows in many different flavors. If you guys are Mac users, or even Linux users, we also have access to what is called virtual machine technology, VMware Fusion, VMware Workstation, we'll talk perhaps briefly later in the semester about this. But for now these are just programs that you can download via our web site on a Mac, run them, and inside of them you can then run Windows if you like. So you may be familiar with this idea of dual booting a computer, Boot Camp is one incarnation of this. But that requires that you restart your whole computer to choose between Mac OS and Windows. Well, that's not really necessary these days, thanks to virtual machines. You can literally run Windows inside of a window on your Mac. But unfortunately, Apple does not really let the reverse be possible on PCs, albeit with a few exceptions. So let me now counterbalance this ad with one other from one of our own CS 50 CAs' groups, this is a good commercial. [ Music and singing ] >> So the cool people will be going to that party. Another dance? So there's lots of competition. But I did forget the best part. Let me toggle over. So this morning I downloaded 22 pages of notes for how to host my own Windows launch party. So you can do this too. We linked to it on the course's lecture page. They have -- come on, focus, focus -- so you can download these things here, what do I do at the party, introduce; we'll be making the computer uniquely our own by changing the desktop background. And I only downloaded two of the like five or six PDFs. So anyhow, bless their hearts. Microsoft has been very good and generous to the course. But someone over there is I think out of control. 
So -- without further ado, we left off on Monday talking about pointers, which are just addresses. And we also talked a bit about GDB. So in reverse order here, GDB again is a debugger. This is a tool that allows you to step through your code and, as the name implies, debug your code, find problems with it a little more easily sometimes than you can by just reasoning through your own code or putting printfs all over the place. And it's actually been wonderful to see on help.cs50.net the past couple of days, and the bulletin board, a bunch of you, even without much encouragement from us, have actually been trying out GDB before reaching out with a question. So let me make that plea formally now. The next time you run up against a bug, before you think you need to resort to office hours, before you think you need to turn to the bulletin board or e-mail, do just fire up GDB, run it on your program as we saw on Monday, gdb, space, program name, and then a few of the commands I introduced are probably sufficient for now to start poking around: set a breakpoint at main, type next, maybe step, and just walk through your program. And there is on the resources page of the course web site a PDF, it's more detail than you'll want, probably, in this first week of using GDB, but it's essentially a cheat sheet for all of the things you can do. And you'll probably see me and the TFs do even more than that. So do start there with GDB, and you'll be surprised how powerful it is. So I spent yesterday using my PHP and MySQL skills to whip up a little grades interface for the TFs to input all grades into the course's web site, and then for you guys on an encrypted page to view them. So the TFs are in the process of uploading scores. You probably received them via e-mail already, but do take a look at the new grades link on the course's web site. 
And this is meant to be a sanity check for you guys too, to make sure that our records reflect what you think you in fact received on some P set or quiz. No new handouts today. Okay, so we left off talking about pointers and memory and we said that this is going to allow us to do a lot more because it's really giving us pretty low-level control of the computer in terms of memory and ultimately hardware, but we also promised that we can do some damage. And we'll see over time how bad things can happen, and frankly, a number of you have already experienced this most recent week bad things happening, like weird things appearing in your Game of Fifteen board all of a sudden, and very often we've seen this is the result of your overstepping the bounds of some array. For instance, your board array. So just to quickly recap over here, if you declare something like int x, a little something like that, and hopefully you all can see this, but I'll recite anything I write. So if I declare int x, pictorially, we've been drawing this as a little square. This square is how big in size? 32 bits. AKA, 4 bytes. And this would be called x. So it has some little label and it's in fact 4 bytes. If instead I do something like this, char c, what does that look like in memory? It too is a chunk of memory, but how big is it? So it's just 8 bits or 1 byte in memory. So this would be c. So the quick recap is chars are 8 bits or 1 byte, shorts are 16 bits or 2 bytes, ints are 4 bytes, longs are 4 bytes on these computers these days. And long long is 8 bytes, or 64 bits. And we have floats and doubles, which are similarly 32 and 64 bits in size. But then we started doing other things. We started saying things like int, let's say, arr bracket 4. So this declares an array, and so an array in memory looks instead a little bigger. So this is going to look a little something like this. 
It's going to have a label of arr, and then what's going to be inside of this chunk of memory when I write this line of code here? Ones? [ Inaudible audience comment ] >> So garbage values, right? We don't know, right? We don't know, and we've seen examples already, if we just declare a variable or declare an array of ints or any data type, who knows what's going to be in there. So one lesson on Monday was to make sure to always initialize your data, otherwise unexpected things can happen. Now arr here is the name of this array. But now you can start thinking of this arr, the name of this variable, if it's an array, as not just a label or a name, but it's kind of the address in memory. Because one of the other things we said on Monday is that when you allocate a whole bunch of things in memory by way of an array they're contiguous, and this is important, contiguous as in the first int is here, the second int is here, and the third and the fourth are all back-to-back in memory. And that means you can then index into them using this square bracket notation because the computer knows that the zeroth element is right at the beginning. Element 1 is then 4 bytes over, element 2 is 8 bytes over, and so forth. So arrays give us what's called random access, because you can jump anywhere in the array you want, so long as you know the left-hand side, and the right-hand side or the length, but we'll see in just a couple of weeks that there are other data structures that are actually more sophisticated, but for which we're going to have to give up that feature. Now what does it take to remember where an array is in memory? Well, yes, you could certainly remember the address of this 32 bits, this 32 bits, this, and this. But again, it suffices, so far as we've seen, to just remember the first address, so long as you also remember what? So the length of the array. 
So unlike Java, those of you who have programmed before, unlike Java, C arrays do not let you ask the question how big is this array. At least usually. There are a couple of exceptions. But you can't ask that, so you have to keep it around in your own variable like n. So we also talked increasingly about this thing. So we said -- let's just rotate this around, we started talking more about strings, and a string is just a synonym for what data type? Yeah, so char star, or really a string we've said is an array of characters, but char star implies that it's actually a pointer or an address. So these things too are now kind of the same. If I do something like -- if I declare a string, let's actually leave room this time, if I actually declare a string and store in it a short word like foo, I might do string s gets, quote unquote, foo. All right, and in memory, what does this look like? Well, how many bytes do I need to draw on the board? Okay, good. So that was good. Not easy to trip up just yet. So we need 4 bytes, because we need F-O-O and then the backslash zero. And in order to write a backslash zero, the character that's going here, this is the character F, and remember with chars you use single quotes. This is the character or char O, character or char O, and then this is also single quote, backslash zero, single quote. So that's all that means there. You may see some textbooks just write a literal zero, but I would say many or most people just use this notation to show that this is in fact a char, it just happens to be the actual number 0. But I can -- this is clearly the same thing as this. So char star s equals, quote unquote, bar actually does the exact same thing in memory, because string and char star are just synonymous, as we'll see in the CS 50 library today. So now we have B-A-R backslash zero. So what is s? It's the label, yes, for these actual strings. But it's also, as we're starting to see now, an address. 
An address is something that's numeric and it's something that we're going to be able to perform tricks on to actually really start manipulating our memory. So with that said, let's take a look at this example here. So this is again from the other day's handouts, pointers1.c. And what I did today was I ran a little script on my own code that removed all of the comments just for the sake of discussion, but your printouts have some of the answers to some of these provocative questions. This program here is pointers1.c. And the first interesting line of code has me calling GetString, so this is going to return what? So GetString prompts the user for a string, and when I finally type some characters and hit enter, what is it that's being returned, technically? So conceptually, I'm getting back a string; a little more concretely, I'm getting back an array of characters. But again, return values in a C function can only be one thing, and you can't just hand me a whole bunch of characters. So what am I literally being handed as a return value for GetString? A pointer. So an address of that string. So apparently, and you'll see this when we peel back this layer today, GetString is actually allocating a chunk of memory, it's saying to Linux I need a bunch of bytes of memory, I need 4 bytes, I need 8 bytes. Whatever. Tell me at what address I can start writing these characters into RAM. Because then once I'm done writing the characters that the user has typed in, I'm going to return to the caller, the function who called me, the address of the first byte that you, the operating system, handed me. All right, so why am I checking for NULL here, under what circumstances would something like GetString return this special sentinel value NULL? Sorry? [ Inaudible audience comment ] >> I didn't enter anything, so maybe I hit, for instance, Ctrl-D, which is sort of an esoteric trick. 
If you essentially want to tell the computer you're not providing anything, you can send what's called the end of file or EOF signal or character, and that's usually done by hitting Ctrl-D. So we need to be able to handle that. But under what other sort of more familiar circumstances might GetString not be able to return to me the address of some string in memory? Again, start thinking corner cases. What could go wrong? What could a really obnoxious user try doing just to mess with me? Got to be a little more committal. Like, what could go wrong? Okay, so just pressing enter. So hopefully I handle really short strings. And in fact this code does. What's the -- what's the opposite of that, right? Again, corner cases generally mean, you know, if you're expecting a number, give it a word. If you're expecting a positive number, give it a negative, see what happens. So in this case, it's expecting a string, don't give it a string. Or what's the opposite of that? Give it the biggest fricking string that you can just by, you know, holding down a key on your keyboard for a while and then hitting enter, just to see if you can overflow the memory in the computer, because this too would be bad. Because what if the operating system doesn't have a billion bytes to give you, and I really went to town on the keyboard and tried typing in a billion characters? Well, if there's not that much RAM in the computer, GetString, you know, maybe it could return part of the string, just the first several thousand characters. Or something bad's going to happen because it's going to start overwriting important memory. That's where we would need to check the documentation. And as you'll see in cs50.h, our own header file, which you do have a printout of as well, this is where in the absence of a man page you can actually turn for documentation for functions. 
So if you've got some source code that you're using from us or from something you downloaded for future projects, I mean, honestly, looking at the source code is a very good place to start, assuming that that person has exercised some good design and style and actually documented it. So I'm curious, let me scroll down. Okay, GetChar, I'm not quite interested in that yet. GetFloat, GetInt. And now notice, this is a header file. So what do you see and what do you not see in this file? Sorry? So these are all called what, here? These things that are not comments? So these are the function declarations or the prototypes, they're not -- the functions are not yet implemented or defined. So declaring a function means telling me what it's going to look like. Defining or implementing a function is actually writing the code between the curly braces that implements that function. So a header file generally, as we'll start to see today, and you guys have been sharp-including header files for some time, is generally not much code, but rather a bunch of comments describing what this library or what this file can do. And also declarations for things you, the programmer, might need. So for instance, when you have been using stdlib.h or math.h or any of the other dot h files that you may have found useful over the past couple of weeks, what you're doing is telling GCC to go look on the local file system, the local Linux hard drive, find the file called math.h, because in that file is a list just like this of all of the special math functions someone else put a lot of work into writing for my benefit, and this way GCC now knows what functions I can call. But generally, there's one other step. Simply including the dot h file with sharp include is only telling GCC that this function exists. How do you then tell GCC where to get the actual bytes that comprise this file? How do you link them in, so to speak? And that's the keyword. Link. 
At what point do you link in the math library? Yeah. So at compile time, with the dash l m flag. So it's essentially a two-part process. The header is sort of full disclosure. Hey, GCC, here come some functions that I want to use. Here come some constants, here come some synonyms I want to use. But to actually tell GCC to compile in the zeros and ones that implement this stuff, that live in a different file altogether, probably a cs50.o file or a math.o file, you need that linker flag at the command line, the dash l m, dash l cs50, and so forth. So we were looking for GetString, here it is. So apparently GetString returns a string, and here's its documentation. Reads a line of text from standard input and returns it as a string, sans newline character. So in other words, even though I might type foo and hit enter, apparently this function's going to get rid of the newline character for me, and just return F-O-O and then the terminating character, backslash zero. Ergo, if the user inputs only enter, it returns, quote unquote, in answer to your question, not NULL. So there's apparently a distinction here. NULL is the special thing which signifies I've got nothing for you. But quote unquote in computer science is generally known as the empty string, which is a string, but there are just no actual readable characters there. So there's a difference. NULL is nothing. Whereas the empty string represented in code like this is actually represented how in memory? With how many bytes? So there is in fact 1 byte being used to represent the so-called empty string, and that byte simply contains backslash zero. So that's the difference. Whereas NULL, this special constant NULL, this is like -- there is nothing actually there. I'm returning just the special sentinel value. Okay, so what about string? Well, all this time we've been saying that string is just a synonym for char star, and this is why. 
So here at top left, at the top of cs50.h, there's this feature called typedef, where you can define your own data types. And we'll also do this for more sophisticated purposes. But what typedef here is saying, from left to right, is declare a type called string that is identical to char star. So it's just a synonym. And we only do it at the start of the semester just to kind of simplify things a little bit. But realize too, you may see things like this. So you don't need to have the star, FYI, right next to the variable name. You'll often see code like this. It generally tends to be clearer, though, if any time you define a pointer, moving forward, the star is right next to the variable, or in this case the data type name. So just FYI on that. Okay, so now let's take a look again at this code that opened this line of discussion. So here we go with the code. I call GetString, I store it in s, I then do a sanity check. If it equals NULL, just return 1. Bad stuff's going to happen if I try using this thing otherwise. And then this for loop has what? So int i gets 0, n gets strlen of s, okay, so that's sort of boring for loop stuff. But there is something interesting here. What is going on here, exactly? So what is s, first, in technical terms? What is s? So it's a pointer, which is an address, which is just a number, right? So every time I draw RAM on the blackboard, I draw it as a rectangle, and I say byte 0 is here, then there's byte 1, then byte 2, then dot dot dot, byte 2 billion, if you've got 2 gigabytes of RAM. So memory can just be addressed numerically. I don't know if it's bottom up or top down, it really depends on how, you know, the machine is viewing its chips of RAM. But in this case, we just care that s is a pointer or an address or a number, all the same here, and so what am I doing on each iteration? Well, s is the starting point of whatever string the user has typed in. 
So if the user has typed in F-O-O, stored as F-O-O backslash zero, first of all what is the value of n going to be within this loop? Hmm. What's the value of n going to be, once strlen is called here? So it is going to be 3. So when we ask about the length of a string we mean in human terms, not special computer encoding terms. So the string length here is in fact 3, it is not 4, just to be clear. So what is the length of this string? So 0. Right. Because there are no actual alphabetic characters or otherwise there. Okay, so what is s? Well, s is a pointer. So pointers, we know, are 32-bit chunks of memory. Right? So an address is 32 bits. So it looks like an int, but it's a special data type called a pointer. And now what is inside of s? Well, s is technically going to have the address of the first character in the string. See, we're going to push the limits of my handwriting ability on a tablet here. But if this is 0x10, this is 0x11, so now we're dealing with chars today and not ints. So a char is 1 byte. So the size of the data type is now important. This is 0x12. And then 0x13. What is actually stored in this 32-bit chunk of memory called s? 0x10. Right? 0x10. The address of the first character, or pictorially, and it's a lot more user friendly just to start drawing things with arrows, what we essentially have is a pointer, pointing to the first character there. Okay, so with that said, s is then the address of this first piece of memory. So this loop iterates from 0 on up to 3. So it's going to execute for i equals 0, 1, and 2, on up to but less than 3. So what am I printing? I'm printing a character and a newline on each iteration, and what am I doing? I'm printing some math. So s plus i. So s plus 0 is what on the first iteration, what number? 0x10, right? The hexadecimal value 10. Which is just a number represented in hex. Okay. So star and then an address tells me to do what? Go there. Right? 
That's all we said on Monday: when you put a star in front of a variable, if that variable is a pointer, or in this case if you put a star in front of an arithmetic expression that itself is the result of doing math on a pointer, the star just means dereference this, go there. So that means go to this character and what gets printed? Perhaps needless to say, F. Now we iterate through the loop again. So i gets plus plus, so now i is 1. So s plus 1 is now 0x11, so what gets printed? Right, so now we do a little bit of math. So 0x11, go there. So O gets printed. Plus plus, 0x12, go there, print that, and then i is now going to equal n so the loop terminates. Right? So that's all that's going on here. This is what's generally known as pointer arithmetic. And as the name implies, it just has to do with doing arithmetic on pointers. But GCC or the compiler figures out whether you want to add 1 byte or 4 bytes or, as we'll see, it depends on the context. But for now it's pretty straightforward. Incidentally, why did I initialize a second variable here called n instead of just putting strlen right here in the condition? So why did I take the approach I did and not just plunk it into the condition part? Right. Otherwise I'd be calculating the length of the string s every time. And odds are the length of the string is not going to change, if I'm not doing anything destructive to the string, I'm just letting it be. So the length is never going to change, and putting it into the condition would actually be fairly stupid because then this loop is going to have an increased running time just because I'm foolishly checking the length of it again and again and again. Okay, finally, there's this. And this is new today. What does this probably mean, free s? Why is that necessary? So yeah, so there is a keyword [Inaudible] that will get you today. 
All this time, GetString and Get -- yeah, so all this time, GetString is actually pretty poorly implemented. Sort of objectively speaking. So CS 50's library is all about making it easier to get user input. To do this, we need to allocate memory on demand, but we don't know how much memory you're going to need at first, because we don't know how many words or how many characters our user is going to type in. So we allocate memory, as we'll see today, dynamically. We allocate as much as we need to fit your string, unless we completely exhaust the RAM's capacity, and then we return you a pointer to it. But the problem in a language like C and C++ is that if we hand you memory and you never hand it back, we will assume you're continuing to use it. So fast forward to reality, if you've ever been using your computer, Mac, PC, whatever, for a long time, many hours or even many days, without even shutting it down, I mean, what you probably experience is the machine starts to slow down over time or, you know, hitting Alt-Tab or trying to change windows might start to feel like things are grinding to a halt. There could be any number of explanations for that problematic behavior. But one of them is that your computer is running out of memory. So not physically, but your computer is running this program, and that program, and this program, and humans are fallible, and probably wrote some buggy code that asked the operating system for memory, but never gave it back. In fact, in the worst case, you can imagine a simple application, like an instant messenger, every time you get an IM, whether you're using AOL or MSN or Gtalk or whatever, a string appears on your screen. Well, that string has to be stored in memory somewhere. So you know, even if we just kind of think through intuitively how an instant messaging client works, odds are that memory is being allocated dynamically. 
Every time you get an IM, maybe, the OS is being asked, oh, someone just sent me L-O-L. I need another 4 bytes for this, or a longer sentence, I need an even bigger chunk of memory for this. But if that client never says to the operating system, oh, the user closed the window, here's all of that RAM back, AOL Instant Messenger or whatnot is just going to keep asking for more and more and more bytes. And then if you look at your activity monitor on a Mac or process manager on a PC, you might see that one stupid little program is using many, many, many megabytes, if not gigabytes, of RAM, because the programmer screwed up, and this is how easy it is to screw up. And in fact, any program you all have written thus far allocating memory has been buggy, at least objectively speaking, because probably none of you have ever freed the memory you asked for by way of get string. But today that all ends. So it turns out that get string uses a function called malloc, for memory allocation, and we'll see that today. Free is essentially the opposite. You hand to free a pointer that you know points to a chunk of memory that has been allocated for you. So let me go ahead and open now a second variant of this, just to show something a little more sophisticated, and actually let me clean up one of the conditions for a moment just to show something slightly different. So this 5 here is kind of a magic number right now, but I just wanted to simplify the code for the sake of discussion, and I'm actually going to go ahead and delete these two lines of code just for discussion's sake. So three lines of interesting code now. The first allocates an array statically, as we'll say. It's static in the sense that I give the values in advance and I'm not letting the user provide them, for instance. This is again how you statically initialize an array.
You can use curly braces like this, and just put your numbers or your strings or whatever inside, separated by commas. But the loop is essentially the same. Here's a star, and here is some pointer arithmetic, but notice the difference here. And this is kind of neat. So this time the array is not of type char, it's not a string, but rather it's of type int. So an int, we said a moment ago, is 32 bits or 4 bytes, and yet when I iterate over this program's array, printing out each of its numbers, so actually let's do that sanity check. So make pointers2. Let me run pointers2. That's all it does. It prints 1 to 5. But notice how I'm doing that. I'm iterating from 0 on up to 5. But each time I'm doing the exact same arithmetic. So I'm taking numbers plus 0. Numbers plus 1. Numbers plus 2. But that feels kind of buggy, right? If an int takes up 4 bytes of memory that looks kind of like this, and I print the first int, well that makes sense, the 32 bits representing the number, say the number 1, gets printed. And so the width of this thing now is 32 bits from left to right. So this here is my pointer, called numbers, and it's pointing to the start of this element. And I print out those 32 bits. But if I then add 1 to it, that sort of means that this arrow is not pointing there, but it's kind of pointing here, right? Because that would be byte 1, this would be byte 2 and this would be byte 3, and then we'd have the next int, starting at byte 4. So is this buggy or not? Actually, this is kind of a leading question, because you wouldn't know the answer. So no, it's not buggy, because of one of the features of this thing called pointer arithmetic, and this is just really to hammer this home so you don't yourself do the wrong math. GCC is smart enough to realize, oh, this pointer here is a pointer to an int. I know from the way I was designed that an int is 32 bits or 4 bytes.
Therefore, any time someone tries to perform arithmetic on me with plus 0, plus 1, I'm really going to do plus 1 times the size of me. Plus 2 times the size of me. So what this means is numbers plus I times 4 is really the mathematics that are going on. And that's what lets me go from left to right, across the array correctly. And I'll leave that updated version of the code there. Okay, so any questions before we start peeling back the layers of the library here? No? Okay. So here is a use of the CS 50 library. It says printf, say something -- this is an example that uses the library -- printf, say something, char star S 1, get string. So really, starting today and starting with P set 4 onward, no more string. It's char star. That particular training wheel will come off. Allocate enough memory for copy. Okay, interesting. So the context here is that I wanted to write a little program that lets me copy one string to another and then actually demonstrate that the copy is correct. So this is excerpted from this code here, this is copy1.c. And it's not terribly long, but it uses the same building blocks. So up here I say, say something. All right? That's not that interesting. Here I say get string, and then I do a sanity check. Okay, so at this point, and as your printout suggests, I can comment those 4 lines of code with one comment like, get input from user or whatever. All right, so now this. This is a little worrisome here, what -- in English, what is this line of code doing? What is it copying? [ Inaudible audience comment ] >> So it's just copying the memory address, and it's taking the value in S 1, which is an address, a pointer, and it's storing it in S 2, so at this point in the story, S 2 is a copy of S 1. But conceptually, S 2 is not a copy of the string pointed at by S 1. In fact, let's take a quick look. So I'm going to make copy1, I'm going to then run copy1, and I'm going to say something like hello there.
Hmm, it didn't seem to capitalize the whole thing. Let's try this again with another word in all lower case. Foo. Lots of lower case letters. Okay, oh, okay, no, that's correct. I had to think about it. But what's wrong here, this is buggy. So the goal is to capitalize the original, or to make a -- actually I should tell you what the point is. The goal of this program is to take the original string, make a copy of it, and capitalize the copy. But clearly what's happened here is both have been capitalized. So let's take a quick look at the code and then see what it actually takes to fix this, not for the sake of fixing it, but for the sake of understanding what's going on underneath the hood. So at this point in the story we have a string called foo or whatever, in memory, and I'll keep it short so that my handwriting doesn't completely fail us. So I have a string, F-O-O backslash 0, I have a pointer called S 1, and that is effectively pointing to this byte in memory the moment get string returns. Okay, it is not null, so I don't return yet. I proceed to the next line. This line of code here where it says S 2 similarly declares a pointer of size 32 bits, even though it's pointing to a char, the pointer is still 32 bits, so it's still the same size square, in reality -- in theory, not in reality. Per my handwriting here. So that's S 2. What is S 2 pointing at? So it's pointing at the same thing. So S 2 is a copy of S 1. But if we're trying to consider char stars to be conceptually bigger entities than just a number, but an entire string, clearly, I've not copied the string, because the same 4 bytes are being used for F-O-O backslash 0. So now I claim I'm going to capitalize the copy here. So I do a little sanity check. If the length of S 2 is greater than 0, like let me make sure that I have room for this string, what do I do? I take the 0th location of S 2 and I change it to the result of calling toupper on the 0th character in S 2.
And I took that function from the string library that's up here. So you can see it documented, you can check out the man page, it's just a little function that does capitalization for me, by remembering, oh, 65 on up is upper case, 97 on up is lower case, that's the whole deal. All right, but then I claim here comes the original, S 1. Here comes the copy, but the problem was that both the original and the copy were the same. They were both capitalized. But pictorially, that should make sense. So how do I fix this problem fundamentally if I want to maintain the original, and then make a copy, only the latter of which is capitalized? What needs to happen on a high level? Yeah, so we need new memory, right? So we need to take the 4 bytes that have F-O-O backslash 0, we need another 4 bytes, and then we need to fill that array with that particular copy. So let's go ahead and open copy2. You each have a printout of this as well. Let's scroll on down here. And now this is the new magic. And this is something that's going to become very useful because thus far, pretty much any program you all have written, if it takes any form of input, the only way you've been able to get input from the user is by way of get string. But you'll see, and you certainly will want to write, programs that take far more interesting input than just a string here or a string there. And you're not going to be able to use a get string function. You might want to get a new record in a database, you might want to read in an entire web page, maybe not in C, but in another language. You're going to need dynamism, in which you can allocate as much memory as you want, but on demand. And the means by which you do this is this little guy here, malloc. So what am I doing in this version? The first few lines of code are identical. I declare a string, S 1, and I get a string from the user, so that picture again looks a little something like this, with a pointer to the first chunk of memory. Now though, I do this.
I call a function called malloc, for memory allocation, and that takes a single argument here. The number of bytes you want allocated for your use. So I had to do a little bit of math. I could have just hard coded 4 in here, but that's probably not the point of the exercise. So let me figure it out dynamically. How many bytes do I need? I need the length of the original string plus 1, times the size of the piece of data that I want to store in that location. So strlen of S 1 gives what answer in this specific example from before? So 3, so 3 plus 1 times size of char, and the size of char, what does this return? So you're right, this returns one byte here, so not 8 bits, 1 byte. Sizeof returns bytes. So 4 times 1, 4 bytes comes back. And just a little sanity check, why this plus 1 here? Because you need the zero character for the end of the string. Okay, so let's see what happens next. Okay, so int N gets string length of S 1. Okay, so this is kind of a boring for loop that iterates from 0 to N. Oh, this is just using array notation to make a copy of S 1's Ith character and put it in S 2's Ith character. And then finally at the very end I need to make sure to install a backslash 0 here. Those following along quite carefully might wonder, is this last line really necessary? If I really wanted to be nit-picky, I could delete this, but what would I need to change to make sure I have a backslash 0 at the end of the copied string, S 2? [ Inaudible audience comment ] >> Good. So I need to -- you know, I don't need to hard code in a backslash 0. Let me just steal the one that's already in S 1 and make a copy. Which way is better? Eh, it's not really clear. The only reason that I might want to do this for sure is just to ward off the possibility that S 1 is somehow broken or corrupted; at least this way I know S 2 is going to stop at one point. But for the most part, these two are equivalent. And now what do I do? Capitalizing copy.
So why am I executing this line here? Why am I checking the string length of S 2 in this context at the bottom of the program? Like, what is bad about not checking the string length of S 2? What might I do blindly? [ Inaudible audience comment ] >> So I might try changing a null character to an upper case character, or really, if this string S 2 has zero length, I'm saying go to the first byte in S 2. If there's nothing there, you all -- some of you have made this mistake, I mean, maybe we can do a little confessional, even with problems in theory. How many of you are willing to admit you created a core dump in your directory? Okay, so now those of you who weren't willing to put your hands up, now your hands can go up, right? So you're in good company. So it's a lot of you. And odds are, for those whose hands didn't go up, I think it's more likely that you haven't started the P set yet, or that you're just not fessing up, since it happens to the best of us. And let's see -- let's see if I can't induce this behavior myself. So this is my original program. So what do I want to do to maybe mess with my own program here? Well, let me do something like this. Let me go ahead and -- let's say -- say not up to N, but you know, I kind of screw up, and I do something -- oops. So instead of making a copy from 0 up to N, let me be really obnoxious and try copying 100 bytes, only a few of which probably belong to me. So I'm going to make copy2, I'm going to run copy2, I'm going to type in F-O-O, which has a string length of 3. Okay, that seemed to work. So realize, too, these problems you're running into with memory are not always easily chased down. And this is again why GDB can be so powerful, because sometimes, because of the way memory is managed in the computer, sometimes you can touch memory that doesn't belong to you. But nothing bad happens. But sometimes it does.
So with memory, you can actually get these sort of non-deterministic bugs, because things happen differently sometimes when you run a program if user input is influencing the behavior. So let me try this again. Let me recompile, this time using 1,000 for copy2. So what I'm doing now is I'm blindly copying 1,000 bytes from S 1 to S 2, even though I did not allocate that many bytes for the copy. Whoa, okay. So bad stuff just happened. Right? So that's the takeaway. By doing L S, there's my core file. If I do an L S dash L, what you'll see is the long listing. So this is a lot of dates and times, but if I look for core here, notice that apparently this dump file is 442,368 bytes, and if you really want to be more like a human you can do L S dash L H for human readable. So my core dump is apparently 432 kilobytes, which is apparently roughly how much memory that program was using at the moment I screwed up. So the computer dumped the contents of that memory to the local hard drive. So let me actually do something with this core dump. Another trick that we can introduce with GDB today is this. I can run GDB on copy2 and then I can go ahead and type run, and then I can type foo, and now I get this mess. But you can also do sort of some forensics with GDB. If you already screwed up and you therefore have a core dump file in your directory, you can tell GDB to use that. You say GDB, name of the program, and then name of the core dump, which is typically core, hit enter, and what you'll see now is -- let's see if it's actually -- hmm, it's not terribly useful here. But it did tell me that this program, when the core file was generated, aborted with signal 6. And actually, you know, I would also e-mail help at CS 50.net if I got this error. Because this is not actually that helpful. So what's actually going on? Well, let's take a quick look by running the program.
GDB of copy2, enter, let me go ahead and break in main, let me go ahead and run, and now it's going to say something. Now I'm being prompted to say something, I'm going to type foo. Now it's doing a little sanity check. Now it's allocating the additional memory, and let me do something here first. I'm going to print S 2, and look, there's just garbage there. If I then type next, let me now print S 2. And now there's different garbage. Why is there different garbage all of a sudden? What's that? Well, so I haven't put anything in S 2 yet. So S 2 originally by default had a garbage value here. So now it seems to have some new garbage value, but that's actually to be expected. Because what does malloc return, quite simply? It returns the address of what? Just the address of a chunk of memory. However many bytes you asked for. But it makes no claims as to what's actually at that address, and so we have different garbage. Because we've been handed a different address. All right, let's get to the point where I actually screw up. I check the string length, if I print N, it is indeed 3, now here's my loop. So you know what, let me go ahead and type next here, okay, that seems to be fine. Next here, let me do a quick sanity check and print S 2. And really, I'm just futzing around, oh, but some progress. Now I at least have an F at the start of S 2. Let me go ahead and type next again, next again, now let me print S 2 again. Okay, F-O-O -- oh, that's an accented O, so it looks coincidentally like garbage and not an accented O. So let's do it one more time. Next, next, now let me print S 2. Good. There's still some garbage. So let me do one more iteration here, and now print S 2 -- ah ha. Why does the string all of a sudden look perfect? Because there is a null character. And one of the things GDB does for me is realize, oh, if I see character, character, character, backslash 0, let me just show the user the thing before the backslash 0.
Now if I go ahead and type continue, I'll see that something bad happens here, there's mention of heap over here, which is an interesting keyword, let me go over here. Oh, so this is interesting. Glibc detected an invalid pointer when I called free, some really bad stuff seems to have happened, and apparently it's the result of messing with my own memory. So why is this then a good thing? Well, we now have the ability to do what we want with memory and pretty much anywhere we want, albeit with this downside. So the quick teaser here, at the bottom of this picture, finally, we're now putting back the top of this picture. At the bottom of this picture is the so-called stack. Quick review, what goes on the stack, what kind of stuff? So function frames, which contain the local variables and also functions' parameters. And when we played around with recursion a couple lectures ago, and I just kind of foolishly or naively implemented a recursive program, a function that called itself, and I just let it call itself thousands of times, millions of times. It eventually seg faulted, which again hints at a memory error, because what happened to the stack? Well, it kept putting frame after frame after frame on the stack. But clearly your memory is in reality limited, finite, and in fact most operating systems, as we've said, somewhat arbitrarily say you cannot use more than 2 gigabytes per any given program, and that generally has to do with the size of the int or whatnot that they're using. So in that recursive program, bad things eventually happened when my stack overran this thing called the heap. Well, what is this heap? The heap pictorially is generally just drawn at the top, and this just means that the heap is where dynamically allocated memory is taken from.
Any time you call malloc henceforth, it comes from that portion of memory called the heap, and as the arrow suggests, the more and more memory you allocate with malloc, the closer and closer and closer you get to the stack. So you get sort of these competing beasts, where memory gets allocated by you via malloc. But if you start to push the limit with malloc, you too might end up crashing your program. Now what about those things on top? Well, there's initialized data and uninitialized data. Well, those of you who tackled Problem Set 3 already, a little bit, know that there's at least two global variables. What are those? So board and D for dimension, and in fact, if I go ahead and open this, so I'm going into CS 50, pub source, P set, P set 3, fifteen, fifteen.c. If you've not seen this yet, not to worry, but think -- thank you. The laughter helps, yeah, embarrassing. Good lesson. So this is P set 3's fifteen.c file. So this is one of the framework files that we give you for this Problem Set. And there's two global variables. One is called board, and it's a two dimensional array. So as we said a while ago, you can have a two dimensional array and you just declare it by specifying the width and the height, or the height and the width. Doesn't matter which way you view your world, so long as you do it consistently throughout your program. And actually, just this morning we were corresponding with one of your classmates who very recently, not to poke fun at all, had some weird bug where every time he ran one of his functions, one or more of the elements of his board, one of the numbers from 1 to 15, was just changing inexplicably. It was becoming negative 1 or something weird. And so what the problem actually ended up being was he had in some line of code of his program, hope no one tries to copy the distribution code, because we're now writing on top of it here, he had a line like this.
For int I gets 0, I is less than dim Max, I++, and then he probably had a nested loop or something like this, I'm hypothesizing, partly, J plus plus, but there is a silly mistake like this, and what was happening was when he was allocate -- let's see, B-O-A-R-D, J, gets let's say -- let's say I, just arbitrarily. So there's a bug in this loop. What is the bug that I introduced intentionally? So I'm iterating not while I is less than dim Max, but rather less than or equal to. So I'm actually going too big here. But the interesting thing was every time he did this, oh, that's what it was, it was his variable D that was changing. So D was somehow changing, even though he was never touching the variable D. And yet if you look back now at the proximity of this area of memory called initialized data and uninitialized data, those global variables end up, up there. And if you give them a default value they're put in the slice of memory called initialized data. If you declare, as we've done, two global variables without giving them default values, they end up in uninitialized data. But because I had written these things back-to-back in the file, GCC had laid them out in memory back-to-back, which means if with your loop you iterate ever so slightly too far, guess whose value does in fact get clobbered. D, because it was literally right next to board in memory. And this is the simple explanation. And then finally, the text segment up there, it's a weird name but it's historically accurate, what is the text segment of your program? So those are actually the zeros and ones that compose your program on disk. When you compile a program into a.out or any executable and then run it at the prompt, well, the computer has to be able to read those bits back through its CPU, and it puts them at the very top of this chunk of memory. So there's an interesting danger now that arises with the fact that most stuff in memory is allocated one after the other.
The stack, the fact that it does this, is very nice, conceptually neat, but it's very dangerous because adversaries or malicious coders can actually exploit this. And so one of the most common ways of hacking into a system, even to this day, one of the most common ways of breaking into a web site or some other piece of software, is to feed a program more bytes than it's expecting. So as this student did accidentally by stepping over the bounds of his array into a variable D, what people generally do, and sometimes it's by trial and error, is they try to input not just some normal text like foo, but they try to paste into a program essentially executable code. Zeros and ones that they wrote, that if you can somehow slip them into a program's memory in the right place, those zeros and ones might actually get executed. So if you've ever had a friend who downloads a crack for some piece of software to circumvent the serial number or whatever, that's often the result of someone figuring out exactly where in code, now everyone's attentive, where in code that if condition is or where that variable is, and somehow trying to clobber its value or tell the computer, you know, unbeknownst to it, to move to a different function altogether and not that particular one. Even to this day, there's so much software out there that's still written in C, C++, and similarly dangerous but powerful languages, that maybe 50% of the time exploits are still the result of buffer overflow attacks. And in fact, when the iPhone had first come out a couple of years ago, those of you -- some of you might be familiar with this idea of jailbreaking it, and being able to put your own software on it, pull it off of AT&T's network and put it on to T-Mobile.
And two years ago in 2007 we actually distributed in CS 50 the crack code for the iPhone, because someone had written it in C, posted it on the Internet, and by running this C code on your iPhone, you could take advantage of a stupid mistake an Apple developer accidentally made -- that did not check the bounds of an array. And so this very clever or malicious person who wrote this code was able to take advantage of that, and to insert into memory data that was not expected. Let's take a five minute break. >> All right, welcome back. So this was the article, actually, two years ago, that I think I read, like the morning of lecture, and it was -- this was posted on a popular web site called [Inaudible] gadget, iPhone, iPod Touch jailbreak code posted, and I went into our CS 50 archives. And here is in fact that code. I'll link to this on our web site. I mean, this was publicly available, we're not really doing anything nasty here. But what is perhaps the academic value of glancing at this is that although we'll just glance at the code, all of the code that this fellow wrote just boils down to some basic C constructs that we've been teasing apart the past couple of weeks. So this is the code, exploit for the iPod, iPhone, by talk carta, Drey, and Niacin. Credit for the discovery goes to Tavis [Phonetics]. All right, so we have some sharp includes up here. This is actually a C++ thing, so for those familiar or unfamiliar, C++ is kind of like a new and improved version of C, a superset, if you will, with additional features. They're definitely distinct, but they're very much related. Here we have a function, I mean, this stuff is all fairly familiar, we've got some parameters defined there, a for loop, oh, and this is interesting, we'll actually glance at this briefly today. C allows you to declare something called a struct. So there will come a point very soon where it's insufficient to represent pieces of our program with just chars and ints and strings.
You kind of want to represent a whole student object or a whole person object, you want to represent an entity that might have multiple components to it. And you can do that with this piece of syntax called a struct, and define your own data structures. If I scroll down here we see some more struct stuff. Where's main, where's main. There it is. So there's the familiar main. So this fellow would not get very good points for style, right? I have yet to see a single comment in here, but maybe that's one of his points. So we get some hexadecimal, apparently that comes in useful and handy when cracking iPhones. And then there's some fairly esoteric looking stuff, but as I was talking with one of your classmates during break, what often is the workflow, and I don't know the specifics of this particular attack vector, is for people just to bang on programs, or bang on web sites, and by this I mean give them a billion characters and see what happens, give them zero characters and see what happens. Generally, bang on someone's code, which we encourage you to do to your own, because the teaching fellows will then do that themselves, because very often if you push hard enough, you start to find weak points in programs. And in fact, what adversaries generally thrive on is finding bugs in programs that cause them to crash. Because even though you, the human, the user, might actually get really annoyed when your program or some web site crashes as the result of some unintended behavior, that really makes an adversary drool, because generally, as we've even started to see here, when there are mistakes made you can induce unintended behavior, and sometimes by injecting data that shouldn't actually be there. And so what happens often is trial and error, you bang on a web site, bang on a program, voila. It crashes. It core dumps. Then the fun starts. At least in the eyes of these folks.
Because then you can finally start to think about why did it crash and how can I maybe take advantage of that. Now in the case of iPhones and things like this, this really is a cat and mouse game, right? Because Apple quickly fixed whatever exploit, whatever bug this fellow took advantage of, but it's a cat and mouse game, and that's -- this has been done several times since. So it's much -- the adversaries, frankly, probably have the advantage in all of this because we the -- well, we the good guys have to fix every one of our mistakes to keep our code and our programs and our data safe. The bad guys just have to find one flaw or one mistake. So the tables are by nature fairly imbalanced. I thought I would show you this too, this was shared by a classmate. Now you have a bit more comfort, perhaps, with crazy-looking syntax, this is linked, too, on the course's lectures page. And it's kind of a witty, jokey thing someone spent a lot of time on, coming up with different snippets of code that different personalities or people might write. So someone in high school or junior high back in the day might have written a program in a language called BASIC, that as I said has explicit line numbers, looks a little something like that. Then maybe in college it gets a little more sophisticated. After that, maybe in CS 51 you start using a language called Lisp or Scheme, looks a little like that. Then you start to become a professional, and the code starts to get longer. A lot of the examples just get a little crazy. But it speaks to this tendency to overengineer a problem, this one written on a Windows platform, just a ridiculous amount of code. All of these programs implement hello world. Then things start to come back down to meet reality, apprentice hacker. So the geeks in the room might actually like to play with some of these, because you can run them on nice, most of these programs, if you know how to compile them or how to interpret them.
But I think it gets funny sort of toward the bottom, where it becomes a little Office-like or Office Space-like. So now we have the new manager printing something in BASIC, middle manager, if you're familiar with a little command like this. Senior manager, and then the chief executive, I thought was the funniest. [ Laughter ] >> So, fun with geeky humor. Okay, so we promised to take the hood off of the CS 50 library, and this is what you've been taking for granted all this time. But now we're in Week Four of the course, and even though some of the techniques you're about to see are a little more sophisticated than we'd see in current problem sets, you can at least now maybe take on faith that certain lines of code do what the comments say they do without getting perhaps a little overwhelmed by what -- just a couple weeks ago was entirely new stuff. So what I'm going to do is go back into today's source directory. You do have a printout of this. And this file again that we've been including via the preprocessor directive called sharp include looks a little like this. It's mostly comments up top, but then notice we are using a couple of values that are defined in these libraries here, float.h, limits.h, the boolean data type if you used it. We actually ripped that off from a standard library called stdbool.h; C did not originally come with a boolean data type. People just used ints and used zero and one. But eventually people came up with a typedef for that. We define string in here, and then down here are just the prototypes or declarations for all these various functions, and the documentation for them.
And as an aside, you won't have to follow these sets of instructions at the top of this file, but just so you know, and try to remember this eight weeks hence, when you leave CS 50: if you ever try to recompile your code on another Linux box or PC or Mac, you may have some trouble with the compilation if you don't actually have copies of cs50.h and cs50.c. We install them sort of automatically on nice.fas in a way that just makes things work. So we included, just FYI eight months hence, directions up here. They're arcane, but they'll be a little more familiar in a few weeks' time: how you can continue using the CS 50 library, or in spirit any library like it, down the road. Essentially, you have to create the equivalent of a .o file and then move it to a special place on your actual Mac, Windows or Linux computer. So let's now look at the source code for cs50.c. So we copied and pasted a lot of the same documentation; let's scroll down to GetInt, since that's perhaps one we've been using a lot. It's not many lines of code. So I stripped out the comments here. But let's see if we can't reason through what's going on. So while (1) induces what we would generally call, fill in the blank, an infinite loop, right? So infinite loops, often bad, not necessarily. Sometimes it is actually a very reasonable construct to code deliberately, so that you can do something in perpetuity until something is true, or false, at which point you can break out of that loop, as we actually do, by returning from within. So there are different ways of doing this, but one approach is to embrace the infinite loop and just make sure logically that you will at some point break out of it, if that's the intention. So it turns out that GetInt actually uses our own version of GetString. So the first time Glen and I sat down to write this code we, you know, we too started copying and pasting. Then we realized wow, GetInt is really similar to GetFloat.
And GetFloat is really similar to GetDouble. Maybe we should factor out these lines of code that we keep copying and pasting into every one of these functions. And so we did, and thus was born GetString. So here we're leveraging a function we wrote elsewhere, and this factoring out of common code is perfectly consistent with the message we've been trying to send, that once your code starts getting a little longer, a little unwieldy, then you can start plucking out portions of it, just like we've done in the Game of Fifteen. The fact that you're handed a file with a whole bunch of functions, each one of which does something conceptually distinct, speaks to this idea of good design and modularity. So how are we leveraging this? We get a string up here, we store it in the variable called line, then we do a little sanity check: if line is null, return INT_MAX. So here's where you trip over a little limitation or frustration of C. If something goes wrong in a function that's supposed to return an int, how do you signify an error? Well, if you return zero, that suggests that you can never actually get the number zero from the user. So let's pick something else. Maybe we return 123. 123, any time there's an error. But the problem then is the user can never type in the number 123, because you can't distinguish that from an error. So in short, you have to pick some value. Presumably, you're not going to pick one smack-dab in the middle of the range; you might pick something like negative 1 or 0, or as we've done, the largest int possible. We decided, you know what, there's an upper bound on the ints we can store with just 32 bits anyway, let's steal one of those, let's steal the very biggest of them, 2 to the 31 minus 1, and just say to the user, sorry, you cannot type in 2 billion, give or take. You can use anything smaller than that. So this constant, as suggested by the capitalization of it, is actually defined in one of these header files included in cs50.h.
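To make the sentinel idea concrete, here is a minimal sketch, not the CS 50 library's own code. The function name to_int_or_max is hypothetical, and it swaps in the standard strtol function for the parsing, purely to show how one value of the range gets "stolen" to mean error.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// For contrast: atoi signals "no number found" by returning 0, so the caller
// cannot tell the input "0" apart from the input "monkey".

// Hypothetical helper (an assumption for illustration, not part of cs50.c):
// returns the int spelled out in s, or INT_MAX as a sentinel on any error.
// Unlike the lecture's version, it is strict about trailing spaces.
int to_int_or_max(const char *s)
{
    if (s == NULL)
    {
        return INT_MAX;
    }

    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);

    // Error cases: no digits at all, leftover text after the number, or a
    // value that does not fit. INT_MAX itself is rejected too, because it
    // is the value we have reserved to mean "error".
    if (end == s || *end != '\0' || errno == ERANGE
        || value >= INT_MAX || value < INT_MIN)
    {
        return INT_MAX;
    }
    return (int) value;
}
```

A caller would then compare the result against INT_MAX before trusting it, the same way one checks a string for null.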
So in float.h and limits.h, there are just some constants that someone else already defined for us that represent the largest possible values, like the largest int, that you can represent in a C program. So we said, you know what, let's just return that value. So all this time, any time you all have been calling GetInt and just using its value, you know, you've never actually been doing the right thing, which is to check and make sure it's a legitimate value. Technically, any time you've been calling GetInt, just like we've been calling GetString and checking for null, you really should have been checking for INT_MAX. But frankly, in the first weeks of the course, it really gets a little distracting, I think, if you're constantly checking and getting. So we ignore those details, but now in the documentation you can see: if a line can't be read, if any error actually happens, it returns INT_MAX. And so this is again one of these artsy [Inaudible] moments. You just have to know what the possible return values are so you can handle any errors accordingly. All right, so let's assume there are no errors, because frankly this happens very, very rarely in most contexts. So what do we do next? Well, this is a new function. sscanf. String scanf. So this function takes a string as input and scans it according to a format. And we're using this function to figure out: did the user type in an int, did they type a word, a char? We need to use some format codes, and just like printf uses format codes, so does sscanf use format codes like this. So this is a little trick, and we won't spend too much time on the details of this, but just so you have a taste of what this library's been doing and how it's been saving you trouble, or giving you trouble, perhaps, notice that sscanf is going to scan the line the user types in, and it's going to look for this pattern.
It's going to look for space, and actually, because the documentation of sscanf says this, not just one space. The fact that I intentionally included a space right there means allow any amount of white space, even none. The user can hit the space bar as many times as he or she wants. Then look for an integer, then look for white space again, any number of spaces, then look for a character. And this is actually kind of a clever trick that we did here. So these two variables here are the clever trick that lets sscanf essentially return two values. So remember, a C function can't return two values, it can only return one. You can only have one thing on the left-hand side of the equal sign, the assignment operator, but we've had, as of Monday, this goal, this desire to actually change multiple values at once. i and j, a and b, x and y. So what am I doing with n and c, which are just primitives, an int and a char? What am I doing with this ampersand, what am I really passing into sscanf? Yeah, a pointer. So the address of n, the address of c, which means now that I've handed him this road map to those variables, sscanf can put anything he wants in those variables, which means he effectively can return two values, but that's an abuse of the terminology. He can alter or modify or mutate two values, and that's the goal here. And now finally, I know from the documentation that sscanf returns the total number of variables that were read into. So I've handed it two, n and c, but sscanf doesn't have to fill those variables, because frankly, if the user just hits enter, or a lot of spaces and no ints and no chars, I mean, he has nothing to put in n or c. So sscanf returns the number of variables that were actually filled with values. So why am I doing this? Like, why the %c? Clearly, I want just one of those to be filled. Right? Because I'm checking here for equals-equals 1. I only want one variable filled, the first one, n.
What does it mean if the return value is 2, do you think? So there's a character, right? Just logically, if the return value of sscanf is 2, not 1, that means both n and c were filled with values. But what does that mean? Well, according to the format codes, the number would have been put first, because of the %d and because of the n. So if the user types something else, this something else, the first character of it, would at least end up in c, and so what this means is that if the user is trying to mess with me by typing something like 4 and then again my random string of the year, monkey, it's going to detect 4 and then m, but that's an error, right? Because we only want the number 4. So if the return value of sscanf is anything but 1, that means the user typed in, yes, an int, but then some garbage. Or the user types no number at all, in which case the return value is 0 or negative, in which case it's also an error. So what do we do in the case of an error? We free the line, so we free the memory that we just got by way of GetString, and then we tell the user to retry. So any time you've seen in your own programs retry colon, it's simply because we've hard-coded it into this function here. But if sscanf does in fact return 1, what do we do? Well, we don't need the line of text, I don't need everything the user typed, I don't need any spaces or whatnot, I just need n. So I go ahead and return n. And now a quick distinction, to be clear. GetString allocates memory dynamically. With regard to our little picture from before, where does GetString take memory from? So the heap, right? So GetString, or in turn malloc, takes all of its memory from this area called the heap. By contrast, where do n and c live in memory? So they live on the stack. And we knew that from a week or so ago. Any local variables live on the stack. But the problem with the stack is that as soon as a function returns, as soon as GetInt returns, the stuff from the stack goes away, gets obliterated.
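The sscanf check walked through a moment ago can be sketched as follows. This is an illustration under assumptions rather than the library's exact source: the helper names fields_filled and is_single_int are hypothetical, while the format string " %d %c" and the comparison against 1 follow the lecture's description.

```c
#include <stdio.h>

// Hypothetical helper: reports how many of the two variables sscanf filled.
// " %d %c" means: skip any white space, read an int, skip any white space,
// then read one more character if anything is left over.
int fields_filled(const char *line)
{
    int n;
    char c;
    // &n and &c are pointers, so sscanf can store into both variables even
    // though a C function can return only one value.
    return sscanf(line, " %d %c", &n, &c);
}

// Hypothetical helper: true only when the line holds an int and nothing
// else but white space, which is exactly when one variable gets filled.
int is_single_int(const char *line)
{
    return fields_filled(line) == 1;
}
```

With this sketch, "4" and "  4  " each fill one variable, "4 monkey" fills two (the 4 and the m), "monkey" fills none, and an empty line makes sscanf return EOF. Everything other than 1 is treated as a reason to retry.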
It's no longer safely there. And this is why the heap is so useful and so neat, because on the heap, memory stays where it is and it doesn't get automatically overwritten as it does on the stack. Now granted, if you try to use too much memory it will accidentally get overwritten, but not automatically. It's the stack whose memory is reused constantly, as soon as a function returns. Which is to say, do I ever need to free a local variable like n or c? So no, it would be an error to free anything on the stack. You only call the free function on something that's allocated on the heap, and that's why I only free the line, and I only free the line here. Well, let's now tease apart GetString, since that's apparently what's making all of this possible in the first place. So GetString is a little longer, we won't dwell on all of the details, but just to hint at the flexibility here. So GetString first declares a string called buffer, but it initializes it to null. And that's just good practice. You will find that even if you don't need to use a variable yet, you will save yourself a lot of time over, you know, the course of your lifetime by intentionally initializing variables to values, so you know in fact they are something. Then I have a capacity local variable, an n local variable, and a character. Now this looks like bad style, but it's just because I've stripped out all of the comments from today's code. And it looks like there's this function: there is a function in C called fgetc, file get character, and stdin, standard in, refers to your keyboard. So what this is, is a loop that says keep reading character after character from the keyboard, store it on each iteration in a local variable called c, and then, and this is our clever one-liner, make sure that c does not equal the newline and does not equal the special EOF, end-of-file, value that you get if the user hits, say, control-D on most computers.
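That one-line loop condition can be sketched on its own, before any dynamic memory enters the picture. This is a simplified illustration, not the library's code: the name read_line_fixed and its caller-supplied, fixed-size buffer are assumptions, and it takes a FILE pointer (stdin would be passed for the keyboard) so it can be tried on any stream.

```c
#include <stdio.h>

// Hypothetical helper: reads characters up to a newline or end of file and
// stores as many as fit in dest, which has room for capacity bytes.
// Returns the number of characters stored (not counting the terminator).
size_t read_line_fixed(FILE *stream, char *dest, size_t capacity)
{
    size_t n = 0;
    int c; // int rather than char, so EOF can be told apart from real input

    // Assign the next character to c, then compare it, all in one condition.
    while ((c = fgetc(stream)) != '\n' && c != EOF)
    {
        // Leave one byte free for the terminating '\0'; extra input is
        // read but dropped, which is the limitation a growing buffer fixes.
        if (n + 1 < capacity)
        {
            dest[n++] = (char) c;
        }
    }

    if (capacity > 0)
    {
        dest[n] = '\0';
    }
    return n;
}
```

Because dest here would typically be a local array on the caller's stack, there is nothing to free afterwards, in contrast to the heap memory that GetString hands back.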
Now this line of code here, which is best explained by the comments, does a really neat thing. It asks whether the total number of characters in the string at the moment, plus 1, puts us over the capacity of the string. In other words, we have this variable called buffer, it's just a bunch of bytes, and currently there are no bytes there, because I haven't allocated anything yet. If the next byte would overflow the so-called buffer of memory, just intuitively, what had we best do, if we want to be able to grow dynamically? The buffer is this big, the user just typed something in, I need to go grasp for more memory. Now let's fast-forward a few moments in the story. Suppose that I now have room for 10 characters, but the user types in 11. Problem is, I can't just overwrite the 11th space, because I'm going to overwrite the backslash 0. So what GetString ultimately does is it starts with a buffer like this, and the more and more characters it finds at the prompt by way of this fgetc function, it decides, oh, you know what, let me double the buffer size. Let me double it again and then double it again. And then double it again. So it's literally growing dynamically. It's using a function called realloc, which has the effect of reallocating memory if it's possible. You hand it a pointer, say I don't want this memory any more, I want more. Give it back to me. So that's what this ultimately does, and then finally, at the end of this function, we're sort of a good neighbor. So we then decide, you know what, if I've just been doubling and doubling and doubling this buffer, but now I actually didn't need all of that memory, I just needed an extra one or two bytes, not all of the memory over here. These three lines of code here, as you'll see in your printouts if you reason through it, simply allocate just enough memory using malloc, using n plus 1 to leave room for the terminating backslash 0.
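The grow-by-doubling and trim-at-the-end steps described above can be sketched like this. It is a sketch under assumptions, not the text of cs50.c: the name read_line_growing, the starting capacity of 32 bytes, and the choice to return NULL at end of input are all illustrative.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: reads one line of any length from stream and returns
// it in heap memory sized to fit exactly. The caller must free the result.
// Returns NULL on end of input with nothing read, or if memory runs out.
char *read_line_growing(FILE *stream)
{
    char *buffer = NULL;  // initialized on purpose, so free(buffer) is safe
    size_t capacity = 0;  // bytes currently allocated
    size_t n = 0;         // characters currently stored
    int c;

    while ((c = fgetc(stream)) != '\n' && c != EOF)
    {
        // Would one more character overflow the buffer? Then double it.
        if (n + 1 > capacity)
        {
            size_t bigger = (capacity == 0) ? 32 : capacity * 2;
            char *temp = realloc(buffer, bigger);
            if (temp == NULL)
            {
                free(buffer);
                return NULL;
            }
            buffer = temp;
            capacity = bigger;
        }
        buffer[n++] = (char) c;
    }

    if (n == 0 && c == EOF)
    {
        return NULL;
    }

    // Be a good neighbor: copy into a block of exactly n + 1 bytes (the
    // extra byte holds the terminating '\0') and release the oversized one.
    char *minimal = malloc(n + 1);
    if (minimal != NULL)
    {
        if (n > 0)
        {
            strncpy(minimal, buffer, n);
        }
        minimal[n] = '\0';
    }
    free(buffer);
    return minimal;
}
```

Doubling keeps the number of realloc calls small even for very long lines, and the final copy means the caller never holds more memory than the line needs.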
We use a function called strncpy, string n copy, which copies one to the other, and then we free this excessively large buffer so that what we hand you, the student, the user, is only as many bytes as you actually need. And so this little exercise today of finally peeling back the layer and looking underneath the hood is precisely in the spirit of taking these training wheels off. And next week we dive into yet more powerful techniques still. We'll see you on Monday. ==== Transcribed by Automatic Sync Technologies ====