[MUSIC PLAYING] DAVID J. MALAN: This is like a freshman seminar today. OK. So very rainy out. This tends to happen on Wednesdays, but all the more opportunity for questions today. So let's start off actually with the film in just a moment. But we'll start grandly as always. This is CS50, and this is the end of week 4. So if you've ever watched TV or a movie wherein there's some computer experts and the police, or FBI, or some agency is trying to catch some adversary, well, you've probably heard the expression "enhance," whereby that technician somehow magically zooms in infinitely far to see the criminals identity or the license plate number in even the shimmer of a mirror or the glint of someone's eye. So indeed, let's take a look at a few such scenes from Hollywood. [VIDEO PLAYBACK] -OK, now let's get a good look at you. -Hold it. Run that back. -Wait a minute. Go right. -There, freeze that. -Full screen. -OK, freeze that. -Tighten up on that, will you? -Vector in on that guy by the back wheel. -Zoom in right here on this spot. -With the right equipment, the image could be enlarged and sharpened. -What's that? -It's an enhancement program. -Can you clear that up any? -I don't know. Let's enhance it. -Enhance section A6. -I enhanced the detail, and-- I think there's enough to enhance, release it to my screen. -I enhanced the reflection in her eye. -Let's run this through video enhancement. -Edgar, can you enhance this? -Hang on. -I've been working on this reflection. -Someone's reflection. -Reflection. -There's a reflection of the man's face. -The reflection. -There's a reflection. -Zoom in on the mirror. -You can see a reflection. -Can you enhance the image from here? -Can you enhance him right here? -Can you enhance it? Can you enhance it? -Can we enhance this? -Can you enhance it? -Hold on a second, I'll enhance. -Zoom in on the door. -Times 10. -Zoom. -Move in. -More. -Wait, stop. -Stop. -Pause it. -Rotate us 75 degrees around the vertical, please. -Stop. Go back to the part about the door, again. -Got an image enhancer that can bitmap? -Hey, maybe we can use the Pradeep Sen method to see into the windows. -This software is state of the art. -The eigenvalue is off. -With the right combination of algorithm-- -He's taken elimination algorithms to the next level, and I can use them to enhance this photograph. -Lock on and enlarge the z-axis. -Enhance. -Enhance. -Enhance. -Freeze and enhance. [END PLAYBACK] DAVID J. MALAN: All right, so all of those are actually words. They're just strung together in a way that's not actually sensible. And, in fact, CS50 and courses like it tends to ruin a lot of TV and movies for you. Because when those computer experts are rattling off terms and saying fancy things like eigenvectors, and the z-axis, and any number of other actually more technical terms, they're really just stringing words together all too often. Is that one of our hopes is that, as a side effect of taking courses like this, will more people in the world actually be able to weigh in and just ever so slightly influence the quality and accuracy of those films? In fact, let's take a look at reality. So here is the staff photo of Mary, one of our teaching fellows. And suppose she is suspected of something. And yet, there's a glimmer of some piece of evidence in her eye, or in the reflection of her eyeglasses. Well, if we do exactly as the films propose, wherein we zoom and "enhance", this is how much information is in Mary's face when you capture an image with that original resolution. And, in fact, you can see these dots. And these are what are called pixels, P-I-X-E-L-S, which is just a square typically that is a dot that composes an image. And back in the day, and actually even today with some of today's LED TVs or LCD TVs, if you've got one in your room or at home, if you go up super close to it, and especially if it's a somewhat older TV, you can probably even see these dots and that's what compose an image. And there is no more information than this. We could "enhance", in the sense of smoothing things over and sort of inferring kind of, sort of what color should be next to Mary's eye so that it's not actually so pixelated. But if I keep zooming in, there is the bad guy in her eye. Like that is all the information we have. You cannot create information out of nothing. There's only a finite number of bits there. So in Problem Set 4, where you have an opportunity to play with this kind of world. In Problem Set 4, you'll explore the world of graphics, and forensics, and actually write code that recovers lost images. You'll write code that manipulates existing images and ultimately understand what's going on underneath the hood. And, it turns out, it's actually not all that complicated. For instance, if we wanted to represent a smiley face where with these black pixels, or these black dots, well, we could simply represent them as truly a bitmap. And if you had ever heard that expression bitmap, perhaps it now starts to make a little more sense today. We already know what a bit is. It's 0 or 1. And a map is just something like a piece of paper that gives you directions and has maybe a grid of x- and y-coordinates. So here is a bitmap. It's a map of bits whereby a 1 is apparently going to represent a white pixel, and a 0 is going to represent a black pixel. But we could certainly flip it around. It doesn't really matter so long as we're consistent. And here is how, in binary-- inside of a computer's memory, or even inside of a file on your hard drive-- could you store the simplest of smiley face images. But what are we, of course, lacking in this image? Color, right? It's an obvious next step or enhancement to improve this with color. So unfortunately with just a single bit, 0 or 1, we could represent color. That could be red, or blue, or black, or white, or green, or pink, or any pairs of colors. But for simplicity's sake, we'll just assume black and white. So what logically do we need if we want to implement color in an image? What do we have to do? Like if the limiting factor here is that with one bit you can only represent two states, 0 or 1, white or black, what do you want to do? AUDIENCE: More data. DAVID J. MALAN: More bits, yeah more data, more bits. And, indeed, that's exactly how color images are represented. Rather than use a single bit, a 0 or 1 for each pixel, each dot, you just use multiple. Maybe use 8, maybe, more commonly use 24, and indeed, in Problem Set 4, will you play with a file format that uses 24 bits typically. But most of you are probably familiar with JPEGs. If you've ever taken a photo on your phone, or uploaded or seen something on Facebook, or Flickr, any number of photo-based websites, you've probably seen a JPEG image before. And it turns out, this is the file format we're going to use in PSet 4, whereby you're going to have to recover images that I've accidentally deleted from a corrupted memory card in the camera, if you will. And it turns out that even though JPEG is pretty sophisticated-- it's much more sophisticated than the black and white dots we saw a moment ago, because there's actually fancy algorithms that are used to compress a JPEG, so that you can have a really nice, quality picture but using relatively few bits. And we'll come back to compression before long. It turns out that the first three bytes in a JPEG image-- no matter what you've taken a photograph of-- are the values 255, 216, 255. In other words, if you just see that pattern of bits, represented here as three bytes, or 24 bits total, with high probability you can infer that you are looking at it this first three bytes of a JPEG. And this is what's known as the signature of a JPEG. A lot of file formats out there tend to start with certain patterns of 0s and 1s, so that Windows, and Mac OS, and iOS, and Android know what kind of file they are, in addition to the so-called file extension that a lot of files have. If you have .jpg, that's another clue to the computer. So let's now look at this a little more technically. We know the decimal system is 0 through 9. We know binary is 0 and 1. And if you think back to PSet 0, we had you wrestle with, for a little bit, something called hexadecimal, where you have 16 digits, instead of 10 or instead of 2. And those digits, by convention, are 0 through 9 and then a through f, where f represents what decimal number, just as a quick sanity check? So, 15. And a must represent 10, just by nature of the ordering that I've given. It's just an arbitrary convention, but it's quite standard. So if we look at this pattern of three bytes-- let's just start to look at it in a manner consistent with how computer scientists generally look at and think about files. You can certainly think about files in 0s, and 1s, and decimal, but in reality, we tend to use binary or more typically hexadecimal-- back from PSet 0. So let me propose that 255, 216, and 255 are just these patterns of 0s and 1s. And you can check this if you want to do the math from Week 0. But, for now, just assume that this is indeed correct. I've just rewritten three decimal numbers as three binary values. Now what I'm going to do is just add some white space, just for readability's sake. And notice, I'm just going to move things apart. So before, after, before, after. I'm doing nothing interesting other than just spreading things out so that notice each set of eight bits is now two sets of four bits. This is useful because hexadecimal is particularly fashionable because each hexadecimal digit 0 through f, or more specifically 0 through 15, can be represented with exactly four bits. In other words, in hexadecimal if you want to represent a 0, it's just 0000, four zeros. And if you want to represent 15, it's 1111, which is four bits. And if you do the math, if this is the ones place, this is the 16s place, that's going to give you-- rather that's going to-- sorry, in binary, that's going to give you 15, ones place, twos place, fours and eights place. So let me propose that that set of four bits on the left is what we're going to call f. It's the biggest number you can represent with four bits. And we already know from hexadecimal, f is the biggest digit in hexadecimal. We've got another f there, two more over there. And for now, just take on faith that I have done the math right and that the left half of those bits, 1101, is the same thing as d in hexadecimal. And the right hand, 1000, is just 8. And that one's easy to see, right? The 8 represents-- is right underneath that eights place. So we have one in the eights column and nothing in the fours, twos or ones. So now more conventionally, humans tend to write hexadecimal digits like this, you just squish them together, and then you prefix them with 0x. It means nothing other than a visual clue to a human-- here comes a hexadecimal value-- because it might not otherwise be obvious. Which is to say, ultimately, that the pattern of zeros and ones, or the pattern of hexadecimal digits equivalently that you're going to start looking for in Problem Set 4 is this-- and the Problem Set 4 spec will walk you through this in more detail-- but realize as sort of arcane as this might look at first glance, you're going to start seeing this a lot. And in fact, even in GDB, the debugger we introduced on Monday and Dan introduces in PSet 3, is going to often show you hexadecimal values just because they tend to be more conventional than decimal or binary in the world of computers. Now let's put this into context. Many of you might remember this picture here, which came from what? Vista, so even earlier than that, Windows XP did this debut. So this is a beautiful landscape. And in fact, if you poke around online-- I think it's a Wikipedia article, wherein someone very amazingly went out found this location in the world set up his or her camera in precisely the right place-- and this today looks like-- but it's exactly the same setting. This image, though, is in a file format called bitmap, b-m-p. And we're going to take a super quick glance at what that means. But bitmap is just a different way of representing images still using pixels in 0s and 1s, ultimately. But at quick glance, it has a more interesting signature at the beginning of the file. It's not just three bytes, rather there's a whole bunch of patterns of bytes that have predetermined meaning. For instance, somewhere in the first few bytes of a bitmap image is going to be the size of the image, the width of the image, the height of the image, so useful metadata, if you will. Useful information that Photoshop or any graphics program you're using might actually care about. So more on this in Problem Set 4, but this is only to say that at the end of the day all the file formats you've been using for years-- Microsoft Word files, Numbers files, Excel files, any number of file formats that might have some known file extension are just 0s and 1s underneath the hood. And humans have decided what the conventions are, what patterns of 0s and 1s represent a Word file versus an Excel file, versus any number of other file formats. So in PSet 4, you'll have an opportunity to play with that. But what does it mean to have a struct. This is actually a nice segue now into C, which has only a couple of additional features that we haven't looked at yet. It's a pretty small language and one of the nice features about C is a struct. For instance, if you wanted to represent-- let's say you wanted to have a variable that represents a student in some program. Maybe you were writing a course registration program, or core shopping tool, or something like that. What are pieces of data related to a student that come to mind? Like a student is represented with what values? Yeah? You have a name as a student. What else does a typical student have? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: So, sorry. AUDIENCE: Age. DAVID J. MALAN: An age or birthday equivalently, yep. What else? AUDIENCE: ID number? DAVID J. MALAN: So an ID number, maybe a phone number, maybe a dorm, or house, or college, or something like that. Any number of pieces of data that you might have in your contacts list is what might define a student. So if we wanted to do this, in code, we might do something simple like this. We might have a program so that has let's say, int main(void). And if I want to represent a student I might have, for instance, a string called name for that student, a string called dorm for that student, maybe an int called ID for that student. And because I'm using string, I need to go back and put up CS50.h. Maybe I'm going to need stdio.h. So let me preemptively do those and I'm going to call this student.c for now and save this. And now I can do something with these variables. And we're just going to write that as a comment in pseudo code, because it's not interesting what we do for now. OK, so this is a program that somehow stores a student. What do I want to do if I want to store two students? So my first instinct is going to be all right, wait a minute, if I have another student why don't I just do string name 2 , string dorm 2, int id2. And we've done gone down this road before and what was our solution to what seems to be kind of a hackish copy paste job here? AUDIENCE: An array. DAVID J. MALAN: Yeah, we could use an array. Right this very quickly becomes unwieldy. You have to sort of arbitrarily start naming all of these variables. And you, the human, have to keep track that OK name2 corresponds with dorm2 corresponds with id2. It just becomes a mess. So it's a lot easier, recall from a few weeks ago, to just having to called string names and maybe give us three of those. And then maybe we have string dorms and have three of those, or with a constant, int ids and have three of those. But even now this feels a little sloppy, right. We're talking about students and yet I'm really dwelling on the low level implementation details. The student is a name and a dorm and ID. Why can't I just declare a variable called student and call it s. And if I want another student, why don't I just call it t. Or if I want a whole bunch of students, why don't I just say I have a whole class of students, and it's three of them. In other words, why can't I come up with my own data type, called Students, inside of which is a name, is an ID, is a dorm, is any number of other fields. And it turns out you can do exactly that. So C has this feature called struct. That's a language feature that allows us to do exactly this. I'm going to go ahead and open up structs.h where we're going to see the following definition of a student. It turns out--and this one's even simpler than the one involving an ID a moment ago. If you want to come up with your homemade data type, and in addition to int, and char and float and all these others that exist, you can do so by literally writing typedef struct, then some curly braces, inside of which you list the variables you want to associate with this new custom data type like a name and a dorm, and then after the curly braces you give a name to the new data type. So, for instance, student. And what's nice about this now is that if we look at the corresponding code, the convention, first of all, is to put this in a file called something dot h, a header file, which we haven't started using ourselves too much. But we're going to start using quite a bit now. And what we can do with this, ultimately, in these few lines of code is declare exactly that data type, a student. And now let's use it. I'm going to now go into a file called structs1.c. And let's take a look at a few characteristics here. So the stuff up here is mostly familiar, and we'll come back to what is not familiar in just a moment. This of course is including my own header file, which is new as well, except for PSet 3 where, recall, we have helpers.h. So you might recall #include helpers.h. Why though am I using quotes instead of angled brackets? When do I choose between them? Almost always I seem to use angled brackets. And then, all of a sudden on line six I'm using double quotes. Why might that be? Yeah? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: That's an actual, what? AUDIENCE: That's in your IDE. DAVID J. MALAN: Yeah, that's in my actual IDE. And let's not dwell on the IDE, because that's just a tool that I'm using. That's in my current directory, specifically. So structs.h is my own file not installed in the IDE, in the operating system itself, rather it's in my current directory. So the convention is if you want to include your own header file, you just use double quotes. What do we call this thing in line 8, generally speaking? This is what? #define something. This represents constants, right? If you want to have a value in your program that you use a whole bunch of times, it's good convention to factor it out, declare it, with the hash symbol define, then, by convention, in all uppercase word-- though it's not strictly necessary, but it's human convention to capitalize constants so that they jump out at you visually-- space and then the value you want to be equivalent to that constant's name. No semicolon, but you simply follow that pattern there. So what am I doing in this actual code. So let's take a look at the main program here. In line 12 because I have included structs.h, I now have magically at my disposal a new data type. I don't just have access to int, and char, and float, and string, and blue and others. I now have access to a student data type. So in line 12, I'm combining two ideas-- one a custom data type and two, using an array. And so in this program if I want to actually support three different students in my program, I can simply say give me a variable called students, each of which is of type students, which is my custom data type. And, specifically, give me three of those in my array. So now what do we do in this program? Here's just a for loop iterating from 0 to 3, because that's what the value of students is. I'm just prompting the user give me the student's name. And then in line 17, we have a mostly familiar line. We have our old friend GetString on the right. And what piece of syntax is apparently new, if you've never programmed in C before, and have never used the structs? Yeah? AUDIENCE: The .name. DAVID J. MALAN: The .name. But this isn't too much of a leap, because now students bracket i gives you the i-th student. And if you want to dive inside of that structure, you just use a single period and then the name of the variable inside, or the property inside that you want to get access to. Similarly then, if I then prompt the user, give me the student's dorm, you can similarly store that string in the dorm variable inside of that student structure. And now things get a little fancy. And this is going to look at perhaps a lot quite soon. But you'll see this far more in PSet 4, so let's just glance at it now. It turns out that in line 23 through 38, what do you think I'm perhaps doing? I've removed the comments for today, but the version of the code online for reference has all comments. What do I seem to be doing? AUDIENCE: Saving the file with all the information that the user entered. DAVID J. MALAN: Yeah, exactly, this is a new way that we're seeing two, another feature of C, whereby I can create my own files. Thus far, almost every program you've written is stateless. As soon as it's done running, that's it. There's no memory or recollection of it. There's no file saved. But if you do want to save input that has happened, like in a game or a program like this, it turns out we can do so. And you'll see this more in PSet 4 and in Section. But this line 23 essentially creates a file called students.csv. And you might have seen this before. Even if you've never studied CS before, CSV is comma-separated variables. It's like a very poor man's version of an Excel file, which means that it could be opened in Excel and in Apple Numbers, and it has rows and columns. But it's not a proprietary format like Microsoft or Apple's. It's just commas separating the values that we'll see in a moment. And just take a guess. In line 23, at the very end, my second argument to this new function called f open for file open is w. What might w denote? Yeah? AUDIENCE: It lets you write to the file? DAVID J. MALAN: It lets you write to the file. So there's a couple of variants that we can plug in here. But if you just want to read the file, that is look at it and read it into memory, you just use quote unquote "r". If you want to write to the file, you use quote unquote "w". There's also append and a couple of other things if you want to modify existing files. Now we're going to keep seeing this thing, then we'll come back to line 24. NULL, it turns out, is a special value that can be returned by certain functions if something has gone wrong-- if the file doesn't exist, if you've run out of memory, or a bunch of other errors. But for now, let's just assume that this is just conventional error checking. Here in line 26, I'm iterating from 0 to 3 over all my students. And this is kind of sort of a new function, fprintf, but just take a guess. If printf is just print a formatted string, what does fprintf probably mean? AUDIENCE: Print to a file. DAVID J. MALAN: Print a formatted string to a file. That's what the additional f means is file. And the new first argument has to be the variable that represents your file. Then we just have a format string just like printf. And even though this syntax is new, this just means plug in the student's name, plug-in the student dorm, and then with fclose, close the file. And then lastly-- this is new and we'll come back to this before long-- I'm freeing the student for reasons that happened up above there. But we'll come back to that before long-- that's because of how GetString is actually working underneath the hood. So let's take a quick look here. If I type ls in my directory, notice that I do not have a file called students.csv, just not there, does not exist. So if I now compile this program, make structs-1, . /structs-1, and I'm going to go ahead and type in Andi , who lives in Berkeley at Yale. We're going to have Rob who lives in Thayer these days. And let's come up with where is, I think, Maria is in Mather, if I have remembered correctly. So nothing seems to happen. But if I type ls now, there is students.csv. Let's go ahead and open students.csv. This is again a very lightweight file format. But I've simply adopted a convention that I have two rows and columns here. The first column is people's first names. The second column is the student's dorm, or college, or house, or whatnot. And now I've saved this permanently in a file. So it's not all that interesting. But this is just a stepping stone now to being able to persist information permanently. So let's see now what more we can do with these and other features. But first, any questions? That was a lot, and that was fast. But you'll see a lot more in PSet 4, as well. Yeah? AUDIENCE: Is there a way to continue adding names to that file? DAVID J. MALAN: Good question. Is there a way to continue adding names to that file? Yes. And, in fact, if you end up re-opening the file, you would use quote unquote "a" for append, which would just add a new line, a new line again and again, exactly. Good question. Other questions? Yeah? AUDIENCE: If you ran the program again right now, would it keep adding names to the file or would it open up a new file? DAVID J. MALAN: Ah, good question. If you ran the program again right now, maybe typed in new names, would it add to the file or overwrite the file? The latter, because I'm not using append mode. And because I'm just blindly opening the file for writing, it's just going to overwrite the file. So I would indeed need to do is append, if I want to actually have a long term database. Now CSV is useful, frankly, even for like if you're writing-- and we'll eventually see this later in the semester when we use CSVs for other purposes. If you want to store all of the people who have registered for some event, or signed up for your student group, or something like that, storing the data in this kind of format is super convenient. Because literally, if I were to download this file. I could double-- and let's actually try this if I have Excel or Numbers on here. I'm going to right-click or control-click my file. Whoops. Right-click or control-click my file. Come on, my mouse isn't cooperating. Download-- I'm going to download all the files here so just so I can grab this one. And let's see if this works students.csv-- first time I've activated. Now they want to see my contacts. Now, I need to register. See how easy it is to use CSVs? Yes, keep it up to date. OK, now we're ready for class. OK, oh, what's new? OK, close. That was magical. OK, now we have to update. And now, it forgot what file I originally opened, but what a-- there we go. OK, so now we have an Excel file. Thank you. OK, so what I did was the easy part. Of course I could have pre-installed Excel, or Numbers, or whatever program. But this is nice, because now I can manipulate the data in a standard format. So now let's context switch to where we left off last time, which was to start to take off training wheels. But first, you didn't see this earlier lunch is again happening here at Fire and Ice in Cambridge, Sitar in New Haven. Sign up on CS50s website ASAP to join CS50 students and staff. So we took training wheels off on Monday as follows-- string has been declared in CS50s library for some time. And it's nice, because it allows us to talk about variables as being complete words and sentences and more. But it turns out string does not exist. That is just a synonym, or an alias, that we have created for something that actually is a little more technical called a char*. And indeed, we saw an example of a program on Monday that didn't behave quite as we expected. This was the file, compare-0. And recall that compare-0, if I recompile Monday's program and run compare-0 and type in mom in lowercase, and mom in lowercase again. The program insisted I type different things, even though mom, all in lowercase, is identical visually. So what was the short answer for why the computer thinks those two strings are different? Yeah? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Right. So, mom, the first time I type it in, is being stored somewhere in my computer's memory but in a different location than the second time I type in mom. Now it certainly could be optimized. The computer could be smart and realize these two strings, hey, they're identical. Let me not redundantly store it. But computers don't do that optimization unless you tell them to. So, by default, they're just going to end up in two different places in memory. And so to be more clear, when we compared the two strings, the first was called s, the second was called t, what specifically was I comparing here on line 13? Yeah. AUDIENCE: It's the place in memory that the variable will point to. DAVID J. MALAN: Exactly, I was comparing the place in memory that those variables pointed to. So specifically, if mom was at byte number 1, and 2, and 3, and 4-- because remember the backslash 0 needs to be all the way at the end. And the other instance of mom, m-o-m, was at address 10, 11, 12, and 13. I was comparing 1, that address, that location in memory, against 10, which is obviously not the same. 1 is not 10. So this is nice in that it's pretty straightforward. But it's problematic insofar as we can't seem to compare strings. So fundamentally-- and at this low level, if you wanted to implement a program to compare two separate words that the user has typed in for quality, do they line up char for char, just in general terms, what do we need to do, apparently? It's not sufficient just to look at those two addresses. What do we need to do? Yeah? AUDIENCE: Iterate through the string [INAUDIBLE]. DAVID J. MALAN: Yeah, let's iterate through the string. Let's use a for loop, a while loop, or whatever you're most comfortable with. And if we've got two strings somewhere in memory, let's look at each's first character, then each's second character, then third, and fourth, and fifth, until we hit what special sentinel value? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah, the backslash zero, at which point in either string we can decide that's it. Have we matched every single character? If not, return false. If so, return true. And so that's exactly what this version of the program compare-1.c does. It is identical to what we looked at Monday except that I've gotten rid of the word string-- though that has no functional impact-- all I'm doing now is removing some visual training wheels, but to see clearly that s and t are addresses. And that's what the star, the asterisk, represents is an address, otherwise known more technically as a pointer. So when I declare s on line 9 and say char* s, that doesn't mean give me a string. That means give me a variable whose purpose in life is to store an address. Because I am about to put the address of a string into it. And indeed, GetString, to be clear, does not return a string. It does not return mom backslash zero, per se. What does GetString specifically and precisely return? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: An address, the address of the first character in some string it has gotten. And so now we're seeing a special keyword again. And, I alluded to this earlier. This is going to be good convention that we'll see again and again now. I'm checking to make sure that s is not null and t is not null. Because based on my really quick mention earlier, what might mean if GetString returns not an address but N-U-L-L, which is again, some special value? AUDIENCE: Error. DAVID J. MALAN: It's an error. Something went wrong. And what typically might happen, especially with strings-- which might be of unknown length in advance-- maybe the computers' out of memory, maybe you typed in such a long word or sentence or pasted such a huge essay there's just not enough memory. And so GetString can't return the address of the whole thing, so it just returns nothing. And it says an error has happened by returning the special NULL value. It's the zero address, so to speak. Now it turns out C comes with a function that does that iteration. We don't have to implement this with a for loop or a while loop ourselves. We can use a function, called succinctly, stir comp, or string compare, whose purpose in life is to do exactly that. You give it two pointers, two addresses, and it will go to those addresses and then compare letter for letter for letter for quality, stopping only when what is true? When intuitively should stir comp stop iterating, just to be clear? When it hits a backslash 0 in either string, at which point it can decide has everything matched, or has there been a discrepancy? So, if we run this now and try our little capitalization game, so make compare-1, ./compare-1, and type mom in lowercase both times. Now it's the same thing. And if I do it again with lowercase and then maybe uppercase. Now it indeed distinguishes between upper and lowercase. So not all that hard or magical, but it does now explain what's going on underneath the hood. So what more can we extract from this kind of lesson? So let's take a look at this. I'm going to go ahead and write a quick program here called copy-0. And now let's go ahead and actually let's do this-- with copy-0, take a look at what I've got here. I first tell the user, say something. Then I get a string and I stored it in s. Then I check if s equals equals NULL, just return 1. So this is just standard error checking. Nothing interesting has happened. And in fact, if we get rid of the error checking, this looks like week 1 code at the moment. But I've started to get a little better about that. Now in line 16, a week ago, maybe even a couple days or minutes ago, you might say line 16 is creating a variable called t and copying s into it. And that's a perfectly reasonable takeaway. But be more precise now. What is happening in line 16? What is getting copied from right to left? Yeah? AUDIENCE: Is t getting an address of s? DAVID J. MALAN: Exactly, t is getting the address of s. So to be clear now, if I go back to that earlier example and I draw out the thing I've typed in. And what I've typed in-- here's s, and here is what I've typed in somewhere in memory, mom and then a backslash 0 that's added for me. What I stored in here, recall, this is at location 1, 2, 3, 4, this is what's currently in s. So if on line 16, I say give me another variable called t and store in at the value of s, what gets stored here will not mom but rather just the number 1. So if we look ahead in this program now, what's going to happen? So notice that there's this function you might have used this some time ago for Caesar, or Vigenere, or maybe not at all. I claim with my printf, I'm going to capitalize the copy t. First in line 19, quick sanity check, strlen checks the length of t. Because I don't want to try to capitalize something if there's no string there. If the user just hit Enter, there's nothing to capitalize. So I don't want to do line 21. So line 21 is capitalizing which letter, apparently, in t? AUDIENCE: m? DAVID J. MALAN: It looks like it's copying which one? AUDIENCE: m. DAVID J. MALAN: Uh, m. OK, so the first m, because notice that I'm passing to toupper, which if you've never seen it it's just a function to capitalize as its input. t bracket zero means give me the zero character of t. And so how does this picture change, to be clear? What needs to get rewritten or changed with respect to s and t and mom backslash zero. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah, so this one here simply needs to get changed to-- fix this-- needs to get changed to a capital m. But now, look later in the program, if I print out s and t as I clean here, watch what's going to happen printing out s and t. So make copy-0, ./copy-0. Let me go ahead and type in mom in all lowercase. Notice both the original and the copy have been capitalized. Why? Well, s and t are both pointing to, if you will, the same chunk of memory. And frankly, this is getting really uninteresting-- the fact that we're using address zero here. I mean, I don't really care where stuff is in memory. Sorry I'm erasing a little too much. But I don't really care where things are in memory. And so, indeed what programmers tend to think about is that when you talk about an address, or a pointer, who cares where it is in memory. I don't care if it's at byte one or one billion. I just care that this variable is effectively pointing at that chunk of memory. And so, henceforth, rather than quibble over arbitrary memory addresses, let's just start to draw pointers as pointers, as arrows. So what s and t really are, according to this program, because of how I created t, it's just two separate variables pointing at the same chunk of memory. And we don't care where they are. So we can abstract away that detail. So how do I fix this? If I want to write a version of the copy program that actually copies the string and capitalizes only the copy, just intuitively, what's got to be an ingredient to our solution? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: We need a what? AUDIENCE: Chunk of memory. DAVID J. MALAN: We need another chunk of memory, right? We don't know how to do it yet, necessarily. But I kind of need this to happen so that the original mom in lower case ends up in that extra chunk of memory. And then when I change the copy, I don't want to change this copy here. I instead want to change only this copy so that the original is unchanged. So, let's see how we might do this. In copy-1, which has already been stripped of comment, but is commented online. We instead do the following-- these lines are identical, get me a string and call it s. But now let's look at one of our most complex but the last of the complexity for awhile, line 16 does exactly this. So if your comfy with the picture we just drew-- give me a new chunk of memory, copy everything into it, let's see how we translate that to code. So line 16, on the left hand side, char* t gives me this box over here. That's all it does. On the right hand side, m alloc, or malloc, is memory allocation, super fancy, a cryptic way of just saying give me a chunk of memory. How much memory do we need? Well, is kind of a big expression. But let's see what it says here. So this, of course, is give me the string length of s. So, mom it should be what? So just three, right? mom is three characters. You don't count the backslash zero when you talk about the length of a string it's actually the human visible letters. So mom, so this gives me 3. But wait a minute, I'm now adding 1. Why do I actually want to allocate 4 bytes and not just 3? Yeah? AUDIENCE: For the sentinel value? DAVID J. MALAN: Exactly, for that sentinel value. For the backslash zero, I need 4 bytes total. So I need the length of the string plus 1. And then just for good measure-- even though on this system, it's always going to be 1-- I'm saying multiply this by the size of a char. Turns out sizeof is an operator in C that just tells you the number of bytes that's required for a certain data type. It doesn't work for arrays, typically, sometimes it does. But in the general case, no. But it will tell me how many bytes a char is, which turns out is always 1. So this is like multiplying by 1. So super cryptic looking line of code. But all it does is gives me a chunk of memory. But does it seem to be copying anything into that memory? Not yet. And so what do I on line 22, and 23, 24, 25, well, I simply do this. And this is sort of old school stuff now. This is like PSet 2, where you're just moving things around in memory, or rather in strings. So I'm iterating from 0 to the length of the string s. And I'm copying the i-th character in s into the i-th character in t. And because I, the programmer, made sure to allocate exactly as many bytes as I need, it's perfect one-to-one relationship. And I copy mom in lowercase to the new one. And then lastly, I do this line. And so the effect is only to capitalize this t here. So a lot to absorb, but if you just consider what's really going on underneath the hood is just moving these bytes around, all that is needed to solve this problem is just to give us this chunk of memory. Now at the risk of overwhelming, let me show one other example that's almost identical, except for this one line of code. So this is the hacker version of this program, if you will. But let's just distill it into what's going on. Line 24 used to be this t bracket i gets s bracket i. Now, I'm changing this to the much more cryptic star t plus 1 equals star s plus 1. So what's happening and why do we have a star character? We've seen the star before, and it's being used differently here. We previously saw char*, now I'm seeing a star at the beginning, and that's OK. Because it turns out we can kind of infer just from those first principles what's going on. So just to be clear, what is s? Last week, it was a string. That doesn't suffice anymore. What is s, specifically? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: It's a pointer. It's the address of the first character we typed in. OK, what is t? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: The address of the first byte in t, that chunk of memory reallocated. So it turns out that when we iterate from 0 on up to the string length-- first of all, i starts off at 0, because of this old school for loop thing. So just for simplicity, let's assume that the first line of code is really just this, right. If i is zero, adding zero to something presumably is not going to have an effect. So what is this saying? It turns out that the star operator in this context is the dereference operator, which is just a fancy way of saying go to the following address. So if s is the address of the first character in this chunk of memory, *s means go there. And because we've drawn the picture in this way, you can adopt the following mental model. If this is s, and you say *s, *s kind of like chutes and ladders, if you remember the game from childhood, is like follow that arrow and go to the address. *t is the same thing. So start here, go to its chunk. I can't just draw on this screen that way. *t means to go here. And then, the for loop is just saying move this character here, move this character here, move this character here. But how do I do that incrementation? I need to undo what I just deleted. This is what's generally called pointer arithmetic, which means math with addresses. If, in this for loop, I keep incrementing i, and s is an address and t is an address, if I just keep adding 1, that just means keep moving forward, and forward, and forward in the memory. It's like Oxford Street, the street that the CS building is on. The CS buildings is at 33 Oxford Street. So if you were to do 33 Oxford Street plus 1, that brings you to 34 Oxford Street, then 35 Oxford Street, then 36 Oxford Street, whatever those buildings actually are--if they exist. And so, that's all we're doing here with pointer arithmetic. So it's a super arcane way of expressing ourselves. But all that's happening underneath the hood is just following these addresses, like following a map, if you will, or following arrows like we've drawn on the screen. OK, a lot to digest. Any question on syntax, concepts, pointers, malloc, or the like. Yeah, over here first. AUDIENCE: So where that says *t equals toupper *t, is that going to capitalize all the letters or just-- DAVID J. MALAN: Ah, really good question. So in this line here, 31, is this going to capitalize the first letter or all of the letters. So let's answer that by going back to first principles. And first principles here I mean just go to the basic definitions of what's involved. So toupper's a function that capitalizes a char. That's all. *t means go to the first-- go to the address in t. So, in the picture, if this is the chunk of memory we allocated with malloc, and this is t, *t means go here. Meanwhile, you're passing that value, lowercase m to toupper, you're getting back capital M, where are you putting it? You're putting it in that same location. And so by that logic of those basic definitions it's only capitalizing the first letter unless you iterate with i or a for loop or a while loop, it's not going to do anything more than you ask it. Good question. Yeah? AUDIENCE: Why did you use the dereference method rather than the array? DAVID J. MALAN: Ah, good question. Why would you use the dereference method instead of the array method? No particular reason, to be honest. And, in fact, for this kind of example, right, I'm just arguing making the program more complicated, more eyes are glazing over, people are checking out because this looks super arcane, but even though it's doing the same thing. And so, frankly, this is an unnecessarily visually complex solution to the problem. It's still good design, five out of five for design, whether it's in the bracket notation or the pointer notation. But-- especially when we get later in the course in PSet 5 when we implement that dictionary that I've mentioned a couple of times-- we'll actually care about the low level memory addresses that we really understand what's going on. But, for now, it turns out that this line of code here square brackets don't really exist. They are what's called syntactic sugar, which is just a weirdly cool way of saying the compiler converts square brackets to be that mathematical expression. So it's a human convention to be able to just write these very user-friendly brackets. But what the compiler, clang, is really doing any time you write what's highlighted in line 24, underneath the hood it's really converting it to this. It's just more pleasurable as a human to read and write code like line 24. But eventually those training wheels too come off when one's own comfort gets stronger. All right, so recall then that this was the sort of biggest problem we ran into. And that's what sparked this whole damn conversation about pointers, and addresses, and copying things. It was because we tripped over this stupid, stupid issue, whereby I implemented logically-- with Lauren up here on the demo and the orange juice in the milk-- a perfectly algorithmically correct function for swapping two variables' values, but the damn thing didn't have any persistent, or permanent, effect on my code. And why was that? In a nutshell, why is this implementation of swap logically correct, but has no impact on the variables that are passed to it, like x and y for main? What was the gist of the issue? Yeah? AUDIENCE: Because variable made copies of variable in the pass through function. DAVID J. MALAN: Exactly, when you pass variables into a function, or arguments into a function, they're passed by copy, which means you get an identical looking pattern of bits for both x and y, called here a and b. And you can do anything you want with those copies, but they're going to have no effect on the calling function. And, in fact, we drew that picture on the screen, recall last time, whereby if you really think about what's going on underneath the hood-- if this is your computer's memory, and down here is the chunk of memory being used for main, this is the chunk of memory being used for swap, and so even if main has two variables, x and y, swap might have identical looking values, both of which are 1 and 2, but they're completely different chunks of memory. So we need a solution to this. And frankly, it would seem that we now have a solution to this problem, right. If we now have the ability to manipulate things by way of addresses and, sort of chutes and ladders style, follow these arrows and go anywhere we want in memory, couldn't we solve this problem by passing from main to swap not the values we want to swap, but just intuitively what could we pass to swap instead? [INTERPOSING VOICES] DAVID J. MALAN: Why don't we just pass it the addresses, right? Why don't we give swap a treasure map, if you will, that leads it to the actual values x and y. Let's swap, actually change those original bits, rather than just passing copies of the bits. And so, in fact, that's what's going to be the solution. This version here is clearly bad and flawed. And now, at first glance, it just looks like we added a bunch of stars randomly and crossed our fingers that it would compile. But, it would now compile. But let's see what these things mean. And, unfortunately, the authors of C could have chosen another symbol to make this a little clearer, but the star operator has different meaning in two different contexts. And we've seen both, but let's distinguish. So up at the top there, when I have changed a and b from being int's in the bad version to int stars, a and b, previously, were integers. What are a and b now in the good, green version? They're addresses. Addresses of what, to be clear? Addresses of integers. So the fact that I'm saying int star means this is the address of an integer, specifically. So now notice in the lines of code, something else has changed too. tmp stays the same, because it's just the temporary integer, no memory magic there. But a now needs a star. And, in fact, every other mention of a and b, notice that all that's changing from red to green is that I'm prefixing those variables with stars. Because I don't want to copy a and b. Because if I just copy a and b and swap a and b, what am I actually swapping? Just addresses, I want to swap what's at those addresses. I want to go there. And so the star operator inside of my function, not inside of the parameter list, means you go to those addresses and actually change those values. So what does the picture now look like instead. Well, if instead I'm passing in for a and b not 1 and 2-- I actually need to add one other definition here. So suppose that this chunk of memory is at location 10. This is at location 11, but this is a bit of a simplification, I now have two choices do I pass x and y or do I pass their addresses? If I pass their addresses like this, I just now need to implement swap per the green code so that when it sees a and when it sees b, it doesn't just copy a and b and move the milk and orange juice. The milk and orange juice metaphor now breaks down, because those are cups of liquid and not maps. We instead need to go to address 10 and we need to go to address 11, and then perform that swapping logic. So the logic is the same, but we need a slightly different way of accessing those variables. And so in the end, what the program has to look like is this. In swap.c literally copied and pasted the green version. But I need to make one change. It's not sufficient just to change swap. What other line of code do I need to change? Yeah? AUDIENCE: Where it takes the arguments. DAVID J. MALAN: Where it takes its argument. So if I scroll up to main, I can't just pass in x and y, and, I promise, the last piece of new syntax today. I need to pass in not x and y but the address of x and y. And it turns out, the symbol that the authors of C chose is if you use an ampersand here, not to be confused with the bitwise ampersand, if you use an ampersand here and an ampersand here, this figures out for you, what's the address of x, maybe it's 10, what's the address of y, maybe it's 11, and passes those in instead. So a lot to absorb all at once. But let's see now quickly in our remaining four minutes where things can go awry. And as an aside, actually I took this picture, TF took this picture a year or two ago. So this is the back corner of Eliot Dining Hall. Pointers are perhaps the hardest topic that we cover in CS50. So if you worry the sort of slope is like maybe it's more of a hockey stick like this, realize we're kind of nearing a peak in terms of the conceptual complexity. And I bring up this photo, because I swear to god, in fall 1996, when I took CS50 with my teaching fellow, Nishat Mehta, he sat me down in the corner of the Eliot D. Hall over lunch, or dinner, or something to try to help me understand pointers. And this is where I was weeks after it was introduced in lecture when I finally understood pointers. And I'm hopeful that this will click far sooner for you. But realize this absolutely among the more sophisticated topics we've looked at. But it's among the most powerful. And when you get it, it's really all just going to finally come together. So rest assured it doesn't need to all sink in today. So here's the last program we're going to look at. And we're going to end with a quick three minutes of claymation made by our friend, Nick Parlante. Here's a program, that on the top two lines declares a variable x and y. Both of which are addresses of integers, AKA pointers. We then allocate enough memory to store an int and store the address of that memory in x. So, it's even simpler than the example before. Give me four bytes of memory, that's the size of an int, and put that address in x. This line here means go to the address in x and put the meaning of life, the number 42 there. But this line worries me. Star y means go to the address in y, and put the unlucky number 13 there. Why is it dangerous, at this point in the story-- albeit rapidly told in our waning minutes here-- why is it bad for me to say, go to the address in y? AUDIENCE: You haven't [INAUDIBLE]. DAVID J. MALAN: I haven't put anything in y. So what is the value of y, at this point in the story? We have no idea. It's some garbage value and nor does Binky know. If we could end on this note. [VIDEO PLAYBACK] -Hey, Binky, wake up. It's time for pointer fun. -What's that? Learn about pointers? Oh, goody. -Well, to get started, I guess we're going to need a couple pointers. -OK. This code allocates two pointers which can point to integers. -OK, well I see the two pointers, but they don't seem to be pointing to anything. -That's right. Initially pointers don't point to anything. The things they point to are called pointees and setting them up is a separate step. -Oh, right, right. I knew that. The pointees are separate. So how do you allocate a pointee? -OK, well this code allocates a new integer pointee, and this part sets x to point to it. -Hey, that looks better. So make it do something. -OK, I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. -Your magic wand of dereferencing? Uh, that, that's great. -This is what the code looks like. I'll just set up the number and-- [POP SOUND] -Hey, look there it goes. So, doing a dereference on x follows the arrow to access its pointee. In this case, to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. -OK. I'll just go over here to y, and get the number 13 set up. And then take the wand of dereferencing and just-- [BUZZER SOUND] -Oh, hey that didn't work. Say, uh, Binky, I don't think dereferencing y is a good idea, because setting up the pointee is a separate step. And I don't think we ever did it. -Hmm, good point. -Yeah, we allocated the pointer, y, but we never set it to point to a pointee. -Hmm, very observant. -Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x. -Sure, I use my magic wand of pointer assignment. -Is that going to be a problem, like before? -No, this doesn't touch the pointees. It just changes one pointer to point to the same thing-- [POPPING SOUND] --as another. -Oh, I see. Now y points to the same place as x. So, wait, now y is fixed. It has a pointee. So you can try the wand of dereferencing again to send the 13 over. -Oh, OK, here goes. -Hey, look at that. Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. -Yeah, sharing, uh, whatever. So, are we going to switch places now? -Oh, look we're out of time. -But-- -Just remember the three pointer rules. Number 1, the basic structure is that you have a pointer, and it points over to a pointee. But the pointer and pointee are separate. And the common error is to set up a pointer but to forget to give it a pointee. Number 2, pointer dereferencing starts at the pointer and follows its arrow over to access its pointee. As we all know, this only works if there is a pointee, which kind of gets back to rule number 1. Number 3, pointer assignment takes one pointer and changes it to point to the same pointee as another pointer. So after the assignment, the two pointers will point to the same pointee, sometimes that's called sharing. And that's all there is to it, really. Bye-bye now. [END PLAYBACK] DAVID J. MALAN: That's it for CS50. Thanks to Professor Nick Parlante. We'll see you next week. [ELECTRONIC MUSIC PLAYING]