WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:01:17.581 [MUSIC PLAYING] 00:01:18.631 --> 00:01:22.651 DAVID J. MALAN: Well, this is CS50, and already this is week four, 00:01:22.651 --> 00:01:24.631 and recall that last week, week three, we 00:01:24.631 --> 00:01:27.571 began to explore the inside of a computer's memory a bit more. 00:01:27.571 --> 00:01:30.631 We talked about arrays, which were just chunks of memory 00:01:30.631 --> 00:01:33.451 back to back to back that really lay things out left to right, top 00:01:33.451 --> 00:01:36.721 to bottom, and this is actually a pretty common paradigm, even if you're 00:01:36.721 --> 00:01:38.761 new to programming, and certainly new to C. 00:01:38.761 --> 00:01:43.771 You've seen this approach of just using memory in some way to lay things out, 00:01:43.771 --> 00:01:45.161 like images, for instance. 00:01:45.161 --> 00:01:50.371 So for instance, here is a photo taken of last week's front row, for instance, 00:01:50.371 --> 00:01:53.791 and this is an opportunity to explore exactly what happens 00:01:53.791 --> 00:01:56.911 if we start to zoom in and zoom in and zoom in, because it seems like most 00:01:56.911 --> 00:02:00.661 any TV show like CSI, or whatever, or any movie that 00:02:00.661 --> 00:02:06.601 explores forensic information might have the investigators zoom in 00:02:06.601 --> 00:02:09.994 on an image like this to see what the glint in someone's eye 00:02:09.994 --> 00:02:12.661 is because that reveals the license plate number of someone that 00:02:12.661 --> 00:02:13.556 just drove past. 00:02:13.556 --> 00:02:15.431 Something that's a little over the top there, 00:02:15.431 --> 00:02:18.661 but there's an opportunity here to speak to why that is so unrealistic. 00:02:18.661 --> 00:02:21.661 For instance, let's zoom on this puppet here's eye and let's 00:02:21.661 --> 00:02:23.971 zoom in a little more to see what might be reflected. 
00:02:23.971 --> 00:02:26.581 Let's zoom in a little more, and that's it. 00:02:26.581 --> 00:02:29.051 There's only a finite amount of information 00:02:29.051 --> 00:02:31.171 if you have an image represented in this way. 00:02:31.171 --> 00:02:34.321 We're using pixels-- these dots on the screen as rows and columns-- 00:02:34.321 --> 00:02:36.781 because if you're only using a finite amount of memory 00:02:36.781 --> 00:02:40.111 then at the end of the day, you can only store a finite amount of information. 00:02:40.111 --> 00:02:43.921 At least I don't really see in this grid here any glint of a license plate 00:02:43.921 --> 00:02:46.651 or something like that that you might otherwise see in Hollywood. 00:02:46.651 --> 00:02:49.681 So today we'll explore these kinds of representations 00:02:49.681 --> 00:02:52.501 of how you might use memory in new and interesting ways 00:02:52.501 --> 00:02:55.861 to represent now, very familiar things, but also 00:02:55.861 --> 00:02:59.071 start to explore what some of the limitations are of this representation. 00:02:59.071 --> 00:03:02.851 But consider after all that this doesn't need to be even as high resolution, 00:03:02.851 --> 00:03:05.161 as many pixels as something like this other image, 00:03:05.161 --> 00:03:09.131 you can imagine just doing something silly with Post-It notes, like this. 00:03:09.131 --> 00:03:11.821 And if you think of an image as just having rows and columns, 00:03:11.821 --> 00:03:14.131 these rows otherwise known as scan lines-- something 00:03:14.131 --> 00:03:17.701 we'll explore in the coming week-- you could make this fun smiley face 00:03:17.701 --> 00:03:22.111 by just using two different values, maybe a zero and a one. 00:03:22.111 --> 00:03:26.141 Or yellow and purple, or vice versa, just to make something come to life.
00:03:26.141 --> 00:03:30.331 Now in practice, recall we talked about storing not just a zero or one, 00:03:30.331 --> 00:03:37.414 but maybe an R, a G, and a B value-- like 24 bits, or three bytes in total-- 00:03:37.414 --> 00:03:38.581 but we'll come back to that. 00:03:38.581 --> 00:03:40.289 That would just be a more involved image. 00:03:40.289 --> 00:03:46.111 But for fun, if today you want to tackle something passively in the background, 00:03:46.111 --> 00:03:49.531 if you go to this URL here, we've put together an opportunity 00:03:49.531 --> 00:03:52.201 to do a bit of pixel art. 00:03:52.201 --> 00:03:55.801 If you go to this URL here, that'll redirect you to a Google Spreadsheet. 00:03:55.801 --> 00:03:58.141 If you have a laptop with you today that'll 00:03:58.141 --> 00:04:01.541 look a little something like this, which we've organized in rows and columns. 00:04:01.541 --> 00:04:05.881 So if you'd like to go ahead and use Google Spreadsheet's colorization 00:04:05.881 --> 00:04:09.331 feature to color in those individual squares if you'd like, 00:04:09.331 --> 00:04:12.751 see if you can't make something a little creative and then email it to Carter 00:04:12.751 --> 00:04:16.841 and we'll exhibit some of the best or favorites on the website thereafter. 00:04:16.841 --> 00:04:20.064 So let's transition then to something a little more familiar-- images. 00:04:20.064 --> 00:04:22.231 And not all of you have used, presumably, Photoshop, 00:04:22.231 --> 00:04:25.481 but you're probably generally familiar with Photoshop as a program for editing 00:04:25.481 --> 00:04:27.701 and creating images or photos or the like. 00:04:27.701 --> 00:04:30.631 And here is a screenshot of Photoshop's color picker, 00:04:30.631 --> 00:04:32.618 via which you can change what color you're 00:04:32.618 --> 00:04:34.951 going to draw with the paint brush, or what color you're 00:04:34.951 --> 00:04:36.931 going to fill in with the paint bucket.
00:04:36.931 --> 00:04:39.031 It's representative of any kind of graphical tool. 00:04:39.031 --> 00:04:41.441 And there's a lot of information in here, 00:04:41.441 --> 00:04:43.921 but there's perhaps some familiar terms now-- 00:04:43.921 --> 00:04:47.791 R, G, and B. In fact, right now this is Photoshop's way 00:04:47.791 --> 00:04:50.491 of saying you're about to fill in your background or foreground 00:04:50.491 --> 00:04:52.681 with the color black, and that appears to be 00:04:52.681 --> 00:04:56.131 represented with an R, a G, and a B value of zero, zero, zero. 00:04:56.131 --> 00:05:01.981 Or alternatively, using a hash symbol and then 000000. 00:05:01.981 --> 00:05:04.441 And if some of you have already made web pages before 00:05:04.441 --> 00:05:06.331 and you know a little bit of HTML and CSS, 00:05:06.331 --> 00:05:08.671 you probably are familiar with this kind of syntax-- 00:05:08.671 --> 00:05:12.531 a hash symbol and then six, or sometimes three digits thereafter. 00:05:12.531 --> 00:05:15.031 And if we look at a few different colors here, for instance, 00:05:15.031 --> 00:05:17.131 here might be the representation of white. 00:05:17.131 --> 00:05:23.311 Now the R, the G, and the B values went way up from 0 to 255, 255, 255. 00:05:23.311 --> 00:05:28.111 Or alternatively, it looks like Photoshop, and in turn web browsers, 00:05:28.111 --> 00:05:31.589 could represent that same color white with FFFFFF. 00:05:31.589 --> 00:05:32.881 And let's just do a few others. 00:05:32.881 --> 00:05:37.621 Here is red, and it turns out that red is a whole lot of red, 255, 00:05:37.621 --> 00:05:39.181 but no green, no blue. 00:05:39.181 --> 00:05:40.326 Or, a.k.a. 00:05:40.326 --> 00:05:42.549 FF0000. 00:05:42.549 --> 00:05:44.341 So there's perhaps a pattern here emerging. 00:05:44.341 --> 00:05:48.421 Here is green, zero, 255, zero, a.k.a. 
00:05:48.421 --> 00:05:52.661 00FF00, or lastly, here blue, which is no red, 00:05:52.661 --> 00:05:56.371 no green but apparently a lot of blue, 255 again, a.k.a. 00:05:56.371 --> 00:05:58.471 0000FF. 00:05:58.471 --> 00:06:01.861 Now some of you, again, might have seen this notation before, 00:06:01.861 --> 00:06:05.071 these zeros and these F's and all of the numbers and letters in between, 00:06:05.071 --> 00:06:06.844 but this is another form of notation. 00:06:06.844 --> 00:06:08.761 And in fact, we'll explore this today-- really 00:06:08.761 --> 00:06:11.491 is just a precondition for talking about some other concepts. 00:06:11.491 --> 00:06:14.641 But the ideas, ultimately, are really no different. 00:06:14.641 --> 00:06:17.821 What we're about to see is a different base system-- 00:06:17.821 --> 00:06:19.951 not just binary, not just decimal, but something 00:06:19.951 --> 00:06:21.871 we're about to call hexadecimal. 00:06:21.871 --> 00:06:25.831 But first, recall that with RGB we previously did the following. 00:06:25.831 --> 00:06:28.231 Any RGB value-- red, green, blue-- just combine 00:06:28.231 --> 00:06:30.761 some amount of red or green or blue. 00:06:30.761 --> 00:06:35.341 So here we have 72, 73, 33, which in the context of an email or text, of course, 00:06:35.341 --> 00:06:36.901 said what-- 00:06:36.901 --> 00:06:38.401 a couple of weeks back? 00:06:38.401 --> 00:06:40.891 Just hi with an exclamation point, but in the context 00:06:40.891 --> 00:06:45.121 of a Photoshop-like program, this might instead be representing, 00:06:45.121 --> 00:06:47.558 collectively, this shade of yellow, for instance, 00:06:47.558 --> 00:06:50.141 when you combine that much red that much green that much blue. 00:06:50.141 --> 00:06:51.451 So here is the same idea. 00:06:51.451 --> 00:06:53.701 If you've got a lot of red, no green, no blue, 00:06:53.701 --> 00:06:55.291 together that's going to give us red. 
00:06:55.291 --> 00:06:58.081 If you've got no red, a lot of green, no blue, 00:06:58.081 --> 00:06:59.851 that's going to give us, of course, green. 00:06:59.851 --> 00:07:03.169 If you've got no red, no green, a lot of blue, that of course, 00:07:03.169 --> 00:07:04.211 is going to give us blue. 00:07:04.211 --> 00:07:08.401 So there's a pattern emerging here where apparently 00 is none, as always, 00:07:08.401 --> 00:07:10.591 and FF is apparently a lot. 00:07:10.591 --> 00:07:17.281 And it's maybe somehow equated with 255, at least per that Photoshop screenshot. 00:07:17.281 --> 00:07:20.551 Meanwhile, if we combine one last one, a lot of red, a lot of green, 00:07:20.551 --> 00:07:21.631 a lot of blue-- 00:07:21.631 --> 00:07:25.359 that's actually going to give us a single white pixel like this. 00:07:25.359 --> 00:07:26.401 All right, so think back. 00:07:26.401 --> 00:07:30.119 Here was binary-- in the world of binary you had just two digits, zero and one. 00:07:30.119 --> 00:07:31.411 Could have been anything else-- 00:07:31.411 --> 00:07:36.541 A or B, X or Y, but the world standardized on these numerals 00:07:36.541 --> 00:07:37.381 zero and one. 00:07:37.381 --> 00:07:40.591 In our world's decimal system, of course, you have zero through nine. 00:07:40.591 --> 00:07:44.101 As of today though, we're going to start using hexadecimal sometimes 00:07:44.101 --> 00:07:47.986 in the context of images and also files just because it's a convention 00:07:47.986 --> 00:07:49.834 and there's some conveniences to it. 00:07:49.834 --> 00:07:51.751 Where now, you're going to be able to count up 00:07:51.751 --> 00:07:54.601 to F in a notation called hexadecimal. 00:07:54.601 --> 00:07:59.671 From zero through nine, then you keep going to A to B to C to D to E to F, 00:07:59.671 --> 00:08:02.641 the idea being each of these, even though it's weirdly 00:08:02.641 --> 00:08:06.781 a letter of the English alphabet, it's still just a single symbol. 
00:08:06.781 --> 00:08:12.241 It's not one zero for 10, or 1 1 for eleven-- all 16 of these values, 00:08:12.241 --> 00:08:15.601 these digits, so to speak, are indeed still just single symbols, 00:08:15.601 --> 00:08:19.211 and that's a characteristic of just using this other notational system. 00:08:19.211 --> 00:08:24.751 So how do we get from 00 and FF to something like 0 and 255, respectively? 00:08:24.751 --> 00:08:26.761 Well, this hexadecimal system, a.k.a. 00:08:26.761 --> 00:08:30.186 Base 16, just does the math from week zero and really, 00:08:30.186 --> 00:08:31.811 grade school, a little bit differently. 00:08:31.811 --> 00:08:34.981 For instance, if you have a number that's got two digits, 00:08:34.981 --> 00:08:38.921 or hexadecimal digits as of today, the columns are just a little different. 00:08:38.921 --> 00:08:42.511 Instead of powers of two or powers of 10, which we saw for binary and decimal 00:08:42.511 --> 00:08:45.271 respectively, it's powers of 16. 00:08:45.271 --> 00:08:48.001 So if we just do the math out, that's the ones column, 00:08:48.001 --> 00:08:50.731 this is the 16s column, and so forth. 00:08:50.731 --> 00:08:53.741 Things get actually pretty big pretty quickly in this system. 00:08:53.741 --> 00:08:56.746 But now let's just consider how we would represent familiar numbers. 00:08:56.746 --> 00:08:59.371 If you've got two hexadecimal digits for which these hashes are 00:08:59.371 --> 00:09:02.431 just placeholders, zero, zero is going to mathematically 00:09:02.431 --> 00:09:04.931 equal the decimal number you and I know, of course, as zero. 00:09:04.931 --> 00:09:05.431 Why? 00:09:05.431 --> 00:09:06.721 Same thing as week zero-- 00:09:06.721 --> 00:09:11.041 16 times zero plus one times zero is the number you and I know as zero. 00:09:11.041 --> 00:09:12.521 And we can count up from here. 
00:09:12.521 --> 00:09:15.031 This, in hexadecimal, would be how a computer 00:09:15.031 --> 00:09:16.831 represents the number we know as one. 00:09:16.831 --> 00:09:18.821 It would be zero one in this case. 00:09:18.821 --> 00:09:24.181 This would be two, three, four, five, six, seven, eight, nine-- 00:09:24.181 --> 00:09:26.141 in decimal, we're about to go to 10. 00:09:26.141 --> 00:09:29.211 But in hexadecimal, to be clear, what comes next? 00:09:29.211 --> 00:09:38.021 So, apparently A, so 0A, 0B, 0C, 0D, 0E, 0F, which is now 10, 11, 12, 13, 14, 15. 00:09:38.021 --> 00:09:41.111 So using hexadecimal is just an interesting way 00:09:41.111 --> 00:09:44.951 of using single symbols now, zero through F, 00:09:44.951 --> 00:09:47.901 to count from zero through 15. 00:09:47.901 --> 00:09:50.651 And we'll see why it's 15 in a moment, but as soon as we get to F, 00:09:50.651 --> 00:09:54.821 anyone want to conjecture how in hexadecimal, a.k.a. hex, 00:09:54.821 --> 00:09:57.731 do we now count up one position higher? 00:09:57.731 --> 00:10:01.431 What comes after 0F in hexadecimal? 00:10:01.431 --> 00:10:03.701 So, one zero-- it's the same kind of thing-- 00:10:03.701 --> 00:10:05.866 once you're at the highest digit possible, F-- 00:10:05.866 --> 00:10:07.991 or in our decimal world that would have been nine-- 00:10:07.991 --> 00:10:11.111 you add one more, nine wraps around to zero, or in this case, 00:10:11.111 --> 00:10:12.821 F wraps around to zero. 00:10:12.821 --> 00:10:15.791 You carry the one and voila-- now we're representing 00:10:15.791 --> 00:10:17.511 the number you and I know as 16. 00:10:17.511 --> 00:10:19.451 And we could keep going forever, literally.
00:10:19.451 --> 00:10:23.186 This could be 17, 18, 19, 20, in decimal-- 00:10:23.186 --> 00:10:25.061 but let's just wave our hands at it and count 00:10:25.061 --> 00:10:27.821 as high as we can-- dot, dot, dot-- the highest 00:10:27.821 --> 00:10:31.181 we could count in hexadecimal with two digits, just logically, 00:10:31.181 --> 00:10:32.981 would be what, in hexadecimal? 00:10:32.981 --> 00:10:35.091 Something, something. 00:10:35.091 --> 00:10:35.951 FF, I heard. 00:10:35.951 --> 00:10:39.531 So yes, that's the biggest digit possible, so FF is what we have. 00:10:39.531 --> 00:10:43.163 So how high can you count in hexadecimal if you've got just two of these digits? 00:10:43.163 --> 00:10:44.621 Well, it's the same math as always. 00:10:44.621 --> 00:10:46.571 16 times F, a.k.a. 00:10:46.571 --> 00:10:52.941 15, so that's 16 times 15 plus one times F, or one times 15-- 00:10:52.941 --> 00:10:57.341 that gives us 240 plus 15 in decimal, the result of which, of course, now 00:10:57.341 --> 00:10:59.421 is 255. 00:10:59.421 --> 00:11:02.511 So this hexadecimal system-- you may have seen in the world of web pages, 00:11:02.511 --> 00:11:05.261 and if you haven't we'll get to that in this class in a few weeks, 00:11:05.261 --> 00:11:07.991 or we just saw in the context of Photoshop-- just 00:11:07.991 --> 00:11:14.141 has this shorthand notation of counting as high as 255 but just calling it FF. 00:11:14.141 --> 00:11:17.771 Now it's marginal, but that's a one-third savings in how many digits 00:11:17.771 --> 00:11:21.491 you need in order to count as high as 255 because in decimal, of course, 00:11:21.491 --> 00:11:23.321 255 is three digits. 00:11:23.321 --> 00:11:27.131 In hexadecimal you can count as high using just two, 00:11:27.131 --> 00:11:30.489 and that difference is going to get magnified the bigger our numbers get.
00:11:30.489 --> 00:11:33.281 Let me stipulate for now, you're going to get more and more savings 00:11:33.281 --> 00:11:36.431 in terms of just how many symbols you need on the screen to represent 00:11:36.431 --> 00:11:39.881 bigger and bigger numbers than that. 00:11:39.881 --> 00:11:43.301 All right, let me pause here just to see if there are any questions thus far 00:11:43.301 --> 00:11:46.721 on what we've called hexadecimal, which again, just gives us zero through nine 00:11:46.721 --> 00:11:53.408 as well as A through F. Any questions or confusion? 00:11:53.408 --> 00:11:55.991 And if it feels like we're lingering a bit much on arithmetic, 00:11:55.991 --> 00:11:59.331 we're not really going to see other notations besides this moving forward. 00:11:59.331 --> 00:12:03.461 These are the go-to three in a programmer's world, typically. 00:12:03.461 --> 00:12:04.671 But there are some others. 00:12:04.671 --> 00:12:06.240 Yeah. 00:12:06.240 --> 00:12:08.532 AUDIENCE: Does the hexadecimal system take more storage 00:12:08.532 --> 00:12:11.251 than the decimal system? 00:12:11.251 --> 00:12:12.501 DAVID J. MALAN: Good question. 00:12:12.501 --> 00:12:16.611 Does hexadecimal require more storage or less storage than the decimal system? 00:12:16.611 --> 00:12:20.841 Theoretically no, because this is just a way of representing information 00:12:20.841 --> 00:12:23.721 and we'll see a concrete example in a moment. 00:12:23.721 --> 00:12:27.111 But inside of the computer, at the end of the day, you're still storing bits.
00:12:27.111 --> 00:12:30.228 And using hexadecimal is not using more or fewer bits, 00:12:30.228 --> 00:12:32.061 think of this as how you might write it down 00:12:32.061 --> 00:12:34.971 on a piece of paper, just how many digits you're going to write 00:12:34.971 --> 00:12:37.941 or on a computer screen, how many digits you're going to see at once, 00:12:37.941 --> 00:12:41.211 but it doesn't change how the computer is representing information 00:12:41.211 --> 00:12:44.331 because all they're representing at the end of the day is zeros and ones. 00:12:44.331 --> 00:12:45.621 So in fact, let's go there. 00:12:45.621 --> 00:12:49.851 If this-- a moment ago FF I claimed was 255-- 00:12:49.851 --> 00:12:51.891 let's just rewind to week zero and if we wanted 00:12:51.891 --> 00:12:56.391 to count to 255 in binary, that's as high as you can count, recall, 00:12:56.391 --> 00:12:57.411 with eight bits. 00:12:57.411 --> 00:12:59.244 And there's only a few of these numbers that 00:12:59.244 --> 00:13:03.081 are useful to memorize, like 255 is as high as you can count with eight bits 00:13:03.081 --> 00:13:06.981 if you start at zero, because two to the eighth is 256, but if you start at zero 00:13:06.981 --> 00:13:09.471 it's zero through 255. 00:13:09.471 --> 00:13:13.671 So in binary, recall if you have eight bits, all of which were ones, 00:13:13.671 --> 00:13:15.991 and I won't do out the math pedantically here, 00:13:15.991 --> 00:13:18.366 but if I do do this plus this plus this, dot, dot, 00:13:18.366 --> 00:13:21.391 dot-- that's also going to give me 255. 00:13:21.391 --> 00:13:24.441 So this is what's interesting here about hexadecimal. 
00:13:24.441 --> 00:13:28.851 It turns out that an upside of storing values in hexadecimal 00:13:28.851 --> 00:13:32.571 is that we're going to see the first F represents 00:13:32.571 --> 00:13:35.901 the left half of all these bits, and the second F in this case 00:13:35.901 --> 00:13:38.431 represents the rightmost four of these bits. 00:13:38.431 --> 00:13:41.061 So it turns out hexadecimal is very useful when you 00:13:41.061 --> 00:13:44.031 want to treat data in units of four. 00:13:44.031 --> 00:13:47.181 It's not quite eight, but units of four, and that's not bad. 00:13:47.181 --> 00:13:50.271 Which is why-- if you use two digits like I have thus far, 00:13:50.271 --> 00:13:53.061 00 or FF or anything in between-- 00:13:53.061 --> 00:13:57.921 that's actually a convenient way of representing eight bits in total. 00:13:57.921 --> 00:14:02.091 One hex digit for the first four bits, one hex digit for the second. 00:14:02.091 --> 00:14:04.791 And again, there's nothing new intellectually here per se, 00:14:04.791 --> 00:14:08.571 it's just a different way of representing the same story as before-- 00:14:08.571 --> 00:14:09.651 zeros and ones. 00:14:09.651 --> 00:14:11.491 So in what context do we see this? 00:14:11.491 --> 00:14:12.831 Well, we talked about memory last week, and we're 00:14:12.831 --> 00:14:14.414 going to talk more about it this week. 00:14:14.414 --> 00:14:16.941 If this is my computer's RAM-- random access memory-- 00:14:16.941 --> 00:14:21.111 you can again think of each byte as having a number associated with it-- 00:14:21.111 --> 00:14:22.671 its address or location. 00:14:22.671 --> 00:14:26.991 This might be zero, this might be 2 billion, and so in the past 00:14:26.991 --> 00:14:29.781 I've described these as just this, using decimal numbers. 00:14:29.781 --> 00:14:34.131 Here's byte zero, one, two, three, four, five, six, seven, 15, 16 00:14:34.131 --> 00:14:35.581 would be here, and so forth. 
00:14:35.581 --> 00:14:40.071 But it turns out in the world of memory, and thus today, programming, people 00:14:40.071 --> 00:14:44.691 tend to count memory bytes using hexadecimal. 00:14:44.691 --> 00:14:46.881 Partly just by convention, but also partly 00:14:46.881 --> 00:14:49.581 because it's a little more succinct and again, each digit 00:14:49.581 --> 00:14:52.641 represents four bits, typically. 00:14:52.641 --> 00:14:54.396 So what comes after F here? 00:14:54.396 --> 00:14:56.271 Well, if I think about the computer's memory, 00:14:56.271 --> 00:15:01.311 I normally might do after F, which is 15, 16. 00:15:01.311 --> 00:15:05.931 But instead, one zero, one one, one two, one three-- this 00:15:05.931 --> 00:15:10.551 is not 10, 11, 12, 13, because I claim I'm in the context of hexadecimal now. 00:15:10.551 --> 00:15:12.621 As per the previous slide, we already started 00:15:12.621 --> 00:15:15.441 going into A's through F's, so you immediately 00:15:15.441 --> 00:15:18.111 see here a possible problem. 00:15:18.111 --> 00:15:21.081 Why is this now worrisome, if all of a sudden you're 00:15:21.081 --> 00:15:26.791 seeing seemingly familiar numbers like 10, 11, 12, 13? 00:15:26.791 --> 00:15:28.928 We didn't really stumble across this problem 00:15:28.928 --> 00:15:30.511 when it was all zeros and ones before. 00:15:30.511 --> 00:15:31.614 Yeah. 00:15:31.614 --> 00:15:33.156 AUDIENCE: Try to do math [INAUDIBLE]. 00:15:35.284 --> 00:15:37.951 DAVID J. MALAN: Yeah, so if you're writing some code in C that's 00:15:37.951 --> 00:15:39.809 doing some math, you might accidentally-- 00:15:39.809 --> 00:15:42.601 or the computer might accidentally confuse hexadecimal with decimal 00:15:42.601 --> 00:15:45.161 if they look in some context the same. 
00:15:45.161 --> 00:15:47.251 Any number on the board that doesn't have a letter 00:15:47.251 --> 00:15:51.041 is ambiguously hexadecimal or decimal at this point, 00:15:51.041 --> 00:15:52.751 and so how might we resolve this? 00:15:52.751 --> 00:15:55.711 Well, it turns out that what computers typically do is this. 00:15:55.711 --> 00:16:00.481 By convention, any time you see 0x and then a number, 00:16:00.481 --> 00:16:02.911 that's a human convention of saying-- 00:16:02.911 --> 00:16:06.371 signaling to the reader that this is in fact a hexadecimal number. 00:16:06.371 --> 00:16:10.441 So if it's 0x10, that is not the number 10, 00:16:10.441 --> 00:16:15.611 that is the hexadecimal number one zero, which recall we said earlier, 00:16:15.611 --> 00:16:18.631 is how you count up to 16. 00:16:18.631 --> 00:16:21.151 And again, these are not the kinds of things to memorize, 00:16:21.151 --> 00:16:24.561 it's really just the system for how you think about these things. 00:16:24.561 --> 00:16:27.061 So henceforth today, we're going to start seeing hexadecimal 00:16:27.061 --> 00:16:28.471 in a bunch of contexts. 00:16:28.471 --> 00:16:31.501 When you write code, you might even write code using some hexadecimal 00:16:31.501 --> 00:16:34.001 but again, it's just a different way of representing numbers 00:16:34.001 --> 00:16:37.261 and humans have different conventions for different contexts. 00:16:37.261 --> 00:16:40.771 All right, so with that said, any questions now on this building block? 00:16:40.771 --> 00:16:46.321 From here on out, we'll start using it in some actual code. 00:16:46.321 --> 00:16:48.011 Any questions? 00:16:48.011 --> 00:16:49.581 Nothing so far? 00:16:49.581 --> 00:16:50.081 All right. 00:16:50.081 --> 00:16:53.821 So, let's go ahead and consider maybe a familiar example. 00:16:53.821 --> 00:16:57.571 Something involving code, where I initialize a variable like n 00:16:57.571 --> 00:16:59.389 to a value like 50, in this case.
00:16:59.389 --> 00:17:01.681 And then let's start to tinker around with what's going 00:17:01.681 --> 00:17:03.391 on inside of the computer's memory. 00:17:03.391 --> 00:17:06.191 In a moment I'm going to load up VS Code on my computer 00:17:06.191 --> 00:17:09.511 and I'm going to go ahead and whip up a program that very simply assigns 00:17:09.511 --> 00:17:13.231 a value like the number 50 to a variable called n, 00:17:13.231 --> 00:17:19.036 but today, keep in mind that that variable n and that value 50 00:17:19.036 --> 00:17:21.404 is going to be stored somewhere in my computer's memory, 00:17:21.404 --> 00:17:24.571 and it turns out today we'll introduce a bit more syntax so you can actually 00:17:24.571 --> 00:17:27.011 see where things are being stored. 00:17:27.011 --> 00:17:28.711 So let me click over to VS Code here. 00:17:28.711 --> 00:17:31.681 I'm going to create a program called address.c just 00:17:31.681 --> 00:17:34.171 to explore the computer's addresses today, and I'm 00:17:34.171 --> 00:17:38.701 going to do an include stdio.h, int main(void), as usual. 00:17:38.701 --> 00:17:40.441 No command line arguments for now. 00:17:40.441 --> 00:17:43.043 I'm going to declare that variable n equals 50, 00:17:43.043 --> 00:17:45.251 and then I'm just going to go ahead and print it out. 00:17:45.251 --> 00:17:50.731 So nothing very interesting but I'll use %i backslash n and then comma n 00:17:50.731 --> 00:17:52.321 to print out that value. 00:17:52.321 --> 00:17:55.311 Nothing here should be very interesting to compile or run, 00:17:55.311 --> 00:17:57.811 but I'll do it just to make sure I didn't make any mistakes. 00:17:57.811 --> 00:18:03.301 Looks like as expected, it simply prints out the number 50, like this. 00:18:03.301 --> 00:18:06.781 But let's consider then, what this code is doing underneath the hood 00:18:06.781 --> 00:18:09.521 when it's actually run on your machine. 00:18:09.521 --> 00:18:11.401 So here we have that grid of memory.
00:18:11.401 --> 00:18:15.451 That variable n is an int, and if you think back, 00:18:15.451 --> 00:18:19.051 how many bytes typically do we use for an int? 00:18:19.051 --> 00:18:20.131 Yeah. 00:18:20.131 --> 00:18:22.690 Four, so four bytes, or 32 bits. 00:18:22.690 --> 00:18:26.491 So if each of these squares represents one byte, then my computer, somewhere 00:18:26.491 --> 00:18:29.813 in my memory, or RAM, is using four of these squares. 00:18:29.813 --> 00:18:32.521 Maybe it ends up over here just because there's other stuff being 00:18:32.521 --> 00:18:33.731 used elsewhere, for instance. 00:18:33.731 --> 00:18:35.481 Though I don't really know, and frankly, I 00:18:35.481 --> 00:18:38.273 don't really care where it ends up, just that it ends up somewhere. 00:18:38.273 --> 00:18:41.940 So the variable-- the value 50 is stored here in a variable called n. 00:18:41.940 --> 00:18:45.581 Even though I've written it as decimal, just like in my code-- 00:18:45.581 --> 00:18:50.184 let me again remind that this is 32 zeros and ones representing that 50-- 00:18:50.184 --> 00:18:53.351 it's just going to be very tedious if we start writing everything in binary, 00:18:53.351 --> 00:18:56.351 so I'll use the more comfortable human decimal system. 00:18:56.351 --> 00:18:59.141 So that's what's going on inside of the computer's memory. 00:18:59.141 --> 00:19:03.571 So what if I actually wanted to start tinkering with its location, 00:19:03.571 --> 00:19:06.091 or maybe just knowing its location? 00:19:06.091 --> 00:19:09.901 Well, this variable n indeed has a name, n-- 00:19:09.901 --> 00:19:13.763 that's a label of sorts for it-- but at the end of the day that 50 is 00:19:13.763 --> 00:19:16.471 technically at a specific address, and I'm going to make one up-- 00:19:16.471 --> 00:19:19.501 0x123, and it's 123 because I really don't 00:19:19.501 --> 00:19:22.421 care what it is, I just want an address for the sake of discussion. 
00:19:22.421 --> 00:19:28.951 So way over here off screen might be byte zero, way down here is byte 0x123. 00:19:28.951 --> 00:19:32.861 It's in hexadecimal notation just by convention. 00:19:32.861 --> 00:19:36.691 So how can I actually see where my variables are ending up 00:19:36.691 --> 00:19:38.341 in memory if I'm curious to do so? 00:19:38.341 --> 00:19:41.821 Well, let me go back to my code here and let me actually 00:19:41.821 --> 00:19:44.081 change this just a little bit. 00:19:44.081 --> 00:19:49.381 Let me go ahead and introduce, for instance, another symbol 00:19:49.381 --> 00:19:53.581 here and another topic altogether, namely pointers. 00:19:53.581 --> 00:19:59.111 So a pointer is a variable that stores the address of some value-- 00:19:59.111 --> 00:20:02.371 the location of some value or more specifically, 00:20:02.371 --> 00:20:05.681 the specific byte at which that value begins. 00:20:05.681 --> 00:20:08.941 So again, if you think of your memory as being a whole bunch of bytes-- 00:20:08.941 --> 00:20:11.701 zero at top left, 2 billion or whatever at bottom right, 00:20:11.701 --> 00:20:13.201 depending on how much RAM you have-- 00:20:13.201 --> 00:20:15.481 each of those things has a location, or an address. 00:20:15.481 --> 00:20:19.571 A pointer is just a variable storing one such address. 00:20:19.571 --> 00:20:24.751 So it turns out that in the world of C, there are a couple of new symbols 00:20:24.751 --> 00:20:29.111 we can use if we want to see what it is we're talking about here, 00:20:29.111 --> 00:20:32.041 and those two operators, as of today, are these. 00:20:32.041 --> 00:20:35.831 You can use the ampersand operator in C in a couple of ways. 00:20:35.831 --> 00:20:38.761 We already saw it very briefly to do ampersand ampersand-- 00:20:38.761 --> 00:20:42.271 that is, to AND two Boolean expressions together 00:20:42.271 --> 00:20:43.811 in the context of a conditional. 00:20:43.811 --> 00:20:44.821 This is different.
00:20:44.821 --> 00:20:48.631 A single ampersand is the address-of operator. 00:20:48.631 --> 00:20:52.651 So literally, in your code, if you've got a variable like n or anything else 00:20:52.651 --> 00:20:57.901 and you write &n, C is going to figure out for you what is the address of that 00:20:57.901 --> 00:21:00.371 variable n in the computer's memory. 00:21:00.371 --> 00:21:06.001 And it's going to give you a number, otherwise known as the address of that. 00:21:06.001 --> 00:21:09.781 If you want to store that address in a variable 00:21:09.781 --> 00:21:15.841 even though yes, it's a number like 0x123, you have to tell C in advance 00:21:15.841 --> 00:21:21.721 that you want to store not an int per se, but the address of an int. 00:21:21.721 --> 00:21:25.351 And the syntax for doing that-- somewhat nonobviously-- is 00:21:25.351 --> 00:21:29.071 to use an asterisk here, a star operator, and you 00:21:29.071 --> 00:21:30.871 say this when creating the variable. 00:21:30.871 --> 00:21:35.371 If you want p to be a pointer, that is, the address of some other variable, 00:21:35.371 --> 00:21:37.051 you do int star p. 00:21:37.051 --> 00:21:41.191 And the star just tells the computer, this is not an integer per se, 00:21:41.191 --> 00:21:44.641 this is the address of something that yes, is an int, 00:21:44.641 --> 00:21:46.401 but we're just being more precise. 00:21:46.401 --> 00:21:49.301 So on the right-hand side you have the address-of operator. 00:21:49.301 --> 00:21:52.281 As always with the equal sign, you copy from right to left. 00:21:52.281 --> 00:21:56.231 Because &n is by definition the address of something, you have to store it 00:21:56.231 --> 00:22:01.781 in a pointer, and the way to declare a pointer is to specify the type of value 00:22:01.781 --> 00:22:05.831 whose address you're storing, and then use the star to indicate that this is 00:22:05.831 --> 00:22:09.341 indeed a pointer and not just a regular old int.
00:22:09.341 --> 00:22:10.811 So let's see this in practice. 00:22:10.811 --> 00:22:13.871 Let me go back to my own source code here and let 00:22:13.871 --> 00:22:15.881 me make just a couple of tweaks. 00:22:15.881 --> 00:22:18.221 I'm going to leave n alone here but I'm going 00:22:18.221 --> 00:22:22.761 to go ahead and initially just do this. 00:22:22.761 --> 00:22:27.341 Let me say int star p equals ampersand n, 00:22:27.341 --> 00:22:31.961 and then down here, I'm going to print out not n this time, but p-- 00:22:31.961 --> 00:22:33.401 the variable p. 00:22:33.401 --> 00:22:38.171 And then even though yes, it's just a number and therefore I could use %i 00:22:38.171 --> 00:22:42.311 for integers, there's actually a special format code in printf for printing 00:22:42.311 --> 00:22:45.521 pointers or addresses, and that's %p. 00:22:45.521 --> 00:22:48.821 So now let's go ahead and recompile this, make address-- 00:22:48.821 --> 00:22:53.871 so far so good-- ./address, Enter, and a little weirdly, 00:22:53.871 --> 00:22:58.511 but perhaps understandably now, the address in my computer's memory 00:22:58.511 --> 00:23:02.381 at which the variable n happened to be stored was not quite as simple 00:23:02.381 --> 00:23:03.881 as 0x123. 00:23:03.881 --> 00:23:06.431 This computer has a lot more memory so technically, 00:23:06.431 --> 00:23:12.491 it was stored at 0x7FFCB4578E5C. 00:23:12.491 --> 00:23:14.651 Now that has no special significance to me. 00:23:14.651 --> 00:23:16.881 It could have ended up somewhere else altogether, 00:23:16.881 --> 00:23:20.381 but this is just where, in my computer-- or technically the cloud 00:23:20.381 --> 00:23:22.901 server to which I'm connected using VS Code here-- 00:23:22.901 --> 00:23:25.498 that just happens to be where n ended up. 00:23:25.498 --> 00:23:28.331 And strictly speaking, I don't even need to introduce this variable. 
00:23:28.331 --> 00:23:31.181 I could get rid of p and I could just say 00:23:31.181 --> 00:23:34.901 print not just n, but the address of n and achieve the same thing. 00:23:34.901 --> 00:23:37.361 You don't need to temporarily store it in a variable. 00:23:37.361 --> 00:23:40.341 Let me just do make address again, ./address, 00:23:40.341 --> 00:23:42.921 and now I see this address here. 00:23:42.921 --> 00:23:46.466 And notice if I keep running the program, it's actually moving around. 00:23:46.466 --> 00:23:49.091 There's other stuff presumably going on inside of the computer. 00:23:49.091 --> 00:23:52.501 Maybe it's actually randomizing it so it's not always at the same location. 00:23:52.501 --> 00:23:55.001 That can actually be a security feature underneath the hood, 00:23:55.001 --> 00:24:00.521 but this happens to be at that moment in time where that value is in memory, 00:24:00.521 --> 00:24:03.491 quite like our picture a moment ago. 00:24:03.491 --> 00:24:06.641 All right, so let me pause here to see if there's now 00:24:06.641 --> 00:24:08.171 any questions on what we just did. 00:24:08.171 --> 00:24:10.171 Yeah? 00:24:10.171 --> 00:24:12.391 AUDIENCE: Is there any way to control where 00:24:12.391 --> 00:24:15.551 you are storing something in memory? 00:24:15.551 --> 00:24:18.746 Does it even matter if it works, or does it just 00:24:18.746 --> 00:24:21.271 matter that you could go in and locate where something is? 00:24:21.271 --> 00:24:22.813 DAVID J. MALAN: Really good question. 00:24:22.813 --> 00:24:25.381 Is there any way to control where something is in memory? 
00:24:25.381 --> 00:24:28.338 Short answer is yes, and this is both the power and the danger of C, 00:24:28.338 --> 00:24:31.171 and we're going to do this today and make a few deliberate mistakes, 00:24:31.171 --> 00:24:36.241 because with this power of going to or getting the address of any variable, 00:24:36.241 --> 00:24:38.341 I could just arbitrarily right now write code 00:24:38.341 --> 00:24:42.611 that stores a value at byte 2 billion, or zero, or anything in between. 00:24:42.611 --> 00:24:46.771 But that also means potentially, I could start creepily looking 00:24:46.771 --> 00:24:50.831 around at all of the computer's memory, even at things that I didn't put there. 00:24:50.831 --> 00:24:53.371 Maybe other programs, maybe other parts of programs, 00:24:53.371 --> 00:24:55.621 and indeed, this is a potential security threat, 00:24:55.621 --> 00:24:57.984 if suddenly you're able to just look anywhere 00:24:57.984 --> 00:24:59.401 you want in the computer's memory. 00:24:59.401 --> 00:25:04.021 Now, I'm overselling it a little bit because nowadays, in this decade, 00:25:04.021 --> 00:25:06.571 there are some defenses in place in compilers 00:25:06.571 --> 00:25:09.941 and in our operating systems that do hedge against this a little bit. 00:25:09.941 --> 00:25:12.391 But this is still a very frequent source of problems, 00:25:12.391 --> 00:25:14.791 and later today we'll talk briefly about things 00:25:14.791 --> 00:25:17.651 called stack overflow, which is not just a website, 00:25:17.651 --> 00:25:19.831 it is a problem that you can encounter. 00:25:19.831 --> 00:25:22.351 Heap overflow, and more generally buffer overflows-- 00:25:22.351 --> 00:25:25.801 there are just so many things that can go wrong using this language called C. 00:25:25.801 --> 00:25:29.401 Have any of you encountered a segmentation fault yet? 00:25:29.401 --> 00:25:31.321 I think we saw a few hands for that already.
00:25:31.321 --> 00:25:33.901 You touched memory that you shouldn't have 00:25:33.901 --> 00:25:38.611 and odds are you did it most recently by going too far in an array. 00:25:38.611 --> 00:25:42.001 Going to the left, or negative in an array, or somehow looking at memory 00:25:42.001 --> 00:25:42.841 you shouldn't have. 00:25:42.841 --> 00:25:47.051 And we'll explain today why it is you were able to do that. 00:25:47.051 --> 00:25:49.531 Other questions on these primitives so far? 00:25:49.531 --> 00:25:51.623 Yeah, from Carter? 00:25:51.623 --> 00:25:54.748 AUDIENCE: [INAUDIBLE] pointer star p, but then we used p later in the code. 00:25:54.748 --> 00:25:56.031 Is it called star p or p? 00:25:56.031 --> 00:25:57.281 DAVID J. MALAN: Good question. 00:25:57.281 --> 00:25:58.571 Earlier, we used star p. 00:25:58.571 --> 00:26:01.061 Let me rewind in time to the previous version of this code, 00:26:01.061 --> 00:26:03.341 where I actually had a variable called p. 00:26:03.341 --> 00:26:07.151 Just like with variable declarations in the past, 00:26:07.151 --> 00:26:12.621 once you've declared a variable to be an int, a char, a bool, or an int 00:26:12.621 --> 00:26:15.761 star, a.k.a. a pointer, you don't thereafter 00:26:15.761 --> 00:26:18.671 keep using the word int or now, the star. 00:26:18.671 --> 00:26:20.471 Once you've declared it, that's it. 00:26:20.471 --> 00:26:21.921 You only refer to it by name. 00:26:21.921 --> 00:26:26.111 And so it's very deliberate what I did here, 00:26:26.111 --> 00:26:28.661 saying that the type here is int star-- 00:26:28.661 --> 00:26:30.671 that is a pointer to an int-- 00:26:30.671 --> 00:26:33.611 but here I just said the name of the variable, as always. 00:26:33.611 --> 00:26:36.311 I didn't repeat int, and I also didn't repeat star. 
00:26:36.311 --> 00:26:39.191 But at the risk of bending one's mind a little bit, there 00:26:39.191 --> 00:26:45.441 is unfortunately one other use for the star operator, and that's as follows. 00:26:45.441 --> 00:26:49.181 If you want to print out not the address of something, 00:26:49.181 --> 00:26:54.261 but what is at a specific address, you can actually do this. 00:26:54.261 --> 00:26:59.621 If I want to print out, via %i, the integer that is at that address, 00:26:59.621 --> 00:27:04.061 I can actually use the star here, which technically contradicts what I just 00:27:04.061 --> 00:27:07.161 said, but it has a different function here-- a different purpose. 00:27:07.161 --> 00:27:09.561 So let me go ahead and do this in two different ways. 00:27:09.561 --> 00:27:11.366 I'm going to leave this line of code as is, 00:27:11.366 --> 00:27:13.241 but I'm going to add another line of code now 00:27:13.241 --> 00:27:17.201 that prints out what apparently will be an integer, in a moment. 00:27:17.201 --> 00:27:21.124 So %i backslash n, and I could see-- and let me just do n for now. 00:27:21.124 --> 00:27:23.291 So there's really nothing special happening now; I'm 00:27:23.291 --> 00:27:25.301 just adding a sort of mindless printing of n. 00:27:25.301 --> 00:27:28.041 So make address, ./address-- 00:27:28.041 --> 00:27:31.601 there's the current address of n and there's the value of n. 00:27:31.601 --> 00:27:34.571 But what's kind of cool about C here, too, 00:27:34.571 --> 00:27:38.861 is if you know that a value is at a specific address like p, 00:27:38.861 --> 00:27:42.591 there's one other use for this star operator, the asterisk. 00:27:42.591 --> 00:27:46.221 You can use it as the so-called dereference operator, 00:27:46.221 --> 00:27:49.071 which means go to that address. 00:27:49.071 --> 00:27:54.701 And so here what we actually have is an example of a pointer p, 00:27:54.701 --> 00:27:59.631 which is an address like 0x123 or 0x7FF and so forth.
00:27:59.631 --> 00:28:03.191 But if you say star p now, you're not redeclaring the variable 00:28:03.191 --> 00:28:04.631 because I didn't mention int-- 00:28:04.631 --> 00:28:07.391 you're going to that address in p. 00:28:07.391 --> 00:28:09.071 So let me recompile this now. 00:28:09.071 --> 00:28:15.191 Make address, ./address, and just to be clear-- 00:28:15.191 --> 00:28:16.721 what should I see? 00:28:16.721 --> 00:28:20.231 I'm first going to see the pointer itself, 0x something. 00:28:20.231 --> 00:28:23.096 What's the second line of output I should presumably see now? 00:28:25.801 --> 00:28:27.591 Shout a little louder. 00:28:27.591 --> 00:28:31.911 So I'm hearing 50, and that's true because if you figure out the address 00:28:31.911 --> 00:28:38.151 of n and print it in line seven, but then go to the address of n, a.k.a. p, 00:28:38.151 --> 00:28:41.331 that's indeed going to just show you the number n-- 00:28:41.331 --> 00:28:44.121 the value of n again. 00:28:44.121 --> 00:28:47.028 All right, any questions now on this syntax-- and I will concede, 00:28:47.028 --> 00:28:48.861 I think this is confusing-- the fact that we 00:28:48.861 --> 00:28:51.051 use the star for multiplication, the fact 00:28:51.051 --> 00:28:53.361 that we use the star to declare a pointer, 00:28:53.361 --> 00:28:56.601 but then we use a star in a third way to dereference the pointer 00:28:56.601 --> 00:28:57.651 and go to the pointer. 00:28:57.651 --> 00:29:01.251 It's just too confusing, honestly, but with practice comes comfort. 00:29:01.251 --> 00:29:02.681 Yeah. 00:29:02.681 --> 00:29:12.501 AUDIENCE: [INAUDIBLE] 00:29:12.501 --> 00:29:13.751 DAVID J. MALAN: Good question. 00:29:13.751 --> 00:29:17.321 Do you-- when you are using the ampersand operator 00:29:17.321 --> 00:29:19.271 to get the address of something, the onus 00:29:19.271 --> 00:29:23.411 is on you at the moment to know what you are getting the address of. 00:29:23.411 --> 00:29:24.341 Is it a string? 
00:29:24.341 --> 00:29:25.181 Is it a char? 00:29:25.181 --> 00:29:25.901 Is it a bool? 00:29:25.901 --> 00:29:26.681 Is it an int? 00:29:26.681 --> 00:29:30.041 I wrote this code so I know in line six that I'm 00:29:30.041 --> 00:29:33.131 trying to get the address of what is an integer. 00:29:33.131 --> 00:29:35.271 AUDIENCE: What about line eight? 00:29:35.271 --> 00:29:38.991 DAVID J. MALAN: In line eight you don't have 00:29:38.991 --> 00:29:40.821 to worry about that-- good question. 00:29:40.821 --> 00:29:44.851 Notice in line eight, I didn't tell the computer, other than the %i, 00:29:44.851 --> 00:29:49.551 what kind of address I'm going to, but I did already in line six. 00:29:49.551 --> 00:29:52.581 I told the compiler that p, now and forever, 00:29:52.581 --> 00:29:55.041 is going to be the address of an int. 00:29:55.041 --> 00:29:59.961 That's enough information in advance so that printf, or really the language C, 00:29:59.961 --> 00:30:03.951 still knows on line eight that p is a pointer to an int, 00:30:03.951 --> 00:30:07.371 and that way it will print out all four bytes at that address, 00:30:07.371 --> 00:30:11.288 not just part of it, and not more than those four bytes. 00:30:11.288 --> 00:30:11.871 Good question. 00:30:11.871 --> 00:30:13.801 Yeah, next to you. 00:30:13.801 --> 00:30:15.301 AUDIENCE: Do pointers have pointers? 00:30:15.301 --> 00:30:16.601 DAVID J. MALAN: Do pointers have pointers? 00:30:16.601 --> 00:30:17.101 Yes. 00:30:17.101 --> 00:30:20.731 We won't do this today by having pointers to pointers, 00:30:20.731 --> 00:30:24.421 but yes, you can use star star, and then things get-- 00:30:24.421 --> 00:30:26.311 I'm sorry. 00:30:26.311 --> 00:30:28.501 We won't do that today and we won't do that often. 00:30:28.501 --> 00:30:31.051 In fact Python, another language, is just a couple of weeks 00:30:31.051 --> 00:30:32.221 away, so hang in there. 00:30:32.221 --> 00:30:32.921 Almost there. 
00:30:32.921 --> 00:30:34.561 A question back here? 00:30:34.561 --> 00:30:36.331 Was there? 00:30:36.331 --> 00:30:38.191 That was-- more verbal feedback like that 00:30:38.191 --> 00:30:40.871 is helpful as we forge into the more complicated stuff. 00:30:40.871 --> 00:30:41.551 Other questions? 00:30:41.551 --> 00:30:42.909 Yeah. 00:30:42.909 --> 00:30:44.785 AUDIENCE: What's the point of [INAUDIBLE]? 00:30:48.071 --> 00:30:51.161 DAVID J. MALAN: What's the point of printing the address? 00:30:51.161 --> 00:30:54.451 AUDIENCE: Like, using the address to [INAUDIBLE]. 00:30:54.451 --> 00:30:55.381 DAVID J. MALAN: Sure. 00:30:55.381 --> 00:30:56.521 What's the point of doing this? 00:30:56.521 --> 00:30:58.771 If you don't mind, let me-- let's get there in a moment. 00:30:58.771 --> 00:31:01.471 This is not the common use case, just printing out the address-- 00:31:01.471 --> 00:31:02.821 who really cares? 00:31:02.821 --> 00:31:05.401 At the moment we care only for the sake of discussion. 00:31:05.401 --> 00:31:07.453 We're soon going to start using these addresses. 00:31:07.453 --> 00:31:09.661 So hang in there just a little bit for that one, too, 00:31:09.661 --> 00:31:13.621 but it will solve some problems for us before long. 00:31:13.621 --> 00:31:17.311 So let's actually just now depict what was going on inside of the computer's 00:31:17.311 --> 00:31:19.691 memory just a moment ago. 00:31:19.691 --> 00:31:23.971 So if I toggle back here, let me redraw my computer's memory; 00:31:23.971 --> 00:31:27.421 now let me plop into the memory n, which is storing in this program 00:31:27.421 --> 00:31:28.471 the number 50. 00:31:28.471 --> 00:31:30.631 Where is p in my computer's memory?
00:30:30.631 --> 00:30:33.691 Specifically, I don't know, and apparently it moves around each time I 00:30:33.691 --> 00:30:35.741 run the program, so for the sake of discussion, 00:30:35.741 --> 00:30:40.711 let's just propose that if 50 ended up at address 0x123, I don't know-- 00:30:40.711 --> 00:30:43.471 p ends up over here, at address-- 00:30:43.471 --> 00:30:46.661 whoops-- at whatever address this is here. 00:30:46.661 --> 00:30:49.111 But notice a couple of curiosities now. 00:30:49.111 --> 00:30:52.621 If p is a pointer, it's the address of something. 00:30:52.621 --> 00:30:57.961 So the value in p should be an address, and I've indeed written it as such-- 00:30:57.961 --> 00:31:02.071 0x123, and technically there's not an x there, there's not a zero there, 00:31:02.071 --> 00:31:04.471 there's not even a 123 there per se-- there's 00:31:04.471 --> 00:31:08.011 a pattern of bits that represents the address 0x123. 00:31:08.011 --> 00:31:11.681 But again, that's week zero-- don't care about binary day-to-day. 00:31:11.681 --> 00:31:17.761 So if this is p, and this I claimed was n, why is p so much bigger? 00:31:17.761 --> 00:31:20.231 Can someone conjecture here? 00:31:20.231 --> 00:31:25.061 Because it turns out whether n is an int or a char or a bool, 00:31:25.061 --> 00:31:27.701 which are different types-- heck, even a long-- 00:31:27.701 --> 00:31:31.871 it turns out that p is always going to take up eight squares on the board, 00:31:31.871 --> 00:31:33.951 but why might that be? 00:31:33.951 --> 00:31:35.261 What might explain that? 00:31:39.591 --> 00:31:41.507 Yeah, thoughts? 00:31:41.507 --> 00:31:45.451 AUDIENCE: Perhaps it allocates eight bytes, 00:31:45.451 --> 00:31:48.959 but it doesn't know the type of the data [INAUDIBLE]. 00:31:48.959 --> 00:31:50.001 DAVID J. MALAN: OK, fair. 00:31:50.001 --> 00:31:52.191 Maybe it's allocating eight bytes because it doesn't know the type.
00:32:52.191 --> 00:32:54.711 Turns out that's OK because an address is an address. 00:32:54.711 --> 00:32:58.281 It's really up to the programmer to use it as a string or a char or a bool. 00:32:58.281 --> 00:33:00.381 Other thoughts? 00:33:00.381 --> 00:33:05.443 AUDIENCE: Maybe the first four for the actual number and the last four 00:33:05.443 --> 00:33:11.033 is some null that [INAUDIBLE] where the pointer ends. 00:33:11.033 --> 00:33:12.241 DAVID J. MALAN: OK, possibly. 00:33:12.241 --> 00:33:15.211 It could be that pointers have some complexity like a backslash zero 00:33:15.211 --> 00:33:18.091 or something curious like that, like we talked about for strings. 00:33:18.091 --> 00:33:19.751 Turns out that's not the case. 00:33:19.751 --> 00:33:23.281 It turns out that pointers nowadays typically are, but not 00:33:23.281 --> 00:33:25.921 always, eight bytes, a.k.a. 00:33:25.921 --> 00:33:29.101 64 bits, because you and I-- our Macs, our PCs, 00:33:29.101 --> 00:33:32.911 heck-- even our phones have a lot more memory than they did years ago. 00:33:32.911 --> 00:33:34.801 Back in the day, a pointer might have only 00:33:34.801 --> 00:33:38.701 been 32 bits, or even only eight bits way back in the day. 00:33:38.701 --> 00:33:41.551 Consider 32 bits, because that was the norm for some time. 00:33:41.551 --> 00:33:45.091 How high can you count, roughly, if you've got 32 bits? 00:33:45.091 --> 00:33:47.901 What's the number we keep rattling off? 00:33:47.901 --> 00:33:53.061 32 bits gives you 2 to the 32 values, so it's roughly 4 billion, 00:33:53.061 --> 00:33:57.271 and I keep saying it's 2 billion if you allow for negatives, but in the world of memory 00:33:57.271 --> 00:34:00.531 there's a reason I keep saying 2 billion bytes, two gigabytes, 00:34:00.531 --> 00:34:03.591 because for a very long time that was the maximum amount of memory 00:34:03.591 --> 00:34:04.621 a computer could have. 00:34:04.621 --> 00:34:05.121 Why?
00:34:05.121 --> 00:34:07.491 Because the pointers that the computers were using 00:34:07.491 --> 00:34:09.531 were only, for instance, 32 bits. 00:34:09.531 --> 00:34:12.591 And with 32 bits, depending on whether you allow for negatives or not, 00:34:12.591 --> 00:34:15.621 you can count as high as 2 billion, roughly, or maybe 4 billion 00:34:15.621 --> 00:34:17.961 but you know what-- your Mac, your PC, your phone 00:34:17.961 --> 00:34:22.441 could not have had five gigabytes of memory, or 5 billion bytes of memory. 00:34:22.441 --> 00:34:25.191 You certainly couldn't have had what computers nowadays come with, 00:34:25.191 --> 00:34:27.171 which might be 8 gigabytes of memory-- 00:34:27.171 --> 00:34:28.561 16 gigabytes of memory. 00:34:28.561 --> 00:34:29.211 Why? 00:34:29.211 --> 00:34:33.501 Because with 4 bytes, or 32 bits, you literally, physically, 00:34:33.501 --> 00:34:37.611 can't count that high, which means if I drew a picture of all of the memory we 00:34:37.611 --> 00:34:41.301 would run out of numbers to describe them, which means most of my memory 00:34:41.301 --> 00:34:42.631 would just be unusable. 00:34:42.631 --> 00:34:45.771 So pointers nowadays are 64 bits, or eight bytes. 00:34:45.771 --> 00:34:46.521 That's really big. 00:34:46.521 --> 00:34:48.438 I can't even pronounce how big that number is, 00:34:48.438 --> 00:34:51.051 but it's plenty for the next many years, and so 00:34:51.051 --> 00:34:52.881 we've drawn it that way on the board here. 00:34:52.881 --> 00:34:54.501 Now let's just abstract this away. 
00:34:54.501 --> 00:34:56.209 Let's get rid of all the other bytes that 00:34:56.209 --> 00:34:58.911 are storing something or nothing else, and let's now 00:34:58.911 --> 00:35:02.241 start to abstract away this complexity, because the reality is, 00:35:02.241 --> 00:35:04.131 to your question earlier-- 00:35:04.131 --> 00:35:06.441 what is this useful for, or what do we-- do we actually 00:35:06.441 --> 00:35:07.971 care about these addresses? 00:35:07.971 --> 00:35:08.961 Generally, no. 00:35:08.961 --> 00:35:11.061 We're doing this so that you see there's no magic. 00:35:11.061 --> 00:35:13.951 We're just moving things around and poking around in memory. 00:35:13.951 --> 00:35:16.791 But what a person would typically do when talking about pointers 00:35:16.791 --> 00:35:19.401 would literally be to just point at something. 00:35:19.401 --> 00:35:21.951 I really don't care what address n is at, 00:35:21.951 --> 00:35:25.131 so it suffices in general, when drawing pictures on a whiteboard 00:35:25.131 --> 00:35:27.021 or having a discussion with another programmer, 00:35:27.021 --> 00:35:31.341 to just draw an arrow from the pointer to the value in question, 00:35:31.341 --> 00:35:36.470 because neither you nor I probably care about the specifics of 0x whatever. 00:35:36.470 --> 00:35:39.813 There's your pointer-- it's literally an arrow, and we can see this. 00:35:39.813 --> 00:35:42.021 So it turns out that these pointers, these addresses, 00:35:42.021 --> 00:35:45.831 are not that dissimilar to what we've done for hundreds of years 00:35:45.831 --> 00:35:48.381 in the form of a postal system. 00:35:48.381 --> 00:35:50.121 For instance, here is a post office-- 00:35:50.121 --> 00:35:52.731 here, no-- here is a mailbox, and suppose 00:35:52.731 --> 00:35:55.431 that this is a mailbox labeled p.
00:35:55.431 --> 00:35:58.191 It's a pointer, and suppose there's another mailbox 00:35:58.191 --> 00:36:02.041 way over there, which is just another byte of my computer's memory. 00:36:02.041 --> 00:36:03.831 What are we really talking about? 00:36:03.831 --> 00:36:07.881 Well, you store in a computer's memory values like the number 50, 00:36:07.881 --> 00:36:11.841 or the word "hi" inside of your computer's memory at some location. 00:36:11.841 --> 00:36:15.921 But today we can also use those same memory locations 00:36:15.921 --> 00:36:17.551 to store the address of things. 00:36:17.551 --> 00:36:21.351 For instance, if I open this up here and I 00:36:21.351 --> 00:36:25.071 see, OK, the value inside of this mailbox is not a number like 50, 00:36:25.071 --> 00:36:26.361 it's actually an address-- 00:36:26.361 --> 00:36:30.861 0x123-- that's like a pointer, a breadcrumb leading 00:36:30.861 --> 00:36:32.661 from one location in memory to another. 00:36:32.661 --> 00:36:35.161 And in fact, would someone who's seated roughly over there-- 00:36:35.161 --> 00:36:37.761 do you mind getting the mail over there? 00:36:37.761 --> 00:36:40.581 Any volunteers over in this section? 00:36:40.581 --> 00:36:42.931 Just need you to get to the mailbox before I do. 00:36:42.931 --> 00:36:44.781 Who's being volunteered? 00:36:44.781 --> 00:36:45.471 Oh yes, please. 00:36:45.471 --> 00:36:50.926 Whoever is gesturing most wildly, come on down. 00:36:50.926 --> 00:36:51.426 Sure. 00:36:57.861 --> 00:36:59.315 What's your name? 00:36:59.315 --> 00:37:00.078 AUDIENCE: Anfoo. 00:37:00.078 --> 00:37:01.161 DAVID J. MALAN: Say again? 00:37:01.161 --> 00:37:01.851 AUDIENCE: Anfoo. 00:37:01.851 --> 00:37:03.201 DAVID J. MALAN: Anfoo?
00:37:03.201 --> 00:37:06.081 OK, come on up to the edge of the stage there and just to be clear-- 00:37:06.081 --> 00:37:09.801 if this is p, that is apparently n, but to make clear 00:37:09.801 --> 00:37:12.621 what we're talking about when we're storing 0x whatever values-- 00:37:12.621 --> 00:37:15.771 like 0x123, that's essentially equivalent to my 00:37:15.771 --> 00:37:18.501 maybe pulling out something like this and just 00:37:18.501 --> 00:37:21.051 abstractly pointing to your mailbox there, 00:37:21.051 --> 00:37:25.311 or if you prefer, pointing to the mailbox-- 00:37:25.311 --> 00:37:26.271 OK, all right. 00:37:28.951 --> 00:37:29.451 Thank you. 00:37:29.451 --> 00:37:29.951 All right. 00:37:32.661 --> 00:37:34.821 This is akin to me pointing at your mailbox, 00:37:34.821 --> 00:37:36.863 and if you want to go ahead and open your mailbox 00:37:36.863 --> 00:37:43.201 and reveal to the crowd what's inside your mailbox labeled n. 00:37:43.201 --> 00:37:43.981 All right. 00:37:46.501 --> 00:37:48.601 Thank you. 00:37:48.601 --> 00:37:51.221 We have a little CS50 stress ball for your trouble. 00:37:51.221 --> 00:37:52.553 Thank you for coming up. 00:37:52.553 --> 00:37:55.261 So that's just to put a visual on what it is we're talking about, 00:37:55.261 --> 00:37:58.171 because it can get very abstract, very cryptic quickly when we're 00:37:58.171 --> 00:38:01.391 talking about addresses and memory and drawing it like these little squares. 00:38:01.391 --> 00:38:04.308 But if you think about just walking into a post office or an apartment 00:38:04.308 --> 00:38:07.261 complex that's got a lot of mailboxes, those mailboxes 00:38:07.261 --> 00:38:10.231 essentially are a big chunk of memory and each 00:38:10.231 --> 00:38:12.091 of those mailboxes has an address-- 00:38:12.091 --> 00:38:14.821 this is apartment one, two, three-- apartment 2 billion. 
00:38:14.821 --> 00:38:18.091 And inside of those mailboxes can go anything 00:38:18.091 --> 00:38:20.261 that can be represented as information. 00:38:20.261 --> 00:38:23.341 It could be a number like n, or 50, or, if you 00:38:23.341 --> 00:38:25.741 prefer, it could be a number that represents 00:38:25.741 --> 00:38:27.631 the address of another mailbox. 00:38:27.631 --> 00:38:30.811 And this is akin, really, if you've ever had an apartment or you 00:38:30.811 --> 00:38:33.631 and your parents have moved, to having a forwarding address. 00:38:33.631 --> 00:38:36.001 It's like having the Post Office in the US 00:38:36.001 --> 00:38:39.481 put some kind of piece of paper in your old mailbox saying, 00:38:39.481 --> 00:38:41.911 actually forward it to that other mailbox. 00:38:41.911 --> 00:38:44.281 That really is all a pointer is doing. 00:38:44.281 --> 00:38:45.991 At the end of the day, it's just a number, 00:38:45.991 --> 00:38:48.331 but it's a number being used in a different way, 00:38:48.331 --> 00:38:50.461 and it's the syntax that we've introduced, 00:38:50.461 --> 00:38:54.271 not just int but int star, that tells the computer how 00:38:54.271 --> 00:38:58.741 to treat that number in this slightly different way. 00:38:58.741 --> 00:39:01.841 Are there any questions, then, on this? 00:39:01.841 --> 00:39:03.962 Yeah, in back. 00:39:03.962 --> 00:39:06.379 AUDIENCE: If you had a variable, like int c, [INAUDIBLE]. 00:39:10.711 --> 00:39:12.691 DAVID J. MALAN: If I did int c and-- 00:39:12.691 --> 00:39:14.841 say the code again? 00:39:14.841 --> 00:39:17.011 Once more? 00:39:17.011 --> 00:39:19.141 Equal to n, so let me actually type it out. 00:39:19.141 --> 00:39:21.271 If I give myself another line of code, tell me 00:39:21.271 --> 00:39:27.251 one last time what to type. int c is equal to n, like this?
00:39:27.251 --> 00:39:31.951 So this is OK, and I can't draw it quite quickly enough on the board here, 00:39:31.951 --> 00:39:36.181 but this would be like creating another four bytes somewhere in memory, maybe 00:39:36.181 --> 00:39:40.231 down here, that stores an identical copy of 50 00:39:40.231 --> 00:39:43.381 because the assignment operator from right to left copies one value 00:39:43.381 --> 00:39:44.201 to another. 00:39:44.201 --> 00:39:47.671 So that would just add one more rectangle of size four 00:39:47.671 --> 00:39:50.391 to this particular picture. 00:39:50.391 --> 00:39:52.371 If I'm answering your question as intended. 00:39:52.371 --> 00:39:57.231 OK, so that is week one style use of assignment operators before pointers. 00:39:57.231 --> 00:40:00.051 I could, though, start copying pointers but again, we'll 00:40:00.051 --> 00:40:01.881 come back to some of that complexity. 00:40:01.881 --> 00:40:03.421 Any other questions here? 00:40:03.421 --> 00:40:04.921 AUDIENCE: That was a great question. 00:40:04.921 --> 00:40:06.841 Does the pointer point-- 00:40:06.841 --> 00:40:10.084 does the same pointer point to the new replica as well? 00:40:10.084 --> 00:40:11.501 DAVID J. MALAN: Ah, good question. 00:40:11.501 --> 00:40:12.406 Short answer, no. 00:40:12.406 --> 00:40:17.101 And to repeat for the camera, if I create a second variable like this, 00:40:17.101 --> 00:40:21.271 int c equals n, and I claim without actually drawing it on the board 00:40:21.271 --> 00:40:25.191 that this gives me another rectangle, the value of which is also 50, 00:40:25.191 --> 00:40:26.681 p does not get touched. 00:40:26.681 --> 00:40:29.041 And this is what's important and really characteristic 00:40:29.041 --> 00:40:33.001 of C. Nothing happens automatically for you. 
00:40:33.001 --> 00:40:36.581 p is not going to be updated unless you update p in some way, 00:40:36.581 --> 00:40:39.121 so creating a third variable called c-- even 00:40:39.121 --> 00:40:41.521 if you're copying its value from right to left, 00:40:41.521 --> 00:40:44.701 that has no effect on anything else in the program. 00:40:44.701 --> 00:40:46.031 A good question. 00:40:46.031 --> 00:40:52.201 So what have we seen that's perhaps now a little more explainable? 00:40:52.201 --> 00:40:56.221 Well, recall that we talked quite a bit last week about strings, and just 00:40:56.221 --> 00:41:02.101 to recap in layperson's terms, what is this string as you now understand it? 00:41:02.101 --> 00:41:04.191 So say-- well, let me take a specific hand here. 00:41:04.191 --> 00:41:05.091 What's a string? 00:41:05.091 --> 00:41:06.926 How about over here? 00:41:06.926 --> 00:41:08.301 AUDIENCE: An array of characters. 00:41:08.301 --> 00:41:08.811 DAVID J. MALAN: OK, sure. 00:41:08.811 --> 00:41:09.728 Both of you are right. 00:41:09.728 --> 00:41:10.971 An array of characters. 00:41:10.971 --> 00:41:13.761 An array of characters, and we-- 00:41:13.761 --> 00:41:16.881 I claimed-- or revealed last week that string is not technically 00:41:16.881 --> 00:41:20.151 a feature built into C. It's not an official data type, 00:41:20.151 --> 00:41:22.401 but every programmer in most any language 00:41:22.401 --> 00:41:25.641 refers to sequences of characters-- words, letters, 00:41:25.641 --> 00:41:27.451 paragraphs-- as strings. 00:41:27.451 --> 00:41:30.771 So the vernacular exists, but the data type doesn't typically 00:41:30.771 --> 00:41:34.111 exist per se in C. So what we're about to do, if you will, 00:41:34.111 --> 00:41:36.951 for dramatic effect, is take off some training wheels today. 00:41:36.951 --> 00:41:41.451 The CS50 library, implemented in the form of the header file cs50.h, 00:41:41.451 --> 00:41:43.581 we claim has had a bunch of things in it.
00:41:43.581 --> 00:41:46.761 Prototypes for get_string, prototypes for get_int, 00:41:46.761 --> 00:41:49.281 and all of those other functions, but it turns out 00:41:49.281 --> 00:41:53.481 it also is what defines the word "string" in such a way 00:41:53.481 --> 00:41:55.981 that you all have been able to use it these past several weeks. 00:41:55.981 --> 00:41:58.641 So let's take a look at an example of a string in use. 00:41:58.641 --> 00:42:00.681 Here, for instance, is a tiny bit of code 00:42:00.681 --> 00:42:05.421 that uses the word "string," creating a variable called s 00:42:05.421 --> 00:42:08.083 and then storing quote unquote, hi, exclamation point. 00:42:08.083 --> 00:42:10.791 Let's consider what this looks like now in the computer's memory. 00:42:10.791 --> 00:42:13.541 I don't care about all the other bytes; let's just focus on these, 00:42:13.541 --> 00:42:16.551 and this, per last week, is how "hi" might be stored: 00:42:16.551 --> 00:42:19.311 h-i exclamation point and then one more, as someone already 00:42:19.311 --> 00:42:23.151 observed, that sentinel value-- that null character, which 00:42:23.151 --> 00:42:26.558 just means eight zero bits, to demarcate the end of that string, so that 00:42:26.558 --> 00:42:28.641 just in case there's something to the right of it, 00:42:28.641 --> 00:42:31.801 the computer can distinguish one string from another. 00:42:31.801 --> 00:42:35.004 So last week we introduced this new syntax. 00:42:35.004 --> 00:42:36.921 Well, if strings are just arrays of characters, 00:42:36.921 --> 00:42:39.831 you can then very cleverly use that square bracket notation 00:42:39.831 --> 00:42:44.631 and go to location zero or one or two, which are like addresses, 00:42:44.631 --> 00:42:46.431 but they're relative to the string.
00:42:46.431 --> 00:42:51.381 This could be at 0x123 or 0x456, but with this bracket notation 00:42:51.381 --> 00:42:54.381 zero is always the beginning of the string, one is the next, 00:42:54.381 --> 00:42:55.801 two is the next, and so forth. 00:42:55.801 --> 00:43:00.561 So that was our array syntax for indexing into an array. 00:43:00.561 --> 00:43:03.471 But technically speaking, we can go a little deeper today-- 00:43:03.471 --> 00:43:09.741 technically speaking, if "hi" starts at the address 0x123, then 00:43:09.741 --> 00:43:15.711 it stands to reason that i is at 0x124, exclamation point's at 0x125, 00:43:15.711 --> 00:43:18.711 and the null character is at 0x126. 00:43:18.711 --> 00:43:23.331 Now, I don't care about 123 per se, but even though this is hexadecimal, 00:43:23.331 --> 00:43:24.591 this is correct math. 00:43:24.591 --> 00:43:28.101 Even in hex, if you just add one when you start at 0x123, 00:43:28.101 --> 00:43:30.456 the next numbers end in four, five, six. 00:43:30.456 --> 00:43:32.331 I don't have to worry about A's, B's, and C's 00:43:32.331 --> 00:43:35.341 because I'm not counting that high in this example. 00:43:35.341 --> 00:43:39.531 So if that's the case, and my computer is actually 00:43:39.531 --> 00:43:47.271 laying out the word hi in memory like that, well, what exactly is s? 00:43:47.271 --> 00:43:50.001 What exactly is s if, at the end of the day, 00:43:50.001 --> 00:43:56.031 h-i exclamation point null is stored at these addresses? 00:43:56.031 --> 00:43:57.006 Where is s? 00:43:57.006 --> 00:43:58.881 Now that I've taken off those training wheels 00:43:58.881 --> 00:44:02.481 and shown you where h-i exclamation point null actually are, 00:44:02.481 --> 00:44:04.221 what happened to s? 00:44:04.221 --> 00:44:08.211 Well, s, as always, is actually a variable.
00:44:08.211 --> 00:44:10.251 Even in the code I proposed a moment ago, 00:44:10.251 --> 00:44:13.551 string is apparently a data type that, yes, doesn't come with C, 00:44:13.551 --> 00:44:16.101 but CS50's library makes it exist. 00:44:16.101 --> 00:44:21.471 s is a variable of type string, so where is s in this picture? 00:44:21.471 --> 00:44:25.431 Well, it turns out that s might be up here. 00:44:25.431 --> 00:44:28.971 Again, I'm just drawing it anywhere for the sake of discussion, 00:44:28.971 --> 00:44:33.141 but s is a variable per that line of code. 00:44:33.141 --> 00:44:36.978 What s is storing, apparently, I claim, is 0x123. 00:44:36.978 --> 00:44:40.311 I actually don't really care about these addresses, so let's abstract that away. 00:44:40.311 --> 00:44:45.591 s is apparently, as of now, today, one week later, just a pointer 00:44:45.591 --> 00:44:46.761 to a character. 00:44:46.761 --> 00:44:49.311 Specifically, the first character in s. 00:44:49.311 --> 00:44:51.411 And this is the last piece of the puzzle. 00:44:51.411 --> 00:44:54.981 Last week we had this clever way of demarcating the end of a string. 00:44:54.981 --> 00:44:59.901 Well, it turns out that strings are represented in the computer's memory 00:44:59.901 --> 00:45:03.861 as a variable that is a pointer, inside of which 00:45:03.861 --> 00:45:06.901 is the address of the first character in the string. 00:45:06.901 --> 00:45:09.951 So if s points at the first character and you 00:45:09.951 --> 00:45:12.501 can trust that backslash zero is at the end of the string, 00:45:12.501 --> 00:45:18.091 that's literally all you need to figure out where a string begins and ends. 00:45:18.091 --> 00:45:19.531 So what do I mean by this? 00:45:19.531 --> 00:45:21.141 Well, let's be a little more concrete.
00:45:21.141 --> 00:45:24.801 In terms of this picture, if I've started with this line of code here, 00:45:24.801 --> 00:45:29.961 it turns out that all this time, since week 1, the word string has just 00:45:29.961 --> 00:45:36.871 semi-secretly been an alias for char star. 00:45:36.871 --> 00:45:39.391 I know, so char star. 00:45:39.391 --> 00:45:40.841 So why does this make sense? 00:45:40.841 --> 00:45:44.081 It's a little weird still, but if in our previous example 00:45:44.081 --> 00:45:47.671 we were able to store the address of an integer by declaring a variable 00:45:47.671 --> 00:45:49.831 called p, as int star p-- 00:45:49.831 --> 00:45:52.681 well, if as of now strings are just the address 00:45:52.681 --> 00:45:58.111 of the first character in a string, then probably a string is just a char star 00:45:58.111 --> 00:46:01.861 because that means s is the address of a character, the very 00:46:01.861 --> 00:46:03.461 first character in the string. 00:46:03.461 --> 00:46:07.441 Now, the string might have three letters like it did, or four, or even a hundred 00:46:07.441 --> 00:46:09.571 if it's a long paragraph, but that's fine 00:46:09.571 --> 00:46:11.488 because you can trust that there's going to be 00:46:11.488 --> 00:46:13.181 that null character at the very end. 00:46:13.181 --> 00:46:16.921 So this is a general-purpose way of representing strings 00:46:16.921 --> 00:46:20.041 using this new mechanism in C. 00:46:20.041 --> 00:46:23.221 So in fact, let me go ahead here and introduce maybe 00:46:23.221 --> 00:46:25.061 a couple of manipulations of this. 00:46:25.061 --> 00:46:28.831 Let me go back to my code here, and let's get rid of this integer stuff, 00:46:28.831 --> 00:46:32.381 and let's instead now do, for instance, this. 00:46:32.381 --> 00:46:37.383 Let me add in the CS50 library, so we'll include CS50.h for now.
00:46:37.383 --> 00:46:39.091 I'm going to go ahead and inside of main, 00:46:39.091 --> 00:46:41.971 give myself a string s equals hi exclamation point. 00:46:41.971 --> 00:46:43.621 I don't type the backslash zero. 00:46:43.621 --> 00:46:48.228 C does that for me automatically by using my double quotes like this. 00:46:48.228 --> 00:46:49.811 Now let me just go ahead and print it. 00:46:49.811 --> 00:46:52.981 So this again is week 1 style stuff where I'm just printing a string. 00:46:52.981 --> 00:46:54.611 No pointers yet. 00:46:54.611 --> 00:46:59.761 So let me do make address, Enter, ./address, and hopefully I see hi, 00:46:59.761 --> 00:47:01.391 so nothing new there. 00:47:01.391 --> 00:47:05.341 But let's start to peel back some of these layers here. 00:47:05.341 --> 00:47:09.361 Let me first of all, get rid of the CS50 library for a moment 00:47:09.361 --> 00:47:13.651 and let me change string to char star. 00:47:13.651 --> 00:47:15.901 And it's a little bit weird but yes, the convention 00:47:15.901 --> 00:47:19.899 is to say char, a space, then the star, and then immediately thereafter 00:47:19.899 --> 00:47:20.941 the name of the variable. 00:47:20.941 --> 00:47:23.691 Strictly speaking though, you might see textbooks or websites that 00:47:23.691 --> 00:47:26.671 do it like this or like this, but the canonical way 00:47:26.671 --> 00:47:28.451 is typically to do it like that. 00:47:28.451 --> 00:47:31.311 So now no more CS50 library, no more training wheels, if you will. 00:47:31.311 --> 00:47:33.821 I'm just treating strings for what they really are. 00:47:33.821 --> 00:47:37.021 Let me go ahead and do make address, Enter-- 00:47:37.021 --> 00:47:39.181 so far so good-- ./address-- 00:47:39.181 --> 00:47:40.651 and that, too, still works. 00:47:40.651 --> 00:47:44.851 So %s is a thing that comes with printf because the word string is programmer 00:47:44.851 --> 00:47:48.901 terminology but strictly speaking C doesn't have a string data type. 
00:47:48.901 --> 00:47:53.221 It's always been char star, so what this means now is I 00:47:53.221 --> 00:47:56.761 can start to have some fun with these basic ideas, 00:47:56.761 --> 00:47:59.891 even though this is not purposeful other than for the sake of discussion. 00:47:59.891 --> 00:48:03.901 But if s is this-- let me go back and give myself the CS50 library. 00:48:03.901 --> 00:48:06.391 Let's put those training wheels back on for just a moment 00:48:06.391 --> 00:48:09.221 so that I can do one manipulation at a time. 00:48:09.221 --> 00:48:12.131 Here's my string s, as before. 00:48:12.131 --> 00:48:15.181 Well, let me go ahead and declare a char called c, 00:48:15.181 --> 00:48:20.221 and let me store the first character in the string there, which is 00:48:20.221 --> 00:48:22.891 s bracket zero, and that should give me h. 00:48:22.891 --> 00:48:25.951 And then just for kicks, let me go ahead and do char star-- 00:48:25.951 --> 00:48:33.061 whoops-- let me go ahead and do char star p equals ampersand c, 00:48:33.061 --> 00:48:35.491 and see what this actually prints for me. 00:48:35.491 --> 00:48:38.861 Let me go ahead and print out what p is here. 00:48:38.861 --> 00:48:40.091 So we're just playing around. 00:48:40.091 --> 00:48:43.681 So make address-- so far so good-- ./address. 00:48:43.681 --> 00:48:46.021 All right, so what have I just done? 00:48:46.021 --> 00:48:51.151 I've just created a char c and stored in it the letter h, which 00:48:51.151 --> 00:48:55.531 is the same thing as s bracket zero, then I'm saying, what's the address of c, 00:48:55.531 --> 00:48:58.391 and that's apparently 0x7FF whatever. 00:48:58.391 --> 00:48:59.641 So that's the address. 00:48:59.641 --> 00:49:01.841 But I technically didn't have to do that. 00:49:01.841 --> 00:49:03.641 Let me go ahead and do two things now. 00:49:03.641 --> 00:49:12.001 Instead of just printing p, let me go ahead and print out maybe s itself.
00:49:12.001 --> 00:49:14.461 Let me go ahead and do make address, Enter-- 00:49:14.461 --> 00:49:17.611 so far so good-- ./address and-- 00:49:17.611 --> 00:49:20.371 damn it, what did I do wrong? 00:49:20.371 --> 00:49:22.201 Oh shoot, I didn't want to do that. 00:49:22.201 --> 00:49:25.781 Oh, I really made a mess of this. 00:49:25.781 --> 00:49:28.561 What did I want to do here? 00:49:28.561 --> 00:49:31.831 That was supposed to be impressive but it was the opposite. 00:49:31.831 --> 00:49:35.321 So let me turn it around. 00:49:35.321 --> 00:49:39.181 So if I intended to do this, why are lines nine and 10 00:49:39.181 --> 00:49:41.461 printing different values? 00:49:41.461 --> 00:49:44.641 Didn't really intend to go here, but let me try to save this. 00:49:44.641 --> 00:49:51.991 Why are we seeing different addresses, namely this address 0x402004 for s, 00:49:51.991 --> 00:49:57.031 and then 0x7FF for p? 00:49:57.031 --> 00:49:57.991 Any thoughts? 00:49:57.991 --> 00:50:00.121 Yeah, over here. 00:50:00.121 --> 00:50:02.571 AUDIENCE: [INAUDIBLE] is the character c is 00:50:02.571 --> 00:50:07.471 its own sort of location of the [INAUDIBLE], 00:50:07.471 --> 00:50:09.513 and it's taking off just the values [INAUDIBLE]. 00:50:09.513 --> 00:50:10.513 DAVID J. MALAN: Correct. 00:50:10.513 --> 00:50:12.684 So if I really wanted to weasel my way out of this, 00:50:12.684 --> 00:50:15.351 this is a great answer to the previous question, which was about 00:50:15.351 --> 00:50:20.091 what if I introduce another variable, c, that's a copy of the value, 00:50:20.091 --> 00:50:22.791 and not in this case an int, but an actual char. 00:50:22.791 --> 00:50:28.281 Here, I've made c be a copy of the character that's at the beginning of s, 00:50:28.281 --> 00:50:29.381 but that's indeed a copy.
00:50:29.381 --> 00:50:31.131 So if I were to draw it on the screen, that 00:50:31.131 --> 00:50:35.271 would give me a different rectangle in which this copy of h 00:50:35.271 --> 00:50:36.681 would actually be stored. 00:50:36.681 --> 00:50:38.631 So I didn't intend to do this, but what you're 00:50:38.631 --> 00:50:40.618 seeing is, yes, the address of s-- 00:50:40.618 --> 00:50:42.951 and apparently that's at a pretty low address by default 00:50:42.951 --> 00:50:44.961 here-- then you're seeing the address of c. 00:50:44.961 --> 00:50:47.841 But even though each of them is h, I claim 00:50:47.841 --> 00:50:49.803 one is at a different address in memory. 00:50:49.803 --> 00:50:51.261 And this has always been happening. 00:50:51.261 --> 00:50:53.991 Any time you created one variable or another, it was ending up here, 00:50:53.991 --> 00:50:55.908 or here, or here, or somewhere else in memory. 00:50:55.908 --> 00:50:58.911 Now for the first time all we're doing is actually just poking around 00:50:58.911 --> 00:51:02.371 the computer's memory to see what is actually there. 00:51:02.371 --> 00:51:06.021 So let me actually back this up a little bit 00:51:06.021 --> 00:51:09.391 and do what I intended to do here, which was something like this. 00:51:09.391 --> 00:51:13.551 So if string s equals quote unquote, hi, let's go ahead 00:51:13.551 --> 00:51:23.051 and give myself a pointer, called p, to the first character in s. 00:51:23.051 --> 00:51:26.891 All right, so now let me go ahead and print out the value of this pointer, 00:51:26.891 --> 00:51:29.034 %p, printing out p. 00:51:29.034 --> 00:51:30.951 So we're just going to do one thing at a time. 00:51:30.951 --> 00:51:33.761 So make address, Enter, ./address. 00:51:33.761 --> 00:51:38.861 There, at the moment, is the address of the first character in s. 00:51:38.861 --> 00:51:40.781 What I meant to do now was this.
00:51:40.781 --> 00:51:43.721 If I want to print out two things this time, 00:51:43.721 --> 00:51:49.391 let me print out not only what p is, but also what s itself originally is. 00:51:49.391 --> 00:51:53.411 Because if I claim that everyone from last week should be comfortable with 00:51:53.411 --> 00:51:56.381 s bracket zero just representing the first character in s 00:51:56.381 --> 00:51:59.621 by definition of strings being arrays of characters, 00:51:59.621 --> 00:52:05.871 then s, as of today, is itself the address of a character, 00:52:05.871 --> 00:52:06.761 the first one in s. 00:52:06.761 --> 00:52:10.721 So if I now do make address, and do ./address, 00:52:10.721 --> 00:52:13.481 this time I see the same exact things. 00:52:13.481 --> 00:52:14.081 Thank you. 00:52:18.228 --> 00:52:20.811 This is really the lamest sort of thing to be applauding over, 00:52:20.811 --> 00:52:26.571 but what we're demonstrating here is that s is by definition the address 00:52:26.571 --> 00:52:28.261 of the first character in s. 00:52:28.261 --> 00:52:30.931 So if we borrow some of our mental model from last week-- 00:52:30.931 --> 00:52:35.811 well, if s bracket zero is the first character in s, doing the ampersand on 00:52:35.811 --> 00:52:38.351 that expression should be the same as s. 00:52:38.351 --> 00:52:40.851 Now this isn't to say that we would jump through these hoops 00:52:40.851 --> 00:52:45.051 all the time with this much syntax, but this is just to do proof by example 00:52:45.051 --> 00:52:51.171 that s is in fact, as I claimed a moment ago, just the address of a character.
00:52:51.171 --> 00:52:54.651 Not even multiple characters; it's the address of a single character, 00:52:54.651 --> 00:52:58.581 but the key thing is it's the address of the first character in the string, 00:52:58.581 --> 00:53:01.821 and per last week we trust that C is going 00:53:01.821 --> 00:53:04.881 to look for that null character at the very end just 00:53:04.881 --> 00:53:08.721 to make sure it knows where the string actually ends. 00:53:08.721 --> 00:53:12.317 All right, a question came up over here. 00:53:12.317 --> 00:53:25.581 AUDIENCE: [INAUDIBLE] 00:53:25.581 --> 00:53:26.581 DAVID J. MALAN: Correct. 00:53:26.581 --> 00:53:30.181 To summarize, on line eight, when I am using %p-- 00:53:30.181 --> 00:53:33.181 that just means print a pointer value, so 0x something-- 00:53:33.181 --> 00:53:35.581 I'm passing it s. 00:53:35.581 --> 00:53:41.281 Previously, when we used %s, printf knew to print not just the first character 00:53:41.281 --> 00:53:45.481 of s, but h, i, exclamation point, and then stop when it hits the backslash 00:53:45.481 --> 00:53:46.621 zero. 00:53:46.621 --> 00:53:51.841 %p is different. %p tells the computer to go to that address-- 00:53:51.841 --> 00:53:56.711 sorry, tells the computer to print that address on the screen. 00:53:56.711 --> 00:53:59.761 So this is where %s all this time has been powerful. 00:53:59.761 --> 00:54:03.961 The reason printf worked in weeks 1 and 2 and 3 00:54:03.961 --> 00:54:07.261 was because printf was designed by some human years ago 00:54:07.261 --> 00:54:10.291 to go to the address that's being passed in-- for instance, 00:54:10.291 --> 00:54:12.631 s-- and print out character after character 00:54:12.631 --> 00:54:16.291 after character until it sees the null character backslash zero, 00:54:16.291 --> 00:54:17.891 and then stop printing. 00:54:17.891 --> 00:54:21.481 So that's-- you're getting a lot of functionality for free from %s.
00:54:21.481 --> 00:54:23.911 Today we're using something much simpler, %p, 00:54:23.911 --> 00:54:27.211 which just literally prints what s is. 00:54:27.211 --> 00:54:28.951 And the reason we didn't do this in week 1 00:54:28.951 --> 00:54:31.021 is just because this is like way too much 00:54:31.021 --> 00:54:33.021 to be interesting when all you want to print out 00:54:33.021 --> 00:54:34.541 is hi or hello, world, or the like. 00:54:34.541 --> 00:54:36.511 But now what we're really doing is revealing 00:54:36.511 --> 00:54:38.941 what's been going on this whole time. 00:54:38.941 --> 00:54:40.678 And let me do one other example here. 00:54:40.678 --> 00:54:42.511 Let me go ahead and get rid of this variable 00:54:42.511 --> 00:54:45.901 here and let me just print out a few things to make the same point. 00:54:45.901 --> 00:54:50.131 I'm going to print out not just s like I did here, but let's go ahead 00:54:50.131 --> 00:54:51.181 and print out every-- 00:54:51.181 --> 00:54:53.071 the address of every character in s. 00:54:53.071 --> 00:54:57.353 So let's get the first letter in s and get its address, 00:54:57.353 --> 00:54:59.311 and I'm going to do copy paste for time's sake, 00:54:59.311 --> 00:55:02.521 but not something I would do frequently. 00:55:02.521 --> 00:55:06.034 So let me print out the address of the first character, the second character, 00:55:06.034 --> 00:55:07.951 the third, and actually even the fourth, which 00:55:07.951 --> 00:55:11.321 is the backslash zero, by doing this. 00:55:11.321 --> 00:55:15.931 So when I compile this program-- make address, ./address-- 00:55:15.931 --> 00:55:19.441 I should see two identical values and then 00:55:19.441 --> 00:55:21.931 additional values that are one byte apart. 00:55:21.931 --> 00:55:27.571 In my diagram a moment ago, my addresses were arbitrarily 0x123, 124, 125, 126. 00:55:27.571 --> 00:55:33.841 Now it starts at, by chance, 0x402004, which is s.
00:55:33.841 --> 00:55:37.381 0x402004 is the same thing as s because I'm just 00:55:37.381 --> 00:55:39.991 saying go to the first character and then get its address. 00:55:39.991 --> 00:55:41.491 Those are one and the same now. 00:55:41.491 --> 00:55:47.401 And then after that is 0x402005, 006, 007, 00:55:47.401 --> 00:55:49.181 because that is just like the diagram. 00:55:49.181 --> 00:55:52.981 Go to the i, to the exclamation point, and to the null character. 00:55:52.981 --> 00:55:55.891 So all I'm doing now, using my newfound understanding of what 00:55:55.891 --> 00:55:59.251 ampersand does and what the star does, is just playing around. 00:55:59.251 --> 00:56:02.149 I'm poking around in the computer's memory, 00:56:02.149 --> 00:56:03.691 just to demonstrate there's no magic. 00:56:03.691 --> 00:56:06.661 It's all there very deliberately because I or printf or someone 00:56:06.661 --> 00:56:07.441 else put it there. 00:56:07.441 --> 00:56:09.166 Yeah. 00:56:09.166 --> 00:56:15.894 AUDIENCE: [INAUDIBLE] 00:56:15.894 --> 00:56:17.561 DAVID J. MALAN: Really good observation. 00:56:17.561 --> 00:56:21.071 So it's indeed the case that hi, unlike 50, 00:56:21.071 --> 00:56:26.291 is ending up at a very low address, not the 0x7FF wherever it was. 00:56:26.291 --> 00:56:29.261 That's actually because, long story short, string literals like this 00:56:29.261 --> 00:56:32.231 are often stored in a different part of the computer's memory-- 00:56:32.231 --> 00:56:34.331 more on that later today-- for efficiency. 00:56:34.331 --> 00:56:37.541 There's actually only going to be one copy of the word "hi" and exclamation 00:56:37.541 --> 00:56:40.821 point, and the computer is going to tuck it near the beginning of my memory, 00:56:40.821 --> 00:56:43.751 but other values like ints and floats and the 00:56:43.751 --> 00:56:46.391 like-- they end up elsewhere, at much higher addresses like that 0x7FF one, by convention. 00:56:46.391 --> 00:56:49.641 But a good observation, because that is consistent here.
00:56:49.641 --> 00:56:53.111 All right, so a couple of final details, then, on what's been going on here. 00:56:53.111 --> 00:56:58.691 Let me go ahead and claim that we implemented char star-- 00:56:58.691 --> 00:57:01.391 or rather, string as a char star as follows. 00:57:01.391 --> 00:57:03.731 As of last week we were writing this code. 00:57:03.731 --> 00:57:07.961 As of this week, we can now start writing this code because string, 00:57:07.961 --> 00:57:11.541 specifically, is something we invented in the CS50 library. 00:57:11.541 --> 00:57:14.891 But it turns out you've seen a way of inventing your own data types. 00:57:14.891 --> 00:57:16.631 Recall this thing here. 00:57:16.631 --> 00:57:20.861 We played around last time with data structures, or the struct keyword in C, 00:57:20.861 --> 00:57:24.641 and briefly the typedef keyword, which defines a type for you. 00:57:24.641 --> 00:57:26.651 And if I highlight what's interesting here, 00:57:26.651 --> 00:57:30.341 the way we invented a person data type last time 00:57:30.341 --> 00:57:33.401 was to define a person as having two variables inside of it-- 00:57:33.401 --> 00:57:38.598 a structure that encapsulates a name and encapsulates a number. 00:57:38.598 --> 00:57:41.681 Now even though the syntax is a little different today because of the star 00:57:41.681 --> 00:57:47.771 thing, notice that this could be a similar application of that idea. 00:57:47.771 --> 00:57:52.061 If I want to create a type called string, highlighted in yellow here, 00:57:52.061 --> 00:57:56.231 then I use typedef to define it as char star. 00:57:56.231 --> 00:57:59.951 So this is literally all that has ever been in CS50.h, 00:57:59.951 --> 00:58:02.771 in addition to those prototypes of functions we've talked about.
00:58:02.771 --> 00:58:05.831 typedef char star string is one line of code 00:58:05.831 --> 00:58:10.558 that brings the word string as a data type into existence, 00:58:10.558 --> 00:58:12.141 and that's all that's ever been there. 00:58:12.141 --> 00:58:15.281 But the star, the char star, is just too much in week 1. 00:58:15.281 --> 00:58:18.671 We wait until this point to peel back that layer. 00:58:18.671 --> 00:58:21.161 Are there any questions, then, on what a string is? 00:58:21.161 --> 00:58:23.741 What the star or the ampersand are doing? 00:58:23.741 --> 00:58:25.511 Yeah. 00:58:25.511 --> 00:58:28.608 AUDIENCE: [INAUDIBLE] 00:58:28.608 --> 00:58:29.691 DAVID J. MALAN: Oh my God. 00:58:29.691 --> 00:58:31.071 Massive spoiler, but yes. 00:58:31.071 --> 00:58:34.671 If that is-- is that why, when you compare two strings as I briefly 00:58:34.671 --> 00:58:38.671 did, or almost did, problems arise? 00:58:38.671 --> 00:58:40.971 And in fact yes, last week we used str compare-- 00:58:40.971 --> 00:58:45.351 STRCMP-- for a very deliberate reason because yes, the spoiler is I 00:58:45.351 --> 00:58:49.941 accidentally would have compared two addresses in memory, not the strings 00:58:49.941 --> 00:58:52.111 at those addresses. 00:58:52.111 --> 00:58:53.251 Other questions here? 00:58:55.213 --> 00:58:58.171 All right, well, before we give ourselves maybe a 10-minute break here, 00:58:58.171 --> 00:58:59.401 we have lots of pieces of paper. 00:58:59.401 --> 00:59:02.191 If anyone wants to come on up and play with this big stack of Post-Its, 00:59:02.191 --> 00:59:04.201 if you want to make your own eight-by-eight grid of something 00:59:04.201 --> 00:59:07.261 to share with the class if you're artistically inclined, come on up. 00:59:07.261 --> 00:59:09.991 Otherwise, let's take 10 minutes and we'll return after 10.
00:59:09.991 --> 00:59:14.911 All right, so let's come back to this question of how 00:59:14.911 --> 00:59:17.881 we can start to use these pointers and these addresses, ultimately 00:59:17.881 --> 00:59:18.971 in an interesting way. 00:59:18.971 --> 00:59:21.211 The goal ultimately next week is going to be 00:59:21.211 --> 00:59:24.931 to use these addresses to really stitch together more complicated data 00:59:24.931 --> 00:59:28.261 structures than just persons, like last week, or candidates 00:59:28.261 --> 00:59:30.061 in the context of an electoral algorithm, 00:59:30.061 --> 00:59:33.631 if you will, and actually really use our memory in the most versatile way 00:59:33.631 --> 00:59:36.691 to represent not just images but maybe videos 00:59:36.691 --> 00:59:39.191 and other two-dimensional structures as well. 00:59:39.191 --> 00:59:41.581 But for now, let's come back to this address example, 00:59:41.581 --> 00:59:46.561 whittle it down to just a hi initially, and see what's going on again, here 00:59:46.561 --> 00:59:47.461 underneath the hood. 00:59:47.461 --> 00:59:50.401 So let me re-add the CS50 library just so we 00:59:50.401 --> 00:59:54.031 use our synonym for a moment, that is the word string, 00:59:54.031 --> 00:59:56.161 and I'll redefine s as a string. 00:59:56.161 --> 00:59:58.831 And what I didn't mention before is that these double quotes 00:59:58.831 --> 01:00:01.681 that you've been using for some time are actually a little special. 01:00:01.681 --> 01:00:04.921 The double quotes are a clue to the compiler 01:00:04.921 --> 01:00:09.311 that what is between them is in fact a string as we now know it, 01:00:09.311 --> 01:00:12.571 which means the compiler will do all the work of figuring out 01:00:12.571 --> 01:00:15.331 where to put the h, the i, the exclamation point, 01:00:15.331 --> 01:00:18.361 and even adding for you automatically a backslash zero. 
01:00:18.361 --> 01:00:20.581 And what the compiler will do for you, too, 01:00:20.581 --> 01:00:23.461 is figure out what address the first of those four chars 01:00:23.461 --> 01:00:27.331 ended up at and store it for you in the variable s. 01:00:27.331 --> 01:00:31.531 So that's why it just happens with strings without using ampersands 01:00:31.531 --> 01:00:35.911 or even stars explicitly, but the star at least has been there because again, 01:00:35.911 --> 01:00:38.401 string is just synonymous now with char star. 01:00:38.401 --> 01:00:42.371 It's not really as readable, but it is now the same idea. 01:00:42.371 --> 01:00:44.911 So I'll leave string in place just to do something week 01:00:44.911 --> 01:00:48.581 1 style here for a moment, and let's go ahead and print out a few characters. 01:00:48.581 --> 01:00:54.031 So I'm going to use %c this time, and I'm going to print out s bracket zero 01:00:54.031 --> 01:00:59.161 and then I'm going to print out s bracket one and s bracket two, 01:00:59.161 --> 01:01:03.091 literally doing week three style from last week-- 01:01:03.091 --> 01:01:07.921 a printing of every character in s as though it were an array. 01:01:07.921 --> 01:01:11.221 So ./address should give me h-i exclamation point. 01:01:11.221 --> 01:01:14.461 And if I really want to get curious, technically speaking, 01:01:14.461 --> 01:01:18.691 I could print out one more location, and let me go ahead and recompile, 01:01:18.691 --> 01:01:24.211 make address, ./address, and there is, it would seem, the backslash zero. 01:01:24.211 --> 01:01:29.641 I'm not seeing zero because I didn't type literally the zero char in ASCII; 01:01:29.641 --> 01:01:33.331 it's literally eight zero bits, which are technically unprintable, 01:01:33.331 --> 01:01:34.961 if you will, in printf speak. 01:01:34.961 --> 01:01:37.351 And so what I'm seeing here is like a blank symbol.
01:01:37.351 --> 01:01:39.541 That just means there is something else there-- 01:01:39.541 --> 01:01:43.801 it's apparently all eight zero bits, but they are there 01:01:43.801 --> 01:01:46.571 even though we're not seeing them literally right now. 01:01:46.571 --> 01:01:49.211 Well, let's go ahead and peel back one of these layers 01:01:49.211 --> 01:01:53.131 and let me go ahead and get rid of the CS50 library and get rid of, 01:01:53.131 --> 01:01:56.551 therefore, the word string because again, henceforth it's just char star. 01:01:56.551 --> 01:01:57.901 Nothing else is different. 01:01:57.901 --> 01:02:00.781 I'm going to now do make address, ./address, 01:02:00.781 --> 01:02:02.251 and it's the same exact thing. 01:02:02.251 --> 01:02:05.621 And now, let's just focus on the hi rather than even worry about the null character. 01:02:05.621 --> 01:02:10.411 So I'm going to recompile one last time and now I have h-i exclamation point. 01:02:10.411 --> 01:02:15.001 Well, it turns out that the array notation we used last week 01:02:15.001 --> 01:02:17.611 was technically what's called syntactic sugar. 01:02:17.611 --> 01:02:20.821 Sort of a neat way to use syntax in a useful way, 01:02:20.821 --> 01:02:26.431 but we can see more explicitly today what the square brackets for a string 01:02:26.431 --> 01:02:28.061 are actually doing. 01:02:28.061 --> 01:02:29.801 Let me go ahead and do this. 01:02:29.801 --> 01:02:35.041 Let me adventurously say I want to print out not s bracket 01:02:35.041 --> 01:02:40.831 zero, but I want to print out whatever the first character of s is. 01:02:40.831 --> 01:02:43.081 So to be clear, what is s now? 01:02:43.081 --> 01:02:44.431 It's the address of a string. 01:02:44.431 --> 01:02:45.931 OK, but what is s, really?
01:02:45.931 --> 01:02:49.441 s is the address of the first char in a string, 01:02:49.441 --> 01:02:52.441 and again, that's sufficient for defining a string because eventually 01:02:52.441 --> 01:02:55.361 the computer will see that there's a backslash zero at the end of it. 01:02:55.361 --> 01:03:01.241 So s is specifically the address of the first character in a string. 01:03:01.241 --> 01:03:04.291 So that means, using my new syntax, if I want 01:03:04.291 --> 01:03:07.583 to print out that first character I can print out star 01:03:07.583 --> 01:03:11.473 s, because recall that star is the dereference operator: when you don't 01:03:11.473 --> 01:03:13.681 repeat the word char, you don't repeat the word int-- 01:03:13.681 --> 01:03:15.301 you just use the star here. 01:03:15.301 --> 01:03:17.821 That means go to that address. 01:03:17.821 --> 01:03:22.651 Similarly, if I, in my newfound knowledge of how strings work, 01:03:22.651 --> 01:03:26.281 know that the h comes first, then the i right after it, 01:03:26.281 --> 01:03:30.151 then the exclamation point, then the backslash zero, contiguously 01:03:30.151 --> 01:03:33.931 one byte apart, I could start to do some arithmetic. 01:03:33.931 --> 01:03:39.571 I could go to s plus 1 byte and print out the second character, 01:03:39.571 --> 01:03:43.321 and I could print out whatever is at s plus 2-- 01:03:43.321 --> 01:03:46.591 in fact, doing what's generally known as pointer arithmetic. 01:03:46.591 --> 01:03:49.591 Literally treating pointers as the numbers they are-- 01:03:49.591 --> 01:03:52.831 hexadecimal or decimal, doesn't really matter-- it's still just numbers, 01:03:52.831 --> 01:03:55.661 and going ahead and adding one byte or two bytes 01:03:55.661 --> 01:03:58.151 to them, starting at the beginning of a string, 01:03:58.151 --> 01:04:00.831 and just poking around from left to right.
01:04:00.831 --> 01:04:04.901 So this now is equivalent to what we did last week using square bracket 01:04:04.901 --> 01:04:09.671 notation, but now I'm reimplementing that same idea with this lower-level 01:04:09.671 --> 01:04:13.821 plumbing, understanding ampersand and stars now a little bit more. 01:04:13.821 --> 01:04:16.601 So if I remake this program and do ./address, 01:04:16.601 --> 01:04:19.128 I should still see h-i exclamation point. 01:04:19.128 --> 01:04:21.461 But what I'm really doing is just kind of demonstrating, 01:04:21.461 --> 01:04:24.851 hopefully, my understanding of what really 01:04:24.851 --> 01:04:26.711 is going on in the computer's memory. 01:04:26.711 --> 01:04:29.231 Now, programmers who are maybe trying to show off 01:04:29.231 --> 01:04:30.611 might actually write this syntax. 01:04:30.611 --> 01:04:33.236 I think the more common syntax would be what we did last week-- 01:04:33.236 --> 01:04:34.971 s bracket zero, s bracket one. 01:04:34.971 --> 01:04:35.471 Why? 01:04:35.471 --> 01:04:37.346 It's just a little more readable and we don't 01:04:37.346 --> 01:04:41.531 need to brag about or care about this underlying representation. 01:04:41.531 --> 01:04:44.411 The square brackets last week were an abstraction, if you will, 01:04:44.411 --> 01:04:46.721 on top of what is lower-level math. 01:04:46.721 --> 01:04:49.361 But that's all that's going on underneath the hood. 01:04:49.361 --> 01:04:52.811 We're poking around from byte to byte to byte. 01:04:52.811 --> 01:04:58.221 All right, let me pause here, see if there are any questions on that one. 01:04:58.221 --> 01:05:00.931 Any questions on this? 01:05:00.931 --> 01:05:03.651 Let's do one more then, just to demonstrate that this is not 01:05:03.651 --> 01:05:05.171 even specific to strings. 01:05:05.171 --> 01:05:07.161 Let me go ahead and get rid of all of this 01:05:07.161 --> 01:05:11.541 and let me give myself an array of numbers like I did last week.
01:05:11.541 --> 01:05:13.821 So if I'm going to declare all the numbers 01:05:13.821 --> 01:05:16.521 at once using this funky curly brace notation, 01:05:16.521 --> 01:05:19.971 I can do like 4, 6, 8, 2, 7, 5, 0. 01:05:19.971 --> 01:05:24.051 So seven different numbers inside of an array that's automatically 01:05:24.051 --> 01:05:25.071 initialized like this. 01:05:25.071 --> 01:05:27.131 I don't, strictly speaking, need to say seven. 01:05:27.131 --> 01:05:28.881 The compiler is smart enough to figure out 01:05:28.881 --> 01:05:31.251 how many numbers I put with commas between them, 01:05:31.251 --> 01:05:35.751 and that just gives me an array containing 4, 6, 8, 2, 7, 5, 0. 01:05:35.751 --> 01:05:39.201 So it turns out I can print each of these numbers in the familiar way. 01:05:39.201 --> 01:05:45.021 I can do a printf of %i backslash n, and I can print numbers bracket zero, 01:05:45.021 --> 01:05:49.041 and let me just do some quick copy/paste just to print the first three of these. 01:05:49.041 --> 01:05:53.881 Theoretically, that should print out 4, 6, 8, and so forth. 01:05:53.881 --> 01:05:57.021 But I can do the same sort of manipulation understanding 01:05:57.021 --> 01:05:59.931 what pointers now are, using pointer arithmetic. 01:05:59.931 --> 01:06:03.741 So let me actually unwind this and just go back to one printf, 01:06:03.741 --> 01:06:07.191 and instead of printing numbers bracket zero like I might have last week, 01:06:07.191 --> 01:06:11.361 let me just go and print out whatever is at that address-- 01:06:11.361 --> 01:06:13.431 so asterisk numbers. 
01:06:13.431 --> 01:06:15.861 Let me then print out the second digit, which 01:06:15.861 --> 01:06:21.051 is going to be whatever is at numbers plus 1, and then let me do this further 01:06:21.051 --> 01:06:25.021 and do whatever is at numbers plus 2, and if I really want to repeat this, 01:06:25.021 --> 01:06:27.261 let me do it four more times and do what's 01:06:27.261 --> 01:06:31.881 at location three, four, five, and six. 01:06:31.881 --> 01:06:35.631 And that's seven total numbers because I started counting at zero. 01:06:35.631 --> 01:06:37.201 So let me just quickly run this. 01:06:37.201 --> 01:06:39.651 Make address, ./address. 01:06:39.651 --> 01:06:42.381 There are those seven digits being printed. 01:06:42.381 --> 01:06:46.401 But there's something subtle but also useful here. 01:06:46.401 --> 01:06:47.541 Each of these digits-- 01:06:47.541 --> 01:06:49.341 4, 6, 8, 2, 7, 5, 0-- 01:06:49.341 --> 01:06:49.891 is an int. 01:06:49.891 --> 01:06:50.391 Why? 01:06:50.391 --> 01:06:52.531 Because I made an array of integers. 01:06:52.531 --> 01:06:57.181 But think back-- how big is a typical integer, have we claimed? 01:06:57.181 --> 01:07:02.821 Four bytes, or 32 bits, so it's worth noting that I don't really 01:07:02.821 --> 01:07:04.841 need to worry about that detail. 01:07:04.841 --> 01:07:10.119 Notice that I did not do plus 4, plus 8, plus 12, plus 16, plus 20. 01:07:10.119 --> 01:07:11.911 I, the programmer, strictly speaking, don't 01:07:11.911 --> 01:07:14.191 need to worry about how big the data type is. 01:07:14.191 --> 01:07:16.291 This is the power of pointer arithmetic. 01:07:16.291 --> 01:07:21.931 The compiler is smart enough to know that if you add 1 to this pointer, 01:07:21.931 --> 01:07:26.441 that is the same as saying go one more piece of data-- 01:07:26.441 --> 01:07:27.481 not just one byte-- 01:07:27.481 --> 01:07:29.251 so if it's an int, move four. 01:07:29.251 --> 01:07:30.871 If it's a second int, move eight.
01:07:30.871 --> 01:07:32.601 If it's a third int, move 12. 01:07:32.601 --> 01:07:35.821 Pointer arithmetic handles that annoying arithmetic for you 01:07:35.821 --> 01:07:38.461 so you can just think of this as a number after a number 01:07:38.461 --> 01:07:41.821 after a number that are back to back to back but not one byte apart, 01:07:41.821 --> 01:07:43.171 but four bytes apart. 01:07:43.171 --> 01:07:47.201 Which is only to say plus 1, plus 2, plus 3 works no matter the data type. 01:07:47.201 --> 01:07:47.701 Why? 01:07:47.701 --> 01:07:53.121 Because the compiler knows what type of data you're talking about. 01:07:53.121 --> 01:07:56.511 Now, there's one other detail I should reveal here 01:07:56.511 --> 01:07:58.671 that I've taken for granted. 01:07:58.671 --> 01:08:01.641 In the past I was using double quotes to represent strings, 01:08:01.641 --> 01:08:04.371 and I claimed that the compiler's smart enough to realize that oh, 01:08:04.371 --> 01:08:08.911 if I have double quote hi, that means it's an array of h-i exclamation point, 01:08:08.911 --> 01:08:10.431 and then the backslash zero. 01:08:10.431 --> 01:08:12.801 Notice this usefulness. 01:08:12.801 --> 01:08:18.561 It turns out that you can actually treat arrays as though the name of the array 01:08:18.561 --> 01:08:20.781 is itself a pointer, and this is actually 01:08:20.781 --> 01:08:23.151 going to be something useful in upcoming problems 01:08:23.151 --> 01:08:26.721 when we want to pass arrays around in the computer's memory. 01:08:26.721 --> 01:08:30.463 Notice that strictly speaking on line five, there are no pointers going on. 01:08:30.463 --> 01:08:32.421 There's no star, there's no ampersand-- there's 01:08:32.421 --> 01:08:35.661 nothing new there, and yet instantly on line seven 01:08:35.661 --> 01:08:40.491 I'm pretending that it is the address, and this is actually OK.
01:08:40.491 --> 01:08:44.391 It turns out that an array really can be treated 01:08:44.391 --> 01:08:47.881 as the address of the first element in that array. 01:08:47.881 --> 01:08:52.079 The difference is that there's no secret backslash zero anywhere. 01:08:52.079 --> 01:08:53.871 This is just part of the phone number here, 01:08:53.871 --> 01:08:56.691 the ending in zero-- that's not like a special backslash zero. 01:08:56.691 --> 01:08:59.721 So this is something we're going to take advantage of too, before long. 01:08:59.721 --> 01:09:03.441 There's this interrelationship between addresses and arrays 01:09:03.441 --> 01:09:08.121 that just generally allows you to treat one as though it is the other, 01:09:08.121 --> 01:09:10.521 but the math is taken care of for you. 01:09:10.521 --> 01:09:14.961 Are any questions then on this before we start to solve some bigger problems? 01:09:14.961 --> 01:09:16.761 Yeah. 01:09:16.761 --> 01:09:23.784 AUDIENCE: [INAUDIBLE] 01:09:23.784 --> 01:09:24.951 DAVID J. MALAN: Potentially. 01:09:24.951 --> 01:09:28.911 If you go beyond the end of an array, you might get a segmentation fault. 01:09:28.911 --> 01:09:32.181 The problem is that that symptom is sometimes nondeterministic, 01:09:32.181 --> 01:09:35.181 which means that sometimes it will happen, sometimes it won't. 01:09:35.181 --> 01:09:39.141 It often depends on how far off the end of the array you actually go. 01:09:39.141 --> 01:09:41.631 You'll often not induce the segmentation fault 01:09:41.631 --> 01:09:44.421 if you just poke a little too far, but if you go way too far 01:09:44.421 --> 01:09:45.831 it quite likely will. 01:09:45.831 --> 01:09:49.161 But we'll give you a tool today actually for detecting and solving 01:09:49.161 --> 01:09:51.181 exactly that kind of situation. 01:09:51.181 --> 01:09:54.091 So let's go ahead now and do something a little different in code, 01:09:54.091 --> 01:09:56.601 but that actually comes back to that spoiler from earlier. 
01:09:56.601 --> 01:10:01.471 Let me go ahead and create a program called compare.c, and in this program 01:10:01.471 --> 01:10:04.641 I'm going to go ahead and allow myself the CS50 library, 01:10:04.641 --> 01:10:08.121 not so much for string but so that I can actually use GetInt still, 01:10:08.121 --> 01:10:12.440 which is way easier than the way we'll see that C normally lets you get input. 01:10:12.440 --> 01:10:15.471 Let me give myself stdio.h, do an int main(void), 01:10:15.471 --> 01:10:18.381 not worrying about command line arguments today, and let me go ahead 01:10:18.381 --> 01:10:22.701 and get an int i using get int, and ask the human for the value of i, 01:10:22.701 --> 01:10:28.461 then let me give myself an int j, ask the user for another int, calling it j, 01:10:28.461 --> 01:10:32.631 and then let me go ahead and kind of naively, but to your point earlier, 01:10:32.631 --> 01:10:36.051 if i equals equals j, then let's go ahead 01:10:36.051 --> 01:10:41.121 and print out something like "same," backslash n, else let's go ahead 01:10:41.121 --> 01:10:44.791 and print out "different" if they are not, in fact, the same. 01:10:44.791 --> 01:10:48.951 So that would seem to be a program that compares the value of two integers. 01:10:48.951 --> 01:10:51.261 All right, so let's go ahead and run make compare-- 01:10:51.261 --> 01:10:53.451 so far so good-- ./compare. 01:10:53.451 --> 01:10:56.991 OK, i will be 50, j will be 50-- 01:10:56.991 --> 01:10:58.041 they're the same. 01:10:58.041 --> 01:10:59.221 Let's do it once more. 01:10:59.221 --> 01:11:02.239 i will be 50, j will be 42. 01:11:02.239 --> 01:11:03.031 They are different. 01:11:03.031 --> 01:11:07.341 So so far, so good in this first version of comparison. 01:11:07.341 --> 01:11:10.411 But as you might see where I'm going with this, 01:11:10.411 --> 01:11:14.151 let's move away from integers and let's actually change these things to char-- 01:11:14.151 --> 01:11:15.301 to strings. 
01:11:15.301 --> 01:11:17.901 So I could do string s over here-- 01:11:17.901 --> 01:11:20.481 GetString s over here. 01:11:20.481 --> 01:11:27.351 Then I could do string t over here, and GetString over here, 01:11:27.351 --> 01:11:30.081 asking the user for t this time, here. 01:11:30.081 --> 01:11:31.611 And then I can compare the two. 01:11:31.611 --> 01:11:33.458 If s equals equals t-- 01:11:33.458 --> 01:11:34.791 and this is a common convention. 01:11:34.791 --> 01:11:37.821 If you've used s for string already you can use t for the next one, at least 01:11:37.821 --> 01:11:39.441 for simple demonstrations like this. 01:11:39.441 --> 01:11:42.566 I'm going to compare the two, just like I did for ints, which worked great. 01:11:42.566 --> 01:11:46.521 Make compare-- so far so good-- ./address-- 01:11:46.521 --> 01:11:47.361 oh, sorry. 01:11:47.361 --> 01:11:49.221 Wrong program-- ./compare. 01:11:49.221 --> 01:11:52.431 Let me go ahead and type in something like 01:11:52.431 --> 01:11:57.401 hi, exclamation point and bye, exclamation point, which of course 01:11:57.401 --> 01:11:59.301 should definitely be different. 01:11:59.301 --> 01:12:05.121 Let me run it again with hi, exclamation point and hi, exclamation point. 01:12:05.121 --> 01:12:07.071 Different-- maybe I messed up. 01:12:07.071 --> 01:12:10.181 Let's maybe do it in lowercase; maybe that'll fix it. 01:12:10.181 --> 01:12:12.501 But no, those two are different. 01:12:12.501 --> 01:12:16.481 So to come back to what I described as a spoiler earlier, what's 01:12:16.481 --> 01:12:20.659 the fundamental issue here, to be clear? 01:12:20.659 --> 01:12:22.701 Why is it saying different even though I'm pretty 01:12:22.701 --> 01:12:24.118 sure I typed the same thing twice? 01:12:24.118 --> 01:12:26.181 Yeah.
01:12:26.181 --> 01:12:29.601 Yeah, this is where it's now useful to know that string has been 01:12:29.601 --> 01:12:33.063 an abstraction-- a training wheel, if you will-- and if we take that away-- 01:12:33.063 --> 01:12:35.271 still use GetString because that's convenient still-- 01:12:35.271 --> 01:12:38.061 but if I change string to be char star, it's 01:12:38.061 --> 01:12:44.301 a little more explicit as to what s and what t are. s is a pointer to a char, 01:12:44.301 --> 01:12:46.761 that is the address of a char. t is a pointer 01:12:46.761 --> 01:12:48.921 to a char, that is the address of a char. 01:12:48.921 --> 01:12:52.071 Specifically, the first character in s and the first character 01:12:52.071 --> 01:12:53.851 in t, respectively. 01:12:53.851 --> 01:12:56.076 So if I'm comparing these two it should stand 01:12:56.076 --> 01:12:57.951 to reason that they're going to be different. 01:12:57.951 --> 01:12:58.451 Why? 01:12:58.451 --> 01:13:02.061 Because s might end up here in memory and t might end up here in memory. 01:13:02.061 --> 01:13:05.181 Each time I call GetString, it is not smart enough or advanced enough 01:13:05.181 --> 01:13:07.171 to know that, wait a minute-- you typed the same thing. 01:13:07.171 --> 01:13:08.691 I'm just going to hand you back the same address. 01:13:08.691 --> 01:13:11.511 That doesn't happen because we did not design GetString that way. 01:13:11.511 --> 01:13:15.141 Each time I call GetString, it returns, apparently, 01:13:15.141 --> 01:13:17.901 a different copy of the string that was typed in. 01:13:17.901 --> 01:13:20.211 A hi over here and a hi over here. 01:13:20.211 --> 01:13:22.791 They might look the same to the human but to the computer 01:13:22.791 --> 01:13:26.691 they are different chunks of memory, and therefore at different addresses. 01:13:26.691 --> 01:13:30.181 And here, too, we can reveal what is GetString returning? 
01:13:30.181 --> 01:13:34.161 Well, up until today it was returning a string, so to speak. 01:13:34.161 --> 01:13:35.661 That's not really a thing. 01:13:35.661 --> 01:13:38.001 Technically, what GetString has always been 01:13:38.001 --> 01:13:43.371 doing is returning the address of the first char in a string 01:13:43.371 --> 01:13:47.181 and trusting that we put a backslash zero at the end of whatever the human 01:13:47.181 --> 01:13:51.411 typed in, and that's enough now for printf, for strlen, for you 01:13:51.411 --> 01:13:53.961 to know where a string begins and ends. 01:13:53.961 --> 01:13:57.711 So GetString has actually always returned a pointer. 01:13:57.711 --> 01:14:01.101 It has not returned a quote unquote string per se, 01:14:01.101 --> 01:14:04.401 but there are functions that can solve this comparison for us. 01:14:04.401 --> 01:14:07.501 Recall that I could do something like this. 01:14:07.501 --> 01:14:10.431 I could actually go in here and I could-- 01:14:10.431 --> 01:14:11.641 let's see, where was it? 01:14:11.641 --> 01:14:18.981 So if I call str compare here and pass in two values, s and t, 01:14:18.981 --> 01:14:22.701 let's see now what happens when I make compare. 01:14:22.701 --> 01:14:26.211 Implicitly declaring library function str compare with type int-- 01:14:26.211 --> 01:14:27.321 and well, there's a star. 01:14:27.321 --> 01:14:30.801 So you might have seen this error before and you might have ignored most of it, 01:14:30.801 --> 01:14:35.281 but there's some evidence of stars or pointers going on here. 01:14:35.281 --> 01:14:37.771 It looks like I didn't include the string.h header file, 01:14:37.771 --> 01:14:38.961 so that's an easy fix. 01:14:38.961 --> 01:14:43.551 Include string.h which, despite its name, does not create a data type 01:14:43.551 --> 01:14:46.431 called string; it just has string-related functions in it 01:14:46.431 --> 01:14:47.511 like str compare.
01:14:47.511 --> 01:14:49.161 Let's make compare again. 01:14:49.161 --> 01:14:51.231 Now it compiles, ./compare. 01:14:51.231 --> 01:14:55.011 Now let's type in hi, exclamation point and even the same thing again. 01:14:55.011 --> 01:14:58.641 These are now-- oh, I used it wrong. 01:14:58.641 --> 01:15:00.364 OK, user error. 01:15:00.364 --> 01:15:02.781 That was supposed to be impressive, but it's the opposite. 01:15:02.781 --> 01:15:05.101 What did I do wrong? 01:15:05.101 --> 01:15:06.201 What did I do wrong here? 01:15:06.201 --> 01:15:07.463 Yeah. 01:15:07.463 --> 01:15:08.951 Yeah. 01:15:08.951 --> 01:15:12.258 AUDIENCE: [INAUDIBLE] 01:15:12.258 --> 01:15:14.591 DAVID J. MALAN: Yeah, it returns three different kinds of values. 01:15:14.591 --> 01:15:18.371 Zero if they're the same, negative if the first comes before the second, 01:15:18.371 --> 01:15:20.061 positive if the opposite is true. 01:15:20.061 --> 01:15:23.261 I just forgot that, so like I did last week correctly, 01:15:23.261 --> 01:15:26.741 if I want to compare them for equality per the manual page, 01:15:26.741 --> 01:15:29.421 I should be checking for zero as the return value. 01:15:29.421 --> 01:15:32.591 Now make compare, ./compare, Enter. 01:15:32.591 --> 01:15:35.261 Let's try it one last time-- hi and hi. 01:15:35.261 --> 01:15:36.821 OK now, they're in fact the same. 01:15:36.821 --> 01:15:38.231 And Justin, thank you. 01:15:41.871 --> 01:15:44.751 And indeed, it's not just returning same all the time. 01:15:44.751 --> 01:15:46.971 If I type in hi and then bye, it's indeed 01:15:46.971 --> 01:15:49.261 noticing that difference as well. 01:15:49.261 --> 01:15:53.251 Well, let me go ahead and do one other thing here. 01:15:53.251 --> 01:15:55.501 Let's do one other thing. 01:15:55.501 --> 01:15:59.001 Let me go ahead now and just reveal more pictorially what's going on. 01:15:59.001 --> 01:16:02.331 Let's get rid of the string comparison and let's just print these things out.
01:16:02.331 --> 01:16:06.111 The simple way to print this out would be with %s and again, %s is special-- 01:16:06.111 --> 01:16:07.161 printf knows-- 01:16:07.161 --> 01:16:10.341 to take an address, start there, and print every character up 01:16:10.341 --> 01:16:13.741 until the backslash zero, so let's just hand it s and do that. 01:16:13.741 --> 01:16:16.911 And then let's do one more, %s, and pass in t. 01:16:16.911 --> 01:16:21.751 This is, again, sort of a mix of week 1 and this week 01:16:21.751 --> 01:16:23.571 because I got rid of the word string. 01:16:23.571 --> 01:16:28.711 I'm using char star, but I'm still using printf and %s in the same way. 01:16:28.711 --> 01:16:32.331 Let me go ahead and run compare now, and if I type hi and hi, 01:16:32.331 --> 01:16:34.291 I should see the same thing twice. 01:16:34.291 --> 01:16:37.911 So they look the same, but here now we have the syntax today 01:16:37.911 --> 01:16:40.291 to print out the actual addresses of these things. 01:16:40.291 --> 01:16:44.721 So let me just change the s to a p, because p means don't go to the address 01:16:44.721 --> 01:16:48.651 and print what's there; it means just print the address as a pointer. 01:16:48.651 --> 01:16:53.421 So make compare, ./compare, and now let's type in hi, and once more, 01:16:53.421 --> 01:16:57.831 and I should see, indeed, two slightly different addresses given 01:16:57.831 --> 01:16:58.641 in hexadecimal. 01:16:58.641 --> 01:17:00.951 One's got a B at the end, one's got an F at the end, 01:17:00.951 --> 01:17:03.481 and they are indeed a few bytes apart. 01:17:03.481 --> 01:17:06.706 So this is just confirming what our suspicions have actually been. 01:17:06.706 --> 01:17:09.081 So what does this mean, perhaps in the computer's memory? 01:17:09.081 --> 01:17:10.581 Well, let's take a look. 01:17:10.581 --> 01:17:14.511 I've zoomed out so I have a few more squares to look at at once.
01:17:14.511 --> 01:17:20.901 Here might be s in memory when I do string s equals, or char star s equals. 01:17:20.901 --> 01:17:24.381 I get a variable that's of size 1, 2, 3, 4, 5, 6, 7, 8, because I 01:17:24.381 --> 01:17:27.951 claimed earlier that on modern systems, pointers are generally eight bytes 01:17:27.951 --> 01:17:30.261 nowadays so they can count even higher. 01:17:30.261 --> 01:17:33.246 And inside of the computer's memory, also, might be hi. 01:17:33.246 --> 01:17:35.871 And I don't know where it ends up, so for the sake of discussion 01:17:35.871 --> 01:17:36.801 it ended up down here. 01:17:36.801 --> 01:17:39.761 That's what was free when I ran the program. 01:17:39.761 --> 01:17:41.601 h-i exclamation point, backslash zero. 01:17:41.601 --> 01:17:46.761 Maybe it ended up, for the sake of discussion, at 0x123, 0x124, 0x125, and 0x126. 01:17:46.761 --> 01:17:51.801 So to be clear, what is s storing once the assignment 01:17:51.801 --> 01:17:54.711 operator copies from right to left? 01:17:54.711 --> 01:17:59.331 What is s storing if I advance one more slide? 01:17:59.331 --> 01:18:01.451 Yeah. 01:18:01.451 --> 01:18:05.261 0x123, the presumption being that if a string is 01:18:05.261 --> 01:18:09.236 defined by the address of its first char and that address of its first char 01:18:09.236 --> 01:18:13.691 is 0x123, then that's indeed what should be in the variable s. 01:18:13.691 --> 01:18:16.751 And so technically, that's what's been happening with that assignment 01:18:16.751 --> 01:18:18.251 operator from right to left. 01:18:18.251 --> 01:18:21.401 GetString indeed returns a string, so to speak, 01:18:21.401 --> 01:18:25.241 but more properly it returns the address of a char. 01:18:25.241 --> 01:18:28.721 What's been then copied from right to left using that assignment operator 01:18:28.721 --> 01:18:31.601 all these weeks is indeed that address. 01:18:31.601 --> 01:18:36.101 Now technically, we don't really need to care about where these addresses are.
01:18:36.101 --> 01:18:38.951 It suffices to just think about them referentially, but let's 01:18:38.951 --> 01:18:42.791 first consider where t might be. t is just another variable that I 01:18:42.791 --> 01:18:44.441 created on my second line of code. 01:18:44.441 --> 01:18:46.061 Maybe it ends up there, maybe somewhere else. 01:18:46.061 --> 01:18:48.353 For the sake of discussion I'll draw it left and right. 01:18:48.353 --> 01:18:51.771 Where did the second word end up that I typed in? 01:18:51.771 --> 01:18:57.671 Well, suppose the second copy of hi ended up at 0x456, 0x457, 0x458, and 0x459. 01:18:57.671 --> 01:18:58.961 What ended up in t? 01:18:58.961 --> 01:19:00.551 I'll pluck this one off myself. 01:19:00.551 --> 01:19:02.621 0x456, presumably. 01:19:02.621 --> 01:19:06.071 And so this is now a pictorial representation of why, 01:19:06.071 --> 01:19:07.751 and let's abstract away everything else. 01:19:07.751 --> 01:19:13.061 When I compared s against t using equal equals, based on the picture 01:19:13.061 --> 01:19:14.591 they're obviously not the same. 01:19:14.591 --> 01:19:16.751 One is over here, one is over here. 01:19:16.751 --> 01:19:21.281 And per a moment ago, one is 0x123, the other is 0x456. 01:19:21.281 --> 01:19:24.491 Yes, technically they're pointing at something that's the same, 01:19:24.491 --> 01:19:27.971 but that just reveals how str compare works.
01:19:27.971 --> 01:19:30.641 str compare is apparently a function that 01:19:30.641 --> 01:19:33.881 takes in the address of one string 01:19:33.881 --> 01:19:36.401 and the address of another string as its arguments, 01:19:36.401 --> 01:19:41.321 goes to the first character in each of those strings, respectively, 01:19:41.321 --> 01:19:43.511 and probably has a for loop or a while loop 01:19:43.511 --> 01:19:46.421 and just goes from left to right, comparing, looking 01:19:46.421 --> 01:19:50.141 for the same chars left and right, and if it doesn't notice any differences, 01:19:50.141 --> 01:19:52.121 boom-- it returns zero. 01:19:52.121 --> 01:19:56.481 If it does notice a difference it returns a positive or a negative value. 01:19:56.481 --> 01:20:00.321 And that's very similar, recall, to how we implemented string length ourselves 01:20:00.321 --> 01:20:00.821 last week. 01:20:00.821 --> 01:20:03.731 I used a for loop, I was looking for a backslash zero. 01:20:03.731 --> 01:20:09.521 str compare is probably a little similar in spirit, looping from left to right 01:20:09.521 --> 01:20:13.001 but comparing, this time not just counting. 01:20:13.001 --> 01:20:15.731 Are there any questions, then, on string comparison 01:20:15.731 --> 01:20:18.821 and why it is that we use str compare and not equals equals? 01:20:18.821 --> 01:20:20.013 Yeah. 01:20:20.013 --> 01:20:22.249 AUDIENCE: Do pointers have addresses? 01:20:22.249 --> 01:20:24.041 DAVID J. MALAN: Do pointers have addresses? 01:20:24.041 --> 01:20:24.541 Yes. 01:20:24.541 --> 01:20:29.291 So we won't do that today, but I could actually use the ampersand operator 01:20:29.291 --> 01:20:30.821 on s or on t. 01:20:30.821 --> 01:20:34.421 That would give me the equivalent of a char star star 01:20:34.421 --> 01:20:36.606 that itself could be stored elsewhere in memory. 01:20:36.606 --> 01:20:37.481 In practice, it usually ends there. 01:20:37.481 --> 01:20:39.671 You could keep adding stars, but you rarely need to.
01:20:39.671 --> 01:20:42.611 There's star and there's star star, but yes, that is a thing 01:20:42.611 --> 01:20:45.911 and it's very often useful in the context of two-dimensional arrays, 01:20:45.911 --> 01:20:49.181 which we haven't really talked about, but that is a feature of the language, 01:20:49.181 --> 01:20:49.681 too. 01:20:49.681 --> 01:20:50.711 But not today. 01:20:50.711 --> 01:20:52.221 Good question. 01:20:52.221 --> 01:20:55.271 All right, so what might we now do to take things up a notch? 01:20:55.271 --> 01:20:57.791 Well let's go ahead and implement a different program here 01:20:57.791 --> 01:21:01.341 that maybe tries copying some values, just to demonstrate this. 01:21:01.341 --> 01:21:05.081 Let me open up a file called, how about copy.c, 01:21:05.081 --> 01:21:07.511 and I'm going to start off with a few includes. 01:21:07.511 --> 01:21:11.291 So let's include the CS50 library just so we have a way of getting user input. 01:21:11.291 --> 01:21:15.941 Let's include-- how about stdio as always, let's preemptively 01:21:15.941 --> 01:21:18.711 include string.h and maybe one other in a moment. 01:21:18.711 --> 01:21:21.711 Let's do int main(void) as before. 01:21:21.711 --> 01:21:25.241 And then in here, let's get a string from the user and just 01:21:25.241 --> 01:21:27.671 call it s for simplicity. 01:21:27.671 --> 01:21:31.361 And heck, we can actually just call this char star if we want, 01:21:31.361 --> 01:21:33.474 or string, since we're using the CS50 library. 01:21:33.474 --> 01:21:34.641 But we'll come back to that. 01:21:34.641 --> 01:21:38.231 Let's now make a copy of s and do t equals s, 01:21:38.231 --> 01:21:42.891 using a single assignment operator and then let's check something like this.
01:21:42.891 --> 01:21:47.831 Let's go into the first character of t, which is t bracket zero, 01:21:47.831 --> 01:21:50.231 and then let's uppercase it using that function 01:21:50.231 --> 01:21:55.571 that we've used in the past, toupper of t bracket zero, semicolon. 01:21:55.571 --> 01:21:57.231 And actually, I should go back up here. 01:21:57.231 --> 01:22:01.468 If I'm using toupper or if you use tolower or isupper or islower-- 01:22:01.468 --> 01:22:04.301 I might not remember this offhand, but it was in another header file 01:22:04.301 --> 01:22:06.161 called ctype.h. 01:22:06.161 --> 01:22:09.291 There were a bunch of helpful functions in that library as well. 01:22:09.291 --> 01:22:14.096 Now at the very last line of the program let's just print out what both s and t 01:22:14.096 --> 01:22:21.521 are by simply printing out %s for each of them, and t is %s also, not %t, 01:22:21.521 --> 01:22:24.681 of course, and let's see what happens here. 01:22:24.681 --> 01:22:26.471 So let me make copy-- 01:22:26.471 --> 01:22:27.881 oh my God, so many mistakes. 01:22:27.881 --> 01:22:29.271 What did I do wrong? 01:22:29.271 --> 01:22:30.221 Oh. 01:22:30.221 --> 01:22:31.301 OK, that was unintended. 01:22:31.301 --> 01:22:34.851 String t equals s, sorry, so I'm creating two variables, 01:22:34.851 --> 01:22:37.781 s and t respectively, and I'm copying s into t. 01:22:37.781 --> 01:22:39.461 Make copy, Enter. 01:22:39.461 --> 01:22:44.651 There we go. ./copy, and let's now type in, for instance, 01:22:44.651 --> 01:22:48.521 how about hi exclamation point in all lowercase this time, 01:22:48.521 --> 01:22:52.091 and now what gets printed? 01:22:52.091 --> 01:22:56.201 I don't think that's what I intended, so to speak, here. 01:22:56.201 --> 01:23:00.021 Because notice that I got s from the user, so that checks out. 01:23:00.021 --> 01:23:03.703 I then copied s into t, which looks correct. 01:23:03.703 --> 01:23:05.411 That's what we always use assignment for.
01:23:05.411 --> 01:23:09.191 Then I uppercased the first letter in t, but not s-- 01:23:09.191 --> 01:23:10.331 at least in my code-- 01:23:10.331 --> 01:23:14.051 then I printed s and t and then noticed, apparently, both s 01:23:14.051 --> 01:23:17.921 and t got capitalized. 01:23:17.921 --> 01:23:20.521 So if you're starting to get a little comfortable with what's 01:23:20.521 --> 01:23:24.421 going on underneath the hood, what's the fundamental problem here? 01:23:24.421 --> 01:23:28.223 Why did both get capitalized? 01:23:28.223 --> 01:23:29.431 Why did both get capitalized? 01:23:29.431 --> 01:23:30.121 Yeah, over here. 01:23:30.121 --> 01:23:32.601 AUDIENCE: Could it be they're referencing the same address? 01:23:32.601 --> 01:23:34.011 DAVID J. MALAN: Yeah, they're referencing the same address. 01:23:34.011 --> 01:23:35.871 So C is really literal. 01:23:35.871 --> 01:23:39.261 If you create another variable called t and you assign it the value of s, 01:23:39.261 --> 01:23:41.871 you are literally assigning it the value in s, 01:23:41.871 --> 01:23:44.761 which is 0x123 or something like that. 01:23:44.761 --> 01:23:48.381 And so at that point in the story both s and t presumably 01:23:48.381 --> 01:23:51.951 have a value of 0x123, which means they technically 01:23:51.951 --> 01:23:56.061 point to the same h-i exclamation point in memory. 01:23:56.061 --> 01:24:00.891 Nowhere did I tell the computer to give me a copy of an h-i exclamation point 01:24:00.891 --> 01:24:04.131 per se; I literally said just copy s. 01:24:04.131 --> 01:24:08.391 So here's where an understanding of what s literally is explains the situation. 01:24:08.391 --> 01:24:10.761 I'm only copying the pointers. 01:24:10.761 --> 01:24:12.601 So what actually went on in memory? 01:24:12.601 --> 01:24:14.241 Let's take a look here at this grid. 01:24:14.241 --> 01:24:17.091 If I created s initially, maybe it ends up here.
01:24:17.091 --> 01:24:20.601 And I created hi in lowercase, and it ended up down here. 01:24:20.601 --> 01:24:26.751 Then the addresses were, again, something like 0x123 through 0x126, so 0x123 is what's in s. 01:24:26.751 --> 01:24:29.451 If then I create a second variable called t, 01:24:29.451 --> 01:24:33.681 and I call it a string, a.k.a. char star, maybe it again ends up here. 01:24:33.681 --> 01:24:39.261 But when I copy s into t by doing t equals s semicolon, 01:24:39.261 --> 01:24:44.866 that literally just copies s into t, which puts the value 0x123 there. 01:24:44.866 --> 01:24:47.991 So if we now abstract away all these numbers and just think about a picture 01:24:47.991 --> 01:24:52.371 with arrows, what we've drawn in the computer's memory is this. 01:24:52.371 --> 01:24:56.871 Two different pointers but storing the same address, which means 01:24:56.871 --> 01:24:59.761 the breadcrumbs lead to the same place. 01:24:59.761 --> 01:25:02.841 And so if you follow the t breadcrumb and capitalize the first letter, 01:25:02.841 --> 01:25:06.831 it is functionally the same as 01:25:06.831 --> 01:25:12.471 changing the first letter in s as well. 01:25:12.471 --> 01:25:17.311 So what's the solution, then, to this kind of problem? 01:25:17.311 --> 01:25:19.381 Even if you have no idea how to do it in code, 01:25:19.381 --> 01:25:21.946 what's the gist of what I really intended, which is, 01:25:21.946 --> 01:25:26.101 I want a genuine copy of s, called t. 01:25:26.101 --> 01:25:30.213 I want a new h-i exclamation point backslash zero. 01:25:30.213 --> 01:25:31.921 What do I need to do to make that happen? 01:25:31.921 --> 01:25:32.888 Thoughts? 01:25:32.888 --> 01:25:35.631 AUDIENCE: I think there's a function called str copy. 01:25:35.631 --> 01:25:38.961 DAVID J. MALAN: So there is a function called str copy, strcpy, 01:25:38.961 --> 01:25:41.511 which is a possible answer to this question.
01:25:41.511 --> 01:25:45.681 The catch with str copy is that you have to tell it in advance not only 01:25:45.681 --> 01:25:48.231 what the source string is-- the one you want to copy-- 01:25:48.231 --> 01:25:50.961 but you also need to pass in the address of a chunk of memory 01:25:50.961 --> 01:25:55.551 into which you can copy the string, and here's one thing we haven't seen yet, 01:25:55.551 --> 01:25:57.951 and we need one more building block today, if you will. 01:25:57.951 --> 01:26:02.361 We haven't yet seen a way to create new chunks of memory 01:26:02.361 --> 01:26:05.281 and then let some other function copy into them. 01:26:05.281 --> 01:26:08.661 And for this, we're going to introduce something called dynamic memory 01:26:08.661 --> 01:26:09.571 allocation. 01:26:09.571 --> 01:26:12.291 And this is perhaps the last and most powerful feature today, 01:26:12.291 --> 01:26:16.251 whereby we're going to introduce two functions, malloc and free, where 01:26:16.251 --> 01:26:19.491 malloc means memory allocate, which literally does just that. 01:26:19.491 --> 01:26:22.641 It's a function that takes a number as input-- how many bytes of memory 01:26:22.641 --> 01:26:26.034 do you want the operating system to find for you somewhere in that big grid? 01:26:26.034 --> 01:26:27.951 It's going to find it and it's going to return 01:26:27.951 --> 01:26:31.554 to you the address of the first byte of contiguous memory back to back to back, 01:26:31.554 --> 01:26:34.221 and then you can do anything you want with that chunk of memory. 01:26:34.221 --> 01:26:35.751 free is going to do the opposite. 01:26:35.751 --> 01:26:38.571 When you're done using a chunk of memory that malloc has given you, 01:26:38.571 --> 01:26:42.201 you can say free it, and that means you hand it back to the operating system 01:26:42.201 --> 01:26:45.421 and then the operating system can use it for something else later.
01:26:45.421 --> 01:26:48.861 So this is actually evidence of a common problem in programming. 01:26:48.861 --> 01:26:53.311 If your Mac or your PC has ever been in the habit of starting to get really, 01:26:53.311 --> 01:26:57.921 really slow, or it's slowing to a crawl-- heck, maybe it even freezes-- 01:26:57.921 --> 01:27:00.921 one of the possible explanations could be 01:27:00.921 --> 01:27:03.801 that the program you're running by Apple or Microsoft 01:27:03.801 --> 01:27:07.041 or whoever, maybe they're using malloc or some equivalent, 01:27:07.041 --> 01:27:08.346 asking the operating system-- 01:27:08.346 --> 01:27:10.221 macOS or Windows-- give me more memory. 01:27:10.221 --> 01:27:11.001 I need more memory. 01:27:11.001 --> 01:27:12.381 The user is creating more images. 01:27:12.381 --> 01:27:13.821 The user is typing a longer essay. 01:27:13.821 --> 01:27:15.441 Give me more memory, more memory. 01:27:15.441 --> 01:27:20.001 If the program has a bug and never actually frees any of that memory, 01:27:20.001 --> 01:27:22.701 your computer might end up using all of the available memory 01:27:22.701 --> 01:27:26.571 and honestly, humans are not very good at handling corner cases like that. 01:27:26.571 --> 01:27:29.451 Very often programs, computers just freeze at that point 01:27:29.451 --> 01:27:33.591 or get really, really slow because they start trying to be creative 01:27:33.591 --> 01:27:35.751 when there's not enough memory left. 01:27:35.751 --> 01:27:38.361 So one of the reasons for a computer really slowing down 01:27:38.361 --> 01:27:42.634 might be calling malloc a lot, or some equivalent, but never freeing that memory. 01:27:42.634 --> 01:27:45.051 Which is to say, you should always use these two functions 01:27:45.051 --> 01:27:48.631 in concert and free memory once you are done with it. 01:27:48.631 --> 01:27:52.761 So let me go ahead and do this in code and solve this problem properly.
01:27:52.761 --> 01:27:54.801 Let me go ahead and do this. 01:27:54.801 --> 01:27:58.491 Before I copy s into t using something like str copy, 01:27:58.491 --> 01:28:01.126 I first need to get a bunch of memory from the computer. 01:28:01.126 --> 01:28:04.251 So to do that, let's make it super clear that we're dealing with pointers, 01:28:04.251 --> 01:28:07.821 so I'm going to change my strings to char stars for both s and t, 01:28:07.821 --> 01:28:10.281 and what I technically am going to store in t 01:28:10.281 --> 01:28:14.331 is the address of an available chunk of memory. 01:28:14.331 --> 01:28:18.531 To do that, I can ask the computer to allocate memory for me, 01:28:18.531 --> 01:28:19.941 but how many bytes? 01:28:19.941 --> 01:28:23.181 If I want to create a copy of h-i exclamation point, 01:28:23.181 --> 01:28:26.501 I need how many bytes? 01:28:26.501 --> 01:28:27.001 Good! 01:28:27.001 --> 01:28:27.631 Four! 01:28:27.631 --> 01:28:31.891 Because I need the h, the i, the exclamation point, and additional space 01:28:31.891 --> 01:28:33.001 for the backslash zero. 01:28:33.001 --> 01:28:35.161 It's up to me to understand that and ask for it. 01:28:35.161 --> 01:28:36.691 It's not going to happen magically. 01:28:36.691 --> 01:28:40.601 Nothing does in C. So I could just naively type four there, 01:28:40.601 --> 01:28:43.501 and that would be correct if I type in h-i exclamation 01:28:43.501 --> 01:28:47.431 point or any other three-letter word or phrase, but to do this dynamically 01:28:47.431 --> 01:28:50.761 I should probably do something like strlen of s 01:28:50.761 --> 01:28:54.331 plus 1 for the additional null character. 01:28:54.331 --> 01:28:56.821 Recall that strlen counts in the English sense-- 01:28:56.821 --> 01:29:00.991 it returns the length of the string you see, and the plus 1 takes into account 01:29:00.991 --> 01:29:03.241 the fact that I'm going to need that backslash zero.
01:29:03.241 --> 01:29:05.611 Now let me do this old-school style first. 01:29:05.611 --> 01:29:10.351 Let me go ahead and manually copy the string s into t first. 01:29:10.351 --> 01:29:18.211 So for int i equals 0, i is less than the string length of s, i plus plus. 01:29:18.211 --> 01:29:23.161 Then inside my for loop, I'm going to do t bracket i equals s bracket 01:29:23.161 --> 01:29:27.211 i, but actually I want the null character too, 01:29:27.211 --> 01:29:30.001 so I want to do the length of the string plus 1 more, 01:29:30.001 --> 01:29:32.671 and heck, I think I learned an optimization last time. 01:29:32.671 --> 01:29:35.131 Rather than calling strlen again and again, I could really 01:29:35.131 --> 01:29:40.861 do n equals strlen of s plus 1 and then do i is less than n, 01:29:40.861 --> 01:29:43.361 just as a nice design optimization. 01:29:43.361 --> 01:29:46.531 I think this for loop will actually handle the process, then, 01:29:46.531 --> 01:29:53.341 of copying every character from s into every available byte of memory in t. 01:29:53.341 --> 01:29:56.671 Or I could get rid of all of that and take your suggestion, which 01:29:56.671 --> 01:30:00.841 is to use str copy, which takes as its first argument the destination 01:30:00.841 --> 01:30:03.301 and as its second argument the source. 01:30:03.301 --> 01:30:08.281 So it copies from right to left in this case, too, and that's going to do all of that 01:30:08.281 --> 01:30:11.231 automatically for me as well. 01:30:11.231 --> 01:30:13.421 Now I think I'm good. 01:30:13.421 --> 01:30:15.401 I can now safely capitalize 01:30:15.401 --> 01:30:19.441 the first character in t, which is now a different chunk of memory 01:30:19.441 --> 01:30:23.441 than s, and then I can print them both out to see that one has not changed 01:30:23.441 --> 01:30:24.451 but the other has. 01:30:24.451 --> 01:30:27.331 So make copy-- all right, what did I do wrong?
01:30:27.331 --> 01:30:30.421 Implicitly declaring library function malloc dot, dot, dot. 01:30:30.421 --> 01:30:33.061 So we've seen this kind of error before. 01:30:33.061 --> 01:30:36.151 What is-- even if you don't know quite how to solve it, 01:30:36.151 --> 01:30:37.681 what's the essence of the solution? 01:30:37.681 --> 01:30:40.711 What do I need to do to fix this kind of problem involving implicitly 01:30:40.711 --> 01:30:43.271 declaring a library function? 01:30:43.271 --> 01:30:44.081 What did I forget? 01:30:44.081 --> 01:30:46.211 Yeah. 01:30:46.211 --> 01:30:47.561 I need to include the library. 01:30:47.561 --> 01:30:51.551 And I could look this up in the manual, or I know it off the top of my head, 01:30:51.551 --> 01:30:52.361 I just forgot it. 01:30:52.361 --> 01:30:54.461 There's another library we'll occasionally 01:30:54.461 --> 01:30:56.561 need now called standard lib-- 01:30:56.561 --> 01:31:00.671 standard library-- that contains malloc and free prototypes 01:31:00.671 --> 01:31:02.021 and some other stuff, too. 01:31:02.021 --> 01:31:05.061 All right, let me just clear this away and do make copy one more time. 01:31:05.061 --> 01:31:10.961 Now I'm good. ./copy, Enter, All right. s, I'm going to type in hi, lowercase. 01:31:10.961 --> 01:31:14.771 t and s now come back as intended. 01:31:14.771 --> 01:31:19.961 s is untouched, it would seem, but t is now capitalized. 01:31:19.961 --> 01:31:23.351 Are any questions, then, on what we just did in code? 01:31:23.351 --> 01:31:25.172 Yeah. 01:31:25.172 --> 01:31:28.581 AUDIENCE: You said that malloc and free go together. 01:31:28.581 --> 01:31:32.093 [INAUDIBLE] 01:31:32.093 --> 01:31:33.051 DAVID J. MALAN: Indeed. 01:31:33.051 --> 01:31:35.093 There's a few improvements I want to make, so let 01:31:35.093 --> 01:31:36.651 me actually do those right now. 
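Both copying strategies from the demo can be sketched side by side. In the lecture the loop lived directly in main; here it is pulled into two hypothetical helpers, `copy_with_loop` and `copy_with_strcpy`. Both assume `dst` already points to at least strlen(src) + 1 bytes, for instance memory obtained from malloc (which, as the error just showed, needs stdlib.h), while strcpy and strlen need string.h.

```c
#include <assert.h>
#include <string.h>

// Copy src into dst one character at a time, including the trailing '\0'.
// Assumes dst points to at least strlen(src) + 1 bytes.
void copy_with_loop(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    // Compute the length once; the + 1 makes the loop copy the '\0' too.
    for (size_t i = 0, n = strlen(src) + 1; i < n; i++)
    {
        dst[i] = src[i];
    }
}

// The same effect via the standard library: destination first, then source.
void copy_with_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    strcpy(dst, src);
}
```

Either function leaves the destination holding its own, independent copy of the characters, so changing `dst[0]` afterward no longer affects `src`.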
01:31:36.651 --> 01:31:39.681 Technically, I should practice what I preach and I should indeed, 01:31:39.681 --> 01:31:42.098 when I'm done with t, free t. 01:31:42.098 --> 01:31:44.181 Fortunately, I don't have to worry about how big t 01:31:44.181 --> 01:31:47.691 was-- the computer remembers how many bytes it gave me and it will go free 01:31:47.691 --> 01:31:49.371 all of them, not just the first. 01:31:49.371 --> 01:31:51.081 I should do free t. 01:31:51.081 --> 01:31:53.751 I don't need to do free s, and I shouldn't, 01:31:53.751 --> 01:31:56.691 because that is handled automatically by the CS50 library. 01:31:56.691 --> 01:31:59.091 s, recall, came from get_string, and we actually 01:31:59.091 --> 01:32:01.469 have some fancy code in place that makes sure 01:32:01.469 --> 01:32:03.261 that at the end of your program's execution 01:32:03.261 --> 01:32:06.321 we free any memory that we allocated so we don't actually 01:32:06.321 --> 01:32:08.256 waste memory like I described earlier. 01:32:08.256 --> 01:32:10.131 But there are actually a couple of other things 01:32:10.131 --> 01:32:12.631 that, if I really want to be pedantic, I should put in here. 01:32:12.631 --> 01:32:16.071 It turns out that sometimes malloc can fail, 01:32:16.071 --> 01:32:18.809 and sometimes malloc doesn't have enough memory available 01:32:18.809 --> 01:32:20.601 because maybe your computer's doing so much 01:32:20.601 --> 01:32:22.701 stuff there's just no more RAM available. 01:32:22.701 --> 01:32:24.981 So technically, I should do something like this-- 01:32:24.981 --> 01:32:29.541 if t equals equals NULL, with two L's today, 01:32:29.541 --> 01:32:32.751 then I should just return 1 or something to say that there was a problem. 01:32:32.751 --> 01:32:34.626 I should probably print an error message too, 01:32:34.626 --> 01:32:36.301 but for now I'm going to keep it simple. 01:32:36.301 --> 01:32:38.526 I should also probably check this.
01:32:38.526 --> 01:32:40.851 This is a little risky of me. 01:32:40.851 --> 01:32:45.511 If I'm doing t bracket zero, this is assuming that there is a letter there. 01:32:45.511 --> 01:32:48.231 But what if the human just hit Enter at the prompt 01:32:48.231 --> 01:32:51.391 and didn't even type h, let alone h-i exclamation point? 01:32:51.391 --> 01:32:53.631 What if there is no t bracket zero? 01:32:53.631 --> 01:32:59.181 So technically, what I should probably do here is, if the length of t 01:32:59.181 --> 01:33:05.121 is at least greater than zero, then go ahead and safely capitalize 01:33:05.121 --> 01:33:06.441 the first letter of it. 01:33:06.441 --> 01:33:08.731 And then at the very end if all goes well, 01:33:08.731 --> 01:33:12.841 I can return zero, thereby signifying that indeed, this thing was successful. 01:33:12.841 --> 01:33:16.711 So yes, these two functions, malloc and free, should be in concert. 01:33:16.711 --> 01:33:21.651 And so if you call malloc you should call free eventually. 01:33:21.651 --> 01:33:27.256 But you did not call malloc for s, so you should not call free for s. 01:33:27.256 --> 01:33:28.131 Yeah, other question. 01:33:28.131 --> 01:33:29.298 AUDIENCE: Here's a question. 01:33:29.298 --> 01:33:31.579 Why do we do malloc plus 1? 01:33:31.579 --> 01:33:33.371 DAVID J. MALAN: Why did I do malloc plus 1? 01:33:33.371 --> 01:33:36.281 So malloc-- sorry, malloc of string length of s 01:33:36.281 --> 01:33:39.903 plus 1-- the string length is the literal length of the string as a human 01:33:39.903 --> 01:33:41.111 would perceive it in English. 01:33:41.111 --> 01:33:44.111 So h-i exclamation point-- strlen gives me 3, 01:33:44.111 --> 01:33:47.801 but I know now as of last week and this week what a string technically is 01:33:47.801 --> 01:33:49.751 and a string always has an extra byte. 
01:33:49.751 --> 01:33:52.301 The onus is on me to understand and apply 01:33:52.301 --> 01:33:57.011 that lesson learned so that I actually give str copy enough room for that 01:33:57.011 --> 01:33:58.631 trailing null character. 01:33:58.631 --> 01:34:04.301 And here's just an annoying thing: we called the backslash zero N-U-L last 01:34:04.301 --> 01:34:08.351 week, and it turns out that N-U-L-L is the same idea. 01:34:08.351 --> 01:34:11.531 It's also zero, but it's zero in the context of pointers. 01:34:11.531 --> 01:34:15.761 So long story short, you never really write N-U-L; I've just said it 01:34:15.761 --> 01:34:17.051 and we saw it on the screen. 01:34:17.051 --> 01:34:22.631 You will start writing N-U-L-L when you want to check whether or not a pointer 01:34:22.631 --> 01:34:23.681 is valid. 01:34:23.681 --> 01:34:25.091 And what I mean by that is this. 01:34:25.091 --> 01:34:27.971 If malloc fails and there's just not enough memory left inside 01:34:27.971 --> 01:34:31.271 of the computer for you, it's got to return a special value, 01:34:31.271 --> 01:34:35.201 and that special value is N-U-L-L in all capital letters. 01:34:35.201 --> 01:34:36.821 That signifies something went wrong. 01:34:36.821 --> 01:34:41.771 Do not trust that I'm giving you a useful return value. 01:34:41.771 --> 01:34:45.391 Other questions on these copies thus far? 01:34:45.391 --> 01:34:47.530 Yeah, over there. 01:34:47.530 --> 01:34:51.481 AUDIENCE: [INAUDIBLE] 01:34:51.481 --> 01:34:52.731 DAVID J. MALAN: Good question. 01:34:52.731 --> 01:34:54.621 Will str copy not work without malloc? 01:34:54.621 --> 01:34:57.891 You kind of need both in this case because str copy, 01:34:57.891 --> 01:35:01.281 by definition-- if I pull up its manual page-- needs a destination 01:35:01.281 --> 01:35:03.261 in which to put the copied characters. 01:35:03.261 --> 01:35:06.321 It's not sufficient just to say char star t semicolon. 01:35:06.321 --> 01:35:07.761 That only gives you a pointer.
01:35:07.761 --> 01:35:10.701 But I need another chunk of memory that's 01:35:10.701 --> 01:35:14.811 just as big as h-i exclamation point backslash zero, 01:35:14.811 --> 01:35:17.271 so malloc gives me a whole bunch of memory 01:35:17.271 --> 01:35:21.561 and then str copy fills it with h-i exclamation point backslash zero. 01:35:21.561 --> 01:35:24.021 So again, that's why we're going down to this lower level, 01:35:24.021 --> 01:35:26.063 because once you understand what needs to be done 01:35:26.063 --> 01:35:27.931 you now have the functions to do it. 01:35:27.931 --> 01:35:29.971 So let's actually consider what we just solved. 01:35:29.971 --> 01:35:33.831 So in this next version of the program where I actually introduced malloc, 01:35:33.831 --> 01:35:37.341 t was initialized to the return value of malloc, 01:35:37.341 --> 01:35:39.381 and maybe the memory that I got back was here-- 01:35:39.381 --> 01:35:42.981 0x456, 0x457, 0x458, 0x459. 01:35:42.981 --> 01:35:45.291 I've left it blank initially because nothing 01:35:45.291 --> 01:35:47.001 is put there automatically by malloc. 01:35:47.001 --> 01:35:51.111 I just get a chunk of memory that is now mine to use as I see fit. 01:35:51.111 --> 01:35:56.031 I then assign t to that return value, which points t at the first address. 01:35:56.031 --> 01:35:57.861 Notice there's no backslash zero. 01:35:57.861 --> 01:36:00.741 This is not yet a string; it's just a chunk of memory-- 01:36:00.741 --> 01:36:02.871 four bytes-- an array of four bytes. 01:36:02.871 --> 01:36:06.441 What str copy eventually did for me was it copied the h over, 01:36:06.441 --> 01:36:10.671 the i over, the exclamation point over, and the backslash zero. 01:36:10.671 --> 01:36:14.541 And if I didn't want to use str copy or I forgot that it existed, my for loop 01:36:14.541 --> 01:36:18.701 would have done exactly the same thing. 01:36:18.701 --> 01:36:23.818 Are there any questions, then, on these examples here? 01:36:23.818 --> 01:36:24.401 Any questions?
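Putting the pieces together, here is a sketch of the finished copy logic as one hypothetical helper, `capitalized_copy`, rather than the lecture's main. It allocates strlen(s) + 1 bytes, returns NULL (the pointer value, not the NUL character) if malloc fails, copies with strcpy, and only touches `t[0]` when the string is non-empty. Because the memory comes from malloc, the caller is responsible for freeing it, while `s` itself is left alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a malloc'd copy of s with its first letter
// capitalized, leaving s untouched. Returns NULL if malloc fails.
// The caller must free the result.
char *capitalized_copy(const char *s)
{
    assert(s != NULL);

    // strlen counts the visible characters; + 1 makes room for '\0'.
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;  // NULL (a pointer) signals failure; '\0' (NUL) is a char
    }

    strcpy(t, s);  // destination first, then source

    // Only capitalize if there is a first letter to capitalize.
    if (strlen(t) > 0)
    {
        t[0] = toupper((unsigned char) t[0]);
    }
    return t;
}
```

In a program like the demo, main would check the result against NULL, print both strings, and then call `free(t);`, but would not free a string that came from get_string.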
01:36:24.401 --> 01:36:26.144 Yeah. 01:36:26.144 --> 01:36:33.131 AUDIENCE: [INAUDIBLE] 01:36:33.131 --> 01:36:34.381 DAVID J. MALAN: Good question. 01:36:34.381 --> 01:36:38.731 After malloc, if I had then still done just t equals s, 01:36:38.731 --> 01:36:41.851 it actually would have recreated the same original problem 01:36:41.851 --> 01:36:45.571 by just copying 0x123 from s into t. 01:36:45.571 --> 01:36:48.751 So then I would have been left with a picture that looked like this a few 01:36:48.751 --> 01:36:52.711 steps ago, I would have-- and I can't quite do it live-- 01:36:52.711 --> 01:36:55.021 this arrow, if I did what you just described, 01:36:55.021 --> 01:36:58.998 would now be pointing over here and so I wouldn't have fundamentally solved 01:36:58.998 --> 01:37:01.081 the problem, I would have just additionally wasted 01:37:01.081 --> 01:37:04.141 four bytes temporarily that I'm not actually using. 01:37:04.141 --> 01:37:05.983 Yeah. 01:37:05.983 --> 01:37:09.781 AUDIENCE: [INAUDIBLE] 01:37:09.781 --> 01:37:10.861 DAVID J. MALAN: You can-- 01:37:10.861 --> 01:37:12.819 do you always use malloc and str copy together? 01:37:12.819 --> 01:37:13.594 Not necessarily. 01:37:13.594 --> 01:37:15.511 These are both solving two different problems. 01:37:15.511 --> 01:37:19.771 malloc's giving me enough memory to make a copy, str copy is doing the copy. 01:37:19.771 --> 01:37:23.581 However, you could actually use an array, if you wanted, of characters, 01:37:23.581 --> 01:37:26.911 and you could use str copy on that, and there's other use cases for str copy. 01:37:26.911 --> 01:37:29.071 But thus far, it's a reasonable mental model 01:37:29.071 --> 01:37:31.291 to have that if you want to copy strings, 01:37:31.291 --> 01:37:34.921 you use malloc and then str copy, or your own homegrown loop. 01:37:34.921 --> 01:37:36.844 Yeah. 01:37:36.844 --> 01:37:47.171 AUDIENCE: [INAUDIBLE] 01:37:47.171 --> 01:37:49.370 DAVID J. MALAN: Say that once more. 
01:37:49.370 --> 01:37:54.579 AUDIENCE: [INAUDIBLE] 01:37:54.579 --> 01:37:55.371 DAVID J. MALAN: No. 01:37:55.371 --> 01:37:57.031 It will-- good question. 01:37:57.031 --> 01:38:00.171 If I had a-- 01:38:00.171 --> 01:38:03.441 str copy, per its documentation, will copy the whole string 01:38:03.441 --> 01:38:05.661 plus the null character at the end. 01:38:05.661 --> 01:38:08.121 It just assumes there will be one there. 01:38:08.121 --> 01:38:12.291 It's therefore up to you to pass str copy a long enough chunk of memory 01:38:12.291 --> 01:38:13.281 to have room for that. 01:38:13.281 --> 01:38:15.471 If I only ask malloc for three bytes, that 01:38:15.471 --> 01:38:17.541 could have potentially created a memory problem 01:38:17.541 --> 01:38:20.901 whereby str copy would just still blindly copy one, two, three, 01:38:20.901 --> 01:38:24.441 four bytes, but technically it should have only touched three of those. 01:38:24.441 --> 01:38:27.291 You do not yet have access to the fourth one, or the rights to it, 01:38:27.291 --> 01:38:29.541 because you never asked malloc for it. 01:38:29.541 --> 01:38:31.461 Yeah. 01:38:31.461 --> 01:38:34.461 AUDIENCE: So the number inside malloc would be the number of bytes. 01:38:34.461 --> 01:38:34.821 DAVID J. MALAN: Correct. 01:38:34.821 --> 01:38:36.696 The number inside malloc-- it's one argument. 01:38:36.696 --> 01:38:39.723 It's the number of bytes you want back. 01:38:39.723 --> 01:38:43.041 AUDIENCE: Does that mean you have to remember [INAUDIBLE]?? 01:38:45.798 --> 01:38:48.131 DAVID J. MALAN: Yes, the onus is on you, the programmer, 01:38:48.131 --> 01:38:50.298 to remember or frankly, use a function to figure out 01:38:50.298 --> 01:38:51.821 how many bytes you actually need. 01:38:51.821 --> 01:38:54.671 That's why I did not ultimately type in four manually, 01:38:54.671 --> 01:38:56.441 I used str length plus 1. 
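The three-byte scenario from that answer can be guarded against explicitly. The sketch below is a hypothetical helper, `copy_if_fits`, that is not part of the lecture or the standard library: it compares strlen(src) + 1 with the destination's size and refuses to copy rather than write past the end of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: copies src into dst only if dst_size bytes are enough
// for every character plus the trailing '\0'. Returns false (and copies
// nothing) when the buffer is too small, instead of overflowing it.
bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t needed = strlen(src) + 1;  // e.g. 4 for "hi!"
    if (needed > dst_size)
    {
        return false;
    }

    memcpy(dst, src, needed);  // needed already includes the '\0'
    return true;
}
```

With a 3-byte buffer, copying "hi!" is rejected; with a 4-byte buffer, it succeeds.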
01:38:56.441 --> 01:38:59.831 So the plus 1 is necessary if you understand how strings are represented, 01:38:59.831 --> 01:39:02.471 but using strlen means that I can actually 01:39:02.471 --> 01:39:05.651 play around with any types of inputs and it will dynamically 01:39:05.651 --> 01:39:07.541 figure out the length. 01:39:07.541 --> 01:39:09.821 So suffice it to say, there's so many ways 01:39:09.821 --> 01:39:11.931 already where you can start to break programs. 01:39:11.931 --> 01:39:15.386 Let's give you at least one tool for finding mistakes that you might make. 01:39:15.386 --> 01:39:17.261 And indeed, in upcoming problem sets you will 01:39:17.261 --> 01:39:19.361 use this to find bugs in your own code. 01:39:19.361 --> 01:39:22.991 Not just using printf, not just using the built-in debugger, but another tool 01:39:22.991 --> 01:39:24.201 here as well. 01:39:24.201 --> 01:39:27.371 So let me go ahead and deliberately write a program called memory.c 01:39:27.371 --> 01:39:29.511 that has some memory-related errors. 01:39:29.511 --> 01:39:34.901 Let me include stdio.h at the top and let me include stdlib.h at the top 01:39:34.901 --> 01:39:36.551 so I have access to malloc now. 01:39:36.551 --> 01:39:41.171 Let me do int main(void) and then inside of main, let me do this-- 01:39:41.171 --> 01:39:44.351 I want to allocate maybe how about three-- 01:39:44.351 --> 01:39:45.711 space for three integers. 01:39:45.711 --> 01:39:46.211 Why? 01:39:46.211 --> 01:39:48.191 Just for the sake of discussion. 01:39:48.191 --> 01:39:52.721 So I'm going to go ahead and do malloc of three, but I don't want three bytes. 
01:39:52.721 --> 01:39:56.008 I want three integers and an integer is four bytes, 01:39:56.008 --> 01:39:57.341 so technically I could do this-- 01:39:57.341 --> 01:40:01.851 3 times 4, or I could do 12, but again, that's making certain assumptions 01:40:01.851 --> 01:40:04.341 and if I run this program on a slightly different computer, 01:40:04.341 --> 01:40:05.861 int might be a different size. 01:40:05.861 --> 01:40:10.321 So the better way to do this would be 3 times sizeof(int), whatever the size of an int is. 01:40:10.321 --> 01:40:13.571 And sizeof is just an operator you can use any time if you just want to find out 01:40:13.571 --> 01:40:15.611 on this computer, how big is an int? 01:40:15.611 --> 01:40:18.291 How big is a float, or something else? 01:40:18.291 --> 01:40:20.411 So that's going to give me that many-- 01:40:20.411 --> 01:40:22.811 that much memory for three ints. 01:40:22.811 --> 01:40:24.821 What do I want to assign this to? 01:40:24.821 --> 01:40:27.011 Well, malloc returns an address. 01:40:27.011 --> 01:40:32.291 Pointers are addresses, so I'm going to create a pointer to an int called 01:40:32.291 --> 01:40:34.521 x and assign it the value. 01:40:34.521 --> 01:40:35.741 So what am I doing here? 01:40:35.741 --> 01:40:38.321 This is a little less obvious, but again go back to basics. 01:40:38.321 --> 01:40:43.091 The right-hand side here gives me a chunk of memory for three integers. 01:40:43.091 --> 01:40:46.661 malloc returns the address of the first byte of that chunk. 01:40:46.661 --> 01:40:48.791 How do I store the address of anything? 01:40:48.791 --> 01:40:49.691 I need a pointer. 01:40:49.691 --> 01:40:53.561 The syntax for today is type of data, star, 01:40:53.561 --> 01:40:58.631 where the type of data in question is int, since x points at the first of three ints, so I do int star x.
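Here is a sketch of allocating ints with sizeof instead of a hard-coded 12. `make_int_array` is a hypothetical helper: it requests n * sizeof(int) bytes, checks for NULL, and writes only to the valid indices 0 through n - 1.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical helper: allocates an array of n ints and fills it with
// start, start + 1, ..., start + n - 1. Returns NULL if malloc fails.
// The caller must free the result.
int *make_int_array(size_t n, int start)
{
    assert(n > 0);

    // n * sizeof(int) asks for the right number of bytes on any machine,
    // rather than assuming an int is 4 bytes.
    int *x = malloc(n * sizeof(int));
    if (x == NULL)
    {
        return NULL;
    }

    for (size_t i = 0; i < n; i++)  // valid indices are 0 through n - 1
    {
        x[i] = start + (int) i;
    }
    return x;
}
```

An equivalent, commonly used spelling is `malloc(n * sizeof *x)`, which keeps working even if the pointer's type changes later.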
01:40:58.631 --> 01:41:02.531 Again, it's kind of purposeless, only for sort of instructional purposes 01:41:02.531 --> 01:41:07.901 here, but this is equivalent now to having a chunk of memory of size 12 01:41:07.901 --> 01:41:11.351 in total, presumably, so I can technically now do this. 01:41:11.351 --> 01:41:15.491 I can go into maybe the first location and assign it the number 72 01:41:15.491 --> 01:41:16.911 like the other day. 01:41:16.911 --> 01:41:24.701 Second location, the number 73, and the third location, maybe the number 33. 01:41:24.701 --> 01:41:27.551 Now I've deliberately made two mistakes here 01:41:27.551 --> 01:41:30.701 because I'm trying to trip over my newfound understanding, 01:41:30.701 --> 01:41:33.281 or my greenness with understanding pointers. 01:41:33.281 --> 01:41:36.641 One, I didn't remember that I should be treating chunks of memory 01:41:36.641 --> 01:41:37.751 as zero indexed. 01:41:37.751 --> 01:41:41.141 malloc essentially returns an array, if you want to think of it as that. 01:41:41.141 --> 01:41:43.541 An array of three ints, or more technically, 01:41:43.541 --> 01:41:47.381 the address of a chunk of memory that could fit three ints. 01:41:47.381 --> 01:41:50.681 So I can use my square bracket notation, or I could be really cool 01:41:50.681 --> 01:41:53.631 and use pointer arithmetic, but this is a little more user friendly. 01:41:53.631 --> 01:41:55.481 But I have made two mistakes. 01:41:55.481 --> 01:41:59.081 I did not start indexing at zero, so line seven 01:41:59.081 --> 01:42:00.941 should have been x bracket zero. 01:42:00.941 --> 01:42:03.813 Line eight should have been x bracket 1, and then line nine 01:42:03.813 --> 01:42:05.021 should have been x bracket 2. 01:42:05.021 --> 01:42:06.231 So first mistake. 01:42:06.231 --> 01:42:09.161 The second mistake that I've made as a side effect, 01:42:09.161 --> 01:42:12.221 is I'm also touching memory that I shouldn't. 
01:42:12.221 --> 01:42:17.171 x bracket 3 would mean go to the fourth int in the chunk of memory 01:42:17.171 --> 01:42:17.981 that came back. 01:42:17.981 --> 01:42:20.501 I only asked for enough memory for three ints, 01:42:20.501 --> 01:42:23.741 not four, so this is what's called a buffer overflow. 01:42:23.741 --> 01:42:26.831 I am accidentally, but deliberately at the moment, 01:42:26.831 --> 01:42:30.951 going beyond the boundaries of this array, this chunk of memory. 01:42:30.951 --> 01:42:33.311 So bad things can happen, but you won't necessarily see them 01:42:33.311 --> 01:42:34.641 by just running your program. 01:42:34.641 --> 01:42:36.191 Let me go ahead and just try this. 01:42:36.191 --> 01:42:42.011 make memory, and you'll see here that it compiles OK. ./memory, 01:42:42.011 --> 01:42:44.139 and it actually does not cause a segmentation fault, 01:42:44.139 --> 01:42:46.181 which comes back to that point of nondeterminism. 01:42:46.181 --> 01:42:48.551 Sometimes it does, sometimes it doesn't-- it depends on how bad 01:42:48.551 --> 01:42:49.691 of a mistake you made. 01:42:49.691 --> 01:42:52.858 But there's a program that can spot these kinds of mistakes, 01:42:52.858 --> 01:42:55.691 and I'm going to go ahead and expand my terminal window for a moment 01:42:55.691 --> 01:43:01.151 and I'm going to run not just ./memory, but a program called Valgrind: valgrind ./memory. 01:43:01.151 --> 01:43:04.001 This is a command that comes with a lot of computer systems 01:43:04.001 --> 01:43:07.071 that's designed to find memory-related bugs in code. 01:43:07.071 --> 01:43:09.011 So it's a new tool in your toolkit today, 01:43:09.011 --> 01:43:11.111 and you'll use it with the coming problem sets. 01:43:11.111 --> 01:43:12.311 I'm going to run this now. 01:43:12.311 --> 01:43:14.591 Its output, honestly, is hideous.
01:43:14.591 --> 01:43:17.981 But there are a few things that will start to jump out 01:43:17.981 --> 01:43:20.381 and will help you, with tools in the problem 01:43:20.381 --> 01:43:21.951 sets, to see these kinds of things. 01:43:21.951 --> 01:43:23.531 Here's the first mistake. 01:43:23.531 --> 01:43:26.471 Invalid write of size four. 01:43:26.471 --> 01:43:30.461 That's on memory.c line nine, per my highlights. 01:43:30.461 --> 01:43:32.351 So let me go look at line nine. 01:43:32.351 --> 01:43:36.011 In what sense is this an invalid write of size four? 01:43:36.011 --> 01:43:38.591 Well, I'm touching memory that I shouldn't, and I'm 01:43:38.591 --> 01:43:40.061 touching it as though it's an int. 01:43:40.061 --> 01:43:42.551 And an int is four bytes-- size four. 01:43:42.551 --> 01:43:45.831 So again, this takes some practice to get used to, the nomenclature here, 01:43:45.831 --> 01:43:48.771 but this is now a clue for me, the programmer, 01:43:48.771 --> 01:43:52.231 that not only did I screw up, but I screwed up related to memory 01:43:52.231 --> 01:43:54.749 and so this is just a hint, if you will. 01:43:54.749 --> 01:43:57.291 It's not necessarily going to tell you exactly how to fix it; 01:43:57.291 --> 01:44:01.131 you have to wrestle with the semantics, but invalid 01:44:01.131 --> 01:44:02.961 write of size four-- oh, OK. 01:44:02.961 --> 01:44:07.321 So I should not have indexed past the boundary here. 01:44:07.321 --> 01:44:10.021 All right, so I shouldn't have done that. 01:44:10.021 --> 01:44:15.764 So let me go ahead then and change this to zero, one, and two, perhaps, here. 01:44:15.764 --> 01:44:17.931 All right, so let me go ahead and recompile my code. 01:44:17.931 --> 01:44:24.261 make memory, ./memory, still doesn't seem to be broken but it is technically 01:44:24.261 --> 01:44:24.891 buggy. 01:44:24.891 --> 01:44:31.101 Let me go ahead and run Valgrind again, so valgrind ./memory, Enter.
01:44:31.101 --> 01:44:33.321 And now there's fewer scary-- 01:44:33.321 --> 01:44:36.841 less scary output now, but there's still something in there. 01:44:36.841 --> 01:44:40.368 Notice this-- 12 bytes in one blocks-- 01:44:40.368 --> 01:44:42.201 no regard for grammar there-- are definitely 01:44:42.201 --> 01:44:43.971 lost in loss record one of one. 01:44:43.971 --> 01:44:47.611 Super cryptic, but this is hinting at a so-called memory leak. 01:44:47.611 --> 01:44:51.441 The blocks of memory are lost in the sense that I malloc'd them-- 01:44:51.441 --> 01:44:52.881 I asked for them but I never-- 01:44:52.881 --> 01:44:55.071 take a guess-- freed them. 01:44:55.071 --> 01:44:56.008 I have a memory leak. 01:44:56.008 --> 01:44:58.341 And this is the arcane way of saying, you've screwed up. 01:44:58.341 --> 01:44:59.551 You have a memory leak. 01:44:59.551 --> 01:45:01.821 So this is an easy fix, fortunately. 01:45:01.821 --> 01:45:06.211 Once I'm done with this memory I just need to free it at the end. 01:45:06.211 --> 01:45:08.631 So now let me go ahead and rerun make memory. 01:45:08.631 --> 01:45:12.441 It still runs fine, so all the while I might have thought, incorrectly, 01:45:12.441 --> 01:45:13.581 that my code is correct. 01:45:13.581 --> 01:45:15.261 But let me run Valgrind one more time. 01:45:15.261 --> 01:45:17.451 valgrind ./memory, Enter. 01:45:17.451 --> 01:45:19.341 Now, this is pretty good. 01:45:19.341 --> 01:45:21.531 All heap blocks were freed, whatever that means. 01:45:21.531 --> 01:45:23.371 No leaks are possible. 01:45:23.371 --> 01:45:26.481 And even though it's still a little cryptic, there's no other error here 01:45:26.481 --> 01:45:29.985 and in fact, it's pretty explicit-- error summary, zero errors from zero 01:45:29.985 --> 01:45:31.641 contexts, dot, dot, dot.
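Here is a sketch of memory.c with both bugs fixed, written as a hypothetical function `memory_fixed` rather than the lecture's main. It writes to indices 0, 1, and 2 instead of 1, 2, and 3, and frees the chunk when done. The hypothetical `set_int` helper adds a runtime bounds check, so an off-by-one such as x[3] stops the program at once instead of silently overflowing the buffer. This check is an addition for illustration and is not in the lecture.

```c
#include <assert.h>
#include <stdlib.h>

#define COUNT 3

// Hypothetical helper: stores value at x[i], but first asserts that i is a
// valid index for an array of len ints.
void set_int(int *x, size_t len, size_t i, int value)
{
    assert(x != NULL && i < len);
    x[i] = value;
}

// memory.c with both bugs fixed: indices 0 through 2, and a matching free.
// Returns the sum of the three values, or -1 if malloc fails.
int memory_fixed(void)
{
    int *x = malloc(COUNT * sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    set_int(x, COUNT, 0, 72);
    set_int(x, COUNT, 1, 73);
    set_int(x, COUNT, 2, 33);

    int sum = x[0] + x[1] + x[2];
    free(x);  // without this, Valgrind reports the block as definitely lost
    return sum;
}
```

If this is compiled into a program whose main calls `memory_fixed()`, running it under `valgrind ./memory` should report that all heap blocks were freed.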
01:45:31.641 --> 01:45:34.831 So even though this is one of the most arcane tools we'll use, 01:45:34.831 --> 01:45:37.341 it's also one of the most powerful because it can see things 01:45:37.341 --> 01:45:40.671 that you, the human, might not, and maybe even that the debugger might not. 01:45:40.671 --> 01:45:42.741 It does a much closer reading of your code 01:45:42.741 --> 01:45:48.501 while it's running to figure out exactly what is going on. 01:45:48.501 --> 01:45:50.781 Any questions, then, on this tool? 01:45:50.781 --> 01:45:54.681 And we'll guide you after today with actually using this, too. 01:45:54.681 --> 01:45:57.201 It just helps you find memory-related mistakes 01:45:57.201 --> 01:46:00.021 that you might now be capable of making. 01:46:00.021 --> 01:46:02.181 All right, let's do one other memory-related thing. 01:46:02.181 --> 01:46:04.171 Let me shrink my terminal window here. 01:46:04.171 --> 01:46:07.911 Let me create one other file here called garbage.c. 01:46:07.911 --> 01:46:11.421 It turns out there's a term of art called garbage values in programming 01:46:11.421 --> 01:46:12.931 that we can reveal as follows. 01:46:12.931 --> 01:46:15.921 Let me include stdio.h, and let me include-- 01:46:15.921 --> 01:46:19.461 how about stdlib.h, and then let me give myself int 01:46:19.461 --> 01:46:22.561 main(void), and then in this relatively short program 01:46:22.561 --> 01:46:25.461 let me give myself three ints using last week's 01:46:25.461 --> 01:46:29.421 notation, just int scores bracket 3 for 3 quiz scores, or whatever. 01:46:29.421 --> 01:46:33.441 Then let me go ahead and do for int i equals zero, i less than 3, 01:46:33.441 --> 01:46:38.691 i plus plus, then let me go ahead and print out, %i backslash n, 01:46:38.691 --> 01:46:40.911 scores bracket i semicolon. 01:46:40.911 --> 01:46:43.491 That's it.
01:46:43.491 --> 01:46:48.781 This code, I'm pretty sure, is going to compile and it's going to run, 01:46:48.781 --> 01:46:51.171 but what is my logical bug? 01:46:51.171 --> 01:46:55.701 I've forgotten a step even though the code that's written is not so wrong. 01:46:55.701 --> 01:46:58.431 Yeah? 01:46:58.431 --> 01:47:00.921 Yeah, I didn't provide the scores, so I didn't actually 01:47:00.921 --> 01:47:04.851 initialize the array called scores to have any scores whatsoever. 01:47:04.851 --> 01:47:08.391 What's curious about this, though, is that the computer technically 01:47:08.391 --> 01:47:09.081 doesn't mind. 01:47:09.081 --> 01:47:13.041 Let me go ahead and playfully make garbage, Enter, 01:47:13.041 --> 01:47:15.621 and it's an apt description because what I'm about to see 01:47:15.621 --> 01:47:18.231 are so-called garbage values. 01:47:18.231 --> 01:47:23.061 When you, the programmer, do not initialize your code's variables to have 01:47:23.061 --> 01:47:25.878 values, sometimes who knows what's going to be there. 01:47:25.878 --> 01:47:27.711 The computer's been doing some other things; 01:47:27.711 --> 01:47:31.161 there's a bit of work that happens even before your code runs in the computer, 01:47:31.161 --> 01:47:34.401 so there might be remnants of past ints, chars, strings, 01:47:34.401 --> 01:47:37.041 floats-- anything else in there, and what you're seeing 01:47:37.041 --> 01:47:42.661 is those garbage values, which is to say you should never forget, 01:47:42.661 --> 01:47:45.601 as I just did, to initialize the value of some variable.
01:47:45.601 --> 01:47:47.601 And this is actually pretty dangerous, and there 01:47:47.601 --> 01:47:51.081 have been many examples of software being compromised 01:47:51.081 --> 01:47:54.261 because of one of these issues where a variable wasn't initialized 01:47:54.261 --> 01:47:58.611 and all of a sudden users, maybe people on the internet in the context of web 01:47:58.611 --> 01:48:02.481 applications, could suddenly see the contents of someone else's memory, 01:48:02.481 --> 01:48:03.591 or remnants. 01:48:03.591 --> 01:48:06.051 Maybe someone's password that had been previously typed in 01:48:06.051 --> 01:48:08.031 or some other value like a credit card number 01:48:08.031 --> 01:48:09.591 that had been previously typed in. 01:48:09.591 --> 01:48:11.571 There are different defense mechanisms in place 01:48:11.571 --> 01:48:15.111 to generally make this not so likely, but it's certainly 01:48:15.111 --> 01:48:18.171 very possible, at least in this kind of context, 01:48:18.171 --> 01:48:22.101 to see values that you probably shouldn't because they 01:48:22.101 --> 01:48:25.621 might be remnants from something else that used them. 01:48:25.621 --> 01:48:29.701 So this is to say again, you have this great power now to manipulate memory, 01:48:29.701 --> 01:48:33.021 but also now you have this great hacking ability to poke around 01:48:33.021 --> 01:48:36.441 the contents of memory, and this is exactly what hackers sometimes do when 01:48:36.441 --> 01:48:40.431 trying to find ways to exploit systems. 01:48:40.431 --> 01:48:41.661 Are there any questions here? 01:48:44.571 --> 01:48:45.071 No? 01:48:45.071 --> 01:48:47.111 All right, let's go ahead and take a quick five-minute break, 01:48:47.111 --> 01:48:49.511 and when we come back, we'll build on these final topics. 01:48:49.511 --> 01:48:50.381 See you in five. 01:48:50.381 --> 01:48:51.671 We are back.
01:48:51.671 --> 01:48:55.481 First, just a little programmer humor from XKCD, which hopefully now 01:48:55.481 --> 01:48:57.851 will make a little bit of sense to you. 01:48:57.851 --> 01:49:02.321 And what we'll also do next is take a look at a short two-minute video that 01:49:02.321 --> 01:49:05.501 animates with claymation, if you will, from our friends at Stanford, 01:49:05.501 --> 01:49:08.501 exactly what happens now if you have an understanding of what garbage 01:49:08.501 --> 01:49:12.004 values are and how they get there, and what happens then if you misuse them. 01:49:12.004 --> 01:49:14.171 It's one thing just to print them out as I just did; 01:49:14.171 --> 01:49:18.431 it's another if you actually mistake a garbage value for a valid pointer, 01:49:18.431 --> 01:49:21.881 because garbage values are just zeros and ones somewhere-- numbers, that is. 01:49:21.881 --> 01:49:24.761 But if you use that new dereference operator, the star, 01:49:24.761 --> 01:49:29.111 and try to go to a garbage value thinking incorrectly that it's 01:49:29.111 --> 01:49:31.511 a valid pointer, bad things can happen. 01:49:31.511 --> 01:49:36.431 Computers can crash or, more familiarly, segmentation faults can happen. 01:49:36.431 --> 01:49:39.401 So allow me to introduce, if we could dim the lights for two minutes, 01:49:39.401 --> 01:49:41.111 our friend Binky from Stanford. 01:49:44.951 --> 01:49:46.541 SPEAKER 1: Hey Binky, wake up. 01:49:46.541 --> 01:49:49.221 It's time for pointer fun. 01:49:49.221 --> 01:49:50.331 BINKY: What's that? 01:49:50.331 --> 01:49:51.921 Learn about pointers? 01:49:51.921 --> 01:49:53.184 Oh, goody! 01:49:53.184 --> 01:49:55.101 SPEAKER 1: Well, to get started, I guess we're 01:49:55.101 --> 01:49:56.721 going to need a couple of pointers. 01:49:56.721 --> 01:50:00.998 BINKY: OK, this code allocates two pointers which can point to integers. 01:50:00.998 --> 01:50:01.581 SPEAKER 1: OK.
01:50:01.581 --> 01:50:05.188 Well, I see the two pointers, but they don't seem to be pointing to anything. 01:50:05.188 --> 01:50:06.021 BINKY: That's right. 01:50:06.021 --> 01:50:08.151 Initially, pointers don't point to anything. 01:50:08.151 --> 01:50:11.181 The things they point to are called pointees, and setting them up 01:50:11.181 --> 01:50:12.174 is a separate step. 01:50:12.174 --> 01:50:13.341 SPEAKER 1: Oh, right, right. 01:50:13.341 --> 01:50:14.031 I knew that. 01:50:14.031 --> 01:50:16.021 The pointees are separate. 01:50:16.021 --> 01:50:18.351 So how do you allocate a pointee? 01:50:18.351 --> 01:50:21.921 BINKY: OK, well this code allocates a new integer pointee, 01:50:21.921 --> 01:50:24.994 and this part sets x to point to it. 01:50:24.994 --> 01:50:26.411 SPEAKER 1: Hey, that looks better. 01:50:26.411 --> 01:50:28.021 So make it do something. 01:50:28.021 --> 01:50:31.411 BINKY: OK, I'll dereference the pointer x to store the number 01:50:31.411 --> 01:50:33.541 42 into its pointee. 01:50:33.541 --> 01:50:37.201 For this trick, I'll need my magic wand of dereferencing. 01:50:37.201 --> 01:50:40.591 SPEAKER 1: Your magic wand of dereferencing? 01:50:40.591 --> 01:50:42.441 That's great. 01:50:42.441 --> 01:50:44.151 BINKY: This is what the code looks like. 01:50:44.151 --> 01:50:46.946 I'll just set up the number and-- 01:50:46.946 --> 01:50:47.821 SPEAKER 1: Hey, look. 01:50:47.821 --> 01:50:49.171 There it goes. 01:50:49.171 --> 01:50:54.091 So doing a dereference on x follows the arrow to access its pointee, 01:50:54.091 --> 01:50:56.131 in this case to store 42 in there. 01:50:56.131 --> 01:51:00.751 Hey, try using it to store the number 13 through the other pointer, y. 01:51:00.751 --> 01:51:01.891 BINKY: OK. 01:51:01.891 --> 01:51:06.271 I'll just go over here to y and get the number 13 set up, 01:51:06.271 --> 01:51:10.801 and then take the wand of dereferencing and just-- 01:51:10.801 --> 01:51:11.881 whoa!
01:51:11.881 --> 01:51:14.101 SPEAKER 1: Oh hey, that didn't work. 01:51:14.101 --> 01:51:17.821 Say, Binky, I don't think dereferencing y is a good idea 01:51:17.821 --> 01:51:21.016 because setting up the pointee is a separate step 01:51:21.016 --> 01:51:23.551 and I don't think we ever did it. 01:51:23.551 --> 01:51:24.601 BINKY: Good point. 01:51:24.601 --> 01:51:27.031 SPEAKER 1: Yeah, we allocated the pointer y, 01:51:27.031 --> 01:51:30.271 but we never set it to point to a pointee. 01:51:30.271 --> 01:51:31.439 BINKY: Very observant. 01:51:31.439 --> 01:51:33.481 SPEAKER 1: Hey, you're looking good there, Binky. 01:51:33.481 --> 01:51:36.361 Can you fix it so that y points to the same pointee as x? 01:51:36.361 --> 01:51:39.721 BINKY: Sure, I'll use my magic wand of pointer assignment. 01:51:39.721 --> 01:51:41.971 SPEAKER 1: Is that going to be a problem, like before? 01:51:41.971 --> 01:51:43.861 BINKY: No, this doesn't touch the pointees, 01:51:43.861 --> 01:51:47.491 it just changes one pointer to point to the same thing as another. 01:51:47.491 --> 01:51:48.511 SPEAKER 1: Oh, I see. 01:51:48.511 --> 01:51:51.181 Now y points to the same place as x. 01:51:51.181 --> 01:51:53.071 So wait, now y is fixed. 01:51:53.071 --> 01:51:56.131 It has a pointee so you can try the wand of dereferencing again 01:51:56.131 --> 01:51:58.741 to send the 13 over. 01:51:58.741 --> 01:52:01.073 BINKY: OK, here it goes. 01:52:01.073 --> 01:52:02.281 SPEAKER 1: Hey, look at that. 01:52:02.281 --> 01:52:04.111 Now dereferencing works on y. 01:52:04.111 --> 01:52:08.161 And because the pointers are sharing that one pointee, they both see the 13. 01:52:08.161 --> 01:52:09.301 BINKY: Yeah, sharing. 01:52:09.301 --> 01:52:09.871 Whatever. 01:52:09.871 --> 01:52:11.911 So are we going to switch places now? 01:52:11.911 --> 01:52:13.831 SPEAKER 1: Oh look, we're out of time. 
01:52:13.831 --> 01:52:14.951 BINKY: But-- 01:52:14.951 --> 01:52:17.171 DAVID J. MALAN: That's from our friend Nick Parlante at Stanford. 01:52:17.171 --> 01:52:19.511 So let's consider what Nick did here as Binky. 01:52:19.511 --> 01:52:21.581 So here is all the code together. 01:52:21.581 --> 01:52:25.258 These first couple of lines were not bad, and notice that in Stanford's code 01:52:25.258 --> 01:52:26.591 they move the stars to the left. 01:52:26.591 --> 01:52:27.341 That's fine. 01:52:27.341 --> 01:52:30.251 Again, more conventional might be this syntax here. 01:52:30.251 --> 01:52:31.461 These two lines are fine. 01:52:31.461 --> 01:52:34.781 It's OK to create variables, even pointers, 01:52:34.781 --> 01:52:38.411 and not assign them a value initially so long as you eventually do. 01:52:38.411 --> 01:52:40.931 So we eventually do here, with this line. 01:52:40.931 --> 01:52:43.991 We assign to x the return value of malloc, which 01:52:43.991 --> 01:52:45.821 is presumably the address of something. 01:52:45.821 --> 01:52:49.071 To be fair, we should really be checking for NULL as well, 01:52:49.071 --> 01:52:50.991 but that's not the biggest problem here. 01:52:50.991 --> 01:52:53.481 The biggest problem is not even this next line, 01:52:53.481 --> 01:52:59.231 which means go to the memory location in x and store the number 42 there. 01:52:59.231 --> 01:53:01.451 That's fine, because again, malloc returns 01:53:01.451 --> 01:53:03.701 the address of some chunk of memory. 01:53:03.701 --> 01:53:05.801 This chunk of memory is big enough for an int. 01:53:05.801 --> 01:53:08.711 x is therefore going to store the address of that chunk that's 01:53:08.711 --> 01:53:09.671 big enough for an int. 01:53:09.671 --> 01:53:13.541 Star x, recall, is the dereference operator; it means go to that address 01:53:13.541 --> 01:53:15.341 and put 42 in it.
01:53:15.341 --> 01:53:18.461 It's like going to the mailbox and putting the number 42 in it 01:53:18.461 --> 01:53:21.371 instead of taking the number 50 out, like we did before. 01:53:21.371 --> 01:53:23.051 But why is this line bad? 01:53:23.051 --> 01:53:26.291 This is where Binky lost his head, so to speak. 01:53:26.291 --> 01:53:27.641 Why is this bad? 01:53:27.641 --> 01:53:28.681 Yeah. 01:53:28.681 --> 01:53:30.681 AUDIENCE: We haven't yet allocated space for it. 01:53:30.681 --> 01:53:31.231 DAVID J. MALAN: Exactly. 01:53:31.231 --> 01:53:33.141 We haven't yet allocated space for y. 01:53:33.141 --> 01:53:36.051 There's no mention of malloc, there's no assignment of y, 01:53:36.051 --> 01:53:37.591 even to that same memory. 01:53:37.591 --> 01:53:40.441 So this would be, go to the address in y, 01:53:40.441 --> 01:53:43.831 but if there is no known address in y, what's there is a so-called garbage value, 01:53:43.831 --> 01:53:46.761 which means go to some random address that you have no control over, 01:53:46.761 --> 01:53:47.571 and boom-- 01:53:47.571 --> 01:53:52.221 that might cause what we've seen in the past, perhaps as a segmentation fault. 01:53:52.221 --> 01:53:54.111 Now this, fortunately, is the kind of thing 01:53:54.111 --> 01:53:58.041 that if you don't quite have the eye for it yet, Valgrind, that new tool, 01:53:58.041 --> 01:53:59.911 could help you find as well. 01:53:59.911 --> 01:54:03.681 But it's just another example of, again, the sort of upside and downside 01:54:03.681 --> 01:54:07.111 of having control now over memory at this level. 01:54:07.111 --> 01:54:07.611 All right. 01:54:07.611 --> 01:54:09.444 Well, let's go ahead and do one other thing. 01:54:09.444 --> 01:54:12.586 Consider from last week that this notion of swapping 01:54:12.586 --> 01:54:14.211 was actually a really common operation.
01:54:14.211 --> 01:54:17.211 We had all of our volunteers come up, we had to swap a lot of things 01:54:17.211 --> 01:54:19.581 during bubble sort and even selection sort, 01:54:19.581 --> 01:54:21.681 and we just took for granted that the two 01:54:21.681 --> 01:54:23.613 humans would swap themselves just fine. 01:54:23.613 --> 01:54:25.821 But there needs to be code to do that if you actually 01:54:25.821 --> 01:54:29.638 implement bubble sort, selection sort, or anything that involves swapping. 01:54:29.638 --> 01:54:31.221 So let's consider some code like this. 01:54:31.221 --> 01:54:33.291 We'll keep it simple like last week, where 01:54:33.291 --> 01:54:40.339 we wanted to swap some values like int A and int B, for instance, here. 01:54:40.339 --> 01:54:43.131 Void because I'm not going to return a value, but I have a function 01:54:43.131 --> 01:54:44.031 called swap. 01:54:44.031 --> 01:54:49.341 So here, for instance, might be some code for this. 01:54:49.341 --> 01:54:50.549 But why is it so complicated? 01:54:50.549 --> 01:54:52.133 Here, let's actually take a step back. 01:54:52.133 --> 01:54:53.301 Why don't we do this here? 01:54:53.301 --> 01:54:54.921 I think we have time for one more volunteer. 01:54:54.921 --> 01:54:56.379 Could we get someone to come on up? 01:54:56.379 --> 01:54:58.671 You have to be comfy on camera and you're 01:54:58.671 --> 01:55:01.701 being asked to help with your-- oh, I'll go with the friend, pointing. 01:55:01.701 --> 01:55:05.641 So whoever has their friend doing this here-- 01:55:05.641 --> 01:55:06.621 no? 01:55:06.621 --> 01:55:08.511 Now they're pointing over here. 01:55:08.511 --> 01:55:10.251 Now, literally an arm is being twisted. 01:55:10.251 --> 01:55:11.751 OK. 01:55:11.751 --> 01:55:12.471 Come on down. 01:55:12.471 --> 01:55:13.341 That backfired. 01:55:18.311 --> 01:55:18.956 Come on over. 01:55:24.481 --> 01:55:26.241 And what is your name? 01:55:26.241 --> 01:55:27.153 AUDIENCE: Marina.
01:55:27.153 --> 01:55:28.111 DAVID J. MALAN: Marina. 01:55:28.111 --> 01:55:29.641 Nice to meet you. 01:55:29.641 --> 01:55:31.718 Who were you trying to volunteer? 01:55:31.718 --> 01:55:32.801 AUDIENCE: My friend Jesse. 01:55:32.801 --> 01:55:33.971 DAVID J. MALAN: OK. 01:55:33.971 --> 01:55:38.291 So here we have for Marina two glasses of liquid, orange and purple, 01:55:38.291 --> 01:55:39.821 just so that they're super obvious. 01:55:39.821 --> 01:55:42.226 And suppose that the problem at hand, like last week, 01:55:42.226 --> 01:55:45.101 is just to swap two values, as though these two glasses represented 01:55:45.101 --> 01:55:47.111 two people and we want to swap them. 01:55:47.111 --> 01:55:50.501 But let's consider these glasses to be like variables, or locations 01:55:50.501 --> 01:55:52.211 in an array, and you know what? 01:55:52.211 --> 01:55:54.681 I'd really like you to swap the values. 01:55:54.681 --> 01:55:58.241 So orange has to go in there, and purple has to go in there. 01:55:58.241 --> 01:55:59.194 How would you do it? 01:55:59.194 --> 01:56:01.361 And we'll see if we can then translate that to code. 01:56:01.361 --> 01:56:03.508 AUDIENCE: [INAUDIBLE] 01:56:03.508 --> 01:56:04.591 DAVID J. MALAN: OK, what-- 01:56:04.591 --> 01:56:06.444 say it a little louder. 01:56:06.444 --> 01:56:07.111 All right, yeah. 01:56:07.111 --> 01:56:09.571 So presumably, you're struggling mentally 01:56:09.571 --> 01:56:12.781 with how you would do this without having an extra cup, so good foresight 01:56:12.781 --> 01:56:13.321 here. 01:56:13.321 --> 01:56:16.191 Let me go ahead-- we do have a temporary variable, if you will. 01:56:16.191 --> 01:56:18.691 So if I hand you this, how would you now solve this problem? 01:56:21.181 --> 01:56:22.931 AUDIENCE: I would go like that, but it's-- 01:56:22.931 --> 01:56:23.581 DAVID J. MALAN: No, that's-- 01:56:23.581 --> 01:56:24.371 Oh. 01:56:24.371 --> 01:56:24.871 Well, OK.
01:56:24.871 --> 01:56:27.981 Go do it-- go with your instincts. 01:56:27.981 --> 01:56:29.541 OK. 01:56:29.541 --> 01:56:30.681 Sure, go ahead. 01:56:30.681 --> 01:56:32.811 Go with whatever your instincts are. 01:56:39.201 --> 01:56:41.828 Yeah, so a little-- so strictly speaking, we probably 01:56:41.828 --> 01:56:43.911 shouldn't have moved the glasses just because that 01:56:43.911 --> 01:56:45.931 would be like moving the array locations, 01:56:45.931 --> 01:56:48.611 so let's actually do it one more time, but the glasses now 01:56:48.611 --> 01:56:50.361 have to go back where they originally were. 01:56:50.361 --> 01:56:55.051 So how would you swap these now, using this temporary variable? 01:56:55.051 --> 01:56:56.476 OK, good. 01:56:56.476 --> 01:56:59.101 Otherwise we'd be completely uprooting the array, for instance, 01:56:59.101 --> 01:57:01.081 by just physically moving it around. 01:57:01.081 --> 01:57:03.571 So you moved the orange into this temporary variable, 01:57:03.571 --> 01:57:05.911 then you copied the purple into where the orange was, 01:57:05.911 --> 01:57:08.281 and now, presumably, excellent. 01:57:08.281 --> 01:57:11.101 The orange is going to end up where the purple once was, 01:57:11.101 --> 01:57:13.621 and this temporary variable took up some extra memory. 01:57:13.621 --> 01:57:16.441 It was necessary at the time, but not necessary, ultimately. 01:57:16.441 --> 01:57:22.131 But a round of applause if we could, and thank you for doing that so well. 01:57:22.131 --> 01:57:26.311 So the fact that it instantly occurred to Marina 01:57:26.311 --> 01:57:29.711 that you need some temporary variable is a perfect translation to code, 01:57:29.711 --> 01:57:32.951 and in fact this code here, that we might glimpse now, 01:57:32.951 --> 01:57:35.038 is reminiscent of exactly that algorithm, 01:57:35.038 --> 01:57:37.871 where A and B, at the end of the day, are the same chunks of memory.
01:57:37.871 --> 01:57:39.881 Just like the second time, the two glasses 01:57:39.881 --> 01:57:42.281 have to kind of stay put, even though we're physically lifting them, 01:57:42.281 --> 01:57:44.031 but they're going back to where they were. 01:57:44.031 --> 01:57:46.031 It's kind of like having two values, A and B, 01:57:46.031 --> 01:57:49.091 and you just have a temporary variable into which you copy A, 01:57:49.091 --> 01:57:52.331 then you change A with B, then you go and change 01:57:52.331 --> 01:57:55.271 B with whatever the original value of A was, 01:57:55.271 --> 01:57:59.921 because you temporarily stored it in this temporary variable, tmp. 01:57:59.921 --> 01:58:04.161 Unfortunately, this code doesn't necessarily work as intended. 01:58:04.161 --> 01:58:07.391 So let me go over to my VS Code here and open up 01:58:07.391 --> 01:58:10.661 a program called swap.c, and in swap.c, let 01:58:10.661 --> 01:58:15.641 me whip up something really quickly here with, how about include stdio.h, 01:58:15.641 --> 01:58:17.561 int main(void). 01:58:17.561 --> 01:58:22.751 Inside of main let me do something like x gets 1 and y gets 2. 01:58:22.751 --> 01:58:27.881 Let me just print out as a visual confirmation that x is %i, 01:58:27.881 --> 01:58:32.891 y is %i backslash n, plugging in x and y, respectively. 01:58:32.891 --> 01:58:36.071 Then let me call a swap function that we'll invent in just a moment. 01:58:36.071 --> 01:58:42.761 Swap x and y. And then let me print out again x is %i, y is %i backslash n, 01:58:42.761 --> 01:58:46.331 just to print out again what they are, because presumably I should see 1, 01:58:46.331 --> 01:58:49.494 2 first, then 2, 1 the second time. 01:58:49.494 --> 01:58:51.161 Now how is swap going to be implemented? 01:58:51.161 --> 01:58:54.591 Let me implement it exactly as on the screen a moment ago. 01:58:54.591 --> 01:58:57.011 So void swap int x-- 01:58:57.011 --> 01:58:59.501 or let's call it int A for consistency, int B.
01:58:59.501 --> 01:59:01.661 But I could always call those anything I want. 01:59:01.661 --> 01:59:05.891 Int tmp gets A, A gets B, B gets tmp. 01:59:05.891 --> 01:59:08.981 So exactly as I proposed a moment ago, and exactly 01:59:08.981 --> 01:59:12.761 as Marina really implemented it using these glasses of water. 01:59:12.761 --> 01:59:16.571 I need to now include my prototype, as always, so nothing new there. 01:59:16.571 --> 01:59:20.261 And I'll just copy/paste that up here, and now let's go ahead and run this. 01:59:20.261 --> 01:59:23.471 So make swap-- so far, so good-- ./swap-- 01:59:23.471 --> 01:59:28.331 x is now 1, y is 2, x is 1, y is 2. 01:59:28.331 --> 01:59:34.091 So there seems to be a bit of a bug here, but why might this be? 01:59:34.091 --> 01:59:37.931 This code does not in fact work, even though it obviously works in reality. 01:59:37.931 --> 01:59:39.725 Yeah? 01:59:39.725 --> 01:59:46.239 AUDIENCE: Because A and B have different addresses than x and y [INAUDIBLE]. 01:59:46.239 --> 01:59:48.031 DAVID J. MALAN: Good, and let me summarize. 01:59:48.031 --> 01:59:51.361 A and B do indeed have different addresses from x and y, 01:59:51.361 --> 01:59:54.961 and in fact what happens when you call a function like this on line 11, 01:59:54.961 --> 01:59:59.221 calling swap, passing in x and y, you are calling a function 01:59:59.221 --> 02:00:00.851 by value, so to speak. 02:00:00.851 --> 02:00:02.611 And this is a term of art that just means 02:00:02.611 --> 02:00:07.321 you are passing in copies of x and y, respectively, and calling them 02:00:07.321 --> 02:00:11.551 A and B in the context of this function, but they're indeed copies. 02:00:11.551 --> 02:00:15.451 Now technically, these names are local only. 02:00:15.451 --> 02:00:18.211 I could have called this x, I could have called this y, 02:00:18.211 --> 02:00:22.531 I could have changed this to x, this to y, this to x, and this to y.
02:00:22.531 --> 02:00:24.031 The problem would still remain. 02:00:24.031 --> 02:00:27.961 Just because you use the same names in one function as you do elsewhere, 02:00:27.961 --> 02:00:29.551 that doesn't mean they're the same. 02:00:29.551 --> 02:00:31.121 They just look the same to you. 02:00:31.121 --> 02:00:35.821 But indeed, swap is going to get copies of this x and y, and in this context, 02:00:35.821 --> 02:00:38.461 this scope, so to speak-- 02:00:38.461 --> 02:00:40.801 x and y will be copies of the originals. 02:00:40.801 --> 02:00:43.141 So for clarity, let me revert this back to A and B 02:00:43.141 --> 02:00:46.951 just to make super clear that they're indeed different, albeit copies, 02:00:46.951 --> 02:00:48.901 but there's indeed a problem there. 02:00:48.901 --> 02:00:51.041 This function actually works fine. 02:00:51.041 --> 02:00:52.361 In fact, notice this. 02:00:52.361 --> 02:00:56.921 Let me go ahead and print out inside of this: printf A is %i, 02:00:56.921 --> 02:01:00.991 B is %i backslash n, and then I'll print A and B. 02:01:00.991 --> 02:01:04.201 And let me do that same thing at the beginning of this function before it 02:01:04.201 --> 02:01:05.381 does any work. 02:01:05.381 --> 02:01:06.751 Let me go ahead and rerun. 02:01:06.751 --> 02:01:10.741 Make swap, ./swap, and this is promising. 02:01:10.741 --> 02:01:17.371 Initially, x is 1, y is 2, A is 1, B is 2, A is 2, B is 1, 02:01:17.371 --> 02:01:19.598 but then nope-- x is 1, y is 2. 02:01:19.598 --> 02:01:21.931 So if anything, I've confirmed that the logic is right-- 02:01:21.931 --> 02:01:25.051 Marina's logic is right, but there's something about C. 02:01:25.051 --> 02:01:28.921 There's something about using one function versus another that's actually 02:01:28.921 --> 02:01:30.671 creating a problem here. 02:01:30.671 --> 02:01:35.021 The fact that I'm passing in copies of these values is creating this problem. 02:01:35.021 --> 02:01:36.391 So what in fact is going on?
02:01:36.391 --> 02:01:39.211 Well again, inside of your computer's memory there are these little chips, 02:01:39.211 --> 02:01:41.086 and we've been talking about them abstractly 02:01:41.086 --> 02:01:43.141 as just this grid of memory locations. 02:01:43.141 --> 02:01:46.343 It turns out that your computer uses this memory 02:01:46.343 --> 02:01:47.551 in a pretty conventional way. 02:01:47.551 --> 02:01:51.631 It's not just random, where it just puts stuff wherever is available; 02:01:51.631 --> 02:01:55.591 it actually uses different parts of the memory for different purposes. 02:01:55.591 --> 02:01:58.981 And you have control over a lot of it, but the computer uses some of it 02:01:58.981 --> 02:01:59.823 for itself. 02:01:59.823 --> 02:02:01.531 And let's go ahead and zoom out from this 02:02:01.531 --> 02:02:05.581 and consider that within your computer's memory, what a computer will typically 02:02:05.581 --> 02:02:09.001 do is actually store initially all of the zeros and ones 02:02:09.001 --> 02:02:13.001 that you compiled at the top of your computer's memory, so to speak. 02:02:13.001 --> 02:02:16.231 So when you compile a program and then you run it with ./whatever, 02:02:16.231 --> 02:02:19.651 or on a Mac or PC you double-click on it, the computer first-- 02:02:19.651 --> 02:02:24.781 the operating system first-- loads all of your program's zeros and ones, a.k.a. 02:02:24.781 --> 02:02:29.371 machine code, into just one big chunk of memory at the top, so to speak. 02:02:29.371 --> 02:02:33.301 Below that it stores global variables-- any variables 02:02:33.301 --> 02:02:37.183 you have created in your program that are outside of main and outside 02:02:37.183 --> 02:02:37.891 of any functions. 02:02:37.891 --> 02:02:39.691 Generally, at the top of your file. 02:02:39.691 --> 02:02:41.634 Globals tend to go at the top there.
02:02:41.634 --> 02:02:44.551 Then there's this chunk of memory that's generally known as the heap-- 02:02:44.551 --> 02:02:46.951 and we saw that word briefly in Valgrind's output-- 02:02:46.951 --> 02:02:50.581 and then there's this other chunk of memory called the stack. 02:02:50.581 --> 02:02:55.711 And it turns out that up until this week you were using the stack heavily. 02:02:55.711 --> 02:03:00.961 Any time you use local variables in a function they end up on the stack. 02:03:00.961 --> 02:03:04.681 Any time you use malloc, that memory ends up on the heap. 02:03:04.681 --> 02:03:06.751 Now as the arrow suggests, this actually looks 02:03:06.751 --> 02:03:09.834 like a problem waiting to happen because if you use more and more and more 02:03:09.834 --> 02:03:11.671 heap, and more and more and more stack, it's 02:03:11.671 --> 02:03:14.401 like two things barreling down the tracks at one another-- this does not 02:03:14.401 --> 02:03:14.891 end well. 02:03:14.891 --> 02:03:16.141 And that's actually a problem. 02:03:16.141 --> 02:03:19.481 If you've ever heard the phrase stack overflow, or used the website, 02:03:19.481 --> 02:03:21.271 this is the origin of its name. 02:03:21.271 --> 02:03:23.521 When you start to use more and more and more 02:03:23.521 --> 02:03:25.801 memory by calling lots and lots of functions 02:03:25.801 --> 02:03:28.261 or using lots and lots of local variables, 02:03:28.261 --> 02:03:30.511 you use a lot of this stack memory. 02:03:30.511 --> 02:03:33.961 Or if you use malloc a lot and keep calling malloc, malloc, malloc, 02:03:33.961 --> 02:03:37.681 and never, or only rarely, call free, you just use more and more memory 02:03:37.681 --> 02:03:41.521 and eventually these two things might overflow each other, at which point 02:03:41.521 --> 02:03:42.571 you're just out of luck. 02:03:42.571 --> 02:03:45.191 The program will crash or something bad will happen. 02:03:45.191 --> 02:03:47.971 So the onus is on you just not to do that.
02:03:47.971 --> 02:03:50.221 But this is the design, generally, of what's 02:03:50.221 --> 02:03:52.111 going on inside of your computer's memory. 02:03:52.111 --> 02:03:55.711 Now within that memory, though, there are certain conventions; 02:03:55.711 --> 02:03:57.571 let's focus here on the stack. 02:03:57.571 --> 02:04:00.031 And in fact, let me go over here with a marker 02:04:00.031 --> 02:04:03.521 and say that this represents the bottom of my memory, ultimately. 02:04:03.521 --> 02:04:07.801 And so here we have a whole bunch of wooden blocks, and each of these squares 02:04:07.801 --> 02:04:10.091 represents a byte of memory, and this, for instance, 02:04:10.091 --> 02:04:12.781 might represent four bytes altogether-- good enough for an int, 02:04:12.781 --> 02:04:14.111 or something like that. 02:04:14.111 --> 02:04:18.451 So in my original code that I wrote earlier, which is in fact buggy, 02:04:18.451 --> 02:04:20.851 what is in fact going on inside the swap function? 02:04:20.851 --> 02:04:24.901 We can visualize it like this-- when you run ./swap or any program for that 02:04:24.901 --> 02:04:28.501 matter, main is the first function to get called in a C program, 02:04:28.501 --> 02:04:32.011 and so I'm just going to label this bottom row of memory as main. 02:04:32.011 --> 02:04:36.381 And what were the two variables I had in main called in this code? 02:04:36.381 --> 02:04:37.631 Yeah. 02:04:37.631 --> 02:04:38.201 x and y. 02:04:38.201 --> 02:04:40.401 And each of those was an int, so that's four bytes, 02:04:40.401 --> 02:04:43.121 so it's deliberate that I reserved four-- 02:04:43.121 --> 02:04:45.951 a chunk of wood here that's four bytes. 02:04:45.951 --> 02:04:49.901 So let me just call this x, and I'm just going to write the number 1 in this box 02:04:49.901 --> 02:04:50.411 here. 02:04:50.411 --> 02:04:54.431 And then I had my other variable y, and I'm going to put the number 2 there.
02:04:54.431 --> 02:04:58.641 What happens when main calls swap like it does in this code here? 02:04:58.641 --> 02:05:04.931 Well, it has two variables of its own, A and B, and A initially is 1 02:05:04.931 --> 02:05:09.341 and B is initially 2, but it has a third variable, tmp, 02:05:09.341 --> 02:05:12.371 which is a local variable in addition to the arguments A and B 02:05:12.371 --> 02:05:16.931 that are passed in, so I'm going to call this tmp, tmp over here. 02:05:16.931 --> 02:05:18.156 And what is the value of tmp? 02:05:18.156 --> 02:05:19.781 Well, we have to look back at the code. 02:05:19.781 --> 02:05:24.431 tmp initially gets the value of A. All right, the value of A was 1, 02:05:24.431 --> 02:05:26.141 so tmp initially gets 1. 02:05:26.141 --> 02:05:28.601 That's step one in my three-line program. 02:05:28.601 --> 02:05:32.621 OK, A equals B. So that is assigned from right to left, from the B 02:05:32.621 --> 02:05:36.251 into the A. So B is 2, A is this, so let me go ahead 02:05:36.251 --> 02:05:38.361 and erase this and just overwrite that. 02:05:38.361 --> 02:05:41.891 So at this moment in the story you have two copies of 2, 02:05:41.891 --> 02:05:44.711 but that's OK, though, because the third line of code 02:05:44.711 --> 02:05:47.741 says tmp gets copied into B. So what's tmp? 02:05:47.741 --> 02:05:53.171 1, which gets copied into B, so let me overwrite this 2 with a 1, 02:05:53.171 --> 02:05:54.821 and now what happens? 02:05:54.821 --> 02:05:57.941 Now unfortunately, the code ends. 02:05:57.941 --> 02:06:01.511 swap doesn't actually do anything with the result, and the problem in C 02:06:01.511 --> 02:06:03.521 is that I could have had a return value. 02:06:03.521 --> 02:06:05.741 I could go in there and change void to int, 02:06:05.741 --> 02:06:07.511 but which one am I going to return? 02:06:07.511 --> 02:06:09.221 The A or the B?
02:06:09.221 --> 02:06:11.631 The whole goal is to swap two values, and it 02:06:11.631 --> 02:06:13.631 seems kind of lame if you can't write a function 02:06:13.631 --> 02:06:16.661 to do something as common, per last week's sorting algorithms, 02:06:16.661 --> 02:06:18.191 as swapping two values. 02:06:18.191 --> 02:06:19.541 But what really happens? 02:06:19.541 --> 02:06:22.751 Well, when this program starts running, 02:06:22.751 --> 02:06:25.991 main is using this chunk of memory at the bottom in the so-called stack, 02:06:25.991 --> 02:06:28.661 and the stack is just like a cafeteria stack of trays-- 02:06:28.661 --> 02:06:30.201 it grows up, like this. 02:06:30.201 --> 02:06:32.291 Here's main's memory on the stack. 02:06:32.291 --> 02:06:34.571 Here's the swap function's memory on the stack. 02:06:34.571 --> 02:06:37.241 It's using three ints instead of two-- 02:06:37.241 --> 02:06:38.951 instead of only two. 02:06:38.951 --> 02:06:42.461 What happens when the function returns, whether it's void or not? 02:06:42.461 --> 02:06:45.701 The sort of recollection that this is swap's memory goes away 02:06:45.701 --> 02:06:47.291 and garbage values are left. 02:06:47.291 --> 02:06:51.531 So, adorably, we get rid of these values here, 02:06:51.531 --> 02:06:55.991 and there's still data there-- technically, the numbers 1, 1, and 2 02:06:55.991 --> 02:06:59.591 are still there in the computer's memory but they no longer belong to us 02:06:59.591 --> 02:07:01.341 because the function has now returned. 02:07:01.341 --> 02:07:04.421 So they're still in there and this is kind of an example visually 02:07:04.421 --> 02:07:07.781 of why there's other stuff in memory even though you didn't put it there, 02:07:07.781 --> 02:07:08.621 necessarily. 02:07:08.621 --> 02:07:11.071 Sometimes you did put it there, but now once 02:07:11.071 --> 02:07:14.711 swap returns you should only be touching memory inside of main.
02:07:14.711 --> 02:07:19.001 But we've never actually copied one value into main. 02:07:19.001 --> 02:07:22.661 We haven't returned anything and we haven't solved this fundamentally. 02:07:22.661 --> 02:07:24.291 So how could we do this? 02:07:24.291 --> 02:07:28.301 Well, what if we instead passed into swap not copies of x and y, 02:07:28.301 --> 02:07:32.681 calling them a and b? What if we passed in breadcrumbs to x and y, 02:07:32.681 --> 02:07:35.861 sort of a treasure map that will lead swap to the actual x 02:07:35.861 --> 02:07:37.241 and to the actual y? 02:07:37.241 --> 02:07:41.051 Today we have that capability using pointers. 02:07:41.051 --> 02:07:44.921 So suppose that we use this code instead. 02:07:44.921 --> 02:07:47.831 There are a lot of stars going on here, which is a bit annoying, 02:07:47.831 --> 02:07:50.501 but let's consider what it is we're trying to achieve. 02:07:50.501 --> 02:07:55.391 What if we pass in not x and y, but the address of x and the address of y, 02:07:55.391 --> 02:07:57.501 respectively-- breadcrumbs, if you will-- 02:07:57.501 --> 02:08:00.521 that will lead swap to the original values. 02:08:00.521 --> 02:08:04.331 Then what we do is we still give ourselves a tmp variable, 02:08:04.331 --> 02:08:05.351 like an empty glass. 02:08:05.351 --> 02:08:07.691 It's still a glass, so we still call it an int, 02:08:07.691 --> 02:08:10.071 but what do we want to put into that temporary variable? 02:08:10.071 --> 02:08:12.654 We don't want to put a into it, because that's an address now. 02:08:12.654 --> 02:08:15.371 We want to go to that address per the star 02:08:15.371 --> 02:08:17.141 and put in whatever's at that address. 02:08:17.141 --> 02:08:18.381 What do we then want to do?
02:08:18.381 --> 02:08:22.121 Well, we want to then copy into whatever's at location a, 02:08:22.121 --> 02:08:24.911 we want to copy over to location a's contents 02:08:24.911 --> 02:08:29.111 whatever is at location b's contents, and then lastly, we 02:08:29.111 --> 02:08:32.261 want to copy tmp into whatever's at location b. 02:08:32.261 --> 02:08:36.149 So again, we're very deliberately introducing all of these stars 02:08:36.149 --> 02:08:38.441 because we don't want to change any of these addresses, 02:08:38.441 --> 02:08:41.861 we want to go to these addresses per the dereference operator 02:08:41.861 --> 02:08:46.221 and put values there, or get values from them. 02:08:46.221 --> 02:08:47.691 So what does this actually mean? 02:08:47.691 --> 02:08:52.001 Well, if I kind of rewind in this story and I go back here, I still have tmp, 02:08:52.001 --> 02:08:57.671 although I'm going to delete its value to begin with, I still have b 02:08:57.671 --> 02:09:01.121 and I still have a, but what's going to be different 02:09:01.121 --> 02:09:05.051 this time is how I use a and b. So let me finish erasing those. 02:09:05.051 --> 02:09:07.181 That's a on the left, this is b on the right. 02:09:07.181 --> 02:09:09.701 At this point in the story, we're rerunning swap 02:09:09.701 --> 02:09:13.151 with this new and improved version, and let's see what happens. 02:09:13.151 --> 02:09:16.871 Well, x is presumably at some address. 02:09:16.871 --> 02:09:20.351 Maybe it's like 0x123, as always. 02:09:20.351 --> 02:09:23.471 What then does a get when I'm using this code? 02:09:23.471 --> 02:09:27.131 The value of a is 0x123. 02:09:27.131 --> 02:09:28.391 What is the value of b? 02:09:28.391 --> 02:09:31.661 Maybe y is at 0x456. 02:09:31.661 --> 02:09:32.651 What goes in b? 02:09:32.651 --> 02:09:38.281 Well, I'm going to put 0x456, and then what am I going to do?
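As a small illustration of the two operators in this trace, here is a sketch in which a pointer holds an address (the role 0x123 plays on the board) and `*` follows that address to the int stored there. The helper name `inspect_and_set` is hypothetical and does not come from the lecture. The actual address printed will differ from run to run and from machine to machine.

```c
#include <stdio.h>

// Hypothetical helper: given the address of an int, print the address
// itself and the value found by dereferencing it, then store a new
// value at that address.
void inspect_and_set(int *a, int new_value)
{
    printf("a holds the address %p\n", (void *) a);  // like 0x123 on the board
    printf("*a is the int stored there: %i\n", *a);   // go to that address
    *a = new_value;                                  // write through the pointer
}
```

If main calls `inspect_and_set(&x, 2)` when x is 1, x is 2 afterward. The `&` supplies the breadcrumb, and the `*` inside the function follows it.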
02:09:38.281 --> 02:09:40.471 Based on these three lines of code, I'm going 02:09:40.471 --> 02:09:44.671 to store in tmp whatever is at the address in a. What is at the address in a? 02:09:44.671 --> 02:09:47.701 That's this thing here, so I'm going to put 1 in tmp. 02:09:47.701 --> 02:09:50.251 Line two-- I'm going to go to b-- 02:09:50.251 --> 02:09:53.131 all right, b is 456, so I'm going to go there, and I'm 02:09:53.131 --> 02:09:57.931 going to store that 2 at whatever is at location a, and location a 02:09:57.931 --> 02:10:01.211 is 123, so that's this, so what am I going to do? 02:10:01.211 --> 02:10:03.901 I'm going to change this 1 to a 2. 02:10:03.901 --> 02:10:06.631 Last line of code-- get the value of tmp, which is 1, 02:10:06.631 --> 02:10:11.731 and then put it at whatever location b is, so b, 456, go there 02:10:11.731 --> 02:10:16.291 and change it to be the value of tmp, which puts 1 here. 02:10:16.291 --> 02:10:17.521 That's it for the code. 02:10:17.521 --> 02:10:19.081 There's still no return value. 02:10:19.081 --> 02:10:22.381 swap returns, which means these three temporary variables 02:10:22.381 --> 02:10:24.091 are garbage values now. 02:10:24.091 --> 02:10:26.471 They can be reused by subsequent function calls 02:10:26.471 --> 02:10:31.091 but now, I've actually swapped the values of x and y. 02:10:31.091 --> 02:10:35.041 Which is to say, what came as naturally as the real world here for Mariana 02:10:35.041 --> 02:10:38.521 is not quite as simply done in C because, again, 02:10:38.521 --> 02:10:40.861 functions are isolated from each other. 02:10:40.861 --> 02:10:44.141 You can pass in values but you get copies of those values. 02:10:44.141 --> 02:10:48.691 If you want one function to affect the value of a variable somewhere else, 02:10:48.691 --> 02:10:52.021 you have to, one, understand what's going on, but two, 02:10:52.021 --> 02:10:54.971 pass things in by pointer here.
02:10:54.971 --> 02:10:58.561 So if I go back to my code here, I need to make a few changes now. 02:10:58.561 --> 02:11:00.661 Let me get rid of these extra printfs. 02:11:00.661 --> 02:11:03.391 Let me go in and add all these stars. 02:11:03.391 --> 02:11:07.411 So I'm dereferencing these actual addresses here and here, 02:11:07.411 --> 02:11:09.821 and I've got to make one more change. 02:11:09.821 --> 02:11:16.381 How do I now call swap if swap is expecting an int star and an int star? 02:11:16.381 --> 02:11:19.441 That is, the address of an int and the address of another int. 02:11:19.441 --> 02:11:21.931 What do I change on line 11 here? 02:11:21.931 --> 02:11:24.231 Yeah. 02:11:24.231 --> 02:11:25.983 Sorry, a little louder. 02:11:25.983 --> 02:11:30.231 AUDIENCE: [INAUDIBLE] 02:11:30.231 --> 02:11:33.051 DAVID J. MALAN: Sorry, the address-of operator. 02:11:33.051 --> 02:11:37.731 So up here on line 11, we do ampersand x and ampersand y. 02:11:37.731 --> 02:11:41.001 So yes, we're technically passing in a copy of a value, 02:11:41.001 --> 02:11:43.881 but this time the copy we're passing in is technically an address, 02:11:43.881 --> 02:11:47.271 and as soon as we have an address, just like when I held up the fuzzy finger-- 02:11:47.271 --> 02:11:50.571 the foamy finger-- I can point at that address, I can go to that address 02:11:50.571 --> 02:11:54.561 and actually get a value from the mailbox or put a value into the mailbox 02:11:54.561 --> 02:11:56.821 if I want. 02:11:56.821 --> 02:12:01.551 So let's cross our fingers now and do make swap, Enter. 02:12:01.551 --> 02:12:02.721 Oh my God, so many mistakes. 02:12:02.721 --> 02:12:04.881 Oh, I didn't remember to change my prototype, 02:12:04.881 --> 02:12:08.421 so let me go way up here and add two more stars because I 02:12:08.421 --> 02:12:09.801 made that change already. 02:12:09.801 --> 02:12:14.961 Make swap, ./swap, and voila-- now I have actually swapped.
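Here is a minimal sketch of the corrected swap as the lecture describes it. The parameters are now int stars, and every use of them is dereferenced so that the function reads and writes the caller's variables rather than copies.

```c
// Corrected swap: a and b hold the addresses of the caller's ints,
// so the * (dereference) operator reaches the original variables.
void swap(int *a, int *b)
{
    int tmp = *a;  // copy the int stored at address a
    *a = *b;       // store the int found at address b into address a
    *b = tmp;      // store the saved value into address b
}
```

For usage, declare the prototype `void swap(int *a, int *b);` above main and call it as `swap(&x, &y);`. Forgetting to update that prototype is exactly the compile error encountered in the demo.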
02:12:14.961 --> 02:12:15.741 Thank you. 02:12:19.291 --> 02:12:19.831 Thank you. 02:12:19.831 --> 02:12:21.661 The two values. 02:12:21.661 --> 02:12:24.491 All right, so what more can we do here? 02:12:24.491 --> 02:12:29.461 Well, let me consider that all this time we've 02:12:29.461 --> 02:12:33.691 been deliberately using GetString and GetInt and GetFloat 02:12:33.691 --> 02:12:35.111 and so forth, but for a reason. 02:12:35.111 --> 02:12:38.069 These aren't just training wheels for the sake of making things easier, 02:12:38.069 --> 02:12:41.071 they're actually in place to make your code safer. 02:12:41.071 --> 02:12:45.511 And to illustrate this, let me go ahead and open up one other file here. 02:12:45.511 --> 02:12:49.861 How about a file called scanf.c. 02:12:49.861 --> 02:12:52.891 It turns out that the old school way-- the way in C, 02:12:52.891 --> 02:12:57.151 really, of getting user input, is via functions like scanf, 02:12:57.151 --> 02:13:00.751 and let me go ahead and include stdio.h, int main(void), 02:13:00.751 --> 02:13:04.441 and without using the CS50 library at all for strings or for any of those 02:13:04.441 --> 02:13:05.611 get functions. 02:13:05.611 --> 02:13:08.161 Let me give myself an int called x. 02:13:08.161 --> 02:13:12.076 Let me just print out what the value of x is, even though it's going to be a-- 02:13:12.076 --> 02:13:15.361 or rather, ask the user for the value by asking them for x. 02:13:15.361 --> 02:13:18.781 And I'm going to use a function called scanf that's going to scan 02:13:18.781 --> 02:13:25.351 in an integer using %i, and I'm going to store whatever the human types 02:13:25.351 --> 02:13:27.306 in at this location. 02:13:27.306 --> 02:13:30.181 And then I'm going to go ahead and, just so we can see what happened, 02:13:30.181 --> 02:13:34.231 I'm going to print out with %i whatever the human typed in as follows. 02:13:34.231 --> 02:13:37.321 All right, so line eight is week 1 style code. 
02:13:37.321 --> 02:13:40.991 Lines five and six are week 1 style code. 02:13:40.991 --> 02:13:46.411 So the curiosity today is this new line. scanf is another function in stdio.h, 02:13:46.411 --> 02:13:47.971 and notice what I'm doing. 02:13:47.971 --> 02:13:50.671 I'm using the same syntax that I use for printf, 02:13:50.671 --> 02:13:54.091 which is kind of a little clue-- a format code to tell scanf what it is I 02:13:54.091 --> 02:13:57.031 want to scan in, that is, read from the human's keyboard-- 02:13:57.031 --> 02:14:00.571 and I'm telling it where to put whatever the human typed in. 02:14:00.571 --> 02:14:04.321 I can't just say x, because we run into the same darn problem as with swap. 02:14:04.321 --> 02:14:06.811 I have to give a little breadcrumb to the variable 02:14:06.811 --> 02:14:10.111 where I want scanf to put the human's integer. 02:14:10.111 --> 02:14:13.541 And so this just tells the computer to get an int. 02:14:13.541 --> 02:14:15.781 This is what you would have had to type, essentially, 02:14:15.781 --> 02:14:18.691 in week 1 just to get an int from the user, 02:14:18.691 --> 02:14:21.541 and there's a whole bunch of things that can go wrong still, 02:14:21.541 --> 02:14:24.931 but that's the cryptic syntax we would have had to show you in week 1. 02:14:24.931 --> 02:14:26.881 Let me go ahead and make scanf here-- 02:14:26.881 --> 02:14:29.941 oops-- user error. 02:14:29.941 --> 02:14:31.891 Put the semicolon in the wrong place. 02:14:31.891 --> 02:14:33.781 Make scanf, Enter. 02:14:33.781 --> 02:14:35.281 Oh my God. 02:14:35.281 --> 02:14:36.676 Non-void doesn't return a value. 02:14:40.371 --> 02:14:42.591 Oh, thank you. 02:14:42.591 --> 02:14:43.221 Strike two. 02:14:43.221 --> 02:14:43.851 OK. 02:14:43.851 --> 02:14:45.141 Make scanf. 02:14:45.141 --> 02:14:45.831 There we go. 02:14:45.831 --> 02:14:46.971 OK, so scanf-- 02:14:46.971 --> 02:14:49.951 I'm going to type in a number like 50 and it just prints it back out.
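Below is a sketch of the same scanf idea packaged as a reusable function. It reads from any `FILE *` so that it can be exercised without a keyboard, and passing `stdin` makes `fscanf(stdin, ...)` behave like `scanf(...)`. The name `read_int` is hypothetical. Unlike the demo, it checks scanf-style return values, which covers one of the "things that can go wrong" mentioned above.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: read one int from `in` into *out.
// Like scanf, fscanf needs the address of the variable to fill in,
// which is why the caller passes &x. Returns false if the input
// did not start with an integer.
bool read_int(FILE *in, int *out)
{
    // %i (unlike %d) also accepts hex such as 0x32 and octal such as 062.
    return fscanf(in, "%i", out) == 1;
}
```

A typical call is `int x; if (read_int(stdin, &x)) { printf("%i\n", x); }`. Typing `cat` instead of a number makes it return false rather than leaving x with a garbage value that looks valid.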
02:14:49.951 --> 02:14:54.181 So that is the traditional way of implementing something like GetInt. 02:14:54.181 --> 02:14:57.651 The problem, though, is when you start to get into strings, things 02:14:57.651 --> 02:14:59.121 get dangerous quickly. 02:14:59.121 --> 02:15:01.289 Let me delete all of this and give myself 02:15:01.289 --> 02:15:03.831 a string s, although wait a minute-- we don't call it string 02:15:03.831 --> 02:15:06.891 anymore-- char star to store a string. 02:15:06.891 --> 02:15:10.731 Then let me go ahead and just prompt the user for a string, using just printf. 02:15:10.731 --> 02:15:15.531 Then let me go ahead and use scanf, ask them for a string this time with %s, 02:15:15.531 --> 02:15:18.211 and store it at that address. 02:15:18.211 --> 02:15:20.751 Then let me go ahead and print out whatever the human typed 02:15:20.751 --> 02:15:23.641 in just by using the same notation. 02:15:23.641 --> 02:15:28.791 So here, line five is the same thing as string s, but we've taken back 02:15:28.791 --> 02:15:31.191 that layer today so it's char star s. 02:15:31.191 --> 02:15:35.991 This is just week one, this is just week one; line seven is new. 02:15:35.991 --> 02:15:41.811 scanf will also read from the human's keyboard a string and store it at s. 02:15:41.811 --> 02:15:43.641 But that's OK, because s is an address. 02:15:43.641 --> 02:15:46.551 It's correct not to do the ampersand. 02:15:46.551 --> 02:15:47.451 It's not necessary. 02:15:47.451 --> 02:15:52.071 A string is, and has always been, a char star, a.k.a. string. 02:15:52.071 --> 02:15:54.091 The problem, though, arises as follows-- 02:15:54.091 --> 02:15:56.411 if I do make scanf-- 02:15:56.411 --> 02:15:57.911 oh my God, what did I do wrong-- 02:15:57.911 --> 02:16:00.431 I can't-- OK, we have certain defenses in place with make. 02:16:00.431 --> 02:16:06.881 Let me do clang of scanf.c, outputting a program called scanf.
02:16:06.881 --> 02:16:09.838 All right, so I'm overriding some of our pedagogical defenses 02:16:09.838 --> 02:16:11.171 that we have in place with make. 02:16:11.171 --> 02:16:15.761 Let me now run scanf of this version, Enter, and let me type in something 02:16:15.761 --> 02:16:20.341 like, how about hi again. 02:16:20.341 --> 02:16:23.161 So it didn't even store something and it weirdly printed out null. 02:16:23.161 --> 02:16:26.821 This time it's in lowercase, but that is somewhat related. 02:16:26.821 --> 02:16:31.561 What did I fundamentally do wrong though, here? 02:16:31.561 --> 02:16:33.691 Why is this getting more and more dangerous? 02:16:33.691 --> 02:16:35.471 And let me illustrate the point even more. 02:16:35.471 --> 02:16:38.741 What if I type in not just something like hello, which also doesn't work. 02:16:38.741 --> 02:16:44.581 What if I do like, hellooooo and make a really long string, Enter-- 02:16:44.581 --> 02:16:45.871 that still works. 02:16:45.871 --> 02:16:48.191 Can I do this again? 02:16:48.191 --> 02:16:50.091 Let's try again. 02:16:50.091 --> 02:16:53.271 Right, a really long, unexpectedly long string. 02:16:53.271 --> 02:16:55.131 This is the nondeterminism kicking in. 02:16:55.131 --> 02:16:55.851 Enter. 02:16:55.851 --> 02:16:56.421 All right, damn it. 02:16:56.421 --> 02:16:58.254 I was trying to trigger a segmentation fault 02:16:58.254 --> 02:17:01.491 but it wouldn't, but the point still remains. 02:17:01.491 --> 02:17:06.181 It's still not working, but what's the essence of why this isn't working, 02:17:06.181 --> 02:17:07.851 and it's not storing my actual input? 02:17:07.851 --> 02:17:08.731 Yeah. 02:17:08.731 --> 02:17:10.666 AUDIENCE: Do you have to make a space? 02:17:10.666 --> 02:17:12.541 DAVID J. MALAN: We have to make space for it. 02:17:12.541 --> 02:17:15.781 So what we're missing here is malloc, or something like that. 02:17:15.781 --> 02:17:18.741 So I could do that, I could do something like this. 
02:17:18.741 --> 02:17:21.441 Well, let the human type in at most a three-letter word 02:17:21.441 --> 02:17:25.581 so I could do malloc of 3 plus 1 for the null character. 02:17:25.581 --> 02:17:29.961 So let me give them four characters, and let me go ahead and do make scanf-- 02:17:29.961 --> 02:17:30.921 whoops. 02:17:30.921 --> 02:17:33.081 Nope, sorry. clang, I have to-- 02:17:33.081 --> 02:17:33.721 nope. 02:17:33.721 --> 02:17:34.221 Dammit. 02:17:34.221 --> 02:17:40.811 Oh, include stdlib.h-- there we go. 02:17:40.811 --> 02:17:43.836 That gives me malloc, now I'm going to recompile this with clang, 02:17:43.836 --> 02:17:46.961 now I'm going to rerun it, and now I'm going to type in my first thing, hi. 02:17:46.961 --> 02:17:48.341 That now works. 02:17:48.341 --> 02:17:52.061 And let me get a little aggressive now and type in hello, which is too long. 02:17:52.061 --> 02:17:54.101 Still works, but I'm getting lucky. 02:17:54.101 --> 02:17:57.671 Let me try a hellooooooo. 02:17:57.671 --> 02:17:59.995 Damn it, that still works, too. 02:17:59.995 --> 02:18:01.091 Sort of. 02:18:01.091 --> 02:18:03.290 But it actually-- not quite. 02:18:03.290 --> 02:18:05.411 There's some weirdness going on there already. 02:18:05.411 --> 02:18:07.011 It turns out I can also do this. 02:18:07.011 --> 02:18:10.390 I could actually just say char s[4] and give myself 02:18:10.390 --> 02:18:11.681 an array of four characters. 02:18:11.681 --> 02:18:13.101 Let me try this one more time. 02:18:13.101 --> 02:18:16.661 So let me rerun clang, then ./scanf. 02:18:16.661 --> 02:18:21.460 Hellooooooo, clearly exceeding the four characters-- 02:18:21.460 --> 02:18:22.091 there we go. 02:18:22.091 --> 02:18:23.080 Thank you, all right. 02:18:26.821 --> 02:18:29.342 So the point here, though, is if we hadn't given you GetInt, 02:18:29.342 --> 02:18:31.800 you would have had to use the scanf thing-- not a huge deal 02:18:31.800 --> 02:18:33.071 because it seemed to work.
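One common way to stop `%s` from writing past a four-character array like the one just described is a field width that is one less than the buffer size, which leaves room for the `\0`. Here is a minimal sketch, again reading from a `FILE *` so that it can be tested. The name `read_word` is hypothetical, and the width is hard-coded to match a 4-byte buffer. Any extra characters are simply left unread instead of overflowing memory.

```c
#include <stdio.h>

// Hypothetical helper: read at most 3 non-whitespace characters into a
// 4-byte buffer; the 4th byte is reserved for the terminating '\0'.
// Returns 1 on success, or 0/EOF if no word could be read.
int read_word(FILE *in, char s[4])
{
    // "%3s" caps how many chars fscanf may store; plain "%s" has no cap.
    return fscanf(in, "%3s", s);
}
```

Given `hellooooo` as input, s ends up holding `hel`. The program stays safe but truncates the input silently, which is one reason a library function that grows its buffer is friendlier.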
02:18:33.071 --> 02:18:36.321 But if we hadn't given you GetString you would have had to do stuff like this, 02:18:36.321 --> 02:18:39.481 knowing about malloc already or knowing about strings being arrays, 02:18:39.481 --> 02:18:41.550 and even now there's a danger. 02:18:41.550 --> 02:18:45.751 If the human types in five letters, six letters, 100 letters-- this code, 02:18:45.751 --> 02:18:49.501 like with the Hello input, will probably just crash, which is bad. 02:18:49.501 --> 02:18:51.481 So GetString also has this functionality built 02:18:51.481 --> 02:18:53.790 in where we have a fancy loop inside such 02:18:53.790 --> 02:18:58.321 that we allocate using malloc as many bytes as you physically type in, 02:18:58.321 --> 02:19:00.271 and we use malloc essentially every keystroke. 02:19:00.271 --> 02:19:05.101 The moment you type in h-e-l-l-o, we're laying the tracks as we go and we keep 02:19:05.101 --> 02:19:09.571 allocating more and more memory so that we theoretically will never crash with 02:19:09.571 --> 02:19:12.300 GetString even though it's this easy to crack-- 02:19:12.300 --> 02:19:15.451 this easy to crash your code using scanf if you again 02:19:15.451 --> 02:19:18.121 did it without the help of a library. 02:19:18.121 --> 02:19:20.178 So where are we all going with this? 02:19:20.178 --> 02:19:22.261 Well, let me show you a few final examples that'll 02:19:22.261 --> 02:19:24.601 pave the way for what will be problem set four. 02:19:24.601 --> 02:19:27.761 Let me go ahead and open up from today's code-- 02:19:27.761 --> 02:19:29.880 which is available on the course's website-- 02:19:29.880 --> 02:19:36.841 for instance, a program like this, called phonebook.c, 02:19:36.841 --> 02:19:39.540 and I'm just going to give you a quick tour of it, 02:19:39.540 --> 02:19:42.502 that you'll see more details on in the context of p-set four itself. 02:19:42.502 --> 02:19:45.210 We're going to introduce a few new functions you're going to see.
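The growing-buffer idea behind GetString can be sketched as follows. This is a simplified illustration based on our own assumptions, not the CS50 library's actual source. The name `read_line` is hypothetical, and this version uses `realloc` to double its capacity only when the buffer fills up, rather than allocating on literally every keystroke.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical GetString-like reader: returns a malloc'd copy of the next
// line from `in` (without the '\n'), or NULL at end of input or if memory
// runs out. The caller must free the result.
char *read_line(FILE *in)
{
    size_t capacity = 0, length = 0;
    char *buffer = NULL;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Grow before storing, always keeping one spare byte for '\0'.
        if (length + 1 >= capacity)
        {
            size_t new_capacity = capacity == 0 ? 16 : capacity * 2;
            char *bigger = realloc(buffer, new_capacity);
            if (bigger == NULL)
            {
                free(buffer);
                return NULL;
            }
            buffer = bigger;
            capacity = new_capacity;
        }
        buffer[length++] = (char) c;
    }

    if (length == 0 && c == EOF)
    {
        return NULL;  // nothing left to read; buffer was never allocated
    }
    if (buffer == NULL)
    {
        buffer = malloc(1);  // empty line: still return a valid ""
        if (buffer == NULL)
        {
            return NULL;
        }
    }
    buffer[length] = '\0';
    return buffer;
}
```

Doubling keeps the number of reallocations logarithmic in the line's length, whereas growing by one byte per keystroke would call `realloc` once for every character typed.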
02:19:45.210 --> 02:19:48.451 You're going to see a function called fopen, which stands for file open, 02:19:48.451 --> 02:19:51.842 and it takes two arguments-- the name of a file to open, like a CSV 02:19:51.842 --> 02:19:55.050 that you might manipulate in Excel or Google Sheets or the like-- comma-separated 02:19:55.050 --> 02:19:59.851 values, and then something like a for append, r for read, 02:19:59.851 --> 02:20:02.790 w for write, depending on whether you want to add to the file, 02:20:02.790 --> 02:20:05.321 just open it up, or change it. 02:20:05.321 --> 02:20:07.831 We're going to introduce you to a file pointer. 02:20:07.831 --> 02:20:09.671 You'll see that capital FILE-- 02:20:09.671 --> 02:20:12.271 which is a little bit unconventional-- and a FILE star is 02:20:12.271 --> 02:20:15.121 a pointer to an actual file on the computer's hard drive 02:20:15.121 --> 02:20:17.640 so that you can actually access something like a CSV file, 02:20:17.640 --> 02:20:18.991 or heck, even images. 02:20:18.991 --> 02:20:21.300 And we're going to see down below that you're also 02:20:21.300 --> 02:20:25.050 going to have the ability to write files as well, or print to files. 02:20:25.050 --> 02:20:28.981 You'll see functions like fprintf, for file printf. 02:20:28.981 --> 02:20:34.111 Or fwrite-- file write. Now that you will begin to understand pointers, 02:20:34.111 --> 02:20:37.951 you'll have the ability to actually not only read files-- 02:20:37.951 --> 02:20:41.470 text files, images, other things-- but also write them out. 02:20:41.470 --> 02:20:46.921 In fact, for instance, just as a teaser here, JPEGs will be one of the things 02:20:46.921 --> 02:20:49.321 we focus on this week, where we give you a forensic image 02:20:49.321 --> 02:20:51.991 and your goal is to recover as many photographs 02:20:51.991 --> 02:20:55.651 from this forensic image of a digital camera as you possibly can.
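Here is a minimal sketch of the fopen, fprintf, and fclose pattern just described, appending one row to a CSV file. The helper name `add_entry` and the two-column name,number layout are assumptions for illustration only. The lecture's phonebook.c may be organized differently.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: append one "name,number" row to the CSV at path.
// Mode "a" creates the file if needed and adds to its end; "w" would
// truncate it first, and "r" would only allow reading.
bool add_entry(const char *path, const char *name, const char *number)
{
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return false;  // e.g., the directory does not exist
    }
    fprintf(file, "%s,%s\n", name, number);  // like printf, but into the file
    fclose(file);
    return true;
}
```

Checking fopen's return value matters because it returns NULL on failure, and passing that NULL to fprintf would crash.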
02:20:55.651 --> 02:20:59.071 And the way you're going to do that is by knowing in advance 02:20:59.071 --> 02:21:03.571 that every JPEG in the world starts with these three bytes, written 02:21:03.571 --> 02:21:05.800 in hexadecimal, but these three numbers. 02:21:05.800 --> 02:21:08.521 And so in fact, just as a teaser, let me open up 02:21:08.521 --> 02:21:11.701 an example you'll see on the course's website for today. 02:21:11.701 --> 02:21:14.436 If I scroll through here, you'll see a program 02:21:14.436 --> 02:21:16.061 that does a little something like this. 02:21:16.061 --> 02:21:18.211 And again, more on this-- 02:21:18.211 --> 02:21:20.401 if we could hit the button-- 02:21:20.401 --> 02:21:21.041 there we go. 02:21:21.041 --> 02:21:26.221 So here we have the notion of a byte we're going to create for ourselves. 02:21:26.221 --> 02:21:29.101 We'll see a data type called byte, which is a common convention. 02:21:29.101 --> 02:21:30.341 This gives me three bytes. 02:21:30.341 --> 02:21:32.674 And you're going to learn about a function called fread, 02:21:32.674 --> 02:21:36.571 which reads from a file some number of bytes-- for instance, three bytes. 02:21:36.571 --> 02:21:38.341 We might then use code like this. 02:21:38.341 --> 02:21:42.001 If bytes bracket zero equals equals 0xFF and bytes 02:21:42.001 --> 02:21:47.761 bracket 1 equals 0xD8 and bytes bracket 2 equals 0xFF, all three of those 02:21:47.761 --> 02:21:52.481 bytes I just claimed represent a JPEG, you'll see an output like this. 02:21:52.481 --> 02:21:55.811 Let me go ahead and run this program as follows. 02:21:55.811 --> 02:21:59.921 Let me copy jpeg.c into my directory from today's distribution. 02:21:59.921 --> 02:22:08.071 Let me do make jpeg, and let me run jpeg on a file which is available online 02:22:08.071 --> 02:22:11.841 called lecture.jpeg, and I claim yes, it's possibly a JPEG. 02:22:11.841 --> 02:22:12.841 Well, what is that file? 
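The three-byte check can be sketched as a function that takes a `FILE *` already opened with fopen for reading. `BYTE` follows the convention mentioned in the lecture, and the function name `maybe_jpeg` is hypothetical. As in the demo, a match only means the file is possibly a JPEG.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;  // one byte, 0x00 through 0xff

// Read the first three bytes of `file` and compare them against the
// 0xff 0xd8 0xff signature. A match only suggests a JPEG.
bool maybe_jpeg(FILE *file)
{
    BYTE bytes[3];
    if (fread(bytes, sizeof(BYTE), 3, file) != 3)
    {
        return false;  // fewer than three bytes: cannot be a JPEG
    }
    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}
```

Because fread returns how many items it actually read, checking that value keeps the function from comparing uninitialized bytes when the file is very short.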
02:22:12.841 --> 02:22:16.481 Let me open it up for us, called lecture.jpeg, and here, for instance, 02:22:16.481 --> 02:22:20.581 is that same photo with which we began class, namely implemented as a JPEG. 02:22:20.581 --> 02:22:22.711 But what we're also going to do this week 02:22:22.711 --> 02:22:27.631 is start to implement our own sort of filters a la Instagram, whereby 02:22:27.631 --> 02:22:30.901 we might take images and actually run them through a program that 02:22:30.901 --> 02:22:32.919 creates different versions thereof. 02:22:32.919 --> 02:22:34.711 For instance, using a different file format 02:22:34.711 --> 02:22:38.501 called BMP, which essentially lays out all of its pixels from left to right, 02:22:38.501 --> 02:22:39.901 top to bottom, in a grid. 02:22:39.901 --> 02:22:41.461 You're going to see a struct-- 02:22:41.461 --> 02:22:43.501 a data struct in C that's way more complicated 02:22:43.501 --> 02:22:45.631 than the candidate structure from the past, 02:22:45.631 --> 02:22:47.866 or the person structure from the past, that 02:22:47.866 --> 02:22:50.491 looks like this, which is just a whole bunch more values in it, 02:22:50.491 --> 02:22:52.408 but we'll walk you through these in the p-set. 02:22:52.408 --> 02:22:54.421 And we might take a photograph like this and ask 02:22:54.421 --> 02:22:56.881 you to run a few different filters on it a la Instagram, 02:22:56.881 --> 02:23:00.511 like a black and white filter, or grayscale, a sepia filter 02:23:00.511 --> 02:23:04.531 to give it some old school feel, or a reflection like this to invert it, 02:23:04.531 --> 02:23:07.121 or blur it, even in this way. 02:23:07.121 --> 02:23:10.111 And just to end on a note here, I have a version 02:23:10.111 --> 02:23:13.621 of this code ready to go that doesn't implement all of those filters, 02:23:13.621 --> 02:23:16.351 it just implements one filter initially. 02:23:16.351 --> 02:23:19.051 Let me go ahead and just ready this on my computer here. 
02:23:19.051 --> 02:23:21.106 I'm going to go into my own version of filter 02:23:21.106 --> 02:23:22.981 and you'll see a few files that will give you 02:23:22.981 --> 02:23:26.621 a tour of this coming week in bitmap.h, for instance, 02:23:26.621 --> 02:23:31.511 is a version of this structure that I claimed existed a moment ago. 02:23:31.511 --> 02:23:39.361 And let me show you this file here, helpers.c, in which there is a function 02:23:39.361 --> 02:23:43.051 called filter that I've already implemented in advance today. 02:23:43.051 --> 02:23:46.111 But the ones we give you for the piece that won't already be implemented, 02:23:46.111 --> 02:23:48.486 this function called filter takes the height of an image, 02:23:48.486 --> 02:23:51.581 the width of an image, and a two dimensional array. 02:23:51.581 --> 02:23:54.571 So rows and columns of pixels, and then I 02:23:54.571 --> 02:23:58.411 have a loop like this that iterates over all of the pixels in an image from top 02:23:58.411 --> 02:24:00.041 to bottom, left to right. 02:24:00.041 --> 02:24:02.011 And then notice what I'm going to do here. 02:24:02.011 --> 02:24:05.191 I'm going to change the blue value to be zero in this case, 02:24:05.191 --> 02:24:07.601 and the green value to be zero in this case. 02:24:07.601 --> 02:24:08.341 But why? 02:24:08.341 --> 02:24:12.091 Well, the image I have here in mind is this one, 02:24:12.091 --> 02:24:14.881 whereby we have this hidden image that simply 02:24:14.881 --> 02:24:18.151 has old school style-- a secret message embedded in it. 
02:24:18.151 --> 02:24:21.361 And if you don't happen to have in your dorm one of these secret decoder 02:24:21.361 --> 02:24:23.581 glasses that essentially make everything red-- 02:24:23.581 --> 02:24:26.456 getting rid of the green in the world and the blue in the world-- 02:24:26.456 --> 02:24:28.831 you can actually-- I'm actually probably the only one who 02:24:28.831 --> 02:24:31.111 can read this right now-- see what message 02:24:31.111 --> 02:24:33.391 is hidden behind all of this red noise. 02:24:33.391 --> 02:24:39.121 But if using my code written here in helpers.c I get rid of all the blue 02:24:39.121 --> 02:24:41.821 in the picture and I get rid of all the green in the picture, 02:24:41.821 --> 02:24:44.431 essentially implementing the idea of this filter-- 02:24:44.431 --> 02:24:47.251 this red filter where you only see red-- 02:24:47.251 --> 02:24:50.501 well, let's go ahead and compile this program. 02:24:50.501 --> 02:24:55.471 Make filter, run ./filter on this hidden message.bmp. 02:24:55.471 --> 02:24:58.531 I'm going to save it in a new file called message.bmp, 02:24:58.531 --> 02:25:01.471 and with one final flourish we're going to open up 02:25:01.471 --> 02:25:05.371 message.bmp, which is the result of having put on these glasses, 02:25:05.371 --> 02:25:08.521 and hopefully now you too will see what I see. 02:25:17.531 --> 02:25:18.931 All right, that's it for CS50! 02:25:18.931 --> 02:25:19.931 We'll see you next time. 02:25:21.731 --> 02:25:25.681 [MUSIC PLAYING]