1 00:00:00,000 --> 00:01:17,581 [MUSIC PLAYING] 2 00:01:17,581 --> 00:01:18,631 3 00:01:18,631 --> 00:01:22,651 DAVID J. MALAN: Well, this is CS50, and already this is week four, 4 00:01:22,651 --> 00:01:24,631 and recall that last week, week three, we 5 00:01:24,631 --> 00:01:27,571 began to explore the inside of a computer's memory a bit more. 6 00:01:27,571 --> 00:01:30,631 We talked about arrays, which were just chunks of memory 7 00:01:30,631 --> 00:01:33,451 back to back to back that really lay things out left to right, top 8 00:01:33,451 --> 00:01:36,721 to bottom, and this is actually a pretty common paradigm, even if you're 9 00:01:36,721 --> 00:01:38,761 new to programming, and certainly new to C. 10 00:01:38,761 --> 00:01:43,771 You've seen this approach of just using memory in some way to lay things out, 11 00:01:43,771 --> 00:01:45,161 like images, for instance. 12 00:01:45,161 --> 00:01:50,371 So for instance, here is a photo taken of last week's front row, for instance, 13 00:01:50,371 --> 00:01:53,791 and this is an opportunity to explore exactly what happens 14 00:01:53,791 --> 00:01:56,911 if we start to zoom in and zoom in and zoom in, because it seems like most 15 00:01:56,911 --> 00:02:00,661 any TV show like CSI, or whatever, or any movie that 16 00:02:00,661 --> 00:02:06,601 explores forensic information might have the investigators zoom in 17 00:02:06,601 --> 00:02:09,994 on an image like this to see what the glint in someone's eye 18 00:02:09,994 --> 00:02:12,661 is because that reveals the license plate number of someone that 19 00:02:12,661 --> 00:02:13,556 just drove past. 20 00:02:13,556 --> 00:02:15,431 Something that's a little over the top there, 21 00:02:15,431 --> 00:02:18,661 but there's an opportunity here to speak to why that is so unrealistic. 22 00:02:18,661 --> 00:02:21,661 For instance, let's zoom on this puppet here's eye and let's 23 00:02:21,661 --> 00:02:23,971 zoom in a little more to see what might be reflected. 
24 00:02:23,971 --> 00:02:26,581 Let's zoom in a little more, and that's it. 25 00:02:26,581 --> 00:02:29,051 There's only a finite amount of information 26 00:02:29,051 --> 00:02:31,171 if you have an image represented in this way. 27 00:02:31,171 --> 00:02:34,321 We're using pixels-- these dots on the screen as rows and columns-- 28 00:02:34,321 --> 00:02:36,781 because if you're only using a finite amount of memory 29 00:02:36,781 --> 00:02:40,111 then at the end of the day, you can only store a finite amount of information. 30 00:02:40,111 --> 00:02:43,921 At least I don't really see in this grid here any glint of a license plate 31 00:02:43,921 --> 00:02:46,651 or something like that that you might otherwise see in Hollywood. 32 00:02:46,651 --> 00:02:49,681 So today we'll explore these kinds of representations 33 00:02:49,681 --> 00:02:52,501 of how you might use memory in new and interesting ways 34 00:02:52,501 --> 00:02:55,861 to represent now very familiar things, but also 35 00:02:55,861 --> 00:02:59,071 start to explore what some of the limitations are of this representation. 36 00:02:59,071 --> 00:03:02,851 But consider after all that this doesn't need to be even as high resolution, 37 00:03:02,851 --> 00:03:05,161 as many pixels as something like this other image; 38 00:03:05,161 --> 00:03:09,131 you can imagine just doing something silly with Post-It notes, like this. 39 00:03:09,131 --> 00:03:11,821 And if you think of an image as just having rows and columns, 40 00:03:11,821 --> 00:03:14,131 these rows otherwise known as scan lines-- something 41 00:03:14,131 --> 00:03:17,701 we'll explore in the coming week-- you could make this fun smiley face 42 00:03:17,701 --> 00:03:22,111 by just using two different values, maybe a zero and a one. 43 00:03:22,111 --> 00:03:26,141 Or yellow and purple, or vice versa, just to make something come to life. 
44 00:03:26,141 --> 00:03:30,331 Now in practice, recall we talked about storing not just a zero or one, 45 00:03:30,331 --> 00:03:37,414 but maybe an R, a G, and a B value-- like 24 bits, or three bytes in total-- 46 00:03:37,414 --> 00:03:38,581 but we'll come back to that. 47 00:03:38,581 --> 00:03:40,289 That would just be a more involved image. 48 00:03:40,289 --> 00:03:46,111 But for fun, if today you want to tackle something passively in the background, 49 00:03:46,111 --> 00:03:49,531 if you go to this URL here, we've put together an opportunity 50 00:03:49,531 --> 00:03:52,201 to do a bit of pixel art. 51 00:03:52,201 --> 00:03:55,801 If you go to this URL here, that'll redirect you to a Google Spreadsheet. 52 00:03:55,801 --> 00:03:58,141 If you have a laptop with you today that'll 53 00:03:58,141 --> 00:04:01,541 look a little something like this, which we've organized in rows and columns. 54 00:04:01,541 --> 00:04:05,881 So if you'd like to go ahead and use Google Spreadsheet's colorization 55 00:04:05,881 --> 00:04:09,331 feature to color in those individual squares if you'd like, 56 00:04:09,331 --> 00:04:12,751 see if you can't make something a little creative and then email it to Carter 57 00:04:12,751 --> 00:04:16,841 and we'll exhibit some of the best or favorites on the website thereafter. 58 00:04:16,841 --> 00:04:20,064 So let's transition then to something a little more familiar-- images. 59 00:04:20,064 --> 00:04:22,231 And not all of you have used, presumably, Photoshop, 60 00:04:22,231 --> 00:04:25,481 but you're probably generally familiar with Photoshop as a program for editing 61 00:04:25,481 --> 00:04:27,701 and creating images or photos or the like. 
62 00:04:27,701 --> 00:04:30,631 And here is a screenshot of Photoshop's color picker, 63 00:04:30,631 --> 00:04:32,618 via which you can change what color you're 64 00:04:32,618 --> 00:04:34,951 going to draw with the paint brush, or what color you're 65 00:04:34,951 --> 00:04:36,931 going to fill in with the paint bucket. 66 00:04:36,931 --> 00:04:39,031 It's representative of any kind of graphical tool. 67 00:04:39,031 --> 00:04:41,441 And there's a lot of information in here, 68 00:04:41,441 --> 00:04:43,921 but there's perhaps some familiar terms now-- 69 00:04:43,921 --> 00:04:47,791 R, G, and B. In fact, right now this is Photoshop's way 70 00:04:47,791 --> 00:04:50,491 of saying you're about to fill in your background or foreground 71 00:04:50,491 --> 00:04:52,681 with the color black, and that appears to be 72 00:04:52,681 --> 00:04:56,131 represented with an R, a G, and a B value of zero, zero, zero. 73 00:04:56,131 --> 00:05:01,981 Or alternatively, using a hash symbol and then 000000. 74 00:05:01,981 --> 00:05:04,441 And if some of you have already made web pages before 75 00:05:04,441 --> 00:05:06,331 and you know a little bit of HTML and CSS, 76 00:05:06,331 --> 00:05:08,671 you probably are familiar with this kind of syntax-- 77 00:05:08,671 --> 00:05:12,531 a hash symbol and then six, or sometimes three digits thereafter. 78 00:05:12,531 --> 00:05:15,031 And if we look at a few different colors here, for instance, 79 00:05:15,031 --> 00:05:17,131 here might be the representation of white. 80 00:05:17,131 --> 00:05:23,311 Now the R, the G, and the B values went way up from 0 to 255, 255, 255. 81 00:05:23,311 --> 00:05:28,111 Or alternatively, it looks like Photoshop, and in turn web browsers, 82 00:05:28,111 --> 00:05:31,589 could represent that same color white with FFFFFF. 83 00:05:31,589 --> 00:05:32,881 And let's just do a few others. 
84 00:05:32,881 --> 00:05:37,621 Here is red, and it turns out that red is a whole lot of red, 255, 85 00:05:37,621 --> 00:05:39,181 but no green, no blue. 86 00:05:39,181 --> 00:05:40,326 Or, a.k.a. 87 00:05:40,326 --> 00:05:42,549 FF0000. 88 00:05:42,549 --> 00:05:44,341 So there's perhaps a pattern here emerging. 89 00:05:44,341 --> 00:05:48,421 Here is green, zero, 255, zero, a.k.a. 90 00:05:48,421 --> 00:05:52,661 00FF00, or lastly, here blue, which is no red, 91 00:05:52,661 --> 00:05:56,371 no green but apparently a lot of blue, 255 again, a.k.a. 92 00:05:56,371 --> 00:05:58,471 0000FF. 93 00:05:58,471 --> 00:06:01,861 Now some of you, again, might have seen this notation before, 94 00:06:01,861 --> 00:06:05,071 these zeros and these F's and all of the numbers and letters in between, 95 00:06:05,071 --> 00:06:06,844 but this is another form of notation. 96 00:06:06,844 --> 00:06:08,761 And in fact, we'll explore this today-- really 97 00:06:08,761 --> 00:06:11,491 is just a precondition for talking about some other concepts. 98 00:06:11,491 --> 00:06:14,641 But the ideas, ultimately, are really no different. 99 00:06:14,641 --> 00:06:17,821 What we're about to see is a different base system-- 100 00:06:17,821 --> 00:06:19,951 not just binary, not just decimal, but something 101 00:06:19,951 --> 00:06:21,871 we're about to call hexadecimal. 102 00:06:21,871 --> 00:06:25,831 But first, recall that with RGB we previously did the following. 103 00:06:25,831 --> 00:06:28,231 Any RGB value-- red, green, blue-- just combine 104 00:06:28,231 --> 00:06:30,761 some amount of red or green or blue. 105 00:06:30,761 --> 00:06:35,341 So here we have 72, 73, 33, which in the context of an email or text, of course, 106 00:06:35,341 --> 00:06:36,901 said what-- 107 00:06:36,901 --> 00:06:38,401 a couple of weeks back? 
108 00:06:38,401 --> 00:06:40,891 Just hi with an exclamation point, but in the context 109 00:06:40,891 --> 00:06:45,121 of a Photoshop-like program, this might instead be representing, 110 00:06:45,121 --> 00:06:47,558 collectively, this shade of yellow, for instance, 111 00:06:47,558 --> 00:06:50,141 when you combine that much red that much green that much blue. 112 00:06:50,141 --> 00:06:51,451 So here is the same idea. 113 00:06:51,451 --> 00:06:53,701 If you've got a lot of red, no green, no blue, 114 00:06:53,701 --> 00:06:55,291 together that's going to give us red. 115 00:06:55,291 --> 00:06:58,081 If you've got no red, a lot of green, no blue, 116 00:06:58,081 --> 00:06:59,851 that's going to give us, of course, green. 117 00:06:59,851 --> 00:07:03,169 If you've got no red, no green, a lot of blue, that of course, 118 00:07:03,169 --> 00:07:04,211 is going to give us blue. 119 00:07:04,211 --> 00:07:08,401 So there's a pattern emerging here where apparently 00 is none, as always, 120 00:07:08,401 --> 00:07:10,591 and FF is apparently a lot. 121 00:07:10,591 --> 00:07:17,281 And it's maybe somehow equated with 255, at least per that Photoshop screenshot. 122 00:07:17,281 --> 00:07:20,551 Meanwhile, if we combine one last one, a lot of red, a lot of green, 123 00:07:20,551 --> 00:07:21,631 a lot of blue-- 124 00:07:21,631 --> 00:07:25,359 that's actually going to give us a single white pixel like this. 125 00:07:25,359 --> 00:07:26,401 All right, so think back. 126 00:07:26,401 --> 00:07:30,119 Here was binary-- in the world of binary you had just two digits, zero and one. 127 00:07:30,119 --> 00:07:31,411 Could have been anything else-- 128 00:07:31,411 --> 00:07:36,541 A or B, X or Y, but the world standardized on these numerals 129 00:07:36,541 --> 00:07:37,381 zero and one. 130 00:07:37,381 --> 00:07:40,591 In our world's decimal system, of course, you have zero through nine. 
131 00:07:40,591 --> 00:07:44,101 As of today though, we're going to start using hexadecimal sometimes 132 00:07:44,101 --> 00:07:47,986 in the context of images and also files just because it's a convention 133 00:07:47,986 --> 00:07:49,834 and there's some conveniences to it. 134 00:07:49,834 --> 00:07:51,751 Where now, you're going to be able to count up 135 00:07:51,751 --> 00:07:54,601 to F in a notation called hexadecimal. 136 00:07:54,601 --> 00:07:59,671 From zero through nine, then you keep going to A to B to C to D to E to F, 137 00:07:59,671 --> 00:08:02,641 the idea being each of these, even though it's weirdly 138 00:08:02,641 --> 00:08:06,781 a letter of the English alphabet, it's still just a single symbol. 139 00:08:06,781 --> 00:08:12,241 It's not one zero for 10, or 1 1 for eleven-- all 16 of these values, 140 00:08:12,241 --> 00:08:15,601 these digits, so to speak, are indeed still just single symbols, 141 00:08:15,601 --> 00:08:19,211 and that's a characteristic of just using this other notational system. 142 00:08:19,211 --> 00:08:24,751 So how do we get from 00 and FF to something like 0 and 255, respectively? 143 00:08:24,751 --> 00:08:26,761 Well, this hexadecimal system, a.k.a. 144 00:08:26,761 --> 00:08:30,186 Base 16, just does the math from week zero and really, 145 00:08:30,186 --> 00:08:31,811 grade school, a little bit differently. 146 00:08:31,811 --> 00:08:34,981 For instance, if you have a number that's got two digits, 147 00:08:34,981 --> 00:08:38,921 or hexadecimal digits as of today, the columns are just a little different. 148 00:08:38,921 --> 00:08:42,511 Instead of powers of two or powers of 10, which we saw for binary and decimal 149 00:08:42,511 --> 00:08:45,271 respectively, it's powers of 16. 150 00:08:45,271 --> 00:08:48,001 So if we just do the math out, that's the ones column, 151 00:08:48,001 --> 00:08:50,731 this is the 16s column, and so forth. 
152 00:08:50,731 --> 00:08:53,741 Things get actually pretty big pretty quickly in this system. 153 00:08:53,741 --> 00:08:56,746 But now let's just consider how we would represent familiar numbers. 154 00:08:56,746 --> 00:08:59,371 If you've got two hexadecimal digits for which these hashes are 155 00:08:59,371 --> 00:09:02,431 just placeholders, zero, zero is going to mathematically 156 00:09:02,431 --> 00:09:04,931 equal the decimal number you and I know, of course, as zero. 157 00:09:04,931 --> 00:09:05,431 Why? 158 00:09:05,431 --> 00:09:06,721 Same thing as week zero-- 159 00:09:06,721 --> 00:09:11,041 16 times zero plus one times zero is the number you and I know as zero. 160 00:09:11,041 --> 00:09:12,521 And we can count up from here. 161 00:09:12,521 --> 00:09:15,031 This, in hexadecimal, would be how a computer 162 00:09:15,031 --> 00:09:16,831 represents the number we know as one. 163 00:09:16,831 --> 00:09:18,821 It would be zero one in this case. 164 00:09:18,821 --> 00:09:24,181 This would be two, three, four, five, six, seven, eight, nine-- 165 00:09:24,181 --> 00:09:26,141 in decimal, we're about to go to 10. 166 00:09:26,141 --> 00:09:29,211 But in hexadecimal, to be clear, what comes next? 167 00:09:29,211 --> 00:09:38,021 So, apparently A, so 0A, 0B, which is now 10, or 11, or 12, 13, 14, 15. 168 00:09:38,021 --> 00:09:41,111 So using hexadecimal is just an interesting way 169 00:09:41,111 --> 00:09:44,951 of using single symbols now, zero through F, 170 00:09:44,951 --> 00:09:47,901 to count from zero through 15. 171 00:09:47,901 --> 00:09:50,651 And we'll see why it's 15 in a moment, but as soon as we get to F, 172 00:09:50,651 --> 00:09:54,821 anyone want to conjecture how in hexadecimal, a.k.a. hex, 173 00:09:54,821 --> 00:09:57,731 do we now count up one position higher? 174 00:09:57,731 --> 00:10:01,431 What comes after 0F in hexadecimal? 
175 00:10:01,431 --> 00:10:03,701 So, one zero-- it's the same kind of thing-- 176 00:10:03,701 --> 00:10:05,866 once you're at the highest digit possible, F-- 177 00:10:05,866 --> 00:10:07,991 or in our decimal world that would have been nine-- 178 00:10:07,991 --> 00:10:11,111 you add one more, nine wraps around to zero, or in this case, 179 00:10:11,111 --> 00:10:12,821 F wraps around to zero. 180 00:10:12,821 --> 00:10:15,791 You carry the one and voila-- now we're representing 181 00:10:15,791 --> 00:10:17,511 the number you and I know as 16. 182 00:10:17,511 --> 00:10:19,451 And we could keep going forever, literally. 183 00:10:19,451 --> 00:10:23,186 This could be 17, 18, 19, 20, in decimal-- 184 00:10:23,186 --> 00:10:25,061 but let's just wave our hands at it and count 185 00:10:25,061 --> 00:10:27,821 as high as we can-- dot, dot, dot-- the highest 186 00:10:27,821 --> 00:10:31,181 we could count in hexadecimal with two digits, just logically, 187 00:10:31,181 --> 00:10:32,981 would be what, in hexadecimal? 188 00:10:32,981 --> 00:10:35,091 Something, something. 189 00:10:35,091 --> 00:10:35,951 FF, I heard. 190 00:10:35,951 --> 00:10:39,531 So yes, that's the biggest digit possible, so FF is what we have. 191 00:10:39,531 --> 00:10:43,163 So how high can you count in hexadecimal if you've got just two of these digits? 192 00:10:43,163 --> 00:10:44,621 Well, it's the same math as always. 193 00:10:44,621 --> 00:10:46,571 16 times F, a.k.a. 194 00:10:46,571 --> 00:10:52,941 15, so that's 16 times 15 plus one times F, or one times 15-- 195 00:10:52,941 --> 00:10:57,341 that gives us 240 plus 15 in decimal, the result of which, of course, now 196 00:10:57,341 --> 00:10:59,421 is 255. 
197 00:10:59,421 --> 00:11:02,511 So this hexadecimal system-- you may have seen in the world of web pages, 198 00:11:02,511 --> 00:11:05,261 and if you haven't we'll get to that in this class in a few weeks, 199 00:11:05,261 --> 00:11:07,991 or we just saw in the context of Photoshop-- just 200 00:11:07,991 --> 00:11:14,141 has this shorthand notation of counting as high as 255 but just calling it FF. 201 00:11:14,141 --> 00:11:17,771 Now it's marginal, but that's like 50% savings of how many digits 202 00:11:17,771 --> 00:11:21,491 you need in order to count as high as 255 because in decimal, of course, 203 00:11:21,491 --> 00:11:23,321 255 is three digits. 204 00:11:23,321 --> 00:11:27,131 In hexadecimal you can count as high using just two, 205 00:11:27,131 --> 00:11:30,489 and that difference is going to get magnified the bigger our numbers get. 206 00:11:30,489 --> 00:11:33,281 Let me stipulate for now, you're going to get more and more savings 207 00:11:33,281 --> 00:11:36,431 in terms of just how many symbols you need on the screen to represent 208 00:11:36,431 --> 00:11:39,881 bigger and bigger numbers than that. 209 00:11:39,881 --> 00:11:43,301 All right, let me pause here just to see if there's any questions thus far 210 00:11:43,301 --> 00:11:46,721 on what we've called hexadecimal, which again, just gives us zero through nine 211 00:11:46,721 --> 00:11:53,408 as well as A through F. Any questions or confusion? 212 00:11:53,408 --> 00:11:55,991 And if it feels like we're lingering a bit much on arithmetic, 213 00:11:55,991 --> 00:11:59,331 we're not really going to see other notations besides this moving forward. 214 00:11:59,331 --> 00:12:03,461 These are the go-to three in a programmer's world, typically. 215 00:12:03,461 --> 00:12:04,671 But there are some others. 216 00:12:04,671 --> 00:12:06,240 Yeah. 217 00:12:06,240 --> 00:12:08,532 AUDIENCE: Does the hexadecimal symbol take more storage 218 00:12:08,532 --> 00:12:11,251 than the decimal system? 
219 00:12:11,251 --> 00:12:12,501 DAVID J. MALAN: Good question. 220 00:12:12,501 --> 00:12:16,611 Does hexadecimal require more storage or less storage than the decimal system? 221 00:12:16,611 --> 00:12:20,841 Theoretically no, because this is just a way of representing information 222 00:12:20,841 --> 00:12:23,721 and we'll see in a concrete example in a moment. 223 00:12:23,721 --> 00:12:27,111 But inside of the computer, at the end of the day, you're still storing bits. 224 00:12:27,111 --> 00:12:30,228 And using hexadecimal is not using more or fewer bits, 225 00:12:30,228 --> 00:12:32,061 think of this as how you might write it down 226 00:12:32,061 --> 00:12:34,971 on a piece of paper, just how many digits you're going to write 227 00:12:34,971 --> 00:12:37,941 or on a computer screen, how many digits you're going to see at once, 228 00:12:37,941 --> 00:12:41,211 but it doesn't change how the computer is representing information 229 00:12:41,211 --> 00:12:44,331 because all they're representing at the end of the day is zeros and ones. 230 00:12:44,331 --> 00:12:45,621 So in fact, let's go there. 231 00:12:45,621 --> 00:12:49,851 If this-- a moment ago FF I claimed was 255-- 232 00:12:49,851 --> 00:12:51,891 let's just rewind to week zero and if we wanted 233 00:12:51,891 --> 00:12:56,391 to count to 255 in binary, that's as high as you can count, recall, 234 00:12:56,391 --> 00:12:57,411 with eight bits. 235 00:12:57,411 --> 00:12:59,244 And there's only a few of these numbers that 236 00:12:59,244 --> 00:13:03,081 are useful to memorize, like 255 is as high as you can count with eight bits 237 00:13:03,081 --> 00:13:06,981 if you start at zero, because two to the eighth is 256, but if you start at zero 238 00:13:06,981 --> 00:13:09,471 it's zero through 255. 
239 00:13:09,471 --> 00:13:13,671 So in binary, recall if you have eight bits, all of which were ones, 240 00:13:13,671 --> 00:13:15,991 and I won't do out the math pedantically here, 241 00:13:15,991 --> 00:13:18,366 but if I do do this plus this plus this, dot, dot, 242 00:13:18,366 --> 00:13:21,391 dot-- that's also going to give me 255. 243 00:13:21,391 --> 00:13:24,441 So this is what's interesting here about hexadecimal. 244 00:13:24,441 --> 00:13:28,851 It turns out that an upside of storing values in hexadecimal 245 00:13:28,851 --> 00:13:32,571 is that we're going to see the first F represents 246 00:13:32,571 --> 00:13:35,901 the left half of all these bits, and the second F in this case 247 00:13:35,901 --> 00:13:38,431 represents the rightmost four of these bits. 248 00:13:38,431 --> 00:13:41,061 So it turns out hexadecimal is very useful when you 249 00:13:41,061 --> 00:13:44,031 want to treat data in units of four. 250 00:13:44,031 --> 00:13:47,181 It's not quite eight, but units of four, and that's not bad. 251 00:13:47,181 --> 00:13:50,271 Which is why-- if you use two digits like I have thus far, 252 00:13:50,271 --> 00:13:53,061 00 or FF or anything in between-- 253 00:13:53,061 --> 00:13:57,921 that's actually a convenient way of representing eight bits in total. 254 00:13:57,921 --> 00:14:02,091 One hex digit for the first four bits, one hex digit for the second. 255 00:14:02,091 --> 00:14:04,791 And again, there's nothing new intellectually here per se, 256 00:14:04,791 --> 00:14:08,571 it's just a different way of representing the same story as before-- 257 00:14:08,571 --> 00:14:09,651 zeros and ones. 258 00:14:09,651 --> 00:14:11,491 So in what context do we see this? 259 00:14:11,491 --> 00:14:12,831 Well, we talked about memory last week, and we're 260 00:14:12,831 --> 00:14:14,414 going to talk more about it this week. 
261 00:14:14,414 --> 00:14:16,941 If this is my computer's RAM-- random access memory-- 262 00:14:16,941 --> 00:14:21,111 you can again think of each byte as having a number associated with it-- 263 00:14:21,111 --> 00:14:22,671 its address or location. 264 00:14:22,671 --> 00:14:26,991 This might be zero, this might be 2 billion, and so in the past 265 00:14:26,991 --> 00:14:29,781 I've described these as just this, using decimal numbers. 266 00:14:29,781 --> 00:14:34,131 Here's byte zero, one, two, three, four, five, six, seven, 15, 16 267 00:14:34,131 --> 00:14:35,581 would be here, and so forth. 268 00:14:35,581 --> 00:14:40,071 But it turns out in the world of memory, and thus today, programming, people 269 00:14:40,071 --> 00:14:44,691 tend to count memory bytes using hexadecimal. 270 00:14:44,691 --> 00:14:46,881 Partly just by convention, but also partly 271 00:14:46,881 --> 00:14:49,581 because it's a little more succinct and again, each digit 272 00:14:49,581 --> 00:14:52,641 represents four bits, typically. 273 00:14:52,641 --> 00:14:54,396 So what comes after F here? 274 00:14:54,396 --> 00:14:56,271 Well, if I think about the computer's memory, 275 00:14:56,271 --> 00:15:01,311 I normally might do after F, which is 15, 16. 276 00:15:01,311 --> 00:15:05,931 But instead, one zero, one one, one two, one three-- this 277 00:15:05,931 --> 00:15:10,551 is not 10, 11, 12, 13, because I claim I'm in the context of hexadecimal now. 278 00:15:10,551 --> 00:15:12,621 As per the previous slide, we already started 279 00:15:12,621 --> 00:15:15,441 going into A's through F's, so you immediately 280 00:15:15,441 --> 00:15:18,111 see here a possible problem. 281 00:15:18,111 --> 00:15:21,081 Why is this now worrisome, if all of a sudden you're 282 00:15:21,081 --> 00:15:26,791 seeing seemingly familiar numbers like 10, 11, 12, 13? 
283 00:15:26,791 --> 00:15:28,928 We didn't really stumble across this problem 284 00:15:28,928 --> 00:15:30,511 when it was all zeros and ones before. 285 00:15:30,511 --> 00:15:31,614 Yeah. 286 00:15:31,614 --> 00:15:33,156 AUDIENCE: Try to do math [INAUDIBLE]. 287 00:15:35,284 --> 00:15:37,951 DAVID J. MALAN: Yeah, so if you're writing some code in C that's 288 00:15:37,951 --> 00:15:39,809 doing some math, you might accidentally-- 289 00:15:39,809 --> 00:15:42,601 or the computer might accidentally confuse hexadecimal with decimal 290 00:15:42,601 --> 00:15:45,161 if they look in some context the same. 291 00:15:45,161 --> 00:15:47,251 Any number on the board that doesn't have a letter 292 00:15:47,251 --> 00:15:51,041 is ambiguously hexadecimal or decimal at this point, 293 00:15:51,041 --> 00:15:52,751 and so how might we resolve this? 294 00:15:52,751 --> 00:15:55,711 Well, it turns out that what computers typically do is this. 295 00:15:55,711 --> 00:16:00,481 By convention, any time you see 0x and then a number, 296 00:16:00,481 --> 00:16:02,911 that's a human convention of saying-- 297 00:16:02,911 --> 00:16:06,371 signaling to the reader that this is in fact a hexadecimal number. 298 00:16:06,371 --> 00:16:10,441 So if it's 0x10, that is not the number 10, 299 00:16:10,441 --> 00:16:15,611 that is the hexadecimal number one zero, which recall we said earlier, 300 00:16:15,611 --> 00:16:18,631 is how you count up to 16. 301 00:16:18,631 --> 00:16:21,151 And again, these are not the kinds of things to memorize, 302 00:16:21,151 --> 00:16:24,561 it's really just the system for how you think about these things. 303 00:16:24,561 --> 00:16:27,061 So henceforth today, we're going to start seeing hexadecimal 304 00:16:27,061 --> 00:16:28,471 in a bunch of contexts. 
305 00:16:28,471 --> 00:16:31,501 When you write code, you might even write code using some hexadecimal 306 00:16:31,501 --> 00:16:34,001 but again, it's just a different way of representing numbers 307 00:16:34,001 --> 00:16:37,261 and humans have different conventions for different contexts. 308 00:16:37,261 --> 00:16:40,771 All right, so with that said, any questions now on this building block? 309 00:16:40,771 --> 00:16:46,321 From here on out, we'll start using it in some actual code. 310 00:16:46,321 --> 00:16:48,011 Any questions? 311 00:16:48,011 --> 00:16:49,581 Nothing so far? 312 00:16:49,581 --> 00:16:50,081 All right. 313 00:16:50,081 --> 00:16:53,821 So, let's go ahead and consider maybe a familiar example. 314 00:16:53,821 --> 00:16:57,571 Something involving code, where I initialize a variable like n 315 00:16:57,571 --> 00:16:59,389 to a value like 50, in this case. 316 00:16:59,389 --> 00:17:01,681 And then let's start to tinker around with what's going 317 00:17:01,681 --> 00:17:03,391 on inside of the computer's memory. 318 00:17:03,391 --> 00:17:06,191 In a moment I'm going to load up VS Code on my computer 319 00:17:06,191 --> 00:17:09,511 and I'm going to go ahead and whip up a program that very simply assigns 320 00:17:09,511 --> 00:17:13,231 a value like the number 50 to a variable called n, 321 00:17:13,231 --> 00:17:19,036 but today, keep in mind that that variable n and that value 50 322 00:17:19,036 --> 00:17:21,404 is going to be stored somewhere in my computer's memory, 323 00:17:21,404 --> 00:17:24,571 and it turns out today we'll introduce a bit more syntax so you can actually 324 00:17:24,571 --> 00:17:27,011 see where things are being stored. 325 00:17:27,011 --> 00:17:28,711 So let me click over to VS Code here. 
326 00:17:28,711 --> 00:17:31,681 I'm going to create a program called address.c just 327 00:17:31,681 --> 00:17:34,171 to explore computer's addresses today, and I'm 328 00:17:34,171 --> 00:17:38,701 going to do an include stdio.h, int main(void), as usual. 329 00:17:38,701 --> 00:17:40,441 No command line arguments for now. 330 00:17:40,441 --> 00:17:43,043 I'm going to declare that variable n equals 50, 331 00:17:43,043 --> 00:17:45,251 and then I'm just going to go ahead and print it out. 332 00:17:45,251 --> 00:17:50,731 So nothing very interesting but I'll use %i backslash n and then comma n 333 00:17:50,731 --> 00:17:52,321 to print out that value. 334 00:17:52,321 --> 00:17:55,311 Nothing here should be very interesting to compile or run, 335 00:17:55,311 --> 00:17:57,811 but I'll do it just to make sure I didn't make any mistakes. 336 00:17:57,811 --> 00:18:03,301 Looks like as expected, it simply prints out the number 50, like this. 337 00:18:03,301 --> 00:18:06,781 But let's consider then, what this code is doing underneath the hood 338 00:18:06,781 --> 00:18:09,521 when it's actually run on your machine. 339 00:18:09,521 --> 00:18:11,401 So here we have that grid of memory. 340 00:18:11,401 --> 00:18:15,451 That variable n is an int, and if you think back, 341 00:18:15,451 --> 00:18:19,051 how many bytes typically do we use for an int? 342 00:18:19,051 --> 00:18:20,131 Yeah. 343 00:18:20,131 --> 00:18:22,690 Four, so four bytes, or 32 bits. 344 00:18:22,690 --> 00:18:26,491 So if each of these squares represents one byte, then my computer, somewhere 345 00:18:26,491 --> 00:18:29,813 in my memory, or RAM, is using four of these squares. 346 00:18:29,813 --> 00:18:32,521 Maybe it ends up over here just because there's other stuff being 347 00:18:32,521 --> 00:18:33,731 used elsewhere, for instance. 
348 00:18:33,731 --> 00:18:35,481 Though I don't really know, and frankly, I 349 00:18:35,481 --> 00:18:38,273 don't really care where it ends up, just that it ends up somewhere. 350 00:18:38,273 --> 00:18:41,940 So the variable-- the value 50 is stored here in a variable called n. 351 00:18:41,940 --> 00:18:45,581 Even though I've written it as decimal, just like in my code-- 352 00:18:45,581 --> 00:18:50,184 let me again remind that this is 32 zeros and ones representing that 50-- 353 00:18:50,184 --> 00:18:53,351 it's just going to be very tedious if we start writing everything in binary, 354 00:18:53,351 --> 00:18:56,351 so I'll use the more comfortable human decimal system. 355 00:18:56,351 --> 00:18:59,141 So that's what's going on inside of the computer's memory. 356 00:18:59,141 --> 00:19:03,571 So what if I actually wanted to start tinkering with its location, 357 00:19:03,571 --> 00:19:06,091 or maybe just knowing its location? 358 00:19:06,091 --> 00:19:09,901 Well, this variable n indeed has a name, n-- 359 00:19:09,901 --> 00:19:13,763 that's a label of sorts for it-- but at the end of the day that 50 is 360 00:19:13,763 --> 00:19:16,471 technically at a specific address, and I'm going to make one up-- 361 00:19:16,471 --> 00:19:19,501 0x123, and it's 123 because I really don't 362 00:19:19,501 --> 00:19:22,421 care what it is, I just want an address for the sake of discussion. 363 00:19:22,421 --> 00:19:28,951 So way over here off screen might be byte zero, way down here is byte 0x123. 364 00:19:28,951 --> 00:19:32,861 It's in hexadecimal notation just by convention. 365 00:19:32,861 --> 00:19:36,691 So how can I actually see where my variables are ending up 366 00:19:36,691 --> 00:19:38,341 in memory if I'm curious to do so? 367 00:19:38,341 --> 00:19:41,821 Well, let me go back to my code here and let me actually 368 00:19:41,821 --> 00:19:44,081 change this just a little bit. 
369 00:19:44,081 --> 00:19:49,381 Let me go ahead and introduce, for instance, another symbol 370 00:19:49,381 --> 00:19:53,581 here and another topic altogether, namely pointers. 371 00:19:53,581 --> 00:19:59,111 So a pointer is a variable that stores the address of some value-- 372 00:19:59,111 --> 00:20:02,371 the location of some value or more specifically, 373 00:20:02,371 --> 00:20:05,681 the specific byte in which that value is stored. 374 00:20:05,681 --> 00:20:08,941 So again, if you think of your memory as being a whole bunch of bytes-- 375 00:20:08,941 --> 00:20:11,701 zero at top left, 2 billion or whatever at bottom right, 376 00:20:11,701 --> 00:20:13,201 depending on how much RAM you have-- 377 00:20:13,201 --> 00:20:15,481 each of those things has a location, or an address. 378 00:20:15,481 --> 00:20:19,571 A pointer is just a variable storing one such address. 379 00:20:19,571 --> 00:20:24,751 So it turns out that in the world of C, there's a couple of new symbols 380 00:20:24,751 --> 00:20:29,111 we can use if we want to see what it is we're talking about here, 381 00:20:29,111 --> 00:20:32,041 and those two operators, as of today, are these. 382 00:20:32,041 --> 00:20:35,831 You can use the ampersand operator in C in a couple of ways. 383 00:20:35,831 --> 00:20:38,761 We already saw it very briefly to do ampersand ampersand-- 384 00:20:38,761 --> 00:20:42,271 it kind of ANDs two Boolean expressions together 385 00:20:42,271 --> 00:20:43,811 in the context of a conditional. 386 00:20:43,811 --> 00:20:44,821 This is different. 387 00:20:44,821 --> 00:20:48,631 A single ampersand is the address-of operator. 388 00:20:48,631 --> 00:20:52,651 So literally, in your code, if you've got a variable like n or anything else 389 00:20:52,651 --> 00:20:57,901 and you write &n, C is going to figure out for you what is the address of that 390 00:20:57,901 --> 00:21:00,371 variable n in the computer's memory. 
391 00:21:00,371 --> 00:21:06,001 And it's going to give you a number, otherwise known as the address of that. 392 00:21:06,001 --> 00:21:09,781 If you want to store that address in a variable 393 00:21:09,781 --> 00:21:15,841 even though yes, it's a number like 0x123, you have to tell C in advance 394 00:21:15,841 --> 00:21:21,721 that you want to store not an int per se, but the address of an int. 395 00:21:21,721 --> 00:21:25,351 And the syntax for doing that-- somewhat nonobviously-- is 396 00:21:25,351 --> 00:21:29,071 to use an asterisk here, a star operator, and you 397 00:21:29,071 --> 00:21:30,871 say this when creating the variable. 398 00:21:30,871 --> 00:21:35,371 If you want p to be a pointer, that is the address of some other variable, 399 00:21:35,371 --> 00:21:37,051 you do int star p. 400 00:21:37,051 --> 00:21:41,191 And the star just tells the computer, this is not an integer per se, 401 00:21:41,191 --> 00:21:44,641 this is the address of something that yes, is an int, 402 00:21:44,641 --> 00:21:46,401 but we're just being more precise. 403 00:21:46,401 --> 00:21:49,301 So on the right hand side you have the address of operator. 404 00:21:49,301 --> 00:21:52,281 As always with the equal sign, you copy from right to left. 405 00:21:52,281 --> 00:21:56,231 Because &n is by definition the address of something you have to store it 406 00:21:56,231 --> 00:22:01,781 in a pointer, and the way to declare a pointer is to specify the type of value 407 00:22:01,781 --> 00:22:05,831 whose address you're storing, and then use the star to indicate that this is 408 00:22:05,831 --> 00:22:09,341 indeed a pointer and not just a regular old int. 409 00:22:09,341 --> 00:22:10,811 So let's see this in practice. 410 00:22:10,811 --> 00:22:13,871 Let me go back to my own source code here and let 411 00:22:13,871 --> 00:22:15,881 me make just a couple of tweaks. 
412 00:22:15,881 --> 00:22:18,221 I'm going to leave n alone here but I'm going 413 00:22:18,221 --> 00:22:22,761 to go ahead and initially just do this. 414 00:22:22,761 --> 00:22:27,341 Let me say int star p equals ampersand n, 415 00:22:27,341 --> 00:22:31,961 and then down here, I'm going to print out not n this time, but p-- 416 00:22:31,961 --> 00:22:33,401 the variable p. 417 00:22:33,401 --> 00:22:38,171 And then even though yes, it's just a number and therefore I could use %i 418 00:22:38,171 --> 00:22:42,311 for integers, there's actually a special format code in printf for printing 419 00:22:42,311 --> 00:22:45,521 pointers or addresses, and that's %p. 420 00:22:45,521 --> 00:22:48,821 So now let's go ahead and recompile this, make address-- 421 00:22:48,821 --> 00:22:53,871 so far so good-- ./address, Enter, and a little weirdly, 422 00:22:53,871 --> 00:22:58,511 but perhaps understandably now, the address in my computer's memory 423 00:22:58,511 --> 00:23:02,381 at which the variable n happened to be stored was not quite as simple 424 00:23:02,381 --> 00:23:03,881 as 0x123. 425 00:23:03,881 --> 00:23:06,431 This computer has a lot more memory so technically, 426 00:23:06,431 --> 00:23:12,491 it was stored at 0x7FFCB4578E5C. 427 00:23:12,491 --> 00:23:14,651 Now that has no special significance to me. 428 00:23:14,651 --> 00:23:16,881 It could have ended up somewhere else altogether, 429 00:23:16,881 --> 00:23:20,381 but this is just where, in my computer-- or technically the cloud 430 00:23:20,381 --> 00:23:22,901 server to which I'm connected using VS Code here-- 431 00:23:22,901 --> 00:23:25,498 that just happens to be where n ended up. 432 00:23:25,498 --> 00:23:28,331 And strictly speaking, I don't even need to introduce this variable. 433 00:23:28,331 --> 00:23:31,181 I could get rid of p and I could just say 434 00:23:31,181 --> 00:23:34,901 print not just n, but the address of n and achieve the same thing. 
435 00:23:34,901 --> 00:23:37,361 You don't need to temporarily store it in a variable. 436 00:23:37,361 --> 00:23:40,341 Let me just do make address again, ./address, 437 00:23:40,341 --> 00:23:42,921 and now I see this address here. 438 00:23:42,921 --> 00:23:46,466 And notice if I keep running the program, it's actually moving around. 439 00:23:46,466 --> 00:23:49,091 There's other stuff presumably going on inside of the computer. 440 00:23:49,091 --> 00:23:52,501 Maybe it's actually randomizing it so it's not always at the same location. 441 00:23:52,501 --> 00:23:55,001 That can actually be a security feature underneath the hood, 442 00:23:55,001 --> 00:24:00,521 but this happens to be at that moment in time where that value is in memory, 443 00:24:00,521 --> 00:24:03,491 quite like our picture a moment ago. 444 00:24:03,491 --> 00:24:06,641 All right, so let me pause here to see if there's now 445 00:24:06,641 --> 00:24:08,171 any questions on what we just did. 446 00:24:08,171 --> 00:24:10,171 Yeah? 447 00:24:10,171 --> 00:24:12,391 AUDIENCE: Is there any way to control where 448 00:24:12,391 --> 00:24:15,551 you are storing something in memory? 449 00:24:15,551 --> 00:24:18,746 Does it even matter if it works, or does it just 450 00:24:18,746 --> 00:24:21,271 matter that you could go in and locate where something is? 451 00:24:21,271 --> 00:24:22,813 DAVID J. MALAN: Really good question. 452 00:24:22,813 --> 00:24:25,381 Is there any way to control where something is in memory? 
453 00:24:25,381 --> 00:24:28,338 Short answer is yes, and this is both the power and the danger of C, 454 00:24:28,338 --> 00:24:31,171 and we're going to do this today and make a few deliberate mistakes, 455 00:24:31,171 --> 00:24:36,241 because with this power of going to or getting the address of any variable, 456 00:24:36,241 --> 00:24:38,341 I could just arbitrarily right now write code 457 00:24:38,341 --> 00:24:42,611 that stores a value at byte 2 billion, or zero, or anything in between. 458 00:24:42,611 --> 00:24:46,771 But that also means potentially, I could start creepily looking 459 00:24:46,771 --> 00:24:50,831 around at all of the computer's memory, even at things that I didn't put there. 460 00:24:50,831 --> 00:24:53,371 Maybe other programs, maybe other parts of programs 461 00:24:53,371 --> 00:24:55,621 and indeed, this is a potential security threat, 462 00:24:55,621 --> 00:24:57,984 if suddenly you're able to just look anywhere 463 00:24:57,984 --> 00:24:59,401 you want in the computer's memory. 464 00:24:59,401 --> 00:25:04,021 Now, I'm overselling it a little bit because nowadays, in this decade, 465 00:25:04,021 --> 00:25:06,571 there are some defenses in place in compilers 466 00:25:06,571 --> 00:25:09,941 and in our operating systems that do hedge against this a little bit. 467 00:25:09,941 --> 00:25:12,391 But this is still a very frequent source of problems, 468 00:25:12,391 --> 00:25:14,791 and later today we'll talk briefly about things 469 00:25:14,791 --> 00:25:17,651 called stack overflow, which is not just a website, 470 00:25:17,651 --> 00:25:19,831 it is a problem that you can encounter. 471 00:25:19,831 --> 00:25:22,351 Heap overflow, and more generally buffer overflows-- 472 00:25:22,351 --> 00:25:25,801 there's just so many things that can go wrong using this language called C, 473 00:25:25,801 --> 00:25:29,401 and if any of you have encountered a segmentation fault yet?
474 00:25:29,401 --> 00:25:31,321 I think we saw a few hands for that already. 475 00:25:31,321 --> 00:25:33,901 You touched memory that you shouldn't have 476 00:25:33,901 --> 00:25:38,611 and odds are you did it most recently by going too far in an array. 477 00:25:38,611 --> 00:25:42,001 Going to the left, or negative in an array, or somehow looking at memory 478 00:25:42,001 --> 00:25:42,841 you shouldn't have. 479 00:25:42,841 --> 00:25:47,051 And we'll explain today why it is you were able to do that. 480 00:25:47,051 --> 00:25:49,531 Other questions on these primitives so far? 481 00:25:49,531 --> 00:25:51,623 Yeah, from Carter? 482 00:25:51,623 --> 00:25:54,748 AUDIENCE: [INAUDIBLE] pointer star p, but then we used p later in the code. 483 00:25:54,748 --> 00:25:56,031 Is it called star p or p? 484 00:25:56,031 --> 00:25:57,281 DAVID J. MALAN: Good question. 485 00:25:57,281 --> 00:25:58,571 Earlier, we used star p. 486 00:25:58,571 --> 00:26:01,061 Let me rewind in time to the previous version of this code, 487 00:26:01,061 --> 00:26:03,341 where I actually had a variable called p. 488 00:26:03,341 --> 00:26:07,151 Just like with variable declarations in the past, 489 00:26:07,151 --> 00:26:12,621 once you've declared a variable to be an int, a char, a bool, or an int 490 00:26:12,621 --> 00:26:15,761 star, a.k.a. a pointer, you don't thereafter 491 00:26:15,761 --> 00:26:18,671 keep using the word int or now, the star. 492 00:26:18,671 --> 00:26:20,471 Once you've declared it, that's it. 493 00:26:20,471 --> 00:26:21,921 You only refer to it by name. 494 00:26:21,921 --> 00:26:26,111 And so it's very deliberate what I did here, 495 00:26:26,111 --> 00:26:28,661 saying that the type here is int star-- 496 00:26:28,661 --> 00:26:30,671 that is a pointer to an int-- 497 00:26:30,671 --> 00:26:33,611 but here I just said the name of the variable, as always. 498 00:26:33,611 --> 00:26:36,311 I didn't repeat int, and I also didn't repeat star. 
499 00:26:36,311 --> 00:26:39,191 But at the risk of bending one's minds a little bit there 500 00:26:39,191 --> 00:26:45,441 is unfortunately one other use for the star operator, and that's as follows. 501 00:26:45,441 --> 00:26:49,181 If you want to print out not the address of something, 502 00:26:49,181 --> 00:26:54,261 but what is at a specific address, you can actually do this. 503 00:26:54,261 --> 00:26:59,621 If I want to print out the integer via %i, that is at that address, 504 00:26:59,621 --> 00:27:04,061 I can actually use the star here, which technically contradicts what I just 505 00:27:04,061 --> 00:27:07,161 said but it has a different function here-- a different purpose. 506 00:27:07,161 --> 00:27:09,561 So let me go ahead and do this in two different ways. 507 00:27:09,561 --> 00:27:11,366 I'm going to leave this line of code as is, 508 00:27:11,366 --> 00:27:13,241 but I'm going to add another line of code now 509 00:27:13,241 --> 00:27:17,201 that prints out what apparently will be an integer, in a moment. 510 00:27:17,201 --> 00:27:21,124 So %i backslash n, and I could see-- and let me just do n for now. 511 00:27:21,124 --> 00:27:23,291 So there's really nothing special happening now, I'm 512 00:27:23,291 --> 00:27:25,301 just adding a sort of mindless printing of n. 513 00:27:25,301 --> 00:27:28,041 So make address, ./address-- 514 00:27:28,041 --> 00:27:31,601 there's the current address of n and there's the value of n. 515 00:27:31,601 --> 00:27:34,571 But what's kind of cool about C here, too, 516 00:27:34,571 --> 00:27:38,861 is if you know that a value is at a specific address like p, 517 00:27:38,861 --> 00:27:42,591 there's one other use for this star operator, the asterisk. 518 00:27:42,591 --> 00:27:46,221 You can use it as the so-called dereference operator, 519 00:27:46,221 --> 00:27:49,071 which means go to that address. 
520 00:27:49,071 --> 00:27:54,701 And so here what we actually have is an example of a pointer p, 521 00:27:54,701 --> 00:27:59,631 which is an address like 0x123 or 0x7FF and so forth. 522 00:27:59,631 --> 00:28:03,191 But if you say star p now, you're not redeclaring the variable 523 00:28:03,191 --> 00:28:04,631 because I didn't mention int-- 524 00:28:04,631 --> 00:28:07,391 you're going to that address in p. 525 00:28:07,391 --> 00:28:09,071 So let me recompile this now. 526 00:28:09,071 --> 00:28:15,191 Make address, ./address, and just to be clear-- 527 00:28:15,191 --> 00:28:16,721 what should I see? 528 00:28:16,721 --> 00:28:20,231 I'm first going to see the pointer itself, 0x something. 529 00:28:20,231 --> 00:28:23,096 What's the second line of output I should presumably see now? 530 00:28:25,801 --> 00:28:27,591 Shout a little louder. 531 00:28:27,591 --> 00:28:31,911 So I'm hearing 50, and that's true because if you figure out the address 532 00:28:31,911 --> 00:28:38,151 of n and print it in line seven, but then go to the address of n, a.k.a. p, 533 00:28:38,151 --> 00:28:41,331 that's indeed going to just show you the number n-- 534 00:28:41,331 --> 00:28:44,121 the value of n again. 535 00:28:44,121 --> 00:28:47,028 All right, any questions now on this syntax-- and I will concede, 536 00:28:47,028 --> 00:28:48,861 I think this is confusing-- the fact that we 537 00:28:48,861 --> 00:28:51,051 use the star for multiplication, the fact 538 00:28:51,051 --> 00:28:53,361 that we use the star to declare a pointer, 539 00:28:53,361 --> 00:28:56,601 but then we use a star in a third way to dereference the pointer 540 00:28:56,601 --> 00:28:57,651 and go to the pointer. 541 00:28:57,651 --> 00:29:01,251 It's just too confusing, honestly, but with practice comes comfort. 542 00:29:01,251 --> 00:29:02,681 Yeah. 543 00:29:02,681 --> 00:29:12,501 AUDIENCE: [INAUDIBLE] 544 00:29:12,501 --> 00:29:13,751 DAVID J. MALAN: Good question. 
545 00:29:13,751 --> 00:29:17,321 Do you-- when you are using the ampersand operator 546 00:29:17,321 --> 00:29:19,271 to get the address of something, the onus 547 00:29:19,271 --> 00:29:23,411 is on you at the moment to know what you are getting the address of. 548 00:29:23,411 --> 00:29:24,341 Is it a string? 549 00:29:24,341 --> 00:29:25,181 Is it a char? 550 00:29:25,181 --> 00:29:25,901 Is it a bool? 551 00:29:25,901 --> 00:29:26,681 Is it an int? 552 00:29:26,681 --> 00:29:30,041 I wrote this code so I know in line six that I'm 553 00:29:30,041 --> 00:29:33,131 trying to get the address of what is an integer. 554 00:29:33,131 --> 00:29:35,271 AUDIENCE: What about line eight? 555 00:29:35,271 --> 00:29:38,991 DAVID J. MALAN: In line eight you don't have 556 00:29:38,991 --> 00:29:40,821 to worry about that-- good question. 557 00:29:40,821 --> 00:29:44,851 Notice in line eight, I didn't tell the computer, other than the %i, 558 00:29:44,851 --> 00:29:49,551 what kind of address I'm going to, but I did already in line six. 559 00:29:49,551 --> 00:29:52,581 I told the compiler that p, now and forever, 560 00:29:52,581 --> 00:29:55,041 is going to be the address of an int. 561 00:29:55,041 --> 00:29:59,961 That's enough information in advance so that printf, or really the language C, 562 00:29:59,961 --> 00:30:03,951 still knows on line eight that p is a pointer to an int, 563 00:30:03,951 --> 00:30:07,371 and that way it will print out all four bytes at that address, 564 00:30:07,371 --> 00:30:11,288 not just part of it, and not more than those four bytes. 565 00:30:11,288 --> 00:30:11,871 Good question. 566 00:30:11,871 --> 00:30:13,801 Yeah, next to you. 567 00:30:13,801 --> 00:30:15,301 AUDIENCE: Do pointers have pointers? 568 00:30:15,301 --> 00:30:16,601 DAVID J. MALAN: Do pointers have pointers? 569 00:30:16,601 --> 00:30:17,101 Yes. 
570 00:30:17,101 --> 00:30:20,731 We won't do this today by having pointers to pointers, 571 00:30:20,731 --> 00:30:24,421 but yes, you can use star star, and then things get-- 572 00:30:24,421 --> 00:30:26,311 I'm sorry. 573 00:30:26,311 --> 00:30:28,501 We won't do that today and we won't do that often. 574 00:30:28,501 --> 00:30:31,051 In fact Python, another language, is just a couple of weeks 575 00:30:31,051 --> 00:30:32,221 away, so hang in there. 576 00:30:32,221 --> 00:30:32,921 Almost there. 577 00:30:32,921 --> 00:30:34,561 A question back here? 578 00:30:34,561 --> 00:30:36,331 Was there? 579 00:30:36,331 --> 00:30:38,191 That was-- more verbal feedback like that 580 00:30:38,191 --> 00:30:40,871 is helpful as we forge into the more complicated stuff. 581 00:30:40,871 --> 00:30:41,551 Other questions? 582 00:30:41,551 --> 00:30:42,909 Yeah. 583 00:30:42,909 --> 00:30:44,785 AUDIENCE: What's the point of [INAUDIBLE]? 584 00:30:48,071 --> 00:30:51,161 DAVID J. MALAN: What's the point of printing the address? 585 00:30:51,161 --> 00:30:54,451 AUDIENCE: Like, using the address to [INAUDIBLE]. 586 00:30:54,451 --> 00:30:55,381 DAVID J. MALAN: Sure. 587 00:30:55,381 --> 00:30:56,521 What's the point of doing this? 588 00:30:56,521 --> 00:30:58,771 If you don't mind, let me-- let's get there in a moment. 589 00:30:58,771 --> 00:31:01,471 This is not the common use case, just printing out the address-- 590 00:31:01,471 --> 00:31:02,821 who really cares? 591 00:31:02,821 --> 00:31:05,401 At the moment we care only for the sake of discussion. 592 00:31:05,401 --> 00:31:07,453 We're soon going to start using these addresses. 593 00:31:07,453 --> 00:31:09,661 So hang in there just a little bit for that one, too, 594 00:31:09,661 --> 00:31:13,621 but it will solve some problems for us before long.
595 00:31:13,621 --> 00:31:17,311 So let's actually just now depict what was going on inside of the computer's 596 00:31:17,311 --> 00:31:19,691 memory just a moment ago. 597 00:31:19,691 --> 00:31:23,971 So if I toggle back here, let me redraw my computer's memory, 598 00:31:23,971 --> 00:31:27,421 now let me plop into the memory n, which is storing in this program 599 00:31:27,421 --> 00:31:28,471 the number 50. 600 00:31:28,471 --> 00:31:30,631 Where is p in my computer's memory? 601 00:31:30,631 --> 00:31:33,691 Specifically, I don't know and apparently it moves around each time I 602 00:31:33,691 --> 00:31:35,741 run the program so for the sake of discussion, 603 00:31:35,741 --> 00:31:40,711 let's just propose that if 50 ended up at address 0x123, I don't know-- 604 00:31:40,711 --> 00:31:43,471 p ends up over here, at address-- 605 00:31:43,471 --> 00:31:46,661 whoops-- at whatever address this is here. 606 00:31:46,661 --> 00:31:49,111 But notice a couple of curiosities now. 607 00:31:49,111 --> 00:31:52,621 If p is a pointer, it's the address of something. 608 00:31:52,621 --> 00:31:57,961 So the value in p should be an address, and I've indeed written it as such-- 609 00:31:57,961 --> 00:32:02,071 0x123, and technically there's not an x there, there's not a zero there, 610 00:32:02,071 --> 00:32:04,471 there's not even a 123 there per se-- there's 611 00:32:04,471 --> 00:32:08,011 a pattern of bits that represents the address 0x123. 612 00:32:08,011 --> 00:32:11,681 But again, that's week zero-- don't care about binary day-to-day. 613 00:32:11,681 --> 00:32:17,761 So if this is p, and this I claimed was n, why is p so much bigger? 614 00:32:17,761 --> 00:32:20,231 Can someone conjecture here?
615 00:32:20,231 --> 00:32:25,061 Because it turns out whether n is an int or a char or a bool, 616 00:32:25,061 --> 00:32:27,701 which are different types-- heck, even a long-- 617 00:32:27,701 --> 00:32:31,871 it turns out that p is always going to take up eight squares on the board, 618 00:32:31,871 --> 00:32:33,951 but why might that be? 619 00:32:33,951 --> 00:32:35,261 What might explain that? 620 00:32:39,591 --> 00:32:41,507 Yeah, thoughts? 621 00:32:41,507 --> 00:32:45,451 AUDIENCE: Perhaps it allocates eight bytes, 622 00:32:45,451 --> 00:32:48,959 but it doesn't know the type of the data [INAUDIBLE]. 623 00:32:48,959 --> 00:32:50,001 DAVID J. MALAN: OK, fair. 624 00:32:50,001 --> 00:32:52,191 Maybe it's allocating eight bytes because it doesn't know the type. 625 00:32:52,191 --> 00:32:54,711 Turns out that's OK because an address is an address. 626 00:32:54,711 --> 00:32:58,281 It's really up to the programmer to use it as a string or a char or a bool. 627 00:32:58,281 --> 00:33:00,381 Other thoughts? 628 00:33:00,381 --> 00:33:05,443 AUDIENCE: Maybe the first four for the actual number and the last four 629 00:33:05,443 --> 00:33:11,033 is some null that [INAUDIBLE] where the pointer ends. 630 00:33:11,033 --> 00:33:12,241 DAVID J. MALAN: OK, possibly. 631 00:33:12,241 --> 00:33:15,211 It could be that pointers have some complexity like a backslash zero 632 00:33:15,211 --> 00:33:18,091 or something curious like that, like we talked about for strings. 633 00:33:18,091 --> 00:33:19,751 Turns out that's not the case. 634 00:33:19,751 --> 00:33:23,281 It turns out that pointers nowadays typically are, but not 635 00:33:23,281 --> 00:33:25,921 always, eight bytes, a.k.a. 636 00:33:25,921 --> 00:33:29,101 64 bits, because you and I-- our Macs, our PCs, 637 00:33:29,101 --> 00:33:32,911 heck-- even our phones have a lot more memory than they did years ago.
638 00:33:32,911 --> 00:33:34,801 Back in the day, a pointer might have only 639 00:33:34,801 --> 00:33:38,701 been 32 bits, or even only eight bits way back in the day. 640 00:33:38,701 --> 00:33:41,551 Let's consider 32 bits, because that was the norm for some time. 641 00:33:41,551 --> 00:33:45,091 How high can you count, roughly, if you've got 32 bits? 642 00:33:45,091 --> 00:33:47,901 What's the number we keep rattling off? 643 00:33:47,901 --> 00:33:53,061 With 32 bits it's roughly 2 to the 32, so it's 4 billion, 644 00:33:53,061 --> 00:33:57,271 and I keep saying it's 2 billion if you do negative, but in the world of memory 645 00:33:57,271 --> 00:34:00,531 there's a reason I keep saying 2 billion bytes, two gigabytes, 646 00:34:00,531 --> 00:34:03,591 because for a very long time that was the maximum amount of memory 647 00:34:03,591 --> 00:34:04,621 a computer could have. 648 00:34:04,621 --> 00:34:05,121 Why? 649 00:34:05,121 --> 00:34:07,491 Because the pointers that the computers were using 650 00:34:07,491 --> 00:34:09,531 were only, for instance, 32 bits. 651 00:34:09,531 --> 00:34:12,591 And with 32 bits, depending on whether you allow for negatives or not, 652 00:34:12,591 --> 00:34:15,621 you can count as high as 2 billion, roughly, or maybe 4 billion 653 00:34:15,621 --> 00:34:17,961 but you know what-- your Mac, your PC, your phone 654 00:34:17,961 --> 00:34:22,441 could not have had five gigabytes of memory, or 5 billion bytes of memory. 655 00:34:22,441 --> 00:34:25,191 You certainly couldn't have had what computers nowadays come with, 656 00:34:25,191 --> 00:34:27,171 which might be 8 gigabytes of memory-- 657 00:34:27,171 --> 00:34:28,561 16 gigabytes of memory. 658 00:34:28,561 --> 00:34:29,211 Why?
659 00:34:29,211 --> 00:34:33,501 Because with 4 bytes, or 32 bits, you literally, physically, 660 00:34:33,501 --> 00:34:37,611 can't count that high, which means if I drew a picture of all of the memory we 661 00:34:37,611 --> 00:34:41,301 would run out of numbers to describe them, which means most of my memory 662 00:34:41,301 --> 00:34:42,631 would just be unusable. 663 00:34:42,631 --> 00:34:45,771 So pointers nowadays are 64 bits, or eight bytes. 664 00:34:45,771 --> 00:34:46,521 That's really big. 665 00:34:46,521 --> 00:34:48,438 I can't even pronounce how big that number is, 666 00:34:48,438 --> 00:34:51,051 but it's plenty for the next many years, and so 667 00:34:51,051 --> 00:34:52,881 we've drawn it that way on the board here. 668 00:34:52,881 --> 00:34:54,501 Now let's just abstract this away. 669 00:34:54,501 --> 00:34:56,209 Let's get rid of all the other bytes that 670 00:34:56,209 --> 00:34:58,911 are storing something or nothing else, and let's now 671 00:34:58,911 --> 00:35:02,241 start to abstract away this complexity because the reality is, 672 00:35:02,241 --> 00:35:04,131 to your question earlier-- 673 00:35:04,131 --> 00:35:06,441 what is this useful for, or what do we-- do we actually 674 00:35:06,441 --> 00:35:07,971 care about these addresses? 675 00:35:07,971 --> 00:35:08,961 Generally, no. 676 00:35:08,961 --> 00:35:11,061 We're doing this so that you see there's no magic. 677 00:35:11,061 --> 00:35:13,951 We're just moving things around and poking around in memory. 678 00:35:13,951 --> 00:35:16,791 But what a person would typically do when talking about pointers 679 00:35:16,791 --> 00:35:19,401 would literally be to just point at something. 
680 00:35:19,401 --> 00:35:21,951 I really don't care what address n is at, 681 00:35:21,951 --> 00:35:25,131 so it suffices in general, when drawing pictures on a whiteboard, 682 00:35:25,131 --> 00:35:27,021 having a discussion with another programmer, 683 00:35:27,021 --> 00:35:31,341 you just draw an arrow from the pointer to the value in question, 684 00:35:31,341 --> 00:35:36,470 because neither you nor I probably care about the specifics of 0x whatever. 685 00:35:36,470 --> 00:35:39,813 There's your pointer-- it's literally an arrow, and we can see this. 686 00:35:39,813 --> 00:35:42,021 So it turns out that these pointers, these addresses, 687 00:35:42,021 --> 00:35:45,831 are not that dissimilar to what we've done for hundreds of years 688 00:35:45,831 --> 00:35:48,381 in the form of a postal system. 689 00:35:48,381 --> 00:35:50,121 For instance, here is a post office-- 690 00:35:50,121 --> 00:35:52,731 here, no-- here is a mailbox, and suppose 691 00:35:52,731 --> 00:35:55,431 that this is a mailbox labeled p. 692 00:35:55,431 --> 00:35:58,191 It's a pointer, and suppose there's another mailbox 693 00:35:58,191 --> 00:36:02,041 way over there, which is just another byte of my computer's memory. 694 00:36:02,041 --> 00:36:03,831 What are we really talking about? 695 00:36:03,831 --> 00:36:07,881 Well, you store in a computer's memory values like the number 50, 696 00:36:07,881 --> 00:36:11,841 or the word "hi" inside of your computer's memory at some location. 697 00:36:11,841 --> 00:36:15,921 But today we can also use those same memory locations 698 00:36:15,921 --> 00:36:17,551 to store the address of things.
699 00:36:17,551 --> 00:36:21,351 For instance, if I open this up here and I 700 00:36:21,351 --> 00:36:25,071 see OK, the value inside of this mailbox is not a number like 50, 701 00:36:25,071 --> 00:36:26,361 it's actually an address-- 702 00:36:26,361 --> 00:36:30,861 0x123-- that's like a pointer, a breadcrumb leading 703 00:36:30,861 --> 00:36:32,661 from one location in memory to another. 704 00:36:32,661 --> 00:36:35,161 And in fact, would someone who's seated roughly over there-- 705 00:36:35,161 --> 00:36:37,761 do you mind getting the mail over there? 706 00:36:37,761 --> 00:36:40,581 Any volunteers over in this section? 707 00:36:40,581 --> 00:36:42,931 Just need you to get to the mailbox before I do. 708 00:36:42,931 --> 00:36:44,781 Who's being volunteered? 709 00:36:44,781 --> 00:36:45,471 Oh yes, please. 710 00:36:45,471 --> 00:36:50,926 Whoever is gesturing most wildly, come on down. 711 00:36:50,926 --> 00:36:51,426 Sure. 712 00:36:57,861 --> 00:36:59,315 What's your name? 713 00:36:59,315 --> 00:37:00,078 AUDIENCE: Anfoo. 714 00:37:00,078 --> 00:37:01,161 DAVID J. MALAN: Say again? 715 00:37:01,161 --> 00:37:01,851 AUDIENCE: Anfoo. 716 00:37:01,851 --> 00:37:03,201 DAVID J. MALAN: Anfoo? 717 00:37:03,201 --> 00:37:06,081 OK, come on up to the edge of the stage there and just to be clear-- 718 00:37:06,081 --> 00:37:09,801 if this is p, that is apparently n, but to make clear 719 00:37:09,801 --> 00:37:12,621 what we're talking about when we're storing 0x whatever values-- 720 00:37:12,621 --> 00:37:15,771 like 0x123, that's essentially equivalent to my 721 00:37:15,771 --> 00:37:18,501 maybe pulling out something like this and just 722 00:37:18,501 --> 00:37:21,051 abstractly pointing to your mailbox there, 723 00:37:21,051 --> 00:37:25,311 or if you prefer, pointing to the mailbox-- 724 00:37:25,311 --> 00:37:26,271 OK, all right. 725 00:37:28,951 --> 00:37:29,451 Thank you. 726 00:37:29,451 --> 00:37:29,951 All right. 
727 00:37:32,661 --> 00:37:34,821 This is akin to me pointing at your mailbox, 728 00:37:34,821 --> 00:37:36,863 and if you want to go ahead and open your mailbox 729 00:37:36,863 --> 00:37:43,201 and reveal to the crowd what's inside your mailbox labeled n. 730 00:37:43,201 --> 00:37:43,981 All right. 731 00:37:46,501 --> 00:37:48,601 Thank you. 732 00:37:48,601 --> 00:37:51,221 We have a little CS50 stress ball for your trouble. 733 00:37:51,221 --> 00:37:52,553 Thank you for coming up. 734 00:37:52,553 --> 00:37:55,261 So that's just to put a visual on what it is we're talking about, 735 00:37:55,261 --> 00:37:58,171 because it can get very abstract, very cryptic quickly when we're 736 00:37:58,171 --> 00:38:01,391 talking about addresses and memory and drawing it like these little squares. 737 00:38:01,391 --> 00:38:04,308 But if you think about just walking into a post office or an apartment 738 00:38:04,308 --> 00:38:07,261 complex that's got a lot of mailboxes, those mailboxes 739 00:38:07,261 --> 00:38:10,231 essentially are a big chunk of memory and each 740 00:38:10,231 --> 00:38:12,091 of those mailboxes has an address-- 741 00:38:12,091 --> 00:38:14,821 this is apartment one, two, three-- apartment 2 billion. 742 00:38:14,821 --> 00:38:18,091 And inside of those mailboxes can go anything 743 00:38:18,091 --> 00:38:20,261 that can be represented as information. 744 00:38:20,261 --> 00:38:23,341 It could be a number like n, or 50, or if you 745 00:38:23,341 --> 00:38:25,741 prefer it could be a number that represents 746 00:38:25,741 --> 00:38:27,631 the address of another mailbox. 747 00:38:27,631 --> 00:38:30,811 And this is akin, really, if you've ever had an apartment or you 748 00:38:30,811 --> 00:38:33,631 and your parents have moved, to having a forwarding address. 
749 00:38:33,631 --> 00:38:36,001 It's like having the Post Office in the US 750 00:38:36,001 --> 00:38:39,481 put some kind of piece of paper in your old mailbox saying, 751 00:38:39,481 --> 00:38:41,911 actually forward it to that other mailbox. 752 00:38:41,911 --> 00:38:44,281 That really is all a pointer is doing. 753 00:38:44,281 --> 00:38:45,991 At the end of the day, it's just a number 754 00:38:45,991 --> 00:38:48,331 but it's a number being used in a different way 755 00:38:48,331 --> 00:38:50,461 and it's the syntax that we've introduced, 756 00:38:50,461 --> 00:38:54,271 not just int but int star, that tells the computer how 757 00:38:54,271 --> 00:38:58,741 to treat that number in this slightly different way. 758 00:38:58,741 --> 00:39:01,841 Are there any questions then, on this? 759 00:39:01,841 --> 00:39:03,962 Yeah, in back. 760 00:39:03,962 --> 00:39:06,379 AUDIENCE: If you had a variable, like int c, [INAUDIBLE]. 761 00:39:10,711 --> 00:39:12,691 DAVID J. MALAN: If I did int c and-- 762 00:39:12,691 --> 00:39:14,841 say the code again? 763 00:39:14,841 --> 00:39:17,011 Once more? 764 00:39:17,011 --> 00:39:19,141 Equal to n, so let me actually type it out. 765 00:39:19,141 --> 00:39:21,271 If I give myself another line of code, tell me 766 00:39:21,271 --> 00:39:27,251 one last time what to type. int c is equal to n, like this? 767 00:39:27,251 --> 00:39:31,951 So this is OK, and I can't draw it quite quickly enough on the board here, 768 00:39:31,951 --> 00:39:36,181 but this would be like creating another four bytes somewhere in memory, maybe 769 00:39:36,181 --> 00:39:40,231 down here, that stores an identical copy of 50 770 00:39:40,231 --> 00:39:43,381 because the assignment operator from right to left copies one value 771 00:39:43,381 --> 00:39:44,201 to another. 772 00:39:44,201 --> 00:39:47,671 So that would just add one more rectangle of size four 773 00:39:47,671 --> 00:39:50,391 to this particular picture.
774 00:39:50,391 --> 00:39:52,371 If I'm answering your question as intended. 775 00:39:52,371 --> 00:39:57,231 OK, so that is week one style use of assignment operators before pointers. 776 00:39:57,231 --> 00:40:00,051 I could, though, start copying pointers but again, we'll 777 00:40:00,051 --> 00:40:01,881 come back to some of that complexity. 778 00:40:01,881 --> 00:40:03,421 Any other questions here? 779 00:40:03,421 --> 00:40:04,921 AUDIENCE: That was a great question. 780 00:40:04,921 --> 00:40:06,841 Does the pointer point-- 781 00:40:06,841 --> 00:40:10,084 does the same pointer point to the new replica as well? 782 00:40:10,084 --> 00:40:11,501 DAVID J. MALAN: Ah, good question. 783 00:40:11,501 --> 00:40:12,406 Short answer, no. 784 00:40:12,406 --> 00:40:17,101 And to repeat for the camera, if I create a second variable like this, 785 00:40:17,101 --> 00:40:21,271 int c equals n, and I claim without actually drawing it on the board 786 00:40:21,271 --> 00:40:25,191 that this gives me another rectangle, the value of which is also 50, 787 00:40:25,191 --> 00:40:26,681 p does not get touched. 788 00:40:26,681 --> 00:40:29,041 And this is what's important and really characteristic 789 00:40:29,041 --> 00:40:33,001 of C. Nothing happens automatically for you. 790 00:40:33,001 --> 00:40:36,581 p is not going to be updated unless you update p in some way, 791 00:40:36,581 --> 00:40:39,121 so creating a third variable called c-- even 792 00:40:39,121 --> 00:40:41,521 if you're copying its value from right to left, 793 00:40:41,521 --> 00:40:44,701 that has no effect on anything else in the program. 794 00:40:44,701 --> 00:40:46,031 A good question. 795 00:40:46,031 --> 00:40:52,201 So what have we seen that's perhaps now a little more explainable? 
796 00:40:52,201 --> 00:40:56,221 Well, recall that we talked quite a bit last week about strings, and just 797 00:40:56,221 --> 00:41:02,101 to recap in layperson's terms, what is this string as you now understand it? 798 00:41:02,101 --> 00:41:04,191 So say-- well, let me take a specific hand here. 799 00:41:04,191 --> 00:41:05,091 What's a string? 800 00:41:05,091 --> 00:41:06,926 How about over here. 801 00:41:06,926 --> 00:41:08,301 AUDIENCE: An array of characters. 802 00:41:08,301 --> 00:41:08,811 DAVID J. MALAN: OK, sure. 803 00:41:08,811 --> 00:41:09,728 Both of you are right. 804 00:41:09,728 --> 00:41:10,971 An array of characters. 805 00:41:10,971 --> 00:41:13,761 An array of characters, and we-- 806 00:41:13,761 --> 00:41:16,881 I claimed-- or revealed last week that string is not technically 807 00:41:16,881 --> 00:41:20,151 a feature built into C. It's not an official data type 808 00:41:20,151 --> 00:41:22,401 but every programmer in most any language 809 00:41:22,401 --> 00:41:25,641 refers to sequences of characters-- words, letters, 810 00:41:25,641 --> 00:41:27,451 paragraphs-- as strings. 811 00:41:27,451 --> 00:41:30,771 So the vernacular exists but the data type doesn't typically 812 00:41:30,771 --> 00:41:34,111 exist per se in C. So what we're about to do, if you will, 813 00:41:34,111 --> 00:41:36,951 for dramatic effect, is take off some training wheels today. 814 00:41:36,951 --> 00:41:41,451 The CS50 library implemented in the form of the header file cs50.h-- 815 00:41:41,451 --> 00:41:43,581 we claim has had a bunch of things in it. 816 00:41:43,581 --> 00:41:46,761 Prototypes for GetString, prototypes for GetInt, 817 00:41:46,761 --> 00:41:49,281 and all of those other functions, but it turns out 818 00:41:49,281 --> 00:41:53,481 it also is what defines the word "string" in such a way 819 00:41:53,481 --> 00:41:55,981 that you all can use it these past several weeks. 
820 00:41:55,981 --> 00:41:58,641 So let's take a look at an example of a string in use. 821 00:41:58,641 --> 00:42:00,681 Here, for instance, is a tiny bit of code 822 00:42:00,681 --> 00:42:05,421 that uses the word "string," creating a variable called s 823 00:42:05,421 --> 00:42:08,083 and then storing quote unquote, hi, exclamation point. 824 00:42:08,083 --> 00:42:10,791 Let's consider what this looks like now in the computer's memory. 825 00:42:10,791 --> 00:42:13,541 I don't care about all the other bytes, let's just focus on these, 826 00:42:13,541 --> 00:42:16,551 and this per last week is how "hi" might be stored. 827 00:42:16,551 --> 00:42:19,311 h-i exclamation point and then one more, as someone already 828 00:42:19,311 --> 00:42:23,151 observed, that sentinel value-- that null character which 829 00:42:23,151 --> 00:42:26,558 just means eight zero bits to demarcate the end of that string 830 00:42:26,558 --> 00:42:28,641 just in case there's something to the right of it, 831 00:42:28,641 --> 00:42:31,801 the computer can now distinguish one string from another. 832 00:42:31,801 --> 00:42:35,004 So last week we introduced this new syntax. 833 00:42:35,004 --> 00:42:36,921 Well, if strings are just arrays of characters 834 00:42:36,921 --> 00:42:39,831 you can then very cleverly use that square bracket notation 835 00:42:39,831 --> 00:42:44,631 and go to location zero or one or two, which are like addresses, 836 00:42:44,631 --> 00:42:46,431 but they're relative to the string. 837 00:42:46,431 --> 00:42:51,381 This could be at 0x123 or 0x456, but with this bracket notation 838 00:42:51,381 --> 00:42:54,381 zero is always the beginning of the string, one is the next, 839 00:42:54,381 --> 00:42:55,801 two is the next, and so forth. 840 00:42:55,801 --> 00:43:00,561 So that was our array syntax for indexing into an array. 
841 00:43:00,561 --> 00:43:03,471 But technically speaking, we can go a little deeper today-- 842 00:43:03,471 --> 00:43:09,741 technically speaking, if hi is starting at the address 0x123 then 843 00:43:09,741 --> 00:43:15,711 it stands to reason that i is at 0x124, exclamation point's at 0x125, 844 00:43:15,711 --> 00:43:18,711 and the null is at 0x126. 845 00:43:18,711 --> 00:43:23,331 Now, I don't care about 123 per se, but even though this is hexadecimal, 846 00:43:23,331 --> 00:43:24,591 this is correct math. 847 00:43:24,591 --> 00:43:28,101 Even in hex, if you just add one when you start at 0x123, 848 00:43:28,101 --> 00:43:30,456 the next number is four, five, six at the end. 849 00:43:30,456 --> 00:43:32,331 I don't have to worry about A's, B's, and C's 850 00:43:32,331 --> 00:43:35,341 because I'm not counting that high in this example. 851 00:43:35,341 --> 00:43:39,531 So if that's the case, and my computer is actually 852 00:43:39,531 --> 00:43:47,271 laying out the word hi in memory like that, well, what exactly is s? 853 00:43:47,271 --> 00:43:50,001 What exactly is s if, at the end of the day, 854 00:43:50,001 --> 00:43:56,031 H-I exclamation point null is storing-- or is stored at these addresses? 855 00:43:56,031 --> 00:43:57,006 Where is s? 856 00:43:57,006 --> 00:43:58,881 Now that I've taken off those training wheels 857 00:43:58,881 --> 00:44:02,481 and showed you where H-I exclamation point null actually are, 858 00:44:02,481 --> 00:44:04,221 what happened to s? 859 00:44:04,221 --> 00:44:08,211 Well s, as always, is actually a variable. 860 00:44:08,211 --> 00:44:10,251 Even in the code I proposed a moment ago, 861 00:44:10,251 --> 00:44:13,551 s is apparently a data type that yes, doesn't come with C, 862 00:44:13,551 --> 00:44:16,101 but CS50's library makes it exist. 863 00:44:16,101 --> 00:44:21,471 s is a variable of type string, so where is s in this picture?
864 00:44:21,471 --> 00:44:25,431 Well, it turns out that s might be up here. 865 00:44:25,431 --> 00:44:28,971 Again, I'm just drawing it anywhere for the sake of discussion, 866 00:44:28,971 --> 00:44:33,141 but s is a variable per that line of code. 867 00:44:33,141 --> 00:44:36,978 What s is storing, apparently, I claim, is 0x123. 868 00:44:36,978 --> 00:44:40,311 I actually don't really care about these addresses, so let's abstract that away. 869 00:44:40,311 --> 00:44:45,591 s is apparently, as of now, today, one week later, just a pointer 870 00:44:45,591 --> 00:44:46,761 to a character. 871 00:44:46,761 --> 00:44:49,311 Specifically, the first character in s. 872 00:44:49,311 --> 00:44:51,411 And this is the last piece of the puzzle. 873 00:44:51,411 --> 00:44:54,981 Last week we had this clever way of demarcating the end of a string. 874 00:44:54,981 --> 00:44:59,901 Well, it turns out that strings are represented in the computer's memory 875 00:44:59,901 --> 00:45:03,861 as a variable that is a pointer, inside of which 876 00:45:03,861 --> 00:45:06,901 is the address of the first character in the string. 877 00:45:06,901 --> 00:45:09,951 So if s points at the first character and you 878 00:45:09,951 --> 00:45:12,501 can trust that backslash zero is at the end of the string, 879 00:45:12,501 --> 00:45:18,091 that's literally all you need to figure out where a string begins and ends. 880 00:45:18,091 --> 00:45:19,531 So what do I mean by this? 881 00:45:19,531 --> 00:45:21,141 Well, let's be a little more concrete. 882 00:45:21,141 --> 00:45:24,801 In terms of this picture, if I've started with this line of code here, 883 00:45:24,801 --> 00:45:29,961 it turns out all this time since week 1, that the word string has just 884 00:45:29,961 --> 00:45:36,871 semi-secretly been an alias for char star. 885 00:45:36,871 --> 00:45:39,391 I know, so char star. 886 00:45:39,391 --> 00:45:40,841 So why does this make sense? 
887 00:45:40,841 --> 00:45:44,081 It's a little weird still, but if in our previous example 888 00:45:44,081 --> 00:45:47,671 we were able to store the address of an integer by declaring a variable 889 00:45:47,671 --> 00:45:49,831 called p, as int star p-- 890 00:45:49,831 --> 00:45:52,681 well, if as of now strings are just the address 891 00:45:52,681 --> 00:45:58,111 of the first character in a string, then probably a string is just a char star 892 00:45:58,111 --> 00:46:01,861 because that means s is the address of a character, the very 893 00:46:01,861 --> 00:46:03,461 first character in the string. 894 00:46:03,461 --> 00:46:07,441 Now, the string might have three letters like it did, or four, or even a hundred 895 00:46:07,441 --> 00:46:09,571 if it's a long paragraph, but that's fine 896 00:46:09,571 --> 00:46:11,488 because you can trust that there's going to be 897 00:46:11,488 --> 00:46:13,181 that null character at the very end. 898 00:46:13,181 --> 00:46:16,921 So this is a general purpose way of representing strings 899 00:46:16,921 --> 00:46:20,041 using this new mechanism in C. 900 00:46:20,041 --> 00:46:23,221 So in fact, let me go ahead here and introduce maybe 901 00:46:23,221 --> 00:46:25,061 a couple of manipulations of this. 902 00:46:25,061 --> 00:46:28,831 Let me go back to my code here, and let's get rid of this integer stuff, 903 00:46:28,831 --> 00:46:32,381 and let's instead now do, for instance, this. 904 00:46:32,381 --> 00:46:37,383 Let me add in the CS50 library, so we'll include CS50.H for now. 905 00:46:37,383 --> 00:46:39,091 I'm going to go ahead and inside of main, 906 00:46:39,091 --> 00:46:41,971 give myself a string s equals hi exclamation point. 907 00:46:41,971 --> 00:46:43,621 I don't type the backslash zero. 908 00:46:43,621 --> 00:46:48,228 C does that for me automatically by using my double quotes like this. 909 00:46:48,228 --> 00:46:49,811 Now let me just go ahead and print it. 
910 00:46:49,811 --> 00:46:52,981 So this again is week 1 style stuff where I'm just printing a string. 911 00:46:52,981 --> 00:46:54,611 No pointers yet. 912 00:46:54,611 --> 00:46:59,761 So let me do make address, Enter, ./address, and hopefully I see hi, 913 00:46:59,761 --> 00:47:01,391 so nothing new there. 914 00:47:01,391 --> 00:47:05,341 But let's start to peel back some of these layers here. 915 00:47:05,341 --> 00:47:09,361 Let me first of all, get rid of the CS50 library for a moment 916 00:47:09,361 --> 00:47:13,651 and let me change string to char star. 917 00:47:13,651 --> 00:47:15,901 And it's a little bit weird but yes, the convention 918 00:47:15,901 --> 00:47:19,899 is to say char, a space, then the star, and then immediately thereafter 919 00:47:19,899 --> 00:47:20,941 the name of the variable. 920 00:47:20,941 --> 00:47:23,691 Strictly speaking though, you might see textbooks or websites that 921 00:47:23,691 --> 00:47:26,671 do it like this or like this, but the canonical way 922 00:47:26,671 --> 00:47:28,451 is typically to do it like that. 923 00:47:28,451 --> 00:47:31,311 So now no more CS50 library, no more training wheels, if you will. 924 00:47:31,311 --> 00:47:33,821 I'm just treating strings for what they really are. 925 00:47:33,821 --> 00:47:37,021 Let me go ahead and do make address, Enter-- 926 00:47:37,021 --> 00:47:39,181 so far so good-- ./address-- 927 00:47:39,181 --> 00:47:40,651 and that, too, still works. 928 00:47:40,651 --> 00:47:44,851 So %s is a thing that comes with printf because the word string is programmer 929 00:47:44,851 --> 00:47:48,901 terminology but strictly speaking C doesn't have a string data type. 930 00:47:48,901 --> 00:47:53,221 It's always been char star, so what this means now is I 931 00:47:53,221 --> 00:47:56,761 can start to have some fun with these basic ideas, 932 00:47:56,761 --> 00:47:59,891 even though this is not purposeful other than for the sake of discussion. 
933 00:47:59,891 --> 00:48:03,901 But if s is this-- let me go back and give myself the CS50 library. 934 00:48:03,901 --> 00:48:06,391 Let's put those training wheels back on for just a moment 935 00:48:06,391 --> 00:48:09,221 so that I can do one manipulation at a time. 936 00:48:09,221 --> 00:48:12,131 Here's my string s, as before. 937 00:48:12,131 --> 00:48:15,181 Well, let me go ahead and declare a char called c, 938 00:48:15,181 --> 00:48:20,221 and let me store the first character in the string there, which is 939 00:48:20,221 --> 00:48:22,891 s bracket zero, and that should give me h. 940 00:48:22,891 --> 00:48:25,951 And then just for kicks, let me go ahead and do char star-- 941 00:48:25,951 --> 00:48:33,061 whoops-- let me go ahead and do char star p equals ampersand c, 942 00:48:33,061 --> 00:48:35,491 and see what this actually prints for me. 943 00:48:35,491 --> 00:48:38,861 Let me go ahead and print out what p is here. 944 00:48:38,861 --> 00:48:40,091 So we're just playing around. 945 00:48:40,091 --> 00:48:43,681 So make address-- so far so good-- ./address. 946 00:48:43,681 --> 00:48:46,021 All right, so what have I just done? 947 00:48:46,021 --> 00:48:51,151 I've just created a char c and stored in it the letter H, which 948 00:48:51,151 --> 00:48:55,531 is the same thing as s bracket zero, then I'm saying, what's the address of c, 949 00:48:55,531 --> 00:48:58,391 and that's apparently 0x7FF whatever. 950 00:48:58,391 --> 00:48:59,641 So that's the address. 951 00:48:59,641 --> 00:49:01,841 But I technically didn't have to do that. 952 00:49:01,841 --> 00:49:03,641 Let me go ahead and do two things now. 953 00:49:03,641 --> 00:49:12,001 Instead of just printing p, let me go ahead and print out maybe s itself. 954 00:49:12,001 --> 00:49:14,461 Let me go ahead and do make address, Enter-- 955 00:49:14,461 --> 00:49:17,611 so far so good-- ./address and-- 956 00:49:17,611 --> 00:49:20,371 damn it, what did I do wrong.
957 00:49:20,371 --> 00:49:22,201 Oh shoot, I didn't want to do that. 958 00:49:22,201 --> 00:49:25,781 Oh, I really made a mess of this. 959 00:49:25,781 --> 00:49:28,561 What did I want to do here? 960 00:49:28,561 --> 00:49:31,831 That was supposed to be impressive but it was the opposite. 961 00:49:31,831 --> 00:49:35,321 So let me turn it around. 962 00:49:35,321 --> 00:49:39,181 So if I intended to do this, why are lines nine and 10 963 00:49:39,181 --> 00:49:41,461 printing different values? 964 00:49:41,461 --> 00:49:44,641 Didn't really intend to go here, but let me try to save this. 965 00:49:44,641 --> 00:49:51,991 Why are we seeing different addresses, namely this address 402004 for s, 966 00:49:51,991 --> 00:49:57,031 and then 0x7FF for p? 967 00:49:57,031 --> 00:49:57,991 Any thoughts? 968 00:49:57,991 --> 00:50:00,121 Yeah, over here. 969 00:50:00,121 --> 00:50:02,571 AUDIENCE: [INAUDIBLE] is the character c is 970 00:50:02,571 --> 00:50:07,471 its own sort of location of the [INAUDIBLE],, 971 00:50:07,471 --> 00:50:09,513 and it's taking off just the values [INAUDIBLE].. 972 00:50:09,513 --> 00:50:10,513 DAVID J. MALAN: Correct. 973 00:50:10,513 --> 00:50:12,684 So if I really wanted to weasel my way out of this, 974 00:50:12,684 --> 00:50:15,351 this is a great answer to the previous question which was about, 975 00:50:15,351 --> 00:50:20,091 what if I introduce another variable, c, that's a copy of the value, 976 00:50:20,091 --> 00:50:22,791 and not in this case an int, but an actual char. 977 00:50:22,791 --> 00:50:28,281 Here, I've made c be a copy of the character that's at the beginning of s, 978 00:50:28,281 --> 00:50:29,381 but that's indeed a copy. 979 00:50:29,381 --> 00:50:31,131 So if I were to draw it on the screen that 980 00:50:31,131 --> 00:50:35,271 would give me a different rectangle in which this copy of h 981 00:50:35,271 --> 00:50:36,681 would actually be stored. 
982 00:50:36,681 --> 00:50:38,631 So I didn't intend to do this, but what you're 983 00:50:38,631 --> 00:50:40,618 seeing is yes, the address of s-- 984 00:50:40,618 --> 00:50:42,951 and apparently that's at a pretty low address by default 985 00:50:42,951 --> 00:50:44,961 here-- then you're seeing the address of c. 986 00:50:44,961 --> 00:50:47,841 But even though each of them is h, I claim 987 00:50:47,841 --> 00:50:49,803 one is at a different address in memory. 988 00:50:49,803 --> 00:50:51,261 And this has always been happening. 989 00:50:51,261 --> 00:50:53,991 Any time you created one variable or another it was ending up here, 990 00:50:53,991 --> 00:50:55,908 or here, or here, or somewhere else in memory. 991 00:50:55,908 --> 00:50:58,911 Now for the first time all we're doing is actually just poking around 992 00:50:58,911 --> 00:51:02,371 the computer's memory to see what is actually there. 993 00:51:02,371 --> 00:51:06,021 So let me actually back this up a little bit 994 00:51:06,021 --> 00:51:09,391 and do what I intended to do here, which was something like this. 995 00:51:09,391 --> 00:51:13,551 So if string s equals quote unquote, hi, let's go ahead 996 00:51:13,551 --> 00:51:23,051 and give myself a pointer, called p, to the first character in s. 997 00:51:23,051 --> 00:51:26,891 All right, so now let me go ahead and print out the value of this pointer, 998 00:51:26,891 --> 00:51:29,034 %p, printing out p. 999 00:51:29,034 --> 00:51:30,951 So we're just going to do one thing at a time. 1000 00:51:30,951 --> 00:51:33,761 So make address, Enter, ./address. 1001 00:51:33,761 --> 00:51:38,861 There, at the moment, is the address of the first character in s. 1002 00:51:38,861 --> 00:51:40,781 What I meant to do now, was this. 1003 00:51:40,781 --> 00:51:43,721 If I want to print out two things this time, 1004 00:51:43,721 --> 00:51:49,391 let me print out not only what p is, but also what s itself originally is. 
1005 00:51:49,391 --> 00:51:53,411 Because if I claim that everyone from last week should be comfortable with 1006 00:51:53,411 --> 00:51:56,381 s bracket zero just representing the first character in s 1007 00:51:56,381 --> 00:51:59,621 by definition of strings being arrays of characters, 1008 00:51:59,621 --> 00:52:05,871 then s, as of today, is itself the address of a character, 1009 00:52:05,871 --> 00:52:06,761 the first one in s. 1010 00:52:06,761 --> 00:52:10,721 So if I now do make address, and do ./address, 1011 00:52:10,721 --> 00:52:13,481 this time I see the same exact things. 1012 00:52:13,481 --> 00:52:14,081 Thank you. 1013 00:52:18,228 --> 00:52:20,811 This is really the lamest sort of thing to be applauding over, 1014 00:52:20,811 --> 00:52:26,571 but what we're demonstrating here is that s is by definition the address 1015 00:52:26,571 --> 00:52:28,261 of the first character in s. 1016 00:52:28,261 --> 00:52:30,931 So if we borrow some of our mental model from last week-- 1017 00:52:30,931 --> 00:52:35,811 well, if s bracket zero is the first character in s, doing the ampersand on 1018 00:52:35,811 --> 00:52:38,351 that expression should be the same as s. 1019 00:52:38,351 --> 00:52:40,851 Now this isn't to say that we would jump through these hoops 1020 00:52:40,851 --> 00:52:45,051 all the time with this much syntax, but this is just to do proof by example 1021 00:52:45,051 --> 00:52:51,171 that s is in fact, as I claimed a moment ago, just the address of a character. 1022 00:52:51,171 --> 00:52:54,651 Not even multiple characters, it's the address of a single character, 1023 00:52:54,651 --> 00:52:58,581 but the key thing is it's the address of the first character in the string, 1024 00:52:58,581 --> 00:53:01,821 and per last week we trust that C is going 1025 00:53:01,821 --> 00:53:04,881 to look for that null character at the very end just 1026 00:53:04,881 --> 00:53:08,721 to make sure it knows where the string actually ends.
1027 00:53:08,721 --> 00:53:12,317 All right, a question came up over here. 1028 00:53:12,317 --> 00:53:25,581 AUDIENCE: [INAUDIBLE] 1029 00:53:25,581 --> 00:53:26,581 DAVID J. MALAN: Correct. 1030 00:53:26,581 --> 00:53:30,181 To summarize, on line eight, when I am using %p-- 1031 00:53:30,181 --> 00:53:33,181 that just means print a pointer value, so 0x something-- 1032 00:53:33,181 --> 00:53:35,581 I'm passing it s. 1033 00:53:35,581 --> 00:53:41,281 Previously, when we used %s, printf knew to print not just the first character 1034 00:53:41,281 --> 00:53:45,481 of s, but h, i, exclamation point, and then stop when it hits the backslash 1035 00:53:45,481 --> 00:53:46,621 zero. 1036 00:53:46,621 --> 00:53:51,841 p is different. %p tells the computer to go to that address-- 1037 00:53:51,841 --> 00:53:56,711 sorry, tells the computer to print that address on the screen. 1038 00:53:56,711 --> 00:53:59,761 So this is where %s all this time has been powerful. 1039 00:53:59,761 --> 00:54:03,961 The reason printf worked in week 1 and 2 and 3 1040 00:54:03,961 --> 00:54:07,261 was because printf was designed by some human years ago 1041 00:54:07,261 --> 00:54:10,291 to go to the address that's being passed in-- for instance, 1042 00:54:10,291 --> 00:54:12,631 s-- and print out character after character 1043 00:54:12,631 --> 00:54:16,291 after character until it sees the null character backslash zero, 1044 00:54:16,291 --> 00:54:17,891 and then stop printing it. 1045 00:54:17,891 --> 00:54:21,481 So that's-- you're getting a lot of functionality for free from %s. 1046 00:54:21,481 --> 00:54:23,911 Today we're using something much simpler, %p, 1047 00:54:23,911 --> 00:54:27,211 which just literally prints what s is. 
1048 00:54:27,211 --> 00:54:28,951 And the reason we don't do this in week 1 1049 00:54:28,951 --> 00:54:31,021 is just because this is like way too much 1050 00:54:31,021 --> 00:54:33,021 to be interesting when all you want to print out 1051 00:54:33,021 --> 00:54:34,541 is hi or hello, world, or the like. 1052 00:54:34,541 --> 00:54:36,511 But now what we're really doing is revealing 1053 00:54:36,511 --> 00:54:38,941 what's been going on this whole time. 1054 00:54:38,941 --> 00:54:40,678 And let me make one other example here. 1055 00:54:40,678 --> 00:54:42,511 Let me go ahead and get rid of this variable 1056 00:54:42,511 --> 00:54:45,901 here and let me just print out a few things to make the same point. 1057 00:54:45,901 --> 00:54:50,131 I'm going to print out not just s like I did here, but let's go ahead 1058 00:54:50,131 --> 00:54:51,181 and print out every-- 1059 00:54:51,181 --> 00:54:53,071 the address of every character in s. 1060 00:54:53,071 --> 00:54:57,353 So let's get the first letter in s and get its address, 1061 00:54:57,353 --> 00:54:59,311 and I'm going to do copy paste for time's sake, 1062 00:54:59,311 --> 00:55:02,521 but not something I would do frequently. 1063 00:55:02,521 --> 00:55:06,034 So let me print out the address of the first character, the second character, 1064 00:55:06,034 --> 00:55:07,951 the third, and actually even the fourth, which 1065 00:55:07,951 --> 00:55:11,321 is the backslash zero, by doing this. 1066 00:55:11,321 --> 00:55:15,931 So when I compiled this program-- make address, ./address-- 1067 00:55:15,931 --> 00:55:19,441 I should see two identical values and then 1068 00:55:19,441 --> 00:55:21,931 additional values that are one byte away. 1069 00:55:21,931 --> 00:55:27,571 In my diagram a moment ago, my addresses were arbitrarily 0x123, 124, 125, 126. 1070 00:55:27,571 --> 00:55:33,841 Now it starts at, by chance, 0x402004, which is s. 
1071 00:55:33,841 --> 00:55:37,381 0x402004 is the same thing as s because I'm just 1072 00:55:37,381 --> 00:55:39,991 saying go to the first character and then get its address. 1073 00:55:39,991 --> 00:55:41,491 Those are one and the same now. 1074 00:55:41,491 --> 00:55:47,401 And then after that is 0x402005, 006, 007, 1075 00:55:47,401 --> 00:55:49,181 because that is just like the diagram. 1076 00:55:49,181 --> 00:55:52,981 Go to the i, to the exclamation point, and to the null character. 1077 00:55:52,981 --> 00:55:55,891 So all I'm doing now, using my newfound understanding of what 1078 00:55:55,891 --> 00:55:59,251 ampersand does and what the star does, is just playing around. 1079 00:55:59,251 --> 00:56:02,149 I'm poking around in the computer's memory. 1080 00:56:02,149 --> 00:56:03,691 Just to demonstrate there's no magic. 1081 00:56:03,691 --> 00:56:06,661 It's all there very deliberately because I or printf or someone 1082 00:56:06,661 --> 00:56:07,441 else put it there. 1083 00:56:07,441 --> 00:56:09,166 Yeah. 1084 00:56:09,166 --> 00:56:15,894 AUDIENCE: [INAUDIBLE] 1085 00:56:15,894 --> 00:56:17,561 DAVID J. MALAN: Really good observation. 1086 00:56:17,561 --> 00:56:21,071 So it's indeed the case that hi, unlike 50, 1087 00:56:21,071 --> 00:56:26,291 is ending up at a very low address, not the 0x7FF wherever it was. 1088 00:56:26,291 --> 00:56:29,261 That's actually because, long story short, strings 1089 00:56:29,261 --> 00:56:32,231 are often stored in a different part of the computer's memory-- 1090 00:56:32,231 --> 00:56:34,331 more on that later today-- for efficiency.
1091 00:56:34,331 --> 00:56:37,541 There's actually only going to be one copy of the word "hi" and exclamation 1092 00:56:37,541 --> 00:56:40,821 point, and the computer is going to tuck it at the beginning of my memory, 1093 00:56:40,821 --> 00:56:43,751 but other values like ints and floats and the 1094 00:56:43,751 --> 00:56:46,391 like-- they end up elsewhere in memory, at higher addresses, by convention. 1095 00:56:46,391 --> 00:56:49,641 But a good observation, because that is consistent here. 1096 00:56:49,641 --> 00:56:53,111 All right, so a couple final details then, on what's been going on here. 1097 00:56:53,111 --> 00:56:58,691 Let me go ahead and claim that we implemented char star-- 1098 00:56:58,691 --> 00:57:01,391 or rather, string as a char star as follows. 1099 00:57:01,391 --> 00:57:03,731 As of last week we were writing this code. 1100 00:57:03,731 --> 00:57:07,961 As of this week, we can now start writing this code because string, 1101 00:57:07,961 --> 00:57:11,541 specifically, we invented in the CS50 library. 1102 00:57:11,541 --> 00:57:14,891 But it turns out you've seen a way of inventing your own data types. 1103 00:57:14,891 --> 00:57:16,631 Recall this thing here. 1104 00:57:16,631 --> 00:57:20,861 We played around last time with data structures, or the struct keyword in C, 1105 00:57:20,861 --> 00:57:24,641 and briefly the typedef keyword, which defines a type for you. 1106 00:57:24,641 --> 00:57:26,651 And if I highlight what's interesting here, 1107 00:57:26,651 --> 00:57:30,341 the way we invented a person data type last time 1108 00:57:30,341 --> 00:57:33,401 was to define a person as having two variables inside of it-- 1109 00:57:33,401 --> 00:57:38,598 a structure that encapsulates a name and encapsulates a number. 1110 00:57:38,598 --> 00:57:41,681 Now even though the syntax is a little different today because of the star 1111 00:57:41,681 --> 00:57:47,771 thing, notice that this could be a similar application of that idea.
1112 00:57:47,771 --> 00:57:52,061 If I want to create a type called string, highlighted in yellow here, 1113 00:57:52,061 --> 00:57:56,231 then I use typedef to make it defined to be char star. 1114 00:57:56,231 --> 00:57:59,951 So this is literally all that has ever been in CS50.h, 1115 00:57:59,951 --> 00:58:02,771 in addition to those prototypes of functions we've talked about. 1116 00:58:02,771 --> 00:58:05,831 typedef char star string is a one-line piece of code 1117 00:58:05,831 --> 00:58:10,558 that brings the word string as a data type into existence, 1118 00:58:10,558 --> 00:58:12,141 and that's all that's ever been there. 1119 00:58:12,141 --> 00:58:15,281 But the star, the char star, is just too much in week 1. 1120 00:58:15,281 --> 00:58:18,671 We wait until this point to peel back that layer. 1121 00:58:18,671 --> 00:58:21,161 Are there any questions, then, on what a string is? 1122 00:58:21,161 --> 00:58:23,741 What star or the ampersand are doing? 1123 00:58:23,741 --> 00:58:25,511 Yeah. 1124 00:58:25,511 --> 00:58:28,608 AUDIENCE: [INAUDIBLE] 1125 00:58:28,608 --> 00:58:29,691 DAVID J. MALAN: Oh my God. 1126 00:58:29,691 --> 00:58:31,071 Massive spoiler, but yes. 1127 00:58:31,071 --> 00:58:34,671 If that is-- is that why when you compare two strings as I briefly 1128 00:58:34,671 --> 00:58:38,671 did, or almost did, problems arise? 1129 00:58:38,671 --> 00:58:40,971 And in fact yes, last week we used str compare-- 1130 00:58:40,971 --> 00:58:45,351 STRCMP-- for a very deliberate reason because yes, the spoiler is I 1131 00:58:45,351 --> 00:58:49,941 accidentally would have compared two addresses in memory, not the strings 1132 00:58:49,941 --> 00:58:52,111 at those addresses. 1133 00:58:52,111 --> 00:58:53,251 Other questions here. 1134 00:58:55,213 --> 00:58:58,171 All right, well, before we give ourselves maybe a 10 minute break here, 1135 00:58:58,171 --> 00:58:59,401 we have lots of pieces of paper.
1136 00:58:59,401 --> 00:59:02,191 If anyone wants to come on up and play with this big stack of Post-Its, 1137 00:59:02,191 --> 00:59:04,201 if you want to make your own eight by eight grid of something 1138 00:59:04,201 --> 00:59:07,261 to share with the class if you're artistically inclined, come on up. 1139 00:59:07,261 --> 00:59:09,991 Otherwise, let's take 10 minutes and we'll return after 10. 1140 00:59:09,991 --> 00:59:14,911 All right, so let's come back to this question of how 1141 00:59:14,911 --> 00:59:17,881 we can start to use these pointers and these addresses, ultimately 1142 00:59:17,881 --> 00:59:18,971 in an interesting way. 1143 00:59:18,971 --> 00:59:21,211 The goal ultimately next week is going to be 1144 00:59:21,211 --> 00:59:24,931 to use these addresses to really stitch together more complicated data 1145 00:59:24,931 --> 00:59:28,261 structures than just persons, like last week, or candidates 1146 00:59:28,261 --> 00:59:30,061 in the context of an electoral algorithm, 1147 00:59:30,061 --> 00:59:33,631 if you will, and actually really use our memory in the most versatile way 1148 00:59:33,631 --> 00:59:36,691 to represent not just images but maybe videos 1149 00:59:36,691 --> 00:59:39,191 and other two-dimensional structures as well. 1150 00:59:39,191 --> 00:59:41,581 But for now, let's come back to this address example, 1151 00:59:41,581 --> 00:59:46,561 whittle it down to just a hi initially, and see what's going on again, here 1152 00:59:46,561 --> 00:59:47,461 underneath the hood. 1153 00:59:47,461 --> 00:59:50,401 So let me re-add the CS50 library just so we 1154 00:59:50,401 --> 00:59:54,031 use our synonym for a moment, that is the word string, 1155 00:59:54,031 --> 00:59:56,161 and I'll redefine s as a string. 1156 00:59:56,161 --> 00:59:58,831 And what I didn't mention before is that these double quotes 1157 00:59:58,831 --> 01:00:01,681 that you've been using for some time are actually a little special.
1158 01:00:01,681 --> 01:00:04,921 The double quotes are a clue to the compiler 1159 01:00:04,921 --> 01:00:09,311 that what is between them is in fact a string as we now know it, 1160 01:00:09,311 --> 01:00:12,571 which means the compiler will do all the work of figuring out 1161 01:00:12,571 --> 01:00:15,331 where to put the h, the i, the exclamation point, 1162 01:00:15,331 --> 01:00:18,361 and even adding for you automatically a backslash zero. 1163 01:00:18,361 --> 01:00:20,581 And what the compiler will do for you, too, 1164 01:00:20,581 --> 01:00:23,461 is figure out what address all four of those chars 1165 01:00:23,461 --> 01:00:27,331 ended up at and store it for you in the variable s. 1166 01:00:27,331 --> 01:00:31,531 So that's why it just happens with strings without using ampersands 1167 01:00:31,531 --> 01:00:35,911 or even stars explicitly, but the star at least has been there because again, 1168 01:00:35,911 --> 01:00:38,401 string is just synonymous now with char star. 1169 01:00:38,401 --> 01:00:42,371 It's not really as readable, but it is now the same idea. 1170 01:00:42,371 --> 01:00:44,911 So I'll leave string in place just to do something week 1171 01:00:44,911 --> 01:00:48,581 1 style here for a moment, and let's go ahead and print out a few characters. 1172 01:00:48,581 --> 01:00:54,031 So I'm going to use %c this time, and I'm going to print out s bracket zero 1173 01:00:54,031 --> 01:00:59,161 and then I'm going to print out s bracket one and s bracket two, 1174 01:00:59,161 --> 01:01:03,091 literally doing week three style from last week-- 1175 01:01:03,091 --> 01:01:07,921 a printing of every character in s as though it were an array. 1176 01:01:07,921 --> 01:01:11,221 So ./address should give me h-i exclamation point. 
1177 01:01:11,221 --> 01:01:14,461 And if I really want to get curious, technically speaking, 1178 01:01:14,461 --> 01:01:18,691 I could print out one more location, and let me go ahead and recompile, 1179 01:01:18,691 --> 01:01:24,211 make address ./address and there is, it would seem, the backslash zero. 1180 01:01:24,211 --> 01:01:29,641 I'm not seeing zero because I didn't type literally the zero char in ASCII, 1181 01:01:29,641 --> 01:01:33,331 it's literally eight zero bits which are technically unprintable, 1182 01:01:33,331 --> 01:01:34,961 if you will, in printf speak. 1183 01:01:34,961 --> 01:01:37,351 And so what I'm seeing here is like a blank symbol. 1184 01:01:37,351 --> 01:01:39,541 That just means there is something else there-- 1185 01:01:39,541 --> 01:01:43,801 it's apparently all eight zero bits, but they are there 1186 01:01:43,801 --> 01:01:46,571 even though we're not seeing them literally right now. 1187 01:01:46,571 --> 01:01:49,211 Well, let's go ahead and peel back one of these layers 1188 01:01:49,211 --> 01:01:53,131 and let me go ahead and get rid of the CS50 library and get rid of, 1189 01:01:53,131 --> 01:01:56,551 therefore, the word string because again, henceforth it's just char star. 1190 01:01:56,551 --> 01:01:57,901 Nothing else is different. 1191 01:01:57,901 --> 01:02:00,781 I'm going to now do make address, ./address, 1192 01:02:00,781 --> 01:02:02,251 and it's the same exact thing. 1193 01:02:02,251 --> 01:02:05,621 And now, let's just focus on the hi rather than even worry about that. 1194 01:02:05,621 --> 01:02:10,411 So I'm going to recompile one last time and now I have h-i exclamation point. 1195 01:02:10,411 --> 01:02:15,001 Well, it turns out that the array notation we used last week 1196 01:02:15,001 --> 01:02:17,611 was technically some of this syntactic sugar. 
1197 01:02:17,611 --> 01:02:20,821 Sort of a neat way to use syntax in a useful way, 1198 01:02:20,821 --> 01:02:26,431 but we can see more explicitly today what the square brackets for a string 1199 01:02:26,431 --> 01:02:28,061 are actually doing. 1200 01:02:28,061 --> 01:02:29,801 Let me go ahead and do this. 1201 01:02:29,801 --> 01:02:35,041 Let me adventurously say I want to print out not s bracket 1202 01:02:35,041 --> 01:02:40,831 zero, but I want to print out whatever the first character of s is. 1203 01:02:40,831 --> 01:02:43,081 So to be clear, what is s now? 1204 01:02:43,081 --> 01:02:44,431 It's the address of a string. 1205 01:02:44,431 --> 01:02:45,931 OK, but what is s, really? 1206 01:02:45,931 --> 01:02:49,441 s is the address of the first char in a string 1207 01:02:49,441 --> 01:02:52,441 and again, that's sufficient for defining a string because eventually 1208 01:02:52,441 --> 01:02:55,361 the computer will see that there's a backslash zero at the end of it. 1209 01:02:55,361 --> 01:03:01,241 So s is specifically the address of the first character in a string. 1210 01:03:01,241 --> 01:03:04,291 So that means, using my new syntax, if I want 1211 01:03:04,291 --> 01:03:07,583 to print out that first character I can print out star 1212 01:03:07,583 --> 01:03:11,473 s, because recall that star is the dereference operator when you don't 1213 01:03:11,473 --> 01:03:13,681 repeat the word char, you don't repeat the word int-- 1214 01:03:13,681 --> 01:03:15,301 you just use the star here. 1215 01:03:15,301 --> 01:03:17,821 That means go to that address. 1216 01:03:17,821 --> 01:03:22,651 Similarly, if I, in my newfound knowledge of how strings work, 1217 01:03:22,651 --> 01:03:26,281 know that the h comes first, then the i right after it, 1218 01:03:26,281 --> 01:03:30,151 then the exclamation point, then the backslash zero, contiguously 1219 01:03:30,151 --> 01:03:33,931 one byte apart, I could start to do some arithmetic.
1220 01:03:33,931 --> 01:03:39,571 I could go to s plus 1 byte and print out the second character, 1221 01:03:39,571 --> 01:03:43,321 and I could print out whatever is at s plus 2-- 1222 01:03:43,321 --> 01:03:46,591 in fact, doing what's generally known as pointer arithmetic. 1223 01:03:46,591 --> 01:03:49,591 Literally treating pointers as the numbers they are-- 1224 01:03:49,591 --> 01:03:52,831 hexadecimal or decimal, doesn't really matter-- it's still just numbers. 1225 01:03:52,831 --> 01:03:55,661 And go ahead and add one byte or two bytes 1226 01:03:55,661 --> 01:03:58,151 to them to start at the beginning of a string 1227 01:03:58,151 --> 01:04:00,831 and just poke around from left to right. 1228 01:04:00,831 --> 01:04:04,901 So this now is equivalent to what we did last week using square bracket 1229 01:04:04,901 --> 01:04:09,671 notation, but now I'm reimplementing that same idea with this lower level 1230 01:04:09,671 --> 01:04:13,821 plumbing, understanding ampersands and stars now a little bit more, 1231 01:04:13,821 --> 01:04:16,601 so if I remake this program and do ./address, 1232 01:04:16,601 --> 01:04:19,128 I should still see h-i exclamation point. 1233 01:04:19,128 --> 01:04:21,461 But what I'm really doing is just kind of demonstrating, 1234 01:04:21,461 --> 01:04:24,851 hopefully, my understanding of what really 1235 01:04:24,851 --> 01:04:26,711 is going on in the computer's memory. 1236 01:04:26,711 --> 01:04:29,231 Now, programmers who are maybe trying to show off 1237 01:04:29,231 --> 01:04:30,611 might actually write this syntax. 1238 01:04:30,611 --> 01:04:33,236 I think the more common syntax would be what we did last week-- 1239 01:04:33,236 --> 01:04:34,971 s bracket zero, s bracket one. 1240 01:04:34,971 --> 01:04:35,471 Why? 1241 01:04:35,471 --> 01:04:37,346 It's just a little more readable and we don't 1242 01:04:37,346 --> 01:04:41,531 need to brag about or care about this underlying representation.
1243 01:04:41,531 --> 01:04:44,411 The square brackets last week were an abstraction, if you will, 1244 01:04:44,411 --> 01:04:46,721 on top of what is lower level math. 1245 01:04:46,721 --> 01:04:49,361 But that's all that's going on underneath the hood. 1246 01:04:49,361 --> 01:04:52,811 We're poking around from byte to byte to byte. 1247 01:04:52,811 --> 01:04:58,221 All right, let me pause here, see if there's any questions on that one. 1248 01:04:58,221 --> 01:05:00,931 Any questions on this? 1249 01:05:00,931 --> 01:05:03,651 Let's do one more then, just to demonstrate that this is not 1250 01:05:03,651 --> 01:05:05,171 even specific to strings. 1251 01:05:05,171 --> 01:05:07,161 Let me go ahead and get rid of all of this 1252 01:05:07,161 --> 01:05:11,541 and let me give myself an array of numbers like I did last week. 1253 01:05:11,541 --> 01:05:13,821 So if I'm going to declare all the numbers 1254 01:05:13,821 --> 01:05:16,521 at once using this funky curly brace notation, 1255 01:05:16,521 --> 01:05:19,971 I can do like 4, 6, 8, 2, 7, 5, 0. 1256 01:05:19,971 --> 01:05:24,051 So seven different numbers inside of an array that's automatically 1257 01:05:24,051 --> 01:05:25,071 initialized like this. 1258 01:05:25,071 --> 01:05:27,131 I don't, strictly speaking, need to say seven. 1259 01:05:27,131 --> 01:05:28,881 The compiler is smart enough to figure out 1260 01:05:28,881 --> 01:05:31,251 how many numbers I put with commas between them, 1261 01:05:31,251 --> 01:05:35,751 and that just gives me an array containing 4, 6, 8, 2, 7, 5, 0. 1262 01:05:35,751 --> 01:05:39,201 So it turns out I can print each of these numbers in the familiar way. 1263 01:05:39,201 --> 01:05:45,021 I can do a printf of %i backslash n, and I can print numbers bracket zero, 1264 01:05:45,021 --> 01:05:49,041 and let me just do some quick copy/paste just to print the first three of these. 1265 01:05:49,041 --> 01:05:53,881 Theoretically, that should print out 4, 6, 8, and so forth.
1266 01:05:53,881 --> 01:05:57,021 But I can do the same sort of manipulation understanding 1267 01:05:57,021 --> 01:05:59,931 what pointers now are, using pointer arithmetic. 1268 01:05:59,931 --> 01:06:03,741 So let me actually unwind this and just go back to one printf, 1269 01:06:03,741 --> 01:06:07,191 and instead of printing numbers bracket zero like I might have last week, 1270 01:06:07,191 --> 01:06:11,361 let me just go and print out whatever is at that address-- 1271 01:06:11,361 --> 01:06:13,431 so asterisk numbers. 1272 01:06:13,431 --> 01:06:15,861 Let me then print out the second digit, which 1273 01:06:15,861 --> 01:06:21,051 is going to be whatever is at numbers plus 1, and then let me do this further 1274 01:06:21,051 --> 01:06:25,021 and do whatever is at numbers plus 2, and if I really want to repeat this, 1275 01:06:25,021 --> 01:06:27,261 let me do it four more times and do what's 1276 01:06:27,261 --> 01:06:31,881 at location three, four, five, and six. 1277 01:06:31,881 --> 01:06:35,631 And that's seven total numbers because I started counting at zero. 1278 01:06:35,631 --> 01:06:37,201 So let me just quickly run this. 1279 01:06:37,201 --> 01:06:39,651 Make address, ./address. 1280 01:06:39,651 --> 01:06:42,381 There are those seven digits being printed. 1281 01:06:42,381 --> 01:06:46,401 But there's something subtle but also useful here. 1282 01:06:46,401 --> 01:06:47,541 Each of these digits-- 1283 01:06:47,541 --> 01:06:49,341 4, 6, 8, 2, 7, 5, 0-- 1284 01:06:49,341 --> 01:06:49,891 is an int. 1285 01:06:49,891 --> 01:06:50,391 Why? 1286 01:06:50,391 --> 01:06:52,531 Because I made an array of integers. 1287 01:06:52,531 --> 01:06:57,181 But think back-- how big is a typical integer, have we claimed? 1288 01:06:57,181 --> 01:07:02,821 Four bytes, or 32 bits, so it's worth noting that I don't really 1289 01:07:02,821 --> 01:07:04,841 need to worry about that detail.
1290 01:07:04,841 --> 01:07:10,119 Notice that I did not do plus 4, plus 8, plus 12, plus 16, plus 20. 1291 01:07:10,119 --> 01:07:11,911 I, the programmer, strictly speaking, don't 1292 01:07:11,911 --> 01:07:14,191 need to worry about how big the data type is. 1293 01:07:14,191 --> 01:07:16,291 This is the power of pointer arithmetic. 1294 01:07:16,291 --> 01:07:21,931 The compiler is smart enough to know that if you add 1 to this pointer, 1295 01:07:21,931 --> 01:07:26,441 that is the same as saying go one more piece of data-- 1296 01:07:26,441 --> 01:07:27,481 not just one byte-- 1297 01:07:27,481 --> 01:07:29,251 so if it's an int, move four. 1298 01:07:29,251 --> 01:07:30,871 If it's a second int, move eight. 1299 01:07:30,871 --> 01:07:32,601 If it's a third int, move 12. 1300 01:07:32,601 --> 01:07:35,821 Pointer arithmetic handles that annoying arithmetic for you 1301 01:07:35,821 --> 01:07:38,461 so you can just think of this as a number after a number 1302 01:07:38,461 --> 01:07:41,821 after a number that are back to back to back but not one byte apart, 1303 01:07:41,821 --> 01:07:43,171 but four bytes apart. 1304 01:07:43,171 --> 01:07:47,201 Which is only to say plus 1, plus 2, plus 3 works no matter the data type. 1305 01:07:47,201 --> 01:07:47,701 Why? 1306 01:07:47,701 --> 01:07:53,121 Because the compiler knows what type of data you're talking about. 1307 01:07:53,121 --> 01:07:56,511 Now, there's one other detail I should reveal here 1308 01:07:56,511 --> 01:07:58,671 that I've taken for granted. 1309 01:07:58,671 --> 01:08:01,641 In the past I was using double quotes to represent strings, 1310 01:08:01,641 --> 01:08:04,371 and I claim that the compiler's smart enough to realize that oh, 1311 01:08:04,371 --> 01:08:08,911 if I have double quote hi, that means it's an array of h-i exclamation point, 1312 01:08:08,911 --> 01:08:10,431 and then the backslash zero. 1313 01:08:10,431 --> 01:08:12,801 Notice this usefulness. 
1314 01:08:12,801 --> 01:08:18,561 It turns out that you can actually treat arrays as though the name of the array 1315 01:08:18,561 --> 01:08:20,781 is itself a pointer, and this is actually 1316 01:08:20,781 --> 01:08:23,151 going to be something useful in upcoming problems 1317 01:08:23,151 --> 01:08:26,721 when we want to pass arrays around in the computer's memory. 1318 01:08:26,721 --> 01:08:30,463 Notice that strictly speaking on line five, there's no pointers going on. 1319 01:08:30,463 --> 01:08:32,421 There's no star, there's no ampersand-- there's 1320 01:08:32,421 --> 01:08:35,661 nothing new there, and yet instantly on line seven 1321 01:08:35,661 --> 01:08:40,491 I'm pretending that it is the address, and this is actually OK. 1322 01:08:40,491 --> 01:08:44,391 It turns out that an array really can be treated 1323 01:08:44,391 --> 01:08:47,881 as the address of the first element in that array. 1324 01:08:47,881 --> 01:08:52,079 The difference is that there's no secret backslash zero anywhere. 1325 01:08:52,079 --> 01:08:53,871 This is just part of the phone number here, 1326 01:08:53,871 --> 01:08:56,691 the ending in zero-- that's not like a special backslash zero. 1327 01:08:56,691 --> 01:08:59,721 So this is something we're going to take advantage of too, before long. 1328 01:08:59,721 --> 01:09:03,441 There's this interrelationship between addresses and arrays 1329 01:09:03,441 --> 01:09:08,121 that just generally allows you to treat one as though it is the other, 1330 01:09:08,121 --> 01:09:10,521 but the math is taken care of for you. 1331 01:09:10,521 --> 01:09:14,961 Are any questions then on this before we start to solve some bigger problems? 1332 01:09:14,961 --> 01:09:16,761 Yeah. 1333 01:09:16,761 --> 01:09:23,784 AUDIENCE: [INAUDIBLE] 1334 01:09:23,784 --> 01:09:24,951 DAVID J. MALAN: Potentially. 1335 01:09:24,951 --> 01:09:28,911 If you go beyond the end of an array, you might get a segmentation fault. 
1336 01:09:28,911 --> 01:09:32,181 The problem is that that symptom is sometimes nondeterministic, 1337 01:09:32,181 --> 01:09:35,181 which means that sometimes it will happen, sometimes it won't. 1338 01:09:35,181 --> 01:09:39,141 It often depends on how far off the end of the array you actually go. 1339 01:09:39,141 --> 01:09:41,631 You'll often not induce the segmentation fault 1340 01:09:41,631 --> 01:09:44,421 if you just poke a little too far, but if you go way too far 1341 01:09:44,421 --> 01:09:45,831 it quite likely will. 1342 01:09:45,831 --> 01:09:49,161 But we'll give you a tool today actually for detecting and solving 1343 01:09:49,161 --> 01:09:51,181 exactly that kind of situation. 1344 01:09:51,181 --> 01:09:54,091 So let's go ahead now and do something a little different in code, 1345 01:09:54,091 --> 01:09:56,601 but that actually comes back to that spoiler from earlier. 1346 01:09:56,601 --> 01:10:01,471 Let me go ahead and create a program called compare.c, and in this program 1347 01:10:01,471 --> 01:10:04,641 I'm going to go ahead and allow myself the CS50 library, 1348 01:10:04,641 --> 01:10:08,121 not so much for string but so that I can actually use GetInt still, 1349 01:10:08,121 --> 01:10:12,440 which is way easier than the way we'll see that C normally lets you get input. 
1350 01:10:12,440 --> 01:10:15,471 Let me give myself stdio.h, do an int main(void), 1351 01:10:15,471 --> 01:10:18,381 not worrying about command line arguments today, and let me go ahead 1352 01:10:18,381 --> 01:10:22,701 and get an int i using get int, and ask the human for the value of i, 1353 01:10:22,701 --> 01:10:28,461 then let me give myself an int j, ask the user for another int, calling it j, 1354 01:10:28,461 --> 01:10:32,631 and then let me go ahead and kind of naively, but to your point earlier, 1355 01:10:32,631 --> 01:10:36,051 if i equals equals j, then let's go ahead 1356 01:10:36,051 --> 01:10:41,121 and print out something like "same," backslash n, else let's go ahead 1357 01:10:41,121 --> 01:10:44,791 and print out "different" if they are not, in fact, the same. 1358 01:10:44,791 --> 01:10:48,951 So that would seem to be a program that compares the value of two integers. 1359 01:10:48,951 --> 01:10:51,261 All right, so let's go ahead and run make compare-- 1360 01:10:51,261 --> 01:10:53,451 so far so good-- ./compare. 1361 01:10:53,451 --> 01:10:56,991 OK, i will be 50, j will be 50-- 1362 01:10:56,991 --> 01:10:58,041 they're the same. 1363 01:10:58,041 --> 01:10:59,221 Let's do it once more. 1364 01:10:59,221 --> 01:11:02,239 i will be 50, j will be 42. 1365 01:11:02,239 --> 01:11:03,031 They are different. 1366 01:11:03,031 --> 01:11:07,341 So so far, so good in this first version of comparison. 1367 01:11:07,341 --> 01:11:10,411 But as you might see where I'm going with this, 1368 01:11:10,411 --> 01:11:14,151 let's move away from integers and let's actually change these things to char-- 1369 01:11:14,151 --> 01:11:15,301 to strings. 1370 01:11:15,301 --> 01:11:17,901 So I could do string s over here-- 1371 01:11:17,901 --> 01:11:20,481 GetString s over here. 1372 01:11:20,481 --> 01:11:27,351 Then I could do string t over here, and GetString over here, 1373 01:11:27,351 --> 01:11:30,081 asking the user for t this time, here. 
1374 01:11:30,081 --> 01:11:31,611 And then I can compare the two. 1375 01:11:31,611 --> 01:11:33,458 If s equals equals t-- 1376 01:11:33,458 --> 01:11:34,791 and this is a common convention. 1377 01:11:34,791 --> 01:11:37,821 If you've used s for string already you can use t for the next one, at least 1378 01:11:37,821 --> 01:11:39,441 for simple demonstrations like this. 1379 01:11:39,441 --> 01:11:42,566 I'm going to compare the two, just like I did for ints, which worked great. 1380 01:11:42,566 --> 01:11:46,521 Make compare-- so far so good-- ./address-- 1381 01:11:46,521 --> 01:11:47,361 oh, sorry. 1382 01:11:47,361 --> 01:11:49,221 Wrong program-- ./compare. 1383 01:11:49,221 --> 01:11:52,431 Let me go ahead and type in something like 1384 01:11:52,431 --> 01:11:57,401 hi, exclamation point and bye, exclamation point, which of course 1385 01:11:57,401 --> 01:11:59,301 should definitely be different. 1386 01:11:59,301 --> 01:12:05,121 Let me run it again with hi, exclamation point and hi, exclamation point. 1387 01:12:05,121 --> 01:12:07,071 Different-- maybe I messed up. 1388 01:12:07,071 --> 01:12:10,181 Let's maybe do it lowercase, maybe that'll fix. 1389 01:12:10,181 --> 01:12:12,501 But no, those two are different. 1390 01:12:12,501 --> 01:12:16,481 So to come back to what I described as a spoiler earlier, what's 1391 01:12:16,481 --> 01:12:20,659 the fundamental issue here, to be clear? 1392 01:12:20,659 --> 01:12:22,701 Why is it saying different even though I'm pretty 1393 01:12:22,701 --> 01:12:24,118 sure I typed the same thing twice. 1394 01:12:24,118 --> 01:12:26,181 Yeah. 
1395 01:12:26,181 --> 01:12:29,601 Yeah, this is where it's now useful to know that string has been 1396 01:12:29,601 --> 01:12:33,063 an abstraction-- a training wheel, if you will-- and if we take that away-- 1397 01:12:33,063 --> 01:12:35,271 still use GetString because that's convenient still-- 1398 01:12:35,271 --> 01:12:38,061 but if I change string to be char star, it's 1399 01:12:38,061 --> 01:12:44,301 a little more explicit as to what s and what t are. s is a pointer to a char, 1400 01:12:44,301 --> 01:12:46,761 that is the address of a char. t is a pointer 1401 01:12:46,761 --> 01:12:48,921 to a char, that is the address of a char. 1402 01:12:48,921 --> 01:12:52,071 Specifically, the first character in s and the first character 1403 01:12:52,071 --> 01:12:53,851 in t, respectively. 1404 01:12:53,851 --> 01:12:56,076 So if I'm comparing these two it should stand 1405 01:12:56,076 --> 01:12:57,951 to reason that they're going to be different. 1406 01:12:57,951 --> 01:12:58,451 Why? 1407 01:12:58,451 --> 01:13:02,061 Because s might end up here in memory and t might end up here in memory. 1408 01:13:02,061 --> 01:13:05,181 Each time I call GetString, it is not smart enough or advanced enough 1409 01:13:05,181 --> 01:13:07,171 to know that, wait a minute-- you typed the same thing. 1410 01:13:07,171 --> 01:13:08,691 I'm just going to hand you back the same address. 1411 01:13:08,691 --> 01:13:11,511 That doesn't happen because we did not design GetString that way. 1412 01:13:11,511 --> 01:13:15,141 Each time I call GetString, it returns, apparently, 1413 01:13:15,141 --> 01:13:17,901 a different copy of the string that was typed in. 1414 01:13:17,901 --> 01:13:20,211 A hi over here and a hi over here. 1415 01:13:20,211 --> 01:13:22,791 They might look the same to the human but to the computer 1416 01:13:22,791 --> 01:13:26,691 they are different chunks of memory, and therefore at different addresses. 
1417 01:13:26,691 --> 01:13:30,181 And here, too, we can reveal what is GetString returning? 1418 01:13:30,181 --> 01:13:34,161 Well, up until today it was returning a string, so to speak. 1419 01:13:34,161 --> 01:13:35,661 That's not really a thing. 1420 01:13:35,661 --> 01:13:38,001 Technically, what GetString has always been 1421 01:13:38,001 --> 01:13:43,371 doing is returning the address of the first char in a string 1422 01:13:43,371 --> 01:13:47,181 and trusting that we put a backslash zero at the end of whatever the human 1423 01:13:47,181 --> 01:13:51,411 typed in, and that's enough now for printf, for strlen, for you 1424 01:13:51,411 --> 01:13:53,961 to know where a string begins and ends. 1425 01:13:53,961 --> 01:13:57,711 So GetString has actually always returned a pointer. 1426 01:13:57,711 --> 01:14:01,101 It has not returned a quote unquote string per se, 1427 01:14:01,101 --> 01:14:04,401 but there are functions that can solve this comparison for us. 1428 01:14:04,401 --> 01:14:07,501 Recall that I could do something like this. 1429 01:14:07,501 --> 01:14:10,431 I could actually go in here and I could-- 1430 01:14:10,431 --> 01:14:11,641 let's see, where was it? 1431 01:14:11,641 --> 01:14:18,981 So if I include str compare here and use it to pass in two values, s and t, 1432 01:14:18,981 --> 01:14:22,701 let's see now what happens when I make compare. 1433 01:14:22,701 --> 01:14:26,211 Implicitly declaring library function str compare with type int-- 1434 01:14:26,211 --> 01:14:27,321 and well, there's a star. 1435 01:14:27,321 --> 01:14:30,801 So you might have seen this error before and you might have ignored most of it, 1436 01:14:30,801 --> 01:14:35,281 but there's some evidence of stars or pointers going on here. 1437 01:14:35,281 --> 01:14:37,771 It looks like I didn't include the string.h header file, 1438 01:14:37,771 --> 01:14:38,961 so that's an easy fix. 
1439 01:14:38,961 --> 01:14:43,551 Include string.h which, despite its name, does not create a data type 1440 01:14:43,551 --> 01:14:46,431 called string, it just has string-related functions in it 1441 01:14:46,431 --> 01:14:47,511 like str compare. 1442 01:14:47,511 --> 01:14:49,161 Let's make compare again. 1443 01:14:49,161 --> 01:14:51,231 Now it compiles, ./compare. 1444 01:14:51,231 --> 01:14:55,011 Now let's type in hi, exclamation point and even the same thing again. 1445 01:14:55,011 --> 01:14:58,641 These are now-- oh, I used it wrong. 1446 01:14:58,641 --> 01:15:00,364 OK, user error. 1447 01:15:00,364 --> 01:15:02,781 That was supposed to be impressive, but it's the opposite. 1448 01:15:02,781 --> 01:15:05,101 What did I do wrong? 1449 01:15:05,101 --> 01:15:06,201 What did I do wrong here? 1450 01:15:06,201 --> 01:15:07,463 Yeah. 1451 01:15:07,463 --> 01:15:08,951 Yeah. 1452 01:15:08,951 --> 01:15:12,258 AUDIENCE: [INAUDIBLE] 1453 01:15:12,258 --> 01:15:14,591 DAVID J. MALAN: Yeah, it returns three different values. 1454 01:15:14,591 --> 01:15:18,371 Zero if they're the same, positive if one comes before the other, 1455 01:15:18,371 --> 01:15:20,061 negative if the opposite is true. 1456 01:15:20,061 --> 01:15:23,261 I just forgot that, so like I did last week correctly, 1457 01:15:23,261 --> 01:15:26,741 if I want to compare them for equality per the manual page, 1458 01:15:26,741 --> 01:15:29,421 I should be checking for zero as the return value. 1459 01:15:29,421 --> 01:15:32,591 Now make compare, ./compare, Enter. 1460 01:15:32,591 --> 01:15:35,261 Let's try it one last time-- hi and hi. 1461 01:15:35,261 --> 01:15:36,821 OK now, they're in fact the same. 1462 01:15:36,821 --> 01:15:38,231 And Justin, thank you. 1463 01:15:41,871 --> 01:15:44,751 And indeed, not that it's returning same all the time. 1464 01:15:44,751 --> 01:15:46,971 If I type in hi and then bye, it's indeed 1465 01:15:46,971 --> 01:15:49,261 noticing that difference as well.
1466 01:15:49,261 --> 01:15:53,251 Well, let me go ahead and do one other thing here. 1467 01:15:53,251 --> 01:15:55,501 Let's do one other thing. 1468 01:15:55,501 --> 01:15:59,001 Let me go ahead now and just reveal more pictorially what's going on. 1469 01:15:59,001 --> 01:16:02,331 Let's get rid of the string comparison and let's just print these things out. 1470 01:16:02,331 --> 01:16:06,111 The simple way to print this out would be with %s and again, %s is special-- 1471 01:16:06,111 --> 01:16:07,161 printf knows-- 1472 01:16:07,161 --> 01:16:10,341 to take an address, start there, and print every character up 1473 01:16:10,341 --> 01:16:13,741 until the backslash zero, so let's just hand it s and do that. 1474 01:16:13,741 --> 01:16:16,911 And then let's do one more, %s, t. 1475 01:16:16,911 --> 01:16:21,751 This is, again, sort of a mix of week 1 and this week 1476 01:16:21,751 --> 01:16:23,571 because I got rid of the word string. 1477 01:16:23,571 --> 01:16:28,711 I'm using char star, but I'm still using printf and %s in the same way. 1478 01:16:28,711 --> 01:16:32,331 Let me go ahead and run compare now, and if I type hi and hi, 1479 01:16:32,331 --> 01:16:34,291 I should see the same thing twice. 1480 01:16:34,291 --> 01:16:37,911 So they look the same, but here now we have the syntax today 1481 01:16:37,911 --> 01:16:40,291 to print out the actual addresses of these things. 1482 01:16:40,291 --> 01:16:44,721 So let me just change the s to a p, because p means don't go to the address 1483 01:16:44,721 --> 01:16:48,651 and print it, it means just print the address as a pointer. 1484 01:16:48,651 --> 01:16:53,421 So make compare, ./compare, and now let's type in hi, and once more, 1485 01:16:53,421 --> 01:16:57,831 and I should see, indeed, two slightly different addresses given 1486 01:16:57,831 --> 01:16:58,641 in hexadecimal.
1487 01:16:58,641 --> 01:17:00,951 One's got a B at the end, one's got an F at the end, 1488 01:17:00,951 --> 01:17:03,481 and they are indeed a few bytes apart. 1489 01:17:03,481 --> 01:17:06,706 So this is just confirming what our suspicions have actually been. 1490 01:17:06,706 --> 01:17:09,081 So what does this mean, perhaps in the computer's memory? 1491 01:17:09,081 --> 01:17:10,581 Well, let's take a look. 1492 01:17:10,581 --> 01:17:14,511 I've zoomed out so I have a little more squares to look at at once. 1493 01:17:14,511 --> 01:17:20,901 Here might be s in memory when I do string s equals, or char star s equals. 1494 01:17:20,901 --> 01:17:24,381 I get a variable that's of size 1, 2, 3, 4, 5, 6, 7, 8, because I 1495 01:17:24,381 --> 01:17:27,951 claimed earlier that on modern systems, pointers are generally eight bytes 1496 01:17:27,951 --> 01:17:30,261 nowadays so they can count even higher. 1497 01:17:30,261 --> 01:17:33,246 And inside of the computer's memory, also, might be hi. 1498 01:17:33,246 --> 01:17:35,871 And I don't know where it ends up so for the sake of discussion 1499 01:17:35,871 --> 01:17:36,801 it ended up down here. 1500 01:17:36,801 --> 01:17:39,761 That's what was free when I ran the program. 1501 01:17:39,761 --> 01:17:41,601 h-i exclamation point, backslash zero. 1502 01:17:41,601 --> 01:17:46,761 Maybe it ended up, for the sake of discussion, at 0x123, 4, 5, and 6. 1503 01:17:46,761 --> 01:17:51,801 So to be clear, what is s storing once the assignment 1504 01:17:51,801 --> 01:17:54,711 operator copies from right to left? 1505 01:17:54,711 --> 01:17:59,331 What is s storing if I advance one more slide? 1506 01:17:59,331 --> 01:18:01,451 Yeah. 1507 01:18:01,451 --> 01:18:05,261 0x123, the presumption being that if a string is 1508 01:18:05,261 --> 01:18:09,236 defined by the address of its first char and that address of its first char 1509 01:18:09,236 --> 01:18:13,691 is 0x123, then that's indeed what should be in the variable s. 
1510 01:18:13,691 --> 01:18:16,751 And so technically, that's what's been happening with that assignment 1511 01:18:16,751 --> 01:18:18,251 operator from right to left. 1512 01:18:18,251 --> 01:18:21,401 GetString indeed returns a string, so to speak, 1513 01:18:21,401 --> 01:18:25,241 but more properly it returns the address of a char. 1514 01:18:25,241 --> 01:18:28,721 What's been then copied from right to left using that assignment operator 1515 01:18:28,721 --> 01:18:31,601 all these weeks is indeed that address. 1516 01:18:31,601 --> 01:18:36,101 Now technically, we don't really need to care about where these addresses are. 1517 01:18:36,101 --> 01:18:38,951 It suffices to just think about them referentially, but let's 1518 01:18:38,951 --> 01:18:42,791 first consider where t might be. t is just another variable that I 1519 01:18:42,791 --> 01:18:44,441 created on my second line of code. 1520 01:18:44,441 --> 01:18:46,061 Maybe it ends up there, maybe somewhere else. 1521 01:18:46,061 --> 01:18:48,353 For the sake of discussion I'll draw it left and right. 1522 01:18:48,353 --> 01:18:51,771 Where did the second word end up that I typed in? 1523 01:18:51,771 --> 01:18:57,671 Well, suppose the second copy of hi ended up at 0x456, 457, 458, 459. 1524 01:18:57,671 --> 01:18:58,961 What ended up in t? 1525 01:18:58,961 --> 01:19:00,551 I'll pluck this one off myself. 1526 01:19:00,551 --> 01:19:02,621 0x456, presumably. 1527 01:19:02,621 --> 01:19:06,071 And so this is now a pictorial representation of why, 1528 01:19:06,071 --> 01:19:07,751 and let's abstract away everything else. 1529 01:19:07,751 --> 01:19:13,061 When I compared s against t using equals equals, based on the picture 1530 01:19:13,061 --> 01:19:14,591 they're obviously not the same. 1531 01:19:14,591 --> 01:19:16,751 One is over here, one is over here. 1532 01:19:16,751 --> 01:19:21,281 And per a moment ago, one is 0x123, the other is 0x456.
1533 01:19:21,281 --> 01:19:24,491 Yes, technically they're pointing at something that's the same, 1534 01:19:24,491 --> 01:19:27,971 but that just reveals how str compare works. 1535 01:19:27,971 --> 01:19:30,641 str compare is apparently a function that 1536 01:19:30,641 --> 01:19:33,881 takes in the address of a string as its argument 1537 01:19:33,881 --> 01:19:36,401 and the address of another string as its argument, 1538 01:19:36,401 --> 01:19:41,321 it goes to the first character in each of those strings, respectively, 1539 01:19:41,321 --> 01:19:43,511 and probably has a for loop or a while loop 1540 01:19:43,511 --> 01:19:46,421 and just goes from left to right, comparing, looking 1541 01:19:46,421 --> 01:19:50,141 for the same chars left and right, and if it doesn't notice any differences, 1542 01:19:50,141 --> 01:19:52,121 boom-- it returns zero. 1543 01:19:52,121 --> 01:19:56,481 If it does notice a difference it returns a positive or a negative value. 1544 01:19:56,481 --> 01:20:00,321 And that's very similar, recall, to how we implemented string length ourselves 1545 01:20:00,321 --> 01:20:00,821 last week. 1546 01:20:00,821 --> 01:20:03,731 I used a for loop, I was looking for a backslash zero. 1547 01:20:03,731 --> 01:20:09,521 str compare is probably a little similar in spirit, looping from left to right 1548 01:20:09,521 --> 01:20:13,001 but comparing, this time not just counting. 1549 01:20:13,001 --> 01:20:15,731 Are any questions then, on string comparison 1550 01:20:15,731 --> 01:20:18,821 and why it is that we use str compare and not equals equals? 1551 01:20:18,821 --> 01:20:20,013 Yeah. 1552 01:20:20,013 --> 01:20:22,249 AUDIENCE: Do pointers have addresses? 1553 01:20:22,249 --> 01:20:24,041 DAVID J. MALAN: Do pointers have addresses? 1554 01:20:24,041 --> 01:20:24,541 Yes. 1555 01:20:24,541 --> 01:20:29,291 So we won't do that today, but I could actually use the ampersand operator 1556 01:20:29,291 --> 01:20:30,821 on s or on t. 
1557 01:20:30,821 --> 01:20:34,421 That would give me the equivalent of a char star star 1558 01:20:34,421 --> 01:20:36,606 that itself could be stored elsewhere in memory. 1559 01:20:36,606 --> 01:20:37,481 That's where it ends. 1560 01:20:37,481 --> 01:20:39,671 We don't do that recursively forever. 1561 01:20:39,671 --> 01:20:42,611 There's star and there's star star, but yes, that is a thing 1562 01:20:42,611 --> 01:20:45,911 and it's very often useful in the context of two dimensional arrays, 1563 01:20:45,911 --> 01:20:49,181 which we haven't really talked about, but that is a feature of the language, 1564 01:20:49,181 --> 01:20:49,681 too. 1565 01:20:49,681 --> 01:20:50,711 But not today. 1566 01:20:50,711 --> 01:20:52,221 Good question. 1567 01:20:52,221 --> 01:20:55,271 All right, so what might we now do to take things up a notch? 1568 01:20:55,271 --> 01:20:57,791 Well, let's go ahead and implement a different program here 1569 01:20:57,791 --> 01:21:01,341 that maybe tries copying some values, just to demonstrate this. 1570 01:21:01,341 --> 01:21:05,081 Let me open up a file called, how about copy.c, 1571 01:21:05,081 --> 01:21:07,511 and I'm going to start off with a few includes. 1572 01:21:07,511 --> 01:21:11,291 So let's include the CS50 library just so we have a way of getting user input. 1573 01:21:11,291 --> 01:21:15,941 Let's include-- how about stdio as always, let's preemptively 1574 01:21:15,941 --> 01:21:18,711 include string.h and maybe one other in a moment. 1575 01:21:18,711 --> 01:21:21,711 Let's do int main(void) as before. 1576 01:21:21,711 --> 01:21:25,241 And then in here, let's get a string from the user and just 1577 01:21:25,241 --> 01:21:27,671 call it s for simplicity. 1578 01:21:27,671 --> 01:21:31,361 And heck, we can actually just call this char star if we want, 1579 01:21:31,361 --> 01:21:33,474 or string, since we're using the CS50 library. 1580 01:21:33,474 --> 01:21:34,641 But we'll come back to that.
1581 01:21:34,641 --> 01:21:38,231 Let's now make a copy of s and do s equals t, 1582 01:21:38,231 --> 01:21:42,891 using a single assignment operator and then let's check something like this. 1583 01:21:42,891 --> 01:21:47,831 Let's go into the first character of t, which is t bracket zero, 1584 01:21:47,831 --> 01:21:50,231 and then let's uppercase it using that function 1585 01:21:50,231 --> 01:21:55,571 that we've used in the past of toupper t bracket zero, semicolon. 1586 01:21:55,571 --> 01:21:57,231 And actually, I should go back up here. 1587 01:21:57,231 --> 01:22:01,468 If I'm using toupper or if you use tolower or isupper or islower-- 1588 01:22:01,468 --> 01:22:04,301 I might not remember this offhand, but it was in another header file 1589 01:22:04,301 --> 01:22:06,161 called C type dot h. 1590 01:22:06,161 --> 01:22:09,291 There was a bunch of helpful functions in that library as well. 1591 01:22:09,291 --> 01:22:14,096 Now at the very last line of the program let's just print out what both s and t 1592 01:22:14,096 --> 01:22:21,521 are by simply printing out %s for each of them, and t is %s also, not %t, 1593 01:22:21,521 --> 01:22:24,681 of course, and let's see what happens here. 1594 01:22:24,681 --> 01:22:26,471 So let me make copy-- 1595 01:22:26,471 --> 01:22:27,881 oh my God, so many mistakes. 1596 01:22:27,881 --> 01:22:29,271 What did I do wrong? 1597 01:22:29,271 --> 01:22:30,221 Oh. 1598 01:22:30,221 --> 01:22:31,301 OK, that was unintended. 1599 01:22:31,301 --> 01:22:34,851 String t equals s, sorry, so I'm creating two variables, 1600 01:22:34,851 --> 01:22:37,781 s and t respectively, and I'm copying s into t. 1601 01:22:37,781 --> 01:22:39,461 Make copy, Enter. 1602 01:22:39,461 --> 01:22:44,651 There we go. ./copy, and let's now type in, for instance, 1603 01:22:44,651 --> 01:22:48,521 how about hi exclamation point in all lowercase this time, 1604 01:22:48,521 --> 01:22:52,091 and now what gets printed? 
1605 01:22:52,091 --> 01:22:56,201 I don't think that's what I intended, so to speak, here. 1606 01:22:56,201 --> 01:23:00,021 Because notice that I got s from the user, so that checks out. 1607 01:23:00,021 --> 01:23:03,703 I then copied s into t, which looks correct. 1608 01:23:03,703 --> 01:23:05,411 That's what we always use assignment for. 1609 01:23:05,411 --> 01:23:09,191 Then I uppercase the first letter in t, but not s-- 1610 01:23:09,191 --> 01:23:10,331 at least in my code-- 1611 01:23:10,331 --> 01:23:14,051 then I printed s and t and then noticed, apparently, both s 1612 01:23:14,051 --> 01:23:17,921 and t got capitalized. 1613 01:23:17,921 --> 01:23:20,521 So if you're starting to get a little comfortable with what's 1614 01:23:20,521 --> 01:23:24,421 going on underneath the hood, what's the fundamental problem here? 1615 01:23:24,421 --> 01:23:28,223 Why did both get capitalized? 1616 01:23:28,223 --> 01:23:29,431 Why did both get capitalized? 1617 01:23:29,431 --> 01:23:30,121 Yeah, over here. 1618 01:23:30,121 --> 01:23:32,601 AUDIENCE: Could it be they're referencing the same address? 1619 01:23:32,601 --> 01:23:34,011 DAVID J. MALAN: Yeah, they're referencing the same address. 1620 01:23:34,011 --> 01:23:35,871 So C is really literal. 1621 01:23:35,871 --> 01:23:39,261 If you create another variable called t and you assign it the value of s, 1622 01:23:39,261 --> 01:23:41,871 you are literally assigning it the value in s, 1623 01:23:41,871 --> 01:23:44,761 which is 0x123 or something like that. 1624 01:23:44,761 --> 01:23:48,381 And so at that point in the story both s and t presumably 1625 01:23:48,381 --> 01:23:51,951 have a value of 0x123, which means they technically 1626 01:23:51,951 --> 01:23:56,061 point to the same h-i exclamation point in memory. 1627 01:23:56,061 --> 01:24:00,891 Nowhere did I tell the computer to give me a copy of a h-i exclamation point 1628 01:24:00,891 --> 01:24:04,131 per se, I literally said just copy s.
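The behavior just described can be reproduced with a short sketch. The function name capitalize_through_alias is hypothetical, and the sketch assumes the caller passes writable memory (an array, not a string literal); it is meant only to show that assigning one pointer to another copies the address and nothing else.

```c
#include <ctype.h>

// Hypothetical helper mirroring the buggy copy.c: t receives the address
// stored in s, not a new chunk of memory. It returns t so that a caller
// can confirm it holds the same address as s.
char *capitalize_through_alias(char *s)
{
    char *t = s; // copies the pointer only, not the characters

    if (t[0] != '\0')
    {
        // Writing through t also changes what s points to.
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

Called on an array holding hi!, the function returns the very same address it was given, and the original array now starts with a capital H.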
1629 01:24:04,131 --> 01:24:08,391 So here's where an understanding of what s literally is explains the situation. 1630 01:24:08,391 --> 01:24:10,761 I'm only copying the pointers. 1631 01:24:10,761 --> 01:24:12,601 So what actually went on in memory? 1632 01:24:12,601 --> 01:24:14,241 Let's take a look here at this grid. 1633 01:24:14,241 --> 01:24:17,091 If I created s initially, maybe it ends up here. 1634 01:24:17,091 --> 01:24:20,601 And I created hi in lowercase, and it ended up down here. 1635 01:24:20,601 --> 01:24:26,751 Then the address was, again, like 0x123, 0x124, 0x125, 0x126; 0x123 is what's in s. 1636 01:24:26,751 --> 01:24:29,451 If then I create a second variable called t, 1637 01:24:29,451 --> 01:24:33,681 and I call it a string, a.k.a. char star, maybe it again ends up here. 1638 01:24:33,681 --> 01:24:39,261 But when I copy s into t by doing t equals s semicolon, 1639 01:24:39,261 --> 01:24:44,866 that literally just copies s into t, which puts the value 0x123 there. 1640 01:24:44,866 --> 01:24:47,991 So if we now abstract away all these numbers and just think about a picture 1641 01:24:47,991 --> 01:24:52,371 with arrows, what we've drawn in the computer's memory is this. 1642 01:24:52,371 --> 01:24:56,871 Two different pointers but storing the same address, which means 1643 01:24:56,871 --> 01:24:59,761 the breadcrumbs lead to the same place. 1644 01:24:59,761 --> 01:25:02,841 And so if you follow the t breadcrumb and capitalize the first letter, 1645 01:25:02,841 --> 01:25:06,831 it is functionally the same as copying the-- 1646 01:25:06,831 --> 01:25:12,471 changing the first letter in the version s as well. 1647 01:25:12,471 --> 01:25:17,311 So what's the solution, then, to this kind of problem? 1648 01:25:17,311 --> 01:25:19,381 Even if you have no idea how to do it in code, 1649 01:25:19,381 --> 01:25:21,946 what's the gist of what I really intended, which is, 1650 01:25:21,946 --> 01:25:26,101 I want a genuine copy of s, called t.
1651 01:25:26,101 --> 01:25:30,213 I want a new h-i exclamation point backslash zero. 1652 01:25:30,213 --> 01:25:31,921 What do I need to do to make that happen? 1653 01:25:31,921 --> 01:25:32,888 Thoughts? 1654 01:25:32,888 --> 01:25:35,631 AUDIENCE: I think there's a function called str copy. 1655 01:25:35,631 --> 01:25:38,961 DAVID J. MALAN: So there is a function called str copy, strcpy, 1656 01:25:38,961 --> 01:25:41,511 which is a possible answer to this question. 1657 01:25:41,511 --> 01:25:45,681 The catch with str copy is that you have to tell it in advance not only 1658 01:25:45,681 --> 01:25:48,231 what the source string is-- the one you want to copy-- 1659 01:25:48,231 --> 01:25:50,961 you also need to pass in the address of a chunk of memory 1660 01:25:50,961 --> 01:25:55,551 into which you can copy the string, and here's one thing we haven't seen yet, 1661 01:25:55,551 --> 01:25:57,951 and we need one more building block today, if you will. 1662 01:25:57,951 --> 01:26:02,361 We haven't yet seen a way to create new chunks of memory 1663 01:26:02,361 --> 01:26:05,281 and then let some other function copy into them. 1664 01:26:05,281 --> 01:26:08,661 And for this, we're going to introduce something called dynamic memory 1665 01:26:08,661 --> 01:26:09,571 allocation. 1666 01:26:09,571 --> 01:26:12,291 And this is the last and most powerful feature perhaps, today, 1667 01:26:12,291 --> 01:26:16,251 whereby we're going to introduce two functions, malloc and free, where 1668 01:26:16,251 --> 01:26:19,491 malloc means memory allocate, which literally does just that. 1669 01:26:19,491 --> 01:26:22,641 It's a function that takes a number as input-- how many bytes of memory 1670 01:26:22,641 --> 01:26:26,034 do you want the operating system to find for you somewhere in that big grid?
1671 01:26:26,034 --> 01:26:27,951 It's going to find it and it's going to return 1672 01:26:27,951 --> 01:26:31,554 to you the address of the first byte of contiguous memory back to back to back, 1673 01:26:31,554 --> 01:26:34,221 and then you can do anything you want with that chunk of memory. 1674 01:26:34,221 --> 01:26:35,751 free is going to do the opposite. 1675 01:26:35,751 --> 01:26:38,571 When you're done using a chunk of memory that malloc has given you, 1676 01:26:38,571 --> 01:26:42,201 you can say free it, and that means you hand it back to the operating system 1677 01:26:42,201 --> 01:26:45,421 and then the operating system can use it for something else later. 1678 01:26:45,421 --> 01:26:48,861 So this is actually evidence of a common problem in programming. 1679 01:26:48,861 --> 01:26:53,311 If your Mac or your PC has ever been in the habit of starting to get really, 1680 01:26:53,311 --> 01:26:57,921 really slow, or it's slowing to a crawl-- heck, maybe it even freezes-- 1681 01:26:57,921 --> 01:27:00,921 one of the possible explanations could be 1682 01:27:00,921 --> 01:27:03,801 that the program you're running by Apple or Microsoft 1683 01:27:03,801 --> 01:27:07,041 or whoever, maybe they're using malloc or some equivalent, 1684 01:27:07,041 --> 01:27:08,346 asking the operating system-- 1685 01:27:08,346 --> 01:27:10,221 Mac OS or Windows-- for, give me more memory. 1686 01:27:10,221 --> 01:27:11,001 I need more memory. 1687 01:27:11,001 --> 01:27:12,381 The user is creating more images. 1688 01:27:12,381 --> 01:27:13,821 The user is typing a longer essay. 1689 01:27:13,821 --> 01:27:15,441 Give me more memory, more memory. 1690 01:27:15,441 --> 01:27:20,001 If the program has a bug and never actually frees any of that memory, 1691 01:27:20,001 --> 01:27:22,701 your computer might end up using all of the available memory 1692 01:27:22,701 --> 01:27:26,571 and honestly, humans are not very good at handling corner cases like that.
1693 01:27:26,571 --> 01:27:29,451 Very often programs, computers just freeze at that point 1694 01:27:29,451 --> 01:27:33,591 or get really, really slow because they start trying to be creative 1695 01:27:33,591 --> 01:27:35,751 when there's not enough memory left. 1696 01:27:35,751 --> 01:27:38,361 So one of the reasons for a computer really slowing down 1697 01:27:38,361 --> 01:27:42,634 might be calling malloc a lot, or some equivalent, but never calling free. 1698 01:27:42,634 --> 01:27:45,051 Which is to say, you should always use these two functions 1699 01:27:45,051 --> 01:27:48,631 in concert and free memory once you are done with it. 1700 01:27:48,631 --> 01:27:52,761 So let me go ahead and do this in code and solve this problem properly. 1701 01:27:52,761 --> 01:27:54,801 Let me go ahead and do this. 1702 01:27:54,801 --> 01:27:58,491 Before I copy s into t using something like str copy, 1703 01:27:58,491 --> 01:28:01,126 I first need to get a bunch of memory from the computer. 1704 01:28:01,126 --> 01:28:04,251 So to do that, let's make this super clear that we're dealing with pointers, 1705 01:28:04,251 --> 01:28:07,821 so I'm going to change my strings to char stars for both s and t, 1706 01:28:07,821 --> 01:28:10,281 and what I technically am going to store in t 1707 01:28:10,281 --> 01:28:14,331 is the address of an available chunk of memory. 1708 01:28:14,331 --> 01:28:18,531 To do that, I can ask the computer to allocate memory for me, 1709 01:28:18,531 --> 01:28:19,941 and how many bytes. 1710 01:28:19,941 --> 01:28:23,181 If I want to create a copy of h-i exclamation point, 1711 01:28:23,181 --> 01:28:26,501 I need how many bytes? 1712 01:28:26,501 --> 01:28:27,001 Good! 1713 01:28:27,001 --> 01:28:27,631 Four! 1714 01:28:27,631 --> 01:28:31,891 Because I need the h, the i, the exclamation point, and additional space 1715 01:28:31,891 --> 01:28:33,001 for the backslash zero.
1716 01:28:33,001 --> 01:28:35,161 It's up to me to understand that and ask for it. 1717 01:28:35,161 --> 01:28:36,691 It's not going to happen magically. 1718 01:28:36,691 --> 01:28:40,601 Nothing does in C. So I could just naively type four there, 1719 01:28:40,601 --> 01:28:43,501 and that would be correct if I type in h-i exclamation 1720 01:28:43,501 --> 01:28:47,431 point or any other three letter word or phrase, but to do this dynamically 1721 01:28:47,431 --> 01:28:50,761 I should probably do something like strlen of s 1722 01:28:50,761 --> 01:28:54,331 plus 1 for the additional null character. 1723 01:28:54,331 --> 01:28:56,821 Recall that string length does it in the English sense-- 1724 01:28:56,821 --> 01:29:00,991 it returns the length of the string you see, plus 1 also takes into account 1725 01:29:00,991 --> 01:29:03,241 the fact that I'm going to need that backslash zero. 1726 01:29:03,241 --> 01:29:05,611 Now let me do this old school style first. 1727 01:29:05,611 --> 01:29:10,351 Let me go ahead and manually copy the string s into t first. 1728 01:29:10,351 --> 01:29:18,211 So for int i equals 0, i is less than the string length of s, i plus plus. 1729 01:29:18,211 --> 01:29:23,161 Then inside my for loop, I'm going to do t bracket i equals s bracket 1730 01:29:23,161 --> 01:29:27,211 i, but actually I want the null character too, 1731 01:29:27,211 --> 01:29:30,001 so I want to do the length of the string plus 1 more, 1732 01:29:30,001 --> 01:29:32,671 and heck, I think I learned an optimization last time. 1733 01:29:32,671 --> 01:29:35,131 If I'm doing this again and again, I could really 1734 01:29:35,131 --> 01:29:40,861 do n equals strlen of s plus 1 and then do i is less than n, 1735 01:29:40,861 --> 01:29:43,361 just as a nice design optimization.
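The manual copy described above might look like the following sketch. The function name copy_by_loop is hypothetical, the counters are declared as size_t rather than the int spoken in the lecture, and the check for a failed allocation is an addition that the walkthrough has not introduced at this point.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s, or NULL if
// malloc could not provide the memory. The caller should eventually pass
// the result to free.
char *copy_by_loop(const char *s)
{
    // Visible characters plus one byte for the terminating '\0',
    // computed once instead of on every iteration of the loop.
    size_t n = strlen(s) + 1;

    char *t = malloc(n);
    if (t == NULL)
    {
        return NULL;
    }

    for (size_t i = 0; i < n; i++)
    {
        t[i] = s[i]; // the final iteration copies the '\0'
    }
    return t;
}
```

Because the loop runs to strlen(s) + 1, the terminating zero is copied along with the visible characters, so the result is a valid string in its own chunk of memory.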
1736 01:29:43,361 --> 01:29:46,531 I think this for loop will actually handle the process, then, 1737 01:29:46,531 --> 01:29:53,341 of copying every character from s into every available byte of memory in t. 1738 01:29:53,341 --> 01:29:56,671 Or I could get rid of all of that and take your suggestion, which 1739 01:29:56,671 --> 01:30:00,841 is to use str copy, which takes as its first argument the destination 1740 01:30:00,841 --> 01:30:03,301 and its second argument the source. 1741 01:30:03,301 --> 01:30:08,281 So copy from right to left in this case, too, that's going to do all of that 1742 01:30:08,281 --> 01:30:11,231 automatically for me as well. 1743 01:30:11,231 --> 01:30:13,421 Now I think I'm good. 1744 01:30:13,421 --> 01:30:15,401 I can now capitalize safely. 1745 01:30:15,401 --> 01:30:19,441 The first character in t, which is now a different chunk of memory 1746 01:30:19,441 --> 01:30:23,441 than s, and then I can print them both out to see that one has not changed 1747 01:30:23,441 --> 01:30:24,451 but the other has. 1748 01:30:24,451 --> 01:30:27,331 So make copy-- all right, what did I do wrong? 1749 01:30:27,331 --> 01:30:30,421 Implicitly declaring library function malloc dot, dot, dot. 1750 01:30:30,421 --> 01:30:33,061 So we've seen this kind of error before. 1751 01:30:33,061 --> 01:30:36,151 What is-- even if you don't know quite how to solve it, 1752 01:30:36,151 --> 01:30:37,681 what's the essence of the solution? 1753 01:30:37,681 --> 01:30:40,711 What do I need to do to fix this kind of problem involving implicitly 1754 01:30:40,711 --> 01:30:43,271 declaring a library function? 1755 01:30:43,271 --> 01:30:44,081 What did I forget? 1756 01:30:44,081 --> 01:30:46,211 Yeah. 1757 01:30:46,211 --> 01:30:47,561 I need to include the library. 1758 01:30:47,561 --> 01:30:51,551 And I could look this up in the manual, or I know it off the top of my head, 1759 01:30:51,551 --> 01:30:52,361 I just forgot it. 
1760 01:30:52,361 --> 01:30:54,461 There's another library we'll occasionally 1761 01:30:54,461 --> 01:30:56,561 need now called standard lib-- 1762 01:30:56,561 --> 01:31:00,671 standard library-- that contains malloc and free prototypes 1763 01:31:00,671 --> 01:31:02,021 and some other stuff, too. 1764 01:31:02,021 --> 01:31:05,061 All right, let me just clear this away and do make copy one more time. 1765 01:31:05,061 --> 01:31:10,961 Now I'm good. ./copy, Enter, All right. s, I'm going to type in hi, lowercase. 1766 01:31:10,961 --> 01:31:14,771 t and s now come back as intended. 1767 01:31:14,771 --> 01:31:19,961 s is untouched, it would seem, but t is now capitalized. 1768 01:31:19,961 --> 01:31:23,351 Are any questions, then, on what we just did in code? 1769 01:31:23,351 --> 01:31:25,172 Yeah. 1770 01:31:25,172 --> 01:31:28,581 AUDIENCE: You said that malloc and free go together. 1771 01:31:28,581 --> 01:31:32,093 [INAUDIBLE] 1772 01:31:32,093 --> 01:31:33,051 DAVID J. MALAN: Indeed. 1773 01:31:33,051 --> 01:31:35,093 There's a few improvements I want to make, so let 1774 01:31:35,093 --> 01:31:36,651 me actually do those right now. 1775 01:31:36,651 --> 01:31:39,681 Technically, I should practice what I preached and I should indeed, 1776 01:31:39,681 --> 01:31:42,098 when I'm done with t, free t. 1777 01:31:42,098 --> 01:31:44,181 Fortunately, I don't have to worry about how big t 1778 01:31:44,181 --> 01:31:47,691 was-- the computer remembers how many bytes it gave me and it will go free 1779 01:31:47,691 --> 01:31:49,371 all of them, not just the first. 1780 01:31:49,371 --> 01:31:51,081 I should do free t. 1781 01:31:51,081 --> 01:31:53,751 I don't need to do free s, and I shouldn't, 1782 01:31:53,751 --> 01:31:56,691 because that is handled automatically by the CS50 library. 
1783 01:31:56,691 --> 01:31:59,091 s, recall, came from GetString, and we actually 1784 01:31:59,091 --> 01:32:01,469 have some fancy code in place that makes sure 1785 01:32:01,469 --> 01:32:03,261 that at the end of your program's execution 1786 01:32:03,261 --> 01:32:06,321 we free any memory that we allocated so we don't actually 1787 01:32:06,321 --> 01:32:08,256 waste memory like I described earlier. 1788 01:32:08,256 --> 01:32:10,131 But there's actually a couple of other things 1789 01:32:10,131 --> 01:32:12,631 if I really want to be pedantic I should put in here. 1790 01:32:12,631 --> 01:32:16,071 It turns out that sometimes malloc can fail, 1791 01:32:16,071 --> 01:32:18,809 and sometimes malloc doesn't have enough memory available 1792 01:32:18,809 --> 01:32:20,601 because maybe your computer's doing so much 1793 01:32:20,601 --> 01:32:22,701 stuff there's just no more RAM available. 1794 01:32:22,701 --> 01:32:24,981 So technically, I should do something like this-- 1795 01:32:24,981 --> 01:32:29,541 if t equals equals null, with two L's today, 1796 01:32:29,541 --> 01:32:32,751 then I should just return 1 or something to say that there was a problem. 1797 01:32:32,751 --> 01:32:34,626 I should probably print an error message too, 1798 01:32:34,626 --> 01:32:36,301 but for now I'm going to keep it simple. 1799 01:32:36,301 --> 01:32:38,526 I should also probably check this. 1800 01:32:38,526 --> 01:32:40,851 This is a little risky of me. 1801 01:32:40,851 --> 01:32:45,511 If I'm doing t bracket zero, this is assuming that there is a letter there. 1802 01:32:45,511 --> 01:32:48,231 But what if the human just hit Enter at the prompt 1803 01:32:48,231 --> 01:32:51,391 and didn't even type h, let alone h-i exclamation point? 1804 01:32:51,391 --> 01:32:53,631 What if there is no t bracket zero? 
1805 01:32:53,631 --> 01:32:59,181 So technically, what I should probably do here is, if the length of t 1806 01:32:59,181 --> 01:33:05,121 is at least greater than zero, then go ahead and safely capitalize 1807 01:33:05,121 --> 01:33:06,441 the first letter of it. 1808 01:33:06,441 --> 01:33:08,731 And then at the very end if all goes well, 1809 01:33:08,731 --> 01:33:12,841 I can return zero, thereby signifying that indeed, this thing was successful. 1810 01:33:12,841 --> 01:33:16,711 So yes, these two functions, malloc and free, should be in concert. 1811 01:33:16,711 --> 01:33:21,651 And so if you call malloc you should call free eventually. 1812 01:33:21,651 --> 01:33:27,256 But you did not call malloc for s, so you should not call free for s. 1813 01:33:27,256 --> 01:33:28,131 Yeah, other question. 1814 01:33:28,131 --> 01:33:29,298 AUDIENCE: Here's a question. 1815 01:33:29,298 --> 01:33:31,579 Why do we do malloc plus 1? 1816 01:33:31,579 --> 01:33:33,371 DAVID J. MALAN: Why did I do malloc plus 1? 1817 01:33:33,371 --> 01:33:36,281 So malloc-- sorry, malloc of string length of s 1818 01:33:36,281 --> 01:33:39,903 plus 1-- the string length is the literal length of the string as a human 1819 01:33:39,903 --> 01:33:41,111 would perceive it in English. 1820 01:33:41,111 --> 01:33:44,111 So h-i exclamation point-- strlen gives me 3, 1821 01:33:44,111 --> 01:33:47,801 but I know now as of last week and this week what a string technically is 1822 01:33:47,801 --> 01:33:49,751 and a string always has an extra byte. 1823 01:33:49,751 --> 01:33:52,301 The onus is on me to understand and apply 1824 01:33:52,301 --> 01:33:57,011 that lesson learned so that I actually give str copy enough room for that 1825 01:33:57,011 --> 01:33:58,631 trailing null character. 1826 01:33:58,631 --> 01:34:04,301 And here's just an annoying thing when we called the backslash zero N-U-L last 1827 01:34:04,301 --> 01:34:08,351 week, it turns out that N-U-L-L is the same idea. 
1828 01:34:08,351 --> 01:34:11,531 It's also zero, but it's zero in the context of pointers. 1829 01:34:11,531 --> 01:34:15,761 So long story short, you never really write N-U-L, I've just said it 1830 01:34:15,761 --> 01:34:17,051 and we saw it on the screen. 1831 01:34:17,051 --> 01:34:22,631 You will start writing N-U-L-L when you want to check whether or not a pointer 1832 01:34:22,631 --> 01:34:23,681 is valid. 1833 01:34:23,681 --> 01:34:25,091 And what I mean by that is this. 1834 01:34:25,091 --> 01:34:27,971 If malloc fails and there's just not enough memory left inside 1835 01:34:27,971 --> 01:34:31,271 of the computer for you, it's got to return a special value, 1836 01:34:31,271 --> 01:34:35,201 and that special value is N-U-L-L in all capital letters. 1837 01:34:35,201 --> 01:34:36,821 That signifies something went wrong. 1838 01:34:36,821 --> 01:34:41,771 Do not trust that I'm giving you a useful return value. 1839 01:34:41,771 --> 01:34:45,391 Other questions on these copies thus far? 1840 01:34:45,391 --> 01:34:47,530 Yeah, over there. 1841 01:34:47,530 --> 01:34:51,481 AUDIENCE: [INAUDIBLE] 1842 01:34:51,481 --> 01:34:52,731 DAVID J. MALAN: Good question. 1843 01:34:52,731 --> 01:34:54,621 Will str copy not work without malloc? 1844 01:34:54,621 --> 01:34:57,891 You kind of need both in this case because str copy, 1845 01:34:57,891 --> 01:35:01,281 by definition-- if I pull up its manual page-- needs a destination 1846 01:35:01,281 --> 01:35:03,261 to put the copied characters. 1847 01:35:03,261 --> 01:35:06,321 It's not sufficient just to say char star t semicolon. 1848 01:35:06,321 --> 01:35:07,761 That only gives you a pointer.
1849 01:35:07,761 --> 01:35:10,701 But I need another chunk of memory that's 1850 01:35:10,701 --> 01:35:14,811 just as big as h-i exclamation point backslash zero, 1851 01:35:14,811 --> 01:35:17,271 so malloc gives me a whole bunch of memory 1852 01:35:17,271 --> 01:35:21,561 and then str copy fills it with h-i exclamation point backslash zero. 1853 01:35:21,561 --> 01:35:24,021 So again, that's why we're going down to this lower level, 1854 01:35:24,021 --> 01:35:26,063 because once you understand what needs to be done 1855 01:35:26,063 --> 01:35:27,931 you now have the functions to do it. 1856 01:35:27,931 --> 01:35:29,971 So let's actually consider what we just solved. 1857 01:35:29,971 --> 01:35:33,831 So in this next version of the program where I actually introduced malloc, 1858 01:35:33,831 --> 01:35:37,341 t was initialized to the return value of malloc, 1859 01:35:37,341 --> 01:35:39,381 and maybe the memory that I got back was here-- 1860 01:35:39,381 --> 01:35:42,981 0x456, 0x457, 0x458, 0x459. 1861 01:35:42,981 --> 01:35:45,291 I've left it blank initially because nothing 1862 01:35:45,291 --> 01:35:47,001 is put there automatically by malloc. 1863 01:35:47,001 --> 01:35:51,111 I just get a chunk of memory that is now mine to use as I see fit. 1864 01:35:51,111 --> 01:35:56,031 I then assign t to that return value, which points t at the first address. 1865 01:35:56,031 --> 01:35:57,861 Notice there's no backslash zero. 1866 01:35:57,861 --> 01:36:00,741 This is not yet a string; it's just a chunk of memory-- 1867 01:36:00,741 --> 01:36:02,871 four bytes-- an array of four bytes. 1868 01:36:02,871 --> 01:36:06,441 What str copy eventually did for me was it copied the h over, 1869 01:36:06,441 --> 01:36:10,671 the i over, the exclamation point over, and the backslash zero. 1870 01:36:10,671 --> 01:36:14,541 And if I didn't want to use str copy or I forgot that it existed, my for loop 1871 01:36:14,541 --> 01:36:18,701 would have done exactly the same thing.
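Putting the pieces described above together (malloc, the NULL check, strcpy, the length check before capitalizing, and free), one possible sketch is shown below. The function name capitalized_copy is hypothetical, and the sketch takes its input as a parameter instead of prompting the user, so it does not depend on the CS50 library.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a new string that equals s with its first
// character uppercased, leaving s itself untouched. Returns NULL if malloc
// fails. The caller should call free on the result when done with it.
char *capitalized_copy(const char *s)
{
    // Room for every visible character plus the terminating '\0'.
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy takes the destination first and the source second.
    strcpy(t, s);

    // Only capitalize if there is a first character to capitalize.
    if (strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

With input hi!, the copy comes back as Hi! while the original still reads hi!, and an empty input yields an empty copy without touching a character that does not exist.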
1872 01:36:18,701 --> 01:36:23,818 Are any questions, then, on these examples here. 1873 01:36:23,818 --> 01:36:24,401 Any questions? 1874 01:36:24,401 --> 01:36:26,144 Yeah. 1875 01:36:26,144 --> 01:36:33,131 AUDIENCE: [INAUDIBLE] 1876 01:36:33,131 --> 01:36:34,381 DAVID J. MALAN: Good question. 1877 01:36:34,381 --> 01:36:38,731 After malloc, if I had then still done just t equals s, 1878 01:36:38,731 --> 01:36:41,851 it actually would have recreated the same original problem 1879 01:36:41,851 --> 01:36:45,571 by just copying 0x123 from s into t. 1880 01:36:45,571 --> 01:36:48,751 So then I would have been left with a picture that looked like this a few 1881 01:36:48,751 --> 01:36:52,711 steps ago, I would have-- and I can't quite do it live-- 1882 01:36:52,711 --> 01:36:55,021 this arrow, if I did what you just described, 1883 01:36:55,021 --> 01:36:58,998 would now be pointing over here and so I wouldn't have fundamentally solved 1884 01:36:58,998 --> 01:37:01,081 the problem, I would have just additionally wasted 1885 01:37:01,081 --> 01:37:04,141 four bytes temporarily that I'm not actually using. 1886 01:37:04,141 --> 01:37:05,983 Yeah. 1887 01:37:05,983 --> 01:37:09,781 AUDIENCE: [INAUDIBLE] 1888 01:37:09,781 --> 01:37:10,861 DAVID J. MALAN: You can-- 1889 01:37:10,861 --> 01:37:12,819 do you always use malloc and str copy together? 1890 01:37:12,819 --> 01:37:13,594 Not necessarily. 1891 01:37:13,594 --> 01:37:15,511 These are both solving two different problems. 1892 01:37:15,511 --> 01:37:19,771 malloc's giving me enough memory to make a copy, str copy is doing the copy. 1893 01:37:19,771 --> 01:37:23,581 However, you could actually use an array, if you wanted, of characters, 1894 01:37:23,581 --> 01:37:26,911 and you could use str copy on that, and there's other use cases for str copy. 
1895 01:37:26,911 --> 01:37:29,071 But thus far, it's a reasonable mental model 1896 01:37:29,071 --> 01:37:31,291 to have that if you want to copy strings, 1897 01:37:31,291 --> 01:37:34,921 you use malloc and then str copy, or your own homegrown loop. 1898 01:37:34,921 --> 01:37:36,844 Yeah. 1899 01:37:36,844 --> 01:37:47,171 AUDIENCE: [INAUDIBLE] 1900 01:37:47,171 --> 01:37:49,370 DAVID J. MALAN: Say that once more. 1901 01:37:49,370 --> 01:37:54,579 AUDIENCE: [INAUDIBLE] 1902 01:37:54,579 --> 01:37:55,371 DAVID J. MALAN: No. 1903 01:37:55,371 --> 01:37:57,031 It will-- good question. 1904 01:37:57,031 --> 01:38:00,171 If I had a-- 1905 01:38:00,171 --> 01:38:03,441 str copy, per its documentation, will copy the whole string 1906 01:38:03,441 --> 01:38:05,661 plus the null character at the end. 1907 01:38:05,661 --> 01:38:08,121 It just assumes there will be one there. 1908 01:38:08,121 --> 01:38:12,291 It's therefore up to you to pass str copy a long enough chunk of memory 1909 01:38:12,291 --> 01:38:13,281 to have room for that. 1910 01:38:13,281 --> 01:38:15,471 If I only ask malloc for three bytes, that 1911 01:38:15,471 --> 01:38:17,541 could have potentially created a memory problem 1912 01:38:17,541 --> 01:38:20,901 whereby str copy would just still blindly copy one, two, three, 1913 01:38:20,901 --> 01:38:24,441 four bytes, but technically it should have only touched three of those. 1914 01:38:24,441 --> 01:38:27,291 You do not yet have access to the fourth one, or the rights to it, 1915 01:38:27,291 --> 01:38:29,541 because you never asked malloc for it. 1916 01:38:29,541 --> 01:38:31,461 Yeah. 1917 01:38:31,461 --> 01:38:34,461 AUDIENCE: So the number inside malloc would be the number of bytes. 1918 01:38:34,461 --> 01:38:34,821 DAVID J. MALAN: Correct. 1919 01:38:34,821 --> 01:38:36,696 The number inside malloc-- it's one argument. 1920 01:38:36,696 --> 01:38:39,723 It's the number of bytes you want back. 
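The point above about giving str copy enough room can be made concrete with a small guard. This is only a sketch: copy_if_fits is a hypothetical helper, not a standard library function, and it assumes the caller knows how many bytes the destination really has.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies src into dst only when dst_size bytes are
// enough to hold every character of src plus the terminating '\0'.
// Returns false, and leaves dst untouched, when the destination is too small.
bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1; // visible characters plus '\0'
    if (needed > dst_size)
    {
        return false;
    }
    strcpy(dst, src);
    return true;
}
```

For hi! the helper refuses a three-byte destination and accepts a four-byte one, which is the same arithmetic as asking malloc for strlen of s plus 1.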
1921 01:38:39,723 --> 01:38:43,041 AUDIENCE: Does that mean you have to remember [INAUDIBLE]? 1922 01:38:45,798 --> 01:38:48,131 DAVID J. MALAN: Yes, the onus is on you, the programmer, 1923 01:38:48,131 --> 01:38:50,298 to remember or frankly, use a function to figure out 1924 01:38:50,298 --> 01:38:51,821 how many bytes you actually need. 1925 01:38:51,821 --> 01:38:54,671 That's why I did not ultimately type in four manually, 1926 01:38:54,671 --> 01:38:56,441 I used str length plus 1. 1927 01:38:56,441 --> 01:38:59,831 So the plus 1 is necessary if you understand how strings are represented, 1928 01:38:59,831 --> 01:39:02,471 but using strlen means that I can actually 1929 01:39:02,471 --> 01:39:05,651 play around with any types of inputs and it will dynamically 1930 01:39:05,651 --> 01:39:07,541 figure out the length. 1931 01:39:07,541 --> 01:39:09,821 So suffice it to say, there's so many ways 1932 01:39:09,821 --> 01:39:11,931 already where you can start to break programs. 1933 01:39:11,931 --> 01:39:15,386 Let's give you at least one tool for finding mistakes that you might make. 1934 01:39:15,386 --> 01:39:17,261 And indeed, in upcoming problem sets you will 1935 01:39:17,261 --> 01:39:19,361 use this to find bugs in your own code. 1936 01:39:19,361 --> 01:39:22,991 Not just using printf, not just using the built-in debugger, but another tool 1937 01:39:22,991 --> 01:39:24,201 here as well. 1938 01:39:24,201 --> 01:39:27,371 So let me go ahead and deliberately write a program called memory.c 1939 01:39:27,371 --> 01:39:29,511 that has some memory-related errors. 1940 01:39:29,511 --> 01:39:34,901 Let me include stdio.h at the top and let me include stdlib.h at the top 1941 01:39:34,901 --> 01:39:36,551 so I have access to malloc now.
1942 01:39:36,551 --> 01:39:41,171 Let me do int main(void) and then inside of main, let me do this-- 1943 01:39:41,171 --> 01:39:44,351 I want to allocate maybe how about three-- 1944 01:39:44,351 --> 01:39:45,711 space for three integers. 1945 01:39:45,711 --> 01:39:46,211 Why? 1946 01:39:46,211 --> 01:39:48,191 Just for the sake of discussion. 1947 01:39:48,191 --> 01:39:52,721 So I'm going to go ahead and do malloc of three, but I don't want three bytes. 1948 01:39:52,721 --> 01:39:56,008 I want three integers and an integer is four bytes, 1949 01:39:56,008 --> 01:39:57,341 so technically I could do this-- 1950 01:39:57,341 --> 01:40:01,851 3 times 4, or I could do 12 but again, that's making certain assumptions 1951 01:40:01,851 --> 01:40:04,341 and if I run this program on a slightly different computer, 1952 01:40:04,341 --> 01:40:05,861 int might be a different size. 1953 01:40:05,861 --> 01:40:10,321 So the better way to do this would be 3 times whatever the size is of an int. 1954 01:40:10,321 --> 01:40:13,571 And this is just an operator you can use any time if you just want to find out 1955 01:40:13,571 --> 01:40:15,611 on this computer, how big is an int? 1956 01:40:15,611 --> 01:40:18,291 How big is a float, or something else? 1957 01:40:18,291 --> 01:40:20,411 So that's going to give me that many-- 1958 01:40:20,411 --> 01:40:22,811 that much memory for three ints. 1959 01:40:22,811 --> 01:40:24,821 What do I want to assign this to? 1960 01:40:24,821 --> 01:40:27,011 Well, malloc returns an address. 1961 01:40:27,011 --> 01:40:32,291 Pointers are addresses, so I'm going to create a pointer to an int called 1962 01:40:32,291 --> 01:40:34,521 x and assign it the value. 1963 01:40:34,521 --> 01:40:35,741 So what am I doing here? 1964 01:40:35,741 --> 01:40:38,321 This is a little less obvious, but again go back to basics. 1965 01:40:38,321 --> 01:40:43,091 The right hand side here gives me a chunk of memory for three integers.
1966 01:40:43,091 --> 01:40:46,661 malloc returns the address of the first byte of that chunk. 1967 01:40:46,661 --> 01:40:48,791 How do I store the address of anything? 1968 01:40:48,791 --> 01:40:49,691 I need a pointer. 1969 01:40:49,691 --> 01:40:53,561 The syntax for today is type of data, star, 1970 01:40:53,561 --> 01:40:58,631 where the type of data in question is three ints, so I do int star x. 1971 01:40:58,631 --> 01:41:02,531 Again, it's kind of purposeless, only for sort of instructional purposes 1972 01:41:02,531 --> 01:41:07,901 here, but this is equivalent now to having a chunk of memory of size 12 1973 01:41:07,901 --> 01:41:11,351 in total, presumably, so I can technically now do this. 1974 01:41:11,351 --> 01:41:15,491 I can go into maybe the first location and assign it the number 72 1975 01:41:15,491 --> 01:41:16,911 like the other day. 1976 01:41:16,911 --> 01:41:24,701 Second location, the number 73, and the third location, maybe the number 33. 1977 01:41:24,701 --> 01:41:27,551 Now I've deliberately made two mistakes here 1978 01:41:27,551 --> 01:41:30,701 because I'm trying to trip over my newfound understanding, 1979 01:41:30,701 --> 01:41:33,281 or my greenness with understanding pointers. 1980 01:41:33,281 --> 01:41:36,641 One, I didn't remember that I should be treating chunks of memory 1981 01:41:36,641 --> 01:41:37,751 as zero indexed. 1982 01:41:37,751 --> 01:41:41,141 malloc essentially returns an array, if you want to think of it as that. 1983 01:41:41,141 --> 01:41:43,541 An array of three ints, or more technically, 1984 01:41:43,541 --> 01:41:47,381 the address of a chunk of memory that could fit three ints. 1985 01:41:47,381 --> 01:41:50,681 So I can use my square bracket notation, or I could be really cool 1986 01:41:50,681 --> 01:41:53,631 and use pointer arithmetic, but this is a little more user friendly. 1987 01:41:53,631 --> 01:41:55,481 But I have made two mistakes. 
1988 01:41:55,481 --> 01:41:59,081 I did not start indexing at zero, so line seven 1989 01:41:59,081 --> 01:42:00,941 should have been x bracket zero. 1990 01:42:00,941 --> 01:42:03,813 Line eight should have been x bracket 1, and then line nine 1991 01:42:03,813 --> 01:42:05,021 should have been x bracket 2. 1992 01:42:05,021 --> 01:42:06,231 So first mistake. 1993 01:42:06,231 --> 01:42:09,161 The second mistake that I've made as a side effect, 1994 01:42:09,161 --> 01:42:12,221 is I'm also touching memory that I shouldn't. 1995 01:42:12,221 --> 01:42:17,171 x bracket 3 would mean go to the fourth int in the chunk of memory 1996 01:42:17,171 --> 01:42:17,981 that came back. 1997 01:42:17,981 --> 01:42:20,501 I only asked for enough memory for three ints, 1998 01:42:20,501 --> 01:42:23,741 not four, so this is what's called a buffer overflow. 1999 01:42:23,741 --> 01:42:26,831 I am accidentally, but deliberately at the moment, 2000 01:42:26,831 --> 01:42:30,951 going beyond the boundaries of this array, this chunk of memory. 2001 01:42:30,951 --> 01:42:33,311 So bad things happen, but not necessarily 2002 01:42:33,311 --> 01:42:34,641 by just running your program. 2003 01:42:34,641 --> 01:42:36,191 Let me go ahead and just try this. 2004 01:42:36,191 --> 01:42:42,011 Make memory, and you'll see here that it compiles OK. ./memory, 2005 01:42:42,011 --> 01:42:44,139 and it actually does not segmentation fault, 2006 01:42:44,139 --> 01:42:46,181 which comes back to that point of nondeterminism. 2007 01:42:46,181 --> 01:42:48,551 Sometimes it does, sometimes it doesn't-- it depends on how bad 2008 01:42:48,551 --> 01:42:49,691 of a mistake you made. 2009 01:42:49,691 --> 01:42:52,858 But there's a program that can spot these kinds of mistakes, 2010 01:42:52,858 --> 01:42:55,691 and I'm going to go ahead and expand my terminal window for a moment 2011 01:42:55,691 --> 01:43:01,151 and I'm going to run not just ./memory, but a program called Valgrind, so valgrind ./memory.
2012 01:43:01,151 --> 01:43:04,001 This is a command that comes with a lot of computer systems 2013 01:43:04,001 --> 01:43:07,071 that's designed to find memory-related bugs in code. 2014 01:43:07,071 --> 01:43:09,011 So it's a new tool in your toolkit today, 2015 01:43:09,011 --> 01:43:11,111 and you'll use it with the coming problem sets. 2016 01:43:11,111 --> 01:43:12,311 I'm going to run this now. 2017 01:43:12,311 --> 01:43:14,591 Its output, honestly, it's hideous. 2018 01:43:14,591 --> 01:43:17,981 But there's a few things that will start to jump out 2019 01:43:17,981 --> 01:43:20,381 and we'll help you with tools in the problem 2020 01:43:20,381 --> 01:43:21,951 sets to see these kinds of things. 2021 01:43:21,951 --> 01:43:23,531 Here's the first mistake. 2022 01:43:23,531 --> 01:43:26,471 Invalid write of size four. 2023 01:43:26,471 --> 01:43:30,461 That's on memory.c line nine, per my highlights. 2024 01:43:30,461 --> 01:43:32,351 So let me go look at line nine. 2025 01:43:32,351 --> 01:43:36,011 In what sense is this an invalid write of size four? 2026 01:43:36,011 --> 01:43:38,591 Well, I'm touching memory that I shouldn't, and I'm 2027 01:43:38,591 --> 01:43:40,061 touching it as though it's an int. 2028 01:43:40,061 --> 01:43:42,551 And an int is four bytes-- size four. 2029 01:43:42,551 --> 01:43:45,831 So again, this takes some practice to get used to, the nomenclature here, 2030 01:43:45,831 --> 01:43:48,771 but this is now a clue for me, the programmer, 2031 01:43:48,771 --> 01:43:52,231 that not only did I screw up, but I screwed up related to memory 2032 01:43:52,231 --> 01:43:54,749 and so this is just a hint, if you will. 2033 01:43:54,749 --> 01:43:57,291 It's not going to necessarily tell you exactly how to fix it, 2034 01:43:57,291 --> 01:44:01,131 you have to wrestle with the semantics, but invalid 2035 01:44:01,131 --> 01:44:02,961 write of size four-- oh, OK.
2036 01:44:02,961 --> 01:44:07,321 So I should not have indexed past the boundary here. 2037 01:44:07,321 --> 01:44:10,021 All right, so I shouldn't have done that. 2038 01:44:10,021 --> 01:44:15,764 So let me go ahead then and change this to zero, one, and two, perhaps, here. 2039 01:44:15,764 --> 01:44:17,931 All right, so let me go ahead and recompile my code. 2040 01:44:17,931 --> 01:44:24,261 Make memory, ./memory, still doesn't seem to be broken but it is technically 2041 01:44:24,261 --> 01:44:24,891 buggy. 2042 01:44:24,891 --> 01:44:31,101 Let me go ahead and run Valgrind again, so Valgrind of ./memory, Enter. 2043 01:44:31,101 --> 01:44:33,321 And now there's fewer scary-- 2044 01:44:33,321 --> 01:44:36,841 less scary output now, but there's still something in there. 2045 01:44:36,841 --> 01:44:40,368 Notice this-- 12 bytes in one blocks-- 2046 01:44:40,368 --> 01:44:42,201 no regard for grammar there-- are definitely 2047 01:44:42,201 --> 01:44:43,971 lost in loss record one of one. 2048 01:44:43,971 --> 01:44:47,611 Super cryptic, but this is hinting at a so-called memory leak. 2049 01:44:47,611 --> 01:44:51,441 The blocks of memory are lost in the sense that I malloc'd them-- 2050 01:44:51,441 --> 01:44:52,881 I asked for them but I never-- 2051 01:44:52,881 --> 01:44:55,071 take a guess-- freed them. 2052 01:44:55,071 --> 01:44:56,008 I have a memory leak. 2053 01:44:56,008 --> 01:44:58,341 And this is the arcane way of saying, you've screwed up. 2054 01:44:58,341 --> 01:44:59,551 You have a memory leak. 2055 01:44:59,551 --> 01:45:01,821 So this is an easy fix, fortunately. 2056 01:45:01,821 --> 01:45:06,211 Once I'm done with this memory I just need to free it at the end. 2057 01:45:06,211 --> 01:45:08,631 So now let me go ahead and rerun make memory, 2058 01:45:08,631 --> 01:45:12,441 it still runs fine so all the while I might have thought, incorrectly, 2059 01:45:12,441 --> 01:45:13,581 my code is correct.
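Putting the two fixes together (zero-based indices and a call to free), the corrected logic might look roughly like this. It is a sketch under stated assumptions rather than the lecture's exact memory.c: the function name fill_and_sum, the NULL check, and the returned sum are all additions made here so the result can be observed.

```c
#include <stdlib.h>

// Hypothetical helper (name and return value are assumptions): allocates
// room for three ints, stores 72, 73, and 33 at indices 0 through 2,
// frees the memory, and returns the sum of the stored values.
// Returns -1 if malloc fails.
int fill_and_sum(void)
{
    // Ask for 3 * sizeof(int) bytes rather than hard-coding 12
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Valid indices for three ints are 0, 1, and 2; x[3] would be out of bounds
    x[0] = 72;
    x[1] = 73;
    x[2] = 33;

    int sum = x[0] + x[1] + x[2];

    // Hand the memory back so nothing is reported as lost
    free(x);
    return sum;
}
```

Calling fill_and_sum from main and running the program under Valgrind should, under these assumptions, report neither an invalid write nor a leak.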
2060 01:45:13,581 --> 01:45:15,261 But let me run Valgrind one more time. 2061 01:45:15,261 --> 01:45:17,451 Valgrind of ./memory, Enter. 2062 01:45:17,451 --> 01:45:19,341 Now, this is pretty good. 2063 01:45:19,341 --> 01:45:21,531 All heap blocks were freed, whatever that means. 2064 01:45:21,531 --> 01:45:23,371 No leaks are possible. 2065 01:45:23,371 --> 01:45:26,481 And even though it's still a little cryptic, there's no other error here 2066 01:45:26,481 --> 01:45:29,985 and in fact, it's pretty explicit-- error summary, zero errors from zero 2067 01:45:29,985 --> 01:45:31,641 contexts, dot, dot, dot. 2068 01:45:31,641 --> 01:45:34,831 So even though this is one of the most arcane tools we'll use, 2069 01:45:34,831 --> 01:45:37,341 it's also one of the most powerful because it can see things 2070 01:45:37,341 --> 01:45:40,671 that you, the human, might not, and maybe even that the debugger might not. 2071 01:45:40,671 --> 01:45:42,741 It does a much closer reading of your code 2072 01:45:42,741 --> 01:45:48,501 while it's running to figure out exactly what is going on. 2073 01:45:48,501 --> 01:45:50,781 Any questions, then, on this tool? 2074 01:45:50,781 --> 01:45:54,681 And we'll guide you after today with actually using this, too. 2075 01:45:54,681 --> 01:45:57,201 Just helps you find memory-related mistakes 2076 01:45:57,201 --> 01:46:00,021 that you might now be capable of making. 2077 01:46:00,021 --> 01:46:02,181 All right, let's do one other memory-related thing. 2078 01:46:02,181 --> 01:46:04,171 Let me shrink my terminal window here. 2079 01:46:04,171 --> 01:46:07,911 Let me create one other file here called garbage.c. 2080 01:46:07,911 --> 01:46:11,421 It turns out there's a term of art called garbage values in programming 2081 01:46:11,421 --> 01:46:12,931 that we can reveal as follows.
2082 01:46:12,931 --> 01:46:15,921 Let me include stdio.h, and let me include-- 2083 01:46:15,921 --> 01:46:19,461 how about stdlib.h, and then let me give myself int 2084 01:46:19,461 --> 01:46:22,561 main(void), and then in this relatively short program 2085 01:46:22,561 --> 01:46:25,461 let me give myself three ints using last week's 2086 01:46:25,461 --> 01:46:29,421 notation, just int scores bracket 3 for 3 quiz scores, or whatever. 2087 01:46:29,421 --> 01:46:33,441 Then let me go ahead and do for int i equals zero, i less than 3, 2088 01:46:33,441 --> 01:46:38,691 i plus plus, then let me go ahead and print out, %i backslash n, 2089 01:46:38,691 --> 01:46:40,911 scores bracket i semicolon. 2090 01:46:40,911 --> 01:46:43,491 That's it. 2091 01:46:43,491 --> 01:46:48,781 This code, I'm pretty sure, is going to compile and it's going to run, 2092 01:46:48,781 --> 01:46:51,171 but what is my logical bug? 2093 01:46:51,171 --> 01:46:55,701 I've forgotten a step even though the code that's written is not so wrong. 2094 01:46:55,701 --> 01:46:58,431 Yeah? 2095 01:46:58,431 --> 01:47:00,921 Yeah, I didn't provide the scores, so I didn't actually 2096 01:47:00,921 --> 01:47:04,851 initialize the array called scores to have any scores whatsoever. 2097 01:47:04,851 --> 01:47:08,391 What's curious about this, though, is that the computer technically 2098 01:47:08,391 --> 01:47:09,081 doesn't mind. 2099 01:47:09,081 --> 01:47:13,041 Let me go ahead and playfully make garbage, Enter, 2100 01:47:13,041 --> 01:47:15,621 and it's an apt description because what I'm about to see 2101 01:47:15,621 --> 01:47:18,231 are so-called garbage values. 2102 01:47:18,231 --> 01:47:23,061 When you, the programmer, do not initialize your code's variables to have 2103 01:47:23,061 --> 01:47:25,878 values, sometimes, who knows what's going to be there.
2104 01:47:25,878 --> 01:47:27,711 The computer's been doing some other things, 2105 01:47:27,711 --> 01:47:31,161 there's a bit of work that happens even before your code runs in the computer, 2106 01:47:31,161 --> 01:47:34,401 so there might be remnants of past ints, chars, strings, 2107 01:47:34,401 --> 01:47:37,041 floats-- anything else in there and what you're seeing 2108 01:47:37,041 --> 01:47:42,661 is those garbage values, which is to say you should never forget, 2109 01:47:42,661 --> 01:47:45,601 as I just did, to initialize the value of some variable. 2110 01:47:45,601 --> 01:47:47,601 And this is actually pretty dangerous, and there 2111 01:47:47,601 --> 01:47:51,081 have been many examples of software being compromised 2112 01:47:51,081 --> 01:47:54,261 because of one of these issues where a variable wasn't initialized 2113 01:47:54,261 --> 01:47:58,611 and all of a sudden users, maybe people on the internet in the context of web 2114 01:47:58,611 --> 01:48:02,481 applications, could suddenly see the contents of someone else's memory, 2115 01:48:02,481 --> 01:48:03,591 or remnants. 2116 01:48:03,591 --> 01:48:06,051 Maybe someone's password that had been previously typed in 2117 01:48:06,051 --> 01:48:08,031 or some other value like a credit card number 2118 01:48:08,031 --> 01:48:09,591 that had been previously typed in. 2119 01:48:09,591 --> 01:48:11,571 There are different defense mechanisms in place 2120 01:48:11,571 --> 01:48:15,111 to generally make this not so likely, but it's certainly 2121 01:48:15,111 --> 01:48:18,171 very possible, at least in this kind of context, 2122 01:48:18,171 --> 01:48:22,101 to see values that you probably shouldn't because they 2123 01:48:22,101 --> 01:48:25,621 might be remnants from something else that used them. 
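One way to avoid printing garbage values is to initialize the array before reading it. The sketch below is a variation on the garbage.c loop rather than the lecture's file: the function name print_initialized_scores and the returned count are assumptions added here, and the initializer {0} is just one of several ways to give every element a known value.

```c
#include <stdio.h>

// Hypothetical variant of the garbage.c loop (the function name is an
// assumption): the array is initialized before it is read.
// Returns how many of the printed scores were 0.
int print_initialized_scores(void)
{
    // The initializer {0} sets every element to 0, so no element holds a
    // leftover value from earlier use of that memory
    int scores[3] = {0};

    int zeros = 0;
    for (int i = 0; i < 3; i++)
    {
        printf("%i\n", scores[i]);
        if (scores[i] == 0)
        {
            zeros++;
        }
    }
    return zeros;
}
```

With the initializer in place, every printed score is a known value rather than whatever happened to be in memory.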
2124 01:48:25,621 --> 01:48:29,701 So this is to say again, you have this great power now to manipulate memory, 2125 01:48:29,701 --> 01:48:33,021 but also now you have this great hacking ability to poke around 2126 01:48:33,021 --> 01:48:36,441 the contents of memory, and this is exactly what hackers sometimes do when 2127 01:48:36,441 --> 01:48:40,431 trying to find ways to exploit systems. 2128 01:48:40,431 --> 01:48:41,661 Are there any questions here? 2129 01:48:44,571 --> 01:48:45,071 No? 2130 01:48:45,071 --> 01:48:47,111 All right, let's go ahead and take a quick five minute break 2131 01:48:47,111 --> 01:48:49,511 and when we come back, we'll build on these final topics. 2132 01:48:49,511 --> 01:48:50,381 See you in five. 2133 01:48:50,381 --> 01:48:51,671 We are back. 2134 01:48:51,671 --> 01:48:55,481 First, just a little programmer humor from XKCD, which hopefully now 2135 01:48:55,481 --> 01:48:57,851 will make a little bit of sense to you. 2136 01:48:57,851 --> 01:49:02,321 And what we'll also do next is take a look at a short two minute video that 2137 01:49:02,321 --> 01:49:05,501 animates with claymation, if you will, from our friends at Stanford, 2138 01:49:05,501 --> 01:49:08,501 exactly what happens now if you have an understanding of what garbage 2139 01:49:08,501 --> 01:49:12,004 values are and how they get there, and what happens then if you misuse them. 2140 01:49:12,004 --> 01:49:14,171 It's one thing just to print them out as I just did, 2141 01:49:14,171 --> 01:49:18,431 it's another if you actually mistake a garbage value for a valid pointer, 2142 01:49:18,431 --> 01:49:21,881 because garbage values are just zeros and ones somewhere-- numbers, that is. 2143 01:49:21,881 --> 01:49:24,761 But if you use that new dereference operator, the star, 2144 01:49:24,761 --> 01:49:29,111 and try to go to a garbage value thinking incorrectly that it's 2145 01:49:29,111 --> 01:49:31,511 a valid pointer, bad things can happen.
2146 01:49:31,511 --> 01:49:36,431 Computers can crash or more familiarly, segmentation faults can happen. 2147 01:49:36,431 --> 01:49:39,401 So allow me to introduce, if we could dim the lights for two minutes, 2148 01:49:39,401 --> 01:49:41,111 our friend Binky from Stanford. 2149 01:49:44,951 --> 01:49:46,541 SPEAKER 1: Hey Binky, wake up. 2150 01:49:46,541 --> 01:49:49,221 It's time for pointer fun. 2151 01:49:49,221 --> 01:49:50,331 BINKY: What's that? 2152 01:49:50,331 --> 01:49:51,921 Learn about pointers? 2153 01:49:51,921 --> 01:49:53,184 Oh, goody! 2154 01:49:53,184 --> 01:49:55,101 SPEAKER 1: Well, to get started, I guess we're 2155 01:49:55,101 --> 01:49:56,721 going to need a couple of pointers. 2156 01:49:56,721 --> 01:50:00,998 BINKY: OK, this code allocates two pointers which can point to integers. 2157 01:50:00,998 --> 01:50:01,581 SPEAKER 1: OK. 2158 01:50:01,581 --> 01:50:05,188 Well, I see the two pointers, but they don't seem to be pointing to anything. 2159 01:50:05,188 --> 01:50:06,021 BINKY: That's right. 2160 01:50:06,021 --> 01:50:08,151 Initially, pointers don't point to anything. 2161 01:50:08,151 --> 01:50:11,181 The things they point to are called pointees, and setting them up 2162 01:50:11,181 --> 01:50:12,174 is a separate step. 2163 01:50:12,174 --> 01:50:13,341 SPEAKER 1: Oh, right, right. 2164 01:50:13,341 --> 01:50:14,031 I knew that. 2165 01:50:14,031 --> 01:50:16,021 The pointees are separate. 2166 01:50:16,021 --> 01:50:18,351 So how do you allocate a pointee? 2167 01:50:18,351 --> 01:50:21,921 BINKY: OK, well this code allocates a new integer pointee, 2168 01:50:21,921 --> 01:50:24,994 and this part sets x to point to it. 2169 01:50:24,994 --> 01:50:26,411 SPEAKER 1: Hey, that looks better. 2170 01:50:26,411 --> 01:50:28,021 So make it do something. 2171 01:50:28,021 --> 01:50:31,411 BINKY: OK, I'll dereference the pointer x to store the number 2172 01:50:31,411 --> 01:50:33,541 42 into its pointee. 
2173 01:50:33,541 --> 01:50:37,201 For this trick, I'll need my magic wand of dereferencing. 2174 01:50:37,201 --> 01:50:40,591 SPEAKER 1: Your magic wand of dereferencing? 2175 01:50:40,591 --> 01:50:42,441 That's great. 2176 01:50:42,441 --> 01:50:44,151 BINKY: This is what the code looks like. 2177 01:50:44,151 --> 01:50:46,946 I'll just set up the number and-- 2178 01:50:46,946 --> 01:50:47,821 SPEAKER 1: Hey, look. 2179 01:50:47,821 --> 01:50:49,171 There it goes. 2180 01:50:49,171 --> 01:50:54,091 So doing a dereference on x follows the arrow to access its pointee, 2181 01:50:54,091 --> 01:50:56,131 in this case to store 42 in there. 2182 01:50:56,131 --> 01:51:00,751 Hey, try using it to store the number 13 through the other pointer, y. 2183 01:51:00,751 --> 01:51:01,891 BINKY: OK. 2184 01:51:01,891 --> 01:51:06,271 I'll just go over here to y and get the number 13 set up, 2185 01:51:06,271 --> 01:51:10,801 and then take the wand of dereferencing and just-- 2186 01:51:10,801 --> 01:51:11,881 whoa! 2187 01:51:11,881 --> 01:51:14,101 SPEAKER 1: Oh hey, that didn't work. 2188 01:51:14,101 --> 01:51:17,821 Say, Binky, I don't think dereferencing y is a good idea 2189 01:51:17,821 --> 01:51:21,016 because setting up the pointee is a separate step 2190 01:51:21,016 --> 01:51:23,551 and I don't think we ever did it. 2191 01:51:23,551 --> 01:51:24,601 BINKY: Good point. 2192 01:51:24,601 --> 01:51:27,031 SPEAKER 1: Yeah, we allocated the pointer y, 2193 01:51:27,031 --> 01:51:30,271 but we never set it to point to a pointee. 2194 01:51:30,271 --> 01:51:31,439 BINKY: Very observant. 2195 01:51:31,439 --> 01:51:33,481 SPEAKER 1: Hey, you're looking good there, Binky. 2196 01:51:33,481 --> 01:51:36,361 Can you fix it so that y points to the same pointee as x? 2197 01:51:36,361 --> 01:51:39,721 BINKY: Sure, I'll use my magic wand of pointer assignment. 2198 01:51:39,721 --> 01:51:41,971 SPEAKER 1: Is that going to be a problem, like before?
2199 01:51:41,971 --> 01:51:43,861 BINKY: No, this doesn't touch the pointees, 2200 01:51:43,861 --> 01:51:47,491 it just changes one pointer to point to the same thing as another. 2201 01:51:47,491 --> 01:51:48,511 SPEAKER 1: Oh, I see. 2202 01:51:48,511 --> 01:51:51,181 Now y points to the same place as x. 2203 01:51:51,181 --> 01:51:53,071 So wait, now y is fixed. 2204 01:51:53,071 --> 01:51:56,131 It has a pointee so you can try the wand of dereferencing again 2205 01:51:56,131 --> 01:51:58,741 to send the 13 over. 2206 01:51:58,741 --> 01:52:01,073 BINKY: OK, here it goes. 2207 01:52:01,073 --> 01:52:02,281 SPEAKER 1: Hey, look at that. 2208 01:52:02,281 --> 01:52:04,111 Now dereferencing works on y. 2209 01:52:04,111 --> 01:52:08,161 And because the pointers are sharing that one pointee, they both see the 13. 2210 01:52:08,161 --> 01:52:09,301 BINKY: Yeah, sharing. 2211 01:52:09,301 --> 01:52:09,871 Whatever. 2212 01:52:09,871 --> 01:52:11,911 So are we going to switch places now? 2213 01:52:11,911 --> 01:52:13,831 SPEAKER 1: Oh look, we're out of time. 2214 01:52:13,831 --> 01:52:14,951 BINKY: But-- 2215 01:52:14,951 --> 01:52:17,171 DAVID J. MALAN: That's from our friend Nick Parlante at Stanford. 2216 01:52:17,171 --> 01:52:19,511 So let's consider what Nick did here as Binky. 2217 01:52:19,511 --> 01:52:21,581 So here is all the code together. 2218 01:52:21,581 --> 01:52:25,258 These first couple of lines were not bad, and notice that in Stanford's code 2219 01:52:25,258 --> 01:52:26,591 they move the stars to the left. 2220 01:52:26,591 --> 01:52:27,341 That's fine. 2221 01:52:27,341 --> 01:52:30,251 Again, more conventional might be this syntax here. 2222 01:52:30,251 --> 01:52:31,461 These two lines are fine. 2223 01:52:31,461 --> 01:52:34,781 It's OK to create variables, even pointers, 2224 01:52:34,781 --> 01:52:38,411 and not assign them a value initially so long as you eventually do. 2225 01:52:38,411 --> 01:52:40,931 So we eventually do here, with this line.
2226 01:52:40,931 --> 01:52:43,991 We assign to x the return value of malloc, which 2227 01:52:43,991 --> 01:52:45,821 is presumably the address of something. 2228 01:52:45,821 --> 01:52:49,071 To be fair, we should really be checking for null as well, 2229 01:52:49,071 --> 01:52:50,991 but that's not the biggest problem here. 2230 01:52:50,991 --> 01:52:53,481 The biggest problem is not even this next line, 2231 01:52:53,481 --> 01:52:59,231 which means go to the memory location in x and store the number 42 there. 2232 01:52:59,231 --> 01:53:01,451 That's fine, because again, malloc returns 2233 01:53:01,451 --> 01:53:03,701 the address of some chunk of memory. 2234 01:53:03,701 --> 01:53:05,801 This chunk of memory is big enough for an int. 2235 01:53:05,801 --> 01:53:08,711 x is therefore going to store the address of that chunk that's 2236 01:53:08,711 --> 01:53:09,671 big enough for an int. 2237 01:53:09,671 --> 01:53:13,541 Star x, recall the dereference operator, means go to that address 2238 01:53:13,541 --> 01:53:15,341 and put 42 in it. 2239 01:53:15,341 --> 01:53:18,461 It's like going to the mailbox and putting the number 42 in it 2240 01:53:18,461 --> 01:53:21,371 instead of taking the number 50 out, like we did before. 2241 01:53:21,371 --> 01:53:23,051 But why is this line bad? 2242 01:53:23,051 --> 01:53:26,291 This is where Binky lost his head, so to speak. 2243 01:53:26,291 --> 01:53:27,641 Why is this bad? 2244 01:53:27,641 --> 01:53:28,681 Yeah. 2245 01:53:28,681 --> 01:53:30,681 AUDIENCE: We haven't yet allocated space for it. 2246 01:53:30,681 --> 01:53:31,231 DAVID J. MALAN: Exactly. 2247 01:53:31,231 --> 01:53:33,141 We haven't yet allocated space for y. 2248 01:53:33,141 --> 01:53:36,051 There's no mention of malloc, there's no assignment of y, 2249 01:53:36,051 --> 01:53:37,591 even to that same memory.
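For comparison, here is a sketch of the same sequence with y given a pointee before it is dereferenced. The variable names x and y and the values 42 and 13 come from the video; the wrapper function binky_fixed, the NULL check, the return value, and the call to free are assumptions added here for illustration.

```c
#include <stdlib.h>

// Hypothetical wrapper around the steps from the video. Returns the value
// seen through x at the end, or -1 if malloc fails.
int binky_fixed(void)
{
    // Two pointers that can point to ints; neither points anywhere yet
    int *x;
    int *y;

    // Allocate an int pointee and make x point to it
    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Follow x to its pointee and store 42 there
    *x = 42;

    // Dereferencing y at this point would use a garbage address, so first
    // make y point to the same pointee as x
    y = x;

    // Now it is safe to follow y; x and y share one pointee
    *y = 13;

    int result = *x;

    // One pointee was allocated, so it is freed exactly once
    free(x);
    return result;
}
```

Because x and y share the pointee, reading through x after the write through y yields 13.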
2250 01:53:37,591 --> 01:53:40,441 So this would be, go to the address in y, 2251 01:53:40,441 --> 01:53:43,831 but if there is no known address in y, it is a so-called garbage value, 2252 01:53:43,831 --> 01:53:46,761 which means go to some random address that you have no control over, 2253 01:53:46,761 --> 01:53:47,571 and boom-- 2254 01:53:47,571 --> 01:53:52,221 that might cause what we've seen in the past, perhaps as a segmentation fault. 2255 01:53:52,221 --> 01:53:54,111 Now this, fortunately, is the kind of thing 2256 01:53:54,111 --> 01:53:58,041 that if you don't quite have the eye for it yet, Valgrind, that new tool, 2257 01:53:58,041 --> 01:53:59,911 could help you find as well. 2258 01:53:59,911 --> 01:54:03,681 But it's just another example of again, the sort of upside and downside 2259 01:54:03,681 --> 01:54:07,111 of having control now over memory at this level. 2260 01:54:07,111 --> 01:54:07,611 All right. 2261 01:54:07,611 --> 01:54:09,444 Well, let's go ahead and do one other thing. 2262 01:54:09,444 --> 01:54:12,586 Consider from last week that this notion of swapping 2263 01:54:12,586 --> 01:54:14,211 was actually a really common operation. 2264 01:54:14,211 --> 01:54:17,211 We had all of our volunteers come up, we had to swap a lot of things 2265 01:54:17,211 --> 01:54:19,581 during bubble sort and even selection sort, 2266 01:54:19,581 --> 01:54:21,681 and we just took for granted that the two 2267 01:54:21,681 --> 01:54:23,613 humans would swap themselves just fine. 2268 01:54:23,613 --> 01:54:25,821 But there needs to be code to do that if you actually 2269 01:54:25,821 --> 01:54:29,638 implement bubble sort, selection sort, or anything that involves swapping. 2270 01:54:29,638 --> 01:54:31,221 So let's consider some code like this. 2271 01:54:31,221 --> 01:54:33,291 We'll keep it simple like last week, where 2272 01:54:33,291 --> 01:54:40,339 we wanted to swap some values like int A and int B, for instance, here.
2273 01:54:40,339 --> 01:54:43,131 Void because I'm not going to return a value, but I have a function 2274 01:54:43,131 --> 01:54:44,031 called swap. 2275 01:54:44,031 --> 01:54:49,341 So here, for instance, might be some code for this. 2276 01:54:49,341 --> 01:54:50,549 But why is it so complicated? 2277 01:54:50,549 --> 01:54:52,133 Here, let's actually take a step back. 2278 01:54:52,133 --> 01:54:53,301 Why don't we do this here. 2279 01:54:53,301 --> 01:54:54,921 I think we have time for one more volunteer. 2280 01:54:54,921 --> 01:54:56,379 Could we get someone to come on up? 2281 01:54:56,379 --> 01:54:58,671 You have to be comfy on camera and you're 2282 01:54:58,671 --> 01:55:01,701 being asked to help with your-- oh, I'll go with the friend, pointing. 2283 01:55:01,701 --> 01:55:05,641 So whoever has their friend doing this here-- 2284 01:55:05,641 --> 01:55:06,621 no? 2285 01:55:06,621 --> 01:55:08,511 Now they're pointing it over here. 2286 01:55:08,511 --> 01:55:10,251 Now, literally an arm is being twisted. 2287 01:55:10,251 --> 01:55:11,751 OK. 2288 01:55:11,751 --> 01:55:12,471 Come on down. 2289 01:55:12,471 --> 01:55:13,341 That backfired. 2290 01:55:18,311 --> 01:55:18,956 Come on over. 2291 01:55:24,481 --> 01:55:26,241 And what is your name? 2292 01:55:26,241 --> 01:55:27,153 AUDIENCE: Marina. 2293 01:55:27,153 --> 01:55:28,111 DAVID J. MALAN: Marina. 2294 01:55:28,111 --> 01:55:29,641 Nice to meet you. 2295 01:55:29,641 --> 01:55:31,718 Who were you trying to volunteer? 2296 01:55:31,718 --> 01:55:32,801 AUDIENCE: My friend Jesse. 2297 01:55:32,801 --> 01:55:33,971 DAVID J. MALAN: OK. 2298 01:55:33,971 --> 01:55:38,291 So here we have for Marina two glasses of liquid, orange and purple, 2299 01:55:38,291 --> 01:55:39,821 just so that they're super obvious. 
2300 01:55:39,821 --> 01:55:42,226 And suppose that the problem at hand, like last week, 2301 01:55:42,226 --> 01:55:45,101 is just to swap two values, as though these two glasses represented 2302 01:55:45,101 --> 01:55:47,111 two people and we want to swap them. 2303 01:55:47,111 --> 01:55:50,501 But let's consider these glasses to be like variables, or locations 2304 01:55:50,501 --> 01:55:52,211 in an array, and you know what? 2305 01:55:52,211 --> 01:55:54,681 I'd really like you to swap the values. 2306 01:55:54,681 --> 01:55:58,241 So orange has to go in there, and purple has to go in there. 2307 01:55:58,241 --> 01:55:59,194 How would you do it? 2308 01:55:59,194 --> 01:56:01,361 And we'll see if we can then translate that to code. 2309 01:56:01,361 --> 01:56:03,508 AUDIENCE: [INAUDIBLE] 2310 01:56:03,508 --> 01:56:04,591 DAVID J. MALAN: OK, what-- 2311 01:56:04,591 --> 01:56:06,444 say it a little louder. 2312 01:56:06,444 --> 01:56:07,111 All right, yeah. 2313 01:56:07,111 --> 01:56:09,571 So presumably, you're struggling mentally 2314 01:56:09,571 --> 01:56:12,781 with how you would do this without having an extra cup, so good foresight 2315 01:56:12,781 --> 01:56:13,321 here. 2316 01:56:13,321 --> 01:56:16,191 Let me go ahead and we do have a temporary variable, if you will. 2317 01:56:16,191 --> 01:56:18,691 So if I hand you this, how would you now solve this problem? 2318 01:56:21,181 --> 01:56:22,931 AUDIENCE: I would go like that, but it's-- 2319 01:56:22,931 --> 01:56:23,581 DAVID J. MALAN: No, that's-- 2320 01:56:23,581 --> 01:56:24,371 Oh. 2321 01:56:24,371 --> 01:56:24,871 Well, OK. 2322 01:56:24,871 --> 01:56:27,981 Go do it-- go with your instincts. 2323 01:56:27,981 --> 01:56:29,541 OK. 2324 01:56:29,541 --> 01:56:30,681 Sure, go ahead. 2325 01:56:30,681 --> 01:56:32,811 Go with whatever your instincts are.
2326 01:56:39,201 --> 01:56:41,828 Yeah, so a little-- so strictly speaking, probably 2327 01:56:41,828 --> 01:56:43,911 shouldn't have moved the glasses just because that 2328 01:56:43,911 --> 01:56:45,931 would be like moving the array locations, 2329 01:56:45,931 --> 01:56:48,611 so let's actually do it one more time but the glasses now 2330 01:56:48,611 --> 01:56:50,361 have to go back where they originally were. 2331 01:56:50,361 --> 01:56:55,051 So how would you swap these now, using this temporary variable? 2332 01:56:55,051 --> 01:56:56,476 OK, good. 2333 01:56:56,476 --> 01:56:59,101 Otherwise we'd be completely uprooting the array, for instance, 2334 01:56:59,101 --> 01:57:01,081 by just physically moving it around. 2335 01:57:01,081 --> 01:57:03,571 So you moved the orange into this temporary variable, 2336 01:57:03,571 --> 01:57:05,911 then you copied the purple into where the orange was, 2337 01:57:05,911 --> 01:57:08,281 and now, presumably, excellent. 2338 01:57:08,281 --> 01:57:11,101 The orange is going to end up where the purple once was 2339 01:57:11,101 --> 01:57:13,621 and this temporary variable, it used up some extra memory. 2340 01:57:13,621 --> 01:57:16,441 It was necessary at the time, but not necessary, ultimately. 2341 01:57:16,441 --> 01:57:22,131 But a round of applause if we could, and thank you for doing that so well. 2342 01:57:22,131 --> 01:57:26,311 So the fact that it instantly occurred to Marina 2343 01:57:26,311 --> 01:57:29,711 that you need some temporary variable is a perfect translation to code, 2344 01:57:29,711 --> 01:57:32,951 and in fact this code here, that we might glimpse now, 2345 01:57:32,951 --> 01:57:35,038 is reminiscent of exactly that algorithm, 2346 01:57:35,038 --> 01:57:37,871 where A and B, at the end of the day, are the same chunks of memory.
2347 01:57:37,871 --> 01:57:39,881 Just like the second time, the two glasses 2348 01:57:39,881 --> 01:57:42,281 have to kind of stay put, even though we're physically lifting them, 2349 01:57:42,281 --> 01:57:44,031 but they're going back to where they were, 2350 01:57:44,031 --> 01:57:46,031 is kind of like having two values, A and B, 2351 01:57:46,031 --> 01:57:49,091 and you just have a temporary variable into which you copy A, 2352 01:57:49,091 --> 01:57:52,331 then you change A with B, then you go and change 2353 01:57:52,331 --> 01:57:55,271 B with whatever the original value of A was, 2354 01:57:55,271 --> 01:57:59,921 because you temporarily stored it in this temporary variable, tmp. 2355 01:57:59,921 --> 01:58:04,161 Unfortunately, this code doesn't necessarily work as intended. 2356 01:58:04,161 --> 01:58:07,391 So let me go over to my VS Code here and open up 2357 01:58:07,391 --> 01:58:10,661 a program called swap.c, and in swap.c, let 2358 01:58:10,661 --> 01:58:15,641 me whip up something really quickly here with, how about include stdio.h, 2359 01:58:15,641 --> 01:58:17,561 int main(void). 2360 01:58:17,561 --> 01:58:22,751 Inside of main let me do something like x gets 1 and y gets 2. 2361 01:58:22,751 --> 01:58:27,881 Let me just print out as a visual confirmation that x is %i, 2362 01:58:27,881 --> 01:58:32,891 y is %i backslash n, plugging in x and y, respectively. 2363 01:58:32,891 --> 01:58:36,071 Then let me call a swap function that we'll invent in just a moment. 2364 01:58:36,071 --> 01:58:42,761 Swap x and y. And then let me print out again x is %i, y is %i backslash n, 2365 01:58:42,761 --> 01:58:46,331 just to print out again what they are, because presumably I should see 1, 2366 01:58:46,331 --> 01:58:49,494 2 first, then 2, 1 the second time. 2367 01:58:49,494 --> 01:58:51,161 Now how is swap going to be implemented? 2368 01:58:51,161 --> 01:58:54,591 Let me implement it exactly as on the screen a moment ago.
2369 01:58:54,591 --> 01:58:57,011 So void swap int x-- 2370 01:58:57,011 --> 01:58:59,501 or let's call it int A for consistency, int B. 2371 01:58:59,501 --> 01:59:01,661 But I could always call those anything I want. 2372 01:59:01,661 --> 01:59:05,891 int tmp gets A, A gets B, B gets tmp. 2373 01:59:05,891 --> 01:59:08,981 So exactly as I proposed a moment ago, and exactly 2374 01:59:08,981 --> 01:59:12,761 as Mariana really implemented it using these glasses of water. 2375 01:59:12,761 --> 01:59:16,571 I need to now include my prototype, as always, so nothing new there. 2376 01:59:16,571 --> 01:59:20,261 And I'll just copy/paste that up here, and now let's go ahead and run this. 2377 01:59:20,261 --> 01:59:23,471 So make swap-- so far, so good-- ./swap-- 2378 01:59:23,471 --> 01:59:28,331 x is now 1, y is 2, x is 1, y is 2. 2379 01:59:28,331 --> 01:59:34,091 So there seems to be a bit of a bug here, but why might this be? 2380 01:59:34,091 --> 01:59:37,931 This code does not in fact work, even though it obviously works in reality. 2381 01:59:37,931 --> 01:59:39,725 Yeah? 2382 01:59:39,725 --> 01:59:46,239 AUDIENCE: Because A and B have different addresses than x and y [INAUDIBLE]. 2383 01:59:46,239 --> 01:59:48,031 DAVID J. MALAN: Good, and let me summarize. 2384 01:59:48,031 --> 01:59:51,361 A and B do indeed have different addresses than x and y, 2385 01:59:51,361 --> 01:59:54,961 and in fact what happens when you call a function like this on line 11, 2386 01:59:54,961 --> 01:59:59,221 calling swap, passing in x and y, you are calling a function 2387 01:59:59,221 --> 02:00:00,851 by value, so to speak. 2388 02:00:00,851 --> 02:00:02,611 And this is a term of art that just means 2389 02:00:02,611 --> 02:00:07,321 you are passing in copies of x and y, respectively, and calling them 2390 02:00:07,321 --> 02:00:11,551 A and B in the context of this function, but they're indeed copies. 2391 02:00:11,551 --> 02:00:15,451 Now technically, these names are local only.
2392 02:00:15,451 --> 02:00:18,211 I could have called this x, I could have called this y, 2393 02:00:18,211 --> 02:00:22,531 I could have changed this to x, this to y, this to x, and this to y. 2394 02:00:22,531 --> 02:00:24,031 The problem would still remain. 2395 02:00:24,031 --> 02:00:27,961 Just because you use the same names in one function as you do elsewhere, 2396 02:00:27,961 --> 02:00:29,551 that doesn't mean they're the same. 2397 02:00:29,551 --> 02:00:31,121 They just look the same to you. 2398 02:00:31,121 --> 02:00:35,821 But indeed, swap is going to get copies of this x and y, and in this context, 2399 02:00:35,821 --> 02:00:38,461 this scope, so to speak-- 2400 02:00:38,461 --> 02:00:40,801 x and y will be copies of the original. 2401 02:00:40,801 --> 02:00:43,141 So for clarity, let me revert this back to A and B 2402 02:00:43,141 --> 02:00:46,951 just to make super clear that they're indeed different, albeit copies, 2403 02:00:46,951 --> 02:00:48,901 but there's indeed a problem there. 2404 02:00:48,901 --> 02:00:51,041 This function actually works fine. 2405 02:00:51,041 --> 02:00:52,361 In fact, notice this. 2406 02:00:52,361 --> 02:00:56,921 Let me go ahead and print out inside of this. printf A is %i, 2407 02:00:56,921 --> 02:01:00,991 B is %i backslash n, and then I'll print A and B. 2408 02:01:00,991 --> 02:01:04,201 And let me do that same thing at the beginning of this function before it 2409 02:01:04,201 --> 02:01:05,381 does any work. 2410 02:01:05,381 --> 02:01:06,751 Let me go ahead and rerun. 2411 02:01:06,751 --> 02:01:10,741 Make swap, ./swap, and this is promising. 2412 02:01:10,741 --> 02:01:17,371 Initially, x is 1, y is 2, A is 1, B is 2, A is 2, B is 1, 2413 02:01:17,371 --> 02:01:19,598 but then nope-- x is 1, y is 2. 2414 02:01:19,598 --> 02:01:21,931 So if anything, I've confirmed that the logic is right-- 2415 02:01:21,931 --> 02:01:25,051 Mariana's logic is right, but there's something about C. 
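The experiment above can be sketched as follows. This is a reconstruction from the narration rather than the exact file shown in lecture: it is the by-value swap with the two diagnostic printf calls, and main is left out (it would set x to 1 and y to 2 and call swap(x, y)).

```c
#include <stdio.h>

// a and b are copies of whatever the caller passed in, so the three
// assignments below only rearrange the copies.
void swap(int a, int b)
{
    printf("a is %i, b is %i\n", a, b);   // before: the caller's values
    int tmp = a;
    a = b;
    b = tmp;
    printf("a is %i, b is %i\n", a, b);   // after: swapped, but only locally
}
```

The second printf shows the copies swapped, yet the caller's x and y keep their original values once swap returns.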
2416 02:01:25,051 --> 02:01:28,921 There's something about using one function versus another that's actually 2417 02:01:28,921 --> 02:01:30,671 creating a problem here. 2418 02:01:30,671 --> 02:01:35,021 The fact that I'm passing in copies of these values is creating this problem. 2419 02:01:35,021 --> 02:01:36,391 So what in fact is going on? 2420 02:01:36,391 --> 02:01:39,211 Well again, inside of your computer's memory there are these little chips, 2421 02:01:39,211 --> 02:01:41,086 and we've been talking about them abstractly, 2422 02:01:41,086 --> 02:01:43,141 it's just this grid of memory locations. 2423 02:01:43,141 --> 02:01:46,343 It turns out that your computer uses this memory 2424 02:01:46,343 --> 02:01:47,551 in a pretty conventional way. 2425 02:01:47,551 --> 02:01:51,631 It's not just random, where it just puts stuff wherever is available, 2426 02:01:51,631 --> 02:01:55,591 it actually uses different parts of the memory for different purposes. 2427 02:01:55,591 --> 02:01:58,981 And you have control over a lot of it, but the computer uses some of it 2428 02:01:58,981 --> 02:01:59,823 for itself. 2429 02:01:59,823 --> 02:02:01,531 And let's go ahead and zoom out from this 2430 02:02:01,531 --> 02:02:05,581 and consider that within your computer's memory, what a computer will typically 2431 02:02:05,581 --> 02:02:09,001 do is actually store initially, all of the zeros and ones 2432 02:02:09,001 --> 02:02:13,001 that you compiled in the top of your computer's memory, so to speak. 2433 02:02:13,001 --> 02:02:16,231 So when you compile a program and then you run it with ./whatever, 2434 02:02:16,231 --> 02:02:19,651 or on a Mac or PC you double click on it, the computer first-- 2435 02:02:19,651 --> 02:02:24,781 the operating system first-- loads all of your program's zeros and ones, a.k.a. 2436 02:02:24,781 --> 02:02:29,371 machine code, into just one big chunk of memory at the top, so to speak.
2437 02:02:29,371 --> 02:02:33,301 Below that it stores global variables-- any variables 2438 02:02:33,301 --> 02:02:37,183 you have created in your program that are outside of main and outside 2439 02:02:37,183 --> 02:02:37,891 of any functions. 2440 02:02:37,891 --> 02:02:39,691 Generally, the top of your file. 2441 02:02:39,691 --> 02:02:41,634 Globals tend to go at the top there. 2442 02:02:41,634 --> 02:02:44,551 Then there's this chunk of memory that's generally known as the heap-- 2443 02:02:44,551 --> 02:02:46,951 and we saw that word briefly in Valgrind's output, 2444 02:02:46,951 --> 02:02:50,581 and then there's this other chunk of memory called the stack. 2445 02:02:50,581 --> 02:02:55,711 And it turns out that up until this week you were using the stack heavily. 2446 02:02:55,711 --> 02:03:00,961 Any time you use local variables in a function they end up on the stack. 2447 02:03:00,961 --> 02:03:04,681 Any time you use malloc, that memory ends up on the heap. 2448 02:03:04,681 --> 02:03:06,751 Now as the arrow suggests, this actually looks 2449 02:03:06,751 --> 02:03:09,834 like a problem waiting to happen because if you use more and more and more 2450 02:03:09,834 --> 02:03:11,671 heap, and more and more and more stack, it's 2451 02:03:11,671 --> 02:03:14,401 like two things barreling down the tracks at one another-- this does not 2452 02:03:14,401 --> 02:03:14,891 end well. 2453 02:03:14,891 --> 02:03:16,141 And that's actually a problem. 2454 02:03:16,141 --> 02:03:19,481 If you've ever heard the phrase stack overflow, or use the website, 2455 02:03:19,481 --> 02:03:21,271 this is the origin of its name. 2456 02:03:21,271 --> 02:03:23,521 When you start to use more and more and more 2457 02:03:23,521 --> 02:03:25,801 memory by calling lots and lots of functions 2458 02:03:25,801 --> 02:03:28,261 or using lots and lots of local variables, 2459 02:03:28,261 --> 02:03:30,511 you use a lot of this stack memory.
2460 02:03:30,511 --> 02:03:33,961 Or if you use malloc a lot and keep calling malloc, malloc, malloc, 2461 02:03:33,961 --> 02:03:37,681 and never really, or rarely calling free, you just use more and more memory 2462 02:03:37,681 --> 02:03:41,521 and eventually these two things might overflow each other, at which point 2463 02:03:41,521 --> 02:03:42,571 you're just out of luck. 2464 02:03:42,571 --> 02:03:45,191 The program will crash or something bad will happen. 2465 02:03:45,191 --> 02:03:47,971 So the onus is on you just to don't do that. 2466 02:03:47,971 --> 02:03:50,221 But this is the design, generally, of what's 2467 02:03:50,221 --> 02:03:52,111 going on inside of your computer's memory. 2468 02:03:52,111 --> 02:03:55,711 Now within that memory, though, there are certain conventions 2469 02:03:55,711 --> 02:03:57,571 focusing on here, the stack. 2470 02:03:57,571 --> 02:04:00,031 And in fact, let me go over here with a marker 2471 02:04:00,031 --> 02:04:03,521 and say that this represents the bottom of my memory, ultimately. 2472 02:04:03,521 --> 02:04:07,801 And so here we have a whole bunch of wooden blocks and each of these squares 2473 02:04:07,801 --> 02:04:10,091 represents a byte of memory and this, for instance, 2474 02:04:10,091 --> 02:04:12,781 might represent four bytes altogether-- good enough for an int, 2475 02:04:12,781 --> 02:04:14,111 or something like that. 2476 02:04:14,111 --> 02:04:18,451 So in my original code that I wrote earlier, that is in fact, buggy, 2477 02:04:18,451 --> 02:04:20,851 what is in fact going on inside the swap function? 2478 02:04:20,851 --> 02:04:24,901 We can visualize it like this-- when you run ./swap or any program for that 2479 02:04:24,901 --> 02:04:28,501 matter, main is the first function to get called with a C program, 2480 02:04:28,501 --> 02:04:32,011 and so I'm just going to label this bottom row of memory as main. 
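To make the stack versus heap distinction described above concrete, here is a small sketch. The helper name make_on_heap is hypothetical and not from the lecture; it only illustrates that a local variable lives on the stack and disappears when the function returns, while memory obtained from malloc lives on the heap until it is freed.

```c
#include <stdlib.h>

// Hypothetical helper: copies `value` into freshly allocated heap memory.
// Returns NULL if malloc fails; otherwise the caller must free the result.
int *make_on_heap(int value)
{
    int local = value;               // on the stack, gone once we return
    int *p = malloc(sizeof(int));    // on the heap, survives the return
    if (p == NULL)
    {
        return NULL;                 // malloc can fail when memory runs out
    }
    *p = local;
    return p;
}
```

Every successful call uses a little more heap, so each result should eventually be passed to free; calling malloc repeatedly without free is the "more and more memory" scenario mentioned above.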
2481 02:04:32,011 --> 02:04:36,381 And what were the two variables I had in main called in this code? 2482 02:04:36,381 --> 02:04:37,631 Yeah. 2483 02:04:37,631 --> 02:04:38,201 x and y. 2484 02:04:38,201 --> 02:04:40,401 And each of those was an int, so that's four bytes, 2485 02:04:40,401 --> 02:04:43,121 so it's deliberate that I reserved four-- 2486 02:04:43,121 --> 02:04:45,951 a chunk of wood here that's four bytes. 2487 02:04:45,951 --> 02:04:49,901 So let me just call this x, and I'm just going to write the number 1 in this box 2488 02:04:49,901 --> 02:04:50,411 here. 2489 02:04:50,411 --> 02:04:54,431 And then I had my other variable y, and I'm going to put the number 2 there. 2490 02:04:54,431 --> 02:04:58,641 What happens when main calls swap like it does in this code here? 2491 02:04:58,641 --> 02:05:04,931 Well, it has two variables of its own, A and B, and A initially is 1 2492 02:05:04,931 --> 02:05:09,341 and B is initially 2, but it has a third variable, tmp, 2493 02:05:09,341 --> 02:05:12,371 which is a local variable in addition to the arguments A and B 2494 02:05:12,371 --> 02:05:16,931 that are passed in, so I'm going to call this tmp, tmp over here. 2495 02:05:16,931 --> 02:05:18,156 And what is the value of tmp? 2496 02:05:18,156 --> 02:05:19,781 Well, we have to look back at the code. 2497 02:05:19,781 --> 02:05:24,431 tmp initially gets the value of A. All right, the value of A was 1, 2498 02:05:24,431 --> 02:05:26,141 so tmp initially gets 1. 2499 02:05:26,141 --> 02:05:28,601 That's step one in my three line program. 2500 02:05:28,601 --> 02:05:32,621 OK, A equals B. So that is assigned from the right to the left of the B 2501 02:05:32,621 --> 02:05:36,251 into the A. So B is 2, A is this, so let me go ahead 2502 02:05:36,251 --> 02:05:38,361 and erase this and just overwrite that.
2503 02:05:38,361 --> 02:05:41,891 So at this moment in the story you have two copies of two, 2504 02:05:41,891 --> 02:05:44,711 so that's OK though, because the third line of code 2505 02:05:44,711 --> 02:05:47,741 says tmp gets copied into B. So what's tmp-- 2506 02:05:47,741 --> 02:05:53,171 1, gets copied into B, so let me overwrite this 2 with a 1, 2507 02:05:53,171 --> 02:05:54,821 and now what happens? 2508 02:05:54,821 --> 02:05:57,941 Now unfortunately, the code ends. 2509 02:05:57,941 --> 02:06:01,511 swap doesn't actually do anything with the result, and the problem in C 2510 02:06:01,511 --> 02:06:03,521 is that I could have had a return value. 2511 02:06:03,521 --> 02:06:05,741 I could go in there and change void to int, 2512 02:06:05,741 --> 02:06:07,511 but which one am I going to return? 2513 02:06:07,511 --> 02:06:09,221 The A or the B? 2514 02:06:09,221 --> 02:06:11,631 The whole goal is to swap two values, and it 2515 02:06:11,631 --> 02:06:13,631 seems kind of lame if you can't write a function 2516 02:06:13,631 --> 02:06:16,661 to do something as common per last week sorting algorithms 2517 02:06:16,661 --> 02:06:18,191 as swapping two values. 2518 02:06:18,191 --> 02:06:19,541 But what really happens? 2519 02:06:19,541 --> 02:06:22,751 Well, even though when this program starts running, 2520 02:06:22,751 --> 02:06:25,991 main is using this chunk of memory at the bottom in the so-called stack, 2521 02:06:25,991 --> 02:06:28,661 and the stack is just like a cafeteria stack of trays-- 2522 02:06:28,661 --> 02:06:30,201 it grows up, like this. 2523 02:06:30,201 --> 02:06:32,291 Here's main's memory on the stack. 2524 02:06:32,291 --> 02:06:34,571 Here's the swap function's memory on the stack. 2525 02:06:34,571 --> 02:06:37,241 It's using three ints instead of two-- 2526 02:06:37,241 --> 02:06:38,951 instead of only two. 2527 02:06:38,951 --> 02:06:42,461 What happens when the function returns, whether it's void or not? 
2528 02:06:42,461 --> 02:06:45,701 The sort of recollection that this is swap's memory goes away 2529 02:06:45,701 --> 02:06:47,291 and garbage values are left. 2530 02:06:47,291 --> 02:06:51,531 So, adorably, we get rid of these values here, 2531 02:06:51,531 --> 02:06:55,991 and there's still data there-- technically, the numbers 1, 1, and 2 2532 02:06:55,991 --> 02:06:59,591 are still there in the computer's memory but they no longer belong to us 2533 02:06:59,591 --> 02:07:01,341 because the function has now returned. 2534 02:07:01,341 --> 02:07:04,421 So they're still in there and this is kind of an example visually 2535 02:07:04,421 --> 02:07:07,781 of why there's other stuff in memory even though you didn't put it there, 2536 02:07:07,781 --> 02:07:08,621 necessarily. 2537 02:07:08,621 --> 02:07:11,071 Sometimes you did put it there, but now once 2538 02:07:11,071 --> 02:07:14,711 swap returns you only should be touching memory inside of main. 2539 02:07:14,711 --> 02:07:19,001 But we've never actually copied one value into main. 2540 02:07:19,001 --> 02:07:22,661 We haven't returned anything and we haven't solved this fundamentally. 2541 02:07:22,661 --> 02:07:24,291 So how could we do this? 2542 02:07:24,291 --> 02:07:28,301 Well, what if we instead passed into swap not copies of x and y, 2543 02:07:28,301 --> 02:07:32,681 calling them A and B. What if they passed in breadcrumbs to x and y, 2544 02:07:32,681 --> 02:07:35,861 sort of a treasure map that will lead swap to the actual x 2545 02:07:35,861 --> 02:07:37,241 and to the actual y? 2546 02:07:37,241 --> 02:07:41,051 Today we have that capability using pointers. 2547 02:07:41,051 --> 02:07:44,921 So suppose that we use this code instead. 2548 02:07:44,921 --> 02:07:47,831 There's a lot of stars going on here, which is a bit annoying, 2549 02:07:47,831 --> 02:07:50,501 but let's consider what it is we're trying to achieve. 
2550 02:07:50,501 --> 02:07:55,391 What if we pass in not x and y, but the address of x and the address of y, 2551 02:07:55,391 --> 02:07:57,501 respectively-- breadcrumbs, if you will-- 2552 02:07:57,501 --> 02:08:00,521 that will lead swap to the original values. 2553 02:08:00,521 --> 02:08:04,331 Then what we do is we still give ourselves a tmp variable, 2554 02:08:04,331 --> 02:08:05,351 like an empty glass. 2555 02:08:05,351 --> 02:08:07,691 It's still a glass, so we still call it an int, 2556 02:08:07,691 --> 02:08:10,071 but what do we want to put into that temporary variable? 2557 02:08:10,071 --> 02:08:12,654 We don't want to put A into it, because that's an address now. 2558 02:08:12,654 --> 02:08:15,371 We want to go to that address per the star 2559 02:08:15,371 --> 02:08:17,141 and put whatever's at that address. 2560 02:08:17,141 --> 02:08:18,381 What do we then want to do? 2561 02:08:18,381 --> 02:08:22,121 Well, we want to then copy into whatever's at location A, 2562 02:08:22,121 --> 02:08:24,911 we want to copy over to location A's contents 2563 02:08:24,911 --> 02:08:29,111 whatever is at location B's contents and then lastly, we 2564 02:08:29,111 --> 02:08:32,261 want to copy tmp into whatever's at location B. 2565 02:08:32,261 --> 02:08:36,149 So again, we're very deliberately introducing all of these stars 2566 02:08:36,149 --> 02:08:38,441 because we don't want to change any of these addresses, 2567 02:08:38,441 --> 02:08:41,861 we want to go to these addresses per the dereference operator 2568 02:08:41,861 --> 02:08:46,221 and put values there, or get values from. 2569 02:08:46,221 --> 02:08:47,691 So what does this actually mean?
2570 02:08:47,691 --> 02:08:52,001 Well, if I kind of rewind in this story and I go back here, I still have tmp, 2571 02:08:52,001 --> 02:08:57,671 although I'm going to delete its value to begin with, I still have B 2572 02:08:57,671 --> 02:09:01,121 and I still have A, but what's going to be different 2573 02:09:01,121 --> 02:09:05,051 this time is how I use A and B. So let me finish erasing those. 2574 02:09:05,051 --> 02:09:07,181 That's A on the left, this is B on the right. 2575 02:09:07,181 --> 02:09:09,701 At this point in the story, we're rerunning swap 2576 02:09:09,701 --> 02:09:13,151 with this new and improved version, and let's see what happens. 2577 02:09:13,151 --> 02:09:16,871 Well, x is presumably at some address. 2578 02:09:16,871 --> 02:09:20,351 Maybe it's like 0x123, as always. 2579 02:09:20,351 --> 02:09:23,471 What then does A get when I'm using this code? 2580 02:09:23,471 --> 02:09:27,131 The value of A is 0x123. 2581 02:09:27,131 --> 02:09:28,391 What is the value of B? 2582 02:09:28,391 --> 02:09:31,661 Maybe y is at 0x456. 2583 02:09:31,661 --> 02:09:32,651 What goes in B? 2584 02:09:32,651 --> 02:09:38,281 Well, I'm going to put 0x456, and then what am I going to do? 2585 02:09:38,281 --> 02:09:40,471 Based on these three lines of code, I'm going 2586 02:09:40,471 --> 02:09:44,671 to store in tmp whatever is at the address in A. What is the address in A? 2587 02:09:44,671 --> 02:09:47,701 That's this thing here, so I'm going to put 1 in tmp. 2588 02:09:47,701 --> 02:09:50,251 Line two-- I'm going to go to B-- 2589 02:09:50,251 --> 02:09:53,131 all right, B is 456, so I'm going to B and I'm 2590 02:09:53,131 --> 02:09:57,931 going to store 2 at whatever is at location A, and at location A 2591 02:09:57,931 --> 02:10:01,211 is 123, so that's this, so what am I going to do? 2592 02:10:01,211 --> 02:10:03,901 I'm going to change this 1 to a 2.
2593 02:10:03,901 --> 02:10:06,631 Last line of code-- get the value of tmp, which is 1, 2594 02:10:06,631 --> 02:10:11,731 and then put it at whatever the location B is, so B, 456, go there 2595 02:10:11,731 --> 02:10:16,291 and change it to be the value of tmp, tmp, which puts 1 here. 2596 02:10:16,291 --> 02:10:17,521 That's it for the code. 2597 02:10:17,521 --> 02:10:19,081 There's still no return value. 2598 02:10:19,081 --> 02:10:22,381 swap returns, which means these three temporary variables 2599 02:10:22,381 --> 02:10:24,091 are garbage values now. 2600 02:10:24,091 --> 02:10:26,471 They can be reused by subsequent function calls 2601 02:10:26,471 --> 02:10:31,091 but now, I've actually swapped the values of x and y. 2602 02:10:31,091 --> 02:10:35,041 Which is to say what came as naturally as the real world here for Mariana 2603 02:10:35,041 --> 02:10:38,521 is not quite as simply done in C because again, 2604 02:10:38,521 --> 02:10:40,861 functions are isolated from each other. 2605 02:10:40,861 --> 02:10:44,141 You can pass in values but you get copies of those values. 2606 02:10:44,141 --> 02:10:48,691 If you want one function to affect the value of a variable somewhere else, 2607 02:10:48,691 --> 02:10:52,021 you have to 1, understand what's going on but 2, 2608 02:10:52,021 --> 02:10:54,971 pass things in as by a pointer here. 2609 02:10:54,971 --> 02:10:58,561 So if I go back to my code here, I need to make a few changes now. 2610 02:10:58,561 --> 02:11:00,661 Let me get rid of these extra printf's. 2611 02:11:00,661 --> 02:11:03,391 Let me go in and add all these stars. 2612 02:11:03,391 --> 02:11:07,411 So I'm dereferencing these actual addresses here and here, 2613 02:11:07,411 --> 02:11:09,821 and I've got to make one more change. 2614 02:11:09,821 --> 02:11:16,381 How do I now call swap if swap is expecting an int star and an int star? 2615 02:11:16,381 --> 02:11:19,441 That is, the address of an int and the address of another int. 
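The pointer-based version walked through above can be sketched like this. It follows the three steps as narrated (save the value at A, copy the value at B into A, copy the saved value into B); the comments are added for illustration.

```c
// Takes the addresses of two ints and swaps the values stored there.
void swap(int *a, int *b)
{
    int tmp = *a;   // go to address a and copy the value found there
    *a = *b;        // copy the value at address b into address a
    *b = tmp;       // copy the saved value into address b
}
```

Because the parameters are now int * rather than int, the caller has to supply two addresses instead of two values, and the prototype has to match.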
2616 02:11:19,441 --> 02:11:21,931 What do I change on line 11 here? 2617 02:11:21,931 --> 02:11:24,231 Yeah. 2618 02:11:24,231 --> 02:11:25,983 Sorry, a little louder. 2619 02:11:25,983 --> 02:11:30,231 AUDIENCE: [INAUDIBLE] 2620 02:11:30,231 --> 02:11:33,051 DAVID J. MALAN: Sorry, the address-of operator. 2621 02:11:33,051 --> 02:11:37,731 So up here on line 11, we do ampersand x and ampersand y. 2622 02:11:37,731 --> 02:11:41,001 So that yes, we're technically passing in a copy of a value, 2623 02:11:41,001 --> 02:11:43,881 but this time the copy we're passing in is technically an address, 2624 02:11:43,881 --> 02:11:47,271 and as soon as we have an address, just like when I held up the fuzzy finger-- 2625 02:11:47,271 --> 02:11:50,571 the foamy finger-- I can point at that address, I can go to that address 2626 02:11:50,571 --> 02:11:54,561 and actually get a value from the mailbox or put a value into the mailbox 2627 02:11:54,561 --> 02:11:56,821 if I even want. 2628 02:11:56,821 --> 02:12:01,551 So let's cross our fingers now and do make swap, Enter. 2629 02:12:01,551 --> 02:12:02,721 Oh my God, so many mistakes. 2630 02:12:02,721 --> 02:12:04,881 Oh, I didn't remember to change my prototype, 2631 02:12:04,881 --> 02:12:08,421 so let me go way up here and add two more stars because I 2632 02:12:08,421 --> 02:12:09,801 made that change already. 2633 02:12:09,801 --> 02:12:14,961 Make swap, ./swap, and voilà-- now I have actually swapped. 2634 02:12:14,961 --> 02:12:15,741 Thank you. 2635 02:12:19,291 --> 02:12:19,831 Thank you. 2636 02:12:19,831 --> 02:12:21,661 The two values. 2637 02:12:21,661 --> 02:12:24,491 All right, so what more can we do here? 2638 02:12:24,491 --> 02:12:29,461 Well, let me consider that all this time we've 2639 02:12:29,461 --> 02:12:33,691 been deliberately using GetString and GetInt and GetFloat 2640 02:12:33,691 --> 02:12:35,111 and so forth, but for a reason.
2641 02:12:35,111 --> 02:12:38,069 These aren't just training wheels for the sake of making things easier, 2642 02:12:38,069 --> 02:12:41,071 they're actually in place to make your code safer. 2643 02:12:41,071 --> 02:12:45,511 And to illustrate this, let me go ahead and open up one other file here. 2644 02:12:45,511 --> 02:12:49,861 How about a file called scanf.c. 2645 02:12:49,861 --> 02:12:52,891 It turns out that the old school way-- the way in C, 2646 02:12:52,891 --> 02:12:57,151 really, of getting user input, is via functions like scanf, 2647 02:12:57,151 --> 02:13:00,751 and let me go ahead and include stdio.h, int main(void), 2648 02:13:00,751 --> 02:13:04,441 and without using the CS50 library at all for strings or for any of those 2649 02:13:04,441 --> 02:13:05,611 get functions. 2650 02:13:05,611 --> 02:13:08,161 Let me give myself an int called x. 2651 02:13:08,161 --> 02:13:12,076 Let me just print out what the value of x is, even though it's going to be a-- 2652 02:13:12,076 --> 02:13:15,361 or rather, ask the user for the value by asking them for x. 2653 02:13:15,361 --> 02:13:18,781 And I'm going to use a function called scanf that's going to scan 2654 02:13:18,781 --> 02:13:25,351 in an integer using %i, and I'm going to store whatever the human types 2655 02:13:25,351 --> 02:13:27,306 in at this location. 2656 02:13:27,306 --> 02:13:30,181 And then I'm going to go ahead and, just so we can see what happened, 2657 02:13:30,181 --> 02:13:34,231 I'm going to print out with %i whatever the human typed in as follows. 2658 02:13:34,231 --> 02:13:37,321 All right, so line eight is week 1 style code. 2659 02:13:37,321 --> 02:13:40,991 Line five and six is week 1 style code. 2660 02:13:40,991 --> 02:13:46,411 So the curiosity today is this new line. scanf is another function in stdio.h, 2661 02:13:46,411 --> 02:13:47,971 and notice what I'm doing. 
2662 02:13:47,971 --> 02:13:50,671 I'm using the same syntax that I use for printf, 2663 02:13:50,671 --> 02:13:54,091 which is kind of a little clue-- a format code to tell scanf what it is I 2664 02:13:54,091 --> 02:13:57,031 want to scan in, that is, read from the human's keyboard-- 2665 02:13:57,031 --> 02:14:00,571 and I'm telling it where to put whatever the human typed in. 2666 02:14:00,571 --> 02:14:04,321 I can't just say x, because we run into the same darn problem as with swap. 2667 02:14:04,321 --> 02:14:06,811 I have to give a little breadcrumb to the variable 2668 02:14:06,811 --> 02:14:10,111 where I want scanf to put the human's integer. 2669 02:14:10,111 --> 02:14:13,541 And so this just tells the computer to get an int. 2670 02:14:13,541 --> 02:14:15,781 This is what you would have had to type, essentially, 2671 02:14:15,781 --> 02:14:18,691 in week 1 just to get an int from the user, 2672 02:14:18,691 --> 02:14:21,541 and there's a whole bunch of things that can go wrong still, 2673 02:14:21,541 --> 02:14:24,931 but that's the cryptic syntax we would have had to show you in week 1. 2674 02:14:24,931 --> 02:14:26,881 Let me go ahead and make scanf here-- 2675 02:14:26,881 --> 02:14:29,941 oops-- user error. 2676 02:14:29,941 --> 02:14:31,891 Put the semicolon in the wrong place. 2677 02:14:31,891 --> 02:14:33,781 Make scanf, Enter. 2678 02:14:33,781 --> 02:14:35,281 Oh my God. 2679 02:14:35,281 --> 02:14:36,676 Non void doesn't return a value. 2680 02:14:40,371 --> 02:14:42,591 Oh, thank you. 2681 02:14:42,591 --> 02:14:43,221 Strike two. 2682 02:14:43,221 --> 02:14:43,851 OK. 2683 02:14:43,851 --> 02:14:45,141 Make scanf. 2684 02:14:45,141 --> 02:14:45,831 There we go. 2685 02:14:45,831 --> 02:14:46,971 OK, so scanf-- 2686 02:14:46,971 --> 02:14:49,951 I'm going to type in a number like 50 and it just prints it back out. 2687 02:14:49,951 --> 02:14:54,181 So that is the traditional way of implementing something like GetInt. 
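A minimal sketch of the integer-reading step described above. Two things here are assumptions and not part of the lecture code: the helper name read_int, and the use of fscanf on a FILE * so the same code can read from something other than the keyboard. Calling read_int(stdin, &x) behaves like scanf("%i", &x).

```c
#include <stdio.h>

// Hypothetical helper: reads one integer from `in` and stores it at the
// address in `out`. Returns 1 if an integer was read, 0 otherwise.
int read_int(FILE *in, int *out)
{
    // fscanf needs the address of the int, for the same reason swap did:
    // it has to write into the caller's variable, not into a copy.
    return fscanf(in, "%i", out) == 1;
}
```

Checking the return value is one of the "things that can go wrong" worth handling: if the input is not a number, nothing is stored at the address and the function reports 0.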
2688 02:14:54,181 --> 02:14:57,651 The problem, though, is when you start to get into strings, things 2689 02:14:57,651 --> 02:14:59,121 get dangerous quickly. 2690 02:14:59,121 --> 02:15:01,289 Let me delete all of this and give myself 2691 02:15:01,289 --> 02:15:03,831 a string s, although wait a minute-- we don't call them strings 2692 02:15:03,831 --> 02:15:06,891 anymore-- char star to store a string. 2693 02:15:06,891 --> 02:15:10,731 Then let me go ahead and just prompt the user for a string, using just printf. 2694 02:15:10,731 --> 02:15:15,531 Then let me go ahead and use scanf, ask them for a string this time with %s, 2695 02:15:15,531 --> 02:15:18,211 and store it at that address. 2696 02:15:18,211 --> 02:15:20,751 Then let me go ahead and print out whatever the human typed 2697 02:15:20,751 --> 02:15:23,641 in just by using the same notation. 2698 02:15:23,641 --> 02:15:28,791 So here, line five is the same thing as string s, but we've taken back 2699 02:15:28,791 --> 02:15:31,191 that layer today so it's char star s. 2700 02:15:31,191 --> 02:15:35,991 This is just week one, this is just week one; line seven is new. 2701 02:15:35,991 --> 02:15:41,811 scanf will also read from the human's keyboard a string and store it at s. 2702 02:15:41,811 --> 02:15:43,641 But that's OK, because s is an address. 2703 02:15:43,641 --> 02:15:46,551 It's correct not to do the ampersand. 2704 02:15:46,551 --> 02:15:47,451 It's not necessary. 2705 02:15:47,451 --> 02:15:52,071 A string is and has always been a char star, a.k.a. string. 2706 02:15:52,071 --> 02:15:54,091 The problem, though, arises as follows-- 2707 02:15:54,091 --> 02:15:56,411 if I do make scanf-- 2708 02:15:56,411 --> 02:15:57,911 oh my God, what did I do wrong-- 2709 02:15:57,911 --> 02:16:00,431 I can't-- OK, we have certain defenses in place with make. 2710 02:16:00,431 --> 02:16:06,881 Let me do clang of scanf.c, and output a program called scanf.
2711 02:16:06,881 --> 02:16:09,838 All right, so I'm overriding some of our pedagogical defenses 2712 02:16:09,838 --> 02:16:11,171 that we have in place with make. 2713 02:16:11,171 --> 02:16:15,761 Let me now run scanf of this version, Enter, and let me type in something 2714 02:16:15,761 --> 02:16:20,341 like, how about hi again. 2715 02:16:20,341 --> 02:16:23,161 So it didn't even store something and it weirdly printed out null. 2716 02:16:23,161 --> 02:16:26,821 This time it's in lowercase, but that is somewhat related. 2717 02:16:26,821 --> 02:16:31,561 What did I fundamentally do wrong though, here? 2718 02:16:31,561 --> 02:16:33,691 Why is this getting more and more dangerous? 2719 02:16:33,691 --> 02:16:35,471 And let me illustrate the point even more. 2720 02:16:35,471 --> 02:16:38,741 What if I type in not just something like hello, which also doesn't work. 2721 02:16:38,741 --> 02:16:44,581 What if I do like, hellooooo and make a really long string, Enter-- 2722 02:16:44,581 --> 02:16:45,871 that still works. 2723 02:16:45,871 --> 02:16:48,191 Can I do this again? 2724 02:16:48,191 --> 02:16:50,091 Let's try again. 2725 02:16:50,091 --> 02:16:53,271 Right, a really long, unexpectedly long string. 2726 02:16:53,271 --> 02:16:55,131 This is the nondeterminism kicking in. 2727 02:16:55,131 --> 02:16:55,851 Enter. 2728 02:16:55,851 --> 02:16:56,421 All right, damn it. 2729 02:16:56,421 --> 02:16:58,254 I was trying to trigger a segmentation fault 2730 02:16:58,254 --> 02:17:01,491 but it wouldn't, but the point still remains. 2731 02:17:01,491 --> 02:17:06,181 It's still not working, but what's the essence of why this isn't working, 2732 02:17:06,181 --> 02:17:07,851 and it's not storing my actual input? 2733 02:17:07,851 --> 02:17:08,731 Yeah. 2734 02:17:08,731 --> 02:17:10,666 AUDIENCE: Do you have to make a space? 2735 02:17:10,666 --> 02:17:12,541 DAVID J. MALAN: We have to make space for it. 
2736 02:17:12,541 --> 02:17:15,781 So what we're missing here is malloc, or something like that. 2737 02:17:15,781 --> 02:17:18,741 So I could do that, I could do something like this. 2738 02:17:18,741 --> 02:17:21,441 Well, let the human type in at least a three letter word 2739 02:17:21,441 --> 02:17:25,581 so I could do malloc of 3 plus 1 for the null character. 2740 02:17:25,581 --> 02:17:29,961 So let me give them four characters, and let me go ahead and do make scanf-- 2741 02:17:29,961 --> 02:17:30,921 whoops. 2742 02:17:30,921 --> 02:17:33,081 Nope, sorry. clang, I have to-- 2743 02:17:33,081 --> 02:17:33,721 nope. 2744 02:17:33,721 --> 02:17:34,221 Dammit. 2745 02:17:34,221 --> 02:17:40,811 Oh, include stdlib.h-- there we go. 2746 02:17:40,811 --> 02:17:43,836 That gives me malloc, now I'm going to recompile this with clang, 2747 02:17:43,836 --> 02:17:46,961 now I'm going to rerun it, and now I'm going to type in my first thing, hi. 2748 02:17:46,961 --> 02:17:48,341 That now works. 2749 02:17:48,341 --> 02:17:52,061 And let me get a little aggressive now and type in hello, which is too long. 2750 02:17:52,061 --> 02:17:54,101 Still works, but I'm getting lucky. 2751 02:17:54,101 --> 02:17:57,671 Let me try a hellooooooo. 2752 02:17:57,671 --> 02:17:59,995 Damn it, that still works, too. 2753 02:17:59,995 --> 02:18:01,091 Sort of. 2754 02:18:01,091 --> 02:18:03,290 But it actually-- not quite. 2755 02:18:03,290 --> 02:18:05,411 There's some weirdness going on there already. 2756 02:18:05,411 --> 02:18:07,011 It turns out I can also do this. 2757 02:18:07,011 --> 02:18:10,390 I could actually just say char s[4] and give myself 2758 02:18:10,390 --> 02:18:11,681 an array of four characters. 2759 02:18:11,681 --> 02:18:13,101 Let me try this one more time. 2760 02:18:13,101 --> 02:18:16,661 So let me rerun clang, ./scanf.
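The idea of making space before scanning, described above, might be sketched as follows. This is only a sketch under assumptions, not the lecture's scanf.c: the helper name read_short_word is hypothetical, it reads from a FILE pointer with fscanf so that it can be tried on any stream (pass stdin for the keyboard), and the %3s width limit is an addition that the lecture's version does not have. That limit is what stops a long input such as hellooooooo from overflowing the four bytes.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not from the lecture's code): reads one
// whitespace-delimited word of at most 3 characters from `in`
// into freshly allocated memory.
// Returns NULL if allocation fails or no word could be read.
// The caller is responsible for calling free() on the result.
char *read_short_word(FILE *in)
{
    // 3 letters plus 1 byte for the terminating null character
    char *s = malloc(3 + 1);
    if (s == NULL)
    {
        return NULL;
    }

    // The width 3 tells fscanf to store at most 3 characters
    // (plus '\0'), so longer input cannot overflow the buffer
    if (fscanf(in, "%3s", s) != 1)
    {
        free(s);
        return NULL;
    }
    return s;
}
```

Calling read_short_word(stdin) and typing a long word stores only its first three characters; the rest of the input stays unread in the stream.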
2761 02:18:16,661 --> 02:18:21,460 Hellooooooo, clearly exceeding the four characters-- 2762 02:18:21,460 --> 02:18:22,091 there we go. 2763 02:18:22,091 --> 02:18:23,080 Thank you, all right. 2764 02:18:26,821 --> 02:18:29,342 So the point here, though, is if we hadn't given you GetInt, 2765 02:18:29,342 --> 02:18:31,800 you would have had to use the scanf thing-- not a huge deal 2766 02:18:31,800 --> 02:18:33,071 because it seemed to work. 2767 02:18:33,071 --> 02:18:36,321 But if we hadn't given you GetString you would have had to do stuff like this, 2768 02:18:36,321 --> 02:18:39,481 knowing about malloc already or knowing about strings being arrays, 2769 02:18:39,481 --> 02:18:41,550 and even now there's a danger. 2770 02:18:41,550 --> 02:18:45,751 If the human types in five letters, six letters, 100 letters-- this code, 2771 02:18:45,751 --> 02:18:49,501 like with the Hello input, will probably just crash, which is bad. 2772 02:18:49,501 --> 02:18:51,481 So GetString also has this functionality built 2773 02:18:51,481 --> 02:18:53,790 in where we have a fancy loop inside such 2774 02:18:53,790 --> 02:18:58,321 that we allocate using malloc as many bytes as you physically type in, 2775 02:18:58,321 --> 02:19:00,271 and we use malloc essentially every keystroke. 2776 02:19:00,271 --> 02:19:05,101 The moment you type in h-e-l-l-o, we're laying the tracks as we go and we keep 2777 02:19:05,101 --> 02:19:09,571 allocating more and more memory so that we theoretically will never crash with 2778 02:19:09,571 --> 02:19:12,300 GetString even though it's this easy to crack-- 2779 02:19:12,300 --> 02:19:15,451 this easy to crash your code using scanf if you again 2780 02:19:15,451 --> 02:19:18,121 did it without the help of a library. 2781 02:19:18,121 --> 02:19:20,178 So where are we all going with this?
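The "laying the tracks as we go" loop described above could look roughly like the sketch below. It is not the CS50 library's actual source: the name read_line is hypothetical, and growing the buffer with realloc one character at a time is an assumption made for simplicity.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical sketch (not the CS50 library's code): reads one line
// from `in`, growing the buffer by one byte per character read.
// Returns NULL at end of input or if memory runs out.
// The caller is responsible for calling free() on the result.
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Make room for the new character plus the terminating '\0'
        char *bigger = realloc(buffer, length + 2);
        if (bigger == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        buffer[length++] = (char) c;
    }

    if (buffer == NULL)
    {
        if (c == EOF)
        {
            return NULL; // nothing left to read
        }
        // An empty line still needs one byte for '\0'
        buffer = malloc(1);
        if (buffer == NULL)
        {
            return NULL;
        }
    }

    buffer[length] = '\0';
    return buffer;
}
```

Because the buffer always grows before a character is stored, input of any length fits, at the cost of many small allocations. A real implementation would more likely grow the buffer in larger steps.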
2782 02:19:20,178 --> 02:19:22,261 Well, let me show you a few final examples that'll 2783 02:19:22,261 --> 02:19:24,601 pave the way for what will be problem set four. 2784 02:19:24,601 --> 02:19:27,761 Let me go ahead and open up from today's code-- 2785 02:19:27,761 --> 02:19:29,880 which is available on the course's website-- 2786 02:19:29,880 --> 02:19:36,841 for instance, a program like this, called phonebook.c, 2787 02:19:36,841 --> 02:19:39,540 and I'm just going to give you a quick tour of it, 2788 02:19:39,540 --> 02:19:42,502 that you'll see more details on in the context of p-set four itself. 2789 02:19:42,502 --> 02:19:45,210 We're going to introduce a few new functions you're going to see. 2790 02:19:45,210 --> 02:19:48,451 You're going to see a function called fopen, which stands for file open, 2791 02:19:48,451 --> 02:19:51,842 and it takes two arguments-- the name of a file to open like a CSV 2792 02:19:51,842 --> 02:19:55,050 that you might manipulate in Excel or Google Spreadsheets or the like-- comma 2793 02:19:55,050 --> 02:19:59,851 separated values, and then something like A for append, R for read, 2794 02:19:59,851 --> 02:20:02,790 W for write, depending on whether you want to add to the file, 2795 02:20:02,790 --> 02:20:05,321 just open it up, or change it. 2796 02:20:05,321 --> 02:20:07,831 We're going to introduce you to a file pointer. 2797 02:20:07,831 --> 02:20:09,671 You'll see that capital FILE-- 2798 02:20:09,671 --> 02:20:12,271 which is a little bit unconventional-- capital FILE is 2799 02:20:12,271 --> 02:20:15,121 a pointer to an actual file on the computer's hard drive 2800 02:20:15,121 --> 02:20:17,640 so that you can actually access something like a CSV file, 2801 02:20:17,640 --> 02:20:18,991 or heck, even images. 2802 02:20:18,991 --> 02:20:21,300 And we're going to see down below that you're also 2803 02:20:21,300 --> 02:20:25,050 going to have the ability to write files as well, or print to files.
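To make the fopen modes and the FILE pointer described above concrete, here is a minimal sketch that appends one row to a CSV file. It is not the course's phonebook.c: the helper name append_entry and the name,number row layout are assumptions chosen for illustration. It opens the file in "a" (append) mode and prints to it with the standard fprintf function.

```c
#include <stdio.h>

// Hypothetical helper (not the course's phonebook.c): appends one
// "name,number" row to the CSV file at `path`.
// Returns 0 on success and 1 on failure.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" opens for appending and creates the file if it does not exist;
    // "r" would open for reading and "w" would overwrite the file
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // fprintf works like printf but prints to the given file
    int printed = fprintf(file, "%s,%s\n", name, number);

    // fclose flushes buffered output, so its result matters too
    int closed = fclose(file);

    return (printed < 0 || closed != 0) ? 1 : 0;
}
```

Calling append_entry twice on the same path leaves two rows in the file, because append mode never discards what is already there.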
2804 02:20:25,050 --> 02:20:28,981 You'll see functions like fprintf, for file printf. 2805 02:20:28,981 --> 02:20:34,111 Or fwrite-- file write-- which now that you will begin to understand pointers, 2806 02:20:34,111 --> 02:20:37,951 you'll have the ability to actually not only read files-- 2807 02:20:37,951 --> 02:20:41,470 text files, images, other things-- but also write them out. 2808 02:20:41,470 --> 02:20:46,921 In fact for instance, just as a teaser here, JPEGs will be one of the things 2809 02:20:46,921 --> 02:20:49,321 we focus on this week where we give you a forensic image 2810 02:20:49,321 --> 02:20:51,991 and your goal is to recover as many photographs 2811 02:20:51,991 --> 02:20:55,651 from this forensic image of a digital camera as you possibly can. 2812 02:20:55,651 --> 02:20:59,071 And the way you're going to do that is by knowing in advance 2813 02:20:59,071 --> 02:21:03,571 that every JPEG in the world starts with these three bytes, written 2814 02:21:03,571 --> 02:21:05,800 in hexadecimal, but these three numbers. 2815 02:21:05,800 --> 02:21:08,521 And so in fact, just as a teaser, let me open up 2816 02:21:08,521 --> 02:21:11,701 an example you'll see on the course's website for today. 2817 02:21:11,701 --> 02:21:14,436 If I scroll through here, you'll see a program 2818 02:21:14,436 --> 02:21:16,061 that does a little something like this. 2819 02:21:16,061 --> 02:21:18,211 And again, more on this-- 2820 02:21:18,211 --> 02:21:20,401 if we could hit the button-- 2821 02:21:20,401 --> 02:21:21,041 there we go. 2822 02:21:21,041 --> 02:21:26,221 So here we have the notion of a byte we're going to create for ourselves. 2823 02:21:26,221 --> 02:21:29,101 We'll see a data type called byte, which is a common convention. 2824 02:21:29,101 --> 02:21:30,341 This gives me three bytes.
2825 02:21:30,341 --> 02:21:32,674 And you're going to learn about a function called fread, 2826 02:21:32,674 --> 02:21:36,571 which reads from a file some number of bytes-- for instance, three bytes. 2827 02:21:36,571 --> 02:21:38,341 We might then use code like this. 2828 02:21:38,341 --> 02:21:42,001 If bytes bracket zero equals equals 0xFF and bytes 2829 02:21:42,001 --> 02:21:47,761 bracket 1 equals 0xD8 and bytes bracket 2 equals 0xFF, all three of those 2830 02:21:47,761 --> 02:21:52,481 bytes I just claimed represent a JPEG, you'll see an output like this. 2831 02:21:52,481 --> 02:21:55,811 Let me go ahead and run this program as follows. 2832 02:21:55,811 --> 02:21:59,921 Let me copy jpeg.c into my directory from today's distribution. 2833 02:21:59,921 --> 02:22:08,071 Let me do make jpeg, and let me run jpeg on a file which is available online 2834 02:22:08,071 --> 02:22:11,841 called lecture.jpeg, and I claim yes, it's possibly a JPEG. 2835 02:22:11,841 --> 02:22:12,841 Well, what is that file? 2836 02:22:12,841 --> 02:22:16,481 Let me open it up for us, called lecture.jpeg, and here, for instance, 2837 02:22:16,481 --> 02:22:20,581 is that same photo with which we began class, namely implemented as a JPEG. 2838 02:22:20,581 --> 02:22:22,711 But what we're also going to do this week 2839 02:22:22,711 --> 02:22:27,631 is start to implement our own sort of filters a la Instagram, whereby 2840 02:22:27,631 --> 02:22:30,901 we might take images and actually run them through a program that 2841 02:22:30,901 --> 02:22:32,919 creates different versions thereof. 2842 02:22:32,919 --> 02:22:34,711 For instance, using a different file format 2843 02:22:34,711 --> 02:22:38,501 called BMP, which essentially lays out all of its pixels from left to right, 2844 02:22:38,501 --> 02:22:39,901 top to bottom, in a grid. 
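The three-byte JPEG check described above can be sketched as a small function. This is a sketch in the spirit of the lecture's jpeg.c rather than a copy of it: the function name starts_like_jpeg and the BYTE typedef built on uint8_t are assumptions. The three values 0xff, 0xd8, 0xff are the ones named in the lecture.

```c
#include <stdint.h>
#include <stdio.h>

// A byte is 8 bits; uint8_t is an assumed way of defining the type
typedef uint8_t BYTE;

// Hypothetical helper: returns 1 if the next three bytes of `file`
// are 0xff 0xd8 0xff, and 0 otherwise (including when fewer than
// three bytes could be read).
int starts_like_jpeg(FILE *file)
{
    BYTE bytes[3];

    // fread returns how many items it actually read
    if (fread(bytes, sizeof(BYTE), 3, file) != 3)
    {
        return 0;
    }

    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}
```

A match only means the file is possibly a JPEG, as the lecture's own output puts it: any file could happen to begin with the same three bytes.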
2845 02:22:39,901 --> 02:22:41,461 You're going to see a struct-- 2846 02:22:41,461 --> 02:22:43,501 a data structure in C that's way more complicated 2847 02:22:43,501 --> 02:22:45,631 than the candidate structure from the past, 2848 02:22:45,631 --> 02:22:47,866 or the person structure from the past, that 2849 02:22:47,866 --> 02:22:50,491 looks like this, which is just a whole bunch more values in it, 2850 02:22:50,491 --> 02:22:52,408 but we'll walk you through these in the p-set. 2851 02:22:52,408 --> 02:22:54,421 And we might take a photograph like this and ask 2852 02:22:54,421 --> 02:22:56,881 you to run a few different filters on it a la Instagram, 2853 02:22:56,881 --> 02:23:00,511 like a black and white filter, or grayscale, a sepia filter 2854 02:23:00,511 --> 02:23:04,531 to give it some old school feel, or a reflection like this to invert it, 2855 02:23:04,531 --> 02:23:07,121 or blur it, even in this way. 2856 02:23:07,121 --> 02:23:10,111 And just to end on a note here, I have a version 2857 02:23:10,111 --> 02:23:13,621 of this code ready to go that doesn't implement all of those filters, 2858 02:23:13,621 --> 02:23:16,351 it just implements one filter initially. 2859 02:23:16,351 --> 02:23:19,051 Let me go ahead and just ready this on my computer here. 2860 02:23:19,051 --> 02:23:21,106 I'm going to go into my own version of filter 2861 02:23:21,106 --> 02:23:22,981 and you'll see a few files that will give you 2862 02:23:22,981 --> 02:23:26,621 a tour of this coming week in bitmap.h, for instance, 2863 02:23:26,621 --> 02:23:31,511 is a version of this structure that I claimed existed a moment ago. 2864 02:23:31,511 --> 02:23:39,361 And let me show you this file here, helpers.c, in which there is a function 2865 02:23:39,361 --> 02:23:43,051 called filter that I've already implemented in advance today.
2866 02:23:43,051 --> 02:23:46,111 But the ones we give you for the p-set won't already be implemented, 2867 02:23:46,111 --> 02:23:48,486 this function called filter takes the height of an image, 2868 02:23:48,486 --> 02:23:51,581 the width of an image, and a two dimensional array. 2869 02:23:51,581 --> 02:23:54,571 So rows and columns of pixels, and then I 2870 02:23:54,571 --> 02:23:58,411 have a loop like this that iterates over all of the pixels in an image from top 2871 02:23:58,411 --> 02:24:00,041 to bottom, left to right. 2872 02:24:00,041 --> 02:24:02,011 And then notice what I'm going to do here. 2873 02:24:02,011 --> 02:24:05,191 I'm going to change the blue value to be zero in this case, 2874 02:24:05,191 --> 02:24:07,601 and the green value to be zero in this case. 2875 02:24:07,601 --> 02:24:08,341 But why? 2876 02:24:08,341 --> 02:24:12,091 Well, the image I have here in mind is this one, 2877 02:24:12,091 --> 02:24:14,881 whereby we have this hidden image that simply 2878 02:24:14,881 --> 02:24:18,151 has old school style-- a secret message embedded in it. 2879 02:24:18,151 --> 02:24:21,361 And if you don't happen to have in your dorm one of these secret decoder 2880 02:24:21,361 --> 02:24:23,581 glasses that essentially make everything red-- 2881 02:24:23,581 --> 02:24:26,456 getting rid of the green in the world and the blue in the world-- 2882 02:24:26,456 --> 02:24:28,831 you can actually-- I'm actually probably the only one who 2883 02:24:28,831 --> 02:24:31,111 can read this right now-- see what message 2884 02:24:31,111 --> 02:24:33,391 is hidden behind all of this red noise.
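The filter function described above, which walks every pixel and zeroes its blue and green values, might look like the sketch below. The pixel struct and its field names are assumptions made so the example stands alone; the course's own bitmap header defines its own pixel type, which this does not try to reproduce.

```c
#include <stdint.h>

// Assumed pixel layout for illustration only: one byte per colour
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} pixel;

// Keeps only the red in an image by removing all blue and green.
// `image` is a two-dimensional array of height rows and width columns.
void filter(int height, int width, pixel image[height][width])
{
    // Rows from top to bottom
    for (int i = 0; i < height; i++)
    {
        // Columns from left to right
        for (int j = 0; j < width; j++)
        {
            image[i][j].blue = 0;
            image[i][j].green = 0;
        }
    }
}
```

Because arrays are passed by address, the function changes the caller's pixels in place and has nothing to return.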
2885 02:24:33,391 --> 02:24:39,121 But if using my code written here in helpers.c I get rid of all the blue 2886 02:24:39,121 --> 02:24:41,821 in the picture and I get rid of all the green in the picture, 2887 02:24:41,821 --> 02:24:44,431 essentially implementing the idea of this filter-- 2888 02:24:44,431 --> 02:24:47,251 this red filter where you only see red-- 2889 02:24:47,251 --> 02:24:50,501 well, let's go ahead and compile this program. 2890 02:24:50,501 --> 02:24:55,471 Make filter, run ./filter on this hidden message.bmp. 2891 02:24:55,471 --> 02:24:58,531 I'm going to save it in a new file called message.bmp, 2892 02:24:58,531 --> 02:25:01,471 and with one final flourish we're going to open up 2893 02:25:01,471 --> 02:25:05,371 message.bmp, which is the result of having put on these glasses, 2894 02:25:05,371 --> 02:25:08,521 and hopefully now you too will see what I see. 2895 02:25:17,531 --> 02:25:18,931 All right, that's it for CS50! 2896 02:25:18,931 --> 02:25:19,931 We'll see you next time. 2897 02:25:21,731 --> 02:25:25,681 [MUSIC PLAYING]