1 00:00:08,986 --> 00:00:09,756 >> David: All right. 2 00:00:09,966 --> 00:00:11,756 Welcome back to CS50. 3 00:00:11,926 --> 00:00:14,236 This is the end of week four. 4 00:00:14,236 --> 00:00:16,626 So, literally earlier today when I was on my way 5 00:00:16,686 --> 00:00:20,356 to campus I emerged from the Harvard Subway Station and I was 6 00:00:20,356 --> 00:00:22,816 about to exit the station up the stairs 7 00:00:22,816 --> 00:00:25,486 when I noticed this older woman who was standing 8 00:00:25,486 --> 00:00:26,526 by one of these things. 9 00:00:26,526 --> 00:00:29,166 If unfamiliar, these are the mechanical computer devices 10 00:00:29,166 --> 00:00:31,826 that you have to use to buy a ticket on the T these days 11 00:00:31,826 --> 00:00:33,786 and there was one of these really awkward 12 00:00:33,786 --> 00:00:35,806 but touching situations where it was clear 13 00:00:35,806 --> 00:00:38,666 that this woman had no idea, you know, what to do, 14 00:00:38,666 --> 00:00:42,616 how to get from here to here, and I could see her just staring 15 00:00:42,616 --> 00:00:44,316 at the turnstiles that you're supposed to go 16 00:00:44,316 --> 00:00:45,416 through to get into the subway. 17 00:00:45,756 --> 00:00:48,936 So, you know, I actually paused and tried to lend a hand 18 00:00:48,936 --> 00:00:51,356 with this and what struck me at the end of this experience 19 00:00:51,356 --> 00:00:54,016 of walking this individual through the process 20 00:00:54,056 --> 00:00:56,926 of buying a ticket was actually a recurring theme 21 00:00:56,926 --> 00:00:58,126 in my mind for this course. 22 00:00:58,386 --> 00:01:00,856 One of the discomforts, one of the reasons that I think 23 00:01:00,856 --> 00:01:03,236 so many people in society and even 24 00:01:03,236 --> 00:01:05,646 in this class are uncomfortable when it comes to computers 25 00:01:05,646 --> 00:01:08,256 and intimidated by technology and kind of throw up their hands 26 00:01:08,546 --> 00:01:11,576 when things go wrong that you don't understand is frankly not 27 00:01:11,576 --> 00:01:13,776 necessarily your fault. 28 00:01:13,866 --> 00:01:17,596 The problem is recurringly in society 29 00:01:17,596 --> 00:01:19,236 that it's just horrible designs. 30 00:01:19,356 --> 00:01:22,126 People make really stupid decisions and so as you know 31 00:01:22,126 --> 00:01:25,106 in this course with P sets there are three axes that we look 32 00:01:25,106 --> 00:01:26,366 at when evaluating your code. 33 00:01:26,366 --> 00:01:28,806 Correctness just doesn't do what it's supposed to do 34 00:01:29,066 --> 00:01:30,996 but then also design and design is one 35 00:01:30,996 --> 00:01:32,996 of those things it's a little harder to put your finger 36 00:01:32,996 --> 00:01:35,846 on initially, but it really is something to bear 37 00:01:35,846 --> 00:01:37,536 in mind especially toward terms end 38 00:01:37,746 --> 00:01:39,556 when you tackle your own final projects. 39 00:01:39,556 --> 00:01:40,866 Odds are statistically most 40 00:01:40,866 --> 00:01:43,326 of you will tackle something web based although you'll be able 41 00:01:43,326 --> 00:01:47,706 to tackle something android based, iPhone based, C based, 42 00:01:47,946 --> 00:01:51,896 PHP, really the choice will ultimately be yours, 43 00:01:52,216 --> 00:01:53,926 but to think about exactly how 44 00:01:53,926 --> 00:01:56,866 to design good software is something you, 45 00:01:57,396 --> 00:01:59,056 is certainly a lesson that we want to promote. 46 00:01:59,056 --> 00:02:02,186 So, I actually paused for a moment to take some photographs 47 00:02:02,186 --> 00:02:04,956 after this particular experience because after walking this woman 48 00:02:04,956 --> 00:02:06,956 who happened to be, I mean she was 49 00:02:06,956 --> 00:02:09,126 at least 70 plus years old I would say, 50 00:02:09,126 --> 00:02:11,796 no offense if she's watching this on the Internet now 51 00:02:12,066 --> 00:02:14,746 and I got that wrong, but at least 70 years old. 52 00:02:14,746 --> 00:02:16,066 She was foreign and she admitted 53 00:02:16,066 --> 00:02:19,056 to me upfront this was her first time in the Boston Subway System 54 00:02:19,056 --> 00:02:21,536 and so this was completely new to her and yet you would think 55 00:02:21,536 --> 00:02:24,686 for sort of a metropolitan city you should have designed your 56 00:02:24,686 --> 00:02:27,386 technology, you should design your computer programs in a way 57 00:02:27,676 --> 00:02:30,866 that anyone who is visiting your city or who lives 58 00:02:30,866 --> 00:02:33,956 in your city can actually solve these problems quickly 59 00:02:33,956 --> 00:02:36,316 and obviously and yet here is what our experience was 60 00:02:36,316 --> 00:02:36,916 like together. 61 00:02:37,276 --> 00:02:38,576 So, we approached this machine here, 62 00:02:38,576 --> 00:02:41,056 the photograph is a little blurry, but it was very clear 63 00:02:41,056 --> 00:02:44,076 to us but if you zoom in on the screen and apologies 64 00:02:44,076 --> 00:02:45,696 for the reflections, again, this was just taken 65 00:02:45,696 --> 00:02:48,986 with my cell phone, you see what appear to be three huge buttons 66 00:02:48,986 --> 00:02:51,346 and at the top it said something like have a Charlie Card, 67 00:02:51,646 --> 00:02:53,736 which may very well apply to a lot of locals. 68 00:02:53,736 --> 00:02:55,526 And then at top right there's this longer sentence 69 00:02:55,526 --> 00:02:58,286 to purchase new Charlie ticket, value tickets 70 00:02:58,286 --> 00:02:59,606 and passes, press here. 71 00:02:59,936 --> 00:03:02,066 So, ask yourself frankly especially when it comes 72 00:03:02,096 --> 00:03:05,096 to designing your own tools and your own programs whether it's 73 00:03:05,096 --> 00:03:08,026 for real users to use, whether it's for your research group 74 00:03:08,026 --> 00:03:11,006 to use in some other field, you know, where do you even begin? 75 00:03:11,006 --> 00:03:15,596 Frankly, I would argue that probably 80%, 90% of the users 76 00:03:15,596 --> 00:03:19,056 of Boston Subway just want to buy one damn ticket to get 77 00:03:19,056 --> 00:03:21,286 on the subway and yet where is the button 78 00:03:21,286 --> 00:03:22,326 that says one-way pass? 79 00:03:22,746 --> 00:03:25,166 Where is the button that just says click here 80 00:03:25,456 --> 00:03:27,196 to do the most common case? 81 00:03:27,196 --> 00:03:29,466 This is optimized not for that common case 82 00:03:29,746 --> 00:03:31,106 but for all possible cases. 83 00:03:31,106 --> 00:03:34,096 So, I did some reading quickly and she looked at the screen 84 00:03:34,096 --> 00:03:36,856 for a moment and then I suggested we hit the top right 85 00:03:36,856 --> 00:03:39,236 button to purchase new Charlie Ticket, whatever that is, 86 00:03:39,556 --> 00:03:40,826 value tickets and passes. 87 00:03:40,826 --> 00:03:41,726 So, we pressed here. 88 00:03:42,086 --> 00:03:42,896 So now we get this. 89 00:03:42,896 --> 00:03:44,716 It's a little more straightforward now, 90 00:03:44,716 --> 00:03:47,166 but even then semantically frankly it's a little confusing. 91 00:03:47,166 --> 00:03:48,606 What ticket would you like to purchase? 92 00:03:48,846 --> 00:03:51,496 Well, you can buy a pass, which would seem conceptually not 93 00:03:51,496 --> 00:03:54,876 to be a ticket, but at top left was somewhat obvious this time 94 00:03:55,336 --> 00:03:56,826 bus and subway tickets. 95 00:03:56,826 --> 00:03:58,516 We're in the subway we want 96 00:03:58,516 --> 00:04:00,026 to buy a subway ticket so we click that. 97 00:04:00,466 --> 00:04:02,156 So, then we had to select our fare. 98 00:04:02,156 --> 00:04:04,376 So this, too, fairly straightforward and yet 99 00:04:04,376 --> 00:04:07,496 and this was a socially awkward moment I had 100 00:04:07,646 --> 00:04:10,526 to ask her are you an adult or a senior? 101 00:04:10,526 --> 00:04:12,346 I didn't want to make that judgment call for her 102 00:04:12,346 --> 00:04:15,206 when she said I am, what did she say? 103 00:04:15,206 --> 00:04:18,466 She said I am beyond senior or something like that. 104 00:04:18,706 --> 00:04:21,066 So, I said, okay, I didn't want to assume so we clicked senior 105 00:04:21,196 --> 00:04:23,116 and frankly it was then a dead end because apparently 106 00:04:23,116 --> 00:04:26,386 to get some senior discount you need to have some special pass 107 00:04:26,546 --> 00:04:28,366 or something like this that we didn't have and 108 00:04:28,366 --> 00:04:31,026 yet that certainly wasn't obvious here so now we together 109 00:04:31,026 --> 00:04:32,556 and this person in particular had to figure 110 00:04:32,556 --> 00:04:34,936 out how you go back and then restart this process. 111 00:04:34,936 --> 00:04:35,586 So we went back. 112 00:04:35,876 --> 00:04:38,706 We clicked adult at top left this time and now this. 113 00:04:38,706 --> 00:04:40,856 Like I just want to get on the damn subway. 114 00:04:40,856 --> 00:04:42,846 I don't want to have to consider how much it costs 115 00:04:42,846 --> 00:04:45,186 or what multiple of number of tickets I want to get. 116 00:04:45,476 --> 00:04:48,316 I just wanted to get her on the subway and she, too, 117 00:04:48,316 --> 00:04:51,066 she did mention I want a round trip, she wants to be able 118 00:04:51,066 --> 00:04:54,026 to come back from Park Street Station so where is the option 119 00:04:54,026 --> 00:04:56,026 that just says one way or round trip? 120 00:04:56,306 --> 00:04:58,296 No, instead we have to click other amount 121 00:04:58,356 --> 00:05:01,266 at bottom right there, when we have to input 122 00:05:01,266 --> 00:05:04,156 in this screen 4-0-0 123 00:05:04,156 --> 00:05:06,466 after consulting visually the little cheat sheet 124 00:05:06,466 --> 00:05:08,286 on the device itself it tells you 125 00:05:08,516 --> 00:05:13,166 in a very long chart how much one pass costs so we multiply 126 00:05:13,166 --> 00:05:15,206 and type in 4-0-0, we hit enter. 127 00:05:15,546 --> 00:05:18,856 Now, we're reminded that an SV adult is what we're buying, 128 00:05:18,856 --> 00:05:20,976 whatever SV is, probably single value, 129 00:05:20,976 --> 00:05:22,876 but who the hell cares at this point? 130 00:05:22,876 --> 00:05:26,086 So now amount to pay $4, credit card/debit card. 131 00:05:26,086 --> 00:05:29,606 So, okay, we go ahead and click credit card at this point. 132 00:05:30,096 --> 00:05:32,086 I'm reminded of the confirmation screen here 133 00:05:32,086 --> 00:05:34,246 and that's not too bad, then we put in the card 134 00:05:34,246 --> 00:05:36,856 and we click confirm and then finally we see the 135 00:05:36,856 --> 00:05:38,256 swap function. 136 00:05:39,876 --> 00:05:43,426 [Laughter] Then finally literally nine steps later the 137 00:05:43,546 --> 00:05:46,496 ticket comes out of the machine and then, frankly, 138 00:05:46,496 --> 00:05:47,666 had to go through the process of where 139 00:05:47,666 --> 00:05:49,096 to put the ticket in the machines. 140 00:05:49,096 --> 00:05:51,916 If you've ever used those systems they actually run 141 00:05:51,916 --> 00:05:53,986 Windows because I've seen the blue screen of death 142 00:05:53,986 --> 00:05:57,216 on these screens, but you put the ticket in, it goes in 143 00:05:57,216 --> 00:05:59,486 and out, in and out and in and out and eventually comes back 144 00:05:59,486 --> 00:06:01,006 out in the right place, you pull the ticket out 145 00:06:01,076 --> 00:06:03,446 and in my experience in Boston for the past several years, 146 00:06:03,546 --> 00:06:06,716 I'd say 5% of the time the gates don't even open at that point, 147 00:06:06,996 --> 00:06:09,356 but that's a whole other issue, but the point here is 148 00:06:09,356 --> 00:06:12,476 that at every point in this design process the wrong 149 00:06:12,476 --> 00:06:13,976 decision was made, right? 150 00:06:14,186 --> 00:06:15,736 Again, I don't have data to back this up, 151 00:06:15,736 --> 00:06:18,426 but just common sense suggests to me that the common case is 152 00:06:18,646 --> 00:06:21,176 if I'm on the subway platform I want to get on that train 153 00:06:21,416 --> 00:06:22,566 and maybe I want to get back 154 00:06:22,566 --> 00:06:24,536 and where are those two buttons, right? 155 00:06:24,536 --> 00:06:27,216 So when it comes time to implementing your own software, 156 00:06:27,426 --> 00:06:29,046 ask yourselves these questions. 157 00:06:29,046 --> 00:06:31,276 It may be obvious to you, the developer 158 00:06:31,276 --> 00:06:34,016 or the computer scientist or the engineer, well, obviously, 159 00:06:34,016 --> 00:06:36,776 I just click, here, here, here, here, here, here and voila, 160 00:06:36,776 --> 00:06:39,466 out comes my ticket, but most people don't have an interest 161 00:06:39,606 --> 00:06:41,346 in that process and a lot of people, 162 00:06:41,346 --> 00:06:43,266 this person in particular, 163 00:06:43,596 --> 00:06:45,666 just don't necessarily understand that process. 164 00:06:45,666 --> 00:06:47,896 Her goal was so simple, get on the train. 165 00:06:47,926 --> 00:06:51,276 The solution is so simple, give us $2, we'll give you a ticket 166 00:06:51,506 --> 00:06:53,256 and yet it takes nine steps to do that. 167 00:06:53,256 --> 00:06:55,376 So, I don't mean to start off with a bit of a rant. 168 00:06:55,376 --> 00:06:57,586 I actually do get worked up over these things 169 00:06:57,586 --> 00:06:58,446 because it drives me nuts 170 00:06:58,486 --> 00:07:00,876 because these are not hard problems to solve and 171 00:07:00,876 --> 00:07:02,526 yet consistently throughout society 172 00:07:02,526 --> 00:07:05,776 and your own laptops there are dozens of examples I'm sure 173 00:07:05,776 --> 00:07:07,696 of poorly designed software. 174 00:07:07,696 --> 00:07:10,456 So, among the lessons you'll hopefully exist this course 175 00:07:10,456 --> 00:07:12,696 with is not just how to make something correct. 176 00:07:12,696 --> 00:07:15,186 The machine is for the most part correct. 177 00:07:15,186 --> 00:07:17,376 It eventually outputs a ticket with the right amount, 178 00:07:17,716 --> 00:07:20,746 but my God, how many steps does it take for us to get there? 179 00:07:20,746 --> 00:07:23,056 Let me disclaim, too, we're by no means perfect 180 00:07:23,056 --> 00:07:25,366 so the Harvard courses APP that a lot of you used on the web 181 00:07:25,366 --> 00:07:28,786 to shop for courses, we recently sent out a form to get feedback 182 00:07:28,786 --> 00:07:29,996 from everyone because we don't doubt 183 00:07:29,996 --> 00:07:33,256 that it was imperfect the first time around, but we're going 184 00:07:33,256 --> 00:07:35,986 to iterate and make amends with that. 185 00:07:36,016 --> 00:07:38,046 So realize you don't have to get it right the first time, 186 00:07:38,046 --> 00:07:39,586 but you have to listen and watch 187 00:07:39,586 --> 00:07:41,976 to what actually those problems are and that is one 188 00:07:41,976 --> 00:07:43,456 of the roles your teaching fellows play. 189 00:07:43,756 --> 00:07:45,746 It's not just about stopping some stupid number 190 00:07:45,746 --> 00:07:47,916 on your transcript in the end; 191 00:07:47,916 --> 00:07:50,606 it's about actually providing some compelling feedback. 192 00:07:50,836 --> 00:07:54,476 So, with that said let me hop off the soapbox 193 00:07:54,476 --> 00:07:56,066 and remind us of this. 194 00:07:56,266 --> 00:08:00,506 So, a couple of weeks ago we proposed to swap two integers, 195 00:08:00,536 --> 00:08:01,826 and we couldn't at the time. 196 00:08:02,056 --> 00:08:05,146 Such a simple goal and you look at the code, looks, correct, 197 00:08:05,146 --> 00:08:08,126 we used GDB recall, the new debugger, and we stepped 198 00:08:08,236 --> 00:08:11,646 through this code and it was correct up until the last line 199 00:08:11,646 --> 00:08:14,296 but then as soon as that second curly brace happened 200 00:08:14,566 --> 00:08:18,436 and this function called swap returned as we say all 201 00:08:18,436 --> 00:08:21,636 of this hard work of swapping A and B seemed to have been lost 202 00:08:21,636 --> 00:08:24,986 and now last week we attributed this to this issue of scope. 203 00:08:25,226 --> 00:08:28,256 This idea that each function has its own chunk of memory 204 00:08:28,496 --> 00:08:32,016 that other functions can't necessarily see unless you give 205 00:08:32,016 --> 00:08:35,456 them access to it by passing in copies of variables 206 00:08:35,736 --> 00:08:38,796 or as we'll see today providing references there to 207 00:08:38,796 --> 00:08:40,926 or pointers there to and this is new jargon 208 00:08:40,926 --> 00:08:43,936 that we'll tease apart today and it's at this point in the class 209 00:08:43,936 --> 00:08:45,996 where you really begin to appreciate both the power 210 00:08:45,996 --> 00:08:49,376 and frankly the danger of programming particularly 211 00:08:49,376 --> 00:08:51,666 in a fairly low-level language like C 212 00:08:51,896 --> 00:08:53,856 which gives you great flexibility. 213 00:08:53,856 --> 00:08:56,716 You can touch almost any part of memory in the computer systems 214 00:08:56,716 --> 00:08:59,496 that you want with your program, but do you want to? 215 00:08:59,766 --> 00:09:00,486 Probably not. 216 00:09:00,486 --> 00:09:03,336 Almost everyone in this room has probably had a segfault 217 00:09:03,336 --> 00:09:05,146 at some point or core dump where you end 218 00:09:05,146 --> 00:09:06,796 up with this random file called core, 219 00:09:07,026 --> 00:09:08,896 which recall is just the contents of memory 220 00:09:08,896 --> 00:09:10,886 at the time your program crashed. 221 00:09:11,226 --> 00:09:13,946 Well, that suggests that you were not using memory correctly 222 00:09:13,946 --> 00:09:16,426 and so we'll tease apart today exactly what it means 223 00:09:16,746 --> 00:09:19,596 to navigate inside of a computer by way of memory 224 00:09:19,596 --> 00:09:22,396 and we'll also touch on over time what are some 225 00:09:22,396 --> 00:09:24,096 of the evils that might happen. 226 00:09:24,096 --> 00:09:25,416 In fact, I dug up this 227 00:09:25,526 --> 00:09:28,206 from a few years ago it was actually pretty fortuitous 228 00:09:28,206 --> 00:09:31,456 like the day before one of CS50s lectures the original iPhone 229 00:09:31,456 --> 00:09:31,896 was cracked. 230 00:09:32,106 --> 00:09:35,596 To crack the phone means or it was jail broken as people say 231 00:09:35,836 --> 00:09:39,366 which means you were able to use an iPhone back then and now 232 00:09:39,366 --> 00:09:42,616 with other software on like AT&T's network or broad, 233 00:09:42,926 --> 00:09:45,756 or sorry, on T-Mobile's network as opposed to just AT&T. 234 00:09:45,946 --> 00:09:47,676 You can install other software on it, you don't have 235 00:09:47,676 --> 00:09:49,466 to use the Apple Store, you can use third party stores. 236 00:09:49,696 --> 00:09:51,936 So, it frees you from some of Apple's tethers 237 00:09:52,166 --> 00:09:54,396 and this was the code that circulated on the Internet 238 00:09:54,396 --> 00:09:56,706 with which people could crack their iPhones. 239 00:09:56,706 --> 00:09:58,356 It was pretty primitive at the time. 240 00:09:58,356 --> 00:10:00,046 This was not sort of for the timid 241 00:10:00,206 --> 00:10:02,576 because you could very easily brick your iPhone. 242 00:10:03,046 --> 00:10:06,436 Yes, as you may have, okay, maybe I could have, 243 00:10:06,526 --> 00:10:08,506 I see the smiles that there's something interesting 244 00:10:08,506 --> 00:10:09,236 in the comment. 245 00:10:09,286 --> 00:10:11,826 So, I'll just fast forward through that, 246 00:10:11,966 --> 00:10:13,966 but look the point really I was trying to make, 247 00:10:14,976 --> 00:10:17,616 is there anything else inappropriate here? 248 00:10:18,166 --> 00:10:18,896 No, we're good. 249 00:10:19,046 --> 00:10:21,846 So, this is clearly C, right, and it's C because with C 250 00:10:21,846 --> 00:10:24,476 and a few other languages you get some low-level control 251 00:10:24,476 --> 00:10:26,286 and what I believe happens in this case 252 00:10:26,286 --> 00:10:28,496 of the iPhone's first jail breaking was 253 00:10:28,496 --> 00:10:32,066 that there was something called a buffer overflow exploit. 254 00:10:32,066 --> 00:10:34,966 An opportunity in Apple's own source code 255 00:10:35,076 --> 00:10:39,226 where someone accidently didn't check the size of an array, 256 00:10:39,226 --> 00:10:41,876 didn't make sure that when you're iterating from I 257 00:10:41,876 --> 00:10:43,566 to zero plus, plus, plus, plus, plus, 258 00:10:43,566 --> 00:10:45,926 plus that you don't actually stop for instance at the end 259 00:10:45,926 --> 00:10:48,806 of the array and so what these folks were able 260 00:10:48,806 --> 00:10:52,546 to do was inject their own code that they had written 261 00:10:52,716 --> 00:10:55,636 into their iPhones in such a way that they trick the phone 262 00:10:55,636 --> 00:10:57,036 into executing that code 263 00:10:57,246 --> 00:10:59,226 and thereafter they had full-fledged access 264 00:10:59,266 --> 00:11:00,086 to the iPhone. 265 00:11:00,086 --> 00:11:04,056 In fact, I believe they were able to Telnet 266 00:11:04,056 --> 00:11:07,336 or SSH essentially into their phones and get a blinking prompt 267 00:11:07,606 --> 00:11:09,166 because essentially under the hood 268 00:11:09,166 --> 00:11:11,686 of the phone is a UNIX like system. 269 00:11:11,686 --> 00:11:12,616 So it's actually pretty cool. 270 00:11:12,616 --> 00:11:14,416 Frankly, it's gotten much easier these days. 271 00:11:14,686 --> 00:11:16,896 Apparently you can now jail break your iPhones 272 00:11:16,896 --> 00:11:19,266 and your iPads and your iPod Touches by like pulling 273 00:11:19,266 --> 00:11:21,266 up a web page and typing the right command. 274 00:11:21,266 --> 00:11:23,126 So, we don't necessarily recommend that, 275 00:11:23,126 --> 00:11:25,156 but what was pretty cool especially that day, 276 00:11:26,596 --> 00:11:28,306 okay, so this is okay. 277 00:11:28,306 --> 00:11:30,086 Why am I reversing this APP, we were going 278 00:11:30,086 --> 00:11:31,016 to release the resources. 279 00:11:31,016 --> 00:11:31,356 All right. 280 00:11:31,966 --> 00:11:33,296 So there's some cool tricks people can do. 281 00:11:33,296 --> 00:11:34,606 We'll make this available because, frankly, 282 00:11:34,606 --> 00:11:36,496 you could Google it and now it's pretty moot 283 00:11:36,496 --> 00:11:38,016 because Apple has plugged this particular hole 284 00:11:38,016 --> 00:11:40,446 but there's all sorts of other bugs still, but it all boils 285 00:11:40,446 --> 00:11:42,386 down to one of the topics today, which is going to be 286 00:11:42,386 --> 00:11:44,716 that of this thing called a pointer. 287 00:11:44,716 --> 00:11:46,196 So, here was the problem. 288 00:11:46,196 --> 00:11:48,586 Let me draw a quick picture in chalk here. 289 00:11:48,856 --> 00:11:52,016 We generally draw our computers memory lately just some 290 00:11:52,016 --> 00:11:54,076 rectangle like this where the bottom 291 00:11:54,076 --> 00:11:56,696 of it is the thing called the stack and it's called that, 292 00:11:56,696 --> 00:11:58,106 it's nice to call it that conceptually 293 00:11:58,106 --> 00:11:59,586 because every time you call a function, 294 00:11:59,826 --> 00:12:02,106 recall that you put another frame on this stack, 295 00:12:02,316 --> 00:12:03,966 which means allocate a chunk of memory 296 00:12:03,966 --> 00:12:07,006 from main then allocate a chunk of memory for say swap 297 00:12:07,236 --> 00:12:09,426 and conceptually it's going on top, on top, 298 00:12:09,426 --> 00:12:11,566 on top of the previous function's memory 299 00:12:11,856 --> 00:12:14,946 and that's conceptually true and also we'll see eventually 300 00:12:14,946 --> 00:12:18,576 if you use GDB and you actually print the addresses 301 00:12:18,716 --> 00:12:21,556 of every byte in memory, you'll see that, in fact, 302 00:12:21,646 --> 00:12:24,246 the addresses are going higher and higher and higher or lower 303 00:12:24,246 --> 00:12:26,766 and lower depending on where you are in the story. 304 00:12:26,766 --> 00:12:30,196 So down below is generally where main's memory goes. 305 00:12:30,296 --> 00:12:32,236 So, if you've got local variables in main, 306 00:12:32,236 --> 00:12:36,116 if you're using arc and argv in main, they generally belong 307 00:12:36,116 --> 00:12:37,956 at the bottom of the stack conceptually 308 00:12:38,156 --> 00:12:41,006 and if main now calls a function whether it's get string 309 00:12:41,266 --> 00:12:45,456 or printf or swap or foo or whatever, it then gets pushed 310 00:12:45,536 --> 00:12:49,936 onto the stack here so we'll call this the swap frame here 311 00:12:49,936 --> 00:12:53,626 on the stack and any variables that are local to swap like A 312 00:12:53,626 --> 00:12:57,666 and B or temp also end up in that chunk of memory. 313 00:12:57,666 --> 00:12:58,856 So what's happening in buggy3, 314 00:12:58,856 --> 00:13:02,466 which recall looked a little something like this, 315 00:13:03,176 --> 00:13:05,866 is when you have this code in main 316 00:13:05,866 --> 00:13:08,656 up top whereby I'm declaring an int called x 317 00:13:08,656 --> 00:13:10,856 and then I'm assigning it the value of 1 318 00:13:11,096 --> 00:13:14,086 and then I'm declaring another variable called y assigning it 319 00:13:14,086 --> 00:13:18,506 the value of 2, well, what's really going on here in main is 320 00:13:18,506 --> 00:13:21,456 that if we carve out 32 bits, this is the thing 321 00:13:21,456 --> 00:13:22,506 that we're referring to as X, 322 00:13:22,786 --> 00:13:25,236 this is the thing we're referring to as y and so 323 00:13:25,236 --> 00:13:28,676 when I say x gets 1, I'm putting the value 1 where the pattern 324 00:13:28,676 --> 00:13:30,976 of bits that represent 1 there and the same deal for 2. 325 00:13:31,476 --> 00:13:34,706 Now, where they end up isn't so important for the picture sake. 326 00:13:34,706 --> 00:13:36,706 They go somewhere in that chunk of memory back-to-back, 327 00:13:37,236 --> 00:13:39,516 but for now the point is that they are inside 328 00:13:39,516 --> 00:13:41,956 of x and y main's frame. 329 00:13:42,176 --> 00:13:46,136 So, what happens next is when you call swap with a line 330 00:13:46,136 --> 00:13:49,316 of code like this one here, you're passing 331 00:13:49,316 --> 00:13:54,516 to swap two values, x and Y, but in order to bridge this gap, 332 00:13:54,756 --> 00:13:57,116 you can think of this line as really being some kind 333 00:13:57,116 --> 00:14:00,506 of barrier, in order to pass some variables from this frame 334 00:14:00,566 --> 00:14:03,146 into this frame, you can't pass the originals, right? 335 00:14:03,146 --> 00:14:05,886 You can't physically move these bits here, 336 00:14:06,036 --> 00:14:09,556 but you could very easily make a copy and so somewhere in swap, 337 00:14:09,556 --> 00:14:11,676 let's just draw it here for simplicity, there's going 338 00:14:11,676 --> 00:14:13,366 to be another 32 bits called A 339 00:14:13,366 --> 00:14:18,516 and then here there's another 32 bits called B and what we put 340 00:14:18,516 --> 00:14:22,056 in there is just copies of whatever was in x and Y. And so 341 00:14:22,056 --> 00:14:23,356 if at first glance it feels 342 00:14:23,356 --> 00:14:27,676 like this code should absolutely work, well, if you think 343 00:14:27,676 --> 00:14:28,646 about what's really going 344 00:14:28,646 --> 00:14:30,856 on underneath the hood, well, sure it works. 345 00:14:30,856 --> 00:14:33,476 It swaps this one and these two, A and B, 346 00:14:33,736 --> 00:14:35,666 but it has no access to main's memory. 347 00:14:35,886 --> 00:14:38,516 So, today do we finally equip you with a solution 348 00:14:38,516 --> 00:14:40,536 to this problem one that allows us 349 00:14:40,536 --> 00:14:44,366 to do many more powerful things including starting to read 350 00:14:44,366 --> 00:14:46,776 and write from discs, read and write files, 351 00:14:46,776 --> 00:14:48,316 which will be useful for our forensics piece 352 00:14:48,316 --> 00:14:49,656 when we actually recover JPEGs 353 00:14:49,656 --> 00:14:53,286 and similar file-based mechanisms and it allows us, 354 00:14:53,346 --> 00:14:57,886 too, to really pass anything we want around in memory, 355 00:14:57,886 --> 00:15:01,316 and we just need a new, tiny piece of new syntax 356 00:15:01,466 --> 00:15:02,746 in order to make this happen. 357 00:15:02,746 --> 00:15:06,686 So this is version two of the swap function and, in fact, 358 00:15:06,686 --> 00:15:11,946 it is almost identical, this is before, this is after, before, 359 00:15:11,946 --> 00:15:15,266 after and it looks like the solution to this problem 360 00:15:15,266 --> 00:15:17,566 that we've been revisiting a couple 361 00:15:17,566 --> 00:15:19,366 of times now is just to do what? 362 00:15:20,026 --> 00:15:22,646 Well, it's a prefix most of the symbols in this function 363 00:15:22,646 --> 00:15:25,916 with just an asterisk, with a * from the keyboard so Shift-8. 364 00:15:26,306 --> 00:15:27,826 Well, what does this actually mean? 365 00:15:28,096 --> 00:15:31,256 Well, it means something a little different conceptually. 366 00:15:31,256 --> 00:15:33,786 Notice that we're still saying int and we're still saying int, 367 00:15:33,856 --> 00:15:36,026 we're still calling things A and B, again, 368 00:15:36,026 --> 00:15:38,746 to be clear this was before, this was after, 369 00:15:39,146 --> 00:15:42,046 but the * in this context before a means 370 00:15:42,046 --> 00:15:47,346 that a is no longer an int; it's instead a pointer to an int. 371 00:15:47,526 --> 00:15:49,726 It is the address of an integer. 372 00:15:50,076 --> 00:15:53,546 So, whereas before we are literally passing a copy of x 373 00:15:53,756 --> 00:15:56,856 and y and respectively calling them a and b, 374 00:15:56,856 --> 00:15:58,716 they are different chunks of 32 bits, 375 00:15:59,096 --> 00:16:02,956 this time what we're doing is passing to swap not x itself, 376 00:16:03,186 --> 00:16:05,516 not y itself, but in a moment we'll see 377 00:16:05,516 --> 00:16:10,236 that we're actually passing in the addresses of x and y, right? 378 00:16:10,236 --> 00:16:12,216 Because if you think about it conceptually just sort 379 00:16:12,216 --> 00:16:15,606 of from a real-world perspective, if the problem is 380 00:16:15,636 --> 00:16:20,106 that swap is broken because he cannot access main's memory, 381 00:16:20,106 --> 00:16:22,136 he cannot access x or y, well, 382 00:16:22,136 --> 00:16:26,466 the solution just intuitively is give swap access to x 383 00:16:26,466 --> 00:16:28,856 and y. How do you give one function access 384 00:16:29,086 --> 00:16:30,896 to another function's chunks of memory? 385 00:16:31,236 --> 00:16:33,876 Well, simply with the * notation at least on the way 386 00:16:33,876 --> 00:16:35,876 in when you declare the function called swap, 387 00:16:36,166 --> 00:16:38,866 you simply say this is not going to take an int and another int 388 00:16:38,866 --> 00:16:39,706 because that's useless. 389 00:16:39,706 --> 00:16:41,146 It would give me copies of things. 390 00:16:41,486 --> 00:16:43,896 Instead I'm going to be expecting the address 391 00:16:43,896 --> 00:16:47,646 of some int and the address of another int and thanks 392 00:16:47,646 --> 00:16:51,476 to this address I can literally find this address in RAM, 393 00:16:51,726 --> 00:16:53,036 do anything I want there, 394 00:16:53,086 --> 00:16:56,046 return and what I've just done is actually changed 395 00:16:56,046 --> 00:16:59,246 or mutate the values of those original variables. 396 00:16:59,366 --> 00:17:02,616 In other words, if you hand someone literally the street 397 00:17:02,616 --> 00:17:05,536 address of something, right, if you hand, let's say, 398 00:17:06,446 --> 00:17:13,346 if you hand someone the address of a house or a home 399 00:17:13,506 --> 00:17:16,336 that person can then literally go to that location 400 00:17:16,336 --> 00:17:18,196 and do whatever they want at that location. 401 00:17:18,426 --> 00:17:19,626 It's the same idea here. 402 00:17:19,626 --> 00:17:23,206 If main wants to give access to swap to x and Y, 403 00:17:23,586 --> 00:17:27,556 it just has to tell swap where x and y are. 404 00:17:27,816 --> 00:17:29,936 So, let's actually take a look at this in action. 405 00:17:30,496 --> 00:17:33,626 So this here is going to be among your printouts 406 00:17:33,626 --> 00:17:35,046 from this current week. 407 00:17:35,406 --> 00:17:36,906 It's a program called swap. 408 00:17:37,076 --> 00:17:39,356 It's in a file called swap.C. Everything should be 409 00:17:39,356 --> 00:17:41,256 alphabetical so it should be toward the end 410 00:17:41,256 --> 00:17:42,956 of this week's packet and notice 411 00:17:42,956 --> 00:17:45,116 that it's almost identical except 412 00:17:45,116 --> 00:17:46,806 for a couple of syntax changes. 413 00:17:47,196 --> 00:17:49,636 So, at the very top of the file I'm just copying 414 00:17:49,636 --> 00:17:51,766 and pasting what was essentially on the slide a moment ago. 415 00:17:51,766 --> 00:17:54,506 I have to change the prototype of this function to say 416 00:17:54,506 --> 00:17:59,156 that swap no longer takes an int per se or another int per se, 417 00:17:59,366 --> 00:18:03,706 but rather it takes two pointers, two ints, and in fact, 418 00:18:03,706 --> 00:18:05,196 it's on the very last page if you're still flipping. 419 00:18:05,806 --> 00:18:08,956 So, main meanwhile is going to change, but let's fast forward 420 00:18:08,956 --> 00:18:11,066 to the end just to confirm that all I did was copy 421 00:18:11,066 --> 00:18:12,936 and paste what was on the slide a moment ago. 422 00:18:13,166 --> 00:18:16,296 Indeed, swap in your printout there is just defined 423 00:18:16,546 --> 00:18:21,136 as now taking *a and *b and then it also uses the * later, 424 00:18:21,136 --> 00:18:23,746 but we'll come back to what the different uses of the * means, 425 00:18:23,816 --> 00:18:27,976 but for now I claim conceptually it just means swap has access 426 00:18:28,276 --> 00:18:30,106 to the locations of its parameters. 427 00:18:30,376 --> 00:18:32,536 So, main does have to change a tiny bit. 428 00:18:32,826 --> 00:18:36,736 So this is the last new piece of syntax to be honest for a while. 429 00:18:36,836 --> 00:18:38,976 See, to be honest even though it might feel like a whole 430 00:18:38,976 --> 00:18:41,526 of new stuff at once, it's a pretty small language 431 00:18:41,526 --> 00:18:44,526 and so we've almost seen all of the syntactic features thus far 432 00:18:44,526 --> 00:18:47,266 so now we'll be able to start focusing more on concepts. 433 00:18:47,556 --> 00:18:49,356 Notice the one new piece of syntax is this. 434 00:18:49,596 --> 00:18:52,146 I'm not passing an x and I'm not passing a Y, 435 00:18:52,406 --> 00:18:55,066 I'm passing an ampersand x and ampersand y 436 00:18:55,066 --> 00:18:56,176 and can you take a guess 437 00:18:56,176 --> 00:18:58,566 as to what the ampersand operator must mean? 438 00:18:59,716 --> 00:19:00,276 So the address. 439 00:19:00,456 --> 00:19:03,306 That's just the special symbol in C that says don't pass X, 440 00:19:03,626 --> 00:19:07,496 that is don't pass a copy of X. Rather figure out where x is 441 00:19:07,496 --> 00:19:09,416 in memory, where he is in that frame 442 00:19:09,656 --> 00:19:13,966 and provide swap the numeric address in RAM of that value 443 00:19:13,966 --> 00:19:16,416 so that swap can go do anything it wants at that address. 444 00:19:16,656 --> 00:19:20,476 It is the postal address of x and the postal address of y. 445 00:19:20,906 --> 00:19:23,056 So that's the only change we need to make there 446 00:19:23,316 --> 00:19:25,206 and watch what happens when we actually run this. 447 00:19:25,296 --> 00:19:27,406 If I go ahead and make swap 448 00:19:27,406 --> 00:19:29,116 and then run the program called swap, 449 00:19:29,446 --> 00:19:33,126 finally for the first time I'm claiming that I've swapped x 450 00:19:33,126 --> 00:19:37,016 and y from 1 and 2 to 2 and 1 and, indeed, that is the case 451 00:19:37,356 --> 00:19:39,796 because if you look at what mains does, 452 00:19:39,796 --> 00:19:43,946 it is specifically designed to print x and y twice both before 453 00:19:44,356 --> 00:19:47,336 and after and this is the first time we've actually gotten the 454 00:19:47,336 --> 00:19:48,416 answers we expected. 455 00:19:48,796 --> 00:19:50,966 So, what must be going on in swap? 456 00:19:50,966 --> 00:19:52,466 Because we just have to reason 457 00:19:52,466 --> 00:19:55,226 through now what the * notation actually means. 458 00:19:55,226 --> 00:19:58,796 So the *a and *b here in swap's prototype says 459 00:19:58,796 --> 00:20:02,096 to expect the address of an int call it a and the address 460 00:20:02,096 --> 00:20:05,596 of another int call it b. Well, I still need to make a copy 461 00:20:05,596 --> 00:20:07,586 of one of those integers because remember we can't just pass 462 00:20:07,796 --> 00:20:10,926 these values, swap these values simultaneously 463 00:20:10,926 --> 00:20:14,106 in this way although as an aside there's actually a neat bit-wise 464 00:20:14,106 --> 00:20:15,696 operation with which you can do this -- 465 00:20:15,696 --> 00:20:16,626 we'll see this eventually -- 466 00:20:16,896 --> 00:20:18,976 but for now you can't just swap them simultaneously 467 00:20:18,976 --> 00:20:21,136 because as we've seen we'll lose one of their values. 468 00:20:21,456 --> 00:20:22,536 So let's instead do this. 469 00:20:22,536 --> 00:20:25,536 Let's instead declare another int just 470 00:20:25,536 --> 00:20:27,266 as we've been doing all this time. 471 00:20:27,496 --> 00:20:30,286 I'm going to call it temp and it doesn't matter where it ends up 472 00:20:30,286 --> 00:20:31,806 but conceptually it's in this frame. 473 00:20:32,096 --> 00:20:33,466 This thing is going to be called temp 474 00:20:33,606 --> 00:20:34,656 and what are we putting there? 475 00:20:35,036 --> 00:20:37,556 It looks like we're putting *a. 476 00:20:37,656 --> 00:20:40,926 Now this is perhaps, frankly this is really the reason 477 00:20:40,926 --> 00:20:43,156 that people tend to get confused with the new piece of syntax 478 00:20:43,426 --> 00:20:46,226 because the * means different things in different contexts. 479 00:20:46,676 --> 00:20:50,346 Here in functions prototype it just means the address of. 480 00:20:50,346 --> 00:20:54,086 Expect the address of an int but here in the context 481 00:20:54,086 --> 00:20:56,826 of the function itself inside the curly braces, 482 00:20:57,076 --> 00:20:59,506 it means go to that address. 483 00:20:59,976 --> 00:21:03,546 So, when you say Star A, A is again an address, 484 00:21:03,546 --> 00:21:07,696 *a in the inside of the function means go to that address 485 00:21:08,066 --> 00:21:11,466 so that means go to that address A, 486 00:21:11,466 --> 00:21:13,196 grab its value and put it where? 487 00:21:13,196 --> 00:21:15,636 In temp. So that means at the end of that first line 488 00:21:15,636 --> 00:21:18,386 of code just as before the value of 1 ends 489 00:21:18,386 --> 00:21:22,076 up in a variable called temp because we said go to A 490 00:21:22,076 --> 00:21:24,936 and I look around, oh, there's A, what's there? 491 00:21:24,936 --> 00:21:25,776 That value. 492 00:21:25,776 --> 00:21:26,656 And so I put it there. 493 00:21:26,986 --> 00:21:29,906 Now, by that logic the next two steps are pretty much the same. 494 00:21:29,906 --> 00:21:31,326 Star A, *b. 495 00:21:31,326 --> 00:21:32,056 What does that mean? 496 00:21:32,386 --> 00:21:36,876 So that means Star B go to B so where is B? 497 00:21:36,976 --> 00:21:39,886 So B is, and now let me update this, oh, 498 00:21:39,976 --> 00:21:42,656 actually let me take one step back here for a moment. 499 00:21:45,156 --> 00:21:47,596 Let me rewind slightly in the story 500 00:21:47,596 --> 00:21:49,336 so that I don't actually mislead because I forgot 501 00:21:49,336 --> 00:21:50,416 to update these two cells. 502 00:21:50,706 --> 00:21:53,796 So, in this program when we call this version of swap 1 503 00:21:53,796 --> 00:21:55,256 and 2 do not end up here, right? 504 00:21:55,256 --> 00:21:56,566 That's the whole point of the solution. 505 00:21:56,566 --> 00:21:57,756 We're not putting 1 and 2 here. 506 00:21:57,756 --> 00:21:59,326 What are we really putting in these boxes? 507 00:22:00,756 --> 00:22:01,786 So the address right? 508 00:22:01,786 --> 00:22:02,586 So the location. 509 00:22:02,586 --> 00:22:04,166 Now I don't know what the locations are 510 00:22:04,166 --> 00:22:06,576 so let's just arbitrarily come up with some memory addresses. 511 00:22:06,576 --> 00:22:09,866 I'm going to call this address one, two, three and I'm going 512 00:22:09,866 --> 00:22:11,846 to call this one 4, 5, 6, right? 513 00:22:11,846 --> 00:22:13,486 I don't exactly know where they are in RAM, 514 00:22:13,716 --> 00:22:16,496 but I know if I've got 2 gigabytes of RAM, zero is here, 515 00:22:16,496 --> 00:22:18,586 byte 1 is here, byte 2 is here, dot, dot, 516 00:22:18,586 --> 00:22:22,186 dot so maybe byte 123 is over here and 456 is here. 517 00:22:22,186 --> 00:22:24,626 Whatever. We just need to know that they are numeric addresses 518 00:22:24,856 --> 00:22:28,536 because what that means here is that what goes inside of A is 4, 519 00:22:28,536 --> 00:22:33,046 5, 6 this time and 1, 2, 3 because, again, we're passing 520 00:22:33,046 --> 00:22:35,976 into swap the address of A, 4, 5, 6, 521 00:22:35,976 --> 00:22:38,466 and the address of B, 1, 2, 3. 522 00:22:38,826 --> 00:22:40,656 So literally are those numbers stored 523 00:22:40,656 --> 00:22:41,896 in those same chunks of memory. 524 00:22:42,046 --> 00:22:44,546 So now let me pick up where we left off. 525 00:22:44,646 --> 00:22:47,106 This line of code here, temp gets *a. 526 00:22:47,106 --> 00:22:50,036 So now we can tell the story properly. 527 00:22:50,036 --> 00:22:50,716 All right, what's A? 528 00:22:51,146 --> 00:22:52,916 A is 4, 5, 6. 529 00:22:52,916 --> 00:22:55,246 The *a means go to that address so it means, 530 00:22:55,246 --> 00:22:56,246 all right, where's 5, 6? 531 00:22:56,356 --> 00:22:59,966 Here's 0, 1, 3, ah-ha, here's 4, 5, 6 what is there? 532 00:23:00,156 --> 00:23:03,006 One. And so I put 1 into this box 533 00:23:03,006 --> 00:23:05,046 so that's the correction I needed to make from before. 534 00:23:05,536 --> 00:23:06,486 So now the next line. 535 00:23:06,486 --> 00:23:08,106 Star A gets *b. 536 00:23:08,446 --> 00:23:12,616 So *b means, look at B, all right so B is this thing here 537 00:23:12,926 --> 00:23:15,156 in the swap frame all right it's 1, 2, 3. 538 00:23:15,316 --> 00:23:17,066 Star B means go there, all right? 539 00:23:17,106 --> 00:23:21,316 So by 0, 1 dot, dot, dot, ah-ha, here's 1, 2, 3. 540 00:23:21,316 --> 00:23:21,986 What is there? 541 00:23:21,986 --> 00:23:24,346 It looks like the number 2 so I'm going to go ahead 542 00:23:24,346 --> 00:23:25,706 and put the number 2 where? 543 00:23:26,146 --> 00:23:28,776 Well, on the left-hand side now we say *a 544 00:23:28,776 --> 00:23:30,736 so it's the same thing. 545 00:23:30,736 --> 00:23:31,916 So *a. What's A? 546 00:23:31,916 --> 00:23:33,566 A is 4, 5, 6. 547 00:23:33,846 --> 00:23:37,406 Star A means go there and so where am I going 548 00:23:37,406 --> 00:23:39,666 to put the value of 2? 549 00:23:39,896 --> 00:23:40,666 Where I am. 550 00:23:40,776 --> 00:23:43,766 Right? So I've left my finger there because I've gone there, 551 00:23:43,766 --> 00:23:46,406 *a, go there put what's in *b 552 00:23:46,716 --> 00:23:48,516 so that means put the value 2 there. 553 00:23:48,516 --> 00:23:51,726 So at this point in the story, the addresses are still in A 554 00:23:51,726 --> 00:23:53,836 and B; 4, 5, 6 and 1, 2, 3. 555 00:23:54,146 --> 00:23:58,656 The value 2 is still in what we called Y. It's now also 556 00:23:58,656 --> 00:24:01,396 in what we called X, but notice inside 557 00:24:01,396 --> 00:24:02,826 of main and here's the power. 558 00:24:02,826 --> 00:24:05,576 Now finally swap has the ability, the power, 559 00:24:05,816 --> 00:24:09,676 to modify memory that isn't his own, that's not in his own scope 560 00:24:09,956 --> 00:24:11,546 because we've passed it in by address. 561 00:24:11,546 --> 00:24:13,846 Now, we're not quite there because now we have two copies 562 00:24:13,846 --> 00:24:17,056 of 2 so the last line of code says *b gets temp. 563 00:24:17,296 --> 00:24:18,406 Well, temp is easy. 564 00:24:18,536 --> 00:24:19,076 There's temp. 565 00:24:19,076 --> 00:24:21,366 It's the value 1 this is sort of week one stuff. 566 00:24:21,606 --> 00:24:24,166 Variable stores the value 1 I've got it. 567 00:24:24,556 --> 00:24:28,306 Star B in the last line of code now means find B, 568 00:24:28,306 --> 00:24:29,346 all right, that's 1, 2, 3. 569 00:24:29,646 --> 00:24:32,386 Star B means go there so that's 0, 1, 2, 3. 570 00:24:32,386 --> 00:24:33,236 Okay, 1, 2, 3. 571 00:24:33,466 --> 00:24:38,866 So the value of 2 is here so now *b gets temp so what goes here? 572 00:24:38,866 --> 00:24:40,506 Well, what's in temp? 573 00:24:40,886 --> 00:24:43,536 And so now at the end of the story we have, indeed, 574 00:24:43,536 --> 00:24:46,336 swapped the values, but the story isn't quite complete 575 00:24:46,366 --> 00:24:48,296 because there's still something that's going to happen next. 576 00:24:48,666 --> 00:24:51,426 What happens now after we've executed this third line 577 00:24:51,426 --> 00:24:57,236 of code, what happens next in the story? 578 00:24:57,236 --> 00:25:00,026 So we hit the curly brace so the very bottom of the function 579 00:25:00,026 --> 00:25:02,876 and as soon as you hit that, the next line in the story is well, 580 00:25:02,876 --> 00:25:06,036 then we return to main and where are we executing in main? 581 00:25:06,066 --> 00:25:07,956 Well, if we scroll up where did we begin? 582 00:25:08,366 --> 00:25:10,286 We began here so at this point 583 00:25:10,286 --> 00:25:11,796 in the story the next thing that's going 584 00:25:11,796 --> 00:25:13,796 to happen is this line called Print F 585 00:25:13,796 --> 00:25:15,856 that says swap exclamation point, right? 586 00:25:16,196 --> 00:25:18,056 But the moment swap returns, 587 00:25:18,156 --> 00:25:20,346 the moment we hit this bottom most curly brace, 588 00:25:20,716 --> 00:25:22,396 what conceptually happens in memory? 589 00:25:23,866 --> 00:25:25,286 We lose this frame. 590 00:25:25,286 --> 00:25:28,166 It gets popped off the stack as one says 591 00:25:28,426 --> 00:25:31,176 and so you know what the values actually as we've hinted 592 00:25:31,176 --> 00:25:32,636 at with our brief discussions 593 00:25:32,636 --> 00:25:34,716 of forensics they're actually still there. 594 00:25:34,836 --> 00:25:36,576 So the fact that I'm erasing it does not mean 595 00:25:36,576 --> 00:25:38,206 that the computer erased that memory 596 00:25:38,206 --> 00:25:40,176 or over wrote those values with all zeros. 597 00:25:40,176 --> 00:25:42,976 They're still there, but the computer has forgotten what is, 598 00:25:42,976 --> 00:25:43,636 in fact, there. 599 00:25:43,636 --> 00:25:44,516 Now, is that a problem? 600 00:25:44,856 --> 00:25:49,096 No. Because A and B and even temp they were by nature local 601 00:25:49,096 --> 00:25:50,296 or temporary variables. 602 00:25:50,296 --> 00:25:52,286 They might very well be storing the addresses 603 00:25:52,286 --> 00:25:55,326 of memory elsewhere, but we just needed them as sort 604 00:25:55,326 --> 00:25:58,026 of a cheat sheet, a little address card to know 605 00:25:58,326 --> 00:26:00,966 where the original values x and y were. 606 00:26:00,966 --> 00:26:03,396 So, any questions? 607 00:26:04,786 --> 00:26:09,856 Can't have told that perfectly 608 00:26:09,856 --> 00:26:11,806 that story I'm sure so any questions? 609 00:26:12,156 --> 00:26:12,706 Now is your chance. 610 00:26:14,196 --> 00:26:14,906 Really? That well? 611 00:26:14,906 --> 00:26:15,736 Ah, damn, okay. 612 00:26:18,016 --> 00:26:18,866 >> Student: [inaudible] 613 00:26:18,866 --> 00:26:19,676 >> David: So good question. 614 00:26:19,756 --> 00:26:21,766 So, and this is where, again, things get sort 615 00:26:21,766 --> 00:26:24,586 of unnecessarily complicated just because of the syntax. 616 00:26:24,996 --> 00:26:28,676 So, when you want to figure out the address 617 00:26:28,776 --> 00:26:30,626 of a variable, you say ampersand. 618 00:26:30,906 --> 00:26:32,686 That is the address of operator. 619 00:26:33,006 --> 00:26:36,396 The * is the "go there" operator. 620 00:26:36,616 --> 00:26:41,256 So, if you assume that a or b are actually addresses, 621 00:26:41,256 --> 00:26:44,236 **a means look at those addresses and then go there just 622 00:26:44,236 --> 00:26:47,086 as I did with my finger on the board, but in the context 623 00:26:47,086 --> 00:26:50,876 of swap we use the ampersand because swap needs 624 00:26:50,876 --> 00:26:54,526 to be informed of the locations of x and Y. swap needs 625 00:26:54,526 --> 00:26:58,436 to be told that x is at 4, 5, 6 and y is at 1, 2, 3. 626 00:26:58,676 --> 00:27:00,306 Well, the mechanism that C gives us 627 00:27:00,306 --> 00:27:03,366 for asking those questions is just the ampersand 628 00:27:03,506 --> 00:27:06,206 and so ampersand x in this line 629 00:27:06,206 --> 00:27:08,936 of code here will literally return according 630 00:27:08,936 --> 00:27:11,106 to our story the number 4, 5, 631 00:27:11,176 --> 00:27:13,726 6 ampersand y will literally return 632 00:27:13,726 --> 00:27:15,706 in our story the number 1, 2, 3. 633 00:27:15,946 --> 00:27:20,506 So it's 4, 5, 6 comma 1, 2, 3 that are passed into swap 634 00:27:20,846 --> 00:27:29,516 and that's why those same numbers end up in A and B. Yeah? 635 00:27:29,516 --> 00:27:30,796 >> Student: [Inaudible]. 636 00:27:30,796 --> 00:27:33,056 >> David: Yes, if you really, 637 00:27:33,056 --> 00:27:35,896 if you want to reimplement the solution from last 638 00:27:35,896 --> 00:27:37,316 where whereby it was buggy, 639 00:27:37,596 --> 00:27:40,876 you can absolutely say take the address of x then go there, 640 00:27:40,946 --> 00:27:43,386 thereby undoing all of the work you just did. 641 00:27:43,616 --> 00:27:46,106 If you really want to be obnoxious, you can do this 642 00:27:46,776 --> 00:27:49,876 because really these operators just undo themselves, right? 643 00:27:49,876 --> 00:27:51,946 One says get the address, one says go there. 644 00:27:51,946 --> 00:27:53,136 Get the address, go there. 645 00:27:53,476 --> 00:27:54,706 So, in fact, you could do this. 646 00:27:54,706 --> 00:27:56,536 Now, you might in some context have to add the right kind 647 00:27:56,536 --> 00:27:57,876 of parentheses so that GCC 648 00:27:57,876 --> 00:28:00,306 or your compiler doesn't get confused, 649 00:28:00,716 --> 00:28:02,886 but really they just reverse each other's processes. 650 00:28:03,156 --> 00:28:05,136 Ampersand gets the address so you can use it. 651 00:28:05,366 --> 00:28:07,826 Star means here is an address go there. 652 00:28:08,706 --> 00:28:09,096 That's all. 653 00:28:09,246 --> 00:28:13,106 And now the point just to tease apart, I can't erase that, 654 00:28:13,196 --> 00:28:17,846 let's fix, so now the one detail that's worth pointing out is 655 00:28:17,846 --> 00:28:23,446 that this here the * in swap's prototype and the * here 656 00:28:23,446 --> 00:28:27,056 in swap's prototype do not mean go there. 657 00:28:27,336 --> 00:28:29,186 It may mean something different and this is sort 658 00:28:29,186 --> 00:28:32,466 of just stupid re-use of syntax although frankly it wouldn't 659 00:28:32,466 --> 00:28:34,626 really be much fun to have yet a symbol so they went 660 00:28:34,626 --> 00:28:37,306 with the same symbol which is pretty reasonable in the context 661 00:28:37,306 --> 00:28:41,526 of a function prototype this just means *a expect A 662 00:28:41,526 --> 00:28:44,486 to be the address of an int and expect B 663 00:28:44,486 --> 00:28:47,016 to be the address of an int. 664 00:28:47,296 --> 00:28:48,276 They don't mean go there. 665 00:28:48,276 --> 00:28:50,666 They only mean go there when you're inside the curly braces. 666 00:28:51,116 --> 00:28:52,566 So that's the only distinction. 667 00:28:53,296 --> 00:28:53,936 Other questions? 668 00:28:54,676 --> 00:28:55,666 All right. 669 00:28:55,916 --> 00:28:58,076 So let's take a look then at somewhere 670 00:28:58,076 --> 00:29:00,696 where this is actually a useful thing to know 671 00:29:00,696 --> 00:29:05,686 so this is compare1.c. So, in compare1.c I've stripped 672 00:29:05,686 --> 00:29:06,996 out the comments in my version 673 00:29:06,996 --> 00:29:09,126 but in your version you do have comments for reference 674 00:29:09,306 --> 00:29:10,826 and it's actually pretty self-explanatory 675 00:29:10,826 --> 00:29:11,876 if you just read through the code. 676 00:29:11,876 --> 00:29:14,166 At the top, I've got some CS50 Library going on, 677 00:29:14,496 --> 00:29:18,026 standard I/O library; I don't bother mentioning argc or argv 678 00:29:18,026 --> 00:29:20,216 because I'm not going to use them in this program. 679 00:29:20,216 --> 00:29:21,906 I'm going to use the CS50 Library instead 680 00:29:21,906 --> 00:29:25,646 for user input I'm saying say something then I'd get a string 681 00:29:25,646 --> 00:29:29,236 from the user and I call time s1 and then I say, say something, 682 00:29:29,476 --> 00:29:31,876 and then I get another string from the user and call it s2 683 00:29:31,876 --> 00:29:34,916 and apparently this program's purpose in life is 684 00:29:34,916 --> 00:29:38,776 to tell me yes or no the user said the same thing both times. 685 00:29:38,956 --> 00:29:41,896 Now, we're using the equal equals operator 686 00:29:41,896 --> 00:29:43,446 and this conceptually is correct. 687 00:29:43,616 --> 00:29:45,456 This is the equality operator. 688 00:29:45,686 --> 00:29:47,886 It would definitely be wrong if we were doing this 689 00:29:48,116 --> 00:29:49,156 because little sanity check 690 00:29:49,156 --> 00:29:51,496 if you just got the single one it means what instead? 691 00:29:52,556 --> 00:29:53,166 Assignment. 692 00:29:53,296 --> 00:29:55,796 So that would, this is actually a very common mistake 693 00:29:55,796 --> 00:29:57,356 and it makes sense if you think about it. 694 00:29:57,606 --> 00:29:59,606 If you accidently do this in your code, 695 00:29:59,656 --> 00:30:04,606 which is very commonly done, you're saying if S gets s2 696 00:30:04,606 --> 00:30:07,826 so you're literally saying copy s2 into s1 697 00:30:07,826 --> 00:30:09,846 and then you're asking the question 698 00:30:10,126 --> 00:30:12,346 if this expression is true, 699 00:30:12,596 --> 00:30:15,506 so it turns out if you've ever encountered this what's going 700 00:30:15,506 --> 00:30:19,556 to happen there if s2 is anything other than zero, 701 00:30:19,696 --> 00:30:22,366 it it's 1, if it's 2, if it's 3, if it's negative 1, negative 2, 702 00:30:22,366 --> 00:30:25,936 negative 3, well, both s1 and s2 are going to become the value 1, 703 00:30:25,936 --> 00:30:28,296 2 or 3 or negative 1, negative 2, negative 3, 704 00:30:28,886 --> 00:30:30,636 but because it's non-zero what does the 705 00:30:30,696 --> 00:30:32,126 if condition evaluate to? 706 00:30:32,126 --> 00:30:32,986 So true. 707 00:30:34,396 --> 00:30:37,006 So even though we think of conditions 708 00:30:37,006 --> 00:30:39,046 as being Boolean values true or false, 709 00:30:39,276 --> 00:30:43,206 really underneath the hood true means anything other than zero. 710 00:30:43,516 --> 00:30:46,956 False means zero and so if you've run into this 711 00:30:46,956 --> 00:30:50,056 in your own code and for some reason it's this first branch 712 00:30:50,086 --> 00:30:51,586 that's always executing 713 00:30:51,586 --> 00:30:54,186 or almost always executing odds are it's 714 00:30:54,186 --> 00:30:56,336 because you've assigned one variable to another, 715 00:30:56,616 --> 00:30:59,986 they both happen to then be non-zero and so, of course, 716 00:31:00,176 --> 00:31:01,926 the first branch is going to execute 717 00:31:01,976 --> 00:31:04,906 because again a Boolean expression really is equivalent 718 00:31:04,906 --> 00:31:08,846 to anything but zero is true and zero is false. 719 00:31:08,846 --> 00:31:11,376 Now, this was not the bug in this program because I did, 720 00:31:11,376 --> 00:31:13,776 in fact, use the equality operator and so it feels 721 00:31:13,776 --> 00:31:17,006 like if I type the word foo both times I should, in fact, 722 00:31:17,006 --> 00:31:18,946 get back you type the same thing. 723 00:31:18,946 --> 00:31:19,636 So, let's try this. 724 00:31:19,636 --> 00:31:22,936 So make compare 1 and then I'm going to go ahead 725 00:31:22,936 --> 00:31:24,986 and run compare 1 enter. 726 00:31:24,986 --> 00:31:29,066 So, I'm going to type foo enter, foo enter 727 00:31:29,066 --> 00:31:30,626 and I type different things. 728 00:31:30,736 --> 00:31:31,116 All right. 729 00:31:31,116 --> 00:31:33,036 Let's try again with maybe a different word. 730 00:31:33,036 --> 00:31:34,216 Let's just type my own name. 731 00:31:34,216 --> 00:31:36,326 That's definitely the same though. 732 00:31:36,326 --> 00:31:38,766 Let's try foo and bar. 733 00:31:38,766 --> 00:31:39,876 I know this is wrong. 734 00:31:40,526 --> 00:31:45,156 So, it seems to always say I type different things. 735 00:31:45,196 --> 00:31:46,366 Now, why is that? 736 00:31:46,366 --> 00:31:48,156 Well, if we look back at the code here, 737 00:31:48,496 --> 00:31:51,896 take a look at what we're really doing. 738 00:31:52,096 --> 00:31:54,226 We're assigning to s1 the string that the user typed in. 739 00:31:54,226 --> 00:31:56,646 We're assigning to s2 the other string that the user typed in 740 00:31:56,646 --> 00:31:59,376 and then we're comparing them, but today we now begin 741 00:31:59,376 --> 00:32:02,146 to take off the training wheels that are the CS50 Library 742 00:32:02,146 --> 00:32:04,756 or really more technically we're going to peel back the layers 743 00:32:04,826 --> 00:32:08,686 that we've been borrowing from the CS50 Library and really look 744 00:32:08,686 --> 00:32:10,636 under the hood at what's been going on. 745 00:32:10,776 --> 00:32:12,126 Well, we've actually said this already. 746 00:32:12,366 --> 00:32:14,946 The word string is just a synonym for what? 747 00:32:16,216 --> 00:32:18,146 char *. So all this time 748 00:32:18,146 --> 00:32:20,586 and frankly this is why we hide this detail in the first week 749 00:32:20,586 --> 00:32:22,096 or two because it's just not interesting 750 00:32:22,096 --> 00:32:23,076 to get into that early. 751 00:32:23,076 --> 00:32:24,146 It's just a distraction. 752 00:32:24,506 --> 00:32:26,906 Really this program is identical to this. 753 00:32:27,416 --> 00:32:32,326 So, if we now apply the logic from today, char * s1 means 754 00:32:32,326 --> 00:32:35,446 that s1 is not a char, it's instead what? 755 00:32:36,426 --> 00:32:38,926 A pointer or the address of what? 756 00:32:39,796 --> 00:32:41,866 The address of a char, right? 757 00:32:41,866 --> 00:32:46,756 Because if *a meant the address of an int and just call it a, 758 00:32:46,926 --> 00:32:50,066 well, then *s1 means this is just the address of a char 759 00:32:50,306 --> 00:32:52,496 and so it turns out what's really happening underneath the 760 00:32:52,496 --> 00:32:56,226 hood with the CS50 Library is we're not handing you a string 761 00:32:56,226 --> 00:32:58,336 in the same sense we're handing you an int 762 00:32:58,696 --> 00:33:00,906 because the string recall is just a sequence 763 00:33:00,906 --> 00:33:02,676 of characters back to back to back to back. 764 00:33:03,146 --> 00:33:06,806 Well, if I've got a five-letter word like D-A-V-I-D, well, 765 00:33:07,056 --> 00:33:08,726 that's like five bytes and 766 00:33:08,726 --> 00:33:10,816 yet we only have the ability thus far in this class 767 00:33:10,816 --> 00:33:14,566 to return one thing at a time I can't return five bytes to you, 768 00:33:14,856 --> 00:33:16,916 but wait a minute, those bytes by nature 769 00:33:16,916 --> 00:33:18,246 of a computer are just stored in RAM. 770 00:33:18,696 --> 00:33:22,926 D-A-V-I-D back to back to back so if I want you to have access 771 00:33:22,926 --> 00:33:24,676 to that string, you know, I don't have 772 00:33:24,706 --> 00:33:26,666 to return all five characters to you. 773 00:33:26,666 --> 00:33:28,506 What could I just return instead? 774 00:33:28,916 --> 00:33:31,276 Just the first one or more generally the address 775 00:33:31,346 --> 00:33:33,926 of the first one and then, man, I'll just figure it 776 00:33:33,926 --> 00:33:36,126 out from there where the rest of the letters are 777 00:33:36,126 --> 00:33:37,636 because by definition of a string, 778 00:33:37,816 --> 00:33:38,936 they're back to back to back. 779 00:33:38,936 --> 00:33:41,456 So I just have to see, oh, here's a D, let me keep looking, 780 00:33:41,456 --> 00:33:44,116 here's an A, keep looking here's a V, here's an I, here's a D 781 00:33:44,336 --> 00:33:47,386 and yet, and this is one tidbit we introduced a week or so ago, 782 00:33:47,496 --> 00:33:50,446 how do I now know if I'm just given the address of the start 783 00:33:50,446 --> 00:33:51,916 of the string where the end is? 784 00:33:53,176 --> 00:33:56,536 Right, so there's that special sentinel value that's back slash 785 00:33:56,536 --> 00:33:59,026 zero, which we'll start to see more explicitly now. 786 00:33:59,296 --> 00:34:01,296 This was the special value that said 787 00:34:01,296 --> 00:34:04,506 to the computer string stops here and because you have 788 00:34:04,506 --> 00:34:07,366 that barrier given to you at the end of every string, 789 00:34:07,686 --> 00:34:10,676 just the most efficient way you can pass strings around is just 790 00:34:10,676 --> 00:34:13,626 by passing the address of their very first bytes, 791 00:34:13,766 --> 00:34:14,876 the location in RAM. 792 00:34:15,126 --> 00:34:18,636 So, what are we really doing in this line of code here? 793 00:34:18,636 --> 00:34:20,056 What are we really comparing? 794 00:34:21,596 --> 00:34:22,626 The addresses. 795 00:34:22,946 --> 00:34:25,156 Now, if I've asked the user for a string 796 00:34:25,436 --> 00:34:28,636 and then a moment later I ask the user for another string, 797 00:34:28,926 --> 00:34:31,376 well, they're going to end up in different locations 798 00:34:31,376 --> 00:34:33,216 in memory just by nature of get string. 799 00:34:33,216 --> 00:34:34,986 If you called get string multiple times, 800 00:34:35,216 --> 00:34:38,136 surely you've been able to get different strings from the user. 801 00:34:38,136 --> 00:34:41,106 So they are, in fact, returning different chunks of memory 802 00:34:41,106 --> 00:34:43,636 as we'll soon see but when you're comparing s1 803 00:34:43,636 --> 00:34:47,106 against s2 you're literally saying does the address 804 00:34:47,106 --> 00:34:50,006 of the first string equal the address of the second string 805 00:34:50,206 --> 00:34:51,726 and hopefully that's not the case 806 00:34:51,726 --> 00:34:53,666 because otherwise get string is pretty useless 807 00:34:53,666 --> 00:34:56,506 if it only returns the same chunk of memory again and again. 808 00:34:56,506 --> 00:34:58,386 You could never ask the user for more than one word 809 00:34:58,386 --> 00:34:59,436 or more than one sentence. 810 00:34:59,756 --> 00:35:02,496 So, in other words, if we apply some of the storyline 811 00:35:02,496 --> 00:35:05,906 from the previous tale, if s1 happens to be 4, 5, 812 00:35:05,906 --> 00:35:08,146 6 and s2 happens to be 1, 2, 3, 813 00:35:08,446 --> 00:35:10,406 you're just asking the question is 4, 5, 814 00:35:10,406 --> 00:35:12,356 6 equals equals to 1, 2, 3? 815 00:35:12,636 --> 00:35:14,786 Well, obviously not so you're always going 816 00:35:14,786 --> 00:35:17,406 to say you typed different things. 817 00:35:17,846 --> 00:35:20,736 Well, let's try; let's be a little resistant here to this. 818 00:35:20,736 --> 00:35:21,906 Let me go ahead and open copy1.c. 819 00:35:22,046 --> 00:35:26,276 Let's actually try to really manipulate these strings 820 00:35:26,276 --> 00:35:27,436 in a more deliberate fashion. 821 00:35:27,656 --> 00:35:28,746 This program is not too long. 822 00:35:28,746 --> 00:35:29,816 It's just a few lines here 823 00:35:29,816 --> 00:35:31,656 and I'll introduce one new function as well. 824 00:35:32,066 --> 00:35:35,766 Here in this version this is copy1.c, 825 00:35:35,816 --> 00:35:36,976 which you should have a printout of. 826 00:35:37,186 --> 00:35:38,926 I again say, say something. 827 00:35:39,186 --> 00:35:41,856 Now, I've taken off the training wheel of the CS50 Library 828 00:35:41,856 --> 00:35:44,376 and I'm just flat out saying char *. No more strings. 829 00:35:44,476 --> 00:35:46,966 They are char *s because we now know what that means. 830 00:35:47,336 --> 00:35:48,336 So, I'm getting a string 831 00:35:48,606 --> 00:35:50,716 and so what really am I doing in this line of code? 832 00:35:50,886 --> 00:35:52,966 Well, get string again is returning the address 833 00:35:53,596 --> 00:35:55,116 of the string the user typed in, 834 00:35:55,116 --> 00:35:58,376 the address of the very first character like the letter D 835 00:35:58,606 --> 00:36:01,846 and storing that address in this variable s1. 836 00:36:01,846 --> 00:36:05,756 Now, I mentioned this other sentinel value a while back. 837 00:36:05,756 --> 00:36:08,356 If a get string returns null, 838 00:36:08,356 --> 00:36:10,316 that pretty much means something went wrong 839 00:36:10,316 --> 00:36:12,046 and there are a few things that could go wrong. 840 00:36:12,266 --> 00:36:15,736 One of the perhaps easiest ones to put your finger on is what 841 00:36:15,736 --> 00:36:16,896 if the computer is out of memory? 842 00:36:17,046 --> 00:36:18,856 I mean what if you're running so many things, 843 00:36:19,046 --> 00:36:21,936 what if the user has copied and pasted their thesis 844 00:36:22,046 --> 00:36:24,066 and just pasted it at the blinking prompt 845 00:36:24,306 --> 00:36:25,796 such that you're now out of memory 846 00:36:25,796 --> 00:36:27,626 because your computer is somewhat limited in memory 847 00:36:27,846 --> 00:36:31,926 so get string cannot possibly return all those characters 848 00:36:31,926 --> 00:36:34,356 or fit all of those characters in memory and return 849 00:36:34,356 --> 00:36:35,636 to you the address of the first. 850 00:36:35,896 --> 00:36:36,836 It can't do any of that. 851 00:36:36,836 --> 00:36:39,166 And so get string is defined as we'll see 852 00:36:39,326 --> 00:36:40,986 as just returning this special value 853 00:36:40,986 --> 00:36:43,336 in all uppercase called null which is its way 854 00:36:43,336 --> 00:36:45,906 of saying there is no address because, in fact, 855 00:36:46,246 --> 00:36:50,036 what null really represents under the hood is just similar 856 00:36:50,036 --> 00:36:54,436 in spirit to this, but it's just this value 0x00. 857 00:36:54,436 --> 00:36:56,766 This just means hexadecimal notation, which we'll come back 858 00:36:56,806 --> 00:36:59,596 to before long so really it just means it's returning the 859 00:36:59,596 --> 00:37:00,396 address zero. 860 00:37:00,676 --> 00:37:03,946 Now the address zero I said I think on Monday is special. 861 00:37:04,066 --> 00:37:07,266 Only the operating system has controlling of byte zero 862 00:37:07,306 --> 00:37:10,926 in the computer's RAM and so if a function ever returns null, 863 00:37:10,926 --> 00:37:13,936 aka zero, well, something must have gone wrong 864 00:37:13,936 --> 00:37:16,226 because that can't possibly belong to me that memory 865 00:37:16,456 --> 00:37:18,386 because by human convention zero is owned 866 00:37:18,386 --> 00:37:20,696 by the operating system; not by a program I wrote. 867 00:37:21,066 --> 00:37:22,226 So, I just return 1. 868 00:37:22,286 --> 00:37:24,746 Why 1? It's arbitrary, but it's anything other than zero 869 00:37:24,746 --> 00:37:26,456 so I just exit because something went wrong 870 00:37:26,456 --> 00:37:27,666 and this program is just going to bail. 871 00:37:28,076 --> 00:37:30,846 Now, what's really going on here in this line 872 00:37:30,846 --> 00:37:33,586 of code here char * s2 gets s1? 873 00:37:33,586 --> 00:37:35,376 Well, let's just reason through it. 874 00:37:35,656 --> 00:37:39,206 s2 is what data type now? 875 00:37:39,466 --> 00:37:40,306 It's not a char. 876 00:37:40,306 --> 00:37:41,946 It's instead a? 877 00:37:42,896 --> 00:37:45,816 So it's an address, it's a pointer, address pointer, 878 00:37:45,816 --> 00:37:49,096 synonyms for now, so it's the address of a char 879 00:37:49,356 --> 00:37:52,406 so this makes sense because s1 is also the address of a char 880 00:37:52,646 --> 00:37:54,436 so if I wanted to make a copy 881 00:37:54,436 --> 00:37:57,406 of that address this is absolutely the right syntax. 882 00:37:57,516 --> 00:37:59,326 It's char * s2 gets s1. 883 00:37:59,766 --> 00:38:03,506 So that is actually making a copy 884 00:38:03,776 --> 00:38:06,336 of the address and putting it in s2. 885 00:38:06,336 --> 00:38:09,756 What is it not doing conceptually though? 886 00:38:09,996 --> 00:38:12,366 It's not all copying the string. 887 00:38:12,366 --> 00:38:15,676 It's not copying the F-O-O or the D-A-V-I-D. 888 00:38:15,676 --> 00:38:19,336 It's only copying the address that was returned by get string 889 00:38:19,596 --> 00:38:22,266 so even though conceptually is the name this program suggests, 890 00:38:22,266 --> 00:38:23,716 I really just want to make a copy 891 00:38:23,936 --> 00:38:26,796 so that maybe I can make one version of D-A-V-I-D 892 00:38:26,796 --> 00:38:29,076 or F-O-O all uppercase or all lowercase or I want 893 00:38:29,076 --> 00:38:29,916 to spell check, something. 894 00:38:30,106 --> 00:38:32,716 I want an actual copy in memory so that that string is 895 00:38:32,716 --> 00:38:33,736 in two different places. 896 00:38:34,056 --> 00:38:35,526 This is not going to be the way to do it. 897 00:38:35,736 --> 00:38:38,576 So, let's do a little sanity check now and let's try 898 00:38:38,576 --> 00:38:40,966 to demonstrate as much I now claim I'm going 899 00:38:40,966 --> 00:38:43,426 to capitalize the copy of the string I just made. 900 00:38:43,676 --> 00:38:45,086 First, I'm going to do a sanity check 901 00:38:45,086 --> 00:38:47,516 so we've used string length, strlen, before. 902 00:38:47,856 --> 00:38:49,406 Turns out you really want 903 00:38:49,406 --> 00:38:52,646 to start using string length any time you're dealing with strings 904 00:38:52,646 --> 00:38:57,386 or now char *s because string length was written a long time 905 00:38:57,386 --> 00:39:00,896 ago and it was not designed itself to check if the length 906 00:39:00,896 --> 00:39:04,266 of a string is zero so if you don't check whether the length 907 00:39:04,266 --> 00:39:07,626 of a string is zero and you pass in essentially the address 908 00:39:07,946 --> 00:39:10,616 of a bogus chunk of memory you pass in zero, zero, 909 00:39:10,896 --> 00:39:13,836 string length is not going to return and say retry. 910 00:39:13,836 --> 00:39:15,686 It's instead going to do what do you think? 911 00:39:16,796 --> 00:39:18,016 It's going to segfault, right? 912 00:39:18,016 --> 00:39:19,606 If you pass a function in C, 913 00:39:19,606 --> 00:39:21,996 a value that it's not expecting bad things happen 914 00:39:21,996 --> 00:39:24,206 and bad things generally reduce to segfault. 915 00:39:24,206 --> 00:39:25,726 So, I'm going to check for it myself. 916 00:39:25,836 --> 00:39:27,216 It's very easy to do. 917 00:39:27,326 --> 00:39:29,916 It's string length takes a string as its argument 918 00:39:29,986 --> 00:39:33,376 or now aka char * so I can just say is the length 919 00:39:33,376 --> 00:39:34,736 of s2 greater than 1? 920 00:39:34,736 --> 00:39:36,756 If so, I've got some letters to capitalize. 921 00:39:37,086 --> 00:39:37,726 So, what do I do? 922 00:39:37,726 --> 00:39:40,726 Well, this is syntax you might have used in caesar or vigenere. 923 00:39:40,726 --> 00:39:43,106 You can treat a string as though it's an array 924 00:39:43,106 --> 00:39:43,856 because it really is. 925 00:39:43,856 --> 00:39:46,186 It's just a chunk of characters back to back to back 926 00:39:46,556 --> 00:39:49,806 so this is storing at the very first location 927 00:39:49,806 --> 00:39:50,926 in the string what? 928 00:39:50,926 --> 00:39:53,646 It's storing what there? 929 00:39:53,856 --> 00:39:58,186 It's passing to this function called toupper, 930 00:39:58,186 --> 00:40:00,006 which if you've never used it it actually does what it says it 931 00:40:00,006 --> 00:40:01,226 makes it touppercase. 932 00:40:01,656 --> 00:40:04,216 So, I'm passing in the first character in s2, 933 00:40:04,286 --> 00:40:06,456 I'm making it uppercase and then I'm putting it back 934 00:40:06,876 --> 00:40:09,526 so casually speaking this is just capitalizing the first 935 00:40:09,526 --> 00:40:12,296 letter of whatever word the user typed in to s2. 936 00:40:12,296 --> 00:40:13,096 That copy, that's all. 937 00:40:13,206 --> 00:40:14,386 It's not capitalizing the whole thing. 938 00:40:14,626 --> 00:40:15,966 Because I've hard coded the zero, 939 00:40:15,996 --> 00:40:17,646 it's just capitalizing they first character. 940 00:40:17,896 --> 00:40:19,656 So the very last two lines of code I'm saying this, 941 00:40:19,986 --> 00:40:23,576 the original string is s1, the copy of the string is s2, 942 00:40:23,686 --> 00:40:27,256 but if I haven't lost you, what are we really going 943 00:40:27,256 --> 00:40:31,406 to see when we print this? 944 00:40:31,666 --> 00:40:32,386 Someone assert. 945 00:40:32,386 --> 00:40:33,166 >> Student: [inaudible]. 946 00:40:33,166 --> 00:40:34,876 >> David: Both capitalized. 947 00:40:35,136 --> 00:40:37,486 We're going to see them both capitalized 948 00:40:37,526 --> 00:40:39,736 because I've made a newbie mistake here. 949 00:40:39,736 --> 00:40:41,566 This is not actually copying the string 950 00:40:41,566 --> 00:40:43,906 and giving me two different versions of foo or David 951 00:40:43,906 --> 00:40:44,836 or whatever I typed in, 952 00:40:45,106 --> 00:40:48,096 it's actually just giving me two copies of the address and so 953 00:40:48,096 --> 00:40:51,846 when I do this here manipulating the zero character 954 00:40:51,846 --> 00:40:53,646 at address s2, well, you know what? 955 00:40:53,646 --> 00:40:56,056 If s2 equals s1, guess which character you're also changing? 956 00:40:56,116 --> 00:40:57,766 The original because it's the same thing. 957 00:40:57,766 --> 00:40:59,336 You've made a copy of the address. 958 00:40:59,366 --> 00:41:00,326 So let's actually see this in action. 959 00:41:00,356 --> 00:41:00,806 So make copy one. 960 00:41:00,836 --> 00:41:01,886 All right, I'm going to run copy one enter. 961 00:41:01,916 --> 00:41:02,336 Let's type in foo. 962 00:41:02,366 --> 00:41:03,026 All right, so looks good, right? 963 00:41:03,056 --> 00:41:04,196 Not a problem, but now let's actually type 964 00:41:04,226 --> 00:41:04,886 in something in all lowercase. 965 00:41:04,916 --> 00:41:05,756 F-O-O in lowercase and, in fact, 966 00:41:05,786 --> 00:41:07,016 it is buggy because I typed lowercase F-O-O and 967 00:41:07,046 --> 00:41:08,786 yet I get back capital F-O-O as both the original and the copy 968 00:41:08,816 --> 00:41:10,196 and that's simply because I haven't done this correctly. 969 00:41:10,226 --> 00:41:10,886 So, any questions on that? 970 00:41:10,916 --> 00:41:10,983 Yeah? 971 00:41:10,983 --> 00:41:11,726 >> Student: [Inaudible]. 972 00:41:11,726 --> 00:41:13,936 >> David: Correct, yep, char *s2 gets s1. 973 00:41:13,936 --> 00:41:14,003 >> Student: [Inaudible]. 974 00:41:14,003 --> 00:41:15,016 >> David: So the, why do I need the *? 975 00:41:15,016 --> 00:41:15,083 >> Student: [inaudible]. 976 00:41:15,083 --> 00:41:18,526 >> David: So the reason here, in a nutshell and we'll come back 977 00:41:18,526 --> 00:41:22,916 to other examples that demonstrate this, in a nutshell 978 00:41:22,916 --> 00:41:32,896 because I declared s1 to be a char *, s2 if I'm going 979 00:41:34,276 --> 00:41:44,636 to make a copy of s1, it has to be the same data type 980 00:41:44,636 --> 00:41:48,856 and so recall that just as when, 981 00:41:48,856 --> 00:41:59,306 and actually maybe this is my fault, 982 00:42:00,416 --> 00:42:04,336 just as when you declare a function prototype, 983 00:42:04,336 --> 00:42:07,166 which we did a moment ago, recall that in swap.c 984 00:42:07,166 --> 00:42:14,216 when you declare a function prototype and you, therefore, 985 00:42:14,546 --> 00:42:18,706 declare the variables, you say * to denote this is going 986 00:42:18,706 --> 00:42:23,426 to be a pointer, and I did not say this before. 987 00:42:24,186 --> 00:42:26,866 When you declare a pointer yourself manually, 988 00:42:26,866 --> 00:42:28,496 you do say char * the variable name 989 00:42:28,566 --> 00:42:31,476 because recall that's the same thing that we did earlier 990 00:42:31,526 --> 00:42:34,646 but we called it instead string. 991 00:42:34,766 --> 00:42:38,446 And so if we wanted to do this here, it's again, 992 00:42:38,446 --> 00:42:43,296 the same thing we're just now pulling back this layer 993 00:42:43,366 --> 00:42:49,196 and calling them char *s not actually strings. 994 00:42:49,196 --> 00:42:49,506 >> Student: [inaudible] 995 00:42:49,506 --> 00:42:52,486 >> David: And the * will have other powers 996 00:42:52,596 --> 00:42:53,846 that we'll soon see. 997 00:42:54,206 --> 00:43:01,976 Why don't we take a three-minute break? 998 00:43:02,046 --> 00:43:02,326 All right. 999 00:43:02,716 --> 00:43:03,676 So, we're back. 1000 00:43:03,676 --> 00:43:06,156 So this is the picture we've been using a whole lot. 1001 00:43:06,156 --> 00:43:08,546 And, again, the rectangle represents your computer's RAM, 1002 00:43:08,796 --> 00:43:10,566 the bottom represents the part of RAM 1003 00:43:10,566 --> 00:43:13,366 that we generally call the stack, main conceptually ends 1004 00:43:13,366 --> 00:43:14,986 up on the bottom of the stack followed 1005 00:43:14,986 --> 00:43:18,136 by its local variables then the function say foo that it calls 1006 00:43:18,136 --> 00:43:21,256 and on and on and on and up, but there is, in fact, 1007 00:43:21,536 --> 00:43:24,486 something above all of this and we've seen this picture briefly 1008 00:43:24,736 --> 00:43:26,136 and that's this thing called the heap. 1009 00:43:26,486 --> 00:43:30,196 So, in memory, you have different zones if you will. 1010 00:43:30,196 --> 00:43:32,506 The bottom, again, is generally called the stack, but it turns 1011 00:43:32,506 --> 00:43:33,696 out there's stuff even lower 1012 00:43:33,696 --> 00:43:36,476 than that conceptually things called environment variables 1013 00:43:36,746 --> 00:43:38,456 above, in fact, memory 1014 00:43:38,706 --> 00:43:42,936 at the very top conceptually is the text segment and so it turns 1015 00:43:42,936 --> 00:43:46,166 out that even though it's given a strange name the text segment 1016 00:43:46,166 --> 00:43:50,106 in RAM is actually the zeros and ones that you compiled and then 1017 00:43:50,106 --> 00:43:52,916 when you ran your program a program just like on your Mac 1018 00:43:52,916 --> 00:43:54,686 or PC gets loaded into memory. 1019 00:43:54,736 --> 00:43:57,386 You double click an icon, the program gets loaded into memory, 1020 00:43:57,606 --> 00:44:00,046 well, conceptually where does your program end up? 1021 00:44:00,346 --> 00:44:03,336 It ends up at the top most portion of the computer's RAM. 1022 00:44:03,336 --> 00:44:06,026 All right so it has to live in RAM as opposed to the hard drive 1023 00:44:06,026 --> 00:44:08,466 because otherwise things would be terribly slow as you know 1024 00:44:08,726 --> 00:44:11,096 so it's much better if your programs live while they're 1025 00:44:11,096 --> 00:44:12,796 running in RAM and they end 1026 00:44:12,796 --> 00:44:14,436 up in what's called the tech segment. 1027 00:44:14,686 --> 00:44:17,586 Now, as an aside, there's another couple of layers 1028 00:44:17,586 --> 00:44:20,066 at the very top above the stack and above the heap, 1029 00:44:20,286 --> 00:44:21,556 but below the tech segment 1030 00:44:21,736 --> 00:44:24,786 and those are called initialized data and uninitialized data. 1031 00:44:25,146 --> 00:44:28,026 So it turns out if you've ever declared a global variable, 1032 00:44:28,026 --> 00:44:30,686 which we did once in lecture and if you've tackled the game 1033 00:44:30,686 --> 00:44:34,206 of 15 already you'll know that the board and the dimensions 1034 00:44:34,206 --> 00:44:36,926 of the board are very intentionally by us declared 1035 00:44:36,926 --> 00:44:41,236 to be global at the very top of fifteen.c. While the implication 1036 00:44:41,236 --> 00:44:44,886 of that in RAM is if you declare a global variable outside 1037 00:44:44,886 --> 00:44:47,226 of all your functions at the very top of your file, 1038 00:44:47,526 --> 00:44:50,016 they don't end up in the stack they end up way, 1039 00:44:50,016 --> 00:44:51,516 way up at the top of memory 1040 00:44:51,516 --> 00:44:53,686 in either the initialized data segment 1041 00:44:53,786 --> 00:44:55,896 if you assigned it a value with the equal sign, 1042 00:44:55,976 --> 00:44:58,286 the assignment's operator or they end up in this part, 1043 00:44:58,286 --> 00:45:00,986 the uninitialized data if you just say int x; 1044 00:45:01,536 --> 00:45:03,756 and don't actually give it a value yet. 1045 00:45:04,066 --> 00:45:07,606 So, conceptually if you've ever wondered why you get access 1046 00:45:07,666 --> 00:45:10,236 in all of your functions to global variables that's 1047 00:45:10,236 --> 00:45:13,226 because they're not down here, they're at the very top of RAM 1048 00:45:13,266 --> 00:45:16,706 and any function can access that RAM way up there, 1049 00:45:17,106 --> 00:45:19,556 but for now the interesting player 1050 00:45:19,556 --> 00:45:21,316 in the story is this thing called the heap. 1051 00:45:21,626 --> 00:45:23,506 So the heap is a chunk of memory 1052 00:45:24,076 --> 00:45:27,126 in a computer's RAM that's conceptually allocated 1053 00:45:27,126 --> 00:45:29,676 to what's called dynamic memory allocation. 1054 00:45:29,976 --> 00:45:31,676 There are absolutely going to be times 1055 00:45:31,676 --> 00:45:34,746 where you're running a program where the programmer, say you, 1056 00:45:35,026 --> 00:45:38,946 didn't possibly know in advanced how much RAM the program was 1057 00:45:38,946 --> 00:45:39,686 going to need. 1058 00:45:39,726 --> 00:45:42,286 For instance in the past, we had that silly little program 1059 00:45:42,286 --> 00:45:44,126 for computing the average of some quizzes 1060 00:45:44,126 --> 00:45:46,436 and it was actually a pretty bad implementation 1061 00:45:46,436 --> 00:45:49,386 because I had essentially hard coded in the number of quizzes. 1062 00:45:49,606 --> 00:45:51,796 We introduced a feature of C called a constant, 1063 00:45:52,146 --> 00:45:54,766 but how many quizzes did I assume every student has? 1064 00:45:55,786 --> 00:45:58,876 So, two. Now this is somewhat annoying because if we 1065 00:45:58,876 --> 00:46:01,616 in some future term have three quizzes or four or we want 1066 00:46:01,616 --> 00:46:04,306 to use this code for other courses that have a weekly quiz, 1067 00:46:04,606 --> 00:46:07,816 we'd have to recompile the program every time we want 1068 00:46:07,816 --> 00:46:09,966 to change that value and that's just annoying especially 1069 00:46:09,966 --> 00:46:12,316 if you're writing software that needs to actually end 1070 00:46:12,316 --> 00:46:15,726 up on consumer's computers you can't expect them to wait 1071 00:46:15,726 --> 00:46:17,506 until you change your code, recompile 1072 00:46:17,506 --> 00:46:18,486 or give them an update. 1073 00:46:18,766 --> 00:46:21,176 Why not write the program in a way where you figure 1074 00:46:21,176 --> 00:46:24,666 out dynamically when the program is run how much memory you need 1075 00:46:24,666 --> 00:46:27,736 rather than hard coding in two with or within that constant. 1076 00:46:28,026 --> 00:46:30,496 The heap offers us a solution to that problem. 1077 00:46:30,746 --> 00:46:32,696 In fact, consider the CS50 Library. 1078 00:46:32,696 --> 00:46:34,986 You've been calling get string; you've been calling get in, 1079 00:46:35,036 --> 00:46:36,806 maybe get float and a couple of others. 1080 00:46:37,176 --> 00:46:41,046 Well, we certainly didn't know on day one how many times each 1081 00:46:41,046 --> 00:46:42,656 of you was going to want to call get string, 1082 00:46:42,656 --> 00:46:45,946 how many words a user might type when you call get string. 1083 00:46:46,206 --> 00:46:50,916 So, certainly the CS50 Library designed to be dynamic and, 1084 00:46:50,916 --> 00:46:54,446 in fact, any time you call get string, we are, in fact, 1085 00:46:54,846 --> 00:46:58,226 allocating a chunk of RAM but it's not coming from the stack; 1086 00:46:58,226 --> 00:47:00,166 it's actually coming from this portion 1087 00:47:00,166 --> 00:47:01,426 of memory called the heap. 1088 00:47:01,796 --> 00:47:03,146 So, let's actually take a look 1089 00:47:03,146 --> 00:47:05,846 at how this might actually affect us. 1090 00:47:06,226 --> 00:47:10,706 So this is the code that we had a moment ago for copy one. 1091 00:47:11,016 --> 00:47:14,556 No, for copy1.c. Let me pull this up. 1092 00:47:15,206 --> 00:47:18,396 So this is copy1.c, same program as before break, 1093 00:47:18,646 --> 00:47:20,796 let's really see what's going on here. 1094 00:47:20,796 --> 00:47:24,486 So, let me go had and let's swivel this 1095 00:47:24,486 --> 00:47:26,756 around for just a moment so we have a blank slate 1096 00:47:27,786 --> 00:47:29,786 and now what's really going on? 1097 00:47:29,786 --> 00:47:31,316 Well, what's nice about this program is 1098 00:47:31,316 --> 00:47:33,386 that there's just one function, main, so we don't need 1099 00:47:33,386 --> 00:47:36,246 to draw the stack and get things all complicated. 1100 00:47:36,456 --> 00:47:40,186 We can just treat the whole blackboard as mains frames. 1101 00:47:40,356 --> 00:47:43,096 So, I'll be a little looser now where I put things. 1102 00:47:43,096 --> 00:47:44,526 So, we say, say something. 1103 00:47:44,736 --> 00:47:47,766 Then I go ahead and declare a char * called s1. 1104 00:47:47,766 --> 00:47:50,726 Well, it turns out on most computers an address 1105 00:47:50,726 --> 00:47:55,836 of the location and memory, aka a pointer, is itself 32 bits. 1106 00:47:55,996 --> 00:47:58,416 They can be 64 bits, but very often 1107 00:47:58,416 --> 00:48:01,156 on systems today are pointers 32 bits. 1108 00:48:01,346 --> 00:48:05,436 Now that's changing the more years that pass the more 1109 00:48:05,436 --> 00:48:09,216 of you have 64-bit computers and the more servers have many, 1110 00:48:09,216 --> 00:48:13,046 many gigabytes of RAM and so you need actually 64-bits, 1111 00:48:13,276 --> 00:48:16,666 but for now let's assume a common system whereby a pointer 1112 00:48:16,746 --> 00:48:19,536 just by definition of the homework is 32 bits. 1113 00:48:19,716 --> 00:48:20,666 Now, I mentioned this why? 1114 00:48:20,936 --> 00:48:21,756 Well, all this time 1115 00:48:21,756 --> 00:48:24,286 on the blackboard I always draw an int as a square. 1116 00:48:24,566 --> 00:48:27,656 Well, int is almost always 32 bits at least so far 1117 00:48:27,656 --> 00:48:28,656 as we've seen so, in fact, 1118 00:48:28,906 --> 00:48:30,936 any time I draw a pointer hence force I'm just going 1119 00:48:30,936 --> 00:48:33,786 to draw a square as well because they are, in fact, 1120 00:48:33,786 --> 00:48:35,456 the same size usually in memory. 1121 00:48:35,456 --> 00:48:36,086 So, here we go. 1122 00:48:36,426 --> 00:48:39,126 I have a variable called s1 and it's 1123 00:48:39,126 --> 00:48:41,276 of type char * so here we go. 1124 00:48:41,376 --> 00:48:44,536 It's a little square, it's going to be called s1 1125 00:48:44,536 --> 00:48:47,996 and that's what exists as of line two of this program, 1126 00:48:48,296 --> 00:48:50,196 but now I've called get string. 1127 00:48:50,496 --> 00:48:52,446 Get string recall does not use the stack. 1128 00:48:52,716 --> 00:48:54,186 It uses this thing called the heap. 1129 00:48:54,436 --> 00:48:56,796 I have no idea where that is other than to know that it's 1130 00:48:56,876 --> 00:48:59,656 up so let's just go ahead and assume that everything 1131 00:48:59,656 --> 00:49:03,166 over here is the heap so what has just happened? 1132 00:49:03,166 --> 00:49:05,476 Suppose I type in the word foo, F-O-O. 1133 00:49:05,476 --> 00:49:09,066 What get string does and we'll eventually see the actual source 1134 00:49:09,066 --> 00:49:12,456 code for CS50's Library what get string does is it allocates 1135 00:49:12,566 --> 00:49:15,916 in this chunk of memory called the heap enough RAM for the F, 1136 00:49:16,416 --> 00:49:21,426 for the O, for the O and for the back slash zero. 1137 00:49:21,426 --> 00:49:24,886 So it does that for us so that you can get away 1138 00:49:24,886 --> 00:49:26,826 with just knowing the address of the first byte 1139 00:49:27,106 --> 00:49:29,166 and it will make sure that you know when to stop 1140 00:49:29,166 --> 00:49:31,766 by including the special value so, in fact, get the string 1141 00:49:31,766 --> 00:49:33,206 if you type in a three-letter word, 1142 00:49:33,206 --> 00:49:35,896 we allocate four bytes no matter what 1143 00:49:35,956 --> 00:49:37,176 because we need an additional byte 1144 00:49:37,176 --> 00:49:40,936 for this special sentinel value back slash zero at the very end. 1145 00:49:41,196 --> 00:49:44,336 So now at this point in the story get string is 1146 00:49:44,336 --> 00:49:46,126 about to return because the user just typed 1147 00:49:46,126 --> 00:49:48,886 in F-O-O enter get string is about to return. 1148 00:49:49,066 --> 00:49:52,246 It can't return a chunk of memory like it can a copy 1149 00:49:52,246 --> 00:49:54,086 of an int so it can't return the whole thing 1150 00:49:54,086 --> 00:49:55,196 so what again is returned? 1151 00:49:56,256 --> 00:49:56,986 So it's the address. 1152 00:49:57,426 --> 00:49:58,536 Now, again, what's the address? 1153 00:49:58,676 --> 00:49:59,506 Frankly I don't know. 1154 00:49:59,506 --> 00:50:01,096 I mean you might have two gigabytes of RAM. 1155 00:50:01,306 --> 00:50:04,146 Down here we said this is like 4, 5, 6 and 1, 2, 1156 00:50:04,146 --> 00:50:09,826 3 so this is going to be memory location 7, 7, 7, 7, 1 whatever. 1157 00:50:09,826 --> 00:50:12,626 It's pretty arbitrary, but it's big relative 1158 00:50:12,626 --> 00:50:14,666 to what the numbers were we were talking about before. 1159 00:50:14,976 --> 00:50:18,716 All right so I probably should have chosen a shorter number 1160 00:50:18,716 --> 00:50:23,266 because now I can't figure it in here, but let's call it 71. 1161 00:50:23,966 --> 00:50:26,956 So, the address is 71 so what ends up in s1? 1162 00:50:26,956 --> 00:50:29,266 Well, literally the number 71. 1163 00:50:29,316 --> 00:50:33,666 That is the address in memory in the heap of the first byte 1164 00:50:33,666 --> 00:50:34,686 that the user typed in. 1165 00:50:34,936 --> 00:50:37,856 Now, frankly this is going to very quickly get distracting 1166 00:50:38,136 --> 00:50:40,896 and because who cares, certainly for the classes purposes 1167 00:50:40,956 --> 00:50:43,516 where things are in memory and for you in memory who cares 1168 00:50:43,516 --> 00:50:45,636 where things end up, you just care that they end up somewhere 1169 00:50:45,636 --> 00:50:48,566 in memory so what the world generally does is not bother 1170 00:50:48,566 --> 00:50:51,336 coming up with these contrived examples like 71 1171 00:50:51,336 --> 00:50:57,436 for this address and 1, 2, 3, and 3, 4, 5, 4, 5, 6. 1172 00:50:57,686 --> 00:51:01,856 Instead if we want this pointer to represent the address 1173 00:51:01,856 --> 00:51:03,796 of something in as much as it points 1174 00:51:03,836 --> 00:51:06,176 at that address let's just draw an arrow. 1175 00:51:06,696 --> 00:51:08,656 So, for the most part any time we talk about 1176 00:51:08,656 --> 00:51:11,656 or draw pointers an arrow suffices, which really 1177 00:51:11,656 --> 00:51:14,506 in there is a number like 71, which is the literal byte 1178 00:51:14,546 --> 00:51:17,726 that the F is actually in in RAM, but frankly who cares? 1179 00:51:18,146 --> 00:51:20,066 Let's just assume that s1 is a pointer 1180 00:51:20,066 --> 00:51:23,046 and as an arrow suggests it's pointing at this byte here. 1181 00:51:23,286 --> 00:51:24,546 So, after the second line 1182 00:51:24,546 --> 00:51:27,766 for code here char * s1 gets the return value 1183 00:51:27,766 --> 00:51:30,646 of get string this is what the state of our world looks like. 1184 00:51:30,646 --> 00:51:32,876 This is main string here on the stack. 1185 00:51:33,236 --> 00:51:35,966 This is the heap somewhere else in memory, but they are, 1186 00:51:35,966 --> 00:51:37,476 in fact, in distinct locations. 1187 00:51:37,476 --> 00:51:38,186 So, now I check. 1188 00:51:38,186 --> 00:51:39,976 Is s1 equal, equal to null? 1189 00:51:40,186 --> 00:51:40,906 Well, it's not. 1190 00:51:40,906 --> 00:51:42,046 It's not zero. 1191 00:51:42,046 --> 00:51:45,436 It's 71 or whatever so that if condition doesn't apply, 1192 00:51:45,766 --> 00:51:47,566 but now what do I do in this next line? 1193 00:51:47,706 --> 00:51:50,466 This one we concluded by talking about earlier? 1194 00:51:50,766 --> 00:51:53,616 char * s2 gets s1. 1195 00:51:53,616 --> 00:51:55,296 Well, I know how big a pointer is. 1196 00:51:55,436 --> 00:51:58,076 It looks like a square because it's 32 bits I'm told. 1197 00:51:58,526 --> 00:52:01,346 This thing is called s2 so let's just label it s2 1198 00:52:01,346 --> 00:52:03,576 and now s2 gets s1. 1199 00:52:03,576 --> 00:52:06,346 Well, the assignment operator makes a copy of the thing 1200 00:52:06,346 --> 00:52:08,136 on the right and puts it in the thing on the left. 1201 00:52:08,666 --> 00:52:10,596 Technically speaking what's here? 1202 00:52:10,836 --> 00:52:11,526 Seventy-one. 1203 00:52:11,526 --> 00:52:12,626 So, what really ends up here? 1204 00:52:13,066 --> 00:52:13,676 Seventy-one. 1205 00:52:14,016 --> 00:52:15,906 Again, stupid, arbitrary number 1206 00:52:15,906 --> 00:52:17,886 to choose let's just update our picture. 1207 00:52:18,086 --> 00:52:21,626 At this point in the story both s1 and s2 are literally pointing 1208 00:52:21,796 --> 00:52:23,476 at the same location in memory 1209 00:52:23,476 --> 00:52:25,156 so that's now what the story looks like. 1210 00:52:25,446 --> 00:52:26,636 So, now what happens next? 1211 00:52:26,716 --> 00:52:30,336 I next say I am capitalizing the copy dot, dot, dot. 1212 00:52:30,616 --> 00:52:31,666 I then do a sanity check. 1213 00:52:31,666 --> 00:52:34,396 Is the strength length greater than two or greater than zero? 1214 00:52:34,586 --> 00:52:35,806 Yeah, there's like three letters there 1215 00:52:35,806 --> 00:52:38,116 so that condition doesn't cause problems. 1216 00:52:38,556 --> 00:52:40,106 So now I do this. 1217 00:52:40,136 --> 00:52:41,376 s2 bracket zero. 1218 00:52:41,826 --> 00:52:44,576 So s2 is this guy. 1219 00:52:45,036 --> 00:52:47,606 It turns out that the bracket notation just means 1220 00:52:47,606 --> 00:52:49,406 that this is essentially an array and you know what? 1221 00:52:49,466 --> 00:52:52,616 It is. It's drawn like an array, it effectively is an array 1222 00:52:52,906 --> 00:52:58,126 so bracket zero means go to the zero's location in that array, 1223 00:52:58,246 --> 00:53:00,136 which happens to be F and do what with it? 1224 00:53:00,586 --> 00:53:03,716 Well, call toupper pass this lowercase F 1225 00:53:03,956 --> 00:53:07,136 to this function called toupper it's going to return capital F 1226 00:53:07,406 --> 00:53:10,896 and so what do I assign to s2 bracket zero? 1227 00:53:11,266 --> 00:53:13,136 Well, the return value of toupper 1228 00:53:13,456 --> 00:53:14,886 so that literally changes this. 1229 00:53:15,266 --> 00:53:18,696 Now notice the key takeaway of that example was s1 is pointing 1230 00:53:18,696 --> 00:53:21,226 at that first letter; s2 is pointing at that same letter. 1231 00:53:21,226 --> 00:53:24,916 It doesn't matter that I happen to get there by way of s2 1232 00:53:24,916 --> 00:53:28,216 if they both lead to the same destination the end result is 1233 00:53:28,216 --> 00:53:29,626 going to be that they both, in fact, 1234 00:53:29,986 --> 00:53:36,116 look like capital F lowercase o, lowercase o. Any questions 1235 00:53:36,116 --> 00:53:43,286 on that part of the story? 1236 00:53:44,046 --> 00:53:44,166 Yeah? 1237 00:53:44,666 --> 00:53:51,236 >> Student: [Inaudible]. 1238 00:53:51,736 --> 00:54:00,456 >> David: Could you speak up a little louder? 1239 00:54:00,456 --> 00:54:01,596 >> Student: [Inaudible]. 1240 00:54:01,596 --> 00:54:01,926 >> David: Okay. 1241 00:54:01,926 --> 00:54:03,016 So good question. 1242 00:54:03,016 --> 00:54:05,366 So, generally speaking when a function returns the memory 1243 00:54:05,366 --> 00:54:07,326 that it used gets overwritten or disappears. 1244 00:54:07,656 --> 00:54:09,836 First, that's actually not applicable here 1245 00:54:09,956 --> 00:54:12,716 because in this story there's only one function that I wrote 1246 00:54:12,716 --> 00:54:14,936 at least called main so there's no notion 1247 00:54:14,936 --> 00:54:17,656 of one stack frame then another then another 1248 00:54:17,656 --> 00:54:18,766 and then those being popped off. 1249 00:54:19,286 --> 00:54:21,536 >> Student: Yes, that's the question. 1250 00:54:21,536 --> 00:54:22,376 >> David: So, correct. 1251 00:54:22,506 --> 00:54:23,406 Okay, so good point. 1252 00:54:23,576 --> 00:54:26,336 So there are function calls going on not that I wrote 1253 00:54:26,336 --> 00:54:30,156 but that the CS50 Library wrote the get string function does, 1254 00:54:30,156 --> 00:54:32,476 in fact, result in this process of a frame going 1255 00:54:32,476 --> 00:54:35,796 on the stack then it gets popped off when get string return 1256 00:54:35,796 --> 00:54:37,686 so that part of the story is, in fact, consistent, 1257 00:54:38,026 --> 00:54:40,426 but what's saving us here is 1258 00:54:40,426 --> 00:54:44,106 that get string allocates these bytes for F-O-O 1259 00:54:44,106 --> 00:54:46,326 and then the back slash zero not on the stack. 1260 00:54:46,626 --> 00:54:48,216 It's not a local variable. 1261 00:54:48,426 --> 00:54:50,226 We'll see in a moment there's a special function. 1262 00:54:50,226 --> 00:54:54,686 Its name is malloc for memory allocation and what malloc does 1263 00:54:54,686 --> 00:54:56,976 for us is we say, hmm, the user has typed 1264 00:54:57,046 --> 00:54:58,486 in a three-letter word. 1265 00:54:58,676 --> 00:55:02,076 I need four bytes and so you essentially call malloc a four 1266 00:55:02,316 --> 00:55:05,836 and this function is going to grab from the heap the address 1267 00:55:06,346 --> 00:55:09,566 of four spare bytes that are not currently being used 1268 00:55:09,566 --> 00:55:11,146 by anything else and the fact 1269 00:55:11,146 --> 00:55:12,756 that get string is using the heap 1270 00:55:13,096 --> 00:55:15,986 and not the stack means we avoid that inherent problem 1271 00:55:15,986 --> 00:55:17,536 that we keep tripping over with swap 1272 00:55:17,866 --> 00:55:20,086 because we're using a different chunk of memory altogether 1273 00:55:20,086 --> 00:55:22,396 and that, again, is why the heap is now 1274 00:55:22,396 --> 00:55:23,736 such a compelling part of the story. 1275 00:55:23,736 --> 00:55:26,326 It gives us the power to have complete dynamism 1276 00:55:26,326 --> 00:55:27,056 in our program. 1277 00:55:27,056 --> 00:55:29,466 We don't have to worry about memory disappearing just 1278 00:55:29,466 --> 00:55:31,716 because my function is done executing. 1279 00:55:32,186 --> 00:55:33,856 So let's actually see how we solve this 1280 00:55:33,856 --> 00:55:35,956 with this function with actual code. 1281 00:55:35,956 --> 00:55:39,496 So, copy2.c is also among your printouts 1282 00:55:39,626 --> 00:55:42,266 and there's a little more line of code here that needs 1283 00:55:42,326 --> 00:55:43,896 to solve this for us, but the rest 1284 00:55:43,896 --> 00:55:45,346 of it is pretty much the same. 1285 00:55:45,416 --> 00:55:46,256 So, let's take a look. 1286 00:55:46,556 --> 00:55:49,526 This is copy2.c. At the very beginning I, again, 1287 00:55:49,526 --> 00:55:53,926 demand say something and then I declare s1 to be a string, 1288 00:55:53,956 --> 00:55:58,846 aka char *, and I store in s1 the string the user types in. 1289 00:55:58,906 --> 00:55:59,946 Now, that's not quite proper. 1290 00:55:59,946 --> 00:56:05,066 I store in s1 the address of the first byte that the user typed 1291 00:56:05,066 --> 00:56:08,046 in and by the way that first bite happens to live 1292 00:56:08,046 --> 00:56:10,746 in this new place called the heap and that's the only update 1293 00:56:10,746 --> 00:56:11,606 to the story thus far. 1294 00:56:11,866 --> 00:56:12,856 Null does not apply. 1295 00:56:12,856 --> 00:56:14,036 Let's assume that the user typed 1296 00:56:14,036 --> 00:56:15,926 in a pretty short word we didn't run out of memory 1297 00:56:15,926 --> 00:56:18,316 or anything crazy so here's the new feature. 1298 00:56:18,316 --> 00:56:20,876 It looks a little cryptic but it actually just does what it says. 1299 00:56:21,166 --> 00:56:22,746 So, malloc is a new function. 1300 00:56:22,986 --> 00:56:27,626 It's defined in a header file just FYI called standard lib. 1301 00:56:27,776 --> 00:56:30,386 Not stdio but stdlib.h. 1302 00:56:30,586 --> 00:56:32,226 So that's why I've included that header file. 1303 00:56:32,606 --> 00:56:35,576 malloc as its name suggests allocates memory 1304 00:56:35,766 --> 00:56:38,416 and it takes right now a single argument. 1305 00:56:38,616 --> 00:56:41,016 It just needs to know how many bytes of memory do you want? 1306 00:56:41,366 --> 00:56:44,286 Well, I don't know yet and although, 1307 00:56:44,286 --> 00:56:45,406 yes, I do in this story. 1308 00:56:45,406 --> 00:56:48,056 If the purpose of this program is to make a copy of the string. 1309 00:56:48,396 --> 00:56:49,516 Well, the user, I don't know 1310 00:56:49,516 --> 00:56:51,406 in advance what the user is going to type in, right? 1311 00:56:51,406 --> 00:56:53,366 I certainly don't want to assume they're always going 1312 00:56:53,366 --> 00:56:54,926 to type a three-letter or a four-letter 1313 00:56:54,926 --> 00:56:58,206 or a two-letter word I want some dynamism but that's fine 1314 00:56:58,256 --> 00:57:00,226 because get string can get a string of any length, 1315 00:57:00,906 --> 00:57:02,996 I can then use the string length function 1316 00:57:02,996 --> 00:57:06,326 to just ask while the program is running how big is the string 1317 00:57:06,326 --> 00:57:06,976 that I was handed? 1318 00:57:06,976 --> 00:57:08,216 How many characters are in it? 1319 00:57:08,476 --> 00:57:10,506 In this case, I'm going to get back the answer 3 1320 00:57:10,506 --> 00:57:12,166 because the user typed in F-O-O, 1321 00:57:12,416 --> 00:57:13,996 but wait a minute, what's with this? 1322 00:57:13,996 --> 00:57:14,686 Why the plus 1? 1323 00:57:16,206 --> 00:57:17,356 The back slash zero, right? 1324 00:57:17,356 --> 00:57:20,526 Because you have so much control over the computer now, yes, 1325 00:57:20,606 --> 00:57:23,166 the word is three letters, F-O-O, but if you're going 1326 00:57:23,166 --> 00:57:25,176 to make a copy of this thing, you better adhere 1327 00:57:25,176 --> 00:57:26,656 to the conventions of the computer, 1328 00:57:26,856 --> 00:57:29,766 which allocates one additional byte for that back slash zero 1329 00:57:29,986 --> 00:57:32,566 so I have to tell malloc I need three, 1330 00:57:32,626 --> 00:57:35,026 the length of s1 plus an additional 1 1331 00:57:35,316 --> 00:57:37,676 and now this is actually just a little safety net. 1332 00:57:37,816 --> 00:57:41,396 So, I'm now multiplying by the result of calling the size 1333 00:57:41,396 --> 00:57:42,996 of operator, which we've seen before. 1334 00:57:42,996 --> 00:57:44,486 We did it for silly purposes just 1335 00:57:44,486 --> 00:57:46,086 to see how big each data type was, 1336 00:57:46,396 --> 00:57:49,126 but on most systems the size of a char is what? 1337 00:57:49,126 --> 00:57:49,636 How many bytes? 1338 00:57:50,716 --> 00:57:54,406 One. So, on most systems if you didn't remember that, no bigger. 1339 00:57:54,406 --> 00:57:58,526 FYI, it's almost always 1 so this line here calling size 1340 00:57:58,526 --> 00:58:01,326 of char is pretty much equivalent on most computers 1341 00:58:01,326 --> 00:58:04,596 to doing this by just typing in 1 and at which point 1342 00:58:04,596 --> 00:58:05,386 if you're just going to multiply 1343 00:58:05,386 --> 00:58:08,146 by 1 you're just wasting time why not just eliminate 1344 00:58:08,146 --> 00:58:11,666 that all together, but if you write in code that you want 1345 00:58:11,666 --> 00:58:14,076 to be able to execute on today's computers and the computers 1346 00:58:14,076 --> 00:58:16,676 that come out next year where maybe the bytes are, 1347 00:58:16,676 --> 00:58:19,866 maybe the chars are two bytes the right way to write code 1348 00:58:19,866 --> 00:58:22,226 that does not crash when hardware changes is 1349 00:58:22,226 --> 00:58:26,546 to actually include something like that time size of char 1350 00:58:26,546 --> 00:58:28,186 so that if it does change at some point, 1351 00:58:28,186 --> 00:58:28,976 your program is not going to break. 1352 00:58:29,396 --> 00:58:32,886 So in the end, this is just saying allocate me as many bytes 1353 00:58:32,976 --> 00:58:35,576 as were needed to store s1 itself. 1354 00:58:35,836 --> 00:58:39,096 What am I storing in s2? 1355 00:58:39,426 --> 00:58:41,156 What must malloc be returning? 1356 00:58:41,816 --> 00:58:45,936 It's the address of that chunk of memory. 1357 00:58:45,936 --> 00:58:47,776 So, what's happening here in the story is this. 1358 00:58:48,086 --> 00:58:51,206 Now, again in heap malloc always gives you memory 1359 00:58:51,206 --> 00:58:53,086 from the heap not the stack and so 1360 00:58:53,086 --> 00:58:55,866 as an aside the CS50 Library uses, again, 1361 00:58:56,576 --> 00:58:58,806 malloc to get memory to use. 1362 00:58:59,146 --> 00:59:01,936 When I call malloc of that crazy arithmetic expression, 1363 00:59:02,126 --> 00:59:03,936 I'm really just calling get malloc a four 1364 00:59:03,936 --> 00:59:05,536 because I need 1, 2, 3, 4 bytes. 1365 00:59:05,806 --> 00:59:09,136 What malloc does for me is it finds in RAM somewhere 1366 00:59:09,256 --> 00:59:11,036 where there's four bytes in a row. 1367 00:59:11,036 --> 00:59:13,486 I have no idea what's here at the moment so I'm just going 1368 00:59:13,486 --> 00:59:15,596 to draw a question mark because that memory might have been used 1369 00:59:15,596 --> 00:59:17,066 previously for some other purpose, 1370 00:59:17,256 --> 00:59:19,496 but we know it's currently available to us 1371 00:59:19,836 --> 00:59:21,126 so we have four bytes of memory. 1372 00:59:21,166 --> 00:59:22,326 What does malloc return? 1373 00:59:22,576 --> 00:59:26,626 It returns the address of this first byte so really the address 1374 00:59:26,626 --> 00:59:30,326 of the first char here and so what gets stored in s2 now? 1375 00:59:30,716 --> 00:59:34,776 We're actually making a legitimate copy 1376 00:59:34,866 --> 00:59:36,296 with this version of the program 1377 00:59:36,566 --> 00:59:38,286 so I don't have this bug in anymore. 1378 00:59:38,286 --> 00:59:43,496 I'm going to delete that arrow and actually draw s2 as pointing 1379 00:59:43,496 --> 00:59:48,176 to this chunk of memory because whereas before this sequence 1380 00:59:48,176 --> 00:59:51,846 of chars might have lived at address 71 or whatever, well, 1381 00:59:51,846 --> 00:59:53,566 this one might live at 91. 1382 00:59:53,566 --> 00:59:54,516 Somewhere else in memory 1383 00:59:54,516 --> 00:59:56,906 but it's a different number hence it's a different arrow. 1384 00:59:56,906 --> 00:59:58,886 It's pointing at a different place in the heap. 1385 00:59:59,266 --> 01:00:01,036 So now a sanity check. 1386 01:00:01,036 --> 01:00:02,176 Is s2 equal equal to null? 1387 01:00:02,396 --> 01:00:04,376 Could be. If the user typed in a really long word, 1388 01:00:04,376 --> 01:00:06,346 we're out of memory, something else went wrong, could be. 1389 01:00:06,346 --> 01:00:08,496 So, I have to check for that, but nine times 1390 01:00:08,496 --> 01:00:09,306 out of ten there's not going 1391 01:00:09,306 --> 01:00:10,736 to be a problem so now let's do this. 1392 01:00:11,516 --> 01:00:13,716 Well, what does it mean to copy a string? 1393 01:00:13,716 --> 01:00:15,976 These question marks are not the string yet right? 1394 01:00:15,976 --> 01:00:17,306 All I did was ask for memory. 1395 01:00:17,556 --> 01:00:19,756 If I want a copy of the string, I've got to whip 1396 01:00:19,756 --> 01:00:22,886 out my week 1 skills of just iterating with a four loop 1397 01:00:22,886 --> 01:00:25,126 from left to right and make copies of those characters. 1398 01:00:25,126 --> 01:00:26,076 So, what am I doing there? 1399 01:00:26,396 --> 01:00:28,426 So, I'm initializing a variable called N 1400 01:00:28,806 --> 01:00:30,336 to the string length of s1. 1401 01:00:30,336 --> 01:00:32,986 So conceptually that's going to be what, 3 or 4? 1402 01:00:34,706 --> 01:00:35,946 What's the string length of s1? 1403 01:00:36,676 --> 01:00:36,916 >> Student: Three. 1404 01:00:37,086 --> 01:00:37,756 >> David: So it is 3. 1405 01:00:37,756 --> 01:00:39,376 Even though there's four bytes the length 1406 01:00:39,376 --> 01:00:40,456 of the string is consistent 1407 01:00:40,456 --> 01:00:42,476 with what a human being would interpret as the length 1408 01:00:42,476 --> 01:00:43,616 of the string which is 3. 1409 01:00:43,946 --> 01:00:45,276 So, what do I next do? 1410 01:00:45,536 --> 01:00:50,366 I iterate from I equals zero on up to less than N so I'm 1411 01:00:50,366 --> 01:00:52,806 at this point in the story here plus, plus each time. 1412 01:00:52,806 --> 01:00:54,736 Well, this is, again, this is kind of like caesar stuff 1413 01:00:54,736 --> 01:00:56,976 or vigenere where you just copied characters back 1414 01:00:56,976 --> 01:00:57,366 and forth. 1415 01:00:57,656 --> 01:01:00,206 So this has the effect of starting at location zero 1416 01:01:00,506 --> 01:01:04,596 and I do s2 bracket I gets s1 bracket I. 1417 01:01:04,596 --> 01:01:05,986 Well, what's s1 bracket I? 1418 01:01:06,216 --> 01:01:08,656 Here's s1, it's pointing to this array, 1419 01:01:09,266 --> 01:01:11,676 bracket I is bracket zero initially so it's pointing 1420 01:01:11,676 --> 01:01:13,176 at capital F. Same deal. 1421 01:01:13,426 --> 01:01:17,086 s2 bracket I is s2 bracket zero which means here 1422 01:01:17,266 --> 01:01:20,616 so the question mark now gets clobbered or overwritten 1423 01:01:20,616 --> 01:01:22,696 with literally what was above it here. 1424 01:01:22,966 --> 01:01:23,866 Now what happens here? 1425 01:01:24,026 --> 01:01:25,676 Well, this question mark becomes an O, 1426 01:01:25,676 --> 01:01:28,416 this question mark becomes an O, and then the loop terminates 1427 01:01:28,816 --> 01:01:31,926 because it's iterating from zero to N so that's zero, 1, 1428 01:01:32,006 --> 01:01:35,356 2 and the length of the string is 3 so the loop terminates, 1429 01:01:35,446 --> 01:01:39,326 but I remember that I needed to have this special sentinel value 1430 01:01:39,596 --> 01:01:40,976 so I'm just going to put it there manually. 1431 01:01:41,076 --> 01:01:43,876 I could have done this in another way but I am going 1432 01:01:43,876 --> 01:01:47,296 to just be extra deliberate so I make sure that this copy ends 1433 01:01:47,296 --> 01:01:51,426 with back slash zero and voila, now, I have a copy. 1434 01:01:51,836 --> 01:01:56,816 So, if I compile and run copy2.c now let's see what happens. 1435 01:01:56,816 --> 01:02:01,386 Make copy2, all right, run copy2 enter. 1436 01:02:01,636 --> 01:02:04,286 I'm going to go ahead and type foo in all lowercase enter 1437 01:02:04,586 --> 01:02:07,466 and finally now I seem to have a working program 1438 01:02:07,466 --> 01:02:11,116 because this foo is at a different location than this foo 1439 01:02:11,116 --> 01:02:14,076 and so I have, in fact, copied the actual copy 1440 01:02:15,096 --> 01:02:17,676 or I have actually capitalized the actual copy. 1441 01:02:18,806 --> 01:02:20,576 Any questions on this copy program? 1442 01:02:21,996 --> 01:02:22,666 Anything at all? 1443 01:02:23,956 --> 01:02:24,226 All right. 1444 01:02:24,226 --> 01:02:27,026 So let's look at one other example 1445 01:02:27,026 --> 01:02:31,616 that we can then visualize with a fun video of sorts. 1446 01:02:31,736 --> 01:02:34,776 So, let's see this eraser actually never works very well 1447 01:02:34,776 --> 01:02:35,966 so I'm just going to draw below it. 1448 01:02:36,446 --> 01:02:40,276 So, suppose I'm in a function, I need the space. 1449 01:02:40,596 --> 01:02:41,976 All right. 1450 01:02:42,916 --> 01:02:44,606 All right. 1451 01:02:44,726 --> 01:02:48,176 So I am going to on the fly write a very simple little main 1452 01:02:48,176 --> 01:02:50,656 program that just illustrates some of the syntax 1453 01:02:50,716 --> 01:02:52,976 that we can then visualize more interesting. 1454 01:02:52,976 --> 01:02:56,506 So, int main, I don't care about any command line argument 1455 01:02:56,506 --> 01:02:57,746 so I'm just going to say void. 1456 01:02:58,356 --> 01:03:00,236 All right so now here's the start of my program. 1457 01:03:00,236 --> 01:03:01,646 It's not going to be very interesting yet, 1458 01:03:01,716 --> 01:03:04,056 but it will allow us to tease apart these ideas with, well, 1459 01:03:04,056 --> 01:03:04,896 we'll see some Play-Doh. 1460 01:03:05,196 --> 01:03:08,426 All right so I want first a pointer. 1461 01:03:08,426 --> 01:03:12,346 I'm going to call it x and how do I make a pointer to an int? 1462 01:03:12,846 --> 01:03:14,166 Well, I use the * notation. 1463 01:03:14,166 --> 01:03:15,256 So we've seen this before. 1464 01:03:15,806 --> 01:03:17,176 So I have int *x. 1465 01:03:17,176 --> 01:03:17,676 All right. 1466 01:03:18,296 --> 01:03:21,216 Now I'm going to have another pointer to an int. 1467 01:03:21,356 --> 01:03:25,506 I'm going to call it int *y enter or, sorry, semicolon, 1468 01:03:25,876 --> 01:03:27,086 and then what if I do this? 1469 01:03:27,736 --> 01:03:29,606 Based on the lessons we've been telling if I say *, 1470 01:03:29,606 --> 01:03:32,386 you know what I have a computer. 1471 01:03:32,386 --> 01:03:33,336 Why am I using chalk? 1472 01:03:34,376 --> 01:03:35,626 All right. 1473 01:03:36,856 --> 01:03:38,036 [Laughter] There we go. 1474 01:03:38,866 --> 01:03:42,556 Let's modernize this program in main, boy, 1475 01:03:42,556 --> 01:03:44,066 this is a lot easier it turns out. 1476 01:03:44,336 --> 01:03:46,466 Okay, and all of you can actually see it. 1477 01:03:46,466 --> 01:03:47,336 So, int *x. 1478 01:03:48,186 --> 01:03:52,266 Now get int *y so now I'll use the board 1479 01:03:52,266 --> 01:03:54,476 for things I can't really draw very well with the keyboard 1480 01:03:54,686 --> 01:03:58,886 so what does the memory of my program look like at the moment? 1481 01:03:59,586 --> 01:04:02,406 Well, at this point I have two boxes. 1482 01:04:03,066 --> 01:04:07,076 A square it's called X, another square it's called y 1483 01:04:07,366 --> 01:04:09,666 and now this time I'm doing pointers 1484 01:04:09,666 --> 01:04:11,216 to int not points to char. 1485 01:04:11,216 --> 01:04:14,526 So, should I be drawing these boxes four times bigger 1486 01:04:14,876 --> 01:04:17,136 since ints are four times bigger than chars usually? 1487 01:04:17,746 --> 01:04:19,626 So a little sanity check. 1488 01:04:19,626 --> 01:04:23,696 No. Even though the key word has changed from char to int, 1489 01:04:24,016 --> 01:04:25,866 the * is the same and the fact 1490 01:04:25,866 --> 01:04:28,256 that I've got a * there is saying this is a pointer 1491 01:04:28,256 --> 01:04:32,116 to an int or a pointer to a char pointers are always the same 1492 01:04:32,116 --> 01:04:32,966 size on a computer. 1493 01:04:32,966 --> 01:04:37,116 They are always say 32 bits or always 64 bits they do not vary 1494 01:04:37,116 --> 01:04:39,826 in size based on what they're pointing at. 1495 01:04:39,986 --> 01:04:41,886 All right so now I have this part of the story. 1496 01:04:42,206 --> 01:04:43,296 Let me go ahead and do this. 1497 01:04:43,506 --> 01:04:45,816 At this point in the story you know what's inside 1498 01:04:45,816 --> 01:04:46,366 of this thing? 1499 01:04:46,766 --> 01:04:48,076 Well, who knows frankly. 1500 01:04:48,416 --> 01:04:49,086 Question mark. 1501 01:04:49,536 --> 01:04:52,056 Right? I've allocated a variable, 1502 01:04:52,286 --> 01:04:54,116 it happens to be a pointer; it's still a variable, 1503 01:04:54,386 --> 01:04:55,466 who know what's in it, right? 1504 01:04:55,466 --> 01:04:57,446 We've seen in class that you get garbage values 1505 01:04:57,446 --> 01:04:59,256 if you don't initialize something to a value 1506 01:04:59,586 --> 01:05:02,126 so let me actually initialize x to a value. 1507 01:05:02,126 --> 01:05:04,886 I'm going to now use this new, fancy function called malloc 1508 01:05:05,086 --> 01:05:08,186 and I'm going to say x gets the return value of malloc 1509 01:05:08,446 --> 01:05:10,576 of the size of an int. 1510 01:05:11,406 --> 01:05:12,926 So what does this line of code do? 1511 01:05:12,926 --> 01:05:15,896 Well, somewhere in the world there's this thing called the 1512 01:05:15,986 --> 01:05:18,176 heap, foo is going to be no more. 1513 01:05:18,176 --> 01:05:20,976 This is a new program, but I'll keep drawing the heap over here. 1514 01:05:21,246 --> 01:05:24,046 So, when I call malloc size of int and int 1515 01:05:24,046 --> 01:05:26,636 on this system is probably 4 bytes or 32 bits 1516 01:05:27,026 --> 01:05:27,926 so what am I getting back? 1517 01:05:28,186 --> 01:05:31,786 Well, I'm going to get back, let's just draw a bigger square 1518 01:05:31,786 --> 01:05:33,776 than usual that's roughly four times the size. 1519 01:05:34,226 --> 01:05:37,796 This is now on the heap it's four bytes 1520 01:05:38,126 --> 01:05:41,316 so what gets stored in X? 1521 01:05:42,496 --> 01:05:43,456 What's that? 1522 01:05:43,916 --> 01:05:46,606 Yeah, a pointer to this chunk of memory. 1523 01:05:46,966 --> 01:05:50,656 Conceptually it's this, x has been assigned the return value 1524 01:05:50,656 --> 01:05:54,116 of malloc, malloc returns the address of the chunk of memory 1525 01:05:54,116 --> 01:05:56,136 that you've asked for of the operating system 1526 01:05:56,386 --> 01:05:57,996 so pictorially we just have an arrow. 1527 01:05:57,996 --> 01:05:59,546 Now, really underneath the hood? 1528 01:05:59,546 --> 01:06:01,916 Well, maybe this ended up at location 7, 8, 1529 01:06:01,916 --> 01:06:04,956 9 so what's really in x is the number 7, 8, 9, 1530 01:06:04,956 --> 01:06:07,286 which is the address in memory, but again, uninteresting. 1531 01:06:07,286 --> 01:06:09,986 The picture is more compelling with an arrow at this point. 1532 01:06:10,306 --> 01:06:13,896 So, now I have x pointing at a chunk of memory what's 1533 01:06:13,896 --> 01:06:14,816 in this memory though? 1534 01:06:15,866 --> 01:06:16,916 Honestly who knows. 1535 01:06:16,966 --> 01:06:19,386 It's just some random chunk of four bytes that happens 1536 01:06:19,386 --> 01:06:21,106 to be available at this point in time, 1537 01:06:21,406 --> 01:06:22,436 but I can put something there. 1538 01:06:22,736 --> 01:06:26,246 Let me go ahead and put the number 42. 1539 01:06:26,456 --> 01:06:28,626 So, is this correct? 1540 01:06:29,636 --> 01:06:29,716 >> Student: No. 1541 01:06:29,966 --> 01:06:30,176 >> David: Okay. 1542 01:06:30,396 --> 01:06:31,486 So, leading question. 1543 01:06:31,486 --> 01:06:32,666 It's not correct, but why? 1544 01:06:32,736 --> 01:06:36,706 If I do x equals 42, I'm not going to put the 42 here, 1545 01:06:36,756 --> 01:06:37,466 where am I going to put it? 1546 01:06:37,466 --> 01:06:40,406 It's going to go here, which means I'm going 1547 01:06:40,406 --> 01:06:43,336 to lose this arrow which means I've just asked for memory, 1548 01:06:43,616 --> 01:06:45,666 I've lost track of it and so we mentioned 1549 01:06:45,666 --> 01:06:48,566 in week zero I think the notion of a memory leak, 1550 01:06:48,866 --> 01:06:51,736 which is often creates the symptom 1551 01:06:51,736 --> 01:06:53,846 of your computer slowing down, you need to reboot, 1552 01:06:53,846 --> 01:06:55,406 it just gets slower and slower and slower. 1553 01:06:55,766 --> 01:06:58,906 Well, if you forget where the memory is that you asked for, 1554 01:06:59,176 --> 01:07:01,356 well, in fact, that is how you make a memory leak. 1555 01:07:01,356 --> 01:07:02,946 You just forget where the memory is that you asked 1556 01:07:02,946 --> 01:07:04,626 for on the heap and it's not going to get cleaned 1557 01:07:04,626 --> 01:07:07,056 up on the stack because malloc puts it somewhere else. 1558 01:07:07,056 --> 01:07:08,096 So this isn't quite right. 1559 01:07:08,306 --> 01:07:11,176 We need to tell the computer go to the address in x 1560 01:07:11,416 --> 01:07:15,046 and put 42 there so is the symbology there & or *? 1561 01:07:15,106 --> 01:07:18,736 It is *. Star means go there. 1562 01:07:19,016 --> 01:07:23,806 So, now we have said go to the address stored in X, 1563 01:07:23,806 --> 01:07:27,226 which is who knows, it's over there, put the number 42 there 1564 01:07:27,476 --> 01:07:32,266 so what I've just drawn is this part of this story here. 1565 01:07:32,486 --> 01:07:33,776 Well, now let's do something stupid. 1566 01:07:33,836 --> 01:07:35,096 Let's actually screw this up 1567 01:07:35,096 --> 01:07:38,496 and let's actually say, hmm, how about *y? 1568 01:07:38,496 --> 01:07:39,806 Let's put something there. 1569 01:07:39,806 --> 01:07:42,466 *y gets 13, an unlucky number. 1570 01:07:42,826 --> 01:07:44,676 In fact, what have I probably just done? 1571 01:07:45,536 --> 01:07:49,186 Segfault. And why might I possibly have done that? 1572 01:07:49,186 --> 01:07:50,456 Why did that possibly happen? 1573 01:07:50,456 --> 01:07:52,786 Star y gets 13. 1574 01:07:52,886 --> 01:07:53,806 Same exact syntax. 1575 01:07:54,896 --> 01:07:59,186 Well, yeah, we don't have memory being pointed at. 1576 01:07:59,186 --> 01:08:00,656 In fact, we kind of do in a sense. 1577 01:08:00,886 --> 01:08:03,176 This question mark I mean it's obviously not a question mark. 1578 01:08:03,386 --> 01:08:06,416 There's some bogus value here like 9, 9, 9. 1579 01:08:06,416 --> 01:08:07,436 Why not, whatever? 1580 01:08:07,436 --> 01:08:09,496 It's just reminiscent of a previous program 1581 01:08:09,496 --> 01:08:11,556 or a previous function the number 9, 9, 1582 01:08:11,556 --> 01:08:13,096 9 so what does that really mean? 1583 01:08:13,296 --> 01:08:16,636 That means pictorially this y 1584 01:08:16,926 --> 01:08:20,026 by default is pointing somewhere, but who knows where? 1585 01:08:20,266 --> 01:08:23,726 And so when I say *y gets 13, okay, I interpret, 1586 01:08:23,726 --> 01:08:26,366 misinterpret this value as a pointer 1587 01:08:26,366 --> 01:08:28,436 and I say go to location 9, 9, 9. 1588 01:08:28,436 --> 01:08:29,446 That is here in RAM. 1589 01:08:29,766 --> 01:08:33,016 I can certainly try to write the number 13 there, but bam, 1590 01:08:33,326 --> 01:08:36,796 my program very likely will crash if this chunk 1591 01:08:36,796 --> 01:08:39,216 of memory was not given to me previously 1592 01:08:39,216 --> 01:08:41,566 by the operating system and maybe it is owned 1593 01:08:41,566 --> 01:08:42,616 by the operating system. 1594 01:08:42,616 --> 01:08:45,906 Maybe it's really close to the number zero to null and, 1595 01:08:45,906 --> 01:08:47,636 therefore, I'm absolutely going to crash. 1596 01:08:47,636 --> 01:08:48,396 So, this was bad. 1597 01:08:48,706 --> 01:08:51,186 So, we'll have to fix that but let's do something correct. 1598 01:08:51,186 --> 01:08:51,676 Can I do this? 1599 01:08:52,536 --> 01:08:57,656 y gets X? So, let me say this is bad, this line, 1600 01:08:57,816 --> 01:08:59,356 y gets X. This is legitimate. 1601 01:08:59,866 --> 01:09:03,596 Right? If I want to store in y the same thing that's in X, 1602 01:09:04,346 --> 01:09:06,686 that's just like drawing what on the board here? 1603 01:09:07,716 --> 01:09:09,186 An arrow to the same thing, right? 1604 01:09:09,246 --> 01:09:10,986 We played that game last time, right? 1605 01:09:11,136 --> 01:09:12,606 y gets the same value of x 1606 01:09:12,606 --> 01:09:15,456 so pictorially points the same place and so now 1607 01:09:15,456 --> 01:09:19,836 with the last line of code what if I do this, *y gets 13? 1608 01:09:19,836 --> 01:09:21,006 How does the picture change? 1609 01:09:23,266 --> 01:09:28,086 The 42 becomes a 13 and just as we did for the incorrect copy, 1610 01:09:28,376 --> 01:09:32,446 now both what x is pointing at and what y is pointing at has, 1611 01:09:32,446 --> 01:09:33,496 in fact, changed in memory. 1612 01:09:33,496 --> 01:09:37,486 It becomes a 13 and that is consistent with the code 1613 01:09:37,956 --> 01:09:38,966 and it also doesn't crash 1614 01:09:39,206 --> 01:09:41,276 because now y has been assigned to the correct value. 1615 01:09:41,276 --> 01:09:44,016 So this we would want to delete in order for the code not 1616 01:09:44,016 --> 01:09:47,876 to run the risk of crashing, but let's now see this was made 1617 01:09:48,706 --> 01:09:52,026 by an excellent teacher out at Stanford University. 1618 01:09:52,266 --> 01:09:55,446 It's kind of one of these things gets passed around quite a bit. 1619 01:09:55,836 --> 01:09:59,126 Allow me, it's about 2, 3 minutes here to introduce you 1620 01:09:59,126 --> 01:10:00,726 to someone called Binky. 1621 01:10:01,666 --> 01:10:03,436 Though this might look silly and it is, in fact, 1622 01:10:03,436 --> 01:10:04,636 Play-doh it is a professor 1623 01:10:04,636 --> 01:10:06,146 at Stanford University so it's legit. 1624 01:10:06,636 --> 01:10:07,976 All right. 1625 01:10:08,016 --> 01:10:08,083 [ Video playing ] 1626 01:10:08,083 --> 01:10:10,396 >> Hey, Binky. 1627 01:10:10,396 --> 01:10:11,266 Wake up. It's. 1628 01:10:11,486 --> 01:10:12,696 >> David: And that is the professor's voice. 1629 01:10:12,696 --> 01:10:12,856 [Laughter] 1630 01:10:12,856 --> 01:10:14,486 >> Time for pointer fun. 1631 01:10:15,496 --> 01:10:16,736 >> What's that? 1632 01:10:16,736 --> 01:10:18,246 Learn about pointers? 1633 01:10:18,286 --> 01:10:20,086 Oh, goody. 1634 01:10:20,086 --> 01:10:21,476 >> Well, to get started I guess we're going 1635 01:10:21,476 --> 01:10:22,326 to need a couple of pointers. 1636 01:10:23,066 --> 01:10:26,496 >> Okay. This code allocates two pointers 1637 01:10:26,496 --> 01:10:29,936 which can point to integers. 1638 01:10:29,936 --> 01:10:34,596 >> Okay. Well, I see the two pointers, but they don't seem 1639 01:10:34,596 --> 01:10:36,956 to be pointing to anything. 1640 01:10:36,956 --> 01:10:39,286 >> That's right. 1641 01:10:40,466 --> 01:10:46,566 Initially pointers don't point to anything. 1642 01:10:46,566 --> 01:10:48,486 The things they point to are called pointees 1643 01:10:48,486 --> 01:10:55,356 and setting them is a separate step. 1644 01:10:55,356 --> 01:10:56,276 >> Oh, right, right. 1645 01:10:56,276 --> 01:10:56,706 I knew that. 1646 01:10:57,116 --> 01:10:58,106 The pointees are separate. 1647 01:10:58,106 --> 01:11:00,626 So, how do you allocate a pointee? 1648 01:11:00,626 --> 01:11:06,326 >> Okay. Well, this code allocates a new integer pointee 1649 01:11:07,026 --> 01:11:12,366 and this part sets x to point to it. 1650 01:11:12,416 --> 01:11:15,636 >> Hey, that looks better. 1651 01:11:15,636 --> 01:11:17,426 So, make it do something. 1652 01:11:18,286 --> 01:11:23,396 >> Okay. I'll dereference the pointer x 1653 01:11:23,526 --> 01:11:28,516 to store the number 42 into its pointee. 1654 01:11:28,706 --> 01:11:35,836 For this trick I'll need my magic wand of dereferencing. 1655 01:11:35,836 --> 01:11:39,596 >> Your magic wand of dereferencing? 1656 01:11:39,956 --> 01:11:41,006 That's great. 1657 01:11:41,396 --> 01:11:43,326 >> This is what the code looks like. 1658 01:11:43,576 --> 01:11:46,106 I'll just set up the number and. 1659 01:11:46,106 --> 01:11:50,046 >> Hey, look, there it goes. 1660 01:11:50,186 --> 01:11:53,536 So, doing a dereference on x follows the arrow 1661 01:11:53,646 --> 01:11:55,786 to access its pointee. 1662 01:11:56,146 --> 01:12:01,586 In this case, to store 42 in there. 1663 01:12:01,586 --> 01:12:06,406 Hey, try using it to store the number 13 1664 01:12:06,886 --> 01:12:08,196 through the other pointer Y. 1665 01:12:08,196 --> 01:12:12,976 >> Okay. I'll just go over here to y and get the number 13 set 1666 01:12:13,826 --> 01:12:18,716 up and then take the wand of dereferencing and just, whoa. 1667 01:12:18,956 --> 01:12:19,576 >> Oh, hey. 1668 01:12:19,896 --> 01:12:21,886 That didn't work. 1669 01:12:22,016 --> 01:12:30,926 Say, Binky, I don't think dereferencing y is a good idea 1670 01:12:30,926 --> 01:12:37,696 because, you know, setting up the pointee is a separate step 1671 01:12:37,696 --> 01:12:41,276 and I don't think we ever did it. 1672 01:12:41,276 --> 01:12:41,706 >> Good point. 1673 01:12:41,706 --> 01:12:43,886 >> Yeah. We allocated the pointer y but we never set it 1674 01:12:43,916 --> 01:12:44,456 to point to a pointee. 1675 01:12:44,486 --> 01:12:44,876 >> Very observant. 1676 01:12:44,906 --> 01:12:45,686 >> Hey, you're looking good there, Binky. 1677 01:12:45,716 --> 01:12:47,306 Can you fix it so that y points to the same pointee as X? 1678 01:12:47,336 --> 01:12:48,596 >> Sure. I'll use my magic wand of pointer assignment. 1679 01:12:48,626 --> 01:12:49,736 >> Is that going to be a problem like before? 1680 01:12:49,766 --> 01:12:50,576 >> No, this doesn't touch the pointees. 1681 01:12:50,606 --> 01:12:51,566 It just changes one pointer to point 1682 01:12:51,596 --> 01:12:52,256 to the same thing as another. 1683 01:12:52,286 --> 01:12:52,466 >> Oh, I see. 1684 01:12:52,496 --> 01:12:54,146 Now, y points to the same place as X. So, wait now y is fixed. 1685 01:12:54,176 --> 01:12:54,596 It has a pointee. 1686 01:12:54,626 --> 01:12:55,256 So you can try the wand 1687 01:12:55,286 --> 01:12:56,516 of dereferencing again to send the 13 over. 1688 01:12:56,546 --> 01:12:56,966 >> Okay. Here it goes. 1689 01:12:56,996 --> 01:12:57,386 >> Hey, look at that. 1690 01:12:57,416 --> 01:12:58,196 Now, dereferencing works on y 1691 01:12:58,226 --> 01:12:59,066 and because the pointers are sharing 1692 01:12:59,096 --> 01:13:00,056 that one pointee, they both see the 13. 1693 01:13:00,086 --> 01:13:00,476 >> Yeah, sure, whatever. 1694 01:13:00,506 --> 01:13:01,436 So, are we going to switch places now? 1695 01:13:01,466 --> 01:13:02,036 >> Oh, look, we're out of time. 1696 01:13:02,066 --> 01:13:02,156 >> But. 1697 01:13:02,186 --> 01:13:03,086 >> Just remember the three pointer rules. 1698 01:13:03,116 --> 01:13:04,556 Number one, the basic structure is that you have a pointer 1699 01:13:04,586 --> 01:13:05,786 and it points over to a pointee, but the pointer 1700 01:13:05,816 --> 01:13:07,196 and pointee are separate and the common error is to set 1701 01:13:07,226 --> 01:13:08,456 up a pointer but to forget to give it a pointee. 1702 01:13:08,486 --> 01:13:09,506 Number two, pointer dereferencing starts 1703 01:13:09,536 --> 01:13:10,406 at the pointer and follows its arrow 1704 01:13:10,436 --> 01:13:11,096 over to access its pointee. 1705 01:13:11,126 --> 01:13:12,416 As we all know, this only works if there is a pointee, 1706 01:13:12,446 --> 01:13:13,556 which kind of gets back to rule number one. 1707 01:13:13,586 --> 01:13:14,756 Number three, pointer assignment, takes one pointer 1708 01:13:14,786 --> 01:13:16,346 and changes it to point to the same pointee as another pointer 1709 01:13:16,376 --> 01:13:17,606 so after the assignment the two pointers will point 1710 01:13:17,636 --> 01:13:18,086 to the same pointee. 1711 01:13:18,116 --> 01:13:18,806 Sometimes that's called sharing 1712 01:13:18,836 --> 01:13:19,706 and that's all there is to it, really. 1713 01:13:19,736 --> 01:13:19,976 Bye-bye now. 1714 01:13:20,226 --> 01:13:21,466 >> David: So this is just the beginning 1715 01:13:21,466 --> 01:13:23,086 of this new power you now have. 1716 01:13:23,086 --> 01:13:24,786 We will return on Monday and do 1717 01:13:24,786 --> 01:13:26,296 yet more powerful things with this. 1718 01:13:26,376 --> 01:13:28,346 See you then.