1 00:00:11,056 --> 00:00:11,256 >> All right. 2 00:00:11,256 --> 00:00:12,396 Welcome back to CS50. 3 00:00:12,456 --> 00:00:18,156 This is the start of Week 5 and this is Recursion, 4 00:00:18,286 --> 00:00:19,836 couldn't miss the opportunity here. 5 00:00:20,126 --> 00:00:24,226 So this fancy new technology here will hopefully empower some 6 00:00:24,226 --> 00:00:25,916 of you to actually see the blackboard 7 00:00:25,916 --> 00:00:27,706 if we actually sketch things over there 8 00:00:27,786 --> 00:00:29,776 so more on that to come. 9 00:00:29,876 --> 00:00:34,226 So, this might perhaps be the simplest Sudoku board 10 00:00:34,226 --> 00:00:35,206 that you might solve. 11 00:00:35,206 --> 00:00:38,656 For those unfamiliar, Sudoku is all about completing the rows, 12 00:00:38,656 --> 00:00:41,696 columns and squares within a little board like this 13 00:00:41,996 --> 00:00:44,506 but if nothing else, perhaps this little joke here makes a 14 00:00:44,506 --> 00:00:45,676 bit more sense now. 15 00:00:45,996 --> 00:00:49,856 So, this week and next and the week after, we introduce one 16 00:00:49,856 --> 00:00:51,876 of our newest problem domains. 17 00:00:51,876 --> 00:00:53,296 So the next Problem Set No. 18 00:00:53,296 --> 00:00:56,186 5 is actually going to focus on the world of forensics 19 00:00:56,376 --> 00:00:58,216 and the world of forensics is all 20 00:00:58,216 --> 00:01:01,766 about recovering data that's been accidentally deleted 21 00:01:01,766 --> 00:01:04,566 or intentionally deleted or generally trying 22 00:01:04,766 --> 00:01:07,616 to answer questions that someone else does not want you 23 00:01:07,616 --> 00:01:08,766 to have the answer to. 24 00:01:09,036 --> 00:01:11,306 One of my favorite jobs to be honest years ago 25 00:01:11,306 --> 00:01:14,236 in grad school was an internship I started over the summer 26 00:01:14,236 --> 00:01:16,456 which was working for the Middlesex District Attorney's 27 00:01:16,456 --> 00:01:19,226 Office working for their forensic investigator 28 00:01:19,226 --> 00:01:21,696 and learning a lot about how this stuff works. 29 00:01:22,096 --> 00:01:24,946 In a nutshell what would happen is the local Mass State Police 30 00:01:24,946 --> 00:01:27,856 would drop off things like hard drives and USB sticks, 31 00:01:28,216 --> 00:01:30,386 sometimes keyboards and mice and monitors, 32 00:01:30,386 --> 00:01:32,386 which are actually forensically not all that useful, 33 00:01:32,556 --> 00:01:33,466 but we got everything 34 00:01:33,466 --> 00:01:36,716 from actual suspect's homes or offices. 35 00:01:36,876 --> 00:01:40,856 Our job was to answer questions of the form did he or she do it. 36 00:01:40,856 --> 00:01:43,976 Now quite often the answer was unclear 37 00:01:43,976 --> 00:01:47,726 because there isn't necessarily a smoking gun digitally 38 00:01:47,726 --> 00:01:49,346 on a computer but I will say 39 00:01:49,346 --> 00:01:51,996 within Middlesex County sometimes the answer was as easy 40 00:01:51,996 --> 00:01:55,306 as double-clicking a secret folder called My Documents 41 00:01:55,306 --> 00:01:58,386 inside of which evidence could quite often be found 42 00:01:58,486 --> 00:02:01,396 but more technically interesting was when we would have 43 00:02:01,526 --> 00:02:04,386 to actually recover data whether it was images 44 00:02:04,386 --> 00:02:06,006 that had been deleted intentionally 45 00:02:06,006 --> 00:02:06,976 or from a browser's cache, 46 00:02:07,596 --> 00:02:09,246 whether it was Excel spreadsheets 47 00:02:09,246 --> 00:02:12,496 with financial information that needed to be deleted or emails 48 00:02:12,496 --> 00:02:15,666 that had been sent or issuing subpoena's to local ISP's 49 00:02:15,666 --> 00:02:19,676 like Yahoo and Comcast and these folks to actually get records. 50 00:02:19,676 --> 00:02:22,206 It's fascinating and it's also terrifying to be honest. 51 00:02:22,206 --> 00:02:25,566 Increasingly just how many remnants we all leave digitally 52 00:02:25,566 --> 00:02:28,116 when we so much as check our email these days 53 00:02:28,116 --> 00:02:30,166 and we'll talk more about this towards semester's end 54 00:02:30,376 --> 00:02:33,356 in the context of the internet but this week we'll begin 55 00:02:33,356 --> 00:02:36,346 to focus on the lower level details of your own laptops 56 00:02:36,346 --> 00:02:38,576 and your own USB sticks and external hard drives 57 00:02:38,836 --> 00:02:41,586 as to just how much data is actually there even 58 00:02:41,586 --> 00:02:44,546 when you think you have deleted said information. 59 00:02:44,746 --> 00:02:48,076 Now with that said, shows like this have glorified the sexiness 60 00:02:48,076 --> 00:02:51,726 of doing forensic investigations and they've also conveyed 61 00:02:51,726 --> 00:02:54,896 to the population that we can do some amazing things 62 00:02:54,896 --> 00:02:55,566 with computers. 63 00:02:55,836 --> 00:02:57,156 So that's not always the case. 64 00:02:57,156 --> 00:02:59,946 For instance if you've ever been watching a TV show, 65 00:02:59,946 --> 00:03:02,576 something like Law and Order and the like and the cop's leaning 66 00:03:02,576 --> 00:03:05,126 over the shoulder of the computer technician 67 00:03:05,126 --> 00:03:08,206 and here she says something like can you enhance that? 68 00:03:08,206 --> 00:03:10,756 No, you can't usually enhance that whether 69 00:03:10,756 --> 00:03:13,596 that is a license plate or some suspect's clothes 70 00:03:13,596 --> 00:03:14,196 that they're wearing. 71 00:03:14,446 --> 00:03:15,656 We don't have infinite zoom. 72 00:03:15,656 --> 00:03:17,326 If you've taken a photo with your camera 73 00:03:17,326 --> 00:03:19,456 or with your digital camera and you start zooming 74 00:03:19,456 --> 00:03:21,966 in you typically start seeing things get pretty pixilated 75 00:03:21,966 --> 00:03:22,776 and pretty splotchy. 76 00:03:23,166 --> 00:03:26,386 So for the most part that's what the cops would also seize. 77 00:03:26,386 --> 00:03:28,856 So we'll also try to debunk some of these myths 78 00:03:28,856 --> 00:03:31,686 that are propagated by shows like this. 79 00:03:31,876 --> 00:03:35,256 Also too hopefully after taking this semester we're going 80 00:03:35,256 --> 00:03:37,626 to kind of ruin accidentally a whole bunch of movies for you 81 00:03:37,626 --> 00:03:40,516 because you're going to start noticing in movies and also TV 82 00:03:40,686 --> 00:03:42,046 that there's just crazy talk sometimes. 83 00:03:42,046 --> 00:03:44,246 All right, they'll spend a million dollars producing some 84 00:03:44,246 --> 00:03:48,506 TV show and not so much as ask a computer oriented intern what a 85 00:03:48,506 --> 00:03:51,106 technically accurate thing to say would be. 86 00:03:51,106 --> 00:03:54,036 You've all probably seen the movie Jurassic Park. 87 00:03:54,036 --> 00:03:57,916 Famous clip that you might not have noticed at the time but is 88 00:03:58,576 --> 00:04:01,976 with us on YouTube this day was this clip here. 89 00:04:02,516 --> 00:04:20,216 [ Movie clip ] 90 00:04:20,716 --> 00:04:23,246 >> Okay so first of all it's a UNIX system. 91 00:04:23,246 --> 00:04:25,316 It's essentially the same as it's a Mac 92 00:04:25,796 --> 00:04:28,456 or something ridiculous like hat and the program 93 00:04:28,456 --> 00:04:32,266 that she then proceeded to use with all those little squares 94 00:04:32,266 --> 00:04:35,086 and the rectangles that you saw depicted is apparently just a 95 00:04:35,086 --> 00:04:38,536 graphical file browser whereby your folders are depicted 96 00:04:38,536 --> 00:04:40,546 as cubes or squares on the screen. 97 00:04:40,586 --> 00:04:42,916 So these are essentially things a person would 98 00:04:42,916 --> 00:04:43,706 have double-clicked. 99 00:04:43,706 --> 00:04:49,866 Another one that we're fans of, this one related to CSI here. 100 00:04:50,516 --> 00:04:53,126 We haven't used all of this jargon here but it too is a bit 101 00:04:53,126 --> 00:04:54,446 of crazy talk by the writers. 102 00:04:55,516 --> 00:05:08,686 [ TV clip ] 103 00:05:09,186 --> 00:05:10,616 >> Okay that means nothing, like. 104 00:05:10,616 --> 00:05:14,186 I will create a graphical user interface using an old dated 105 00:05:14,186 --> 00:05:16,256 language to track an IP address. 106 00:05:16,256 --> 00:05:18,996 It's not the right technical solution to this problem. 107 00:05:20,446 --> 00:05:23,416 We'll give you let's see. 108 00:05:23,416 --> 00:05:26,386 One more. One more and this one I'm looking forward 109 00:05:26,386 --> 00:05:29,586 to using this spring in our iPhone programming class 110 00:05:29,586 --> 00:05:31,366 for reasons that will become clear in a moment. 111 00:05:32,516 --> 00:05:39,586 [ Video ] 112 00:05:40,086 --> 00:06:05,016 >> Now let me rewind a split second here and go back 113 00:06:05,326 --> 00:06:07,536 to this one in particular. 114 00:06:07,616 --> 00:06:14,666 Again not a language we ourselves will be using 115 00:06:14,716 --> 00:06:17,636 but we'll be using in CS164 this spring. 116 00:06:18,126 --> 00:06:22,266 Fast forward and so this is a programming language 117 00:06:22,266 --> 00:06:25,186 which is called Objective C used these days to program 118 00:06:25,186 --> 00:06:26,636 on the iPhone as well as on the Mac. 119 00:06:26,986 --> 00:06:29,326 Pretty sure whatever this hacker was doing has nothing to do 120 00:06:29,326 --> 00:06:32,726 with crayons as suggested down here in the source code. 121 00:06:32,976 --> 00:06:36,296 If you also look through this what this code is actually 122 00:06:36,296 --> 00:06:38,336 implementing is something completely benign. 123 00:06:38,386 --> 00:06:42,736 It's like a little toggle switch on an iPhone or an iPod Touch 124 00:06:42,736 --> 00:06:43,826 where you touch the little blue thing 125 00:06:43,826 --> 00:06:44,856 and it moves left to right. 126 00:06:44,856 --> 00:06:46,896 So there's a whole lot of Easter eggs ahead of you 127 00:06:46,896 --> 00:06:50,756 if you start pausing your TiVo or Hulu online but hopefully one 128 00:06:50,756 --> 00:06:51,696 of the goals of this course, 129 00:06:52,016 --> 00:06:56,076 not just to ruin certain popular media for you but really just 130 00:06:56,076 --> 00:06:57,256 to make sure after a course 131 00:06:57,286 --> 00:07:00,536 like this you're actually making educated and informed decisions 132 00:07:00,536 --> 00:07:02,446 when it actually comes to these technologies. 133 00:07:02,576 --> 00:07:06,696 So, without further ado Problem Set 4 this week challenges you 134 00:07:06,876 --> 00:07:08,056 with an even larger set 135 00:07:08,056 --> 00:07:10,796 of distribution code whereas last week you did get a bit 136 00:07:10,796 --> 00:07:12,686 of distribution code for Game of 15. 137 00:07:12,926 --> 00:07:16,596 This week we hand you more so the goal just like last week is 138 00:07:16,626 --> 00:07:19,706 to first understand code that someone else has written 139 00:07:19,706 --> 00:07:21,566 and this will be incredibly common. 140 00:07:21,566 --> 00:07:23,766 Not if you go off and necessarily program 141 00:07:24,066 --> 00:07:28,976 with for a company or with some partner working on some project 142 00:07:29,186 --> 00:07:32,066 but even if for your own final projects or side projects 143 00:07:32,066 --> 00:07:33,716 that you might want to implement you need 144 00:07:33,716 --> 00:07:34,926 to use something called an API, 145 00:07:35,266 --> 00:07:36,996 an application programming interface. 146 00:07:36,996 --> 00:07:39,636 Quite common at the end of the semester is not for you guys 147 00:07:39,636 --> 00:07:42,586 to implement your entire projects from scratch but rather 148 00:07:42,586 --> 00:07:44,516 to stand on the shoulders of other people 149 00:07:44,516 --> 00:07:46,426 who have written code that you can incorporate 150 00:07:46,546 --> 00:07:50,306 into your own programs so that your product is actually much 151 00:07:50,306 --> 00:07:52,056 more glamorous that it might be if you had 152 00:07:52,056 --> 00:07:53,986 to do absolutely everything from scratch. 153 00:07:54,196 --> 00:07:56,876 So we've provided you with a framework for the game of Sudoku 154 00:07:57,076 --> 00:07:58,356 out of the box when you run it, 155 00:07:58,356 --> 00:07:59,696 it looks a little something like this. 156 00:07:59,696 --> 00:08:01,196 Unfortunately it doesn't do anything 157 00:08:01,196 --> 00:08:02,586 if you hit up, down, left and right. 158 00:08:02,816 --> 00:08:06,196 That green cursor doesn't actually move but once you begin 159 00:08:06,196 --> 00:08:07,666 to fill in some of the blanks 160 00:08:07,666 --> 00:08:09,986 and this time we're a little less explicit as to 161 00:08:09,986 --> 00:08:11,136 where these blanks are. 162 00:08:11,136 --> 00:08:12,966 We don't say to do, to do, to do. 163 00:08:13,136 --> 00:08:15,926 We instead leave it to you to determine both on your own 164 00:08:15,926 --> 00:08:18,296 and with the help of the walkthrough where you need 165 00:08:18,296 --> 00:08:22,006 to start augmenting the code in order to add things like support 166 00:08:22,136 --> 00:08:25,256 for the cursor up, down, left, right and also detecting 167 00:08:25,256 --> 00:08:27,796 when the game has actually been won. 168 00:08:28,096 --> 00:08:30,516 So there's also the hacker edition this week whereas much 169 00:08:30,516 --> 00:08:32,696 like God mode last week in Game of 15, 170 00:08:32,846 --> 00:08:34,606 this week the hackers will have 171 00:08:34,606 --> 00:08:38,216 to implement a hint mode whereby hitting H will begin 172 00:08:38,216 --> 00:08:40,926 to spoil the fun of the game and hitting H again and again 173 00:08:40,926 --> 00:08:42,786 and again will start to reveal one 174 00:08:42,786 --> 00:08:46,366 after the other correctly positioned numbers on the board. 175 00:08:46,366 --> 00:08:47,936 So this is going to require that those 176 00:08:47,936 --> 00:08:50,356 of you tackling the hacker edition essentially figure 177 00:08:50,356 --> 00:08:53,446 out how to solve Sudoku and then reveal one 178 00:08:53,446 --> 00:08:57,856 at a time a potential number to the user that is playing. 179 00:08:57,856 --> 00:08:59,436 So that is Sudoku. 180 00:08:59,936 --> 00:09:02,516 So I'm wearing a little CS50 apparel here 181 00:09:02,516 --> 00:09:04,396 as you may have seen some of the course staff wear. 182 00:09:04,616 --> 00:09:07,376 So the course says there's this tradition of designing 183 00:09:07,376 --> 00:09:10,026 by students and staff alike fun designs that then 184 00:09:10,026 --> 00:09:13,036 in theory can then be sported at semesters end 185 00:09:13,036 --> 00:09:14,636 so that you don't need to just say it 186 00:09:14,636 --> 00:09:17,786 to your friends I took CS50, you can literally wear CS50. 187 00:09:17,786 --> 00:09:20,066 So you'll see in the problem set this week 188 00:09:20,066 --> 00:09:22,206 that you are cordially invited to participate 189 00:09:22,206 --> 00:09:23,436 in this year's design if you would 190 00:09:23,436 --> 00:09:25,786 like to contribute per those instructions there. 191 00:09:26,126 --> 00:09:30,426 And so at this point we have these pink forms here 192 00:09:30,426 --> 00:09:32,026 which is the pass fail forms. 193 00:09:32,086 --> 00:09:36,266 If you have any doubts as to whether or not you remain 194 00:09:36,266 --> 00:09:37,146 in the course, whether 195 00:09:37,146 --> 00:09:40,606 or not you're a little increasingly unsure, whether 196 00:09:40,606 --> 00:09:45,036 or not you can actually, whether or not this is something you can 197 00:09:45,036 --> 00:09:47,486 in fact succeed at realize that the aim of pass fail, 198 00:09:47,486 --> 00:09:49,266 which is wholeheartedly encouraged by the course 199 00:09:49,546 --> 00:09:51,736 in an opportunity really to take the edge off 200 00:09:51,736 --> 00:09:52,616 of a course like this. 201 00:09:52,616 --> 00:09:55,076 So that if you do find yourself and you have found yourself 202 00:09:55,356 --> 00:09:58,016 over the past few weeks making it to some 1 a.m., 203 00:09:58,016 --> 00:09:59,966 2 a.m. the night before as some P sets do 204 00:10:00,186 --> 00:10:01,646 and you're feeling pretty good about it, 205 00:10:01,846 --> 00:10:03,256 but damn there's just some bug, 206 00:10:03,256 --> 00:10:05,796 some bugs that you can't quite chase down, one of the aims 207 00:10:05,796 --> 00:10:08,426 in our mind pass fail is to allow you to say you know what. 208 00:10:08,426 --> 00:10:10,926 I'm 95, 99 percent of the way there. 209 00:10:11,136 --> 00:10:11,696 That's fine. 210 00:10:11,696 --> 00:10:12,946 We're going to move on from there. 211 00:10:12,946 --> 00:10:14,756 So realize I'm happy to still sign these 212 00:10:15,036 --> 00:10:16,956 and I thought I would show you just two things 213 00:10:16,956 --> 00:10:20,016 and this might only be of interest to me sort 214 00:10:20,016 --> 00:10:22,466 of nostalgically but this was the very first sheet of notes 215 00:10:22,466 --> 00:10:24,746 that I wrote on my very first day of CS50. 216 00:10:24,816 --> 00:10:28,536 A very famous computer scientist named Brian Kernighan was there 217 00:10:28,536 --> 00:10:31,536 that year teaching us and I was just kind of amused 218 00:10:31,586 --> 00:10:34,756 to see 1) this was like the only day I took notes apparently I 219 00:10:34,756 --> 00:10:37,786 very quickly became one of those students but I was just kind 220 00:10:37,786 --> 00:10:40,056 of amused to see that even 221 00:10:40,056 --> 00:10:42,456 for me years ago this stuff wasn't completely new. 222 00:10:42,456 --> 00:10:43,476 What is an algorithm? 223 00:10:43,476 --> 00:10:44,706 I actually wrote it down. 224 00:10:44,856 --> 00:10:47,226 Some precise sequence of steps for getting something done. 225 00:10:47,446 --> 00:10:49,486 Apparently my takeaway from Lecture 1 226 00:10:49,486 --> 00:10:50,776 that year was precision 227 00:10:50,776 --> 00:10:52,556 and correctness were really important. 228 00:10:52,986 --> 00:10:56,766 I was really in the exclamation point phase here where I scroll. 229 00:10:56,766 --> 00:10:57,086 Let's see. 230 00:10:57,086 --> 00:10:57,886 My last page of notes 231 00:10:57,886 --> 00:10:59,816 in my first section were three and four. 232 00:11:00,066 --> 00:11:03,136 So these are the top three things apparently 233 00:11:03,136 --> 00:11:05,986 that I took away from the first weeks of CS50 but this is 234 00:11:05,986 --> 00:11:08,266 to say too like all of this was new to me 235 00:11:08,266 --> 00:11:10,326 when I first dove in years ago. 236 00:11:10,326 --> 00:11:12,916 So realize you remain in very good company. 237 00:11:12,916 --> 00:11:14,666 Now one of your classmates, last clip for now, 238 00:11:14,936 --> 00:11:16,896 one of your classmates actually sent me this link 239 00:11:16,896 --> 00:11:20,566 after he had just finished P set 3 with great delight 240 00:11:20,566 --> 00:11:23,026 and it's perhaps representative of the emotions some 241 00:11:23,026 --> 00:11:24,286 of you have felt quite a bit. 242 00:11:25,516 --> 00:11:30,726 [ Video ] 243 00:11:31,226 --> 00:11:31,726 All right. 244 00:11:32,516 --> 00:11:36,546 [ Applause ] 245 00:11:37,046 --> 00:11:40,456 >> So this was bad. 246 00:11:40,836 --> 00:11:43,486 So this implementation of swap though 247 00:11:43,486 --> 00:11:45,376 at first glance seemingly correct 248 00:11:45,376 --> 00:11:48,196 in that it does swap two inputs A and B, 249 00:11:48,406 --> 00:11:50,136 we then emphasized multiple times 250 00:11:50,136 --> 00:11:51,806 that this is actually fundamentally flawed 251 00:11:51,806 --> 00:11:54,376 because as soon as the swap function returns and you hit 252 00:11:54,376 --> 00:11:56,426 that curly brace well that's it. 253 00:11:56,516 --> 00:11:59,236 All the work that you did with A and B and temp 254 00:11:59,236 --> 00:12:02,186 where we did last week with milk and orange juice, it's just kind 255 00:12:02,186 --> 00:12:04,826 of lost because the values are not actually returned. 256 00:12:05,026 --> 00:12:08,326 Now one of the problems in C is how many values maximally can 257 00:12:08,326 --> 00:12:09,526 you return from a function? 258 00:12:10,896 --> 00:12:11,926 So just one right. 259 00:12:11,926 --> 00:12:13,856 You can return nothing which is the case of the void 260 00:12:14,066 --> 00:12:18,046 or you can return an ENT or HR or a pointer to a string 261 00:12:18,046 --> 00:12:20,846 as we'll see but you can really only return one thing. 262 00:12:21,026 --> 00:12:25,086 Now you can actually return multiple things if you return 263 00:12:25,086 --> 00:12:28,006 like a two-element array or a three-element array 264 00:12:28,246 --> 00:12:29,806 or you return this thing called a pointer 265 00:12:29,806 --> 00:12:32,256 but these are not elegant solutions. 266 00:12:32,256 --> 00:12:35,156 So in the world of C typically if you want a function 267 00:12:35,156 --> 00:12:38,906 to take input and produce multiple outputs what you do is 268 00:12:38,906 --> 00:12:42,626 you do not pass in those inputs by value, by copy. 269 00:12:42,886 --> 00:12:44,176 In other words you don't do this. 270 00:12:44,556 --> 00:12:47,356 Instead what you do is the promised solution of last week 271 00:12:47,356 --> 00:12:50,086 of you do this ever so slightly different 272 00:12:50,086 --> 00:12:51,526 but those stars before the A 273 00:12:51,526 --> 00:12:53,966 and the B make all of the difference. 274 00:12:54,046 --> 00:12:57,616 This notation Star A and Star B mean don't pass A 275 00:12:57,616 --> 00:12:59,596 and B in by value. 276 00:12:59,596 --> 00:13:03,036 Don't pass in copies of them but rather pass them 277 00:13:03,036 --> 00:13:06,956 in by their address, by pointer. 278 00:13:07,196 --> 00:13:10,086 So pointers are every semester, for myself included back 279 00:13:10,086 --> 00:13:13,526 in the day, like very often a challenging concept but really 280 00:13:13,526 --> 00:13:16,036 if you just reduce it consistently in your mind 281 00:13:16,306 --> 00:13:17,686 to just the basic definition. 282 00:13:17,686 --> 00:13:18,766 A pointer is an address 283 00:13:19,366 --> 00:13:22,856 and so Star A means give me the address of some integer. 284 00:13:23,106 --> 00:13:26,446 You can typically reason through any sort of new type of code 285 00:13:26,446 --> 00:13:28,676 that might at first glance look pretty arcane. 286 00:13:28,676 --> 00:13:32,196 So this syntax here is saying when I take in two arguments, 287 00:13:32,196 --> 00:13:35,526 let's call them A and B, they're not going to be actual integers. 288 00:13:35,766 --> 00:13:39,016 They are going to be the addresses of integers and just 289 00:13:39,016 --> 00:13:41,436 as I did over here with the milk and OJ by kind 290 00:13:41,436 --> 00:13:44,656 of writing directions on a sheet of paper and then handing those 291 00:13:44,656 --> 00:13:47,686 to the swap function in our imaginary tale here. 292 00:13:47,896 --> 00:13:49,826 Well, that's exactly what's happening here. 293 00:13:49,826 --> 00:13:52,316 You're being handed a map to the computer's memory 294 00:13:52,316 --> 00:13:54,916 and you're saying A is going to be here, B is going to be there. 295 00:13:54,916 --> 00:13:57,986 If you want to change those values, go there, change them 296 00:13:58,276 --> 00:14:02,166 and now you've actually effectively not returned two 297 00:14:02,166 --> 00:14:04,666 values but you've affected two values. 298 00:14:04,756 --> 00:14:08,046 So this is a very common paradigm and will also enable us 299 00:14:08,046 --> 00:14:09,676 to do much more interesting things 300 00:14:09,676 --> 00:14:12,446 and in fact we'll begin now to peel back even more layers. 301 00:14:12,446 --> 00:14:14,906 The CS50 library, that thing called argv 302 00:14:15,226 --> 00:14:16,436 for command line Arguments. 303 00:14:16,576 --> 00:14:18,736 We've actually been using pointers 304 00:14:18,736 --> 00:14:20,856 and memory addresses for quite some time. 305 00:14:20,856 --> 00:14:23,146 We just haven't revealed such syntax. 306 00:14:23,146 --> 00:14:24,656 So let's actually see this work. 307 00:14:24,986 --> 00:14:27,316 So I've gone ahead here and launched the CS50 appliance. 308 00:14:28,026 --> 00:14:31,036 I've opened the file called swap dot c. I'm going to go ahead 309 00:14:31,036 --> 00:14:33,856 and zoom in on what is now the correct solution 310 00:14:33,856 --> 00:14:34,616 to this problem. 311 00:14:34,616 --> 00:14:36,426 So I've got some boilerplate stuff 312 00:14:36,426 --> 00:14:39,986 up top includes standard IO dot h. I then have the swap 313 00:14:39,986 --> 00:14:43,776 prototype and now this is copy paste of what we just saw 314 00:14:43,776 --> 00:14:46,156 in the green slide but it ends with a semi colon 315 00:14:46,156 --> 00:14:49,426 and that's just again the clue saying hey swap is going 316 00:14:49,426 --> 00:14:49,906 to exist. 317 00:14:49,906 --> 00:14:51,786 It's a message to GCC, the compiler. 318 00:14:52,116 --> 00:14:52,966 So now main. 319 00:14:52,966 --> 00:14:55,186 We've seen this half a dozen times now 320 00:14:55,186 --> 00:14:57,836 where I assign X a value, Y a value 321 00:14:58,026 --> 00:14:59,536 and then I claim I'm swapping them 322 00:14:59,536 --> 00:15:01,456 but then they're never actually swapped 323 00:15:01,496 --> 00:15:04,156 but there's one more key syntactic difference this week 324 00:15:04,156 --> 00:15:05,436 that we didn't look at last week. 325 00:15:05,816 --> 00:15:08,986 You do have to change the syntax that you use 326 00:15:08,986 --> 00:15:11,476 when you actually want to pass in a variable 327 00:15:11,556 --> 00:15:13,666 or two variables by their address. 328 00:15:13,666 --> 00:15:16,256 You can't just say X. You can't just say Y. You have 329 00:15:16,256 --> 00:15:19,276 to say give me the address of X and the address of Y 330 00:15:19,276 --> 00:15:22,336 and you can perhaps infer what the relatively simple syntax 331 00:15:22,336 --> 00:15:24,296 for that is, is the single ampersand. 332 00:15:24,666 --> 00:15:28,116 By putting an ampersand before a variable's name you're telling 333 00:15:28,116 --> 00:15:32,456 the compiler go find the address of X and pass that in. 334 00:15:32,666 --> 00:15:34,176 Do not pass X in itself. 335 00:15:35,246 --> 00:15:36,566 Now just a word on types. 336 00:15:36,816 --> 00:15:38,956 So X and Y are indeed integers. 337 00:15:39,276 --> 00:15:45,186 So what is the data type of ampersand X would you say? 338 00:15:45,406 --> 00:15:46,706 How would you answer that question? 339 00:15:46,816 --> 00:15:50,596 What's the type of ampersand X? 340 00:15:51,936 --> 00:15:55,656 So it's the address of operator so it is passing an address. 341 00:15:55,826 --> 00:15:56,916 Well what is that map to? 342 00:15:57,156 --> 00:16:00,036 Well that just maps to no need to project here. 343 00:16:00,036 --> 00:16:03,156 That just maps to INT star. 344 00:16:03,436 --> 00:16:06,556 So anytime you have a pointer, pointers are meant 345 00:16:06,606 --> 00:16:07,856 to point to specific things. 346 00:16:08,256 --> 00:16:10,646 So if you want a pointer to an INT you would typically 347 00:16:10,646 --> 00:16:12,176 if you were going to write it say INT star. 348 00:16:12,176 --> 00:16:14,776 If you want a pointer to a char, you say char star. 349 00:16:14,776 --> 00:16:18,366 If you want a pointer to a double, you say double star. 350 00:16:18,456 --> 00:16:22,226 So in this case ampersand X I don't have to mention X at all. 351 00:16:22,436 --> 00:16:25,556 I don't mention star because X already exists at this point 352 00:16:25,556 --> 00:16:27,746 in the story, in this highlighted line that's called 353 00:16:27,746 --> 00:16:28,116 a swap. 354 00:16:28,116 --> 00:16:32,086 All I want to do is tell GCC X whatever data type it is, 355 00:16:32,086 --> 00:16:35,826 happens to be an INT, get its address and pass that in. 356 00:16:35,826 --> 00:16:37,106 So now what does swap look like? 357 00:16:37,146 --> 00:16:39,176 It looks exactly like as was promised. 358 00:16:39,176 --> 00:16:43,866 In the swap function we say INT Star A, INT Star B. So now A 359 00:16:43,866 --> 00:16:46,426 and B are effectively local variables. 360 00:16:46,426 --> 00:16:48,496 They're parameters but they're local to swap. 361 00:16:48,496 --> 00:16:50,736 They only exist at this point in the story. 362 00:16:51,076 --> 00:16:53,786 What gets stored to be clear in A? 363 00:16:54,976 --> 00:16:57,956 So the address of X and what gets stored in B? 364 00:16:57,956 --> 00:17:01,716 The address of Y. So now we can't implement swap 365 00:17:01,716 --> 00:17:03,366 in precisely the same way we used 366 00:17:03,366 --> 00:17:07,056 to because notice what would immediately be bad about this. 367 00:17:07,056 --> 00:17:09,576 If I go in and manually delete all of these stars 368 00:17:09,576 --> 00:17:12,316 for just a moment, if I leave it alone 369 00:17:12,316 --> 00:17:16,446 as this what am I actually doing with the swap function now? 370 00:17:16,576 --> 00:17:17,346 What am I swapping? 371 00:17:17,966 --> 00:17:21,306 So I'm just swapping their addresses 372 00:17:21,306 --> 00:17:22,576 but that's not what I want. 373 00:17:22,576 --> 00:17:26,616 I want to go to whatever A is pointing at which is X. I want 374 00:17:26,616 --> 00:17:30,446 to go to whatever B is pointing at and that's Y. I want 375 00:17:30,446 --> 00:17:32,856 to change those values and then that's it. 376 00:17:33,006 --> 00:17:34,526 It would be completely meaningless just 377 00:17:34,526 --> 00:17:36,936 to change the addresses because at the end of the day 378 00:17:36,936 --> 00:17:38,726 if you put something at Address 1, 2, 3, 379 00:17:38,726 --> 00:17:40,306 4, 5 that's where it is. 380 00:17:40,476 --> 00:17:44,766 If you put something else Y at 4, 5, 6, 7 that's where it is. 381 00:17:44,766 --> 00:17:46,416 If you want to change those values you have 382 00:17:46,416 --> 00:17:47,866 to go to those locations. 383 00:17:47,866 --> 00:17:50,076 You can't just change those locations themselves, 384 00:17:50,076 --> 00:17:51,136 the addresses themselves. 385 00:17:51,566 --> 00:17:53,626 So what does this mean then if we walk 386 00:17:53,626 --> 00:17:55,416 through these three final lines? 387 00:17:56,166 --> 00:18:00,026 So the left-hand side here declares an integer called temp 388 00:18:00,026 --> 00:18:01,256 and that's how many bits? 389 00:18:01,476 --> 00:18:02,986 Just a quick sanity check. 390 00:18:03,616 --> 00:18:05,556 So it's 32 bits in the appliance. 391 00:18:05,556 --> 00:18:08,236 Could be 64 on more modern hardware but in the appliance 392 00:18:08,236 --> 00:18:09,966 and in a lot of computers 32 bits. 393 00:18:10,136 --> 00:18:12,156 Now Star A means what in English? 394 00:18:12,836 --> 00:18:19,636 Okay the location of A but not quite the location of A. Star A, 395 00:18:19,636 --> 00:18:22,366 so the star in this case is the dereference operator 396 00:18:22,366 --> 00:18:26,356 so Star A means go to that address and find what's there. 397 00:18:26,826 --> 00:18:27,856 So a quick sanity check. 398 00:18:27,856 --> 00:18:30,826 If I hand you the address and it's called A and I say go 399 00:18:30,826 --> 00:18:32,206 to this address what are you going to find 400 00:18:32,206 --> 00:18:33,246 when you show up at that address? 401 00:18:33,806 --> 00:18:33,976 >> X 402 00:18:34,406 --> 00:18:35,136 >> X, the No. 403 00:18:35,246 --> 00:18:39,056 1. So what is stored in temp right now is simply the No. 404 00:18:39,056 --> 00:18:41,276 1. Okay so now in the second line here. 405 00:18:41,276 --> 00:18:47,936 Star A means go to A so what's currently at A? 406 00:18:47,936 --> 00:18:48,446 >> [Inaudible comment] 407 00:18:48,446 --> 00:18:49,036 >> So the No. 408 00:18:49,036 --> 00:18:52,166 1 aka X so what am I putting there based 409 00:18:52,166 --> 00:18:54,296 on the assignment operator, based on that equal sign, 410 00:18:54,436 --> 00:18:56,866 well I'm saying go to B. So I'm going 411 00:18:56,866 --> 00:18:58,876 to B. What do I find there? 412 00:18:58,876 --> 00:18:59,776 >> [Inaudible comment] 413 00:18:59,776 --> 00:19:04,366 >> So I find 2 aka Y and so now it's saying take that value 2, 414 00:19:04,616 --> 00:19:09,046 put it at Star A which now has the effect of putting the No. 415 00:19:09,046 --> 00:19:10,046 2 where the No. 416 00:19:10,046 --> 00:19:11,146 1 once was. 417 00:19:11,146 --> 00:19:13,446 So at this point in the story there's X and there's Y 418 00:19:13,446 --> 00:19:15,886 and there's A and B respectively pointing at them 419 00:19:15,886 --> 00:19:17,986 but what two values, in this point in the story, 420 00:19:17,986 --> 00:19:20,176 are in locations X and Y? 421 00:19:20,176 --> 00:19:21,676 >> [Inaudible comment] 422 00:19:21,676 --> 00:19:22,356 >> They're both 2. 423 00:19:22,486 --> 00:19:23,136 They're both the No. 424 00:19:23,136 --> 00:19:23,866 2 at this point. 425 00:19:24,096 --> 00:19:27,196 So all we have to do now is fix the mess we just made. 426 00:19:27,196 --> 00:19:28,546 We made two copies of 2. 427 00:19:28,546 --> 00:19:29,836 We need to now put the No. 428 00:19:29,836 --> 00:19:32,816 1 back in the location Y. 429 00:19:33,086 --> 00:19:35,186 So temp you already said earlier was the No. 430 00:19:35,186 --> 00:19:36,406 1 and there's nothing new there. 431 00:19:36,406 --> 00:19:37,216 Temp is an int. 432 00:19:37,216 --> 00:19:38,026 It's as simple as that. 433 00:19:38,026 --> 00:19:39,886 It's just this number we put there which was 1. 434 00:19:40,276 --> 00:19:43,526 Star B means go to wherever B is pointing 435 00:19:43,766 --> 00:19:46,506 and that is the location we think of as? 436 00:19:47,036 --> 00:19:48,006 So that's Y right. 437 00:19:48,006 --> 00:19:51,216 Star B is Y. So what do I put at Y location? 438 00:19:52,026 --> 00:19:53,156 The No. 1. 439 00:19:53,706 --> 00:19:56,596 So that's a bit of a long tale to tell of just verbally 440 00:19:56,596 --> 00:20:00,346 but recall how we can actually see this relatively simply step 441 00:20:00,346 --> 00:20:00,816 by step. 442 00:20:00,816 --> 00:20:02,486 Let me go into my source directory. 443 00:20:02,736 --> 00:20:05,496 Let me go ahead and make swap which is the name of this file. 444 00:20:05,756 --> 00:20:08,446 Let me go ahead and run it just as a sanity check 445 00:20:08,446 --> 00:20:11,116 and it does seem to be finally all these weeks later, 446 00:20:11,116 --> 00:20:13,596 working properly but that all happens too fast 447 00:20:13,596 --> 00:20:15,906 in underneath the hood so let me go ahead instead 448 00:20:15,906 --> 00:20:19,136 and run GDB of swap and Enter. 449 00:20:19,136 --> 00:20:20,946 I'm going to go ahead and run it here. 450 00:20:20,946 --> 00:20:23,746 Okay that wasn't all that useful but it did run 451 00:20:23,746 --> 00:20:25,006 into the confines of GDB. 452 00:20:25,006 --> 00:20:27,476 How do I actually pause execution and step through this? 453 00:20:27,476 --> 00:20:29,766 So I want to actually do break it. 454 00:20:29,766 --> 00:20:32,116 Let's just break at the beginning, break main Enter. 455 00:20:32,356 --> 00:20:34,926 It's now telling me there's a breakpoint at some crazy address 456 00:20:34,926 --> 00:20:36,846 and that's where the program physically is in memory. 457 00:20:37,116 --> 00:20:40,556 In the file swap dot c Line 22 and if I actually scrolled 458 00:20:40,556 --> 00:20:43,386 up the very start of main would indeed be at Line 22. 459 00:20:43,616 --> 00:20:46,356 So now I type Run to run the program but I'm instantly going 460 00:20:46,356 --> 00:20:47,416 to hit that breakpoint. 461 00:20:47,416 --> 00:20:49,706 So now we can just step through this one step at a time. 462 00:20:50,006 --> 00:20:51,506 So INT X gets 0. 463 00:20:51,506 --> 00:20:55,736 So let me do a Print X and that's already wrong. 464 00:20:57,166 --> 00:20:57,886 Why? 465 00:20:57,886 --> 00:20:57,953 >> [Inaudible comment] 466 00:20:57,953 --> 00:20:58,796 >> Yeah. 467 00:20:59,926 --> 00:21:02,776 >> I think it's the address. 468 00:21:02,776 --> 00:21:04,786 >> It's the address so it's actually not the address here. 469 00:21:04,786 --> 00:21:06,526 This is literally the value that's at X. 470 00:21:07,626 --> 00:21:08,576 >> You were saying what was sort 471 00:21:08,576 --> 00:21:10,406 of there before you have to make it to here. 472 00:21:10,446 --> 00:21:10,746 >> Exactly. 473 00:21:10,746 --> 00:21:13,526 So this is an example of one of these garbage values right. 474 00:21:13,836 --> 00:21:17,706 Just because we see that we're at Line 22 remember 475 00:21:17,706 --> 00:21:21,366 with GDB it shows you the line before it actually executes it 476 00:21:21,416 --> 00:21:21,896 for you. 477 00:21:22,236 --> 00:21:26,286 So at this point in the story I've not yet executed Line 22. 478 00:21:26,286 --> 00:21:29,446 So X has not been set to one so what in the world is X? 479 00:21:29,446 --> 00:21:30,516 Well we never know. 480 00:21:30,626 --> 00:21:32,676 It's probably some garbage value that's there 481 00:21:32,676 --> 00:21:34,446 from some previous function call 482 00:21:34,536 --> 00:21:36,966 or some previous use of that location. 483 00:21:37,196 --> 00:21:39,046 So if we ran this all day long 484 00:21:39,046 --> 00:21:41,316 on different computers might very well get different numbers 485 00:21:41,316 --> 00:21:43,536 again and again but the point is that it's just wrong. 486 00:21:43,536 --> 00:21:44,856 It's not a useful number. 487 00:21:45,116 --> 00:21:48,146 So now let's type Next and this will actually move me 488 00:21:48,146 --> 00:21:49,006 to the next line. 489 00:21:49,276 --> 00:21:51,466 It's not yet executed Line 23 490 00:21:51,466 --> 00:21:54,726 but if I now do Print X I should see one 491 00:21:54,876 --> 00:21:56,996 and again don't be confused by the dollar sign too. 492 00:21:56,996 --> 00:21:59,076 This is just a fancier feature so that if you want 493 00:21:59,076 --> 00:22:01,666 to refer back to some value you print it earlier. 494 00:22:01,866 --> 00:22:04,616 You can still access it by way of this dollar sign notation 495 00:22:04,616 --> 00:22:08,026 but for now you'll rarely need to use that. 496 00:22:08,126 --> 00:22:08,516 All right. 497 00:22:08,516 --> 00:22:10,226 So let's just go ahead and do Next. 498 00:22:10,786 --> 00:22:11,756 Let's do Next. 499 00:22:11,996 --> 00:22:14,356 Notice now it all of a sudden said X is 1 500 00:22:14,416 --> 00:22:15,906 so it's actually printing the printouts. 501 00:22:16,226 --> 00:22:16,786 Let's do Next. 502 00:22:17,526 --> 00:22:17,816 All right. 503 00:22:17,816 --> 00:22:20,716 It's about to say swapping so let's do Next one more time 504 00:22:21,106 --> 00:22:24,896 and now before we dive into the swap function let's do a little 505 00:22:24,896 --> 00:22:25,526 sanity check. 506 00:22:25,526 --> 00:22:29,026 Let me clear the screen and Print X. Okay let me Print Y. 507 00:22:29,516 --> 00:22:31,216 So the state of the world is as we expect. 508 00:22:31,636 --> 00:22:35,416 So if I do Type Next what's about to happen? 509 00:22:35,416 --> 00:22:35,626 >> [Inaudible comment] 510 00:22:35,626 --> 00:22:37,796 >> So they're going to get swapped but it's going to kind 511 00:22:37,796 --> 00:22:38,896 of happen all of a sudden. 512 00:22:38,896 --> 00:22:41,356 I'd actually like to see what the swap function is going 513 00:22:41,356 --> 00:22:41,546 to do. 514 00:22:41,546 --> 00:22:43,706 How do I actually go inside of the swap function? 515 00:22:43,966 --> 00:22:44,146 >> Step. 516 00:22:44,496 --> 00:22:45,486 >> So you do Step. 517 00:22:45,486 --> 00:22:48,306 So Step steps you into what's about to happen as opposed 518 00:22:48,306 --> 00:22:50,026 to Next which steps over. 519 00:22:50,026 --> 00:22:53,596 In both cases though they'll make the code execute but all 520 00:22:53,596 --> 00:22:55,396 at once or one step at a time. 521 00:22:55,666 --> 00:22:57,846 So now notice we're in swap and not 522 00:22:58,006 --> 00:23:00,426 to make things look a little too crazy, 523 00:23:00,476 --> 00:23:02,166 what I'm going to do, a new command. 524 00:23:02,296 --> 00:23:04,846 Back trace just in case you start to get lost 525 00:23:04,846 --> 00:23:07,776 in where you are, if you type back trace you'll see a quick 526 00:23:07,776 --> 00:23:11,296 if arcane summary of those frames that we keep drawing 527 00:23:11,296 --> 00:23:12,566 on the board as rectangles. 528 00:23:12,836 --> 00:23:16,566 So unfortunately it actually rather fortunately it draws them 529 00:23:16,566 --> 00:23:17,326 in the right order. 530 00:23:17,326 --> 00:23:19,776 Notice at the bottom of the stack is main. 531 00:23:19,776 --> 00:23:20,646 It's indicated with No. 532 00:23:20,646 --> 00:23:23,206 1 here and then on top of that is No. 533 00:23:23,206 --> 00:23:25,556 0. It's kind of unfortunate that they reversed the numbering 534 00:23:25,746 --> 00:23:29,676 but we can now see those stack frames and so what we have here, 535 00:23:29,816 --> 00:23:31,666 we're inside of the swap function. 536 00:23:31,916 --> 00:23:34,946 So what's about to happen is Line 42 is 537 00:23:35,026 --> 00:23:36,306 about to get executed. 538 00:23:36,706 --> 00:23:40,286 So if I print out temp, what's actually going 539 00:23:40,286 --> 00:23:41,976 to get displayed on the screen? 540 00:23:43,746 --> 00:23:47,746 If I print temp. 541 00:23:47,746 --> 00:23:48,266 >> [Inaudible comment] 542 00:23:48,266 --> 00:23:50,136 >> Yeah so it's going to be some garbage value right 543 00:23:50,136 --> 00:23:51,586 because this line has not executed 544 00:23:51,586 --> 00:23:53,956 so again there's no right answer, happens to be zero. 545 00:23:53,996 --> 00:23:54,756 We're just lucky. 546 00:23:55,026 --> 00:23:59,466 So if I actually now print out let's say Star A, 547 00:23:59,466 --> 00:24:03,446 what am I going to see? 548 00:24:03,446 --> 00:24:03,706 >> [Inaudible comment] 549 00:24:03,706 --> 00:24:07,626 >> So Star A. So star means go there. 550 00:24:07,626 --> 00:24:09,616 It's the d reference operator and if I go 551 00:24:09,616 --> 00:24:10,896 to A what am I going to see? 552 00:24:11,176 --> 00:24:11,266 >> One. 553 00:24:11,826 --> 00:24:13,446 >> So I should see 1 so let's do a quick check. 554 00:24:13,616 --> 00:24:15,706 Print Star A. I do see the No. 555 00:24:15,706 --> 00:24:19,096 1. If I instead just Print A what should I instead see? 556 00:24:20,116 --> 00:24:20,586 >> Address. 557 00:24:20,586 --> 00:24:23,406 >> So the address and then again mostly meaningless but it is 558 00:24:23,406 --> 00:24:28,216 in fact the address in RAM of where A or rather 559 00:24:28,316 --> 00:24:31,776 of what is inside of A. So when we say the address 560 00:24:31,966 --> 00:24:33,666 of X is in A where is X? 561 00:24:34,286 --> 00:24:36,366 Well it happens to be at that address there. 562 00:24:36,616 --> 00:24:38,646 Okay so let's just move on with the functionality. 563 00:24:38,646 --> 00:24:41,526 So if I type Next, that line of code 564 00:24:41,526 --> 00:24:42,936 up here is going to execute. 565 00:24:43,096 --> 00:24:45,596 INT temp gets Star A. So when I type Next 566 00:24:45,866 --> 00:24:48,216 and now Print temp, I should see the No. 567 00:24:48,216 --> 00:24:51,106 1 and if I print Star A I should also see the No. 568 00:24:51,106 --> 00:24:53,336 1 because we've moved Star A into temp. 569 00:24:53,916 --> 00:24:56,566 The next line of code that's about to get executed is this. 570 00:24:56,766 --> 00:25:00,216 So as soon as I type in Next and then print out Star A, 571 00:25:00,216 --> 00:25:02,176 it should be this time? 572 00:25:02,726 --> 00:25:02,816 >> One. 573 00:25:04,416 --> 00:25:07,246 >> So not one now because we just executed this line 574 00:25:07,246 --> 00:25:08,026 that I've highlighted. 575 00:25:08,836 --> 00:25:12,436 So Star A equals whatever is at Star B. What was at Star B? 576 00:25:12,686 --> 00:25:13,206 >> Two. 577 00:25:13,206 --> 00:25:15,556 >> So Star A is now 2 as well. 578 00:25:15,766 --> 00:25:18,936 So we have seen to now put into Star A the No. 579 00:25:18,936 --> 00:25:20,886 2 and in Star B is the No. 580 00:25:20,886 --> 00:25:22,836 2 so we have that just one last step. 581 00:25:23,326 --> 00:25:26,346 So now the last step recall is go ahead and replace what's 582 00:25:26,526 --> 00:25:29,016 at location B with whatever's in temp. 583 00:25:29,246 --> 00:25:30,516 Well what's in temp? 584 00:25:30,516 --> 00:25:32,396 One. So what's at Star B? 585 00:25:32,396 --> 00:25:33,546 We already know it's 2. 586 00:25:33,676 --> 00:25:36,466 So in a moment when I type Next hopefully this will have 587 00:25:36,466 --> 00:25:40,816 clobbered the value at Star B and in fact it's the No. 588 00:25:40,816 --> 00:25:43,806 1 and it's Star A is 2. 589 00:25:44,076 --> 00:25:46,346 So if I now, now I'm getting bored with this debugging. 590 00:25:46,346 --> 00:25:48,406 I'm just going to type Continue and that's going to run 591 00:25:48,406 --> 00:25:49,406 to the end of the program. 592 00:25:49,646 --> 00:25:53,426 Voila. It's claimed swap X is 2 Y is 1. 593 00:25:53,826 --> 00:25:55,526 So there should be two takeaways here. 594 00:25:55,526 --> 00:25:57,586 So 1) exactly what the difference is 595 00:25:57,586 --> 00:26:00,166 between just saying A, just saying B and saying Star A 596 00:26:00,166 --> 00:26:03,186 and Star B and what he difference is between saying X 597 00:26:03,186 --> 00:26:08,906 and Y versus saying ampersand X and ampersand Y and then lastly, 598 00:26:08,906 --> 00:26:11,786 more in practical terms again just going through this process 599 00:26:11,786 --> 00:26:14,576 of using GDB even though it might feel a little clunky 600 00:26:14,576 --> 00:26:16,506 or a bit tricky to wrap your mind 601 00:26:16,506 --> 00:26:19,066 around at first is incredible useful as you start to try 602 00:26:19,066 --> 00:26:20,716 to find bugs in your own code. 603 00:26:21,046 --> 00:26:24,346 >> Does the pointer have a scope? 604 00:26:24,346 --> 00:26:26,126 >> Does the pointer have a scope? 605 00:26:26,306 --> 00:26:29,256 Yes. So a pointer is still just a variable. 606 00:26:29,616 --> 00:26:34,306 So A and B only exist inside of the swap function here. 607 00:26:35,006 --> 00:26:37,426 So yes. There's no change in the notion of scope. 608 00:26:37,646 --> 00:26:39,216 All we've introduced is the Star 609 00:26:39,216 --> 00:26:41,426 and the star means this is not an actual integer. 610 00:26:41,426 --> 00:26:43,096 This is the address of an integer. 611 00:26:43,196 --> 00:26:43,306 Yeah? 612 00:26:43,566 --> 00:26:44,816 >> What's the difference between the asterisk, the star 613 00:26:45,156 --> 00:26:45,926 and the ampersand again? 614 00:26:45,926 --> 00:26:46,346 >> Good question. 615 00:26:46,346 --> 00:26:47,826 What's the difference between the asterisk, 616 00:26:47,826 --> 00:26:49,316 the star and the ampersand? 617 00:26:49,616 --> 00:26:52,606 So the star means unfortunately two things. 618 00:26:53,196 --> 00:26:57,526 When you specify a star in the prototype of a function, 619 00:26:57,596 --> 00:26:59,416 for instance, right here in the line I've highlighted, 620 00:26:59,416 --> 00:27:01,086 swap INT Star A and Star B. 621 00:27:01,326 --> 00:27:05,866 In that context you are saying A should be a pointer to an int. 622 00:27:05,866 --> 00:27:07,936 B should be a pointer to an int. 623 00:27:08,396 --> 00:27:12,466 This is identical to well, let me not go there yet. 624 00:27:12,596 --> 00:27:13,756 So that's all that's saying. 625 00:27:13,756 --> 00:27:17,246 That is the data type of A and B. Now in the context 626 00:27:17,286 --> 00:27:18,716 of the actual function, 627 00:27:18,866 --> 00:27:21,176 star takes on unfortunately a different meaning. 628 00:27:21,396 --> 00:27:23,226 What does Star A and Star B mean 629 00:27:23,286 --> 00:27:25,966 in the three highlighted lines now inside of the swap function? 630 00:27:26,916 --> 00:27:27,176 >> Go there. 631 00:27:27,496 --> 00:27:28,566 >> So it just means go there. 632 00:27:28,566 --> 00:27:30,516 So it's the dereference operator. 633 00:27:30,516 --> 00:27:34,596 It means go to that address and then lastly, this syntax up here 634 00:27:34,596 --> 00:27:36,916 and this is really the third and final piece of syntax, 635 00:27:37,006 --> 00:27:39,836 the ampersand means X is some variable. 636 00:27:39,936 --> 00:27:40,906 It's INT in this case. 637 00:27:40,906 --> 00:27:43,726 Ampersand X means get the address of X. 638 00:27:44,506 --> 00:27:45,626 So if I can hammer home just 639 00:27:45,626 --> 00:27:48,716 that one detail let me actually quickly go back into GDB. 640 00:27:48,716 --> 00:27:51,106 We still have our break point even though the program 641 00:27:51,106 --> 00:27:51,756 stopped running. 642 00:27:51,756 --> 00:27:53,726 Let me run the run command again. 643 00:27:54,346 --> 00:27:55,386 Now I'm back in main. 644 00:27:55,666 --> 00:27:56,346 Let me do Next. 645 00:27:56,796 --> 00:27:59,066 Let me do Next and let me clear the screen 646 00:27:59,066 --> 00:28:00,076 and just do a quick check. 647 00:28:00,176 --> 00:28:03,786 Print X. Print Y. They're 1 and 2 but now even 648 00:28:03,786 --> 00:28:05,296 in GDB I can play around with this. 649 00:28:05,336 --> 00:28:08,066 I can Print ampersand X and what do I get back? 650 00:28:08,416 --> 00:28:10,846 Well that in fact is the numeric address, 651 00:28:10,916 --> 00:28:13,446 albeit in hexadecimal notation which we started looking 652 00:28:13,446 --> 00:28:16,556 at last week of the variable X. So if you look 653 00:28:16,556 --> 00:28:18,966 at your two gigabytes worth of RAM in your computer 654 00:28:19,076 --> 00:28:21,196 and you literally go to that address 655 00:28:21,196 --> 00:28:22,686 and I can't translate hexadecimal 656 00:28:22,686 --> 00:28:23,896 in my head to English words. 657 00:28:24,066 --> 00:28:28,276 If you go to that address you can actually see the No. 658 00:28:28,276 --> 00:28:30,216 1 at that location. 659 00:28:30,216 --> 00:28:30,296 Yeah. 660 00:28:30,676 --> 00:28:36,596 >> I was wondering why do you need to use a star 661 00:28:36,866 --> 00:28:39,636 to access the value inside the pointer? 662 00:28:41,186 --> 00:28:42,746 >> Do you need to use the star 663 00:28:42,746 --> 00:28:44,446 to access the value in the pointer? 664 00:28:44,446 --> 00:28:46,586 Yes if you want to go to that address 665 00:28:46,586 --> 00:28:48,986 to find its value you say Star A or Star B. 666 00:28:48,986 --> 00:28:51,286 >> If you just Print A that would just give you the address? 667 00:28:51,796 --> 00:28:53,686 >> If you just Print A or you just Print B, 668 00:28:53,686 --> 00:28:55,596 yes that will only show you the address 669 00:28:55,826 --> 00:28:58,776 which for the most part is useless unless you're using the 670 00:28:58,776 --> 00:29:03,546 address to actually go there. 671 00:29:03,546 --> 00:29:04,476 >> [Inaudible comment] 672 00:29:04,476 --> 00:29:04,936 >> [Inaudible comment] pointer? 673 00:29:05,016 --> 00:29:09,566 >> That is correct because A and B were declared as pointers, 674 00:29:09,906 --> 00:29:13,836 simply printing A or printing B would not be useful. 675 00:29:13,836 --> 00:29:15,776 It might be interesting in that you can see where they are 676 00:29:15,776 --> 00:29:16,886 but it would not be useful. 677 00:29:17,726 --> 00:29:17,816 Yeah? 678 00:29:18,296 --> 00:29:22,056 >> So when you [inaudible] allocate spaces in memory for A 679 00:29:22,056 --> 00:29:25,206 and B know that you're not overwriting something else 680 00:29:25,206 --> 00:29:29,196 or something which is more important so when 681 00:29:29,196 --> 00:29:32,676 that garbage might have come up [inaudible] program. 682 00:29:32,716 --> 00:29:35,396 >> That's a really okay so that's a really good question. 683 00:29:35,396 --> 00:29:38,096 So the garbage values we keep seeing and they're kind 684 00:29:38,096 --> 00:29:41,196 of a cute curiosity but thus far they've had no down sides 685 00:29:41,476 --> 00:29:44,686 but if we actually use these garbage values bad things can 686 00:29:44,686 --> 00:29:45,496 in fact happen. 687 00:29:45,936 --> 00:29:49,916 If you have a pointer that has some junk value in it 688 00:29:49,916 --> 00:29:51,486 because of some previous function call 689 00:29:51,486 --> 00:29:55,196 or some previous execution of code and you do star and then 690 00:29:55,196 --> 00:29:57,816 that pointer value, bad things will happen. 691 00:29:57,936 --> 00:30:01,756 In fact let's see if we can implement this very idea really 692 00:30:01,756 --> 00:30:02,366 fast here. 693 00:30:02,366 --> 00:30:05,276 So let me open up a new document in [inaudible] I'm going 694 00:30:05,776 --> 00:30:09,656 to go ahead and save this as bad dot C, I'm going to go ahead 695 00:30:09,656 --> 00:30:13,526 and at the top here let's just first start doing INTs main 696 00:30:13,526 --> 00:30:15,856 void, I'm not going to bother with any command line arguments, 697 00:30:16,126 --> 00:30:18,436 and this is the other context 698 00:30:18,436 --> 00:30:19,886 in which you might use the star notation 699 00:30:19,886 --> 00:30:21,066 that I promised a moment ago. 700 00:30:21,466 --> 00:30:22,686 I can actually say 701 00:30:22,686 --> 00:30:25,246 to the compiler give me a pointer to some data type. 702 00:30:25,276 --> 00:30:30,026 I can actually say INT star P and P denotes pointer here 703 00:30:30,226 --> 00:30:33,376 but notice I do not have in this simple example thus far an 704 00:30:33,376 --> 00:30:33,936 equal sign. 705 00:30:33,936 --> 00:30:40,366 There's no assignment operator, so what is actually at star P. 706 00:30:40,426 --> 00:30:43,376 If I were to do something like this, print a percent D 707 00:30:43,376 --> 00:30:45,226 like print an integer, what do I want to print? 708 00:30:45,436 --> 00:30:47,896 Go to P and print whatever integer's there. 709 00:30:48,206 --> 00:30:49,856 What's going to happen? 710 00:30:50,776 --> 00:30:52,576 What's gonna print on the screen? 711 00:30:54,506 --> 00:30:58,216 Yeah, it's kind of totally unclear, right? 712 00:30:58,216 --> 00:31:02,846 So P as a pointer and because I've not initialized it 713 00:31:02,846 --> 00:31:04,476 to anything it could be the number 0, 714 00:31:04,476 --> 00:31:08,546 it could be the number 1, it could be 555551212, 715 00:31:08,546 --> 00:31:09,116 it could be any number. 716 00:31:09,116 --> 00:31:11,946 And if I then presumptuously say it doesn't matter what it is, 717 00:31:12,036 --> 00:31:15,236 go there with star P well what if star P is 718 00:31:15,236 --> 00:31:18,646 in a particularly bad point in memory, it's the part of memory 719 00:31:18,646 --> 00:31:20,966 up at the top that is where your own program is. 720 00:31:21,206 --> 00:31:24,206 Maybe its address zero which we also said last week is bad 721 00:31:24,206 --> 00:31:26,696 because the null pointer is known as address zero. 722 00:31:26,856 --> 00:31:30,496 So in short I daresay the most common way 723 00:31:30,606 --> 00:31:33,836 for crashing a program with a seg fault or with a core dump 724 00:31:33,836 --> 00:31:36,516 that some of you guys have accidentally induced thus far is 725 00:31:36,516 --> 00:31:38,126 by doing exactly something like that. 726 00:31:38,406 --> 00:31:41,446 Not realizing that a pointer has just some bogus value 727 00:31:41,446 --> 00:31:44,346 or some null value and accidentally going there 728 00:31:44,346 --> 00:31:46,966 and then you get this so-called segmentation fault. 729 00:31:47,806 --> 00:31:49,656 So though this is a lot of new syntax, 730 00:31:49,726 --> 00:31:52,866 again there are only three ways in which we've used the star. 731 00:31:52,946 --> 00:31:55,166 Here, and this is just off the cuff, we'll actually see this 732 00:31:55,166 --> 00:31:56,776 in more useful context in the future. 733 00:31:57,116 --> 00:31:58,616 Here when you actually want a pointer, 734 00:31:58,976 --> 00:32:02,796 we saw it in the function prototype of the swap function. 735 00:32:02,796 --> 00:32:05,716 And then inside of the function itself where you say go there 736 00:32:06,116 --> 00:32:07,336 by way of the star notation. 737 00:32:07,336 --> 00:32:09,656 And we'll see another example before long. 738 00:32:09,656 --> 00:32:10,436 Any other questions? 739 00:32:11,176 --> 00:32:12,286 Yeah. 740 00:32:12,786 --> 00:32:16,076 >> [Inaudible comment] 741 00:32:16,576 --> 00:32:24,946 >> So what is the advantage of swapping them in the function? 742 00:32:26,276 --> 00:32:32,346 I missed the end of it. 743 00:32:32,346 --> 00:32:32,413 >> [Inaudible comment] 744 00:32:32,413 --> 00:32:35,226 >> Ah, so good question and if I can kind 745 00:32:35,956 --> 00:32:38,836 of reminisce what we did last week with the milk and OJ. 746 00:32:38,836 --> 00:32:41,916 When I had two slips of paper and I said here's the address 747 00:32:41,916 --> 00:32:43,776 of the milk, here's the address of the orange juice, 748 00:32:43,836 --> 00:32:45,606 then I handed them to someone to do the swapping. 749 00:32:45,806 --> 00:32:47,696 Suppose I now receive these pieces of paper. 750 00:32:48,066 --> 00:32:49,996 Well if I just said okay here is these pieces 751 00:32:49,996 --> 00:32:52,956 of paper back it doesn't actually change the contents 752 00:32:52,956 --> 00:32:53,536 of memory. 753 00:32:53,796 --> 00:32:56,196 All I've done is kind of a switcheroo of the addresses 754 00:32:56,276 --> 00:32:59,416 but the physical layout of RAM has been affected in no way 755 00:32:59,676 --> 00:33:02,436 and the goal typically of swapping two variables is 756 00:33:02,436 --> 00:33:04,276 to literally swap two variables, 757 00:33:04,526 --> 00:33:07,346 not just change you know the order in which we're talking 758 00:33:07,346 --> 00:33:14,416 about them, if that's not too hand wavy of an answer. 759 00:33:14,696 --> 00:33:18,366 Other questions? 760 00:33:18,366 --> 00:33:19,696 Yeah. 761 00:33:19,696 --> 00:33:19,906 >> [Inaudible comment] 762 00:33:19,906 --> 00:33:20,346 >> Yeah. 763 00:33:20,346 --> 00:33:20,576 >> [Inaudible comment] 764 00:33:20,576 --> 00:33:21,186 >> Perfect. 765 00:33:21,186 --> 00:33:24,666 Yes. So we could start to do crazy things whereby 766 00:33:24,766 --> 00:33:26,546 if you have-let's do this 767 00:33:26,866 --> 00:33:28,816 and again we'll see a more complete example in a moment. 768 00:33:28,816 --> 00:33:33,196 If I have INT X gets 1 and now I have INT star P, 769 00:33:33,446 --> 00:33:36,956 I could actually do this because X is an integer 770 00:33:37,386 --> 00:33:39,876 and every integer, every variable, is somewhere 771 00:33:39,876 --> 00:33:43,086 in memory, ampersand X means find the location in memory 772 00:33:43,326 --> 00:33:46,496 so INT star P means give me a pointer to an integer 773 00:33:46,696 --> 00:33:49,506 and what address do I actually want to put there is X. 774 00:33:50,326 --> 00:33:54,776 And so what I can then do is now I can either print out X 775 00:33:54,776 --> 00:33:56,856 like we did in week one and this would absolutely work. 776 00:33:56,956 --> 00:33:59,556 Print percent D as a placeholder, print out X, 777 00:33:59,886 --> 00:34:02,466 but what would also work now if I wanted to print the number 1? 778 00:34:03,086 --> 00:34:07,436 I could also say star P, or if you want 779 00:34:07,436 --> 00:34:10,746 to get crazy I could also do well print X but wait a minute, 780 00:34:10,876 --> 00:34:12,966 let's get the address of X, wait a minute changed my mind. 781 00:34:12,966 --> 00:34:17,506 Star X ampersand star and yes these would invert each 782 00:34:17,506 --> 00:34:18,396 other's behaviors. 783 00:34:18,396 --> 00:34:21,306 Completely useless, very bad style but it would 784 00:34:21,306 --> 00:34:23,746 in fact reverse the effects of those operators. 785 00:34:25,006 --> 00:34:30,156 So, all right so we've now swapped successfully. 786 00:34:30,156 --> 00:34:32,756 It's always around this point in the semester 787 00:34:32,756 --> 00:34:34,706 and you might have seen this in section already, 788 00:34:35,046 --> 00:34:36,416 but a quick interlude here. 789 00:34:37,176 --> 00:34:39,526 Now that you get this particular joke let's go ahead 790 00:34:39,526 --> 00:34:41,456 and take a five minute break and we'll regroup 791 00:34:41,566 --> 00:34:44,356 as to how these pointers can be used and abused. 792 00:34:47,836 --> 00:34:53,966 All right, so we are back and recall 793 00:34:53,966 --> 00:34:56,596 that this is the picture we keep drawing, the bottom part 794 00:34:56,596 --> 00:34:57,786 of our computer's memory 795 00:34:57,996 --> 00:35:01,036 and we typically call these rectangular slivers frames. 796 00:35:01,036 --> 00:35:03,076 And recall just a moment ago within GDB 797 00:35:03,076 --> 00:35:06,936 when I executed back trace what I was seeing was pairs 798 00:35:06,936 --> 00:35:07,896 of these frames. 799 00:35:07,896 --> 00:35:09,136 So even though I've drawn them 800 00:35:09,136 --> 00:35:13,316 at different heights here main is a frame and swap is a frame 801 00:35:13,546 --> 00:35:14,666 but I've kind of divided it 802 00:35:14,666 --> 00:35:17,256 out here pictorially whereby you see the parameters first, 803 00:35:17,476 --> 00:35:20,256 then the local variables, then swaps the parameters, 804 00:35:20,456 --> 00:35:22,756 then swaps locals and so forth. 805 00:35:22,756 --> 00:35:25,446 And this simple, relatively simple layout 806 00:35:25,446 --> 00:35:28,386 in this pattern is actually very easily exploitable. 807 00:35:28,386 --> 00:35:30,916 So I mentioned a couple weeks ago this very popular way 808 00:35:31,146 --> 00:35:34,126 of hacking into systems and taking over servers 809 00:35:34,126 --> 00:35:37,356 and defacing websites and so forth and that very often 810 00:35:37,356 --> 00:35:40,766 but not always reduces to leveraging and understanding 811 00:35:40,766 --> 00:35:43,266 of all of this memory and then somehow figuring out how 812 00:35:43,266 --> 00:35:46,686 to inject your own code into one of these frames 813 00:35:46,926 --> 00:35:49,476 so that you overwhelm that frame with way more bits 814 00:35:49,476 --> 00:35:50,786 that it was supposed to store. 815 00:35:50,786 --> 00:35:52,436 And the result is that your code, 816 00:35:52,496 --> 00:35:55,456 not the original programmers, gets executed. 817 00:35:55,636 --> 00:35:57,106 But more on that in just a bit. 818 00:35:57,616 --> 00:36:01,466 So we had a couple of programs on Wednesday, 819 00:36:01,466 --> 00:36:05,746 among them compare one dot C and this program recall if I scroll 820 00:36:05,746 --> 00:36:08,856 up to just the main function was actually flawed. 821 00:36:09,346 --> 00:36:11,166 Even if I typed in hello 822 00:36:11,166 --> 00:36:14,196 and then hello again it said you typed different things. 823 00:36:14,196 --> 00:36:17,896 And what now is the simple explanation for why typing hello 824 00:36:17,896 --> 00:36:23,566 and then hello again yields different things apparently? 825 00:36:24,256 --> 00:36:24,996 Yeah. 826 00:36:24,996 --> 00:36:25,656 >> [Inaudible comment] 827 00:36:25,656 --> 00:36:28,696 >> Exactly. 828 00:36:28,696 --> 00:36:32,156 What we're really doing with this one seemingly correct line 829 00:36:32,156 --> 00:36:36,196 of code is comparing two values but literally two values. 830 00:36:36,196 --> 00:36:38,496 We're comparing S-1 and S-2, well what are they? 831 00:36:38,676 --> 00:36:40,816 Well recall we peeled back this layer last week 832 00:36:40,816 --> 00:36:42,946 and we said you know what, this is a training wheel, 833 00:36:43,206 --> 00:36:47,226 this is actually char star, this too is actually char star 834 00:36:47,426 --> 00:36:49,906 and so S-1 and S-2 are pointers to chars. 835 00:36:50,186 --> 00:36:51,266 What char in fact? 836 00:36:51,576 --> 00:36:54,256 Well the very first character in the string. 837 00:36:54,456 --> 00:36:56,916 And recall that the convention the world adopted years ago 838 00:36:56,916 --> 00:37:00,236 for strings in C is that all you have to remember is the address 839 00:37:00,236 --> 00:37:02,076 of the first character because so long 840 00:37:02,076 --> 00:37:04,946 as you plant a little breadcrumb at the end, backslash zero 841 00:37:04,946 --> 00:37:06,896 at the end of the string, then you can figure 842 00:37:06,896 --> 00:37:10,076 out by just iterating from left to right exactly how long 843 00:37:10,076 --> 00:37:11,936 that string is and where it ends and therefore 844 00:37:11,936 --> 00:37:14,316 when you should stop printing out H-E-L-L-O. 845 00:37:14,586 --> 00:37:17,616 So what was the solution to fixing this problem? 846 00:37:18,206 --> 00:37:19,536 What's that? 847 00:37:19,696 --> 00:37:21,696 >> [Inaudible comment] 848 00:37:21,856 --> 00:37:26,656 >> So put star-okay, so we could do this, we could say star S-1, 849 00:37:26,656 --> 00:37:30,316 star S-2 but borrowing the jargon from before break today, 850 00:37:30,496 --> 00:37:32,926 what does star S-1 mean here? 851 00:37:32,926 --> 00:37:33,516 >> [Inaudible comment] 852 00:37:33,516 --> 00:37:35,596 >> Go to S-1 and what should we find there 853 00:37:35,596 --> 00:37:37,076 if I typed in hello both times? 854 00:37:37,756 --> 00:37:39,706 I'll find H and then I'll find H again, 855 00:37:39,936 --> 00:37:41,786 so this is a step toward correctness 856 00:37:42,016 --> 00:37:45,086 but unfortunately now it's going to say that any two words 857 00:37:45,086 --> 00:37:49,176 that start with hello, that start with H are actually going 858 00:37:49,176 --> 00:37:50,866 to be the same and in fact they're not. 859 00:37:51,096 --> 00:37:52,736 So we did have recall last week a loop 860 00:37:52,736 --> 00:37:54,936 where we started iterating from left to right 861 00:37:54,936 --> 00:37:56,046 and then just looking for difference, 862 00:37:56,046 --> 00:37:57,306 looking for difference, looking for difference, 863 00:37:57,306 --> 00:38:00,166 and if we find one difference whether it's the string length, 864 00:38:00,166 --> 00:38:03,056 whether it's a character that's not quite identical then it 865 00:38:03,056 --> 00:38:04,326 actually would say different. 866 00:38:04,326 --> 00:38:05,346 Otherwise if we get to the end 867 00:38:05,346 --> 00:38:08,566 of both strings we found nothing wrong, we can then say the same. 868 00:38:08,896 --> 00:38:12,256 Or an even simpler approach and again consistent with this idea 869 00:38:12,256 --> 00:38:14,486 that you need not always reinvent the wheel 870 00:38:14,486 --> 00:38:17,206 if someone else has put it into some kind of library 871 00:38:17,496 --> 00:38:19,236 and header file we could do this, 872 00:38:19,236 --> 00:38:23,276 and I've actually added one bit of error checking 873 00:38:23,276 --> 00:38:25,456 that we didn't have last week when we did it on the fly. 874 00:38:26,056 --> 00:38:28,206 Recall that we introduced this function, str com, 875 00:38:28,546 --> 00:38:31,886 string comparison and it can actually return three values 876 00:38:31,886 --> 00:38:32,966 but we only care about one. 877 00:38:32,966 --> 00:38:34,836 What did it mean if it returns the value zero? 878 00:38:34,836 --> 00:38:39,226 That the strings are actually equal, they're the same length, 879 00:38:39,446 --> 00:38:40,706 they're literally the same characters. 880 00:38:40,706 --> 00:38:42,006 They're equal and recall 881 00:38:42,006 --> 00:38:43,946 that str com could return a negative number 882 00:38:43,986 --> 00:38:45,846 or a positive number but that had to do 883 00:38:45,846 --> 00:38:47,796 with alphabetical ordering if you actually cared 884 00:38:47,796 --> 00:38:49,986 about whether one comes first or last. 885 00:38:50,356 --> 00:38:53,356 Now I added one check here and this is one of these rules 886 00:38:53,356 --> 00:38:55,446 of thumbs that even though it might not seem all 887 00:38:55,446 --> 00:38:58,216 that compelling at first, take our word for it. 888 00:38:58,216 --> 00:39:01,686 Any, any, anytime henceforth that you start using pointers 889 00:39:01,716 --> 00:39:03,586 which you've actually been using since week one, 890 00:39:03,796 --> 00:39:04,726 we just call them strings. 891 00:39:04,726 --> 00:39:08,136 We didn't call them char stars, you should always be checking 892 00:39:08,466 --> 00:39:11,846 if a pointer is null before you do anything with it 893 00:39:11,926 --> 00:39:14,606 because as per the conversation before break 894 00:39:14,886 --> 00:39:17,066 if you've got some garbage value in a pointer, 895 00:39:17,256 --> 00:39:20,336 if that pointer is itself zero either intentionally 896 00:39:20,336 --> 00:39:22,856 or accidentally and you say go to this address 897 00:39:22,856 --> 00:39:25,846 and that address is zero or some bogus garbage character, 898 00:39:25,996 --> 00:39:26,896 you're going to induce what? 899 00:39:28,216 --> 00:39:31,906 Site fault, but the problem is not always and this is why a lot 900 00:39:31,906 --> 00:39:34,456 of bugs go undetected by people like us, 901 00:39:34,456 --> 00:39:37,036 by professional programmers, sometimes for years 902 00:39:37,336 --> 00:39:39,686 because not always does a mistake 903 00:39:39,686 --> 00:39:40,766 like this induce an error. 904 00:39:40,876 --> 00:39:43,786 Typically we call these things frames and they also refer 905 00:39:43,786 --> 00:39:45,436 in general to segments of memory. 906 00:39:45,766 --> 00:39:47,116 Typically you're actually allowed 907 00:39:47,116 --> 00:39:49,086 to wander a little bit outside 908 00:39:49,086 --> 00:39:51,496 of the actual memory you've been given. 909 00:39:51,696 --> 00:39:53,546 You can go a little higher, a little lower, 910 00:39:53,656 --> 00:39:56,026 but if you go too far then the segmentation fault happens. 911 00:39:56,246 --> 00:39:58,816 But the problem is that if you don't venture terribly far 912 00:39:58,816 --> 00:40:01,216 or you just get lucky and you go to a garbage address 913 00:40:01,216 --> 00:40:04,816 but that address doesn't induce a crash, you might not realize 914 00:40:04,816 --> 00:40:06,386 that your code is in fact flawed. 915 00:40:06,656 --> 00:40:08,396 So one of the things we'll introduce in a couple 916 00:40:08,396 --> 00:40:10,456 of problem sets is a tool called Balgren [assumed spelling] 917 00:40:10,706 --> 00:40:13,646 which is a tool like GDB that you just run at the command line 918 00:40:13,856 --> 00:40:16,726 and amazingly it will actually analyze your program for you 919 00:40:16,726 --> 00:40:19,366 and try with some probability to figure 920 00:40:19,366 --> 00:40:21,656 out if you're misusing pointers in some way. 921 00:40:22,056 --> 00:40:25,296 But for now the approach is to be defensive and anytime you're 922 00:40:25,296 --> 00:40:26,316 about to do something 923 00:40:26,316 --> 00:40:29,116 with a pointer check if it's in fact null. 924 00:40:29,416 --> 00:40:32,236 And the reason is str com is actually a pretty 925 00:40:32,236 --> 00:40:32,966 simple function. 926 00:40:32,966 --> 00:40:35,166 It does what we did last week, it starts at the left 927 00:40:35,166 --> 00:40:36,596 of each string and compares, compares, 928 00:40:36,596 --> 00:40:37,606 compares, compares, compares. 929 00:40:37,786 --> 00:40:39,046 But it does not check 930 00:40:39,106 --> 00:40:41,306 if a pointer is bogus or if it's zero. 931 00:40:41,586 --> 00:40:45,556 So the problem is if you accidentally somehow make S-1 932 00:40:45,556 --> 00:40:48,416 or S-2 null, for instance the computer is running low 933 00:40:48,416 --> 00:40:50,706 on memory or you've initialized them to null yourself, 934 00:40:50,706 --> 00:40:52,436 any number of ways, we'll see more over time, 935 00:40:52,586 --> 00:40:54,356 and you accidentally pass null pointers 936 00:40:54,356 --> 00:40:57,056 to str com it probably will crash on you. 937 00:40:57,056 --> 00:40:59,996 So in short always check the values of pointers. 938 00:41:00,376 --> 00:41:04,186 All right so we saw one more sophisticated example, 939 00:41:04,386 --> 00:41:06,566 namely trying to copy strings, and our first attempt 940 00:41:06,566 --> 00:41:08,316 at this also was flawed. 941 00:41:08,316 --> 00:41:11,556 So this was copy one dot C last week, recall that I typed 942 00:41:11,646 --> 00:41:15,906 in a word I think hello then too and I tried to capitalize it 943 00:41:16,086 --> 00:41:18,046 by just changing the first letter of the word. 944 00:41:18,166 --> 00:41:20,216 But every time I tried to capitalize 945 00:41:20,216 --> 00:41:22,946 that word hello I ended up changing both copies, 946 00:41:22,946 --> 00:41:26,086 the original and the copy because how did I try 947 00:41:26,086 --> 00:41:27,666 to copy these strings last week? 948 00:41:28,316 --> 00:41:32,586 Well I used this line S-2 gets S-1 and really I can simplify 949 00:41:32,586 --> 00:41:34,936 that and you might have tried this in previous problem sets. 950 00:41:35,146 --> 00:41:38,066 This even looks more convincing, this looks perfectly legit, 951 00:41:38,066 --> 00:41:40,786 string S-2 gets equal to S-1. 952 00:41:40,966 --> 00:41:43,206 But it's not right because what is really happening 953 00:41:43,206 --> 00:41:46,716 in this one line of code? 954 00:41:47,396 --> 00:41:47,546 Sorry. 955 00:41:47,546 --> 00:41:48,116 >> [Inaudible comment] 956 00:41:48,116 --> 00:41:48,986 >> Exactly. 957 00:41:48,986 --> 00:41:50,556 So you are creating a new pointer 958 00:41:50,666 --> 00:41:51,966 but you're not creating a new string. 959 00:41:51,966 --> 00:41:53,386 If we define a string as a bunch 960 00:41:53,386 --> 00:41:56,686 of contiguous characters all we're defining with char S-2, 961 00:41:57,096 --> 00:41:59,326 char star S-2, is another pointer. 962 00:41:59,326 --> 00:42:00,496 What are you pointing at? 963 00:42:00,606 --> 00:42:02,826 Well you are pointing at whatever the user typed in. 964 00:42:02,826 --> 00:42:05,566 So if this is S-1, this is S-2 and the user typed hello 965 00:42:05,566 --> 00:42:07,826 and that word is over here in RAM, at this point 966 00:42:07,826 --> 00:42:09,786 in the story they're both pointing to the same place. 967 00:42:09,946 --> 00:42:11,406 All you've done is create a new pointer, 968 00:42:11,646 --> 00:42:13,886 not a whole array of characters. 969 00:42:14,176 --> 00:42:19,316 So we did solve this and we introduced finally copy 2 dot C. 970 00:42:19,856 --> 00:42:22,596 So in copy 2 it was admittedly a little more complex 971 00:42:23,246 --> 00:42:26,746 but if you think about what each step is doing it actually solves 972 00:42:26,746 --> 00:42:27,936 the problem quite nicely. 973 00:42:28,176 --> 00:42:32,036 So the left hand side stays the same, char star S-2 and again 974 00:42:32,036 --> 00:42:32,956 if this is starting 975 00:42:32,956 --> 00:42:36,206 to intimidate it's again just string S-2 but you have 976 00:42:36,246 --> 00:42:38,986 to start remembering now that strings involve pointers 977 00:42:39,356 --> 00:42:42,436 so what now am I storing inside of S-2? 978 00:42:42,786 --> 00:42:45,646 Well I claim I'm putting in S-2 this time the address 979 00:42:46,066 --> 00:42:47,136 of a new chunk of memory. 980 00:42:47,446 --> 00:42:49,256 That memory initially has just junk in it. 981 00:42:49,256 --> 00:42:50,226 Who knows what's in there, 982 00:42:50,596 --> 00:42:52,766 but how much memory have I just allocated? 983 00:42:53,756 --> 00:42:57,196 Well malloc is memory alloc and it takes in arguments 984 00:42:57,386 --> 00:42:58,756 and that's the number, an integer, 985 00:42:58,986 --> 00:43:00,996 and that integer is how many bytes do you want. 986 00:43:01,126 --> 00:43:05,066 So if I typed in H-E-L-L-O I want five bytes from malloc, 987 00:43:06,156 --> 00:43:08,436 okay I actually want six, the sixth one being 988 00:43:08,436 --> 00:43:10,016 for the backslash zero. 989 00:43:10,016 --> 00:43:14,676 So I always want a +1 so I get the string length of S-1 +1 990 00:43:14,886 --> 00:43:18,056 and then just because I've kind of done this before 991 00:43:18,056 --> 00:43:20,386 and I know these kinds of crazy corner cases that can arise, 992 00:43:20,646 --> 00:43:22,896 why did we say last week that I then want to multiply 993 00:43:22,896 --> 00:43:26,036 that number by the size of a char? 994 00:43:26,436 --> 00:43:26,556 Yeah. 995 00:43:26,556 --> 00:43:27,566 >> [Inaudible comment] 996 00:43:27,566 --> 00:43:28,246 >> Exactly. 997 00:43:28,246 --> 00:43:34,366 Let me tweak slightly, it could be different machines use 998 00:43:34,366 --> 00:43:36,466 different sizes for chars. 999 00:43:36,746 --> 00:43:39,476 So in the typical system certainly the appliance and most 1000 00:43:39,476 --> 00:43:42,746 of the computers you will own typically a char would just be 1001 00:43:42,786 --> 00:43:47,116 one byte or a bit but just in case we hand this compote off 1002 00:43:47,116 --> 00:43:49,986 to a fancier computer or it lingers around for ten years 1003 00:43:49,986 --> 00:43:52,636 and the next computer actually uses two bytes for every char, 1004 00:43:52,876 --> 00:43:54,456 this way we're being a little defensive 1005 00:43:54,456 --> 00:43:56,216 and the program will keep working. 1006 00:43:56,216 --> 00:43:57,926 And silly though this example is, 1007 00:43:57,966 --> 00:43:59,706 think about again the Y2K issue. 1008 00:43:59,916 --> 00:44:01,656 Right, it was this mentality typically 1009 00:44:01,656 --> 00:44:03,496 of either its memory is expensive, 1010 00:44:03,496 --> 00:44:06,666 we need to shave off every byte we can and just use two digits 1011 00:44:06,666 --> 00:44:10,156 or it was a brash mentality like who the heck is going 1012 00:44:10,156 --> 00:44:11,596 to be running our code in 30 years. 1013 00:44:11,596 --> 00:44:13,526 Well a lot of people were still running the code. 1014 00:44:13,526 --> 00:44:15,976 So it's this kind of mentality trying to make sure 1015 00:44:15,976 --> 00:44:18,216 that your code is future proofed and always correct 1016 00:44:18,606 --> 00:44:20,666 that is a very good habit to get into. 1017 00:44:21,216 --> 00:44:24,146 So what is malloc actually doing? 1018 00:44:24,366 --> 00:44:26,606 Well it turns out that malloc, memory alloc, 1019 00:44:26,946 --> 00:44:29,956 is not taking memory from anywhere you see here 1020 00:44:29,956 --> 00:44:33,476 on the screen because recall that the stack, these frames 1021 00:44:33,506 --> 00:44:36,166 that we keep drawing as slivers on this rectangle, 1022 00:44:36,396 --> 00:44:38,406 they go away pretty quickly, right. 1023 00:44:38,406 --> 00:44:40,416 As soon as the function returns that's it, 1024 00:44:40,816 --> 00:44:42,786 the memory is no longer safe to use. 1025 00:44:43,006 --> 00:44:44,116 So this feels problematic. 1026 00:44:44,186 --> 00:44:47,976 If malloc actually borrowed RAM from this part of the picture, 1027 00:44:47,976 --> 00:44:50,386 well as soon as the swap function returned 1028 00:44:50,386 --> 00:44:51,656 or some other function returned, 1029 00:44:51,806 --> 00:44:54,006 even if it asked malloc give me some memory. 1030 00:44:54,186 --> 00:44:54,926 How much memory? 1031 00:44:55,016 --> 00:44:56,856 Well whatever number you passed in as an argument, 1032 00:44:57,106 --> 00:44:58,846 what if then someone else needs that memory? 1033 00:44:59,296 --> 00:45:01,966 So in other words when you call malloc you want 1034 00:45:01,966 --> 00:45:04,886 to allocate memory permanently or at least 1035 00:45:04,886 --> 00:45:06,686 until you are ready to give it back. 1036 00:45:06,956 --> 00:45:08,726 The stack is in very ephemeral, 1037 00:45:08,726 --> 00:45:11,146 anything on the stack can go away at any time as soon 1038 00:45:11,146 --> 00:45:12,136 as these functions return. 1039 00:45:12,356 --> 00:45:14,556 If you want to allocate a chunk of memory for like a word 1040 00:45:14,556 --> 00:45:18,786 like hello and you want that memory to stay active even 1041 00:45:19,006 --> 00:45:21,166 after your functions themselves have returned, 1042 00:45:21,416 --> 00:45:22,886 well you actually need to put it elsewhere. 1043 00:45:23,306 --> 00:45:26,206 So we drew briefly in previous weeks this picture here. 1044 00:45:26,366 --> 00:45:29,226 This whole rectangle now might represent your computer's RAM, 1045 00:45:29,496 --> 00:45:31,036 at the bottom of this is the stack. 1046 00:45:31,376 --> 00:45:32,606 There's also this other stuff 1047 00:45:32,776 --> 00:45:34,796 that are generally called environment variables. 1048 00:45:35,046 --> 00:45:37,076 These are special things like your user name. 1049 00:45:37,166 --> 00:45:40,166 So it turns out that when you run a program the word Jay 1050 00:45:40,166 --> 00:45:41,996 Harbard [assumed spelling] is actually put in RAM 1051 00:45:41,996 --> 00:45:44,226 and that is accessible to you, the programmer. 1052 00:45:44,506 --> 00:45:47,146 We typically don't need it or use it but there's stuff 1053 00:45:47,146 --> 00:45:49,596 like that that's put below your stack so that it's there 1054 00:45:49,596 --> 00:45:51,266 for main and anyone else who wants it. 1055 00:45:51,606 --> 00:45:54,966 At the very very top of your RAM is the so-called text segment, 1056 00:45:54,966 --> 00:45:56,206 and what does text mean? 1057 00:45:56,206 --> 00:46:00,456 So it's a little misleading 1058 00:46:00,786 --> 00:46:04,556 but for historical reasons text segment is the zeroes and ones 1059 00:46:04,756 --> 00:46:08,776 that you wrote by way of compiling your code with GCC. 1060 00:46:08,946 --> 00:46:11,696 So the text segment is literally the zeros and ones 1061 00:46:11,696 --> 00:46:15,296 that compose your program Sudoku or Game of 15, 1062 00:46:15,366 --> 00:46:16,936 anything like that, they're loaded into RAM. 1063 00:46:16,936 --> 00:46:18,606 And that should just make intuitive sense, right? 1064 00:46:18,606 --> 00:46:20,846 If you double click an icon or run a program's name 1065 00:46:21,066 --> 00:46:24,356 for anything to happen those bits need to be loaded somewhere 1066 00:46:24,356 --> 00:46:26,446 so the computer can start doing something with them, 1067 00:46:26,666 --> 00:46:29,016 and indeed they're loaded into RAM in this way. 1068 00:46:29,346 --> 00:46:31,676 Now lastly there's a couple other things that are put way 1069 00:46:31,676 --> 00:46:34,306 up there, initialized data and uninitialized data. 1070 00:46:34,586 --> 00:46:36,416 There's some subtleties there but in a nutshell 1071 00:46:36,696 --> 00:46:39,656 on the rare occasions when we've used global variables, 1072 00:46:39,766 --> 00:46:43,036 for instance the Game of 15 we had G which was the global board 1073 00:46:43,036 --> 00:46:46,196 at the very top of the file, global variables go there too. 1074 00:46:46,416 --> 00:46:48,156 And that should also now make intuitive sense 1075 00:46:48,156 --> 00:46:50,616 because you don't want your global variables on the stack 1076 00:46:50,916 --> 00:46:53,146 because then they might disappear mid-execution 1077 00:46:53,146 --> 00:46:55,836 of program but if they're way at the top they're going 1078 00:46:55,836 --> 00:46:57,546 to stay there is the takeaway here. 1079 00:46:57,936 --> 00:46:59,506 So that leaves just one last segment. 1080 00:47:00,056 --> 00:47:01,596 So this was someone's bright idea 1081 00:47:01,866 --> 00:47:04,236 but you know why not have this thing called the stack row up, 1082 00:47:04,236 --> 00:47:06,796 up, up, up, up, and we need some other space 1083 00:47:06,796 --> 00:47:08,186 so let's have this other space grow down, 1084 00:47:08,186 --> 00:47:08,836 down, down, down, down. 1085 00:47:09,156 --> 00:47:11,406 Now of course this is like a problem waiting to happen 1086 00:47:11,406 --> 00:47:13,466 and we've induced this problem before, right. 1087 00:47:13,466 --> 00:47:15,336 When I wrote that silly function called foo 1088 00:47:15,386 --> 00:47:17,856 and then I had foo called foo and I didn't have any kind 1089 00:47:17,856 --> 00:47:20,136 of base case or if condition to stop foo 1090 00:47:20,136 --> 00:47:21,596 from calling itself, what happened? 1091 00:47:21,786 --> 00:47:23,446 Well, the stack went up, up, up, up, up, 1092 00:47:23,446 --> 00:47:25,816 it clobbered the so-called heap even though we didn't call it 1093 00:47:25,816 --> 00:47:28,346 that that day, and the program terminated. 1094 00:47:28,346 --> 00:47:30,216 The operating system said segmentation fault, 1095 00:47:30,456 --> 00:47:32,276 you have exceeded your segments. 1096 00:47:32,696 --> 00:47:34,076 But this is kind of nice, right. 1097 00:47:34,076 --> 00:47:36,146 This might seem kind of stupid in that this is 1098 00:47:36,146 --> 00:47:37,756 like you know two trains on a track, 1099 00:47:37,756 --> 00:47:39,516 this is not going to end well. 1100 00:47:39,856 --> 00:47:41,476 But if you think about it in the real world 1101 00:47:41,476 --> 00:47:44,306 with your computer you only have a finite amount of memory 1102 00:47:44,486 --> 00:47:46,716 so it's not as though we could just flip this picture around 1103 00:47:46,716 --> 00:47:49,836 and just say let the memory grow up for both of these sections 1104 00:47:49,836 --> 00:47:52,436 of memory, both heap and stack, or eventually you're just going 1105 00:47:52,436 --> 00:47:55,236 to hit your two gigabyte, your two billionth byte of RAM 1106 00:47:55,236 --> 00:47:56,296 or however much you have. 1107 00:47:56,766 --> 00:47:59,866 So this is actually nice in that it lets us use the whole 1108 00:47:59,866 --> 00:48:03,576 potential memory and bad things happen only if we get greedy 1109 00:48:03,666 --> 00:48:05,516 and we try to load in too big of a dataset, 1110 00:48:05,516 --> 00:48:06,876 we call function too many times. 1111 00:48:07,286 --> 00:48:10,576 But long story short malloc puts memory on the heap 1112 00:48:11,096 --> 00:48:13,106 and it puts it way up there away from the stack. 1113 00:48:13,326 --> 00:48:15,666 And again if this is two gigabytes it actually takes a 1114 00:48:15,666 --> 00:48:17,966 bit of effort or a bit of stupidity 1115 00:48:18,016 --> 00:48:19,826 to actually have the stack hit the heap, 1116 00:48:20,016 --> 00:48:21,656 though it does sometimes happen. 1117 00:48:22,256 --> 00:48:23,016 So let's take a look. 1118 00:48:23,016 --> 00:48:26,856 Let me go ahead now and open up a little file called bar dot C 1119 00:48:27,236 --> 00:48:30,556 which actually does not do anything that's useful 1120 00:48:30,796 --> 00:48:32,646 but it does now play with a bunch 1121 00:48:32,646 --> 00:48:34,316 of these various syntactic details. 1122 00:48:34,526 --> 00:48:37,136 So if at any point something goes over your head, 1123 00:48:37,136 --> 00:48:39,556 just look confused, raise your hand because all 1124 00:48:39,556 --> 00:48:42,676 of these are just now re-emphasis 1125 00:48:42,876 --> 00:48:44,576 on these same basic building blocks. 1126 00:48:44,576 --> 00:48:48,256 So this is bar dot C, at the top these two things are called 1127 00:48:48,616 --> 00:48:51,736 quick quiz prototypes, right, 1128 00:48:51,736 --> 00:48:53,346 so apparently there's two functions foo 1129 00:48:53,346 --> 00:48:55,216 and bar defined elsewhere in this file. 1130 00:48:55,216 --> 00:48:56,906 I'm just informing GCC that they're going 1131 00:48:56,906 --> 00:48:57,936 to exist at some point. 1132 00:48:58,166 --> 00:48:58,856 So here's main. 1133 00:48:59,136 --> 00:49:02,886 So this in English if you were asked what is this doing declare 1134 00:49:02,886 --> 00:49:05,826 an integer called A. What does that really mean? 1135 00:49:05,886 --> 00:49:08,646 Allocate 32 bits and call it A. 1136 00:49:08,646 --> 00:49:09,866 Where are those bits stored though? 1137 00:49:09,986 --> 00:49:14,006 Where is A in this picture? 1138 00:49:14,916 --> 00:49:15,646 So it would be in the stack. 1139 00:49:16,006 --> 00:49:18,806 This is a local variable, local variables have been 1140 00:49:18,806 --> 00:49:19,996 and remain on the stack. 1141 00:49:20,196 --> 00:49:22,696 The only time something is going to end up on the heap is 1142 00:49:22,696 --> 00:49:24,296 if you explicitly call malloc. 1143 00:49:24,296 --> 00:49:25,476 That is a safe rule of thumb. 1144 00:49:25,756 --> 00:49:26,526 Now how about here? 1145 00:49:26,726 --> 00:49:29,146 As a new site so you might see differences between me 1146 00:49:29,146 --> 00:49:30,516 and the TFs or even textbooks. 1147 00:49:30,876 --> 00:49:33,186 This is probably the best habit to get 1148 00:49:33,186 --> 00:49:36,136 into whereby the star is immediately next 1149 00:49:36,136 --> 00:49:39,966 to the variable name only because it can be confusing 1150 00:49:39,966 --> 00:49:42,346 if you have multiple variables declared on the same line. 1151 00:49:42,576 --> 00:49:45,626 So in short this is probably the best style to get into. 1152 00:49:45,876 --> 00:49:49,126 But char star S means give me a pointer to a character, 1153 00:49:49,396 --> 00:49:55,306 more technically give me 32 bits in which I can store what? 1154 00:49:55,996 --> 00:49:56,116 >> Char. 1155 00:49:56,676 --> 00:50:00,606 >> Not a char but the address of a character. 1156 00:50:00,826 --> 00:50:03,846 So char star S means gives me 32 bits 1157 00:50:04,446 --> 00:50:05,476 that are going to store an address. 1158 00:50:05,476 --> 00:50:06,366 The address of what? 1159 00:50:06,366 --> 00:50:07,966 The address of some char. 1160 00:50:08,206 --> 00:50:10,876 All right, so on the right hand side we now have hello world. 1161 00:50:11,056 --> 00:50:12,406 Now this is hard coded, right? 1162 00:50:12,406 --> 00:50:15,396 This is not a variable, I'm not using guest string so just 1163 00:50:15,396 --> 00:50:17,466 as an aside where is hello world stored? 1164 00:50:17,716 --> 00:50:19,616 This is effectively global data 1165 00:50:19,856 --> 00:50:22,476 so when I said before global variables are way at the top, 1166 00:50:22,476 --> 00:50:25,896 so is a hard coded string like hello also put way up there. 1167 00:50:26,326 --> 00:50:28,026 All right, so what else here do we have? 1168 00:50:28,086 --> 00:50:31,636 We have print a, percent S, backslash N, 1169 00:50:31,636 --> 00:50:32,906 now this looks a little scary. 1170 00:50:33,206 --> 00:50:34,026 So what is S? 1171 00:50:34,356 --> 00:50:36,116 Well first let me delete the scary part. 1172 00:50:36,116 --> 00:50:37,716 So we just have S bracket zero. 1173 00:50:37,956 --> 00:50:39,796 If I print out S bracket zero 1174 00:50:39,796 --> 00:50:42,716 and then put percent C here, that prints a char. 1175 00:50:42,716 --> 00:50:46,296 What char does it print? 1176 00:50:47,016 --> 00:50:51,866 So I just figured it out too, right, W if I counted right. 1177 00:50:52,426 --> 00:50:55,406 Zero, one, two, three, four, five, six, seven, 1178 00:50:55,406 --> 00:50:58,006 so the seventh character zero index is W. 1179 00:50:58,486 --> 00:51:00,886 But if I instead say uh uh, I don't want to print a char, 1180 00:51:00,886 --> 00:51:03,246 I want to print a string so I change the percent C 1181 00:51:03,246 --> 00:51:07,356 to percent S and I don't want to print S bracket seven 1182 00:51:07,356 --> 00:51:10,686 because that's actually the seventh character zero index. 1183 00:51:11,076 --> 00:51:14,136 What does the ampersand operator do in this context? 1184 00:51:14,636 --> 00:51:18,976 What am I getting the address of to be clear? 1185 00:51:19,656 --> 00:51:22,306 I'm getting the address of the seventh character 1186 00:51:22,626 --> 00:51:24,086 so that's kind of interesting. 1187 00:51:24,266 --> 00:51:26,006 Now that doesn't feel like a complete string, 1188 00:51:26,006 --> 00:51:28,506 in fact that's smack dab in the middle of this bigger string. 1189 00:51:28,816 --> 00:51:29,686 But guess what? 1190 00:51:29,686 --> 00:51:33,696 What does this, what does the word world also coincidentally 1191 00:51:33,696 --> 00:51:34,536 end in? 1192 00:51:35,796 --> 00:51:37,556 The null terminator, right, 1193 00:51:37,556 --> 00:51:40,456 that substring world also obviously ends 1194 00:51:40,456 --> 00:51:42,076 in the same backslash zero, 1195 00:51:42,346 --> 00:51:44,756 so ampersand S means get me the address 1196 00:51:44,886 --> 00:51:46,406 of the seventh character. 1197 00:51:46,406 --> 00:51:47,136 Well what is that? 1198 00:51:47,266 --> 00:51:49,896 Well if we're all in agreement that W is it that's 1199 00:51:49,896 --> 00:51:50,806 where we are in the story. 1200 00:51:50,866 --> 00:51:54,736 But when you say ampersand S bracket seven 1201 00:51:54,896 --> 00:51:57,576 that means okay I don't want W, I want the address of W. 1202 00:51:57,776 --> 00:52:00,616 So that means now somewhere, we don't need to project this, 1203 00:52:00,616 --> 00:52:02,196 it's just going to be a silly rectangle. 1204 00:52:02,426 --> 00:52:06,806 But that means somewhere is the word H-E-L-L, 1205 00:52:06,806 --> 00:52:08,516 okay now it's enough words to project, 1206 00:52:08,566 --> 00:52:09,746 let's try our fancy new technology. 1207 00:52:10,876 --> 00:52:12,766 That's how bad my handwriting looks for those 1208 00:52:12,766 --> 00:52:13,486 who have never seen it. 1209 00:52:13,846 --> 00:52:18,776 All right, so we have W-O-R-L-D backslash zero, 1210 00:52:18,776 --> 00:52:21,826 now if I break this up into something that looks a bit more 1211 00:52:21,826 --> 00:52:27,266 like an array, okay normally this would be a much prettier 1212 00:52:27,266 --> 00:52:32,426 picture, but maybe this is not the best use of technology. 1213 00:52:32,426 --> 00:52:34,876 Okay, that's the projector doing that. 1214 00:52:34,876 --> 00:52:39,146 So when we say S, what is S? 1215 00:52:39,416 --> 00:52:42,456 S is the address of the start of the string itself 1216 00:52:42,456 --> 00:52:44,576 and it ends back here at the backslash zero. 1217 00:52:44,816 --> 00:52:47,016 But if I say S bracket zero that takes me 1218 00:52:47,016 --> 00:52:49,526 to the seventh character, zero index, and that's literally W. 1219 00:52:49,526 --> 00:52:51,216 But if I say what's the address of it, 1220 00:52:51,396 --> 00:52:53,766 that's like asking the question what is the address 1221 00:52:54,006 --> 00:52:57,106 of this particular point in this whole array 1222 00:52:57,106 --> 00:52:58,346 that we think of as a string. 1223 00:52:58,496 --> 00:53:00,746 Well who knows what it is, it's like OX 1, 2, 3, 4, 5. 1224 00:53:01,236 --> 00:53:04,166 But if I then say print the string that starts 1225 00:53:04,166 --> 00:53:05,826 at that address what does print F do? 1226 00:53:05,966 --> 00:53:07,946 Well it's a pretty simple function, it starts here 1227 00:53:07,946 --> 00:53:10,746 and it prints, prints, prints, prints and it only stops 1228 00:53:10,746 --> 00:53:12,826 when it sees backslash zero. 1229 00:53:13,186 --> 00:53:15,106 So what word is going to get printed based 1230 00:53:15,106 --> 00:53:17,026 on this second line of code in main? 1231 00:53:18,016 --> 00:53:20,986 World. Now you know again, 1232 00:53:21,146 --> 00:53:22,996 highlighting it now clearly doesn't work just 1233 00:53:22,996 --> 00:53:23,706 as I highlighted it. 1234 00:53:23,966 --> 00:53:26,766 So this is kind of a silly exercise here 1235 00:53:26,766 --> 00:53:29,856 but again it just means that all of this stuff with pointers 1236 00:53:29,856 --> 00:53:33,456 and ampersands and stars just reduces still to these basics. 1237 00:53:33,856 --> 00:53:34,936 So let's do this now. 1238 00:53:35,216 --> 00:53:38,376 A gets five, so what is A, what data type? 1239 00:53:39,756 --> 00:53:40,566 All right so it's an INT, 1240 00:53:40,566 --> 00:53:42,086 we declared that a couple lines before. 1241 00:53:42,226 --> 00:53:44,486 A gets five, puts the number five in A, 1242 00:53:44,486 --> 00:53:45,676 so that's kind of old school. 1243 00:53:46,046 --> 00:53:48,116 Foo A, I don't know what foo's going to do 1244 00:53:48,116 --> 00:53:49,216 so let's scroll down now. 1245 00:53:49,616 --> 00:53:53,906 So foo takes in an argument and that argument is of type INT, 1246 00:53:54,286 --> 00:53:56,736 so at this point in the story what value is an N? 1247 00:53:58,126 --> 00:54:00,656 It's a five but think back to the picture. 1248 00:54:00,886 --> 00:54:02,486 This actually looks a little different, 1249 00:54:02,536 --> 00:54:05,016 so if this is now my picture we're talking 1250 00:54:05,016 --> 00:54:08,396 about foo now not swap but whereas A exists 1251 00:54:08,396 --> 00:54:12,046 in main's local frame, the second rectangle here, 1252 00:54:12,046 --> 00:54:17,786 we call that foo so actually we have a computer here. 1253 00:54:18,376 --> 00:54:21,486 Foo's locals, foo's parameters, 1254 00:54:22,096 --> 00:54:26,046 so where is actually this number N stored? 1255 00:54:26,046 --> 00:54:27,516 Well it's going to be stored 1256 00:54:27,936 --> 00:54:29,956 in the thing called foo's parameters. 1257 00:54:30,046 --> 00:54:32,356 So they're distinct copies, and we can see that now 1258 00:54:32,356 --> 00:54:33,276 with this kind of picture. 1259 00:54:33,426 --> 00:54:35,616 All right so what am I actually going to do with foo? 1260 00:54:35,886 --> 00:54:38,856 So let's see INT B, so this is yet another local variable, 1261 00:54:39,126 --> 00:54:41,546 B gets N's so now everyone in the world has a value 1262 00:54:41,546 --> 00:54:44,226 of five it seems, so B is five, N is five, 1263 00:54:44,266 --> 00:54:45,766 back over there A is five. 1264 00:54:46,176 --> 00:54:49,466 So B star equals two, so what's the value of B? 1265 00:54:50,866 --> 00:54:54,316 So ten, right, so five times two is ten, so I'm passing 10 1266 00:54:54,376 --> 00:54:57,336 into bar so we're almost done with this story. 1267 00:54:57,916 --> 00:54:59,026 Okay, this is kind of stupid. 1268 00:54:59,206 --> 00:55:00,796 All bar does is print this. 1269 00:55:01,686 --> 00:55:04,256 Now what's the takeaway here? 1270 00:55:05,366 --> 00:55:08,766 Well actually so the takeaway here is if I had thought 1271 00:55:08,766 --> 00:55:10,576 about this we would have been walking through this with GDB 1272 00:55:10,576 --> 00:55:12,966 but we did that earlier so it's not all that enlightening now 1273 00:55:12,966 --> 00:55:14,376 to step through each of these frames. 1274 00:55:14,716 --> 00:55:18,336 But if we now get down to bar here and I print out M, 1275 00:55:18,586 --> 00:55:23,476 M is going to be-let's at least close this loop-is going 1276 00:55:23,476 --> 00:55:25,036 to be ten, right. 1277 00:55:25,036 --> 00:55:26,476 We're just not doing anything with it. 1278 00:55:26,906 --> 00:55:28,556 So actually we can now do this. 1279 00:55:28,556 --> 00:55:29,876 Let me go in here and I'm going 1280 00:55:29,876 --> 00:55:31,856 to compile this program called bar. 1281 00:55:31,856 --> 00:55:36,046 So make bar, all right I'm going to now run GDB on it, 1282 00:55:36,286 --> 00:55:40,846 so GDB bar dot slash bar, enter. 1283 00:55:40,956 --> 00:55:42,286 All right, so now I'm going 1284 00:55:42,286 --> 00:55:45,856 to do a break point at-let's do a breakpoint, let's not step 1285 00:55:45,856 --> 00:55:46,916 through this because it gets very boring, 1286 00:55:46,916 --> 00:55:49,686 let's do a breakpoint at bar, this is the very last function 1287 00:55:49,686 --> 00:55:51,136 that ever gets called in this program. 1288 00:55:51,606 --> 00:55:52,996 So now let's go ahead and do run. 1289 00:55:53,496 --> 00:55:56,046 Okay, so now this is more interesting than last time 1290 00:55:56,046 --> 00:55:58,616 because we don't have to step through all the tedium with GDB. 1291 00:55:58,616 --> 00:56:00,766 At this point in the story I'm kind of pretty deep. 1292 00:56:01,066 --> 00:56:03,066 That picture is actually starting to stack up. 1293 00:56:03,066 --> 00:56:05,086 If I type back trace and hit enter, 1294 00:56:05,436 --> 00:56:07,696 notice I can see a reminder here what's going on. 1295 00:56:07,696 --> 00:56:10,276 At the very bottom of my stack is main and its variables, 1296 00:56:10,626 --> 00:56:14,346 in foo is its variables, in bar is its variables. 1297 00:56:14,466 --> 00:56:17,296 And again this is even if you've not yet dabbled 1298 00:56:17,356 --> 00:56:19,736 in GDB the goal here really is just 1299 00:56:19,736 --> 00:56:21,086 to show you some opportunities. 1300 00:56:21,406 --> 00:56:24,236 I can obviously print out in bar the value of M, 1301 00:56:24,756 --> 00:56:27,876 again M is just this simple thing here, the input. 1302 00:56:28,096 --> 00:56:31,206 But suppose I get curious in GDB and I've kind of gone too far 1303 00:56:31,416 --> 00:56:33,506 and I actually want to know wait a minute, what was the value 1304 00:56:33,506 --> 00:56:36,706 of B inside of the foo function, right. 1305 00:56:36,706 --> 00:56:38,436 You might think that damn, now I have to kind 1306 00:56:38,436 --> 00:56:40,976 of restart my programs, set a new breakpoint in foo, 1307 00:56:40,976 --> 00:56:43,656 well you don't need to do that, right. 1308 00:56:43,656 --> 00:56:46,286 We have paused execution by way of a breakpoint, 1309 00:56:46,466 --> 00:56:48,956 you can see where you've been with back trace. 1310 00:56:49,146 --> 00:56:52,026 What's really powerful about GDB is you can kind of go back 1311 00:56:52,026 --> 00:56:54,046 in time or at least with regard 1312 00:56:54,046 --> 00:56:57,836 to this picture you can access any of these frames you want 1313 00:56:57,836 --> 00:57:00,396 within GDB simply by typing this command. 1314 00:57:00,566 --> 00:57:03,586 If I'm curious as to what the value of B was 1315 00:57:03,976 --> 00:57:06,286 in the foo function, even though I'm currently 1316 00:57:06,286 --> 00:57:07,596 in the bar function because that's 1317 00:57:07,596 --> 00:57:11,016 where I set my breakpoint, I can say frame number one 1318 00:57:11,486 --> 00:57:15,006 and hit enter and voila, it kind of winds back time. 1319 00:57:15,006 --> 00:57:16,416 It doesn't undo any of the work, 1320 00:57:16,416 --> 00:57:19,876 it's now showing me what's inside of that frame number one 1321 00:57:20,116 --> 00:57:23,706 so I now can do something like what was B and ten. 1322 00:57:23,956 --> 00:57:26,006 So again this just hints that again the power 1323 00:57:26,006 --> 00:57:26,766 of something like GDB. 1324 00:57:26,766 --> 00:57:29,336 You can literally pause execution of your program 1325 00:57:29,336 --> 00:57:31,436 and go anywhere you want in your memory 1326 00:57:31,646 --> 00:57:33,336 and simply understanding the order 1327 00:57:33,336 --> 00:57:35,616 in which your own functions are getting called will let you 1328 00:57:35,616 --> 00:57:37,306 begin to traverse this. 1329 00:57:38,266 --> 00:57:42,926 All right, so let's do one other thing now here. 1330 00:57:43,466 --> 00:57:47,486 The CS50 library, so in the CS50 library which we've been taking 1331 00:57:47,486 --> 00:57:50,886 for granted and will decreasingly actually be using 1332 00:57:50,886 --> 00:57:55,216 now that we've revealed what a string actually is, a char star. 1333 00:57:55,516 --> 00:57:58,616 Let me scroll down, there's get char, there's get double, 1334 00:57:58,986 --> 00:58:01,276 there's get floats, there's get INTs, 1335 00:58:01,626 --> 00:58:04,076 there's get long long, there's get string. 1336 00:58:04,076 --> 00:58:06,366 Get string's the only one we're going to look at right now 1337 00:58:06,756 --> 00:58:09,986 because it actually is used by all of those other functions. 1338 00:58:09,986 --> 00:58:12,226 When you call get INT it's actually getting a string 1339 00:58:12,226 --> 00:58:14,436 from the user and then effectively using HY 1340 00:58:14,436 --> 00:58:17,226 or an equivalent to convert it to an integer for you. 1341 00:58:17,556 --> 00:58:19,356 So what is get string doing? 1342 00:58:19,356 --> 00:58:22,386 I'm going to skip over these local variables here 1343 00:58:22,766 --> 00:58:24,986 and just reveal a little bit about what's going on. 1344 00:58:25,546 --> 00:58:29,906 So inside of get string there's this function that rarely 1345 00:58:29,906 --> 00:58:31,596 if ever you will have to use yourself. 1346 00:58:31,956 --> 00:58:35,006 But F get C you can kind of guess what it might do. 1347 00:58:35,056 --> 00:58:36,486 Get C, what are we probably getting? 1348 00:58:37,596 --> 00:58:39,876 It's getting a char, one char at a time. 1349 00:58:40,106 --> 00:58:42,256 The reason for this is that as I keep saying 1350 00:58:42,256 --> 00:58:43,976 and we'll soon see it's really easy 1351 00:58:44,246 --> 00:58:46,966 to overstep again the boundaries of your own memory 1352 00:58:46,966 --> 00:58:49,886 in a programming language like C and so if you get greedy 1353 00:58:49,936 --> 00:58:51,536 and you say give me 10 characters at once 1354 00:58:51,536 --> 00:58:53,866 or give me 20, but the user gives you fewer than that 1355 00:58:53,866 --> 00:58:56,426 or more than that again site fault, core dump, crashes, 1356 00:58:56,426 --> 00:58:58,696 hacking, bad stuff is the takeaway there. 1357 00:58:59,036 --> 00:59:02,066 So with get C even though it feels a little less efficient 1358 00:59:02,066 --> 00:59:04,866 what get string in the CS50 library is really doing is it's 1359 00:59:04,866 --> 00:59:07,116 getting one character at a time from the user. 1360 00:59:07,336 --> 00:59:09,976 And this very paranoid approach to getting input 1361 00:59:09,976 --> 00:59:11,396 from the keyboard ensures 1362 00:59:11,396 --> 00:59:15,516 that we never go past what the user has actually typed in. 1363 00:59:15,876 --> 00:59:17,266 Now what's really happening here? 1364 00:59:17,506 --> 00:59:19,866 Well the way CS50's library works is 1365 00:59:19,866 --> 00:59:23,126 that initially it arbitrarily allocates a buffer that looks 1366 00:59:23,126 --> 00:59:24,686 like that that's size 32. 1367 00:59:24,876 --> 00:59:28,366 We made an educated guess that typically 32 bytes is plenty 1368 00:59:28,366 --> 00:59:30,826 for the types of words you guys might type into a program. 1369 00:59:31,136 --> 00:59:32,406 But just in case that's wrong, 1370 00:59:32,406 --> 00:59:35,136 and just in case you actually want to type in 33 characters 1371 00:59:35,136 --> 00:59:38,276 or 100 characters, the get string function needs 1372 00:59:38,276 --> 00:59:40,786 to be able to grow dynamically. 1373 00:59:40,786 --> 00:59:41,586 It needs to be able to say 1374 00:59:41,586 --> 00:59:43,246 to the operating system we screwed up, 1375 00:59:43,246 --> 00:59:44,576 we asked for 32 bytes, 1376 00:59:44,806 --> 00:59:47,126 we actually need 64 bytes, can we have more. 1377 00:59:47,386 --> 00:59:50,386 So what the CS50 library actually does if you read 1378 00:59:50,386 --> 00:59:53,226 through this is if we find 1379 00:59:53,736 --> 00:59:55,526 that you know what the so-called buffer, 1380 00:59:55,526 --> 00:59:58,346 the space we've allocated in get string is too little, 1381 00:59:58,586 --> 00:59:59,976 turns out there's another function 1382 00:59:59,976 --> 01:00:01,856 and this is not something you'll likely need to call 1383 01:00:01,856 --> 01:00:04,126 at least during the term but called realloc 1384 01:00:04,416 --> 01:00:07,656 that will do literally that, it will reallocate an array for you 1385 01:00:07,846 --> 01:00:09,256 and find you more space. 1386 01:00:09,356 --> 01:00:11,836 And if need be, if you've been allocated here 1387 01:00:11,836 --> 01:00:13,496 and someone else is already here, 1388 01:00:13,496 --> 01:00:15,846 what realloc will do is it will say all right you want more 1389 01:00:15,846 --> 01:00:18,116 memory, I'm going to move your whole array here 1390 01:00:18,116 --> 01:00:20,786 and put it elsewhere so I can then give you more space 1391 01:00:20,836 --> 01:00:22,346 so that it all remains contiguous. 1392 01:00:22,536 --> 01:00:23,646 And so what realloc does 1393 01:00:23,646 --> 01:00:26,656 which is nice is you just can't move an array around in memory, 1394 01:00:26,656 --> 01:00:28,366 you can't just move bytes from one place to another. 1395 01:00:28,366 --> 01:00:29,416 But you can do what? 1396 01:00:29,416 --> 01:00:32,086 You can copy them from one place to another 1397 01:00:32,086 --> 01:00:34,356 and so realloc actually does this. 1398 01:00:34,926 --> 01:00:37,066 So we'll come back to this before long but know 1399 01:00:37,066 --> 01:00:40,696 that all this time the CS50 library has been using malloc 1400 01:00:40,696 --> 01:00:43,136 and in turn realloc that's a family of functions 1401 01:00:43,376 --> 01:00:44,486 to allocate a chunk of memory 1402 01:00:44,486 --> 01:00:46,656 and if it needs more, more, more, more. 1403 01:00:46,656 --> 01:00:50,376 And now where does get string put that memory 1404 01:00:51,266 --> 01:00:55,706 that it's handing you when you call get string? 1405 01:00:55,906 --> 01:00:59,426 Where must the strings from get string be getting put 1406 01:00:59,426 --> 01:01:01,066 in this memory when they're handed back to you? 1407 01:01:01,676 --> 01:01:05,026 It's got to be the heap, right, 1408 01:01:05,026 --> 01:01:07,606 because if get string foolishly put its memory 1409 01:01:07,606 --> 01:01:09,626 on the stack what would obviously happen the moment get 1410 01:01:09,626 --> 01:01:10,296 string returns? 1411 01:01:11,796 --> 01:01:14,826 Like that memory, you know it might still say H-E-L-L-O 1412 01:01:14,826 --> 01:01:17,006 in that stack frame but now as soon 1413 01:01:17,006 --> 01:01:18,276 as you call another function bam 1414 01:01:18,276 --> 01:01:19,696 that string's going to get overwritten. 1415 01:01:19,946 --> 01:01:22,066 So what was important to us at the start of this semester 1416 01:01:22,066 --> 01:01:24,596 that anytime you allocate a string that it ends 1417 01:01:24,596 --> 01:01:28,866 up in the heap so that your functions can get called, 1418 01:01:29,466 --> 01:01:31,926 finish calling, call return, call return, 1419 01:01:32,136 --> 01:01:33,556 and that string stays alive 1420 01:01:33,796 --> 01:01:35,926 so all this time we've actually been using the heap. 1421 01:01:35,926 --> 01:01:37,006 And we'll see before long 1422 01:01:37,326 --> 01:01:39,086 yet more sophisticated approaches to that. 1423 01:01:39,836 --> 01:01:41,536 So let me just give a sense now 1424 01:01:41,846 --> 01:01:45,086 of why bad things happen so easily. 1425 01:01:45,486 --> 01:01:48,456 Let me actually skip over the actual code here 1426 01:01:48,456 --> 01:01:49,156 and focus on this. 1427 01:01:49,286 --> 01:01:50,646 So if you want to read more about this, 1428 01:01:50,726 --> 01:01:53,006 especially if you're the hacker edition type, 1429 01:01:53,006 --> 01:01:55,286 this is actually excerpted this picture from a Wikipedia article 1430 01:01:55,286 --> 01:01:56,636 that goes into much more detail. 1431 01:01:56,986 --> 01:02:00,286 But this is a picture right now of a stack, right, 1432 01:02:00,286 --> 01:02:02,366 it's from bottom to top that we keep looking at 1433 01:02:02,366 --> 01:02:03,996 but it's got some colors depicted here 1434 01:02:03,996 --> 01:02:05,076 to make things more friendly. 1435 01:02:05,566 --> 01:02:08,316 But it turns out that I've kind of been simplifying the world. 1436 01:02:08,416 --> 01:02:11,846 When I say that on the stack is something like main's parameters 1437 01:02:11,846 --> 01:02:14,056 and main's local variables and foo's parameters 1438 01:02:14,056 --> 01:02:15,336 and foo's local variables, 1439 01:02:15,656 --> 01:02:17,556 there's actually been other stuff still. 1440 01:02:17,996 --> 01:02:24,766 Long story short when a function calls another function the only 1441 01:02:24,766 --> 01:02:27,516 way that guy, the second guy knows 1442 01:02:27,716 --> 01:02:29,336 where he should hand control back 1443 01:02:29,416 --> 01:02:33,286 to when he's done executing is because what also is going 1444 01:02:33,286 --> 01:02:36,016 on the stack is each function's return address. 1445 01:02:36,376 --> 01:02:38,766 In other words when you call a function even like print F, 1446 01:02:38,766 --> 01:02:40,996 this is kind of like happening all these weeks behind 1447 01:02:40,996 --> 01:02:41,496 the scenes. 1448 01:02:41,766 --> 01:02:44,186 You're essentially writing on a piece of paper your address, 1449 01:02:44,186 --> 01:02:46,406 your main function's address, your foo function's address, 1450 01:02:46,406 --> 01:02:48,606 whatever function you wrote you're handing it to print F 1451 01:02:48,606 --> 01:02:50,886 and saying do your thing and then return 1452 01:02:50,886 --> 01:02:52,376 to this address which is my address. 1453 01:02:53,116 --> 01:02:56,456 So, the problem though is that this picture then means on top 1454 01:02:56,456 --> 01:03:00,116 of the stack is a whole bunch of juicy information, 1455 01:03:00,446 --> 01:03:03,056 not only are there things like variables. 1456 01:03:03,056 --> 01:03:04,686 Notice up here there's the variable, 1457 01:03:05,026 --> 01:03:07,526 there's the variable bar whatever that is 1458 01:03:07,586 --> 01:03:09,286 and the variable C whatever that is. 1459 01:03:09,576 --> 01:03:11,786 But these things I just mentioned, particularly this one 1460 01:03:11,786 --> 01:03:13,266 in red is a return address. 1461 01:03:14,036 --> 01:03:15,356 So the return address is that sheet 1462 01:03:15,356 --> 01:03:17,216 of paper that's been secretly being put 1463 01:03:17,216 --> 01:03:18,976 on the stack every time you call a function. 1464 01:03:19,396 --> 01:03:20,626 But here's the bad thing 1465 01:03:20,626 --> 01:03:22,756 and here's how this all ties together with arrays. 1466 01:03:23,076 --> 01:03:25,476 We've said before that an array is a fixed size 1467 01:03:25,966 --> 01:03:28,346 and bad things happen if you go too far 1468 01:03:28,346 --> 01:03:29,916 to the right of your array. 1469 01:03:30,296 --> 01:03:31,926 But notice what happens. 1470 01:03:31,926 --> 01:03:34,966 When you're calling functions notice how this arrow reminds 1471 01:03:34,966 --> 01:03:37,906 us, the stack grows up, up, up, up, up, up, 1472 01:03:38,266 --> 01:03:40,886 but it was someone's bright idea years ago 1473 01:03:40,886 --> 01:03:43,806 that when you store an array in memory the first letter, 1474 01:03:43,806 --> 01:03:47,036 for instance H, is going to get put here 1475 01:03:47,846 --> 01:03:50,336 but then the next one goes there, there, there, 1476 01:03:50,336 --> 01:03:52,306 and again this is a picture so it's not really a rectangle. 1477 01:03:52,306 --> 01:03:55,836 But then you move down, down, down, down, wrap around, down, 1478 01:03:55,836 --> 01:03:58,536 down, down, so what happens in this model 1479 01:03:59,046 --> 01:04:02,296 if the string the user types is not H-E-L-L-O but is 1480 01:04:02,296 --> 01:04:03,756 like 1,000 characters. 1481 01:04:04,196 --> 01:04:05,346 Where does it go? 1482 01:04:05,396 --> 01:04:07,226 What memory gets clobbered so to speak? 1483 01:04:08,036 --> 01:04:11,786 Well all of this stuff, so what really sophisticated 1484 01:04:11,786 --> 01:04:16,836 or lucky bad guys do is they try to-and this can be as simple 1485 01:04:16,836 --> 01:04:18,596 as running a program at the prompt and typing 1486 01:04:18,596 --> 01:04:20,296 in some crazy long string and seeing 1487 01:04:20,296 --> 01:04:22,746 if it crashes-what a bad guy will do is try 1488 01:04:22,746 --> 01:04:24,966 to inject let's call it attack code, 1489 01:04:24,966 --> 01:04:26,266 generalize it as A for now. 1490 01:04:26,466 --> 01:04:27,496 What is an attack code? 1491 01:04:27,696 --> 01:04:29,186 It can be anything, it can be something stupid 1492 01:04:29,186 --> 01:04:31,816 like spam everyone in the computer's address book 1493 01:04:31,816 --> 01:04:33,596 or it could be erase everything on the hard drive. 1494 01:04:33,596 --> 01:04:35,096 Attack code is whatever you want it to be. 1495 01:04:35,556 --> 01:04:37,316 But a bad guy will essentially do this 1496 01:04:37,586 --> 01:04:38,966 and the colors here are important. 1497 01:04:39,156 --> 01:04:40,546 What did the red colors represent? 1498 01:04:41,926 --> 01:04:43,356 The return address, that sheet of paper. 1499 01:04:43,656 --> 01:04:46,836 So a really smart bad guy will somehow figure 1500 01:04:46,836 --> 01:04:48,826 out to be honest quite often by trial and error, 1501 01:04:48,826 --> 01:04:50,816 a whole lot of trials and a whole lot of errors, 1502 01:04:50,816 --> 01:04:53,106 but it only takes one success to then break into the system. 1503 01:04:53,466 --> 01:04:57,216 He or she will try to figure out you know where is this address, 1504 01:04:57,846 --> 01:05:00,376 where is that sheet of paper being stored in this program 1505 01:05:00,646 --> 01:05:02,946 because I'm going to type in some crazy long sequence 1506 01:05:02,946 --> 01:05:05,136 of characters and it might not even be alphabetical [inaudible] 1507 01:05:05,356 --> 01:05:07,926 characters, it can be like those unprintable characters 1508 01:05:07,926 --> 01:05:09,296 where like it looks weird on the screen 1509 01:05:09,406 --> 01:05:10,586 because it's just zeros and ones. 1510 01:05:10,586 --> 01:05:13,536 I'm going to inject some kind of crazy long string, attack, 1511 01:05:13,696 --> 01:05:15,526 attack, attack, attack, attack, but I just have 1512 01:05:15,526 --> 01:05:16,746 to get one detail right. 1513 01:05:17,066 --> 01:05:19,916 The very last thing I type in at the keyboard or pass 1514 01:05:20,066 --> 01:05:22,586 through the server via the internet, say via URL, 1515 01:05:22,936 --> 01:05:26,916 has to be the address of my own attack code. 1516 01:05:27,506 --> 01:05:30,216 So if what you do when prompted by something 1517 01:05:30,216 --> 01:05:33,276 like a get string is you type in the equivalent on your keyboard 1518 01:05:33,276 --> 01:05:35,826 of an attack function that again deletes something, sends spam, 1519 01:05:35,826 --> 01:05:38,316 whatever, but you get the very last detail right 1520 01:05:38,316 --> 01:05:41,306 and you say the address of my own attack code is going 1521 01:05:41,306 --> 01:05:45,606 to start at this address 80C0, whatever that is. 1522 01:05:45,606 --> 01:05:50,826 So long as you put that address 8035C080 to refer back to you, 1523 01:05:50,916 --> 01:05:51,806 what's going to happen? 1524 01:05:51,806 --> 01:05:54,056 Whatever function this is, print F, get string or whatnot, 1525 01:05:54,196 --> 01:05:55,346 he's going to finish executing, 1526 01:05:55,346 --> 01:05:57,236 he's going to say all right I've been told to return 1527 01:05:57,236 --> 01:06:00,176 to this address but what just happened, someone else wrote 1528 01:06:00,176 --> 01:06:03,256 over what I wrote on this thereby telling me to return 1529 01:06:03,256 --> 01:06:04,586 to a different return address 1530 01:06:04,586 --> 01:06:07,026 and that return address is depicted in A there 1531 01:06:07,336 --> 01:06:09,926 and that might literally mean go erase the hard drive now. 1532 01:06:10,416 --> 01:06:12,046 So how do bad guys figure this out? 1533 01:06:12,046 --> 01:06:13,776 It really is a lot of trial and error. 1534 01:06:13,776 --> 01:06:16,586 Anytime you visit a website, anytime you run a program 1535 01:06:16,586 --> 01:06:19,786 and it crashes, it core dumps, you get file not found 1536 01:06:19,786 --> 01:06:22,296 or internal server error occurred, 1537 01:06:22,486 --> 01:06:25,196 this is prime hacking opportunities for bad guys 1538 01:06:25,196 --> 01:06:27,306 because what it suggests is not necessarily 1539 01:06:27,306 --> 01:06:30,136 that there's a vulnerability but there's at least a mistake. 1540 01:06:30,216 --> 01:06:33,496 Someone screwed up and did not check the length of an array, 1541 01:06:33,496 --> 01:06:35,376 someone let me go too far in memory, 1542 01:06:35,626 --> 01:06:38,546 and so if I can actually now guess by trial 1543 01:06:38,546 --> 01:06:40,566 and error writing a program that generates a whole bunch 1544 01:06:40,566 --> 01:06:42,846 of garbage just seeing which one actually gets me 1545 01:06:42,846 --> 01:06:46,026 into the system, this is incredibly common how bad guys 1546 01:06:46,266 --> 01:06:47,626 get into systems. 1547 01:06:47,626 --> 01:06:50,606 And in fact if you ever run a web server as you might do 1548 01:06:50,606 --> 01:06:52,976 for your final project and troll through your own web logs, 1549 01:06:53,256 --> 01:06:55,586 you'll see that there are these people called script kitties 1550 01:06:55,586 --> 01:06:57,616 on the internet who essentially run programs like this, 1551 01:06:57,616 --> 01:07:00,036 sometimes not even knowing who you are or caring who you are, 1552 01:07:00,226 --> 01:07:02,646 but trying to send garbage values, garbage values 1553 01:07:02,646 --> 01:07:04,126 to your web server in particular 1554 01:07:04,346 --> 01:07:07,436 and so you'll see crazy long URLs show up in your log 1555 01:07:07,706 --> 01:07:10,186 because it's been incredibly common that web servers 1556 01:07:10,186 --> 01:07:13,636 on Linux, on Windows, on MAC OS actually have flaws like this 1557 01:07:13,816 --> 01:07:17,306 where if you get that input just right you can trick the computer 1558 01:07:17,306 --> 01:07:19,256 into doing anything you want. 1559 01:07:19,676 --> 01:07:23,236 So with that said, we'll see you on Wednesday. 1560 01:07:31,136 --> 01:07:33,296 The end.