1 00:00:09,136 --> 00:00:11,706 >> All right welcome back to CS50. 2 00:00:11,776 --> 00:00:14,706 This is the start of week 7. 3 00:00:14,706 --> 00:00:19,116 So it was actually around this time last year that I went 4 00:00:19,116 --> 00:00:22,256 into a bit of rant when I was in the subway station 5 00:00:22,256 --> 00:00:24,436 in Harvard Square and there was this nice lady 6 00:00:24,436 --> 00:00:26,816 who was clearly foreign, had never been 7 00:00:26,816 --> 00:00:29,766 to our subway system before and all she wanted to do was to get 8 00:00:29,766 --> 00:00:32,256 from a Harvard Square to Park Street, pretty reasonable goal, 9 00:00:32,256 --> 00:00:33,276 pretty straight forward route. 10 00:00:33,736 --> 00:00:35,796 And so she walked up to this device here. 11 00:00:36,256 --> 00:00:39,246 Blurry only in as much as I'm bad in photography, 12 00:00:39,536 --> 00:00:41,876 but this is-- these machines if you've never used them before 13 00:00:41,876 --> 00:00:42,896 that they're very high-tech. 14 00:00:42,896 --> 00:00:44,646 You can touch the screens and interact with them 15 00:00:44,646 --> 00:00:46,606 and actually buy yourself a train ticket. 16 00:00:46,886 --> 00:00:50,306 Unfortunately, the sheer number of steps involved 17 00:00:50,306 --> 00:00:52,726 in buying something as simple as a one way ticket 18 00:00:52,726 --> 00:00:55,456 to Park Street involves some 7 plus steps. 19 00:00:55,456 --> 00:00:58,116 So you walk up as I did with this woman who had turned to me 20 00:00:58,116 --> 00:00:59,846 or really any one around for some help 21 00:00:59,846 --> 00:01:03,146 because she could not make sense of our American machines here. 22 00:01:04,096 --> 00:01:08,096 You walk up, you see the screen and you've got 3 huge buttons 23 00:01:08,096 --> 00:01:11,036 and the one you ultimately want is that one up there. 24 00:01:11,036 --> 00:01:13,056 To purchase new Charlie Ticket value tickets 25 00:01:13,056 --> 00:01:14,046 and passes press here. 26 00:01:14,316 --> 00:01:16,056 Now as an aside and from the perspective 27 00:01:16,056 --> 00:01:18,226 of user interface what is a Charlie Ticket? 28 00:01:18,296 --> 00:01:19,596 All right, we're already assuming 29 00:01:19,596 --> 00:01:22,096 that some outsider knows what we are talking 30 00:01:22,096 --> 00:01:23,346 about when we say Charlie Ticket 31 00:01:23,346 --> 00:01:24,716 but we can forgive that and move on. 32 00:01:25,186 --> 00:01:26,916 So now, I'm asked what I would like to purchase. 33 00:01:27,266 --> 00:01:29,436 This one is admittedly pretty straight forward. 34 00:01:29,546 --> 00:01:32,246 So you click on the left to actually take the subway 35 00:01:32,606 --> 00:01:34,696 and then you are asked for your fare here and so 36 00:01:34,696 --> 00:01:37,046 that one too is fairly straight forward and then you get 37 00:01:37,046 --> 00:01:39,076 to this point which is a curious one 38 00:01:39,076 --> 00:01:42,756 and that going downtown cost neither 5 dollars, 10 dollars, 39 00:01:42,756 --> 00:01:45,266 nor 20 dollars so you have to click other amount 40 00:01:45,266 --> 00:01:48,296 if you just want that one way pass which these days I think is 41 00:01:48,296 --> 00:01:50,446 like a dollar 50, 2 dollars, right. 42 00:01:50,446 --> 00:01:52,976 So clearly an up sell attempt here and I think 43 00:01:52,976 --> 00:01:55,976 if we do the math, even 5 dollars is not the cost 44 00:01:55,976 --> 00:01:58,216 of a round trip, I think it's probably 4 dollars or so. 45 00:01:58,726 --> 00:02:01,346 So we click other amounts, this woman and I and then you get 46 00:02:01,346 --> 00:02:04,526 to this and God forbid you wanna spend some triple digit amount 47 00:02:04,526 --> 00:02:06,046 of money just to get to Park Street. 48 00:02:06,716 --> 00:02:09,696 So now, you're on the screen and you input your number 49 00:02:09,986 --> 00:02:11,726 and finally you get to this screen 50 00:02:11,726 --> 00:02:13,526 which says you're buying an SV adult, 51 00:02:13,526 --> 00:02:14,616 whatever the heck that is. 52 00:02:14,896 --> 00:02:17,386 It turns out it's single value but who cares. 53 00:02:17,676 --> 00:02:19,586 And then amount to pay 4 dollars. 54 00:02:19,586 --> 00:02:22,526 Now I can credit card, debit card, and long story short. 55 00:02:22,526 --> 00:02:24,946 Dear God, it took us like 5 minutes just 56 00:02:24,946 --> 00:02:26,826 to buy a one way ticket to Park Street. 57 00:02:26,996 --> 00:02:29,896 Now, rewind to yesteryear just when I was here 58 00:02:29,896 --> 00:02:31,616 and we didn't have these fancy touchscreens. 59 00:02:31,616 --> 00:02:34,016 We instead had these devices where you put your dollars in 60 00:02:34,016 --> 00:02:36,716 and you get a T token, a metallic token drop it in a slot 61 00:02:36,716 --> 00:02:37,716 and then you're on your way. 62 00:02:37,956 --> 00:02:39,166 This is not an improvement. 63 00:02:39,166 --> 00:02:41,416 I would wager and yet this is representative, 64 00:02:41,416 --> 00:02:45,536 this kind of screen of a horrible user interface and sort 65 00:02:45,536 --> 00:02:46,786 of technology gone wrong. 66 00:02:46,786 --> 00:02:48,326 And unfortunately technology 67 00:02:48,496 --> 00:02:51,326 and by extension computer science too often get a bad rap 68 00:02:51,626 --> 00:02:54,736 for being beyond the scope of most mortals because we're 69 00:02:54,736 --> 00:02:57,066 so often presented with crap like this, right? 70 00:02:57,066 --> 00:03:00,286 Interfaces that are not optimized for the common case 71 00:03:00,286 --> 00:03:03,656 which I dare say is to get on subway and go somewhere not 72 00:03:03,656 --> 00:03:05,486 to go through the 7 some odd steps. 73 00:03:05,486 --> 00:03:08,256 So you'll find in problem set 5 I've formed this week 74 00:03:08,486 --> 00:03:10,266 to go online tomorrow as usual. 75 00:03:10,696 --> 00:03:14,106 You'll be asked to keep an eye out this week or to think 76 00:03:14,106 --> 00:03:17,726 about some piece of hardware or software whether 77 00:03:17,726 --> 00:03:20,606 in subway stations in your own life, on campus here 78 00:03:20,606 --> 00:03:22,696 at Harvard that could be better. 79 00:03:22,696 --> 00:03:27,716 And we're gonna ask you for your simply real world human instinct 80 00:03:27,716 --> 00:03:31,136 as to how this software or hardware device could be better 81 00:03:31,196 --> 00:03:32,236 in terms of its design. 82 00:03:32,466 --> 00:03:34,426 Now, recurring theme when we did this exercise 83 00:03:34,426 --> 00:03:36,836 for the last time last year is that there was a lot 84 00:03:36,836 --> 00:03:39,106 of Harvard centric things that were pointed out. 85 00:03:39,346 --> 00:03:41,806 I'm sure some of you can think of various web sites 86 00:03:41,806 --> 00:03:44,786 of which you are all too fond here within Harvard.edu 87 00:03:45,026 --> 00:03:46,136 that could use some tinkering. 88 00:03:46,426 --> 00:03:49,716 So do know that we will actually group some of your feedback 89 00:03:49,716 --> 00:03:52,416 by Harvard and non Harvard specific feedback so that we 90 00:03:52,416 --> 00:03:54,856 as a course will escalate your critiques, 91 00:03:55,016 --> 00:03:57,936 your informed judgments of various tools so that hopefully, 92 00:03:57,936 --> 00:04:01,416 collectively as a university we can rise up and present 93 00:04:01,526 --> 00:04:04,486 to each other interfaces that are not quite like this. 94 00:04:04,486 --> 00:04:06,146 Those of you who are not freshmen but lived 95 00:04:06,146 --> 00:04:09,306 through [inaudible] from college.harvard.edu 96 00:04:09,926 --> 00:04:13,576 in years past will know that that system is no more, 97 00:04:13,576 --> 00:04:16,166 replaced now with Gmail and I dare say 98 00:04:16,166 --> 00:04:19,186 that in particular was a recurring theme 99 00:04:19,186 --> 00:04:22,716 in last year's survey and look, that problem is gone. 100 00:04:22,716 --> 00:04:26,056 So perhaps, we as a class can similarly chip away this year. 101 00:04:26,426 --> 00:04:29,566 So with that rant aside, so it's a sad week 102 00:04:29,566 --> 00:04:31,536 over the past few days where we of course 103 00:04:31,536 --> 00:04:33,726 as a world lost Steve Jobst and to be honest, 104 00:04:33,726 --> 00:04:36,616 I would say in the context of this MBTA system in one 105 00:04:36,676 --> 00:04:40,036 of the most amazing things Apple really has done is take 106 00:04:40,326 --> 00:04:43,096 technologies and make them simpler, right. 107 00:04:43,096 --> 00:04:45,626 Mac OS itself is not fundamentally any simpler I 108 00:04:45,626 --> 00:04:47,826 would say than windows these days but these phones 109 00:04:47,826 --> 00:04:49,466 that we have, these iPads that we have 110 00:04:49,746 --> 00:04:51,946 like even technophiles have fallen in love with these things 111 00:04:51,946 --> 00:04:55,096 because they just make things easier and we would be remiss 112 00:04:55,096 --> 00:04:56,446 in not pointing at another gentleman 113 00:04:56,446 --> 00:04:58,906 who unfortunately passed away last week named Denis Ritchie. 114 00:04:59,346 --> 00:05:01,936 This is the author of the C language 115 00:05:01,936 --> 00:05:04,416 with which you have been all to acquainted this semester 116 00:05:04,606 --> 00:05:06,496 and he was also one of the co-contributors 117 00:05:06,496 --> 00:05:09,936 to the UNIX operating system which then gave rise to things 118 00:05:09,936 --> 00:05:12,166 like Linux and the like which we've been using in this course. 119 00:05:12,166 --> 00:05:14,256 So there's a really nice obituary in the New York Times. 120 00:05:14,256 --> 00:05:16,686 If you Google his name, Dennis Ritchie but know that it is 121 00:05:16,686 --> 00:05:21,316 to this man to whom we owe most everything we've been talking 122 00:05:21,316 --> 00:05:22,526 about on this class. 123 00:05:23,256 --> 00:05:26,666 So with that said, what's on the horizon ahead. 124 00:05:26,666 --> 00:05:30,666 So problem at 5 introduces not only the world of multimedia 125 00:05:30,666 --> 00:05:33,196 and imagery but also that of forensics and part 126 00:05:33,196 --> 00:05:35,466 of forensics will involve understanding how things 127 00:05:35,466 --> 00:05:36,346 like images work. 128 00:05:36,566 --> 00:05:38,046 Now it turns out that even though we look 129 00:05:38,046 --> 00:05:39,916 at images all day long on facebook 130 00:05:39,916 --> 00:05:42,596 or just your own computer in the form of JPEGs, and GIFs 131 00:05:42,596 --> 00:05:46,006 and PNGs, well, you can actually represent images as you will see 132 00:05:46,006 --> 00:05:48,136 in this week's pset fairly simply, right? 133 00:05:48,136 --> 00:05:50,366 If we have a trivial image that we wanna represent 134 00:05:50,366 --> 00:05:54,756 like a smiley face here, well you'll probably know before CS50 135 00:05:54,756 --> 00:05:57,566 that an image is just a bunch of dots called pixels. 136 00:05:57,566 --> 00:06:01,246 It's a grid, horizontally and vertically of dots like this 137 00:06:01,246 --> 00:06:03,796 and so in the simplest form if all we wanted 138 00:06:03,796 --> 00:06:06,076 to do is implement a black and white smiley face, 139 00:06:06,076 --> 00:06:09,056 you know we could kinda do this using week one's material. 140 00:06:09,056 --> 00:06:11,466 We can just auto-- arbitrarily say 141 00:06:11,466 --> 00:06:13,576 that the bit one will represent white, 142 00:06:13,686 --> 00:06:16,066 the bit zero will represent black. 143 00:06:16,436 --> 00:06:20,306 And so if we create a file using some kind of program or editor, 144 00:06:20,576 --> 00:06:23,506 whereby we say 11000011, 145 00:06:23,506 --> 00:06:27,296 we could get the top row aka scan line 146 00:06:27,296 --> 00:06:29,056 of the smiley's head there. 147 00:06:29,056 --> 00:06:31,346 Now of course images today are much more glamorous 148 00:06:31,346 --> 00:06:33,806 than this black and white images but all we do 149 00:06:33,806 --> 00:06:35,956 to represent things like JPEGs 150 00:06:35,956 --> 00:06:38,176 on facebook profiles is just more 151 00:06:38,176 --> 00:06:40,516 than a single bit per dot rather we typically use 152 00:06:40,516 --> 00:06:44,726 like 24 bits per dot to represent most any color 153 00:06:44,726 --> 00:06:46,256 that a human could possible see. 154 00:06:46,256 --> 00:06:48,236 And so you'll see that the problems that walks you 155 00:06:48,236 --> 00:06:50,866 through the progression from bitmaps BMPs 156 00:06:50,866 --> 00:06:53,786 which this thing here is to JPEGs which you then need 157 00:06:53,786 --> 00:06:56,516 to recover as part of the forensic challenge. 158 00:06:56,656 --> 00:06:59,786 But we also introduce some C in programming specific ideas. 159 00:06:59,786 --> 00:07:02,266 So we talked briefly in past weeks about structs 160 00:07:02,516 --> 00:07:04,516 and this table here is kind of overwhelming 161 00:07:04,776 --> 00:07:08,946 but what this table here represents is what you can find 162 00:07:09,146 --> 00:07:11,796 at the start of any bitmap file. 163 00:07:11,796 --> 00:07:14,206 BMP is a file format like JIF and JPEG. 164 00:07:14,416 --> 00:07:16,636 It's with the smiley face there it was but it turns 165 00:07:16,636 --> 00:07:19,626 out that it's not sufficient to just put zeros and ones 166 00:07:19,626 --> 00:07:23,436 in a file and say okay this is a BMP rather file formats 167 00:07:23,436 --> 00:07:25,936 like the bitmap file format needs what we generally 168 00:07:25,936 --> 00:07:26,626 call metadata. 169 00:07:26,626 --> 00:07:30,006 It needs some hints so that programs like photo shop 170 00:07:30,006 --> 00:07:33,606 and Ristretto photo viewer or whatever you're using to open 171 00:07:33,606 --> 00:07:36,126 up graphics know a little something about how 172 00:07:36,126 --> 00:07:37,306 to interpret those bits. 173 00:07:37,306 --> 00:07:39,426 Right, if you just hand a program zeros and ones, 174 00:07:39,686 --> 00:07:40,986 it's completely out of context. 175 00:07:40,986 --> 00:07:44,076 It's not clear if this is a word document, a text file, an image. 176 00:07:44,336 --> 00:07:47,566 So this metadata is what list the file extension the last 3 177 00:07:47,566 --> 00:07:49,766 letters of a file name like BMP 178 00:07:49,766 --> 00:07:52,636 or JPG typically provide this information. 179 00:07:52,776 --> 00:07:55,396 Now not all of these is all that interesting but you'll see 180 00:07:55,396 --> 00:07:56,856 at the top that you'll-- 181 00:07:56,946 --> 00:07:59,296 actually you'll see in the problem set that we do inquire 182 00:07:59,296 --> 00:08:02,136 about things like BI size and width and height. 183 00:08:02,506 --> 00:08:05,726 So information as to well, are all of these zeros and ones 184 00:08:05,726 --> 00:08:08,766 that you see, this wide, are they this tall and so forth. 185 00:08:08,766 --> 00:08:11,986 All of that information is hidden in here and as an aside, 186 00:08:11,986 --> 00:08:15,606 the reason for these new data types like word and DWORD 187 00:08:15,636 --> 00:08:17,736 and byte, these are actually typically used 188 00:08:17,736 --> 00:08:20,366 in the windows world when doing windows programming. 189 00:08:20,616 --> 00:08:23,306 Microsoft simply has some of its own data types 190 00:08:23,376 --> 00:08:26,896 that there is a mapping to Linux and UNIX data types like char 191 00:08:26,896 --> 00:08:28,006 and float and so forth. 192 00:08:28,266 --> 00:08:31,886 So you'll see in bmp.h one of the files we give you this week 193 00:08:32,106 --> 00:08:34,976 that we just create synonyms for these various things, for word 194 00:08:34,976 --> 00:08:37,136 and double word and long and bytes. 195 00:08:37,286 --> 00:08:40,526 So realize that we have all the expressiveness now and see 196 00:08:40,526 --> 00:08:42,716 to do this and those numbers if not clear on the left, 197 00:08:42,716 --> 00:08:44,906 just show you how far into the struct, 198 00:08:44,906 --> 00:08:47,806 at what byte position this particular field is. 199 00:08:47,806 --> 00:08:50,386 But without this information, you're kind of screwed right? 200 00:08:50,386 --> 00:08:54,046 Unless you understand something about the format of a file, 201 00:08:54,046 --> 00:08:55,286 it's pretty much impossible 202 00:08:55,286 --> 00:08:57,716 to read it unless you have these hints and similarly 203 00:08:57,716 --> 00:09:00,306 for the recovery part of these problems that we have 204 00:09:00,306 --> 00:09:01,706 to recover a whole bunch of JPEGs 205 00:09:01,706 --> 00:09:04,826 that I accidentally deleted just knowing a little something 206 00:09:04,826 --> 00:09:06,176 about this header information. 207 00:09:06,176 --> 00:09:07,136 This one again is BMPs. 208 00:09:07,136 --> 00:09:10,876 It's not JPEGs but we tell you a little about the JPEG format 209 00:09:11,116 --> 00:09:12,686 so that you can similarly recover 210 00:09:12,936 --> 00:09:15,566 like an investigator might a whole bunch of images. 211 00:09:15,746 --> 00:09:17,176 And now in a Hacker Edition for those 212 00:09:17,176 --> 00:09:19,486 who have been interested this week realize 213 00:09:19,486 --> 00:09:23,066 that this is most likely the last 214 00:09:23,216 --> 00:09:26,286 of our Hacker Editions 'cause we now kind of converging 215 00:09:26,286 --> 00:09:28,836 and all get on to the same page realize these one is actually a 216 00:09:28,836 --> 00:09:30,306 little more accessible than usual. 217 00:09:30,306 --> 00:09:33,916 What you'll do is much the same problem set but when challenged 218 00:09:33,916 --> 00:09:36,876 to resize images, BMPs to make them bigger, 219 00:09:37,116 --> 00:09:38,806 the Hacker Edition will also challenge you 220 00:09:38,806 --> 00:09:39,776 to make them smaller. 221 00:09:39,966 --> 00:09:42,566 And making an image smaller is actually non obvious whereas 222 00:09:42,606 --> 00:09:43,896 to make something twice as big, 223 00:09:44,156 --> 00:09:46,556 you can imagine conceptually just double all the pixels. 224 00:09:46,556 --> 00:09:48,336 Make everything twice as wide, twice as tall, 225 00:09:48,396 --> 00:09:50,036 and then you can just do that for any factor. 226 00:09:50,286 --> 00:09:51,776 But if you have to shrink an image 227 00:09:52,026 --> 00:09:53,496 and for instance you have a pixel, 228 00:09:53,496 --> 00:09:54,926 what would you do with that pixel? 229 00:09:54,926 --> 00:09:55,666 A pixel is a pixel. 230 00:09:55,666 --> 00:09:56,586 You can't shrink it. 231 00:09:56,826 --> 00:09:59,046 So instead you have to throw information away, 232 00:09:59,046 --> 00:10:00,596 if you wanna shrink an image. 233 00:10:00,806 --> 00:10:03,106 And so one of the challenges of the Hacker Edition is to decide 234 00:10:03,316 --> 00:10:05,886 which pixels should you throw away or if something is black 235 00:10:05,886 --> 00:10:07,766 and something is white, should you may be throw one 236 00:10:07,766 --> 00:10:10,816 of them away and just make one remaining pixel gray 237 00:10:10,936 --> 00:10:11,966 or something to that effect. 238 00:10:11,966 --> 00:10:14,646 So there are a lot of opportunities there for design. 239 00:10:14,646 --> 00:10:16,306 And there will also be opportunities as usual 240 00:10:16,306 --> 00:10:18,876 for office hours but per 2 weeks ago's announcement, 241 00:10:19,086 --> 00:10:20,966 we're gonna have to cross the river just this week 242 00:10:20,966 --> 00:10:24,376 to join the iLAB for one of its inaugural events, 243 00:10:24,666 --> 00:10:25,796 where we'll have the same iPod 244 00:10:26,026 --> 00:10:27,606 and then the same approach to office hours. 245 00:10:27,896 --> 00:10:30,496 But upon arrival, you'll see a nice sexy lobby like this. 246 00:10:30,496 --> 00:10:31,836 There will presumably be people 247 00:10:31,836 --> 00:10:33,576 by then behind the glass walls there 248 00:10:33,576 --> 00:10:35,106 and you'll pretty much be welcome 249 00:10:35,106 --> 00:10:36,306 to just spread out wherever. 250 00:10:36,506 --> 00:10:38,366 You can certainly bring friends from other classes. 251 00:10:38,366 --> 00:10:40,026 The goal is simply to hang out as a class, 252 00:10:40,026 --> 00:10:42,866 get the questions answered by the TFs and CAs and eat pizza 253 00:10:42,866 --> 00:10:45,446 and soda which will be supplied gratefully by the iLAB. 254 00:10:45,446 --> 00:10:47,386 All right, so quiz zero. 255 00:10:47,976 --> 00:10:48,666 Also what we will do. 256 00:10:48,666 --> 00:10:49,576 I'll be there tonight. 257 00:10:50,036 --> 00:10:52,406 If you have questions or wanna talk through your quiz zero, 258 00:10:52,406 --> 00:10:53,886 you can reach out to me directly. 259 00:10:53,886 --> 00:10:54,306 We will make one 260 00:10:54,306 --> 00:10:58,926 of the categories quiz zero questions tonight on q.cs50.net 261 00:10:59,156 --> 00:11:00,036 and also throughout the week. 262 00:11:00,036 --> 00:11:02,496 If you wanna walk through your quiz with one of the TFs 263 00:11:02,496 --> 00:11:06,316 or CAs realize that we will make ourselves available for that 264 00:11:06,526 --> 00:11:11,356 and lastly, if you're on that fence and you're here today 265 00:11:11,356 --> 00:11:13,436 at lecture but you are sort of planning to head 266 00:11:13,436 --> 00:11:15,786 to the registrar by 5 p.m. to withdraw from CS50, 267 00:11:16,016 --> 00:11:18,856 please do catch me after class for conversation if you're 268 00:11:18,856 --> 00:11:22,166 on that fence 'cause realized that we are now pretty much more 269 00:11:22,166 --> 00:11:23,936 than half way through the course's problem sets 270 00:11:23,936 --> 00:11:26,076 and pretty much half way through the course so we would hate 271 00:11:26,076 --> 00:11:27,356 to lose you at this point. 272 00:11:27,956 --> 00:11:29,636 All right, now for something technical. 273 00:11:30,236 --> 00:11:33,096 So one of the hardest things in [inaudible] is dealing 274 00:11:33,096 --> 00:11:36,436 with things like memory and pointers and malloc and free 275 00:11:36,436 --> 00:11:39,746 and increasingly will this be an opportunity to really screw 276 00:11:39,746 --> 00:11:42,316 up your programs and make programs that crash 277 00:11:42,316 --> 00:11:44,006 and seg fault and core dump. 278 00:11:44,256 --> 00:11:47,096 But thankfully, just as we have tools to write programs, 279 00:11:47,096 --> 00:11:50,706 we also have tools to fix or to diagnose or debug programs. 280 00:11:50,706 --> 00:11:55,216 We've used or we've said you should use gdb thus far, 281 00:11:55,216 --> 00:11:58,126 the new debugger but we also introduced this week a nice tool 282 00:11:58,126 --> 00:12:01,446 called Valgrind whose purpose in life is to analyze your code 283 00:12:01,626 --> 00:12:02,966 and figure out if you've screwed 284 00:12:02,966 --> 00:12:04,836 up with regard to memory management. 285 00:12:05,066 --> 00:12:06,776 Did you accidentally leak memory? 286 00:12:06,776 --> 00:12:09,646 Did you accidentally go beyond the boundaries of some array? 287 00:12:09,896 --> 00:12:11,046 Now the tool is imperfect. 288 00:12:11,046 --> 00:12:14,306 They can't necessarily perfectly test your code 289 00:12:14,306 --> 00:12:16,056 because of course any time you run programs, 290 00:12:16,286 --> 00:12:18,626 you give them input and Valgrind is not gonna be able 291 00:12:18,626 --> 00:12:21,586 to try your program on an infinite supply of inputs. 292 00:12:21,586 --> 00:12:24,376 So it's gonna do its best guest but you'll find 293 00:12:24,376 --> 00:12:27,016 that it can help us find some things quite nicely. 294 00:12:27,016 --> 00:12:29,956 Let me go ahead and open up a file called memory dotC. 295 00:12:30,606 --> 00:12:32,386 That's among our source code for today 296 00:12:32,386 --> 00:12:34,976 and this is a pretty stupid program that's deliberately 297 00:12:34,976 --> 00:12:36,876 buggy and that it does the following. 298 00:12:36,876 --> 00:12:38,096 So let's start from the bottom. 299 00:12:38,096 --> 00:12:39,366 Here is my main function. 300 00:12:39,366 --> 00:12:40,716 Takes no command line arguments, 301 00:12:40,946 --> 00:12:43,966 apparently calls a function F and then return 0. 302 00:12:44,036 --> 00:12:45,066 So that looks okay. 303 00:12:45,316 --> 00:12:48,066 But then let's roll upward to the F function. 304 00:12:48,286 --> 00:12:49,346 It returns void. 305 00:12:49,346 --> 00:12:51,596 It takes no arguments in but it does 306 00:12:51,906 --> 00:12:55,176 in this first line call malloc and just as a check, 307 00:12:55,176 --> 00:12:58,846 sanity check how many bytes get malloc in this highlighted line. 308 00:12:59,856 --> 00:13:00,606 So 40, right. 309 00:13:00,606 --> 00:13:04,236 If we assume in int is 4 bytes, then 4 times 10 is of course 40 310 00:13:04,456 --> 00:13:08,136 and then what just to be technical are we storing in x? 311 00:13:10,236 --> 00:13:11,056 Then address. 312 00:13:11,056 --> 00:13:13,656 The address of that chunk of 40 bytes. 313 00:13:14,066 --> 00:13:16,056 So that's sort of from 2 weeks back and then 314 00:13:16,056 --> 00:13:18,356 on the next line unfortunately, I do something stupid. 315 00:13:18,356 --> 00:13:20,266 Why is this flawed? 316 00:13:20,356 --> 00:13:21,566 This second line of code. 317 00:13:21,566 --> 00:13:26,536 Extract a 10 get 0. 318 00:13:27,136 --> 00:13:28,276 Yeah? 319 00:13:28,466 --> 00:13:30,466 [ Inaudible Remark ] 320 00:13:30,656 --> 00:13:34,866 >> Perfect so the maximum index for this array should be 9 321 00:13:34,866 --> 00:13:37,806 from 0 through 9 because I've only allocated 10 int. 322 00:13:38,106 --> 00:13:40,686 So of course this is an off by 1 error or I'm 323 00:13:40,686 --> 00:13:42,486 over reaching the boundaries of my array. 324 00:13:42,726 --> 00:13:46,126 In the worst case, this program might crash and seg fault 325 00:13:46,126 --> 00:13:48,266 and core dump but not always and that's one 326 00:13:48,266 --> 00:13:49,356 of the frustrating things. 327 00:13:49,356 --> 00:13:52,156 Often if you just go ever so slightly outside 328 00:13:52,156 --> 00:13:54,386 of your memory bounds-- your memory's bounds, 329 00:13:54,696 --> 00:13:57,666 the program might not actually crash 'cause the operating 330 00:13:57,666 --> 00:14:00,236 system is being a bit generous and is allowing you 331 00:14:00,236 --> 00:14:02,586 if unofficially and in a dangerous way 332 00:14:02,836 --> 00:14:05,186 to go slightly beyond your array's bounds. 333 00:14:05,396 --> 00:14:08,066 So in other words, this is kind of hard to catch sometimes, 334 00:14:08,066 --> 00:14:09,686 certainly just by visual inspection. 335 00:14:09,986 --> 00:14:11,056 So let's instead try this. 336 00:14:11,056 --> 00:14:12,746 Let me open a terminal window. 337 00:14:13,216 --> 00:14:17,776 Let me go ahead then and compile my code with make memory. 338 00:14:18,416 --> 00:14:20,246 And now if I run this program, 339 00:14:20,906 --> 00:14:22,216 it actually seems to be fine, right? 340 00:14:22,216 --> 00:14:24,106 No core dump, no segmentation fault. 341 00:14:24,106 --> 00:14:25,676 So at first glance, all is okay 342 00:14:25,676 --> 00:14:28,176 but now instead let me run this new program Valgrind 343 00:14:28,516 --> 00:14:30,306 and let me go ahead and run it on memory. 344 00:14:30,356 --> 00:14:31,746 I still need the dot slash just 345 00:14:31,746 --> 00:14:34,326 to test this program here and then hit Enter. 346 00:14:34,716 --> 00:14:37,266 Unfortunately whoever first wrote Valgrind decided 347 00:14:37,266 --> 00:14:40,586 that let's make its output a crazy mess but if you read 348 00:14:40,586 --> 00:14:42,676 between the lines, you can actually see some 349 00:14:42,676 --> 00:14:43,606 juicy information. 350 00:14:43,606 --> 00:14:46,176 So let me scroll back up to the very start of this 351 00:14:46,536 --> 00:14:47,926 and you'll see here's where we started. 352 00:14:47,926 --> 00:14:49,236 This is where I ran the command 353 00:14:49,236 --> 00:14:51,626 and now we have some copyright information and stuff like that 354 00:14:51,826 --> 00:14:55,336 but here is where the helpful information is, invalid write 355 00:14:55,556 --> 00:14:58,106 of size 4 at such and such an address. 356 00:14:58,366 --> 00:14:59,886 Now this pointer, this address is 357 00:14:59,886 --> 00:15:01,986 on the left hand side are generally not all 358 00:15:01,986 --> 00:15:04,966 that useful unless we really get low level with our debugging 359 00:15:05,206 --> 00:15:07,786 but it does mean that you did something wrong at that address. 360 00:15:08,026 --> 00:15:10,886 Specifically at line 23 of what file? 361 00:15:12,286 --> 00:15:13,506 Right, so memory dotC, right? 362 00:15:13,506 --> 00:15:16,326 So the output is actually quite similar to gdb and so 363 00:15:16,326 --> 00:15:21,696 if I go back to line 23, which is this guy here sure enough, 364 00:15:21,696 --> 00:15:25,366 Valgrind has noticed that I wrote 4 bytes erroneously there. 365 00:15:25,596 --> 00:15:27,876 Now it's still up to me, the human, to determine 366 00:15:27,876 --> 00:15:28,886 where is my mistake 367 00:15:28,886 --> 00:15:30,856 and of course we already identified the mistake 368 00:15:30,856 --> 00:15:32,096 but the program found it. 369 00:15:32,256 --> 00:15:33,626 Even though when I ran this program, 370 00:15:33,626 --> 00:15:35,026 nothing appeared to be wrong. 371 00:15:35,226 --> 00:15:37,786 Now lastly, this here says-- 372 00:15:37,786 --> 00:15:41,916 this actually being address is 0 bytes-- actually, that's okay. 373 00:15:41,916 --> 00:15:44,896 Let me scroll down here and now if we go down here, 374 00:15:44,896 --> 00:15:49,026 leak summary definitely lost 40 bytes in 1 block. 375 00:15:49,366 --> 00:15:51,256 Now, this might be a much bigger program. 376 00:15:51,256 --> 00:15:52,256 Right now it's pretty simple. 377 00:15:52,256 --> 00:15:54,756 It's only a few lines so I can actually run Valgrind 378 00:15:54,756 --> 00:15:57,746 with more switches and notice it's telling me to rerun 379 00:15:57,746 --> 00:16:00,476 with hyphen, hyphen, leak, hyphen, check equals full 380 00:16:00,666 --> 00:16:02,476 to see details of leaked memory. 381 00:16:02,696 --> 00:16:04,346 And in fact if I go back to my slide here, 382 00:16:04,346 --> 00:16:06,216 what you should generally run even though it's a little 383 00:16:06,216 --> 00:16:08,106 tedious to type is exactly this. 384 00:16:08,226 --> 00:16:11,476 Valgrind-v-- so I'm gonna try this, Valgrind-V. 385 00:16:12,256 --> 00:16:15,176 -V almost always means robust for a Linux program. 386 00:16:15,486 --> 00:16:17,826 Some of these switches take hyphen, 387 00:16:17,826 --> 00:16:19,356 hyphen not single hyphens. 388 00:16:19,616 --> 00:16:22,506 The convention just FYI is that if it's a single letter switch, 389 00:16:22,506 --> 00:16:23,756 it's usually a single hyphen. 390 00:16:23,986 --> 00:16:26,076 If it's a longer word like leak, 391 00:16:26,256 --> 00:16:27,566 then you would actually use hyphen, 392 00:16:27,566 --> 00:16:29,456 hyphen but that's just the human convention. 393 00:16:29,456 --> 00:16:33,436 So leak check equals full and then I have to say 8 dot 394 00:16:33,436 --> 00:16:35,166 out in general but in the concrete case, 395 00:16:35,166 --> 00:16:36,356 it's the program called memory. 396 00:16:36,436 --> 00:16:38,606 So let me hit Enter now and oh my God. 397 00:16:38,726 --> 00:16:41,176 So much more flew across the screen there 398 00:16:41,176 --> 00:16:44,506 but let's scroll up-- scroll up and here we go. 399 00:16:45,126 --> 00:16:47,816 Now because I ran it with more robust output 400 00:16:47,816 --> 00:16:49,086 and I frankly knew where to look. 401 00:16:49,086 --> 00:16:51,276 You would kind to have to take a few seconds to sift through this 402 00:16:51,566 --> 00:16:55,176 but notice, 40 bytes in 1 block are definitely lost 403 00:16:55,176 --> 00:16:56,506 and lost record 1 of 1. 404 00:16:56,726 --> 00:16:58,556 I don't quite know what that means but it looks 405 00:16:58,556 --> 00:17:02,546 like the bug is in a function called malloc, right? 406 00:17:03,066 --> 00:17:04,526 >> All right, so probably not, right? 407 00:17:04,526 --> 00:17:06,246 So you've called malloc so let's 408 00:17:06,246 --> 00:17:09,036 at least chase this bug back a little further in the stack 409 00:17:09,486 --> 00:17:11,096 and what was the previous function called? 410 00:17:11,406 --> 00:17:13,916 Okay, it's probably me who screwed up so it's somewhere 411 00:17:13,916 --> 00:17:17,866 in memory dot C line 22, a function called F was called 412 00:17:17,866 --> 00:17:19,916 and somewhere in there 40 bytes were lost. 413 00:17:20,176 --> 00:17:22,156 Of course in a program, this small, it's pretty obvious 414 00:17:22,186 --> 00:17:25,076 to the reader what I did wrong which is what? 415 00:17:25,076 --> 00:17:26,356 How did I lose these 40 bytes? 416 00:17:27,476 --> 00:17:28,276 I didn't call free. 417 00:17:28,346 --> 00:17:30,496 I should have called free at some point 418 00:17:30,786 --> 00:17:32,746 and of course this would be kind of a silly exercise 419 00:17:32,746 --> 00:17:35,756 to just do it now because this is again kind of a bogus program 420 00:17:35,996 --> 00:17:37,696 but that would be the correct fix. 421 00:17:37,696 --> 00:17:38,946 Just hopefully in the real world, 422 00:17:38,946 --> 00:17:40,226 if I'm writing some real program, 423 00:17:40,456 --> 00:17:43,346 there would be some useful stuff going on in between those lines. 424 00:17:43,346 --> 00:17:44,846 But this one is just completely wrong. 425 00:17:44,846 --> 00:17:47,716 Going into bracket 10 is just incorrect. 426 00:17:48,086 --> 00:17:51,806 So in short, unglamorous program but a useful tool. 427 00:17:51,806 --> 00:17:54,386 So henceforth anytime you write any program whether 428 00:17:54,386 --> 00:17:56,936 for [inaudible] at five or onward that uses malloc, 429 00:17:56,936 --> 00:17:59,916 that uses arrays in any way do run your program 430 00:17:59,916 --> 00:18:00,616 through that tool. 431 00:18:00,616 --> 00:18:02,476 Take a few seconds to sift through the output 432 00:18:02,646 --> 00:18:05,866 and it might find a bug frankly before your TF does. 433 00:18:06,936 --> 00:18:07,906 Any questions? 434 00:18:09,806 --> 00:18:13,556 All right, so now we have the opportunity to start 435 00:18:13,676 --> 00:18:16,666 to implement with the same fundamentals from the first half 436 00:18:16,666 --> 00:18:20,356 of the semester some real world constructs so that we can begin 437 00:18:20,356 --> 00:18:23,456 to solve more interesting, more challenging problems 438 00:18:23,636 --> 00:18:27,546 with more sophisticated tools than just an array or variable. 439 00:18:27,656 --> 00:18:31,426 So this is a photograph from my favorite [inaudible] and this is 440 00:18:31,426 --> 00:18:33,166 of course a stack of trays. 441 00:18:33,166 --> 00:18:36,706 Well it turns out in computer science a stack is a 442 00:18:36,706 --> 00:18:37,606 data structure. 443 00:18:37,606 --> 00:18:39,476 It's a data structure that allows you 444 00:18:39,476 --> 00:18:41,876 to actually store data and potentially 445 00:18:41,876 --> 00:18:45,536 in more useful way then say a simple array does. 446 00:18:45,566 --> 00:18:47,856 The idea of a stack just like in the real world is 447 00:18:47,886 --> 00:18:50,546 that if you have a stack of trays, there are some 448 00:18:50,546 --> 00:18:52,026 up sides and some down sides. 449 00:18:52,166 --> 00:18:56,286 One upside of a stack is that if I put something down like a tray 450 00:18:56,286 --> 00:18:58,636 and then I put something else down like a tray, 451 00:18:58,846 --> 00:19:01,796 then I put something else down like a tray well it's very easy 452 00:19:01,796 --> 00:19:05,596 and very fast to get the most recent thing from that stack. 453 00:19:05,886 --> 00:19:08,516 You can just pop it off the top of the stack so to speak. 454 00:19:08,776 --> 00:19:11,526 Unfortunately, a stack of trays or a stack 455 00:19:11,646 --> 00:19:15,036 of paper here is not the most efficient system if you instead 456 00:19:15,036 --> 00:19:16,486 or for instance queuing 457 00:19:16,486 --> 00:19:18,206 up for something you really want, right. 458 00:19:18,206 --> 00:19:20,286 It would kind of suck if you got to the Apple store 459 00:19:20,286 --> 00:19:24,736 like 5 am before or even earlier to get your fancy new iPhone 3 460 00:19:24,736 --> 00:19:27,266 or 4 or whatever and the staff 461 00:19:27,266 --> 00:19:29,416 in blue shirts starts popping people off the [inaudible] queue 462 00:19:29,496 --> 00:19:31,116 from the back, right? 463 00:19:31,116 --> 00:19:32,656 That is a bad data structure. 464 00:19:32,656 --> 00:19:35,476 A bad design when it comes to selling iPhones. 465 00:19:35,476 --> 00:19:37,446 You would really piss off the people at the front of the line. 466 00:19:37,686 --> 00:19:39,046 So instead in computer science, 467 00:19:39,046 --> 00:19:41,116 there's an alternative data structure that's similar 468 00:19:41,116 --> 00:19:41,656 in spirit. 469 00:19:41,656 --> 00:19:43,206 It's kind of a list of some sort 470 00:19:43,566 --> 00:19:45,726 but it's called technically a queue. 471 00:19:45,726 --> 00:19:49,206 And a queue is first in first out. 472 00:19:49,366 --> 00:19:51,716 So the acronym is fifo, FIFO. 473 00:19:51,716 --> 00:19:54,226 First in first out just means if you get there first 474 00:19:54,226 --> 00:19:56,226 in the morning, you're gonna be the first one allowed 475 00:19:56,226 --> 00:19:57,886 to buy that product. 476 00:19:57,886 --> 00:20:00,486 By contrast a stack is the opposite 477 00:20:00,746 --> 00:20:03,376 but we'll see utility of both of them. 478 00:20:03,376 --> 00:20:06,076 In fact, those of you familiar with HTML know 479 00:20:06,076 --> 00:20:09,036 that the prevalence in HTML is the idea of balancing tags, 480 00:20:09,036 --> 00:20:11,536 having an open tag, and a closed tag, and the notion 481 00:20:11,536 --> 00:20:13,716 of validating HTML involves checking. 482 00:20:13,716 --> 00:20:16,056 Do you close all of the tags that you opened 483 00:20:16,316 --> 00:20:19,066 or you can actually use it turns out a stack data structure 484 00:20:19,366 --> 00:20:22,226 to answer questions like that if you are the W3c's Validator. 485 00:20:22,226 --> 00:20:24,196 And for those for whom most of those words made no sense, 486 00:20:24,446 --> 00:20:26,516 not to fear just a couple of weeks, we'll be diving 487 00:20:26,516 --> 00:20:29,226 into the web programming portion of the course. 488 00:20:29,416 --> 00:20:31,876 But to begin to now use these data structures 489 00:20:31,876 --> 00:20:34,926 and solve more interesting problems, we need to be able 490 00:20:35,106 --> 00:20:39,046 to chain structures together. 491 00:20:39,406 --> 00:20:42,316 One of the biggest problems of an array is that as-- 492 00:20:42,316 --> 00:20:45,036 even though it's super simple to use once you get to hang of it 493 00:20:45,036 --> 00:20:46,796 and that you can go to any elements in it. 494 00:20:46,856 --> 00:20:51,226 What can you not do easily with an array? 495 00:20:51,226 --> 00:20:51,456 [ Inaudible Remark ] 496 00:20:51,456 --> 00:20:54,256 >> Sorry? You can't extend it right. 497 00:20:54,256 --> 00:20:57,936 You have to decide in advance if you want 10 bytes or 100 bytes 498 00:20:57,936 --> 00:20:59,936 or whatever it is that you are allocating. 499 00:21:00,176 --> 00:21:01,476 You have to decide in advance. 500 00:21:01,476 --> 00:21:03,036 And what happens if you decide, 501 00:21:03,036 --> 00:21:06,276 I really need 1 additional element or twice 502 00:21:06,276 --> 00:21:08,536 as many elements what do you have to do 503 00:21:08,536 --> 00:21:10,166 if you're using arrays to store your data. 504 00:21:10,886 --> 00:21:12,126 What's that? 505 00:21:12,126 --> 00:21:12,306 [ Inaudible Remark ] 506 00:21:12,306 --> 00:21:16,166 >> Yeah. You pretty much have to make a new one. 507 00:21:16,356 --> 00:21:20,276 Copy the old data into the new one and then free the old one. 508 00:21:20,416 --> 00:21:22,096 Now, we haven't done that ourselves really 509 00:21:22,096 --> 00:21:23,616 but we've been using something that does. 510 00:21:23,616 --> 00:21:24,866 What function that we've been using 511 00:21:24,866 --> 00:21:27,966 since week 1 does exactly that, when it runs out of space, 512 00:21:27,966 --> 00:21:31,236 it allocates more, copies old into new and then frees the old. 513 00:21:31,236 --> 00:21:31,586 >> GetString. 514 00:21:32,636 --> 00:21:33,736 >> GetString, exactly. 515 00:21:33,736 --> 00:21:35,536 Remember GetString is a perfect example 516 00:21:35,536 --> 00:21:38,416 because we have no idea how long of a sentence 517 00:21:38,416 --> 00:21:41,406 or paragraph some students ever gonna type when we sit 518 00:21:41,406 --> 00:21:44,026 down at the start of the semester to implement GetString. 519 00:21:44,246 --> 00:21:45,766 So we have to be dynamic. 520 00:21:45,766 --> 00:21:47,336 We have to enable the function 521 00:21:47,526 --> 00:21:50,276 to grow the storage space it's using and we've seen 522 00:21:50,276 --> 00:21:52,746 as of 2 weeks ago that the technique 523 00:21:52,746 --> 00:21:55,236 for that involves using malloc and it turns 524 00:21:55,236 --> 00:21:57,996 out some other related functions like realloc and the like 525 00:21:58,286 --> 00:22:01,406 but it boils down to allocating and reallocating memory. 526 00:22:01,656 --> 00:22:05,496 So what if there were a data structure that's allowed you 527 00:22:05,536 --> 00:22:07,666 to have some fixed amount of memory first 528 00:22:07,836 --> 00:22:10,806 but then grow it super easily because the problem 529 00:22:10,806 --> 00:22:12,696 with using the array trick, is what? 530 00:22:12,696 --> 00:22:15,736 If your string is this big and you need 1 more byte or twice 531 00:22:15,736 --> 00:22:17,806 as many bytes, this is a lot of work, right? 532 00:22:17,806 --> 00:22:19,746 You have to ask the operating system for more memory. 533 00:22:19,746 --> 00:22:23,096 So temporarily, you're doubling your memory footprints then you 534 00:22:23,096 --> 00:22:26,416 have to copy as in a 4 or Y loop all of the old memory 535 00:22:26,766 --> 00:22:29,406 into the new memory, then you have to free that memory. 536 00:22:29,406 --> 00:22:30,276 That's a lot of steps. 537 00:22:30,506 --> 00:22:33,396 If the user has may be hit 1 extra character. 538 00:22:33,656 --> 00:22:34,726 So what's an alternative? 539 00:22:34,726 --> 00:22:37,376 You can avoid resizing arrays in this manner 540 00:22:37,536 --> 00:22:40,126 by just allocating maybe 2 gigabytes of memory. 541 00:22:40,126 --> 00:22:42,136 Any time the user might type something in, 542 00:22:42,446 --> 00:22:45,266 I'll just choose a huge upper bound, 2 gigabytes 543 00:22:45,266 --> 00:22:47,456 but of course what's the obvious problem here? 544 00:22:47,456 --> 00:22:48,086 [ Inaudible Remark ] 545 00:22:48,086 --> 00:22:52,666 >> Right, if you just allocate an excessive amount of memory, 546 00:22:52,696 --> 00:22:55,106 your program at some point is just gonna stop working 547 00:22:55,106 --> 00:22:58,356 or it's gonna slow the computer to a crawl because it appears 548 00:22:58,356 --> 00:23:00,446 to be using way more memory that it should 549 00:23:00,586 --> 00:23:02,576 and if you only have a couple gigabytes of RAM 550 00:23:02,806 --> 00:23:04,886 and your silly little program is using it 551 00:23:04,886 --> 00:23:06,416 when you're actually trying to do useful work 552 00:23:06,416 --> 00:23:08,146 with like a browser or word processor, 553 00:23:08,406 --> 00:23:09,526 you know everything will just slow 554 00:23:09,526 --> 00:23:11,126 to a crawl as we've discussed. 555 00:23:11,366 --> 00:23:13,096 So suppose there were a structure 556 00:23:13,096 --> 00:23:14,876 that looked a little something more like this. 557 00:23:15,346 --> 00:23:17,656 Enter into the picture something called a linked list 558 00:23:18,246 --> 00:23:19,746 which is pretty much what the name suggests. 559 00:23:19,856 --> 00:23:23,126 It is a linked list of what are generally called nodes. 560 00:23:23,126 --> 00:23:23,786 What's a node? 561 00:23:24,026 --> 00:23:25,566 For now, it's just a chunk of memory 562 00:23:25,776 --> 00:23:27,906 and we can implement these little chunks by way 563 00:23:27,906 --> 00:23:29,936 of that C construct known as a struct. 564 00:23:30,406 --> 00:23:32,266 So where have we used the struct before thus far? 565 00:23:33,146 --> 00:23:38,286 What comes to mind? 566 00:23:38,286 --> 00:23:38,986 [ Inaudible Remark ] 567 00:23:38,986 --> 00:23:39,826 >> Sudoku. 568 00:23:39,826 --> 00:23:43,546 So we had a struct in sudoku to represent things like the board 569 00:23:43,546 --> 00:23:45,166 and the y, x location. 570 00:23:45,466 --> 00:23:49,646 We had something for students a couple of weeks ago 571 00:23:49,646 --> 00:23:51,586 where we wrapped together students name 572 00:23:51,636 --> 00:23:52,986 and ID number and house. 573 00:23:53,236 --> 00:23:54,436 So any time you wanna kind 574 00:23:54,436 --> 00:23:57,256 of clump together some related data fields 575 00:23:57,256 --> 00:23:58,376 and call it something new 576 00:23:58,376 --> 00:24:01,236 like a student structure, you can do that. 577 00:24:01,386 --> 00:24:03,846 So these squares in just a moment are gonna represent what 578 00:24:03,846 --> 00:24:07,496 we'll generically call a node and each of these squares 579 00:24:07,496 --> 00:24:11,266 in this picture here stores a number, apparently an int, a 9, 580 00:24:11,266 --> 00:24:16,276 a 17, a 22, a 26, and a 34 and this is equivalent 581 00:24:16,376 --> 00:24:19,756 for the moment to an array of size 5. 582 00:24:20,106 --> 00:24:23,486 So I see now 5 numbers 9, 17, 22, 26, 34. 583 00:24:23,656 --> 00:24:26,736 so in effect, I could implement this very easily with week 1-- 584 00:24:26,906 --> 00:24:29,446 with week 2 stuff with an array of size 5. 585 00:24:29,856 --> 00:24:33,436 But I propose that because I've now drawn the picture 586 00:24:33,436 --> 00:24:36,276 in this way using arrows and not by having these numbers back 587 00:24:36,276 --> 00:24:38,296 to back to back as is the case for arrays, 588 00:24:38,296 --> 00:24:40,956 it's gonna be super easy conceptually 589 00:24:41,226 --> 00:24:44,766 to insert more numbers into this, to append them to the end, 590 00:24:44,766 --> 00:24:46,186 to insert them into the beginning 591 00:24:46,376 --> 00:24:47,496 and there too is an advantage. 592 00:24:47,496 --> 00:24:48,866 What was really hard in an array? 593 00:24:49,246 --> 00:24:51,226 Remember when we had a whole bunch of people up here 594 00:24:51,226 --> 00:24:53,436 on the stage when we were doing sorting and we had 595 00:24:53,436 --> 00:24:56,126 to move someone who is over there, all the way over here. 596 00:24:56,316 --> 00:24:59,156 In the worst case, we had to ask everyone, "Can you move down? 597 00:24:59,156 --> 00:24:59,936 Can you move down?" 598 00:25:00,236 --> 00:25:04,046 So just to make 1 swap in the worst case, with an array we had 599 00:25:04,046 --> 00:25:07,906 to do like N-1 steps of work, but suppose the picture were 600 00:25:07,906 --> 00:25:09,926 like this and suppose that in 601 00:25:09,926 --> 00:25:12,546 between each number was literally a gap 602 00:25:12,546 --> 00:25:16,876 and there's some concept of an arrow saying the next number is 603 00:25:16,876 --> 00:25:18,036 over there-- the next number is 604 00:25:18,036 --> 00:25:21,306 over there well maybe we could just have squeezed that students 605 00:25:21,306 --> 00:25:24,916 and their number into wherever we wanted in this list. 606 00:25:25,636 --> 00:25:29,816 So we just need a way of linking nodes together in this way. 607 00:25:29,816 --> 00:25:32,456 So odds are these arrows represent what? 608 00:25:32,456 --> 00:25:32,886 [ Inaudible Remark ] 609 00:25:32,886 --> 00:25:35,706 >> Yeah, so pointers right. 610 00:25:35,706 --> 00:25:37,496 This is how we've been generically drawing pointers 611 00:25:37,496 --> 00:25:40,486 on the blackboard and so really and underneath the hood, 612 00:25:40,486 --> 00:25:41,716 there are some memory addresses 613 00:25:41,716 --> 00:25:43,346 and hexadecimal notation and all of that. 614 00:25:43,716 --> 00:25:45,436 But conceptually, this is a linked list. 615 00:25:45,836 --> 00:25:47,836 It is something like an array but instead 616 00:25:47,836 --> 00:25:50,236 of having contiguous memory, your nodes, 617 00:25:50,236 --> 00:25:53,176 your integers can be stored all over you computer's memory 618 00:25:53,176 --> 00:25:54,846 which is great because then you don't need 619 00:25:54,846 --> 00:25:56,866 to find a huge contiguous block. 620 00:25:56,866 --> 00:26:00,726 It can come all fragmented like from the entirety of your RAM 621 00:26:01,036 --> 00:26:04,756 and it looks like we'll be able to more easily insert numbers 622 00:26:04,816 --> 00:26:08,626 into these lists because we can just temporarily move up 1 623 00:26:08,626 --> 00:26:11,766 of those arrows, slot someone else and another number in there 624 00:26:12,036 --> 00:26:14,936 and then fix what the arrows are pointing at. 625 00:26:15,176 --> 00:26:17,656 So let's just take a look at what this might look like. 626 00:26:17,656 --> 00:26:22,496 I'm gonna go ahead and open a file called let's say list 1.h 627 00:26:23,046 --> 00:26:26,306 and I'll propose this as the container for 1 628 00:26:26,306 --> 00:26:27,436 of these things called nodes. 629 00:26:27,436 --> 00:26:29,916 So we've seen the syntax before even though we haven't written 630 00:26:29,916 --> 00:26:31,036 it ourselves all that much. 631 00:26:31,296 --> 00:26:32,576 Type def does what? 632 00:26:32,576 --> 00:26:32,916 [ Inaudible Remark ] 633 00:26:32,916 --> 00:26:37,686 >> If you kind of wanna be a student, 634 00:26:37,686 --> 00:26:39,006 you just say well it defines a type. 635 00:26:39,006 --> 00:26:40,366 Well, what is that it makes a synonym. 636 00:26:40,366 --> 00:26:42,196 1 type is a synonym for something else. 637 00:26:42,546 --> 00:26:43,986 So let's see what this means here. 638 00:26:44,206 --> 00:26:47,346 So we see the keyword struct and we see the word node. 639 00:26:47,596 --> 00:26:49,826 So G that is being friendly and highlighting all 640 00:26:49,826 --> 00:26:51,396 of the C specific words. 641 00:26:51,396 --> 00:26:53,446 So type def in struct are C key words. 642 00:26:53,696 --> 00:26:55,116 Node is just something I made up. 643 00:26:55,116 --> 00:26:56,176 It could be called foo 644 00:26:56,416 --> 00:26:58,306 but by convention we call these things nodes 645 00:26:58,516 --> 00:27:01,196 and then what composes a node? 646 00:27:01,196 --> 00:27:02,896 What is in one of those rectangles? 647 00:27:02,896 --> 00:27:03,566 Well, 2 things. 648 00:27:03,566 --> 00:27:06,456 One is an int and I can arbitrarily call it N 649 00:27:06,556 --> 00:27:08,006 but I could call it anything I want 650 00:27:08,396 --> 00:27:11,396 but then the curious feature is the second thing. 651 00:27:11,396 --> 00:27:13,976 The bottom half of this picture here. 652 00:27:14,386 --> 00:27:17,866 So for instance, here is N, the number 9, these fields here 653 00:27:17,866 --> 00:27:20,356 which I'm apparently gonna call next what is that? 654 00:27:21,156 --> 00:27:22,436 Well, if it's an arrow 655 00:27:22,436 --> 00:27:25,266 that suggests pictorially that it's a pointer. 656 00:27:25,476 --> 00:27:26,666 How do we write a pointer? 657 00:27:26,666 --> 00:27:30,236 Well with a star somehow but what is this a pointer to? 658 00:27:30,236 --> 00:27:32,906 Well, it's a pointer to the same kind of structure. 659 00:27:33,316 --> 00:27:37,986 So we say, struct node star and then we choose an arbitrary word 660 00:27:37,986 --> 00:27:40,386 for this arrow and I'll call it next just to imply 661 00:27:40,386 --> 00:27:42,206 that it's always pointing to the next object. 662 00:27:42,746 --> 00:27:44,986 And then down here, I have the word node again 663 00:27:44,986 --> 00:27:48,186 which feels a little redundant and it is in some sense. 664 00:27:48,896 --> 00:27:51,386 So at the end of this statement is the word node, 665 00:27:51,646 --> 00:27:53,436 that relates to the type def. 666 00:27:53,876 --> 00:27:58,456 So what this word and this word together mean is henceforth call 667 00:27:58,456 --> 00:28:01,386 this whole thing more succinctly node. 668 00:28:01,806 --> 00:28:05,876 But remember that C is not so generous. 669 00:28:05,876 --> 00:28:08,346 It doesn't look ahead in your code and so we kind 670 00:28:08,346 --> 00:28:09,926 of have a problem because if I want 671 00:28:09,926 --> 00:28:14,046 to create this pointer inside of 1 of these structures to another 672 00:28:14,046 --> 00:28:16,656 such structure, does the word node exists 673 00:28:16,656 --> 00:28:18,686 at this point in time? 674 00:28:18,686 --> 00:28:19,776 Not really, right. 675 00:28:19,776 --> 00:28:22,346 The word node is literally declared here. 676 00:28:22,696 --> 00:28:25,056 Well, wait a minute you also said node here 677 00:28:25,416 --> 00:28:26,856 but this is again just because. 678 00:28:27,116 --> 00:28:30,466 I need temporarily a name so that I can refer 679 00:28:30,466 --> 00:28:34,566 to myself inside of this structure so long story short, 680 00:28:34,716 --> 00:28:38,556 struct node creates this thing and then the combination 681 00:28:38,556 --> 00:28:41,776 of type def and node here rename it more succinctly 682 00:28:41,776 --> 00:28:43,956 from struct node to just node. 683 00:28:44,166 --> 00:28:45,486 Why? Why this complexity? 684 00:28:45,486 --> 00:28:47,756 Frankly it's just tedious typing struct node, struct node, 685 00:28:47,756 --> 00:28:50,596 struct node all over the place so now we can just call it node. 686 00:28:50,756 --> 00:28:51,816 Thanks to type def. 687 00:28:51,816 --> 00:28:54,066 But if that's a little unclear syntactically that's fine 688 00:28:54,066 --> 00:28:57,196 for now just take for granted that it makes one 689 00:28:57,196 --> 00:28:59,136 of these rectangles possible. 690 00:28:59,586 --> 00:29:02,046 So how do we now start to manipulate these things? 691 00:29:02,046 --> 00:29:04,536 Well, let's rewind just a moment to an example 692 00:29:04,536 --> 00:29:05,746 that we already looked at in the past. 693 00:29:06,096 --> 00:29:08,316 Recall this thing from a couple of weeks ago. 694 00:29:08,596 --> 00:29:12,296 This was a structure that we used to define a student 695 00:29:12,296 --> 00:29:14,196 and there's one syntactic difference. 696 00:29:14,196 --> 00:29:15,966 What is fundamentally different 697 00:29:16,216 --> 00:29:19,836 about this is there's no word here to the right struct. 698 00:29:19,966 --> 00:29:23,806 Why would I not need to say type def struct student using the 699 00:29:23,806 --> 00:29:28,346 logic from a moment ago? 700 00:29:28,346 --> 00:29:29,226 [ Simultaneous Talking ] 701 00:29:29,226 --> 00:29:30,586 >> Exactly and let me tweak the word. 702 00:29:30,766 --> 00:29:33,396 I don't mention student inside of the struct, 703 00:29:33,546 --> 00:29:36,376 so I don't need a temporary word there, but I could put this here 704 00:29:36,606 --> 00:29:38,106 but again just as a matter of style 705 00:29:38,106 --> 00:29:40,596 because I'm not using struct student anywhere. 706 00:29:40,756 --> 00:29:42,476 I might as well make it more succinct. 707 00:29:42,476 --> 00:29:44,216 I do need to say struct and it's in red 708 00:29:44,216 --> 00:29:46,846 because it is an important key word here but I don't need 709 00:29:46,846 --> 00:29:49,166 to say students up here unless I'm gonna refer 710 00:29:49,166 --> 00:29:51,986 to a student internally and this hopefully makes a little bit 711 00:29:51,986 --> 00:29:52,906 of intuitive sense right? 712 00:29:52,906 --> 00:29:55,556 If I'm a student object, there should fundamentally 713 00:29:55,556 --> 00:29:58,096 in the real world and in programming be no notion 714 00:29:58,096 --> 00:30:00,366 of next inside of a student's object. 715 00:30:00,366 --> 00:30:03,036 Right? I shouldn't as a human have some kind 716 00:30:03,036 --> 00:30:05,926 of data remembrance of someone to the left of me 717 00:30:06,236 --> 00:30:07,586 so again we don't need a pointer there 718 00:30:07,586 --> 00:30:09,596 but we do have two pointers for name and house. 719 00:30:10,026 --> 00:30:11,106 So how did we use this? 720 00:30:11,106 --> 00:30:14,516 Well recall students structs 1 dot C which we looked 721 00:30:14,516 --> 00:30:17,536 at briefly awhile ago and this was just a simple example 722 00:30:17,806 --> 00:30:20,736 where we created a bunch of structs of type students. 723 00:30:21,046 --> 00:30:25,146 We then got the user's ID and the user's name and their house 724 00:30:25,146 --> 00:30:28,656 and then we scrolled down and just yelled out so and so is 725 00:30:28,656 --> 00:30:30,016 in [inaudible] if they were in [inaudible]. 726 00:30:30,756 --> 00:30:32,566 So recall this was from a couple of weeks ago 727 00:30:32,566 --> 00:30:35,166 and there's only a couple of important take aways for now. 728 00:30:35,716 --> 00:30:38,316 For now, recall that this is how we declare a-- 729 00:30:38,616 --> 00:30:42,296 in array of students and for those 730 00:30:42,296 --> 00:30:44,366 with prior programming experience class means 731 00:30:44,366 --> 00:30:46,936 like a class of students, not a class of objects. 732 00:30:47,246 --> 00:30:50,096 So student class gives me an array called class 733 00:30:50,656 --> 00:30:52,876 of student structures here. 734 00:30:53,376 --> 00:30:55,696 So what goes on here, this was the new syntax. 735 00:30:55,956 --> 00:30:59,526 If I wanna update the 8th student's ID number, 736 00:30:59,526 --> 00:31:00,586 I used the dot syntax. 737 00:31:01,056 --> 00:31:03,216 If I wanna update the 8th student's name, 738 00:31:03,276 --> 00:31:05,416 I use the dot syntax and if I wanna use the-- 739 00:31:05,416 --> 00:31:08,456 update the 8th student's house, dot syntax. 740 00:31:08,486 --> 00:31:09,146 That was new. 741 00:31:09,466 --> 00:31:11,026 Then this was kind of old school stuff, 742 00:31:11,026 --> 00:31:12,946 strcmp for string comparison. 743 00:31:12,946 --> 00:31:14,806 What was I comparing, I was comparing their house 744 00:31:14,856 --> 00:31:17,636 against literally the word [inaudible] in double quote 745 00:31:17,636 --> 00:31:19,306 and then this was important here. 746 00:31:19,526 --> 00:31:21,426 I did need to, for good measure, 747 00:31:21,426 --> 00:31:24,156 call free on house and name, why? 748 00:31:25,356 --> 00:31:27,546 Because what, sorry? 749 00:31:27,546 --> 00:31:27,736 [ Inaudible Remark ] 750 00:31:27,736 --> 00:31:32,146 >> Because I stored it but how did I get those strings, 751 00:31:32,366 --> 00:31:32,956 name and house? 752 00:31:34,566 --> 00:31:36,886 So you getstring and even though this was kind 753 00:31:36,886 --> 00:31:39,936 of some garbage we used to kind of brush under the rug, 754 00:31:39,936 --> 00:31:41,986 this was kind of a flaw in our previous programs 755 00:31:41,986 --> 00:31:44,306 for several weeks which revealed a couple of weeks ago 756 00:31:44,306 --> 00:31:46,956 that get string all this time has of course been using malloc 757 00:31:47,146 --> 00:31:49,846 and so now anytime you use malloc whether directly 758 00:31:49,846 --> 00:31:52,386 or indirectly, you must free that memory. 759 00:31:52,386 --> 00:31:54,656 Otherwise, Valgrind, this tool I showed a moment ago 760 00:31:54,796 --> 00:31:56,066 that would actually catch. 761 00:31:56,066 --> 00:31:58,976 If you run Valgrind in fact on most of your prior programs 762 00:31:58,976 --> 00:32:00,796 that you getstring from early in the semester, 763 00:32:00,956 --> 00:32:03,066 almost all of them would be buggy 764 00:32:03,246 --> 00:32:05,536 in that they would all have some memory leak. 765 00:32:06,026 --> 00:32:07,956 Alright, but that's okay, that was deliberate 766 00:32:08,016 --> 00:32:09,416 and those were the so called training wheels 767 00:32:09,416 --> 00:32:10,456 that are now off. 768 00:32:10,816 --> 00:32:14,426 So let's just do one modification of this to kind 769 00:32:14,616 --> 00:32:17,306 of tie this together also to this week's problem set 5. 770 00:32:17,306 --> 00:32:20,906 So this is struct2.C. It's almost identical 771 00:32:21,146 --> 00:32:23,646 but I thought it would be kind of neat to stop writing programs 772 00:32:23,686 --> 00:32:26,106 that take an input and then produce output 773 00:32:26,106 --> 00:32:27,596 and then lose that output forever. 774 00:32:27,596 --> 00:32:29,326 It would kind of be nice if our program 775 00:32:29,556 --> 00:32:32,766 for instance could store permanently the names and IDs 776 00:32:32,766 --> 00:32:35,376 and names of all the students we typed in, right? 777 00:32:35,376 --> 00:32:37,786 If we want to approximate the idea of a database, 778 00:32:38,146 --> 00:32:42,016 a server that stores information long term will be kind of nice. 779 00:32:42,016 --> 00:32:44,996 If I could use these same ideas now and start saving my work 780 00:32:45,106 --> 00:32:47,896 to disk, just like we take for granted in Microsoft word 781 00:32:47,896 --> 00:32:50,506 and the like the ability to save, let's do that ourselves. 782 00:32:50,926 --> 00:32:53,106 So here is exactly the same code. 783 00:32:53,106 --> 00:32:54,726 Give me an array of students structs, 784 00:32:55,166 --> 00:32:58,016 populate them using get int and getstring and getstring 785 00:32:58,646 --> 00:33:00,606 but down here, go ahead and say, 786 00:33:00,606 --> 00:33:02,316 whoever is in [inaudible] just to be cute. 787 00:33:02,656 --> 00:33:05,456 But now let's save these students to disk 788 00:33:05,986 --> 00:33:07,396 which just means save a file. 789 00:33:07,686 --> 00:33:08,726 So how might we do this? 790 00:33:08,796 --> 00:33:10,686 So we could do this in any number of ways. 791 00:33:10,686 --> 00:33:14,496 Microsoft back in the day decided that dot doc format, 792 00:33:14,546 --> 00:33:17,946 d-o-c is going to store your essay and your bold facing 793 00:33:17,946 --> 00:33:19,966 and your table of contents in a certain way. 794 00:33:20,266 --> 00:33:22,156 Similarly, we as programmers, 795 00:33:22,156 --> 00:33:24,696 have to decide how are we gonna store something 796 00:33:24,696 --> 00:33:27,506 like a student ID and name and house perhaps 797 00:33:27,506 --> 00:33:29,206 for multiple people in a file. 798 00:33:29,456 --> 00:33:31,296 So you could actually do this in a bunch of ways 799 00:33:31,296 --> 00:33:33,576 but we decided totally arbitrarily 800 00:33:33,826 --> 00:33:36,316 that we're just gonna store these IDs, names, 801 00:33:36,316 --> 00:33:40,256 and house one per line in a text file just so we can see it. 802 00:33:40,336 --> 00:33:42,456 But this is of course a very unsophisticated approach 803 00:33:42,456 --> 00:33:45,076 but it will allow us to write files to disks. 804 00:33:45,216 --> 00:33:45,936 So let's see this. 805 00:33:46,266 --> 00:33:48,476 The very first line of interest is this thing here. 806 00:33:48,476 --> 00:33:50,906 So F open, you might guess is file open. 807 00:33:51,256 --> 00:33:53,396 We've not had to use this ourselves unless you already 808 00:33:53,396 --> 00:33:55,086 dove into problem set 5 809 00:33:55,336 --> 00:33:57,246 but as the name suggests, it opens a file. 810 00:33:57,246 --> 00:33:58,626 What's the name of that file gonna be, 811 00:33:59,056 --> 00:34:00,556 arbitrarily I called it database. 812 00:34:00,716 --> 00:34:02,896 It could be called foo but again I chose database 813 00:34:03,136 --> 00:34:05,386 and then just take a guess, what does "W" means? 814 00:34:06,586 --> 00:34:07,196 So it means write. 815 00:34:07,556 --> 00:34:10,236 So it means write this file as opposed to reading it. 816 00:34:10,236 --> 00:34:13,726 So you can actually use the same F open function to read files in 817 00:34:13,726 --> 00:34:16,806 but for today, we'll just do it with W. As an aside, 818 00:34:16,806 --> 00:34:20,396 you might see in various books or tutorials, people saying WB 819 00:34:20,686 --> 00:34:25,316 for write binary or write ascii, realize that's on the appliance 820 00:34:25,316 --> 00:34:28,406 and on most modern systems, it suffices just to say R or W, 821 00:34:28,406 --> 00:34:30,576 you don't need to worry about the B, just FYI. 822 00:34:31,206 --> 00:34:32,696 So why do I check for this? 823 00:34:33,076 --> 00:34:35,926 If FP which is just generic notation 824 00:34:35,926 --> 00:34:38,416 for file pointer, if it equals null. 825 00:34:38,416 --> 00:34:41,316 When might something go wrong related 826 00:34:41,316 --> 00:34:42,496 to writing a file to disk? 827 00:34:42,906 --> 00:34:45,306 Why might F open this conjecture return null. 828 00:34:45,306 --> 00:34:45,856 [ Inaudible Remark ] 829 00:34:45,856 --> 00:34:46,006 >> sorry? 830 00:34:46,006 --> 00:34:46,956 [ Inaudible Remark ] 831 00:34:46,956 --> 00:34:49,786 >> So database is missing? 832 00:34:49,786 --> 00:34:51,486 Maybe that file doesn't exist. 833 00:34:51,486 --> 00:34:53,356 In this case, that's gonna be okay cause what's nice 834 00:34:53,356 --> 00:34:55,116 about F open is if you're writing a file 835 00:34:55,116 --> 00:34:56,996 and it doesn't exist, it will create it for you 836 00:34:56,996 --> 00:34:59,626 but a good thought, that would be the answer if we we're trying 837 00:34:59,626 --> 00:35:01,126 to read it back in certainly. 838 00:35:01,226 --> 00:35:02,026 Yeah? 839 00:35:02,596 --> 00:35:03,016 >> The disk is full. 840 00:35:03,016 --> 00:35:04,486 >> Yes, so if your disk is full, right. 841 00:35:04,486 --> 00:35:08,556 If your hard drive is just overflowing with other files 842 00:35:08,556 --> 00:35:10,666 and you just don't have enough space for any new file, 843 00:35:10,666 --> 00:35:15,316 F open could return null, other maybe more technical problem? 844 00:35:16,666 --> 00:35:19,606 If you've done something stupid or accidentally like try-- 845 00:35:19,606 --> 00:35:23,086 change the permissions on some folder or you're in some folder 846 00:35:23,086 --> 00:35:24,706 on the system that doesn't belong to you, 847 00:35:24,706 --> 00:35:26,996 it's someone else's my document's folder for instance, 848 00:35:27,116 --> 00:35:29,996 well F open might return null in that case too. 849 00:35:29,996 --> 00:35:32,356 So in short, if you lack the ability to write a file, 850 00:35:32,396 --> 00:35:33,456 you're just gonna get back null 851 00:35:33,456 --> 00:35:34,556 but what is it you're gonna get back? 852 00:35:35,206 --> 00:35:39,846 Well FP is of type file and it is in all caps just 853 00:35:40,116 --> 00:35:42,726 because someone decided to make it that way and it turns 854 00:35:42,726 --> 00:35:46,386 out a file structure, F-I-L-E in all caps is some kind 855 00:35:46,386 --> 00:35:49,296 of structure not unlike conceptually a student's 856 00:35:49,296 --> 00:35:52,046 structure, inside of which is a bunch of stuff 857 00:35:52,556 --> 00:35:55,596 and that stuff is generally hidden from us but you can think 858 00:35:55,596 --> 00:35:58,166 of a file, kind of like a cassette tape 859 00:35:58,226 --> 00:35:59,516 if you even remember these things. 860 00:35:59,516 --> 00:36:02,496 A cassette tape of course has a whole bunch of tape inside 861 00:36:02,716 --> 00:36:04,456 and when you play a song in a cassette tape 862 00:36:04,456 --> 00:36:07,186 with these 2 things turn and the tape physically moves. 863 00:36:07,366 --> 00:36:10,096 A file on a computer system obviously has no tape 864 00:36:10,336 --> 00:36:13,066 but conceptually you read and write it in the same way. 865 00:36:13,256 --> 00:36:15,966 When you open a file with F open, it's like taking 866 00:36:15,966 --> 00:36:19,886 out a cassette tape and starting at the beginning of that tape 867 00:36:19,886 --> 00:36:22,156 and as you start to read from the cassette tape or read 868 00:36:22,156 --> 00:36:24,776 from the file, it's like advancing the reading head 869 00:36:25,096 --> 00:36:28,166 or the cursor so to speak or the file position indicator 870 00:36:28,166 --> 00:36:31,506 so to speak on that file and if you write to a file, 871 00:36:31,796 --> 00:36:32,996 similarly that happens. 872 00:36:32,996 --> 00:36:35,186 Now this is relevant because in problem set 5, 873 00:36:35,186 --> 00:36:39,886 one of the challenge is to enlarge a bit map file. 874 00:36:40,146 --> 00:36:42,026 So you have to take a pixel and double it 875 00:36:42,026 --> 00:36:43,226 or triple it and so forth. 876 00:36:43,486 --> 00:36:45,776 So realize that one of the challenges that arises 877 00:36:45,776 --> 00:36:49,506 in problem set 5 is, well if a file is being read from start 878 00:36:49,506 --> 00:36:51,916 to end just like a cassette tape, well what if you need 879 00:36:51,916 --> 00:36:54,076 to kind of rewind the tape and go back 880 00:36:54,076 --> 00:36:57,336 and reprint those same pixels, re-output those same pixels, 881 00:36:57,556 --> 00:36:59,396 well there's functions like F seek, 882 00:36:59,536 --> 00:37:01,396 file seek which can help you do this. 883 00:37:01,436 --> 00:37:03,666 There's F rewind although that's typically overkill 884 00:37:03,876 --> 00:37:06,186 but there's other ways altogether as Tommy discussed 885 00:37:06,186 --> 00:37:07,766 in the walk through involving arrays 886 00:37:07,836 --> 00:37:09,556 and just remembering some of those bits. 887 00:37:09,596 --> 00:37:13,566 But in short, for now, this is a pointer to a file structure 888 00:37:13,766 --> 00:37:15,606 and it's not clear to us just 889 00:37:15,606 --> 00:37:18,396 yet what's actually in that file. 890 00:37:19,016 --> 00:37:22,806 I accidentally clicked there so let me go back to structs2 dot C 891 00:37:22,806 --> 00:37:24,076 and we'll finish the story. 892 00:37:24,516 --> 00:37:26,246 So how do you write to this file? 893 00:37:26,246 --> 00:37:29,316 Well if I get as far as here inside of this if condition, 894 00:37:29,576 --> 00:37:31,676 the file opened okay and I'm good to go to write. 895 00:37:31,776 --> 00:37:33,186 So how do I write to a file. 896 00:37:33,426 --> 00:37:34,586 It's actually pretty familiar. 897 00:37:34,886 --> 00:37:38,676 So you don't use F-- you don't print F, use F print F 898 00:37:38,676 --> 00:37:41,976 for file print F. The first argument is apparently a pointer 899 00:37:41,976 --> 00:37:44,506 to the thing that you wanna write to and what are the second 900 00:37:44,506 --> 00:37:45,636 and third arguments apparently. 901 00:37:45,636 --> 00:37:49,186 It's pretty much identical to print F, right. 902 00:37:49,186 --> 00:37:50,216 It's the format string. 903 00:37:50,216 --> 00:37:52,766 What do you wanna write and what you want to substitute 904 00:37:52,766 --> 00:37:55,226 in for any placeholders so apparently, 905 00:37:55,226 --> 00:37:57,876 I'm gonna write integer followed by a new line, 906 00:37:58,056 --> 00:38:01,436 a string new line, string new line and I'm gonna plug in each 907 00:38:01,436 --> 00:38:02,636 of these student's IDs, 908 00:38:02,636 --> 00:38:05,786 names and houses then I'm gonna close my file with F close 909 00:38:05,786 --> 00:38:07,656 as file close then I'm gonna go back 910 00:38:07,826 --> 00:38:09,786 and free the student's name and house. 911 00:38:10,586 --> 00:38:14,056 But one question, I don't free the student's ID number, why? 912 00:38:14,826 --> 00:38:15,646 I don't free ID. 913 00:38:15,646 --> 00:38:16,286 >> That's not a string. 914 00:38:17,716 --> 00:38:20,826 >> It's not a string, it's an int and we did not allocate 915 00:38:20,826 --> 00:38:22,746 that integer with malloc. 916 00:38:22,746 --> 00:38:25,136 It was just there inside of the struct recall. 917 00:38:25,466 --> 00:38:26,776 So let's see what the end result here. 918 00:38:26,776 --> 00:38:29,406 So this is structs2 dot C, let me go ahead 919 00:38:29,406 --> 00:38:32,216 and make structs2, okay. 920 00:38:32,536 --> 00:38:33,816 It seem to compile okay. 921 00:38:33,816 --> 00:38:35,126 Let me go ahead and run this now. 922 00:38:35,656 --> 00:38:39,356 So run structs2, enter, student's ID, 923 00:38:39,456 --> 00:38:43,026 let's Matt Kirkland [phonetic], student's ID, 924 00:38:43,026 --> 00:38:45,356 let's say Rob Kirkland [phonetic], 925 00:38:46,016 --> 00:38:49,976 student's ID 3 David Meather. 926 00:38:50,286 --> 00:38:52,826 So it still says David is in Meather but then nothing, 927 00:38:53,046 --> 00:38:55,976 but if I do LS now, there's a whole bunch of code from today 928 00:38:55,976 --> 00:38:57,726 as well as this file database. 929 00:38:57,796 --> 00:38:59,186 So let me go ahead and run G edit 930 00:38:59,276 --> 00:39:01,676 on database, enter and voila! 931 00:39:01,676 --> 00:39:03,236 We've just created this file. 932 00:39:03,426 --> 00:39:05,776 Again, so primitive this format, right? 933 00:39:05,776 --> 00:39:08,706 This is not what a word document looks like inside. 934 00:39:08,926 --> 00:39:12,526 But it is our own proprietary file format just as bit map 935 00:39:12,526 --> 00:39:15,096 and JPEG have their own file formats and ours too is 936 00:39:15,096 --> 00:39:18,586 in just ascii text which might not be sufficient 937 00:39:18,586 --> 00:39:20,876 if we wanna store something like an ID photo 938 00:39:21,156 --> 00:39:22,886 but at least we now have one way 939 00:39:22,886 --> 00:39:24,586 of expressing ourselves persistently 940 00:39:24,896 --> 00:39:26,356 by writing these things to disk. 941 00:39:26,896 --> 00:39:29,846 And it remains then to use these same fundamentals 942 00:39:30,046 --> 00:39:32,646 to start linking things together in an interesting way. 943 00:39:32,646 --> 00:39:34,776 Let's go ahead and take our 5-minute break now 944 00:39:34,946 --> 00:39:35,876 and then we'll return to that. 945 00:39:37,516 --> 00:39:39,766 [ Pause ] 946 00:39:40,266 --> 00:39:43,986 >> Alright, so we are back. 947 00:39:44,166 --> 00:39:46,876 Linked lists are much more exciting 948 00:39:46,876 --> 00:39:50,076 when we actually make a mess of them with actual humans. 949 00:39:50,076 --> 00:39:53,646 So if we could indulge, are 8 people willing 950 00:39:53,646 --> 00:39:55,056 to volunteer on center stage? 951 00:39:56,016 --> 00:39:57,016 Have to be comfy on camera. 952 00:39:58,486 --> 00:40:01,086 Okay, let me decide this randomly 953 00:40:01,086 --> 00:40:01,866 who is se gonna come up. 954 00:40:01,866 --> 00:40:05,326 Okay here we go, how about you two, three, four, five, six, 955 00:40:05,636 --> 00:40:07,766 like in the side of the room, seven, eight. 956 00:40:08,326 --> 00:40:09,096 Come on up. 957 00:40:09,096 --> 00:40:11,026 All right, you lost your chance. 958 00:40:11,286 --> 00:40:13,596 All right, come on up and we are going 959 00:40:13,596 --> 00:40:17,296 to hand you the same numbers we see here on the screen. 960 00:40:17,296 --> 00:40:19,336 So you will be number nine and if you wanna go 961 00:40:19,336 --> 00:40:25,666 on our usual spot over there welcome 17, welcome 22, 962 00:40:26,516 --> 00:40:35,296 welcome 26, we will fix the slide 963 00:40:35,296 --> 00:40:36,776 to make this consistent with reality. 964 00:40:37,556 --> 00:40:41,186 34, you can be first. 965 00:40:42,246 --> 00:40:43,926 You can be pointer. 966 00:40:44,126 --> 00:40:44,246 >> Okay. 967 00:40:44,576 --> 00:40:47,206 >> You can be predecessor pointer, very dramatic. 968 00:40:47,206 --> 00:40:53,096 All right, so slights bug and this is an image too isn't it. 969 00:40:53,226 --> 00:40:57,006 May I see 20-- who did I get wrong, 970 00:40:57,556 --> 00:41:00,476 29 okay, debugging [laughter]. 971 00:41:01,956 --> 00:41:07,506 One moment please. 972 00:41:08,096 --> 00:41:10,346 Fixed, all right [laughter]. 973 00:41:10,996 --> 00:41:13,786 So that's 26. 974 00:41:13,786 --> 00:41:16,836 Okay, so here we have a linked list and let's ignore 975 00:41:16,836 --> 00:41:18,816 for the moment how we got to this point already. 976 00:41:18,816 --> 00:41:21,086 Let's just assume that we are-we've fast forwarded 10 977 00:41:21,086 --> 00:41:23,146 minutes mentally and we already have the ability 978 00:41:23,146 --> 00:41:25,596 to create linked lists and here is the state of our world 979 00:41:25,596 --> 00:41:27,356 at this moment and actually let's pop 980 00:41:27,356 --> 00:41:29,656 out if you don't mind pointer and predecessor pointer 981 00:41:29,656 --> 00:41:32,426 if you could be here in the heap for just a moment. 982 00:41:32,426 --> 00:41:36,106 So first, this is good and now this is the point 983 00:41:36,106 --> 00:41:37,566 at which we need to really mimic this picture 984 00:41:37,566 --> 00:41:39,566 if you could somewhat awkwardly with your left hand point 985 00:41:39,566 --> 00:41:45,716 at the next person except for 34 where I would point 986 00:41:45,716 --> 00:41:47,196 down with your left hand, okay. 987 00:41:47,656 --> 00:41:50,266 So now we have implemented a linked list with humans. 988 00:41:50,266 --> 00:41:51,176 So what's going on here? 989 00:41:51,176 --> 00:41:53,816 So first of all each of these-- you still have to point. 990 00:41:54,196 --> 00:41:57,196 All right, each of these structures has two things inside 991 00:41:57,196 --> 00:41:57,816 of the struct. 992 00:41:57,816 --> 00:41:59,456 We have an integer like the number 9 993 00:41:59,646 --> 00:42:01,556 and then we also have a left arm known 994 00:42:01,646 --> 00:42:03,316 as next which is the pointer. 995 00:42:03,316 --> 00:42:05,496 So each of these humans represents a struct 996 00:42:05,496 --> 00:42:07,826 at this point with those two fields for a total 997 00:42:07,826 --> 00:42:10,566 of little check how many bytes does each person represent. 998 00:42:11,176 --> 00:42:13,566 How many bytes? 999 00:42:14,166 --> 00:42:14,466 >> Eight. 1000 00:42:14,626 --> 00:42:16,406 >> Eight, right, so four bytes for the int 1001 00:42:16,406 --> 00:42:17,706 and four bytes for the pointer. 1002 00:42:17,706 --> 00:42:19,886 So each of these rectangles is 8 bytes 1003 00:42:19,886 --> 00:42:21,186 on a system like the appliance. 1004 00:42:21,446 --> 00:42:23,986 Now first is a little special, so first what's your name? 1005 00:42:24,206 --> 00:42:24,436 >> Lisa. 1006 00:42:24,586 --> 00:42:27,996 >> Lisa is especial in that she does not have an integer. 1007 00:42:27,996 --> 00:42:31,296 She is only a pointer and so the reason for her being drawn 1008 00:42:31,296 --> 00:42:34,066 as a smaller rectangle up there is that she is only 4 bytes 1009 00:42:34,306 --> 00:42:36,046 but Lisa is super important 1010 00:42:36,046 --> 00:42:38,676 because without Lisa we're gonna forget completely 1011 00:42:38,676 --> 00:42:39,946 where this whole linked list is 1012 00:42:39,946 --> 00:42:43,216 and without Lisa we would have a whole huge memory leak 1013 00:42:43,266 --> 00:42:45,666 if we allocated all of these nodes but didn't remember 1014 00:42:45,666 --> 00:42:47,156 where the first of those nodes is. 1015 00:42:47,326 --> 00:42:49,406 So for now, each-- Lisa is special. 1016 00:42:49,406 --> 00:42:51,816 She is the pointer and Lisa really is our linked list. 1017 00:42:52,206 --> 00:42:54,926 So we've had arrays before and we've had names of arrays 1018 00:42:54,926 --> 00:42:57,036 which are really just the addresses as we've seen 1019 00:42:57,486 --> 00:43:00,306 and so Lisa is sort of the equivalent of that. 1020 00:43:00,306 --> 00:43:03,256 The only piece of data we have to have to have to keep 1021 00:43:03,256 --> 00:43:05,536 around in memory to remember where the entirety 1022 00:43:05,536 --> 00:43:07,986 of our linked list is, is the first person, 1023 00:43:07,986 --> 00:43:09,896 a pointer to the start of the list and now all 1024 00:43:09,896 --> 00:43:11,816 of these guys are fundamentally the same, 1025 00:43:11,816 --> 00:43:14,236 they are incomplete structures except for 34 1026 00:43:14,236 --> 00:43:15,896 who apparently is pointing at the ground 1027 00:43:15,896 --> 00:43:17,336 because that's the end of the list 1028 00:43:17,476 --> 00:43:19,026 so that's probably gonna represent what, 1029 00:43:19,026 --> 00:43:19,706 when pointing down. 1030 00:43:20,636 --> 00:43:23,696 So null, we need a special value to say there is no one else 1031 00:43:23,696 --> 00:43:25,036 to the left so when pointing 1032 00:43:25,036 --> 00:43:27,316 down this represents storing the null pointer, 1033 00:43:27,316 --> 00:43:29,026 all zeros in that pointer field. 1034 00:43:29,296 --> 00:43:31,716 Okay, so the first challenge at hand and we will make use of-- 1035 00:43:31,716 --> 00:43:33,996 you don't technically have to keep standing there like that, 1036 00:43:33,996 --> 00:43:34,766 we will use here in a moment. 1037 00:43:34,766 --> 00:43:35,126 [ Laughter ] 1038 00:43:35,126 --> 00:43:39,986 >> Okay so let's go ahead and try inserting at the end. 1039 00:43:40,276 --> 00:43:42,576 So we have the-- we need the number 55. 1040 00:43:42,666 --> 00:43:45,706 Actually let me change it, let me steal a pointer, 1041 00:43:45,806 --> 00:43:46,896 can you take on the rule 55. 1042 00:43:47,086 --> 00:43:48,436 So what's your name? 1043 00:43:48,576 --> 00:43:48,846 >> Lucas. 1044 00:43:48,846 --> 00:43:50,626 >> So malloc Lucas, right. 1045 00:43:50,626 --> 00:43:52,896 So now we have another 8 bytes inside 1046 00:43:52,896 --> 00:43:56,896 of which we can point the number 55 and now let's figure out how 1047 00:43:56,896 --> 00:43:58,186 to insert Lucas into this list 1048 00:43:58,186 --> 00:43:59,386 and actually what's your name again? 1049 00:43:59,386 --> 00:43:59,726 >> Joseph. 1050 00:43:59,726 --> 00:44:03,456 >> Joseph oh okay and I will take that [laughter]. 1051 00:44:03,456 --> 00:44:06,016 So Joseph if you wanna kind of take charge here 1052 00:44:06,016 --> 00:44:10,156 and explain just in real human terms how do we insert 55 1053 00:44:10,216 --> 00:44:10,896 into our linked list. 1054 00:44:10,986 --> 00:44:15,096 If you wanna go ahead and make the insertion, sure [laughter]. 1055 00:44:19,576 --> 00:44:22,786 And let me give you this if this is on just 1056 00:44:22,786 --> 00:44:24,286 to explain what it is you're doing here. 1057 00:44:25,066 --> 00:44:28,336 >> Okay, so I guess you take him to the beginning of the list 1058 00:44:28,336 --> 00:44:31,096 that you know this is where the list starts and-- . 1059 00:44:31,256 --> 00:44:31,346 >> Okay. 1060 00:44:31,576 --> 00:44:34,446 >> You go down through each pointer to find the end 1061 00:44:34,446 --> 00:44:37,136 of the list and then end up here. 1062 00:44:37,316 --> 00:44:38,186 >> Okay, good. 1063 00:44:38,186 --> 00:44:39,666 So we found the so-called tail of the list 1064 00:44:39,666 --> 00:44:40,336 and then what did you do? 1065 00:44:40,986 --> 00:44:42,626 >> Then you insert there. 1066 00:44:42,986 --> 00:44:45,016 >> Okay. So let's push a little harder and what it means 1067 00:44:45,046 --> 00:44:48,466 to insert into the list what you see, you put him there but-- 1068 00:44:48,806 --> 00:44:50,616 >> So you take the pointer 1069 00:44:50,616 --> 00:44:55,306 of the last element in the list and put. 1070 00:44:55,306 --> 00:44:55,373 [ Laughter ] 1071 00:44:55,373 --> 00:44:55,440 . 1072 00:44:55,440 --> 00:44:58,666 >> Okay excellent and where should his left hand 1073 00:44:58,666 --> 00:44:59,126 be pointing. 1074 00:44:59,416 --> 00:44:59,816 >> To the ground. 1075 00:44:59,976 --> 00:45:00,636 >> Okay good. 1076 00:45:00,636 --> 00:45:02,546 And this is actually quite important just 1077 00:45:02,546 --> 00:45:05,966 because you've malloced a node, a structure put 1078 00:45:05,966 --> 00:45:07,816 in the number 55, you also have 1079 00:45:07,816 --> 00:45:11,036 to change what the default next value is because just 1080 00:45:11,036 --> 00:45:13,276 as with local variables if you just malloc a node 1081 00:45:13,526 --> 00:45:15,756 and then put 55 there what's gonna be inside 1082 00:45:15,756 --> 00:45:17,276 of the other 4 bytes called next. 1083 00:45:18,526 --> 00:45:18,706 >> Garbage. 1084 00:45:18,776 --> 00:45:21,116 >> It's just some garbage value and that's really bad 1085 00:45:21,116 --> 00:45:22,146 in the context of pointers 1086 00:45:22,146 --> 00:45:23,846 because if you follow a garbage value 1087 00:45:23,846 --> 00:45:26,586 as a pointer you're gonna end up in some crazy place in memory 1088 00:45:26,586 --> 00:45:29,466 and the odds are that you're going to seg fault or core dump. 1089 00:45:29,646 --> 00:45:30,396 So that was really good. 1090 00:45:30,396 --> 00:45:34,176 So now we have 55 at the right position, 34 is point at 55, 1091 00:45:34,466 --> 00:45:37,636 let's now try to do a little something else with list. 1092 00:45:37,846 --> 00:45:41,096 Supposed we instead now wanted to insert the number 5 so I need 1093 00:45:41,096 --> 00:45:43,156 to malloc one more volunteer please. 1094 00:45:43,156 --> 00:45:47,356 Okay come on up malloc number 5, what's your name. 1095 00:45:48,226 --> 00:45:48,516 >> Emmy. 1096 00:45:49,026 --> 00:45:50,896 >> So Emmy's been malloced here. 1097 00:45:50,896 --> 00:45:54,556 We're gonna need you again with the mic to handle our insertion 1098 00:45:54,556 --> 00:45:55,636 at the head of the list 1099 00:45:55,636 --> 00:45:57,966 and consider now what is fundamentally different 1100 00:45:57,966 --> 00:46:00,406 about insertion at the head of the list and the reason 1101 00:46:00,406 --> 00:46:02,606 of course that we're doing head and tail is because we're trying 1102 00:46:02,606 --> 00:46:05,166 to keep the list sorted which we've tried to do with arrays 1103 00:46:05,166 --> 00:46:07,636 in the past, all right, go ahead. 1104 00:46:07,636 --> 00:46:07,776 [ Inaudible Remark ] 1105 00:46:07,776 --> 00:46:08,796 >> Oh, and talk as close as you can. 1106 00:46:08,796 --> 00:46:10,756 >> Okay, so you have her at some place in memory. 1107 00:46:10,976 --> 00:46:11,436 >> Okay. 1108 00:46:11,516 --> 00:46:13,516 [ Laughter ] 1109 00:46:13,596 --> 00:46:17,556 >> What the first person right to her and her 1110 00:46:17,556 --> 00:46:19,276 to point to the next person. 1111 00:46:19,426 --> 00:46:20,046 >> Okay good. 1112 00:46:20,136 --> 00:46:22,286 And now even though as humans they did have to kind 1113 00:46:22,286 --> 00:46:24,386 of shuffle left and right to make room for each other, 1114 00:46:24,616 --> 00:46:26,656 in memory because each of these nodes is 1115 00:46:26,656 --> 00:46:29,166 that some arbitrary location in memory as determined 1116 00:46:29,166 --> 00:46:33,436 by malloc notice what did not have to happen was 19, 17, 22, 1117 00:46:33,436 --> 00:46:36,426 26, 34, and 55 did not have to move, right. 1118 00:46:36,426 --> 00:46:38,776 If they did have to move, the running time 1119 00:46:38,776 --> 00:46:40,996 of insert would not be constant time, 1120 00:46:40,996 --> 00:46:42,146 it instead would have been what? 1121 00:46:43,216 --> 00:46:46,886 They go of N right and the worse case if we had to move every one 1122 00:46:46,886 --> 00:46:49,586 over by one spot in memory which we would had to do 1123 00:46:49,586 --> 00:46:53,296 in an array its big O of N but instead insertion is big O 1124 00:46:53,296 --> 00:46:56,666 of 1 constant time 'cause all we have to do is right to the start 1125 00:46:56,666 --> 00:46:58,336 of the list and we know where Lisa is 1126 00:46:58,336 --> 00:47:00,776 because she again is the special pointer that we always keep 1127 00:47:00,776 --> 00:47:03,126 around and even though we had to do a couple things 1128 00:47:03,126 --> 00:47:06,386 like point her at 5 and 5 at 9 well that's still two, 1129 00:47:06,476 --> 00:47:08,746 three steps it's a finite number, it's constant 1130 00:47:08,966 --> 00:47:10,636 so that was a really fast operation. 1131 00:47:10,636 --> 00:47:12,876 By contrast, what was insertion at the tail of the list? 1132 00:47:14,396 --> 00:47:15,006 >> The same thing. 1133 00:47:15,166 --> 00:47:16,886 >> Well, big O of-- 1134 00:47:16,966 --> 00:47:19,896 >> Big O of while you have to go-- 1135 00:47:19,896 --> 00:47:22,316 >> Okay we had to go to every pointer so big O of N exactly. 1136 00:47:22,316 --> 00:47:24,706 So big O of N. So let's see now what happens 1137 00:47:24,706 --> 00:47:29,156 if we make this a little trickier and have one more node 1138 00:47:29,156 --> 00:47:31,776 so I need a malloc one last volunteer number 20. 1139 00:47:33,336 --> 00:47:35,666 Hansel, come on up. 1140 00:47:36,286 --> 00:47:38,856 And so we've malloc another node and take it away. 1141 00:47:38,856 --> 00:47:41,646 He belongs in the middle and is explain for the audience 1142 00:47:41,646 --> 00:47:43,816 if you could how we find his location because it's not 1143 00:47:43,816 --> 00:47:45,656 as simple as tail or the head this time. 1144 00:47:46,146 --> 00:47:48,366 >> Okay. Well, in the middle is N. 1145 00:47:48,736 --> 00:47:49,976 >> Well, so in sorted order. 1146 00:47:50,346 --> 00:47:50,986 >> Sorted order. 1147 00:47:50,986 --> 00:47:52,336 >> Yes and you can walk in front if its-- 1148 00:47:52,456 --> 00:47:53,926 if you need to see the actual numbers. 1149 00:47:53,986 --> 00:47:56,456 >> Okay, so what's her name? 1150 00:47:57,146 --> 00:47:59,326 >> You weren't expecting to be this involved we're you 1151 00:47:59,326 --> 00:48:00,976 in this demo, okay [laughter]. 1152 00:48:01,286 --> 00:48:04,976 >> But so you go to first pointer, 1153 00:48:06,366 --> 00:48:08,576 follow it to the next value, compare the value. 1154 00:48:08,576 --> 00:48:09,246 >> Good. 1155 00:48:09,246 --> 00:48:14,576 >> Then if it's greater than the value in the first element. 1156 00:48:14,726 --> 00:48:14,936 >> Okay. 1157 00:48:15,016 --> 00:48:17,156 >> If you point to the second one and compare those. 1158 00:48:17,516 --> 00:48:19,376 >> Good. Just please just watch where you are walking. 1159 00:48:19,886 --> 00:48:20,046 >> Okay. 1160 00:48:22,686 --> 00:48:25,546 >> Okay and good and so now how many pointers need 1161 00:48:25,546 --> 00:48:27,426 to be updated? 1162 00:48:28,216 --> 00:48:28,846 >> I will see [laughter]. 1163 00:48:28,976 --> 00:48:34,206 So, however many-- whatever element that is. 1164 00:48:34,536 --> 00:48:36,836 >> Good. Okay well actually let's view it. 1165 00:48:36,836 --> 00:48:38,906 Can we pop Hansel out for just a moment and 17 1166 00:48:38,906 --> 00:48:40,286 of you could keep pointing to 22 1167 00:48:40,456 --> 00:48:44,266 so just super deliberately now what needs to be updated 1168 00:48:44,266 --> 00:48:46,196 in the way of left hand to insert Ansel 1169 00:48:46,196 --> 00:48:46,966 in the right location. 1170 00:48:46,966 --> 00:48:49,156 >> So he has to point to Hansel. 1171 00:48:49,326 --> 00:48:49,996 >> Okay, that's good. 1172 00:48:49,996 --> 00:48:50,556 >> Point to her. 1173 00:48:50,706 --> 00:48:52,116 >> Good bug. 1174 00:48:52,556 --> 00:48:55,926 So you just broke our linked list how and why 1175 00:48:56,646 --> 00:48:57,876 and someone else can answer this, I don't want 1176 00:48:57,876 --> 00:48:59,066 to put you too much on this problem. 1177 00:48:59,066 --> 00:49:03,486 What was the bug in that? 1178 00:49:03,486 --> 00:49:03,553 [ Inaudible Answer ] 1179 00:49:03,553 --> 00:49:06,486 >> Right, so you have to point Hansel at 22 first 1180 00:49:06,936 --> 00:49:08,066 because otherwise what happened? 1181 00:49:08,866 --> 00:49:10,866 When we updated 17 to point 1182 00:49:10,866 --> 00:49:16,286 to Hansel number 20 we now have lost a pointer namely 1183 00:49:16,356 --> 00:49:18,916 where did 22 go, where did 26 go, we've both-- 1184 00:49:18,916 --> 00:49:22,346 we have a dangling reference we've orphaned frankly those 1185 00:49:22,376 --> 00:49:23,236 four people there. 1186 00:49:23,236 --> 00:49:25,796 So this is an example of perfectly memory leak, 1187 00:49:26,046 --> 00:49:28,446 so well done so let's fix. 1188 00:49:29,026 --> 00:49:30,356 >> Okay, so now you point. 1189 00:49:31,166 --> 00:49:33,026 >> Okay Hansel points to 22 and-- 1190 00:49:33,026 --> 00:49:33,266 >> Seventeen. 1191 00:49:33,606 --> 00:49:36,486 >> Seventeen point to 20 and voila, oh. 1192 00:49:36,486 --> 00:49:37,586 Okay that was a lot of work 1193 00:49:37,586 --> 00:49:40,716 but incredibly important now the order of operation. 1194 00:49:40,836 --> 00:49:43,646 So let's hold that pause for just a moment and let's see 1195 00:49:43,646 --> 00:49:46,756 if we can round out just a couple more 'cause I promise you 1196 00:49:46,756 --> 00:49:48,146 it's a lot more interesting to look 1197 00:49:48,146 --> 00:49:49,786 at humans do this than actually C code. 1198 00:49:50,206 --> 00:49:54,106 So supposed we wanted to remove the tail of the list, 1199 00:49:54,326 --> 00:49:57,446 supposed that we said we needed to chop off number 55 1200 00:49:57,446 --> 00:49:59,786 or we wanna remove the largest element from the list, 1201 00:49:59,786 --> 00:50:00,636 where are we go going with this? 1202 00:50:00,856 --> 00:50:02,796 Well you can imagine now if we have the ability 1203 00:50:02,986 --> 00:50:05,386 to represent a dynamic length of a list. 1204 00:50:05,976 --> 00:50:09,586 That's more interesting-- more dynamic than array we can start 1205 00:50:09,586 --> 00:50:12,706 to implement that idea like a stack of trays or a queue 1206 00:50:12,896 --> 00:50:14,746 in something like a line outside the Apple store. 1207 00:50:14,746 --> 00:50:15,896 So suppose the goal now is 1208 00:50:15,896 --> 00:50:18,836 to remove the highest numbered person 55 what needs 1209 00:50:18,836 --> 00:50:19,936 to be updated? 1210 00:50:19,936 --> 00:50:22,256 >> Okay so you can just point her to the ground. 1211 00:50:22,446 --> 00:50:23,326 >> Good. 1212 00:50:23,776 --> 00:50:24,236 >> Free him. 1213 00:50:24,596 --> 00:50:25,816 >> Okay good! 1214 00:50:26,716 --> 00:50:28,646 But what did-- okay so that's good 1215 00:50:28,646 --> 00:50:31,936 but technically we run the risk there of doing what again? 1216 00:50:32,496 --> 00:50:37,226 What we technically might have lost 8 bites of memory 1217 00:50:37,436 --> 00:50:41,216 because we pointed 34 down which means okay well where was 55 1218 00:50:41,446 --> 00:50:43,466 so we at least need and I can help out here. 1219 00:50:43,786 --> 00:50:46,146 We at least need to either invoke me 1220 00:50:46,146 --> 00:50:47,906 like a temporary variable and I point 1221 00:50:47,906 --> 00:50:51,136 at 55 now 34 can point again at the ground 1222 00:50:51,306 --> 00:50:52,406 and then what do we free? 1223 00:50:52,826 --> 00:50:55,966 We free this pointer which is still pointing at the same node 1224 00:50:55,966 --> 00:50:57,626 or what could you also do rather than-- 1225 00:50:57,626 --> 00:50:58,326 [ Inaudible Remark ] 1226 00:50:58,326 --> 00:51:01,136 >> Right you could free 55 first. 1227 00:51:01,136 --> 00:51:01,846 There's nothing bad 1228 00:51:02,056 --> 00:51:05,786 about freeing a node that's been pointed at by someone else 1229 00:51:05,936 --> 00:51:09,356 so long as you immediately thereafter lower 34's hand. 1230 00:51:09,536 --> 00:51:11,476 So either you can do it with a temporary variable 1231 00:51:11,516 --> 00:51:13,146 or just do it in the right order. 1232 00:51:13,306 --> 00:51:17,216 And lastly, if we wanna actually update say the head of the list 1233 00:51:18,136 --> 00:51:19,816 such that it looks like this. 1234 00:51:19,866 --> 00:51:21,466 We wanna remove number 5. 1235 00:51:22,236 --> 00:51:23,596 Can I put pressure on you one last time 1236 00:51:23,596 --> 00:51:26,156 to get this one right with no memory leaks? 1237 00:51:26,156 --> 00:51:27,856 [ Inaudible Remark ] 1238 00:51:27,856 --> 00:51:30,016 >> We wanna remove 5. 1239 00:51:30,996 --> 00:51:34,696 Oh! What needs to happen? 1240 00:51:34,876 --> 00:51:39,406 >> Involve the temporary pointer, pointer 5 1241 00:51:39,406 --> 00:51:40,046 >> Okay. [inaudible] mine. 1242 00:51:40,046 --> 00:51:41,156 >> Okay. Point at 5. 1243 00:51:41,236 --> 00:51:41,416 >> Good. 1244 00:51:41,596 --> 00:51:43,456 >> Point her at whatever number that-- 1245 00:51:43,536 --> 00:51:46,276 and then you can free the point. 1246 00:51:46,276 --> 00:51:46,756 >> Excellent! 1247 00:51:46,876 --> 00:51:48,586 So if we could-- a big round of applause for these guys. 1248 00:51:48,586 --> 00:51:49,106 This is not easy. 1249 00:51:49,106 --> 00:51:50,716 [ Applause ] 1250 00:51:50,716 --> 00:51:51,866 >> Very nicely done. 1251 00:51:51,866 --> 00:51:53,436 Guys you can head out to Rob there. 1252 00:51:56,086 --> 00:51:56,986 Thank you so much. 1253 00:51:56,986 --> 00:52:00,916 So now let us make linked list look much less exciting 1254 00:52:00,916 --> 00:52:02,236 with actual code. 1255 00:52:02,236 --> 00:52:05,246 Just to get a sense of what happens here and how 1256 00:52:05,246 --> 00:52:06,426 to start thinking about this. 1257 00:52:06,896 --> 00:52:09,776 You'll find that we've implemented all of that now 1258 00:52:09,776 --> 00:52:11,226 for you and you'll find this 1259 00:52:11,226 --> 00:52:14,776 to be particularly useful for problem set 6. 1260 00:52:14,776 --> 00:52:17,186 One of the challenges in problems set 6 recall 1261 00:52:17,506 --> 00:52:18,816 from a much earlier teaser is 1262 00:52:18,816 --> 00:52:21,426 to implement the fastest spell checker possible. 1263 00:52:21,766 --> 00:52:24,516 Now at first glance, this might seem a little mundane 1264 00:52:24,516 --> 00:52:27,126 but it gets hard when the list of words, 1265 00:52:27,126 --> 00:52:28,836 the list of English words you're gonna be handed is 1266 00:52:28,836 --> 00:52:31,316 like 150 thousand words long 1267 00:52:31,586 --> 00:52:34,746 and you're gonna need somehow read those in from a file much 1268 00:52:34,746 --> 00:52:37,156 like you are doing already with problems set 5 this week. 1269 00:52:37,356 --> 00:52:39,516 But then you're gonna have to somewhat somehow lay 1270 00:52:39,516 --> 00:52:42,556 out all 150 thousand of those words in memory 1271 00:52:42,806 --> 00:52:45,896 so that look ups spell checking is incredibly fast. 1272 00:52:46,296 --> 00:52:47,866 Now the simplest implementation 1273 00:52:47,866 --> 00:52:50,026 of a dictionary is actually pretty simple right. 1274 00:52:50,026 --> 00:52:52,046 You allocate enough RAM, enough bytes 1275 00:52:52,146 --> 00:52:53,926 to fit all 150 thousand words 1276 00:52:54,166 --> 00:52:56,596 and you just load them up into a huge array. 1277 00:52:56,816 --> 00:52:58,566 Write one word after another. 1278 00:52:58,566 --> 00:53:01,266 Much like [inaudible] contains all the command line arguments. 1279 00:53:01,266 --> 00:53:03,526 It's not hard to implement a spell checker or a dictionary 1280 00:53:03,526 --> 00:53:04,826 if you just put everything in an array. 1281 00:53:05,026 --> 00:53:07,076 But what would be the run time-- 1282 00:53:07,076 --> 00:53:08,706 what would be the big oh notation 1283 00:53:08,776 --> 00:53:11,986 for spell checking a single word if that's your implementation? 1284 00:53:13,776 --> 00:53:14,986 Big oh of? 1285 00:53:15,226 --> 00:53:17,896 So if you have N words 1286 00:53:17,896 --> 00:53:20,806 or 150 thousand words let's generalize it as N big O 1287 00:53:20,806 --> 00:53:22,506 of what is the running time of spell check 1288 00:53:22,506 --> 00:53:23,896 in the implementation that uses an array? 1289 00:53:23,896 --> 00:53:26,136 Right it's N right? 1290 00:53:26,136 --> 00:53:28,436 It might be N/2 on average but we always get rid 1291 00:53:28,436 --> 00:53:31,036 of constant factors so big O of N is you know-- 1292 00:53:31,036 --> 00:53:32,476 actually pretty fast compared to some 1293 00:53:32,476 --> 00:53:34,576 of the horrible sorting algorithms we played 1294 00:53:34,576 --> 00:53:37,246 with like bubble sort selection sort those things really 1295 00:53:37,246 --> 00:53:39,546 under performed compared to things like merge store 1296 00:53:39,796 --> 00:53:43,386 but relatively speaking even linear time is something 1297 00:53:43,386 --> 00:53:44,486 that you might wanna avoid. 1298 00:53:44,486 --> 00:53:46,596 What are -- what's an example of a running time that we saw 1299 00:53:46,596 --> 00:53:48,386 that was actually better than linear? 1300 00:53:48,786 --> 00:53:50,286 And what algorithm did we see that in? 1301 00:53:51,166 --> 00:53:51,746 >> Binary search. 1302 00:53:51,816 --> 00:53:52,996 >> Yeah so binary search. 1303 00:53:52,996 --> 00:53:54,546 Right this divide and conquer approach 1304 00:53:54,546 --> 00:53:57,756 where we counted students or we found numbers behind pieces 1305 00:53:57,756 --> 00:53:59,956 of paper on the board so log in is even better. 1306 00:54:00,116 --> 00:54:03,246 But now the Holy Grail, this week and next is going 1307 00:54:03,246 --> 00:54:05,916 to completely went up ourselves and try 1308 00:54:05,916 --> 00:54:08,876 to achieve constant time whereby even 1309 00:54:08,876 --> 00:54:12,106 if you have a 150 thousand words the challenge ultimately 1310 00:54:12,106 --> 00:54:15,096 in problem set six is going to be to get as close as possible 1311 00:54:15,096 --> 00:54:17,266 to instantaneous look ups. 1312 00:54:17,266 --> 00:54:21,926 No N no log N. Nothing other than constant time. 1313 00:54:21,926 --> 00:54:24,646 Now we're almost going to be able to get there. 1314 00:54:24,806 --> 00:54:28,026 But you'll see that it's gonna require some ingenuity among 1315 00:54:28,026 --> 00:54:31,226 them tricks like linked list and something called hash tables 1316 00:54:31,226 --> 00:54:33,176 or tries and trees and the like. 1317 00:54:33,506 --> 00:54:36,466 So more on that to come but for now let's take a step toward it. 1318 00:54:36,676 --> 00:54:39,656 So here is list 1.H and here again is just 1319 00:54:39,656 --> 00:54:41,526 to be clear the structure that each 1320 00:54:41,526 --> 00:54:44,616 of this guys except Lisa represented a moment ago. 1321 00:54:44,756 --> 00:54:47,056 Each of them had an int and held up on paper. 1322 00:54:47,056 --> 00:54:49,126 Each of them had a left tan called next 1323 00:54:49,466 --> 00:54:51,506 and now those were our 8 bites that each 1324 00:54:51,506 --> 00:54:52,716 of these guys represented. 1325 00:54:53,046 --> 00:54:56,896 Now I'm gonna go ahead and open up list 1.C and we won't dwell 1326 00:54:56,896 --> 00:55:00,066 on all of the minutia here 'cause it can actually frankly 1327 00:55:00,066 --> 00:55:02,226 unless you kinda worked through it slowly it's very easy 1328 00:55:02,226 --> 00:55:05,056 to get lost but let's take a look first 1329 00:55:05,056 --> 00:55:05,836 at what this thing does. 1330 00:55:05,836 --> 00:55:07,316 Let me go into a terminal window. 1331 00:55:07,536 --> 00:55:13,776 Let me do make list 1 and let me go ahead now and run list 1 1332 00:55:14,046 --> 00:55:15,616 and you'll see that just so that this is 1333 00:55:15,616 --> 00:55:17,046 so interesting I created sort 1334 00:55:17,046 --> 00:55:19,606 of a poor man's menu here whereby it just waits for you 1335 00:55:19,606 --> 00:55:21,906 at this command prompt to either type in delete, 1336 00:55:21,906 --> 00:55:23,676 find and search or traverse. 1337 00:55:24,006 --> 00:55:26,366 All of those should be self explanatory except traverse just 1338 00:55:26,366 --> 00:55:28,496 means start at the beginning of you list and just prints 1339 00:55:28,496 --> 00:55:29,346 out all of the numbers. 1340 00:55:29,346 --> 00:55:30,276 That's all we mean there. 1341 00:55:30,476 --> 00:55:32,106 So I just needed a little menu because I wanted 1342 00:55:32,106 --> 00:55:33,756 to implement functions called delete, 1343 00:55:33,926 --> 00:55:35,626 find, insert and traverse. 1344 00:55:35,976 --> 00:55:37,606 So what do I wanna do here? 1345 00:55:37,606 --> 00:55:42,546 Let me move this slightly over and do delete. 1346 00:55:43,096 --> 00:55:45,516 Well there's no number to delete so the list is empty. 1347 00:55:45,576 --> 00:55:47,046 So let's actually start with insert. 1348 00:55:47,046 --> 00:55:49,826 I hit 3 for insert what number do I wanna insert? 1349 00:55:49,826 --> 00:55:52,506 I'll insert the number 5 just like we started with there. 1350 00:55:52,846 --> 00:55:55,036 So now just to see what's going 1351 00:55:55,036 --> 00:55:57,906 on every time you execute a command I wrote the program 1352 00:55:57,906 --> 00:55:59,296 in such a way that it just tells you. 1353 00:55:59,296 --> 00:56:01,386 It reminds you what the current list is from left 1354 00:56:01,386 --> 00:56:02,706 to right just like our humans. 1355 00:56:02,996 --> 00:56:03,806 So let's do another one. 1356 00:56:03,806 --> 00:56:07,376 Insert, I'm gonna insert let's do the exact same numbers. 1357 00:56:07,376 --> 00:56:09,916 So I had a 5 in there before. 1358 00:56:09,916 --> 00:56:12,446 Let's also put in the number 17. 1359 00:56:12,846 --> 00:56:13,866 That was among our numbers. 1360 00:56:13,866 --> 00:56:16,726 Let me do another insert this time for 9 so those are three 1361 00:56:16,726 --> 00:56:19,186 of the numbers we had and notice now that the program is 1362 00:56:19,186 --> 00:56:21,606 at the moment smart enough to actually insert 1363 00:56:21,766 --> 00:56:22,956 at the right location. 1364 00:56:22,956 --> 00:56:25,076 55 is gonna go all the way at the end there. 1365 00:56:25,076 --> 00:56:27,166 And I can continue this and I could insert all 1366 00:56:27,166 --> 00:56:29,106 of the same numbers that we had up here on stage. 1367 00:56:29,336 --> 00:56:32,256 So now let's peel back the implementation 1368 00:56:32,346 --> 00:56:33,776 and see how this is working. 1369 00:56:34,096 --> 00:56:36,846 So first, what do I need at the top of this program? 1370 00:56:36,846 --> 00:56:38,146 So I've got a bunch of includes. 1371 00:56:38,486 --> 00:56:40,196 We haven't seen all of this before 1372 00:56:40,196 --> 00:56:44,046 but I'll wave my hands it's the new one there uni-standard.H. 1373 00:56:44,356 --> 00:56:47,026 And just point out at 1 I'm including my own file 1374 00:56:47,026 --> 00:56:48,846 and that has the type def we just saw. 1375 00:56:48,846 --> 00:56:52,836 And I also have a global variable. 1376 00:56:53,006 --> 00:56:54,906 I could avoid this because remember we've generally 1377 00:56:54,906 --> 00:56:56,576 preached that global variables are bad 1378 00:56:56,856 --> 00:56:59,876 but when you're implementing a program, as in this case, 1379 00:56:59,876 --> 00:57:02,566 whose sole purpose in life is to do one thing, 1380 00:57:02,566 --> 00:57:03,636 implement a linked list. 1381 00:57:04,026 --> 00:57:05,916 It's pretty reasonable in this scenario 1382 00:57:05,916 --> 00:57:07,036 to have a global variable. 1383 00:57:07,036 --> 00:57:10,446 Just like in the game of 15 we had a big global variable there 1384 00:57:10,446 --> 00:57:13,086 called G because the whole program was dedicated 1385 00:57:13,086 --> 00:57:13,626 to the game. 1386 00:57:13,626 --> 00:57:15,636 We might as well have one global rather 1387 00:57:15,636 --> 00:57:17,996 than pass the same pointer around again and again. 1388 00:57:18,256 --> 00:57:21,596 So here is essentially, I could to be clever, 1389 00:57:21,596 --> 00:57:23,256 just rename this Lisa right? 1390 00:57:23,256 --> 00:57:26,006 So Lisa represented this pointer here. 1391 00:57:26,006 --> 00:57:28,456 So she's 4 bytes which is implied by the fact 1392 00:57:28,456 --> 00:57:31,586 that there's a star there even though a node is 8 bites 1393 00:57:31,586 --> 00:57:33,966 remember that every pointer is 4 bytes. 1394 00:57:34,136 --> 00:57:36,906 So this variable now represents Lisa but I'll go back 1395 00:57:36,906 --> 00:57:39,886 to the global variable here called first 1396 00:57:39,886 --> 00:57:41,766 and that was the piece of paper that Lisa was holding up. 1397 00:57:41,976 --> 00:57:44,316 So that's Lisa and she's initialized to null 1398 00:57:44,316 --> 00:57:47,306 and this too is important if you don't initialize Lisa 1399 00:57:47,306 --> 00:57:51,196 or the head of your linked list to null, you're gonna assume 1400 00:57:51,196 --> 00:57:52,606 that whatever garbage value was 1401 00:57:52,606 --> 00:57:55,936 in Lisa's left hand is an actual node and if you follow 1402 00:57:55,936 --> 00:57:58,056 that pointer bad things are gonna happen 1403 00:57:58,056 --> 00:58:00,526 if no one existed to the left of Lisa. 1404 00:58:00,806 --> 00:58:02,356 So this is incredibly important now 1405 00:58:02,356 --> 00:58:04,726 to always initialize your pointers 1406 00:58:04,726 --> 00:58:07,096 to some known value for instance null. 1407 00:58:07,376 --> 00:58:09,916 Now I've got some function prototypes and I promised 1408 00:58:09,916 --> 00:58:10,916 that those things would exist. 1409 00:58:11,316 --> 00:58:12,876 Let's scroll down to main. 1410 00:58:13,186 --> 00:58:15,066 So main here's how I implemented the menu 1411 00:58:15,216 --> 00:58:16,416 as we've done in the past. 1412 00:58:16,416 --> 00:58:17,476 I have a do-while loop 1413 00:58:17,476 --> 00:58:19,796 and that's how I keep prompting the user again and again. 1414 00:58:20,056 --> 00:58:22,416 Here's just the menu it's kinda formatted in this way 1415 00:58:22,416 --> 00:58:23,716 for just aesthetic purposes. 1416 00:58:24,016 --> 00:58:26,316 As in the side if you've never known this-- 1417 00:58:26,316 --> 00:58:28,856 if you have a really long string that you wanna print 1418 00:58:28,856 --> 00:58:31,406 with printer def and you don't want it to wrap and wrap 1419 00:58:31,406 --> 00:58:34,516 and wrap around you can actually close the quotes as I did here. 1420 00:58:34,516 --> 00:58:38,906 And then just hit enter and pull it multiple coded strings 1421 00:58:38,906 --> 00:58:41,056 on each line with no other punctuation. 1422 00:58:41,056 --> 00:58:42,076 No commas anything. 1423 00:58:42,366 --> 00:58:44,606 And they'll be automatically contaminated together. 1424 00:58:44,606 --> 00:58:46,756 They'll be attached to one another automatically. 1425 00:58:46,986 --> 00:58:49,196 It's just a trick for keeping your code a little prettier. 1426 00:58:49,526 --> 00:58:52,676 So here's where I say command I use a get ints font call 1427 00:58:52,676 --> 00:58:55,086 to get the commands then I have a switch statements 1428 00:58:55,086 --> 00:58:56,936 and this is nice and clean because if the user types 1429 00:58:56,936 --> 00:59:00,166 in one I call delete, two fine and so forth. 1430 00:59:00,216 --> 00:59:02,736 So all of that is kind of older constructs 1431 00:59:02,736 --> 00:59:04,546 and then I have this trick here the user types 1432 00:59:04,546 --> 00:59:06,346 in 0 that's how they would quit. 1433 00:59:06,586 --> 00:59:08,206 Now this not the most friendly interface. 1434 00:59:08,206 --> 00:59:11,136 It would be nice if I insert-- if I supported Q for quit 1435 00:59:11,136 --> 00:59:13,486 and the like but that's really not the goal of this program. 1436 00:59:13,486 --> 00:59:15,016 Let's focus instead on the linked list. 1437 00:59:15,526 --> 00:59:18,446 So notice here that at the very end 1438 00:59:18,446 --> 00:59:22,146 of main I already have the foresight to free this list 1439 00:59:22,706 --> 00:59:26,776 and so if you assume that the user might start inserting nodes 1440 00:59:26,806 --> 00:59:28,846 that means that you'll gonna call malloc a whole bunch 1441 00:59:28,846 --> 00:59:29,366 of times. 1442 00:59:29,676 --> 00:59:32,836 So in my main function before I quit I wanna make sure 1443 00:59:32,836 --> 00:59:35,926 to free any memory that's been allocated by this program. 1444 00:59:36,266 --> 00:59:37,396 Now how does this work? 1445 00:59:37,396 --> 00:59:41,006 We'll come back to this but here I have a temporary pointer 1446 00:59:41,006 --> 00:59:41,606 that was me. 1447 00:59:41,606 --> 00:59:43,496 I was this temporary variable this story. 1448 00:59:43,726 --> 00:59:46,686 I'm pointing at the same thing Lisa's is at. 1449 00:59:47,186 --> 00:59:48,286 So this is actually important. 1450 00:59:48,286 --> 00:59:50,546 Lisa a moment ago, so Lisa took her paper 1451 00:59:50,546 --> 00:59:52,676 but suppose this was Lisa here that's head first 1452 00:59:53,006 --> 00:59:58,466 so if I say pointer gets first what am I pointing to? 1453 00:59:58,466 --> 01:00:00,966 Here's first. 1454 01:00:02,166 --> 01:00:07,736 Okay. So here's Lisa, here's me, what am I pointing to? 1455 01:00:08,756 --> 01:00:11,846 Why not wedo this highlighted line of code pointer gets first. 1456 01:00:11,846 --> 01:00:14,596 There are kind of two answers. 1457 01:00:14,596 --> 01:00:17,026 Am I pointing to Lisa or am I pointing 1458 01:00:17,026 --> 01:00:21,526 to whatever she is pointing at. 1459 01:00:21,526 --> 01:00:23,486 So I'm actually pointing to whatever she's pointing 1460 01:00:23,646 --> 01:00:25,156 because consider what's going on here. 1461 01:00:25,156 --> 01:00:25,936 What is first? 1462 01:00:26,246 --> 01:00:29,656 Well first is a pointer and it's declared up here. 1463 01:00:29,906 --> 01:00:31,836 Well what does that represent even though we refer 1464 01:00:31,836 --> 01:00:34,476 to this thing as Lisa, Lisa contains a value. 1465 01:00:34,476 --> 01:00:36,386 Her left hand represented that value. 1466 01:00:36,636 --> 01:00:38,416 So if I say pointer gets first, 1467 01:00:38,566 --> 01:00:40,606 that means do exactly what Lisa is doing. 1468 01:00:40,606 --> 01:00:43,736 Put inside of your 4 bytes exactly the address 1469 01:00:44,106 --> 01:00:46,696 that Lisa was storing and that happened to be the address 1470 01:00:46,696 --> 01:00:48,806 of like number 5 or number 9 whoever was there 1471 01:00:48,806 --> 01:00:49,616 at that point in time. 1472 01:00:49,856 --> 01:00:52,336 So I'm gonna actually be pointing at the exact same thing 1473 01:00:52,336 --> 01:00:54,396 that Lisa is but not at Lisa herself. 1474 01:00:54,486 --> 01:00:55,936 So what then happens? 1475 01:00:55,936 --> 01:00:58,066 Well this is just a trick and this is very common 1476 01:00:58,066 --> 01:00:59,996 in linked list and data structures where you need 1477 01:00:59,996 --> 01:01:01,566 to free a bunch of stuff again and again. 1478 01:01:01,836 --> 01:01:04,746 I have a wild loop and I'm saying wow my pointer is 1479 01:01:04,746 --> 01:01:05,336 not null. 1480 01:01:05,336 --> 01:01:07,506 So while I'm pointing at a legitimate person 1481 01:01:07,506 --> 01:01:09,386 and you can see how the story ends. 1482 01:01:09,386 --> 01:01:11,676 If I get to the end of the list and then I follow the pointer, 1483 01:01:11,846 --> 01:01:14,176 eventually I, the temporary variable will be pointing 1484 01:01:14,176 --> 01:01:15,516 at the ground also, null. 1485 01:01:15,846 --> 01:01:17,416 So that's when that loop will stop 1486 01:01:18,046 --> 01:01:22,026 and for now this is what's going on when you needed it 1487 01:01:22,106 --> 01:01:27,616 to remember before freeing something how to keep track 1488 01:01:28,066 --> 01:01:29,586 with fingers in code here. 1489 01:01:29,586 --> 01:01:30,956 So I put here that's not this. 1490 01:01:31,156 --> 01:01:33,396 What I mean here is this. 1491 01:01:34,036 --> 01:01:36,476 This line of code obviously frees a pointer. 1492 01:01:36,476 --> 01:01:37,296 It frees a struct. 1493 01:01:37,666 --> 01:01:38,966 So what are these two doing? 1494 01:01:39,186 --> 01:01:41,156 Well there's one new piece of syntax here. 1495 01:01:41,526 --> 01:01:45,236 So when I am standing here and I am pointing 1496 01:01:45,926 --> 01:01:48,966 at say the first node on the list, number 5. 1497 01:01:49,426 --> 01:01:51,216 I am then going to go ahead 1498 01:01:51,216 --> 01:01:53,726 and create a duplicate here pred pointer 1499 01:01:54,056 --> 01:01:56,166 which is also-- oops I have it here. 1500 01:01:58,206 --> 01:02:00,886 These two are temporarily equivalent. 1501 01:02:01,266 --> 01:02:04,776 So I'm pointing pointer is that this guy here say number 5. 1502 01:02:05,106 --> 01:02:08,616 Pred pointer at the moment is also pointing to number 5. 1503 01:02:09,106 --> 01:02:11,126 But what does this line of code then do? 1504 01:02:11,126 --> 01:02:12,236 This is the only new one. 1505 01:02:12,446 --> 01:02:14,586 Pointer who is currently pointing 1506 01:02:14,586 --> 01:02:17,346 at 5 gets pointer arrow next. 1507 01:02:18,496 --> 01:02:23,616 So that saying go to this guy follow his left hand and point 1508 01:02:23,616 --> 01:02:24,826 at whoever was to his left 1509 01:02:25,036 --> 01:02:26,716 which in this case was I think was the number 9. 1510 01:02:27,076 --> 01:02:30,386 So in short, this piece of code here and it's nice 1511 01:02:30,436 --> 01:02:33,816 that the syntax actually uses an arrow says update pointer 1512 01:02:34,116 --> 01:02:36,886 to be equal to whatever it's already pointing 1513 01:02:36,966 --> 01:02:39,936 at which is pointer but follow that next field. 1514 01:02:39,936 --> 01:02:40,866 Follow the arrow. 1515 01:02:40,966 --> 01:02:43,186 So this one line of code is just my way 1516 01:02:43,186 --> 01:02:45,766 of again having a temporary variable pointing at this person 1517 01:02:45,766 --> 01:02:48,286 and if I wanna point at the next one, I need to follow that arrow 1518 01:02:48,286 --> 01:02:51,156 to 9 follow that arrow again and again and again 1519 01:02:51,156 --> 01:02:52,376 and if I keep doing that remember 1520 01:02:52,376 --> 01:02:54,646 that 55's left hand was pointing down. 1521 01:02:54,926 --> 01:02:56,356 So as soon as I follow that arrow, 1522 01:02:56,686 --> 01:02:59,866 my own PTR variable is also going to be null. 1523 01:03:00,206 --> 01:03:02,486 No why did I keep pred pointer around? 1524 01:03:03,106 --> 01:03:04,906 Well you could do this in different ways. 1525 01:03:05,166 --> 01:03:08,256 But I needed to make sure that just because I advanced 1526 01:03:08,376 --> 01:03:10,336 to the next pointer, I need to make sure 1527 01:03:10,336 --> 01:03:12,406 that I don't orphan the previous person. 1528 01:03:12,406 --> 01:03:15,526 I need to remember them 'cause I need to advancer this pointer 1529 01:03:15,746 --> 01:03:18,946 and then delete or rather free this person. 1530 01:03:19,426 --> 01:03:21,676 Suppose I did the order of operations differently. 1531 01:03:21,786 --> 01:03:25,146 Suppose I did this free pointer which feels very reasonable 1532 01:03:25,456 --> 01:03:28,096 and then do pointer gets pointer next. 1533 01:03:28,096 --> 01:03:30,606 I mean frankly this is kind 1534 01:03:30,606 --> 01:03:32,996 of intuitively I think much more straight forward. 1535 01:03:33,246 --> 01:03:35,466 Free what you're pointing at and then update yourself 1536 01:03:35,466 --> 01:03:36,846 to the next pointer and then repeat. 1537 01:03:37,086 --> 01:03:41,616 But why is this bad? 1538 01:03:41,616 --> 01:03:41,806 [ Inaudible Remark ] 1539 01:03:41,806 --> 01:03:42,816 >> Because I already-- exactly. 1540 01:03:42,846 --> 01:03:45,206 If as soon as you call free on a pointer 1541 01:03:45,206 --> 01:03:47,926 and thus a struct you are not allowed to touch 1542 01:03:47,926 --> 01:03:51,106 that memory anymore because it's possible the OS even though it's 1543 01:03:51,106 --> 01:03:52,456 just between two lines of code, 1544 01:03:52,616 --> 01:03:54,586 it's possible the operating system might need that chunk 1545 01:03:54,586 --> 01:03:56,746 of memory immediately allocated to someone else 1546 01:03:56,746 --> 01:03:59,596 and voila you've just now followed a pointer that's no 1547 01:03:59,596 --> 01:04:00,716 longer your own. 1548 01:04:00,716 --> 01:04:03,256 In reality that's not quite how the story gonna go 1549 01:04:03,446 --> 01:04:04,666 but the idea is the same. 1550 01:04:04,666 --> 01:04:07,556 Once you have freed memory it is no longer yours to touch. 1551 01:04:07,806 --> 01:04:11,076 It could in fact be reclaimed and used in some other forms. 1552 01:04:11,266 --> 01:04:13,896 So that's why frankly we have kinda jump through these hoops 1553 01:04:14,156 --> 01:04:17,566 and frankly make the code more complex and harder to understand 1554 01:04:17,816 --> 01:04:20,446 but all we're doing really is implementing this idea 1555 01:04:20,446 --> 01:04:22,526 of temporary fingers pointing at these nodes 1556 01:04:22,526 --> 01:04:25,876 and only freeing them once we get the ordering right. 1557 01:04:26,046 --> 01:04:27,696 But for now let me wave my hand at that 1558 01:04:27,696 --> 01:04:30,186 and let's focus on say insert. 1559 01:04:30,786 --> 01:04:33,096 So insert is actually a little long but we'll look 1560 01:04:33,096 --> 01:04:34,036 at just a couple of cases. 1561 01:04:34,036 --> 01:04:36,456 Al right. So this is where pointers gets interesting 1562 01:04:36,456 --> 01:04:39,446 but realize we're providing this code for a reason 1563 01:04:39,446 --> 01:04:41,836 so you don't need to implement at all from scratch. 1564 01:04:42,106 --> 01:04:45,766 So how do you go about inserting a number like the number five 1565 01:04:45,766 --> 01:04:47,956 or nine or 55 or whatever into the list? 1566 01:04:48,466 --> 01:04:50,186 So first, I need a node. 1567 01:04:50,216 --> 01:04:53,726 So I'm gonna call malloc and I need as much RAM as is the size 1568 01:04:53,726 --> 01:04:55,716 of a node and then I store that addresses of memory 1569 01:04:55,716 --> 01:04:57,966 and new pointer and then I just do a sanity check 1570 01:04:57,966 --> 01:04:59,826 if new pointer is null I'm just gonna return 1571 01:04:59,826 --> 01:05:01,146 because I'm obviously out of memory. 1572 01:05:01,146 --> 01:05:02,726 Something went wrong, can't proceed. 1573 01:05:03,236 --> 01:05:07,236 So that's what creates an empty human on stage with no number 1574 01:05:07,236 --> 01:05:08,236 and no pointer just yet. 1575 01:05:08,716 --> 01:05:10,556 Alright, now I initialize the node. 1576 01:05:10,766 --> 01:05:13,026 Number to insert well I'm just using getint 1577 01:05:13,316 --> 01:05:16,446 and here's how given a new pointer so given a human 1578 01:05:16,446 --> 01:05:20,126 on stage, here is how I update his or her N field the integer 1579 01:05:20,126 --> 01:05:23,236 and then I initialize the next field to null. 1580 01:05:24,076 --> 01:05:26,866 So let me just point out one thing syntactically this is kind 1581 01:05:26,866 --> 01:05:27,186 of new. 1582 01:05:27,186 --> 01:05:29,596 This arrow notation and remember and before 1583 01:05:29,596 --> 01:05:31,996 when we've used structs we used the dot notation. 1584 01:05:32,456 --> 01:05:35,036 Why do I may not use the dot notation here 1585 01:05:35,036 --> 01:05:38,186 for new pointer even though I claim new pointer is 1586 01:05:38,186 --> 01:05:42,166 at the moment because I've called malloc this is what our 1587 01:05:42,166 --> 01:05:43,006 picture looks like. 1588 01:05:44,226 --> 01:05:45,626 This is our node. 1589 01:05:45,876 --> 01:05:47,726 This is N. This is next. 1590 01:05:49,006 --> 01:05:50,486 This is new pointer. 1591 01:05:51,436 --> 01:05:58,366 And if we could try our camera trick so oops didn't see that. 1592 01:05:58,876 --> 01:06:01,926 So this is what just happened when I call malloc I get this. 1593 01:06:02,206 --> 01:06:04,506 When I store the return value of malloc 1594 01:06:04,506 --> 01:06:06,406 in new pointer I get this picture. 1595 01:06:06,586 --> 01:06:08,736 And so this is what's going on in memory right now. 1596 01:06:09,096 --> 01:06:11,396 So when I called getints with my next line of code 1597 01:06:11,626 --> 01:06:13,936 and suppose I type in the number 5 what happens? 1598 01:06:14,216 --> 01:06:17,666 Well new pointer, arrow, N gets the number 5 1599 01:06:17,906 --> 01:06:19,626 and then this I'm initializing to null 1600 01:06:19,626 --> 01:06:21,586 so I'll just write null in there. 1601 01:06:22,106 --> 01:06:24,046 But the question I post was this 1602 01:06:24,366 --> 01:06:27,266 if this is a struct why am I not using the dot notation 1603 01:06:27,266 --> 01:06:29,186 as I did earlier today and two weeks ago? 1604 01:06:29,186 --> 01:06:29,606 [ Inaudible Remark ] 1605 01:06:29,606 --> 01:06:32,016 >> Yes, so it's an address. 1606 01:06:32,476 --> 01:06:35,866 So new pointer is not a struct at this moment in time. 1607 01:06:35,866 --> 01:06:41,776 If I had a struct it would be like this-- node, a new node. 1608 01:06:41,966 --> 01:06:44,716 That is a struct because recall we declared node 1609 01:06:44,716 --> 01:06:46,526 as a data type so struct a new node. 1610 01:06:46,716 --> 01:06:49,176 That is something that I could do this notation on. 1611 01:06:49,176 --> 01:06:52,376 A new node.N gets one two three. 1612 01:06:52,546 --> 01:06:56,256 And then a new node.next gets null. 1613 01:06:56,646 --> 01:06:59,896 But I didn't do that because new pointers is what? 1614 01:06:59,896 --> 01:07:00,686 Well it's an address. 1615 01:07:01,386 --> 01:07:04,666 So just like this picture suggests it's a pointer pointing 1616 01:07:04,666 --> 01:07:06,756 at a struct so I need to not use the dot 1617 01:07:06,756 --> 01:07:09,546 because look there is no N or next field in this value. 1618 01:07:09,786 --> 01:07:12,016 I have to follow the arrow to get here. 1619 01:07:12,406 --> 01:07:15,306 But to reveal that the syntax is actually the same 1620 01:07:15,346 --> 01:07:17,636 in previous lectures how did we start 1621 01:07:17,636 --> 01:07:19,536 at a pointer and then go there? 1622 01:07:19,536 --> 01:07:22,376 What was the syntax for saying here's a pointer go there? 1623 01:07:23,376 --> 01:07:24,786 Yeah, so star. 1624 01:07:24,786 --> 01:07:27,896 So you know what this arrow notation we're seeing now is 1625 01:07:27,896 --> 01:07:30,656 actually equivalent to writing this. 1626 01:07:32,046 --> 01:07:33,826 So this is exactly equivalent. 1627 01:07:33,826 --> 01:07:35,746 So it's saying here's new pointer. 1628 01:07:36,036 --> 01:07:39,246 Go there. Thanks to the star notation and once you are there, 1629 01:07:39,246 --> 01:07:41,646 you can use the old school dot notation. 1630 01:07:42,066 --> 01:07:44,056 So that begs the question why the arrow? 1631 01:07:44,656 --> 01:07:48,306 Why not just write this code like this? 1632 01:07:48,516 --> 01:07:51,826 This is more of this syntactic sugar right. 1633 01:07:51,826 --> 01:07:55,256 It's the same effect but frankly I would claim 1634 01:07:55,256 --> 01:07:56,676 that this is just a little more readable. 1635 01:07:56,676 --> 01:07:58,356 In fact it kinda mirrors our idea 1636 01:07:58,396 --> 01:08:01,086 of pictures much more effectively than star 1637 01:08:01,086 --> 01:08:02,236 and parenthesis and dot. 1638 01:08:02,476 --> 01:08:03,456 But these are equivalents. 1639 01:08:03,456 --> 01:08:05,936 And then commentary here is just to reveal 1640 01:08:05,936 --> 01:08:07,466 that they're doing nothing new. 1641 01:08:07,626 --> 01:08:09,186 It's just a new piece of syntax. 1642 01:08:09,456 --> 01:08:11,236 Alright so now we've initialized the null 1643 01:08:11,236 --> 01:08:14,016 with both an N like 5 and null. 1644 01:08:14,346 --> 01:08:15,716 Now I'm gonna do the following. 1645 01:08:15,716 --> 01:08:17,916 So this case is easy. 1646 01:08:18,116 --> 01:08:19,356 Suppose the list is empty. 1647 01:08:19,356 --> 01:08:21,676 Suppose Lisa is just standing up here awkwardly. 1648 01:08:21,676 --> 01:08:23,876 She's initialized to null so her hands are like these 1649 01:08:24,076 --> 01:08:26,406 and there is no linked list like Lisa is null. 1650 01:08:26,596 --> 01:08:28,556 Well this is actually the easiest case to handle 1651 01:08:28,556 --> 01:08:30,236 because I malloc someone from the audience. 1652 01:08:30,236 --> 01:08:33,036 I insert the number five into his or her N fields 1653 01:08:33,036 --> 01:08:34,626 and then what do I do with their next field? 1654 01:08:34,626 --> 01:08:36,586 Initialize it to null and then what do I do with Lisa. 1655 01:08:37,016 --> 01:08:39,596 Just point Lisa's next field or rather point Lisa. 1656 01:08:39,696 --> 01:08:41,696 She is a pointer at that node. 1657 01:08:41,996 --> 01:08:45,806 So this scenario here check for empty list, we didn't mimic this 1658 01:08:45,806 --> 01:08:48,656 on stage but this is by far the easiest scenario 1659 01:08:48,886 --> 01:08:51,286 if Lisa is not pointing at anything already 1660 01:08:51,486 --> 01:08:53,246 and we've just malloc a new node, 1661 01:08:53,246 --> 01:08:56,806 ergo we have a brand new linked list just point Lisa 1662 01:08:56,866 --> 01:08:57,906 at that node. 1663 01:08:58,516 --> 01:09:02,856 Now it gets a little trickier if the list belongs elsewhere. 1664 01:09:02,856 --> 01:09:05,726 We have to do a little thinking now. 1665 01:09:05,726 --> 01:09:07,106 Suppose that first is not null. 1666 01:09:07,106 --> 01:09:09,346 So Lisa is actually pointing at a node. 1667 01:09:09,586 --> 01:09:13,686 But suppose that the new nodes N field is less 1668 01:09:13,806 --> 01:09:16,156 than the first nodes N field. 1669 01:09:16,386 --> 01:09:18,286 So suppose the list has the number the number 5 in it 1670 01:09:18,426 --> 01:09:20,026 and I'm certain sorting number 1. 1671 01:09:20,026 --> 01:09:22,906 Just conceptually to keep these things in sorted order I need 1672 01:09:22,906 --> 01:09:26,576 to put someone between Lisa and that first node update Lisa 1673 01:09:26,576 --> 01:09:28,236 to point to this new thing number 1 1674 01:09:28,276 --> 01:09:30,236 and update 1 to point to 5. 1675 01:09:30,566 --> 01:09:31,706 So how do I do this? 1676 01:09:31,836 --> 01:09:33,846 Well first just notice our syntax here. 1677 01:09:33,846 --> 01:09:36,356 I'm using my arrow notation to check those values of N. 1678 01:09:36,546 --> 01:09:38,696 So if the new node N is less 1679 01:09:38,806 --> 01:09:41,056 than the first node's N what do I do? 1680 01:09:41,786 --> 01:09:42,776 New pointer next. 1681 01:09:43,056 --> 01:09:44,746 So here I can avoid temporary storage. 1682 01:09:44,746 --> 01:09:46,036 If I am now the number 1 1683 01:09:46,166 --> 01:09:49,926 and I belong here I can just steal whatever Lisa's already 1684 01:09:49,926 --> 01:09:50,746 pointing at. 1685 01:09:50,746 --> 01:09:51,856 So I'm the new node of 1. 1686 01:09:52,096 --> 01:09:55,446 I wanna point my left hand at whatever Lisa's already pointing 1687 01:09:55,806 --> 01:09:57,396 at so at this point of the story both she 1688 01:09:57,396 --> 01:09:58,996 and I are pointing at number 5. 1689 01:09:59,416 --> 01:10:01,476 But now, who do I have to update to finish the story? 1690 01:10:01,896 --> 01:10:03,896 I just say Lisa like point you r left hand not 1691 01:10:03,896 --> 01:10:05,996 at five anymore point it at number 1 1692 01:10:05,996 --> 01:10:09,216 and so now we have a little insertion at the list's head. 1693 01:10:09,826 --> 01:10:13,076 So if we scroll down further unfortunately it gets a little 1694 01:10:13,076 --> 01:10:14,636 messier and we have the advantage before 1695 01:10:14,636 --> 01:10:16,826 of using humans could've we just kind of walked back and forth 1696 01:10:16,826 --> 01:10:20,606 over stage but suffice it to say that these lines of code here 1697 01:10:20,926 --> 01:10:22,586 and that there is a loop involved here. 1698 01:10:22,866 --> 01:10:24,366 There is inequality test. 1699 01:10:24,366 --> 01:10:26,006 We're checking for duplicate values now 1700 01:10:26,006 --> 01:10:28,086 and we're also checking whether something is less 1701 01:10:28,226 --> 01:10:29,066 than or greater than. 1702 01:10:29,066 --> 01:10:30,216 So here's a greater than side. 1703 01:10:30,526 --> 01:10:32,366 Long story short, the remainder 1704 01:10:32,366 --> 01:10:34,906 of this function is what implements all these humans 1705 01:10:34,906 --> 01:10:37,316 working together where we had maybe a temporary variable. 1706 01:10:37,316 --> 01:10:39,276 We walked down the list, looked for the right spot. 1707 01:10:39,476 --> 01:10:41,296 Inserted the node into that location, 1708 01:10:41,486 --> 01:10:42,746 kept track of the left hand 1709 01:10:42,776 --> 01:10:44,256 to make sure we got the ordering right. 1710 01:10:44,426 --> 01:10:45,746 And this is definitely more complex 1711 01:10:45,796 --> 01:10:47,896 than those first scenarios but that's fine for now. 1712 01:10:48,196 --> 01:10:50,306 The take away ultimately for today is 1713 01:10:50,346 --> 01:10:53,726 that the basics have sunken. 1714 01:10:55,076 --> 01:10:57,146 Any questions on linked list? 1715 01:10:57,186 --> 01:10:57,936 They will recur. 1716 01:10:57,936 --> 01:10:59,906 It's not the last time. 1717 01:11:00,506 --> 01:11:01,256 Well that's it. 1718 01:11:01,256 --> 01:11:02,976 Why don't we end on this slide and we'll see you on Wednesday.