1 00:00:00,506 --> 00:00:10,556 [ Silence ] 2 00:00:11,056 --> 00:00:11,536 >> Alright. 3 00:00:11,536 --> 00:00:12,396 Welcome back to CS50. 4 00:00:12,396 --> 00:00:14,996 This is the end of week five, so no lecture on Monday 5 00:00:15,126 --> 00:00:16,876 because of the holiday, quiz zero is 6 00:00:16,876 --> 00:00:18,456 on Wednesday during lecture. 7 00:00:18,456 --> 00:00:19,976 Expect an email in the next day 8 00:00:19,976 --> 00:00:22,716 or two regarding the locations for quiz zero. 9 00:00:22,716 --> 00:00:24,006 We will separate you based 10 00:00:24,006 --> 00:00:25,946 on the first letter of your last name. 11 00:00:25,946 --> 00:00:28,486 So if you do not receive this email, 12 00:00:28,526 --> 00:00:30,856 just realize you should check the course's homepage before 13 00:00:30,856 --> 00:00:32,836 showing up anywhere on Wednesday. 14 00:00:32,836 --> 00:00:34,296 So the music you heard on your way 15 00:00:34,296 --> 00:00:36,416 in today is a bit of a special request. 16 00:00:36,416 --> 00:00:39,146 I received this email from one of your classmate yesterday. 17 00:00:39,676 --> 00:00:40,256 Hi, David. 18 00:00:40,256 --> 00:00:42,856 Could you play the Motion Sick 30 Lives, 19 00:00:42,926 --> 00:00:45,006 either the original version or the DDR remix. 20 00:00:45,316 --> 00:00:47,216 It would great whatever you think works better. 21 00:00:47,466 --> 00:00:49,456 Columbus Day is my second anniversary 22 00:00:49,456 --> 00:00:52,636 with my boyfriend Richard, and who's also 23 00:00:52,706 --> 00:00:54,526 in CS50 and is an avid gamer. 24 00:00:54,806 --> 00:00:56,996 So this song is sort of relevant. 25 00:00:57,066 --> 00:00:58,986 This would totally make our day. 26 00:00:58,986 --> 00:01:02,456 So, from CS50 to Joanna and Richard, 27 00:01:02,456 --> 00:01:03,946 Happy Second Anniversary. 28 00:01:04,516 --> 00:01:06,791 [ Cheering ] 29 00:01:07,291 --> 00:01:09,566 [ Applause ] 30 00:01:10,066 --> 00:01:12,136 >> So for those unfamiliar, this song, 30 lives, 31 00:01:12,626 --> 00:01:15,986 is actually poking fun at a game that I grew up with. 32 00:01:15,986 --> 00:01:18,166 It's referring to what's called the Konami code, 33 00:01:18,166 --> 00:01:20,636 which is the sequence of inputs 34 00:01:20,636 --> 00:01:22,936 on the original Nintendo controller whereby 35 00:01:22,936 --> 00:01:26,516 when starting this game called Contra, as well as some others, 36 00:01:26,776 --> 00:01:29,146 you could hit up, up, down, down, left, right, left, left, 37 00:01:29,146 --> 00:01:32,416 right, B, A, and you would immediately get 30 free lives 38 00:01:32,706 --> 00:01:34,176 with which to proceed in the game. 39 00:01:34,176 --> 00:01:36,656 And it's a funny thing, 30 plus years later, 40 00:01:36,656 --> 00:01:38,396 I've still remembered this code. 41 00:01:38,396 --> 00:01:41,426 But for those who've never seen the game, I thought I would pull 42 00:01:41,556 --> 00:01:44,656 up someone's playing there for just a moment here 43 00:01:45,056 --> 00:01:48,676 so that you too now know the sounds that I grew up with. 44 00:01:50,516 --> 00:02:20,846 [ Music ] 45 00:02:21,346 --> 00:02:24,666 >> Anyhow it's hard to justify too many seconds of that video. 46 00:02:24,996 --> 00:02:29,066 But that is what video games used to look like. 47 00:02:29,066 --> 00:02:32,276 Now, for those of you haven't really dug into the source code, 48 00:02:32,336 --> 00:02:34,266 the HTML as we'll call it in a few weeks, 49 00:02:34,456 --> 00:02:37,046 of the course's own homepage, it turns out that our site, 50 00:02:37,286 --> 00:02:40,926 like a lot, actually has a little Easter Egg in it, 51 00:02:41,016 --> 00:02:45,716 whereby if you go to cs50.net and hit up, up, down, down, 52 00:02:45,716 --> 00:02:48,786 left, right, left, left, right, B, A-- 53 00:02:48,786 --> 00:02:50,036 [ Laughter ] 54 00:02:50,036 --> 00:02:51,536 >> You indeed do get that. 55 00:02:51,626 --> 00:02:51,876 So-- 56 00:02:52,376 --> 00:02:54,616 [ Applause ] 57 00:02:55,116 --> 00:02:56,716 >> Anyhow, alright. 58 00:02:56,906 --> 00:02:59,496 So-- oh, speaking of games, I got one more for you. 59 00:02:59,496 --> 00:03:03,026 So speaking of games, this is a popular cartoon called XKCD 60 00:03:03,216 --> 00:03:12,096 that you should not understand. 61 00:03:12,096 --> 00:03:12,163 [ Laughter ] 62 00:03:12,163 --> 00:03:14,436 >> So, a little bit of geek humor today. 63 00:03:14,436 --> 00:03:18,806 So the CS50 library, we finally started peeling back the layers 64 00:03:18,856 --> 00:03:20,296 that it is on Monday. 65 00:03:20,296 --> 00:03:22,956 And what I thought I'd do is draw your attention specifically 66 00:03:22,956 --> 00:03:24,696 to just a couple of the features thereof. 67 00:03:24,696 --> 00:03:27,316 It's not important that you understand exactly what every 68 00:03:27,316 --> 00:03:29,956 line of code in the CS50 library does right now. 69 00:03:30,216 --> 00:03:33,216 Towards semester's end, it's absolutely a set of code 70 00:03:33,216 --> 00:03:35,516 that you should be able to read through and understand. 71 00:03:35,516 --> 00:03:37,846 And even if you hit some line that you don't quite understand, 72 00:03:37,846 --> 00:03:40,216 you should know that you can type man foo 73 00:03:40,216 --> 00:03:43,146 on the command prompt to get the man page for some function foo, 74 00:03:43,146 --> 00:03:45,146 you can Google for some answer to a function. 75 00:03:45,386 --> 00:03:47,476 But by the semester's end, you should certainly be able 76 00:03:47,476 --> 00:03:50,016 to understand it at high level and then dive 77 00:03:50,016 --> 00:03:53,106 in deeper using the tools that you picked up along the way. 78 00:03:53,386 --> 00:03:57,356 So there are these-- there are these types 79 00:03:57,356 --> 00:04:00,196 that are declared only in CS50's library. 80 00:04:00,536 --> 00:04:03,026 And everything seems to be slightly off there. 81 00:04:03,026 --> 00:04:03,866 Let me reset. 82 00:04:04,926 --> 00:04:08,686 There are two files in the CS50 library, the first one 83 00:04:08,686 --> 00:04:12,086 of which is called cs50.h. This is a header file, 84 00:04:12,396 --> 00:04:14,476 which if you've ever looked at, it has a whole lot of stuff, 85 00:04:14,476 --> 00:04:17,006 but it's mostly comments, and it's mostly the prototypes 86 00:04:17,006 --> 00:04:20,186 for all the functions in the .c file, the prototypes 87 00:04:20,186 --> 00:04:23,496 for getString, getInt, getLongLong and so forth. 88 00:04:23,676 --> 00:04:25,256 But if you've ever looked through here, 89 00:04:25,256 --> 00:04:27,316 you'll notice some new features that I'll just point out so 90 00:04:27,316 --> 00:04:28,456 that you seen them before. 91 00:04:28,686 --> 00:04:30,446 Well, there's this little construct up here. 92 00:04:30,446 --> 00:04:34,246 We've seen sharp define in the context of declaring constants. 93 00:04:34,496 --> 00:04:35,946 Well, that's kind of what we're doing here. 94 00:04:35,946 --> 00:04:39,816 We're defining the constants called _cs50_h. 95 00:04:40,096 --> 00:04:42,726 This is just kind of an old school trick to-- 96 00:04:42,726 --> 00:04:46,146 whereby you define a constant that you can then check 97 00:04:46,256 --> 00:04:49,086 for later to make sure you're not including a header file 98 00:04:49,086 --> 00:04:50,026 multiple times. 99 00:04:50,026 --> 00:04:51,656 So it turns out you haven't needed to do this 100 00:04:51,656 --> 00:04:54,636 for your own programs because when you've used sudoku.h, 101 00:04:54,876 --> 00:04:57,836 the only file including it was sudoku.c. And same deal 102 00:04:57,836 --> 00:04:59,656 with helpers.h, the only file 103 00:04:59,876 --> 00:05:03,526 that was including it last week was helpers.c. But when you get 104 00:05:03,526 --> 00:05:06,296 to the point of writing programs whereby multiple files, 105 00:05:06,526 --> 00:05:09,416 might very well want to include the same header file, 106 00:05:09,676 --> 00:05:10,776 or you need to make sure 107 00:05:10,776 --> 00:05:12,956 that you only read it into memory once. 108 00:05:13,116 --> 00:05:15,996 Otherwise, you're gonna try doubly defining function 109 00:05:15,996 --> 00:05:18,526 prototypes, you're gonna try doubly defining constants 110 00:05:18,526 --> 00:05:19,006 and so forth. 111 00:05:19,306 --> 00:05:21,836 So this is just a little preprocessor director 112 00:05:22,096 --> 00:05:24,316 that essentially is an if condition written 113 00:05:24,316 --> 00:05:27,806 in a preprocessing language the GCC understands that says 114 00:05:27,806 --> 00:05:30,666 if you've already read this file before, don't do this again. 115 00:05:30,876 --> 00:05:32,446 So just tuck that away if curios. 116 00:05:32,786 --> 00:05:35,806 We happened to include a couple of libraries ourselves 117 00:05:35,806 --> 00:05:38,416 so that we can use certain features later on in the file. 118 00:05:39,006 --> 00:05:42,406 We happened to include a file called standardbull.h. It's 119 00:05:42,406 --> 00:05:44,686 actually a math file that this notion of true 120 00:05:44,686 --> 00:05:47,106 and false is defined, which isn't native to C, 121 00:05:47,356 --> 00:05:50,896 but it's just a synonym for 0 and 1, and then there it is. 122 00:05:51,176 --> 00:05:52,996 We learned typedef a couple of days ago. 123 00:05:52,996 --> 00:05:55,296 And typedef allows you to define your own types. 124 00:05:55,586 --> 00:05:58,236 Well, here is how we define the keyword string 125 00:05:58,476 --> 00:06:00,106 and it maps to Char Star. 126 00:06:00,106 --> 00:06:01,346 So the syntax is a little weird. 127 00:06:01,346 --> 00:06:03,306 Things are kind of maybe in reverse order than you'd expect 128 00:06:03,306 --> 00:06:06,286 and the star is on the string-- next to string itself. 129 00:06:06,366 --> 00:06:09,476 But that line of code just says declare a keyword called string 130 00:06:09,476 --> 00:06:12,586 and henceforth, it's an alias for Char Star. 131 00:06:12,586 --> 00:06:15,256 And then the rest of the file is just comments and prototypes. 132 00:06:15,576 --> 00:06:18,446 Hey, here comes getChar, here comes getDouble, 133 00:06:18,446 --> 00:06:20,216 here comes getInt and so forth. 134 00:06:20,526 --> 00:06:23,966 So all this time, when you've been sharp including CS.h, 135 00:06:24,236 --> 00:06:27,226 you're telling GCC these are the prototypes 136 00:06:27,226 --> 00:06:30,616 for these functions 'cause they are in cs50.h, so don't fret 137 00:06:30,826 --> 00:06:33,536 if I, the programmer, start using getString, 138 00:06:33,536 --> 00:06:36,856 getInt and getFloat in my own code, because rest assured, 139 00:06:36,856 --> 00:06:39,516 they've been declared already in this header file. 140 00:06:39,746 --> 00:06:40,946 Now, what's in the C file? 141 00:06:41,166 --> 00:06:45,446 Well, the C file, cs50.c, has the implementations 142 00:06:45,446 --> 00:06:48,886 or definitions of all of these functions. 143 00:06:49,136 --> 00:06:52,686 At the top notice, we include itself, cs50.h. And then 144 00:06:52,686 --> 00:06:54,866 if we scroll down, we'll see what we saw Monday, 145 00:06:54,866 --> 00:06:56,126 namely getString. 146 00:06:56,306 --> 00:06:59,826 Let me go to the function implementation for getString. 147 00:07:00,156 --> 00:07:02,856 And recall that we do this little tricks in our code, 148 00:07:02,856 --> 00:07:06,016 whereby conceptually, we start with an empty buffer, thinking, 149 00:07:06,016 --> 00:07:07,826 well, maybe the user won't get us any input. 150 00:07:08,026 --> 00:07:10,056 But as soon as the users starts giving us input, 151 00:07:10,236 --> 00:07:13,936 we start growing this buffer by calling malloc, or its cousin, 152 00:07:14,096 --> 00:07:16,776 realloc, whereby we get more and more memory 153 00:07:16,776 --> 00:07:19,046 from the operating system to fit more and more characters 154 00:07:19,046 --> 00:07:22,386 from the user, but at every step, we take extra care never 155 00:07:22,626 --> 00:07:26,346 to exceed the boundaries of our own buffer, our own array 156 00:07:26,346 --> 00:07:28,416 of characters, because we, one, 157 00:07:28,656 --> 00:07:32,676 use that special function called fgetc, which gets one character 158 00:07:32,806 --> 00:07:35,376 from a file, F, from standard in, 159 00:07:35,376 --> 00:07:38,206 which is just a special keyword referring to the keyboard. 160 00:07:38,456 --> 00:07:40,466 And so, by doing this, it feels a little efficient, 161 00:07:40,466 --> 00:07:41,456 and arguably, it is. 162 00:07:41,566 --> 00:07:44,476 But by just inching our way along the user's input, 163 00:07:44,696 --> 00:07:46,996 we make sure that we always have enough storage space 164 00:07:46,996 --> 00:07:48,456 to receive it, so that we don't run 165 00:07:48,456 --> 00:07:51,606 into the so-called buffer overflow or overrun attack 166 00:07:51,666 --> 00:07:52,886 that we described last time. 167 00:07:52,886 --> 00:07:54,906 And it was this if condition we pointed out Monday, 168 00:07:55,226 --> 00:07:57,056 that you can look over in more detail if you'd like, 169 00:07:57,276 --> 00:08:00,206 that takes care of checking; do I have enough space in buffer. 170 00:08:00,206 --> 00:08:03,646 If not, let me double it, copy things over with realloc, 171 00:08:03,646 --> 00:08:05,566 and then proceed along my way. 172 00:08:05,786 --> 00:08:08,216 And there is actually an optimization toward the very end 173 00:08:08,216 --> 00:08:10,106 of this function which we didn't look at on Monday. 174 00:08:10,426 --> 00:08:11,646 There's a few lines of code 175 00:08:11,646 --> 00:08:13,596 that the comment says minimize buffer. 176 00:08:14,186 --> 00:08:15,136 So if you want to get better-- 177 00:08:15,416 --> 00:08:18,616 a better understanding of how malloc works and string copy 178 00:08:18,616 --> 00:08:21,616 or stir N copy which copies specific number, 179 00:08:21,676 --> 00:08:24,626 N of bytes from array to another, you can read 180 00:08:24,626 --> 00:08:26,636 through this code, and even with the skills we've picked 181 00:08:26,636 --> 00:08:27,926 up in piece that's 3 and 4, 182 00:08:28,076 --> 00:08:29,986 can probably make sense of what it's doing. 183 00:08:29,986 --> 00:08:33,106 In this last few lines of code, it's essentially saying, 184 00:08:33,256 --> 00:08:36,196 if I completely overcompensated by doubling the size 185 00:08:36,196 --> 00:08:38,886 of the buffer, when really, I just needed one more character, 186 00:08:39,086 --> 00:08:42,046 well, this clean up process at the bottom of the function is 187 00:08:42,046 --> 00:08:43,846 where we say, alright, give me 188 00:08:43,846 --> 00:08:46,766 yet another buffer that's exactly the right size, 189 00:08:47,056 --> 00:08:49,166 copy the stuff from the original buffer 190 00:08:49,166 --> 00:08:52,696 into this new smaller array, and then free, 191 00:08:52,826 --> 00:08:55,886 using this function free, the original buffer. 192 00:08:55,886 --> 00:08:57,986 So again, I'm pointing out just some of the highlights 193 00:08:57,986 --> 00:09:00,136 of the code, not necessarily some of the specifics. 194 00:09:00,346 --> 00:09:01,876 But realize, at the end of this function, 195 00:09:01,876 --> 00:09:05,946 you have in memory exactly enough bytes plus one 196 00:09:06,246 --> 00:09:07,446 for what the user typed in. 197 00:09:07,446 --> 00:09:08,696 And what's that plus one for? 198 00:09:08,696 --> 00:09:09,266 [ Inaudible Remark ] 199 00:09:09,266 --> 00:09:13,446 >> That backlash 0, the terminating-- the terminator-- 200 00:09:13,446 --> 00:09:16,746 the sentinel value that says this is the end of the string. 201 00:09:16,996 --> 00:09:18,116 Now, we won't look at all 202 00:09:18,116 --> 00:09:20,846 of the functions 'cause they're pretty much identical in spirit. 203 00:09:20,846 --> 00:09:22,086 But we will look at getInt. 204 00:09:22,086 --> 00:09:24,486 It turns out, we realized, when writing this library, 205 00:09:24,836 --> 00:09:27,106 that you know what, almost everyone 206 00:09:27,106 --> 00:09:29,596 of this functions involves getting input from the user 207 00:09:29,786 --> 00:09:31,396 and then returning a different data type. 208 00:09:31,446 --> 00:09:35,226 Well, why not treat the user's input initially as just a string 209 00:09:35,516 --> 00:09:37,046 and then check; is this is an int? 210 00:09:37,046 --> 00:09:38,136 If so, return in int. 211 00:09:38,136 --> 00:09:39,036 Is this is a float? 212 00:09:39,036 --> 00:09:40,696 If so return a float. 213 00:09:40,886 --> 00:09:43,926 And so, we actually use a trick in getInt, in getFloat, 214 00:09:43,926 --> 00:09:47,206 in getLongLong, whereby we first used getString 215 00:09:47,506 --> 00:09:48,686 to get this user's input. 216 00:09:48,746 --> 00:09:51,816 So notice here, this is getInt, at the very top of this code, 217 00:09:51,816 --> 00:09:54,906 we call getString, and we put that string 218 00:09:54,906 --> 00:09:56,106 in a variable called line. 219 00:09:56,456 --> 00:09:58,116 And then down here, we use a new trick, 220 00:09:58,116 --> 00:09:59,716 but there's not all that much to get int. 221 00:10:00,036 --> 00:10:03,976 There's this function called sscanf for string scan F. 222 00:10:04,596 --> 00:10:05,666 >> The F means format. 223 00:10:05,886 --> 00:10:07,386 So this is another cousin of printf, 224 00:10:07,846 --> 00:10:09,696 much like fprintf was on Monday. 225 00:10:10,176 --> 00:10:13,826 So sscanf does this; you give it a string its first argument, 226 00:10:13,826 --> 00:10:16,756 for instance line, and line is what the user just typed in. 227 00:10:17,086 --> 00:10:18,766 You then give it a for a format string, 228 00:10:18,996 --> 00:10:21,726 which unlike printf tells the computer what to print, 229 00:10:22,086 --> 00:10:23,766 this time in the context of scanf, 230 00:10:23,986 --> 00:10:25,916 the format string tells the computer what 231 00:10:25,916 --> 00:10:27,826 to expect, at least ideally. 232 00:10:28,116 --> 00:10:31,626 So this format string is saying to scanf, hopefully, 233 00:10:31,626 --> 00:10:33,666 inside of this string called line, 234 00:10:33,886 --> 00:10:37,276 there is an int followed by maybe a character. 235 00:10:37,276 --> 00:10:39,466 So maybe the user typed the number then hit the spacebar, 236 00:10:39,466 --> 00:10:41,336 or hit the spacebar then a number. 237 00:10:41,506 --> 00:10:44,616 But in short, this allows us to tell scanf, hopefully, 238 00:10:44,616 --> 00:10:45,486 you're gonna get an int, 239 00:10:45,846 --> 00:10:47,796 and then you might also get a charge, 240 00:10:47,796 --> 00:10:50,186 just if the user is being-- 241 00:10:50,686 --> 00:10:52,426 making a typo or being difficult. 242 00:10:52,776 --> 00:10:54,296 And so, at the end of this line, 243 00:10:54,296 --> 00:10:56,636 we check what the return value of scanf-- sscanf. 244 00:10:57,136 --> 00:11:01,306 Sscanf tells you exactly how many of those placeholders, 245 00:11:01,306 --> 00:11:03,386 the percent signs, were actually filled 246 00:11:03,386 --> 00:11:04,836 with values from the keyboard. 247 00:11:05,196 --> 00:11:08,696 So what we're hoping is that the user actually only types 248 00:11:08,696 --> 00:11:12,086 in an integer, in which case only the percent D placeholder 249 00:11:12,086 --> 00:11:12,846 will be filled. 250 00:11:13,176 --> 00:11:15,696 And so, sscanf will return 1. 251 00:11:15,696 --> 00:11:16,466 That's a good thing. 252 00:11:16,656 --> 00:11:19,966 Now, if the user is being difficult or makes a mistake 253 00:11:19,966 --> 00:11:25,096 at the keyboard and types in 123X, well, we're gonna capture 254 00:11:25,096 --> 00:11:28,366 that X in percent C because it's not a digit, it's not a number, 255 00:11:28,366 --> 00:11:30,676 it's a character C. And so, sscanf, 256 00:11:30,676 --> 00:11:33,366 in that accidental scenario, is gonna instead return what? 257 00:11:34,286 --> 00:11:36,406 So, 2. And so, we can detect that, 258 00:11:36,566 --> 00:11:38,336 because 2 obviously doesn't equal 1. 259 00:11:38,566 --> 00:11:41,366 And so, if that happens, this [inaudible] block executes, 260 00:11:41,366 --> 00:11:43,926 we get rid off-- we free the string, we free the memory 261 00:11:43,926 --> 00:11:45,796 that we just used to store the user's input, 262 00:11:46,046 --> 00:11:47,686 we yell at the user retry. 263 00:11:47,786 --> 00:11:49,896 So a lot of you have probably seen this retry prompt. 264 00:11:49,966 --> 00:11:50,916 This is where it comes from. 265 00:11:51,196 --> 00:11:53,706 And because we're in this intentionally infinite loop 266 00:11:53,706 --> 00:11:55,806 right now, the whole process repeats. 267 00:11:56,186 --> 00:11:59,076 So to summarize, we use this new function called sscanf 268 00:11:59,466 --> 00:12:02,766 that just allows us to parse, so to speak, the user's input 269 00:12:02,766 --> 00:12:04,456 and tees apart the various data types. 270 00:12:04,456 --> 00:12:06,996 And now, there is one new trick that I glossed over. 271 00:12:07,416 --> 00:12:09,236 The first argument is the string to parse. 272 00:12:09,636 --> 00:12:11,616 The second argument is the format string, 273 00:12:11,676 --> 00:12:14,886 what are ampersand N and ampersand C. 274 00:12:14,886 --> 00:12:14,953 [ Inaudible Remark ] 275 00:12:14,953 --> 00:12:18,026 >> So, yeah, these are the addresses. 276 00:12:18,026 --> 00:12:19,316 Ampersand means address if. 277 00:12:19,536 --> 00:12:22,506 That's the address of operator, and this returns the address 278 00:12:22,596 --> 00:12:25,066 of N and the address of C. Well, what are those? 279 00:12:25,326 --> 00:12:27,706 Well, it looks like-- ah, here they are right here. 280 00:12:27,906 --> 00:12:29,996 These are the local variables on the stack-- 281 00:12:30,056 --> 00:12:31,286 which is fine 'cause they're temporary-- 282 00:12:31,496 --> 00:12:35,586 local variables called N and C, but I can't just do this 283 00:12:35,586 --> 00:12:37,136 as I've normally do with printf. 284 00:12:37,346 --> 00:12:40,676 I can't just do this, because if I omit the ampersands, 285 00:12:41,056 --> 00:12:45,356 what gets passed into a sscanf? 286 00:12:45,356 --> 00:12:45,546 [ Inaudible Remark ] 287 00:12:45,546 --> 00:12:46,386 >> Right, the values. 288 00:12:46,386 --> 00:12:48,746 So by default, when you pass arguments to a function, 289 00:12:48,746 --> 00:12:50,916 recall that you pass them by value, 290 00:12:50,916 --> 00:12:51,986 which means you get a copy. 291 00:12:52,196 --> 00:12:55,276 So sure, sscanf has now been passed to copy of N 292 00:12:55,446 --> 00:12:57,716 which has some garbage value cause I didn't initialize it. 293 00:12:57,946 --> 00:13:00,496 Sscanf has been passed a copy of C 294 00:13:00,496 --> 00:13:02,046 which also has some garbage value. 295 00:13:02,336 --> 00:13:05,886 But if sscanf doesn't know where those variables live in memory, 296 00:13:06,046 --> 00:13:09,176 doesn't know their addresses, he can change it all they want, 297 00:13:09,336 --> 00:13:10,196 but it's not permanent. 298 00:13:10,276 --> 00:13:12,066 Because as soon as sscanf returns, 299 00:13:12,246 --> 00:13:13,486 the changes are gonna be lost. 300 00:13:13,486 --> 00:13:16,846 So anytime you want to function in C to be able to mutate, 301 00:13:16,846 --> 00:13:20,036 to change the value inside of some variable like an int 302 00:13:20,036 --> 00:13:23,126 or a char, you can't just pass in the int or char like this. 303 00:13:23,466 --> 00:13:25,196 You actually have to pass in the address. 304 00:13:25,426 --> 00:13:27,766 And what that means, to pass the address and recall, 305 00:13:28,086 --> 00:13:31,786 is to pass in a pointer to each of these variables. 306 00:13:31,786 --> 00:13:34,276 And the ampersand handles all of the hard work for you. 307 00:13:34,276 --> 00:13:37,556 It figures out where in ram N and where ram C are, 308 00:13:37,856 --> 00:13:40,096 and then passes in those pointers. 309 00:13:40,506 --> 00:13:40,626 Yeah? 310 00:13:40,626 --> 00:13:42,606 [ Inaudible Remark ] 311 00:13:42,606 --> 00:13:45,616 >> Good question. 312 00:13:45,616 --> 00:13:48,026 If the user inputs only characters, 313 00:13:48,026 --> 00:13:49,576 will sscanf return one? 314 00:13:49,576 --> 00:13:51,686 No. It will return zero in that case, 315 00:13:51,746 --> 00:13:52,856 because it will not have been able 316 00:13:52,856 --> 00:13:55,056 to populate percent D first. 317 00:13:55,196 --> 00:13:57,296 So the percent sign are important here. 318 00:13:57,296 --> 00:13:59,146 And if we actually flip through the code for getFloat, 319 00:13:59,146 --> 00:14:01,836 getLongLong, they're almost identical functions, 320 00:14:01,836 --> 00:14:03,796 except we use different format strings, for instance, 321 00:14:03,796 --> 00:14:06,506 percent F for a float instead of percent D. 322 00:14:06,876 --> 00:14:09,306 So let's actually use this new trick, sscanf. 323 00:14:09,376 --> 00:14:11,316 It's not something you have to use all that often. 324 00:14:11,316 --> 00:14:13,666 But when we start to manipulate files as we will 325 00:14:13,666 --> 00:14:16,286 for the forensics P set later on this semester, when you have 326 00:14:16,286 --> 00:14:19,596 to read in lots of words for a dictionary, it's gonna be useful 327 00:14:19,596 --> 00:14:21,336 to understand exactly what's going on. 328 00:14:21,336 --> 00:14:24,976 And so, this is one file, scanf.c, 329 00:14:24,976 --> 00:14:27,236 which is in Monday's print out, it's from last week, 330 00:14:27,266 --> 00:14:29,386 but I thought I'd resurrect it because we didn't bother looking 331 00:14:29,386 --> 00:14:33,056 at it, this is a file called scanf1.c, 332 00:14:33,216 --> 00:14:34,296 and it's pretty darn short. 333 00:14:34,616 --> 00:14:36,706 It doesn't even include the CS50 Library. 334 00:14:36,706 --> 00:14:37,876 So that training wheel is off. 335 00:14:38,156 --> 00:14:40,976 It doesn't take any command-line arguments as suggested by void. 336 00:14:41,166 --> 00:14:43,696 It does have a local variable called X, then we print 337 00:14:43,696 --> 00:14:47,256 out something fluffy number please, and then we use scanf. 338 00:14:47,726 --> 00:14:51,336 So this time, I'm using scanf instead of sscanf, 339 00:14:51,776 --> 00:14:54,166 because whereas sscanf, string scanf, 340 00:14:54,496 --> 00:14:57,726 expects to be past the string, S-- sorry-- 341 00:14:58,066 --> 00:15:01,646 rather, scanf, without that additional S, reads the input 342 00:15:01,646 --> 00:15:04,936 from where presumably, from the keyboard directly. 343 00:15:04,936 --> 00:15:06,166 So we're not using getString. 344 00:15:06,166 --> 00:15:07,506 So, same idea, this function, 345 00:15:07,506 --> 00:15:09,156 but it takes one fewer parameters. 346 00:15:09,156 --> 00:15:10,216 The first one is gone. 347 00:15:10,476 --> 00:15:12,326 Instead, we just pass in a format string 348 00:15:12,326 --> 00:15:14,666 because the keyboard is assumed as in source for the input. 349 00:15:14,946 --> 00:15:17,516 So this is actually sort of the old fashioned way 350 00:15:17,516 --> 00:15:19,366 or the non-CS50 library way 351 00:15:19,366 --> 00:15:20,866 of just getting an int from the user. 352 00:15:21,136 --> 00:15:24,556 You call scanf, you pass in the format string of percent D, 353 00:15:24,686 --> 00:15:26,486 and you pass scanf ampersand X, 354 00:15:26,896 --> 00:15:29,516 where X again is just some local value, and voila, 355 00:15:29,516 --> 00:15:32,506 you have now stored a number inside of X. 356 00:15:32,506 --> 00:15:33,856 So let's go ahead and compile this. 357 00:15:33,896 --> 00:15:39,056 So make scanf1, go ahead and run scanf1, and I'm gonna type in 1, 358 00:15:39,056 --> 00:15:41,286 2, 3, and thanks for the 1, 2, 3. 359 00:15:41,286 --> 00:15:42,356 So that in fact, worked. 360 00:15:42,356 --> 00:15:45,556 Now, let's go ahead and type A, B, C. Hit and-- 361 00:15:45,616 --> 00:15:47,546 oh, something went wrong. 362 00:15:47,546 --> 00:15:49,826 So this is why, to paint a picture 363 00:15:49,826 --> 00:15:50,836 with a very simple example, 364 00:15:50,836 --> 00:15:52,786 we introduce the CS50 library early on, 365 00:15:53,096 --> 00:15:54,956 because this is-- it looks simple. 366 00:15:54,956 --> 00:15:56,546 Frankly, this is a lot simpler than some 367 00:15:56,546 --> 00:15:57,456 of the code we've looked at, 368 00:15:57,816 --> 00:16:01,706 but its' so easily broken just by a difficult user. 369 00:16:01,706 --> 00:16:05,126 And so, the trick we use with sscanf, and two format strings 370 00:16:05,126 --> 00:16:07,886 and so forth allows us to detect errors for you 371 00:16:08,156 --> 00:16:10,116 and then trigger that retry prompt. 372 00:16:10,306 --> 00:16:13,056 Moreover, notice how dangerous functions like these are. 373 00:16:13,116 --> 00:16:17,366 This is scanf2.c. It's just as short, but this time, 374 00:16:17,366 --> 00:16:22,726 instead of using an int, I'm using a char star, aka string, 375 00:16:22,766 --> 00:16:23,856 but I'm calling it char star 376 00:16:23,856 --> 00:16:26,396 because I'm not using the CS50 library here this time. 377 00:16:26,746 --> 00:16:29,606 So a char star is not a buffer, 378 00:16:29,606 --> 00:16:32,346 and this variable name is what's intentionally misleading. 379 00:16:32,346 --> 00:16:34,826 Generally, a buffer is like an empty array, but an array 380 00:16:34,826 --> 00:16:35,966 that has some storage space. 381 00:16:35,966 --> 00:16:38,506 It might be-- there might not be anything legit in it yet, 382 00:16:38,776 --> 00:16:40,506 but it's at least memory that's been allocated. 383 00:16:40,716 --> 00:16:43,056 But with this one of code, char start buffer, 384 00:16:43,056 --> 00:16:45,736 what has been allocated for me exactly? 385 00:16:45,736 --> 00:16:46,346 [ Inaudible Remark ] 386 00:16:46,346 --> 00:16:48,326 >> A little louder. 387 00:16:48,326 --> 00:16:48,966 [ Inaudible Remark ] 388 00:16:48,966 --> 00:16:50,946 >> So just a pointer. 389 00:16:50,946 --> 00:16:52,066 Only a pointer. 390 00:16:52,236 --> 00:16:54,326 The only thing that's been allocated by that line 391 00:16:54,326 --> 00:16:58,166 of code recall is something we always draw as a square 392 00:16:58,396 --> 00:17:01,266 that represents four bytes, or 32 bits of storage space 393 00:17:01,266 --> 00:17:02,706 for an array, for a string. 394 00:17:02,956 --> 00:17:05,096 What has not been allocated by that line 395 00:17:05,096 --> 00:17:08,946 of code is anything even resembling those actual storage 396 00:17:08,946 --> 00:17:10,886 spaces for the string itself. 397 00:17:11,236 --> 00:17:13,216 And so, let's see what I proceed to do with this. 398 00:17:13,216 --> 00:17:16,006 Well, I call printf and just say string please, 399 00:17:16,326 --> 00:17:18,016 then I call scanf percent S, 400 00:17:18,016 --> 00:17:21,056 and that's okay 'cause I want a string to come in from the user, 401 00:17:21,386 --> 00:17:24,566 and I am passing the address to-- 402 00:17:24,566 --> 00:17:27,046 the address of a string to scanf. 403 00:17:27,676 --> 00:17:30,346 But what's gonna happen with this line of code, with scanf? 404 00:17:30,566 --> 00:17:33,356 Where is it gonna read this bytes 405 00:17:33,436 --> 00:17:35,006 from the user's keyboard into? 406 00:17:35,006 --> 00:17:39,146 Where are they gonna be put in memory? 407 00:17:39,296 --> 00:17:41,236 What-- where-- whatever is-- 408 00:17:41,236 --> 00:17:42,786 recall that when you just declare a variable, 409 00:17:42,786 --> 00:17:43,836 who knows what this could be. 410 00:17:43,836 --> 00:17:45,496 This could be 9, 8, 7. 411 00:17:45,496 --> 00:17:47,426 This is just some random garbage value. 412 00:17:47,696 --> 00:17:50,326 So when you call scanf, pass in percent S, 413 00:17:50,326 --> 00:17:52,996 so here come the string, the user types in a string, 414 00:17:52,996 --> 00:17:56,956 and you say put the string at the address called buffer, well, 415 00:17:56,956 --> 00:17:59,686 you know where those bytes that the user types in is gonna go? 416 00:18:00,006 --> 00:18:01,516 At this address, in memory. 417 00:18:01,566 --> 00:18:03,946 But who knows what's actually at that address. 418 00:18:04,206 --> 00:18:05,706 And what almost always happens 419 00:18:05,706 --> 00:18:08,336 when you just put something wherever you want without regard 420 00:18:08,336 --> 00:18:09,936 for it actually being your own memory? 421 00:18:09,936 --> 00:18:10,516 [ Inaudible Remark ] 422 00:18:10,516 --> 00:18:12,056 >> So, segufault. 423 00:18:12,056 --> 00:18:12,966 So let's try this. 424 00:18:12,966 --> 00:18:15,706 So let's run scanf2 string please, 425 00:18:15,936 --> 00:18:18,786 hello there, segmentation fault. 426 00:18:18,826 --> 00:18:22,076 It's that easy, because I have now indeed, read in the string, 427 00:18:22,076 --> 00:18:25,076 hello there exclamation point, put it at some address, 428 00:18:25,146 --> 00:18:27,706 some bogus address, but there is no storage space, 429 00:18:27,706 --> 00:18:30,036 there is no chunk of memory, there is no array 430 00:18:30,036 --> 00:18:32,456 of characters waiting to receive that input. 431 00:18:32,756 --> 00:18:34,686 So what does it mean to over flow a buffer? 432 00:18:35,106 --> 00:18:35,886 Case and point. 433 00:18:35,886 --> 00:18:38,056 If I were more clever here and I knew exactly how 434 00:18:38,056 --> 00:18:40,296 to write some attack code with just my keyboard 435 00:18:40,296 --> 00:18:43,256 and I could write out the 0s and 1s essentially in the right way 436 00:18:43,256 --> 00:18:46,756 with ASCII, I could perhaps compromise this machine 437 00:18:46,756 --> 00:18:47,736 or trick the program 438 00:18:47,736 --> 00:18:50,686 into executing some code that's not expected, because recall, 439 00:18:50,686 --> 00:18:53,566 we did exactly that on Monday by just assuming that what's 440 00:18:53,566 --> 00:18:55,516 in argv1 is legitimate. 441 00:18:55,846 --> 00:18:57,996 And so, we could potentially trigger 442 00:18:57,996 --> 00:19:00,716 that thing we called a buffer overrun exploit. 443 00:19:00,716 --> 00:19:03,256 It's that easy unless you actually code defensively, 444 00:19:03,416 --> 00:19:06,256 and the CS50 library at least is all about coding defensively 445 00:19:06,256 --> 00:19:09,426 and protecting you from threats against some user, 446 00:19:09,626 --> 00:19:12,096 but also mistakes from you, the programmer. 447 00:19:12,796 --> 00:19:12,886 Yeah? 448 00:19:13,386 --> 00:19:18,086 [ Inaudible Remark ] 449 00:19:18,586 --> 00:19:20,086 >> It is. So that-- 450 00:19:20,086 --> 00:19:23,036 is the pointer interpreted as the memory address? 451 00:19:23,036 --> 00:19:25,586 So, yes. We can actually see this as follows. 452 00:19:25,586 --> 00:19:29,366 Let me go ahead and load scanf2 again with my text editor, 453 00:19:29,366 --> 00:19:30,846 and let me go ahead and do this. 454 00:19:30,896 --> 00:19:36,526 Before I say string please, I'm gonna say I have a storage 455 00:19:36,596 --> 00:19:41,426 for a pointer at percent D. So I'm gonna put this pointer-- 456 00:19:41,426 --> 00:19:44,896 I'm gonna print out the address that is currently 457 00:19:44,896 --> 00:19:47,176 in that buffer array-- buffer variable. 458 00:19:47,176 --> 00:19:51,906 So let me go ahead and compile this, scanf2, let me run scanf2. 459 00:19:52,316 --> 00:19:56,276 So apparently, just by chance, the garbage value called buffer 460 00:19:56,276 --> 00:19:59,726 that we have is 2719732. 461 00:19:59,906 --> 00:20:01,976 So when I guess 987 before, I was a little off. 462 00:20:02,386 --> 00:20:04,136 >> That's the garbage value that happens to be 463 00:20:04,136 --> 00:20:05,656 in the computers' memory at this point in time. 464 00:20:05,916 --> 00:20:06,746 Now, what does that mean? 465 00:20:06,986 --> 00:20:09,846 When I call scanf, and I say put whatever the user types 466 00:20:09,846 --> 00:20:12,596 at the address in buffer, that's where it's gonna go, 467 00:20:12,596 --> 00:20:14,506 and odds are, if my program just started, 468 00:20:14,506 --> 00:20:15,916 I certainly have not called malloc, 469 00:20:15,916 --> 00:20:18,026 I certainly haven't asked for memory at that address, 470 00:20:18,396 --> 00:20:20,126 so who knows what's gonna be there, and bam, 471 00:20:20,306 --> 00:20:22,686 segmentation fault, because I've not been proactive 472 00:20:22,686 --> 00:20:23,916 in asking for memory. 473 00:20:24,166 --> 00:20:25,126 So, how to solve this? 474 00:20:25,126 --> 00:20:28,006 Well, anytime you want to actually get something 475 00:20:28,006 --> 00:20:30,336 from the user, well, we could actually do this; 476 00:20:30,336 --> 00:20:32,116 rather than allocate just the pointer, 477 00:20:32,396 --> 00:20:34,676 why don't we allocate a buffer with heck-- 478 00:20:34,676 --> 00:20:37,926 a ridiculous amount of space so that the user's string, 479 00:20:37,926 --> 00:20:39,386 odds are will fit in there. 480 00:20:39,666 --> 00:20:41,276 But even this is not a safe assumption. 481 00:20:41,276 --> 00:20:43,856 What if the user knows that I only allocated a million 482 00:20:43,856 --> 00:20:47,066 or billion or whatever that is number of bytes for my string, 483 00:20:47,066 --> 00:20:49,096 what did they have to do in order to crash 484 00:20:49,386 --> 00:20:51,126 or compromise potentially my program? 485 00:20:52,256 --> 00:20:53,136 Just one more than that. 486 00:20:53,316 --> 00:20:55,906 So again, it's not enough to just allocate more space 487 00:20:55,906 --> 00:20:57,366 and try to avoid the problem. 488 00:20:57,566 --> 00:20:59,856 You need to code against it, as by checking the length, 489 00:20:59,956 --> 00:21:02,336 and again, if you look back and getString or getInt, 490 00:21:02,636 --> 00:21:05,056 those are the tricks that you use at least and see 491 00:21:05,336 --> 00:21:07,736 to code defensively against either mistakes 492 00:21:07,736 --> 00:21:09,366 or malicious inputs. 493 00:21:10,176 --> 00:21:15,426 Alright. So what's then the relevance of this? 494 00:21:15,426 --> 00:21:18,646 Well, thus far, we have been focusing on arrays. 495 00:21:18,896 --> 00:21:20,306 And an array is the first 496 00:21:20,366 --> 00:21:22,546 of several data structures we're actually gonna see 497 00:21:22,546 --> 00:21:23,036 in the course. 498 00:21:23,036 --> 00:21:25,696 We'll call that an array is really an improvement 499 00:21:25,926 --> 00:21:28,806 over Week 0 stuff, whereby if you wanted storage space 500 00:21:28,806 --> 00:21:31,686 for some value, you had to declare a variable in scratch 501 00:21:31,686 --> 00:21:34,236 or even in C. And then if you wanted storage space 502 00:21:34,236 --> 00:21:36,456 for two variables, you declare two variables. 503 00:21:36,456 --> 00:21:38,426 If you want three things in memory, three variables, 504 00:21:38,426 --> 00:21:40,926 four variables, five-- and this very quickly started 505 00:21:40,976 --> 00:21:42,646 to get redundant, a whole lot 506 00:21:42,646 --> 00:21:44,816 of copy-paste starts happening quickly. 507 00:21:45,036 --> 00:21:46,906 And then also conceptually, if you wanted 508 00:21:46,906 --> 00:21:49,336 to start keeping track of a lot of like items, 509 00:21:49,336 --> 00:21:51,326 like someone's quiz scores, well, 510 00:21:51,486 --> 00:21:53,166 it seems completely ridiculous to have 511 00:21:53,166 --> 00:21:55,456 to have a quiz one variable, quiz two variable, 512 00:21:55,456 --> 00:21:57,346 quiz three variable, quiz four-- right? 513 00:21:57,346 --> 00:21:59,546 Wouldn't it be nice to just have a quizzes variable 514 00:21:59,826 --> 00:22:02,096 that you could then access individual members of? 515 00:22:02,096 --> 00:22:03,816 And array allowed us to do that. 516 00:22:03,816 --> 00:22:06,716 Moreover, an array allows us random access. 517 00:22:06,716 --> 00:22:07,686 What does it mean for any array 518 00:22:07,686 --> 00:22:10,576 to give us random access to data? 519 00:22:10,576 --> 00:22:11,296 [ Inaudible Response ] 520 00:22:11,296 --> 00:22:13,866 >> So you can get anywhere 521 00:22:13,866 --> 00:22:16,316 in that array using this bracket notation, right? 522 00:22:16,316 --> 00:22:18,396 You don't have to start at the beginning of the array 523 00:22:18,586 --> 00:22:21,336 and check every value to see where you're going. 524 00:22:21,336 --> 00:22:23,396 You can just jump to location bracket 1, 525 00:22:23,396 --> 00:22:25,366 bracket 2, bracket N minus 1. 526 00:22:25,616 --> 00:22:27,716 So an array is nice in terms of efficiency 527 00:22:27,886 --> 00:22:29,576 because you can jump anywhere you want. 528 00:22:29,636 --> 00:22:31,356 This was useful-- recall, for our sorting. 529 00:22:31,356 --> 00:22:33,726 When we had students up here on the stage and we needed 530 00:22:33,726 --> 00:22:35,946 to arbitrarily eject the person who is here, 531 00:22:36,166 --> 00:22:38,736 move them over here, and vice versa, well, we didn't have 532 00:22:38,736 --> 00:22:40,506 to constantly walk back and forth 533 00:22:40,506 --> 00:22:42,456 in the array spending all these additional steps. 534 00:22:42,886 --> 00:22:45,816 Rather, I just had to move this person to bracket 4, 535 00:22:45,946 --> 00:22:48,366 bracket 4 to bracket 0, or something like that. 536 00:22:48,506 --> 00:22:51,746 So an array is powerful and that it supports random access 537 00:22:51,746 --> 00:22:52,836 and it's fairly efficient, 538 00:22:53,106 --> 00:22:56,166 but what's at least one thing an array is not good at? 539 00:22:58,056 --> 00:23:02,946 What something an array is not good at? 540 00:23:02,946 --> 00:23:03,216 [ Inaudible Remark ] 541 00:23:03,216 --> 00:23:04,856 >> Inserting things into the middle, right? 542 00:23:04,856 --> 00:23:06,246 One of the first scenarios we ran 543 00:23:06,246 --> 00:23:08,696 into with the sorting algorithm was we had these 8 people 544 00:23:08,696 --> 00:23:11,106 up here, and we needed to put someone, say, here. 545 00:23:11,326 --> 00:23:12,556 So what was the solution? 546 00:23:12,556 --> 00:23:15,156 Well, the simple solution there was just eject one person 547 00:23:15,156 --> 00:23:17,206 and move them out of the way, or you could say, 548 00:23:17,206 --> 00:23:19,406 could you please move down this way? 549 00:23:19,666 --> 00:23:22,446 Right. In an array, you would have to move all of the elements 550 00:23:22,446 --> 00:23:25,836 because there is no space that's empty or free in the middle. 551 00:23:26,076 --> 00:23:29,426 And so, you have to actually spend time and spend the cycles 552 00:23:29,666 --> 00:23:31,196 to actually move things out of the way. 553 00:23:31,406 --> 00:23:34,496 Well, notice too, what getString really highlights a flaw 554 00:23:34,496 --> 00:23:35,006 within array. 555 00:23:35,286 --> 00:23:37,366 What's one of the really big problems with an array? 556 00:23:37,956 --> 00:23:40,206 It's-- sorry? 557 00:23:40,386 --> 00:23:41,276 >> Finite Size. 558 00:23:42,516 --> 00:23:45,066 >> It's a finite size, which means you have to guess 559 00:23:45,066 --> 00:23:47,626 in advance, or predict in advance, how big chunk 560 00:23:47,626 --> 00:23:48,656 of memory you're gonna need. 561 00:23:48,656 --> 00:23:51,296 And you know what, if you're wrong, that's kind of expensive. 562 00:23:51,296 --> 00:23:53,656 Because if you again, read through the line in getString, 563 00:23:53,896 --> 00:23:57,126 we do call this function called realloc which takes an array 564 00:23:57,406 --> 00:23:59,856 and returns you a bigger version of that array. 565 00:24:00,136 --> 00:24:02,426 But underneath the hood, what's really going 566 00:24:02,426 --> 00:24:06,186 on is realloc is essentially calling malloc most of time, 567 00:24:06,446 --> 00:24:09,116 manually copying everything from the old array 568 00:24:09,316 --> 00:24:11,666 into the new bigger array as with the four loop, 569 00:24:11,776 --> 00:24:14,526 just iterating from over the array, left to right and copying 570 00:24:14,526 --> 00:24:17,846 from old to new, then it calls free on the original array, 571 00:24:17,846 --> 00:24:19,336 and it returns to the new array. 572 00:24:19,646 --> 00:24:20,716 Like, that's a lot of work. 573 00:24:20,716 --> 00:24:22,646 One, you have to contact the operating system, 574 00:24:22,646 --> 00:24:25,246 and in general, anytime you ask the operating system for help, 575 00:24:25,546 --> 00:24:28,906 give me more memory, you incur a penalty, a performance penalty, 576 00:24:28,906 --> 00:24:29,986 that normally you don't notice. 577 00:24:29,986 --> 00:24:31,766 But if you're doing this again and again and again, 578 00:24:31,966 --> 00:24:34,446 your program does start to run noticeably slower. 579 00:24:34,726 --> 00:24:36,626 And also, it's just incredibly inefficient. 580 00:24:36,626 --> 00:24:38,616 If you knew or had a hunch in the beginning 581 00:24:38,836 --> 00:24:41,006 that you were gonna need more than just a single byte 582 00:24:41,006 --> 00:24:44,236 or two bytes, why did you only ask me for one or two bytes? 583 00:24:44,236 --> 00:24:45,906 Why not ask for a million bytes 584 00:24:46,246 --> 00:24:48,696 and then you'll have enough space? 585 00:24:48,696 --> 00:24:50,226 Well, what's the down side of that approach? 586 00:24:50,416 --> 00:24:51,796 Well, then you end up wasting memory. 587 00:24:51,796 --> 00:24:53,776 So it's really hard to find the sweet spot 588 00:24:53,776 --> 00:24:55,826 of getting the right amount of memory, but not too much 589 00:24:55,826 --> 00:24:57,606 that you're wasting memory unnecessarily, 590 00:24:57,946 --> 00:25:00,926 and not too little that you're wasting time copying everything 591 00:25:00,926 --> 00:25:02,536 and reallocating memory all of the time. 592 00:25:03,106 --> 00:25:06,856 But thankfully, there is a solution to this problem. 593 00:25:06,856 --> 00:25:08,616 Suppose we have a data structure 594 00:25:09,006 --> 00:25:11,656 that looks a little something more like this. 595 00:25:12,006 --> 00:25:15,786 So here is-- well, let's start calling a singly linked list, 596 00:25:15,786 --> 00:25:17,126 and this is actually the first 597 00:25:17,126 --> 00:25:19,156 of a much more interesting data structure 598 00:25:19,346 --> 00:25:20,676 that we now have the capabilities 599 00:25:20,676 --> 00:25:22,496 of designing ourselves. 600 00:25:22,496 --> 00:25:25,316 So this is just a pictorial representation of something 601 00:25:25,316 --> 00:25:27,096 that looks a lot like an array, 602 00:25:27,446 --> 00:25:29,696 but what does this thing seem to have in it? 603 00:25:29,816 --> 00:25:33,786 Well, each of these vertical rectangles has actually 604 00:25:33,786 --> 00:25:34,706 two components. 605 00:25:34,706 --> 00:25:38,016 One, apparently is an integer, the number 9, and the number 17, 606 00:25:38,016 --> 00:25:41,356 the number 22 and so forth, and then there is an arrow 607 00:25:41,356 --> 00:25:44,746 at the bottom of almost everyone of this vertical rectangles. 608 00:25:44,746 --> 00:25:46,626 What does that arrow presumably represent? 609 00:25:47,866 --> 00:25:50,606 So, an address or a pointer, right, arrows or just a pointer. 610 00:25:50,606 --> 00:25:51,706 And in fact, that's true. 611 00:25:51,986 --> 00:25:54,856 So it turns out, if one of your design goals is 612 00:25:54,856 --> 00:25:57,306 to actually have some dynamism in your data structures 613 00:25:57,306 --> 00:25:59,596 so that you can add things in the middle without having 614 00:25:59,596 --> 00:26:02,866 to move everyone down the stage, if you want to be able 615 00:26:02,866 --> 00:26:05,606 to expand the length of this thing without having 616 00:26:05,606 --> 00:26:08,596 to copy everything over to a new chunk of memory, well, 617 00:26:08,596 --> 00:26:11,756 maybe the solution is don't ask for memory that is back to back 618 00:26:11,756 --> 00:26:14,366 to back to back and then paint yourself into this corner. 619 00:26:14,706 --> 00:26:19,866 Rather, only ask for memory one node or one cell at a time, 620 00:26:20,116 --> 00:26:22,026 and then just stitch them together somehow. 621 00:26:22,056 --> 00:26:23,666 We now have pointers, we have the ability 622 00:26:23,666 --> 00:26:25,706 to have one variable point to another. 623 00:26:26,056 --> 00:26:28,826 So why don't we just include a pointer along with everyone 624 00:26:28,826 --> 00:26:31,936 of our pieces of data, or number 9, or number 17, 625 00:26:32,246 --> 00:26:36,146 and weave these data structures together as suggested 626 00:26:36,146 --> 00:26:39,136 by this arrows so that we can still walk from left to right, 627 00:26:39,136 --> 00:26:42,156 but we can also more easily insert something into the middle 628 00:26:42,226 --> 00:26:44,996 or into the end, or to the beginning. 629 00:26:45,366 --> 00:26:50,326 Now, to define a chunk of memory that has space for an integer 630 00:26:50,326 --> 00:26:53,076 like N and then a pointer called next, 631 00:26:53,146 --> 00:26:55,336 what piece of syntax do we need for Monday? 632 00:26:55,336 --> 00:26:55,946 [ Inaudible Remar k] 633 00:26:55,946 --> 00:26:57,506 >> Struct, right? 634 00:26:57,696 --> 00:27:00,706 So we introduce the notion of a struct on Monday. 635 00:27:00,706 --> 00:27:02,846 And at the time, we introduced struct for sort 636 00:27:02,846 --> 00:27:04,646 of a more compelling real world situation. 637 00:27:04,646 --> 00:27:07,836 We want to represent students, and students have IDs and houses 638 00:27:07,836 --> 00:27:09,126 and names and all of that. 639 00:27:09,366 --> 00:27:11,776 And so, rather than have three variables for one student, 640 00:27:11,986 --> 00:27:14,046 just felt a lot cleaner to have one variable 641 00:27:14,046 --> 00:27:16,846 for a student inside of which are these various fields. 642 00:27:17,046 --> 00:27:19,156 Well, this seems to suggest the same idea. 643 00:27:19,156 --> 00:27:21,956 I just need a structure of some sort of struct 644 00:27:22,156 --> 00:27:24,376 that contains what two data types, apparently? 645 00:27:24,376 --> 00:27:25,006 [ Inaudible Remark ] 646 00:27:25,006 --> 00:27:26,946 >> And int and-- 647 00:27:26,946 --> 00:27:27,766 [ Inaudible Remark ] 648 00:27:27,766 --> 00:27:29,286 >> And a pointer for some sort. 649 00:27:29,286 --> 00:27:30,306 Now, a pointer to what? 650 00:27:30,306 --> 00:27:31,356 Well, let me propose this. 651 00:27:31,546 --> 00:27:34,716 So it's a familiar syntax, even though it's little more-- 652 00:27:34,946 --> 00:27:37,036 that's-- actually, that's very familiar syntax. 653 00:27:37,106 --> 00:27:38,196 That's what we saw last time. 654 00:27:38,516 --> 00:27:40,276 So here is what we defined as a student. 655 00:27:40,506 --> 00:27:42,286 Let me propose this modification. 656 00:27:42,526 --> 00:27:44,166 So it's almost the same, but this time, 657 00:27:44,166 --> 00:27:46,496 I have an int called N instead of ID. 658 00:27:46,706 --> 00:27:48,866 I don't need a house and a name, but I do need a pointer. 659 00:27:49,146 --> 00:27:51,886 But because this time I'm not pointing at strings, 660 00:27:51,966 --> 00:27:54,026 I'm not pointing at a house, I'm not pointing at a name, 661 00:27:54,276 --> 00:27:56,806 I wanna point at another one of these structures, 662 00:27:57,056 --> 00:27:59,296 we need a slightly new piece of syntax which is this. 663 00:27:59,886 --> 00:28:02,016 Type defstruct, and then node. 664 00:28:02,016 --> 00:28:04,846 So I actually have to specify this word node 665 00:28:04,846 --> 00:28:07,276 or some keyword here because I wanna be able 666 00:28:07,276 --> 00:28:11,136 to reflexively refer to myself inside of the curly braces. 667 00:28:11,476 --> 00:28:15,666 So if you want a struct like this to contain an integer N 668 00:28:16,006 --> 00:28:19,816 and a pointer as suggested by the star, but that pointer needs 669 00:28:19,816 --> 00:28:23,796 to be able to point at, not a string, not a house, not a name, 670 00:28:24,076 --> 00:28:25,776 but at another such structure. 671 00:28:26,086 --> 00:28:29,576 The Syntax is to simply say very explicitly, this pointer, 672 00:28:29,576 --> 00:28:34,386 let's call it next, is going to be a pointer to a struct node. 673 00:28:34,836 --> 00:28:36,476 Now, this feels a little redundant here. 674 00:28:37,096 --> 00:28:39,336 But at the bottom, recall that the very-- 675 00:28:39,896 --> 00:28:42,826 at the very end of a struct, just as we do with student, 676 00:28:43,016 --> 00:28:46,496 we give the whole structure a name, you don't need to say node 677 00:28:46,496 --> 00:28:48,786 down here, but it's a nice short hand notation, 678 00:28:48,786 --> 00:28:51,316 because if you don't include that name at the bottom, 679 00:28:51,646 --> 00:28:52,706 you then have to, henceforth, 680 00:28:52,816 --> 00:28:56,096 call your structure struct node in code. 681 00:28:56,096 --> 00:28:58,436 Every time you want one of these things, you have to struct node, 682 00:28:58,436 --> 00:29:00,776 struct to node, struct node, and it just gets very tedious. 683 00:29:00,776 --> 00:29:03,006 If you wanna just call these things nodes, you can-- 684 00:29:03,006 --> 00:29:05,936 just like we did with student, redundantly saying node 685 00:29:06,106 --> 00:29:08,286 down here at the bottom outside the curly braces. 686 00:29:08,576 --> 00:29:09,666 So what's the takeaway? 687 00:29:09,856 --> 00:29:12,236 The takeaway is just that we have now implemented 688 00:29:12,236 --> 00:29:15,656 in C the code with which to implement anyone 689 00:29:15,976 --> 00:29:18,376 of these individual rectangles. 690 00:29:18,956 --> 00:29:21,556 So that begs the question, who cares? 691 00:29:21,736 --> 00:29:24,976 Well, when you actually store data on a hard drive, 692 00:29:24,976 --> 00:29:27,016 you probably don't know in advance how big 693 00:29:27,016 --> 00:29:28,806 that essay you're writing is going to be. 694 00:29:28,806 --> 00:29:30,596 You don't know how many pages it's going to be. 695 00:29:30,596 --> 00:29:31,486 When you check your email, 696 00:29:31,756 --> 00:29:33,896 you don't know how long an email you've just received, 697 00:29:33,896 --> 00:29:34,626 and therefore, you don't know 698 00:29:34,626 --> 00:29:37,246 in advance how much disk space is gonna be necessary 699 00:29:37,246 --> 00:29:39,606 if you download it locally to store that email. 700 00:29:39,606 --> 00:29:42,506 So it turns out on a file system, on Mac, in a PC, 701 00:29:42,506 --> 00:29:46,796 in Linux server, you have these things called linked lists being 702 00:29:46,796 --> 00:29:49,926 used all over the place to solve very common, 703 00:29:49,926 --> 00:29:51,176 very compelling problems, 704 00:29:51,466 --> 00:29:53,206 and among which is the implementation 705 00:29:53,206 --> 00:29:54,666 of a file on disks. 706 00:29:54,666 --> 00:29:56,426 So recall in Monday, we introduce 707 00:29:56,426 --> 00:29:59,156 that very simple example for file IO. 708 00:29:59,156 --> 00:30:01,686 IO, just this geek speak for input-output. 709 00:30:02,046 --> 00:30:04,856 >> So File IO refers to taking files as input 710 00:30:04,856 --> 00:30:06,116 and producing them as output. 711 00:30:06,276 --> 00:30:08,286 And we did that with that very simple example. 712 00:30:08,546 --> 00:30:11,456 As a little sanity check, what function do we use on Monday 713 00:30:11,456 --> 00:30:14,706 to write data to a file line by line? 714 00:30:14,706 --> 00:30:16,136 [ Inaudible Remark ] 715 00:30:16,136 --> 00:30:18,556 >> So we did use fopen to open and create the file, 716 00:30:18,556 --> 00:30:20,236 but then line by line? 717 00:30:20,776 --> 00:30:24,976 Fprintf. So instead of printf which prints to the screen 718 00:30:24,976 --> 00:30:27,436 by default, we used fprintf which prints to a file. 719 00:30:27,666 --> 00:30:30,756 So on Monday, we introduce this ability to actually store stuff 720 00:30:30,756 --> 00:30:33,146 on disk, which frankly, is just useful in general, 721 00:30:33,146 --> 00:30:34,906 'cause otherwise, almost every program we were 722 00:30:34,906 --> 00:30:37,226 in is kind of-- completely a femoral. 723 00:30:37,226 --> 00:30:38,796 You've quit running the program and bam, 724 00:30:38,966 --> 00:30:40,046 any data you've produced, 725 00:30:40,046 --> 00:30:42,146 any input you've provided disappears. 726 00:30:42,146 --> 00:30:44,516 But now, we have the ability to write things to disk, 727 00:30:44,746 --> 00:30:46,726 and certainly is this an omnipresent feature. 728 00:30:46,726 --> 00:30:48,966 So for problem set 5, which will-- 729 00:30:48,966 --> 00:30:50,486 is not going up the door this week-- 730 00:30:50,486 --> 00:30:51,966 you actually have the week off for the quiz, 731 00:30:52,266 --> 00:30:53,326 but will go out next week-- 732 00:30:53,586 --> 00:30:55,646 you'll actually be handed a number 733 00:30:55,646 --> 00:30:57,416 of problems involving file IO. 734 00:30:57,416 --> 00:30:59,546 And one of them is going to involve recovering, 735 00:31:00,076 --> 00:31:03,456 as I predicted in week 0 or 1, recovery from some JPEGs 736 00:31:03,786 --> 00:31:06,916 that I am accidentally going to erase 737 00:31:06,916 --> 00:31:09,366 over the next several days, having taken them on campus 738 00:31:09,366 --> 00:31:10,306 with a digital camera. 739 00:31:10,306 --> 00:31:12,296 And the goal is gonna be to recover these files. 740 00:31:12,296 --> 00:31:13,906 Well, these files, thankfully, 741 00:31:14,036 --> 00:31:16,096 we're gonna simplify a little bit, and we're gonna make sure 742 00:31:16,096 --> 00:31:19,316 to wipe or to zero or to format 743 00:31:19,696 --> 00:31:21,576 or erase any number of synonyms here. 744 00:31:21,716 --> 00:31:23,546 We're gonna erase the compact flash card 745 00:31:23,546 --> 00:31:26,376 that this digital camera has, so that for convenience's sake, 746 00:31:26,566 --> 00:31:28,146 when we actually take these photos, 747 00:31:28,236 --> 00:31:30,496 even though the camera might not know exactly how big 748 00:31:30,496 --> 00:31:32,986 that photo is gonna be in advance based on the lighting 749 00:31:32,986 --> 00:31:34,346 and the colors we're trying to store, 750 00:31:34,636 --> 00:31:37,956 it's at least gonna store them from top to bottom, it-- 751 00:31:38,126 --> 00:31:40,396 well, rather it's gonna store them contiguously 752 00:31:40,396 --> 00:31:41,626 as though they're an array. 753 00:31:41,626 --> 00:31:43,056 But this is not the norm. 754 00:31:43,056 --> 00:31:46,166 On your computer, you probably delete files all the time, 755 00:31:46,216 --> 00:31:47,636 either by dragging them to the trash 756 00:31:47,636 --> 00:31:49,916 or because you have temporary files that things 757 00:31:49,916 --> 00:31:51,386 like Microsoft Word are using. 758 00:31:51,696 --> 00:31:54,076 And so, hard drives become what's called fragmented. 759 00:31:54,076 --> 00:31:56,336 And how many of you have heard the term defragment your 760 00:31:56,336 --> 00:31:56,816 hard drive? 761 00:31:57,386 --> 00:31:58,616 How many of you actually do this? 762 00:31:59,776 --> 00:32:01,976 So if it's actually-- reasonable number. 763 00:32:01,976 --> 00:32:04,636 Frankly, these days, it's probably not all that necessary, 764 00:32:04,636 --> 00:32:07,206 especially now that hard drives have gotten much faster, 765 00:32:07,506 --> 00:32:10,286 and they've also gotten much better 766 00:32:10,286 --> 00:32:11,906 at managing these details themselves. 767 00:32:12,256 --> 00:32:15,466 But linked lists, this topic we're gonna start now exploring, 768 00:32:15,696 --> 00:32:18,626 lend themselves to the implementation of files on disk, 769 00:32:18,726 --> 00:32:21,296 because suppose you do start writing some essay, 770 00:32:21,296 --> 00:32:24,556 and it's pretty short initially, so Microsoft Word creates a file 771 00:32:24,556 --> 00:32:26,746 on disk that's ye big, then you come back the next day 772 00:32:26,746 --> 00:32:28,986 and you add another chapter to your essay or your thesis. 773 00:32:29,226 --> 00:32:30,886 And so, now, you need this much more space, 774 00:32:31,196 --> 00:32:32,286 but you know what just happened? 775 00:32:32,546 --> 00:32:35,946 Between yesterday and today, you downloaded some MP3s 776 00:32:35,946 --> 00:32:37,776 from the internet, and where did they go? 777 00:32:37,776 --> 00:32:39,426 They went to the free space on your hard drive, 778 00:32:39,466 --> 00:32:41,606 which at that time was right there. 779 00:32:41,816 --> 00:32:44,346 So now, you have first chapter of your thesis here, 780 00:32:44,346 --> 00:32:47,086 you've got some illegal music here, and now, you need space 781 00:32:47,206 --> 00:32:49,196 for the rest of your thesis. 782 00:32:49,346 --> 00:32:52,136 Well, this would be unfortunate if Microsoft Word said, "Sorry, 783 00:32:52,136 --> 00:32:53,256 you should have finished yesterday." 784 00:32:53,646 --> 00:32:55,766 Right? So instead, the computer's got 785 00:32:55,766 --> 00:32:57,186 to find some space elsewhere. 786 00:32:57,186 --> 00:32:59,146 And that space, you know, it's probably over there. 787 00:32:59,276 --> 00:33:01,386 So this is why files become fragmented, 788 00:33:01,386 --> 00:33:03,696 because if you don't know a priori how much space you need 789 00:33:03,696 --> 00:33:06,866 just like in RAM with erase, well, thankfully, 790 00:33:07,086 --> 00:33:09,206 the file system, the operating system, is going to say, 791 00:33:09,206 --> 00:33:12,806 "Alright, I'm gonna put chapter 1 here, chapter 2 over here, 792 00:33:13,036 --> 00:33:15,896 but I'm going to implement the equivalent on disk 793 00:33:16,206 --> 00:33:17,556 of this thing called a pointer." 794 00:33:17,556 --> 00:33:20,006 A pointer is generally referred to in RAM only, 795 00:33:20,226 --> 00:33:22,656 but you can certainly implement the same idea on disks 796 00:33:22,656 --> 00:33:24,756 so that you can weave together big files, 797 00:33:25,016 --> 00:33:27,566 even though they are fragmented across multiple locations. 798 00:33:27,926 --> 00:33:30,396 Now, what are some of the real world implications of this, 799 00:33:30,396 --> 00:33:31,306 and why come up with 800 00:33:31,306 --> 00:33:33,306 such a dramatic picture as this one, CSI? 801 00:33:33,946 --> 00:33:36,656 Well, if you had files potentially all 802 00:33:36,656 --> 00:33:39,806 over your hard drive and wove in together just 803 00:33:39,806 --> 00:33:41,876 with these references, these pointers on disk, 804 00:33:42,166 --> 00:33:44,626 it makes it very easy for traces of files 805 00:33:44,626 --> 00:33:46,106 to be left all over the place. 806 00:33:46,146 --> 00:33:48,236 In fact-- and this is more detailed than we need 807 00:33:48,236 --> 00:33:49,296 to spend time on today. 808 00:33:49,296 --> 00:33:53,286 In fact, when you ask an operating system for disk space, 809 00:33:53,786 --> 00:33:58,736 even if you need, let's say, 200 bytes for 200 keystrokes, 810 00:33:58,946 --> 00:34:00,296 for efficiency reasons, 811 00:34:00,296 --> 00:34:03,566 operating systems generally return you a chunk of memory 812 00:34:03,566 --> 00:34:04,896 in certain block sizes. 813 00:34:04,896 --> 00:34:08,296 You might actually get back 512 bytes just 814 00:34:08,296 --> 00:34:09,846 to store those 200 bytes. 815 00:34:09,846 --> 00:34:11,956 And the remaining 312, guess what happens 816 00:34:11,956 --> 00:34:13,656 to them, completely wasted. 817 00:34:13,836 --> 00:34:15,296 Because the operating system realizes, 818 00:34:15,296 --> 00:34:17,126 this probably isn't the common case. 819 00:34:17,126 --> 00:34:18,486 These days, people download movies 820 00:34:18,486 --> 00:34:19,816 and music and write big things. 821 00:34:19,816 --> 00:34:22,966 So having lots of small files isn't generally the common case. 822 00:34:23,196 --> 00:34:25,426 So this wastefulness doesn't happen all this time. 823 00:34:25,426 --> 00:34:27,916 But that wastefulness between the space you ask for 824 00:34:28,016 --> 00:34:30,966 and the space you're actually using, is called slack space. 825 00:34:30,966 --> 00:34:33,046 And so, in the world of forensics, digital forensics, 826 00:34:33,096 --> 00:34:35,636 CSI and these kinds of things, these are the kinds 827 00:34:35,636 --> 00:34:40,666 of places forensic or agents look for trying 828 00:34:40,666 --> 00:34:42,196 to find remnants of data. 829 00:34:42,196 --> 00:34:43,526 'Cause what's really happening? 830 00:34:43,526 --> 00:34:46,976 Well, just to motivate next week's Pset, and also, 831 00:34:46,976 --> 00:34:49,396 perhaps send you scrambling to start deleting stuff 832 00:34:49,396 --> 00:34:53,486 from your hard drives properly, recall from week 1 833 00:34:53,776 --> 00:34:57,066 that a hard drive is implemented in a bunch of different ways. 834 00:34:57,066 --> 00:35:00,356 These days, you have solid state drives which are flash memory. 835 00:35:00,646 --> 00:35:01,966 Idea is similar in sphere. 836 00:35:01,966 --> 00:35:04,406 The same things happen, 'cause most this is a software issue, 837 00:35:04,406 --> 00:35:06,676 not hardware issue, but I'll draw it old school 838 00:35:06,676 --> 00:35:09,616 like this floppy disk we tore apart a couple of weeks ago. 839 00:35:09,796 --> 00:35:11,666 Inside of a typical hard drive is a platter. 840 00:35:11,666 --> 00:35:12,336 It's a metal disk. 841 00:35:12,696 --> 00:35:15,126 And just like we saw in that cheesy video a couple 842 00:35:15,126 --> 00:35:17,606 of weeks ago, you have all of these bits somewhere 843 00:35:17,606 --> 00:35:20,196 on the hard drive, 0s and 1s, but those 0s 844 00:35:20,196 --> 00:35:22,496 and 1s are implemented how? 845 00:35:23,526 --> 00:35:25,306 A little magnetic particles, right? 846 00:35:25,306 --> 00:35:26,806 That if it's oriented this way, it's 1, 847 00:35:26,806 --> 00:35:29,246 if it's south north instead, it's a 0. 848 00:35:29,246 --> 00:35:30,786 So something arbitrary like that. 849 00:35:31,056 --> 00:35:32,436 Well, you probably know-- 850 00:35:32,436 --> 00:35:33,826 and I think I might have mentioned earlier-- 851 00:35:33,826 --> 00:35:37,946 that when you actually store files on a disk, they can-- 852 00:35:37,946 --> 00:35:39,596 they end up somewhere, say, right there. 853 00:35:39,596 --> 00:35:41,946 That is some file, some JPEG that I've downloaded 854 00:35:41,946 --> 00:35:43,156 or photograph I've taken. 855 00:35:43,516 --> 00:35:46,056 But suppose then I delete this. 856 00:35:46,196 --> 00:35:47,966 Well, what does it mean to delete a file? 857 00:35:48,066 --> 00:35:50,386 We'll, recall that somewhere in your computer, 858 00:35:50,386 --> 00:35:52,216 somewhere in the operating system, there is some kind 859 00:35:52,216 --> 00:35:53,756 of table that essentially is 860 00:35:53,756 --> 00:35:55,406 like a two column excel spreadsheet 861 00:35:55,606 --> 00:35:57,396 where the first column is the file name 862 00:35:57,396 --> 00:35:59,556 and the second column is the location. 863 00:35:59,926 --> 00:36:02,206 And if this file is called foo.jpeg, 864 00:36:02,486 --> 00:36:05,516 it might be at location-- who knows what this is? 865 00:36:05,596 --> 00:36:06,236 One, two, three. 866 00:36:06,346 --> 00:36:09,106 That is byte number 1, 2, 3 on the physical hard drive. 867 00:36:09,426 --> 00:36:11,026 So it's at location 1, 2, 3. 868 00:36:11,286 --> 00:36:13,176 Well, most of you probably know this now. 869 00:36:13,226 --> 00:36:15,856 What happens when you actually delete that file by dragging it 870 00:36:15,886 --> 00:36:19,226 to your recycle bin first or trash can? 871 00:36:20,586 --> 00:36:21,336 Yeah? 872 00:36:21,736 --> 00:36:23,736 [ Inaudible Remark ] 873 00:36:24,136 --> 00:36:24,846 >> Okay, perfect. 874 00:36:24,846 --> 00:36:26,766 So when you drag a file to your trash can 875 00:36:26,766 --> 00:36:27,746 or to your recycle bin, 876 00:36:27,746 --> 00:36:30,596 depending on what your operating system calls it, these 0s 877 00:36:30,596 --> 00:36:32,846 and 1s, they actually don't do anywhere, right? 878 00:36:32,846 --> 00:36:34,236 You certainly don't get rid of them. 879 00:36:34,236 --> 00:36:36,096 Otherwise, you'd have perpetually declining 880 00:36:36,096 --> 00:36:36,866 storage space. 881 00:36:37,096 --> 00:36:38,186 They actually stay there, 882 00:36:38,186 --> 00:36:42,446 but what happens is the computer just forgets where they are 883 00:36:42,886 --> 00:36:45,946 as by just erasing this row in this table. 884 00:36:45,946 --> 00:36:47,386 And in fact, the problem's a little worse. 885 00:36:47,386 --> 00:36:49,596 Most of you probably know that if you want to delete something, 886 00:36:49,846 --> 00:36:52,416 you can't just drag it to your recycle bin or trash can, 887 00:36:52,416 --> 00:36:53,576 because where is it at that point? 888 00:36:53,576 --> 00:36:54,456 [ Inaudible Remark ] 889 00:36:54,456 --> 00:36:56,656 >> It's in your recycle bin or your trash can. 890 00:36:56,656 --> 00:36:58,506 You actually have to go to the special menu 891 00:36:58,506 --> 00:37:00,526 and say empty recycle bin or empty trash. 892 00:37:00,766 --> 00:37:01,946 So it's in that second step 893 00:37:01,946 --> 00:37:04,506 when you actually proactively say delete this information, 894 00:37:04,746 --> 00:37:06,266 or you wait enough days or months 895 00:37:06,266 --> 00:37:07,836 that the computer just does this for you, 896 00:37:08,146 --> 00:37:09,646 that this in here gets erased. 897 00:37:10,006 --> 00:37:12,916 So when I accidentally format the compact flash card 898 00:37:12,916 --> 00:37:14,516 that has next week's photos on it, 899 00:37:14,696 --> 00:37:16,356 what's really gonna happen is all 900 00:37:16,356 --> 00:37:18,836 of the photos are still gonna be on that compact flash card, 901 00:37:19,096 --> 00:37:21,476 but the computer is gonna have no idea where they are. 902 00:37:21,746 --> 00:37:24,736 And so, what I'm gonna go ahead and do is make a forensic image 903 00:37:24,786 --> 00:37:27,006 of this compact flash card, which means plug it 904 00:37:27,006 --> 00:37:29,226 in to a special computer, run some special software 905 00:37:29,476 --> 00:37:32,296 that just blindly reads in all of the 0s and 1s 906 00:37:32,296 --> 00:37:34,846 from the flash memory which, even though its not circular 907 00:37:34,846 --> 00:37:37,596 and metallic, actually has zeros and ones implemented somehow 908 00:37:37,596 --> 00:37:40,506 on it, and what you're gonna get is a file that is, 909 00:37:40,506 --> 00:37:43,716 as they would use in police scenarios, a forensic copy 910 00:37:43,776 --> 00:37:46,076 of that media, so that you can then proceed 911 00:37:46,076 --> 00:37:47,536 to recover those JPEGs. 912 00:37:47,656 --> 00:37:50,646 And you'll see exactly how this is done in the spec itself, 913 00:37:51,006 --> 00:37:55,156 but it's going to amount to a strikingly easy process. 914 00:37:55,376 --> 00:37:56,596 And this piece that actually grew 915 00:37:56,596 --> 00:37:59,176 out of a real world scenario where three or five-- 916 00:37:59,176 --> 00:38:01,336 some number of years ago when grad school, a friend of mine-- 917 00:38:01,336 --> 00:38:02,936 this happened to a friend of mine. 918 00:38:02,936 --> 00:38:04,636 He didn't accidentally delete the files, 919 00:38:04,926 --> 00:38:07,736 but his compact flash card got corrupted somehow. 920 00:38:07,736 --> 00:38:09,936 And it was just, oh, there was something mechanical happened 921 00:38:09,936 --> 00:38:12,036 with the camera, so it just got a little corrupted. 922 00:38:12,256 --> 00:38:15,066 And so, the source code you guys are gonna write is exactly the 923 00:38:15,066 --> 00:38:17,206 code that he and I put together in order 924 00:38:17,206 --> 00:38:20,066 to recover his JPEGs several years ago. 925 00:38:20,066 --> 00:38:21,876 So there's easier ways to do this, right? 926 00:38:21,876 --> 00:38:23,736 You can probably just buy a piece of software to do this 927 00:38:23,736 --> 00:38:26,156 if you really need those photos back, but you'll be struct 928 00:38:26,246 --> 00:38:29,416 by think, by just how relatively straightforward it is once you 929 00:38:29,416 --> 00:38:31,006 understand the concepts. 930 00:38:31,006 --> 00:38:33,246 Now, as in the side, just to actually put some scare 931 00:38:33,396 --> 00:38:38,356 into folks, it is this easy to erase files from a disk. 932 00:38:38,356 --> 00:38:41,186 It is this easy to find remnants of files inside 933 00:38:41,186 --> 00:38:42,496 that stuff called slack space. 934 00:38:42,496 --> 00:38:44,916 So unfortunately, most operating system don't know how 935 00:38:44,916 --> 00:38:45,876 to do this very well. 936 00:38:45,876 --> 00:38:49,806 Apple has gone a bit better than has for instance, Windows. 937 00:38:50,156 --> 00:38:51,816 But if you've ever wondered what's-- 938 00:38:52,066 --> 00:38:54,256 let's see, this thing here, top left, in a Mac, 939 00:38:54,256 --> 00:38:57,076 you have secure empty trash, what does this option do? 940 00:38:57,076 --> 00:38:59,826 These are the empty trash. 941 00:38:59,826 --> 00:39:00,916 [ Inaudible Remark ] 942 00:39:00,916 --> 00:39:02,466 >> Yeah. So what secure kind of-- 943 00:39:02,466 --> 00:39:04,066 the Layman's way of explaining this, 944 00:39:04,066 --> 00:39:07,526 what that process actually does is it, yes, erases that, 945 00:39:07,766 --> 00:39:09,966 and it also doesn't quite erase these, 946 00:39:10,236 --> 00:39:12,466 but it turns them all into-- what could you do? 947 00:39:12,746 --> 00:39:15,876 Just all 1s, or something like that, or all 0s, 948 00:39:15,876 --> 00:39:18,236 or maybe better yet, all random. 949 00:39:18,236 --> 00:39:19,886 But the problem is-- and one of the-- 950 00:39:19,886 --> 00:39:21,896 we're gonna have you read an academic paper 951 00:39:21,896 --> 00:39:24,416 of sorts wherein a buddy of mine from MIT a few years ago, 952 00:39:24,416 --> 00:39:26,856 where he actually spent a good deal of money on eBay 953 00:39:26,856 --> 00:39:29,646 and similar websites, bought up a whole bunch of hard drives 954 00:39:29,646 --> 00:39:30,776 that people had just sold 955 00:39:30,776 --> 00:39:32,626 because they didn't need them anymore, 956 00:39:32,626 --> 00:39:35,736 and most of these people were nontechnical, or were people 957 00:39:35,736 --> 00:39:38,506 who had put their trust in the best buys of the world 958 00:39:38,506 --> 00:39:39,786 to often recycle hard drives. 959 00:39:39,786 --> 00:39:41,426 But if you've ever read, very often, 960 00:39:41,426 --> 00:39:43,246 realize it's a lot cheaper for us to just throw 961 00:39:43,246 --> 00:39:45,746 out the drive hard drives than actually securely erase them. 962 00:39:46,076 --> 00:39:48,846 So this paper you'll read is meant to scare and it's meant 963 00:39:48,846 --> 00:39:51,456 to open your eyes to the fact that he found on hundreds 964 00:39:51,456 --> 00:39:53,936 of hard drives thousands of credit card numbers, 965 00:39:54,256 --> 00:39:56,736 financial documents, pornography, 966 00:39:57,096 --> 00:40:01,056 health document stuff that is may well be in some subset, 967 00:40:01,346 --> 00:40:03,826 on your own computers or your own old hard drives. 968 00:40:04,096 --> 00:40:04,826 >> So why don't we go ahead 969 00:40:04,826 --> 00:40:10,156 and take a five minute break, let that simmer. 970 00:40:10,926 --> 00:40:12,236 We'll come back. 971 00:40:12,236 --> 00:40:13,036 Okay. So we are back. 972 00:40:13,426 --> 00:40:15,116 This is a sneak preview of one of the files 973 00:40:15,116 --> 00:40:17,146 that is among your printouts for today. 974 00:40:17,386 --> 00:40:18,836 A lot of it is pretty straightforward, 975 00:40:18,836 --> 00:40:21,706 but it introduces some new syntax arrow notation. 976 00:40:21,706 --> 00:40:22,956 So before we dive into this, 977 00:40:22,956 --> 00:40:24,416 which at first glance can actually look a little 978 00:40:24,416 --> 00:40:27,026 overwhelming, even though if you take it step by step, 979 00:40:27,276 --> 00:40:28,506 it actually is gonna be very-- 980 00:40:28,506 --> 00:40:29,496 can perfectly consistent 981 00:40:29,496 --> 00:40:31,136 with the verbal story we're about to tell. 982 00:40:31,496 --> 00:40:34,416 I thought I would first make this a little more human, 983 00:40:34,566 --> 00:40:38,896 for which, if you humor me, we need six volunteers. 984 00:40:38,956 --> 00:40:40,686 The first one gets to be first literally. 985 00:40:40,686 --> 00:40:42,306 [ Inaudible Remark ] 986 00:40:42,306 --> 00:40:43,896 >> You would like to be first? 987 00:40:43,896 --> 00:40:45,756 Alright. I need five more volunteers. 988 00:40:45,756 --> 00:40:47,946 You of course, need to be comfortable on camera here. 989 00:40:47,946 --> 00:40:50,076 How about-- yeah, ideally five people we've not have before. 990 00:40:50,076 --> 00:40:50,526 Come on up. 991 00:40:51,056 --> 00:40:52,046 Let's do five new faces. 992 00:40:52,046 --> 00:40:53,026 Yeah, come on up. 993 00:40:53,076 --> 00:40:55,266 Alright. And I think we need one more. 994 00:40:55,836 --> 00:40:56,306 Don't yawn. 995 00:40:56,536 --> 00:40:58,016 Okay. The hand'll go up. 996 00:40:58,016 --> 00:41:00,486 Alright. And last, yeah, come on down. 997 00:41:00,586 --> 00:41:01,856 Let me see if we're out of paper yet. 998 00:41:02,696 --> 00:41:04,786 Alright. And if you could line yourselves up right here roughly 999 00:41:04,786 --> 00:41:07,306 where you are in the same order as the board. 1000 00:41:07,946 --> 00:41:08,536 Let's see. 1001 00:41:08,786 --> 00:41:09,696 Yeah, I think we're good. 1002 00:41:09,976 --> 00:41:12,146 Yes. We have-- no, we have a number for you, number nine. 1003 00:41:13,166 --> 00:41:14,866 Alright, come on up. 1004 00:41:15,196 --> 00:41:16,606 Yeah. Alright. 1005 00:41:16,836 --> 00:41:19,246 So-- no, don't follow me. 1006 00:41:19,246 --> 00:41:23,436 Let's do it over here so we'll all stay in-- within view. 1007 00:41:23,496 --> 00:41:26,266 Okay. So if you wouldn't mind, first of all, 1008 00:41:26,266 --> 00:41:29,176 order yourselves exactly as these five numbers on the board, 1009 00:41:29,606 --> 00:41:34,026 and then go ahead and use your left hands as I pointer. 1010 00:41:34,026 --> 00:41:35,946 We got a gap in our link list there. 1011 00:41:35,946 --> 00:41:37,896 [ Inaudible Remark ] 1012 00:41:37,896 --> 00:41:38,246 >> That's okay. 1013 00:41:39,006 --> 00:41:39,736 Alright, much better. 1014 00:41:39,776 --> 00:41:41,126 Alright. I'm gonna try to stand out of the way. 1015 00:41:41,326 --> 00:41:43,286 And actually, go ahead and spread out just a little bit 1016 00:41:43,286 --> 00:41:45,116 so that pointing is not awkward. 1017 00:41:45,596 --> 00:41:46,926 So, few feet each of you. 1018 00:41:47,536 --> 00:41:50,796 Alright. So now-- no, [inaudible]-- there we go. 1019 00:41:50,906 --> 00:41:53,796 Okay. It is my first time with this demo too, actually. 1020 00:41:53,796 --> 00:41:56,276 So now, go ahead, and with the arrows, 1021 00:41:56,576 --> 00:42:00,146 implement with your left hand. 1022 00:42:00,146 --> 00:42:01,646 Okay. Excellent. 1023 00:42:01,876 --> 00:42:03,206 So yes, and you get to stand there 1024 00:42:03,206 --> 00:42:04,776 with just your hand dangling. 1025 00:42:05,046 --> 00:42:06,906 So we now have a linked list. 1026 00:42:07,046 --> 00:42:10,646 And actually-- this is actually frankly, I think a lot more easy 1027 00:42:10,646 --> 00:42:11,696 than working through this in code, 1028 00:42:11,696 --> 00:42:13,506 because we can actually manipulate the pointers 1029 00:42:13,506 --> 00:42:14,926 and the values a little more easily. 1030 00:42:15,196 --> 00:42:17,906 So let's first ask a question before this gets too awkward 1031 00:42:17,906 --> 00:42:18,206 up here. 1032 00:42:18,436 --> 00:42:20,866 One is, suppose now, I'm the program, 1033 00:42:20,866 --> 00:42:23,456 and I'm storing these numbers, not in an array obviously, 1034 00:42:23,736 --> 00:42:24,756 but as a linked list, 1035 00:42:24,756 --> 00:42:26,516 and suppose I wanna search this list, 1036 00:42:26,516 --> 00:42:28,586 and I wanna search for the number 50. 1037 00:42:28,846 --> 00:42:32,136 How do I go about using a linked list, not necessarily in code, 1038 00:42:32,136 --> 00:42:33,916 but conceptually finding the number 50 1039 00:42:33,916 --> 00:42:35,976 like we did two weeks ago up here? 1040 00:42:36,756 --> 00:42:37,506 What do I do? 1041 00:42:37,546 --> 00:42:40,056 I am the program. 1042 00:42:41,396 --> 00:42:42,476 Okay, sure. 1043 00:42:42,476 --> 00:42:42,686 [ Inaudible Remark ] 1044 00:42:42,686 --> 00:42:43,846 >> Okay. So I'd-- frankly, 1045 00:42:43,846 --> 00:42:47,426 because we literally have these pointers pointing 1046 00:42:47,426 --> 00:42:49,036 at the next element, the next element, 1047 00:42:49,266 --> 00:42:51,416 I cannot use what fancy technique 1048 00:42:51,416 --> 00:42:52,746 that we've using since week zero. 1049 00:42:52,746 --> 00:42:53,726 [ Inaudible Remark ] 1050 00:42:53,726 --> 00:42:55,516 >> So I can't use binary search or divide 1051 00:42:55,516 --> 00:42:57,556 and conquer more generally because I have no way 1052 00:42:57,556 --> 00:42:59,776 of just jumping to this middle of the list 1053 00:42:59,986 --> 00:43:01,146 because I don't know where it is. 1054 00:43:01,146 --> 00:43:04,506 The only thing I know about at this point in the story is, 1055 00:43:04,506 --> 00:43:06,466 as the keyword first suggests, 1056 00:43:06,696 --> 00:43:08,306 is the very first note in the list. 1057 00:43:08,496 --> 00:43:10,766 So the means by which you implement in the-- 1058 00:43:10,766 --> 00:43:13,296 in actual code a linked list, is yes, 1059 00:43:13,626 --> 00:43:15,936 you could remember every address of everyone 1060 00:43:15,936 --> 00:43:17,006 of this so called nodes. 1061 00:43:17,236 --> 00:43:18,956 How could you remember so many addresses? 1062 00:43:19,216 --> 00:43:20,906 I could have an array of pointers, 1063 00:43:21,036 --> 00:43:23,316 but that would completely defeat the point of this dynamism. 1064 00:43:23,316 --> 00:43:24,776 So we don't wanna take a step backwards 1065 00:43:24,776 --> 00:43:28,446 and implement the pointers, your left hands, with an array just 1066 00:43:28,446 --> 00:43:30,396 so we can access them more quickly, 'cause again, 1067 00:43:30,596 --> 00:43:32,656 we would be regressing back to an array itself. 1068 00:43:32,916 --> 00:43:35,586 So I need more dynamism, but I only need one pointer. 1069 00:43:35,586 --> 00:43:40,326 So first here is the only piece of data, 4 bytes, 32 bits, 1070 00:43:40,566 --> 00:43:43,606 that I, the main function, actually need to hang on to 1071 00:43:43,826 --> 00:43:46,616 in order to hang on to this whole list in memory, 1072 00:43:46,616 --> 00:43:49,406 because as the hand suggests, I can get to any node now, 1073 00:43:49,616 --> 00:43:51,186 so long as I remember just one of them. 1074 00:43:51,186 --> 00:43:53,726 And perhaps obviously, I need to remember the first, 1075 00:43:53,886 --> 00:43:56,716 because if I remember, say, the second, and if you're pointing-- 1076 00:43:56,716 --> 00:44:00,046 if you don't mind just over to number 17-- I mean, this now, 1077 00:44:00,076 --> 00:44:02,326 we've obviously orphaned poor number nine. 1078 00:44:02,326 --> 00:44:04,436 So just-- in terms of common sense, 1079 00:44:04,646 --> 00:44:06,296 we need to start the list at the first node. 1080 00:44:06,296 --> 00:44:06,726 So here we go. 1081 00:44:07,026 --> 00:44:09,596 So go ahead and fix your pointer and your hair, 1082 00:44:09,996 --> 00:44:13,166 and we have to now find the number 50. 1083 00:44:13,166 --> 00:44:15,396 So I'm the program, I have a pointer here, 1084 00:44:15,556 --> 00:44:17,506 I'm gonna need some new syntax in C 1085 00:44:17,506 --> 00:44:19,246 to actually follow a pointer. 1086 00:44:19,476 --> 00:44:21,626 On Monday, we use dot notation to get 1087 00:44:21,626 --> 00:44:23,206 at the data member of a struct. 1088 00:44:23,596 --> 00:44:25,326 These are structs, but we'll see-- 1089 00:44:25,326 --> 00:44:26,956 we're gonna use slightly new notation today. 1090 00:44:26,956 --> 00:44:28,676 We're gonna use literally an arrow, a hyphen, 1091 00:44:28,676 --> 00:44:30,956 and then an angled bracket so that it actually looks 1092 00:44:30,956 --> 00:44:32,976 like an arrow in code, but more on that in a moment. 1093 00:44:33,256 --> 00:44:35,296 So I'm gonna essentially follow this-- 1094 00:44:35,566 --> 00:44:37,436 these pointers just in a four loop, 1095 00:44:37,436 --> 00:44:40,336 and I'll use some new syntax eventually to actually do this. 1096 00:44:40,366 --> 00:44:43,786 But conceptually, it's one step here is this 50, is this 50, 1097 00:44:43,916 --> 00:44:46,316 is this 50, is this 50, in this 50, and now, 1098 00:44:46,316 --> 00:44:48,796 I see his left hand is not actually pointing. 1099 00:44:48,996 --> 00:44:50,686 In fact, as the picture up there suggests, 1100 00:44:50,686 --> 00:44:52,376 with the little antenna, he's null. 1101 00:44:52,376 --> 00:44:54,856 His left hand is null at this point, as suggested 1102 00:44:54,856 --> 00:44:55,896 by just dangling there. 1103 00:44:56,136 --> 00:44:57,776 And so, that means we're at the end of the list. 1104 00:44:58,096 --> 00:44:58,926 Alright. So at this point 1105 00:44:58,926 --> 00:45:01,306 in the story I know 50 is not present in the list. 1106 00:45:01,646 --> 00:45:03,606 So let's do something a little more interesting. 1107 00:45:03,606 --> 00:45:05,776 Suppose I now want to-- and you guys stay there-- 1108 00:45:06,226 --> 00:45:11,436 suppose I want to, let's say insert into this list. 1109 00:45:11,616 --> 00:45:13,116 Come get a 7th volunteer? 1110 00:45:14,046 --> 00:45:14,956 55? Anyone? 1111 00:45:15,126 --> 00:45:15,676 Yeah, come on up. 1112 00:45:16,426 --> 00:45:18,866 Alright. So now, I have number 55. 1113 00:45:19,186 --> 00:45:21,126 So I have just called malloc in class. 1114 00:45:21,126 --> 00:45:22,356 Malloc int. 1115 00:45:22,516 --> 00:45:24,126 alright. So now, we have-- what's your name? 1116 00:45:24,396 --> 00:45:24,576 >> Jacob. 1117 00:45:24,776 --> 00:45:27,816 >> Jacob has been malloced, and now, we have another struct 1118 00:45:27,816 --> 00:45:30,646 in memory, and he, as a malloced chunk of memory, 1119 00:45:30,646 --> 00:45:33,656 I make sure to malloc, not as Jacob, but enough RAM 1120 00:45:33,926 --> 00:45:35,886 for both his number and for his left hand. 1121 00:45:35,886 --> 00:45:38,646 So now, we have what, 8 bytes; 4 bytes for the number, 1122 00:45:38,996 --> 00:45:40,126 4 bytes for the pointer. 1123 00:45:40,326 --> 00:45:42,706 So Jacob here, like every one of these guys here, 1124 00:45:42,916 --> 00:45:45,726 represents a struct whose total size is 8 bytes. 1125 00:45:46,036 --> 00:45:48,216 Now, malloc just returns to me a chunk of memory. 1126 00:45:48,416 --> 00:45:49,536 I can-- let me borrow this back. 1127 00:45:49,636 --> 00:45:51,486 I can very easily-- as we'll see in code-- 1128 00:45:51,696 --> 00:45:55,096 put a number inside of the int inside of the structure. 1129 00:45:55,296 --> 00:45:57,746 But now, I have to do something with his left hand, 1130 00:45:57,746 --> 00:45:59,326 which right now, is just some garbage value. 1131 00:45:59,326 --> 00:46:00,936 Who knows where it's pointing off to, right? 1132 00:46:00,976 --> 00:46:02,076 Same story as before. 1133 00:46:02,386 --> 00:46:03,356 So how do I do this? 1134 00:46:03,356 --> 00:46:05,506 Well, malloc returned to me a chunk of memory, 1135 00:46:05,776 --> 00:46:10,806 so I'm gonna go ahead and play the role of new pointer. 1136 00:46:11,086 --> 00:46:12,176 So I am now point. 1137 00:46:12,176 --> 00:46:14,326 I need to allocated myself in code as we'll see, 1138 00:46:14,326 --> 00:46:17,796 just a new pointer, 32 bits that just remembers where Jacob is. 1139 00:46:17,846 --> 00:46:19,846 So right now, his left hand-- 1140 00:46:19,846 --> 00:46:21,216 if you could just point off awkwardly-- 1141 00:46:21,646 --> 00:46:27,186 I am gonna point to him because I called malloc, but he-- 1142 00:46:27,186 --> 00:46:29,036 his pointer is just dangling somewhere. 1143 00:46:29,036 --> 00:46:30,996 So let's figure out what to do with him. 1144 00:46:30,996 --> 00:46:32,536 Well, let me go ahead and just move him 1145 00:46:32,536 --> 00:46:34,896 over here, if you don't mind. 1146 00:46:35,646 --> 00:46:38,526 Because if I want to now insert Jacob, number 55, 1147 00:46:38,526 --> 00:46:40,506 into this list, what do I have to do? 1148 00:46:40,506 --> 00:46:42,786 Well, first of all, I'm gonna keep around this pointer 1149 00:46:42,786 --> 00:46:44,666 and remember where Jacob is, but then, 1150 00:46:44,666 --> 00:46:45,916 just as before, I have to check. 1151 00:46:46,256 --> 00:46:48,456 Alright. Does he belong-- and stay here for just a moment. 1152 00:46:48,636 --> 00:46:49,576 Does he belong here? 1153 00:46:49,626 --> 00:46:52,516 Nine? No. Nine is greater than-- 55 is greater than 9. 1154 00:46:52,516 --> 00:46:55,636 So I keep going, I keep going, I keep going, I keep going. 1155 00:46:55,636 --> 00:46:57,356 Can you meet me at the end of the list? 1156 00:46:57,426 --> 00:47:01,786 And 34, I have to keep going, but now, this pointer is null, 1157 00:47:01,786 --> 00:47:03,776 so this actually is a pretty trivial when insertion. 1158 00:47:04,026 --> 00:47:07,356 What does it take if-- Jacob, you don't mind coming over here. 1159 00:47:07,356 --> 00:47:10,486 So what is it gonna take mechanically to insert Jacob 1160 00:47:10,486 --> 00:47:11,836 into the right part of the list? 1161 00:47:11,836 --> 00:47:14,346 I now know he belongs here, what code-- 1162 00:47:14,346 --> 00:47:16,006 what lines of code do I need to implement? 1163 00:47:16,016 --> 00:47:18,016 [ Inaudible Remark ] 1164 00:47:18,026 --> 00:47:18,776 >> Yeah. Go ahead. 1165 00:47:18,826 --> 00:47:19,216 What's your name? 1166 00:47:19,536 --> 00:47:19,746 >> Mark. 1167 00:47:19,916 --> 00:47:20,346 >> Mark. Okay. 1168 00:47:20,346 --> 00:47:21,696 So what-- you have the right inclination. 1169 00:47:21,696 --> 00:47:22,916 What-- it's just expressing words. 1170 00:47:23,046 --> 00:47:23,396 >> In Array. 1171 00:47:23,716 --> 00:47:24,006 >> So-- 1172 00:47:24,096 --> 00:47:24,686 >> A pointer. 1173 00:47:24,986 --> 00:47:26,716 >> So you know-- you need to update your left hand, 1174 00:47:26,716 --> 00:47:29,396 the null pointer at the moment, it's currently zero, 1175 00:47:29,726 --> 00:47:31,356 need to point to Jacob. 1176 00:47:31,356 --> 00:47:32,666 And now, what does Jacob need to do? 1177 00:47:33,086 --> 00:47:35,156 Because this is just dangling. 1178 00:47:35,206 --> 00:47:36,596 So what should be initialize these two? 1179 00:47:36,596 --> 00:47:37,626 [ Inaudible Remark ] 1180 00:47:37,626 --> 00:47:38,106 >. So null. 1181 00:47:38,106 --> 00:47:39,156 So you can just point downward. 1182 00:47:39,356 --> 00:47:42,166 So it was really three major steps. 1183 00:47:42,166 --> 00:47:44,076 One, I had to do some four looping 1184 00:47:44,076 --> 00:47:45,446 and find the right location. 1185 00:47:45,626 --> 00:47:46,796 But if that didn't point in the story, 1186 00:47:46,796 --> 00:47:48,516 I'll just had update 2 pointers. 1187 00:47:48,516 --> 00:47:51,646 I had to update Mark's left hand, which was null. 1188 00:47:51,646 --> 00:47:54,366 Now, he is pointing at Jacob, and Jacob had to make sure 1189 00:47:54,366 --> 00:47:56,336 to initialize his left hand to null, 1190 00:47:56,496 --> 00:47:58,886 because if he were just kind of pointing out into nowhere, 1191 00:47:58,886 --> 00:48:00,476 what's the risk in the future? 1192 00:48:00,476 --> 00:48:01,046 [ Inaudible Remark ] 1193 00:48:01,046 --> 00:48:03,926 >> So there'll be another link in the list. 1194 00:48:03,926 --> 00:48:06,566 But if he is just pointing at some garbage value, 1195 00:48:06,566 --> 00:48:07,586 what's the implication? 1196 00:48:07,586 --> 00:48:10,176 Well, it means next time, if I start traversing this list, 1197 00:48:10,366 --> 00:48:13,436 trying to insert the number 65, I would go no, not here, 1198 00:48:13,436 --> 00:48:16,256 not here, not here, not here, not here, not here, 1199 00:48:16,416 --> 00:48:18,856 and then I'm just gonna fly off the end of this list going 1200 00:48:18,856 --> 00:48:20,396 to wherever he is pointing, 1201 00:48:20,636 --> 00:48:22,516 which is probably an invalid memory address, 1202 00:48:22,516 --> 00:48:23,476 which means what's gonna happen? 1203 00:48:23,476 --> 00:48:24,256 [ Inaudible Remark ] 1204 00:48:24,256 --> 00:48:24,656 >> Segfault. 1205 00:48:24,656 --> 00:48:26,216 Alright. Let's try one more, 'cause-- 1206 00:48:26,216 --> 00:48:28,446 at least one more 'cause it's gonna get a little trickier 1207 00:48:28,446 --> 00:48:28,876 than this. 1208 00:48:29,156 --> 00:48:30,996 Let's go ahead and insert-- you guys can stay there. 1209 00:48:30,996 --> 00:48:33,216 We're just gonna keep adding to the awkwardness up here. 1210 00:48:33,596 --> 00:48:36,766 So I need someone for number 5. 1211 00:48:37,196 --> 00:48:39,096 Alright. Come on down. 1212 00:48:39,096 --> 00:48:39,456 What's your name? 1213 00:48:40,036 --> 00:48:40,236 >> Kenny. 1214 00:48:40,506 --> 00:48:41,626 >> Kenny. Malloc, Kenny. 1215 00:48:44,096 --> 00:48:48,036 Alright. So I'm going to store in Kenny's N-- 1216 00:48:48,286 --> 00:48:50,866 the field's called N and his struct, the number 5. 1217 00:48:51,136 --> 00:48:53,936 His left hand is just some garbage value right now. 1218 00:48:54,116 --> 00:48:56,076 I'm gonna do the same story, so if you wanna just kind 1219 00:48:56,076 --> 00:48:57,066 of tag along for a moment. 1220 00:48:57,286 --> 00:48:58,536 I start off here at the beginning. 1221 00:48:58,746 --> 00:49:00,596 The-- my list is still at the beginning here. 1222 00:49:00,596 --> 00:49:01,726 So I check the first element. 1223 00:49:02,426 --> 00:49:05,896 Nine. So Kenny belongs before number 9 'cause he's number 5. 1224 00:49:06,196 --> 00:49:09,166 So what do you propose we do to get him 1225 00:49:09,166 --> 00:49:10,326 into the right place in the list? 1226 00:49:10,936 --> 00:49:13,726 Okay. So-- and what's your name again? 1227 00:49:14,346 --> 00:49:14,616 >> Herb. 1228 00:49:14,746 --> 00:49:15,336 >> Herb. Okay. 1229 00:49:15,336 --> 00:49:17,426 So Herb is pointing now at Kenny-- 1230 00:49:17,426 --> 00:49:18,146 [ Inaudible Remark ] 1231 00:49:18,146 --> 00:49:25,166 >> But someone pushed back. 1232 00:49:25,166 --> 00:49:25,436 [ Inaudible Remark ] 1233 00:49:25,436 --> 00:49:26,326 >> Okay. So, right. 1234 00:49:26,436 --> 00:49:27,456 So let's try that first though. 1235 00:49:27,456 --> 00:49:29,516 So Herb is pointing at Kenny, but now-- 1236 00:49:29,576 --> 00:49:30,046 what's your name again? 1237 00:49:30,476 --> 00:49:30,756 >> Karen. 1238 00:49:30,846 --> 00:49:32,486 >> Karen. Who is pointing at Karen? 1239 00:49:33,116 --> 00:49:34,256 We've lost Karen, right? 1240 00:49:34,256 --> 00:49:37,476 We've just leaked this entire string of memory, 1241 00:49:37,476 --> 00:49:41,206 this entire list of memory, because Herb pointed too soon 1242 00:49:41,436 --> 00:49:44,216 at Kenny, because now, we've orphaned the rest of this list. 1243 00:49:44,216 --> 00:49:45,616 Right? Case and point, memory leak. 1244 00:49:45,616 --> 00:49:47,156 And this is a big memory leak. 1245 00:49:47,156 --> 00:49:49,426 Now, we've lost the entire chain, and there is no way 1246 00:49:49,426 --> 00:49:50,396 of getting back to this now. 1247 00:49:50,396 --> 00:49:51,016 So let's fix. 1248 00:49:51,016 --> 00:49:52,206 What do we have to do first instead? 1249 00:49:52,666 --> 00:49:53,416 >> I will point at her. 1250 00:49:53,586 --> 00:49:55,676 >> So Kenny is gonna point at Karen first. 1251 00:49:55,676 --> 00:49:56,706 So now, it feels redundant. 1252 00:49:56,706 --> 00:49:58,946 Now, it's like we've got two heads of this list, 1253 00:49:58,946 --> 00:49:59,966 two starts of the list. 1254 00:50:00,556 --> 00:50:03,096 >> So now, what do we do? 1255 00:50:03,336 --> 00:50:05,526 So now, Herb just has to repoint himself, 1256 00:50:05,526 --> 00:50:07,756 not at Karen, but at Kenny. 1257 00:50:08,106 --> 00:50:09,616 Alright. So, one more. 1258 00:50:09,676 --> 00:50:10,586 This one's gonna be a pain. 1259 00:50:10,846 --> 00:50:15,046 Alright. So who would like to be the most difficult node, 20? 1260 00:50:16,006 --> 00:50:17,196 Yes, in the back. 1261 00:50:17,296 --> 00:50:18,056 Come on up. 1262 00:50:18,356 --> 00:50:19,246 Alright, what's your name? 1263 00:50:20,086 --> 00:50:20,666 >> Max. 1264 00:50:20,666 --> 00:50:22,996 >> Max. Malloc, Max. 1265 00:50:23,636 --> 00:50:26,936 Alright. Let's get this right the first time. 1266 00:50:28,146 --> 00:50:29,586 Okay. Very nice. 1267 00:50:30,366 --> 00:50:32,146 Okay. So here we go. 1268 00:50:32,146 --> 00:50:35,356 I'm gonna do my four loop thing where I try to find the location 1269 00:50:35,356 --> 00:50:37,556 in this list to insert this new value 20, 1270 00:50:37,646 --> 00:50:38,756 and nope, doesn't go here. 1271 00:50:38,756 --> 00:50:41,226 So I keep going, so I keep going, and oh, 1272 00:50:41,456 --> 00:50:43,056 it belongs somewhere in here. 1273 00:50:43,056 --> 00:50:44,726 I don't wanna keep going, 'cause what just happen 1274 00:50:44,726 --> 00:50:47,046 if I went too far, what can I not do at this point? 1275 00:50:47,046 --> 00:50:47,176 [ Inaudible Remark ] 1276 00:50:47,176 --> 00:50:49,956 >> I can't go back, which means I'm not gonna be able 1277 00:50:49,956 --> 00:50:50,626 to update-- what's your name? 1278 00:50:50,626 --> 00:50:50,736 >> Abby. 1279 00:50:50,876 --> 00:50:53,356 >> I'm not gonna be able to update Abby's left hand, 1280 00:50:53,356 --> 00:50:56,326 which means yes, I could insert Max here before-- 1281 00:50:56,696 --> 00:50:56,966 >> Rachel. 1282 00:50:57,026 --> 00:50:59,526 >> Rachel, but then, I can't fix the beginning of the list. 1283 00:50:59,526 --> 00:51:01,146 So at this point-- and this is gonna be the more-- 1284 00:51:01,236 --> 00:51:03,906 most complicated of the several in code-- I-- 1285 00:51:04,126 --> 00:51:07,146 doing all the leg work here, I kind of need two pointers, 1286 00:51:07,146 --> 00:51:09,006 my left and right hand, to point at both 1287 00:51:09,006 --> 00:51:11,486 of these guys temporarily, as well as Max, 1288 00:51:11,796 --> 00:51:13,576 so that we actually can do this correctly. 1289 00:51:13,576 --> 00:51:14,616 It's only gonna be three steps, 1290 00:51:14,966 --> 00:51:16,846 but order of operations is going to be important. 1291 00:51:16,846 --> 00:51:19,966 So what do you propose that we do first to insert max, 1292 00:51:19,966 --> 00:51:25,026 assuming I now am pointing at those two? 1293 00:51:25,026 --> 00:51:25,286 [ Inaudible Remark ] 1294 00:51:25,286 --> 00:51:27,156 >> Okay. Max is gonna point to Rachel. 1295 00:51:28,016 --> 00:51:30,136 Okay. And Rachel is again is 22. 1296 00:51:30,566 --> 00:51:34,736 Alright. So that's step one. 1297 00:51:34,946 --> 00:51:37,296 Okay. So now, you-- Abby? 1298 00:51:37,496 --> 00:51:41,896 Abby, 17, points at Max-- and now, conceptually, 1299 00:51:41,896 --> 00:51:44,196 if you don't mind kind of shuffling along. 1300 00:51:44,716 --> 00:51:45,146 Are we good? 1301 00:51:45,896 --> 00:51:47,526 Good then? 1302 00:51:47,526 --> 00:51:48,316 [ Inaudible Remark ] 1303 00:51:48,316 --> 00:51:56,026 >> We'll see-- it actually maps pretty well to see code. 1304 00:51:56,026 --> 00:51:58,536 But it's a little more clear, believe it or not, 1305 00:51:58,536 --> 00:51:59,666 to actually do it with humans, I think. 1306 00:51:59,666 --> 00:52:00,756 So we'll do that in just a moment. 1307 00:52:00,986 --> 00:52:02,726 Okay. So anymore updates needs to be done? 1308 00:52:03,926 --> 00:52:06,396 Alright. So let's now make this even more difficult. 1309 00:52:06,396 --> 00:52:07,816 I don't think we need any more humans. 1310 00:52:07,816 --> 00:52:09,836 In fact, it's time to start deleting some of you guys. 1311 00:52:10,146 --> 00:52:12,876 So let's go ahead and delete from the tale. 1312 00:52:12,876 --> 00:52:14,496 So, number 20. 1313 00:52:14,496 --> 00:52:16,536 Okay, Max-- I'm sorry, your time is up already. 1314 00:52:16,896 --> 00:52:19,116 So wait, don't just get out of the list. 1315 00:52:19,346 --> 00:52:20,626 We need to now delete Max. 1316 00:52:20,816 --> 00:52:22,666 So how do I delete the number 20? 1317 00:52:22,916 --> 00:52:24,866 Well, I'm-- be the four loop, so I'm gonna have to again, 1318 00:52:24,866 --> 00:52:25,616 start at the beginning. 1319 00:52:25,616 --> 00:52:26,736 And here's the downside, right? 1320 00:52:26,736 --> 00:52:28,886 We're already paying the price for this dynamism 1321 00:52:28,886 --> 00:52:30,696 and this flexibility of memory management. 1322 00:52:30,916 --> 00:52:32,736 What's the cost that I keep incurring 1323 00:52:32,736 --> 00:52:33,946 with every one of these algorithms? 1324 00:52:33,946 --> 00:52:34,586 [ Inaudible Remark ] 1325 00:52:34,586 --> 00:52:37,346 >> I have to traverse the whole list, right? 1326 00:52:37,346 --> 00:52:39,926 It's as though we now have operations that used 1327 00:52:39,926 --> 00:52:42,096 to be constant time, order of, O(1), 1328 00:52:42,096 --> 00:52:44,236 where I could just jump right where I want in the list. 1329 00:52:44,496 --> 00:52:47,536 Now, it seems that every one of these insertions or deletions is 1330 00:52:47,536 --> 00:52:51,706 at least O(n), because in the worst case, 1331 00:52:51,706 --> 00:52:53,416 we have to put someone at the very end of the list. 1332 00:52:53,416 --> 00:52:55,426 So again, there's this theme of tradeoffs in the course 1333 00:52:55,426 --> 00:52:58,206 and in CS in general, and here's another manifestation of that. 1334 00:52:58,396 --> 00:52:59,496 So I wanna delete 20. 1335 00:52:59,496 --> 00:53:00,136 So here we go. 1336 00:53:00,256 --> 00:53:03,176 This is not 20, this is not 20, this is not 20, 1337 00:53:03,586 --> 00:53:04,366 should I take the step? 1338 00:53:05,626 --> 00:53:05,693 >> No. 1339 00:53:05,693 --> 00:53:07,556 >> So not yet, 'cause if I do, I'm not gonna be able 1340 00:53:07,556 --> 00:53:08,746 to update the list, and I'm again, 1341 00:53:08,746 --> 00:53:10,386 gonna orphan part of this list. 1342 00:53:10,546 --> 00:53:12,936 So now, I am at 17 and 20. 1343 00:53:13,226 --> 00:53:14,826 I'm gonna go ahead and do what? 1344 00:53:14,826 --> 00:53:17,816 Shall I free Max? 1345 00:53:18,406 --> 00:53:19,256 >> No. 1346 00:53:19,256 --> 00:53:19,716 >> Okay, why? 1347 00:53:19,716 --> 00:53:20,196 [ Simultaneous Talking ] 1348 00:53:20,196 --> 00:53:23,406 >> Right. So if I free Max-- now take-- just to manifest the-- 1349 00:53:23,406 --> 00:53:24,386 can you step back a bit? 1350 00:53:24,526 --> 00:53:27,586 Suppose I've called the function free and I've passed it Max, 1351 00:53:27,816 --> 00:53:30,206 as Max being a pointer to the structure now. 1352 00:53:30,566 --> 00:53:33,396 Now, technically, if I don't do anything else in code, 1353 00:53:33,806 --> 00:53:36,126 odds are Max's left hand is still pointing where? 1354 00:53:36,406 --> 00:53:39,236 'Cause free, just like in the world of file systems 1355 00:53:39,236 --> 00:53:42,406 and forensics, doesn't actually reclaim that pointer 1356 00:53:42,406 --> 00:53:45,986 and change it immediately, but it's now no longer safe to trust 1357 00:53:45,986 --> 00:53:48,586 that his left hand is pointing at any legitimate value. 1358 00:53:48,766 --> 00:53:51,266 So we better go ahead and undo this and make sure, 1359 00:53:51,266 --> 00:53:52,426 if we're gonna delete Max, 1360 00:53:52,426 --> 00:53:54,576 that we first remember where he's pointing. 1361 00:53:54,576 --> 00:53:55,056 How do we do this? 1362 00:53:56,256 --> 00:53:57,436 So Abby could point to him. 1363 00:53:57,436 --> 00:53:58,756 But what just happened? 1364 00:53:58,756 --> 00:53:59,936 [ Inaudible Remark ] 1365 00:53:59,936 --> 00:54:00,886 >> Now, we just orphaned-- 1366 00:54:00,886 --> 00:54:02,376 [ Inaudible Remark ] 1367 00:54:02,376 --> 00:54:03,296 >> Well, not the rest of the list. 1368 00:54:03,296 --> 00:54:04,756 Abby is now pointing to Rachel. 1369 00:54:04,826 --> 00:54:06,196 But who did we orphan? 1370 00:54:06,406 --> 00:54:07,406 So now, we orphaned Max. 1371 00:54:08,086 --> 00:54:10,106 So now, we again have a memory leak, right? 1372 00:54:10,106 --> 00:54:10,816 It's not as bad. 1373 00:54:10,816 --> 00:54:13,156 We've just kind of lost track of Max, so you know, just-- 1374 00:54:13,206 --> 00:54:14,456 big deal, just one node. 1375 00:54:14,896 --> 00:54:17,256 But it's still 8 bytes. 1376 00:54:17,256 --> 00:54:18,676 And if we do this frankly, again and again, 1377 00:54:18,676 --> 00:54:19,656 that's gonna start to add up. 1378 00:54:19,656 --> 00:54:20,766 So this isn't good enough too. 1379 00:54:20,766 --> 00:54:22,706 So you know, I'm gonna have to get involved here, right? 1380 00:54:22,706 --> 00:54:25,186 I'm gonna have to have a special temporary pointer, 1381 00:54:25,396 --> 00:54:27,526 maybe new pointer predecessor pointer, what-- 1382 00:54:27,526 --> 00:54:29,236 pred pointer is what I used in the picture there. 1383 00:54:29,526 --> 00:54:30,556 So let me help out here. 1384 00:54:30,556 --> 00:54:33,776 So I'm gonna point at Max, me being a special alloc-- 1385 00:54:33,776 --> 00:54:35,616 special local variable, that's a pointer. 1386 00:54:35,856 --> 00:54:37,906 Now, Abby can go ahead and point as before. 1387 00:54:38,286 --> 00:54:40,786 So now, the list is fixed, and now, I can call free 1388 00:54:40,786 --> 00:54:44,096 on what I'm pointing at, and bam, Max disappears, 1389 00:54:45,166 --> 00:54:48,456 stage left-- hand round of applause for Max if we could. 1390 00:54:48,456 --> 00:54:49,756 [ Applause ] 1391 00:54:49,756 --> 00:54:52,146 >> Okay. So now, Max is free-- 1392 00:54:52,556 --> 00:54:54,186 actually, you can hang out here with us if you want. 1393 00:54:54,186 --> 00:54:55,396 It's-- we don't have to banish you. 1394 00:54:55,616 --> 00:54:58,046 So Max has been freed, but the list has been updated, 1395 00:54:58,046 --> 00:54:59,006 so let's do two more. 1396 00:54:59,396 --> 00:55:02,686 So let's go ahead now and delete-- let's do an easy one. 1397 00:55:02,736 --> 00:55:03,756 Jacob? Alright, Jacob. 1398 00:55:04,006 --> 00:55:06,016 I want to free Jacob. 1399 00:55:06,016 --> 00:55:08,316 So I wanna free the node containing the number 55. 1400 00:55:08,316 --> 00:55:08,906 So here we go. 1401 00:55:09,226 --> 00:55:12,016 Four loop, nope, not 55, nope, nope, nope, 1402 00:55:12,016 --> 00:55:13,646 nope, or O(n), found Jacob. 1403 00:55:13,646 --> 00:55:15,116 And so, what do I do here? 1404 00:55:16,276 --> 00:55:16,966 Free Jacob. 1405 00:55:17,146 --> 00:55:20,416 But what's just happened? 1406 00:55:20,416 --> 00:55:20,636 [ Inaudible Remark ] 1407 00:55:20,636 --> 00:55:22,846 >> Now, Mark's left hand is again pointing 1408 00:55:22,846 --> 00:55:23,966 at some garbage value. 1409 00:55:23,966 --> 00:55:25,076 He is technically pointing 1410 00:55:25,076 --> 00:55:27,566 at the memory previously occupied by Jacob. 1411 00:55:27,776 --> 00:55:29,566 But as soon as we call free on Jacob, 1412 00:55:29,816 --> 00:55:31,306 that memory might be get reused, 1413 00:55:31,306 --> 00:55:34,336 so I can't take this N minus 1 step. 1414 00:55:34,336 --> 00:55:36,226 I have to pause here, and before-- 1415 00:55:36,226 --> 00:55:37,666 if you can come back into the story for moment. 1416 00:55:37,936 --> 00:55:40,446 Before I actually free Mark-- 1417 00:55:40,446 --> 00:55:43,356 free Jacob, I should probably point at him, 1418 00:55:43,436 --> 00:55:44,506 then update what first? 1419 00:55:45,906 --> 00:55:47,146 Mark's left hand. 1420 00:55:47,326 --> 00:55:48,626 So now, your left hand goes down. 1421 00:55:48,926 --> 00:55:51,016 I have still retained the pointer to Jacob. 1422 00:55:51,016 --> 00:55:52,286 And so, now, I can call free 1423 00:55:52,286 --> 00:55:54,166 on the pointer that I have retained. 1424 00:55:54,166 --> 00:55:56,056 And so, now, Jacob is out of the story, 1425 00:55:56,306 --> 00:55:57,256 but you can stay there if you want. 1426 00:55:57,416 --> 00:55:59,836 So let's free one more just for good measure. 1427 00:56:00,206 --> 00:56:03,736 So Kenny is number 5, you're about to be banished. 1428 00:56:03,736 --> 00:56:06,136 So free Kenny, how do I do this? 1429 00:56:06,136 --> 00:56:08,786 Well, technically, Herb is pointing at Kenny, 1430 00:56:09,076 --> 00:56:10,446 so I can very easily find-- 1431 00:56:10,496 --> 00:56:11,876 okay, here it is, the number five. 1432 00:56:12,256 --> 00:56:14,766 So now, let's make the same mistake. 1433 00:56:15,026 --> 00:56:15,486 Free Kenny. 1434 00:56:15,846 --> 00:56:16,686 So if you could step back. 1435 00:56:17,616 --> 00:56:18,626 What's the problem, obviously? 1436 00:56:18,626 --> 00:56:20,316 [ Inaudible Remark ] 1437 00:56:20,316 --> 00:56:21,716 >> So what's that? 1438 00:56:22,206 --> 00:56:24,606 So Kenny is now-- well, Kenny has been freed. 1439 00:56:24,606 --> 00:56:25,976 So technically, he is not orphaned. 1440 00:56:26,106 --> 00:56:27,436 But who-- 1441 00:56:27,436 --> 00:56:27,776 [ Simultaneous Talking ] 1442 00:56:27,776 --> 00:56:28,956 >> The rest, yeah. 1443 00:56:28,956 --> 00:56:31,896 So now, even though his left hand might, as an artifact, 1444 00:56:31,896 --> 00:56:34,996 still be pointing at number 9 onward the rest of the list, 1445 00:56:35,376 --> 00:56:36,756 it's no longer considered valid. 1446 00:56:36,856 --> 00:56:37,976 So I can't touch that. 1447 00:56:38,226 --> 00:56:39,616 So I need to again use this trick. 1448 00:56:39,616 --> 00:56:42,516 I need to get involved with some temporary storage, point at 9, 1449 00:56:42,776 --> 00:56:45,486 point at 5, and now, I can free Kenny. 1450 00:56:45,686 --> 00:56:47,296 And now, what do I do here? 1451 00:56:47,296 --> 00:56:49,326 Herb is still pointing-- yeah, follow Kenny. 1452 00:56:49,556 --> 00:56:51,636 Yeah. So Herb is still pointing at Kenny. 1453 00:56:51,636 --> 00:56:53,046 And so, what do we need to do to fix the list? 1454 00:56:53,046 --> 00:56:54,576 [ Inaudible Remark ] 1455 00:56:54,576 --> 00:57:00,216 >> Now, we update Herb to point to 9 and 17, 22, and so forth. 1456 00:57:00,276 --> 00:57:03,146 So frankly, it's-- you know, it's easy to make mistakes 1457 00:57:03,146 --> 00:57:04,806 as we did several times here, but it's only two 1458 00:57:04,806 --> 00:57:07,806 or three operations, and let's go ahead and see. 1459 00:57:07,806 --> 00:57:09,336 This actually worked at a lot better than I though it would. 1460 00:57:09,566 --> 00:57:11,196 Let's go-- a quick round of applause for these guys, 1461 00:57:11,196 --> 00:57:12,246 and let's now translate this to code. 1462 00:57:12,246 --> 00:57:12,313 [ Applause ] 1463 00:57:12,313 --> 00:57:14,286 >> If you guys could see the TFs 1464 00:57:14,496 --> 00:57:15,576 on the corner there, that would be great. 1465 00:57:16,076 --> 00:57:17,776 You can keep these if you want as a souvenir. 1466 00:57:18,356 --> 00:57:20,806 So let's actually try to translate this now 1467 00:57:20,946 --> 00:57:25,446 to some actual implementation, and we'll actually see 1468 00:57:25,446 --> 00:57:27,506 that even though we're focusing for today on this notion 1469 00:57:27,506 --> 00:57:30,156 of a linked list, now that we have this power 1470 00:57:30,156 --> 00:57:33,236 to stitch together arbitrary objects in memory, 1471 00:57:33,476 --> 00:57:35,746 we can really start to do anything we want. 1472 00:57:35,796 --> 00:57:39,076 So case and point, when you're implementing something, say, 1473 00:57:39,076 --> 00:57:41,936 entrepreneurial in spirits, something like Google, 1474 00:57:41,936 --> 00:57:42,816 something like Facebook, 1475 00:57:42,816 --> 00:57:44,656 where you actually have some real world problems 1476 00:57:44,656 --> 00:57:47,306 where you have lots of data very often these days, 1477 00:57:47,476 --> 00:57:49,596 and you wanna be able to search it more effectively, 1478 00:57:49,746 --> 00:57:52,196 and you're just throwing an array at all of the web pages 1479 00:57:52,196 --> 00:57:55,346 in the world probably isn't going to solve that problem 1480 00:57:55,346 --> 00:57:56,536 of finding them very easily. 1481 00:57:56,756 --> 00:57:58,786 Facebook is still growing at a ridiculous rate, 1482 00:57:58,786 --> 00:58:01,556 storing every Facebook user in a big array of memory, 1483 00:58:01,716 --> 00:58:04,406 probably isn't it gonna cut it as well, but no matter. 1484 00:58:04,406 --> 00:58:07,816 Now, we have this ability to represent a Facebook user 1485 00:58:07,816 --> 00:58:09,946 with struct, a web page with struct. 1486 00:58:10,076 --> 00:58:11,606 And using things like pointers, 1487 00:58:11,796 --> 00:58:13,466 we can now weave these things together 1488 00:58:13,466 --> 00:58:15,346 in any manner we see fit. 1489 00:58:15,346 --> 00:58:18,076 And so, we'll start leveraging this all the more to come 1490 00:58:18,076 --> 00:58:20,866 up with even more effective and clever designs. 1491 00:58:20,926 --> 00:58:23,436 So let's go ahead and take a look at some code here. 1492 00:58:23,436 --> 00:58:27,256 In the list 1.h, we have, as promised, 1493 00:58:27,606 --> 00:58:28,966 the structure I need to use. 1494 00:58:28,966 --> 00:58:31,296 So every time I malloc a student from the audience, 1495 00:58:31,626 --> 00:58:34,046 we were essentially asking for this much space. 1496 00:58:34,186 --> 00:58:37,926 It's 64bits or 8 bytes, the 8 bytes-- 1497 00:58:37,926 --> 00:58:41,646 the first 4 bytes being for the N, and then pointers at least 1498 00:58:41,646 --> 00:58:44,486 on the cloud are 4 bytes as well or 32 bits. 1499 00:58:44,776 --> 00:58:47,276 So every struct is probably 8 bytes total. 1500 00:58:47,276 --> 00:58:49,196 That's how much space each of the humans 1501 00:58:49,196 --> 00:58:51,716 on stage here represented, except for me 1502 00:58:51,716 --> 00:58:54,416 who was some special main function, and Herb. 1503 00:58:54,416 --> 00:58:55,846 How much space was Herb taking 1504 00:58:55,846 --> 00:58:57,906 up at the very first of the list? 1505 00:58:58,616 --> 00:59:01,096 So he was just 4 bytes, 'cause Herb was just a pointer, 1506 00:59:01,096 --> 00:59:03,156 and that's why I intentionally chose yellow paper for him. 1507 00:59:03,386 --> 00:59:05,056 He was a little bit different from everyone else, 1508 00:59:05,266 --> 00:59:07,836 because he did not have storage space for an integer. 1509 00:59:08,056 --> 00:59:09,806 He just pointed at the start of the list. 1510 00:59:09,916 --> 00:59:12,736 And so, when you implement a linked list, it turns out, 1511 00:59:12,736 --> 00:59:16,286 all you really need, at least in advance, is one node. 1512 00:59:16,536 --> 00:59:18,796 So in this program, list 1.c, 1513 00:59:18,796 --> 00:59:21,546 which is among next week's printouts, notice I've declared 1514 00:59:21,546 --> 00:59:25,536 for convenience, one global variable, I've called it first, 1515 00:59:25,676 --> 00:59:29,316 just like we called Herb first, it's a pointer to a node, 1516 00:59:29,486 --> 00:59:31,566 so node start means pointer to a node. 1517 00:59:31,786 --> 00:59:34,406 And so, this is how you translate Herb standing 1518 00:59:34,406 --> 00:59:38,136 over here with his left hand pointing to actual C code. 1519 00:59:38,136 --> 00:59:39,396 Now, it's initialized to know-- 1520 00:59:39,396 --> 00:59:41,316 which means when Herb first came up, he was like this. 1521 00:59:41,796 --> 00:59:44,246 But now, I am gonna actually start doing something 1522 00:59:44,246 --> 00:59:44,766 with this list. 1523 00:59:44,806 --> 00:59:50,616 So this program is just a relatively simple program 1524 00:59:50,716 --> 00:59:53,736 in terms of usage, that allows me to play with the linked list. 1525 00:59:53,736 --> 00:59:55,106 And so, I needed a trivial menu. 1526 00:59:55,106 --> 00:59:57,636 So what you are seeing now is the list 1 program running. 1527 00:59:57,936 --> 01:00:00,986 I've just got an infinite wild loop listening for user input, 1528 01:00:01,186 --> 01:00:03,666 'cause I just wanted some mechanism whereby we could add 1529 01:00:03,666 --> 01:00:05,976 and remove numbers from an actual linked list and see them. 1530 01:00:06,046 --> 01:00:08,836 >> So I'm gonna go ahead and type 3 and hit enter. 1531 01:00:09,126 --> 01:00:12,116 This is gonna trigger an if condition or a switch case 1532 01:00:12,286 --> 01:00:14,946 in my main function that's gonna allow me to insert a number. 1533 01:00:15,186 --> 01:00:18,576 Let me go ahead and insert the number, let's say number-- 1534 01:00:18,576 --> 01:00:20,886 who was the first number before. 1535 01:00:20,886 --> 01:00:22,846 I'll try to make the same story come together. 1536 01:00:23,266 --> 01:00:25,206 We had number 9 was the very start of the list. 1537 01:00:25,326 --> 01:00:27,936 So let input 9, so the list is now 9. 1538 01:00:28,406 --> 01:00:31,406 Let me go ahead and insert 1 more value, and the next value 1539 01:00:31,406 --> 01:00:33,236 in the list was 17 before. 1540 01:00:33,576 --> 01:00:35,946 So now, the list is 9 followed by 17. 1541 01:00:35,946 --> 01:00:37,646 Let me do one more for good measure. 1542 01:00:37,966 --> 01:00:39,966 Let's do 34 who was the end. 1543 01:00:40,466 --> 01:00:41,446 And now, the list is this. 1544 01:00:41,526 --> 01:00:44,036 So the program itself is not interesting to run 1545 01:00:44,036 --> 01:00:46,236 or to play with, 'cause all it's gonna do is what you tell it. 1546 01:00:46,236 --> 01:00:49,146 What is interesting here is to look at the code. 1547 01:00:49,146 --> 01:00:53,116 So let me go ahead and open up-- this is list 1.c, 1548 01:00:53,416 --> 01:00:55,876 and let's see how an insertion actually works. 1549 01:00:55,876 --> 01:00:57,566 Well, first, the main function. 1550 01:00:57,856 --> 01:00:59,336 It's pretty does what I said. 1551 01:00:59,336 --> 01:01:01,346 It's just got to do a wild loop which his common 1552 01:01:01,346 --> 01:01:03,446 when you wanna get user input but sanity check it. 1553 01:01:03,716 --> 01:01:05,576 I've got a menu that I just printed with printf., 1554 01:01:05,976 --> 01:01:08,306 I'm using getInt to get the user's command, 1555 01:01:08,576 --> 01:01:10,636 then I've got a switch statement here, and notice, 1556 01:01:10,676 --> 01:01:12,076 this is a stylistic decision. 1557 01:01:12,076 --> 01:01:14,166 In the past, you've probably seen us hit enter 1558 01:01:14,426 --> 01:01:16,526 after the colon in a case, 1559 01:01:16,526 --> 01:01:18,216 so that these things would take up multiple lines. 1560 01:01:18,506 --> 01:01:21,786 But frankly, aesthetically, this is pretty pretty. 1561 01:01:22,106 --> 01:01:24,306 And other-- it is a little cleaner looking, this code. 1562 01:01:24,306 --> 01:01:26,126 So there's no reason to insert line breaks 1563 01:01:26,126 --> 01:01:28,486 as you might be always if these things can be written very 1564 01:01:28,486 --> 01:01:30,506 nicely on one line, stylistic decision. 1565 01:01:30,826 --> 01:01:33,386 Let me scroll down, and it looks like-- okay. 1566 01:01:33,796 --> 01:01:36,266 Apparently, here in the end is how I'm gonna free the linked 1567 01:01:36,266 --> 01:01:38,006 list, but let's not get ahead of ourselves. 1568 01:01:38,006 --> 01:01:41,806 So suppose I do type number 3 to insert an element. 1569 01:01:42,066 --> 01:01:44,486 Let me scroll down to the insert function. 1570 01:01:45,466 --> 01:01:47,216 Alright. So there's a lot of code here, 1571 01:01:47,216 --> 01:01:48,196 but I'm gonna draw your attention 1572 01:01:48,196 --> 01:01:51,036 to the highlights 'cause we'll return to this before long. 1573 01:01:51,166 --> 01:01:52,746 So here is the insert function. 1574 01:01:53,146 --> 01:01:55,896 Here is-- in the first few lines of codes-- 1575 01:01:55,896 --> 01:01:58,236 where I try to instantiate the node for the number. 1576 01:01:58,236 --> 01:02:00,816 So, when I stood up here and I said, "Malloc, Max." 1577 01:02:01,436 --> 01:02:03,536 what happened were these 3 lines. 1578 01:02:03,536 --> 01:02:05,836 I declare a local variable called new pointer. 1579 01:02:06,096 --> 01:02:08,166 So this was me taking on the role recall 1580 01:02:08,166 --> 01:02:10,316 of the new pointer 'cause I just needed one of my hands 1581 01:02:10,316 --> 01:02:12,606 to be pointing at each new audience member. 1582 01:02:13,096 --> 01:02:16,996 So now, I've called Malloc, size of node. 1583 01:02:16,996 --> 01:02:19,606 So it's with this call here, Malloc size of node, 1584 01:02:19,606 --> 01:02:22,416 that I get back those 8 bytes, where I got back Max and Jacob 1585 01:02:22,416 --> 01:02:23,796 and other volunteers from the audience. 1586 01:02:23,876 --> 01:02:25,826 So that's what's going on in these first few lines, 1587 01:02:26,156 --> 01:02:27,266 but I'm doing a sanity check. 1588 01:02:27,596 --> 01:02:30,516 If malloc returns null, something has gone wrong, 1589 01:02:30,516 --> 01:02:33,356 probably I've asked for too many volunteers, we're out of memory, 1590 01:02:33,586 --> 01:02:34,876 so I'd better bail at this point. 1591 01:02:34,876 --> 01:02:36,796 And so, I arbitrarily just decided to return. 1592 01:02:36,796 --> 01:02:38,066 I'm just not gonna try inserting. 1593 01:02:38,066 --> 01:02:39,176 I'm not gonna inform the user. 1594 01:02:39,416 --> 01:02:42,106 I'm just not gonna risk following a pointer that's null. 1595 01:02:42,716 --> 01:02:44,326 So now, I initialize the node. 1596 01:02:44,676 --> 01:02:48,636 So when I handed-- when I handed someone their-- 1597 01:02:48,636 --> 01:02:50,576 actually, maybe I should hang on to some of the numbers. 1598 01:02:50,776 --> 01:02:54,106 So when I handed the volunteer a number, that corresponds now 1599 01:02:54,106 --> 01:02:54,996 to these several lines. 1600 01:02:54,996 --> 01:02:56,976 So here is that new piece of syntax I promised. 1601 01:02:57,306 --> 01:02:59,956 I first asked the user number to insert, 1602 01:03:00,076 --> 01:03:01,646 I used getInt to get the int. 1603 01:03:01,726 --> 01:03:04,846 So here, I was the one playing getInt 'cause I handed the piece 1604 01:03:04,846 --> 01:03:05,446 of paper out. 1605 01:03:05,766 --> 01:03:06,906 But here is the new syntax. 1606 01:03:07,196 --> 01:03:09,776 New pointer at this moment of time, was pointing 1607 01:03:09,776 --> 01:03:12,066 at the volunteer, at Max or Jacob or someone 1608 01:03:12,066 --> 01:03:13,196 of our other volunteers. 1609 01:03:13,526 --> 01:03:17,726 New pointer, arrow N means take this pointer, new pointer, 1610 01:03:18,046 --> 01:03:22,546 follow it, and then dive into the field called N, 1611 01:03:22,846 --> 01:03:25,586 and put whatever you want there, in this case, 1612 01:03:25,646 --> 01:03:27,036 the return value of getInt. 1613 01:03:27,296 --> 01:03:30,516 Meanwhile, follow new pointer, use this arrow annotation, 1614 01:03:30,516 --> 01:03:33,946 go into its next field, and have each volunteer put their hand 1615 01:03:34,256 --> 01:03:34,956 by their side. 1616 01:03:34,956 --> 01:03:37,016 So those two steps; handing the human the number, 1617 01:03:37,216 --> 01:03:38,496 and then having them put their left hand 1618 01:03:38,496 --> 01:03:40,696 by their sides translates to these two lines 1619 01:03:41,026 --> 01:03:44,966 that are involved this new piece of syntax N arrow notation. 1620 01:03:44,966 --> 01:03:47,526 And frankly, this is finally the first piece of C syntax 1621 01:03:47,526 --> 01:03:48,756 that would make conceptual sense. 1622 01:03:48,796 --> 01:03:50,816 The arrow implies go there, point there. 1623 01:03:51,116 --> 01:03:52,046 And so, we have something that's 1624 01:03:52,046 --> 01:03:53,526 at least pretty reasonable this time. 1625 01:03:53,886 --> 01:03:57,186 Alright, so now, what's going to happen next? 1626 01:03:57,246 --> 01:03:59,906 Well, I first need to handle a few cases, 1627 01:03:59,906 --> 01:04:03,166 and this is where it's not easy to code this for the first time 1628 01:04:03,166 --> 01:04:05,446 because you have to consider all these possible cases, 1629 01:04:05,496 --> 01:04:08,236 sometimes in a certain order, but I realize through trial 1630 01:04:08,236 --> 01:04:10,196 and error, that the first thing I wanna do is check, 1631 01:04:10,646 --> 01:04:14,376 does Herb actually point to anyone form the get-go? 1632 01:04:14,376 --> 01:04:16,996 Or is he just standing here like this with his left hand down, 1633 01:04:16,996 --> 01:04:18,336 in which case, he exist, 1634 01:04:18,396 --> 01:04:21,416 my linked list exist, but it's of size 0. 1635 01:04:21,696 --> 01:04:23,376 So with this line of code, I'm checking, 1636 01:04:23,376 --> 01:04:25,426 is Herb's hand pointing down? 1637 01:04:25,676 --> 01:04:28,836 If so, inserting Max or whoever has come up to the stage 1638 01:04:28,836 --> 01:04:30,786 at this point is actually pretty easy, 1639 01:04:30,786 --> 01:04:32,836 because I've already malloced Max or someone, 1640 01:04:33,066 --> 01:04:36,226 I now have my new pointer pointing at him, so all I need 1641 01:04:36,226 --> 01:04:39,476 to do is ask Herb, Herb, can you please point at this struct 1642 01:04:39,476 --> 01:04:40,526 that I just allocated? 1643 01:04:40,526 --> 01:04:43,016 And so, that's the first case here, and it's easy 1644 01:04:43,076 --> 01:04:44,686 because there's so little work to be done. 1645 01:04:44,886 --> 01:04:46,706 Well, let's try a slightly more interesting one. 1646 01:04:46,996 --> 01:04:50,846 Suppose that the number I'm trying to insert into the list, 1647 01:04:51,096 --> 01:04:54,886 let's say 22, so suppose that the new nodes, 1648 01:04:54,886 --> 01:04:58,066 new pointers N field, which is 22, 1649 01:04:58,356 --> 01:05:02,606 is less than the first pointers N field. 1650 01:05:03,036 --> 01:05:07,166 Now, this might seem a little contradictory, because recall 1651 01:05:07,166 --> 01:05:11,386 that Herb does not have what? 1652 01:05:11,386 --> 01:05:11,453 [ Inaudible Remark ] 1653 01:05:11,453 --> 01:05:12,936 >> Herb does-- well, Herb does not have a value. 1654 01:05:12,936 --> 01:05:15,446 He does not have N. Herb is just, recall, 1655 01:05:15,646 --> 01:05:20,266 this special variable called first who's intentionally was 1656 01:05:20,266 --> 01:05:22,746 written on yellow paper cause he's only 4 bytes, 1657 01:05:23,046 --> 01:05:26,336 Herb does not have a value N, so it seems that we need 1658 01:05:26,336 --> 01:05:27,766 to take this arrow literally. 1659 01:05:27,836 --> 01:05:29,226 We don't have-- we don't wanna assume 1660 01:05:29,226 --> 01:05:30,946 that first itself is a node. 1661 01:05:31,406 --> 01:05:32,976 First is a pointer to a node. 1662 01:05:33,196 --> 01:05:37,466 So the fact that it says first arrow means follow the arrow, 1663 01:05:37,466 --> 01:05:40,286 follow the left hand, and then you arrive at some chunk 1664 01:05:40,286 --> 01:05:43,096 of memory, at that point you dive into the N field. 1665 01:05:43,156 --> 01:05:44,996 So we had number nine standing here 1666 01:05:44,996 --> 01:05:46,466 at the very beginning of the list. 1667 01:05:46,816 --> 01:05:50,126 So when we followed Herb's first pointer, we then ended 1668 01:05:50,126 --> 01:05:53,276 up in this node here and looked at the value on N. So it's okay 1669 01:05:53,276 --> 01:05:56,086 that Herb himself was not a struct, just a pointer. 1670 01:05:56,086 --> 01:05:57,886 You have to follow the arrow literally. 1671 01:05:57,956 --> 01:05:58,616 Same deal here. 1672 01:05:58,616 --> 01:06:01,376 New pointer-- again, I chose this yellow intentionally 1673 01:06:01,516 --> 01:06:03,416 when I malloced Max or someone from the audience, 1674 01:06:03,496 --> 01:06:05,846 I'm just a single pointer pointing at this new chunk 1675 01:06:05,846 --> 01:06:09,016 of memory, so this is consistent with that story. 1676 01:06:09,246 --> 01:06:12,006 Now, how do we translate those hand updates into code? 1677 01:06:12,526 --> 01:06:15,466 Well, my new-- if the node I've just allocated 1678 01:06:15,466 --> 01:06:20,416 from the audience be-- is suppose to be the first node 1679 01:06:20,416 --> 01:06:23,106 in the list because it's value is less 1680 01:06:23,236 --> 01:06:26,446 than the current first value, what do I need to do? 1681 01:06:26,916 --> 01:06:29,966 Well, if the new node I'm inserting actually belongs 1682 01:06:30,776 --> 01:06:32,996 over here at the very start of the list, 1683 01:06:33,056 --> 01:06:34,046 well, what do I need to do? 1684 01:06:34,106 --> 01:06:38,306 I need to ask-- I need to take my new node and point 1685 01:06:38,306 --> 01:06:40,196 at the former start of the list. 1686 01:06:40,196 --> 01:06:44,306 So this is when I inserted-- it was Jacob, number 5. 1687 01:06:44,306 --> 01:06:45,806 I might be confusing names. 1688 01:06:45,806 --> 01:06:47,856 But this is when I inserted-- no, Kenny-- 1689 01:06:47,926 --> 01:06:50,836 when we inserted 5 at the very start of the list, I had to say, 1690 01:06:50,836 --> 01:06:53,316 Kenny, please point at the current start of the list, 1691 01:06:53,566 --> 01:06:55,996 which is this first line of code, and then I asked Herb, 1692 01:06:55,996 --> 01:06:58,536 hey, Herb, would you mind pointing now at Kenny? 1693 01:06:59,076 --> 01:07:08,676 Hopefully, I got the names right there. 1694 01:07:08,676 --> 01:07:08,743 Yeah? 1695 01:07:08,743 --> 01:07:08,886 [ Inaudible Remark ] 1696 01:07:08,886 --> 01:07:10,106 >> So, correct. 1697 01:07:10,386 --> 01:07:13,596 I wanna do new pointer next gets first, 1698 01:07:13,996 --> 01:07:18,786 because if I first updated the first pointer, what happened? 1699 01:07:19,046 --> 01:07:20,986 That's when we orphaned the whole list. 1700 01:07:20,986 --> 01:07:24,736 If we update first prematurely, we lose track of everyone else. 1701 01:07:25,316 --> 01:07:27,906 So we wanna update the beginning of the list, 1702 01:07:28,156 --> 01:07:31,436 the pointer called first, or Herb, at the very last moment 1703 01:07:31,436 --> 01:07:32,746 so we don't orphan everyone else. 1704 01:07:33,846 --> 01:07:35,066 So if we reverse these steps, 1705 01:07:35,066 --> 01:07:36,666 we would actually incur the same bug 1706 01:07:36,666 --> 01:07:39,716 that we did somewhat intentionally earlier. 1707 01:07:40,166 --> 01:07:43,286 Now, this case here, I'm gonna wave my hand at some of it. 1708 01:07:43,286 --> 01:07:45,466 This is when frankly, these steps gets a little annoying, 1709 01:07:45,466 --> 01:07:48,046 and you only get it by thinking it through and trial and errors, 1710 01:07:48,046 --> 01:07:49,186 starting with very small list. 1711 01:07:49,516 --> 01:07:51,456 But in this frankly, when we did this with humans, 1712 01:07:51,456 --> 01:07:52,536 it was pretty straightforward. 1713 01:07:52,636 --> 01:07:54,466 It is actually pretty straightforward in code 1714 01:07:54,466 --> 01:07:55,626 if you read through all the cases, 1715 01:07:55,626 --> 01:07:57,596 but let's not dwell too long on this one 1716 01:07:57,866 --> 01:07:59,346 because we'll revisit it. 1717 01:07:59,346 --> 01:08:00,516 So what do I next? 1718 01:08:00,826 --> 01:08:03,566 If it-- if the node does not belong at the-- 1719 01:08:04,436 --> 01:08:05,836 there were two cases we handled already. 1720 01:08:06,146 --> 01:08:09,646 If the list is not empty and the number doesn't belong 1721 01:08:09,646 --> 01:08:11,856 in the beginning of the list, I am gonna go ahead 1722 01:08:11,856 --> 01:08:13,906 and collapse the remaining two cases, 1723 01:08:13,906 --> 01:08:15,986 middle and end, into one big list. 1724 01:08:16,386 --> 01:08:19,746 So this wild loop here is-- mounts to my-- 1725 01:08:20,066 --> 01:08:21,296 I called it four loop before. 1726 01:08:21,296 --> 01:08:24,056 But this wild loop does the process of looking and looking 1727 01:08:24,056 --> 01:08:26,086 and looking and looking for the right place in memory 1728 01:08:26,086 --> 01:08:29,126 to insert this node, then I have a few cases. 1729 01:08:29,126 --> 01:08:31,456 One-- and we didn't even bother with his verbally. 1730 01:08:31,706 --> 01:08:33,906 If there is a duplicate, I am gonna arbitrarily decide, 1731 01:08:34,176 --> 01:08:35,206 Max, go back to your seat. 1732 01:08:35,206 --> 01:08:36,636 We already have one of you in the least. 1733 01:08:36,636 --> 01:08:38,646 We don't watch another duplicate in the list. 1734 01:08:38,896 --> 01:08:41,206 But that's not a feature we had in the human example, 1735 01:08:41,206 --> 01:08:42,436 but I do have it here on code. 1736 01:08:42,876 --> 01:08:45,056 Else check for insertion at tail. 1737 01:08:45,536 --> 01:08:46,866 So what does this mean? 1738 01:08:46,866 --> 01:08:48,646 Well, this is the line of code that's interesting. 1739 01:08:48,646 --> 01:08:50,566 For now, notice that pred pointer, 1740 01:08:50,566 --> 01:08:52,496 assume that pred pointer is one of my hands 1741 01:08:52,496 --> 01:08:54,366 that I was just taking for granted, pointing, pointing, 1742 01:08:54,366 --> 01:08:55,566 pointing in each volunteer. 1743 01:08:55,946 --> 01:08:58,086 If my hand starts pointing at node, 1744 01:08:58,086 --> 01:08:59,756 what's the implication conceptually? 1745 01:09:01,126 --> 01:09:02,336 I'm just at the end of the list, right? 1746 01:09:02,336 --> 01:09:04,876 If I am now pointing at nothing, it means I went too far. 1747 01:09:05,096 --> 01:09:08,066 But intuitively, then that means that the new node I've alloc-- 1748 01:09:08,066 --> 01:09:10,956 malloced from the audience must belong at the tail of the list. 1749 01:09:11,256 --> 01:09:13,076 So this is why-- and this part 1750 01:09:13,076 --> 01:09:14,516 of what wave our hands out for today. 1751 01:09:14,516 --> 01:09:16,256 This is why I have this notion for a pred pointer, 1752 01:09:16,256 --> 01:09:18,316 previous pointer-- new pointer 1753 01:09:18,316 --> 01:09:20,766 and pred pointer are essentially my left and right hands. 1754 01:09:20,766 --> 01:09:22,406 When I was doing this walking through the list, 1755 01:09:22,406 --> 01:09:23,826 I was keeping track of where I was. 1756 01:09:24,086 --> 01:09:25,926 I need to do that, so that in code, 1757 01:09:25,926 --> 01:09:29,026 I actually remember where folks are. 1758 01:09:29,466 --> 01:09:33,756 Let's take a quick look at find, if only because it's nice 1759 01:09:33,756 --> 01:09:34,856 and straightforward now. 1760 01:09:35,106 --> 01:09:37,996 So let's assume, even though we glossed over that last big case, 1761 01:09:37,996 --> 01:09:39,496 which we will come back to before long. 1762 01:09:39,746 --> 01:09:42,316 Suppose my goal in life now is just to find the number 50, 1763 01:09:42,316 --> 01:09:44,536 or to find any node in a particular linked list. 1764 01:09:45,036 --> 01:09:47,416 So here, I prompt the user, enter the number to find, 1765 01:09:47,736 --> 01:09:50,236 I get the N from the users, store it in a variable called N, 1766 01:09:50,236 --> 01:09:52,306 and now, notice this trick. 1767 01:09:52,566 --> 01:09:55,106 So here is a very common approach with pointers-- 1768 01:09:55,366 --> 01:09:57,456 if you have some super special pointer, 1769 01:09:57,666 --> 01:09:59,066 like this one called first, 1770 01:09:59,066 --> 01:10:00,976 which again is global, and that's Herb. 1771 01:10:01,086 --> 01:10:01,896 >> That's Herb pointing 1772 01:10:01,896 --> 01:10:03,826 at any list we currently have in memory. 1773 01:10:04,166 --> 01:10:06,046 I'm gonna just declare a pointer, PTR, 1774 01:10:06,296 --> 01:10:08,716 that's initially identical to Herb's value, 1775 01:10:08,926 --> 01:10:12,286 because I just need-- like I did visually, I just need a finger 1776 01:10:12,286 --> 01:10:13,936 to point at each of this nodes with, 1777 01:10:13,936 --> 01:10:15,596 and we're gonna call that PTR here. 1778 01:10:15,996 --> 01:10:18,276 Now why PTR is not null? 1779 01:10:18,606 --> 01:10:20,726 So in other words, this is me making sure 1780 01:10:20,726 --> 01:10:22,336 that I don't walk off the end of the list. 1781 01:10:22,916 --> 01:10:27,176 If the current pointer, the thing I'm pointing at, arrow N, 1782 01:10:27,476 --> 01:10:30,516 equals the N I'm looking for-- and now notice, this is legit, 1783 01:10:30,516 --> 01:10:33,216 even though I've given N the same name as this N, 1784 01:10:33,446 --> 01:10:36,226 the fact that this guy is only accessible via the arrow, 1785 01:10:36,496 --> 01:10:39,336 it's completely legitimate, I can also call this guy N, 1786 01:10:39,336 --> 01:10:41,076 but I could have called him X or Y, 1787 01:10:41,456 --> 01:10:43,386 just simple to go with the same. 1788 01:10:43,706 --> 01:10:45,556 So if the current note I'm pointing 1789 01:10:45,556 --> 01:10:48,256 at with my right hand equals, equals N, 1790 01:10:48,256 --> 01:10:51,146 the value that the user typed in, just print out, found it, 1791 01:10:51,626 --> 01:10:53,826 sleep for a second just for some silly animation, 1792 01:10:54,106 --> 01:10:55,006 then break out of this. 1793 01:10:55,166 --> 01:11:00,336 Else, if I did not find the value N, if I pointed at Kenny, 1794 01:11:00,616 --> 01:11:02,496 and then again, and again, and again down the list, 1795 01:11:02,496 --> 01:11:04,406 and I am not finding the number I'm looking for, 1796 01:11:04,406 --> 01:11:05,706 how do I update this pointer? 1797 01:11:05,966 --> 01:11:08,966 Well, this is kind of trippy here, but I actually do say, 1798 01:11:09,086 --> 01:11:11,276 pointer, guess pointer next. 1799 01:11:12,066 --> 01:11:14,846 So if right now, I'm pointing at a node, 1800 01:11:14,846 --> 01:11:16,646 that's this rectangle, right? 1801 01:11:16,646 --> 01:11:19,426 Inside this rectangle is a number and a pointer, 1802 01:11:19,706 --> 01:11:22,406 and I am PTR pointing at this node, if I wanna move 1803 01:11:22,406 --> 01:11:25,016 on to the next node, what do I set myself equal to? 1804 01:11:25,396 --> 01:11:27,616 Well, I just grab this node that's at the bottom 1805 01:11:27,616 --> 01:11:29,606 of this struct, and I say, I am going now point 1806 01:11:29,606 --> 01:11:31,456 at whatever this thing is pointing at. 1807 01:11:31,816 --> 01:11:35,116 And so, I'm now conceptually pointing at the next element. 1808 01:11:35,656 --> 01:11:37,316 Now, this is just a linked list. 1809 01:11:37,436 --> 01:11:40,186 Where can you actually take these things? 1810 01:11:40,186 --> 01:11:41,426 Well, recall this example. 1811 01:11:41,426 --> 01:11:43,556 So this is a photograph taken by one of the TFs 1812 01:11:43,556 --> 01:11:45,666 at Mather House, of some trays. 1813 01:11:45,666 --> 01:11:47,416 We keep referring to stacks of trays. 1814 01:11:47,806 --> 01:11:49,666 So it turns out that besides linked lists, 1815 01:11:49,666 --> 01:11:51,266 there's a data structure called a stack. 1816 01:11:51,496 --> 01:11:53,706 It's coincidence that it's the same thing we've been calling 1817 01:11:53,706 --> 01:11:54,806 this portion of RAM. 1818 01:11:55,476 --> 01:12:03,266 But a stack is what's called a LIFO data structure, L-I-F-O, 1819 01:12:03,566 --> 01:12:07,026 whereby the last element in, L-I, last in, 1820 01:12:07,296 --> 01:12:10,346 is going to be the first element out, F-O. 1821 01:12:10,656 --> 01:12:13,686 And so, we can now, using arrays and using pointers, 1822 01:12:13,686 --> 01:12:15,946 can start to implement structures a little like this. 1823 01:12:16,306 --> 01:12:17,986 There's this thing here taken by someone on-- 1824 01:12:17,986 --> 01:12:20,146 in gagdet.com, of a bunch of geeks lining 1825 01:12:20,146 --> 01:12:22,516 up for the iPhone a couple of years ago, in Manhattan. 1826 01:12:22,876 --> 01:12:26,296 So here, we have-- I say that having an iPhone in my pocket-- 1827 01:12:26,336 --> 01:12:28,916 but here-- but I wasn't in line-- this was-- 1828 01:12:28,916 --> 01:12:30,566 I was not taking the photo. 1829 01:12:30,616 --> 01:12:32,706 So this is what we generally call a queue. 1830 01:12:32,776 --> 01:12:35,036 In the real world, you call this a queue or a line to. 1831 01:12:35,286 --> 01:12:38,996 This is a FIFO data structure, First In First Out. 1832 01:12:39,076 --> 01:12:40,946 So, queues in the real world tend 1833 01:12:40,946 --> 01:12:43,216 to be much more appreciated by humans. 1834 01:12:43,216 --> 01:12:46,196 We don't wanna get screwed over by being the first tray 1835 01:12:46,196 --> 01:12:47,886 at the bottom of the stack, and the last one 1836 01:12:47,886 --> 01:12:49,176 to actually leave the lining hall. 1837 01:12:49,476 --> 01:12:52,596 But with these now, different concepts in the real world, 1838 01:12:52,626 --> 01:12:54,626 can we start to now implement things in code. 1839 01:12:54,626 --> 01:12:56,276 And that's where we will go after quiz one, 1840 01:12:56,276 --> 01:12:57,946 all the more sophisticated data structures, 1841 01:12:57,946 --> 01:13:01,036 no lecture on Monday, we will see you on Wednesday. 1842 01:13:04,516 --> 01:13:06,890 [ Silence ]