1 00:00:00,000 --> 00:00:03,486 [MUSIC PLAYING] 2 00:00:03,486 --> 00:00:50,300 3 00:00:50,300 --> 00:00:54,350 DAVID J. MALAN: All right, this is CS50, and this is lecture four. 4 00:00:54,350 --> 00:00:56,510 So we're here in beautiful Lowell Lecture Hall 5 00:00:56,510 --> 00:00:57,830 as Sanders is in use today. 6 00:00:57,830 --> 00:00:59,788 And we're joined by some friends that will soon 7 00:00:59,788 --> 00:01:02,610 be clear and present in just a moment. 8 00:01:02,610 --> 00:01:06,140 But before then, recall that last time we took a look at CS50 IDE. 9 00:01:06,140 --> 00:01:08,990 This was a new web-based programming environment similar in spirit 10 00:01:08,990 --> 00:01:12,470 to CS50 Sandbox and CS50 Lab, but it added a few features. 11 00:01:12,470 --> 00:01:16,640 For instance, what features did it add to you-- 12 00:01:16,640 --> 00:01:17,620 to your capabilities? 13 00:01:17,620 --> 00:01:18,120 Yeah? 14 00:01:18,120 --> 00:01:19,040 AUDIENCE: Debugger. 15 00:01:19,040 --> 00:01:19,530 DAVID J. MALAN: What's that? 16 00:01:19,530 --> 00:01:20,210 AUDIENCE: The debugger. 17 00:01:20,210 --> 00:01:20,880 DAVID J. MALAN: The debugger. 18 00:01:20,880 --> 00:01:23,120 So debug50, which opens that side panel that 19 00:01:23,120 --> 00:01:26,250 allows you to step through your code, step by step, and see variables. 20 00:01:26,250 --> 00:01:26,750 Yeah? 21 00:01:26,750 --> 00:01:27,670 AUDIENCE: Check50. 22 00:01:27,670 --> 00:01:28,270 DAVID J. MALAN: Sorry, say again? 23 00:01:28,270 --> 00:01:28,860 AUDIENCE: Check50. 24 00:01:28,860 --> 00:01:31,700 DAVID J. MALAN: Check50 as well, which is a CS50-specific tool that 25 00:01:31,700 --> 00:01:33,410 allows you to check the correctness of your code 26 00:01:33,410 --> 00:01:36,410 much like the teaching fellows would when providing feedback on it.
27 00:01:36,410 --> 00:01:38,609 It runs a series of tests that pretty much are 28 00:01:38,609 --> 00:01:40,400 the same tests that a lot of the homeworks 29 00:01:40,400 --> 00:01:42,410 will encourage you yourself to run manually, 30 00:01:42,410 --> 00:01:43,970 but it just automates the process. 31 00:01:43,970 --> 00:01:44,719 And anything else? 32 00:01:44,719 --> 00:01:51,020 AUDIENCE: [INAUDIBLE] 33 00:01:51,020 --> 00:01:52,520 DAVID J. MALAN: So that is true too. 34 00:01:52,520 --> 00:01:55,530 There's a little hidden Easter egg that we don't use this semester, 35 00:01:55,530 --> 00:01:56,400 but yes, indeed. 36 00:01:56,400 --> 00:01:58,650 If you look for a small puzzle piece, you 37 00:01:58,650 --> 00:02:02,430 can actually convert your C code back to Scratch-like puzzle pieces 38 00:02:02,430 --> 00:02:06,257 and back and forth, and back and forth, thanks to Kareem and some of the team. 39 00:02:06,257 --> 00:02:08,340 So that is there, but by now, it's probably better 40 00:02:08,340 --> 00:02:09,929 to get comfortable with text as well. 41 00:02:09,929 --> 00:02:11,970 So there are a couple of other tools that we've 42 00:02:11,970 --> 00:02:15,780 used over time, of course, besides check50 and debug50. 43 00:02:15,780 --> 00:02:20,040 We've of course used printf, and when is printf useful? 44 00:02:20,040 --> 00:02:24,030 Like, when might you want to use it beyond needing to just print something 45 00:02:24,030 --> 00:02:25,710 because the problem set tells you to? 46 00:02:25,710 --> 00:02:25,890 Yeah? 47 00:02:25,890 --> 00:02:27,480 AUDIENCE: To find where your bug is. 48 00:02:27,480 --> 00:02:29,605 DAVID J. MALAN: Yeah, so to find where your bug is. 49 00:02:29,605 --> 00:02:32,982 If you just, kind of, want to print out variables' values or some kind of text 50 00:02:32,982 --> 00:02:35,190 so you know what's going on, and you don't necessarily 51 00:02:35,190 --> 00:02:37,000 want to deploy debug50, you can do that.
52 00:02:37,000 --> 00:02:37,500 When else? 53 00:02:37,500 --> 00:02:41,372 AUDIENCE: If you have a long formula for something [INAUDIBLE] 54 00:02:41,372 --> 00:02:43,266 and you want to see [INAUDIBLE]. 55 00:02:43,266 --> 00:02:44,140 DAVID J. MALAN: Good. 56 00:02:44,140 --> 00:02:44,639 Yeah? 57 00:02:44,639 --> 00:02:48,492 AUDIENCE: How running-- like going through debug50 50 times. 58 00:02:48,492 --> 00:02:49,450 DAVID J. MALAN: Indeed. 59 00:02:49,450 --> 00:02:51,616 Well, in real life-- so you might want to use printf 60 00:02:51,616 --> 00:02:55,190 when you have maybe a nested loop, and you want to put a printf inside the loop 61 00:02:55,190 --> 00:02:57,010 so as to see when it kicks in. 62 00:02:57,010 --> 00:02:59,200 Of course, you could use debug50, but you 63 00:02:59,200 --> 00:03:02,544 might end up running debug50 or clicking next, next, next, next, next, next, 64 00:03:02,544 --> 00:03:04,460 next so many times that it gets a little tedious. 65 00:03:04,460 --> 00:03:08,020 But do keep in mind, you can just put a breakpoint deeper into your code 66 00:03:08,020 --> 00:03:10,832 and perhaps remove an earlier breakpoint as well. 67 00:03:10,832 --> 00:03:13,540 And honestly, whether it's in C or other languages, 68 00:03:13,540 --> 00:03:18,400 I find myself all the time typing out printf in here 69 00:03:18,400 --> 00:03:22,581 just so that I can literally see if my code got to a certain point, 70 00:03:22,581 --> 00:03:23,830 to see if something's printed. 71 00:03:23,830 --> 00:03:25,210 But the debugger, you're going to find, now 72 00:03:25,210 --> 00:03:28,060 and henceforth is so much more powerful, so much more versatile.
73 00:03:28,060 --> 00:03:31,280 So if you haven't already gotten into the habit of using debug50, by all 74 00:03:31,280 --> 00:03:34,720 means start, and use those breakpoints to actually walk through your code 75 00:03:34,720 --> 00:03:36,456 where you care to see what's going on. 76 00:03:36,456 --> 00:03:39,580 So style50, of course, checks the style of your code much like the teaching 77 00:03:39,580 --> 00:03:41,970 fellows might, and it shows you in red or green 78 00:03:41,970 --> 00:03:44,439 what spaces you might want to delete, what spaces you might 79 00:03:44,439 --> 00:03:45,980 want to add just to pretty things up, 80 00:03:45,980 --> 00:03:47,688 so it's more readable for you and others. 81 00:03:47,688 --> 00:03:49,120 And then what about help50? 82 00:03:49,120 --> 00:03:52,540 When should you instinctively reach for help50? 83 00:03:52,540 --> 00:03:55,245 AUDIENCE: When you don't understand an error message. 84 00:03:55,245 --> 00:03:56,245 DAVID J. MALAN: Exactly. 85 00:03:56,245 --> 00:03:58,020 Yeah, when you don't understand an error message. 86 00:03:58,020 --> 00:03:59,050 So you're compiling something. 87 00:03:59,050 --> 00:03:59,890 You're running a command. 88 00:03:59,890 --> 00:04:02,650 It doesn't really quite work, and you're seeing a cryptic error message. 89 00:04:02,650 --> 00:04:05,358 Eventually, you'll get the muscle memory and the sort of exposure 90 00:04:05,358 --> 00:04:07,270 to just know, oh, I remember what that means. 91 00:04:07,270 --> 00:04:10,300 But until then, run help50 at the beginning of that same command, 92 00:04:10,300 --> 00:04:13,210 and it's going to try to detect what your error is 93 00:04:13,210 --> 00:04:17,260 and provide TF-like feedback on how to actually work around that.
94 00:04:17,260 --> 00:04:22,120 You'll see, too, on the course's website a wonderful handout made 95 00:04:22,120 --> 00:04:24,670 by Emily Hong, one of our own teaching fellows, 96 00:04:24,670 --> 00:04:26,920 that introduces all of these tools, and a few more, 97 00:04:26,920 --> 00:04:29,597 and gets you into the habit of thinking about things. 98 00:04:29,597 --> 00:04:30,680 It's kind of a flowchart. 99 00:04:30,680 --> 00:04:32,650 If I have this problem, then do this, or else 100 00:04:32,650 --> 00:04:34,640 if I have this problem, do this other thing. 101 00:04:34,640 --> 00:04:36,440 So do check that out as well. 102 00:04:36,440 --> 00:04:39,854 But today, let's introduce really the last, certainly for C, 103 00:04:39,854 --> 00:04:41,770 of our command-line tools that's going to help 104 00:04:41,770 --> 00:04:44,350 you chase down problems in your code. 105 00:04:44,350 --> 00:04:47,390 Last week, recall that we had talked about memory a lot. 106 00:04:47,390 --> 00:04:49,540 We talked about malloc, allocating memory, 107 00:04:49,540 --> 00:04:51,610 and we talked about freeing memory and the like. 108 00:04:51,610 --> 00:04:53,650 But it turns out, you can do a lot of damage 109 00:04:53,650 --> 00:04:55,220 when you start playing with memory. 110 00:04:55,220 --> 00:04:58,840 In fact, probably by now, almost everyone-- segmentation fault? 111 00:04:58,840 --> 00:04:59,590 [LAUGHTER] 112 00:04:59,590 --> 00:05:02,500 Yeah, so that's just one of the errors that you might run into, 113 00:05:02,500 --> 00:05:06,070 and frankly, you might have code now 114 00:05:06,070 --> 00:05:08,674 and henceforth that has bugs, but you don't even realize it 115 00:05:08,674 --> 00:05:10,090 because you're just getting lucky. 116 00:05:10,090 --> 00:05:13,187 The program is just not crashing or freezing, 117 00:05:13,187 --> 00:05:14,270 but the bug is still there.
118 00:05:14,270 --> 00:05:17,890 And so Valgrind is a command-line program that probably 119 00:05:17,890 --> 00:05:19,700 looks the scariest of the tools we've used, 120 00:05:19,700 --> 00:05:21,610 but you can also use it with help50, and it 121 00:05:21,610 --> 00:05:24,799 tries to find what are called memory leaks, among other memory errors, in your program. 122 00:05:24,799 --> 00:05:26,590 Recall that last week we introduced malloc, 123 00:05:26,590 --> 00:05:28,580 and malloc lets you allocate memory. 124 00:05:28,580 --> 00:05:32,830 But if you don't free that memory, by literally calling the free function, 125 00:05:32,830 --> 00:05:35,990 you're going to constantly ask your operating system, macOS, Linux, 126 00:05:35,990 --> 00:05:37,812 Windows, whatever, can I have more memory? 127 00:05:37,812 --> 00:05:38,770 Can I have more memory? 128 00:05:38,770 --> 00:05:39,728 Can I have more memory? 129 00:05:39,728 --> 00:05:42,760 And if you never, literally, hand it back by calling free, your computer 130 00:05:42,760 --> 00:05:45,570 may very well slow down or freeze or crash. 131 00:05:45,570 --> 00:05:49,000 And frankly, if you've ever had that happen on your Mac or PC, very likely 132 00:05:49,000 --> 00:05:50,920 that's what some human accidentally did. 133 00:05:50,920 --> 00:05:53,110 He or she just allocated more and more memory 134 00:05:53,110 --> 00:05:56,060 but never really got around to freeing that memory. 135 00:05:56,060 --> 00:05:59,950 So Valgrind can help you find those mistakes before you or your users do. 136 00:05:59,950 --> 00:06:04,630 So let's do a quick example. Let me go to CS50 IDE, and let me go ahead 137 00:06:04,630 --> 00:06:07,180 and make one new program here. 138 00:06:07,180 --> 00:06:10,360 We'll call it memory.c because we'll see later today how 139 00:06:10,360 --> 00:06:12,274 I might chase down those memory leaks.
140 00:06:12,274 --> 00:06:15,190 But for now, let's start with something even simpler, which all of you 141 00:06:15,190 --> 00:06:18,569 may have done by now, which is to accidentally touch memory 142 00:06:18,569 --> 00:06:21,860 that you shouldn't, changing it or reading it. Let's see what this might mean. 143 00:06:21,860 --> 00:06:25,330 So let me do the familiar at the top here. 144 00:06:25,330 --> 00:06:28,675 Include standard IO. 145 00:06:28,675 --> 00:06:30,050 Well, let's not even do that yet. 146 00:06:30,050 --> 00:06:31,091 Let's just do this first. 147 00:06:31,091 --> 00:06:34,240 Let's do int main(void), just to start a simple program, 148 00:06:34,240 --> 00:06:38,440 and in here let me go ahead and just call a function called f. 149 00:06:38,440 --> 00:06:40,450 I don't really care what its name is for today. 150 00:06:40,450 --> 00:06:43,780 I just want to call a function f, and then that's it. 151 00:06:43,780 --> 00:06:47,620 Now this function f, let me go ahead and define it as follows: void f(void). 152 00:06:47,620 --> 00:06:50,034 It's not going to do much of anything at all. 153 00:06:50,034 --> 00:06:53,200 But let's suppose, just for the sake of discussion, that f's purpose in life 154 00:06:53,200 --> 00:06:55,780 is just to allocate memory for whatever useful purpose, 155 00:06:55,780 --> 00:06:58,480 but for now it's just for demonstration's sake. 156 00:06:58,480 --> 00:07:01,830 So what's the function with which you can allocate memory? 157 00:07:01,830 --> 00:07:02,710 AUDIENCE: Malloc. 158 00:07:02,710 --> 00:07:03,668 DAVID J. MALAN: Malloc. 159 00:07:03,668 --> 00:07:06,490 So suppose I want to malloc space for, I don't know, 160 00:07:06,490 --> 00:07:08,439 something simple like just one integer. 161 00:07:08,439 --> 00:07:10,480 We're just doing this for demonstration purposes, 162 00:07:10,480 --> 00:07:13,420 or actually let's do more: 10 integers, 10 integers.
163 00:07:13,420 --> 00:07:17,800 I could, of course, do-- well, give me 10, but how many bytes do I want? 164 00:07:17,800 --> 00:07:19,810 How many bytes do I need for 10 integers? 165 00:07:19,810 --> 00:07:20,800 AUDIENCE: sizeof(int). 166 00:07:20,800 --> 00:07:23,540 DAVID J. MALAN: Yeah, so I can do literally sizeof(int), 167 00:07:23,540 --> 00:07:28,072 and most likely the size of an int is going to be? 168 00:07:28,072 --> 00:07:28,729 AUDIENCE: Four. 169 00:07:28,729 --> 00:07:30,020 DAVID J. MALAN: Four, probably. 170 00:07:30,020 --> 00:07:32,750 On many systems today, it's just 4 bytes, or 32 bits, 171 00:07:32,750 --> 00:07:36,200 but you don't want to hard-code that lest someone else's computer not use 172 00:07:36,200 --> 00:07:37,170 those same values. 173 00:07:37,170 --> 00:07:38,280 So the size of an int. 174 00:07:38,280 --> 00:07:39,980 So 10 times the size of an int. 175 00:07:39,980 --> 00:07:42,270 Malloc returns what type of data? 176 00:07:42,270 --> 00:07:45,254 What does that hand me back? 177 00:07:45,254 --> 00:07:46,742 AUDIENCE: [INAUDIBLE] 178 00:07:46,742 --> 00:07:49,000 DAVID J. MALAN: Yeah, it returns an address, or a pointer. 179 00:07:49,000 --> 00:07:53,600 Specifically, the address, 100, 900, whatever, of the chunk of memory 180 00:07:53,600 --> 00:07:55,200 it just allocated for you. 181 00:07:55,200 --> 00:07:58,400 So if I want to keep that around, I need to declare a pointer. 182 00:07:58,400 --> 00:08:01,070 Let's just call it x for today, and it stores that address. 183 00:08:01,070 --> 00:08:04,430 I could call it x, y, z, whatever, but it's not an int that it's returning. 184 00:08:04,430 --> 00:08:05,720 It's the address of an int. 185 00:08:05,720 --> 00:08:07,970 And remember, that's what the star now means in a declaration: 186 00:08:07,970 --> 00:08:10,010 the address of some data type. 187 00:08:10,010 --> 00:08:11,270 It's just a number.
188 00:08:11,270 --> 00:08:14,330 All right, so now if I were to-- 189 00:08:14,330 --> 00:08:15,680 first, let's clean this up. 190 00:08:15,680 --> 00:08:20,090 Turns out that to use malloc, I need to include stdlib.h. 191 00:08:20,090 --> 00:08:22,670 We saw that last week, albeit briefly, and then of course 192 00:08:22,670 --> 00:08:26,390 if I'm going to call f, what do I have to do to fix this code? 193 00:08:26,390 --> 00:08:27,830 AUDIENCE: You need to declare. 194 00:08:27,830 --> 00:08:29,954 DAVID J. MALAN: Yeah, I need to declare it up here, 195 00:08:29,954 --> 00:08:32,149 or I could just move f's implementation up top. 196 00:08:32,149 --> 00:08:34,690 So I think this works, even though this program at the moment 197 00:08:34,690 --> 00:08:35,580 is completely stupid. 198 00:08:35,580 --> 00:08:38,510 It doesn't do anything useful, but it will allocate memory. 199 00:08:38,510 --> 00:08:41,010 And I'll do something with it as follows. 200 00:08:41,010 --> 00:08:45,359 If I want to change the first value in this chunk of memory, 201 00:08:45,359 --> 00:08:46,400 well, how might I do that? 202 00:08:46,400 --> 00:08:51,020 Well, I've asked the computer for 10 integers, or rather space 203 00:08:51,020 --> 00:08:52,367 for 10 integers. 204 00:08:52,367 --> 00:08:54,200 What's interesting about malloc is that when 205 00:08:54,200 --> 00:08:58,566 it returns a chunk of memory for you, it's contiguous, back-to-back. 206 00:08:58,566 --> 00:09:00,440 And when you hear contiguous or back-to-back, 207 00:09:00,440 --> 00:09:03,710 what kind of data structure does that recall to mind? 208 00:09:03,710 --> 00:09:04,569 AUDIENCE: An array. 209 00:09:04,569 --> 00:09:05,610 DAVID J. MALAN: An array. 210 00:09:05,610 --> 00:09:09,290 So it turns out we can treat this seemingly random chunk of memory 211 00:09:09,290 --> 00:09:10,440 like it's an array.
212 00:09:10,440 --> 00:09:14,300 So if we want to go to the first location in that array of memory, 213 00:09:14,300 --> 00:09:18,200 I can just do this and put in the number, say, 50. 214 00:09:18,200 --> 00:09:21,680 Or if I want to go to the next location, I can do this. 215 00:09:21,680 --> 00:09:24,410 Or if I want to go to the next location, I can do this. 216 00:09:24,410 --> 00:09:27,200 Or if I want to go to the last location, I might do this, 217 00:09:27,200 --> 00:09:32,064 but is that good or bad? 218 00:09:32,064 --> 00:09:32,950 AUDIENCE: Bad. 219 00:09:32,950 --> 00:09:33,952 DAVID J. MALAN: Why bad? 220 00:09:33,952 --> 00:09:35,560 AUDIENCE: It's-- it's out of bounds. 221 00:09:35,560 --> 00:09:36,860 DAVID J. MALAN: Yeah, so it's out of bounds. 222 00:09:36,860 --> 00:09:37,360 Right? 223 00:09:37,360 --> 00:09:39,950 This is sort of a week-one-style mistake when it came to loops. 224 00:09:39,950 --> 00:09:42,360 Recall, with for loops or while loops, you might go a little too far, 225 00:09:42,360 --> 00:09:43,260 and that's fine. 226 00:09:43,260 --> 00:09:45,500 But now we actually will see we have a tool that 227 00:09:45,500 --> 00:09:47,040 can help us notice these things. 228 00:09:47,040 --> 00:09:50,780 So hopefully, just visually, it's apparent that what I have going on here 229 00:09:50,780 --> 00:09:54,800 is just-- on line 12, I have a variable x 230 00:09:54,800 --> 00:09:56,870 that's storing the address of that chunk of memory. 231 00:09:56,870 --> 00:10:00,890 And then on line 13, I'm just trying to access location 10 232 00:10:00,890 --> 00:10:02,180 and set the value 50 there. 233 00:10:02,180 --> 00:10:04,460 But as you noted, there is no location 10. 234 00:10:04,460 --> 00:10:08,750 There's location 0, 1, 2, 3, all the way through 9, of course. 235 00:10:08,750 --> 00:10:10,800 So how might we detect this with a program?
236 00:10:10,800 --> 00:10:13,466 Well, let me go ahead and increase my terminal window just a bit 237 00:10:13,466 --> 00:10:17,560 here, save my file, and let me go ahead and compile with make memory. 238 00:10:17,560 --> 00:10:18,650 OK, all is well. 239 00:10:18,650 --> 00:10:20,690 It compiled without any error messages, and now 240 00:10:20,690 --> 00:10:24,237 let me go ahead and run ./memory, enter. 241 00:10:24,237 --> 00:10:25,820 All right, so that worked pretty well. 242 00:10:25,820 --> 00:10:29,014 Let's actually be a little more explicit here just for good measure. 243 00:10:29,014 --> 00:10:30,680 Let me go ahead and print something out. 244 00:10:30,680 --> 00:10:36,290 So printf, %i for an integer, and let's make it just more explicit: 245 00:10:36,290 --> 00:10:42,140 "You inputted %i", and then comma, x bracket 10. 246 00:10:42,140 --> 00:10:46,405 And what do I have to include to use printf? 247 00:10:46,405 --> 00:10:47,320 AUDIENCE: stdio.h. 248 00:10:47,320 --> 00:10:48,611 DAVID J. MALAN: Yeah, so stdio. 249 00:10:48,611 --> 00:10:51,940 So let's just quickly add that, stdio.h, save. 250 00:10:51,940 --> 00:10:55,000 All right, let me recompile this, make memory, enter. 251 00:10:55,000 --> 00:10:59,410 And now let me go ahead and do ./memory. 252 00:10:59,410 --> 00:11:00,610 Huh? 253 00:11:00,610 --> 00:11:02,469 Feels like it's a correct program. 254 00:11:02,469 --> 00:11:05,260 And yet, for a couple of weeks now we've been claiming that mm-hmm, 255 00:11:05,260 --> 00:11:06,490 don't do that. 256 00:11:06,490 --> 00:11:09,100 Don't go beyond the boundaries of your array. 257 00:11:09,100 --> 00:11:10,352 So how do we reconcile this? 258 00:11:10,352 --> 00:11:13,060 It feels like buggy code, or at least we've told you it's buggy code, 259 00:11:13,060 --> 00:11:13,935 and yet it's working. 260 00:11:13,935 --> 00:11:16,301 261 00:11:16,301 --> 00:11:16,800 Yeah?
262 00:11:16,800 --> 00:11:19,273 AUDIENCE: [INAUDIBLE] 263 00:11:19,273 --> 00:11:21,272 DAVID J. MALAN: That's a good way of putting it. 264 00:11:21,272 --> 00:11:23,076 AUDIENCE: It's still very similar. 265 00:11:23,076 --> 00:11:23,980 We want that. 266 00:11:23,980 --> 00:11:24,771 DAVID J. MALAN: OK. 267 00:11:24,771 --> 00:11:27,120 AUDIENCE: So we can theoretically-- 268 00:11:27,120 --> 00:11:29,902 it just created a program. 269 00:11:29,902 --> 00:11:32,360 DAVID J. MALAN: Yeah, and I think if I heard you correctly, 270 00:11:32,360 --> 00:11:35,066 you said C doesn't scream if you go too far? 271 00:11:35,066 --> 00:11:35,690 AUDIENCE: Yeah. 272 00:11:35,690 --> 00:11:36,070 DAVID J. MALAN: Yeah, OK. 273 00:11:36,070 --> 00:11:37,528 So that's a good way of putting it. 274 00:11:37,528 --> 00:11:39,640 Like, you can get lucky in C. And you can 275 00:11:39,640 --> 00:11:43,707 do something that is objectively, pedagogically, like technically wrong, 276 00:11:43,707 --> 00:11:45,290 but the computer's not going to crash. 277 00:11:45,290 --> 00:11:46,790 It's not going to freeze, because you just get lucky. 278 00:11:46,790 --> 00:11:48,970 That's because often, for performance reasons, when 279 00:11:48,970 --> 00:11:51,220 you allocate space for 10 integers, you're 280 00:11:51,220 --> 00:11:53,110 actually going to get a chunk of memory back 281 00:11:53,110 --> 00:11:54,651 that's a little bigger than you need. 282 00:11:54,651 --> 00:11:58,172 It's just not safe to assume that it's bigger than you need, 283 00:11:58,172 --> 00:11:59,380 but you might just get lucky. 284 00:11:59,380 --> 00:12:02,421 And you might end up having more memory that you can technically get away 285 00:12:02,421 --> 00:12:05,780 with touching or accessing or changing, and the computer's not going to notice.
286 00:12:05,780 --> 00:12:08,114 But that's not safe, because on someone else's Mac or PC, 287 00:12:08,114 --> 00:12:11,238 their computer might just be operating a little bit differently than yours, 288 00:12:11,238 --> 00:12:13,780 and bam, that bug is going to bite them and not you. 289 00:12:13,780 --> 00:12:16,990 And those are the hardest, most annoying bugs to chase down, as some of you 290 00:12:16,990 --> 00:12:17,800 might have experienced. 291 00:12:17,800 --> 00:12:18,299 Right? 292 00:12:18,299 --> 00:12:21,200 It works on your computer but not a friend's, or vice versa. 293 00:12:21,200 --> 00:12:23,320 These are the kinds of explanations for that. 294 00:12:23,320 --> 00:12:26,890 So Valgrind can help us track down even these most subtle errors. 295 00:12:26,890 --> 00:12:28,670 The program seems to be working. 296 00:12:28,670 --> 00:12:30,426 Check50 or tools like it might even assume 297 00:12:30,426 --> 00:12:32,800 that it's working because it is printing the right thing, 298 00:12:32,800 --> 00:12:36,010 but let's take a look at what this program, Valgrind, thinks. 299 00:12:36,010 --> 00:12:38,740 Let me increase the size of the terminal window here, 300 00:12:38,740 --> 00:12:41,980 and go ahead and type in valgrind ./memory. 301 00:12:41,980 --> 00:12:47,240 So it's the same program name, ./memory, but I'm prefixing it with valgrind. 302 00:12:47,240 --> 00:12:47,740 All right? 303 00:12:47,740 --> 00:12:50,020 Unfortunately, Valgrind is really quite ugly, 304 00:12:50,020 --> 00:12:52,490 and it prints out a whole bunch of stuff here. 305 00:12:52,490 --> 00:12:53,500 So let's take a look. 306 00:12:53,500 --> 00:12:56,330 At the very top, you'll see all these numbers on the left, 307 00:12:56,330 --> 00:12:58,150 and that's just an unfortunate aesthetic. 308 00:12:58,150 --> 00:13:00,450 But we do see some useful information.
309 00:13:00,450 --> 00:13:03,580 Invalid read of size 4, and then it has these cryptic- 310 00:13:03,580 --> 00:13:05,080 looking letters and numbers. 311 00:13:05,080 --> 00:13:07,990 What are those? 312 00:13:07,990 --> 00:13:09,634 They're just addresses in hexadecimal. 313 00:13:09,634 --> 00:13:11,800 It doesn't really matter what they are, but Valgrind 314 00:13:11,800 --> 00:13:15,670 can tell us where the memory is that's acting up suspiciously. 315 00:13:15,670 --> 00:13:18,370 You can then see next to that, that Valgrind is pointing 316 00:13:18,370 --> 00:13:21,760 to function f in memory.c, line 15. 317 00:13:21,760 --> 00:13:24,327 So that's perhaps helpful, and then main on line 8, 318 00:13:24,327 --> 00:13:26,160 because that's where f was called. 319 00:13:26,160 --> 00:13:29,590 So Valgrind is actually kind of nice in that it's showing us all the functions 320 00:13:29,590 --> 00:13:33,160 that you called from the bottom up, much like the stack from last week. 321 00:13:33,160 --> 00:13:37,420 And so something's going wrong on line 15, and if we go back to that, 322 00:13:37,420 --> 00:13:39,750 let's see, line 15 was-- 323 00:13:39,750 --> 00:13:41,240 well, sure enough. 324 00:13:41,240 --> 00:13:43,660 I'm actually trying to access that memory location, 325 00:13:43,660 --> 00:13:46,040 and frankly I did it on line 14 as well. 326 00:13:46,040 --> 00:13:49,540 So hopefully fixing one or both of those will address this issue. 327 00:13:49,540 --> 00:13:54,460 And notice here, this frankly just gets overwhelming pretty quickly. 328 00:13:54,460 --> 00:13:58,384 And then, oh, 40 bytes in one block are definitely lost in loss record. 329 00:13:58,384 --> 00:14:00,550 I mean, this is the problem with Valgrind, honestly. 330 00:14:00,550 --> 00:14:03,820 It was written some years ago and is not particularly user-friendly, 331 00:14:03,820 --> 00:14:05,980 but that's fine. We have a tool to address this.
332 00:14:05,980 --> 00:14:09,610 Let me go ahead and rerun Valgrind with help50, 333 00:14:09,610 --> 00:14:13,150 enter, and see if we can't just assist with this. 334 00:14:13,150 --> 00:14:16,990 All right, so still the same amount of black-and-white output, but down here now 335 00:14:16,990 --> 00:14:21,700 help50 is noticing, oh, I can help you with an invalid write of size 4. 336 00:14:21,700 --> 00:14:23,880 So it's still at the same location, but this time-- 337 00:14:23,880 --> 00:14:26,950 or rather the same file, memory.c, but line 14. 338 00:14:26,950 --> 00:14:30,550 And we propose, looks like you're trying to modify 4 bytes of memory that 339 00:14:30,550 --> 00:14:32,029 isn't yours, question mark. 340 00:14:32,029 --> 00:14:34,570 Did you try to store something beyond the bounds of an array? 341 00:14:34,570 --> 00:14:37,590 Take a closer look at line 14 of memory.c. 342 00:14:37,590 --> 00:14:40,930 So hopefully, even though Valgrind's output is crazy esoteric, 343 00:14:40,930 --> 00:14:43,870 at least that yellow output will point you toward, ah, line 14. 344 00:14:43,870 --> 00:14:48,282 I'm indeed touching 4 bytes, an integer, that I shouldn't be. 345 00:14:48,282 --> 00:14:49,740 And so let's go ahead and fix this. 346 00:14:49,740 --> 00:14:53,050 If I go into my program, I won't do this. 347 00:14:53,050 --> 00:14:57,250 Let's change it to location 9, and location 9 here, and save. 348 00:14:57,250 --> 00:15:02,680 Then let me go ahead and rerun Valgrind without help50. 349 00:15:02,680 --> 00:15:05,270 All right, progress except-- 350 00:15:05,270 --> 00:15:05,770 oops. 351 00:15:05,770 --> 00:15:07,160 Nope, no progress. 352 00:15:07,160 --> 00:15:08,590 I skipped a step. 353 00:15:08,590 --> 00:15:10,840 Yeah, I didn't recompile it. 354 00:15:10,840 --> 00:15:12,590 I was a little puzzled why I saw the same thing. 355 00:15:12,590 --> 00:15:18,730 So now let's rerun Valgrind, and here it seems to be better.
356 00:15:18,730 --> 00:15:20,830 So I don't see that same error message up 357 00:15:20,830 --> 00:15:25,810 at the very top like we did before, but notice here, 40 bytes in one blocks. 358 00:15:25,810 --> 00:15:29,140 OK, that was bad grammar in the program, but are definitely 359 00:15:29,140 --> 00:15:30,545 lost in loss record 1 of 1. 360 00:15:30,545 --> 00:15:32,170 So I still don't quite understand that. 361 00:15:32,170 --> 00:15:33,100 No big deal. 362 00:15:33,100 --> 00:15:36,580 Let's go ahead and run help50 and see what the second of two errors 363 00:15:36,580 --> 00:15:38,020 apparently is here. 364 00:15:38,020 --> 00:15:40,590 So here it's highlighting those lines. 365 00:15:40,590 --> 00:15:43,960 40 bytes in one blocks are definitely lost. Looks like your program 366 00:15:43,960 --> 00:15:45,550 leaked 40 bytes of memory. 367 00:15:45,550 --> 00:15:48,250 Did you forget to free memory that you allocated with malloc? 368 00:15:48,250 --> 00:15:51,580 Take a closer look at line 13 of memory.c. 369 00:15:51,580 --> 00:15:54,970 So in this case, line 13 indeed has a call to malloc. 370 00:15:54,970 --> 00:15:57,916 So what's the fix for this problem? 371 00:15:57,916 --> 00:15:58,659 AUDIENCE: Free. 372 00:15:58,659 --> 00:16:00,700 DAVID J. MALAN: Per help50 or your own intuition? 373 00:16:00,700 --> 00:16:02,715 What do I have to add to this program? 374 00:16:02,715 --> 00:16:03,340 AUDIENCE: Free. 375 00:16:03,340 --> 00:16:03,570 AUDIENCE: Free. 376 00:16:03,570 --> 00:16:05,028 DAVID J. MALAN: Yeah, free, and where does that go? 377 00:16:05,028 --> 00:16:07,640 378 00:16:07,640 --> 00:16:08,420 Right here. 379 00:16:08,420 --> 00:16:11,690 So we can free the memory. 380 00:16:11,690 --> 00:16:13,186 Why would this be bad? 381 00:16:13,186 --> 00:16:17,040 AUDIENCE: [INAUDIBLE] 382 00:16:17,040 --> 00:16:18,040 DAVID J. MALAN: Exactly.
383 00:16:18,040 --> 00:16:20,260 We're freeing the memory, which is like saying to the operating system, 384 00:16:20,260 --> 00:16:21,520 I don't need this anymore. 385 00:16:21,520 --> 00:16:24,190 And yet, two lines later, we're using it again and again. 386 00:16:24,190 --> 00:16:24,940 So bad. 387 00:16:24,940 --> 00:16:27,342 We didn't make that mistake last week, but you should only 388 00:16:27,342 --> 00:16:29,050 be freeing memory when, literally, you're 389 00:16:29,050 --> 00:16:31,870 ready to free it up and give it back, which should probably 390 00:16:31,870 --> 00:16:33,170 be at the end of the program. 391 00:16:33,170 --> 00:16:36,580 So let me go ahead and re-save this, open up my terminal window, 392 00:16:36,580 --> 00:16:40,690 recompile it this time, and now, let me run Valgrind one last time 393 00:16:40,690 --> 00:16:42,190 without help50. 394 00:16:42,190 --> 00:16:47,070 It's still a little verbose, but zero errors from zero contexts. 395 00:16:47,070 --> 00:16:48,070 That sounds pretty good. 396 00:16:48,070 --> 00:16:52,599 And moreover, it also explicitly says, "All heap blocks were freed." 397 00:16:52,599 --> 00:16:54,640 And recall that the heap is that chunk of memory 398 00:16:54,640 --> 00:16:58,300 that we drew visually up here, which is where malloc takes memory from. 399 00:16:58,300 --> 00:16:59,590 So, done. 400 00:16:59,590 --> 00:17:02,620 So this is kind of the mentality 401 00:17:02,620 --> 00:17:05,788 to have when approaching the correctness of your code. 402 00:17:05,788 --> 00:17:08,829 Like, it's one thing to run sample inputs, or run the program like I did. 403 00:17:08,829 --> 00:17:09,524 All looked well. 404 00:17:09,524 --> 00:17:12,190 It's one thing to run tools like check50, which we humans wrote. 405 00:17:12,190 --> 00:17:15,148 But we too are fallible, certainly, and we might not think of everything.
406 00:17:15,148 --> 00:17:18,099 And thankfully, smart humans have made tools that, at first glance, 407 00:17:18,099 --> 00:17:19,349 might be a little hard to use, 408 00:17:19,349 --> 00:17:21,849 like debug50, as is Valgrind now. 409 00:17:21,849 --> 00:17:24,910 But they ultimately help you get your code 100% correct 410 00:17:24,910 --> 00:17:28,240 without you having to struggle, just staring at the screen. 411 00:17:28,240 --> 00:17:30,394 And we see this a lot in office hours, honestly. 412 00:17:30,394 --> 00:17:33,310 A lot of students, to their credit, are sort of reasoning through it, staring 413 00:17:33,310 --> 00:17:35,809 at the screen, just trying to understand what's going wrong, 414 00:17:35,809 --> 00:17:38,950 but they're not taking any additional input other than the characters 415 00:17:38,950 --> 00:17:39,550 on the screen. 416 00:17:39,550 --> 00:17:43,360 You have so many tools that can feed you more and more hints along the way. 417 00:17:43,360 --> 00:17:46,510 So do acquire those instincts. 418 00:17:46,510 --> 00:17:48,340 Any questions on this? 419 00:17:48,340 --> 00:17:48,925 Yeah? 420 00:17:48,925 --> 00:17:53,755 AUDIENCE: So if you had a main function that took arguments, 421 00:17:53,755 --> 00:17:57,254 would you run Valgrind with those arguments as well? 422 00:17:57,254 --> 00:17:58,420 DAVID J. MALAN: Yes, indeed. 423 00:17:58,420 --> 00:18:02,080 So Valgrind works just like debug50, just like help50. 424 00:18:02,080 --> 00:18:05,050 If you have command-line arguments, just run them as usual, 425 00:18:05,050 --> 00:18:08,920 but prefix your command with valgrind, or maybe even help50 valgrind, 426 00:18:08,920 --> 00:18:10,090 to use one with the other. 427 00:18:10,090 --> 00:18:10,870 Good question. 428 00:18:10,870 --> 00:18:11,990 Other thoughts? 429 00:18:11,990 --> 00:18:12,490 Yeah? 430 00:18:12,490 --> 00:18:14,406 AUDIENCE: Where does the data go [INAUDIBLE]?
431 00:18:14,406 --> 00:18:18,580 432 00:18:18,580 --> 00:18:19,830 DAVID J. MALAN: Good question. 433 00:18:19,830 --> 00:18:21,830 So at the end of the day, think about what's 434 00:18:21,830 --> 00:18:24,470 inside the computer, which is just something like this. 435 00:18:24,470 --> 00:18:26,510 So physically, it's obviously still there. 436 00:18:26,510 --> 00:18:29,420 It's just being treated by the operating system-- 437 00:18:29,420 --> 00:18:32,840 macOS, Windows, Linux, whatever-- as a pool of memory. 438 00:18:32,840 --> 00:18:36,420 We keep drawing it as a grid that looks a little something like this. 439 00:18:36,420 --> 00:18:40,370 So the operating system's job is just to keep track of which of those squares 440 00:18:40,370 --> 00:18:42,329 are in use, thanks to malloc, 441 00:18:42,329 --> 00:18:43,370 and which have been freed. 442 00:18:43,370 --> 00:18:44,820 And so you can think of it as having little check 443 00:18:44,820 --> 00:18:47,540 marks next to them saying, this is in use, this is in use, 444 00:18:47,540 --> 00:18:48,930 these others are not in use. 445 00:18:48,930 --> 00:18:53,047 So freed squares just go back on the so-called free list, into that pool of memory. 446 00:18:53,047 --> 00:18:53,630 Good question. 447 00:18:53,630 --> 00:18:56,255 If you take a higher-level course on operating systems, in fact, 448 00:18:56,255 --> 00:19:00,630 like CS61 or CS161 at Harvard, you'll actually build these kinds of things 449 00:19:00,630 --> 00:19:01,130 yourself, 450 00:19:01,130 --> 00:19:03,411 and implement tools like malloc yourself. 451 00:19:03,411 --> 00:19:03,910 Yeah? 452 00:19:03,910 --> 00:19:07,160 AUDIENCE: So why did we have to allocate memory in this case, and what happens 453 00:19:07,160 --> 00:19:07,661 [INAUDIBLE]? 454 00:19:07,661 --> 00:19:08,910 DAVID J. MALAN: Good question. 455 00:19:08,910 --> 00:19:10,980 Why did we have to allocate memory in this case? 456 00:19:10,980 --> 00:19:11,840 We did not.
457 00:19:11,840 --> 00:19:14,810 This was purely, as mentioned, for demonstration purposes. 458 00:19:14,810 --> 00:19:16,910 If we had some program in which we wanted 459 00:19:16,910 --> 00:19:20,860 to allocate some amount of memory, then this is how we might do it. 460 00:19:20,860 --> 00:19:24,320 However, a cleaner way to do all of this 461 00:19:24,320 --> 00:19:29,604 would have been to say, hey, computer, give me 10 integers like this, 462 00:19:29,604 --> 00:19:31,520 and not have to worry about memory management. 463 00:19:31,520 --> 00:19:35,330 And that's where we began in week one, just using arrays on the stack, 464 00:19:35,330 --> 00:19:36,320 so to speak, 465 00:19:36,320 --> 00:19:37,560 not using malloc at all. 466 00:19:37,560 --> 00:19:40,670 So the point is only that once you start using malloc and free, 467 00:19:40,670 --> 00:19:43,760 and memory more generally, you take on more responsibilities 468 00:19:43,760 --> 00:19:46,870 than we did in week one. 469 00:19:46,870 --> 00:19:47,560 Good question. 470 00:19:47,560 --> 00:19:49,300 Any others? 471 00:19:49,300 --> 00:19:49,810 All right. 472 00:19:49,810 --> 00:19:53,090 So it turns out there's one more tool, in all seriousness. 473 00:19:53,090 --> 00:19:55,090 This is the thing. 474 00:19:55,090 --> 00:20:01,420 [? DDB50. ?] So debug50 is an allusion to a very popular tool called GDB, 475 00:20:01,420 --> 00:20:02,720 the GNU debugger. 476 00:20:02,720 --> 00:20:05,230 It's an older tool that you won't use at the command line, 477 00:20:05,230 --> 00:20:07,510 but it's what makes debug50 work. 478 00:20:07,510 --> 00:20:08,720 Turns out, there's a thing. 479 00:20:08,720 --> 00:20:10,720 And there's an actual Wikipedia article that you 480 00:20:10,720 --> 00:20:14,320 might have clicked on in my email last night, called rubber duck debugging.
481 00:20:14,320 --> 00:20:18,140 And frankly, you don't have to go as all-out, as excessive, as we did here, 482 00:20:18,140 --> 00:20:20,800 but the purpose of this technique, of rubber duck debugging, 483 00:20:20,800 --> 00:20:24,275 is to keep, literally, like a rubber duck on your shelf, or on your desk. 484 00:20:24,275 --> 00:20:27,400 And when you have a bug and you don't have the luxury of a teaching fellow, 485 00:20:27,400 --> 00:20:31,210 or a roommate who took CS50, or a more technical friend who can help walk you 486 00:20:31,210 --> 00:20:34,450 through your code, literally start walking through your code 487 00:20:34,450 --> 00:20:39,410 verbally, talking to the duck, saying, well, on line 2, I'm declaring main, 488 00:20:39,410 --> 00:20:42,310 and on line 3, I'm allocating space for an array. 489 00:20:42,310 --> 00:20:44,920 And then, on line 4, I'm calling-- ah! 490 00:20:44,920 --> 00:20:46,254 That's what I'm doing wrong. 491 00:20:46,254 --> 00:20:49,420 So maybe some of you have had that kind of moment, whether in office hours 492 00:20:49,420 --> 00:20:51,704 or alone, where you're either talking in your head, 493 00:20:51,704 --> 00:20:53,870 or you're talking through your code to someone else. 494 00:20:53,870 --> 00:20:55,661 And here, she doesn't even have to respond. 495 00:20:55,661 --> 00:21:01,150 You just hear yourself saying the wrong thing, or you have that aha moment. 496 00:21:01,150 --> 00:21:05,110 You can approximate that by just keeping one of these little guys on your desk 497 00:21:05,110 --> 00:21:06,310 and having that conversation. 498 00:21:06,310 --> 00:21:09,310 And it's actually not as crazy as it sounds. 499 00:21:09,310 --> 00:21:12,040 It's that process of just talking through your code logically, 500 00:21:12,040 --> 00:21:15,430 step by step, in a way that you can't necessarily do in your own mind. 501 00:21:15,430 --> 00:21:16,570 At least I can't.
502 00:21:16,570 --> 00:21:18,370 When you hear yourself say something wrong, 503 00:21:18,370 --> 00:21:20,920 or something that didn't quite follow logically, bam, you 504 00:21:20,920 --> 00:21:22,689 can actually have that aha moment. 505 00:21:22,689 --> 00:21:25,480 So on the way out today, by all means, take any one of these ducks. 506 00:21:25,480 --> 00:21:28,210 That took quite a long time for [? Colten ?] to lay out today. 507 00:21:28,210 --> 00:21:31,810 And we'll have more at office hours in the weeks to come, if you would like. 508 00:21:31,810 --> 00:21:35,680 So some of you might recall such a duck from [? Currier ?] House 509 00:21:35,680 --> 00:21:38,840 last year too, which was a cousin of his as well. 510 00:21:38,840 --> 00:21:39,340 All right. 511 00:21:39,340 --> 00:21:41,372 So that is rubber duck debugging. 512 00:21:41,372 --> 00:21:44,080 Now, last week, recall that we began to take off the training wheels. 513 00:21:44,080 --> 00:21:46,132 We'd used, for a few weeks, the CS50 library. 514 00:21:46,132 --> 00:21:47,590 And that's kind of in the past now. 515 00:21:47,590 --> 00:21:50,050 That was just a technique, a tool, via which 516 00:21:50,050 --> 00:21:53,200 we could get user input a little more pleasantly than if we had actually 517 00:21:53,200 --> 00:21:55,000 started dealing with memory early on. 518 00:21:55,000 --> 00:21:58,180 And we revealed last week that a "string", quote, unquote, 519 00:21:58,180 --> 00:22:00,410 is just what, underneath the hood in C? 520 00:22:00,410 --> 00:22:02,930 521 00:22:02,930 --> 00:22:04,390 Say again. 522 00:22:04,390 --> 00:22:05,500 An array of characters. 523 00:22:05,500 --> 00:22:10,780 And even more specifically, it's a synonym, S-T-R-I-N-G, for what actual 524 00:22:10,780 --> 00:22:12,530 data type? 525 00:22:12,530 --> 00:22:14,170 char star, as we've called it.
526 00:22:14,170 --> 00:22:16,960 So a char star is just the computer scientist's 527 00:22:16,960 --> 00:22:19,420 way of describing a pointer to a character, 528 00:22:19,420 --> 00:22:21,790 or rather the address of a character, which, 529 00:22:21,790 --> 00:22:26,290 for a string, is the start of an array, or sequence, of characters in memory. 530 00:22:26,290 --> 00:22:29,720 But it's kind of the more precise, more technical way of describing it. 531 00:22:29,720 --> 00:22:33,460 And so, now that we know that we have char stars underneath the hood, well, 532 00:22:33,460 --> 00:22:34,844 where is all of that coming from? 533 00:22:34,844 --> 00:22:36,760 Well, indeed, it maps directly to that memory. 534 00:22:36,760 --> 00:22:40,090 We keep pointing out that something like this is inside of your computer. 535 00:22:40,090 --> 00:22:43,540 And we can think of the memory as just being chunks of memory, 536 00:22:43,540 --> 00:22:45,640 all of whose bytes are numbered, 537 00:22:45,640 --> 00:22:49,600 from 0 on up to roughly 2 billion if you have 2 gigabytes, whatever the value might be. 538 00:22:49,600 --> 00:22:52,780 But of course, last week, we pointed out that you can think about this memory 539 00:22:52,780 --> 00:22:56,709 not as being hardware per se, but as just being this pool of memory that's 540 00:22:56,709 --> 00:22:58,000 divided into different regions. 541 00:22:58,000 --> 00:23:00,770 The very top of your computer's memory, so to speak, 542 00:23:00,770 --> 00:23:02,450 is what we call the text segment. 543 00:23:02,450 --> 00:23:05,560 And what goes in the text segment of your computer's memory 544 00:23:05,560 --> 00:23:08,160 when you're running a program? 545 00:23:08,160 --> 00:23:12,930 Text is, like, a poor choice of words, frankly, but what is it? 546 00:23:12,930 --> 00:23:13,556 Say again. 547 00:23:13,556 --> 00:23:14,850 AUDIENCE: File headers? 548 00:23:14,850 --> 00:23:16,990 DAVID J. MALAN: Not the file headers, in this case.
549 00:23:16,990 --> 00:23:19,850 This is in the context of running a program, not necessarily saving a file. 550 00:23:19,850 --> 00:23:20,240 Yeah? 551 00:23:20,240 --> 00:23:21,410 AUDIENCE: String literals. 552 00:23:21,410 --> 00:23:23,210 DAVID J. MALAN: Not string literals here, 553 00:23:23,210 --> 00:23:25,300 but they're nearby, actually, in memory. 554 00:23:25,300 --> 00:23:26,180 AUDIENCE: Functions. 555 00:23:26,180 --> 00:23:27,800 DAVID J. MALAN: Functions, closer. 556 00:23:27,800 --> 00:23:28,520 Yeah. 557 00:23:28,520 --> 00:23:31,400 The text segment of your computer's memory 558 00:23:31,400 --> 00:23:33,950 is where, when you double-click a program to run it, 559 00:23:33,950 --> 00:23:37,700 or, in Linux, when you do dot slash something to run it, 560 00:23:37,700 --> 00:23:41,360 the zeros and ones of your actual program, the machine code 561 00:23:41,360 --> 00:23:44,660 that we talked about in week zero, are loaded into RAM. 562 00:23:44,660 --> 00:23:48,150 So recall from last week that, you know, anything physical in this world-- 563 00:23:48,150 --> 00:23:51,170 hard drives, solid-state drives-- is slow. 564 00:23:51,170 --> 00:23:55,100 So those devices are slow, but RAM, the stuff we keep pulling up on the screen, 565 00:23:55,100 --> 00:23:56,090 is relatively fast, 566 00:23:56,090 --> 00:23:57,770 if only because it has no moving parts. 567 00:23:57,770 --> 00:23:58,862 It's purely electronic. 568 00:23:58,862 --> 00:24:01,070 So when you double-click a program on your Mac or PC, 569 00:24:01,070 --> 00:24:03,290 or do dot slash something in Linux, that is 570 00:24:03,290 --> 00:24:05,930 loading it from a slow device, your hard drive, 571 00:24:05,930 --> 00:24:09,710 where the data is stored long term, into RAM, or memory, 572 00:24:09,710 --> 00:24:14,070 where it can run much more quickly and pleasurably in terms of performance. 573 00:24:14,070 --> 00:24:16,710 And so, what does this actually mean for us?
574 00:24:16,710 --> 00:24:18,050 Well, it's got to go somewhere. 575 00:24:18,050 --> 00:24:20,180 We humans just decided, years ago, that it's 576 00:24:20,180 --> 00:24:22,760 going to go at the top, so to speak, of this chunk of memory. 577 00:24:22,760 --> 00:24:25,910 Below that, though, are the more dynamic regions of memory-- 578 00:24:25,910 --> 00:24:27,530 the stack and the heap. 579 00:24:27,530 --> 00:24:31,040 And we said this a moment ago, and last week as well, but what goes on the heap? 580 00:24:31,040 --> 00:24:33,445 Or who uses the heap? 581 00:24:33,445 --> 00:24:34,720 AUDIENCE: Dynamic memory. 582 00:24:34,720 --> 00:24:36,011 DAVID J. MALAN: Dynamic memory. 583 00:24:36,011 --> 00:24:38,740 Any time you call malloc, you're asking the operating system 584 00:24:38,740 --> 00:24:40,330 for memory from the so-called heap. 585 00:24:40,330 --> 00:24:43,596 Any time you call free, you're sort of conceptually putting it back. 586 00:24:43,596 --> 00:24:45,220 Like, it's not actually going anywhere. 587 00:24:45,220 --> 00:24:49,720 You're just marking it as available for other functions and variables to use. 588 00:24:49,720 --> 00:24:53,127 The stack, meanwhile, is used for what? 589 00:24:53,127 --> 00:24:54,210 AUDIENCE: Local variables. 590 00:24:54,210 --> 00:24:56,760 DAVID J. MALAN: Local variables, and any of your function calls. 591 00:24:56,760 --> 00:24:59,820 So main typically takes a sliver of memory at the bottom. 592 00:24:59,820 --> 00:25:03,240 If main calls another function, that function gets a sliver of memory above that. 593 00:25:03,240 --> 00:25:06,260 If that function calls another, it gets a sliver of memory above that. 594 00:25:06,260 --> 00:25:08,670 So they each have their own different regions of memory. 595 00:25:08,670 --> 00:25:11,580 But of course, these arrows, both pointing at each other, 596 00:25:11,580 --> 00:25:13,740 don't seem like such a good design. 597 00:25:13,740 --> 00:25:16,090 But the reality is, bad things can happen.
598 00:25:16,090 --> 00:25:20,460 You can allocate so much memory that, bam, the stack overflows the heap, 599 00:25:20,460 --> 00:25:22,530 or the heap overflows the stack. 600 00:25:22,530 --> 00:25:25,497 Thus were born websites like Stack Overflow and the like. 601 00:25:25,497 --> 00:25:26,580 But that's just a reality. 602 00:25:26,580 --> 00:25:28,910 If you have a finite amount of memory, at some point, 603 00:25:28,910 --> 00:25:30,180 something's going to break, 604 00:25:30,180 --> 00:25:32,824 or the computer's going to have to say, mm-mm, no more memory. 605 00:25:32,824 --> 00:25:35,490 You're going to have to quit some programs, or close some files, 606 00:25:35,490 --> 00:25:36,316 or whatnot. 607 00:25:36,316 --> 00:25:38,940 So that was only to say that that's how the memory is laid out. 608 00:25:38,940 --> 00:25:42,330 And we started to explore this by way of a few programs. 609 00:25:42,330 --> 00:25:44,520 This one here-- it's a little dark here. 610 00:25:44,520 --> 00:25:46,940 This one here was a swap function. 611 00:25:46,940 --> 00:25:48,000 Now it's even darker. 612 00:25:48,000 --> 00:25:54,480 It was a swap function that actually did swap two values, a and b. 613 00:25:54,480 --> 00:25:57,120 But it didn't actually work in the way we intended. 614 00:25:57,120 --> 00:25:59,610 What was broken about this swap function last week? 615 00:25:59,610 --> 00:26:02,390 616 00:26:02,390 --> 00:26:04,280 Like, I'm pretty sure it worked. 617 00:26:04,280 --> 00:26:08,030 And when our brave volunteer came up and swapped the orange juice and the milk, 618 00:26:08,030 --> 00:26:08,850 that worked. 619 00:26:08,850 --> 00:26:14,400 So, like, the logic was correct, but the program itself did not work. 620 00:26:14,400 --> 00:26:14,970 Why? 621 00:26:14,970 --> 00:26:17,220 AUDIENCE: It changed the values of copies of the variables. 622 00:26:17,220 --> 00:26:17,660 DAVID J. MALAN: Exactly.
623 00:26:17,660 --> 00:26:20,120 It changed values in copies of the variables. 624 00:26:20,120 --> 00:26:22,910 So recall that when main was the function 625 00:26:22,910 --> 00:26:26,900 we called, and it had two variables, x and y, that chunk of memory was here. 626 00:26:26,900 --> 00:26:28,160 That chunk of memory was here. 627 00:26:28,160 --> 00:26:29,930 And it had, like, the numbers 1 and 2. 628 00:26:29,930 --> 00:26:33,080 But when it called the swap function, that got its own chunk of memory. 629 00:26:33,080 --> 00:26:35,930 So main was at the bottom, swap was above that. 630 00:26:35,930 --> 00:26:38,480 It had its own chunks of memory called a and b, which 631 00:26:38,480 --> 00:26:40,430 initially got the values 1 and 2. 632 00:26:40,430 --> 00:26:42,230 1 and 2 were indeed successfully swapped, 633 00:26:42,230 --> 00:26:44,930 but that had no effect on x and y. 634 00:26:44,930 --> 00:26:45,797 So we fixed that. 635 00:26:45,797 --> 00:26:47,880 With the newer version of this program, of course, 636 00:26:47,880 --> 00:26:50,960 it looked a lot more cryptic at first glance, but in English, 637 00:26:50,960 --> 00:26:53,780 could someone just describe what it is that happens 638 00:26:53,780 --> 00:26:56,460 in this more correct example? 639 00:26:56,460 --> 00:26:58,500 Like, what does this program do line by line? 640 00:26:58,500 --> 00:26:59,000 Yeah? 641 00:26:59,000 --> 00:27:01,208 AUDIENCE: Instead of passing copies of the variables, 642 00:27:01,208 --> 00:27:03,100 you pass pointers to them, their addresses. 643 00:27:03,100 --> 00:27:04,100 DAVID J. MALAN: Exactly. 644 00:27:04,100 --> 00:27:06,975 Instead of passing the values of the variables, thereby copying them, 645 00:27:06,975 --> 00:27:09,420 it passes the addresses of those variables.
646 00:27:09,420 --> 00:27:13,110 So that's like saying, I don't technically care what the address is, 647 00:27:13,110 --> 00:27:15,660 but I do need to know where it is in memory. 648 00:27:15,660 --> 00:27:18,300 So instead of passing in x, the number 1, 649 00:27:18,300 --> 00:27:20,600 let's suppose that x is at location 100-- 650 00:27:20,600 --> 00:27:21,961 my go-to example. 651 00:27:21,961 --> 00:27:24,210 It's actually the number 100 that's going to go there. 652 00:27:24,210 --> 00:27:27,460 And if y is at a location like 104, well, it's 653 00:27:27,460 --> 00:27:31,220 104 that's going to go there, which are not the values we want to swap, 654 00:27:31,220 --> 00:27:34,370 but those are sort of like little maps, or breadcrumbs, if you will, 655 00:27:34,370 --> 00:27:36,550 that lead us to the right location. 656 00:27:36,550 --> 00:27:39,380 So when we execute this code, what we're ultimately 657 00:27:39,380 --> 00:27:43,410 swapping in those three lines is this and this, and all along the way, 658 00:27:43,410 --> 00:27:45,740 recall, we're using a temporary variable there 659 00:27:45,740 --> 00:27:48,050 that can just be thrown away after. 660 00:27:48,050 --> 00:27:50,090 So that's what pointers allowed us to do. 661 00:27:50,090 --> 00:27:54,110 And that's what allowed us to actually change values on the so-called stack, 662 00:27:54,110 --> 00:27:58,890 even by calling another function. 663 00:27:58,890 --> 00:27:59,390 All right. 664 00:27:59,390 --> 00:28:05,540 Any questions, then, on where we left off last time with the stack and with swap? 665 00:28:05,540 --> 00:28:07,270 No? 666 00:28:07,270 --> 00:28:07,770 All right. 667 00:28:07,770 --> 00:28:11,940 So recall we introduced Binky as well, who lost his head at one point, 668 00:28:11,940 --> 00:28:13,140 but why? 669 00:28:13,140 --> 00:28:16,552 What went horribly, horribly awry with this scene from last week's film 670 00:28:16,552 --> 00:28:17,135 from Stanford?
671 00:28:17,135 --> 00:28:20,297 672 00:28:20,297 --> 00:28:22,130 Binky was doing everything correctly, right? 673 00:28:22,130 --> 00:28:23,140 Like, moving values. 674 00:28:23,140 --> 00:28:24,700 Setting 42 was successful. 675 00:28:24,700 --> 00:28:25,619 And then, yeah? 676 00:28:25,619 --> 00:28:27,618 AUDIENCE: He tried to dereference something that 677 00:28:27,618 --> 00:28:31,500 wasn't pointing to any actual address. 678 00:28:31,500 --> 00:28:32,500 DAVID J. MALAN: Exactly. 679 00:28:32,500 --> 00:28:36,400 He tried to dereference a pointer, an address, that wasn't actually pointing 680 00:28:36,400 --> 00:28:37,630 to a valid address. 681 00:28:37,630 --> 00:28:41,560 Recall that this was the line of code in question that was unlucky and bad. 682 00:28:41,560 --> 00:28:45,310 Star y means go to the address in y and do something there: 683 00:28:45,310 --> 00:28:47,380 set it equal to the number 13. 684 00:28:47,380 --> 00:28:50,680 But the problem was that, in the code we looked at last week, 685 00:28:50,680 --> 00:28:54,550 all we did at the start was say, hey, computer, give me a pointer to an int, 686 00:28:54,550 --> 00:28:55,810 and call it x. 687 00:28:55,810 --> 00:28:58,070 Do the same, and call it y. 688 00:28:58,070 --> 00:29:02,320 Allocate space for an int and point x at it. 689 00:29:02,320 --> 00:29:04,450 But we never did the same for y. 690 00:29:04,450 --> 00:29:08,860 So whereas x contained, last week, the address of an actual chunk of memory, 691 00:29:08,860 --> 00:29:12,640 thanks to malloc, what did y contain at that point in the story? 692 00:29:12,640 --> 00:29:13,670 The yellow line there. 693 00:29:13,670 --> 00:29:16,290 694 00:29:16,290 --> 00:29:17,276 What did y contain? 695 00:29:17,276 --> 00:29:17,775 What value? 696 00:29:17,775 --> 00:29:21,886 697 00:29:21,886 --> 00:29:22,850 AUDIENCE: Null. 698 00:29:22,850 --> 00:29:23,730 DAVID J. MALAN: Null. 699 00:29:23,730 --> 00:29:24,711 Maybe. 700 00:29:24,711 --> 00:29:25,210 Maybe.
701 00:29:25,210 --> 00:29:28,509 But it's not obvious, because there's no mention of null in the program. 702 00:29:28,509 --> 00:29:29,300 We might get lucky. 703 00:29:29,300 --> 00:29:30,640 Null is just 0. 704 00:29:30,640 --> 00:29:33,760 And sometimes we've seen that 0s are the default values in a program. 705 00:29:33,760 --> 00:29:34,560 So maybe. 706 00:29:34,560 --> 00:29:37,941 But I say maybe, and I'm hedging. Why? 707 00:29:37,941 --> 00:29:39,435 AUDIENCE: [INAUDIBLE]. 708 00:29:39,435 --> 00:29:40,310 DAVID J. MALAN: Yeah. 709 00:29:40,310 --> 00:29:42,700 And it doesn't allocate-- well, allocate is not quite the right word. 710 00:29:42,700 --> 00:29:44,658 That suggests you are allocating actual memory. 711 00:29:44,658 --> 00:29:45,790 It's a garbage value. 712 00:29:45,790 --> 00:29:46,810 There's something there. 713 00:29:46,810 --> 00:29:47,020 Right? 714 00:29:47,020 --> 00:29:48,687 My Mac has been running for a few hours. 715 00:29:48,687 --> 00:29:51,603 And your Macs, and PCs, and phones are probably running all day long, 716 00:29:51,603 --> 00:29:52,990 or certainly when the lid is up. 717 00:29:52,990 --> 00:29:55,930 And so your memory is getting used, and unused, and used. 718 00:29:55,930 --> 00:29:57,530 Like, lots of stuff is going on. 719 00:29:57,530 --> 00:30:00,567 So your computer is not filled with all zeros or all ones. 720 00:30:00,567 --> 00:30:02,650 If you look at it at some random point in the day, 721 00:30:02,650 --> 00:30:05,290 it's filled with, like, bunches and bunches of zeros and ones 722 00:30:05,290 --> 00:30:07,840 from previous programs that you quit long ago, 723 00:30:07,840 --> 00:30:09,889 windows you have in the background, and the like. 724 00:30:09,889 --> 00:30:11,680 So the short of it is, when you're running 725 00:30:11,680 --> 00:30:15,271 a program for the first time on a computer that's been running now for some time, 726 00:30:15,271 --> 00:30:16,270 it's going to get messy.
727 00:30:16,270 --> 00:30:18,978 That big rectangle of memory is going to have some ones over here, 728 00:30:18,978 --> 00:30:21,350 some zeros over here, and vice versa. 729 00:30:21,350 --> 00:30:26,300 So they're garbage values, because those bytes have some values in them. 730 00:30:26,300 --> 00:30:28,400 You just don't necessarily know what they are. 731 00:30:28,400 --> 00:30:31,630 So the point is, you should never, ever dereference a pointer 732 00:30:31,630 --> 00:30:33,940 that you have not set yourself. 733 00:30:33,940 --> 00:30:35,080 Maybe it will crash. 734 00:30:35,080 --> 00:30:36,010 Maybe it won't crash. 735 00:30:36,010 --> 00:30:38,830 Valgrind can help you find these things sometimes, 736 00:30:38,830 --> 00:30:41,800 but it's just not a safe operation. 737 00:30:41,800 --> 00:30:43,949 And lastly, the last thing we introduced last week, 738 00:30:43,949 --> 00:30:46,990 which will be the stepping stone for the problems we'll solve this week, 739 00:30:46,990 --> 00:30:47,880 was struct. 740 00:30:47,880 --> 00:30:52,540 So struct is kind of cool, in that you can design your own custom data 741 00:30:52,540 --> 00:30:53,410 structures. 742 00:30:53,410 --> 00:30:55,630 C is pretty limited out of the box, so to speak. 743 00:30:55,630 --> 00:30:59,500 You only have chars, and bools, and floats, and ints, and doubles, 744 00:30:59,500 --> 00:31:00,730 and longs, and str-- 745 00:31:00,730 --> 00:31:02,439 well, we don't even have strings, per se. 746 00:31:02,439 --> 00:31:05,479 So it doesn't really come with as many features as a lot of languages do, 747 00:31:05,479 --> 00:31:07,720 like Python, which we'll see in a few weeks. 748 00:31:07,720 --> 00:31:09,970 So with struct in C, you have the ability 749 00:31:09,970 --> 00:31:11,680 to solve some problems of your own. 750 00:31:11,680 --> 00:31:15,460 For instance, with struct, we can actually 751 00:31:15,460 --> 00:31:19,110 start to implement our own features.
752 00:31:19,110 --> 00:31:20,260 Or our own data types. 753 00:31:20,260 --> 00:31:22,010 For instance, let me go up here. 754 00:31:22,010 --> 00:31:25,510 And let me go ahead and create a file called, say, 755 00:31:25,510 --> 00:31:28,540 student, or rather struct dot h. 756 00:31:28,540 --> 00:31:30,430 So recall that dot h is a header file. 757 00:31:30,430 --> 00:31:33,200 Thus far, you have used header files that other people made, 758 00:31:33,200 --> 00:31:36,850 like CS50 dot h, and standard IO dot h, and standard lib dot h, 759 00:31:36,850 --> 00:31:38,080 but you can make your own. 760 00:31:38,080 --> 00:31:41,380 Header files are just files that typically contain code that you 761 00:31:41,380 --> 00:31:43,450 want to share across multiple programs. 762 00:31:43,450 --> 00:31:45,169 And we'll see more of this in time. 763 00:31:45,169 --> 00:31:46,960 So let me go ahead and just save this file. 764 00:31:46,960 --> 00:31:50,890 And suppose that I want to represent a student in memory. 765 00:31:50,890 --> 00:31:54,880 A student, of course, is probably going to have what? 766 00:31:54,880 --> 00:31:59,640 For instance, how about a string for their name, 767 00:31:59,640 --> 00:32:02,650 a string for their dorm-- but string is kind of two weeks ago. 768 00:32:02,650 --> 00:32:04,630 Let's call this char star. 769 00:32:04,630 --> 00:32:07,720 And let's make name a char star too. 770 00:32:07,720 --> 00:32:11,150 And so you might want to associate multiple pieces of data with students. 771 00:32:11,150 --> 00:32:11,650 Right? 772 00:32:11,650 --> 00:32:13,280 And you don't want to have multiple variables, per se. 773 00:32:13,280 --> 00:32:14,830 It would be nice to kind of encapsulate these together. 774 00:32:14,830 --> 00:32:16,900 And recall, at the very end of last week, we 775 00:32:16,900 --> 00:32:20,680 saw this feature where you can define your own type, 776 00:32:20,680 --> 00:32:23,920 with typedef, that is a structure itself.
777 00:32:23,920 --> 00:32:25,340 And you can give it a name. 778 00:32:25,340 --> 00:32:29,060 So in short, simply by writing these lines of code, 779 00:32:29,060 --> 00:32:31,060 you have just created your own custom data type. 780 00:32:31,060 --> 00:32:32,410 It's now called student. 781 00:32:32,410 --> 00:32:36,340 And every student in the world shall have, per this code, a name 782 00:32:36,340 --> 00:32:38,090 and a dorm associated with them. 783 00:32:38,090 --> 00:32:39,170 Now, why is this useful? 784 00:32:39,170 --> 00:32:42,250 Well, the program we looked at at the very end of last time looked 785 00:32:42,250 --> 00:32:43,830 a little something like this. 786 00:32:43,830 --> 00:32:48,730 In struct zero dot c, we had the following. 787 00:32:48,730 --> 00:32:52,016 I first allocated some amount of space for students. 788 00:32:52,016 --> 00:32:54,640 I asked the user, what's the enrollment in the class, or whatnot? 789 00:32:54,640 --> 00:32:56,020 That gives us an int. 790 00:32:56,020 --> 00:33:01,910 And then we allocated an array of type student, called students, plural. 791 00:33:01,910 --> 00:33:04,600 This was an alternative, recall, to doing something 792 00:33:04,600 --> 00:33:10,270 like this: string names[enrollment], and string dorms[enrollment], 793 00:33:10,270 --> 00:33:11,200 which would work. 794 00:33:11,200 --> 00:33:13,283 You could have two separate arrays, and you'd just 795 00:33:13,283 --> 00:33:17,170 have to remember that names[0] and dorms[0] are the same human. 796 00:33:17,170 --> 00:33:19,490 But why do that if you can keep things together? 797 00:33:19,490 --> 00:33:21,610 So with structs, we were able to do this: 798 00:33:21,610 --> 00:33:27,250 give me this many student structures, and call the whole array students. 799 00:33:27,250 --> 00:33:34,460 And the only new syntax we introduced to satisfy this goal was what operator? 800 00:33:34,460 --> 00:33:35,356 AUDIENCE: The dot.
801 00:33:35,356 --> 00:33:36,356 DAVID J. MALAN: The dot. 802 00:33:36,356 --> 00:33:36,856 Yeah. 803 00:33:36,856 --> 00:33:40,090 So in the past, recall, from like week two, we introduced arrays. 804 00:33:40,090 --> 00:33:42,280 And arrays allow you to use square bracket notation. 805 00:33:42,280 --> 00:33:45,490 So that is no different from a couple of weeks back. 806 00:33:45,490 --> 00:33:49,450 But if your array is not storing just integers, or chars, or floats, 807 00:33:49,450 --> 00:33:53,080 or whatever, but is actually storing a structure, like a student, 808 00:33:53,080 --> 00:33:57,400 you can get at that student's name by literally just saying dot name. 809 00:33:57,400 --> 00:33:59,999 And you can get at their dorm by doing dot dorm. 810 00:33:59,999 --> 00:34:01,540 And then everything else is the same. 811 00:34:01,540 --> 00:34:03,190 This is what's called encapsulation. 812 00:34:03,190 --> 00:34:05,690 And it's kind of like a fundamental principle of programming 813 00:34:05,690 --> 00:34:08,949 where, if you have some real-world entity, like a student, 814 00:34:08,949 --> 00:34:11,800 and you want to represent students with code, yeah, 815 00:34:11,800 --> 00:34:16,659 you can have a bunch of arrays called names, dorms, emails, phone 816 00:34:16,659 --> 00:34:18,159 numbers, but that just gets messy. 817 00:34:18,159 --> 00:34:22,150 You can instead encapsulate all of that related information about a student 818 00:34:22,150 --> 00:34:27,310 into one data structure so that now you have, per week zero, an abstraction. 819 00:34:27,310 --> 00:34:30,050 Like, a student is an abstraction. 820 00:34:30,050 --> 00:34:34,150 And if we break that abstraction, what is a student actually? 821 00:34:34,150 --> 00:34:37,830 Not in the real world, but in our code world here? 822 00:34:37,830 --> 00:34:39,010 A student is an abstraction.
823 00:34:39,010 --> 00:34:41,909 It's a useful word that all of us can kind of agree means something, 824 00:34:41,909 --> 00:34:45,810 but technically, what does it apparently mean? 825 00:34:45,810 --> 00:34:48,989 A student is actually a name and a dorm, which really kind of is 826 00:34:48,989 --> 00:34:52,409 diminutive to everyone in this room, but we've distilled it in code 827 00:34:52,409 --> 00:34:53,999 to just those two values. 828 00:34:53,999 --> 00:34:55,290 So there we have encapsulation. 829 00:34:55,290 --> 00:34:57,656 You're kind of encapsulating together multiple values. 830 00:34:57,656 --> 00:35:00,030 And you're abstracting away to just have a more useful term, 831 00:35:00,030 --> 00:35:02,790 because no one is going to want to talk in terms of lines of code 832 00:35:02,790 --> 00:35:04,200 to describe anything. 833 00:35:04,200 --> 00:35:05,590 So, same topic as in the past. 834 00:35:05,590 --> 00:35:10,020 So, now we have the ability to come up with our own custom data structures, 835 00:35:10,020 --> 00:35:10,710 it seems, 836 00:35:10,710 --> 00:35:13,330 that we can store anything inside of them that we want. 837 00:35:13,330 --> 00:35:16,860 So let's now see how poorly we've been designing 838 00:35:16,860 --> 00:35:19,360 some things for the past few weeks. 839 00:35:19,360 --> 00:35:22,830 So it turns out that much of the code 840 00:35:22,830 --> 00:35:25,210 we've been writing in recent weeks has hopefully been correct, 841 00:35:25,210 --> 00:35:28,954 but we've not necessarily been designing solutions in the best way. 842 00:35:28,954 --> 00:35:30,870 Recall that when we have this chunk of memory, 843 00:35:30,870 --> 00:35:34,150 we've typically treated it as, at most, an array. 844 00:35:34,150 --> 00:35:35,700 So just a contiguous chunk of memory. 845 00:35:35,700 --> 00:35:39,450 And thanks to this very simple mental model, we get strings, 846 00:35:39,450 --> 00:35:42,210 and now we get arrays of students.
847 00:35:42,210 --> 00:35:45,960 But arrays aren't necessarily the best data structure in the world. 848 00:35:45,960 --> 00:35:49,800 Like, what is a downside of an array, if you've encountered one thus far? 849 00:35:49,800 --> 00:35:52,430 850 00:35:52,430 --> 00:35:54,770 In C, what's a downside of an array? 851 00:35:54,770 --> 00:35:55,760 Yeah? 852 00:35:55,760 --> 00:35:58,230 AUDIENCE: [INAUDIBLE]. 853 00:35:58,230 --> 00:35:59,480 DAVID J. MALAN: Can or cannot? 854 00:35:59,480 --> 00:36:00,010 AUDIENCE: Cannot. 855 00:36:00,010 --> 00:36:00,680 DAVID J. MALAN: You cannot. 856 00:36:00,680 --> 00:36:01,440 That is true. 857 00:36:01,440 --> 00:36:05,690 So in C, you cannot mix data types inside of an array. 858 00:36:05,690 --> 00:36:09,907 They must all be ints, they must all be chars, they must all be students. 859 00:36:09,907 --> 00:36:11,990 It's a bit of a white lie because technically, you 860 00:36:11,990 --> 00:36:15,320 can have something called a void star, and you can actually map-- but yes. 861 00:36:15,320 --> 00:36:18,161 That is true though, strictly speaking-- you cannot mix data types. 862 00:36:18,161 --> 00:36:20,660 Though frankly, even though other languages let you do that, 863 00:36:20,660 --> 00:36:22,580 it's not necessarily the best design decision. 864 00:36:22,580 --> 00:36:23,540 But sure, a limitation. 865 00:36:23,540 --> 00:36:24,190 Other thoughts? 866 00:36:24,190 --> 00:36:24,734 Yeah? 867 00:36:24,734 --> 00:36:26,110 AUDIENCE: The size cannot change. 868 00:36:26,110 --> 00:36:27,734 DAVID J. MALAN: The size cannot change. 869 00:36:27,734 --> 00:36:28,760 Let's focus on that one, 870 00:36:28,760 --> 00:36:32,240 because that's sort of even more constraining, it would seem. 871 00:36:32,240 --> 00:36:37,010 So if you want an array for, say, two values, what do you do? 872 00:36:37,010 --> 00:36:41,744 Well, you can do something like int x[2];.
873 00:36:41,744 --> 00:36:44,660 And what does that actually give you inside of your computer's memory? 874 00:36:44,660 --> 00:36:47,600 It gives you some chunk that we'll draw as a rectangle. 875 00:36:47,600 --> 00:36:48,850 This is location 0. 876 00:36:48,850 --> 00:36:49,900 This is location 1. 877 00:36:49,900 --> 00:36:52,400 Suppose that, oh, a few minutes later, you change your mind. 878 00:36:52,400 --> 00:36:54,215 Oh, darn, I just took a-- 879 00:36:54,215 --> 00:36:56,480 I want to type in a third value, or I want 880 00:36:56,480 --> 00:36:58,430 to add another student to the array. 881 00:36:58,430 --> 00:37:00,230 Where do you put that? 882 00:37:00,230 --> 00:37:01,550 Well, you don't. 883 00:37:01,550 --> 00:37:04,490 If you want to add a third value to an array of size 2, 884 00:37:04,490 --> 00:37:06,900 what's your only option in C? 885 00:37:06,900 --> 00:37:08,380 AUDIENCE: You make a new array. 886 00:37:08,380 --> 00:37:09,280 DAVID J. MALAN: You make a new array. 887 00:37:09,280 --> 00:37:09,940 So, literally. 888 00:37:09,940 --> 00:37:13,150 And if this array had the number, like, 42, 889 00:37:13,150 --> 00:37:17,260 and this had the number 13, the only way to add a third number is to allocate 890 00:37:17,260 --> 00:37:23,780 a second array, copy the values into the same locations, 42, 13, and then, 891 00:37:23,780 --> 00:37:25,390 we'll add another value, 50. 892 00:37:25,390 --> 00:37:28,150 And then, so that you're not using up twice as much space 893 00:37:28,150 --> 00:37:31,630 almost permanently, now you can sort of free, somehow, 894 00:37:31,630 --> 00:37:33,830 or stop using, that old chunk of memory. 895 00:37:33,830 --> 00:37:34,480 So that's fine. 896 00:37:34,480 --> 00:37:35,857 What we just did is correct. 897 00:37:35,857 --> 00:37:37,690 But what's the running time of that process?
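The "make a new array" fix can be sketched as follows. This assumes the old values live on the heap so the caller can free them afterward (the board example was `int x[2]`); the function name `append_by_copy` is hypothetical. The copy loop runs once per existing element, which is exactly the linear cost asked about next.

```c
#include <stdlib.h>

// Grow an array by one element the manual way: allocate a new array of
// size n + 1, copy the n old values over one by one, then store value.
// Returns the new array (the caller frees the old one), or NULL if malloc fails.
int *append_by_copy(const int *old, size_t n, int value)
{
    int *bigger = malloc((n + 1) * sizeof(int));
    if (bigger == NULL)
    {
        return NULL;
    }
    // n copies: this loop is why resizing takes linear time.
    for (size_t i = 0; i < n; i++)
    {
        bigger[i] = old[i];
    }
    bigger[n] = value;
    return bigger;
}
```

Usage mirrors the board: start with 42 and 13, call `append_by_copy(x, 2, 50)`, then `free(x)` so the old chunk is not kept around.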
898 00:37:37,690 --> 00:37:40,362 899 00:37:40,362 --> 00:37:43,570 Recall, a couple of weeks ago, we started talking about efficiency and design. 900 00:37:43,570 --> 00:37:47,750 What's the running time of resizing an array? 901 00:37:47,750 --> 00:37:48,542 AUDIENCE: Too long. 902 00:37:48,542 --> 00:37:49,624 DAVID J. MALAN: Say again? 903 00:37:49,624 --> 00:37:50,860 AUDIENCE: I said, too long. 904 00:37:50,860 --> 00:37:51,901 DAVID J. MALAN: Too long. 905 00:37:51,901 --> 00:37:53,137 Fair. 906 00:37:53,137 --> 00:37:54,220 But let's be more precise. 907 00:37:54,220 --> 00:38:01,945 Big O of-- big O of what? 908 00:38:01,945 --> 00:38:02,829 AUDIENCE: N. 909 00:38:02,829 --> 00:38:03,995 DAVID J. MALAN: N. What's n? 910 00:38:03,995 --> 00:38:05,150 AUDIENCE: [INAUDIBLE]. 911 00:38:05,150 --> 00:38:05,500 DAVID J. MALAN: OK. 912 00:38:05,500 --> 00:38:05,770 True. 913 00:38:05,770 --> 00:38:06,853 But what does n represent? 914 00:38:06,853 --> 00:38:08,375 AUDIENCE: [INAUDIBLE]. 915 00:38:08,375 --> 00:38:09,250 DAVID J. MALAN: Yeah. 916 00:38:09,250 --> 00:38:10,680 So you don't actually have to know. 917 00:38:10,680 --> 00:38:11,804 It's just a general answer. 918 00:38:11,804 --> 00:38:14,700 In this case, however long the array is, call it n. 919 00:38:14,700 --> 00:38:18,340 It is that many steps to resize it into an array of that size plus 1. 920 00:38:18,340 --> 00:38:20,142 Technically it's big O of n plus 1. 921 00:38:20,142 --> 00:38:22,600 But recall, in our discussion of big O notation, we just 922 00:38:22,600 --> 00:38:26,890 ignore the smaller terms-- the plus 1s, the divided by 2s, the plus ns. 923 00:38:26,890 --> 00:38:30,400 We focus only on the most powerful term in the expression, which 924 00:38:30,400 --> 00:38:31,540 is just n here.
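Written out, the step count from this exchange is n copies of the existing values plus one write of the new value; dropping the lower-order term leaves a linear bound:

$$
T(n) = \underbrace{n}_{\text{copy old values}} + \underbrace{1}_{\text{store new value}} = n + 1 \in O(n)
$$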
925 00:38:31,540 --> 00:38:35,140 So yes, if you have an array of size 2, and you resize it 926 00:38:35,140 --> 00:38:38,710 into an array of size 3, or really, n plus 1, that's 927 00:38:38,710 --> 00:38:40,210 going to take me roughly n steps. 928 00:38:40,210 --> 00:38:41,710 Technically n plus 1 steps, 929 00:38:41,710 --> 00:38:42,820 but call it n steps. 930 00:38:42,820 --> 00:38:44,260 Ergo, big O of n. 931 00:38:44,260 --> 00:38:45,320 So it's a linear process. 932 00:38:45,320 --> 00:38:48,560 So it's possible, but not necessarily the fastest 933 00:38:48,560 --> 00:38:51,970 thing, because you literally had to move all those damn values around. 934 00:38:51,970 --> 00:38:56,110 So what would be better than this? 935 00:38:56,110 --> 00:38:59,950 And if you've programmed before, you might have the right instincts already. 936 00:38:59,950 --> 00:39:01,210 How do we solve this problem? 937 00:39:01,210 --> 00:39:04,598 938 00:39:04,598 --> 00:39:05,098 Yeah? 939 00:39:05,098 --> 00:39:07,540 AUDIENCE: Would you allocate more memory at the end of the array? 940 00:39:07,540 --> 00:39:10,165 DAVID J. MALAN: Reallocate more memory at the end of the array. 941 00:39:10,165 --> 00:39:15,300 So it turns out C does have a function called realloc, 942 00:39:15,300 --> 00:39:19,480 perfectly, if not obviously, named, that reallocates memory. 943 00:39:19,480 --> 00:39:23,200 And if you pass it the address of a chunk of memory you've allocated, 944 00:39:23,200 --> 00:39:26,020 and the operating system notices, oh, yeah, you got lucky. 945 00:39:26,020 --> 00:39:28,460 I've got more memory at the end of this array, 946 00:39:28,460 --> 00:39:32,050 it will then allocate that additional RAM for you, and let you use it. 947 00:39:32,050 --> 00:39:34,830 Or, worst case, if there's nothing available at the end 948 00:39:34,830 --> 00:39:36,580 of the array in memory, because it's being 949 00:39:36,580 --> 00:39:38,890 used by something else in your program.
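A minimal sketch of growing a heap array with `realloc`, under the assumption that the array and its length are tracked by the caller; `append_by_realloc` is a hypothetical name. `realloc` either extends the block in place or moves it, copying the old contents for you, so the code must always use the pointer it returns. If it fails it returns `NULL` and leaves the original block untouched, which is why the result goes into a temporary first.

```c
#include <stdlib.h>

// Append one value to a heap-allocated array of *n ints using realloc.
// realloc may grow the block in place or move it elsewhere (copying the
// old contents), so *arr is only updated after realloc succeeds.
// Returns 1 on success, 0 on failure (in which case *arr is unchanged).
int append_by_realloc(int **arr, size_t *n, int value)
{
    int *tmp = realloc(*arr, (*n + 1) * sizeof(int));
    if (tmp == NULL)
    {
        return 0;   // *arr is still valid and still holds *n values
    }
    tmp[*n] = value;
    *arr = tmp;
    (*n)++;
    return 1;
}
```

This is more convenient than copying by hand, but when `realloc` has to move the block it still copies every element, so the worst case remains O(n).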
950 00:39:38,890 --> 00:39:39,760 That's fine. 951 00:39:39,760 --> 00:39:44,920 Realloc will take on the responsibility of creating another array somewhere 952 00:39:44,920 --> 00:39:48,010 in memory, copying all of that data for you into it, 953 00:39:48,010 --> 00:39:51,190 and returning the address of that new chunk of memory. 954 00:39:51,190 --> 00:39:53,031 Unfortunately, that's still linear. 955 00:39:53,031 --> 00:39:53,530 Yeah? 956 00:39:53,530 --> 00:39:55,282 AUDIENCE: Is this all being done in the heap? 957 00:39:55,282 --> 00:39:55,720 Or-- 958 00:39:55,720 --> 00:39:57,845 DAVID J. MALAN: This is all being done in the heap. 959 00:39:57,845 --> 00:40:00,760 Malloc, realloc, and free all operate on the heap. 960 00:40:00,760 --> 00:40:01,630 Yes. 961 00:40:01,630 --> 00:40:04,750 So that is a solution, but it doesn't really speak to the efficiency. 962 00:40:04,750 --> 00:40:05,250 Yeah? 963 00:40:05,250 --> 00:40:06,360 AUDIENCE: Could you use a linked list? 964 00:40:06,360 --> 00:40:07,235 DAVID J. MALAN: Yeah. 965 00:40:07,235 --> 00:40:09,730 What is a linked list? 966 00:40:09,730 --> 00:40:10,367 Go ahead. 967 00:40:10,367 --> 00:40:13,450 AUDIENCE: It's when you have an element that points to different elements. 968 00:40:13,450 --> 00:40:14,241 DAVID J. MALAN: OK. 969 00:40:14,241 --> 00:40:15,391 Points to other elements. 970 00:40:15,391 --> 00:40:15,890 Yeah. 971 00:40:15,890 --> 00:40:18,100 So let me speak to what the fundamental issue is here. 972 00:40:18,100 --> 00:40:23,530 The fundamental problem is much like painting yourself into a corner, 973 00:40:23,530 --> 00:40:25,060 so to speak, as the cliche goes. 974 00:40:25,060 --> 00:40:29,260 With an array, you're deciding in advance how big the data structure is 975 00:40:29,260 --> 00:40:30,666 and committing to it. 976 00:40:30,666 --> 00:40:32,290 Well, what if you just do the opposite? 977 00:40:32,290 --> 00:40:33,490 Don't do that.
978 00:40:33,490 --> 00:40:39,130 If, initially, you want room for just one value, say one integer, 979 00:40:39,130 --> 00:40:41,230 only ask the computer for that. 980 00:40:41,230 --> 00:40:44,890 Give me space for one integer and I'll put my number 42 in here. 981 00:40:44,890 --> 00:40:48,660 And then, if and only if you want a second integer, 982 00:40:48,660 --> 00:40:50,890 do you ask the computer for a second integer. 983 00:40:50,890 --> 00:40:54,490 And so the computer, by way of malloc or whatnot, will give you another one, 984 00:40:54,490 --> 00:40:55,510 like the number 13. 985 00:40:55,510 --> 00:40:58,900 And if you want a third, just ask the same question of the operating system. 986 00:40:58,900 --> 00:41:02,470 Each time just getting back one chunk of memory. 987 00:41:02,470 --> 00:41:05,560 But there's a fundamental gotcha here. 988 00:41:05,560 --> 00:41:06,850 There's always a trade-off. 989 00:41:06,850 --> 00:41:08,200 So yes, this is possible. 990 00:41:08,200 --> 00:41:10,150 You can call malloc three times, 991 00:41:10,150 --> 00:41:13,690 each time asking for a chunk of memory of size 1, instead of size 3, 992 00:41:13,690 --> 00:41:15,160 for instance. 993 00:41:15,160 --> 00:41:16,450 But what's the price you pay? 994 00:41:16,450 --> 00:41:18,460 Or what problem do we still need to solve? 995 00:41:18,460 --> 00:41:19,147 Yeah? 996 00:41:19,147 --> 00:41:20,580 AUDIENCE: They're not stored next to each other. 997 00:41:20,580 --> 00:41:20,780 DAVID J. MALAN: Yeah. 998 00:41:20,780 --> 00:41:22,613 They're not being stored next to each other. 999 00:41:22,613 --> 00:41:26,440 So even though I can think of this as being the first element, the second, 1000 00:41:26,440 --> 00:41:31,960 and the third, you do not have, in this story, random access to elements.
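What "random access" buys you in a contiguous array can be shown in a few lines. In C, `arr[i]` is defined as `*(arr + i)`: the address of element i is the start address plus i times the element size, so any index is reached in one step. The helper name `element_at` is hypothetical; three separate `malloc` calls carry no such guarantee, which is the problem the lecture turns to next.

```c
#include <stddef.h>

// In a contiguous array, element i lives at a computable address:
// start + i * sizeof(int). arr[i] means *(arr + i), so reaching any
// index is a single arithmetic step, i.e., constant time.
int element_at(const int *arr, size_t i)
{
    return *(arr + i);   // identical to arr[i]
}
```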
1001 00:41:31,960 --> 00:41:35,710 And random access, ergo, random access memory, or RAM, 1002 00:41:35,710 --> 00:41:38,290 just means that arithmetically, like, mathematically, you 1003 00:41:38,290 --> 00:41:43,190 can jump to location 0, location 1, location 2, randomly, or in constant 1004 00:41:43,190 --> 00:41:43,690 time. 1005 00:41:43,690 --> 00:41:44,831 Just instantly. 1006 00:41:44,831 --> 00:41:47,830 Because if they're all back to back to back, all you have to do is, like, 1007 00:41:47,830 --> 00:41:51,730 add 1, or add 4, or whatever, to the address, and you're there. 1008 00:41:51,730 --> 00:41:55,570 But the problem is, if you're calling malloc again and again 1009 00:41:55,570 --> 00:41:58,450 and again, there's no guarantee that these things are even 1010 00:41:58,450 --> 00:42:00,890 going to be proximal to one another. 1011 00:42:00,890 --> 00:42:03,550 These chunks of memory might end up-- 1012 00:42:03,550 --> 00:42:06,880 if this is the big chunk of memory we've been talking about, 1013 00:42:06,880 --> 00:42:09,550 where the heap's up here, and the stack's down here-- 1014 00:42:09,550 --> 00:42:11,690 42 might end up over here. 1015 00:42:11,690 --> 00:42:14,350 The next chunk of memory, 50, might end up over here. 1016 00:42:14,350 --> 00:42:16,600 The third chunk might end up over here. 1017 00:42:16,600 --> 00:42:19,600 So you can't just jump from location 0, to 1, to 2, 1018 00:42:19,600 --> 00:42:25,730 because you have to somehow remember where locations 0, 1, and 2 are. 1019 00:42:25,730 --> 00:42:27,287 So how do we solve this? 1020 00:42:27,287 --> 00:42:30,370 Even if you haven't programmed before, like, what would a solution be here? 1021 00:42:30,370 --> 00:42:33,274 1022 00:42:33,274 --> 00:42:35,659 AUDIENCE: Somehow store [INAUDIBLE]. 1023 00:42:35,659 --> 00:42:36,450 DAVID J. MALAN: OK.
1024 00:42:36,450 --> 00:42:38,772 Somehow storing the addresses of-- 1025 00:42:38,772 --> 00:42:40,500 AUDIENCE: Of the [INAUDIBLE] 1026 00:42:40,500 --> 00:42:40,890 DAVID J. MALAN: All right. 1027 00:42:40,890 --> 00:42:44,056 So let's just suppose, for the sake of discussion, that this chunk of memory 1028 00:42:44,056 --> 00:42:45,420 ended up at location 100. 1029 00:42:45,420 --> 00:42:48,180 This one ended up at, like, 150. 1030 00:42:48,180 --> 00:42:51,360 This one ended up at, like, 475. 1031 00:42:51,360 --> 00:42:53,610 Whatever those values are. 1032 00:42:53,610 --> 00:42:56,680 It would seem that somehow or other I need to remember three values-- 1033 00:42:56,680 --> 00:43:00,030 100, 150, and 475. 1034 00:43:00,030 --> 00:43:01,620 So where can I store them? 1035 00:43:01,620 --> 00:43:05,070 Well, it turns out, I can be a little clever but a little greedy. 1036 00:43:05,070 --> 00:43:08,040 I could say to malloc, you know what, every time I call you, don't just 1037 00:43:08,040 --> 00:43:11,580 give me space for an integer, give me space for an integer 1038 00:43:11,580 --> 00:43:15,520 plus the address of another integer. 1039 00:43:15,520 --> 00:43:19,350 So if you've ever seen, like, popcorn strung together on a string, 1040 00:43:19,350 --> 00:43:24,360 or any kind of chain-link fence where one link is linking to another, 1041 00:43:24,360 --> 00:43:29,130 we could create the equivalent of-- oops, not that. 1042 00:43:29,130 --> 00:43:33,900 We could create the equivalent of this kind of picture, 1043 00:43:33,900 --> 00:43:38,010 where each of these squares, or nodes, as we'll start calling them, kind of links 1044 00:43:38,010 --> 00:43:39,270 graphically to the other. 1045 00:43:39,270 --> 00:43:41,790 Well, we've seen how these links, or these pointers, 1046 00:43:41,790 --> 00:43:44,490 literally arrows that are pointing, are implemented in code. 1047 00:43:44,490 --> 00:43:46,740 An arrow or a pointer is just an address.
1048 00:43:46,740 --> 00:43:47,640 So you know what? 1049 00:43:47,640 --> 00:43:53,310 We should just ask malloc not for enough space for just the number 42; 1050 00:43:53,310 --> 00:43:57,990 we should, instead, ask for a little more memory in each of these squares, 1051 00:43:57,990 --> 00:44:00,510 making them pictorially rectangles now. 1052 00:44:00,510 --> 00:44:04,320 So that now, yes, we do have these arrows conceptually 1053 00:44:04,320 --> 00:44:06,460 pointing from one location to another. 1054 00:44:06,460 --> 00:44:10,602 But what values do I actually want to put in these new additional boxes? 1055 00:44:10,602 --> 00:44:12,570 AUDIENCE: The addresses of the next. 1056 00:44:12,570 --> 00:44:13,800 DAVID J. MALAN: The addresses of the next. 1057 00:44:13,800 --> 00:44:15,258 So they're like little breadcrumbs. 1058 00:44:15,258 --> 00:44:18,390 So in this box here, associated with the first value, 1059 00:44:18,390 --> 00:44:22,950 should be the address of my second value, 475. 1060 00:44:22,950 --> 00:44:26,370 Associated with my second value here, per the arrow-- 1061 00:44:26,370 --> 00:44:28,920 and let me draw the arrow from the right place. 1062 00:44:28,920 --> 00:44:33,150 --from the arrow, should be the address 150, because that's the last. 1063 00:44:33,150 --> 00:44:37,090 And then, from this extra box, what should I put there? 1064 00:44:37,090 --> 00:44:37,590 Yeah? 1065 00:44:37,590 --> 00:44:38,880 AUDIENCE: Backslash 0 or something? 1066 00:44:38,880 --> 00:44:39,755 DAVID J. MALAN: Yeah. 1067 00:44:39,755 --> 00:44:43,050 So probably the equivalent of backslash 0, which, in the world of pointers, recall, 1068 00:44:43,050 --> 00:44:44,460 is null. 1069 00:44:44,460 --> 00:44:47,820 So just a special value that means that's it, this is the end of the line.
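The picture so far (three separately allocated chunks, each holding a value plus the address of the next chunk, with null in the last one) can be sketched in C. This is a minimal sketch that assumes a `node` type of the shape written on the board shortly; `build_three` and `sum_list` are hypothetical helper names. It uses `p->n`, C's shorthand for `(*p).n`: go to the node that `p` points at, then read its `n` field.

```c
#include <stdlib.h>

// One chunk per value: the value itself plus the address of the next chunk.
typedef struct node
{
    int n;
    struct node *next;
}
node;

// Build the list 42 -> 50 -> 13 from three separate mallocs. The chunks may
// land anywhere on the heap; only the stored addresses tie them together.
// Returns the address of the first node, or NULL if any malloc fails.
node *build_three(void)
{
    node *a = malloc(sizeof(node));
    node *b = malloc(sizeof(node));
    node *c = malloc(sizeof(node));
    if (a == NULL || b == NULL || c == NULL)
    {
        free(a);   // free(NULL) is a harmless no-op
        free(b);
        free(c);
        return NULL;
    }
    a->n = 42;
    a->next = b;
    b->n = 50;
    b->next = c;
    c->n = 13;
    c->next = NULL;   // NULL marks the end of the line
    return a;
}

// Follow the breadcrumbs from node to node until reaching NULL.
int sum_list(const node *first)
{
    int total = 0;
    for (const node *p = first; p != NULL; p = p->next)
    {
        total += p->n;
    }
    return total;
}
```

Note that reaching the third value means following two pointers first: with a linked list, getting to element i takes i steps rather than one.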
1070 00:44:47,820 --> 00:44:51,320 That still leaves us with room to add a fourth value and point to it, 1071 00:44:51,320 --> 00:44:56,020 but, for now, it signifies very clearly to us that there's nothing actually there. 1072 00:44:56,020 --> 00:44:58,210 So what did we just do? 1073 00:44:58,210 --> 00:45:03,210 We created a list of values, 50, oh, sorry, 42, 50, 13, 1074 00:45:03,210 --> 00:45:04,549 but we linked them together. 1075 00:45:04,549 --> 00:45:06,090 First, pictorially, with just arrows, 1076 00:45:06,090 --> 00:45:08,280 like any human might with a piece of chalk. 1077 00:45:08,280 --> 00:45:10,530 But technically in code, we could do this 1078 00:45:10,530 --> 00:45:14,380 by just storing addresses in each of these places. 1079 00:45:14,380 --> 00:45:19,260 So just to be clear then, what might this actually translate to in code? 1080 00:45:19,260 --> 00:45:22,020 Well, what if I proposed this? 1081 00:45:22,020 --> 00:45:28,320 In code, we might do something like this. 1082 00:45:28,320 --> 00:45:29,831 If we want to store an integer, 1083 00:45:29,831 --> 00:45:32,580 we're, of course, going to need to store, like, int n, we'll call it. 1084 00:45:32,580 --> 00:45:35,790 n will represent 42, or 50, or 13. 1085 00:45:35,790 --> 00:45:37,585 But if we want to create a data structure, 1086 00:45:37,585 --> 00:45:39,960 we might want to start giving this data structure a name. 1087 00:45:39,960 --> 00:45:44,250 I called it, a moment ago, node, which is a CS term for an element of a linked 1088 00:45:44,250 --> 00:45:45,430 list, so to speak. 1089 00:45:45,430 --> 00:45:46,410 And it looks like this. 1090 00:45:46,410 --> 00:45:48,780 So typedef means, give me my own type. 1091 00:45:48,780 --> 00:45:51,120 Struct means, make it a structure, like a student was. 1092 00:45:51,120 --> 00:45:53,620 And then, node, which is going to be the name of this thing. 1093 00:45:53,620 --> 00:45:57,720 And I'll explain in a moment why I have the word node twice this time.
1094 00:45:57,720 --> 00:46:01,870 But I left room on the board for just one more line. 1095 00:46:01,870 --> 00:46:06,120 In addition to an int called n, or whatever, 1096 00:46:06,120 --> 00:46:09,450 I need to somehow represent in code the additional memory 1097 00:46:09,450 --> 00:46:11,890 that I want malloc to give me for the address. 1098 00:46:11,890 --> 00:46:14,910 So, first of all, these are addresses of what data type? 1099 00:46:14,910 --> 00:46:16,720 Each of those three new boxes. 1100 00:46:16,720 --> 00:46:17,636 AUDIENCE: [INAUDIBLE]. 1101 00:46:17,636 --> 00:46:21,060 DAVID J. MALAN: They are the addresses of integers at that point in the story. 1102 00:46:21,060 --> 00:46:26,820 But technically, what is this box really pointing to? 1103 00:46:26,820 --> 00:46:29,370 Is it pointing specifically to the ints? 1104 00:46:29,370 --> 00:46:30,539 AUDIENCE: [INAUDIBLE]. 1105 00:46:30,539 --> 00:46:33,580 DAVID J. MALAN: It's pointing to that whole chunk of memory, if you will. 1106 00:46:33,580 --> 00:46:37,020 So if you start thinking about each of these rectangles as being a node, 1107 00:46:37,020 --> 00:46:39,910 and each of the arrows as pointing to another node, 1108 00:46:39,910 --> 00:46:45,510 we need to somehow express-- I need to somehow store a pointer to a node. 1109 00:46:45,510 --> 00:46:48,510 In other words, each of these arrows needs to point to another node. 1110 00:46:48,510 --> 00:46:51,500 And in code, we could say this. 1111 00:46:51,500 --> 00:46:52,035 Right? 1112 00:46:52,035 --> 00:46:53,160 Like, let's give it a name. 1113 00:46:53,160 --> 00:46:55,990 Instead of n, which is the number, let's call it next. 1114 00:46:55,990 --> 00:46:59,940 So next shall be the name of this field that points to the next node in memory. 1115 00:46:59,940 --> 00:47:04,106 And node star, what does that mean in English, if you will? 1116 00:47:04,106 --> 00:47:05,147 AUDIENCE: [INAUDIBLE]. 1117 00:47:05,147 --> 00:47:06,230 DAVID J.
MALAN: Say again? 1118 00:47:06,230 --> 00:47:07,560 AUDIENCE: Pointing to an address. 1119 00:47:07,560 --> 00:47:08,560 DAVID J. MALAN: Pointing to an address. 1120 00:47:08,560 --> 00:47:08,730 Right? 1121 00:47:08,730 --> 00:47:09,540 It looks different. 1122 00:47:09,540 --> 00:47:11,550 Node is a new word today, and that's fine. 1123 00:47:11,550 --> 00:47:14,550 But node star just means a pointer to a node, 1124 00:47:14,550 --> 00:47:15,960 the address of a node. 1125 00:47:15,960 --> 00:47:18,750 And it turns out that this is a custom structure, 1126 00:47:18,750 --> 00:47:20,400 so we actually have to say this. 1127 00:47:20,400 --> 00:47:23,760 But it's the same principle, even though things are kind of escalating quickly 1128 00:47:23,760 --> 00:47:29,606 here: we just need two values, an int, and then a pointer to another thing. 1129 00:47:29,606 --> 00:47:31,480 That other thing is going to be another node. 1130 00:47:31,480 --> 00:47:35,160 And we're just using a node, frankly, to encapsulate two values-- 1131 00:47:35,160 --> 00:47:36,420 an int and a pointer. 1132 00:47:36,420 --> 00:47:39,075 And the way you express in C, albeit somewhat cryptically, 1133 00:47:39,075 --> 00:47:43,770 a pointer, or one of those arrows, is you say, give me a variable called next, 1134 00:47:43,770 --> 00:47:47,580 have it point to a structure called node. 1135 00:47:47,580 --> 00:47:51,930 Or rather, have it be the address of a structure of type node. 1136 00:47:51,930 --> 00:47:52,691 Yeah? 1137 00:47:52,691 --> 00:47:56,619 AUDIENCE: How can you [? reveal ?] the timing of struct node [INAUDIBLE]? 1138 00:47:56,619 --> 00:48:00,321 1139 00:48:00,321 --> 00:48:01,570 DAVID J. MALAN: Good question. 1140 00:48:01,570 --> 00:48:06,430 So this feels like a circular kind of definition, because I'm defining a node, 1141 00:48:06,430 --> 00:48:08,980 and yet, inside of a node is a node. 1142 00:48:08,980 --> 00:48:11,860 That is OK because of the star.
1143 00:48:11,860 --> 00:48:14,350 It is necessary in C-- 1144 00:48:14,350 --> 00:48:18,040 remember that C is always kind of read top to bottom. 1145 00:48:18,040 --> 00:48:22,630 So accordingly, this very first line of code here, typedef struct node, 1146 00:48:22,630 --> 00:48:25,570 at that point in the story, when clang has read that line, 1147 00:48:25,570 --> 00:48:28,821 it knows that a phrase, struct node, exists. 1148 00:48:28,821 --> 00:48:30,820 AUDIENCE: That's why you say nodes [INAUDIBLE]. 1149 00:48:30,820 --> 00:48:32,031 DAVID J. MALAN: Exactly. 1150 00:48:32,031 --> 00:48:32,530 Exactly. 1151 00:48:32,530 --> 00:48:34,150 We didn't need to do this with students, because there were 1152 00:48:34,150 --> 00:48:36,400 no pointers involved to other students. 1153 00:48:36,400 --> 00:48:37,700 But yes, in this case. 1154 00:48:37,700 --> 00:48:42,160 So in short, this tells clang, hey, clang, give me a structure called node. 1155 00:48:42,160 --> 00:48:45,130 And then, in here, we say, hey, clang, each of those nodes 1156 00:48:45,130 --> 00:48:47,800 shall have two things, an integer called n, 1157 00:48:47,800 --> 00:48:52,300 and a pointer to another one of these data structures of type node, 1158 00:48:52,300 --> 00:48:55,780 and call the whole thing node. 1159 00:48:55,780 --> 00:48:56,870 It's a bit of a mouthful. 1160 00:48:56,870 --> 00:48:58,610 But all this is, is the following. 1161 00:48:58,610 --> 00:49:00,460 Let me go ahead and erase all of this. 1162 00:49:00,460 --> 00:49:03,190 All this data type is-- 1163 00:49:03,190 --> 00:49:07,360 if we get rid of the picture we drew on the fly there-- 1164 00:49:07,360 --> 00:49:10,750 is this: it says, hey, clang, give me a data structure 1165 00:49:10,750 --> 00:49:12,880 that pictorially looks like this. 1166 00:49:12,880 --> 00:49:14,600 It's divided into two parts. 1167 00:49:14,600 --> 00:49:18,280 The first part is called n, the second part is called next.
1168 00:49:18,280 --> 00:49:20,380 This part is of type int. 1169 00:49:20,380 --> 00:49:24,090 This is a pointer to another such node. 1170 00:49:24,090 --> 00:49:24,910 And that's it. 1171 00:49:24,910 --> 00:49:28,180 Even though the code looks complex, the idea is exactly that. 1172 00:49:28,180 --> 00:49:29,472 Yeah? 1173 00:49:29,472 --> 00:49:31,952 AUDIENCE: [INAUDIBLE]? 1174 00:49:31,952 --> 00:49:34,930 Why do you have to say struct node again? 1175 00:49:34,930 --> 00:49:37,450 DAVID J. MALAN: Good question. 1176 00:49:37,450 --> 00:49:42,220 The reason is, as just came up a moment ago, clang 1177 00:49:42,220 --> 00:49:43,870 and C, in general, are kind of dumb. 1178 00:49:43,870 --> 00:49:45,970 They just read code top to bottom. 1179 00:49:45,970 --> 00:49:49,450 And the problem is, you have to declare the name of this structure 1180 00:49:49,450 --> 00:49:52,870 as being a struct node before you actually use it. 1181 00:49:52,870 --> 00:49:55,930 It's similar in spirit to our discussion of prototypes-- why functions need 1182 00:49:55,930 --> 00:49:57,580 to be mentioned way up top. 1183 00:49:57,580 --> 00:50:00,940 This just says to clang, give me a type called struct node. 1184 00:50:00,940 --> 00:50:02,990 You don't know what it's going to look like yet, 1185 00:50:02,990 --> 00:50:05,380 but I'll finish my thought later. 1186 00:50:05,380 --> 00:50:08,770 And then in here, we're just telling clang, inside of that node 1187 00:50:08,770 --> 00:50:12,680 should be an integer, as well as a pointer to the very type of thing 1188 00:50:12,680 --> 00:50:14,050 I'm in the middle of defining. 1189 00:50:14,050 --> 00:50:17,350 But if I had left off the word node up there, and just said struct, 1190 00:50:17,350 --> 00:50:21,730 you couldn't do that, because clang hasn't seen the word N-O-D-E yet. 1191 00:50:21,730 --> 00:50:22,750 That's all. 1192 00:50:22,750 --> 00:50:24,650 Other questions? 1193 00:50:24,650 --> 00:50:25,150 All right.
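Here is the board's `node` definition as a compilable sketch, plus two hypothetical helpers (`make_node` and `free_list`, names assumed for illustration). The tag `struct node` on the first line is what lets the `next` member mention the type before the typedef name `node` exists. The star matters too: a member of type `struct node` (no star) would be rejected, since a struct cannot contain a complete copy of itself, whereas a pointer has a fixed size. `make_node` sets `next` to `NULL` instead of leaving it as a garbage value.

```c
#include <stdlib.h>

// "struct node" is named on the first line so that the member `next`
// can refer to it before the typedef name `node` exists (at the end).
typedef struct node
{
    int n;
    struct node *next;   // a pointer to another node, not a whole node
}
node;

// Allocate one node holding n, with next set to NULL rather than
// left as a garbage value. Returns NULL if malloc fails.
node *make_node(int n)
{
    node *new_node = malloc(sizeof(node));
    if (new_node != NULL)
    {
        new_node->n = n;
        new_node->next = NULL;
    }
    return new_node;
}

// Free every node in a list, saving each next pointer before
// freeing the node that holds it.
void free_list(node *first)
{
    while (first != NULL)
    {
        node *next = first->next;
        free(first);
        first = next;
    }
}
```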
1194 00:50:25,150 --> 00:50:29,770 So if I now have a data structure called node, 1195 00:50:29,770 --> 00:50:32,497 I can use it to kind of stitch together these linked lists. 1196 00:50:32,497 --> 00:50:34,330 And maybe just to vary things a little bit, 1197 00:50:34,330 --> 00:50:37,150 and to start giving away some ducks, would folks 1198 00:50:37,150 --> 00:50:40,680 be comfortable volunteering to solve a problem here? 1199 00:50:40,680 --> 00:50:41,180 Yeah? 1200 00:50:41,180 --> 00:50:41,350 OK. 1201 00:50:41,350 --> 00:50:42,150 Come on up. 1202 00:50:42,150 --> 00:50:44,488 1, 2-- 1203 00:50:44,488 --> 00:50:45,940 AUDIENCE: [INAUDIBLE]. 1204 00:50:45,940 --> 00:50:46,889 DAVID J. MALAN: Sure. 1205 00:50:46,889 --> 00:50:48,180 Or you can take a duck and run. 1206 00:50:48,180 --> 00:50:48,680 OK. 1207 00:50:48,680 --> 00:50:49,940 1, 2, how about 3? 1208 00:50:49,940 --> 00:50:51,230 Come on over here, 3. 1209 00:50:51,230 --> 00:50:54,770 So if you want to be our first pointer, you can be number 5. 1210 00:50:54,770 --> 00:50:55,610 Come on over here. 1211 00:50:55,610 --> 00:50:57,740 You want to be number 9. 1212 00:50:57,740 --> 00:50:58,610 And one more. 1213 00:50:58,610 --> 00:50:59,541 One more volunteer. 1214 00:50:59,541 --> 00:51:00,290 Come on over here. 1215 00:51:00,290 --> 00:51:01,400 Yeah. 1216 00:51:01,400 --> 00:51:02,030 All right. 1217 00:51:02,030 --> 00:51:08,860 So-- I'll meet you over here. 1218 00:51:08,860 --> 00:51:10,140 OK, 17. 1219 00:51:10,140 --> 00:51:10,710 All right. 1220 00:51:10,710 --> 00:51:11,744 So if you'd like to-- 1221 00:51:11,744 --> 00:51:14,160 just so we pick this up for those following along at home, 1222 00:51:14,160 --> 00:51:16,326 if you would like to just say hello to the audience. 1223 00:51:16,326 --> 00:51:17,528 ANDREA: Hi, I'm Andrea. 1224 00:51:17,528 --> 00:51:19,760 [? COMEY: ?] Hi, [? I'm Comey. ?] 1225 00:51:19,760 --> 00:51:21,496 [? KYONG: ?] Hi, [? I'm Kyong. ?]
1226 00:51:21,496 --> 00:51:22,917 SPEAKER 2: Hi, I'm [INAUDIBLE]. 1227 00:51:22,917 --> 00:51:24,000 DAVID J. MALAN: Wonderful. 1228 00:51:24,000 --> 00:51:24,270 OK. 1229 00:51:24,270 --> 00:51:26,820 If you wouldn't mind all just taking a big step back over the ducks, 1230 00:51:26,820 --> 00:51:28,540 just so that we're a little farther back. 1231 00:51:28,540 --> 00:51:29,790 Let's go ahead and do this. 1232 00:51:29,790 --> 00:51:33,060 If you're our first pointer, if you could come over here for instance, 1233 00:51:33,060 --> 00:51:34,470 and just stand outside the ducks. 1234 00:51:34,470 --> 00:51:37,570 And if you guys could come a little over here in front is still fine. 1235 00:51:37,570 --> 00:51:40,186 So here we have the makings of a linked list. 1236 00:51:40,186 --> 00:51:41,310 And what's your name again? 1237 00:51:41,310 --> 00:51:42,351 [? COMEY: ?] [? Comey. ?] 1238 00:51:42,351 --> 00:51:45,120 DAVID J. MALAN: [? Comey ?] is our first pointer if you will. 1239 00:51:45,120 --> 00:51:47,191 Via [? Comey's ?] variable are we just going 1240 00:51:47,191 --> 00:51:49,440 to keep track of the first element of the linked list. 1241 00:51:49,440 --> 00:51:52,564 So if you could, with your left hand, represent first. 1242 00:51:52,564 --> 00:51:54,480 Just point over at-- what was your name again? 1243 00:51:54,480 --> 00:51:55,140 ANDREA: Andrea. 1244 00:51:55,140 --> 00:51:56,890 DAVID J. MALAN: So Andrea is the number 9. 1245 00:51:56,890 --> 00:51:59,310 If you could use your left hand to point at number 5. 1246 00:51:59,310 --> 00:52:02,640 And if you could use your left hand, yep, to point at number 17. 1247 00:52:02,640 --> 00:52:05,766 And your left hand to just point at null, which we'll just call the ground. 1248 00:52:05,766 --> 00:52:07,556 So you don't want to just point it randomly 1249 00:52:07,556 --> 00:52:10,620 because that would be like following a bogus pointer, so here means null. 1250 00:52:10,620 --> 00:52:11,120 All right. 
1251 00:52:11,120 --> 00:52:12,960 So this is a linked list. 1252 00:52:12,960 --> 00:52:15,900 All you need to store a linked list of three values 1253 00:52:15,900 --> 00:52:19,410 is three nodes, inside of which are three integers, 1254 00:52:19,410 --> 00:52:22,930 and their left hands represent that next pointer, so to speak. 1255 00:52:22,930 --> 00:52:25,920 [? Comey's ?] a little different, in that she's not holding a value. 1256 00:52:25,920 --> 00:52:27,210 She's not holding an integer. 1257 00:52:27,210 --> 00:52:31,710 Rather, she's holding just the name of the variable, first. 1258 00:52:31,710 --> 00:52:34,210 So you're the only one that's different here fundamentally. 1259 00:52:34,210 --> 00:52:36,610 So suppose I want to insert the number 20? 1260 00:52:36,610 --> 00:52:38,470 Could someone volunteer to be number 20? 1261 00:52:38,470 --> 00:52:38,970 OK. 1262 00:52:38,970 --> 00:52:40,690 Come on up. 1263 00:52:40,690 --> 00:52:41,640 All right. 1264 00:52:41,640 --> 00:52:43,401 And what's your name? 1265 00:52:43,401 --> 00:52:43,900 ERIC: Eric. 1266 00:52:43,900 --> 00:52:44,160 DAVID J. MALAN: Eric. 1267 00:52:44,160 --> 00:52:45,720 Eric, you're the number 20. 1268 00:52:45,720 --> 00:52:47,655 And Eric, actually, let's see. 1269 00:52:47,655 --> 00:52:50,760 1270 00:52:50,760 --> 00:52:52,290 Actually, can we do this? 1271 00:52:52,290 --> 00:52:57,461 Let me give-- let me make this a little more different. 1272 00:52:57,461 --> 00:52:57,960 OK. 1273 00:52:57,960 --> 00:52:59,020 That never happened. 1274 00:52:59,020 --> 00:52:59,670 OK. 1275 00:52:59,670 --> 00:53:02,580 Eric, give me that, please. 1276 00:53:02,580 --> 00:53:04,530 I want to insert Eric as number 5. 1277 00:53:04,530 --> 00:53:06,674 So Eric, I'm keeping this list sorted. 1278 00:53:06,674 --> 00:53:08,340 So where, obviously, are you going to go? 1279 00:53:08,340 --> 00:53:09,540 ERIC: Go right there. 1280 00:53:09,540 --> 00:53:09,810 DAVID J. MALAN: All right.
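As a minimal C sketch of the structure the volunteers are acting out, assume a node type along the lines of `struct node { int n; struct node *next; }`. The field names `n` and `next` follow the transcript's "n" and "next pointer"; the helper names `make_node`, `list_length`, and `free_list` are hypothetical. Each node holds one integer plus one pointer, while a separate `first` pointer holds no value of its own:

```c
#include <stdlib.h>

// One node: an integer plus a pointer to the next node (NULL marks the end).
typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: allocates a node holding value with next set to NULL.
// Returns NULL if malloc fails.
node *make_node(int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node != NULL)
    {
        new_node->n = value;
        new_node->next = NULL;
    }
    return new_node;
}

// Hypothetical helper: follows next pointers from first until NULL, counting nodes.
int list_length(const node *first)
{
    int count = 0;
    for (const node *ptr = first; ptr != NULL; ptr = ptr->next)
    {
        count++;
    }
    return count;
}

// Hypothetical helper: frees every node, saving each next pointer before the free.
void free_list(node *first)
{
    while (first != NULL)
    {
        node *next = first->next;
        free(first);
        first = next;
    }
}
```

For example, `node *first = make_node(9); first->next = make_node(22);` links two nodes, and `list_length(first)` would then return 2. Setting the last node's `next` to `NULL` plays the role of pointing at the ground.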
1281 00:53:09,810 --> 00:53:12,851 But before you do that, let's just consider what this looks like in code. 1282 00:53:12,851 --> 00:53:17,460 In code, presumably, we have malloced Eric from the audience. 1283 00:53:17,460 --> 00:53:20,250 I've given him a value, n, of the number 5. 1284 00:53:20,250 --> 00:53:23,550 And his left hand is a garbage value right now, because it's not 1285 00:53:23,550 --> 00:53:25,170 pointing to anything specific. 1286 00:53:25,170 --> 00:53:28,680 So he's got two values-- an integer, and a left hand representing 1287 00:53:28,680 --> 00:53:30,030 the next pointer. 1288 00:53:30,030 --> 00:53:34,600 If the goal is to put Eric in sorted order, 1289 00:53:34,600 --> 00:53:36,210 what should our steps be? 1290 00:53:36,210 --> 00:53:38,690 Like, whose hand should point where, and in what order? 1291 00:53:38,690 --> 00:53:39,190 Yeah. 1292 00:53:39,190 --> 00:53:39,900 Give us one step. 1293 00:53:39,900 --> 00:53:41,595 AUDIENCE: You should point to number 9. 1294 00:53:41,595 --> 00:53:43,720 DAVID J. MALAN: OK, so you should point at number 9, 1295 00:53:43,720 --> 00:53:46,666 which is equivalent to saying, point at whatever first points at, 1296 00:53:46,666 --> 00:53:48,040 whatever [? Comey ?] is pointing at. 1297 00:53:48,040 --> 00:53:49,180 So go ahead and do that. 1298 00:53:49,180 --> 00:53:50,050 All right, next? 1299 00:53:50,050 --> 00:53:50,550 What's the next step? 1300 00:53:50,550 --> 00:53:51,091 Someone else? 1301 00:53:51,091 --> 00:53:54,279 1302 00:53:54,279 --> 00:53:54,820 Someone else. 1303 00:53:54,820 --> 00:53:55,420 Almost there. 1304 00:53:55,420 --> 00:53:55,920 Yeah? 1305 00:53:55,920 --> 00:53:57,337 AUDIENCE: First should point to 5. 1306 00:53:57,337 --> 00:53:58,128 DAVID J. MALAN: OK. 1307 00:53:58,128 --> 00:54:00,154 So first, or [? Comey, ?] could you point to 5? 1308 00:54:00,154 --> 00:54:00,820 And that's fine. 1309 00:54:00,820 --> 00:54:01,750 You don't even have to move.
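The two steps the audience proposed can be sketched as a front-insertion helper, again assuming the same hypothetical `node` type and `make_node` helper. Passing `node **first` (a pointer to the `first` variable) lets the function update `first` itself. The order of the two assignments is the point, since moving `first` before saving its old target would lose the rest of the list:

```c
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: allocates a node holding value with next set to NULL.
node *make_node(int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node != NULL)
    {
        new_node->n = value;
        new_node->next = NULL;
    }
    return new_node;
}

// Hypothetical helper: inserts new_node at the front of the list in O(1).
void prepend(node **first, node *new_node)
{
    // Step 1: the new node points at whatever first currently points at.
    new_node->next = *first;
    // Step 2: only now does first point at the new node. Doing this first
    // would discard the only pointer to the rest of the list.
    *first = new_node;
}
```

Calling `prepend(&first, make_node(5))` on a list that starts at 9 leaves `first` pointing at 5, whose `next` points at 9. Because it always takes exactly two assignments regardless of the list's length, this is also the constant-time, O(1) insertion discussed a little later, provided the list does not need to stay sorted.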
1310 00:54:01,750 --> 00:54:01,930 Right? 1311 00:54:01,930 --> 00:54:03,430 This is the beauty of a linked list. 1312 00:54:03,430 --> 00:54:05,474 It doesn't matter where you are in memory; 1313 00:54:05,474 --> 00:54:08,140 that's the whole beauty of these pointers, where you can literally 1314 00:54:08,140 --> 00:54:09,454 point at that other location. 1315 00:54:09,454 --> 00:54:12,370 It's not an array where they need to be standing back to back to back. 1316 00:54:12,370 --> 00:54:13,660 They can be pointing anywhere. 1317 00:54:13,660 --> 00:54:14,160 All right. 1318 00:54:14,160 --> 00:54:15,760 Let's go ahead and insert one more. 1319 00:54:15,760 --> 00:54:17,436 Who wants to be, say, 55? 1320 00:54:17,436 --> 00:54:17,935 Big value. 1321 00:54:17,935 --> 00:54:18,435 Yeah. 1322 00:54:18,435 --> 00:54:20,930 Come on down. 1323 00:54:20,930 --> 00:54:21,430 All right. 1324 00:54:21,430 --> 00:54:21,730 What's your name? 1325 00:54:21,730 --> 00:54:22,620 [? KYONG: ?] [? Kyong. ?] 1326 00:54:22,620 --> 00:54:23,350 DAVID J. MALAN: [? Kyong. ?] OK. 1327 00:54:23,350 --> 00:54:24,290 So come on over. 1328 00:54:24,290 --> 00:54:26,498 So we've just malloced [? Kyong ?] from the audience. 1329 00:54:26,498 --> 00:54:28,600 I've given him his n value of 55. 1330 00:54:28,600 --> 00:54:31,570 His left hand is just some garbage value right now. 1331 00:54:31,570 --> 00:54:34,360 How do we insert [? Kyong ?] in the right order? 1332 00:54:34,360 --> 00:54:36,999 Where is he obviously supposed to go? 1333 00:54:36,999 --> 00:54:39,040 In sorted order, he obviously belongs at the end. 1334 00:54:39,040 --> 00:54:42,220 But here's the catch with a linked list. 1335 00:54:42,220 --> 00:54:45,160 Just like when we've discussed searching and sorting in the past, 1336 00:54:45,160 --> 00:54:48,550 the computer is pretty blind to all but just one value.
1337 00:54:48,550 --> 00:54:50,150 And in the linked list, at the moment-- 1338 00:54:50,150 --> 00:54:52,810 like, I don't know that these three, these four, exist. 1339 00:54:52,810 --> 00:54:55,150 All I really know is that [? Comey ?] exists, 1340 00:54:55,150 --> 00:54:58,600 because this first pointer is the only access 1341 00:54:58,600 --> 00:55:00,100 to the rest of the elements. 1342 00:55:00,100 --> 00:55:03,250 And so what's cool about a linked list, but perhaps not obvious, 1343 00:55:03,250 --> 00:55:04,450 is that you only-- 1344 00:55:04,450 --> 00:55:06,702 the most important value is the first, 1345 00:55:06,702 --> 00:55:09,160 because from the first value, you can get to everyone else. 1346 00:55:09,160 --> 00:55:12,190 It's not useful-- excuse me-- for me to remember 1347 00:55:12,190 --> 00:55:14,770 Andrea alone, because if I do, I've just 1348 00:55:14,770 --> 00:55:18,340 lost track of [? Comey ?] and, more importantly, of number 5, 1349 00:55:18,340 --> 00:55:18,934 Eric. 1350 00:55:18,934 --> 00:55:21,100 So all I really have to remember is [? Comey. ?] 1351 00:55:21,100 --> 00:55:27,240 So if the goal now is to insert number 55, what steps should come first? 1352 00:55:27,240 --> 00:55:28,484 No pun intended. 1353 00:55:28,484 --> 00:55:29,400 AUDIENCE: [INAUDIBLE]. 1354 00:55:29,400 --> 00:55:30,483 DAVID J. MALAN: Say again? 1355 00:55:30,483 --> 00:55:31,909 AUDIENCE: Finding the first space. 1356 00:55:31,909 --> 00:55:32,700 DAVID J. MALAN: OK. 1357 00:55:32,700 --> 00:55:33,490 Finding the first space. 1358 00:55:33,490 --> 00:55:36,615 So I'm going to start at [? Comey, ?] and I'm going to follow this pointer. 1359 00:55:36,615 --> 00:55:39,011 Number 5, does 55 belong here? 1360 00:55:39,011 --> 00:55:39,510 No. 1361 00:55:39,510 --> 00:55:42,240 So I'm going to follow this pointer and get to Andrea. 1362 00:55:42,240 --> 00:55:43,461 Does 55 belong here? 1363 00:55:43,461 --> 00:55:43,960 No.
1364 00:55:43,960 --> 00:55:46,931 I'm going to follow her pointer. 22, does it belong here? 1365 00:55:46,931 --> 00:55:47,430 No. 1366 00:55:47,430 --> 00:55:48,810 I follow this pointer, 26? 1367 00:55:48,810 --> 00:55:49,350 No. 1368 00:55:49,350 --> 00:55:51,000 But you have a free hand, it turns out. 1369 00:55:51,000 --> 00:55:52,874 So what step should come next? 1370 00:55:52,874 --> 00:55:54,850 AUDIENCE: [INAUDIBLE]. 1371 00:55:54,850 --> 00:55:58,600 DAVID J. MALAN: We could have you point at 55, and now we're done. 1372 00:55:58,600 --> 00:56:02,702 So relatively simple, but what was the running time of this? 1373 00:56:02,702 --> 00:56:04,047 AUDIENCE: [INAUDIBLE]. 1374 00:56:04,047 --> 00:56:05,380 DAVID J. MALAN: It's big O of n. 1375 00:56:05,380 --> 00:56:06,220 It's linear, 1376 00:56:06,220 --> 00:56:08,530 because I had to start at the beginning, even though we 1377 00:56:08,530 --> 00:56:10,210 humans have the luxury of just eyeballing it, 1378 00:56:10,210 --> 00:56:11,860 saying, oh, obviously, he belongs way at the end. 1379 00:56:11,860 --> 00:56:12,400 Mm-mm. 1380 00:56:12,400 --> 00:56:13,150 Not in code. 1381 00:56:13,150 --> 00:56:16,108 Like, we have to start at the beginning to traverse the whole darn list, 1382 00:56:16,108 --> 00:56:17,770 until we get linearly to the very end. 1383 00:56:17,770 --> 00:56:18,860 And now we're done. 1384 00:56:18,860 --> 00:56:20,080 Let's try one last one. 1385 00:56:20,080 --> 00:56:21,890 How about 20? 1386 00:56:21,890 --> 00:56:22,550 Yeah. 1387 00:56:22,550 --> 00:56:22,780 Great. 1388 00:56:22,780 --> 00:56:23,330 Come on down. 1389 00:56:23,330 --> 00:56:24,049 What's your name? 1390 00:56:24,049 --> 00:56:24,590 JAMES: James. 1391 00:56:24,590 --> 00:56:25,506 DAVID J. MALAN: James. 1392 00:56:25,506 --> 00:56:26,230 All right, James. 1393 00:56:26,230 --> 00:56:26,570 All right. 1394 00:56:26,570 --> 00:56:28,750 So we just malloced James and given him the number 20.
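Appending at the end, as with 55, means walking from `first` until some node's `next` is `NULL`. Here is a minimal sketch under the same hypothetical `node` type, with a hypothetical `append` helper. The `while` loop is the linear, O(n) part, because the code cannot eyeball the end the way the audience can:

```c
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: allocates a node holding value with next set to NULL.
node *make_node(int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node != NULL)
    {
        new_node->n = value;
        new_node->next = NULL;
    }
    return new_node;
}

// Hypothetical helper: appends new_node after the current last node.
void append(node **first, node *new_node)
{
    // The new node will be last, so it points at "the ground".
    new_node->next = NULL;
    if (*first == NULL)
    {
        // Empty list: the new node is also the first node.
        *first = new_node;
        return;
    }
    // Walk node by node until the one whose next is NULL; this is O(n).
    node *ptr = *first;
    while (ptr->next != NULL)
    {
        ptr = ptr->next;
    }
    // The old last node's free hand now points at the new node.
    ptr->next = new_node;
}
```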
1395 00:56:28,750 --> 00:56:30,541 He obviously belongs roughly in the middle. 1396 00:56:30,541 --> 00:56:32,500 What's the first step? 1397 00:56:32,500 --> 00:56:33,460 AUDIENCE: [INAUDIBLE]. 1398 00:56:33,460 --> 00:56:34,514 DAVID J. MALAN: Sorry? 1399 00:56:34,514 --> 00:56:35,430 AUDIENCE: [INAUDIBLE]. 1400 00:56:35,430 --> 00:56:35,860 DAVID J. MALAN: All right. 1401 00:56:35,860 --> 00:56:36,980 So we start with [? Comey ?] again. 1402 00:56:36,980 --> 00:56:37,479 All right. 1403 00:56:37,479 --> 00:56:37,980 First. OK. 1404 00:56:37,980 --> 00:56:39,220 5, do you belong here? 1405 00:56:39,220 --> 00:56:40,132 No. 1406 00:56:40,132 --> 00:56:41,090 Let me follow the link. 1407 00:56:41,090 --> 00:56:42,370 OK, 9, do you belong here? 1408 00:56:42,370 --> 00:56:43,060 No. 1409 00:56:43,060 --> 00:56:44,822 Do you belong at 22? Ooh. 1410 00:56:44,822 --> 00:56:46,030 But what did I just do wrong? 1411 00:56:46,030 --> 00:56:48,740 1412 00:56:48,740 --> 00:56:49,582 I went too far, 1413 00:56:49,582 --> 00:56:50,540 at least in this story. 1414 00:56:50,540 --> 00:56:52,330 Like, I literally-- Andrea is behind me now. 1415 00:56:52,330 --> 00:56:52,830 OK. 1416 00:56:52,830 --> 00:56:55,360 So can I follow the pointer backwards? 1417 00:56:55,360 --> 00:56:55,900 You can't. 1418 00:56:55,900 --> 00:56:58,000 Like in every picture we've drawn, and every example 1419 00:56:58,000 --> 00:57:00,400 we've done with an address, we only have the address of the next node. 1420 00:57:00,400 --> 00:57:03,632 We don't have what's called a doubly linked list, at least in this story, 1421 00:57:03,632 --> 00:57:04,840 where I can just turn around. 1422 00:57:04,840 --> 00:57:05,590 So that was a bug. 1423 00:57:05,590 --> 00:57:06,923 So I need to start over instead.
1424 00:57:06,923 --> 00:57:09,100 First. OK, 5. OK, 9-- 1425 00:57:09,100 --> 00:57:11,200 what I really need in code, ultimately, is 1426 00:57:11,200 --> 00:57:14,200 to kind of peek ahead and not actually move-- not that far. 1427 00:57:14,200 --> 00:57:15,180 Just to 22. 1428 00:57:15,180 --> 00:57:19,250 Peek ahead at 22 and realize, oh, that's going to be too far. 1429 00:57:19,250 --> 00:57:20,810 This is not yet far enough. 1430 00:57:20,810 --> 00:57:22,654 So let's go ahead and bring James over. 1431 00:57:22,654 --> 00:57:24,570 Well, actually, you can stay there physically. 1432 00:57:24,570 --> 00:57:26,230 But what step has to happen first? 1433 00:57:26,230 --> 00:57:29,999 I know now he belongs here. 1434 00:57:29,999 --> 00:57:31,040 You want to point at him? 1435 00:57:31,040 --> 00:57:31,540 OK. 1436 00:57:31,540 --> 00:57:32,294 Point at him. 1437 00:57:32,294 --> 00:57:32,794 ANDREA: Oh. 1438 00:57:32,794 --> 00:57:34,190 I'm sorry, he points first. 1439 00:57:34,190 --> 00:57:35,940 DAVID J. MALAN: Well, let's do that, just because it is incorrect. 1440 00:57:35,940 --> 00:57:36,470 That's fine. 1441 00:57:36,470 --> 00:57:36,970 OK. 1442 00:57:36,970 --> 00:57:40,330 Andrea proposed that we point here, but she just broke the whole linked list. 1443 00:57:40,330 --> 00:57:40,830 Why? 1444 00:57:40,830 --> 00:57:42,664 ANDREA: Because there's nothing to point at. 1445 00:57:42,664 --> 00:57:43,580 DAVID J. MALAN: Right. 1446 00:57:43,580 --> 00:57:45,110 No one is remembering-- what was your name again? 1447 00:57:45,110 --> 00:57:45,470 [? KYONG: ?] [? Kyong. ?] 1448 00:57:45,470 --> 00:57:47,140 DAVID J. MALAN: No one's remembered where [? Kyong ?] was. 1449 00:57:47,140 --> 00:57:47,660 So you can't do that. 1450 00:57:47,660 --> 00:57:49,034 Your left hand has to stay there. 1451 00:57:49,034 --> 00:57:50,802 So what step should happen first instead? 1452 00:57:50,802 --> 00:57:52,220 AUDIENCE: [INAUDIBLE].
1453 00:57:52,220 --> 00:57:54,590 DAVID J. MALAN: James should point at whatever 1454 00:57:54,590 --> 00:57:56,280 Andrea is pointing at, perhaps? 1455 00:57:56,280 --> 00:57:58,614 So, a little redundantly at the moment, just like before. 1456 00:57:58,614 --> 00:57:59,113 OK. 1457 00:57:59,113 --> 00:58:00,060 Now what happens next? 1458 00:58:00,060 --> 00:58:00,726 That's step one. 1459 00:58:00,726 --> 00:58:01,882 ANDREA: Now I can point. 1460 00:58:01,882 --> 00:58:03,590 DAVID J. MALAN: Now you can point at him. 1461 00:58:03,590 --> 00:58:04,089 OK. 1462 00:58:04,089 --> 00:58:05,040 You could do that. 1463 00:58:05,040 --> 00:58:05,540 All right. 1464 00:58:05,540 --> 00:58:08,360 And so now, this looks like a complete mess, 1465 00:58:08,360 --> 00:58:10,790 but if we know that [? Comey ?] is first, 1466 00:58:10,790 --> 00:58:16,910 we can follow the breadcrumbs to Eric, and then to Andrea, and then to James, 1467 00:58:16,910 --> 00:58:20,810 and then the rest of our list, step by step by step. 1468 00:58:20,810 --> 00:58:23,270 So it's a huge amount of logic now. 1469 00:58:23,270 --> 00:58:24,712 But what problem have we solved? 1470 00:58:24,712 --> 00:58:26,670 And I think we identified it over here earlier. 1471 00:58:26,670 --> 00:58:29,632 What was the problem, first and foremost, with arrays? 1472 00:58:29,632 --> 00:58:30,880 AUDIENCE: [INAUDIBLE]. 1473 00:58:30,880 --> 00:58:33,640 DAVID J. MALAN: You have to decide on their size in advance. 1474 00:58:33,640 --> 00:58:36,890 And once you do that, if you want to add an additional element, 1475 00:58:36,890 --> 00:58:38,611 you have to resize the whole darn thing, 1476 00:58:38,611 --> 00:58:41,110 which is expensive because you have to move everyone around. 1477 00:58:41,110 --> 00:58:43,239 Now, frankly, I'm being a little greedy here.
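Inserting into the middle, as with 20, combines both lessons. First, peek ahead at `ptr->next` instead of stepping onto the next node, so you never overshoot and need to move backward. Second, set the new node's pointer before changing the old one. Here is a sketch of a sorted insert, assuming the same hypothetical `node` type and helper names:

```c
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: allocates a node holding value with next set to NULL.
node *make_node(int value)
{
    node *new_node = malloc(sizeof(node));
    if (new_node != NULL)
    {
        new_node->n = value;
        new_node->next = NULL;
    }
    return new_node;
}

// Hypothetical helper: inserts new_node so the list stays in ascending order.
void insert_sorted(node **first, node *new_node)
{
    // Empty list, or the new value belongs before the current first node.
    if (*first == NULL || new_node->n < (*first)->n)
    {
        new_node->next = *first;
        *first = new_node;
        return;
    }
    // Peek ahead: stop at the last node whose successor is missing or too big,
    // so we never walk past the insertion point.
    node *ptr = *first;
    while (ptr->next != NULL && ptr->next->n < new_node->n)
    {
        ptr = ptr->next;
    }
    // Step 1: the new node points at ptr's successor (the rest of the list).
    new_node->next = ptr->next;
    // Step 2: only then does ptr point at the new node.
    ptr->next = new_node;
}
```

Inserting 22, 9, 26, 5, 55, and 20, in that order, into an empty list should yield 5, 9, 20, 22, 26, 55. Swapping the last two assignments reproduces the bug from the demo: `ptr->next` would already point at the new node, so the new node would end up pointing at itself and the rest of the list would be lost.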
1478 00:58:43,239 --> 00:58:45,280 And every time we've inserted these new elements, 1479 00:58:45,280 --> 00:58:46,904 I've been keeping them in sorted order. 1480 00:58:46,904 --> 00:58:50,530 So it would seem that if you insert things in sorted order, it's big O of n 1481 00:58:50,530 --> 00:58:51,070 every time, 1482 00:58:51,070 --> 00:58:52,600 because in the worst case, the new element 1483 00:58:52,600 --> 00:58:54,280 might end up all the way at the end. 1484 00:58:54,280 --> 00:58:55,840 But what if we relax that constraint? 1485 00:58:55,840 --> 00:58:59,950 What if I'm not so uptight and need everything nice and orderly and sorted? 1486 00:58:59,950 --> 00:59:02,950 What if I just want to keep growing the list in any random order? 1487 00:59:02,950 --> 00:59:05,200 And I allocate the number 34. 1488 00:59:05,200 --> 00:59:06,560 And I'll play the number 34. 1489 00:59:06,560 --> 00:59:08,090 Malloc 34. 1490 00:59:08,090 --> 00:59:11,710 Where is the quickest place for me to go? 1491 00:59:11,710 --> 00:59:12,490 Yeah? 1492 00:59:12,490 --> 00:59:14,180 AUDIENCE: Point to 5, and then have [INAUDIBLE]. 1493 00:59:14,180 --> 00:59:14,430 DAVID J. MALAN: OK. 1494 00:59:14,430 --> 00:59:16,450 I'll point to 5, and then, [? Comey, ?] if you could point to me. 1495 00:59:16,450 --> 00:59:17,290 Done. 1496 00:59:17,290 --> 00:59:18,530 One-- well, two steps. 1497 00:59:18,530 --> 00:59:19,030 All right. 1498 00:59:19,030 --> 00:59:22,384 Suppose now I malloc 17 with someone else, who we'll 1499 00:59:22,384 --> 00:59:23,300 pretend is right here. 1500 00:59:23,300 --> 00:59:25,744 Where's the best place for 17 to go? 1501 00:59:25,744 --> 00:59:26,950 AUDIENCE: [INAUDIBLE]. 1502 00:59:26,950 --> 00:59:28,783 DAVID J. MALAN: Right after [? Comey, ?] too. 1503 00:59:28,783 --> 00:59:33,320 So now, [? Comey ?] can point at 17, 17 can point at me, I can point at Eric, 1504 00:59:33,320 --> 00:59:34,490 and so forth.
1505 00:59:34,490 --> 00:59:35,740 And that's two steps again. 1506 00:59:35,740 --> 00:59:38,073 Two steps-- if it's the same number of steps every time, 1507 00:59:38,073 --> 00:59:39,310 we call that constant time. 1508 00:59:39,310 --> 00:59:41,440 And we write it as big O of 1. 1509 00:59:41,440 --> 00:59:43,120 And so here, too, it's just a trade-off. 1510 00:59:43,120 --> 00:59:46,570 If you want really fast insertions, don't worry about sorting. 1511 00:59:46,570 --> 00:59:48,820 Just put them at the beginning and deal with it later. 1512 00:59:48,820 --> 00:59:52,904 If you want dynamic resizability, don't use an array; use a linked list, 1513 00:59:52,904 --> 00:59:55,570 and just keep allocating more and more as you go without wasting 1514 00:59:55,570 --> 00:59:56,819 a huge amount of space, too, 1515 00:59:56,819 --> 00:59:59,110 which, notice, is another big problem with an array. 1516 00:59:59,110 --> 01:00:02,930 If you overallocate space and only use part of it, you're just wasting space. 1517 01:00:02,930 --> 01:00:04,404 So there's no one solution here. 1518 01:00:04,404 --> 01:00:06,820 But we do now have the capability, thanks to structs 1519 01:00:06,820 --> 01:00:11,440 and pointers, to stitch together, if you will, solutions to these new problems. 1520 01:00:11,440 --> 01:00:12,145 Yes, please. 1521 01:00:12,145 --> 01:00:13,936 SPEAKER 2: Why can't the node [INAUDIBLE]? 1522 01:00:13,936 --> 01:00:17,360 1523 01:00:17,360 --> 01:00:19,464 DAVID J. MALAN: And who am I in this story? 1524 01:00:19,464 --> 01:00:20,432 SPEAKER 2: [INAUDIBLE]. 1525 01:00:20,432 --> 01:00:21,390 DAVID J. MALAN: Oh, OK. 1526 01:00:21,390 --> 01:00:22,000 Absolutely. 1527 01:00:22,000 --> 01:00:24,330 So another very reasonable idea would be, well, 1528 01:00:24,330 --> 01:00:26,610 why don't we just put the new ones at the end? 1529 01:00:26,610 --> 01:00:30,220 That's fine if I keep track of who is at the end.
1530 01:00:30,220 --> 01:00:32,250 The problem is, at the moment in the story, 1531 01:00:32,250 --> 01:00:35,416 and we'll ultimately see this in code, I'm only remembering [? Comey, ?] and 1532 01:00:35,416 --> 01:00:37,800 from [? Comey ?] I get everywhere else. 1533 01:00:37,800 --> 01:00:40,230 I could have another pointer, a second pointer, 1534 01:00:40,230 --> 01:00:42,750 and literally call it last, that's equivalent to you, 1535 01:00:42,750 --> 01:00:44,400 or that's always pointing at you. 1536 01:00:44,400 --> 01:00:46,860 Then I just need two pointers, one literally called first, 1537 01:00:46,860 --> 01:00:47,943 one literally called last. 1538 01:00:47,943 --> 01:00:48,630 That's fine. 1539 01:00:48,630 --> 01:00:52,230 That's a nice optimization if I want to throw all the elements at the end. 1540 01:00:52,230 --> 01:00:54,570 And frankly, I could get really fancy-- 1541 01:00:54,570 --> 01:00:57,300 and to solve the problem that Andrea cited earlier-- 1542 01:00:57,300 --> 01:01:00,870 if I store not just an int and a pointer, but instead, 1543 01:01:00,870 --> 01:01:03,570 an int and two pointers, I can even have each 1544 01:01:03,570 --> 01:01:05,850 of these guys pointing with their left and right hands 1545 01:01:05,850 --> 01:01:10,530 in a doubly linked list, so as to solve the problem Andrea identified, which 1546 01:01:10,530 --> 01:01:12,580 was, if I go too far, no big deal. 1547 01:01:12,580 --> 01:01:13,535 I can take one step back. 1548 01:01:13,535 --> 01:01:15,840 I don't have to think as hard about that logic. 1549 01:01:15,840 --> 01:01:17,220 So there, too, a trade-off. 1550 01:01:17,220 --> 01:01:18,510 Let's go ahead and take a five-minute break. 1551 01:01:18,510 --> 01:01:19,200 I'll turn on some music. 1552 01:01:19,200 --> 01:01:20,250 Grab a duck now, if you'd like. 1553 01:01:20,250 --> 01:01:22,650 And we'll return with some fancier data structures still. 1554 01:01:22,650 --> 01:01:23,950 Thanks.
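Both ideas from this answer, a second `last` pointer and a second "hand" per node, can be sketched together as a doubly linked list. The type names `dnode` and `dlist` and the helper names are hypothetical. Because `last` is remembered, appending takes a constant number of steps instead of a walk from `first`, and `prev` lets code step back one node after going too far:

```c
#include <stdlib.h>

// Each node has two "hands": prev (toward the front) and next (toward the back).
typedef struct dnode
{
    int n;
    struct dnode *prev;
    struct dnode *next;
}
dnode;

// The list remembers both ends, so appending needs no traversal.
typedef struct
{
    dnode *first;
    dnode *last;
}
dlist;

// Hypothetical helper: allocates a node with both pointers set to NULL.
dnode *make_dnode(int value)
{
    dnode *new_node = malloc(sizeof(dnode));
    if (new_node != NULL)
    {
        new_node->n = value;
        new_node->prev = NULL;
        new_node->next = NULL;
    }
    return new_node;
}

// Hypothetical helper: O(1) append, linking after last and then moving last.
void dlist_append(dlist *list, dnode *new_node)
{
    new_node->next = NULL;
    new_node->prev = list->last;
    if (list->last == NULL)
    {
        // Empty list: the new node is both first and last.
        list->first = new_node;
    }
    else
    {
        list->last->next = new_node;
    }
    list->last = new_node;
}
```

The trade-off is the one named above: each node now costs an extra pointer, and every insertion has to keep both `prev` and `next` consistent.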
1555 01:01:23,950 --> 01:01:24,450 All right. 1556 01:01:24,450 --> 01:01:24,949 We're back. 1557 01:01:24,949 --> 01:01:27,577 So let's now translate some of these ideas to code, 1558 01:01:27,577 --> 01:01:29,910 so that we can actually solve this problem a little more 1559 01:01:29,910 --> 01:01:32,770 concretely than just having humans pointing at each other. 1560 01:01:32,770 --> 01:01:34,830 So for instance, let's try to distill everything 1561 01:01:34,830 --> 01:01:37,350 we've been talking about into just a goal in code 1562 01:01:37,350 --> 01:01:39,180 of storing a list of numbers. 1563 01:01:39,180 --> 01:01:42,630 I would propose that we can take three passes at this problem. 1564 01:01:42,630 --> 01:01:45,622 The first would be, let's just decide in advance how many numbers we 1565 01:01:45,622 --> 01:01:47,580 want to store so we don't have to deal with all 1566 01:01:47,580 --> 01:01:50,550 this complexity with the pointing and the pointers and all this, 1567 01:01:50,550 --> 01:01:53,730 and just hard-code that value somehow, and just stop 1568 01:01:53,730 --> 01:01:56,670 when the user has inputted that many numbers and no more. 1569 01:01:56,670 --> 01:02:01,410 Two, we can improve upon that and at least let the user dynamically resize 1570 01:02:01,410 --> 01:02:02,250 their array, 1571 01:02:02,250 --> 01:02:05,040 so that if they decide to input more numbers than we intend, 1572 01:02:05,040 --> 01:02:06,720 it's going to grow to deal with that. 1573 01:02:06,720 --> 01:02:08,670 Of course, arrays are not necessarily ideal 1574 01:02:08,670 --> 01:02:11,550 because they have to do all that damn copying from old to new. 1575 01:02:11,550 --> 01:02:12,600 That's linear time. 1576 01:02:12,600 --> 01:02:14,905 It would seem smartest to get to version 3, which 1577 01:02:14,905 --> 01:02:16,530 is actually going to use a linked list.
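The third version, one node per number, can be sketched as a loop that mallocs a node for each integer read and links it on. To keep the sketch self-contained and testable, it reads with `fscanf` from a `FILE *` rather than the CS50 library's `get_int`, stops at the first non-integer (such as `q`) or at end of input, and keeps a `last` pointer so each append is O(1). The names `read_list` and `free_list` are hypothetical, not from the lecture:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node
{
    int n;
    struct node *next;
}
node;

// Hypothetical helper: reads integers until input ends or a non-integer appears,
// mallocing one node per number and linking it on in input order.
// Stops early if malloc fails, returning whatever was read so far.
node *read_list(FILE *in)
{
    node *first = NULL;
    node *last = NULL;
    int number;
    while (fscanf(in, "%d", &number) == 1)
    {
        node *new_node = malloc(sizeof(node));
        if (new_node == NULL)
        {
            break;
        }
        new_node->n = number;
        new_node->next = NULL;
        if (first == NULL)
        {
            first = new_node;      // first number: it starts the list
        }
        else
        {
            last->next = new_node; // otherwise, link it after the old last node
        }
        last = new_node;
    }
    return first;
}

// Hypothetical helper: frees every node, saving each next pointer before the free.
void free_list(node *first)
{
    while (first != NULL)
    {
        node *next = first->next;
        free(first);
        first = next;
    }
}
```

Passing `stdin` as `in` reads from the keyboard; typing `4 8 15 q` would produce the list 4, 8, 15.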
1578 01:02:16,530 --> 01:02:20,670 So we're just more modestly allocating space for another number, 1579 01:02:20,670 --> 01:02:23,490 and another number, and another number, or really a node, 1580 01:02:23,490 --> 01:02:25,150 one number at a time. 1581 01:02:25,150 --> 01:02:27,190 So let me go ahead and start as follows. 1582 01:02:27,190 --> 01:02:33,420 I'm going to go ahead and include some familiar lines in list0.c: 1583 01:02:33,420 --> 01:02:36,900 cs50.h, from the CS50 library, just to make it easy to get some user input for this, 1584 01:02:36,900 --> 01:02:38,970 and stdio.h, for printf. 1585 01:02:38,970 --> 01:02:42,264 And let me go ahead and declare my main function as usual. 1586 01:02:42,264 --> 01:02:44,180 And then, in here, let's do a couple of things. 1587 01:02:44,180 --> 01:02:48,630 First, let's ask the user for the capacity of the array 1588 01:02:48,630 --> 01:02:49,650 that we're going to use. 1589 01:02:49,650 --> 01:02:50,941 Or rather, let's do this first. 1590 01:02:50,941 --> 01:02:53,250 Let me first rewind and say, you know what? 1591 01:02:53,250 --> 01:02:55,290 int numbers[50]. 1592 01:02:55,290 --> 01:02:58,080 Well, it's going to be annoying to type in 50 numbers. 1593 01:02:58,080 --> 01:03:01,610 We're going to give the user two numbers at first that he or she can type in. 1594 01:03:01,610 --> 01:03:05,660 Next, let's go ahead and prompt the user for those numbers. 1595 01:03:05,660 --> 01:03:09,107 So let me go ahead and say-- 1596 01:03:09,107 --> 01:03:09,690 let's do this. 1597 01:03:09,690 --> 01:03:13,500 Let's at least clean this up a little bit so that we can reuse this value, 1598 01:03:13,500 --> 01:03:14,910 so we don't have a magic number. 1599 01:03:14,910 --> 01:03:16,830 This just came up in discussion, actually. 1600 01:03:16,830 --> 01:03:20,721 So while-- do I want to do that? 1601 01:03:20,721 --> 01:03:21,220 Nope. 1602 01:03:21,220 --> 01:03:22,140 Let me fix this.
1603 01:03:22,140 --> 01:03:24,817 This will be my capacity, of size 2. 1604 01:03:24,817 --> 01:03:26,400 And that's going to give me that size. 1605 01:03:26,400 --> 01:03:28,710 And then, I'm going to keep track of how many integers 1606 01:03:28,710 --> 01:03:30,624 I've prompted the user for so far. 1607 01:03:30,624 --> 01:03:33,040 So initially, the size of this structure is going to be 0, 1608 01:03:33,040 --> 01:03:35,280 but its capacity, so to speak, is 2. 1609 01:03:35,280 --> 01:03:36,960 So size means how many things are in it. 1610 01:03:36,960 --> 01:03:39,370 Capacity means how many things can be in it. 1611 01:03:39,370 --> 01:03:43,200 And while the size of the structure is less than its capacity, 1612 01:03:43,200 --> 01:03:45,460 let's go ahead and get some input from the user. 1613 01:03:45,460 --> 01:03:49,740 Let's go ahead and ask them for a number, using our old friend, get_int, 1614 01:03:49,740 --> 01:03:51,450 and just say, give me a number. 1615 01:03:51,450 --> 01:03:54,810 And then, let me go ahead and insert the number 1616 01:03:54,810 --> 01:04:00,390 that they type in into this array at location size, like this. 1617 01:04:00,390 --> 01:04:02,970 And then, do size++, 1618 01:04:02,970 --> 01:04:03,660 I think. 1619 01:04:03,660 --> 01:04:05,410 You know, I wrote it pretty quickly. 1620 01:04:05,410 --> 01:04:07,200 But let's consider what I just did. 1621 01:04:07,200 --> 01:04:10,530 I initialized size to 0, because there's nothing in it initially. 1622 01:04:10,530 --> 01:04:13,980 Then I say, while size is less than the capacity of the whole thing-- 1623 01:04:13,980 --> 01:04:15,810 and capacity is 2 by default-- 1624 01:04:15,810 --> 01:04:17,100 go ahead and do the following. 1625 01:04:17,100 --> 01:04:19,460 Give me an int from the user. 1626 01:04:19,460 --> 01:04:19,960 OK. 1627 01:04:19,960 --> 01:04:21,720 So int number gets get_int.
1628 01:04:21,720 --> 01:04:25,920 Then, put at location size in my numbers array 1629 01:04:25,920 --> 01:04:28,020 whatever the human typed in, number. 1630 01:04:28,020 --> 01:04:30,850 And then, increment size with ++. 1631 01:04:30,850 --> 01:04:31,350 All right. 1632 01:04:31,350 --> 01:04:33,090 So on the first iteration, size is 0. 1633 01:04:33,090 --> 01:04:35,280 So numbers[0] gets the first number. 1634 01:04:35,280 --> 01:04:37,380 numbers[1] gets the second number. 1635 01:04:37,380 --> 01:04:39,190 Then, size equals capacity, 1636 01:04:39,190 --> 01:04:40,440 so it stops, logically. 1637 01:04:40,440 --> 01:04:44,531 Any questions on the logic of this code? 1638 01:04:44,531 --> 01:04:45,030 All right. 1639 01:04:45,030 --> 01:04:48,240 So once we have those numbers, let's just do something simple, 1640 01:04:48,240 --> 01:04:50,730 like for int i gets 0, 1641 01:04:50,730 --> 01:04:53,970 i is less than the actual size, i++. 1642 01:04:53,970 --> 01:05:00,210 Let's just go ahead and print out the number 1643 01:05:00,210 --> 01:05:07,300 you inputted, percent i, backslash n, and type out numbers[i]. All 1644 01:05:07,300 --> 01:05:07,800 right. 1645 01:05:07,800 --> 01:05:12,780 So if I made no typos in list0.c, I'm going to go ahead 1646 01:05:12,780 --> 01:05:16,180 and run dot, slash, list 0. I'm going to be prompted for a couple of numbers. 1647 01:05:16,180 --> 01:05:18,690 Let's go ahead and do 1, 2. 1648 01:05:18,690 --> 01:05:19,991 You inputted 1, you inputted 2. 1649 01:05:19,991 --> 01:05:20,490 All right. 1650 01:05:20,490 --> 01:05:21,360 So not bad. 1651 01:05:21,360 --> 01:05:23,820 But this is arguably bad design. Why? 1652 01:05:23,820 --> 01:05:27,160 1653 01:05:27,160 --> 01:05:29,610 Just find one fault. It's correct, 1654 01:05:29,610 --> 01:05:32,332 but bad design. 1655 01:05:32,332 --> 01:05:33,974 AUDIENCE: Repetitive.
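The version of list0.c just described, with its `while (size < capacity)` loop and the `for` loop that prints each value, can be sketched as two functions. This is an approximation, not the lecture's file. It reads with `fscanf` from a `FILE *` instead of `get_int`, so it needs no CS50 library, and it stops early if input ends, whereas `get_int` would reprompt. The names `fill_numbers` and `print_numbers` are hypothetical:

```c
#include <stdio.h>

// Hypothetical helper: reads up to capacity integers into numbers, mirroring
// the while (size < capacity) loop; returns how many were actually stored.
int fill_numbers(FILE *in, int numbers[], int capacity)
{
    int size = 0;
    while (size < capacity)
    {
        int number;
        printf("Number: ");
        if (fscanf(in, "%d", &number) != 1)
        {
            break;  // input ended or wasn't an integer; stop early
        }
        numbers[size] = number;
        size++;
    }
    return size;
}

// Hypothetical helper: prints each stored number, like the lecture's for loop.
void print_numbers(const int numbers[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("You inputted %i\n", numbers[i]);
    }
}
```

In a `main`, `int numbers[2]; int size = fill_numbers(stdin, numbers, 2); print_numbers(numbers, size);` mirrors the hard-coded capacity of 2.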
1656 01:05:33,974 --> 01:05:36,890 DAVID J. MALAN: Repetitive, because I'm using a couple of loops, sure. 1657 01:05:36,890 --> 01:05:39,560 And it's fundamentally-- it's very limited in functionality. 1658 01:05:39,560 --> 01:05:40,160 Why? 1659 01:05:40,160 --> 01:05:42,297 Like, how useful is this program? 1660 01:05:42,297 --> 01:05:43,760 AUDIENCE: It's hard-coded at 2. 1661 01:05:43,760 --> 01:05:43,917 DAVID J. MALAN: Yeah. 1662 01:05:43,917 --> 01:05:44,930 It's hard-coded at 2. 1663 01:05:44,930 --> 01:05:47,210 So let's at least improve upon this a little bit 1664 01:05:47,210 --> 01:05:48,650 and get rid of this hard-coding. 1665 01:05:48,650 --> 01:05:52,110 Why don't I at least ask the user for something like this? 1666 01:05:52,110 --> 01:05:56,857 Well, instead of just declaring the capacity, let me go ahead and say, 1667 01:05:56,857 --> 01:05:57,440 you know what? 1668 01:05:57,440 --> 01:05:58,580 Let's just replace the 2. 1669 01:05:58,580 --> 01:06:01,651 get_int, and just say capacity, for instance. 1670 01:06:01,651 --> 01:06:02,150 All right. 1671 01:06:02,150 --> 01:06:05,360 And now if I do this, I'm going to be prompted-- 1672 01:06:05,360 --> 01:06:07,600 so, make list 0. 1673 01:06:07,600 --> 01:06:10,040 Dot slash list 0. 1674 01:06:10,040 --> 01:06:12,020 The capacity will be 2. 1675 01:06:12,020 --> 01:06:14,190 1, 2, that's nice. 1676 01:06:14,190 --> 01:06:17,120 But if I run it again and give it a capacity of 3-- 1677 01:06:17,120 --> 01:06:21,266 1, 2, 3, I get more capacity. 1678 01:06:21,266 --> 01:06:21,890 So that's nice. 1679 01:06:21,890 --> 01:06:23,240 It's an improvement, for sure. 1680 01:06:23,240 --> 01:06:25,160 There is a bug here. 1681 01:06:25,160 --> 01:06:33,134 Before I test it further, can anyone identify a bug or somehow crash this? 1682 01:06:33,134 --> 01:06:34,050 AUDIENCE: [INAUDIBLE]. 1683 01:06:34,050 --> 01:06:34,904 DAVID J. MALAN: Oh, go ahead.
1684 01:06:34,904 --> 01:06:36,571 AUDIENCE: If you don't input an integer. 1685 01:06:36,571 --> 01:06:38,320 DAVID J. MALAN: If I don't input an integer. 1686 01:06:38,320 --> 01:06:39,844 Or-- is that the same comment up here? 1687 01:06:39,844 --> 01:06:42,219 AUDIENCE: I was going to say, what happens if you go back 1688 01:06:42,219 --> 01:06:47,419 and put in [INAUDIBLE] those other [INAUDIBLE] will be in the memory. 1689 01:06:47,419 --> 01:06:48,210 DAVID J. MALAN: Oh. 1690 01:06:48,210 --> 01:06:48,530 No, 1691 01:06:48,530 --> 01:06:50,113 because I'm rerunning it each time. 1692 01:06:50,113 --> 01:06:53,210 I don't need to worry about previous runs of the program. 1693 01:06:53,210 --> 01:06:53,832 Yeah? 1694 01:06:53,832 --> 01:06:56,160 AUDIENCE: In the for loop, it just goes 1, 1695 01:06:56,160 --> 01:06:59,786 2, 3, it doesn't actually care what you put in. 1696 01:06:59,786 --> 01:07:03,560 DAVID J. MALAN: [INAUDIBLE] 1, 2, 3-- well, I am iterating up to size, 1697 01:07:03,560 --> 01:07:04,640 which could be capacity, 1698 01:07:04,640 --> 01:07:06,650 because now they do end up being equivalent, 1699 01:07:06,650 --> 01:07:08,180 because I'm filling the whole thing. 1700 01:07:08,180 --> 01:07:08,780 But let's try this. 1701 01:07:08,780 --> 01:07:10,010 What if you don't type in a number? 1702 01:07:10,010 --> 01:07:12,670 So let me go ahead and rerun this. 1703 01:07:12,670 --> 01:07:15,871 My capacity shall be duck. 1704 01:07:15,871 --> 01:07:16,370 All right. 1705 01:07:16,370 --> 01:07:17,780 So we did handle that, 1706 01:07:17,780 --> 01:07:19,380 because get_int does that for me. 1707 01:07:19,380 --> 01:07:21,920 But I bet I can still break this. 1708 01:07:21,920 --> 01:07:24,260 Ooh, yeah, let's always try something negative. 1709 01:07:24,260 --> 01:07:25,160 Oh, OK. 1710 01:07:25,160 --> 01:07:25,750 So bad.
1711 01:07:25,750 --> 01:07:27,500 Like, a cryptic-looking message, but clearly it 1712 01:07:27,500 --> 01:07:28,980 has to do with a negative value. 1713 01:07:28,980 --> 01:07:31,260 So I should probably be a little smarter about this. 1714 01:07:31,260 --> 01:07:33,459 And recall from, like, Week 1, we did do this. 1715 01:07:33,459 --> 01:07:35,000 With Mario, you might have done this. 1716 01:07:35,000 --> 01:07:41,840 So I could do something like do ... while capacity is less than 1. 1717 01:07:41,840 --> 01:07:46,520 I could go ahead and say, capacity gets get_int of capacity. 1718 01:07:46,520 --> 01:07:50,100 So just a little bit of error checking to close the bug that you identified. 1719 01:07:50,100 --> 01:07:50,600 All right. 1720 01:07:50,600 --> 01:07:52,340 So let's go ahead and recompile this. 1721 01:07:52,340 --> 01:07:57,044 make list0-- oops, we're going to start hearing that a lot today. 1722 01:07:57,044 --> 01:07:57,960 Aren't we [INAUDIBLE]? 1723 01:07:57,960 --> 01:08:00,330 make list0, ./list0. 1724 01:08:00,330 --> 01:08:01,490 Capacity will be 3. 1725 01:08:01,490 --> 01:08:02,987 1, 2, 3. 1726 01:08:02,987 --> 01:08:04,320 Now capacity will be negative 1. 1727 01:08:04,320 --> 01:08:05,890 Doesn't allow it. 1728 01:08:05,890 --> 01:08:07,410 Capacity 0, doesn't allow it. 1729 01:08:07,410 --> 01:08:09,300 Capacity 1, yes. 1730 01:08:09,300 --> 01:08:11,010 So non-exhaustively, I've tested it. 1731 01:08:11,010 --> 01:08:12,240 It feels like it's in better shape. 1732 01:08:12,240 --> 01:08:12,740 OK. 1733 01:08:12,740 --> 01:08:16,410 But this program, while correct, and while more featureful, 1734 01:08:16,410 --> 01:08:18,390 still has this fundamental limit. 1735 01:08:18,390 --> 01:08:21,779 Wouldn't it be nice to allow the user to just keep typing numbers, 1736 01:08:21,779 --> 01:08:26,410 as many as they want, and then quit once they're done inputting numbers? 1737 01:08:26,410 --> 01:08:26,910 Right?
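The do ... while re-prompt described above might look something like this in C. This is a minimal sketch with some assumptions. It reads from a `FILE *` using `fscanf`, where the lecture uses CS50's `get_int`, so that it runs without the CS50 library. `read_capacity` is a hypothetical name. Also, `get_int` re-prompts on non-numeric input such as "duck", but this version simply gives up and returns -1.

```c
#include <stdio.h>

// Hypothetical helper mirroring the lecture's do ... while loop:
// keep reading integers until one is at least 1, then return it.
// Returns -1 if input ends or is not a number (unlike get_int,
// which would re-prompt instead).
int read_capacity(FILE *in)
{
    int capacity;
    do
    {
        if (fscanf(in, "%d", &capacity) != 1)
        {
            return -1;
        }
    }
    while (capacity < 1);
    return capacity;
}
```

Given the inputs -1, 0, and 3 in turn, this skips the first two and returns 3. That matches the behavior shown in the demo.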
1738 01:08:26,910 --> 01:08:29,100 If you're making a program to compute someone's GPA, 1739 01:08:29,100 --> 01:08:31,350 different students might have taken different courses, 1740 01:08:31,350 --> 01:08:33,930 and you don't want to make them type in all 32 courses 1741 01:08:33,930 --> 01:08:35,680 if they're younger and haven't taken all those courses. 1742 01:08:35,680 --> 01:08:38,471 Like, there are a lot of scenarios where you don't know in advance how 1743 01:08:38,471 --> 01:08:40,319 many numbers the user wants to provide. 1744 01:08:40,319 --> 01:08:43,950 But you want to support a few numbers, lots of numbers, or beyond. 1745 01:08:43,950 --> 01:08:46,319 So let's do this in a second version. 1746 01:08:46,319 --> 01:08:52,720 In list1.c, let me go ahead and improve upon that example as follows. 1747 01:08:52,720 --> 01:08:57,420 First, let me give my familiar friends up here, cs50.h, 1748 01:08:57,420 --> 01:09:02,910 stdio.h, and then, in here, int main(void). 1749 01:09:02,910 --> 01:09:05,130 And then, let's start writing this. 1750 01:09:05,130 --> 01:09:10,986 So now, I don't know in advance, necessarily, how many numbers the user 1751 01:09:10,986 --> 01:09:11,819 is going to type in. 1752 01:09:11,819 --> 01:09:13,819 Like, the goal is, I want them to be able to type 1753 01:09:13,819 --> 01:09:16,140 in a number, another number, another number, and then 1754 01:09:16,140 --> 01:09:19,651 hit the equivalent of, like, q, for quit, when they're done inputting numbers. 1755 01:09:19,651 --> 01:09:22,859 Like, I don't want them to have to think in advance about how many numbers it 1756 01:09:22,859 --> 01:09:24,930 is they're inputting. 1757 01:09:24,930 --> 01:09:26,140 But how do I do that? 1758 01:09:26,140 --> 01:09:30,100 Like, I can't just come up with an array called numbers, and say, 50.
1759 01:09:30,100 --> 01:09:32,580 Because if the user wants to type in 51 numbers, 1760 01:09:32,580 --> 01:09:34,050 I'm going to have to resize that. 1761 01:09:34,050 --> 01:09:36,495 But how do you resize an array? 1762 01:09:36,495 --> 01:09:39,470 1763 01:09:39,470 --> 01:09:41,791 How do you resize an array? 1764 01:09:41,791 --> 01:09:43,010 AUDIENCE: [INAUDIBLE]. 1765 01:09:43,010 --> 01:09:43,270 DAVID J. MALAN: What's that? 1766 01:09:43,270 --> 01:09:44,069 AUDIENCE: You can't. 1767 01:09:44,069 --> 01:09:44,850 DAVID J. MALAN: You can't. 1768 01:09:44,850 --> 01:09:45,029 Right. 1769 01:09:45,029 --> 01:09:47,720 We've never seen an instance where you've resized an array. 1770 01:09:47,720 --> 01:09:49,470 We talked about it on the blackboard here: 1771 01:09:49,470 --> 01:09:52,700 well, just, like, allocate a bigger one and copy everything in. 1772 01:09:52,700 --> 01:09:54,720 And we did identify realloc. 1773 01:09:54,720 --> 01:09:57,600 But you can't actually use realloc on an array declared with square brackets. 1774 01:09:57,600 --> 01:10:01,350 realloc actually accepts the address of a chunk of memory 1775 01:10:01,350 --> 01:10:04,050 that you want to grow or shrink. 1776 01:10:04,050 --> 01:10:06,420 So it turns out, if we now start to harness 1777 01:10:06,420 --> 01:10:10,470 the sort of fundamental definition of what an array is, a chunk of memory, 1778 01:10:10,470 --> 01:10:13,500 we can actually build arrays ourselves. 1779 01:10:13,500 --> 01:10:16,710 If an array is just a chunk of memory, or more specifically, 1780 01:10:16,710 --> 01:10:21,000 it's like the address of the first byte of a chunk of memory, 1781 01:10:21,000 --> 01:10:25,620 it would seem that I could declare my array, not with square brackets 1782 01:10:25,620 --> 01:10:27,940 as we've been doing for weeks, but I can say, 1783 01:10:27,940 --> 01:10:31,020 you know what numbers really is? It's really just a pointer.
1784 01:10:31,020 --> 01:10:33,570 And I'm initially going to initialize it to NULL. 1785 01:10:33,570 --> 01:10:35,470 Because there is no array. 1786 01:10:35,470 --> 01:10:37,950 But now I have the ability to point that pointer 1787 01:10:37,950 --> 01:10:41,010 at any chunk of memory, small or big. 1788 01:10:41,010 --> 01:10:42,160 Now why is this useful? 1789 01:10:42,160 --> 01:10:44,460 Well, initially let me claim that my capacity is 0, 1790 01:10:44,460 --> 01:10:45,780 because nothing's going on yet. 1791 01:10:45,780 --> 01:10:47,820 I haven't called malloc or anything. 1792 01:10:47,820 --> 01:10:52,080 And initially, my size is 0 because there's nothing in the array. 1793 01:10:52,080 --> 01:10:53,640 And it doesn't even have a size. 1794 01:10:53,640 --> 01:10:55,770 But let me just do this forever. 1795 01:10:55,770 --> 01:10:59,220 Much like the forever block in Scratch, you can use while (true) in C 1796 01:10:59,220 --> 01:11:02,850 to just say keep doing this until the user breaks out of this. 1797 01:11:02,850 --> 01:11:06,690 And let me go ahead and ask the user, give me a number, get_int. 1798 01:11:06,690 --> 01:11:09,060 And just ask them for a number. 1799 01:11:09,060 --> 01:11:11,580 And then, we just need a place to put that. 1800 01:11:11,580 --> 01:11:14,580 So where do I put this number? 1801 01:11:14,580 --> 01:11:18,000 Well, do I have, at the moment, any place to put the number? 1802 01:11:18,000 --> 01:11:18,600 No. 1803 01:11:18,600 --> 01:11:20,790 And technically speaking, how do you express that? 1804 01:11:20,790 --> 01:11:25,110 Like, in pseudocode, I want to say, if no place for number. 1805 01:11:25,110 --> 01:11:26,700 But technically, I could do this. 1806 01:11:26,700 --> 01:11:33,030 Well, if the size of the array at the moment equals its capacity, 1807 01:11:33,030 --> 01:11:36,360 that feels like a lower-level way of expressing the same thing.
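The idea that an array is just the address of the first byte of a chunk of memory can be sketched like this. A pointer returned by `malloc` can be indexed with square brackets exactly like an array. `make_sequence` is a hypothetical helper for illustration, not code from the lecture.

```c
#include <stdlib.h>

// Hypothetical helper: allocate room for `count` ints and fill it with
// 1, 2, ..., count. The return value is just a pointer to the first int,
// yet it can be indexed with square brackets like an array.
// Returns NULL if count is 0 or if malloc fails.
int *make_sequence(size_t count)
{
    if (count == 0)
    {
        return NULL;
    }
    int *numbers = malloc(sizeof(int) * count);
    if (numbers == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i < count; i++)
    {
        numbers[i] = (int) (i + 1); // same as *(numbers + i) = ...
    }
    return numbers;
}
```

The caller owns the returned memory and must eventually pass it to `free`.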
1808 01:11:36,360 --> 01:11:41,130 Whatever the capacity is, if the size is the same, there is no more room. 1809 01:11:41,130 --> 01:11:47,100 And that simple statement also covers the scenario where the capacity is 0 1810 01:11:47,100 --> 01:11:48,570 and the size is therefore 0. 1811 01:11:48,570 --> 01:11:50,280 So it's the same question. 1812 01:11:50,280 --> 01:11:52,590 Either we have no space at all, or we have some space 1813 01:11:52,590 --> 01:11:56,260 but we've used it all-- size equals equals capacity. 1814 01:11:56,260 --> 01:11:59,730 So if the size equals capacity, or, put more casually, 1815 01:11:59,730 --> 01:12:01,290 if I don't have enough space, 1816 01:12:01,290 --> 01:12:03,804 what do I want to do intuitively? 1817 01:12:03,804 --> 01:12:04,789 AUDIENCE: [INAUDIBLE]. 1818 01:12:04,789 --> 01:12:06,330 DAVID J. MALAN: Allocate more memory. 1819 01:12:06,330 --> 01:12:09,060 And it turns out, you proposed, or someone proposed earlier, 1820 01:12:09,060 --> 01:12:10,290 reallocating memory. 1821 01:12:10,290 --> 01:12:13,000 We can use this function for the very first time. 1822 01:12:13,000 --> 01:12:14,617 Let me go ahead and say this-- 1823 01:12:14,617 --> 01:12:16,950 the catch with realloc is you have to be smart about it, 1824 01:12:16,950 --> 01:12:18,158 because it returns a pointer. 1825 01:12:18,158 --> 01:12:19,980 So let me propose this code first. 1826 01:12:19,980 --> 01:12:23,250 First, just give me a temporary variable, call it temp, 1827 01:12:23,250 --> 01:12:25,760 that's going to store the following. 1828 01:12:25,760 --> 01:12:26,400 Actually, no. 1829 01:12:26,400 --> 01:12:28,710 Let me start this more simply. 1830 01:12:28,710 --> 01:12:34,640 Let me go ahead and say, numbers should be reallocated, please, 1831 01:12:34,640 --> 01:12:39,010 with realloc, by passing itself in.
1832 01:12:39,010 --> 01:12:43,140 And this time, give me the size of an int, times-- 1833 01:12:43,140 --> 01:12:47,160 how many ints do I want this time? 1834 01:12:47,160 --> 01:12:52,320 How many numbers did the human just input, presumably? 1835 01:12:52,320 --> 01:12:53,640 AUDIENCE: [INAUDIBLE]. 1836 01:12:53,640 --> 01:12:54,240 DAVID J. MALAN: Just one. 1837 01:12:54,240 --> 01:12:54,740 Right. 1838 01:12:54,740 --> 01:12:57,570 Because literally, we've only called get_int once in this story. 1839 01:12:57,570 --> 01:13:03,550 So whatever the size of this array is now, we need to increase it by 1. 1840 01:13:03,550 --> 01:13:04,470 That's all. 1841 01:13:04,470 --> 01:13:09,790 So this line of code here is saying, hey, computer, 1842 01:13:09,790 --> 01:13:15,630 go ahead and reallocate this array from whatever its current size is, 1843 01:13:15,630 --> 01:13:18,840 and make it this size instead: 1844 01:13:18,840 --> 01:13:22,542 the size of whatever it is, plus 1, times the size of an int. 1845 01:13:22,542 --> 01:13:24,750 Because what we're trying to store is an int. 1846 01:13:24,750 --> 01:13:26,430 So we have to do that multiplication. 1847 01:13:26,430 --> 01:13:28,759 And realloc, as mentioned earlier, is pretty fancy. 1848 01:13:28,759 --> 01:13:31,050 It's going to take a pointer, whatever chunk of memory 1849 01:13:31,050 --> 01:13:34,830 you've already allocated, and it's going to then reallocate 1850 01:13:34,830 --> 01:13:36,090 a bigger chunk of memory. 1851 01:13:36,090 --> 01:13:38,070 Hopefully, what's going to happen is this-- 1852 01:13:38,070 --> 01:13:41,520 if your chunk of memory initially looks like this, 1853 01:13:41,520 --> 01:13:44,239 it's going to hopefully notice, oh, this memory is free. 1854 01:13:44,239 --> 01:13:46,030 Let me just give you back the same address.
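The grow-by-one reallocation described above can be sketched as a small helper. `append` and its 0/1 return convention are assumptions of this sketch, not the lecture's code. The sketch stores realloc's result in a temporary pointer first. realloc returns NULL on failure and leaves the original block allocated, so assigning its result straight to `numbers` would lose the only pointer to that block.

```c
#include <stdlib.h>

// Hypothetical grow-by-one append: make room for exactly one more int,
// then store `value` at the end. Returns 0 on success, 1 on failure.
int append(int **numbers, size_t *size, int value)
{
    // realloc(NULL, n) behaves like malloc(n), so this also works
    // for the very first number, when *numbers is still NULL.
    int *temp = realloc(*numbers, sizeof(int) * (*size + 1));
    if (temp == NULL)
    {
        return 1; // *numbers is still valid; the caller must still free it
    }
    *numbers = temp; // may be the same address or a new, relocated one
    (*numbers)[*size] = value;
    (*size)++;
    return 0;
}
```

Usage sketch: start with `int *numbers = NULL; size_t size = 0;`, call `append(&numbers, &size, n)` for each input, and call `free(numbers)` when done.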
1855 01:13:46,030 --> 01:13:48,450 So if this is address 100, and you get lucky 1856 01:13:48,450 --> 01:13:51,550 and this address is also available, the realloc function's 1857 01:13:51,550 --> 01:13:53,550 going to remember that for the operating system. 1858 01:13:53,550 --> 01:13:55,200 It's going to return the number 100 again. 1859 01:13:55,200 --> 01:13:56,116 And you're good to go. 1860 01:13:56,116 --> 01:13:57,750 You can safely touch memory here. 1861 01:13:57,750 --> 01:14:03,390 Or suppose this chunk of memory is in use already, and therefore we 1862 01:14:03,390 --> 01:14:06,810 can't fit another byte there because some other code you wrote 1863 01:14:06,810 --> 01:14:08,080 is using that memory, 1864 01:14:08,080 --> 01:14:11,640 but there is twice as much memory available down here. 1865 01:14:11,640 --> 01:14:14,760 What realloc will do is, if you've stored the number 50, 1866 01:14:14,760 --> 01:14:18,040 it will handle the process of copying 50 to the new location. 1867 01:14:18,040 --> 01:14:20,820 The old copy is left behind as a garbage value, and realloc frees that old chunk for you. 1868 01:14:20,820 --> 01:14:24,970 And it's going to return to you the address of the new chunk of memory, 1869 01:14:24,970 --> 01:14:27,360 having done the copying for you. 1870 01:14:27,360 --> 01:14:30,640 So even though it's technically reallocating the array, 1871 01:14:30,640 --> 01:14:32,610 it's not necessarily just going to grow it. 1872 01:14:32,610 --> 01:14:36,030 It might relocate it in memory to a bigger chunk, 1873 01:14:36,030 --> 01:14:39,150 and then give you the new address of that memory. 1874 01:14:39,150 --> 01:14:39,951 Question? 1875 01:14:39,951 --> 01:14:42,060 AUDIENCE: Is that process really preferable 1876 01:14:42,060 --> 01:14:45,260 to just creating extra memory in its place, 1877 01:14:45,260 --> 01:14:50,104 and then saving the time and energy of reallocating them [? all at once. ?] 1878 01:14:50,104 --> 01:14:52,020 DAVID J.
MALAN: That's a really good question. 1879 01:14:52,020 --> 01:14:55,410 Honestly, we could avoid this problem somewhat by just doing, you know what, 1880 01:14:55,410 --> 01:14:57,960 give me at least-- 1881 01:14:57,960 --> 01:15:01,330 go ahead and give me at least the size of an int, times-- 1882 01:15:01,330 --> 01:15:04,330 I don't know, most humans are not going to type in more than 50 numbers. 1883 01:15:04,330 --> 01:15:05,460 Let's just pick 50. 1884 01:15:05,460 --> 01:15:08,582 So you could do this, and that would indeed save you time, 1885 01:15:08,582 --> 01:15:10,290 because the approach I'm currently taking 1886 01:15:10,290 --> 01:15:13,710 is pretty inefficient: every damn time the user 1887 01:15:13,710 --> 01:15:17,040 calls get_int and gives an int, we're resizing, resizing, resizing. 1888 01:15:17,040 --> 01:15:18,450 Very expensive. 1889 01:15:18,450 --> 01:15:20,250 As to what the best value is, though-- 1890 01:15:20,250 --> 01:15:20,910 50? 1891 01:15:20,910 --> 01:15:21,990 Should it be 25? 1892 01:15:21,990 --> 01:15:23,190 Should it be 1,000? 1893 01:15:23,190 --> 01:15:25,260 I'm either going to underbet or overbet. 1894 01:15:25,260 --> 01:15:30,240 And it's up to you to decide which of those is the worse decision. 1895 01:15:30,240 --> 01:15:32,610 AUDIENCE: But, like, in terms of programs, 1896 01:15:32,610 --> 01:15:36,150 is it also pretty expensive to have memory that you're not using, 1897 01:15:36,150 --> 01:15:37,980 or, generally, is it usually OK? 1898 01:15:37,980 --> 01:15:39,230 DAVID J. MALAN: Good question. 1899 01:15:39,230 --> 01:15:42,330 In programs you're writing, is it better to have more memory than you're using, 1900 01:15:42,330 --> 01:15:43,950 or should you really be conservative? 1901 01:15:43,950 --> 01:15:45,420 These days, memory is cheap. 1902 01:15:45,420 --> 01:15:47,280 We all have gigabytes of memory.
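There is a common compromise between growing by one on every input and betting on a fixed 50 up front. It is not the approach the lecture's list1 takes, so treat it as a swapped-in technique: double the capacity whenever the array fills up. With doubling, appending n numbers triggers only about log2(n) reallocations instead of n. The cost is that up to roughly half of the buffer can sit unused right after a resize. The `int_list` struct, `list_append`, and the `reallocs` counter are hypothetical, for illustration only.

```c
#include <stdlib.h>

// Hypothetical list with separate size (in use) and capacity (allocated).
typedef struct
{
    int *numbers;
    size_t size;      // ints currently stored
    size_t capacity;  // ints allocated
    size_t reallocs;  // realloc calls so far, only for comparison
} int_list;

// Append `value`, doubling the capacity when size reaches it.
// Returns 0 on success, 1 if realloc fails (list left unchanged).
int list_append(int_list *list, int value)
{
    if (list->size == list->capacity)
    {
        size_t new_capacity = (list->capacity == 0) ? 1 : list->capacity * 2;
        int *temp = realloc(list->numbers, sizeof(int) * new_capacity);
        if (temp == NULL)
        {
            return 1;
        }
        list->numbers = temp;
        list->capacity = new_capacity;
        list->reallocs++;
    }
    list->numbers[list->size] = value;
    list->size++;
    return 0;
}
```

For 1,000 appends the capacity goes 1, 2, 4, ..., 1024, which is 11 reallocations, compared with 1,000 for the grow-by-one approach.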
1903 01:15:47,280 --> 01:15:52,320 And so wasting 50 ints, or 200 bytes-- 50 times 4-- of memory, not a big deal. 1904 01:15:52,320 --> 01:15:54,780 Like, just get the job done quickly and easily. 1905 01:15:54,780 --> 01:15:57,960 But in resource-constrained devices, maybe, things like phones 1906 01:15:57,960 --> 01:16:00,120 or little Internet of Things-style devices 1907 01:16:00,120 --> 01:16:04,134 that have a lot fewer resources, you don't really want to go wasting bytes. 1908 01:16:04,134 --> 01:16:06,300 But honestly, the CPUs, the brains in our computers, 1909 01:16:06,300 --> 01:16:08,820 are so darned fast these days, even if you're calling malloc 1910 01:16:08,820 --> 01:16:11,550 10 times, 1,000 times, it's happening so darned fast 1911 01:16:11,550 --> 01:16:13,200 that the human doesn't even notice. 1912 01:16:13,200 --> 01:16:13,819 So there, too, 1913 01:16:13,819 --> 01:16:15,610 these are what are called design decisions. 1914 01:16:15,610 --> 01:16:17,560 And these are the kinds of things that, in the real world, 1915 01:16:17,560 --> 01:16:19,440 you might actually debate with someone at a whiteboard, 1916 01:16:19,440 --> 01:16:21,540 saying, no, this is stupid because of this reason. 1917 01:16:21,540 --> 01:16:23,498 Or he or she might push back for other reasons. 1918 01:16:23,498 --> 01:16:25,110 And no one's necessarily right. 1919 01:16:25,110 --> 01:16:27,650 The whole goal is to just go through that thought process first 1920 01:16:27,650 --> 01:16:29,130 so you're at least confident in what you chose. 1921 01:16:29,130 --> 01:16:29,630 Yeah? 1922 01:16:29,630 --> 01:16:32,525 AUDIENCE: When we were writing to a file in the last pset, 1923 01:16:32,525 --> 01:16:36,900 was it storing it in memory first or putting it right on the hard drive? 1924 01:16:36,900 --> 01:16:39,630 DAVID J.
MALAN: When you were calling fread, 1925 01:16:39,630 --> 01:16:42,870 you were, by definition, in the forensics problem set, 1926 01:16:42,870 --> 01:16:45,990 reading bytes from disk into memory. 1927 01:16:45,990 --> 01:16:51,800 When you were calling fwrite, you were copying bytes from memory back to disk. 1928 01:16:51,800 --> 01:16:53,070 If that answers the question. 1929 01:16:53,070 --> 01:16:53,320 OK. 1930 01:16:53,320 --> 01:16:53,830 Other questions? 1931 01:16:53,830 --> 01:16:54,484 Yeah? 1932 01:16:54,484 --> 01:16:58,570 AUDIENCE: Why did you say size + 1 in line 16? 1933 01:16:58,570 --> 01:17:01,180 DAVID J. MALAN: Why do I say size + 1 in line 16? 1934 01:17:01,180 --> 01:17:06,010 Because the whole goal is to make room in this array for the newly inputted 1935 01:17:06,010 --> 01:17:08,080 number that the human just typed in. 1936 01:17:08,080 --> 01:17:10,240 And so whatever the current size of the array is, 1937 01:17:10,240 --> 01:17:11,770 I clearly need one more space. 1938 01:17:11,770 --> 01:17:14,475 1939 01:17:14,475 --> 01:17:17,062 AUDIENCE: So that repeats on and on? 1940 01:17:17,062 --> 01:17:18,770 DAVID J. MALAN: It does repeat on and on, 1941 01:17:18,770 --> 01:17:22,620 because at the moment, I'm inside of this while loop. 1942 01:17:22,620 --> 01:17:26,219 So we do need to ask a question: when is the human done inputting? 1943 01:17:26,219 --> 01:17:28,010 And it turns out-- and this is not obvious. 1944 01:17:28,010 --> 01:17:31,310 And it's not the best user experience on a keyboard for the human. 1945 01:17:31,310 --> 01:17:35,900 But we can actually detect the following sentiment-- 1946 01:17:35,900 --> 01:17:42,380 if the user is done inputting numbers, then let's go ahead and break. 1947 01:17:42,380 --> 01:17:45,530 But the question then is, how do you express that pseudocode? 1948 01:17:45,530 --> 01:17:49,610 Well, you could in some programs maybe type q for quit.
1949 01:17:49,610 --> 01:17:52,610 But is that going to work when using get_int? 1950 01:17:52,610 --> 01:17:54,740 Could we detect q? 1951 01:17:54,740 --> 01:17:55,250 Why not? 1952 01:17:55,250 --> 01:17:58,722 1953 01:17:58,722 --> 01:18:01,700 AUDIENCE: Because get_int immediately prompts you for another integer. 1954 01:18:01,700 --> 01:18:02,290 DAVID J. MALAN: Exactly. 1955 01:18:02,290 --> 01:18:04,310 Because get_int immediately prompts you for another int. 1956 01:18:04,310 --> 01:18:07,280 So because of the way we designed the CS50 library, you can't detect q, 1957 01:18:07,280 --> 01:18:11,690 or you can't have the human type quit, unless you don't use get_int. 1958 01:18:11,690 --> 01:18:14,072 You instead use? 1959 01:18:14,072 --> 01:18:15,846 AUDIENCE: get_string. 1960 01:18:15,846 --> 01:18:17,470 DAVID J. MALAN: We could use get_string. 1961 01:18:17,470 --> 01:18:20,230 And then every time the human types in a number, we could use, 1962 01:18:20,230 --> 01:18:23,140 like, atoi to convert it to an int. 1963 01:18:23,140 --> 01:18:26,020 But if the human types in q or Q-U-I-T-- 1964 01:18:26,020 --> 01:18:30,190 a string also-- we could just have an if condition with strcmp and quit. 1965 01:18:30,190 --> 01:18:32,800 But honestly, then you're reimplementing get_int-- 1966 01:18:32,800 --> 01:18:34,030 so trade-off. 1967 01:18:34,030 --> 01:18:36,490 Anyhow, a common way to work around this would 1968 01:18:36,490 --> 01:18:39,610 be this. You know that Control-C quits programs, or, perhaps, 1969 01:18:39,610 --> 01:18:41,050 cancels out of your program. 1970 01:18:41,050 --> 01:18:43,870 There's another popular keystroke, Control-D, 1971 01:18:43,870 --> 01:18:45,760 that sends what's called end of file. 1972 01:18:45,760 --> 01:18:48,240 It simulates the end of a file. 1973 01:18:48,240 --> 01:18:50,161 It simulates the end of the human's input. 1974 01:18:50,161 --> 01:18:52,910 So it's kind of like the period at the end of an English sentence.
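In standard C, without the CS50 library, end of file can be detected through the return value of scanf-family functions. `fscanf` returns the number of items it matched, or `EOF` once input ends, for example after Control-D at the start of a line on a Unix-like terminal. The sketch below collects ints until then, growing the array by one as list1 does. `read_numbers` is a hypothetical name. It also stops at the first non-numeric token, which is a simplification.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical standard-C take on list1: read ints from `in` until
// fscanf stops matching (EOF, or a non-number), growing by one each time.
// Returns the array (NULL if nothing was read) and sets *count.
// On allocation failure, frees what it had, sets *count to 0, returns NULL.
int *read_numbers(FILE *in, size_t *count)
{
    int *numbers = NULL;
    size_t size = 0;
    int number;
    while (fscanf(in, "%d", &number) == 1)
    {
        int *temp = realloc(numbers, sizeof(int) * (size + 1));
        if (temp == NULL)
        {
            free(numbers);
            *count = 0;
            return NULL;
        }
        numbers = temp;
        numbers[size] = number;
        size++;
    }
    *count = size;
    return numbers;
}
```

A caller would print `numbers[i]` for each `i` below `count` and then call `free(numbers)`. Leaving out that `free` produces exactly the kind of leak that valgrind reports.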
1975 01:18:52,910 --> 01:18:56,467 So if you want to signal to a computer that's waiting for input from you that 1976 01:18:56,467 --> 01:18:59,050 you don't want to quit the program-- that would be Control-C-- 1977 01:18:59,050 --> 01:19:02,350 but you just want to be done providing input to the computer, 1978 01:19:02,350 --> 01:19:04,435 you hit Control-D, otherwise known as EOF. 1979 01:19:04,435 --> 01:19:07,060 And the way to express this-- and you would only know this from 1980 01:19:07,060 --> 01:19:09,740 documentation-- would be to say something like this: 1981 01:19:09,740 --> 01:19:13,760 if the number the human typed in equals end of file-- 1982 01:19:13,760 --> 01:19:16,060 but there is no such thing in this context-- 1983 01:19:16,060 --> 01:19:19,930 you actually do this because of how the CS50 library works. 1984 01:19:19,930 --> 01:19:23,290 It turns out that if the only values a function can return 1985 01:19:23,290 --> 01:19:28,390 are integers, that means you can return 0, 1, negative 1, 2 billion, 1986 01:19:28,390 --> 01:19:30,220 negative 2 billion, give or take. 1987 01:19:30,220 --> 01:19:33,390 What humans did for years with old programming languages 1988 01:19:33,390 --> 01:19:35,860 is they would just steal one or a few numbers. 1989 01:19:35,860 --> 01:19:39,700 For instance, you'd steal the number 2 billion and call it INT_MAX-- 1990 01:19:39,700 --> 01:19:41,050 the maximum integer. 1991 01:19:41,050 --> 01:19:43,720 And you'd just say, you can never actually type 2 billion, 1992 01:19:43,720 --> 01:19:46,750 because we're using that as a special value to signify 1993 01:19:46,750 --> 01:19:49,750 that the human hit Control-D. Or you could do negative 2 billion, 1994 01:19:49,750 --> 01:19:51,640 or you could do 0, or 50.
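The sentinel idea can be sketched as follows. `next_int` is a hypothetical stand-in for a get_int-style function that, as described above for the CS50 library, returns `INT_MAX` (from `<limits.h>`) once input ends. It reads with `fscanf`, so unlike `get_int` it also returns `INT_MAX` for non-numeric input. `count_inputs` mirrors the loop that breaks when the number equals `INT_MAX`.

```c
#include <limits.h>
#include <stdio.h>

// Hypothetical get_int-like helper: returns the next int from `in`,
// or INT_MAX as a sentinel once input ends (or isn't a number).
int next_int(FILE *in)
{
    int n;
    if (fscanf(in, "%d", &n) != 1)
    {
        return INT_MAX;
    }
    return n;
}

// Counts inputs until the sentinel shows up, then breaks out of the loop.
int count_inputs(FILE *in)
{
    int count = 0;
    for (;;)
    {
        int number = next_int(in);
        if (number == INT_MAX)
        {
            break;
        }
        count++;
    }
    return count;
}
```

This shows the cost of stealing a value. A user who genuinely types INT_MAX is indistinguishable from one who hit Control-D, so for the input `1 2147483647 3` (on a platform with 32-bit int) the count stops at 1.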
1995 01:19:51,640 --> 01:19:55,630 But at some point, you have to steal one of the 4 billion available numbers 1996 01:19:55,630 --> 01:19:58,300 to use as a sentinel value, a special value 1997 01:19:58,300 --> 01:20:00,700 that you can then check for as a constant. 1998 01:20:00,700 --> 01:20:04,090 So anyhow, this just means, when the user is done typing input, 1999 01:20:04,090 --> 01:20:06,650 go ahead and break out of this while loop. 2000 01:20:06,650 --> 01:20:08,990 And as an aside, let me fix one thing. 2001 01:20:08,990 --> 01:20:11,770 It turns out things can go wrong with realloc. 2002 01:20:11,770 --> 01:20:15,070 And if realloc fails to allocate memory, it 2003 01:20:15,070 --> 01:20:19,340 can return NULL, a special value that just means, eh, something went wrong. 2004 01:20:19,340 --> 01:20:20,410 It's an invalid pointer. 2005 01:20:20,410 --> 01:20:21,790 It's the address 0. 2006 01:20:21,790 --> 01:20:25,930 And so it turns out there's a subtle bug here where, technically, I 2007 01:20:25,930 --> 01:20:27,370 should actually do this-- 2008 01:20:27,370 --> 01:20:31,400 store realloc's return value in a temporary variable. 2009 01:20:31,400 --> 01:20:35,650 Because if temp == NULL, something went wrong. 2010 01:20:35,650 --> 01:20:39,640 And I should actually go ahead and quit out of this program. 2011 01:20:39,640 --> 01:20:42,792 But let me wave my hand at that for now because it's more of a corner case. 2012 01:20:42,792 --> 01:20:46,000 But you'll see in the online version of this program we have additional error 2013 01:20:46,000 --> 01:20:50,440 checking that just checks, in the rare case that realloc fails, 2014 01:20:50,440 --> 01:20:52,330 and cleans up and returns properly. 2015 01:20:52,330 --> 01:20:54,940 But I'll point you to the online code for that. 2016 01:20:54,940 --> 01:20:55,820 All right. 2017 01:20:55,820 --> 01:20:59,846 Any questions on that example before we move on? 2018 01:20:59,846 --> 01:21:00,345 Yeah?
2019 01:21:00,345 --> 01:21:06,040 AUDIENCE: So in realloc, when it creates the new pointer for the [INAUDIBLE], 2020 01:21:06,040 --> 01:21:08,302 does it clear the memory from the original pointer? 2021 01:21:08,302 --> 01:21:09,594 Does it automatically clear it? 2022 01:21:09,594 --> 01:21:10,843 DAVID J. MALAN: Good question. 2023 01:21:10,843 --> 01:21:13,840 When you call realloc and it ends up allocating more space, 2024 01:21:13,840 --> 01:21:16,690 does it clear the original memory? 2025 01:21:16,690 --> 01:21:17,320 No. 2026 01:21:17,320 --> 01:21:21,474 And that is where garbage values come from, for instance, 2027 01:21:21,474 --> 01:21:23,890 because they're just left in memory from the previous use. 2028 01:21:23,890 --> 01:21:24,820 Other questions? 2029 01:21:24,820 --> 01:21:25,320 Yeah? 2030 01:21:25,320 --> 01:21:29,140 AUDIENCE: What does the user actually type to break? 2031 01:21:29,140 --> 01:21:33,460 DAVID J. MALAN: Oh, Control-D. Control-D. And it's not a break. 2032 01:21:33,460 --> 01:21:36,410 It sends end of file, end of input. 2033 01:21:36,410 --> 01:21:38,710 Control-C kills or breaks out of the program itself. 2034 01:21:38,710 --> 01:21:41,080 AUDIENCE: And that's the same as the INT_MAX thing, kind of? 2035 01:21:41,080 --> 01:21:43,911 DAVID J. MALAN: Same as INT_MAX? 2036 01:21:43,911 --> 01:21:44,410 Yes. 2037 01:21:44,410 --> 01:21:46,090 AUDIENCE: Because you're not adding, like, a giant value. 2038 01:21:46,090 --> 01:21:47,090 DAVID J. MALAN: Correct. 2039 01:21:47,090 --> 01:21:50,270 In the CS50 library, INT_MAX, yes, is the symbol. 2040 01:21:50,270 --> 01:21:50,770 Yes. 2041 01:21:50,770 --> 01:21:51,405 Yeah? 2042 01:21:51,405 --> 01:21:55,170 AUDIENCE: Could you also just ask the user, 2043 01:21:55,170 --> 01:21:57,290 do you want to enter another number, yes or no? 2044 01:21:57,290 --> 01:21:58,030 DAVID J. MALAN: Absolutely. 2045 01:21:58,030 --> 01:21:59,080 We could add more logic.
2046 01:21:59,080 --> 01:22:00,190 And you could use get_string. 2047 01:22:00,190 --> 01:22:02,830 And we could prompt him or her, hey, do you want to input another number? 2048 01:22:02,830 --> 01:22:04,580 The only downside of that would be, now, I 2049 01:22:04,580 --> 01:22:07,580 have to type in not only my number, but yes or no constantly. 2050 01:22:07,580 --> 01:22:10,281 So it's just a trade-off, user interface-wise. 2051 01:22:10,281 --> 01:22:10,780 All right. 2052 01:22:10,780 --> 01:22:12,280 So let me go ahead. 2053 01:22:12,280 --> 01:22:16,720 And let me go ahead and return 0 here just as my simple solution 2054 01:22:16,720 --> 01:22:19,900 to this problem of something going wrong. 2055 01:22:19,900 --> 01:22:21,610 I've just compiled this program. 2056 01:22:21,610 --> 01:22:22,780 Let me go ahead and run it. 2057 01:22:22,780 --> 01:22:27,220 I'm going to type in one number, two numbers, three numbers. 2058 01:22:27,220 --> 01:22:27,970 And now I'm bored. 2059 01:22:27,970 --> 01:22:29,303 I don't want to keep doing this. 2060 01:22:29,303 --> 01:22:31,030 How do I tell the computer I'm done? 2061 01:22:31,030 --> 01:22:31,930 AUDIENCE: Control-D. 2062 01:22:31,930 --> 01:22:34,600 DAVID J. MALAN: Control-D. Oops. 2063 01:22:34,600 --> 01:22:35,119 Oh, OK. 2064 01:22:35,119 --> 01:22:37,285 That's correct behavior because I forgot a key step. 2065 01:22:37,285 --> 01:22:39,830 2066 01:22:39,830 --> 01:22:41,100 What's that? 2067 01:22:41,100 --> 01:22:42,045 AUDIENCE: [INAUDIBLE]. 2068 01:22:42,045 --> 01:22:42,920 DAVID J. MALAN: Yeah. 2069 01:22:42,920 --> 01:22:44,920 I'm not actually doing anything with the values. 2070 01:22:44,920 --> 01:22:49,235 I should probably do for int i gets 0, i less than size, 2071 01:22:49,235 --> 01:22:52,140 i++, the code we had before. 2072 01:22:52,140 --> 01:22:57,990 And I should probably print out "You inputted %i," this. 2073 01:22:57,990 --> 01:22:59,270 Save that. 2074 01:22:59,270 --> 01:23:00,680 make list1.
2075 01:23:00,680 --> 01:23:03,320 So all I did was re-add the printing code. 2076 01:23:03,320 --> 01:23:08,100 Now if I rerun this-- one, two, three, Control-D-- 2077 01:23:08,100 --> 01:23:08,928 dammit. 2078 01:23:08,928 --> 01:23:10,579 AUDIENCE: [INAUDIBLE]. 2079 01:23:10,579 --> 01:23:11,370 DAVID J. MALAN: Oh. 2080 01:23:11,370 --> 01:23:11,869 OK. 2081 01:23:11,869 --> 01:23:14,620 Now I broke my code here. 2082 01:23:14,620 --> 01:23:15,586 Let me do this. 2083 01:23:15,586 --> 01:23:17,460 We're going to get rid of this error checking 2084 01:23:17,460 --> 01:23:19,590 because I'm not actually ever resizing. 2085 01:23:19,590 --> 01:23:21,630 numbers gets realloc. 2086 01:23:21,630 --> 01:23:24,570 Oh, and maybe someone was chiming in with this-- 2087 01:23:24,570 --> 01:23:29,760 numbers bracket size gets the user's input. 2088 01:23:29,760 --> 01:23:34,110 size++-- was this a key detail someone wanted me to do? 2089 01:23:34,110 --> 01:23:34,610 OK. 2090 01:23:34,610 --> 01:23:36,960 So I didn't actually finish the program earlier. 2091 01:23:36,960 --> 01:23:39,110 Notice we left off as follows-- 2092 01:23:39,110 --> 01:23:43,130 hey, computer, give me an array of size 0 initially 2093 01:23:43,130 --> 01:23:45,210 that's NULL-- there's no memory for it. 2094 01:23:45,210 --> 01:23:47,780 Therefore, the size of this array is 0. 2095 01:23:47,780 --> 01:23:49,460 Do the following forever. 2096 01:23:49,460 --> 01:23:51,710 Get a number from the human. 2097 01:23:51,710 --> 01:23:54,580 If the number equals this special value, INT_MAX, just 2098 01:23:54,580 --> 01:23:57,680 break out because the program is done. 2099 01:23:57,680 --> 01:23:59,990 And actually, sorry. 2100 01:23:59,990 --> 01:24:04,505 This is why I write these in advance too. 2101 01:24:04,505 --> 01:24:05,960 OK. 2102 01:24:05,960 --> 01:24:08,300 Go ahead and prompt the user for a number. 2103 01:24:08,300 --> 01:24:12,180 If they have hit Control-D, just break out of this loop.
2104 01:24:12,180 --> 01:24:15,920 However, if the size of the array equals its current capacity, 2105 01:24:15,920 --> 01:24:21,170 go ahead and reallocate space for this thing, making it one number bigger than it 2106 01:24:21,170 --> 01:24:22,580 previously was. 2107 01:24:22,580 --> 01:24:25,580 Now, assuming that succeeded and we have memory, 2108 01:24:25,580 --> 01:24:30,170 go ahead, and just like our list0 example, store in the numbers array 2109 01:24:30,170 --> 01:24:34,160 at the current location, which is 0 at first, whatever number the human typed in. 2110 01:24:34,160 --> 01:24:38,460 And then increment the size by one to remember what we have done. 2111 01:24:38,460 --> 01:24:41,840 I'm also, though, going to need to do capacity++ here 2112 01:24:41,840 --> 01:24:44,550 to remember that we've increased the capacity of the array. 2113 01:24:44,550 --> 01:24:45,860 So again, two new measures. 2114 01:24:45,860 --> 01:24:48,330 capacity is how much space there is in total. 2115 01:24:48,330 --> 01:24:50,120 size is how much we're using. 2116 01:24:50,120 --> 01:24:52,640 They happen to be identical at the moment 2117 01:24:52,640 --> 01:24:56,560 because we're growing this thing step by step by step. 2118 01:24:56,560 --> 01:24:57,060 All right. 2119 01:24:57,060 --> 01:24:58,430 Let me go ahead and hit Save. 2120 01:24:58,430 --> 01:25:01,380 Let me go ahead and compile this one last time. 2121 01:25:01,380 --> 01:25:05,060 ./list1 and input 1, 2, 3. 2122 01:25:05,060 --> 01:25:07,070 Control-D. OK. 2123 01:25:07,070 --> 01:25:08,660 Now it's just an aesthetic bug. 2124 01:25:08,660 --> 01:25:10,810 I forgot my \n. 2125 01:25:10,810 --> 01:25:18,770 So just to prove that I can actually program, ./list1; 1, 2, 3; Control-D. 2126 01:25:18,770 --> 01:25:19,700 Phew. 2127 01:25:19,700 --> 01:25:20,450 All right. 2128 01:25:20,450 --> 01:25:21,336 So you inputted 1.
2129 01:25:21,336 --> 01:25:23,210 And the reason it didn't move to another line 2130 01:25:23,210 --> 01:25:26,630 is because Control-D gets sent immediately without hitting Enter. 2131 01:25:26,630 --> 01:25:27,402 All right. 2132 01:25:27,402 --> 01:25:28,550 Phew. 2133 01:25:28,550 --> 01:25:29,660 That's all using arrays. 2134 01:25:29,660 --> 01:25:35,000 Now let's do the sort of cake baked already and pull it out of the oven. 2135 01:25:35,000 --> 01:25:38,780 The third and final example here is list2. 2136 01:25:38,780 --> 01:25:41,972 And actually, before we get there, let me note one thing. 2137 01:25:41,972 --> 01:25:43,430 Yeah, let's do one last thing here. 2138 01:25:43,430 --> 01:25:49,380 Let me go ahead and run, per earlier, our new friend valgrind on list1. 2139 01:25:49,380 --> 01:25:50,690 Enter. 2140 01:25:50,690 --> 01:25:54,140 It's waiting for me to type in 1, 2, 3. 2141 01:25:54,140 --> 01:25:57,590 Let me go ahead and hit Control-D. Interesting. 2142 01:25:57,590 --> 01:26:01,190 I seem to have a buggy program even though I claimed a moment ago that I 2143 01:26:01,190 --> 01:26:02,390 knew what I was doing. 2144 01:26:02,390 --> 01:26:06,054 12 bytes in 1 blocks are definitely lost in loss record 1 of 1. 2145 01:26:06,054 --> 01:26:07,970 Again, I don't understand most of those words. 2146 01:26:07,970 --> 01:26:10,490 But 12 bytes definitely lost-- 2147 01:26:10,490 --> 01:26:13,580 probably my fault. Why is it 12? 2148 01:26:13,580 --> 01:26:16,870 And what are those 12 bytes? 2149 01:26:16,870 --> 01:26:17,485 Yeah? 2150 01:26:17,485 --> 01:26:19,629 AUDIENCE: I think you made three integers. 2151 01:26:19,629 --> 01:26:21,045 DAVID J. MALAN: Yeah, 1, 2, and 3. 2152 01:26:21,045 --> 01:26:22,098 AUDIENCE: And each one is 4 bytes. 2153 01:26:22,098 --> 01:26:23,810 And you never freed them after you used malloc. 2154 01:26:23,810 --> 01:26:24,290 DAVID J. MALAN: Exactly.
2155 01:26:24,290 --> 01:26:25,160 I typed in three numbers-- 2156 01:26:25,160 --> 01:26:25,721 1, 2, and 3. 2157 01:26:25,721 --> 01:26:27,470 Each of those is 4 bytes on this computer. 2158 01:26:27,470 --> 01:26:29,360 That's 12-- 3 times 4. 2159 01:26:29,360 --> 01:26:32,840 And so the fact that I never freed them seems to be the source of the issue. 2160 01:26:32,840 --> 01:26:36,020 So at the end, let's just prove that valgrind 2161 01:26:36,020 --> 01:26:37,850 can detect correctness as well. 2162 01:26:37,850 --> 01:26:40,340 Free my numbers, semi-colon. 2163 01:26:40,340 --> 01:26:43,220 Let me go ahead and rerun make list1. 2164 01:26:43,220 --> 01:26:47,240 And now let me increase the size of this and do valgrind again 2165 01:26:47,240 --> 01:26:49,760 on list1, typing in the same values-- 2166 01:26:49,760 --> 01:26:50,570 1, 2, and 3. 2167 01:26:50,570 --> 01:26:53,360 Control-D. All heap blocks were freed. 2168 01:26:53,360 --> 01:26:54,774 No leaks are possible. 2169 01:26:54,774 --> 01:26:56,190 So again, valgrind is your friend. 2170 01:26:56,190 --> 01:26:58,210 It finds problems that you didn't even necessarily notice. 2171 01:26:58,210 --> 01:27:00,710 And you didn't have to read through your lines of code again 2172 01:27:00,710 --> 01:27:03,510 and again to identify the source of the issue unnecessarily. 2173 01:27:03,510 --> 01:27:04,010 All right. 2174 01:27:04,010 --> 01:27:08,540 Any questions then on these arrays that are dynamically allocated 2175 01:27:08,540 --> 01:27:12,100 and the bugs we find therein with valgrind? 2176 01:27:12,100 --> 01:27:12,630 All right. 2177 01:27:12,630 --> 01:27:17,610 So the last demonstration of code is going to be this. 2178 01:27:17,610 --> 01:27:21,810 I have stolen, for this final example, some of the building blocks 2179 01:27:21,810 --> 01:27:23,580 that we had on the screen earlier. 2180 01:27:23,580 --> 01:27:28,740 In my code for list2.c, I need a structure called node.
2181 01:27:28,740 --> 01:27:31,620 And that node, as we claimed earlier with our human volunteers, 2182 01:27:31,620 --> 01:27:33,420 is going to contain a number called number, 2183 01:27:33,420 --> 01:27:35,190 we'll call it this time, instead of n. 2184 01:27:35,190 --> 01:27:40,050 And it's going to contain a pointer called next to another such node. 2185 01:27:40,050 --> 01:27:43,140 So that's copied and pasted from earlier, albeit with the integer renamed 2186 01:27:43,140 --> 01:27:45,010 to number for clarity. 2187 01:27:45,010 --> 01:27:47,820 Now, notice in main what I'm doing first. 2188 01:27:47,820 --> 01:27:53,250 Go ahead and start out with no memory allocated at all. 2189 01:27:53,250 --> 01:27:56,910 So this was like when Comey was holding up first and representing 2190 01:27:56,910 --> 01:27:58,990 the beginning of our data structure. 2191 01:27:58,990 --> 01:28:02,220 This is the analog in code: the piece of paper that 2192 01:28:02,220 --> 01:28:04,050 would be held up here would be numbers. 2193 01:28:04,050 --> 01:28:06,390 And it's just pointing at nothing, null-- like left hand 2194 01:28:06,390 --> 01:28:07,430 down on the floor. 2195 01:28:07,430 --> 01:28:09,810 Because there is no memory yet allocated. 2196 01:28:09,810 --> 01:28:13,890 But then, while true, go ahead and get an integer 2197 01:28:13,890 --> 01:28:16,380 from the user with this code here. 2198 01:28:16,380 --> 01:28:21,810 Check if the user hit Control-D, as with this arcane technique. 2199 01:28:21,810 --> 01:28:25,260 And then our code is similar in spirit, but we 2200 01:28:25,260 --> 01:28:27,520 have to stitch these things together. 2201 01:28:27,520 --> 01:28:29,470 Allocate space for the number.
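The node structure described here can be written in C roughly as follows. This is a sketch consistent with the narration (an `int` called `number` plus a pointer called `next`); the exact code on screen may differ in small details.

```c
#include <stddef.h>

// One node of a singly linked list: a number plus a pointer to the next node.
// The struct is tagged (struct node) so that the next field can refer to the
// type before the typedef name "node" exists.
typedef struct node
{
    int number;
    struct node *next;
}
node;
```

The tag is what makes the self-reference legal: inside the braces the short name `node` is not yet defined, so the field has to be declared as `struct node *next`.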
2202 01:28:29,470 --> 01:28:32,220 So when I malloc an additional volunteer from the audience 2203 01:28:32,220 --> 01:28:35,430 and he or she came down, the equivalent in code is this-- 2204 01:28:35,430 --> 01:28:41,430 hey, computer, allocate with malloc enough space to fit the size of a node, 2205 01:28:41,430 --> 01:28:44,460 then store the result in a pointer called n. 2206 01:28:44,460 --> 01:28:49,290 So node *n just means, give me a pointer to a node, call it n, 2207 01:28:49,290 --> 01:28:53,070 and store the address that was just allocated from the audience as before. 2208 01:28:53,070 --> 01:28:56,100 Why do I have these lines of code here that I've highlighted in blue? 2209 01:28:56,100 --> 01:28:57,474 What's that expressing? 2210 01:28:57,474 --> 01:29:03,900 2211 01:29:03,900 --> 01:29:08,270 If bang n, or if not n would be how you pronounce it-- 2212 01:29:08,270 --> 01:29:10,610 what's going on there? 2213 01:29:10,610 --> 01:29:11,530 Yeah? 2214 01:29:11,530 --> 01:29:15,360 AUDIENCE: If there is no more memory that you can point to, then it fails. 2215 01:29:15,360 --> 01:29:16,360 DAVID J. MALAN: Exactly. 2216 01:29:16,360 --> 01:29:18,110 This isn't going to happen all that often. 2217 01:29:18,110 --> 01:29:21,580 But if the computer is out of memory, and therefore malloc fails, 2218 01:29:21,580 --> 01:29:23,750 you don't want the program just to crash or freeze. 2219 01:29:23,750 --> 01:29:26,360 Like, all of us hate when that happens on Mac OS or Windows. 2220 01:29:26,360 --> 01:29:27,370 So check for it. 2221 01:29:27,370 --> 01:29:33,160 If not n, or equivalently, if n == null, just return 1. 2222 01:29:33,160 --> 01:29:35,410 Quit gracefully, even though annoyingly. 2223 01:29:35,410 --> 01:29:37,690 But don't just crash or do something unexpected. 2224 01:29:37,690 --> 01:29:43,750 So you can simplify that check to just if not n-- if n is not a valid pointer, 2225 01:29:43,750 --> 01:29:44,980 return 1.
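The allocate-then-check pattern just described might look like this when pulled out of `main` into a helper. The name `make_node` is hypothetical (the lecture does this inline and returns 1 from `main` on failure); here the helper returns `NULL` so the caller can decide how to quit gracefully. It also fills in the two fields, which the lecture does on the lines that follow.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: allocates one node on the heap and fills it in.
// Returns NULL if malloc fails, so the caller can quit gracefully
// (e.g. return 1 from main) instead of dereferencing an invalid pointer.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (!n)  // equivalent to n == NULL
    {
        return NULL;
    }
    n->number = number;
    n->next = NULL;  // a new node points at nothing yet: "hand down at the floor"
    return n;
}
```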
2226 01:29:44,980 --> 01:29:49,270 Now, here's the code with which we were implementing the demonstration 2227 01:29:49,270 --> 01:29:49,960 with our humans. 2228 01:29:49,960 --> 01:29:52,390 And this is the scariest-looking, or at least most cryptic-looking, 2229 01:29:52,390 --> 01:29:54,820 code we're going to see in C. 2230 01:29:54,820 --> 01:29:59,290 Today is our final day in C. We've been running up 2231 01:29:59,290 --> 01:30:01,810 a really steep hill of late, learning about memory, 2232 01:30:01,810 --> 01:30:03,930 and now data structures and syntax. 2233 01:30:03,930 --> 01:30:06,280 This is the last of our syntax in C. 2234 01:30:06,280 --> 01:30:09,700 So what are the symbols to be aware of? 2235 01:30:09,700 --> 01:30:15,700 This line of code here is how I handed one of our volunteers a piece of paper. 2236 01:30:15,700 --> 01:30:18,270 On the right-hand side is the number that was typed in-- 2237 01:30:18,270 --> 01:30:21,460 55, or 5, or 20, or whatever the value is. 2238 01:30:21,460 --> 01:30:24,130 On the left-hand side is where you want to put it. 2239 01:30:24,130 --> 01:30:29,080 n, then literally an arrow, then number, n->number, does this. 2240 01:30:29,080 --> 01:30:34,300 malloc, a line or so prior, has given me in memory 2241 01:30:34,300 --> 01:30:36,310 just one of these big rectangles. 2242 01:30:36,310 --> 01:30:40,420 And again, the top of this in this example is called the number 2243 01:30:40,420 --> 01:30:42,070 and the bottom is called next. 2244 01:30:42,070 --> 01:30:45,280 So that's our human having stood up from the back of the room. 2245 01:30:45,280 --> 01:30:49,870 When I hand that human a number, like 55, it visually goes there. 2246 01:30:49,870 --> 01:30:53,230 The line of code with which you achieve that is this here. 2247 01:30:53,230 --> 01:30:57,850 Because notice on line 31 here, when I malloc that node, 2248 01:30:57,850 --> 01:31:02,080 I stored its address in a variable called n.
2249 01:31:02,080 --> 01:31:05,920 And that's a pointer, as drawn with an arrow, to that big node. 2250 01:31:05,920 --> 01:31:08,920 Or if we really want to be nit-picky, if this is at address 100, 2251 01:31:08,920 --> 01:31:11,590 yes, then the pointer actually has the value 100 in it. 2252 01:31:11,590 --> 01:31:13,540 But again, that's rarely useful information. 2253 01:31:13,540 --> 01:31:16,610 So we can abstract that away with just an arrow. 2254 01:31:16,610 --> 01:31:21,370 So line 31 is what creates those boxes on the screen. 2255 01:31:21,370 --> 01:31:25,120 Line 38 is what puts the number-- 2256 01:31:25,120 --> 01:31:30,250 for instance, 55-- into the box exactly, much like I handed a piece of paper 2257 01:31:30,250 --> 01:31:31,030 over. 2258 01:31:31,030 --> 01:31:32,630 So what is this? 2259 01:31:32,630 --> 01:31:35,620 This is the only real new notation today, 2260 01:31:35,620 --> 01:31:38,180 even though we're using lots of stars elsewhere-- 2261 01:31:38,180 --> 01:31:43,660 the arrow. And wonderfully, this is the first time in C that the notation actually maps to our pictures. 2262 01:31:43,660 --> 01:31:46,550 If n is the variable and you do n arrow something, 2263 01:31:46,550 --> 01:31:48,049 that means follow the arrow-- 2264 01:31:48,049 --> 01:31:50,590 kind of like Chutes and Ladders if you grew up playing that-- 2265 01:31:50,590 --> 01:31:57,190 and then put the number where the arrow has led you in the field called number. 2266 01:31:57,190 --> 01:32:02,830 So as an aside, we can think about this a different way. n is what data type? 2267 01:32:02,830 --> 01:32:04,010 What is this thing in blue-- 2268 01:32:04,010 --> 01:32:04,510 n? 2269 01:32:04,510 --> 01:32:07,807 2270 01:32:07,807 --> 01:32:08,750 AUDIENCE: Pointer. 2271 01:32:08,750 --> 01:32:10,041 DAVID J. MALAN: It's a pointer. 2272 01:32:10,041 --> 01:32:12,927 And it's a pointer to one of these things that we created earlier.
2273 01:32:12,927 --> 01:32:15,260 So we're not doing students anymore with our structures. 2274 01:32:15,260 --> 01:32:18,920 We're implementing nodes, which have numbers and next pointers. 2275 01:32:18,920 --> 01:32:24,440 So it turns out that if n is a pointer to a node-- 2276 01:32:24,440 --> 01:32:27,570 recall that dot notation from before-- 2277 01:32:27,570 --> 01:32:29,570 this is not how you access number in this case. 2278 01:32:29,570 --> 01:32:30,950 Because n is not a node itself. 2279 01:32:30,950 --> 01:32:32,090 It's a pointer. 2280 01:32:32,090 --> 01:32:35,480 But if n is a pointer, how do you follow a pointer? 2281 01:32:35,480 --> 01:32:37,130 How do you go to an address? 2282 01:32:37,130 --> 01:32:38,906 With what notation? 2283 01:32:38,906 --> 01:32:39,530 AUDIENCE: Star. 2284 01:32:39,530 --> 01:32:40,470 DAVID J. MALAN: Star. 2285 01:32:40,470 --> 01:32:44,280 So recall from last week, if we want to go to an address, 2286 01:32:44,280 --> 01:32:45,641 you could do syntax like this. 2287 01:32:45,641 --> 01:32:47,140 Ignore the parentheses for a moment. 2288 01:32:47,140 --> 01:32:52,290 If n is the address of a chunk of memory, *n means go there. 2289 01:32:52,290 --> 01:32:56,520 Once you're there, you're conceptually right here-- top left-hand corner. 2290 01:32:56,520 --> 01:32:59,670 How do you access individual fields like number or next? 2291 01:32:59,670 --> 01:33:01,420 You use dot notation. 2292 01:33:01,420 --> 01:33:08,400 So if you literally do (*n).number, with the parentheses making sure you go to the address first, that means go to the address and access 2293 01:33:08,400 --> 01:33:09,620 the number field. 2294 01:33:09,620 --> 01:33:12,090 There is nice syntactic sugar in C, which 2295 01:33:12,090 --> 01:33:16,110 is just a fancy way of saying shorthand notation, where it's just the arrow, n->number. 2296 01:33:16,110 --> 01:33:17,070 But that's all it is. 2297 01:33:17,070 --> 01:33:19,450 This arrow notation doesn't do anything new.
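To make the equivalence concrete, here is a small sketch that reads the same field both ways. The helper names `via_star` and `via_arrow` are hypothetical, chosen only for illustration. Note that the parentheses are required: without them, `*n.number` parses as `*(n.number)`, which would not compile here because `n` is a pointer, not a struct.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Go to the address in n, then access the number field
int via_star(const node *n)
{
    return (*n).number;
}

// Same thing, written with the arrow shorthand
int via_arrow(const node *n)
{
    return n->number;
}
```

Both functions compile to the same memory access; the arrow form simply saves the parentheses and the star.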
2298 01:33:19,450 --> 01:33:24,990 It just combines "go there" with "access a field in a struct," all in one breath, 2299 01:33:24,990 --> 01:33:26,200 if you will. 2300 01:33:26,200 --> 01:33:28,560 And this just looks a little prettier. 2301 01:33:28,560 --> 01:33:30,990 When I told our volunteers earlier, point your hand 2302 01:33:30,990 --> 01:33:33,760 down at the floor, that's all that line of code is doing. 2303 01:33:33,760 --> 01:33:38,700 It's saying, go to n's address, which is here, access the next field, 2304 01:33:38,700 --> 01:33:41,100 and write in that field null, which is just 2305 01:33:41,100 --> 01:33:46,110 the address 0-- the default, special address, like pointing at the floor. 2306 01:33:46,110 --> 01:33:49,530 This line of code, 40, is just a quick check. 2307 01:33:49,530 --> 01:33:51,540 if (numbers)-- what is that equivalent to? 2308 01:33:51,540 --> 01:33:54,150 That's actually just saying, if numbers does not equal null. 2309 01:33:54,150 --> 01:33:59,310 So if numbers is legitimate, that is, if the list already has at least one node, then let's go ahead 2310 01:33:59,310 --> 01:34:01,458 and do the following. 2311 01:34:01,458 --> 01:34:02,270 Phew. 2312 01:34:02,270 --> 01:34:04,050 This is a mouthful. 2313 01:34:04,050 --> 01:34:05,990 What is going on here? 2314 01:34:05,990 --> 01:34:08,500 So this is a for-loop that's not using numbers. 2315 01:34:08,500 --> 01:34:09,487 Well, or is it? 2316 01:34:09,487 --> 01:34:12,320 Almost every for-loop we've written and you've probably written just 2317 01:34:12,320 --> 01:34:16,660 uses i, j, maybe k, but just integers probably. 2318 01:34:16,660 --> 01:34:18,550 But that doesn't have to be the case. 2319 01:34:18,550 --> 01:34:19,594 What is a pointer? 2320 01:34:19,594 --> 01:34:20,260 It's an address. 2321 01:34:20,260 --> 01:34:21,051 What is an address? 2322 01:34:21,051 --> 01:34:23,484 2323 01:34:23,484 --> 01:34:24,650 AUDIENCE: A place in memory. 2324 01:34:24,650 --> 01:34:26,570 DAVID J.
MALAN: A place in memory, or a number really. 2325 01:34:26,570 --> 01:34:29,270 So you can certainly use for-loops just involving addresses. 2326 01:34:29,270 --> 01:34:30,230 But how? 2327 01:34:30,230 --> 01:34:32,280 So we'll consider this line of code. 2328 01:34:32,280 --> 01:34:35,240 This part looks different today, but it's just everything 2329 01:34:35,240 --> 01:34:36,620 before that first semi-colon. 2330 01:34:36,620 --> 01:34:38,640 That's just where you initialize a value. 2331 01:34:38,640 --> 01:34:42,260 So this is like saying, hey, computer, go ahead and give me 2332 01:34:42,260 --> 01:34:51,420 a variable called ptr and initialize it to be the start of my list. 2333 01:34:51,420 --> 01:34:57,350 Then I'm saying, hey, computer, do this so long as ptr does not equal null. 2334 01:34:57,350 --> 01:34:59,310 And then what am I doing? 2335 01:34:59,310 --> 01:35:02,930 if-- and let's ignore this for now, it's an error check-- 2336 01:35:02,930 --> 01:35:08,930 go ahead and-- sorry, let me think for one second. 2337 01:35:08,930 --> 01:35:13,330 2338 01:35:13,330 --> 01:35:13,830 OK. 2339 01:35:13,830 --> 01:35:14,685 Let's do this. 2340 01:35:14,685 --> 01:35:17,340 2341 01:35:17,340 --> 01:35:19,770 What are these lines of code doing? 2342 01:35:19,770 --> 01:35:21,670 This is the code that was actually suggested 2343 01:35:21,670 --> 01:35:23,400 at the very end of our human example. 2344 01:35:23,400 --> 01:35:26,460 Like, what if we wanted to insert all of the elements 2345 01:35:26,460 --> 01:35:28,500 at the end of the linked list? 2346 01:35:28,500 --> 01:35:30,030 How do you express that? 2347 01:35:30,030 --> 01:35:34,050 So in these highlighted lines of code, we're asking the question, 2348 01:35:34,050 --> 01:35:38,370 if the current pointer's next field is null, we've found the end. 2349 01:35:38,370 --> 01:35:42,810 Go ahead and update that next field to equal n and then break.
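Stripped of the insertion logic, the pointer-driven for-loop described here is just: initialize a pointer to the start of the list, keep going while it is not null, and advance by following `next`. The helper `count_nodes` below is a hypothetical illustration of that loop shape, not code from the lecture.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Counts nodes by walking the list with a for-loop whose "counter" is a
// pointer: start at the first node, stop at NULL, step by following next.
int count_nodes(const node *list)
{
    int count = 0;
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        count++;
    }
    return count;
}
```

An empty list (`NULL`) fails the condition immediately, so the loop body never runs and the count is 0.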
2350 01:35:42,810 --> 01:35:45,140 So let me translate this to an actual picture, 2351 01:35:45,140 --> 01:35:49,830 but using smaller boxes that make clear where something is going. 2352 01:35:49,830 --> 01:35:53,410 So suppose that this program's been running for a little while 2353 01:35:53,410 --> 01:35:57,480 and we have a linked list that looks like this, 2354 01:35:57,480 --> 01:36:02,200 where this one is pointing here and maybe this one's pointing here. 2355 01:36:02,200 --> 01:36:04,350 And this says null here. 2356 01:36:04,350 --> 01:36:05,400 And this points here. 2357 01:36:05,400 --> 01:36:11,310 And the numbers are, as we've been using today, 42, 50, 13. 2358 01:36:11,310 --> 01:36:14,655 So the start of this list is called numbers. 2359 01:36:14,655 --> 01:36:17,180 2360 01:36:17,180 --> 01:36:19,220 This points to the start of the list. 2361 01:36:19,220 --> 01:36:21,020 What am I doing in this for-loop? 2362 01:36:21,020 --> 01:36:24,650 I am just implementing the following logic with this loop-- 2363 01:36:24,650 --> 01:36:27,320 give me a variable called ptr, as represented 2364 01:36:27,320 --> 01:36:30,800 in the story by my left finger, here, and initialize 2365 01:36:30,800 --> 01:36:33,120 that to be the start of the list. 2366 01:36:33,120 --> 01:36:42,500 If that node's next pointer is equal to null, add a new node here. 2367 01:36:42,500 --> 01:36:43,880 But this is not null. 2368 01:36:43,880 --> 01:36:46,820 I want to follow the breadcrumbs to here. 2369 01:36:46,820 --> 01:36:48,680 And then, oh, we're at the end of the list. 2370 01:36:48,680 --> 01:36:50,600 I want to insert this new thing here. 2371 01:36:50,600 --> 01:36:55,640 So how do I actually express this code in C?
2372 01:36:55,640 --> 01:37:01,760 So if I look back up here, this is the line of code 2373 01:37:01,760 --> 01:37:05,980 that declares my left finger here, called ptr, and initializes it 2374 01:37:05,980 --> 01:37:09,320 to equal numbers, which is the same as pointing at the first element. 2375 01:37:09,320 --> 01:37:11,930 It's kind of like Comey was representing first earlier. 2376 01:37:11,930 --> 01:37:13,940 But now our list is called numbers. 2377 01:37:13,940 --> 01:37:16,280 Next, what am I doing? 2378 01:37:16,280 --> 01:37:17,610 Does ptr equal null? 2379 01:37:17,610 --> 01:37:18,560 Well, no. 2380 01:37:18,560 --> 01:37:21,620 If my left hand is pointing here, it obviously doesn't equal null. 2381 01:37:21,620 --> 01:37:23,210 So we don't have to worry yet. 2382 01:37:23,210 --> 01:37:25,010 Then what do I want to do? 2383 01:37:25,010 --> 01:37:28,680 If ptr next equals null, well, what does that mean? 2384 01:37:28,680 --> 01:37:30,170 Well, ptr is here. 2385 01:37:30,170 --> 01:37:32,720 ptr arrow next means here. 2386 01:37:32,720 --> 01:37:35,842 Does this equal null in this story? 2387 01:37:35,842 --> 01:37:37,050 I mean, it literally doesn't. 2388 01:37:37,050 --> 01:37:39,630 Because null is not written there. null is way down there. 2389 01:37:39,630 --> 01:37:42,460 So the condition does not pass. 2390 01:37:42,460 --> 01:37:44,130 So what do I do next? 2391 01:37:44,130 --> 01:37:50,250 Since that condition doesn't apply, here's a weird update. 2392 01:37:50,250 --> 01:37:52,890 ptr gets ptr next. 2393 01:37:52,890 --> 01:37:54,780 So it's cryptic-looking syntax. 2394 01:37:54,780 --> 01:37:58,920 But if ptr is pointing here, what is ptr next? 2395 01:37:58,920 --> 01:38:00,120 That's just this, right? 2396 01:38:00,120 --> 01:38:00,750 This is n. 2397 01:38:00,750 --> 01:38:01,621 This is next. 2398 01:38:01,621 --> 01:38:02,370 Or this is number. 2399 01:38:02,370 --> 01:38:03,270 This is next.
2400 01:38:03,270 --> 01:38:05,860 So ptr next is this. 2401 01:38:05,860 --> 01:38:07,270 So what is this value? 2402 01:38:07,270 --> 01:38:09,420 Well, this is a pointer pointing here. 2403 01:38:09,420 --> 01:38:13,890 So that highlighted block of code, ptr equals ptr next, 2404 01:38:13,890 --> 01:38:17,910 has the effect visually of doing this. 2405 01:38:17,910 --> 01:38:18,660 Why? 2406 01:38:18,660 --> 01:38:22,200 If the arrows are a little too magical, just think about these being addresses. 2407 01:38:22,200 --> 01:38:25,830 If this is saying, the next address is location 100, 2408 01:38:25,830 --> 01:38:30,450 ptr equals ptr next is like saying, well, this also equals 100. 2409 01:38:30,450 --> 01:38:33,270 Whatever is at location 100, for instance, over here, is 2410 01:38:33,270 --> 01:38:35,410 what both arrows should now point at. 2411 01:38:35,410 --> 01:38:38,160 And if you now repeat this process and repeat this process, 2412 01:38:38,160 --> 01:38:41,970 eventually that question we asked earlier is going to apply-- 2413 01:38:41,970 --> 01:38:46,150 if ptr next equals null, what do I want to do? 2414 01:38:46,150 --> 01:38:53,760 Well, if ptr next equals null, there are two lines going on. ptr next equals n. 2415 01:38:53,760 --> 01:38:56,310 So ptr next is no longer null. 2416 01:38:56,310 --> 01:39:00,390 It should instead be pointing at n, which is the new node. 2417 01:39:00,390 --> 01:39:02,340 And then that's it. 2418 01:39:02,340 --> 01:39:04,230 Because this was already initialized to null. 2419 01:39:04,230 --> 01:39:06,180 And let's suppose this was 55. 2420 01:39:06,180 --> 01:39:07,310 And we're done. 2421 01:39:07,310 --> 01:39:09,810 So much easier to do, obviously, in person with just humans, 2422 01:39:09,810 --> 01:39:12,720 and moving around, and pointing with their left hands. 2423 01:39:12,720 --> 01:39:16,480 But in code, you just have to think about the basic building blocks. 2424 01:39:16,480 --> 01:39:17,970 What is each of these values?
2425 01:39:17,970 --> 01:39:20,130 Where is each of them pointing? 2426 01:39:20,130 --> 01:39:22,290 And which of those fields do you need to update? 2427 01:39:22,290 --> 01:39:25,800 And the only new code here-- even though we're kind of combining it all in one 2428 01:39:25,800 --> 01:39:26,700 massive example-- 2429 01:39:26,700 --> 01:39:27,480 is this. 2430 01:39:27,480 --> 01:39:31,230 We are actually using arrow notation to say, go to that address 2431 01:39:31,230 --> 01:39:34,320 and access some value therein. 2432 01:39:34,320 --> 01:39:37,140 And this condition down here, which I'll wave my hand at for now, 2433 01:39:37,140 --> 01:39:41,990 just handles the situation where the list is initially empty. 2434 01:39:41,990 --> 01:39:45,680 Any questions on this thus far? 2435 01:39:45,680 --> 01:39:46,730 All right. 2436 01:39:46,730 --> 01:39:52,730 So let's take a look more graphically at some final problems we can solve. 2437 01:39:52,730 --> 01:39:56,090 And what you'll see in the days ahead is the following 2438 01:39:56,090 --> 01:39:58,340 when it comes to these linked lists and more. 2439 01:39:58,340 --> 01:40:01,619 We now have the ability to actually allocate things in memory dynamically. 2440 01:40:01,619 --> 01:40:04,160 We don't necessarily know in advance how many numbers we have 2441 01:40:04,160 --> 01:40:06,950 or, in the case of the next problem set, how many words we have. 2442 01:40:06,950 --> 01:40:10,040 We have the ability though to use malloc, and maybe even realloc, 2443 01:40:10,040 --> 01:40:12,500 to grow and grow our data structure in memory. 2444 01:40:12,500 --> 01:40:14,600 And we have the ability in code to actually 2445 01:40:14,600 --> 01:40:17,360 traverse those values in such a way that we 2446 01:40:17,360 --> 01:40:20,150 can access memory that's all over the board now 2447 01:40:20,150 --> 01:40:22,620 and not necessarily back to back to back.
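Putting the walkthrough together, the append-at-the-end logic (including the empty-list case) might be sketched as the helper below. The name `append_node` and the `node **list` parameter are hypothetical: in the lecture this code sits inline in `main` and updates `numbers` directly, so a helper needs a pointer to that pointer in order to change it when the list is empty.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: links an already-allocated node n (whose next is NULL)
// onto the end of the list whose first node is *list.
void append_node(node **list, node *n)
{
    if (*list)
    {
        // Walk with ptr until we find the node whose next is NULL: the end
        for (node *ptr = *list; ptr != NULL; ptr = ptr->next)
        {
            if (ptr->next == NULL)
            {
                ptr->next = n;
                break;
            }
        }
    }
    else
    {
        // Empty list: the new node simply becomes the first node
        *list = n;
    }
}
```

Because each call walks the whole list, appending this way takes O(n) steps per insertion; that is a property of tail insertion without a separate tail pointer.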
2448 01:40:22,620 --> 01:40:26,780 But what happens if we want to combine these ideas into fancier solutions 2449 01:40:26,780 --> 01:40:27,650 still? 2450 01:40:27,650 --> 01:40:29,700 Well, let's take a look at that. 2451 01:40:29,700 --> 01:40:35,490 In particular, if I go let's say over here to the following, 2452 01:40:35,490 --> 01:40:38,000 let's consider a problem we might now solve. 2453 01:40:38,000 --> 01:40:42,710 If I wanted to store everyone's name in this room in a data structure, 2454 01:40:42,710 --> 01:40:44,760 I could do what? 2455 01:40:44,760 --> 01:40:46,170 Well, we could use an array. 2456 01:40:46,170 --> 01:40:48,770 So I could actually decide how many people are in the room-- 2457 01:40:48,770 --> 01:40:49,790 let's call it n-- 2458 01:40:49,790 --> 01:40:52,790 and actually draw n boxes on the board, and then iteratively ask 2459 01:40:52,790 --> 01:40:55,340 everyone for their name, and actually write it down. 2460 01:40:55,340 --> 01:40:59,660 If I then wanted to take attendance thereafter and say, oh, is Alice here, 2461 01:40:59,660 --> 01:41:02,240 or is Bob here, or is Kareem here, or Brian, 2462 01:41:02,240 --> 01:41:05,960 I could just look through that array and say yes or no, that human is here. 2463 01:41:05,960 --> 01:41:08,060 But what's the running time of that algorithm? 2464 01:41:08,060 --> 01:41:11,540 How long would it take to look up a name in a data structure 2465 01:41:11,540 --> 01:41:14,242 where I've just drawn it as an array, a big list on the board? 2466 01:41:14,242 --> 01:41:15,200 AUDIENCE: A big O of n. 2467 01:41:15,200 --> 01:41:15,716 DAVID J. MALAN: What's that? 2468 01:41:15,716 --> 01:41:16,450 AUDIENCE: A big O of n. 2469 01:41:16,450 --> 01:41:17,420 DAVID J. MALAN: A big O of n, right? 2470 01:41:17,420 --> 01:41:20,253 Because if it's just a list of names, it's going to take big O of n. 2471 01:41:20,253 --> 01:41:22,310 And frankly, that seems a little slow.
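The attendance check described here is a linear search. A minimal sketch, with the hypothetical function name `is_here`, shows where the O(n) comes from: in the worst case (the name is absent or last), every one of the n entries gets compared.

```c
#include <stdbool.h>
#include <string.h>

// Linear search over an array of n names: compare against each entry in
// turn, so the worst case takes n comparisons, i.e. O(n).
bool is_here(const char *names[], int n, const char *name)
{
    for (int i = 0; i < n; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            return true;
        }
    }
    return false;
}
```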
2472 01:41:22,310 --> 01:41:24,410 How could I do an optimization? 2473 01:41:24,410 --> 01:41:26,630 Well, what if we combined some of these ideas? 2474 01:41:26,630 --> 01:41:28,700 Arrays are nice because they give me random sort 2475 01:41:28,700 --> 01:41:31,410 of instant access to memory locations. 2476 01:41:31,410 --> 01:41:35,720 But linked lists are nice because they allow me to dynamically add or subtract 2477 01:41:35,720 --> 01:41:38,282 elements even if I want from the list. 2478 01:41:38,282 --> 01:41:38,990 So you know what? 2479 01:41:38,990 --> 01:41:44,210 Instead of writing down everyone's names, like Alice, and Bob, 2480 01:41:44,210 --> 01:41:52,340 and Charlie, like this in just one big array of some fixed size that might 2481 01:41:52,340 --> 01:41:55,310 paint me into a corner-- now I only have room for one more name-- 2482 01:41:55,310 --> 01:41:57,950 what if I instead do things a little more cleverly? 2483 01:41:57,950 --> 01:42:01,160 So when I'm actually jotting down everyone's name in the room, what 2484 01:42:01,160 --> 01:42:04,290 if I instead did, OK, is Alice here. 2485 01:42:04,290 --> 01:42:04,790 All right. 2486 01:42:04,790 --> 01:42:06,190 Alice is here. 2487 01:42:06,190 --> 01:42:07,865 And then Brian is here. 2488 01:42:07,865 --> 01:42:09,140 I'm going to put Brian here. 2489 01:42:09,140 --> 01:42:11,221 And then maybe Charlie is here. 2490 01:42:11,221 --> 01:42:11,720 All right. 2491 01:42:11,720 --> 01:42:13,760 So Charlie. 2492 01:42:13,760 --> 01:42:16,287 And then maybe Arnold is here. 2493 01:42:16,287 --> 01:42:17,370 Where should I put Arnold? 2494 01:42:17,370 --> 01:42:19,160 So also starts with A. You know what? 2495 01:42:19,160 --> 01:42:21,770 Let's just put Arnold here. 2496 01:42:21,770 --> 01:42:23,510 Arnold. 2497 01:42:23,510 --> 01:42:24,542 And Abby is here. 2498 01:42:24,542 --> 01:42:25,250 So you know what? 2499 01:42:25,250 --> 01:42:27,960 Let's just put Abby up here as well. 
2500 01:42:27,960 --> 01:42:29,270 Bob came as well. 2501 01:42:29,270 --> 01:42:31,520 So Bob-- so what's the pattern I'm obviously following 2502 01:42:31,520 --> 01:42:35,107 as I'm hearing names called out? 2503 01:42:35,107 --> 01:42:36,440 AUDIENCE: Alphabetically sorted. 2504 01:42:36,440 --> 01:42:37,860 DAVID J. MALAN: Alphabetically sorted-- 2505 01:42:37,860 --> 01:42:38,490 kind of. 2506 01:42:38,490 --> 01:42:41,280 Like, Abby kind of ended up in a weird place here. 2507 01:42:41,280 --> 01:42:44,400 But that's fine because I didn't hear her name first. 2508 01:42:44,400 --> 01:42:49,080 But I did kind of bucketize people into different rows of the board. 2509 01:42:49,080 --> 01:42:50,820 In other words, all of the A names I seem 2510 01:42:50,820 --> 01:42:53,050 to just write down for convenience at the top, 2511 01:42:53,050 --> 01:42:54,720 and then all of the B names together, and C names. 2512 01:42:54,720 --> 01:42:56,490 And probably if I kept going, I could do this all the way 2513 01:42:56,490 --> 01:42:58,420 through Z in the English alphabet. 2514 01:42:58,420 --> 01:43:01,770 So what's nice about this is that, yeah, I'm making lists of names, 2515 01:43:01,770 --> 01:43:03,990 but how long is each of those lists? 2516 01:43:03,990 --> 01:43:06,780 If there's n people in the room, each of my lists 2517 01:43:06,780 --> 01:43:09,720 is not going to be n long, which is slow. 2518 01:43:09,720 --> 01:43:14,505 It's going to be what? n divided by 26, give or take. 2519 01:43:14,505 --> 01:43:16,630 If we assume that there's an equal number of people 2520 01:43:16,630 --> 01:43:20,710 with Z names and A names, it's going to be roughly n divided by 26 so 2521 01:43:20,710 --> 01:43:23,500 that I have these chains of human names, but they're 2522 01:43:23,500 --> 01:43:26,890 much shorter than they would have been if I just grouped everyone together. 
2523 01:43:26,890 --> 01:43:31,480 And this is a fundamental technique in programming called hashing. 2524 01:43:31,480 --> 01:43:34,640 It turns out there are things in this world called hash functions. 2525 01:43:34,640 --> 01:43:39,100 These are just mathematical, or verbal, or code-implemented functions 2526 01:43:39,100 --> 01:43:43,990 that take as input something and produce as output a number typically-- a number 2527 01:43:43,990 --> 01:43:46,930 from 0 to, say, 25, or from 1 to 26. 2528 01:43:46,930 --> 01:43:49,750 But they can also output strings in other contexts as well. 2529 01:43:49,750 --> 01:43:53,545 So my hash function here in my mind is, if you hand me a name, 2530 01:43:53,545 --> 01:43:55,670 I'm going to look at the first letter in your name. 2531 01:43:55,670 --> 01:43:57,910 And if it's A, I'm putting you in location 0. 2532 01:43:57,910 --> 01:44:00,180 If it's B, I'm going to put you in location 1. 2533 01:44:00,180 --> 01:44:03,540 If it's a Z, I'm going to put you in location 25 at the end. 2534 01:44:03,540 --> 01:44:05,620 So these are all buckets I've got, so to speak, 2535 01:44:05,620 --> 01:44:08,080 in computer science-- like 26 buckets or room 2536 01:44:08,080 --> 01:44:11,350 on the board that represent the starts of people's names. 2537 01:44:11,350 --> 01:44:12,670 So what is that? 2538 01:44:12,670 --> 01:44:16,690 Well, it would seem that if I don't know in advance how many A names I have, 2539 01:44:16,690 --> 01:44:19,880 that's kind of like drawing this as a linked list, if you will, 2540 01:44:19,880 --> 01:44:22,360 that might just get longer and longer. 2541 01:44:22,360 --> 01:44:27,250 But I do know that I only have a finite number of first letters. 2542 01:44:27,250 --> 01:44:30,790 So that-- at the risk of drawing a little messily-- 2543 01:44:30,790 --> 01:44:32,639 is kind of like drawing what data structure? 2544 01:44:32,639 --> 01:44:33,430 AUDIENCE: An array. 2545 01:44:33,430 --> 01:44:34,305 DAVID J. 
MALAN: Yeah. 2546 01:44:34,305 --> 01:44:39,610 It's kind of like drawing an array that just has 26 spots. 2547 01:44:39,610 --> 01:44:42,970 And what's nice about an array is that I have random access. 2548 01:44:42,970 --> 01:44:47,270 I can jump right to any letter of the alphabet in constant time, one step. 2549 01:44:47,270 --> 01:44:50,530 And once I get there, I'm still going to see a list of names. 2550 01:44:50,530 --> 01:44:54,180 Thanks to linked lists, that list can be short or long. 2551 01:44:54,180 --> 01:44:55,930 But on average, let's say it's going to be 2552 01:44:55,930 --> 01:45:02,070 1/26th the length that it would have been if I just used one array or one linked 2553 01:45:02,070 --> 01:45:02,950 list. 2554 01:45:02,950 --> 01:45:06,940 So this technique of using a hash function-- which, again, 2555 01:45:06,940 --> 01:45:09,760 I've defined as you give me a name; I take that as input; 2556 01:45:09,760 --> 01:45:13,840 I look at the first letter; and I return as output a number from 0 to 25-- 2557 01:45:13,840 --> 01:45:17,194 a hash function lets you create a hash table. 2558 01:45:17,194 --> 01:45:19,360 And there are different ways to implement hash tables, 2559 01:45:19,360 --> 01:45:22,540 but perhaps one of the most common is indeed like this. 2560 01:45:22,540 --> 01:45:26,170 You decide in advance on the size of an array. 2561 01:45:26,170 --> 01:45:30,070 But that array does not contain the strings or the humans' names. 2562 01:45:30,070 --> 01:45:34,270 That array actually contains linked lists. 2563 01:45:34,270 --> 01:45:37,210 And it's the linked lists that contain the names. 2564 01:45:37,210 --> 01:45:39,010 So we borrow ideas from, like, week two. 2565 01:45:39,010 --> 01:45:42,580 We merge them with an idea today from week four, combining arrays 2566 01:45:42,580 --> 01:45:44,180 with linked lists. 2567 01:45:44,180 --> 01:45:46,360 And we kind of get the best of both worlds.
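To make the hash function and hash table described above concrete, here is a minimal C sketch, assuming names start with an English letter. It uses one bucket per letter and chains the names in linked lists. The names `hash`, `insert`, `find`, `table`, and `NUM_BUCKETS` are hypothetical illustrations, not code from the course or the problem set.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// One bucket per letter of the English alphabet, as in the lecture
#define NUM_BUCKETS 26

// One name in a bucket's linked list (a "chain")
typedef struct node
{
    char *name;
    struct node *next;
}
node;

// The hash table: an array of pointers to the heads of linked lists
node *table[NUM_BUCKETS];

// Hash function from the lecture: look only at the first letter,
// returning 0 for A/a, 1 for B/b, ..., 25 for Z/z.
// Assumption: anything that doesn't start with a letter goes to bucket 0.
unsigned int hash(const char *name)
{
    if (!isalpha((unsigned char) name[0]))
    {
        return 0;
    }
    return toupper((unsigned char) name[0]) - 'A';
}

// Insert a copy of name at the front of its bucket's chain (one step)
bool insert(const char *name)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
    {
        free(n);
        return false;
    }
    strcpy(n->name, name);

    unsigned int bucket = hash(name);
    n->next = table[bucket];
    table[bucket] = n;
    return true;
}

// Jump straight to one bucket (constant time), then walk only that chain
bool find(const char *name)
{
    for (node *cur = table[hash(name)]; cur != NULL; cur = cur->next)
    {
        if (strcmp(cur->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Inserting at the front of a chain keeps insertion to one step. A lookup jumps straight to one bucket and then walks only that, hopefully short, chain.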
2568 01:45:46,360 --> 01:45:49,760 Because I can immediately jump to any letter of the alphabet super fast. 2569 01:45:49,760 --> 01:45:51,560 And once I'm there, yeah, there's a list, 2570 01:45:51,560 --> 01:45:55,540 but it's not nearly as long as it would have been if I didn't use this trick. 2571 01:45:55,540 --> 01:45:57,650 So what's the running time of all of this? 2572 01:45:57,650 --> 01:46:01,450 Well, it turns out that a hash table in the worst case 2573 01:46:01,450 --> 01:46:04,300 might still take you how many steps to find someone's name once it's 2574 01:46:04,300 --> 01:46:06,000 been added to the list? 2575 01:46:06,000 --> 01:46:09,644 In the very worst case, how many steps, if there's n people in the room? 2576 01:46:09,644 --> 01:46:10,640 AUDIENCE: n. 2577 01:46:10,640 --> 01:46:11,640 DAVID J. MALAN: Maybe n. 2578 01:46:11,640 --> 01:46:12,850 Why? 2579 01:46:12,850 --> 01:46:14,320 It's kind of a perverse situation. 2580 01:46:14,320 --> 01:46:17,001 But can you contrive a scenario in which, 2581 01:46:17,001 --> 01:46:19,000 even though we're doing this fanciness, it still 2582 01:46:19,000 --> 01:46:21,510 takes me n steps to confirm or deny that someone's here? 2583 01:46:21,510 --> 01:46:21,800 Yeah? 2584 01:46:21,800 --> 01:46:23,310 AUDIENCE: Everyone's name starts with the same letter. 2585 01:46:23,310 --> 01:46:25,150 DAVID J. MALAN: Everyone's name starts with the same letter 2586 01:46:25,150 --> 01:46:26,287 for some weird reason. 2587 01:46:26,287 --> 01:46:28,120 Now, it's a little silly in the human world. 2588 01:46:28,120 --> 01:46:29,870 But it could happen if you're just talking 2589 01:46:29,870 --> 01:46:32,080 data or whatever in the computer world. 2590 01:46:32,080 --> 01:46:37,240 This can devolve into, sure, an array with just one really long linked list. 2591 01:46:37,240 --> 01:46:39,985 But in practice, that's not likely going to happen, right?
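To see the perverse case in numbers, the following sketch counts how long the longest chain would get under the same first-letter hash, assuming names begin with a letter. The function `longest_chain` is hypothetical. With names spread across letters the longest chain stays short; when every name starts with B, one chain holds all n names, so a lookup can take n steps.

```c
#include <ctype.h>
#include <stddef.h>

#define NUM_BUCKETS 26

// Same first-letter hash as in the lecture; non-letters go to bucket 0
static unsigned int first_letter_hash(const char *name)
{
    if (!isalpha((unsigned char) name[0]))
    {
        return 0;
    }
    return toupper((unsigned char) name[0]) - 'A';
}

// Length of the longest chain a first-letter hash table would build for
// these names, i.e., the worst-case number of steps for one lookup
size_t longest_chain(const char *names[], size_t n)
{
    size_t counts[NUM_BUCKETS] = {0};
    size_t longest = 0;
    for (size_t i = 0; i < n; i++)
    {
        size_t c = ++counts[first_letter_hash(names[i])];
        if (c > longest)
        {
            longest = c;
        }
    }
    return longest;
}
```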
2592 01:46:39,985 --> 01:46:42,860 If we actually spent the time here and asked everyone for their name, 2593 01:46:42,860 --> 01:46:46,240 we'd probably get a reasonably uniform distribution of letters, 2594 01:46:46,240 --> 01:46:49,720 at least as is statistically likely with just human names. 2595 01:46:49,720 --> 01:46:51,550 So that would kind of spread things out. 2596 01:46:51,550 --> 01:46:55,150 And so there's this fundamental distinction between sort of real-world 2597 01:46:55,150 --> 01:46:58,390 running time, or wall clock time-- how many seconds are actually spinning 2598 01:46:58,390 --> 01:46:59,080 on the clock-- 2599 01:46:59,080 --> 01:47:00,850 versus asymptotic running time. 2600 01:47:00,850 --> 01:47:04,150 We've talked for a couple of weeks now about running time as being big O of n. 2601 01:47:04,150 --> 01:47:08,590 And that might be still the case, that a hash table-- yes, in the worst case, 2602 01:47:08,590 --> 01:47:10,481 it's still a big O of n data structure. 2603 01:47:10,481 --> 01:47:12,730 Because in the worst case, it's going to take n steps. 2604 01:47:12,730 --> 01:47:18,550 But in the real world, big O of n is really more like n divided by 26, 2605 01:47:18,550 --> 01:47:21,320 even though we always ignore those constant factors. 2606 01:47:21,320 --> 01:47:24,940 But when it's you, the human, running the code and analyzing the data, 2607 01:47:24,940 --> 01:47:30,280 running 26 times faster is actually real time saved, 2608 01:47:30,280 --> 01:47:33,610 even though a mathematician might say, ah, that's the same fundamentally. 2609 01:47:33,610 --> 01:47:36,610 And indeed, one of the problems ahead for the next problem set 2610 01:47:36,610 --> 01:47:39,670 is going to be to suss out exactly what the implications are 2611 01:47:39,670 --> 01:47:43,270 in your own code for actual wall clock running time.
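The gap between asymptotic and wall-clock running time can be written out, assuming names spread evenly across the 26 buckets. The 1,000-name figure is only an illustration.

```latex
% Average chain length when n names spread evenly over 26 buckets
\[
  \text{average chain length} \approx \frac{n}{26}
\]
% Big O ignores constant factors: n/26 is bounded by 1 * n,
\[
  \frac{n}{26} \le 1 \cdot n \quad \text{for all } n \ge 1
  \quad\Longrightarrow\quad \frac{n}{26} \in O(n)
\]
% ...but on the wall clock, for n = 1000 names:
\[
  \frac{1000}{26} \approx 38.5 \text{ names per chain, versus } 1000 \text{ names in a single list}
\]
```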
2612 01:47:43,270 --> 01:47:46,570 And making smarter design decisions, like this one, 2613 01:47:46,570 --> 01:47:50,200 can actually really speed up your code to be 26 times as fast, even 2614 01:47:50,200 --> 01:47:52,030 though, yes, a theoretician would say, ah, 2615 01:47:52,030 --> 01:47:55,630 but that's still asymptotically or mathematically 2616 01:47:55,630 --> 01:47:58,000 equivalent to just something linear. 2617 01:47:58,000 --> 01:48:01,090 So it's this fine tuning that will make your code even better and better. 2618 01:48:01,090 --> 01:48:03,460 Now, frankly, hashing on first names probably 2619 01:48:03,460 --> 01:48:05,320 isn't the smartest thing alone, right? 2620 01:48:05,320 --> 01:48:09,151 Like, does anyone's-- and this is going to be hard. 2621 01:48:09,151 --> 01:48:11,260 Does anyone's name start with X here? 2622 01:48:11,260 --> 01:48:12,625 AUDIENCE: [INAUDIBLE]. 2623 01:48:12,625 --> 01:48:13,620 DAVID J. MALAN: [INAUDIBLE] is not here. 2624 01:48:13,620 --> 01:48:15,540 But thank you for that perfect counterexample. 2625 01:48:15,540 --> 01:48:16,350 But she's not here. 2626 01:48:16,350 --> 01:48:17,700 So look, there's no Xs. 2627 01:48:17,700 --> 01:48:19,999 So now we're down to 25 possible values. 2628 01:48:19,999 --> 01:48:22,290 And I could probably pick some less common letters too. 2629 01:48:22,290 --> 01:48:25,260 The point is there's probably a few more As than there are Zs 2630 01:48:25,260 --> 01:48:28,860 or a few more Bs than there are Qs just by nature of human names. 2631 01:48:28,860 --> 01:48:32,260 So maybe just using the first letter isn't good enough. 2632 01:48:32,260 --> 01:48:35,700 And frankly, with 26 buckets-- suppose we did this for all of Harvard 2633 01:48:35,700 --> 01:48:37,140 and had thousands of names. 2634 01:48:37,140 --> 01:48:40,290 Each of my chains might still have hundreds or thousands of names.
2635 01:48:40,290 --> 01:48:43,680 So another design question is going to be, well, how many buckets should you 2636 01:48:43,680 --> 01:48:45,285 have, how big should the array be? 2637 01:48:45,285 --> 01:48:47,160 Maybe you shouldn't look at just the first letter. 2638 01:48:47,160 --> 01:48:50,820 What if you look at the first and the second letter together-- so AA, and AB, 2639 01:48:50,820 --> 01:48:55,680 and AC, and then dot dot dot, BA, BB, BC, so you could come up 2640 01:48:55,680 --> 01:48:57,090 with more and more buckets? 2641 01:48:57,090 --> 01:48:57,750 But what else? 2642 01:48:57,750 --> 01:49:01,740 How else might we kind of uniformly distribute people? 2643 01:49:01,740 --> 01:49:06,366 What do all of you have that we could use as input to a hash function? 2644 01:49:06,366 --> 01:49:07,330 AUDIENCE: A last name. 2645 01:49:07,330 --> 01:49:07,510 DAVID J. MALAN: OK. 2646 01:49:07,510 --> 01:49:09,551 Well, you could do last name, which might give us 2647 01:49:09,551 --> 01:49:11,081 a different or similar distribution. 2648 01:49:11,081 --> 01:49:11,580 Yeah? 2649 01:49:11,580 --> 01:49:12,240 AUDIENCE: ID number. 2650 01:49:12,240 --> 01:49:12,820 DAVID J. MALAN: What's that? 2651 01:49:12,820 --> 01:49:13,450 AUDIENCE: ID number. 2652 01:49:13,450 --> 01:49:13,870 DAVID J. MALAN: Yeah. 2653 01:49:13,870 --> 01:49:17,036 We could use your ID number and actually look at the first digit of your ID. 2654 01:49:17,036 --> 01:49:19,470 And odds are, it's 0 through 9. 2655 01:49:19,470 --> 01:49:21,880 So we could probably at least get 10 buckets that way. 2656 01:49:21,880 --> 01:49:23,890 And that's probably uniformly distributed. 2657 01:49:23,890 --> 01:49:24,700 I'm not sure. 2658 01:49:24,700 --> 01:49:27,430 We could use birth dates in some way.
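The first-and-second-letter idea can be sketched as a hash function over 26 × 26 = 676 buckets (AA, AB, ..., ZZ). This `hash2` is hypothetical and assumes names start with letters; treating a missing or non-letter second character as A is an arbitrary choice made for this sketch.

```c
#include <ctype.h>

// 26 * 26 = 676 buckets: AA, AB, ..., AZ, BA, ..., ZZ
#define NUM_BUCKETS (26 * 26)

// Hypothetical refinement of the first-letter hash: combine the first two
// letters into one index from 0 to 675, case-insensitively
unsigned int hash2(const char *name)
{
    unsigned int first = 0, second = 0;
    if (isalpha((unsigned char) name[0]))
    {
        first = toupper((unsigned char) name[0]) - 'A';
        // Safe to read name[1]: name[0] was a letter, so at worst name[1] is '\0'
        if (isalpha((unsigned char) name[1]))
        {
            second = toupper((unsigned char) name[1]) - 'A';
        }
    }
    return first * 26 + second;
}
```

More buckets means shorter chains on average, so Billy and Bob no longer collide, though common letter pairs can still cluster.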
2659 01:49:27,430 --> 01:49:29,650 Like, we could put all of the freshmen in one bucket, 2660 01:49:29,650 --> 01:49:31,690 all the seniors in another bucket, and everyone else, 2661 01:49:31,690 --> 01:49:34,689 and so forth, in their own buckets, which would also give us some input. 2662 01:49:34,689 --> 01:49:38,360 So again, a hash function is entirely up to you to program and design. 2663 01:49:38,360 --> 01:49:41,140 The goal though is to smooth things out. 2664 01:49:41,140 --> 01:49:44,860 You want to have roughly the same number of things in each linked list 2665 01:49:44,860 --> 01:49:48,250 just so that you have about the same performance 2666 01:49:48,250 --> 01:49:50,784 across all of these various inputs. 2667 01:49:50,784 --> 01:49:53,200 So let's take a look at a couple of other data structures, 2668 01:49:53,200 --> 01:49:54,790 again, in this abstract way. 2669 01:49:54,790 --> 01:49:58,090 Now that we know that, even though it's not obvious at first attempt, 2670 01:49:58,090 --> 01:49:59,800 we know how to construct arrays. 2671 01:49:59,800 --> 01:50:02,320 We kind of know now how to construct linked lists. 2672 01:50:02,320 --> 01:50:05,620 It stands to reason we could implement them together in code. 2673 01:50:05,620 --> 01:50:08,650 What else could we do now with these building blocks? 2674 01:50:08,650 --> 01:50:13,765 So for instance, this structure here is a very common one, known as a tree. 2675 01:50:13,765 --> 01:50:16,641 A tree like a family tree, where there's one patriarch or matriarch 2676 01:50:16,641 --> 01:50:19,390 at the top, and then their children, and then their grandchildren, 2677 01:50:19,390 --> 01:50:21,310 and great grandchildren, and so forth. 
2678 01:50:21,310 --> 01:50:25,090 And what's nice about a tree structure is that, if you're storing data, 2679 01:50:25,090 --> 01:50:28,930 you can actually store the data in clever ways to the left child, 2680 01:50:28,930 --> 01:50:32,410 to the right child, and so forth, as follows. 2681 01:50:32,410 --> 01:50:37,390 Notice here, there's something curious about all the numbers in this data 2682 01:50:37,390 --> 01:50:38,800 structure. 2683 01:50:38,800 --> 01:50:41,522 What is noteworthy about them? 2684 01:50:41,522 --> 01:50:44,639 2685 01:50:44,639 --> 01:50:45,430 What is noteworthy? 2686 01:50:45,430 --> 01:50:45,856 Yeah? 2687 01:50:45,856 --> 01:50:46,710 AUDIENCE: Multiples of 11. 2688 01:50:46,710 --> 01:50:47,110 DAVID J. MALAN: What's that? 2689 01:50:47,110 --> 01:50:48,310 AUDIENCE: They're multiples of 11. 2690 01:50:48,310 --> 01:50:50,018 DAVID J. MALAN: They are multiples of 11. 2691 01:50:50,018 --> 01:50:53,281 That was just to make them look pretty though by the author here. 2692 01:50:53,281 --> 01:50:53,780 Yeah? 2693 01:50:53,780 --> 01:50:55,415 AUDIENCE: [INAUDIBLE]. 2694 01:50:55,415 --> 01:50:56,290 DAVID J. MALAN: Yeah. 2695 01:50:56,290 --> 01:50:58,120 There's a mathematical significance too. 2696 01:50:58,120 --> 01:51:02,560 Like, no matter what node or circle you look at, the value in it 2697 01:51:02,560 --> 01:51:08,276 is bigger than the left child and it's smaller than the right child. 2698 01:51:08,276 --> 01:51:09,400 So it's kind of in-between. 2699 01:51:09,400 --> 01:51:11,710 Any circle you look at, the number to the left is smaller, 2700 01:51:11,710 --> 01:51:13,126 the number to the right is bigger. 2701 01:51:13,126 --> 01:51:15,730 And I think that applies universally all over the place. 2702 01:51:15,730 --> 01:51:16,540 Yes? 2703 01:51:16,540 --> 01:51:17,560 So what does that mean? 
2704 01:51:17,560 --> 01:51:23,400 We'll recall from, like, week 0 when we had a whole bunch of phone book pages 2705 01:51:23,400 --> 01:51:24,400 that we were searching-- 2706 01:51:24,400 --> 01:51:26,000 1, 2, 3, 4, 5, 6. 2707 01:51:26,000 --> 01:51:27,560 Let's give ourselves a 7th one. 2708 01:51:27,560 --> 01:51:30,550 Recall that when we did divide and conquer, or binary search, 2709 01:51:30,550 --> 01:51:31,719 we did it on an array. 2710 01:51:31,719 --> 01:51:34,510 And what was nice about binary search was we started in the middle, 2711 01:51:34,510 --> 01:51:36,430 and then we maybe went left, or we maybe went right, 2712 01:51:36,430 --> 01:51:38,346 and we kind of divided and divided and divided 2713 01:51:38,346 --> 01:51:41,920 and conquered the problem much more efficiently in logarithmic time 2714 01:51:41,920 --> 01:51:44,500 than it would have been if we did it linearly. 2715 01:51:44,500 --> 01:51:48,610 But we know now weeks later that arrays are kind of limiting, right? 2716 01:51:48,610 --> 01:51:51,320 If I keep storing all of my values in an array, 2717 01:51:51,320 --> 01:51:53,512 what can I not do with the array? 2718 01:51:53,512 --> 01:51:56,470 2719 01:51:56,470 --> 01:51:57,610 Make it bigger, right? 2720 01:51:57,610 --> 01:52:00,760 I can't add an element to it without copying every darn element, 2721 01:52:00,760 --> 01:52:02,770 as we've discussed thus far today. 2722 01:52:02,770 --> 01:52:04,990 But what if I was a little smarter about it? 2723 01:52:04,990 --> 01:52:07,900 What if I stored my values, not just in an array, 2724 01:52:07,900 --> 01:52:10,540 but I started storing them in these circles-- 2725 01:52:10,540 --> 01:52:12,010 let's call them nodes-- 2726 01:52:12,010 --> 01:52:18,400 and each of those nodes is really just an integer plus two additional values? 2727 01:52:18,400 --> 01:52:20,820 How would we implement this data structure in memory? 
2728 01:52:20,820 --> 01:52:24,250 Well, here's an int n-- could represent the number in question. 2729 01:52:24,250 --> 01:52:26,260 And we could put that in a data structure 2730 01:52:26,260 --> 01:52:29,860 called a node that just has the same syntax as earlier today, 2731 01:52:29,860 --> 01:52:31,760 but I've left room for two more fields. 2732 01:52:31,760 --> 01:52:34,540 What is it that I want to represent in code if I 2733 01:52:34,540 --> 01:52:38,410 want to start storing my numbers, not in this old-school week 0 array, 2734 01:52:38,410 --> 01:52:42,142 but in a tree? 2735 01:52:42,142 --> 01:52:43,110 AUDIENCE: Two pointers. 2736 01:52:43,110 --> 01:52:43,600 DAVID J. MALAN: Two-- 2737 01:52:43,600 --> 01:52:44,580 AUDIENCE: Pointers. 2738 01:52:44,580 --> 01:52:44,975 DAVID J. MALAN: Two pointers. 2739 01:52:44,975 --> 01:52:45,630 Right? 2740 01:52:45,630 --> 01:52:49,070 A tree, as drawn here literally with arrows, 2741 01:52:49,070 --> 01:52:51,870 is just like saying every one of these nodes or circles 2742 01:52:51,870 --> 01:52:53,760 has a left child and a right child. 2743 01:52:53,760 --> 01:52:55,560 How do you implement children? 2744 01:52:55,560 --> 01:52:58,600 Well, you can literally just use pointer notation as well here. 2745 01:52:58,600 --> 01:53:01,920 A left child is just a pointer to another struct on the left. 2746 01:53:01,920 --> 01:53:05,100 And a right child is just another pointer to the child on the right. 2747 01:53:05,100 --> 01:53:08,730 And what's nice about this ultimately is that we can now 2748 01:53:08,730 --> 01:53:14,580 traverse this tree just as efficiently as we can traverse this array. 2749 01:53:14,580 --> 01:53:18,400 Because notice if I want to search for the number 66, 2750 01:53:18,400 --> 01:53:23,655 how many steps does it take me if I start at the top? 
2751 01:53:23,655 --> 01:53:26,030 Just like Comey represented the start of our linked list, 2752 01:53:26,030 --> 01:53:29,450 so in the world of a tree does the root have special significance. 2753 01:53:29,450 --> 01:53:31,220 And that's where we always begin. 2754 01:53:31,220 --> 01:53:33,904 So how many steps does it take me to find 66 given the top? 2755 01:53:33,904 --> 01:53:34,570 AUDIENCE: Three. 2756 01:53:34,570 --> 01:53:35,450 AUDIENCE: Two. 2757 01:53:35,450 --> 01:53:36,741 DAVID J. MALAN: It looks like-- 2758 01:53:36,741 --> 01:53:37,929 yeah, two or three, right? 2759 01:53:37,929 --> 01:53:38,720 I start at the top. 2760 01:53:38,720 --> 01:53:41,240 I look at it and say, hmm, 55, which way do I go? 2761 01:53:41,240 --> 01:53:42,200 I go to the right. 2762 01:53:42,200 --> 01:53:43,110 Then I see 77. 2763 01:53:43,110 --> 01:53:43,610 OK. 2764 01:53:43,610 --> 01:53:44,600 Which way do I go? 2765 01:53:44,600 --> 01:53:45,560 I go to the left. 2766 01:53:45,560 --> 01:53:49,220 So it's the same logic as week 0 in dividing and conquering the phone book 2767 01:53:49,220 --> 01:53:51,170 or an array a couple of weeks later. 2768 01:53:51,170 --> 01:53:54,440 But we get to the number we care about pretty quickly. 2769 01:53:54,440 --> 01:53:55,634 And it's not linear. 2770 01:53:55,634 --> 01:53:57,800 And in fact, if we actually did out the math, what's 2771 01:53:57,800 --> 01:54:00,910 really cool about a binary search tree is that if you have n elements, 2772 01:54:00,910 --> 01:54:06,920 n circles, and the tree is balanced like this one, its height is mathematically about log n. 2773 01:54:06,920 --> 01:54:09,050 So the height of the tree just so happens 2774 01:54:09,050 --> 01:54:12,920 to correspond to roughly how many times you can take n 2775 01:54:12,920 --> 01:54:15,440 and divide it, divide it, divide it, divide it in two.
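The 55, then 77, then 66 walk can be sketched as a loop: go left when the target is smaller, right when it is bigger. The hypothetical `search` counts nodes examined, so finding 66 takes 3 looks but only 2 moves, which is why both "two" and "three" are fair answers. The hypothetical `height` counts rows; the log n height assumes a balanced tree like the one on the slide.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int n;
    struct node *left;
    struct node *right;
}
node;

// Binary search on a tree; *steps counts how many nodes we looked at
bool search(const node *root, int target, int *steps)
{
    *steps = 0;
    const node *cur = root;
    while (cur != NULL)
    {
        (*steps)++;
        if (target < cur->n)
        {
            cur = cur->left;   // smaller: go left
        }
        else if (target > cur->n)
        {
            cur = cur->right;  // bigger: go right
        }
        else
        {
            return true;       // found it
        }
    }
    return false;
}

// Number of rows in the tree; about log2(n) + 1 if the tree is balanced
int height(const node *root)
{
    if (root == NULL)
    {
        return 0;
    }
    int l = height(root->left);
    int r = height(root->right);
    return 1 + (l > r ? l : r);
}
```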
2777 01:54:18,523 --> 01:54:21,490 On the bottom row, there are how many elements? 2778 01:54:21,490 --> 01:54:21,990 Four, right? 2779 01:54:21,990 --> 01:54:23,323 And on the middle row, there is? 2780 01:54:23,323 --> 01:54:23,907 AUDIENCE: Two. 2781 01:54:23,907 --> 01:54:24,739 DAVID J. MALAN: Two. 2782 01:54:24,739 --> 01:54:26,210 And on the top row, there's one. 2783 01:54:26,210 --> 01:54:27,860 So you can actually see it in the reverse direction. 2784 01:54:27,860 --> 01:54:30,920 This is like divide and conquer, but in a different conceptual way. 2785 01:54:30,920 --> 01:54:33,930 2786 01:54:33,930 --> 01:54:38,160 Every row in the tree has half as many elements as the one below it. 2787 01:54:38,160 --> 01:54:41,450 And so the implication of that is just like from week 0 in the phone book 2788 01:54:41,450 --> 01:54:45,360 when we're dividing, and dividing, and dividing in half, and half, and half. 2789 01:54:45,360 --> 01:54:48,410 So this is only to say, now that we have structures and pointers, 2790 01:54:48,410 --> 01:54:50,310 we can build something like this. 2791 01:54:50,310 --> 01:54:53,150 But let's try one other example here too. 2792 01:54:53,150 --> 01:54:55,580 This is a crazy-looking example. 2793 01:54:55,580 --> 01:54:57,350 But it's kind of amazing. 2794 01:54:57,350 --> 01:55:01,550 Suppose that we wanted to store a dictionary of words-- 2795 01:55:01,550 --> 01:55:04,130 so not humans' names this time, but English words. 2796 01:55:04,130 --> 01:55:06,710 So Merriam-Webster or the Oxford English Dictionary has what? 2797 01:55:06,710 --> 01:55:08,570 Thousands, hundreds of thousands of words 2798 01:55:08,570 --> 01:55:10,697 these days in English for instance? 2799 01:55:10,697 --> 01:55:12,030 How do you actually store those? 2800 01:55:12,030 --> 01:55:15,260 Well, if you just look up words in a dictionary back in yesteryear, 2801 01:55:15,260 --> 01:55:16,040 that is linear.
2802 01:55:16,040 --> 01:55:18,290 You have to start at the beginning and look through it 2803 01:55:18,290 --> 01:55:19,676 page by page, looking for words. 2804 01:55:19,676 --> 01:55:21,050 Or you could be a little smarter. 2805 01:55:21,050 --> 01:55:23,930 Because the words in any dictionary are hopefully alphabetized, 2806 01:55:23,930 --> 01:55:27,280 you can do the Mike Smith-style divide and conquer by going to the middle, 2807 01:55:27,280 --> 01:55:29,360 then the middle of the middle, and so forth-- 2808 01:55:29,360 --> 01:55:30,290 log of n. 2809 01:55:30,290 --> 01:55:34,700 But what if I told you, you could look up words in constant time-- 2810 01:55:34,700 --> 01:55:36,519 some fixed number of steps? 2811 01:55:36,519 --> 01:55:38,310 None of this divide and conquer complexity. 2812 01:55:38,310 --> 01:55:39,170 No log n. 2813 01:55:39,170 --> 01:55:43,610 Just constant time-- you want a word, go get it instantly. 2814 01:55:43,610 --> 01:55:46,940 That's where this last structure comes in, which is called a trie-- 2815 01:55:46,940 --> 01:55:51,110 T-R-I-E-- short for retrieval, even though it's pronounced the opposite. 2816 01:55:51,110 --> 01:55:57,450 So a trie is a tree each of whose nodes is an array. 2817 01:55:57,450 --> 01:56:00,900 So it's like this weird Frankenstein's monster kind of data structure. 2818 01:56:00,900 --> 01:56:04,250 We're just really combining lots of different ideas, as follows. 2819 01:56:04,250 --> 01:56:10,250 And the way a trie works, as is implied by this partial diagram on the board, 2820 01:56:10,250 --> 01:56:14,000 is that if you want to store the name Brian, for instance, 2821 01:56:14,000 --> 01:56:15,980 in your dictionary-- it's the first word-- 2822 01:56:15,980 --> 01:56:19,940 what you do is you start by creating a tree with just one node. 2823 01:56:19,940 --> 01:56:22,430 But that node is effectively an array. 2824 01:56:22,430 --> 01:56:26,120 That array is of size, let's say for simplicity, 26. 
2825 01:56:26,120 --> 01:56:32,360 So A through Z. This location here therefore represents B for Brian. 2826 01:56:32,360 --> 01:56:37,010 So if I want to insert Brian into this tree, I create one node at the top. 2827 01:56:37,010 --> 01:56:39,830 And then for the second letter in his name, R, 2828 01:56:39,830 --> 01:56:44,030 I create another node, also an array, A through Z. 2829 01:56:44,030 --> 01:56:48,080 And so here, I put a pointer to this node here. 2830 01:56:48,080 --> 01:56:51,560 B-R-I. So I should have drawn some more boxes. 2831 01:56:51,560 --> 01:57:01,701 A, B, C, D, E, F, G, H, I. So here, I'm going to draw another pointer to B-- 2832 01:57:01,701 --> 01:57:02,200 wait. 2833 01:57:02,200 --> 01:57:02,700 Bian. 2834 01:57:02,700 --> 01:57:04,211 [LAUGHTER] 2835 01:57:04,211 --> 01:57:04,710 OK. 2836 01:57:04,710 --> 01:57:07,020 That's wrong. 2837 01:57:07,020 --> 01:57:09,540 Billy shall be our name. 2838 01:57:09,540 --> 01:57:12,870 Billy is at B. Wait. 2839 01:57:12,870 --> 01:57:14,560 No. 2840 01:57:14,560 --> 01:57:15,060 Dammit. 2841 01:57:15,060 --> 01:57:18,610 B, B. B-I-A-- yes, this works. 2842 01:57:18,610 --> 01:57:19,110 This works. 2843 01:57:19,110 --> 01:57:19,440 OK. 2844 01:57:19,440 --> 01:57:20,040 Sorry. 2845 01:57:20,040 --> 01:57:20,770 So here we go. 2846 01:57:20,770 --> 01:57:23,520 We're inserting Billy into this fancy data structure. 2847 01:57:23,520 --> 01:57:25,485 So the first node represents the first letter. 2848 01:57:25,485 --> 01:57:27,360 The second node represents the second letter. 2849 01:57:27,360 --> 01:57:29,350 The third node represents the third letter. 2850 01:57:29,350 --> 01:57:30,340 And so forth. 2851 01:57:30,340 --> 01:57:32,700 But what's cool about this is the re-usability. 
2852 01:57:32,700 --> 01:57:36,690 So notice if this is the second letter and I counted this out correctly, 2853 01:57:36,690 --> 01:57:39,510 I, this is going to lead to a third node deeper 2854 01:57:39,510 --> 01:57:44,520 in the tree where it's L that we care about next, and then another one 2855 01:57:44,520 --> 01:57:47,562 down here which represents another L. 2856 01:57:47,562 --> 01:57:49,020 And I'll start drawing the letters. 2857 01:57:49,020 --> 01:57:53,860 L. This is B. This is I. L. And we'll call this L. 2858 01:57:53,860 --> 01:57:59,350 And then, finally, another one over here, which is a Y. And this 2859 01:57:59,350 --> 01:58:00,780 gets pointing down here. 2860 01:58:00,780 --> 01:58:02,310 This gets pointing here. 2861 01:58:02,310 --> 01:58:03,490 And so forth. 2862 01:58:03,490 --> 01:58:06,660 So in short, we have one node essentially 2863 01:58:06,660 --> 01:58:11,290 for every letter in the word that we're inserting into the data structure. 2864 01:58:11,290 --> 01:58:14,250 Now, this looks stupidly inefficient at the moment. 2865 01:58:14,250 --> 01:58:20,580 Because to store B, I, L, L, Y, how much memory did I just use? 2866 01:58:20,580 --> 01:58:26,370 26 plus 26 plus 26 plus 26 plus 26. 2867 01:58:26,370 --> 01:58:30,660 Just to store five characters, I use 26 times 5. 2868 01:58:30,660 --> 01:58:33,060 But this is kind of thematic in computer science-- 2869 01:58:33,060 --> 01:58:36,450 spend a little more space, and I bet I can decrease the amount of time 2870 01:58:36,450 --> 01:58:37,980 it takes to find anyone. 2871 01:58:37,980 --> 01:58:42,390 Because now no matter how many other students are in this data structure-- 2872 01:58:42,390 --> 01:58:44,370 and for instance, let's do another one. 2873 01:58:44,370 --> 01:58:48,390 If we had another one, like Bob-- 2874 01:58:48,390 --> 01:58:51,000 so B is the same first letter. 2875 01:58:51,000 --> 01:58:52,740 That leads us to this second node. 
2876 01:58:52,740 --> 01:58:57,360 O is somewhere else in this array, say, over here. 2877 01:58:57,360 --> 01:59:00,480 So this represents O. And then Bob has another one. 2878 01:59:00,480 --> 01:59:02,280 So there's going to be another array here. 2879 01:59:02,280 --> 01:59:06,870 And this is why the picture above draws this so succinctly. 2880 01:59:06,870 --> 01:59:08,490 This is how we might store Bob. 2881 01:59:08,490 --> 01:59:16,980 So B, I, L, L, Y. Or you can follow a different route, B, O, B. 2882 01:59:16,980 --> 01:59:19,016 So we can start to reuse some of these arrays. 2883 01:59:19,016 --> 01:59:21,390 So that's where you start to get some of the efficiency. 2884 01:59:21,390 --> 01:59:25,020 Any time names share a few letters, then you start reusing those same nodes. 2885 01:59:25,020 --> 01:59:27,160 So it's not super, super wasteful. 2886 01:59:27,160 --> 01:59:30,690 But the question now is, if there's like 1,000 students in the class, 2887 01:59:30,690 --> 01:59:33,360 or 1,000 students in the room, we're going to have a lot of nodes 2888 01:59:33,360 --> 01:59:34,980 there on the board. 2889 01:59:34,980 --> 01:59:38,430 But how many steps does it take to find Billy, 2890 01:59:38,430 --> 01:59:44,930 or Bob, or any name with this data structure, and to conclude yes or no 2891 01:59:44,930 --> 01:59:48,440 that student is in the class? 2892 01:59:48,440 --> 01:59:51,950 So, like, five for Billy, three for Bob. 2893 01:59:51,950 --> 01:59:56,030 And notice none of that math has any relationship 2894 01:59:56,030 --> 01:59:58,100 to how many students are in the room. 2895 01:59:58,100 --> 02:00:02,120 If we instead wrote out a long list of 1,000 names, in the worst case, 2896 02:00:02,120 --> 02:00:04,482 it might take me 1,000 steps to find Billy or Bob. 2897 02:00:04,482 --> 02:00:06,440 Maybe I could be a little smarter if I sort it. 2898 02:00:06,440 --> 02:00:08,720 But in the worst case, big O of n, it's linear.
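A trie node can be sketched as an array of 26 child pointers plus a flag marking the end of a word. In this hypothetical C version the root is an extra, initially empty node and each letter leads to one new node below it, so it allocates one more array than the drawing on the board. The `nodes_allocated` counter exists only to show that BOB reuses the B node created for BILLY.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26

// Each trie node is an array of 26 child pointers, one per letter
typedef struct trie
{
    bool is_word;
    struct trie *children[ALPHABET];
}
trie;

// Counts every node allocated, to show how shared prefixes are reused
static int nodes_allocated = 0;

trie *new_node(void)
{
    // calloc zero-fills, leaving is_word false and every child NULL
    // (NULL is all-bits-zero on mainstream platforms)
    trie *t = calloc(1, sizeof(trie));
    if (t != NULL)
    {
        nodes_allocated++;
    }
    return t;
}

// Insert a word one letter (one node) at a time, reusing existing nodes.
// Assumption: the word contains only the letters A to Z (either case).
bool trie_insert(trie *root, const char *word)
{
    trie *cur = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        int i = toupper((unsigned char) *p) - 'A';
        if (cur->children[i] == NULL)
        {
            cur->children[i] = new_node();
            if (cur->children[i] == NULL)
            {
                return false;
            }
        }
        cur = cur->children[i];
    }
    cur->is_word = true;
    return true;
}
```

Inserting BILLY into an empty trie allocates one node per letter; inserting BOB afterward follows the existing B pointer and allocates only two more nodes, for O and the final B.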
2899 02:00:08,720 --> 02:00:12,140 Or if I used a hash table before, and maybe there's 2900 02:00:12,140 --> 02:00:14,150 1,000 students in the room, but, OK, there's 2901 02:00:14,150 --> 02:00:16,220 26 letters in the English alphabet at least. 2902 02:00:16,220 --> 02:00:17,140 So that's 26 buckets. 2903 02:00:17,140 --> 02:00:19,790 So maybe it's 1,000 divided by 26, on average, 2904 02:00:19,790 --> 02:00:23,510 if I'm using those linked lists inside my array. 2905 02:00:23,510 --> 02:00:24,470 But wait a minute. 2906 02:00:24,470 --> 02:00:28,400 If I'm using this structure, a trie, where every node in the tree 2907 02:00:28,400 --> 02:00:34,287 is just an array that leads me to the next node, à la breadcrumbs, B, I, L, L, 2908 02:00:34,287 --> 02:00:36,170 Y is 5 and always 5. 2909 02:00:36,170 --> 02:00:38,390 B, O, B is always 3. 2910 02:00:38,390 --> 02:00:41,910 B, R, I, A, N would have been 5 as well. 2911 02:00:41,910 --> 02:00:46,130 None of these totals is influenced in any way 2912 02:00:46,130 --> 02:00:49,890 by the total number of names in the data structure. 2913 02:00:49,890 --> 02:00:54,260 So a trie in some sense is this amazing holy grail 2914 02:00:54,260 --> 02:00:57,650 in that, by combining these various data structures, now you get constant time, 2915 02:00:57,650 --> 02:00:59,410 but you do pay a price. 2916 02:00:59,410 --> 02:01:02,666 And just to be clear, what is the price we seem to be paying? 2917 02:01:02,666 --> 02:01:03,772 AUDIENCE: Memory. 2918 02:01:03,772 --> 02:01:04,730 DAVID J. MALAN: Memory. 2919 02:01:04,730 --> 02:01:07,310 And in fact, this is why I'm not really drawing it much more. 2920 02:01:07,310 --> 02:01:09,851 Because it just becomes a big mess on the screen because it's 2921 02:01:09,851 --> 02:01:11,660 hard to draw such wide data structures. 2922 02:01:11,660 --> 02:01:13,160 It's taking a huge amount of memory. 2923 02:01:13,160 --> 02:01:15,750 But theoretically, it's coming faster.
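The memory price can be put in numbers. Assuming the same node layout as a trie of 26 child pointers plus a flag, every node is at least 26 pointers wide, for example 26 × 8 = 208 bytes where pointers are 8 bytes, while a plain C string spends one byte per character plus its terminator. The helpers `trie_bytes` and `string_bytes` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct trie
{
    bool is_word;
    struct trie *children[26];
}
trie;

// Bytes a trie spends on a word that shares no prefix with any other
// word: one node per letter (the root is not counted here)
size_t trie_bytes(const char *word)
{
    return strlen(word) * sizeof(trie);
}

// Bytes the same word needs as a plain C string: its letters plus '\0'
size_t string_bytes(const char *word)
{
    return strlen(word) + 1;
}
```

For BILLY with no shared prefix, that is 5 nodes, over a kilobyte on a system with 8-byte pointers, to hold what a string stores in 6 bytes. That is the space the trie trades for lookups whose cost depends only on the word's length.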
2924 02:01:15,750 --> 02:01:16,250 Yeah? 2925 02:01:16,250 --> 02:01:17,630 Question. 2926 02:01:17,630 --> 02:01:20,420 AUDIENCE: So how would you deal with a case where someone is Bob, 2927 02:01:20,420 --> 02:01:22,411 but then the other kid is Bobby? 2928 02:01:22,411 --> 02:01:23,660 DAVID J. MALAN: Good question. 2929 02:01:23,660 --> 02:01:25,100 So it's a bit of a simplification. 2930 02:01:25,100 --> 02:01:28,430 If you were storing both Bob and Bobby, you would actually keep going. 2931 02:01:28,430 --> 02:01:31,220 So each of these elements is not just one letter. 2932 02:01:31,220 --> 02:01:35,660 You also have essentially a node there or some other data structure 2933 02:01:35,660 --> 02:01:37,807 that says either stop here or continue. 2934 02:01:37,807 --> 02:01:39,890 And you'll see actually in the problems that we'll 2935 02:01:39,890 --> 02:01:42,098 propose to you how you can represent that idea if you 2936 02:01:42,098 --> 02:01:43,340 choose to go this route. 2937 02:01:43,340 --> 02:01:46,670 Indeed, the challenge ahead ultimately is something quite like this. 2938 02:01:46,670 --> 02:01:48,811 You will implement your very own spell checker. 2939 02:01:48,811 --> 02:01:51,560 And we will give you code that gets you started with this process. 2940 02:01:51,560 --> 02:01:53,893 And of course, a spell checker these days in Google Docs 2941 02:01:53,893 --> 02:01:56,397 and Microsoft Word just underlines misspelled words in red. 2942 02:01:56,397 --> 02:01:57,230 But what's going on? 2943 02:01:57,230 --> 02:01:59,390 And how is it that Word or Google Docs can 2944 02:01:59,390 --> 02:02:02,720 spell check your English or whatever language so quickly? 2945 02:02:02,720 --> 02:02:05,570 Well, it has a dictionary in memory, probably with tens of thousands 2946 02:02:05,570 --> 02:02:07,460 or hundreds of thousands of words.
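The "stop here or continue" marker from the answer above can be sketched as a boolean in each trie node. In this hypothetical version, `contains` follows one pointer per letter, so its step count depends only on the word's length, and BOBB is rejected even though it is a prefix of BOBBY. Words are assumed to contain only the letters A to Z, in either case.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct trie
{
    bool is_word;               // "stop here": a word ends at this node
    struct trie *children[26];  // "continue": one pointer per next letter
}
trie;

// Walk (or create) one node per letter, then mark the last node
bool insert_word(trie *root, const char *word)
{
    trie *cur = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        int i = toupper((unsigned char) *p) - 'A';
        if (cur->children[i] == NULL)
        {
            cur->children[i] = calloc(1, sizeof(trie));
            if (cur->children[i] == NULL)
            {
                return false;
            }
        }
        cur = cur->children[i];
    }
    cur->is_word = true;
    return true;
}

// Follow one pointer per letter; *steps never exceeds the word's length,
// no matter how many other words the trie holds
bool contains(const trie *root, const char *word, int *steps)
{
    *steps = 0;
    const trie *cur = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        (*steps)++;
        cur = cur->children[toupper((unsigned char) *p) - 'A'];
        if (cur == NULL)
        {
            return false;  // no such path: not in the trie
        }
    }
    return cur->is_word;   // a path alone isn't enough; it must end a word
}
```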
2947 02:02:07,460 --> 02:02:10,850 And all they're doing constantly is, every time you type a word 2948 02:02:10,850 --> 02:02:13,100 and hit the Spacebar, or Period, or Enter, 2949 02:02:13,100 --> 02:02:16,220 it's quickly looking up that new word or those words in its dictionary 2950 02:02:16,220 --> 02:02:20,610 and asking, yes or no, should I squiggle a red line underneath this word? 2951 02:02:20,610 --> 02:02:23,720 And so what we're going to do is give you a big text file, ASCII text, 2952 02:02:23,720 --> 02:02:26,010 containing 100-plus thousand words. 2953 02:02:26,010 --> 02:02:28,010 You're going to have to decide how to load those 2954 02:02:28,010 --> 02:02:32,517 into memory, not just correctly, but in a way that's well designed. 2955 02:02:32,517 --> 02:02:34,850 And we'll even give you a tool, if you choose to use it, 2956 02:02:34,850 --> 02:02:36,560 that times how long your code takes. 2957 02:02:36,560 --> 02:02:39,110 And it even counts how much RAM you're actually using. 2958 02:02:39,110 --> 02:02:42,312 But the key goals for this week and our final week in C 2959 02:02:42,312 --> 02:02:44,270 are to take some of these basic building blocks, 2960 02:02:44,270 --> 02:02:48,680 like arrays, and pointers, and structures, 2961 02:02:48,680 --> 02:02:51,980 and decide for yourselves how you're most comfortable stitching them 2962 02:02:51,980 --> 02:02:55,580 together, to what extent you want to really fine tune your code beyond just 2963 02:02:55,580 --> 02:03:00,050 getting it correct, and to give you a better sense of the underlying code 2964 02:03:00,050 --> 02:03:02,450 that people have had to write for years in libraries 2965 02:03:02,450 --> 02:03:05,022 to make programming doable, à la Scratch. 2966 02:03:05,022 --> 02:03:07,730 Because in just a few weeks, we're going to transition to Python.
2967 02:03:07,730 --> 02:03:10,490 And the dozens of lines of code you've been writing now 2968 02:03:10,490 --> 02:03:12,714 are going to be whittled down to one line, two lines, 2969 02:03:12,714 --> 02:03:15,380 because we're going to get a lot more features from these newer, 2970 02:03:15,380 --> 02:03:16,280 fancier languages. 2971 02:03:16,280 --> 02:03:18,860 But you'll hopefully have an appreciation of what is actually 2972 02:03:18,860 --> 02:03:20,600 going on underneath that hood. 2973 02:03:20,600 --> 02:03:22,130 So I'll stick around for any one-on-one questions. 2974 02:03:22,130 --> 02:03:22,790 Let's call it a day. 2975 02:03:22,790 --> 02:03:24,873 Take a duck on your way out for roommates as well. 2976 02:03:24,873 --> 02:03:26,920 And we'll see you next time. 2977 02:03:26,920 --> 02:03:27,443