1 00:00:00,000 --> 00:00:03,472 [MUSIC PLAYING] 2 00:00:03,472 --> 00:01:02,110 3 00:01:02,110 --> 00:01:05,319 DAVID J. MALAN: All right, this is CS50. 4 00:01:05,319 --> 00:01:07,720 And this is week 5. 5 00:01:07,720 --> 00:01:11,890 And among our goals for today are to revisit some topics from past weeks 6 00:01:11,890 --> 00:01:16,120 but to focus all the more on design possibilities, particularly 7 00:01:16,120 --> 00:01:17,420 by way of data structures. 8 00:01:17,420 --> 00:01:21,040 So data structures, again, is this way via which you can structure your data. 9 00:01:21,040 --> 00:01:23,170 But more specifically in C, it's how you can 10 00:01:23,170 --> 00:01:27,160 use your computer's memory in interesting and, daresay, clever ways 11 00:01:27,160 --> 00:01:29,067 to actually solve problems more effectively. 12 00:01:29,067 --> 00:01:31,150 But we're going to see today that there's actually 13 00:01:31,150 --> 00:01:33,383 different types of data structures. 14 00:01:33,383 --> 00:01:35,550 And we'll make the distinction between abstractions, 15 00:01:35,550 --> 00:01:38,170 like high-level descriptions of these structures 16 00:01:38,170 --> 00:01:41,150 and the lower-level implementation details so to speak. 17 00:01:41,150 --> 00:01:45,170 So in particular, we'll talk first today about what we call abstract data types. 18 00:01:45,170 --> 00:01:48,190 So an abstract data type is like a data structure. 19 00:01:48,190 --> 00:01:51,250 But it offers certain properties, certain characteristics. 20 00:01:51,250 --> 00:01:53,230 And it's actually up to the programmer how 21 00:01:53,230 --> 00:01:55,850 to implement the underlying implementation details. 22 00:01:55,850 --> 00:01:58,960 So, for instance, there's actually this abstract data type that's 23 00:01:58,960 --> 00:02:00,880 common in computing known as a queue. 
24 00:02:00,880 --> 00:02:04,490 And from the real world, most of us are presumably familiar with queues, 25 00:02:04,490 --> 00:02:07,770 otherwise known in the US typically as lines or forming lines. 26 00:02:07,770 --> 00:02:12,077 In fact, I have here three bags of cookies. 27 00:02:12,077 --> 00:02:14,660 Could I get three volunteers to come up on stage and queue up? 28 00:02:14,660 --> 00:02:16,310 OK, I saw your hand first. 29 00:02:16,310 --> 00:02:17,630 How about your hand second? 30 00:02:17,630 --> 00:02:20,000 And in the blue. 31 00:02:20,000 --> 00:02:22,640 OK, come on down, just you three. 32 00:02:22,640 --> 00:02:23,570 Come on over. 33 00:02:23,570 --> 00:02:26,090 And if you want to queue up over here if you could. 34 00:02:26,090 --> 00:02:29,060 35 00:02:29,060 --> 00:02:30,060 Come on down. 36 00:02:30,060 --> 00:02:30,560 Thank you. 37 00:02:30,560 --> 00:02:32,870 As we begin, do you want to introduce yourselves first? 38 00:02:32,870 --> 00:02:33,140 NAFTALI HOROWITZ: Hi. 39 00:02:33,140 --> 00:02:34,740 My name is Naftali Horowitz. 40 00:02:34,740 --> 00:02:37,850 I'm a first year studying computer science and economics. 41 00:02:37,850 --> 00:02:41,567 And I sleep at Hurlbut Hall. 42 00:02:41,567 --> 00:02:42,900 DAVID J. MALAN: All right, next. 43 00:02:42,900 --> 00:02:44,570 CATHERINE: Hi, everyone, my name is Catherine. 44 00:02:44,570 --> 00:02:46,260 I'm planning on studying engineering. 45 00:02:46,260 --> 00:02:49,010 I'm not sure mechanical or electrical yet but one of the two. 46 00:02:49,010 --> 00:02:50,218 And I'm currently in Kennedy. 47 00:02:50,218 --> 00:02:51,093 DAVID J. MALAN: Nice. 48 00:02:51,093 --> 00:02:51,710 Nice to meet. 49 00:02:51,710 --> 00:02:52,740 ISABELLA: Hi, everyone. 50 00:02:52,740 --> 00:02:53,570 I'm Isabella. 51 00:02:53,570 --> 00:02:54,620 I'm in Strauss. 52 00:02:54,620 --> 00:02:56,412 And I plan on majoring in computer science. 53 00:02:56,412 --> 00:02:57,495 DAVID J. MALAN: Wonderful. 
54 00:02:57,495 --> 00:02:58,950 Well, welcome to all three of you. 55 00:02:58,950 --> 00:03:00,630 And I think this will be pretty straightforward. 56 00:03:00,630 --> 00:03:02,420 I have here these three bags of cookies. 57 00:03:02,420 --> 00:03:04,710 You formed nicely this line or this queue. 58 00:03:04,710 --> 00:03:07,610 So if you'd like to come up first and take your cookies, thank you. 59 00:03:07,610 --> 00:03:10,193 And right that way, that's all there is to this demonstration. 60 00:03:10,193 --> 00:03:11,390 Your cookies as well. 61 00:03:11,390 --> 00:03:12,650 Right this way. 62 00:03:12,650 --> 00:03:13,580 And your cookies. 63 00:03:13,580 --> 00:03:14,300 Right this way. 64 00:03:14,300 --> 00:03:15,650 Wonderfully well done. 65 00:03:15,650 --> 00:03:17,340 Thank you to our volunteers. 66 00:03:17,340 --> 00:03:21,620 The point is actually sincere, though, simple as that demonstration was. 67 00:03:21,620 --> 00:03:23,480 And as easy as it was to get those cookies, 68 00:03:23,480 --> 00:03:26,660 queues actually manifest a property that actually 69 00:03:26,660 --> 00:03:29,900 is germane to a lot of problem solving and computing and the real world. 70 00:03:29,900 --> 00:03:34,780 Specifically, queues offer this characteristic, FIFO, first in first 71 00:03:34,780 --> 00:03:35,280 out. 72 00:03:35,280 --> 00:03:37,280 And indeed as our volunteers just noticed, 73 00:03:37,280 --> 00:03:41,327 as they queued up on stage, 1, 2, 3, that is the order in which I 74 00:03:41,327 --> 00:03:42,410 handed them their cookies. 75 00:03:42,410 --> 00:03:44,240 And daresay it's a very equitable approach. 76 00:03:44,240 --> 00:03:45,050 It's very fair. 77 00:03:45,050 --> 00:03:47,690 First come, first served might be a more casual way 78 00:03:47,690 --> 00:03:50,420 of describing FIFO, first in, first out. 79 00:03:50,420 --> 00:03:53,570 Now, structures like these actually offer specific operations 80 00:03:53,570 --> 00:03:54,337 that make sense. 
81 00:03:54,337 --> 00:03:57,170 And in the context of queues, we generally describe these operations 82 00:03:57,170 --> 00:03:58,850 as enqueueing and dequeueing. 83 00:03:58,850 --> 00:04:01,600 So when our first three volunteers came up, they enqueued. 84 00:04:01,600 --> 00:04:03,880 And as I handed them each a bag of cookies, 85 00:04:03,880 --> 00:04:06,550 they dequeued and exited in that same order. 86 00:04:06,550 --> 00:04:11,530 Now, how could you go about implementing a queue in code, specifically in C? 87 00:04:11,530 --> 00:04:14,210 Well, we can actually implement it in bunches of different ways. 88 00:04:14,210 --> 00:04:17,950 But perhaps the most obvious is to borrow our old friend, namely arrays. 89 00:04:17,950 --> 00:04:21,100 And we could use a data structure that looks a little something 90 00:04:21,100 --> 00:04:26,150 like this, whereby we specify the total capacity of this data structure. 91 00:04:26,150 --> 00:04:30,140 For instance, we might store a total of 50 people or just 3 in this case. 92 00:04:30,140 --> 00:04:33,130 We might define our structure then as containing those people 93 00:04:33,130 --> 00:04:34,520 as simply an array. 94 00:04:34,520 --> 00:04:37,750 And if a person is a data type that we've defined in weeks past, 95 00:04:37,750 --> 00:04:40,510 you could imagine each of our volunteers is indeed a person. 96 00:04:40,510 --> 00:04:44,440 And we've stored them one after the other contiguously in memory 97 00:04:44,440 --> 00:04:47,410 by way of this actual array. 98 00:04:47,410 --> 00:04:51,637 But we do need to keep track inside of a queue using one other piece of data-- 99 00:04:51,637 --> 00:04:53,470 namely, we need to keep track of an integer, 100 00:04:53,470 --> 00:04:57,610 like the size, like how many people are actually in the queue at this moment. 101 00:04:57,610 --> 00:05:00,130 Because if we have a total capacity of 50, 102 00:05:00,130 --> 00:05:02,357 I'd like to know if I only have three volunteers. 
103 00:05:02,357 --> 00:05:04,190 Then I can do some quick arithmetic and know 104 00:05:04,190 --> 00:05:08,480 that I could have fit another 47 people in this same queue. 105 00:05:08,480 --> 00:05:09,440 But it's finite. 106 00:05:09,440 --> 00:05:13,400 Of course, if we had 50 volunteers all wanting cookies, that's as many people 107 00:05:13,400 --> 00:05:14,660 as we could actually handle. 108 00:05:14,660 --> 00:05:17,840 So there is this upper bound then on how many we could fit. 109 00:05:17,840 --> 00:05:21,800 But there's yet other ways for storing data inside of a computer's memory. 110 00:05:21,800 --> 00:05:24,410 And there's this other abstract data type known as a stack. 111 00:05:24,410 --> 00:05:26,600 And stacks are actually omnipresent as well 112 00:05:26,600 --> 00:05:29,340 even though it's not necessarily the system 113 00:05:29,340 --> 00:05:31,640 you would want when you line up on stage. 114 00:05:31,640 --> 00:05:34,070 For instance, could we get three more volunteers? 115 00:05:34,070 --> 00:05:37,190 OK, I saw a hand here, right here, and right here. 116 00:05:37,190 --> 00:05:38,150 Come on down. 117 00:05:38,150 --> 00:05:40,610 We'll have the orchestra come up this time. 118 00:05:40,610 --> 00:05:43,580 All right, come on over. 119 00:05:43,580 --> 00:05:45,740 And if you wouldn't mind, come on over. 120 00:05:45,740 --> 00:05:47,620 We'll do introductions first. 121 00:05:47,620 --> 00:05:50,870 This will be almost as easy as the last one if you want to introduce yourself. 122 00:05:50,870 --> 00:05:52,890 And let me just stack you against the lectern this time. 123 00:05:52,890 --> 00:05:54,090 So if you could go there. 124 00:05:54,090 --> 00:05:55,560 And if you could come over here. 125 00:05:55,560 --> 00:05:58,220 And if you could come over here, we'll stack all three of you. 126 00:05:58,220 --> 00:05:58,970 So you were first. 127 00:05:58,970 --> 00:06:00,260 So you're first in the stack. 
128 00:06:00,260 --> 00:06:00,530 SPEAKER: Hi. 129 00:06:00,530 --> 00:06:01,130 I'm [INAUDIBLE]. 130 00:06:01,130 --> 00:06:02,505 I have no idea what I'm studying. 131 00:06:02,505 --> 00:06:03,452 And I live in Strauss. 132 00:06:03,452 --> 00:06:04,535 DAVID J. MALAN: Wonderful. 133 00:06:04,535 --> 00:06:05,035 And next? 134 00:06:05,035 --> 00:06:05,535 SPEAKER: Hi. 135 00:06:05,535 --> 00:06:07,370 I'm [? Tanai. ?] I'm studying econ and CS. 136 00:06:07,370 --> 00:06:09,270 And I live in Canada. 137 00:06:09,270 --> 00:06:09,770 CLARA: Hi. 138 00:06:09,770 --> 00:06:10,160 I'm Clara. 139 00:06:10,160 --> 00:06:11,240 I want to study applied math. 140 00:06:11,240 --> 00:06:12,320 And I'm in Wigglesworth. 141 00:06:12,320 --> 00:06:12,860 DAVID J. MALAN: Wonderful. 142 00:06:12,860 --> 00:06:14,068 Welcome, to all three of you. 143 00:06:14,068 --> 00:06:18,260 And if I may, let me just advance a bit more information about stacks. 144 00:06:18,260 --> 00:06:21,050 The catch is that stacks actually support 145 00:06:21,050 --> 00:06:24,470 what's known as LIFO, so last in, first out, which 146 00:06:24,470 --> 00:06:26,850 is sort of the opposite really of a queue or a line. 147 00:06:26,850 --> 00:06:28,440 So in fact you were last in line. 148 00:06:28,440 --> 00:06:29,900 So here we have your cookies. 149 00:06:29,900 --> 00:06:30,687 Thank you so much. 150 00:06:30,687 --> 00:06:33,270 And if you'd like to exit that way, we have your cookies here. 151 00:06:33,270 --> 00:06:34,020 Thank you so much. 152 00:06:34,020 --> 00:06:35,370 We'd like you to exit this way. 153 00:06:35,370 --> 00:06:39,320 And even though you were first, LIFO doesn't really 154 00:06:39,320 --> 00:06:43,970 give you any cookies because you're first in, not last in. 155 00:06:43,970 --> 00:06:45,920 So, yeah, OK, point's made. 156 00:06:45,920 --> 00:06:47,370 We'll give you the cookies. 157 00:06:47,370 --> 00:06:50,030 All right, so thank you to all three of our volunteers. 
158 00:06:50,030 --> 00:06:55,160 But LIFO, suffice to say, doesn't offer the same fairness 159 00:06:55,160 --> 00:06:58,070 guarantees as a queue or a line more traditionally. 160 00:06:58,070 --> 00:07:01,680 And imagine just lining up in any store or the dining hall or the like. 161 00:07:01,680 --> 00:07:06,540 Ideally, you want the people running the place to adhere to that queue, 162 00:07:06,540 --> 00:07:10,620 to that line so that FIFO is preserved if you indeed care about being first, 163 00:07:10,620 --> 00:07:15,130 whereas there are contexts in which LIFO does actually make sense. 164 00:07:15,130 --> 00:07:18,180 In fact, if you think about Gmail, your inbox, or Outlook, 165 00:07:18,180 --> 00:07:21,083 typically you're viewing your inbox as a stack, right? 166 00:07:21,083 --> 00:07:23,250 Because when you get new mail, where does it end up? 167 00:07:23,250 --> 00:07:26,080 It actually ends up in the top. 168 00:07:26,080 --> 00:07:30,000 And if you're like me, odds are which emails do you tend to first? 169 00:07:30,000 --> 00:07:32,640 I mean, probably the ones on the top, the ones that 170 00:07:32,640 --> 00:07:36,300 came in last, most recently, that is, and that might actually 171 00:07:36,300 --> 00:07:39,578 be to the detriment of people who emailed you earlier today or yesterday. 172 00:07:39,578 --> 00:07:42,120 Because once they sort of fall off the bottom of your screen, 173 00:07:42,120 --> 00:07:45,390 frankly, unless you click next, you may never see those emails again. 174 00:07:45,390 --> 00:07:48,000 But stacks are indeed one way of storing data. 175 00:07:48,000 --> 00:07:50,520 And Google and Microsoft presumably made the judgment 176 00:07:50,520 --> 00:07:55,200 call that, in general, we users want to see the most recent data first. 177 00:07:55,200 --> 00:07:58,110 The last information might be the first we want out. 
178 00:07:58,110 --> 00:08:01,140 Now, just in terms of nomenclature, the two operations 179 00:08:01,140 --> 00:08:03,990 that are analogous to enqueueing and dequeueing 180 00:08:03,990 --> 00:08:07,870 but with this property of LIFO are instead called push and pop. 181 00:08:07,870 --> 00:08:10,350 So when our first volunteer came up on stage, so to speak, 182 00:08:10,350 --> 00:08:12,930 I pushed him onto the stack against the lectern there. 183 00:08:12,930 --> 00:08:14,370 Second person was pushed. 184 00:08:14,370 --> 00:08:15,587 Third person was pushed. 185 00:08:15,587 --> 00:08:17,670 And then when it was time to hand out the cookies, 186 00:08:17,670 --> 00:08:21,720 we popped them, so to speak, one after the other but preserving that LIFO 187 00:08:21,720 --> 00:08:22,285 property. 188 00:08:22,285 --> 00:08:24,660 But here's where things are a little interesting in terms 189 00:08:24,660 --> 00:08:26,220 of implementation details. 190 00:08:26,220 --> 00:08:30,930 A stack could be implemented almost identically underneath the hood 191 00:08:30,930 --> 00:08:33,039 to a queue because what do you need? 192 00:08:33,039 --> 00:08:36,240 You need an array of people, which we could use our person data 193 00:08:36,240 --> 00:08:38,304 type from past classes. 194 00:08:38,304 --> 00:08:40,679 We have to keep track of how many people are in the stack 195 00:08:40,679 --> 00:08:43,260 so that even if we have a capacity of like 50, 196 00:08:43,260 --> 00:08:48,330 we know at least that we can store 3 plus maybe 47 others. 197 00:08:48,330 --> 00:08:51,330 Now, there's still going to be a change in the underlying implementation 198 00:08:51,330 --> 00:08:55,830 details because not pictured here is the actual C code that actually pushes 199 00:08:55,830 --> 00:08:57,900 and pops or enqueues and dequeues. 
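A minimal sketch of the code that is "not pictured" might look like the following. It assumes a capacity of 50, as in the lecture, and a hypothetical person type with only a name field; the function names push, pop, enqueue, and dequeue and their bool return values are illustrative choices, not the lecture's own code. Note that the two structures are identical and only the removal logic differs.

```c
#include <stdbool.h>

#define CAPACITY 50

// Hypothetical stand-in for the person type from past weeks
typedef struct
{
    const char *name;
} person;

// A stack and a queue can share the same layout:
// a fixed-size array plus a count of how many slots are in use
typedef struct
{
    person people[CAPACITY];
    int size;
} stack;

typedef struct
{
    person people[CAPACITY];
    int size;
} queue;

// Adds p on top of the stack; returns false if the stack is full
bool push(stack *s, person p)
{
    if (s->size == CAPACITY)
    {
        return false;
    }
    s->people[s->size] = p;
    s->size++;
    return true;
}

// Removes the most recently pushed person (LIFO); returns false if empty
bool pop(stack *s, person *out)
{
    if (s->size == 0)
    {
        return false;
    }
    s->size--;
    *out = s->people[s->size];
    return true;
}

// Adds p at the back of the queue; returns false if the queue is full
bool enqueue(queue *q, person p)
{
    if (q->size == CAPACITY)
    {
        return false;
    }
    q->people[q->size] = p;
    q->size++;
    return true;
}

// Removes the person who has waited longest (FIFO); returns false if empty.
// Everyone behind them is shifted forward by one slot.
bool dequeue(queue *q, person *out)
{
    if (q->size == 0)
    {
        return false;
    }
    *out = q->people[0];
    for (int i = 1; i < q->size; i++)
    {
        q->people[i - 1] = q->people[i];
    }
    q->size--;
    return true;
}
```

Shifting everyone forward in dequeue is the simplest option but takes a number of steps proportional to the queue's size; tracking the index of the front instead is a common alternative.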
200 00:08:57,900 --> 00:09:01,290 So whatever loops you're using, whatever code you're using, 201 00:09:01,290 --> 00:09:04,110 odds are that's where those properties are going to be implemented. 202 00:09:04,110 --> 00:09:06,690 FIFO versus LIFO, you're going to implement maybe the loop 203 00:09:06,690 --> 00:09:10,470 in this direction instead of this one or some such distinction. 204 00:09:10,470 --> 00:09:12,690 But at the end of the day, stacks and queues 205 00:09:12,690 --> 00:09:15,030 are just abstract data types in the sense 206 00:09:15,030 --> 00:09:17,700 that we can implement them in bunches of ways, two of them 207 00:09:17,700 --> 00:09:19,650 among them here thus far on the screen. 208 00:09:19,650 --> 00:09:21,870 But that array is going to come back to bite us. 209 00:09:21,870 --> 00:09:24,360 Because if you only have a capacity of 50, 210 00:09:24,360 --> 00:09:27,450 what happens if 51 people want cookies next time? 211 00:09:27,450 --> 00:09:30,300 You just don't have room for them even though, clearly, we have 212 00:09:30,300 --> 00:09:32,010 enough room for the people themselves. 213 00:09:32,010 --> 00:09:33,240 We have enough memory. 214 00:09:33,240 --> 00:09:36,300 So it seems a little shortsighted to limit just how much data 215 00:09:36,300 --> 00:09:38,170 can fit in our data structures. 216 00:09:38,170 --> 00:09:41,580 So with that said, a friend of ours, Shannon Duvall at Elon University, 217 00:09:41,580 --> 00:09:45,390 kindly put together a visualization of the same. 218 00:09:45,390 --> 00:09:50,490 And allow me to introduce you to two fellows known as Jack and Lou. 219 00:09:50,490 --> 00:09:52,485 If we could dim the lights for this video. 220 00:09:52,485 --> 00:09:53,152 [VIDEO PLAYBACK] 221 00:09:53,152 --> 00:09:56,138 [MUSIC PLAYING] 222 00:09:56,138 --> 00:09:58,112 223 00:09:58,112 --> 00:10:00,940 - Once upon a time, there was a guy named Jack. 
224 00:10:00,940 --> 00:10:04,270 When it came to making friends, Jack did not have the knack. 225 00:10:04,270 --> 00:10:07,300 So Jack went to talk to the most popular guy he knew. 226 00:10:07,300 --> 00:10:09,910 He went up to Lou and asked what do I do? 227 00:10:09,910 --> 00:10:12,430 Lou saw that his friend was really distressed. 228 00:10:12,430 --> 00:10:15,070 Well, Lou began, just look how you're dressed. 229 00:10:15,070 --> 00:10:17,620 Don't you have any clothes with a different look? 230 00:10:17,620 --> 00:10:18,820 Yes, said Jack. 231 00:10:18,820 --> 00:10:20,000 I sure do. 232 00:10:20,000 --> 00:10:22,340 Come to my house and I'll show them to you. 233 00:10:22,340 --> 00:10:23,590 So they went off to Jack's. 234 00:10:23,590 --> 00:10:26,620 And Jack showed Lou the box where he kept all his shirts 235 00:10:26,620 --> 00:10:28,270 and his pants and his socks. 236 00:10:28,270 --> 00:10:31,330 Lou said, I see you have all your clothes in a pile. 237 00:10:31,330 --> 00:10:33,820 Why don't you wear some others once in a while? 238 00:10:33,820 --> 00:10:37,090 Jack said, well, when I remove clothes and socks, 239 00:10:37,090 --> 00:10:39,760 I wash them and put them away in the box. 240 00:10:39,760 --> 00:10:42,340 Then comes the next morning, and up I hop. 241 00:10:42,340 --> 00:10:45,400 I go to the box and get my clothes off the top. 242 00:10:45,400 --> 00:10:48,100 Lou quickly realized the problem with Jack. 243 00:10:48,100 --> 00:10:51,070 He kept clothes, CDs, and books in a stack. 244 00:10:51,070 --> 00:10:53,600 When he reached for something to read or to wear, 245 00:10:53,600 --> 00:10:55,960 he chose the top book or underwear. 246 00:10:55,960 --> 00:10:58,580 Then when he was done, he would put it right back. 247 00:10:58,580 --> 00:11:01,230 Back it would go on top of the stack. 248 00:11:01,230 --> 00:11:03,560 I know the solution, said the triumphant Lou. 
249 00:11:03,560 --> 00:11:06,140 You need to learn to start using a queue. 250 00:11:06,140 --> 00:11:09,090 Lou took Jack's clothes and hung them in a closet. 251 00:11:09,090 --> 00:11:11,720 And when he had emptied the box, he just tossed it. 252 00:11:11,720 --> 00:11:15,710 Then he said now, Jack, at the end of the day, put your clothes on the left 253 00:11:15,710 --> 00:11:17,160 when you put them away. 254 00:11:17,160 --> 00:11:18,920 Then tomorrow morning when you see the sun 255 00:11:18,920 --> 00:11:22,550 shine, get your clothes from the right from the end of the line. 256 00:11:22,550 --> 00:11:24,050 Don't you see, said Lou? 257 00:11:24,050 --> 00:11:25,520 It will be so nice. 258 00:11:25,520 --> 00:11:28,700 You'll wear everything once before you wear something twice. 259 00:11:28,700 --> 00:11:31,760 And with everything in queues in his closet and shelf, 260 00:11:31,760 --> 00:11:34,940 Jack started to feel quite sure of himself all thanks 261 00:11:34,940 --> 00:11:36,920 to Lou and his wonderful queue. 262 00:11:36,920 --> 00:11:39,535 263 00:11:39,535 --> 00:11:40,430 [END PLAYBACK] 264 00:11:40,430 --> 00:11:40,930 265 00:11:40,930 --> 00:11:42,180 DAVID J. MALAN: So the same-- 266 00:11:42,180 --> 00:11:44,010 wonderful, thanks to Shannon-- 267 00:11:44,010 --> 00:11:46,440 so you might have noticed I wear black all the time 268 00:11:46,440 --> 00:11:49,660 so we could make a similar gag about here's what my stack of clothes at home 269 00:11:49,660 --> 00:11:50,160 looks like. 270 00:11:50,160 --> 00:11:52,740 Even though I might own a blue and a red sweatshirt, 271 00:11:52,740 --> 00:11:55,980 it doesn't really work if you're popping everything from a stack every time, 272 00:11:55,980 --> 00:11:59,130 cleaning it, replenishing the black sweaters before the red or the blue 273 00:11:59,130 --> 00:12:00,720 even get popped themselves. 
274 00:12:00,720 --> 00:12:03,482 But we're going to focus today not just on stacks and queues 275 00:12:03,482 --> 00:12:05,190 which for us are really meant to motivate 276 00:12:05,190 --> 00:12:09,430 different ways of designing data even though the implementation details might 277 00:12:09,430 --> 00:12:09,930 differ. 278 00:12:09,930 --> 00:12:12,000 But we're going to start focusing on solving some problems 279 00:12:12,000 --> 00:12:14,430 that invariably we'd be bumping up against anyway as we 280 00:12:14,430 --> 00:12:17,070 develop more and more real world software, not just 281 00:12:17,070 --> 00:12:18,540 smaller programs as in class. 282 00:12:18,540 --> 00:12:20,910 And arrays, recall, are what? 283 00:12:20,910 --> 00:12:22,830 What's the key characteristic or definition 284 00:12:22,830 --> 00:12:25,448 of an array with respect to your computer's memory 285 00:12:25,448 --> 00:12:26,490 and storing things in it? 286 00:12:26,490 --> 00:12:27,110 Yeah? 287 00:12:27,110 --> 00:12:28,860 AUDIENCE: It stores the data contiguously. 288 00:12:28,860 --> 00:12:29,860 DAVID J. MALAN: Perfect. 289 00:12:29,860 --> 00:12:32,610 So it stores the data contiguously back to back to back. 290 00:12:32,610 --> 00:12:36,542 And as we've seen thus far, when you allocate space for an array, 291 00:12:36,542 --> 00:12:38,250 you typically do it with square brackets. 292 00:12:38,250 --> 00:12:40,800 You specify a number in those brackets or maybe a constant, 293 00:12:40,800 --> 00:12:42,570 like capacity, like I just did. 294 00:12:42,570 --> 00:12:45,780 And that fixates just how much data you can actually store in there. 295 00:12:45,780 --> 00:12:48,210 We did see last week, though, that we could 296 00:12:48,210 --> 00:12:52,050 start to use malloc to allocate an equivalent number of bytes. 297 00:12:52,050 --> 00:12:54,090 But even that, when you call it just once, 298 00:12:54,090 --> 00:12:56,740 gives you back a specific finite number of bytes. 
299 00:12:56,740 --> 00:12:59,280 So you're similarly deciding in advance how much memory you 300 00:12:59,280 --> 00:13:00,883 can store in an array. 301 00:13:00,883 --> 00:13:03,550 So let's consider what kinds of problems this could get us into. 302 00:13:03,550 --> 00:13:04,947 So here's an array of size three. 303 00:13:04,947 --> 00:13:06,780 And suppose for the sake of discussion we've 304 00:13:06,780 --> 00:13:10,290 already put three numbers into it 1, 2, and 3 literally. 305 00:13:10,290 --> 00:13:13,440 Suppose now we want to add a fourth number to that array. 306 00:13:13,440 --> 00:13:14,820 Well, where does it go? 307 00:13:14,820 --> 00:13:18,610 Intuitively and pictorially, you'd like to think it could go there. 308 00:13:18,610 --> 00:13:21,390 But remember the context we introduced last week when 309 00:13:21,390 --> 00:13:22,890 we talked about computers' memories. 310 00:13:22,890 --> 00:13:24,760 There's lots of stuff going on. 311 00:13:24,760 --> 00:13:28,600 And if you only ask the computer, the operating system, 312 00:13:28,600 --> 00:13:32,795 room for three integers, who knows what's here and here and here, 313 00:13:32,795 --> 00:13:34,670 not to mention everywhere else on the screen? 314 00:13:34,670 --> 00:13:38,440 So if we zoom out for instance, we might like to put the number four there. 315 00:13:38,440 --> 00:13:43,010 But we can't if in that greater context there's a lot more stuff going on. 316 00:13:43,010 --> 00:13:46,550 So for instance, suppose that elsewhere in my same program or function 317 00:13:46,550 --> 00:13:50,890 I've already created a string like H-E-L-L-O, comma, space, world, 318 00:13:50,890 --> 00:13:52,120 backslash 0. 319 00:13:52,120 --> 00:13:56,050 Just by bad luck, that could be allocated right next to my 1, 2, 3. 320 00:13:56,050 --> 00:13:56,620 Why? 
321 00:13:56,620 --> 00:13:59,860 Well, if I ask the operating system for space for three numbers, 322 00:13:59,860 --> 00:14:02,710 then I ask the operating system for space for a string, 323 00:14:02,710 --> 00:14:06,580 it's pretty reasonable for the computer to put those things back to back, 324 00:14:06,580 --> 00:14:10,360 because it's not going to anticipate for us that, well, maybe they actually want 325 00:14:10,360 --> 00:14:13,030 four numbers eventually or five numbers or more. 326 00:14:13,030 --> 00:14:15,520 Now, as for all of these Oscars the Grouch, 327 00:14:15,520 --> 00:14:17,920 that's just meant to represent pictorially 328 00:14:17,920 --> 00:14:19,480 here the notion of garbage values. 329 00:14:19,480 --> 00:14:22,300 There's clearly other bytes there and available. 330 00:14:22,300 --> 00:14:23,680 I don't know what it is. 331 00:14:23,680 --> 00:14:24,910 And I don't care what it is. 332 00:14:24,910 --> 00:14:29,020 But I do care that I can't just presume to put something right where 333 00:14:29,020 --> 00:14:33,350 I want in the computer's memory unless I preemptively ask it for more memory. 334 00:14:33,350 --> 00:14:35,290 Now, if all of those are garbage values, which 335 00:14:35,290 --> 00:14:38,410 is to say that who cares what they are-- it's just junk left over 336 00:14:38,410 --> 00:14:40,900 from previous runs of the function or the like-- 337 00:14:40,900 --> 00:14:44,140 there's clearly plenty of room for a fourth number. 338 00:14:44,140 --> 00:14:48,760 I could put the number four here or here or here or down here or here or here. 339 00:14:48,760 --> 00:14:51,670 But why would I not want to just plop the four wherever 340 00:14:51,670 --> 00:14:53,780 there is a garbage value currently? 341 00:14:53,780 --> 00:14:54,280 Yeah? 342 00:14:54,280 --> 00:14:57,580 AUDIENCE: Because you want it to be next to your array of 1, 2, 3. 343 00:14:57,580 --> 00:15:00,205 DAVID J. 
MALAN: Exactly, I want it to be next to my array of 1, 344 00:15:00,205 --> 00:15:03,910 2, 3 because, again, arrays must be and must remain contiguous. 345 00:15:03,910 --> 00:15:07,030 Now, that's not a deal breaker because where else could I 346 00:15:07,030 --> 00:15:09,010 put maybe the entire array? 347 00:15:09,010 --> 00:15:11,080 Well, there's room up here for four numbers. 348 00:15:11,080 --> 00:15:12,980 There's room down here for four numbers. 349 00:15:12,980 --> 00:15:13,720 So that's fine. 350 00:15:13,720 --> 00:15:15,650 And that could be a solution to the problem. 351 00:15:15,650 --> 00:15:18,220 If you've run out of space in your fixed size array, 352 00:15:18,220 --> 00:15:20,890 well maybe I just abstract everything else away, 353 00:15:20,890 --> 00:15:25,270 and I just move my array to a different location that's a little bit bigger. 354 00:15:25,270 --> 00:15:26,770 But there is going to be a downside. 355 00:15:26,770 --> 00:15:29,228 Even though this is a solution, even though I can certainly 356 00:15:29,228 --> 00:15:32,810 copy the 1, the 2, the 3-- and now I can plop the 4 there. 357 00:15:32,810 --> 00:15:36,010 And, heck, I can then let go of the old memory in some way 358 00:15:36,010 --> 00:15:38,920 and give it back to the operating system to be reused later. 359 00:15:38,920 --> 00:15:40,720 This is successful. 360 00:15:40,720 --> 00:15:44,350 But why intuitively might we not want this 361 00:15:44,350 --> 00:15:48,640 to be our solution of creating a new array that's a little bigger, 362 00:15:48,640 --> 00:15:51,400 copying the old into the new, and getting rid of the old? 363 00:15:51,400 --> 00:15:55,310 364 00:15:55,310 --> 00:15:57,080 Good, yeah, I think I had one more step. 365 00:15:57,080 --> 00:15:59,420 Suppose I want to add a fifth number, a sixth number. 366 00:15:59,420 --> 00:16:00,740 That's a lot of work. 
367 00:16:00,740 --> 00:16:04,080 And, in fact, what's the expensive part or what's the slow part of that story? 368 00:16:04,080 --> 00:16:04,580 Yeah? 369 00:16:04,580 --> 00:16:05,955 AUDIENCE: It takes a lot of time. 370 00:16:05,955 --> 00:16:07,580 DAVID J. MALAN: It takes a lot of time. 371 00:16:07,580 --> 00:16:10,570 But specifically, what's taking time if we can put our finger on it? 372 00:16:10,570 --> 00:16:11,860 Yeah, in the back? 373 00:16:11,860 --> 00:16:14,260 AUDIENCE: You're using twice as much. 374 00:16:14,260 --> 00:16:16,310 DAVID J. MALAN: OK, for a period of time, 375 00:16:16,310 --> 00:16:19,120 I'm using twice as much memory, which certainly seems wasteful 376 00:16:19,120 --> 00:16:22,510 because even though I don't eventually need it, it is going to mushroom 377 00:16:22,510 --> 00:16:26,150 and then shrink back down, which seems like an inefficient use of resources. 378 00:16:26,150 --> 00:16:28,930 But what specifically is slow about this process too? 379 00:16:28,930 --> 00:16:30,063 Yeah, in the middle. 380 00:16:30,063 --> 00:16:32,980 AUDIENCE: You're iterating through the original array to copy it over. 381 00:16:32,980 --> 00:16:34,772 DAVID J. MALAN: Yeah, good choice of words. 382 00:16:34,772 --> 00:16:38,390 You're iterating over the array to copy it over using a for loop, a while loop. 383 00:16:38,390 --> 00:16:41,020 So it's probably like big O of n steps just 384 00:16:41,020 --> 00:16:44,260 to copy the array and technically big O of n plus 1 if we had one more. 385 00:16:44,260 --> 00:16:45,520 But that's still big O of n. 386 00:16:45,520 --> 00:16:48,580 So it's the copying, the moving of the data, so to speak, 387 00:16:48,580 --> 00:16:50,120 that's certainly correct. 388 00:16:50,120 --> 00:16:52,450 But maybe it's not the best design. 389 00:16:52,450 --> 00:16:55,060 Wouldn't it be better if we could do something otherwise? 
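The copy-and-move approach just described can be sketched as follows. This is an illustration under stated assumptions rather than the lecture's own code: the function name append is hypothetical, and the old array is assumed to have come from malloc so that it can be freed. The for loop is the big O of n part.

```c
#include <stdlib.h>

// Returns a new array holding the old_size values from old followed by value,
// or NULL if memory could not be allocated. On success the old array is freed.
// Assumes old was itself obtained from malloc.
int *append(int *old, int old_size, int value)
{
    // Ask for a chunk one int larger than the old array
    int *bigger = malloc((old_size + 1) * sizeof(int));
    if (bigger == NULL)
    {
        return NULL;
    }

    // Copy every existing value: this takes old_size steps
    for (int i = 0; i < old_size; i++)
    {
        bigger[i] = old[i];
    }

    // Place the new value right after the copied ones
    bigger[old_size] = value;

    // Give the old memory back to the operating system
    free(old);
    return bigger;
}
```

Between the call to malloc and the call to free, both arrays exist at once, which is the temporary doubling of memory mentioned above.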
390 00:16:55,060 --> 00:16:58,510 Well, let's consider what this might actually translate into in code 391 00:16:58,510 --> 00:17:00,370 and what the implications then might be. 392 00:17:00,370 --> 00:17:02,320 Let me switch over here to VS Code. 393 00:17:02,320 --> 00:17:07,700 Let me propose to open up a file called list.c brand new. 394 00:17:07,700 --> 00:17:10,920 And let's create this list of numbers and then add to it over time 395 00:17:10,920 --> 00:17:13,770 and see when and where we actually bump up against these problems. 396 00:17:13,770 --> 00:17:17,450 So let me include standard io.h in order to simply 397 00:17:17,450 --> 00:17:21,290 be able to print things out ultimately, int main void, so no need 398 00:17:21,290 --> 00:17:22,880 for command line arguments here. 399 00:17:22,880 --> 00:17:27,950 Let me give myself an array called list just of size 3 400 00:17:27,950 --> 00:17:29,960 for consistency with the picture thus far. 401 00:17:29,960 --> 00:17:31,970 And now let me go ahead and just manually 402 00:17:31,970 --> 00:17:34,250 make it look like in memory what it did on the screen. 403 00:17:34,250 --> 00:17:37,520 So list bracket 0 is going to equal the number 1. 404 00:17:37,520 --> 00:17:40,790 List bracket 1 is going to equal the number 2. 405 00:17:40,790 --> 00:17:42,920 And list bracket 2 equals the number 3. 406 00:17:42,920 --> 00:17:45,260 So even though the array is, of course, zero indexed, 407 00:17:45,260 --> 00:17:48,533 I'm using more familiar 1, 2, 3 as my digits here. 408 00:17:48,533 --> 00:17:50,450 Now, suppose I want to print these things out. 409 00:17:50,450 --> 00:17:52,830 Let's just do something as a simple exercise. 410 00:17:52,830 --> 00:17:58,520 So for int i equals 0, i is less than 3 i plus plus. 411 00:17:58,520 --> 00:18:01,220 Inside of this loop, I'm going to do something simple like print 412 00:18:01,220 --> 00:18:07,580 out iteratively, as you note, backslash n list bracket i. 
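The steps dictated above can be sketched roughly as follows. In the lecture everything lives inside main in list.c; here the same statements are split into two hypothetical helper functions, fill_list and print_list, purely so they can be called from elsewhere. The format string "%i\n" and the LENGTH constant are assumptions, since the transcript does not spell them out.

```c
#include <stdio.h>

#define LENGTH 3

// Stores 1, 2, 3 in a caller-provided array of LENGTH ints,
// mirroring the three manual assignments in the lecture
void fill_list(int list[LENGTH])
{
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;
}

// Prints each value on its own line and returns how many were printed
int print_list(const int list[], int length)
{
    for (int i = 0; i < length; i++)
    {
        printf("%i\n", list[i]);
    }
    return length;
}
```

Declaring int list[LENGTH]; in main and then calling fill_list(list) followed by print_list(list, LENGTH) reproduces the program as described, with the constant standing in for the hard-coded 3.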
413 00:18:07,580 --> 00:18:09,000 So very simple program. 414 00:18:09,000 --> 00:18:11,750 It's not the best design because I've got this magic number there. 415 00:18:11,750 --> 00:18:12,830 I'm hard coding the 3. 416 00:18:12,830 --> 00:18:15,080 But the point is just to go through the motions 417 00:18:15,080 --> 00:18:19,250 of demonstrating how this code works. 418 00:18:19,250 --> 00:18:21,805 Good, you got it in before I hit compile. 419 00:18:21,805 --> 00:18:23,660 So wait. 420 00:18:23,660 --> 00:18:25,160 Thank you. 421 00:18:25,160 --> 00:18:26,630 All right. 422 00:18:26,630 --> 00:18:27,690 Maybe round of applause. 423 00:18:27,690 --> 00:18:28,220 Thank you. 424 00:18:28,220 --> 00:18:30,170 [APPLAUSE] 425 00:18:30,170 --> 00:18:32,130 All right. 426 00:18:32,130 --> 00:18:34,880 All right, so this is going to get aggressive, though, eventually. 427 00:18:34,880 --> 00:18:36,740 So let me add the semicolon. 428 00:18:36,740 --> 00:18:39,960 Let me recompile this list. 429 00:18:39,960 --> 00:18:41,270 Seems to compile OK. 430 00:18:41,270 --> 00:18:44,390 And if I do ./list, I should see, of course, 1, 2, 3. 431 00:18:44,390 --> 00:18:45,380 So the code works. 432 00:18:45,380 --> 00:18:47,463 There's no memory constraints here because I'm not 433 00:18:47,463 --> 00:18:48,980 trying to actually add some values. 434 00:18:48,980 --> 00:18:54,200 But let me consider how I could go about implementing this idea of copying 435 00:18:54,200 --> 00:18:56,330 everything from the old array to the new array, 436 00:18:56,330 --> 00:18:59,960 frankly, just to see how annoying it is, how painful it is. 437 00:18:59,960 --> 00:19:02,270 So you're about to see the code escalate quickly. 
438 00:19:02,270 --> 00:19:06,772 And it will be helpful to try to wrap your mind around each individual step 439 00:19:06,772 --> 00:19:08,480 even though if you take a step back, it's 440 00:19:08,480 --> 00:19:11,450 going to look like a crazy amount of code to solve a simple idea. 441 00:19:11,450 --> 00:19:12,330 But that's the point. 442 00:19:12,330 --> 00:19:14,660 We're going to get to a place, particularly in week 6 443 00:19:14,660 --> 00:19:18,420 where all of what we're about to do reduces to one line of code. 444 00:19:18,420 --> 00:19:19,998 So hang in there for now. 445 00:19:19,998 --> 00:19:21,290 So let me go ahead and do this. 446 00:19:21,290 --> 00:19:28,130 If I want to create a version of this code that can grow to fit more numbers, 447 00:19:28,130 --> 00:19:32,330 for instance, how can I go about doing this or at least demonstrate as much? 448 00:19:32,330 --> 00:19:37,070 Well, I cannot use an array in this traditional way of using square 449 00:19:37,070 --> 00:19:42,140 brackets because that makes list, the variable, forever of size 3. 450 00:19:42,140 --> 00:19:43,130 I can't free it. 451 00:19:43,130 --> 00:19:44,930 Remember free you can only use with malloc. 452 00:19:44,930 --> 00:19:48,540 So you can't give it back and then recreate it using this syntax. 453 00:19:48,540 --> 00:19:50,995 But I can use this trick from last time, whereby, 454 00:19:50,995 --> 00:19:53,870 if I know there is this function called malloc, whose purpose in life 455 00:19:53,870 --> 00:19:57,170 is to give me memory, I could, for instance re-declare 456 00:19:57,170 --> 00:20:01,820 list to be a pointer so to speak that is the address of a chunk of memory. 457 00:20:01,820 --> 00:20:07,130 And I could ask malloc for a chunk of memory namely of size 3 but not 3 458 00:20:07,130 --> 00:20:09,770 per se, three integers for good measure. 459 00:20:09,770 --> 00:20:14,180 So technically that's three times the size of whatever an int is. 
460 00:20:14,180 --> 00:20:17,690 Now, for our purposes today, that's technically 3 times 4 or 12. 461 00:20:17,690 --> 00:20:20,120 But I'm trying to do this very generally in case we use it 462 00:20:20,120 --> 00:20:23,930 on an old computer or maybe a future computer, where the size of an int 463 00:20:23,930 --> 00:20:25,230 might very well change. 464 00:20:25,230 --> 00:20:26,750 That's why I'm using size of int. 465 00:20:26,750 --> 00:20:29,660 It will tell me always the correct answer for my computer. 466 00:20:29,660 --> 00:20:31,760 So to use malloc-- 467 00:20:31,760 --> 00:20:36,637 not going to catch me on this one-- what header file do I need to add? 468 00:20:36,637 --> 00:20:37,137 Standard? 469 00:20:37,137 --> 00:20:38,051 AUDIENCE: Standard lib.h. 470 00:20:38,051 --> 00:20:39,440 DAVID J. MALAN: Standard lib.h. 471 00:20:39,440 --> 00:20:42,500 So I'm going to go ahead and include standard lib.h, which 472 00:20:42,500 --> 00:20:43,977 gives me access to malloc. 473 00:20:43,977 --> 00:20:47,060 And what I'm going to additionally do is practice what I preach last week, 474 00:20:47,060 --> 00:20:51,500 whereby in extreme cases malloc can return not the address 475 00:20:51,500 --> 00:20:52,910 of an actual chunk of memory. 476 00:20:52,910 --> 00:20:56,300 What else can malloc return in cases of error? 477 00:20:56,300 --> 00:20:57,020 Yeah? 478 00:20:57,020 --> 00:20:57,740 AUDIENCE: Null. 479 00:20:57,740 --> 00:21:00,420 DAVID J. MALAN: Null, N-U-L-L in all caps, 480 00:21:00,420 --> 00:21:02,188 which represents technically address 0. 481 00:21:02,188 --> 00:21:03,980 But you're never supposed to use address 0. 482 00:21:03,980 --> 00:21:07,160 So it's a special sentinel value that just means something went wrong. 483 00:21:07,160 --> 00:21:08,200 Do not proceed. 484 00:21:08,200 --> 00:21:09,950 So it's going to add some bulk to my code. 485 00:21:09,950 --> 00:21:11,160 But it is good practice. 
486 00:21:11,160 --> 00:21:14,738 So if list at this point actually equals equals null, 487 00:21:14,738 --> 00:21:16,280 there's no more work to be done here. 488 00:21:16,280 --> 00:21:17,880 I've got to abort the demo altogether. 489 00:21:17,880 --> 00:21:19,820 So I'm going to return 1 just arbitrarily 490 00:21:19,820 --> 00:21:21,362 to say we're done with this exercise. 491 00:21:21,362 --> 00:21:22,987 It's not going to be germane for class. 492 00:21:22,987 --> 00:21:25,760 We can surely find room for three integers but best practice 493 00:21:25,760 --> 00:21:27,830 whenever using malloc. 494 00:21:27,830 --> 00:21:33,290 Now, this code here does not need to change because list is now still 495 00:21:33,290 --> 00:21:36,240 a chunk of memory of size 12, I can actually 496 00:21:36,240 --> 00:21:38,520 get away with still using square bracket notation 497 00:21:38,520 --> 00:21:42,060 and treating this chunk of memory as though it's an array. 498 00:21:42,060 --> 00:21:43,230 And this is a bit subtle. 499 00:21:43,230 --> 00:21:47,040 But recall from last time, we talked briefly about pointer arithmetic, 500 00:21:47,040 --> 00:21:50,850 whereby the computer can do some arithmetic, some addition, subtraction 501 00:21:50,850 --> 00:21:54,088 on the actual addresses to get from one location to the other. 502 00:21:54,088 --> 00:21:56,130 And that's what the computer is going to do here. 503 00:21:56,130 --> 00:21:59,850 Because it says list bracket 0, that's essentially 504 00:21:59,850 --> 00:22:03,960 just going to put the number 1 literally at the beginning 505 00:22:03,960 --> 00:22:05,030 of that chunk of memory. 506 00:22:05,030 --> 00:22:08,280 And because this is a modern computer, it's going to take four bytes in total. 507 00:22:08,280 --> 00:22:12,930 But I don't want to put the number 4 here to shift it over myself. 
508 00:22:12,930 --> 00:22:16,410 Because I'm using square brackets and because the computer 509 00:22:16,410 --> 00:22:19,770 knows that this chunk of memory is being treated 510 00:22:19,770 --> 00:22:24,850 as a chunk of addresses of integers, pointer arithmetic magically kicks in. 511 00:22:24,850 --> 00:22:29,310 So what the computer is going to do for me is put this 1 at location 0. 512 00:22:29,310 --> 00:22:34,830 It's going to put this number 2 at location 1 times size of int, so 4. 513 00:22:34,830 --> 00:22:38,040 And it's going to put this number 3 at location 2 times 514 00:22:38,040 --> 00:22:40,160 size of int, which gives me 8. 515 00:22:40,160 --> 00:22:41,910 So in other words, you don't have to think 516 00:22:41,910 --> 00:22:46,320 about how big that chunk of memory is if you already gave the compiler a clue as 517 00:22:46,320 --> 00:22:46,997 to the size. 518 00:22:46,997 --> 00:22:49,330 For our purposes today, don't worry too much about that. 519 00:22:49,330 --> 00:22:52,710 The bigger takeaway is that when you allocate space using malloc, 520 00:22:52,710 --> 00:22:55,080 you can certainly treat it as though it's 521 00:22:55,080 --> 00:22:59,100 an array using week 2 notation, which is arguably simpler than using 522 00:22:59,100 --> 00:23:01,560 dots and stars and all of that. 523 00:23:01,560 --> 00:23:03,480 But this isn't quite enough now because let 524 00:23:03,480 --> 00:23:05,730 me stipulate that for the sake of discussion, 525 00:23:05,730 --> 00:23:10,320 at this point in time here on line 16, where the cursor is blinking, 526 00:23:10,320 --> 00:23:13,170 suppose I realize just for the sake of discussion, 527 00:23:13,170 --> 00:23:17,130 oh, I should have allocated space for four integers instead of three. 528 00:23:17,130 --> 00:23:19,600 Now, obviously, if I were writing this for real, 529 00:23:19,600 --> 00:23:22,498 I should just go fix the code now and recompile it altogether? 
530 00:23:22,498 --> 00:23:24,540 But let's just pretend for the sake of discussion 531 00:23:24,540 --> 00:23:28,770 that somewhere in your program you want to dynamically allocate more space 532 00:23:28,770 --> 00:23:32,490 and free up the old in order to implement this idea of copying 533 00:23:32,490 --> 00:23:34,270 from old to new memory. 534 00:23:34,270 --> 00:23:35,710 So how could I do that? 535 00:23:35,710 --> 00:23:39,570 Well, let me go ahead and temporarily give myself another chunk of memory. 536 00:23:39,570 --> 00:23:41,770 And I'm going to literally call it tmp for short, 537 00:23:41,770 --> 00:23:43,620 which is a common convention, tmp. 538 00:23:43,620 --> 00:23:46,350 I'm going to set that equal to the amount of space 539 00:23:46,350 --> 00:23:47,920 that I actually do now want. 540 00:23:47,920 --> 00:23:50,550 So I'm going to say four times the size of an int. 541 00:23:50,550 --> 00:23:54,510 So technically it'll give me 16 bytes, but space for four integers this time. 542 00:23:54,510 --> 00:23:58,560 And what that's doing for me in code is essentially 543 00:23:58,560 --> 00:24:02,858 trying to find me space for four integers elsewhere 544 00:24:02,858 --> 00:24:04,650 that might very well be garbage values now. 545 00:24:04,650 --> 00:24:06,820 But I can, therefore, reuse them. 546 00:24:06,820 --> 00:24:09,690 So once I've done this, something could still go wrong. 547 00:24:09,690 --> 00:24:12,550 And I could check if tmp equals equals null, 548 00:24:12,550 --> 00:24:15,810 then actually I should exit altogether and finish up. 549 00:24:15,810 --> 00:24:17,848 But there's a subtlety here. 550 00:24:17,848 --> 00:24:20,140 And you don't need to dwell too much on this for today. 551 00:24:20,140 --> 00:24:22,350 But there is technically a bug right now.
552 00:24:22,350 --> 00:24:28,500 Why, based on week 4, last week, might it not be correct technically 553 00:24:28,500 --> 00:24:33,420 to immediately return 1 and abort the program altogether at this point? 554 00:24:33,420 --> 00:24:36,870 AUDIENCE: I think when you allocate memory sometimes it has garbage values. 555 00:24:36,870 --> 00:24:38,870 DAVID J. MALAN: OK, so when you allocate memory, 556 00:24:38,870 --> 00:24:41,328 sometimes there might be garbage values there, that is true. 557 00:24:41,328 --> 00:24:45,260 But that is to say that those 16 bytes might be garbage values, 558 00:24:45,260 --> 00:24:47,360 have Oscar the Grouches all on the screen. 559 00:24:47,360 --> 00:24:51,290 But tmp itself will literally be the return value of malloc. 560 00:24:51,290 --> 00:24:55,310 And malloc will always return to you the address of a valid chunk of memory. 561 00:24:55,310 --> 00:24:56,700 Or it will return null. 562 00:24:56,700 --> 00:24:59,090 So this line is actually OK. 563 00:24:59,090 --> 00:25:02,170 What I don't love is that I'm returning 1 immediately. 564 00:25:02,170 --> 00:25:05,360 AUDIENCE: I think [INAUDIBLE]. 565 00:25:05,360 --> 00:25:07,490 DAVID J. MALAN: Yes, so this is where it's subtle. 566 00:25:07,490 --> 00:25:10,790 It's not quite right to just abort right now and return 1. 567 00:25:10,790 --> 00:25:11,300 Why? 568 00:25:11,300 --> 00:25:14,480 Because up here, remember, a few moments ago 569 00:25:14,480 --> 00:25:17,240 we used malloc presumably successfully. 570 00:25:17,240 --> 00:25:21,080 Because if we got all the way down here, we did not abort on line 9. 571 00:25:21,080 --> 00:25:22,190 So we kept going. 572 00:25:22,190 --> 00:25:26,780 But that means we've allocated three times size of int, so 12 bytes earlier.
573 00:25:26,780 --> 00:25:30,140 So frankly, if you compile this code, run it, and then ask Valgrind, 574 00:25:30,140 --> 00:25:34,100 it's going to identify a memory leak of size 12 because, as you know, 575 00:25:34,100 --> 00:25:35,762 we did not free the original memory. 576 00:25:35,762 --> 00:25:38,720 So this is where frankly C does get a little annoying because you and I 577 00:25:38,720 --> 00:25:41,490 as the programmers have to remember all of these details. 578 00:25:41,490 --> 00:25:46,310 So what I really want to do here, before I return 1, to be best practice, 579 00:25:46,310 --> 00:25:48,840 I want to free the original list. 580 00:25:48,840 --> 00:25:51,450 So I give back those bytes to the operating system. 581 00:25:51,450 --> 00:25:54,727 Now, as an aside, technically when any program quits, all of the memory 582 00:25:54,727 --> 00:25:56,810 is going to be given back to the operating system. 583 00:25:56,810 --> 00:26:00,710 But practicing what I'm preaching now will get you into better situations 584 00:26:00,710 --> 00:26:01,320 later. 585 00:26:01,320 --> 00:26:04,560 Because if you don't free up memory, you will have leaks. 586 00:26:04,560 --> 00:26:06,690 And that's when our own Macs and PCs tend 587 00:26:06,690 --> 00:26:10,050 to start to slow down and use up more memory than they should. 588 00:26:10,050 --> 00:26:13,110 But let's avoid discussion of more error checking there. 589 00:26:13,110 --> 00:26:15,330 Let's just assume that now I'm on line 23 590 00:26:15,330 --> 00:26:19,230 of this program, whereby I have presumably successfully allocated 591 00:26:19,230 --> 00:26:20,110 enough space. 592 00:26:20,110 --> 00:26:24,360 So the next step after allocating space for these four integers is to, as you noted earlier, 593 00:26:24,360 --> 00:26:28,000 iteratively copy the old numbers into the new space. 594 00:26:28,000 --> 00:26:30,043 So this is actually pretty straightforward. 595 00:26:30,043 --> 00:26:30,960 I'm going to go ahead.
596 00:26:30,960 --> 00:26:36,120 And for int i gets 0, i is less than 3 i plus plus just 597 00:26:36,120 --> 00:26:37,980 like I was printing last time. 598 00:26:37,980 --> 00:26:41,550 I'm going to go ahead and set the i-th location of temp 599 00:26:41,550 --> 00:26:45,450 equal to the i-th location of list, semicolon. 600 00:26:45,450 --> 00:26:46,140 And that's it. 601 00:26:46,140 --> 00:26:51,540 I'm just copying into the temporary array whatever was in the old array. 602 00:26:51,540 --> 00:26:55,230 But that still leaves me with this fourth byte, of course-- 603 00:26:55,230 --> 00:26:58,290 or, sorry, this fourth location, where I want to put the number 4. 604 00:26:58,290 --> 00:27:00,540 But if I'm going to do that for the sake of discussion 605 00:27:00,540 --> 00:27:03,540 even though this isn't really a compelling real world program, 606 00:27:03,540 --> 00:27:08,820 I'm going to just manually go into the last location in tmp, a.k.a. 607 00:27:08,820 --> 00:27:12,810 tmp bracket 3 and set that equal to my fourth number. 608 00:27:12,810 --> 00:27:13,920 So that's all. 609 00:27:13,920 --> 00:27:18,780 The whole point here is to mimic in code what it was we wanted to do here. 610 00:27:18,780 --> 00:27:20,110 But now there's one more step. 611 00:27:20,110 --> 00:27:23,880 What was the next step after copying the 1, the 2, the 3, and adding the 4? 612 00:27:23,880 --> 00:27:25,650 What do I want to do? 613 00:27:25,650 --> 00:27:27,810 Now, I can safely free the list. 614 00:27:27,810 --> 00:27:30,240 Now I want to go ahead and get rid of the original memory 615 00:27:30,240 --> 00:27:32,530 or at least hand it back to the operating system. 616 00:27:32,530 --> 00:27:36,510 So here is where I can free the list, not in the case of an error 617 00:27:36,510 --> 00:27:40,170 but actually deliberately free the original list because I 618 00:27:40,170 --> 00:27:42,420 don't need those 12 bytes anymore. 
619 00:27:42,420 --> 00:27:46,440 But now if I want to really have quote, unquote list 620 00:27:46,440 --> 00:27:50,580 point at this new chunk of memory, well, then I could also do this, 621 00:27:50,580 --> 00:27:53,370 list equals temp. 622 00:27:53,370 --> 00:27:54,630 So this is a little weird. 623 00:27:54,630 --> 00:27:56,910 But recall that list has just now been freed. 624 00:27:56,910 --> 00:28:00,000 So even though list technically contains the address of a chunk of memory, 625 00:28:00,000 --> 00:28:02,767 it's no longer valid because, again, it was freed. 626 00:28:02,767 --> 00:28:04,350 So, yes, it's still technically there. 627 00:28:04,350 --> 00:28:06,300 But it's effectively garbage values now. 628 00:28:06,300 --> 00:28:08,040 So I'm certainly free-- 629 00:28:08,040 --> 00:28:08,730 no pun intended. 630 00:28:08,730 --> 00:28:11,860 I'm certainly allowed to update the value of list. 631 00:28:11,860 --> 00:28:15,238 And I want list to now point to the new chunk of memory. 632 00:28:15,238 --> 00:28:17,280 So sort of metaphorically, if list was originally 633 00:28:17,280 --> 00:28:20,860 pointing at a chunk of memory there, maybe now I want it to point over here. 634 00:28:20,860 --> 00:28:24,720 So I'm just updating the value of list ultimately. 635 00:28:24,720 --> 00:28:26,670 All right, now that I've got this all done, 636 00:28:26,670 --> 00:28:28,980 I think I can just use this same loop as before. 637 00:28:28,980 --> 00:28:32,220 I could change the 3 to a 4 because I now have four numbers. 638 00:28:32,220 --> 00:28:35,100 At the very bottom of this program though, also subtle, 639 00:28:35,100 --> 00:28:38,430 I should probably now at the very end free this list. 640 00:28:38,430 --> 00:28:42,210 And for good measure, let me go ahead and return 0. 
641 00:28:42,210 --> 00:28:44,820 But now I think I have a complete program that, again, 642 00:28:44,820 --> 00:28:47,850 to be clear is not how you would write this in the real world 643 00:28:47,850 --> 00:28:52,140 because you would not allocate three integers then 644 00:28:52,140 --> 00:28:54,690 decide you want to allocate four then fix all of this. 645 00:28:54,690 --> 00:28:56,550 But we could probably borrow, copy and paste 646 00:28:56,550 --> 00:28:59,850 some of this code into production code eventually, whereby this would 647 00:28:59,850 --> 00:29:02,260 solve some actual problems dynamically. 648 00:29:02,260 --> 00:29:04,320 So let me cross my fingers, make list. 649 00:29:04,320 --> 00:29:06,030 So far so good, ./list. 650 00:29:06,030 --> 00:29:09,510 And I should see 1, 2, 3, 4. 651 00:29:09,510 --> 00:29:14,100 So long story short, it's a lot of work just to get from the original array 652 00:29:14,100 --> 00:29:15,000 to the second. 653 00:29:15,000 --> 00:29:18,900 So ideally, we would not do any of this in the first place. 654 00:29:18,900 --> 00:29:21,250 Ideally, what could we do instead? 655 00:29:21,250 --> 00:29:23,970 Well, maybe we should just allocate more memory 656 00:29:23,970 --> 00:29:27,422 from the get go in order to avoid this problem altogether. 657 00:29:27,422 --> 00:29:28,380 So how might I do that? 658 00:29:28,380 --> 00:29:33,480 Well, instead of having allocated an array of size 3, let alone an array 659 00:29:33,480 --> 00:29:37,320 of size 4, why don't I just proactively from the beginning of my program 660 00:29:37,320 --> 00:29:41,940 allocate an array of size 30 or heck 300 or 3,000 661 00:29:41,940 --> 00:29:45,255 and then just keep track of how much of it I'm using? 662 00:29:45,255 --> 00:29:46,810 That would be correct. 663 00:29:46,810 --> 00:29:50,190 It would solve the problem of not painting yourself into a corner 664 00:29:50,190 --> 00:29:51,840 so quickly. 
665 00:29:51,840 --> 00:29:53,215 But what remains as an issue? 666 00:29:53,215 --> 00:29:54,840 AUDIENCE: You're using a lot of memory. 667 00:29:54,840 --> 00:29:56,800 DAVID J. MALAN: I'm using a bunch more memory. 668 00:29:56,800 --> 00:29:59,675 Especially if this program's only going to ever manage a few numbers, 669 00:29:59,675 --> 00:30:03,540 why are you wasting 100 times more memory than you might actually need? 670 00:30:03,540 --> 00:30:06,140 And there's another corner case that could still arise 671 00:30:06,140 --> 00:30:08,120 even though this solves the problem. 672 00:30:08,120 --> 00:30:10,800 AUDIENCE: If you add another list, you'll run out of memory. 673 00:30:10,800 --> 00:30:12,550 DAVID J. MALAN: Exactly, we can eventually 674 00:30:12,550 --> 00:30:14,650 still run into the exact same problem because if I 675 00:30:14,650 --> 00:30:17,813 want to put 301 numbers in the list or 3,001, well, 676 00:30:17,813 --> 00:30:20,230 I'm still going to have to jump through all of these hoops 677 00:30:20,230 --> 00:30:22,000 and reallocate all of that space. 678 00:30:22,000 --> 00:30:26,410 And, honestly, now per your concern about the looping, iterating 300 times, 679 00:30:26,410 --> 00:30:29,260 3,000 times, is certainly eventually going 680 00:30:29,260 --> 00:30:33,020 to start to add up if we're doing it a lot in terms of speed and slowdown. 681 00:30:33,020 --> 00:30:36,760 So maybe there's a better way altogether than doing this. 682 00:30:36,760 --> 00:30:41,050 And indeed there is if we start to treat our computer's memory as a canvas 683 00:30:41,050 --> 00:30:44,890 that we can start to use to design data structures more generally. 684 00:30:44,890 --> 00:30:46,630 Arrays are a data structure, arguably. 685 00:30:46,630 --> 00:30:47,540 They're super simple. 686 00:30:47,540 --> 00:30:49,060 They're contiguous chunks of memory.
687 00:30:49,060 --> 00:30:53,980 But we could use memory a little more cleverly, especially now per last week 688 00:30:53,980 --> 00:30:56,920 that we have pointers, which, as painful as they might 689 00:30:56,920 --> 00:30:58,540 be to wrap your mind around sometimes, 690 00:30:58,540 --> 00:31:01,850 really just let us point to different places in memory. 691 00:31:01,850 --> 00:31:05,420 And so we can start to stitch things together in an interesting way. 692 00:31:05,420 --> 00:31:11,312 So the only syntax we'll really need to do that, to stitch things together 693 00:31:11,312 --> 00:31:13,270 in memory and build more interesting structures, 694 00:31:13,270 --> 00:31:17,290 are these things: struct, which allows us to represent structures. 695 00:31:17,290 --> 00:31:18,730 We did this already with persons. 696 00:31:18,730 --> 00:31:20,480 And we played with this last time as well. 697 00:31:20,480 --> 00:31:23,350 And we saw it already for queues and stacks. 698 00:31:23,350 --> 00:31:25,990 The dot operator, we haven't used it that much. 699 00:31:25,990 --> 00:31:28,180 But recall that whenever you have a struct, 700 00:31:28,180 --> 00:31:30,580 you can go inside of it using the dot operator. 701 00:31:30,580 --> 00:31:34,390 And we did that for a person, person.name and person.number 702 00:31:34,390 --> 00:31:36,790 when we were implementing a very simple address book. 703 00:31:36,790 --> 00:31:38,578 The star was new last week. 704 00:31:38,578 --> 00:31:40,870 And it can mean different things in different contexts. 705 00:31:40,870 --> 00:31:43,000 You use it when declaring a pointer. 706 00:31:43,000 --> 00:31:46,990 But you also use it when dereferencing a pointer, to go there. 707 00:31:46,990 --> 00:31:49,570 But just so you've seen it before, it actually 708 00:31:49,570 --> 00:31:53,900 tends to be a little annoying, a little confusing to use star and dot together.
709 00:31:53,900 --> 00:31:56,740 You might remember one example last week where in parentheses I 710 00:31:56,740 --> 00:31:57,940 put star something. 711 00:31:57,940 --> 00:32:02,200 And then I used a dot operator to go there and then go inside the structure. 712 00:32:02,200 --> 00:32:04,330 Long story short, we'll see today that you 713 00:32:04,330 --> 00:32:07,480 can combine simultaneous use of star and dot 714 00:32:07,480 --> 00:32:10,390 into something that actually looks like an arrow, something 715 00:32:10,390 --> 00:32:14,510 that vaguely looks like a foam finger that might be pointing from one place 716 00:32:14,510 --> 00:32:15,570 to another. 717 00:32:15,570 --> 00:32:17,990 So we'll see that actually in some code. 718 00:32:17,990 --> 00:32:20,960 So where can we take this? 719 00:32:20,960 --> 00:32:24,470 Well, let's implement the first of these ideas, namely something that's 720 00:32:24,470 --> 00:32:28,430 very canonical in computing known as a linked list. 721 00:32:28,430 --> 00:32:31,610 And let's see if we can maybe do this. 722 00:32:31,610 --> 00:32:35,720 How about Scully, could we get you to come on up and volunteer here? 723 00:32:35,720 --> 00:32:37,418 So our friend Scully-- 724 00:32:37,418 --> 00:32:38,960 there's some cookies in this for you. 725 00:32:38,960 --> 00:32:41,510 So Scully has come prepared with a whole bunch of balloons 726 00:32:41,510 --> 00:32:44,900 to represent chunks of memory because we'd like to paint a picture here 727 00:32:44,900 --> 00:32:48,620 of what's involved in actually allocating space that's not necessarily 728 00:32:48,620 --> 00:32:51,380 contiguous and might be over there or over here or over here 729 00:32:51,380 --> 00:32:52,620 in the computer's memory. 
730 00:32:52,620 --> 00:32:56,120 So, for instance, if I want to start allocating space 731 00:32:56,120 --> 00:32:58,850 one at a time for a list of numbers, Scully, 732 00:32:58,850 --> 00:33:01,190 could you go ahead and malloc one balloon for me? 733 00:33:01,190 --> 00:33:04,910 And in this balloon I'll store, for instance, the number one ultimately. 734 00:33:04,910 --> 00:33:07,040 So we have a balloon here. 735 00:33:07,040 --> 00:33:08,150 We rehearsed this before. 736 00:33:08,150 --> 00:33:11,478 And these balloons are actually really hard to blow up and tie off quickly. 737 00:33:11,478 --> 00:33:12,020 So thank you. 738 00:33:12,020 --> 00:33:13,730 So here we have a chunk of memory. 739 00:33:13,730 --> 00:33:19,190 And I could certainly, for instance, go in here and store, if I might, the-- 740 00:33:19,190 --> 00:33:19,950 here we go. 741 00:33:19,950 --> 00:33:22,670 I could certainly go ahead here and store in this balloon, 742 00:33:22,670 --> 00:33:24,620 for instance, the number one. 743 00:33:24,620 --> 00:33:27,867 But in the world of an array, it would just be back to back to back. 744 00:33:27,867 --> 00:33:30,200 And actually, frankly, why do we need the balloons even? 745 00:33:30,200 --> 00:33:32,330 I could just use these numbers, 1, 2, 3. 746 00:33:32,330 --> 00:33:34,670 But the problem does indeed arise, note, 747 00:33:34,670 --> 00:33:37,430 when we want to put a fourth number. Well, where does it go? 748 00:33:37,430 --> 00:33:40,220 Well, again, just to paint a picture, ideally I 749 00:33:40,220 --> 00:33:42,770 might allocate space for four. 750 00:33:42,770 --> 00:33:46,190 But if this is my array of size 3, like, where does it go? 751 00:33:46,190 --> 00:33:46,970 This is the point. 752 00:33:46,970 --> 00:33:49,340 We can't just put it next to the 3. 753 00:33:49,340 --> 00:33:51,560 Maybe there's room for the 4 over here. 754 00:33:51,560 --> 00:33:54,930 But we have to somehow connect these from one to the other.
755 00:33:54,930 --> 00:33:56,430 So, in fact, let's act that out. 756 00:33:56,430 --> 00:33:59,390 So if I instead use this balloon metaphor of just allocating space 757 00:33:59,390 --> 00:34:01,473 from wherever it is, can you go ahead and allocate 758 00:34:01,473 --> 00:34:03,350 like another chunk of memory for me? 759 00:34:03,350 --> 00:34:07,400 And here is where I'll now have a chunk of memory in which I can store 760 00:34:07,400 --> 00:34:11,909 the number-- computer's a little slow. 761 00:34:11,909 --> 00:34:15,910 So in here, the second balloon, I'll have a separate chunk of memory. 762 00:34:15,910 --> 00:34:18,113 AUDIENCE: Oh my gosh. 763 00:34:18,113 --> 00:34:19,280 DAVID J. MALAN: There we go. 764 00:34:19,280 --> 00:34:20,719 OK, good. 765 00:34:20,719 --> 00:34:24,980 Second chunk of memory, thank you, Scully. 766 00:34:24,980 --> 00:34:28,739 Now, I can certainly-- 767 00:34:28,739 --> 00:34:29,239 Thank you. 768 00:34:29,239 --> 00:34:33,300 I can certainly now store the number two in this chunk of memory. 769 00:34:33,300 --> 00:34:34,940 But it's not necessarily contiguous. 770 00:34:34,940 --> 00:34:37,833 This chunk came from over here as per Scully's position originally. 771 00:34:37,833 --> 00:34:39,750 This chunk obviously is coming from over here. 772 00:34:39,750 --> 00:34:41,750 And if you don't mind holding that for a moment, 773 00:34:41,750 --> 00:34:45,447 this is breaking the metaphor of an array, which was indeed contiguous. 774 00:34:45,447 --> 00:34:48,530 And even though I as the human can certainly go over and walk next to her, 775 00:34:48,530 --> 00:34:51,508 that's the equivalent of copying values from one place to another. 776 00:34:51,508 --> 00:34:53,300 What if we're a little more clever, though? 777 00:34:53,300 --> 00:34:55,940 And if Scully found space for this number one over here, 778 00:34:55,940 --> 00:34:57,660 let's just leave this balloon here.
779 00:34:57,660 --> 00:34:59,785 And if she found space for the number 2 over there, 780 00:34:59,785 --> 00:35:01,100 let's leave that balloon there. 781 00:35:01,100 --> 00:35:05,270 But we do somehow have to connect these numbers together. 782 00:35:05,270 --> 00:35:06,650 And here is where to-- 783 00:35:06,650 --> 00:35:08,220 I'll try to do this on the fly. 784 00:35:08,220 --> 00:35:10,060 Maybe I could do something like this. 785 00:35:10,060 --> 00:35:11,570 I can take this balloon here. 786 00:35:11,570 --> 00:35:14,810 And I can actually tie a string to it so that if I 787 00:35:14,810 --> 00:35:20,160 want to connect one to the other, we can link these, if you will, together. 788 00:35:20,160 --> 00:35:24,110 And so here now I have a linked list that is not necessarily contiguous. 789 00:35:24,110 --> 00:35:27,200 There's a whole bunch of memory that may very well have real values, 790 00:35:27,200 --> 00:35:28,880 may very well have garbage values. 791 00:35:28,880 --> 00:35:31,370 But I've somehow now linked these two together. 792 00:35:31,370 --> 00:35:35,150 And maybe just as a final flourish, if we could blow up one more balloon 793 00:35:35,150 --> 00:35:37,730 to represent more space-- and now she's finding room 794 00:35:37,730 --> 00:35:39,380 for that balloon over there. 795 00:35:39,380 --> 00:35:41,960 796 00:35:41,960 --> 00:35:42,560 Nice. 797 00:35:42,560 --> 00:35:45,060 This one is a Yale chunk of memory. 798 00:35:45,060 --> 00:35:51,930 So now I'll need one more link, if you will. 799 00:35:51,930 --> 00:35:55,260 And if I actually connect these two in this way, 800 00:35:55,260 --> 00:35:58,500 let me go ahead and tie this off here. 801 00:35:58,500 --> 00:36:01,970 Now I can go ahead and connect these two. 802 00:36:01,970 --> 00:36:05,000 If you never see this demonstration again in next year's videos, 803 00:36:05,000 --> 00:36:07,790 it's because this did not go very well. 
804 00:36:07,790 --> 00:36:11,210 Here now we have the number one where we first malloced it, 805 00:36:11,210 --> 00:36:14,840 the number two roughly where we malloced it, and the number three-- 806 00:36:14,840 --> 00:36:17,360 OK, so maybe we'll fix this some other year. 807 00:36:17,360 --> 00:36:19,490 Now, we'll have the number 3 allocated there. 808 00:36:19,490 --> 00:36:21,860 But the whole point of this silly exercise 809 00:36:21,860 --> 00:36:25,910 is that we can certainly use the computer's memory as more of a canvas, 810 00:36:25,910 --> 00:36:31,040 put things wherever we want, wherever is available so long as we somehow 811 00:36:31,040 --> 00:36:35,510 connect the dots, so to speak, and can make our way from one chunk of memory 812 00:36:35,510 --> 00:36:39,380 to the next to the next, thereby literally linking them together. 813 00:36:39,380 --> 00:36:41,732 But, of course, we're using balloons for this metaphor. 814 00:36:41,732 --> 00:36:43,690 But at the end of the day, this is just memory. 815 00:36:43,690 --> 00:36:49,500 So how could we connect one chunk to another chunk to a third chunk, 816 00:36:49,500 --> 00:36:50,310 might you think? 817 00:36:50,310 --> 00:36:51,180 What's the trick? 818 00:36:51,180 --> 00:36:51,435 Yeah? 819 00:36:51,435 --> 00:36:52,227 AUDIENCE: Pointers. 820 00:36:52,227 --> 00:36:53,520 DAVID J. MALAN: Using pointers. 821 00:36:53,520 --> 00:36:55,353 That's why we introduced pointers last week. 822 00:36:55,353 --> 00:36:57,510 Because as simple an idea as it is, as hard 823 00:36:57,510 --> 00:36:59,640 as it is to write sometimes in code, it's 824 00:36:59,640 --> 00:37:04,822 literally just a pointer, a foam finger pointing to another chunk of memory. 825 00:37:04,822 --> 00:37:06,780 And so these pointers really are metaphorically 826 00:37:06,780 --> 00:37:09,370 being implemented now with these pieces of string.
827 00:37:09,370 --> 00:37:12,370 So we'll have to debrief later and decide if we ever do this demo again. 828 00:37:12,370 --> 00:37:14,430 But thank you to Scully for participating. 829 00:37:14,430 --> 00:37:17,510 830 00:37:17,510 --> 00:37:19,760 OK, we have plenty of-- 831 00:37:19,760 --> 00:37:20,885 OK, fair's fair. 832 00:37:20,885 --> 00:37:23,360 833 00:37:23,360 --> 00:37:23,860 There we go. 834 00:37:23,860 --> 00:37:24,760 Thank you, Scully. 835 00:37:24,760 --> 00:37:28,965 So let's now actually translate this to something a little more concrete 836 00:37:28,965 --> 00:37:32,090 and then get to the point where we can actually solve this problem in code. 837 00:37:32,090 --> 00:37:34,340 So here's that same canvas of memory. 838 00:37:34,340 --> 00:37:36,910 And if in this canvas of memory now I actually 839 00:37:36,910 --> 00:37:40,300 want to implement this idea of the number 1, the number 2, the number 3, 840 00:37:40,300 --> 00:37:43,870 let's stop tying our hands in terms of expecting our memory 841 00:37:43,870 --> 00:37:47,780 to be contiguous back to back and start to move away from using arrays. 842 00:37:47,780 --> 00:37:50,920 So, for instance, suppose I want to malloc space for the number 1 843 00:37:50,920 --> 00:37:53,110 just as I first asked of Scully. 844 00:37:53,110 --> 00:37:55,050 Suppose it ends up over there on the board. 845 00:37:55,050 --> 00:37:56,800 The important thing for discussion here is 846 00:37:56,800 --> 00:38:01,750 that that number one, wherever it ends up, is surely located at some address. 847 00:38:01,750 --> 00:38:03,880 And for the sake of discussion as in the past, 848 00:38:03,880 --> 00:38:07,510 suppose the number one just ends up at location 0x123. 849 00:38:07,510 --> 00:38:11,050 So 0x123 is where Scully was originally standing right here. 850 00:38:11,050 --> 00:38:13,870 Then we asked malloc for another chunk of memory.
851 00:38:13,870 --> 00:38:17,980 Suppose that it ends up over here at address 0x456. 852 00:38:17,980 --> 00:38:21,400 So that's maybe roughly here when Scully was standing in her second position. 853 00:38:21,400 --> 00:38:23,630 Lastly, we allocate the number 3. 854 00:38:23,630 --> 00:38:26,060 Maybe it ends up at location 0x789, 855 00:38:26,060 --> 00:38:30,680 which was again per Scully's third malloc roughly over here on stage. 856 00:38:30,680 --> 00:38:34,040 Now, this picture alone doesn't seem to lend itself 857 00:38:34,040 --> 00:38:37,970 to an implementation of the string, metaphorically, to the pointers 858 00:38:37,970 --> 00:38:41,210 unless we allow ourselves a new luxury. 859 00:38:41,210 --> 00:38:47,000 Instead of just storing the numbers 1, 2, 3 in our usual squares, 860 00:38:47,000 --> 00:38:50,690 I think what I'm going to have to do is cheat and use more memory 861 00:38:50,690 --> 00:38:51,740 to store what? 862 00:38:51,740 --> 00:38:53,870 The pointers as you proposed. 863 00:38:53,870 --> 00:38:56,090 So here's a tradeoff that I promised we would 864 00:38:56,090 --> 00:39:01,070 start to see more and more if you want to improve your performance in terms 865 00:39:01,070 --> 00:39:04,880 of time and avoid stupid copying of data from one place 866 00:39:04,880 --> 00:39:06,530 to another again and again and again. 867 00:39:06,530 --> 00:39:09,707 If you want to save time, you're going to have to give up some space. 868 00:39:09,707 --> 00:39:12,290 And there's going to be this tradeoff between time and space. 869 00:39:12,290 --> 00:39:15,210 And it's up to you to decide ultimately which is more important. 870 00:39:15,210 --> 00:39:19,040 So if you allow yourself not just enough memory for the numbers 1, 2, and 3 871 00:39:19,040 --> 00:39:23,690 but twice as much memory for the numbers 1, 2, and 3 and three pointers, 872 00:39:23,690 --> 00:39:26,670 one for each, what could we now do?
873 00:39:26,670 --> 00:39:28,190 Well, if this node-- 874 00:39:28,190 --> 00:39:29,840 and this is a computing term. 875 00:39:29,840 --> 00:39:33,830 Node is just a generic term describing a box of memory, a chunk of memory 876 00:39:33,830 --> 00:39:34,700 in this case. 877 00:39:34,700 --> 00:39:38,690 If I've given you this blank slate here, what value 878 00:39:38,690 --> 00:39:44,040 would make sense to store here if it's associated with this number one? 879 00:39:44,040 --> 00:39:44,540 Yeah? 880 00:39:44,540 --> 00:39:46,550 AUDIENCE: Maybe the address of the next element. 881 00:39:46,550 --> 00:39:49,050 DAVID J. MALAN: Good, maybe the address of the next element. 882 00:39:49,050 --> 00:39:52,010 So the next element technically is supposed to be the number 2. 883 00:39:52,010 --> 00:39:56,300 So at this location, I'm going to store the value 0x456. 884 00:39:56,300 --> 00:39:59,900 What then logically should go here in the second box? 885 00:39:59,900 --> 00:40:02,030 0x789. 886 00:40:02,030 --> 00:40:06,330 And then here's a little non-obvious-- it's the end of the list as of now. 887 00:40:06,330 --> 00:40:08,690 So we can't afford to let it be a garbage value 888 00:40:08,690 --> 00:40:10,410 because a garbage value is a value. 889 00:40:10,410 --> 00:40:14,030 And we don't want Oscar to effectively be pointing to some random location 890 00:40:14,030 --> 00:40:15,420 lest we go there. 891 00:40:15,420 --> 00:40:19,490 So what would be a good special value to put here to terminate a list? 892 00:40:19,490 --> 00:40:24,630 So NULL, so not NUL, which we used for strings, but same idea, N-U-L-L, 893 00:40:24,630 --> 00:40:28,950 which we keep using now for pointers, otherwise known as the 0 address, 894 00:40:28,950 --> 00:40:32,730 which I could just write for shorthand as 0x0 in this case, 895 00:40:32,730 --> 00:40:34,430 which is the same thing as NULL.
896 00:40:34,430 --> 00:40:38,720 So here then, even though we've changed nothing about how a computer works-- 897 00:40:38,720 --> 00:40:40,610 this is just my computer's memory-- 898 00:40:40,610 --> 00:40:45,320 I'm using more memory now to effectively link one chunk, to the next chunk, 899 00:40:45,320 --> 00:40:46,130 to the next chunk. 900 00:40:46,130 --> 00:40:49,640 So, easy just to note that the downside is more space. 901 00:40:49,640 --> 00:40:54,410 But now we don't have to worry about ever copying and moving this data 902 00:40:54,410 --> 00:40:57,380 around, which maybe over time for really big programs, 903 00:40:57,380 --> 00:41:02,040 big data sets, could very well be a net positive and a win for us. 904 00:41:02,040 --> 00:41:09,560 So any questions first on this notion of what a linked list actually is? 905 00:41:09,560 --> 00:41:11,810 No, all right, well, recall from last time 906 00:41:11,810 --> 00:41:15,140 too that rarely do we actually care what the specific addresses are. 907 00:41:15,140 --> 00:41:17,540 So this is one node, two nodes, and three nodes. 908 00:41:17,540 --> 00:41:21,140 And inside of each of these nodes are two values, the actual number 909 00:41:21,140 --> 00:41:23,692 we care about and then a pointer. 910 00:41:23,692 --> 00:41:26,150 And now this is actually an opportunity to introduce a term 911 00:41:26,150 --> 00:41:29,660 that you might see increasingly nowadays, data, so 1, 2, and 3, 912 00:41:29,660 --> 00:41:31,567 which we obviously care about in this case. 913 00:41:31,567 --> 00:41:33,650 And then we could actually refer to these pointers 914 00:41:33,650 --> 00:41:36,200 more generally as metadata. 915 00:41:36,200 --> 00:41:39,230 It's actual data because it's helping me solve a problem, get from one 916 00:41:39,230 --> 00:41:40,100 place to another. 917 00:41:40,100 --> 00:41:43,370 But metadata is distinct from data in that I don't fundamentally 918 00:41:43,370 --> 00:41:44,870 care about the metadata.
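To make the space cost concrete, here is a small sketch in C. It assumes the node layout the lecture defines a little later (an int plus a pointer to the next node), and the helper names node_overhead and print_sizes are made up for illustration. Exact sizes depend on the machine and compiler, so treat whatever it prints as an example, not a guarantee.

```c
#include <stddef.h>
#include <stdio.h>

// Assumed layout: one int of data plus one pointer of metadata.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: bytes a node uses beyond the int it stores.
size_t node_overhead(void)
{
    return sizeof(node) - sizeof(int);
}

// Print the sizes so the tradeoff is visible on your own machine.
void print_sizes(void)
{
    printf("int: %zu bytes\n", sizeof(int));
    printf("node: %zu bytes\n", sizeof(node));
    printf("overhead per node: %zu bytes\n", node_overhead());
}
```

On many 64-bit systems the pointer alone is 8 bytes, so a node often costs more than twice the space of a bare int. That is the price paid for never having to copy the data around.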
919 00:41:44,870 --> 00:41:46,580 That's an implementation detail. 920 00:41:46,580 --> 00:41:49,740 But it does help me organize my actual data. 921 00:41:49,740 --> 00:41:51,530 So this is more of a high-level concept. 922 00:41:51,530 --> 00:41:53,300 So what, though, is a linked list? 923 00:41:53,300 --> 00:41:57,440 It turns out that to store a linked list, we'll generally use just one more value. 924 00:41:57,440 --> 00:42:00,290 And I'm going to draw it only as a square, a single box, 925 00:42:00,290 --> 00:42:03,320 because if I declare now in my code, as I soon will, 926 00:42:03,320 --> 00:42:08,660 a variable maybe called list that points to a node, this 927 00:42:08,660 --> 00:42:11,750 is effectively how I could implement a linked list. 928 00:42:11,750 --> 00:42:13,790 I use one node per value. 929 00:42:13,790 --> 00:42:18,800 And I use one extra pointer to find the first of those nodes. 930 00:42:18,800 --> 00:42:22,010 And, in fact, here again is where I don't need to care fundamentally 931 00:42:22,010 --> 00:42:23,650 where any of these addresses are. 932 00:42:23,650 --> 00:42:27,170 It suffices to know that, yes, computers have memory addresses. 933 00:42:27,170 --> 00:42:29,050 So I could just abstract this away. 934 00:42:29,050 --> 00:42:31,630 And this is how I might pictorially represent 935 00:42:31,630 --> 00:42:34,450 a linked list, a cleaner version of those three balloons, 936 00:42:34,450 --> 00:42:36,250 whereby I was here. 937 00:42:36,250 --> 00:42:39,610 This was Scully's first balloon, second balloon, third balloon. 938 00:42:39,610 --> 00:42:43,720 These arrows now just represent pointers or strings with the balloons. 939 00:42:43,720 --> 00:42:49,360 So with that said, how can we go about translating this to some actual code?
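The picture just described, one node per value plus one extra pointer to the first node, can be sketched in C as follows. This is a hedged illustration, not the lecture's code: the three nodes live in ordinary local variables rather than malloc'd memory to keep it short, and the function names length and demo_length are invented here.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Follow the arrows from the first node until NULL, counting nodes.
int length(const node *list)
{
    int count = 0;
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        count++;
    }
    return count;
}

// Build 1 -> 2 -> 3 by hand and measure it.
int demo_length(void)
{
    node third = {3, NULL};     // end of the list: next is NULL
    node second = {2, &third};  // next holds the address of 3
    node first = {1, &second};  // next holds the address of 2
    node *list = &first;        // the one extra pointer
    return length(list);
}
```

Nothing in length depends on where the nodes actually sit in memory. It only follows each next pointer until it reaches NULL.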
940 00:42:49,360 --> 00:42:54,160 Well, here's where we can call into play some of that same syntax from last time 941 00:42:54,160 --> 00:42:58,310 and even a couple of weeks ago when we introduced the notion of a structure. 942 00:42:58,310 --> 00:43:01,780 So here for instance is how we defined a couple classes ago the notion 943 00:43:01,780 --> 00:43:02,480 of a person. 944 00:43:02,480 --> 00:43:02,980 Why? 945 00:43:02,980 --> 00:43:05,260 Well, C doesn't come with a person data type. 946 00:43:05,260 --> 00:43:08,650 But we concluded it was useful to be able to associate someone's name 947 00:43:08,650 --> 00:43:11,690 with their number and maybe even other fields as well. 948 00:43:11,690 --> 00:43:15,443 So we typedef'd a structure containing these two values. 949 00:43:15,443 --> 00:43:17,860 We learned last week that string is technically char star. 950 00:43:17,860 --> 00:43:20,170 But that doesn't change what the actual structure is. 951 00:43:20,170 --> 00:43:22,270 And we call this struct a person. 952 00:43:22,270 --> 00:43:26,440 Well, here's what we revealed last time, again taking those training wheels off. 953 00:43:26,440 --> 00:43:27,520 It's just a char star. 954 00:43:27,520 --> 00:43:29,620 Let's keep going in this direction, though. 955 00:43:29,620 --> 00:43:32,530 If I want to define not a person but maybe more generically 956 00:43:32,530 --> 00:43:35,350 something I'll call today a node, like a container 957 00:43:35,350 --> 00:43:38,650 for my numbers and my pointers, well, I similarly 958 00:43:38,650 --> 00:43:40,690 just need two values, not a name and a number, 959 00:43:40,690 --> 00:43:44,800 which isn't relevant today but maybe the number as an actual int 960 00:43:44,800 --> 00:43:48,410 so I can store the 1, the 2, the 3, the 4 and so forth. 961 00:43:48,410 --> 00:43:50,380 And this is a little less obvious.
962 00:43:50,380 --> 00:43:53,380 But conceptually, what should be the second value 963 00:43:53,380 --> 00:43:55,510 inside of any of these nodes? 964 00:43:55,510 --> 00:43:56,320 Yeah? 965 00:43:56,320 --> 00:43:57,880 So indeed a pointer. 966 00:43:57,880 --> 00:43:59,082 A pointer to what, though? 967 00:43:59,082 --> 00:44:00,040 AUDIENCE: Another node. 968 00:44:00,040 --> 00:44:01,870 DAVID J. MALAN: A pointer to another node. 969 00:44:01,870 --> 00:44:04,330 And here's where the syntax gets a little weird. 970 00:44:04,330 --> 00:44:10,030 But how do I define there to be a pointer in here to another node? 971 00:44:10,030 --> 00:44:15,700 Well, you might be inclined to say node *next because this means next is 972 00:44:15,700 --> 00:44:18,310 the name of the property or the attribute the variable inside 973 00:44:18,310 --> 00:44:18,940 the struct. 974 00:44:18,940 --> 00:44:20,640 Star means it's a pointer. 975 00:44:20,640 --> 00:44:21,640 What is it a pointer to? 976 00:44:21,640 --> 00:44:22,480 Clearly a node. 977 00:44:22,480 --> 00:44:24,640 But here's where C can bite you. 978 00:44:24,640 --> 00:44:29,830 The word node does not exist until you get to this last line of code. 979 00:44:29,830 --> 00:44:31,610 C goes top to bottom, left to right. 980 00:44:31,610 --> 00:44:36,760 So you literally can't use the word node here if it's not existing until here. 981 00:44:36,760 --> 00:44:40,240 The simple fix for this is to actually use a slightly more verbose way 982 00:44:40,240 --> 00:44:41,560 of defining a structure. 983 00:44:41,560 --> 00:44:42,700 You can actually do this. 984 00:44:42,700 --> 00:44:44,260 And we didn't bother doing this with person 985 00:44:44,260 --> 00:44:45,677 because it didn't solve a problem. 
986 00:44:45,677 --> 00:44:48,520 But if you actually make your first line a little more verbose 987 00:44:48,520 --> 00:44:54,280 and say, give me a definition for a structure called node, now in here 988 00:44:54,280 --> 00:44:56,670 you can actually do this. 989 00:44:56,670 --> 00:44:59,470 This is an annoying implementation detail 990 00:44:59,470 --> 00:45:01,398 when it comes to implementing structures in C. 991 00:45:01,398 --> 00:45:03,190 But, essentially, we're leveraging the fact 992 00:45:03,190 --> 00:45:05,440 that because C code is read from top to bottom 993 00:45:05,440 --> 00:45:08,890 if you give this structure a name called struct node, 994 00:45:08,890 --> 00:45:10,750 now you can refer to it here. 995 00:45:10,750 --> 00:45:11,500 But you know what? 996 00:45:11,500 --> 00:45:14,770 It's annoying to write struct node, struct node, struct node everywhere 997 00:45:14,770 --> 00:45:15,560 in your code. 998 00:45:15,560 --> 00:45:17,770 So this last line now just gives you a synonym. 999 00:45:17,770 --> 00:45:21,010 And it shortens struct node to just node. 1000 00:45:21,010 --> 00:45:23,170 So long story short, this is a good template 1001 00:45:23,170 --> 00:45:27,460 for any time you implement some notion of a node as we will today. 1002 00:45:27,460 --> 00:45:29,920 But it's fundamentally the same idea as a person 1003 00:45:29,920 --> 00:45:32,620 just containing now a number and a pointer 1004 00:45:32,620 --> 00:45:35,960 to the next as opposed to someone's name and phone number. 1005 00:45:35,960 --> 00:45:38,110 So let me go ahead and walk through with some code 1006 00:45:38,110 --> 00:45:43,570 how we might actually implement this process of allocating a balloon 1007 00:45:43,570 --> 00:45:45,940 and putting a number on it, allocating another balloon 1008 00:45:45,940 --> 00:45:49,420 and putting a number on it and then connecting those two balloons again 1009 00:45:49,420 --> 00:45:50,120 and again. 
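Written out, the template just described looks roughly like this. The field names number and next follow the lecture. The small helper number_after is hypothetical and only there to show the two fields in use.

```c
#include <stddef.h>

// Naming the struct up front ("struct node") lets the next field
// refer to it before the typedef's short name "node" exists.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: link a to b, then return the number
// reached by following a's next pointer.
int number_after(node *a, node *b)
{
    a->next = b;
    return a->next->number;
}
```

Writing node *next; inside the braces instead would fail to compile, because the name node is not defined until the final line of the typedef.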
1010 00:45:50,120 --> 00:45:52,570 So we'll do this step by step in a vacuum 1011 00:45:52,570 --> 00:45:55,570 so you can see the syntax that maps to each of these ideas. 1012 00:45:55,570 --> 00:45:58,240 Then we'll actually pull up VS Code and combine it all 1013 00:45:58,240 --> 00:46:00,440 and make a demonstrative program. 1014 00:46:00,440 --> 00:46:03,580 So here, for instance, is the single line of C code 1015 00:46:03,580 --> 00:46:06,370 via which I can give myself the beginning of a linked 1016 00:46:06,370 --> 00:46:10,180 list that is a pointer that will eventually be pointing to something. 1017 00:46:10,180 --> 00:46:12,580 So metaphorically, it's like creating a pointer. 1018 00:46:12,580 --> 00:46:15,940 I know we've gotten some complaints about that in the audience. 1019 00:46:15,940 --> 00:46:19,130 We'll use the Harvard one to represent a pointer to something. 1020 00:46:19,130 --> 00:46:22,180 But if I only do this and I only say give me 1021 00:46:22,180 --> 00:46:25,840 a variable called list that is a pointer to a node, 1022 00:46:25,840 --> 00:46:28,190 that's going to leave a garbage value. 1023 00:46:28,190 --> 00:46:30,400 So this is like pointing to some random location 1024 00:46:30,400 --> 00:46:32,200 because it's previously some value. 1025 00:46:32,200 --> 00:46:33,250 Who knows what it is. 1026 00:46:33,250 --> 00:46:34,540 But we can solve that how? 1027 00:46:34,540 --> 00:46:39,850 What would be a good initial value to set this equal to? 1028 00:46:39,850 --> 00:46:40,750 So null. 1029 00:46:40,750 --> 00:46:44,020 At least if it's null, we then know that this isn't a garbage value. 1030 00:46:44,020 --> 00:46:46,592 This is literally 0x0, a.k.a. null. 1031 00:46:46,592 --> 00:46:48,800 And I'm just going to leave it blank for cleanliness. 1032 00:46:48,800 --> 00:46:53,410 So this would be the right way to begin to create a linked list of size 0. 1033 00:46:53,410 --> 00:46:54,520 There's nothing there. 
1034 00:46:54,520 --> 00:46:58,660 But at least now that foam finger is not pointing to some bogus chunk of memory, 1035 00:46:58,660 --> 00:46:59,780 some garbage value. 1036 00:46:59,780 --> 00:47:03,220 So this is how the world might exist now in the computer's memory. 1037 00:47:03,220 --> 00:47:06,040 How do I go about allocating space now for a node? 1038 00:47:06,040 --> 00:47:08,110 Well, it's just ideas from last week. 1039 00:47:08,110 --> 00:47:12,040 Once the word node exists as via that typedef, 1040 00:47:12,040 --> 00:47:16,060 I can just use malloc to ask for the size of a node. 1041 00:47:16,060 --> 00:47:17,560 I don't have to do the math myself. 1042 00:47:17,560 --> 00:47:19,120 I don't care how big a node is. 1043 00:47:19,120 --> 00:47:21,380 Just let it do the math for me. 1044 00:47:21,380 --> 00:47:24,880 Then that's going to return presumably the address of a chunk of memory 1045 00:47:24,880 --> 00:47:27,010 big enough for that big rectangle. 1046 00:47:27,010 --> 00:47:30,850 And I'm going to store that for now in a temporary variable called n 1047 00:47:30,850 --> 00:47:33,050 that itself is a pointer to a node. 1048 00:47:33,050 --> 00:47:35,170 So this might look like a lot altogether. 1049 00:47:35,170 --> 00:47:39,400 But this is just like before when I allocated space for a string 1050 00:47:39,400 --> 00:47:41,680 or I allocated space for a bunch of numbers 1051 00:47:41,680 --> 00:47:45,770 and set it equal to a pointer to integers, for instance, [INAUDIBLE] 1052 00:47:45,770 --> 00:47:46,750 recently. 1053 00:47:46,750 --> 00:47:50,020 All right, so this gives me a box in memory. 1054 00:47:50,020 --> 00:47:52,420 This gives me a pointer called n. 1055 00:47:52,420 --> 00:47:55,360 So it's similarly just a single square because it's just an address. 
1056 00:47:55,360 --> 00:47:57,730 And it similarly gives me a bigger chunk of memory 1057 00:47:57,730 --> 00:48:01,060 somewhere in the computer's memory containing enough space 1058 00:48:01,060 --> 00:48:04,570 for the number that's going to go there, a 1, a 2, or 3, or whatever, 1059 00:48:04,570 --> 00:48:06,740 and a pointer to the next value. 1060 00:48:06,740 --> 00:48:10,450 So these lines of code collectively, this half creates this in memory. 1061 00:48:10,450 --> 00:48:12,430 This half creates this in memory. 1062 00:48:12,430 --> 00:48:15,880 And the assignment here, the equal sign, essentially 1063 00:48:15,880 --> 00:48:17,193 does the equivalent of that. 1064 00:48:17,193 --> 00:48:19,360 I don't care what the address is, the actual number. 1065 00:48:19,360 --> 00:48:22,430 It's as though n is now pointing to that chunk of memory. 1066 00:48:22,430 --> 00:48:24,170 But this isn't very useful. 1067 00:48:24,170 --> 00:48:27,960 If I want to store the number 1 here, with what code can I do that? 1068 00:48:27,960 --> 00:48:31,040 Well, I could do this, borrowing an idea from last week. 1069 00:48:31,040 --> 00:48:34,640 So *n presumes that n is a pointer. 1070 00:48:34,640 --> 00:48:38,420 *n means go there, go to whatever you're pointing at. 1071 00:48:38,420 --> 00:48:41,240 The dot operator means if you're pointing at a structure, 1072 00:48:41,240 --> 00:48:43,490 go inside of it to the number field. 1073 00:48:43,490 --> 00:48:47,000 And we did this a couple of weeks ago with number and person 1074 00:48:47,000 --> 00:48:48,590 when we implemented an address book. 1075 00:48:48,590 --> 00:48:50,270 So star n is go there. 1076 00:48:50,270 --> 00:48:53,178 And the dot operator means go to the number field. 1077 00:48:53,178 --> 00:48:55,220 The 1 on the right-hand side of the equal sign 1078 00:48:55,220 --> 00:48:58,520 means set whatever is there equal to the number 1.
1079 00:48:58,520 --> 00:49:00,800 It turns out this is the syntax, though, that I 1080 00:49:00,800 --> 00:49:04,310 alluded to being a little bit cryptic and not very pleasant to remember 1081 00:49:04,310 --> 00:49:04,940 or type. 1082 00:49:04,940 --> 00:49:07,490 Here, though, is where you can synonymously 1083 00:49:07,490 --> 00:49:11,690 instead use this line of code, which most C programmers would use instead. 1084 00:49:11,690 --> 00:49:13,700 This means n is still a pointer. 1085 00:49:13,700 --> 00:49:18,770 The arrow literally with a hyphen and a greater than sign means go there. 1086 00:49:18,770 --> 00:49:22,160 It's the exact same thing as the parentheses with the star, 1087 00:49:22,160 --> 00:49:23,060 with the dot. 1088 00:49:23,060 --> 00:49:26,835 This just simplifies it to look like these actual pictorial arrows. 1089 00:49:26,835 --> 00:49:29,210 So this would be the most conventional way of doing this. 1090 00:49:29,210 --> 00:49:31,100 How now do I update the next field? 1091 00:49:31,100 --> 00:49:33,260 Well, I think I'm going to just say the same thing, 1092 00:49:33,260 --> 00:49:39,050 n go there but go into the next field and set it equal to null. 1093 00:49:39,050 --> 00:49:39,740 Why null? 1094 00:49:39,740 --> 00:49:44,420 If the whole point here was to allocate just one chunk of memory, one node, 1095 00:49:44,420 --> 00:49:46,550 you don't want to leave this as a garbage value 1096 00:49:46,550 --> 00:49:49,130 because that value will be mistaken for an arrow pointing 1097 00:49:49,130 --> 00:49:51,910 to some random location. 1098 00:49:51,910 --> 00:49:52,910 All right, that's a lot. 1099 00:49:52,910 --> 00:49:54,900 And, again, we're doing it in isolation step 1100 00:49:54,900 --> 00:49:56,900 by step just to paint the picture on the screen. 1101 00:49:56,900 --> 00:50:01,550 But any questions on any of these steps? 1102 00:50:01,550 --> 00:50:05,030 Each picture translates to one line of code there. 
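Putting those steps together, a sketch might look like this. The function name make_node is an assumption for illustration, since the lecture performs these steps inline. The NULL check on malloc's return value is added here as a precaution.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Allocate one node, fill in both fields, and return its address.
// Returns NULL if malloc could not find enough memory.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    (*n).number = number;  // go there, then into the number field
    n->next = NULL;        // same idea with the arrow; no garbage value
    return n;
}
```

Both spellings appear on purpose: (*n).number and n->number mean the same thing, and the arrow is the one most C programmers would write. A caller is responsible for eventually passing the returned pointer to free.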
1103 00:50:05,030 --> 00:50:10,340 All right, so if you're comfy enough with those lines there, 1104 00:50:10,340 --> 00:50:13,110 what can I proceed to now do? 1105 00:50:13,110 --> 00:50:18,620 Well, let me propose that what I could now do with this same approach 1106 00:50:18,620 --> 00:50:20,588 is set list itself equal to n. 1107 00:50:20,588 --> 00:50:22,880 Because if the whole goal is to build up a linked list, 1108 00:50:22,880 --> 00:50:25,280 and list represents that linked list, list 1109 00:50:25,280 --> 00:50:29,090 equals n is essentially saying whatever address is here, put it here. 1110 00:50:29,090 --> 00:50:32,420 And pictorially what that means is, temporarily point both pointers 1111 00:50:32,420 --> 00:50:33,690 to the same exact place. 1112 00:50:33,690 --> 00:50:34,190 Why? 1113 00:50:34,190 --> 00:50:36,260 Because this is the list that I care about long term. 1114 00:50:36,260 --> 00:50:39,260 This is maybe my global variable that I'm going to keep around forever 1115 00:50:39,260 --> 00:50:40,412 in my computer's memory. 1116 00:50:40,412 --> 00:50:43,370 This was just a temporary pointer so that I could get a chunk of memory 1117 00:50:43,370 --> 00:50:46,500 and go to its locations and update it with those values. 1118 00:50:46,500 --> 00:50:49,500 So, eventually, this is probably going to go away altogether. 1119 00:50:49,500 --> 00:50:51,800 And this then is a linked list of size 1. 1120 00:50:51,800 --> 00:50:54,350 This is what happened when Scully inflated one balloon, 1121 00:50:54,350 --> 00:50:58,550 I wrote the number 1 on it, and I pointed at that single balloon. 1122 00:50:58,550 --> 00:51:01,200 All right, if I want to go ahead and do this again and again, 1123 00:51:01,200 --> 00:51:02,700 we'll do this a little more quickly. 1124 00:51:02,700 --> 00:51:05,040 But it's the same kind of code for now. 1125 00:51:05,040 --> 00:51:08,200 Here's how I allocate space for another node. 
1126 00:51:08,200 --> 00:51:10,268 Here's how I can temporarily store it in n. 1127 00:51:10,268 --> 00:51:13,560 And I'll re-declare it here just to make clear that it's indeed just a pointer. 1128 00:51:13,560 --> 00:51:15,720 So the left hand side of the expression gives me this. 1129 00:51:15,720 --> 00:51:17,760 The right hand side of the expression gives me this. 1130 00:51:17,760 --> 00:51:18,630 Where could it be? 1131 00:51:18,630 --> 00:51:19,590 I mean, I put it here. 1132 00:51:19,590 --> 00:51:20,310 It could have been there. 1133 00:51:20,310 --> 00:51:21,685 It could have been anywhere else. 1134 00:51:21,685 --> 00:51:23,810 But malloc gets to decide that for us. 1135 00:51:23,810 --> 00:51:28,950 n equals this, just sets that temporary pointer equal to that chunk of memory. 1136 00:51:28,950 --> 00:51:30,090 I should clean this up. 1137 00:51:30,090 --> 00:51:33,150 How do I now put the number 2 into this node? 1138 00:51:33,150 --> 00:51:34,530 Well, I start at n. 1139 00:51:34,530 --> 00:51:35,670 I go there. 1140 00:51:35,670 --> 00:51:38,040 I go to the number field, which I keep drawing on top. 1141 00:51:38,040 --> 00:51:39,690 And I set it equal to 2. 1142 00:51:39,690 --> 00:51:43,600 Now, it's a little non-obvious what we should do here. 1143 00:51:43,600 --> 00:51:45,480 So I'm going to be a little lazy at first. 1144 00:51:45,480 --> 00:51:49,230 And rather than put these numbers into the linked list in sorted order, 1145 00:51:49,230 --> 00:51:51,493 like ascending order 1, 2, 3, 4, I'm just 1146 00:51:51,493 --> 00:51:53,410 going to plop it at the beginning of the list. 1147 00:51:53,410 --> 00:51:53,580 Why? 1148 00:51:53,580 --> 00:51:55,380 Because it's actually a little simpler.
1149 00:51:55,380 --> 00:51:58,292 Each time I allocate a new node, I just prepend it, 1150 00:51:58,292 --> 00:52:00,000 so to speak, to the beginning of the list 1151 00:52:00,000 --> 00:52:02,770 even though it's going to end up looking backwards in this case. 1152 00:52:02,770 --> 00:52:05,430 So, notice, at this point in the story, I've 1153 00:52:05,430 --> 00:52:08,280 got list pointing to the original linked list. 1154 00:52:08,280 --> 00:52:11,250 I've got n pointing to the brand new node. 1155 00:52:11,250 --> 00:52:15,060 And, ultimately, I want to connect these just as Scully 1156 00:52:15,060 --> 00:52:16,272 and I did with the strings. 1157 00:52:16,272 --> 00:52:17,230 This is just temporary. 1158 00:52:17,230 --> 00:52:18,900 So I want to connect these things. 1159 00:52:18,900 --> 00:52:20,760 Here's how I could do it wrong. 1160 00:52:20,760 --> 00:52:25,920 If I proceed now and update, rather, after one more line setting this equal 1161 00:52:25,920 --> 00:52:29,050 to null-- sorry, let's at least get rid of that garbage value-- 1162 00:52:29,050 --> 00:52:31,720 here's how I could proceed to maybe do this wrong. 1163 00:52:31,720 --> 00:52:36,540 Let me go ahead and update, for instance, list equals n. 1164 00:52:36,540 --> 00:52:40,560 So if I update list equaling n, that's going 1165 00:52:40,560 --> 00:52:43,980 to point the list at this new node. 1166 00:52:43,980 --> 00:52:47,110 But what has just happened? 1167 00:52:47,110 --> 00:52:48,000 What did I do wrong? 1168 00:52:48,000 --> 00:52:48,255 Yeah? 1169 00:52:48,255 --> 00:52:49,440 AUDIENCE: Nothing's pointing to 1. 1170 00:52:49,440 --> 00:52:51,180 DAVID J. MALAN: So nothing's pointing to 1. 1171 00:52:51,180 --> 00:52:54,030 And even though you and I obviously have this bird's eye view of everything 1172 00:52:54,030 --> 00:52:55,988 in the computer's memory, the computer doesn't. 
1173 00:52:55,988 --> 00:52:58,620 If you have no variable remembering the location of that node, 1174 00:52:58,620 --> 00:53:00,667 for all intents and purposes, it is gone. 1175 00:53:00,667 --> 00:53:02,250 So what I've essentially done is this. 1176 00:53:02,250 --> 00:53:05,880 When I update that pointer to point at the number 2, it's as though-- 1177 00:53:05,880 --> 00:53:08,440 this was a much nicer idea in theory when we talked about it. 1178 00:53:08,440 --> 00:53:09,607 But it's not really working. 1179 00:53:09,607 --> 00:53:13,230 But this is effectively what we've tried to achieve, which is I've orphaned, 1180 00:53:13,230 --> 00:53:14,413 so to speak, the number 1. 1181 00:53:14,413 --> 00:53:16,830 And that too is a technical term in the context of memory. 1182 00:53:16,830 --> 00:53:20,100 If no one is pointing at it, if no string is connected to it, 1183 00:53:20,100 --> 00:53:22,800 I have indeed orphaned a chunk of memory, a.k.a. 1184 00:53:22,800 --> 00:53:23,760 a memory leak. 1185 00:53:23,760 --> 00:53:25,950 And Valgrind would not, in fact, like this. 1186 00:53:25,950 --> 00:53:28,540 And Valgrind would, in fact, notice this. 1187 00:53:28,540 --> 00:53:31,560 So what would be the better approach? 1188 00:53:31,560 --> 00:53:32,460 Let me rewind. 1189 00:53:32,460 --> 00:53:36,510 Instead of updating that address to be that of this node, let's 1190 00:53:36,510 --> 00:53:39,720 rewind to where we were a moment ago where list is still 1191 00:53:39,720 --> 00:53:42,914 pointing at the original, n is still pointing at the new chunk of memory. 1192 00:53:42,914 --> 00:53:44,122 And what should I do instead? 1193 00:53:44,122 --> 00:53:46,890 Well, what I should do is maybe this. 1194 00:53:46,890 --> 00:53:50,740 Let's go to the next field of the new node. 1195 00:53:50,740 --> 00:53:52,170 So follow the arrow. 1196 00:53:52,170 --> 00:53:53,400 Go to the next field. 1197 00:53:53,400 --> 00:53:55,560 And what should I put here instead?
1198 00:53:55,560 --> 00:54:00,120 Why don't I put the memory address of the original node? 1199 00:54:00,120 --> 00:54:01,150 How can I get that? 1200 00:54:01,150 --> 00:54:02,650 Well, that's actually this. 1201 00:54:02,650 --> 00:54:05,070 So if list is pointing at the original node, 1202 00:54:05,070 --> 00:54:08,560 I can just copy that address into this next field, 1203 00:54:08,560 --> 00:54:12,420 which has the effect of doing that, albeit in duplicate. 1204 00:54:12,420 --> 00:54:15,420 I've updated the next field to point at the very thing 1205 00:54:15,420 --> 00:54:18,120 that the original list is already pointing at. 1206 00:54:18,120 --> 00:54:19,950 And now for the sake of discussion, let me 1207 00:54:19,950 --> 00:54:22,530 get rid of my temporary node called n. 1208 00:54:22,530 --> 00:54:26,850 And what you'll see, ultimately, is that once we 1209 00:54:26,850 --> 00:54:30,990 set list equal to n and get rid of it, now 1210 00:54:30,990 --> 00:54:37,410 we can just treat the whole linked list as being connected and linked this way. 1211 00:54:37,410 --> 00:54:38,160 How do we do this? 1212 00:54:38,160 --> 00:54:39,660 Again, we won't belabor the point with more. 1213 00:54:39,660 --> 00:54:41,693 But suppose I want to allocate a third node. 1214 00:54:41,693 --> 00:54:43,110 I have to do the exact same thing. 1215 00:54:43,110 --> 00:54:47,280 But I have to update this next field to point at the existing list 1216 00:54:47,280 --> 00:54:49,080 before I update list itself. 1217 00:54:49,080 --> 00:54:53,020 Long story short, order of operations is going to be super important. 1218 00:54:53,020 --> 00:54:55,932 And if I want to stitch these data structures together, 1219 00:54:55,932 --> 00:54:58,140 I would encourage you to think ultimately-- certainly 1220 00:54:58,140 --> 00:55:00,140 when it comes time to write something like this, 1221 00:55:00,140 --> 00:55:04,160 think about what it is that we're actually trying to tie together. 
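The order of operations described here can be sketched in a few lines of C. This is only an illustration: the helper name prepend is hypothetical, and the node type is assumed to hold an int called number and a pointer called next, as in the pictures on stage.

```c
#include <stddef.h>

// Assumed shape of a node: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: puts n in front of list and returns the new head.
node *prepend(node *list, node *n)
{
    n->next = list; // 1. the new node remembers the old first node
    return n;       // 2. only now may the caller overwrite its list pointer
}
```

A caller would write list = prepend(list, n). Doing list = n before setting n->next would leave nothing pointing at the old nodes, which is the leak described above.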
1222 00:55:04,160 --> 00:55:05,810 So let me go ahead and do this. 1223 00:55:05,810 --> 00:55:08,200 I'm going to go over to VS Code here. 1224 00:55:08,200 --> 00:55:10,780 I'm going to delete the old code for list.c. 1225 00:55:10,780 --> 00:55:14,380 And perhaps now we can transition away from our old approach 1226 00:55:14,380 --> 00:55:18,530 and actually do something with these pointers instead. 1227 00:55:18,530 --> 00:55:21,880 So I'm going to go ahead, and let's say #include 1228 00:55:21,880 --> 00:55:26,020 as before, #include standard io.h. 1229 00:55:26,020 --> 00:55:29,920 Let's go ahead and include standard lib.h proactively. 1230 00:55:29,920 --> 00:55:31,910 And let's go ahead and create that data type. 1231 00:55:31,910 --> 00:55:34,990 So typedef a struct called node. 1232 00:55:34,990 --> 00:55:38,440 And inside of this node, let's give us an integer called number 1233 00:55:38,440 --> 00:55:40,300 to store the 1, the 2, the 3, the 4. 1234 00:55:40,300 --> 00:55:44,980 And then let's create a struct node star value called next whose purpose in life 1235 00:55:44,980 --> 00:55:47,860 is going to point to the next node in any such list. 1236 00:55:47,860 --> 00:55:51,130 I'm going to shorten the name of all of this to just node simply. 1237 00:55:51,130 --> 00:55:53,680 And then in main, let's go ahead and do this. 1238 00:55:53,680 --> 00:55:56,410 We'll bring back our friend argc and argv 1239 00:55:56,410 --> 00:55:59,050 so that I can actually implement a program this time that 1240 00:55:59,050 --> 00:56:01,220 lets me construct a linked list using numbers 1241 00:56:01,220 --> 00:56:02,845 that I just pass at the command line. 1242 00:56:02,845 --> 00:56:06,110 I don't want to bother with get_int again and again or the CS50 library. 1243 00:56:06,110 --> 00:56:10,490 So let's just use argc and argv. 1244 00:56:10,490 --> 00:56:15,380 But with argv, recall string now as of last week is synonymous with char star.
1245 00:56:15,380 --> 00:56:20,240 So that's the exact same thing as we've used in week 2 onward for command line 1246 00:56:20,240 --> 00:56:21,030 arguments. 1247 00:56:21,030 --> 00:56:22,280 So what do I want to do? 1248 00:56:22,280 --> 00:56:24,470 My goal in life with this demonstration is 1249 00:56:24,470 --> 00:56:30,180 to create and code this linked list here or at least the beginnings thereof. 1250 00:56:30,180 --> 00:56:31,650 So how can I do this? 1251 00:56:31,650 --> 00:56:33,320 Let me go back into VS Code. 1252 00:56:33,320 --> 00:56:37,640 Let me declare a linked list called list but initialize it to null. 1253 00:56:37,640 --> 00:56:40,130 So there's nothing there just yet. 1254 00:56:40,130 --> 00:56:44,300 How now can I go about building this linked list? 1255 00:56:44,300 --> 00:56:46,440 By taking numbers from the command line. 1256 00:56:46,440 --> 00:56:47,600 So let's do this. 1257 00:56:47,600 --> 00:56:55,970 For int i equals 1, i is less than argc i plus plus, let me go ahead 1258 00:56:55,970 --> 00:56:57,142 and do this. 1259 00:56:57,142 --> 00:56:59,600 I'm going to go ahead, and just for the sake of discussion, 1260 00:56:59,600 --> 00:57:02,220 let me print out where we're going with this. 1261 00:57:02,220 --> 00:57:07,925 Let me go ahead and print out %s backslash n whatever is in argv bracket 1262 00:57:07,925 --> 00:57:09,080 i. 1263 00:57:09,080 --> 00:57:11,178 So I'm not doing anything interesting yet. 1264 00:57:11,178 --> 00:57:13,470 But let's just demonstrate where we're going with this. 1265 00:57:13,470 --> 00:57:16,640 Let me go ahead and make list, ./list. 1266 00:57:16,640 --> 00:57:20,270 And let me put the numbers 1, 2 and 3 as command line arguments. 1267 00:57:20,270 --> 00:57:21,050 Enter. 1268 00:57:21,050 --> 00:57:22,910 There, we just have those numbers spit out. 
1269 00:57:22,910 --> 00:57:25,190 I'm just jumping through this hoop to demonstrate 1270 00:57:25,190 --> 00:57:26,730 how I'm getting those values. 1271 00:57:26,730 --> 00:57:30,620 But notice the values in argv are always strings, a.k.a. 1272 00:57:30,620 --> 00:57:31,580 char star. 1273 00:57:31,580 --> 00:57:37,250 So if I actually want to convert a string to an integer like this, 1274 00:57:37,250 --> 00:57:38,750 how can I do this? 1275 00:57:38,750 --> 00:57:43,580 I want to set the number variable equal to argv bracket i. 1276 00:57:43,580 --> 00:57:44,930 But argv bracket i is a string. 1277 00:57:44,930 --> 00:57:49,425 How can I convert a string to a number anyone recall? 1278 00:57:49,425 --> 00:57:49,925 Yeah? 1279 00:57:49,925 --> 00:57:51,150 AUDIENCE: Atoi. 1280 00:57:51,150 --> 00:57:54,810 DAVID J. MALAN: Atoi, so ASCII to I, so ASCII to integer. 1281 00:57:54,810 --> 00:58:00,420 So if I do atoi, I can actually convert one to the other in this way. 1282 00:58:00,420 --> 00:58:04,208 And now I can actually print this as an int instead of a string. 1283 00:58:04,208 --> 00:58:06,750 Now, that's not going to change the aesthetics of the program 1284 00:58:06,750 --> 00:58:07,830 if I print it out again. 1285 00:58:07,830 --> 00:58:10,820 But it does, in fact, give me an integer to work with. 1286 00:58:10,820 --> 00:58:12,200 But let's not bother printing it. 1287 00:58:12,200 --> 00:58:16,310 Let's instead put this number and any other number at the command line 1288 00:58:16,310 --> 00:58:18,210 into a linked list. 1289 00:58:18,210 --> 00:58:22,430 So let me go ahead and allocate a pointer called n. 1290 00:58:22,430 --> 00:58:25,820 Let me set it equal to the return value of malloc 1291 00:58:25,820 --> 00:58:29,480 asking malloc for the size of one node. 1292 00:58:29,480 --> 00:58:31,700 Ideally that will give me a chunk of memory 1293 00:58:31,700 --> 00:58:34,280 that can fit this number and a pointer. 
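As a rough sketch of the two steps just described, converting a command-line string with atoi and asking malloc for one node's worth of memory, something like the following could work. The helper name make_node is an assumption for illustration, not the code being typed on screen, and the NULL check is there because malloc can fail.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: converts text to an int and stores it in a new node.
// Returns NULL if malloc cannot provide the memory.
node *make_node(const char *text)
{
    node *n = malloc(sizeof(node)); // room for one int and one pointer
    if (n == NULL)
    {
        return NULL;
    }
    n->number = atoi(text); // atoi reports no errors; text with no digits becomes 0
    n->next = NULL;         // avoid leaving a garbage value in the pointer
    return n;
}
```

Whoever calls make_node owns the returned memory and is responsible for calling free on it eventually.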
1294 00:58:34,280 --> 00:58:38,670 Just for good measure, I'm going to check, well, if n equals equals null, 1295 00:58:38,670 --> 00:58:41,400 then actually this isn't going to work. 1296 00:58:41,400 --> 00:58:44,962 So we should probably free memory thus far. 1297 00:58:44,962 --> 00:58:46,670 So I'm just going to leave this like this 1298 00:58:46,670 --> 00:58:48,270 because there's a few steps involved. 1299 00:58:48,270 --> 00:58:51,110 So free memory thus far. 1300 00:58:51,110 --> 00:58:55,130 And then we can go ahead, for instance, and return 1. 1301 00:58:55,130 --> 00:58:59,900 All right, if now I don't have an error and n is not in fact null, 1302 00:58:59,900 --> 00:59:02,330 but it's a valid address, I can go into n. 1303 00:59:02,330 --> 00:59:05,030 I can follow that pointer to the number field 1304 00:59:05,030 --> 00:59:07,110 and set it equal to the actual number. 1305 00:59:07,110 --> 00:59:08,902 So this is a little strange at first glance 1306 00:59:08,902 --> 00:59:11,277 that I've got number on the left and number on the right. 1307 00:59:11,277 --> 00:59:13,430 But they're different. n is currently pointing 1308 00:59:13,430 --> 00:59:17,180 at a chunk of memory big enough to fit a node. 1309 00:59:17,180 --> 00:59:19,910 n arrow number means go to that chunk of memory 1310 00:59:19,910 --> 00:59:24,260 and go to the top half of the rectangle and update that number 1311 00:59:24,260 --> 00:59:28,340 to be whatever the human typed in after we've converted it on line 16 1312 00:59:28,340 --> 00:59:31,880 here to an actual integer. 1313 00:59:31,880 --> 00:59:35,450 All right, what next do I do? 1314 00:59:35,450 --> 00:59:39,500 n arrow next should probably be at this point initialized to null. 1315 00:59:39,500 --> 00:59:44,990 And how now do I actually add this node n to my original linked list? 1316 00:59:44,990 --> 00:59:47,180 Well, I could just do list equals n. 
1317 00:59:47,180 --> 00:59:51,110 And that would update a la the foam finger my list variable 1318 00:59:51,110 --> 00:59:52,730 to point at this new node. 1319 00:59:52,730 --> 00:59:55,250 But we said before that that's potentially bad. 1320 00:59:55,250 --> 00:59:55,940 Why? 1321 00:59:55,940 --> 00:59:59,080 Because if list is already pointing at something, 1322 00:59:59,080 --> 01:00:00,830 we can't just blindly change what it's 1323 01:00:00,830 --> 01:00:03,565 pointing at because we'll have orphaned any previous numbers. 1324 01:00:03,565 --> 01:00:05,690 It's not relevant at the moment because we're still 1325 01:00:05,690 --> 01:00:07,460 in the first iteration of this loop. 1326 01:00:07,460 --> 01:00:10,320 But we don't want to orphan or leak any memory. 1327 01:00:10,320 --> 01:00:11,930 So what do I first want to do? 1328 01:00:11,930 --> 01:00:15,510 Before I actually point the linked list at that new node, 1329 01:00:15,510 --> 01:00:21,140 I'm going to instead say, go to this current node, arrow, next, 1330 01:00:21,140 --> 01:00:23,510 and actually set that equal to list. 1331 01:00:23,510 --> 01:00:27,140 So strictly speaking, I don't actually need to initialize it to null. 1332 01:00:27,140 --> 01:00:34,020 I can initialize the next field of this new node to point at the existing list. 1333 01:00:34,020 --> 01:00:37,160 So what I'm going to do here is, instead of initializing the next field 1334 01:00:37,160 --> 01:00:42,020 equal to null, if I want to insert this new node in front of any nodes that 1335 01:00:42,020 --> 01:00:48,590 already exist, I can simply say set the node's next field equal to whatever 1336 01:00:48,590 --> 01:00:49,760 the list currently is. 1337 01:00:49,760 --> 01:00:53,960 And now in this last line I can update the list itself to point to n.
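Putting the loop together, a minimal sketch might look like the code below. It is written as a function rather than inside main so it can be exercised on its own; the names build_list and free_list are assumptions, and free_list is one possible way to fill in the "free memory thus far" step that the lecture leaves as a comment.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: frees every node reachable from list.
void free_list(node *list)
{
    while (list != NULL)
    {
        node *next = list->next; // save the address before freeing this node
        free(list);
        list = next;
    }
}

// Builds a list from argv[1] through argv[argc - 1], prepending each number.
// Returns NULL if there are no numbers or if any allocation fails.
node *build_list(int argc, char *argv[])
{
    node *list = NULL;
    for (int i = 1; i < argc; i++)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            free_list(list); // free memory thus far
            return NULL;
        }
        n->number = atoi(argv[i]);
        n->next = list; // first, point the new node at the existing list
        list = n;       // then, point list at the new node
    }
    return list;
}
```

In a full program, main would call build_list(argc, argv), use the list, and finish with free_list so that Valgrind has nothing to complain about.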
1338 01:00:53,960 --> 01:00:57,667 So after this, let's just go ahead and do something relatively simple even 1339 01:00:57,667 --> 01:01:00,750 though the syntax for this is going to look a little complicated at first. 1340 01:01:00,750 --> 01:01:04,270 How do I go about printing the whole list? 1341 01:01:04,270 --> 01:01:06,270 So print whole list. 1342 01:01:06,270 --> 01:01:08,020 Well, there's a couple of ways to do this. 1343 01:01:08,020 --> 01:01:10,140 But if you imagine a world-- if we fast forward 1344 01:01:10,140 --> 01:01:14,350 to a world in which we now have a linked list of size 3, for instance, 1345 01:01:14,350 --> 01:01:17,640 here's where we might be at some point in the computer's memory. 1346 01:01:17,640 --> 01:01:18,900 We've inserted the 1. 1347 01:01:18,900 --> 01:01:20,100 Then we inserted the 2. 1348 01:01:20,100 --> 01:01:21,180 Then we inserted the 3. 1349 01:01:21,180 --> 01:01:24,810 But because we're prepending everything, it actually looks like 3, 2, 1. 1350 01:01:24,810 --> 01:01:26,820 So how could I go about printing this? 1351 01:01:26,820 --> 01:01:28,710 Well, ideally, I could do this. 1352 01:01:28,710 --> 01:01:31,260 If a computer can only look at one location at a time, 1353 01:01:31,260 --> 01:01:36,060 I can grab my foam finger and point at the 3 and print it out, point at the 2 1354 01:01:36,060 --> 01:01:38,640 and print it out, point at the 1 and print it out. 1355 01:01:38,640 --> 01:01:42,090 And then because this is null, I'm all done pointing and printing. 1356 01:01:42,090 --> 01:01:44,730 But how can I translate this to actual code? 1357 01:01:44,730 --> 01:01:47,220 Well, I could implement that foam finger, so to speak, 1358 01:01:47,220 --> 01:01:48,520 in the following way. 
1359 01:01:48,520 --> 01:01:51,990 I could give myself a pointer often abbreviated by computer scientists 1360 01:01:51,990 --> 01:01:57,690 as ptr, specify that that's indeed a pointer to a node, as per that star, 1361 01:01:57,690 --> 01:02:00,490 and initialize that pointer to be the list itself. 1362 01:02:00,490 --> 01:02:05,040 So this is the code equivalent of, if I have this same picture on the screen, 1363 01:02:05,040 --> 01:02:08,850 declaring a pointer variable and point it at whatever the list 1364 01:02:08,850 --> 01:02:12,330 itself is storing first. 1365 01:02:12,330 --> 01:02:15,540 And, now, that's akin to doing this. 1366 01:02:15,540 --> 01:02:19,360 If I now go back into my code, how can I do this? 1367 01:02:19,360 --> 01:02:22,260 Well, so long as that pointer does not equal 1368 01:02:22,260 --> 01:02:25,690 null-- that is, so long as that pointer is not at the end of the list, 1369 01:02:25,690 --> 01:02:31,890 let me go ahead and print out using printf an integer with percent i. 1370 01:02:31,890 --> 01:02:34,650 And then let's print out whatever I'm currently 1371 01:02:34,650 --> 01:02:37,930 pointing at in ptr arrow number. 1372 01:02:37,930 --> 01:02:42,720 So whatever I'm pointing at, go there and print the number that you find. 1373 01:02:42,720 --> 01:02:44,970 After that, what do I want to go ahead and do? 1374 01:02:44,970 --> 01:02:49,590 I'm going to set pointer equal to pointer arrow next. 1375 01:02:49,590 --> 01:02:50,650 So what does this mean? 1376 01:02:50,650 --> 01:02:53,400 If I go back to my picture here and I want 1377 01:02:53,400 --> 01:02:56,790 to actually walk through this thing, that first line of code 1378 01:02:56,790 --> 01:03:00,720 ensures that this foam finger, a.k.a. ptr, represented here, 1379 01:03:00,720 --> 01:03:02,910 is pointing at the first element of the list. 
1380 01:03:02,910 --> 01:03:06,990 Once I've printed it out with printf, I'm then doing pointer 1381 01:03:06,990 --> 01:03:10,950 equals pointer next, which is following this next arrow. 1382 01:03:10,950 --> 01:03:13,770 So ptr now points at the 2. 1383 01:03:13,770 --> 01:03:16,590 I then print that out and set pointer equal to pointer next. 1384 01:03:16,590 --> 01:03:20,340 That's like following this arrow and updating pointer to point at this node 1385 01:03:20,340 --> 01:03:20,940 instead. 1386 01:03:20,940 --> 01:03:24,030 At that point, the next step is going to be to point it to null. 1387 01:03:24,030 --> 01:03:26,470 So for all intents and purposes, I'm done. 1388 01:03:26,470 --> 01:03:30,150 And that's why we can actually get away with this while loop 1389 01:03:30,150 --> 01:03:36,040 because while pointer is not null, it's going to print and print and print. 1390 01:03:36,040 --> 01:03:38,385 Now, let me go into my terminal window. 1391 01:03:38,385 --> 01:03:40,260 Let me go ahead and make list and really hope 1392 01:03:40,260 --> 01:03:43,230 I didn't make any mistakes because this was a lot all at once. 1393 01:03:43,230 --> 01:03:45,090 Seems to have compiled OK. 1394 01:03:45,090 --> 01:03:48,240 When I run ./list of 1, 2, 3-- 1395 01:03:48,240 --> 01:03:51,540 theoretically, if this code is correct, it should unbeknownst to me 1396 01:03:51,540 --> 01:03:53,850 build up an entire linked list in memory. 1397 01:03:53,850 --> 01:03:57,787 But what's it going to print out ultimately? 1398 01:03:57,787 --> 01:03:59,370 What do you think it's going to print? 1399 01:03:59,370 --> 01:04:01,710 Yeah? 1400 01:04:01,710 --> 01:04:05,305 It could print out null if I really screwed up, yes. 1401 01:04:05,305 --> 01:04:05,805 What else? 1402 01:04:05,805 --> 01:04:06,843 AUDIENCE: 3, 2, 1. 1403 01:04:06,843 --> 01:04:08,760 DAVID J. MALAN: Or it could print out 3, 2, 1. 1404 01:04:08,760 --> 01:04:10,427 And frankly, that's what I'm hoping for.
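The foam-finger walk can be sketched as the while loop below. The function name print_list and its returned count are assumptions added so the loop can be checked on its own; the loop body follows the steps described above.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Prints each number on its own line, front to back.
// Returns how many nodes were visited (an addition for illustration).
int print_list(const node *list)
{
    int count = 0;
    const node *ptr = list; // the "foam finger" starts at the first node
    while (ptr != NULL)     // stop once we have walked off the end
    {
        printf("%i\n", ptr->number);
        ptr = ptr->next;    // follow the arrow to the next node
        count++;
    }
    return count;
}
```

Because nodes were prepended, a list built from 1, 2, 3 is visited starting from the most recently added node.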
1405 01:04:10,427 --> 01:04:14,888 So even though I've given it in argv 1, 2, 3, 1406 01:04:14,888 --> 01:04:18,180 because I'm prepending to the beginning of the list, the beginning of the list, 1407 01:04:18,180 --> 01:04:20,370 beginning of the list each time, I think, 1408 01:04:20,370 --> 01:04:23,040 indeed, we're going to see 3, 2, 1. 1409 01:04:23,040 --> 01:04:24,060 Now, that's fine. 1410 01:04:24,060 --> 01:04:24,930 That's correct. 1411 01:04:24,930 --> 01:04:28,380 But it's not necessarily what we might want. 1412 01:04:28,380 --> 01:04:32,100 So how could we actually go about inserting things maybe 1413 01:04:32,100 --> 01:04:35,530 otherwise? Because, in fact, if we consider this algorithm, 1414 01:04:35,530 --> 01:04:38,670 what's the running time of insert? 1415 01:04:38,670 --> 01:04:41,717 How many steps are required right now, given a linked list of size n 1416 01:04:41,717 --> 01:04:43,800 if you want to go ahead and insert one more node-- 1417 01:04:43,800 --> 01:04:48,270 there's actually a reason I took this lazy approach of prepending, prepending. 1418 01:04:48,270 --> 01:04:53,115 In big O notation, how much does it cost us to insert into a linked list? 1419 01:04:53,115 --> 01:04:56,770 1420 01:04:56,770 --> 01:04:57,770 Think about it this way. 1421 01:04:57,770 --> 01:05:00,080 Does it matter how many nodes are already 1422 01:05:00,080 --> 01:05:04,220 in the linked list, whether it's 1 or 2 or 3 or 300 or 3,000? 1423 01:05:04,220 --> 01:05:06,830 If you're prepending, it doesn't matter how long 1424 01:05:06,830 --> 01:05:09,152 that chain is, you're just constantly putting it 1425 01:05:09,152 --> 01:05:11,360 at the beginning, at the beginning, at the beginning. 1426 01:05:11,360 --> 01:05:12,500 Now, how many steps is this? 1427 01:05:12,500 --> 01:05:13,220 I don't know exactly. 1428 01:05:13,220 --> 01:05:14,720 I'd have to count the lines of code. 1429 01:05:14,720 --> 01:05:15,860 But it's some small number.
1430 01:05:15,860 --> 01:05:17,150 It's like two steps, three steps. 1431 01:05:17,150 --> 01:05:18,358 How many lines of code is it? 1432 01:05:18,358 --> 01:05:21,080 It's very few to prepend, prepend. 1433 01:05:21,080 --> 01:05:27,800 So I would dare say that the running time of insertion into a linked list 1434 01:05:27,800 --> 01:05:29,150 is actually constant time. 1435 01:05:29,150 --> 01:05:29,930 It's big O of 1. 1436 01:05:29,930 --> 01:05:32,763 And that's super fast because it doesn't matter how big the list is. 1437 01:05:32,763 --> 01:05:35,390 Boom, boom, boom, you've prepended to the list. 1438 01:05:35,390 --> 01:05:36,960 But there's a flip side. 1439 01:05:36,960 --> 01:05:39,200 What's the running time of searching a linked list, 1440 01:05:39,200 --> 01:05:43,710 looking for something in it, finding a number in it? 1441 01:05:43,710 --> 01:05:45,990 Well, if it looks like this, how long does 1442 01:05:45,990 --> 01:05:47,760 it take you to find some arbitrary number 1443 01:05:47,760 --> 01:05:50,040 that the human might ask you for? 1444 01:05:50,040 --> 01:05:53,080 How many steps will it take to find me the number 1 if it's there? 1445 01:05:53,080 --> 01:05:56,250 So big O of n-- because in the worst case, the number you're looking for 1446 01:05:56,250 --> 01:05:57,583 might be all the way at the end. 1447 01:05:57,583 --> 01:06:00,083 And even though you and I, again, have this bird's eye view, 1448 01:06:00,083 --> 01:06:03,420 and we can obviously see where the 1 is, the only way we can get to the 1 1449 01:06:03,420 --> 01:06:05,200 is by starting at the 2. 1450 01:06:05,200 --> 01:06:06,267 How do you get to the 2? 1451 01:06:06,267 --> 01:06:07,350 You got to start at the 3. 1452 01:06:07,350 --> 01:06:08,350 How do you get to the 3? 1453 01:06:08,350 --> 01:06:11,290 You've got to start at the beginning of the list itself. 
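To make the big O of n cost concrete, here is a small sketch of searching a linked list. The function name search is an assumption; the point is only that the loop has no choice but to start at the head and follow one arrow at a time.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Linear search: returns true if target appears anywhere in the list.
// In the worst case every one of the n nodes is visited.
bool search(const node *list, int target)
{
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->number == target)
        {
            return true;
        }
    }
    return false;
}
```

With the list 3, 2, 1 from the demo, finding the 1 means passing through the 3 and the 2 first.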
1454 01:06:11,290 --> 01:06:14,280 And so whereas in the world of arrays where you had this contiguous 1455 01:06:14,280 --> 01:06:16,988 chunk of memory, just like we had lockers on the stage weeks ago, 1456 01:06:16,988 --> 01:06:19,738 and you could jump to the middle and then the middle of the middle 1457 01:06:19,738 --> 01:06:21,000 and the middle of the middle. 1458 01:06:21,000 --> 01:06:23,260 That was all predicated on contiguousness. 1459 01:06:23,260 --> 01:06:23,760 Why? 1460 01:06:23,760 --> 01:06:27,150 Because if you know where the first locker was, and you 1461 01:06:27,150 --> 01:06:29,850 know where the last locker was, you can subtract one 1462 01:06:29,850 --> 01:06:31,680 from the other, divide by 2, and, boom, you 1463 01:06:31,680 --> 01:06:35,130 get the index or the location numerically of the middle locker. 1464 01:06:35,130 --> 01:06:36,870 And you can do that again and again. 1465 01:06:36,870 --> 01:06:39,510 I cannot do any such math here. 1466 01:06:39,510 --> 01:06:42,160 The middle of this linked list is obviously here. 1467 01:06:42,160 --> 01:06:45,438 But it doesn't matter what the location of this one is in memory. 1468 01:06:45,438 --> 01:06:47,230 It doesn't matter what the location of this 1469 01:06:47,230 --> 01:06:50,105 is in memory because they could be anywhere in the computer's memory. 1470 01:06:50,105 --> 01:06:52,510 So you can subtract one from the other, divide by 2, 1471 01:06:52,510 --> 01:06:55,000 and that's going to put you in some random location 1472 01:06:55,000 --> 01:06:58,690 because these chunks of memory are not back to back to back to back. 1473 01:06:58,690 --> 01:07:00,410 They're every which way. 1474 01:07:00,410 --> 01:07:03,040 So this is to say, what algorithm from week zero 1475 01:07:03,040 --> 01:07:05,650 can we not use on linked lists? 1476 01:07:05,650 --> 01:07:07,100 So binary search.
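For contrast, here is a sketch of binary search over a sorted array, where the "subtract, divide by 2" arithmetic is meaningful because the elements sit back to back in memory. The function name and signature are assumptions for illustration.

```c
#include <stdbool.h>

// Binary search over a sorted array of n ints.
// The midpoint arithmetic only works because array elements are contiguous.
bool binary_search(const int values[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        int middle = low + (high - low) / 2; // index of the middle "locker"
        if (values[middle] == target)
        {
            return true;
        }
        else if (values[middle] < target)
        {
            low = middle + 1; // discard the left half
        }
        else
        {
            high = middle - 1; // discard the right half
        }
    }
    return false;
}
```

There is no equivalent of values[middle] for a linked list, since reaching the middle node already requires following roughly n / 2 pointers.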
1477 01:07:07,100 --> 01:07:09,790 So that very algorithm we started the class with 1478 01:07:09,790 --> 01:07:13,540 was all predicated on contiguous chunks of memory, like an array. 1479 01:07:13,540 --> 01:07:15,470 The problem with an array of course, though, 1480 01:07:15,470 --> 01:07:17,620 is that you paint yourself into this corner. 1481 01:07:17,620 --> 01:07:20,950 And you have to decide in advance how many locations you want. 1482 01:07:20,950 --> 01:07:22,805 And if you round up, you're wasting space. 1483 01:07:22,805 --> 01:07:24,430 If you round down, you're wasting time. 1484 01:07:24,430 --> 01:07:26,050 So you're screwed either way. 1485 01:07:26,050 --> 01:07:27,820 A linked list avoids those problems. 1486 01:07:27,820 --> 01:07:30,220 It's more of a dynamic data structure that can grow. 1487 01:07:30,220 --> 01:07:32,500 And frankly, if we code it up, it could even shrink. 1488 01:07:32,500 --> 01:07:34,820 We could remove these nodes back and forth. 1489 01:07:34,820 --> 01:07:39,220 And so we're not necessarily wasting time on insertion, 1490 01:07:39,220 --> 01:07:41,650 but we are on searching this thing. 1491 01:07:41,650 --> 01:07:45,940 We're back to Big O of n when it comes to searching a linked list as opposed 1492 01:07:45,940 --> 01:07:49,460 to it being log n, which was much, much better. 1493 01:07:49,460 --> 01:07:52,180 So the upside of prepending the nodes in this way 1494 01:07:52,180 --> 01:07:54,567 is that we have constant time insertion of new nodes 1495 01:07:54,567 --> 01:07:56,650 because we just continually insert, insert, insert 1496 01:07:56,650 --> 01:07:58,270 into the very beginning of the list. 1497 01:07:58,270 --> 01:08:01,030 Of course, a side effect of this is that the numbers 1498 01:08:01,030 --> 01:08:04,090 might end up in completely reverse order as they have here 1499 01:08:04,090 --> 01:08:05,710 because I first inserted 1. 1500 01:08:05,710 --> 01:08:07,030 But then I prepended 2.
1501 01:08:07,030 --> 01:08:08,530 And then I prepended 3. 1502 01:08:08,530 --> 01:08:11,230 Well, we could perhaps take a completely different approach 1503 01:08:11,230 --> 01:08:13,940 and append the nodes upon insertion instead. 1504 01:08:13,940 --> 01:08:17,200 So, for instance, if I start off with an empty list, I could then insert 1. 1505 01:08:17,200 --> 01:08:18,399 I can insert 2. 1506 01:08:18,399 --> 01:08:19,571 And I can insert 3. 1507 01:08:19,571 --> 01:08:22,029 And, in this case, I actually get a bit lucky that now they 1508 01:08:22,029 --> 01:08:23,529 are in fact in sorted order. 1509 01:08:23,529 --> 01:08:25,361 Now, to be fair, that's not guaranteed. 1510 01:08:25,361 --> 01:08:27,069 But let's at least consider what the code 1511 01:08:27,069 --> 01:08:30,520 would look like if we were to take this alternative approach of appending 1512 01:08:30,520 --> 01:08:32,187 nodes instead of prepending. 1513 01:08:32,187 --> 01:08:34,270 Well, rather than write out the code from scratch, 1514 01:08:34,270 --> 01:08:38,470 let me open up a premade version of list.c that even has some comments 1515 01:08:38,470 --> 01:08:39,790 to explain what's going on. 1516 01:08:39,790 --> 01:08:42,130 Some of this code is pretty much the same. 1517 01:08:42,130 --> 01:08:45,609 But allow me to scroll down roughly to the middle where 1518 01:08:45,609 --> 01:08:48,470 we'll see the actual logic in question. 1519 01:08:48,470 --> 01:08:51,939 So, first, on line 35 here, we're checking if the list is null. 1520 01:08:51,939 --> 01:08:54,399 Because if there's no list yet, it's actually pretty easy 1521 01:08:54,399 --> 01:08:55,899 to prepend or append. 1522 01:08:55,899 --> 01:08:58,120 We're just going to go ahead and update the list 1523 01:08:58,120 --> 01:09:01,090 variable to point to this new node n. 
1524 01:09:01,090 --> 01:09:04,450 But if the list isn't empty, and there's at least one node there already, 1525 01:09:04,450 --> 01:09:07,430 well, then what we're going to do is this in line 45. 1526 01:09:07,430 --> 01:09:10,840 We're going to iterate over that existing linked list. 1527 01:09:10,840 --> 01:09:14,050 And I'm going to do so with a temporary variable called pointer, 1528 01:09:14,050 --> 01:09:17,950 or ptr for short, that's initialized to the beginning of the list, a foam 1529 01:09:17,950 --> 01:09:20,260 finger pointing at that first node initially. 1530 01:09:20,260 --> 01:09:23,800 I'm going to on every iteration update that pointer variable 1531 01:09:23,800 --> 01:09:28,090 to point to the next node, to the next node, pointing one node ahead 1532 01:09:28,090 --> 01:09:29,109 with that foam finger. 1533 01:09:29,109 --> 01:09:31,300 But on each iteration, I'm also going to make sure 1534 01:09:31,300 --> 01:09:33,370 that the pointer variable is not null. 1535 01:09:33,370 --> 01:09:37,090 Because if it is null, that means I'm pointing past the end of the list or, 1536 01:09:37,090 --> 01:09:39,189 that is, the list has ended. 1537 01:09:39,189 --> 01:09:46,160 But if inside of that loop I notice that the current node's next field is null, 1538 01:09:46,160 --> 01:09:48,819 I actually know logically that I'm at the end of the list 1539 01:09:48,819 --> 01:09:50,300 without going past it. 1540 01:09:50,300 --> 01:09:53,300 So at that point, if my goal is to append this new node, 1541 01:09:53,300 --> 01:09:56,230 I'm going to go ahead and set pointer arrow next, 1542 01:09:56,230 --> 01:10:00,640 which is currently null, but set it equal to the address of this new node 1543 01:10:00,640 --> 01:10:03,680 effectively appending that node to the end of the list. 
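The appending logic just walked through can be sketched as follows. This is not the premade list.c itself (its line numbers and comments are not reproduced here); the function name append and the choice to return the head of the list are assumptions.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Appends n to the end of list and returns the head of the list.
node *append(node *list, node *n)
{
    n->next = NULL; // n will be the new end of the list

    // If the list is empty, the new node is the whole list
    if (list == NULL)
    {
        return n;
    }

    // Otherwise walk the list until we find the last node
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->next == NULL) // at the end without going past it
        {
            ptr->next = n;
            break;
        }
    }
    return list;
}
```

Each call has to walk the whole list to find its end, which is why this version gives up constant-time insertion.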
1544 01:10:03,680 --> 01:10:07,990 So, for instance, if we started with a list of 1 and 2, what we've just done 1545 01:10:07,990 --> 01:10:14,170 is updated 2's next field to be equal to the address of the node containing 3. 1546 01:10:14,170 --> 01:10:17,860 Meanwhile, the node containing 3's next field is null by default 1547 01:10:17,860 --> 01:10:21,040 because it is now the new end of the list. 1548 01:10:21,040 --> 01:10:25,330 Now, what are the implications for maybe performance or efficiency now? 1549 01:10:25,330 --> 01:10:28,120 Well, we are now appending to the list, which 1550 01:10:28,120 --> 01:10:32,620 means we're no longer gaining constant time of insertion. 1551 01:10:32,620 --> 01:10:36,088 Because any time we prepend it, it took us some finite number of steps. 1552 01:10:36,088 --> 01:10:39,130 We just had to update a couple of pointers at the beginning of the list-- 1553 01:10:39,130 --> 01:10:39,910 beginning of the list. 1554 01:10:39,910 --> 01:10:42,070 And it doesn't actually matter how much longer the list 1555 01:10:42,070 --> 01:10:43,778 is getting because we're never traversing 1556 01:10:43,778 --> 01:10:45,460 the list when we're prepending. 1557 01:10:45,460 --> 01:10:48,700 But when we're appending, by definition we're finding the end of the list, 1558 01:10:48,700 --> 01:10:51,075 finding the end of the list, finding the end of the list. 1559 01:10:51,075 --> 01:10:52,930 And so our running time now for insertion 1560 01:10:52,930 --> 01:10:55,180 is no longer big O of 1 or constant time. 1561 01:10:55,180 --> 01:10:58,600 It's now big O of n because if there's n nodes in the list already, 1562 01:10:58,600 --> 01:11:02,050 just to find the end of it we need to actually traverse the whole list 1563 01:11:02,050 --> 01:11:04,790 to actually find where this new node should go. 
1564 01:11:04,790 --> 01:11:08,260 But even so, we've gotten lucky in this appending case 1565 01:11:08,260 --> 01:11:10,278 that we inserted 1 then 2 then 3. 1566 01:11:10,278 --> 01:11:12,070 That's just because of my choice of inputs. 1567 01:11:12,070 --> 01:11:15,250 Suppose that we don't know in advance what the inputs are going to be. 1568 01:11:15,250 --> 01:11:18,610 They might be large numbers, small numbers, or anything in between. 1569 01:11:18,610 --> 01:11:21,140 But they might not necessarily be in order. 1570 01:11:21,140 --> 01:11:24,100 But if we want to maintain this linked list in sorted order, 1571 01:11:24,100 --> 01:11:27,380 I think our logic's actually going to have to change. 1572 01:11:27,380 --> 01:11:33,100 So let me actually go ahead and open up a new version of my linked list code, 1573 01:11:33,100 --> 01:11:34,990 this one too made in advance. 1574 01:11:34,990 --> 01:11:38,080 And in this version of my code, as we'll soon see, 1575 01:11:38,080 --> 01:11:43,130 I've gone about changing the logic just a little bit so that I can actually 1576 01:11:43,130 --> 01:11:48,560 now handle this additional case because when inserting nodes in arbitrary 1577 01:11:48,560 --> 01:11:51,650 order, if I wanted them to end up being sorted, 1578 01:11:51,650 --> 01:11:53,600 I have to consider a few possible scenarios. 1579 01:11:53,600 --> 01:11:55,440 Maybe there's no list whatsoever. 1580 01:11:55,440 --> 01:11:56,810 So let's actually look for that. 1581 01:11:56,810 --> 01:11:59,870 Let me scroll down in this final version of my linked list code. 1582 01:11:59,870 --> 01:12:03,690 And, actually, that case here on line 35 is pretty much the same. 1583 01:12:03,690 --> 01:12:06,110 If there's no list there, and the list variable is null, 1584 01:12:06,110 --> 01:12:08,330 well, let's just update it to point to this new node. 1585 01:12:08,330 --> 01:12:12,500 But things get more interesting when there is at least one node there.
1586 01:12:12,500 --> 01:12:16,170 Because if the goal is to maintain sorted order, we now need to decide, 1587 01:12:16,170 --> 01:12:18,710 does this new node, whatever its number is, 1588 01:12:18,710 --> 01:12:22,400 go before the beginning of the list, at the end of the list, 1589 01:12:22,400 --> 01:12:24,150 in the middle somewhere of the list? 1590 01:12:24,150 --> 01:12:25,850 So let's break that down. 1591 01:12:25,850 --> 01:12:32,670 If we find that the new node's number is less than the list's number here, 1592 01:12:32,670 --> 01:12:35,060 well, then it belongs at the beginning of the list 1593 01:12:35,060 --> 01:12:37,620 because it's smaller than any of the numbers already there. 1594 01:12:37,620 --> 01:12:41,450 So what I'm going to go ahead and do is update this new node's next field 1595 01:12:41,450 --> 01:12:43,940 to point at the current linked list. 1596 01:12:43,940 --> 01:12:46,160 And then I'm going to update the linked list variable 1597 01:12:46,160 --> 01:12:48,590 to equal the address of this new node. 1598 01:12:48,590 --> 01:12:53,240 The effect then is, no matter how long the existing list is if this new node's 1599 01:12:53,240 --> 01:12:55,820 number is smaller than everything else in the list, 1600 01:12:55,820 --> 01:12:57,970 I want to just splice it in at the beginning. 1601 01:12:57,970 --> 01:13:00,470 So that's actually pretty straightforward with just a couple 1602 01:13:00,470 --> 01:13:02,210 of pointer updates. 1603 01:13:02,210 --> 01:13:04,292 But the other scenario is that it doesn't just 1604 01:13:04,292 --> 01:13:06,000 belong at the very beginning of the list. 1605 01:13:06,000 --> 01:13:07,370 It's somewhere else in the list. 1606 01:13:07,370 --> 01:13:09,078 And that itself is two scenarios. 1607 01:13:09,078 --> 01:13:10,620 Maybe it's in the middle of the list. 1608 01:13:10,620 --> 01:13:12,245 Maybe it's at the very end of the list. 
1609 01:13:12,245 --> 01:13:14,540 So let's consider those scenarios as well. 1610 01:13:14,540 --> 01:13:15,860 Let me scroll down here. 1611 01:13:15,860 --> 01:13:18,030 And in my else clause, it's a bit bigger this time. 1612 01:13:18,030 --> 01:13:18,530 Why? 1613 01:13:18,530 --> 01:13:23,030 Because on line 51, in this case, I'm going to introduce another for loop 1614 01:13:23,030 --> 01:13:23,970 as before. 1615 01:13:23,970 --> 01:13:27,530 But this time I'm trying to determine if this node belongs at the end 1616 01:13:27,530 --> 01:13:29,070 or somewhere in the middle. 1617 01:13:29,070 --> 01:13:31,100 So I'm not just looking for the end this time. 1618 01:13:31,100 --> 01:13:35,180 I'm actually comparing the value, the integer inside of this new node, 1619 01:13:35,180 --> 01:13:37,290 against what is currently in the list. 1620 01:13:37,290 --> 01:13:41,150 So, for instance, if logically I actually find my way 1621 01:13:41,150 --> 01:13:43,130 all the way to the end of the list, whereby 1622 01:13:43,130 --> 01:13:47,510 the next field in the pointer variable's node equals null, 1623 01:13:47,510 --> 01:13:50,850 well, then logically I didn't find an earlier spot for this node. 1624 01:13:50,850 --> 01:13:54,380 So let me go ahead and update that pointer's next field 1625 01:13:54,380 --> 01:13:56,260 to equal the address of this new node. 1626 01:13:56,260 --> 01:13:58,760 And then like before, let's just break out because I'm done. 1627 01:13:58,760 --> 01:14:01,790 I somehow mathematically got all the way to the end of the list 1628 01:14:01,790 --> 01:14:03,650 because there is that null pointer. 1629 01:14:03,650 --> 01:14:07,700 So it must be the case logically here that this new node belongs at the end. 1630 01:14:07,700 --> 01:14:11,270 But this is the juicier, slightly more challenging one.
1631 01:14:11,270 --> 01:14:13,910 But it's what ensures that we can maintain sorted order even 1632 01:14:13,910 --> 01:14:16,140 if the new node belongs somewhere in the middle. 1633 01:14:16,140 --> 01:14:20,030 So down here on line 62, I'm going to ask this question. 1634 01:14:20,030 --> 01:14:27,200 If the new node's number is less than the number in the next node-- that 1635 01:14:27,200 --> 01:14:29,360 is to say, if my foam finger's pointing here, 1636 01:14:29,360 --> 01:14:34,520 but the number I'm trying to insert is smaller than the next node over there 1637 01:14:34,520 --> 01:14:39,140 and implicitly the same as or greater than the current node's number, 1638 01:14:39,140 --> 01:14:41,160 well, then I'm going to go ahead and do this. 1639 01:14:41,160 --> 01:14:44,210 I'm going to update the new node's next pointer 1640 01:14:44,210 --> 01:14:48,810 to be equal to the next pointer of the node I'm currently pointing at 1641 01:14:48,810 --> 01:14:54,800 so that I can then update that pointer's next field to equal the new node. 1642 01:14:54,800 --> 01:14:59,240 And then I can break out altogether, doing a similar splice 1643 01:14:59,240 --> 01:15:03,020 in the middle of this list but manipulating a node effectively 1644 01:15:03,020 --> 01:15:06,450 to the left and the right to make room for this new node. 1645 01:15:06,450 --> 01:15:08,480 So, collectively, what does this code do? 1646 01:15:08,480 --> 01:15:11,150 Well, if we start out with that initially empty list, 1647 01:15:11,150 --> 01:15:14,090 and maybe we insert the number 2, it just goes right there. 1648 01:15:14,090 --> 01:15:17,570 But suppose that we insert next the number 1, which, of course, is smaller, 1649 01:15:17,570 --> 01:15:19,880 this code now ensures that the 1 is going to get 1650 01:15:19,880 --> 01:15:21,650 inserted at the beginning of the list. 1651 01:15:21,650 --> 01:15:25,310 If we then insert the number 4, well, that's bigger than 1 and bigger than 2.
1652 01:15:25,310 --> 01:15:28,380 So it logically is going to end up at the end of the list. 1653 01:15:28,380 --> 01:15:31,880 And, lastly, in this example, if we insert 3, which, again, is initially 1654 01:15:31,880 --> 01:15:36,500 out of order, this code can ensure that we still insert it in sorted order 1655 01:15:36,500 --> 01:15:40,130 because it's going to end up in between nodes 2 and 4. 1656 01:15:40,130 --> 01:15:44,260 So here too in terms of running time, insertion is still big O of n. 1657 01:15:44,260 --> 01:15:46,548 It's not quite as bad in practice as always adding it 1658 01:15:46,548 --> 01:15:48,340 to the end of the list, the end of the list 1659 01:15:48,340 --> 01:15:51,550 as was the case when we blindly appended new nodes. 1660 01:15:51,550 --> 01:15:54,220 But it is going to be in big O of n because, in the worst case 1661 01:15:54,220 --> 01:15:58,280 here, if we've got n nodes in the list already, then in the worst case 1662 01:15:58,280 --> 01:16:03,130 it might indeed be such a big number that it belongs at the end of the list. 1663 01:16:03,130 --> 01:16:04,400 All right, that was a lot. 1664 01:16:04,400 --> 01:16:06,700 Let's go ahead and take a delicious cookie break here. 1665 01:16:06,700 --> 01:16:09,310 And we'll be back in 10. 1666 01:16:09,310 --> 01:16:11,200 All right, we are back. 1667 01:16:11,200 --> 01:16:16,330 And to recap, the problems we've solved and the problems we've created are-- 1668 01:16:16,330 --> 01:16:18,735 arrays were problematic because they were a fixed size. 1669 01:16:18,735 --> 01:16:20,110 And that can get us into trouble. 1670 01:16:20,110 --> 01:16:23,920 Or it causes us to waste more space preemptively 1671 01:16:23,920 --> 01:16:25,580 even though we might not ever use it. 
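The sorted insertion walked through before the break (empty list, new smallest number, end of the list, middle of the list) can be sketched in C along these lines. This is a minimal sketch under assumptions: the helper name `insert_sorted` and its signature are made up for illustration, and the line numbers mentioned in the lecture refer to the lecture's own file, not to this block.

```c
#include <stdbool.h>
#include <stdlib.h>

// A node in a singly linked list of integers.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: inserts `number` so the list stays sorted.
// Returns false if memory could not be allocated.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    // Case 1: there is no list yet
    if (*list == NULL)
    {
        *list = n;
    }
    // Case 2: the new number is smaller than the first node's number,
    // so splice the new node in at the beginning
    else if (n->number < (*list)->number)
    {
        n->next = *list;
        *list = n;
    }
    // Otherwise the new node belongs in the middle or at the end
    else
    {
        for (node *ptr = *list; ptr != NULL; ptr = ptr->next)
        {
            // Case 3: reached the last node, so the new node goes at the end
            if (ptr->next == NULL)
            {
                ptr->next = n;
                break;
            }
            // Case 4: the new number is smaller than the next node's number,
            // so splice the new node in between ptr and ptr->next
            if (n->number < ptr->next->number)
            {
                n->next = ptr->next;
                ptr->next = n;
                break;
            }
        }
    }
    return true;
}
```

Inserting 2, 1, 4, and then 3 with this helper would leave the list in the order 1, 2, 3, 4. In the worst case the loop still visits every node, which is why insertion remains big O of n.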
1672 01:16:25,580 --> 01:16:29,470 So we introduced linked lists, again, to solve that problem 1673 01:16:29,470 --> 01:16:31,840 by being more dynamic and allocating only as much memory 1674 01:16:31,840 --> 01:16:33,790 as we need on demand step by step. 1675 01:16:33,790 --> 01:16:36,790 But, of course, we're spending extra space for the pointers. 1676 01:16:36,790 --> 01:16:40,940 We might gain performance if we at least prepend all of our elements to it. 1677 01:16:40,940 --> 01:16:45,330 But we lose time again if we append or insert in sorted order. 1678 01:16:45,330 --> 01:16:47,832 So it's not clear, frankly, I think, to me, 1679 01:16:47,832 --> 01:16:50,540 even hearing these upsides and downsides, if there's a clear win. 1680 01:16:50,540 --> 01:16:53,810 But maybe there's a way to get the best of both worlds 1681 01:16:53,810 --> 01:16:58,070 by trying to capture the upsides of having information that 1682 01:16:58,070 --> 01:17:03,200 is kept in sorted order that allows us to maybe divide and conquer still 1683 01:17:03,200 --> 01:17:06,860 but still gives us the dynamism to grow or shrink the data structure. 1684 01:17:06,860 --> 01:17:08,910 And thus were born trees. 1685 01:17:08,910 --> 01:17:13,820 So what we're about to explore are variants of these ideas of arrays 1686 01:17:13,820 --> 01:17:17,750 and linked lists and see if we can maybe mash up some of those building blocks 1687 01:17:17,750 --> 01:17:22,130 and create more interesting, more compelling solutions 1688 01:17:22,130 --> 01:17:25,010 that are even not just one-dimensional left to right 1689 01:17:25,010 --> 01:17:29,760 but are maybe two dimensional and have different axes to them or dimensions. 1690 01:17:29,760 --> 01:17:32,030 So a tree in the real world, of course, tends 1691 01:17:32,030 --> 01:17:33,710 to grow up from the ground like this. 1692 01:17:33,710 --> 01:17:35,450 But it tends to branch out. 1693 01:17:35,450 --> 01:17:36,830 And branches branch.
1694 01:17:36,830 --> 01:17:41,240 And that might already in your mind's eye evoke notions of forks 1695 01:17:41,240 --> 01:17:43,460 in the road or conditionals as we've seen. 1696 01:17:43,460 --> 01:17:45,620 And let me propose that we first consider what 1697 01:17:45,620 --> 01:17:47,900 the world calls binary search trees. 1698 01:17:47,900 --> 01:17:51,050 And so bi is back in that we can do things in half and half and half 1699 01:17:51,050 --> 01:17:54,590 somehow if maybe we think about arrays a little bit more cleverly. 1700 01:17:54,590 --> 01:17:56,375 So here's an array of size 7. 1701 01:17:56,375 --> 01:17:59,000 And I chose that deliberately because there's a perfect middle. 1702 01:17:59,000 --> 01:18:02,370 There's a middle of middle and so forth, just like the lockers a few weeks back. 1703 01:18:02,370 --> 01:18:04,370 So in the world of arrays-- 1704 01:18:04,370 --> 01:18:06,380 this was actually pretty efficient because we 1705 01:18:06,380 --> 01:18:09,920 can do binary search and middle of middle, middle of middle, and so forth. 1706 01:18:09,920 --> 01:18:12,560 And that gave us logarithmic running time. 1707 01:18:12,560 --> 01:18:14,390 But it's only size 7. 1708 01:18:14,390 --> 01:18:17,270 And we concluded that it's going to be like a big O of n headache 1709 01:18:17,270 --> 01:18:20,418 to copy this into a slightly bigger array, free the old memory, 1710 01:18:20,418 --> 01:18:20,960 and so forth. 1711 01:18:20,960 --> 01:18:22,310 And thus were born linked lists. 1712 01:18:22,310 --> 01:18:26,460 But with linked lists, we lost log of n running time. 1713 01:18:26,460 --> 01:18:26,960 Why? 1714 01:18:26,960 --> 01:18:29,360 Because we have to always start at the beginning to 1715 01:18:29,360 --> 01:18:33,013 get, for instance, to the middle or to the end of the list in the worst case. 1716 01:18:33,013 --> 01:18:36,180 But what if we start to think a little more cleverly in multiple dimensions?
1717 01:18:36,180 --> 01:18:39,500 So just for the sake of discussion, let me highlight the middle of this here 1718 01:18:39,500 --> 01:18:40,010 array. 1719 01:18:40,010 --> 01:18:41,480 Let me highlight the middle of the middle 1720 01:18:41,480 --> 01:18:42,920 and then the middle of the middle. 1721 01:18:42,920 --> 01:18:45,920 So there's implicit structure here. 1722 01:18:45,920 --> 01:18:47,220 There's a pattern of sorts. 1723 01:18:47,220 --> 01:18:49,400 And, in fact, just to make this more obvious, 1724 01:18:49,400 --> 01:18:53,270 let me not treat this as one dimension left to right but how about two 1725 01:18:53,270 --> 01:18:54,950 and give myself a bit of vertical space. 1726 01:18:54,950 --> 01:18:56,270 So it's the exact same array. 1727 01:18:56,270 --> 01:18:58,335 But allow me to just think about it now as 1728 01:18:58,335 --> 01:18:59,960 though the middle element's way up here. 1729 01:18:59,960 --> 01:19:01,880 The middle of the middles are slightly lower. 1730 01:19:01,880 --> 01:19:03,710 And the middle of the middles or the leaves 1731 01:19:03,710 --> 01:19:05,600 really are at the bottom of this tree. 1732 01:19:05,600 --> 01:19:06,780 And that word is deliberate. 1733 01:19:06,780 --> 01:19:09,072 We actually borrowed vernacular from the world of trees 1734 01:19:09,072 --> 01:19:13,550 where the leaf nodes or leaves are the ones at the very bottom. 1735 01:19:13,550 --> 01:19:16,567 And the root node is the one at the very top. 1736 01:19:16,567 --> 01:19:18,650 So for the sake of discussion, computer scientists 1737 01:19:18,650 --> 01:19:20,930 draw trees like this, instead of this. 1738 01:19:20,930 --> 01:19:22,820 But it's the exact same idea. 1739 01:19:22,820 --> 01:19:26,990 They just tend to grow down in discussions, more like a family tree 1740 01:19:26,990 --> 01:19:29,580 if you drew those growing up, for instance. 1741 01:19:29,580 --> 01:19:32,360 So what's interesting here?
1742 01:19:32,360 --> 01:19:35,480 Well, at the moment, we've broken the array model 1743 01:19:35,480 --> 01:19:38,690 because this memory is absolutely not contiguous because this number is here. 1744 01:19:38,690 --> 01:19:40,160 This number is here, here, here, and here. 1745 01:19:40,160 --> 01:19:41,390 It's all over the place. 1746 01:19:41,390 --> 01:19:44,840 But we do have pointers now in our toolkit, whereby 1747 01:19:44,840 --> 01:19:47,540 even if these numbers are anywhere in the computer's memory, 1748 01:19:47,540 --> 01:19:51,350 we can stitch them together like we did with string and those balloons. 1749 01:19:51,350 --> 01:19:53,990 Now, it's not sufficient just to have one piece of string 1750 01:19:53,990 --> 01:19:56,000 for each node or one pointer. 1751 01:19:56,000 --> 01:19:59,480 But what if we actually give each of these nodes, not just a number, 1752 01:19:59,480 --> 01:20:01,460 like the number 4, the number 2, the number 6-- 1753 01:20:01,460 --> 01:20:05,690 let's give them each a number and two pointers, a so-called left child 1754 01:20:05,690 --> 01:20:08,160 and a right child so to speak. 1755 01:20:08,160 --> 01:20:09,650 So we could do this. 1756 01:20:09,650 --> 01:20:12,050 And I'm going to abstract away now. 1757 01:20:12,050 --> 01:20:13,550 They're not even rectangles anymore. 1758 01:20:13,550 --> 01:20:15,050 They're really long rectangles. 1759 01:20:15,050 --> 01:20:18,740 Or they're upside down Ts that have three boxes to them. 1760 01:20:18,740 --> 01:20:22,220 But I'm just going to abstract away nodes now as just simple squares. 1761 01:20:22,220 --> 01:20:25,880 And it's an implementation detail as to what the structs actually are. 1762 01:20:25,880 --> 01:20:30,515 But the arrows suggest that each of these nodes now has two pointers. 1763 01:20:30,515 --> 01:20:31,640 You don't have to use them. 1764 01:20:31,640 --> 01:20:33,307 The leaf nodes have nothing to point to.
1765 01:20:33,307 --> 01:20:35,330 So those can all be null probably. 1766 01:20:35,330 --> 01:20:38,870 But each of these nodes now has two pointers. 1767 01:20:38,870 --> 01:20:40,720 Now, what's the implication of this? 1768 01:20:40,720 --> 01:20:43,990 This is what we call a binary search tree 1769 01:20:43,990 --> 01:20:46,990 because, one and first and foremost, it's obviously a tree. 1770 01:20:46,990 --> 01:20:51,310 But it also is a data structure that's kept in sorted order, 1771 01:20:51,310 --> 01:20:52,780 whereby notice what is true. 1772 01:20:52,780 --> 01:20:55,760 If you pick any node in this tree, like the number 4, 1773 01:20:55,760 --> 01:21:00,530 everything to the left of it, its left subtree so to speak, is smaller. 1774 01:21:00,530 --> 01:21:04,097 Everything to the right of it, its right subtree, is larger. 1775 01:21:04,097 --> 01:21:05,180 And that's true elsewhere. 1776 01:21:05,180 --> 01:21:05,847 Look at the six. 1777 01:21:05,847 --> 01:21:07,400 Everything to the left is smaller. 1778 01:21:07,400 --> 01:21:11,000 Everything to the right is bigger and same thing over here. 1779 01:21:11,000 --> 01:21:13,930 So in some sense, this is a recursive data structure 1780 01:21:13,930 --> 01:21:17,080 because you can say the same thing about each of these nodes 1781 01:21:17,080 --> 01:21:20,950 because each of these subtrees compose a larger tree. 1782 01:21:20,950 --> 01:21:23,920 Or, conversely, this big tree is a composition 1783 01:21:23,920 --> 01:21:26,645 of 1, 2 subtrees plus one more node. 1784 01:21:26,645 --> 01:21:29,020 So think back to our [INAUDIBLE] example in those bricks. 1785 01:21:29,020 --> 01:21:30,880 Well, what's a pyramid of height 4? 1786 01:21:30,880 --> 01:21:33,280 Well, just a pyramid of height 3 plus one more row. 1787 01:21:33,280 --> 01:21:35,020 What's a tree of height 3? 
1788 01:21:35,020 --> 01:21:39,440 Well, it's two subtrees of height 2 plus one more row or really 1789 01:21:39,440 --> 01:21:41,790 one new root node to connect them. 1790 01:21:41,790 --> 01:21:44,972 So this already is a recursive data structure by that logic. 1791 01:21:44,972 --> 01:21:46,430 How do we translate this into code? 1792 01:21:46,430 --> 01:21:49,700 Well, we won't sludge through so much low level C code this time around. 1793 01:21:49,700 --> 01:21:52,790 But let me propose that we could implement a node now 1794 01:21:52,790 --> 01:21:56,450 as being similar in spirit to what we did last time where every node used 1795 01:21:56,450 --> 01:21:58,652 to have a number and a next pointer. 1796 01:21:58,652 --> 01:22:00,860 But, now, let's actually make some room for ourselves 1797 01:22:00,860 --> 01:22:05,840 and redefine a node as still having a number but now having two pointers. 1798 01:22:05,840 --> 01:22:08,008 And I'll call them obviously left and right 1799 01:22:08,008 --> 01:22:09,800 though we could call them anything we want. 1800 01:22:09,800 --> 01:22:11,217 I could call it next and previous. 1801 01:22:11,217 --> 01:22:14,720 But really left and right would seem to make more sense with children 1802 01:22:14,720 --> 01:22:16,590 of a given node like this. 1803 01:22:16,590 --> 01:22:19,820 So this in C is how we might implement, therefore, 1804 01:22:19,820 --> 01:22:22,460 a node in a binary search tree. 1805 01:22:22,460 --> 01:22:25,235 And so let's consider pictorially what the running time is 1806 01:22:25,235 --> 01:22:26,360 of searching for something. 1807 01:22:26,360 --> 01:22:29,780 If this here is the tree and it follows that binary search tree 1808 01:22:29,780 --> 01:22:33,020 definition where everything to the left is smaller everything to the right 1809 01:22:33,020 --> 01:22:35,930 is bigger, well, how many steps might it take if you 1810 01:22:35,930 --> 01:22:38,750 have n nodes in a tree like this? 
1811 01:22:38,750 --> 01:22:41,750 Well, it's not going to take me n steps because I certainly 1812 01:22:41,750 --> 01:22:43,370 don't have to look through every node. 1813 01:22:43,370 --> 01:22:45,650 And, in fact, just like a linked list starts 1814 01:22:45,650 --> 01:22:47,250 on the left hand side, so to speak. 1815 01:22:47,250 --> 01:22:48,792 So that's just an artist's rendition. 1816 01:22:48,792 --> 01:22:51,230 Just as a linked list starts on one end and you 1817 01:22:51,230 --> 01:22:54,380 have to traverse the whole thing, a tree, because it's two dimensional, 1818 01:22:54,380 --> 01:22:56,940 always starts in memory at the root node. 1819 01:22:56,940 --> 01:22:59,750 So this is always where you start any operation, insertion, 1820 01:22:59,750 --> 01:23:01,160 deletion, searching. 1821 01:23:01,160 --> 01:23:04,670 So by that logic, in the worst case if there's n nodes here, 1822 01:23:04,670 --> 01:23:07,280 how many steps would it seem to take? 1823 01:23:07,280 --> 01:23:09,620 It's not big O of n. 1824 01:23:09,620 --> 01:23:12,470 So it's actually back to big O of log n. 1825 01:23:12,470 --> 01:23:13,070 Why? 1826 01:23:13,070 --> 01:23:14,987 Because, actually, if you think of the height, 1827 01:23:14,987 --> 01:23:16,490 there's roughly eight nodes in here. 1828 01:23:16,490 --> 01:23:18,260 And log base 2 of 8 is actually 3. 1829 01:23:18,260 --> 01:23:20,790 And so 1, 2, 3 is the height of this tree. 1830 01:23:20,790 --> 01:23:23,300 So in the worst case at the moment, it seems 1831 01:23:23,300 --> 01:23:26,630 that it's only going to take me like 1 node, 2 nodes, 3 nodes, or really 1832 01:23:26,630 --> 01:23:29,390 just two steps to get to the very bottom of this tree 1833 01:23:29,390 --> 01:23:32,430 to decide is a number there or not. 1834 01:23:32,430 --> 01:23:35,130 I certainly can ignore this entire subtree. 1835 01:23:35,130 --> 01:23:35,630 Why? 1836 01:23:35,630 --> 01:23:37,340 Because I'm searching for the number 7.
1837 01:23:37,340 --> 01:23:41,480 Just like the phone book from week 0, I can divide and conquer this problem. 1838 01:23:41,480 --> 01:23:43,730 If I'm looking for 7, I don't need to bother 1839 01:23:43,730 --> 01:23:48,680 wasting any time looking at this entire subtree, which is almost 50% 1840 01:23:48,680 --> 01:23:50,640 of the picture on the screen. 1841 01:23:50,640 --> 01:23:53,210 And so I can focus on this half then this half. 1842 01:23:53,210 --> 01:23:54,920 And, boom, I'm done. 1843 01:23:54,920 --> 01:23:57,230 So we sort of have binary search back. 1844 01:23:57,230 --> 01:24:01,400 We have the metaphor of the lockers back by operating now in two dimensions 1845 01:24:01,400 --> 01:24:04,790 to mitigate the reality that our memory is no longer contiguous. 1846 01:24:04,790 --> 01:24:05,720 But that's fine. 1847 01:24:05,720 --> 01:24:07,160 We can follow these arrows. 1848 01:24:07,160 --> 01:24:12,120 We can use these pointers instead to get anywhere that we actually want. 1849 01:24:12,120 --> 01:24:16,230 So any questions now on trees or specifically binary search trees, 1850 01:24:16,230 --> 01:24:19,970 which I dare say are the best of both worlds, all of the upsides of an array 1851 01:24:19,970 --> 01:24:21,650 and its log n running time 1852 01:24:21,650 --> 01:24:24,680 and all of the upsides of the dynamism of linked lists because this thing 1853 01:24:24,680 --> 01:24:29,450 can grow and shrink and doesn't need to be contiguous. 1854 01:24:29,450 --> 01:24:32,550 Any questions on this? 1855 01:24:32,550 --> 01:24:36,890 All right, well, the code too lends itself to relative simplicity. 1856 01:24:36,890 --> 01:24:40,770 And here's where recursion applies not just to the structure of the data 1857 01:24:40,770 --> 01:24:42,082 but also the code itself. 1858 01:24:42,082 --> 01:24:44,540 So just for the sake of discussion, we won't run this code. 1859 01:24:44,540 --> 01:24:46,082 We'll just look at it on screen here.
1860 01:24:46,082 --> 01:24:49,340 Suppose you're implementing a function called search whose purpose in life 1861 01:24:49,340 --> 01:24:51,740 is to search a tree and return, true or false, 1862 01:24:51,740 --> 01:24:53,360 I found the number you're looking for. 1863 01:24:53,360 --> 01:24:55,027 Well, here's the number I'm looking for. 1864 01:24:55,027 --> 01:24:56,180 It's one of the arguments. 1865 01:24:56,180 --> 01:25:00,170 And the first argument more importantly is actually a pointer to the tree 1866 01:25:00,170 --> 01:25:02,450 itself a pointer to the root of the tree. 1867 01:25:02,450 --> 01:25:04,880 And that's all the information we need to search a tree 1868 01:25:04,880 --> 01:25:06,600 and go left, go right, go left, go right. 1869 01:25:06,600 --> 01:25:07,100 How? 1870 01:25:07,100 --> 01:25:08,120 Well, let me do this. 1871 01:25:08,120 --> 01:25:11,360 As always, we'll have a base case when it comes to recursion. 1872 01:25:11,360 --> 01:25:13,280 Because if there's no tree there, then it 1873 01:25:13,280 --> 01:25:14,870 makes no sense to even ask me this question. 1874 01:25:14,870 --> 01:25:16,162 I'm just going to return false. 1875 01:25:16,162 --> 01:25:19,140 If you hand me null, there's nothing to do, return false. 1876 01:25:19,140 --> 01:25:21,650 But suppose that you don't hand me null. 1877 01:25:21,650 --> 01:25:23,750 And suppose that the number I'm looking for 1878 01:25:23,750 --> 01:25:28,385 is less than the number in the tree at the moment, the number at that root. 1879 01:25:28,385 --> 01:25:29,510 Well, what do I want to do? 1880 01:25:29,510 --> 01:25:31,010 I effectively want to go left. 1881 01:25:31,010 --> 01:25:32,540 I want to search the left subtree. 1882 01:25:32,540 --> 01:25:33,690 How do I do that? 
1883 01:25:33,690 --> 01:25:38,720 I'm going to return the recursive return value from the same search 1884 01:25:38,720 --> 01:25:42,800 function passing in a slightly smaller tree, a so-called subtree 1885 01:25:42,800 --> 01:25:44,160 but the same number. 1886 01:25:44,160 --> 01:25:46,190 And this is where recursion is beautiful. 1887 01:25:46,190 --> 01:25:48,290 Look at the relative simplicity of this. 1888 01:25:48,290 --> 01:25:51,267 If search exists, which it doesn't exist in its entirety yet. 1889 01:25:51,267 --> 01:25:52,100 But we'll get there. 1890 01:25:52,100 --> 01:25:55,430 If you want to search half of the tree, just go there. 1891 01:25:55,430 --> 01:25:56,810 So go to the root of the tree. 1892 01:25:56,810 --> 01:25:59,840 Follow the left child pointer and pass that in because it's a tree. 1893 01:25:59,840 --> 01:26:02,540 It's just a smaller tree but pass in the same number. 1894 01:26:02,540 --> 01:26:04,138 What if, though, it's a bigger number? 1895 01:26:04,138 --> 01:26:05,930 So what if the number you're looking for is 1896 01:26:05,930 --> 01:26:09,140 bigger than the number at the root of the tree? 1897 01:26:09,140 --> 01:26:12,630 Well, then just search the right subtree instead. 1898 01:26:12,630 --> 01:26:15,980 And now, logically, what's the fourth and final case? 1899 01:26:15,980 --> 01:26:19,140 1900 01:26:19,140 --> 01:26:22,490 So I can express that as if the number you're looking for equals 1901 01:26:22,490 --> 01:26:26,098 equals the number in the tree, that is, the root of the tree, 1902 01:26:26,098 --> 01:26:27,890 then I'm going to go ahead and return true. 1903 01:26:27,890 --> 01:26:29,660 And you might remember from our days with Scratch 1904 01:26:29,660 --> 01:26:31,010 even this conditional is not necessary. 1905 01:26:31,010 --> 01:26:32,310 I just did it to be explicit. 1906 01:26:32,310 --> 01:26:35,460 We can tighten it up as just an else instead. 1907 01:26:35,460 --> 01:26:36,600 And that's it. 
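The node definition and the recursive search just described can be sketched in C as follows. This is a reconstruction from the spoken description, not a copy of the slide, so treat the exact layout as an assumption; the names `node`, `number`, `left`, `right`, and `search` follow what is said in the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

// A node in a binary search tree: a number plus two child pointers.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Returns true if `number` is somewhere in the tree rooted at `tree`.
bool search(node *tree, int number)
{
    // Base case: no tree here, so the number cannot be found
    if (tree == NULL)
    {
        return false;
    }
    // Smaller than the root's number: search only the left subtree
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Bigger than the root's number: search only the right subtree
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // Otherwise the number must equal the root's number
    else
    {
        return true;
    }
}
```

Each recursive call is handed a subtree, so every comparison discards whichever side cannot contain the number. That is what yields roughly log n steps when the tree is balanced.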
1908 01:26:36,600 --> 01:26:38,810 And this is where, again, recursion finally 1909 01:26:38,810 --> 01:26:42,420 is maybe a little more accessible, a little more obvious in its cleanliness. 1910 01:26:42,420 --> 01:26:44,350 There's relatively little logic here. 1911 01:26:44,350 --> 01:26:48,420 But what's important is that these recursive calls here and here are 1912 01:26:48,420 --> 01:26:50,890 dividing and conquering the problem implicitly. 1913 01:26:50,890 --> 01:26:51,390 Why? 1914 01:26:51,390 --> 01:26:54,090 Because it's solving the same problem search for a number. 1915 01:26:54,090 --> 01:26:57,960 But it's doing it on just half of the tree or the other half of the tree. 1916 01:26:57,960 --> 01:26:59,920 And because we have this base case here, even 1917 01:26:59,920 --> 01:27:01,920 if you get all the way to the bottom of the tree 1918 01:27:01,920 --> 01:27:05,045 and you try to go down the left child or you try to go down the right child 1919 01:27:05,045 --> 01:27:07,290 but those pointers are null, then you know 1920 01:27:07,290 --> 01:27:10,080 you didn't find it because you would have returned true sooner 1921 01:27:10,080 --> 01:27:12,480 if anything had been in fact equal. 1922 01:27:12,480 --> 01:27:16,620 So that then is recursive code for searching a binary search tree, which 1923 01:27:16,620 --> 01:27:19,140 is, again, just to connect the dots of what we introduced 1924 01:27:19,140 --> 01:27:22,650 last time of actually doing things now recursively 1925 01:27:22,650 --> 01:27:25,230 and revisiting some of our own week 0 problems. 1926 01:27:25,230 --> 01:27:28,380 But I'm kind of lying to you here. 1927 01:27:28,380 --> 01:27:29,880 Yes, this is a binary search tree. 1928 01:27:29,880 --> 01:27:31,712 But it's not always as pretty as this. 1929 01:27:31,712 --> 01:27:33,420 It's certainly not always seven elements. 1930 01:27:33,420 --> 01:27:37,380 But it doesn't actually have to be as well-balanced as this one here is. 
1931 01:27:37,380 --> 01:27:40,860 In fact, suppose that we insert the following numbers 1932 01:27:40,860 --> 01:27:42,802 into an empty tree starting with 2. 1933 01:27:42,802 --> 01:27:44,010 I can plop the 2 right there. 1934 01:27:44,010 --> 01:27:45,990 That's the current root of this tree. 1935 01:27:45,990 --> 01:27:49,050 Suppose, though, that I insert next the number-- 1936 01:27:49,050 --> 01:27:49,890 how about 1? 1937 01:27:49,890 --> 01:27:52,510 Well, it stands to reason that it should go now to the left. 1938 01:27:52,510 --> 01:27:55,230 And so now this is the tree of size 2. 1939 01:27:55,230 --> 01:27:57,810 Now, I insert the number, say, 3. 1940 01:27:57,810 --> 01:27:59,413 It, of course, can go there. 1941 01:27:59,413 --> 01:28:00,580 So that makes perfect sense. 1942 01:28:00,580 --> 01:28:01,770 And I just got lucky. 1943 01:28:01,770 --> 01:28:04,830 Because I inserted these numbers as 2 then 1 then 3, 1944 01:28:04,830 --> 01:28:10,650 I very cleanly got a balanced tree that's weighted properly left and right. 1945 01:28:10,650 --> 01:28:14,100 But what if you have a more perverse set of inputs so to speak. 1946 01:28:14,100 --> 01:28:15,030 You're not lucky. 1947 01:28:15,030 --> 01:28:17,670 And the worst possible situation happens in terms 1948 01:28:17,670 --> 01:28:19,950 of the order in which the human is inputting data 1949 01:28:19,950 --> 01:28:21,090 into this data structure. 1950 01:28:21,090 --> 01:28:22,680 What if the human inserts 1 first? 1951 01:28:22,680 --> 01:28:24,690 OK, well, it goes as the root of the tree. 1952 01:28:24,690 --> 01:28:27,150 But here's where things start to devolve. 1953 01:28:27,150 --> 01:28:29,160 What if the human then inserts 2? 1954 01:28:29,160 --> 01:28:30,240 OK, it goes there. 1955 01:28:30,240 --> 01:28:31,800 What if the human then inserts 3? 1956 01:28:31,800 --> 01:28:34,410 Well, according to our definition, it goes there.
1957 01:28:34,410 --> 01:28:37,620 It looks like part of a tree because of how I've drawn it. 1958 01:28:37,620 --> 01:28:42,900 But what is it really if you tilt your head, right? 1959 01:28:42,900 --> 01:28:45,000 It looks really just like a linked list. 1960 01:28:45,000 --> 01:28:47,160 And there really is no second dimension. 1961 01:28:47,160 --> 01:28:48,240 I've drawn it this way. 1962 01:28:48,240 --> 01:28:51,280 But this for all intents and purposes is a linked list of size 3. 1963 01:28:51,280 --> 01:28:51,780 Why? 1964 01:28:51,780 --> 01:28:53,040 Because there's no halving. 1965 01:28:53,040 --> 01:28:56,130 There's no actual choosing left or right. 1966 01:28:56,130 --> 01:28:57,390 Now, this is fixable. 1967 01:28:57,390 --> 01:28:58,420 How could you fix this? 1968 01:28:58,420 --> 01:29:00,000 It's still the same numbers 1, 2, 3. 1969 01:29:00,000 --> 01:29:03,510 And it does adhere to the binary search tree definition. 1970 01:29:03,510 --> 01:29:05,340 Every number to the right is greater. 1971 01:29:05,340 --> 01:29:06,960 Every number to the right is greater. 1972 01:29:06,960 --> 01:29:09,210 Every number to the left is-- well, it's inapplicable. 1973 01:29:09,210 --> 01:29:11,280 But it certainly doesn't violate that definition. 1974 01:29:11,280 --> 01:29:14,880 Could you fix this tree somehow and make it 1975 01:29:14,880 --> 01:29:17,940 balanced so it's not devolving into big O of n 1976 01:29:17,940 --> 01:29:20,820 but is still technically log of n? 1977 01:29:20,820 --> 01:29:22,305 What should be the root? 1978 01:29:22,305 --> 01:29:25,510 AUDIENCE: You just reverse the pointer from 1 to 2. 1979 01:29:25,510 --> 01:29:28,890 DAVID J. MALAN: So I could reverse the pointer from 1 to 2. 
1980 01:29:28,890 --> 01:29:31,140 And so sort of pictorially if I take this 1981 01:29:31,140 --> 01:29:35,400 and I just swing everything over and make 2 the new root, 1982 01:29:35,400 --> 01:29:37,800 then, indeed, this could be the new root up here. 1983 01:29:37,800 --> 01:29:42,270 1 could be hanging off of it over here and 3 can be hanging off of the 2 1984 01:29:42,270 --> 01:29:42,900 as is. 1985 01:29:42,900 --> 01:29:47,340 So long story short, when it comes to binary search trees, by themselves 1986 01:29:47,340 --> 01:29:49,870 they don't necessarily guarantee any balance. 1987 01:29:49,870 --> 01:29:53,730 So even though theoretically, yes, it's big O of log n, which is fantastic, 1988 01:29:53,730 --> 01:29:55,740 not if you get a perverse set of inputs that 1989 01:29:55,740 --> 01:29:58,770 just happen to be, for instance, the worst possible scenario-- now, 1990 01:29:58,770 --> 01:29:59,460 it is fixable. 1991 01:29:59,460 --> 01:30:01,877 And, in fact, in higher level courses in computer science, 1992 01:30:01,877 --> 01:30:03,900 specifically on algorithms and data structures, 1993 01:30:03,900 --> 01:30:07,020 you'll be introduced, if you go down that road, to how 1994 01:30:07,020 --> 01:30:12,180 you can tweak the code for insertion and deletion in a binary search tree 1995 01:30:12,180 --> 01:30:14,500 to make these fixes along the way. 1996 01:30:14,500 --> 01:30:17,040 And it's going to cost you a few more steps to fix things 1997 01:30:17,040 --> 01:30:18,180 when they get out of whack. 1998 01:30:18,180 --> 01:30:21,090 But if you do it every insertion or every deletion, 1999 01:30:21,090 --> 01:30:23,805 at least you can maintain a balanced tree. 2000 01:30:23,805 --> 01:30:26,180 And you'll learn about different types of balanced trees.
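The "swing everything over" fix can be sketched as a single pointer rearrangement. The lecture does not name the operation; it is commonly called a left rotation, and self-balancing trees apply steps like it during insertion and deletion. The names `node` and `rotate_left` below are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

// One node of a binary search tree of ints (hypothetical layout).
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Left rotation: the right child becomes the new root of this subtree,
// and the old root becomes its left child. Returns the new root.
node *rotate_left(node *root)
{
    // Precondition: there must be a right child to promote.
    assert(root != NULL && root->right != NULL);

    node *new_root = root->right;
    // Whatever hung to the left of the new root is greater than the old
    // root, so it becomes the old root's right subtree.
    root->right = new_root->left;
    new_root->left = root;
    return new_root;
}
```

Applied to the chain 1, 2, 3, the call returns the node holding 2, with 1 hanging to its left and 3 still hanging to its right, so the binary search tree ordering is preserved.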
2001 01:30:26,180 --> 01:30:28,480 But for our purposes now, we don't necessarily 2002 01:30:28,480 --> 01:30:31,720 get that property even if we do want log n unless you 2003 01:30:31,720 --> 01:30:34,130 keep it balanced along the way. 2004 01:30:34,130 --> 01:30:38,255 Now, what about other combinations of arrays and linked lists? 2005 01:30:38,255 --> 01:30:41,380 We can really start to mash these things up and see what comes out of them. 2006 01:30:41,380 --> 01:30:45,130 Dictionaries are another abstract data type 2007 01:30:45,130 --> 01:30:49,750 similar in spirit to stacks and queues in that you can implement them 2008 01:30:49,750 --> 01:30:50,590 in different ways. 2009 01:30:50,590 --> 01:30:54,520 A dictionary is a data structure that stores keys and values. 2010 01:30:54,520 --> 01:30:56,870 And those are technical terms, keys and values. 2011 01:30:56,870 --> 01:31:00,730 The analog in the human world would be literally a dictionary 2012 01:31:00,730 --> 01:31:03,700 that you'd have in a classroom, a dictionary with words 2013 01:31:03,700 --> 01:31:07,850 and definitions, more generally known as keys and values. 2014 01:31:07,850 --> 01:31:09,550 So that's all a dictionary is. 2015 01:31:09,550 --> 01:31:12,038 It associates keys with values. 2016 01:31:12,038 --> 01:31:14,080 So, for instance, you could think of it almost as 2017 01:31:14,080 --> 01:31:16,270 like two columns in a spreadsheet, where on the left 2018 01:31:16,270 --> 01:31:18,370 you put the key, on the right you put the value. 2019 01:31:18,370 --> 01:31:21,620 Or, specifically, you put the word in a dictionary and the definition 2020 01:31:21,620 --> 01:31:22,120 thereafter. 2021 01:31:22,120 --> 01:31:25,750 And that's roughly how the printed pages in a dictionary are laid out. 2022 01:31:25,750 --> 01:31:29,950 So dictionaries associate words with definitions or more generally 2023 01:31:29,950 --> 01:31:31,360 keys with values. 
2024 01:31:31,360 --> 01:31:33,393 But it's an abstract data type in that we could 2025 01:31:33,393 --> 01:31:34,810 implement this in a bunch of ways. 2026 01:31:34,810 --> 01:31:37,840 We could use maybe two arrays, one array for the keys, 2027 01:31:37,840 --> 01:31:39,310 one array for the definitions. 2028 01:31:39,310 --> 01:31:41,500 And you just hope that they line up. 2029 01:31:41,500 --> 01:31:44,620 Bracket i in this one maps to bracket i in this one. 2030 01:31:44,620 --> 01:31:48,070 But an array is not going to give us the dynamism that we want. 2031 01:31:48,070 --> 01:31:52,000 You might run out of space when Merriam-Webster or whoever 2032 01:31:52,000 --> 01:31:53,950 adds new words to the English language. 2033 01:31:53,950 --> 01:31:55,630 You might not want to be using an array. 2034 01:31:55,630 --> 01:31:57,130 You might want to use a linked list. 2035 01:31:57,130 --> 01:31:59,620 But, again, linked lists then devolve into big O of n. 2036 01:31:59,620 --> 01:32:02,710 And that's not good for dictionaries and spell checking. 2037 01:32:02,710 --> 01:32:05,710 If you have to check every possible word to find something, 2038 01:32:05,710 --> 01:32:08,420 getting something that's a little faster than that is compelling. 2039 01:32:08,420 --> 01:32:12,520 So let's consider how maybe Apple, maybe Google, maybe others are actually 2040 01:32:12,520 --> 01:32:14,380 implementing contacts. 2041 01:32:14,380 --> 01:32:18,370 Because even though I implied in week 0 and maybe outright said, 2042 01:32:18,370 --> 01:32:23,530 it's an array-- it's a big list of all of your names of contacts maybe of some 2043 01:32:23,530 --> 01:32:24,460 fixed size-- 2044 01:32:24,460 --> 01:32:29,170 they probably better be using some variant of a linked list, otherwise, 2045 01:32:29,170 --> 01:32:31,420 you could never add more friends potentially. 2046 01:32:31,420 --> 01:32:32,020 You'd max out. 
2047 01:32:32,020 --> 01:32:34,478 And they'd say you have to unfriend someone just to fit it. 2048 01:32:34,478 --> 01:32:37,120 As an aside, this is sort of true in the social media world. 2049 01:32:37,120 --> 01:32:40,162 Once you have 5,000 friends on Facebook, you can't have 5,001. 2050 01:32:40,162 --> 01:32:43,120 Once you have some number on LinkedIn, you can't have more connections. 2051 01:32:43,120 --> 01:32:45,162 That's not necessarily that they're using arrays. 2052 01:32:45,162 --> 01:32:47,320 But it is the same implication that they've 2053 01:32:47,320 --> 01:32:50,000 chosen some finite size for memory. 2054 01:32:50,000 --> 01:32:53,650 So how might we consider implementing a dictionary specifically 2055 01:32:53,650 --> 01:32:55,810 for your address book or your contacts so you 2056 01:32:55,810 --> 01:32:58,300 can store the names of everyone ideally alphabetically 2057 01:32:58,300 --> 01:33:01,300 but also their phone numbers and maybe anything else? 2058 01:33:01,300 --> 01:33:04,000 Well, ultimately, we want to be able to get at someone's name 2059 01:33:04,000 --> 01:33:05,650 and lead to their number. 2060 01:33:05,650 --> 01:33:07,660 So the keys and values for our discussion 2061 01:33:07,660 --> 01:33:11,140 here will be names are the keys and phone numbers are the values. 2062 01:33:11,140 --> 01:33:14,890 But the values themselves could also include email address and a mailing 2063 01:33:14,890 --> 01:33:15,910 address and all of that. 2064 01:33:15,910 --> 01:33:18,555 But we'll keep it simple, names and phone numbers. 2065 01:33:18,555 --> 01:33:20,680 So here's how you might think about this or draw it 2066 01:33:20,680 --> 01:33:24,010 on a chalkboard, two columns or in a spreadsheet, left and right. 2067 01:33:24,010 --> 01:33:26,470 But how could we actually implement this in memory? 2068 01:33:26,470 --> 01:33:29,878 Because, ideally, we don't want it to devolve into something linear.
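To make the "something linear" concern concrete, here is a minimal sketch of the two-column picture stored as two parallel arrays, where `names[i]` lines up with `numbers[i]`. The function name `lookup` is hypothetical, and any phone numbers passed to it are whatever placeholder strings the caller supplies; none come from the lecture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Look up a name in two parallel arrays: names[i] goes with numbers[i].
// Returns the matching number, or NULL if the name is not present.
// In the worst case every one of the count entries is compared: O(n).
const char *lookup(const char *names[], const char *numbers[],
                   size_t count, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            return numbers[i];
        }
    }
    return NULL;
}
```

Searching for a name that sorts last, or one that is absent, walks the whole array, which is the linear behavior the rest of this discussion tries to avoid.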
2069 01:33:29,878 --> 01:33:32,170 We don't want to have to look through all of my friends 2070 01:33:32,170 --> 01:33:34,628 and family and colleagues to find someone whose name starts 2071 01:33:34,628 --> 01:33:37,090 with Z, for instance, or anything else. 2072 01:33:37,090 --> 01:33:41,050 It would be nice to have something logarithmic with binary search. 2073 01:33:41,050 --> 01:33:47,440 But with binary search again, we have to maybe use a tree instead. 2074 01:33:47,440 --> 01:33:49,930 But now we have to use two pointers instead of one. 2075 01:33:49,930 --> 01:33:51,310 There's a lot of trade-offs here. 2076 01:33:51,310 --> 01:33:54,130 But let's see how else we could solve this same problem. 2077 01:33:54,130 --> 01:33:57,530 Because wouldn't it be nice-- and we've not really talked about this before-- 2078 01:33:57,530 --> 01:34:01,300 if we instead aspire to this Holy Grail of algorithms? 2079 01:34:01,300 --> 01:34:05,140 The best algorithm out there is surely one that's big O of 1, 2080 01:34:05,140 --> 01:34:07,900 like constant time, because what that means 2081 01:34:07,900 --> 01:34:10,870 is it doesn't matter if you have 1 friend, 10 friends, 100, 1,000, 2082 01:34:10,870 --> 01:34:12,170 a million, a billion friends-- 2083 01:34:12,170 --> 01:34:15,040 it doesn't matter how big n is, your searches will always 2084 01:34:15,040 --> 01:34:17,710 take you the same amount of time. 2085 01:34:17,710 --> 01:34:19,390 It is independent of n. 2086 01:34:19,390 --> 01:34:24,270 And that's why it's sort of the ultimate goal for performance. 2087 01:34:24,270 --> 01:34:27,230 So can we get to this aspiration? 2088 01:34:27,230 --> 01:34:28,880 Well, a couple of building blocks. 2089 01:34:28,880 --> 01:34:31,550 There's this notion in computing known as hashing.
2090 01:34:31,550 --> 01:34:33,920 And hashing is a technique, literally a function 2091 01:34:33,920 --> 01:34:39,290 in math or in code that actually takes any number of inputs and maps 2092 01:34:39,290 --> 01:34:41,430 them to a finite number of outputs. 2093 01:34:41,430 --> 01:34:44,480 So if you think back to high school math, domains, and ranges, 2094 01:34:44,480 --> 01:34:47,480 you can take an infinite domain with any values in the world. 2095 01:34:47,480 --> 01:34:51,840 But it reduces them, a hash function, to a finite range of specific values. 2096 01:34:51,840 --> 01:34:55,190 So, for instance, it's no accident that we have these four buckets on the stage 2097 01:34:55,190 --> 01:34:58,880 now, each of which has a suit from a deck of cards. 2098 01:34:58,880 --> 01:35:01,430 We got for visibility's sake the biggest cards we can. 2099 01:35:01,430 --> 01:35:03,380 These are the super, jumbo playing cards. 2100 01:35:03,380 --> 01:35:06,680 And in this box are a bunch of randomly ordered playing cards. 2101 01:35:06,680 --> 01:35:09,140 And, typically, if you were to ever play some game 2102 01:35:09,140 --> 01:35:11,360 or you wanted to sort these for some reason, how would 2103 01:35:11,360 --> 01:35:14,990 you go about sorting them by suit and also by number? 2104 01:35:14,990 --> 01:35:17,930 Odds are if you're like me, you'd probably take some shortcuts 2105 01:35:17,930 --> 01:35:21,290 and maybe pull out all of the hearts, pull out all of the spades, 2106 01:35:21,290 --> 01:35:25,200 pull out all of the clubs, or you bucketize it into categories. 2107 01:35:25,200 --> 01:35:26,910 And that term is actually technical. 2108 01:35:26,910 --> 01:35:29,310 Here are four buckets to make this clear. 2109 01:35:29,310 --> 01:35:32,367 And, for instance, if the first card I find is the five of hearts, 2110 01:35:32,367 --> 01:35:32,950 you know what?
2111 01:35:32,950 --> 01:35:36,033 Just to make my life easier, I'm going to put that into the hearts bucket. 2112 01:35:36,033 --> 01:35:37,260 Or here we have 4. 2113 01:35:37,260 --> 01:35:39,120 Here we have 5. 2114 01:35:39,120 --> 01:35:40,980 Here we have 6. 2115 01:35:40,980 --> 01:35:43,290 Here we have queen. 2116 01:35:43,290 --> 01:35:46,740 And notice that I'm putting these cards into the appropriate buckets. 2117 01:35:46,740 --> 01:35:47,310 Why? 2118 01:35:47,310 --> 01:35:50,760 Because, ultimately, then I'm going to have four problems but of smaller 2119 01:35:50,760 --> 01:35:53,700 size, a 13 size problem, 13, 13, 13. 2120 01:35:53,700 --> 01:35:56,460 And, frankly, it's just going to be easier cognitively, daresay 2121 01:35:56,460 --> 01:36:00,690 algorithmically, to then sort each of the 13 cards in these buckets 2122 01:36:00,690 --> 01:36:04,063 rather than deal with four suits somehow combined all together. 2123 01:36:04,063 --> 01:36:07,230 So if you've ever in life made piles-- if you've ever literally used buckets 2124 01:36:07,230 --> 01:36:09,390 like this, you are hashing. 2125 01:36:09,390 --> 01:36:12,240 I'm taking some number of inputs, 52 in this case. 2126 01:36:12,240 --> 01:36:16,650 And I'm mapping it to a finite number of outputs, 4 in this case. 2127 01:36:16,650 --> 01:36:20,850 So hashing, again, just takes in inputs and hashes them 2128 01:36:20,850 --> 01:36:22,900 to output values in this way. 2129 01:36:22,900 --> 01:36:25,020 So beyond that terminology, let's consider 2130 01:36:25,020 --> 01:36:27,600 what we can now do with hash functions that's 2131 01:36:27,600 --> 01:36:30,600 a little more germane to storing things like our friends and family 2132 01:36:30,600 --> 01:36:33,450 and colleagues in dictionaries. 2133 01:36:33,450 --> 01:36:35,760 A hash function is just one that does that. 2134 01:36:35,760 --> 01:36:39,210 I as the human was just implementing or behaving like a hash function. 
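The card demonstration can be sketched as a function that maps each of 52 inputs to one of 4 outputs. Everything named here (`suit`, `card`, `hash_card`, `bucketize`) is a hypothetical illustration, and the numeric order of the suits is an arbitrary assumption rather than the order of the buckets on stage.

```c
#include <assert.h>

#define BUCKETS 4

// The numeric order of the suits is an arbitrary assumption.
enum suit
{
    DIAMONDS,
    CLUBS,
    HEARTS,
    SPADES
};

typedef struct
{
    enum suit suit;
    int rank; // 1 (ace) through 13 (king)
} card;

// Hash a card to a bucket index 0..3 by looking only at its suit.
int hash_card(card c)
{
    assert(c.rank >= 1 && c.rank <= 13);
    return (int) c.suit;
}

// Count how many of the n cards land in each bucket.
void bucketize(const card *cards, int n, int counts[BUCKETS])
{
    for (int i = 0; i < BUCKETS; i++)
    {
        counts[i] = 0;
    }
    for (int i = 0; i < n; i++)
    {
        counts[hash_card(cards[i])]++;
    }
}
```

Run over a full deck, each bucket ends up with 13 cards, which is the "four problems of size 13" described above.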
2135 01:36:39,210 --> 01:36:41,160 But technically a hash function is actually 2136 01:36:41,160 --> 01:36:44,160 a math function or a function in C or Scratch 2137 01:36:44,160 --> 01:36:48,660 or soon Python or other languages that takes as input some value, 2138 01:36:48,660 --> 01:36:51,990 be it a physical card or a name or a number or something else, 2139 01:36:51,990 --> 01:36:53,740 and outputs some value. 2140 01:36:53,740 --> 01:36:58,020 And we can use hashing as an operation to implement 2141 01:36:58,020 --> 01:37:00,120 what we'll call hash tables. 2142 01:37:00,120 --> 01:37:01,917 And that's what that dictionary was. 2143 01:37:01,917 --> 01:37:04,500 If you think about how I drew it on the screen as two columns, 2144 01:37:04,500 --> 01:37:08,440 it's like a table of information, keys on the left, values on the right. 2145 01:37:08,440 --> 01:37:10,020 So what is a hash table? 2146 01:37:10,020 --> 01:37:12,300 The simplest way to think about it is that this 2147 01:37:12,300 --> 01:37:17,190 is an amalgam, a combination of arrays and linked lists, right? 2148 01:37:17,190 --> 01:37:20,400 We borrowed some ideas of linked lists a moment ago 2149 01:37:20,400 --> 01:37:22,540 to give us trees in two dimensions. 2150 01:37:22,540 --> 01:37:25,620 What if we stick with this idea of having two-dimensional worlds 2151 01:37:25,620 --> 01:37:28,140 but now use an array initially? 2152 01:37:28,140 --> 01:37:31,270 So we get the speed benefits of arrays because everything's contiguous. 2153 01:37:31,270 --> 01:37:33,720 We can do simple arithmetic and jump to the middle or the middle or the middle 2154 01:37:33,720 --> 01:37:35,735 or the first or the last very easily. 2155 01:37:35,735 --> 01:37:36,870 And then you know what? 2156 01:37:36,870 --> 01:37:39,480 Let's use the horizontal part of the screen 2157 01:37:39,480 --> 01:37:41,560 to give us linked lists as needed.
2158 01:37:41,560 --> 01:37:45,150 So, for instance, if the goal at hand is to implement the contacts in my cell 2159 01:37:45,150 --> 01:37:50,520 phone or my Mac or PC, let me propose that we start at least in English 2160 01:37:50,520 --> 01:37:52,590 with an array of size 26. 2161 01:37:52,590 --> 01:37:53,790 Of course, it's 0-indexed. 2162 01:37:53,790 --> 01:37:56,350 So it's really location 0 through 25. 2163 01:37:56,350 --> 01:37:59,520 And for the sake of discussion, let me propose that location 0 represents 2164 01:37:59,520 --> 01:38:01,860 A. Location 25 represents Z. 2165 01:38:01,860 --> 01:38:03,730 And then everything else in between. 2166 01:38:03,730 --> 01:38:04,230 Why? 2167 01:38:04,230 --> 01:38:07,290 We know from C that we can convert thanks to ASCII and Unicode 2168 01:38:07,290 --> 01:38:09,730 from letters to numbers and back and forth. 2169 01:38:09,730 --> 01:38:13,560 So in constant time, we can find location A. In constant time 2170 01:38:13,560 --> 01:38:15,630 we can find location Z. Why? 2171 01:38:15,630 --> 01:38:19,200 Because we're using an array just like in week 2. 2172 01:38:19,200 --> 01:38:21,360 All right, well, suppose that I want to think 2173 01:38:21,360 --> 01:38:24,457 about these more as letters of the alphabet, the English alphabet 2174 01:38:24,457 --> 01:38:25,290 rather than numbers. 2175 01:38:25,290 --> 01:38:27,630 So it's equivalent to label them A through Z. 2176 01:38:27,630 --> 01:38:31,260 And suppose now I want to start adding friends and family and contacts 2177 01:38:31,260 --> 01:38:32,550 to my address book. 2178 01:38:32,550 --> 01:38:33,700 How might this look? 2179 01:38:33,700 --> 01:38:35,730 Well, if the first one I want to add is Mario-- 2180 01:38:35,730 --> 01:38:39,570 Mario's name starts with an M. And so that's A, B, C, D, E, F-- 2181 01:38:39,570 --> 01:38:41,200 OK, M goes there. 2182 01:38:41,200 --> 01:38:46,050 So I'm going to put Mario at that location in the array.
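A minimal sketch of this array-only version follows, assuming one name per slot and names that begin with an ASCII letter. The names `SLOTS`, `slot_for`, and `add` are hypothetical. Refusing a second name for an occupied slot is a choice made for this sketch so that the limitation of a plain array is visible.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 26

// Map a name to 0..25 by its first letter.
// Assumes the name starts with an ASCII letter, A-Z or a-z.
int slot_for(const char *name)
{
    assert(name != NULL && isalpha((unsigned char) name[0]));
    return toupper((unsigned char) name[0]) - 'A';
}

// Store a name in its slot in constant time.
// Returns false, storing nothing, if that slot is already taken.
bool add(const char *table[SLOTS], const char *name)
{
    int i = slot_for(name);
    if (table[i] != NULL)
    {
        return false;
    }
    table[i] = name;
    return true;
}
```

Mario lands at index 12 and a name starting with L at index 11, each with one arithmetic step and one array access. A second name starting with L is simply rejected here.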
2183 01:38:46,050 --> 01:38:49,140 After that, I add a second person, for instance, how about Luigi? 2184 01:38:49,140 --> 01:38:51,210 Well, L comes just before M. So it stands 2185 01:38:51,210 --> 01:38:53,940 to reason that it goes there in the array. 2186 01:38:53,940 --> 01:38:56,820 Meanwhile, if I go and add another character like Peach, 2187 01:38:56,820 --> 01:39:00,360 she's going to go there a few spots away because her name starts 2188 01:39:00,360 --> 01:39:03,960 with P. Meanwhile, here's a whole bunch of other Nintendo characters 2189 01:39:03,960 --> 01:39:07,680 that happen to have unique letters of their first names. 2190 01:39:07,680 --> 01:39:10,950 And there's room for everyone, room for everyone on the board A 2191 01:39:10,950 --> 01:39:12,750 through Z with some blanks in the middle. 2192 01:39:12,750 --> 01:39:14,730 But you can perhaps see where this is going. 2193 01:39:14,730 --> 01:39:17,622 When and where might a problem arise with this array-based approach? 2194 01:39:17,622 --> 01:39:19,080 AUDIENCE: When you add [INAUDIBLE]. 2195 01:39:19,080 --> 01:39:21,240 DAVID J. MALAN: Yeah, so when we add someone 2196 01:39:21,240 --> 01:39:25,240 else whose name collides with one of these existing characters, 2197 01:39:25,240 --> 01:39:28,820 just because by accident, they have a name that starts with the same letter-- 2198 01:39:28,820 --> 01:39:33,220 So, for instance, there's Lakitu here who collides with Luigi potentially. 2199 01:39:33,220 --> 01:39:35,830 Here is Link who collides with both of them. 2200 01:39:35,830 --> 01:39:38,500 But I've drawn a solution to this along the way. 2201 01:39:38,500 --> 01:39:42,580 I could if I was draconian just remove Luigi from the data structure 2202 01:39:42,580 --> 01:39:45,770 and put Lakitu in or remove and then put Link in there instead. 2203 01:39:45,770 --> 01:39:49,480 But that's stupid if you can only have one friend whose name starts with L.
2204 01:39:49,480 --> 01:39:51,250 That's just bad design. 2205 01:39:51,250 --> 01:39:56,680 But what if we now in the off chance I have two friends whose names start 2206 01:39:56,680 --> 01:40:00,490 with the same letter, well, I'll just string them together, 2207 01:40:00,490 --> 01:40:03,880 link them together, no pun intended, using pointers of sorts. 2208 01:40:03,880 --> 01:40:06,065 So my vertical here is an array. 2209 01:40:06,065 --> 01:40:07,690 And this is just an artist's rendition. 2210 01:40:07,690 --> 01:40:10,815 There's no actual notion of up, down, left, right in the computer's memory. 2211 01:40:10,815 --> 01:40:13,510 But this is my array always of size 26. 2212 01:40:13,510 --> 01:40:17,740 And each of the elements in this array is now not a simple number. 2213 01:40:17,740 --> 01:40:20,380 But it's a pointer to a linked list. 2214 01:40:20,380 --> 01:40:24,020 And if there's nothing there, it's just null, null, null, null. 2215 01:40:24,020 --> 01:40:27,330 But, otherwise, it's a valid address that points to the first node. 2216 01:40:27,330 --> 01:40:28,080 And you know what? 2217 01:40:28,080 --> 01:40:30,080 If we have multiple names with the same letters, 2218 01:40:30,080 --> 01:40:35,550 we can just string these nodes together using pointers as well. 2219 01:40:35,550 --> 01:40:40,820 So a hash table then as implemented here is an array of linked lists. 2220 01:40:40,820 --> 01:40:43,730 And that allows us to, one, get some speed benefit 2221 01:40:43,730 --> 01:40:47,000 because look how fast we inserted or found Mario, Luigi, and Peach. 2222 01:40:47,000 --> 01:40:50,420 But it still covers the scenario where, OK, some people 2223 01:40:50,420 --> 01:40:51,830 can have the same first letters. 2224 01:40:51,830 --> 01:40:54,080 Some of these names will collide.
2225 01:40:54,080 --> 01:40:57,440 So collisions are an expected problem with a hash table, 2226 01:40:57,440 --> 01:41:03,345 whereby two values from some domain happen to map to the same value. 2227 01:41:03,345 --> 01:41:04,970 And, frankly, you'll see this here too. 2228 01:41:04,970 --> 01:41:07,250 So these buckets are technically a finite size. 2229 01:41:07,250 --> 01:41:09,860 They're definitely big enough for 13 cards each. 2230 01:41:09,860 --> 01:41:13,860 But you could imagine a world where if I'm using 2 decks, 3 decks, or 4 decks, 2231 01:41:13,860 --> 01:41:15,110 I'm going to run out of space. 2232 01:41:15,110 --> 01:41:17,585 And then my data structure can't fit any more information. 2233 01:41:17,585 --> 01:41:19,460 But we're not going to have this problem here 2234 01:41:19,460 --> 01:41:22,430 because the linked lists, as we've seen, can grow and even shrink 2235 01:41:22,430 --> 01:41:23,420 as much as they want. 2236 01:41:23,420 --> 01:41:25,962 In the world of Nintendo there's actually lots of collisions. 2237 01:41:25,962 --> 01:41:28,650 And these aren't even all of the characters. 2238 01:41:28,650 --> 01:41:30,660 So that's then a hash table. 2239 01:41:30,660 --> 01:41:34,370 So with a hash table in mind, how fast is it? 2240 01:41:34,370 --> 01:41:37,610 Did we achieve that Holy Grail of constant time? 2241 01:41:37,610 --> 01:41:41,570 Well, for some of these names if I back up, yeah, it's kind of constant time. 2242 01:41:41,570 --> 01:41:45,650 Yoshi and Zelda, boom, constant time, location 24, location 25. 2243 01:41:45,650 --> 01:41:48,860 Some of them, though, like Luigi, Link, it's 2244 01:41:48,860 --> 01:41:52,760 not quite constant time because I first have to get to Luigi's location. 2245 01:41:52,760 --> 01:41:54,990 And then I have to follow this linked list. 2246 01:41:54,990 --> 01:42:01,040 So, technically, then what's the running time of searching a hash table? 
2247 01:42:01,040 --> 01:42:02,390 Sometimes you'll get lucky. 2248 01:42:02,390 --> 01:42:04,520 But sometimes you won't. 2249 01:42:04,520 --> 01:42:06,140 Consider the worst case. 2250 01:42:06,140 --> 01:42:08,280 Big O is often used to describe worst case. 2251 01:42:08,280 --> 01:42:11,300 So what would be the worst case in your own contacts? 2252 01:42:11,300 --> 01:42:12,880 A little louder. 2253 01:42:12,880 --> 01:42:13,530 So why? 2254 01:42:13,530 --> 01:42:15,480 AUDIENCE: Because you might use [INAUDIBLE]. 2255 01:42:15,480 --> 01:42:16,480 DAVID J. MALAN: Correct. 2256 01:42:16,480 --> 01:42:18,820 And so to summarize in some weird scenario 2257 01:42:18,820 --> 01:42:20,950 all of your friends and family and contacts 2258 01:42:20,950 --> 01:42:23,200 could have names that start with the same letter. 2259 01:42:23,200 --> 01:42:25,780 And then it doesn't matter that this is a hash table 2260 01:42:25,780 --> 01:42:27,730 with an array of linked lists. 2261 01:42:27,730 --> 01:42:30,760 For all intents and purposes, if your friends' names only 2262 01:42:30,760 --> 01:42:33,640 start with the same letter, all you have is a linked list. 2263 01:42:33,640 --> 01:42:37,960 Much like with a tree, if you don't keep it balanced, all you have really 2264 01:42:37,960 --> 01:42:38,960 is a linked list. 2265 01:42:38,960 --> 01:42:44,470 So technically speaking, yes, hash tables are big O of n 2266 01:42:44,470 --> 01:42:47,230 even if you're good about-- 2267 01:42:47,230 --> 01:42:49,390 even if you have-- 2268 01:42:49,390 --> 01:42:51,590 in the worst case, hash tables are big O of n. 2269 01:42:51,590 --> 01:42:52,090 Why? 2270 01:42:52,090 --> 01:42:54,220 Because it can devolve into this perverse scenario 2271 01:42:54,220 --> 01:42:57,370 where you just have lots and lots of collisions all at the same values. 2272 01:42:57,370 --> 01:42:59,680 But there's got to be a way to fix this.
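The worst case can be made concrete by measuring the longest chain that a set of names would produce under first-letter hashing into 26 buckets. The function name `longest_chain` is hypothetical, and the sketch assumes every name starts with an ASCII letter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Length of the longest chain if each name were hashed by its first
// letter into 26 buckets. Assumes each name starts with A-Z or a-z.
size_t longest_chain(const char *names[], size_t count)
{
    size_t lengths[26] = {0};
    size_t longest = 0;
    for (size_t i = 0; i < count; i++)
    {
        assert(isalpha((unsigned char) names[i][0]));
        size_t bucket = (size_t) (toupper((unsigned char) names[i][0]) - 'A');
        lengths[bucket]++;
        if (lengths[bucket] > longest)
        {
            longest = lengths[bucket];
        }
    }
    return longest;
}
```

When every name starts with a different letter the result is 1. When every name starts with the same letter the result equals the number of names, which is the sense in which a search degrades to big O of n.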
2273 01:42:59,680 --> 01:43:03,488 How could we chip away at the length of these chains so to speak? 2274 01:43:03,488 --> 01:43:05,530 Could I decrease the length of these linked lists 2275 01:43:05,530 --> 01:43:08,320 so that with much higher probability there's no collisions? 2276 01:43:08,320 --> 01:43:12,190 Well, maybe the problem is that I started with just 26 buckets. 2277 01:43:12,190 --> 01:43:14,140 I mean, four buckets here, 26 here. 2278 01:43:14,140 --> 01:43:16,140 Maybe the problem is the size of my array. 2279 01:43:16,140 --> 01:43:18,415 So what if I instead just give myself a bigger array, 2280 01:43:18,415 --> 01:43:20,040 and it's too big to fit on the screen-- 2281 01:43:20,040 --> 01:43:24,950 but what if I instead have a bucket for names that start with Laa and Lab, 2282 01:43:24,950 --> 01:43:27,930 and Lac, Lad, dot, dot, dot, all the way down? 2283 01:43:27,930 --> 01:43:33,380 Now, when I hash these names into my hash table, 2284 01:43:33,380 --> 01:43:35,720 Lakitu is going to end up at their own location 2285 01:43:35,720 --> 01:43:39,390 here, Link at their own location here, Luigi at their own location here. 2286 01:43:39,390 --> 01:43:42,140 And so now I don't have linked lists. 2287 01:43:42,140 --> 01:43:44,910 I really just have an array of names. 2288 01:43:44,910 --> 01:43:47,520 So now I'm actually back to constant time. 2289 01:43:47,520 --> 01:43:48,020 Why? 2290 01:43:48,020 --> 01:43:52,220 Because so long as every letter of the alphabet has an ASCII value, 2291 01:43:52,220 --> 01:43:53,660 I can get that in constant time. 2292 01:43:53,660 --> 01:43:55,710 And we did that as far back as week 1.
2293 01:43:55,710 --> 01:43:59,390 And so I can figure out what the arithmetic location 2294 01:43:59,390 --> 01:44:01,280 is of each of these buckets just by looking 2295 01:44:01,280 --> 01:44:05,820 at 1, 2, 3 characters or the total number of letters that I care about, 2296 01:44:05,820 --> 01:44:07,620 which is just 3 in this case. 2297 01:44:07,620 --> 01:44:08,870 So this feels like a solution. 2298 01:44:08,870 --> 01:44:10,620 Even though I haven't drawn all the names, 2299 01:44:10,620 --> 01:44:12,380 it feels like we've solved the problem. 2300 01:44:12,380 --> 01:44:16,355 But what's the downside or trade off of what we've just done? 2301 01:44:16,355 --> 01:44:17,063 AUDIENCE: Memory. 2302 01:44:17,063 --> 01:44:17,980 DAVID J. MALAN: Sorry? 2303 01:44:17,980 --> 01:44:18,832 AUDIENCE: Memory. 2304 01:44:18,832 --> 01:44:19,790 DAVID J. MALAN: Memory. 2305 01:44:19,790 --> 01:44:22,940 So not pictured here is the dot, dot, dot, and everything 2306 01:44:22,940 --> 01:44:24,380 above and everything below. 2307 01:44:24,380 --> 01:44:29,250 This just exploded in terms of the number of locations in this array. 2308 01:44:29,250 --> 01:44:29,750 Why? 2309 01:44:29,750 --> 01:44:32,250 Because if I'm taking into account not just the first letter 2310 01:44:32,250 --> 01:44:36,380 but the first, the second, and third, that's 26 to the third power, 2311 01:44:36,380 --> 01:44:38,940 26 times 26 times 26. 2312 01:44:38,940 --> 01:44:42,260 And even though there's going to be a crazy number of names 2313 01:44:42,260 --> 01:44:43,460 that just don't exist-- 2314 01:44:43,460 --> 01:44:47,090 I can't think of a Nintendo character whose name starts with Laa-- 2315 01:44:47,090 --> 01:44:48,510 you still need that bucket. 2316 01:44:48,510 --> 01:44:49,010 Why? 2317 01:44:49,010 --> 01:44:51,260 Because, otherwise, you don't have contiguousness. 2318 01:44:51,260 --> 01:44:53,670 You can't just arbitrarily label these buckets. 
2319 01:44:53,670 --> 01:44:55,687 If you want to be able to use a function that 2320 01:44:55,687 --> 01:44:58,520 looks at first, second, third letter and then arithmetically figures 2321 01:44:58,520 --> 01:45:04,640 out where to go, whether it's 0 to 25 or 0 to 26 to the third power minus 1 2322 01:45:04,640 --> 01:45:06,522 being the number of buckets there-- 2323 01:45:06,522 --> 01:45:07,730 so there's a trade off there. 2324 01:45:07,730 --> 01:45:11,420 You're wasting a huge amount of memory just to give yourself that time. 2325 01:45:11,420 --> 01:45:14,250 But that would then give us constant time. 2326 01:45:14,250 --> 01:45:17,840 So in that sense, if we have an ideal hash function whereby 2327 01:45:17,840 --> 01:45:21,260 the function ensures that no values collide, 2328 01:45:21,260 --> 01:45:25,550 we do actually obtain that Holy Grail of big O of 1 because it only 2329 01:45:25,550 --> 01:45:30,068 takes one or maybe three steps to find that name's location. 2330 01:45:30,068 --> 01:45:33,110 Now, to make this clear, how do we translate this to something like code? 2331 01:45:33,110 --> 01:45:36,470 Well, here again is the struct we used last time for that of a person 2332 01:45:36,470 --> 01:45:38,330 and a person had a name and a number. 2333 01:45:38,330 --> 01:45:42,030 Here, for a hash table, we might do something a little bit differently. 2334 01:45:42,030 --> 01:45:47,150 We might now have a node in a hash table storing the person's name, 2335 01:45:47,150 --> 01:45:51,440 person's phone number, and a pointer to the next such person 2336 01:45:51,440 --> 01:45:53,245 in that chain if needed. 2337 01:45:53,245 --> 01:45:56,120 Hopefully this is going to be null most of the time, all of the time. 2338 01:45:56,120 --> 01:45:58,460 But we need it just in case we do have that collision. 2339 01:45:58,460 --> 01:46:01,577 We've seen in our pictures the names, like Mario, Luigi, and so forth.
2340 01:46:01,577 --> 01:46:02,660 We didn't see the numbers. 2341 01:46:02,660 --> 01:46:05,270 But that's what's inside of those boxes on the picture. 2342 01:46:05,270 --> 01:46:09,650 But that node would give us what we need to build up these linked lists. 2343 01:46:09,650 --> 01:46:13,610 Meanwhile what is the hash table itself, that vertical strip along the left? 2344 01:46:13,610 --> 01:46:15,650 Well, it's really just a variable. 2345 01:46:15,650 --> 01:46:18,860 We could call it table for short of size 26. 2346 01:46:18,860 --> 01:46:21,950 And each of the locations in that array that was on the side 2347 01:46:21,950 --> 01:46:26,130 here, at least in the simple, small version, was a pointer to a node. 2348 01:46:26,130 --> 01:46:29,180 So it's null if there's no one there or it's 2349 01:46:29,180 --> 01:46:32,850 a valid address of the first node in the linked list. 2350 01:46:32,850 --> 01:46:34,760 So this then is a hash table. 2351 01:46:34,760 --> 01:46:38,580 And each of those nodes, to be clear, would be defined as follows. 2352 01:46:38,580 --> 01:46:40,640 So what's the takeaway then with a hash table? 2353 01:46:40,640 --> 01:46:44,300 Ideally, with a good hash function and with a good set 2354 01:46:44,300 --> 01:46:47,750 of inputs where you're not presented with some perverse set of inputs that's 2355 01:46:47,750 --> 01:46:50,720 all of the friends whose names start with the same letter, 2356 01:46:50,720 --> 01:46:53,900 ideally what the hash function will be doing for you is this. 2357 01:46:53,900 --> 01:46:55,700 The input is going to be someone's name. 2358 01:46:55,700 --> 01:46:58,242 The algorithm in the middle is going to be the hash function. 2359 01:46:58,242 --> 01:47:01,880 And the output is the so-called hash value or location in this case. 
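The node and the 26-element table described above might be written as follows. The field names and the helpers `bucket_for`, `add_contact`, `find_number`, and `clear_table` are assumptions for illustration, since the slides themselves are not part of this transcript. The sketch stores the caller's string pointers rather than copying them, and it inserts at the head of each chain.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// One contact in a chain (hypothetical field names).
typedef struct node
{
    const char *name;
    const char *number;
    struct node *next;
} node;

// The hash table: 26 pointers, all NULL until a name is added.
node *table[26];

// First-letter bucket; assumes the name starts with an ASCII letter.
int bucket_for(const char *name)
{
    assert(name != NULL && isalpha((unsigned char) name[0]));
    return toupper((unsigned char) name[0]) - 'A';
}

// Prepend a contact to its bucket's chain. Returns false if out of memory.
bool add_contact(const char *name, const char *number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    int i = bucket_for(name);
    n->name = name;
    n->number = number;
    n->next = table[i];
    table[i] = n;
    return true;
}

// Jump to the bucket in constant time, then walk only that chain.
const char *find_number(const char *name)
{
    for (node *n = table[bucket_for(name)]; n != NULL; n = n->next)
    {
        if (strcmp(n->name, name) == 0)
        {
            return n->number;
        }
    }
    return NULL;
}

// Free every chain and reset every bucket to NULL.
void clear_table(void)
{
    for (int i = 0; i < 26; i++)
    {
        while (table[i] != NULL)
        {
            node *next = table[i]->next;
            free(table[i]);
            table[i] = next;
        }
    }
}
```

Inserting at the head keeps insertion constant time. A lookup still has to walk the chain for that one letter, so its cost depends on how many names collided there.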
2360 01:47:01,880 --> 01:47:04,880 So, for instance, in the case of Mario, when we had just-- 2361 01:47:04,880 --> 01:47:08,810 when we had just 26 buckets total, the input to the hash function 2362 01:47:08,810 --> 01:47:09,680 would be Mario. 2363 01:47:09,680 --> 01:47:11,450 That hash function would really just look 2364 01:47:11,450 --> 01:47:15,580 at the first letter, M in that case, and would ideally output the number 12. 2365 01:47:15,580 --> 01:47:16,540 I did the same thing. 2366 01:47:16,540 --> 01:47:20,230 But in my head, whenever I pulled out a card like the five of diamonds here, 2367 01:47:20,230 --> 01:47:26,110 I figured out, OK, that's location 0 out of my 0, 1, 2, 3, four total buckets. 2368 01:47:26,110 --> 01:47:28,220 Here we're doing it instead alphabetically. 2369 01:47:28,220 --> 01:47:32,230 And so someone like Luigi meanwhile would have a hash value of 11. 2370 01:47:32,230 --> 01:47:34,330 These numbers would be bigger, of course, 2371 01:47:34,330 --> 01:47:39,010 though, if we're looking at 1, 2, 3 letters instead of just one. 2372 01:47:39,010 --> 01:47:44,320 So with that said, if we were to implement this in actual code, 2373 01:47:44,320 --> 01:47:45,220 a hash function? 2374 01:47:45,220 --> 01:47:48,590 I did it physically by acting out the cards. 2375 01:47:48,590 --> 01:47:50,800 Here is how we might implement this in code 2376 01:47:50,800 --> 01:47:55,390 using C. I could have a function called hash whose argument is a string, a.k.a. 2377 01:47:55,390 --> 01:47:58,990 char star, the name of which is word where the word is 2378 01:47:58,990 --> 01:48:01,340 like the first word in their name. 2379 01:48:01,340 --> 01:48:03,280 We want this function to return an int, which 2380 01:48:03,280 --> 01:48:07,270 ideally in this case of 26 buckets would be a number from 0 to 25. 2381 01:48:07,270 --> 01:48:08,560 And how do we achieve that?
2382 01:48:08,560 --> 01:48:11,920 Well, if we use our old friend ctype, which had a function like toupper 2383 01:48:11,920 --> 01:48:17,150 from a couple of weeks back, we could pass in the first letter of that word, 2384 01:48:17,150 --> 01:48:21,800 capitalize it, which is going to give us a number that's 65, 66, 67 2385 01:48:21,800 --> 01:48:24,030 on up for the 26 English letters. 2386 01:48:24,030 --> 01:48:27,722 And if I subtract 65, a.k.a., quote, unquote, 'A'-- 2387 01:48:27,722 --> 01:48:29,180 single quotes because it's a char-- 2388 01:48:29,180 --> 01:48:35,300 that's going to mathematically give me a number between 0 and 25 inclusive. 2389 01:48:35,300 --> 01:48:36,590 There's a potential bug. 2390 01:48:36,590 --> 01:48:40,580 If I pass in punctuation or anything that's not alphabetical, 2391 01:48:40,580 --> 01:48:41,640 bad things will happen. 2392 01:48:41,640 --> 01:48:43,765 So I should probably have some more error checking, 2393 01:48:43,765 --> 01:48:45,980 but this is the simplest way in code that I 2394 01:48:45,980 --> 01:48:48,140 could implement a hash function that looks only 2395 01:48:48,140 --> 01:48:49,650 at the first letter of their name. 2396 01:48:49,650 --> 01:48:52,490 Probably not ideal because I can think of friends in the real world who have 2397 01:48:52,490 --> 01:48:54,140 the same first letter of their name. 2398 01:48:54,140 --> 01:48:57,710 Whether this is better or worse than looking at 2 letters, 3 letters, 4 2399 01:48:57,710 --> 01:49:00,500 letters, it's going to depend on how much memory you want to spend 2400 01:49:00,500 --> 01:49:02,990 and how much time you want to ultimately save. 2401 01:49:02,990 --> 01:49:04,820 Let me tweak this though a little bit.
2402 01:49:04,820 --> 01:49:07,400 It's conventional in C, just so you know, 2403 01:49:07,400 --> 01:49:11,120 that if you're passing in a string that is a char star to a function 2404 01:49:11,120 --> 01:49:14,750 and you have no intention of letting that function change the string, 2405 01:49:14,750 --> 01:49:18,230 you should probably declare the argument to the function as const. 2406 01:49:18,230 --> 01:49:20,840 And that will tell the compiler to please 2407 01:49:20,840 --> 01:49:23,720 don't let the human programmer actually change 2408 01:49:23,720 --> 01:49:25,160 that actual word in this function. 2409 01:49:25,160 --> 01:49:26,877 It's just not their place to do so. 2410 01:49:26,877 --> 01:49:28,460 And we can actually do something else. 2411 01:49:28,460 --> 01:49:32,900 In a hash function because you're using in this case, the output, the integer 2412 01:49:32,900 --> 01:49:37,070 as a location in an array, it had better not be negative. You want it to be 2413 01:49:37,070 --> 01:49:38,910 0 or positive. 2414 01:49:38,910 --> 01:49:41,960 And so, technically, if you want to impose that in code, you 2415 01:49:41,960 --> 01:49:46,370 can specify that the int that's being returned has to be unsigned, that is, 2416 01:49:46,370 --> 01:49:48,800 it's 0 on up through the positive numbers. 2417 01:49:48,800 --> 01:49:50,670 It is not a negative value. 2418 01:49:50,670 --> 01:49:53,090 So this is slightly better than the previous version 2419 01:49:53,090 --> 01:49:56,960 where we didn't have these defenses in place. 2420 01:49:56,960 --> 01:50:00,800 All right, so what does this actually mean in practice? 2421 01:50:00,800 --> 01:50:04,790 You don't get to necessarily pick the hash function based 2422 01:50:04,790 --> 01:50:06,080 on the names of your friends. 2423 01:50:06,080 --> 01:50:08,390 Presumably, Apple and Google and others already 2424 01:50:08,390 --> 01:50:11,900 chose their hash function independent of what your friends' names are.
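[Editor's sketch] The first-letter hash function with the `const` and `unsigned` tweaks just described could look something like this. The fallback to bucket 0 for non-alphabetic input is an assumption added here to address the "potential bug" mentioned earlier; the version described in the lecture has no such check.

```c
#include <ctype.h>

// Hash a name to a bucket index (0 through 25) using only its first letter.
// const: the function promises not to modify the caller's string.
// unsigned: the result is used as an array index, so it is never negative.
unsigned int hash(const char *word)
{
    // <ctype.h> functions expect a value representable as unsigned char.
    int c = toupper((unsigned char) word[0]);

    // Assumption (not in the lecture's version): anything that is not
    // A through Z, including an empty string, falls back to bucket 0.
    if (c < 'A' || c > 'Z')
    {
        return 0;
    }

    // 'A' is 65, so 'A' maps to 0, 'B' to 1, and so on up to 'Z' at 25.
    return (unsigned int) (c - 'A');
}
```

With this sketch, `hash("Mario")` gives 12 and `hash("Luigi")` gives 11, matching the values above, and lowercase input lands in the same buckets.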
2425 01:50:11,900 --> 01:50:16,100 So ideally, they want to pick a hash function that generally is quite fast, 2426 01:50:16,100 --> 01:50:17,060 big O of 1. 2427 01:50:17,060 --> 01:50:19,730 But practically speaking, in a hash table 2428 01:50:19,730 --> 01:50:24,050 unless you get really lucky with the inputs, which you generally won't, 2429 01:50:24,050 --> 01:50:26,460 really it's big O of n running time. 2430 01:50:26,460 --> 01:50:26,960 Why? 2431 01:50:26,960 --> 01:50:30,590 Because in the worst possible scenario, you might have one long linked list. 2432 01:50:30,590 --> 01:50:32,838 But in practice, ideally-- 2433 01:50:32,838 --> 01:50:34,880 and this is a little naive-- but suppose that you 2434 01:50:34,880 --> 01:50:39,500 have a uniform distribution of friends in the world where 1/26 of them 2435 01:50:39,500 --> 01:50:43,610 have names starting with A and then another 1/26 with B 2436 01:50:43,610 --> 01:50:47,330 and then dot, dot, dot Z. That would be a nice uniform distribution of friends. 2437 01:50:47,330 --> 01:50:50,660 Technically then, your running time of a hash table 2438 01:50:50,660 --> 01:50:53,000 for searching it or deleting or inserting 2439 01:50:53,000 --> 01:50:55,790 would technically be big O of n divided by k, where 2440 01:50:55,790 --> 01:50:57,810 k is the number of buckets, a constant. 2441 01:50:57,810 --> 01:51:00,890 So it's technically big O of n divided by 26. 2442 01:51:00,890 --> 01:51:05,690 Now, again, per our discussion of big O notation, that's still the same thing. 2443 01:51:05,690 --> 01:51:07,290 You get rid of constant factors. 2444 01:51:07,290 --> 01:51:09,410 So, yes, it's 26 times faster. 2445 01:51:09,410 --> 01:51:11,960 The chains are 1/26 the length. 2446 01:51:11,960 --> 01:51:16,430 But asymptotically in terms of big O notation, it's still big O of n.
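[Editor's sketch] To see where the n divided by k comes from, here is a rough sketch of inserting into and searching a hash table with chaining. The function names `insert` and `lookup`, the `steps` counter, and the field names are all assumptions for illustration. The sketch stores pointers to the caller's strings rather than copies, and it omits freeing memory for brevity.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 26

typedef struct node
{
    const char *name;
    const char *number;
    struct node *next;
}
node;

// File scope, so all buckets start out NULL.
node *table[BUCKETS];

// First-letter hash; non-letters fall back to bucket 0 (an assumption).
unsigned int hash(const char *word)
{
    int c = toupper((unsigned char) word[0]);
    return (c >= 'A' && c <= 'Z') ? (unsigned int) (c - 'A') : 0;
}

// Prepend a new entry to its bucket's chain. Returns false if malloc fails.
bool insert(const char *name, const char *number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    unsigned int i = hash(name);
    n->name = name;
    n->number = number;
    n->next = table[i]; // the old head (possibly NULL) follows the new node
    table[i] = n;
    return true;
}

// Walk a single chain. Returns the number, or NULL if the name is absent.
// *steps reports how many nodes were visited along the way.
const char *lookup(const char *name, int *steps)
{
    *steps = 0;
    for (node *n = table[hash(name)]; n != NULL; n = n->next)
    {
        (*steps)++;
        if (strcmp(n->name, name) == 0)
        {
            return n->number;
        }
    }
    return NULL;
}
```

A lookup only ever walks one chain. If names spread evenly over the 26 buckets, that chain holds about n/26 entries; if every name starts with the same letter, it holds all n of them, which is the worst case described above.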
2447 01:51:16,430 --> 01:51:18,890 And here's where now we can start to veer away 2448 01:51:18,890 --> 01:51:22,970 from what is theoretically right versus what is practically right. 2449 01:51:22,970 --> 01:51:26,600 In reality, in the real world, if you work for Google, Microsoft, Apple, 2450 01:51:26,600 --> 01:51:31,280 and others, 26 times faster is actually faster in the real world 2451 01:51:31,280 --> 01:51:34,590 even though a mathematician might say, that's really the same thing. 2452 01:51:34,590 --> 01:51:35,480 But it's not. 2453 01:51:35,480 --> 01:51:38,000 The real world wall clock time, if you watch 2454 01:51:38,000 --> 01:51:41,120 the number of seconds passing on the clock, n over k 2455 01:51:41,120 --> 01:51:43,998 is a much better running time than big O of n. 2456 01:51:43,998 --> 01:51:46,790 So here too we're getting to the point where the conversations need 2457 01:51:46,790 --> 01:51:49,250 to become a little more sophisticated. 2458 01:51:49,250 --> 01:51:52,080 It's not quite as simple as theory versus practice. 2459 01:51:52,080 --> 01:51:54,990 It depends on what matters ultimately to you. 2460 01:51:54,990 --> 01:51:59,840 But ideally and literally if somehow or other they picked an ideal hash 2461 01:51:59,840 --> 01:52:03,350 function, big O of 1 would really be the ideal here, 2462 01:52:03,350 --> 01:52:05,270 would really be the running time we achieve. 2463 01:52:05,270 --> 01:52:07,280 And what you'll generally find in the real world 2464 01:52:07,280 --> 01:52:09,738 is that you don't use hash functions that are as simplistic 2465 01:52:09,738 --> 01:52:11,120 as just look at the first letter. 2466 01:52:11,120 --> 01:52:13,130 And, honestly, they won't generally look at the first 2467 01:52:13,130 --> 01:52:14,630 and the second and the third letter. 
2468 01:52:14,630 --> 01:52:17,680 They'll use some even fancier math to put real downward pressure 2469 01:52:17,680 --> 01:52:21,260 on the probability of collisions so that, yes, they will still happen. 2470 01:52:21,260 --> 01:52:25,660 But most of the time a really good hash function, even if it's not quite ideal, 2471 01:52:25,660 --> 01:52:30,460 will be darn close to constant time, which makes hash tables and in turn 2472 01:52:30,460 --> 01:52:35,080 dictionaries one of the most universally compelling data structures to use. 2473 01:52:35,080 --> 01:52:38,812 Now, with that said, we have time for just another data structure or so. 2474 01:52:38,812 --> 01:52:39,770 And this is not a typo. 2475 01:52:39,770 --> 01:52:41,210 This one's called a trie. 2476 01:52:41,210 --> 01:52:44,860 And a trie is short for retrieval, which is weird because you say retrieval. 2477 01:52:44,860 --> 01:52:45,687 But you say trie. 2478 01:52:45,687 --> 01:52:47,020 But that's the etymology of trie. 2479 01:52:47,020 --> 01:52:50,830 And a trie is the weirdest amalgamation 2480 01:52:50,830 --> 01:52:55,670 of all of these things, whereby a trie is a tree of arrays. 2481 01:52:55,670 --> 01:52:59,560 So a hash table is an array of linked lists. 2482 01:52:59,560 --> 01:53:03,608 A trie is a tree of arrays. 2483 01:53:03,608 --> 01:53:05,650 So at some point computer scientists just started 2484 01:53:05,650 --> 01:53:07,622 mashing together all of these different inputs, 2485 01:53:07,622 --> 01:53:09,080 and let's see what comes out of it. 2486 01:53:09,080 --> 01:53:11,380 But a trie is actually really interesting. 2487 01:53:11,380 --> 01:53:14,710 And what you're about to see is a data structure that is literally 2488 01:53:14,710 --> 01:53:18,010 big O of one time, constant time. 2489 01:53:18,010 --> 01:53:19,580 But there is a downside. 2490 01:53:19,580 --> 01:53:23,830 So in a trie, every node is an array.
2491 01:53:23,830 --> 01:53:26,500 And every location in that array generally 2492 01:53:26,500 --> 01:53:28,210 represents a letter of the alphabet. 2493 01:53:28,210 --> 01:53:31,430 But you could generalize this away from words too. 2494 01:53:31,430 --> 01:53:34,720 In this case, if we have a root node, that root node 2495 01:53:34,720 --> 01:53:37,610 is technically a big array with 26 locations. 2496 01:53:37,610 --> 01:53:42,700 And if you want to insert names or words more generally into a trie, what you do 2497 01:53:42,700 --> 01:53:43,450 is this. 2498 01:53:43,450 --> 01:53:47,980 You hash again and again and again creating one array 2499 01:53:47,980 --> 01:53:49,880 for every letter in your word. 2500 01:53:49,880 --> 01:53:51,020 So what do I mean by that? 2501 01:53:51,020 --> 01:53:53,980 If we've got 26 elements here, this would 2502 01:53:53,980 --> 01:53:55,990 be representing A. This would be representing Z. 2503 01:53:55,990 --> 01:53:59,620 And initially these are all null by default when you have just this root. 2504 01:53:59,620 --> 01:54:02,920 But suppose I want to insert a few friends of mine, including 2505 01:54:02,920 --> 01:54:03,880 Toad for instance. 2506 01:54:03,880 --> 01:54:05,870 T-O-A-D is the name. 2507 01:54:05,870 --> 01:54:07,400 So how would I do that? 2508 01:54:07,400 --> 01:54:12,130 I would first find the location for T based on its number 0 through 25. 2509 01:54:12,130 --> 01:54:14,080 And if this is T, what would I then do? 2510 01:54:14,080 --> 01:54:18,320 I would change the null to actually be a pointer to another node, a.k.a. 2511 01:54:18,320 --> 01:54:19,490 another array. 2512 01:54:19,490 --> 01:54:21,890 And then I would go into the second array 2513 01:54:21,890 --> 01:54:25,280 and hash on the second letter of Toad's name which is, of course, O.
2514 01:54:25,280 --> 01:54:28,040 And then I would set a pointer to a third node 2515 01:54:28,040 --> 01:54:34,160 in my tree, which would be represented here, so another 26 pointers. 2516 01:54:34,160 --> 01:54:36,110 Then I would find the pointer representing A. 2517 01:54:36,110 --> 01:54:41,150 And I would create finally a fourth node, another array representing 2518 01:54:41,150 --> 01:54:43,280 the fourth letter of Toad's name. 2519 01:54:43,280 --> 01:54:48,110 But because Toad's name ends with D and therefore 2520 01:54:48,110 --> 01:54:52,760 I already have four nodes here, we need to specially color it, 2521 01:54:52,760 --> 01:54:55,880 though we could probably use an actual variable here. 2522 01:54:55,880 --> 01:54:59,120 I need to somehow indicate that Toad's name stops here. 2523 01:54:59,120 --> 01:55:03,530 So it's not null per se, this actually means that T-O-A-D is in this data 2524 01:55:03,530 --> 01:55:04,160 structure. 2525 01:55:04,160 --> 01:55:06,260 But I did this deliberately because another friend of mine 2526 01:55:06,260 --> 01:55:08,090 might be Toadette in the Nintendo world. 2527 01:55:08,090 --> 01:55:10,340 And Toadette, of course, is a superstring 2528 01:55:10,340 --> 01:55:13,700 of Toad, that is, it's longer but it shares a common prefix. 2529 01:55:13,700 --> 01:55:15,080 So Toadette could continue. 2530 01:55:15,080 --> 01:55:17,840 And I could have another node for the E, another node for the T, 2531 01:55:17,840 --> 01:55:20,780 another node for the second T, and another node for the last E. 2532 01:55:20,780 --> 01:55:25,620 But I somehow have to mark that E as the end of her name as well. 2533 01:55:25,620 --> 01:55:28,370 So even though they share a common prefix, 2534 01:55:28,370 --> 01:55:33,080 the fact that there's two green boxes on the screen means that T-O-A-D is 2535 01:55:33,080 --> 01:55:38,772 in this dictionary as a key, and T-O-A-D-E-T-T-E is another key.
2536 01:55:38,772 --> 01:55:40,730 And technically speaking, what's in these boxes 2537 01:55:40,730 --> 01:55:41,980 too-- it's not just a pointer. 2538 01:55:41,980 --> 01:55:44,870 It's probably Toad and Toadette's phone number and email 2539 01:55:44,870 --> 01:55:49,040 address and the actual value of the dictionary, which is to say, 2540 01:55:49,040 --> 01:55:51,060 this too is in fact a dictionary. 2541 01:55:51,060 --> 01:55:54,420 A dictionary is just an abstract data type, a collection of key value pairs, 2542 01:55:54,420 --> 01:55:56,600 just like I claimed a stack and a queue was. 2543 01:55:56,600 --> 01:55:58,280 And how you implement it can differ. 2544 01:55:58,280 --> 01:56:01,070 You could implement it with a hash table, an array of linked lists 2545 01:56:01,070 --> 01:56:07,370 as we just did, or you can implement a dictionary as a trie, a tree of arrays. 2546 01:56:07,370 --> 01:56:11,298 And let me add one more name to the mix, Tom, for instance, 2547 01:56:11,298 --> 01:56:12,590 a valid name from the universe. 2548 01:56:12,590 --> 01:56:18,030 T-O-M just means that, OK, that name exists in this structure as well. 2549 01:56:18,030 --> 01:56:21,920 Now, what is the implication of storing the names 2550 01:56:21,920 --> 01:56:24,500 in this way, which is implicitly? 2551 01:56:24,500 --> 01:56:31,100 I'm effectively storing Toad and Toadette and Tom in this data structure 2552 01:56:31,100 --> 01:56:35,690 without actually storing T or O or A or D or any of the other letters. 2553 01:56:35,690 --> 01:56:38,000 I'm just implicitly storing those letters 2554 01:56:38,000 --> 01:56:41,490 by actually using valid pointers that lead to another node. 2555 01:56:41,490 --> 01:56:43,970 And so what's the implication of this in code? 2556 01:56:43,970 --> 01:56:45,980 Well, in code it might look like this.
2557 01:56:45,980 --> 01:56:52,160 Every node in a trie is now redefined as being an array of size 26-- 2558 01:56:52,160 --> 01:56:55,100 and I'll call it children just to borrow the family tree metaphor-- 2559 01:56:55,100 --> 01:56:59,510 and that in each of these nodes there is room for the person's phone number, 2560 01:56:59,510 --> 01:57:02,190 for instance, a.k.a. a string or char star. 2561 01:57:02,190 --> 01:57:03,450 So what does this mean? 2562 01:57:03,450 --> 01:57:06,260 Well, if there's actually a non-null number there, 2563 01:57:06,260 --> 01:57:08,240 that's equivalent to there being a green box. 2564 01:57:08,240 --> 01:57:11,240 If you actually see plus 1, 617 dash whatever there, 2565 01:57:11,240 --> 01:57:14,320 that means there's a green box because Toad's number is right here. 2566 01:57:14,320 --> 01:57:16,000 Or Toadette's number is down here. 2567 01:57:16,000 --> 01:57:17,480 Or Tom's is over there. 2568 01:57:17,480 --> 01:57:21,100 But if this is null, that just means that maybe this is the T or the O 2569 01:57:21,100 --> 01:57:25,270 or the E, which are not actually ends of people's names. 2570 01:57:25,270 --> 01:57:27,850 So that's all these nodes actually are. 2571 01:57:27,850 --> 01:57:31,480 And if we think back now to what this data structure looks like, 2572 01:57:31,480 --> 01:57:37,010 this is in fact a data structure that can be navigated in constant time. 2573 01:57:37,010 --> 01:57:37,510 Why? 2574 01:57:37,510 --> 01:57:40,960 Well, all we need to keep track of this data structure is literally one pointer 2575 01:57:40,960 --> 01:57:44,530 called trie that's a pointer to the first of these nodes, the so-called root 2576 01:57:44,530 --> 01:57:45,115 of the trie. 2577 01:57:45,115 --> 01:57:49,150 And when it comes to now thinking about the running time of a trie, well, 2578 01:57:49,150 --> 01:57:49,930 what is it?
2579 01:57:49,930 --> 01:57:53,020 Well, if you've got n friends in your contacts already 2580 01:57:53,020 --> 01:57:56,720 or if there's n keys in that data structure, 2581 01:57:56,720 --> 01:57:59,560 how many steps does it take to find anyone? 2582 01:57:59,560 --> 01:58:03,730 Well, whether I have three names, Toad, Toadette, or Tom or three million names 2583 01:58:03,730 --> 01:58:07,870 in that data structure, how many steps will it take me to find Toad ever? 2584 01:58:07,870 --> 01:58:10,750 T-O-A-D. How many steps for Toadette? 2585 01:58:10,750 --> 01:58:13,870 T-O-A-D-E-T-T-E. Eight steps. 2586 01:58:13,870 --> 01:58:14,620 How about for Tom? 2587 01:58:14,620 --> 01:58:15,700 1, 2, 3. 2588 01:58:15,700 --> 01:58:17,680 And, frankly, I'm sure if we looked it up, 2589 01:58:17,680 --> 01:58:21,440 there's probably a limit on the number of characters in a Nintendo character's 2590 01:58:21,440 --> 01:58:21,940 name. 2591 01:58:21,940 --> 01:58:24,790 Maybe it's 20 characters total or maybe a little longer, 30. 2592 01:58:24,790 --> 01:58:25,960 There's some fixed value. 2593 01:58:25,960 --> 01:58:26,802 It's not unbounded. 2594 01:58:26,802 --> 01:58:28,510 There's not an infinite number of letters 2595 01:58:28,510 --> 01:58:30,080 in any Nintendo character's name. 2596 01:58:30,080 --> 01:58:31,720 So there's some constant value. 2597 01:58:31,720 --> 01:58:32,770 Call it k. 2598 01:58:32,770 --> 01:58:35,020 So no matter whose name you're looking for, 2599 01:58:35,020 --> 01:58:36,790 it's going to take you maximally k steps. 2600 01:58:36,790 --> 01:58:37,840 But k is a constant. 2601 01:58:37,840 --> 01:58:42,820 And we always said that big O of k is the same thing as big O of 1. 2602 01:58:42,820 --> 01:58:46,300 So for all intents and purposes, even though we're taking a bit of liberty 2603 01:58:46,300 --> 01:58:50,425 here, searching a trie, inserting into a trie, deleting from a trie 2604 01:58:50,425 --> 01:58:51,940 is constant time.
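[Editor's sketch] The trie node and the letter-by-letter walk described above might be written along these lines. The names `trie_new`, `trie_insert`, `trie_search`, and `slot` are hypothetical, as is the `steps` counter. In this sketch the end-of-name marker (the "green box") is a non-NULL `number` in the node reached after the last letter, and the strings passed in are assumed to outlive the trie.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26

typedef struct node
{
    struct node *children[ALPHABET];
    const char *number; // non-NULL means a stored name ends at this node
}
node;

// Map a letter to 0 through 25, or -1 for anything that is not a letter.
static int slot(char c)
{
    int u = toupper((unsigned char) c);
    return (u >= 'A' && u <= 'Z') ? u - 'A' : -1;
}

// Allocate an empty node: no children, no number. Returns NULL on failure.
node *trie_new(void)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        for (int i = 0; i < ALPHABET; i++)
        {
            n->children[i] = NULL;
        }
        n->number = NULL;
    }
    return n;
}

// Follow one pointer per letter, creating nodes where none exist yet.
bool trie_insert(node *root, const char *name, const char *number)
{
    node *cur = root;
    for (const char *p = name; *p != '\0'; p++)
    {
        int i = slot(*p);
        if (i < 0)
        {
            return false; // this sketch only handles letters
        }
        if (cur->children[i] == NULL)
        {
            cur->children[i] = trie_new();
            if (cur->children[i] == NULL)
            {
                return false;
            }
        }
        cur = cur->children[i];
    }
    cur->number = number; // mark the end of the name
    return true;
}

// Returns the number for name, or NULL if absent.
// *steps counts pointers followed: at most the length of the name.
const char *trie_search(const node *root, const char *name, int *steps)
{
    *steps = 0;
    const node *cur = root;
    for (const char *p = name; *p != '\0' && cur != NULL; p++)
    {
        int i = slot(*p);
        if (i < 0)
        {
            return NULL;
        }
        cur = cur->children[i];
        (*steps)++;
    }
    return cur != NULL ? cur->number : NULL;
}
```

The loop in `trie_search` runs once per letter of the name being looked up, regardless of how many other names are stored, which is the sense in which the lookup is bounded by k rather than by n.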
2605 01:58:51,940 --> 01:58:55,715 Because if you have a billion names in the dictionary already, 2606 01:58:55,715 --> 01:58:57,590 it's going to take up a huge amount of space. 2607 01:58:57,590 --> 01:59:03,370 But it does not affect how many steps it takes to find Toad or Toadette or Tom. 2608 01:59:03,370 --> 01:59:05,500 That depends only on the length of their names 2609 01:59:05,500 --> 01:59:08,050 which effectively is a constant value. 2610 01:59:08,050 --> 01:59:09,610 But there is a downside here. 2611 01:59:09,610 --> 01:59:11,230 And it's a big one. 2612 01:59:11,230 --> 01:59:14,740 In practice, I daresay most computers, most systems, 2613 01:59:14,740 --> 01:59:18,940 would actually use hash tables, not tries, 2614 01:59:18,940 --> 01:59:21,490 to implement dictionaries, collections of key value pairs. 2615 01:59:21,490 --> 01:59:26,590 What's the downside of this here data structure might you think? 2616 01:59:26,590 --> 01:59:31,360 And this is just a representative picture for Toad, Tom, and Toadette. 2617 01:59:31,360 --> 01:59:32,990 All the space it takes up-- 2618 01:59:32,990 --> 01:59:36,550 I mean, even for these three names, look at how many empty pointers there are. 2619 01:59:36,550 --> 01:59:38,110 So they're null to be sure. 2620 01:59:38,110 --> 01:59:42,310 But there's 25 unused spaces here, another 25 unused spaces here, 2621 01:59:42,310 --> 01:59:43,900 24 unused spaces here. 2622 01:59:43,900 --> 01:59:46,630 And what's not pictured is if I've got more and more names, 2623 01:59:46,630 --> 01:59:50,410 this thing's just going to blow up with more and more and more and more arrays 2624 01:59:50,410 --> 01:59:53,140 even though there's not going to be someone whose name starts 2625 01:59:53,140 --> 01:59:56,620 with like Laa or Lba or Lbb. 2626 01:59:56,620 --> 01:59:58,990 There's going to be so many combinations of letters 2627 01:59:58,990 --> 02:00:01,490 where it's just going to be null pointers instead. 
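[Editor's sketch] To put a number on the wasted space, the following sketch builds a trie for toad, toadette, and tom and tallies the pointer slots that stay NULL. The helper names `new_node`, `insert`, and `tally` are hypothetical, the sketch accepts lowercase a through z only, and, as an assumption of this layout, the end-of-name marker lives in the node reached after the last letter, so node counts may differ by one per name from the picture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define ALPHABET 26

typedef struct node
{
    struct node *children[ALPHABET];
    const char *number; // non-NULL marks the end of a stored name
}
node;

// Allocate an empty node. Returns NULL if malloc fails.
static node *new_node(void)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        for (int i = 0; i < ALPHABET; i++)
        {
            n->children[i] = NULL;
        }
        n->number = NULL;
    }
    return n;
}

// Insert a name made of lowercase a through z only (an assumption).
bool insert(node *root, const char *name, const char *number)
{
    node *cur = root;
    for (const char *p = name; *p != '\0'; p++)
    {
        int i = *p - 'a';
        if (i < 0 || i >= ALPHABET)
        {
            return false;
        }
        if (cur->children[i] == NULL && (cur->children[i] = new_node()) == NULL)
        {
            return false;
        }
        cur = cur->children[i];
    }
    cur->number = number;
    return true;
}

// Count every node in the trie and every child slot that is still NULL.
void tally(const node *n, size_t *nodes, size_t *unused)
{
    if (n == NULL)
    {
        return;
    }
    (*nodes)++;
    for (int i = 0; i < ALPHABET; i++)
    {
        if (n->children[i] == NULL)
        {
            (*unused)++;
        }
        else
        {
            tally(n->children[i], nodes, unused);
        }
    }
}
```

Under these assumptions, the three names produce 10 nodes with 260 pointer slots in total, of which 251 remain NULL. The first three nodes have 25, 25, and 24 unused slots respectively, in line with the counts mentioned above.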
2628 02:00:01,490 --> 02:00:03,980 So it takes up a huge amount of space. 2629 02:00:03,980 --> 02:00:05,710 But it does give us constant time. 2630 02:00:05,710 --> 02:00:08,270 And that then is this here trade-off. 2631 02:00:08,270 --> 02:00:11,710 So I would encourage you here on out as we exit the world of C 2632 02:00:11,710 --> 02:00:14,590 and so much of today's code and the past several weeks' code 2633 02:00:14,590 --> 02:00:17,457 will soon be reduced in a week's time to just one 2634 02:00:17,457 --> 02:00:18,790 line of code, two lines of code. 2635 02:00:18,790 --> 02:00:21,430 Because Python and the authors of Python will 2636 02:00:21,430 --> 02:00:24,550 have implemented all of this week's and last week's and prior weeks' 2637 02:00:24,550 --> 02:00:28,390 ideas for us, we'll be able to operate at a higher level of abstraction. 2638 02:00:28,390 --> 02:00:30,550 And just think about what problems we want to solve 2639 02:00:30,550 --> 02:00:33,970 and how we want to do so algorithmically and with data structures. 2640 02:00:33,970 --> 02:00:37,390 And data structures in conclusion are everywhere. 2641 02:00:37,390 --> 02:00:43,300 Has anyone recognized this spot in Harvard Square? 2642 02:00:43,300 --> 02:00:44,010 Anyone? 2643 02:00:44,010 --> 02:00:44,980 What are we looking at? 2644 02:00:44,980 --> 02:00:46,458 AUDIENCE: Is that Sweetgreen? 2645 02:00:46,458 --> 02:00:49,000 DAVID J. MALAN: So this is Sweetgreen, a popular salad place. 2646 02:00:49,000 --> 02:00:53,480 And this is actually a dictionary or really a hash table of sorts. 2647 02:00:53,480 --> 02:00:53,980 Why? 2648 02:00:53,980 --> 02:00:56,590 Well, if you buy a very expensive salad at Sweetgreen, 2649 02:00:56,590 --> 02:00:59,830 they put it on the shelf for you if you've ordered via the app or online 2650 02:00:59,830 --> 02:01:00,343 in advance.
2651 02:01:00,343 --> 02:01:02,260 And if I, for instance, were to order a salad, 2652 02:01:02,260 --> 02:01:03,970 it would probably go under the D heading. 2653 02:01:03,970 --> 02:01:07,510 If Carter were to order a salad, it would go under C, Julia under J. 2654 02:01:07,510 --> 02:01:11,110 And so they hash the salads based on your first name 2655 02:01:11,110 --> 02:01:12,760 to a particular location on the shelf. 2656 02:01:12,760 --> 02:01:13,880 Why is that a good thing? 2657 02:01:13,880 --> 02:01:16,750 Well, if it were just one long shelf that wasn't even alphabetical, 2658 02:01:16,750 --> 02:01:19,360 it would be big O of n for me to find my salad 2659 02:01:19,360 --> 02:01:21,100 and for Carter and Julia to find theirs. 2660 02:01:21,100 --> 02:01:24,070 Because they've got 26 letters here, it's big O of 1. 2661 02:01:24,070 --> 02:01:26,740 It's one step for any of us to find our salads. 2662 02:01:26,740 --> 02:01:30,040 Except, again, in perverse situations, where 2663 02:01:30,040 --> 02:01:36,220 might this system devolve at like 12:30 PM in the afternoon for instance? 2664 02:01:36,220 --> 02:01:37,195 What could go wrong? 2665 02:01:37,195 --> 02:01:38,970 AUDIENCE: A lot of people with names with the same first letter 2666 02:01:38,970 --> 02:01:39,772 order a salad. 2667 02:01:39,772 --> 02:01:42,480 DAVID J. MALAN: Yeah, a lot of people with the same first letters 2668 02:01:42,480 --> 02:01:43,938 of their names might order a salad. 2669 02:01:43,938 --> 02:01:46,937 So there's lots of like D, D, D. Where do we put the next person? 2670 02:01:46,937 --> 02:01:49,770 OK, well, maybe we overflow to E. What if there's a lot of E people? 2671 02:01:49,770 --> 02:01:50,437 It overflows to 2672 02:01:50,437 --> 02:01:51,660 F. What if it overflows? 2673 02:01:51,660 --> 02:01:54,870 Then we go to G. 
And it devolves anyway into a linked 2674 02:01:54,870 --> 02:01:58,680 list or really multiple arrays that you have to search in big O of n time. 2675 02:01:58,680 --> 02:02:01,182 I've even been to Sweetgreen at non-popular times. 2676 02:02:01,182 --> 02:02:04,140 And sometimes the staff just don't even choose to use the dictionaries. 2677 02:02:04,140 --> 02:02:06,003 They just put what's closest to them. 2678 02:02:06,003 --> 02:02:07,920 So you have to search the same thing anyway. 2679 02:02:07,920 --> 02:02:10,500 But you'll start to see now that you've seen some of these building blocks 2680 02:02:10,500 --> 02:02:13,200 that data structures are everywhere, algorithms are everywhere. 2681 02:02:13,200 --> 02:02:17,410 And among the goals of CS50 now are to harness these ideas most efficiently. 2682 02:02:17,410 --> 02:02:18,120 So that's a wrap. 2683 02:02:18,120 --> 02:02:19,590 We'll see you next time. 2684 02:02:19,590 --> 02:02:22,940 [MUSIC PLAYING] 2685 02:02:22,940 --> 02:02:49,000