JASON HIRSCHHORN: Welcome everyone to the Section Seven. We are in week seven of the course. And this upcoming Thursday is Halloween so I am dressed up like a pumpkin. I couldn't bend over and put on my shoes, so that's why I'm just wearing socks. I'm also not wearing anything under this, so I can't take it off if it's distracting to you. I apologize in advance for that. You don't need to imagine what's going on. I am wearing boxers. So it's all good. I have a longer story about why I'm dressed as a pumpkin, but I'm going to save that for later in this section because I do want to get started. We have a lot of exciting things to go over this week. Most of them relate directly to this week's problem set, misspellings. We're going to be going over linked lists and hash tables for the entire section. I put this list up every week, a list of resources for you to help you with the material on this course. If at a loss or if looking for some more information, check out one of these resources. Again, pset6 is misspellings, this week's pset. And it also encourages you, and I encourage you, to use some other resources specifically for this pset. In particular, the three I've listed up on the screen-- gdb, which we've been familiar with and been using for a while now, is going to be very helpful this week. So I put that up here. But whenever you're working with C, you should always be using gdb to debug your programs. This week also valgrind. Does anybody know what valgrind does? AUDIENCE: It checks for memory leaks? JASON HIRSCHHORN: Valgrind checks for memory leaks. So if you malloc something in your program, you're asking for memory. At the end of your program, you have to write free on everything you've malloced to give the memory back. If you don't write free at the end and your program comes to a conclusion, everything will automatically be freed. And for small programs, it's not that big a deal. But if you're writing a longer running program that doesn't quit, necessarily, in a couple of minutes or a couple of seconds, then memory leaks can become a huge deal. So for pset6, the expectation is that you will have zero memory leaks with your program. To check for memory leaks, run valgrind and it'll give you some nice output letting you know whether or not everything was free. We'll practice with it later today, hopefully. Finally, the diff command. You used something similar to it in pset5 with the peek tool. Allowed you to look inside. You also used diff, too, per the problem set spec. But in allowed you to compare two files. You could compare the bitmap file and info headers of a staff solution and your solution in pset5 if you chose to use it. Diff will allow you to do that, as well. You can compare the correct answer for this week's problem set to your answer and see if it lines up or see where the errors are. So those are three good tools that you should use for this week, and definitely check your program with these three tools before turning it in. Again, as I have mentioned every week, if you have any feedback for me-- both positive and constructive-- feel free to head to the website at the bottom of this slide and input it there. I really appreciate any and all feedback. And if you give me specific things that I can do to improve or that I'm doing well that you would like me to continue, I take that to heart and really try hard to listen to your feedback. I can't promise I'm going to do everything, though, like wearing a pumpkin costume every week. So we are going to spend the bulk of section, as I mentioned, talking about linked lists and hash tables, which will be directly applicable to the problem set this week. Linked lists we'll go over relatively quickly because we've spent a fair bit of time going over it in section. And so we'll get straight into the coding problems for linked lists. And then at the end we'll talk about hash tables and how they apply to this week's problem set. You've seen this code before. This is a struct, and it is defining something new called a node. And inside a node there is an integer right here and there is a pointer to another node. We've seen this before. This has been coming up for a couple of weeks now. It combines pointers, which we've been working with, and structs, which allow us to combine two different things into one data type. There's a lot going on on the screen. But all of it should be relatively familiar with you. On the first line, we declare a new node. And then inside that new node, I set the integer in that node to one. We see on the next line I'm doing a printf command, but I've grayed out the printf command because the really important part is this line here-- new_node.n. What does the dot mean? AUDIENCE: Go to the node and assess the n value for it. JASON HIRSCHHORN: That's exactly right. Dot means access the n part of this new node. This next line does what? Michael. AUDIENCE: It creates another node that will point to the new node. JASON HIRSCHHORN: So it doesn't create a new node. It creates a what? AUDIENCE: A pointer. JASON HIRSCHHORN: A pointer to a node, as indicated by this node* here. So it creates a pointer to a node. And which node is it pointing to, Michael? AUDIENCE: New node? JASON HIRSCHHORN: New node. And it's pointing there because we've given it the address of new node. And now in this line we see two different ways of expressing the same thing. And I wanted to point out how these two things are the same. In the first line, we dereference the pointer. So we go to the node. That's what this star means. We've seen that before with pointers. Go to that node. That's in parentheses. And then access via the dot operator the n element of that node. So that's taking the syntax we saw right here and now using it with a pointer. Of course, it gets kind of busy if you're writing those parentheses-- that star and that dot. It gets a little busy. So we have some syntactic sugar. And this line right here-- ptr_node->n. That does the same exact thing. So those two lines of code are equivalent and will do the exact same thing. But I wanted to point those out before we go any further so you understand that really this thing right here is just syntactic sugar for dereferencing the pointer and then going to the n part of that struct. Any questions about this slide? OK. So we're going to go through a couple of operations that you can do on linked lists. A linked list, recall, is a series of nodes that point to one another. And we generally start with a pointer called head, generally, that points to the first thing in the list. So on the first line here, we have our original L first. So that thing you can think of-- this text right here you can think of as just the pointer we've stored somewhere that points to the first element. And in this linked list we have four nodes. Each node is a big box. The larger box inside the big box is the integer part. And then we have a pointer part. These boxes aren't drawn to scale because how big is an integer in bytes? How big now? Four. And how big's a pointer? Four. So really, if we were to draw this to scale both boxes would be the same size. In this case, we want to insert something into the linked list. So you can see down here we're inserting five We traverse through the linked list, find where five goes, and then insert it. Let's break that down and go a little bit more slowly. I'm going to point to the board. So we have our node five that we've created in mallocs. Why is everybody laughing? Just kidding. OK. So we've malloced five. We've created this node somewhere else. We have it ready to go. We start at the front of our list with two. And we want to insert in a sorted fashion. So if we see two and we want to put in five, what do we do when we see something less than us? What? We want to insert five into this linked list, keeping it sorted. We see number two. So what do we do? Marcus? AUDIENCE: Call the pointer to the next node. JASON HIRSCHHORN: And why do we go to the next one? AUDIENCE: Because it's the next node in the list. And we only know that other location. JASON HIRSCHHORN: And five is greater than two, in particular. Because we want to keep it sorted. So five is greater than two. So we move on to the next one. And now we reach four. And what happens when we reach four? Five is greater than four. So we keep going. And now we're at six. And what do we see at six? Yes, Carlos? AUDIENCE: Six is greater than five. JASON HIRSCHHORN: Six is greater than five. So that's where we want to insert five. However, keep in mind that if we only have one pointer here-- this is our extra pointer that's traversing through the list. And we're pointing to six. We've lost track of what comes before six. So if we want to insert something into this list keeping it sorted, we probably need how many pointers? AUDIENCE: Two. JASON HIRSCHORN: Two. One to keep track of the current one and one to keep track of the previous one. This is only a singly linked list. It only goes one direction. If we had a doubly linked list, where everything was pointing to the thing after it and the thing before it, then we wouldn't need to do that. But in this case we don't want to lose track of what came before us in case we need to insert five somewhere in the middle. Say we were inserting nine. What would happen when we got to eight? AUDIENCE: You'd have to get that null point. Instead of having null point you'd have to add an element and then have it point to nine. JASON HIRSCHORN: Exactly. So we get eight. We reach the end of the list because this is pointing to null. And now, instead of having it point to null we have it point to our new node. And we set the pointer in our new node to null. Does anybody have any questions about inserting? What if I don't care about keeping the list sorted? AUDIENCE: Stick it at the beginning or the end. JASON HIRSCHORN: Stick it at the beginning or the end. Which one should we do? Bobby? Why the end? AUDIENCE: Because the beginning is already filled. JASON HIRSCHORN: OK. The beginning is already filled. Who wants to argue against Bobby. Marcus. AUDIENCE: Well you probably want to stick it at the beginning because otherwise if you put it at the end you'd have to traverse the entire list. JASON HIRSCHORN: Exactly. So if we're thinking about runtime, the runtime of inserting at the end would be n, the size of this. What's the big O runtime of inserting at the beginning? Constant time. So if you don't care about keeping something sorted, much better to just insert at the beginning of this list. And that can be done in constant time. OK. Next operation is find, which other-- we've phrased this as search. But we're going to look through the linked list for some object. You guys have seen code for search before in lecture. But we sort of just did it with insert, or at least inserting something sorted. You look through, going node by node, until you find the number that you're looking for. What happens if you reach the end of the list? Say I'm looking for nine and I reach the end of the list. What do we do? AUDIENCE: Return false? JASON HIRSCHORN: Return false. We didn't find it. If you reach the end of the list and you didn't find the number you're looking for, it's not in there. Any questions about find? If this was a sorted list, what would be different for our searching? Yeah. AUDIENCE: It would find the first value that's greater than the one you're looking for and then return false. JASON HIRSCHORN: Exactly. So if it's a sorted list, if we get to something that's greater than what we're looking for, we don't need to keep going to the end of the list. We can at that point return false because we're not going to find it. The question is now, we've talked about keeping linked lists sorted, keeping them unsorted. That's going to be something you're probably going to have to think about when coding problem set five if you choose a hash table with separate chaining approach, which we'll talk about later. But is it worth it to keep the list sorted and then be able to maybe have quicker searches? Or is it better to quickly insert something in constant runtime but then have longer searching? That's a tradeoff right there that you get to decide what is more appropriate for your specific problem. And there's not necessarily one absolutely right answer. But it's certainly a decision you get to make, and probably good to defend that in, say, a comment or two why you chose one over the other. Finally, deleting. We've seen deleting. It's similar to searching. We look for the element. Say we're trying to delete six. So we find six right here. The thing that we have to make sure we do is that whatever is pointing to six-- as we see in step two down here-- whatever's pointing to six needs to skip six now and be changed to whatever six is pointing to. We don't want to ever orphan the rest of our list by forgetting to set that previous pointer. And then sometimes, depending on the program, they'll just delete this node entirely. Sometimes you'll want to return the value that's in this node. So that's how deleting works. Any questions on delete? AUDIENCE: So if you're going to delete it, would you just use free because presumably it was malloced? JASON HIRSCHORN: If you want to free something that's exactly right and you malloced it. Say we wanted to return this value. We might return six and then free this node and call free on it. Or we'd probably call free first and then return six. OK. So let's move on to practice coding. We're going to code three functions. The first one is called insert_node. So you have code that I emailed you, and if you're watching this later on you can access the code in linked.c on the CS50 website. But in linked.c, there's some skeleton code that's already been written for you. And then there's a couple functions you need to write. First we're going to write insert_node. And what insert_node does is inserts an integer. And you're giving the integer into a linked list. And in particular, you need to keep the list sorted from smallest to largest. Also, you don't want to insert any duplicates. Finally, as you can see insert_node returns a bool. So you're supposed to let the user know whether or not the insert was successful by returning true or false. At the end of this program-- and for this stage you don't need to worry about freeing anything. So all you're doing is taking an integer and inserting it into a list. That is what I'm asking you to do now. Again, in the linked.c, which you all have, is the skeleton code. And you should see towards the bottom the sample function declaration. However, before going into coding it in C, I highly encourage you to go through the steps we've been practicing each week. We've already gone through a picture of this. So you should have some understanding of how this works. But I would encourage you to write some pseudocode before diving in. And we're going to go over pseudocode as a group. And then once you've written your pseudocode, and once we've written our pseudocode as a group, you can go into coding it in C. As a heads up, the insert_node function is probably the trickiest of the three we're going to write because I added some additional constraints to your programming, in particular that you're not going to insert any duplicates and that the list should remain sorted. So this is a non-trivial program that you need to code. And why don't you take five to seven minutes just to get working on the pseudocode and the code. And then we will start going as a group. Again, if you have any questions just raise your hand and I'll come around. . We also generally do these-- or I don't explicitly say you can work with people. But obviously, I highly encourage you, if you have questions, to ask the neighbor sitting next to you or even work with somebody else if you want to. This does not have to be an individual silent activity. Let's start with writing some pseudocode on the board. Who can give me the first line of pseudocode for this program? For this function, rather-- insert_node. Alden? AUDIENCE: So the first thing I did was create a new pointer to the node and I initialized it pointing to the same thing that list is pointing to. JASON HIRSCHORN: OK. So you're creating a new pointer to the list, not to the node. AUDIENCE: Right. Yeah. JASON HIRSCHORN: OK. And then what do we want to do? What's after that? What about the node? We don't have a node. We just have a value. If we want to insert a node, what do we need to do first before we can even think about inserting it? AUDIENCE: Oh, sorry. we need to malloc space for a node. JASON HIRSCHORN: Excellent. Let's do-- OK. Can't reach that high. OK. We're going to go down, and then we're using two columns. I can't go that-- OK. Create a new node. You can create another pointer to list or you can just use list as it exists. You don't really need to do that. So we create a new node. Great. That's what we do first. What's next? AUDIENCE: Wait. Should we create a new node now or should we wait to make sure that there's no duplicates of the node on the list before we create it? JASON HIRSCHORN: Good question. Let's hold that for later because the majority of the time we'll be creating a new node. So we'll keep that here. But that's a good question. If we create it and we find a duplicate, what should we do before returning? AUDIENCE: Free it. JASON HIRSCHORN: Yeah. Probably free it. OK. What do we do after we create a new node? Annie? AUDIENCE: We put the number in the node? JASON HIRSCHORN: Exactly. We put the number-- we malloc space. I'm going to leave that all as one line. But you're right. We malloc space, and then we put the number in. We can even set the pointer part of it to null. That's exactly right. And then what about after that? We drew this picture on the board. So what do we do? AUDIENCE: We go through the list. JASON HIRSCHORN: Go through the list. OK. And what do we check for at each node. Kurt, what do we check for at each node? AUDIENCE: See whether the n value of that node is greater than the n value of our node. JASON HIRSCHORN: OK. I'm going to do-- yeah, OK. So it's n-- I'm going to say if value is greater than this node, then what do we do? AUDIENCE: Well, then we insert the thing right before that. JASON HIRSCHORN: OK. So if it's greater than this, then we want to insert. But we want to insert it right before because we also would need to be keeping track, then, of what was before. So insert before. So we probably missed something earlier on. We probably need to be keeping track of what's going on. But we'll get back there. So what value is less than? Kurt, what do we do if value is less than? AUDIENCE: Then you just keep going unless it's the last one. JASON HIRSCHORN: I like that. So go to the next node. Unless it's the last one-- we're probably checking for that in the terms of a condition. But yeah, next node. And that's getting too low, so we'll move over here. But if-- can everybody see this? If we're equal what do we do? If the value we're trying to insert is equal to this node's value? Yeah? AUDIENCE: [INAUDIBLE]. JASON HIRSCHORN: Yeah. Given this-- Marcus is right. We could have maybe done something different. But given that we've created it, here we should free and then return. Oh boy. Is that better? How's that? OK. Free and then what do we return, [INAUDIBLE]? OK. Are we missing anything? So where are we keeping track of the prior node? AUDIENCE: I think it would go after create a new node. JASON HIRSCHORN: OK. So at the beginning we'll probably-- yeah, we can create a pointer to a new node, like a previous node pointer and a current node pointer. So let's insert that here. Create current and previous pointers to the nodes. But when do we adjust those pointers? Where do we do that in the code? Jeff? AUDIENCE: -- value conditions? JASON HIRSCHORN: Which one in particular? AUDIENCE: I'm just confused. If value is greater than this node, doesn't that mean that you want to go to the next node? JASON HIRSCHHORN: So if our value is greater than the value of this node. AUDIENCE: Yeah, then you'd want to go further down the line, right? JASON HIRSCHHORN: Right. So we don't insert it here. If value is less than this node, then we go to the next node-- or then we insert before. AUDIENCE: Wait, which is this node and which is value? JASON HIRSCHHORN: Good question. Value per this function definition is what we're given. So value is the number we're given. So if the value is less than this node, we need time to insert. If value is greater than this node, we go to the next node. And back to the original question, though, where-- AUDIENCE: If value is greater than this node. JASON HIRSCHHORN: And so what do we do here? Sweet. That is correct. I'm just going to write update pointers. But yes, with the current one you would update it to point to the next one. Anything else we're missing? So I'm going to type this code into gedit. And while I do this, you can have a couple more minutes to work on coding this in C. So I have input the pseudocode. A quick note before we get started. We may not be able to completely finish this in all three of these functions. There is correct solutions to them that I will email out to you guys after section, and it will be posted on CS50.net. So I don't encourage you to go look at the sections. I encourage you to try these on your own, and then use the the practice problems to check your answers. These have all been designed to closely relate to and adhere to what you have to do on the problem set. So I do encourage you to practice this on your own and then use the code to check your answers. Because I do want to move on to hash tables at some point in the section. So we might not get through it all. But we'll do as much we can now. OK. Let us begin. Asam, how do we create a new node? AUDIENCE: You do struct*. JASON HIRSCHHORN: So we have that up here. Oh, sorry. You were saying struct*. AUDIENCE: And then [? kind ?] node or c node. JASON HIRSCHHORN: OK. I'm going to call it new_node so we can stay consistent. AUDIENCE: And you want to set that to head, the first node. JASON HIRSCHHORN: OK. So now this pointing to-- so this hasn't created a new node yet. This is just pointing to the first node in the list. How do I create a new node? If I need space to create a new node. Malloc. And how big? AUDIENCE: The size of the struct. JASON HIRSCHHORN: The size of the struct. And what's the struct called? AUDIENCE: Node? JASON HIRSCHHORN: Node. So malloc(sizeof(node)); gives us space. And is this line-- one thing is incorrect on this line. Is new_node a pointer to a struct? That's a generic name. What is it-- node, exactly. It's a node*. And what do we do right after we malloc something, Asan? What's the first thing we do ? What if it doesn't work? AUDIENCE: Oh, check if it points to the node? JASON HIRSCHHORN: Exactly. So if you new_node equals equals null, what do we do? This returns a bool, this function. Exactly. Looks good. Anything to add there? We'll add things at the end. But that so far looks good. Create current and previous pointers. Michael, how do I do this? AUDIENCE: You would have to do a node*. You'd have to make one not for new_node but for the nodes we already have. JASON HIRSCHHORN: OK. So the current node we're on. I'll call that curr. All right. We've decided we want to keep two because we need to know what's before it. What do they get initialized to? AUDIENCE: Their value in our list. JASON HIRSCHHORN: So what is the first thing on our list? Or how do we know where the beginning of our list is? AUDIENCE: Isn't it passed into the function? JASON HIRSCHHORN: Right. It was passed in right here. So if it's passed into the function, the start of the list, what should we set current equal to? AUDIENCE: List. JASON HIRSCHHORN: List. That's exactly right. Now it has the address of the start of our list. And what about previous? AUDIENCE: List minus one? JASON HIRSCHHORN: There's nothing before it. So what can we do to signify nothing? AUDIENCE: Null. JASON HIRSCHHORN: Yeah. That sounds like a good idea. Perfect. Thank you. Go through the list. Constantine, how long are we going to go through the list? AUDIENCE: Until We reach null. JASON HIRSCHHORN: OK. So if, while, for loop. What are we doing? AUDIENCE: Maybe a for loop? JASON HIRSCHHORN: Let's do a for loop. OK. AUDIENCE: And we say for-- until the current pointer is not equal to null. JASON HIRSCHHORN: So if we know the condition, how can we write a loop based off that condition. What kind of a loop should we use? AUDIENCE: While. JASON HIRSCHHORN: Yeah. That makes more sense based off of what you said. If we just want to go into we it would just know that thing, it would make sense to do a while loop. While current does not equal null, if value is less than this node. Akshar, give me this line. AUDIENCE: If current->n n less than value. Or reverse that. Switch that bracket. JASON HIRSCHHORN: Sorry. AUDIENCE: Change the bracket. JASON HIRSCHHORN: So if it's greater than value. Because that's confusing with the comment above, I'm going to do that. But yes. If our value is less than this node, what do we do? Oh. I have it right here. Insert before. OK. How do we do that? AUDIENCE: Is it still me? JASON HIRSCHHORN: Yeah. AUDIENCE: You-- new_node->next. JASON HIRSCHHORN: So what's that going to equal? AUDIENCE: It's going to equal current. JASON HIRSCHHORN: Exactly. And so the other-- what else do we need to update? AUDIENCE: Check if past equals null. JASON HIRSCHHORN: If prev-- so if prev equals null. AUDIENCE: That means it's going to become the head. JASON HIRSCHHORN: That means it's become the head. So then what do we do? AUDIENCE: We do head equals new_node. JASON HIRSCHHORN: Head equals new_node. And why head here, not list? AUDIENCE: Because head is a global variable, which is the starting place. JASON HIRSCHHORN: Sweet. OK. And-- AUDIENCE: Then you do else prev-> next equals new_node. And then you return true. JASON HIRSCHHORN: Where do we set new_node end? AUDIENCE: I would-- I set that at the start. JASON HIRSCHHORN: So what line? AUDIENCE: After the if statement checking if it's known. JASON HIRSCHHORN: Right here? AUDIENCE: I'd do new_node->n equals value. JASON HIRSCHHORN: Sounds good. Probably it makes sense-- we don't need to know what list we're on because we're only dealing with one list. So a better function declaration for this is just to get rid of this entirely and just insert a value into head. We don't even need to know what list we're in. But I will keep it for now and then change it upon updating the slides and code. So that looks good for now. If value-- who can do this line? If-- what do we do here, Noah. AUDIENCE: If value is greater than curr->n-- JASON HIRSCHHORN: How do we go to the next node? AUDIENCE: Curr->n is equal to new_node. JASON HIRSCHHORN: So n is what part of the struct? The integer. And new_node is a pointer to a node. So what part of curr should we update? If not n, then what's the other part? Noah, what's the other part. AUDIENCE: Oh, next. JASON HIRSCHHORN: Next, exactly. Exactly. Next is the right one. And what else do we need to update, Noah? AUDIENCE: The pointers. JASON HIRSCHHORN: So we updated current. AUDIENCE: Prev->next. JASON HIRSCHHORN: Yeah. OK, we'll pause. Who can help us out here? Manu, what should we do? AUDIENCE: You've got to set it equal to curr->next. But do that before the previous line. JASON HIRSCHHORN: OK. Anything else? Akshar. AUDIENCE: I don't think you're meant to change curr->next. I think you're meant to do curr equals curr->next to go to the next node. JASON HIRSCHHORN: So sorry, where? On what line? This line? AUDIENCE: Yeah. Make curr equals curr->next. JASON HIRSCHHORN: So that's correct because current is a pointer to a node. And we want it to point to the next node of what's getting currently pointed to. Curr itself has a next. But if we were to update curr.next, we would be updating the actual note itself, not where this pointer was pointing. What about this line, though. Avi? AUDIENCE: Prev->next equals curr. JASON HIRSCHHORN: So again, if prev is a pointer to a node, prev->next is the actual pointer in the node. So this would be updating a pointer in a node to curr. We don't want to update a pointer in a node. We want to update previous. So how do we do that? AUDIENCE: It would just be prev. JASON HIRSCHHORN: Right. Prev is a pointer to a node. Now we're changing it to a new pointer to a node. OK Let us move down. Finally, this last condition. Jeff, what do we do here? AUDIENCE: If value is equal to curr->n. JASON HIRSCHHORN: Sorry. Oh my goodness. What? Value==curr->n. What do we do? AUDIENCE: You'd free our new_node, and then you'd return false. JASON HIRSCHHORN: This is what we have written so far. Does anybody have anything to add before we make? OK. Let's try it. Control may reach the end of a non-void function. Avi, what's going on? AUDIENCE: Are you supposed to put return true outside of the while loop? JASON HIRSCHHORN: I don't know. Do you want me to? AUDIENCE: Never mind. No. JASON HIRSCHHORN: Akshar? AUDIENCE: I think you meant to put return false at the end of the while loop. JASON HIRSCHHORN: So where do you want it to go? AUDIENCE: Like outside the while loop. So if you exit the while loop that means that you've reached the end and nothing's happened. JASON HIRSCHHORN: OK. So what do we do in here? AUDIENCE: You return false there as well. JASON HIRSCHHORN: Oh, we do it in both places? AUDIENCE: Yeah. JASON HIRSCHHORN: OK. Should we go? Oh my goodness. I'm sorry. I apologize for the screen. It's kind of freaking out on us. So choose an option. Zero, per the code, quits the program. One inserts something. Let's insert three. The insert was not successful. I'm going to print out. I don't have anything. OK. Maybe that was just a fluke. Insert one. Not successful. OK. Let's run through GDB really quickly to check out what is going on. Remember gdb./ the name of your program gets us into GDB. Is that a lot to handle? The flashing? Probably. Close your eyes and take some deep breaths if you get tired of looking at it. I'm in GDB. What's the first thing I do in GDB? We've got to figure out what's going on here. Let's see. We have six minutes to figure out what's going on. Break main. And then what do I do? Carlos? Run. OK. Let's choose an option. And what does N do? Next. Yeah. AUDIENCE: Didn't you mention-- didn't you say that the head, it was initialized to null at the beginning. But I thought you said that was OK. JASON HIRSCHHORN: Let's go-- let's look in GDB, and then we'll go back. But it sounds like you already have some ideas about what's going on. So we want to insert something. OK. We have insert. Please enter an int. We'll insert three. And then I'm on this line. How do I go start debugging the insert known function? Oh my goodness. That's a lot. Is that freaking out a lot? AUDIENCE: Oh, it died. JASON HIRSCHHORN: I just pulled it out. OK. AUDIENCE: Maybe it's the other end of the wire. JASON HIRSCHHORN: Wow. So the bottom line-- what did you say? AUDIENCE: I said the irony of technical difficulties in this class. JASON HIRSCHHORN: I know. If only I had control over that part. [INAUDIBLE] That sounds great. Why don't you guys start thinking about what we could have done wrong, and we will be back in 90 seconds. Avica, I'm going to ask you how to go inside insert_node to debug it. So this is where we last left off. How do I go inside insert_node, Avica, to examine what's going on? What GDB command? Break would not take me inside. Does Marquise know? AUDIENCE: What? JASON HIRSCHHORN: What GDB command I use to go inside this function? AUDIENCE: Step? JASON HIRSCHHORN: Step via S. That takes me inside. OK. New_node mallocing some space. That all looks like its going. Let's examine new_node. It got some memory address. Let's check-- that is all correct. So everything here seems to be working correctly. AUDIENCE: What's the difference between P and display? JASON HIRSCHHORN: P stands for print. And so you're asking what's the difference between that and this? In this case, nothing. But generally there are some differences. And you should look in the GDB manual. But in this case, nothing. We tend to use print, though, because we don't need to do much more than print a single value. OK. So we are on line 80 of our code, setting node* curr equal to list. Let us print out curr. It equals list. Sweet. Wait. It equals something. That doesn't seem right. There we go. It's because in GDB, right, if it's the line you're on it hasn't executed yet. So you need to actually type next to execute the line before seeing its results. So here we are. We just executed this line, previous equals null. So again, if we print previous we won't see anything weird. But if we actually execute that line, then we will see that that line worked. So we have curr. Those are both good. Right? Now we're on this line right here. While curr does not equal null. Well, what does curr equal? We just saw it equalled null. We printed it out. I'll print it out again. So is that while loop going to execute? AUDIENCE: No. JASON HIRSCHHORN: So when I typed that line, you see we jumped all the way down to the bottom, return false. And then we're going to return false and go back to our program and eventually print out, like we saw, the insert was not successful. So, anybody have any ideas on what we need to do to fix this? I'm going to wait until I see a couple of hands go up. We didn't execute this. Keep in mind, this was the first thing we were doing. I'm not going to do a couple. I'm going to do a few. Because a couple means two. I'll wait for more than two. The first insertion, curr, by default equals null. And this loop only executes if curr is not null. So how can I get around this? I see three hands. I'll wait for more than three. Marcus, what do you think? AUDIENCE: Well, if you need it to execute more than once, you just change it to a do-while loop. JASON HIRSCHHORN: OK. Will that solve our problem, though? AUDIENCE: In this case no because of the fact that the list is empty. So then you probably just need to add a statement that if the loop exits then you have to be at the end of the list, at which point you can just insert it. JASON HIRSCHHORN: I like that. That makes sense. If the loop exits-- because it'll return false here. So if the loop exits, then we're at the end of the list, or maybe the start of a list if there's nothing in it, which is the same as the end. So now we want to insert something here. So how does that code look, Marcus? AUDIENCE: If you already got the node malloced, you could just say new_node->next equals null because it has to be at the end. Or new_node->next equals null. JASON HIRSCHHORN: OK. Sorry. New_node->next equals null because we're at the end. That doesn't put it in. How do we put it in the list? Right. That's just setting it equal to. No how do we actually put it in the list? What's pointing to the end of the list? AUDIENCE: Head. JASON HIRSCHHORN: Sorry? AUDIENCE: Head is pointing to the end of the list. JASON HIRSCHHORN: If there's nothing in the list, head is pointing to the end of the list. So that'll work for the first insertion. What about if there are a couple things in the list? Than we don't want to set head equal to new_node. What do we want to do there? Yeah? Probably previous. Will that work? Recall that previous is just a pointer to a node. And previous is a local variable. So this line will set a local variable, previous, equal to or pointing to this new node. That won't actually put it in our list, though. How do we put it in our list? Akchar? AUDIENCE: I think you do current->next. JASON HIRSCHHORN: OK. curr->next. So again, the only reason we're down here is, what does current equal? AUDIENCE: Equals null. JASON HIRSCHHORN: And so what happens if we do null->next? What do we going to get? We'll get a segmentation fault. AUDIENCE: Do curr equals null. JASON HIRSCHHORN: That's the same thing as prev, though, because there's a local variable we're setting equal to this new node. Let's go back to our picture of inserting something. Say we're inserting at the end of the list, so right here. We have a current pointer that's pointing to null and a previous point that's pointing to 8. So what do we need to update, Avi? AUDIENCE: Previous->next? JASON HIRSCHHORN: Previous->next is what we want to update because that will actually insert it at the end of the list. We still have one bug, though, that we're going to run into. What's that bug? Yeah? AUDIENCE: It's going to return false in this case? JASON HIRSCHHORN: Oh, is is going to return false. But there's another bug. So we'll need to put in return true. AUDIENCE: Does previous still equal null at the top of the list? JASON HIRSCHHORN: So previous still equals null at the very beginning. So how can we get over that? Yeah? AUDIENCE: I think you can do a check before the while loop to see if it's an empty list. JASON HIRSCHHORN: OK. So let's go here. Do a check. If-- AUDIENCE: So if head equals equals null. JASON HIRSCHHORN: If head equals equals null-- that'll tell us if it's an empty list. AUDIENCE: And then you do head equals new. JASON HIRSCHHORN: Head equals new_node? And what else do we need to do? AUDIENCE: And then you return true. JASON HIRSCHHORN: Not quite. We're missing one step. AUDIENCE: New_node next has to point to null. JASON HIRSCHHORN: Exactly, Alden. And then we can return true. OK. But it's still a good idea to do things at the end of the list, right? All right. We still might actually get to the end of the list. So is this code fine if we're at the end of the list and there are some things in the list? Right? Because we still have Marcus's idea. We might exit this loop because we're at the end of the list. So do we still want this code down here? AUDIENCE: Yes. JASON HIRSCHHORN: Yeah. And what do we need to change this to? True. Does that sound good to everyone so far? Anybody have any-- Avi, do you have something to add? AUDIENCE: No. JASON HIRSCHHORN: OK. So we've made a couple of changes. We've made this check before we went in for an empty list. So we've taken care of an empty list. And here we took care of inserting something at the end of the list. So it seems like this while loop taking care of things in between, somewhere in the list if there are things in the list. OK. Let us run this program again. Not successful. AUDIENCE: You didn't make it. JASON HIRSCHHORN: Oh, I didn't make it. Good point, Michael. Let's add a make linked. Line 87 there's an error. Line 87. Alden, this was the line you gave me. What's wrong? AUDIENCE: It has to be to null. JASON HIRSCHHORN: Excellent. Exactly right. It should be null. Let's make again. Compile. OK. Let's insert three. The insert was successful. Let's print it out. Oh, if only we could check. But we haven't done the print function yet. Let's enter something else. What should we enter? AUDIENCE: Seven. JASON HIRSCHHORN: Seven? AUDIENCE: Yes. JASON HIRSCHHORN: We have a seg fault. So we got one, but we clearly can't get two. It is 5:07. So we could debug this for three minutes. But I'm going to leave us here and move on to hash tables. But again, the answers for this code I will email it to you in a bit. We are very close to it. I highly encourage you to figure out what's going on here and fix it. So I'll email you this code as well plus the solution-- probably the solution later on. First this code. The other thing I want to do before we finish is we haven't freed anything. So I want to show you what valgrind looks like. If we run valgrind boundaries on our program, ./linked. Again, according to this slide, we should run valgrind with some type of option, in this case --leak-check=full. So let's write valgrind --leak-check=full. So this will run valgrind on our program. And now the program actually runs. So we're going to run it just like before, put something in. I'm going to put in three. That works. I'm not going to try to put in something else because we're going to get a seg false in that case. So I'm just going to quit. And now you see down here leak and heap summary. These are the good things that you want to check out. So the heap summary-- it says, in use at exit-- eight bytes in one block. That one block is the node we malloced. Michael, you said before a node is eight bites because it has the integer and the pointer. So that's our node. And then it says we used malloc seven times and we freed something six times. But we never called free, so I have no idea what this is talking about. But suffice it to say that when your program runs, malloc is being called in some other places that we don't need to worry about. So malloc was probably called in some places. We don't need to worry where. But this is really us. This first line is us. We left that block. And you can see that here in the leak summary. Still reachable-- eight bytes in one block. That means that memory-- we have leaked that memory. Definitely lost-- something is lost for good. Generally, you won't see anything there. Still reachable is generally where you'll see things, where you'll want to look to see what code should you have freed but you forgot to free. And then if this wasn't the case, if we did free everything, we can check that. Let's just run the program not putting in anything. You'll see down here in use at exit-- zero bytes in zero blocks. That means we had nothing left when this program exited. So before turning in pset6, run valgrind and make sure you do not have any memory leaks in your program. If you have any questions with valgrind, feel free to reach out. But this is how you use it. Very simple-- see if you have in use at exit-- any bytes in any blocks. So we were working on insert node. I had two other functions here-- print nodes and free nodes. Again, these are functions that are going to be good for you to practice because they will help you not only with these sample exercises but also on the problem set. They map on pretty closely to things you're going to have to do in the problem set. But I do want to make sure we touch on everything. And hash tables are also crucial to what we're doing in section this week-- or in the problem set. So we're going to finish the section talking about hash tables. If you notice I made a little hash table. That is not what we're talking about, however. We are talking about a different type of hash tables. And at its core, a hash table is nothing more than an array plus a hash function. We're going to talk for a bit just to make sure everybody understands what a hash function is. And I'm telling you now that it is nothing more than two things-- an array and a hash function. And here are the steps through which this operates. There's our array. There's our function. In particular, hash functions need to do a couple of things with this. I'm going to talk specifically about this problem set. It's probably going to take in a string. And what's it going to return? What data type? Alden? Your hash function return? An integer. So this is what the hash table consists of-- a table in the form of array and a hash function. How it works? It works in three steps. We give it a key. In this case, we'll give it a string. We call the hash function per step one on the key and we get a value. Specifically, we'll say we get an integer. That integer, there are very specific limits to what that integer can be. In this example, our array is of size three. So what numbers can that integer be. What is the range of valid values for that integer, the return type of this hash function? Zero, one and two. The point of the hash function is to figure out the place in the array where our key is going. There are only three possible places here-- zero, one, or two. So this function better return zero, one, or two. Some valid indice in this array. And then depending on where it returns, you can see there array open bracket the value. That's where we put the key. So we throw in the pumpkin, we get out zero. At array bracket 0, we put pumpkin. We throw in cats, we get out one. We put cat at one. We put in spider. We get out two. We put spider at array bracket two. It would be so nice if it worked like that. But unfortunately, as we'll see, it's a bit more complicated. Before we get there, any questions about this basic set-up of a hash table? This is an image of exactly what we drew on the board. But since we drew it on the board, I am not going to go into it further. Essentially keys, the magic black box-- or in this case, teal box-- of a hash function puts them in buckets. And in this example we're not putting the name. We're putting the associated phone number of the name in the bucket. But you could very well just put the name in the bucket. This is just a picture of what we drew on the board. We have potential pitfalls, though. And there are two in particular slides that I want to go over. The first one is about a hash function. So I asked the question, what makes a good hash function? I give two answers. The first is that it's deterministic. In the context of hash functions, what does this mean? Yes? AUDIENCE: It can find the index in constant time? JASON HIRSCHHORN: That is not what it means. But that's a good guess. Anybody else have a guess to what this means? That a good hash function is deterministic? Annie? AUDIENCE: That a key can only be mapped to one place in the hash table. JASON HIRSCHHORN: That's exactly right. Every time you put in pumpkin, it always returns zero. If you put in pumpkin and your hash function returns zero but has a probability of returning something else greater than zero-- so maybe it can return one sometimes or two other times-- that is not a good hash function. You're exactly right. Your hash function should return the same exact integer, in this case, for the same exact string. Maybe it returns the same exact integer for the same exact string regardless of capitalization. But in that case it's still deterministic because multiple things are mapped onto the same value. That's fine. As long as there is only one output for a given input. OK. The second thing is that it returns valid indices. We brought up that earlier. This hash function-- oh boy-- a hash function should return valid indices. So say-- let's go back to this example. My hash function counts up the letters in the word. That's the hash function. And returns that integer. So if I have the word A, it's going to return one. And it's going to put A right here. What if I put in the word bat? It's going to return three. Where does bat go? It doesn't fit. But it needs to go somewhere. This is my hash table after all, and everything needs to go somewhere. So where should bat go? Any thoughts? Guesses? Good guesses? AUDIENCE: Zero. JASON HIRSCHHORN: Why zero? AUDIENCE: Because three modulo three is zero? JASON HIRSCHHORN: Three modulo three is zero. That is a great guess, and that's correct. So in this case it should probably go at zero. So a good way to ensure that this hash function only returns valid indices is to modulo it by the size of the table. If you modulo whatever this returns by three, you're always going to get something between zero, one, and two. And if this always returns seven, and you always modulo by three, you're always going to get the same thing. So it's still deterministic if you modulo. But that will ensure that you never get something-- an invalid industry. Generally, that modulo should happen inside your hash function. So you don't need to worry about this. You just can ensure that this is a valid indice. Any questions on this potential pitfall? OK. And there we go. Next potential pitfall, and this is the big one. What if two keys map to the same value? So there are two ways to handle this. The first one is called linear probing, which I'm not going to go over. But you should be familiar with how that works and what that is. The second one I am going to go over because that is the one that many people will probably end up deciding to use in their problem set. Of course, you don't have to. But for the problem set, many people tend to choose to create a hash table with separate chaining to implement their dictionary. So we're going to go over what it means to create a hash table with separate chaining. So I put in pumpkin. It returns zero. And I put pumpkin here. Then I put in-- what's another Halloween-themed thing? AUDIENCE: Candy. JASON HIRSCHHORN: Candy! That's a great one. I put in candy, and candy also gives me zero. What do I do? Any ideas? Because you all sort of know what separate chaining is. So any ideas what to do? Yeah. AUDIENCE: Putting the string actually in the hash table. JASON HIRSCHHORN: So we're going to draw the good idea over here. OK. AUDIENCE: Have the hashtable [INAUDIBLE] the pointer that points to the beginning of a list. And then have pumpkin be the first value in that linked list and candy be the second value in that linked list. JASON HIRSCHHORN: OK. Marcus, that was outstanding. I'm going to break that down. Marcus is saying don't overwrite pumpkin. That would be bad. Don't put candy somewhere else. We're going to put them both at zero. But we're going to deal with putting them at zero by creating a list at zero. And we're going to create a list of everything that mapped to zero. And the best way we learned to create a list that can grow and shrink dynamically is not within another array. So not a multi-dimensional array. But to just create a linked list. So what he proposed-- I'm going to get a new-- is create an array with pointers, an array of pointers. OK. Any idea or hint what the type of this pointers should be? Marcus? AUDIENCE: Pointers to-- JASON HIRSCHHORN: Because you said a linked list, so-- AUDIENCE: Node pointers? JASON HIRSCHHORN: Node pointers. If the things in our linked list are nodes then they should be node pointers. And what do they equal initially? AUDIENCE: Null. JASON HIRSCHHORN: Null. So there's our empty thing. Pumpkin returns zero. What do we do? Walk me through it? Actually, Marcus already gave me. Somebody else walk me through it. What we do when we-- this looks very similar to what we were just doing. Avi. AUDIENCE: I'm going to take a guess. So when you get candy. JASON HIRSCHHORN: Yeah. Well, we got pumpkin. Let's get our first one. We got pumpkin. AUDIENCE: OK. Pumpkin returns zero. So you put it in that. Or actually, you put it in the linked list. JASON HIRSCHHORN: How do we put it in the linked list? AUDIENCE: Oh, the actual syntax? JASON HIRSCHHORN: Just walk-- say more. What do we do? AUDIENCE: You just insert it as the first node. JASON HIRSCHHORN: OK. So we have our node, pumpkin. And now how do I insert it? AUDIENCE: You assign it to the pointer. JASON HIRSCHHORN: Which pointer? AUDIENCE: The pointer at zero. JASON HIRSCHHORN: So where does this point? AUDIENCE: To null right now. JASON HIRSCHHORN: Well, it's pointing to null. But I'm putting in pumpkin. So where should it point? AUDIENCE: To pumpkin. JASON HIRSCHHORN: To pumpkin. Exactly. So this points to pumpkin. And where does this pointer in pumpkin point? To AUDIENCE: Null. JASON HIRSCHHORN: To null. Exactly. So we just inserted something into the linked list. We just wrote this code to do this. Almost we almost got it completely cracked. Now we insert candy. Our candy also goes to zero. So what do we do with candy? AUDIENCE: It depends on whether or not we're trying to sort it. JASON HIRSCHHORN: That's exactly right. It depends on whether or not we're trying to sort it. Let's assume we're not going to sort it. AUDIENCE: Well then, as we discussed before, it's simplest just to put it right at the beginning so the pointer from zero points to candy. JASON HIRSCHHORN: OK. Hold on. Let me create candy right here. So this pointer-- AUDIENCE: Yeah, should now be pointing to candy. Then have the pointer from candy point to pumpkin. JASON HIRSCHHORN: Like that? And say we got another thing to map to zero? AUDIENCE: Well, you just do the same thing? JASON HIRSCHHORN: Do the same thing. So in this case, if we don't want to keep it sorted it sounds rather simple. We take the pointer in the indice given by our hash function. We have that point to our new node. And then whatever it was pointing to previously-- in this case null, in the second case pumpkin-- that, whatever it's pointing to previously, we add into the next of our new node. We're inserting something in the beginning. In fact this is a lot simpler than trying to keep the list sorted. But again, searching will be more complicated on here. We'll always have to go to the end. OK. Any questions about separate chaining? How that works? Please ask them now. I really want to make sure you all understand this before we head out. AUDIENCE: Why do you put pumpkin and candy into the same part of the hash table? JASON HIRSCHHORN: Good question. Why do we put them in the same part of the hash table? Well, in this case our hash function returns zero for both of them. So they need to go at indice zero because that's where we're going to look for them if we ever want to look them up. Again, with a linear probing approach we wouldn't put them both at zero. But in the separate chain approach, we're going to put them both at zero and then create a list off of zero. And we don't want to overwrite pumpkin simply for that because then we'll assume that pumpkin was never inserted. If we just keep one thing in the location that would be bad. Then there would be no chance of us ever-- if we ever had a duplicate, then we would just erase our initial value. So that's why we do this approach. Or that's why we chose-- but again, we chose the separate chaining approach, which there are many other approaches one could choose. Does that answer your question? OK. Carlos. Linear probing would involve-- if we found a collision at zero, we would look in the next spot to see if it was open and put it there. And then we look in the next sport and see if that was open and put it there. So we find the next available open spot and put it there. Any other questions? Yeah, Avi. AUDIENCE: As a follow up to that, what do you mean by next spot? In the hash table or in a linked list. JASON HIRSCHHORN: For linear programming, no linked lists. The next spot on the hash table. AUDIENCE: OK. So the hash table would be initialized to the size-- like the number of strings that you were inserting? JASON HIRSCHHORN: You would want it to be really big. Yes. Here is a picture of what we just drew on the board. Again, we have a collision right here. at 152. And you'll see we created a linked list off of it. Again, the hash table separate chaining approach is not the one you have to take for problems set six but is one that a lot of students tend to take. So on that note, let us talk briefly before we head out about problem six, and then I'll share a story with you. We have three minutes. Problem set six. You have four functions-- load, check, size, and unload. Load-- well, we've been going over load just now. We drew load on the board. And we even started coding a lot of inserting into a linked list. So load is not much more than what we've just been doing. Check is once you have something loaded. It's the same process as this. The same first two parts where you throw something into the hash function and get its value. But now we're not inserting it. Now we're looking for it. I have sample code written for finding something in a linked list. I encourage you to practice that. But intuitively finding something is pretty similar to inserting something. Indeed, we drew a picture of finding something in a linked list, moving through until you got to the end. And if you got to the end and couldn't find it, then it's not there. So that's check, essentially. Next is size. Let's skip size. Finally you have unload. Unload is one we haven't drawn on the board or coded yet. But I encourage you to try coding it in our sample linked list example. But unload intuitively is similar to free-- or I mean is similar to check. Except for now each time you're going through, you're not simply checking to see if you have your value there. But you're taking that node and freeing it, essentially. That's what unload asks you to do. Free everything you've malloced. So you're going through the whole list again, going through the whole hash table again. This time don't check to see what's there. Just free what's there. And finally size. Size should be implemented. If you do not implement size-- I'll say it like this. If you do not implement size in exactly one line of code including the return statement, you are doing size incorrectly. So make sure size, for full design points, you're doing it in exactly one line of code, including the return statement. And don't pack up yet, Akchar. Eager beaver. I wanted to say thank you guys for coming to section. Have a Happy Halloween. This is my costume. I'll be wearing this on Thursday if I see you at office hours. And if you're curious about some more background as to this costume, feel free to check out 2011 section for a story on why I'm wearing the pumpkin costume. And it is a sad story. So make sure you have some tissues nearby. But on that, if you have any questions I'll stick around outside after section. Good luck on problem set six. And as always, if you have any questions, let me know.