CARTER: All right. Welcome, everyone. This is our second test review session. My name is Carter. I'm joined by Connor, one of our head course assistants. And this is a pointer. When we were asking you all to come up with test review questions, like I'd pose to you all in lecture, or in some fun games later on, we asked you for a particularly hard question. One of you wrote, "Literally anything about pointers." And so I'm delighted to say that we're going to be spending a little bit of time on pointers. Talking about how they're useful. How they can be used in data structures later on. And talk about memory and data structures in this review session. So, before we talk about pointers, though, we have to talk about memory addresses and memory locations. And, if you recall from week four, we had this idea of a computer's memory as basically this collection of grids. And inside each of these blocks, is a byte, eight bits, that represent information, right? And so in some cases, we might care about some of these bytes, we might not about other bytes. Those are garbage values, not used in our program. But again, the study of memory as this sort of block of bytes in our computer. Now going back to representation, we talked about how some of these bytes actually refer to characters. Or to refer to numbers and so on. In this case, the characters are one byte, and these bits here represent this sequence of characters, "HI!" exclamation point. Where there's no character at the end. 4 bytes where each character is one byte. Now, this is all fine and good for storing these values in our memory, right? But it's also good to have a way to talk about the location of these places. If you want to refer back to them, if we want to use these later on in our program, it's good to have a way to talk about where these things are stored. And that's where memory addresses come in. Where we can start to give a name to these different locations. Different blocks in our computer's memory. So here we have hexadecimal notation, which you may recall, is basically talking in base 16, where instead of having a single digit correspond to numbers 0 through 9, they can also go up to A through F, right? Having 16 different values we can represent in a single place. And we notate this base 16 notation with this 0x at the beginning. So we have 0x0 for this very first location memory. 0x1 and so on. We count up to nine, but then we can go past nine, and we can say A B C D E F, and so on, having 16 different possible values we can show in that single digits place. So this works well for our basically 16-- or 4 by 4 grid, or 16 different possible values here. So, as a brief reflection on this, we have this idea of mapping memory locations. Let's say I want the value at 0x0, well what character am I talking about in this case? Feel free to message me in the chat. Message me or Connor, here. Which character does 0x0 refer to? All right. So I'm seeing H, in the chat. And it's definitely the case, that 0x0 refers to H, right? If we go back to our little map here, we see that 0x0 is the location of this first byte. It certainly refers back to H. But notice how 0x0 is not H, it's just pointing to H, right? It's not the actual value, it's just telling us, hey this is where H is stored. And this is what we call a pointer, right? A pointer is literally a memory address. A place in memory we can point to and say, this is where this certain value is stored. It's not that value. It's just the location of that value. And to show an example by contrast, let's say I wanted to store a value like H, and name it something, right? Maybe I want to store this character H and call it C. Well, what would happen here, is my compiler would go ahead and find location memory, put H there, and sort of name that, quote, unquote, "c." So whenever I talk about c, I get that value H. I order print c. I get that value H. So, c doesn't point to H, c is like a name for H, right? It's different than the pointer, it's like a name for H. But here, we started to add in this syntax of ampersands and stars, and talking about more than just about naming variables, getting their address and their location. So, before we sort of dive into the syntax, let's try to break it down, and read it from right to left. We had first this &c, right? Well, what does the ampersand mean? I have this mnemonic I really like to think about in my head, which is ampersand begins with A, and so does address. So ampersand sort of corresponds to this idea of an address in a computer's memory. If I were to say &c what I'm asking for is the address of c right? &c address of c And of course, in this case, since c refers to H, the address of C is 0x0. So, basically that's evaluating that right side of this expression first. But then, we can also talk about what's going on with this left hand side, right? This char *p, which is basically defining for us, saying, I want a pointer called p, that points to character. And this works out for us, right? Because we know a pointer is an address. And here we're actually giving an address to this pointer, right? So this star notation is saying p is a pointer to a character. It's not a character. It's not the actual value H, it's just a pointer to that character. And we call it p. But you know, what happened with that asterisk, right? We originally used it to define p, but now it's gone. And this is where our syntax gets a little bit confusing, right? We use it to define a pointer, but we don't necessarily use it to reference or get back the value of a point, or the actual address there. We do use that star to follow a pointer, and get the value store to that location. So *p would give us that value H, because it's saying follow that arrow to that location and get that value. So now is a bit of practice, based on what we've just seen here, so far. Let's say I wanted to have this value 'I' stored here. So what am I calling the value of 'I' in this case, if I'm writing this code on the left, char c = 'I' what am I now calling the value of 'I'? Right. So in the chat, I'm seeing I'm calling it c, right? This is literally, I'm naming this value c. Now, if I want to get the location of c, might something like this. I would say, OK, I want a new character pointer called p, that's going to get the value of &c. So what is &c in this case? You want to add in the chat. Right, so I'm seeing 0x1 right? Where this value of 'I' is now 0x1. And so this pointer, p, will point to 0x1 in our computer's memory. Now again, to go and get that value of 'I' what would we do? We would use this *p syntax, to say, go and get me that value there. Now, that's sort of the gist of pointers, and pointer syntax. Where we can do lots of really cool things with pointers. One of them is about allocating memory. Giving our program more memory it can use. And remember we saw this idea of malloc, this function that we can use to give us segments of a computer's memory. Now malloc as an argument, will take a certain number of bytes at the space level. So here I might say, malloc give me 4 bytes of memory, and then give me back a pointer that I'm going to call s. And I say I'm wanting to store characters in here, just as a sort of hint to the compiler. So, what I'm doing here, is saying, OK malloc has given me these 4 bytes. And now s is pointing to that first location. So 0x0 of those four bytes that I've gotten, right? Again, if I want to add a value in here, what I could do is say, OK let's go to that value of s, and add the character h. So go to s, which is that sort of asterisk there, and store the value of h. And now I would see that h sort of gets into that first location of my malloc locked space, right? I could then sort of move one bite over, using pointer arithmetic, and go ahead and add this 'I" in, right? I can sort of do that all the way through. Sort of adding these characters until I get to that terminating null character, right? But once we're done with this, one thing we have to do is free up our memory. We don't want to just keep adding more and more memory. You have to give it back to our computer. And that's what this idea of freeing comes in. I could free s, remember s was our pointer to that first byte in our malloc space. And I want to say I'm going to free it. Which basically means I'm not going to use this location anymore. It's free from their program to use. It doesn't necessarily delete the values there. It doesn't necessarily change where s is pointing to. It's just saying, hey, this memory, I'm no longer going to touch it, and other programs are free to go ahead and do that. So, what questions are there on pointers? On malloc and free, so far? Before we go ahead and see how these can be used in data structures. And I'll pause here for just a minute or so, so you have some time to write some questions in the chat, or message them to me or to Connor. All right, and certainly happy to keep taking more questions if they come in. But, what we're going to do now is see how these pointers can be worked into data structures, and Connor is here, our expert on data structure, to talk us through what we can do, and what we should think about for data structures, when it comes to the test. CONNOR: Right. Hello, everyone. I am to go ahead and get started with a little bit of review on data structures. So I'm going to go ahead and share my screen. Yes, to this nice presentation here, we have on data structures. So, to get started out, I'm just going to ask people to put in the chat, like, what are some data structures that we have learned about? Hopefully we'll get some interaction here. To get everyone started to thinking about this review. It's been a little bit since we talked about them initially. Someone said trees. That's a great example. Anybody else want to contribute to the chat a little bit? Perfect. Someone said a single linked list. Yeah that's a really important one. One that we've worked with a lot. We'll definitely go over that. Hash tables as well, yeah we're going to go over all of those. All right. Perfect. So, here's a little bit more of a comprehensive list of the ones that we're going to be going over. We're going to talk about all of these today. Notice they're in two categories here. Where we've got these top 5, are all kind of concrete data structures, in that they have specific implementations in C, or in Python, or in other languages that we can talk about. Whereas stacks and queues are abstract data structures, and we'll talk a little bit more about what that means when we get to them. And then, also, if you don't mind, if people could put in the chat as well some ideas on why we use data structures. Why they might be helpful in our code. Why don't we just use variables for everything? What are these data structures useful for? Yeah, someone said making better algorithms. Yeah. So and that's a great way of thinking about it. So, data structures really help us to organize the information that we're storing in our code. So, while maybe we could do store everything in individual variables, if we're storing like a bunch of data, like a bunch of numbers then it's probably easier to store an array with 100 numbers in it, rather than have 100 variables all called x1, x2, x3. Yeah, and someone else said searching large amounts of data. And that ties into this as well. We're going to have certain data structures that are really useful for searching or sorting data. Specifically, we're going to talk about the hash tables which you implemented for your Speller PSAT, if you all remember back that far. I know it was a long time ago. But today, we're going to start out talking about arrays. So, arrays are kind of the first data structure we talked about in this class. And essentially, they look like a list of some data type. In this case, it's integers. But it could also be characters, or floats, or whatever data type you want. And some interesting things to keep in mind about arrays, is that we index starting at 0. So remember that this is the 0th element. This is the first, second, third, fourth, and so on if you have a longer one. To look up or to change an element in array, it takes constant time. So if I want to say, get me the fourth element of this array, it'll go right over in constant time to 25 here. And so that's really quick. They're fixed length, though. So, this is the big drawback of arrays, is that I've got this list of five numbers here. If I want to add a sixth, then I'm going to have to go through a whole process of making it a new array, allocating some new memory. Essentially some difficult stuff. And so we're going to want to try and fix that. And the way that we're going to try and fix that is with this really cool data structure that we learned about called linked lists. So the idea of a linked list is to get that variable length that we don't have for an array. And so the base of a linked list is this struct called a node. That you've seen a lot in our labs and PSAT so far. So I want to go over it a little bit more. Where everything in our node is going to have some sort of value. In this case, we're saying the value is an integer, because we're storing a list of numbers. But that could also be a char, or a float, or anything else. And then we're going to have this pointer called "next." And that is going to point to the next thing in our list. So the way I like to think about a linked list is it's kind of like a treasure hunt or something like that, where you know where the first item is, but then you have to go to that first item to get to the next item, and then the next, and then the next after that. And so this is kind of what that can look like graphically. So we've got like some variable called my_linked_list. And that variable is just a pointer to a node over here. And that node can have a value, which in this case is five. And then a pointer to another node, which has a value, and then a pointer to another node, which has a value, and a pointer to or-- and then this one doesn't have a pointer to anything. It just has null here at the end. And so that's going to be our sign that the list is over, that that's our last element in the list. And notice these nodes aren't lined up neatly, and that's not just because I'm not great at making Google Slides line up well. It's also to show that when you're allocating memory for a linked list, the memory doesn't all have to be next to each other like in an array. We can have one chunk of memory over here, another chunk over here, another chunk over here. If we add a new element, the chunk of memory might be over here. They can be all over the place. Just as long as they're pointing to each other. And then I'm going to talk briefly about how we add elements to a linked list. Because that's some pretty interesting stuff. So when we start out, with the linked list, we'll essentially just have it be null at first. So our variable my linked list will be this pointer to null. And then let's say I want to add an element with a value of one to the list. What I have to do first is I'll use malloc to allocate some memory for that element. So now we've got this new node over here, but there's nothing in it yet. And then what I'll do is update the value of that node. So I'll say, I want this node to have a value of one. And then what I'll do is I'll update this next-- this bottom part as the next element of the node-- and I'll say point to wherever my linked list is pointing now. So I'll point here to this null. And then finally, I'll update this variable my linked list so that it's pointing to my new node. So now those steps have gone through have made it so that I now have a linked list with one element, one node, and then it's pointing to null. So we can go through this again. Where if I want to add another element to this I'll malloc to create space for it, I'll add the value, I'll add a pointer to my current linked list, and then I'll update my linked list to point to this new node. And then, just to get us up to speed with our last one, we can do this to get our whole linked list here. Just some features of linked lists that is going to be good to remember for the upcoming test. Linked lists have linear time indexing. So unlike arrays, we can't just grab the third element, or something, or the second element. If we want to get to this element one, this element with index two, we would have to search, first through this node, then through that node, and then finally to this node. So it's going to be linear time, because you might have to go through all n elements of a linked list. If we want to add to the front, like we just demonstrated, that will be constant time, because we won't have to iterate through the entire linked list. But adding to the end is going to be linear time as well, because we need to find that last element. But the nice thing about linked lists is that they're of variable length. So unlike arrays, we can keep adding and adding to linked lists indefinitely. And I'll just pause here, since linked lists are a very important concept. And something that we know that there can be pretty confusing. I definitely get confused by them all the time. I'll just pause and ask what questions you have about linked lists at the moment. And feel free to send them in the chat. All right. Feel free to keep sending questions, if any come up. But for now I'm going to go ahead and move on to our next topic, which are going to be hash tables. Which is the big topic in our Speller PSATs, looking back to that. The major thing to remember about hash tables is that it's just an array of linked lists. So here we have a visualization of this hash table that we're working with. And you can see over on the left, we just have this array of 26 elements. And each of the elements of that array is going to be a pointer to a linked list. And so we can see that like in the zero index of this array, we've got this linked list with just the name Albus. In the 25th index of this array, we've got this linked list with one element, Zacharias. But we can also see in this element of the array we've got this linked list with multiple items here. And so this is essentially what your linked lists are going to look like. Although, in practice, and for a lot of you, in Speller, your linked lists might not be an array of length 26, they might be an array of length 50,000, or 100,000, or something a little bit crazier like that. The really important thing when we're creating a hash table is finding a good hash function. So a hash function is what we use to find the array index of an element. So if I decide I want to add a new word to this hash table, if I want to add Carter to this hash table, then I'd have to figure out which index to put it in. And to do that, I'm going to have a function that takes in the element. So in this case, it would take in the string Carter, and they would output an integer. So in this case, we might want it to output two here so that Carter could be part of this linked list with all of the Cs. Just some properties of a good hash function. A good hash function uses all of the data available. So if we've got like a string, we probably don't want to use only like three letters of it, we probably want to use the whole string. It's deterministic. This is the most important part, is that I can't have Carter sometimes mapped to two, but Carter also sometimes maps to 24. I need it to always map to the same thing. And hopefully it uniformly distributes data. So I'll pause here and ask you all to think, for a minute or so, about which of these properties, if we're thinking about our hash function here, first of all, you can think about what our hash function is. It looks to me like we're just sorting things alphabetically. So like A would be zero, Z would be 25. And everything in between. Just based on the first letter of the word. And so our hash function would essentially be taking in a string, and then converting the first character of that string to an integer. And so, I'm curious as to what you all think about which of these three characteristics are true of this hash function that we're talking about? Does it use all the data available? Is it deterministic? And does it uniformly distribute data? So if you could all make a guess in the chat, that would be great. And then we can come together and talk about it. All right. We got one person saying it is deterministic. And that is great. If we're taking the first letter of the word and turning that into an integer, that first letter of a word is never going to change. So, we're always going to get the same value. Carter is always going to be two here. What about using all of the data available? Maybe just send a quick yes or no in the chat. Yes, does it use all of the data, or no, it does not? All right and here we've got some mixed reviews. So we've got some people saying it doesn't use all of the data, and some people saying it does. So this is an interesting question, in that if we're hashing the string Carter, are we using all of the data? And in this specific case, with this specific hash function, the answer is going to be no. Because any string with the first letter C is going to hash to the same value. So, even though Carter and Connor are two different names, and could potentially be in two different spots, they're always going to be in the same spot, here, because they both start with the same letter. So it does not use all the data. And then what about uniformly distributing data? This is a kind of tougher one. And you might have to think a little bit harder about this, but any guesses, yes or no, does it uniformly distribute data? All right and it looks like we've got some mixed reviews on this one, too. And so, this one really depends on what you're storing. So, in this case-- yeah, and someone asked if I could go over what uniformly distributing data is-- what it means is that we don't want a hash table with one of these spots in the arrays having like 30 elements, and all of the others don't have any elements. Because then it's essentially a big linked list. And we get rid of our constant time lookup, we have to use linear time like a linked list. And so, that can be a problem in this case, because if we're storing names, I know a lot more people with a first name that starts with M than people with a first name that starts with X. And so in our case, we might have a lot more people in the M or T or C or A category, than we have in the X category. So it maybe doesn't do a great job at uniformly distributing data. And we already talked about where Carter would go, Carter would go in this index two spot here. And then some general thoughts about hash tables it's constant time to add an element. It's constant time to look up if your hash function is good. Like I said, if the hash function isn't uniformly distributing data, then we could end up with a really long list in one of the buckets. And we don't want that. And importantly, it is a variable size. Because we have this flexibility of each of these spots being a linked list, we can store as many elements as we want in these. All right. Now we'll move on to talking a little bit about trees. So trees are very similar to the linked list that we just talked about, in that the base element of a tree is going to be a node. The only difference is that in a tree, the node might point to more other nodes. So, here we have this base node, or the root node four, and it's pointing to a two and a six here. There are also many different ways that we can use trees to represent data. So like in lecture, we talked about a binary search tree, and we'll look at one of those in a second. But we also in lab use like a family tree. Where each node was a child, and that child pointed to both of its parents, when we were doing the heredity lab. When we're starting with trees, we're usually going to start with the root of a tree and then branch off of that. And for the sake of time, I'm going to move on for now and not talk about the specific binary search tree, but it was covered in lecture. I want to talk quickly about tries. And tries are a lot like trees. I'm having a little bit of trouble switching through slides here. OK perfect. A tri is a tree where each node is an array. So here we can see we have this root, that's an array representing all 26 letters. And then the index of that array representing H, is going to point to another node, and that other node is also an array. And so the idea here is that we can store our data. We can store this name Hagrid, as a set of arrays. Where we've got like, this index points to another array, where the A points to another array, where the G points to another array, and so on, and so on. And then when we have a successful word, when we follow this path all the way down to get to the end, then we can turn this green, or maybe out of value true here, to show that Hagrid is a word in our system here. An advantage of a tri is its constant time to add an element. Its constant time to look up an element. Its variable size. The big problem with the tri, though, other than them being a little bit complicated and difficult to implement, is that the size required to store it can be huge. So, notice here, just to store one name, we needed six arrays, each of length 26, just to store this one name. So it can get a little bit unwieldy in terms of the actual size of this data. All right. And then lastly, we're going to talk about stacks and queues, which are abstract data types. So a stack is first in and last out. Which means that the first item I put into this stack is going to stay there and it's going to be the last one to come out. And so what I mean by that is that our two functions in a stack are going to be pushed, so I want to add something. Or pop, I want to take something off. And so if I push the element five to the stack, I'm just going to have this element five there. If I push another thing to the stack I push seven, then I put that on top of five here. If I push three, I'm going to put that on top. And now if I decide to pop something, so I just call this pop function, there's no arguments here. Then what it's going to do is it's going to do is take away the last thing. It's going to take away this three. And it's going to return that to me. And then if I pop another element, it'll return seven it'll take away this seven. The reason I call this an abstract data type is that we can use different things to do this. So I can have these numbers. These five, seven, three, as like an array, or as a linked list, or as some other data type. And so it's kind of flexible, which concrete data type we use for this abstract idea of a stack. And then finally, I'll talk a little bit about what a queue is. If I can get to that slide. Perfect. So queues are essentially the opposite, in that they're still in abstract data type, but they're first in, first out. And so, we can push this five same as we did last time, we can push the seven, we can push the three. But this time when we say pop, we're going to take the bottom element. We're going to take the element that went in first. If we push, say pop again, we're going to take the next element to the bottom. And so I'll pause there and just ask a quick question, again in the chat. Can anyone give me an example of when we might use a stack versus when we might use a queue? I can give an example to start off. I consider my desk to be a stack. Because my desk is very messy. And so when I have something that I'm working on, I put it on the desk. And then when I'm working on something else, I put it on top of that, and then on top of that again. And then when I need to-- when I'm working on things again, I'll take the thing off the top, and do some work on it, and finish that. And then take the next thing and work on that, and finish that. So, even though it might not be the best system, my desk is a little bit of a stack. And then a queue would be-- we see those all the time just in waiting in line. When you get into a line, the first one to be in that line is usually the first one to get out of line. And then Carter mentioned that you want to keep your clothes in a queue. Which is also a great idea. If I kept my clothes in a stack, and I just took off my favorite pair of jeans, and put them on top of the stack, and then the next day, grabbed them off of the stack again, then I'm going to be wearing the same pair of jeans every single day. And they'll start to smell. So a queue would be good for rotating your wardrobe, as well. That's all I have, right now, in terms of data structures. Data structures are a very complex topic, and we went over them very quickly. If you have more questions, feel free to ask them now in the chat. But I would also recommend heading through the lecture notes from week five. Or potentially rewatching parts of lectures, based on which parts were more or less confusing, but I'll pause now and ask what questions you all have from the chat. CARTER: Connor, one question we had was asking what does it mean to index a linked list? Do you have anything to say on that? CONNOR: Yeah that's a great question. So, we can-- let me get back to my linked list slide. So indexing linked list is something that we might not necessarily do in all situations. And a lot of the times when we've been using a linked list, we have not been thinking about them as having an order. So, like when we did our hash table down here, I won't try to go through all the sides again, but when we did our hash table, we didn't really care where in the linked list an item was, we just cared whether or not it was in the list. But sometimes, the linked list the order will actually matter. And so I might want to say, get me the fourth thing in the linked list, or get me eighth thing in the linked list. And so if I want the 12th thing in the linked list, then I'm going to have to go through the first, second, third, fourth, all the way up through the 11th, before I can find the 12th. And so that's essentially what I mean by indexing, is that action of saying get me the x element in this linked list. And then I had a question here about-- going to the slide-- about adding elements, and finding elements in hash tables. This slide right here. And I'll pause and ask if people have some more specific questions about this. CARTER: One thing that I've seen is a question about is a hash table the same thing as a dictionary? CONNOR: Yeah that's a great question. Let me think. So, Carter I don't know if you have a better answer for this, I'm actually not totally sure how dictionaries are implemented in Python. CARTER: Yeah I think that's something the Python documentation can certainly tell you. I think one of the main things to highlight is that very similar ideas are going on. We're trying to basically map a key to a value. Like, we're trying to map Carter to 28, or in a Python dictionary, we're trying to map name to the value Carter. So there's certainly there's this underlying idea of mapping keys and values, I think. CONNOR: Yeah, and an interesting side note on that, is that a Python set is implemented as a hash table. So if you ever find yourself using a set in Python, that's why they're so fast. All right. Do you think this would be a good time to move into a walk through of a practice problem? CARTER: I think so. So we thought for the rest of the session-- let me see if this will work-- right. We have a practice test question that focuses on data structures and our favorite song Baby Shark . We'll be going through this in the form of a walk through. Where you'll be able to ask questions, certainly feel free to unmute yourselves. We're going to pause the stream here.