WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:06.030 >> [MUSIC PLAYING] 00:00:06.030 --> 00:00:08.390 >> DOUG LLOYD: Pointers, here we are. 00:00:08.390 --> 00:00:11.080 This is probably going to be the most difficult topic 00:00:11.080 --> 00:00:12.840 that we talk about in CS50. 00:00:12.840 --> 00:00:15.060 And if you've read anything about pointers 00:00:15.060 --> 00:00:19.080 before you might be a little bit intimidated going into this video. 00:00:19.080 --> 00:00:21.260 It's true that pointers do allow you the ability 00:00:21.260 --> 00:00:23.740 to perhaps screw up pretty badly when you're 00:00:23.740 --> 00:00:27.450 working with variables, and data, and cause your program to crash. 00:00:27.450 --> 00:00:30.490 But they're actually really useful and they allow us a really great way 00:00:30.490 --> 00:00:33.340 to pass data back and forth between functions, 00:00:33.340 --> 00:00:35.490 that we're otherwise unable to do. 00:00:35.490 --> 00:00:37.750 >> And so what we really want to do here is train 00:00:37.750 --> 00:00:41.060 you to have good pointer discipline, so that you can use pointers effectively 00:00:41.060 --> 00:00:43.850 to make your programs that much better. 00:00:43.850 --> 00:00:48.220 As I said pointers give us a different way to pass data between functions. 00:00:48.220 --> 00:00:50.270 Now if you recall from an earlier video, when 00:00:50.270 --> 00:00:53.720 we were talking about variable scope, I mentioned 00:00:53.720 --> 00:01:00.610 that all the data that we pass between functions in C is passed by value. 00:01:00.610 --> 00:01:03.070 And I may not have used that term; what I meant there 00:01:03.070 --> 00:01:07.170 was that we are passing copies of data. 00:01:07.170 --> 00:01:12.252 When we pass a variable to a function, we're not actually passing the variable 00:01:12.252 --> 00:01:13.210 to the function, right? 
00:01:13.210 --> 00:01:17.670 We're passing a copy of that data to the function. 00:01:17.670 --> 00:01:20.760 The function does what it will and it calculates some value, 00:01:20.760 --> 00:01:23.180 and maybe we use that value when it gives it back. 00:01:23.180 --> 00:01:26.700 >> There was one exception to this rule of passing by value, 00:01:26.700 --> 00:01:31.210 and we'll come back to what that is a little later on in this video. 00:01:31.210 --> 00:01:34.880 If we use pointers instead of using variables, 00:01:34.880 --> 00:01:38.180 or instead of using the variables themselves or copies of the variables, 00:01:38.180 --> 00:01:43.790 we can now pass the variables around between functions in a different way. 00:01:43.790 --> 00:01:46.550 This means that if we make a change in one function, 00:01:46.550 --> 00:01:49.827 that change will actually take effect in a different function. 00:01:49.827 --> 00:01:52.160 Again, this is something that we couldn't do previously, 00:01:52.160 --> 00:01:56.979 and if you've ever tried to swap the value of two variables in a function, 00:01:56.979 --> 00:01:59.270 you've noticed this problem sort of creeping up, right? 00:01:59.270 --> 00:02:04.340 >> If we want to swap X and Y, and we pass them to a function called swap, 00:02:04.340 --> 00:02:08.680 inside of the function swap the variables do exchange values. 00:02:08.680 --> 00:02:12.600 One becomes two, two becomes one, but we don't actually 00:02:12.600 --> 00:02:16.890 change anything in the original function, in the caller. 00:02:16.890 --> 00:02:19.550 Because we can't, we're only working with copies of them. 00:02:19.550 --> 00:02:24.760 With pointers though, we can actually pass X and Y to a function. 00:02:24.760 --> 00:02:26.960 That function can do something with them. 00:02:26.960 --> 00:02:29.250 And those variables values can actually change. 00:02:29.250 --> 00:02:33.710 So that's quite a change in our ability to work with data. 
00:02:33.710 --> 00:02:36.100 >> Before we dive into pointers, I think it's worth 00:02:36.100 --> 00:02:38.580 taking a few minutes to go back to basics here. 00:02:38.580 --> 00:02:41.000 And have a look at how computer memory works 00:02:41.000 --> 00:02:45.340 because these two subjects are going to actually be pretty interrelated. 00:02:45.340 --> 00:02:48.480 As you probably know, on your computer system 00:02:48.480 --> 00:02:51.310 you have a hard drive or perhaps a solid state drive, 00:02:51.310 --> 00:02:54.430 some sort of file storage location. 00:02:54.430 --> 00:02:57.950 It's usually somewhere in the neighborhood of 250 gigabytes 00:02:57.950 --> 00:02:59.810 to maybe a couple of terabytes now. 00:02:59.810 --> 00:03:02.270 And it's where all of your files ultimately live, 00:03:02.270 --> 00:03:04.870 even when your computer is shut off, you can turn it back on 00:03:04.870 --> 00:03:09.190 and you'll find your files are there again when you reboot your system. 00:03:09.190 --> 00:03:14.820 But disk drives, like a hard disk drive, an HDD, or a solid state drive, an SSD, 00:03:14.820 --> 00:03:16.050 are just storage space. 00:03:16.050 --> 00:03:20.400 >> We can't actually do anything with the data that is in hard disk, 00:03:20.400 --> 00:03:22.080 or in a solid state drive. 00:03:22.080 --> 00:03:24.950 In order to actually change data or move it around, 00:03:24.950 --> 00:03:28.800 we have to move it to RAM, random access memory. 00:03:28.800 --> 00:03:31.170 Now RAM, you have a lot less of in your computer. 00:03:31.170 --> 00:03:34.185 You may have somewhere in the neighborhood of 512 megabytes 00:03:34.185 --> 00:03:38.850 if you have an older computer, to maybe two, four, eight, 16, 00:03:38.850 --> 00:03:41.820 possibly even a little more, gigabytes of RAM. 00:03:41.820 --> 00:03:46.390 So that's much smaller, but that's where all of the volatile data exists. 00:03:46.390 --> 00:03:48.270 That's where we can change things. 
00:03:48.270 --> 00:03:53.350 But when we turn our computer off, all of the data in RAM is destroyed. 00:03:53.350 --> 00:03:57.150 >> So that's why we need to have hard disk for the more permanent location of it, 00:03:57.150 --> 00:03:59.720 so that it exists- it would be really bad if every time we 00:03:59.720 --> 00:04:03.310 turned our computer off, every file in our system was obliterated. 00:04:03.310 --> 00:04:05.600 So we work inside of RAM. 00:04:05.600 --> 00:04:09.210 And every time we're talking about memory, pretty much, in CS50, 00:04:09.210 --> 00:04:15.080 we're talking about RAM, not hard disk. 00:04:15.080 --> 00:04:18.657 >> So when we move things into memory, it takes up a certain amount of space. 00:04:18.657 --> 00:04:20.740 All of the data types that we've been working with 00:04:20.740 --> 00:04:23.480 take up different amounts of space in RAM. 00:04:23.480 --> 00:04:27.600 So every time you create an integer variable, four bytes of memory 00:04:27.600 --> 00:04:30.750 are set aside in RAM so you can work with that integer. 00:04:30.750 --> 00:04:34.260 You can declare the integer, change it, assign it 00:04:34.260 --> 00:04:36.700 a value, 10, increment it by one, and so on. 00:04:36.700 --> 00:04:39.440 All that needs to happen in RAM, and you get four bytes 00:04:39.440 --> 00:04:42.550 to work with for every integer that you create. 00:04:42.550 --> 00:04:45.410 >> Every character you create gets one byte. 00:04:45.410 --> 00:04:48.160 That's just how much space is needed to store a character. 00:04:48.160 --> 00:04:51.310 Every float, a real number, gets four bytes 00:04:51.310 --> 00:04:53.390 unless it's a double precision floating point 00:04:53.390 --> 00:04:56.510 number, which allows you to have more precise or more digits 00:04:56.510 --> 00:04:59.300 after the decimal point without losing precision, 00:04:59.300 --> 00:05:01.820 which takes up eight bytes of memory. 
00:05:01.820 --> 00:05:06.730 Long longs, really big integers, also take up eight bytes of memory. 00:05:06.730 --> 00:05:09.000 How many bytes of memory do strings take up? 00:05:09.000 --> 00:05:12.990 Well let's put a pin in that question for now, but we'll come back to it. 00:05:12.990 --> 00:05:17.350 >> So back to this idea of memory as a big array of byte-sized cells. 00:05:17.350 --> 00:05:20.871 That's really all it is, it's just a huge array of cells, 00:05:20.871 --> 00:05:23.370 just like any other array that you're familiar with in C, 00:05:23.370 --> 00:05:26.430 except every element is one byte wide. 00:05:26.430 --> 00:05:30.030 And just like an array, every element has an address. 00:05:30.030 --> 00:05:32.120 Every element of an array has an index, and we 00:05:32.120 --> 00:05:36.302 can use that index to do so-called random access on the array. 00:05:36.302 --> 00:05:38.510 We don't have to start at the beginning of the array, 00:05:38.510 --> 00:05:40.569 iterate through every single element thereof, 00:05:40.569 --> 00:05:41.860 to find what we're looking for. 00:05:41.860 --> 00:05:45.790 We can just say, I want to get to the 15th element or the 100th element. 00:05:45.790 --> 00:05:49.930 And you can just pass in that number and get the value you're looking for. 00:05:49.930 --> 00:05:54.460 >> Similarly every location in memory has an address. 00:05:54.460 --> 00:05:57.320 So your memory might look something like this. 00:05:57.320 --> 00:06:01.420 Here's a very small chunk of memory, this is 20 bytes of memory. 00:06:01.420 --> 00:06:04.060 The first 20 bytes because my addresses there at the bottom 00:06:04.060 --> 00:06:08.890 are 0, 1, 2, 3, and so on all the way up to 19. 00:06:08.890 --> 00:06:13.190 And when I declare variables and when I start to work with them, 00:06:13.190 --> 00:06:15.470 the system is going to set aside some space for me 00:06:15.470 --> 00:06:17.595 in this memory to work with my variables. 
00:06:17.595 --> 00:06:21.610 So I might say, char c equals capital H. And what's going to happen? 00:06:21.610 --> 00:06:23.880 Well the system is going to set aside for me one byte. 00:06:23.880 --> 00:06:27.870 In this case it chose byte number four, the byte at address four, 00:06:27.870 --> 00:06:31.310 and it's going to store the letter capital H in there for me. 00:06:31.310 --> 00:06:34.350 If I then say int speed limit equals 65, it's 00:06:34.350 --> 00:06:36.806 going to set aside four bytes of memory for me. 00:06:36.806 --> 00:06:39.180 And it's going to treat those four bytes as a single unit 00:06:39.180 --> 00:06:41.305 because what we're working with is an integer here. 00:06:41.305 --> 00:06:44.350 And it's going to store 65 in there. 00:06:44.350 --> 00:06:47.000 >> Now already I'm kind of telling you a bit of a lie, 00:06:47.000 --> 00:06:50.150 right, because we know that computers work in binary. 00:06:50.150 --> 00:06:53.100 They don't understand necessarily what a capital H is 00:06:53.100 --> 00:06:57.110 or what a 65 is, they only understand binary, zeros and ones. 00:06:57.110 --> 00:06:59.000 And so actually what we're storing in there 00:06:59.000 --> 00:07:03.450 is not the letter H and the number 65, but rather the binary representations 00:07:03.450 --> 00:07:06.980 thereof, which look a little something like this. 00:07:06.980 --> 00:07:10.360 And in particular in the context of the integer variable, 00:07:10.360 --> 00:07:13.559 it's not going to just spit it into, it's not going to treat it as one four 00:07:13.559 --> 00:07:15.350 byte chunk necessarily, it's actually going 00:07:15.350 --> 00:07:19.570 to treat it as four one byte chunks, which might look something like this. 
00:07:19.570 --> 00:07:22.424 And even this isn't entirely true either, 00:07:22.424 --> 00:07:24.840 because of something called endianness, which we're not 00:07:24.840 --> 00:07:26.965 going to get into now, but if you're curious about it, 00:07:26.965 --> 00:07:29.030 you can read up on little and big endianness. 00:07:29.030 --> 00:07:31.640 But for the sake of this argument, for the sake of this video, 00:07:31.640 --> 00:07:34.860 let's just assume that is, in fact, how the number 65 would 00:07:34.860 --> 00:07:36.970 be represented in memory on every system, 00:07:36.970 --> 00:07:38.850 although it's not entirely true. 00:07:38.850 --> 00:07:41.700 >> But let's actually just get rid of all binary entirely, 00:07:41.700 --> 00:07:44.460 and just think about it as H and 65, it's a lot easier 00:07:44.460 --> 00:07:47.900 to think about it like that as a human being. 00:07:47.900 --> 00:07:51.420 All right, so it also seems maybe a little random that I've- my system 00:07:51.420 --> 00:07:55.130 didn't give me bytes 5, 6, 7, and 8 to store the integer. 00:07:55.130 --> 00:07:58.580 There's a reason for that, too, which we won't get into right now, but suffice 00:07:58.580 --> 00:08:00.496 it to say that what the computer is doing here 00:08:00.496 --> 00:08:02.810 is probably a good move on its part. 00:08:02.810 --> 00:08:06.020 To not give me memory that's necessarily back to back. 00:08:06.020 --> 00:08:10.490 Although it's going to do it now if I want to get another string, 00:08:10.490 --> 00:08:13.080 called surname, and I want to put Lloyd in there. 00:08:13.080 --> 00:08:18.360 I'm going to need to fit one character, each letter of that's 00:08:18.360 --> 00:08:21.330 going to require one character, one byte of memory. 00:08:21.330 --> 00:08:26.230 So if I could put Lloyd into my array like this I'm pretty good to go, right? 00:08:26.230 --> 00:08:28.870 What's missing? 
00:08:28.870 --> 00:08:31.840 >> Remember that every string we work with in C ends with backslash zero, 00:08:31.840 --> 00:08:33.339 and we can't omit that here, either. 00:08:33.339 --> 00:08:36.090 We need to set aside one byte of memory to hold that so we 00:08:36.090 --> 00:08:39.130 know when our string has ended. 00:08:39.130 --> 00:08:41.049 So again this arrangement of the way things 00:08:41.049 --> 00:08:42.799 appear in memory might be a little random, 00:08:42.799 --> 00:08:44.870 but it actually is how most systems are designed. 00:08:44.870 --> 00:08:48.330 To line them up on multiples of four, for reasons again 00:08:48.330 --> 00:08:50.080 that we don't need to get into right now. 00:08:50.080 --> 00:08:53.060 But suffice it to say that after these three lines of code, 00:08:53.060 --> 00:08:54.810 this is what memory might look like. 00:08:54.810 --> 00:08:58.930 If I need memory locations 4, 8, and 12 to hold my data, 00:08:58.930 --> 00:09:01.100 this is what my memory might look like. 00:09:01.100 --> 00:09:04.062 >> And just to be particularly pedantic here, when 00:09:04.062 --> 00:09:06.020 we're talking about memory addresses we usually 00:09:06.020 --> 00:09:08.390 do so using hexadecimal notation. 00:09:08.390 --> 00:09:12.030 So why don't we convert all of these from decimal to hexadecimal notation 00:09:12.030 --> 00:09:15.010 just because that's generally how we refer to memory. 00:09:15.010 --> 00:09:17.880 So instead of being 0 through 19, what we have is 0x0 00:09:17.880 --> 00:09:20.340 through 0x13. 00:09:20.340 --> 00:09:23.790 Those are the 20 bytes of memory that we have or we're looking at in this image 00:09:23.790 --> 00:09:25.540 right here. 00:09:25.540 --> 00:09:29.310 >> So all of that being said, let's step away from memory for a second 00:09:29.310 --> 00:09:30.490 and back to pointers. 
00:09:30.490 --> 00:09:32.420 Here is the most important thing to remember 00:09:32.420 --> 00:09:34.070 as we start working with pointers. 00:09:34.070 --> 00:09:36.314 A pointer is nothing more than an address. 00:09:36.314 --> 00:09:38.230 I'll say it again because it's that important, 00:09:38.230 --> 00:09:42.730 a pointer is nothing more than an address. 00:09:42.730 --> 00:09:47.760 Pointers are addresses to locations in memory where variables live. 00:09:47.760 --> 00:09:52.590 Knowing that it becomes hopefully a little bit easier to work with them. 00:09:52.590 --> 00:09:54.550 Another thing I like to do is to have sort 00:09:54.550 --> 00:09:58.510 of diagrams visually representing what's happening with various lines of code. 00:09:58.510 --> 00:10:00.660 And we'll do this a couple of times in pointers, 00:10:00.660 --> 00:10:03.354 and when we talk about dynamic memory allocation as well. 00:10:03.354 --> 00:10:06.020 Because I think that these diagrams can be particularly helpful. 00:10:06.020 --> 00:10:09.540 >> So if I say for example, int k in my code, what is happening? 00:10:09.540 --> 00:10:12.524 Well what's basically happening is I'm getting memory set aside for me, 00:10:12.524 --> 00:10:14.690 but I don't even like to think about it like that, I 00:10:14.690 --> 00:10:16.300 like to think about it like a box. 00:10:16.300 --> 00:10:20.090 I have a box and it's colored green because I 00:10:20.090 --> 00:10:21.750 can put integers in green boxes. 00:10:21.750 --> 00:10:23.666 If it was a character I might have a blue box. 00:10:23.666 --> 00:10:27.290 But I always say, if I'm creating a box that can hold integers 00:10:27.290 --> 00:10:28.950 that box is colored green. 00:10:28.950 --> 00:10:33.020 And I take a permanent marker and I write k on the side of it. 00:10:33.020 --> 00:10:37.590 So I have a box called k, into which I can put integers. 00:10:37.590 --> 00:10:41.070 So when I say int k, that's what happens in my head. 
00:10:41.070 --> 00:10:43.140 If I say k equals five, what am I doing? 00:10:43.140 --> 00:10:45.110 Well, I'm putting five in the box, right. 00:10:45.110 --> 00:10:48.670 This is pretty straightforward, if I say int k, create a box called k. 00:10:48.670 --> 00:10:52.040 If I say k equals 5, put five into the box. 00:10:52.040 --> 00:10:53.865 Hopefully that's not too much of a leap. 00:10:53.865 --> 00:10:55.990 Here's where things get a little interesting though. 00:10:55.990 --> 00:11:02.590 If I say int *pk, well even if I don't know what this necessarily means, 00:11:02.590 --> 00:11:06.150 it's clearly got something to do with an integer. 00:11:06.150 --> 00:11:08.211 So I'm going to color this box green-ish, 00:11:08.211 --> 00:11:10.210 I know it's got something to do with an integer, 00:11:10.210 --> 00:11:13.400 but it's not an integer itself, because it's an int star. 00:11:13.400 --> 00:11:15.390 There's something slightly different about it. 00:11:15.390 --> 00:11:17.620 So an integer's involved, but otherwise it's 00:11:17.620 --> 00:11:19.830 not too different from what we were talking about. 00:11:19.830 --> 00:11:24.240 It's a box, it's got a label, it's wearing a label pk, 00:11:24.240 --> 00:11:27.280 and it's capable of holding int stars, whatever those are. 00:11:27.280 --> 00:11:29.894 They have something to do with integers, clearly. 00:11:29.894 --> 00:11:31.060 Here's the last line though. 00:11:31.060 --> 00:11:37.650 If I say pk = &k, whoa, what just happened, right? 00:11:37.650 --> 00:11:41.820 So this random number, seemingly random number, gets thrown into the box there. 00:11:41.820 --> 00:11:44.930 All that is, is pk gets the address of k. 00:11:44.930 --> 00:11:52.867 So I'm sticking where k lives in memory, its address, the address of its bytes. 00:11:52.867 --> 00:11:55.200 All I'm doing is I'm saying that value is what I'm going 00:11:55.200 --> 00:11:59.430 to put inside of my box called pk. 
00:11:59.430 --> 00:12:02.080 And because these things are pointers, and because looking 00:12:02.080 --> 00:12:04.955 at a string like zero x eight zero c seven four eight 00:12:04.955 --> 00:12:07.790 two zero is probably not very meaningful. 00:12:07.790 --> 00:12:12.390 When we generally visualize pointers, we actually do so as pointers. 00:12:12.390 --> 00:12:17.000 Pk gives us the information we need to find k in memory. 00:12:17.000 --> 00:12:19.120 So basically pk has an arrow in it. 00:12:19.120 --> 00:12:21.670 And if we walk the length of that arrow, imagine 00:12:21.670 --> 00:12:25.280 it's something you can walk on, if we walk along the length of the arrow, 00:12:25.280 --> 00:12:29.490 at the very tip of that arrow, we will find the location in memory 00:12:29.490 --> 00:12:31.390 where k lives. 00:12:31.390 --> 00:12:34.360 And that's really important because once we know where k lives, 00:12:34.360 --> 00:12:37.870 we can start to work with the data inside of that memory location. 00:12:37.870 --> 00:12:40.780 Though we're getting a teeny bit ahead of ourselves for now. 00:12:40.780 --> 00:12:42.240 >> So what is a pointer? 00:12:42.240 --> 00:12:45.590 A pointer is a data item whose value is a memory address. 00:12:45.590 --> 00:12:49.740 That was that zero x eight zero stuff going on, that was a memory address. 00:12:49.740 --> 00:12:52.060 That was a location in memory. 00:12:52.060 --> 00:12:55.080 And the type of a pointer describes the kind 00:12:55.080 --> 00:12:56.930 of data you'll find at that memory address. 00:12:56.930 --> 00:12:58.810 So there's the int star part right. 00:12:58.810 --> 00:13:03.690 If I follow that arrow, it's going to lead me to a location. 00:13:03.690 --> 00:13:06.980 And that location, what I will find there in my example, 00:13:06.980 --> 00:13:08.240 is a green colored box. 00:13:08.240 --> 00:13:12.650 It's an integer, that's what I will find if I go to that address. 
00:13:12.650 --> 00:13:14.830 The data type of a pointer describes what 00:13:14.830 --> 00:13:17.936 you will find at that memory address. 00:13:17.936 --> 00:13:19.560 So here's the really cool thing though. 00:13:19.560 --> 00:13:25.090 Pointers allow us to pass variables between functions. 00:13:25.090 --> 00:13:28.520 And actually pass variables and not pass copies of them. 00:13:28.520 --> 00:13:32.879 Because if we know exactly where in memory to find a variable, 00:13:32.879 --> 00:13:35.670 we don't need to make a copy of it, we can just go to that location 00:13:35.670 --> 00:13:37.844 and work with that variable. 00:13:37.844 --> 00:13:40.260 So in essence pointers sort of make a computer environment 00:13:40.260 --> 00:13:42.360 a lot more like the real world, right. 00:13:42.360 --> 00:13:44.640 >> So here's an analogy. 00:13:44.640 --> 00:13:48.080 Let's say that I have a notebook, right, and it's full of notes. 00:13:48.080 --> 00:13:50.230 And I would like you to update it. 00:13:50.230 --> 00:13:53.960 You are a function that updates notes, right. 00:13:53.960 --> 00:13:56.390 In the way we've been working so far, what 00:13:56.390 --> 00:14:02.370 happens is you will take my notebook, you'll go to the copy store, 00:14:02.370 --> 00:14:06.410 you'll make a Xerox copy of every page of the notebook. 00:14:06.410 --> 00:14:09.790 You'll leave my notebook back on my desk when you're done, 00:14:09.790 --> 00:14:14.600 you'll go and cross out things in my notebook that are out of date or wrong, 00:14:14.600 --> 00:14:19.280 and then you'll pass back to me the stack of Xerox pages 00:14:19.280 --> 00:14:22.850 that is a replica of my notebook with the changes that you've made to it. 00:14:22.850 --> 00:14:27.040 And at that point, it's up to me as the calling function, as the caller, 00:14:27.040 --> 00:14:30.582 to decide to take your notes and integrate them back into my notebook. 
00:14:30.582 --> 00:14:32.540 So there's a lot of steps involved here, right. 00:14:32.540 --> 00:14:34.850 Like wouldn't it be better if I just say, hey, can you 00:14:34.850 --> 00:14:38.370 update my notebook for me, hand you my notebook, 00:14:38.370 --> 00:14:40.440 and you take things and literally cross them out 00:14:40.440 --> 00:14:42.810 and update my notes in my notebook. 00:14:42.810 --> 00:14:45.140 And then give me my notebook back. 00:14:45.140 --> 00:14:47.320 That's kind of what pointers allow us to do, 00:14:47.320 --> 00:14:51.320 they make this environment a lot more like how we operate in reality. 00:14:51.320 --> 00:14:54.640 >> All right so that's what a pointer is, let's talk 00:14:54.640 --> 00:14:58.040 about how pointers work in C, and how we can start to work with them. 00:14:58.040 --> 00:15:02.550 So there's a very simple pointer in C called the null pointer. 00:15:02.550 --> 00:15:04.830 The null pointer points to nothing. 00:15:04.830 --> 00:15:08.310 This probably seems like it's actually not a very useful thing, 00:15:08.310 --> 00:15:10.500 but as we'll see a little later on, the fact 00:15:10.500 --> 00:15:15.410 that this null pointer exists actually really can come in handy. 00:15:15.410 --> 00:15:19.090 And whenever you create a pointer, and you don't set its value immediately- 00:15:19.090 --> 00:15:21.060 an example of setting its value immediately 00:15:21.060 --> 00:15:25.401 will be a couple slides back where I said pk equals & k, 00:15:25.401 --> 00:15:28.740 pk gets k's address, as we'll see what that means, 00:15:28.740 --> 00:15:32.990 we'll see how to code that shortly- if we don't set its value to something 00:15:32.990 --> 00:15:35.380 meaningful immediately, you should always 00:15:35.380 --> 00:15:37.480 set your pointer to point to null. 00:15:37.480 --> 00:15:40.260 You should set it to point to nothing. 
00:15:40.260 --> 00:15:43.614 >> That's very different than just leaving the value as it is 00:15:43.614 --> 00:15:45.530 and then declaring a pointer and just assuming 00:15:45.530 --> 00:15:48.042 it's null because that's rarely true. 00:15:48.042 --> 00:15:50.000 So you should always set the value of a pointer 00:15:50.000 --> 00:15:55.690 to null if you don't set its value to something meaningful immediately. 00:15:55.690 --> 00:15:59.090 You can check whether a pointer's value is null using the equality operator 00:15:59.090 --> 00:16:05.450 (==), just like you compare any integer values or character values using (==) 00:16:05.450 --> 00:16:06.320 as well. 00:16:06.320 --> 00:16:10.994 It's a special sort of constant value that you can use to test. 00:16:10.994 --> 00:16:13.160 So that was a very simple pointer, the null pointer. 00:16:13.160 --> 00:16:15.320 Another way to create a pointer is to extract 00:16:15.320 --> 00:16:18.240 the address of a variable you've already created, 00:16:18.240 --> 00:16:22.330 and you do this using the & operator, address extraction. 00:16:22.330 --> 00:16:26.720 Which we've already seen previously in the first diagram example I showed. 00:16:26.720 --> 00:16:31.450 So if x is a variable that we've already created of type integer, 00:16:31.450 --> 00:16:35.110 then &x is a pointer to an integer. 00:16:35.110 --> 00:16:39.810 &x is- remember, & is going to extract the address of the thing on the right. 00:16:39.810 --> 00:16:45.350 And since a pointer is just an address, then &x is a pointer to an integer 00:16:45.350 --> 00:16:48.560 whose value is where in memory x lives. 00:16:48.560 --> 00:16:50.460 It's x's address. 00:16:50.460 --> 00:16:53.296 So &x is the address of x. 00:16:53.296 --> 00:16:55.670 Let's take this one step further and connect to something 00:16:55.670 --> 00:16:58.380 I alluded to in a prior video. 
00:16:58.380 --> 00:17:06.730 If arr is an array of doubles, then &arr square bracket i is a pointer 00:17:06.730 --> 00:17:08.109 to a double. 00:17:08.109 --> 00:17:08.970 OK. 00:17:08.970 --> 00:17:12.160 arr square bracket i, if arr is an array of doubles, 00:17:12.160 --> 00:17:19.069 then arr square bracket i is the i-th element of that array, 00:17:19.069 --> 00:17:29.270 and &arr square bracket i is where in memory the i-th element of arr exists. 00:17:29.270 --> 00:17:31.790 >> So what's the implication here? 00:17:31.790 --> 00:17:34.570 An array's name, the implication of this whole thing, 00:17:34.570 --> 00:17:39.290 is that an array's name is actually itself a pointer. 00:17:39.290 --> 00:17:41.170 You've been working with pointers all along 00:17:41.170 --> 00:17:45.290 every time that you've used an array. 00:17:45.290 --> 00:17:49.090 Remember from the example on variable scope, 00:17:49.090 --> 00:17:53.420 near the end of the video I presented an example where we have a function 00:17:53.420 --> 00:17:56.890 called set int and a function called set array. 00:17:56.890 --> 00:18:00.490 And your challenge was to determine what the 00:18:00.490 --> 00:18:03.220 values were that we printed out at the end of the function, 00:18:03.220 --> 00:18:05.960 at the end of the main program. 00:18:05.960 --> 00:18:08.740 >> If you recall from that example or if you've watched the video, 00:18:08.740 --> 00:18:13.080 you know that when you- the call to set int effectively does nothing. 00:18:13.080 --> 00:18:16.390 But the call to set array does. 00:18:16.390 --> 00:18:19.280 And I sort of glossed over why that was the case at the time. 00:18:19.280 --> 00:18:22.363 I just said, well it's an array, it's special, you know, there's a reason. 
00:18:22.363 --> 00:18:25.020 The reason is that an array's name is really just a pointer, 00:18:25.020 --> 00:18:28.740 and there's this special square bracket syntax that 00:18:28.740 --> 00:18:30.510 makes things a lot nicer to work with. 00:18:30.510 --> 00:18:34.410 And they make the idea of a pointer a lot less intimidating, 00:18:34.410 --> 00:18:36.800 and that's why they're sort of presented in that way. 00:18:36.800 --> 00:18:38.600 But really arrays are just pointers. 00:18:38.600 --> 00:18:41.580 And that's why when we made a change to the array, 00:18:41.580 --> 00:18:44.880 when we passed an array as a parameter to a function or as an argument 00:18:44.880 --> 00:18:50.110 to a function, the contents of the array actually changed in both the callee 00:18:50.110 --> 00:18:51.160 and in the caller. 00:18:51.160 --> 00:18:55.846 Which for every other kind of variable we saw was not the case. 00:18:55.846 --> 00:18:58.970 So that's just something to keep in mind when you're working with pointers, 00:18:58.970 --> 00:19:01.610 is that the name of an array is actually a pointer 00:19:01.610 --> 00:19:04.750 to the first element of that array. 00:19:04.750 --> 00:19:08.930 >> OK so now we have all these facts, let's keep going, right. 00:19:08.930 --> 00:19:11.370 Why do we care about where something lives? 00:19:11.370 --> 00:19:14.120 Well like I said, it's pretty useful to know where something lives 00:19:14.120 --> 00:19:17.240 so you can go there and change it. 00:19:17.240 --> 00:19:19.390 Work with it and actually have the thing that you 00:19:19.390 --> 00:19:23.710 want to do to that variable take effect, and not take effect on some copy of it. 00:19:23.710 --> 00:19:26.150 This is called dereferencing. 00:19:26.150 --> 00:19:28.690 We go to the reference and we change the value there. 
00:19:28.690 --> 00:19:32.660 So if we have a pointer and it's called pc, and it points to a character, 00:19:32.660 --> 00:19:40.610 then we can say *pc and *pc is the name of what we'll find if we go 00:19:40.610 --> 00:19:42.910 to the address pc. 00:19:42.910 --> 00:19:47.860 What we'll find there is a character and *pc is how we refer to the data at that 00:19:47.860 --> 00:19:48.880 location. 00:19:48.880 --> 00:19:54.150 So we could say something like *pc = 'D' or something like that, 00:19:54.150 --> 00:19:59.280 and that means that whatever was at memory address pc, 00:19:59.280 --> 00:20:07.040 whatever character was previously there, is now D, if we say *pc = 'D'. 00:20:07.040 --> 00:20:10.090 >> So here we go again with some weird C stuff, right. 00:20:10.090 --> 00:20:14.560 So we've seen * previously as being somehow part of the data type, 00:20:14.560 --> 00:20:17.160 and now it's being used in a slightly different context 00:20:17.160 --> 00:20:19.605 to access the data at a location. 00:20:19.605 --> 00:20:22.480 I know it's a little confusing and that's actually part of this whole 00:20:22.480 --> 00:20:25.740 like, why pointers have this mythology around them as being so complex, 00:20:25.740 --> 00:20:28.250 is kind of a syntax problem, honestly. 00:20:28.250 --> 00:20:31.810 But * is used in both contexts, both as part of the type name, 00:20:31.810 --> 00:20:34.100 and we'll see a little later something else, too. 00:20:34.100 --> 00:20:36.490 And right now it's the dereference operator. 00:20:36.490 --> 00:20:38.760 So it goes to the reference, it accesses the data 00:20:38.760 --> 00:20:43.000 at the location of the pointer, and allows you to manipulate it at will. 00:20:43.000 --> 00:20:45.900 >> Now this is very similar to visiting your neighbor, right. 00:20:45.900 --> 00:20:48.710 If you know where your neighbor lives, you're 00:20:48.710 --> 00:20:50.730 not hanging out with your neighbor. 
00:20:50.730 --> 00:20:53.510 You know you happen to know where they live, 00:20:53.510 --> 00:20:56.870 but that doesn't mean that by virtue of having that knowledge 00:20:56.870 --> 00:20:59.170 you are interacting with them. 00:20:59.170 --> 00:21:01.920 If you want to interact with them, you have to go to their house, 00:21:01.920 --> 00:21:03.760 you have to go to where they live. 00:21:03.760 --> 00:21:07.440 And once you do that, then you can interact 00:21:07.440 --> 00:21:09.420 with them just like you'd want to. 00:21:09.420 --> 00:21:12.730 And similarly with variables, you need to go to their address 00:21:12.730 --> 00:21:15.320 if you want to interact with them, you can't just know the address. 00:21:15.320 --> 00:21:21.495 And the way you go to the address is to use *, the dereference operator. 00:21:21.495 --> 00:21:23.620 What do you think happens if we try and dereference 00:21:23.620 --> 00:21:25.260 a pointer whose value is null? 00:21:25.260 --> 00:21:28.470 Recall that the null pointer points to nothing. 00:21:28.470 --> 00:21:34.110 So if you try and dereference nothing or go to the address of nothing, 00:21:34.110 --> 00:21:36.800 what do you think happens? 00:21:36.800 --> 00:21:39.630 Well if you guessed segmentation fault, you'd be right. 00:21:39.630 --> 00:21:41.390 If you try and dereference a null pointer, 00:21:41.390 --> 00:21:43.140 you suffer a segmentation fault. But wait, 00:21:43.140 --> 00:21:45.820 didn't I tell you, that if you're not going 00:21:45.820 --> 00:21:49.220 to set the value of your pointer to something meaningful, 00:21:49.220 --> 00:21:51.000 you should set it to null? 00:21:51.000 --> 00:21:55.290 I did and actually the segmentation fault is kind of a good behavior. 00:21:55.290 --> 00:21:58.680 >> Have you ever declared a variable and not assigned its value immediately?
00:21:58.680 --> 00:22:02.680 So you just say int x; you don't actually assign it to anything 00:22:02.680 --> 00:22:05.340 and then later on in your code, you print out the value of x, 00:22:05.340 --> 00:22:07.650 having still not assigned it to anything. 00:22:07.650 --> 00:22:10.370 Frequently you'll get zero, but sometimes you 00:22:10.370 --> 00:22:15.000 might get some random number, and you have no idea where it came from. 00:22:15.000 --> 00:22:16.750 Similar things can happen with pointers. 00:22:16.750 --> 00:22:20.110 When you declare a pointer int *pk for example, 00:22:20.110 --> 00:22:23.490 and you don't assign it to a value, you get four bytes of memory. 00:22:23.490 --> 00:22:25.950 Whatever four bytes of memory the system can 00:22:25.950 --> 00:22:28.970 find that have some meaningful value. 00:22:28.970 --> 00:22:31.760 And there might have been something already there that 00:22:31.760 --> 00:22:34.190 is no longer needed by another function, so you just have 00:22:34.190 --> 00:22:35.900 whatever data was there. 00:22:35.900 --> 00:22:40.570 >> What if you tried to dereference some address that you don't- there were 00:22:40.570 --> 00:22:43.410 already bytes and information in there, that's now in your pointer. 00:22:43.410 --> 00:22:47.470 If you try and dereference that pointer, you might be messing with some memory 00:22:47.470 --> 00:22:49.390 that you didn't intend to mess with at all. 00:22:49.390 --> 00:22:51.639 And in fact you could do something really devastating, 00:22:51.639 --> 00:22:54.880 like break another program, or break another function, 00:22:54.880 --> 00:22:58.289 or do something malicious that you didn't intend to do at all. 00:22:58.289 --> 00:23:00.080 And so that's why it's actually a good idea 00:23:00.080 --> 00:23:04.030 to set your pointers to null if you don't set them to something meaningful.
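One way to put that habit into practice is sketched below. The helper name read_or_default is hypothetical and not from the lecture. The idea is that a pointer set to NULL can be checked before it is dereferenced, whereas a pointer left uninitialized holds an arbitrary address that cannot be checked at all.

```c
#include <stddef.h>

/* Hypothetical helper: returns the int that p points to,
   or the fallback value when p is NULL. */
int read_or_default(const int *p, int fallback)
{
    if (p == NULL)
    {
        return fallback;  /* nothing to dereference, so do not try */
    }
    return *p;            /* safe: p holds the address of a real int */
}
```

A caller might declare int *pk = NULL; up front, and later assign pk = &k; once there is a real variable to point to. Until that assignment happens, read_or_default(pk, -1) hands back -1 instead of touching memory.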
00:23:04.030 --> 00:23:06.760 It's probably better at the end of the day for your program 00:23:06.760 --> 00:23:09.840 to crash than for it to do something that screws up 00:23:09.840 --> 00:23:12.400 another program or another function. 00:23:12.400 --> 00:23:15.207 That behavior is probably even less ideal than just crashing. 00:23:15.207 --> 00:23:17.040 And so that's why it's actually a good habit 00:23:17.040 --> 00:23:20.920 to get into to set your pointers to null if you don't set them 00:23:20.920 --> 00:23:24.540 to a meaningful value immediately, a value that you know 00:23:24.540 --> 00:23:27.260 and that you can safely dereference. 00:23:27.260 --> 00:23:32.240 >> So let's come back now and take a look at the overall syntax of the situation. 00:23:32.240 --> 00:23:37.400 If I say int *p;, what have I just done? 00:23:37.400 --> 00:23:38.530 What I've done is this. 00:23:38.530 --> 00:23:43.290 I know the value of p is an address because all pointers are just 00:23:43.290 --> 00:23:44.660 addresses. 00:23:44.660 --> 00:23:47.750 I can dereference p using the * operator. 00:23:47.750 --> 00:23:51.250 In this context here, at the very top recall the * is part of the type. 00:23:51.250 --> 00:23:53.510 Int * is the data type. 00:23:53.510 --> 00:23:56.150 But I can dereference p using the * operator, 00:23:56.150 --> 00:24:01.897 and if I do so, if I go to that address, what will I find at that address? 00:24:01.897 --> 00:24:02.855 I will find an integer. 00:24:02.855 --> 00:24:05.910 So int *p is basically saying, p is an address. 00:24:05.910 --> 00:24:09.500 I can dereference p and if I do, I will find an integer 00:24:09.500 --> 00:24:11.920 at that memory location. 00:24:11.920 --> 00:24:14.260 >> OK so I said there was another annoying thing with stars 00:24:14.260 --> 00:24:17.060 and here's where that annoying thing with stars is.
00:24:17.060 --> 00:24:21.640 Have you ever tried to declare multiple variables of the same type 00:24:21.640 --> 00:24:24.409 on the same line of code? 00:24:24.409 --> 00:24:27.700 So for a second, pretend that the line, the code I actually have there in green 00:24:27.700 --> 00:24:29.366 isn't there and it just says int x,y,z;. 00:24:31.634 --> 00:24:34.550 What that would do is actually create three integer variables for you, 00:24:34.550 --> 00:24:36.930 one called x, one called y, and one called z. 00:24:36.930 --> 00:24:41.510 It's a way to do it without having to split onto three lines. 00:24:41.510 --> 00:24:43.890 >> Here's where stars get annoying again though, 00:24:43.890 --> 00:24:49.200 because the * is actually part of both the type name and part 00:24:49.200 --> 00:24:50.320 of the variable name. 00:24:50.320 --> 00:24:56.430 And so if I say int *px,py,pz, what I actually get is a pointer to an integer 00:24:56.430 --> 00:25:01.650 called px and two integers, py and pz. 00:25:01.650 --> 00:25:04.950 And that's probably not what we want, that's not good. 00:25:04.950 --> 00:25:09.290 >> So if I want to create multiple pointers on the same line, of the same type, 00:25:09.290 --> 00:25:12.140 what I actually need to do is say int *pa,*pb,*pc. 00:25:17.330 --> 00:25:20.300 Now having just said that and now telling you this, 00:25:20.300 --> 00:25:22.170 you probably will never do this. 00:25:22.170 --> 00:25:25.170 And it's probably a good thing honestly, because you might inadvertently 00:25:25.170 --> 00:25:26.544 omit a star, something like that. 00:25:26.544 --> 00:25:29.290 It's probably best to maybe declare pointers on individual lines, 00:25:29.290 --> 00:25:31.373 but it's just another one of those annoying syntax 00:25:31.373 --> 00:25:35.310 things with stars that make pointers so difficult to work with. 00:25:35.310 --> 00:25:39.480 Because it's just this syntactic mess you have to work through.
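The declaration rule above can be checked with a small sketch. The function declaration_demo and the variables a, b, and c are made up for illustration. In the first mixed declaration only px is a pointer, while py and pz are plain ints; in the second, every name carries its own star.

```c
/* Hypothetical demo of how the star binds to each name, not to the type. */
int declaration_demo(void)
{
    int a = 1, b = 2, c = 3;

    /* Only px is an int pointer here; py and pz are ordinary ints. */
    int *px = &a, py = b, pz = c;

    /* Each name needs its own star to be a pointer. */
    int *pa = &a, *pb = &b, *pc = &c;

    *px = 10;   /* changes a, which pa also points to */

    /* *pa is 10, *pb is 2, *pc is 3, py is 2, pz is 3 */
    return *pa + *pb + *pc + py + pz;
}
```

Declaring each pointer on its own line, as suggested above, avoids the mixed case entirely.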
00:25:39.480 --> 00:25:41.600 With practice it does really become second nature. 00:25:41.600 --> 00:25:45.410 I still make mistakes with it after programming for 10 years, 00:25:45.410 --> 00:25:49.630 so don't be upset if something happens to you, it's pretty common honestly. 00:25:49.630 --> 00:25:52.850 It's really kind of a flaw of the syntax. 00:25:52.850 --> 00:25:54.900 >> OK so I kind of promised that we would revisit 00:25:54.900 --> 00:25:59.370 the concept of how large is a string. 00:25:59.370 --> 00:26:02.750 Well, as for string, we've really kind of 00:26:02.750 --> 00:26:04.140 been lying to you the whole time. 00:26:04.140 --> 00:26:06.181 There's no data type called string, and in fact I 00:26:06.181 --> 00:26:09.730 mentioned this in one of our earliest videos on data types, 00:26:09.730 --> 00:26:13.820 that string was a data type that was created for you in cs50.h. 00:26:13.820 --> 00:26:17.050 You have to #include cs50.h in order to use it. 00:26:17.050 --> 00:26:19.250 >> Well string is really just an alias for something 00:26:19.250 --> 00:26:23.600 called the char *, a pointer to a character. 00:26:23.600 --> 00:26:26.010 Well pointers, recall, are just addresses. 00:26:26.010 --> 00:26:28.780 So what is the size in bytes of a string? 00:26:28.780 --> 00:26:29.796 Well it's four or eight. 00:26:29.796 --> 00:26:32.170 And the reason I say four or eight is because it actually 00:26:32.170 --> 00:26:36.730 depends on the system. If you're using CS50 IDE, the size of a char 00:26:36.730 --> 00:26:39.340 * is eight, it's a 64-bit system. 00:26:39.340 --> 00:26:43.850 Every address in memory is 64 bits long. 00:26:43.850 --> 00:26:48.270 If you're using the CS50 Appliance or using any 32-bit machine, 00:26:48.270 --> 00:26:51.640 and you've heard that term 32-bit machine, what is a 32-bit machine? 00:26:51.640 --> 00:26:56.090 Well it just means that every address in memory is 32 bits long.
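A quick way to see the size for yourself is sketched here. The typedef name string_alias is a stand-in chosen for this example so the code compiles without the CS50 library, and the two function names are likewise made up. The sketch reports the size of a char * and of an int *, which on common desktop systems should match each other and be either 4 or 8 bytes.

```c
#include <stddef.h>

/* Stand-in for the CS50 string alias: just another name for char *. */
typedef char *string_alias;

/* Size in bytes of the alias, which is the size of a pointer to char. */
size_t size_of_string_alias(void)
{
    return sizeof(string_alias);
}

/* Size in bytes of a pointer to int, for comparison. */
size_t size_of_int_pointer(void)
{
    return sizeof(int *);
}
```

Printing these with printf("%zu\n", size_of_string_alias()); would show which kind of system the program was compiled for.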
00:26:56.090 --> 00:26:59.140 And so 32 bits is four bytes. 00:26:59.140 --> 00:27:02.710 So a char * is four or eight bytes depending on your system. 00:27:02.710 --> 00:27:06.100 And indeed a pointer to any data 00:27:06.100 --> 00:27:12.030 type, since all pointers are just addresses, is four or eight bytes. 00:27:12.030 --> 00:27:14.030 So let's revisit this diagram and let's conclude 00:27:14.030 --> 00:27:18.130 this video with a little exercise here. 00:27:18.130 --> 00:27:21.600 So here's the diagram we left off with at the very beginning of the video. 00:27:21.600 --> 00:27:23.110 So what happens now if I say *pk=35? 00:27:26.370 --> 00:27:30.530 So what does it mean when I say, *pk=35? 00:27:30.530 --> 00:27:32.420 Take a second. 00:27:32.420 --> 00:27:34.990 *pk. 00:27:34.990 --> 00:27:39.890 In context here, * is the dereference operator. 00:27:39.890 --> 00:27:42.110 So when the dereference operator is used, 00:27:42.110 --> 00:27:48.520 we go to the address pointed to by pk, and we change what we find. 00:27:48.520 --> 00:27:55.270 So *pk=35 effectively does this to the picture. 00:27:55.270 --> 00:27:58.110 So it's basically equivalent to having said k=35. 00:28:00.740 --> 00:28:01.930 >> One more. 00:28:01.930 --> 00:28:05.510 If I say int m, I create a new variable called m. 00:28:05.510 --> 00:28:08.260 A new box, it's a green box because it's going to hold an integer, 00:28:08.260 --> 00:28:09.840 and it's labeled m. 00:28:09.840 --> 00:28:14.960 If I say m=4, I put an integer into that box. 00:28:14.960 --> 00:28:20.290 If I say pk=&m, how does this diagram change? 00:28:20.290 --> 00:28:28.760 Pk=&m, do you recall what the & operator does or is called? 00:28:28.760 --> 00:28:34.430 Remember that & followed by some variable name is the address of that variable. 00:28:34.430 --> 00:28:38.740 So what we're saying is pk gets the address of m.
00:28:38.740 --> 00:28:42.010 And so effectively what happens to the diagram is that pk no longer points 00:28:42.010 --> 00:28:46.420 to k, but points to m. 00:28:46.420 --> 00:28:48.470 >> Again pointers are very tricky to work with 00:28:48.470 --> 00:28:50.620 and they take a lot of practice, but because 00:28:50.620 --> 00:28:54.150 of their ability to allow you to pass data between functions 00:28:54.150 --> 00:28:56.945 and actually have those changes take effect, 00:28:56.945 --> 00:28:58.820 getting your head around them is really important. 00:28:58.820 --> 00:29:02.590 It probably is the most complicated topic we discuss in CS50, 00:29:02.590 --> 00:29:05.910 but the value that you get from using pointers 00:29:05.910 --> 00:29:09.200 far outweighs the complications that come from learning them. 00:29:09.200 --> 00:29:12.690 So I wish you the best of luck learning about pointers. 00:29:12.690 --> 00:29:15.760 I'm Doug Lloyd, this is CS50.
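The closing exercise can be traced in code. This is only a sketch: the starting value k = 5 is an assumption about the diagram, and the function names exercise and swap are chosen here for illustration. The first function follows the steps *pk = 35 and pk = &m, and the second shows the payoff mentioned above, a swap that takes effect in the caller because it receives addresses rather than copies.

```c
/* Hypothetical trace of the exercise; k = 5 is an assumed starting value. */
int exercise(void)
{
    int k = 5;
    int *pk = &k;   /* pk holds the address of k */

    *pk = 35;       /* go to that address and store 35, so k is now 35 */

    int m = 4;
    pk = &m;        /* pk no longer points to k, it points to m */
    *pk = 7;        /* this changes m and leaves k alone */

    return k * 100 + m;   /* 35 * 100 + 7 */
}

/* Swaps the two ints that a and b point to, so the caller sees the change. */
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

With int x = 1, y = 2; a call to swap(&x, &y) leaves x equal to 2 and y equal to 1 in the caller, which the copy-based version of swap could not do.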