1 00:00:00,000 --> 00:00:02,520 [Section 4 - More Comfortable] 2 00:00:02,520 --> 00:00:04,850 [Rob Bowden - Harvard University] 3 00:00:04,850 --> 00:00:07,370 [This is CS50. - CS50.TV] 4 00:00:08,920 --> 00:00:13,350 We have a quiz tomorrow, in case you guys didn't know that. 5 00:00:14,810 --> 00:00:20,970 It's basically on everything you could have seen in class or should have seen in class. 6 00:00:20,970 --> 00:00:26,360 That includes pointers, even though they're a very recent topic. 7 00:00:26,360 --> 00:00:29,860 You should at least understand the high levels of them. 8 00:00:29,860 --> 00:00:34,760 Anything that was gone over in class you should understand for the quiz. 9 00:00:34,760 --> 00:00:37,320 So if you have questions on them, you can ask them now. 10 00:00:37,320 --> 00:00:43,280 But this is going to be a very student-led session where you guys ask questions, 11 00:00:43,280 --> 00:00:45,060 so hopefully people have questions. 12 00:00:45,060 --> 00:00:48,020 Does anyone have questions? 13 00:00:49,770 --> 00:00:52,090 Yes. >>[student] Can you go over pointers again? 14 00:00:52,090 --> 00:00:54,350 I'll go over pointers. 15 00:00:54,350 --> 00:00:59,180 All of your variables necessarily live in memory, 16 00:00:59,180 --> 00:01:04,450 but usually you don't worry about that and you just say x + 2 and y + 3 17 00:01:04,450 --> 00:01:07,080 and the compiler will figure out where the things are living for you. 18 00:01:07,080 --> 00:01:12,990 Once you're dealing with pointers, now you're explicitly using those memory addresses. 19 00:01:12,990 --> 00:01:19,800 So a single variable will only ever live at a single address at any given time. 20 00:01:19,800 --> 00:01:24,040 If we want to declare a pointer, what is the type going to look like? 21 00:01:24,040 --> 00:01:26,210 >> I want to declare a pointer p. What does the type look like? 22 00:01:26,210 --> 00:01:33,530 [student] int *p. >>Yeah. So int *p. 23 00:01:33,530 --> 00:01:38,030 And how do I make it point to x? >>[student] Ampersand. 24 00:01:40,540 --> 00:01:45,300 [Bowden] So ampersand is literally called the address of operator. 25 00:01:45,300 --> 00:01:50,460 So when I say &x it's getting the memory address of the variable x. 26 00:01:50,460 --> 00:01:56,790 So now I have the pointer p, and anywhere in my code I can use *p 27 00:01:56,790 --> 00:02:02,960 or I could use x and it will be the exact same thing. 28 00:02:02,960 --> 00:02:09,520 (*p). What is this doing? What does that star mean? 29 00:02:09,520 --> 00:02:13,120 [student] It means a value at that point. >>Yeah. 30 00:02:13,120 --> 00:02:17,590 So if we look at it, it can be very useful to draw out the diagrams 31 00:02:17,590 --> 00:02:22,230 where this is a little box of memory for x, which happens to have the value 4, 32 00:02:22,230 --> 00:02:25,980 then we have a little box of memory for p, 33 00:02:25,980 --> 00:02:31,590 and so p points to x, so we draw an arrow from p to x. 34 00:02:31,590 --> 00:02:40,270 So when we say *p we're saying go to the box that is p. 35 00:02:40,270 --> 00:02:46,480 Star is follow the arrow and then do whatever you want with that box right there. 36 00:02:46,480 --> 00:03:01,090 So I can say *p = 7; and that will go to the box that is x and change that to 7. 37 00:03:01,090 --> 00:03:13,540 Or I could say int z = *p * 2; That's confusing because it's star, star. 38 00:03:13,540 --> 00:03:19,230 The one star is dereferencing p, the other star is multiplying by 2. 39 00:03:19,230 --> 00:03:26,780 Notice I could have just as well replaced the *p with x. 40 00:03:26,780 --> 00:03:29,430 You can use them in the same way. 41 00:03:29,430 --> 00:03:38,000 And then later on I can have p point to a completely new thing. 42 00:03:38,000 --> 00:03:42,190 I can just say p = &z; 43 00:03:42,190 --> 00:03:44,940 So now p no longer points to x; it points to z. 44 00:03:44,940 --> 00:03:50,510 And any time I do *p it's the same as doing z. 45 00:03:50,510 --> 00:03:56,170 So the useful thing about this is once we start getting into functions. 46 00:03:56,170 --> 00:03:59,790 >> It's kind of useless to declare a pointer that points to something 47 00:03:59,790 --> 00:04:03,140 and then you're just dereferencing it 48 00:04:03,140 --> 00:04:06,060 when you could have used the original variable to begin with. 49 00:04:06,060 --> 00:04:18,190 But when you get into functions--so let's say we have some function, int foo, 50 00:04:18,190 --> 00:04:32,810 that takes a pointer and just does *p = 6; 51 00:04:32,810 --> 00:04:39,990 Like we saw before with swap, you can't do an effective swap and a separate function 52 00:04:39,990 --> 00:04:45,180 by just passing integers because everything in C is always passing by value. 53 00:04:45,180 --> 00:04:48,360 Even when you're passing pointers you're passing by value. 54 00:04:48,360 --> 00:04:51,940 It just so happens that those values are memory addresses. 55 00:04:51,940 --> 00:05:00,770 So when I say foo(p); I'm passing the pointer into the function foo 56 00:05:00,770 --> 00:05:03,910 and then foo is doing *p = 6; 57 00:05:03,910 --> 00:05:08,600 So inside of that function, *p is still equivalent to x, 58 00:05:08,600 --> 00:05:12,720 but I can't use x inside of that function because it's not scoped within that function. 59 00:05:12,720 --> 00:05:19,510 So *p = 6 is the only way I can access a local variable from another function. 60 00:05:19,510 --> 00:05:23,600 Or, well, pointers are the only way I can access a local variable from another function. 61 00:05:23,600 --> 00:05:31,600 [student] Let's say you wanted to return a pointer. How exactly do you do that? 62 00:05:31,600 --> 00:05:44,270 [Bowden] Return a pointer as in something like int y = 3; return &y? >>[student] Yeah. 63 00:05:44,270 --> 00:05:48,480 [Bowden] Okay. You should never do this. This is bad. 64 00:05:48,480 --> 00:05:59,480 I think I saw in these lecture slides you started seeing this whole diagram of memory 65 00:05:59,480 --> 00:06:02,880 where up here you've got memory address 0 66 00:06:02,880 --> 00:06:09,550 and down here you have memory address 4 gigs or 2 to the 32. 67 00:06:09,550 --> 00:06:15,120 So then you've got some stuff and some stuff and then you have your stack 68 00:06:15,120 --> 00:06:21,780 and you've got your heap, which you just started learning about, growing up. 69 00:06:21,780 --> 00:06:24,390 [student] Isn't the heap above the stack? 70 00:06:24,390 --> 00:06:27,760 >> Yeah. The heap is on top, isn't it? >>[student] Well, he put 0 on top. 71 00:06:27,760 --> 00:06:30,320 [student] Oh, he put 0 on top. >>[student] Oh, okay. 72 00:06:30,320 --> 00:06:36,060 Disclaimer: Anywhere with CS50 you're going to see it this way. >>[student] Okay. 73 00:06:36,060 --> 00:06:40,290 It's just that when you're first seeing stacks, 74 00:06:40,290 --> 00:06:45,000 like when you think of a stack you think of stacking things on top of one another. 75 00:06:45,000 --> 00:06:50,810 So we tend to flip this around so the stack is growing up like a stack normally would 76 00:06:50,810 --> 00:06:55,940 instead of the stack hanging down. >>[student] Don't heaps technically grow up too, though? 77 00:06:55,940 --> 00:07:01,100 It depends on what you mean by grow up. 78 00:07:01,100 --> 00:07:04,010 The stack and heap always grow in opposite directions. 79 00:07:04,010 --> 00:07:09,420 A stack is always growing up in the sense that it's growing up 80 00:07:09,420 --> 00:07:12,940 towards higher memory addresses, and the heap is growing down 81 00:07:12,940 --> 00:07:17,260 in that it's growing towards lower memory addresses. 82 00:07:17,260 --> 00:07:20,250 So the top is 0 and the bottom is high memory addresses. 83 00:07:20,250 --> 00:07:26,390 They're both growing, just in opposing directions. 84 00:07:26,390 --> 00:07:29,230 [student] I just meant that because you said you put stack on the bottom 85 00:07:29,230 --> 00:07:33,640 because it seems more intuitive because for the stack to start at the top of a heap, 86 00:07:33,640 --> 00:07:37,520 heap's on top of itself too, so that's-- >>Yeah. 87 00:07:37,520 --> 00:07:44,960 You also think of the heap as growing up and larger, but the stack more so. 88 00:07:44,960 --> 00:07:50,280 So the stack is the one that we kind of want to show growing up. 89 00:07:50,280 --> 00:07:55,390 But everywhere you look otherwise is going to show address 0 at the top 90 00:07:55,390 --> 00:07:59,590 and the highest memory address at the bottom, so this is your usual view of memory. 91 00:07:59,590 --> 00:08:02,100 >> Do you have a question? 92 00:08:02,100 --> 00:08:04,270 [student] Can you tell us more about the heap? 93 00:08:04,270 --> 00:08:06,180 Yeah. I'll get to that in a second. 94 00:08:06,180 --> 00:08:12,220 First, going back to why returning &y is a bad thing, 95 00:08:12,220 --> 00:08:18,470 on the stack you have a bunch of stack frames which represent all of the functions 96 00:08:18,470 --> 00:08:20,460 which have been called. 97 00:08:20,460 --> 00:08:27,990 So ignoring previous things, the top of your stack is always going to be the main function 98 00:08:27,990 --> 00:08:33,090 since that's the first function that's being called. 99 00:08:33,090 --> 00:08:37,130 And then when you call another function, the stack is going to grow down. 100 00:08:37,130 --> 00:08:41,640 So if I call some function, foo, and it gets its own stack frame, 101 00:08:41,640 --> 00:08:47,280 it can call some function, bar; it gets its own stack frame. 102 00:08:47,280 --> 00:08:49,840 And bar could be recursive and it could call itself, 103 00:08:49,840 --> 00:08:54,150 and so that second call to bar is going to get its own stack frame. 104 00:08:54,150 --> 00:08:58,880 And so what goes in these stack frames are all of the local variables 105 00:08:58,880 --> 00:09:03,450 and all of the function arguments that-- 106 00:09:03,450 --> 00:09:08,730 Any things that are locally scoped to this function go in these stack frames. 107 00:09:08,730 --> 00:09:21,520 So that means when I said something like bar is a function, 108 00:09:21,520 --> 00:09:29,270 I'm just going to declare an integer and then return a pointer to that integer. 109 00:09:29,270 --> 00:09:33,790 So where does y live? 110 00:09:33,790 --> 00:09:36,900 [student] y lives in bar. >>[Bowden] Yeah. 111 00:09:36,900 --> 00:09:45,010 Somewhere in this little square of memory is a littler square that has y in it. 112 00:09:45,010 --> 00:09:53,370 When I return &y, I'm returning a pointer to this little block of memory. 113 00:09:53,370 --> 00:09:58,400 But then when a function returns, its stack frame gets popped off the stack. 114 00:10:01,050 --> 00:10:03,530 And that's why it's called stack. 115 00:10:03,530 --> 00:10:06,570 It's like the stack data structure, if you know what that is. 116 00:10:06,570 --> 00:10:11,580 Or even like a stack of trays is always the example, 117 00:10:11,580 --> 00:10:16,060 main is going to go on the bottom, then the first function you call is going to go on top of that, 118 00:10:16,060 --> 00:10:20,400 and you can't get back to main until you return from all functions which have been called 119 00:10:20,400 --> 00:10:22,340 that have been placed on top of it. 120 00:10:22,340 --> 00:10:28,650 >> [student] So if you did do return the &y, that value is subject to change without notice. 121 00:10:28,650 --> 00:10:31,290 Yes, it's-- >>[student] It could be overwritten. >>Yeah. 122 00:10:31,290 --> 00:10:34,660 It's completely-- If you try and-- 123 00:10:34,660 --> 00:10:38,040 This would also be an int *bar because it's returning a pointer, 124 00:10:38,040 --> 00:10:41,310 so its return type is int *. 125 00:10:41,310 --> 00:10:46,500 If you try to use the return value of this function, it's undefined behavior 126 00:10:46,500 --> 00:10:51,770 because that pointer points to bad memory. >>[student] Okay. 127 00:10:51,770 --> 00:11:01,250 So what if, for example, you declared int *y = malloc(sizeof(int))? 128 00:11:01,250 --> 00:11:03,740 That's better. Yes. 129 00:11:03,740 --> 00:11:07,730 [student] We talked about how when we drag things to our recycle bin 130 00:11:07,730 --> 00:11:11,750 they're not actually erased; we just lose their pointers. 131 00:11:11,750 --> 00:11:15,550 So in this case do we actually erase the value or is it still there in memory? 132 00:11:15,550 --> 00:11:19,130 For the most part, it's going to still be there. 133 00:11:19,130 --> 00:11:24,220 But let's say we happen to call some other function, baz. 134 00:11:24,220 --> 00:11:28,990 Baz is going to get its own stack frame on here. 135 00:11:28,990 --> 00:11:31,470 It's going to be overwriting all of this stuff, 136 00:11:31,470 --> 00:11:34,180 and then if you later try and use the pointer that you got before, 137 00:11:34,180 --> 00:11:35,570 it's not going to be the same value. 138 00:11:35,570 --> 00:11:38,150 It's going to have changed just because you called the function baz. 139 00:11:38,150 --> 00:11:43,080 [student] But had we not, would we still get 3? 140 00:11:43,080 --> 00:11:44,990 [Bowden] In all likelihood, you would. 141 00:11:44,990 --> 00:11:49,670 But you can't rely on that. C just says undefined behavior. 142 00:11:49,670 --> 00:11:51,920 >> [student] Oh, it does. Okay. 143 00:11:51,920 --> 00:11:58,190 So when you want to return a pointer, this is where malloc comes in use. 144 00:12:00,930 --> 00:12:15,960 I'm writing actually just return malloc(3 * sizeof(int)). 145 00:12:17,360 --> 00:12:24,050 We'll go over malloc more in a second, but the idea of malloc is all of your local variables 146 00:12:24,050 --> 00:12:26,760 always go on the stack. 147 00:12:26,760 --> 00:12:31,570 Anything that's malloced goes on the heap, and it will forever and always be on the heap 148 00:12:31,570 --> 00:12:34,490 until you explicitly free it. 149 00:12:34,490 --> 00:12:42,130 So this means that when you malloc something, it's going to survive after the function returns. 150 00:12:42,130 --> 00:12:46,800 [student] Will it survive after the program stops running? >>No. 151 00:12:46,800 --> 00:12:53,180 Okay, so it's going to be there until the program is all the way done running. >>Yes. 152 00:12:53,180 --> 00:12:57,510 We can go over details of what happens when the program stops running. 153 00:12:57,510 --> 00:13:02,150 You might need to remind me, but that is a separate thing entirely. 154 00:13:02,150 --> 00:13:04,190 [student] So malloc creates a pointer? >>Yeah. 155 00:13:04,190 --> 00:13:13,030 Malloc-- >>[student] I think malloc designates a block of memory that a pointer can use. 156 00:13:15,400 --> 00:13:19,610 [Bowden] I want that diagram again. >>[student] So this function works, though? 157 00:13:19,610 --> 00:13:26,430 [student] Yeah, malloc designates a block of memory that you can use, 158 00:13:26,430 --> 00:13:30,470 and then it returns the address of the first block of that memory. 159 00:13:30,470 --> 00:13:36,750 >> [Bowden] Yeah. So when you malloc, you're grabbing some block of memory 160 00:13:36,750 --> 00:13:38,260 that's currently in the heap. 161 00:13:38,260 --> 00:13:43,040 If the heap is too small, then the heap is just going to grow, and it grows in this direction. 162 00:13:43,040 --> 00:13:44,650 So let's say the heap is too small. 163 00:13:44,650 --> 00:13:49,960 Then it's about to grow a little bit and return a pointer to this block that just grew. 164 00:13:49,960 --> 00:13:55,130 When you free stuff, you're making more room in the heap, 165 00:13:55,130 --> 00:14:00,030 so then a later call to malloc can reuse that memory that you had previously freed. 166 00:14:00,030 --> 00:14:09,950 The important thing about malloc and free is that it gives you complete control 167 00:14:09,950 --> 00:14:12,700 over the lifetime of these memory blocks. 168 00:14:12,700 --> 00:14:15,420 Global variables are always alive. 169 00:14:15,420 --> 00:14:18,500 Local variables are alive within their scope. 170 00:14:18,500 --> 00:14:22,140 As soon as you go past a curly brace, the local variables are dead. 171 00:14:22,140 --> 00:14:28,890 Malloced memory is alive when you want it to be alive 172 00:14:28,890 --> 00:14:33,480 and then is released when you tell it to be released. 173 00:14:33,480 --> 00:14:38,420 Those are actually the only 3 types of memory, really. 174 00:14:38,420 --> 00:14:41,840 There's automatic memory management, which is the stack. 175 00:14:41,840 --> 00:14:43,840 Things happen for you automatically. 176 00:14:43,840 --> 00:14:46,910 When you say int x, memory is allocated for int x. 177 00:14:46,910 --> 00:14:51,630 When x goes out of scope, memory is reclaimed for x. 178 00:14:51,630 --> 00:14:54,790 Then there's dynamic memory management, which is what malloc is, 179 00:14:54,790 --> 00:14:56,740 which is when you have control. 180 00:14:56,740 --> 00:15:01,290 You dynamically decide when memory should and should not be allocated. 181 00:15:01,290 --> 00:15:05,050 And then there's static, which just means that it lives forever, 182 00:15:05,050 --> 00:15:06,610 which is what global variables are. 183 00:15:06,610 --> 00:15:10,240 They're just always in memory. 184 00:15:10,960 --> 00:15:12,760 >> Questions? 185 00:15:14,490 --> 00:15:17,230 [student] Can you define a block just by using curly braces 186 00:15:17,230 --> 00:15:21,220 but not having to have an if statement or a while statement or anything like that? 187 00:15:21,220 --> 00:15:29,130 You can define a block as in a function, but that has curly braces too. 188 00:15:29,130 --> 00:15:32,100 [student] So you can't just have like a random pair of curly braces in your code 189 00:15:32,100 --> 00:15:35,680 that have local variables? >>Yes, you can. 190 00:15:35,680 --> 00:15:45,900 Inside of int bar we could have {int y = 3;}. 191 00:15:45,900 --> 00:15:48,440 That's supposed to be right here. 192 00:15:48,440 --> 00:15:52,450 But that completely defines the scope of int y. 193 00:15:52,450 --> 00:15:57,320 After that second curly brace, y cannot be used anymore. 194 00:15:57,910 --> 00:16:00,630 You almost never do that, though. 195 00:16:02,940 --> 00:16:07,370 Getting back to what happens when a program ends, 196 00:16:07,370 --> 00:16:18,760 there's kind of a misconception/half lie that we give in order to just make things easier. 197 00:16:18,760 --> 00:16:24,410 We tell you that when you allocate memory 198 00:16:24,410 --> 00:16:29,860 you're allocating some chunk of RAM for that variable. 199 00:16:29,860 --> 00:16:34,190 But you're not really directly touching RAM ever in your programs. 200 00:16:34,190 --> 00:16:37,490 If you think of it, how I drew-- 201 00:16:37,490 --> 00:16:44,330 And actually, if you go through in GDB you'll see the same thing. 202 00:16:51,120 --> 00:16:57,590 Regardless of how many times you run your program or what program you're running, 203 00:16:57,590 --> 00:16:59,950 the stack is always going to start-- 204 00:16:59,950 --> 00:17:06,510 you're always going to see variables around address oxbffff something. 205 00:17:06,510 --> 00:17:09,470 It's usually somewhere in that region. 206 00:17:09,470 --> 00:17:18,760 But how can 2 programs possibly have pointers to the same memory? 207 00:17:20,640 --> 00:17:27,650 [student] There's some arbitrary designation of where oxbfff is supposed to be on the RAM 208 00:17:27,650 --> 00:17:31,320 that can actually be in different places depending on when the function was called. 209 00:17:31,320 --> 00:17:35,920 Yeah. The term is virtual memory. 210 00:17:35,920 --> 00:17:42,250 The idea is that every single process, every single program that is running on your computer 211 00:17:42,250 --> 00:17:49,450 has its own--let's assume 32 bits--completely independent address space. 212 00:17:49,450 --> 00:17:51,590 This is the address space. 213 00:17:51,590 --> 00:17:56,220 It has its own completely independent 4 gigabytes to use. 214 00:17:56,220 --> 00:18:02,220 >> So if you run 2 programs simultaneously, this program sees 4 gigabytes to itself, 215 00:18:02,220 --> 00:18:04,870 this program sees 4 gigabytes to itself, 216 00:18:04,870 --> 00:18:07,720 and it's impossible for this program to dereference a pointer 217 00:18:07,720 --> 00:18:10,920 and end up with memory from this program. 218 00:18:10,920 --> 00:18:18,200 And what virtual memory is is a mapping from a processes address space 219 00:18:18,200 --> 00:18:20,470 to actual things on RAM. 220 00:18:20,470 --> 00:18:22,940 So it's up to your operating system to know that, 221 00:18:22,940 --> 00:18:28,080 hey, when this guy dereferences pointer oxbfff, that really means 222 00:18:28,080 --> 00:18:31,040 that he wants RAM byte 1000, 223 00:18:31,040 --> 00:18:38,150 whereas if this program dereferences oxbfff, he really wants RAM byte 10000. 224 00:18:38,150 --> 00:18:41,590 They can be arbitrarily far apart. 225 00:18:41,590 --> 00:18:48,730 This is even true of things within a single processes address space. 226 00:18:48,730 --> 00:18:54,770 So like it sees all 4 gigabytes to itself, but let's say-- 227 00:18:54,770 --> 00:18:57,290 [student] Does every single process-- 228 00:18:57,290 --> 00:19:01,350 Let's say you have a computer with only 4 gigabytes of RAM. 229 00:19:01,350 --> 00:19:06,430 Does every single process see the whole 4 gigabytes? >>Yes. 230 00:19:06,430 --> 00:19:13,060 But the 4 gigabytes it sees is a lie. 231 00:19:13,060 --> 00:19:20,460 It's just it thinks it has all this memory because it doesn't know any other process exists. 232 00:19:20,460 --> 00:19:28,140 It will only use as much memory as it actually needs. 233 00:19:28,140 --> 00:19:32,340 The operating system is not going to give RAM to this process 234 00:19:32,340 --> 00:19:35,750 if it's not using any memory in this entire region. 235 00:19:35,750 --> 00:19:39,300 It's not going to give it memory for that region. 236 00:19:39,300 --> 00:19:54,780 But the idea is that-- I'm trying to think of-- I can't think of an analogy. 237 00:19:54,780 --> 00:19:56,780 Analogies are hard. 238 00:19:57,740 --> 00:20:02,700 One of the issues of virtual memory or one of the things it's solving 239 00:20:02,700 --> 00:20:06,810 is that processes should be completely unaware of one another. 240 00:20:06,810 --> 00:20:12,140 And so you can write any program that just dereferences any pointer, 241 00:20:12,140 --> 00:20:19,340 like just write a program that says *(ox1234), 242 00:20:19,340 --> 00:20:22,890 and that's dereferencing memory address 1234. 243 00:20:22,890 --> 00:20:28,870 >> But it's up to the operating system to then translate what 1234 means. 244 00:20:28,870 --> 00:20:33,960 So if 1234 happens to be a valid memory address for this process, 245 00:20:33,960 --> 00:20:38,800 like it's on the stack or something, then this will return the value of that memory address 246 00:20:38,800 --> 00:20:41,960 as far as the process knows. 247 00:20:41,960 --> 00:20:47,520 But if 1234 is not a valid address, like it happens to land 248 00:20:47,520 --> 00:20:52,910 in some little piece of memory here that is beyond the stack and beyond the heap 249 00:20:52,910 --> 00:20:57,200 and you haven't really used that, then that's when you get things like segfaults 250 00:20:57,200 --> 00:21:00,260 because you're touching memory that you should not be touching. 251 00:21:07,180 --> 00:21:09,340 This is also true-- 252 00:21:09,340 --> 00:21:15,440 A 32-bit system, 32 bits means you have 32 bits to define a memory address. 253 00:21:15,440 --> 00:21:22,970 It's why pointers are 8 bytes because 32 bits are 8 bytes--or 4 bytes. 254 00:21:22,970 --> 00:21:25,250 Pointers are 4 bytes. 255 00:21:25,250 --> 00:21:33,680 So when you see a pointer like oxbfffff, that is-- 256 00:21:33,680 --> 00:21:40,080 Within any given program you can just construct any arbitrary pointer, 257 00:21:40,080 --> 00:21:46,330 anywhere from ox0 to ox 8 f's--ffffffff. 258 00:21:46,330 --> 00:21:49,180 [student] Didn't you say they're 4 bytes? >>Yeah. 259 00:21:49,180 --> 00:21:52,730 [student] Then each byte will have-- >>[Bowden] Hexadecimal. 260 00:21:52,730 --> 00:21:59,360 Hexadecimal--5, 6, 7, 8. So pointers you're going to always see in hexadecimal. 261 00:21:59,360 --> 00:22:01,710 It's just how we classify pointers. 262 00:22:01,710 --> 00:22:05,240 Every 2 digits of hexadecimal is 1 byte. 263 00:22:05,240 --> 00:22:09,600 So there's going to be 8 hexadecimal digits for 4 bytes. 264 00:22:09,600 --> 00:22:14,190 So every single pointer on a 32-bit system is going to be 4 bytes, 265 00:22:14,190 --> 00:22:18,550 which means that in your process you can construct any arbitrary 4 bytes 266 00:22:18,550 --> 00:22:20,550 and make a pointer out of it, 267 00:22:20,550 --> 00:22:32,730 which means that as far as it's aware, it can address an entire 2 to the 32 bytes of memory. 268 00:22:32,730 --> 00:22:34,760 Even though it doesn't really have access to that, 269 00:22:34,760 --> 00:22:40,190 even if your computer only has 512 megabytes, it thinks it has that much memory. 270 00:22:40,190 --> 00:22:44,930 And the operating system is smart enough that it will only allocate what you actually need. 271 00:22:44,930 --> 00:22:49,630 It doesn't just go, oh, a new process: 4 gigs. 272 00:22:49,630 --> 00:22:51,930 >> Yeah. >>[student] What does the ox mean? Why do you write it? 273 00:22:51,930 --> 00:22:54,980 It's just the symbol for hexadecimal. 274 00:22:54,980 --> 00:22:59,590 When you see a number start with ox, the successive things are hexadecimal. 275 00:23:01,930 --> 00:23:05,760 [student] You were explaining about what happens when a program ends. >>Yes. 276 00:23:05,760 --> 00:23:09,480 What happens when a program ends is the operating system 277 00:23:09,480 --> 00:23:13,600 just erases the mappings that it has for these addresses, and that's it. 278 00:23:13,600 --> 00:23:17,770 The operating system can now just give that memory to another program to use. 279 00:23:17,770 --> 00:23:19,490 [student] Okay. 280 00:23:19,490 --> 00:23:24,800 So when you allocate something on the heap or the stack or global variables or anything, 281 00:23:24,800 --> 00:23:27,010 they all just disappear as soon as the program ends 282 00:23:27,010 --> 00:23:32,120 because the operating system is now free to give that memory to any other process. 283 00:23:32,120 --> 00:23:35,150 [student] Even though there are probably still values written in? >>Yeah. 284 00:23:35,150 --> 00:23:37,740 The values are likely still there. 285 00:23:37,740 --> 00:23:41,570 It's just it's going to be difficult to get at them. 286 00:23:41,570 --> 00:23:45,230 It's much more difficult to get at them than it is to get at a deleted file 287 00:23:45,230 --> 00:23:51,450 because the deleted file kind of sits there for a long time and the hard drive is a lot bigger. 288 00:23:51,450 --> 00:23:54,120 So it's going to overwrite different parts of memory 289 00:23:54,120 --> 00:23:58,640 before it happens to overwrite the chunk of memory that that file used to be at. 290 00:23:58,640 --> 00:24:04,520 But main memory, RAM, you cycle through a lot faster, 291 00:24:04,520 --> 00:24:08,040 so it's going to very rapidly be overwritten. 292 00:24:10,300 --> 00:24:13,340 Questions on this or anything else? 293 00:24:13,340 --> 00:24:16,130 [student] I have questions about a different topic. >>Okay. 294 00:24:16,130 --> 00:24:19,060 Does anyone have questions on this? 295 00:24:20,170 --> 00:24:23,120 >> Okay. Different topic. >>[student] Okay. 296 00:24:23,120 --> 00:24:26,550 I was going through some of the practice tests, 297 00:24:26,550 --> 00:24:30,480 and in one of them it was talking about the sizeof 298 00:24:30,480 --> 00:24:35,630 and the value that it returns or different variable types. >>Yes. 299 00:24:35,630 --> 00:24:45,060 And it said that both int and long both return 4, so they're both 4 bytes long. 300 00:24:45,060 --> 00:24:48,070 Is there any difference between an int and a long, or is it the same thing? 301 00:24:48,070 --> 00:24:50,380 Yes, there is a difference. 302 00:24:50,380 --> 00:24:52,960 The C standard-- 303 00:24:52,960 --> 00:24:54,950 I'm probably going to mess up. 304 00:24:54,950 --> 00:24:58,800 The C standard is just like what C is, the official documentation of C. 305 00:24:58,800 --> 00:25:00,340 This is what it says. 306 00:25:00,340 --> 00:25:08,650 So the C standard just says that a char will forever and always be 1 byte. 307 00:25:10,470 --> 00:25:19,040 Everything after that--a short is always just defined as being greater than or equal to a char. 308 00:25:19,040 --> 00:25:23,010 This might be strictly greater than, but not positive. 309 00:25:23,010 --> 00:25:31,940 An int is just defined as being greater than or equal to a short. 310 00:25:31,940 --> 00:25:36,210 And a long is just defined as being greater than or equal to an int. 311 00:25:36,210 --> 00:25:41,600 And a long long is greater than or equal to a long. 312 00:25:41,600 --> 00:25:46,610 So the only thing the C standard defines is the relative ordering of everything. 313 00:25:46,610 --> 00:25:54,880 The actual amount of memory that things take up is generally up to implementation, 314 00:25:54,880 --> 00:25:57,640 but it's pretty well defined at this point. >>[student] Okay. 315 00:25:57,640 --> 00:26:02,490 So shorts are almost always going to be 2 bytes. 316 00:26:04,920 --> 00:26:09,950 Ints are almost always going to be 4 bytes. 317 00:26:12,070 --> 00:26:15,340 Long longs are almost always going to be 8 bytes. 318 00:26:17,990 --> 00:26:23,160 And longs, it depends on whether you're using a 32-bit or a 64-bit system. 319 00:26:23,160 --> 00:26:27,450 So a long is going to correspond to the type of system. 320 00:26:27,450 --> 00:26:31,920 If you're using a 32-bit system like the Appliance, it's going to be 4 bytes. 321 00:26:34,530 --> 00:26:42,570 If you're using a 64-bit like a lot of recent computers, it's going to be 8 bytes. 322 00:26:42,570 --> 00:26:45,230 >> Ints are almost always 4 bytes at this point. 323 00:26:45,230 --> 00:26:47,140 Long longs are almost always 8 bytes. 324 00:26:47,140 --> 00:26:50,300 In the past, ints used to only be 2 bytes. 325 00:26:50,300 --> 00:26:56,840 But notice that this completely satisfies all of these relations of greater than and equal to. 326 00:26:56,840 --> 00:27:01,280 So long is perfectly allowed to be the same size as an integer, 327 00:27:01,280 --> 00:27:04,030 and it's also allowed to be the same size as a long long. 328 00:27:04,030 --> 00:27:11,070 And it just so happens to be that in 99.999% of systems, it is going to be equal to 329 00:27:11,070 --> 00:27:15,800 either an int or a long long. It just depends on 32-bit or 64-bit. >>[student] Okay. 330 00:27:15,800 --> 00:27:24,600 In floats, how is the decimal point designated in terms of bits? 331 00:27:24,600 --> 00:27:27,160 Like as binary? >>Yeah. 332 00:27:27,160 --> 00:27:30,570 You do not need to know that for CS50. 333 00:27:30,570 --> 00:27:32,960 You don't even learn that in 61. 334 00:27:32,960 --> 00:27:37,350 You don't learn that really in any course. 335 00:27:37,350 --> 00:27:42,740 It's just a representation. 336 00:27:42,740 --> 00:27:45,440 I forget the exact bit allotments. 337 00:27:45,440 --> 00:27:53,380 The idea of floating point is that you allocate a specific number of bits to represent-- 338 00:27:53,380 --> 00:27:56,550 Basically, everything is in scientific notation. 339 00:27:56,550 --> 00:28:05,600 So you allocate a specific number of bits to represent the number itself, like 1.2345. 340 00:28:05,600 --> 00:28:10,200 I can never represent a number with more digits than 5. 341 00:28:12,200 --> 00:28:26,300 Then you also allocate a specific number of bits so that it tends to be like 342 00:28:26,300 --> 00:28:32,810 you can only go up to a certain number, like that's the largest exponent you can have, 343 00:28:32,810 --> 00:28:36,190 and you can only go down to a certain exponent, 344 00:28:36,190 --> 00:28:38,770 like that's the smallest exponent you can have. 345 00:28:38,770 --> 00:28:44,410 >> I don't remember the exact way bits are assigned to all of these values, 346 00:28:44,410 --> 00:28:47,940 but a certain number of bits are dedicated to 1.2345, 347 00:28:47,940 --> 00:28:50,930 another certain number of bits are dedicated to the exponent, 348 00:28:50,930 --> 00:28:55,670 and it's only possible to represent an exponent of a certain size. 349 00:28:55,670 --> 00:29:01,100 [student] And a double? Is that like an extra long float? >>Yeah. 350 00:29:01,100 --> 00:29:07,940 It's the same thing as a float except now you're using 8 bytes instead of 4 bytes. 351 00:29:07,940 --> 00:29:11,960 Now you'll be able to use 9 digits or 10 digits, 352 00:29:11,960 --> 00:29:16,630 and this will be able to go up to 300 instead of 100. >>[student] Okay. 353 00:29:16,630 --> 00:29:21,550 And floats are also 4 bytes. >>Yes. 354 00:29:21,550 --> 00:29:27,520 Well, again, it probably depends overall on general implementation, 355 00:29:27,520 --> 00:29:30,610 but floats are 4 bytes, doubles are 8. 356 00:29:30,610 --> 00:29:33,440 Doubles are called double because they are double the size of floats. 357 00:29:33,440 --> 00:29:38,380 [student] Okay. And are there double doubles? >>There are not. 358 00:29:38,380 --> 00:29:43,660 I think-- >>[student] Like long longs? >>Yeah. I don't think so. Yes. 359 00:29:43,660 --> 00:29:45,950 [student] On last year's test there was a question about the main function 360 00:29:45,950 --> 00:29:49,490 having to be part of your program. 361 00:29:49,490 --> 00:29:52,310 The answer was that it doesn't have to be part of your program. 362 00:29:52,310 --> 00:29:55,100 In what situation? That's what I saw. 363 00:29:55,100 --> 00:29:59,090 [Bowden] It seems-- >>[student] What situation? 364 00:29:59,090 --> 00:30:02,880 Do you have the problem? >>[student] Yeah, I can definitely pull it up. 365 00:30:02,880 --> 00:30:07,910 It doesn't have to be, technically, but basically it's going to be. 366 00:30:07,910 --> 00:30:10,030 [student] I saw one on a different year's. 367 00:30:10,030 --> 00:30:16,220 It was like True or False: A valid-- >>Oh, a .c file? 368 00:30:16,220 --> 00:30:18,790 [student] Any .c file must have-- [both speaking at once - unintelligible] 369 00:30:18,790 --> 00:30:21,120 Okay. So that's separate. 370 00:30:21,120 --> 00:30:26,800 >> A .c file just needs to contain functions. 371 00:30:26,800 --> 00:30:32,400 You can compile a file into machine code, binary, whatever, 372 00:30:32,400 --> 00:30:36,620 without it being executable yet. 373 00:30:36,620 --> 00:30:39,420 A valid executable must have a main function. 374 00:30:39,420 --> 00:30:45,460 You can write 100 functions in 1 file but no main 375 00:30:45,460 --> 00:30:48,800 and then compile that down to binary, 376 00:30:48,800 --> 00:30:54,460 then you write another file that only has main but it calls a bunch of these functions 377 00:30:54,460 --> 00:30:56,720 in this binary file over here. 378 00:30:56,720 --> 00:31:01,240 And so when you're making the executable, that's what the linker does 379 00:31:01,240 --> 00:31:05,960 is it combines these 2 binary files into an executable. 380 00:31:05,960 --> 00:31:11,400 So a .c file does not need to have a main function at all. 381 00:31:11,400 --> 00:31:19,220 And on big code bases you'll see thousands of .c files and 1 main file. 382 00:31:23,960 --> 00:31:26,110 More questions? 383 00:31:29,310 --> 00:31:31,940 [student] There was another question. 384 00:31:31,940 --> 00:31:36,710 It said make is a compiler. True or False? 385 00:31:36,710 --> 00:31:42,030 And the answer was false, and I understood why it's not like Clang. 386 00:31:42,030 --> 00:31:44,770 But what do we call make if it's not? 387 00:31:44,770 --> 00:31:49,990 Make is basically just-- I can see exactly what it calls it. 388 00:31:49,990 --> 00:31:52,410 But it just runs commands. 389 00:31:53,650 --> 00:31:55,650 Make. 390 00:31:58,240 --> 00:32:00,870 I can pull this up. Yeah. 391 00:32:10,110 --> 00:32:13,180 Oh, yeah. Make also does that. 392 00:32:13,180 --> 00:32:17,170 This says the purpose of the make utility is to determine automatically 393 00:32:17,170 --> 00:32:19,610 which pieces of a large program need to be recompiled 394 00:32:19,610 --> 00:32:22,350 and issue the commands to recompile them. 395 00:32:22,350 --> 00:32:27,690 You can make make files that are absolutely huge. 396 00:32:27,690 --> 00:32:33,210 Make looks at the time stamps of files and, like we said before, 397 00:32:33,210 --> 00:32:36,930 you can compile individual files down, and it's not until you get to the linker 398 00:32:36,930 --> 00:32:39,270 that they're put together into an executable. 399 00:32:39,270 --> 00:32:43,810 So if you have 10 different files and you make a change to 1 of them, 400 00:32:43,810 --> 00:32:47,870 then what make is going to do is just recompile that 1 file 401 00:32:47,870 --> 00:32:50,640 and then relink everything together. 402 00:32:50,640 --> 00:32:53,020 But it's much dumber than that. 403 00:32:53,020 --> 00:32:55,690 It's up to you to completely define that that's what it should be doing. 404 00:32:55,690 --> 00:32:59,560 It by default has the ability to recognize this time stamp stuff, 405 00:32:59,560 --> 00:33:03,220 but you can write a make file to do anything. 406 00:33:03,220 --> 00:33:09,150 You can write a make file so that when you type make it just cd's to another directory. 407 00:33:09,150 --> 00:33:15,560 I was getting frustrated because I tack everything inside of my Appliance 408 00:33:15,560 --> 00:33:21,740 and then I view the PDF from the Mac. 409 00:33:21,740 --> 00:33:30,720 >> So I go to Finder and I can do Go, Connect to Server, 410 00:33:30,720 --> 00:33:36,950 and the server I connect to is my Appliance, and then I open up the PDF 411 00:33:36,950 --> 00:33:40,190 that gets compiled by LaTeX. 412 00:33:40,190 --> 00:33:49,320 But I was getting frustrated because every single time I needed to refresh the PDF, 413 00:33:49,320 --> 00:33:53,900 I had to copy it to a specific directory that it could access 414 00:33:53,900 --> 00:33:57,710 and it was getting annoying. 415 00:33:57,710 --> 00:34:02,650 So instead I wrote a make file, which you have to define how it makes things. 416 00:34:02,650 --> 00:34:06,130 How you make in this is PDF LaTeX. 417 00:34:06,130 --> 00:34:10,090 Just like any other make file--or I guess you haven't seen the make files, 418 00:34:10,090 --> 00:34:13,510 but we have in the Appliance a global make file that just says, 419 00:34:13,510 --> 00:34:16,679 if you are compiling a C file, use Clang. 420 00:34:16,679 --> 00:34:20,960 And so here in my make file that I make I say, 421 00:34:20,960 --> 00:34:25,020 this file you're going to want to compile with PDF LaTeX. 422 00:34:25,020 --> 00:34:27,889 And so it's PDF LaTeX that's doing the compiling. 423 00:34:27,889 --> 00:34:31,880 Make is not compiling. It's just running these commands in the sequence I specified. 424 00:34:31,880 --> 00:34:36,110 So it runs PDF LaTeX, it copies it to the directory I want it to be copied to, 425 00:34:36,110 --> 00:34:38,270 it cd's to the directory and does other things, 426 00:34:38,270 --> 00:34:42,380 but all it does is recognize when a file changes, 427 00:34:42,380 --> 00:34:45,489 and if it changes, then it will run the commands that it's supposed to run 428 00:34:45,489 --> 00:34:48,760 when the file changes. >>[student] Okay. 429 00:34:50,510 --> 00:34:54,420 I don't know where the global make files are for me to check it out. 430 00:34:57,210 --> 00:35:04,290 Other questions? Anything from past quizzes? Any pointer things? 431 00:35:06,200 --> 00:35:08,730 There are subtle things with pointers like-- 432 00:35:08,730 --> 00:35:10,220 I'm not going to be able to find a quiz question on it-- 433 00:35:10,220 --> 00:35:16,250 but just like this sort of thing. 434 00:35:19,680 --> 00:35:24,060 Make sure you understand that when I say int *x *y-- 435 00:35:24,890 --> 00:35:28,130 This isn't exactly anything here, I guess. 436 00:35:28,130 --> 00:35:32,140 But like *x *y, those are 2 variables that are on the stack. 437 00:35:32,140 --> 00:35:37,220 When I say x = malloc(sizeof(int)), x is still a variable on the stack, 438 00:35:37,220 --> 00:35:41,180 malloc is some block over in the heap, and we're having x point to the heap. 439 00:35:41,180 --> 00:35:43,900 >> So something on the stack points to the heap. 440 00:35:43,900 --> 00:35:48,100 Whenever you malloc anything, you're inevitably storing it inside of a pointer. 441 00:35:48,100 --> 00:35:55,940 So that pointer is on the stack, the malloced block is on the heap. 442 00:35:55,940 --> 00:36:01,240 A lot of people get confused and say int *x = malloc; x is on the heap. 443 00:36:01,240 --> 00:36:04,100 No. What x points to is on the heap. 444 00:36:04,100 --> 00:36:08,540 x itself is on the stack, unless for whatever reason you have x be a global variable, 445 00:36:08,540 --> 00:36:11,960 in which case it happens to be in another region of memory. 446 00:36:13,450 --> 00:36:20,820 So keeping track, these box and arrow diagrams are pretty common for the quiz. 447 00:36:20,820 --> 00:36:25,740 Or if it's not on quiz 0, it will be on quiz 1. 448 00:36:27,570 --> 00:36:31,940 You should know all of these, the steps in compiling 449 00:36:31,940 --> 00:36:35,740 since you had to answer questions on those. Yes. 450 00:36:35,740 --> 00:36:38,940 [student] Could we go over those steps-- >>Sure. 451 00:36:48,340 --> 00:36:58,640 Before steps and compiling we have preprocessing, 452 00:36:58,640 --> 00:37:16,750 compiling, assembling, and linking. 453 00:37:16,750 --> 00:37:21,480 Preprocessing. What does that do? 454 00:37:29,720 --> 00:37:32,290 It is the easiest step in--well, not like-- 455 00:37:32,290 --> 00:37:35,770 that doesn't mean it should be obvious, but it's the easiest step. 456 00:37:35,770 --> 00:37:38,410 You guys could implement it yourselves. Yeah. 457 00:37:38,410 --> 00:37:43,410 [student] Take what you have in your includes like this and it copies and then also defines. 458 00:37:43,410 --> 00:37:49,250 It looks for things like #include and #define, 459 00:37:49,250 --> 00:37:53,800 and it just copies and pastes what those actually mean. 460 00:37:53,800 --> 00:37:59,240 So when you say #include cs50.h, the preprocessor is copying and pasting cs50.h 461 00:37:59,240 --> 00:38:01,030 into that line. 462 00:38:01,030 --> 00:38:06,640 When you say #define x to be 4, the preprocessor goes through the entire program 463 00:38:06,640 --> 00:38:10,400 and replaces all instances of x with 4. 464 00:38:10,400 --> 00:38:17,530 So the preprocessor takes a valid C file and outputs a valid C file 465 00:38:17,530 --> 00:38:20,300 where things have been copied and pasted. 466 00:38:20,300 --> 00:38:24,230 So now compiling. What does that do? 467 00:38:25,940 --> 00:38:28,210 [student] It goes from C to binary. 468 00:38:28,210 --> 00:38:30,970 >> [Bowden] It doesn't go all the way to binary. 469 00:38:30,970 --> 00:38:34,220 [student] To machine code then? >>It's not machine code. 470 00:38:34,220 --> 00:38:35,700 [student] Assembly? >>Assembly. 471 00:38:35,700 --> 00:38:38,890 It goes to Assembly before it goes all the way to C code, 472 00:38:38,890 --> 00:38:45,010 and most languages do something like this. 473 00:38:47,740 --> 00:38:50,590 Pick any high-level language, and if you're going to compile it, 474 00:38:50,590 --> 00:38:52,390 it's likely to compile in steps. 475 00:38:52,390 --> 00:38:58,140 First it's going to compile Python to C, then it's going to compile C to Assembly, 476 00:38:58,140 --> 00:39:01,600 and then Assembly is going to get translated to binary. 477 00:39:01,600 --> 00:39:07,800 So compiling is going to bring it from C to Assembly. 478 00:39:07,800 --> 00:39:12,130 The word compiling usually means bringing it from a higher level 479 00:39:12,130 --> 00:39:14,340 to a lower level programming language. 480 00:39:14,340 --> 00:39:19,190 So this is the only step in compilation where you start with a high-level language 481 00:39:19,190 --> 00:39:23,270 and end up in a low-level language, and that's why the step is called compiling. 482 00:39:25,280 --> 00:39:33,370 [student] During compiling, let's say that you've done #include cs50.h. 483 00:39:33,370 --> 00:39:42,190 Will the compiler recompile the cs50.h, like the functions that are in there, 484 00:39:42,190 --> 00:39:45,280 and translate that into Assembly code as well, 485 00:39:45,280 --> 00:39:50,830 or will it copy and paste something that's been pre-Assembly? 486 00:39:50,830 --> 00:39:56,910 cs50.h will pretty much never end up in Assembly. 487 00:39:59,740 --> 00:40:03,680 Stuff like function prototypes and things are just for you to be careful. 488 00:40:03,680 --> 00:40:09,270 It guarantees that the compiler can check things like you're calling functions 489 00:40:09,270 --> 00:40:12,910 with the right return types and the right arguments and stuff. 490 00:40:12,910 --> 00:40:18,350 >> So cs50.h will be preprocessed into the file, and then when it's compiling 491 00:40:18,350 --> 00:40:22,310 it's basically thrown away after it makes sure that everything is being called correctly. 492 00:40:22,310 --> 00:40:29,410 But the functions defined in the CS50 library, which are separate from cs50.h, 493 00:40:29,410 --> 00:40:33,610 those will not be separately compiled. 494 00:40:33,610 --> 00:40:37,270 That will actually come down in the linking step, so we'll get to that in a second. 495 00:40:37,270 --> 00:40:40,100 But first, what is assembling? 496 00:40:41,850 --> 00:40:44,500 [student] Assembly to binary? >>Yeah. 497 00:40:46,300 --> 00:40:48,190 Assembling. 498 00:40:48,190 --> 00:40:54,710 We don't call it compiling because Assembly is pretty much a pure translation of binary. 499 00:40:54,710 --> 00:41:00,230 There is very little logic in going from Assembly to binary. 500 00:41:00,230 --> 00:41:03,180 It's just like looking up in a table, oh, we have this instruction; 501 00:41:03,180 --> 00:41:06,290 that corresponds to binary 01110. 502 00:41:10,200 --> 00:41:15,230 And so the files that assembling generally outputs are .o files. 503 00:41:15,230 --> 00:41:19,020 And .o files are what we were saying before, 504 00:41:19,020 --> 00:41:21,570 how a file does not need to have a main function. 505 00:41:21,570 --> 00:41:27,640 Any file can be compiled down to a .o file as long as it's a valid C file. 506 00:41:27,640 --> 00:41:30,300 It can be compiled down to .o. 507 00:41:30,300 --> 00:41:43,030 Now, linking is what actually brings a bunch of .o files and brings them to an executable. 508 00:41:43,030 --> 00:41:51,110 And so what linking does is you can think of the CS50 library as a .o file. 509 00:41:51,110 --> 00:41:56,980 It is an already compiled binary file. 510 00:41:56,980 --> 00:42:03,530 And so when you compile your file, your hello.c, which calls GetString, 511 00:42:03,530 --> 00:42:06,360 hello.c gets compiled down to hello.o, 512 00:42:06,360 --> 00:42:08,910 hello.o is now in binary. 513 00:42:08,910 --> 00:42:12,830 It uses GetString, so it needs to go over to cs50.o, 514 00:42:12,830 --> 00:42:16,390 and the linker smooshes them together and copies GetString into this file 515 00:42:16,390 --> 00:42:20,640 and comes out with an executable that has all functions it needs. 516 00:42:20,640 --> 00:42:32,620 So cs50.o isn't actually an O file, but it's close enough that there is no fundamental difference. 517 00:42:32,620 --> 00:42:36,880 So linking just brings a bunch of files together 518 00:42:36,880 --> 00:42:41,390 that separately contain all of the functions I need to use 519 00:42:41,390 --> 00:42:46,120 and creates the executable that will actually run. 520 00:42:48,420 --> 00:42:50,780 >> And so that's also what we were saying before 521 00:42:50,780 --> 00:42:55,970 where you can have 1000 .c files, you compile them all to .o files, 522 00:42:55,970 --> 00:43:00,040 which will probably take a while, then you change 1 .c file. 523 00:43:00,040 --> 00:43:05,480 You only need to recompile that 1 .c file and then relink everything else, 524 00:43:05,480 --> 00:43:07,690 link everything back together. 525 00:43:09,580 --> 00:43:11,430 [student] When we're linking we write lcs50? 526 00:43:11,430 --> 00:43:20,510 Yeah, so -lcs50. That flag signals to the linker that you should be linking in that library. 527 00:43:26,680 --> 00:43:28,910 Questions? 528 00:43:41,310 --> 00:43:46,860 Have we gone over binary other than that 5 seconds in the first lecture? 529 00:43:50,130 --> 00:43:53,010 I don't think so. 530 00:43:55,530 --> 00:43:58,820 You should know all of the big Os that we've gone over, 531 00:43:58,820 --> 00:44:02,670 and you should be able to, if we gave you a function, 532 00:44:02,670 --> 00:44:09,410 you should be able to say it's big O, roughly. Or well, big O is rough. 533 00:44:09,410 --> 00:44:15,300 So if you see nested for loops looping over the same number of things, 534 00:44:15,300 --> 00:44:22,260 like int i, i < n; int j, j < n-- >>[student] n squared. >>it tends to be n squared. 535 00:44:22,260 --> 00:44:25,280 If you have triple nested, it tends to be n cubed. 536 00:44:25,280 --> 00:44:29,330 So that sort of thing you should be able to point out immediately. 537 00:44:29,330 --> 00:44:33,890 You need to know insertion sort and bubble sort and merge sort and all of those. 538 00:44:33,890 --> 00:44:41,420 It's easier to understand why they are those n squared and n log n and all of that 539 00:44:41,420 --> 00:44:47,810 because I think there was on a quiz one year where we basically gave you 540 00:44:47,810 --> 00:44:55,050 an implementation of bubble sort and said, "What is the running time of this function?" 541 00:44:55,050 --> 00:45:01,020 So if you recognize it as bubble sort, then you can immediately say n squared. 542 00:45:01,020 --> 00:45:05,470 But if you just look at it, you don't even need to realize it's bubble sort; 543 00:45:05,470 --> 00:45:08,990 you can just say this is doing this and this. This is n squared. 544 00:45:12,350 --> 00:45:14,710 [student] Are there any tough examples you can come up with, 545 00:45:14,710 --> 00:45:20,370 like a similar idea of figuring out? 546 00:45:20,370 --> 00:45:24,450 >> I don't think we would give you any tough examples. 547 00:45:24,450 --> 00:45:30,180 The bubble sort thing is about as tough as we would go, 548 00:45:30,180 --> 00:45:36,280 and even that, as long as you understand that you're iterating over the array 549 00:45:36,280 --> 00:45:41,670 for each element in the array, which is going to be something that's n squared. 550 00:45:45,370 --> 00:45:49,940 There are general questions, like right here we have-- Oh. 551 00:45:55,290 --> 00:45:58,530 Just the other day, Doug claimed, "I have invented an algorithm that can sort an array 552 00:45:58,530 --> 00:46:01,780 "of n numbers in O(log n) time!" 553 00:46:01,780 --> 00:46:04,900 So how do we know that's impossible? 554 00:46:04,900 --> 00:46:08,850 [inaudible student response] >>Yeah. 555 00:46:08,850 --> 00:46:13,710 At the very least, you have to touch each element in the array, 556 00:46:13,710 --> 00:46:16,210 so it's impossible to sort an array of-- 557 00:46:16,210 --> 00:46:20,850 If everything is in unsorted order, then you're going to be touching everything in the array, 558 00:46:20,850 --> 00:46:25,320 so it's impossible to do it in less than O of n. 559 00:46:27,430 --> 00:46:30,340 [student] You showed us that example of being able to do it in O of n 560 00:46:30,340 --> 00:46:33,920 if you use a lot of memory. >>Yeah. 561 00:46:33,920 --> 00:46:37,970 And that's-- I forget what that's-- Is it counting sort? 562 00:46:47,360 --> 00:46:51,330 Hmm. That is an integer sorting algorithm. 563 00:46:59,850 --> 00:47:05,100 I was looking for the special name for this that I couldn't remember last week. 564 00:47:05,100 --> 00:47:13,000 Yeah. These are the types of sorts that can accomplish things in big O of n. 565 00:47:13,000 --> 00:47:18,430 But there are limitations, like you can only use integers up to a certain number. 566 00:47:20,870 --> 00:47:24,560 Plus if you're trying to sort something that's-- 567 00:47:24,560 --> 00:47:30,750 If your array is 012, -12, 151, 4 million, 568 00:47:30,750 --> 00:47:35,120 then that single element is going to completely ruin the entire sorting. 569 00:47:42,060 --> 00:47:44,030 >> Questions? 570 00:47:49,480 --> 00:47:58,870 [student] If you have a recursive function and it just makes the recursive calls 571 00:47:58,870 --> 00:48:02,230 within a return statement, that's tail recursive, 572 00:48:02,230 --> 00:48:07,360 and so would that not use more memory during runtime 573 00:48:07,360 --> 00:48:12,550 or it would at least use comparable memory as an iterative solution? 574 00:48:12,550 --> 00:48:14,530 [Bowden] Yes. 575 00:48:14,530 --> 00:48:19,840 It would likely be somewhat slower, but not really. 576 00:48:19,840 --> 00:48:23,290 Tail recursive is pretty good. 577 00:48:23,290 --> 00:48:32,640 Looking again at stack frames, let's say we have main 578 00:48:32,640 --> 00:48:42,920 and we have int bar(int x) or something. 579 00:48:42,920 --> 00:48:52,310 This isn't a perfect recursive function, but return bar(x - 1). 580 00:48:52,310 --> 00:48:57,620 So obviously, this is flawed. You need base cases and stuff. 581 00:48:57,620 --> 00:49:00,360 But the idea here is that this is tail recursive, 582 00:49:00,360 --> 00:49:06,020 which means when main calls bar it's going to get its stack frame. 583 00:49:09,550 --> 00:49:12,440 In this stack frame there's going to be a little block of memory 584 00:49:12,440 --> 00:49:17,490 that corresponds to its argument x. 585 00:49:17,490 --> 00:49:25,840 And so let's say main happens to call bar(100); 586 00:49:25,840 --> 00:49:30,050 So x is going to start out as 100. 587 00:49:30,050 --> 00:49:35,660 If the compiler recognizes that this is a tail recursive function, 588 00:49:35,660 --> 00:49:38,540 then when bar makes its recursive call to bar, 589 00:49:38,540 --> 00:49:45,490 instead of making a new stack frame, which is where the stack starts growing largely, 590 00:49:45,490 --> 00:49:48,220 eventually it will run into the heap and then you get segfaults 591 00:49:48,220 --> 00:49:51,590 because memory starts colliding. 592 00:49:51,590 --> 00:49:54,830 >> So instead of making its own stack frame, it can realize, 593 00:49:54,830 --> 00:49:59,080 hey, I never really need to come back to this stack frame, 594 00:49:59,080 --> 00:50:08,040 so instead I'll just replace this argument with 99 and then start bar all over. 595 00:50:08,040 --> 00:50:11,810 And then it will do it again and it will reach return bar(x - 1), 596 00:50:11,810 --> 00:50:17,320 and instead of making a new stack frame, it will just replace its current argument with 98 597 00:50:17,320 --> 00:50:20,740 and then jump back to the very beginning of bar. 598 00:50:23,860 --> 00:50:30,430 Those operations, replacing that 1 value on the stack and jumping back to the beginning, 599 00:50:30,430 --> 00:50:32,430 are pretty efficient. 600 00:50:32,430 --> 00:50:41,500 So not only is this the same memory usage as a separate function which is iterative 601 00:50:41,500 --> 00:50:45,390 because you're only using 1 stack frame, but you're not suffering the downsides 602 00:50:45,390 --> 00:50:47,240 of having to call functions. 603 00:50:47,240 --> 00:50:50,240 Calling functions can be somewhat expensive because it has to do all this setup 604 00:50:50,240 --> 00:50:52,470 and teardown and all this stuff. 605 00:50:52,470 --> 00:50:58,160 So this tail recursion is good. 606 00:50:58,160 --> 00:51:01,170 [student] Why does it not create new steps? 607 00:51:01,170 --> 00:51:02,980 Because it realizes it doesn't need to. 608 00:51:02,980 --> 00:51:07,800 The call to bar is just returning the recursive call. 609 00:51:07,800 --> 00:51:12,220 So it doesn't need to do anything with the return value. 610 00:51:12,220 --> 00:51:15,120 It's just going to immediately return it. 611 00:51:15,120 --> 00:51:20,530 So it's just going to replace its own argument and start over. 612 00:51:20,530 --> 00:51:25,780 And also, if you don't have the tail recursive version, 613 00:51:25,780 --> 00:51:31,460 then you get all these bars where when this bar returns 614 00:51:31,460 --> 00:51:36,010 it has to return its value to this one, then that bar immediately returns 615 00:51:36,010 --> 00:51:39,620 and it returns its value to this one, then it's just going to immediately return 616 00:51:39,620 --> 00:51:41,350 and return its value to this one. 617 00:51:41,350 --> 00:51:45,350 So you're saving this popping all of these things off of the stack 618 00:51:45,350 --> 00:51:48,730 since the return value is just going to be passed all the way back up anyway. 619 00:51:48,730 --> 00:51:55,400 So why not just replace our argument with the updated argument and start over? 620 00:51:57,460 --> 00:52:01,150 If the function is not tail recursive, if you do something like-- 621 00:52:01,150 --> 00:52:07,530 [student] if bar(x + 1). >>Yeah. 622 00:52:07,530 --> 00:52:11,770 >> So if you put it in condition, then you're doing something with the return value. 623 00:52:11,770 --> 00:52:16,260 Or even if you just do return 2 * bar(x - 1). 624 00:52:16,260 --> 00:52:23,560 So now bar(x - 1) needs to return in order for it to calculate 2 times that value, 625 00:52:23,560 --> 00:52:26,140 so now it does need its own separate stack frame, 626 00:52:26,140 --> 00:52:31,180 and now, no matter how hard you try, you're going to need to-- 627 00:52:31,180 --> 00:52:34,410 This isn't tail recursive. 628 00:52:34,410 --> 00:52:37,590 [student] Would I try to bring a recursion to aim for a tail recursion-- 629 00:52:37,590 --> 00:52:41,450 [Bowden] In an ideal world, but in CS50 you don't have to. 630 00:52:43,780 --> 00:52:49,280 In order to get tail recursion, generally, you set up an additional argument 631 00:52:49,280 --> 00:52:53,550 where bar will take int x into y 632 00:52:53,550 --> 00:52:56,990 and y corresponds to the ultimate thing you want to return. 633 00:52:56,990 --> 00:53:03,650 So then this you're going to be returning bar(x - 1), 2 * y. 634 00:53:03,650 --> 00:53:09,810 So that's just a high-level how you transform things to be tail recursive. 635 00:53:09,810 --> 00:53:13,790 But the extra argument-- 636 00:53:13,790 --> 00:53:17,410 And then in the end when you reach your base case, you just return y 637 00:53:17,410 --> 00:53:22,740 because you've been accumulating the entire time the return value that you want. 638 00:53:22,740 --> 00:53:27,280 You kind of have been doing it iteratively but using recursive calls. 639 00:53:32,510 --> 00:53:34,900 Questions? 640 00:53:34,900 --> 00:53:39,890 [student] Maybe about pointer arithmetic, like when using strings. >>Sure. 641 00:53:39,890 --> 00:53:43,610 Pointer arithmetic. 642 00:53:43,610 --> 00:53:48,440 When using strings it's easy because strings are char stars, 643 00:53:48,440 --> 00:53:51,860 chars are forever and always a single byte, 644 00:53:51,860 --> 00:53:57,540 and so pointer arithmetic is equivalent to regular arithmetic when you're dealing with strings. 645 00:53:57,540 --> 00:54:08,790 Let's just say char* s = "hello". 646 00:54:08,790 --> 00:54:11,430 So we have a block in memory. 647 00:54:19,490 --> 00:54:22,380 It needs 6 bytes because you always need the null terminator. 648 00:54:22,380 --> 00:54:28,620 And char* s is going to point to the beginning of this array. 649 00:54:28,620 --> 00:54:32,830 So s points there. 650 00:54:32,830 --> 00:54:36,710 Now, this is basically how any array works, 651 00:54:36,710 --> 00:54:40,780 regardless of whether it was a return by malloc or whether it's on the stack. 652 00:54:40,780 --> 00:54:47,110 Any array is basically a pointer to the start of the array, 653 00:54:47,110 --> 00:54:53,640 and then any array operation, any indexing, is just going into that array a certain offset. 654 00:54:53,640 --> 00:55:05,360 >> So when I say something like s[3]; this is going to s and counting 3 chars in. 655 00:55:05,360 --> 00:55:12,490 So s[3], we have 0, 1, 2, 3, so s[3] is going to refer to this l. 656 00:55:12,490 --> 00:55:20,460 [student] And we could reach the same value by doing s + 3 and then parentheses star? 657 00:55:20,460 --> 00:55:22,570 Yes. 658 00:55:22,570 --> 00:55:26,010 This is equivalent to *(s + 3); 659 00:55:26,010 --> 00:55:31,240 and that is forever and always equivalent no matter what you do. 660 00:55:31,240 --> 00:55:34,070 You never need to use the bracket syntax. 661 00:55:34,070 --> 00:55:37,770 You can always use the *(s + 3) syntax. 662 00:55:37,770 --> 00:55:40,180 People tend to like the bracket syntax, though. 663 00:55:40,180 --> 00:55:43,860 [student] So all arrays are actually just pointers. 664 00:55:43,860 --> 00:55:53,630 There is a slight distinction when I say int x[4]; >>[student] Does that create the memory? 665 00:55:53,630 --> 00:56:03,320 [Bowden] That is going to create 4 ints on the stack, so 16 bytes overall. 666 00:56:03,320 --> 00:56:05,700 It's going to create 16 bytes on the stack. 667 00:56:05,700 --> 00:56:09,190 x isn't stored anywhere. 668 00:56:09,190 --> 00:56:13,420 It is just a symbol referring to the start of the thing. 669 00:56:13,420 --> 00:56:17,680 Because you declared the array inside of this function, 670 00:56:17,680 --> 00:56:22,340 what the compiler is going to do is just replace all instances of the variable x 671 00:56:22,340 --> 00:56:26,400 with where it happened to choose to put these 16 bytes. 672 00:56:26,400 --> 00:56:30,040 It can't do that with char* s because s is an actual pointer. 673 00:56:30,040 --> 00:56:32,380 It is free to then point to other things. 674 00:56:32,380 --> 00:56:36,140 x is a constant. You can't have it point to a different array. >>[student] Okay. 675 00:56:36,140 --> 00:56:43,420 But this idea, this indexing, is the same regardless of whether it's a traditional array 676 00:56:43,420 --> 00:56:48,230 or if it's a pointer to something or if it's a pointer to a malloced array. 677 00:56:48,230 --> 00:56:59,770 And in fact, it is so equivalent that that is also the same thing. 678 00:56:59,770 --> 00:57:05,440 It actually just translates what's inside of the brackets and what's left of the brackets, 679 00:57:05,440 --> 00:57:07,970 adds them together, and dereferences. 680 00:57:07,970 --> 00:57:14,710 So this is just as valid as *(s + 3) or s[3]. 681 00:57:16,210 --> 00:57:22,090 [student] Can you have pointers pointing to 2-dimensional arrays? 682 00:57:22,090 --> 00:57:27,380 >> It's harder. Traditionally, no. 683 00:57:27,380 --> 00:57:34,720 A 2-dimensional array is just a 1-dimensional array with some convenient syntax 684 00:57:34,720 --> 00:57:54,110 because when I say int x[3][3], this is really just 1 array with 9 values. 685 00:57:55,500 --> 00:58:03,000 And so when I index, the compiler knows what I mean. 686 00:58:03,000 --> 00:58:13,090 If I say x[1][2], it knows I want to go to the second row, so it's going to skip the first 3, 687 00:58:13,090 --> 00:58:17,460 and then it wants the second thing in that, so it's going to get this one. 688 00:58:17,460 --> 00:58:20,480 But it is still just a single-dimensional array. 689 00:58:20,480 --> 00:58:23,660 And so if I wanted to assign a pointer to that array, 690 00:58:23,660 --> 00:58:29,770 I would say int *p = x; 691 00:58:29,770 --> 00:58:33,220 The type of x is just-- 692 00:58:33,220 --> 00:58:38,280 It's rough saying type of x since it is just a symbol and it's not an actual variable, 693 00:58:38,280 --> 00:58:40,140 but it is just an int *. 694 00:58:40,140 --> 00:58:44,840 x is just a pointer to the start of this. >>[student] Okay. 695 00:58:44,840 --> 00:58:52,560 And so I won't be able to access [1][2]. 696 00:58:52,560 --> 00:58:58,370 I think there is special syntax for declaring a pointer, 697 00:58:58,370 --> 00:59:12,480 something ridiculous like int (*p[--something absolutely ridiculous. I don't even know. 698 00:59:12,480 --> 00:59:17,090 But there is a syntax for declaring pointers like with parentheses and things. 699 00:59:17,090 --> 00:59:22,960 It may not even let you do that. 700 00:59:22,960 --> 00:59:26,640 I could look back at something that would tell me the truth. 701 00:59:26,640 --> 00:59:34,160 I will look for it later, if there is a syntax for point. But you will never see it. 702 00:59:34,160 --> 00:59:39,670 And even the syntax is so archaic that if you use it, people will be baffled. 703 00:59:39,670 --> 00:59:43,540 Multidimensional arrays are pretty rare as it is. 704 00:59:43,540 --> 00:59:44,630 You pretty much-- 705 00:59:44,630 --> 00:59:48,490 Well, if you're doing matrix things it's not going to be rare, 706 00:59:48,490 --> 00:59:56,730 but in C you're rarely going to be using multidimensional arrays. 707 00:59:57,630 --> 01:00:00,470 Yeah. >>[student] Let's say you have a really long array. 708 01:00:00,470 --> 01:00:03,900 >> So in virtual memory it would appear to be all consecutive, 709 01:00:03,900 --> 01:00:05,640 like the elements right next to each other, 710 01:00:05,640 --> 01:00:08,770 but in the physical memory, would it be possible for that to be split up? >>Yes. 711 01:00:08,770 --> 01:00:16,860 How virtual memory works is it just separates-- 712 01:00:19,220 --> 01:00:24,860 The unit of allocation is a page, which tends to be 4 kilobytes, 713 01:00:24,860 --> 01:00:29,680 and so when a process says, hey, I want to use this memory, 714 01:00:29,680 --> 01:00:35,970 the operating system is going to allocate it 4 kilobytes for that little block of memory. 715 01:00:35,970 --> 01:00:39,100 Even if you only use a single little byte in the entire block of memory, 716 01:00:39,100 --> 01:00:42,850 the operating system is going to give it the full 4 kilobytes. 717 01:00:42,850 --> 01:00:49,410 So what this means is I could have--let's say this is my stack. 718 01:00:49,410 --> 01:00:53,180 This stack could be separated. My stack could be megabytes and megabytes. 719 01:00:53,180 --> 01:00:55,020 My stack could be huge. 720 01:00:55,020 --> 01:01:00,220 But the stack itself has to be split into individual pages, 721 01:01:00,220 --> 01:01:09,010 which if we look at over here let's say this is our RAM, 722 01:01:09,010 --> 01:01:16,600 if I have 2 gigabytes of RAM, this is actual address 0 like the zeroth byte of my RAM, 723 01:01:16,600 --> 01:01:22,210 and this is 2 gigabytes all the way down here. 724 01:01:22,210 --> 01:01:27,230 So this page might correspond to this block over here. 725 01:01:27,230 --> 01:01:29,400 This page might correspond to this block over here. 726 01:01:29,400 --> 01:01:31,560 This one might correspond to this one over here. 727 01:01:31,560 --> 01:01:35,540 So the operating system is free to assign physical memory 728 01:01:35,540 --> 01:01:39,320 to any individual page arbitrarily. 729 01:01:39,320 --> 01:01:46,180 And that means that if this border happens to straddle an array, 730 01:01:46,180 --> 01:01:50,070 an array happens to be left of this and right of this order of a page, 731 01:01:50,070 --> 01:01:54,460 then that array is going to be split in physical memory. 732 01:01:54,460 --> 01:01:59,280 And then when you quit the program, when the process ends, 733 01:01:59,280 --> 01:02:05,690 these mappings get erased and then it's free to use these little blocks for other things. 734 01:02:14,730 --> 01:02:17,410 More questions? 735 01:02:17,410 --> 01:02:19,960 [student] The pointer arithmetic. >>Oh yeah. 736 01:02:19,960 --> 01:02:28,410 Strings were easier, but looking at something like ints, 737 01:02:28,410 --> 01:02:35,000 so back to int x[4]; 738 01:02:35,000 --> 01:02:41,810 Whether this is an array or whether it's a pointer to a malloced array of 4 integers, 739 01:02:41,810 --> 01:02:47,060 it's going to be treated the same way. 740 01:02:50,590 --> 01:02:53,340 [student] So arrays are on the heap? 741 01:03:01,400 --> 01:03:05,270 [Bowden] Arrays are not on the heap. >>[student] Oh. 742 01:03:05,270 --> 01:03:08,320 >> [Bowden] This type of array tends to be on the stack 743 01:03:08,320 --> 01:03:12,220 unless you declared it at--ignoring global variables. Don't use global variables. 744 01:03:12,220 --> 01:03:16,280 Inside of a function I say int x[4]; 745 01:03:16,280 --> 01:03:22,520 It's going to create a 4-integer block on the stack for this array. 746 01:03:22,520 --> 01:03:26,960 But this malloc(4 * sizeof(int)); is going to go on the heap. 747 01:03:26,960 --> 01:03:31,870 But after this point I can use x and p in pretty much the same ways, 748 01:03:31,870 --> 01:03:36,140 other than the exceptions I said before about you can reassign p. 749 01:03:36,140 --> 01:03:40,960 Technically, their sizes are somewhat different, but that's completely irrelevant. 750 01:03:40,960 --> 01:03:43,310 You never actually use their sizes. 751 01:03:48,020 --> 01:03:56,810 The p I could say p[3] = 2; or x[3] = 2; 752 01:03:56,810 --> 01:03:59,680 You can use them in exactly the same ways. 753 01:03:59,680 --> 01:04:01,570 So pointer arithmetic now-- Yes. 754 01:04:01,570 --> 01:04:07,390 [student] Do you not have to do p* if you have the brackets? 755 01:04:07,390 --> 01:04:11,720 The brackets are an implicit dereference. >>Okay. 756 01:04:11,720 --> 01:04:20,200 Actually, also what you're saying with the can you get multidimensional arrays 757 01:04:20,200 --> 01:05:02,650 with pointers, what you can do is something like, let's say, int **pp = malloc(sizeof(int*) * 5); 758 01:05:02,650 --> 01:05:06,900 I'll just write it all out first. 759 01:05:37,880 --> 01:05:41,020 I did not want that one. 760 01:05:41,020 --> 01:05:42,550 Okay. 761 01:05:42,550 --> 01:05:48,910 What I did here is-- That should be pp[i]. 762 01:05:48,910 --> 01:05:53,680 So pp is a pointer to a pointer. 763 01:05:53,680 --> 01:06:02,420 You're mallocing pp to point to an array of 5 int stars. 764 01:06:02,420 --> 01:06:10,950 So in memory you have on the stack pp. 765 01:06:10,950 --> 01:06:20,150 It's going to point to an array of 5 blocks which are all themselves pointers. 766 01:06:20,150 --> 01:06:28,210 And then when I malloc down here, I malloc that each of those individual pointers 767 01:06:28,210 --> 01:06:32,080 should point to a separate block of 4 bytes on the heap. 768 01:06:32,080 --> 01:06:35,870 So this points to 4 bytes. 769 01:06:37,940 --> 01:06:40,660 And this one points to a different 4 bytes. 770 01:06:40,660 --> 01:06:43,200 >> And all of them point to their own 4 bytes. 771 01:06:43,200 --> 01:06:49,080 This gives me a way of doing multidimensional things. 772 01:06:49,080 --> 01:06:58,030 I could say pp[3][4], but now this is not the same thing as multidimensional arrays 773 01:06:58,030 --> 01:07:05,390 because multidimensional arrays it translated [3][4] into a single offset into the x array. 774 01:07:05,390 --> 01:07:14,790 This dereferences p, accesses the third index, then dereferences that 775 01:07:14,790 --> 01:07:20,790 and accesses--4 would be invalid--the second index. 776 01:07:24,770 --> 01:07:31,430 Whereas when we had the int x[3][4] before as a multidimensional array 777 01:07:31,430 --> 01:07:35,740 and when you double bracket it's really only a single dereference, 778 01:07:35,740 --> 01:07:40,490 you're following a single pointer and then an offset, 779 01:07:40,490 --> 01:07:42,850 this is really 2D references. 780 01:07:42,850 --> 01:07:45,840 You follow 2 separate pointers. 781 01:07:45,840 --> 01:07:50,420 So this also technically allows you to have multidimensional arrays 782 01:07:50,420 --> 01:07:53,550 where each individual array is different sizes. 783 01:07:53,550 --> 01:07:58,000 So I think jagged multidimensional arrays is what it's called 784 01:07:58,000 --> 01:08:01,870 since really the first thing could point to something that has 10 elements, 785 01:08:01,870 --> 01:08:05,540 the second thing could point to something that has 100 elements. 786 01:08:05,540 --> 01:08:10,790 [student] Is there any limit to the number of pointers you can have 787 01:08:10,790 --> 01:08:14,290 pointing to other pointers? >>No. 788 01:08:14,290 --> 01:08:17,010 You can have int *****p. 789 01:08:18,050 --> 01:08:23,760 Back to pointer arithmetic-- >>[student] Oh. >>Yeah. 790 01:08:23,760 --> 01:08:35,649 [student] If I have int ***p and then I do a dereferencing and I say p* is equal to this value, 791 01:08:35,649 --> 01:08:39,560 is it only going to do 1 level of dereferencing? >>Yes. 792 01:08:39,560 --> 01:08:43,340 So if I want to access the thing that the last pointer is pointing at-- 793 01:08:43,340 --> 01:08:46,210 Then you do ***p. >>Okay. 794 01:08:46,210 --> 01:08:54,080 So this is p points to 1 block, points to another block, points to another block. 795 01:08:54,080 --> 01:09:02,010 Then if you do *p = something else, then you are changing this 796 01:09:02,010 --> 01:09:13,640 to now point to a different block. >>Okay. 797 01:09:13,640 --> 01:09:17,649 >> [Bowden] And if these were malloced, then you have now leaked memory 798 01:09:17,649 --> 01:09:20,430 unless you happen to have different references of these 799 01:09:20,430 --> 01:09:25,270 since you can't get back to those ones that you just threw away. 800 01:09:25,270 --> 01:09:29,550 Pointer arithmetic. 801 01:09:29,550 --> 01:09:36,310 int x[4]; is going to allocate an array of 4 integers 802 01:09:36,310 --> 01:09:40,670 where x is going to point to the beginning of the array. 803 01:09:40,670 --> 01:09:50,420 So when I say something like x[1]; I want it to mean go to the second integer in the array, 804 01:09:50,420 --> 01:09:53,319 which would be this one. 805 01:09:53,319 --> 01:10:04,190 But really, that's 4 bytes into the array since this integer takes up 4 bytes. 806 01:10:04,190 --> 01:10:08,470 So an offset of 1 really means an offset of 1 807 01:10:08,470 --> 01:10:12,030 times the size of whatever the type of the array is. 808 01:10:12,030 --> 01:10:17,170 This is an array of integers, so it knows to do 1 times size of int when it wants to offset. 809 01:10:17,170 --> 01:10:25,260 The other syntax. Remember that this is equivalent to *(x + 1); 810 01:10:25,260 --> 01:10:35,250 When I say pointer + 1, what that returns is the address that the pointer is storing 811 01:10:35,250 --> 01:10:40,360 plus 1 times the size of the type of the pointer. 812 01:10:40,360 --> 01:10:59,510 So if x = ox100, then x + 1 = ox104. 813 01:10:59,510 --> 01:11:19,750 And you can abuse this and say something like char* c = (char*)x; 814 01:11:19,750 --> 01:11:23,050 and now c is going to be the same address as x. 815 01:11:23,050 --> 01:11:26,040 c is going to be equal to ox100, 816 01:11:26,040 --> 01:11:31,490 but c + 1 is going to be equal to ox101 817 01:11:31,490 --> 01:11:38,030 since pointer arithmetic depends on the type of the pointer that you are adding to. 818 01:11:38,030 --> 01:11:45,390 So c + 1, it looks at c, it's a char pointer, so it's going to add 1 times size of char, 819 01:11:45,390 --> 01:11:48,110 which is always going to be 1, so you get 101, 820 01:11:48,110 --> 01:11:54,890 whereas if I do x, which is also still 100, x + 1 is going to be 104. 821 01:11:56,660 --> 01:12:06,340 [student] Can you use c++ in order to advance your pointer by 1? 822 01:12:06,340 --> 01:12:09,810 Yes, you can. 823 01:12:09,810 --> 01:12:16,180 You can't do that with x because x is just a symbol, it is a constant; you can't change x. 824 01:12:16,180 --> 01:12:22,610 >> But c happens to just be a pointer, so c++ is perfectly valid and it will increment by 1. 825 01:12:22,610 --> 01:12:32,440 If c were just an int *, then c++ would be 104. 826 01:12:32,440 --> 01:12:41,250 ++ does pointer arithmetic just as c + 1 would have done pointer arithmetic. 827 01:12:43,000 --> 01:12:48,870 This is actually how a lot of things like merge sort-- 828 01:12:49,670 --> 01:12:55,710 Instead of creating copies of things, you can instead pass-- 829 01:12:55,710 --> 01:13:02,400 Like if I wanted to pass this half of the array--let's erase some of this. 830 01:13:04,770 --> 01:13:10,520 Let's say I wanted to pass this side of the array into a function. 831 01:13:10,520 --> 01:13:12,700 What would I pass to that function? 832 01:13:12,700 --> 01:13:17,050 If I pass x, I am passing this address. 833 01:13:17,050 --> 01:13:23,780 But I want to pass this particular address. So what should I pass? 834 01:13:23,780 --> 01:13:26,590 [student] Pointer + 2? 835 01:13:26,590 --> 01:13:29,350 [Bowden] So x + 2. Yes. 836 01:13:29,350 --> 01:13:31,620 That's going to be this address. 837 01:13:31,620 --> 01:13:42,810 You'll also very frequently see it as x[2] and then the address of that. 838 01:13:42,810 --> 01:13:47,850 So you need to take the address of it because the bracket is an implicit dereference. 839 01:13:47,850 --> 01:13:53,250 x[2] refers to the value that is in this box, and then you want the address of that box, 840 01:13:53,250 --> 01:13:56,850 so you say &x[2]. 841 01:13:56,850 --> 01:14:02,880 So that's how something in merge sort where you want to pass half the list to something 842 01:14:02,880 --> 01:14:08,790 you really just pass &x[2], and now as far as the recursive call is concerned, 843 01:14:08,790 --> 01:14:12,510 my new array starts there. 844 01:14:12,510 --> 01:14:15,130 Last minute questions. 845 01:14:15,130 --> 01:14:20,050 [student] If we don't put an ampersand or a--what's that called? >>Star? 846 01:14:20,050 --> 01:14:23,200 [student] Star. >>Technically, dereference operator, but-- >>[student] Dereference. 847 01:14:23,200 --> 01:14:29,310 >> If we don't put a star or an ampersand, what happens if I just say y = x and x is a pointer? 848 01:14:29,310 --> 01:14:34,620 What is the type of y? >>[student] I'll just say it's pointer 2. 849 01:14:34,620 --> 01:14:38,270 So if you just say y = x, now x and y point to the same thing. >>[student] Point to the same thing. 850 01:14:38,270 --> 01:14:45,180 And if x is an int pointer? >>It would complain because you can't assign pointers. 851 01:14:45,180 --> 01:14:46,540 [student] Okay. 852 01:14:46,540 --> 01:14:51,860 Remember that pointers, even though we draw them as arrows, 853 01:14:51,860 --> 01:15:02,010 really all they store--int *x--really all x is storing is something like ox100, 854 01:15:02,010 --> 01:15:06,490 which we happen to represent as pointing to the block stored at 100. 855 01:15:06,490 --> 01:15:19,660 So when I say int *y = x; I'm just copying ox100 into y, 856 01:15:19,660 --> 01:15:24,630 which we're just going to represent as y, also pointing to ox100. 857 01:15:24,630 --> 01:15:39,810 And if I say int i = (int)x; then i is going to store whatever the value of ox100 is 858 01:15:39,810 --> 01:15:45,100 inside of it, but now it's going to be interpreted as an integer instead of a pointer. 859 01:15:45,100 --> 01:15:49,310 But you need the cast or else it will complain. 860 01:15:49,310 --> 01:15:53,300 [student] So do you mean to cast-- 861 01:15:53,300 --> 01:16:00,290 Is it going to be casting int of x or casting int of y? 862 01:16:00,290 --> 01:16:03,700 [Bowden] What? 863 01:16:03,700 --> 01:16:07,690 [student] Okay. After these parentheses is there going to be an x or a y there? 864 01:16:07,690 --> 01:16:11,500 >> [Bowden] Either. x and y are equivalent. >>[student] Okay. 865 01:16:11,500 --> 01:16:14,390 Because they're both pointers. >>Yeah. 866 01:16:14,390 --> 01:16:21,050 [student] So it would store the hexadecimal 100 in integer form? >>[Bowden] Yeah. 867 01:16:21,050 --> 01:16:23,620 But not the value of whatever it points to. 868 01:16:23,620 --> 01:16:29,940 [Bowden] Yeah. >>[student] So just the address in integer form. Okay. 869 01:16:29,940 --> 01:16:34,720 [Bowden] If you wanted to for some bizarre reason, 870 01:16:34,720 --> 01:16:38,900 you could exclusively deal with pointers and never deal with integers 871 01:16:38,900 --> 01:16:49,240 and just be like int *x = 0. 872 01:16:49,240 --> 01:16:53,000 Then you're going to get really confused once pointer arithmetic starts happening. 873 01:16:53,000 --> 01:16:56,570 So the numbers that they store are meaningless. 874 01:16:56,570 --> 01:16:58,940 It's just how you end up interpreting them. 875 01:16:58,940 --> 01:17:02,920 So I'm free to copy ox100 from an int * to an int, 876 01:17:02,920 --> 01:17:07,790 and I'm free to assign--you're probably going to get yelled at for not casting-- 877 01:17:07,790 --> 01:17:18,160 I'm free to assign something like (int *)ox1234 into this arbitrary int *. 878 01:17:18,160 --> 01:17:25,480 So ox123 is just as valid a memory address as is &y. 879 01:17:25,480 --> 01:17:32,060 &y happens to return something that is pretty much ox123. 880 01:17:32,060 --> 01:17:35,430 [student] Would that be a really cool way to go from hexadecimal to decimal form, 881 01:17:35,430 --> 01:17:39,230 like if you have a pointer and you cast it as an int? 882 01:17:39,230 --> 01:17:44,860 [Bowden] You can really just print using like printf. 883 01:17:44,860 --> 01:17:50,300 Let's say I have int y = 100. 884 01:17:50,300 --> 01:18:02,700 So printf(%d\n--as you should already know--print that as an integer, %x. 885 01:18:02,700 --> 01:18:05,190 We'll just print it as hexadecimal. 886 01:18:05,190 --> 01:18:10,760 So a pointer is not stored as hexadecimal, 887 01:18:10,760 --> 01:18:12,960 and an integer is not stored as decimal. 888 01:18:12,960 --> 01:18:14,700 Everything is stored as binary. 889 01:18:14,700 --> 01:18:17,950 It's just that we tend to show pointers as hexadecimal 890 01:18:17,950 --> 01:18:23,260 because we think of things in these 4-byte blocks, 891 01:18:23,260 --> 01:18:25,390 and memory addresses tend to be familiar. 892 01:18:25,390 --> 01:18:28,890 We're like, if it starts with bf, then it happens to be on the stack. 893 01:18:28,890 --> 01:18:35,560 So it's just our interpretation of pointers as hexadecimal. 894 01:18:35,560 --> 01:18:39,200 Okay. Any last questions? 895 01:18:39,200 --> 01:18:41,700 >> I'll be here for a bit after if you have anything else. 896 01:18:41,700 --> 01:18:46,070 And that's the end of that. 897 01:18:46,070 --> 01:18:48,360 >> [student] Yay! [applause] 898 01:18:51,440 --> 01:18:53,000 >> [CS50.TV]