1 00:00:00,000 --> 00:00:01,988 [MUSIC PLAYING] 2 00:00:01,988 --> 00:01:23,070 3 00:01:23,070 --> 00:01:26,050 DAVID J. MALAN: All right, this is CS50. And this 4 00:01:26,050 --> 00:01:29,020 is the day we take off the proverbial training wheels, namely 5 00:01:29,020 --> 00:01:30,850 the CS50 library. 6 00:01:30,850 --> 00:01:33,370 You'll recall last week as we focused on algorithms, 7 00:01:33,370 --> 00:01:37,540 we started focusing on lots of comparisons and lots of swapping. 8 00:01:37,540 --> 00:01:41,930 And we did that fairly algorithmically, fairly conceptually last week. 9 00:01:41,930 --> 00:01:43,930 But today we're going to focus on actually doing 10 00:01:43,930 --> 00:01:48,360 that a little more mechanically, a little more methodically. 11 00:01:48,360 --> 00:01:53,050 And I thought this would be easier to take the training wheels off, 12 00:01:53,050 --> 00:01:54,520 hopefully not a metaphor for today. 13 00:01:54,520 --> 00:01:55,020 OK. 14 00:01:55,020 --> 00:01:57,790 So [CHUCKLE] what we'll do first though, is learn how 15 00:01:57,790 --> 00:01:59,350 to count in a slightly different way. 16 00:01:59,350 --> 00:02:01,527 You'll recall in Week 0 we did this already 17 00:02:01,527 --> 00:02:04,360 whereby we introduced not only the human decimal system-- with which 18 00:02:04,360 --> 00:02:06,240 everyone's familiar --but also binary. 19 00:02:06,240 --> 00:02:08,770 It turns out there are other base systems where you don't just 20 00:02:08,770 --> 00:02:12,850 use powers of 10 or 2, you use other base systems entirely as well.
21 00:02:12,850 --> 00:02:15,520 And this is useful because today when we focus really 22 00:02:15,520 --> 00:02:18,280 on the computer's memory, and later today on files-- 23 00:02:18,280 --> 00:02:21,040 the actual creation of and editing of files, 24 00:02:21,040 --> 00:02:23,770 like images you might have on your own phones or computers 25 00:02:23,770 --> 00:02:27,070 --it turns out it's very useful to be able to address the memory 26 00:02:27,070 --> 00:02:29,200 inside of our computers or phones-- that is assign 27 00:02:29,200 --> 00:02:31,660 a number, a unique identifier, to every byte 28 00:02:31,660 --> 00:02:34,490 so that we can just talk about where things are in memory. 29 00:02:34,490 --> 00:02:41,030 Now you might think we would do 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 30 00:02:41,030 --> 00:02:44,247 14, 15, but it turns out that's not actually human convention. 31 00:02:44,247 --> 00:02:45,580 There's nothing wrong with this. 32 00:02:45,580 --> 00:02:48,747 It's correct but you're about to see today a slightly different syntax where 33 00:02:48,747 --> 00:02:54,730 we do count from 0 to 1, to 2, to 3, to 4, to 5, to 6, to 7, to 8, to 9, 34 00:02:54,730 --> 00:02:59,680 but in the world of not decimal, not binary, but hexadecimal-- hex 35 00:02:59,680 --> 00:03:01,090 meaning 16 -- 36 00:03:01,090 --> 00:03:03,250 you can actually count higher than nine. 37 00:03:03,250 --> 00:03:08,480 There are the letters A, B, C, D, E, and F. Why? 38 00:03:08,480 --> 00:03:11,020 Well, using these individual alphabetical letters, you 39 00:03:11,020 --> 00:03:14,020 can effectively count not only from 0 through 9-- 40 00:03:14,020 --> 00:03:19,780 using single digits --but also 10, 11, 12, 13, 14, 15-- 41 00:03:19,780 --> 00:03:21,130 F, representing 15.
42 00:03:21,130 --> 00:03:24,675 And so I introduce this because we'll see this pattern throughout today 43 00:03:24,675 --> 00:03:27,550 and throughout the coming weeks programs where the computer will just 44 00:03:27,550 --> 00:03:31,000 very conventionally display to you numbers not in decimal, not in binary, 45 00:03:31,000 --> 00:03:32,440 but sometimes in hexadecimal. 46 00:03:32,440 --> 00:03:34,660 But we'll see why that is in just a moment. 47 00:03:34,660 --> 00:03:36,820 Indeed, in binary we had the digits 0 and 1, 48 00:03:36,820 --> 00:03:39,100 decimal we had 0 through 9, in hexadecimal-- 49 00:03:39,100 --> 00:03:43,420 to recap --we have 0 through F, where again, F is 15. 50 00:03:43,420 --> 00:03:44,980 So how does this actually work? 51 00:03:44,980 --> 00:03:48,430 Just a quick whirlwind tour, this was our notation in binary. 52 00:03:48,430 --> 00:03:52,330 And I had eight 0 bits here, bit meaning binary digit. 53 00:03:52,330 --> 00:03:54,700 And based on the columns there, we had powers of 2, 54 00:03:54,700 --> 00:03:57,550 or if we multiplied that out, the ones place over there, 55 00:03:57,550 --> 00:03:59,650 the 128's place over here. 56 00:03:59,650 --> 00:04:04,070 This of course, if you do the math, is what number in decimal? 57 00:04:04,070 --> 00:04:07,460 So just 0-- right --if you multiply the columns by the numbers they're in. 58 00:04:07,460 --> 00:04:08,610 But what about this? 59 00:04:08,610 --> 00:04:10,970 If I change all those 0s to 1s, what was the highest 60 00:04:10,970 --> 00:04:13,070 we could count in binary if we had eight bits? 61 00:04:13,070 --> 00:04:14,690 AUDIENCE: 255 62 00:04:14,690 --> 00:04:18,110 DAVID J. MALAN: Yeah, 255 was the highest we can count. 63 00:04:18,110 --> 00:04:21,320 You might say 256 but again, if you start counting at 0, 64 00:04:21,320 --> 00:04:23,930 you sort of spend one of those numbers as the 0. 65 00:04:23,930 --> 00:04:27,548 So 255 is the highest you can count with eight bits. 
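The eight-bit arithmetic above can be sketched in C. This is only an illustration, not code from the lecture: the function name `max_with_bits` is made up here, and it simply adds up the place values 1, 2, 4, and so on, as if every bit were a 1.

```c
// Hypothetical helper (not from the lecture): the largest number you can
// count to with the given number of bits, found by adding up each column's
// place value (1, 2, 4, ..., 128 for eight bits) when every bit is 1.
unsigned int max_with_bits(int bits)
{
    unsigned int total = 0;
    unsigned int place = 1; // the ones place comes first
    for (int i = 0; i < bits; i++)
    {
        total += place; // this column holds a 1
        place *= 2;     // the next column is worth twice as much
    }
    return total;
}
```

Calling `max_with_bits(8)` adds 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1, which matches the 255 the audience answered.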
66 00:04:27,548 --> 00:04:29,090 And we could do the math if we cared. 67 00:04:29,090 --> 00:04:32,960 128 times 1 plus 64 times 1, and so forth. 68 00:04:32,960 --> 00:04:35,412 But let me just stipulate, that's indeed 255. 69 00:04:35,412 --> 00:04:38,120 And in decimal, indeed, we would represent the columns 70 00:04:38,120 --> 00:04:41,910 as powers of 10, or ones place, tens place, hundreds place, and so forth. 71 00:04:41,910 --> 00:04:44,000 So that's all Week 0 stuff. 72 00:04:44,000 --> 00:04:48,290 It turns out, though, that there's another way of representing 255 73 00:04:48,290 --> 00:04:53,720 in decimal using hexadecimal, except now instead of powers of 2 or powers of 10, 74 00:04:53,720 --> 00:04:55,580 we're just going to use powers of 16. 75 00:04:55,580 --> 00:05:00,170 And it turns out this is convenient for reasons related to computing. 76 00:05:00,170 --> 00:05:03,930 So the rightmost column will be our 16 to the zeroth, or the ones place. 77 00:05:03,930 --> 00:05:06,000 The second column will be our 16s place. 78 00:05:06,000 --> 00:05:09,750 And remember, F individually represents 15 in decimal. 79 00:05:09,750 --> 00:05:11,300 So we can count quite similarly. 80 00:05:11,300 --> 00:05:13,850 So this in hexadecimal would just be 0. 81 00:05:13,850 --> 00:05:17,630 16 times 0, plus 1 times 0, is of course 0. 82 00:05:17,630 --> 00:05:20,640 This of course, easy one, is what number? 83 00:05:20,640 --> 00:05:21,140 AUDIENCE: 1 84 00:05:21,140 --> 00:05:22,380 DAVID J. MALAN: 1 in decimal. 85 00:05:22,380 --> 00:05:26,870 This is going to be 2, 3, 4, 5, 6, 7, 8, 9. 86 00:05:26,870 --> 00:05:30,170 And whereas in the decimal world you would want to say 10-- 87 00:05:30,170 --> 00:05:37,880 or 1, 0 --here we can actually count a little higher to A, B, C, D, E, F-- 88 00:05:37,880 --> 00:05:39,560 and that represents 15. 89 00:05:39,560 --> 00:05:40,130 Why?
90 00:05:40,130 --> 00:05:44,450 16 times 0, plus 1 times F-- which again, F is 15. 91 00:05:44,450 --> 00:05:46,060 So 1 times F-- 92 00:05:46,060 --> 00:05:47,600 or 15-- gives you 15. 93 00:05:47,600 --> 00:05:49,667 Now how do you count as high as 16? 94 00:05:49,667 --> 00:05:51,750 Well, you can probably envision it already, right? 95 00:05:51,750 --> 00:05:54,710 You kind of carry the 1 just like in decimal and binary. 96 00:05:54,710 --> 00:05:58,775 So in hexadecimal, 1, 0 is the number 16. 97 00:05:58,775 --> 00:06:00,650 And here's where you just have to be careful. 98 00:06:00,650 --> 00:06:02,270 You shouldn't say 10 anymore. 99 00:06:02,270 --> 00:06:03,620 That's a decimal number. 100 00:06:03,620 --> 00:06:06,160 This is 1, 0 in hexadecimal. 101 00:06:06,160 --> 00:06:07,160 But we can count higher. 102 00:06:07,160 --> 00:06:17,780 If this is 16, this is 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 103 00:06:17,780 --> 00:06:19,490 31. 104 00:06:19,490 --> 00:06:23,360 And once you need 32, that's going to require another digit, if you will. 105 00:06:23,360 --> 00:06:24,380 So very low level. 106 00:06:24,380 --> 00:06:27,240 And none of us really on staff sort of think in hexadecimal, 107 00:06:27,240 --> 00:06:29,210 you'll just see things in hexadecimal. 108 00:06:29,210 --> 00:06:31,640 And all this is to say is that it can be converted back 109 00:06:31,640 --> 00:06:35,630 to the more familiar decimal or any other system as well. 110 00:06:35,630 --> 00:06:39,690 Higher than that we would go 2, 0, which of course, is 16 times 2-- 111 00:06:39,690 --> 00:06:42,530 which is 32 --plus 0. 112 00:06:42,530 --> 00:06:48,410 So it turns out that if you have four 1s and four 1s, that can be represented 113 00:06:48,410 --> 00:06:53,480 as FF. You've actually seen FF and probably 00 and other alphabetical 114 00:06:53,480 --> 00:06:54,360 characters before.
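The column-by-column counting above (a 16s place and a ones place, with A through F standing for 10 through 15) can be sketched as a small conversion routine. This is a minimal sketch under stated assumptions: the names `hex_digit_value` and `hex_to_decimal` are invented for illustration and are not part of the lecture or the CS50 library.

```c
// Hypothetical helper: the decimal value of a single hexadecimal digit,
// or -1 if the character is not 0-9, A-F, or a-f.
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    if (c >= 'A' && c <= 'F')
    {
        return c - 'A' + 10; // A is 10, ..., F is 15
    }
    if (c >= 'a' && c <= 'f')
    {
        return c - 'a' + 10;
    }
    return -1;
}

// Hypothetical helper: convert a string of hex digits (without any 0x
// prefix) to its decimal value, or return -1 if a character is invalid.
int hex_to_decimal(const char *digits)
{
    int total = 0;
    for (int i = 0; digits[i] != '\0'; i++)
    {
        int value = hex_digit_value(digits[i]);
        if (value < 0)
        {
            return -1;
        }
        // Moving one column to the right makes everything so far
        // worth 16 times as much.
        total = total * 16 + value;
    }
    return total;
}
```

With this sketch, `hex_to_decimal("10")` is 16 times 1 plus 0, `hex_to_decimal("1F")` is 16 plus 15, and `hex_to_decimal("FF")` is 16 times 15 plus 15.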
115 00:06:54,360 --> 00:06:58,340 How many of you have ever done web design using HTML, CSS? 116 00:06:58,340 --> 00:07:00,200 So like at least a third or so of the class. 117 00:07:00,200 --> 00:07:01,880 And for those unfamiliar, we'll get to that 118 00:07:01,880 --> 00:07:04,130 if you want to pursue that track later in the semester 119 00:07:04,130 --> 00:07:06,020 but recall RGB from Week 0. 120 00:07:06,020 --> 00:07:08,720 Red, green, blue refers to how computers can represent 121 00:07:08,720 --> 00:07:12,410 the colors of every pixel using some amount of red, some amount of green, 122 00:07:12,410 --> 00:07:13,460 some amount of blue. 123 00:07:13,460 --> 00:07:15,950 Well it turns out it's just human convention 124 00:07:15,950 --> 00:07:20,270 to describe the amounts of red, green, and blue in a color in terms 125 00:07:20,270 --> 00:07:25,592 of hexadecimal digits-- where this means give me no red, no green, no blue. 126 00:07:25,592 --> 00:07:28,550 And if you think back to Week 0 that's actually going to give us black. 127 00:07:28,550 --> 00:07:31,400 If you have none of those three colors, it's just the absence of those colors 128 00:07:31,400 --> 00:07:32,450 and you get black. 129 00:07:32,450 --> 00:07:35,090 If however, you have FF-- 130 00:07:35,090 --> 00:07:36,240 which is what? 131 00:07:36,240 --> 00:07:41,940 --255 amount of red, that's a lot of red, and 0 green, 0 blue. 132 00:07:41,940 --> 00:07:45,980 So if a computer were to represent a pixel on your screen as red 133 00:07:45,980 --> 00:07:48,230 it would store FF0000. 134 00:07:48,230 --> 00:07:51,500 That is a lot of red, no green, no blue. 135 00:07:51,500 --> 00:07:53,310 Meanwhile, if you had this representation, 136 00:07:53,310 --> 00:07:54,638 this is why this is green. 137 00:07:54,638 --> 00:07:55,430 This would be blue. 
138 00:07:55,430 --> 00:07:57,710 And if you combine all three colors a lot-- 139 00:07:57,710 --> 00:07:59,870 a lot of red, lot of green, lot of blue --this 140 00:07:59,870 --> 00:08:01,780 is how a computer would represent white. 141 00:08:01,780 --> 00:08:04,280 And so we'll come back to this later on in game development, 142 00:08:04,280 --> 00:08:05,690 and web development, and mobile-- 143 00:08:05,690 --> 00:08:09,330 if of interest-- but notice that this is just a common convention as well. 144 00:08:09,330 --> 00:08:11,480 So if we reconsider what our memory looks like, 145 00:08:11,480 --> 00:08:13,010 it's just this big grid of bytes. 146 00:08:13,010 --> 00:08:16,980 And we might describe the top one as 0 and the bottom one in this case as 1F. 147 00:08:16,980 --> 00:08:18,230 And we can just keep counting. 148 00:08:18,230 --> 00:08:21,300 However, at first glance it might be a little ambiguous. 149 00:08:21,300 --> 00:08:22,700 Am I looking at decimal? 150 00:08:22,700 --> 00:08:23,990 Am I looking at hexadecimal? 151 00:08:23,990 --> 00:08:25,970 Am I looking at something else altogether? 152 00:08:25,970 --> 00:08:29,540 So humans years ago decided that just to avoid ambiguity, 153 00:08:29,540 --> 00:08:32,510 if you are using hexadecimal, the human convention 154 00:08:32,510 --> 00:08:38,240 is to prefix every hexadecimal value on the screen with 0x, just arbitrarily. 155 00:08:38,240 --> 00:08:40,220 The 0x means nothing mathematically. 156 00:08:40,220 --> 00:08:42,870 It just means here comes a hexadecimal value. 157 00:08:42,870 --> 00:08:48,386 So you can disambiguate it from something like decimal itself. 158 00:08:48,386 --> 00:08:49,300 Whew. 159 00:08:49,300 --> 00:08:49,827 OK. 160 00:08:49,827 --> 00:08:50,660 That was a mouthful. 161 00:08:50,660 --> 00:08:51,960 And that's it for base systems. 162 00:08:51,960 --> 00:08:56,130 There's no more something decimals here on out this term.
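C itself accepts the 0x prefix on numbers, so the RGB convention can be sketched directly in code. This is only an illustration: the function name `pack_rgb` is hypothetical, and it assumes each amount is between 0 and 255 (that is, 0x00 through 0xFF).

```c
// Hypothetical helper: combine amounts of red, green, and blue (each
// assumed to be 0 through 255) into one number written as 0xRRGGBB.
unsigned int pack_rgb(unsigned int red, unsigned int green, unsigned int blue)
{
    // Each color takes two hex digits, so red sits four hex places
    // to the left (times 0x10000) and green two places (times 0x100).
    return red * 0x10000 + green * 0x100 + blue;
}
```

Under that assumption, `pack_rgb(0xFF, 0x00, 0x00)` gives 0xFF0000 (a lot of red, no green, no blue), and `pack_rgb(0xFF, 0xFF, 0xFF)` gives 0xFFFFFF for white. The 0x changes nothing mathematically: 0x1F and 31 are the same number to the compiler.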
163 00:08:56,130 --> 00:08:58,130 Slight white lie, there's something called octal 164 00:08:58,130 --> 00:08:59,588 but we probably won't look at that. 165 00:08:59,588 --> 00:09:02,290 Are there any questions at all? 166 00:09:02,290 --> 00:09:03,550 No, all right. 167 00:09:03,550 --> 00:09:05,660 So how can we actually use this information? 168 00:09:05,660 --> 00:09:07,930 Well let's now see some examples of what's 169 00:09:07,930 --> 00:09:09,960 going on truly inside of your computer's memory. 170 00:09:09,960 --> 00:09:11,710 And we'll see where hexadecimal is germane 171 00:09:11,710 --> 00:09:15,100 and how we can now start manipulating things more carefully inside 172 00:09:15,100 --> 00:09:16,240 of the computer's memory. 173 00:09:16,240 --> 00:09:19,690 This of course, is just a line of code involving creation of a variable called 174 00:09:19,690 --> 00:09:20,260 n. 175 00:09:20,260 --> 00:09:24,370 And that variable is having stored in it, the value 50. 176 00:09:24,370 --> 00:09:27,332 So let's go ahead and whip up a quick program that does exactly this. 177 00:09:27,332 --> 00:09:29,290 I'm going to go ahead and call this address dot 178 00:09:29,290 --> 00:09:33,040 c, just to convey that we're going to be playing with addresses 179 00:09:33,040 --> 00:09:34,580 in the computer's memory. 180 00:09:34,580 --> 00:09:36,830 And I'm going to go ahead and keep it simple at first, 181 00:09:36,830 --> 00:09:40,120 include standard I/O dot h and then int main void. 182 00:09:40,120 --> 00:09:43,145 And then down here, super simple, int n gets 50. 183 00:09:43,145 --> 00:09:45,020 And then I'm going to go ahead and print out, 184 00:09:45,020 --> 00:09:48,700 percent i comma n, thereby printing this value. 
185 00:09:48,700 --> 00:09:52,480 So this too is sort of Week 1 stuff, whereby when I run this program now 186 00:09:52,480 --> 00:09:55,060 after saving it, make address-- 187 00:09:55,060 --> 00:10:00,512 seems to compile OK --dot slash address, I should see of course, 50. 188 00:10:00,512 --> 00:10:02,470 All right, just the number 50 in that variable. 189 00:10:02,470 --> 00:10:02,970 All right. 190 00:10:02,970 --> 00:10:06,190 So you're probably comfortable with these kinds of exercises thus far. 191 00:10:06,190 --> 00:10:09,490 But it turns out that we can now kind of infer what's 192 00:10:09,490 --> 00:10:11,170 going on inside the computer's memory. 193 00:10:11,170 --> 00:10:13,930 If this again is my computer's memory and somewhere in there 194 00:10:13,930 --> 00:10:17,470 I have a variable n, it might take up four bytes down there. 195 00:10:17,470 --> 00:10:20,050 An int recall is four bytes so I'm going to go ahead and use 196 00:10:20,050 --> 00:10:21,370 four squares on the screen. 197 00:10:21,370 --> 00:10:24,610 For consistency, I'm going to call it n and just put the number 50. 198 00:10:24,610 --> 00:10:27,670 Now if you really look underneath the hood, that's not 50 per se, 199 00:10:27,670 --> 00:10:31,148 it's like 32 bits, 0s and 1s that represent the number 50. 200 00:10:31,148 --> 00:10:33,940 But again, we don't care about transistors in that low level detail 201 00:10:33,940 --> 00:10:34,628 now. 202 00:10:34,628 --> 00:10:36,670 But when I go ahead and print this, all I'm doing 203 00:10:36,670 --> 00:10:40,630 is printing the contents of that variable called n. 204 00:10:40,630 --> 00:10:45,700 But that variable technically does exist at a specific address in memory. 205 00:10:45,700 --> 00:10:46,200 Right? 206 00:10:46,200 --> 00:10:49,150 If the top left hand corner was 0 and the bottom right hand corner 207 00:10:49,150 --> 00:10:51,442 was a bigger number-- and maybe this is out of context. 
208 00:10:51,442 --> 00:10:53,710 I'm sort of zoomed out because you might have billions 209 00:10:53,710 --> 00:10:55,300 of bytes of memory in your computer. 210 00:10:55,300 --> 00:10:59,380 Suppose for the sake of discussion that that variable n and the value therein, 211 00:10:59,380 --> 00:11:05,890 50, is technically at address 0x, meaning hexadecimal, 12345678, wherever that is. 212 00:11:05,890 --> 00:11:07,780 It's a big arbitrary number. 213 00:11:07,780 --> 00:11:10,450 But it indeed exists somewhere in your computer's memory so long 214 00:11:10,450 --> 00:11:14,170 as you have that many bytes of hardware to use. 215 00:11:14,170 --> 00:11:17,920 Well it turns out that using C we can actually-- 216 00:11:17,920 --> 00:11:20,840 no pun intended --see this value as well. 217 00:11:20,840 --> 00:11:23,560 Let me go ahead and tweak this code slightly. 218 00:11:23,560 --> 00:11:25,920 I'm not going to go ahead and print out n this time, 219 00:11:25,920 --> 00:11:29,140 I'm going to go ahead and print out ampersand n, which 220 00:11:29,140 --> 00:11:31,390 happens to be a new piece of syntax for C. 221 00:11:31,390 --> 00:11:34,840 But it quite simply means the AddressOf operator. 222 00:11:34,840 --> 00:11:38,440 So wherever n is, go ahead and figure out what its address is, 223 00:11:38,440 --> 00:11:39,910 its location in memory. 224 00:11:39,910 --> 00:11:42,490 And it turns out C has a special format code for this. 225 00:11:42,490 --> 00:11:46,570 Instead of percent i, it's percent p, where percent p 226 00:11:46,570 --> 00:11:48,760 is going to print that address for us. 227 00:11:48,760 --> 00:11:54,280 So let me go ahead and save that, make address again to recompile, and then do 228 00:11:54,280 --> 00:11:56,980 dot slash address, enter. 229 00:11:56,980 --> 00:11:57,910 And voila. 230 00:11:57,910 --> 00:12:02,260 Now it just so happens that in CS50 IDE running on this cloud server, 231 00:12:02,260 --> 00:12:04,810 it's not address 0x12345678.
232 00:12:04,810 --> 00:12:06,850 I just made that up for the sake of discussion. 233 00:12:06,850 --> 00:12:14,410 It's technically at 0x7FFE00B3ADBC, which has no meaning to us 234 00:12:14,410 --> 00:12:17,290 here in class but it is all hexadecimal because every digit there 235 00:12:17,290 --> 00:12:19,150 is 0 through F. 236 00:12:19,150 --> 00:12:20,530 So it's kind of cool. 237 00:12:20,530 --> 00:12:22,840 This doesn't seem like useful information yet 238 00:12:22,840 --> 00:12:27,670 but you can in fact see where values are inside of your computer's memory. 239 00:12:27,670 --> 00:12:28,970 Well, what is that value? 240 00:12:28,970 --> 00:12:31,120 Well it turns out that as soon as you ask 241 00:12:31,120 --> 00:12:34,210 the computer for the address of some value, 242 00:12:34,210 --> 00:12:37,450 you are getting what's called a pointer to that value. 243 00:12:37,450 --> 00:12:40,640 A pointer is effectively an address in the computer's memory. 244 00:12:40,640 --> 00:12:42,100 And that's why it's percent p. 245 00:12:42,100 --> 00:12:44,590 This is telling printf, go ahead and print for me 246 00:12:44,590 --> 00:12:47,080 a pointer, the address of some value. 247 00:12:47,080 --> 00:12:51,250 And by convention again, it's displayed in hexadecimal like that. 248 00:12:51,250 --> 00:12:53,500 Well, it turns out we can actually undo these effects. 249 00:12:53,500 --> 00:12:55,330 Let me go ahead and make one change here. 250 00:12:55,330 --> 00:12:59,410 Suppose that now I want to go ahead and print out 50 again. 251 00:12:59,410 --> 00:13:02,140 I can actually reverse the effects of this operator. 252 00:13:02,140 --> 00:13:06,610 So ampersand n means to go get the address of n. 253 00:13:06,610 --> 00:13:09,070 But it turns out there's another operator in C that's 254 00:13:09,070 --> 00:13:12,550 quite useful around now and that's this one here. 
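The two versions of address.c described so far (printing n with percent i, then printing ampersand n with percent p) might look roughly like this. It is a sketch, not the exact file from the lecture: the code is wrapped in a function with a made-up name so it can be called from anywhere, and the cast to `void *` is added because that is the type the C standard specifies for percent p.

```c
#include <stdio.h>

// Hypothetical wrapper around the lecture's address.c steps.
// Prints the value of n, then the address of n, and returns the value.
int show_n_and_its_address(void)
{
    int n = 50;

    // The Week 1 version: print what is stored in n.
    printf("%i\n", n);

    // The new version: &n asks for the address of n, and %p prints
    // that address, conventionally in hexadecimal with a 0x prefix.
    printf("%p\n", (void *) &n);

    return n;
}
```

The first line printed is 50. The second is some hexadecimal address that will likely differ from run to run and from machine to machine, so no particular value should be expected.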
255 00:13:12,550 --> 00:13:15,970 So whereas ampersand is our so-called AddressOf operator, star-- 256 00:13:15,970 --> 00:13:17,020 or an asterisk-- 257 00:13:17,020 --> 00:13:18,730 we've seen before in multiplication. 258 00:13:18,730 --> 00:13:21,550 And today it has a different meaning in a different context. 259 00:13:21,550 --> 00:13:25,630 The star is the opposite of the AddressOf operator, 260 00:13:25,630 --> 00:13:28,450 it says go to a specific address. 261 00:13:28,450 --> 00:13:31,280 So whereas, an ampersand means what's the address, 262 00:13:31,280 --> 00:13:33,560 star means go to an address. 263 00:13:33,560 --> 00:13:36,760 So if I want to print out now, not the address per se, 264 00:13:36,760 --> 00:13:39,880 but I literally want to print out the value in n, 265 00:13:39,880 --> 00:13:44,980 ergo using percent i, I can actually undo what I literally did, 266 00:13:44,980 --> 00:13:48,760 stupidly-- but for the sake of demonstration --by doing star 267 00:13:48,760 --> 00:13:49,370 ampersand n. 268 00:13:49,370 --> 00:13:49,870 Why? 269 00:13:49,870 --> 00:13:51,495 The ampersand says, what's the address? 270 00:13:51,495 --> 00:13:53,002 The star says, go to that address. 271 00:13:53,002 --> 00:13:54,835 So it effectively just undoes the operation. 272 00:13:54,835 --> 00:13:56,770 So you wouldn't want to use this in practice 273 00:13:56,770 --> 00:14:00,800 but it just speaks to the sort of basic operations that we're doing here. 274 00:14:00,800 --> 00:14:06,090 So make address, let me go ahead and say now, dot slash address, enter. 275 00:14:06,090 --> 00:14:08,990 And what should I see this time? 276 00:14:08,990 --> 00:14:11,650 50, because I'm not even showing the address. 277 00:14:11,650 --> 00:14:15,490 I'm getting the address and going to the address, thereby defeating the point. 278 00:14:15,490 --> 00:14:16,730 I again see 50.
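The star-ampersand round trip can be sketched in a couple of lines. The function name `round_trip` is made up for illustration, and as the lecture says, this is a demonstration rather than something to write in practice.

```c
// Hypothetical helper: get the address of n with &, then immediately
// go to that address with *. The two operators undo each other,
// so the result is just the value of n.
int round_trip(int n)
{
    return *&n;
}
```

So `round_trip(50)` hands back 50, the same number that printing star ampersand n with percent i shows.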
279 00:14:16,730 --> 00:14:19,960 But this is only to say quite simply that even though things might seem 280 00:14:19,960 --> 00:14:22,510 a little cryptic today at first glance, syntactically, 281 00:14:22,510 --> 00:14:26,990 ampersand is get the address, star is go to that address, one way or the other. 282 00:14:26,990 --> 00:14:27,490 Yeah? 283 00:14:27,490 --> 00:14:30,885 AUDIENCE: Can you [INAUDIBLE] by typing the address 284 00:14:30,885 --> 00:14:34,730 in [INAUDIBLE] like a [INAUDIBLE]? 285 00:14:34,730 --> 00:14:36,480 DAVID J. MALAN: Really good question, yes. 286 00:14:36,480 --> 00:14:40,650 So if I had remembered the address, maybe it was 0x12345678, 287 00:14:40,650 --> 00:14:43,590 I could actually hard code that address in my program 288 00:14:43,590 --> 00:14:45,585 and tell the computer to go there. 289 00:14:45,585 --> 00:14:46,960 The syntax is a little different. 290 00:14:46,960 --> 00:14:50,040 I would have to coerce it using a cast but I could make that happen, yes. 291 00:14:50,040 --> 00:14:50,540 Yeah. 292 00:14:50,540 --> 00:14:54,851 AUDIENCE: What happens if you don't know even the type of the variable? 293 00:14:54,851 --> 00:14:56,767 Can you [INAUDIBLE] without knowing that? 294 00:14:56,767 --> 00:14:58,010 DAVID J. MALAN: Ah, really good question. 295 00:14:58,010 --> 00:15:00,190 What if you don't know the type of the variable, 296 00:15:00,190 --> 00:15:02,650 what format code would you therefore use? 297 00:15:02,650 --> 00:15:04,270 Short answer, you have to decide. 298 00:15:04,270 --> 00:15:07,480 To a computer, everything in memory is just bits, 0s and 1s, how 299 00:15:07,480 --> 00:15:09,590 you display them is entirely up to you. 300 00:15:09,590 --> 00:15:11,710 So if you don't know what they are, you can only 301 00:15:11,710 --> 00:15:13,880 guess, or tell the computer arbitrarily to say 302 00:15:13,880 --> 00:15:15,880 it's a char, a float, an int, or something else. 
303 00:15:15,880 --> 00:15:18,890 It can't figure that out for you, at least in C. 304 00:15:18,890 --> 00:15:19,390 All right. 305 00:15:19,390 --> 00:15:22,060 So let's just go ahead now and make more clear 306 00:15:22,060 --> 00:15:23,912 where we can store information here. 307 00:15:23,912 --> 00:15:26,120 Let me go ahead and change this code now as follows. 308 00:15:26,120 --> 00:15:29,287 It turns out that you can actually store addresses in variables themselves. 309 00:15:29,287 --> 00:15:31,880 I don't have to just do this ampersand thing here. 310 00:15:31,880 --> 00:15:34,070 Let me go ahead and change the program as follows. 311 00:15:34,070 --> 00:15:38,530 Let me go ahead and declare another variable called p and store in it 312 00:15:38,530 --> 00:15:40,390 the address of n. 313 00:15:40,390 --> 00:15:45,550 So again, nothing new here, just says, ampersand n, go get the address of n. 314 00:15:45,550 --> 00:15:47,798 But I do have to do something different here. 315 00:15:47,798 --> 00:15:49,840 On the left hand side is the name of my variable. 316 00:15:49,840 --> 00:15:51,430 I've called it p, for pointer. 317 00:15:51,430 --> 00:15:54,910 But if you want to store the address of some value 318 00:15:54,910 --> 00:15:59,980 in a variable you have to specify not just the type of value that's 319 00:15:59,980 --> 00:16:02,290 in that other variable, you have to specify 320 00:16:02,290 --> 00:16:06,310 with this star operator in a very confusing, unfortunate, different 321 00:16:06,310 --> 00:16:08,590 context, that this is a pointer. 322 00:16:08,590 --> 00:16:14,420 So whereas n has a data type of int-- just as it has since Week 0 323 00:16:14,420 --> 00:16:18,880 --the only thing new now is that it turns out there's another type of data 324 00:16:18,880 --> 00:16:20,980 that you can describe as a pointer.
325 00:16:20,980 --> 00:16:24,880 And a pointer is denoted with this star and the int just 326 00:16:24,880 --> 00:16:28,930 means this is the pointer to an int or it is the address of an int. 327 00:16:28,930 --> 00:16:31,990 And we'll see later we can do floats and-- 328 00:16:31,990 --> 00:16:34,720 floats, and chars, and bunches of other data types too. 329 00:16:34,720 --> 00:16:36,890 This just means that p is a variable that's going 330 00:16:36,890 --> 00:16:39,790 to contain a pointer to an int, a.k.a. 331 00:16:39,790 --> 00:16:41,920 the address of an int. 332 00:16:41,920 --> 00:16:42,430 All right. 333 00:16:42,430 --> 00:16:45,645 So what can I do now with this information? 334 00:16:45,645 --> 00:16:47,770 Well let me go ahead and print out either of these. 335 00:16:47,770 --> 00:16:51,490 If I want to go ahead and print out now, for instance, that address, 336 00:16:51,490 --> 00:16:56,330 I can go ahead and print %p and print out p just like this. 337 00:16:56,330 --> 00:16:58,810 Let me go ahead and make address, enter-- 338 00:16:58,810 --> 00:17:00,610 seems to compile OK --run address. 339 00:17:00,610 --> 00:17:06,032 And I'm going to see something cryptic again, 0x7FFF3977662C, which 340 00:17:06,032 --> 00:17:07,990 is different from before but that's because one 341 00:17:07,990 --> 00:17:09,948 of the features of modern computers is actually 342 00:17:09,948 --> 00:17:12,970 to move things around in memory for you, which is a security feature. 343 00:17:12,970 --> 00:17:14,740 But more on that perhaps, later on. 344 00:17:14,740 --> 00:17:17,710 But it's still a big cryptic hexadecimal address. 345 00:17:17,710 --> 00:17:20,290 What if though, just for the sake of demonstration, 346 00:17:20,290 --> 00:17:23,440 I didn't want to print out the address because rarely after today 347 00:17:23,440 --> 00:17:27,040 are we going to care about the specific addresses where things are?
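Storing an address in a variable, as just described, might look like the sketch below. It is not the lecture's exact file: the function name is invented, the code lives in a function rather than in main, and the `void *` cast is an addition because percent p formally expects that type.

```c
#include <stdio.h>

// Hypothetical wrapper: declare an int, store its address in a pointer,
// print the pointer, and report whether p really holds n's address.
int pointer_holds_address(void)
{
    int n = 50;

    // The star in the declaration says p is a pointer to an int,
    // that is, a variable whose value is the address of an int.
    int *p = &n;

    // Print the address stored in p (some hexadecimal value).
    printf("%p\n", (void *) p);

    // p and &n are the same address, so this comparison yields 1.
    return p == &n;
}
```

The printed address is whatever the system happened to choose for n on that run, which is why the lecture sees a different value each time the program is executed.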
348 00:17:27,040 --> 00:17:32,680 How could I change line 7 here to print out, not the value of p, 349 00:17:32,680 --> 00:17:37,180 but what is at the location p? 350 00:17:37,180 --> 00:17:40,850 How do I go to the location in p? 351 00:17:40,850 --> 00:17:41,350 OK. 352 00:17:41,350 --> 00:17:42,610 Star p, I heard. 353 00:17:42,610 --> 00:17:46,050 So instead of printing p itself, I say star p. 354 00:17:46,050 --> 00:17:48,850 I change the format code just to be an int. 355 00:17:48,850 --> 00:17:49,350 OK. 356 00:17:49,350 --> 00:17:51,190 Siri is trying to be helpful here. 357 00:17:51,190 --> 00:17:55,580 But now I'm saying, go ahead and print me an integer. 358 00:17:55,580 --> 00:17:58,090 And the integer I want you to print is the one at p. 359 00:17:58,090 --> 00:18:00,570 Star means go to that address, which is p. 360 00:18:00,570 --> 00:18:03,190 So let me save this, make address. 361 00:18:03,190 --> 00:18:04,750 All right, seems to compile. 362 00:18:04,750 --> 00:18:06,910 Dot slash address, let's see what happens. 363 00:18:06,910 --> 00:18:08,453 And back to 50. 364 00:18:08,453 --> 00:18:10,870 So we're just kind of jumping through hoops at the moment, 365 00:18:10,870 --> 00:18:12,460 accomplishing nothing real yet. 366 00:18:12,460 --> 00:18:15,040 But again, just demonstrating, and applying, and reversing 367 00:18:15,040 --> 00:18:18,820 the effects of these two operators. 368 00:18:18,820 --> 00:18:25,428 Any questions thus far on these addresses, or pointers, or the like? 369 00:18:25,428 --> 00:18:26,880 Yeah. 370 00:18:26,880 --> 00:18:34,634 AUDIENCE: So there's six lines where you stored the address of n-- 371 00:18:34,634 --> 00:18:35,592 DAVID J. MALAN: Mm hmm. 372 00:18:35,592 --> 00:18:37,060 AUDIENCE: --pointer of p. 373 00:18:37,060 --> 00:18:41,710 DAVID J. MALAN: You stored the address of n in p and p 374 00:18:41,710 --> 00:18:45,790 is a pointer, specifically a pointer to an integer. 
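Going to the location stored in p, rather than printing p itself, can be sketched as follows. As before, the function name is hypothetical and the code is wrapped in a function purely for illustration.

```c
#include <stdio.h>

// Hypothetical wrapper: follow the pointer p back to the int it points at.
int follow_pointer(void)
{
    int n = 50;
    int *p = &n; // p stores the address of n

    // *p means "go to the address in p"; what is found there is an int,
    // so the format code is %i rather than %p.
    printf("%i\n", *p);

    return *p;
}
```

This prints 50, the same result as printing n directly, which is the sense in which the two operators reverse each other.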
375 00:18:45,790 --> 00:18:49,130 Put another way, p is the address of an integer. 376 00:18:49,130 --> 00:18:50,470 Which integer? 377 00:18:50,470 --> 00:18:51,732 n 378 00:18:51,732 --> 00:18:54,149 AUDIENCE: Could I just write-- what would happen if I just 379 00:18:54,149 --> 00:18:55,920 write int p instead of int star p? 380 00:18:55,920 --> 00:18:57,170 DAVID J. MALAN: Good question. 381 00:18:57,170 --> 00:19:02,135 If you said int p equals ampersand n semicolon, instead of int star p, 382 00:19:02,135 --> 00:19:05,260 Clang-- the compiler --would actually yell at you because it realizes that, 383 00:19:05,260 --> 00:19:08,302 wait a minute, you're trying to store an address, not an integer like you 384 00:19:08,302 --> 00:19:10,212 and I know it, 12345678. 385 00:19:10,212 --> 00:19:11,920 Even though technically they are numbers, 386 00:19:11,920 --> 00:19:14,042 Clang is smart enough to realize that if you're 387 00:19:14,042 --> 00:19:16,750 getting the address of something, you must store it in a pointer. 388 00:19:16,750 --> 00:19:20,150 You cannot store it in just an integer. 389 00:19:20,150 --> 00:19:20,650 All right. 390 00:19:20,650 --> 00:19:22,320 So let's make this a little more visual. 391 00:19:22,320 --> 00:19:24,490 So if this is again my computer's memory, 392 00:19:24,490 --> 00:19:26,620 let me go ahead and pull up the slide from before. 393 00:19:26,620 --> 00:19:29,470 And the goal at hand is to visualize really these two lines of code. 394 00:19:29,470 --> 00:19:31,810 Give me a variable called n and store in it 50-- 395 00:19:31,810 --> 00:19:36,820 just like Week 1 --then also give me a variable called p and store in it 396 00:19:36,820 --> 00:19:39,280 the address of n. 397 00:19:39,280 --> 00:19:40,558 That's now in Week 4. 398 00:19:40,558 --> 00:19:41,600 What does this look like? 399 00:19:41,600 --> 00:19:42,725 Well, my computer's memory. 400 00:19:42,725 --> 00:19:44,800 Let's go ahead and put n on the screen again. 
401 00:19:44,800 --> 00:19:47,560 And n might be down there arbitrarily somewhere in memory. 402 00:19:47,560 --> 00:19:50,230 And it's called n, the value is 50. 403 00:19:50,230 --> 00:19:52,220 Technically, that 50 is somewhere. 404 00:19:52,220 --> 00:19:55,270 And let's just arbitrarily for discussion's sake, say it's at address 405 00:19:55,270 --> 00:19:58,600 0x12345678, so somewhere arbitrary. 406 00:19:58,600 --> 00:20:00,940 What does p look like in this picture? 407 00:20:00,940 --> 00:20:04,840 Well p is a variable, which means it's a bunch of bits 408 00:20:04,840 --> 00:20:06,190 that can store information. 409 00:20:06,190 --> 00:20:09,220 And let's just propose that they're up here in the middle. 410 00:20:09,220 --> 00:20:10,960 This variable is called p. 411 00:20:10,960 --> 00:20:12,610 What value is p storing? 412 00:20:12,610 --> 00:20:21,220 It's literally storing 0x12345678, which is again, the address of the variable n. 413 00:20:21,220 --> 00:20:22,690 So that's all that's going on here. 414 00:20:22,690 --> 00:20:24,280 But honestly, this is getting so low level. 415 00:20:24,280 --> 00:20:26,100 And even my sort of eyes are glazing over 416 00:20:26,100 --> 00:20:28,183 as we start talking about these low level details. 417 00:20:28,183 --> 00:20:30,680 Turns out that pointers lend themselves to abstraction. 418 00:20:30,680 --> 00:20:32,600 And in fact, we can start to do that already. 419 00:20:32,600 --> 00:20:36,190 Let's just focus now in the absence of memory, just on these two values. 420 00:20:36,190 --> 00:20:39,220 This big rectangle here represents a variable 421 00:20:39,220 --> 00:20:41,350 called p, which stores an address. 422 00:20:41,350 --> 00:20:43,480 This rectangle here represents another variable 423 00:20:43,480 --> 00:20:46,150 called n that's storing the number 50. 424 00:20:46,150 --> 00:20:48,100 Technically speaking, I don't really want 425 00:20:48,100 --> 00:20:52,300 to care moving forward what the address of n is.
426 00:20:52,300 --> 00:20:54,308 I just want you to know that I can access it. 427 00:20:54,308 --> 00:20:56,350 And so what a computer scientist would typically 428 00:20:56,350 --> 00:20:59,320 do is never talk about specific addresses-- 429 00:20:59,320 --> 00:21:02,410 certainly never write them down like I have thus far --but instead, just 430 00:21:02,410 --> 00:21:04,690 literally draw an arrow that conceptually 431 00:21:04,690 --> 00:21:09,220 says that this variable p is pointing at the number 50. 432 00:21:09,220 --> 00:21:11,530 And we can very quickly start to move away 433 00:21:11,530 --> 00:21:15,520 from the actual addresses in question. 434 00:21:15,520 --> 00:21:18,640 And in fact, we can visualize this even a little metaphorically. 435 00:21:18,640 --> 00:21:20,860 So for instance, here is, for instance, a mailbox. 436 00:21:20,860 --> 00:21:23,890 And suppose that this is address 123. 437 00:21:23,890 --> 00:21:25,390 What is in address 123? 438 00:21:25,390 --> 00:21:29,470 Well it's a variable of type int, called n, 439 00:21:29,470 --> 00:21:31,150 looks like it's storing the number 50. 440 00:21:31,150 --> 00:21:31,650 Right? 441 00:21:31,650 --> 00:21:32,700 We saw these letters-- 442 00:21:32,700 --> 00:21:33,700 these numbers last week. 443 00:21:33,700 --> 00:21:37,480 So here's the number 50, which is an integer inside of this variable, today, 444 00:21:37,480 --> 00:21:40,720 represented as a mailbox instead of as a locker. 445 00:21:40,720 --> 00:21:45,220 Well suppose that this mailbox over here is not n but suppose this is p. 446 00:21:45,220 --> 00:21:47,200 And it happens to be at address 456. 447 00:21:47,200 --> 00:21:48,970 But who really cares? 448 00:21:48,970 --> 00:21:55,720 If this variable p is a pointer to an integer, namely that one over there, 449 00:21:55,720 --> 00:21:58,210 when I open this door, what am I going to find?
450 00:21:58,210 --> 00:22:00,400 Well I'm hoping I find the equivalent of-- we 451 00:22:00,400 --> 00:22:02,620 picked these up at the Coop earlier --the equivalent 452 00:22:02,620 --> 00:22:07,580 of a conceptual pointer saying the number n is over there. 453 00:22:07,580 --> 00:22:11,350 But what specifically, at a lower level, is actually inside this mailbox 454 00:22:11,350 --> 00:22:15,520 if that variable n is at location 0x123? 455 00:22:15,520 --> 00:22:17,590 What's probably inside this mailbox? 456 00:22:17,590 --> 00:22:19,310 AUDIENCE: [INAUDIBLE] 457 00:22:19,310 --> 00:22:21,680 DAVID J. MALAN: Yeah, the address, indeed, 123. 458 00:22:21,680 --> 00:22:23,680 So it's sort of like a treasure map if you will. 459 00:22:23,680 --> 00:22:25,960 Oh, I have to go to 123 to get this value. 460 00:22:25,960 --> 00:22:28,817 Oh, the integer in question is indeed 50. 461 00:22:28,817 --> 00:22:30,400 And that's the fundamental difference. 462 00:22:30,400 --> 00:22:34,240 This is the int that happens to be inside of this variable of type int. 463 00:22:34,240 --> 00:22:38,980 This is the address that's a pointer that's in this other variable, p, 464 00:22:38,980 --> 00:22:42,340 but that is conceptually, simply pointing from one variable 465 00:22:42,340 --> 00:22:45,610 to another, thereby giving any sort of conceptual breadcrumbs. 466 00:22:45,610 --> 00:22:49,193 And we'll see-- frankly, in one week --how amazingly powerful it is. 467 00:22:49,193 --> 00:22:51,610 When you can have one piece of memory pointing at another, 468 00:22:51,610 --> 00:22:53,350 pointing at another, pointing at another, 469 00:22:53,350 --> 00:22:56,860 you can start to construct very sophisticated data structures, 470 00:22:56,860 --> 00:22:58,940 as they're called, things like family trees, 471 00:22:58,940 --> 00:23:01,690 and lists, and other data structures that you might have heard of. 
472 00:23:01,690 --> 00:23:04,810 Or even if you haven't, these will be the underpinnings next week 473 00:23:04,810 --> 00:23:07,572 of all of today's fanciest algorithms used by, 474 00:23:07,572 --> 00:23:09,280 certainly the Googles, and the Facebooks, 475 00:23:09,280 --> 00:23:11,830 and the Microsofts of the world to manage large data sets. 476 00:23:11,830 --> 00:23:15,340 That's where we're going next week, in terms of application. 477 00:23:15,340 --> 00:23:18,317 So questions about that representation? 478 00:23:18,317 --> 00:23:19,150 Yeah, in the middle. 479 00:23:19,150 --> 00:23:22,380 AUDIENCE: Does that mean that your memory has to be twice as big? 480 00:23:22,380 --> 00:23:23,390 DAVID J. MALAN: Sorry can you say it once more? 481 00:23:23,390 --> 00:23:26,640 AUDIENCE: Is that to say your memory has to be twice as big to store pointers? 482 00:23:26,640 --> 00:23:28,348 DAVID J. MALAN: Ah, really good question. 483 00:23:28,348 --> 00:23:30,850 Is it the case that your pointers need to be twice as big? 484 00:23:30,850 --> 00:23:34,240 Not necessarily, just, this is the way life is these days. 485 00:23:34,240 --> 00:23:39,460 On most modern Macs and PCs, pointers use 64 bits-- the equivalent of a long, 486 00:23:39,460 --> 00:23:41,860 if you recall that brief discussion in Week 1. 487 00:23:41,860 --> 00:23:44,110 So I deliberately drew my pointer on the screen 488 00:23:44,110 --> 00:23:47,440 here as taking up 8 bytes or 64 bits. 489 00:23:47,440 --> 00:23:52,060 I've deliberately drawn my integer n as taking up 4 bytes or 32 bits. 490 00:23:52,060 --> 00:23:54,400 That is convention these days on modern hardware. 491 00:23:54,400 --> 00:23:56,900 But it's not necessarily the case. 492 00:23:56,900 --> 00:23:59,602 Frankly, I could not find a bigger mailbox at Home Depot, 493 00:23:59,602 --> 00:24:01,810 so we went with two identical different colored ones. 494 00:24:01,810 --> 00:24:03,880 So metaphor is imperfect. 
495 00:24:03,880 --> 00:24:04,630 All right. 496 00:24:04,630 --> 00:24:09,070 So moving from this to something more familiar now, if you will. 497 00:24:09,070 --> 00:24:12,970 Recall that we've been talking about strings for quite some time. 498 00:24:12,970 --> 00:24:15,880 And in fact, most of the interesting programs we've written thus far 499 00:24:15,880 --> 00:24:19,630 involve maybe input from the human and some form of text 500 00:24:19,630 --> 00:24:21,280 that you are then manipulating. 501 00:24:21,280 --> 00:24:24,728 But string we said in Week 1 is a bit of a white lie. 502 00:24:24,728 --> 00:24:26,770 I mean, it is the training wheels that I promised 503 00:24:26,770 --> 00:24:28,360 we would start taking off today. 504 00:24:28,360 --> 00:24:32,990 So let's consider what a string actually is now in this new context. 505 00:24:32,990 --> 00:24:36,040 So if we have a string like EMMA here, declared in a variable 506 00:24:36,040 --> 00:24:39,920 called s, and quote unquote, EMMA in all caps, as we've done a couple of times 507 00:24:39,920 --> 00:24:40,420 now. 508 00:24:40,420 --> 00:24:42,795 What does this actually look like inside of the computer? 509 00:24:42,795 --> 00:24:47,380 Well somewhere in my computer's memory there are four, nay, five bytes, 510 00:24:47,380 --> 00:24:52,610 storing E-M-M-A, and then additionally, that null terminating character that 511 00:24:52,610 --> 00:24:55,390 demarcates where the end of the string is. 512 00:24:55,390 --> 00:24:58,150 This is just eight individual 0 bits. 513 00:24:58,150 --> 00:25:01,030 So that's where EMMA might be represented in the computer's memory. 514 00:25:01,030 --> 00:25:04,240 But recall that the variable in question was s. 515 00:25:04,240 --> 00:25:05,320 That was my string. 516 00:25:05,320 --> 00:25:07,330 And so that's why over the past few weeks 517 00:25:07,330 --> 00:25:11,050 any time you want to manipulate a string, you use its name, like s. 
518 00:25:11,050 --> 00:25:13,870 And you can access bracket 0, bracket 1, bracket 2, bracket 3, 519 00:25:13,870 --> 00:25:19,450 to get at the individual characters in that string like EMMA, E-M-M-A, 520 00:25:19,450 --> 00:25:20,810 respectively. 521 00:25:20,810 --> 00:25:26,050 But of course it's the case, especially per today's revelation, that really, 522 00:25:26,050 --> 00:25:28,280 all of those bytes have their own addresses. 523 00:25:28,280 --> 00:25:28,780 Right? 524 00:25:28,780 --> 00:25:31,780 We're not going to care after this week what those addresses are 525 00:25:31,780 --> 00:25:32,920 but they certainly exist. 526 00:25:32,920 --> 00:25:36,160 For instance, E might be at 0x123. 527 00:25:36,160 --> 00:25:38,020 M might be at 0x124-- 528 00:25:38,020 --> 00:25:42,675 1 byte away --0x125, 0x126, 0x127. 529 00:25:42,675 --> 00:25:45,550 They're deliberately 1 byte away because remember a string is defined 530 00:25:45,550 --> 00:25:47,930 by characters back-to-back-to-back. 531 00:25:47,930 --> 00:25:51,700 So let's say for the sake of discussion that EMMA's name in memory 532 00:25:51,700 --> 00:25:54,670 happens to start at 0x123. 533 00:25:54,670 --> 00:25:58,270 Well, what then really is that variable s? 534 00:25:58,270 --> 00:26:01,960 Well, I dare say that s is really just a pointer. 535 00:26:01,960 --> 00:26:02,830 Right? 536 00:26:02,830 --> 00:26:06,790 It can be a variable, depicted here just as before, called s. 537 00:26:06,790 --> 00:26:08,920 And it stores the value 0x123. 538 00:26:08,920 --> 00:26:09,700 Why? 539 00:26:09,700 --> 00:26:11,590 That's where Emma's name begins. 540 00:26:11,590 --> 00:26:14,680 But of course, we don't really have to care about this level of precision, 541 00:26:14,680 --> 00:26:15,472 the actual numbers. 542 00:26:15,472 --> 00:26:17,140 Let's just draw it as a picture.
543 00:26:17,140 --> 00:26:21,980 s is, if you will, a pointer to Emma's actual name in memory, 544 00:26:21,980 --> 00:26:23,230 which might be down over here. 545 00:26:23,230 --> 00:26:24,147 It might be over here. 546 00:26:24,147 --> 00:26:27,040 It might be over here, depending on where in the computer's memory 547 00:26:27,040 --> 00:26:28,390 it ended up by chance. 548 00:26:28,390 --> 00:26:32,830 But this arrow just suggests that s is pointing to Emma, specifically 549 00:26:32,830 --> 00:26:35,020 at the first letter in her name. 550 00:26:35,020 --> 00:26:36,670 But that's sufficient though, right? 551 00:26:36,670 --> 00:26:41,377 Because how-- if s stores the beginning of Emma's name, 0x123. 552 00:26:41,377 --> 00:26:43,210 And that's indeed where the E is but we just 553 00:26:43,210 --> 00:26:45,940 draw this pictorially with an arrow. 554 00:26:45,940 --> 00:26:48,550 How does the computer know where Emma's name 555 00:26:48,550 --> 00:26:52,392 ends if all it's technically remembering is the beginning? 556 00:26:52,392 --> 00:26:54,100 AUDIENCE: The null terminating character. 557 00:26:54,100 --> 00:26:55,300 DAVID J. MALAN: The null terminating character. 558 00:26:55,300 --> 00:26:58,130 And we stipulated a couple of weeks ago that that is important. 559 00:26:58,130 --> 00:27:00,610 But now it's all the more important because it turns out 560 00:27:00,610 --> 00:27:03,640 that s, this thing we've been calling a string, 561 00:27:03,640 --> 00:27:08,530 has no familiarity with MMA or the null terminator. 562 00:27:08,530 --> 00:27:11,500 All s is pointing at technically, as of today, 563 00:27:11,500 --> 00:27:16,090 is the first letter in her name, which happens to be in this story at 0x123. 
564 00:27:16,090 --> 00:27:19,570 But the computer is smart enough to know that if you just point it 565 00:27:19,570 --> 00:27:22,630 at the first letter in a string, it can figure out 566 00:27:22,630 --> 00:27:25,150 where the string ends by just looking-- 567 00:27:25,150 --> 00:27:29,440 as with a loop --for that null terminating character. 568 00:27:29,440 --> 00:27:35,590 So this is to say ultimately, that there is no such thing as string. 569 00:27:35,590 --> 00:27:37,870 And we'll see if this strikes a chord. 570 00:27:37,870 --> 00:27:39,740 There is no such thing as a string. 571 00:27:39,740 --> 00:27:42,160 This was a little white lie we began telling in Week 1 572 00:27:42,160 --> 00:27:46,190 just so that we could get interesting, real work done, manipulating text. 573 00:27:46,190 --> 00:27:51,306 But what is string most likely implemented as would you say? 574 00:27:51,306 --> 00:27:52,970 AUDIENCE: An array of characters. 575 00:27:52,970 --> 00:27:54,140 DAVID J. MALAN: An array of characters, yes. 576 00:27:54,140 --> 00:27:55,515 But that was Week 1's definition. 577 00:27:55,515 --> 00:27:58,070 What technically now, as of today, must a string be? 578 00:27:58,070 --> 00:27:59,520 AUDIENCE: [INAUDIBLE] 579 00:27:59,520 --> 00:28:00,000 DAVID J. MALAN: Sorry, over here. 580 00:28:00,000 --> 00:28:00,800 AUDIENCE: A pointer. 581 00:28:00,800 --> 00:28:01,883 DAVID J. MALAN: A pointer. 582 00:28:01,883 --> 00:28:04,170 Right? s, the variable in which I was storing 583 00:28:04,170 --> 00:28:08,790 Emma's name would seem to manifest a pattern just 584 00:28:08,790 --> 00:28:11,430 like we saw with the numbers a moment ago, the number 50. 585 00:28:11,430 --> 00:28:14,640 s seems to be storing the address of the first character 586 00:28:14,640 --> 00:28:16,170 in that sequence of characters. 587 00:28:16,170 --> 00:28:18,417 And so indeed, it would seem to be a string. 588 00:28:18,417 --> 00:28:20,250 Well, how do we actually connect these dots? 
589 00:28:20,250 --> 00:28:22,500 Well suppose that we have this line of code 590 00:28:22,500 --> 00:28:24,910 again where we had int n equals 50. 591 00:28:24,910 --> 00:28:27,160 And then we had this other line of code where we said, 592 00:28:27,160 --> 00:28:31,170 go ahead and create a variable called p and store in it the address of n. 593 00:28:31,170 --> 00:28:33,210 That's where we left off earlier. 594 00:28:33,210 --> 00:28:36,990 But it turns out that this thing here is our data type from Week 1. 595 00:28:36,990 --> 00:28:40,650 This thing here, int star, is a new data type as of today. 596 00:28:40,650 --> 00:28:43,860 The variable stores, not an int, but the address of an int. 597 00:28:43,860 --> 00:28:47,910 It turns out that something like this line of code, with Emma's name, 598 00:28:47,910 --> 00:28:51,850 is synonymous with char star. 599 00:28:51,850 --> 00:28:52,350 Right? 600 00:28:52,350 --> 00:28:58,200 If a star represents an address and char represents the type of address being 601 00:28:58,200 --> 00:29:02,760 pointed at, just as int star can let you point at a value like n-- 602 00:29:02,760 --> 00:29:05,550 which stored 50 --so could a char star-- 603 00:29:05,550 --> 00:29:09,390 by that same logic --allow you to store the address of and therefore 604 00:29:09,390 --> 00:29:12,030 point at a character. 605 00:29:12,030 --> 00:29:14,790 And of course, as you said, from Week 1, a string 606 00:29:14,790 --> 00:29:16,890 is just a sequence of characters. 607 00:29:16,890 --> 00:29:21,210 So a string would seem to be just the address of the first byte 608 00:29:21,210 --> 00:29:23,010 in the sequence of characters. 609 00:29:23,010 --> 00:29:27,540 And the last byte happens to be all 0s by convention, to help us find the end. 
610 00:29:27,540 --> 00:29:29,340 So what then more technically is a string 611 00:29:29,340 --> 00:29:31,800 and what is the CS50 library that we're now going 612 00:29:31,800 --> 00:29:34,440 to start taking off as training wheels? 613 00:29:34,440 --> 00:29:36,900 Well last week we introduced you to the notion of typedef, 614 00:29:36,900 --> 00:29:40,890 where you can create your own customized data type that does not exist in C 615 00:29:40,890 --> 00:29:42,810 but does exist in your own program. 616 00:29:42,810 --> 00:29:44,675 And we introduced this keyword, typedef. 617 00:29:44,675 --> 00:29:47,550 We proposed last week that this was useful because you could actually 618 00:29:47,550 --> 00:29:50,340 declare a fancy structure that encapsulates 619 00:29:50,340 --> 00:29:52,860 multiple variables, like name and number, 620 00:29:52,860 --> 00:29:56,160 and then we called this data structure, last week, a person. 621 00:29:56,160 --> 00:29:58,050 That was the new data type we invented. 622 00:29:58,050 --> 00:30:01,410 Well it turns out you can use typedef in exactly the same way 623 00:30:01,410 --> 00:30:04,740 even more simply than we did last week by saying this. 624 00:30:04,740 --> 00:30:08,460 If you say typedef char star string-- 625 00:30:08,460 --> 00:30:11,910 typedef means give me a new data type, just for my own use. 626 00:30:11,910 --> 00:30:17,610 Char star means the type of value is going to be the address of a character. 627 00:30:17,610 --> 00:30:21,480 And the name I want to give to that data type is going to be string. 628 00:30:21,480 --> 00:30:24,630 And so literally, this line of code here, this 629 00:30:24,630 --> 00:30:28,230 is one of the lines of code in CS50 dot h-- the header 630 00:30:28,230 --> 00:30:30,480 file you've been including for several weeks, 631 00:30:30,480 --> 00:30:33,540 where we are creating a data type called string 632 00:30:33,540 --> 00:30:35,792 to make it a synonym for char star. 
633 00:30:35,792 --> 00:30:37,500 So that if you will, it's an abstraction, 634 00:30:37,500 --> 00:30:42,090 a simplification on top of the idea of a sequence of characters 635 00:30:42,090 --> 00:30:45,257 being pointed at by an address. 636 00:30:45,257 --> 00:30:45,840 Any questions? 637 00:30:45,840 --> 00:30:47,910 And honestly, this is why-- and maybe those sort 638 00:30:47,910 --> 00:30:51,270 of blank stares --this is why we introduced strings in Week 1 639 00:30:51,270 --> 00:30:55,080 as being an actual type as opposed to not existing at all. 640 00:30:55,080 --> 00:30:57,630 Because who really cares about addresses and pointers 641 00:30:57,630 --> 00:30:59,670 and all of that when all you want to do is like, 642 00:30:59,670 --> 00:31:04,950 print, hello world, or hello, so and so's name? 643 00:31:04,950 --> 00:31:05,730 Yeah, question. 644 00:31:05,730 --> 00:31:10,450 AUDIENCE: What other-- what other functions are created-- 645 00:31:10,450 --> 00:31:13,568 major functions are created by CS50 are not intrinsic to-- 646 00:31:13,568 --> 00:31:15,110 DAVID J. MALAN: Really good question. 647 00:31:15,110 --> 00:31:16,610 We'll come back to this later today. 648 00:31:16,610 --> 00:31:19,410 But other functions that are defined in the CS50 library that 649 00:31:19,410 --> 00:31:21,660 are training wheels that come off today are getString, 650 00:31:21,660 --> 00:31:24,630 getInt, getFloat, and the other get functions as well. 651 00:31:24,630 --> 00:31:27,630 But that's about it that we do for you. 652 00:31:27,630 --> 00:31:30,072 Other questions? 653 00:31:30,072 --> 00:31:31,060 Yeah. 654 00:31:31,060 --> 00:31:34,024 AUDIENCE: Can you define all of these words again? 655 00:31:34,024 --> 00:31:38,964 Like, it's-- so string is like a character pointer which points-- 656 00:31:38,964 --> 00:31:40,107 I was confused about that. 657 00:31:40,107 --> 00:31:40,940 Can you repeat that? 658 00:31:40,940 --> 00:31:42,030 DAVID J. MALAN: Sure. 
659 00:31:42,030 --> 00:31:47,707 A string, per this definition, is a char star, as a programmer would say. 660 00:31:47,707 --> 00:31:48,540 What does that mean? 661 00:31:48,540 --> 00:31:53,850 A string is quite simply a variable that contains the address of a character. 662 00:31:53,850 --> 00:31:56,970 By our human convention, that character might be the beginning 663 00:31:56,970 --> 00:31:59,400 of a multi character sequence. 664 00:31:59,400 --> 00:32:01,590 But that's what we called strings in Week 1. 665 00:32:01,590 --> 00:32:05,273 So a string is just the address of a single character. 666 00:32:05,273 --> 00:32:08,190 And we leave it to human convention to know that the end of the string 667 00:32:08,190 --> 00:32:12,000 will just be demarcated by eight 0 bits, a.k.a. 668 00:32:12,000 --> 00:32:13,088 the null terminator. 669 00:32:13,088 --> 00:32:14,880 And this is the sense in which-- especially 670 00:32:14,880 --> 00:32:16,755 if you have some prior programming experience 671 00:32:16,755 --> 00:32:18,570 --that C is much more low level. 672 00:32:18,570 --> 00:32:20,700 In Python, as you'll soon see in a few weeks, 673 00:32:20,700 --> 00:32:22,867 everything just works so splendidly easily. 674 00:32:22,867 --> 00:32:24,700 If you want a string, you can have a string. 675 00:32:24,700 --> 00:32:27,242 You don't have to worry about any of these low level details. 676 00:32:27,242 --> 00:32:30,090 But that's because Python is built here, conceptually, 677 00:32:30,090 --> 00:32:33,780 where C is built down here-- so to speak --closer to the computer's memory. 678 00:32:33,780 --> 00:32:34,740 But there's no magic. 679 00:32:34,740 --> 00:32:36,060 If you want a string, fine. 680 00:32:36,060 --> 00:32:38,310 Just remember where it starts, remember where it ends. 681 00:32:38,310 --> 00:32:39,630 And boom, you're done. 682 00:32:39,630 --> 00:32:45,130 The star in the syntax today is just a way of expressing those ideas in code.
683 00:32:45,130 --> 00:32:47,520 So let's go ahead then and experiment with this string, 684 00:32:47,520 --> 00:32:51,660 just as we did a moment ago using Emma's name now instead of an int. 685 00:32:51,660 --> 00:32:53,730 So let me go ahead and erase those lines earlier. 686 00:32:53,730 --> 00:32:57,930 And let me go back to Week 1 style stuff, where I just say string s 687 00:32:57,930 --> 00:32:59,340 equals quote unquote, Emma. 688 00:32:59,340 --> 00:33:03,750 And then of course, if I want to print this, I can simply say this as before. 689 00:33:03,750 --> 00:33:07,950 So just as a quick safety check, let me go ahead and make address again. 690 00:33:07,950 --> 00:33:09,390 Whoops. 691 00:33:09,390 --> 00:33:11,570 What did I do wrong? 692 00:33:11,570 --> 00:33:13,620 Let me scroll up to the first-- 693 00:33:13,620 --> 00:33:15,964 of many it seems --errors. 694 00:33:15,964 --> 00:33:17,410 Yeah. 695 00:33:17,410 --> 00:33:19,873 AUDIENCE: You're using string, [INAUDIBLE] 696 00:33:19,873 --> 00:33:22,040 DAVID J. MALAN: Yeah, I kind of shouldn't have taken 697 00:33:22,040 --> 00:33:23,750 off all the training wheels just yet. 698 00:33:23,750 --> 00:33:25,087 I'm still using string. 699 00:33:25,087 --> 00:33:27,170 So let me go ahead and put that back just for now. 700 00:33:27,170 --> 00:33:29,870 That will give me access to that typedef for string. 701 00:33:29,870 --> 00:33:31,960 Let me recompile it as make address. 702 00:33:31,960 --> 00:33:32,480 That worked. 703 00:33:32,480 --> 00:33:33,980 So that was the solution, thank you. 704 00:33:33,980 --> 00:33:35,360 And then address again. 705 00:33:35,360 --> 00:33:36,470 We just see Emma. 706 00:33:36,470 --> 00:33:39,810 So what can we now do that's a little bit different here? 707 00:33:39,810 --> 00:33:42,350 Well, one, you know what I can actually do?
708 00:33:42,350 --> 00:33:45,230 I can get rid of this-- the solution a moment ago --and say, 709 00:33:45,230 --> 00:33:46,440 I don't need string anymore. 710 00:33:46,440 --> 00:33:47,898 I don't need those training wheels. 711 00:33:47,898 --> 00:33:51,483 If s is going to represent a string, technically, s 712 00:33:51,483 --> 00:33:53,900 is just going to store the address of the first character. 713 00:33:53,900 --> 00:33:57,170 And it suffices actually, just to write this. 714 00:33:57,170 --> 00:34:00,170 So literally instead of string, you write char star. 715 00:34:00,170 --> 00:34:01,730 Technically, you don't need-- 716 00:34:01,730 --> 00:34:03,680 you can have extra space to the left or right. 717 00:34:03,680 --> 00:34:08,150 But most programmers write it just as I have here, char star variable name. 718 00:34:08,150 --> 00:34:11,045 That looks scarier now but it's no different from what 719 00:34:11,045 --> 00:34:12,170 we've been doing for weeks. 720 00:34:12,170 --> 00:34:14,780 If I now do make address without the CS50 library, 721 00:34:14,780 --> 00:34:17,670 still works, because C knows what I'm talking about. 722 00:34:17,670 --> 00:34:20,780 And if I run address now, I still see Emma. 723 00:34:20,780 --> 00:34:22,270 But now I can start to play around. 724 00:34:22,270 --> 00:34:22,770 Right? 725 00:34:22,770 --> 00:34:26,480 If s is the address of a character, what was the format code 726 00:34:26,480 --> 00:34:29,570 I can use to print an address? 727 00:34:29,570 --> 00:34:30,947 Not percent i, but-- 728 00:34:30,947 --> 00:34:31,780 AUDIENCE: Percent p. 729 00:34:31,780 --> 00:34:33,650 DAVID J. MALAN: Percent p, a pointer. 730 00:34:33,650 --> 00:34:35,600 So let me go ahead and recompile this now. 731 00:34:35,600 --> 00:34:38,512 Make address, that compiles too. 732 00:34:38,512 --> 00:34:41,179 And when I run dot slash address, I'm not going to see Emma now. 733 00:34:41,179 --> 00:34:44,427 What should I see instead? 
734 00:34:44,427 --> 00:34:45,260 Some address, right? 735 00:34:45,260 --> 00:34:46,409 I have no idea what it is. 736 00:34:46,409 --> 00:34:50,060 It looks like Emma's name is stored at 0x42A9F2, 737 00:34:50,060 --> 00:34:52,489 whatever that number translates to decimal, somewhere 738 00:34:52,489 --> 00:34:53,870 in the computer's memory. 739 00:34:53,870 --> 00:34:57,482 But it turns out then too, what about this? 740 00:34:57,482 --> 00:34:59,690 Let me go ahead and add another line of code and say, 741 00:34:59,690 --> 00:35:01,850 you know what, I'm really curious now. 742 00:35:01,850 --> 00:35:06,470 What is the address of the first letter in Emma's name? 743 00:35:06,470 --> 00:35:10,400 How do I express in C, the first letter only of Emma's name 744 00:35:10,400 --> 00:35:12,066 if Emma is stored in s. 745 00:35:12,066 --> 00:35:13,930 AUDIENCE: [INAUDIBLE] 746 00:35:13,930 --> 00:35:17,085 DAVID J. MALAN: s bracket zero, right? 747 00:35:17,085 --> 00:35:18,210 That would seem to be that. 748 00:35:18,210 --> 00:35:18,960 But that is what? 749 00:35:18,960 --> 00:35:21,510 That's a char. s bracket 0 is a char. 750 00:35:21,510 --> 00:35:23,850 How do I get the address of s bracket 0? 751 00:35:23,850 --> 00:35:24,770 AUDIENCE: Ampersand. 752 00:35:24,770 --> 00:35:26,728 DAVID J. MALAN: Yeah, I can just say ampersand. 753 00:35:26,728 --> 00:35:27,228 Right? 754 00:35:27,228 --> 00:35:29,180 So it's ugly looking but that's fine for now. 755 00:35:29,180 --> 00:35:30,620 Make address, enter. 756 00:35:30,620 --> 00:35:31,940 Whoops. 757 00:35:31,940 --> 00:35:34,130 It's uglier because I forgot my semicolon. 758 00:35:34,130 --> 00:35:37,160 Let me go ahead and make address again, enter. 759 00:35:37,160 --> 00:35:38,270 Seems to compile. 760 00:35:38,270 --> 00:35:41,850 And when I run dot slash address now, notice I get the same thing. 761 00:35:41,850 --> 00:35:43,880 And this is because C is taking me literally. 
762 00:35:43,880 --> 00:35:46,972 When you print out s, a string, it's technically just the address 763 00:35:46,972 --> 00:35:47,930 of the first character. 764 00:35:47,930 --> 00:35:50,330 And indeed, I can corroborate as much by running s 765 00:35:50,330 --> 00:35:53,570 bracket zero then get the address of the first character. 766 00:35:53,570 --> 00:35:55,560 And they are indeed one and the same. 767 00:35:55,560 --> 00:35:59,660 So a string is this sort of abstraction on top of a bunch of characters. 768 00:35:59,660 --> 00:36:01,880 But again, s is just an address. 769 00:36:01,880 --> 00:36:03,520 And that's all we're emphasizing now. 770 00:36:03,520 --> 00:36:06,020 And if I get really curious-- not that you would necessarily 771 00:36:06,020 --> 00:36:08,090 do this in a real program --what if I print 772 00:36:08,090 --> 00:36:13,310 out a few more characters in Emma's name, like s bracket 1, 2, and 3? 773 00:36:13,310 --> 00:36:16,700 Let me go ahead, just out of curiosity and make this program and dot slash 774 00:36:16,700 --> 00:36:17,750 address. 775 00:36:17,750 --> 00:36:23,960 Now notice what I see, is again, s's address is at 42AB52. 776 00:36:23,960 --> 00:36:26,150 The first character in s is at the same thing, 777 00:36:26,150 --> 00:36:28,220 by definition of what a string is. 778 00:36:28,220 --> 00:36:31,280 And then notice what's kind of neat-- if this is-- 779 00:36:31,280 --> 00:36:35,960 if-- for some definition of neat --53, 54, 55 is noteworthy. 780 00:36:35,960 --> 00:36:36,670 Why? 781 00:36:36,670 --> 00:36:39,270 They're one byte apart. 782 00:36:39,270 --> 00:36:42,590 So this whole time, whenever you implemented Caesar, or substitution, 783 00:36:42,590 --> 00:36:45,053 or some other cipher in problem set two, anytime 784 00:36:45,053 --> 00:36:47,720 you were manipulating individual characters-- you didn't know it 785 00:36:47,720 --> 00:36:49,940 --but you were just visiting different mailboxes.
786 00:36:49,940 --> 00:36:53,330 You were just visiting different addresses in the computer's memory 787 00:36:53,330 --> 00:36:57,630 in order to manipulate them somehow. 788 00:36:57,630 --> 00:36:58,130 All right. 789 00:36:58,130 --> 00:37:00,530 Can I do one last demo that's a little arcane and then 790 00:37:00,530 --> 00:37:02,820 we'll make things more-- more real? 791 00:37:02,820 --> 00:37:03,320 All right. 792 00:37:03,320 --> 00:37:06,950 So it turns out if all that's going on underneath the hood 793 00:37:06,950 --> 00:37:10,940 is just addresses, watch what I can do here. 794 00:37:10,940 --> 00:37:16,370 If I want to go ahead and print out what is at the address s, 795 00:37:16,370 --> 00:37:21,695 what will I find in memory if I go to the address in s? 796 00:37:21,695 --> 00:37:23,030 AUDIENCE: [INAUDIBLE] 797 00:37:23,030 --> 00:37:23,750 DAVID J. MALAN: Sorry, a little louder. 798 00:37:23,750 --> 00:37:25,010 AUDIENCE: The first letter. 799 00:37:25,010 --> 00:37:27,302 DAVID J. MALAN: The first letter in Emma's name, right? 800 00:37:27,302 --> 00:37:29,840 If we can all agree-- even if it's a little unfamiliar still 801 00:37:29,840 --> 00:37:33,290 --that s is just the address of a character, and I say, go to s, 802 00:37:33,290 --> 00:37:34,670 what should I see specifically? 803 00:37:34,670 --> 00:37:35,900 AUDIENCE: [INAUDIBLE] 804 00:37:35,900 --> 00:37:38,510 DAVID J. MALAN: Probably E in Emma Right? 805 00:37:38,510 --> 00:37:41,070 If s is the address of the first character of her name, 806 00:37:41,070 --> 00:37:44,300 star s would mean go to that character. 807 00:37:44,300 --> 00:37:46,220 So let me go ahead and print that as a char. 808 00:37:46,220 --> 00:37:51,770 So let me go ahead now and make address dot slash address, enter. 809 00:37:51,770 --> 00:37:56,030 There is the E because I can say, go to that address and print what's there. 
810 00:37:56,030 --> 00:37:59,060 And I can actually do this for all of her letters in her name. 811 00:37:59,060 --> 00:38:01,250 Let me go ahead and print out another one here. 812 00:38:01,250 --> 00:38:03,980 So how do I get at the second letter in Emma's name? 813 00:38:03,980 --> 00:38:07,050 Previous-- normally, like last week, we would have done this. 814 00:38:07,050 --> 00:38:10,218 And that just magically gets you to the second letter in her name. 815 00:38:10,218 --> 00:38:11,760 But I can do it a little differently. 816 00:38:11,760 --> 00:38:16,010 What if I go to s and then, from where do I want 817 00:38:16,010 --> 00:38:18,200 to go from s to get the second letter? 818 00:38:18,200 --> 00:38:19,087 AUDIENCE: Plus one. 819 00:38:19,087 --> 00:38:20,420 DAVID J. MALAN: Plus one, right? 820 00:38:20,420 --> 00:38:22,730 I mean, maybe we can literally just do arithmetic here. 821 00:38:22,730 --> 00:38:26,120 If s is the address of her first letter, it stands to reason that s plus 1 822 00:38:26,120 --> 00:38:27,830 is the address of her second letter. 823 00:38:27,830 --> 00:38:31,940 So make address now dot slash address. 824 00:38:31,940 --> 00:38:34,700 And I should see EM. 825 00:38:34,700 --> 00:38:39,710 And I can do this twice more maybe and go ahead and do this and then this. 826 00:38:39,710 --> 00:38:44,570 But this time add 2 and this time add 3, just doing some simple arithmetic. 827 00:38:44,570 --> 00:38:50,760 Make address dot slash address, there is Emma but in a much lower level detail. 828 00:38:50,760 --> 00:38:52,370 So what is this bracket symbol? 829 00:38:52,370 --> 00:38:55,307 In computer science, this is what's called syntactic sugar. 830 00:38:55,307 --> 00:38:56,390 It's kind of a silly name. 831 00:38:56,390 --> 00:39:00,800 But it just refers to a handy feature so that you, the programmer, can say, 832 00:39:00,800 --> 00:39:03,230 s bracket 0 or bracket 1. 
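The equivalence between square brackets and "go to s plus some offset" can be sketched as below. The helper name `char_at` is hypothetical and exists only for illustration; the sketch assumes `i` is no larger than the length of the string.

```c
#include <stddef.h>

// Return the character i bytes past the address in s.
// Written with pointer arithmetic; *(s + i) means exactly the same as s[i].
char char_at(const char *s, size_t i)
{
    return *(s + i);
}
```

With `s` pointing at "EMMA", `char_at(s, 0)` goes to the address in `s` itself, and `char_at(s, 1)` goes one byte further.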
833 00:39:03,230 --> 00:39:06,500 But what the computer is actually doing underneath the hood-- the compiler, 834 00:39:06,500 --> 00:39:10,520 Clang --it's actually converting all of your uses of square brackets since Week 835 00:39:10,520 --> 00:39:13,790 1 to this format here. 836 00:39:13,790 --> 00:39:15,980 It's just doing arithmetic underneath the hood. 837 00:39:15,980 --> 00:39:18,120 Now you don't have to do this moving forward. 838 00:39:18,120 --> 00:39:21,740 But I point out this low level detail just to give you a sense of, 839 00:39:21,740 --> 00:39:23,090 there really is no magic. 840 00:39:23,090 --> 00:39:25,460 When you say, go print an address or go do this, 841 00:39:25,460 --> 00:39:29,010 the computer is taking you literally. 842 00:39:29,010 --> 00:39:29,990 Whew. 843 00:39:29,990 --> 00:39:30,920 OK, that was a lot. 844 00:39:30,920 --> 00:39:32,046 Yes, question. 845 00:39:32,046 --> 00:39:37,006 AUDIENCE: So [INAUDIBLE] 846 00:39:37,006 --> 00:39:39,500 847 00:39:39,500 --> 00:39:42,830 DAVID J. MALAN: Star s would mean go to the address in s. 848 00:39:42,830 --> 00:39:52,708 AUDIENCE: So why for instance, if you [INAUDIBLE] character [INAUDIBLE] 849 00:39:52,708 --> 00:39:54,250 DAVID J. MALAN: Really good question. 850 00:39:54,250 --> 00:39:58,360 Why, when you print out s, does it print out the whole string and not 851 00:39:58,360 --> 00:39:59,410 just the character? 852 00:39:59,410 --> 00:40:02,020 That's what the printf format code is doing for you. 853 00:40:02,020 --> 00:40:06,160 When you tell printf to use percent s, that has special meaning to printf. 854 00:40:06,160 --> 00:40:08,980 And it knows to go to the first address and not just print 855 00:40:08,980 --> 00:40:12,280 the second-- the first char, but print every character thereafter 856 00:40:12,280 --> 00:40:13,880 until it sees what? 857 00:40:13,880 --> 00:40:15,130 AUDIENCE: The null terminator. 858 00:40:15,130 --> 00:40:17,088 DAVID J. 
MALAN: The null terminating character. 859 00:40:17,088 --> 00:40:21,370 So printf and percent s are special and have been special since Week 1. 860 00:40:21,370 --> 00:40:24,580 They just know to do exactly what you've described. 861 00:40:24,580 --> 00:40:27,730 So pointer arithmetic, to be clear, is just taking addresses and like, 862 00:40:27,730 --> 00:40:30,040 doing arithmetic with them, adding 1, adding 2, adding 863 00:40:30,040 --> 00:40:33,890 3, or any other manipulation like that. 864 00:40:33,890 --> 00:40:34,390 All right. 865 00:40:34,390 --> 00:40:38,004 So [CHUCKLE] let's take another stab at a meme here. 866 00:40:38,004 --> 00:40:38,790 [CHUCKLE] 867 00:40:38,790 --> 00:40:39,700 OK, a few of us. 868 00:40:39,700 --> 00:40:41,397 All right. 869 00:40:41,397 --> 00:40:42,730 All right, it's trying too hard. 870 00:40:42,730 --> 00:40:43,230 All right. 871 00:40:43,230 --> 00:40:45,850 So what then do we have when it comes to strings? 872 00:40:45,850 --> 00:40:48,430 Well, let's now try to learn from these primitives 873 00:40:48,430 --> 00:40:51,490 and actually trip over some mistakes that we might otherwise make. 874 00:40:51,490 --> 00:40:53,840 I'm going to go ahead and open up a new file. 875 00:40:53,840 --> 00:40:56,590 I'm going to go ahead and call this one, compare. 876 00:40:56,590 --> 00:40:58,965 So we'll save this as compare dot c. 877 00:40:58,965 --> 00:41:01,840 And this will be reminiscent of something we started doing last week. 878 00:41:01,840 --> 00:41:04,420 And you've done this past week, particularly for implementing 879 00:41:04,420 --> 00:41:05,715 voting and comparing strings. 880 00:41:05,715 --> 00:41:07,840 I'm going to go ahead and make a quick program that 881 00:41:07,840 --> 00:41:09,225 just compares two integers.
882 00:41:09,225 --> 00:41:11,600 I'm going to put the training wheels back on temporarily, 883 00:41:11,600 --> 00:41:13,683 just so that we can get some numbers from the user 884 00:41:13,683 --> 00:41:17,900 pretty easily, including CS50 dot h and standard I/O dot h. 885 00:41:17,900 --> 00:41:21,080 I'm going to do int main void as my program. 886 00:41:21,080 --> 00:41:25,850 I'm going to get an integer called i and ask the human for that. 887 00:41:25,850 --> 00:41:29,440 I'm going to get another integer called j, ask the human for that. 888 00:41:29,440 --> 00:41:32,740 And then I'm going to go ahead and say if i equals equals j, 889 00:41:32,740 --> 00:41:38,570 then go ahead and print with printf that they're the same. 890 00:41:38,570 --> 00:41:43,510 Else, if i does not equal j, I'm going to go ahead quite simply and print out 891 00:41:43,510 --> 00:41:45,430 different backslash n. 892 00:41:45,430 --> 00:41:48,310 So if i equals equals j, it should say, same. 893 00:41:48,310 --> 00:41:50,620 Else, if it's different, it should say different. 894 00:41:50,620 --> 00:41:54,460 So let me go ahead and make compare dot slash compare. 895 00:41:54,460 --> 00:42:00,130 And I should see, hopefully, if I type in say, 1, 2, they're different. 896 00:42:00,130 --> 00:42:02,900 And if I instead do 1, 1, they're the same. 897 00:42:02,900 --> 00:42:03,400 All right. 898 00:42:03,400 --> 00:42:06,580 So it stands to reason that logically this is pretty straightforward when 899 00:42:06,580 --> 00:42:07,858 you want to compare things. 900 00:42:07,858 --> 00:42:10,400 So instead of using numbers, let me go ahead and change this. 901 00:42:10,400 --> 00:42:15,890 Let me go ahead and do, say, string s gets getString, just as before 902 00:42:15,890 --> 00:42:18,250 but using getString instead and ask the human for s. 903 00:42:18,250 --> 00:42:21,700 Then give me another string, t, just because it's alphabetically next. 
904 00:42:21,700 --> 00:42:24,010 And I'll ask the human for t. 905 00:42:24,010 --> 00:42:26,860 And then I'm going to go ahead and ask this question, if s equals 906 00:42:26,860 --> 00:42:30,340 equals t, print same, else, print different. 907 00:42:30,340 --> 00:42:33,400 So now let me go ahead and make compare again. 908 00:42:33,400 --> 00:42:36,160 I'm going to go ahead and type in dot slash compare. 909 00:42:36,160 --> 00:42:37,540 We'll type in Emma. 910 00:42:37,540 --> 00:42:39,160 We'll then type in Rodrigo. 911 00:42:39,160 --> 00:42:41,230 And of course, it's going to say different. 912 00:42:41,230 --> 00:42:44,230 But if I instead run it again and type in Emma and all right, 913 00:42:44,230 --> 00:42:45,680 I'll type Emma again-- 914 00:42:45,680 --> 00:42:47,080 hmm, different. 915 00:42:47,080 --> 00:42:49,900 Maybe it's a capitalization thing? 916 00:42:49,900 --> 00:42:51,430 No. 917 00:42:51,430 --> 00:42:54,230 But why as of today, are they indeed different? 918 00:42:54,230 --> 00:42:56,980 Last week we kind of waved our hands and said, ah, they're arrays, 919 00:42:56,980 --> 00:42:57,770 you have to do some stuff. 920 00:42:57,770 --> 00:42:58,895 But why are they different? 921 00:42:58,895 --> 00:43:00,895 AUDIENCE: They're stored in different locations. 922 00:43:00,895 --> 00:43:03,520 DAVID J. MALAN: Exactly, they're stored in different locations. 923 00:43:03,520 --> 00:43:06,400 So when you get a string with getString and call it s, and then you 924 00:43:06,400 --> 00:43:09,100 get another string with getString and call it t, you're 925 00:43:09,100 --> 00:43:10,780 getting two different chunks of memory. 926 00:43:10,780 --> 00:43:14,650 And yes, maybe the human has typed the same thing into the keyboard, 927 00:43:14,650 --> 00:43:17,220 but that doesn't necessarily mean that they're going 928 00:43:17,220 --> 00:43:19,000 to be stored in the exact same place.
929 00:43:19,000 --> 00:43:22,420 In fact, what we really have here is a picture not unlike this. 930 00:43:22,420 --> 00:43:26,410 If I have a variable called s-- and I'm just going to draw it as a box there 931 00:43:26,410 --> 00:43:28,180 --and if I have a variable called t-- 932 00:43:28,180 --> 00:43:31,510 I'll draw it as another box here --and I typed in Emma-- 933 00:43:31,510 --> 00:43:35,260 E-M-M-A --that's going to give me somewhere in memory, 934 00:43:35,260 --> 00:43:40,390 E-M-M-A backslash 0. 935 00:43:40,390 --> 00:43:43,720 And I'll draw it as an actual array, albeit a little messily. 936 00:43:43,720 --> 00:43:46,900 And then here, if I type EMMA again in all caps, 937 00:43:46,900 --> 00:43:48,700 it's going to end up-- thanks to getString, 938 00:43:48,700 --> 00:43:50,590 at a different location in memory. 939 00:43:50,590 --> 00:43:54,470 By nature of how getString works, it's going to store anything you type in it. 940 00:43:54,470 --> 00:43:56,403 And what's going to get stored in s and t? 941 00:43:56,403 --> 00:43:58,570 Well, for the sake of discussion, let's suppose that 942 00:43:58,570 --> 00:44:01,540 this chunk of memory with the first input-- 943 00:44:01,540 --> 00:44:06,250 sorry --happens to be at 0x123. 944 00:44:06,250 --> 00:44:10,960 And the second chunk of memory happens to be at 0x456, just by chance. 945 00:44:10,960 --> 00:44:13,920 Well, what am I technically storing in s? 946 00:44:13,920 --> 00:44:16,110 0x123. 947 00:44:16,110 --> 00:44:17,250 And what am I storing in t? 948 00:44:17,250 --> 00:44:19,530 0x456. 949 00:44:19,530 --> 00:44:22,080 So when you say, is s equal equal t. 950 00:44:22,080 --> 00:44:23,230 Is it? 951 00:44:23,230 --> 00:44:23,730 Well, no. 952 00:44:23,730 --> 00:44:27,243 You're literally comparing 123 versus 456. 953 00:44:27,243 --> 00:44:29,160 The computer is not going to presumptuously go 954 00:44:29,160 --> 00:44:32,670 to that address for you unless you somehow tell it to.
955 00:44:32,670 --> 00:44:35,130 Put another way, if I instead draw these boxes, 956 00:44:35,130 --> 00:44:39,990 not as actual numbers, what we really have-- sorry --what we really have is 957 00:44:39,990 --> 00:44:42,000 what we'll draw as an arrow more generally, 958 00:44:42,000 --> 00:44:44,220 just a pointer to that value. 959 00:44:44,220 --> 00:44:46,400 Who really cares where the address is? 960 00:44:46,400 --> 00:44:48,900 So this is why last week we kind of waved our hand and said, 961 00:44:48,900 --> 00:44:51,840 eh, you can't just compare two strings because you probably 962 00:44:51,840 --> 00:44:53,310 have to compare every character. 963 00:44:53,310 --> 00:44:54,750 And that was true. 964 00:44:54,750 --> 00:44:58,140 But what you're technically comparing is indeed 965 00:44:58,140 --> 00:45:02,220 the addresses stored in those two variables. 966 00:45:02,220 --> 00:45:06,828 Any questions then on this here? 967 00:45:06,828 --> 00:45:09,220 Yeah. 968 00:45:09,220 --> 00:45:10,290 Sure, yes. 969 00:45:10,290 --> 00:45:14,190 AUDIENCE: So you said earlier that the, I 970 00:45:14,190 --> 00:45:17,190 guess, the pointer, and the actual thing it's 971 00:45:17,190 --> 00:45:22,080 pointing at are like kind of somewhere in the memory not in a specific-- 972 00:45:22,080 --> 00:45:23,330 they're just somewhere, right? 973 00:45:23,330 --> 00:45:24,122 DAVID J. MALAN: OK. 974 00:45:24,122 --> 00:45:27,100 AUDIENCE: So do you need something that points to the point-- 975 00:45:27,100 --> 00:45:29,250 how does the computer know where the pointer is? 976 00:45:29,250 --> 00:45:32,250 DAVID J. MALAN: Oh, how does the computer know where these pointers are? 977 00:45:32,250 --> 00:45:33,690 So that's a really good question. 978 00:45:33,690 --> 00:45:35,610 And let's answer it right here.
979 00:45:35,610 --> 00:45:39,572 All this time when you've been calling getString to get a string, 980 00:45:39,572 --> 00:45:42,780 you've probably been assigning it to a variable like I have here on line six, 981 00:45:42,780 --> 00:45:44,130 with string s. 982 00:45:44,130 --> 00:45:49,440 But we know as of today that if we get rid of the CS50 library, technically, 983 00:45:49,440 --> 00:45:52,530 string is just synonymous with char star. 984 00:45:52,530 --> 00:45:58,110 And so both here and with t, you technically have char star, right? 985 00:45:58,110 --> 00:46:01,200 It's just a find and replace if we get rid of that training wheel. 986 00:46:01,200 --> 00:46:05,310 Char star just means s is storing the address of a character. 987 00:46:05,310 --> 00:46:07,980 And char star t means t is storing the address of a character. 988 00:46:07,980 --> 00:46:14,610 Ergo, all this time since Week 1 of CS50, what type of value has getString 989 00:46:14,610 --> 00:46:19,030 been returning, even though we never described it as such? 990 00:46:19,030 --> 00:46:22,040 What must getString be returning? 991 00:46:22,040 --> 00:46:22,752 Yeah. 992 00:46:22,752 --> 00:46:24,695 AUDIENCE: The index of the first letter. 993 00:46:24,695 --> 00:46:27,153 DAVID J. MALAN: Not even the index per se, but rather the-- 994 00:46:27,153 --> 00:46:28,350 AUDIENCE: It houses the memory of that. 995 00:46:28,350 --> 00:46:30,860 DAVID J. MALAN: The address of the first character. 996 00:46:30,860 --> 00:46:33,670 So anytime you called getString, getString, code we wrote, 997 00:46:33,670 --> 00:46:35,920 is finding in your computer's memory some free space, 998 00:46:35,920 --> 00:46:39,460 enough bytes to fit whatever the word was that got typed in. 999 00:46:39,460 --> 00:46:41,230 getString then, if we looked at its code, 1000 00:46:41,230 --> 00:46:46,840 is designed to return the address of the first byte of that chunk of memory.
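The difference between comparing two addresses and comparing the characters stored at them can be sketched as follows. The helper names `same_address` and `same_text` are hypothetical. `strcmp` is the standard function from string.h that walks both strings character by character; it is not named in this part of the lecture, so treat its use here as an illustration of "telling the computer to go to that address."

```c
#include <stdbool.h>
#include <string.h>

// True only if s and t hold the very same address (what s == t checks).
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// True if the characters found at the two addresses match,
// up to and including the terminating '\0'.
bool same_text(const char *s, const char *t)
{
    return strcmp(s, t) == 0;
}
```

For two separate chunks of memory that both contain "EMMA", `same_address` would report false while `same_text` would report true.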
1001 00:46:46,840 --> 00:46:49,960 So getString, this whole time, has been returning, if you will, 1002 00:46:49,960 --> 00:46:51,700 what's called a pointer. 1003 00:46:51,700 --> 00:46:54,970 But again, nuances that we didn't want to get into in the very first week 1004 00:46:54,970 --> 00:46:58,040 certainly, of C programming. 1005 00:46:58,040 --> 00:46:58,540 All right. 1006 00:46:58,540 --> 00:47:00,520 Well, let's go ahead and make this a little more concrete. 1007 00:47:00,520 --> 00:47:02,770 If I pull up this code, I don't have to just check 1008 00:47:02,770 --> 00:47:05,687 if they're same or different, let me just go ahead and print them out. 1009 00:47:05,687 --> 00:47:09,790 If I do percent p backslash n, I can literally print out s. 1010 00:47:09,790 --> 00:47:14,590 And if I go ahead and print out the same thing for t using percent p, 1011 00:47:14,590 --> 00:47:16,560 I can print out the value of t. 1012 00:47:16,560 --> 00:47:18,400 So let me go ahead and make compare. 1013 00:47:18,400 --> 00:47:19,233 Seems to compile OK. 1014 00:47:19,233 --> 00:47:21,358 And I don't know what the addresses are in advance. 1015 00:47:21,358 --> 00:47:23,980 But let me go ahead and type in, for instance, Emma and Emma. 1016 00:47:23,980 --> 00:47:26,890 So even though those strings look the same notice, 1017 00:47:26,890 --> 00:47:31,690 it's a little subtle this time, the first Emma's at 0xED76A0. 1018 00:47:31,690 --> 00:47:39,668 The second Emma's at 0xED76E0, which is a few numbers away from the first Emma. 1019 00:47:39,668 --> 00:47:41,710 So that just corroborates the instincts last week 1020 00:47:41,710 --> 00:47:44,330 that we can't just compare them like that. 1021 00:47:44,330 --> 00:47:46,300 So what are the implications then? 1022 00:47:46,300 --> 00:47:48,100 Let's do one other example here. 1023 00:47:48,100 --> 00:47:51,110 Let me go ahead and save this as copy dot C. 1024 00:47:51,110 --> 00:47:52,660 And let's try a very reasonable goal. 
1025 00:47:52,660 --> 00:47:56,230 If I want to go ahead and get the user's input and actually copy a string 1026 00:47:56,230 --> 00:47:58,883 and capitalize the string from the user, let's see this. 1027 00:47:58,883 --> 00:48:02,050 So let me go ahead and give myself the temporary training wheels again, just 1028 00:48:02,050 --> 00:48:03,880 so I can get a string from the human. 1029 00:48:03,880 --> 00:48:08,710 Let me go ahead and include standard I/O dot h and then an int main void. 1030 00:48:08,710 --> 00:48:10,690 Let me do a simple example, the goal of which 1031 00:48:10,690 --> 00:48:16,430 now, is to get a string from the user and capitalize a copy thereof. 1032 00:48:16,430 --> 00:48:20,620 So I'm going to go ahead and do string s gets getString and call it s, 1033 00:48:20,620 --> 00:48:21,640 as before. 1034 00:48:21,640 --> 00:48:24,820 I'm going to go ahead and then do string t equals 1035 00:48:24,820 --> 00:48:26,767 s to make a copy of the variable. 1036 00:48:26,767 --> 00:48:28,600 And then I'm going to go ahead and say what? 1037 00:48:28,600 --> 00:48:30,640 Let me go ahead and capitalize the copy. 1038 00:48:30,640 --> 00:48:34,840 And to capitalize the copy, I can just change the first character 1039 00:48:34,840 --> 00:48:38,290 in t, so t bracket 0, to what? 1040 00:48:38,290 --> 00:48:41,316 I think we had toupper a while back. 1041 00:48:41,316 --> 00:48:42,940 Does this seem familiar? 1042 00:48:42,940 --> 00:48:44,473 You can call the toupper function. 1043 00:48:44,473 --> 00:48:46,390 And the toupper function, if you don't recall, 1044 00:48:46,390 --> 00:48:48,940 you technically have to use ctype dot h. 1045 00:48:48,940 --> 00:48:51,040 This might be reminiscent of the second C problem 1046 00:48:51,040 --> 00:48:54,650 set, where you might have used this in Caesar, or substitution, or the like. 1047 00:48:54,650 --> 00:48:55,150 All right.
1048 00:48:55,150 --> 00:48:57,525 And now, let me go ahead and print out these two strings. 1049 00:48:57,525 --> 00:49:00,640 Let me go ahead and print out s. 1050 00:49:00,640 --> 00:49:04,390 And let me go ahead and print out t. 1051 00:49:04,390 --> 00:49:08,470 So again, all I've done in this program is get a string from the user, 1052 00:49:08,470 --> 00:49:13,060 copy that string, capitalize the copy called t. 1053 00:49:13,060 --> 00:49:15,260 And let's just print out the end results. 1054 00:49:15,260 --> 00:49:17,060 So let me go ahead and save the file. 1055 00:49:17,060 --> 00:49:19,360 Let me go ahead and make copy. 1056 00:49:19,360 --> 00:49:20,410 Seems to compile OK. 1057 00:49:20,410 --> 00:49:21,868 Let me go ahead and run copy. 1058 00:49:21,868 --> 00:49:24,160 And let me go ahead and type in emma, in all lowercase, 1059 00:49:24,160 --> 00:49:28,746 deliberately, because I want to see that t is capitalized but not s. 1060 00:49:28,746 --> 00:49:30,480 Hmm. 1061 00:49:30,480 --> 00:49:34,050 But somehow they're both capitalized. 1062 00:49:34,050 --> 00:49:39,060 Notice, that emma in all lowercase ended up being both capitalized in s 1063 00:49:39,060 --> 00:49:42,660 and capitalized in t per the two lines of output. 1064 00:49:42,660 --> 00:49:43,270 That's a bug? 1065 00:49:43,270 --> 00:49:43,770 Right? 1066 00:49:43,770 --> 00:49:46,020 I only capitalized t, how did I accidentally 1067 00:49:46,020 --> 00:49:48,300 also capitalize s do you think? 1068 00:49:48,300 --> 00:49:50,873 1069 00:49:50,873 --> 00:49:51,415 Any thoughts? 1070 00:49:51,415 --> 00:49:54,940 1071 00:49:54,940 --> 00:49:57,743 Doesn't matter if I avert the lights, I still can't see any hands. 1072 00:49:57,743 --> 00:49:58,910 OK, how about here in front? 1073 00:49:58,910 --> 00:49:59,590 Yeah. 1074 00:49:59,590 --> 00:50:05,280 AUDIENCE: So when you say t equal s you have to [INAUDIBLE] 1075 00:50:05,280 --> 00:50:06,280 DAVID J. MALAN: Exactly. 
1076 00:50:06,280 --> 00:50:11,350 When I say t equals s on this line, I am getting a second variable called t. 1077 00:50:11,350 --> 00:50:12,790 And I am copying s. 1078 00:50:12,790 --> 00:50:15,070 But I'm copying s literally. 1079 00:50:15,070 --> 00:50:17,380 s as of today, is an address. 1080 00:50:17,380 --> 00:50:22,840 After all, string is the same thing as char star for both s and t. 1081 00:50:22,840 --> 00:50:25,760 And so technically, all I'm doing is copying an address. 1082 00:50:25,760 --> 00:50:28,810 So if I go back to my picture from before, this time, 1083 00:50:28,810 --> 00:50:33,670 if I've gone ahead and typed in an array of emma, with all lowercase-- 1084 00:50:33,670 --> 00:50:39,250 e-m-m-a --and then a backslash 0, somewhere in memory using getString, 1085 00:50:39,250 --> 00:50:42,685 and I've gone ahead initially and stored that in a variable called s-- 1086 00:50:42,685 --> 00:50:44,560 and I don't care about the addresses anymore. 1087 00:50:44,560 --> 00:50:47,170 I'm just going to use arrows now to depict it graphically. 1088 00:50:47,170 --> 00:50:53,590 When I created a second variable called t and I set t equal to s, 1089 00:50:53,590 --> 00:50:57,220 that's like literally copying the arrow that's in s 1090 00:50:57,220 --> 00:51:01,750 and storing it in t, which means t is also pointing at the same thing. 1091 00:51:01,750 --> 00:51:04,990 Because again, if I didn't do this hand wavy arrow notation 1092 00:51:04,990 --> 00:51:07,640 and literally wrote out 0x123, 1093 00:51:07,640 --> 00:51:11,830 I would have just written out 0x123 in both s and t. 1094 00:51:11,830 --> 00:51:15,220 So when, in my code, I go ahead and say, you 1095 00:51:15,220 --> 00:51:19,780 know what, go to the first character in t and then go ahead and uppercase it. 1096 00:51:19,780 --> 00:51:22,390 Guess what the first character in t is? 1097 00:51:22,390 --> 00:51:23,380 Well, it's this e.
1098 00:51:23,380 --> 00:51:28,090 But guess what the first character in s is, literally that same e. 1099 00:51:28,090 --> 00:51:30,820 So this does not suffice to copy a string 1100 00:51:30,820 --> 00:51:35,480 by just saying t equals s, as it has up until now with every other variable. 1101 00:51:35,480 --> 00:51:38,230 Any time you've needed a temporary variable or a copy of something 1102 00:51:38,230 --> 00:51:39,040 this worked. 1103 00:51:39,040 --> 00:51:42,070 Intuitively, what do we have to do probably instead 1104 00:51:42,070 --> 00:51:45,890 to truly copy Emma into two different places in memory? 1105 00:51:45,890 --> 00:51:46,390 Yeah. 1106 00:51:46,390 --> 00:51:50,730 AUDIENCE: Probably create a char or create a variable exactly the same size 1107 00:51:50,730 --> 00:51:52,535 and copy each character individually. 1108 00:51:52,535 --> 00:51:53,410 DAVID J. MALAN: Nice. 1109 00:51:53,410 --> 00:51:55,452 So maybe we should give ourselves a variable that 1110 00:51:55,452 --> 00:51:58,840 has more memory, the same amount of memory being stored 1111 00:51:58,840 --> 00:52:02,530 for the original Emma, and then copy the characters from s 1112 00:52:02,530 --> 00:52:05,470 into the space we've allocated for t. 1113 00:52:05,470 --> 00:52:06,890 And so we can actually do this. 1114 00:52:06,890 --> 00:52:10,210 Let me go ahead and get rid of all but that first line, where 1115 00:52:10,210 --> 00:52:11,930 I've gotten s as before. 1116 00:52:11,930 --> 00:52:15,107 And I'm going to go ahead and do this, I'm going to say that t is a string-- 1117 00:52:15,107 --> 00:52:17,440 but you know, we don't need that training wheel anymore. 1118 00:52:17,440 --> 00:52:20,020 String, char star, even though it looks uglier. 1119 00:52:20,020 --> 00:52:22,450 Let me go ahead and allocate more memory for myself. 1120 00:52:22,450 --> 00:52:23,292 How do I do that?
1121 00:52:23,292 --> 00:52:26,500 Well, it turns out-- we've not used this before --there's a C function called 1122 00:52:26,500 --> 00:52:28,660 malloc, for memory allocation. 1123 00:52:28,660 --> 00:52:32,320 And all it asks as input is how many bytes you want. 1124 00:52:32,320 --> 00:52:36,130 So how many bytes do I want for Emma to store her name? 1125 00:52:36,130 --> 00:52:39,422 1126 00:52:39,422 --> 00:52:40,297 AUDIENCE: [INAUDIBLE] 1127 00:52:40,297 --> 00:52:41,905 DAVID J. MALAN: I heard 4, 5. 1128 00:52:41,905 --> 00:52:42,460 Why 5? 1129 00:52:42,460 --> 00:52:43,850 AUDIENCE: [INAUDIBLE] 1130 00:52:43,850 --> 00:52:46,267 DAVID J. MALAN: So we need the null terminating character, 1131 00:52:46,267 --> 00:52:47,660 e-m-m-a and then backslash 0. 1132 00:52:47,660 --> 00:52:48,532 So that's 5. 1133 00:52:48,532 --> 00:52:50,240 So I could literally hard code this here. 1134 00:52:50,240 --> 00:52:52,580 Of course, this feels a little fragile because I'm 1135 00:52:52,580 --> 00:52:54,572 asking for any string via getString. 1136 00:52:54,572 --> 00:52:56,030 I don't know it's going to be Emma. 1137 00:52:56,030 --> 00:52:58,238 So you know what, let me go ahead and ask a question. 1138 00:52:58,238 --> 00:53:01,940 Whatever the length is of the human's input in s, 1139 00:53:01,940 --> 00:53:04,970 go ahead and add 1 to it for the null character 1140 00:53:04,970 --> 00:53:06,580 and then allocate that many bytes. 1141 00:53:06,580 --> 00:53:08,570 So now my program's more dynamic. 1142 00:53:08,570 --> 00:53:11,520 And once I have this, well, how can I go ahead and copy this? 1143 00:53:11,520 --> 00:53:13,170 Well, let me just do an old school loop.
1144 00:53:13,170 --> 00:53:18,980 So for int i gets 0, i is less than the string length of s, i plus plus-- 1145 00:53:18,980 --> 00:53:22,610 so this is just a standard for loop iterating over a string 1146 00:53:22,610 --> 00:53:28,880 --and I think I can just do t bracket i equals s bracket i in order 1147 00:53:28,880 --> 00:53:31,520 to copy the two strings. 1148 00:53:31,520 --> 00:53:36,200 There's a subtle bug and a subtle inefficiency though. 1149 00:53:36,200 --> 00:53:42,050 Anyone want to critique how I've gone about copying s into t? 1150 00:53:42,050 --> 00:53:42,656 Yeah. 1151 00:53:42,656 --> 00:53:44,790 AUDIENCE: [INAUDIBLE] getString [INAUDIBLE].. 1152 00:53:44,790 --> 00:53:45,090 DAVID J. MALAN: Yeah. 1153 00:53:45,090 --> 00:53:45,840 This was inefficient. 1154 00:53:45,840 --> 00:53:47,730 We said a couple of weeks ago this is bad design 1155 00:53:47,730 --> 00:53:49,860 to just keep asking the question, what's the length of s? 1156 00:53:49,860 --> 00:53:51,000 What's the length of s? 1157 00:53:51,000 --> 00:53:54,070 So remember that we had a little optimization a couple of weeks ago. 1158 00:53:54,070 --> 00:53:56,790 Let's just declare n to equal the string length of s 1159 00:53:56,790 --> 00:53:59,460 and then do a condition of i is less than n. 1160 00:53:59,460 --> 00:54:01,000 So we've improved the design there. 1161 00:54:01,000 --> 00:54:02,208 It's a little more efficient. 1162 00:54:02,208 --> 00:54:03,330 We're wasting less time. 1163 00:54:03,330 --> 00:54:06,745 There's still a subtle bug here. 1164 00:54:06,745 --> 00:54:07,620 How many byte-- yeah. 1165 00:54:07,620 --> 00:54:09,960 AUDIENCE: Aren't you not copying the null terminator? 1166 00:54:09,960 --> 00:54:12,127 DAVID J. MALAN: I'm not copying the null terminator. 1167 00:54:12,127 --> 00:54:15,840 So every other time we've iterated over a string, this has been correct.
1168 00:54:15,840 --> 00:54:20,850 Iterate up to the length but not through the length of that string. 1169 00:54:20,850 --> 00:54:23,880 But I technically do want to go one more step 1170 00:54:23,880 --> 00:54:26,790 this time, or equivalently, one more step. 1171 00:54:26,790 --> 00:54:30,810 Because I also want to copy not just e-m-m-a, which is str length 4-- 1172 00:54:30,810 --> 00:54:35,620 e-m-m-a is 4 --I also want to do it a fifth time for the null character. 1173 00:54:35,620 --> 00:54:37,860 So in this case, I'm deliberately going one step 1174 00:54:37,860 --> 00:54:42,090 past where I usually want to go to make sure I copy 5 bytes for Emma, 1175 00:54:42,090 --> 00:54:42,910 not just 4. 1176 00:54:42,910 --> 00:54:43,410 All right. 1177 00:54:43,410 --> 00:54:45,035 Let's go ahead now and capitalize Emma. 1178 00:54:45,035 --> 00:54:51,060 So t bracket 0 gets toupper of Emma's first character in the copy. 1179 00:54:51,060 --> 00:54:54,900 And now let's go ahead and print out both strings s 1180 00:54:54,900 --> 00:54:59,550 and t, just as before, with percent s of t. 1181 00:54:59,550 --> 00:55:02,278 And let me make one change, I use strlen now. 1182 00:55:02,278 --> 00:55:04,570 So I know I'm going to get an error if I don't do this. 1183 00:55:04,570 --> 00:55:07,760 I need to use string dot h-- recall --anytime you use string length. 1184 00:55:07,760 --> 00:55:09,600 So I'm going to go proactively add that. 1185 00:55:09,600 --> 00:55:10,710 So what's different? 1186 00:55:10,710 --> 00:55:12,340 This line is the same as before. 1187 00:55:12,340 --> 00:55:14,110 I'm getting a string from the user. 1188 00:55:14,110 --> 00:55:15,790 This line is the same as before. 1189 00:55:15,790 --> 00:55:17,615 I'm capitalizing the first letter. 1190 00:55:17,615 --> 00:55:18,990 And these two lines are the same. 1191 00:55:18,990 --> 00:55:20,610 I'm just printing out s and t. 
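The copy described above, and how it differs from simply assigning one pointer to another, can be sketched as follows. This is a sketch under stated assumptions, not the lecture's exact file: the function names `copy_string` and `capitalize` are hypothetical, and the `NULL` check and the eventual call to `free` are additions that this part of the lecture has not yet discussed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Allocate a new chunk of memory and copy s into it, character by character.
// Returns NULL if malloc fails; the caller is responsible for calling free().
char *copy_string(const char *s)
{
    size_t n = strlen(s);
    char *t = malloc(n + 1);          // + 1 for the terminating '\0'
    if (t == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i <= n; i++)   // <= n so the '\0' is copied too
    {
        t[i] = s[i];
    }
    return t;
}

// Capitalize the first character of t in place (does nothing to an empty string).
void capitalize(char *t)
{
    if (t[0] != '\0')
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
}
```

If `t` is set with `char *t = s;`, calling `capitalize(t)` changes what `s` points at as well, since both hold the same address. If `t` comes from `copy_string(s)`, only the copy is changed.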
1192 00:55:20,610 --> 00:55:24,960 So the new idea here is, with my malloc, I am allocating as many bytes 1193 00:55:24,960 --> 00:55:28,020 as I need to store a copy of Emma, and then with this for loop 1194 00:55:28,020 --> 00:55:31,440 I am actually doing the actual copy. 1195 00:55:31,440 --> 00:55:34,350 Let me go ahead and do make copy again. 1196 00:55:34,350 --> 00:55:35,190 Seems to run OK. 1197 00:55:35,190 --> 00:55:37,080 Run dot slash copy. 1198 00:55:37,080 --> 00:55:39,240 Type e-m-m-a in all lowercase. 1199 00:55:39,240 --> 00:55:44,570 And voila, now I've capitalized t but not s. 1200 00:55:44,570 --> 00:55:45,070 Yeah? 1201 00:55:45,070 --> 00:55:49,920 AUDIENCE: When you use malloc, it's just allocating number of bytes, 1202 00:55:49,920 --> 00:55:51,377 it doesn't matter where? 1203 00:55:51,377 --> 00:55:53,960 DAVID J. MALAN: It is just allocating that many bytes for you. 1204 00:55:53,960 --> 00:55:55,200 It does not matter where. 1205 00:55:55,200 --> 00:55:58,950 You indeed should not care where it is because you're just 1206 00:55:58,950 --> 00:56:01,180 being handed the address and using C code, 1207 00:56:01,180 --> 00:56:04,000 you can just go there as you want. 1208 00:56:04,000 --> 00:56:04,500 All right. 1209 00:56:04,500 --> 00:56:05,710 Let's clean this up too. 1210 00:56:05,710 --> 00:56:07,930 Surely, people have copied strings for years. 1211 00:56:07,930 --> 00:56:10,470 And in fact, we don't need to do this for loop ourselves. 1212 00:56:10,470 --> 00:56:14,160 It turns out we can simplify this code a little bit by enhancing this 1213 00:56:14,160 --> 00:56:14,920 as follows. 1214 00:56:14,920 --> 00:56:17,700 It turns out, if you look in the manual page for strings, 1215 00:56:17,700 --> 00:56:20,070 you can actually use something called strcopy-- 1216 00:56:20,070 --> 00:56:22,350 no-- without any vowels. 1217 00:56:22,350 --> 00:56:25,980 And you can copy into t, the contents of s.
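The strcpy version of the copy can be sketched as below. `strcpy` and `malloc` are standard C functions; the wrapper name `copy_with_strcpy` is hypothetical, and the `NULL` check is an addition not covered at this point in the lecture.

```c
#include <stdlib.h>
#include <string.h>

// Same idea as a hand-written loop, but strcpy copies every character
// of s, including the terminating '\0', into the memory at t.
// Returns NULL if malloc fails; the caller should free() the result.
char *copy_with_strcpy(const char *s)
{
    char *t = malloc(strlen(s) + 1);   // + 1 for the terminating '\0'
    if (t == NULL)
    {
        return NULL;
    }
    strcpy(t, s);                      // destination first, then source
    return t;
}
```

Note the argument order: the destination comes first and the source second, matching "copy into t, the contents of s."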
1218 00:56:25,980 --> 00:56:29,220 strcpy is a function written a long time ago by some other human. 1219 00:56:29,220 --> 00:56:32,850 And they went ahead and implemented, probably, that loop for us. 1220 00:56:32,850 --> 00:56:35,670 And it tightens up our code here a little bit more. 1221 00:56:35,670 --> 00:56:36,570 AUDIENCE: Professor? 1222 00:56:36,570 --> 00:56:37,445 DAVID J. MALAN: Yeah. 1223 00:56:37,445 --> 00:56:41,338 AUDIENCE: What if I forgot to copy in the null character at the end? 1224 00:56:41,338 --> 00:56:42,880 DAVID J. MALAN: Really good question. 1225 00:56:42,880 --> 00:56:47,160 What if you forgot to copy in the null character at the end? 1226 00:56:47,160 --> 00:56:49,140 It is unclear what would happen. 1227 00:56:49,140 --> 00:56:52,770 If there just happened to be some bits in that location in memory 1228 00:56:52,770 --> 00:56:55,260 from earlier-- from some other part of your program 1229 00:56:55,260 --> 00:56:57,450 --and you try printing out s and printing out t, 1230 00:56:57,450 --> 00:56:59,880 you might print out many more characters than you actually 1231 00:56:59,880 --> 00:57:03,742 intended-- if there's no backslash 0 actually there. 1232 00:57:03,742 --> 00:57:04,950 We'll see this more and more. 1233 00:57:04,950 --> 00:57:07,380 Anytime you don't initialize the value of a variable, 1234 00:57:07,380 --> 00:57:09,600 it's what's called a garbage value, which means 1235 00:57:09,600 --> 00:57:11,550 who knows what 0s and 1s are there. 1236 00:57:11,550 --> 00:57:13,260 You might get lucky and it's all 0s. 1237 00:57:13,260 --> 00:57:17,950 But most likely it's going to print some garbage value instead. 1238 00:57:17,950 --> 00:57:18,450 All right. 1239 00:57:18,450 --> 00:57:20,843 Any questions on this? 1240 00:57:20,843 --> 00:57:21,809 Yeah. 1241 00:57:21,809 --> 00:57:25,060 AUDIENCE: Is the string length function only in the CS50 library? 1242 00:57:25,060 --> 00:57:26,060 DAVID J. 
MALAN: Is the-- 1243 00:57:26,060 --> 00:57:26,685 which function? 1244 00:57:26,685 --> 00:57:27,840 AUDIENCE: String length. 1245 00:57:27,840 --> 00:57:30,132 DAVID J. MALAN: Oh, strlen, no, that's in string dot h. 1246 00:57:30,132 --> 00:57:31,966 That is a standard C thing. 1247 00:57:31,966 --> 00:57:32,918 AUDIENCE: OK. 1248 00:57:32,918 --> 00:57:39,097 If string length is a standard function but strings are not-- 1249 00:57:39,097 --> 00:57:41,180 DAVID J. MALAN: So what's the dichotomy here then? 1250 00:57:41,180 --> 00:57:43,970 If strings don't exist-- 1251 00:57:43,970 --> 00:57:45,320 as I've noted multiple times. 1252 00:57:45,320 --> 00:57:49,010 And yet, there are functions like strcpy and strlen --what's going on? 1253 00:57:49,010 --> 00:57:50,990 C calls them char stars. 1254 00:57:50,990 --> 00:57:52,970 It is C that does not call them strings. 1255 00:57:52,970 --> 00:57:56,750 We, CS50, and the world in general, call addresses 1256 00:57:56,750 --> 00:57:59,690 of sequences of characters, strings. 1257 00:57:59,690 --> 00:58:02,450 So the only training wheel here, really, is the semantics. 1258 00:58:02,450 --> 00:58:06,620 We gave you a data type called string so that in the first week of C and CS50, 1259 00:58:06,620 --> 00:58:09,830 you don't have to see or type char star, which would arguably 1260 00:58:09,830 --> 00:58:11,720 be a lot more cryptic so early on. 1261 00:58:11,720 --> 00:58:14,390 It's arguably a bit cryptic today too. 1262 00:58:14,390 --> 00:58:15,950 Other questions? 1263 00:58:15,950 --> 00:58:16,991 All right, yeah. 1264 00:58:16,991 --> 00:58:21,027 AUDIENCE: So is char star ID type [INAUDIBLE] 1265 00:58:21,027 --> 00:58:21,860 DAVID J. MALAN: Is-- 1266 00:58:21,860 --> 00:58:23,090 say that once more. 1267 00:58:23,090 --> 00:58:26,498 AUDIENCE: Char star ID type [INAUDIBLE]. 1268 00:58:26,498 --> 00:58:29,540 DAVID J. MALAN: Not all of them, but any of them that take a string, yes.
1269 00:58:29,540 --> 00:58:33,890 In fact, any time you have seen us or a TF in CS50 say string, 1270 00:58:33,890 --> 00:58:37,730 you can literally, starting today, change that expression to char star 1271 00:58:37,730 --> 00:58:39,966 and it will be one and the same. 1272 00:58:39,966 --> 00:58:40,500 Phew. 1273 00:58:40,500 --> 00:58:41,000 OK. 1274 00:58:41,000 --> 00:58:41,625 That was a lot. 1275 00:58:41,625 --> 00:58:44,530 Let's take our five minute break here with cookies outside. 1276 00:58:44,530 --> 00:58:46,550 All right. 1277 00:58:46,550 --> 00:58:48,560 So we are back. 1278 00:58:48,560 --> 00:58:50,870 That was a lot. 1279 00:58:50,870 --> 00:58:54,650 Let me draw our attention to what the newest feature was just 1280 00:58:54,650 --> 00:58:58,340 a moment ago, this notion of malloc, memory allocation. 1281 00:58:58,340 --> 00:59:02,240 So recall that get_string, I claim as of today, has all this time 1282 00:59:02,240 --> 00:59:05,030 just been returning to you the address of the string 1283 00:59:05,030 --> 00:59:06,980 that was gotten from the human. 1284 00:59:06,980 --> 00:59:09,950 malloc, similarly, has a return value. 1285 00:59:09,950 --> 00:59:13,490 And when you ask malloc for this many bytes-- maybe it's five, for emma 1286 00:59:13,490 --> 00:59:16,580 plus the null terminator --malloc's purpose in life 1287 00:59:16,580 --> 00:59:21,000 is to return to you the address of the first byte of that memory as well. 1288 00:59:21,000 --> 00:59:24,860 So memory alloc means, go get me a chunk of memory somewhere, 1289 00:59:24,860 --> 00:59:27,410 and hand me back a pointer to it. 1290 00:59:27,410 --> 00:59:29,990 And the onus is on me to remember that address, 1291 00:59:29,990 --> 00:59:32,750 as I'm doing here, by storing it in t. 1292 00:59:32,750 --> 00:59:35,480 But it turns out, now that we're taking the training wheels off, 1293 00:59:35,480 --> 00:59:38,210 unfortunately, we have to kind of do a bit more work ourselves.
1294 00:59:38,210 --> 00:59:41,390 And there's actually a latent bug in this program. 1295 00:59:41,390 --> 00:59:45,470 It turns out that I am malloc-ing memory with this 1296 00:59:45,470 --> 00:59:47,150 but I'm never actually freeing it. 1297 00:59:47,150 --> 00:59:50,900 The opposite of malloc is a function called free, whose purpose in life 1298 00:59:50,900 --> 00:59:54,740 is to hand back the memory that you asked for so that you 1299 00:59:54,740 --> 00:59:57,828 have plenty of memory available for other parts of your program 1300 00:59:57,828 --> 00:59:58,370 and so forth. 1301 00:59:58,370 --> 01:00:01,435 And long story short, if you've ever-- on your Mac or PC 1302 01:00:01,435 --> 01:00:03,560 --been running a program that maybe is a little bit 1303 01:00:03,560 --> 01:00:06,500 buggy, you might notice your computer is getting slower, and slower, 1304 01:00:06,500 --> 01:00:08,540 or maybe it even runs out of memory explicitly, 1305 01:00:08,540 --> 01:00:10,670 per some error message. That might be, quite 1306 01:00:10,670 --> 01:00:14,960 simply, that the programmer of that program kept using malloc, 1307 01:00:14,960 --> 01:00:18,600 and malloc, and malloc to grow, and grow, and grow their use of memory, 1308 01:00:18,600 --> 01:00:21,080 but they never got around to freeing any of that memory. 1309 01:00:21,080 --> 01:00:23,210 So programs can run out of memory. 1310 01:00:23,210 --> 01:00:24,780 Your computer can run out of memory. 1311 01:00:24,780 --> 01:00:28,670 So it's good practice, therefore, to free any memory you're not using. 1312 01:00:28,670 --> 01:00:30,340 However, how do you find this mistake? 1313 01:00:30,340 --> 01:00:33,110 So we've got one final debugging tool for you. 1314 01:00:33,110 --> 01:00:35,720 This one's not CS50 specific like debug50. 1315 01:00:35,720 --> 01:00:37,130 This one is called Valgrind. 1316 01:00:37,130 --> 01:00:41,178 Unfortunately, it's not the easiest thing to understand at first glance.
1317 01:00:41,178 --> 01:00:42,720 So I'm going to go ahead and do this. 1318 01:00:42,720 --> 01:00:48,820 I'm going to run Valgrind on this program, dot slash copy, and hit Enter. 1319 01:00:48,820 --> 01:00:49,320 And unfort-- 1320 01:00:49,320 --> 01:00:50,195 AUDIENCE: [INAUDIBLE] 1321 01:00:50,195 --> 01:00:51,320 [CHUCKLE] 1322 01:00:51,320 --> 01:00:52,022 [COUGH] 1323 01:00:52,022 --> 01:00:52,980 DAVID J. MALAN: Gotcha. 1324 01:00:52,980 --> 01:00:53,480 OK. 1325 01:00:53,480 --> 01:00:56,420 I'm going to go ahead and-- 1326 01:00:56,420 --> 01:00:57,450 there we go. 1327 01:00:57,450 --> 01:00:58,370 AUDIENCE: [INAUDIBLE] 1328 01:00:58,370 --> 01:01:00,780 DAVID J. MALAN: So what you missed was a very scary message. 1329 01:01:00,780 --> 01:01:03,680 So I'm going to go ahead and run Valgrind on dot slash copy. 1330 01:01:03,680 --> 01:01:06,990 We see this esoteric output up top and then my prompt for s-- 1331 01:01:06,990 --> 01:01:08,240 because it's the same program. 1332 01:01:08,240 --> 01:01:11,448 It's prompting me for a string --so I'm going to give it emma, all lowercase, 1333 01:01:11,448 --> 01:01:12,200 and Enter. 1334 01:01:12,200 --> 01:01:16,220 And you'll notice now, that there's some summary going on here 1335 01:01:16,220 --> 01:01:17,810 but also some mention of error. 1336 01:01:17,810 --> 01:01:21,860 So heap summary-- we'll come back to that in a bit --5 bytes in 1 blocks 1337 01:01:21,860 --> 01:01:24,950 are definitely lost in loss record 1 of 2. 1338 01:01:24,950 --> 01:01:27,767 Leak summary, I've got 5 bytes leaking in 1 blocks. 1339 01:01:27,767 --> 01:01:30,350 I mean, this is one of these programs in Linux-- the operating 1340 01:01:30,350 --> 01:01:34,100 system that we use, that's quite common in industry too --I mean, my god. 1341 01:01:34,100 --> 01:01:37,625 There's so-- there's so many more characters on the screen than 1342 01:01:37,625 --> 01:01:39,000 are actually enlightening for me.
1343 01:01:39,000 --> 01:01:41,510 Let's see if we can focus our attention on what matters. 1344 01:01:41,510 --> 01:01:43,220 Memory leaking, bad. 1345 01:01:43,220 --> 01:01:47,060 So how do we go about chasing down where memory is leaking? 1346 01:01:47,060 --> 01:01:49,740 Well, as before, we can use help50. 1347 01:01:49,740 --> 01:01:52,670 And in fact, help50 will analyze the output of Valgrind-- it's still 1348 01:01:52,670 --> 01:01:53,690 going to prompt me for a string. 1349 01:01:53,690 --> 01:01:56,220 So I'm going to again, type in emma --it's going to look at that. 1350 01:01:56,220 --> 01:01:57,095 It's going to ask for help. 1351 01:01:57,095 --> 01:02:01,460 And voila, highlighted in yellow, is a message that we, help50, recognize. 1352 01:02:01,460 --> 01:02:05,540 And notice our advice is, looks like your program leaked 5 bytes of memory. 1353 01:02:05,540 --> 01:02:08,510 Did you forget to free memory that you allocated via malloc? 1354 01:02:08,510 --> 01:02:11,443 Take a closer look at line 10 of copy dot C. 1355 01:02:11,443 --> 01:02:14,360 Now once you've done this a couple of times and made the same mistake, 1356 01:02:14,360 --> 01:02:18,350 you can probably scroll up and glean for yourself where the error is. 1357 01:02:18,350 --> 01:02:21,560 We're not revealing any more information than is right in front of you. 1358 01:02:21,560 --> 01:02:26,330 And in fact, you can see here, ah, in main in copy dot C, line 10, 1359 01:02:26,330 --> 01:02:30,290 there's some kind of 5 bytes in 1 blocks are definitely lost. 1360 01:02:30,290 --> 01:02:33,690 So there's a lot of words there but it does draw attention to the right place. 1361 01:02:33,690 --> 01:02:36,890 So let me go ahead and scroll down, focus on line 10. 1362 01:02:36,890 --> 01:02:39,560 And indeed, line 10 is where I allocated the memory. 1363 01:02:39,560 --> 01:02:42,680 So it turns out the solution for this is quite simple.
1364 01:02:42,680 --> 01:02:45,770 Down here, I'm just going to go ahead and free 1365 01:02:45,770 --> 01:02:51,170 t, the address of the chunk of memory that malloc returned to me. 1366 01:02:51,170 --> 01:02:54,770 So I'm undoing the effects of allocating memory by de-allocating memory. 1367 01:02:54,770 --> 01:02:56,920 So now let me go ahead and run copy. 1368 01:02:56,920 --> 01:03:00,050 And if I run copy, it's not going to seem to run any differently. 1369 01:03:00,050 --> 01:03:01,700 It's still going to work correctly. 1370 01:03:01,700 --> 01:03:06,200 But now if I analyze it for mistakes with Valgrind, so Valgrind of dot 1371 01:03:06,200 --> 01:03:07,510 slash copy-- 1372 01:03:07,510 --> 01:03:10,520 I'm going to again type in emma in all lowercase 1373 01:03:10,520 --> 01:03:13,430 and I cross my fingers --that indeed now, leaked 1374 01:03:13,430 --> 01:03:15,975 summary, 0 bytes in 0 blocks. 1375 01:03:15,975 --> 01:03:19,190 So unfortunately, even when all is well, it still spits out a mouthful. 1376 01:03:19,190 --> 01:03:23,010 But now I see no mention of blocks that are actually 1377 01:03:23,010 --> 01:03:25,477 leaked, at least in the top part here. 1378 01:03:25,477 --> 01:03:27,810 And we'll see more of this over the next couple of weeks 1379 01:03:27,810 --> 01:03:30,750 as we use it to chase down more complicated bugs. 1380 01:03:30,750 --> 01:03:33,480 But it's just another tool in the toolkit that allows 1381 01:03:33,480 --> 01:03:35,820 us to detect these kinds of errors. 1382 01:03:35,820 --> 01:03:37,320 Let me try one other thing actually. 1383 01:03:37,320 --> 01:03:39,720 This is a program that I wrote in advance. 1384 01:03:39,720 --> 01:03:43,320 This one is called memory dot C. And as always, these are all on the course's 1385 01:03:43,320 --> 01:03:45,220 website if you'd like to tinker after. 1386 01:03:45,220 --> 01:03:46,950 And it's a little pointless. 
1387 01:03:46,950 --> 01:03:49,030 It's just meant for demonstration purposes. 1388 01:03:49,030 --> 01:03:50,160 So here is a program. 1389 01:03:50,160 --> 01:03:54,642 And it's copied from this online manual for Valgrind, the tool I just used. 1390 01:03:54,642 --> 01:03:55,850 So let's see what's going on. 1391 01:03:55,850 --> 01:03:57,900 Here I have main, at the bottom of my code. 1392 01:03:57,900 --> 01:03:58,410 I copied it. 1393 01:03:58,410 --> 01:03:59,452 I didn't use a prototype. 1394 01:03:59,452 --> 01:04:00,750 I just copied what they did. 1395 01:04:00,750 --> 01:04:04,980 And see here, it calls a function called f and then returns 0. 1396 01:04:04,980 --> 01:04:07,140 Well what does f do? f is this random function 1397 01:04:07,140 --> 01:04:10,410 up here that takes no inputs per the void. 1398 01:04:10,410 --> 01:04:14,640 And in English, how would you describe what's happening in line 7, now 1399 01:04:14,640 --> 01:04:18,420 that we've introduced malloc and stars-- 1400 01:04:18,420 --> 01:04:19,840 or pointers? 1401 01:04:19,840 --> 01:04:20,590 What's this doing? 1402 01:04:20,590 --> 01:04:21,233 Yeah. 1403 01:04:21,233 --> 01:04:26,695 AUDIENCE: It's allocating enough memory in [INAUDIBLE] for [INAUDIBLE]. 1404 01:04:26,695 --> 01:04:27,570 DAVID J. MALAN: Good. 1405 01:04:27,570 --> 01:04:31,080 Allocate enough memory for 10 integers-- and then 1406 01:04:31,080 --> 01:04:33,210 let me add-- elaborate on your words --and then 1407 01:04:33,210 --> 01:04:37,860 store the address of that chunk of memory 1408 01:04:37,860 --> 01:04:40,020 in a pointer called x, if you will. 1409 01:04:40,020 --> 01:04:41,102 So sizeof is new. 1410 01:04:41,102 --> 01:04:42,560 But it literally does what it says. 1411 01:04:42,560 --> 01:04:44,400 If you say sizeof, open paren, close paren, 1412 01:04:44,400 --> 01:04:47,563 with the name of a data type in between, it will tell you that an int is 4 bytes.
1413 01:04:47,563 --> 01:04:49,230 It will tell you that a long is 8 bytes. 1414 01:04:49,230 --> 01:04:51,060 It will tell you that a char is one byte. 1415 01:04:51,060 --> 01:04:53,070 It's just a dynamic way of avoiding having 1416 01:04:53,070 --> 01:04:54,940 to memorize those kinds of things. 1417 01:04:54,940 --> 01:04:56,910 So this just means give me 10 times the size 1418 01:04:56,910 --> 01:04:58,830 of an int, which happens to be 4 bytes. 1419 01:04:58,830 --> 01:05:02,310 So that means give me 10 times 4, or 40 bytes of memory. 1420 01:05:02,310 --> 01:05:06,090 That's effectively an array of memory that I can store integers in. 1421 01:05:06,090 --> 01:05:09,420 And malloc, per its definition, is going to return to me the address 1422 01:05:09,420 --> 01:05:12,030 of the first byte of that memory. 1423 01:05:12,030 --> 01:05:20,170 What is now scary about line 8, relatively speaking? 1424 01:05:20,170 --> 01:05:25,240 What might worry you with line 8, which is buggy, unfortunately? 1425 01:05:25,240 --> 01:05:26,436 Yeah. 1426 01:05:26,436 --> 01:05:30,970 AUDIENCE: [INAUDIBLE] 1427 01:05:30,970 --> 01:05:31,970 DAVID J. MALAN: Exactly. 1428 01:05:31,970 --> 01:05:35,490 I'm doing x bracket 10 and just arbitrarily storing the number 0. 1429 01:05:35,490 --> 01:05:35,990 Why? 1430 01:05:35,990 --> 01:05:36,980 Just because. 1431 01:05:36,980 --> 01:05:38,340 But 10 does not exist. 1432 01:05:38,340 --> 01:05:38,840 Right? 1433 01:05:38,840 --> 01:05:44,390 If I have 10 int, it's bracket 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, not bracket 10. 1434 01:05:44,390 --> 01:05:47,870 So this is an example of overflowing a buffer, so to speak. 
1435 01:05:47,870 --> 01:05:50,153 Anytime you're talking about memory, any time 1436 01:05:50,153 --> 01:05:52,820 you're talking about an array of memory-- which this effectively 1437 01:05:52,820 --> 01:05:55,280 is, 10 integers, room for 10 integers back to back 1438 01:05:55,280 --> 01:06:01,040 to back --if you go one step too far, that's what's called a buffer overflow, 1439 01:06:01,040 --> 01:06:03,150 whereby the buffer is the array. 1440 01:06:03,150 --> 01:06:05,150 And in fact, this would make it even more clear. 1441 01:06:05,150 --> 01:06:08,510 Suppose I tried to go there, bracket 10,000. 1442 01:06:08,510 --> 01:06:11,640 That is definitely not among the bytes of memory I allocated. 1443 01:06:11,640 --> 01:06:14,240 That's definitely going beyond the boundaries of my array. 1444 01:06:14,240 --> 01:06:18,590 But so is it true that bracket 10 is one step too far. 1445 01:06:18,590 --> 01:06:20,880 So what's nice about Valgrind is this. 1446 01:06:20,880 --> 01:06:24,910 Let me go ahead and rerun Valgrind after compiling this memory program-- 1447 01:06:24,910 --> 01:06:27,050 whoops --in my source directory. 1448 01:06:27,050 --> 01:06:29,390 Let me go ahead and make memory. 1449 01:06:29,390 --> 01:06:29,890 All right. 1450 01:06:29,890 --> 01:06:31,280 It compiled OK. 1451 01:06:31,280 --> 01:06:34,190 Valgrind dot slash memory-- 1452 01:06:34,190 --> 01:06:37,040 and unfortunately, we're going to see some crazy arcane error 1453 01:06:37,040 --> 01:06:38,000 messages for a moment. 1454 01:06:38,000 --> 01:06:39,920 But let's see what it says. 1455 01:06:39,920 --> 01:06:43,040 Notice here, invalid write of size 4-- 1456 01:06:43,040 --> 01:06:46,968 that sounds bad --and 40 bytes in one blocks are-- 1457 01:06:46,968 --> 01:06:49,260 OK, they didn't really add an if condition in Valgrind. 1458 01:06:49,260 --> 01:06:50,990 --40 bytes in 1 blocks-- 1459 01:06:50,990 --> 01:06:52,910 plural --are definitely lost. 
1460 01:06:52,910 --> 01:06:54,830 So let's fix the second of those first. 1461 01:06:54,830 --> 01:06:57,615 Why am I leaking 40 bytes exactly? 1462 01:06:57,615 --> 01:06:58,490 AUDIENCE: [INAUDIBLE] 1463 01:06:58,490 --> 01:07:00,032 DAVID J. MALAN: I'm never freeing it. 1464 01:07:00,032 --> 01:07:03,770 So I think I can get away with just doing this here, just free the memory 1465 01:07:03,770 --> 01:07:06,560 after I'm done using it-- even though I'm not really 1466 01:07:06,560 --> 01:07:08,938 using it for anything purposeful here. 1467 01:07:08,938 --> 01:07:09,980 So let me try this again. 1468 01:07:09,980 --> 01:07:16,040 Make memory, now let me do Valgrind dot slash memory. 1469 01:07:16,040 --> 01:07:18,200 And-- OK, better. 1470 01:07:18,200 --> 01:07:20,450 I don't see 40 bytes lost anymore. 1471 01:07:20,450 --> 01:07:21,080 So that's good. 1472 01:07:21,080 --> 01:07:22,760 But I do still have this issue. 1473 01:07:22,760 --> 01:07:25,850 But here's where it's sometimes useful to understand the various data 1474 01:07:25,850 --> 01:07:27,230 types and their sizes. 1475 01:07:27,230 --> 01:07:29,300 Invalid write of size 4. 1476 01:07:29,300 --> 01:07:32,340 Writing in a program just means changing a value. 1477 01:07:32,340 --> 01:07:34,160 And it mentioned line 8 here. 1478 01:07:34,160 --> 01:07:37,580 In what sense is this an invalid write of size 4? 1479 01:07:37,580 --> 01:07:39,510 Well, how big is an int? 1480 01:07:39,510 --> 01:07:40,520 Four bytes. 1481 01:07:40,520 --> 01:07:42,530 You're trying to change it arbitrarily to 0. 1482 01:07:42,530 --> 01:07:44,900 But I could have made that 50 or any other number. 1483 01:07:44,900 --> 01:07:48,080 But I'm trying to touch an int that should not 1484 01:07:48,080 --> 01:07:51,470 be within the memory I have allocated for myself. 
1485 01:07:51,470 --> 01:07:57,080 I asked for 40 bytes, or 10 ints, but again, because arrays are zero indexed, 1486 01:07:57,080 --> 01:07:59,510 this is like going one beyond the boundary. 1487 01:07:59,510 --> 01:08:03,750 So let me fix this and just arbitrarily say, let's go touch that part of it. 1488 01:08:03,750 --> 01:08:06,080 Let me go here and do make memory. 1489 01:08:06,080 --> 01:08:10,250 Let me go ahead and do Valgrind dot slash memory. 1490 01:08:10,250 --> 01:08:16,520 And now, arcane output aside, notice that that error message went away too. 1491 01:08:16,520 --> 01:08:19,370 So this will be helpful over the coming couple of weeks 1492 01:08:19,370 --> 01:08:22,890 as we continue to use C to implement a number of programs 1493 01:08:22,890 --> 01:08:24,390 that now start to manipulate memory. 1494 01:08:24,390 --> 01:08:26,479 It's just a tool that helps you spot errors 1495 01:08:26,479 --> 01:08:29,240 that certainly, your TF might otherwise, or that 1496 01:08:29,240 --> 01:08:33,140 might be causing your program to crash, or to freeze, or to segfault-- 1497 01:08:33,140 --> 01:08:35,899 if you've seen that yourselves before. 1498 01:08:35,899 --> 01:08:36,399 All right. 1499 01:08:36,399 --> 01:08:37,410 So that's just a tool. 1500 01:08:37,410 --> 01:08:41,899 Let's go ahead and transition now to some actual use cases here. 1501 01:08:41,899 --> 01:08:46,350 Recall from last week that it was pretty useful to be able to swap values. 1502 01:08:46,350 --> 01:08:46,850 Right? 1503 01:08:46,850 --> 01:08:50,758 With bubble sort, with selection sort, we needed to be able to exchange values 1504 01:08:50,758 --> 01:08:52,800 so that we could put things into the right place. 1505 01:08:52,800 --> 01:08:54,720 Turns out this is pretty straightforward. 1506 01:08:54,720 --> 01:08:54,960 Right? 1507 01:08:54,960 --> 01:08:57,002 And we can actually mimic this in the real world. 
1508 01:08:57,002 --> 01:09:01,585 We just have opportunity for one volunteer today, one volunteer. 1509 01:09:01,585 --> 01:09:02,460 Can we get a little-- 1510 01:09:02,460 --> 01:09:03,170 OK, over here. 1511 01:09:03,170 --> 01:09:03,670 Yeah. 1512 01:09:03,670 --> 01:09:05,264 What's your name? 1513 01:09:05,264 --> 01:09:06,240 FARRAH: Farrah. 1514 01:09:06,240 --> 01:09:06,479 DAVID J. MALAN: Sorry. 1515 01:09:06,479 --> 01:09:07,161 FARRAH: Farrah. 1516 01:09:07,161 --> 01:09:08,036 DAVID J. MALAN: Vera. 1517 01:09:08,036 --> 01:09:08,990 FARRAH: Farrah. 1518 01:09:08,990 --> 01:09:10,532 DAVID J. MALAN: Oh, here, come on up. 1519 01:09:10,532 --> 01:09:12,085 Then I can hear you up here. 1520 01:09:12,085 --> 01:09:12,960 OK, what's your name? 1521 01:09:12,960 --> 01:09:13,950 FARRAH: Farrah. 1522 01:09:13,950 --> 01:09:14,825 DAVID J. MALAN: Vera. 1523 01:09:14,825 --> 01:09:15,630 FARRAH: With an F. 1524 01:09:15,630 --> 01:09:16,080 DAVID J. MALAN: Fera. 1525 01:09:16,080 --> 01:09:16,680 FARRAH: Farrah. 1526 01:09:16,680 --> 01:09:17,340 DAVID J. MALAN: Farrah. 1527 01:09:17,340 --> 01:09:17,609 FARRAH: Yes. 1528 01:09:17,609 --> 01:09:17,880 DAVID J. MALAN: Farrah. 1529 01:09:17,880 --> 01:09:18,430 Yes, OK. 1530 01:09:18,430 --> 01:09:18,930 Good. 1531 01:09:18,930 --> 01:09:19,439 Come on up. 1532 01:09:19,439 --> 01:09:20,022 Still come up. 1533 01:09:20,022 --> 01:09:20,880 [CHUCKLE] Thank you. 1534 01:09:20,880 --> 01:09:21,420 Thank you. 1535 01:09:21,420 --> 01:09:22,334 [APPLAUSE] 1536 01:09:22,334 --> 01:09:24,619 [CHEERS] 1537 01:09:24,619 --> 01:09:25,680 OK, nice to meet you. 1538 01:09:25,680 --> 01:09:26,700 FARRAH: Hi, nice to meet you. 1539 01:09:26,700 --> 01:09:27,149 DAVID J. MALAN: Farrah. 1540 01:09:27,149 --> 01:09:27,649 FARRAH: Yes. 1541 01:09:27,649 --> 01:09:28,620 DAVID J. MALAN: OK. 1542 01:09:28,620 --> 01:09:29,700 So let's go ahead here. 
1543 01:09:29,700 --> 01:09:32,100 Let me give you this so that you can be mic'd as well. 1544 01:09:32,100 --> 01:09:36,479 OK, so the goal at hand is here, I have two glasses of colored water. 1545 01:09:36,479 --> 01:09:37,830 So we have some purple here. 1546 01:09:37,830 --> 01:09:39,324 [WATER POURING] 1547 01:09:39,324 --> 01:09:40,819 OK. 1548 01:09:40,819 --> 01:09:42,622 And we've got some green here. 1549 01:09:42,622 --> 01:09:43,770 [WATER POURING] 1550 01:09:43,770 --> 01:09:47,189 And the only goal at hand is to do a very simple operation 1551 01:09:47,189 --> 01:09:48,990 like we needed to do quite a bit last week, 1552 01:09:48,990 --> 01:09:51,819 which is to swap two variables just like we swap two numbers. 1553 01:09:51,819 --> 01:09:54,240 So if you could go ahead and get the purple liquid in here 1554 01:09:54,240 --> 01:09:56,250 and the green liquid in here, go. 1555 01:09:56,250 --> 01:09:59,950 1556 01:09:59,950 --> 01:10:00,450 [CHUCKLE] 1557 01:10:00,450 --> 01:10:01,993 FARRAH: Is it OK if they overlap? 1558 01:10:01,993 --> 01:10:03,160 DAVID J. MALAN: Ideally, no. 1559 01:10:03,160 --> 01:10:05,670 We want to put only purple here, and only green here, 1560 01:10:05,670 --> 01:10:07,460 and no temporary store. 1561 01:10:07,460 --> 01:10:07,960 [LAUGHTER] 1562 01:10:07,960 --> 01:10:09,050 FARRAH: Oh. 1563 01:10:09,050 --> 01:10:10,800 DAVID J. MALAN: OK, but you're hesitating. 1564 01:10:10,800 --> 01:10:12,005 Why? 1565 01:10:12,005 --> 01:10:14,630 FARRAH: Because you told me they couldn't touch [CHUCKLE] and-- 1566 01:10:14,630 --> 01:10:16,630 DAVID J. MALAN: Well, you can touch the glasses. 1567 01:10:16,630 --> 01:10:18,360 But you're hesitating to swap them, why? 1568 01:10:18,360 --> 01:10:19,020 [CLINK] 1569 01:10:19,020 --> 01:10:20,240 OK, that's just cheating. 
1570 01:10:20,240 --> 01:10:21,707 [LAUGHTER] 1571 01:10:21,707 --> 01:10:22,685 [APPLAUSE] 1572 01:10:22,685 --> 01:10:25,130 [CHEERS] 1573 01:10:25,130 --> 01:10:27,840 OK, very clever. 1574 01:10:27,840 --> 01:10:29,970 Supposing you can't just move things around 1575 01:10:29,970 --> 01:10:32,890 in memory, what if I gave you a temporary variable. 1576 01:10:32,890 --> 01:10:33,390 FARRAH: OK. 1577 01:10:33,390 --> 01:10:35,310 DAVID J. MALAN: Does this help? 1578 01:10:35,310 --> 01:10:37,140 FARRAH: Yes. 1579 01:10:37,140 --> 01:10:40,573 DAVID J. MALAN: So how can we now get purple in there and green in there? 1580 01:10:40,573 --> 01:10:42,052 [CHUCKLE] 1581 01:10:42,052 --> 01:10:44,098 FARRAH: Can I put purple in here first? 1582 01:10:44,098 --> 01:10:44,973 DAVID J. MALAN: Sure. 1583 01:10:44,973 --> 01:10:45,720 FARRAH: I'm going to spill it. 1584 01:10:45,720 --> 01:10:47,056 DAVID J. MALAN: It's OK. 1585 01:10:47,056 --> 01:10:48,050 [WATER POURING] 1586 01:10:48,050 --> 01:10:49,090 OK. 1587 01:10:49,090 --> 01:10:51,340 So purple goes into the temporary, very nice. 1588 01:10:51,340 --> 01:10:51,840 [APPLAUSE] 1589 01:10:51,840 --> 01:10:55,040 FARRAH: Thank you. 1590 01:10:55,040 --> 01:10:57,290 DAVID J. MALAN: Green goes into what was purple. 1591 01:10:57,290 --> 01:10:58,410 FARRAH: Yes 1592 01:10:58,410 --> 01:11:00,920 DAVID J. MALAN: OK, good. 1593 01:11:00,920 --> 01:11:03,650 And then purple goes in-- from the temporary variable 1594 01:11:03,650 --> 01:11:06,140 into the original green glass. 1595 01:11:06,140 --> 01:11:08,070 Now, a proper round of applause if we could. 1596 01:11:08,070 --> 01:11:08,836 OK. 1597 01:11:08,836 --> 01:11:09,336 [APPLAUSE] 1598 01:11:09,336 --> 01:11:09,836 Thank you. 1599 01:11:09,836 --> 01:11:10,800 FARRAH: Thank you. 1600 01:11:10,800 --> 01:11:13,730 DAVID J. MALAN: OK. 1601 01:11:13,730 --> 01:11:17,780 So suffice it to say, that is the correct way of swapping two values. 
1602 01:11:17,780 --> 01:11:21,960 But the key detail there was that Farrah had access to a temporary variable. 1603 01:11:21,960 --> 01:11:25,310 And so you would think that this idea, simple as it is in reality, 1604 01:11:25,310 --> 01:11:28,650 would translate pretty naturally to code as well. 1605 01:11:28,650 --> 01:11:30,957 But it turns out that's not necessarily the case. 1606 01:11:30,957 --> 01:11:33,290 So it turns out that if we wanted to swap two variables, 1607 01:11:33,290 --> 01:11:35,510 we might implement a function called swap 1608 01:11:35,510 --> 01:11:38,120 that just takes in two integers, a and b, the goal of which 1609 01:11:38,120 --> 01:11:39,170 is to do the switcheroo. 1610 01:11:39,170 --> 01:11:41,210 Purple becomes green, green becomes purple, 1611 01:11:41,210 --> 01:11:43,580 just as a becomes b, b becomes a. 1612 01:11:43,580 --> 01:11:47,480 And you would think that we just need a temporary variable inside of that code 1613 01:11:47,480 --> 01:11:48,900 in order to make that happen. 1614 01:11:48,900 --> 01:11:51,960 So I would argue that the equivalent to what Farrah did in 1615 01:11:51,960 --> 01:11:54,980 person, in code in C, might look like this. 1616 01:11:54,980 --> 01:11:57,050 Give me a temporary variable called temp-- 1617 01:11:57,050 --> 01:12:00,950 or anything you want --store in it, a-- just as she stored one of the colors 1618 01:12:00,950 --> 01:12:05,000 in the temporary glass first, purple --then go ahead and change the value 1619 01:12:05,000 --> 01:12:07,310 of a to equal the value of b-- 1620 01:12:07,310 --> 01:12:10,940 because you've already kept a copy of a around in a temporary variable 1621 01:12:10,940 --> 01:12:13,730 --then finally, store in b what is in temp. 1622 01:12:13,730 --> 01:12:18,230 So that is the code equivalent of what Farrah did using these colored liquids. 1623 01:12:18,230 --> 01:12:22,430 Unfortunately, it's not quite as simple, it would seem, as that.
1624 01:12:22,430 --> 01:12:26,090 I'm going to go ahead and open up, say, a program that I wrote in advance here 1625 01:12:26,090 --> 01:12:27,980 too, called-- 1626 01:12:27,980 --> 01:12:29,930 intentionally --no swap. 1627 01:12:29,930 --> 01:12:33,270 Even though you would like to think that it does exactly that. 1628 01:12:33,270 --> 01:12:36,140 So notice that in this code we have-- 1629 01:12:36,140 --> 01:12:39,170 including standard I/O dot h --we have a prototype 1630 01:12:39,170 --> 01:12:41,390 for the function I just proposed we make, swap, 1631 01:12:41,390 --> 01:12:43,220 that takes two ints a and b. 1632 01:12:43,220 --> 01:12:44,660 Here's my main function. 1633 01:12:44,660 --> 01:12:47,750 And I'm just going to arbitrarily initialize x to 1 and y 1634 01:12:47,750 --> 01:12:52,370 to 2, just as I initialized one glass to purple and one glass to green. 1635 01:12:52,370 --> 01:12:55,590 Then, just so that we can see what's going on inside our code, 1636 01:12:55,590 --> 01:12:59,390 I'm just going to print out x is such and such, y is such and such-- 1637 01:12:59,390 --> 01:13:02,510 printing x and y --then I'm going to call that swap function, 1638 01:13:02,510 --> 01:13:03,710 swapping x and y. 1639 01:13:03,710 --> 01:13:06,008 And then I'm going to literally print the same phrase. 1640 01:13:06,008 --> 01:13:09,050 But I'm hoping that it's going to say the opposite the second time around 1641 01:13:09,050 --> 01:13:10,970 if x and y are indeed swapped. 1642 01:13:10,970 --> 01:13:12,680 So how do I implement swap? 1643 01:13:12,680 --> 01:13:14,900 Well, it would seem to be, with this same code, 1644 01:13:14,900 --> 01:13:17,420 using a temporary variable-- or temporary glass, 1645 01:13:17,420 --> 01:13:21,170 just as Farrah did for the two liquids. 
1646 01:13:21,170 --> 01:13:24,860 Unfortunately, when I go ahead and run this program, no swap-- 1647 01:13:24,860 --> 01:13:27,890 and its name alone is a bit of a spoiler --if I go ahead 1648 01:13:27,890 --> 01:13:33,380 and run dot slash no swap with x and y hardcoded to 1 and 2 respectively, 1649 01:13:33,380 --> 01:13:37,060 you'll see that it runs, and says, x is 1, y is 2, x is 1, 1650 01:13:37,060 --> 01:13:42,860 y is 2, thereby clearly failing to swap. 1651 01:13:42,860 --> 01:13:47,330 But if you're in agreement with me, this feels like it's correct. 1652 01:13:47,330 --> 01:13:49,490 I didn't get any compiler errors. 1653 01:13:49,490 --> 01:13:54,920 Yet, this line of code, which uses swap, seems to have no effect. 1654 01:13:54,920 --> 01:13:57,710 So what might the intuition here or hunch 1655 01:13:57,710 --> 01:14:01,640 be for why this program indeed does not swap? 1656 01:14:01,640 --> 01:14:05,584 AUDIENCE: So when it takes the [INAUDIBLE] in the-- 1657 01:14:05,584 --> 01:14:09,270 when it takes the [INAUDIBLE] whole new variable that [INAUDIBLE].. 1658 01:14:09,270 --> 01:14:10,520 DAVID J. MALAN: Yeah, exactly. 1659 01:14:10,520 --> 01:14:14,150 When you pass inputs to a function, you are effectively 1660 01:14:14,150 --> 01:14:17,850 passing copies of your own values to that function. 1661 01:14:17,850 --> 01:14:22,310 And so when you have two variables, x and y-- initialized to 1 and 2 --yes, 1662 01:14:22,310 --> 01:14:24,350 you're passing them as input to swap. 1663 01:14:24,350 --> 01:14:29,570 But swap is not getting actually x and y, it's getting copies of x and y. 1664 01:14:29,570 --> 01:14:34,020 And per its prototype, is calling them a and b, respectively. 1665 01:14:34,020 --> 01:14:36,740 So it turns out this swap function actually does work. 1666 01:14:36,740 --> 01:14:38,240 It swaps a and b. 1667 01:14:38,240 --> 01:14:42,800 But it does not swap x and y because those are copies. 
1668 01:14:42,800 --> 01:14:48,470 Now this seems especially worrisome now insofar as I cannot seem to implement 1669 01:14:48,470 --> 01:14:52,640 a function called swap that can even implement bubble sort or selection 1670 01:14:52,640 --> 01:14:53,150 sort. 1671 01:14:53,150 --> 01:14:55,233 And frankly, you might have run into this yourself 1672 01:14:55,233 --> 01:14:57,830 if trying to implement this for one of your voting algorithms. 1673 01:14:57,830 --> 01:14:59,990 If you needed to do a swap, if you had a helper function, 1674 01:14:59,990 --> 01:15:02,810 you might have had to think about it in a somewhat different way. 1675 01:15:02,810 --> 01:15:04,738 So what's the explanation for all of this? 1676 01:15:04,738 --> 01:15:06,530 Well, this version of swap doesn't actually 1677 01:15:06,530 --> 01:15:09,470 work because again, if we go back to first principles, 1678 01:15:09,470 --> 01:15:12,410 go inside of the computer's memory and consider 1679 01:15:12,410 --> 01:15:15,980 our memory is just a grid of bytes, top to bottom, left to right. 1680 01:15:15,980 --> 01:15:17,300 What's really going on? 1681 01:15:17,300 --> 01:15:19,190 Well, it turns out that all this time we've 1682 01:15:19,190 --> 01:15:22,910 been using C, my computer isn't just arbitrarily putting things in memory 1683 01:15:22,910 --> 01:15:24,470 over here, over here, over here. 1684 01:15:24,470 --> 01:15:27,990 It actually uses your computer's memory in a methodical way. 1685 01:15:27,990 --> 01:15:29,690 Certain types of data go down here. 1686 01:15:29,690 --> 01:15:32,250 Certain types of data go up here, and so forth. 1687 01:15:32,250 --> 01:15:33,830 So what is that methodology?
1688 01:15:33,830 --> 01:15:37,050 Well, if we consider it just abstractly as a big rectangle, 1689 01:15:37,050 --> 01:15:39,470 it turns out that if this is your computer's memory, 1690 01:15:39,470 --> 01:15:43,430 at the very top of it, conceptually, goes all of the 0s and 1s 1691 01:15:43,430 --> 01:15:45,500 that Clang compiled for you. 1692 01:15:45,500 --> 01:15:49,460 The so-called machine code, is literally loaded into your computer's RAM when 1693 01:15:49,460 --> 01:15:53,390 you run dot slash something, or in a Mac or PC, when you double click an icon, 1694 01:15:53,390 --> 01:15:57,350 those 0s and 1s-- the compiled code --is loaded into your computer's memory up 1695 01:15:57,350 --> 01:15:57,870 here-- 1696 01:15:57,870 --> 01:16:00,980 let's say --and it might take up this much space for a small program, 1697 01:16:00,980 --> 01:16:02,900 this much space for a big program. 1698 01:16:02,900 --> 01:16:07,250 Below that, if your program uses any global variables or other type of data, 1699 01:16:07,250 --> 01:16:10,650 those will go just below, so to speak, the machine 1700 01:16:10,650 --> 01:16:11,900 code in the computer's memory. 1701 01:16:11,900 --> 01:16:12,290 Why? 1702 01:16:12,290 --> 01:16:14,540 Just because humans needed to decide when implementing 1703 01:16:14,540 --> 01:16:17,720 compilers where to put stuff in the computer's memory. 1704 01:16:17,720 --> 01:16:21,080 Below that is a special chunk of memory called the heap. 1705 01:16:21,080 --> 01:16:22,150 And Valgrind gave it-- 1706 01:16:22,150 --> 01:16:24,620 a teaser of this word a moment ago. 1707 01:16:24,620 --> 01:16:30,410 The heap is a big chunk of memory where you can allocate memory from. 
1708 01:16:30,410 --> 01:16:32,210 And in fact, if you call malloc-- 1709 01:16:32,210 --> 01:16:34,530 as I did once before --that memory is going 1710 01:16:34,530 --> 01:16:37,460 to come from this region of the computer's memory, 1711 01:16:37,460 --> 01:16:40,670 below the global variables, below the machine code, 1712 01:16:40,670 --> 01:16:44,230 because that's where Clang and compiler designers decided to draw memory from. 1713 01:16:44,230 --> 01:16:47,870 So every time you call malloc, you're carving out more and more bytes 1714 01:16:47,870 --> 01:16:49,250 for your program to use. 1715 01:16:49,250 --> 01:16:51,500 And that heap grows, conceptually, downward. 1716 01:16:51,500 --> 01:16:53,510 The more memory you use, the lower, lower, 1717 01:16:53,510 --> 01:16:56,450 lower it gets in this artist's rendition. 1718 01:16:56,450 --> 01:17:00,290 However, there's a different portion of memory here down below that's 1719 01:17:00,290 --> 01:17:02,450 used for a very different purpose. 1720 01:17:02,450 --> 01:17:06,410 Anytime you call a function in your program, 1721 01:17:06,410 --> 01:17:10,340 it turns out that that function's local variables end up 1722 01:17:10,340 --> 01:17:14,010 going at the bottom of your computer's memory on what's called a stack. 1723 01:17:14,010 --> 01:17:17,090 So if you have main, the default function, 1724 01:17:17,090 --> 01:17:20,300 and it has one or more arguments, or one or more local variables, 1725 01:17:20,300 --> 01:17:23,360 those variables just go down here, conceptually, in memory. 1726 01:17:23,360 --> 01:17:26,990 And if you call a function like swap, or anything else, 1727 01:17:26,990 --> 01:17:29,670 it just keeps using more and more memory above that. 1728 01:17:29,670 --> 01:17:33,050 So the heap is where malloc gets you bytes from. 1729 01:17:33,050 --> 01:17:36,230 And the stack is where your local variables go when 1730 01:17:36,230 --> 01:17:38,640 functions are called, bottom to top.
1731 01:17:38,640 --> 01:17:40,400 So let's see this in action here. 1732 01:17:40,400 --> 01:17:44,300 If we consider the stack alone in the context of swapping variables 1733 01:17:44,300 --> 01:17:47,730 unsuccessfully, what's really happening with code like this? 1734 01:17:47,730 --> 01:17:50,120 Well, on the bottom of my memory when I call main, 1735 01:17:50,120 --> 01:17:54,530 I am given-- by nature of how C programs work when compiled --a slice of memory 1736 01:17:54,530 --> 01:17:56,860 called a frame, a stack frame. 1737 01:17:56,860 --> 01:18:00,890 And this is just some number of bytes that store maybe argv, argc, 1738 01:18:00,890 --> 01:18:03,140 it stores x and y, my local variables. 1739 01:18:03,140 --> 01:18:07,250 Any variables I have in main get stored in this chunk of memory here. 1740 01:18:07,250 --> 01:18:11,480 If main calls a function, like this swap function, that function gets 1741 01:18:11,480 --> 01:18:15,140 its own frame of memory, its own slice of memory, that conceptually, 1742 01:18:15,140 --> 01:18:16,490 is above main. 1743 01:18:16,490 --> 01:18:19,400 So swap has two variables-- right-- 1744 01:18:19,400 --> 01:18:21,470 two arguments, right, a and b. 1745 01:18:21,470 --> 01:18:23,440 And it also had one other variable. 1746 01:18:23,440 --> 01:18:24,260 AUDIENCE: Temp. 1747 01:18:24,260 --> 01:18:25,135 DAVID J. MALAN: Temp. 1748 01:18:25,135 --> 01:18:28,700 So those three values are going to be in this frame of memory. 1749 01:18:28,700 --> 01:18:32,557 X and y are on the bottom, a, b, and temp are above it in there. 1750 01:18:32,557 --> 01:18:33,890 So let's actually focus on this. 1751 01:18:33,890 --> 01:18:36,200 If we focus on main, when my program first runs, 1752 01:18:36,200 --> 01:18:37,590 I have two variables, x and y. 1753 01:18:37,590 --> 01:18:40,730 And I initialize those to 1 and 2, respectively. 1754 01:18:40,730 --> 01:18:42,740 Then the swap function gets called. 
1755 01:18:42,740 --> 01:18:46,340 So another frame gets used on the stack, just another bunch of bytes 1756 01:18:46,340 --> 01:18:48,320 are being allocated by the computer for me. 1757 01:18:48,320 --> 01:18:51,383 And swap had three variables, a, b, and temp. 1758 01:18:51,383 --> 01:18:54,050 The first two were its inputs, its arguments, the third of which 1759 01:18:54,050 --> 01:18:56,960 was an explicit temporary variable I gave it. 1760 01:18:56,960 --> 01:19:02,180 With those lines of code from before I initialized a and b to 1 and 2, 1761 01:19:02,180 --> 01:19:02,900 respectively. 1762 01:19:02,900 --> 01:19:07,640 And notice, they are literally identical to x and y but copies of x and y. 1763 01:19:07,640 --> 01:19:10,200 And then if we consider the code, what happens next? 1764 01:19:10,200 --> 01:19:12,570 Well, temp is assigned a. 1765 01:19:12,570 --> 01:19:14,696 So temp should take on what value? 1766 01:19:14,696 --> 01:19:15,590 AUDIENCE: 1. 1767 01:19:15,590 --> 01:19:16,550 DAVID J. MALAN: Just 1. 1768 01:19:16,550 --> 01:19:18,910 And then second line of code, a equals b. 1769 01:19:18,910 --> 01:19:23,120 So a should take on the value of b, which means it's now 2. 1770 01:19:23,120 --> 01:19:28,730 And meanwhile, b equals temp means that b should take on the value of 1. 1771 01:19:28,730 --> 01:19:31,040 And so now we have successfully swapped, it 1772 01:19:31,040 --> 01:19:34,892 seems-- with these three lines of code taken from my actual program --a and b. 1773 01:19:34,892 --> 01:19:37,850 Unfortunately, the thing about a stack is just like in the dining hall. 1774 01:19:37,850 --> 01:19:41,432 When you have the stacks of Harvard trays in the dining halls and you 1775 01:19:41,432 --> 01:19:43,640 keep putting new trays on top, on top, but then they 1776 01:19:43,640 --> 01:19:45,950 keep getting taken from the top as well.
1777 01:19:45,950 --> 01:19:50,360 So just when swap is done with its third line of code, 1778 01:19:50,360 --> 01:19:54,273 it's like someone has taken the tray away and that frame disappears. 1779 01:19:54,273 --> 01:19:56,190 So the memory technically doesn't go anywhere. 1780 01:19:56,190 --> 01:19:57,540 It's still a physical device. 1781 01:19:57,540 --> 01:20:01,460 But it's just no longer allocated for my own program. 1782 01:20:01,460 --> 01:20:05,360 So main is still intact after the swap function returns. 1783 01:20:05,360 --> 01:20:10,010 But of course, x and y have not actually been affected. 1784 01:20:10,010 --> 01:20:14,960 So what's the fundamental solution to this problem? 1785 01:20:14,960 --> 01:20:18,070 Swap did not work because it was passed copies. 1786 01:20:18,070 --> 01:20:20,390 It was passed by value, so to speak, when 1787 01:20:20,390 --> 01:20:26,510 main calls swap, passing in x and y, I get copies of x and y called a and b. 1788 01:20:26,510 --> 01:20:28,176 What could I do instead? 1789 01:20:28,176 --> 01:20:29,054 AUDIENCE: [INAUDIBLE] 1790 01:20:29,054 --> 01:20:30,387 DAVID J. MALAN: A little louder. 1791 01:20:30,387 --> 01:20:31,610 AUDIENCE: Pass by reference. 1792 01:20:31,610 --> 01:20:34,027 DAVID J. MALAN: Pass by reference, and what's a reference? 1793 01:20:34,027 --> 01:20:35,290 AUDIENCE: Make a pointer. 1794 01:20:35,290 --> 01:20:35,680 DAVID J. MALAN: Yeah. 1795 01:20:35,680 --> 01:20:38,180 So a reference is synonymous for our purposes, with pointer. 1796 01:20:38,180 --> 01:20:41,020 So yeah, that's actually kind of the germ of an idea from before.
1797 01:20:41,020 --> 01:20:44,260 If we now have the ability to address things --like slap some addresses 1798 01:20:44,260 --> 01:20:45,520 on mailboxes-- 1799 01:20:45,520 --> 01:20:49,570 you know what, let's not just pass from main to swap, literally x 1800 01:20:49,570 --> 01:20:54,520 and y, why don't we tell swap what the address of x is and the address of y 1801 01:20:54,520 --> 01:20:58,910 so that my swap code can go to x and y, change them. 1802 01:20:58,910 --> 01:21:00,820 And then even when the swap function returns, 1803 01:21:00,820 --> 01:21:03,760 that's fine because it went to the right locations. 1804 01:21:03,760 --> 01:21:06,320 So pictorially, what I really want to do is this. 1805 01:21:06,320 --> 01:21:09,220 If I take another stab at this, I'm going to go ahead now 1806 01:21:09,220 --> 01:21:12,580 and reinitialize main to have x and y equal to 1 and 2. 1807 01:21:12,580 --> 01:21:14,230 I'm now going to call swap. 1808 01:21:14,230 --> 01:21:16,750 But what I really want to do, using pictures this time, 1809 01:21:16,750 --> 01:21:20,950 is I want a to point to x and b to point to y. 1810 01:21:20,950 --> 01:21:24,430 I don't want them to equal x and y because now I 1811 01:21:24,430 --> 01:21:27,220 can sort of follow the breadcrumbs, or the Chutes and Ladders idea, 1812 01:21:27,220 --> 01:21:28,553 whatever metaphor works for you. 1813 01:21:28,553 --> 01:21:34,122 You can go from a to x, you can go from b to y, and do the switcheroo there. 1814 01:21:34,122 --> 01:21:35,830 So the code I'm actually going to use now 1815 01:21:35,830 --> 01:21:37,788 is a little scary looking but it just goes back 1816 01:21:37,788 --> 01:21:40,540 to those first principles from the very start today. 1817 01:21:40,540 --> 01:21:44,140 I need to put, unfortunately, some asterisks all over the place here. 1818 01:21:44,140 --> 01:21:45,310 But let's see why.
1819 01:21:45,310 --> 01:21:47,860 First, let me actually back up for just a moment 1820 01:21:47,860 --> 01:21:53,440 and propose that the swap code I'm going to use now is not that in no swap dot c 1821 01:21:53,440 --> 01:21:58,060 but in a program called swap dot c. 1822 01:21:58,060 --> 01:22:02,830 So in swap dot c I have almost the same code, except this. 1823 01:22:02,830 --> 01:22:06,730 First of all, on line 13, I'm no longer passing in x and y, 1824 01:22:06,730 --> 01:22:10,330 I'm passing in the address of x and the address of y. 1825 01:22:10,330 --> 01:22:12,880 That was the key detail from earlier today when we first 1826 01:22:12,880 --> 01:22:13,870 introduced ampersand. 1827 01:22:13,870 --> 01:22:16,245 So this means, here's the address of x, the address of y. 1828 01:22:16,245 --> 01:22:19,300 It's like providing a map to swap so that it can go there. 1829 01:22:19,300 --> 01:22:23,110 The syntax for defining a function that accepts addresses 1830 01:22:23,110 --> 01:22:28,060 is unfortunately a little cryptic but name of the function, like swap, 1831 01:22:28,060 --> 01:22:30,880 the type of pointer, and the type of pointer. 1832 01:22:30,880 --> 01:22:36,820 So, int star a means, I accept the address of an int and call it a. 1833 01:22:36,820 --> 01:22:40,330 I also accept the address of another int and I call it b. 1834 01:22:40,330 --> 01:22:42,650 So that's all the star means in this context. 1835 01:22:42,650 --> 01:22:43,780 It's a pointer to an int. 1836 01:22:43,780 --> 01:22:46,600 It's a pointer to an int, both b and a. 1837 01:22:46,600 --> 01:22:50,560 Down here just gets a little scary looking but it's the same exact thing. 1838 01:22:50,560 --> 01:22:51,790 What does star a mean? 1839 01:22:51,790 --> 01:22:55,010 Well, star means go to that address. 1840 01:22:55,010 --> 01:22:59,180 So star a means follow the arrow to whatever a is pointing at. 1841 01:22:59,180 --> 01:23:02,160 And what was a pointing at?
1842 01:23:02,160 --> 01:23:03,610 It was pointing at x. 1843 01:23:03,610 --> 01:23:07,180 So this means go to the address in a and that will reach-- 1844 01:23:07,180 --> 01:23:10,110 that will lead you to x, whose value I think is 1. 1845 01:23:10,110 --> 01:23:12,400 And that's going to store the number 1 in temp. 1846 01:23:12,400 --> 01:23:14,230 The second line of code means go to b. 1847 01:23:14,230 --> 01:23:19,106 So if you follow the address in b, where does it lead you? 1848 01:23:19,106 --> 01:23:21,810 It should lead you to what we called y. 1849 01:23:21,810 --> 01:23:24,060 And that y was a 2. 1850 01:23:24,060 --> 01:23:27,690 And star a means go to the address in a and put whatever 1851 01:23:27,690 --> 01:23:30,692 was at the address in b there as well. 1852 01:23:30,692 --> 01:23:32,400 And then lastly, go ahead and take temp-- 1853 01:23:32,400 --> 01:23:35,010 which is just the number one I claim --and go ahead and put it 1854 01:23:35,010 --> 01:23:36,450 at the address in b. 1855 01:23:36,450 --> 01:23:38,020 It's hard to see this in code. 1856 01:23:38,020 --> 01:23:39,300 So let's instead visualize it. 1857 01:23:39,300 --> 01:23:43,560 Instead, if I go back here to these three lines of code, 1858 01:23:43,560 --> 01:23:45,540 here now is a correct version. 1859 01:23:45,540 --> 01:23:48,190 The first line of code here says go to-- 1860 01:23:48,190 --> 01:23:51,390 whatever-- go to the address in a and store it in temp. 1861 01:23:51,390 --> 01:23:53,730 So in a moment I'm going to go to the address in a 1862 01:23:53,730 --> 01:23:56,040 by following this arrow down to x. 1863 01:23:56,040 --> 01:23:59,700 And I'm going to store in temp the number 1. 1864 01:23:59,700 --> 01:24:02,520 Second line of code, I'm going to go to the address in b. 1865 01:24:02,520 --> 01:24:05,520 So that's like following the arrow, which leads me to the 2. 1866 01:24:05,520 --> 01:24:09,660 I then follow the address in a, which leads me to x.
1867 01:24:09,660 --> 01:24:12,750 And I put 2 in x. 1868 01:24:12,750 --> 01:24:14,598 Last line, I go to temp. 1869 01:24:14,598 --> 01:24:15,390 That's an easy one. 1870 01:24:15,390 --> 01:24:16,620 It's just the number 1. 1871 01:24:16,620 --> 01:24:20,620 Then I say, go to the address in b and store temp there. 1872 01:24:20,620 --> 01:24:23,730 So let's go to the address in b by following the arrow 1873 01:24:23,730 --> 01:24:26,160 and change it to temp. 1874 01:24:26,160 --> 01:24:28,530 And so now I've still called another function. 1875 01:24:28,530 --> 01:24:31,440 I'm still using local variables but these local variables 1876 01:24:31,440 --> 01:24:35,490 are by definition now, pointers, addresses, or sort of treasure maps 1877 01:24:35,490 --> 01:24:36,760 that are leading me-- 1878 01:24:36,760 --> 01:24:41,890 a la these arrows --to the values in memory that I actually care about. 1879 01:24:41,890 --> 01:24:43,650 And so now when the swap function returns, 1880 01:24:43,650 --> 01:24:46,710 it doesn't matter that a and b and temp go away, 1881 01:24:46,710 --> 01:24:53,660 I have actually changed fundamentally, what x and y themselves were. 1882 01:24:53,660 --> 01:24:57,100 Any questions then on that code? 1883 01:24:57,100 --> 01:24:57,746 Yeah. 1884 01:24:57,746 --> 01:25:01,218 AUDIENCE: [INAUDIBLE] 1885 01:25:01,218 --> 01:25:07,840 1886 01:25:07,840 --> 01:25:09,090 DAVID J. MALAN: Good question. 1887 01:25:09,090 --> 01:25:12,830 So in this case, there is nothing to free because we did not use malloc. 1888 01:25:12,830 --> 01:25:15,000 So you can use addresses without using malloc. 
1889 01:25:15,000 --> 01:25:17,000 In this case, I'm using the address of operator, 1890 01:25:17,000 --> 01:25:19,750 which just tells me where x and y is-- 1891 01:25:19,750 --> 01:25:20,418 or-- 1892 01:25:20,418 --> 01:25:22,460 AUDIENCE: Not with this [INAUDIBLE],, in general, 1893 01:25:22,460 --> 01:25:25,128 would you use malloc [INAUDIBLE] 1894 01:25:25,128 --> 01:25:26,670 DAVID J. MALAN: Really good question. 1895 01:25:26,670 --> 01:25:31,190 So if you're using malloc in a function and it returns some chunk of memory, 1896 01:25:31,190 --> 01:25:32,540 how do you deal with that? 1897 01:25:32,540 --> 01:25:35,600 The onus is on you to remember to somehow call free 1898 01:25:35,600 --> 01:25:37,340 on that same block of memory. 1899 01:25:37,340 --> 01:25:38,990 Case in point, getString does this. 1900 01:25:38,990 --> 01:25:42,140 Long story short, getString allocates memory using malloc. 1901 01:25:42,140 --> 01:25:45,230 And you, up to this date have never had to call 1902 01:25:45,230 --> 01:25:48,760 free on strings, that's actually because one of the features of the CS50 library 1903 01:25:48,760 --> 01:25:50,635 is something called garbage collection, where 1904 01:25:50,635 --> 01:25:54,050 we notice if your program quits without freeing memory from getString. 1905 01:25:54,050 --> 01:25:55,700 We do it for you magically. 1906 01:25:55,700 --> 01:25:57,830 But you can see in the CS50 library how you can 1907 01:25:57,830 --> 01:25:59,560 do exactly what you're asking about. 1908 01:25:59,560 --> 01:26:02,250 And, or just ask me after as well. 1909 01:26:02,250 --> 01:26:02,750 All right. 1910 01:26:02,750 --> 01:26:06,530 So this is only to say that, OK, after all of last week's presumption 1911 01:26:06,530 --> 01:26:09,510 that we could actually swap values, we can in fact do it. 1912 01:26:09,510 --> 01:26:13,382 So how can we go about now solving more interesting, more real world problems? 
1913 01:26:13,382 --> 01:26:15,590 Well, let's transition from here to some of the power 1914 01:26:15,590 --> 01:26:18,410 now that we gain by understanding these kinds of primitives. 1915 01:26:18,410 --> 01:26:22,280 First of all, you might have noticed or anticipated this wasn't necessarily 1916 01:26:22,280 --> 01:26:23,440 the best design. 1917 01:26:23,440 --> 01:26:23,990 Right? 1918 01:26:23,990 --> 01:26:26,640 What strikes you as worrisome about this picture at the moment? 1919 01:26:26,640 --> 01:26:27,890 AUDIENCE: They're gonna crash. 1920 01:26:27,890 --> 01:26:29,360 DAVID J. MALAN: Right, they're going to collide with each other. 1921 01:26:29,360 --> 01:26:29,860 Right? 1922 01:26:29,860 --> 01:26:32,698 If I keep calling malloc, malloc, malloc, malloc, per the arrow, 1923 01:26:32,698 --> 01:26:35,240 I claim that you're going to keep using more and more memory. 1924 01:26:35,240 --> 01:26:37,657 But it turns out you're going to keep using the stack too. 1925 01:26:37,657 --> 01:26:40,040 If you call function, function, function, function, 1926 01:26:40,040 --> 01:26:44,240 you're going to collide or somehow overrun each of these chunks of memory. 1927 01:26:44,240 --> 01:26:46,530 And in fact, recall recursion from last week. 1928 01:26:46,530 --> 01:26:49,670 If you don't have that base case and a function calls itself forever, 1929 01:26:49,670 --> 01:26:52,718 you have what's actually called a stack overflow. 1930 01:26:52,718 --> 01:26:55,760 And those of you familiar with the popular website for programmers, stack 1931 01:26:55,760 --> 01:26:58,940 overflow derives its name from exactly that idea, the fact 1932 01:26:58,940 --> 01:27:02,300 that a computer if running a program that has some bug-- 1933 01:27:02,300 --> 01:27:06,260 whereby, function calls itself again, and again, and again, and again, 1934 01:27:06,260 --> 01:27:09,110 and never stopping --you might overflow the stack. 
1935 01:27:09,110 --> 01:27:11,090 And there's other incarnations of that as well. 1936 01:27:11,090 --> 01:27:14,330 But that's one of the forms from which the website gets its name. 1937 01:27:14,330 --> 01:27:15,650 Heap overflow is the opposite. 1938 01:27:15,650 --> 01:27:18,192 When you keep calling malloc, malloc, malloc, malloc, and you 1939 01:27:18,192 --> 01:27:21,260 just ask for so much memory that you overwrite memory that's 1940 01:27:21,260 --> 01:27:22,820 being used by some of your functions. 1941 01:27:22,820 --> 01:27:24,920 Unfortunately, this is just the way life is. 1942 01:27:24,920 --> 01:27:28,290 If you have a finite amount of memory, there is this risk. 1943 01:27:28,290 --> 01:27:32,600 And this is why computers can only use so much memory before they indeed 1944 01:27:32,600 --> 01:27:35,870 can't oh, load more files for you, can't open more images for you, 1945 01:27:35,870 --> 01:27:40,520 or simply crash or freeze if the problem wasn't anticipated. 1946 01:27:40,520 --> 01:27:43,220 Those are generally known as buffer overflows. 1947 01:27:43,220 --> 01:27:46,790 So let's take off one final set of training wheels, if you will, 1948 01:27:46,790 --> 01:27:49,700 all of these functions that you asked about earlier today. 1949 01:27:49,700 --> 01:27:52,010 All of these functions, getFloat, getString, getDouble, 1950 01:27:52,010 --> 01:27:57,380 and so forth-- from the CS50 library --actually deal with pointers for you 1951 01:27:57,380 --> 01:27:59,990 and deal with memory addresses in a way that allows you not 1952 01:27:59,990 --> 01:28:01,610 to have to worry about them. 1953 01:28:01,610 --> 01:28:05,408 Let me go ahead and implement the same idea as getInt, 1954 01:28:05,408 --> 01:28:08,450 but the low level way that you would have to do it if you didn't actually 1955 01:28:08,450 --> 01:28:09,948 have CS50's library. 
1956 01:28:09,948 --> 01:28:11,990 I'm going to go ahead and create a program called 1957 01:28:11,990 --> 01:28:14,500 scan f for formatted scan. 1958 01:28:14,500 --> 01:28:18,510 And I'm going to go ahead and implement the following logic. 1959 01:28:18,510 --> 01:28:22,490 Let me go ahead and first give myself include standard I/O dot 1960 01:28:22,490 --> 01:28:24,740 h-- because I'm not going to use the CS50 library here 1961 01:28:24,740 --> 01:28:28,550 at all --int main void-- so I have a default function --let me give myself 1962 01:28:28,550 --> 01:28:29,900 a variable x. 1963 01:28:29,900 --> 01:28:33,620 And let me go ahead and ask the human for a value of x. 1964 01:28:33,620 --> 01:28:36,620 And then normally, I would have done this, 1965 01:28:36,620 --> 01:28:39,380 getInt and get the int from the user. 1966 01:28:39,380 --> 01:28:42,310 If we're taking away the CS50 library, we need an alternative. 1967 01:28:42,310 --> 01:28:44,060 And it turns out there's a function called 1968 01:28:44,060 --> 01:28:48,140 scanf and scanf is kind of similar to printf, 1969 01:28:48,140 --> 01:28:52,160 where you give it a format code, which signifies what it is you want to scan 1970 01:28:52,160 --> 01:28:54,620 from the user's keyboard, so to speak. 1971 01:28:54,620 --> 01:28:58,370 And you specify the address of a chunk of memory 1972 01:28:58,370 --> 01:29:01,550 that you want to put the user's input in. 1973 01:29:01,550 --> 01:29:04,250 And then I'm going to go ahead, just arbitrarily, and print out 1974 01:29:04,250 --> 01:29:08,370 that the human here typed in, for instance, that value. 1975 01:29:08,370 --> 01:29:09,530 So what's new here? 1976 01:29:09,530 --> 01:29:10,880 It's this line here. 
1977 01:29:10,880 --> 01:29:15,270 If we did not have the CS50 library and in turn, the getInt function, 1978 01:29:15,270 --> 01:29:18,800 this is the line of code you would instead have been using since Week 1 1979 01:29:18,800 --> 01:29:20,870 to get an integer from the user. 1980 01:29:20,870 --> 01:29:24,890 It's up to you on line 5 to declare the variable, like x, an int. 1981 01:29:24,890 --> 01:29:28,910 It's then up to you on line 7 to pass the address of that variable 1982 01:29:28,910 --> 01:29:33,440 to scanf because scanf's purpose in life is to give the human a blinking prompt. 1983 01:29:33,440 --> 01:29:36,260 And provided the human types in a number and hits enter, 1984 01:29:36,260 --> 01:29:40,590 that number will get stored at that address for you. 1985 01:29:40,590 --> 01:29:44,330 And the reason why you need to call a function like scanf here-- 1986 01:29:44,330 --> 01:29:47,900 or rather, the reason that you need to pass to scanf, the address of x, 1987 01:29:47,900 --> 01:29:50,210 is for the same reason as swapping. 1988 01:29:50,210 --> 01:29:53,660 If you want to use a helper function, something you wrote or someone else 1989 01:29:53,660 --> 01:29:57,500 wrote, and you want it to change the value of a variable, 1990 01:29:57,500 --> 01:29:59,180 you cannot pass it by value. 1991 01:29:59,180 --> 01:30:02,240 You can't just pass in x because it will get a copy. 1992 01:30:02,240 --> 01:30:03,770 And that will not persist. 1993 01:30:03,770 --> 01:30:06,650 You have to instead use ampersand x to pass the address 1994 01:30:06,650 --> 01:30:09,320 of x so that the function, swap-- 1995 01:30:09,320 --> 01:30:11,990 or in this case, scanf --can go to that address 1996 01:30:11,990 --> 01:30:14,720 and put some value there for you.
1997 01:30:14,720 --> 01:30:17,000 Unfortunately, what scanf does not do is if the user 1998 01:30:17,000 --> 01:30:19,520 types in Emma instead of an int, it's quite 1999 01:30:19,520 --> 01:30:21,440 possible the program will choke, or crash, 2000 01:30:21,440 --> 01:30:23,270 or behave in some unpredictable way. 2001 01:30:23,270 --> 01:30:26,480 There's no error checking built in to scanf in this case. 2002 01:30:26,480 --> 01:30:27,690 But let's try another thing. 2003 01:30:27,690 --> 01:30:29,732 It's not that interesting to read in just an int. 2004 01:30:29,732 --> 01:30:31,830 Let's try to read in something like a string. 2005 01:30:31,830 --> 01:30:33,560 So I could give myself a string s-- 2006 01:30:33,560 --> 01:30:35,990 although we know that there is no such thing as string. 2007 01:30:35,990 --> 01:30:38,420 That's technically a char star or the address 2008 01:30:38,420 --> 01:30:42,720 of a character called s --let me go ahead and prompt the human for string 2009 01:30:42,720 --> 01:30:43,970 s here. 2010 01:30:43,970 --> 01:30:47,150 And let me go ahead and read into that string using 2011 01:30:47,150 --> 01:30:50,870 the percent s format code, the value s. 2012 01:30:50,870 --> 01:30:55,670 And then let me go ahead and print out what the human typed for us, s colon 2013 01:30:55,670 --> 01:30:56,660 that. 2014 01:30:56,660 --> 01:30:59,270 So what am I doing here? 2015 01:30:59,270 --> 01:31:03,320 Line 5 is saying, give me a variable called s 2016 01:31:03,320 --> 01:31:06,650 that's going to store the address of a character. 2017 01:31:06,650 --> 01:31:08,570 Line 6 just says, s colon, like print. 2018 01:31:08,570 --> 01:31:11,150 It's a prompt for the human, nothing too interesting there. 2019 01:31:11,150 --> 01:31:13,840 scanf is this function that takes the format code 2020 01:31:13,840 --> 01:31:19,010 so it knows what to read from the user's keyboard and the address of a place 2021 01:31:19,010 --> 01:31:19,977 to put it. 
2022 01:31:19,977 --> 01:31:22,310 And char star-- this is an address --I don't need to use 2023 01:31:22,310 --> 01:31:25,610 ampersand because unlike an int, char star is already, 2024 01:31:25,610 --> 01:31:28,580 by definition, a pointer or an address. 2025 01:31:28,580 --> 01:31:31,850 And then lastly, I just print out whatever the human typed in. 2026 01:31:31,850 --> 01:31:33,770 Unfortunately, let's see what happens here. 2027 01:31:33,770 --> 01:31:37,640 Let me go ahead and save this. 2028 01:31:37,640 --> 01:31:42,740 Make scanf-- give myself a bigger terminal window --enter. 2029 01:31:42,740 --> 01:31:43,590 Oh, my goodness. 2030 01:31:43,590 --> 01:31:44,090 All right. 2031 01:31:44,090 --> 01:31:44,965 So what's wrong here? 2032 01:31:44,965 --> 01:31:47,645 Variable s is uninitialized when used here. 2033 01:31:47,645 --> 01:31:49,520 So Clang is trying to protect me from myself. 2034 01:31:49,520 --> 01:31:52,070 I haven't initialized s to an address. 2035 01:31:52,070 --> 01:31:53,810 Where do we want to put Emma's name? 2036 01:31:53,810 --> 01:31:57,680 Well, maybe we could do like 0x123, or something like this, 2037 01:31:57,680 --> 01:31:58,992 or in the absence of that-- 2038 01:31:58,992 --> 01:32:00,950 if you don't know the address in advance --null 2039 01:32:00,950 --> 01:32:02,783 is the convention to which it's alluding to. 2040 01:32:02,783 --> 01:32:08,310 N-U-L-L is a special pointer that means there is no pointer there. 2041 01:32:08,310 --> 01:32:09,140 It's all 0s. 2042 01:32:09,140 --> 01:32:12,230 Let me try this again, make scanf-- 2043 01:32:12,230 --> 01:32:15,080 OK, it seemed to work --dot slash scanf. 2044 01:32:15,080 --> 01:32:17,374 Let me go ahead and type in Emma. 2045 01:32:17,374 --> 01:32:18,230 Hmm. 2046 01:32:18,230 --> 01:32:19,230 Emma is null. 2047 01:32:19,230 --> 01:32:20,990 Let me try that again. 
2048 01:32:20,990 --> 01:32:25,400 So Emma is the Head CA for CS50-- 2049 01:32:25,400 --> 01:32:27,590 let's type a longer string --null. 2050 01:32:27,590 --> 01:32:31,880 So nothing even seems to fit, not even the first letter of her name. 2051 01:32:31,880 --> 01:32:33,050 So why is that? 2052 01:32:33,050 --> 01:32:35,990 And actually, sometimes we can get the program to crash. 2053 01:32:35,990 --> 01:32:40,136 Let's see, a little weird but, let's do this. 2054 01:32:40,136 --> 01:32:41,585 [CHUCKLES] 2055 01:32:41,585 --> 01:32:45,940 2056 01:32:45,940 --> 01:32:47,402 So a longer string-- 2057 01:32:47,402 --> 01:32:48,610 slightly creepy now, perhaps. 2058 01:32:48,610 --> 01:32:50,710 But, OK. 2059 01:32:50,710 --> 01:32:51,690 --enter. 2060 01:32:51,690 --> 01:32:52,390 Dammit. 2061 01:32:52,390 --> 01:32:53,380 Emma not found. 2062 01:32:53,380 --> 01:32:56,080 OK, not what I intended. 2063 01:32:56,080 --> 01:32:58,160 Let's do this once more. 2064 01:32:58,160 --> 01:32:58,660 Oh, my god. 2065 01:32:58,660 --> 01:33:06,110 Now, my histor-- OK, dot slash scanf, Emma, Emma, Emma, Emma, enter. 2066 01:33:06,110 --> 01:33:07,016 Dammit. 2067 01:33:07,016 --> 01:33:09,200 [LAUGHTER] 2068 01:33:09,200 --> 01:33:12,500 OK, well, either way it's broken, which was the only point I'm trying to make. 2069 01:33:12,500 --> 01:33:13,000 [LAUGHTER] 2070 01:33:13,000 --> 01:33:15,370 So why is this not actually working? 2071 01:33:15,370 --> 01:33:18,220 Well, you have to remember what char star s means. 2072 01:33:18,220 --> 01:33:20,740 This means, give me a variable in which I can 2073 01:33:20,740 --> 01:33:23,590 store the address of a chunk of memory. 2074 01:33:23,590 --> 01:33:26,980 Null, at the moment is a symbol that means, 2075 01:33:26,980 --> 01:33:29,050 like, there is no memory allocated yet. 
2076 01:33:29,050 --> 01:33:33,310 So technically speaking, I've not actually allocated any memory for Emma 2077 01:33:33,310 --> 01:33:34,810 to actually be stored in. 2078 01:33:34,810 --> 01:33:37,240 So really what I should be doing is something like this. 2079 01:33:37,240 --> 01:33:39,157 If I know in advance, a little presumptuously, 2080 01:33:39,157 --> 01:33:40,990 that the human's going to type in Emma, let 2081 01:33:40,990 --> 01:33:45,520 me go ahead and give myself an array called s of size 5 2082 01:33:45,520 --> 01:33:48,670 and then pass this in on line 7. 2083 01:33:48,670 --> 01:33:50,240 So in short, there's this-- 2084 01:33:50,240 --> 01:33:52,865 there's this relationship between arrays and pointers 2085 01:33:52,865 --> 01:33:55,240 that's sort of been latent throughout today's discussion. 2086 01:33:55,240 --> 01:33:58,390 An array is just a chunk of memory back-to-back-to-back. 2087 01:33:58,390 --> 01:34:02,170 A string is just a sequence of characters back-to-back-to-back. 2088 01:34:02,170 --> 01:34:05,890 A string is technically an address of the first byte of that memory. 2089 01:34:05,890 --> 01:34:08,110 And so sort of by transitivity, a pointer 2090 01:34:08,110 --> 01:34:12,560 can be viewed as the same thing as an array, at least in this context. 2091 01:34:12,560 --> 01:34:15,940 So let me go ahead and allocate myself an array of five characters. 2092 01:34:15,940 --> 01:34:22,030 It turns out that Clang will treat the name of an array just like a pointer 2093 01:34:22,030 --> 01:34:25,240 if you use it in this context to scanf, passing 2094 01:34:25,240 --> 01:34:28,670 in the address of the first byte in that array. 2095 01:34:28,670 --> 01:34:31,630 So now if I go ahead and make scanf with this third version 2096 01:34:31,630 --> 01:34:34,960 and do dot slash scanf and type in Emma-- 2097 01:34:34,960 --> 01:34:35,980 that's four characters. 
2098 01:34:35,980 --> 01:34:39,220 I know safely I'm leaving room for the null terminator --now 2099 01:34:39,220 --> 01:34:41,140 it's storing Emma's name successfully. 2100 01:34:41,140 --> 01:34:45,260 And if I go ahead and do this here, emma, in lower case, that works. 2101 01:34:45,260 --> 01:34:49,780 And if I get a little greedy and do like Emma Humphrey, first name, last name, 2102 01:34:49,780 --> 01:34:50,740 Hmm. 2103 01:34:50,740 --> 01:34:52,000 It didn't work. 2104 01:34:52,000 --> 01:34:52,930 But why might that be? 2105 01:34:52,930 --> 01:34:54,847 I haven't allocated enough space for her name. 2106 01:34:54,847 --> 01:34:57,290 I'm lucky frankly, that the program's not crashing. 2107 01:34:57,290 --> 01:35:00,130 But if I loaded as I was trying to do, a big enough paragraph 2108 01:35:00,130 --> 01:35:03,980 of text, my program outright might crash or segfault, 2109 01:35:03,980 --> 01:35:06,670 so to speak-- an error message that you'll likely see this week 2110 01:35:06,670 --> 01:35:09,850 or next as we continue to use memory. 2111 01:35:09,850 --> 01:35:11,860 Let me do one final example now because there's 2112 01:35:11,860 --> 01:35:14,890 one sort of power we now get that we have the ability 2113 01:35:14,890 --> 01:35:17,860 to talk in terms of memory addresses. 2114 01:35:17,860 --> 01:35:21,070 I'm going to go ahead and make a program here, reminiscent of last week, 2115 01:35:21,070 --> 01:35:23,860 called phone book dot C, whose purpose in life 2116 01:35:23,860 --> 01:35:27,720 is going to be to store some information in a file-- 2117 01:35:27,720 --> 01:35:28,720 for the very first time. 2118 01:35:28,720 --> 01:35:31,137 I'm going to use the CS50 library just to put the training 2119 01:35:31,137 --> 01:35:34,870 wheels back on briefly so I can get input from the human easily. 
2120 01:35:34,870 --> 01:35:38,410 But I'm going to go ahead then and use the string library and standard I/O, 2121 01:35:38,410 --> 01:35:39,820 int main void. 2122 01:35:39,820 --> 01:35:41,990 And I'm going to go ahead and do the following. 2123 01:35:41,990 --> 01:35:45,640 I'm going to go ahead and open a file called 2124 01:35:45,640 --> 01:35:52,030 file, using a new function called fopen, phone book dot CSV, a. 2125 01:35:52,030 --> 01:35:53,720 Now what is going on here? 2126 01:35:53,720 --> 01:35:56,770 Well it turns out, now that we know pointers-- or starting 2127 01:35:56,770 --> 01:35:59,410 to get comfortable with pointers over the next couple of weeks 2128 01:35:59,410 --> 01:36:03,490 --notice that I can actually use a new data type-- it's weirdly capitalized-- 2129 01:36:03,490 --> 01:36:05,020 all caps, FILE. 2130 01:36:05,020 --> 01:36:09,370 But I can say, give me a pointer to a file and call it lower case file. 2131 01:36:09,370 --> 01:36:13,090 So this is just a variable called file, that effectively, for today's purposes, 2132 01:36:13,090 --> 01:36:15,687 is going to store the contents of a file for me. 2133 01:36:15,687 --> 01:36:18,520 It's not technically doing that but that's a reasonable mental model 2134 01:36:18,520 --> 01:36:19,270 for now. 2135 01:36:19,270 --> 01:36:23,830 fopen takes, as its first argument, the name of the file you want to open. 2136 01:36:23,830 --> 01:36:29,320 And the second argument is either r, or w, or a-- r, for read, w, for write, 2137 01:36:29,320 --> 01:36:30,340 a, for append-- 2138 01:36:30,340 --> 01:36:32,082 to just keep adding to a file.
2139 01:36:32,082 --> 01:36:33,790 The goal at hand is to write a phone book 2140 01:36:33,790 --> 01:36:36,610 program that lets me type in a human's name and number 2141 01:36:36,610 --> 01:36:38,360 and just keep appending it to a text file, 2142 01:36:38,360 --> 01:36:41,443 like a database that I can store if I want to keep track of people's phone 2143 01:36:41,443 --> 01:36:41,980 numbers. 2144 01:36:41,980 --> 01:36:46,790 fopen, by definition, is going to return a pointer to that file. 2145 01:36:46,790 --> 01:36:49,523 So let me go ahead now and do the following. 2146 01:36:49,523 --> 01:36:52,690 First, I'm going to go ahead and give myself a name, although I don't really 2147 01:36:52,690 --> 01:36:53,890 need to use string per se. 2148 01:36:53,890 --> 01:36:55,300 I'll use char star name. 2149 01:36:55,300 --> 01:36:58,450 But I am going to use getString just to save myself some trouble here, 2150 01:36:58,450 --> 01:37:00,070 asking the human for their name. 2151 01:37:00,070 --> 01:37:03,530 I am going to then ask the human for their number using getString as well. 2152 01:37:03,530 --> 01:37:05,350 But again I could use scanf If I want. 2153 01:37:05,350 --> 01:37:08,770 But it's going to require more error checking today. 2154 01:37:08,770 --> 01:37:10,550 And now I'm going to go ahead and do this. 2155 01:37:10,550 --> 01:37:12,970 It turns out that besides the function printf, 2156 01:37:12,970 --> 01:37:17,290 there's another function called fprintf, which means file printf. 2157 01:37:17,290 --> 01:37:19,400 You can print literally to a file. 2158 01:37:19,400 --> 01:37:23,740 So I'm going to go ahead here and now do print to this file, 2159 01:37:23,740 --> 01:37:28,160 print a string, and a comma, and another string, and then a new line. 2160 01:37:28,160 --> 01:37:31,840 And I'm going to go ahead and print out someone's name and then their number. 2161 01:37:31,840 --> 01:37:34,660 And then down here I'm going to close the file. 
2162 01:37:34,660 --> 01:37:37,620 So a bunch of new lines, but this one in short-- 2163 01:37:37,620 --> 01:37:45,280 I'll comment it --open file, get strings from user, print-- 2164 01:37:45,280 --> 01:37:49,720 that is write --strings to file, and then close file. 2165 01:37:49,720 --> 01:37:51,970 So new functions but pretty straightforward at least, 2166 01:37:51,970 --> 01:37:53,137 conceptually, I would argue, 2167 01:37:53,137 --> 01:37:56,320 in terms of what's happening, even though the syntax is a little strange. 2168 01:37:56,320 --> 01:37:59,260 But I did deliberately choose this file name, phone book dot CSV. 2169 01:37:59,260 --> 01:38:02,100 Does anyone know what a CSV is? 2170 01:38:02,100 --> 01:38:03,850 Yeah, comma separated variables. 2171 01:38:03,850 --> 01:38:04,900 It's like a very-- 2172 01:38:04,900 --> 01:38:07,990 comma separated values, it's a very simple spreadsheet format 2173 01:38:07,990 --> 01:38:11,410 that you can open in Excel, or Apple Numbers, or other tools like that. 2174 01:38:11,410 --> 01:38:14,390 So I can actually make my own CSV files kind of like this. 2175 01:38:14,390 --> 01:38:16,078 Let me go ahead and make phone book. 2176 01:38:16,078 --> 01:38:17,370 All right, that seemed to work. 2177 01:38:17,370 --> 01:38:19,630 Let me go ahead and do dot slash phone book. 2178 01:38:19,630 --> 01:38:22,360 And now it's asking for a name, so I'll do Emma. 2179 01:38:22,360 --> 01:38:27,340 And then I think her number last week was 555-0100, enter. 2180 01:38:27,340 --> 01:38:30,490 But notice this, if I type ls, besides all of the programs 2181 01:38:30,490 --> 01:38:34,430 we've written today, there's also this phone book dot CSV file. 2182 01:38:34,430 --> 01:38:37,270 And in fact, let me open up phone book dot CSV. 2183 01:38:37,270 --> 01:38:40,310 And there's Emma's name and number in a file.
2184 01:38:40,310 --> 01:38:42,490 Let me go ahead and run it once more and this time 2185 01:38:42,490 --> 01:38:48,280 do Rodrigo, like last week, 617-555-0101, enter. 2186 01:38:48,280 --> 01:38:50,950 And voila, his name just appeared in the file. 2187 01:38:50,950 --> 01:38:51,820 We'll do one more. 2188 01:38:51,820 --> 01:38:56,320 So Brian was 617-555-0102, enter. 2189 01:38:56,320 --> 01:38:58,870 And the CSV file is getting updated in real time. 2190 01:38:58,870 --> 01:39:02,290 And now if I actually go and download this file from the IDE 2191 01:39:02,290 --> 01:39:04,300 by control clicking or right clicking on it, 2192 01:39:04,300 --> 01:39:05,930 that ends up in my downloads folder. 2193 01:39:05,930 --> 01:39:08,930 And if I go ahead and click on this-- if you have something like Numbers 2194 01:39:08,930 --> 01:39:12,400 or Microsoft Excel installed and you use it for the very first time-- 2195 01:39:12,400 --> 01:39:16,300 you'll see that it's opened up a spreadsheet containing 2196 01:39:16,300 --> 01:39:17,783 those names and those numbers. 2197 01:39:17,783 --> 01:39:20,950 So if you've ever needed to do a sort of data science-like analysis of data, 2198 01:39:20,950 --> 01:39:23,470 you can actually write code that generates the data for you 2199 01:39:23,470 --> 01:39:29,470 in a CSV format and gives you these, perhaps, familiar, rows and columns. 2200 01:39:29,470 --> 01:39:34,360 But let me do one final example now that will motivate this coming week's 2201 01:39:34,360 --> 01:39:35,620 problem set challenges. 2202 01:39:35,620 --> 01:39:39,430 So I'm going to go ahead now and write a final program that-- whose 2203 01:39:39,430 --> 01:39:41,410 purpose in life is to detect this. 2204 01:39:41,410 --> 01:39:48,310 I have here in front of me a picture of Brian [LAUGHTER] in JPEG format. 
2205 01:39:48,310 --> 01:39:52,660 And I have a cat in GIF format-- which doesn't work in the IDE 2206 01:39:52,660 --> 01:39:56,020 but let me go ahead and download it locally --does look like this. 2207 01:39:56,020 --> 01:39:58,060 So it's this guy from a couple of weeks ago. 2208 01:39:58,060 --> 01:40:00,240 But both-- one is in GIF format, one is in JPEG, 2209 01:40:00,240 --> 01:40:01,990 which, if you're familiar with file formats, 2210 01:40:01,990 --> 01:40:04,030 are just different types of images. 2211 01:40:04,030 --> 01:40:10,000 Let me go ahead and write a program real quick that is called JPEG dot c. 2212 01:40:10,000 --> 01:40:15,430 And its purpose in life is just to check if a file passed by its name 2213 01:40:15,430 --> 01:40:18,430 at the command line is a JPEG or not. 2214 01:40:18,430 --> 01:40:22,270 I'm going to go ahead and include standard I/O dot h. 2215 01:40:22,270 --> 01:40:24,850 I'm going to call my function int main, but not void. 2216 01:40:24,850 --> 01:40:27,820 This time I'm going to use int argc, like last week, 2217 01:40:27,820 --> 01:40:31,633 and string argv open paren-- 2218 01:40:31,633 --> 01:40:32,800 open bracket closed bracket. 2219 01:40:32,800 --> 01:40:33,550 But you know what? 2220 01:40:33,550 --> 01:40:35,050 We don't need strings anymore. 2221 01:40:35,050 --> 01:40:38,858 This is actually what you've been typing sort of unknowingly the past week when 2222 01:40:38,858 --> 01:40:41,650 you were using command line arguments, or the past couple of weeks. 2223 01:40:41,650 --> 01:40:44,020 Now I'm going to go ahead and do a quick error check. 2224 01:40:44,020 --> 01:40:46,900 If argc does not equal 2, I'm just going to quit. 2225 01:40:46,900 --> 01:40:49,510 I want the human to type, not just the program's name, 2226 01:40:49,510 --> 01:40:51,250 but one other word as well.
2227 01:40:51,250 --> 01:40:54,340 I then want to go ahead and open up the file 2228 01:40:54,340 --> 01:40:56,300 that the human typed in at the prompt-- 2229 01:40:56,300 --> 01:40:59,890 which I claim is going to be the second word they type --so argv 1. 2230 01:40:59,890 --> 01:41:02,490 And I want to read it this time, not append line-by-line, 2231 01:41:02,490 --> 01:41:04,240 I just want to read it from the beginning. 2232 01:41:04,240 --> 01:41:07,270 And the key-- keyword for that is r. 2233 01:41:07,270 --> 01:41:10,660 And then I'm going to go ahead and actually do a little error check. 2234 01:41:10,660 --> 01:41:12,940 If file equals equals null-- 2235 01:41:12,940 --> 01:41:16,090 we haven't seen this before --but if fopen, if malloc, 2236 01:41:16,090 --> 01:41:19,300 if getString return error conditions, they actually 2237 01:41:19,300 --> 01:41:20,550 return the special value null. 2238 01:41:20,550 --> 01:41:23,217 But for now, let me just go ahead and say, something went wrong. 2239 01:41:23,217 --> 01:41:24,160 I'm going to return 1. 2240 01:41:24,160 --> 01:41:26,770 But we won't worry too much more about it for now. 2241 01:41:26,770 --> 01:41:30,850 So at this point I have opened file. 2242 01:41:30,850 --> 01:41:36,310 I have ensure user ran program with two words 2243 01:41:36,310 --> 01:41:39,310 at prompt, that's our argc use there. 2244 01:41:39,310 --> 01:41:41,180 Now let's go ahead and do this. 2245 01:41:41,180 --> 01:41:45,610 I'm going to go ahead and give myself an array of 3 bytes. 2246 01:41:45,610 --> 01:41:48,755 And I'm going to go ahead and use a function called fread-- 2247 01:41:48,755 --> 01:41:50,630 And we'll see more of this in the assignment. 2248 01:41:50,630 --> 01:41:52,080 So this is deliberately quick. 
2249 01:41:52,080 --> 01:41:56,350 --I pass in, as arguments, the array, the number of bytes I want to read, 2250 01:41:56,350 --> 01:41:59,410 how many times I want to read those bytes, and then the file 2251 01:41:59,410 --> 01:42:01,210 from which I want to read those bytes. 2252 01:42:01,210 --> 01:42:02,500 So that was a mouthful. 2253 01:42:02,500 --> 01:42:08,420 But collectively, these two lines of code read 3 bytes from file. 2254 01:42:08,420 --> 01:42:13,180 It just literally reads the first 24 bits, or 3 bytes-- 2255 01:42:13,180 --> 01:42:15,940 each of which is 8 bits --from the file. 2256 01:42:15,940 --> 01:42:17,420 And why am I doing this? 2257 01:42:17,420 --> 01:42:24,247 Well, it turns out, check if bytes are 0xFF, 0xD8, 0xFF. 2258 01:42:24,247 --> 01:42:26,080 So again, coming full circle to hexadecimal, 2259 01:42:26,080 --> 01:42:30,190 it turns out that in the documentation for the JPEG image format, 2260 01:42:30,190 --> 01:42:32,925 the first 3 bytes of any JPEG in the world-- 2261 01:42:32,925 --> 01:42:35,050 any photograph you've ever taken with your camera-- 2262 01:42:35,050 --> 01:42:38,200 start with FF, then D8, then FF. 2263 01:42:38,200 --> 01:42:41,770 This is a so-called magic number that the designers of the JPEG format 2264 01:42:41,770 --> 01:42:45,330 just decided, use this as a sort of clue at the beginning of the file that hey, 2265 01:42:45,330 --> 01:42:48,130 here comes a JPEG image. 2266 01:42:48,130 --> 01:42:49,210 So how do I do this? 2267 01:42:49,210 --> 01:42:53,970 It's actually pretty simple, if bytes 0 equals equals 0xFF-- 2268 01:42:53,970 --> 01:42:57,840 I can literally type hexadecimal in C --or byte-- 2269 01:42:57,840 --> 01:43:06,870 rather, and bytes 1 equals equals 0xD8, and bytes 2 equals equals 0xFF, 2270 01:43:06,870 --> 01:43:10,947 then it turns out, it's probably a JPEG. 2271 01:43:10,947 --> 01:43:12,030 There are some conditions. 2272 01:43:12,030 --> 01:43:13,405 We'll explore in the problem set.
2273 01:43:13,405 --> 01:43:16,380 So I'm just going to say maybe it's a JPEG. 2274 01:43:16,380 --> 01:43:20,220 But if that's not true, I am going to say with confidence, 2275 01:43:20,220 --> 01:43:25,110 no, it's not a JPEG if those first 3 bytes are not that. 2276 01:43:25,110 --> 01:43:27,060 And then for arcane reasons, I technically 2277 01:43:27,060 --> 01:43:28,920 need to make this what's called unsigned, 2278 01:43:28,920 --> 01:43:33,660 which means it's a number from 0 to 255, instead of negative 128 to 127. 2279 01:43:33,660 --> 01:43:37,170 But let me wave my hands at that, just so that we get this code right for now. 2280 01:43:37,170 --> 01:43:41,550 I'm going to go ahead and run JPEG and fail miserably. 2281 01:43:41,550 --> 01:43:43,400 What did I do wrong? 2282 01:43:43,400 --> 01:43:46,410 fopen is the name of that function-- sorry-- 2283 01:43:46,410 --> 01:43:48,390 make JPEG, good. 2284 01:43:48,390 --> 01:43:52,710 And now I'm going to run JPEG on my Brian 2285 01:43:52,710 --> 01:43:55,740 image, which is in my source for directory on the course's website. 2286 01:43:55,740 --> 01:43:57,420 He is maybe a JPEG. 2287 01:43:57,420 --> 01:44:01,770 And then I'm going to go ahead and do JPEG on source for cat dot GIF, which 2288 01:44:01,770 --> 01:44:06,720 is no, not a JPEG, which is to say that once you have the ability to express 2289 01:44:06,720 --> 01:44:10,770 pointers, we now have the programmatic capabilities, not only to write files, 2290 01:44:10,770 --> 01:44:12,370 but read them as well. 2291 01:44:12,370 --> 01:44:15,210 Now what can we actually use that information for? 2292 01:44:15,210 --> 01:44:19,500 Well it turns out what we'll be doing now, this coming week and beyond, 2293 01:44:19,500 --> 01:44:25,470 is exploring a number of features here of what's 2294 01:44:25,470 --> 01:44:29,010 called file I/O.
Long story short, if you've ever wondered really 2295 01:44:29,010 --> 01:44:32,290 what an image is-- we talked briefly about this in Week 0 2296 01:44:32,290 --> 01:44:33,540 --this is an image. 2297 01:44:33,540 --> 01:44:36,090 But it's in binary, 0s and 1s. 2298 01:44:36,090 --> 01:44:37,950 Does anyone know what this image is of? 2299 01:44:37,950 --> 01:44:38,982 AUDIENCE: A smiley face. 2300 01:44:38,982 --> 01:44:42,190 DAVID J. MALAN: Well, how did you-- are a nonzero number of you looking ahead 2301 01:44:42,190 --> 01:44:42,773 on the slides? 2302 01:44:42,773 --> 01:44:44,470 Because yes, it's a smiley face. 2303 01:44:44,470 --> 01:44:47,980 And you would only know this by assuming that 1 represents 2304 01:44:47,980 --> 01:44:50,630 a white pixel, 0 represents a black pixel, 2305 01:44:50,630 --> 01:44:53,440 and if we effectively have a grid of bits-- 2306 01:44:53,440 --> 01:44:57,410 1's and 0's --this from far back kind of looks like the simplest possible smiley 2307 01:44:57,410 --> 01:44:57,910 face. 2308 01:44:57,910 --> 01:45:01,420 So that's an image, or a bitmap, a map of bits, 2309 01:45:01,420 --> 01:45:03,725 that represent the pixels in an image. 2310 01:45:03,725 --> 01:45:06,100 So with problem set four, what we're going to start to do 2311 01:45:06,100 --> 01:45:08,858 is explore the world of forensics, first and foremost. 2312 01:45:08,858 --> 01:45:10,150 And we have a few minutes left. 2313 01:45:10,150 --> 01:45:13,100 And we're going to spend one of them on this little teaser here, 2314 01:45:13,100 --> 01:45:17,870 which is something that you might see typically on your typical CSI type 2315 01:45:17,870 --> 01:45:18,370 shows. 2316 01:45:18,370 --> 01:45:20,570 And let's motivate it as follows. 2317 01:45:20,570 --> 01:45:22,533 If we could dim the lights for this clip. 2318 01:45:22,533 --> 01:45:23,360 [VIDEO PLAYBACK] 2319 01:45:23,360 --> 01:45:25,520 - --we know? 
2320 01:45:25,520 --> 01:45:28,730 - That at 9:15, Ray Santoya was at the ATM. 2321 01:45:28,730 --> 01:45:32,600 - OK, so the question is, what was he doing at 9:16? 2322 01:45:32,600 --> 01:45:35,370 - Shooting the 9 millimeter at something. 2323 01:45:35,370 --> 01:45:37,040 Maybe he saw the sniper. 2324 01:45:37,040 --> 01:45:38,443 - Or was working with him? 2325 01:45:38,443 --> 01:45:39,130 [BEEPS] 2326 01:45:39,130 --> 01:45:41,460 - Wait, go back one. 2327 01:45:41,460 --> 01:45:42,582 - What do you see? 2328 01:45:42,582 --> 01:45:43,082 [TYPING] 2329 01:45:43,082 --> 01:45:44,054 [BEEPS] 2330 01:45:44,054 --> 01:45:49,890 2331 01:45:49,890 --> 01:45:51,900 - Bring his face up, full screen. 2332 01:45:51,900 --> 01:45:53,700 [BEEPS] 2333 01:45:53,700 --> 01:45:54,672 - His glasses. 2334 01:45:54,672 --> 01:45:55,630 - There's a reflection. 2335 01:45:55,630 --> 01:45:57,088 2336 01:45:57,088 --> 01:45:58,546 [TYPING] 2337 01:45:58,546 --> 01:46:00,004 [BEEPS] 2338 01:46:00,004 --> 01:46:02,920 [CHUCKLE] 2339 01:46:02,920 --> 01:46:05,336 [BEEPS] 2340 01:46:05,336 --> 01:46:05,836 [LAUGHTER] 2341 01:46:05,836 --> 01:46:07,820 - [INAUDIBLE] baseball team. 2342 01:46:07,820 --> 01:46:08,890 That's their logo. 2343 01:46:08,890 --> 01:46:11,190 - And he's talking to whoever's wearing that jacket. 2344 01:46:11,190 --> 01:46:13,360 - We may have a witness. 2345 01:46:13,360 --> 01:46:15,017 - To both shootings. 2346 01:46:15,017 --> 01:46:15,600 [END PLAYBACK] 2347 01:46:15,600 --> 01:46:18,183 DAVID J. MALAN: So at the risk of ruining a lot of TV for you, 2348 01:46:18,183 --> 01:46:19,270 this is not a thing. 2349 01:46:19,270 --> 01:46:22,170 You can't just say enhance and things get enhanced. 2350 01:46:22,170 --> 01:46:22,680 Why? 2351 01:46:22,680 --> 01:46:24,380 Well, here's that same picture of Brian. 2352 01:46:24,380 --> 01:46:27,480 And let's [LAUGHTER] look at this glint in his eye. 2353 01:46:27,480 --> 01:46:28,480 Let's see what's there. 
2354 01:46:28,480 --> 01:46:31,680 If we could zoom in on this, and then zoom in on this, and then 2355 01:46:31,680 --> 01:46:32,520 zoom in on this. 2356 01:46:32,520 --> 01:46:35,640 This is all of the data that is in Brian's eye. 2357 01:46:35,640 --> 01:46:37,920 There is no enhance at that point, when you're 2358 01:46:37,920 --> 01:46:42,360 looking at just pixels represented by colors, a la Week 0. 2359 01:46:42,360 --> 01:46:44,917 So what you'll do for this coming week in fact-- 2360 01:46:44,917 --> 01:46:46,750 in fact, let's actually make this more real. 2361 01:46:46,750 --> 01:46:50,040 If we could go back to the clip here for just 20 seconds, 2362 01:46:50,040 --> 01:46:52,213 if we could dim the lights once more. 2363 01:46:52,213 --> 01:46:52,880 [VIDEO PLAYBACK] 2364 01:46:52,880 --> 01:46:54,950 - Magnify that death sphere. 2365 01:46:54,950 --> 01:46:57,150 [BEEPS] 2366 01:46:57,150 --> 01:46:58,610 Why is it still blurry? 2367 01:46:58,610 --> 01:47:00,570 - That's all the resolution we have. 2368 01:47:00,570 --> 01:47:02,940 Making it bigger doesn't make it clearer. 2369 01:47:02,940 --> 01:47:04,757 - It does on CSI Miami. 2370 01:47:04,757 --> 01:47:05,340 [END PLAYBACK] 2371 01:47:05,340 --> 01:47:07,215 DAVID J. MALAN: So with that said, this week, 2372 01:47:07,215 --> 01:47:09,390 we will understand all the more how images work. 2373 01:47:09,390 --> 01:47:11,430 And here for instance, is a shot of the Charles River. 2374 01:47:11,430 --> 01:47:14,640 And for the first part of the problem set, we implement a number of Instagram 2375 01:47:14,640 --> 01:47:17,070 like filters, understanding how an image is represented 2376 01:47:17,070 --> 01:47:19,170 and how you therefore can transform it.
2377 01:47:19,170 --> 01:47:22,230 For instance, first, into grayscale, by writing your own grayscale 2378 01:47:22,230 --> 01:47:24,720 filter, into sepia, into-- 2379 01:47:24,720 --> 01:47:28,440 reflecting it on the opposite from left to right, blurring an image, 2380 01:47:28,440 --> 01:47:28,980 even still. 2381 01:47:28,980 --> 01:47:31,272 And if you're feeling more comfortable, to do something 2382 01:47:31,272 --> 01:47:33,810 called edge detection, which finds all of the edges 2383 01:47:33,810 --> 01:47:36,480 within a particular picture. 2384 01:47:36,480 --> 01:47:38,520 More than that, you will actually implement 2385 01:47:38,520 --> 01:47:41,160 code that recovers JPEG files. 2386 01:47:41,160 --> 01:47:43,900 We've been taking some photographs of people, places, and things. 2387 01:47:43,900 --> 01:47:46,530 Unfortunately, we accidentally deleted those photos 2388 01:47:46,530 --> 01:47:50,548 but first made a forensic image of the memory card from the camera, which 2389 01:47:50,548 --> 01:47:52,590 we will then provide to you so that you can write 2390 01:47:52,590 --> 01:47:55,470 code in C that recovers all of the seemingly 2391 01:47:55,470 --> 01:47:58,950 lost JPEGs from that forensic image. 2392 01:47:58,950 --> 01:48:01,740 And last but not least, it would not be a CS class 2393 01:48:01,740 --> 01:48:03,180 without a little bit of CS humor. 2394 01:48:03,180 --> 01:48:09,218 We thought we'd end on this one note, a joke that you will perhaps now get. 2395 01:48:09,218 --> 01:48:11,130 [LAUGHTER] 2396 01:48:11,130 --> 01:48:11,630 All right. 2397 01:48:11,630 --> 01:48:12,890 That's it for CS50. 2398 01:48:12,890 --> 01:48:13,890 We'll see you next time. 2399 01:48:13,890 --> 01:48:16,040 [MUSIC PLAYING]