1 00:00:00,506 --> 00:00:09,726 [ Silence ] 2 00:00:10,226 --> 00:00:10,836 >> Alright! 3 00:00:10,976 --> 00:00:12,126 Welcome back to CS50. 4 00:00:12,176 --> 00:00:14,476 This is the end of week 4. 5 00:00:14,476 --> 00:00:17,346 So, if you haven't already, do pull up 6 00:00:17,346 --> 00:00:21,306 or follow along while I do here a simple Google search 7 00:00:21,606 --> 00:00:23,686 for this keyword recursion 8 00:00:23,686 --> 00:00:27,376 which we discussed last week, and do you get it? 9 00:00:27,376 --> 00:00:28,446 [ Inaudible Remarks ] 10 00:00:28,446 --> 00:00:30,096 >> Ah, ha ha ha. 11 00:00:30,096 --> 00:00:32,606 Okay, so many new jokes available to us now. 12 00:00:33,006 --> 00:00:39,266 So, someone has a sense of humor on the Google humor team, right? 13 00:00:39,266 --> 00:00:39,333 [ Inaudible Remarks ] 14 00:00:39,333 --> 00:00:44,276 >> Alright, we're all a bunch of dorks now, alright. 15 00:00:44,276 --> 00:00:47,366 So, a couple of recaps and announcements. 16 00:00:47,366 --> 00:00:50,166 So last time, recall that we focused on algorithms 17 00:00:50,166 --> 00:00:52,796 and actually solving some problems a bit more efficiently 18 00:00:53,006 --> 00:00:56,616 and we started to use this so-called asymptotic notation, 19 00:00:56,616 --> 00:00:58,986 just big O in omega and there's also feta [phonetic] 20 00:00:58,986 --> 00:01:00,976 but we'll focus mostly on big O in-- 21 00:01:01,216 --> 00:01:04,176 in Omega and just to toss some jargon out here, 22 00:01:04,176 --> 00:01:06,876 know that we focused a lot on big O of N linear 23 00:01:07,076 --> 00:01:10,706 and so linear then perhaps brings back memories 24 00:01:10,706 --> 00:01:12,706 of the straight lines that we do while graphing. 25 00:01:12,706 --> 00:01:16,216 We looked at big O of N squared, bubble sort, selection sort, 26 00:01:16,216 --> 00:01:18,656 these were in big O of N squared and we found 27 00:01:18,656 --> 00:01:20,146 that that was generally slow. 28 00:01:20,146 --> 00:01:22,656 If definitely felt or looked slow when we compared it 29 00:01:22,936 --> 00:01:25,306 to something faster, namely N log N. Now, 30 00:01:25,866 --> 00:01:29,016 frankly most people don't tend to say supralinear 31 00:01:29,016 --> 00:01:32,936 but it is an apt word but N log N was kind of a sweet spot 32 00:01:32,986 --> 00:01:36,326 between N squared which was really slow and N 33 00:01:36,326 --> 00:01:38,726 which was generally unrealistic, at least thus far 34 00:01:38,726 --> 00:01:40,086 for doing something like sorting. 35 00:01:40,346 --> 00:01:42,516 But know too that anytime you have an algorithm 36 00:01:42,516 --> 00:01:46,636 that takes the same amount of time, irrespective of the size 37 00:01:46,746 --> 00:01:49,046 of the problem, irrespective of the number of students 38 00:01:49,046 --> 00:01:52,316 in the room, irrespective of the number of inputs provided, 39 00:01:52,496 --> 00:01:54,386 well that's what we would call constant time. 40 00:01:54,386 --> 00:01:58,086 And just as we cross out things like divided by 2 and times 2, 41 00:01:58,266 --> 00:02:00,776 anytime you wanna express the notion that an algorithm runs 42 00:02:00,776 --> 00:02:03,176 in constant time, 5 seconds, 10 seconds, 43 00:02:03,176 --> 00:02:04,746 no matter how big the input is, 44 00:02:05,036 --> 00:02:07,436 you would typically say big O of 1. 45 00:02:07,436 --> 00:02:09,556 Well, what's an example of a constant algorithm? 46 00:02:09,826 --> 00:02:12,266 Well, I mean you can think of something simple like-- 47 00:02:12,266 --> 00:02:16,476 at the start of lecture, rather than taking something 48 00:02:16,476 --> 00:02:19,006 like attendance, the algorithm can be greet students, 49 00:02:19,286 --> 00:02:21,956 so I execute that algorithm, "hello", 50 00:02:21,956 --> 00:02:24,236 like no matter what that's gonna take constant time. 51 00:02:24,286 --> 00:02:27,306 Even if I say three things like "hello class", okay, 52 00:02:27,306 --> 00:02:29,456 so that's two steps but again we get rid 53 00:02:29,456 --> 00:02:32,376 of this constant multiplier so it's still constant time. 54 00:02:32,556 --> 00:02:36,196 We saw binary search with Shawn up at the board on video 55 00:02:36,196 --> 00:02:38,716 where log N was the ultimate conclusion there. 56 00:02:38,976 --> 00:02:40,156 But know that there are other things. 57 00:02:40,156 --> 00:02:41,966 Sometimes, you even can't do N squared, 58 00:02:41,966 --> 00:02:44,726 the problem is just too big or too complex, so you might have 59 00:02:44,726 --> 00:02:48,106 to have something polynomial, N raised to the something power, 60 00:02:48,286 --> 00:02:51,576 where that something, call it C, is bigger than 2, 61 00:02:51,786 --> 00:02:54,786 and really bad algorithms are things that are in, say, 62 00:02:55,006 --> 00:02:58,996 factorial-- N a factorial which we'll typically avoid 63 00:02:59,206 --> 00:03:01,866 but there are-- there is the potential to write algorithms 64 00:03:01,906 --> 00:03:03,126 that really are that slow. 65 00:03:03,126 --> 00:03:04,186 And then exponential is 66 00:03:04,186 --> 00:03:05,776 yet another one not even depicted here, 67 00:03:06,056 --> 00:03:07,746 and that too tends to be quite slow. 68 00:03:07,746 --> 00:03:10,266 But today we return now to the practice 69 00:03:10,266 --> 00:03:13,206 of actually coding things and understanding what's going 70 00:03:13,206 --> 00:03:17,666 on inside of a computer and peel back a layer called pointers 71 00:03:17,876 --> 00:03:21,226 that is both empowering as we'll see but also quite dangerous. 72 00:03:21,406 --> 00:03:22,626 So if you don't recall, 73 00:03:22,836 --> 00:03:25,336 WiFi should now be available here in Sanders. 74 00:03:25,386 --> 00:03:27,726 What we've also done, thanks to HUIT, 75 00:03:27,966 --> 00:03:30,516 is Quincy House whose wireless we broke last Wednesday 76 00:03:30,516 --> 00:03:34,046 at office hours since we had 209 students there in attendance. 77 00:03:34,496 --> 00:03:37,076 They have added six more wireless access points for us, 78 00:03:37,336 --> 00:03:39,886 so tonight's office hours in Quincy should be much improved, 79 00:03:39,886 --> 00:03:42,066 and the other houses have been bolstered as well. 80 00:03:42,356 --> 00:03:45,336 Also, too, if you've been trying to catch me for pass/fail forms 81 00:03:45,336 --> 00:03:47,986 or add/drop questions and I have been quiet on email, 82 00:03:47,986 --> 00:03:49,926 it's just because I've gotten kind of overwhelmed. 83 00:03:49,926 --> 00:03:52,456 Please don't hesitate to just re-ping me, and I'll do my best 84 00:03:52,456 --> 00:03:55,476 to catch up today and tomorrow, but certainly you engage me, 85 00:03:55,476 --> 00:03:58,596 or Matt, or Rob, or your TF in a chat if you have any questions 86 00:03:58,596 --> 00:04:00,516 or concerns at this point in the term. 87 00:04:00,986 --> 00:04:03,046 So, a little teaser before we move 88 00:04:03,046 --> 00:04:04,856 on to some milk and OJ here. 89 00:04:05,156 --> 00:04:07,766 So this was a-- a little video that should tease us 90 00:04:07,766 --> 00:04:11,086 as to today's focus, this topic of pointers. 91 00:04:11,526 --> 00:04:13,746 In a nutshell, a pointer is going to be an address, 92 00:04:13,926 --> 00:04:15,316 the address of something in memory 93 00:04:15,456 --> 00:04:17,986 but we'll see what damage we can actually do 94 00:04:17,986 --> 00:04:19,676 if we use these things incorrectly, 95 00:04:19,676 --> 00:04:23,556 so a clip from a certain Stanford University Computer 96 00:04:23,556 --> 00:04:24,466 Science professor here. 97 00:04:24,466 --> 00:04:25,516 [ Music ] 98 00:04:25,516 --> 00:04:32,096 >> Hey Binky, wake up, its time for Pointer Fun. 99 00:04:32,546 --> 00:04:34,156 >> What's that? 100 00:04:34,296 --> 00:04:35,636 Learn about pointers? 101 00:04:35,636 --> 00:04:37,016 Oh goody! 102 00:04:37,016 --> 00:04:37,826 [ Laughter ] 103 00:04:37,826 --> 00:04:40,856 >> That's all you get right now, we will conclude 104 00:04:40,856 --> 00:04:42,566 with the-- the follow up there. 105 00:04:42,566 --> 00:04:44,806 It's just fun to imagine Nick, 106 00:04:44,806 --> 00:04:47,486 this esteemed computer science professor at Stanford kind 107 00:04:47,486 --> 00:04:49,506 of playing at home with his clay models 108 00:04:49,506 --> 00:04:52,996 and recording himself do this, so more on that later today. 109 00:04:53,106 --> 00:04:57,606 So, recall that we ran into some problems early on when we tried 110 00:04:57,606 --> 00:04:59,866 to do something so simple like swapping variables 111 00:04:59,866 --> 00:05:02,706 and writing our own swap function also failed ultimately. 112 00:05:02,966 --> 00:05:05,096 And so consider how we solve the problem like this. 113 00:05:05,096 --> 00:05:07,536 I'm gonna get rid of things like instant charge here 114 00:05:07,536 --> 00:05:09,266 because we can just generalize with pseudo code, 115 00:05:09,506 --> 00:05:11,546 but here might be two variables, A and B, 116 00:05:11,756 --> 00:05:14,006 and they've already been initialized to be orange juice 117 00:05:14,046 --> 00:05:15,356 and milk, respectively. 118 00:05:15,526 --> 00:05:16,896 Well, suppose that the goal here is 119 00:05:16,896 --> 00:05:19,826 to swap the contents of A and B, right? 120 00:05:19,826 --> 00:05:24,256 And sort of intuitive term, the goal is to put this over here, 121 00:05:24,256 --> 00:05:26,226 this over here, but obviously 122 00:05:26,296 --> 00:05:29,526 if you take the straightforward approach, that kind of looks 123 00:05:29,526 --> 00:05:33,346 at first glance, correct, AB goes into A, A goes into B, 124 00:05:33,556 --> 00:05:36,166 of course, you get a bit of a mess if this is A, this is B 125 00:05:36,166 --> 00:05:39,806 and you try doing-- that is disgusting. 126 00:05:39,806 --> 00:05:42,576 This-- right, there's no way I'm gonna be able 127 00:05:42,576 --> 00:05:45,766 to move just the white stuff back over here on to the left. 128 00:05:45,766 --> 00:05:48,986 And so, right, this is buggy, right, broken algorithm, 129 00:05:48,986 --> 00:05:50,246 if that weren't clear already. 130 00:05:50,736 --> 00:05:51,866 So, how do we solve this? 131 00:05:52,156 --> 00:05:54,806 Perhaps, in real world, obvious problem. 132 00:05:56,386 --> 00:05:58,226 Right, so you have some kind of temporary variable. 133 00:05:58,226 --> 00:06:01,406 So if we introduce for instance just an extra 32 bits 134 00:06:01,406 --> 00:06:02,696 or whatever the data size is, 135 00:06:02,936 --> 00:06:09,286 then we can actually fix this whereby-- let's try this again-- 136 00:06:10,206 --> 00:06:16,816 whereby we have a fresh cup of A, fresh cup of B, 137 00:06:16,966 --> 00:06:21,116 and then we're gonna need-- whoops let's not do that. 138 00:06:21,116 --> 00:06:21,446 [ Laughter ] 139 00:06:21,446 --> 00:06:24,986 >> Alright, fresh cup of B, so now we need 140 00:06:24,986 --> 00:06:26,676 of course a temporary variable. 141 00:06:27,266 --> 00:06:30,036 If there's any suspense, really, we should know 142 00:06:30,036 --> 00:06:30,916 where this is going, right? 143 00:06:30,916 --> 00:06:35,406 So-- so we have A with orange juice, we put A into temp, 144 00:06:36,096 --> 00:06:37,516 okay, that was productive. 145 00:06:37,736 --> 00:06:41,206 Now, we have B going into A, so that's over here, 146 00:06:41,206 --> 00:06:43,586 little cross contamination but not nearly as bad. 147 00:06:43,586 --> 00:06:46,586 And now, temp goes into B, right, 148 00:06:46,646 --> 00:06:48,176 so this is kind of a fun game, right? 149 00:06:48,176 --> 00:06:48,996 We could do this all day. 150 00:06:48,996 --> 00:06:49,063 [ Laughter ] 151 00:06:49,063 --> 00:06:51,476 >> But thank you. 152 00:06:51,476 --> 00:06:51,846 Thank you. 153 00:06:52,346 --> 00:06:55,146 [ Applause ] 154 00:06:55,646 --> 00:06:56,516 >> So, not bad. 155 00:06:56,996 --> 00:06:59,046 We've solved it, but then we broke this, right? 156 00:06:59,086 --> 00:07:00,776 You could implement something like this, 157 00:07:00,776 --> 00:07:02,906 albeit in C in a function like main, 158 00:07:02,906 --> 00:07:04,436 and it would actually work. 159 00:07:04,436 --> 00:07:07,126 So you could swap two variables, A and B, if they're integers 160 00:07:07,126 --> 00:07:08,656 or whatnot, and it would work in main 161 00:07:08,846 --> 00:07:12,946 but then we kinda got arrogant and introduced our own function, 162 00:07:12,946 --> 00:07:15,756 swap, whereby we pass in A and B, 163 00:07:15,756 --> 00:07:17,496 and we claim this is good practice, right? 164 00:07:17,496 --> 00:07:19,926 You factor out sort of conceptual chunks of code 165 00:07:20,086 --> 00:07:22,686 into functions that are aptly named, we slapped the label, 166 00:07:22,686 --> 00:07:25,626 hierarchical dot decomposition on it, but then this broke, 167 00:07:25,746 --> 00:07:27,836 this simple, simple, simple thing of swapping. 168 00:07:27,836 --> 00:07:30,956 And what was the simple answer as to why the inputs 169 00:07:31,016 --> 00:07:34,416 to this function are not in fact swapped once this 170 00:07:34,416 --> 00:07:35,186 function executes? 171 00:07:36,216 --> 00:07:37,656 Yeah, so it's this issue of scope. 172 00:07:37,656 --> 00:07:41,196 When you pass in two variables, say X and Y, into a function 173 00:07:41,196 --> 00:07:43,016 like swap, what's actually passed 174 00:07:43,016 --> 00:07:44,896 in is two copies of those things. 175 00:07:44,896 --> 00:07:47,186 So, what's really been happening in the case 176 00:07:47,186 --> 00:07:52,396 of an actual swap function is if this is, say X and Y, 177 00:07:52,396 --> 00:07:53,826 and this exist in main. 178 00:07:53,956 --> 00:07:57,596 Well, when you actually call swap, what happens is-- 179 00:07:58,566 --> 00:08:01,366 I'm gonna have to do a little bit of a magic here. 180 00:08:01,666 --> 00:08:05,256 So if this is X, this is Y, this is going to be A, this is going 181 00:08:05,256 --> 00:08:08,166 to be B, what actually goes into this cup? 182 00:08:09,016 --> 00:08:11,216 Like it's a copy of whatever X is, 183 00:08:11,216 --> 00:08:13,406 so it wouldn't really make sense to pour this in here just 184 00:08:13,406 --> 00:08:15,686 yet so I'm just gonna cheat and say this is a copy, 185 00:08:15,686 --> 00:08:17,706 which pretty much is, half a glass of orange juice. 186 00:08:18,286 --> 00:08:21,166 Y goes into B so we have a perfect copy of this. 187 00:08:21,816 --> 00:08:23,906 And then what happens in this swap function? 188 00:08:23,906 --> 00:08:26,116 Well, the swap function again is actually correct. 189 00:08:26,116 --> 00:08:30,166 We have this local temp variable so we go ahead and move A 190 00:08:30,166 --> 00:08:33,326 into temp, then we go ahead and move B into A, 191 00:08:33,326 --> 00:08:36,866 and then we go ahead and move temp into B, 192 00:08:37,336 --> 00:08:39,206 this is all working really well 193 00:08:39,206 --> 00:08:41,306 and then we return, and what happens. 194 00:08:41,556 --> 00:08:43,806 Well, the slice of memory that was allocated 195 00:08:43,916 --> 00:08:46,626 to the swap function is effectively gone. 196 00:08:46,626 --> 00:08:47,896 I mean, it's actually still there 197 00:08:47,896 --> 00:08:49,486 but the computer just forgets about it 198 00:08:49,486 --> 00:08:50,996 so you kind of fast forward. 199 00:08:50,996 --> 00:08:53,126 If we ever call another function, we might end 200 00:08:53,126 --> 00:08:56,416 up with orange juice or milk by default in those new variables, 201 00:08:56,676 --> 00:08:57,796 just because of those remnants, 202 00:08:57,796 --> 00:08:59,566 so there actually is something still here in memory 203 00:08:59,786 --> 00:09:03,156 but now we have X and Y, we didn't really do anything. 204 00:09:03,596 --> 00:09:06,326 So how do we actually solve this? 205 00:09:06,466 --> 00:09:09,626 Like just kind of intuitively, if the swap function isn't 206 00:09:09,626 --> 00:09:12,736 so much a function but rather it's just a friend and you want 207 00:09:12,736 --> 00:09:15,846 to hand your friend, X and Y, and you want that friend 208 00:09:15,846 --> 00:09:17,766 to swap X and Y for you, 209 00:09:18,096 --> 00:09:21,746 what should you really be handing your friend? 210 00:09:21,926 --> 00:09:24,606 Yeah, so ideally, you just hand him or her the actual cups, 211 00:09:24,736 --> 00:09:28,006 the originals, X and Y, and let him or her do their thing 212 00:09:28,126 --> 00:09:29,696 with an extra temp variable-- 213 00:09:29,696 --> 00:09:31,846 variable and actually do the swap. 214 00:09:32,176 --> 00:09:35,616 So how do you actually hand to a function the original X and Y? 215 00:09:35,886 --> 00:09:37,686 Well, what you can do is this. 216 00:09:37,686 --> 00:09:39,636 If you have a little scrap paper, 217 00:09:40,166 --> 00:09:44,336 and you say to your friend, alright, well, let's suppose now 218 00:09:44,336 --> 00:09:46,926 that X-- actually this is an unrealistic, suppose that X 219 00:09:46,926 --> 00:09:49,736 and Y are kind of bolted to this table in the same sense 220 00:09:49,736 --> 00:09:53,266 that a variable X and Y are in one place in RAM, right, 221 00:09:53,266 --> 00:09:54,326 the variables aren't gonna move. 222 00:09:54,326 --> 00:09:56,916 They are where the computer allocated them in RAM. 223 00:09:57,186 --> 00:09:59,656 So, what I can do is I can't really hand my friend these 224 00:09:59,656 --> 00:10:01,876 variables because that would be to move them from memory, 225 00:10:02,106 --> 00:10:03,836 but I can say, you know what, dear friend, 226 00:10:04,256 --> 00:10:10,166 the OJ is at location 1, 2, 3, 4 in RAM and the location 227 00:10:10,166 --> 00:10:12,976 of Y is at 4, 5, 6, 7. 228 00:10:13,046 --> 00:10:14,376 >> I'm just making these numbers up. 229 00:10:14,616 --> 00:10:16,816 So what you hand your friend is not X and Y 230 00:10:17,076 --> 00:10:21,746 but rather this little chi-chi, the address of X and Y. So that 231 00:10:21,746 --> 00:10:24,266 when your friend receives this, he or she can say alright, 232 00:10:24,546 --> 00:10:27,566 the X is apparently at location 1, 2, 3, 4, 233 00:10:27,566 --> 00:10:29,666 so I need to swap these cups for David 234 00:10:29,966 --> 00:10:32,406 so I'm gonna allocate my temporary variable, TMP, 235 00:10:32,406 --> 00:10:34,756 I'm gonna go ahead and bring it with me 236 00:10:34,756 --> 00:10:37,166 to location 1, 2, 3, 4 in memory. 237 00:10:37,436 --> 00:10:41,986 Okay, I'm gonna go ahead now and pour that location's contents 238 00:10:42,076 --> 00:10:43,336 into the temporary variable. 239 00:10:43,556 --> 00:10:44,496 Now, what am I gonna do? 240 00:10:44,496 --> 00:10:47,776 Alright, I need to now go to location 4, 5, 6, 7, 241 00:10:48,106 --> 00:10:50,986 so I go there, I find it and I realize, oh okay, 242 00:10:51,166 --> 00:10:54,276 I'm gonna move its contents to that same temporary variable. 243 00:10:54,526 --> 00:10:55,846 And then what do I have to do? 244 00:10:55,846 --> 00:10:58,256 Right, well now I have to go back to the original cup 245 00:10:58,496 --> 00:11:00,376 and actually say, alright, now I have to move the-- 246 00:11:00,376 --> 00:11:03,296 I got completely lost-- what goes next? 247 00:11:04,256 --> 00:11:05,146 Orange juice goes next. 248 00:11:05,376 --> 00:11:08,926 I have to take those contents, move it over, 249 00:11:08,926 --> 00:11:10,826 and if I didn't completely screw up this algorithm, 250 00:11:11,136 --> 00:11:12,996 they are indeed swapped but they're swapped 251 00:11:12,996 --> 00:11:14,236 at the original locations 252 00:11:14,236 --> 00:11:16,906 because I actually followed this map, if you will, 253 00:11:17,156 --> 00:11:21,156 to my friend's cups of OJ and milk, change those values, 254 00:11:21,156 --> 00:11:23,566 albeit with the help of my extra temporary cup 255 00:11:24,046 --> 00:11:25,936 and then the original contents are changed. 256 00:11:26,236 --> 00:11:28,556 So on the one hand I'm saving memory 257 00:11:28,716 --> 00:11:31,086 and that I'm not making copies of milk 258 00:11:31,086 --> 00:11:33,846 and orange juice this time, but I am spending some memory-- 259 00:11:33,846 --> 00:11:35,296 what am I spending in this scenario? 260 00:11:35,736 --> 00:11:39,366 Whatever the costs of these pieces of paper are, 261 00:11:39,366 --> 00:11:41,706 so it turns out inside of a computer these little slips 262 00:11:41,706 --> 00:11:45,626 of paper will cost you generally 32 bits, a pointer, 263 00:11:45,626 --> 00:11:48,806 otherwise known as an address in memory will cost you 4 bytes 264 00:11:48,876 --> 00:11:51,766 or 32 bits, at least on many computers including 265 00:11:51,766 --> 00:11:52,276 the appliance. 266 00:11:52,606 --> 00:11:54,626 Now, if you're familiar with your own personal MAC 267 00:11:54,626 --> 00:11:57,506 or PC being 64 bit Intel Inside, 268 00:11:57,646 --> 00:11:59,596 well that actually means these slips of paper are twice 269 00:11:59,596 --> 00:12:01,876 as long, you can fit bigger addresses in them 270 00:12:02,126 --> 00:12:04,406 and so pointers can also be 64 bits. 271 00:12:04,596 --> 00:12:07,426 And on hardware that's 10 years old, 20 years old, 30 years old, 272 00:12:07,626 --> 00:12:09,626 you might have pointers that are just 16 bits 273 00:12:09,626 --> 00:12:11,506 or 8 bits, but that's all it is. 274 00:12:11,616 --> 00:12:13,776 A pointer is just the address of something in memory 275 00:12:14,016 --> 00:12:17,106 and so this seems to now solve the problem. 276 00:12:17,106 --> 00:12:20,086 We can just inform someone where these things actually are. 277 00:12:20,266 --> 00:12:21,406 So what's this gonna look like? 278 00:12:21,406 --> 00:12:22,536 Well, this is before. 279 00:12:22,536 --> 00:12:25,486 This is the broken version of swap. 280 00:12:25,646 --> 00:12:29,026 But the after version, if I now spoil this here, 281 00:12:29,556 --> 00:12:31,586 is going to be that. 282 00:12:32,866 --> 00:12:35,016 So we have to introduce a little bit of more syntax 283 00:12:35,116 --> 00:12:36,736 and there's actually one tweak we're gonna have 284 00:12:36,736 --> 00:12:37,446 to make to main. 285 00:12:37,686 --> 00:12:40,756 But if we want to rewrite our swap function 286 00:12:40,996 --> 00:12:45,796 so that it does still swap, but it swaps correctly by passing 287 00:12:45,796 --> 00:12:48,886 in the address of X and Y or whatever the inputs are called 288 00:12:48,886 --> 00:12:52,076 and actually changes them at their original locations, 289 00:12:52,366 --> 00:12:56,066 we essentially have to introduce some of these stars, 290 00:12:56,136 --> 00:12:58,266 these asterisks before the variable names. 291 00:12:58,656 --> 00:13:02,746 So we'll tease apart exactly why that is but let's now see-- 292 00:13:03,296 --> 00:13:05,636 let's now just toss out a bit of jargon so that 293 00:13:05,886 --> 00:13:06,926 when we start seeing these things 294 00:13:06,926 --> 00:13:08,356 on the screen, its not so scary. 295 00:13:08,356 --> 00:13:10,746 So I have to teach you how to count again, 296 00:13:11,126 --> 00:13:12,556 so we've done this a couple of times now, right? 297 00:13:12,556 --> 00:13:15,146 You came in presumably knowing decimal, well, 0 through 9, 298 00:13:15,376 --> 00:13:18,086 we spent a little bit of time in week 01 binary 299 00:13:18,086 --> 00:13:20,886 where you only have two digits, 0 and 1, but we realized 300 00:13:20,886 --> 00:13:24,126 in the first week that if I wanna express the number 0 301 00:13:24,396 --> 00:13:26,116 in binary, it's actually pretty easy. 302 00:13:26,116 --> 00:13:27,606 I can just say 0. 303 00:13:27,796 --> 00:13:29,216 But then remember that we said, 304 00:13:29,216 --> 00:13:32,206 you don't typically just use one bit, you typically work 305 00:13:32,206 --> 00:13:33,216 in units of at least what? 306 00:13:34,276 --> 00:13:36,696 So at least 8, a byte is 8 bits 307 00:13:36,926 --> 00:13:38,876 so even though it's a little tedious to write 308 00:13:39,366 --> 00:13:42,606 on a chalkboard, really, if you want to represent the number 0 309 00:13:42,606 --> 00:13:45,396 in binary, you would probably store 8 bits, 310 00:13:45,396 --> 00:13:47,826 all of which are 0, and it got pretty easy after this. 311 00:13:47,826 --> 00:13:50,936 If you wanna represent the number 1, you have seven zeroes 312 00:13:50,936 --> 00:13:53,506 and then one 1, but then we actually had to think 313 00:13:53,606 --> 00:13:55,556 for the first time, right, in this scenario 314 00:13:55,556 --> 00:14:02,266 where we can't just go 00000002 'cause we don't have that digit, 315 00:14:02,396 --> 00:14:06,196 we only have a-- a binary number of digits, 0 and 1, 316 00:14:06,536 --> 00:14:09,946 so we didn't do this, so what was the number 2 in binary? 317 00:14:11,036 --> 00:14:13,556 Yeah, so it's actually 1, 0, so we had to move this here, 318 00:14:13,556 --> 00:14:16,586 so it's like carrying the 1, this then becomes a 0. 319 00:14:16,716 --> 00:14:18,406 And if you then think back to what this is, 320 00:14:18,406 --> 00:14:22,006 this is the 1's columns, the 2's, 4's, 8, 16, 32, 64, 321 00:14:22,006 --> 00:14:24,646 128 and so forth, if you have more bits. 322 00:14:25,086 --> 00:14:27,536 So even though it's a little longer, it takes more chalk 323 00:14:27,646 --> 00:14:31,796 to write these numbers, we can represent all possible integers 324 00:14:31,796 --> 00:14:32,936 with which we already are familiar. 325 00:14:33,226 --> 00:14:35,086 So this is binary, we already know decimal, 326 00:14:35,306 --> 00:14:36,916 let's introduce a bit of hexadecimal. 327 00:14:36,916 --> 00:14:39,756 Hexadecimal is gonna be useful here only in so far as some 328 00:14:39,756 --> 00:14:40,946 of the numbers we're gonna start seeing 329 00:14:40,946 --> 00:14:43,676 on the screen are little less scary although in reality, 330 00:14:43,676 --> 00:14:45,186 we're rarely going to care 331 00:14:45,436 --> 00:14:48,126 about what the address of some variable is. 332 00:14:48,126 --> 00:14:53,266 It could be 12345, could be 4567, it could be a huge number 333 00:14:53,266 --> 00:14:56,006 of digits, we just care that these addresses exist, 334 00:14:56,196 --> 00:14:58,456 we're not gonna typically care what they actually are. 335 00:14:58,676 --> 00:15:00,646 But what we're gonna start seeing in a moment is 336 00:15:00,646 --> 00:15:03,206 that a hexadecimal number-- so that everyone can see, 337 00:15:03,206 --> 00:15:06,206 let me move to just the window up here. 338 00:15:07,326 --> 00:15:10,436 What we're gonna see now is a whole lot of numbers that start 339 00:15:10,436 --> 00:15:12,836 with 0X, which is just a human convention 340 00:15:12,836 --> 00:15:14,996 that says here comes a hexadecimal number. 341 00:15:15,186 --> 00:15:18,726 Hexa, in this case, means 16, so in a hexadecimal system, 342 00:15:18,726 --> 00:15:20,546 you don't have 2, you don't have 10, 343 00:15:20,756 --> 00:15:23,606 you have 16 digits available to you. 344 00:15:23,856 --> 00:15:25,856 So how do you go about counting in hexadecimal? 345 00:15:26,156 --> 00:15:29,606 Typically, you would start at 0, then you would go up to 1, 346 00:15:29,946 --> 00:15:31,846 and thankfully hexadecimal is actually a little easier, 347 00:15:31,846 --> 00:15:34,806 now you can go to 2, then you can go to 3, then you can go 348 00:15:34,806 --> 00:15:43,426 to 4, 5, 6, 7, 8, 9, and? 349 00:15:43,426 --> 00:15:44,206 [ Inaudible Remark ] 350 00:15:44,206 --> 00:15:47,076 >> Ooh, very good, so it's A, it's not 10. 351 00:15:47,836 --> 00:15:49,516 Right, so each of these digits, 352 00:15:49,516 --> 00:15:53,226 each of these values here can be any of 16 possible values. 353 00:15:53,226 --> 00:15:55,196 Of course, once you start counting at 0, 354 00:15:55,426 --> 00:15:59,236 you get all the way up to 7, 8, 9, and you wanna say 10, 355 00:15:59,436 --> 00:16:02,376 but expressing 10 takes two digits but we need-- 356 00:16:02,376 --> 00:16:05,566 we have 16 available so the world decided that after you get 357 00:16:05,566 --> 00:16:13,386 to 9, you actually go to A, to B, to C, to D, to-- oops, to E-- 358 00:16:14,606 --> 00:16:16,466 and capital or lower case doesn't really matter, 359 00:16:16,466 --> 00:16:17,286 its just a convention. 360 00:16:17,286 --> 00:16:20,906 And then after you get to F, which is actually the 16th digit 361 00:16:20,906 --> 00:16:22,376 if we count all these up, I've just typed 362 00:16:22,376 --> 00:16:25,686 out 16 hexadecimal digits, what do you get to after F? 363 00:16:26,846 --> 00:16:30,146 One zero, so then you carry the 1, you go from the biggest value 364 00:16:30,146 --> 00:16:31,776 which in the normal world is 9, 365 00:16:31,776 --> 00:16:33,196 and you carry the 1 and you get to10. 366 00:16:33,456 --> 00:16:36,246 In the hexadecimal world, you get to F, then you carry the 1 367 00:16:36,246 --> 00:16:41,806 and you get to 1-0 and then it's 1-1, 1-2, 1-3, 1-4, 1-- 368 00:16:41,806 --> 00:16:45,456 dot dot dot, 1A, 1B and you can keep counting up like this. 369 00:16:45,786 --> 00:16:48,196 So we're gonna start seeing a lot of things in hexadecimal. 370 00:16:48,346 --> 00:16:51,736 What's nice about hexadecimal is that each of these digits, 371 00:16:51,736 --> 00:16:53,226 and there's 16 possible ones, 372 00:16:53,456 --> 00:16:55,296 takes up an exact number of bits. 373 00:16:55,416 --> 00:16:57,656 Every hexadecimal digit is 4 bits, 374 00:16:58,066 --> 00:17:01,246 so if you write 4 bits followed by 4 bits, you get a byte, 375 00:17:01,636 --> 00:17:04,186 so it's very convenient to express numbers as hexadecimal 376 00:17:04,186 --> 00:17:06,796 because with just two digits, you can represent a whole byte, 377 00:17:06,796 --> 00:17:08,866 whereas in binary, my god, you have to write 378 00:17:08,866 --> 00:17:10,366 out eight separate digits. 379 00:17:10,636 --> 00:17:12,376 So this is just a more succinct base system. 380 00:17:12,616 --> 00:17:15,016 Now we're gonna see it in a very arcane sense this week and next 381 00:17:15,176 --> 00:17:17,306 but fast forward toward term's end and we start doing things 382 00:17:17,306 --> 00:17:19,416 like web programming and a language called HTML 383 00:17:19,416 --> 00:17:22,516 and a language called CSS, Cascading Style Sheets. 384 00:17:22,726 --> 00:17:23,726 We're actually gonna re-see-- 385 00:17:23,726 --> 00:17:25,736 see these things again because they're also used 386 00:17:25,736 --> 00:17:26,796 on the web for colors. 387 00:17:26,796 --> 00:17:29,516 So just to fast forward, if you ever want 388 00:17:29,516 --> 00:17:31,836 to express the color red on a web page, 389 00:17:32,016 --> 00:17:33,536 you can either just say "red" 390 00:17:33,536 --> 00:17:35,626 which is actually pretty compelling, 391 00:17:36,026 --> 00:17:40,586 or you can actually say FF 00, 00, 392 00:17:40,686 --> 00:17:42,606 and that should be consistent today, FF, 393 00:17:42,606 --> 00:17:43,876 but it doesn't matter. 394 00:17:44,126 --> 00:17:44,806 Now, why is this? 395 00:17:44,976 --> 00:17:49,586 Little teaser, if you've ever heard of the expression RG-- 396 00:17:49,666 --> 00:17:52,306 whoops-- RGB, red, green, blue, 397 00:17:52,506 --> 00:17:56,746 the way we're gonna implement colors in a few weeks and also 398 00:17:56,746 --> 00:17:59,526 in the multimedia section of one of our problem sets, 399 00:17:59,786 --> 00:18:02,596 FF is kind of the biggest possible hexadecimal digit you 400 00:18:02,596 --> 00:18:03,696 can write, it's the-- the-- 401 00:18:03,836 --> 00:18:06,346 before you have to roll over to a 1 and a 0, 402 00:18:06,826 --> 00:18:10,516 well this means a lot of red, no green, no blue, 403 00:18:10,516 --> 00:18:11,276 so what's the result 404 00:18:11,276 --> 00:18:13,446 if you combine these various paint colors? 405 00:18:13,576 --> 00:18:15,716 Only you get just red, so that's red. 406 00:18:15,876 --> 00:18:19,336 Alright, so that was a bit of a tangent only to emphasize 407 00:18:19,336 --> 00:18:21,106 that we're gonna start seeing these things now 408 00:18:21,386 --> 00:18:23,736 but it's really not a big deal, it's just a more succinct way 409 00:18:23,736 --> 00:18:26,136 of representing numbers than binary is 410 00:18:26,136 --> 00:18:29,066 and even then decimal itself is. 411 00:18:29,726 --> 00:18:32,576 So let's go ahead and solve something. 412 00:18:32,576 --> 00:18:36,206 I'm gonna go ahead and open up a new window in G edit 413 00:18:36,206 --> 00:18:37,746 and we're gonna go ahead and try 414 00:18:37,746 --> 00:18:40,676 to compare something very simple, so let me go ahead 415 00:18:40,676 --> 00:18:42,936 and start from scratch here, 416 00:18:43,366 --> 00:18:45,856 though the code is available online and I'm gonna go ahead 417 00:18:45,856 --> 00:18:49,876 and say-- let's zoom in-- let's go ahead and save this 418 00:18:49,876 --> 00:18:54,996 as compare1.c and I'm gonna go ahead 419 00:18:54,996 --> 00:18:58,656 and say "int main void" alright, and the goal 420 00:18:58,656 --> 00:19:01,396 of this program ultimately is to get two strings from the user 421 00:19:01,396 --> 00:19:03,816 and then just compare them and say yes or no, 422 00:19:03,816 --> 00:19:06,536 the user typed the same word twice or here she did not. 423 00:19:07,036 --> 00:19:09,586 So let's go ahead and say we wanna get a string 424 00:19:09,586 --> 00:19:10,886 from the user, so printf-- 425 00:19:11,616 --> 00:19:14,746 let's just do something like "say something" alright, 426 00:19:14,746 --> 00:19:17,296 I put a space for aesthetics and now how do I get a string? 427 00:19:17,296 --> 00:19:20,686 Little recap-- so string S-- 428 00:19:20,786 --> 00:19:22,406 let's call it S1 'cause I'm gonna need two, 429 00:19:22,406 --> 00:19:25,086 so S1 gets-- GetString. 430 00:19:25,436 --> 00:19:27,626 Okay, not quite, I need to now kind of remind myself, 431 00:19:27,626 --> 00:19:29,436 wait a minute, I've just used two functions 432 00:19:29,436 --> 00:19:30,416 that I did not invent. 433 00:19:30,696 --> 00:19:32,096 So what do I need to include at the top? 434 00:19:33,196 --> 00:19:36,926 Okay, so CS50, and also-- yup, standard I/O for printf. 435 00:19:37,546 --> 00:19:40,976 And now, it should compile okay at this point in the story. 436 00:19:41,236 --> 00:19:43,746 So let's assume that the user doesn't try to be difficult. 437 00:19:43,746 --> 00:19:46,306 Here she just types in a word, so now let's go ahead 438 00:19:46,306 --> 00:19:48,666 and do the same thing, and we could use-- whoops! 439 00:19:49,076 --> 00:19:51,626 We could use a loop here but it's only a finite number 440 00:19:51,676 --> 00:19:53,276 of things, so it's actually pretty reasonable just 441 00:19:53,276 --> 00:19:56,876 to copy paste these two lines of code and get something 442 00:19:57,506 --> 00:19:58,976 from the user but this time say "say something again". 443 00:19:59,846 --> 00:20:02,096 >> So I want the user to type in some word, then another word, 444 00:20:02,266 --> 00:20:04,836 and now let's go ahead and compare these things. 445 00:20:05,006 --> 00:20:08,136 So we know how to do branches in both C and scratch, so let's say 446 00:20:08,136 --> 00:20:13,226 if S1 equals equals S2, well let me go ahead and say 447 00:20:13,226 --> 00:20:17,946 to the user printf, you typed the same thing, 448 00:20:17,946 --> 00:20:23,086 exclamation point, semicolon, else let's say printf, 449 00:20:23,716 --> 00:20:28,006 you typed different things exclamation point semicolon. 450 00:20:28,786 --> 00:20:31,456 Any bugs jumped out at anyone? 451 00:20:31,456 --> 00:20:33,446 Or any syntax bugs? 452 00:20:33,776 --> 00:20:36,926 Typos? Okay, feels pretty reasonable, right? 453 00:20:37,116 --> 00:20:39,946 So we get a string S1, get a string S2, we then compare it 454 00:20:40,066 --> 00:20:41,286 and I didn't make this mistake, right? 455 00:20:41,286 --> 00:20:44,076 This could be the first possible mistake, very commonly happens, 456 00:20:44,266 --> 00:20:46,846 but I am using equals equals and that's as comparative things, 457 00:20:46,846 --> 00:20:48,836 don't assign one to the other, so I print-- 458 00:20:48,836 --> 00:20:51,226 you print the type the same thing or different thing, 459 00:20:51,256 --> 00:20:52,266 so let's compile this. 460 00:20:52,266 --> 00:20:55,026 Let me go down to my terminal window here. 461 00:20:55,026 --> 00:20:57,936 Let me go ahead and do make compare 1. 462 00:20:58,516 --> 00:20:59,816 Looks like you compiled okay. 463 00:20:59,816 --> 00:21:03,316 Let me make this a little bigger and let's go ahead and run, 464 00:21:03,316 --> 00:21:09,566 compare 1, enter, say something, let's say hello, okay, hello. 465 00:21:11,106 --> 00:21:13,786 Now I did it, right? 466 00:21:13,866 --> 00:21:16,036 So, let's try this again and make sure maybe-- 467 00:21:16,036 --> 00:21:19,336 maybe we got it all backwards, alright, so hello and goodbye. 468 00:21:20,286 --> 00:21:21,726 Okay, so I'm half right, right? 469 00:21:21,726 --> 00:21:24,626 So that's at least correct and let's just do a sanity check. 470 00:21:25,016 --> 00:21:27,156 Hello? Hello. 471 00:21:27,156 --> 00:21:29,766 Alright, so it feels like there's something going on here. 472 00:21:29,996 --> 00:21:31,676 So what might this be? 473 00:21:31,896 --> 00:21:34,126 Right, so there's clearly a lead in today, it somehow relates 474 00:21:34,126 --> 00:21:36,736 to this stuff but like what really is probably going 475 00:21:36,736 --> 00:21:42,966 on underneath the hood that's breaking this program? 476 00:21:42,966 --> 00:21:46,106 Yeah? 477 00:21:46,106 --> 00:21:46,173 [ Inaudible Remark ] 478 00:21:46,173 --> 00:21:48,076 >> Yeah, so I mean that's kind of what it reduces. 479 00:21:48,076 --> 00:21:50,926 To even if you're not quite sure at all on the Y, 480 00:21:51,196 --> 00:21:52,676 feels like the only opportunity 481 00:21:52,676 --> 00:21:55,376 for the bug here really reduces to just one line. 482 00:21:55,376 --> 00:21:57,326 Every other line in this program is pretty straightforward 483 00:21:57,326 --> 00:22:00,196 and pretty uninteresting so the decision point must be 484 00:22:00,196 --> 00:22:02,406 where the flaw is and in fact it is the case 485 00:22:02,406 --> 00:22:05,956 that you cannot compare two strings in this way. 486 00:22:06,626 --> 00:22:09,076 Why can we not compare two strings in this way? 487 00:22:09,076 --> 00:22:11,316 'Cause equals equals has worked for like 4 weeks. 488 00:22:11,316 --> 00:22:13,486 We've used it for chars and ints, the-- 489 00:22:13,486 --> 00:22:15,266 it can break with floats and doubles 490 00:22:15,266 --> 00:22:16,266 but that's a different issue. 491 00:22:16,266 --> 00:22:17,796 Remember when you compare floats and doubles 492 00:22:17,796 --> 00:22:18,816 because of imprecision, 493 00:22:18,816 --> 00:22:21,286 you might be comparing something that's ever so slightly off 494 00:22:21,846 --> 00:22:22,906 but a string is a string. 495 00:22:22,906 --> 00:22:25,256 But what is a string really? 496 00:22:25,256 --> 00:22:27,776 What did we say "string" is synonymous? 497 00:22:27,776 --> 00:22:29,256 [ Inaudible Remarks ] 498 00:22:29,256 --> 00:22:32,086 >> Yeah, it's an array of characters, so really remember 499 00:22:32,086 --> 00:22:35,016 if we get rid of the CS50 Library, we lose some 500 00:22:35,016 --> 00:22:37,936 of its synonyms that it's defined and recall that we kind 501 00:22:37,936 --> 00:22:40,106 of peeled back this layer recently 502 00:22:40,106 --> 00:22:41,516 and then quickly hid it again. 503 00:22:41,736 --> 00:22:45,496 What string really is just a synonym for is char star. 504 00:22:45,716 --> 00:22:48,556 So star, we'd just seen a moment ago is apparently the solution 505 00:22:48,556 --> 00:22:51,076 to all of our problems, at least with regard to swapping things. 506 00:22:51,376 --> 00:22:53,946 But char we've seen is just a character, 507 00:22:54,096 --> 00:22:59,056 so char star means pointer to character, or if we boil it 508 00:22:59,056 --> 00:23:02,286 down to its basic definition, char star means the address 509 00:23:02,826 --> 00:23:06,246 of a character, so who cares? 510 00:23:06,246 --> 00:23:07,546 What is this really doing for us? 511 00:23:07,546 --> 00:23:10,326 Well, let me spin this around to a-- a blanks-- 512 00:23:10,906 --> 00:23:15,496 okay, so it's coming with us. 513 00:23:15,846 --> 00:23:24,266 Let me spin this around to-- alright. 514 00:23:24,266 --> 00:23:26,866 Okay. So, what really is char star? 515 00:23:26,866 --> 00:23:29,776 Char star is pointer to char but that's just address 516 00:23:29,776 --> 00:23:31,606 of char, so what is a string? 517 00:23:31,826 --> 00:23:37,056 When you allocate S1 like this, and then assign it a value, 518 00:23:37,236 --> 00:23:38,446 what's really going on? 519 00:23:38,696 --> 00:23:41,966 Well, on the left hand side, what you're getting in RAM is-- 520 00:23:42,006 --> 00:23:44,496 let's call it a square, a chunk of memory, 521 00:23:45,076 --> 00:23:47,016 that is how many bytes? 522 00:23:47,436 --> 00:23:47,956 How many bits? 523 00:23:49,576 --> 00:23:51,826 In the appliance, at least, this is gonna be 32 bits. 524 00:23:51,996 --> 00:23:55,776 The mere act of having written a star means you're gonna get a 525 00:23:55,776 --> 00:23:56,916 pointer, whatever that is, 526 00:23:56,916 --> 00:23:59,476 and we can represent this pictorially with the square, 527 00:23:59,476 --> 00:24:02,246 and that square represents 32 bits. 528 00:24:02,246 --> 00:24:04,116 Now, we've-- the wrong picture is like this before. 529 00:24:04,116 --> 00:24:08,066 If I ever said int X on the screen with a semicolon, 530 00:24:08,066 --> 00:24:10,446 what that means is give me 32 bits and allow me 531 00:24:10,446 --> 00:24:11,436 to put a number in there. 532 00:24:11,586 --> 00:24:13,476 Well, that's all a pointers, it's just a number. 533 00:24:13,636 --> 00:24:16,036 It just so happens that number has special meaning, 534 00:24:16,036 --> 00:24:18,386 it represents the address of something in memory, 535 00:24:18,686 --> 00:24:22,416 it doesn't represent an integer that I put in memory. 536 00:24:22,696 --> 00:24:26,206 So this is what S1 is, so logically, what is S2? 537 00:24:26,946 --> 00:24:32,226 S2 is just another such chunk of RAM, another 32 bits. 538 00:24:32,716 --> 00:24:36,066 So when I call GetString, we already know 539 00:24:36,066 --> 00:24:40,296 at least conceptually that in a string is an array of characters 540 00:24:40,296 --> 00:24:42,136 and we know this, we've seen this, right in P set 2 541 00:24:42,136 --> 00:24:45,556 with Caesar and Visonaire and the hacker edition with crack, 542 00:24:45,556 --> 00:24:48,686 you've actually, you know, traversed arrays of characters 543 00:24:48,686 --> 00:24:50,986 from bracket 0 all the way up to N minus 1. 544 00:24:51,276 --> 00:24:52,686 So, what really is a string? 545 00:24:52,686 --> 00:24:53,826 What is GetString giving you? 546 00:24:53,826 --> 00:24:56,636 Well, what GetString does, and we'll see this week and next, 547 00:24:56,866 --> 00:24:59,486 is that when you type in the word hello as the human, 548 00:25:00,116 --> 00:25:02,526 GetString is calling a special function, 549 00:25:02,526 --> 00:25:05,036 happens to be called malloc, for memory allocation, 550 00:25:05,316 --> 00:25:07,846 and what it's doing is it's allocating a chunk of memory 551 00:25:07,926 --> 00:25:12,386 for you and it's then putting in whatever letters you type, H, E, 552 00:25:12,506 --> 00:25:16,826 L, L, need a little more, O, need a little more actually. 553 00:25:17,076 --> 00:25:19,716 What else is GetString probably doing based on past promises? 554 00:25:20,456 --> 00:25:22,376 Yeah, it's putting that so-called null character 555 00:25:22,376 --> 00:25:24,206 which we draw as backslash 0. 556 00:25:24,446 --> 00:25:26,816 So, what has GetString doing all of these weeks? 557 00:25:26,906 --> 00:25:29,546 It's been allocating with a new function called malloc, 558 00:25:29,646 --> 00:25:32,386 a chunk of RAM, but what's key about this chunk of memory is 559 00:25:32,386 --> 00:25:34,276 that it's back to back to back to back, it's not a little 560 00:25:34,276 --> 00:25:35,876 over here, little over here, little over here, 561 00:25:35,996 --> 00:25:39,246 it's all contiguous and it's of size N plus 1 562 00:25:39,416 --> 00:25:42,096 so that you can fit the string plus this null 563 00:25:42,096 --> 00:25:43,216 terminating character. 564 00:25:43,456 --> 00:25:48,176 So now, if a pointer I claim is just 32 bits, it's this small, 565 00:25:48,176 --> 00:25:51,586 and it's clearly too small to fit H-E-L-L-O backslash 0, 566 00:25:51,916 --> 00:25:55,316 what's really going in here, do you think? 567 00:25:55,316 --> 00:25:55,506 [ Inaudible Remark ] 568 00:25:55,506 --> 00:25:59,136 >> Yeah, just the address of this. 569 00:25:59,366 --> 00:26:01,036 Right, if this is memory, this is RAM, 570 00:26:01,276 --> 00:26:04,066 and RAM is like 2 gigabytes worth on your MAC or PC, 571 00:26:04,206 --> 00:26:06,386 well we've claimed that you can say this is byte 0, 572 00:26:06,386 --> 00:26:09,266 this is byte 1, this is byte 2 billion, if you withdraw this-- 573 00:26:09,266 --> 00:26:10,656 this rectangle as we keep doing. 574 00:26:10,956 --> 00:26:14,376 So these cells in memory, these chunks 575 00:26:14,376 --> 00:26:16,016 of bits all have some address. 576 00:26:16,186 --> 00:26:18,486 This might be address 1, 2, 3, 4. 577 00:26:18,706 --> 00:26:23,746 This might be 1, 2, 3, 5; 1, 2, 3, 6 and so forth, that the-- 578 00:26:23,746 --> 00:26:25,006 what the numbers are doesn't matter 579 00:26:25,186 --> 00:26:27,616 but that they have numbers does matter. 580 00:26:27,846 --> 00:26:30,456 So, which of these numbers is probably going inside 581 00:26:30,456 --> 00:26:31,306 of this 32 bits? 582 00:26:32,746 --> 00:26:33,616 Which one do you need to know? 583 00:26:34,826 --> 00:26:35,996 It's really just the first. 584 00:26:36,146 --> 00:26:40,586 What's gonna go in here is the number 1, 2, 3, 4 if 1, 2, 3, 585 00:26:40,586 --> 00:26:43,796 4 is the physical address in memory of the first byte. 586 00:26:44,346 --> 00:26:48,116 Now, this is sufficient to remember S1 because if I know 587 00:26:48,116 --> 00:26:50,836 that the start of the string is at-- address 1, 2, 3, 4, 588 00:26:50,966 --> 00:26:53,836 how do I find the end of the string? 589 00:26:53,836 --> 00:26:54,216 [ Inaudible Remark ] 590 00:26:54,216 --> 00:26:54,806 >> Exactly! 591 00:26:54,846 --> 00:26:57,206 You just very simply, honestly, four loop, Y loop, 592 00:26:57,206 --> 00:26:58,236 whatever your preference is, 593 00:26:58,406 --> 00:27:00,586 you start at that address and you check here. 594 00:27:00,586 --> 00:27:01,766 Is this the null character? 595 00:27:01,766 --> 00:27:02,826 Is this the null character? 596 00:27:02,826 --> 00:27:03,826 Is this the null character? 597 00:27:03,826 --> 00:27:04,846 Is this the null character? 598 00:27:04,846 --> 00:27:05,806 Is this the null character? 599 00:27:05,806 --> 00:27:06,946 Is this-- aha, it is! 600 00:27:07,566 --> 00:27:08,696 That's the end of the string. 601 00:27:08,896 --> 00:27:11,666 So the function printf does something pretty much like that. 602 00:27:11,826 --> 00:27:13,926 It looks at every character back to back to back, 603 00:27:13,926 --> 00:27:15,656 just like you did with Caesar and Visionaire, 604 00:27:15,866 --> 00:27:18,286 and as soon as printf sees, "wait a minute, 605 00:27:18,286 --> 00:27:21,146 this is not a printable character, this is backslash 0," 606 00:27:21,146 --> 00:27:25,066 all 0 bits, it stops printing, it returns and voila, 607 00:27:25,066 --> 00:27:28,156 you see some text-- string of text on the screen. 608 00:27:28,756 --> 00:27:30,676 So now, back to the flawed program. 609 00:27:31,246 --> 00:27:35,356 This S1 equals equals S2, what is it really doing then 610 00:27:35,356 --> 00:27:38,366 if we now understand what's going on underneath the hood? 611 00:27:39,686 --> 00:27:40,486 What's it comparing? 612 00:27:40,596 --> 00:27:45,076 S1 and S2, so let's finish this picture. 613 00:27:45,076 --> 00:27:48,046 I typed in goodbye before so what do I need for goodbye? 614 00:27:48,046 --> 00:27:52,456 Well, I need another chunk of memory that's gonna have G-O-O-- 615 00:27:53,086 --> 00:27:56,486 bye backslash 0, and now let me just break it 616 00:27:56,486 --> 00:27:58,206 up pictorially with lines. 617 00:27:58,806 --> 00:28:00,016 So now I'm gonna get back-- 618 00:28:00,106 --> 00:28:03,526 this is I said before maybe address 4, 5, 6, 7, 619 00:28:03,776 --> 00:28:07,576 so what's going in S2, 4, 5, 6, 7, again numbers don't matter 620 00:28:07,576 --> 00:28:09,716 but that a number is there matters. 621 00:28:09,956 --> 00:28:11,026 So what am I comparing? 622 00:28:11,406 --> 00:28:13,746 You know, I'm kind of comparing two-- 623 00:28:13,746 --> 00:28:16,026 I am comparing two distinct addresses. 624 00:28:16,796 --> 00:28:19,746 So even if I typed in "hello here, hello here," 625 00:28:19,936 --> 00:28:21,856 they're gonna end up at different locations in memory. 626 00:28:21,856 --> 00:28:24,246 The computer is not smart enough to realize, "oh you typed 627 00:28:24,246 --> 00:28:26,606 in hello twice, I'm gonna optimize this 628 00:28:26,796 --> 00:28:28,196 and just reuse that same memory." 629 00:28:28,376 --> 00:28:30,666 No, GetString is gonna use a different chunk of memory, 630 00:28:30,666 --> 00:28:33,016 it's gonna-- thus be at a different place in RAM, 631 00:28:33,466 --> 00:28:34,436 so what are you comparing? 632 00:28:34,596 --> 00:28:38,046 Two addresses, and so, yup, they are wrong. 633 00:28:38,476 --> 00:28:39,316 Those are not correct. 634 00:28:39,316 --> 00:28:42,946 So let's see this actually, so S1 equals equals S2, 635 00:28:42,946 --> 00:28:44,536 let's actually print this out. 636 00:28:45,056 --> 00:28:47,666 So let me just make some temporary space here, printf, 637 00:28:47,666 --> 00:28:54,506 and I'm gonna say, "S1 is percent S comma S1 semicolon," 638 00:28:54,686 --> 00:28:56,136 and actually I may put a new line in there, 639 00:28:56,796 --> 00:29:04,066 and now I'm gonna also say, "S2 is percent S paste in S2," 640 00:29:04,286 --> 00:29:08,416 and let me temporary-- I'll leave that in there. 641 00:29:08,576 --> 00:29:10,946 So S1 is something, S2 is something, let's go ahead 642 00:29:10,946 --> 00:29:13,116 and compile this again, so let me go 643 00:29:13,116 --> 00:29:14,306 down to my terminal window. 644 00:29:14,926 --> 00:29:16,996 We go ahead and do make compare 1, 645 00:29:17,336 --> 00:29:19,046 compiled okay, let me run compare. 646 00:29:19,046 --> 00:29:22,866 I'm gonna type in hello, I'm gonna type in hello again just 647 00:29:22,866 --> 00:29:25,086 as I started with, enter, okay. 648 00:29:25,346 --> 00:29:27,046 So at least, they look the same, 649 00:29:27,236 --> 00:29:28,906 but I'm saying I typed different things. 650 00:29:29,166 --> 00:29:30,966 So let's look underneath the hood. 651 00:29:31,186 --> 00:29:34,326 S1 is not percent S, that's actually printed 652 00:29:34,326 --> 00:29:35,946 as what it really is, a pointer, 653 00:29:36,046 --> 00:29:39,526 percent P. So this is gonna show me the address hopefully, 654 00:29:39,526 --> 00:29:42,566 so let's scroll down here, and now let me go ahead 655 00:29:42,566 --> 00:29:48,836 and recompile, let me rerun it, hello, hello, and wow. 656 00:29:49,046 --> 00:29:51,326 So the numbers are not as simplistic as 1, 2, 3, 4, 657 00:29:51,376 --> 00:29:54,776 they're much bigger, they're 32 bits but now you see 658 00:29:54,776 --> 00:29:56,496 that they're so close, right? 659 00:29:56,766 --> 00:29:59,806 But they're actually a few bytes off. 660 00:29:59,916 --> 00:30:00,976 They're 10 bytes off from each other. 661 00:30:01,126 --> 00:30:04,806 >> Notice the 3 for S1 and then the 4 for the next? 662 00:30:05,146 --> 00:30:07,316 So there's a bit of an offset there, and that's just 663 00:30:07,316 --> 00:30:10,006 because GetString has put one of the strings over here, 664 00:30:10,266 --> 00:30:12,246 the other one over here, and they're pretty close 665 00:30:12,246 --> 00:30:13,636 but they're not the same. 666 00:30:14,826 --> 00:30:17,316 So how might we go about solving this? 667 00:30:17,826 --> 00:30:24,276 What do we have to do in order to compare two strings? 668 00:30:25,046 --> 00:30:25,446 Sorry? 669 00:30:25,446 --> 00:30:25,513 [ Inaudible Remark ] 670 00:30:25,513 --> 00:30:26,056 >> A to I? 671 00:30:26,116 --> 00:30:29,946 Okay, so if we used A to I, ASCII to integer, we could com-- 672 00:30:29,946 --> 00:30:32,626 we could convert a string to a number. 673 00:30:32,626 --> 00:30:35,376 But the problem here is that H-E-L-L-O is not a number, 674 00:30:35,376 --> 00:30:37,206 so this is a little distinct from Caesar 675 00:30:37,206 --> 00:30:39,356 where you might have typed in 13 and even though 676 00:30:39,356 --> 00:30:40,966 that is a string you wanted as a number, 677 00:30:41,036 --> 00:30:43,066 hello is a string, it's not a number. 678 00:30:43,656 --> 00:30:46,696 So not quite, what could we do? 679 00:30:46,696 --> 00:30:47,206 [ Inaudible Remark ] 680 00:30:47,206 --> 00:30:47,926 >> Well, yes. 681 00:30:47,926 --> 00:30:50,356 So there is in fact a string compare function. 682 00:30:50,356 --> 00:30:52,376 Very, very good-- googling? 683 00:30:52,656 --> 00:30:53,736 Okay, very good. 684 00:30:53,736 --> 00:30:54,896 [ Laughter ] 685 00:30:54,896 --> 00:30:56,966 >> So there's a string compare function. 686 00:30:56,966 --> 00:30:58,416 Alright, so let's actually borrow that, 687 00:30:58,416 --> 00:31:01,136 so I actually don't remember how to use it, so let me go 688 00:31:01,136 --> 00:31:05,386 to cs50.net/resources, let me go to the little [inaudible] 689 00:31:05,386 --> 00:31:06,946 up here, the C reference. 690 00:31:07,386 --> 00:31:09,706 This has something to do with strings, so I'm gonna click 691 00:31:09,766 --> 00:31:12,286 down here on the bottom left corner, standard C-string 692 00:31:12,286 --> 00:31:15,196 in character functions, string compare-- 693 00:31:15,456 --> 00:31:16,506 okay, these are all kind 694 00:31:16,506 --> 00:31:18,906 of abbreviated as is the convention. 695 00:31:19,516 --> 00:31:22,036 String CMP, that sounds like string compare. 696 00:31:22,186 --> 00:31:24,356 Alright, so it looks like in string.h, 697 00:31:24,646 --> 00:31:28,156 there's a function called strcmp, S-T-R-C-M-P, 698 00:31:28,306 --> 00:31:29,566 that stands for string comparison. 699 00:31:29,566 --> 00:31:31,506 It apparently takes in two arguments, 700 00:31:31,506 --> 00:31:33,346 string 1 and string 2. 701 00:31:33,596 --> 00:31:36,676 They are indeed strings, because again char star is synonymous 702 00:31:36,676 --> 00:31:37,346 with string. 703 00:31:37,566 --> 00:31:38,846 There is a new keyword here 704 00:31:38,846 --> 00:31:40,776 but you can probably guess what this means, 705 00:31:40,916 --> 00:31:43,416 const-- so it means constant. 706 00:31:43,416 --> 00:31:45,816 So C actually has this mechanism and you might have seen this 707 00:31:45,816 --> 00:31:48,076 in man pages before where you can say just 708 00:31:48,076 --> 00:31:49,956 to be extra paranoid that you don't screw 709 00:31:49,956 --> 00:31:53,316 up the user's inputs, you can say, I'm gonna receive str1 710 00:31:53,316 --> 00:31:56,146 and str2 but I'm specifying they're constant. 711 00:31:56,256 --> 00:31:58,136 I am not gonna take any liberties with these 712 00:31:58,136 --> 00:32:00,316 like a function like swap might and move them around, 713 00:32:00,566 --> 00:32:03,966 I'm gonna promise to whoever's calling me that I will take 714 00:32:03,966 --> 00:32:07,846 in your parameters as constant, even though you're handing them 715 00:32:07,846 --> 00:32:10,426 to me by pointer as suggested by the star. 716 00:32:10,426 --> 00:32:12,376 So this is just a convention that's good practice, 717 00:32:12,756 --> 00:32:14,496 so no need to be too distracted by it. 718 00:32:14,716 --> 00:32:15,496 So here's the summary, 719 00:32:15,496 --> 00:32:17,526 the function strcmp compares string 1 720 00:32:17,526 --> 00:32:18,776 and string 2, and then returns. 721 00:32:19,086 --> 00:32:20,936 So this is interesting, it doesn't return a Boolean, 722 00:32:21,096 --> 00:32:22,996 it actually returns 1 of 3 values. 723 00:32:22,996 --> 00:32:29,566 So string comparison will return 0 if what's the case? 724 00:32:29,566 --> 00:32:29,806 [ Inaudible Remark ] 725 00:32:29,806 --> 00:32:31,746 >> If str1 is equals to str2. 726 00:32:31,746 --> 00:32:33,636 And in this context of the documentation, 727 00:32:33,636 --> 00:32:36,546 it means literally equal like the words are the same 728 00:32:36,546 --> 00:32:37,746 in the-- in a human sense. 729 00:32:38,036 --> 00:32:42,396 But it will also, just to help you out, return less than 0, 730 00:32:42,476 --> 00:32:44,276 it probably returns negative 1 731 00:32:44,276 --> 00:32:46,186 but the programmer didn't even wanna make that commitment. 732 00:32:46,186 --> 00:32:49,126 He or she just said, I will commit to returning a none-- 733 00:32:49,266 --> 00:32:50,436 a negative number to you. 734 00:32:50,756 --> 00:32:53,656 So if it returns less than 0, that means str1 is less 735 00:32:53,746 --> 00:32:57,136 than str2 and if the function returns greater than 0, 736 00:32:57,276 --> 00:32:59,206 that means str1 is greater than string 2. 737 00:32:59,206 --> 00:33:00,786 What does it mean for string though do you think 738 00:33:00,786 --> 00:33:02,086 to be less than or greater than? 739 00:33:02,086 --> 00:33:03,346 [ Inaudible Remark ] 740 00:33:03,346 --> 00:33:05,476 >> What's that? 741 00:33:05,476 --> 00:33:05,543 [ Inaudible Remark ] 742 00:33:05,543 --> 00:33:07,926 >> Oh not-- so not about [inaudible], so not correct 743 00:33:07,926 --> 00:33:09,436 but it-- it could be the sum 744 00:33:09,436 --> 00:33:11,276 of the values 'cause these are just ASCII characters, 745 00:33:11,276 --> 00:33:14,136 but that's not it, its even more simplistic than that. 746 00:33:14,136 --> 00:33:14,203 [ Inaudible Remark ] 747 00:33:14,203 --> 00:33:14,806 >> What's that? 748 00:33:14,806 --> 00:33:15,486 [ Inaudible Remark ] 749 00:33:15,486 --> 00:33:18,626 >> Could be the length but even that's not-- not on point here. 750 00:33:18,626 --> 00:33:18,693 [ Inaudible Remark ] 751 00:33:18,693 --> 00:33:20,026 >> Alphabetical. 752 00:33:20,136 --> 00:33:20,966 That's what it's doing. 753 00:33:21,506 --> 00:33:23,106 Right, so this is actually very compelling. 754 00:33:23,106 --> 00:33:24,096 If you want a function 755 00:33:24,266 --> 00:33:26,506 that compares two strings alphabetically, 756 00:33:26,696 --> 00:33:29,046 you actually have it in strcmp already, 757 00:33:29,046 --> 00:33:31,886 it doesn't just tell you equal or not equal, it tells you 758 00:33:31,886 --> 00:33:34,506 which one comes alphabetically first, less than, 759 00:33:34,716 --> 00:33:36,626 or which one comes alphabetically later, 760 00:33:37,016 --> 00:33:39,096 greater than, so you can compare two strings. 761 00:33:39,096 --> 00:33:41,166 So now, already we have a function 762 00:33:41,166 --> 00:33:43,386 that could apparently allow us to implement something 763 00:33:43,386 --> 00:33:45,466 like binary search, which again we've done 764 00:33:45,466 --> 00:33:46,466 on the board here with Shawn. 765 00:33:46,846 --> 00:33:48,566 That doesn't just search for numbers. 766 00:33:48,566 --> 00:33:51,736 Now we could actually search for words, words like Mike Smith 767 00:33:51,736 --> 00:33:54,416 in a phonebook implemented in a computer program. 768 00:33:54,596 --> 00:33:55,406 So let's try this. 769 00:33:55,476 --> 00:33:58,676 Apparently to use this, I have to include string.h 770 00:33:58,676 --> 00:34:00,706 and then I have to know about these return values, 771 00:34:00,706 --> 00:34:01,616 so let's go back here. 772 00:34:02,206 --> 00:34:05,916 Alright, so rather than do equal equals, I'm gonna go ahead 773 00:34:05,916 --> 00:34:09,886 and get rid of my temporary pointer printing 774 00:34:10,446 --> 00:34:13,016 and now I'm gonna say, not this, I wanna compare, 775 00:34:13,326 --> 00:34:16,826 so I wanna call strcmp of S1, S2, if it what? 776 00:34:16,826 --> 00:34:17,336 [ Inaudible Remark ] 777 00:34:17,336 --> 00:34:24,436 >> Equals equals 0, right? 778 00:34:24,436 --> 00:34:26,056 And again, that's just from the documentation. 779 00:34:26,056 --> 00:34:28,546 If it equals 0, that means they're the same. 780 00:34:28,716 --> 00:34:31,196 So you know what, I don't care about alphabetical less than 781 00:34:31,196 --> 00:34:34,506 or greater than here, I'm gonna keep it simple and just say yes 782 00:34:34,506 --> 00:34:35,686 or no, same or different. 783 00:34:35,976 --> 00:34:38,626 So if str compare S1 S2 equals equals 0, 784 00:34:38,856 --> 00:34:40,306 hopefully this will fix my bug. 785 00:34:40,306 --> 00:34:45,376 So let me scroll down here, let me go-- error? 786 00:34:45,376 --> 00:34:45,766 [ Inaudible Remark ] 787 00:34:45,766 --> 00:34:47,826 >> Good! So-- good-- good point. 788 00:34:48,656 --> 00:34:50,376 This is gonna break when I try to compile it, 789 00:34:50,376 --> 00:34:52,236 what error am I about to see? 790 00:34:53,446 --> 00:34:54,996 Yeah, so something undeclared, right? 791 00:34:54,996 --> 00:34:57,106 So let's do a quick-- quick recompile make-- 792 00:34:57,666 --> 00:35:01,706 implicit declaration of function strcmp, what does that mean? 793 00:35:01,706 --> 00:35:03,736 Well, that just means that I forgot to declare it. 794 00:35:03,736 --> 00:35:04,736 I don't need to declare it. 795 00:35:04,736 --> 00:35:07,746 I need to include whatever file it's actually declared in, 796 00:35:08,096 --> 00:35:09,896 so let's go ahead and type string.h there, 797 00:35:10,386 --> 00:35:12,766 scroll back down, let me recompile with make, 798 00:35:12,766 --> 00:35:13,816 that seems to be good. 799 00:35:14,066 --> 00:35:14,826 Let's compare. 800 00:35:16,006 --> 00:35:17,286 Let's say hello. 801 00:35:17,976 --> 00:35:20,086 Make sure I didn't break the only part that was working. 802 00:35:20,926 --> 00:35:22,696 Okay, different things, that's good. 803 00:35:22,876 --> 00:35:26,516 Hello, hello-- finally. 804 00:35:26,846 --> 00:35:27,796 It's actually working. 805 00:35:28,406 --> 00:35:30,816 So there is this magical function, str compare, 806 00:35:30,816 --> 00:35:32,916 but you don't need these magical functions, right? 807 00:35:32,916 --> 00:35:34,646 Printf is useful, it's kind of non-obvious 808 00:35:34,646 --> 00:35:36,626 to me how I would actually make stuff appear 809 00:35:36,626 --> 00:35:37,596 on a computer monitor. 810 00:35:37,846 --> 00:35:40,226 But comparing two strings, if you understand what's going 811 00:35:40,226 --> 00:35:43,146 on underneath the hood, should actually be pretty tenable. 812 00:35:43,416 --> 00:35:44,386 So let's actually do this. 813 00:35:44,386 --> 00:35:48,076 I'm gonna say, you are not allowed to use strcmp-- compare. 814 00:35:48,076 --> 00:35:51,466 So for instance, string.h, we accidentally deleted it along 815 00:35:51,466 --> 00:35:54,656 with string.c. So just conceptually, how can we take 816 00:35:54,656 --> 00:35:59,086 in two inputs like string 1 and string 2, and compare them, 817 00:35:59,086 --> 00:36:00,876 and say yes or no, same or different. 818 00:36:02,196 --> 00:36:02,586 Yeah? 819 00:36:02,586 --> 00:36:02,876 [ Inaudible Remark ] 820 00:36:02,876 --> 00:36:03,356 >> Perfect! 821 00:36:03,356 --> 00:36:04,976 So we could just compare each character 822 00:36:05,076 --> 00:36:09,266 and what do we do as we compare? 823 00:36:09,266 --> 00:36:09,396 [ Inaudible Remark ] 824 00:36:09,396 --> 00:36:10,806 >> What's that? 825 00:36:10,806 --> 00:36:10,873 [ Inaudible Remark ] 826 00:36:10,873 --> 00:36:13,926 >> So iterate through and I start comparing the left end 827 00:36:13,926 --> 00:36:15,746 against the left end, and then here, and then here, 828 00:36:15,746 --> 00:36:17,616 and then here, and at what point can I make a decision 829 00:36:17,616 --> 00:36:19,626 or just quit altogether? 830 00:36:19,626 --> 00:36:20,316 [ Inaudible Remark ] 831 00:36:20,316 --> 00:36:23,156 >> So if you see null at the end or if-- 832 00:36:24,666 --> 00:36:26,836 if it's a different in any location. 833 00:36:27,046 --> 00:36:27,786 So let's try this. 834 00:36:27,786 --> 00:36:32,096 Let me go ahead and say, okay, 4 int, I gets 0. 835 00:36:32,496 --> 00:36:38,986 I is let's say less than the string length of S1, I++, 836 00:36:39,526 --> 00:36:42,706 let me go in here now and say, okay so if-- 837 00:36:42,706 --> 00:36:45,586 how do I express if the leftmost character 838 00:36:45,586 --> 00:36:46,796 of either is different? 839 00:36:47,466 --> 00:36:48,286 Do something, if-- ? 840 00:36:48,976 --> 00:36:51,176 Think back to Caesar, if S1-- 841 00:36:51,176 --> 00:36:51,243 [ Inaudible Remark ] 842 00:36:51,243 --> 00:36:56,766 >> I'll say it was good so if-- I could just say is F-- 843 00:36:56,806 --> 00:37:03,116 S1 bracket I does not equal S2 bracket I, 844 00:37:03,116 --> 00:37:05,696 so just compare their Ith locations, what can I do? 845 00:37:05,696 --> 00:37:12,576 Well I can say, you typed different things, semicolon, 846 00:37:12,576 --> 00:37:13,966 and then I can quit, right? 847 00:37:14,026 --> 00:37:16,276 So let's say return 1, 848 00:37:16,276 --> 00:37:17,936 'cause something bad happened, that's an error. 849 00:37:18,466 --> 00:37:19,426 But not quite, right? 850 00:37:19,426 --> 00:37:20,196 I just made a bug. 851 00:37:20,846 --> 00:37:22,966 Right, so we need the brackets. 852 00:37:22,966 --> 00:37:24,936 Otherwise only one of those lines will actually get 853 00:37:24,936 --> 00:37:27,576 executed, so let me go back in here, let me do this. 854 00:37:27,576 --> 00:37:31,846 And now, again, I feel like I got some issue still 855 00:37:31,846 --> 00:37:36,076 but I'm gonna claim that if I at least get way down here, 856 00:37:37,036 --> 00:37:40,646 is it pretty safe to assume that-- 1, 2, 3, 4, you-- 857 00:37:40,646 --> 00:37:43,136 so this is kind of my catchall. 858 00:37:43,326 --> 00:37:45,726 If I get all the way through the loop 859 00:37:45,786 --> 00:37:48,606 and I don't actually return prematurely and yell at the user 860 00:37:48,606 --> 00:37:49,646 for having typed different things, 861 00:37:49,646 --> 00:37:52,116 I can claim you type the same thing, right? 862 00:37:52,116 --> 00:37:53,756 [ Inaudible Remark ] 863 00:37:53,756 --> 00:37:54,336 >> Alright, why not? 864 00:37:54,336 --> 00:37:55,066 What's flawed here? 865 00:37:55,066 --> 00:37:55,816 [ Inaudible Remark ] 866 00:37:55,816 --> 00:38:01,286 >> I still need what at the top? 867 00:38:01,286 --> 00:38:01,353 [ Inaudible Remark ] 868 00:38:01,353 --> 00:38:05,556 >> So I actually do need string.h, but why this time? 869 00:38:05,556 --> 00:38:05,816 [ Inaudible Remark ] 870 00:38:05,816 --> 00:38:06,176 >> Damn it! 871 00:38:06,706 --> 00:38:08,816 Okay. So let's-- let-- I'll give you that back. 872 00:38:08,946 --> 00:38:10,266 Right, so, I don't really feel 873 00:38:10,266 --> 00:38:13,066 like implementing string length just yet though we could. 874 00:38:13,436 --> 00:38:15,076 Right, how do you implement string length frankly? 875 00:38:15,076 --> 00:38:15,143 [ Inaudible Remark ] 876 00:38:15,143 --> 00:38:19,246 >> Yeah, just start at the beginning and then count, count, 877 00:38:19,246 --> 00:38:20,866 count, count, count, so we'll come back to that 878 00:38:20,866 --> 00:38:22,346 or maybe assign it on the quiz or something, 879 00:38:22,346 --> 00:38:24,036 but it's pretty easy to implement string length 880 00:38:24,186 --> 00:38:27,026 if all you have to do is take in a string, start iterating 881 00:38:27,026 --> 00:38:30,706 over each of its characters, and the moment you hit backslash 0, 882 00:38:30,706 --> 00:38:33,586 stop counting, you're done, you found the end of the string. 883 00:38:33,806 --> 00:38:35,676 Alright, so let's-- we'll give you str length 884 00:38:35,706 --> 00:38:36,896 but not string compare, right? 885 00:38:36,896 --> 00:38:38,066 So I'll take away Google from you. 886 00:38:38,296 --> 00:38:38,956 How about that? 887 00:38:39,396 --> 00:38:41,906 So now, we can't string compare it with the function 888 00:38:41,906 --> 00:38:44,676 but we can do it this way but I actually think this is flawed 889 00:38:44,676 --> 00:38:51,506 for a couple of reasons still. 890 00:38:52,516 --> 00:38:53,986 Yeah? 891 00:38:53,986 --> 00:38:55,216 [ Inaudible Remark ] 892 00:38:55,216 --> 00:38:55,306 >> Yeah. 893 00:38:55,306 --> 00:38:55,536 [ Inaudible Remark ] 894 00:38:55,536 --> 00:38:56,126 >> Exactly. 895 00:38:56,126 --> 00:38:58,736 Let's see what happens, maybe nothing, maybe something. 896 00:38:58,736 --> 00:39:02,546 Let's see what happens if I save it as is, I go back here 897 00:39:02,546 --> 00:39:07,606 and I recompile and I type in S1 that's pretty long, 898 00:39:07,796 --> 00:39:10,226 but an S2 that's pretty short, right? 899 00:39:10,226 --> 00:39:12,226 Because how many times am I gonna iterate according 900 00:39:12,226 --> 00:39:14,836 to my four loop at the moment? 901 00:39:15,006 --> 00:39:16,166 The length of S1, so it feels 902 00:39:16,166 --> 00:39:19,436 like there's a bad opportunity here, so let's run this. 903 00:39:19,466 --> 00:39:27,056 So say something like I am a very long string hello, 904 00:39:27,636 --> 00:39:30,736 okay, S2-- okay, it worked. 905 00:39:31,946 --> 00:39:33,096 Not broken-- correct. 906 00:39:33,546 --> 00:39:34,606 Correctness 5. 907 00:39:35,316 --> 00:39:37,546 Why is it actually flawed? 908 00:39:37,546 --> 00:39:38,516 Can we still break this? 909 00:39:39,476 --> 00:39:48,396 I just need to stall for a moment here. 910 00:39:49,296 --> 00:39:49,406 Okay. 911 00:39:49,406 --> 00:39:51,066 [ Inaudible Remark ] 912 00:39:51,066 --> 00:39:51,836 >> What's that? 913 00:39:51,836 --> 00:39:53,506 [ Inaudible Remark ] 914 00:39:53,506 --> 00:39:56,936 >> Oh okay, so cat and cat-- oh yeah, you know, 915 00:39:56,936 --> 00:39:58,186 I'm kind of an idiot, right? 916 00:39:58,276 --> 00:39:59,366 So what are we doing? 917 00:39:59,546 --> 00:40:00,966 I'm comparing a really long string 918 00:40:01,186 --> 00:40:02,966 against a really short string, but guess where they differ? 919 00:40:03,296 --> 00:40:05,136 >> At the first location, right? 920 00:40:05,176 --> 00:40:07,716 So, not a problem, I'm actually catching that at the beginning. 921 00:40:07,996 --> 00:40:11,306 So let's-- let's control C, let's start this over, 922 00:40:11,306 --> 00:40:17,746 and let's say caterpillar, enter, did I spell that right? 923 00:40:17,746 --> 00:40:17,813 [ Laughter ] 924 00:40:17,813 --> 00:40:18,076 >> Really? 925 00:40:18,076 --> 00:40:26,226 Hang on, I'm giving us-- there we go. 926 00:40:26,226 --> 00:40:29,516 I-- I had to take to go back. 927 00:40:29,516 --> 00:40:35,746 Okay, so-- damn it, I forgot what I copy pasted. 928 00:40:35,746 --> 00:40:37,126 Caterpillar, alright. 929 00:40:37,406 --> 00:40:38,526 Come on, that is not a word I've had 930 00:40:38,526 --> 00:40:39,976 to type very often in life, alright. 931 00:40:40,756 --> 00:40:44,096 So here we go, caterpillar, okay, cat. 932 00:40:44,576 --> 00:40:51,916 Switched them around, cat, caterpillar. 933 00:40:55,146 --> 00:40:59,596 [Laughter] That was only a surprise to make. 934 00:41:00,056 --> 00:41:02,296 So, what's going on here? 935 00:41:02,326 --> 00:41:04,436 Well, now, we have a substring issue, right? 936 00:41:04,436 --> 00:41:07,976 So cat is obviously a substring, a prefix of caterpillar, 937 00:41:08,286 --> 00:41:11,276 and so they do in fact match but if I'm only iterating 938 00:41:11,556 --> 00:41:14,336 from the length of 0 to the length of-- 939 00:41:14,376 --> 00:41:17,326 rather, from 0 to the length of S1, I'm only gonna compare 940 00:41:17,326 --> 00:41:18,976 like the first three characters, C-A-T, 941 00:41:18,976 --> 00:41:20,016 and then I'm gonna call it quits. 942 00:41:20,456 --> 00:41:22,176 But in the opposite direction, 943 00:41:22,176 --> 00:41:24,046 it didn't actually happen this time, but if I [inaudible] 944 00:41:24,146 --> 00:41:25,136 around enough it might. 945 00:41:25,486 --> 00:41:28,776 What could happen if the differences 946 00:41:28,776 --> 00:41:33,416 of these strings is long enough? 947 00:41:33,546 --> 00:41:36,236 So I'm gonna start comparing C-A-T, 948 00:41:36,496 --> 00:41:40,186 but what am I gonna hit once I get to the fourth location, 949 00:41:40,186 --> 00:41:42,246 if the one word is caterpillar and it's first, 950 00:41:42,246 --> 00:41:43,996 and then cat is the second word. 951 00:41:44,866 --> 00:41:49,106 So it null, but null is not equal to C-A-T-E, it's not equal 952 00:41:49,196 --> 00:41:52,436 to E, and so it actually still catches that scenario 953 00:41:52,436 --> 00:41:54,686 and we don't overstep the bounds of the array because we 954 00:41:54,686 --> 00:41:55,556 at least stopped there. 955 00:41:55,766 --> 00:41:58,756 But flip them around and we do confuse cat 956 00:41:59,046 --> 00:42:00,696 for caterpillar, or vice versa. 957 00:42:00,916 --> 00:42:02,136 So what's a quick fix here? 958 00:42:02,136 --> 00:42:03,376 What's one way we could fix this? 959 00:42:04,206 --> 00:42:07,486 So we could get the longer length first, 960 00:42:07,546 --> 00:42:11,536 so we could say something like if string length of S1 is less 961 00:42:11,536 --> 00:42:14,986 than string length of S2, then we could go ahead 962 00:42:14,986 --> 00:42:20,076 and say the maximum length is equal to str length of S2, 963 00:42:20,076 --> 00:42:22,386 but I'm kinda pushing it now 964 00:42:22,386 --> 00:42:25,626 with design issues, let me put max here. 965 00:42:25,626 --> 00:42:27,256 Is this correct? 966 00:42:27,486 --> 00:42:31,746 And then I can say, let's change this to max. 967 00:42:32,746 --> 00:42:35,976 So this kind of helps but I'm still kind 968 00:42:35,976 --> 00:42:36,856 of doing something stupid. 969 00:42:36,856 --> 00:42:39,276 What's the quickest way to just check 970 00:42:39,276 --> 00:42:41,716 if two strings are not the same? 971 00:42:41,716 --> 00:42:42,616 [ Inaudible Remark ] 972 00:42:42,616 --> 00:42:44,176 >> Right, if it's not the same length, right? 973 00:42:44,176 --> 00:42:48,386 I could actually have a check like if string length 974 00:42:48,546 --> 00:42:52,296 of S1 does not equal the string length of S2, you don't need 975 00:42:52,296 --> 00:42:54,446 to iterate over anything, you can just immediately yell 976 00:42:54,446 --> 00:42:58,426 at the user and say, obviously not the same, right, 977 00:42:58,426 --> 00:43:00,586 because they are different lengths, and then so long 978 00:43:00,586 --> 00:43:03,366 as I have my curly braces here, I can return one here, 979 00:43:03,616 --> 00:43:07,276 and then if I get to this point here, now does it matter 980 00:43:07,276 --> 00:43:08,526 which string length I use? 981 00:43:09,356 --> 00:43:11,326 So now, I can use either because they're equivalent 982 00:43:11,616 --> 00:43:13,586 so there's a couple optimizations 983 00:43:13,586 --> 00:43:17,266 that are possible here, but now I can do response to the user 984 00:43:17,266 --> 00:43:18,106 in a couple different ways. 985 00:43:18,106 --> 00:43:20,316 I can say they are obviously the different 986 00:43:20,446 --> 00:43:21,696 because they're not even the same length 987 00:43:21,786 --> 00:43:24,466 or same length but different things. 988 00:43:24,706 --> 00:43:28,786 Now, hereto, not perfect design, so correct I think at this point 989 00:43:29,206 --> 00:43:35,666 but where is an opportunity for better design in that loop? 990 00:43:36,276 --> 00:43:36,366 Yeah? 991 00:43:36,366 --> 00:43:36,616 [ Inaudible Remark ] 992 00:43:36,616 --> 00:43:39,336 >> Okay, so what if we did less than or equal to string length 993 00:43:39,446 --> 00:43:42,616 of S1, what would that have us comparing, 994 00:43:42,616 --> 00:43:44,126 in addition to the string itself? 995 00:43:44,126 --> 00:43:44,616 [ Inaudible Remark ] 996 00:43:44,616 --> 00:43:48,016 >> So it would have us comparing the null character, and now just 997 00:43:48,016 --> 00:43:51,346 to push back, is that necessary? 998 00:43:51,346 --> 00:43:54,316 So this is not incorrect but it doesn't really gain as much, 999 00:43:54,316 --> 00:43:56,896 and if anything, we're spending an extra bit of time now 1000 00:43:57,186 --> 00:43:58,726 because if we already know these lengths 1001 00:43:58,726 --> 00:44:01,436 of the strings are the same, and we already know that in C, 1002 00:44:01,646 --> 00:44:04,926 every string ends with backslash 0, it's sort of a waste 1003 00:44:04,926 --> 00:44:07,786 of our time to bother comparing the backslash 0, 1004 00:44:07,786 --> 00:44:10,506 we can stop one short of that, so not incorrect 1005 00:44:10,506 --> 00:44:17,316 but we're just wasting an extra iteration. 1006 00:44:18,526 --> 00:44:18,606 Yeah? 1007 00:44:18,606 --> 00:44:18,906 [ Inaudible Remark ] 1008 00:44:18,906 --> 00:44:20,986 >> Yeah. 1009 00:44:20,986 --> 00:44:22,386 [ Inaudible Remark ] 1010 00:44:22,386 --> 00:44:22,966 >> Exactly. 1011 00:44:22,966 --> 00:44:24,576 So this is subtle and frankly it's-- 1012 00:44:24,576 --> 00:44:25,546 it's kind of elegant, right? 1013 00:44:25,546 --> 00:44:28,236 We've kind of preached the idea of making these succinct lines 1014 00:44:28,236 --> 00:44:29,996 of code that are just very straightforward, 1015 00:44:30,096 --> 00:44:31,896 but the problem here is that the condition 1016 00:44:31,896 --> 00:44:34,086 in a four loop is evaluated not once, 1017 00:44:34,356 --> 00:44:35,906 but on every iteration, right? 1018 00:44:35,906 --> 00:44:37,756 'Cause you have to check, is it time to break? 1019 00:44:37,756 --> 00:44:38,566 Is it time to break? 1020 00:44:38,566 --> 00:44:39,366 Is it time to break? 1021 00:44:39,626 --> 00:44:42,916 So, every time you check that condition, you're comparing I 1022 00:44:42,916 --> 00:44:44,326 against the current length of S1, 1023 00:44:44,326 --> 00:44:47,616 but what is the current length of S1 on every iteration, 1024 00:44:48,266 --> 00:44:49,906 what's the same as the very first one. 1025 00:44:49,906 --> 00:44:52,846 So a common technique would be, don't call functions 1026 00:44:52,846 --> 00:44:55,846 in your conditions, rather move this over here, 1027 00:44:55,846 --> 00:44:59,686 introduce something like N and assign it to the string length 1028 00:44:59,686 --> 00:45:02,126 and then iterate from I to N. Even here, 1029 00:45:02,126 --> 00:45:03,586 there's still an opportunity 1030 00:45:03,586 --> 00:45:05,966 for slight improvement 'cause what am I doing that's still 1031 00:45:06,106 --> 00:45:07,076 slightly foolish? 1032 00:45:07,211 --> 00:45:09,211 [ Inaudible Remark ] 1033 00:45:09,346 --> 00:45:10,606 >> I called it earlier as well. 1034 00:45:10,606 --> 00:45:12,826 So, you know, not as an expensive of a mistake. 1035 00:45:13,136 --> 00:45:15,156 I mean frankly the code is kind of elegant here 1036 00:45:15,156 --> 00:45:17,336 and is the strings are pretty short, it doesn't matter 1037 00:45:17,336 --> 00:45:18,756 if I'm spending a few extra cycles, 1038 00:45:18,786 --> 00:45:21,296 but still I should probably compute string length 1039 00:45:21,296 --> 00:45:22,996 of S1 once up top. 1040 00:45:23,216 --> 00:45:25,036 Put the result perhaps in a variable 1041 00:45:25,036 --> 00:45:27,916 and then actually proceed with the comparisons. 1042 00:45:28,266 --> 00:45:31,206 So, all of these though can go away, of course, 1043 00:45:31,206 --> 00:45:33,686 if we just use the string compare function. 1044 00:45:33,686 --> 00:45:35,206 But what's it doing and what's a lot 1045 00:45:35,206 --> 00:45:36,936 of these functions doing in the documentation? 1046 00:45:37,126 --> 00:45:38,856 It's just implementing these fundamentals 1047 00:45:39,046 --> 00:45:40,656 that years ago people got tired 1048 00:45:40,656 --> 00:45:43,396 of every time they wanna compare strings, writing like 10 lines 1049 00:45:43,396 --> 00:45:47,036 of code, you collapse it to one line, str compare, and thus, 1050 00:45:47,036 --> 00:45:49,196 do you hopefully start to see really the motivation 1051 00:45:49,196 --> 00:45:50,946 for writing things as functions 1052 00:45:50,946 --> 00:45:52,606 and not just writing everything in main. 1053 00:45:53,306 --> 00:45:55,126 So I've got some more craziness in a bit, 1054 00:45:55,126 --> 00:45:56,156 let's take a 5-minute break, 1055 00:45:56,156 --> 00:45:58,696 and return with the damage we can do with pointers. 1056 00:45:59,196 --> 00:46:02,066 [ Pause ] 1057 00:46:02,566 --> 00:46:03,286 >> Alright. 1058 00:46:04,546 --> 00:46:08,506 So we're back, first, any questions on pointers 1059 00:46:08,726 --> 00:46:10,166 and just what they are? 1060 00:46:12,076 --> 00:46:15,546 Alright. So let's now start pulling off some 1061 00:46:15,546 --> 00:46:16,516 of these training wheels. 1062 00:46:16,746 --> 00:46:19,326 So in the CS50 Library, we've been taking these functions 1063 00:46:19,326 --> 00:46:22,496 for granted and they've been useful in that they handle all 1064 00:46:22,496 --> 00:46:24,666 of the getting of input from the user from the keyboard. 1065 00:46:24,806 --> 00:46:27,856 They even do some error checking whereby GetInt if you type 1066 00:46:27,856 --> 00:46:31,286 in as we've done monkey, it will say retry, retry, retry, 1067 00:46:31,456 --> 00:46:33,266 until the user actually gives you an integer. 1068 00:46:33,426 --> 00:46:37,076 But let's focus now on GetString because clearly all this time, 1069 00:46:37,516 --> 00:46:40,016 GetString has been doing some interesting stuff with us 1070 00:46:40,016 --> 00:46:41,036 with regard to pointers. 1071 00:46:41,316 --> 00:46:43,926 This is an excerpt from cs50.h, 1072 00:46:44,076 --> 00:46:45,816 the actual file you've been including. 1073 00:46:46,096 --> 00:46:48,746 This file lives in a special directory in the appliance 1074 00:46:48,746 --> 00:46:50,556 as it would on a typical computer system. 1075 00:46:50,756 --> 00:46:52,446 For the curious, it happens to live 1076 00:46:52,446 --> 00:46:56,376 in a directory called slash user slash include, 1077 00:46:56,626 --> 00:46:58,756 this is where a lot of dot H files are. 1078 00:46:58,796 --> 00:47:01,116 In fact, if you recall in a Linux environment, 1079 00:47:01,366 --> 00:47:04,016 if you type LS that will list the contents of a directory 1080 00:47:04,216 --> 00:47:07,196 and if I do slash users slash include, 1081 00:47:07,376 --> 00:47:10,676 notice there's a whole bunch of dot H files in my appliance. 1082 00:47:10,976 --> 00:47:14,656 Among them over there on the left is cs50.h. Not all 1083 00:47:14,656 --> 00:47:17,426 of them are necessarily in those same directory 1084 00:47:17,706 --> 00:47:22,456 but we might see a couple of familiar ones if we go to-- 1085 00:47:22,456 --> 00:47:27,466 I don't think we've used any of these just yet. 1086 00:47:27,606 --> 00:47:29,926 Sorry, oh and crypt, so we've used crypt there 1087 00:47:29,926 --> 00:47:31,366 for the Hacker Edition and then also-- 1088 00:47:31,366 --> 00:47:32,196 [ Inaudible Remark ] 1089 00:47:32,196 --> 00:47:35,466 >> -- the CS50, obviously, dot H the header file. 1090 00:47:35,716 --> 00:47:40,046 So here it is up at the top, the CS50 header file only contains 1091 00:47:40,326 --> 00:47:42,486 that synonym for string, char star. 1092 00:47:42,486 --> 00:47:45,396 Remember we saw this a few days ago. 1093 00:47:45,396 --> 00:47:47,046 If I scroll all the way to the top, 1094 00:47:47,246 --> 00:47:48,826 we introduced this key word, typedef, 1095 00:47:49,056 --> 00:47:50,426 which we won't use terribly often. 1096 00:47:50,426 --> 00:47:52,096 We will use it for 1 or 2 P sets, 1097 00:47:52,336 --> 00:47:54,086 but typedef just defines a type 1098 00:47:54,396 --> 00:47:57,326 that char star is henceforth gonna be known as string, 1099 00:47:57,536 --> 00:47:58,576 that's where that came from. 1100 00:47:58,576 --> 00:48:01,426 But if I scroll now down to the documentation for GetString, 1101 00:48:01,676 --> 00:48:03,906 we pretty much been taking for granted how it works 1102 00:48:03,906 --> 00:48:06,476 but let's actually read CS50's official documentation. 1103 00:48:06,676 --> 00:48:09,496 It says, GetString function reads a line 1104 00:48:09,496 --> 00:48:10,946 of text from standard input. 1105 00:48:11,116 --> 00:48:13,676 Standard input just generally means the keyboard, the user, 1106 00:48:13,676 --> 00:48:17,396 and returns it as a string, otherwise known as char star, 1107 00:48:17,846 --> 00:48:19,546 sends trailing new line character. 1108 00:48:19,546 --> 00:48:22,106 So even though all this time we humans have been hitting enter 1109 00:48:22,106 --> 00:48:26,536 to give the computer a-- a string, we presumptuously, CS50, 1110 00:48:26,566 --> 00:48:28,596 we just throw away that backslash N, 1111 00:48:28,736 --> 00:48:30,176 because we don't want all of your strings 1112 00:48:30,176 --> 00:48:32,546 to have a carriage-- to have a new line in it. 1113 00:48:32,786 --> 00:48:36,846 So notice now if the user inputs only backslash N, 1114 00:48:37,186 --> 00:48:40,196 it returns this thing, so double quote, 1115 00:48:40,196 --> 00:48:42,426 double unquote means the empty string. 1116 00:48:42,506 --> 00:48:44,636 It is a string, how many bytes are taken 1117 00:48:44,636 --> 00:48:46,416 up to represent quote, unquote? 1118 00:48:46,416 --> 00:48:49,816 What do you think? 1119 00:48:50,006 --> 00:48:51,586 Zero or one? 1120 00:48:52,616 --> 00:48:54,416 Any zero? One? 1121 00:48:55,016 --> 00:48:59,126 Okay, so it is actually one. 1122 00:48:59,126 --> 00:49:03,266 Why? Well, every string, even if it's of some crazy 0 length, 1123 00:49:03,636 --> 00:49:05,466 still has to have the backslash 0. 1124 00:49:05,466 --> 00:49:08,796 Why? We have to be able to know how long this string is 1125 00:49:08,796 --> 00:49:10,546 so that you're not just handed some garbage value. 1126 00:49:10,786 --> 00:49:13,066 But just in case it turns out that there is 1127 00:49:13,066 --> 00:49:15,346 and we've seen this word before, NULL, in all caps, 1128 00:49:16,126 --> 00:49:19,936 that represents, as of today, a pointer-- a special pointer, 1129 00:49:20,276 --> 00:49:22,496 the pointer to memory address zero. 1130 00:49:22,746 --> 00:49:25,526 So a very useful thing about computers is that you 1131 00:49:25,526 --> 00:49:29,046 as programmers are never ever, ever allowed to store anything 1132 00:49:29,256 --> 00:49:31,906 at a location number 0 in a computer's RAM. 1133 00:49:32,416 --> 00:49:35,266 This is simply because so that there's at least one place 1134 00:49:35,266 --> 00:49:38,816 in memory that is guaranteed not to contain a legitimate value, 1135 00:49:39,036 --> 00:49:41,446 so that we can reserve that address, 0, 1136 00:49:41,616 --> 00:49:42,936 as a special constant. 1137 00:49:43,256 --> 00:49:45,286 So henceforth, if we ever wanna check 1138 00:49:45,286 --> 00:49:48,836 if a pointer is actually legitimate or not, that is, 1139 00:49:48,836 --> 00:49:51,496 is there a value there or is it maybe some garbage value, 1140 00:49:51,776 --> 00:49:55,246 we're gonna start checking for NULL in all caps. 1141 00:49:55,246 --> 00:49:58,106 So this function GetString apparently returns NULL 1142 00:49:58,106 --> 00:49:58,906 if there's an error 1143 00:49:59,226 --> 00:50:00,976 or if there's no user input whatsoever. 1144 00:50:01,076 --> 00:50:04,396 >> Now this is hard to induce but it turns out for the curious 1145 00:50:04,396 --> 00:50:07,486 if you hit control D whenever asked by a program for input, 1146 00:50:07,796 --> 00:50:10,966 sometimes just hitting control D will crash a program 1147 00:50:11,196 --> 00:50:14,036 because control D typically means I give you no input 1148 00:50:14,136 --> 00:50:15,126 but stop asking me. 1149 00:50:15,476 --> 00:50:17,226 And that is different from hitting enter, 1150 00:50:17,396 --> 00:50:18,926 which does give you some input, 1151 00:50:18,926 --> 00:50:21,276 it gives you backslash N. Now we have 1152 00:50:21,276 --> 00:50:23,556 to catch this and we soon will. 1153 00:50:23,846 --> 00:50:26,676 So, leading and trailing white space is not ignored. 1154 00:50:26,676 --> 00:50:28,386 So this is to say if the user hit space, space, 1155 00:50:28,386 --> 00:50:30,406 space H-E-L-L-O, you're gonna get back 1156 00:50:30,406 --> 00:50:32,556 to the space bar characters and so forth. 1157 00:50:32,626 --> 00:50:35,436 So ultimately we'll see eventually the implementation 1158 00:50:35,436 --> 00:50:37,466 of GetString but at least it seems 1159 00:50:37,636 --> 00:50:41,186 that sometimes this function might return null. 1160 00:50:41,186 --> 00:50:42,816 So we're gonna have to start checking for this. 1161 00:50:43,126 --> 00:50:44,526 So let's do the following, I'm gonna go ahead 1162 00:50:44,526 --> 00:50:47,336 and create a new file, we'll call it copy1.c, 1163 00:50:47,496 --> 00:50:49,326 and let's now try to oops, 1164 00:50:49,476 --> 00:50:55,776 let's try to actually copy a string rather than just compare. 1165 00:50:55,776 --> 00:50:56,696 So let me zoom in. 1166 00:50:56,696 --> 00:50:58,646 I'm gonna go ahead and do int, main, 1167 00:50:58,726 --> 00:51:03,486 void and I'm gonna go ahead and let's say first ask the user 1168 00:51:03,536 --> 00:51:04,876 as before, say something. 1169 00:51:05,276 --> 00:51:06,466 So we have something to play with. 1170 00:51:06,466 --> 00:51:11,076 And then I'm gonna do a no more string char star, 1171 00:51:11,126 --> 00:51:14,476 S1 gets GetString, alright? 1172 00:51:14,836 --> 00:51:17,366 And now I need to start getting into a new habit. 1173 00:51:17,486 --> 00:51:20,506 We've not mentioned as much, not expected as much, 1174 00:51:20,746 --> 00:51:21,866 but now that we know 1175 00:51:21,866 --> 00:51:27,496 that GetString could return a null pointer where null is bad, 1176 00:51:27,496 --> 00:51:29,386 we better start checking for this. 1177 00:51:29,386 --> 00:51:33,716 So if S1 equals equals the special constant null, 1178 00:51:33,986 --> 00:51:39,056 we should probably do something like printf, an error occurred 1179 00:51:39,056 --> 00:51:43,156 and then backslash N and then return 1, 1180 00:51:43,266 --> 00:51:44,966 or whatever nonzero value. 1181 00:51:45,406 --> 00:51:48,056 Alright, so what-- when might a function 1182 00:51:48,056 --> 00:51:49,886 like GetString ever return an error? 1183 00:51:49,966 --> 00:51:51,336 When might it return null rather 1184 00:51:51,476 --> 00:51:53,476 to signify an error, what could go wrong? 1185 00:51:53,746 --> 00:51:56,326 I can think of at least two things. 1186 00:51:57,076 --> 00:51:58,136 One I already answered. 1187 00:51:59,576 --> 00:52:01,856 Control D, so whatever that thing is, 1188 00:52:01,856 --> 00:52:03,686 control D apparently can do bad things 1189 00:52:03,876 --> 00:52:04,976 and so null might get return. 1190 00:52:04,976 --> 00:52:07,056 Now this is not a terribly worrisome case 1191 00:52:07,056 --> 00:52:09,406 because most humans are not gonna accidentally hit control 1192 00:52:09,406 --> 00:52:12,696 D. But again, there's another more reasonable approach. 1193 00:52:13,166 --> 00:52:15,416 How else could GetString fail potentially? 1194 00:52:16,016 --> 00:52:18,356 If the string is too long, right? 1195 00:52:18,356 --> 00:52:20,976 If I spend half an hour typing letters at the keyboard 1196 00:52:21,196 --> 00:52:23,896 such that I type in like 2 billion and 1 characters 1197 00:52:23,896 --> 00:52:26,156 or 4 billion and 1 characters, more characters 1198 00:52:26,346 --> 00:52:29,246 than I have bytes of RAM, it's possible that GetString, 1199 00:52:29,446 --> 00:52:31,626 which happens to use that function called malloc, 1200 00:52:31,836 --> 00:52:33,946 memory allocation, which we'll see before long, 1201 00:52:34,296 --> 00:52:36,496 what if they just can't give me two gigabytes worth 1202 00:52:36,496 --> 00:52:39,466 of memory just to fit some user's crazy long essay? 1203 00:52:39,786 --> 00:52:42,216 So how does GetString signify that error condition? 1204 00:52:42,216 --> 00:52:43,686 It's got to return a special value. 1205 00:52:43,686 --> 00:52:44,736 What's that special value? 1206 00:52:44,986 --> 00:52:47,006 It's gonna go ahead and return null. 1207 00:52:47,276 --> 00:52:49,836 Now in the past we might have returned false to signify 1208 00:52:49,836 --> 00:52:51,316 that something went wrong. 1209 00:52:51,636 --> 00:52:54,946 But if GetString is defined as returning a string 1210 00:52:55,146 --> 00:52:58,756 but a string we now know is just a pointer, we can't return false 1211 00:52:58,756 --> 00:53:00,136 because false is what? 1212 00:53:00,846 --> 00:53:01,536 It's a Boolean. 1213 00:53:01,716 --> 00:53:04,216 we have to return a pointer so we need to see 1214 00:53:04,216 --> 00:53:06,896 that it gives us a special pointer that's reserved 1215 00:53:06,896 --> 00:53:10,036 for error conditions, and so it commits to returning null 1216 00:53:10,206 --> 00:53:11,276 so now we can check for this. 1217 00:53:11,796 --> 00:53:17,166 So this is as simple as this is, this failure to check with an 1218 00:53:17,166 --> 00:53:19,016 if condition, something as simple as this 1219 00:53:19,016 --> 00:53:23,016 in a program is a huge source of bugs in people's programs, 1220 00:53:23,226 --> 00:53:27,066 as well as relatedly a huge source of hacking opportunities 1221 00:53:27,276 --> 00:53:29,366 if you are not checking the values of pointers 1222 00:53:29,436 --> 00:53:32,446 and are potentially doing things with pointers 1223 00:53:32,446 --> 00:53:33,696 that are just garbage values. 1224 00:53:33,766 --> 00:53:35,706 And we've already talked about the notion of garbage values, 1225 00:53:35,706 --> 00:53:37,546 they apply to pointers as well. 1226 00:53:37,816 --> 00:53:40,266 Alright, so now let's go ahead and make a copy of this thing. 1227 00:53:40,636 --> 00:53:44,246 I wanna make a copy of S2-- of S1 and call it S2. 1228 00:53:44,246 --> 00:53:47,366 So let me say char star S2 gets S1. 1229 00:53:47,366 --> 00:53:50,906 Alright, on the left hand side I'm allocating a pointer called 1230 00:53:51,176 --> 00:53:52,706 S2, I'm assigning it the value of S1. 1231 00:53:52,706 --> 00:53:56,236 And now I'm gonna go ahead and do a printf. 1232 00:53:56,776 --> 00:54:00,996 S1 is percent S backslash N and paste in S1. 1233 00:54:00,996 --> 00:54:03,376 And let me do the same for S2. 1234 00:54:03,576 --> 00:54:08,076 S2 is percent S, S2, okay. 1235 00:54:08,076 --> 00:54:10,126 And let's try this. 1236 00:54:10,346 --> 00:54:14,006 Let me scroll down, let me go ahead and do make copy 1. 1237 00:54:14,706 --> 00:54:17,626 Oh my god, so many errors, what's wrong here? 1238 00:54:18,246 --> 00:54:20,896 So, a couple of things, right? 1239 00:54:20,996 --> 00:54:25,356 I skipped this part, so include cs50.h, alright, 1240 00:54:25,356 --> 00:54:27,026 now I need that for sure for GetString, 1241 00:54:27,026 --> 00:54:28,676 let's see if that solved all my problems. 1242 00:54:28,676 --> 00:54:30,196 This does not mean I have six 1243 00:54:30,196 --> 00:54:32,276 of my problems rather I could just have one 1244 00:54:32,276 --> 00:54:33,486 that has a cascading effect. 1245 00:54:33,766 --> 00:54:36,236 Make copy 1, no, printf, damn it! 1246 00:54:36,316 --> 00:54:37,006 I need that one. 1247 00:54:37,396 --> 00:54:39,206 Alright, so it's probably just a good idea to get into the habit 1248 00:54:39,206 --> 00:54:40,476 of almost always using these things. 1249 00:54:40,796 --> 00:54:42,296 Let's see if I've solved all my problems now. 1250 00:54:43,166 --> 00:54:45,016 Make copy 1, whew look out. 1251 00:54:45,496 --> 00:54:49,026 So copy 1, enter, say something, hello. 1252 00:54:49,936 --> 00:54:51,666 Oh nice, my copy program works. 1253 00:54:52,536 --> 00:54:53,246 Okay, but not quite. 1254 00:54:53,516 --> 00:54:55,476 Let's try something to prove 1255 00:54:55,476 --> 00:54:57,156 that I'm actually not so good at this. 1256 00:54:57,726 --> 00:55:00,676 So let me try to capitalize-- capit-- 1257 00:55:00,816 --> 00:55:01,676 I don't need to Google this. 1258 00:55:01,676 --> 00:55:05,676 Capitalize just one of the strings. 1259 00:55:05,856 --> 00:55:08,076 And I'm gonna do this as follows. 1260 00:55:08,076 --> 00:55:11,426 I'm gonna capitalize just S2, and how do I capitalize it? 1261 00:55:11,536 --> 00:55:14,656 Well, I'm gonna capitalize its first letter like this 1262 00:55:15,076 --> 00:55:18,476 by 2 upper, can I do this? 1263 00:55:18,996 --> 00:55:22,206 S2 bracket 0, you might have used this function before 1264 00:55:22,206 --> 00:55:24,916 in Caesar or Visionare or if not 2 upper should. 1265 00:55:24,916 --> 00:55:27,136 Just uppercase this one-- a letter. 1266 00:55:27,466 --> 00:55:28,966 But let's see if I need to include anything. 1267 00:55:28,966 --> 00:55:29,926 Let's try to run this. 1268 00:55:30,116 --> 00:55:32,886 Implicit declaration of function 2 upper, so no problem. 1269 00:55:33,206 --> 00:55:35,156 So two upper is a string function. 1270 00:55:35,236 --> 00:55:37,046 So let's go back to the same documentation. 1271 00:55:37,046 --> 00:55:40,076 Two upper, C type, so here's another file. 1272 00:55:40,076 --> 00:55:42,456 We actually glimpsed that a moment ago in my directory, 1273 00:55:42,696 --> 00:55:44,186 but I need to do C type. 1274 00:55:44,916 --> 00:55:46,006 So let's go up here. 1275 00:55:46,916 --> 00:55:50,866 Include ctype.h. Now let's zoom in on the bottom, 1276 00:55:50,976 --> 00:55:53,936 try recompiling one last time, it looks good. 1277 00:55:54,116 --> 00:55:56,946 Now let's run copy 1, hello? 1278 00:55:58,236 --> 00:56:00,856 Hmm, now kind of interesting 1279 00:56:00,856 --> 00:56:03,446 but hopefully less interesting having just spent all the time 1280 00:56:03,446 --> 00:56:05,396 talking about what a pointer is and what a string is. 1281 00:56:05,496 --> 00:56:07,056 Why is this program flawed? 1282 00:56:07,056 --> 00:56:10,646 I capitalized S2 but apparently the computer decided 1283 00:56:10,646 --> 00:56:12,526 to capitalize S1 also, but why is that? 1284 00:56:12,526 --> 00:56:14,606 >> They point to the same place. 1285 00:56:14,606 --> 00:56:16,146 >> They point to the same place, exactly. 1286 00:56:16,146 --> 00:56:17,266 So what's really happened? 1287 00:56:17,506 --> 00:56:18,626 Well, I didn't type goodbye, 1288 00:56:18,686 --> 00:56:20,596 so let me delete this part of that story. 1289 00:56:20,946 --> 00:56:22,906 I did type in H-E-L-L-O. 1290 00:56:23,006 --> 00:56:26,326 I did create S1 and S1 is apparently pointing there. 1291 00:56:26,516 --> 00:56:31,796 But when I do S1 equal sign S2, or rather S2 equal sign S1, 1292 00:56:31,796 --> 00:56:32,956 what's really happening? 1293 00:56:33,166 --> 00:56:34,806 I'm not copying all of this. 1294 00:56:34,806 --> 00:56:38,506 I'm literally copying S1 putting it into S2. 1295 00:56:38,506 --> 00:56:39,786 So what ends up getting written here? 1296 00:56:41,486 --> 00:56:43,946 1, 2, 3, 4, 1, 2, 3, 4. 1297 00:56:44,156 --> 00:56:48,146 So when I go to S2 bracket 0, bracket 0 is 1298 00:56:48,146 --> 00:56:49,016 at what memory address? 1299 00:56:49,666 --> 00:56:51,806 1, 2, 3, 4. 1300 00:56:51,806 --> 00:56:52,886 Where is that on the board? 1301 00:56:52,886 --> 00:56:53,756 Well, it's over here. 1302 00:56:53,926 --> 00:56:55,426 So what did I change the H to? 1303 00:56:55,686 --> 00:56:58,516 A capital H, but wait a minute that's the exact same chunk 1304 00:56:58,516 --> 00:57:01,686 of memory that represents the S1 string, but just so happens 1305 00:57:01,686 --> 00:57:04,666 that now I do have copies but have copies of the pointer, 1306 00:57:04,906 --> 00:57:06,346 not copies of the whole string. 1307 00:57:06,586 --> 00:57:08,706 Now as an aside, I've been making 1308 00:57:08,706 --> 00:57:10,686 up 1, 2, 3, 4; 4, 5, 6, 7. 1309 00:57:10,886 --> 00:57:13,476 But the reality is these numbers are almost always uninteresting 1310 00:57:13,476 --> 00:57:13,846 to us. 1311 00:57:14,086 --> 00:57:16,366 So generally what the world does as we'll see 1312 00:57:16,366 --> 00:57:19,366 in clay before long is the world just says if it's a pointer, 1313 00:57:19,366 --> 00:57:21,316 let's just go with something much more intuitive. 1314 00:57:21,466 --> 00:57:23,736 If that's a pointer, just point on the board 1315 00:57:23,896 --> 00:57:26,556 to whatever it is you're pointing to in memory. 1316 00:57:26,966 --> 00:57:30,086 So we would typically draw pointers as arrows pointing 1317 00:57:30,086 --> 00:57:32,596 at something but realize that what's really inside 1318 00:57:32,596 --> 00:57:34,546 of here is a number like 1, 2, 3, 4. 1319 00:57:34,546 --> 00:57:37,036 But it's kind of uninteresting to keep making up fake numbers. 1320 00:57:37,236 --> 00:57:38,516 So we'll typically just draw arrows. 1321 00:57:38,956 --> 00:57:41,666 So just conceptually, what's the solution to this copy problem? 1322 00:57:41,666 --> 00:57:43,966 How do we actually duplicate the string? 1323 00:57:47,336 --> 00:57:50,506 What do we need to do in order 1324 00:57:50,506 --> 00:57:55,066 to make a genuine copy of S1 and call it S2? 1325 00:57:55,066 --> 00:57:56,226 Sorry? Yeah. 1326 00:57:58,776 --> 00:58:01,666 >> Individually copy each character. 1327 00:58:01,746 --> 00:58:04,166 >> Okay, we need to individually copy each character, 1328 00:58:04,346 --> 00:58:04,816 but into what? 1329 00:58:05,956 --> 00:58:06,826 >> Into like another-- 1330 00:58:06,826 --> 00:58:09,186 >> Yeah, so really what we need to fix this problem 1331 00:58:09,186 --> 00:58:12,506 and finish this story is we need the same amount of memory 1332 00:58:12,756 --> 00:58:14,886 that has as many locations. 1333 00:58:15,056 --> 00:58:17,236 I don't know what's here by default, 'cause again just 1334 00:58:17,236 --> 00:58:19,206 as with variables when you allocate memory, 1335 00:58:19,396 --> 00:58:21,276 you're gonna get a whole bunch of garbage, it's up to you 1336 00:58:21,276 --> 00:58:22,726 to populate it with actual values. 1337 00:58:22,976 --> 00:58:25,376 So that begs the question, how do you create an array 1338 00:58:25,546 --> 00:58:27,906 of a specific size dynamically? 1339 00:58:28,116 --> 00:58:30,026 How do you dynamically allocate memory? 1340 00:58:30,026 --> 00:58:31,256 Well, it's actually pretty easy. 1341 00:58:31,256 --> 00:58:34,916 Let me go up here and I'm still gonna check null. 1342 00:58:34,916 --> 00:58:37,886 But before I do this copy, what I'm gonna do is this. 1343 00:58:37,886 --> 00:58:40,396 I'm gonna first say S2 is not S1 1344 00:58:40,516 --> 00:58:45,416 but rather char star S2 equals the return value 1345 00:58:45,416 --> 00:58:47,086 of this fancy new function malloc. 1346 00:58:47,196 --> 00:58:50,776 And just to oversimplify for a moment, I need 1, 2, 3, 4, 1347 00:58:50,776 --> 00:58:52,696 5, technically 6 bytes. 1348 00:58:53,266 --> 00:58:55,306 So I'm gonna hard code this ever so briefly. 1349 00:58:55,756 --> 00:58:58,546 This function, malloc, called as follows, 1350 00:58:58,546 --> 00:59:01,346 will hand me back 6 bytes of memory 1351 00:59:01,556 --> 00:59:03,106 which pictorially would look like this. 1352 00:59:03,106 --> 00:59:06,146 They're at some other memory address, but how do I know 1353 00:59:06,146 --> 00:59:09,386 at what memory address malloc allocated me memory? 1354 00:59:10,246 --> 00:59:11,366 Via it's return value. 1355 00:59:11,596 --> 00:59:15,206 So what malloc does if it can is it allocates memory for you. 1356 00:59:15,586 --> 00:59:19,196 It then returns to you the address of which of the bytes 1357 00:59:19,196 --> 00:59:21,456 of memory it just allocated, which of the six bytes. 1358 00:59:21,926 --> 00:59:24,406 The first, and it's gonna be up to me 1359 00:59:24,406 --> 00:59:27,066 to make sure I put a special sentinel value, 1360 00:59:27,066 --> 00:59:29,236 special backslash 0 at the very end 1361 00:59:29,546 --> 00:59:32,876 so that I don't forget how long this chunk of memory is. 1362 00:59:33,096 --> 00:59:36,376 Now I probably shouldn't hard code something like 6. 1363 00:59:36,756 --> 00:59:40,396 How can I decide dynamically what the length 1364 00:59:40,396 --> 00:59:41,686 of S1 actually is? 1365 00:59:41,686 --> 00:59:47,706 Yeah, so a sterling of S1 like this. 1366 00:59:48,626 --> 00:59:49,646 Okay, so plus 1. 1367 00:59:49,706 --> 00:59:50,916 So here it's okay to hard code 1368 00:59:50,916 --> 00:59:53,436 that plus 1 'cause you always need plus 1 byte 1369 00:59:53,696 --> 00:59:55,256 for that special backslash 0. 1370 00:59:55,256 --> 00:59:58,066 So to be clear, when you call string length as you've done 1371 00:59:58,066 --> 00:59:59,856 for a week or so perhaps in your own code, 1372 01:00:00,086 --> 01:00:02,546 it returns the actual human length of the string, 1373 01:00:02,736 --> 01:00:04,886 it does not include the backslash 0. 1374 01:00:05,086 --> 01:00:06,726 >> So when you're allocating memory for this, 1375 01:00:06,996 --> 01:00:08,886 you actually do need to do so. 1376 01:00:09,166 --> 01:00:11,446 Now as an aside, the official version 1377 01:00:11,446 --> 01:00:12,466 of this code that's posted 1378 01:00:12,466 --> 01:00:17,076 on the course's website actually has a slight modification, 1379 01:00:17,076 --> 01:00:21,536 whereby I have multiplied by the size of a char. 1380 01:00:21,986 --> 01:00:23,726 So this is not strictly necessary 'cause 1381 01:00:23,726 --> 01:00:26,896 on almost every system a string is composed 1382 01:00:26,896 --> 01:00:29,256 of chars and a char is 1 byte. 1383 01:00:29,656 --> 01:00:33,176 But just in case my program ends up getting used for 20 years 1384 01:00:33,446 --> 01:00:35,146 and is eventually used on a computer 1385 01:00:35,296 --> 01:00:37,266 that actually uses 2 bytes for characters. 1386 01:00:37,266 --> 01:00:38,676 And this has already started to happen. 1387 01:00:38,676 --> 01:00:40,246 There's a system-- there's an alternative 1388 01:00:40,246 --> 01:00:42,846 to ASCII called Unicode that actually uses at least 2 bytes. 1389 01:00:42,846 --> 01:00:45,246 This is particularly helpful for Asian languages. 1390 01:00:45,246 --> 01:00:47,126 We're having only 8 bits. 1391 01:00:47,406 --> 01:00:49,756 It's insufficient to represent all possible symbols. 1392 01:00:50,016 --> 01:00:52,396 So just in case this program is run on the computer 1393 01:00:52,396 --> 01:00:54,456 where characters are more than 1 byte, 1394 01:00:54,746 --> 01:00:57,846 there's a special operator called sizeof that if I call it 1395 01:00:57,846 --> 01:01:00,436 and pass in the name of the data type I'm trying 1396 01:01:00,436 --> 01:01:02,796 to allocate memory for, it will say 1. 1397 01:01:03,106 --> 01:01:05,516 Or maybe on fancier systems, it will return 2. 1398 01:01:05,776 --> 01:01:08,426 So technically, even though this is getting a little scarier 1399 01:01:08,426 --> 01:01:10,846 versus the just malloc 6 that I hard coded, 1400 01:01:10,846 --> 01:01:12,496 this is now a bit more rigorous. 1401 01:01:13,016 --> 01:01:16,276 So now I have memory, now I have this chunk of memory. 1402 01:01:16,496 --> 01:01:19,356 How do I actually populate it with values? 1403 01:01:19,766 --> 01:01:22,526 Well, how do I go about copying one string into another? 1404 01:01:23,086 --> 01:01:28,676 What about for int I get 0, I is less than-- 1405 01:01:28,676 --> 01:01:30,286 oh wait, let's not make that same mistake again. 1406 01:01:30,616 --> 01:01:32,946 N gets the string length of S1. 1407 01:01:33,516 --> 01:01:38,796 I is less than N, I++, and what do I wanna do? 1408 01:01:38,826 --> 01:01:43,676 S2 bracket I gets S1 bracket I, and is this quite right? 1409 01:01:44,786 --> 01:01:47,066 Slight-- ever so subtle bug. 1410 01:01:48,146 --> 01:01:50,126 Okay, besides saving it, thank you. 1411 01:01:51,546 --> 01:01:51,886 What else? 1412 01:01:52,146 --> 01:01:57,016 I wanna click here, I wanna fix this here, I actually-- 1413 01:01:57,016 --> 01:01:58,836 I need to fix-- I can fix this in a couple ways. 1414 01:01:58,836 --> 01:02:01,266 I probably wanna do something like this, 'cause I also want 1415 01:02:01,266 --> 01:02:05,176 to copy that backslash 0, so it's okay in this case 1416 01:02:05,176 --> 01:02:07,956 to go one extra because I know it's a backslash 0, 1417 01:02:08,296 --> 01:02:10,526 so now I've actually copied the string. 1418 01:02:10,676 --> 01:02:12,376 But something could still go wrong. 1419 01:02:12,376 --> 01:02:14,046 What happens if what the user is trying 1420 01:02:14,046 --> 01:02:15,966 to copy is again a huge essay that they've typed in? 1421 01:02:16,356 --> 01:02:20,076 I need to check the value of some-- of some variable here. 1422 01:02:21,326 --> 01:02:24,736 If S2 equals equals NULL, 1423 01:02:24,886 --> 01:02:27,356 what I should really do here is something like printf, 1424 01:02:27,566 --> 01:02:32,706 an error occurred EG out of memory or something like that. 1425 01:02:32,706 --> 01:02:35,946 And then I wanna go ahead and return something other than 0. 1426 01:02:36,126 --> 01:02:38,256 So these are the habits you now need to get into. 1427 01:02:38,256 --> 01:02:40,996 Anytime you're dealing with memory, it's gonna be super easy 1428 01:02:40,996 --> 01:02:43,526 to have your code, segmentation fault, core dump, 1429 01:02:43,676 --> 01:02:47,696 all because of misuse of these things called pointers. 1430 01:02:48,126 --> 01:02:50,746 So now if I capitalize, should this work? 1431 01:02:51,856 --> 01:02:53,426 If I leave the rest of my code alone? 1432 01:02:53,676 --> 01:02:56,886 Let's go down here, let's go ahead and rerun-- 1433 01:02:56,886 --> 01:03:00,426 copy, big syntax error, implicit [inaudible] of string length, 1434 01:03:00,426 --> 01:03:02,056 so I'm back to making those old mistakes. 1435 01:03:02,056 --> 01:03:05,386 Alright, include string.h 'cause I remember where that came from, 1436 01:03:05,696 --> 01:03:08,956 and it's telling me I have a syntax error somewhere? 1437 01:03:08,956 --> 01:03:10,316 [ Inaudible Remark ] 1438 01:03:10,316 --> 01:03:11,836 >> Semicolon, right there. 1439 01:03:11,836 --> 01:03:14,676 Perfect. Alright, so now let me try recompiling, 1440 01:03:15,046 --> 01:03:19,466 let me rerun copy, hello, enter, nice. 1441 01:03:19,986 --> 01:03:21,996 So now, one is lower case, the original; 1442 01:03:22,316 --> 01:03:24,176 one is upper case, the new one. 1443 01:03:24,526 --> 01:03:26,396 So what's really going on here? 1444 01:03:26,396 --> 01:03:30,096 So, here's an example that actually has a bit more syntax 1445 01:03:30,096 --> 01:03:31,196 and this time it uses int. 1446 01:03:31,876 --> 01:03:33,426 So, here's a function main, 1447 01:03:34,046 --> 01:03:36,006 and let's say we allocate a variable X 1448 01:03:36,596 --> 01:03:39,086 but it's this time not an int, it's gonna be a pointer 1449 01:03:39,386 --> 01:03:44,126 to a value X-- sorry, rather a pointer to an integer, 1450 01:03:44,126 --> 01:03:48,896 so let me erase this and draw a picture consistent 1451 01:03:49,266 --> 01:03:53,206 with what's going on here, alright. 1452 01:03:54,066 --> 01:03:55,796 So, what's going on? 1453 01:03:55,796 --> 01:03:58,896 When I declare int star X, that's essentially like saying, 1454 01:03:58,896 --> 01:04:01,786 give me a pointer, call it X and I don't know what's there, 1455 01:04:01,786 --> 01:04:03,456 so it's some garbage value, question mark. 1456 01:04:03,746 --> 01:04:06,426 Then I do the same thing, give me a pointer , call it Y, 1457 01:04:06,426 --> 01:04:08,716 it's gonna be the same size but a different chunk of memory, 1458 01:04:08,716 --> 01:04:10,266 question mark, 'cause I don't know what it is. 1459 01:04:10,636 --> 01:04:13,596 Now, X gets malloc size of int, 1460 01:04:13,956 --> 01:04:15,946 so malloc again allocates memory, 1461 01:04:16,226 --> 01:04:17,466 how much memory is gonna come back? 1462 01:04:17,466 --> 01:04:19,406 How many bytes based on this call? 1463 01:04:20,796 --> 01:04:22,806 How big is an int? 1464 01:04:23,006 --> 01:04:25,616 So an int is 32 bits, otherwise known as 4 bytes, 1465 01:04:25,866 --> 01:04:29,356 that by coincidence is identical to the size of a pointer 1466 01:04:29,406 --> 01:04:31,286 in the appliance, so how do I draw this? 1467 01:04:31,806 --> 01:04:34,556 Well, I keep saying that 4 bytes or 32 bits is a square, 1468 01:04:34,906 --> 01:04:37,106 so this is what malloc just gave me. 1469 01:04:37,176 --> 01:04:38,666 And what got stored here in X, 1470 01:04:39,156 --> 01:04:44,246 according to the third line of code? 1471 01:04:44,496 --> 01:04:47,806 So malloc allocated this one chunk of memory 1472 01:04:48,056 --> 01:04:50,696 which is 32 bits-- or 4 bytes. 1473 01:04:50,696 --> 01:04:52,966 Why? Because the size of an int is 32 bits. 1474 01:04:53,516 --> 01:04:54,866 What does malloc return in general? 1475 01:04:55,996 --> 01:04:59,216 The memory address, the first byte of whatever chunk 1476 01:04:59,216 --> 01:05:02,016 of memory it just allocated, so maybe it's 1, 2, 3, 4, 1477 01:05:02,216 --> 01:05:03,766 so we would draw 1, 2, 3, 4 here. 1478 01:05:03,936 --> 01:05:05,506 But again, we're kind of past that point. 1479 01:05:05,696 --> 01:05:08,066 We can now just generalize and say, what is an X? 1480 01:05:08,276 --> 01:05:10,196 It's a pointer to that chunk of memory. 1481 01:05:10,846 --> 01:05:12,086 Alright, and how about Y? 1482 01:05:12,086 --> 01:05:13,386 What's happening now with Y? 1483 01:05:13,746 --> 01:05:14,506 Well, let's see. 1484 01:05:14,606 --> 01:05:17,996 Well, it turns out that this star notation can actually be 1485 01:05:17,996 --> 01:05:19,716 used in two different ways. 1486 01:05:19,816 --> 01:05:21,566 One, to say that something is a pointer, 1487 01:05:21,856 --> 01:05:24,906 and another to say go there. 1488 01:05:25,066 --> 01:05:27,546 So before when I had this piece of paper and I said 1, 2, 3, 4, 1489 01:05:27,546 --> 01:05:30,206 I handed it to my friend, when my friend went there, 1490 01:05:30,446 --> 01:05:32,956 you expressed the idea of going to a pointer 1491 01:05:33,136 --> 01:05:35,086 by saying star pointer name. 1492 01:05:35,296 --> 01:05:38,546 You don't specify the keyword int again, you just say star X, 1493 01:05:38,666 --> 01:05:42,116 means go to the address of X that's on that piece of paper, 1494 01:05:42,416 --> 01:05:46,096 so here's X, when I say go there, you can literally think 1495 01:05:46,096 --> 01:05:49,206 of it as following the arrow to see where you end up 1496 01:05:49,206 --> 01:05:51,766 and what do I put here according to the line of code? 1497 01:05:52,196 --> 01:05:54,416 42. So that int goes there. 1498 01:05:54,986 --> 01:05:59,806 But here's a bad thing, now I'm saying star Y gets 13. 1499 01:06:00,156 --> 01:06:02,896 So here's Y, where do I go? 1500 01:06:04,146 --> 01:06:07,306 Question mark, you can now think of as being like, you know, 1501 01:06:07,306 --> 01:06:10,426 who knows where this actually is, it's pointing 1502 01:06:10,426 --> 01:06:11,446 to some garbage value. 1503 01:06:11,446 --> 01:06:12,446 So where am I gonna go? 1504 01:06:12,666 --> 01:06:15,166 I'm gonna follow this pointer and then bam! 1505 01:06:15,396 --> 01:06:18,526 I'm gonna put 13 here but that is not my memory. 1506 01:06:18,576 --> 01:06:22,246 What happens when you touch memory that's not yours? 1507 01:06:22,536 --> 01:06:24,126 Segmentation fault, right, core dump, 1508 01:06:24,126 --> 01:06:25,636 if you touch memory that's not yours. 1509 01:06:25,856 --> 01:06:30,846 Now, this is bad, so if I then do though Y equals X, 1510 01:06:31,556 --> 01:06:33,656 Y equals X, how does the picture change? 1511 01:06:34,036 --> 01:06:35,716 Y equals X? 1512 01:06:37,296 --> 01:06:40,126 So I change its garbage value, I have no idea 1513 01:06:40,126 --> 01:06:41,116 where that was even going. 1514 01:06:41,376 --> 01:06:44,296 So if I say Y equals X, well, this is literally like saying 1515 01:06:44,296 --> 01:06:47,796 if this is 1, 2, 3, 4, put 1, 2, 3, 4 in Y. But pictorially, 1516 01:06:47,856 --> 01:06:49,846 it's just saying point to the same value. 1517 01:06:50,226 --> 01:06:54,026 So now if I say star Y gets 13, let's bring it home, 1518 01:06:54,806 --> 01:06:55,926 what changes on the board? 1519 01:06:56,156 --> 01:06:57,426 Star Y gets 13? 1520 01:06:58,976 --> 01:07:03,386 42 becomes 13, so what is the value of star Y? 1521 01:07:03,686 --> 01:07:05,346 13. What is the value of star X? 1522 01:07:06,586 --> 01:07:09,606 Also 13. But this program might not even gotten this far 1523 01:07:10,096 --> 01:07:10,616 because of this. 1524 01:07:11,616 --> 01:07:13,416 Lets see what Binky has to say on the same subject. 1525 01:07:14,516 --> 01:07:21,516 [ Pause ] 1526 01:07:22,016 --> 01:07:22,083 [ Music ] 1527 01:07:22,126 --> 01:07:28,836 >> Hey Binky, wake up, its time for Pointer Fun. 1528 01:07:29,136 --> 01:07:29,876 >> What's that? 1529 01:07:29,876 --> 01:07:31,026 Learn about pointers? 1530 01:07:31,026 --> 01:07:32,516 Oh goody! 1531 01:07:32,516 --> 01:07:34,296 >> Well, to get started, I guess we're gonna need a 1532 01:07:34,296 --> 01:07:34,986 couple pointers. 1533 01:07:35,486 --> 01:07:38,116 >> Okay, this code allocates two pointers 1534 01:07:38,116 --> 01:07:39,396 which can point to integers. 1535 01:07:40,036 --> 01:07:42,846 >> Okay, well, I see the two pointers but they don't seem 1536 01:07:42,846 --> 01:07:43,996 to be pointing to anything. 1537 01:07:43,996 --> 01:07:44,936 >> That's right. 1538 01:07:44,936 --> 01:07:46,746 Initially, pointers don't point to anything. 1539 01:07:46,796 --> 01:07:49,296 The things they point to are called pointees 1540 01:07:49,296 --> 01:07:51,336 and so [inaudible] is a separate step. 1541 01:07:51,336 --> 01:07:52,646 >> Oh right, right, I knew that. 1542 01:07:52,886 --> 01:07:54,646 The pointees are separate or-- 1543 01:07:54,646 --> 01:07:56,366 so, how do you allocate a pointee? 1544 01:07:57,126 --> 01:08:00,646 >> Okay, well, this code allocates a new integer pointee 1545 01:08:00,646 --> 01:08:03,096 and this part sets extra point to it. 1546 01:08:03,976 --> 01:08:06,196 >> Hey that looks better, so make it do something. 1547 01:08:06,756 --> 01:08:09,366 >> Okay, I'll dereference the pointer X 1548 01:08:09,566 --> 01:08:12,106 to store the number 42 into its pointee. 1549 01:08:12,106 --> 01:08:15,436 For this trick, I'll need my magic wand of dereferencing. 1550 01:08:16,056 --> 01:08:18,766 >> You're magic wand of dereferencing? 1551 01:08:19,066 --> 01:08:20,096 That-- that's great. 1552 01:08:20,096 --> 01:08:22,666 >> This is what the code looks like. 1553 01:08:22,996 --> 01:08:24,886 I'll just set up the number and-- 1554 01:08:24,886 --> 01:08:27,416 >> Hey look, there it goes. 1555 01:08:28,066 --> 01:08:31,526 So doing a dereference on X, follows the own, 1556 01:08:31,526 --> 01:08:34,676 access its pointee, in this case to store 42 in there. 1557 01:08:35,066 --> 01:08:37,296 Hey, try using it to store the number 13 1558 01:08:37,296 --> 01:08:38,706 through the other pointer, Y. 1559 01:08:39,476 --> 01:08:44,466 >> Okay, I'll just go over here to Y and get the number 13 set 1560 01:08:44,686 --> 01:08:50,936 up and then take the wand of dereferencing and just-- whoa! 1561 01:08:50,936 --> 01:08:53,186 >> Oh hey, that didn't work. 1562 01:08:53,186 --> 01:08:54,756 Say, Binky, I don't think 1563 01:08:54,806 --> 01:08:58,136 that dereferencing Y is a good idea 'cause you know setting 1564 01:08:58,136 --> 01:08:59,776 up the pointee is a separate step 1565 01:08:59,776 --> 01:09:02,076 and I don't think we ever did it. 1566 01:09:02,366 --> 01:09:02,826 >> Good point. 1567 01:09:03,546 --> 01:09:06,736 >> Yeah, we-- we allocated the pointer Y, but we never set it 1568 01:09:06,736 --> 01:09:08,156 to point to a pointee. 1569 01:09:08,526 --> 01:09:09,826 >> Very observant. 1570 01:09:10,426 --> 01:09:11,816 >> Hey, you're looking good there, Binky. 1571 01:09:12,356 --> 01:09:15,376 Can you fix it so that Y points to the same pointee as X? 1572 01:09:15,376 --> 01:09:17,996 >> Sure, I'll use my magic wand of pointer assignment. 1573 01:09:17,996 --> 01:09:20,366 >> Is that gonna be a problem like before? 1574 01:09:20,366 --> 01:09:22,556 >> No, this doesn't touch the pointees. 1575 01:09:22,706 --> 01:09:24,256 It does changes one pointer to point 1576 01:09:24,256 --> 01:09:25,386 to the same thing as another. 1577 01:09:26,256 --> 01:09:30,796 >> Oh I see, now Y points to the same place as X. So-- so wait! 1578 01:09:30,796 --> 01:09:33,866 Now, Y is fixed, it has a pointee, so you can try the wand 1579 01:09:33,866 --> 01:09:37,056 of dereferencing again to send the 13 over. 1580 01:09:37,496 --> 01:09:38,896 >> Okay, here goes. 1581 01:09:40,006 --> 01:09:41,266 >> Hey look at that. 1582 01:09:41,266 --> 01:09:42,596 Now, dereferencing works on Y, 1583 01:09:42,886 --> 01:09:44,266 and because the pointers are sharing 1584 01:09:44,266 --> 01:09:46,836 that one pointee, they both see the 13. 1585 01:09:47,086 --> 01:09:48,446 >> Yeah, sharing-- whatever. 1586 01:09:48,786 --> 01:09:50,376 So are we gonna switch places now? 1587 01:09:50,686 --> 01:09:52,286 >> Oh look, we're out of time. 1588 01:09:52,616 --> 01:09:52,796 >> But-- 1589 01:09:53,486 --> 01:09:55,096 >> Just remember the three pointer rules. 1590 01:09:55,446 --> 01:09:58,216 Number 1, the basic structure is that you have a pointer 1591 01:09:58,216 --> 01:10:01,146 and it points over to a pointee, but the pointer 1592 01:10:01,146 --> 01:10:03,846 and pointee are separate, and the common error is to set 1593 01:10:03,846 --> 01:10:06,246 up a pointer but to forget to give it a pointee. 1594 01:10:06,926 --> 01:10:09,826 Number 2, pointer dereferencing starts at the pointer 1595 01:10:10,106 --> 01:10:12,666 and follows its arrow over to access its pointee. 1596 01:10:12,966 --> 01:10:15,816 As we all know, this only works if there is a pointee, 1597 01:10:15,996 --> 01:10:17,536 which kind of gets back to rule number 1. 1598 01:10:17,536 --> 01:10:21,716 Number 3, pointer assignment takes one pointer and changes it 1599 01:10:21,716 --> 01:10:23,806 to point to the same pointee as another pointer, 1600 01:10:24,266 --> 01:10:26,466 so after the assignment, the two pointers will point 1601 01:10:26,466 --> 01:10:29,246 to the same pointee, sometimes that's called sharing, 1602 01:10:29,246 --> 01:10:31,256 and that's all there is to it really. 1603 01:10:31,546 --> 01:10:31,876 Bye-bye now. 1604 01:10:32,756 --> 01:10:33,386 >> This is Binky. 1605 01:10:33,386 --> 01:10:35,816 This is Nick Parlante of Stanford University. 1606 01:10:35,816 --> 01:10:36,496 This is CS50. 1607 01:10:36,496 --> 01:10:38,146 We will see you next week. 1608 01:10:38,646 --> 01:10:45,490 [ Applause ]