Section Problem Set 2: Hacker Edition Rob Bowden, Harvard University This is CS50. CS50.TV So, I'm Rob. I'm a senior in Kirkland. This is my third year TFing CS50. It is the first time that we are changing from the traditional-lecture-style section, where we just kind of review what happened in lecture and then you guys ask questions, now to being a lot more problem-based, where we use Spaces, and-- Oh, so the idea is to go to that link I sent you and then you'll be in my Space. Does anyone not have a laptop? Okay. So we're going to be using this, and we're going to be doing problems live in section and discussing them and figuring out what's wrong and I might pull up some of your code, and I might discuss your ideas. So has anyone had difficulty? You can chat on the side; I don't know if we'll have reason for that. Now, like the previous supersection, if you were at that class, you know what that's about. On all of P sets there's going to be these sections. So P-set 2, specifications, I guess you saw it on P-set 1 already. But we can look at P-set 2 for what we're going to be going over today. And you'll see a section of questions. So this will be in all of the P-sets; there'll be a section of questions. So far we've said, "Consider this an opportunity to practice." You won't be asked to submit this program. The idea is that these are supposed to kind of help you get started with the problem set. I guess on the Hacker edition, a lot of them are supposed to just be new, interesting things to learn. They may not be directly applicable to the problem set. And right now we're not having you submit them, but in theory, for later problem sets, you might submit them, and thus you can either come to section or watch the section to get the answers, or you can just get them on your own if you don't feel like enjoying my presence. So the--I think this is the first one. Oh. Also, under these sections of questions we also have you ask questions about the shorts. So I guess, in theory, you're supposed to watch these before coming to section, but it's fine if you don't; we'll go over them anyway. So we can start with these: "How does a while loop differ from a do-while loop? When is the latter particularly useful?" So anyone have any--? [Student] The do-while loop will always execute at least once. Yes. So that is the difference. A while loop--I'll just do it on here--while loop, we have the condition right here, whereas a do-while, you don't have a condition until we get down here. And so, when your program's executing, and it gets to the while loop, it immediately checks if this condition is true. If that condition is not true, it will just skip over the loop entirely. Do-while loop, as the program is executing, it gets to the "do." Nothing happens at this point, just continues executing. Then when it hits the "while," if the condition is true, it'll loop back and do it again and again and again until the condition is not true and then just falls through. So, the difference being, that this can skip right from the very start. This necessarily executes once and then may execute more times if the condition is still true. So the while loop will only do it once, or--the while loop--we may not need to do it at all, since as soon as we get to it, if the condition is false, we'll just skip right over it. Whereas do-while loop, we will execute it once, necessarily. Then, when we get to the condition, we check if it's true or false. If it's true, we'll do it again; if it's false, we'll just continue going. So when is the latter particularly useful? So I can say that in the entirety of the 4 years, 3 years, whatever, that I've been programming, I have used this, like, under 10 times. And probably 5 of them are in CS50 when we're introducing do-while loops. So when do you used do-while loops? When is the--yeah? [Student] When you're trying to get user input, or something you want to check-- Yeah. So do-while loops, user input is the big one. That's why on the first couple problem sets, when you want to ask the user, like, "give me a string," you can't continue until you get that string. And so you, necessarily, need to ask for the string at least once. But then if they answer something bad, then you need to loop back and ask again. But other than user input, it's very rare that I encounter a case where I want to loop "at least once" but possibly more. Questions or--? Has anyone used a do-while loop anywhere else? Okay. So the next one is, "What does undeclared identifier usually indicate if outputted by clang?" So what kind of code could I write to get 'undeclared identifier?' [Student] That x = 2? So we can just try it in here, x = 2. We'll run this--oh, I didn't click it. So here we get--all right. "Use of undeclared identifier x." So that's the undeclared identifier, a variable. It will frequently call a variable an identifier. So it might not know it's actually a variable; it doesn't know what it is. So it's an identifier. So why is it undeclared? Yeah. So to be clear on terminology, the declaration of a variable is when you say "int x," or "string y," whatever. The initialization of the variable, or the assignment of the variable, is whenever you say "x = 2." So we can do these in separate steps, int x, x = 2, and until--we can have a bunch of stuff in here-- but until this line happens, x is still uninitialized, but it has been declared. And so we can obviously do it in 1 line, and now we are declaring and initializing. Questions? And finally, "Why is the Caesar Cipher not very secure?" So first, does anyone want to say what the Caesar Cipher is? [Student] Caesar Cipher just is that you map, you shift every letter, a certain number of letters go over, and move back over, and it's not very secure because there's only 26 possible options and you just have to try each 1 of those until you get it. Oh. So, I should repeat? The Caesar Cipher, it's--I mean, you'll be dealing with it on the problems that you-- or I guess the standard edition of the problem set that's not on the hacker edition. So on the standard edition to the problem set, you get a message like, "Hello, world, " and you also have a number like 6, and you take that message, and each individual character, you rotate it by 6 positions in the alphabet. So the 'h' in hello would become h-i-j-k-l-m-n. So the first letter would be n. We do the same thing with e. If we have a, like, z or something, then we wrap back around to 'a.' But each character gets cycled 6 characters later in the alphabet, and it's not very secure since there are only 26 possibilities for how many ways you could wrap a single letter. So you can just try all 26 of them and, presumably, for a long enough message, only 1 of those possible 26 things is going to be legible, and the legible one is going to be the original message. So it's not a very good way of encrypting anything at all. Unrelated to those shorts, "What is a function?" So what is a function? Yes. [Student] It's like a separate piece of code that you can call to go through and then get the return value of whatever. Yeah. So I'll answer it by also answering the next--or repeat by also just answering the next one. You can use functions instead of just copying and pasting code over and over again. Just take that code, put it into a fuction, and then you could just call the function wherever you have been copying and pasting. So functions are useful. So now we'll do actual problems. The first one. So the idea of the first one is, you pass it a string, and regardless of the-- or does it say all lowercase? It does not say all lowercase. So the message can be anything, and--oh no. It does. "For simplicity, you may assume that the user will only input lowercase letters and spaces." So we pass it a message with only lowercase letters and then we alternate between capital and lowercase--we change the string to be capital and lowercase, alternating. So before we give you a second to even dive into the problem, what is the first thing that we need to do? Oh, what did I just click on? Oh, I just clicked on an email in here. So the first thing we need to do--am I looking at the wrong one? Is this part of this one? No, those are still in there, though. Okay, still here. Now we can not assume--? Yes. Here we cannot assume that it's only lowercase and spaces. So now we have to deal with the fact that the letters can be whatever we want them to be. And so the first thing we want to do is just get the message. We just need to get a string, string s = GetString, okay. Now this problem, there are a couple of ways of doing it. But we are going to want to use bitwise operators here. Are there people who either were not at the supersection, or something, and don't know what bitwise operators are? Or how they relate to ASCII in any way? [Student] I wasn't at the supersection, but I know what bitwise operators are. Okay. So then I don't have to go over the basics of them, but I'll explain what we're going to want to use here. So 'A': Binary representation of capital A, the number is 65. I'm just going to look at--41 is going to be 01000001. So that should be 65 in decimal; so this is the binary representation of the character capital A. Now, the binary representation of the character lowercase 'a' is going to be the same thing, almost. Is that--6, yeah. This is right. So binary capital A, binary lowercase 'a.' So notice that the difference between A and 'a' is this single bit. And this happens to be the 32 bit, the bit representing the number 32. And that makes sense since A is 65; 'a' is 97. The difference between them is 32. So now we know we can convert from A to 'a' by taking A and bitwise ORing it, with--that looks like a 1. This is a bitwise OR, with 00100000, and that'll give us 'a.' And we can get from 'a' to A by bitwise ANDing with 11, 0 in that place, 11111. So this will then give us exactly what 'a' was; but cancel out this individual bit, so we'll have 01000001; I don't know if I counted right. But this technique of bitwise ORing to get from capital to lowercase, and bitwise ANDing to get from lowercase to capital isn't exclusive to A. All of the letters, K vs k, Z vs z, all of them are just going to differ by this single bit. And so you can use this to change from any lowercase letter to any capital letter and vice versa. Okay. So an easy way of getting from this--so instead of having to write out whatever 1011111 is--an easy way of representing this number, and this isn't one that I went over in the supersection, but tilde (~) is another bitwise operator. What ~ does is it looks at the bit representation. Let's take any number. This is just some binary number, and what ~ does is it just flips all of the bits. So this was a 1, now a 0, this is a 0, now a 1, 010100. So that's all ~ does. So 32 is going to be the number--get rid of that-- so 32 is going to be the number 00100000, and so ~ of this is going to be this number up here that I ANDed 'a' with. Does everyone see that? This is pretty common, like when you want to figure out for later things that we might be seeing, when we want to see if-- or we want everything, every single bit set except for 1 you tend to do ~ of the bit that we don't want set. So we don't want the 32 bit set, so we do ~ of 32. Okay. So we can use all of those here. All right, so it's fine if you're not done, we shall slowly walk over together, or walk over this, so--through this. Walk through this. So we have our string, and we want to loop over each character in that string and do something to it. So how do we loop over a string? What should we use? I'm not going to do it on here. Yeah. So I have my iterator, and he said it, but how do I know how many characters are in the string? Strlen (s), then i + +. So what I've done here isn't the best way of doing things. Does anyone know why? Because you're checking the language of the string every single time. So we are going to want to move strlen, I could say up here, int length = strlen (s), and then do i < length, and in case you haven't seen it before, I could also do int i = 0, length = strlen (s). And so this is somewhat preferable, since now I've restricted the scope of the variable length to just this 'for' loop, instead of declaring it before and that it always exists, and in case you didn't catch why that's bad, or why the original was bad, it's--start at the for loop. I checked the condition. Is i < the length of s? So the length of s, let's work with "hello" the entire time. So length of s, h-e-l-l-o. Length is 5. So i = 0, length is 5, so i is not < 5, so the loop continues. Then we go again. We check the condition. Is i < the length of hello? So let's check the length of hello. H-e-l-l-o. That's 5; i is not < 5, so we continue again. So we are calculating, we are counting h-e-l-l-o, for each iteration of the loop, even thought it's never going to change; it's always going to be 5. So we just remember 5 up front, and now everything's better. So iterating over the entire string. What do we want to do for each character of the string? [Student speaking, unintelligible] Yeah. So, if the character is non-alphabetic, then we just want to skip over it. Because we only care about alphabetic letters; we can't capitalize a number. So how can we do this? So our condition, so if we want something--check if it's alphabetical. So how do we check this? [Student] You can just use the function is alpha. Is that included in either of these, or any include like, char.h or something? Let's not use the is alpha function, and use the explicit--so we have s[i], that is the eighth character of s, remember that a string is an array of characters, so the eighth character of s. Now, if it is a capital letter, we know it has to be in a specific range. And what is that range? Yeah. So if s[i] is ≥ 65, and s[i] is ≤ 90, what should I do instead? Yeah. So you should absolutely never even need to know the ASCII values of anything ever. Never think of the numbers 65, 90, 97 and 102, or whatever it is. You don't need--112?--you don't need to know those at all. That's wrong too. Only use the single-quote characters, single quote constants. So 'A' and less than 90 is 'Z.' And this is significantly better--I would not know off the top of my head that Z is 90. I do know off the top of my head that 'Z' is capital Z. So as long as this is in the range of capital A to capital Z, or we can check for lowercase, Or if it's in the range ≥ 'a' and ≤ z. So that's our condition. The style for where to put these things varies. I'll do it like this. Now, what do we want to do? We know this letter is a character, an alphabetical character. So we need to alternate between whether this should now be a capital letter or a lowercase letter. How do we keep track of which one we want it to be? [Student voices, unintelligible] So yes, but let me check. Module 0-2 was said, was a suggestion thrown out, and I agree with that. Except notice that, like--is this the case? Yeah. It's every other one, but we can't module 2 of i, or i mod 2, since notice that E is capital and 'a' is lowercase? But there's a space separating them? So they're going to be the same mod 2, but they're different cases. [Student question, unintelligible] Yeah. So we're just going to keep a count. We could also do that in here if we wanted; that might get a little unwieldy in the for loop declarations; I'll put it up here. So int count = starts at 0. And so now, I'm going to count how many alphabetical characters we've had. So we're inevitably going to count + + since we found another alphabetical character. But, so now you're saying if count mod 2. So what if count mod 2? Oh. I'll do = = 0 for now. We'll also go over that. So if count mod 2 = = 0, then what? [Students answer, unintelligible] So we want it to end up uppercase. There are 2 cases; uppercase and lowercase are the 2 cases. So if we're in lowercase we need to make it uppercase. If it's uppercase we don't need to do anything. But, is there a way--shouldn't have flipped-- that we don't even need to check whether it's uppercase or lowercase? What can we do to always make sure that we always end up at uppercase? So notice what we did for lowercase 'a'; what if we did this same exact thing to uppercase A? Does uppercase A change, or does the value change? Yeah. So any capital letter bitwise ANDed with ~ 32 is going to be that same uppercase character because for any uppercase character the 32nd bit is not set. So if we want to bring the character s[i], we want it to become lowercase or uppercase. So if it was lowercase, it is now uppercase , if it was uppercase, it's still uppercase, and that's it. I said this in the supersection: You can use 32 if you want, but I tend to prefer doing 'a' - A, instead of just plain 32, because it can be any other bit. After the 32 bit, it can be any of these, or we wouldn't have enough numbers to represent all of the characters. So if you get the 32 bit, it could be the 64 bit, it could be the 128 bit. Any of those bits could be the bit that distinguishes between uppercase and lowercase. I shouldn't need to know that it's the 32 bit. I can use this 'a' - A to get the bit that differs between the two without needing to rely on the magic number that is 32. And so now, else count was odd, and so what do I want to do? [Student answers, unintelligible] [Student] What's that? I will do it in 1 second. So now if I want to--I want to make sure the character is now lowercase, and so I can OR by 32, and 32 meaning 'a' - A. But notice, by the same reasoning as the previous one, that if the letter was already lowercase, then ORing by 32 just keeps it lowercase. It hasn't changed the original character. But now I don't have to avoid saying, "If it is lowercase, just forget about it, if it's uppercase, then change it." It's much more convenient to do this. [Student] Would that strategy of subtracting the uppercase from the lowercase work if it weren't 32? If it was, like, 34 or something? So, you need to know that the difference between the 2 is--? >>1 bit. It could be more than 1 bit, as long as all of the bits below this position are the same. So we need at least 26 characters--or, there are 26 characters. So we need at least 26 numbers to represent the difference-- The difference between A and 'a' has to be at least 26, or else we wouldn't have represented all the capital numbers. That means that A, if we start at 1, it's going to use all of these bits, all of these first 5 bits, to represent everything through Z. That's why the next bit, or this bit, the next bit is the one that's chosen to distinguish between A and 'a.' That's also why, in ASCII table, there are 5 symbols separating capital letters from lowercase letters. Since those are the symbols, the extra 5 that brings up the 32 being the difference between them. [Student] So we could do it, because ASCII's designed that way. Yes. But ASCII--the difference could also be both of these bits. Like, if A were 10000001, and 'a' was 11100001--I forget, whatever. But if it were this, then we could still use 'a' - A. It's just now the difference between A and 'a' is still these 2 bits. I think it's written 48. Is it 32 + 64? I think it is? It would still be 2 bits; every single character, like, Z and z, K and k, they would still have the same exact bits set except for those 2 bits. So as long as that's always true, regardless of if we're using ASCII or some other system, as long as there's only a set number of bits that are different for each character, then that works fine. It's just that 32 was set up because it's the first one we could possibly use. >>Cool. I tend to prefer, in case you haven't seen, if the block is only a single line, you can get rid of the curly braces; so I tend to prefer doing this. Also, you know how we can do things like s[i] + = 1? You can also do s[i] bitwise AND = 32. And bitwise OR = 32. Also, count mod 2 = = 0. So remember that--I won't write it--any non-zero value is true, and 0 is false. So "if count mod 2 = = 0" is the same as saying "if not count mod 2." I probably would have just reversed the lines and said, "if count mod 2, do the OR 1, else do the AND 1," so that I didn't need the "not." But this works just as well. And what else can I do here? You could combine them with ternary if you wanted, but then that'd just make things messier and probably more difficult to read, so we won't do that. Anyone have any other suggestions? Is that all the problem asked for? Oh yeah. So get rid of these empty lines, now we'll print f , % s being the one for strings, We will print f, s. Now let's run it. Did I do anything wrong? That's a \ "; I want an n. Okay. Now we'll run it. It'll probably yell at me. Strlen is in string.h. So this is the nice thing about Clang is it tells you what it's in, instead of GCC which just says, "Hey, you forgot something, I don't know what it was." But this will tell me, "You meant to include string.h." So I didn't prompt for anything, so it's not saying anything. But we'll do their example, "Thanks 4 the add". That looks right. Hooray. So returning to your main, I almost never do it. It's optional. And main is the only function for which it is optional. If you do not return anything from main, it's assumed that you meant to return 0. Questions? Okay. So now the second problem. "Recall from week 2's second lecture that swapping 2 variables' values by passing those 2 variables to a function (even if called swap) doesn't exactly work, at least not without 'pointers.'" And ignore pointers until we get to them. We want to swap 2 variables; we're not using a function to do it. We're still going to do it in main like it says. But to use those 2 variables, we don't want to use a temporary variable. There are 2 ways to do this. You can do it using your traditional binary operators. So does anyone know a quick and dirty way of doing that? It might actually take a minute of thinking. If I have-- I'll set the problem up like they ask. So if I have 2 variables, A, which is just an integer that they give me, and sum variable B, which is another integer that I'm given. So if I have these 2 variables, now I want to swap them. The traditional, using your regular binary operators, I mean, like +, -, ÷. Not bitwise operators which act on binary. So using -, +, ÷, and all those. We could swap by doing something like a = a + b, and b = a - b, a = a - b. So, sanity check, and then we'll see why that works. Let's say a = 7, b = 3, then a + b is going to be 10. So we're now setting a = 10, and then we're doing b = a - b. So we're doing b = a - b, which is going to be 7, and b = a - b again, or a = a - b. Which is going to be 10 - 7 which is 3. So now, correctly, 'a' was 7, b was 3, and now b is 7 and 'a' is 3. So that kind of makes sense; 'a' is the combination of the 2 numbers. At this point, 'a' is the combination, and then we're subtracting out the original b, and then we're subtracting out what was the original 'a.' But this does not work for all numbers. To see this, let's consider a system; so we usually think of integers as 32 bits. Let's work on something that's only like 4 bits. Hopefully I come up with a good example right now. So, I know, this will be easy. Let's say our 2 numbers are 1111, and 1111; so we're in binary right now. In actual decimals, if you want to think of it that way, a = 15 and b = 15. And so we expect, after we swap them--they don't even have to be the same numbers, but I did it this way. Let's make them not the same numbers. Let's do 1111 and 0001. So a = 15 and b = 1. After we swap them, we expect 'a' to be 1 and b to be 15. So our first step is a = a + b. Our numbers are only 4 bits wide, so 'a,' which is 1111, + b, which is 0001, is going to end up being 10000, but we only have 4 bits. So now a = 0. And now we want to set b = a - b--actually, this still works out perfectly. a = a - b--let's see if this works out perfectly. So then b = 0 - 1, which would still be 15, and then a = a - b, which would be 1. Maybe this does work. I feel like there's a reason it doesn't work using regular. Okay, so working on the assumption that it doesn't work with regular binary operations, and I will look for--I will Google to see if that is true. So we want to do it using bitwise operators, and the clue here is XOR. So, introducing XOR (^) if you have not seen it yet. It's, again, a bitwise operator so it acts bit by bit, and it's-- If you have the bits 0 and 1, then this will be 1. If you have the bits 1 and 0, it'll be 1, you have the bits 0 and 0 it'll be 0, and if you have the bits 1 and 1 it'll be 0. So it's like OR. If either of the bits are true, it's 1, but unlike OR, it can't be both bits that are true. OR would have this be 1, XOR would have this be 0. So we're going to want to use XOR here. Think about it for a minute; I'm going to Google. Well, you can't read that; I'm currently on the XOR swap algorithm page. Hopefully this will explain why I can't-- This is exactly the algorithm that we just did. I still don't see why--I must have just picked a bad example, but this case where 'a' happened to become 0, after getting to 5 bits, so now 'a' is 0, that is what is called "integer overflow." According to Wikipedia, "Unlike the XOR swap, this variation requires that it uses some methods to guarantee that x + y does not cause an integer overflow." So this does have problems; this was integer overflow, but I did something wrong. I'm not sure. I'll try to come up with another one. [Student] Well, isn't integer overflow when you're trying to put a number in there bigger than the amount of bits you have allocated? Yeah. We have 4 bits. That's--we had 4 bits, we then try to add 1 to it, so we end up with 5 bits. But the fifth bit just gets cut off, yeah. It might actually-- [Student] Does that throw you an error, or does that--would that throw an error? No. So there's no error. When you get to the assembly level, a special bit somewhere is set that said there was an overflow, but in C you kind of just don't deal with that. You actually can't deal with it unless you use special assembly instructions in C. Let's think about XOR swap. And I think the Wikipedia article might have also been saying that-- So it also brought up modular arithmetic, so I guess I was, in theory, doing modular arithmetic when I said that 0 - 1 is 15 again. So that might actually--on a regular processor that does 0 - 1 = 15. Since we end up at 0, we subtract 1, so then it just wraps back around to 1111. So this algorithm might actually work, the a + b, the a - b, b - a; that might be fine. But there's some processors which don't do that, and so it would not be fine in those specific ones. XOR swap will work on any processor. Okay. The idea is that it's supposed to be the same, though. Where we are using XOR to somehow get the information of both into 1 of the variables, and then pull out the information of the individual variables again. So does anyone have ideas/the answer? [Student answer, unintelligible] So this should work, and also, XOR is commutative. Regardless of which order these 2 numbers happen to be in up here, this result is going to be the same. So a ^ b is b ^ a. You might also see this written as a ^ = b, b ^ = a, a ^ = b again. So this is right, and to see why this works, think of the bits. Using a smallish number, let's say 11001, and 01100. So this is 'a'; this is b. So a ^ = b. We're going to be setting 'a' = to the XOR of these 2 things. So 1 ^ 0 is 1; 1 ^ 1 is 0; 0 ^ 1 is 1, and 0 ^ 0 is 0; 1 ^ 0 is 1. So 'a,' if you look at the decimal number, it's going to be-- you're not going to see much of a relationship between the original 'a' and the new 'a,' but looking at the bits, 'a' is now like a mesh of the information of both the original 'a' and the original b. So if we take b ^ a, we see that we'll end up at the original 'a.' And if we take the original 'a' ^ the new 'a,' we see we end up at the original b. So (a ^ b) ^ b = the original 'a.' And (a ^ b) ^ a = the original b. There is--another way of seeing this is anything XOR itself is always 0. So 1101 ^ 1101, all the bits are going to be the same. So there's never going to be a case where 1 is a 0 and the other is 1. So this is 0000. The same with this. (a ^ b) ^ b is like a ^ (b ^ b). (b ^ b) is going to be 0; a ^ 0 is just going to be 'a,' since all the bits are 0. So the only ones that are going to be where 'a' was originally a 1--had ones. And the same idea here; I'm pretty sure it's also commutative. Yeah. I did say before that it was commutative. The ^ 'a,' and it's associative, so now (b ^ a) ^ a. And we can do b ^ (a ^ a). And so again, we get the original b. So 'a' is now the combination of 'a' and b together. Using our new combo 'a' we say b = combo 'a' ^ the original b, we get the original 'a.' And now a = combo 'a' ^ the new b, which was the original--or which is now what was 'a' or b. That's this case down here. This is = b, old b. So now everything is back in the swapped order. If we actually looked at the bits, b = a ^ b, is going to XOR these 2, and the answer is going to be this, and then a = a ^ b is XORing these 2 and the answer is this. Questions? Okay. So the last one is somewhat significantly more difficult. [Student] I think he has a question about it. >>Oh, sorry. [Student] What's actually faster? If you use this XOR, or is it if you declare a new variable? So what is actually faster, declaring a new variable or using XOR to swap? The answer is, in all likelihood, a temporary variable. And that is because once it's compiled down--so at the assembly level, there's no such thing as local variables or any temporary variables or any of this stuff. They're just like--there's memory, and there are registers. Registers are where things are actively happening. You don't add 2 things in memory; you add 2 things in registers. And you bring things from memory into registers to then add them, and then you might put them back into memory, but all the action happens in registers. So when you're using the temporary variable approach, usually what happens is these 2 numbers are already in registers. And then from that point on, after you've swapped them, it'll just start using the other register. Anywhere you had been using b, it'll just use the register that was already storing 'a.' So it doesn't need to do anything to actually do the swap. Yeah? [Student] But it also takes more memory, right? It will only take more memory if it needs to store that temporary variable. Like if you later use that temporary variable again somewhere, then--or you assign something to that temporary variable. So if at any point in time 'a,' b in temp have distinct values or something, then it's going to have distinct locations in memory, but it is true that there are many local variables which will only exist in registers. In which case, it's never put into memory, and so you're never wasting memory. Okay. Last question is a bit more. So here, in this CS50 appliance, there is a dictionary. And the reason for this is because [??B66] is a spell checker where you'll be writing using hash tables or tries or some data structure. You're going to be writing a spell checker, and you're going to be using this dictionary to do that. But for this problem, we are just going to look up to see if a single word is in the dictionary. So instead of storing the entire dictionary in some data structure and then looking over an entire document to see if anything's misspelled, we just want to find 1 word. So we can just scan over the entire dictionary and if we never find the word in the entire dictionary, then it wasn't in there. If we scan over the entire dictionary and do see the word, then we're good, we found it. It says here that we want to start looking at C's file-handling function, since we want to read the dictionary, but I will give the hint here as to which functions you should think of. I'll write them on Spaces. So the main ones you'll want to look at are f open and then, inevitably, f closed, which will go at the end of your program, and f scan f. You could also use f read, but you probably don't want to because that--you don't end up needing that. F scan f is what you're going to be using to scan over the dictionary. And so you don't need to code up the solution, just try and like pseudo-code your way to a solution, and then we'll discuss it. And actually, since I already gave you these, if you go into any terminal or your appliance's shell, I would--I usually--if you have not seen yet, I don't know if you did in class, but man, so the man pages, are pretty useful for looking at pretty much any function. So I can do, like, man f, scan f. This is now the info about the scan f family of functions. I could also do man f, open, and that'll give me the details of that. So if you know what function you are using, or you're reading code and you see some function and you're like, "What does this do?" Just man that function name. There are a couple of weird examples where you might have to say like. man 2 that function name, or man 3 that function name, but you only have to do that if man function name doesn't happen to work the first time. [Student] So I'm reading the man page for open, but I'm still confused on how to use it and the program. Okay. A lot of the man pages are less than helpful. They're more helpful if you already know what they do and then you just need to remember the order of the arguments or something. Or they can give you a general overview, but some of them are very overwhelming. Like f scan f, also. It gives you the information for all of these functions, and 1 line down here happens to say, "F scan f reads from the string point or stream." But f open. So, how would we use f open? The idea of a program which needs to do file I/O is that you first need to open the file you want to do things with, and inevitably, read things from that file and do stuff with them. F open is what we use to open the file. The thing we get back, so what file do we want to open, it gives us the-- in here it says "/user/share/dict/words." This is the file that we want to open, and we want to open it-- we have to explicitly specify whether we want to open it to read or if we want to open it to write. There's a couple of combinations and stuff, but we want to open this for reading. We want to read from the file. So what does this return? It returns a file star (*), and I'll just show everything in the variable f, so *, again, it's a pointer, but we don't want to deal with pointers. You can think of f as, f is now the variable you're going to use to represent the file. So if you want to read from the file, you read from f. If you want to close the file, you close f. So at the end of the program when we inevitably want to close the file, what should we do? We want to close f. So now the last file function that we're going to want to use is scan f, f scan f. And what that does is it scans over the file looking for a pattern to match. Looking at the man page here, we see int f scan f, ignore the return value for now. The first argument is the file * stream, so the first argument we're going to want to pass is f. We're scanning over f. The second argument is a format string. I will give you a format string right now. I think we happen to say, 127s\n, a lot of that's unnecessary. The idea of what that format string is, is you can think of scan f as the opposite of print f. So print f, print f we also use this type of format parameter, but in print f what we're doing is--let's look at an equivalent. So print f, and there's actually also f print f, where the first argument is going to be f. When you print f, we could say something like, "print 127s\n" and then if we pass it some string, it's going to print this string and then a new line. What 127 means, I'm pretty sure, but I've never restricted myself to it, You wouldn't even need to say '127' in the print f, but what it means is print the first 127 characters. So I'm pretty sure that's the case. You can Google for that. But in the next one I'm almost positive it means that. So this is print the first 127 characters, followed by a new line. F scan f now, instead of looking at a variable and printing it, it's going to look at some string, and store the pattern into the variable. Let's actually use scan f in a different example. So let's say we had some int, x = 4, and we wanted to create a string made of--wanted to create the string that was like, this will come up much later, something that's just like 4.jpg. So this might be a program where you will have sum counter, sum counter i, and you want to save a bunch of images. So you want to save i.jpg, where i is some iteration of your loop. So how do we make this string for that JPEG? If you wanted to print 4.jpg, we could just say print f, % d.jpg, and then it would print for that JPEG. But if we want to save the string 4.jpg, we use scan f. So string s--actually we can't--character, char s, let's go 100. So I just declared some array of 100 characters, and that's what we're inevitably going to be storing that JPEG in. So we're going to use scan f, and the format, how we would say % d.jpg in order to print 4.jpg, the format of this is going to be % d.jpg. So the format is % d.jpg, what we want to replace % d with is x, and now we need to store that string somewhere. And where we're going to store this string is in the array s. So after this line of code, s, if we print f, % s of the variable s, it's going to print 4.jpg. So f scan f is the same as scan f, except now it's looking over this file for what to store in s. That's what the last argument is going to be. We want to store--"Scan f family of functions scans in both according to format as tried below. If any are stored in the location points you might return--" No, we might be good. Let me think for a second. So scan f does not--what the heck is the function which does that? So scan f isn't going to take an integer and do dot jpg. It's going to [mumbles]. Save int variable in string int C. What is this variable, or what is this function called? Yes. That's--yes. So what I was defining to you before was s print f, which--that makes much more sense, why I said it was much more like print f. Scan f is still kind of like print f, but s print f is going to scan it over and replace the variables and now store it in a string. Instead of printing it, it stores it in a string. So ignore that entirely. You can still think of the format specifier as like that of print f. So now, if we wanted to do the 4.jpg thing, we would do s print f, x of this. So what scan f is doing--what was your question going to be? [Student] I'm just confused on what we're trying to do right here with that JPEG. Can you explain that 1 more time? So this was--it's less relevent to f scan f now; hopefully, it will tie back in some kind of way. But what I initially was intending to show was--this is actually directly relevant to these [?? F5] You're going to be using s print f, where, say we have 100 images, and you want to read image 1.jpg, 2.jpg, 3.jpg. So in order to do that, you need to f open, and then you have to pass in the string that you want to open. So we would want to open 1.jpg; in order to create the string that is 1.jpg, we do s print f of % d.jpg--we didn't do for int i = 0. i < 40, i + +. So s print f % d.jpg of i. So after this line, now the variable or the array s is going to 1.jpg. Or, 0.jpg, 1.jpg, 2.jpg. And so we can open, in turn, each image for reading. So that is what s print f does. Do you see what s print f is now doing? [Student] Okay, so it's taking--it creates a string, something.jpg, and then stores it. Yes. It creates--this is another format string, just like scan f and print f, where it inserts all of the variables into the second argument, might be s as opposed to i. Perhaps--I mean, that's the case. But whatever the order of arguments is. It's going to insert all of the variables into the format string and then store into our buffer; we call that a buffer, it's where we're storing the string. So we are storing inside of s the correctly-formatted string, % d having been replaced with 4. [Student] So if we did this, is the variable f just going to be reassigned? Yes. So we should close the original f before doing this. But--and then also, if there were not an f open up here, then we would need to say-- Yeah. But it would open a hundred different files. [Student] But we wouldn't be able to access or--okay. Okay. So scan f, f scan f, is kind of the same idea, but instead of, instead of storing it into a string, it's more like you are now going over a sting and pattern matching against that string and storing the results into variables. You can use scan f to parse over something like 4.jpg, and store the integer 4 into sum int x. That's what we can use scan f for. F scan f is going to do that at the command line. I'm actually pretty sure this is what the CS50 library does. So when you say, "get int," it's scan f-ing over--scan f is the way you get user input. F scan f is going to do the same thing but using a file to scan over. So here, we are scanning over this file. The pattern we are trying to match is some string that is 127 characters long followed by a new line So I'm pretty sure we could even just say "match s," since in the dictionary we happen to have, we're guaranteed no word is that long, and also f scan f, I think, will stop at the new line no matter what. But we'll include the new line in the match, and-- [Student] If we didn't include the new line, wouldn't it find parts of a word? It--each--looking at the dictionary-- So in the dictionary, these are all of our words. Each one is on a new line. The scan f is going to pick up this word. If we don't include the new line, then it's possible that the next scan f will just read the new line. But including new line then will just ignore the new line. But we'll never get part of a word, since we are always reading up to a new line, no matter what. [Student] But what if you search for the word "cissa," like c-i-s-s-a. Will it find that, and say it's a match? So here we--it will read in--this is actually a good point. We're never using the current--the word we're looking for is the first command line argument. So string, word = argv 1. So the string we're looking for is argv 1. We are not looking for a word at all in our scan f. What we were doing with scan f is getting each word in the dictionary, and then once we have that word we're going to use strcmp to compare them. We're going to compare our word and what we just read in. So inevitably, we're going to end up doing a bunch of scan fs until it just so happens that scan f will return-- it will return one, as long as it has matched a new word, and it will return something else as soon as it has failed to match the word. We are reading over the entire dictionary, storing line by line each word into the variable s. Then we are comparing word with s, and if the comparison = = 0, strcmp happens to bring 0 if a match was made. So if it was 0, then we can print f, matched, or word is in dictionary, or whatever you want to print f. And then--we don't want to f close over and over again. This is the kind of thing we want to do, and we are not just looking for word in the dictionary. So we could do that, if we wanted to look for their pattern, c-i-s-s-a, like you said before, if we wanted to look for that pattern, then it would fail in the case because that's not actually a word, but one of the words in the dictionary happens to have that in it. So it would match this word, but this subset of the word is not a word itself. But that's not how we're using it; we're reading in each word and then comparing the word we have with that word. So we're always comparing full words. I can send out the finalized solutions later. This is kind of nearly the right answer, I think. [Student comment, unintelligible] Oh, did I get rid of that before? Char s, I guess we said 127--I forget what the largest is. We'll just do 128; so now s is long enough. We don't need to print anything. We're also going to want to have to close our file, and that should be about the right answer. CS50.TV