[MUSIC PLAYING] DAVID MALAN: All right. This is CS50, and this is week 6. And this is, again, one of those rare days where in just a bit of time you'll be able to say that you learned a new language. And that language today is going to be this language called Python. And we thought we'd begin by introducing Python by way of some more familiar friends. So this, of course, is where we began the course back in week 0 when we introduced Scratch, a simple program that quite simply says "hello, world." And then very quickly, things escalated and became a lot more cryptic, a lot more arcane, and we introduced C and syntax like this, which of course does the exact same thing, just printing out "hello, world" on the screen, but with the requirement that you understand and you include all of this various syntax. So today, all of this complexity, all of the syntax from C, suddenly begins to melt away, such that we're left with this new language called Python that's going to achieve the exact same goal simply with this line of code here. Which is to say that Python tends to be more accessible, it tends to be a little easier. But that's because it's built on this tradition of having started, as humans years ago, building these low-level languages like C, realizing what features are missing, what some of the pain points are, and then layering on top of those older languages new ideas, new features, and in turn new languages. So there are dozens, hundreds really, of programming languages out there. But there's always a subset of them that tend to be very popular, very in vogue at any given time. Python is among those very popular languages. And it's the third of our languages that we'll look at, indeed, at this point in the term. So let's go ahead and introduce some of the syntax of Python, really by way of comparison with what we've seen in the past. 
Because no matter how new some of today's topics are, they should all be familiar in the sense that we're going to see loops again, conditions, variables, functions, return values. It's pretty much just going to be a translation of features past to now features present. So this of course, in the world of Scratch, was just one puzzle piece or a function, whose purpose in life is to say "hello, world" on the screen. In week 1, we translated this to the more cryptic syntax here, key details being that it's printf, that you have the quote, the string, "hello, world," you have this backslash n to represent a new line character. And then of course, this kind of statement has to end with a semicolon. The equivalent line of code from today on in this language called Python is going to be quite simply this. So it looks similar, certainly, but it's now print instead of printf. We still have the double quotes, but gone are the backslash n as well as the semicolon. So if you've been kicking yourself all too frequently for forgetting stupid things like the semicolons, Python will now be your friend. Well, let's take a look at another example here, how we might go about getting user input as well. Well, here notice that we have a puzzle piece called Ask. And it says, ask "What's your name?" and wait. And the next puzzle piece said, whatever the human had typed in, precede it with the word "hello." In C we saw code like this-- string answer equals get_string "what's your name?" and then printing out with printf, "hello %s," plugging in one value for the other. In Python, some of this complexity is about to melt away, too. And in Python, we're going to see a little something like this. So no longer present is the mention of the type of variable. No longer present is the semicolon at the end. And no longer present is the %s and that additional argument to print. So in fact, let's go ahead and see these things in action. 
I'm going to go ahead and go over to CS50 IDE here for just a moment. And within CS50 IDE, I'm going to go ahead and write my very first Python program. And to do that, I'm going to go ahead and create a file that we'll initially call hello.py. Much like in the world of C, Python programs have a standard file extension being .py instead of .c. And I'm just going to do what I proposed was the simplest translation. I'm just going to go ahead and say print, "hello, world." I'm going to save my file. And then I'm going to go down to my terminal window. And in the past, of course, we would have used make, and then we would have done ./hello or the like. But today, I'm quite simply going to run a command that itself is called python. I'm going to pass in the name of the file I just created as its command line argument. And voila, hitting Enter, there is my very first program in Python. So that's pretty powerful. Let's go ahead and create the second program that I proposed a moment ago. Instead of just printing out "hello, world" the whole time, I'm also going to go ahead this time and give myself a variable that I'll call answer. I'm going to go ahead now and get input from the user. And I'm going to go ahead and use the familiar get_string that we did see in C. I'm going to go ahead and ask, "What's your name" question mark. I'm not going to bother with a semicolon. But down here, I'm going to go ahead and say print "hello," comma, and then a space inside of the quotes. And instead of doing something like %s, I'm actually going to go ahead and just do a plus operator, and then literally the word "answer." But the catch is that this isn't going to work just yet. This isn't going to work just yet, because get_string, it turns out, just like it doesn't come with C, it also doesn't come with Python. So I need to do one thing that's going to be a little bit different from the past. Instead of hash including something, I'm going to literally say from cs50 import get_string. 
So in the world of C, recall that we included cs50.h, which had declarations for functions like get_string and get_int and so forth. In the world of Python, we're going to show you something similar in spirit, but the syntax is just a little different. We're going to say from cs50, which is our Python library that we the staff wrote, import, that is, include a function specifically called get_string. And now any errors that I might have seen a moment ago on the screen have disappeared. If I go ahead and save this file and now do python space hello.py and hit Enter, now I can go ahead and type in my actual name, and voila, I see "hello," comma, "David." So let's tease apart what's different about this code and consider what more we can do after this. So again, notice-- on line 3, there's no mention of string anymore. If I want a variable, I just go ahead and give myself a variable called answer. The function is still called get_string, and it still takes an argument just like the C version, but the line no longer ends with a semicolon. On my final line of code here, print is now indeed print instead of printf. And then this is new syntax. But in some sense, it's going to be a lot more straightforward. Instead of having to think in advance where I want the %s and my placeholder, this plus operator seems to be doing something for me. And let me go ahead and ask a question of the group here. What does that plus operator seem to be doing? Because it's not addition in the arithmetic sense. We're not like adding numbers together. But the plus is clearly doing something that gives us a visual result. Any thoughts from Peter? What's this plus doing? AUDIENCE: It's concatenating strings. DAVID MALAN: Yeah, it's concatenating strings, which is the term of art to describe the joining of one string and the other. So it's quite like, therefore, Scratch's own Join block. We now have a literal translation of that Join block, which we didn't have in C. 
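As a minimal sketch of the concatenation Peter describes, the snippet below wraps the join in a hypothetical function called greet and hardcodes the name instead of calling get_string, so it runs even where the cs50 library isn't installed.

```python
def greet(answer):
    # The + operator joins (concatenates) two strings, much like Scratch's Join block
    return "hello, " + answer


print(greet("David"))  # prints: hello, David
```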
In C we had to use printf, we had to use %s. Python is going to be a little more user friendly, such that if you want to join two strings like "hello," comma, space, and the contents of that variable, we can just use this plus operator instead. And the last thing that we had to do was, of course, import this library so that we have access to the get_string function itself. Well, let's go ahead and take a tour of just some other features of Python and then dive in primarily to a lot of hands-on examples today. So recall that in the example we just saw, we had this first line of code, which gets a string from the user, stores it in a variable called answer. We had this second line of code, which as Peter notes, concatenated two values together. But it turns out, even though this is definitely more convenient than in C in that you can just take an existing string and another and join them together without having to use format strings or the like, well, it turns out there's another way. There are, frankly, many ways in languages like Python to achieve the same result. And I'm going to go ahead and propose that we now change this line here to this funky syntax. So definitely ugly at first glance, and that's partly because this is a relatively new feature of Python. But notice that in Python we can use these curly braces, the same curly braces we have used in C, to plug in an actual value of a variable here. So instead of %s, Python's format strings use these curly braces that essentially say, plug in a value here. But there's one oddity here. You can't just start putting curly braces and variable names into strings, that is quoted strings in Python. You also have to tell the language that what follows is a formatted string. So this is perhaps the weirdest thing we've seen yet. 
But when you do have a pair of double quotes like I have here, prefixing it with an f will actually tell the computer to format the contents of that string, plugging in values between those curly braces, as opposed to literally printing those curly braces themselves. So let me go ahead and transition to my actual code here and try this out. Instead of using the concatenation operator as Peter described it, this plus operator, let me literally go ahead and say, "hello, answer," initially. So this is probably not going to be the right approach, because if I rerun this program, python of hello.py, it's going to ask me what's my name. I'm going to type in "David," and it's going to ignore me altogether, because I literally hardcoded "hello, answer." But it's also not going to be quite right to just start putting that in curly braces, because if I again run this program, python of hello.py, and type in my name, now it's going to say "hello, squiggly brace answer." So here is just a subtle change where I have to tell Python that this type of string between the double quotes is in fact a formatted string. And now if I rerun python of hello.py and type in "David," I now get "hello, David." So it's marginally more convenient than C, because, again, you don't have to have a placeholder here, a placeholder here, and then a comma separated list of additional arguments. So it's just a more succinct way, if you will, to actually introduce more values into a string that you want to create. These are called format strings, or for short f-strings. And it's a new feature that we now have in our toolkit when programming with this new language called Python. Well, let's take a look at a few other translations of puzzle pieces to see, and then turn to Python and then start building some programs of our own. So here in Scratch, this was an example early on of a variable called counter, initializing it to 0. 
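The three attempts from the demo can be reproduced side by side. This sketch hardcodes answer as "David" in place of user input, and stores each string in a variable (names chosen here for illustration) so the difference the f prefix makes is easy to see.

```python
answer = "David"  # hardcoded stand-in for what get_string would return

# Without braces or an f prefix, the word "answer" is printed literally
plain = "hello, answer"

# Braces alone are still printed literally, because this isn't a format string
braces = "hello, {answer}"

# The f prefix makes this an f-string, so {answer} is replaced by the variable's value
formatted = f"hello, {answer}"

print(plain)      # hello, answer
print(braces)     # hello, {answer}
print(formatted)  # hello, David
```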
In C, in week 1, we started translating that to code like this-- int counter equals 0 semicolon. And that gave us a variable of type int whose initial value was 0. In Python, the code is going to be similar-- similar, but it's going to be a little simpler still. Notice that I don't have to in Python mention the type of variable I want. It will infer from context what it is. And I also don't have to have the semicolon there. So counter equals 0 in Python is going to give you a variable called counter. And because you're assigning it the value 0, Python itself the language will infer that, oh, you must mean this to be an int or an integer. What else did we see in Scratch? Change counter by 1. So this was a way of increasing the value of a variable by 1. In C, we had a few different ways to implement this. We could say counter equals counter plus 1. It's kind of pedantic, it's kind of long and tedious to type. So instead, we had some shorthand notation that allowed us to do it this way instead. In C, we were able to do counter plus equals 1, and that was going to achieve the same result. Well, in Python we actually have a couple of approaches as well. We can, much like in C, say it explicitly like this but just omit the semicolon. So counter equals counter plus 1. The logic in Python is exactly the same as in C. And as for this shorthand notation, this also exists in Python, again without the semicolon. The one thing that does not exist in Python at this point in the story is that fancy counter++ syntax, or i++, that syntactic sugar that made it even more succinct to just increment a variable, unfortunately does not exist in Python. But you can do counter plus equals 1, or whatever your variable happens to be. Well, what else did we see in Scratch and then C? Recall this. We introduced, of course, conditions pretty early on. And those conditions use Boolean expressions to decide whether to do this, or this other thing, or something else altogether. 
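Here are the counter translations from the slides in one runnable sketch. The commented-out counter++ line is there only to show what no longer works; uncommenting it would make the whole file a SyntaxError.

```python
counter = 0            # Python infers that counter is an int
counter = counter + 1  # the explicit form, minus C's semicolon
counter += 1           # the shorthand form also exists in Python
# counter++            # not valid Python: there is no ++ operator
print(counter)         # 2
```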
In C, we converted this to what looked kind of similar. Indeed, the curly braces kind of hug the printf line, just like the yellow condition here hugs the purple Say block. And we had parentheses around the Boolean expression, like x less than y. We again used printf inside of the curly braces which had double quotes, a backslash n for a new line, and a semicolon. Python, nicely enough, is going to be sort of identical in spirit but simpler syntactically. What Python is going to look like henceforth is just this. So the parentheses around the x less than y go away. The curly braces go away. The new line goes away. And the semicolon goes away. And here you see just a tiny example of the evolution of humans' programming languages. If you and I have been frustrated for some time about all the stupid semicolons and curly braces all over the place, which make your code harder to read, let alone get correct, humans decided when inventing new languages that, you know what, why don't we just say what we mean and not worry as much about all of this syntactic complexity? Let's keep things simpler. And indeed, that's what we see here, is one example in Python. But there's a key detail. If any of you have been in the habit, when writing code in C, of being a little sloppy when it comes to your indentation, and maybe style50 is constantly yelling at you to add spaces, add spaces, or remove spaces or lines, well, in Python it is now necessary to indent your code correctly. In C, of course, we, CS50 and a lot of the world in general recommend that you indent your code by 4 spaces, typically, or one tab. In the context of Python, you must indent, and consistently so. If you accidentally omit these spaces just to the left of the print statement here, your Python code is not going to run at all. The Python program just won't work. So no more sloppiness. Python is going to impose this on you. But the upside is you don't have to bother including the curly braces. 
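To see that Python really refuses unindented code, this sketch feeds two versions of the slide's condition to the built-in compile function, which parses source code without running it. The helper name check and the two sample strings are made up for illustration.

```python
def check(source):
    """Return None if source parses, else the name of the error raised."""
    try:
        compile(source, "<demo>", "exec")  # parse only; nothing is executed
        return None
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__


correct = "x = 1\ny = 2\nif x < y:\n    print('x is less than y')\n"
sloppy = "x = 1\ny = 2\nif x < y:\nprint('x is less than y')\n"

print(check(correct))  # None
print(check(sloppy))   # IndentationError
```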
What about a more complicated condition where there are two paths you can follow, if or else? Well, in this case in C, we translated it pretty straightforwardly like this. Again, parentheses up here, curly braces here and here, backslash n, backslash n, and semicolon. You can perhaps guess in Python that this is going to get a little more compact, because boom, now we don't need the parentheses anymore. We do need to indent, but we don't need the curly braces. We don't need the new line, and we don't need the semicolon. So we're sort of shedding features that can be taken now for granted. What about this example in Scratch when we had a three-way fork in the road, if, else, if, else? Well, in Python-- or rather in C, we would have translated this like this. And there's not much going on there. But it's a pretty substantial number of lines of code, some 12 lines, just to achieve this simple idea. In Python, notice what's going to go away here is, again those parentheses, again those curly braces, again the backslash n, and the semicolon. There's only one oddity here. There's only one oddity. What looks wrong or weird to you? Maybe, what looks like a typo to you? And I promise I haven't screwed up here. Maybe elsewhere, but not here. Andrew? AUDIENCE: I would say the elif instead of else if is different syntactically. DAVID MALAN: Exactly. So whereas in C we would literally say else if, in Python, humans years ago decided, heck, why say else if and waste all of that time typing that out if you can more succinctly say "elif" as one word, E-L-I-F. So indeed, this is correct syntax here. And you can have more of those. You can have four forks in the road, five, six, any number thereafter. But the syntax is indeed a little different. But it's a little tighter, right? There's less syntactic distraction when you glance at this code. You don't have to ignore as many semicolons and curly braces and the like. Python tends to just be a little cleaner syntactically. 
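The three-way fork from the slide might look like this as a runnable sketch. It wraps the logic in a hypothetical function called compare that returns its message rather than printing it, so it can be reused; the colons and indentation are what replace C's parentheses and curly braces.

```python
def compare(x, y):
    # Each condition ends with a colon; indentation marks each branch's body
    if x < y:
        return "x is less than y"
    elif x > y:  # elif is Python's spelling of "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))  # x is less than y
```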
And indeed, that's characteristic of a lot of more recent, more modern languages like it. All right, let's take a look at a few other blocks in Scratch and in turn C. In Scratch, when we wanted to do something again and again as a loop, perhaps forever, we would literally use the Forever block. In C, we could implement this in a few different ways. And we proposed quite simply this one-- while true print out "hello, world," again and again and again. And because the Boolean expression never changes, it's going to indeed execute forever. So Python is actually pretty similar, but there are a couple of subtle differences. So ingrain in your mind what this looks like here. We have true in parentheses, the curly braces, the new line, the semicolon. A lot of that's about to go away, but there's still going to be a slight difference. Notice that we're indenting, as I keep emphasizing. We no longer have the new line or the semicolon or the curly braces, but True-- and it turns out, False-- now must be capitalized. So whereas in C it was lowercase false, lowercase true, in Python it's going to be capitalized False, capitalized True. Why? Just because. But there is one other detail that's important to note, both with our loops here, as well as with our conditions. Just as before, if I rewind to our most recent condition, notice that even though we've gotten rid of the curly braces and we've gotten rid of the parentheses, we now have introduced these colons, which are necessary after this expression, this expression, and this one, to make clear to Python that the lines of code that follow indented underneath are indeed relevant to that if, elif, or else. And we see that same feature again here in the context of a loop. We saw other loops, of course. In Scratch, when we wanted to do something a finite number of times like 3, we would repeat the following three times. In C, we had a few different approaches to this. And all of them, I dare say, were very mechanical. 
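A truly infinite loop can't be demonstrated to completion, so this sketch of while True adds a counter and a break after three iterations; the counter and the break are additions for illustration, not part of the slide.

```python
count = 0
while True:  # True must be capitalized; lowercase true would raise a NameError
    print("hello, world")
    count += 1
    if count == 3:
        break  # exit the otherwise infinite loop
```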
Like, if you want to do something three times, the onus in C is on you to declare a variable, keep track of how many times you've counted already, increment the thing. Like, there's a lot of moving parts. And so in C, one approach looked like this. We declare a variable called i equals 0-- but we could call it anything we want-- we have a while block here that's asking a Boolean expression again and again, is i less than 0-- is i less than 3? And then inside of the loop, we printed out "hello, world." And using C's syntactic sugar, the plus plus notation, we kept adding 1 to i, add 1 to i, add 1 to i, until we implicitly break out of the loop because it's, of course, no longer less than 3. So in Python, similar in spirit, but again, some of that clutter goes away. i equals 0 is all we need say to give ourselves a variable. While i less than 3 is all we need to say there but with a colon. Then inside of that, indented properly, we print out "hello, world." And-- we can't do the plus plus, so minor disappointment-- but i plus equals 1 increments i. So this would be one way of implementing in Python the exact same thing, a loop that executes three times. But we saw other approaches, of course, in C, and there's other approaches possible in Python as well. You might recall in C that we saw this approach, the for loop. And odds are you've been reaching for the for loop pretty frequently, because even though it looks a little more cryptic, you can pack more features into that one line of code in between those semicolons, if you will. So same exact logic, it just prints out this "hello, world" three times using a for loop instead. In Python, things start to get a little elegant here now. It's a little weird at first glance, but it's definitely more succinct. If you want to do something three times, it turns out in Python you can use a more succinct syntax for the for loop-- for i in, and then in square brackets a list of values. 
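The while-loop translation described above, as a runnable sketch:

```python
i = 0
while i < 3:  # a colon ends the Boolean expression
    print("hello, world")
    i += 1    # Python has no i++, so use += instead
```

After the loop finishes, i is 3, which is exactly the value that made i < 3 false.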
So just as we used in the past square brackets in a few different places to connote arrays and indexing into arrays, in the world of Python whenever you surround a bunch of values that themselves have commas in between them, and you encapsulate them all using square brackets, that's what we're going to call in Python a list. And it's very similar in spirit to an array, but we'll call it in the context of Python a list. And so what this line of code says is, for i in 0, 1, 2-- what does that mean? This is a for loop in Python that says, give me a variable called i. And on the first iteration of this loop set i equal to 0. On the second iteration of this loop set i equal to 1. And on the last iteration of this loop, set i equal to 2 for me. It just does all of that for you. Now, at the end of the day it actually doesn't matter what i is per se, because I'm not printing the value of i. And that's totally fine. Odds are you've used for loops where you did something again and again, like printing "hello, world," even though you didn't print out the value of i. So technically, I could have put any three things in the square brackets if I wanted. But the convention would be to just enumerate, just like in C, 0, 1, 2, just like a computer scientist counting from 0. But this could break down pretty easily. This could become very ugly very quickly. Does anyone see a problem with for loops in Python if you have to put in between those square brackets the list of values that you want to iterate over? Noah? AUDIENCE: If you want to do, for example, a thing 50 times, you'd have to write out 0, 1, 2, 3, 4, 5, 6. DAVID MALAN: Yeah. My God, it would start to look hideous quickly. And it's funny you mention 50, because in preparing this demonstration for lecture today, I went back to week 0, when actually the analog in week 0 was to indeed print out "hello, world" 50 times. 
And I thought to myself, damn it, this is going to look atrocious now, because I literally have to put inside of square brackets 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, all the way to 49, as Noah says. Like, surely there's got to be a better way. And there is. While this might be compelling for very short values, there's a simpler way in Python when you want to do something some number of times. We can replace this list of three values with this, a function called range that takes an input, which is the number of values that you want back. And essentially, passed an input like 3, range will automatically give you a sequence of three values, 0, 1, and 2. And then Python will iterate over those three values for you. So to Noah's concern a moment ago, if I now want to iterate 50 times, I just change the 3 to a 50. I don't have to create this crazy mess of a manually typed out list of 0 through 49, which, of course, would not be a very well-designed program, it would seem, just because of the length of it and the opportunity to mess up and the like. So in Python, this is perhaps now, if you will, the most Pythonic way to do something some number of times. And indeed, this is a term of art in the Python community. Long story short, technical people, programmers, they tend to be pretty religious in some sense when it comes to the "right way" of doing things. And indeed, within the world of Python programming, a lot of Python programmers do have both opinions but also standardized recommendations that dictate how you "should" write Python code. And tricks like this are what are considered Pythonic. You are doing something Pythonically if you're doing it the quote, unquote "right way," which doesn't mean right in the absolute, it means right in the sense that most other people agree with you. All right. 
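Here is a short sketch comparing the hand-typed list with range. Converting a range to a list with the built-in list function is only done here to make the generated values visible; a for loop can iterate over a range directly.

```python
# Iterating over an explicit list of three values
for i in [0, 1, 2]:
    print("hello, world")

# range(3) yields the same values, 0, 1, and 2, without typing them out
for i in range(3):
    print("hello, world")

# range(50) stands in for a hand-typed list of 0 through 49
fifty = list(range(50))
print(fifty[0], fifty[-1], len(fifty))  # 0 49 50
```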
Let's see a few final features of Python before we now start to build some of our own features. In C, recall, we had this whole list of data types. And there are more, and you can create your own, of course. But the primitives that we looked at initially were these-- bool, char, double, float, int, long, string, and so forth. In Python, even though I haven't needed them, because I can give myself a variable like a string or an int just by giving it a name like counter or i or answer and then assigning it a value, and Python infers from what you're assigning it what data type it should be, Python does have data types. It's just what's known in the programming world as a dynamically typed language. C, by contrast, is a statically typed language, where not only do types exist, you must declare them explicitly. In the world of Python, types exist, but the language infers them implicitly as your program runs. The burden is not on you the programmer to specify those data types incessantly. Let the computer figure it out for you. So this is our list from C. This now is going to be our analogous list in the world of Python. We're going to have bool still, True and False, but capital T, capital F. We're going to have floats, which are real numbers with decimal points. We're going to have ints, which of course are numbers like negative 1, 0, and 1, and so forth. And then not strings per se, but "stirs", S-T-R. And whereas in the world of C, there was technically no "string type"-- that was a feature offered by the cs50 library, which just made more accessible the idea of a char star-- recall that C has strings. And they're called strings, but there's no data type called string. The way you give yourself a string, of course, in C is to declare something as a char star. And in cs50's library, we just gave that char star a synonym, a nickname, an alias, called "string." In Python, there are actual-- there is an actual data type for strings. 
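Even though none of these assignments mentions a type, Python still tracks one for every value, and the built-in type function reveals it. The variable names below are arbitrary examples.

```python
# Python infers each variable's type from the value assigned to it
flag = True
price = 3.14
counter = 0
answer = "David"

print(type(flag).__name__)     # bool
print(type(price).__name__)    # float
print(type(counter).__name__)  # int
print(type(answer).__name__)   # str
```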
And for short, it's called S-T-R. All right. So with that said, what other features do we have from Python that we can use here? Well, there's other data types as well in Python that are actually going to prove super useful as we begin to develop more sophisticated programs and do even cooler things with the language. We've seen range already. Strictly speaking, this is a data type of sorts within Python that gives you back a range of values, by default 0 on up, based on the input you provide. List, I keep mentioning verbally. A list is a proper data type in Python that's similar in spirit to arrays. But whereas in arrays-- recall, we've placed great emphasis over the past few weeks noting that arrays are a fixed size. You have to decide in advance how big that array is going to be. And like last week, if you decide, oops, I need more memory, you have to dynamically allocate more space for it, copy values over, and then free up the old memory. Like, there's so much jumping through hoops, so to speak, when you want to use arrays in C if you want to grow them or even shrink them. Python and other higher-level languages like it do all of that for you. So a list is like an array that automatically resizes itself, bigger and smaller. That feature now you get for free in the language, so to speak. You don't have to implement it yourself. Python has what are called tuples. In the context of like math, or GPS, you might have x- and y-coordinates, or latitude and longitude coordinates, so like comma separated values. Tuples are one way of implementing those in Python. Dict, or dictionaries. So Python has dictionaries that allow you to store keys and values. Much as a human dictionary in physical form, for instance for English, lets you store words and their definitions, a dictionary in Python, more generally, lets you store any keys and any values. You can associate one thing with another. 
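A quick sketch of the three data types just described. The coordinates and dictionary entries are hypothetical values made up for illustration.

```python
# A list grows automatically; no malloc, realloc, or free required
numbers = [1, 2, 3]
numbers.append(4)
print(numbers, len(numbers))  # [1, 2, 3, 4] 4

# A tuple groups a fixed set of values, such as a hypothetical latitude and longitude
coordinates = (42.37, -71.12)
latitude, longitude = coordinates  # unpack the tuple into two variables
print(latitude, longitude)  # 42.37 -71.12

# A dict associates keys with values, like words with their definitions
definitions = {"apple": "a fruit", "bridge": "a structure spanning a river"}
print(definitions["apple"])  # a fruit
```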
And we'll see that this is a wonderfully useful and versatile data structure. And then lastly for today's purposes, there's these things called sets which, if you recall from math, a set is a collection of values, like a, b, c or 1, 2, 3, without duplicates. But Python manages that for you. You can add items to a set, you can remove items from a set. Python will make sure that there are no duplicates for you, and it will manage all of the memory for you as well. So what we have in the way of functions, meanwhile, is a few familiar friends. Recall that in C we used the cs50 library to get chars, doubles, floats, ints, longs, and strings. In Python, thankfully, we don't have to worry about doubles or longs anymore. More on that in a bit. But the cs50 library for Python, which you saw me import a few minutes ago, does give you a function called get_float. It does give you a function called get_int, it does give you a function called get_string, that, at least for this week's purposes, are just going to make your life easier. These, too, are training wheels that we will very quickly take off so that you're only using native Python code ultimately, and not CS50's own library. But for the sake of transitioning this week from C to Python, you'll find that these will just make your life easier before we relax and take those away, too. So in C, to use the library you had to include cs50.h. In Python, again you're going to go ahead and import cs50, or more explicitly, the specific function that you might want to import. So it turns out there's different ways to import things. They ultimately achieve essentially the same goal. You can, with lines like this, explicitly import one function at a time, like I did earlier using get_string, or you can import the whole library all at once by just saying more succinctly, import cs50. It's going to affect the syntax we have to use hereafter, but you'll see multiple ways of doing this in our examples here on out. 
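A small sketch of a set discarding duplicates automatically, as described at the start of this passage:

```python
letters = set()
letters.add("a")
letters.add("b")
letters.add("a")     # adding a duplicate has no effect
print(len(letters))  # 2

letters.remove("b")
print(letters)  # {'a'}

# Converting a list to a set drops duplicates; sorted gives a predictable order
print(sorted(set([1, 2, 2, 3, 3, 3])))  # [1, 2, 3]
```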
You can also simplify this a bit, and you can import a comma separated list of functions from a library like ours. And this is a convention we'll see quite frequently as well. Because if we start using popular third-party libraries written by other programmers on the internet, they will very commonly give us lots of functions that we ourselves can use, and we will be able to import those one after the other, by just specifying them here in this way. All right. Let me pause here just to see if there's any questions on Python syntax. Like, that's essentially it for our crash course in Python syntax. We're now going to start building things and explore what the features of Python are and what some of the nuances are, and really the power of Python. But first, any questions on syntax? We've seen loops, conditions, variables. Olivia, question or comment. AUDIENCE: In a for loop, if you want to increment by something besides 1, but you don't want to explicitly type out the list, how would you do that? DAVID MALAN: Really good question. So if you wanted to use a for loop and iterate over a range of values, but you wanted that range to be 0, 2, 4, 6, 8, instead of 0, 1, 2, 3, let me go ahead and go back to that slide from a moment ago. And I can actually change this on the fly. Let me go into that slide, which was right here. And what I can do, actually, is specify more values, which might be this. If I give range not just one value but a start, a stop, and a step, that's going to be a clue to the computer that it should count from the start up to, but not including, the stop, incrementing 2 at a time instead of the default, which is 1. And there are even other capabilities there, too. You don't have to start counting at 0. 
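Olivia's question can be answered with range's three-argument form, range(start, stop, step). The specific numbers below are examples chosen for illustration, not the values from the slide.

```python
# Start at 0, stop before 10, count by 2
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]

# The start need not be 0 either
print(list(range(1, 4)))  # [1, 2, 3]

for i in range(0, 10, 2):
    print(i)  # prints 0, 2, 4, 6, 8, one per line
```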
You can adjust that as well, which is to say that with Python, you're going to find a lot more features come with the language, and even more powerfully, the functions that you can write and the functions that you can use in Python also can take different numbers of arguments. Sometimes it's 0, sometimes it's 1, sometimes it's 2. But it's ultimately often up to you. Good catch. Other questions? AUDIENCE: Will we see sequences primarily in the for loops? Or are there other applications where they're very useful? DAVID MALAN: Sequences in what sense? In the sense of ranges or lists or something else? AUDIENCE: Yeah, in terms of ranges, specifically. DAVID MALAN: Good question. Will we use them in other contexts? Generally speaking, it's pretty rare. I mean, I'm racking my brain now as to other use cases that I have used range for. And I'm sure I could come up with something. But I think hands down, the most common case is in the context of iteration, as in a for loop. And I'll think on that to see other applications. But any time you want to generate a long list of values that follow some pattern, whether it's 0, 1, 2, or as Olivia points out, a range of values with gaps, range will allow you to avoid having to hardcode it entirely. And you can actually write your own generator function, so to speak, a function that returns whatever pattern of values that you want. Other questions or confusion? Anything on your end, Brian, from the chat or beyond? BRIAN: Looks like all the questions are answered here. DAVID MALAN: All right. Well, let's go ahead now and do something more interesting than hello, world. Because after all, this is where programming really gets fun, really gets powerful, when you and I no longer have to implement those low-level implementation details, when you had to implement memory management for your hash table, or memory management for a linked list, or copying values in an array. 
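To make the idea of a generator function a bit more concrete, here is a minimal sketch using Python's yield keyword. The function name evens and its limit parameter are hypothetical, chosen just for illustration; like range, it produces its values one at a time rather than building a whole list up front.

```python
def evens(limit):
    """Yield the even numbers from 0 up to and including limit."""
    n = 0
    while n <= limit:
        yield n  # hand back one value, then pause until the next one is requested
        n += 2


# A generator can drive a for loop just like range can
for number in evens(10):
    print(number)

# Collecting every value into a list
print(list(evens(10)))  # [0, 2, 4, 6, 8, 10]
```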
We've spent the past several weeks focusing really on some low-level primitives that are useful to understand, but they're not fun to write. And I concede that they might not be fun to write in problem set form. And they're certainly not going to be fun to write for the rest of your life, every time you want to just write code to solve some problem. But again, that's where libraries come in. And now, this is where other languages come in. It turns out that Python is a much better, a much easier language to use for solving certain types of problems, among them some of the problems we have been solving in past problem sets. So in fact, let me go ahead and do this. I'm going to go ahead and grab a file here-- give me one moment-- called bridge.bmp, which you might recall from a past problem set. This is the beautiful Weeks Bridge down by the Charles River in Cambridge, Mass., by Harvard. And this is a very clear photograph taken by one of CS50's team members. And in recent weeks, of course, you wrote code to do all sorts of mutations of this image, among them blurring the image. And blur, I dare say, was not the easiest problem to solve. You had to look up, down, left, and right, sort of average all of those pixels. You had to understand how an image is represented one pixel at a time. So there's a lot of low-level minutiae there, when at the end of the day, all you want to do is just blur an image. So whereas in past weeks we sort of had to think at and write at this lower level, now with Python it turns out we're going to have the ability to think at a higher level of abstraction and write far less code for ourselves. So let me go ahead and do this. I'm going to use my Mac for this instead of CS50 IDE, so I can open the images more quickly.
This is to say that, even though we'll continue using CS50 IDE for Python and for other languages over the remainder of the course, you can also install the requisite software on a Mac, on a PC, and sometimes, sort of, even on a phone today, so as to use Python and other languages on your own devices. But again, we tend to use CS50 IDE during the class so as to have a standard environment that just works. So I'm going to go ahead and write, though, on my computer a program called blur.py, py, of course, being the file extension for Python programs. So my program looks a little different now. I've got this black and blue and white window. But this is just a text editor on my own personal Mac here. I'm going to go ahead and do this. I need to have some functionality related to images in order to blur an image. So I'm going to go ahead and import from the PIL library, a.k.a. Pillow, a special feature called Image and a special feature called ImageFilter. That is to say, these are essentially two features that someone else smarter than me when it comes to image manipulation wrote. They made their code freely available on the internet, free and open source, which means anyone can use the code, and I am allowed now to import it into my program, because I downloaded and installed it before class. Now I'm going to go ahead and do this. I'm going to give myself a variable called before. And I'm going to call Image.open on bridge.bmp. So again, even though we've never seen this before, never used this before, you can kind of glean syntactically what's going on. I've got a variable on the left called before. I've got a function on the right called Image.open, and I'm passing in the name bridge.bmp. So it sounds like this is kind of like fopen in the world of C. Now notice, this dot is kind of serving a new role here.
In the past, we've used the dot operator only for structs in C, when we want to go into a person object, or into a node object, and we want to go inside of it and access some variable therein. Well, it turns out in Python, you have things similar in spirit to structs in C. But instead of containing only variables or data, like name and number like we did for the person struct a few weeks back, in Python you can have inside of a structure not only data, that is, variables, but also functions. And that starts to open up all sorts of possibilities in terms of features available to you. So it seems that I've got this Image object, this Image struct that I've, again, imported from someone else. Inside of it is an open function that expects as input the name of a file to open. So we'll see this syntax increasingly over the course of today's examples. Let me give myself a second variable, after. Let me go ahead now and assign to this variable called after the result of calling that before image's filter function, passing in ImageFilter.BoxBlur of 1. Now, this is a little cryptic, and we're not going to spend time on this particular syntax, because odds are, in life you're not going to have that many opportunities to write code to blur an image. But for today's purposes, notice that inside of my before variable, because I assigned it the return value of this new feature, it has inside of it not just data but also functions, one of them now called filter. And this filter function takes as input the return value of some other function called BoxBlur that, long story short, will blur my image using a box of a 1-pixel radius. So just like your own code, if you implemented blur in C, this code is going to tell my code to look up, down, left, and right and blur the pixels by taking the average around them. And that's kind of it. After that I'm going to do after.save. And I'm going to save this as out.bmp.
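Assembled from the description above, the lecture's blur.py amounts to four lines: `from PIL import Image, ImageFilter`, `before = Image.open("bridge.bmp")`, `after = before.filter(ImageFilter.BoxBlur(1))`, and `after.save("out.bmp")`. To show what a box blur of radius 1 is actually computing, here is a hypothetical pure-Python sketch on a tiny grid of grayscale values, averaging each value with its neighbors the way problem set 4's blur did. It is not how Pillow implements BoxBlur, and details such as edge handling and rounding may differ.

```python
def box_blur(pixels, radius=1):
    """Return a new grid in which each value is the average of the values in the
    (2 * radius + 1)-wide square box around it, clipped at the grid's edges."""
    height = len(pixels)
    width = len(pixels[0])
    blurred = []
    for row in range(height):
        new_row = []
        for col in range(width):
            total = 0
            count = 0
            # Look up, down, left, and right (and diagonally) within the radius
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    r, c = row + dr, col + dc
                    if 0 <= r < height and 0 <= c < width:  # skip neighbors off the edge
                        total += pixels[r][c]
                        count += 1
            # Python's round() rounds exact halves to the nearest even integer
            new_row.append(round(total / count))
        blurred.append(new_row)
    return blurred


grid = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
print(box_blur(grid))
```

Because the function builds a new grid rather than overwriting the old one, every average is computed from the original values, which is the same reason the problem set's blur needed a copy of the image.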
I just want to create a new file called out.bmp. And if I've made no mistakes, let me go ahead now and run python of blur.py and hit Enter. No error messages, so that's usually a good thing. If I type ls now, notice that I've got bridge.bmp, which I already opened, blur.py, which I just wrote, and out.bmp. And if I go ahead and open out.bmp, let's go ahead and take a look. Here's before, here's after. Huh. Before, after. Now, over the internet it probably doesn't look that blurred, though on my Mac right here a few inches away, it definitely looks blurred. But let's do it a little more compellingly. How about, instead of looking one pixel up, down, left, and right, why don't we look 10 pixels at a time? So we really blur it by looking at more values and averaging more. Let me go ahead and change that 1 to a 10, and now run python of blur.py. Now let me go ahead and reopen. And now you see before and after. Before and after. So what is this to say? Well, here is, what, problem set 4 in four lines of code blurring an image. So pretty cool, pretty powerful. By standing on the shoulders of others and using their libraries, we can do things quite quickly. What I can also do here, too, is solve a more recent problem. Let me go over to a different directory, where I have in advance-- and you can download these files off of the course's website-- a few files that we wrote before class. One is called speller.py. So long story short, speller.py is a translation of speller.c from C into Python. Recall that speller.c was part of the distribution code for problem set 5, and we've now translated it to speller.py. And in the dictionaries and texts directories, we have the same files as in problem set 5, two different-sized dictionaries and a whole bunch of short and long texts. What hasn't been created yet is the equivalent of dictionary.c, a.k.a., now, dictionary.py. So let me go ahead and implement my spell checker in Python.
Let me go ahead and create a file called dictionary.py, as is, again, the convention. And let's go ahead. We have to implement four functions, right? We have to implement check, load, size, and unload. But I probably need like a global variable here to store my dictionary. And this is where you all implemented your hash table with a pointer, and then linked lists, and arrays, and all of that, a lot of complexity. You know what, I'm just going to go ahead and give myself a variable called words and declare it as a set. So recall that a set is just a collection of values that handles duplicates for you. And frankly, that's all I really need. I need to be able to store all of the words in a dictionary and just throw them into a set, so that there are no duplicate values and I can just check, is one word in the set or is it not. Well, let's go ahead now and load words into that set. I'm going to go ahead and define a function called load that takes the name of a file to open. And here is some admittedly new syntax. So thus far, we've only typed code into the file itself. In fact, the most striking difference thus far, I dare say, about Python versus C, is that I have never once even written a main function. And that, too, is a feature of Python. If you want to write a program, you don't have to bother writing your default code in a function called main. Just start writing your code. And that's how we were able to get hello, world down from this many lines of code in C to one line in Python. We didn't even need to have main. But if I want to define my own functions, it turns out in Python, you use the keyword def for define, then you put the name of the function, and then in parentheses, like in C, you put the names of the variables or parameters that you want the function to take. You don't have to specify data types, though. And again, we don't use curly braces; we instead use a colon.
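Here is a minimal sketch of the set behavior just described (adding, ignoring duplicates, checking membership, removing), along with a function defined with def. The names words and contains are illustrative only.

```python
words = set()

words.add("apple")
words.add("banana")
words.add("apple")  # a duplicate: the set silently keeps just one copy

print(len(words))  # 2


def contains(word):
    """Return True if word is in the set, else False (no types, no braces, just a colon)."""
    return word in words


print(contains("apple"))   # True
print(contains("cherry"))  # False

words.remove("banana")
print(contains("banana"))  # False
```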
So this says, hey, Python, give me a function called load that takes an argument called dictionary. And what should this function do? Well, the purpose of the load function in speller was to load each word from the dictionary and somehow put it into your hash table. I'm going to go ahead and do the same-- read each word from the dictionary and put it into this so-called set, my variable called words. So I'm going to go ahead and open the file, which I can do with this function here. In Python, you don't use fopen. You just use a function called open. And I'm going to assign the return value of open to a variable called file. But I could call that anything I want. This is where Python gets really cool. Recall that reading the lines from the file in C was kind of arduous, right? You had to use fread or some other function in order to read character after character after character, one line at a time. Well, here in Python, you know what, if I want to iterate over all the lines in the file, I'll just say for line in file. This is going to automatically give me a for loop that assigns the variable line to each successive line in the file for me. It will figure out where all of those lines are. What do I want to do with each line? Well, I want to go ahead and add that line to my set of words. Insofar as each line represents a word, I just want to add that line to my global variable words. And that's not quite right, because what's at the end of every line in my file? Every line in my file by definition ends with a backslash n, right? That's because all of the words in the big dictionary we gave you are one per line. So how do you get rid of the new line at the end of a string? Well, in C, my God, we would have to use malloc to make a copy, and then move all of the characters over, and then shorten it a little bit by getting rid of the backslash n. Uh-uh. In Python, if you want to strip off the new line at the end of a string, just do rstrip.
To strip characters means by default to strip off white space. White space includes spaces, the tab character, and backslash n. And so if you want to take each line and throw away the trailing new line at the end of it, you can simply say line.rstrip. And this is where strings again in Python are powerful. Because they are their own data type, they have inside of them not only all of the characters composing the string, but also functions, like rstrip, which strips from the end of the string any white space that might be there. You know what, after this I think I'm done. I'm just going to go ahead and close the file, and I'm going to go ahead and return True. So that's it. That's the load function in Python. Open the dictionary, for each line in the file add it to your global variable, close the file, return True. I mean, I'm pretty sure that my code is probably several lines, and certainly many hours, shorter than your code might have been for implementing that as well. Well, what about checking? Maybe the complexity is just elsewhere. Well, let me go ahead and define a function called check that takes a specific word as its argument. And then I'm just going to check if that given word is in my set of words. Well, it turns out in C you would probably have to use a for loop or a while loop, and you'd have to iterate over the whole list of words that you've loaded using binary search or linear search or the like. Ugh, I'm so past that at this point so many weeks in. I'm just going to say, if word in words, go ahead and return True, else return False. And that now is my implementation of check. Now, it's a little buggy. And I will fix this. Does anyone spot the bug? Even if you've never seen Python before, but having spent hours implementing your own version of check, is there some step I'm missing logically? There is a bug here. Does anyone spot what I'm not doing that you probably did do when checking if a given word is in fact in the dictionary?
BRIAN: A couple of people are commenting on case sensitivity. DAVID MALAN: Yeah, case sensitivity. So odds are, in your implementation in C you probably forced the word to all uppercase, or you forced it to all lowercase. Totally doable, but you probably had to do it like character for character. You might have had to copy the input using malloc, or put it into an array character for character, then use toupper or tolower to capitalize or lowercase each individual letter. Ugh, like, that would take forever, as indeed it might have. So you know what, if you want to take a given word and lowercase it, just say word.lower. And Python will take care of all of those steps of iterating over every character, changing each one to lowercase, and returning to you the new result. And indeed, this now, I would think, is consistent with what you did in your implementation as well. Well, how about size? Recall that for size you had to define a function that doesn't take any inputs but returns the number of words in the set of words. And I'm going to go ahead here-- and actually, I got my indentation slightly off here. Let me fix this real fast. If you want to return the size of your dictionary, or really the number of words in your set, you can just return the length of that global variable words. Done. And lastly, if you want to unload the dictionary, let me go ahead and unload things. It doesn't take any input either. Honestly, because I've not done any equivalent of malloc, I've not done any memory management-- why? You don't have to in Python-- I can literally just return True in all cases, because my code is undoubtedly correct, because I didn't have to bother with pointers and addresses and memory management. So all of the stress that might have been induced over the past few weeks as you understood the lower-level details of memory management now goes away, not because it's not happening underneath the hood, but because Python is doing it for you.
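A short sketch of the two string methods at play in load and check: rstrip, which with no argument strips trailing white space (spaces, tabs, and newlines among it), and lower. Each returns a new string rather than changing the original.

```python
line = "Apple\n"  # how a line might look when read from a file

print(repr(line.rstrip()))        # 'Apple': trailing newline removed
print(repr("cat \t\n".rstrip()))  # 'cat': trailing space, tab, and newline all removed
print(line.rstrip().lower())      # apple

# Strings can't be changed in place, so line itself still ends with a newline
print(repr(line))                 # 'Apple\n'
```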
And I did spot one bug here actually. Notice I kind of relapsed into C code here. What I should have said here is actually file.close. So here when I close the file in load, I actually have to call file.close, because now that function close is associated with that variable for me. So again, there is memory management happening. The equivalents of malloc and free or realloc are all happening sort of for you underneath the hood. But what Python the language is doing for you now is managing all of that for you. That's what you get by using a so-called higher-level language instead of a lower-level language. You get more features, and in turn in this case, you get all of those problems taken care of for you, so that you and I can focus on building our spell checker, so you and I can focus on building our Instagram filters, not on allocating memory, copying strings, uppercasing things, which honestly, while it might have been fun and very gratifying the first time you got those things working, would very quickly make programming the most tedious thing in the world if any time you wanted to write a program you had to think and write code at that low level. All right. Let me go ahead and really cross my fingers that I didn't screw up here, and go ahead and run this code. So I'm going to go ahead and run python of speller.py-- which, admittedly, I wrote in advance, because just as we wrote speller.c for you as distribution code, we wrote speller.py in advance. But we won't look at the internals of that. I'm going to go ahead and test this on, how about something big like Shakespeare. And I'm going to cross my fingers here. And so far so good. The words are kind of flying by. I'm going to assume they're correct. Hopefully we'll get to the output. And it looks like, yeah, I think I see some familiar numbers here. I've got 143,091 words. And then down here, the total time involved was just under 1 second. So that's pretty darn fast.
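Pulling these pieces together, here is a sketch of dictionary.py as described in lecture, with the file.close fix applied. It assumes, as in problem set 5, a dictionary file with one lowercase word per line, and it assumes speller.py (not shown here) is what calls these four functions.

```python
# dictionary.py: a set-based dictionary for the spell checker, as sketched in lecture

# Global set of all words in the loaded dictionary
words = set()


def check(word):
    """Return True if word, in any case, is in the dictionary."""
    if word.lower() in words:
        return True
    else:
        return False


def load(dictionary):
    """Load each line of the file named dictionary into the set of words."""
    file = open(dictionary, "r")
    for line in file:
        words.add(line.rstrip())  # drop the trailing newline
    file.close()
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free by hand: Python manages the memory."""
    return True
```

A common, slightly more idiomatic variant of load opens the file with `with open(dictionary) as file:`, which closes the file automatically when the block ends, even if an error occurs, so no explicit file.close() is needed.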
And to be clear, I'm using my Mac instead of the IDE, so my numbers might be a little different than in the cloud, but 0.9 seconds. But you know what, out of curiosity, let me open up a different tab real quick, and let me go ahead and make speller from problem set 5. So I brought along in advance our own implementation of speller, the staff solution, written in C in dictionary.c and speller.c, and I've just compiled it with make. And let me go ahead and run ./speller using the same text on Shakespeare. So again, I just ran the Python version, now I want to run the C version using the staff's implementation. All right. Wow. All right, it flew by way faster, kind of twice as fast. And notice, even though the numbers are the same up above, the times are not. My C version took 0.52 seconds, so half a second. My Python version took 0.9, or roughly 1 second. So it would seem that my C version is faster, my Python version is slower. Why might that be? Because I'm kind of disappointed if we just spent all this time preaching the virtues of Python, and yet here we are writing worse code, in some sense. Santiago? AUDIENCE: Could it be because C, even though it's low level, explicitly tells the computer what to do, and so that makes it a little faster, whilst in Python it all happens like underneath the hood, as you were saying, so that could make it a little slower. DAVID MALAN: Yeah. In Python, you have a general-purpose solution to the problem of memory management, and capitalization, and all of these other features that we have to implement ourselves in C. Python has general-purpose implementations of all of those. But there's a price you pay by using someone else's code to implement all of those things for you. And you pay an even greater price by using the type of language that Python is, in a sense. So there's been this other salient difference between using C and using Python.
When I wrote C code, I would compile my code from source code into machine code. And recall that machine code is 0's and 1's understood by the computer's brain, the so-called CPU, or Central Processing Unit. We always had to compile our code every time we changed the source code. And then we did like ./hello to run the program. But every demo thus far in Python, I haven't used make or clang. I have used not ./hello, but rather python space the name of the program. And why is that? Well, it turns out that Python is often implemented with what we describe as an interpreter. So Python is not only a language like we've been writing, it's also a program unto itself. The Python program I keep running is an identically named program that understands the Python language. And what's happening, though, is that by using an interpreter, so to speak, to run my programs, I'm incurring some amount of overhead. I'm paying a performance price. Why? Well, computers, recall from week 0, at the end of the day, only understand 0's and 1's. That's what makes them tick. But I have not outputted any 0's and 1's. I the human have only been writing Python. So there needs to be some kind of translation between my Python code, in this English-like syntax, and what the computer itself understands. And if you're not going to go through the effort of compiling your code every time you make a change, but instead you're just going to run your code through an interpreter, as is the norm in the Python world, you're going to pay a price, because the translator someone implemented for you has to do that translating while your program runs. And in fact, there's formal terminology for this. In the world of Python we have, for instance, a picture that looks more like this. Whereas in the world of C we would take our source code as input, first output machine code, and then run that machine code, in the world of Python thus far, I'm writing source code, and then I'm immediately running it.
I'm not compiling it into 0's and 1's in advance. I'm trusting that there's a program, coincidentally called Python, whose purpose in life is to translate that code for me into something the computer does understand. And what does that actually mean in real terms? Well, think back to an algorithm like this one, which is probably cryptic to many of you, though not all: it's an algorithm for searching a phone book for someone, written in Spanish. And suppose that I don't speak Spanish at all. I might, ideally, compile this program, this algorithm, into something I do understand by using a compiler that translates Spanish to English. And voila, this English version. I'm much better at reading and understanding this, so I can execute this algorithm pretty fast, because I'm pretty good at English. But if you only give me the Spanish version, the source code, and you require that I translate it or interpret it line by line, honestly that's really going to slow me down, because it's like me having to go take like a Spanish dictionary and look up every word-- "Recoge guia telefonica." All right, well, what's "recoge"? I have to look that up. What's "guia", what's "telefonica"? Oh, OK. Pick up phone book. Got that. Step one. What's step two? "Abre a la mitad de guia telefonica." So "open to the middle"-- well, wait, I don't know that. Spoiler. What does that mean, "abre"? All right, let me look that up. And it means "open." "A la mitad," that means "to the middle." "De guia telefonica," "of the phone book." Oh, that means "open to the middle of the phone book." So I'm clearly struggling to go back and forth here. It's a slower process. And if I keep going, "Ve la pagina," "Look at the page," looking up, translating every line, it's undoubtedly going to slow down the process. And so that's effectively what's happening for us when we run these Python programs.
There is a translator, a man in the middle, so to speak, that's looking at your source code and reading it top to bottom, left to right, and essentially translating each line respectively into the corresponding code that the computer understands. So the upside of this is that, thankfully, we don't have to run make or clang. We don't have to compile our code anymore. Like, how many people here have made a change to an earlier pset in C, forgotten to recompile it, rerun it, and found that the program obviously hadn't changed, because you had saved it but not actually recompiled it? So that stupid, annoying human step is gone. In the world of Python, if you change your file, go ahead and just rerun it, reinterpret it. You can save that step. But the price you're going to pay is a little bit of overhead. And indeed, we see that here in terms of my Python version taking roughly 1 second to spellcheck Shakespeare, and my C version taking only one half of a second. So here, too, is this theme of trade-offs that I promised in past weeks. This is so prevalent in the world of computer science and programming, and frankly in the real world. Any time you make some improvement or gain some benefit, odds are you are paying some price. Maybe it's time, maybe it's space, maybe it's money, maybe it's complexity, maybe it's anything else. There's this perpetual trade-off of resources. And being a good programmer, ultimately, is about finding those inflection points and knowing which tools to use given those trade-offs. All right, let's go ahead here, take a 5-minute break. And when we come back, we'll look at other features of Python, and we'll end ultimately today with some really powerful capabilities. Back in five. All right. We are back. And first, a retraction if I may.
Brian kindly pointed out that my answer to Olivia and Noah's follow-up question unfortunately missed the mark, as I was doing things on the fly instead of reading the documentation. So let me recall for us this example here, wherein we had the range function returning three values. So that code is correct; that gives us the values 0, 1, and 2. But what I think Olivia asked was, if you wanted to skip values, for instance counting by twos, how do we do that? And I unfortunately screwed up the syntax for that, providing only two inputs to range instead of three, as would be needed here. So for instance, suppose that we wanted to print out all of the numbers between 0 and 100, inclusive, but skipping every other-- so, 0, 2, 4, 6, 8, so all the even numbers on up through 100. We would actually want to do something like this instead. We would say, for i in range of 0 comma 101 comma 2. Why is that? Well, we'll pull up the documentation in just a moment, but 0 is where you want to start counting. The second value, 101, is where you want to stop counting. But it is by definition exclusive, so we have to go 1 past the value we care about. And then the 2, the third argument, is how much to increment by each time, from 0 to 2 to 4 to 6 to 8, on up through 100. So how could I have figured this out in advance rather than embarrassing myself now? Well, it turns out there is official documentation for Python. And we'll always link this for you. And here there is this search box at the very top. And you can see that during the break I was searching for the documentation for range. And sure enough, if I search for the range documentation, at first glance it might seem kind of overwhelming, because there are a lot of mentions of something like range in the documentation. Fortunately, the first result here is the one we want. And if I click on that, you'll see some documentation that's a little cryptic at first glance.
But what's interesting about this is that range comes in two different flavors. And even though I keep calling it a function, technically it's what's called a class. But more on that another time. It behaves for our purposes as a function. Notice that there are two lines here. And they're similar but different. The first one specifies that this range function can take one input, the stop value. So at what value do you want to stop counting? So before, when we did range of 3, it stands to reason that by default, if you start counting at 0 and you stop at 3, that will give you i equals 0, 1, and 2. But there's another flavor of the range function, which is not the one that I proposed exists. This other one takes two or, optionally, three arguments. And it works in the following way. When you see syntax like this in Python's documentation, this means that the alternate form of range takes an argument called start, followed by an argument called stop, followed by, optionally, a third argument called step. And I know as the reader it's optional, because it's in square brackets here. So this has nothing to do with lists or arrays or anything like this. This is just human documentation. Anytime you see things in square brackets, that tends to imply to the human reader that this is optional. So what does that mean? Well, notice that there is no flavor of range that lets me specify a stop and a step, which I thought there was a moment ago when answering Olivia and Noah. But rather, there is this three-input version. So if I specify I want to start at 0, I want to stop at 101, which is just past the 100 I care about, and then provide an optional step of 2, this will give me a program ultimately that will print out all of those even numbers. So let me do this. First let me go into a program here. I'll call it count.py. And I'm going to go ahead and start at 0, go up to but not through 101, stepping 2 at a time. And this time I'm going to print out i.
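Here is count.py as described, using the three-argument form of range(start, stop, step). The extra list at the end is only there to make the endpoints easy to inspect.

```python
# count.py: print the even numbers from 0 through 100
# range(start, stop, step): stop is exclusive, so 101 is needed to include 100
for i in range(0, 101, 2):
    print(i)

# The same sequence as a list, to inspect its first value, last value, and length
evens = list(range(0, 101, 2))
print(evens[0], evens[-1], len(evens))  # 0 100 51
```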
And here, too, another handy feature of Python-- no more %s, and also no more %i. If you want to print out the value of a variable called i, just say print, open paren, i, close paren. You don't need another format string as in C. Let me go ahead now and run python of count.py, Enter. And it scrolled by really fast. But notice that it stopped at 100, and if I scroll to the beginning it started at 0. So my apologies. Mea culpa for messing that up earlier. But what a wonderful opportunity to introduce the official documentation for Python, which will soon become your friend, cryptic though it might feel at first glance. All right. Let's go ahead then and revisit one other program that we started with earlier. And that program was again this relatively simple Hello program that we left off in this state. We were using the get_string function from the CS50 library in Python. We had a variable called answer that was getting the return value of that version of get_string. And we were printing out "hello," comma, so-and-so. And we were using that new, cryptic but handy feature known as a format string, or f-string, which just means replace whatever's in curly braces with the actual value. So let's start to now take off the training wheels that we just put on only an hour ago. Let's get rid of the CS50 library. How can we actually get input in Python without using a library from someone like CS50? Well, get_string no longer exists. But thankfully there is another function we can use called, quite simply, input. Input is a function that, quite similar to get_string in both C and Python, prompts the user with a phrase, like this one here, "What's your name?"; waits for them to type in a value; and as soon as they hit Enter, it returns whatever the human has typed in for you. So if I go ahead now and rerun this program, python of hello.py, after getting rid of the CS50 library and using input instead of get_string, what's my name? David. "Hello," comma, "David."
So already, this is raw, native Python code completely unrelated to anything CS50-specific. But let's keep using the CS50 library initially, because we'll see very quickly that there are advantages to using it: we do a lot of error checking for you. We'll eventually take those training wheels off entirely as well. But notice, indeed, how relatively simple it is to do so. Let me go ahead and open up a program that we wrote in advance. And I'm going to go ahead and grab this. This is available, as always, on the course's website. And I'm going to go ahead and open a file called addition0.c, which we've actually seen before. And I'm going to do this fancy thing here where, in just a moment, I split my window so that I can see two files at a time. And over here I'm going to create a new file, and I'll call this addition.py. That is to say, I'm just going to rearrange my IDE temporarily today so that we can see one language on the left, C, and the corresponding program on the right in Python. And again, you can download all these examples online if you'd like to follow along on your own. So if I'm translating this program on the left to this program on the right, let's first recall what the program on the left actually did. This was a program that prompts the user for x, prompts the user for y, and quite simply adds the two. So this is week 1 stuff, way back when now. Well, let's go ahead and translate this. I will use the get_int function from the CS50 library, because it's going to make my life a little easier for now. I'm going to say from cs50 import get_int. I'm going to then go ahead and get an int from the user using get_int and prompting them for x. I'm going to then go ahead and get an int from the user prompting them for y. And I'm going to finally go ahead and print out x plus y. And let me go ahead down here and run python of addition.py.
I'm now being prompted for x, let's type in 1, y, let's type in 2, and voila, 3 is my answer here. So pretty straightforward. Fewer lines of code, because one, I don't have unnecessary includes like stdio.h. I don't have any of the curly braces. To be fair, I don't have any of the comments. So let me write comments. In Python, it's going to be a different symbol. "Prompt user for x" should be prefixed with a hash symbol now instead of a //. I'll go ahead and prompt user for y, and then, how about here, perform addition. But even still, it's pretty tight. It's only 10 lines of code, even with those comments. All right, well, what might I do that's a little bit different? Well, let's take off the training wheels, get rid of the CS50 library again, and use input here. Well, if I go ahead and call input here, and input here, assigning the values to x and y respectively, I'm going to go ahead now and run python of addition.py. x will be 1 again, y will be 2 again, and the answer, of course, is-- 12. Well, that's wrong. What's going on? How did I screw up such a simple program already? Albeit in a new language for me, Python. What did I do here? Yeah, Ben? AUDIENCE: Because it's really taking it in as two strings, so it's just putting them next to each other as opposed to doing the actual math on it. It's not reading it as an int. DAVID MALAN: Exactly. So input, this function that comes with Python, really is analogous to CS50's get_string. No matter what the human types, it's going to come back as keyboard input characters, or ASCII characters, or Unicode characters from weeks past. Even if they look like numbers, they're not going to be treated as numbers, a.k.a. integers, unless we coerce them to be. Now remember in C, we had this ability to cast values from one type to another. Casting meant to convert one data type to another.
And we were allowed to do that for chars to ints or ints to chars, but you could not do it for strings to ints, or from ints to strings. For that we needed special functions. And some of you might have used atoi, ASCII to int, which was a function that actually looks at all of the characters in an ASCII string and converts them to the corresponding integer. In Python, frankly, it's a little simpler. We can just cast from one thing to another. So I'm going to go ahead and cast the return value of input by passing it to int. And I'm going to do the same for y, passing the return value of input to int to convert what looks like an int, but is really a string, into an actual int. And now let me go ahead and perform the addition again, python of addition.py. And notice this time, hopefully to Ben's point, it's not going to concatenate two strings, which we saw is the default behavior of plus when you have strings on the left and right. Hopefully now it will do addition on x equals 1, y equals 2. And voila, now we're back in business. However, what if I'm not the most cooperative or sharp user, and I type in "cat" for x? Now some crazy stuff starts to happen. Notice we've triggered our very first runtime error, whereby my program crashes before it can even finish. And notice I'm getting some somewhat cryptic output here-- traceback, most recent call last, file addition.py, line 2. All right, that's at least familiar. I screwed up somewhere on line 2. It's showing me the line of code here. And it's saying "ValueError: invalid literal for int() with base 10: 'cat'." That's a very cryptic way of saying I just tried to convert something that's not an integer to an integer. And so this is why we use things like the CS50 library. It's actually kind of annoying to write all of the code that checks and makes sure the user typed in a number and only a number, and not "cat" or "dog" or some other string.
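A sketch of the three behaviors just described. The strings "1" and "2" stand in for what input would return, so it runs without a keyboard. `to_int` is a hypothetical helper showing the minimum error checking you'd write yourself. It is not how CS50's get_int is implemented, since get_int reprompts instead of returning None.

```python
x = "1"  # what input() hands back if the user types 1
y = "2"
print(x + y)            # + on two strings concatenates: 12
print(int(x) + int(y))  # int() converts each string first: 3


def to_int(text):
    """Hypothetical helper: return text as an int, or None if it isn't one."""
    try:
        return int(text)
    except ValueError:  # what int("cat") raises
        return None


print(to_int("cat"))  # None instead of a traceback
```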
We ourselves now would have to implement that kind of error checking if we don't want to use the CS50 library. So there's a trade-off. Maybe you feel more comfortable writing all of the code yourself. You don't want to use some random person on the internet's library, whether it's CS50's or someone else's, even if it's free and open source. You want to write it yourself. OK, fine. If you want to write it yourself, now you've got to add a bunch more lines of code to check: did the human type in decimal digits one after the other, or did they type in other ASCII characters? So again, there's a trade-off between using libraries or not. Generally, the answer is going to be to use a common library to solve these kinds of problems. Well, let's go ahead and change the program a little bit. Let me go ahead and open a new file called division.py just to do a bit of division here. And let me go ahead on the right-hand side and copy and paste what we did before, but just change it to division here. Let me go ahead and divide x by y. In a moment I'm going to run python of division.py and type in 1 for x and 2 for y. But before I hit Enter, if this were a program in C, what would the answer be? Feel free to just respond in the chat if you'd like. If this were a program in C, and I'm dividing x by y, what would I have gotten in week 1 and every week since, Brian? BRIAN: The consensus looks like 0. DAVID MALAN: Yeah, because of truncation. 1 divided by 2, of course, is 1/2, or 0.5, and 0.5 is a float. But if I'm dividing integers, which x and y implicitly were before and explicitly are now that I've cast them, C would throw away the 0.5 and just give me back 0. But let me go ahead and run python of division.py and type in x equals 1, y equals 2. And voila, wow, one of the most annoying features, or lack of features, in C seems to have been solved in Python by division doing what you want.
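For comparison, a short sketch of Python's two division operators. `/` always produces a float. If you actually want the integer result C gave you, Python has a separate operator, `//`, with one caveat shown below.

```python
x = 1
y = 2
print(x / y)   # true division always yields a float: 0.5
print(x // y)  # floor division discards the fraction: 0

# Caveat: // rounds toward negative infinity, while C's integer division
# truncates toward zero, so the two disagree on negative operands.
print(-7 // 2)      # -4 in Python
print(int(-7 / 2))  # -3, which is what C's -7 / 2 would give
```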
And if you divide one integer by another in Python, it turns out one of the other features of today's language is that it does what you the programmer would expect, without having to get into the weeds of the nuances of floats and ints. It just does the quote, unquote "right thing" instead. Well, let me go ahead and open up another program here, also from week 1. This one was called conditions.c. And this one-- give me one moment to open this up on the left-- was a program whose purpose in life was to get an int from the user called x, and get another called y. And then it just did this: if x less than y, print out as much. Else if x greater than y, print out as much, and so forth. Let's go ahead and translate this program into the corresponding Python code using some of the syntax we've seen already. I'm going to go ahead and save this as conditions.py. And I think I'm going to go ahead and keep using the CS50 library, so that I don't have to worry about those kinds of errors when converting bad input to an int. So from cs50 import get_int. And let me go ahead and now get an int from the user, calling it x. Let's go ahead and get an int from the user, calling it y. And I won't bother typing comments this time, just for time's sake. And now let me ask the question. In C, I would have written if, open paren, x less than y, close paren. Python's a little more terse. If x less than y suffices, but with a colon. Under that, I'm going to go ahead and say print "x is less than y." Elif-- this is the weird one-- x is greater than y, go ahead and print out "x is greater than y." And then else, also with a colon, print out "x is equal to y." And I think that's just about it. I'm going to go ahead down here and run python of conditions.py. I'll type in 1, I'll type in 2, and indeed x is less than y. I'll run it again, this time with 2 and 1. x is greater than y. And let me run it again with 1 and 1. x is equal to y. So that seems to have worked. And let me point out one other thing.
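The if/elif/else chain from conditions.py, sketched as a hypothetical function `compare` that takes x and y as parameters instead of calling get_int. That way it runs without the CS50 library installed.

```python
def compare(x, y):
    """Return the same message conditions.py prints for x and y."""
    if x < y:      # no parentheses needed, but the colon is required
        return "x is less than y"
    elif x > y:    # Python spells "else if" as elif
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))
print(compare(2, 1))
print(compare(1, 1))
```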
I mentioned earlier that you have this other shorthand syntax where you can just say import cs50 if you don't want to bother typing out individual function names. That's totally fine. But notice that if I change line 1 to just import cs50, the IDE yells at me at lines 3 and 4 that get_int is no longer recognized. That's because Python supports this feature, when using other people's libraries, that it can namespace them for you. That is to say, you can't refer to get_int directly anymore. You have to more explicitly say: call the get_int function that's inside of the CS50 library, so cs50.get_int. The familiar dot operator means go inside of that CS50 library, just like a C struct, and call the function called get_int therein. So I can now go ahead and rerun this, python of conditions.py, typing in 1 and 1, and voila, the code is now working again. So which is better? It depends. If it's more readable to just write get_int all over the place, that's going to save you a lot of keystrokes-- you don't have to keep typing cs50 dot, cs50 dot. If, though, you're writing a pretty big program, and maybe you're using two different libraries that both implement a function called get_int, you want to be able to distinguish one from the other. So you might want to import the libraries by their name, and then prefix the function calls, as I've done here, which is known as namespacing. Namespacing means that you can have two identically named variables or functions existing in two different namespaces. They don't collide, because each one lives inside the name of its own library, whether that's the CS50 library or some other library. Let me do one other thing with conditions here. Let me go ahead and open up another file from week 1. This one was agree.c. And this program prompted the user to input whether or not they agree. And we checked, a little curiously, that first week, using equals equals, whether the input was quote, unquote capital "Y" or lowercase "y," or quote, unquote capital "N" or lowercase "n."
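To see namespacing with two libraries that really do share a function name, here is a sketch using Python's standard `math` and `cmath` modules in place of cs50, so it runs without the CS50 library. Both modules define `sqrt`, and the prefix says which one you mean.

```python
import cmath
import math
from math import sqrt  # pulls math's sqrt into this file by its bare name

print(math.sqrt(4))    # 2.0: the sqrt inside the math namespace
print(cmath.sqrt(-1))  # 1j: a different sqrt, inside the cmath namespace
print(sqrt(9))         # 3.0: imported by name, so no prefix needed
# math.sqrt(-1) would raise ValueError, since math works only with real numbers
```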
Well, how do we go about converting this one? Let me go ahead and give myself a new file over here. I'll call it agree.py in this case. And it turns out we can solve this one in a few different ways. Let me go ahead and start off by importing from cs50 get_int, just because it's-- oh, no, get_string, rather, because it's convenient. Let me go ahead and get the user's input via get_string, store it in s, and ask them the same question, "Do you agree," question mark with a space. Then let me check. If s equals equals quote, unquote "Y" or s equals equals lowercase "y," then I'm going to go ahead and print out "Agreed." Else-- oh, no, elif s equals equals capital "N" or s equals equals lowercase "n," let me go ahead and print out here quote, unquote, "Not agreed." And I think that should do it. But something's weird here. There are a few differences. What strikes you as different from C? What muscle memory might you have to break now when using conditions with multiple Boolean expressions combined in this way? And there's another subtlety. There are at least two salient differences between C and Python in just this example alone. Any thoughts in chat or [INAUDIBLE]? Ryan? AUDIENCE: I was going to say, for this one, instead of using the symbols for the logical operators, you can just type the text directly. DAVID MALAN: Yeah. We can literally just type the English word "or" if we want to express a logical or. So in C, recall on the left, we would have used these two vertical bars, which is fine. You get used to it. But it's not very readable, at least in any English sense. Python took the approach of more frequently using actual English or English-like words that read left to right. And indeed, a theme is emerging here. When you read Python code, it is closer to English than C is, because you don't trip over as much punctuation. Each line of Python code tends to read a little more like an English phrase or an English sentence. And there's one other subtlety here.
On the left, back in week 1, I took care to use single quotes around the Ys and the Ns. This week I'm using double quotes. But to be honest, it actually doesn't matter. I can alternatively use single quotes everywhere, so long as I'm consistent. In Python there is no fundamental difference between double quotes and single quotes. The reason being, when we looked at the data types that exist in C and now Python, absent from the list of Python data types was char. In Python there is no such thing as an individual char. Everything that's character-based is a string. Even if it's just one character long, it's a string. The downside is we don't have quite as fine-grained control. The upside is we get a lot more features with those strings, as we've already seen with, for instance, converting them to uppercase. Well, let me go ahead and-- I think I can simplify this. For instance, suppose I wanted to tolerate not just "Y" or "y," in uppercase or lowercase. Suppose I wanted to also tolerate "Yes" in uppercase or lowercase as well. Well, you could imagine just starting to add to the code: or s equals equals "Yes," or s equals equals "yes." But wait a minute, what if the user is being a little sloppy? And what if they're yelling? Or s equals equals "YES" in all caps. And there are a few other permutations as well. This is quickly devolving into quite the mess. But if at the end of the day you really just want to detect "Y" or the word "Yes," irrespective of capitalization, I bet we can be pretty clever in Python here. What if I go ahead and say, if s is in quote, unquote "y" or "yes"? In fact, I can borrow an idea from earlier, whereby I can use the square bracket notation to give me a list, which, again, is like an array, but it will automatically grow or shrink as you need it. You don't have to decide in advance how big it is.
This preposition here, in, is a new keyword in Python that will literally answer that question for me. And we've used it before, earlier. When I implemented speller, I said if the word is in my set of words, return True. So if s is in this list, I'll get back True or False based on the answer to that question. But again, it's not tolerating case. But no big deal: with dot lower, now I can ask, is the lowercase version of s, no matter what the human typed in, in this list of two values? That means now the user can type in all caps, alternating caps, one capitalized letter, or any other permutation whatsoever. All right. So that, then, is our conditions. Let me pause here to see if there are any questions. Any questions or confusion that we can clear up? With syntax, with conditions, with Boolean values? BRIAN: So a question came up. So in Python we are allowed to use the equals equals syntax to compare two strings? DAVID MALAN: Yes. So another really good catch. In Python, there are no pointers. Underneath the hood, there are still addresses. Like, your memory hasn't gone anywhere. But underneath the hood, all of that is now managed for you by the language itself. So if you want to conceptually compare one string against another, just as I did here now on line 7, you can indeed use equals equals, and Python will do the quote, unquote "right thing" for you. You don't need to regress to using strcmp instead. Just for clarity, let me go ahead and update this, too: if s dot lower is in quote, unquote "n" comma "no," we achieve the same result there with the same technique. Well, let me go ahead and open up another example, one you might recall from a progression of examples that we made good, better, and then best, this one involving just a cat meowing in some form. So let me go ahead and open up from week 1 an example called meow0, relatively straightforward, that simply did this. It simply meowed three times.
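A sketch of the simplified agree.py logic as a hypothetical function `agrees`, which returns a value instead of printing "Agreed." or "Not agreed." so it's easy to check. It also shows the two points about strings above: there's no char type, and `==` compares contents.

```python
def agrees(s):
    """Return True for yes-like answers, False for no-like ones, None otherwise."""
    if s.lower() in ["y", "yes"]:   # lower() makes the check case-insensitive
        return True
    elif s.lower() in ["n", "no"]:
        return False
    return None


# Single and double quotes produce the same kind of value, and there is no char type:
print(type('y'), type("yes"))  # <class 'str'> <class 'str'>
print('y' == "y")              # True: == compares the strings' contents

print(agrees("YES"), agrees("n"), agrees("maybe"))  # True False None
```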
So suffice it to say, in Python it's pretty trivial to do something three times like this. I'm going to go ahead and call this meow.py. And of course, I can just do something like print "meow." And I can just copy and paste that. But of course, the whole point of this example back in week 1 was not to devolve into just copying and pasting. Surely there's a better way. And we've seen a better way this time. If we wanted to change this into a for loop in C, we could have done something like for int i gets 0, i less than 3, i++. Then in some curly braces we could have done printf of "meow," new line, semicolon. So that was the next version of our meow code in C. But in Python, of course, it's a little more succinct. I can just do for i in range 3 print quote, unquote "meow." So very similar in spirit to our hello, world of before. But again, we don't have to include any libraries for this. We don't need to have a main function. We don't need any of those curly braces or semicolons or the like. We can just dive in and focus on the code itself. But recall that we also, last time, evolved the meow program into having our own helper function, our own function that actually allowed us to create an abstraction on top of meowing. And that was in our third version, a.k.a. meow2. Let me go ahead and open up this version in a tab. And notice that this version starts to get a little involved, because one, we needed a prototype at the top, because I now have a meow function at the bottom whose purpose in life was just to print "meow," but to abstract that away as a new helper function. And then I had this code here with a for loop inside. Well, in Python it's going to work out to be a little simpler here, too. If I want to do something three times, for i in range of 3, go ahead and call meow. Now of course, meow doesn't yet exist. So I can solve that problem. We've seen earlier, albeit quickly, in speller, that I can define my own functions like meow.
There's no more void, because if you don't want to have arguments in a function, just don't put them there. And there's no return type specified in Python; it's implicit instead. So it suffices to do this. And now I can just print out "meow." So here now, I have a program that iterates three times, calling meow each time, and meow is defined down below. Let me go ahead and run this, python of meow.py. Huh. Traceback, most recent call last. There's a problem on line 2 of meow.py because of NameError: name 'meow' is not defined. Now, the language being used there by Python is a little different from C's. It's frankly a little more human friendly. But what just happened? What problem has arisen that I haven't tripped over until now? Even if you've never programmed in Python before, and even if you haven't run help50 yet, what might be the issue there? Ginny? AUDIENCE: It's that the function is not found when we are trying to call it, because it's defined below where we are calling it. DAVID MALAN: Yeah. AUDIENCE: There is no prototype. DAVID MALAN: Yeah, there's no prototype. And it turns out in Python, there isn't a notion of prototypes. So unfortunately, the solution we saw in week 1, copying and pasting the first line up above and ending it with a semicolon, just isn't a thing here. I could do this. I could just move my meow function to the top of the file, thereby defining the function first, and then using it last. And that would actually solve the problem: "meow meow meow." That, of course, doesn't really help us long term, because you could probably imagine a situation where this function wants to call this function, but this function calls this one, and you just can't really neatly order them in some safe way. And it's just not going to be as maintainable, right?
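A small sketch of why the order mattered. In Python, `def` is a statement that runs top to bottom, so a name doesn't exist until its `def` has executed. This example deliberately calls meow too early and catches the resulting NameError instead of crashing.

```python
try:
    meow()  # meow has not been defined yet at this point in the file
except NameError as error:
    message = str(error)
    print(message)  # the same complaint as the traceback: meow is not defined


def meow():
    print("meow")


meow()  # works now, because the def statement above has already run
```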
Recall that one of the values of putting main at the top of our C programs was that any reasonable person who wants to understand your code is probably going to start reading top to bottom. They're not going to want to have to scroll through all of your code looking for the actual main code. So it turns out in Python, even though you don't need a main function, it's actually common to define one nonetheless. It's going to be implemented with something like this. And I'm just going to indent my code below that. So now I've defined main. But I haven't executed any code yet. On line 6, I've now defined meow, but I haven't executed any code yet. And I mean that literally. If I run python of meow.py now and hit Enter, I would hope to see "meow meow meow," but I see nothing. And this is a little weird. But Python is doing literally what I told it to do. I told it to define a function called main, and I told it to define a function called meow. What I never told it to do is to call either of those functions. So the simplest fix here-- it's a little different from C and a little weird-- is just to call main as your very last thought in the file. So define main up at the top, just where most programmers would expect it to be, but call it all the way at the bottom. And let me go ahead now and run my program. And now voila, "meow meow meow" is back, because I've defined main, I've defined meow, and now I am calling main. Now, as an aside, you will very often see in various documentation and tutorials online a much more cryptic incantation than this, which will have you typing out this instead. This achieves the same goal, but it's not strictly necessary for our purposes. This line of code, if you see it in any online references, or examples, or books, or sections or the like, is necessary only when you're implementing, essentially, your own libraries-- like your own CS50 library, or your own image blurring library or the like.
It's not necessary when we're just writing individual programs of our own. So I'm going to go ahead and keep mine simple and literally just call main. And let me just wave my hand at why you'd need that syntax otherwise in this context. But let me go ahead and modify this one last time. Because recall that in C, the last version of my program had me calling meow and passing it input. I defined meow as taking an input like n, and then doing something like for int i gets 0, i less than n, i++, and then inside of my curly braces I printed meow, so that now I have a helper function that I've invented that takes one input, an int called n. And it loops that many times and prints out meow that many times. And now I have a real nice abstraction, and my program is distilled to just meow three times. And it doesn't matter how I implemented meow. I can do the same thing in Python. I can go ahead and say that meow takes an argument called n. I don't have to bother specifying its type. I can now say for i in range of n, and I can print "meow" that many times. And now I can get rid of my loop in main and just call meow with 3. And so, same functionality. If I run this a final time, "meow meow meow," but now I'm designing my code in a more sophisticated way by giving myself some of my own actual helper functions. All right, any questions, then, on this progression? Now, we're not really seeing new Python syntax. We're just seeing a translation of some actual past C programs into Python to show really the equivalence. All right. Well, let's go ahead, then, and open another version from week 1 of a program called positive.c, which was an opportunity back then not only to define our own helper function called get_positive_int, but also to meet the familiar do while loop. And unfortunately, we're going to take that away from you now. Python does not have a do while loop.
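A sketch of the final meow.py as narrated: main at the top, a meow helper that takes n, and a call to main at the bottom. The last two lines use what is most likely the "cryptic incantation" mentioned above, the common `if __name__ == "__main__":` idiom. It calls main only when the file is run directly, not when it's imported as a library. The lecture's simpler version just calls `main()` on its own.

```python
def main():
    meow(3)


def meow(n):
    """Print "meow" n times; no type is declared for n, and no return type either."""
    for i in range(n):
        print("meow")


# The lecture simply writes main() here. This alternative runs main()
# only when the file is executed directly, e.g., python meow.py
if __name__ == "__main__":
    main()
```

Since meow has no return statement, calling it implicitly returns None.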
But it's, of course, a very useful thing to be able to do something once and then keep doing it while a condition is true. After all, pretty much any time we've gotten user input in the class, we've used do while, so that we prompt them at least once and then optionally again and again and again, until they cooperate. So let me go ahead and implement this in Python now in a file called positive.py, and translate this thing as follows. Let me go ahead and from cs50 import get_int. Let me go ahead and define a function called main. So now I'm just going to start to get into this habit. I'm going to go ahead and give myself a variable called i and call get_positive_int. And then I'm just going to go ahead and print out i, keeping it nice and simple. Now I have to implement get_positive_int. It doesn't need to take any input, so I'm not going to give it any arguments. And now I have to do that do while thing. So the Pythonic way to do this is almost always to deliberately induce an infinite loop. The idea being, if you want to do something again and again, just start doing it forever and then break out of the loop when you are ready to. So what do I want to do forever in this function? Well, I want to go ahead and get an int and prompt the human for a positive integer. And then I want to go ahead on the next line and ask a question. Well, if n is greater than 0, thereby making it positive, break. And the last line of code here is going to be to return n. So notice in C on the left, I did this whole do while thing. I had to declare n outside of the do while loop, because it had to be outside the curly braces to be in scope. But in Python here, notice what I'm doing here is actually a little bit different. And did I screw up? Oh, yes, I did screw up. OK. Let me ask the actual question: if n greater than 0. So what did I do differently here on the right-hand side?
Well, notice, I deliberately induced this infinite loop on line 10, which just means, do the following forever. I then ask the user for a variable n with get_int, and then I check, is n greater than 0? If so, break out of the loop. How do I break out of the loop? Well, notice that the indentation here has been very consistent. So when I break out of the loop, that puts me back in line with the original indentation, which is now on line 14. Notice that the return lines up with the while loop, which means it's the first line of code that's outside of that loop. In the past, we would have had very explicit curly braces. Now we rely only on indentation, which then lets me return n. So what are some of the differences here? One, the do while loop is completely gone. But two, scope is no longer an issue. It turns out in Python that the moment you assign a value to a variable inside a function, it exists until the end of that function. You don't have to worry about the nuance of declaring a variable first like we did in C up here and then returning it down below. The moment we execute line 11 here, n suddenly exists for the entirety of the remainder of that function. So even though we created it inside of the loop, so to speak, as per the indentation, it is still accessible to the return statement here at the end of the function. All right. Let me pause there and see if there are any questions or confusion on getting user input, doing the equivalent, logically, of do while, but doing it now in this more Pythonic way. Peter? AUDIENCE: In Python, are variables accessible across functions or no? DAVID MALAN: Good question. No. So if you declare a variable inside of a function, it is scoped, so to speak, to that function. It is not available elsewhere. You would have to return it as output from one function and pass it as input to another. Or you would have to define it, for instance, as a global variable instead. All right. Well, what else, then, might we translate?
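A sketch of the `while True` / `break` pattern that stands in for do while. To run without a keyboard, this hypothetical `first_positive` function takes a list of pretend answers instead of calling get_int. Each `next(...)` plays the role of one prompt, and it assumes at least one positive answer is eventually supplied.

```python
def first_positive(answers):
    """Emulate do-while: keep taking answers until one is positive."""
    answers = iter(answers)
    while True:                 # deliberately infinite...
        n = int(next(answers))  # ...so the body runs at least once
        if n > 0:
            break               # ...until the condition is satisfied
    return n                    # n is still in scope after the loop


print(first_positive(["-5", "0", "7"]))  # 7
```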
Well, recall from our earlier endeavors in week 1, we played around with these examples from Mario. For instance, we wanted to print something out in C that mimics the notion of these pyramids, or these coins, or these little bricks on the screen. Well, here let me go ahead and open up a new file called mario.py. And I'm going to transition away from always showing the before and after and now just start to focus more on the Python code. But you can always look back if you want the corresponding C versions. How do I go about printing out three bricks like this vertically? Well, in Python I might say something like, for i in range of 3, quite simply, as we've done a few times already, and just go ahead and print out a hash. I don't need to worry about the new line, because you get it for free, so to speak. I'm going to go ahead now and run python of mario.py. And voila, there's my very simple ASCII version of this Mario structure. But what if I want to do the coins instead? What if I want to do these horizontal coins that appear in these four bricks and print out a version of that? Well, how might I do that? Let me go ahead and change my code to be for i in range of 4 instead, so I can print four of these things. Let me go ahead and print out a question mark and then run this. So let me run mario.py. And voila-- damn. Not what I wanted. And so here is that trade-off. You might have been kind of excited, so far as it's possible to be excited about code, that, oh, my God, you don't need to do the stupid new line characters anymore. But what if you don't want one? Now we've found a downside of getting those new lines automatically. Well, it turns out if we read the documentation for the print function in Python, it, too, can take multiple arguments. And what's powerful about Python is that it supports not just positional arguments, where you just pass a comma-separated list of multiple arguments to a function.
Python also supports what are called named arguments, whereby if a function, especially one that's super powerful like print, takes multiple inputs, like this one, this other one, and this other thing, each of those inputs can have a name. And you, the user of that function, can specify the name. And it turns out that print in Python supports an argument called end. You can explicitly say what value you want to give to that parameter by mentioning its name. And here I'm going to literally do this. I'm going to tell the print function that I want the value of end, an argument to it, to be quote, unquote, with nothing in between. The reason is that, if you read the documentation, it will tell you that print's default value for its end argument is backslash n. This, too, is a feature that C did not have. C doesn't let you name arguments or give them default values: when you call a function, each argument either has to be there or it cannot be there. Python supports optional arguments that even have default values. And so in this case, the default value, per the documentation, is that end is quote, unquote backslash n, which is why every line ends with that value. If you want to change that to be nothing, the so-called empty string, you change it to quote, unquote. So let me go ahead and run this now, and voila, closer. It's a little stupid looking, because now my prompt ended up on the same line. So after this loop, let me just go ahead and print nothing, which still gives me that default new line. And now if I run mario.py, voila, now I get the effect I want. And if you want to see what's really going on here, I can do something stupid like "HELLO." And now every print ends with "HELLO," "HELLO," "HELLO," "HELLO." Not that you would do that, but that's all it means. It's ending every call to print with that expression. But the correct version, of course, is just to blank it out in this way.
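A sketch of the `end` named argument. So the result can be checked, this version also passes print's documented `file` named argument to write into an in-memory `io.StringIO` buffer rather than the screen. The helper name `coins` is hypothetical.

```python
import io


def coins(n):
    """Return what n calls of print("?", end="") plus one bare print() would output."""
    buffer = io.StringIO()
    for i in range(n):
        print("?", end="", file=buffer)  # end="" replaces the default "\n"
    print(file=buffer)                   # a bare print() emits just the default "\n"
    return buffer.getvalue()


print(coins(4), end="")  # prints ???? followed by a single newline

# Any string can serve as end, which makes the default visible:
print("?", end="HELLO\n")  # prints ?HELLO
```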
But here's something that's kind of cool. And this is where if you're kind of a geek, life starts to get really interesting fast. I can actually change my Python code that prints these four question marks in the sky to be quite simply print quote, unquote question mark times 4. And now if I rerun this program, boom, done. And here's where, again, you're getting a lot of features in the language where you don't have to think about loops, you don't have to think about a lot of syntax. If you want to take a question mark and repeat it four times, you can literally use the star operator, which has been overloaded to support not only multiplication with numbers but also repetition, a sort of automatic concatenation, with strings in this way. So let me go ahead and do one final version of mario. Recall that the last thing we built with mario looked a little something like this. Let me go ahead and change my mario code now to be for i in range of 3, because this is a 3 by 3 grid of bricks, let's say. And let's go ahead and now, inside of this loop, do another nested loop where I do three columns as well. And in here, I want to print out a single hash at a time. But I don't want to print out a new line. I only want to print out a new line here, after each row. So it turns out that because Python gives you the backslash n's automatically, essentially any logic you wrote in the past now needs to be reversed. If you ever printed a new line, now you don't want to print a new line. And if you ever didn't print a new line, now you do, in some sense. So let me go ahead and-- not make, wrong language-- python of mario.py. And voila, my 3 by 3 grid. So this is to say that in Python, we can nest loops, just like we did in C. I can use multiple variable names, like i and j, being conventional. There are no curly braces, there are no semicolons. But again, the logic, the ideas are still the same. It just takes a little bit of time to get used to some of the new syntax.
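Here is a sketch of both ideas just described, assuming a size of 3 for the grid: string repetition with *, and the nested loops where the inner print suppresses the newline and the outer loop adds one per row. The function name print_grid is hypothetical.

```python
def print_grid(size):
    """Print a size-by-size grid of hashes, one row per line."""
    for i in range(size):        # one iteration per row
        for j in range(size):    # one iteration per column
            print("#", end="")   # stay on the current line
        print()                  # end the row with a single newline


# String repetition: * is overloaded for str, so no loop is needed here.
print("?" * 4)

print_grid(3)
```

Combining the two ideas, each row could also be printed as print("#" * size), which removes the inner loop entirely.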
You'll recall that in C, we ran into a problem pretty early on with integers. So let me create a program here called int.py. And let me initialize a variable called i to 1. And let me go ahead and do this forever, inside of a while True block. Let me print out whatever i is. And then let me go ahead and just add 1 to i on each iteration. Let me go ahead and run this program. And let me increase the size of my window for now and just run this thing. Whoops, that was mario. Let me run this thing, python of int.py. And you'll see that it's counting up toward infinity. And honestly, this is going to take a while. You know what's faster than counting by 1? Multiplying by 2. So let me go ahead and multiply by 2 instead. To kill the program, just like in C, I used Control-C. And that's why I see KeyboardInterrupt. It respected my wanting to cancel the program. Let me rerun this now and just count really big. And even though the internet's being a little slow, which is why it's a little shaky, that's a really big number already if I keep doubling i. What would have happened already at this point if I were using C to implement this program? If in C I declared a variable called i, and it was an int, and I kept doubling it, again and again and again, literally forever? Any thoughts? Yeah. What would have happened in C? Joy? AUDIENCE: Yeah, I think it would have crashed. DAVID MALAN: It would have crashed? Why? AUDIENCE: Because it would be taking too much memory. DAVID MALAN: Good thought. Something would go wrong, but it wouldn't crash per se. Because it's still an int, and in C at least, it would still be taking up, on a typical computer, 32 bits or 4 bytes. But honestly, the program probably would have started printing 0 by now, or even negative numbers. Because recall, one of the limitations of C is that integers are a finite size-- only 32 bits or 4 bytes.
Which means if you keep going from 1, 2, 4, 8, 16, up to a million, 2 million, 4 million, 8 million, and so forth, eventually you're going to get into the billions. And as soon as you cross the 2 billion threshold or maybe the 4 billion threshold, depending on whether you're using signed or unsigned numbers, it's going to get too big. You're going to have integer overflow. But in the world of Python, integer overflow is not a thing anymore. In the world of Python, your integers will get as big as you need them to get, limited only by your computer's memory. Python will automatically address this problem for you. Unfortunately, floating-point imprecision is still a thing. So I only divided 1 by 2 earlier. But if I continued to divide other values and looked at enough decimal places, we would still suffer, unfortunately, from floating-point imprecision. However, in the world of Python, like in Java and other languages, there are libraries, scientific libraries, that allow you to use as much precision as you need, or at least as much as your computer's memory allows. So those problems, too, have been better solved in more modern languages than in something out of the box like C. But just by multiplying that number again and again was I able, then, to demonstrate much larger numbers than we ever saw in weeks past. Well, let me go ahead and do another program here, this one called scores.py. It's going to be an example of keeping track of scores, which was an example we did early on in week 2 of the class. And in Python, I'm going to go ahead and give myself a list of scores like this-- 72, 73, and 33-- again, sort of a playful reference to our ASCII numbers. But in this context, they're quiz scores-- so two OK quiz scores, and one kind of low quiz score, assuming these things are out of, like, 100. But notice the syntax I'm using. Square brackets in Python give me a list. I don't have to decide in advance how big it is. It's not an array per se, but it's similar in spirit. And it will automatically grow or shrink.
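To make the integer and floating-point claims above concrete, here is a small sketch. It doubles a bounded 100 times rather than forever, assuming 100 is enough to pass any 32- or 64-bit limit. For extra precision it uses the standard library's fractions module, one option we picked for illustration (the standard decimal module is another); the lecture only says "libraries" and doesn't name one.

```python
from fractions import Fraction

# Python ints grow as needed: no overflow at 2**31 or 2**63.
i = 1
for _ in range(100):
    i *= 2
print(i)            # 2**100, a 31-digit number
print(i == 2**100)  # True

# Floats are still fixed-size binary approximations, so imprecision remains.
print(0.1 + 0.2 == 0.3)  # False

# fractions.Fraction does exact rational arithmetic instead.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(Fraction(1, 3))  # 1/3
```

Exact arithmetic like this costs speed and memory, which is one reason plain floats remain the default.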
And the syntax is even simpler. Suppose I want to average these scores in Python. I could do something like this. I could print out that the average of these scores is, for instance-- and then I could do something like this. I could do the sum of scores divided by the length of scores. And some of this is actually kind of new already. It turns out in Python that there is a sum function that will take a list as input and return to you the sum of those items. And we've seen already there's a len function, L-E-N, that tells you the length of a list. So if I add up all my scores and then divide by the total number of scores, that should give me by definition my average. So python of scores.py, voila-- whoops, what did I do here? Ah, I screwed up. So unintended, admittedly, but let me try to save myself here. So what just happened? Well, this error message is a little cryptic. It says, "TypeError: can only concatenate str (not "float") to str." Long story short, Python in this case does not like the fact that I'm trying to take a string, "average," on the left and concatenate to it a float on the right. So there are a couple of ways I can solve this. And we saw the fundamental solution earlier. If this expression here that I've highlighted is by definition mathematically a float, but I want it to become a string, I can just tell Python, convert that float to a string with str. So much like there's the itoa function that some of you discovered, which is the opposite of the atoi function, I can take in Python, in this case, a float and convert it to its string equivalent. So now if I run python of scores.py, voila, my average is 59.333333333333336. And you already see a bit of imprecision. There's some rounding error at the end there, so it's not a perfect one third. But there's another way I could do this. And it's a little uglier. But I could use one of those f-strings. I could, say, go ahead and plug in a value here and just print out the user's average.
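A minimal sketch of the averaging fix described above, using the same three scores. The "Average: " label is our own wording; the point is that str() converts the float before concatenation, while an f-string converts it for you.

```python
scores = [72, 73, 33]

# sum() adds up the list's items; len() counts them.
average = sum(scores) / len(scores)

# "Average: " + average would raise TypeError (str + float),
# so convert the float to a string first with str().
print("Average: " + str(average))

# Alternatively, an f-string converts the expression inside the braces.
print(f"Average: {sum(scores) / len(scores)}")
```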
So it turns out that inside of these curly braces, you don't have to print just variables. You can actually put entire expressions of code. And I would encourage you not to paste crazy long lines of code, because it's going to very quickly get unreadable. At that point you probably should use a variable. But here I can go ahead and run python of scores.py. And voila-- I screwed up again. Also not intentional, but I can fix this. Yeah, I'm missing the f at the beginning to make this a formatted string. And now if I rerun it, voila, same exact answer. So again, I have multiple approaches. There's a third one here. I could do something-- and actually, I don't need the str in that context, because now if it's inside of a format string, Python will presume that I want to automatically convert it to a string. So that's nice. Or I can just factor this out, and I can say something like this-- give me a variable called average, assign it equal to that math, and then print out the average. So again, just like in C, there are so many different ways to solve the problem. And which one is best depends really on what might be most readable, most maintainable, or easiest to do. Let me go ahead and add some scores dynamically now. Instead of hardcoding my three scores, let me ask myself for my scores over the course of the semester. From cs50, let me import get_int, just so I can get some numbers easily. Let me give myself an empty list of scores, the syntax for which is just open bracket, close bracket, so nothing inside of it initially. And now let me go ahead and do this. Let me get myself three scores-- maybe it's the end of the term now. For i in range of 3, let me go ahead and append to the scores list whatever the return value of get_int is, like this. Now, this, too, I could do in a bunch of ways. Let me get rid of this here. Whoops. Nope, we'll leave that there. But notice what I'm doing.
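Here is a sketch of the dynamic version being built, with one substitution: in the lecture each value comes from cs50's get_int, while here a hypothetical list of typed-in strings stands in for the user, so the example runs without the CS50 library or a keyboard. The helper name collect_scores is also hypothetical. The last line shows a format spec (:.2f), the kind of decimal-places syntax mentioned in the Q&A that follows.

```python
def collect_scores(inputs):
    """Build a list one score at a time, as the lecture does with get_int.

    inputs stands in for what a user would type; each string is
    converted with int() where the lecture calls get_int instead.
    """
    scores = []  # an empty list; it grows as we append
    for text in inputs:
        scores.append(int(text))  # append is a method built into every list
    return scores


scores = collect_scores(["72", "73", "33"])
average = sum(scores) / len(scores)

# Any expression (here, just a variable) can go inside an f-string's braces...
print(f"Average: {average}")
# ...and a format spec such as :.2f limits the output to two decimal places.
print(f"Average: {average:.2f}")
```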
I'm calling get_int, and I'm passing its return value to a new function called append. It turns out that lists, the square brackets, once you've defined them in a variable like scores, they, too, have functions built into them. So I can do scores.append in order to add a number to the list. So now let me go ahead and run this, python of scores.py. Let me manually type in my 72, my 73, and my 33. And voila, same exact answer. But think about how much of a pain this would have been in C, if you had to either decide in advance the size of the array, or not decide in advance and use malloc and realloc to keep growing and shrinking it. Python, using this append function, which comes built into every list, handles all of this automatically for us. All right. So that, too, is a whole bunch of features. Any questions, though, that I can answer here? Any questions? No? Yeah, over to Santiago. AUDIENCE: Yeah. I had a question about-- so even if append reduces the amount of code you have to write, does it underneath the hood just do exactly what we were doing in C, which is like, malloc and realloc, or something like that? Is that all happening inside Python? DAVID MALAN: It is. Yeah, that's exactly what you're getting for free, so to speak, with the language. All of that malloc stuff, realloc stuff, maybe it's implemented with an array underneath the hood, like in the actual computer's memory. Maybe it's a linked list like we saw last week. But all of that is happening for you. But that, again, is one of the reasons why the code ultimately runs a little slower, because you have someone else's code in between you and the CPU in your computer doing a bit of that work for you. Sophia? AUDIENCE: Are there efficiency differences between the ways that we print, using the f formatting or the other forms that we've used? DAVID MALAN: You don't have to be-- if I'm understanding correctly, there are some fancy features of it.
For instance, there is syntax you can use to specify how many decimal places you want to print after a floating-point value. But it's no longer all of the %i, %s, %f, and so forth. It's slightly different syntax, but fortunately there's less of it, since you don't have to worry as much about those conventions. Other questions or confusion? No? All right. Well, let me go ahead and do one other example that might be familiar from some weeks past. Let me go ahead and whip up a quick example of uppercasing, or lowercasing, just to tie together one of our earlier examples that we saw more organically. In this case, a file called uppercase.py. Let me go ahead, and from the CS50 library, let me go ahead and import get_string. And then once I have this, let me go ahead and get a string from the user, prompting them with "Before," for instance. And then let me go ahead and do the following. Let me go ahead and print out "After," the goal being I want to uppercase this whole string for the user. And I'm going to keep this all on the same line. So again, I want a program that's going to print "Before," ask the human for some input, and then after, show the capitalized version of the whole string. So how can I do this? Well, we've seen one way already. I can do literally, for instance, s.upper(). And let me go ahead and save this. And now run python of uppercase.py. Let me type in "hi!" in lowercase, and boom, now I get back the uppercase version. But if you want, you can actually manipulate individual characters as well. Let me go ahead and a little more pedantically do this. For c in s, print c. Now, this isn't quite what I want yet, but it's a stepping stone. Notice now if I type in "hi!" in lowercase, I see "h," "i," exclamation point, all still lowercase. So I haven't done anything interesting. But you know what, let me get rid of the new line, just so it all stays on the same line, because that was kind of ugly. Let me do it again. OK, a little better.
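The uppercasing demo, in both its whole-string and character-by-character forms, can be sketched as follows. A hard-coded "hi!" stands in for the user's input (get_string in the lecture), and the function names shout and shout_by_char are hypothetical.

```python
def shout(s):
    """Uppercase all of s at once with the str.upper method."""
    return s.upper()


def shout_by_char(s):
    """Print s uppercased one character at a time, all on one line."""
    for c in s:  # a for loop iterates over a string's characters directly
        print(c.upper(), end="")
    print()  # finish the line once, at the very end


s = "hi!"  # stands in for the user's input
print("After: " + shout(s))
print("After: ", end="")
shout_by_char(s)
```

Both approaches print the same thing. The loop only earns its keep when each character needs individual treatment, as in a cipher.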
Let me actually add a new line at the very end of the program to move my cursor to the new line. Let's do it once more: "hi!" OK, I'm not uppercasing anything. But if I change c to c.upper(), I can do that as expected. And let me run it again with "hi!" and boom. Now I have another working program. But the new feature now is, notice this coolness on line 5. If you want to iterate over a string's characters, you don't need to initialize i to 0 and then use square bracket notation like you did in C. You just say, for c in s, or for x in y, whatever it is. For can also be used to iterate over the individual characters in a string, as you might want to do when doing something like cryptography or the like. So we don't have to just uppercase the whole string all at once. We can still gain access to our individual values. And there are other things you can do in Python as well that we could do in C. Let me go ahead and create a program here called argv.py, for argument vector, which, recall, was the name of the input to main that allows you to access command-line arguments. Now today, we have seen that you can have a main function, and it's conventional, but it's not required anymore. And so we haven't seen argc or argv yet, but that's because they're elsewhere in Python. If you want to access command-line arguments in Python, it turns out that you can import argv from a module called sys. And this is a little new, but it follows the same pattern as the CS50 library. I'm going to import from the sys library, short for system, a feature called argv. So this just means that it comes with Python, but to use it you have to import it explicitly. And now I'm going to do this. If the length of argv equals 2, then I'm going to go ahead and print out, just like we did a few weeks ago, "hello," and then argv bracket 1. Somewhat cryptic, but I'll come back to this in a moment. Else, I'm going to go ahead and print out a default of "hello, world."
So we did this some weeks ago, in week 2, whereby we ran a program that, if the user typed their name at the prompt, would say "hello, David" or "hello, Brian." If they didn't, it would just say "hello, world." So to be clear, if I run this thing without any command-line arguments, I just see "hello, world." If I run it again, though, and type my name in and hit Enter, now I see "hello, David." So how is that working? Well, this first line of code gives me access to argv, which is now tucked away in the sys library, if you will, the sys package, so to speak. But it works the same way. There's no argc, but no problem. If argv is a list of command-line arguments, which it is, len, L-E-N, will tell me the length of that list, which is equivalent to argc. So I can reconstruct the same idea from my version in C. And here, then, I have a format string that prints out "hello," comma, and then whatever's in curly braces. And argv is a list. And just like C had arrays, Python has lists, and a list is, in effect, an array that can dynamically grow and shrink for you. You can still use square bracket notation to get at, in this case, the second thing the human typed. So let me change this just for clarity to be 0. And if I rerun this now and type in David, it says, weirdly, "hello, argv.py." So what you don't see is the word "python." Python is the interpreter, but that's not part of your program's execution per se. argv 0 is going to be the name of the Python program you're running, and argv 1 is going to be the first word thereafter, and so forth. So we still have access to that feature, and we can carry it over to Python. And in fact, if I want to print out all the command-line arguments, I can just more simply do this-- for arg in argv, go ahead and print arg. So very succinct, if not obvious at first glance. Now let me go ahead and type in something like "David Malan," two words. Enter, and you now see the program's name followed by each word I typed after it, one per line.
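A minimal sketch of argv.py as described. One deviation: it uses import sys and passes the argument list into a hypothetical greet function, so the logic can also be tried with hand-made lists; the lecture's version writes from sys import argv and uses argv directly.

```python
import sys


def greet(argv):
    """Return a greeting based on a list of command-line arguments."""
    # len(argv) plays the role of C's argc; argv[0] is the script's name.
    if len(argv) == 2:
        return f"hello, {argv[1]}"
    return "hello, world"


print(greet(sys.argv))

# Iterate over every argument, including argv[0], with no index variable.
for arg in sys.argv:
    print(arg)
```

Running python argv.py David would print "hello, David", then argv.py and David on separate lines.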
So here, too, notice how neatly we can iterate over a list in Python. There's no i, there are no square brackets necessarily. You can just say, for arg in argv, just like a moment ago I said for c in s. Pretty much, the Python for loop is smart enough to figure out what it is you want it to iterate over, whether it's a string or a list. And my God, it's just so much more fun or pleasant to program now, when you don't have to worry about all the stupid mechanics of incrementing, and plus plus, and semicolons, and all of that syntactical mess. All right, let me pause here to see if there are any questions. I know we're going through some of these examples quickly, but they're really just translations again. And in upcoming problems and problem sets, you'll be able to more methodically compare before and after as well. Anything at all on your end, Brian? BRIAN: Nothing here. DAVID MALAN: All right. So let's look at some of our final past examples. And then we'll reserve some time at the end of today to look at some even more powerful things that we can do now because of languages like Python. Let me go ahead and create a program, this time called exit.py. And this program's purpose in life is just to demonstrate exit statuses. Recall that eventually in C, we introduced the notion of returning 0, or returning 1, or any other value from main. We do have that ability now in Python, too, which you'll start to see in larger programs. Here, too, I'm going to go ahead and import sys, the whole thing this time, just to show a different way of doing this. I'm going to say, if the length of sys.argv does not equal 2, let me go ahead and yell at the user, "Missing command-line argument." And then after this, I'm going to go ahead and do sys.exit 1. Otherwise, I'm going to go ahead and print out a formatted string that says "hello," comma, argv bracket 1, with sys now in front of it for reasons I'll explain in a moment.
And then at the end, I'm going to go ahead and by default call sys.exit 0. All right. So what is going on here? One, because I'm now using sys for two different things, I decided not to import argv specifically, but just to import the whole library. But because I did that, I can't just write the word "argv" anywhere. I now have to prefix it with the name of the package or library that it's in. So that's why I started doing sys.argv, sys.argv. But I'm also using another feature of the sys library, which gives me access to an exit function, which is the equivalent of returning from main. So this is a bit of a difference. In C, you had to return 0 or 1, or some other integer from main. In Python, you instead call sys.exit with the same kinds of numbers. So a little bit different syntactically, but it's the same fundamental idea. What's the purpose of this program? Well, if I run this thing, its purpose is just to make me type in one word and only one word after my program's name. So notice, if I just run python of exit.py, it's yelling at me, "Missing command-line argument." If I run it instead with my name after that, now it says "hello, David." So stupid program. It's only meant to demonstrate how you can now return different values, or really exit prematurely from a program. Because you're no longer necessarily in main, you can't return per se, but you can now in Python exit as needed. So that's the comparable line there. All right, any questions, then, on exit statuses? Again, we're just kind of churning through the list of features we saw in C, even if they don't come to you super naturally-- pun not intended-- but rather, there are analogs here in the Python world. No? All right. Well, recall that after that we started focusing really in the class on algorithms. And that's when the size of our data sets and the efficiency of our code started to really matter.
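A sketch of exit.py's logic, wrapped in a function (named main here by our choice) that takes the argument list, so it can be exercised without actually ending the interpreter. sys.exit works by raising SystemExit with the given status code, which is what a test can catch.

```python
import sys


def main(argv):
    """Mirror the lecture's exit.py: exit 1 on bad usage, 0 on success."""
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # like returning 1 from main in C
    print(f"hello, {argv[1]}")
    sys.exit(0)  # like returning 0 from main in C
```

To use it as a script, one would add the line main(sys.argv) at the bottom of the file. After running python exit.py in a typical Unix shell, echo $? then shows the exit status, 1 in this case.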
Let me go ahead and write a program called numbers.py that, for instance, contains an import at the top for sys, because I'll need that in a moment. And then let me give myself a list of numbers, like 4, 6, 8, 2, 7, 5, 0. And you might recall that those were the numbers behind the doors in week 3. And suppose that I want to search for the number 0. Well, in C, to implement linear search you would use a for loop and a variable like i, and check all of the locations. Python is way simpler. If 0 in numbers, go ahead and print out "Found." And then I'll go ahead and, else, print out "Not found." And that's it. So let me go ahead now and do python of numbers.py. Hopefully I will see "Found," because 0 is in fact there. So that's it. Linear search is just a prepositional phrase: if 0 in numbers gives you the answer, True or False, that you want. So there is our linear search. What if I want to do it for names? Well, let me go ahead and give myself a second file, similar in spirit, called names.py. Let me again import-- and actually, if I really want to be identical to our C version, let me go ahead and exit with 0 here, and let me exit with 1 here. But strictly speaking, that's not necessary. That just happens to be what I did when we did this in C instead. In names.py, let me go ahead and do something similar. Let me give myself a list called names with a whole bunch of names-- "Bill," and "Charlie," and "Fred," and "George," and "Ginny," and "Percy," and lastly "Ron," all the way at the end. And then let me just check if "Ron" is in that list using linear search. If "Ron" in names, go ahead and print out "Found." Else, go ahead and print out "Not found." And I won't bother printing out or exiting with 0 or 1 this time. But let me go ahead and run python of names-- whoops, python of names.py. And voila, we found "Ron." And notice, I'm not cheating. I don't think I've screwed up.
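Here is a sketch combining numbers.py and names.py, with exit statuses omitted. For comparison, it also includes a hypothetical linear_search helper that spells out, C-style, the element-by-element check that in performs on a list.

```python
numbers = [4, 6, 8, 2, 7, 5, 0]
names = ["Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]

# On a list, `in` checks elements one at a time (linear search)
# and evaluates to True or False.
if 0 in numbers:
    print("Found")
else:
    print("Not found")

if "Ron" in names:
    print("Found")
else:
    print("Not found")


def linear_search(items, target):
    """Return True if target equals some element of items, checking in order."""
    for item in items:
        if item == target:
            return True
    return False
```

Matching is exact, so "Ron" in ["Ronald"] is False, which is the behavior demonstrated next.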
If I go ahead and say "Ronald," if that was in fact his formal name, now when I search for "Ron," it's not found. It's looking, indeed, for an exact match. So that's pretty cool, that we can distill something like that pretty readily. Well, recall that a little bit ago, I proposed that Python has other data types as well, among which are these things called dictionaries, or dicts, D-I-C-T, which represent a collection of key-value pairs, similar in spirit to a real-world dictionary. Just as a Spanish dictionary has Spanish keys and English values, converting one to the other, this English dictionary has English words and English definitions. But the same idea-- a collection of keys and values. Using one, you can find the other. Well, let's go ahead and translate this into Python in a program called phonebook.py, and implement something like our C phone book from a while back, which, recall, we implemented in C with a couple of arrays initially; then we scratched that and used an array of structs instead. Now let's use a dictionary, which is a more general-purpose data structure, as follows. Let me go ahead here and from cs50 import get_string. Then let me go ahead and give myself a dictionary of people. And the syntax here is a little different, but I'm going to go ahead and preemptively use curly braces. They are back for the purposes of dictionaries. And then here's how you define key-value pairs. One key is going to be "Brian." And his value is going to be "+1-617-495-1000." That's his number. And then I'll be one of the other keys, "David." We'll keep it a very small phone book, or dictionary, for now. My number will be "+1-949-468-2750." And that's it. So the curly braces can technically be on different lines. I could move this up here, I could get rid of this. But there are certain style conventions in Python. The point, though, is that a dictionary is defined with curly braces at the beginning and end; the keys and values are separated by colons; and the key-value pairs are separated by commas.
So that's why it's conventional to write it the way I did. It's just a little more obvious that this is a dictionary with two keys, each of which has a value. It's just associating left with right, so to speak. Now, what does this mean? Suppose I want to search for someone's name. Well, let me go ahead and give myself a variable called name, set to the return value of get_string, asking the human for a name. And let me implement my own virtual phone book, much like the Contacts app on your phone. Let me go ahead and then say, once I have the name, if name in people, that's great. If I found the name in people, let me go ahead and print out that the number for that person is people bracket name. And this is where dictionaries are going to get really powerful. Let me run it first and then explain. Python of phonebook.py, Enter-- whoops, python of phonebook.py. Let me search for Brian's number. Boom, there's Brian's number. Let me go ahead and run it with David's name. Boom, there's that number. Let me go ahead and run it with, say, Montague's name. Don't have his phone number just yet. He's unlisted, as would be anyone else that I type in. So what has gone on here? Well, at the top I'm declaring this new variable called people. And it's a dictionary, a collection of key-value pairs, left and right. Then I'm just getting a string from the user using get_string as before. And then this is powerful, too. This is essentially, on line 9, searching the whole dictionary for the given name. And it's returning to me down here the number associated with that person's name. And let me make this more clear by factoring this out. Let me give myself a variable called number and then more explicitly print out that variable. Here's what's different today. With "if name in people," what Python does is search all of the keys for that name. It doesn't search values. When you say if name in a given dictionary, like people, it searches only the keys.
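A minimal sketch of the dictionary-based phone book, using the two entries from the lecture. A hypothetical lookup function replaces the get_string prompt so the example runs on its own.

```python
people = {
    "Brian": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(name):
    """Return the number for name, or None if name is not a key."""
    # `in` on a dict checks only the keys, not the values.
    if name in people:
        # Square brackets index by key, like an array indexed by a word.
        return people[name]
    return None


print(lookup("Brian"))     # +1-617-495-1000
print(lookup("Montague"))  # None
```

The built-in people.get(name) does the same membership check and lookup in one call, returning None for a missing key.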
If you've then found the key, you know definitively that "David" or "Brian" is in the dictionary. And notice this. It's just like C's array syntax. You can now use square bracket notation to index into a dictionary using a word like "David" or "Brian," and get back a value like a phone number. In C, and thus far even in Python, whenever we've seen square bracket notation, it would typically be only for numbers, because arrays or lists have indices, numbers that address the first location, the middle, the last, and everything in between. But what's powerful about dictionaries is that they're otherwise known as associative arrays. A dictionary is a collection of key-value pairs. And if you want to look up a key, you simply use square bracket notation, just like we used to use square brackets with numbers. And because Python is a pretty fancy language, it handles the searching for you. And even better, it does not use linear search. When searching a dictionary, it aspires to give you constant time by using what we called last week a hash table. Dictionaries are typically implemented underneath the hood using something like a hash table. And recall that constant time was really the goal: if you choose a really good hash function and the right size array to hash into, you can come close to constant time. So again, among the features of a dictionary in Python is that it gives you very high performance. It's not linear search. And in fact, recall that when we began playing with Python earlier, I re-implemented speller using, what, 10 or 20 lines of code max instead of the many more that you might have written for pset 5, and that speller used a set. And a set is just a collection of values. Long story short, it's similar in spirit to a dictionary in that it, too, underneath the hood uses a hash table to get you answers quickly.
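To illustrate the set idea from the speller discussion, here is a small sketch. The lecture's speller loaded words from a dictionary file; here a hard-coded word list stands in for that file, and the check function is a hypothetical stand-in modeled loosely on pset 5's check.

```python
# A set holds unique values; like a dict, it is backed by a hash table,
# so membership tests take roughly constant time on average.
words = set()

# Stand-in for loading a dictionary file, one word per line.
for word in ["cat", "dog", "bird", "cat"]:
    words.add(word)


def check(word):
    """Return True if word, lowercased, is in the set."""
    return word.lower() in words


print(len(words))     # 3: the duplicate "cat" is stored only once
print(check("Dog"))   # True
print(check("fish"))  # False
```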
So if you think back to what that speller example was from earlier on today, when I had a line of code that just said words equals set, that one line of code was implementing pretty much the entirety of your spell checker's data structure. All of those pointers, all of those hash tables and chains and linked lists are distilled into just one line of code. You get that with the language itself. All right. Any questions, then, on dictionaries? They will recur, and they tend to be one of the most useful data structures, because this ability to just associate something with something else is a wonderful way, it turns out, to organize your data. Any questions here? Yeah, Sophia? AUDIENCE: Is there only a set hash function that Python has defined for these dictionaries, or can we change the hash function in any way? DAVID MALAN: Good question. It comes with a hash function for you, and Python figures all of that out for you. So that's the kind of detail that you should leave to the library, because someone else has spent all of the time thinking about how to dynamically adapt the data structure, move things around as needed, so that you no longer need to stress to the degree you might have when implementing speller yourself. And it turns out, other things get easy, too. This is not a commonly needed feature, necessarily, but it is something we can do. Let me go ahead and write a quick program called swap.py. Recall that in swap.c a couple of weeks ago, we gave x a value of 1, y a value of 2, and then I printed out something like "x is x, y is y." But this week I'm using format strings just to print that out. Then I did something like swap x, y, and I just kind of hoped for the best, and then I printed out those values again.
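Here is a minimal sketch of swap.py in Python, assuming the same starting values of 1 and 2. The swap is a single tuple assignment rather than a C function that takes pointers.

```python
x = 1
y = 2
print(f"x is {x}, y is {y}")

# Tuple assignment: the right side is evaluated first, then unpacked
# into the names on the left, so no explicit temporary variable is needed.
x, y = y, x
print(f"x is {x}, y is {y}")
```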
Well, it turns out in Python, because you don't have pointers and you don't have addresses per se that you have access to, you can't resort to the solution from last week and pass these variables around by reference, so to speak, by their address. That's just not possible. Why is that a thing? Well, it would seem to be taking a feature away from you, but honestly, if this past week was any indication, including the week prior, pointers are hard. And segmentation faults are frequent. And getting all of that stuff right is difficult. And at worst, your programs can be compromised, because someone can access memory that they shouldn't. So Python takes that feature away. Java also takes that feature away from programmers, to protect you from yourself and from screwing up, as you may well have some number of times this past week. But it turns out, in Python there are solutions to these problems. And if you want to swap x and y, that's fine. Swap x and y: x comma y equals y comma x. And so now if I run python of swap.py, voila, boom, it's distilled into one line. So even though Python takes something away from us that you can do a lot of damage with or make a lot of mistakes with, it nonetheless hands back a more powerful feature with this one-liner for swap. And notice that it's x comma y on the left, but y comma x on the right. And that has the effect of doing what Brian did with the glasses of liquid, doing the switcheroo, even without a temporary variable explicitly there, though some magic is happening underneath the hood. Well, let's go ahead and implement a couple of final programs from week 4 and then introduce a few of our own here in week 6. Let me go ahead and implement another phone book, this one a little more persistent. Let me go ahead here and create a file called phonebook.csv. And let me go ahead and put name comma number in it as its header row. So a CSV file, recall, is like a very simple spreadsheet.
And we're going to go ahead and just create that so I have it nearby. And then I'm going to create a new file called phonebook.py that's initially empty. And this time I'm going to do this. I'm going to import from cs50 the get_string function as before. But I'm also going to import a library called csv. It turns out, Python comes with a whole lot of functionality related to CSV files to make your life easier and make it easier to do things with CSVs. Among the things I might want to do is this. Let me go ahead and open up that file, phonebook.csv, in append mode, similar to fopen two weeks ago. And let me go ahead and assign that to a variable called file. Then let me go ahead and just get a name from the user. So let me use get_string to get someone's name, prompting with "Name" here. Then let me go ahead and use get_string again to get someone's number, prompting with "Number." And then lastly-- and this is the new code-- let me save that name and number to a file. And recall from pset 4 that saving files and writing bytes out to files is pretty involved. Like, it probably took you a while to implement recover, or blur, or any of those filters that involved creating new files. Turns out the CSV library makes this pretty easy. Let me go ahead and give myself what's called a writer. And I'm going to give myself the return value of calling csv.writer of file. So what is this doing? The variable file, again, represents the file I've opened. csv.writer is some function that comes with the CSV library. And it expects as input a file that you've already opened. And it kind of wraps that file with some fancier functionality that's going to make it way easier for me the programmer to write to that file. What am I going to do? I'm going to use that writer variable to write a row that specifically contains a name and a number. And I'm using a list, because if you think of a spreadsheet with columns and rows, a list is kind of the right idea.
Each of the cells from left to right is kind of like a list. A row is like a list. So I'm going to deliberately use a list here. And then lastly, I'm going to close the file, just as I've done in the past. So it's a little cryptic here. But again, get_string is old now. This is old now. So the only things that are new are importing the csv library and opening this file in append mode, similar to what I did in C. And then these lines here involve wrapping the file with the CSV functionality, writing a row to this file with writerow, and then closing it. So let me go ahead and try this now. Let me open up phonebook.csv, which for now only contains these headers which I created manually a moment ago. And let me go ahead and run this, python of phonebook.py. Let me go ahead and add Brian. And Brian will be +1-617-495-1000, Enter. And now let me go to my CSV file over here. Dammit, I screwed up. Pretend I didn't hit Enter there. Now it works. Let me go ahead now and do this again by input-- I should have hit Enter when I created the file manually, but I screwed up on creating it. So let me wave my hand at that and convince you that I did this correctly in code by adding myself, David, +1-949-468-2750, Enter. Let me go back to my CSV file. And voila, now it's formatted correctly, because writerow includes a newline for me. And notice, too, if I download this file-- let me download phonebook.csv like I did in a past week. Let me download this to my own Mac. Let me open this CSV file. And whether you have Apple Numbers installed or Microsoft Excel, you'll open something that looks like this. And voila, I've dynamically created, using Python code now, my own sort of CSV file. And it turns out there's a way to tighten this up just a little bit. Not a big deal to do it the way I did, but it turns out that you can also open and close files a little differently. You can do this: with open of phonebook.csv as file.
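A sketch of phonebook.py using that with form. To keep it runnable with only the standard library, the CS50 get_string calls are replaced by plain function arguments, and the name add_entry is hypothetical. The newline="" argument follows the csv module documentation's advice for files handed to csv.writer.

```python
import csv


def add_entry(path, name, number):
    """Append one name,number row to the CSV file at path."""
    # "a" is append mode, like fopen(path, "a") in C; the file is created if missing.
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)  # wraps the open file with CSV-writing functionality
        writer.writerow([name, number])  # one list per row, one element per column
    # No file.close() needed: leaving the with block closed the file.
```

To prompt the user the way the lecture's version does, something like add_entry("phonebook.csv", input("Name: "), input("Number: ")) should work, with Python's built-in input standing in for get_string.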
Then I can indent all of this here, and I can get rid of my close line. So it's not a big deal to do it the way I did with open and close, but the way I've done this here is a little more Pythonic. This with keyword isn't analogous to anything we've seen in C. When you use with to open a file, Python will automatically close the file for you as soon as the indented block ends. So you might see that in some online references or other materials. But again, it just does that for you automatically. Well, let's go ahead and do this. I like the fact that we can now manipulate CSVs. And it turns out that if you've ever used Google Forms-- that's a very popular way of collecting data from users. In fact, let me go ahead and go to a URL which is going to show you a form like this here. Brian, if you wouldn't mind typing that into the chat, go to that URL, cs50.ly.hogwarts. And if everyone wouldn't mind just playing along, just tell us what house you wish you were assigned to by the Sorting Hat in the world of Hogwarts. What house would you be in? Now, if you've used Google Forms before, you'll recall that you can see these results in the Google Form itself-- and already 122 of you have buzzed in. And we could see a distribution and a graph and so forth. But what I want is not the distribution pictorially there. I'm going to go ahead and open up a spreadsheet. And so if you've never used Google Forms before, you can click a button, and then you can get a list of all of the responses that are coming in live right now. And by default, Google keeps track of the timestamp, when the form was submitted, and which house was actually chosen. So I'm going to go ahead now and do this. Let me go ahead and download that in another tab. Give me just a moment to do it on this screen here. I'm going to go ahead and download that CSV file onto my Mac locally, by going to File, Download, CSV. That's going to put it into my Downloads folder.
And then I'm going to go ahead and upload this into my IDE by just dragging and dropping. Whoops, I have to open the file browser. I'm going to do this by dragging and dropping the file. All right. Now I have that file there. And let me go ahead now and make sure the file's there. I have this file called "Sorting Hat Responses-- Form Responses 1," and so forth. Well, let me go ahead and write a program now that manipulates this data, much like you might if running a student group that's collecting data in a Google Form, or you're just collecting information in general and have it in CSV format. How might you now tally up all of the results, especially if Google weren't just telling you graphically what the results were? Well, let me go ahead and write a program called hogwarts.py, which is not something we ever did in C. Let me go ahead and import this CSV library. Let me give myself initially a dictionary called houses that contains a whole bunch of keys, like "Gryffindor" with an initial count of 0; "Hufflepuff" with an initial count of 0; "Ravenclaw" with an initial count of 0; and also, "Slytherin" with an initial count of 0. So notice, in a dictionary, or dict in Python, the keys and values don't both need to be strings. They can certainly be strings and numbers, because I'm going to use this dictionary ultimately to keep count of all of the votes for one house or another. So let me go ahead and do this. Let me go ahead and open up, with open, the file "Sorting Hat File-- Form Responses 1.csv"-- long filename, but that's the default from Google-- as file. So I'm going to use my one-liner instead of having to open and close. I'm going to give myself this time a reader, which we did not see before. The CSV library has a reader function that allows me to read a CSV file automatically. I'm going to go ahead and skip the first row.
Calling next on the reader just skips the first row, because recall that that one is just timestamp and house, which I do want to ignore. I want the real data from you all. And here's what's cool about CSVs in Python. If I want to iterate over all of the rows that are in that spreadsheet, I can do for row in reader. And now, let me go ahead and get at, for instance, the house in question. So the house in a given row is going to be the row's second entry, row bracket 1, because lists are 0-indexed. So what is going on here? Well, let me go back to the Google spreadsheet from a moment ago. And in the Google spreadsheet, there are two columns. And the way the CSV reader works is it returns to you one row at a time-- and that's conceptually pretty straightforward. It maps perfectly to the idea of a spreadsheet. But each row is returned to you as a list, a list in this case of size 2. So row bracket 0 would give me a given timestamp, row bracket 1 would give me a given house name. So that's why here in the IDE, I'm going ahead and declaring a variable called house, and I'm assigning it equal to row bracket 1, because I don't care about the timestamp. We all just did this roughly at the same time. But now that I have the house, I can now index into the dictionary, just like in C you could index into an array using a number. But in a dictionary, I can use strings. So I'm going to go ahead and say, go into the houses dictionary, which I defined up above, and go to the house key, and go ahead and increment it by 1. And that's it. At this point, I have opened the CSV file and read it using the library. In this loop, I'm iterating over every row in the spreadsheet that you all created by filling out that form again and again. I'm just using a variable to get at whatever's in the second column, otherwise known as row bracket 1, because row bracket 0 would be the timestamp. And then I'm going into the dictionary called houses, which we defined up here.
I'm indexing into it just like an array, but it's a dictionary in this case, using the house name as the key, which looks up the appropriate value. And then plus equals 1 has the effect of incrementing that value. So it's a nice way of going into the dictionary and incrementing. Go in and increment. So now let's go ahead at the very end here and just print out the result. "For house in houses" is the fancy way to iterate over all of the keys in a dictionary, and for each one, go ahead and print out a formatted string as follows. Let me print out the house name followed by a colon followed by the houses dictionary, indexing into it with house. So again, cryptic. We'll come back to this in a second. Python of hogwarts. Let me cross my fingers that I didn't screw this up. And I did. The IDE knew before I did. All right. Now let me hope that I didn't screw this up-- and dammit. All right. The file is called something slightly different. Google's name must have changed, sorry, versus when I practiced. Let me copy this. So close. Sorting hat responses. Ah, it has parentheses which I forgot. All right. Now let me cross my fingers, rerun the program, dammit. OK, no such file or-- oh, I forgot the csv, dot csv. OK. Now cross fingers and-- oh, thank God. OK so Gryffindor, not surprisingly, the most popular house. Hufflepuff at 40, Ravenclaw at 71, Slytherin-- oh, beat out Hufflepuff. Very interesting for whatever sociological reason. But here we have a program now that analyzed the CSV. Now, we happened to do it with silly Harry Potter data. But again, imagine collecting any data you want from users, downloading it as a CSV to your Mac or PC or your IDE, then writing code that analyzes that data however you want. I did a very simple count, but you could certainly imagine doing something fancier than that, like computing sums, averages, or standard deviations. We could get all of that functionality as well.
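A sketch of hogwarts.py as walked through above, wrapped in functions so it can be reused. The names count_houses and print_counts are hypothetical, and, like the lecture's version, it assumes the first row is a header and the house is in the second column.

```python
import csv


def count_houses(path):
    """Tally votes per house from a CSV whose second column is the house."""
    houses = {
        "Gryffindor": 0,
        "Hufflepuff": 0,
        "Ravenclaw": 0,
        "Slytherin": 0,
    }
    with open(path, newline="") as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row (timestamp, house)
        for row in reader:  # each row arrives as a list of strings
            house = row[1]  # row[0] is the timestamp
            houses[house] += 1  # raises KeyError for a house not among the four keys
    return houses


def print_counts(houses):
    for house in houses:  # iterating over a dict yields its keys
        print(f"{house}: {houses[house]}")
```

Usage would be something like print_counts(count_houses("responses.csv")), where "responses.csv" is a placeholder for whatever name Google gives the download (which, as seen above, needs its parentheses and .csv extension spelled exactly). If unexpected house names are possible, houses.get(house, 0) + 1 avoids the KeyError.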
All right, any questions on dictionaries before we now offer up some of the most powerful features we've yet seen in a programming language? Anything at all on your end, Brian? BRIAN: No hands raised here. DAVID MALAN: All right. Well, let me go ahead now, and I'm going to transition actually to my Mac where I have in advance pre-installed Python, just so that I can do things locally. It will make things a little faster. I don't have to worry about internet speeds and the like. And this is indeed the case, that on your own Mac, your own PC, you can download and install the Python interpreter, run it on your own Mac and PC. However, I would recommend you continue using this IDE, certainly for problem sets' sake until the end of the semester, maybe transitioning to your Mac or PC for final projects only, only because I spent a huge amount of time this weekend just getting stupid libraries to work on my own Mac, which is often easier said than done, just because when programmers are writing code that's supposed to work on every possible Mac and PC in the world, you and I and everyone else have slightly different version numbers, different software installed, different incompatibilities. So those kinds of headaches very quickly arise when you're doing things locally. So let me encourage you to wait until term's end, with final projects perhaps, to move off of the IDE and do what I'm about to do now. I'm doing it here just so you'll be able to see these demos more clearly. I'm going to go ahead, and on my own Mac, I'm going to go ahead and create a program called speech.py. In advance, I've installed a library that supports speech synthesis. And if I want access to that functionality, it suffices to import pyttsx3, which is the name of someone's free, open-source library that I downloaded and installed on my Mac in advance. I read the documentation. I literally never used this before this past week.
And I found that I can declare a variable called engine, for instance. I can then call pyttsx3.init to initialize the library. Why? That's just because of how the programmer designed it. You have to initialize it first. I can then use that engine's say function to say something like "hello, world." Then after that, I should run the engine and wait for it to finish before my own program quits. All right. Let me go ahead now and close that, and run python of speech.py on my own Mac here. COMPUTER VOICE: Hello, world. DAVID MALAN: Interesting. So it said what I typed in. And indeed, I can probably make this even more interesting. Let me go ahead and say something like this. Let me open up speech.py again and add some functionality. I won't use the CS50 library, but I will use maybe the input function. Let me go ahead and say name gets input, "What's your name," question mark. And then let me go ahead and say, not "hello, world," but let me use an f-string-- which doesn't have to be used in print, you can use it in any function that takes a string. Let me go ahead and say "hello" to that name. All right. Let me go ahead and run python speech.py again. Oops. Let me go ahead and run python of speech.py again. What's my name? David. COMPUTER VOICE: Hello, David. DAVID MALAN: Weird choice of inflection, but indeed it synthesized it. Let's try Brian. COMPUTER VOICE: Hello, Brian. DAVID MALAN: OK. So we could probably tinker with the settings to make the voice sound a little more natural. But that's pretty cool. Well, let me go into some code I wrote in advance this time using a different library, this one related to faces and facial detection. It's certainly very much in vogue when it comes to social media these days, with Facebook and other websites automatically tagging you, and it's increasingly concerning, with state governments, federal governments, and law enforcement using facial recognition to find people in a crowd.
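A sketch of the second version of speech.py described above, assuming pyttsx3 is installed (as it was on the lecturer's Mac) and that init, say, and runAndWait behave as the lecture describes. The helper names greeting and speak are hypothetical, and pyttsx3 is imported inside speak so the rest of the file still loads on a machine without it.

```python
def greeting(name):
    # An f-string can be used anywhere a string is expected, not just in print.
    return f"hello, {name}"


def speak(text):
    """Say text aloud with the pyttsx3 text-to-speech library."""
    import pyttsx3  # imported here so greeting() works without pyttsx3 installed

    engine = pyttsx3.init()  # the library must be initialized first
    engine.say(text)  # queue up the text to be spoken
    engine.runAndWait()  # wait until speaking finishes before the program quits
```

Calling speak(greeting(input("What's your name? "))) would reproduce the interactive demo.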
And let me go ahead and open up a file here, for instance, a little more benignly, like a whole bunch of people in an office. So here is a photograph of some people in an office. And there are a lot of faces there. But there are a lot of boxes of paper and other distractions besides those faces. But let me go ahead and look at, quickly, a program called detect.py. Most of this file is comments, just so that if you want at home you can follow along and see what it does. But let me just highlight a few salient lines. Here is that Pillow library again, where I'm accessing image-related functionality from a pre-installed Python library. And this one's just kind of amazing. If you want to use facial recognition technology, just import face_recognition. That is a library you can import that will give you access to that kind of power. Down here now, I only knew how to figure this out by reading some documentation, but you call a function from the library, face_recognition.load_image_file, which does what its name says. I'm opening up office.jpg. And then scrolling down here to the white code, which is the actual code-- all of the blue is comments, recall-- this line of code here is all that's required in Python to use the face recognition library, find all of the face locations in a given image, and store them in a list called face_locations. This line of code here is just a Python loop that iterates over every face in the faces that were detected. And then these several lines of code here, long story short, just crop out individual faces and create a new image with the found faces. So without getting too much into the details of the library, which are not that intellectually interesting (the features are what interest us for now), let me run python of detect.py. Let me give my Mac a few seconds here to do its thing.
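A rough sketch of the core of detect.py, assuming the face_recognition and Pillow libraries are installed. Per face_recognition's documentation, face_locations returns (top, right, bottom, left) tuples, while Pillow's Image.crop expects a (left, upper, right, lower) box, so the hypothetical helper to_pil_box reorders them. Unlike the lecture's file, this sketch simply returns one cropped image per face rather than assembling a new image.

```python
def to_pil_box(location):
    """Reorder a face_recognition (top, right, bottom, left) tuple into the
    (left, upper, right, lower) box that Pillow's Image.crop expects."""
    top, right, bottom, left = location
    return (left, top, right, bottom)


def detect_faces(path):
    """Return a list of cropped Pillow images, one per face found in path."""
    # Imported here so to_pil_box() works even without these libraries installed.
    import face_recognition
    from PIL import Image

    image = face_recognition.load_image_file(path)  # pixels as a NumPy array
    face_locations = face_recognition.face_locations(image)
    photo = Image.fromarray(image)
    return [photo.crop(to_pil_box(location)) for location in face_locations]
```

Something like for face in detect_faces("office.jpg"): face.show() would display each detected face in turn.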
And voila, if I zoom in here we see Phyllis, and Jim, and Roy, and pretty much every other face that was detected in that photograph, cropped out as, indeed, an individual face. So if you've ever noticed a little square on yourself in Facebook when uploading a photo, this is the same kind of code that Facebook and others are using on their servers in order to execute that. Well, you know what, how about this? In the same office photo, you know, there's one person that always seems to stand out. No one really likes him. And that's Toby. What if we had a mug shot of Toby in a separate file like this? Can we find Toby in a crowd among these people in the office? Well, we can. Let me go ahead now and run a program called recognize.py, which you're welcome to look at online. It's similar code, not terribly many lines, that is going to do some thinking. It's opening up both the office JPEG and this one. And notice what just happened: if I zoom in, wonderfully, Toby is the only one with a big green box around his face, having indeed been recognized. So again, I'll just glance at the code. This time, if I open up recognize.py, it's a few more lines of code. But again, I'm importing face_recognition and some other things. I'm loading toby.jpg. And I'm loading office.jpg. And then there's some more code here that's looking for Toby, looking for Toby, and then drawing a big green box around the face that is ultimately found. So again, at the end of the day, it's just loops. It's just functions. It's just variables. But now the functions are pretty darn fancy and powerful, because again, they're taking advantage of all of these other features that we ourselves have implemented in a language like C, or have now seen glimpses of within the world of Python. Well, let's do another one. Let me go ahead and open up real quickly a program that will allow me to create one of these 2D barcodes, a so-called QR code.
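A sketch of the idea behind recognize.py, assuming face_recognition is installed and that toby.jpg contains at least one face. The library turns each face into a numeric encoding, and its compare_faces function treats two faces as a match when the Euclidean distance between their encodings is at most a tolerance, 0.6 by default; the hypothetical helper is_match mirrors that check with math.dist. Drawing the green box is left out here: find_matches, also hypothetical, just returns the matching face locations.

```python
import math


def is_match(known, candidate, tolerance=0.6):
    """True if two face encodings are within tolerance (Euclidean distance)."""
    return math.dist(known, candidate) <= tolerance


def find_matches(known_path, crowd_path):
    """Return (top, right, bottom, left) locations in crowd_path whose face
    matches the first face found in known_path."""
    import face_recognition  # imported here so is_match() works without it

    known_image = face_recognition.load_image_file(known_path)
    known = face_recognition.face_encodings(known_image)[0]  # assumes at least one face

    crowd = face_recognition.load_image_file(crowd_path)
    locations = face_recognition.face_locations(crowd)
    encodings = face_recognition.face_encodings(crowd, locations)
    return [
        location
        for location, encoding in zip(locations, encodings)
        if is_match(known, encoding)
    ]
```

Something like find_matches("toby.jpg", "office.jpg") would return where Toby's face appears in the office photo, ready to be outlined with Pillow's drawing functions.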
Let me go ahead and create a file called qr.py. And in this file, let me go ahead and do this. Import the operating system library, os, for reasons we'll soon see, and let me import the qrcode library, which will do all of the hard work for me. Let me go ahead and create an image that's assigned the return value of qrcode.make. And let me paste in this URL of one of the course's lecture videos, for instance. And then let me go ahead and save this image as qr.png, Portable Network Graphic, as indeed a PNG, a very popular file format for images. And then let me actually open this thing up. Open up system-- actually, nope, that's fine. Let me keep it simple. We don't need the os library. Nope, we do. Let's go ahead and open it up with os.system of "open qr.png." So three lines of code-- make a QR code with that URL, save it as qr.png, and open the file. Three lines of code. Let me go ahead and run python of qr.py. Voila, it was pretty fast. If you would like to take out your own iPhone or Android phone, turn on the camera if your phone supports this, and scan this 2D barcode by awkwardly just pointing your phone at the lecture as we speak, it should open up YouTube for you, hopefully, and-- I apologize to those-- yes, thank you for showing me what you're seeing. I apologize for doing that yet again. Never gets old. But all we've done is embed in a two-dimensional format, details of which we won't go into in class, a URL, which suggests that you can store anything inside of these 2D barcodes, and the software running on your phone these days, like its camera, can decode these things for you. Well, let me do something else, this time involving another sense, this one listening. Let me go into a file called listen.py. And let me go ahead and do something very simple. Let me go ahead and get a user's input in a variable called words by using the input function. Say something.
And then let me just convert it all to lowercase, just to keep things simple. And now let me do this. Once I get the user's words, let me go ahead and say, if the word "hello" is in their words, go ahead and print out "Hello to you too!" So if they say hello, I want to say hello back. Elif "how are you" in words, then let me go ahead and print out something like, "I am well, thanks," as the computer. Elif "goodbye" in words, then let me go ahead and say something reasonable like "Goodbye to you too." And then lastly, else let me go ahead and print out just something like "Huh?" Unrecognized. So if you will, here are the beginnings of an artificial intelligence, an AI-- a program that's going to somehow interact with me, the human, typing in phrases to this thing. So if I did it correctly, let me go ahead and run python of listen.py. I did not do something correctly. Oh, not "is," "in." OK, sorry. Let me go ahead and run python of listen.py. Say something. I'll say "hello." Oh, Hello to you too. What a nice friendly program. Let me ask it how it is, "how are you," question mark. It seems to detect that. Let me go ahead and say, "ok goodbye for now." And it detects that, too, because "goodbye" is in the phrase that the user typed in. But if I say something like, "hey there," it's not recognized. So pretty cool. We can use very simple string comparisons using the in operator to detect things. But I bet-- you know, I bet if we use the right library, we can really make this more powerful, too. Let me go ahead, and just like I imported face_recognition, let me import speech_recognition in Python, which is yet another library that I pre-installed. Let me go ahead and now do this, recognizer equals speech_recognition.Recognizer. And this is just creating a variable called recognizer by my having followed literally the documentation for using this library. Then let me go ahead and do this, also from the documentation, with speech_recognition.Microphone as source.
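A sketch of the typed version of listen.py. Here the if/elif/else logic is wrapped in a hypothetical respond function that returns its reply instead of printing it, which makes it easy to test. Order matters: a phrase containing both "hello" and "goodbye" gets the hello reply, because the first matching branch wins.

```python
def respond(words):
    """Return a canned reply if a known phrase appears anywhere in words."""
    words = words.lower()  # so "Hello" and "hello" are treated alike
    if "hello" in words:  # for strings, in checks for a substring
        return "Hello to you too!"
    elif "how are you" in words:
        return "I am well, thanks!"
    elif "goodbye" in words:
        return "Goodbye to you too!"
    else:
        return "Huh?"
```

print(respond(input("Say something: "))) reproduces the typed demo. In the speech-based version, words would instead be the text returned by recognizer.recognize_google(audio), not the AudioData object itself; checking "hello" in an AudioData object is what raises the "not iterable" error.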
So this is opening up my microphone in some sense, again just following the documentation. Let me go ahead and say "Say something" to the user. And then after that, let me go ahead and declare a variable called audio, set it equal to the recognizer's listen function, passing in my microphone as the source. And now down here, let me go ahead and print out "You said," and below that I will print out recognizer.recognize_google of audio, which for some reason is the hardest thing to say today so far. All right. So what's going on? These lines of code here are opening up a connection to my microphone on my Mac. It's then using the speech recognition library to listen to my microphone, and storing the audio from my microphone in a variable called audio. These lines of code down here are literally printing "You said," and then passing the audio I just recorded on my microphone to Google, and printing out whatever text comes back from Google. So let's see what comes out, again, crossing my fingers that I didn't mess up. Python of listen. Hello, world. Hoo. How are you? That's pretty good speech recognition. It's using the cloud, so to speak. It's passing it up to Google. But now let's make things a little fancier and actually respond to the human. So let me go back into here and add back some of the previous logic and say something like this. If "hello" in words, then go ahead and print out, like before, "Hello to you too." Elif "how are you" in the words that have come back from Google, go ahead and print out "I am well, thanks!" And down here, elif "goodbye" in words, then go ahead and print out "Goodbye to you too!" Else if nothing comes back that I recognize, let's just print out "Huh?" So if I did this right, let's now go ahead and let's do python of listen.py. Hello, there. Oh, dammit. OK, standby. Da-da-da. Oh, sorry. Let me do a find and replace. I called the variable "words" instead of "audio."
And I just executed a fancy command to replace it everywhere. So "audio" is what I meant to say this time. Now, let's go ahead and run this, python of listen.py. Hello, world. Dammit. AudioData is not iterable. This is a bug. Give me one second to double check my notes. Very sorry to disappoint. The audio in-- oh, I did-- sorry. I did it right the first time but the wrong way. Let me change my variable back to words. OK. What I forgot to do was call one line of code here that's literally sitting in front of me. I need to pass the audio to the recognizer's recognize_google function and store the resulting text, which I've now done using the words variable here. All right, now let me go ahead and run python of listen.py. Hello, there. Very nice. How are you today? Cool. OK, goodbye. All right. So there we have an even more compelling artificial intelligence. Granted, it's not that intelligent; it's just looking for preordained strings. But I bet we can do something even more. And in fact, let me go ahead and step inside, and see if a colleague of mine can't help do something in real time. On a big fancy PC here in the theater, we are running some other Python program on a CPU that's fast enough to do this in real time. And we've connected one of our cameras to that PC, so that what you're about to see is the result of one of our cameras being wired into this PC, feeding that camera's input into Python software running on that PC. And in the past, we trained the PC, using this Python software, to recognize certain images. And let's see if we can't do this as well. Brian, would you mind putting me on screen 1? And Rongxin, do you want to go ahead and load up our first guest? I think we are live. So again, you see my mouth moving in lock step with Einstein here. His lips are matching mine. His head movements are matching mine. We can even be inquisitive.
If my eyebrows go up, move my mouth this way, this way. And you can see that the Python program in real time is mapping my facial movements onto someone else's face, of course otherwise known as a deepfake. Rongxin, could we try out Brian's photo instead? Here now we have Brian, who similarly is matching my big smile. It gets a little fake at some point. But again, if we pre-rendered all of this instead of doing it live, the PC could probably do an even better job. How about, could we invite Harvard president Larry Bacow to join us, Rongxin? This is CS50, Harvard University's introduction to the intellectual enterprises of computer science and the art of programming. How about President Peter Salovey from Yale, Rongxin? This is CS50, Yale University's introduction to the intellectual enterprises of computer science and the art of programming. Now at this point, the real-world implications of this should be getting increasingly clear. While it's all fun and games to do this on Instagram, on TikTok and the like, using various mobile applications these days, which are essentially doing the same thing-- and you can see the image doesn't quite keep up with me if I start moving a little too quickly right now-- this has very real-world implications in the world of politics, government, business, and really just the real world more generally, because I'm essentially putting my own words in someone else's mouth. And while it's clear that these examples thus far aren't really that compelling-- if I start to move too much, you see that things start to get out of sync-- just imagine that if we wait one year, our computers are going to be twice as fast with even more memory and the like. Software is only getting better and more powerful, and the libraries and the artificial intelligence are getting better trained.
And so among the themes for the coming weeks of the class is not just how to do some things with technology and how to write code, but frankly asking the much bigger-picture, more important question of whether you should do certain things with technology, and whether you should actually write such code. We did ask President Salovey and President Bacow for their permission in advance to spoof them in this way. But we thought we would more playfully end with just a couple of other examples that you perhaps see on Instagram, TikTok, and the like. Rongxin, could we invite Pam to join us first? And how about a certain Jim? All right. That's it for CS50 and Python today. We'll see you next time. [MUSIC PLAYING]