WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:05.988 [MUSIC PLAYING] 00:01:17.990 --> 00:01:21.800 DAVID J. MALAN: All right, this is CS50, and this is already week 6. 00:01:21.800 --> 00:01:24.458 And this is the week in which you learn yet another language. 00:01:24.458 --> 00:01:26.750 But the goal is not just to teach you another language, 00:01:26.750 --> 00:01:29.480 for languages sake, as we transition today 00:01:29.480 --> 00:01:32.780 and in the coming weeks from C, where we've spent the past several weeks, now 00:01:32.780 --> 00:01:33.440 to Python. 00:01:33.440 --> 00:01:37.530 The goal ultimately is to teach you all how to teach yourselves new languages, 00:01:37.530 --> 00:01:40.020 so that by the end of this course, it's not in your mind, 00:01:40.020 --> 00:01:42.710 the fact that you learned how to program in C 00:01:42.710 --> 00:01:44.960 or learned some weeks back how to program in Scratch, 00:01:44.960 --> 00:01:48.170 but really how you learned how to program fundamentally, 00:01:48.170 --> 00:01:50.630 in a paradigm known as procedural programming, 00:01:50.630 --> 00:01:53.450 as well as with some taste today, and in the weeks to come, 00:01:53.450 --> 00:01:55.310 of other aspects of programming languages, 00:01:55.310 --> 00:01:58.010 like object-oriented programming, and more. 00:01:58.010 --> 00:02:00.180 So recall, though, back in week zero, Hello, world 00:02:00.180 --> 00:02:01.680 looked a little something like this. 00:02:01.680 --> 00:02:03.387 And the world was quite simple. 00:02:03.387 --> 00:02:05.720 All you had to do was drag and drop these puzzle pieces. 00:02:05.720 --> 00:02:08.960 But there were still functions and conditionals and loops and variables 00:02:08.960 --> 00:02:11.030 and all of those kinds of primitives. 00:02:11.030 --> 00:02:14.300 We then transitioned, of course, to a much more arcane language that 00:02:14.300 --> 00:02:15.840 looked a little something like this. 
00:02:15.840 --> 00:02:17.798 And even now, some weeks later, you might still 00:02:17.798 --> 00:02:20.470 be struggling with some of the syntax or getting annoying bugs 00:02:20.470 --> 00:02:22.970 when you try to compile your code, and it just doesn't work. 00:02:22.970 --> 00:02:24.800 But there, too, the past few weeks, we've 00:02:24.800 --> 00:02:28.130 been focusing on functions and loops and variables, conditionals, and really 00:02:28.130 --> 00:02:29.550 all of those same ideas. 00:02:29.550 --> 00:02:33.710 And so what we begin to do today is to, one, simplify the language 00:02:33.710 --> 00:02:38.840 we're using, transitioning from C now to Python, this now being the equivalent 00:02:38.840 --> 00:02:42.200 program in Python, and look at its relative simplicity, 00:02:42.200 --> 00:02:43.940 but also transitioning to look at how you 00:02:43.940 --> 00:02:45.800 can implement these same kinds of features, 00:02:45.800 --> 00:02:47.430 just using a different language. 00:02:47.430 --> 00:02:49.250 So we're going to see a lot of code today. 00:02:49.250 --> 00:02:53.150 And you won't have nearly as much practice with Python as you did with C. 00:02:53.150 --> 00:02:56.210 But that's because so many of the ideas are still going to be with us. 00:02:56.210 --> 00:02:58.580 And, really, it's going to be a process of figuring out, all right, 00:02:58.580 --> 00:02:59.413 I want to do a loop. 00:02:59.413 --> 00:03:01.760 I know how to do it in C. How do I do this in Python? 00:03:01.760 --> 00:03:02.990 How do I do the same with conditionals? 
00:03:02.990 --> 00:03:04.710 How do I declare variables, and the like, 00:03:04.710 --> 00:03:07.460 and moving forward, not just in CS50, but in life in general, 00:03:07.460 --> 00:03:10.760 if you continue programming and learn some other language after the class, 00:03:10.760 --> 00:03:14.270 if in 5-10 years, there's a new, more popular language that you pick up, 00:03:14.270 --> 00:03:16.520 it's just going to be a matter of googling and looking 00:03:16.520 --> 00:03:18.410 at websites like Stack Overflow and the like, 00:03:18.410 --> 00:03:21.350 to look at just basic building blocks of programming languages, 00:03:21.350 --> 00:03:24.680 because you already speak, after these past 6 plus weeks, 00:03:24.680 --> 00:03:27.500 you already speak programming itself fundamentally. 00:03:27.500 --> 00:03:31.070 All right, so let's do a few quick comparisons, left and right, of what 00:03:31.070 --> 00:03:32.960 something might have looked like in Scratch, 00:03:32.960 --> 00:03:34.820 and what it then looked like in C, but now, 00:03:34.820 --> 00:03:36.770 as of today, what it's going to look like in Python. 00:03:36.770 --> 00:03:38.853 Then we'll turn our attention to the command line, 00:03:38.853 --> 00:03:42.510 ultimately, in order to implement some actual programs. 00:03:42.510 --> 00:03:45.740 So in Scratch, we had functions like this, say Hello, 00:03:45.740 --> 00:03:47.270 world, a verb or an action. 00:03:47.270 --> 00:03:49.740 In C it looked a little something like this, 00:03:49.740 --> 00:03:53.150 and a bit of a cryptic mess the first week, you had the printf, 00:03:53.150 --> 00:03:54.290 you had the double quotes. 00:03:54.290 --> 00:03:55.980 You had the semicolon, the parentheses. 00:03:55.980 --> 00:03:58.423 So there's a lot more syntax just to do the same thing. 
00:03:58.423 --> 00:04:01.340 We're not going to get rid of all of that syntax now, but as of today, 00:04:01.340 --> 00:04:05.580 in Python, that same statement is going to look a little something like this. 00:04:05.580 --> 00:04:07.640 And just to perhaps call out the obvious, what 00:04:07.640 --> 00:04:12.050 is different or, now, simpler in Python versus C, even 00:04:12.050 --> 00:04:13.640 in this simple example here? 00:04:13.640 --> 00:04:14.545 Yeah. 00:04:14.545 --> 00:04:17.420 AUDIENCE: Now print, instead of printf would be, something like that. 00:04:17.420 --> 00:04:19.837 DAVID J. MALAN: Good, so it's now print instead of printf. 00:04:19.837 --> 00:04:21.110 And there's also no semicolon. 00:04:21.110 --> 00:04:23.103 And there's one other subtlety, over here. 00:04:23.103 --> 00:04:24.020 AUDIENCE: No new line. 00:04:24.020 --> 00:04:25.640 DAVID J. MALAN: Yeah, so no new line, and that 00:04:25.640 --> 00:04:27.110 doesn't mean it's not going to be printed. 00:04:27.110 --> 00:04:29.402 It just turns out that one of the differences we'll see 00:04:29.402 --> 00:04:31.640 is that, with print, you get the new line for free. 00:04:31.640 --> 00:04:34.950 It automatically gets outputted by default, being sort of a common case. 00:04:34.950 --> 00:04:37.190 But you can override it, we'll see, ultimately, too. 00:04:37.190 --> 00:04:38.300 How about in Scratch? 00:04:38.300 --> 00:04:42.082 We had multiple functions like this, that not only said something 00:04:42.082 --> 00:04:43.790 on the screen, but also asked a question, 00:04:43.790 --> 00:04:47.300 thereby being another function that returned a value, called answer. 
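The default newline that print adds, and the way to override it, can be sketched as follows. This is only an illustration: the `capture` helper is a hypothetical name introduced here so that the exact characters written are visible, and it is not part of the lecture.

```python
import io
from contextlib import redirect_stdout


def capture(func):
    """Run func and return everything it printed, as one string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()


# print appends a newline by default: no "\n" and no semicolon needed.
with_newline = capture(lambda: print("hello, world"))

# The end parameter overrides that default.
without_newline = capture(lambda: print("hello, world", end=""))

# repr shows the trailing newline (or its absence) explicitly.
print(repr(with_newline))
print(repr(without_newline))
```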
00:04:47.300 --> 00:04:49.730 In C we saw code that looked a little something 00:04:49.730 --> 00:04:53.420 like this, whereby that first line declares a variable called answer, 00:04:53.420 --> 00:04:55.790 sets it equal to the return value of get_string, 00:04:55.790 --> 00:04:57.740 one of the functions from the CS50 library, 00:04:57.740 --> 00:05:00.980 and then the same double quotes and parentheses and semicolon. 00:05:00.980 --> 00:05:05.390 Then we had this format code in C that allowed us, with %s, 00:05:05.390 --> 00:05:07.760 to actually print out that same value. 00:05:07.760 --> 00:05:10.400 In Python, this, too, is going to look a little bit simpler. 00:05:10.400 --> 00:05:13.460 Instead, we're going to have answer equals get_string, 00:05:13.460 --> 00:05:16.070 quote unquote "What's your name," and then print, 00:05:16.070 --> 00:05:18.870 with a plus sign and a little bit of new syntax. 00:05:18.870 --> 00:05:21.650 But let's see if we can't just infer from this example what 00:05:21.650 --> 00:05:22.860 it is that's going on. 00:05:22.860 --> 00:05:25.670 Well, first missing on the left is what? 00:05:25.670 --> 00:05:28.620 To the left of the equal sign, there's no what this time? 00:05:28.620 --> 00:05:29.870 Feel free to just call it out. 00:05:29.870 --> 00:05:30.690 AUDIENCE: Type. 00:05:30.690 --> 00:05:31.460 DAVID J. MALAN: So there's no type. 00:05:31.460 --> 00:05:33.770 There's no type, like the word string, which, 00:05:33.770 --> 00:05:38.090 even though that was a type from CS50, with every other variable in C 00:05:38.090 --> 00:05:41.437 we did use int or string or float, or bool or something else. 00:05:41.437 --> 00:05:43.520 In Python, there are still going to be data types, 00:05:43.520 --> 00:05:45.980 today onward, but you, the programmer, don't 00:05:45.980 --> 00:05:49.042 have to bother telling the computer what types you're using. 
00:05:49.042 --> 00:05:50.750 The computer is going to be smart enough, 00:05:50.750 --> 00:05:53.240 the language, really, is going to be smart enough, to just figure it out 00:05:53.240 --> 00:05:54.260 from context. 00:05:54.260 --> 00:05:56.150 Meanwhile, on the right-hand side, get_string 00:05:56.150 --> 00:05:57.858 is going to be a function we'll use today 00:05:57.858 --> 00:06:01.320 and this week, which comes from a Python version of the CS50 library. 00:06:01.320 --> 00:06:04.370 But we'll also start to take off those training wheels, so that you'll 00:06:04.370 --> 00:06:07.670 see how to do things without any CS50 library moving forward, 00:06:07.670 --> 00:06:09.290 using a different function instead. 00:06:09.290 --> 00:06:12.920 As before, no semicolon, but the rest of the syntax is pretty much the same 00:06:12.920 --> 00:06:13.430 here. 00:06:13.430 --> 00:06:16.013 This starts, of course, to get a little bit different, though. 00:06:16.013 --> 00:06:17.650 We're using print instead of printf. 00:06:17.650 --> 00:06:20.860 But now, even though this looks a little cryptic, 00:06:20.860 --> 00:06:23.110 perhaps, if you've never programmed before CS50, 00:06:23.110 --> 00:06:27.130 what might that plus be doing, just based on inference here? 00:06:27.130 --> 00:06:27.880 What do you think? 00:06:27.880 --> 00:06:31.720 AUDIENCE: Adding answer to the string Hello. 00:06:31.720 --> 00:06:34.990 DAVID J. MALAN: Yeah, so adding answer to the string Hello, 00:06:34.990 --> 00:06:37.030 and adding, so to speak, not mathematically, 00:06:37.030 --> 00:06:39.580 but in the form of joining them together, much like we 00:06:39.580 --> 00:06:43.040 saw the join block in Scratch, or concatenation was the term of art 00:06:43.040 --> 00:06:43.540 there. 00:06:43.540 --> 00:06:46.810 This plus sign appends, if you will, whatever's 00:06:46.810 --> 00:06:48.625 in answer to whatever is quoted here. 
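As a minimal sketch of the concatenation just described: the function name `greet` and the sample name are hypothetical, and the value is passed in directly rather than read from the user, so that the example runs without the CS50 library.

```python
def greet(answer):
    # No type is declared for answer or for the result.
    # The + operator joins (concatenates) the two strings.
    return "hello, " + answer


# The space after the comma is part of the quoted string itself.
print(greet("David"))
```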
00:06:48.625 --> 00:06:51.250 And I deliberately left a space there, so that grammatically it 00:06:51.250 --> 00:06:53.422 looks nice, after the comma as well. 00:06:53.422 --> 00:06:54.880 Now there's another way to do this. 00:06:54.880 --> 00:06:57.130 And it, too, is going to look cryptic at first glance. 00:06:57.130 --> 00:06:59.510 But it just gets easier and more convenient over time. 00:06:59.510 --> 00:07:04.580 You can also change this second line to be this, instead. 00:07:04.580 --> 00:07:05.770 So what's going on here? 00:07:05.770 --> 00:07:08.710 This is actually a relatively new feature of Python in the past couple 00:07:08.710 --> 00:07:11.020 of years, where now what you're seeing is, yes, 00:07:11.020 --> 00:07:13.580 a string, between these same double quotes, 00:07:13.580 --> 00:07:17.075 but this is what Python would call a format string, or f-string. 00:07:17.075 --> 00:07:20.200 And it literally starts with the letter f, which admittedly looks, I think, 00:07:20.200 --> 00:07:20.980 a little weird. 00:07:20.980 --> 00:07:24.700 But that just indicates that Python should 00:07:24.700 --> 00:07:29.110 assume that anything inside of curly braces inside of the string 00:07:29.110 --> 00:07:32.560 should be interpolated, so to speak, which is a fancy term for saying, 00:07:32.560 --> 00:07:36.160 substitute the value of any variables therein. 00:07:36.160 --> 00:07:38.030 And it can do some other things as well. 00:07:38.030 --> 00:07:42.040 So answer is a variable, declared, of course, on this first line. 00:07:42.040 --> 00:07:46.300 This f-string, then, says to Python, print out Hello comma space, and then 00:07:46.300 --> 00:07:47.950 the value of answer. 00:07:47.950 --> 00:07:52.390 If, by contrast, you omitted the curly braces, 00:07:52.390 --> 00:07:54.040 just take a guess, what would happen? 
00:07:54.040 --> 00:07:56.920 What would the symptom of that bug be, if you accidentally 00:07:56.920 --> 00:08:00.010 forgot the curly braces, but maybe still had the F there? 00:08:00.010 --> 00:08:01.750 AUDIENCE: It would print below it, too. 00:08:01.750 --> 00:08:04.300 DAVID J. MALAN: Yeah, it would literally print Hello, comma answer, because it's 00:08:04.300 --> 00:08:05.200 going to take you literally. 00:08:05.200 --> 00:08:07.690 So the curly braces just kind of allow you to plug things in. 00:08:07.690 --> 00:08:09.350 And, again, it looks a little more cryptic, 00:08:09.350 --> 00:08:11.267 but it's just going to save us time over time. 00:08:11.267 --> 00:08:14.120 And if any of you programmed in Java in high school, for instance, 00:08:14.120 --> 00:08:16.630 you saw plus in that context, too, for concatenation. 00:08:16.630 --> 00:08:19.755 This just kind of makes your code a little tighter, a little more succinct. 00:08:19.755 --> 00:08:21.730 So it's a convenient feature now in Python. 00:08:21.730 --> 00:08:24.190 All right, this was an example in Scratch of a variable, 00:08:24.190 --> 00:08:26.740 setting a variable like counter equal to 0. 00:08:26.740 --> 00:08:30.460 In C it looked like this, where you specify the type, the name, 00:08:30.460 --> 00:08:32.230 and then the value, with a semicolon. 00:08:32.230 --> 00:08:35.096 In Python, it's going to look like this. 00:08:35.096 --> 00:08:36.429 And I'll state the obvious here. 00:08:36.429 --> 00:08:39.340 You don't need to mention the type, just like before with string. 00:08:39.340 --> 00:08:41.030 And you don't need a semicolon. 00:08:41.030 --> 00:08:42.130 So it's a little simpler. 00:08:42.130 --> 00:08:45.005 If you want a variable, just write it and set it equal to some value. 00:08:45.005 --> 00:08:48.070 But the single equal sign still behaves the same as in C. 00:08:48.070 --> 00:08:50.440 Suppose we wanted to increment counter by one. 
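The format string behavior discussed above, including the bug of forgetting the curly braces, might look like this. The variable names `interpolated` and `literal` and the sample name are made up for illustration.

```python
answer = "David"  # sample value; the lecture reads this from the user

# With curly braces, the value of answer is substituted (interpolated).
interpolated = f"hello, {answer}"

# Without curly braces, the word "answer" is taken literally,
# even though the f prefix is still there.
literal = f"hello, answer"

print(interpolated)
print(literal)
```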
00:08:50.440 --> 00:08:52.750 In Scratch, we used this puzzle piece here. 00:08:52.750 --> 00:08:55.250 In C, we could do this, actually, in a few different ways. 00:08:55.250 --> 00:08:57.400 There was this way, if counter already exists, 00:08:57.400 --> 00:08:59.980 you just say counter equals counter plus 1. 00:08:59.980 --> 00:09:04.840 There was the slightly less verbose way, where you could say, oops, sorry. 00:09:04.840 --> 00:09:06.400 Let me do the first sentence first. 00:09:06.400 --> 00:09:08.690 In Python, that same thing, as you might guess, 00:09:08.690 --> 00:09:12.160 is actually going to be almost the same, you just throw away the semicolon. 00:09:12.160 --> 00:09:15.370 And the mathematics are ultimately the same, copying from right to left, 00:09:15.370 --> 00:09:17.290 via the assignment operator. 00:09:17.290 --> 00:09:19.570 Now, recall, in C, that we had this shorthand 00:09:19.570 --> 00:09:22.000 notation, which did the same thing. 00:09:22.000 --> 00:09:26.980 In Python, you can similarly do the same thing, just no need for the semicolon. 00:09:26.980 --> 00:09:29.290 The only step backwards we're taking, if you 00:09:29.290 --> 00:09:33.790 were a big fan of counter plus plus, that doesn't exist in Python, 00:09:33.790 --> 00:09:34.625 nor minus minus. 00:09:34.625 --> 00:09:35.500 You just can't do it. 00:09:35.500 --> 00:09:40.210 You have to do the plus equals 1 or minus equals 1 00:09:40.210 --> 00:09:43.720 to achieve that same result. All right, how about in Python, too? 00:09:43.720 --> 00:09:46.360 Here in Scratch, recall, was a conditional, 00:09:46.360 --> 00:09:49.990 asking a silly question like is x less than y, and if so, just say as much. 00:09:49.990 --> 00:09:53.980 In C, that looked a little something like this, printf and if 00:09:53.980 --> 00:09:57.310 with the parentheses, the curly braces, the semicolon, and all of that. 
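The ways of changing counter described above can be sketched like so. The `compile` check at the end is an addition for illustration only: it confirms that the C-style `counter++` is rejected as a syntax error without crashing the example.

```python
counter = 0           # no type and no semicolon

counter = counter + 1  # the long form, same as in C
counter += 1           # the shorthand form
counter -= 1           # decrementing works the same way

# There is no ++ or -- operator: "counter++" does not even parse.
try:
    compile("counter++", "<example>", "exec")
    plus_plus_is_valid = True
except SyntaxError:
    plus_plus_is_valid = False

print(counter, plus_plus_is_valid)
```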
00:09:57.310 --> 00:10:00.610 In Python, this is going to get a little more pleasant to type, too. 00:10:00.610 --> 00:10:03.320 It's going to be just this. 00:10:03.320 --> 00:10:06.460 And if someone wants to call out some of the obvious changes here, 00:10:06.460 --> 00:10:10.365 what has been simplified now in Python for a conditional, it would seem? 00:10:10.365 --> 00:10:11.740 Yeah, what's missing, or changed? 00:10:11.740 --> 00:10:12.350 AUDIENCE: Braces. 00:10:12.350 --> 00:10:13.405 DAVID J. MALAN: So no curly braces. 00:10:13.405 --> 00:10:14.740 AUDIENCE: Colon is back. 00:10:14.740 --> 00:10:15.370 DAVID J. MALAN: I'm sorry? 00:10:15.370 --> 00:10:16.510 AUDIENCE: Using the colon instead. 00:10:16.510 --> 00:10:18.593 DAVID J. MALAN: And we're using the colon instead. 00:10:18.593 --> 00:10:20.620 So I got rid of the curly braces in Python. 00:10:20.620 --> 00:10:22.193 But I'm using a colon instead. 00:10:22.193 --> 00:10:24.110 And even though this is a single line of code, 00:10:24.110 --> 00:10:28.450 so long as you indent subsequent lines along with the printf, 00:10:28.450 --> 00:10:32.830 that's going to imply that everything, if the if condition is true, 00:10:32.830 --> 00:10:36.970 should be executed below it, until you start to un-indent and start writing 00:10:36.970 --> 00:10:38.470 a different line of code altogether. 00:10:38.470 --> 00:10:41.000 So indentation in Python is important. 00:10:41.000 --> 00:10:45.100 So this is among the reasons we've emphasized axes like style, 00:10:45.100 --> 00:10:46.840 just how well styled your code is. 00:10:46.840 --> 00:10:49.360 And honestly, we've seen, certainly, in office hours, 00:10:49.360 --> 00:10:52.000 and you've seen in your own code, sort of a tendency sometimes 00:10:52.000 --> 00:10:55.030 to be a little lax when it comes to indentation, right? 
00:10:55.030 --> 00:10:57.670 If you're one of those folks who likes to indent everything 00:10:57.670 --> 00:11:01.210 on the left-hand side of the window, yeah, it might compile and run. 00:11:01.210 --> 00:11:04.870 But it's not particularly readable by you or anyone else. 00:11:04.870 --> 00:11:08.590 Python actually addresses this by just requiring indentation, 00:11:08.590 --> 00:11:09.790 when logically needed. 00:11:09.790 --> 00:11:14.050 So Python is going to force you to start indenting properly now, if that's been, 00:11:14.050 --> 00:11:16.680 perhaps, a tendency otherwise. 00:11:16.680 --> 00:11:17.620 What else is missing? 00:11:17.620 --> 00:11:19.050 Well, we have no semicolon here. 00:11:19.050 --> 00:11:21.150 Of course, it's print instead of printf. 00:11:21.150 --> 00:11:23.820 But otherwise, those seem to be the primary differences. 00:11:23.820 --> 00:11:25.680 What about something larger in Scratch? 00:11:25.680 --> 00:11:28.812 With an if-else block, like this, you can perhaps 00:11:28.812 --> 00:11:30.270 guess what it's going to look like. 00:11:30.270 --> 00:11:33.540 In C it looks like this, curly braces, semicolons, and so forth. 00:11:33.540 --> 00:11:37.530 In Python, it's going to now look like this, almost the same, 00:11:37.530 --> 00:11:38.820 but indentation is important. 00:11:38.820 --> 00:11:39.960 The colons are important. 00:11:39.960 --> 00:11:42.810 And there's one other difference that's now again visible here, 00:11:42.810 --> 00:11:44.670 but we didn't call it out a second ago. 00:11:44.670 --> 00:11:47.760 What else is different in Python versus C for these conditionals? 00:11:47.760 --> 00:11:48.471 Yeah. 00:11:48.471 --> 00:11:51.120 AUDIENCE: You don't have any parentheses around the condition. 00:11:51.120 --> 00:11:51.700 DAVID J. MALAN: Perfect. 00:11:51.700 --> 00:11:54.090 We don't have any parentheses around the condition, 00:11:54.090 --> 00:11:55.710 the Boolean expression itself. 
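A minimal sketch of the if-else form described above follows. The function name `describe` is hypothetical, and the messages are returned rather than printed only so that the result is easy to check.

```python
def describe(x, y):
    # No parentheses around the condition and no curly braces:
    # the colon and the indentation mark what belongs to each branch.
    if x < y:
        return "x is less than y"
    else:
        return "x is not less than y"


print(describe(1, 2))
print(describe(2, 1))
```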
00:11:55.710 --> 00:11:56.567 And why not? 00:11:56.567 --> 00:11:57.900 Well, it's just simpler to type. 00:11:57.900 --> 00:11:58.950 It's less to type. 00:11:58.950 --> 00:12:00.450 You can still use parentheses. 00:12:00.450 --> 00:12:02.550 And, in fact, you might want to or need to, 00:12:02.550 --> 00:12:07.470 if you want to combine thoughts and do this and that, or this or that. 00:12:07.470 --> 00:12:10.920 But by default, you no longer need or should have those parentheses. 00:12:10.920 --> 00:12:12.150 Just say what you mean. 00:12:12.150 --> 00:12:14.440 Lastly, with conditionals, we had something like this, 00:12:14.440 --> 00:12:16.770 an if else if else statement. 00:12:16.770 --> 00:12:18.840 In C, it looked a little something like this. 00:12:18.840 --> 00:12:20.880 In Python, it's going to get really tighter now. 00:12:20.880 --> 00:12:25.830 It's just if, and this is the curiosity, elif x greater than y. 00:12:25.830 --> 00:12:31.110 So it's not else if, it's literally one keyword, elif, and the colons 00:12:31.110 --> 00:12:33.315 remain now on each of the three lines. 00:12:33.315 --> 00:12:34.690 But the indentation is important. 00:12:34.690 --> 00:12:36.480 And if we did want to do multiple things, 00:12:36.480 --> 00:12:40.238 we could just indent below each of these conditionals, as well. 00:12:40.238 --> 00:12:42.030 All right, let me pause there first, to see 00:12:42.030 --> 00:12:44.490 if there's any questions on these syntactic differences. 00:12:44.490 --> 00:12:45.247 Yeah. 00:12:45.247 --> 00:12:47.532 AUDIENCE: My thought is maybe like, it's good, 00:12:47.532 --> 00:12:51.160 though, does it matter if there's this in between thing like that, but 00:12:51.160 --> 00:12:52.170 and why. 00:12:52.170 --> 00:12:55.050 DAVID J. MALAN: In between, between what and what? 00:12:55.050 --> 00:12:58.420 AUDIENCE: So like the left-hand side and like the right side spaces? 00:12:58.420 --> 00:13:01.830 DAVID J. 
MALAN: Ah, good question, is Python sensitive 00:13:01.830 --> 00:13:03.750 to spaces and where they go? 00:13:03.750 --> 00:13:06.390 Sometimes no, sometimes yes, is the short answer. 00:13:06.390 --> 00:13:10.080 Stylistically, though, you should be practicing what we're preaching here, 00:13:10.080 --> 00:13:14.265 whereby you do have spaces to the left and right of binary operators, 00:13:14.265 --> 00:13:16.140 as they're called. Something like less than 00:13:16.140 --> 00:13:18.348 or greater than is a binary operator, because there's 00:13:18.348 --> 00:13:20.580 two operands to the left and to the right of them. 00:13:20.580 --> 00:13:23.640 And in fact, in Python, more so than the world of C, 00:13:23.640 --> 00:13:26.340 there's actually formal style conventions. 00:13:26.340 --> 00:13:30.687 Not only within CS50 have we had a style guide on the course's website, 00:13:30.687 --> 00:13:34.020 for instance, that just dictates how you should write your code so that it looks 00:13:34.020 --> 00:13:34.945 like everyone else's. 00:13:34.945 --> 00:13:37.320 In the Python community, they take this one step further, 00:13:37.320 --> 00:13:41.260 and there's an actual standard whereby you don't have to adhere to it, 00:13:41.260 --> 00:13:44.310 but generally speaking, in the real world, someone would reprimand you, 00:13:44.310 --> 00:13:47.100 would reject your code, if you're trying to contribute it to another project, 00:13:47.100 --> 00:13:48.730 if you don't adhere to these standards. 00:13:48.730 --> 00:13:51.690 So while you could be lax with some of this white space, 00:13:51.690 --> 00:13:52.860 do make things readable. 00:13:52.860 --> 00:13:56.775 And that's a Python theme, for the code to be as readable as possible. 00:13:56.775 --> 00:13:59.400 All right, so let's take a look at a couple of other constructs 00:13:59.400 --> 00:14:01.360 before transitioning to some actual code. 
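The three-way conditional with elif, written with a space on each side of the binary operators, might look like the sketch below. The function name `compare` is hypothetical, and the style standard alluded to above is presumably PEP 8, which the lecture does not name at this point.

```python
def compare(x, y):
    # elif is a single keyword, not "else if".
    # Each of the three lines ends with a colon.
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


for pair in [(1, 2), (2, 1), (2, 2)]:
    print(compare(pair[0], pair[1]))
```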
00:14:01.360 --> 00:14:04.110 This, of course, in Scratch was a loop, meowing forever. 00:14:04.110 --> 00:14:08.340 In C, the closest we could get was doing something while true, because true 00:14:08.340 --> 00:14:09.100 never changes. 00:14:09.100 --> 00:14:12.060 So it's sort of a simple way of just saying do this forever. 00:14:12.060 --> 00:14:14.940 In Python, it's pretty much the same thing, 00:14:14.940 --> 00:14:16.740 but a couple of small differences here. 00:14:16.740 --> 00:14:18.600 The parentheses are gone. 00:14:18.600 --> 00:14:19.598 The colon is there. 00:14:19.598 --> 00:14:20.640 The indentation is there. 00:14:20.640 --> 00:14:24.263 No semicolon, and there's one other subtle difference. 00:14:24.263 --> 00:14:24.930 What do you see? 00:14:24.930 --> 00:14:25.920 AUDIENCE: True is capitalized? 00:14:25.920 --> 00:14:28.003 DAVID J. MALAN: True is capitalized, just because. 00:14:28.003 --> 00:14:30.570 Both true and false are Boolean values in Python. 00:14:30.570 --> 00:14:33.150 But you've got to start capitalizing them, just because. 00:14:33.150 --> 00:14:35.040 All right, how about a loop like this, where 00:14:35.040 --> 00:14:38.460 you repeat something a finite number of times, like meowing three times. 00:14:38.460 --> 00:14:41.050 In C, we could do this a few different ways. 00:14:41.050 --> 00:14:44.790 There's this very mechanical way, where you initialize a variable like i 00:14:44.790 --> 00:14:45.570 to zero. 00:14:45.570 --> 00:14:49.350 You then use a while loop and check if i is less than 3, 00:14:49.350 --> 00:14:51.187 the total number of times you want to meow. 00:14:51.187 --> 00:14:52.770 Then you print what you want to print. 00:14:52.770 --> 00:14:56.370 You increment i using this syntax, or the longer, more verbose syntax, 00:14:56.370 --> 00:14:57.880 with plus equals or whatnot. 00:14:57.880 --> 00:15:00.210 And then you do it again and again and again. 
00:15:00.210 --> 00:15:04.170 In Python, you can do it functionally the same way, same idea, 00:15:04.170 --> 00:15:05.580 slightly different syntax. 00:15:05.580 --> 00:15:08.190 You just don't bother saying what type of variable you want. 00:15:08.190 --> 00:15:11.038 Python will infer from the fact that there's a 0 right there. 00:15:11.038 --> 00:15:12.330 You don't need the parentheses. 00:15:12.330 --> 00:15:13.260 You do need the colon. 00:15:13.260 --> 00:15:14.760 You do need the indentation. 00:15:14.760 --> 00:15:17.910 You can't do the i plus plus, but you can do this other technique, 00:15:17.910 --> 00:15:20.100 as we could have done in C, as well. 00:15:20.100 --> 00:15:22.320 How else might we do this, though, too? 00:15:22.320 --> 00:15:24.540 Well, it turns out in C, we could do something 00:15:24.540 --> 00:15:28.230 like this, which, again, sort of cryptic at first glance, 00:15:28.230 --> 00:15:31.170 became perhaps more familiar, where you have initialization, 00:15:31.170 --> 00:15:34.920 a conditional, and then an update that you do after each iteration. 00:15:34.920 --> 00:15:37.950 In Python, there isn't really an analog. 00:15:37.950 --> 00:15:40.500 There is no analog in Python, where you have 00:15:40.500 --> 00:15:43.380 the parentheses and the multiple semicolons in the same line. 00:15:43.380 --> 00:15:47.010 Instead, there is a for loop, but it's meant to read a little more 00:15:47.010 --> 00:15:50.550 like English, for i in 0, 1, and 2. 00:15:50.550 --> 00:15:54.780 So we'll see in a bit, these square brackets represent an array, now 00:15:54.780 --> 00:15:57.090 to be called a list in Python. 00:15:57.090 --> 00:16:01.290 So lists in Python are more like linked lists than they are arrays. 00:16:01.290 --> 00:16:02.380 More on that soon. 00:16:02.380 --> 00:16:06.210 So this just means for i in the following list of three values. 
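The two loop styles just described, a counting while loop and a for loop over a hard-coded list, can be sketched side by side. The function names are hypothetical, and the meows are collected into a list instead of printed inside the loop only so the result can be inspected.

```python
def meow_with_while(times):
    meows = []
    i = 0                # no type needed; Python infers an int from the 0
    while i < times:     # no parentheses, but a colon
        meows.append("meow")
        i += 1           # i++ is not available
    return meows


def meow_with_list():
    meows = []
    # On each iteration, i is set to 0, then 1, then 2.
    for i in [0, 1, 2]:
        meows.append("meow")
    return meows


print(meow_with_while(3))
print(meow_with_list())
```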
00:16:06.210 --> 00:16:09.820 And on each iteration of this loop, Python automatically, for you, 00:16:09.820 --> 00:16:11.250 it first sets i to zero. 00:16:11.250 --> 00:16:12.840 Then it sets i to one. 00:16:12.840 --> 00:16:17.880 Then it sets i to two, so that you effectively do things three times. 00:16:17.880 --> 00:16:21.450 But this doesn't necessarily scale, as I've drawn it on the board. 00:16:21.450 --> 00:16:25.140 Suppose you took this at face value as the way 00:16:25.140 --> 00:16:28.980 you iterate some number of times in Python, using a for loop. 00:16:28.980 --> 00:16:33.482 At what point does this approach perhaps get bad, or bad design? 00:16:33.482 --> 00:16:35.190 Let me give folks just a moment to think. 00:16:35.190 --> 00:16:36.415 Yeah, in back. 00:16:36.415 --> 00:16:39.082 AUDIENCE: If you don't know how many times, last time, you know, 00:16:39.082 --> 00:16:41.083 you've got the link in there. 00:16:41.083 --> 00:16:43.500 DAVID J. MALAN: Sure, if you don't know how many times you 00:16:43.500 --> 00:16:47.460 want to loop or iterate, you can't really create a hard-coded list 00:16:47.460 --> 00:16:48.750 like that, of 0, 1, 2. 00:16:48.750 --> 00:16:50.323 Other thoughts? 00:16:50.323 --> 00:16:52.990 AUDIENCE: So you want to say raise a large number of allowances. 00:16:52.990 --> 00:16:55.740 DAVID J. MALAN: Yeah, if you're iterating a large number of times, 00:16:55.740 --> 00:16:57.640 this list is going to get longer and longer, 00:16:57.640 --> 00:16:59.932 and you're just kind of stupidly going to be typing out 00:16:59.932 --> 00:17:03.660 like comma 3, comma 4, comma 5, comma dot dot dot, comma 99, comma 100. 00:17:03.660 --> 00:17:06.160 I mean, your code would start to look atrocious, eventually. 00:17:06.160 --> 00:17:07.510 So there is a better way. 
00:17:07.510 --> 00:17:10.359 In Python, there is a function, or technically a type, 00:17:10.359 --> 00:17:14.530 called range, that essentially magically gives you back a range of values 00:17:14.530 --> 00:17:17.599 from 0 on up to, but not through a value. 00:17:17.599 --> 00:17:21.609 So the effect of this line of code, for i in the following range, 00:17:21.609 --> 00:17:24.484 essentially hands you back a list of three values, 00:17:24.484 --> 00:17:26.359 thereby letting you do something three times. 00:17:26.359 --> 00:17:29.067 And if you want to do something 99 times instead, you, of course, 00:17:29.067 --> 00:17:30.575 just change the 3 to a 99. 00:17:30.575 --> 00:17:31.075 Question. 00:17:31.075 --> 00:17:35.090 AUDIENCE: Is there a way to start the beginning point of that range 00:17:35.090 --> 00:17:39.410 at a number or an integer that's higher than zero, or is there never a really 00:17:39.410 --> 00:17:40.460 any point to do so? 00:17:40.460 --> 00:17:41.540 DAVID J. MALAN: A really good question, can 00:17:41.540 --> 00:17:43.440 you start counting at a higher number. 00:17:43.440 --> 00:17:46.910 So not 0, which is the implied default, but something larger than that. 00:17:46.910 --> 00:17:51.560 Yes, so it turns out the range function takes multiple arguments, not just one 00:17:51.560 --> 00:17:54.998 but maybe two or even three, that allows you to customize this behavior. 00:17:54.998 --> 00:17:56.540 So you can customize where it begins. 00:17:56.540 --> 00:17:57.920 You can customize the increment. 00:17:57.920 --> 00:17:59.712 By default, it's one, but if you want to do 00:17:59.712 --> 00:18:02.582 every two values, for like evens or odds, you could do that as well, 00:18:02.582 --> 00:18:03.540 and a few other things. 00:18:03.540 --> 00:18:05.930 And before long, we'll take a look at some Python documentation 00:18:05.930 --> 00:18:08.810 that will become your authoritative source for answers like that. 
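As a short sketch of range with one, two, and three arguments: each range is wrapped in `list` here only to make the values visible, and the variable names are made up for illustration.

```python
# One argument: count from 0 up to, but not through, 3.
three_times = list(range(3))

# Two arguments: choose where the counting begins.
from_one = list(range(1, 4))

# Three arguments: also choose the increment, here every second value.
evens = list(range(0, 10, 2))

print(three_times, from_one, evens)

# Typical use: repeat something a fixed number of times.
for i in range(3):
    print("meow")
```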
00:18:08.810 --> 00:18:10.790 Like, what can this function do? 00:18:10.790 --> 00:18:15.020 Other questions on this thus far? 00:18:15.020 --> 00:18:19.980 Seeing none, so what else might we compare and contrast here? 00:18:19.980 --> 00:18:24.320 Well, in the world of C, recall that we had a whole bunch of built-in data 00:18:24.320 --> 00:18:28.310 types, like these here, bool and char and double and float, and so forth, 00:18:28.310 --> 00:18:31.670 string, which happened to come from the CS50 library. 00:18:31.670 --> 00:18:35.990 But the language C itself certainly understood the idea of strings, 00:18:35.990 --> 00:18:40.700 because the backslash 0, the support for %s in printf, that's all native, 00:18:40.700 --> 00:18:43.370 built into C, not a CS50 simplification. 00:18:43.370 --> 00:18:45.620 All we did, and revealed, as of a couple of weeks 00:18:45.620 --> 00:18:48.050 ago, is that string, this data type, is just 00:18:48.050 --> 00:18:52.730 a synonym, a typedef, for char star, which is part of the language natively. 00:18:52.730 --> 00:18:55.610 In Python now, this list actually gets a little shorter, at least 00:18:55.610 --> 00:18:57.443 for these common primitive data types. 00:18:57.443 --> 00:19:00.110 Still going to have bools, we're going to have floats, and ints, 00:19:00.110 --> 00:19:02.600 and we're going to have strings, but we're going to call them strs. 00:19:02.600 --> 00:19:04.760 And this is not a CS50 thing from the library, 00:19:04.760 --> 00:19:08.300 str, S-T-R, is, in fact, a data type in Python, 00:19:08.300 --> 00:19:12.260 that's going to do a lot more than strings did for us automatically in C. 00:19:12.260 --> 00:19:17.133 Ints and floats, meanwhile, don't need the corresponding longs and doubles, 00:19:17.133 --> 00:19:19.550 because, in fact, among the problems Python solves for us, 00:19:19.550 --> 00:19:22.340 too, ints can get as big as you want. 
00:19:22.340 --> 00:19:25.220 Integer overflow is no longer going to be an issue. 00:19:25.220 --> 00:19:27.950 Per week 1, the language solves that for us. 00:19:27.950 --> 00:19:29.790 Floating point imprecision, unfortunately, 00:19:29.790 --> 00:19:31.190 is still a problem that remains. 00:19:31.190 --> 00:19:34.730 But there are libraries, code that other people have written, as we briefly 00:19:34.730 --> 00:19:37.010 discussed in weeks past, that allow you to do 00:19:37.010 --> 00:19:40.250 scientific or financial computing, using libraries that build 00:19:40.250 --> 00:19:42.625 on top of these data types, as well. 00:19:42.625 --> 00:19:45.500 So there's other data types, too, in Python, which we'll see actually 00:19:45.500 --> 00:19:48.710 gives us a whole bunch of more power and capability, 00:19:48.710 --> 00:19:51.500 things called ranges, like we just saw, lists, 00:19:51.500 --> 00:19:54.080 like I called out verbally, with the square brackets, 00:19:54.080 --> 00:19:56.900 things called tuples, for things like x comma y, 00:19:56.900 --> 00:20:00.305 or latitude, longitude, dictionaries, or Dicts, 00:20:00.305 --> 00:20:03.740 which allow you to store keys and values, much like our hash tables 00:20:03.740 --> 00:20:06.973 from last time, and then sets in the mathematical sense, where they filter 00:20:06.973 --> 00:20:09.890 out duplicates for you, and you can just put a whole bunch of numbers, 00:20:09.890 --> 00:20:13.910 a whole bunch of words or whatnot, and the language, via this data type, 00:20:13.910 --> 00:20:16.400 will filter out duplicates for you. 
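The data types listed above can be sketched briefly as follows. All values and variable names are made up for illustration. The decimal module from Python's standard library is used as one example of the kind of library alluded to for financial computing; the lecture does not name a specific one.

```python
from decimal import Decimal

# ints grow as needed, so this does not overflow
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# floats are still imprecise
print(0.1 + 0.2 == 0.3)  # False

# a library type built for exact decimal arithmetic
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))  # True

numbers = [3, 1, 2]                 # list: ordered, written with square brackets
point = (42.0, -71.0)               # tuple: e.g. a latitude, longitude pair
phone_book = {"Alice": "555-0100"}  # dict: keys and values (hypothetical entry)
unique = set(["cat", "dog", "cat"]) # set: duplicates are filtered out

print(len(unique))  # 2
```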
00:20:16.400 --> 00:20:19.985 Now there's going to be a few functions we give you this week and beyond, 00:20:19.985 --> 00:20:22.610 training wheels that we're then going to very quickly take off, 00:20:22.610 --> 00:20:26.060 just because, as we'll see today, they just simplify the process of getting 00:20:26.060 --> 00:20:29.205 user input correctly, without accidentally writing buggy code, 00:20:29.205 --> 00:20:32.330 just when you're trying to get Hello, World, or something similar, to work. 00:20:32.330 --> 00:20:36.050 And we'll give you functions, not as long as this list in C, 00:20:36.050 --> 00:20:38.630 but a subset of these, get_float, get_int, 00:20:38.630 --> 00:20:41.660 and get_string, that'll automate the process of getting 00:20:41.660 --> 00:20:45.410 user input in a way that's more resilient against potential bugs. 00:20:45.410 --> 00:20:47.270 But we'll see what those bugs might be. 00:20:47.270 --> 00:20:50.120 And the way we're going to do this is similar in spirit to C. 00:20:50.120 --> 00:20:54.380 Instead of doing include cs50.h, like we did in C, 00:20:54.380 --> 00:20:57.290 you're going to now start saying import cs50. 00:20:57.290 --> 00:21:00.560 Python supports, similar to C, libraries, 00:21:00.560 --> 00:21:02.300 but there aren't header files anymore. 00:21:02.300 --> 00:21:05.090 You just use the name of the library in Python. 00:21:05.090 --> 00:21:08.450 And if you want to import CS50's functions, you just say import cs50. 00:21:08.450 --> 00:21:12.470 Or, if you want to be more precise, and not just import the whole thing, which 00:21:12.470 --> 00:21:15.860 could be slow, if you've got a really big library with a lot of functionality 00:21:15.860 --> 00:21:19.730 in it, you can be more precise and say from cs50 import get_float.
00:21:19.730 --> 00:21:23.480 from cs50 import get_int, from cs50 import get_string, 00:21:23.480 --> 00:21:26.270 or you can just separate them by commas and import three 00:21:26.270 --> 00:21:30.550 and only three things from a particular library, like ours. 00:21:30.550 --> 00:21:32.300 But starting today and onward, we're going 00:21:32.300 --> 00:21:35.450 to start making much more heavy use of libraries, code 00:21:35.450 --> 00:21:38.570 that other people wrote, so that we're no longer reinventing the wheel. 00:21:38.570 --> 00:21:41.875 We're not making our own linked lists, our own trees, our own dictionaries. 00:21:41.875 --> 00:21:44.250 We're going to start standing on the shoulders of others, 00:21:44.250 --> 00:21:47.120 so that you can get real work done, so to speak, faster, 00:21:47.120 --> 00:21:51.710 by building your software on top of others' code as well. 00:21:51.710 --> 00:21:55.110 All right, so that's it for the syntactic tour of the language, 00:21:55.110 --> 00:21:56.360 and the sort of core features. 00:21:56.360 --> 00:21:58.320 Soon we'll transition to application thereof. 00:21:58.320 --> 00:22:04.040 But let me pause here to see if there's any questions on syntax or primitives 00:22:04.040 --> 00:22:10.340 or otherwise. 00:22:10.340 --> 00:22:12.204 Oh, yes, in back. 00:22:12.204 --> 00:22:16.163 AUDIENCE: Why doesn't Python have the increment operators? 00:22:16.163 --> 00:22:18.330 DAVID J. MALAN: I'm sorry, say it again, why doesn't 00:22:18.330 --> 00:22:19.788 Python have what kind of operators? 00:22:19.788 --> 00:22:22.578 AUDIENCE: Why doesn't Python have the increment operator? 00:22:22.578 --> 00:22:25.620 DAVID J. MALAN: Sorry, someone coughed when you said something operators. 00:22:25.620 --> 00:22:26.948 AUDIENCE: The increment. 00:22:26.948 --> 00:22:28.740 DAVID J. MALAN: Oh, the increment operator? 00:22:28.740 --> 00:22:30.407 I'd have to check the history, honestly.
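The import forms described earlier in this segment can be sketched as follows. Because the CS50 library is a separate install, this sketch substitutes the standard library's math module as a stand-in so that it runs anywhere; with the CS50 library installed, the analogous lines would be import cs50, or from cs50 import get_float, get_int, get_string.

```python
# Form 1: import the whole library, then prefix names with the library name
import math

whole = math.sqrt(16)

# Form 2: import only specific names, separated by commas,
# and use them without the prefix
from math import floor, sqrt

specific = sqrt(16)
rounded_down = floor(3.7)

print(whole)         # 4.0
print(specific)      # 4.0
print(rounded_down)  # 3
```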
00:22:30.407 --> 00:22:32.910 Python has tended to be a fairly minimalist language. 00:22:32.910 --> 00:22:36.090 And if you can do something one way, the community, arguably, 00:22:36.090 --> 00:22:40.145 has tended to not give you multiple ways to do the same thing syntactically. 00:22:40.145 --> 00:22:41.520 There's probably a better answer. 00:22:41.520 --> 00:22:45.840 And I'll see if I can dig in and post something online, to follow up on that. 00:22:45.840 --> 00:22:49.870 All right, so before we transition to now writing some actual code, 00:22:49.870 --> 00:22:54.870 let me go ahead and consider exactly how we're going to write code. 00:22:54.870 --> 00:22:58.770 In the world of C, recall that it's generally been a two-step process. 00:22:58.770 --> 00:23:04.230 We create a file called like Hello.c, and then, step one, make Hello, step two, 00:23:04.230 --> 00:23:05.400 ./Hello. 00:23:05.400 --> 00:23:08.130 Or, if you think back to week two, when we sort of peeled back 00:23:08.130 --> 00:23:11.100 the layer of what make was doing, 00:23:11.100 --> 00:23:14.310 you could more verbosely type out the name of the actual compiler, 00:23:14.310 --> 00:23:17.640 clang in our case, command line arguments like -o Hello, 00:23:17.640 --> 00:23:19.840 to specify what name you want to create. 00:23:19.840 --> 00:23:21.660 And then you can specify the file name. 00:23:21.660 --> 00:23:25.050 And then you can specify what libraries you want to link in. 00:23:25.050 --> 00:23:26.550 So that was a very verbose approach. 00:23:26.550 --> 00:23:28.930 But it was always a two-step approach. 00:23:28.930 --> 00:23:31.680 And so, even as you've been doing recent problem sets, 00:23:31.680 --> 00:23:35.400 odds are you've realized that, any time you want to make a change to your code 00:23:35.400 --> 00:23:39.660 and try and test your code again, 00:23:39.660 --> 00:23:42.360 you're constantly doing those two steps.
00:23:42.360 --> 00:23:45.840 Moving forward in Python, it's going to become simpler, 00:23:45.840 --> 00:23:47.610 and it's going to be just this. 00:23:47.610 --> 00:23:50.460 The file name is going to change, but that might go without saying. 00:23:50.460 --> 00:23:55.260 It's going to be something like Hello.py, P-Y, instead of Hello.c. 00:23:55.260 --> 00:23:57.990 And that's just a convention, using a different file extension. 00:23:57.990 --> 00:24:00.780 But there's no compilation step per se. 00:24:00.780 --> 00:24:04.170 You jump right to the execution of your code. 00:24:04.170 --> 00:24:07.200 And so Python, it turns out, is the name, not only of the language 00:24:07.200 --> 00:24:12.150 we're going to start using, it's also the name of a program on a Mac, a PC, 00:24:12.150 --> 00:24:16.020 assuming it's been pre-installed, that interprets the language for you. 00:24:16.020 --> 00:24:20.100 This is to say that Python is generally described as being interpreted, 00:24:20.100 --> 00:24:21.360 not compiled. 00:24:21.360 --> 00:24:25.170 And by that, I mean you get to skip, from the programmer's perspective, 00:24:25.170 --> 00:24:26.370 that compilation step. 00:24:26.370 --> 00:24:30.870 There is no manual step in the world of Python, typically, of writing your code 00:24:30.870 --> 00:24:34.530 and then compiling it to zeros and ones, and then running the zeros and ones. 00:24:34.530 --> 00:24:36.870 Instead, these kind of two steps get collapsed 00:24:36.870 --> 00:24:42.570 into the illusion of one, whereby you, instead, are able to just run the code, 00:24:42.570 --> 00:24:46.200 and let the computer figure out how to actually convert it 00:24:46.200 --> 00:24:48.240 to something the computer understands. 00:24:48.240 --> 00:24:51.850 And the way we do that is via this old process, input and output. 
00:24:51.850 --> 00:24:53.910 But now, when you have source code, it's going 00:24:53.910 --> 00:24:56.850 to be passed into an interpreter, not a compiler. 00:24:56.850 --> 00:24:59.400 And the best analog of this is just to perhaps point out 00:24:59.400 --> 00:25:01.950 that, in the human world, if you speak, or don't speak, 00:25:01.950 --> 00:25:05.640 multiple human languages, it can be a pretty slow process from going 00:25:05.640 --> 00:25:07.270 from one language to another. 00:25:07.270 --> 00:25:10.170 For instance, here are step-by-step instructions for finding someone 00:25:10.170 --> 00:25:12.540 in a phone book, unfortunately, in Spanish. 00:25:12.540 --> 00:25:15.360 Unfortunately, if you don't speak or read Spanish. 00:25:15.360 --> 00:25:16.560 You could figure this out. 00:25:16.560 --> 00:25:19.380 You could run this algorithm, but you're going to have to do some googling, 00:25:19.380 --> 00:25:22.130 or you're going to have to open up literal dictionary from Spanish 00:25:22.130 --> 00:25:23.460 to English and convert this. 00:25:23.460 --> 00:25:27.060 And the catch with translating any language, human or computer 00:25:27.060 --> 00:25:30.850 or otherwise, is that you're going to pay a price, typically some time. 00:25:30.850 --> 00:25:33.840 And so converting this in Spanish to this in English 00:25:33.840 --> 00:25:36.360 is just going to take you longer than if this were already 00:25:36.360 --> 00:25:38.453 in your native language. 00:25:38.453 --> 00:25:41.370 And that's going to be one of the subtleties with the world of Python. 00:25:41.370 --> 00:25:45.180 Yes, it's a feature that you can just run the code without having 00:25:45.180 --> 00:25:47.880 to bother compiling it manually first. 00:25:47.880 --> 00:25:49.050 But we might pay a price. 00:25:49.050 --> 00:25:50.815 And things might be a little slower. 00:25:50.815 --> 00:25:52.440 Now, there's ways to chip away at that. 
00:25:52.440 --> 00:25:53.815 But we'll see an example thereof. 00:25:53.815 --> 00:25:56.700 In fact, let me transition now to just a couple of examples 00:25:56.700 --> 00:26:00.660 that demonstrate how Python is not only easier for many people 00:26:00.660 --> 00:26:03.240 to use, perhaps yourselves too, because it throws away 00:26:03.240 --> 00:26:06.120 a lot of the annoying syntax, it shortens the number of lines 00:26:06.120 --> 00:26:09.810 you have to write, and also it comes with so many darn libraries, 00:26:09.810 --> 00:26:14.740 you can just do so much more without having to write the code yourself. 00:26:14.740 --> 00:26:17.670 So, as an example of this, let me switch over here 00:26:17.670 --> 00:26:24.090 to this image from problem set 4, which is the Weeks Bridge down by the Charles 00:26:24.090 --> 00:26:25.290 River here in Cambridge. 00:26:25.290 --> 00:26:27.245 And this is the original photo, pretty clear, 00:26:27.245 --> 00:26:30.370 and it's even higher res if we looked at the original version of the photo. 00:26:30.370 --> 00:26:33.660 But there have been no filters, a la Instagram, applied to this photo. 00:26:33.660 --> 00:26:36.750 Recall, for problem set four, you had to implement a few filters. 00:26:36.750 --> 00:26:38.460 And among them might have been blur. 00:26:38.460 --> 00:26:41.610 And blur was probably among the more challenging of the ones, 00:26:41.610 --> 00:26:44.190 because you had to iterate over all of the pixels, 00:26:44.190 --> 00:26:47.130 you had to take into account what's above, what's below, to the left, 00:26:47.130 --> 00:26:47.490 to the right. 00:26:47.490 --> 00:26:49.448 I mean, there was a lot of math and arithmetic. 00:26:49.448 --> 00:26:52.620 And if you ultimately got it, it was probably a great sense of satisfaction. 00:26:52.620 --> 00:26:54.780 But that was probably several hours later. 
00:26:54.780 --> 00:26:57.540 In a language like Python, where there might 00:26:57.540 --> 00:27:01.170 be libraries that had been written by others, on whose shoulders 00:27:01.170 --> 00:27:03.880 you can stand, we could perhaps do something like this. 00:27:03.880 --> 00:27:08.280 Let me go ahead and run a program, or write a program, called Blur.py here. 00:27:08.280 --> 00:27:12.130 And in Blur.py, in VS Code, let me just do this. 00:27:12.130 --> 00:27:15.370 Let me import from a library, not the CS50 library, 00:27:15.370 --> 00:27:19.620 but the Pillow library, so to speak, a name called Image 00:27:19.620 --> 00:27:23.330 and another one called ImageFilter, then let me go ahead 00:27:23.330 --> 00:27:26.420 and say, let me open the current version of this image, which 00:27:26.420 --> 00:27:27.740 is called Bridge.bmp. 00:27:27.740 --> 00:27:30.260 So the before version of the image will be 00:27:30.260 --> 00:27:34.550 the result of calling Image.open quote unquote "Bridge.bmp," 00:27:34.550 --> 00:27:37.040 and then, let me create an after version. 00:27:37.040 --> 00:27:38.840 So you'll see before and after. 00:27:38.840 --> 00:27:45.010 after equals before.filter of ImageFilter. 00:27:45.010 --> 00:27:46.760 And there is, if I read the documentation, 00:27:46.760 --> 00:27:49.052 I'll see that there's something called a BoxBlur, that 00:27:49.052 --> 00:27:52.160 allows you to blur in box format, like one pixel above, 00:27:52.160 --> 00:27:53.750 below, left, and right. 00:27:53.750 --> 00:27:55.367 So I'll do one pixel there. 00:27:55.367 --> 00:27:57.950 And then, after that's done, let me go ahead and save the file 00:27:57.950 --> 00:28:01.070 as something like Out.bmp. 00:28:01.070 --> 00:28:02.180 That's it. 00:28:02.180 --> 00:28:04.910 Assuming this library works as described, 00:28:04.910 --> 00:28:08.060 I am opening the file in Python, using line 3. 00:28:08.060 --> 00:28:09.680 And this is somewhat new syntax.
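The program dictated above can be sketched roughly as follows. It assumes the Pillow library is installed separately (for example with pip install Pillow). Wrapping the steps in a function named blur, with src, dst, and radius parameters, is an addition for reuse and is not how the on-screen program was written; the lecture version ran the same three calls at the top level of the file.

```python
from PIL import Image, ImageFilter  # Pillow, a third-party library


def blur(src, dst, radius=1):
    """Open the image at src, apply a box blur, and save the result to dst."""
    before = Image.open(src)
    # BoxBlur(radius) averages each pixel with its neighbors within radius pixels
    after = before.filter(ImageFilter.BoxBlur(radius))
    after.save(dst)


# Example call, assuming a file named bridge.bmp exists in the current directory:
# blur("bridge.bmp", "out.bmp", radius=1)
```

Raising radius (say, to 10) blurs more heavily, since each output pixel then averages a larger box of neighbors.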
00:28:09.680 --> 00:28:13.250 In the world of Python, we're going to start making use of the dot operator 00:28:13.250 --> 00:28:15.320 more, because in the world of Python, you have 00:28:15.320 --> 00:28:19.700 what's called object-oriented programming, or OOP, as a term of art. 00:28:19.700 --> 00:28:22.470 And what this means is that you still have functions, 00:28:22.470 --> 00:28:24.980 you still have variables, but sometimes those functions 00:28:24.980 --> 00:28:28.850 are embedded inside of the variables, or, more specifically, 00:28:28.850 --> 00:28:30.710 inside of the data types themselves. 00:28:30.710 --> 00:28:34.430 Think back to C. When you wanted to convert something to uppercase, 00:28:34.430 --> 00:28:38.582 there was a toupper function that takes as input an argument that's a char. 00:28:38.582 --> 00:28:41.540 And you can pass in any char you want, and it will uppercase it for you 00:28:41.540 --> 00:28:42.890 and give you back a value. 00:28:42.890 --> 00:28:46.160 Well, you know what, if that's such a common paradigm, where 00:28:46.160 --> 00:28:49.850 upper-casing chars is a useful thing, what the world of Python does 00:28:49.850 --> 00:28:54.470 is it embeds into the string data type, or char if you will, 00:28:54.470 --> 00:28:59.240 the ability just to uppercase any char by treating the char, or the string, 00:28:59.240 --> 00:29:02.150 as though it's a struct in C. Recall that structs 00:29:02.150 --> 00:29:04.400 encapsulate multiple types of values. 00:29:04.400 --> 00:29:07.610 In object-oriented programming, in a language like Python, 00:29:07.610 --> 00:29:11.510 you can encapsulate not just values, but also functionality. 00:29:11.510 --> 00:29:13.818 Functions can now be inside of structs. 00:29:13.818 --> 00:29:15.860 But we're not going to call them structs anymore. 00:29:15.860 --> 00:29:17.270 We're going to call them objects. 00:29:17.270 --> 00:29:19.130 But that's just a different vernacular.
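A small sketch of the idea that functionality lives inside the data type itself: where C passes a char to toupper, a Python str carries its own upper method, reached with the dot operator. The strings used here are arbitrary examples.

```python
greeting = "hello, world"

# The function is "inside" the string: call it with the dot operator
shouted = greeting.upper()
print(shouted)  # HELLO, WORLD

# A single character is just a str of length 1, so the same method applies
letter = "c".upper()
print(letter)  # C

# The original string is unchanged; upper returns a new string
print(greeting)  # hello, world
```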
00:29:19.130 --> 00:29:20.870 So what am I doing here? 00:29:20.870 --> 00:29:23.870 Inside of the Image library, there's a function called open, 00:29:23.870 --> 00:29:26.630 and it takes an argument, the name of the file, to open. 00:29:26.630 --> 00:29:30.260 Once I have a variable called before, that is a struct, or technically 00:29:30.260 --> 00:29:33.290 an object, inside of which is now, because it 00:29:33.290 --> 00:29:36.140 was returned from this function, a function 00:29:36.140 --> 00:29:38.280 called filter, that takes an argument. 00:29:38.280 --> 00:29:41.660 The argument here happens to be ImageFilter.BoxBlur(1), 00:29:41.660 --> 00:29:42.830 which itself is a function. 00:29:42.830 --> 00:29:44.803 But it just returns the filter to use. 00:29:44.803 --> 00:29:46.970 And then, after, dot save does what you might think. 00:29:46.970 --> 00:29:48.150 It just saves the file. 00:29:48.150 --> 00:29:51.470 So instead of using fopen and fwrite, you just say dot save, 00:29:51.470 --> 00:29:54.510 and that does all of that messy work for you. 00:29:54.510 --> 00:29:57.230 So it's just, what, four lines of code total? 00:29:57.230 --> 00:30:00.240 Let me go ahead and go down to my terminal window. 00:30:00.240 --> 00:30:03.533 Let me go ahead and show you with ls that, at the moment, 00:30:03.533 --> 00:30:05.450 whoops, sorry, let me not bother showing that, 00:30:05.450 --> 00:30:07.160 because I have other examples to come. 00:30:07.160 --> 00:30:14.310 I'm going to go ahead and do Python of Blur.py, nope, sorry, wrong place. 00:30:14.310 --> 00:30:15.570 I did need to make a command. 00:30:15.570 --> 00:30:16.280 There we go. 00:30:16.280 --> 00:30:19.340 OK, let me go ahead and type ls inside of my filter directory, which 00:30:19.340 --> 00:30:21.560 is among the sample code online today. 00:30:21.560 --> 00:30:24.800 There's only one file called Bridge.bmp, dammit, 00:30:24.800 --> 00:30:27.630 I'm trying to get these things ready at the same time.
00:30:27.630 --> 00:30:28.730 Let me rewind. 00:30:28.730 --> 00:30:32.120 Let me move this code into place. 00:30:32.120 --> 00:30:34.710 All right, I've gone ahead and moved this file, Blur.py, 00:30:34.710 --> 00:30:37.190 into a folder called filter, inside of which 00:30:37.190 --> 00:30:42.080 there's another file called Bridge.bmp, which we can confirm with ls. 00:30:42.080 --> 00:30:44.390 Let me now go ahead and run Python, which 00:30:44.390 --> 00:30:46.700 is my interpreter, and also the name of the language, 00:30:46.700 --> 00:30:48.990 and run Python on this file. 00:30:48.990 --> 00:30:51.348 So much like running the Spanish algorithm 00:30:51.348 --> 00:30:53.390 through Google Translate, or something like that, 00:30:53.390 --> 00:30:55.650 as input, to get back the English output, 00:30:55.650 --> 00:30:59.540 this is going to translate the Python language to something 00:30:59.540 --> 00:31:01.760 this computer, or this cloud-based environment, 00:31:01.760 --> 00:31:05.070 understands, and then run the corresponding code, top to bottom, 00:31:05.070 --> 00:31:05.707 left to right. 00:31:05.707 --> 00:31:07.040 I'm going to go ahead and Enter. 00:31:07.040 --> 00:31:08.930 No error message is generally a good thing. 00:31:08.930 --> 00:31:11.960 If I type ls you'll now see Out.bmp. 00:31:11.960 --> 00:31:13.295 Let me go ahead and open that. 00:31:13.295 --> 00:31:15.920 And, you know what, just to make clear what's really happening, 00:31:15.920 --> 00:31:17.087 let me blur it even further. 00:31:17.087 --> 00:31:20.550 Let's make a box that's not just one pixel around, but 10. 00:31:20.550 --> 00:31:21.950 So let's make that change. 00:31:21.950 --> 00:31:24.830 And let me just go ahead and rerun it with Python of Blur.py. 00:31:24.830 --> 00:31:27.320 I still have Out.bmp. 00:31:27.320 --> 00:31:32.100 Let me go ahead and open Out.bmp and show you first the before, 00:31:32.100 --> 00:31:33.680 which looks like this.
00:31:33.680 --> 00:31:34.550 That's the original. 00:31:34.550 --> 00:31:37.820 And now, crossing my fingers, four lines of code later, 00:31:37.820 --> 00:31:39.758 the result of blurring it, as well. 00:31:39.758 --> 00:31:42.050 So the library is doing all of the same kind of legwork 00:31:42.050 --> 00:31:44.120 that you all did for the assignment, but it's 00:31:44.120 --> 00:31:48.303 encapsulated it all into a single library, that you can then use instead. 00:31:48.303 --> 00:31:50.720 Those of you who might have been feeling more comfortable, 00:31:50.720 --> 00:31:52.595 might have done a little something like this. 00:31:52.595 --> 00:31:56.900 Let me go ahead and open up one other file, called Edges.py. 00:31:56.900 --> 00:32:00.290 And in Edges.py, I'm again going to import from the Pillow library 00:32:00.290 --> 00:32:03.010 the Image name, and ImageFilter. 00:32:03.010 --> 00:32:05.510 Then I'm going to go ahead and create a before image, that's 00:32:05.510 --> 00:32:09.590 a result of calling Image.open of the same thing, Bridge.bmp, 00:32:09.590 --> 00:32:16.910 then I'm going to go ahead and run a filter on that, called Image, whoops, 00:32:16.910 --> 00:32:21.850 ImageFilter.FIND_EDGES, which is like a constant, if you will, 00:32:21.850 --> 00:32:23.708 defined inside of this library for us. 00:32:23.708 --> 00:32:25.750 And then I'm going to do after.save quote unquote 00:32:25.750 --> 00:32:28.210 "Out.bmp," using the same file name. 00:32:28.210 --> 00:32:36.490 I'm now going to run Python of Edges.py, after, sorry, user error. 00:32:36.490 --> 00:32:38.930 We'll see what syntax error means soon. 00:32:38.930 --> 00:32:41.470 Let me go ahead and run the code now, Edges.py. 00:32:41.470 --> 00:32:44.830 Let me now open that new file, Out.bmp.
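A rough sketch of the edge-detection program just described, again assuming Pillow is installed. As with the blur sketch, the wrapper function find_edges and its parameter names are hypothetical conveniences; the lecture version ran the calls at the top level of the file.

```python
from PIL import Image, ImageFilter  # Pillow, a third-party library


def find_edges(src, dst):
    """Open the image at src, apply Pillow's edge-detection filter, save to dst."""
    before = Image.open(src)
    # FIND_EDGES is a predefined filter supplied by the library, so it is
    # passed as-is rather than called with arguments like BoxBlur
    after = before.filter(ImageFilter.FIND_EDGES)
    after.save(dst)


# Example call, assuming a file named bridge.bmp exists in the current directory:
# find_edges("bridge.bmp", "out.bmp")
```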
00:32:44.830 --> 00:32:49.510 And before we had this, and now, which will look especially familiar 00:32:49.510 --> 00:32:52.210 if you did the more comfortable version of pset 4, 00:32:52.210 --> 00:32:55.340 we now get this, after just four lines of code. 00:32:55.340 --> 00:32:58.120 So again, suggesting the power of using a language that's better 00:32:58.120 --> 00:32:59.560 optimized for the tool at hand. 00:32:59.560 --> 00:33:02.950 And at the risk of really making folks sad, let's go ahead 00:33:02.950 --> 00:33:06.820 and re-implement, if we could, problem set five, real quickly here. 00:33:06.820 --> 00:33:11.080 Let me go ahead and open another version of this code, 00:33:11.080 --> 00:33:14.307 wherein I have a C version, just from problem 00:33:14.307 --> 00:33:16.390 set five, wherein you implemented a spell checker, 00:33:16.390 --> 00:33:18.640 loading 100,000 plus words into memory. 00:33:18.640 --> 00:33:22.390 And then you kept track of just how much time and memory it took. 00:33:22.390 --> 00:33:24.340 And that probably took a while, implementing 00:33:24.340 --> 00:33:26.530 all of those functions in Dictionary.c. 00:33:26.530 --> 00:33:32.240 Let me instead now go into a new file, called Dictionary.py. 00:33:32.240 --> 00:33:35.200 And let me stipulate, for the sake of discussion, 00:33:35.200 --> 00:33:37.660 that we already wrote in advance, Speller.py, 00:33:37.660 --> 00:33:39.850 which corresponds to Speller.c. 00:33:39.850 --> 00:33:41.380 You didn't write either of those. 00:33:41.380 --> 00:33:43.600 Recall for problem set five, we gave you Speller.c. 00:33:43.600 --> 00:33:45.558 Assume that we're going to give you Speller.py. 00:33:45.558 --> 00:33:52.030 So the onus on us right now is only to implement Dictionary.py. 00:33:52.030 --> 00:33:54.940 All right, so I'm going to go ahead and define a few functions. 00:33:54.940 --> 00:33:58.000 And we're going to see now the syntax for defining functions in Python.
00:33:58.000 --> 00:34:02.230 I want to go ahead and define first, a hash table, which 00:34:02.230 --> 00:34:04.840 was the very first thing you defined in Dictionary.c. 00:34:04.840 --> 00:34:09.969 I'm going to go ahead, then, and say words gets this, give me a dictionary, 00:34:09.969 --> 00:34:11.683 otherwise known as a hash table. 00:34:11.683 --> 00:34:13.600 All right, now let me define a function called 00:34:13.600 --> 00:34:16.630 check, which was the first function you might have implemented. 00:34:16.630 --> 00:34:19.000 Check is going to take a word, and you'll see in Python, 00:34:19.000 --> 00:34:20.375 the syntax is a little different. 00:34:20.375 --> 00:34:21.880 You don't specify the return type. 00:34:21.880 --> 00:34:24.610 You use the word Def instead to define. 00:34:24.610 --> 00:34:28.540 You still specify the name of the function and any arguments thereto. 00:34:28.540 --> 00:34:31.210 But you omit any mention of types. 00:34:31.210 --> 00:34:33.280 But you do use a colon and indent. 00:34:33.280 --> 00:34:37.780 So how do I check if a word is in my dictionary, or in my hash table? 00:34:37.780 --> 00:34:41.440 Well, in Python, I can just say, if word in words, 00:34:41.440 --> 00:34:46.570 go ahead and return true, else go ahead and return false, done, 00:34:46.570 --> 00:34:47.949 with the check function. 00:34:47.949 --> 00:34:49.639 All right, now I want to do like load. 00:34:49.639 --> 00:34:52.639 That was the heavy lift, where you had to load the big file into memory. 00:34:52.639 --> 00:34:54.306 So let me define a function called load. 00:34:54.306 --> 00:34:56.650 It takes a string, the name of a file to load. 00:34:56.650 --> 00:34:59.980 So I'll call that Dictionary, just like in C, but no data type. 00:34:59.980 --> 00:35:04.180 Let me go ahead and open a file by using an open function in Python, 00:35:04.180 --> 00:35:06.740 by opening that Dictionary in read mode. 
00:35:06.740 --> 00:35:10.360 So this is a little similar to fopen, a function in C you might recall. 00:35:10.360 --> 00:35:12.880 Then let me iterate over every line in the file. 00:35:12.880 --> 00:35:17.800 In Python, this is pretty pleasant, for line in file colon indent. 00:35:17.800 --> 00:35:22.510 How, now, do I get at the current word, and then strip off the new line, 00:35:22.510 --> 00:35:25.570 because in this file of words, 140,000 words, 00:35:25.570 --> 00:35:28.752 there's word backslash n, word backslash n, all right? 00:35:28.752 --> 00:35:31.210 Well, let me go ahead and get a word from the current line, 00:35:31.210 --> 00:35:34.840 but strip off, from the right end of the string, the new line, which 00:35:34.840 --> 00:35:37.540 the rstrip function in Python does for me. 00:35:37.540 --> 00:35:42.370 Then let me go ahead and add to my dictionary, or hash table, that word, 00:35:42.370 --> 00:35:43.030 done. 00:35:43.030 --> 00:35:45.535 Let me go ahead and close the file for good measure. 00:35:45.535 --> 00:35:48.160 And then let me go ahead and return true, because all was well. 00:35:48.160 --> 00:35:50.320 That's it for the load function in Python. 00:35:50.320 --> 00:35:51.580 How about the size function? 00:35:51.580 --> 00:35:54.820 This did not take any arguments, it just returns the size of the hash table 00:35:54.820 --> 00:35:55.990 or dictionary in Python. 00:35:55.990 --> 00:35:59.980 I can do that by returning the length of the dictionary in question. 00:35:59.980 --> 00:36:04.660 And then lastly, gone from the world of Python are malloc and free. 00:36:04.660 --> 00:36:06.090 Memory is managed for you. 00:36:06.090 --> 00:36:08.950 So no matter what I do, there's nothing to unload. 00:36:08.950 --> 00:36:10.820 The computer will do that for me. 00:36:10.820 --> 00:36:14.860 So I give you, in these functions, problem set five in Python. 00:36:14.860 --> 00:36:17.020 So, I'm sorry, we made you write it in C first.
00:36:17.020 --> 00:36:20.620 But the implication now is that, what are you getting for free, 00:36:20.620 --> 00:36:21.850 in a language like Python? 00:36:21.850 --> 00:36:24.370 Well, encapsulated in this one line of code 00:36:24.370 --> 00:36:28.270 is much of what you wrote for problem set five, implementing 00:36:28.270 --> 00:36:31.270 your array for all of your letters of the alphabet or more, 00:36:31.270 --> 00:36:34.390 all of the linked lists that you implemented to create chains, 00:36:34.390 --> 00:36:35.930 to store all of those words. 00:36:35.930 --> 00:36:37.060 All of that is happening. 00:36:37.060 --> 00:36:40.090 It's just someone else in the world wrote that code for you. 00:36:40.090 --> 00:36:43.060 And you can now use it by way of a dictionary. 00:36:43.060 --> 00:36:45.550 And actually, I can change this a little bit, 00:36:45.550 --> 00:36:48.670 because add is technically not the right function to use here. 00:36:48.670 --> 00:36:51.620 I'm actually treating the dictionary as something simpler, a set. 00:36:51.620 --> 00:36:55.420 So I'm going to make one tweak, set recall was another data type in Python. 00:36:55.420 --> 00:36:57.700 But set just allows it to handle duplicates, 00:36:57.700 --> 00:37:00.430 and it allows me to just throw things into it by literally 00:37:00.430 --> 00:37:02.320 using a function as simple as add. 00:37:02.320 --> 00:37:05.170 And I'm going to make one other tweak here, 00:37:05.170 --> 00:37:09.790 because, when I'm checking a word, it's possible it might be given 00:37:09.790 --> 00:37:12.520 to me in uppercase or capitalized. 00:37:12.520 --> 00:37:15.880 It's not going to necessarily come in in the same lowercase format 00:37:15.880 --> 00:37:17.470 that my dictionary did. 00:37:17.470 --> 00:37:22.390 I can force every word to lowercase by using word.lower. 
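Putting the dictated pieces together, including the switch to a set and the lowercase tweak, the dictionary implementation might look roughly like this. It is a reconstruction from the spoken description rather than the exact on-screen file, and the trivial body of unload is an assumption about how "nothing to unload" was expressed.

```python
# One global collection of words; a set filters out duplicates and supports add
words = set()


def check(word):
    """Return True if word is in the dictionary, ignoring capitalization."""
    if word.lower() in words:
        return True
    else:
        return False


def load(dictionary):
    """Load every word from the file named dictionary, one word per line."""
    file = open(dictionary, "r")
    for line in file:
        word = line.rstrip()  # strip the trailing newline from the right end
        words.add(word)
    file.close()
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free: Python manages memory for us."""
    return True
```

The body of check could be shortened to return word.lower() in words, since the in expression already evaluates to True or False.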
00:37:22.390 --> 00:37:24.500 And I don't have to do it character for character, 00:37:24.500 --> 00:37:29.800 I can do the whole darn string at once, by just saying word.lower. 00:37:29.800 --> 00:37:32.860 All right, let me go ahead and open up a terminal window here. 00:37:32.860 --> 00:37:36.118 And let me go into, first, my C version, on the left. 00:37:36.118 --> 00:37:39.160 And actually I'm going to go ahead and split my terminal window into two. 00:37:39.160 --> 00:37:44.007 And on the right, I'm going to go into a version that I essentially just wrote. 00:37:44.007 --> 00:37:46.840 But it's also available online, if you want to play along afterward. 00:37:46.840 --> 00:37:50.170 I'm going to go ahead and make speller in C on the left, 00:37:50.170 --> 00:37:52.270 and note that it takes a moment to compile. 00:37:52.270 --> 00:37:56.530 Then I'm going to be ready to run speller of dictionaries, 00:37:56.530 --> 00:37:59.330 let's do like the Sherlock Holmes text, which is pretty big. 00:37:59.330 --> 00:38:03.970 And then over here, let me get ready to run Python of speller 00:38:03.970 --> 00:38:07.733 on texts/holmes.txt, too. 00:38:07.733 --> 00:38:10.150 So the syntax is a little different at the command prompt. 00:38:10.150 --> 00:38:12.880 I just, on the left, have to compile the code, with make, 00:38:12.880 --> 00:38:14.650 and then run it with ./speller. 00:38:14.650 --> 00:38:16.370 On the right, I don't need to compile it. 00:38:16.370 --> 00:38:17.860 But I do need to use the interpreter. 00:38:17.860 --> 00:38:20.230 So even though the lines are wrapping a little bit here, 00:38:20.230 --> 00:38:22.180 let me go ahead and run it on the left. 00:38:22.180 --> 00:38:24.305 And I'm going to count how long it takes, verbally, 00:38:24.305 --> 00:38:25.570 for demonstration sake. 00:38:25.570 --> 00:38:28.720 One Mississippi, two Mississippi, three Mississippi, OK, 00:38:28.720 --> 00:38:31.190 so it's like three seconds, give or take.
00:38:31.190 --> 00:38:33.520 Now running it in Python, keeping in mind, 00:38:33.520 --> 00:38:37.103 I spent way fewer hours implementing a spell checker in Python 00:38:37.103 --> 00:38:38.770 than you might have in problem set five. 00:38:38.770 --> 00:38:42.007 But what's the trade-off going to be, and what kinds of design decisions 00:38:42.007 --> 00:38:43.840 do we all now need to be making consciously? 00:38:43.840 --> 00:38:46.300 Here we go, on the right, in Python. 00:38:46.300 --> 00:38:50.020 One Mississippi, two Mississippi, three Mississippi, four Mississippi, 00:38:50.020 --> 00:38:54.070 five Mississippi, six Mississippi, seven Mississippi, eight Mississippi, 00:38:54.070 --> 00:38:57.100 nine Mississippi, 10 Mississippi, 11 Mississippi, 00:38:57.100 --> 00:38:59.990 all right, so 10 or 11 seconds. 00:38:59.990 --> 00:39:01.980 So which one is better? 00:39:01.980 --> 00:39:06.550 Let's go to the group here, which of these programs is the better one? 00:39:06.550 --> 00:39:10.780 How might you answer that question, based on demonstration alone? 00:39:10.780 --> 00:39:11.530 What do you think? 00:39:11.530 --> 00:39:13.738 AUDIENCE: I think Python's better for the programmer, 00:39:13.738 --> 00:39:17.847 more comfortable for the programmer, but C is better for the user. 00:39:17.847 --> 00:39:19.680 DAVID J. MALAN: OK, so Python, to summarize, 00:39:19.680 --> 00:39:23.460 is better for the programmer, because it was way faster to write, 00:39:23.460 --> 00:39:26.460 but C is maybe better for the computer, because it's much faster to run. 00:39:26.460 --> 00:39:28.127 I think that's a reasonable formulation. 00:39:28.127 --> 00:39:29.430 Other opinions? 00:39:29.430 --> 00:39:30.588 Yeah. 00:39:30.588 --> 00:39:32.880 AUDIENCE: I think it depends on the size of the project 00:39:32.880 --> 00:39:33.910 that you're dealing with. 
00:39:33.910 --> 00:39:36.285 So if it's going to be something that's relatively quick, 00:39:36.285 --> 00:39:38.710 I might not care that it takes 10 seconds to do it. 00:39:38.710 --> 00:39:40.910 And it could be way faster to do it with Python. 00:39:40.910 --> 00:39:44.070 Whereas with C, if I'm dealing with something like a massive data 00:39:44.070 --> 00:39:48.300 set or something huge, then that time is going to really build up on, 00:39:48.300 --> 00:39:52.740 it might be worth it to put in the upfront effort and just load it into C, 00:39:52.740 --> 00:39:56.260 so the process continually will run faster over a longer period of time. 00:39:56.260 --> 00:39:57.430 DAVID J. MALAN: Absolutely, a really good answer. 00:39:57.430 --> 00:40:00.300 And let me summarize, is it depends on the workload, if you will. 00:40:00.300 --> 00:40:04.050 If you have a very large data set, you might 00:40:04.050 --> 00:40:07.128 want to optimize your code to be as fast and performant as it can be, 00:40:07.128 --> 00:40:09.420 especially if you're running that code again and again. 00:40:09.420 --> 00:40:10.950 Maybe you're a company like Google. 00:40:10.950 --> 00:40:13.110 People are searching a huge database all the time. 00:40:13.110 --> 00:40:15.750 You really want to squeeze every bit of performance 00:40:15.750 --> 00:40:17.222 as you can out of the computer. 00:40:17.222 --> 00:40:19.680 You might want to have someone smart take a language like C 00:40:19.680 --> 00:40:21.450 and write it at a very low level. 00:40:21.450 --> 00:40:22.500 It's going to be painful. 00:40:22.500 --> 00:40:23.400 They're going to have bugs. 00:40:23.400 --> 00:40:26.150 They're going to have to deal with memory management and the like. 00:40:26.150 --> 00:40:29.490 But if and when it works correctly, it's going to be much faster, it would seem. 
00:40:29.490 --> 00:40:32.280 By contrast, if you have a data set that's big, 00:40:32.280 --> 00:40:35.820 and 140,000 words is not small, but you don't 00:40:35.820 --> 00:40:38.940 want to spend like 5 hours, 10 hours, a week of your time, 00:40:38.940 --> 00:40:41.063 building a spell checker or a dictionary, 00:40:41.063 --> 00:40:43.980 you can instead leverage a different language with different libraries 00:40:43.980 --> 00:40:48.690 and build on top of it, in order to prioritize the human time instead. 00:40:48.690 --> 00:40:50.841 Other thoughts? 00:40:50.841 --> 00:40:52.789 AUDIENCE: Would you, because with Python, 00:40:52.789 --> 00:40:56.928 doesn't it also like convert the words, or like 00:40:56.928 --> 00:40:58.539 convert the words, for a lesson? 00:40:58.539 --> 00:41:00.581 When we convert that into the same version again, 00:41:00.581 --> 00:41:04.148 do we just take that into view? 00:41:04.148 --> 00:41:06.940 DAVID J. MALAN: That's a perfect segue to exactly the next point we 00:41:06.940 --> 00:41:09.340 wanted to make, which was, is there something in between? 00:41:09.340 --> 00:41:10.360 And indeed there is. 00:41:10.360 --> 00:41:12.970 I'm oversimplifying what this language is actually doing. 00:41:12.970 --> 00:41:15.280 It's not as stark a difference as saying, like, hey, 00:41:15.280 --> 00:41:18.340 Python is four times slower than C. Like that's not the right takeaway. 00:41:18.340 --> 00:41:21.460 There are absolutely ways that engineers can optimize languages, 00:41:21.460 --> 00:41:23.230 as they have already done for Python. 00:41:23.230 --> 00:41:25.840 And in fact, I've configured my settings in such a way 00:41:25.840 --> 00:41:28.777 that I've kind of dramatized just how big the difference is. 00:41:28.777 --> 00:41:30.610 It is going to be slower, Python, typically, 00:41:30.610 --> 00:41:31.930 than the equivalent C program. 
00:41:31.930 --> 00:41:33.940 But it doesn't have to be as big of a gap 00:41:33.940 --> 00:41:37.720 as it is here, because, indeed, among the features you can turn on in Python 00:41:37.720 --> 00:41:40.120 is to save some intermediate results. 00:41:40.120 --> 00:41:43.360 Technically speaking, yes, Python is interpreting 00:41:43.360 --> 00:41:46.690 Dictionary.py and these other files, translating them 00:41:46.690 --> 00:41:48.203 from one language to another. 00:41:48.203 --> 00:41:51.370 But that doesn't mean it has to do that every darn time you run the program. 00:41:51.370 --> 00:41:57.020 As you propose, you can save, or cache, C-A-C-H-E, the results of that process. 00:41:57.020 --> 00:42:00.440 So that the second time and the third time are actually notably faster. 00:42:00.440 --> 00:42:03.430 And, in fact, Python itself, the interpreter, the most popular version 00:42:03.430 --> 00:42:05.980 thereof, itself is actually implemented in C. 00:42:05.980 --> 00:42:09.290 So you can make sure that your interpreter is as fast as possible. 00:42:09.290 --> 00:42:11.350 And what then is maybe the high level takeaway? 00:42:11.350 --> 00:42:14.320 Yes, if you are going to try to squeeze every bit of performance 00:42:14.320 --> 00:42:17.710 out of your code, and maybe code is constrained. 00:42:17.710 --> 00:42:19.150 Maybe you have very small devices. 00:42:19.150 --> 00:42:20.770 Maybe it's like a watch nowadays. 00:42:20.770 --> 00:42:26.320 Or maybe it's a sensor that's installed in some small format in an appliance, 00:42:26.320 --> 00:42:29.710 or in infrastructure, where you don't have much battery life 00:42:29.710 --> 00:42:31.630 and you don't have much size, you might want 00:42:31.630 --> 00:42:33.710 to minimize just how much work is being done. 00:42:33.710 --> 00:42:36.743 And so the faster the code runs, and the better it's going to be, 00:42:36.743 --> 00:42:38.410 if it's implemented something low level. 
00:42:38.410 --> 00:42:42.310 So C is still very commonly used for certain types of applications. 00:42:42.310 --> 00:42:45.580 But, again, if you just want to solve real world problems, 00:42:45.580 --> 00:42:49.840 and get real work done, and your time is just as, if not more, valuable 00:42:49.840 --> 00:42:52.000 than the device you're running it on, long term, 00:42:52.000 --> 00:42:55.358 you know what, Python is among the most popular languages as well. 00:42:55.358 --> 00:42:58.150 And frankly, if I were implementing a spell checker moving forward, 00:42:58.150 --> 00:42:59.710 I'm probably starting with Python. 00:42:59.710 --> 00:43:01.543 And I'm not going to waste time implementing 00:43:01.543 --> 00:43:04.930 all of that low-level stuff, because the whole point of using newer, 00:43:04.930 --> 00:43:09.460 modern languages is to use abstractions that other people have created for you. 00:43:09.460 --> 00:43:12.910 And by abstraction, I mean something like the dictionary function, 00:43:12.910 --> 00:43:15.370 that just gives you a dictionary, or hash table, 00:43:15.370 --> 00:43:19.225 or the equivalent version that I used, which in this case was a set. 00:43:19.225 --> 00:43:22.720 All right, any questions, then, on Python thus far? 00:43:25.730 --> 00:43:26.710 No, all right. 00:43:26.710 --> 00:43:27.710 Oh, yeah, in the middle. 00:43:27.710 --> 00:43:29.920 AUDIENCE: Could you compile the Python code, 00:43:29.920 --> 00:43:34.610 or is there some, I'd imagine that with the audience that can happen, 00:43:34.610 --> 00:43:38.180 but it feels like if you can just come up with a Python compiler, 00:43:38.180 --> 00:43:40.093 that would give you the best of both worlds. 00:43:40.093 --> 00:43:42.260 DAVID J. MALAN: Really good question or observation, 00:43:42.260 --> 00:43:43.718 could you just compile Python code? 
00:43:43.718 --> 00:43:47.180 Yes, absolutely, this idea of compiling code or interpreting code 00:43:47.180 --> 00:43:49.490 is not native to the language itself. 00:43:49.490 --> 00:43:52.410 It tends to be native to the conventions that we humans use. 00:43:52.410 --> 00:43:54.730 So you could actually write an interpreter for C 00:43:54.730 --> 00:43:57.980 that would read it top to bottom, left to right, converting it to, on the fly, 00:43:57.980 --> 00:44:01.640 something the computer understands, but historically that's not been the case. 00:44:01.640 --> 00:44:03.560 C is generally a compiled language. 00:44:03.560 --> 00:44:04.670 But it doesn't have to be. 00:44:04.670 --> 00:44:08.010 What Python nowadays is actually doing is what you described earlier. 00:44:08.010 --> 00:44:10.220 It technically is, sort of unbeknownst to us, 00:44:10.220 --> 00:44:13.970 compiling the code, technically not into 0's and 1's, technically 00:44:13.970 --> 00:44:17.510 into something called byte code, which is this intermediate step that 00:44:17.510 --> 00:44:21.510 just doesn't take as much time as it would to recompile the whole thing. 00:44:21.510 --> 00:44:24.377 And this is an area of research for computer scientists working 00:44:24.377 --> 00:44:26.960 in programming languages, to improve these kinds of paradigms. 00:44:26.960 --> 00:44:27.500 Why? 00:44:27.500 --> 00:44:30.740 Well, honestly, for you and I, the programmer, it's just much easier to, 00:44:30.740 --> 00:44:33.800 one, run the code and not worry about the stupid second step 00:44:33.800 --> 00:44:35.100 of compiling it all the time. 00:44:35.100 --> 00:44:35.600 Why? 00:44:35.600 --> 00:44:38.220 It's literally half as many steps for me, the human. 00:44:38.220 --> 00:44:40.500 And that's a nice thing to optimize for. 00:44:40.500 --> 00:44:44.330 And ultimately, too, you might want all of the fancy features that 00:44:44.330 --> 00:44:45.920 come with these other languages. 
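The byte code mentioned here can be inspected with the standard library's `dis` module. This is a small sketch that assumes the CPython interpreter; `greet` is a made-up function, and the exact instructions printed vary between Python versions.

```python
import dis


def greet(name):
    return "hello, " + name


# Every function carries a code object; co_code holds its raw bytecode.
bytecode = greet.__code__.co_code
print(type(bytecode))

# dis prints a human-readable listing of those instructions.
dis.dis(greet)
```

CPython also stores the compiled bytecode of imported modules in a `__pycache__` directory, which is the kind of caching that lets later runs skip the compilation step.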
00:44:45.920 --> 00:44:47.960 So you should really just be fine-tuning how 00:44:47.960 --> 00:44:51.800 you can enable these features, as opposed to shying away from them here. 00:44:51.800 --> 00:44:54.590 And, in fact, the only time I personally ever use C 00:44:54.590 --> 00:44:57.950 is from like September to October of every year, during CS50. 00:44:57.950 --> 00:45:00.350 Almost every other month do I reach for Python, 00:45:00.350 --> 00:45:03.690 or another language called JavaScript, to actually get real work done, 00:45:03.690 --> 00:45:07.640 which is not to impugn C. It's just that those other languages tend to be better 00:45:07.640 --> 00:45:11.030 fits for the amount of time I have to allocate, and the types of problems 00:45:11.030 --> 00:45:11.905 that I want to solve. 00:45:11.905 --> 00:45:14.405 All right, let's go ahead and take a five minute break here. 00:45:14.405 --> 00:45:17.390 And when we come back, we'll start writing some programs from scratch. 00:45:17.390 --> 00:45:18.300 All right. 00:45:18.300 --> 00:45:21.740 So let's go ahead and start writing some code from the beginning 00:45:21.740 --> 00:45:24.710 here, whereby we start small with some simple examples, 00:45:24.710 --> 00:45:28.042 and then we'll build our way up to more sophisticated examples in Python. 00:45:28.042 --> 00:45:29.750 But what we'll do along the way is first, 00:45:29.750 --> 00:45:31.865 look side by side at what the C code looked 00:45:31.865 --> 00:45:34.640 like way back in week 1 or 2 or 3 and so forth, 00:45:34.640 --> 00:45:36.890 and then write the corresponding Python code at right. 00:45:36.890 --> 00:45:39.530 And then we'll transition just to focusing on Python itself. 00:45:39.530 --> 00:45:42.322 What I've done in advance today is I've downloaded some of the code 00:45:42.322 --> 00:45:44.930 from the course's website, my source 6 directory, which 00:45:44.930 --> 00:45:47.825 contains all of the pre-written C code from weeks past.
00:45:47.825 --> 00:45:49.700 But it'll also have copies of the Python code 00:45:49.700 --> 00:45:51.660 we'll write here together and look at. 00:45:51.660 --> 00:45:55.445 So first, here is Hello.c back from week 0. 00:45:55.445 --> 00:45:57.323 And this was version 0 of it. 00:45:57.323 --> 00:45:58.740 I'm going to go ahead and do this. 00:45:58.740 --> 00:46:02.240 I'm going to go ahead and split my code window up here. 00:46:02.240 --> 00:46:05.042 I'm going to go ahead and create a new file called Hello.py. 00:46:05.042 --> 00:46:07.250 And this isn't something you'll typically have to do, 00:46:07.250 --> 00:46:08.810 laying your code out side by side. 00:46:08.810 --> 00:46:10.880 But I've just clicked the little icon in VS Code 00:46:10.880 --> 00:46:14.330 that looks like two columns, that splits my code editor into two places, 00:46:14.330 --> 00:46:17.330 so that we can, in fact, see things, for now, side by side, 00:46:17.330 --> 00:46:18.788 with my terminal window down below. 00:46:18.788 --> 00:46:21.747 All right, now I'm going to go ahead and write the corresponding Python 00:46:21.747 --> 00:46:24.560 program on the right, which, recall, was just print, quote 00:46:24.560 --> 00:46:27.170 unquote, "Hello, world," and that's it. 00:46:27.170 --> 00:46:29.420 Now down in my terminal window, I'm going 00:46:29.420 --> 00:46:33.080 to go ahead and run Python of Hello.py, Enter, and voila, 00:46:33.080 --> 00:46:34.450 we've got Hello.py working. 00:46:34.450 --> 00:46:36.950 So again, I'm not going to play any further with the C code. 00:46:36.950 --> 00:46:38.930 It's there just to jog your memory left and right. 00:46:38.930 --> 00:46:41.240 So let's now look at a second version of Hello, world 00:46:41.240 --> 00:46:44.452 from that first week, whereby if I go and get Hello1.c, 00:46:44.452 --> 00:46:46.160 I'm going to drag that over to the right. 00:46:46.160 --> 00:46:48.980 Whoops, I'm going to go ahead and drag that over to the left here. 
00:46:48.980 --> 00:46:51.950 And now, on the right, let's modify Hello.py 00:46:51.950 --> 00:46:55.700 to look a little more like this second version in C, all right? 00:46:55.700 --> 00:46:59.867 I want to get an answer from the user as a return value, 00:46:59.867 --> 00:47:01.700 but I also want to get some input from them. 00:47:01.700 --> 00:47:05.420 So from CS50, I'm going to import the function called getString for now. 00:47:05.420 --> 00:47:07.170 We're going to get rid of that eventually, 00:47:07.170 --> 00:47:08.962 but for now, it's a helpful training wheel. 00:47:08.962 --> 00:47:11.180 And then down here, I'm going to say, answer 00:47:11.180 --> 00:47:14.510 equals getString quote unquote, "What's your name"? 00:47:14.510 --> 00:47:15.980 Question mark, space. 00:47:15.980 --> 00:47:17.453 But no semicolon, no data type. 00:47:17.453 --> 00:47:19.370 And then I'm going to go ahead and print, just 00:47:19.370 --> 00:47:25.118 like the first example on the slide, Hello, comma space plus answer. 00:47:25.118 --> 00:47:26.660 And now let me go ahead and run this. 00:47:26.660 --> 00:47:29.660 Python, of Hello.py, all right, it's asking me what's my name. 00:47:29.660 --> 00:47:30.170 David. 00:47:30.170 --> 00:47:31.370 Hello comma David. 00:47:31.370 --> 00:47:36.507 But it's worth calling attention to the fact that I've also simplified further. 00:47:36.507 --> 00:47:38.840 It's not just that the individual functions are simpler. 00:47:38.840 --> 00:47:42.470 What is also now glaringly omitted from my Python code at right, 00:47:42.470 --> 00:47:44.657 both in this version, and the previous version. 00:47:44.657 --> 00:47:46.115 What did I not bother implementing? 00:47:46.115 --> 00:47:47.267 AUDIENCE: The main code. 00:47:47.267 --> 00:47:49.850 DAVID J. MALAN: Yeah, so I didn't even need to implement main. 
00:47:49.850 --> 00:47:53.210 We'll revisit the main function, because having a main function 00:47:53.210 --> 00:47:54.860 actually does solve problems sometimes. 00:47:54.860 --> 00:47:56.090 But it's no longer required. 00:47:56.090 --> 00:47:59.750 In C you have to have that to kick-start the entire process of actually running 00:47:59.750 --> 00:48:00.337 your code. 00:48:00.337 --> 00:48:03.170 And in fact, if you were missing main, as you might have experienced 00:48:03.170 --> 00:48:06.033 if you accidentally compiled Helpers.c instead of the file 00:48:06.033 --> 00:48:08.450 that contained main, you would have seen a compiler error. 00:48:08.450 --> 00:48:09.658 In Python it's not necessary. 00:48:09.658 --> 00:48:12.410 In Python you can just jump right in, start programming, and boom, 00:48:12.410 --> 00:48:13.350 you're good to go. 00:48:13.350 --> 00:48:15.225 Especially if it's a small program like this, 00:48:15.225 --> 00:48:18.210 you don't need the added overhead or complexity of a main function. 00:48:18.210 --> 00:48:19.860 So that's one other difference here. 00:48:19.860 --> 00:48:23.390 All right, there are a few other ways we could say Hello, world. 00:48:23.390 --> 00:48:26.160 Recall that I could use a format string. 00:48:26.160 --> 00:48:30.360 So I could put this whole thing in quotes, I could use this f prefix. 00:48:30.360 --> 00:48:33.250 And then let me go ahead and run Python of Hello.py again. 00:48:33.250 --> 00:48:35.250 You can perhaps see where we're going with this. 00:48:35.250 --> 00:48:37.170 Let me type my name, David, and here we go. 00:48:37.170 --> 00:48:39.570 OK, that's the mistake that someone identified earlier, 00:48:39.570 --> 00:48:41.040 you need the curly braces. 00:48:41.040 --> 00:48:44.940 Otherwise no variables are interpolated, that is substituted, 00:48:44.940 --> 00:48:46.390 with their actual values. 
00:48:46.390 --> 00:48:50.160 So if I go back in and add those curly braces to the F string, 00:48:50.160 --> 00:48:54.632 now let me run Python of Hello.py, type in my name, and there we go. 00:48:54.632 --> 00:48:55.590 We're back in business. 00:48:55.590 --> 00:48:56.388 Which one's better? 00:48:56.388 --> 00:48:57.180 I mean, it depends. 00:48:57.180 --> 00:49:00.540 But generally speaking, making shorter, more concise code 00:49:00.540 --> 00:49:01.870 tends to be a good thing. 00:49:01.870 --> 00:49:06.450 So stylistically, the F string is probably a reasonable instinct to have. 00:49:06.450 --> 00:49:09.280 All right, well, what more can we do besides this? 00:49:09.280 --> 00:49:12.180 Well, let me go ahead here and let's get rid of the training wheel 00:49:12.180 --> 00:49:13.230 altogether, actually. 00:49:13.230 --> 00:49:15.180 So same C code at left. 00:49:15.180 --> 00:49:18.150 Let me get rid of the CS50 library, which we will ultimately, 00:49:18.150 --> 00:49:19.620 in a couple of weeks, anyway. 00:49:19.620 --> 00:49:22.560 I can't use getString, but I can use a function 00:49:22.560 --> 00:49:24.730 that comes with Python called input. 00:49:24.730 --> 00:49:28.050 And, in fact, this is actually a one-for-one substitution, pretty much. 00:49:28.050 --> 00:49:31.380 There's really no downside to using input instead of getString. 00:49:31.380 --> 00:49:33.420 We implement getString just for consistency 00:49:33.420 --> 00:49:37.800 with what you saw in C. Python of Hello.py, what's your name, David. 00:49:37.800 --> 00:49:39.310 Still actually works the same. 00:49:39.310 --> 00:49:41.227 So gone are the CS50 specific training wheels. 00:49:41.227 --> 00:49:43.227 But we're going to bring them back shortly, just 00:49:43.227 --> 00:49:45.240 to deal with integers or floats or other values, 00:49:45.240 --> 00:49:47.490 too, because it's going to make our lives a little simpler, 00:49:47.490 --> 00:49:48.510 with error checking. 
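To make the curly-brace point concrete, here is a short sketch comparing the variants. The hard-coded `answer` is a stand-in for whatever `input` would return; it is an assumption made so the example runs without a prompt.

```python
answer = "David"  # stand-in for what the user would type at the prompt

without_braces = f"hello, answer"   # no braces: the text is left exactly as written
with_braces = f"hello, {answer}"    # braces: the variable's value is substituted
concatenated = "hello, " + answer   # the earlier approach, joining with +

print(without_braces)
print(with_braces)
print(concatenated)
```

The f-string with braces and the concatenation produce the same text; the f-string simply keeps the whole message inside one pair of quotes.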
00:49:48.510 --> 00:49:52.350 All right, any questions, before we now pivot to revisiting other examples 00:49:52.350 --> 00:49:56.280 from week 1, but now in Python? 00:49:56.280 --> 00:49:58.110 All right, let me go ahead and open up now. 00:49:58.110 --> 00:50:03.240 Let's say Calculator0.c, which was one of the first examples we did involving 00:50:03.240 --> 00:50:06.870 math and operators like that, as well as functions like getInt, 00:50:06.870 --> 00:50:11.820 let me go ahead and create a new file now called Calculator.py, 00:50:11.820 --> 00:50:15.360 at right, so that I have my C code at left still, 00:50:15.360 --> 00:50:16.950 and my Python code at right. 00:50:16.950 --> 00:50:20.610 All right, let me go dive into a translation of this code into Python. 00:50:20.610 --> 00:50:23.100 I am going to use getInt from the CS50 library. 00:50:23.100 --> 00:50:24.960 So let me import that. 00:50:24.960 --> 00:50:27.340 I'm going to go ahead now and get an Int from the user. 00:50:27.340 --> 00:50:31.000 So x equals getInt, and I'll ask them for an x value, 00:50:31.000 --> 00:50:32.430 just like we did weeks ago. 00:50:32.430 --> 00:50:37.800 No need to specify a semicolon, though, or an Int for the x. 00:50:37.800 --> 00:50:38.940 It will just figure it out. 00:50:38.940 --> 00:50:42.090 Y is going to get another Int via y colon, 00:50:42.090 --> 00:50:46.830 and then down here, I'm going to go ahead and say print of x plus y. 00:50:46.830 --> 00:50:48.720 So this is already a bit new. 00:50:48.720 --> 00:50:53.400 Recall, the C version required that I use this format string, as well 00:50:53.400 --> 00:50:54.428 as printf itself. 00:50:54.428 --> 00:50:56.220 Python is just a little more user-friendly. 00:50:56.220 --> 00:50:59.670 If all you want to do is print out a value, like x plus y, just print it. 00:50:59.670 --> 00:51:02.610 Don't futz with any percent signs or format codes. 00:51:02.610 --> 00:51:05.160 It's not printf, it's indeed just print now. 
00:51:05.160 --> 00:51:08.610 All right, let me go ahead and run Python of Calculator.py, 00:51:08.610 --> 00:51:13.620 Enter, just do a quick sample, 1 plus 2 indeed equals 3. 00:51:13.620 --> 00:51:16.410 As an aside, suppose I had taken a different approach 00:51:16.410 --> 00:51:19.508 to importing the whole CS50 library, functionally, it's the same. 00:51:19.508 --> 00:51:21.550 You're not going to notice any performance impact here. 00:51:21.550 --> 00:51:22.690 It's a small library. 00:51:22.690 --> 00:51:25.680 But notice what does not work now, whereas it did work 00:51:25.680 --> 00:51:31.110 in C. Python of Calculator.py, Enter, we see our first traceback deliberately 00:51:31.110 --> 00:51:31.690 here. 00:51:31.690 --> 00:51:33.570 So a traceback is just a term of art that 00:51:33.570 --> 00:51:37.210 says, here is a trace back through all of the functions 00:51:37.210 --> 00:51:38.250 that just got executed. 00:51:38.250 --> 00:51:40.170 In the world of C, you might call this a stack 00:51:40.170 --> 00:51:42.937 trace, stack being the operative word. 00:51:42.937 --> 00:51:45.270 Recall that when we talked about the stack and the heap, 00:51:45.270 --> 00:51:48.077 the stack, like a stack of trays, was all of the functions that 00:51:48.077 --> 00:51:49.660 might get called, one after the other. 00:51:49.660 --> 00:51:54.330 We had main, we had swap, then swap went away, and then main finished, recall. 00:51:54.330 --> 00:51:58.020 So here's a trace back of all of the functions or code that got executed. 00:51:58.020 --> 00:52:00.880 There's not really any functions other than my file itself. 00:52:00.880 --> 00:52:02.350 Otherwise there'd be more detail. 00:52:02.350 --> 00:52:05.580 But even though it's a little cryptic, we can perhaps infer from the output 00:52:05.580 --> 00:52:09.960 here, name error, so something related to the name of something, name, getInt 00:52:09.960 --> 00:52:10.950 is not defined.
00:52:10.950 --> 00:52:14.190 And this of course, happens on line 3 over there. 00:52:14.190 --> 00:52:15.520 All right, so why is that? 00:52:15.520 --> 00:52:19.170 Well, Python essentially allows us to namespace 00:52:19.170 --> 00:52:21.750 our functions that come from libraries. 00:52:21.750 --> 00:52:25.290 There was a problem in C. If you were using the CS50 library, 00:52:25.290 --> 00:52:27.180 and thus had access to getInt, getString, 00:52:27.180 --> 00:52:29.850 and so forth, you could not use another library 00:52:29.850 --> 00:52:31.590 that had the same function names. 00:52:31.590 --> 00:52:33.510 They would collide, and the compiler would not 00:52:33.510 --> 00:52:36.030 know how to link them together correctly. 00:52:36.030 --> 00:52:41.520 In Python, and other languages like JavaScript, and in Java, 00:52:41.520 --> 00:52:45.270 you have support for effectively what would be called namespaces. 00:52:45.270 --> 00:52:50.370 You can isolate variables and function names to their own namespace, 00:52:50.370 --> 00:52:52.590 like their own container in memory. 00:52:52.590 --> 00:52:55.560 And what this means is, if you import all of CS50, 00:52:55.560 --> 00:52:59.730 you have to say that the getInt you want is inside the CS50 library. 00:52:59.730 --> 00:53:03.180 So just like with the image blurring, and the image edges 00:53:03.180 --> 00:53:08.430 before, where I had to specify image dot and image filter dot, similarly here, 00:53:08.430 --> 00:53:11.970 am I specifying with a dot operator, albeit a little differently, that I 00:53:11.970 --> 00:53:14.410 want CS50.getInt in both places. 00:53:14.410 --> 00:53:18.120 And now if I rerun Python of Calculator.py, 1 and 2, 00:53:18.120 --> 00:53:19.860 now we're back in business. 00:53:19.860 --> 00:53:20.790 Which one is better? 00:53:20.790 --> 00:53:24.790 Generally speaking, it depends on just how many functions 00:53:24.790 --> 00:53:26.040 you're using from the library. 
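The same namespacing behavior can be seen with any module. This sketch uses the standard library's `math` module in place of the CS50 library, purely as an assumption so that it runs without anything extra installed.

```python
import math             # whole module: its names stay inside the math namespace
from math import floor  # single name: floor is usable without a prefix

print(math.sqrt(16))    # must be qualified with the module name
print(floor(2.7))       # imported directly, so no prefix is needed

try:
    sqrt(16)            # never imported by itself, so the bare name is unknown
except NameError as error:
    message = str(error)
    print(message)
```

The bare call fails with the same kind of NameError seen in the traceback, because only `math.sqrt`, not `sqrt`, exists in this file's namespace.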
00:53:26.040 --> 00:53:29.040 If you're using a whole bunch of functions, just import the whole thing. 00:53:29.040 --> 00:53:33.333 If you're only using maybe one or two, import them line by line. 00:53:33.333 --> 00:53:35.750 All right, so let's go ahead and make a little tweak here. 00:53:35.750 --> 00:53:38.917 Let's get rid of this library and take this training wheel off, 00:53:38.917 --> 00:53:41.750 too, as quickly as we introduced it, though for the problems set six 00:53:41.750 --> 00:53:44.310 you'll be able to use all of these same functions. 00:53:44.310 --> 00:53:48.110 Suppose I get rid of this, and I just use the input function, 00:53:48.110 --> 00:53:51.710 just like I did by replacing getString earlier. 00:53:51.710 --> 00:53:54.710 Let me go ahead now and run this version of the code. 00:53:54.710 --> 00:54:00.964 Python of Calculator.py, OK, how about 1 plus 2 equals 3. 00:54:00.964 --> 00:54:02.660 Huh. 00:54:02.660 --> 00:54:05.330 All right, obviously wrong, incorrect. 00:54:05.330 --> 00:54:09.890 Can anyone explain what just happened, based on instincts? 00:54:09.890 --> 00:54:10.890 What just happened here. 00:54:10.890 --> 00:54:11.390 Yeah. 00:54:11.390 --> 00:54:12.620 AUDIENCE: You want an answer? 00:54:12.620 --> 00:54:13.745 DAVID J. MALAN: Sure, yeah. 00:54:13.745 --> 00:54:17.930 AUDIENCE: Say you have a number of strings that don't have Ints, 00:54:17.930 --> 00:54:21.320 so you would part with them and say, printing one, two, better. 00:54:21.320 --> 00:54:24.650 DAVID J. MALAN: Exactly, Python is interpreting, or treating, 00:54:24.650 --> 00:54:26.810 both x and y as strings, which is actually 00:54:26.810 --> 00:54:29.120 what the input function returns by default. 00:54:29.120 --> 00:54:32.150 And so plus is now being interpreted as concatenation, as we defined it 00:54:32.150 --> 00:54:32.660 earlier. 
00:54:32.660 --> 00:54:35.780 So x plus y isn't x plus y mathematically, 00:54:35.780 --> 00:54:38.480 but in terms of string joining, just like in Scratch. 00:54:38.480 --> 00:54:41.690 So that's why we're getting 12, or really one two, 00:54:41.690 --> 00:54:43.040 which isn't itself a number. 00:54:43.040 --> 00:54:44.180 It, too, is another string. 00:54:44.180 --> 00:54:45.950 So we somehow need to convert things. 00:54:45.950 --> 00:54:49.040 And we didn't have this ability quite as easily in C. 00:54:49.040 --> 00:54:52.670 We did have like the atoi function, ASCII to integer, 00:54:52.670 --> 00:54:54.270 which did allow you to do this. 00:54:54.270 --> 00:54:59.390 The analog in Python is actually just to do a cast, a typecast, using Int. 00:54:59.390 --> 00:55:02.750 So just like in C, you can use the keyword Int, 00:55:02.750 --> 00:55:04.500 but you use it a little differently. 00:55:04.500 --> 00:55:09.300 Notice that I'm not doing parenthesis Int close parenthesis before the value. 00:55:09.300 --> 00:55:11.010 I'm using Int as a function. 00:55:11.010 --> 00:55:13.430 So indeed, in Python, Int is a function. 00:55:13.430 --> 00:55:16.610 Float is a function, that you can pass values into, 00:55:16.610 --> 00:55:18.270 to do this kind of conversion. 00:55:18.270 --> 00:55:22.010 So now, if I run Python of Calculator.py, 1 and 2, 00:55:22.010 --> 00:55:25.430 now we're back in business, and getting the answer of 3. 00:55:25.430 --> 00:55:27.240 But there's kind of a catch here. 00:55:27.240 --> 00:55:28.430 There's always going to be a trade-off. 00:55:28.430 --> 00:55:30.560 Like that sounds amazing that it just works in this way. 00:55:30.560 --> 00:55:32.450 We can throw away the CS50 library already. 00:55:32.450 --> 00:55:37.130 But what if the user accidentally types, or maliciously types in, 00:55:37.130 --> 00:55:39.035 like a cat, instead of a number. 00:55:39.035 --> 00:55:40.910 Damn, well, there's one of these tracebacks.
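A brief sketch of the difference between joining strings and adding numbers. The values of `x` and `y` are hard-coded stand-ins for what `input` would return, an assumption made so the example runs without prompting.

```python
x = "1"  # input() always returns a string, even when the user types digits
y = "2"

joined = x + y           # string concatenation
total = int(x) + int(y)  # convert each string first, then add numerically

print(joined)
print(total)
print(float("1.5") + 1)  # float() converts the same way, for decimal values
```

Passing text that does not look like a number, such as `int("cat")`, raises a ValueError instead of returning a value.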
00:55:40.910 --> 00:55:42.780 Like, now my program has crashed. 00:55:42.780 --> 00:55:45.342 This is similar in spirit to the kinds of segfaults 00:55:45.342 --> 00:55:46.550 that you might have had in C. 00:55:46.550 --> 00:55:47.840 But they're not segfaults per se. 00:55:47.840 --> 00:55:49.507 It doesn't necessarily relate to memory. 00:55:49.507 --> 00:55:55.290 This time it relates to actual runtime values, not being as expected. 00:55:55.290 --> 00:55:58.250 So this time it's not a name error, it's a value error, 00:55:58.250 --> 00:56:02.580 invalid literal for Int with base 10 quote unquote "cat." 00:56:02.580 --> 00:56:06.800 So, again, it's written for sort of a programmer, more than sort 00:56:06.800 --> 00:56:09.650 of a typical person, because it's pretty arcane, the language here. 00:56:09.650 --> 00:56:10.900 But let's try to interpret it. 00:56:10.900 --> 00:56:14.862 Invalid literal, a literal is just something someone typed for Int, which 00:56:14.862 --> 00:56:16.320 is the function name, with base 10. 00:56:16.320 --> 00:56:18.170 It's just defaulting to decimal numbers. 00:56:18.170 --> 00:56:20.415 Cat is apparently not a decimal number. 00:56:20.415 --> 00:56:23.040 It doesn't look like it, therefore it can't be treated like it. 00:56:23.040 --> 00:56:24.930 Therefore, there's a value error. 00:56:24.930 --> 00:56:26.750 So what can we do? 00:56:26.750 --> 00:56:30.200 Unfortunately, you would have to somehow catch this error. 00:56:30.200 --> 00:56:32.450 And the only way to do that in Python really 00:56:32.450 --> 00:56:34.970 is by way of another feature that C did not have, 00:56:34.970 --> 00:56:37.400 namely, what are called exceptions. 00:56:37.400 --> 00:56:42.080 An exception is exactly what just happened, name error, value error. 00:56:42.080 --> 00:56:45.590 They are things that can go wrong when your Python code is running, 00:56:45.590 --> 00:56:50.670 that aren't necessarily going to be detected until you run your code. 
00:56:50.670 --> 00:56:56.240 So in Python, and in JavaScript, and in Java, and other more modern languages, 00:56:56.240 --> 00:56:59.240 there's this ability to actually try to do something, 00:56:59.240 --> 00:57:01.015 except if something goes wrong. 00:57:01.015 --> 00:57:03.140 And in fact, I'm going to introduce a bit of syntax 00:57:03.140 --> 00:57:05.557 here, even though we won't have to use this much just yet. 00:57:05.557 --> 00:57:09.980 Instead of just blindly converting x to an Int, let me go ahead 00:57:09.980 --> 00:57:11.970 and try to do that. 00:57:11.970 --> 00:57:15.380 And if there's an exception, go ahead and say something 00:57:15.380 --> 00:57:22.280 like print, that is not an Int. 00:57:22.280 --> 00:57:25.538 And then I'm going to do something like exit, right there. 00:57:25.538 --> 00:57:27.080 And let me go ahead and do this here. 00:57:27.080 --> 00:57:31.370 Let me try to get y, except if there's an exception. 00:57:31.370 --> 00:57:35.997 Then let me go ahead and say, again, that is not an Int exclamation point. 00:57:35.997 --> 00:57:38.330 And then I'm going to exit from there, too. Otherwise I'll 00:57:38.330 --> 00:57:39.860 go ahead and print x plus y. 00:57:39.860 --> 00:57:46.460 If I run Python of Calculator.py now, whoops, oh, 00:57:46.460 --> 00:57:48.680 forgot my close quote, sorry. 00:57:48.680 --> 00:57:54.560 All right, so close quote, Python of Calculator.py, 1 and 2 still work. 00:57:54.560 --> 00:57:57.800 But if I try to type in something wrong like cat, now 00:57:57.800 --> 00:57:59.310 it actually detects the error. 00:57:59.310 --> 00:58:01.850 So what is the CS50 library in Python doing? 00:58:01.850 --> 00:58:05.600 It's actually doing that try and except for you, because suffice it to say, 00:58:05.600 --> 00:58:08.540 otherwise your programs for something simple, like a calculator, 00:58:08.540 --> 00:58:09.900 start to get longer and longer.
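The try/except pattern just walked through might look roughly like the following. This is a sketch, not the lecture's exact file: `parse_int` is a hypothetical helper, it names the ValueError seen in the traceback rather than catching every exception, and it returns None instead of exiting so the snippet can be run in isolation.

```python
def parse_int(text):
    """Return text converted to an int, or None if it is not a valid integer."""
    try:
        # int() raises ValueError when the text does not look like an integer.
        return int(text)
    except ValueError:
        print("That is not an int!")
        return None


print(parse_int("1"))    # 1
print(parse_int("cat"))  # prints the message, then None
```

Wrapping the conversion in a function like this is one way to keep a short program from filling up with repeated try/except blocks.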
00:58:09.900 --> 00:58:13.160 So we factored that kind of logic out to the CS50 getInt 00:58:13.160 --> 00:58:14.690 function and get float function. 00:58:14.690 --> 00:58:18.783 But underneath the hood, they're essentially doing this, try except, 00:58:18.783 --> 00:58:20.450 but they're being a little more precise. 00:58:20.450 --> 00:58:24.450 They're detecting a specific error, and they are doing it in a loop, 00:58:24.450 --> 00:58:27.050 so that these functions will get executed again and again. 00:58:27.050 --> 00:58:30.710 In fact, the best way to do this is to say except if there's a value error, 00:58:30.710 --> 00:58:34.078 then print that error message out to the user. 00:58:34.078 --> 00:58:36.870 And again, let's not get too into the weeds here with this feature. 00:58:36.870 --> 00:58:38.760 We've already put into the CS50 library. 00:58:38.760 --> 00:58:41.060 But that's why, for instance, we bootstrap things, 00:58:41.060 --> 00:58:44.420 by just using these functions out of the box. 00:58:44.420 --> 00:58:47.610 All right, let's do something more with our calculator here. 00:58:47.610 --> 00:58:49.010 How about this. 00:58:49.010 --> 00:58:51.890 In the world of C, we had another version 00:58:51.890 --> 00:58:56.990 of this code, which actually did some division by way of-- 00:58:56.990 --> 00:59:01.680 which actually did division of numbers, not just the addition herein. 00:59:01.680 --> 00:59:05.990 So let me go ahead and close the C version, and let's focus only on Python 00:59:05.990 --> 00:59:07.942 now, doing some of these same lines of codes. 00:59:07.942 --> 00:59:09.650 But I'm going to go ahead and just assume 00:59:09.650 --> 00:59:12.140 that the user is going to cooperate and use proper input. 00:59:12.140 --> 00:59:16.310 So from CS50, import getInt, that will deal with any errors for me. 00:59:16.310 --> 00:59:23.640 X gets getInt, ask the user for an Int x, y equals getInt, 00:59:23.640 --> 00:59:25.170 ask the user for an Int y. 
00:59:25.170 --> 00:59:27.010 And then, let's go ahead and do this. 00:59:27.010 --> 00:59:31.110 Let's declare a variable called z, set it equal to x divided by y. 00:59:31.110 --> 00:59:32.850 Then let's go ahead and print z. 00:59:32.850 --> 00:59:37.240 Still no need for a format string, I can just print out the variable's value. 00:59:37.240 --> 00:59:39.240 Let me go ahead and run Python of Calculator.py. 00:59:39.240 --> 00:59:43.650 Let me do 1, 10, and I get 0.1. 00:59:43.650 --> 00:59:49.260 What did I get in C, though, if you think back? 00:59:49.260 --> 00:59:52.076 What would have happened in C? 00:59:52.076 --> 00:59:53.420 AUDIENCE: Zero? 00:59:53.420 --> 00:59:55.640 DAVID J. MALAN: Yeah, we would have gotten zero in C. 00:59:55.640 --> 00:59:57.998 But why, in C, when you divide one Int by another, 00:59:57.998 --> 00:59:59.915 and those Ints are like 1 and 10 respectively? 00:59:59.915 --> 01:00:01.677 AUDIENCE: It'll give you an integer back. 01:00:01.677 --> 01:00:03.260 DAVID J. MALAN: It will give you what? 01:00:03.260 --> 01:00:04.343 AUDIENCE: An integer back. 01:00:04.343 --> 01:00:07.910 DAVID J. MALAN: It will give you an integer back, and, unfortunately, 0.1, 01:00:07.910 --> 01:00:09.860 the integer part of it is indeed zero. 01:00:09.860 --> 01:00:11.970 So this was an example of truncation. 01:00:11.970 --> 01:00:14.540 So truncation was an issue in C. But it would 01:00:14.540 --> 01:00:17.450 seem as though this is no longer a problem in Python, 01:00:17.450 --> 01:00:21.290 insofar as the division operator actually handles that for us. 01:00:21.290 --> 01:00:24.230 As an aside, if you want the old behavior, because it actually 01:00:24.230 --> 01:00:27.020 is sometimes useful for rounding or flooring values, 01:00:27.020 --> 01:00:29.570 you can actually use two slashes. 01:00:29.570 --> 01:00:31.620 And now you get the C behavior. 01:00:31.620 --> 01:00:33.710 So that now 1 divided by 10 is zero.
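The two division operators described above can be compared side by side. This is a small sketch with hard-coded values in place of user input; the variable names are illustrative only.

```python
x = 1
y = 10

# A single slash always produces a float, even when both operands are ints.
true_quotient = x / y

# Two slashes perform floor division, which yields an int for int operands.
floor_quotient = x // y

print(true_quotient)   # 0.1
print(floor_quotient)  # 0

# One caveat: // rounds toward negative infinity, whereas C's integer
# division truncates toward zero, so results differ for negative operands.
negative_floor = -1 // 10
print(negative_floor)  # -1 (C would give 0)
```

For non-negative operands, as in the lecture's 1 and 10, floor division and C's truncation agree.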
01:00:33.710 --> 01:00:36.230 So you don't give up that capability, but at least it 01:00:36.230 --> 01:00:37.610 does a more sensible default. 01:00:37.610 --> 01:00:41.030 Most people, especially new programmers, when dividing one value by another, 01:00:41.030 --> 01:00:44.000 would want to get 0.1, not 0, for reasons 01:00:44.000 --> 01:00:46.100 that indeed we had to explain weeks ago. 01:00:46.100 --> 01:00:49.940 But what about another problem we had with the world of floats before, 01:00:49.940 --> 01:00:52.040 whereby there is imprecision? 01:00:52.040 --> 01:00:54.980 Let me go ahead and, somewhat cryptically, print out the value of z 01:00:54.980 --> 01:00:55.860 as follows. 01:00:55.860 --> 01:00:58.340 I'm going to format it using an f-string. 01:00:58.340 --> 01:01:02.720 And I'm going to go ahead and format, not just z, because this is essentially 01:01:02.720 --> 01:01:03.450 the same thing. 01:01:03.450 --> 01:01:06.620 Notice this, if I do Python of Calculator.py, 1 and 10, 01:01:06.620 --> 01:01:09.770 I get, by default, just one significant digit. 01:01:09.770 --> 01:01:13.920 But if I use this syntax in Python, which we won't have to use often, 01:01:13.920 --> 01:01:16.550 I can actually do in C like I did before, 01:01:16.550 --> 01:01:19.650 50 significant digits after the decimal point. 01:01:19.650 --> 01:01:24.020 So now let me rerun Python of Calculator.py 1 and 10, 01:01:24.020 --> 01:01:26.990 and let's see if floating point imprecision is still with us. 01:01:26.990 --> 01:01:28.280 Unfortunately, it is. 01:01:28.280 --> 01:01:30.950 And you can see as much here, the f-string, the format string, 01:01:30.950 --> 01:01:33.990 is just showing us now 50 digits instead of the default one. 01:01:33.990 --> 01:01:36.110 So we've not solved all problems. 01:01:36.110 --> 01:01:38.845 But we have solved at least some. 
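A short sketch of the f-string formatting just described, assuming the same division of 1 by 10. The format specifier `.50f` asks for 50 digits after the decimal point; the name `formatted` is only illustrative.

```python
z = 1 / 10

# Default printing shows the shortest representation that round-trips.
print(z)  # 0.1

# Asking for 50 digits after the decimal point exposes the stored value,
# which is only the closest binary approximation of one tenth.
formatted = f"{z:.50f}"
print(formatted)
```

The digits after the run of zeros are not noise added by the formatting; they are the actual value the float holds.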
01:01:38.845 --> 01:01:41.720 All right, before we pivot away from a mere calculator, any questions 01:01:41.720 --> 01:01:45.350 now on syntax or concepts or the like? 01:01:45.350 --> 01:01:46.070 Yeah. 01:01:46.070 --> 01:01:49.320 AUDIENCE: Do you think the double slash you get 01:01:49.320 --> 01:01:51.937 has merit, how do you comment on that? 01:01:51.937 --> 01:01:53.270 DAVID J. MALAN: How do you what? 01:01:53.270 --> 01:01:54.228 Oh, how do you comment. 01:01:54.228 --> 01:01:57.410 Really good question, if you're using double slash for division 01:01:57.410 --> 01:01:59.870 with flooring or truncation, like I described, 01:01:59.870 --> 01:02:01.850 how do you do a comment in Python. 01:02:01.850 --> 01:02:03.380 This is a comment. 01:02:03.380 --> 01:02:05.930 And the convention is actually to use a complete sentence, 01:02:05.930 --> 01:02:07.473 like with a capital T here. 01:02:07.473 --> 01:02:09.890 You don't need a period unless there's multiple sentences. 01:02:09.890 --> 01:02:12.840 And technically, it should be above the line of code by convention. 01:02:12.840 --> 01:02:15.120 So you would use a hash symbol instead. 01:02:15.120 --> 01:02:16.080 Good question. 01:02:16.080 --> 01:02:17.420 I haven't seen those yet. 01:02:17.420 --> 01:02:20.750 All right, let's go ahead and make something else here, how about. 01:02:20.750 --> 01:02:23.430 Let me go ahead and open up, for instance, 01:02:23.430 --> 01:02:29.090 an example called Points1.c, which we saw a few weeks back. 01:02:29.090 --> 01:02:33.530 And let me go ahead on the other side and create a file called Points.py. 01:02:33.530 --> 01:02:36.890 This was a program, recall, that asked the user how many points they 01:02:36.890 --> 01:02:39.388 lost on the first assignment. 
01:02:39.388 --> 01:02:41.180 And then it went ahead and just printed out 01:02:41.180 --> 01:02:43.790 whether they lost fewer points than me, because I lost two, 01:02:43.790 --> 01:02:47.117 if you recall the photo, more points than me, or the same points as me. 01:02:47.117 --> 01:02:49.700 Let me go ahead and zoom out so we can see a bit more of this. 01:02:49.700 --> 01:02:54.208 And let me now, on the top right here, go about implementing this in Python. 01:02:54.208 --> 01:02:56.750 So I want to first prompt the user for some number of points. 01:02:56.750 --> 01:03:00.540 So from CS50 let's import getInt, so it handles the error-checking. 01:03:00.540 --> 01:03:03.410 Let's then do points equals getInt, and ask 01:03:03.410 --> 01:03:07.430 the user, how many points did you lose, question mark. 01:03:07.430 --> 01:03:11.990 Then let's go ahead and say, if points less than two, which was my value, 01:03:11.990 --> 01:03:15.800 print, you lost fewer points than me. 01:03:15.800 --> 01:03:23.270 Otherwise, if it's else if points greater than 2, go ahead and print, 01:03:23.270 --> 01:03:27.070 you lost more points than me. 01:03:27.070 --> 01:03:30.800 Else let's go ahead and handle the final scenario, which is you 01:03:30.800 --> 01:03:34.600 lost the same number of points as me. 01:03:34.600 --> 01:03:39.230 Before I run this, does anyone want to point out a mistake I've already made? 01:03:39.230 --> 01:03:39.730 Yeah. 01:03:39.730 --> 01:03:41.390 AUDIENCE: Else if has to be elif. 01:03:41.390 --> 01:03:44.690 DAVID J. MALAN: Yeah, so else if in C is actually now elif in Python. 01:03:44.690 --> 01:03:45.780 It's a single word. 01:03:45.780 --> 01:03:49.790 So let me change this to elif, and now cross my fingers, Python of Points.py, 01:03:49.790 --> 01:03:53.330 suppose you lost three points on some assignment. 01:03:53.330 --> 01:03:55.190 You lost more points than my two. 
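The conditional described above can be sketched as follows. To keep the snippet runnable without a prompt, the comparison is wrapped in a hypothetical function, `compare_points`, that returns the message instead of printing it; the lecture's version reads the value with the CS50 library and prints directly.

```python
def compare_points(points, mine=2):
    """Return a message comparing points lost against the lecturer's two."""
    if points < mine:
        return "You lost fewer points than me."
    elif points > mine:  # elif, a single word, replaces C's "else if"
        return "You lost more points than me."
    else:
        return "You lost the same number of points as me."


print(compare_points(3))  # You lost more points than me.
print(compare_points(1))  # You lost fewer points than me.
print(compare_points(2))  # You lost the same number of points as me.
```

Note the colons and indentation doing the work that parentheses and curly braces did in C.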
01:03:55.190 --> 01:03:57.808 If you only lost one point, you lost fewer points than me. 01:03:57.808 --> 01:03:58.850 So the logic is the same. 01:03:58.850 --> 01:04:01.040 But notice the code is much tighter. 01:04:01.040 --> 01:04:04.700 In 10 total lines, we did what took 24 lines, because we've 01:04:04.700 --> 01:04:06.350 thrown away a lot of the syntax. 01:04:06.350 --> 01:04:08.370 The curly braces are no longer necessary. 01:04:08.370 --> 01:04:10.230 The parentheses are gone, the semicolons. 01:04:10.230 --> 01:04:13.670 So this is why it just tends to be more pleasant pretty quickly, 01:04:13.670 --> 01:04:16.310 using a language like this. 01:04:16.310 --> 01:04:18.770 All right, let's do one other example here. 01:04:18.770 --> 01:04:23.000 In C, recall that we were able to determine the parity of some number, 01:04:23.000 --> 01:04:24.590 if something is even or odd. 01:04:24.590 --> 01:04:29.000 Well, in Python, let me go ahead and create a file called Parity.py, 01:04:29.000 --> 01:04:32.810 and let's look for a moment at the C version at left. 01:04:32.810 --> 01:04:36.680 Here was the code in C that we used to determine the parity of a number. 01:04:36.680 --> 01:04:39.800 And, really, the key takeaway from all these lines 01:04:39.800 --> 01:04:41.290 was just the remainder operator. 01:04:41.290 --> 01:04:42.540 And that one is still with us. 01:04:42.540 --> 01:04:44.998 So this is a simple demonstration, just to make that point, 01:04:44.998 --> 01:04:48.770 if in Python, I want to determine whether a number is even or odd. 01:04:48.770 --> 01:04:53.150 Well, let's go ahead and from CS50, import getInt, then let's go ahead 01:04:53.150 --> 01:04:58.610 and get a number like n from the user, using getInt, and ask them for n. 01:04:58.610 --> 01:05:04.220 And then let's go ahead and say, if n percent sign 2 equals 0, 01:05:04.220 --> 01:05:08.270 then let's go ahead and print quote unquote "Even."
01:05:08.270 --> 01:05:13.753 Else let's go ahead and print out Odd, but before I run this, 01:05:13.753 --> 01:05:16.670 anyone want to instinctively, even though we've not talked about this, 01:05:16.670 --> 01:05:19.010 point out a mistake here? 01:05:19.010 --> 01:05:19.810 What I did wrong? 01:05:19.810 --> 01:05:20.810 AUDIENCE: Double equals. 01:05:20.810 --> 01:05:22.435 DAVID J. MALAN: Yeah, so double equals. 01:05:22.435 --> 01:05:25.850 Again, so even though some of the stuff is changing, some of the same ideas 01:05:25.850 --> 01:05:26.430 are the same. 01:05:26.430 --> 01:05:28.520 So this, too, should be a double equal sign, 01:05:28.520 --> 01:05:30.620 because I'm comparing for equality here. 01:05:30.620 --> 01:05:32.153 And why is this the right math? 01:05:32.153 --> 01:05:34.070 Well, if you divide a number by 2, it's either 01:05:34.070 --> 01:05:36.290 going to have 0 or 1 as a remainder. 01:05:36.290 --> 01:05:39.030 And that's going to determine if it's even or odd for us. 01:05:39.030 --> 01:05:42.200 So let's run Python of Parity.py, type in a number like 50, 01:05:42.200 --> 01:05:44.660 and hopefully we get, indeed, even. 01:05:44.660 --> 01:05:46.910 So again, same idea, but now we're down to eight lines 01:05:46.910 --> 01:05:48.560 of code instead of the 20. 01:05:48.560 --> 01:05:50.810 Well, let's now do something a little more interactive 01:05:50.810 --> 01:05:54.680 and a little representative of tools that actually ask the user questions. 01:05:54.680 --> 01:06:00.320 In C, recall that we had this agreement program, Agree.c. 01:06:00.320 --> 01:06:04.280 And then let's go ahead and implement a corresponding version in Python, 01:06:04.280 --> 01:06:05.870 in a file called Agree.py. 01:06:05.870 --> 01:06:08.570 And let's look at the C version first. 01:06:08.570 --> 01:06:10.700 On the left, we used get char here. 
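For reference, the parity check from the preceding example, with the double equals sign in place, might be sketched like this. The function `parity` is a hypothetical wrapper used so the logic can run without prompting for input.

```python
def parity(n):
    """Return "Even" or "Odd" using the remainder operator."""
    # == compares for equality; a single = would be assignment.
    if n % 2 == 0:
        return "Even"
    else:
        return "Odd"


print(parity(50))  # Even
print(parity(7))   # Odd
```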
01:06:10.700 --> 01:06:13.190 And then we used the double vertical bars 01:06:13.190 --> 01:06:16.430 to check if C is equal to capital Y or lowercase y. 01:06:16.430 --> 01:06:18.500 And then we did the same thing for n for no. 01:06:18.500 --> 01:06:24.380 And so let's go over here and let's do from CS50, import get-- 01:06:24.380 --> 01:06:26.570 OK, get char is not a thing. 01:06:26.570 --> 01:06:29.090 And this here is another difference with Python. 01:06:29.090 --> 01:06:32.510 There is no data type for individual characters. 01:06:32.510 --> 01:06:34.640 You have strings, STRs, and, honestly, those 01:06:34.640 --> 01:06:36.620 are fine, because if you have a STR that's 01:06:36.620 --> 01:06:38.960 just one character, for all intents and purposes, 01:06:38.960 --> 01:06:40.710 it is just a single character. 01:06:40.710 --> 01:06:41.960 So it's just a simplification. 01:06:41.960 --> 01:06:43.200 You don't have to think as much. 01:06:43.200 --> 01:06:45.658 You don't have to worry about double quotes, single quotes. 01:06:45.658 --> 01:06:49.350 In fact, in Python, you can use double quotes or single quotes, 01:06:49.350 --> 01:06:50.930 so long as you're consistent. 01:06:50.930 --> 01:06:52.970 So long as you're consistent, the single quotes 01:06:52.970 --> 01:06:55.670 do not mean something different, like they do in C. 01:06:55.670 --> 01:06:58.340 So I'm going to go ahead and use getString here, 01:06:58.340 --> 01:07:01.220 although, strictly speaking, I could just use the input function, 01:07:01.220 --> 01:07:02.480 as we saw before. 01:07:02.480 --> 01:07:07.250 I'm going to get a string from the user that asks them this, getString, 01:07:07.250 --> 01:07:10.557 quote unquote, "Do you agree," like a little checkbox or interactive prompt, 01:07:10.557 --> 01:07:13.640 where you have to say yes or no, you want to agree to the following terms, 01:07:13.640 --> 01:07:14.580 or whatnot. 
01:07:14.580 --> 01:07:18.110 And then let's translate the conditionals to Python, now, too. 01:07:18.110 --> 01:07:25.850 So if S equals equals quote-unquote "Y," or S equals equals lowercase y, 01:07:25.850 --> 01:07:32.180 let's go ahead and print out agreed, just like in C, elif S equals 01:07:32.180 --> 01:07:35.540 equals N or S equals equals little n. 01:07:35.540 --> 01:07:38.058 Let's go ahead, then, and print out not agreed. 01:07:38.058 --> 01:07:40.850 And you can already see, perhaps, one of the differences here, too. 01:07:40.850 --> 01:07:43.700 Python is a little more English-like, in that 01:07:43.700 --> 01:07:47.610 you just literally use the English word or, instead of the two vertical bars. 01:07:47.610 --> 01:07:50.370 But it's ultimately doing the same thing. 01:07:50.370 --> 01:07:53.390 Can we simplify this code a bit, though? 01:07:53.390 --> 01:07:55.340 This would be a little annoying if we wanted 01:07:55.340 --> 01:07:57.800 to add support, not just for big Y and little y, 01:07:57.800 --> 01:08:04.230 but Yes or big Yes or little yes or big Y, lowercase e, capital S, right? 01:08:04.230 --> 01:08:07.130 There's a lot of permutations of Y-E-S or just y, 01:08:07.130 --> 01:08:08.720 that we ideally should tolerate. 01:08:08.720 --> 01:08:11.470 Otherwise, the user is going to have to type exactly what we want, 01:08:11.470 --> 01:08:12.770 which isn't very user-friendly. 01:08:12.770 --> 01:08:15.050 Any intuition for how we could logically, 01:08:15.050 --> 01:08:18.270 even if you don't know how to do it in code, make this better? 01:08:18.270 --> 01:08:18.770 Yeah. 01:08:18.770 --> 01:08:21.535 AUDIENCE: Write way over the list, and then up, 01:08:21.535 --> 01:08:22.910 it's like the things in the list. 01:08:22.910 --> 01:08:27.050 DAVID J. MALAN: Nice, yeah, we saw an example of a list before, just 0, 1, 2. 01:08:27.050 --> 01:08:29.899 Why don't we take that same idea and ask a similar question.
01:08:29.899 --> 01:08:34.819 If S is in the following list of values, Y or little y, 01:08:34.819 --> 01:08:38.600 or heck, let me add to the list now, yes, or maybe all capital YES. 01:08:38.600 --> 01:08:40.779 And it's going to get a little annoying, admittedly, 01:08:40.779 --> 01:08:43.750 but this is still better than the alternative, with all the or's. 01:08:43.750 --> 01:08:45.640 I could do things like this, and so forth. 01:08:45.640 --> 01:08:47.740 There's a whole bunch more permutations. 01:08:47.740 --> 01:08:50.470 But let's leave this alone, and let me just go into here 01:08:50.470 --> 01:08:57.279 and change this to, if S is in the following list of N or little n or no, 01:08:57.279 --> 01:09:00.460 and I won't do as, let's just not worry about the weird capitalizations 01:09:00.460 --> 01:09:01.600 there, for now. 01:09:01.600 --> 01:09:02.800 Let's go ahead and run this. 01:09:02.800 --> 01:09:05.950 Python of Agree.py, do I agree? 01:09:05.950 --> 01:09:08.740 Y. OK, how about yes? 01:09:08.740 --> 01:09:10.359 All right, how about big Yes. 01:09:10.359 --> 01:09:11.850 OK, that does not seem to work. 01:09:11.850 --> 01:09:14.350 Notice it did not say agreed, and it did not say not agreed. 01:09:14.350 --> 01:09:15.410 It didn't detect it. 01:09:15.410 --> 01:09:17.180 So how can I do this? 01:09:17.180 --> 01:09:20.770 Well, you know what I could do, what I don't really 01:09:20.770 --> 01:09:22.240 need the uppercase and lowercase. 01:09:22.240 --> 01:09:24.189 Let me tighten this list up a little bit. 01:09:24.189 --> 01:09:27.640 And why don't I just force S to be lowercase. 01:09:27.640 --> 01:09:31.000 S.lower, recall, whether it's one character or more, 01:09:31.000 --> 01:09:34.180 is a function built into STRs now, strings in Python, 01:09:34.180 --> 01:09:35.950 that forces the whole thing to lowercase. 01:09:35.950 --> 01:09:37.450 So now, watch what I can do. 
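The list-membership version just described, with the input forced to lowercase, could be sketched as below. The function `check_agreement` is a hypothetical wrapper that returns the message (or None when nothing matches) so the snippet runs without a prompt.

```python
def check_agreement(s):
    """Return "Agreed.", "Not agreed.", or None for unrecognized input."""
    # lower() returns a lowercased copy, so one short list covers every
    # capitalization of each accepted answer.
    if s.lower() in ["y", "yes"]:
        return "Agreed."
    elif s.lower() in ["n", "no"]:
        return "Not agreed."
    return None


print(check_agreement("Y"))    # Agreed.
print(check_agreement("YeS"))  # Agreed.
print(check_agreement("no"))   # Not agreed.
```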
01:09:37.450 --> 01:09:42.700 Python of Agree.py, little y, that works, big Y, that works. 01:09:42.700 --> 01:09:47.840 Big Yes, that works, big Y, little e, big S, that also works. 01:09:47.840 --> 01:09:50.910 So we've now handled, in one fell swoop, a whole bunch more logic. 01:09:50.910 --> 01:09:52.910 And you know what, we can tighten this up a bit. 01:09:52.910 --> 01:09:56.350 Here's an opportunity, in Python, for slightly better design. 01:09:56.350 --> 01:10:00.070 What have I done in here that's a little redundant? 01:10:00.070 --> 01:10:04.180 Does anyone see an opportunity to eliminate a redundancy, 01:10:04.180 --> 01:10:06.820 doing something more times than you need. 01:10:06.820 --> 01:10:08.030 Is a stretch here, no. 01:10:08.030 --> 01:10:08.530 Yep. 01:10:08.530 --> 01:10:11.163 AUDIENCE: You can do S dot lower, above. 01:10:11.163 --> 01:10:13.330 DAVID J. MALAN: We could move the S dot lower above. 01:10:13.330 --> 01:10:15.310 Notice that I'm using S dot lower twice. 01:10:15.310 --> 01:10:17.870 But it's going to give me the same answer both times. 01:10:17.870 --> 01:10:20.080 So I could do a couple of things here. 01:10:20.080 --> 01:10:24.700 I could, first of all, get rid of this lower, and get rid of this lower, 01:10:24.700 --> 01:10:28.720 and then above this, maybe I could do something like this, S equal-- 01:10:28.720 --> 01:10:31.600 I can't just do this, because that throws the value away. 01:10:31.600 --> 01:10:34.240 It does the math, but it doesn't convert the string itself. 01:10:34.240 --> 01:10:35.840 It's going to return a value. 01:10:35.840 --> 01:10:38.260 So I have to say S equals s.lower. 01:10:38.260 --> 01:10:39.340 I could do that. 01:10:39.340 --> 01:10:41.840 Or, honestly, I can chain these things together. 01:10:41.840 --> 01:10:46.070 And this is not something we saw in C. 
If getString returns a string, 01:10:46.070 --> 01:10:49.240 and strings have functions like lower in them, 01:10:49.240 --> 01:10:52.330 you can chain these functions together, like this, and do dot this, 01:10:52.330 --> 01:10:53.788 dot that, dot this other thing. 01:10:53.788 --> 01:10:56.830 And eventually you want to stop, because it's going to become crazy long. 01:10:56.830 --> 01:10:58.810 But this is reasonable, still fits on the screen. 01:10:58.810 --> 01:10:59.560 It's pretty tight. 01:10:59.560 --> 01:11:01.690 It does in one place what I was doing in two. 01:11:01.690 --> 01:11:03.010 So I think that's OK. 01:11:03.010 --> 01:11:05.980 Let me go ahead and do Python of Agree.py one last time. 01:11:05.980 --> 01:11:07.120 Let's try it one last time. 01:11:07.120 --> 01:11:10.360 And it's still working as intended. 01:11:10.360 --> 01:11:12.700 Also if I tried those other inputs as well. 01:11:12.700 --> 01:11:13.435 Yeah, question. 01:11:13.435 --> 01:11:19.290 AUDIENCE: Could you add on like a for uppercase as well, for like upper, 01:11:19.290 --> 01:11:22.700 and then cover all the functions where it's lowercase, for all the functions 01:11:22.700 --> 01:11:25.450 where it's uppercase as well, or could you not just do this again. 01:11:29.095 --> 01:11:30.470 DAVID J. MALAN: Let me summarize. 01:11:30.470 --> 01:11:33.340 Could we handle uppercase and lowercase together in some form? 01:11:33.340 --> 01:11:35.020 I'm actually doing that already. 01:11:35.020 --> 01:11:36.370 I just have to pick a lane. 01:11:36.370 --> 01:11:39.307 I have to either be all lowercase in my logic or all uppercase, 01:11:39.307 --> 01:11:41.140 and not worry about what the human types in, 01:11:41.140 --> 01:11:43.240 because no matter what the human types in, I'm 01:11:43.240 --> 01:11:44.950 forcing their input to lowercase. 01:11:44.950 --> 01:11:48.280 And then I am using a lowercase list of values. 01:11:48.280 --> 01:11:49.520 If I want to flip that, fine. 
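Chaining the call to lower onto the function that returns the string might look like this. Here `get_string` is a hypothetical stand-in that returns a canned answer instead of prompting, so the snippet runs unattended; it is not the CS50 library's implementation.

```python
def get_string(prompt):
    """Hypothetical stand-in for a prompting function; returns a fixed reply."""
    return "YeS"


# The call returns a str, and .lower() is applied to that result directly,
# so the lowercasing happens once instead of in each condition.
s = get_string("Do you agree? ").lower()

if s in ["y", "yes"]:
    result = "Agreed."
elif s in ["n", "no"]:
    result = "Not agreed."
else:
    result = None

print(result)  # Agreed.
```

Because the input is normalized once, the lists only ever need lowercase entries.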
01:11:49.520 --> 01:11:51.040 I just have to be self-consistent. 01:11:51.040 --> 01:11:52.420 But I'm handling that already. 01:11:52.420 --> 01:11:53.223 Yeah. 01:11:53.223 --> 01:11:56.953 AUDIENCE: Are strings no longer an array of characters? 01:11:56.953 --> 01:11:58.870 DAVID J. MALAN: A really good, loaded question: 01:11:58.870 --> 01:12:02.080 are strings no longer an array of characters? 01:12:02.080 --> 01:12:04.120 Conceptually, yes, underneath the hood, no. 01:12:04.120 --> 01:12:06.190 They're a little more sophisticated than that, 01:12:06.190 --> 01:12:08.590 because with strings, you have a few changes. 01:12:08.590 --> 01:12:10.600 Not only do they have functions built into them, 01:12:10.600 --> 01:12:12.580 because strings are now what we call objects, 01:12:12.580 --> 01:12:14.500 in what's called object-oriented programming. 01:12:14.500 --> 01:12:17.042 And we're going to keep seeing examples of this dot operator. 01:12:17.042 --> 01:12:21.550 They are also immutable, so to speak, I-M-M-U-T-A-B-L-E. 01:12:21.550 --> 01:12:25.180 Immutable means they cannot be changed, which means, unlike C, 01:12:25.180 --> 01:12:28.750 you can't go into a string and change its individual characters. 01:12:28.750 --> 01:12:31.480 You can make a copy of the string that makes a change, 01:12:31.480 --> 01:12:33.698 but you can't change the original string itself. 01:12:33.698 --> 01:12:35.740 This is both a little annoying, maybe, sometimes. 01:12:35.740 --> 01:12:38.365 But it's also pretty protective, because you can't do screw-ups 01:12:38.365 --> 01:12:41.680 like I did weeks ago, when I was trying to copy S and call it T. 01:12:41.680 --> 01:12:43.270 And then one affected the other. 01:12:43.270 --> 01:12:47.080 Python, underneath the hood, is handling all of the memory management 01:12:47.080 --> 01:12:48.550 and the pointers and all of that. 01:12:48.550 --> 01:12:51.040 There are no pointers in Python.
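A brief sketch of what immutability means in practice, using a made-up string. Assigning to a single character raises a TypeError, while a method such as capitalize returns a changed copy and leaves the original alone.

```python
s = "hi!"

try:
    # Unlike a char array in C, a str cannot be modified in place.
    s[0] = "H"
except TypeError as error:
    print(error)  # 'str' object does not support item assignment

# String methods return a new string rather than altering the original.
t = s.capitalize()

print(s)  # hi!
print(t)  # Hi!
```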
01:12:51.040 --> 01:12:55.840 So If that wasn't clear, all of that pain, if you will, all of that power, 01:12:55.840 --> 01:13:00.280 is now handled by the language itself, not by us, the programmers. 01:13:00.280 --> 01:13:02.440 All right, so let's introduce maybe some loops, 01:13:02.440 --> 01:13:04.390 like we've been in the habit of doing. 01:13:04.390 --> 01:13:08.170 Let me open up Meow.c, which was an example in C, just meowing 01:13:08.170 --> 01:13:09.730 a bunch of times textually. 01:13:09.730 --> 01:13:12.800 Let me create a file called Meow.py here on the right. 01:13:12.800 --> 01:13:15.190 And notice on the left, this was correct code in C, 01:13:15.190 --> 01:13:16.670 but it was kind of poorly designed. 01:13:16.670 --> 01:13:17.170 Why? 01:13:17.170 --> 01:13:19.450 Because it was a missed opportunity for a loop. 01:13:19.450 --> 01:13:22.460 Why say something three times when you can say it just once? 01:13:22.460 --> 01:13:25.990 So in Python, let me do it the poorly designed way first. 01:13:25.990 --> 01:13:27.400 Let me print out meow. 01:13:27.400 --> 01:13:31.210 And, like I generally should not, let me copy, paste it three times, 01:13:31.210 --> 01:13:33.670 run Python of Meow.py, and it works. 01:13:33.670 --> 01:13:35.318 OK, but not good practice. 01:13:35.318 --> 01:13:37.360 So let me go ahead and improve this a little bit. 01:13:37.360 --> 01:13:38.990 And there's a few ways to do this. 01:13:38.990 --> 01:13:44.050 If I wanted to do this three times, I could instead do something like this. 01:13:44.050 --> 01:13:48.010 For i in range of 3, recall that that was the better version, 01:13:48.010 --> 01:13:51.370 rather than arbitrarily enumerate numbers yourself, let me go ahead 01:13:51.370 --> 01:13:53.490 and print out quote unquote "Meow." 01:13:53.490 --> 01:13:56.077 Now if I run Python of Meow, still seems to work. 
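The two-line loop just described looks like this. The extra `counts` variable is not part of the lecture's program; it is included only to show which values the loop variable takes.

```python
# range(3) produces 0, 1, 2, so the body runs three times.
for i in range(3):
    print("meow")

# Illustration only: the values i took on during the loop.
counts = list(range(3))
print(counts)  # [0, 1, 2]
```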
01:13:56.077 --> 01:13:57.910 So it's a little tighter, and, my God, like, 01:13:57.910 --> 01:13:59.952 programs can't really get much shorter than this. 01:13:59.952 --> 01:14:04.300 We're down to two lines of code, no main function, no gratuitous syntax. 01:14:04.300 --> 01:14:06.580 Let's now improve the design further, like we 01:14:06.580 --> 01:14:09.550 did in C, by introducing a function called 01:14:09.550 --> 01:14:11.230 meow, that actually does the meowing. 01:14:11.230 --> 01:14:13.000 So this was our first abstraction, recall, 01:14:13.000 --> 01:14:18.100 both in Scratch and in C. Let me focus now entirely on the Python version 01:14:18.100 --> 01:14:18.760 here. 01:14:18.760 --> 01:14:23.485 Let me go ahead and first define a function. 01:14:26.890 --> 01:14:30.250 Let me first go ahead and do this, for i in range of 3, 01:14:30.250 --> 01:14:33.430 let's assume for the moment that there's a meow function, 01:14:33.430 --> 01:14:34.720 that I'm just going to call. 01:14:34.720 --> 01:14:38.320 Let's now go ahead and define, using the Def key word, which we saw briefly 01:14:38.320 --> 01:14:41.170 with the speller demonstration, a function 01:14:41.170 --> 01:14:42.880 called meow that takes no arguments. 01:14:42.880 --> 01:14:45.460 And all it does for now is print meow. 01:14:45.460 --> 01:14:50.620 Let me now go ahead and run Python of Meow.py Enter, huh, one 01:14:50.620 --> 01:14:51.950 of those trace backs. 01:14:51.950 --> 01:14:54.080 So this is another name error. 01:14:54.080 --> 01:14:57.080 And, again, name meow is not defined. 01:14:57.080 --> 01:14:59.080 What's your instinct here, even though we've not 01:14:59.080 --> 01:15:00.760 tripped over this yet in Python? 01:15:00.760 --> 01:15:03.130 Where does your mind go here? 01:15:03.130 --> 01:15:03.670 Yeah. 01:15:03.670 --> 01:15:06.080 AUDIENCE: Does it read top to bottom, left to right? 01:15:06.080 --> 01:15:09.600 I'm guessing we could find a new case. 
01:15:09.600 --> 01:15:13.020 DAVID J. MALAN: Perfect, as smart as Python seems to be, 01:15:13.020 --> 01:15:14.770 it still makes certain assumptions. 01:15:14.770 --> 01:15:18.010 And if it hasn't seen a keyword yet, it just doesn't exist. 01:15:18.010 --> 01:15:21.000 So if you want it to exist, we have to be a little clever here. 01:15:21.000 --> 01:15:24.090 I could just put it, flip it around, like this. 01:15:24.090 --> 01:15:26.470 But this honestly isn't particularly good design. 01:15:26.470 --> 01:15:26.970 Why? 01:15:26.970 --> 01:15:30.390 Because now, if you, the reader of your code, whether you 01:15:30.390 --> 01:15:32.970 wrote it or someone else, you kind of have to go fishing now. 01:15:32.970 --> 01:15:34.560 Like where does this program begin? 01:15:34.560 --> 01:15:38.130 And even though, yes, it's obvious that it begins on line four, logically, 01:15:38.130 --> 01:15:40.710 like, if the file were longer, you're going to be annoyed 01:15:40.710 --> 01:15:43.180 and fishing visually for the right lines of code. 01:15:43.180 --> 01:15:44.397 So let's reintroduce main. 01:15:44.397 --> 01:15:46.230 And indeed, this would be a common paradigm. 01:15:46.230 --> 01:15:49.380 When you want to start having abstractions in your own functions, 01:15:49.380 --> 01:15:53.460 just put your own code in main, so that, one, you can leave it up top, and two, 01:15:53.460 --> 01:15:55.650 you can solve the problem we just encountered. 01:15:55.650 --> 01:15:58.860 So let me define a function called main that has that same loop, 01:15:58.860 --> 01:16:00.240 meowing three times. 01:16:00.240 --> 01:16:02.040 But now watch what happens. 01:16:02.040 --> 01:16:07.350 Let me go into my terminal and run Python of Meow.py, Enter. 01:16:07.350 --> 01:16:07.850 Nothing. 01:16:10.500 --> 01:16:14.050 All right, investigate this. 01:16:14.050 --> 01:16:16.290 What could explain this symptom? 01:16:16.290 --> 01:16:18.020 I have not told you the answer yet.
01:16:18.020 --> 01:16:19.770 So all you have is your instinct, assuming 01:16:19.770 --> 01:16:21.720 you've never touched Python before. 01:16:21.720 --> 01:16:26.800 What might explain this symptom, where nothing is meowing? 01:16:26.800 --> 01:16:27.300 Yeah? 01:16:27.300 --> 01:16:28.970 AUDIENCE: Didn't run the main function. 01:16:28.970 --> 01:16:31.178 DAVID J. MALAN: Yeah, I didn't run the main function. 01:16:31.178 --> 01:16:33.390 So in C, this is functionality you get for free. 01:16:33.390 --> 01:16:34.765 You have to have a main function. 01:16:34.765 --> 01:16:37.580 But, heck, so long as you make it, it will be called for you. 01:16:37.580 --> 01:16:41.390 In Python, this is just a convention, to create a main function, 01:16:41.390 --> 01:16:43.200 borrowing a very common name for it. 01:16:43.200 --> 01:16:46.320 But if you want to call that main function, you have to do it. 01:16:46.320 --> 01:16:48.110 So this looks a little weird, admittedly, 01:16:48.110 --> 01:16:50.030 that you have to call your own main function now, 01:16:50.030 --> 01:16:51.860 and it has to be at the bottom of the file, 01:16:51.860 --> 01:16:55.040 because only once the interpreter gets to the bottom of the file, 01:16:55.040 --> 01:16:58.460 have all of your functions been defined, higher up. 01:16:58.460 --> 01:16:59.990 But this solves both problems. 01:16:59.990 --> 01:17:02.450 It keeps your code, that's the main part of your code, 01:17:02.450 --> 01:17:03.660 at the very top of the file. 01:17:03.660 --> 01:17:06.980 So it's just obvious to you, and a TF, or any reader in the future, 01:17:06.980 --> 01:17:09.140 where the program logically starts. 01:17:09.140 --> 01:17:13.310 But it also ensures that main is not called until everything else, main 01:17:13.310 --> 01:17:15.660 included, has been defined. 01:17:15.660 --> 01:17:17.648 So this is another perfect example of we're 01:17:17.648 --> 01:17:19.440 learning a new language for the first time. 
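The arrangement just described can be sketched as follows, assuming the same three-meow program. main sits at the top, meow is defined after it, and the only line that actually starts anything is the call at the bottom.

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


# Nothing above runs on its own. By the time this line is reached,
# both main and meow have been defined.
main()
```

Deleting that last line reproduces the symptom from a moment ago: the file runs without error and prints nothing.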
01:17:19.440 --> 01:17:21.020 You're not going to have heard all of the answers before. 01:17:21.020 --> 01:17:24.830 Just apply some logic, as to, like, all right, what could explain this symptom. 01:17:24.830 --> 01:17:28.190 Start to infer how the language does or doesn't work. 01:17:28.190 --> 01:17:32.450 If I now go and run this, Python of Meow.py, now we're back in business. 01:17:32.450 --> 01:17:35.360 And just so you have seen it, there is a quote 01:17:35.360 --> 01:17:38.840 unquote "better" way of doing this, that solves different problems that we 01:17:38.840 --> 01:17:42.050 are not going to encounter, certainly in these initial days. 01:17:42.050 --> 01:17:45.440 Typically, you would see in online tutorials or books, 01:17:45.440 --> 01:17:49.400 something that looks like this, where you actually have a weird conditional 01:17:49.400 --> 01:17:50.810 with multiple underscores. 01:17:50.810 --> 01:17:54.470 That's functionally the same thing, but it solves problems with libraries, 01:17:54.470 --> 01:17:57.840 if we ourselves were implementing a library or something similar in spirit. 01:17:57.840 --> 01:18:00.882 But we're going to keep things simpler and just write main at the bottom, 01:18:00.882 --> 01:18:03.355 because we're not going to encounter that problem just yet. 01:18:03.355 --> 01:18:06.230 All right, let's make one change to this, just to show how it's done. 01:18:06.230 --> 01:18:11.420 In C, the last version of meow also took command line argument, sorry, also 01:18:11.420 --> 01:18:13.910 took arguments to the function meow. 01:18:13.910 --> 01:18:16.490 So suppose that I want to factor this out. 01:18:16.490 --> 01:18:19.250 And I want to just call meow as a better abstraction, where I just 01:18:19.250 --> 01:18:21.080 say meow this number of times. 
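For reference, the conditional with multiple underscores mentioned above usually looks like the sketch below. It is the form commonly seen in tutorials rather than the one this lecture uses going forward.

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


# __name__ is "__main__" when the file is run directly (python meow.py).
# When another file imports this one, __name__ is the module's name
# instead, so main() is not called as a side effect of the import.
if __name__ == "__main__":
    main()
```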
01:18:21.080 --> 01:18:24.290 And I figure out how many times by just, like, putting in number 3 01:18:24.290 --> 01:18:26.990 or using getInt or something like that, to figure out 01:18:26.990 --> 01:18:28.550 how many times to say meow. 01:18:28.550 --> 01:18:31.820 Well, now, I have to define inside my meow function, an input, 01:18:31.820 --> 01:18:38.330 let's call it n, and then use that, as by doing this, for i in range of n, 01:18:38.330 --> 01:18:41.640 let me go ahead and print out meow that many times. 01:18:41.640 --> 01:18:43.820 So again, the only thing that's different from C 01:18:43.820 --> 01:18:47.630 is we don't bother specifying return types for any of these functions, 01:18:47.630 --> 01:18:52.230 and we don't bother specifying the type of our arguments or our variables. 01:18:52.230 --> 01:18:54.930 So same ideas, simpler in some sense. 01:18:54.930 --> 01:18:56.660 We're just throwing away keystrokes. 01:18:56.660 --> 01:18:59.450 All right, let me run this one final time, Python of Meow.py, 01:18:59.450 --> 01:19:02.390 and we still have the same program. 01:19:02.390 --> 01:19:04.110 All right, let me pause here. 01:19:04.110 --> 01:19:04.780 Any questions? 01:19:04.780 --> 01:19:06.030 And I know this is going fast. 01:19:06.030 --> 01:19:11.355 But hopefully, the C code is still somewhat familiar. 01:19:11.355 --> 01:19:11.855 Yeah. 01:19:11.855 --> 01:19:17.530 AUDIENCE: Is there any difference between global and local variables? 01:19:17.530 --> 01:19:18.780 DAVID J. MALAN: Good question. 01:19:18.780 --> 01:19:21.238 Is there any difference between global and local variables? 01:19:21.238 --> 01:19:23.850 Short answer, yes, and we would run into that same problem, 01:19:23.850 --> 01:19:25.320 if we declare a variable in one function, 01:19:25.320 --> 01:19:27.445 another function is not going to have access to it. 01:19:27.445 --> 01:19:30.660 We can solve that by putting variables globally.
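Going back to the version of meow that takes an argument, a minimal sketch might look like this, assuming the count is hardcoded to 3 rather than read from the user.

```python
def main():
    meow(3)


def meow(n):
    # No return type and no type for n, unlike the C version.
    for i in range(n):
        print("meow")


main()
```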
01:19:30.660 --> 01:19:32.760 But we don't have all of the features we had in C, 01:19:32.760 --> 01:19:35.160 like there's no such thing as a constant in Python. 01:19:35.160 --> 01:19:36.900 The mentality in the Python community is, 01:19:36.900 --> 01:19:39.480 if you don't want some value to change, don't touch it. 01:19:39.480 --> 01:19:40.630 Like just don't screw up. 01:19:40.630 --> 01:19:42.240 So there's trade-offs here, too. 01:19:42.240 --> 01:19:45.000 Some languages are stronger or more defensive than that. 01:19:45.000 --> 01:19:48.990 But that, too, is part of the mindset with this particular language. 01:19:48.990 --> 01:19:49.770 [SIREN] 01:19:49.770 --> 01:19:50.645 DAVID J. MALAN: Yeah. 01:19:50.645 --> 01:19:52.937 AUDIENCE: There is really only one green line, in the-- 01:19:52.937 --> 01:19:54.437 DAVID J. MALAN: Oh, sorry, where's-- 01:19:54.437 --> 01:19:55.080 say it louder. 01:19:55.080 --> 01:19:58.342 AUDIENCE: There has only been one green line printed at a time. 01:19:58.342 --> 01:20:00.050 DAVID J. MALAN: That is an amazing segue. 01:20:00.050 --> 01:20:01.370 Let's come to that in just a moment, because we're 01:20:01.370 --> 01:20:03.620 going to recreate also that Mario example, where 01:20:03.620 --> 01:20:06.925 we had like the question marks for the coins and the vertical bars. 01:20:06.925 --> 01:20:08.550 So let's come back to that in a second. 01:20:08.550 --> 01:20:09.656 And your question? 01:20:09.656 --> 01:20:13.362 AUDIENCE: If strings are immutable, and every time you like make a copy. 01:20:13.362 --> 01:20:15.320 DAVID J. MALAN: Correct, strings are immutable. 01:20:15.320 --> 01:20:19.220 Any time you seem to be modifying it, as with the lower function, 01:20:19.220 --> 01:20:20.480 you're getting back a copy. 01:20:20.480 --> 01:20:22.940 So it's taking a little more memory somewhere. 01:20:22.940 --> 01:20:26.145 But you don't have to deal with it Python's doing that for you. 
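A small sketch of what getting back a copy means, using the built-in str.lower method. The variable names s and t are only for illustration.

```python
s = "CAT"
t = s.lower()  # returns a new string; s itself is not modified

print(s)
print(t)
```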
01:20:26.145 --> 01:20:28.892 AUDIENCE: So you don't free anything. 01:20:28.892 --> 01:20:30.100 DAVID J. MALAN: Say it again? 01:20:30.100 --> 01:20:31.226 You don't need what? 01:20:31.226 --> 01:20:34.663 AUDIENCE: You don't free like taking leave on stuff. 01:20:34.663 --> 01:20:36.330 DAVID J. MALAN: You don't free anything. 01:20:36.330 --> 01:20:38.870 So if you weren't a big fan, over the past couple of weeks, 01:20:38.870 --> 01:20:42.860 of malloc or free or memory or addresses, or all 01:20:42.860 --> 01:20:44.990 of those low level implementation details, 01:20:44.990 --> 01:20:47.390 Python is the language for you, because all of that 01:20:47.390 --> 01:20:49.340 is handled for you automatically. 01:20:49.340 --> 01:20:50.780 Java does the same. 01:20:50.780 --> 01:20:51.960 JavaScript does the same. 01:20:51.960 --> 01:20:52.460 Yeah. 01:20:52.460 --> 01:20:58.244 AUDIENCE: Each up for the variable, you put it before the name, use of the body 01:20:58.244 --> 01:20:59.700 before the name, correct? 01:20:59.700 --> 01:21:03.785 Well, if there isn't a main function in Python, how do you define those words? 01:21:03.785 --> 01:21:05.910 DAVID J. MALAN: How do you define a global variable 01:21:05.910 --> 01:21:07.493 if there's no main function in Python? 01:21:07.493 --> 01:21:11.480 Global variables, by definition, always need to be outside of main, as well. 01:21:11.480 --> 01:21:12.480 So that's not a problem. 01:21:12.480 --> 01:21:15.300 If I wanted to have a function that's outside of, 01:21:15.300 --> 01:21:19.703 and, therefore, global to all of these, like global-- 01:21:19.703 --> 01:21:22.620 actually, don't use the word global, that's a special word in Python-- 01:21:22.620 --> 01:21:27.450 variable equals Foo, F-O-O, just as an arbitrary string 01:21:27.450 --> 01:21:31.410 value that a computer scientist would typically use, that is now global. 01:21:31.410 --> 01:21:34.000 There are some caveats, though, as to how you access that. 
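A minimal sketch of the global variable just described. The function name show is hypothetical, and only reading the variable is shown here; assigning to it from inside a function is where the caveats begin.

```python
variable = "foo"  # defined outside of every function, so it is global


def main():
    show()


def show():
    # Reading a global from inside a function needs no extra syntax.
    print(variable)


main()
```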
01:21:34.000 --> 01:21:36.010 But let's come back to that another time. 01:21:36.010 --> 01:21:38.030 But that problem is solvable, too. 01:21:38.030 --> 01:21:38.530 All right. 01:21:38.530 --> 01:21:39.780 So let's go ahead and do this. 01:21:39.780 --> 01:21:43.050 To come back to the question about the print command, let me go ahead 01:21:43.050 --> 01:21:45.300 and create a file now called Mario.py. 01:21:45.300 --> 01:21:47.700 Won't bother showing the C code anymore. 01:21:47.700 --> 01:21:49.590 We'll focus just on the new language here. 01:21:49.590 --> 01:21:54.540 But recall that, in Python, in Mario, we wanted to first do something like this. 01:21:54.540 --> 01:21:57.600 This was a random screen from the side scroller version 1 01:21:57.600 --> 01:21:58.800 of Super Mario Brothers. 01:21:58.800 --> 01:22:02.820 And we just want to print like three hashes to represent those three blocks. 01:22:02.820 --> 01:22:04.950 Well, in Python, we could do something like this, 01:22:04.950 --> 01:22:11.280 print, oh, sorry, for i in the range of 3, go ahead and print out quote unquote 01:22:11.280 --> 01:22:11.828 "hash." 01:22:11.828 --> 01:22:13.620 And I think this is pretty straightforward. 01:22:13.620 --> 01:22:16.260 Python of Mario.py, we get our three hashes. 01:22:16.260 --> 01:22:18.850 You could imagine parameterizing this now, though, 01:22:18.850 --> 01:22:20.350 and getting actual user input. 01:22:20.350 --> 01:22:21.730 So let's do that. 01:22:21.730 --> 01:22:27.420 Let me go up here and let me go and say from CS50, import getInt, 01:22:27.420 --> 01:22:31.090 and then let's get the input from the user. 01:22:31.090 --> 01:22:33.210 So it actually is a value n, like, all right, 01:22:33.210 --> 01:22:38.190 getInt the height of the column of bricks that you want to do. 01:22:38.190 --> 01:22:42.270 And then, let's go ahead and print out n hashes instead of three. 01:22:42.270 --> 01:22:43.560 So let me run this. 
01:22:43.560 --> 01:22:45.385 Let's print out like five hashes. 01:22:45.385 --> 01:22:47.760 OK, one, two, three, four, five, that seems to work, too. 01:22:47.760 --> 01:22:49.677 And it's going to work for any positive value. 01:22:49.677 --> 01:22:53.400 But it's not going to work for, how about negative 1? 01:22:53.400 --> 01:22:54.660 That just doesn't do anything. 01:22:54.660 --> 01:22:55.747 But that seems OK. 01:22:55.747 --> 01:22:58.830 But also recall that it's not going to work if the user types in something 01:22:58.830 --> 01:23:03.990 weird, like, oh, sorry, it is going to work if the user types in something 01:23:03.990 --> 01:23:05.790 weird like cat, why? 01:23:05.790 --> 01:23:08.820 We're using CS50's getInt function, which is 01:23:08.820 --> 01:23:11.710 handling all of those headaches for us. 01:23:11.710 --> 01:23:15.180 But, what if the user indeed types a negative number? 01:23:15.180 --> 01:23:16.110 We're tolerating that. 01:23:16.110 --> 01:23:17.860 So that was the bug I wanted to highlight. 01:23:17.860 --> 01:23:20.250 It would be nice to re-prompt them and re-prompt them. 01:23:20.250 --> 01:23:22.560 And in C, what was the programming construct we 01:23:22.560 --> 01:23:25.020 used when we wanted to ask the user a question. 01:23:25.020 --> 01:23:29.280 And then, if they didn't cooperate, prompt them again, prompt them again. 01:23:29.280 --> 01:23:29.890 What was that? 01:23:29.890 --> 01:23:30.390 Yeah. 01:23:30.390 --> 01:23:30.750 AUDIENCE: Do while loop. 01:23:30.750 --> 01:23:32.100 DAVID J. MALAN: Yeah, do while loop, right? 01:23:32.100 --> 01:23:34.830 That was useful, because it's almost the same as a while loop. 
01:23:34.830 --> 01:23:38.100 But instead of checking a condition, and then doing something, 01:23:38.100 --> 01:23:39.948 you do something and then check a condition, 01:23:39.948 --> 01:23:42.240 which makes sense with user input, because what are you 01:23:42.240 --> 01:23:44.615 even going to check if the user hasn't done anything yet? 01:23:44.615 --> 01:23:46.200 You need that inverted logic. 01:23:46.200 --> 01:23:50.010 Unfortunately in Python, there is no do while loop. 01:23:50.010 --> 01:23:51.300 There is a for loop. 01:23:51.300 --> 01:23:52.740 There is a while loop. 01:23:52.740 --> 01:23:55.590 And frankly, those are enough to recreate this idea. 01:23:55.590 --> 01:23:59.160 And the way to do this in Python, the Pythonic way, which 01:23:59.160 --> 01:24:02.160 is another term of art in the community, is to say this. 01:24:02.160 --> 01:24:06.300 Deliberately induce an infinite loop, while True, with capital T for true. 01:24:06.300 --> 01:24:09.930 And then do what you got to do, like get an Int from a user, 01:24:09.930 --> 01:24:12.060 asking them for the height of this thing. 01:24:12.060 --> 01:24:18.270 And then, if that is what you want, like a number greater than zero, go ahead 01:24:18.270 --> 01:24:20.020 and break out of the loop. 01:24:20.020 --> 01:24:25.440 So this is how, in Python, you could recreate the idea of a do while loop. 01:24:25.440 --> 01:24:27.315 You deliberately induce an infinite loop. 01:24:27.315 --> 01:24:29.190 So something's going to happen at least once. 01:24:29.190 --> 01:24:32.280 Then, if you get the answer you want, you break out of it, 01:24:32.280 --> 01:24:34.330 effectively achieving the same logic. 01:24:34.330 --> 01:24:37.080 So this is the Pythonic way of doing a do while loop. 01:24:37.080 --> 01:24:41.760 Let me go ahead and run Python of Mario.py, type in 3 this time. 01:24:41.760 --> 01:24:44.670 And now I get back just the 3 hashes as well. 
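The loop just described can be sketched as follows. To keep the sketch runnable without the CS50 library or a keyboard, a hypothetical function ask stands in for getInt and hands back a fixed sequence of simulated answers; that stand-in is an assumption, not part of the lecture's code.

```python
# Simulated user input: two uncooperative answers, then a valid one.
answers = iter([-1, 0, 3])


def ask(prompt):
    # Hypothetical stand-in for CS50's getInt.
    return next(answers)


# Deliberately infinite loop: the body always runs at least once,
# which is the behavior a do while loop gives you in C.
while True:
    n = ask("Height: ")
    if n > 0:
        break

for i in range(n):
    print("#")
```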
01:24:44.670 --> 01:24:50.310 What if, though, I wanted to get rid of, how about ultimately 01:24:50.310 --> 01:24:55.058 that CS50 library function, and also encapsulate this in a function. 01:24:55.058 --> 01:24:57.100 Well, let's go ahead and tweak this a little bit. 01:24:57.100 --> 01:24:59.070 Let me go ahead and remove this temporarily. 01:24:59.070 --> 01:25:01.680 Give myself a main function, so I don't make the same mistake 01:25:01.680 --> 01:25:03.360 as I did initially earlier. 01:25:03.360 --> 01:25:07.110 And let me give myself a function called get height that takes no arguments. 01:25:07.110 --> 01:25:10.620 And inside of that function is going to be that same code. 01:25:10.620 --> 01:25:14.280 But I don't want to break in this case, I want to return n. 01:25:14.280 --> 01:25:17.293 So, recall, that if you return from a function, you're done, 01:25:17.293 --> 01:25:19.210 you're going to exit from right at that point. 01:25:19.210 --> 01:25:20.320 So this would be fine. 01:25:20.320 --> 01:25:22.680 You can just say return n inside of the loop, 01:25:22.680 --> 01:25:25.320 or, if you would prefer to break out, you 01:25:25.320 --> 01:25:26.940 could do something like this instead. 01:25:26.940 --> 01:25:32.700 Break, and then down here, you could return, down here, 01:25:32.700 --> 01:25:34.630 you could return n as well. 01:25:34.630 --> 01:25:37.290 And let me make one point here before we go back up to main. 01:25:37.290 --> 01:25:41.490 This is a little different from C. And this one's subtle. 01:25:41.490 --> 01:25:47.250 What have I done here that in C would have been a bug, but is apparently not, 01:25:47.250 --> 01:25:48.315 I claim, in Python. 01:25:50.860 --> 01:25:52.220 It's super subtle, this one. 01:25:52.220 --> 01:25:52.720 Yeah. 01:25:52.720 --> 01:25:55.911 AUDIENCE: So aren't we like defining mostly object, 01:25:55.911 --> 01:25:59.470 like we're using it first, defining an object? 
01:25:59.470 --> 01:26:04.275 [INAUDIBLE] 01:26:04.275 --> 01:26:07.150 DAVID J. MALAN: So similar, it's not quite that we're using it first. 01:26:07.150 --> 01:26:10.980 So it's OK not to declare a variable with like the data type. 01:26:10.980 --> 01:26:15.420 We've addressed that before, but on line 9, we're assigning n a value, it seems. 01:26:15.420 --> 01:26:18.600 And then we return n on line 12. 01:26:18.600 --> 01:26:20.190 But notice the indentation. 01:26:20.190 --> 01:26:25.410 In the world of C, if we had declared a variable inside of a loop, on line 9, 01:26:25.410 --> 01:26:28.200 it would have been scoped to that loop, which 01:26:28.200 --> 01:26:31.530 means as soon as you get out of that loop, like further down in the program, 01:26:31.530 --> 01:26:33.340 n would not exist. 01:26:33.340 --> 01:26:36.090 It would be local to the curly braces therein. 01:26:36.090 --> 01:26:39.720 Here, logically, curly braces are gone, but the indentation 01:26:39.720 --> 01:26:44.250 makes clear that n is still inside of this loop, between lines 8 through 11. 01:26:44.250 --> 01:26:47.280 But n is actually still in scope in Python. 01:26:47.280 --> 01:26:50.380 The moment you create a variable in Python, for better or for worse, 01:26:50.380 --> 01:26:53.760 It is available everywhere within that function, even outside 01:26:53.760 --> 01:26:55.690 of the loop in which you defined it. 01:26:55.690 --> 01:26:59.070 So this logic is actually OK in Python. 01:26:59.070 --> 01:27:02.138 In C, recall, to solve this same problem, 01:27:02.138 --> 01:27:04.680 we would have had to do something a little hackish like this, 01:27:04.680 --> 01:27:09.600 like define n up here on line 8, so that it exists, now, on line 10, 01:27:09.600 --> 01:27:12.000 and so that it exists on line 13. 01:27:12.000 --> 01:27:15.700 That is no longer an issue or need, in Python. 
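A sketch of the scoping point, under the assumption that the function receives a hypothetical ask parameter in place of getInt so that it can run with simulated answers. The line numbers mentioned above refer to the lecture's editor, not to this sketch.

```python
def get_height(ask):
    while True:
        n = ask("Height: ")  # n is first created here, inside the loop
        if n > 0:
            break
    # n is still in scope here, outside the loop. The equivalent C code
    # would not compile unless n were declared before the loop.
    return n


answers = iter([-1, 3])  # simulated user input
height = get_height(lambda prompt: next(answers))
print(height)
```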
01:27:15.700 --> 01:27:17.700 Once you create a variable, even if it's nested, 01:27:17.700 --> 01:27:19.867 nested, nested inside of some loops or conditionals, 01:27:19.867 --> 01:27:23.520 it still exists within the function itself. 01:27:23.520 --> 01:27:27.870 All right, any questions then on this, before we now run this and then get 01:27:27.870 --> 01:27:31.680 rid of the CS50 library again? 01:27:31.680 --> 01:27:34.300 OK, so let me go ahead and get the height from the user. 01:27:34.300 --> 01:27:36.758 Let's go ahead and create a variable in main called height. 01:27:36.758 --> 01:27:38.460 Let's call this get height function. 01:27:38.460 --> 01:27:43.380 And then let's use that height value, instead of something hardcoded there. 01:27:43.380 --> 01:27:45.000 And let me see if this all works now. 01:27:45.000 --> 01:27:46.410 Python of Mario.py. 01:27:46.410 --> 01:27:49.110 Hopefully, I haven't messed up, but I did. 01:27:49.110 --> 01:27:51.460 But this is an easy fix now. 01:27:51.460 --> 01:27:51.960 Yeah. 01:27:51.960 --> 01:27:53.085 AUDIENCE: Got to call main. 01:27:53.085 --> 01:27:54.543 DAVID J. MALAN: I got to call main. 01:27:54.543 --> 01:27:55.980 So again, I deleted that earlier. 01:27:55.980 --> 01:27:56.920 But let me bring it back. 01:27:56.920 --> 01:27:58.128 So I'm actually calling main. 01:27:58.128 --> 01:28:02.190 Let me rerun Python of Mario.py, there we go, height 3. 01:28:02.190 --> 01:28:03.880 Now it seems to be working. 01:28:03.880 --> 01:28:05.880 So let's do one last thing with Mario, just 01:28:05.880 --> 01:28:08.980 to tie together that idea now of exceptions from before. 01:28:08.980 --> 01:28:11.070 Again, exceptions are a feature of Python, 01:28:11.070 --> 01:28:13.060 whereby you can try to do something. 01:28:13.060 --> 01:28:16.710 And if there's a problem, you can handle it in any way you see fit. 01:28:16.710 --> 01:28:20.070 Previously, I handled it by just yelling at the user that that's not an Int. 
01:28:20.070 --> 01:28:23.460 But let's actually use this to re-implement CS50's own getInt 01:28:23.460 --> 01:28:24.240 function. 01:28:24.240 --> 01:28:27.130 Let me throw away CS50's getInt function. 01:28:27.130 --> 01:28:32.880 And now let me go ahead and replace getInt with input. 01:28:32.880 --> 01:28:35.670 But it's not sufficient to just use input. 01:28:35.670 --> 01:28:39.480 What do I have to add to this line of code on line 8? 01:28:39.480 --> 01:28:40.740 If I want to get back an Int? 01:28:40.740 --> 01:28:41.790 AUDIENCE: The Int function. 01:28:41.790 --> 01:28:43.832 DAVID J. MALAN: Yeah, I have to cast it to an Int 01:28:43.832 --> 01:28:46.500 by calling the Int function around that value, 01:28:46.500 --> 01:28:48.750 or I could do it on a separate line, just to be clear. 01:28:48.750 --> 01:28:52.110 I could also do n equals Int of n. 01:28:52.110 --> 01:28:55.020 That would work too, but it's sort of an unnecessary extra line. 01:28:55.020 --> 01:28:57.990 This is not sufficient, because that does not change the value. 01:28:57.990 --> 01:28:58.935 It creates the value. 01:28:58.935 --> 01:29:00.060 But then it throws it away. 01:29:00.060 --> 01:29:01.192 We need to assign it. 01:29:01.192 --> 01:29:03.900 So the conventional way to do this would probably be in one line, 01:29:03.900 --> 01:29:05.358 just to keep things nice and tight. 01:29:05.358 --> 01:29:06.780 So that works fine now. 01:29:06.780 --> 01:29:11.470 If I run Python of Mario.py, I can still type in 3, and all as well. 01:29:11.470 --> 01:29:15.720 I can still type in negative 1, because that is an Int that I am handling. 01:29:15.720 --> 01:29:18.750 What I'm not yet handling is weird input like cat 01:29:18.750 --> 01:29:21.760 or some string that is not a base 10 number. 01:29:21.760 --> 01:29:23.880 So here, again, is my traceback. 
01:29:23.880 --> 01:29:27.000 And notice that here, let me scroll up a little bit, 01:29:27.000 --> 01:29:31.620 here we can actually see more detail in the traceback. 01:29:31.620 --> 01:29:36.900 Notice that, just like in C, or just like in the debugger in VS Code, 01:29:36.900 --> 01:29:38.100 you can see a few things. 01:29:38.100 --> 01:29:41.490 You can see mention of module, that just means your file, main, which 01:29:41.490 --> 01:29:43.013 is my main function, and get height. 01:29:43.013 --> 01:29:44.430 So notice, it's kind of backwards. 01:29:44.430 --> 01:29:46.720 It's top to bottom instead of bottom up, as we drew it 01:29:46.720 --> 01:29:48.720 on the board the other day, and as we envisioned 01:29:48.720 --> 01:29:50.520 stacks of trays in the cafeteria. 01:29:50.520 --> 01:29:52.680 But this is your stack, of functions that 01:29:52.680 --> 01:29:54.330 have been called, from top to bottom. 01:29:54.330 --> 01:29:57.360 Get height is the most recent, main is the very first, 01:29:57.360 --> 01:29:59.200 value error is the problem. 01:29:59.200 --> 01:30:03.740 So let's try to do, let's try to do this literally, except if there's an error. 01:30:03.740 --> 01:30:04.740 So what do I want to do? 01:30:04.740 --> 01:30:09.720 I'm going to go in here, and I'm going to say, try to do the following. 01:30:09.720 --> 01:30:17.070 Whoops, try to do the following, except if there's a value error, value error, 01:30:17.070 --> 01:30:20.640 then go ahead and say something, well, like before, print, 01:30:20.640 --> 01:30:23.830 that's not an integer exclamation point. 01:30:23.830 --> 01:30:26.760 But the difference this time is because I'm in a loop, the user 01:30:26.760 --> 01:30:29.200 is going to have a chance to recover from this issue. 01:30:29.200 --> 01:30:32.340 So if I run Mario.py, 3 still works as before. 
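Put together, the try and except version might look like the sketch below. The ask parameter is hypothetical: it defaults to the built-in input, and exists only so that the example can run with simulated typing.

```python
def get_height(ask=input):
    while True:
        try:
            # int() raises ValueError for text like "cat".
            n = int(ask("Height: "))
            if n > 0:
                return n
        except ValueError:
            print("That's not an integer!")
        # Either way, the loop comes around and prompts again.


# Simulated typing: "cat" is rejected, -1 is re-prompted, 2 is accepted.
typed = iter(["cat", "-1", "2"])
height = get_height(lambda prompt: next(typed))

for i in range(height):
    print("#")
```

Calling get_height() with no argument would prompt a real user through input instead.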
01:30:32.340 --> 01:30:35.880 If I run Mario.py and type in cat, I detect it now, 01:30:35.880 --> 01:30:39.240 and because I'm still in that loop, and because the program hasn't crashed, 01:30:39.240 --> 01:30:43.050 because I've caught, so to speak, the value error, using this line of code 01:30:43.050 --> 01:30:46.950 here, that's the way in Python to detect these kinds of errors, 01:30:46.950 --> 01:30:49.680 that would otherwise end up being on the user's own screen. 01:30:49.680 --> 01:30:51.540 If I type in cat, dog, that doesn't work. 01:30:51.540 --> 01:30:56.820 If I type in, though, 2, I get my two hashes, because that's, indeed, an Int. 01:30:56.820 --> 01:30:58.740 Are any questions on this, and we're not going 01:30:58.740 --> 01:31:00.750 to spend too much time on exceptions, but just wanted 01:31:00.750 --> 01:31:03.680 to show you what's involved with getting rid of those training wheels. 01:31:03.680 --> 01:31:04.180 Yeah. 01:31:04.180 --> 01:31:05.763 AUDIENCE: Then the hash marks in line. 01:31:05.763 --> 01:31:07.305 DAVID J. MALAN: OK, so let's do this. 01:31:07.305 --> 01:31:09.140 That actually comes to the earlier question 01:31:09.140 --> 01:31:11.060 about printing the hashes on the same line, 01:31:11.060 --> 01:31:13.808 or maybe something like this, where we have the little bricks 01:31:13.808 --> 01:31:15.350 in the sky, or little question marks. 01:31:15.350 --> 01:31:17.725 Let's recreate this idea, because the problem with print, 01:31:17.725 --> 01:31:20.930 as was noted earlier, is you're automatically printing out new lines. 01:31:20.930 --> 01:31:22.460 But what if we don't want that. 01:31:22.460 --> 01:31:24.740 Well, let's change this program entirely. 01:31:24.740 --> 01:31:26.310 Let me throw away all the functions. 01:31:26.310 --> 01:31:29.220 Let's just go to a simpler world, where we're just doing this. 01:31:29.220 --> 01:31:30.912 So let me start fresh in Mario.py. 
01:31:30.912 --> 01:31:33.120 I'm not going to bother with exceptions or functions. 01:31:33.120 --> 01:31:39.410 Let's just do a very simple program, to create this idea, for i in range of 4 01:31:39.410 --> 01:31:42.860 this time, because there are four of these things in the sky. 01:31:42.860 --> 01:31:45.230 Let's go ahead and just print out a question mark 01:31:45.230 --> 01:31:47.450 to represent each of those bricks. 01:31:47.450 --> 01:31:51.140 Odds are you know this is not going to end well, because these are unfortunately, 01:31:51.140 --> 01:31:54.450 as you've predicted, on separate lines. 01:31:54.450 --> 01:31:57.380 So it turns out that the print function actually 01:31:57.380 --> 01:32:00.320 takes in multiple arguments, not just the thing you want to print, 01:32:00.320 --> 01:32:03.650 but also some additional arguments, that allow you to specify 01:32:03.650 --> 01:32:06.170 what the default line ending should be. 01:32:06.170 --> 01:32:09.110 But what's interesting about this is that, if you 01:32:09.110 --> 01:32:12.630 want to change the line ending to be something like, 01:32:12.630 --> 01:32:16.790 quote unquote, that is, nothing, instead of backslash n, 01:32:16.790 --> 01:32:19.310 this is not sufficient, because in Python, you 01:32:19.310 --> 01:32:21.770 can have two types of arguments, or parameters. 01:32:21.770 --> 01:32:25.160 Some arguments are positional, which is the fancy way of saying it's 01:32:25.160 --> 01:32:26.690 a comma separated list of arguments. 01:32:26.690 --> 01:32:29.540 And that's what we did all the time in C. Something comma, something 01:32:29.540 --> 01:32:31.665 comma, something, we did it in printf all the time, 01:32:31.665 --> 01:32:33.980 and in other functions that took multiple arguments.
01:32:33.980 --> 01:32:37.880 In Python, you have, not only positional arguments, 01:32:37.880 --> 01:32:41.660 where you just separate them by commas, to give one or two or three or more 01:32:41.660 --> 01:32:42.650 arguments. 01:32:42.650 --> 01:32:46.220 There are also named arguments, which looks weird but is 01:32:46.220 --> 01:32:48.140 helpful for reasons like this. 01:32:48.140 --> 01:32:50.900 If you read the documentation, you will see 01:32:50.900 --> 01:32:54.740 that there is a named argument that Python accepts, called end. 01:32:54.740 --> 01:32:57.680 And if you set that equal to something, that 01:32:57.680 --> 01:33:00.200 will be used as the end of every line, instead 01:33:00.200 --> 01:33:02.750 of the default, which the documentation will also say 01:33:02.750 --> 01:33:04.700 is quote unquote backslash n. 01:33:04.700 --> 01:33:09.000 So this line here has no effect on my logic at the moment. 01:33:09.000 --> 01:33:13.280 But if I change it to just quote unquote, essentially overriding 01:33:13.280 --> 01:33:18.470 the default new line character, and now run Mario again, now I get all four 01:33:18.470 --> 01:33:19.278 on the same line. 01:33:19.278 --> 01:33:20.570 There's a bit of a bug, though. 01:33:20.570 --> 01:33:23.610 My prompt is not meant to be on the same line. 01:33:23.610 --> 01:33:25.640 So I can fix that by just printing nothing. 01:33:25.640 --> 01:33:28.640 But, really, it's not nothing, because you get the new line for free. 01:33:28.640 --> 01:33:32.930 So let me run Python of Mario.py again, and now we 01:33:32.930 --> 01:33:36.140 have what I intended in the first place, which was a little something that 01:33:36.140 --> 01:33:37.170 looked like this. 01:33:37.170 --> 01:33:40.910 And this is just one example of an argument that has a name. 
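The finished program from this step can be sketched as follows, using print's named argument end to suppress the newline after each question mark.

```python
for i in range(4):
    # end="" overrides the default line ending, which is "\n".
    print("?", end="")

# A bare print() outputs only the default newline, which moves
# the prompt down to its own line.
print()
```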
01:33:40.910 --> 01:33:43.280 But this is a common paradigm in Python, too, 01:33:43.280 --> 01:33:46.250 to not just separate things by commas, but to be very specific, 01:33:46.250 --> 01:33:50.810 because the print function might take 5, 10, even 20 different arguments. 01:33:50.810 --> 01:33:54.628 And my God, if you had to enumerate like 10 or 20 commas, 01:33:54.628 --> 01:33:55.670 you're going to screw up. 01:33:55.670 --> 01:33:57.587 You're going to get things in the wrong order. 01:33:57.587 --> 01:34:00.600 Named arguments allow you to be resilient against that. 01:34:00.600 --> 01:34:02.690 So you only specify arguments by name, and it 01:34:02.690 --> 01:34:06.004 doesn't matter what order they are in. 01:34:06.004 --> 01:34:10.160 All right, any questions, then, on this, and the overriding of new line. 01:34:10.160 --> 01:34:14.270 And to be clear, you can do something like, very weird, 01:34:14.270 --> 01:34:19.910 but logically expected, like this, by just changing the line ending, too. 01:34:19.910 --> 01:34:21.830 But the right way to solve the Mario problem 01:34:21.830 --> 01:34:25.652 would be just to override it to be nothing like this. 01:34:25.652 --> 01:34:27.110 All right, how about this for cool. 01:34:27.110 --> 01:34:29.000 And this is why a lot of people like Python. 01:34:29.000 --> 01:34:30.440 Suppose you don't really like loops. 01:34:30.440 --> 01:34:31.970 You don't really like three-line programs, 01:34:31.970 --> 01:34:34.637 because that was kind of three times longer than it needs to be. 01:34:34.637 --> 01:34:39.200 What if you just printed out a question mark four times? 01:34:39.200 --> 01:34:43.380 Python, whoops, Python of Mario.py, that also works. 01:34:43.380 --> 01:34:46.550 So it turns out that, just like the plus operator in Python 01:34:46.550 --> 01:34:50.570 can join things together, the multiply operator is not 01:34:50.570 --> 01:34:51.840 arithmetic in this case.
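A sketch of the one-line version. The variable bricks is introduced only for illustration; the lecture passes the expression straight to print.

```python
# With a string on the left, * repeats the string instead of multiplying.
bricks = "?" * 4
print(bricks)
```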
01:34:51.840 --> 01:34:56.070 It actually means, take this and concatenate it four times over. 01:34:56.070 --> 01:34:59.000 So that's a way of just distilling into one line what 01:34:59.000 --> 01:35:02.750 would have otherwise taken multiple lines in C, fewer, but still multiple 01:35:02.750 --> 01:35:07.130 lines in Python, but is really now rather succinct in Python, 01:35:07.130 --> 01:35:08.385 by doing that instead. 01:35:08.385 --> 01:35:11.510 Let's do one last Mario example, which looked a little something like this. 01:35:11.510 --> 01:35:14.090 If this is another part of the Mario interface, 01:35:14.090 --> 01:35:16.800 this is like a grid of like 3 by 3 bricks, for instance. 01:35:16.800 --> 01:35:20.690 So two dimensions now, just not just vertical, not horizontal, but now both. 01:35:20.690 --> 01:35:23.130 Let's print out something like that, using hashes. 01:35:23.130 --> 01:35:26.070 Well, how about, how do I do this. 01:35:26.070 --> 01:35:29.210 So how about for i in range of 3. 01:35:29.210 --> 01:35:34.280 Then I could do for j in range of 3, just because j comes after I 01:35:34.280 --> 01:35:35.810 and that's reasonable for counting. 01:35:35.810 --> 01:35:41.000 I could now print out a hash symbol, well, let's see what this does. 01:35:41.000 --> 01:35:47.660 Python of Mario.py, OK, that's just one crazy long column. 01:35:47.660 --> 01:35:51.240 What do I need to fix and where here, to make this look like this? 01:35:51.240 --> 01:35:55.850 So 3 by 3 bricks, instead of one long column. 01:35:55.850 --> 01:35:56.450 Any instincts? 01:35:56.450 --> 01:36:00.500 AUDIENCE: Why don't we create a line and then we'll skip it. 01:36:00.500 --> 01:36:03.450 DAVID J. MALAN: OK, so after printing 3, we want to skip a line. 01:36:03.450 --> 01:36:05.750 So maybe like print out a blank line here. 01:36:05.750 --> 01:36:06.740 OK, let's try that. 01:36:06.740 --> 01:36:09.920 I like that instinct, right, print 3, new line, print 3, new line. 
01:36:09.920 --> 01:36:12.260 Let's go ahead and run Python of Mario.py. 01:36:12.260 --> 01:36:16.580 OK, it's more visible, what I'm doing, but still wrong. 01:36:16.580 --> 01:36:19.110 What can I, what's the remaining fix, though? 01:36:19.110 --> 01:36:19.610 Yeah. 01:36:19.610 --> 01:36:22.790 AUDIENCE: So right behind the two. 01:36:22.790 --> 01:36:25.680 DAVID J. MALAN: Yeah, I'm getting an extra new line here, 01:36:25.680 --> 01:36:27.870 which I don't want while I'm on this row. 01:36:27.870 --> 01:36:31.850 So let me do end equals quote unquote, and now, together, your solutions might 01:36:31.850 --> 01:36:33.950 take us the whole way there. 01:36:33.950 --> 01:36:37.345 Python of Mario.py, voila, now we've got it, in two dimensions. 01:36:37.345 --> 01:36:38.720 And even this, we can tighten up. 01:36:38.720 --> 01:36:41.220 Like, we could just use the little trick we learned. 01:36:41.220 --> 01:36:45.230 So we could just say, print a hash times 3 times, 01:36:45.230 --> 01:36:47.810 and we can get rid of one of those loops altogether. 01:36:47.810 --> 01:36:50.930 All it's doing is, whoops, all it's doing is automating that process. 01:36:50.930 --> 01:36:53.060 But, no, I don't want to do that. 01:36:53.060 --> 01:36:54.832 What do I, how do I fix this here. 01:36:54.832 --> 01:36:56.540 I don't think I want this anymore, right? 01:36:56.540 --> 01:36:58.350 Because that's giving me an extra new line. 01:36:58.350 --> 01:37:01.260 So now this program is really tightened up. 01:37:01.260 --> 01:37:03.050 Same thing, two lines of code. 01:37:03.050 --> 01:37:07.220 But we're now implementing this same two dimensional structure here. 01:37:07.220 --> 01:37:10.440 All right, any questions here on these? 01:37:10.440 --> 01:37:10.940 Yeah. 01:37:10.940 --> 01:37:16.790 AUDIENCE: Is there any practical reason why when we write end, end is, I mean, 01:37:16.790 --> 01:37:19.850 the print function, you don't put any spaces in it.
01:37:19.850 --> 01:37:22.430 DAVID J. MALAN: If I print end, any spaces. 01:37:22.430 --> 01:37:23.300 Say that once more. 01:37:23.300 --> 01:37:25.440 AUDIENCE: Whenever we write end, for example, 01:37:25.440 --> 01:37:28.850 the print function is, you know, in order 01:37:28.850 --> 01:37:33.820 to stop it from going to a new line, it seems like any spaces, 01:37:33.820 --> 01:37:37.800 we did like end equals and then too close. 01:37:37.800 --> 01:37:38.820 There were no spaces. 01:37:38.820 --> 01:37:40.300 Did you do that on purpose? 01:37:40.300 --> 01:37:42.300 DAVID J. MALAN: Oh, 01:37:42.300 --> 01:37:43.200 yes, good question. 01:37:43.200 --> 01:37:44.242 I see what you're saying. 01:37:44.242 --> 01:37:48.030 So in a previous version, let me rewind in time, when we had this, 01:37:48.030 --> 01:37:49.170 I did not put spaces. 01:37:49.170 --> 01:37:51.720 The convention in Python is not to do that. 01:37:51.720 --> 01:37:52.350 Why? 01:37:52.350 --> 01:37:54.263 It just starts to add too much space. 01:37:54.263 --> 01:37:56.430 And this is a little inconsistent, because, earlier, 01:37:56.430 --> 01:37:58.470 when we talked about like pluses or spaces 01:37:58.470 --> 01:38:00.750 around the less than or equal signs, I did say add it. 01:38:00.750 --> 01:38:03.010 Here it's actually clearer and recommended 01:38:03.010 --> 01:38:04.260 to keep them tighter together. 01:38:04.260 --> 01:38:07.560 Otherwise it just becomes harder to read where the gaps are. 01:38:07.560 --> 01:38:08.820 Good observation. 01:38:08.820 --> 01:38:14.357 All right, let's do, how about, another five minute break. 01:38:14.357 --> 01:38:14.940 Let's do that. 01:38:14.940 --> 01:38:17.732 And then we're going to dive into some more sophisticated problems, 01:38:17.732 --> 01:38:21.160 and then ultimately build with some audio and visual examples, as well. 01:38:21.160 --> 01:38:23.130 See you in five.
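To recap the grid from before the break, here is a rough sketch of both versions, nested loops and the string-repetition shortcut. The function names grid_nested and grid_repeat are invented for this example and are not from the lecture's mario.py.

```python
def grid_nested(size):
    """Print a size-by-size block of hashes with two loops."""
    for i in range(size):
        for j in range(size):
            # Stay on the current row: no newline after each hash.
            # Note the convention of no spaces around = in end=""
            print("#", end="")
        # End the row once the inner loop has finished
        print()


def grid_repeat(size):
    """Same output, using "#" * size instead of an inner loop."""
    for i in range(size):
        # "#" * 3 evaluates to "###"; print adds the newline itself
        print("#" * size)


grid_nested(3)
grid_repeat(3)
```

Both calls print the same three rows of three hashes; the second simply lets the * operator do the inner loop's work.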
01:38:23.130 --> 01:38:28.260 All right, so almost all of the examples we just did 01:38:28.260 --> 01:38:30.540 were recreations of what we did in week 1. 01:38:30.540 --> 01:38:33.120 And recall that week 1 was like our most syntax-heavy week. 01:38:33.120 --> 01:38:36.930 It was when we were first learning how to program in C. But after week 1, 01:38:36.930 --> 01:38:39.900 we began to focus a bit more on ideas, like arrays, 01:38:39.900 --> 01:38:41.640 and other higher-level constructs. 01:38:41.640 --> 01:38:44.880 And we'll do that again here, condensing some of those first early weeks 01:38:44.880 --> 01:38:47.250 into a fewer set of examples in Python. 01:38:47.250 --> 01:38:50.020 And we'll culminate by actually taking Python out for a spin, 01:38:50.020 --> 01:38:52.300 and doing things that would be way harder to do, 01:38:52.300 --> 01:38:56.830 and way more time-consuming to do in C, even more so than the speller example. 01:38:56.830 --> 01:38:59.790 But how do you go about figuring out what functions exist, 01:38:59.790 --> 01:39:02.970 if you didn't hear it in class, you don't see it online, 01:39:02.970 --> 01:39:06.480 but you want to see it officially, you can go to the Python documentation, 01:39:06.480 --> 01:39:08.220 docs.python.org here. 01:39:08.220 --> 01:39:11.340 And I will disclaim that, honestly, the Python documentation is not 01:39:11.340 --> 01:39:12.750 terribly user-friendly. 01:39:12.750 --> 01:39:15.240 Google will often be your friend, so googling something 01:39:15.240 --> 01:39:19.350 you're interested in, to find your way to the appropriate page on Python.org, 01:39:19.350 --> 01:39:22.410 or StackOverflow.com is another popular website. 01:39:22.410 --> 01:39:24.780 As always, though, the line should be googling 01:39:24.780 --> 01:39:27.600 things like, how do I convert a string to lowercase. 01:39:27.600 --> 01:39:29.070 Like that's reasonable to Google. 
01:39:29.070 --> 01:39:33.160 Or how to convert to uppercase or how implement function in Python. 01:39:33.160 --> 01:39:37.950 But googling, of course, things like how to implement problem set 6 in CS50, 01:39:37.950 --> 01:39:39.120 of course, crosses the line. 01:39:39.120 --> 01:39:42.078 But moving forward, and really with programming in general, like Google 01:39:42.078 --> 01:39:44.220 and Stack Overflow are your friends, but the line 01:39:44.220 --> 01:39:46.540 is between the reasonable and the unreasonable. 01:39:46.540 --> 01:39:49.890 So let me officially use the Python documentation search, just 01:39:49.890 --> 01:39:52.530 to search for something like the lowercase function. 01:39:52.530 --> 01:39:54.540 Like, I know I can lowercase things in Python. 01:39:54.540 --> 01:39:55.980 I don't quite remember how. 01:39:55.980 --> 01:39:57.870 So let me just search for the word lower. 01:39:57.870 --> 01:40:00.810 You're going to get, often, an overwhelming number of results, 01:40:00.810 --> 01:40:03.678 because Python is a pretty big language, with lots of functionality. 01:40:03.678 --> 01:40:05.970 And you're going to want to look for familiar patterns. 01:40:05.970 --> 01:40:09.060 For whatever reason, string.lower, which is probably 01:40:09.060 --> 01:40:12.420 more popular or more commonly used than these other ones, is third on the list. 01:40:12.420 --> 01:40:15.460 But it's purple, because I clicked it a moment ago, when looking for it. 01:40:15.460 --> 01:40:18.450 So str.lower is probably what I want, because I 01:40:18.450 --> 01:40:21.060 am interested at the moment in lower casing strings. 01:40:21.060 --> 01:40:25.258 When I click on that, this is an example of what Python's documentation tends 01:40:25.258 --> 01:40:25.800 to look like. 01:40:25.800 --> 01:40:27.340 It's in this general format. 01:40:27.340 --> 01:40:29.340 Here's my str.lower function. 
01:40:29.340 --> 01:40:31.540 This returns a copy of the string, with all 01:40:31.540 --> 01:40:33.750 of the cased characters converted to lowercase, 01:40:33.750 --> 01:40:35.670 and the lower-casing algorithm, dot dot dot. 01:40:35.670 --> 01:40:37.168 So that doesn't give me much. 01:40:37.168 --> 01:40:38.460 It doesn't give me sample code. 01:40:38.460 --> 01:40:40.210 But it does say what the function does. 01:40:40.210 --> 01:40:43.890 And if we keep looking, you'll see mention of Lstrip, which is left strip. 01:40:43.890 --> 01:40:48.120 I used its analog, Rstrip before, right strip, which allows you to remove, 01:40:48.120 --> 01:40:51.000 that is strip, from the end of a string, something like white space, 01:40:51.000 --> 01:40:52.930 like a new line, or even something else. 01:40:52.930 --> 01:40:56.410 And if you scroll through string, this web page here. 01:40:56.410 --> 01:40:58.110 And we're halfway down the page already. 01:40:58.110 --> 01:41:00.180 If you see my scroll bar, tiny on the right, 01:41:00.180 --> 01:41:05.250 there's a huge amount of functionality built into string objects, here. 01:41:05.250 --> 01:41:08.460 And this is just testament to just how rich the language itself is. 01:41:08.460 --> 01:41:12.620 But it's also reason to reassure that the goal, when 01:41:12.620 --> 01:41:14.870 playing around with some new language and learning it, 01:41:14.870 --> 01:41:16.598 is not to learn it exhaustively. 01:41:16.598 --> 01:41:18.390 Just like in English or any human language, 01:41:18.390 --> 01:41:20.640 there's always going to be vocab words you don't know, 01:41:20.640 --> 01:41:23.563 ways of presenting the same information in some language. 01:41:23.563 --> 01:41:25.230 That's going to be the case with Python. 01:41:25.230 --> 01:41:28.620 And what we'll do today and this week in problem set 6 is really 01:41:28.620 --> 01:41:30.120 get your footing with this language. 
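Since the documentation page itself shows no sample code, here is a small sketch of the three string methods mentioned, str.lower, str.rstrip, and str.lstrip. The sample text is made up for illustration.

```python
line = "  Hello, World\n"

# lower() returns a lowercased copy; the original string is unchanged
lowered = line.lower()

# rstrip() removes trailing whitespace, including the newline
no_trailing = line.rstrip()

# lstrip() removes leading whitespace instead
no_leading = line.lstrip()

print(repr(lowered))
print(repr(no_trailing))
print(repr(no_leading))
```

Each method returns a new string, so the value stored in line stays as it was.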
01:41:30.120 --> 01:41:33.300 But you won't know all of Python, just like you won't know all of C. 01:41:33.300 --> 01:41:36.300 And, honestly, you won't know all of any of these languages on your own, 01:41:36.300 --> 01:41:38.800 unless you're, perhaps, using them full time professionally, 01:41:38.800 --> 01:41:42.370 and even then, there's more libraries than one might even retain themselves. 01:41:42.370 --> 01:41:45.420 So let's actually now pivot to a few other ideas, 01:41:45.420 --> 01:41:47.560 that we'll implement in Python, in a moment. 01:41:47.560 --> 01:41:50.010 Let me switch back over to VS Code here. 01:41:50.010 --> 01:41:55.260 And let me whip up, say, a recreation of our scores example from week two, 01:41:55.260 --> 01:41:57.883 where we averaged like three scores together. 01:41:57.883 --> 01:42:00.300 And that was an opportunity in week 2 to play with arrays, 01:42:00.300 --> 01:42:02.430 to realize how constrained arrays are. 01:42:02.430 --> 01:42:03.720 They can't grow or shrink. 01:42:03.720 --> 01:42:05.040 You have to decide in advance. 01:42:05.040 --> 01:42:07.110 But let's see what's different here in Python. 01:42:07.110 --> 01:42:11.580 So let me do Scores.py, and let me give myself an array in Python 01:42:11.580 --> 01:42:15.780 called scores, sorry, let me give myself a variable in Python called scores. 01:42:15.780 --> 01:42:17.940 Set it equal to a list of three scores, which 01:42:17.940 --> 01:42:22.560 are the same ones we've used before, 72, 73, 33, in this context 01:42:22.560 --> 01:42:24.630 meant to be scores, not ASCII values. 01:42:24.630 --> 01:42:26.520 And then let's just do the average of these. 01:42:26.520 --> 01:42:28.630 So average will be another variable. 01:42:28.630 --> 01:42:32.910 And it turns out I can do, well, how did I sum these before? 01:42:32.910 --> 01:42:36.580 I probably had a for loop to add one, then I knew how long they were. 
01:42:36.580 --> 01:42:39.580 Turns out in Python, you can just say sum of scores 01:42:39.580 --> 01:42:41.530 divided by the length of scores. 01:42:41.530 --> 01:42:43.130 That's going to give me my average. 01:42:43.130 --> 01:42:46.210 So sum is a function that takes a list, in this case, as input, 01:42:46.210 --> 01:42:49.000 and it just does the sum for you, with a for loop or whatever 01:42:49.000 --> 01:42:49.930 underneath the hood. 01:42:49.930 --> 01:42:53.480 len gives you the length of the list, how many things are in it. 01:42:53.480 --> 01:42:55.240 So I can dynamically figure that out. 01:42:55.240 --> 01:43:00.340 Now let me go ahead and print out, using print, the word average, and then, 01:43:00.340 --> 01:43:03.628 in curly braces, the actual average, close quote. 01:43:03.628 --> 01:43:05.920 All right, so let's run this code, Python of Scores.py. 01:43:05.920 --> 01:43:11.050 And there is my average, in this case, 59.33333 and so forth, 01:43:11.050 --> 01:43:12.310 based on the math. 01:43:12.310 --> 01:43:14.500 Well, let's actually, now, change this a little bit 01:43:14.500 --> 01:43:17.625 and make it a little more interesting, and actually get input from the user 01:43:17.625 --> 01:43:19.190 rather than hard coding this. 01:43:19.190 --> 01:43:22.568 Let me go back up here and use from cs50 import get_int, 01:43:22.568 --> 01:43:25.360 because I don't want to deal with all the exceptions and the loops. 01:43:25.360 --> 01:43:27.820 Like, I just want to use someone else's function here. 01:43:27.820 --> 01:43:31.600 Let me give myself an empty list called scores. 01:43:31.600 --> 01:43:34.480 And this is not something we were able to do in C, right? 01:43:34.480 --> 01:43:36.610 Because in C, if you tried to make an empty array, 01:43:36.610 --> 01:43:39.590 well, that's pretty stupid, because you can't add things to it. 01:43:39.590 --> 01:43:40.910 It's a fixed size. 01:43:40.910 --> 01:43:42.650 So it wouldn't even let you do that.
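The hard-coded version of the scores program described above can be sketched as follows. The variable names follow the lecture; the exact wording of the printed label is an assumption.

```python
# A list of three scores (meant as scores here, not ASCII values)
scores = [72, 73, 33]

# sum() adds up the list; len() reports how many items it holds
average = sum(scores) / len(scores)

# The f-string substitutes the value of average inside the curly braces
print(f"Average: {average}")
```

Because / always produces a float in Python 3, there is no need for the casting that the C version required.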
01:43:42.650 --> 01:43:45.640 But I can just create an empty list in Python, 01:43:45.640 --> 01:43:48.340 because lists, unlike arrays, are really lengthless. 01:43:48.340 --> 01:43:49.750 They'll grow and shrink. 01:43:49.750 --> 01:43:52.870 But you and I are not dealing with all the pointers underneath the hood. 01:43:52.870 --> 01:43:54.770 Python's doing that for us. 01:43:54.770 --> 01:43:58.435 So now, let's go ahead and get a whole bunch of scores from the user. 01:43:58.435 --> 01:43:59.810 How about three of them in total. 01:43:59.810 --> 01:44:05.350 So for i in range of 3, let's go ahead and grab a score from the user, 01:44:05.350 --> 01:44:07.810 using get_int, asking them for score. 01:44:07.810 --> 01:44:14.840 And then let's go ahead and append, to the scores list, that particular score. 01:44:14.840 --> 01:44:17.200 So it turns out that a list, and I could read the Python 01:44:17.200 --> 01:44:21.280 documentation to confirm as much, lists have a function built into them, 01:44:21.280 --> 01:44:25.155 and functions built into objects are generally known as methods, 01:44:25.155 --> 01:44:26.530 if you've heard that term before. 01:44:26.530 --> 01:44:29.320 Same idea, but whereas a function kind of stands on its own, 01:44:29.320 --> 01:44:33.430 a method is a function built into an object, like a list here. 01:44:33.430 --> 01:44:35.917 That's going to achieve the same result. Strictly speaking, 01:44:35.917 --> 01:44:37.000 I don't need the variable. 01:44:37.000 --> 01:44:40.603 Just like in C, I could tighten this up and do something like this as well. 01:44:40.603 --> 01:44:42.520 But, I don't know, I kind of like it this way. 01:44:42.520 --> 01:44:45.970 It's more clear, to me, at least, that what I'm doing here, getting the score 01:44:45.970 --> 01:44:47.838 and then appending it to the list. 01:44:47.838 --> 01:44:49.630 Now the rest of the code can stay the same.
01:44:49.630 --> 01:44:54.700 Python of Scores.py, score will be 72, 73, 33. 01:44:54.700 --> 01:44:55.820 And I get back the math. 01:44:55.820 --> 01:44:58.840 But now the program's a little more dynamic, which is nice. 01:44:58.840 --> 01:45:00.940 But there's other syntax I could use here. 01:45:00.940 --> 01:45:04.330 Just so you've seen it, Python does have some neat syntactic tricks, 01:45:04.330 --> 01:45:06.850 whereby, if you don't want to do scores.append, 01:45:06.850 --> 01:45:11.290 you can actually say scores plus equals this score. 01:45:11.290 --> 01:45:15.730 So you can actually concatenate lists together in Python, too. 01:45:15.730 --> 01:45:18.340 Just as we used plus to join two strings together, 01:45:18.340 --> 01:45:21.400 you can use plus to join two lists together. 01:45:21.400 --> 01:45:24.040 The catch is, you need to put the one score I'm 01:45:24.040 --> 01:45:26.770 adding here in a list of its own, which is kind of silly. 01:45:26.770 --> 01:45:31.330 But it's necessary, so that this thing and this thing are both lists. 01:45:31.330 --> 01:45:33.970 To do this more verbosely, which most programmers wouldn't 01:45:33.970 --> 01:45:36.310 do, but just for clarity, this is the same thing 01:45:36.310 --> 01:45:38.950 as saying scores equals scores plus this score. 01:45:38.950 --> 01:45:42.910 So now maybe it's a little more clear that scores and brackets score 01:45:42.910 --> 01:45:47.680 plural, sorry, singular, are both lists themselves, being concatenated 01:45:47.680 --> 01:45:48.860 or joined together. 01:45:48.860 --> 01:45:51.740 So two different ways, not sure one is better than the other. 01:45:51.740 --> 01:45:57.640 This way is pretty common, but .append is also quite reasonable as well. 01:45:57.640 --> 01:46:00.340 All right, how about another example from week two. 01:46:00.340 --> 01:46:03.070 This one was called uppercase. 01:46:03.070 --> 01:46:06.320 So let me do this in Uppercase.py, though, this time.
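The three ways of growing a list just described can be compared side by side. This is only a sketch: the list named entered stands in for the values a user would type (the lecture reads them with the CS50 library's get_int, which is left out here so the example runs without that library).

```python
# Stand-in for three values typed by the user (an assumption for this sketch)
entered = [72, 73, 33]

# 1. The append method, built into every list
scores_append = []
for score in entered:
    scores_append.append(score)

# 2. Augmented assignment: the single score must be wrapped in its own list
scores_plus_equals = []
for score in entered:
    scores_plus_equals += [score]

# 3. The more verbose spelling of the same concatenation
scores_plus = []
for score in entered:
    scores_plus = scores_plus + [score]

print(scores_append, scores_plus_equals, scores_plus)
```

All three lists end up with the same contents; writing scores += score without the brackets would raise a TypeError, because an int is not something a list can be extended with.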
01:46:06.320 --> 01:46:10.180 And let me import from cs50, get_string again. 01:46:10.180 --> 01:46:14.020 And let me go ahead and say, before will be my first variable. 01:46:14.020 --> 01:46:17.500 Let me get a string from the user, asking them for a before string. 01:46:17.500 --> 01:46:22.660 And then let me go ahead and say, after, just to demonstrate some changes, 01:46:22.660 --> 01:46:25.190 upper-casing to this string. 01:46:25.190 --> 01:46:27.850 Let me change my line ending to be that, using our new trick. 01:46:27.850 --> 01:46:31.490 And this is where things get cool in Python, relatively speaking. 01:46:31.490 --> 01:46:35.050 If I want to iterate over all of the characters in a string, 01:46:35.050 --> 01:46:38.140 and print them out in uppercase, one way to do that would be this. 01:46:38.140 --> 01:46:46.032 For c in the before string, go ahead and print out c.uppercase, sorry, c.upper, 01:46:46.032 --> 01:46:49.240 but don't end the line yet, because I want to keep these all on the same line 01:46:49.240 --> 01:46:50.440 until I'm all done. 01:46:50.440 --> 01:46:51.490 So what am I doing? 01:46:51.490 --> 01:46:54.970 Python of Uppercase.py, let me type in Hello in all lowercase. 01:46:54.970 --> 01:46:57.010 I've just upper-cased the whole string. 01:46:57.010 --> 01:46:57.700 How? 01:46:57.700 --> 01:47:00.130 I first get string, calling it before. 01:47:00.130 --> 01:47:02.680 I then just print out some fluffy text that says after colon, 01:47:02.680 --> 01:47:04.840 and I get rid of the line ending, just so I can kind of line these up. 01:47:04.840 --> 01:47:06.632 Notice I hit the spacebar a couple of times 01:47:06.632 --> 01:47:08.620 just so letters line up to be pretty. 01:47:08.620 --> 01:47:10.780 For c in before, this is new.
01:47:10.780 --> 01:47:14.500 This is powerful in C, sorry, in Python, whereby 01:47:14.500 --> 01:47:17.590 you don't have to do like int i equals 0 and i less than this, 01:47:17.590 --> 01:47:22.310 you could just say, for c in the string in question, for c in before. 01:47:22.310 --> 01:47:25.510 And then here is just upper-casing that specific character, 01:47:25.510 --> 01:47:27.700 and making sure we don't output a new line too soon. 01:47:27.700 --> 01:47:29.920 But this is actually more work than I need to do. 01:47:29.920 --> 01:47:34.000 Based on what we've seen thus far, like from our agreement example, 01:47:34.000 --> 01:47:35.620 can I tighten this up further? 01:47:35.620 --> 01:47:40.340 Can I collapse lines 5 and 6, maybe even 7, all together? 01:47:40.340 --> 01:47:46.550 If the goal of this program is just to uppercase the before string, 01:47:46.550 --> 01:47:49.640 how might I do this? 01:47:49.640 --> 01:47:50.480 Yeah, in back. 01:47:50.480 --> 01:47:52.287 AUDIENCE: Would it be str.upper? 01:47:52.287 --> 01:47:54.620 DAVID J. MALAN: str.upper, yeah, so I could do something 01:47:54.620 --> 01:47:57.500 like this, after gets before.upper. 01:47:57.500 --> 01:47:59.750 So it's not str literally dot upper, str 01:47:59.750 --> 01:48:01.500 just represents the string in question. 01:48:01.500 --> 01:48:04.620 So it would be before.upper, but right idea otherwise. 01:48:04.620 --> 01:48:08.130 And so let me go ahead and just tweak my print statement a little bit. 01:48:08.130 --> 01:48:12.810 Let me just go ahead and print out the after variable here, after creating it. 01:48:12.810 --> 01:48:15.440 So this line is the same, I'm getting a string called before. 01:48:15.440 --> 01:48:18.530 I'm creating another variable called after, and, as you propose, 01:48:18.530 --> 01:48:21.960 I'm calling upper on the whole string, not one character at a time. 01:48:21.960 --> 01:48:22.460 Why? 01:48:22.460 --> 01:48:23.360 Because it's allowed.
01:48:23.360 --> 01:48:27.350 And, again, in Python, there aren't technically characters individually. 01:48:27.350 --> 01:48:28.760 There's only strings, anyway. 01:48:28.760 --> 01:48:30.600 So I might as well do them all at once. 01:48:30.600 --> 01:48:34.220 So if I rerun the code now, Python of Uppercase.py. 01:48:34.220 --> 01:48:39.080 Now I'll type in Hello in all lowercase, and, oh, so close, 01:48:39.080 --> 01:48:42.110 I think I can get rid of this override, because I'm 01:48:42.110 --> 01:48:45.510 printing the whole thing out at once, not character by character. 01:48:45.510 --> 01:48:49.880 So now if I type in Hello before, now I have an even tighter version 01:48:49.880 --> 01:48:52.080 of the program here. 01:48:52.080 --> 01:48:55.910 All right, any questions, then, on lists or on strings, 01:48:55.910 --> 01:49:01.240 and what this kind of function, upper, represents, with its docs. 01:49:01.240 --> 01:49:01.740 No? 01:49:01.740 --> 01:49:04.760 All right, so a couple other building blocks before we start. 01:49:04.760 --> 01:49:05.855 Oh. 01:49:05.855 --> 01:49:06.480 Where was that? 01:49:06.480 --> 01:49:08.010 AUDIENCE: To the right. 01:49:08.010 --> 01:49:10.050 DAVID J. MALAN: To the right, right. 01:49:10.050 --> 01:49:11.040 Yes, thank you. 01:49:11.040 --> 01:49:17.202 AUDIENCE: Could you write, very close to variable string, and then print upper, 01:49:17.202 --> 01:49:19.257 you start creating a variable upper. 01:49:19.257 --> 01:49:21.840 DAVID J. MALAN: Yes, do I have to create this variable, upper? 01:49:21.840 --> 01:49:22.590 No, I don't. 01:49:22.590 --> 01:49:24.870 I could actually tighten this up, and, if you really 01:49:24.870 --> 01:49:28.170 want to see something neat, inside of the curly braces, 01:49:28.170 --> 01:49:31.050 you don't have to just put the names of variables. 
01:49:31.050 --> 01:49:33.600 You can put a small amount of logic, so long 01:49:33.600 --> 01:49:36.780 as it doesn't start to look stupid and kind of overwhelmingly complex, such 01:49:36.780 --> 01:49:38.940 that it's sort of bad design at that point. 01:49:38.940 --> 01:49:40.540 I can tighten this up like this. 01:49:40.540 --> 01:49:44.610 And now we're in Python of Uppercase.py, writing Hello again. 01:49:44.610 --> 01:49:45.730 And that, too, works. 01:49:45.730 --> 01:49:47.280 But I would be careful about this. 01:49:47.280 --> 01:49:50.483 You want to resist the temptation of having like a long line of code that's 01:49:50.483 --> 01:49:53.400 inside the curly braces, because it's just going to be harder to read. 01:49:53.400 --> 01:49:55.890 But, absolutely, you could indeed do that, too. 01:49:55.890 --> 01:49:58.950 All right, how about command line arguments, which was one thing 01:49:58.950 --> 01:50:03.030 we introduced in week two also, so that we could actually have the ability 01:50:03.030 --> 01:50:06.750 to take input from the user, whoops. 01:50:06.750 --> 01:50:10.270 So we could actually take input from the user at the command line, 01:50:10.270 --> 01:50:13.210 so as to take literally command line arguments. 01:50:13.210 --> 01:50:16.020 These are a little different, but it follows the same paradigm. 01:50:16.020 --> 01:50:19.860 There's no main by default. And there's no Def main int 01:50:19.860 --> 01:50:26.050 argc char, or we called it string, argv by default. There's none of this. 01:50:26.050 --> 01:50:30.510 So if you want access to the argument vector, argv, you import it. 01:50:30.510 --> 01:50:35.100 And it turns out, there's another module in Python, or library in Python 01:50:35.100 --> 01:50:39.180 called sys, and you can import from sys this thing called argv. 01:50:39.180 --> 01:50:41.357 So same idea, different place. 01:50:41.357 --> 01:50:42.940 Now I'm going to go ahead and do this.
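The three versions of the uppercase program discussed above can be sketched like this. The string "hello" is a stand-in for what the user would type at the get_string prompt, and the "After:" label wording is an assumption.

```python
# Stand-in for the user's input (the lecture reads it with get_string)
before = "hello"

# Version 1: iterate over the string and upper-case one character at a time
print("After:  ", end="")
for c in before:
    print(c.upper(), end="")
print()

# Version 2: call upper() on the whole string at once
after = before.upper()
print(f"After:  {after}")

# Version 3: skip the variable and put the call inside the curly braces
print(f"After:  {before.upper()}")
```

All three print the same line. The third is the tightest, but as noted, anything much longer than a single method call inside the braces starts to hurt readability.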
01:50:42.940 --> 01:50:47.820 Let's write a program that just requires that the user types in a word, 01:50:47.820 --> 01:50:50.050 after the program's name, or none at all. 01:50:50.050 --> 01:50:56.670 So if the length of argv equals 2, let's go ahead and print out, how about, 01:50:56.670 --> 01:51:05.088 Hello comma argv bracket 1 close quote, else if they don't type two words 01:51:05.088 --> 01:51:08.130 total at the prompt, let's just say the default's, like we did weeks ago, 01:51:08.130 --> 01:51:09.160 Hello, world. 01:51:09.160 --> 01:51:12.180 So the only thing that's new here is we're importing argv from sys, 01:51:12.180 --> 01:51:15.450 and we're using this fancy f-string format, which kind of to your point, 01:51:15.450 --> 01:51:18.510 too, it's putting more complex logic in the curly braces. 01:51:18.510 --> 01:51:19.270 But that's OK. 01:51:19.270 --> 01:51:23.890 In this case, it's a list called argv, and we're getting bracket 1 from it. 01:51:23.890 --> 01:51:27.780 Let's do Python of Argv.py, Enter, Hello, world. 01:51:27.780 --> 01:51:31.480 What if I do Argv.py David at the command line. 01:51:31.480 --> 01:51:32.730 Now I get Hello, David. 01:51:32.730 --> 01:51:34.680 So there's one curiosity here. 01:51:34.680 --> 01:51:39.375 Python is not included in argv, whereas in C, dot 01:51:39.375 --> 01:51:41.940 slash whatever was the first thing. 01:51:41.940 --> 01:51:45.510 The analog in Python is that the name of your Python program 01:51:45.510 --> 01:51:49.800 is the first thing, in bracket 0, which is why David is in bracket 1, 01:51:49.800 --> 01:51:55.740 the word Python does not appear in the argv list, just to be clear. 01:51:55.740 --> 01:51:57.990 But otherwise, the idea of these arguments 01:51:57.990 --> 01:52:00.383 is exactly the same as before. 01:52:00.383 --> 01:52:02.550 And in fact, what you can do, which is kind of cool, 01:52:02.550 --> 01:52:05.730 is, because argv is a list, you can do things like this.
01:52:05.730 --> 01:52:10.890 For arg in argv, go ahead and print out each argument. 01:52:10.890 --> 01:52:12.990 So instead of using a for loop and i and all 01:52:12.990 --> 01:52:17.220 of this, if I do Python of argv Enter, it just writes the program's name. 01:52:17.220 --> 01:52:21.960 If I do Python of argv Foo, it puts Argv.py and Foo. 01:52:21.960 --> 01:52:26.520 If I do, sorry, if I do Foo and bar, those words all print out. 01:52:26.520 --> 01:52:28.770 If I do Foo bar baz, those print out too. 01:52:28.770 --> 01:52:31.830 And Foo and bar or baz are like a mathematician's x and y and z 01:52:31.830 --> 01:52:35.200 for computer scientists, when you just need some placeholder words. 01:52:35.200 --> 01:52:36.420 So this is just nice. 01:52:36.420 --> 01:52:40.020 It reads a little more like English, and a for loop is just much more concise, 01:52:40.020 --> 01:52:43.530 allows you to iterate very quickly when you want something like that. 01:52:43.530 --> 01:52:46.170 Suppose I only wanted the real words that the human typed 01:52:46.170 --> 01:52:47.250 after the program's name. 01:52:47.250 --> 01:52:50.460 Like, suppose I want to ignore Argv.py. 01:52:50.460 --> 01:52:53.640 I mean I could do something hackish like this. 01:52:53.640 --> 01:52:59.105 If arg equals Argv.py, I could just ignore, 01:52:59.105 --> 01:53:00.480 you know, let's invert the logic. 01:53:00.480 --> 01:53:02.530 I could do this, for instance. 01:53:02.530 --> 01:53:05.100 So if the arg does not equal the program name, 01:53:05.100 --> 01:53:07.890 then go ahead and print out the word. 01:53:07.890 --> 01:53:09.840 So I get Foo, bar, and baz only. 01:53:09.840 --> 01:53:14.400 Or, this is what's kind of neat about Python, too, let me undo that. 01:53:14.400 --> 01:53:18.400 And let me just take a slice of the array of the list instead.
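As a sketch of the argv program so far: the helper name greeting is invented here, and it takes the argument list as a parameter (rather than reading sys.argv directly) purely so the example can be run with sample lists.

```python
def greeting(argv):
    """Return the greeting the program would print for this argument list."""
    # argv[0] is the script's name, so one extra word means a length of 2
    if len(argv) == 2:
        return f"hello, {argv[1]}"
    else:
        return "hello, world"


# As if run as: python argv.py
print(greeting(["argv.py"]))

# As if run as: python argv.py David
print(greeting(["argv.py", "David"]))

# Because argv is a list, it can be iterated over directly
for arg in ["argv.py", "foo", "bar"]:
    print(arg)
```

In the real script, the list would come from `from sys import argv`; note that the word python itself never appears in it.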
01:53:18.400 --> 01:53:22.810 So it turns out, if argv is a list, I can actually say, 01:53:22.810 --> 01:53:27.060 you know what, go into that list, start at element 1, instead of 0, 01:53:27.060 --> 01:53:29.200 and then go all the way to the end. 01:53:29.200 --> 01:53:31.800 And we have not seen this syntax in C. But this 01:53:31.800 --> 01:53:34.410 is a way of slicing a list in Python. 01:53:34.410 --> 01:53:35.820 So now watch what happens. 01:53:35.820 --> 01:53:40.860 If I run Python of Argv.py, Foo bar baz Enter, 01:53:40.860 --> 01:53:44.730 I get only a subset of the list, starting at position 1, 01:53:44.730 --> 01:53:46.892 going all of the way to the end. 01:53:46.892 --> 01:53:48.600 And you can even do kind of the opposite. 01:53:48.600 --> 01:53:51.330 If, for whatever reason, you want to ignore the last element, 01:53:51.330 --> 01:53:57.030 you can say colon, we could say colon negative 1, 01:53:57.030 --> 01:53:59.560 and use a negative number, which we've not seen before, 01:53:59.560 --> 01:54:02.470 which slices off the end of the list, as well. 01:54:02.470 --> 01:54:06.000 So there's some syntactic tricks that tend to be powerful in Python, too, 01:54:06.000 --> 01:54:10.140 even if at first glance, you might not need them for typical things. 01:54:10.140 --> 01:54:12.798 All right, let's do one other example with exit, 01:54:12.798 --> 01:54:15.090 and then we'll start actually applying some algorithms, 01:54:15.090 --> 01:54:16.215 to make things interesting. 01:54:16.215 --> 01:54:20.470 So in one last program here, let's do Exit.py, just to do one more mechanic, 01:54:20.470 --> 01:54:22.210 before we introduce some algorithms. 01:54:22.210 --> 01:54:24.220 And let's do this. 01:54:24.220 --> 01:54:28.900 Let's, from sys, import argv. 01:54:28.900 --> 01:54:30.490 Let's now do this. 01:54:30.490 --> 01:54:33.200 Let's make sure the user gives me one command line argument.
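The two slices mentioned a moment ago can be sketched with a sample list standing in for sys.argv (as if the program had been run as python argv.py foo bar baz).

```python
# Stand-in for sys.argv after running: python argv.py foo bar baz
argv = ["argv.py", "foo", "bar", "baz"]

# Start at element 1 and go to the end, skipping the program's name
words = argv[1:]

# Stop before the last element; a negative index counts from the end
all_but_last = argv[:-1]

for arg in words:
    print(arg)
```

A slice produces a new list, so argv itself still has all four elements afterward.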
01:54:33.200 --> 01:54:39.580 So if the length of argv does not equal 2 in total, then let's go ahead 01:54:39.580 --> 01:54:42.790 and print out something like missing command line argument, 01:54:42.790 --> 01:54:44.590 just to explain what the problem is. 01:54:44.590 --> 01:54:47.380 And then let's do this. 01:54:47.380 --> 01:54:48.580 We can exit. 01:54:48.580 --> 01:54:50.710 But I'm going to use a better version of exit here. 01:54:50.710 --> 01:54:52.900 Let me import two functions from sys. 01:54:52.900 --> 01:54:57.040 Turns out the better way to do this is with sys.exit, because I can then exit 01:54:57.040 --> 01:54:59.993 specifically, too, with this exit code. 01:54:59.993 --> 01:55:02.410 Otherwise, down here, I'm going to go ahead and print out, 01:55:02.410 --> 01:55:06.818 something like Hello, comma argv bracket 1, same as before. 01:55:06.818 --> 01:55:08.360 And then I'm going to exit with zero. 01:55:08.360 --> 01:55:10.410 So, again, this was a subtle thing we introduced 01:55:10.410 --> 01:55:12.910 in week two, where you can actually have your programs exit, 01:55:12.910 --> 01:55:15.430 with some number, where 0 signifies success, 01:55:15.430 --> 01:55:17.350 and anything else signifies error. 01:55:17.350 --> 01:55:19.240 This is just the same idea in Python. 01:55:19.240 --> 01:55:23.920 So if I, for instance, just run the program like this, oops, I screwed up. 01:55:23.920 --> 01:55:26.620 I meant to say exit here and exit here. 01:55:26.620 --> 01:55:27.710 Let me do that again. 01:55:27.710 --> 01:55:30.500 If I run this like this, I'm missing a command line argument. 01:55:30.500 --> 01:55:33.200 So let me rerun it with like my name at the prompt. 01:55:33.200 --> 01:55:37.030 So I have exactly two command line arguments, the file name and my name, 01:55:37.030 --> 01:55:38.050 Hello comma David.
01:55:38.050 --> 01:55:40.342 And if I do David Malan, it's not going to work either, 01:55:40.342 --> 01:55:42.160 because now the length of argv does not equal 2. 01:55:42.160 --> 01:55:44.860 But the difference here is that we're exiting with 1, 01:55:44.860 --> 01:55:49.900 so that special programs can detect an error, or 0 in the event of success. 01:55:49.900 --> 01:55:52.180 And now there's one other way to do this, too. 01:55:52.180 --> 01:55:54.460 Suppose that you're importing a lot of functions, 01:55:54.460 --> 01:55:56.943 and you don't really want to make a mess of things 01:55:56.943 --> 01:55:59.110 and just have all of these function names available, 01:55:59.110 --> 01:56:01.630 without it being clear where they came from. 01:56:01.630 --> 01:56:03.460 Let's just import all of sys. 01:56:03.460 --> 01:56:07.180 And let's just change our syntax, kind of like I proposed for CS50, 01:56:07.180 --> 01:56:09.970 where we just prepend to all of these library functions, 01:56:09.970 --> 01:56:13.420 sys, just to be super-explicit where they came from, 01:56:13.420 --> 01:56:18.837 and if there's another exit or argv value 01:56:18.837 --> 01:56:21.920 that we want to import from a library, this is one way to avoid collision. 01:56:21.920 --> 01:56:25.150 So if I do it one last time here, missing command line argument. 01:56:25.150 --> 01:56:27.190 But David still actually worked. 01:56:27.190 --> 01:56:30.250 All right, only to demonstrate how we can implement that same idea. 01:56:30.250 --> 01:56:33.130 Let's now do something more powerful, like a search algorithm, 01:56:33.130 --> 01:56:34.032 like binary search. 01:56:34.032 --> 01:56:36.490 I'm going to go ahead and open up a file called Numbers.py, 01:56:36.490 --> 01:56:40.420 and let's just do some searching or linear search, rather, 01:56:40.420 --> 01:56:42.440 on a list of numbers. 01:56:42.440 --> 01:56:44.060 Let's go ahead and do this. 01:56:44.060 --> 01:56:47.050 How about import sys as before.
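The exit-code logic of Exit.py can be sketched like this. To keep the example safe to run anywhere, the status is returned from a hypothetical `main` function instead of calling `sys.exit` in the middle of the script as the lecture does; that restructuring and the exact message wording are assumptions.

```python
import sys


def main(argv):
    """Return 0 on success and 1 on error, mirroring the lecture's exit codes."""
    if len(argv) != 2:
        print("Missing command-line argument")
        return 1
    print(f"hello, {argv[1]}")
    return 0


# In a real script, the last line would hand the status to the shell:
#     sys.exit(main(sys.argv))
# Here the function is called on hard-coded lists instead.
print(main(["exit.py"]))
print(main(["exit.py", "David"]))
print(main(["exit.py", "David", "Malan"]))
```

Writing `import sys` and then `sys.argv` and `sys.exit` keeps it explicit where each name comes from, which is the collision-avoidance point made above.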
01:56:47.050 --> 01:56:52.840 Let me give myself a list of numbers, like 4, 6, 8, 2, 7, 5, 0, 01:56:52.840 --> 01:56:54.670 so just a bunch of integers. 01:56:54.670 --> 01:56:56.170 And then let's do this. 01:56:56.170 --> 01:56:59.590 If you recall from week three, we searched for the number 0 01:56:59.590 --> 01:57:01.880 at the end of the lockers on stage. 01:57:01.880 --> 01:57:04.120 So let's just ask that question in Python. 01:57:04.120 --> 01:57:05.860 No need for a loop or anything like that. 01:57:05.860 --> 01:57:09.550 If 0 is in the numbers, go ahead and print out found. 01:57:09.550 --> 01:57:13.420 And then let's just exit successfully, with 0, else, if we get down here, 01:57:13.420 --> 01:57:15.670 let's just say print not found. 01:57:15.670 --> 01:57:19.210 And then we'll sys.exit with 1. 01:57:19.210 --> 01:57:21.820 So this is where Python starts to get powerful again. 01:57:21.820 --> 01:57:23.050 Here's your list. 01:57:23.050 --> 01:57:25.733 Here is your loop, that's doing all of the checking for you. 01:57:25.733 --> 01:57:28.150 Underneath the hood, Python is going to use linear search. 01:57:28.150 --> 01:57:29.817 You don't have to implement it yourself. 01:57:29.817 --> 01:57:32.320 No while loop, no for loop, you just ask a question. 01:57:32.320 --> 01:57:36.230 If 0 is in numbers, then do the following. 01:57:36.230 --> 01:57:38.350 So that's one feature we now get with Python, 01:57:38.350 --> 01:57:40.340 and get to throw away a lot of that code. 01:57:40.340 --> 01:57:41.830 We can do it with strings, too. 01:57:41.830 --> 01:57:44.840 Let me open a file called Names.py instead, 01:57:44.840 --> 01:57:46.990 and do something that was even more involved in C, 01:57:46.990 --> 01:57:50.020 because we needed strcmp and the for loop, and so forth. 01:57:50.020 --> 01:57:52.000 Let me import sys for this file. 01:57:52.000 --> 01:57:54.460 Let's give myself a bunch of names like we did in C.
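A minimal sketch of the Numbers.py idea follows. The helper `search` is an illustrative name, and it returns the message as a string so the example has no side effects; the lecture's version prints and then calls `sys.exit` with 0 or 1.

```python
numbers = [4, 6, 8, 2, 7, 5, 0]


def search(values, target):
    """Let the `in` operator do the scanning; no explicit loop is written."""
    if target in values:
        return "Found"
    return "Not found"


print(search(numbers, 0))
print(search(numbers, 9))
```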
01:57:54.460 --> 01:58:01.630 And those were Bill and Charlie and Fred and George and Ginny, 01:58:01.630 --> 01:58:05.440 and two more, Percy, and lastly Ron. 01:58:05.440 --> 01:58:07.390 And recall, at the time, we looked for Ron. 01:58:07.390 --> 01:58:09.432 And so we had to iterate through the whole thing, 01:58:09.432 --> 01:58:11.810 doing strcmp and i plus plus and all of that. 01:58:11.810 --> 01:58:18.760 Now just ask the question, if Ron is in names, then let's go ahead 01:58:18.760 --> 01:58:20.440 and, whoops, let me hide that. 01:58:20.440 --> 01:58:22.250 I hit the command too soon. 01:58:22.250 --> 01:58:26.180 Let me go ahead and say print, found, as before. 01:58:26.180 --> 01:58:29.710 sys.exit 0, just to indicate success, and then down here, 01:58:29.710 --> 01:58:32.840 if we get to this point, we can say not found. 01:58:32.840 --> 01:58:36.170 And then we'll just sys.exit 1 instead. 01:58:36.170 --> 01:58:40.960 So, again, this just does linear search for us by default, Python of Names.py, 01:58:40.960 --> 01:58:44.410 we found Ron, because, indeed, he's there, and at the end of the list. 01:58:44.410 --> 01:58:48.190 But we don't need to deal with all of the mechanics of it. 01:58:48.190 --> 01:58:50.530 All right, let's take things one step further. 01:58:50.530 --> 01:58:52.840 In week three, we also implemented the idea 01:58:52.840 --> 01:58:56.980 of a phone book, that actually associated keys with values. 01:58:56.980 --> 01:59:00.010 But remember, the phone book in C, was kind of a hack, right? 01:59:00.010 --> 01:59:03.520 Because we first had two arrays, one with names, one with numbers. 01:59:03.520 --> 01:59:07.330 Then we introduced structs, and so we gave you a person structure. 01:59:07.330 --> 01:59:10.900 And then we had an array of persons. 01:59:10.900 --> 01:59:15.040 You can do this in Python, using objects and things called classes.
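To make the point about the hidden mechanics concrete, the sketch below puts a hand-written linear search next to the `in` operator. The function `linear_search` is a hypothetical helper written for comparison, not code from the lecture.

```python
names = ["Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]


def linear_search(values, target):
    """Roughly what the C version did by hand: compare one element at a time."""
    for value in values:
        if value == target:  # == compares string contents, so no strcmp is needed
            return True
    return False


# Both approaches give the same answer; `in` just hides the loop.
print(linear_search(names, "Ron"), "Ron" in names)
print(linear_search(names, "Hermione"), "Hermione" in names)
```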
01:59:15.040 --> 01:59:17.670 But we can also just use a general purpose dictionary, 01:59:17.670 --> 01:59:21.420 because just like in P set 5, you can associate keys with values, using 01:59:21.420 --> 01:59:23.100 a hash table, or using a trie. 01:59:23.100 --> 01:59:26.400 Well, similarly, Python can just do this for us. 01:59:26.400 --> 01:59:29.250 From CS50, let's import get_string. 01:59:29.250 --> 01:59:32.760 And now let's give myself a dictionary of people, 01:59:32.760 --> 01:59:36.540 D-I-C-T, open paren, closed paren, gives you a dictionary. 01:59:36.540 --> 01:59:39.300 Or you can simplify the syntax, actually, 01:59:39.300 --> 01:59:42.360 and a dictionary again is just keys and values, words and definitions. 01:59:42.360 --> 01:59:45.060 You can also just use curly braces instead. 01:59:45.060 --> 01:59:47.020 That gives me an empty dictionary. 01:59:47.020 --> 01:59:50.400 But if I know what I want to put in it by default, let's put Carter in there, 01:59:50.400 --> 01:59:57.790 with a number of plus 1-617-495-1000, just like last time, and put myself, 01:59:57.790 --> 02:00:03.777 David, with plus 1-949-468-2750. 02:00:03.777 --> 02:00:06.360 And it came to my attention, tragically, after class that day, 02:00:06.360 --> 02:00:08.152 that we had a bug in our little Easter egg. 02:00:08.152 --> 02:00:11.190 If today, you would like to call me or text me, at that number, 02:00:11.190 --> 02:00:14.130 we have fixed the code that underlies that little Easter egg. 02:00:14.130 --> 02:00:15.090 Spoiler ahead. 02:00:15.090 --> 02:00:17.040 All right, so this now gives me a variable 02:00:17.040 --> 02:00:21.120 called people, that's associating keys with values. 02:00:21.120 --> 02:00:25.230 There is some new syntax here in Python, not just the curly braces, 02:00:25.230 --> 02:00:28.290 but the colons, and the quotes on the left and the right.
02:00:28.290 --> 02:00:31.380 This is a way, in Python, of associating keys 02:00:31.380 --> 02:00:35.350 with values, words with definitions, anything with anything else. 02:00:35.350 --> 02:00:38.550 And it's going to be a super-common paradigm, including in week seven, 02:00:38.550 --> 02:00:42.450 when we look at CSS and HTML and web programming, keys and values 02:00:42.450 --> 02:00:45.840 are like this omnipresent idea in computer science and programming, 02:00:45.840 --> 02:00:49.300 because it's just a really useful way of associating one thing with another. 02:00:49.300 --> 02:00:52.690 So, at this point in the story, we have a dictionary, a hash table, 02:00:52.690 --> 02:00:56.190 if you will, of people, associating names with phone numbers, 02:00:56.190 --> 02:00:57.675 just like a real world phone book. 02:00:57.675 --> 02:01:01.200 So let's write a program that gets a string from the user and asks them 02:01:01.200 --> 02:01:03.390 whose number they would like to look up. 02:01:03.390 --> 02:01:09.510 Then, let's go ahead and say, if that name is in the people dictionary, 02:01:09.510 --> 02:01:12.090 go ahead and print out that person's number, 02:01:12.090 --> 02:01:14.730 by going into the people dictionary and going 02:01:14.730 --> 02:01:19.480 to that specific name, within there, using an f-string for the whole thing. 02:01:19.480 --> 02:01:21.960 So this is similar in spirit to before. 02:01:21.960 --> 02:01:26.130 Linear search and dictionary lookups will just happen automatically for you 02:01:26.130 --> 02:01:29.280 in Python, by just asking the question, if name in people. 02:01:29.280 --> 02:01:31.170 And this line is just going to print out, 02:01:31.170 --> 02:01:35.710 whoever is in the people dictionary, at that name.
02:01:35.710 --> 02:01:40.200 So I'm using square brackets, because here's the interesting thing in Python, 02:01:40.200 --> 02:01:43.320 just like you can index into an array, or a list in Python, 02:01:43.320 --> 02:01:48.150 using numbers, 0, 1, 2, you can very conveniently index 02:01:48.150 --> 02:01:53.080 into a dictionary in Python, using square brackets, as well. 02:01:53.080 --> 02:01:56.070 And just to make clear what's going on here, let me go 02:01:56.070 --> 02:02:00.480 and create a temporary variable, person equals people bracket name. 02:02:00.480 --> 02:02:05.010 And then let's just, or, sorry, let's say, number equals people bracket name. 02:02:05.010 --> 02:02:07.890 And that will just print out the number in question. 02:02:07.890 --> 02:02:11.850 In C, and previously in Python, anything with square brackets like this 02:02:11.850 --> 02:02:16.950 would have been go to a location in a list or an array, using a number. 02:02:16.950 --> 02:02:20.790 But that can actually be a string, like a word the human has typed. 02:02:20.790 --> 02:02:22.830 And this is what's amazing about dictionaries, 02:02:22.830 --> 02:02:25.890 it's not like a big line, a big linear thing. 02:02:25.890 --> 02:02:28.740 It's this table, that you can look up in one column the name, 02:02:28.740 --> 02:02:31.060 and get back in the other column the number. 02:02:31.060 --> 02:02:33.120 So let's go ahead and run Python of Phonebook.py, 02:02:33.120 --> 02:02:38.100 found, not that, oh, wait. 02:02:38.100 --> 02:02:41.880 That's not what's supposed to happen at all. 02:02:41.880 --> 02:02:43.440 I think I'm in the wrong play. 02:02:43.440 --> 02:02:44.290 Phonebook.py. 02:02:47.130 --> 02:02:49.260 What's going on? 02:02:49.260 --> 02:02:51.720 Print found. 02:02:51.720 --> 02:02:53.580 I am confused. 02:02:53.580 --> 02:02:55.830 OK, let's run this again. 02:02:55.830 --> 02:02:59.970 Python of Phonebook.py, what the-- 02:02:59.970 --> 02:03:01.050 OK, stand by. 
02:03:07.026 --> 02:03:17.902 [KEYS CLICKING] 02:03:17.902 --> 02:03:19.140 What the heck? 02:03:19.140 --> 02:03:21.255 What am I not understanding here? 02:03:24.180 --> 02:03:27.348 OK, Roxanne, Carter, do you see what I'm doing wrong? 02:03:27.348 --> 02:03:29.220 AUDIENCE: I don't. 02:03:29.220 --> 02:03:31.484 DAVID J. MALAN: What the-- 02:03:31.484 --> 02:03:33.720 [LAUGHTER] 02:03:33.720 --> 02:03:34.230 Say again? 02:03:34.230 --> 02:03:38.110 SPEAKER 47: When you found the test results, it was doing both commands. 02:03:38.110 --> 02:03:43.390 DAVID J. MALAN: Oh, yeah, found, OK, we're going to do this. 02:03:43.390 --> 02:03:45.622 One sec. 02:03:45.622 --> 02:03:52.270 [KEYS CLICKING] 02:03:52.270 --> 02:03:55.360 Whoa, OK. 02:03:55.360 --> 02:03:57.270 All this is coming out of the video. 02:03:57.270 --> 02:03:58.228 So. 02:03:58.228 --> 02:03:59.164 [LAUGHTER] 02:03:59.164 --> 02:04:01.310 [APPLAUSE] 02:04:01.310 --> 02:04:01.810 Thanks. 02:04:05.400 --> 02:04:06.283 All right. 02:04:06.283 --> 02:04:08.200 I will try to figure out what was going wrong. 02:04:08.200 --> 02:04:10.800 The best I can tell, it was running the wrong program. 02:04:10.800 --> 02:04:12.820 I don't quite understand why. 02:04:12.820 --> 02:04:14.170 So we will diagnose this later. 02:04:14.170 --> 02:04:16.962 I just put the file into a temporary directory, for now, to run it. 02:04:16.962 --> 02:04:22.710 So let me go ahead and just run this, Python of Phonebook.py, 02:04:22.710 --> 02:04:24.240 type in, for instance, my name. 02:04:24.240 --> 02:04:26.418 And there's my corresponding number. 02:04:26.418 --> 02:04:27.960 Have no idea what was just happening. 02:04:27.960 --> 02:04:30.060 But I will get to the bottom of it and update you, 02:04:30.060 --> 02:04:31.360 if we can put our finger on it. 02:04:31.360 --> 02:04:34.890 So this was just an example, now, of implementing a phone book. 
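The dictionary-based phone book can be sketched as below. The lecture reads the name with `get_string` from the CS50 library; to stay self-contained, this sketch takes the name as a parameter of a hypothetical `lookup` function, and the exact output wording is an assumption.

```python
# Keys are names, values are phone numbers.
people = {
    "Carter": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(phonebook, name):
    """Index into the dictionary with a string key, much like a list with a number."""
    if name in phonebook:
        number = phonebook[name]
        return f"Number: {number}"
    return "Not found"


print(lookup(people, "David"))
print(lookup(people, "Roxanne"))
```

Checking `name in phonebook` first avoids the `KeyError` that `phonebook[name]` raises for a missing key.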
02:04:34.890 --> 02:04:37.590 Let's now consider what we can do that's a little more 02:04:37.590 --> 02:04:40.410 powerful, in these examples, like a phone book that 02:04:40.410 --> 02:04:42.150 actually keeps this information around. 02:04:42.150 --> 02:04:45.510 Thus far, these simple phone book examples throw the information away. 02:04:45.510 --> 02:04:48.780 But using CSV files, comma separated values, 02:04:48.780 --> 02:04:51.555 maybe we could actually keep around the names and numbers, 02:04:51.555 --> 02:04:53.430 so that, like on your phone, you can actually 02:04:53.430 --> 02:04:55.780 keep your contacts around long-term. 02:04:55.780 --> 02:04:59.060 So I'm going to go ahead now and do a slightly different example. 02:04:59.060 --> 02:05:03.240 And let me just hide this detail, so it's not confusing. 02:05:03.240 --> 02:05:06.630 Whoops, I'm going to change my prompt temporarily. 02:05:06.630 --> 02:05:10.540 So let me go ahead now and refine this example as follows. 02:05:10.540 --> 02:05:13.830 I'm going to go into Phonebook.py, and I'm 02:05:13.830 --> 02:05:16.290 going to import a whole library called CSV. 02:05:16.290 --> 02:05:18.150 And this is a powerful one, because Python 02:05:18.150 --> 02:05:21.870 comes with a library that just handles CSV files for you. 02:05:21.870 --> 02:05:25.600 A CSV file is just a file with comma separated values. 02:05:25.600 --> 02:05:29.580 And, in fact, to demonstrate this, let me check on one thing 02:05:29.580 --> 02:05:32.460 here, just to make this a little more real. 02:05:32.460 --> 02:05:39.010 To demonstrate this, let's go ahead and do this. 02:05:39.010 --> 02:05:41.970 Let me import the CSV library. From CS50, 02:05:41.970 --> 02:05:43.830 let me import get_string. 02:05:43.830 --> 02:05:47.550 Let me then open a file, using the open function, 02:05:47.550 --> 02:05:52.410 open a file called Phonebook.csv, in append format, 02:05:52.410 --> 02:05:54.900 in contrast with read format and write format.
02:05:54.900 --> 02:05:58.450 Write just blows it away if it exists, append adds to the bottom of it. 02:05:58.450 --> 02:06:00.930 So I keep this phone book around, just like you might 02:06:00.930 --> 02:06:02.868 keep adding contacts to your phone. 02:06:02.868 --> 02:06:05.410 Now let me go ahead and get a couple of values from the user. 02:06:05.410 --> 02:06:08.820 Let me say get_string and ask the user for a name. 02:06:08.820 --> 02:06:14.160 Then let me get_string again, and ask the user for their number. 02:06:14.160 --> 02:06:16.185 And now, let me go ahead and do this. 02:06:16.185 --> 02:06:18.060 And this is new, and this is Python-specific. 02:06:18.060 --> 02:06:20.820 And you would only know this by following a tutorial, 02:06:20.820 --> 02:06:22.480 or reading the documentation. 02:06:22.480 --> 02:06:24.870 Let me give myself a variable called writer, 02:06:24.870 --> 02:06:29.950 and ask the CSV library for a writer to that file. 02:06:29.950 --> 02:06:33.390 Then, let me go ahead and use that writer variable, 02:06:33.390 --> 02:06:36.720 use a function or a method inside of it, called writerow, 02:06:36.720 --> 02:06:41.200 to write out a list containing that person's name and number. 02:06:41.200 --> 02:06:44.310 Notice the square brackets inside the parentheses, 02:06:44.310 --> 02:06:49.350 because I'm just printing a list to that particular row in the file. 02:06:49.350 --> 02:06:51.100 And then I'm just going to close the file. 02:06:51.100 --> 02:06:52.742 So what is the effect of all of this? 02:06:52.742 --> 02:06:55.200 Well, let me go ahead and run this version of Phonebook.py, 02:06:55.200 --> 02:06:56.680 and I'm prompted for a name. 02:06:56.680 --> 02:07:05.130 Let's do Carter's first, plus 1-617-495-1000, and then, 02:07:05.130 --> 02:07:07.770 let's go ahead and LS.
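A sketch of the append-and-write steps described here, under a few assumptions: the name and number are passed to a hypothetical `add_contact` function instead of being read with `get_string`, the file lives in a temporary directory so the example cleans up after itself, and `newline=""` is passed to `open` as the `csv` module documentation recommends.

```python
import csv
import os
import tempfile


def add_contact(path, name, number):
    """Append one row to the CSV file, creating the file if it does not exist."""
    file = open(path, "a", newline="")  # "a" appends; "w" would overwrite
    writer = csv.writer(file)
    writer.writerow([name, number])  # one list becomes one comma-separated row
    file.close()


# Hypothetical location; the lecture writes phonebook.csv in the current directory.
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_contact(path, "Carter", "+1-617-495-1000")
add_contact(path, "David", "+1-949-468-2750")

# Read the file back to confirm that both rows persisted.
file = open(path, newline="")
rows = list(csv.reader(file))
file.close()
print(rows)
```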
02:07:07.770 --> 02:07:10.960 Notice in my current directory, there's two files now, Phonebook.py, 02:07:10.960 --> 02:07:14.430 which I wrote, and apparently Phonebook.csv. 02:07:14.430 --> 02:07:16.830 CSV just stands for comma separated values. 02:07:16.830 --> 02:07:20.380 And it's like a very simple way of storing data in a spreadsheet, 02:07:20.380 --> 02:07:23.670 if you will, where the comma represents the separation between your columns. 02:07:23.670 --> 02:07:26.370 There's only two columns here, name and number. 02:07:26.370 --> 02:07:29.580 But, because I'm writing to this file in append mode, 02:07:29.580 --> 02:07:33.220 let me run it one more time, Python of Phonebook.py, 02:07:33.220 --> 02:07:41.490 and let me go ahead and do David and plus 1-949-468-2750, Enter. 02:07:41.490 --> 02:07:43.350 And notice what happened in the CSV file. 02:07:43.350 --> 02:07:46.380 It automatically updated, because I'm now persisting 02:07:46.380 --> 02:07:49.000 this data to the file in question. 02:07:49.000 --> 02:07:51.360 So if I wanted to now read this file in, I 02:07:51.360 --> 02:07:55.680 could actually go ahead and do linear search on the data, 02:07:55.680 --> 02:07:58.650 using a read function to actually read from the CSV. 02:07:58.650 --> 02:08:01.350 But, for now, we'll just leave it a little simply as write. 02:08:01.350 --> 02:08:03.270 And let me make one refinement here. 02:08:03.270 --> 02:08:07.020 It turns out that, if you're in the habit of re-opening a file, 02:08:07.020 --> 02:08:09.330 you don't have to even close it explicitly. 02:08:09.330 --> 02:08:10.920 You can instead do this. 02:08:10.920 --> 02:08:16.050 You can instead say, with the opening of a file called Phonebook.csv 02:08:16.050 --> 02:08:21.300 in append mode, calling the thing file, go ahead and do all of these lines 02:08:21.300 --> 02:08:22.350 here. 02:08:22.350 --> 02:08:24.377 So the with keyword is a new thing in Python. 
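The same append step written with the `with` keyword might look like this. The hard-coded name and number stand in for the values the lecture reads with `get_string`, and the temporary path is an assumption made to keep the sketch self-contained.

```python
import csv
import os
import tempfile

# Hypothetical location; the lecture uses phonebook.csv in the current directory.
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")

# Values gathered before the file is opened, outside the with block.
name = "David"
number = "+1-949-468-2750"

with open(path, "a", newline="") as file:
    writer = csv.writer(file)
    writer.writerow([name, number])

# No explicit close() call: leaving the with block closed the file.
print(file.closed)

with open(path, newline="") as check:
    contents = list(csv.reader(check))
print(contents)
```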
02:08:24.377 --> 02:08:27.210 And it's used in a few different ways, but one of the ways it's used 02:08:27.210 --> 02:08:28.335 is to tighten up code here. 02:08:28.335 --> 02:08:30.418 And I'm going to move my variables to the outside, 02:08:30.418 --> 02:08:32.910 because they don't need to be inside of the with statement, 02:08:32.910 --> 02:08:33.868 where the file is open. 02:08:33.868 --> 02:08:36.452 This just has the effect of ensuring that you, the programmer, 02:08:36.452 --> 02:08:38.790 don't screw up, and accidentally don't close your file. 02:08:38.790 --> 02:08:40.680 In fact, you might recall, from C, Valgrind 02:08:40.680 --> 02:08:45.237 might have complained at you, if you had a file that, you didn't close a file, 02:08:45.237 --> 02:08:47.820 you might have had a memory leak as a result. The with keyword 02:08:47.820 --> 02:08:51.840 takes care of all of that for you, as well. 02:08:51.840 --> 02:08:54.670 How about let's do, want to do this. 02:08:54.670 --> 02:08:57.960 How about, let's do one other thing. 02:08:57.960 --> 02:08:59.230 Let's do this. 02:08:59.230 --> 02:09:02.280 Let me go ahead and propose, that on your phone or laptop 02:09:02.280 --> 02:09:07.470 here, or online, go to this URL here, where you'll find a Google form. 02:09:07.470 --> 02:09:10.290 And just to show that these CSVs are actually kind of omnipresent, 02:09:10.290 --> 02:09:11.850 and if you've ever like used a Google Form 02:09:11.850 --> 02:09:13.560 or managed a student group, or something where you've 02:09:13.560 --> 02:09:15.750 collected data via Google Forms, you can actually 02:09:15.750 --> 02:09:18.640 export all of that data via CSV files. 02:09:18.640 --> 02:09:21.150 So go ahead to this URL here. 02:09:21.150 --> 02:09:22.950 And those of you watching on demand later, 02:09:22.950 --> 02:09:24.540 will find that the form is no longer working, 02:09:24.540 --> 02:09:26.030 since we're only doing this live. 
02:09:26.030 --> 02:09:27.780 But that will lead to a Google Form that's 02:09:27.780 --> 02:09:30.750 going to let everyone input their answer to a question, 02:09:30.750 --> 02:09:33.660 like what house do you want to end up into, 02:09:33.660 --> 02:09:36.630 sort of an approximation of the sorting hat in Harry Potter. 02:09:36.630 --> 02:09:40.680 And via this form, will we then have the ability to export, 02:09:40.680 --> 02:09:43.780 we'll see, a CSV file. 02:09:43.780 --> 02:09:47.610 So let's give you a moment to do that. 02:09:47.610 --> 02:09:50.460 In just a moment, I'll share my version of the screen, which 02:09:50.460 --> 02:09:54.330 is going to let me actually open the file, the form itself. 02:09:54.330 --> 02:09:59.070 And in just a moment, I'll switch over. 02:09:59.070 --> 02:10:01.020 OK, so this is now my version of the form 02:10:01.020 --> 02:10:04.290 here, where we have 200 plus responses to a simple question of the form, what 02:10:04.290 --> 02:10:08.010 house do you belong in, Gryffindor, Hufflepuff, Ravenclaw, or Slytherin. 02:10:08.010 --> 02:10:12.800 If I go over to responses, I'll see all of the responses in the GUI form here. 02:10:12.800 --> 02:10:15.300 So graphical user interface, and we could flip through this. 02:10:15.300 --> 02:10:20.010 And it looks like, interestingly, 40% of Harvard students 02:10:20.010 --> 02:10:24.223 want to be in Gryffindor, 22% in Slytherin, and everyone else 02:10:24.223 --> 02:10:25.140 in between the others. 02:10:25.140 --> 02:10:27.270 But you might have noticed, if ever using a Google Form, 02:10:27.270 --> 02:10:28.720 this Google Spreadsheets link. 02:10:28.720 --> 02:10:30.010 So I'm going to go ahead and click that. 02:10:30.010 --> 02:10:32.460 And that's going to automatically open, in this case, Google Spreadsheets. 02:10:32.460 --> 02:10:35.290 But you can do the same thing with Office 365 as well. 02:10:35.290 --> 02:10:38.040 And now you see the raw data as a spreadsheet. 
02:10:38.040 --> 02:10:42.900 But in Google Spreadsheets, if I go to File and then I go to Download, 02:10:42.900 --> 02:10:46.800 notice I can download this as an Excel file, a PDF, and also 02:10:46.800 --> 02:10:48.910 a CSV, comma separated values. 02:10:48.910 --> 02:10:50.620 So let me go ahead and do that. 02:10:50.620 --> 02:10:53.920 That gives me a file in my Downloads folder on my computer. 02:10:53.920 --> 02:10:57.970 I'm going to now go back to my code editor here. 02:10:57.970 --> 02:11:00.180 And what I'm going to go ahead and do is upload 02:11:00.180 --> 02:11:04.320 this file, from my Downloads folder to VS Code, 02:11:04.320 --> 02:11:06.610 so that we can actually see it within here. 02:11:06.610 --> 02:11:08.220 And now you can see this open file. 02:11:08.220 --> 02:11:11.220 And I'm going to shorten its name, just so it's a little easier to read. 02:11:11.220 --> 02:11:15.990 I'm going to rename this using the MV command, to just Hogwarts.csv. 02:11:15.990 --> 02:11:19.367 And then we can see, in the file, that there's two columns, timestamp column 02:11:19.367 --> 02:11:21.450 house, where you have a whole bunch of time stamps 02:11:21.450 --> 02:11:24.270 when people filled out the form, with someone very early in class. 02:11:24.270 --> 02:11:25.980 And then everyone else just a moment ago. 02:11:25.980 --> 02:11:29.310 And the second value, after each comma, is the name of the house. 02:11:29.310 --> 02:11:32.040 Well, let me go ahead here and implement a program 02:11:32.040 --> 02:11:36.100 in a file called Hogwarts.py, that processes this data. 02:11:36.100 --> 02:11:38.280 So in Hogwarts.py, let's just write a program 02:11:38.280 --> 02:11:41.440 that now reads a CSV, in this case not a phone book, 02:11:41.440 --> 02:11:43.410 but everyone's sorting hat information. 02:11:43.410 --> 02:11:45.450 And I'm going to go ahead and Import CSV. 
02:11:45.450 --> 02:11:48.660 And suppose I want to answer a reasonable question, ignoring 02:11:48.660 --> 02:11:52.470 the fact that Google's GUI or graphical user interface, can do this for me. 02:11:52.470 --> 02:11:55.320 I just want to count up who's going to be in which house. 02:11:55.320 --> 02:11:59.640 So let me give myself a dictionary called houses, that's initially empty, 02:11:59.640 --> 02:12:00.780 with curly braces. 02:12:00.780 --> 02:12:02.790 And let me pre-create a few keys. 02:12:02.790 --> 02:12:07.500 Let me say Gryffindor is going to be initialized to 0, 02:12:07.500 --> 02:12:11.820 Hufflepuff will be initialized to 0 as well, Ravenclaw 02:12:11.820 --> 02:12:13.200 will be initialized to 0. 02:12:13.200 --> 02:12:16.770 And finally, Slytherin will be initialized to 0. 02:12:16.770 --> 02:12:19.950 So here's another example of a dictionary, or a hash table, 02:12:19.950 --> 02:12:22.140 just being a very general-purpose piece of data. 02:12:22.140 --> 02:12:23.760 You can have keys and values. 02:12:23.760 --> 02:12:25.470 The keys, in this case, are the houses. 02:12:25.470 --> 02:12:28.500 The values are initially zero, but I'm going to use this, 02:12:28.500 --> 02:12:33.600 instead of like four separate variables, to keep track of everyone's answer 02:12:33.600 --> 02:12:34.730 to this form. 02:12:34.730 --> 02:12:35.730 So I'm going to do this. 02:12:35.730 --> 02:12:43.180 With opening Hogwarts.csv, in read mode, not append, I don't want to change it. 02:12:43.180 --> 02:12:46.440 I just want to read it, as file as my variable name. 02:12:46.440 --> 02:12:49.530 Let's go ahead and create a reader this time, 02:12:49.530 --> 02:12:54.710 that is using the reader function in the CSV library, by opening that file. 02:12:54.710 --> 02:12:57.210 I'm going to go ahead and ignore the first line of the file, 02:12:57.210 --> 02:13:00.270 because, recall, that the first line is just timestamp and house. 
02:13:00.270 --> 02:13:01.450 I want to get the real data. 02:13:01.450 --> 02:13:03.540 So this next function is just a little trick 02:13:03.540 --> 02:13:06.730 for ignoring the first line of the file. 02:13:06.730 --> 02:13:07.800 Then let's do this. 02:13:07.800 --> 02:13:12.180 For every other row in the reader, that is line by line, 02:13:12.180 --> 02:13:15.420 get the current person's house, which is in row bracket 1. 02:13:15.420 --> 02:13:18.213 This is what the CSV reader library is doing for us. 02:13:18.213 --> 02:13:20.130 It's handling all of the reading of this file. 02:13:20.130 --> 02:13:23.760 It figures out where the comma is, and, for every row in the file, 02:13:23.760 --> 02:13:26.250 it hands you back a list of size 2. 02:13:26.250 --> 02:13:31.090 In bracket 0 is the time stamp, in bracket 1 is the house name. 02:13:31.090 --> 02:13:34.830 So, in my code, I can say house equals row bracket 1. 02:13:34.830 --> 02:13:36.970 I don't care about the time stamp for this program. 02:13:36.970 --> 02:13:41.070 And then let's go into my dictionary called houses, plural, index 02:13:41.070 --> 02:13:47.370 into it at the house location, by its name, and increment that 0 to 1. 02:13:47.370 --> 02:13:50.280 And now, at the end of this block of code, 02:13:50.280 --> 02:13:53.040 that has the effect of iterating over every line of the file, 02:13:53.040 --> 02:13:55.470 updating my dictionary in four different places, 02:13:55.470 --> 02:13:59.190 based on whether someone typed Gryffindor or Slytherin or anything 02:13:59.190 --> 02:13:59.700 else. 
02:13:59.700 --> 02:14:03.810 And notice that I'm using the name of the house to index into my dictionary, 02:14:03.810 --> 02:14:07.500 to essentially go up to this little cheat sheet and change the 0 to a 1, 02:14:07.500 --> 02:14:10.020 the 1 to a 2, the 2 to a 3, instead of having 02:14:10.020 --> 02:14:12.000 like four separate variables, which would just 02:14:12.000 --> 02:14:14.070 be much more annoying to maintain. 02:14:14.070 --> 02:14:16.290 Down at the bottom, let's just print out the results. 02:14:16.290 --> 02:14:19.620 For each house in those houses, iterating over 02:14:19.620 --> 02:14:21.750 the keys therein, by default in Python, 02:14:21.750 --> 02:14:24.630 let's go ahead and print out an f-string that says, 02:14:24.630 --> 02:14:29.460 the current house has the current count. 02:14:29.460 --> 02:14:35.070 And count will be the result of indexing into houses, for that given house. 02:14:35.070 --> 02:14:36.810 And let me close my quote. 02:14:36.810 --> 02:14:41.940 So let's run this to summarize the data, Hogwarts.py, 140 of you 02:14:41.940 --> 02:14:46.200 answered Gryffindor, 54 Hufflepuff, 72 Ravenclaw, and 80 of you Slytherin. 02:14:46.200 --> 02:14:48.570 And that's just my way now, with code, and this is, oh, 02:14:48.570 --> 02:14:52.227 my God, so much easier than C, to actually analyze data in this way. 02:14:52.227 --> 02:14:55.560 And one of the reasons that Python is so popular for data science and analytics, 02:14:55.560 --> 02:14:59.910 more generally, is that it's actually really easy to manipulate data, and run 02:14:59.910 --> 02:15:00.940 analytics like this. 02:15:00.940 --> 02:15:02.370 And let me clean this up slightly. 02:15:02.370 --> 02:15:05.160 It's a little annoying that I just have to know and trust 02:15:05.160 --> 02:15:10.410 that the house name is in bracket 1 and timestamp is in bracket 0. 02:15:10.410 --> 02:15:11.440 Let's clean this up.
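The counting program described up to this point can be sketched as follows. The rows in `SAMPLE` are made-up stand-ins for the exported form data, wrapped in `io.StringIO` so that no file is needed, and `count_houses` is an illustrative name.

```python
import csv
import io

# Made-up sample with the same two columns as the exported form.
SAMPLE = (
    "Timestamp,House\n"
    "10/13/2021 14:01:00,Gryffindor\n"
    "10/13/2021 14:01:05,Slytherin\n"
    "10/13/2021 14:01:09,Gryffindor\n"
    "10/13/2021 14:01:12,Ravenclaw\n"
)


def count_houses(file):
    """Tally one vote per row, using the house name as the dictionary key."""
    houses = {"Gryffindor": 0, "Hufflepuff": 0, "Ravenclaw": 0, "Slytherin": 0}
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for row in reader:
        house = row[1]  # row[0] is the timestamp, row[1] is the house
        houses[house] += 1
    return houses


counts = count_houses(io.StringIO(SAMPLE))
for house in counts:  # iterating over a dictionary yields its keys
    print(f"{house}: {counts[house]}")
```

With a real export, the `io.StringIO(SAMPLE)` argument would be replaced by a file opened in read mode.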
02:15:11.440 --> 02:15:16.530 There's something called a Dictionary Reader in the CSV library 02:15:16.530 --> 02:15:17.880 that I can use instead. 02:15:17.880 --> 02:15:22.470 Capital D, capital R, this means I can throw away this next thing, 02:15:22.470 --> 02:15:24.900 because what a dictionary reader does is it 02:15:24.900 --> 02:15:28.890 still returns to me every row from the file, one after the other, 02:15:28.890 --> 02:15:32.560 but it doesn't just give me a list of size 2 representing each row. 02:15:32.560 --> 02:15:33.960 It gives me a dictionary. 02:15:33.960 --> 02:15:39.000 And it uses, as the keys in that dictionary, timestamp and house, 02:15:39.000 --> 02:15:41.460 for every row in the file, which is just to say 02:15:41.460 --> 02:15:43.950 it makes my code a little more readable, because instead 02:15:43.950 --> 02:15:46.590 of doing this little trickery, bracket 1, 02:15:46.590 --> 02:15:49.500 I can say bracket, quote unquote, "House" with a capital H, 02:15:49.500 --> 02:15:52.360 because it's capitalized in the Google Form itself. 02:15:52.360 --> 02:15:54.798 So the code now is just minorly different, 02:15:54.798 --> 02:15:57.840 but it's way more resilient, especially if I'm using Google Spreadsheets, 02:15:57.840 --> 02:16:00.390 and I'm moving the columns around or doing something like that, 02:16:00.390 --> 02:16:01.973 where the numbers might get messed up. 02:16:01.973 --> 02:16:05.260 Now I can run this on Hogwarts.py again, and I get the same answers. 02:16:05.260 --> 02:16:09.960 But I now don't have to worry about where those individual columns are. 02:16:09.960 --> 02:16:14.880 All right, any questions on those capabilities there? 02:16:14.880 --> 02:16:17.400 And that's a teaser of sorts, for some of the manipulation 02:16:17.400 --> 02:16:19.620 we'll do in P set 6. 02:16:19.620 --> 02:16:23.555 All right, so some final examples and flair, to intrigue 02:16:23.555 --> 02:16:24.930 with what you can do with Python.
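A minimal sketch of the `csv.DictReader` version, under the same assumptions as before: the sample rows are invented and the column names `Timestamp` and `House` are taken from the description above. The columns are deliberately swapped in the sample to show that looking up `row["House"]` does not depend on column position.

```python
import csv
import io

# Made-up sample with the columns in the opposite order (an assumption for illustration).
SAMPLE = """House,Timestamp
Gryffindor,10/13/2021 9:00:01
Hufflepuff,10/13/2021 9:00:05
Gryffindor,10/13/2021 9:00:09
"""


def count_houses(text):
    """Count how many rows name each house, looking columns up by header."""
    counts = {"Gryffindor": 0, "Hufflepuff": 0, "Ravenclaw": 0, "Slytherin": 0}
    # DictReader consumes the header row itself, so no next(reader) is needed.
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        counts[row["House"]] += 1  # Each row is a dictionary keyed by column name.
    return counts


for house, count in count_houses(SAMPLE).items():
    print(f"{house}: {count}")
```

Because the header row supplies the keys, rearranging columns in the spreadsheet leaves this code unchanged, whereas `row[1]` would silently start reading the wrong column.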
02:16:24.930 --> 02:16:28.710 I'm going to actually switch over to a terminal window on my own Mac, 02:16:28.710 --> 02:16:31.900 so that I can actually use audio a little more effectively. 02:16:31.900 --> 02:16:33.930 So here's just a terminal window on Mac OS. 02:16:33.930 --> 02:16:37.950 I before class have preinstalled some additional Python libraries, 02:16:37.950 --> 02:16:40.379 that won't really work in VS Code in the cloud, 02:16:40.379 --> 02:16:43.535 because they require audio that the browser won't necessarily support. 02:16:43.535 --> 02:16:45.660 But I'm going to go ahead and write an example here 02:16:45.660 --> 02:16:49.559 that involves writing a speech-based program, that actually does something 02:16:49.559 --> 02:16:50.212 with speech. 02:16:50.212 --> 02:16:52.170 And I'm going to go ahead and import a library, 02:16:52.170 --> 02:16:55.709 that, again, I pre-installed, called Python text to speech, 02:16:55.709 --> 02:16:58.260 and I'm going to go ahead and, per its documentation, 02:16:58.260 --> 02:17:02.879 give myself a speech engine, by using that library's init function, 02:17:02.879 --> 02:17:04.080 for initialize. 02:17:04.080 --> 02:17:06.930 I'm then going to use this engine's say function 02:17:06.930 --> 02:17:09.180 to do something fun, like Hello, world. 02:17:09.180 --> 02:17:12.480 And then I'm going to go ahead and tell this engine to run and wait, 02:17:12.480 --> 02:17:13.855 while it says those words. 02:17:13.855 --> 02:17:15.480 All right, I'm going to save this file. 02:17:15.480 --> 02:17:16.980 I'm not using VS Code at the moment. 02:17:16.980 --> 02:17:20.070 I'm using another popular program that we used in CS50 back in my day, 02:17:20.070 --> 02:17:22.830 called Vim, which is a command line program that's 02:17:22.830 --> 02:17:24.790 just in this black and white window. 02:17:24.790 --> 02:17:28.849 Let me go ahead now and run Python of Speech.py, and-- 02:17:28.849 --> 02:17:30.745 COMPUTER: Hello, world.
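The speech example above might look something like the sketch below. It assumes the library being described is the third-party `pyttsx3` package, which must be installed separately and needs working audio. The `speak` wrapper and its optional `engine` parameter are hypothetical additions for illustration, so that the function can be defined and exercised even where the library is unavailable.

```python
def speak(text, engine=None):
    """Say `text` aloud using a text-to-speech engine.

    `engine` is a hypothetical hook for this sketch; by default a real
    pyttsx3 engine is created (assumes pyttsx3 is installed).
    """
    if engine is None:
        import pyttsx3  # Third-party; imported lazily so this file loads without it.
        engine = pyttsx3.init()  # Give ourselves a speech engine.
    engine.say(text)       # Queue the words to be spoken.
    engine.runAndWait()    # Block until the engine has finished speaking.
```

Calling `speak("hello, world")` on a machine with audio should produce the synthesized greeting; an f-string such as `speak(f"hello, {name}")` would personalize it.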
02:17:30.745 --> 02:17:33.120 DAVID J. MALAN: All right, so it's a little computerized, 02:17:33.120 --> 02:17:36.113 but it is speech that has been synthesized from this example. 02:17:36.113 --> 02:17:38.280 Let's change it a little bit to be more interesting. 02:17:38.280 --> 02:17:39.488 Let's do something like this. 02:17:39.488 --> 02:17:43.950 Let's ask the user for their name, like what's your name question mark. 02:17:43.950 --> 02:17:47.850 And then, let's use the little F string, and say, not Hello, world, 02:17:47.850 --> 02:17:50.010 but Hello to that person's name. 02:17:50.010 --> 02:17:54.270 Let me save my file, run Python of Speech.py, Enter. 02:17:54.270 --> 02:17:55.260 David. 02:17:55.260 --> 02:17:57.360 COMPUTER: Hello, David. 02:17:57.360 --> 02:17:59.639 DAVID J. MALAN: All right, so it pronounced my name OK, 02:17:59.639 --> 02:18:02.306 might struggle with different names, depending on the phonetics. 02:18:02.306 --> 02:18:03.570 But that one seemed to be OK. 02:18:03.570 --> 02:18:05.850 Let's do something else with Python, using similarly, 02:18:05.850 --> 02:18:07.780 just a few lines of code. 02:18:07.780 --> 02:18:12.540 Let me go into today's examples. 02:18:12.540 --> 02:18:18.330 And I'm going to go into a folder called Detect, whoops, a folder called 02:18:18.330 --> 02:18:19.680 Faces.py. 02:18:19.680 --> 02:18:20.790 Sorry, Faces. 02:18:20.790 --> 02:18:23.370 And in this folder, that I've written in advance, 02:18:23.370 --> 02:18:25.879 are a few files, Detect.py, Recognize.py, 02:18:25.879 --> 02:18:30.330 and two photos, Office.jpeg and Toby.jpeg. 02:18:30.330 --> 02:18:32.799 If you're familiar with the show, here, for instance, 02:18:32.799 --> 02:18:34.809 is the cast photo from The Office here. 02:18:34.809 --> 02:18:36.299 So here's a photo as input.
02:18:36.299 --> 02:18:38.639 Suppose I want to do something very Facebook-style, 02:18:38.639 --> 02:18:40.860 where I want to analyze all of the faces, 02:18:40.860 --> 02:18:42.870 or detect all of the faces in there. 02:18:42.870 --> 02:18:44.940 Well, let me go ahead and show you a program 02:18:44.940 --> 02:18:47.879 I wrote in advance, that's not terribly long. 02:18:47.879 --> 02:18:49.379 Much of it is actually comments. 02:18:49.379 --> 02:18:50.639 But let's see what I'm doing. 02:18:50.639 --> 02:18:54.000 I'm importing the Pillow library, again, to get access to images. 02:18:54.000 --> 02:18:57.480 I'm importing a library called face recognition, which I downloaded 02:18:57.480 --> 02:18:58.590 and installed in advance. 02:18:58.590 --> 02:19:00.129 But it does what it says. 02:19:00.129 --> 02:19:02.959 According to its documentation, you go into that library 02:19:02.959 --> 02:19:04.760 and you call a function called load image 02:19:04.760 --> 02:19:07.370 file, to load something like Office.jpeg, 02:19:07.370 --> 02:19:10.040 and then you can use the line of code like this. 02:19:10.040 --> 02:19:14.120 Call a function called face locations, passing the image as input, 02:19:14.120 --> 02:19:17.120 and you get back a list of all of the faces in the image. 02:19:17.120 --> 02:19:20.750 And then down here, a for loop, that iterates over all of those 02:19:20.750 --> 02:19:22.040 face locations. 02:19:22.040 --> 02:19:24.799 And inside of this loop, I just do a bit of trickery. 02:19:24.799 --> 02:19:29.580 I figure out the top, right, bottom, and left corners of those locations. 02:19:29.580 --> 02:19:31.940 And then, using these lines of code here, 02:19:31.940 --> 02:19:34.834 I'm using that image library, to just draw a box, essentially. 02:19:34.834 --> 02:19:35.959 And the code looks cryptic. 02:19:35.959 --> 02:19:38.150 Honestly, I would have to look this up to write it again.
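The detection steps described above could be sketched as follows. This is not the lecture's own file, only an approximation: it assumes the third-party `face_recognition` and Pillow packages are installed, and the function names `box_corners` and `draw_face_boxes`, the red outline, and the output path are hypothetical choices for illustration.

```python
def box_corners(location):
    """Convert a (top, right, bottom, left) face location into the
    [(left, top), (right, bottom)] pair that Pillow's rectangle() expects."""
    top, right, bottom, left = location
    return [(left, top), (right, bottom)]


def draw_face_boxes(input_path, output_path):
    """Draw a box around every detected face and return how many were found."""
    # Third-party libraries, imported lazily so the helper above works without them.
    import face_recognition
    from PIL import Image, ImageDraw

    image = face_recognition.load_image_file(input_path)     # Pixels as an array.
    face_locations = face_recognition.face_locations(image)  # One tuple per face.

    pil_image = Image.fromarray(image)
    draw = ImageDraw.Draw(pil_image)
    for location in face_locations:
        draw.rectangle(box_corners(location), outline="red", width=4)

    pil_image.save(output_path)
    return len(face_locations)
```

The "trickery" is mostly a change of coordinate order: the library reports top, right, bottom, left, while the drawing call wants the top-left and bottom-right corners as (x, y) pairs.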
02:19:38.150 --> 02:19:40.650 But per the documentation, this just draws a nice little box 02:19:40.650 --> 02:19:41.610 around the image. 02:19:41.610 --> 02:19:48.200 So let me go ahead and zoom out here, and run this now on Office.jpeg. 02:19:48.200 --> 02:19:53.390 All right, it's analyzing, analyzing, and you can see in the sidebar here, 02:19:53.390 --> 02:19:54.380 here's the original. 02:19:54.380 --> 02:19:59.180 And here is every face that my, what, 10 lines of Python code 02:19:59.180 --> 02:20:00.740 found, within that file. 02:20:00.740 --> 02:20:01.410 What's a face? 02:20:01.410 --> 02:20:04.190 Presumably the library is looking for something, 02:20:04.190 --> 02:20:07.100 maybe without a mask, that has two eyes, a nose, and a mouth, 02:20:07.100 --> 02:20:09.420 in some kind of arrangement, some kind of pattern. 02:20:09.420 --> 02:20:12.440 So it would seem pretty reliable, at least on these fairly easy-to-read 02:20:12.440 --> 02:20:13.370 faces here. 02:20:13.370 --> 02:20:15.660 What if we want to look for someone specific, 02:20:15.660 --> 02:20:17.180 for instance, someone that's always getting picked on. 02:20:17.180 --> 02:20:18.763 Well, we could do something like this. 02:20:18.763 --> 02:20:23.060 Recognize.py, which is taking two files as input, that image and the image 02:20:23.060 --> 02:20:24.620 of one person in particular. 02:20:24.620 --> 02:20:26.900 And if you're trying to find Toby in a crowd, 02:20:26.900 --> 02:20:29.570 here I conflated the program, sorry, this is the version that 02:20:29.570 --> 02:20:31.550 draws a box around the given face. 02:20:31.550 --> 02:20:33.680 Here we have Toby as identified. 02:20:33.680 --> 02:20:34.220 Why? 02:20:34.220 --> 02:20:38.450 Because that program, Recognize.py, has a few more lines of code, 02:20:38.450 --> 02:20:42.800 but long story short, it additionally loads as input Toby.jpeg, 02:20:42.800 --> 02:20:45.410 in order to recognize that specific face. 
02:20:45.410 --> 02:20:48.350 And that specific face is a completely different photo, 02:20:48.350 --> 02:20:52.970 but it looks similar enough to the person, that it all worked out OK. 02:20:52.970 --> 02:20:55.820 Let's do one other that's a little sensitive to microphones. 02:20:55.820 --> 02:21:00.650 Let me go into, how about my listen folder here, which is available 02:21:00.650 --> 02:21:01.610 online, too. 02:21:01.610 --> 02:21:04.380 And let's just run Python of Listen0.py. 02:21:04.380 --> 02:21:07.430 I'm going to type in like David. 02:21:07.430 --> 02:21:10.520 Oh, sorry, no, I'm going to-- 02:21:10.520 --> 02:21:11.150 Hello, world. 02:21:16.045 --> 02:21:17.420 Oh, no, that's the wrong version. 02:21:17.420 --> 02:21:19.250 [CHUCKLES] OK, I looked like an idiot. 02:21:19.250 --> 02:21:21.500 OK, hello, there we go. 02:21:21.500 --> 02:21:22.310 Hello to you, too. 02:21:22.310 --> 02:21:26.300 And if I say goodbye, I'm talking to my laptop like an idiot, OK. 02:21:26.300 --> 02:21:28.590 Now it's detecting what I'm saying here. 02:21:28.590 --> 02:21:32.130 So this first version of the program is just using some relatively simple, if 02:21:32.130 --> 02:21:36.472 elif elif, and it's just asking for input, forcing it to lowercase. 02:21:36.472 --> 02:21:38.430 And that was my mistake with the first example. 02:21:38.430 --> 02:21:41.360 And then, I'm just checking, is Hello in the user's words? 02:21:41.360 --> 02:21:42.818 Is how are you in the user's words? 02:21:42.818 --> 02:21:44.152 Didn't see that, but it's there. 02:21:44.152 --> 02:21:45.470 Is goodbye in the user's words? 02:21:45.470 --> 02:21:49.280 Now let's do a cooler version, using a library, just by looking at the effect. 02:21:49.280 --> 02:21:51.140 Python of Listen1.py. 02:21:51.140 --> 02:21:55.685 Hello, world. 02:21:55.685 --> 02:21:56.720 Huh. 02:21:56.720 --> 02:22:04.170 Let's do version 2 of this, that uses an audio speech-to-text library. 
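The first, text-only version of the listening program described above amounts to a chain of substring checks on lowercased input. The sketch below assumes the exact replies, which are not fully shown in the transcript, and wraps the logic in a hypothetical `respond` function so it can be tried without typing at a prompt.

```python
def respond(words):
    """Return a canned reply based on which phrase appears in `words`."""
    words = words.lower()  # Force lowercase so "Hello" and "hello" both match.
    if "hello" in words:
        return "Hello to you too!"
    elif "how are you" in words:
        return "I am well, thanks!"
    elif "goodbye" in words:
        return "Goodbye to you too!"
    else:
        return "Huh?"
```

In an interactive script this would be driven by something like `print(respond(input("Say something: ")))`. Note that the checks run in order, so a sentence containing both "hello" and "goodbye" gets the hello reply.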
02:22:04.170 --> 02:22:07.160 Hello, world. 02:22:07.160 --> 02:22:09.710 OK, so now it's artificial intelligence. 02:22:09.710 --> 02:22:11.810 Now let's do something a little more interesting. 02:22:11.810 --> 02:22:15.230 The third version of this program that actually analyzes the words that are 02:22:15.230 --> 02:22:16.880 said. 02:22:16.880 --> 02:22:18.800 Hello, world, my name is David. 02:22:18.800 --> 02:22:19.700 How are you? 02:22:22.760 --> 02:22:26.000 OK, so that time, it not only analyzed what I said, 02:22:26.000 --> 02:22:27.930 but it plucked my name out of it. 02:22:27.930 --> 02:22:30.480 Let's do two final examples. 02:22:30.480 --> 02:22:33.150 This one will generate a QR code. 02:22:33.150 --> 02:22:35.120 Let me go ahead and write a program called 02:22:35.120 --> 02:22:39.030 QR.py, that very simply does this. 02:22:39.030 --> 02:22:40.820 Let me import a library called OS. 02:22:40.820 --> 02:22:43.230 Let me import a library called QR code. 02:22:43.230 --> 02:22:48.000 Let me grab an image here, that's qrcode.make. 02:22:48.000 --> 02:22:51.440 And let me give you the URL of like a lecture video on YouTube, or something 02:22:51.440 --> 02:22:55.040 like that, with this ID. 02:22:55.040 --> 02:22:59.840 Let me just type this, so I don't get it wrong. 02:22:59.840 --> 02:23:05.300 OK, so if I now use this URL here, of a video on YouTube, making 02:23:05.300 --> 02:23:07.812 sure I haven't made any typos, I'm now going 02:23:07.812 --> 02:23:09.770 to go ahead and do two lines of code in Python. 02:23:09.770 --> 02:23:13.460 I'm going to first save that as a file called QR.png, which is 02:23:13.460 --> 02:23:15.490 a two dimensional barcode, a QR code. 02:23:15.490 --> 02:23:17.240 And, indeed, I'm going to use this format. 02:23:17.240 --> 02:23:23.790 And I'm going to use the os.system function to open QR.png automatically.
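A rough sketch of the QR code program described above, assuming the third-party `qrcode` package is installed. The URL is left as a parameter because the transcript does not show it, and the `save_qr` and `open_file` names and the optional `make` parameter are hypothetical additions for illustration.

```python
import os


def save_qr(url, path="qr.png", make=None):
    """Generate a QR code for `url`, save it to `path`, and return the path.

    `make` is a hypothetical hook for this sketch; by default the real
    qrcode.make function is used (assumes the qrcode package is installed).
    """
    if make is None:
        import qrcode  # Third-party; imported lazily so this file loads without it.
        make = qrcode.make
    img = make(url)   # Build the QR code image from the text of the URL.
    img.save(path)    # Write it out as an image file.
    return path


def open_file(path):
    """Open a file with the macOS `open` command (other systems differ)."""
    os.system(f"open {path}")
```

On a Mac, `open_file(save_qr("https://example.com/"))` would generate the image and display it; the example URL here is a placeholder, not the one used in lecture.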
02:23:23.790 --> 02:23:26.090 And if you'd like to take out your phone at this point, 02:23:26.090 --> 02:23:32.270 you can see the result of my barcode, that's just been dynamically generated. 02:23:32.270 --> 02:23:33.785 Hopefully from afar that will scan. 02:23:37.355 --> 02:23:40.150 [UPROAR] 02:23:40.150 --> 02:23:42.460 And I think that's an appropriate line to end on. 02:23:42.460 --> 02:23:43.860 So that's it for CS50. 02:23:43.860 --> 02:23:46.020 We will see you next time. 02:23:46.020 --> 02:23:47.820 [APPLAUSE] 02:23:47.820 --> 02:23:51.470 [MUSIC PLAYING]