WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:02.982 --> 00:00:06.461 [MUSIC PLAYING] 00:01:12.065 --> 00:01:13.210 DAVID MALAN: All right. 00:01:13.210 --> 00:01:18.700 This is CS50, and this is week six, wherein we finally transition 00:01:18.700 --> 00:01:20.935 from Scratch to C to, now, Python. 00:01:20.935 --> 00:01:22.735 And, indeed, this is going to be somewhat 00:01:22.735 --> 00:01:27.370 of a unique experience in that, just like a few weeks past-- 00:01:27.370 --> 00:01:30.605 perhaps, for the first time-- and now, today, you're 00:01:30.605 --> 00:01:31.855 going to learn a new language. 00:01:31.855 --> 00:01:35.935 But the goal isn't just to throw another fire hose of content and syntax 00:01:35.935 --> 00:01:39.568 and whatnot at you, but rather, to really equip you all to actually teach 00:01:39.568 --> 00:01:41.110 yourself new languages in the future. 00:01:41.110 --> 00:01:43.902 And so, indeed, what we'll do today, what we'll do this coming week 00:01:43.902 --> 00:01:46.580 is prepare you to stand on your own. 00:01:46.580 --> 00:01:48.527 And once Python is passe and the world has 00:01:48.527 --> 00:01:50.860 moved on to some other language in some number of years, 00:01:50.860 --> 00:01:52.568 you'll be well equipped to figure out how 00:01:52.568 --> 00:01:55.027 to wrap your mind around some new syntax, some new language 00:01:55.027 --> 00:01:56.280 and solve problems, as well. 00:01:56.280 --> 00:01:59.320 Now, you recall, in week zero, this is where we started-- 00:01:59.320 --> 00:02:01.390 just saying hello to the world. 00:02:01.390 --> 00:02:03.850 And that quickly escalated just a week later in C 00:02:03.850 --> 00:02:06.250 to be something much, much more cryptic. 00:02:06.250 --> 00:02:09.234 And if you've still struggled with some of the syntax, 00:02:09.234 --> 00:02:11.723 find yourself checking your notes or your previous code, 00:02:11.723 --> 00:02:12.640 that's totally normal. 
00:02:12.640 --> 00:02:16.675 And that's one of the reasons why there are languages besides C 00:02:16.675 --> 00:02:18.970 out there-- among them, this language called Python. 00:02:18.970 --> 00:02:21.520 Humans over the decades have realized, gee, 00:02:21.520 --> 00:02:25.167 that wasn't necessarily the best design decision, or humans have realized, wow, 00:02:25.167 --> 00:02:25.750 you know what? 00:02:25.750 --> 00:02:30.160 Now that computers have gotten faster with more memory and faster CPUs, 00:02:30.160 --> 00:02:33.070 we can actually do more with our programming languages. 00:02:33.070 --> 00:02:36.985 So just as human languages evolve, so do actual programming languages. 00:02:36.985 --> 00:02:40.810 And even within a programming language, there's typically different versions. 00:02:40.810 --> 00:02:43.870 We, for instance, have been using version C11 00:02:43.870 --> 00:02:46.720 of C, which was updated in 2011. 00:02:46.720 --> 00:02:50.800 But Python itself continues to evolve, and it's now up to version 3-plus. 00:02:50.800 --> 00:02:53.680 And so there, too, these things will evolve in the coming days. 00:02:53.680 --> 00:02:56.560 Thankfully, what you're about to see is "Hello, World!" 00:02:56.560 --> 00:02:59.440 for the third time, but it's going to be literally this. 00:02:59.440 --> 00:03:04.930 None of the crazy syntax above or below, fewer semicolons, if any, fewer 00:03:04.930 --> 00:03:05.770 curly braces. 00:03:05.770 --> 00:03:08.630 And, really, a lot of the distractions get out of the way. 00:03:08.630 --> 00:03:11.200 So to get there, let's consider exactly how 00:03:11.200 --> 00:03:13.000 we've been programming up until now. 00:03:13.000 --> 00:03:16.300 So you write a program in C and you've got, hopefully, 00:03:16.300 --> 00:03:19.135 no syntax error, so you're ready to build it-- that is, compile it. 00:03:19.135 --> 00:03:22.135 And so, you've run make, and then, you've run the program, like ./hello. 
00:03:22.135 --> 00:03:24.850 Or if you think back to week two, where we 00:03:24.850 --> 00:03:27.100 took a peek underneath the hood of what make is doing, 00:03:27.100 --> 00:03:29.710 it's really running the actual compiler-- 00:03:29.710 --> 00:03:32.800 something called clang-- maybe with some command-line arguments creating 00:03:32.800 --> 00:03:34.090 a program called hello. 00:03:34.090 --> 00:03:36.128 And then, you could do ./hello. 00:03:36.128 --> 00:03:38.920 So, today, you're going to start doing something similar in spirit, 00:03:38.920 --> 00:03:40.270 but fewer steps. 00:03:40.270 --> 00:03:42.270 No longer will you have to compile your code 00:03:42.270 --> 00:03:45.520 and then run it, and then, maybe, fix or change it, and then compile your code 00:03:45.520 --> 00:03:47.470 and run it, and then repeat, repeat. 00:03:47.470 --> 00:03:50.200 The process of running your code is going 00:03:50.200 --> 00:03:52.542 to be distilled into just a single step. 00:03:52.542 --> 00:03:54.250 And the way to think of this, for now, is 00:03:54.250 --> 00:03:58.420 that, whereas C is frequently used as, indeed, a compiled language whereby 00:03:58.420 --> 00:04:01.045 you convert it first to 0s and 1s, Python's 00:04:01.045 --> 00:04:04.400 going to let you speed things up whereby you, the human programmer, 00:04:04.400 --> 00:04:05.740 don't have to compile it. 00:04:05.740 --> 00:04:09.400 You're just going to run what's called an interpreter-- which, by design, 00:04:09.400 --> 00:04:12.190 is named the exact same thing as the language itself-- 00:04:12.190 --> 00:04:14.860 and by running this program installed in VS Code 00:04:14.860 --> 00:04:17.230 or, eventually, on your own Macs or PCs. 00:04:17.230 --> 00:04:20.320 This is just going to tell your computer to interpret this code 00:04:20.320 --> 00:04:23.800 and figure out how to get down to that lower level of 0s and 1s. 
00:04:23.800 --> 00:04:26.626 But you don't have to compile the code yourself anymore. 00:04:26.626 --> 00:04:31.000 So with that said, let's consider what the code is going to look like, 00:04:31.000 --> 00:04:31.690 side by side. 00:04:31.690 --> 00:04:33.850 In fact, let's look back at some Scratch blocks, 00:04:33.850 --> 00:04:36.582 just like we did with C in week one, and do some side by sides. 00:04:36.582 --> 00:04:39.040 Because even though some of the syntax this week and beyond 00:04:39.040 --> 00:04:42.705 is going to be different, the ideas are truly going to be the same. 00:04:42.705 --> 00:04:45.565 There's not all that much intellectually new just yet. 00:04:45.565 --> 00:04:48.190 So whereas, in week zero, we might have said hello to the world 00:04:48.190 --> 00:04:51.220 with this purple puzzle piece, today, of course-- 00:04:51.220 --> 00:04:56.080 or, rather, in week one, it looked like this in C. But today, moving forward, 00:04:56.080 --> 00:04:58.665 it's going to, quite simply, look like this instead. 00:04:58.665 --> 00:05:00.610 And if we go back and forth for just a moment, 00:05:00.610 --> 00:05:03.580 here, again, is the version in C, noticing 00:05:03.580 --> 00:05:05.500 the very C-like characteristics. 00:05:05.500 --> 00:05:09.200 And just at a glance here, in Python, I claim it's now this. 00:05:09.200 --> 00:05:13.190 What do you apparently need not worry about anymore? 00:05:13.190 --> 00:05:14.940 What's gone? 00:05:14.940 --> 00:05:15.990 So semi-colon is gone. 00:05:15.990 --> 00:05:19.073 And, indeed, you don't need those to finish most of your thoughts anymore. 00:05:19.073 --> 00:05:19.830 Anything else? 00:05:19.830 --> 00:05:20.860 AUDIENCE: Backslash n. 00:05:20.860 --> 00:05:22.690 DAVID MALAN: So the backslash n is absent. 00:05:22.690 --> 00:05:25.140 And that's curious because we're still going to get a new line, 00:05:25.140 --> 00:05:26.985 but we'll see that it's become the default. 
00:05:26.985 --> 00:05:29.402 And this one's a little more subtle, but now, the function 00:05:29.402 --> 00:05:31.185 is called print instead of printf. 00:05:31.185 --> 00:05:33.610 So it's a little more familiar in that sense. 00:05:33.610 --> 00:05:34.110 All right. 00:05:34.110 --> 00:05:37.050 So when it comes to using libraries-- that 00:05:37.050 --> 00:05:39.300 is, code that other people have written-- in the past, 00:05:39.300 --> 00:05:43.350 we've done things like #include cs50.h to use CS50's own header 00:05:43.350 --> 00:05:47.730 file or standard I/O or standard lib or string or any number of other header 00:05:47.730 --> 00:05:49.440 files you have all used. 00:05:49.440 --> 00:05:52.635 Moving forward, we're going to give you, for this first week, a similar CS50 00:05:52.635 --> 00:05:53.280 library-- 00:05:53.280 --> 00:05:55.920 just very short-term training wheels that we'll quickly 00:05:55.920 --> 00:05:59.370 take off because, in reality, it's a lot easier to do things in Python, 00:05:59.370 --> 00:06:00.267 as we'll see. 00:06:00.267 --> 00:06:02.100 But the syntax for this, now, is going to be 00:06:02.100 --> 00:06:05.165 to import the CS50 library in this way. 00:06:05.165 --> 00:06:08.452 And when we have, now, this ability, we can actually 00:06:08.452 --> 00:06:09.910 start writing some code right away. 00:06:09.910 --> 00:06:12.420 In fact, let me switch over to VS Code here. 00:06:12.420 --> 00:06:14.760 And just as in the past, I'll create a new file. 00:06:14.760 --> 00:06:17.230 But instead of creating something called .c, 00:06:17.230 --> 00:06:19.980 I'm going to go ahead and create my first program called hello.py, 00:06:19.980 --> 00:06:22.260 using code space hello dot py. 00:06:22.260 --> 00:06:24.000 That, of course, gives me this new tab. 
00:06:24.000 --> 00:06:28.185 And let me actually, quite simply, do what I proposed-- print, quote unquote, 00:06:28.185 --> 00:06:33.780 "Hello, world" without the \n, without the semicolon, without the f in print. 00:06:33.780 --> 00:06:36.270 And now, let me go down to my terminal window. 00:06:36.270 --> 00:06:37.792 And I don't have to compile it. 00:06:37.792 --> 00:06:39.000 I don't have to do dot slash. 00:06:39.000 --> 00:06:43.140 I, instead, run a program called python, whose purpose in life 00:06:43.140 --> 00:06:46.180 is, now, to interpret my code top to bottom, left to right. 00:06:46.180 --> 00:06:50.130 And if I run python of hello.py, crossing my fingers, as always-- 00:06:50.130 --> 00:06:51.000 voila. 00:06:51.000 --> 00:06:53.190 Now I have printed out "hello, world." 00:06:53.190 --> 00:06:56.460 So we seem to have gotten the new line for free, in the sense where 00:06:56.460 --> 00:06:57.735 it's automatically happening. 00:06:57.735 --> 00:06:59.880 The dollar sign isn't weirdly on the same line, 00:06:59.880 --> 00:07:02.220 like it once was in week one. 00:07:02.220 --> 00:07:04.493 But that's just a minor detail here. 00:07:04.493 --> 00:07:06.660 If we switch back to, now, some other capabilities-- 00:07:06.660 --> 00:07:09.780 well, indeed, with the CS50 library, you can also not 00:07:09.780 --> 00:07:12.795 just import the library itself, but specific functions. 00:07:12.795 --> 00:07:14.850 And you'll see that, temporarily, we're going 00:07:14.850 --> 00:07:19.080 to give you a helper function called get_string, just like in C, that just 00:07:19.080 --> 00:07:20.872 makes it work exactly the same way as in C. 00:07:20.872 --> 00:07:22.580 And we'll see a couple of other functions 00:07:22.580 --> 00:07:24.660 that will just make life easier, initially. 00:07:24.660 --> 00:07:26.910 But, quickly, we will take those training wheels off 00:07:26.910 --> 00:07:29.295 so that nothing is, indeed, CS50-specific. 
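A minimal sketch of the "new line for free" behavior described above. It relies on two real parameters of Python's built-in print: `end`, which defaults to a newline, and `file`. The helper name `render` and the use of `io.StringIO` to capture output are assumptions made only so the newline can be displayed; they are not from the lecture.

```python
import io


def render(*args, **kwargs):
    """Return exactly the text print would write, so the newline is visible."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()


# By default, print appends a newline after its output.
default_output = render("hello, world")

# Passing end="" suppresses that automatic newline.
no_newline = render("hello, world", end="")

# repr() shows the trailing "\n" (or its absence) explicitly.
print(repr(default_output))
print(repr(no_newline))
```

In hello.py itself, the single line `print("hello, world")` is all that is needed; the capture helper only exists to make the default visible.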
00:07:29.295 --> 00:07:29.970 All right. 00:07:29.970 --> 00:07:32.640 Well, how about functions, more generally, in Python? 00:07:32.640 --> 00:07:34.710 Let's do a whirlwind tour, if you will, much 00:07:34.710 --> 00:07:38.940 like we did in that first week of C, comparing one to the other. 00:07:38.940 --> 00:07:42.270 So back in our world of Scratch, one of the first programs we wrote 00:07:42.270 --> 00:07:45.360 was this one here, whereby we ask the human their name. 00:07:45.360 --> 00:07:49.110 We then used the return value that was automatically stored 00:07:49.110 --> 00:07:53.130 in this answer variable as a second argument 00:07:53.130 --> 00:07:56.265 to join so that we could say "Hello, David" or "Hello, Carter." 00:07:56.265 --> 00:07:59.340 So this was back in week zero. 00:07:59.340 --> 00:08:01.143 In week one, we converted it to this. 00:08:01.143 --> 00:08:03.810 And here is a perfect example of things like escalating quickly. 00:08:03.810 --> 00:08:05.910 And, again, this is why we start in Scratch. 00:08:05.910 --> 00:08:09.060 There's just so much distraction here to achieve the same idea. 00:08:09.060 --> 00:08:12.010 But even today, we're going to chip away at some of that syntax. 00:08:12.010 --> 00:08:17.940 So, in C, we had to declare the variable as a string, here. 00:08:17.940 --> 00:08:19.935 We, of course, had the semicolon and more. 00:08:19.935 --> 00:08:22.650 Well, in Python, the comparable code, now, 00:08:22.650 --> 00:08:26.100 is going to look, more simply, like this. 00:08:26.100 --> 00:08:29.250 So semicolon is, again, gone on both lines, for that matter. 00:08:29.250 --> 00:08:30.450 So that's good. 00:08:30.450 --> 00:08:33.100 What else appears to have changed or disappeared? 00:08:33.100 --> 00:08:33.600 Yeah. 00:08:33.600 --> 00:08:35.340 AUDIENCE: [? Do you have ?] the same type of variable? 00:08:35.340 --> 00:08:36.090 DAVID MALAN: Yeah. 
00:08:36.090 --> 00:08:39.419 So I didn't have to specifically say that answer is now a string. 00:08:39.419 --> 00:08:41.820 And, indeed, Python is dynamically typed. 00:08:41.820 --> 00:08:45.270 And, in fact, it will infer from context exactly what 00:08:45.270 --> 00:08:48.000 it is you are storing in that variable. 00:08:48.000 --> 00:08:50.775 Other details that seem a little bit different? 00:08:53.640 --> 00:08:54.607 A little bit different. 00:08:54.607 --> 00:08:55.940 What else jumps out at you here? 00:08:55.940 --> 00:08:56.482 I'll go back. 00:08:56.482 --> 00:08:58.690 This was the C version. 00:08:58.690 --> 00:09:01.570 And maybe focus, now, on the second line because we've rather 00:09:01.570 --> 00:09:02.740 exhausted the first. 00:09:02.740 --> 00:09:04.690 Here's, now, the Python version. 00:09:04.690 --> 00:09:05.720 What's different here? 00:09:05.720 --> 00:09:06.220 Yeah? 00:09:06.220 --> 00:09:08.845 AUDIENCE: You don't need to worry about %s or percent anything. 00:09:08.845 --> 00:09:10.930 You just have the variable after [? them. ?] 00:09:10.930 --> 00:09:11.680 DAVID MALAN: Yeah. 00:09:11.680 --> 00:09:12.820 There's no %s anymore. 00:09:12.820 --> 00:09:16.480 There's no second argument, at the moment, per se, to print. 00:09:16.480 --> 00:09:17.818 Now, it is still a little weird. 00:09:17.818 --> 00:09:20.485 It's as though I've deployed some addition here, arithmetically. 00:09:20.485 --> 00:09:21.860 But that's not the case. 00:09:21.860 --> 00:09:23.230 Some of you have programmed before. 00:09:23.230 --> 00:09:27.377 And plus, some of you might know, means what in this context? 00:09:27.377 --> 00:09:29.960 So to combine or, more technically-- anyone know the buzzword? 00:09:29.960 --> 00:09:30.390 Yeah. 00:09:30.390 --> 00:09:31.040 AUDIENCE: Concatenate. 00:09:31.040 --> 00:09:32.460 DAVID MALAN: To concatenate. 
00:09:32.460 --> 00:09:35.753 So to concatenate is the fancy way of what Scratch calls joining, 00:09:35.753 --> 00:09:38.420 which is to take one string on the left, one string on the right 00:09:38.420 --> 00:09:40.100 and to join them together. 00:09:40.100 --> 00:09:41.880 To glue them together, if you will. 00:09:41.880 --> 00:09:43.080 So this is not addition. 00:09:43.080 --> 00:09:45.080 It would be if it were numbers involved instead. 00:09:45.080 --> 00:09:46.413 But because we've got a string-- 00:09:46.413 --> 00:09:49.430 Hello comma-- and another string implicitly in this variable 00:09:49.430 --> 00:09:53.540 based on what the human typed in in response to this get_string function. 00:09:53.540 --> 00:09:58.130 That's going to concatenate Hello comma space, and then, David or Carter 00:09:58.130 --> 00:09:59.637 or whatever the human has typed in. 00:09:59.637 --> 00:10:02.720 But it turns out, there's going to be different ways to do this in Python. 00:10:02.720 --> 00:10:04.387 And we'll show you a few different ones. 00:10:04.387 --> 00:10:06.380 And here, too, try not to get too hung up 00:10:06.380 --> 00:10:09.255 on or frustrated by all of the different ways you can solve problems. 00:10:09.255 --> 00:10:12.130 Odds are, you're going to be picking up tips and techniques for years 00:10:12.130 --> 00:10:14.100 to come if you continue programming. 00:10:14.100 --> 00:10:16.710 So let's just give you a few of the possible ways. 00:10:16.710 --> 00:10:20.900 So here's a second way you could print out hello comma David or hello comma 00:10:20.900 --> 00:10:21.680 Carter. 00:10:21.680 --> 00:10:22.655 But what has changed? 00:10:22.655 --> 00:10:26.030 In the previous version, I used concatenation explicitly. 00:10:26.030 --> 00:10:28.445 And the space here is important, grammatically, 00:10:28.445 --> 00:10:30.485 just so we get that in the final phrase. 
00:10:30.485 --> 00:10:33.410 Now, I'm proposing to get rid of that space 00:10:33.410 --> 00:10:36.985 to add a comma outside of the double quotes, as well. 00:10:36.985 --> 00:10:39.020 But if you think back to C, this probably 00:10:39.020 --> 00:10:42.620 just means that print, similar in spirit to printf, 00:10:42.620 --> 00:10:45.200 can take not just one argument, but even two. 00:10:45.200 --> 00:10:47.510 And in fact, because of this comma in the middle that's 00:10:47.510 --> 00:10:50.390 outside of the double quotes, it's hello comma, 00:10:50.390 --> 00:10:52.655 and then, it will be automatically concatenated 00:10:52.655 --> 00:10:56.420 with-- even without using the plus, to whatever the value of answer is. 00:10:56.420 --> 00:10:59.630 And by default, just for grammatical prettiness, 00:10:59.630 --> 00:11:01.850 the print function always gives you a space 00:11:01.850 --> 00:11:05.120 for free in between each of the multiple arguments you pass in. 00:11:05.120 --> 00:11:07.290 We'll see how you can override that down the line. 00:11:07.290 --> 00:11:09.248 But, for now, that's just another way to do it. 00:11:09.248 --> 00:11:12.680 Now, perhaps the better, if slightly cryptic way to do this-- 00:11:12.680 --> 00:11:14.420 or just the increasingly common way-- 00:11:14.420 --> 00:11:18.290 is, probably, the third version, which looks a little weird, too. 00:11:18.290 --> 00:11:20.555 And, probably, the weirdness jumps out. 00:11:20.555 --> 00:11:24.060 We've suddenly introduced these curly braces, 00:11:24.060 --> 00:11:25.518 which I promised were mostly gone. 00:11:25.518 --> 00:11:26.060 And they are. 00:11:26.060 --> 00:11:29.270 But inside of this string here, I've done 00:11:29.270 --> 00:11:31.520 a curly brace, which might mean what? 00:11:31.520 --> 00:11:32.918 Just intuitively. 00:11:32.918 --> 00:11:35.210 And here is an example of how you learn a new language. 
00:11:35.210 --> 00:11:39.945 Just infer, from context, how Python probably works. 00:11:39.945 --> 00:11:40.820 What might this mean? 00:11:40.820 --> 00:11:41.320 Yeah? 00:11:41.320 --> 00:11:45.160 AUDIENCE: [INAUDIBLE] 00:11:45.160 --> 00:11:45.910 DAVID MALAN: Yeah. 00:11:45.910 --> 00:11:48.610 So this is an indication, because the curly braces-- 00:11:48.610 --> 00:11:50.740 because this was the way Python was designed-- 00:11:50.740 --> 00:11:55.340 that we want to plug in the value of answer, not literally A-N-S-W-E-R. 00:11:55.340 --> 00:11:59.688 And the fancy word here is that the answer variable will be interpolated-- 00:11:59.688 --> 00:12:01.480 that is, substituted with its actual value. 00:12:01.480 --> 00:12:04.435 But, but, but-- and this is actually weird-looking; 00:12:04.435 --> 00:12:06.820 this was introduced a few years ago to Python. 00:12:06.820 --> 00:12:11.230 What else did I have to change to make these curly braces work, apparently? 00:12:11.230 --> 00:12:11.935 Yeah? 00:12:11.935 --> 00:12:13.510 AUDIENCE: Drop the f before the-- 00:12:13.510 --> 00:12:14.260 DAVID MALAN: Yeah. 00:12:14.260 --> 00:12:15.160 There's this weird f. 00:12:15.160 --> 00:12:17.245 And so, it's like part of printf. 00:12:17.245 --> 00:12:20.950 But now, it's inside the parentheses there. 00:12:20.950 --> 00:12:22.945 This is just the way Python designed this. 00:12:22.945 --> 00:12:24.820 So a few years ago, when they introduced what 00:12:24.820 --> 00:12:30.070 are called format strings or fstrings, you literally prefix your quoted string 00:12:30.070 --> 00:12:32.080 with the letter f. 00:12:32.080 --> 00:12:34.570 And then, you can use trickery like this, 00:12:34.570 --> 00:12:36.640 like putting curly braces so that the value will 00:12:36.640 --> 00:12:38.170 be substituted automatically. 00:12:38.170 --> 00:12:41.530 If you forget the f, you're going to literally see hello comma curly 00:12:41.530 --> 00:12:43.330 brace answer closed curly brace. 
00:12:43.330 --> 00:12:45.355 If you add the f, it's, indeed, interpolated. 00:12:45.355 --> 00:12:47.360 The value is plugged in. 00:12:47.360 --> 00:12:47.860 All right. 00:12:47.860 --> 00:12:52.510 Questions on how we can just say hello to the world via Python, in this case. 00:12:52.510 --> 00:12:53.350 Yeah? 00:12:53.350 --> 00:12:55.280 AUDIENCE: If you do this without the f, what would happen? 00:12:55.280 --> 00:12:56.300 DAVID MALAN: If you do this without the-- 00:12:56.300 --> 00:12:57.260 AUDIENCE: [? The f. ?] 00:12:57.260 --> 00:12:58.385 DAVID MALAN: Without the f? 00:12:58.385 --> 00:13:02.450 If you omit the f, you will literally see H-E-L-L-O comma curly brace 00:13:02.450 --> 00:13:04.730 A-N-S-W-E-R closed curly brace. 00:13:04.730 --> 00:13:05.930 So, in fact, let's do this. 00:13:05.930 --> 00:13:08.300 Let me go back to VS Code here, quickly. 00:13:08.300 --> 00:13:11.540 I've still got my file called hello.py open. 00:13:11.540 --> 00:13:14.210 And let me go ahead and change this ever so slightly. 00:13:14.210 --> 00:13:16.700 So I'm going to go ahead and-- 00:13:16.700 --> 00:13:20.930 let's say from cs50 import get_string. 00:13:20.930 --> 00:13:23.615 And that's just the new syntax I propose using to import 00:13:23.615 --> 00:13:26.150 a function from someone else's library. 00:13:26.150 --> 00:13:30.593 I'm going to now go ahead and ask the question-- 00:13:30.593 --> 00:13:33.260 let's go ahead and use get_string, storing the result in answer. 00:13:33.260 --> 00:13:37.480 So get_string, quote unquote, "What's your name?" 00:13:37.480 --> 00:13:41.090 And then, on this line, I'm going to deliberately make a mistake here, 00:13:41.090 --> 00:13:42.450 exactly to your question. 00:13:42.450 --> 00:13:46.820 Let me just say hello comma answer, and just this. 
00:13:46.820 --> 00:13:48.980 Now, even though answer is a variable, Python's 00:13:48.980 --> 00:13:53.150 not going to be so presumptuous as to just plug in the value of a variable 00:13:53.150 --> 00:13:53.810 called answer. 00:13:53.810 --> 00:13:56.000 What it's going to do, of course, is-- 00:13:56.000 --> 00:13:56.985 if I type in my name-- 00:13:56.985 --> 00:13:57.485 whoops. 00:13:57.485 --> 00:13:58.880 I typed too fast. 00:13:58.880 --> 00:14:00.470 Let me go ahead and rerun that again. 00:14:00.470 --> 00:14:04.550 If I run python with hello.py, type in my name and hit Enter, 00:14:04.550 --> 00:14:06.035 I get hello comma answer. 00:14:06.035 --> 00:14:07.160 Well, let me do one better. 00:14:07.160 --> 00:14:10.680 Let me apply these curly braces as before. 00:14:10.680 --> 00:14:13.340 Let me rerun python of hello.py. 00:14:13.340 --> 00:14:14.060 What's your name? 00:14:14.060 --> 00:14:14.405 D-A-V-I-D. 00:14:14.405 --> 00:14:16.363 And here's, again, the answer to your question. 00:14:16.363 --> 00:14:18.780 Now, we get, literally, the curly braces. 00:14:18.780 --> 00:14:20.780 So the fix here, ultimately, is just going 00:14:20.780 --> 00:14:24.640 to be to add the f there, rerun my program again with David. 00:14:24.640 --> 00:14:26.482 And now, hello comma David. 00:14:26.482 --> 00:14:28.940 So this is, admittedly, a little more cryptic than the ones 00:14:28.940 --> 00:14:31.858 with the plus or the comma, but this is just increasingly common. 00:14:31.858 --> 00:14:33.650 Why? because you can read it left to right. 00:14:33.650 --> 00:14:34.720 It's nice and convenient. 00:14:34.720 --> 00:14:36.125 It's less cryptic than the %s's. 00:14:36.125 --> 00:14:40.130 So it's a new and improved version, if you will, of printf in C, 00:14:40.130 --> 00:14:44.780 based on decades of experience of programmers doing things like this. 00:14:44.780 --> 00:14:49.540 Questions on printing in this way? 
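The three ways of saying hello covered so far (concatenation with plus, multiple arguments to print, and an f-string) can be sketched side by side, along with the mistake of forgetting the f. This is only an illustration: the name is hard-coded as a stand-in for the value that get_string would return, and the variable names other than answer are made up for this sketch.

```python
# Stand-in for: answer = get_string("What's your name? ")
answer = "David"

# 1. Concatenation: the space after the comma must be typed explicitly.
concatenated = "hello, " + answer

# 2. f-string: the value of answer is interpolated where the braces are.
formatted = f"hello, {answer}"

# 3. Same text without the f prefix: the braces are printed literally.
without_f = "hello, {answer}"

print(concatenated)
print("hello,", answer)  # print inserts one space between its arguments
print(formatted)
print(without_f)
```

The first three calls to print produce the same greeting; the last one shows the literal curly braces, which is the behavior asked about in the audience question.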
00:14:49.540 --> 00:14:52.780 We're now on our way to programming in Python. 00:14:52.780 --> 00:14:53.280 Anything? 00:14:53.280 --> 00:14:53.780 All right. 00:14:53.780 --> 00:14:56.825 Well, what more can we do with this language, here? 00:14:56.825 --> 00:15:00.000 Well, let me propose that we consider that we 00:15:00.000 --> 00:15:07.200 have, for instance, a few other features that we can add to the mix, as well-- 00:15:07.200 --> 00:15:12.640 namely, let's say some data types, as well. 00:15:12.640 --> 00:15:15.600 So let me flip over here, to back to the slides. 00:15:15.600 --> 00:15:18.318 And there's different data types in Python, as we'll soon see. 00:15:18.318 --> 00:15:19.485 But they're not as explicit. 00:15:19.485 --> 00:15:23.070 As we already saw, by using a string from get_string, 00:15:23.070 --> 00:15:25.050 you don't have to explicitly state what it is. 00:15:25.050 --> 00:15:29.130 But you saw-- recall, in C-- all of these various data types. 00:15:29.130 --> 00:15:33.720 And then, in Python, nicely enough, this list is about to get shorter. 00:15:33.720 --> 00:15:37.740 And so, here is our list in C. Here is an abbreviated list in Python. 00:15:37.740 --> 00:15:41.220 So we're still going to have strings, but they're going to be more succinctly 00:15:41.220 --> 00:15:45.032 called strs now, S-T-R. We're still going to have ints for integers. 00:15:45.032 --> 00:15:47.490 We're still going to have floats for floating point values. 00:15:47.490 --> 00:15:49.900 We're even going to have bools for true and false. 00:15:49.900 --> 00:15:53.550 But what's missing, now, from the list is long and floats. 00:15:53.550 --> 00:15:54.420 And why is that? 00:15:54.420 --> 00:15:56.220 Or rather, long and double. 00:15:56.220 --> 00:15:58.650 We'll recall that, in C, those used more bits. 
00:15:58.650 --> 00:16:02.550 Well, in Python, the smaller data types, previously-- int and float, 00:16:02.550 --> 00:16:04.950 themselves-- just used more bits for you. 00:16:04.950 --> 00:16:08.010 And so, you don't need to distinguish between small and large. 00:16:08.010 --> 00:16:10.290 You just use one data type, and the language 00:16:10.290 --> 00:16:12.345 gives you a bigger range than before. 00:16:12.345 --> 00:16:15.510 It turns out, though, there's going to be some other features, as well, 00:16:15.510 --> 00:16:17.610 of Python, and these data types-- one of which 00:16:17.610 --> 00:16:20.010 will be called range, another of which will be list. 00:16:20.010 --> 00:16:21.402 So gone will be arrays. 00:16:21.402 --> 00:16:23.610 We'll actually use something literally called a list. 00:16:23.610 --> 00:16:28.110 Tuples-- sort of x, y pairs for coordinates and things like that. 00:16:28.110 --> 00:16:31.260 Dicts for dictionaries-- so we'll have built-in capabilities 00:16:31.260 --> 00:16:34.270 for storing keys and values we'll see, and even a set. 00:16:34.270 --> 00:16:36.270 Mathematically, a set is a collection of values, 00:16:36.270 --> 00:16:38.790 but it automatically gets rid of duplicates for you. 00:16:38.790 --> 00:16:43.470 So all of these things, we could absolutely implement in C if we wanted. 00:16:43.470 --> 00:16:47.940 And, indeed, in problem set five, you've been implementing your very own spell 00:16:47.940 --> 00:16:50.400 checker using some form of hash table. 00:16:50.400 --> 00:16:54.060 Well, it turns out that, in Python, you can solve those same problems, 00:16:54.060 --> 00:16:56.070 but perhaps a little more readily. 00:16:56.070 --> 00:16:58.980 In fact, let me go back over here to VS Code, 00:16:58.980 --> 00:17:01.895 and let me propose that I do the following. 00:17:01.895 --> 00:17:06.210 Let me go ahead and create a file called dictionary.py. 
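A short sketch of the data types listed a moment ago, using only Python built-ins. The specific values are invented for illustration and do not come from the lecture.

```python
# int: grows as needed, so there is no separate "long" type.
big = 2 ** 100

# float: one floating-point type, with no separate "double".
half = 1 / 2

# bool and str, as before.
ready = True
name = "David"

# list: takes the place of arrays and can grow or shrink.
scores = [72, 73, 33]
scores.append(50)

# tuple: a fixed group of values, such as an x, y coordinate pair.
point = (3, 4)

# dict: keys mapped to values.
ages = {"cat": 3, "dog": 5}

# set: a collection of values with duplicates removed automatically.
animals = {"cat", "dog", "cat"}

# range: a sequence of integers, here 0, 1, 2.
counts = list(range(3))

for value in (big, half, ready, name, scores, point, ages, animals):
    print(type(value).__name__, value)
```

Note that 2 ** 100 is far larger than anything a 64-bit long in C could hold, yet it is still just an int here.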
00:17:06.210 --> 00:17:09.510 Let me propose that I try to implement, say-- problem set five-- 00:17:09.510 --> 00:17:14.220 our spell checker in Python instead of C and achieve, ultimately, 00:17:14.220 --> 00:17:17.443 the same kind of behavior whereby I'll be 00:17:17.443 --> 00:17:19.235 able to spell check a whole bunch of words. 00:17:19.235 --> 00:17:21.480 So this is jumping the gun a little bit because you're 00:17:21.480 --> 00:17:23.897 about to see syntax we'll revisit over the course of today. 00:17:23.897 --> 00:17:26.580 But, for now, I've got a new file called dictionary.py. 00:17:26.580 --> 00:17:30.810 And let me begin to create some placeholders for functions. 00:17:30.810 --> 00:17:34.710 We'll see in just a bit that, in Python, you can define a function called check, 00:17:34.710 --> 00:17:38.000 and that check function can take a word as its input. 00:17:38.000 --> 00:17:40.292 And I'll come back to this in just a moment. 00:17:40.292 --> 00:17:42.000 In Python, I can define a second function 00:17:42.000 --> 00:17:44.865 like load, which itself will take a whole dictionary, 00:17:44.865 --> 00:17:47.010 just like in problem set five. 00:17:47.010 --> 00:17:51.010 And I'll go ahead and come back to the implementation of this. 00:17:51.010 --> 00:17:53.130 Meanwhile, we might similarly implement a function 00:17:53.130 --> 00:17:57.090 called size, which takes no arguments but, ultimately, is going to return 00:17:57.090 --> 00:17:59.100 the size of my dictionary of words. 00:17:59.100 --> 00:18:02.370 And then, lastly, for consistency with problem set five, 00:18:02.370 --> 00:18:05.130 we might define an unload function, whose purpose in life 00:18:05.130 --> 00:18:07.770 is to free any memory that you've been using, just 00:18:07.770 --> 00:18:09.390 to give it back to the computer. 
00:18:09.390 --> 00:18:11.790 Now, odds are, whether you're still working on speller 00:18:11.790 --> 00:18:15.660 or have finished speller, you wrote a decent amount of lines of code. 00:18:15.660 --> 00:18:18.550 And indeed, it's been, by design, a challenge. 00:18:18.550 --> 00:18:22.620 But one of the reasons for these higher-level languages like Python 00:18:22.620 --> 00:18:25.680 is that you can stand on the shoulders of programmers before you 00:18:25.680 --> 00:18:28.703 and solve very common problems much more quickly. 00:18:28.703 --> 00:18:31.620 So that you can focus on building your new app or your web application 00:18:31.620 --> 00:18:34.690 or your own project to solve problems of interest to you. 00:18:34.690 --> 00:18:38.490 So at the risk of crushing some spirits, let 00:18:38.490 --> 00:18:42.540 me propose that, in Python if you want a dictionary for something like a spell 00:18:42.540 --> 00:18:44.070 checker, well, that's fine. 00:18:44.070 --> 00:18:48.030 Go ahead and give yourself a variable, like words, to store all of those words 00:18:48.030 --> 00:18:52.410 and just assign it equal to a dictionary-- or dict, for short, 00:18:52.410 --> 00:18:53.220 in Python. 00:18:53.220 --> 00:18:55.140 That will give you a hash table. 00:18:55.140 --> 00:18:57.690 Now, it turns out, in speller recall, you 00:18:57.690 --> 00:18:59.720 don't need to worry about words and definitions. 00:18:59.720 --> 00:19:01.763 It's just about spell-checking the words. 00:19:01.763 --> 00:19:03.930 So strictly speaking, we don't need keys and values. 00:19:03.930 --> 00:19:05.610 We just need keys. 00:19:05.610 --> 00:19:07.980 So I'm going to save myself a few more keystrokes 00:19:07.980 --> 00:19:11.055 by just saying that, technically, in Python, using a set suffices. 00:19:11.055 --> 00:19:13.770 Again, a set is just a collection of values with no duplicates. 00:19:13.770 --> 00:19:16.400 But they don't necessarily have keys and values. 
00:19:16.400 --> 00:19:18.250 It's just one or the other. 00:19:18.250 --> 00:19:21.420 But now that I have-- on line one, I claim the equivalent, in Python, 00:19:21.420 --> 00:19:25.720 of a hash table, I can actually do something like this. 00:19:25.720 --> 00:19:28.890 Here's how I might implement the check function in Python. 00:19:28.890 --> 00:19:33.840 If the word passed into this function is in my variable called words, 00:19:33.840 --> 00:19:35.390 well, return True. 00:19:35.390 --> 00:19:39.360 Else, go ahead and return False. 00:19:39.360 --> 00:19:40.030 Done. 00:19:40.030 --> 00:19:40.530 No, wait. 00:19:40.530 --> 00:19:42.990 You're thinking, if anything at all, maybe 00:19:42.990 --> 00:19:46.507 we want to handle lowercase instead of just uppercase and lowercase. 00:19:46.507 --> 00:19:47.340 Well, you know what? 00:19:47.340 --> 00:19:49.725 In Python, if you want to force a whole word to lowercase, 00:19:49.725 --> 00:19:51.360 you don't have to iterate over it with a loop. 00:19:51.360 --> 00:19:54.190 You don't have to use any of those ctype functions or anything. 00:19:54.190 --> 00:19:56.947 Just say word.lower, and that will convert the whole thing 00:19:56.947 --> 00:19:58.780 to lowercase for parity with the dictionary. 00:19:58.780 --> 00:19:59.440 All right. 00:19:59.440 --> 00:20:02.185 How about something like the load function in Python? 00:20:02.185 --> 00:20:06.130 Well, in Python, you can open files just like in C. For instance, in Python, I 00:20:06.130 --> 00:20:09.940 might do open, the dictionary argument in read mode, 00:20:09.940 --> 00:20:11.798 just like fopen in C. 00:20:11.798 --> 00:20:13.090 I might do something like this. 00:20:13.090 --> 00:20:20.230 For each line in that file, let me go ahead and add, to my words variable, 00:20:20.230 --> 00:20:21.430 that line. 00:20:21.430 --> 00:20:24.790 And then, let me go ahead and close that file. 00:20:24.790 --> 00:20:26.320 And I think I'm done.
00:20:26.320 --> 00:20:28.457 I'm just going to go ahead and return True, 00:20:28.457 --> 00:20:30.040 just because I think I'm already done. 00:20:30.040 --> 00:20:32.350 Now, here, too, I could nitpick a little bit. 00:20:32.350 --> 00:20:35.680 Technically, if I'm reading in every line from the file, 00:20:35.680 --> 00:20:38.620 every line in the dictionary ends with, technically, a backslash n. 00:20:38.620 --> 00:20:41.140 But there's an easy way to get rid of that, 00:20:41.140 --> 00:20:43.360 just like you might see with an alternative syntax. 00:20:43.360 --> 00:20:45.060 What I'm actually going to do is this. 00:20:45.060 --> 00:20:49.060 Let me grab from the current line, the current word, 00:20:49.060 --> 00:20:51.940 by stripping off with right strip-- 00:20:51.940 --> 00:20:53.935 rstrip; a function we'll, again, see-- 00:20:53.935 --> 00:20:55.810 that just gets rid of the trailing newline-- 00:20:55.810 --> 00:20:58.000 the backslash n at the end of that line. 00:20:58.000 --> 00:21:01.900 And what I really want to do, then, is add this word to that dictionary. 00:21:01.900 --> 00:21:05.780 Meanwhile, if I want to figure out what the size is of my dictionary, well-- 00:21:05.780 --> 00:21:08.890 in C, you're probably writing code to iterate over all of those lines, 00:21:08.890 --> 00:21:12.040 and you're just going to count them up using a variable. 00:21:12.040 --> 00:21:13.060 Not so in Python. 00:21:13.060 --> 00:21:15.460 You can just return the length of those words. 00:21:15.460 --> 00:21:19.360 And better still, in Python, you don't have to manage your own memory. 00:21:19.360 --> 00:21:20.500 No more malloc. 00:21:20.500 --> 00:21:21.700 No more free. 00:21:21.700 --> 00:21:24.370 No more manual thinking about memory. 00:21:24.370 --> 00:21:27.310 The language just deals with all of that for you. 00:21:27.310 --> 00:21:28.030 So you know what?
00:21:28.030 --> 00:21:30.760 It suffices for me to just return True and claim 00:21:30.760 --> 00:21:33.640 that unloading is done for me. 00:21:33.640 --> 00:21:35.170 And that's it. 00:21:35.170 --> 00:21:37.840 Again, whether, you're in the middle of or already finished, 00:21:37.840 --> 00:21:39.960 this might, perhaps, adjust some frustration, 00:21:39.960 --> 00:21:45.700 but also, enlightenment in that this is why higher-level languages exist. 00:21:45.700 --> 00:21:47.605 You can build on top of the same principles, 00:21:47.605 --> 00:21:50.170 the same ideas, with which you've been dealing, 00:21:50.170 --> 00:21:51.820 struggling even this past week. 00:21:51.820 --> 00:21:55.090 But you can now express yourself all the more succinctly. 00:21:55.090 --> 00:21:59.590 This one line implements a hash table for you, and all of this, now, 00:21:59.590 --> 00:22:03.250 just uses that hash table in a simpler way. 00:22:03.250 --> 00:22:05.980 Any questions, now, on this, keeping in mind 00:22:05.980 --> 00:22:08.830 that the point, nonetheless, of speller in p-set 5 00:22:08.830 --> 00:22:12.160 is to understand what's really going on underneath the hood 00:22:12.160 --> 00:22:14.860 and, better still, to notice this. 00:22:14.860 --> 00:22:18.010 This might seem all rather amazing, but let me go ahead and do this. 00:22:18.010 --> 00:22:21.100 I've actually got a couple of versions of speller written here, 00:22:21.100 --> 00:22:24.800 and I've got a version written in C that I won't show the source code for. 00:22:24.800 --> 00:22:28.990 But I'm going to go ahead and make that version of speller in C. 00:22:28.990 --> 00:22:32.470 And I'm going to go ahead here and, let's say, split 00:22:32.470 --> 00:22:34.270 my window here for just a moment. 00:22:34.270 --> 00:22:37.030 And I'm going to go into a Python version of speller, 00:22:37.030 --> 00:22:38.470 really, that I just wrote. 
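For reference, the dictionary.py described in the walkthrough above might look roughly like the following. This is a minimal sketch reconstructed from the spoken description, not the exact file shown on screen; the variable name file and the docstrings are assumptions added for readability.

```python
# Sketch of dictionary.py, assuming one word per line in the dictionary file.

# A set holds unique values with no keys and values, which is all speller needs.
words = set()


def check(word):
    """Return True if word is in the dictionary, else False."""
    if word.lower() in words:
        return True
    else:
        return False


def load(dictionary):
    """Load every word from the file at the path `dictionary` into memory."""
    file = open(dictionary, "r")
    for line in file:
        # rstrip removes trailing whitespace, including the newline.
        word = line.rstrip()
        words.add(word)
    file.close()
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free by hand: Python manages memory itself."""
    return True
```

Note that check lowercases only the incoming word, so this sketch relies on the dictionary file itself being all lowercase.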
00:22:38.470 --> 00:22:42.820 And on the left-hand side here, let me go ahead and run speller-- 00:22:42.820 --> 00:22:44.740 the version I compiled in C-- 00:22:44.740 --> 00:22:47.890 using a big text like the Sherlock Holmes text, 00:22:47.890 --> 00:22:50.030 which has a whole lot of words in it. 00:22:50.030 --> 00:22:52.180 And on the right-hand side, let me run python 00:22:52.180 --> 00:22:55.510 of speller.py, which is a separate file I wrote in advance, 00:22:55.510 --> 00:22:57.430 just like we give you speller.c. 00:22:57.430 --> 00:23:00.790 And I'll, similarly, run this on the Sherlock Holmes text. 00:23:00.790 --> 00:23:05.020 And I'm going to do my best to hit Enter on the left and the right of my screen 00:23:05.020 --> 00:23:06.100 at the same time. 00:23:06.100 --> 00:23:08.770 But we should see, hopefully, the same list of misspelled words 00:23:08.770 --> 00:23:10.390 and the timings thereof. 00:23:10.390 --> 00:23:12.380 So here we go on the right. 00:23:12.380 --> 00:23:15.136 Here we go on the left. 00:23:15.136 --> 00:23:16.730 All right. 00:23:16.730 --> 00:23:18.680 A race to see which one wins here. 00:23:18.680 --> 00:23:19.820 C is on the left. 00:23:19.820 --> 00:23:21.680 Python is on the right. 00:23:21.680 --> 00:23:23.270 OK. 00:23:23.270 --> 00:23:25.530 Interesting. 00:23:25.530 --> 00:23:28.200 Hopefully, Python's close behind. 00:23:28.200 --> 00:23:30.330 Note that some of this is internet delay. 00:23:30.330 --> 00:23:33.360 And so, it might not necessarily be a crazy number of seconds. 00:23:33.360 --> 00:23:37.050 But the system is, indeed, using, if we measure it, a low level. 00:23:37.050 --> 00:23:39.630 How much time the CPU spent executing my code? 00:23:39.630 --> 00:23:41.653 C took a total of 1.64 seconds. 00:23:41.653 --> 00:23:44.820 That was pretty fast, even though it took a moment more for all of the bytes 00:23:44.820 --> 00:23:46.590 to come over the internet. 
00:23:46.590 --> 00:23:49.050 The Python version, though, took what? 00:23:49.050 --> 00:23:50.605 2.44 seconds. 00:23:50.605 --> 00:23:53.100 So what might the inference be? 00:23:53.100 --> 00:23:55.590 One, maybe I'm just better at programming in C 00:23:55.590 --> 00:23:59.400 than I am in Python, which is probably not true. 00:23:59.400 --> 00:24:03.210 But what else might you infer from this example? 00:24:07.541 --> 00:24:11.176 Should we, maybe, give up on Python, stick with C? 00:24:11.176 --> 00:24:12.070 No? 00:24:12.070 --> 00:24:14.410 So what might be going on here? 00:24:14.410 --> 00:24:16.870 Why is the Python version, that I claim is correct-- 00:24:16.870 --> 00:24:20.620 and I think the numbers all line up, just not the times. 00:24:20.620 --> 00:24:21.820 Where is the trade-off here? 00:24:21.820 --> 00:24:23.915 Well, here, again, is this design trade-off. 00:24:23.915 --> 00:24:24.415 Yeah? 00:24:24.415 --> 00:24:29.310 AUDIENCE: In order to save the programmer time, [INAUDIBLE].. 00:24:29.310 --> 00:24:30.690 DAVID MALAN: Yeah, exactly. 00:24:30.690 --> 00:24:32.910 In order to save the human programmer time, 00:24:32.910 --> 00:24:35.700 there's a lot more features built into Python-- more functions, 00:24:35.700 --> 00:24:38.920 more automatic management of memory and so forth-- 00:24:38.920 --> 00:24:40.530 and you have to pay a price. 00:24:40.530 --> 00:24:43.193 Someone else's code is doing all of that work for you. 00:24:43.193 --> 00:24:45.360 But if they've written some number of lines of code, 00:24:45.360 --> 00:24:47.152 those are just more lines of code that need 00:24:47.152 --> 00:24:50.730 to be executed for you, whereas here, the computer is 00:24:50.730 --> 00:24:54.615 at the risk of oversimplifying only running my lines of code. 00:24:54.615 --> 00:24:55.865 So there's just less overhead. 00:24:55.865 --> 00:24:57.448 And so, this is a perpetual trade-off. 
00:24:57.448 --> 00:25:00.990 Typically, when using a more user-friendly and more modern language, 00:25:00.990 --> 00:25:02.983 one of the prices you might pay is performance. 00:25:02.983 --> 00:25:06.150 Now, there's a lot of smart computer scientists in the world, though, trying 00:25:06.150 --> 00:25:08.440 to push back on those same trade-offs. 00:25:08.440 --> 00:25:11.220 And so, these interpreters, like the command I wrote, 00:25:11.220 --> 00:25:15.390 Python technically can-- especially if you run a program again and again-- 00:25:15.390 --> 00:25:19.350 actually, secretly, behind the scenes, compile your code for you, 00:25:19.350 --> 00:25:20.610 down to 0s and 1s. 00:25:20.610 --> 00:25:23.640 And then, the second, the third, the fourth time you run that program, 00:25:23.640 --> 00:25:25.010 it might very well be faster. 00:25:25.010 --> 00:25:27.150 So this is a bit of a head fake here, in that 00:25:27.150 --> 00:25:29.490 I'm running them once and only once. 00:25:29.490 --> 00:25:32.070 But we could get benefit over time if we kept 00:25:32.070 --> 00:25:34.183 running the Python version again and again 00:25:34.183 --> 00:25:35.850 and, perhaps, fine-tune the performance. 00:25:35.850 --> 00:25:38.017 But, in general, there's going to be this trade-off. 00:25:38.017 --> 00:25:40.560 Now, would you rather spend the 60 seconds 00:25:40.560 --> 00:25:43.620 I wrote implementing a spell checker or this 6 hours, 00:25:43.620 --> 00:25:47.910 16 hours you might be or have spent implementing the same in C? 00:25:47.910 --> 00:25:48.720 Probably not. 00:25:48.720 --> 00:25:52.650 For productivity's sake, this is why we have these additional languages. 
00:25:52.650 --> 00:25:57.300 Just for fun, let me flip over to another screen here and open up 00:25:57.300 --> 00:26:00.540 a version of Python that's actually-- in just a second-- 00:26:00.540 --> 00:26:04.230 on my own Mac instead of the cloud so that 00:26:04.230 --> 00:26:06.490 I can actually do something with graphics. 00:26:06.490 --> 00:26:09.930 So, here, I just have a black and white terminal window on my very own Mac. 00:26:09.930 --> 00:26:12.450 And I've pre-installed Python, just like we've done so 00:26:12.450 --> 00:26:14.370 for VS Code in the cloud for you. 00:26:14.370 --> 00:26:19.320 Notice that I've got this photo of, perhaps, one of your favorite TV 00:26:19.320 --> 00:26:21.090 shows here, with the cast of The Office. 00:26:21.090 --> 00:26:24.630 Notice all of the faces in this image here. 00:26:24.630 --> 00:26:30.210 And let me propose that we try to find one face in the crowd, CSI-style, 00:26:30.210 --> 00:26:33.660 whereby we want to find, perhaps, the Scranton Strangler, so to speak. 00:26:33.660 --> 00:26:37.080 And so, here is an example of this guy's face. 00:26:37.080 --> 00:26:40.385 Now, how do we go about finding this specific face in the crowd? 00:26:40.385 --> 00:26:42.510 Well, our human eyes, obviously, can pluck him out, 00:26:42.510 --> 00:26:44.370 especially if you're familiar with the show. 00:26:44.370 --> 00:26:46.605 But let me go ahead and do this instead. 00:26:46.605 --> 00:26:50.730 Let me go ahead and propose that we run code 00:26:50.730 --> 00:26:52.800 that I already wrote in advance here. 00:26:52.800 --> 00:26:55.085 This is a Python program with more lines of code 00:26:55.085 --> 00:26:56.460 that we won't dwell on for today. 00:26:56.460 --> 00:26:58.800 But it's meant to motivate what we can do. 
00:26:58.800 --> 00:27:03.150 From a PIL library, implying a Python image library, 00:27:03.150 --> 00:27:07.033 I want to import some type of information, 00:27:07.033 --> 00:27:09.450 some feature called Image so that I can manipulate images, 00:27:09.450 --> 00:27:12.150 not unlike our own problem set four. 00:27:12.150 --> 00:27:13.330 And this is powerful. 00:27:13.330 --> 00:27:13.830 In 00:27:13.830 --> 00:27:14.330 Python, 00:27:14.330 --> 00:27:18.450 you can just [MIMICS EXPLOSION] import face_recognition as a library 00:27:18.450 --> 00:27:19.950 that someone else wrote. 00:27:19.950 --> 00:27:22.770 From there, I'm going to create a variable called image. 00:27:22.770 --> 00:27:25.050 I'm going to use this face_recognition library's 00:27:25.050 --> 00:27:27.330 load_image_file function. 00:27:27.330 --> 00:27:30.030 It's a little verbose, but it's similar in spirit to fopen. 00:27:30.030 --> 00:27:32.100 And I'm going to open office.jpeg. 00:27:32.100 --> 00:27:36.990 I'm going to, then, declare a second variable called face_locations, plural, 00:27:36.990 --> 00:27:40.620 because what I'm expecting to get back, per the documentation for this library, 00:27:40.620 --> 00:27:44.650 is a list of all of the faces' locations that are detected. 00:27:44.650 --> 00:27:45.150 All right. 00:27:45.150 --> 00:27:48.660 Then, I'm going to iterate over each of those faces using a for loop, 00:27:48.660 --> 00:27:50.460 that we'll see in more detail. 00:27:50.460 --> 00:27:53.475 I'm going to, then, infer what the top, right, bottom, and left corners 00:27:53.475 --> 00:27:55.050 are of that face. 00:27:55.050 --> 00:28:00.300 And then, what I'm going to do here is show that face alone, 00:28:00.300 --> 00:28:03.040 if I've detected the face in question. 00:28:03.040 --> 00:28:08.760 So let me go ahead, here, and run detect.py. 00:28:08.760 --> 00:28:12.370 And we'll see not just the one face we're looking for.
00:28:12.370 --> 00:28:16.430 But if I run Python of detect.py, it's going to do all of the analysis. 00:28:16.430 --> 00:28:22.380 I'll see a big opening here, now, of all of the faces that 00:28:22.380 --> 00:28:24.870 were detected in this here program. 00:28:24.870 --> 00:28:26.870 [CHUCKLES] OK, some better than others, I guess, 00:28:26.870 --> 00:28:28.560 if you zoom in on catching someone. 00:28:28.560 --> 00:28:29.970 Typical Angela. 00:28:29.970 --> 00:28:32.700 If you want to, now, find that one face, I 00:28:32.700 --> 00:28:34.920 think we need to train the software a bit more. 00:28:34.920 --> 00:28:37.080 So let me actually open up a second program called 00:28:37.080 --> 00:28:39.270 recognize that's got more going on. 00:28:39.270 --> 00:28:41.370 But let me, with a wave of a hand, point out 00:28:41.370 --> 00:28:45.870 that I'm now loading not only the office.jpeg, but also toby.jpeg 00:28:45.870 --> 00:28:49.840 to train the algorithm to find that specific face. 00:28:49.840 --> 00:28:53.580 And so, now, if I run this second version-- recognize.py-- 00:28:53.580 --> 00:28:56.310 with Python of recognize.py-- 00:28:56.310 --> 00:28:59.160 hold my breath for just a moment; it's analyzing, presumably, 00:28:59.160 --> 00:29:00.420 all of the faces-- 00:29:00.420 --> 00:29:02.070 you see the same, original photo. 00:29:02.070 --> 00:29:05.610 But do you see one such face highlighted here? 00:29:05.610 --> 00:29:09.420 This version of the code found Toby, highlighted him 00:29:09.420 --> 00:29:12.110 with the screen and, voila, we have face recognition. 00:29:12.110 --> 00:29:14.318 So for better or for worse, this is what's happening, 00:29:14.318 --> 00:29:15.967 increasingly societally, nowadays. 00:29:15.967 --> 00:29:18.300 And honestly, even though I didn't write the code live-- 00:29:18.300 --> 00:29:21.420 because it's a good dozen or more lines of code-- it's not terribly many. 
00:29:21.420 --> 00:29:24.450 And literally, all the authorities-- all we have to do 00:29:24.450 --> 00:29:27.960 is import face recognition and, voila, you have access. 00:29:27.960 --> 00:29:29.890 These technologies are here already. 00:29:29.890 --> 00:29:31.690 But let's consider, for just a moment-- 00:29:31.690 --> 00:29:33.820 how did we find Toby? 00:29:33.820 --> 00:29:35.150 How might that library-- 00:29:35.150 --> 00:29:37.900 even though we're not going to look at its implementation details, 00:29:37.900 --> 00:29:40.000 how does it find Toby and distinguish him 00:29:40.000 --> 00:29:43.960 from all of these other faces in the crowd? 00:29:43.960 --> 00:29:47.180 What might it be doing, intuitively. 00:29:47.180 --> 00:29:50.570 Think back even to p-set four, what you, yourselves, have access to, data-wise. 00:29:50.570 --> 00:29:51.083 Yeah? 00:29:51.083 --> 00:29:53.750 AUDIENCE: [? Since ?] we gave it an image of Toby's face before, 00:29:53.750 --> 00:29:59.010 it probably looks at are the pixels in one area the same as in another area 00:29:59.010 --> 00:30:00.720 and allots it to the same-- 00:30:00.720 --> 00:30:02.998 from that reference image to this image. 00:30:02.998 --> 00:30:06.870 And then, it's going to say, hey, a lot of the similar consult ranges 00:30:06.870 --> 00:30:09.292 are here and here, so we can safely guess 00:30:09.292 --> 00:30:10.750 that this is the same [? person. ?] 00:30:10.750 --> 00:30:11.875 DAVID MALAN: Yeah, exactly. 00:30:11.875 --> 00:30:15.610 And to summarize for the camera here, we have trained the software, if you will, 00:30:15.610 --> 00:30:17.560 by giving it a photo of Toby's face. 00:30:17.560 --> 00:30:20.218 So, by looking for the same or, really, similar pixels-- 00:30:20.218 --> 00:30:22.510 especially if it's a slightly different image of Toby-- 00:30:22.510 --> 00:30:24.970 we can, perhaps, identify him in the crowd. 00:30:24.970 --> 00:30:26.412 And what really is a human face? 
00:30:26.412 --> 00:30:28.120 Well, at the end of the day, the computer 00:30:28.120 --> 00:30:30.340 only knows it as a pattern of bits or, really, 00:30:30.340 --> 00:30:32.110 at a higher level, a pattern of pixels. 00:30:32.110 --> 00:30:35.170 So maybe a human face is, perhaps, best defined, in general, 00:30:35.170 --> 00:30:39.295 as two eyes and a nose and a mouth that, even though all of us look similar, 00:30:39.295 --> 00:30:43.268 structurally, odds are, the measurement between the eyes and the nose 00:30:43.268 --> 00:30:45.310 and the width of the mouth, the skin tone and all 00:30:45.310 --> 00:30:47.920 of these other physical characteristics are patterns 00:30:47.920 --> 00:30:51.280 that software could, perhaps, detect and then look, statistically, 00:30:51.280 --> 00:30:53.920 through the image, looking for the closest possible match 00:30:53.920 --> 00:30:57.422 to these various measurement shapes, colors and sizes and the like. 00:30:57.422 --> 00:30:59.130 And, indeed, that might be the intuition. 00:30:59.130 --> 00:31:03.520 But what's powerful here, again, is just how easy and readily available 00:31:03.520 --> 00:31:06.280 this technology now is. 00:31:06.280 --> 00:31:06.820 All right. 00:31:06.820 --> 00:31:10.605 So with that said, let's propose to consider what more we 00:31:10.605 --> 00:31:13.480 can do with Python itself, get back to the fundamentals, so that you, 00:31:13.480 --> 00:31:16.990 yourselves can start to implement something along those same lines. 00:31:16.990 --> 00:31:21.820 So besides having access to things like a get_string function, 00:31:21.820 --> 00:31:26.080 the CS50 library provides a few other things, as well-- namely, in C, 00:31:26.080 --> 00:31:27.040 we had these. 00:31:27.040 --> 00:31:29.052 But in Python, we're going to have fewer. 00:31:29.052 --> 00:31:32.260 In Python, our library, short-term, is going to give you not only get_string, 00:31:32.260 --> 00:31:33.740 but also get_int and get_float. 
00:31:33.740 --> 00:31:34.240 Why? 00:31:34.240 --> 00:31:36.310 It's actually just annoying, as we'll soon 00:31:36.310 --> 00:31:39.190 see, to get back an integer or a float from a user 00:31:39.190 --> 00:31:44.890 and just make sure that it's an int and a float and not a word like cat or dog, 00:31:44.890 --> 00:31:47.170 or some string that's not actually a number. 00:31:47.170 --> 00:31:50.810 Well, we can import not just the specific function, get_string, 00:31:50.810 --> 00:31:53.920 but we can actually import all of these functions one at a time, 00:31:53.920 --> 00:31:55.840 like this, as we'll soon see. 00:31:55.840 --> 00:31:59.410 Or you can even, in Python, import specific functions from a file. 00:31:59.410 --> 00:32:04.300 One of you asked a while back, when you include something like CS50.h 00:32:04.300 --> 00:32:08.780 or standard I/O .h, you're actually getting all of the code in that file, 00:32:08.780 --> 00:32:12.010 which, potentially, can add bulk to your own program or time. 00:32:12.010 --> 00:32:15.040 In this case, when you import specific functions from Python, 00:32:15.040 --> 00:32:17.875 you can be a little more narrowly precise 00:32:17.875 --> 00:32:21.230 as to what it is you want to have access to. 00:32:21.230 --> 00:32:21.730 All right. 00:32:21.730 --> 00:32:23.890 So, with that said, let's go ahead and see 00:32:23.890 --> 00:32:25.900 what conditionals look like in Python. 00:32:25.900 --> 00:32:29.470 So in the left-hand side again, here, we'll see Scratch. 00:32:29.470 --> 00:32:33.190 So it's just a contrived example asking if x is less than y, then, 00:32:33.190 --> 00:32:35.350 say, x is less than y. 00:32:35.350 --> 00:32:37.540 In C, it looked like this. 00:32:37.540 --> 00:32:41.050 In Python, now, it's going to look like this instead. 00:32:41.050 --> 00:32:44.815 And here's before in C, and here's after. 
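The Python side of that comparison can be sketched as follows. This is only an illustration; the starting values of x and y and the variable message are assumptions chosen so the snippet runs on its own.

```python
x = 1
y = 2
message = ""

# The condition needs no parentheses, the if line ends with a colon,
# and the indented line below it is the body of the conditional.
if x < y:
    message = "x is less than y"

print(message)
```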
00:32:44.815 --> 00:32:47.320 And just to call out a few of the obvious differences, what 00:32:47.320 --> 00:32:49.810 has changed, in Python, for conditionals, it would seem? 00:32:53.013 --> 00:32:53.930 What's the difference? 00:32:53.930 --> 00:32:54.230 Yeah. 00:32:54.230 --> 00:32:55.920 AUDIENCE: There's a lack of curly braces. 00:32:55.920 --> 00:32:56.380 DAVID MALAN: Yeah. 00:32:56.380 --> 00:32:57.760 So there's no more curly braces. 00:32:57.760 --> 00:32:59.170 And, indeed, you don't use those. 00:32:59.170 --> 00:33:04.138 What appears to be taking their place, if you might infer? 00:33:04.138 --> 00:33:05.680 What seems to have taken their place? 00:33:05.680 --> 00:33:05.890 What do you think? 00:33:05.890 --> 00:33:06.765 AUDIENCE: [INAUDIBLE] 00:33:06.765 --> 00:33:09.560 DAVID MALAN: So the colon at the start of this line, here. 00:33:09.560 --> 00:33:13.760 But also even more important, now, is this indentation below it. 00:33:13.760 --> 00:33:16.160 So some of you, and we know this from office hours, 00:33:16.160 --> 00:33:19.380 have a habit of indenting everything on the left, right? 00:33:19.380 --> 00:33:21.200 And it's just this crazy mess to look at. 00:33:21.200 --> 00:33:23.000 Frustrating for you, surely. 00:33:23.000 --> 00:33:25.670 But C and Clang is pretty tolerant when it 00:33:25.670 --> 00:33:27.860 comes to things like white space in a program. 00:33:27.860 --> 00:33:29.030 Python, uh-uh. 00:33:29.030 --> 00:33:32.240 They realized, years ago, that-- let's help humans help themselves and just 00:33:32.240 --> 00:33:34.610 require standard indentation. 00:33:34.610 --> 00:33:36.620 So four spaces would be the norm here. 00:33:36.620 --> 00:33:38.870 But because it's indented below that colon, that, 00:33:38.870 --> 00:33:42.110 indeed, indicates that this, now, is part of that condition. 00:33:42.110 --> 00:33:46.340 Something else has gone missing, versus C, in this conditional. 
00:33:46.340 --> 00:33:47.855 What else is a little simplified? 00:33:47.855 --> 00:33:49.660 AUDIENCE: [INAUDIBLE] 00:33:49.660 --> 00:33:50.410 DAVID MALAN: Yeah. 00:33:50.410 --> 00:33:51.368 So no more parentheses. 00:33:51.368 --> 00:33:53.650 You can still use them, especially when you need to, 00:33:53.650 --> 00:33:56.112 logically, to do order of operations, like in math. 00:33:56.112 --> 00:33:57.820 But in this case, if you just want to ask 00:33:57.820 --> 00:34:01.162 a simple question, like if x less than y, you can just do it like that. 00:34:01.162 --> 00:34:02.620 How about when you have an if else? 00:34:02.620 --> 00:34:05.170 Well, this is almost the same, here, with these same changes. 00:34:05.170 --> 00:34:06.800 In C, this looked like this. 00:34:06.800 --> 00:34:08.800 And it's starting to get a bit bulky-- at least, 00:34:08.800 --> 00:34:10.659 if we use our curly braces in this way. 00:34:10.659 --> 00:34:13.060 In Python, we can tighten things up further, even though, 00:34:13.060 --> 00:34:15.727 strictly speaking, in C, you don't always need the curly braces. 00:34:15.727 --> 00:34:18.280 But here, gone are the parentheses, again. 00:34:18.280 --> 00:34:20.020 Gone are the curly braces. 00:34:20.020 --> 00:34:23.380 Indentation is consistent, and we've just added another keyword, 00:34:23.380 --> 00:34:24.580 else, with the colon. 00:34:24.580 --> 00:34:26.325 But no more semicolons, as well. 00:34:26.325 --> 00:34:30.010 How about something larger, like this, an if, else if, else? 00:34:30.010 --> 00:34:31.960 This one's a little curious. 00:34:31.960 --> 00:34:35.290 But in C, it looked like this-- if, else if, else. 00:34:35.290 --> 00:34:38.143 In Python, it now looks like this. 00:34:38.143 --> 00:34:40.060 And there's, perhaps, one curiosity here that, 00:34:40.060 --> 00:34:41.977 honestly, all these years later, I still can't 00:34:41.977 --> 00:34:43.630 remember how to spell it half the time.
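The three-way version of the conditional might be sketched like this in Python. The function name compare is hypothetical, introduced only so the three branches can each be exercised; note the spelling of the middle keyword.

```python
def compare(x, y):
    """Describe how x relates to y (hypothetical helper for illustration)."""
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))
print(compare(2, 1))
print(compare(3, 3))
```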
00:34:43.630 --> 00:34:46.900 What's weird about this? 00:34:46.900 --> 00:34:50.415 What do you spot as different? 00:34:50.415 --> 00:34:51.230 Yeah, over here. 00:34:51.230 --> 00:34:53.520 AUDIENCE: [INAUDIBLE] 00:34:53.520 --> 00:34:54.270 DAVID MALAN: Yeah. 00:34:54.270 --> 00:34:56.260 Instead of else if, it's elif. 00:34:56.260 --> 00:34:56.760 Why? 00:34:56.760 --> 00:34:59.340 [SIGHS] Apparently, else space if was just too many 00:34:59.340 --> 00:35:02.250 keystrokes for humans to type, so they condensed it into this way. 00:35:02.250 --> 00:35:05.100 Probably means it's a little more distinguishable, too, 00:35:05.100 --> 00:35:07.200 for the computer between the if and the else, too. 00:35:07.200 --> 00:35:08.700 But just something to remember, now. 00:35:08.700 --> 00:35:10.620 It's, indeed, elif and not else if. 00:35:10.620 --> 00:35:11.123 All right. 00:35:11.123 --> 00:35:12.540 So what about variables in Python? 00:35:12.540 --> 00:35:16.590 I've used a couple of them already, but let's 00:35:16.590 --> 00:35:19.533 distill exactly how you define and declare these things, as well. 00:35:19.533 --> 00:35:22.200 So, in Scratch, if we wanted to create a variable called counter 00:35:22.200 --> 00:35:25.185 and set it equal, initially, to 0, we would do something 00:35:25.185 --> 00:35:28.680 like this. In C, we would specify that it's an int, use the assignment operator, 00:35:28.680 --> 00:35:30.060 end the thought with a semicolon. 00:35:30.060 --> 00:35:32.310 In Python, it's just simpler. 00:35:32.310 --> 00:35:34.680 You name the variable, use the assignment operator, 00:35:34.680 --> 00:35:37.755 as before, you set it equal to some value, and that's it. 00:35:37.755 --> 00:35:38.880 You don't mention the type. 00:35:38.880 --> 00:35:41.250 You don't mention the semicolon or anything more. 00:35:41.250 --> 00:35:44.250 What if you want to change a variable, like counter, 00:35:44.250 --> 00:35:46.320 by 1-- that is, increment it by 1?
00:35:46.320 --> 00:35:47.800 You have a few different ways here. 00:35:47.800 --> 00:35:51.990 In C, we saw syntax like this, where you can say counter equals counter plus 1, 00:35:51.990 --> 00:35:54.900 which, again, feels illogical. 00:35:54.900 --> 00:35:56.610 How can counter equal counter plus 1? 00:35:56.610 --> 00:36:01.890 But, again, we read this code, really, right to left, updating its value by 1. 00:36:01.890 --> 00:36:03.550 In Python, it's almost the same. 00:36:03.550 --> 00:36:04.535 You just get rid of the semicolon. 00:36:04.535 --> 00:36:05.580 So that logic is there. 00:36:05.580 --> 00:36:08.070 But recall, in C, we could do something slightly different 00:36:08.070 --> 00:36:09.840 that we can also do in Python. 00:36:09.840 --> 00:36:12.060 In Python, you can also, more succinctly, 00:36:12.060 --> 00:36:15.420 do this-- plus equals, and then, whatever number you want to add. 00:36:15.420 --> 00:36:17.790 Or you can even change it to subtract, if you prefer. 00:36:17.790 --> 00:36:21.495 Sadly, gone is something you've probably typed a whole lot. 00:36:21.495 --> 00:36:23.940 What was the other way you can add 1? 00:36:23.940 --> 00:36:24.773 AUDIENCE: Plus plus? 00:36:24.773 --> 00:36:26.940 DAVID MALAN: Plus plus is no more, sadly, in Python. 00:36:26.940 --> 00:36:29.600 Just too many ways to do the same thing, so they got rid of it 00:36:29.600 --> 00:36:31.705 in favor of just this syntax, here. 00:36:31.705 --> 00:36:33.140 So keep that in mind, as well. 00:36:33.140 --> 00:36:36.500 What about loops, when you want to do something in Python again and again. 00:36:36.500 --> 00:36:39.380 Well, in Scratch, in week zero, here's how we meowed three times, 00:36:39.380 --> 00:36:40.700 specifically. 00:36:40.700 --> 00:36:42.650 In C, we had a couple of ways of doing this. 00:36:42.650 --> 00:36:46.460 This was the more mechanical approach, where you create a variable called i. 00:36:46.460 --> 00:36:47.780 You set it equal to 0. 
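The ways of declaring and updating counter described above can be sketched in a few lines of Python. This is just an illustration of the syntax; the final print is there only to show the resulting value.

```python
counter = 0            # no type and no semicolon

counter = counter + 1  # evaluate the right-hand side, then store it back
counter += 1           # shorthand for the same update
counter -= 1           # the same shorthand works for subtraction

# counter++ does not exist in Python; writing it is a syntax error.
print(counter)  # prints 1
```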
00:36:47.780 --> 00:36:51.230 You then do while i is less than 3, the following. 00:36:51.230 --> 00:36:54.530 And then, you, yourself increment i again and again. 00:36:54.530 --> 00:36:57.920 Mechanical in the sense that you have to implement all of these gears 00:36:57.920 --> 00:37:01.130 and make them turn yourself, but this was a correct way to do that. 00:37:01.130 --> 00:37:03.740 In Python, we can still achieve the same idea, 00:37:03.740 --> 00:37:05.945 but we don't need the int keyword. 00:37:05.945 --> 00:37:07.445 We don't need any of the semicolons. 00:37:07.445 --> 00:37:08.695 We don't need the parentheses. 00:37:08.695 --> 00:37:10.310 We don't need the curly braces. 00:37:10.310 --> 00:37:12.200 We can't use the plus plus, so maybe that's 00:37:12.200 --> 00:37:14.300 a minor step backwards if you're a fan. 00:37:14.300 --> 00:37:17.930 But otherwise, the code, the logic is exactly the same. 00:37:17.930 --> 00:37:20.390 But there's other ways to achieve this same idea. 00:37:20.390 --> 00:37:22.950 Recall that, in C, we could also do this. 00:37:22.950 --> 00:37:25.880 You could use a for loop, which does exactly the same thing. 00:37:25.880 --> 00:37:26.893 Both are correct. 00:37:26.893 --> 00:37:28.310 Both are, arguably, well-designed. 00:37:28.310 --> 00:37:32.000 It's to each their own when it comes to choosing between these. 00:37:32.000 --> 00:37:35.930 In Python, though, we're going to have to think through how to do this. 00:37:35.930 --> 00:37:41.300 So you don't do the same for loop as in C. The closest I could come up with 00:37:41.300 --> 00:37:44.270 is this, where you say for i-- 00:37:44.270 --> 00:37:47.555 or whatever variable you want to do the counting-- in-- literally 00:37:47.555 --> 00:37:50.522 the preposition-- and then, you use square brackets here. 00:37:50.522 --> 00:37:52.730 And we've used square brackets before, in the context 00:37:52.730 --> 00:37:55.370 of arrays and things like that. 
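The while-loop version of meowing three times might be sketched like this in Python. It is a minimal illustration that assumes the body simply prints the word meow.

```python
i = 0
while i < 3:
    print("meow")
    i += 1  # there is no i++ in Python, so use += instead
```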
00:37:55.370 --> 00:38:00.080 And the 0, 1, 2 looks like an array, in some sense, even though we've also seen 00:38:00.080 --> 00:38:01.470 arrays with curly braces. 00:38:01.470 --> 00:38:03.950 But these square brackets, for now, denote a list. 00:38:03.950 --> 00:38:05.420 Python does not have arrays. 00:38:05.420 --> 00:38:08.600 An array is that contiguous chunk of memory, back to back to back, 00:38:08.600 --> 00:38:13.160 that you have to resize somehow by moving things around in memory, 00:38:13.160 --> 00:38:14.450 as per two weeks ago. 00:38:14.450 --> 00:38:19.175 In Python, though, you can just create a list like this using square brackets. 00:38:19.175 --> 00:38:22.700 And better still, as we'll see, you can add or even remove things 00:38:22.700 --> 00:38:24.920 from that list down the road. 00:38:24.920 --> 00:38:27.140 This, though, is not going to be very well-designed. 00:38:27.140 --> 00:38:28.610 This will work. 00:38:28.610 --> 00:38:32.030 This will iterate in Python three times. 00:38:32.030 --> 00:38:34.700 But what might rub you the wrong way about this design, 00:38:34.700 --> 00:38:36.860 even if you've never seen Python before? 00:38:36.860 --> 00:38:38.460 How does this example not end well? 00:38:38.460 --> 00:38:38.960 Yeah? 00:38:38.960 --> 00:38:41.810 AUDIENCE: Making a large list [INAUDIBLE].. 00:38:41.810 --> 00:38:42.560 DAVID MALAN: Yeah. 00:38:42.560 --> 00:38:45.830 If you're making a large list, you have to type out each one of these numbers, 00:38:45.830 --> 00:38:50.178 like comma 3, comma 4, comma 5, comma, dot, dot, dot, 50 comma, dot, dot, dot, 00:38:50.178 --> 00:38:50.678 500. 00:38:50.678 --> 00:38:52.640 Like, surely, that's not the best solution, 00:38:52.640 --> 00:38:55.760 to have all of these numbers on the screen, 00:38:55.760 --> 00:38:57.140 wrapping endlessly on the screen. 
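As a rough sketch of the two loops described so far, here is the hand-incremented while loop next to the for loop over a hard-coded list. The function names meow_while and meow_for are made up for illustration and are not from the lecture; collecting the results in a list is also just an assumption to make the behavior easy to inspect.

```python
def meow_while():
    """Count from 0 up to (but not including) 3, incrementing i by hand."""
    sounds = []
    i = 0
    while i < 3:
        sounds.append("meow")
        i += 1  # Python has no i++, so += 1 does the incrementing
    return sounds


def meow_for():
    """Iterate over a hard-coded list of three values."""
    sounds = []
    for i in [0, 1, 2]:
        sounds.append("meow")
    return sounds


print(meow_while())
print(meow_for())
```

Both versions iterate three times. The list version only stays readable while the list is short, which is the design concern raised above.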
00:38:57.140 --> 00:39:01.100 So, in Python, another way to do this would be to use a function 00:39:01.100 --> 00:39:04.160 called range, which, technically, is a data type unto itself. 00:39:04.160 --> 00:39:08.080 And this returns to you as many values as you ask it for. 00:39:08.080 --> 00:39:09.830 range takes some other arguments, as well. 00:39:09.830 --> 00:39:14.540 But the simplest use case here is, if you want back the numbers 0, 1, and 2-- 00:39:14.540 --> 00:39:15.890 a total of three values-- 00:39:15.890 --> 00:39:19.070 you say, hey, Python, please give me a range of three values. 00:39:19.070 --> 00:39:21.260 And by default, they start at 0 on up. 00:39:21.260 --> 00:39:24.320 But this is more efficient than it would be 00:39:24.320 --> 00:39:26.390 to hard code the entire list at once. 00:39:26.390 --> 00:39:29.150 And the best metaphor I could come up with is something like this. 00:39:29.150 --> 00:39:30.775 Here, for instance, is a deck of cards. 00:39:30.775 --> 00:39:34.430 This is normal, human size, and there's presumably 52 cards here. 00:39:34.430 --> 00:39:38.728 So writing out 0 through 51 in code would be a little ridiculous 00:39:38.728 --> 00:39:39.770 for the reasons you know. 00:39:39.770 --> 00:39:44.510 And it would just be very unwieldy and ugly and wrapping and all of that. 00:39:44.510 --> 00:39:48.500 It would be the virtual equivalent of me handing you all of these cards at once 00:39:48.500 --> 00:39:49.430 to just deal with. 00:39:49.430 --> 00:39:52.760 And, right, they're not that big, but it's a lot of cards to hold on to. 00:39:52.760 --> 00:39:55.760 It requires a lot of memory or physical storage, if you will.
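A minimal sketch of range as described here, assuming Python 3. The variable names (values, deck, range_size, list_size) are illustrative only, and the size comparison uses sys.getsizeof, which is a rough, CPython-specific measure rather than a precise accounting of memory.

```python
import sys

# range(3) produces 0, 1, 2 on demand, so nothing has to be typed out by hand
for i in range(3):
    print("meow")

values = list(range(3))      # list() is used here only to reveal the values
deck = range(52)             # stands in for the deck of cards: 0 through 51

# Rough, CPython-specific comparison: the range object stays small, while an
# equivalent list has to hold a slot for every one of its 1,000 values.
range_size = sys.getsizeof(range(1000))
list_size = sys.getsizeof(list(range(1000)))
print(values, len(deck), range_size < list_size)
```

The loop itself reads the same whether it is handed a list or a range. Only the way the values are produced differs.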
00:39:55.760 --> 00:39:59.840 What range does, metaphorically, is, if you ask me for three cards, 00:39:59.840 --> 00:40:04.910 I hand you them one at a time, like this, so that, at any point in time, 00:40:04.910 --> 00:40:08.150 you only have one number in the computer's memory 00:40:08.150 --> 00:40:09.760 until you're handed the next. 00:40:09.760 --> 00:40:11.840 The alternative-- the previous version would 00:40:11.840 --> 00:40:15.360 be to hand me all three cards at once, or all 52 cards at once. 00:40:15.360 --> 00:40:17.840 But in this case, range is just way more efficient. 00:40:17.840 --> 00:40:19.700 You can do range of 1,000. 00:40:19.700 --> 00:40:22.940 That's not going to give you a list of 1,000 values all at once. 00:40:22.940 --> 00:40:25.910 It's going to give you 1,000 values one at a time, 00:40:25.910 --> 00:40:30.800 reducing memory significantly in the computer itself. 00:40:30.800 --> 00:40:31.310 All right. 00:40:31.310 --> 00:40:34.745 So, besides this, what about doing something forever in Scratch? 00:40:34.745 --> 00:40:38.060 Well, we could do this, literally, with a forever block, which didn't quite 00:40:38.060 --> 00:40:42.590 exist in C. In C, we had to hack it together by saying while True-- 00:40:42.590 --> 00:40:46.000 because True is, by definition, T-R-U-E, always true. 00:40:46.000 --> 00:40:50.420 So this just deliberately induces an infinite loop for us. 00:40:50.420 --> 00:40:53.375 In Python, the logic's going to be almost the same. 00:40:53.375 --> 00:40:55.250 And infinite loops in Python tend to actually 00:40:55.250 --> 00:40:58.760 be even more common because you can always break out of them, as you could 00:40:58.760 --> 00:41:02.280 in C. In Python, it looks like this. 00:41:02.280 --> 00:41:05.960 And this is slightly more subtle, but gone are the curly braces. 00:41:05.960 --> 00:41:07.370 Gone are the parentheses. 00:41:07.370 --> 00:41:10.400 But ever so slight difference, too? 
00:41:10.400 --> 00:41:13.187 A capital T for True and it's going to be a capital F for False. 00:41:13.187 --> 00:41:14.270 Stupid little differences. 00:41:14.270 --> 00:41:16.440 Eventually, you're going to mistype one or the other. 00:41:16.440 --> 00:41:18.607 But these are the kinds of things to keep an eye out 00:41:18.607 --> 00:41:21.770 and to start recognizing in your mind's eye when you read code. 00:41:21.770 --> 00:41:25.310 Questions, now, on any of these building blocks? 00:41:25.310 --> 00:41:26.075 Yeah? 00:41:26.075 --> 00:41:31.360 AUDIENCE: In the for loop, was i set to 0 once for [? every loop? ?] 00:41:31.360 --> 00:41:33.970 DAVID MALAN: In the for loop, was i-- 00:41:33.970 --> 00:41:37.090 it was set to 0 on the first iteration, then 1 on the next, 00:41:37.090 --> 00:41:38.530 then 2 on the third. 00:41:38.530 --> 00:41:39.985 And the same thing for range. 00:41:39.985 --> 00:41:44.050 It just doesn't use up as much memory all at once. 00:41:44.050 --> 00:41:49.860 Other questions, now, on any of these building blocks of Python? 00:41:49.860 --> 00:41:50.400 All right. 00:41:50.400 --> 00:41:53.250 Well, let's go ahead and build something a little more than hello. 00:41:53.250 --> 00:41:56.415 Let me propose that, over here, we implement, maybe, 00:41:56.415 --> 00:41:58.200 the simplest of calculators here. 00:41:58.200 --> 00:42:02.145 So let me go back to VS Code here, open my terminal window 00:42:02.145 --> 00:42:06.885 and open up, say, a file called calculator.py. 00:42:06.885 --> 00:42:09.000 And in calculator.py, we'll have an opportunity 00:42:09.000 --> 00:42:11.340 to explore some of these building blocks, 00:42:11.340 --> 00:42:13.890 but we'll allow things to escalate pretty quickly 00:42:13.890 --> 00:42:17.225 to more interesting examples so that we can do the same thing, ultimately, 00:42:17.225 --> 00:42:17.760 as well. 00:42:17.760 --> 00:42:19.510 And, in fact, let me go ahead and do this. 
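To recap the forever-style loop from a moment ago, here is a small sketch of while True in Python. The counter and the stopping point of 3 are assumptions added so the example terminates; they are not part of the lecture's example.

```python
count = 0

while True:            # capital T: lowercase "true" is not defined in Python
    count += 1
    if count == 3:
        break          # break is how you leave an otherwise infinite loop

print(count)
```

Without the break, this loop would run until the program is interrupted.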
00:42:19.510 --> 00:42:22.950 Moreover, I've brought some code with me in advance. 00:42:22.950 --> 00:42:25.725 For instance, something called calculator0.c, 00:42:25.725 --> 00:42:28.860 from the first week of C. And let me go ahead 00:42:28.860 --> 00:42:34.420 and split my window here, in fact, so that I can now do something like this. 00:42:34.420 --> 00:42:37.170 Let me move this over here, here. 00:42:37.170 --> 00:42:38.105 Calculator.py. 00:42:38.105 --> 00:42:40.920 So now, I have, on the left of my screen, calculator.c-- 00:42:40.920 --> 00:42:43.620 or calculator0.c because that's the first version I 00:42:43.620 --> 00:42:45.690 made-- and calculator.py on the right. 00:42:45.690 --> 00:42:48.290 Let me go ahead and implement, really, the same idea here. 00:42:48.290 --> 00:42:51.675 So on the right-hand side, the analog of including cs50.h 00:42:51.675 --> 00:42:56.390 would be from cs50 import get_int if I want to, indeed, use this function. 00:42:56.390 --> 00:42:58.140 Now, I'm going to go ahead and give myself 00:42:58.140 --> 00:43:00.453 a variable x without defining its type. 00:43:00.453 --> 00:43:02.370 I'm going to use this get_int function and I'm 00:43:02.370 --> 00:43:05.302 going to prompt the user for x, just like in C. 00:43:05.302 --> 00:43:08.010 I'm, then, going to go ahead and prompt the user for another int, 00:43:08.010 --> 00:43:12.300 like y, here, just like in C. And at the very end, I'm going to go ahead 00:43:12.300 --> 00:43:14.640 and do print x plus y. 00:43:14.640 --> 00:43:15.690 And that's it. 00:43:15.690 --> 00:43:19.020 Now, granted, I have some comments in my C version of the code, 00:43:19.020 --> 00:43:21.090 just to remind you of what each line is doing. 00:43:21.090 --> 00:43:23.878 But I've still distilled this into six lines-- or, really, four 00:43:23.878 --> 00:43:25.170 if I get rid of the blank line. 00:43:25.170 --> 00:43:29.580 So it's already, perhaps, a bit tighter here. 
00:43:29.580 --> 00:43:33.600 It's tighter because something really important, historically, is missing. 00:43:33.600 --> 00:43:38.240 What did I seem to omit altogether that we haven't really highlighted yet? 00:43:38.240 --> 00:43:39.136 Yeah? 00:43:39.136 --> 00:43:40.530 AUDIENCE: [INAUDIBLE] 00:43:40.530 --> 00:43:41.280 DAVID MALAN: Yeah. 00:43:41.280 --> 00:43:42.910 The main function is gone. 00:43:42.910 --> 00:43:45.330 And in fact, maybe you took for granted that it just 00:43:45.330 --> 00:43:47.580 worked a moment ago when I wrote hello, but I didn't 00:43:47.580 --> 00:43:49.273 have a main function in hello, either. 00:43:49.273 --> 00:43:52.440 And this, too, is a feature of Python and a lot of other languages, as well. 00:43:52.440 --> 00:43:55.320 Instead of having to adhere to these long-standing traditions, 00:43:55.320 --> 00:43:57.400 if you just want to write code and get something done, fine. 00:43:57.400 --> 00:43:59.925 Just write code and get something done without, necessarily, 00:43:59.925 --> 00:44:01.185 all of this same boilerplate. 00:44:01.185 --> 00:44:04.380 So whatever is in your Python file-- 00:44:04.380 --> 00:44:06.510 left indented, if you will, by default-- 00:44:06.510 --> 00:44:10.180 is just going to be the code that the interpreter runs, top to bottom, 00:44:10.180 --> 00:44:10.850 left to right. 00:44:10.850 --> 00:44:14.300 Well, let me go ahead, now, and run code like this. 00:44:14.300 --> 00:44:17.470 Let me go ahead and open back up my terminal window, 00:44:17.470 --> 00:44:19.140 run python of calculator.py. 00:44:19.140 --> 00:44:21.570 And I'll do x is 1, y is 2. 00:44:21.570 --> 00:44:23.460 And as you might expect, it gives me 3. 00:44:23.460 --> 00:44:24.570 Slight aesthetic bug. 00:44:24.570 --> 00:44:26.590 I put my space in the wrong place here. 00:44:26.590 --> 00:44:27.810 So that's a newbie mistake. 00:44:27.810 --> 00:44:29.220 Let me fix that, aesthetically. 
00:44:29.220 --> 00:44:31.050 Let me rerun python of calculator.py. 00:44:31.050 --> 00:44:31.680 Type in 1. 00:44:31.680 --> 00:44:32.250 Type in 2. 00:44:32.250 --> 00:44:36.280 And, voila, there is now my same version again. 00:44:36.280 --> 00:44:39.585 But let me propose, now, that we get rid of this training wheel. 00:44:39.585 --> 00:44:41.460 We don't want to keep taking one step forward 00:44:41.460 --> 00:44:43.793 and then two steps back by adding these training wheels, 00:44:43.793 --> 00:44:45.330 so let me instead do this. 00:44:45.330 --> 00:44:49.590 In my version of calculator.py, suppose that we take away, already, 00:44:49.590 --> 00:44:53.610 the training wheel that is the CS50 library here and let me, 00:44:53.610 --> 00:44:56.910 instead, then, use just Python's built-in function called 00:44:56.910 --> 00:44:59.020 input, which literally does just that. 00:44:59.020 --> 00:45:03.600 It gets input from the user and it stores it, as before, in x and y. 00:45:03.600 --> 00:45:04.950 So this is not CS50-specific. 00:45:04.950 --> 00:45:07.155 This is real-world Python programming. 00:45:07.155 --> 00:45:10.740 Well, let me go ahead and run, again, python of calculator.py. 00:45:10.740 --> 00:45:16.530 And, of course, if x is 1 and y is 2, x plus y should, of course, still be 3. 00:45:19.306 --> 00:45:24.285 It's apparently 12, according to Python, until CS50's library gets involved. 00:45:24.285 --> 00:45:28.620 But does anyone want to infer what just went wrong? 00:45:28.620 --> 00:45:29.160 Yeah? 00:45:29.160 --> 00:45:32.925 AUDIENCE: We're always [INAUDIBLE]. 00:45:32.925 --> 00:45:33.800 DAVID MALAN: Exactly. 00:45:33.800 --> 00:45:37.660 The input function, by design, always returns a string of text. 00:45:37.660 --> 00:45:39.410 After all, that's what the human typed in. 00:45:39.410 --> 00:45:42.620 And even though, yes, I typed the number keys on the keyboard, 00:45:42.620 --> 00:45:44.600 it's still coming back as all text. 
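The 12 can be reproduced without a keyboard. In this sketch, the strings "1" and "2" are stand-ins for what input would hand back after the user types 1 and 2; the variable name result is illustrative.

```python
# Stand-ins for x = input("x: ") and y = input("y: "):
# input always returns a str, even when the user presses number keys.
x = "1"
y = "2"

result = x + y   # + applied to two strings joins them end to end
print(result)
```

Because both operands are strings, plus means concatenation rather than addition.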
00:45:44.600 --> 00:45:47.090 Now, maybe we should use like a get_int function. 00:45:47.090 --> 00:45:48.575 Well, that doesn't exist in Python. 00:45:48.575 --> 00:45:52.340 All you can do is get textual input-- a string from the user. 00:45:52.340 --> 00:45:54.415 But we can convert one to the other. 00:45:54.415 --> 00:45:58.610 And so, a fix for this so that we don't accidentally concatenate-- 00:45:58.610 --> 00:46:02.760 that is, join x plus y together-- would be to do something like this. 00:46:02.760 --> 00:46:04.595 Let me go back to my Python code, here. 00:46:04.595 --> 00:46:08.870 And whereas, in C, we could previously do typecasting-- 00:46:08.870 --> 00:46:11.060 we can convert one type to another-- 00:46:11.060 --> 00:46:14.420 that generally wasn't the case when you were doing something complex, 00:46:14.420 --> 00:46:15.470 like a string to an int. 00:46:15.470 --> 00:46:18.450 You could do a char to an int and vice versa. 00:46:18.450 --> 00:46:22.370 But for a string, recall, there was a special function in C's standard library 00:46:22.370 --> 00:46:25.100 called atoi, like ASCII to integer. 00:46:25.100 --> 00:46:27.880 That's the closest analog, here. 00:46:27.880 --> 00:46:29.630 And, in fact, the way to do this in Python 00:46:29.630 --> 00:46:32.740 would be to use a function called int, which, 00:46:32.740 --> 00:46:34.490 indeed, is the name of the data type, too, 00:46:34.490 --> 00:46:36.380 even though I have not yet had to type it. 00:46:36.380 --> 00:46:40.340 And I can convert the output of the input function 00:46:40.340 --> 00:46:44.600 automatically from a string immediately to an int. 00:46:44.600 --> 00:46:48.620 And now, if I go back to my terminal window, rerun python of calculator.py 00:46:48.620 --> 00:46:52.770 with 1 and 2 for x and y, now, I'm back in business.
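Here is a hedged sketch of that fix. The helper name add_typed is hypothetical, and it takes the typed text as parameters so the example runs without a keyboard; in the lecture's version, the strings come straight from input, as in x = int(input("x: ")).

```python
def add_typed(x_text, y_text):
    """Convert two pieces of typed text to ints, then add them."""
    x = int(x_text)   # int() turns the string "1" into the number 1
    y = int(y_text)
    return x + y


print(add_typed("1", "2"))
```

With the conversion in place, plus means arithmetic again instead of concatenation.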
00:46:52.770 --> 00:46:55.400 So that, then, is, for instance, what the CS50 library 00:46:55.400 --> 00:46:59.420 does, if temporarily this week, is it just deals with the conversion for you. 00:46:59.420 --> 00:47:03.500 And, in fact, bad things could happen if I type the wrong thing, 00:47:03.500 --> 00:47:05.615 like dog or cat instead of a number. 00:47:05.615 --> 00:47:08.400 But we'll cross that bridge in just a moment, as well. 00:47:08.400 --> 00:47:08.900 All right. 00:47:08.900 --> 00:47:11.990 What if we do something slightly different, now, with our calculator. 00:47:16.400 --> 00:47:18.790 Instead of addition, let's do division instead. 00:47:18.790 --> 00:47:23.990 So z equals x divided by y, thereby giving me a third variable z. 00:47:23.990 --> 00:47:27.320 Let me go ahead and run python of calculator.py again. 00:47:27.320 --> 00:47:29.120 I'll type in 1. 00:47:29.120 --> 00:47:31.790 I'll type in 3 this time. 00:47:31.790 --> 00:47:37.470 And what problem do you think we're about to see? 00:47:37.470 --> 00:47:38.400 Or is it gone? 00:47:38.400 --> 00:47:41.670 What happened when I did this in C, albeit with some slightly more 00:47:41.670 --> 00:47:47.680 cryptic syntax, when I divided one number, like 1 divided by 3? 00:47:47.680 --> 00:47:48.600 Anyone recall? 00:47:48.600 --> 00:47:49.100 Yeah? 00:47:49.100 --> 00:47:51.310 AUDIENCE: You would round to the nearest integer. 00:47:51.310 --> 00:47:52.060 DAVID MALAN: Yeah. 00:47:52.060 --> 00:47:55.030 So it would round down to the nearest integer, 00:47:55.030 --> 00:47:57.560 whereby you experience truncation. 00:47:57.560 --> 00:48:00.340 So if you take an integer like 1, you divide it 00:48:00.340 --> 00:48:02.530 by another integer like 3, that technically 00:48:02.530 --> 00:48:06.310 should be 0.33333, infinitely long. 00:48:06.310 --> 00:48:10.297 But in C, recall, you truncate the value. 
00:48:10.297 --> 00:48:12.130 If you divide an int by an int, you get back 00:48:12.130 --> 00:48:14.965 an int, which means you get only the integer part, which was the 0. 00:48:14.965 --> 00:48:18.805 Now, Python actually handles this for us and avoids the truncation. 00:48:18.805 --> 00:48:23.650 But it leaves us, still, with one other problem here, which is going to be, 00:48:23.650 --> 00:48:27.453 for instance, not necessarily visible at a glance. 00:48:27.453 --> 00:48:28.245 This looks correct. 00:48:28.245 --> 00:48:31.780 This has solved the problem in C. So truncation does not happen. 00:48:31.780 --> 00:48:36.010 The integers are automatically converted to a float-- a floating point value. 00:48:36.010 --> 00:48:41.970 But what other problem did we trip over, back in week one? 00:48:44.480 --> 00:48:49.700 What else got a little dicey when dealing with simple arithmetic? 00:48:49.700 --> 00:48:51.238 Anyone recall? 00:48:51.238 --> 00:48:53.280 Well, the syntax in Python is a little different, 00:48:53.280 --> 00:48:54.780 but let me go ahead and do this. 00:48:54.780 --> 00:48:58.700 It turns out, in Python, if you want to see more significant digits than what 00:48:58.700 --> 00:49:02.360 I'm seeing here by default, which is a dozen or so, let me go ahead 00:49:02.360 --> 00:49:03.715 and print out z as follows. 00:49:03.715 --> 00:49:07.310 Let me first print out a format string because I want to format z 00:49:07.310 --> 00:49:08.780 in an interesting way. 00:49:08.780 --> 00:49:11.330 And notice, this would have no effect on the difference. 00:49:11.330 --> 00:49:14.630 This is just a format string that, for no compelling reason at the moment, 00:49:14.630 --> 00:49:19.280 is interpolating z in those curly braces using an fstring or format string. 00:49:19.280 --> 00:49:23.390 If I run this again with 1 and 3, we'll see, indeed, the exact same thing. 
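A small sketch of the division just described, assuming Python 3. The strings "1" and "3" stand in for what the user types, and the names plain and formatted are illustrative.

```python
x = int("1")   # stand-in for int(input("x: "))
y = int("3")   # stand-in for int(input("y: "))

z = x / y      # dividing an int by an int yields a float, so no truncation

plain = str(z)          # what print(z) displays
formatted = f"{z}"      # an f-string with no extra formatting instructions
print(plain, formatted)
```

The two printed values match, which is the point made above: interpolating z in an f-string, by itself, changes nothing about how it is displayed.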
00:49:23.390 --> 00:49:25.700 But when you use an fstring, you, indeed, 00:49:25.700 --> 00:49:28.460 have the ability to format that string more precisely. 00:49:28.460 --> 00:49:32.930 Just like with %f in Python, you could start to fine-tune how many significant 00:49:32.930 --> 00:49:35.720 digits you see-- 00:49:35.720 --> 00:49:37.070 in C, rather. 00:49:37.070 --> 00:49:40.190 In Python, you can do the same, but the syntax is a little different. 00:49:40.190 --> 00:49:43.925 If you want the computer to interpolate z and show you 00:49:43.925 --> 00:49:47.570 50 significant digits-- that is, 50 numbers 00:49:47.570 --> 00:49:50.033 after the decimal point-- syntax is similar to C, 00:49:50.033 --> 00:49:51.200 but it's a little different. 00:49:51.200 --> 00:49:54.110 You literally put a colon after the variable's name. 00:49:54.110 --> 00:49:59.090 dot 50 means show me the decimal point and, then, 50 digits to the right, 00:49:59.090 --> 00:50:02.760 and the f just indicates please treat this as a floating point value. 00:50:02.760 --> 00:50:05.540 So now, if I rerun python of calculator.py, 00:50:05.540 --> 00:50:11.495 divide 1 by 3, unfortunately, Python has not solved all of the world's problems 00:50:11.495 --> 00:50:12.710 for us. 00:50:12.710 --> 00:50:15.545 This, again, was an example of floating point imprecision. 00:50:15.545 --> 00:50:17.692 So that problem is still latent. 00:50:17.692 --> 00:50:20.150 So just because the world has advanced, doesn't necessarily 00:50:20.150 --> 00:50:22.317 mean that all of our problems from C have gone away. 00:50:22.317 --> 00:50:26.418 There are solutions using third-party libraries for scientific calculations 00:50:26.418 --> 00:50:26.960 and the like. 00:50:26.960 --> 00:50:31.445 But out of the box, floating point imprecision is still an issue. 
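The format specifier just described can be sketched as follows. The variable names shown and digits are illustrative, and the example inspects the digits rather than assuming any particular display.

```python
z = 1 / 3

# :.50f means: treat z as a float and show 50 digits after the decimal point
shown = f"{z:.50f}"
print(shown)

digits = shown.split(".")[1]      # everything to the right of the decimal point
all_threes = digits == "3" * 50   # would be True if 1/3 were stored exactly
print(len(digits), all_threes)
```

The digits start out as 3s and then drift, because a float stores only the closest value it can represent to one third.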
00:50:31.445 --> 00:50:35.780 Meanwhile, there was one other problem in C 00:50:35.780 --> 00:50:39.890 that we ran into involving numbers, and that was this-- integer overflow. 00:50:39.890 --> 00:50:41.930 Recall that an integer in C only took up, 00:50:41.930 --> 00:50:45.140 what, 32 bits typically, which meant you could count as high as 4 billion 00:50:45.140 --> 00:50:48.140 or, maybe, if you're doing positive and negatives, as high as 2 billion, 00:50:48.140 --> 00:50:50.030 after which, weird things would happen. 00:50:50.030 --> 00:50:54.798 The number would go to 0 or negative or it would overflow or wrap back around. 00:50:54.798 --> 00:50:56.840 Well, wonderfully, in Python, they did, at least, 00:50:56.840 --> 00:51:00.800 address this, whereby you can count as high as you want. 00:51:00.800 --> 00:51:03.830 And Python will just use more and more and more and more 00:51:03.830 --> 00:51:08.000 bits and bytes to store really big numbers so integer overflow is not 00:51:08.000 --> 00:51:09.020 a thing. 00:51:09.020 --> 00:51:13.820 With that said, Python is limited to how many digits it will show you 00:51:13.820 --> 00:51:15.410 on the screen at once as a string. 00:51:15.410 --> 00:51:18.560 But, mathematically, your math will be correct now. 00:51:18.560 --> 00:51:21.860 So we've taken a couple of steps forward, one step sideways. 00:51:21.860 --> 00:51:25.530 But, indeed, we have solved some of our problems here. 00:51:25.530 --> 00:51:26.030 All right. 00:51:26.030 --> 00:51:32.230 Questions, now, on any of these examples thus far? 00:51:32.230 --> 00:51:34.400 Question? 00:51:34.400 --> 00:51:35.000 All right. 00:51:35.000 --> 00:51:40.250 Well, how about another problem that we encountered in C. Let's 00:51:40.250 --> 00:51:41.720 revisit it here in Python, as well. 
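To illustrate the point about integer overflow, here is a brief sketch. The names biggest_c_int, bigger, and huge are made up for this example, and the 32-bit limit is the typical C int size mentioned above.

```python
biggest_c_int = 2**31 - 1    # the largest value a 32-bit signed int can hold

bigger = biggest_c_int + 1   # in Python, this just keeps counting upward
huge = 2**100                # far beyond what 32 or even 64 bits could store

print(bigger)
print(huge)
```

Python's int grows to use as much memory as the value needs, so neither result wraps around to 0 or a negative number.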
00:51:41.720 --> 00:51:43.595 So let me go ahead and, on the left-hand side 00:51:43.595 --> 00:51:54.020 here, let me open up a file called, say, compare3.c on the left, 00:51:54.020 --> 00:51:57.640 and let me go ahead and create a new file on the right called compare.py. 00:51:57.640 --> 00:52:00.070 Because recall that bad things happened when 00:52:00.070 --> 00:52:03.580 we needed to compare two values in C. So on the left, 00:52:03.580 --> 00:52:06.550 here, is a reminder of what we once did in C, 00:52:06.550 --> 00:52:11.230 whereby, if we want to compare values, we can get an int in C, store it in x. 00:52:11.230 --> 00:52:13.450 Get an int in C, store it in y. 00:52:13.450 --> 00:52:16.180 We then have our familiar, conditional logic here, 00:52:16.180 --> 00:52:19.210 just printing out if x is less than y or not. 00:52:19.210 --> 00:52:23.080 Well, we can certainly do the same thing, ultimately, in Python 00:52:23.080 --> 00:52:25.720 by using some fairly familiar syntax. 00:52:25.720 --> 00:52:27.640 And let's just demonstrate this one quickly. 00:52:27.640 --> 00:52:29.500 Let me go over here, too. 00:52:29.500 --> 00:52:34.690 I'll do from cs50 import get_int, even though I could do this, instead, 00:52:34.690 --> 00:52:36.700 with the input function itself. 00:52:36.700 --> 00:52:39.700 x equals get_int, and I'll prompt the user for that. 00:52:39.700 --> 00:52:42.880 y equals get_int, and I'll prompt the user for that. 00:52:42.880 --> 00:52:45.910 After that, recall that I can say, without parentheses, 00:52:45.910 --> 00:52:52.010 if x is less than y, then print out, without the f, "x is less than y." 00:52:52.010 --> 00:52:58.570 Then, I can go ahead and say else if x is greater than y, I can print out, 00:52:58.570 --> 00:53:01.270 quote unquote, "x is greater than y." 00:53:01.270 --> 00:53:05.320 If you'd like to interject now, what did I screw up? 00:53:05.320 --> 00:53:05.820 Anyone? 00:53:05.820 --> 00:53:06.150 Yeah?
00:53:06.150 --> 00:53:06.915 AUDIENCE: Elif. 00:53:06.915 --> 00:53:07.957 DAVID MALAN: Elif, right? 00:53:07.957 --> 00:53:13.965 So elif x is greater than y, else-- this part's the same-- print 00:53:13.965 --> 00:53:18.000 "x is equal to y." 00:53:18.000 --> 00:53:19.805 There's no new logic going on here. 00:53:19.805 --> 00:53:21.960 But, at least syntactically, it's a little cleaner. 00:53:21.960 --> 00:53:25.500 Indeed, this program is only 11 lines long, albeit without any comments. 00:53:25.500 --> 00:53:27.765 Let me go ahead and run python of compare.py. 00:53:27.765 --> 00:53:28.350 Let's see. 00:53:28.350 --> 00:53:30.235 Is 1 less than 2? 00:53:30.235 --> 00:53:30.735 Indeed. 00:53:30.735 --> 00:53:32.070 Let's run it again. 00:53:32.070 --> 00:53:33.330 Is 2 less than 1? 00:53:33.330 --> 00:53:34.890 No, it's greater than. 00:53:34.890 --> 00:53:37.740 And let's, lastly, type in 1 and 1 twice. 00:53:37.740 --> 00:53:38.910 x is equal to y. 00:53:38.910 --> 00:53:42.030 So we've got a pretty side-by-side, one-to-one conversion here. 00:53:42.030 --> 00:53:44.190 Let's do something a little more interesting, then. 00:53:44.190 --> 00:53:48.270 In C, how about I open, instead, something where we actually 00:53:48.270 --> 00:53:49.310 compared for a purpose? 00:53:49.310 --> 00:53:54.150 So if I open up, from earlier in the course-- 00:53:54.150 --> 00:54:00.320 how about agree.c, which prompts the user to agree to something or not? 00:54:00.320 --> 00:54:03.860 And let me code up a new version here, called agree.py. 00:54:03.860 --> 00:54:06.720 And I'll do this on the right-hand side, with agree.py. 00:54:06.720 --> 00:54:08.830 But on agree.c on the left-- 00:54:08.830 --> 00:54:12.210 notice that this is how we did this yes-no thing in C-- 00:54:12.210 --> 00:54:16.590 we compared c, a character, equal to single quotes 'Y' 00:54:16.590 --> 00:54:18.840 or equal to single quotes little 'y.' 00:54:18.840 --> 00:54:20.430 And then, the same thing for n.
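The comparison logic from compare.py can be sketched roughly as follows. Wrapping it in a function named compare that returns the message is an assumption made so the example runs without prompting; the lecture's version gets x and y from the user and prints directly.

```python
def compare(x, y):
    """Return the same message that compare.py prints for a given x and y."""
    if x < y:
        return "x is less than y"
    elif x > y:           # elif, not "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))
print(compare(2, 1))
print(compare(1, 1))
```

Note the colons and indentation doing the work that parentheses and curly braces did in C.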
00:54:20.430 --> 00:54:22.470 Now, in Python, this one is actually going 00:54:22.470 --> 00:54:23.960 to be a little bit different, here. 00:54:23.960 --> 00:54:27.310 Let me go ahead and, in the Python version of this, 00:54:27.310 --> 00:54:29.640 let me do something like this. 00:54:29.640 --> 00:54:31.258 We'll use get_string. 00:54:31.258 --> 00:54:31.800 Actually, no. 00:54:31.800 --> 00:54:33.217 We'll just use input in this case. 00:54:33.217 --> 00:54:36.780 So let's do s equals input. 00:54:36.780 --> 00:54:38.940 And we'll ask the user the same thing-- 00:54:38.940 --> 00:54:40.875 Do you agree, question mark. 00:54:40.875 --> 00:54:46.110 Then, let's go ahead and say, if s equals equals-- 00:54:46.110 --> 00:54:48.940 how about Y? 00:54:48.940 --> 00:54:49.740 Huh. 00:54:49.740 --> 00:54:50.758 How do I do this? 00:54:50.758 --> 00:54:51.550 Well, a few things. 00:54:51.550 --> 00:54:54.660 Turns out, I'm going to do this-- s equals equals little y. 00:54:54.660 --> 00:54:57.210 Then, I'm going to go ahead and print out "Agreed." 00:54:57.210 --> 00:55:03.390 And elif s equals equals capital N or s equals equals lowercase n, 00:55:03.390 --> 00:55:05.520 I'm going to go ahead and print out "Not agreed." 00:55:05.520 --> 00:55:08.820 And I claim, for the moment, that this is identical, now, 00:55:08.820 --> 00:55:13.760 to the program on the left in C. But what's different? 00:55:13.760 --> 00:55:17.280 So we're still doing the same kind of logic, these equal equals 00:55:17.280 --> 00:55:18.780 for comparing for equality. 00:55:18.780 --> 00:55:21.922 But notice that, nicely enough, Python got rid of the two vertical bars, 00:55:21.922 --> 00:55:23.505 and it's just literally the word "or." 00:55:23.505 --> 00:55:27.933 If you recall seeing ampersand ampersand to express a logical and in C, [GRUNTS] 00:55:27.933 --> 00:55:29.850 you can just write, literally, the word "and." 
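A rough sketch of this first version of agree.py follows. The function name agree_with_or and the choice to return None for any other input are assumptions for illustration; the lecture's program reads s with input and prints the message itself.

```python
def agree_with_or(s):
    """Mirror agree.py: compare s against each accepted answer using 'or'."""
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None   # anything else: the original program prints nothing


print(agree_with_or("Y"))
print(agree_with_or("n"))
```

The word or replaces C's two vertical bars, and the comparisons work on whole strings, which C's == could not do.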
00:55:29.850 --> 00:55:33.390 And so, here's a hint of why Python tends to be pretty popular. 00:55:33.390 --> 00:55:35.640 People just like that it's a little closer to English. 00:55:35.640 --> 00:55:38.520 There's a little less of the cryptic syntax here. 00:55:38.520 --> 00:55:41.850 Now, this is correct, as this code will now work. 00:55:41.850 --> 00:55:45.750 But I've also used double quotes instead of single quotes, 00:55:45.750 --> 00:55:48.780 and I also omitted, a few minutes ago, from my list of data 00:55:48.780 --> 00:55:51.180 types in Python the word "char." 00:55:51.180 --> 00:55:53.430 In Python, there are no chars. 00:55:53.430 --> 00:55:55.320 There are no individual characters. 00:55:55.320 --> 00:55:58.830 If you want to manipulate an individual character, you use a string-- 00:55:58.830 --> 00:56:00.510 that is to say, a str-- 00:56:00.510 --> 00:56:01.680 of size 1. 00:56:01.680 --> 00:56:04.930 Now, in Python, you can use single quotes or double quotes. 00:56:04.930 --> 00:56:06.930 I'm deliberately using double quotes everywhere, 00:56:06.930 --> 00:56:09.715 just for consistency with how we treat strings in C. 00:56:09.715 --> 00:56:12.090 It's pretty common, though, to use single quotes instead, 00:56:12.090 --> 00:56:14.190 if only because, on most keyboards, you don't 00:56:14.190 --> 00:56:16.320 have to hold the Shift key anymore. 00:56:16.320 --> 00:56:18.288 Humans have really started to optimize just how 00:56:18.288 --> 00:56:19.830 quickly they want to be able to code. 00:56:19.830 --> 00:56:22.110 So using a single quote tends to be pretty popular 00:56:22.110 --> 00:56:24.270 in Python and other languages, as well. 00:56:24.270 --> 00:56:29.520 They are fundamentally the same, single or double, unlike in C, 00:56:29.520 --> 00:56:30.570 where they have meaning. 00:56:30.570 --> 00:56:33.120 So this is correct, I claim. 00:56:33.120 --> 00:56:34.830 And, in fact, let me run this real quick. 
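To make the point about quotes and the missing char type concrete, here is a short sketch. The variable names are illustrative only.

```python
single = 'y'         # single quotes
double = "y"         # double quotes: the very same one-character string

first = "yes"[0]     # indexing a string hands back another str, of size 1

print(single == double)
print(type(first).__name__, len(first))
```

There is no separate character type to reach for: a single character is just a str whose length happens to be 1.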
00:56:34.830 --> 00:56:37.090 I'll open up my terminal window here. 00:56:37.090 --> 00:56:40.230 Let me get rid of the version in C and run python of agree.py. 00:56:40.230 --> 00:56:42.420 And I'll type in Y. OK. 00:56:42.420 --> 00:56:44.220 I'll run it again and type in little y. 00:56:44.220 --> 00:56:46.780 And I'll stipulate it's going to work for no, as well. 00:56:46.780 --> 00:56:49.840 But this isn't necessarily the only way we can do this. 00:56:49.840 --> 00:56:52.350 There are other ways to implement the same idea. 00:56:52.350 --> 00:56:57.630 And in fact, I can go about doing this instead. 00:56:57.630 --> 00:56:59.910 Let me go back up to my code here. 00:56:59.910 --> 00:57:03.240 And we saw a hint of this earlier. 00:57:03.240 --> 00:57:06.240 We know that lists exist in Python, and you can create them 00:57:06.240 --> 00:57:08.040 just by using square brackets. 00:57:08.040 --> 00:57:10.380 So what if I simplify the code a little bit and just 00:57:10.380 --> 00:57:14.940 say if s is in the following list of values-- 00:57:14.940 --> 00:57:17.850 capital Y or lowercase y. 00:57:17.850 --> 00:57:21.090 It's not all that different, logically, but it's a little tighter. 00:57:21.090 --> 00:57:22.440 It's a little more compact. 00:57:22.440 --> 00:57:29.040 So elif s is in capital N or lowercase n, I can express that same idea, too. 00:57:29.040 --> 00:57:32.220 So here, again, it's just getting a little more pleasant to write code. 00:57:32.220 --> 00:57:33.960 There's less hitting of the keyboard. 00:57:33.960 --> 00:57:36.090 You can express yourself a little more succinctly. 00:57:36.090 --> 00:57:40.020 And using the keyword in, Python will figure out 00:57:40.020 --> 00:57:44.370 how to search the entire list for whatever the value of s is. 00:57:44.370 --> 00:57:47.010 And if it finds it, it will return True automatically. 00:57:47.010 --> 00:57:48.230 Else, it will return False. 
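The tighter version using in might look like the sketch below. The function name agree_with_in is hypothetical, and returning None for unrecognized input is an assumption that mirrors a program that prints nothing in that case.

```python
def agree_with_in(s):
    """Check s against lists of accepted answers using the 'in' keyword."""
    if s in ["Y", "y"]:       # True if s matches any value in the list
        return "Agreed."
    elif s in ["N", "n"]:
        return "Not agreed."
    return None


print(agree_with_in("y"))
print("Y" in ["Y", "y"], "q" in ["Y", "y"])
```

The in keyword searches the list and produces True or False, so the condition reads almost like the English sentence it implements.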
00:57:48.230 --> 00:57:54.960 So if I run agree.py again and type in capital Y or lowercase y, that still, 00:57:54.960 --> 00:57:55.695 now, works. 00:57:55.695 --> 00:58:00.330 Well, I can tighten this up further if I want to add more features. 00:58:00.330 --> 00:58:04.710 Well, what if I want to support not just big Y and little y, 00:58:04.710 --> 00:58:10.050 but how about "Yes" or "yes" or, in case the user 00:58:10.050 --> 00:58:14.357 is yelling or someone who isn't good with CapsLock types in "YES"? 00:58:14.357 --> 00:58:14.940 Wait a minute. 00:58:14.940 --> 00:58:16.020 But it could be weird. 00:58:16.020 --> 00:58:20.850 Do we want to support this or this? 00:58:20.850 --> 00:58:23.480 This just gets really tedious, quickly, combinatorially, 00:58:23.480 --> 00:58:25.710 if you consider all of these possible permutations. 00:58:25.710 --> 00:58:27.990 What would be smarter than doing something 00:58:27.990 --> 00:58:30.120 like this, if you want to just be able to tolerate 00:58:30.120 --> 00:58:33.570 "yes" in any form of capitalization? 00:58:33.570 --> 00:58:35.370 Logically, what would be nice? 00:58:35.370 --> 00:58:38.232 AUDIENCE: Maybe, whatever the input is, you just transfer it over 00:58:38.232 --> 00:58:40.357 to all lowercase or uppercase, and then redo it? 00:58:40.357 --> 00:58:41.125 DAVID MALAN: Exactly. 00:58:41.125 --> 00:58:42.042 Super common paradigm. 00:58:42.042 --> 00:58:46.510 Why don't we just force the user's input to all lowercase or all uppercase-- 00:58:46.510 --> 00:58:49.570 doesn't matter, so long as we're self-consistent-- and just compare 00:58:49.570 --> 00:58:52.030 against all uppercase or all lowercase. 00:58:52.030 --> 00:58:55.760 And that will get rid of all of the possible permutations, otherwise. 00:58:55.760 --> 00:58:58.510 Now, in C, we might have done something like this.
00:58:58.510 --> 00:59:01.820 We might have simplified this whole list and just said-- 00:59:01.820 --> 00:59:04.940 let's say we'll do-- 00:59:04.940 --> 00:59:06.220 how about lowercase? 00:59:06.220 --> 00:59:10.490 So y or yes, and we'll just leave it at that. 00:59:10.490 --> 00:59:12.370 But we need to force, now, s to lowercase. 00:59:12.370 --> 00:59:15.970 Well, in C, we would have used the ctype library. 00:59:15.970 --> 00:59:19.660 We would have done tolower and called that function, passing it in. 00:59:19.660 --> 00:59:22.330 Although, not really because, in ctype, those 00:59:22.330 --> 00:59:25.870 operate on individual characters or chars, not whole strings. 00:59:25.870 --> 00:59:29.920 We actually didn't see a function that could convert a whole string in C 00:59:29.920 --> 00:59:31.030 to lowercase. 00:59:31.030 --> 00:59:34.910 But in Python, we're going to benefit from some other feature, as well. 00:59:34.910 --> 00:59:39.330 It turns out that Python supports what's called object-oriented programming. 00:59:39.330 --> 00:59:41.830 And we're only going to scratch the surface of this in CS50. 00:59:41.830 --> 00:59:44.740 But if you take a higher-level course in programming or CS, 00:59:44.740 --> 00:59:46.750 you explore this as a different paradigm. 00:59:46.750 --> 00:59:49.930 Up until now, in C, we've been focusing on what's called, really, 00:59:49.930 --> 00:59:51.025 procedural programming. 00:59:51.025 --> 00:59:52.210 You write procedures. 00:59:52.210 --> 00:59:55.250 You write functions, top to bottom, left to right. 00:59:55.250 --> 00:59:57.790 And when you want to change some value, we 00:59:57.790 --> 01:00:00.550 were in the habit of using a procedure-- that is, a function. 01:00:00.550 --> 01:00:03.670 You would pass something, like a variable, into a function, 01:00:03.670 --> 01:00:07.600 like toupper or tolower, and it would do its thing and hand you back a value.
01:00:07.600 --> 01:00:12.610 Well, it turns out that it would be nicer, programming-wise, if some data 01:00:12.610 --> 01:00:15.250 types just had built-in functionality. 01:00:15.250 --> 01:00:18.220 Why do we have our variables over here and all of our helper functions, 01:00:18.220 --> 01:00:21.010 like toupper and tolower over here, such that we constantly 01:00:21.010 --> 01:00:22.660 have to pass one into the other. 01:00:22.660 --> 01:00:27.590 It would be nice to bake into our data type some built-in functionality 01:00:27.590 --> 01:00:33.267 so that you can change variables using their own, default built-in 01:00:33.267 --> 01:00:33.850 functionality. 01:00:33.850 --> 01:00:37.510 And so, Object-Oriented Programming, otherwise known as OOP, 01:00:37.510 --> 01:00:41.635 is a technique whereby certain types of values, like a string-- 01:00:41.635 --> 01:00:47.230 AKA str-- not only have properties inside of them-- 01:00:47.230 --> 01:00:49.900 attributes, just like a struct in C-- 01:00:49.900 --> 01:00:54.480 your data can also have functions built into them, as well. 01:00:54.480 --> 01:00:57.955 So, whereas in C, which is not object-oriented, you have structs. 01:00:57.955 --> 01:01:01.150 And structs can only store data, like a name and a number 01:01:01.150 --> 01:01:02.620 when implementing a person. 01:01:02.620 --> 01:01:07.210 In Python, you can, for instance, have not just a structure-- 01:01:07.210 --> 01:01:09.010 otherwise known as a class-- 01:01:09.010 --> 01:01:10.930 storing a name and a number. 01:01:10.930 --> 01:01:15.460 You can have a function call that person or email that person 01:01:15.460 --> 01:01:19.510 or actual verbs or actions associated with that piece of data. 01:01:19.510 --> 01:01:21.910 Now, in the context of strings, it turns out 01:01:21.910 --> 01:01:24.565 that strings come with a lot of useful functionality. 
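To make the struct-versus-class contrast concrete, here is a small sketch. The class `Person`, its `call` method, and the sample name and number are all hypothetical; the lecture describes the idea but does not show this code.

```python
# Hypothetical class: in C, a struct could hold only the name and the number.
class Person:
    def __init__(self, name, number):
        # Attributes, much like the fields of a struct.
        self.name = name
        self.number = number

    # A method: a function built into the piece of data itself.
    def call(self):
        return f"Calling {self.name} at {self.number}"


p = Person("David", "555-0100")  # placeholder values
print(p.call())
```

The data and the action live together, so there is no separate helper function that has to be handed a person.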
01:01:24.565 --> 01:01:28.900 And in fact, at this URL here, which is in docs.python.org, 01:01:28.900 --> 01:01:31.720 which is the official documentation for Python, 01:01:31.720 --> 01:01:34.300 you'll see a whole list of methods-- 01:01:34.300 --> 01:01:37.705 that is, functions-- that come with strings that you can actually 01:01:37.705 --> 01:01:40.150 use to modify their values. 01:01:40.150 --> 01:01:42.440 And what I mean by this is the following. 01:01:42.440 --> 01:01:44.900 If we go through the documentation, poke around, 01:01:44.900 --> 01:01:48.163 it turns out that strings come with a function called lower. 01:01:48.163 --> 01:01:50.080 And if you want to use that function, you just 01:01:50.080 --> 01:01:54.850 have to use slightly different syntax than in C. You do not do tolower, 01:01:54.850 --> 01:01:59.140 and you do not say, as I just did, lower because this function is 01:01:59.140 --> 01:02:01.150 built into s itself. 01:02:01.150 --> 01:02:05.770 And just like in C, when you want to go inside of a variable, like a structure, 01:02:05.770 --> 01:02:09.790 and access a piece of data inside of it, like name or number, 01:02:09.790 --> 01:02:12.370 when you also have functions built into data types-- 01:02:12.370 --> 01:02:17.530 AKA methods; a method is just a function that is built into a piece of data-- 01:02:17.530 --> 01:02:23.480 you can do s dot lower open paren, closed paren in this case. 01:02:23.480 --> 01:02:25.480 And I can do this down here, as well. 01:02:25.480 --> 01:02:33.280 If s.lower in, quote unquote, "n" or "no", the whole thing, 01:02:33.280 --> 01:02:35.455 I can force this whole thing to lowercase. 01:02:35.455 --> 01:02:38.620 So the only difference here, now, in object-oriented programming, 01:02:38.620 --> 01:02:41.840 instead of constantly passing a value into a function, 01:02:41.840 --> 01:02:45.910 you just access a function that's inside of the value. 
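A sketch of the method-call version described above. As before, the helper name `check` and the returned strings are assumptions; only `s.lower()` and the lowercase lists come from the lecture.

```python
# Hypothetical helper; the lecture writes this if/elif directly in agree.py.
def check(s):
    # s.lower() calls the method built into the string s and
    # evaluates to a lowercase version of it.
    if s.lower() in ["y", "yes"]:
        return "Agreed."
    elif s.lower() in ["n", "no"]:
        return "Not agreed."
    return None


print(check("YES"))
print(check("No"))
```

Because the input is lowercased before the comparison, "YES", "Yes", and "yEs" all match without being listed.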
01:02:45.910 --> 01:02:48.928 It just works because of how the language itself is defined. 01:02:48.928 --> 01:02:51.220 And the only way you know whether these functions exist 01:02:51.220 --> 01:02:55.495 is the documentation-- a class, a book, a website or the like. 01:02:55.495 --> 01:03:00.490 Questions, now, on this technique? 01:03:00.490 --> 01:03:01.070 All right. 01:03:01.070 --> 01:03:02.513 I claim this is correct. 01:03:02.513 --> 01:03:05.180 Now, even though you've never programmed, most of you, in Python 01:03:05.180 --> 01:03:07.655 before, not super well-designed. 01:03:07.655 --> 01:03:12.140 There's a subtle inefficiency, now, on lines 3 and 5 together. 01:03:12.140 --> 01:03:18.150 What's dumb about how I've used lower, might you think? 01:03:18.150 --> 01:03:18.720 Yeah? 01:03:18.720 --> 01:03:21.975 AUDIENCE: I feel like, using it twice, you'd just want another [? variable. ?] 01:03:21.975 --> 01:03:22.440 DAVID MALAN: Yeah. 01:03:22.440 --> 01:03:25.482 If you're going to use the same function twice and ask the same question, 01:03:25.482 --> 01:03:29.248 expecting the same answer, why are you calling the function itself twice? 01:03:29.248 --> 01:03:31.415 Maybe we should just store the result in a variable. 01:03:31.415 --> 01:03:33.030 So we could do this in a couple of different ways. 01:03:33.030 --> 01:03:36.360 We, for instance, could go up here and create another variable called t 01:03:36.360 --> 01:03:38.040 and set that equal to s.lower. 01:03:38.040 --> 01:03:41.330 And then, we could just change this to be t, here. 01:03:41.330 --> 01:03:43.080 But honestly, I don't think we technically 01:03:43.080 --> 01:03:45.480 need another variable altogether, here. 01:03:45.480 --> 01:03:47.410 I could just do something like this. 01:03:47.410 --> 01:03:52.360 Let's change the value of s to be the lowercase version thereof. 
01:03:52.360 --> 01:03:55.920 And so, now, I can quite simply refer to s again and again like this, 01:03:55.920 --> 01:03:57.550 reusing that same value. 01:03:57.550 --> 01:04:01.380 Now, to be sure, I have now just lost the user's original input. 01:04:01.380 --> 01:04:05.430 And if I care about that-- if they typed in all caps, I have no idea anymore. 01:04:05.430 --> 01:04:08.070 So maybe I do want to use a separate variable, altogether. 01:04:08.070 --> 01:04:10.830 But a takeaway here, too, is that strings in Python 01:04:10.830 --> 01:04:13.590 are technically what we'll call immutable-- 01:04:13.590 --> 01:04:15.640 that is, they cannot be changed. 01:04:15.640 --> 01:04:19.830 This was not true in C. Once we gave you arrays in week two 01:04:19.830 --> 01:04:22.800 or memory in week four, you could go to town on a string 01:04:22.800 --> 01:04:25.780 and change any of the characters you want-- uppercasing, lowercasing, 01:04:25.780 --> 01:04:27.560 changing it, shortening it and so forth. 01:04:27.560 --> 01:04:33.690 But in this case, this returns a copy of s, forced to lowercase. 01:04:33.690 --> 01:04:35.790 It doesn't change the original string-- 01:04:35.790 --> 01:04:38.700 that is, the bytes in the computer's memory. 01:04:38.700 --> 01:04:41.580 When you assign it back to s, you're essentially 01:04:41.580 --> 01:04:43.703 forgetting about the old version of s. 01:04:43.703 --> 01:04:46.620 But because Python does memory management for you-- there's no malloc, 01:04:46.620 --> 01:04:47.820 there's no free-- 01:04:47.820 --> 01:04:52.200 Python automatically frees up the original bytes, like Y-E-S, 01:04:52.200 --> 01:04:54.750 and hands them back to the operating system for you. 01:04:54.750 --> 01:04:55.340 All right. 01:04:55.340 --> 01:04:59.640 Questions, now, on this technique? 01:04:59.640 --> 01:05:02.310 Questions on this? 
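The point about immutability can be seen in a few lines. This is only a sketch; the sample value "YES" and the try/except are illustrative additions, not lecture code.

```python
s = "YES"
t = s.lower()  # lower() returns a new str; s itself is untouched
print(s, t)

s = s.lower()  # reassigning: s now names the lowercase copy
print(s)

try:
    s[0] = "Y"  # unlike a char array in C, a str cannot be edited in place
except TypeError as error:
    print("TypeError:", error)
```

Keeping the separate variable `t` is the choice to make when the user's original capitalization still matters later.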
01:05:02.310 --> 01:05:05.145 In general, I'll call out-- the Python documentation 01:05:05.145 --> 01:05:07.927 will start to be your friend because, in class, we'll only scratch 01:05:07.927 --> 01:05:09.510 the surface with some of these things. 01:05:09.510 --> 01:05:12.210 But in docs.python.org, for instance, there's 01:05:12.210 --> 01:05:15.630 a whole reference of all of the built-in functions that come with the language, 01:05:15.630 --> 01:05:18.135 as well as, for instance, those with a string. 01:05:18.135 --> 01:05:19.620 All right. 01:05:19.620 --> 01:05:23.205 Before we take a break, let's go ahead and create something a little familiar 01:05:23.205 --> 01:05:27.030 too based on our weeks here, in C. Let me 01:05:27.030 --> 01:05:30.690 propose that we revisit those examples involving some meows. 01:05:30.690 --> 01:05:34.260 So, for instance, when we had our cat meow back in the first week 01:05:34.260 --> 01:05:37.650 and, then, second in C, we did something that was a little stupid at first 01:05:37.650 --> 01:05:41.960 whereby we created a file, as I'll do here-- this time, called meow.py. 01:05:41.960 --> 01:05:44.550 And if I want a cat to meow three times, I 01:05:44.550 --> 01:05:47.190 could run it once, like this, a little copy-paste. 01:05:47.190 --> 01:05:50.580 And now, python of meow.py, and I'm done. 01:05:50.580 --> 01:05:53.100 Now, we've visited this example two times, at least, 01:05:53.100 --> 01:05:54.690 now in Scratch and in C. 01:05:54.690 --> 01:06:00.080 It's correct, I'll stipulate, but what's, obviously, poorly designed? 01:06:00.080 --> 01:06:01.655 What's the fault here? 01:06:01.655 --> 01:06:02.212 Yeah? 01:06:02.212 --> 01:06:03.670 AUDIENCE: It should just be a loop. 01:06:03.670 --> 01:06:04.990 DAVID MALAN: It should just be a loop, right? 01:06:04.990 --> 01:06:05.990 Why type it three times? 
01:06:05.990 --> 01:06:08.560 Literally, copying and pasting is almost always a bad thing-- 01:06:08.560 --> 01:06:11.440 except in C, when you have the function prototypes that you need to borrow. 01:06:11.440 --> 01:06:13.232 But in this case, this is just inefficient. 01:06:13.232 --> 01:06:15.652 So what could we do better here, in Python? 01:06:15.652 --> 01:06:18.610 Well, in Python, we could probably change this in a few different ways. 01:06:18.610 --> 01:06:21.280 We could borrow some of the syntax we proposed in slide form 01:06:21.280 --> 01:06:23.710 earlier, like give me a variable called i. 01:06:23.710 --> 01:06:26.080 Set it to 0, no semicolon. 01:06:26.080 --> 01:06:29.510 While i is less than 3-- if I want to do this three times-- 01:06:29.510 --> 01:06:31.280 I can go ahead and print out "meow." 01:06:31.280 --> 01:06:33.580 And then, I can do i plus equals 1. 01:06:33.580 --> 01:06:35.080 And I think this would do the trick. 01:06:35.080 --> 01:06:38.650 Python of meow.py, and we're back in business already. 01:06:38.650 --> 01:06:41.463 Well, if I wanted to change this to a for loop, well, in Python, 01:06:41.463 --> 01:06:44.380 it would be a little tighter, but this would not be the best approach. 01:06:44.380 --> 01:06:52.510 So for i in 0, 1, 2, I could just do print "meow", like this. 01:06:52.510 --> 01:06:54.250 And that, too, would get the job done. 01:06:54.250 --> 01:06:58.390 But, to our discussion earlier, this would get stupid pretty quickly 01:06:58.390 --> 01:07:00.970 if you had to keep enumerating all of these values. 01:07:00.970 --> 01:07:03.880 What did we introduce instead? 01:07:03.880 --> 01:07:04.940 The range function. 01:07:04.940 --> 01:07:05.440 Exactly. 01:07:05.440 --> 01:07:09.040 So that hands me back, way more efficiently, just the values I want, 01:07:09.040 --> 01:07:10.635 indeed, one at a time. 01:07:10.635 --> 01:07:14.745 So even this, if I run it a third or fourth time, we've got the same result. 
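The three loop versions just walked through, side by side, as a sketch of what meow.py looks like at each step:

```python
# Version 1: a while loop with a counter, closest to the C version.
i = 0
while i < 3:
    print("meow")
    i += 1

# Version 2: a for loop over an explicit list of values.
for i in [0, 1, 2]:
    print("meow")

# Version 3: range(3) supplies 0, 1, 2 one at a time, with no list to type out.
for i in range(3):
    print("meow")
```

Each version prints "meow" three times; only the last one scales to large counts without more typing.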
01:07:14.745 --> 01:07:18.220 But now, let's transition to where we went with this back in the day. 01:07:18.220 --> 01:07:20.650 How can we start to modularize this? 01:07:20.650 --> 01:07:24.100 It would be nice, I claimed, if MIT had given us a meow function. 01:07:24.100 --> 01:07:27.370 Wouldn't it be nice if Python had given us a meow function? 01:07:27.370 --> 01:07:30.580 Maybe less compelling in Python, but how can I build my own function? 01:07:30.580 --> 01:07:33.618 Well, I did this briefly with the spell checker earlier, 01:07:33.618 --> 01:07:36.160 but let me go ahead and propose that we could implement, now, 01:07:36.160 --> 01:07:40.280 our own version of this in Python as follows. 01:07:40.280 --> 01:07:44.050 Let me go ahead and start fresh here and use the keyword def. 01:07:44.050 --> 01:07:47.860 So this did not exist in C. You had the return value, the function 01:07:47.860 --> 01:07:48.850 name, the arguments. 01:07:48.850 --> 01:07:52.120 In Python, you literally say def to define a function. 01:07:52.120 --> 01:07:54.757 You give it a name, like meow. 01:07:54.757 --> 01:07:57.840 And now, I'm going to go ahead and, in this function, just print out meow. 01:07:57.840 --> 01:08:01.460 And this lets me change it to anything else I want in the future. 01:08:01.460 --> 01:08:03.400 But for now, it's an abstraction. 01:08:03.400 --> 01:08:07.773 And in fact, I can move it out of sight, out of mind-- 01:08:07.773 --> 01:08:09.940 just going to hit Enter a bunch of times to pretend, 01:08:09.940 --> 01:08:13.382 now, it exists, but I don't care how it is implemented. 01:08:13.382 --> 01:08:15.340 And up here, now, I can do something like this. 01:08:15.340 --> 01:08:20.590 For i in range of 3, let me go ahead and not print "meow" anymore. 01:08:20.590 --> 01:08:25.359 Let me just call meow and tightening up my code further. 01:08:25.359 --> 01:08:25.960 Let's see. 01:08:25.960 --> 01:08:26.859 Python of meow.py. 
01:08:26.859 --> 01:08:31.240 This is, I think, going to be the first time it does not work correctly. 01:08:31.240 --> 01:08:32.680 OK. 01:08:32.680 --> 01:08:36.310 So here, we have, sadly, our first Python error. 01:08:36.310 --> 01:08:37.569 And let's see. 01:08:37.569 --> 01:08:40.300 The syntax is going to be different from C or Clang's output. 01:08:40.300 --> 01:08:41.920 Traceback is the term of art here. 01:08:41.920 --> 01:08:44.859 This is like a trace back of all of the lines of code 01:08:44.859 --> 01:08:47.560 that were just executed or, really, functions you've called. 01:08:47.560 --> 01:08:49.090 The file name is uninteresting. 01:08:49.090 --> 01:08:52.149 This is my codespace, specifically, but the file name 01:08:52.149 --> 01:08:53.890 is important here-- meow.py. 01:08:53.890 --> 01:08:55.675 Line 2 is the issue-- 01:08:55.675 --> 01:08:58.060 OK, I didn't get very far before I screwed up-- 01:08:58.060 --> 01:08:59.470 and then, there's a NameError. 01:08:59.470 --> 01:09:03.430 And you'll see, in Python, there's typically these capitalized keywords 01:09:03.430 --> 01:09:05.350 that hint at what the issue is. 01:09:05.350 --> 01:09:09.260 It's something related to names of variables. "meow" is not defined. 01:09:09.260 --> 01:09:09.760 All right. 01:09:09.760 --> 01:09:11.635 You're programming Python for the first time. 01:09:11.635 --> 01:09:12.399 You've screwed up. 01:09:12.399 --> 01:09:14.560 You're following some online tutorial. 01:09:14.560 --> 01:09:16.149 You're seeing this. 01:09:16.149 --> 01:09:18.010 Reason through it. 01:09:18.010 --> 01:09:20.680 Why might "meow" not be defined? 01:09:20.680 --> 01:09:24.779 What can we infer about Python? 01:09:24.779 --> 01:09:27.240 How to troubleshoot, logically? 01:09:27.240 --> 01:09:29.147 AUDIENCE: [INAUDIBLE] 01:09:29.147 --> 01:09:29.939 DAVID MALAN: Maybe. 01:09:29.939 --> 01:09:32.520 Is it because "meow" is defined after? 
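The failing order can be reproduced in a short sketch. The try/except is an addition so that the example keeps running instead of stopping at the traceback; the lecture code has no such handler.

```python
try:
    for i in range(3):
        meow()  # at this point, Python has not yet seen any definition of meow
except NameError as error:
    message = str(error)
    print("NameError:", message)


def meow():
    print("meow")


meow()  # works now: the def above has already been executed
```

Python reads the file top to bottom, so a name exists only after the line that defines it has run.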
01:09:32.520 --> 01:09:34.890 As smart as Python seems to be, vis-a-vis C, 01:09:34.890 --> 01:09:37.055 they have some similar design characteristics. 01:09:37.055 --> 01:09:37.920 So let's try that. 01:09:37.920 --> 01:09:41.729 So let me scroll all the way back down to where I moved this earlier. 01:09:41.729 --> 01:09:43.649 Let me get rid of it-- 01:09:43.649 --> 01:09:44.279 way down there. 01:09:44.279 --> 01:09:46.410 I'll copy it to my clipboard. 01:09:46.410 --> 01:09:48.180 And let me just hack something together. 01:09:48.180 --> 01:09:49.963 Let me just put it up here. 01:09:49.963 --> 01:09:51.130 And let's see if this works. 01:09:51.130 --> 01:09:54.120 So now, let me clear my terminal, run python of meow.py. 01:09:54.120 --> 01:09:55.110 OK. 01:09:55.110 --> 01:09:56.198 We're back in business. 01:09:56.198 --> 01:09:57.990 So that was actually really good intuition. 01:09:57.990 --> 01:10:00.180 Good debugging technique, just reason through it. 01:10:00.180 --> 01:10:02.430 Now, this is contradicting what I claimed back 01:10:02.430 --> 01:10:05.325 in week one, which was that the main part of your program, 01:10:05.325 --> 01:10:07.470 ideally, should just be at the top of the file. 01:10:07.470 --> 01:10:08.580 Don't make me look for it. 01:10:08.580 --> 01:10:10.497 It's not a huge deal with a four-line program, 01:10:10.497 --> 01:10:13.290 but if you've got 40 lines or 400 lines, you 01:10:13.290 --> 01:10:15.480 don't want the juicy part of your program 01:10:15.480 --> 01:10:18.455 to be way down here, and all of these functions way up here. 01:10:18.455 --> 01:10:22.085 So it would be nice, maybe, if we actually have a main function. 01:10:22.085 --> 01:10:25.260 And so, it actually turns out to be a convention in Python 01:10:25.260 --> 01:10:27.460 to define a main function. 01:10:27.460 --> 01:10:30.720 It's not a special function that's automatically called, like in C. 01:10:30.720 --> 01:10:32.340 But humans realized, you know what? 
01:10:32.340 --> 01:10:34.120 That was a pretty useful feature. 01:10:34.120 --> 01:10:36.540 Let me define a function called main. 01:10:36.540 --> 01:10:39.000 Let me indent these lines underneath it. 01:10:39.000 --> 01:10:41.070 Let me practice what I'm preaching, which is put 01:10:41.070 --> 01:10:43.290 the main code at the top of the file. 01:10:43.290 --> 01:10:47.730 And, wonderfully, in Python, now, you do not need prototypes. 01:10:47.730 --> 01:10:49.920 There's none of that hackish copying and pasting 01:10:49.920 --> 01:10:52.462 of the return type, the name and the arguments to a function, 01:10:52.462 --> 01:10:58.485 like we needed in C. This is now OK instead, except for one, minor detail. 01:10:58.485 --> 01:11:01.290 Let me go ahead and run python of meow.py. 01:11:01.290 --> 01:11:05.940 Hopefully, now, I've solved this problem by having [GROANS] a main function. 01:11:05.940 --> 01:11:08.170 But now, nothing has happened. 01:11:08.170 --> 01:11:08.670 All right. 01:11:08.670 --> 01:11:12.200 Even if you've never programmed in Python before, 01:11:12.200 --> 01:11:17.855 what might explain this behavior, and how do I fix? 01:11:17.855 --> 01:11:20.730 Again, when you're off in the real world, learning some new language, 01:11:20.730 --> 01:11:23.790 all you have is deductive logic to debug. 01:11:23.790 --> 01:11:24.300 Yeah? 01:11:24.300 --> 01:11:28.656 AUDIENCE: I remember in C, even though we [INAUDIBLE].. 01:11:31.708 --> 01:11:32.500 DAVID MALAN: Right. 01:11:32.500 --> 01:11:34.510 So the solution, to be clear, in C was that we 01:11:34.510 --> 01:11:35.650 had to put the prototype up here. 01:11:35.650 --> 01:11:36.790 Otherwise, we'd get an error message. 01:11:36.790 --> 01:11:39.123 In this case, I'm actually not getting an error message. 01:11:39.123 --> 01:11:42.610 And, indeed, I'll claim that you don't need the prototypes in Python. 01:11:42.610 --> 01:11:46.910 Just not necessary because that was annoying, if nothing else. 
01:11:46.910 --> 01:11:48.820 But what else might explain? 01:11:48.820 --> 01:11:49.570 Yeah, in the back? 01:11:49.570 --> 01:11:51.030 AUDIENCE: [INAUDIBLE] 01:11:51.030 --> 01:11:51.780 DAVID MALAN: Yeah. 01:11:51.780 --> 01:11:53.880 Maybe you have to call main itself. 01:11:53.880 --> 01:11:58.410 If main is not some special status in Python, maybe just because it exists 01:11:58.410 --> 01:11:59.040 isn't enough. 01:11:59.040 --> 01:12:02.580 And, indeed, if you want to call main, the new convention 01:12:02.580 --> 01:12:05.460 is actually going to be-- as the very last line of your program, 01:12:05.460 --> 01:12:07.350 typically-- to literally call main. 01:12:07.350 --> 01:12:10.950 It's a little stupid-looking, but they made a design decision. 01:12:10.950 --> 01:12:13.200 And this is how, now, we work around it. 01:12:13.200 --> 01:12:14.610 Python of meow.py. 01:12:14.610 --> 01:12:16.890 Now we're back in business. 01:12:16.890 --> 01:12:19.560 But now, logically, why does this work the way it does? 01:12:19.560 --> 01:12:22.320 Well, in this case-- top to bottom-- 01:12:22.320 --> 01:12:25.350 line 1 is telling Python to define a function called main 01:12:25.350 --> 01:12:27.660 and, then, define it as follows, lines 2 and 3. 01:12:27.660 --> 01:12:29.610 But it's not calling main yet. 01:12:29.610 --> 01:12:33.210 Line 6 is telling Python how to define a function called meow, 01:12:33.210 --> 01:12:35.580 but it's not calling these lines yet. 01:12:35.580 --> 01:12:38.730 Now, on line 10, you're telling Python, call main. 01:12:38.730 --> 01:12:41.310 And at that point, Python has been trained, if you will, 01:12:41.310 --> 01:12:45.390 to know what main is on line 1, to know what meow is on line 6. 01:12:45.390 --> 01:12:49.650 And so, it's now perfectly OK for main to be above meow 01:12:49.650 --> 01:12:51.150 because you never called them yet. 01:12:51.150 --> 01:12:54.340 You defined, defined, and then, you called. 
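Laid out as in the explanation above, with main defined on line 1, meow on line 6, and the call on line 10, the file would look roughly like this:

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


main()
```

By the time the last line runs, both definitions have been read, so main can call meow even though meow appears below it.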
01:12:54.340 --> 01:12:56.380 And that's the logic behind this. 01:12:56.380 --> 01:13:01.250 Any questions, now, on the structure of this technique, here? 01:13:01.250 --> 01:13:03.000 Now, let's do one more, then. 01:13:03.000 --> 01:13:07.740 Recall that the last thing we did in Scratch and in C was to, 01:13:07.740 --> 01:13:10.940 actually, parameterize these same functions. 01:13:10.940 --> 01:13:14.070 So suppose that you don't want main to be responsible for the loop here. 01:13:14.070 --> 01:13:17.580 You instead want to, very simply, do something like "meow" three times 01:13:17.580 --> 01:13:18.660 and be done with it. 01:13:18.660 --> 01:13:21.427 Well, in Python, it's going to be similar in spirit to C. 01:13:21.427 --> 01:13:23.760 But, again, we don't need to keep mentioning data types. 01:13:23.760 --> 01:13:26.310 If you want "meow" to take some argument-- 01:13:26.310 --> 01:13:27.930 like a number n-- 01:13:27.930 --> 01:13:30.792 you can just specify n as the name of that argument. 01:13:30.792 --> 01:13:33.250 Or you can call it anything else, of course, that you want. 01:13:33.250 --> 01:13:35.700 You don't have to specify int or anything else. 01:13:35.700 --> 01:13:40.890 In your code, now, inside of meow, you can do something like for i in, 01:13:40.890 --> 01:13:41.670 let's say-- 01:13:41.670 --> 01:13:45.690 I definitely, now, can't do this because that would be weird, to start the list 01:13:45.690 --> 01:13:46.590 and end it with n. 01:13:46.590 --> 01:13:49.360 So, if I can come back over here, what's the solution? 01:13:49.360 --> 01:13:51.270 How can I do something n times? 01:13:51.270 --> 01:13:52.410 AUDIENCE: [INAUDIBLE] 01:13:52.410 --> 01:13:53.160 DAVID MALAN: Yeah. 01:13:53.160 --> 01:13:54.340 Using range. 01:13:54.340 --> 01:13:58.140 So range is nice because I can pass in, now, this variable n. 01:13:58.140 --> 01:13:59.940 And now, I can meow-- whoops. 
01:13:59.940 --> 01:14:03.195 Now I can print out, quote unquote, "meow." 01:14:03.195 --> 01:14:05.820 So it's almost the same as in Scratch, almost the same as in C. 01:14:05.820 --> 01:14:06.903 But it's a little simpler. 01:14:06.903 --> 01:14:12.210 And if, now, I run meow.py, I'll have the ability, now, to do this here, 01:14:12.210 --> 01:14:13.110 as well. 01:14:13.110 --> 01:14:13.770 All right. 01:14:13.770 --> 01:14:16.590 Questions on any of this? 01:14:16.590 --> 01:14:19.800 Right now, we're taking this stroll through week one. 01:14:19.800 --> 01:14:22.050 We're going to, momentarily, escalate things 01:14:22.050 --> 01:14:24.840 to look not only at some of these basics, 01:14:24.840 --> 01:14:27.390 but also, other features, like we saw with face recognition 01:14:27.390 --> 01:14:28.920 with the speller or the like. 01:14:28.920 --> 01:14:31.962 Because of how many of us are here, we have a huge amount of candy 01:14:31.962 --> 01:14:32.670 out in the lobby. 01:14:32.670 --> 01:14:34.440 So why don't we go ahead and take a 10-minute break? 01:14:34.440 --> 01:14:37.230 And when we come back, we'll do even fancier, more powerful things 01:14:37.230 --> 01:14:38.595 with Python in 10. 01:14:38.595 --> 01:14:40.020 All right. 01:14:40.020 --> 01:14:41.730 So we are back. 01:14:41.730 --> 01:14:44.280 Among our goals, now, are to introduce a few more building 01:14:44.280 --> 01:14:47.880 blocks so that we can solve more interesting problems at the end, 01:14:47.880 --> 01:14:49.560 much like those that we began with. 01:14:49.560 --> 01:14:52.830 You'll recall, from a few weeks ago, we played with this two-dimensional Super 01:14:52.830 --> 01:14:53.670 Mario world. 01:14:53.670 --> 01:14:57.380 And we tried to print a vertical column of three or more bricks. 
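Looking back at the parameterized meow from just before the break, a sketch of where that version ends up, with main no longer responsible for the loop:

```python
def main():
    meow(3)


# n needs no declared type; it is simply the name of the argument.
def meow(n):
    for i in range(n):
        print("meow")


main()
```

Changing the 3 in main is now the only edit needed to get a different number of meows.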
01:14:57.380 --> 01:15:00.210 Well, let me propose that we use this as an opportunity to, now, 01:15:00.210 --> 01:15:02.880 tinker with some of Python's more useful, more 01:15:02.880 --> 01:15:04.470 user-friendly functionality, as well. 01:15:04.470 --> 01:15:09.265 So let me code a file called mario.py, and let's just print out 01:15:09.265 --> 01:15:10.890 the equivalent of that vertical column. 01:15:10.890 --> 01:15:12.690 So it's of height 3. 01:15:12.690 --> 01:15:16.740 Each one is a hash, so let's do for i in range of 3 initially, 01:15:16.740 --> 01:15:18.600 and let's just print out a single hash. 01:15:18.600 --> 01:15:21.790 And I think, now, python of mario.py-- 01:15:21.790 --> 01:15:22.290 voila. 01:15:22.290 --> 01:15:27.480 We're in business, printing out just that same column there. 01:15:27.480 --> 01:15:31.110 What if, though, we want to print a column of some variable height 01:15:31.110 --> 01:15:33.510 where the user tells us how tall they want it to be? 01:15:33.510 --> 01:15:39.600 Well, let me go up here, for instance and, instead, how about-- 01:15:39.600 --> 01:15:40.920 let's do this. 01:15:40.920 --> 01:15:45.210 How about from cs50 import? 01:15:45.210 --> 01:15:47.620 How about the get_int function, as before? 01:15:47.620 --> 01:15:50.430 So it will deal with making sure the user gives us an integer. 01:15:50.430 --> 01:15:54.750 And now, in the past, whenever we wanted to get a number from a user, 01:15:54.750 --> 01:15:56.780 we've actually followed a certain paradigm. 01:15:56.780 --> 01:16:02.895 In fact, if I open up here, for instance, 01:16:02.895 --> 01:16:06.630 how about mario1.c from a while back, you 01:16:06.630 --> 01:16:11.430 might recall that we had code like this. 01:16:11.430 --> 01:16:13.800 And we specifically use the do while loop in C 01:16:13.800 --> 01:16:16.410 whenever we want to get something from the user, 01:16:16.410 --> 01:16:18.858 maybe, again and again and again, until they cooperate. 
01:16:18.858 --> 01:16:20.900 At which point, we finally break out of the loop. 01:16:20.900 --> 01:16:22.830 So it turns out, Python does have while loops, 01:16:22.830 --> 01:16:25.698 does have for loops, does not have do while loops. 01:16:25.698 --> 01:16:27.990 And yet, pretty much any time you've gotten user input, 01:16:27.990 --> 01:16:30.100 you've probably used this paradigm. 01:16:30.100 --> 01:16:33.930 So it turns out that the Python equivalent of this is to do, 01:16:33.930 --> 01:16:36.450 similar in spirit, but using only a while loop. 01:16:36.450 --> 01:16:39.300 And a common paradigm in Python, as I alluded earlier, 01:16:39.300 --> 01:16:43.440 is to actually deliberately induce an infinite loop while True-- 01:16:43.440 --> 01:16:48.240 capital T-- and then, do what you want to do, like get an int from the user 01:16:48.240 --> 01:16:51.690 and prompt them for the height, for instance, in question. 01:16:51.690 --> 01:16:56.070 And then, if you're sure that the user has given you what you want-- 01:16:56.070 --> 01:16:59.220 like n is greater than 0, which is what I want, in this case, 01:16:59.220 --> 01:17:02.610 because I want a positive integer; otherwise, there's nothing to print-- 01:17:02.610 --> 01:17:04.505 you literally just break out of the loop. 01:17:04.505 --> 01:17:08.070 And so, we could actually use this technique in C. It's just not 01:17:08.070 --> 01:17:10.260 really done in C. You could absolutely, in C, 01:17:10.260 --> 01:17:13.590 have done a while True loop with the parentheses, lowercase true. 01:17:13.590 --> 01:17:15.670 You could break out of it, and so forth. 01:17:15.670 --> 01:17:18.312 But in Python, this is the Python way. 01:17:18.312 --> 01:17:19.770 And this is actually a term of art. 01:17:19.770 --> 01:17:24.017 This way in Python is pythonic. This is "the way everyone does it," 01:17:24.017 --> 01:17:24.600 quote unquote. 
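A sketch of the while True pattern described above. So that it runs without the cs50 library or a keyboard, `get_int` here is a stand-in that hands back scripted answers; that stand-in and the sample answers are assumptions, and in the lecture the real function comes from `from cs50 import get_int`.

```python
# Stand-in for cs50's get_int (assumption): returns scripted answers
# instead of prompting, so the loop can be seen rejecting bad input.
answers = iter([-1, 0, 4])


def get_int(prompt):
    return next(answers)


# Loop forever on purpose, and break only once the input is acceptable.
while True:
    n = get_int("Height: ")
    if n > 0:
        break

for i in range(n):
    print("#")
```

The first two scripted answers are rejected and the loop asks again, which is the same behavior the do while loop gave in C.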
01:17:24.600 --> 01:17:28.830 Doesn't mean you have to, but that's the way the cool Python programmers would 01:17:28.830 --> 01:17:31.980 implement an idea like this-- trying to do something again and again 01:17:31.980 --> 01:17:34.607 and again until the user actually cooperates. 01:17:34.607 --> 01:17:36.690 But all we've done is take away the do while loop. 01:17:36.690 --> 01:17:39.790 But still, logically, we can implement the same idea. 01:17:39.790 --> 01:17:44.580 Now, below this, let me go ahead and just print out, for i in range of n 01:17:44.580 --> 01:17:47.370 this time-- because I want it to be variable and not 3. 01:17:47.370 --> 01:17:49.920 I can go ahead and print out the hash-- 01:17:49.920 --> 01:17:52.260 let me go ahead and get rid of the C version here-- 01:17:52.260 --> 01:17:55.920 open my terminal window and I'll run, again, Python of mario.py. 01:17:55.920 --> 01:17:58.530 I'll type in 3 and I get back those three hashes. 01:17:58.530 --> 01:18:02.635 But if I, instead, type in 4, I now get four hashes instead. 01:18:02.635 --> 01:18:04.640 So the takeaway here is, quite simply, that this 01:18:04.640 --> 01:18:08.030 would be the way, for instance, to actually get back 01:18:08.030 --> 01:18:11.615 a value in Python that is consistent with some parameter, 01:18:11.615 --> 01:18:13.160 like greater than 0. 01:18:13.160 --> 01:18:13.950 How about this? 01:18:13.950 --> 01:18:17.810 Let's actually practice what we preached a moment ago with our meowing examples 01:18:17.810 --> 01:18:19.830 and factoring all this out. 01:18:19.830 --> 01:18:23.220 Let me go ahead and define a main function, as before. 01:18:23.220 --> 01:18:25.190 Let me go ahead and assume, for the moment, 01:18:25.190 --> 01:18:28.673 that a get_height function exists, which is not a thing in Python. 01:18:28.673 --> 01:18:30.340 I'm going to invent it in just a moment. 01:18:30.340 --> 01:18:33.620 And now, I'm going to go ahead and do something like this. 
for i 01:18:33.620 --> 01:18:39.470 in the range of that height, well, let's go ahead and print out those hashes. 01:18:39.470 --> 01:18:41.760 So I'm assuming that get_height exists. 01:18:41.760 --> 01:18:44.725 Let me go ahead and implement that abstraction, so define a function, 01:18:44.725 --> 01:18:46.100 now, called get_height. 01:18:46.100 --> 01:18:48.830 It's not going to take any arguments in this design. 01:18:48.830 --> 01:18:52.820 While True, I can go ahead and do the same thing as before-- 01:18:52.820 --> 01:18:55.880 assign a variable n, the return value of get_int 01:18:55.880 --> 01:18:58.140 prompting the user for that height. 01:18:58.140 --> 01:19:03.980 And then, if n is greater than 0, I can go ahead and break. 01:19:03.980 --> 01:19:08.390 But if I break here, I, logically-- just like in C-- 01:19:08.390 --> 01:19:11.360 end up executing below the loop in question. 01:19:11.360 --> 01:19:12.690 But there's nothing there. 01:19:12.690 --> 01:19:16.820 But if I want get_height to return the height, what should 01:19:16.820 --> 01:19:18.650 I type here on line 14, logically? 01:19:21.580 --> 01:19:23.380 What do I want to return, to be clear? 01:19:23.380 --> 01:19:23.995 AUDIENCE: [INAUDIBLE] 01:19:23.995 --> 01:19:24.745 DAVID MALAN: Yeah. 01:19:24.745 --> 01:19:26.890 So I actually want to return n. 01:19:26.890 --> 01:19:30.880 And here's another curiosity of Python, vis-a-vis C. 01:19:30.880 --> 01:19:33.670 There doesn't seem to be an issue of scope anymore, right? 01:19:33.670 --> 01:19:37.180 In C, it was super important to not only declare your variables with the data 01:19:37.180 --> 01:19:39.550 types, you also had to be mindful of where they exist-- 01:19:39.550 --> 01:19:41.200 inside of those curly braces. 01:19:41.200 --> 01:19:45.238 In Python, it turns out you can be a little looser with things, for better 01:19:45.238 --> 01:19:45.780 or for worse. 
01:19:45.780 --> 01:19:50.020 And so, on line 11, if I create a variable called n, 01:19:50.020 --> 01:19:57.170 it exists on line 11, 12 and even 13, outside of the while loop. 01:19:57.170 --> 01:19:59.710 So to be clear, in C, with a while loop, we 01:19:59.710 --> 01:20:03.040 would have ordinarily had not a colon. 01:20:03.040 --> 01:20:05.920 We would have had the curly brace, like here and over here. 01:20:05.920 --> 01:20:08.770 And a week ago, I would have claimed that, in C, n 01:20:08.770 --> 01:20:12.130 does not exist outside of the while loop, by nature of those curly braces. 01:20:12.130 --> 01:20:15.250 Even though the curly braces are gone, Python actually 01:20:15.250 --> 01:20:20.685 allows you to use a variable any time after you have assigned it a value. 01:20:20.685 --> 01:20:23.625 So slightly more powerful, as such. 01:20:23.625 --> 01:20:26.830 However, I can tighten this up a little bit, logically. 01:20:26.830 --> 01:20:30.700 And this is true in C. I don't really need to break out of the loop 01:20:30.700 --> 01:20:32.020 by using break. 01:20:32.020 --> 01:20:36.070 Recall that or know that I can actually-- once I'm ready to go, 01:20:36.070 --> 01:20:40.030 I can just return the value I care about, even inside of the loop. 01:20:40.030 --> 01:20:43.000 And that will have the side effect of breaking me out of the loop 01:20:43.000 --> 01:20:46.590 and, also, breaking me out of and returning from the entire function. 01:20:46.590 --> 01:20:50.470 So nothing too new here, in terms of C versus Python, except for this issue 01:20:50.470 --> 01:20:51.490 with scope. 01:20:51.490 --> 01:20:53.770 And I, indeed, returned n at the bottom there, 01:20:53.770 --> 01:20:56.360 just to make clear that n would still exist. 01:20:56.360 --> 01:20:58.170 So either of those are correct. 01:20:58.170 --> 01:21:02.350 Now, I just have a Python program that I think 01:21:02.350 --> 01:21:05.590 is going to allow me to implement this same Mario idea. 
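The two variants of `get_height` discussed above might look roughly like this. The `read` parameter is an addition of this sketch (not from the lecture) so the functions can be exercised with canned input, and plain `int(...)` stands in for the CS50 `get_int`.

```python
def get_height_with_break(read=input):
    """Loop until a positive integer is given, then break."""
    while True:
        n = int(read("Height: "))
        if n > 0:
            break
    # n was first assigned inside the loop, yet it is still usable here
    return n


def get_height(read=input):
    """Same idea, but return from inside the loop."""
    while True:
        n = int(read("Height: "))
        if n > 0:
            return n  # exits the loop and the function in one step
```

Both behave the same; the first mainly illustrates that a variable assigned inside the loop still exists after it.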
01:21:05.590 --> 01:21:07.450 So let's run python of mario.py. 01:21:07.450 --> 01:21:09.820 And-- OK, so nothing happened. 01:21:09.820 --> 01:21:13.390 Python of mario.py. 01:21:13.390 --> 01:21:14.260 What did I do wrong? 01:21:14.260 --> 01:21:14.965 AUDIENCE: [INAUDIBLE] 01:21:14.965 --> 01:21:16.590 DAVID MALAN: Yeah, I have to call main. 01:21:16.590 --> 01:21:19.720 So, at the bottom of my code, I have to call main here. 01:21:19.720 --> 01:21:22.720 And this is a stylistic detail that's been subtle. 01:21:22.720 --> 01:21:26.050 Generally speaking, when you are writing in Python, 01:21:26.050 --> 01:21:28.360 there's not a CS50 style guide, per se. 01:21:28.360 --> 01:21:33.700 There's actually a Python style guide that most people adhere to. 01:21:33.700 --> 01:21:37.480 And in this case, double blank lines between functions is the norm. 01:21:37.480 --> 01:21:41.890 I'm doing that deliberately, although it might, otherwise, not be obvious. 01:21:41.890 --> 01:21:45.130 But now that I've called main on line 16, let's run mario.py once more. 01:21:45.130 --> 01:21:46.690 Aha. 01:21:46.690 --> 01:21:47.560 Now we see it. 01:21:47.560 --> 01:21:51.730 Type in 3, and I'm back in business, printing out the values there. 01:21:51.730 --> 01:21:52.330 Yeah? 01:21:52.330 --> 01:21:54.146 AUDIENCE: Why do you [INAUDIBLE]? 01:21:54.146 --> 01:21:56.120 Why can't [INAUDIBLE]? 01:21:56.120 --> 01:21:56.870 DAVID MALAN: Sure. 01:21:56.870 --> 01:21:58.453 Why do I need the if condition at all? 01:21:58.453 --> 01:22:02.390 Why can't I just return n here as by doing return n. 01:22:02.390 --> 01:22:06.890 Or if I really want to be succinct, I could technically just do this. 01:22:06.890 --> 01:22:09.512 The only reason I added the if condition is 01:22:09.512 --> 01:22:11.720 because, if the user types in negative 1, negative 2, 01:22:11.720 --> 01:22:13.850 I wanted to prompt them again and again. 01:22:13.850 --> 01:22:14.390 That's all. 
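A small sketch of why nothing happened until `main` was called. Here `get_height` is a placeholder that returns a fixed value (an assumption of this example, so it runs without prompting); the point is only the layout of the file.

```python
def main():
    height = get_height()   # defined further down; fine, it exists by call time
    for i in range(height):
        print("#")


def get_height():
    # Placeholder for the prompting version, hard-coded for illustration
    return 3


# Defining main does not run it; without this call the program prints nothing
main()
```

The two blank lines between functions follow the Python style guide convention mentioned above.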
01:22:14.390 --> 01:22:17.660 But that would be totally acceptable, too, if you were OK with that result 01:22:17.660 --> 01:22:18.630 instead. 01:22:18.630 --> 01:22:21.170 Well, let me do one other thing here to point out 01:22:21.170 --> 01:22:23.870 why we are using get_int so frequently. 01:22:23.870 --> 01:22:26.030 This new training wheel, albeit temporarily. 01:22:26.030 --> 01:22:28.490 So let me go back to the way it was a moment ago 01:22:28.490 --> 01:22:32.510 and let me propose, now, to take away get_int. 01:22:32.510 --> 01:22:35.840 I claimed earlier that, if you're not using get_int, 01:22:35.840 --> 01:22:40.400 you can just use the input function itself from Python. 01:22:40.400 --> 01:22:43.250 But that always returns a string, or a str. 01:22:43.250 --> 01:22:48.110 And so, recall that you have to pass the output of the input function to an int, 01:22:48.110 --> 01:22:51.930 either on the same line or, if you prefer, on another line, instead. 01:22:51.930 --> 01:22:54.110 But it turns out what I didn't do was show 01:22:54.110 --> 01:22:59.250 you what happens if you don't cooperate with the program. 01:22:59.250 --> 01:23:02.540 So if I run python of mario.py now, works great, even 01:23:02.540 --> 01:23:04.252 without the get_int function. 01:23:04.252 --> 01:23:05.210 And I can do it with 4. 01:23:05.210 --> 01:23:06.575 Still works great. 01:23:06.575 --> 01:23:09.122 But let me clear my terminal and be difficult, now, 01:23:09.122 --> 01:23:11.330 as the user and type in "cat" for the height instead. 01:23:11.330 --> 01:23:12.560 Enter. 01:23:12.560 --> 01:23:14.540 Now, we see one of those trace backs again. 01:23:14.540 --> 01:23:15.900 This one is different. 01:23:15.900 --> 01:23:18.780 This isn't a name error, but, apparently, a value error. 
01:23:18.780 --> 01:23:20.870 And if I ignore the stuff I don't understand, 01:23:20.870 --> 01:23:24.440 I can see "invalid literal for int with base 10-- "cat."" 01:23:24.440 --> 01:23:27.800 That's a super cryptic way of saying that C-A-T is not 01:23:27.800 --> 01:23:29.640 a number in decimal notation. 01:23:29.640 --> 01:23:32.600 And so, I would seem to have to, somehow, handle this case. 01:23:32.600 --> 01:23:34.490 And if you want to be more curious, you'll 01:23:34.490 --> 01:23:36.350 see that this is, indeed, a traceback. 01:23:36.350 --> 01:23:40.100 And C tends to do this, too, or the debugger would do this for you, too. 01:23:40.100 --> 01:23:41.960 You can see all of the functions that have 01:23:41.960 --> 01:23:43.502 been called to get you to this point. 01:23:43.502 --> 01:23:48.170 So apparently, my problem is, initially, in line 14. 01:23:48.170 --> 01:23:50.375 But line 14, if I keep scrolling, is uninteresting. 01:23:50.375 --> 01:23:51.410 It's main. 01:23:51.410 --> 01:23:55.820 But line 14 leads me to execute line 2, which is, indeed, in main. 01:23:55.820 --> 01:23:59.225 That leads me to execute line 9, which is in get_height. 01:23:59.225 --> 01:24:00.880 And so, OK, here is the issue. 01:24:00.880 --> 01:24:02.960 So the closest line number to the error message 01:24:02.960 --> 01:24:05.360 is the one that probably reveals the most. 01:24:05.360 --> 01:24:06.950 Line 9 is where my issue is. 01:24:06.950 --> 01:24:10.940 So I can't just blindly ask the user for input and, then, convert it to an int 01:24:10.940 --> 01:24:12.620 if they're not going to give me an int. 01:24:12.620 --> 01:24:13.870 Now, how do we deal with this? 01:24:13.870 --> 01:24:16.010 Well, back in problem set two, you might recall 01:24:16.010 --> 01:24:18.380 validating that the user typed in a number 01:24:18.380 --> 01:24:19.862 and using a for loop and the like. 
01:24:19.862 --> 01:24:22.445 Well, it turns out, there's a better way to do this in Python, 01:24:22.445 --> 01:24:24.830 and the semantics are there. 01:24:24.830 --> 01:24:29.600 If you want to try to convert something to a number that might not actually 01:24:29.600 --> 01:24:32.780 be a number, turns out, Python and certain other languages 01:24:32.780 --> 01:24:35.060 literally have a keyword called try. 01:24:35.060 --> 01:24:37.820 And if only this existed for the past few weeks, I know. 01:24:37.820 --> 01:24:40.583 But you can try to do the following with your code. 01:24:40.583 --> 01:24:41.750 What do I want to try to do? 01:24:41.750 --> 01:24:46.980 Well, I want to try to execute those few lines, except if there's an error. 01:24:46.980 --> 01:24:50.225 So I can say except if there's a value error-- specifically, 01:24:50.225 --> 01:24:53.065 the one I screwed up and created a moment ago. 01:24:53.065 --> 01:24:56.480 And if there is a value error, I can print out an informative message 01:24:56.480 --> 01:25:00.920 to the user, like "not an integer" or anything else. 01:25:00.920 --> 01:25:05.270 And what's happening here, now, is literally this operative word, try. 01:25:05.270 --> 01:25:09.920 Python is going to try to get input and try to convert it to an int, 01:25:09.920 --> 01:25:12.470 and it's going to try to check if it's greater than 0 01:25:12.470 --> 01:25:14.750 and then try to return it. 01:25:14.750 --> 01:25:15.467 Why? 01:25:15.467 --> 01:25:17.300 Three of those lines are inside of, indented 01:25:17.300 --> 01:25:20.780 underneath the try block, except if something goes wrong-- 01:25:20.780 --> 01:25:23.540 specifically, a value error happens. 01:25:23.540 --> 01:25:24.560 Then, it prints this. 01:25:24.560 --> 01:25:26.110 But it doesn't return anything. 
01:25:26.110 --> 01:25:30.335 And because I'm in a loop, that means it's going to do it again and again 01:25:30.335 --> 01:25:33.980 and again until the human actually cooperates and gives me 01:25:33.980 --> 01:25:35.360 an actual number. 01:25:35.360 --> 01:25:38.210 And so, this, too, is what the world would call pythonic. 01:25:38.210 --> 01:25:41.420 In Python, you don't, necessarily, rigorously try to validate 01:25:41.420 --> 01:25:43.940 the user's input, make sure they haven't screwed up. 01:25:43.940 --> 01:25:46.160 You honestly take a more lackadaisical approach 01:25:46.160 --> 01:25:50.300 and just try to do something, but catch an error if it happens. 01:25:50.300 --> 01:25:53.720 So catch is also a term of art, even though it's not a keyword here. 01:25:53.720 --> 01:25:55.760 Except if something happens, you handle it. 01:25:55.760 --> 01:25:57.470 So you try and you handle it. 01:25:57.470 --> 01:25:59.480 You best-effort programming, if you will. 01:25:59.480 --> 01:26:04.200 But this is baked into the mindset of the Python programming community. 01:26:04.200 --> 01:26:08.630 So now, if I do python of mario.py and I cooperate, works great as before. 01:26:08.630 --> 01:26:09.830 Try and succeed. 01:26:09.830 --> 01:26:10.670 3 works. 01:26:10.670 --> 01:26:11.345 4 works. 01:26:11.345 --> 01:26:17.243 If, though, I try and fail by typing in "cat," it doesn't crash, per se. 01:26:17.243 --> 01:26:18.410 It doesn't show me an error. 01:26:18.410 --> 01:26:20.695 It shows me something more user-friendly, like "not an integer." 01:26:20.695 --> 01:26:22.610 And then, I can try again with "dog." 01:26:22.610 --> 01:26:23.390 "Not an integer." 01:26:23.390 --> 01:26:24.980 I can try again with 5. 01:26:24.980 --> 01:26:26.240 And now, it works. 
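The try and except version described above could be sketched as follows. The `read` parameter is an assumption of this example so that it can be driven by canned replies; by default it is the built-in `input`.

```python
def get_height(read=input):
    """Prompt until the reply converts to a positive integer."""
    while True:
        try:
            n = int(read("Height: "))  # raises ValueError for text like "cat"
            if n > 0:
                return n
        except ValueError:
            # Nothing is returned here, so the loop prompts again
            print("Not an integer")
```

Typing "cat", then "dog", then 5 would print the message twice and then return 5.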
01:26:26.240 --> 01:26:28.160 So we won't, generally, have you write much 01:26:28.160 --> 01:26:30.500 in the way of these try-except blocks, only because they 01:26:30.500 --> 01:26:33.080 get a little sophisticated quickly. 01:26:33.080 --> 01:26:35.777 But that is to reveal what the get_int function is doing. 01:26:35.777 --> 01:26:37.610 This is why we give you the training wheels, 01:26:37.610 --> 01:26:39.420 so that, when you want to get an int, you 01:26:39.420 --> 01:26:41.990 don't have to jump through all these annoying hoops to do so. 01:26:41.990 --> 01:26:45.965 But that's all the library's really doing for you, is just try and except. 01:26:45.965 --> 01:26:48.980 You won't be left with any training wheels, ultimately. 01:26:48.980 --> 01:26:52.760 Questions, now, on getting input and trying in this way? 01:26:55.433 --> 01:26:56.100 Anything at all? 01:26:56.100 --> 01:26:56.610 Yeah? 01:26:56.610 --> 01:27:03.643 AUDIENCE: I'm still [INAUDIBLE] try block. 01:27:03.643 --> 01:27:06.560 DAVID MALAN: Oh, could you put the condition outside of the try block? 01:27:06.560 --> 01:27:07.310 Short answer, yes. 01:27:07.310 --> 01:27:09.227 And, in fact, I struggled with this last night 01:27:09.227 --> 01:27:11.750 when tweaking this example to show the simplest version. 01:27:11.750 --> 01:27:17.180 I will disclaim that, really, I should only be trying, literally, 01:27:17.180 --> 01:27:18.470 to do the fragile part. 01:27:18.470 --> 01:27:21.710 And then, down here, I should be really doing 01:27:21.710 --> 01:27:24.380 what you're proposing, which is do the condition out here. 01:27:24.380 --> 01:27:27.380 The problem is, though, that, logically, this gets messy quickly, right? 01:27:27.380 --> 01:27:31.205 Because except if there's a value error, I want to print out "not an integer." 01:27:31.205 --> 01:27:33.920 I can't compare n against 0, then, because n doesn't 01:27:33.920 --> 01:27:35.752 exist because there was an error. 
01:27:35.752 --> 01:27:37.460 So it turns out-- and I'll show you this; 01:27:37.460 --> 01:27:39.350 this is now the advanced version of Python-- 01:27:39.350 --> 01:27:42.620 there's actually an else keyword you can use in Python 01:27:42.620 --> 01:27:44.570 that does not accompany if or elif. 01:27:44.570 --> 01:27:48.680 It accompanies try and except, which I think is weirdly confusing. 01:27:48.680 --> 01:27:50.640 A different word would have been better. 01:27:50.640 --> 01:27:53.692 But if you'd really prefer, I could have done this, instead. 01:27:53.692 --> 01:27:56.900 And this is one of these design things where reasonable people will disagree. 01:27:56.900 --> 01:27:58.775 Generally speaking, you should only try to do 01:27:58.775 --> 01:28:00.980 the one line that might very well fail. 01:28:00.980 --> 01:28:02.420 But honestly, this looks stupid. 01:28:02.420 --> 01:28:04.850 No, it's just unnecessarily complicated. 01:28:04.850 --> 01:28:08.560 And so, my own preference was actually the original, which was-- yeah, 01:28:08.560 --> 01:28:10.310 I'm trying a few extra lines that, really, 01:28:10.310 --> 01:28:11.973 aren't going to fail, mathematically. 01:28:11.973 --> 01:28:12.890 But it's just tighter. 01:28:12.890 --> 01:28:14.030 It's cleaner this way. 01:28:14.030 --> 01:28:16.580 And here's, again, the sort of arguments you'll 01:28:16.580 --> 01:28:18.530 start to make yourself as you get more comfortable with programming. 01:28:18.530 --> 01:28:19.280 You'll have an opinion. 01:28:19.280 --> 01:28:20.488 You'll disagree with someone. 01:28:20.488 --> 01:28:25.200 And so long as you can back your argument up, it's pretty reasonable, probably. 01:28:25.200 --> 01:28:25.700 All right. 01:28:25.700 --> 01:28:30.222 So how about we, now, take away some piece of magic 01:28:30.222 --> 01:28:31.430 that's been here for a while. 01:28:31.430 --> 01:28:33.950 Let me go ahead and delete all of this here.
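For reference, the try, except, else variant mentioned in the answer above might be sketched like this. As before, the `read` parameter is an assumption of this example, added so it can run on canned replies.

```python
def get_height(read=input):
    """Only the fragile conversion sits inside try."""
    while True:
        try:
            n = int(read("Height: "))  # the one line that may fail
        except ValueError:
            print("Not an integer")
        else:
            # Runs only when no exception occurred, so n is defined here
            if n > 0:
                return n
```

Whether this or the shorter version reads better is the judgment call described above; both prompt again on bad input.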
01:28:33.950 --> 01:28:38.855 And let me propose that we revisit not that vertical column and the exceptions 01:28:38.855 --> 01:28:42.110 that might result from getting input, but these horizontal question marks 01:28:42.110 --> 01:28:43.130 that we saw a while ago. 01:28:43.130 --> 01:28:45.980 So I want all of those question marks on the same line. 01:28:45.980 --> 01:28:48.860 And yet, I worry we're about to see a challenge here because print, 01:28:48.860 --> 01:28:51.830 up until now, has been putting new lines everywhere automatically, 01:28:51.830 --> 01:28:53.570 even without those backslash n's. 01:28:53.570 --> 01:28:56.360 Well, let me propose that we do this. 01:28:56.360 --> 01:28:58.130 for i in the range of 4. 01:28:58.130 --> 01:29:02.165 If I want four question marks, let me just print four question marks. 01:29:02.165 --> 01:29:04.370 Unfortunately, I don't think this is correct yet. 01:29:04.370 --> 01:29:06.530 Let me run python of mario.py. 01:29:06.530 --> 01:29:11.510 And, of course, this gives me a column instead of the row of question marks 01:29:11.510 --> 01:29:12.630 that I want. 01:29:12.630 --> 01:29:13.550 So how do we do this? 01:29:13.550 --> 01:29:17.785 Well, it turns out, if you read the documentation for the print function, 01:29:17.785 --> 01:29:19.910 it turns out that print, not surprisingly, perhaps, 01:29:19.910 --> 01:29:22.000 takes a lot of different arguments, as well. 01:29:22.000 --> 01:29:24.590 And in fact, if you go to the documentation for it, 01:29:24.590 --> 01:29:27.650 you'll see that it takes not just positional 01:29:27.650 --> 01:29:30.685 arguments-- that is, from left to right, separated by commas. 01:29:30.685 --> 01:29:32.810 It turns out, Python supports a fancier feature 01:29:32.810 --> 01:29:36.860 with arguments where you can pass the names of arguments to functions, too. 01:29:36.860 --> 01:29:38.470 So what do I mean by this?
01:29:38.470 --> 01:29:43.430 If I go back to VS Code here and I've read the documentation, 01:29:43.430 --> 01:29:48.995 it turns out that, yes, as before, you can pass multiple arguments to print, 01:29:48.995 --> 01:29:49.700 like this. 01:29:49.700 --> 01:29:53.030 Hello comma David comma Malan, that will just automatically 01:29:53.030 --> 01:29:56.553 concatenate all three of those positional arguments together. 01:29:56.553 --> 01:29:59.720 They're positional in the sense that they literally flow from left to right, 01:29:59.720 --> 01:30:01.238 separated by commas. 01:30:01.238 --> 01:30:03.530 But if you don't want to just pass in values like that, 01:30:03.530 --> 01:30:07.370 you want to actually print out, as I did before, a question mark. 01:30:07.370 --> 01:30:11.240 But you want to override the default behavior of print 01:30:11.240 --> 01:30:14.610 by changing the line ending, you can actually do this. 01:30:14.610 --> 01:30:18.890 You can use the name of an argument that you know exists from the documentation 01:30:18.890 --> 01:30:22.130 and set it equal to some alternative value. 01:30:22.130 --> 01:30:24.770 And in fact, even though this looks cryptic, 01:30:24.770 --> 01:30:30.380 this is how I would override the end of each line, to be quote, unquote. 01:30:30.380 --> 01:30:32.900 That is nothing because, if you read the documentation, 01:30:32.900 --> 01:30:37.190 the default value for this end argument-- does someone want to guess-- 01:30:37.190 --> 01:30:38.750 is-- 01:30:38.750 --> 01:30:39.800 is backslash n. 01:30:39.800 --> 01:30:41.690 So if you read the documentation, you'll see 01:30:41.690 --> 01:30:46.550 that backslash n is the implied default for this end argument. 01:30:46.550 --> 01:30:49.810 And so, if you want to change it, you just say end equals something else. 01:30:49.810 --> 01:30:57.057 And so, here, I can change it to nothing and, now, rerun python of mario.py.
01:30:57.057 --> 01:30:58.640 And now, they're all in the same line. 01:30:58.640 --> 01:31:01.190 Now, it looks a little stupid because I made that week 01:31:01.190 --> 01:31:04.190 one mistake where I still need to move the cursor to the next line. 01:31:04.190 --> 01:31:05.570 That's just a different problem. 01:31:05.570 --> 01:31:07.612 I'm just going to go over here and print nothing. 01:31:07.612 --> 01:31:10.550 I don't even need to print backslash n because, if print automatically 01:31:10.550 --> 01:31:13.970 gives you a backslash n, just call print with nothing, 01:31:13.970 --> 01:31:15.420 and you'll get that for free. 01:31:15.420 --> 01:31:16.940 So let me rerun python of mario.py. 01:31:16.940 --> 01:31:19.895 And now, it looks a little prettier at the prompt. 01:31:19.895 --> 01:31:21.770 And to be super clear as to what's going on-- 01:31:21.770 --> 01:31:24.300 suppose I want to make an exclamation here. 01:31:24.300 --> 01:31:27.320 I could change the backslash n default to an exclamation point, 01:31:27.320 --> 01:31:28.680 just for kicks. 01:31:28.680 --> 01:31:31.550 And if I run python of mario.py again, now, I 01:31:31.550 --> 01:31:36.662 get this exclamation with question marks and exclamation points, as well. 01:31:36.662 --> 01:31:38.120 So that's all that's going on here. 01:31:38.120 --> 01:31:40.670 And this is what's called a named argument. 01:31:40.670 --> 01:31:43.670 It literally has a name that you can specify when calling it in. 01:31:43.670 --> 01:31:47.787 And it's different from positional in that you're literally using the name. 01:31:47.787 --> 01:31:49.370 Let me propose something else, though. 01:31:49.370 --> 01:31:50.828 And this is why people like Python. 01:31:50.828 --> 01:31:52.550 There's just cool ways to do things. 01:31:55.724 --> 01:32:00.740 That's a three-line, verbose way of printing out four question marks. 01:32:00.740 --> 01:32:04.002 I could certainly take the shortcut and just do this.
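A short sketch of the named `end` argument of `print` described above. The helper name `print_row` and its parameters are inventions of this example; `print` and its `end` argument are standard Python.

```python
def print_row(width, ending=""):
    """Print `width` question marks on one line."""
    for i in range(width):
        # Override print's default line ending ("\n") via the named argument
        print("?", end=ending)
    # A bare print() emits just the default newline, moving the cursor down
    print()


print_row(4)        # question marks side by side
print_row(4, "!")   # each question mark followed by an exclamation point
```

Passing `end` by name is what distinguishes it from the positional values that are printed left to right.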
01:32:04.002 --> 01:32:06.085 But that's not really that interesting for anyone, 01:32:06.085 --> 01:32:08.720 especially if I want to do it a variable number of times. 01:32:08.720 --> 01:32:10.390 But Python does let you do this. 01:32:10.390 --> 01:32:15.110 If you want to multiply a character some number of times, 01:32:15.110 --> 01:32:18.020 not only can you use plus for concatenation, 01:32:18.020 --> 01:32:23.930 you can use star or an asterisk for multiplication, if you will-- that is, 01:32:23.930 --> 01:32:26.250 concatenation again and again and again. 01:32:26.250 --> 01:32:29.030 So if I just print out, quote unquote, "?" 01:32:29.030 --> 01:32:34.190 times 4, that's actually going to be the tightest way, the most succinct way 01:32:34.190 --> 01:32:36.020 I can print four question marks instead. 01:32:36.020 --> 01:32:39.095 And if I don't use 4, I use n, where I get n from the user. 01:32:39.095 --> 01:32:39.830 Bang. 01:32:39.830 --> 01:32:42.320 Now, I've gotten rid of the for loop entirely, 01:32:42.320 --> 01:32:48.000 and I'm using the star operator to manipulate it instead. 01:32:48.000 --> 01:32:50.120 And, to be super clear here, insofar as Python 01:32:50.120 --> 01:32:54.440 does not have malloc or free or memory management that you have to do, 01:32:54.440 --> 01:32:56.060 guess what Python also doesn't have. 01:32:59.760 --> 01:33:03.110 Anything on your minds in the past couple of weeks? 01:33:03.110 --> 01:33:03.875 Doesn't have-- 01:33:03.875 --> 01:33:04.853 AUDIENCE: Pointers. 01:33:04.853 --> 01:33:06.020 DAVID MALAN: Pointers, yeah. 01:33:06.020 --> 01:33:09.295 So Python does not have pointers, which just means that all of that 01:33:09.295 --> 01:33:11.420 happens for you automatically, underneath the hood, 01:33:11.420 --> 01:33:14.150 again, by way of code that someone else wrote. 01:33:14.150 --> 01:33:15.950 How about one more throwback with Mario?
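The string multiplication described above, as a tiny sketch. The function name `row` is made up for this example; the `*` operator on a string and an integer is standard Python.

```python
def row(n):
    """Build a string of n question marks without any loop."""
    return "?" * n   # repetition: "?" concatenated with itself n times


print(row(4))
```

Replacing the literal 4 with a value obtained from the user gives the variable-width version mentioned in the lecture.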
01:33:15.950 --> 01:33:20.450 We've talked about, in week one, this two-dimensional structure where 01:33:20.450 --> 01:33:24.302 it's like I claim 3 by 3-- a grid of bricks, if you will. 01:33:24.302 --> 01:33:25.760 Well, how can we do this in Python? 01:33:25.760 --> 01:33:27.590 We can do this in a couple of ways, now. 01:33:27.590 --> 01:33:32.810 Let me go back to my mario.py, and let me do something like for i in range 01:33:32.810 --> 01:33:36.200 of-- we'll just do 3, even though I know, now, I could use get_int 01:33:36.200 --> 01:33:38.453 or I could use input and int. 01:33:38.453 --> 01:33:41.120 And if I want to do something two-dimensionally, just like in C, 01:33:41.120 --> 01:33:42.590 you can nest your for loops. 01:33:42.590 --> 01:33:45.980 So maybe I could do for j in range of 3. 01:33:45.980 --> 01:33:50.690 And then, in here, I could print out a hash symbol. 01:33:50.690 --> 01:33:53.210 And then, let's see if that gives me 9 total. 01:33:53.210 --> 01:33:56.870 So if I've got a nested loop like this, python of mario.py 01:33:56.870 --> 01:33:58.625 hopefully gives me a grid. 01:33:58.625 --> 01:34:01.710 No, it gave me a column of 9. 01:34:01.710 --> 01:34:09.280 Why, logically, even though I've got my row and my columns? 01:34:09.280 --> 01:34:10.210 Yeah. 01:34:10.210 --> 01:34:11.542 AUDIENCE: [INAUDIBLE] 01:34:11.542 --> 01:34:13.000 DAVID MALAN: Yeah, the line ending. 01:34:13.000 --> 01:34:17.380 So in my row, I can't let print just keep adding new line, adding new line. 01:34:17.380 --> 01:34:20.740 So I just have to override this here and let me not screw up like before. 01:34:20.740 --> 01:34:24.250 Let me print one at the end of the whole row, just to move the cursor down. 01:34:24.250 --> 01:34:28.090 And I think, now, together, we've got our 3 by 3. 01:34:28.090 --> 01:34:29.950 Of course, we could tighten this up further. 
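The 3 by 3 grid built with nested loops might be sketched like this, alongside a shorter variant that uses string multiplication for each row. The function names and the `size` parameter are assumptions of this example.

```python
def print_grid(size):
    """Nested loops: the outer loop is rows, the inner loop is columns."""
    for i in range(size):
        for j in range(size):
            print("#", end="")  # stay on the same line within a row
        print()                 # end the row


def print_grid_short(size):
    """Same grid, letting string repetition replace the inner loop."""
    for i in range(size):
        print("#" * size)


print_grid(3)
```

Without `end=""` in the inner loop, each hash would land on its own line and the output would be a single column of nine.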
01:34:29.950 --> 01:34:33.730 If I don't like the nested loop, I probably could go in here 01:34:33.730 --> 01:34:37.975 and just print out, for instance, a brick times 3. 01:34:37.975 --> 01:34:41.055 Or I could change the 3 to a variable if I've gotten it from the user. 01:34:41.055 --> 01:34:42.582 So I can tighten this up further. 01:34:42.582 --> 01:34:45.790 So, again, just different ways to solve the same problem and, again, evidence 01:34:45.790 --> 01:34:47.575 of why a lot of people like Python. 01:34:47.575 --> 01:34:49.825 There's just some more pleasant ways to solve problems 01:34:49.825 --> 01:34:52.330 without getting into the weeds, constantly, of doing things, 01:34:52.330 --> 01:34:56.845 like with for loops and while loops endlessly. 01:34:56.845 --> 01:34:57.430 All right. 01:34:57.430 --> 01:34:59.222 Well, how about some other building blocks? 01:34:59.222 --> 01:35:02.983 Lists are going to be so incredibly useful in Python, just as arrays 01:35:02.983 --> 01:35:04.900 were in C. But arrays are annoying because you 01:35:04.900 --> 01:35:06.410 have to manage the memory yourself. 01:35:06.410 --> 01:35:08.327 You have to know in advance how big they are or you 01:35:08.327 --> 01:35:11.440 have to use pointers and malloc or realloc to resize them. 01:35:11.440 --> 01:35:12.100 Oh my god. 01:35:12.100 --> 01:35:14.267 The past two weeks have been painful, in that sense. 01:35:14.267 --> 01:35:17.298 But Python does this all for free for you. 01:35:17.298 --> 01:35:19.090 In fact, there's a whole bunch of functions 01:35:19.090 --> 01:35:22.030 that come with Python that involve lists, 01:35:22.030 --> 01:35:29.678 and they'll allow us, ultimately, to do things again and again and again 01:35:29.678 --> 01:35:30.970 within the same data structure. 01:35:30.970 --> 01:35:33.220 And, for instance, we'll be able to get the length of a list. 01:35:33.220 --> 01:35:35.560 You don't have to remember it yourself in a variable.
01:35:35.560 --> 01:35:39.085 You can just ask Python how many elements are in this list. 01:35:39.085 --> 01:35:42.850 And with this, I think we can solve some old problems, too. 01:35:42.850 --> 01:35:45.250 So let me go back here, to VS Code. 01:35:45.250 --> 01:35:50.890 Let me close mario and give us a new program called scores.py. 01:35:50.890 --> 01:35:54.535 And rather than show the C and the Python now, let's just focus on Python. 01:35:54.535 --> 01:35:59.390 And in scores.c way back when, we just averaged three test scores or something 01:35:59.390 --> 01:35:59.890 like that-- 01:35:59.890 --> 01:36:01.900 72, 73, and 33-- 01:36:01.900 --> 01:36:03.230 a few weeks ago. 01:36:03.230 --> 01:36:07.450 So if I want to create a list in this Python version of 72, 73, 33, 01:36:07.450 --> 01:36:09.220 I just use my square bracket notation. 01:36:09.220 --> 01:36:12.640 C let you use curly braces if you know the values in advance, 01:36:12.640 --> 01:36:14.170 but Python's just this. 01:36:14.170 --> 01:36:16.855 And now, if I want to compute the average-- 01:36:16.855 --> 01:36:19.360 in C, recall, I did something with a loop. 01:36:19.360 --> 01:36:21.140 I added all the values together. 01:36:21.140 --> 01:36:23.230 I, then, divide it by the total number of values 01:36:23.230 --> 01:36:26.110 just like you would in grade school, and that gave me the average. 01:36:26.110 --> 01:36:29.085 Well, Python comes with a lot of super handy functions-- 01:36:29.085 --> 01:36:31.395 not just length, but others, as well. 01:36:31.395 --> 01:36:34.150 And so, in fact, if you want to compute the average, 01:36:34.150 --> 01:36:36.970 you can take the sum of all of those scores 01:36:36.970 --> 01:36:40.010 and divide it by the length of all of those scores. 01:36:40.010 --> 01:36:42.490 So Python comes with length, comes with sum. 
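The averaging step described above, as a minimal sketch using the three scores from the lecture. `sum` and `len` are built-in Python functions; the variable names are this example's choice.

```python
scores = [72, 73, 33]                 # a list, written with square brackets

# sum adds every element; len reports how many elements there are
average = sum(scores) / len(scores)

print(f"Average: {average}")          # f-string interpolates the value
```

The `/` operator yields a floating-point result here, so the printed value can show the floating-point imprecision familiar from C.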
01:36:42.490 --> 01:36:45.310 You can just pass in a whole list of any size 01:36:45.310 --> 01:36:47.590 and let it deal with that problem for you. 01:36:47.590 --> 01:36:49.900 So if I want to, now, print out this average, 01:36:49.900 --> 01:36:51.760 I can print out Average colon-- 01:36:51.760 --> 01:36:55.570 and then, I'll plug in my average variable for interpolation. 01:36:55.570 --> 01:36:58.900 Let me make this an fstring so that it gets formatted, 01:36:58.900 --> 01:37:01.530 and let me just run python of scores.py. 01:37:01.530 --> 01:37:02.800 And there is my average. 01:37:02.800 --> 01:37:05.890 It's rounding weird because we're still vulnerable to some floating point 01:37:05.890 --> 01:37:09.340 imprecision, but at least I didn't need loops 01:37:09.340 --> 01:37:11.575 and I didn't have to write all this darn code just 01:37:11.575 --> 01:37:15.130 to do something that Excel and Google Spreadsheets can just do like that. 01:37:15.130 --> 01:37:17.950 Well, Python is closer to those kinds of tools, 01:37:17.950 --> 01:37:21.790 but more powerful in that you can manipulate the data yourself. 01:37:21.790 --> 01:37:25.510 How about, though, if I want to get a bunch of scores manually from the user 01:37:25.510 --> 01:37:27.280 and, then, sum them together. 01:37:27.280 --> 01:37:28.920 Well, let's combine a few ideas here. 01:37:28.920 --> 01:37:29.830 How about this? 01:37:29.830 --> 01:37:36.070 First, let me go ahead and import the get_int function from the CS50 library, 01:37:36.070 --> 01:37:39.340 just so we don't have to deal with try and except or all of that. 01:37:39.340 --> 01:37:42.340 And let me go ahead and give myself an empty list. 01:37:42.340 --> 01:37:44.410 And this is powerful. 01:37:44.410 --> 01:37:48.068 In C, [SIGHS] there's no point to an empty array 01:37:48.068 --> 01:37:50.860 because, if you create an empty array with square bracket notation, 01:37:50.860 --> 01:37:52.600 it's not useful for anything. 
01:37:52.600 --> 01:37:55.780 But in Python, you can create it empty because Python 01:37:55.780 --> 01:37:59.590 will grow and shrink the list for you automatically, as you add things to it. 01:37:59.590 --> 01:38:01.600 So if I want to get three scores from the user, 01:38:01.600 --> 01:38:04.840 I could do something like this-- for i in range of 3. 01:38:04.840 --> 01:38:08.680 And then, I can grab a variable called "score" or anything. 01:38:08.680 --> 01:38:11.467 I could call get_int, prompt the human for the score 01:38:11.467 --> 01:38:12.550 that they want to type in. 01:38:12.550 --> 01:38:15.060 And then, once they do, I can do this. 01:38:15.060 --> 01:38:19.450 Thinking back to our object-oriented programming capability now, 01:38:19.450 --> 01:38:24.358 I could do scores.append, and I can append that score to it. 01:38:24.358 --> 01:38:27.400 And you would only know this from having read the documentation, heard it 01:38:27.400 --> 01:38:30.040 in class, in a book or whatnot, but it turns out 01:38:30.040 --> 01:38:33.880 that, just like strings have functions like lower built into them, 01:38:33.880 --> 01:38:37.735 lists have functions like append built into them that just literally appends 01:38:37.735 --> 01:38:40.165 to the end of the list for you, and Python 01:38:40.165 --> 01:38:42.250 will grow or shrink it as needed. 01:38:42.250 --> 01:38:44.760 No more malloc or realloc or the like. 01:38:44.760 --> 01:38:49.120 So this just appends to the scores list. 01:38:49.120 --> 01:38:51.740 That score, and then again and again and again. 01:38:51.740 --> 01:38:52.990 So the array starts at-- 01:38:52.990 --> 01:38:57.640 sorry, the list starts at size 0, then grows to 1 then 2 then 3 01:38:57.640 --> 01:38:59.320 without you having to do anything else. 
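The growing list described above can be sketched as follows. To keep the example self-contained, the hypothetical `typed` iterator stands in for what a user would type; the lecture calls `get_int` from the CS50 library at that point instead.

```python
# Stand-in for user input (an assumption for this sketch); the lecture
# prompts with get_int from the CS50 library here.
typed = iter(["72", "73", "33"])

scores = []  # starts empty, with length 0

for i in range(3):
    score = int(next(typed))
    scores.append(score)  # the list grows by one element each time

average = sum(scores) / len(scores)
print(f"Average: {average}")
```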
01:38:59.320 --> 01:39:02.845 And so, now, down here, I can compute an average 01:39:02.845 --> 01:39:05.620 with the sum of those scores divided by the length 01:39:05.620 --> 01:39:07.455 of the total number of scores. 01:39:07.455 --> 01:39:11.830 And to be clear, length is the total number of elements in the list. 01:39:11.830 --> 01:39:14.200 Doesn't matter how big the values themselves are. 01:39:14.200 --> 01:39:18.160 Now I can go ahead and print out an fstring with something 01:39:18.160 --> 01:39:22.100 like Average colon average in curly braces. 01:39:22.100 --> 01:39:24.680 And if I run python of scores.py-- 01:39:24.680 --> 01:39:27.505 I'll type in, just for the sake of discussion, the three values, 01:39:27.505 --> 01:39:29.440 I still get the same answer. 01:39:29.440 --> 01:39:31.390 But that would have been painful to do in C 01:39:31.390 --> 01:39:35.770 unless you committed, in advance, to a fixed size array-- which we already 01:39:35.770 --> 01:39:41.830 decided, weeks ago, was annoying-- or you grew it dynamically 01:39:41.830 --> 01:39:44.740 using malloc or realloc or the like. 01:39:44.740 --> 01:39:45.400 All right. 01:39:45.400 --> 01:39:46.240 What else can I do? 01:39:46.240 --> 01:39:49.990 Well, there's some nice things you might as well know exist. 01:39:49.990 --> 01:39:54.340 Instead of scores.append, you can do slight fanciness like this. 01:39:54.340 --> 01:39:57.290 If you want to append something to a list, 01:39:57.290 --> 01:40:00.100 you can actually do plus equals, and then 01:40:00.100 --> 01:40:03.620 put that thing in a temporary list of its own 01:40:03.620 --> 01:40:05.740 and just use what is essentially concatenation-- 01:40:05.740 --> 01:40:09.410 but not concatenation of strings, but concatenation of lists. 
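A short sketch of the plus-equals alternative just described, assuming the same three scores as before. Each pass wraps the new score in a one-element list and concatenates it onto `scores`.

```python
scores = []

for score in [72, 73, 33]:
    # [score] is a temporary one-element list; += concatenates it onto scores
    scores += [score]

print(scores)
```

The effect here is the same as calling `scores.append(score)`.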
01:40:09.410 --> 01:40:13.480 So this new line 6 appends to the scores list-- 01:40:13.480 --> 01:40:15.640 this tiny, little list I'm temporarily creating 01:40:15.640 --> 01:40:17.670 with just the current new score. 01:40:17.670 --> 01:40:20.260 So just another piece of syntax that's worth seeing that 01:40:20.260 --> 01:40:23.290 allows you to do something like that, as well. 01:40:23.290 --> 01:40:23.890 All right. 01:40:23.890 --> 01:40:26.093 Well, how about we go back to strings for a moment? 01:40:26.093 --> 01:40:29.260 And all of these examples, as always, are on the course's website afterward. 01:40:29.260 --> 01:40:32.860 Suppose we want to do something like converting characters to uppercase. 01:40:32.860 --> 01:40:35.170 Well, to be clear, I could do something like this. 01:40:35.170 --> 01:40:38.080 Let me create a program called uppercase.py. 01:40:38.080 --> 01:40:42.280 Let me prompt the user for a before string by using the input function 01:40:42.280 --> 01:40:44.510 or get_string, which is almost the same. 01:40:44.510 --> 01:40:47.110 And I'll prompt the user for a string beforehand. 01:40:47.110 --> 01:40:52.750 Then, let me go ahead and print out, how about, the keyword "After," 01:40:52.750 --> 01:40:56.650 and then end the new line with nothing, just so 01:40:56.650 --> 01:41:00.010 that I can see "Before" on one line and "After" on the next line. 01:41:00.010 --> 01:41:01.240 And then, let me do this-- 01:41:01.240 --> 01:41:04.450 and here's where Python gets pleasant, too, with loops-- 01:41:04.450 --> 01:41:07.270 for c in before-- 01:41:07.270 --> 01:41:11.110 print c.upper end equals quote, unquote. 01:41:11.110 --> 01:41:12.580 And then, I'll print this here. 01:41:12.580 --> 01:41:13.120 All right. 01:41:13.120 --> 01:41:15.950 That was fast, but let's try to infer what's going on. 01:41:15.950 --> 01:41:19.600 So line 1 just gets input from the user, stores it in a variable called before.
01:41:19.600 --> 01:41:22.510 Line two literally just prints "After" but doesn't 01:41:22.510 --> 01:41:25.300 move the cursor to the next line. 01:41:25.300 --> 01:41:27.015 What it, then, does is this. 01:41:27.015 --> 01:41:29.875 And, in C, this was a little more annoying. 01:41:29.875 --> 01:41:31.450 You needed a for loop with i. 01:41:31.450 --> 01:41:34.690 You needed array notation with the square brackets. 01:41:34.690 --> 01:41:39.850 But, Python, if you say for variable in string-- 01:41:39.850 --> 01:41:42.670 so for c, for character, in string, Python 01:41:42.670 --> 01:41:46.060 is going to automatically assign c to the first letter 01:41:46.060 --> 01:41:47.110 that the user types in. 01:41:47.110 --> 01:41:49.120 Then, on the next iteration, the second letter, the third letter, 01:41:49.120 --> 01:41:49.745 and the fourth. 01:41:49.745 --> 01:41:52.360 So you don't need any square bracket notation, you just use c, 01:41:52.360 --> 01:41:55.180 and Python will do it for you and just hand you back, 01:41:55.180 --> 01:41:59.000 one at a time, each of the letters that the user has typed in. 01:41:59.000 --> 01:42:04.720 So if I go back over here and I run, for instance, python of uppercase.py 01:42:04.720 --> 01:42:09.760 and I'll type in, how about, "david" in all lowercase and hit Enter, 01:42:09.760 --> 01:42:13.630 you'll now see that it's all uppercase instead by iterating over it, 01:42:13.630 --> 01:42:15.372 indeed, one character at a time. 01:42:15.372 --> 01:42:17.830 But we already know, thanks to object-oriented programming, 01:42:17.830 --> 01:42:20.027 strings themselves have the functionality built 01:42:20.027 --> 01:42:24.100 in to not just uppercase single characters, but the whole string. 01:42:24.100 --> 01:42:26.530 So, honestly, this was a bit of a silly exercise. 01:42:26.530 --> 01:42:31.360 I don't need to use a loop anymore, like in C. 
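A sketch of the two approaches to uppercase.py discussed here, with a fixed string standing in for the `input` call so the example runs on its own. The `looped` variable is an assumption added only to collect the per-character result.

```python
before = "david"  # stand-in for input("Before: ")

# Approach 1: iterate over the string one character at a time
print("After:  ", end="")  # end="" keeps the cursor on the same line
looped = ""
for c in before:
    print(c.upper(), end="")
    looped += c.upper()
print()  # finally move to the next line

# Approach 2: uppercase the whole string at once
after = before.upper()
print(f"After:  {after}")
```

Both approaches produce the same text; the second simply leaves the iteration to the string's own `upper` method.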
And so, some of the habits 01:42:31.360 --> 01:42:34.720 you've only just developed in recent weeks, it's time to start breaking them 01:42:34.720 --> 01:42:36.130 when they're not necessary. 01:42:36.130 --> 01:42:40.470 I can create a variable called after, set it equal to before.upper-- 01:42:40.470 --> 01:42:43.600 which, indeed, exists, just like dot lower exists. 01:42:43.600 --> 01:42:47.490 And then, what I can go ahead and print out is, for instance-- 01:42:47.490 --> 01:42:49.990 let's get rid of this print line here and do it at the end-- 01:42:49.990 --> 01:42:53.900 "After" and print the value of that variable. 01:42:53.900 --> 01:42:58.005 So now, if I rerun uppercase.py, type in "david" in all lowercase, 01:42:58.005 --> 01:43:03.400 I can just uppercase the whole thing all at once because, again, in Python, 01:43:03.400 --> 01:43:07.000 you don't have to operate on characters individually. 01:43:07.000 --> 01:43:13.310 Questions on any of these tricks up until now? 01:43:13.310 --> 01:43:13.810 No? 01:43:13.810 --> 01:43:14.290 All right. 01:43:14.290 --> 01:43:17.290 How about a few other techniques that we saw in C that we'll bring back, 01:43:17.290 --> 01:43:18.145 now, in Python. 01:43:18.145 --> 01:43:22.860 So it turns out, in Python, there are other libraries you can use, too, 01:43:22.860 --> 01:43:24.360 that unlock even more functionality. 01:43:24.360 --> 01:43:27.040 So, in C, if you wanted command line arguments, 01:43:27.040 --> 01:43:32.410 you just change the signature for main to be, instead of void, 01:43:32.410 --> 01:43:38.515 int argc comma string argv, open brackets for an array or char star, 01:43:38.515 --> 01:43:39.130 eventually. 
01:43:39.130 --> 01:43:41.770 Well, it turns out, in Python, that, if you want to access command line 01:43:41.770 --> 01:43:44.770 arguments, it's a little simpler, but they're tucked away in a library-- 01:43:44.770 --> 01:43:46.990 otherwise known as a module-- 01:43:46.990 --> 01:43:49.552 called sys, the system module. 01:43:49.552 --> 01:43:51.760 Now, this is similar, in spirit, to the CS50 library, 01:43:51.760 --> 01:43:53.802 and that's got a bunch of functionality built in. 01:43:53.802 --> 01:43:55.725 But this one comes with Python itself. 01:43:55.725 --> 01:43:59.710 So if I want to create a program like greet.py, in VS Code, 01:43:59.710 --> 01:44:01.510 here, let me go ahead and do this. 01:44:01.510 --> 01:44:05.785 From the sys library, let's import argv. 01:44:05.785 --> 01:44:07.850 And that's just a thing that exists. 01:44:07.850 --> 01:44:10.660 It's not built into main because there is no main, per se, anymore. 01:44:10.660 --> 01:44:12.590 So it's tucked away in that library. 01:44:12.590 --> 01:44:14.330 And now, I can do something like this. 01:44:14.330 --> 01:44:16.925 If the length of argv equals equals 2, well, 01:44:16.925 --> 01:44:19.090 let's go ahead and print out something friendly, 01:44:19.090 --> 01:44:24.955 like hello comma argv bracket 1, and then, close quotes. 01:44:24.955 --> 01:44:28.360 Else, if the length of argv is not equal to 2, 01:44:28.360 --> 01:44:30.400 let's just go ahead and print out hello, world. 01:44:30.400 --> 01:44:32.525 Now, at a glance, this might look a little cryptic, 01:44:32.525 --> 01:44:35.050 but it's identical to what we did a few weeks ago. 01:44:35.050 --> 01:44:39.570 When I run this, python of greet.py, with no arguments, 01:44:39.570 --> 01:44:40.950 it just says "hello, world." 01:44:40.950 --> 01:44:46.180 But if I, instead, add a command line argument, like my first name and hit 01:44:46.180 --> 01:44:49.825 Enter, now, the length of argv is no longer 1.
01:44:49.825 --> 01:44:51.700 It's going to be 2. 01:44:51.700 --> 01:44:54.680 And so, it prints out "Hello, David" instead. 01:44:54.680 --> 01:44:57.880 So the takeaway here is that, whereas in C, 01:44:57.880 --> 01:45:03.955 argv technically contained the name of your program, like ./hello or ./greet, 01:45:03.955 --> 01:45:05.455 and then everything the human typed. 01:45:05.455 --> 01:45:08.410 Python's a little different in that, because we're 01:45:08.410 --> 01:45:10.150 using the interpreter in this way-- 01:45:10.150 --> 01:45:16.090 technically, when you run python of greet.py, the length of argv is only 1. 01:45:16.090 --> 01:45:18.760 It contains only greet.py, so the name of the file. 01:45:18.760 --> 01:45:21.670 It does not unnecessarily contain Python itself 01:45:21.670 --> 01:45:24.460 because what's the point of that being there, omnipresently? 01:45:24.460 --> 01:45:28.760 It does contain the number of words that the human typed after Python itself. 01:45:28.760 --> 01:45:32.230 So argv is length 1 here. argv is length 2 here. 01:45:32.230 --> 01:45:35.350 And that's why, when it did equal 2, I saw "Hello, David" instead 01:45:35.350 --> 01:45:37.240 of the default "Hello, world." 01:45:37.240 --> 01:45:41.440 So same ability to access command line arguments, add these kinds of inputs 01:45:41.440 --> 01:45:43.570 to your functions, but you have to unlock it 01:45:43.570 --> 01:45:47.830 by way of using argv instead, in this way. 01:45:47.830 --> 01:45:51.910 If you want to see all of the words, you could do something like this. 01:45:51.910 --> 01:45:57.760 Just as-- if we combine ideas, here-- for i in range of, how about, length 01:45:57.760 --> 01:45:59.610 of argv. 01:45:59.610 --> 01:46:02.260 Then, I can do this-- print argv bracket i. 01:46:02.260 --> 01:46:02.860 All right. 01:46:02.860 --> 01:46:06.385 A little cryptic, but line 3 is just a for loop iterating 01:46:06.385 --> 01:46:08.410 over the range of length of argv. 
01:46:08.410 --> 01:46:12.640 So if the human types in two words, the length of argv will be 2. 01:46:12.640 --> 01:46:16.885 So this is just a way of saying iterate over all of the words in argv, 01:46:16.885 --> 01:46:18.380 printing them one at a time. 01:46:18.380 --> 01:46:22.810 So python of greet.py, Enter just prints out the name of the program. 01:46:22.810 --> 01:46:27.340 python of greet.py with David prints out greet.py and, then, David. 01:46:27.340 --> 01:46:29.470 I can keep running it though with more words, 01:46:29.470 --> 01:46:32.650 and they'll each get printed one at a time. 01:46:32.650 --> 01:46:35.440 But what's nice, too, about Python-- 01:46:35.440 --> 01:46:38.920 and this is the point of this exercise-- honestly, this looks pretty cryptic. 01:46:38.920 --> 01:46:40.720 This is not very pleasant to look at. 01:46:40.720 --> 01:46:46.150 If you just want to iterate over every word in a list, which argv is, 01:46:46.150 --> 01:46:47.680 watch what I can do. 01:46:47.680 --> 01:46:52.090 I can do for arg or any variable name in argv. 01:46:52.090 --> 01:46:54.147 Let me just, now, print out that argument. 01:46:54.147 --> 01:46:56.980 I could keep calling it i, but i seems weird when it's not a number. 01:46:56.980 --> 01:46:59.710 So I'm changing to arg as a word, instead. 01:46:59.710 --> 01:47:03.970 If I now do python of greet.py, it does this. 01:47:03.970 --> 01:47:06.460 If I do python of greet.py, David, it does that again. 01:47:06.460 --> 01:47:08.690 David Malan, it does that again. 01:47:08.690 --> 01:47:10.898 So this is, again, why Python is just very appealing. 01:47:10.898 --> 01:47:13.482 You want to do something this many times, iterate over a list? 01:47:13.482 --> 01:47:15.820 Just say it, and it reads a little more like English. 01:47:15.820 --> 01:47:18.130 And there's even other fanciness, too, if I may. 
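A sketch of the greet.py variations covered above. To keep it deterministic, a hard-coded list stands in for `sys.argv` as it would look after running `python greet.py David`, and the `greeting` helper is a hypothetical name introduced only for this illustration.

```python
# Stand-in for sys.argv after running: python greet.py David
argv = ["greet.py", "David"]


def greeting(argv):
    # argv[0] is the script name, so one extra word means a length of 2
    if len(argv) == 2:
        return f"hello, {argv[1]}"
    else:
        return "hello, world"


print(greeting(argv))

# Iterating by index, as in C
for i in range(len(argv)):
    print(argv[i])

# Iterating over the list directly, which reads more like English
for arg in argv:
    print(arg)
```

In a real script, the first line would be replaced by `from sys import argv`.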
01:47:18.130 --> 01:47:21.820 It's a little stupid that I keep seeing the name of the program, greet.py, 01:47:21.820 --> 01:47:24.640 so it'd be nice if I could remove that. 01:47:24.640 --> 01:47:28.960 Python also supports what are called slices of arrays-- 01:47:28.960 --> 01:47:30.340 sorry, slices of lists. 01:47:30.340 --> 01:47:32.050 Even I get the terminology confused. 01:47:32.050 --> 01:47:36.400 If argv is a list, then it's going to print out everything in it. 01:47:36.400 --> 01:47:41.950 But if I want a slice of it that starts at location 1 all the way to the end, 01:47:41.950 --> 01:47:45.500 you can use this funky syntax in between the square brackets, which 01:47:45.500 --> 01:47:48.700 we've not seen yet, that's going to start at item 1 01:47:48.700 --> 01:47:50.220 and go all the way to the end. 01:47:50.220 --> 01:47:53.830 And so, this is a nice, clever way of slicing off, 01:47:53.830 --> 01:47:56.170 if you will, the very first element because now, 01:47:56.170 --> 01:48:01.900 when I run greet.py, David Malan, I should only see David and Malan. 01:48:01.900 --> 01:48:04.940 If I only want one element, I could do 1 to 2. 01:48:04.940 --> 01:48:08.260 If I want all of them, I could do 0 onward. 01:48:08.260 --> 01:48:10.900 I could give myself just one of them in this way. 01:48:10.900 --> 01:48:14.380 So you can play with the start value and the end value in this way, 01:48:14.380 --> 01:48:17.020 to slice and dice these lists in different ways. 01:48:17.020 --> 01:48:20.620 That would have been a pain in C, just because we didn't really 01:48:20.620 --> 01:48:26.840 have the built-in support for manipulating arrays as cleanly as this. 01:48:26.840 --> 01:48:27.340 All right. 
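The slices just described can be sketched like this, again with a hard-coded list standing in for `sys.argv`. The variable names on the left are assumptions chosen for readability.

```python
# Stand-in for sys.argv after running: python greet.py David Malan
argv = ["greet.py", "David", "Malan"]

rest = argv[1:]        # from index 1 to the end, dropping the script name
just_one = argv[1:2]   # from index 1 up to, but not including, index 2
everything = argv[0:]  # from index 0 onward, a copy of the whole list

for arg in rest:
    print(arg)
```

A slice always produces a new list, so `argv` itself is left unchanged.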
01:48:27.340 --> 01:48:31.440 Just so you've seen it, too-- though, this one is less exciting to see live-- 01:48:31.440 --> 01:48:33.940 if I go ahead and create a quick program here, it turns out, 01:48:33.940 --> 01:48:37.630 there's something else in the sys library, the ability to exit programs-- 01:48:37.630 --> 01:48:41.590 either exiting with status code 1 or 0, as we've been doing any time something 01:48:41.590 --> 01:48:42.673 goes right or wrong. 01:48:42.673 --> 01:48:45.340 So, for instance, let me whip up a quick program that just says, 01:48:45.340 --> 01:48:52.300 if the length of sys.argv does not equal 2, then let's yell at the user 01:48:52.300 --> 01:48:54.970 and say you're missing a command line argument. 01:48:54.970 --> 01:48:57.380 Missing command-line argument. 01:48:57.380 --> 01:49:01.360 And let's, then, return sys.exit(1). 01:49:01.360 --> 01:49:05.590 Else, let's go ahead and, logically, just say print a formatted string that 01:49:05.590 --> 01:49:07.450 says hello-- as before-- 01:49:07.450 --> 01:49:09.640 sys.argv 1. 01:49:09.640 --> 01:49:11.770 Now, things look different all of a sudden, 01:49:11.770 --> 01:49:13.312 but I'm doing something deliberately. 01:49:13.312 --> 01:49:14.870 First, let's see what this does. 01:49:14.870 --> 01:49:18.730 So, on line 1, I'm importing not argv, specifically. 01:49:18.730 --> 01:49:22.150 I'm importing the whole sys library, and we'll see why in a second. 01:49:22.150 --> 01:49:27.220 Well, it turns out that the sys library has not only the argv list, 01:49:27.220 --> 01:49:30.580 it also has a function called exit, which I'd like to be able to use, 01:49:30.580 --> 01:49:31.370 as well. 01:49:31.370 --> 01:49:35.200 So it turns out that, if you import a whole library in this way, that's fine. 01:49:35.200 --> 01:49:37.840 But you have to refer to the things inside of it 01:49:37.840 --> 01:49:42.980 by using that same library's name and a dot to namespace it, so to speak.
01:49:42.980 --> 01:49:47.002 So here, I'm just saying, if the user does not type in two words, 01:49:47.002 --> 01:49:49.960 yell at them with missing command line argument, and then, exit with 1. 01:49:49.960 --> 01:49:52.975 Just like in C, when you do exit 1, just means something went wrong. 01:49:52.975 --> 01:49:54.785 Otherwise, print out hello to this. 01:49:54.785 --> 01:49:57.910 And this is starting to look cryptic, but it's just a combination of ideas. 01:49:57.910 --> 01:50:02.080 The curly braces means interpolate this value, plug it in here. 01:50:02.080 --> 01:50:05.740 sys.argv is just the verbose way of saying go into the sys library 01:50:05.740 --> 01:50:09.010 and get the argv variable therein. 01:50:09.010 --> 01:50:11.860 And bracket 1, of course, just like arrays in C, 01:50:11.860 --> 01:50:15.440 is just the second element at the prompt. 01:50:15.440 --> 01:50:18.700 So when I run this version, now-- python of exit.py-- 01:50:18.700 --> 01:50:21.340 with no arguments, I get yelled at in this way. 01:50:21.340 --> 01:50:24.640 If, however, I type in two arguments total-- 01:50:24.640 --> 01:50:26.950 the name of the file and my own name-- 01:50:26.950 --> 01:50:29.050 now, I get greeted with hello, David. 01:50:29.050 --> 01:50:30.310 And it's the same idea before. 01:50:30.310 --> 01:50:33.160 This was a very low-level technique, but same thing here. 01:50:33.160 --> 01:50:36.310 If you do echo dollar sign question mark Enter, 01:50:36.310 --> 01:50:39.170 you'll see the exit code of your program. 01:50:39.170 --> 01:50:41.270 So if I do this incorrectly again-- 01:50:41.270 --> 01:50:43.953 let me rerun it without my name, Enter-- 01:50:43.953 --> 01:50:44.620 I get yelled at. 01:50:44.620 --> 01:50:47.320 But if I do echo dollar sign question mark, 01:50:47.320 --> 01:50:50.170 there's the secret one that's returned. 01:50:50.170 --> 01:50:54.160 Again, just to show you parity with C, in this case. 
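A sketch of the exit.py logic walked through above. Wrapping it in a hypothetical `greet` function that takes the argument list, and catching `SystemExit` to observe the status code, are both assumptions made so the example can run without ending the interpreter; the lecture's version works on `sys.argv` directly at the top level of the file.

```python
import sys


def greet(argv):
    # Expect exactly two words: the script name and one argument
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # a nonzero status signals that something went wrong
    print(f"hello, {argv[1]}")


# sys.exit works by raising SystemExit, so the status can be observed here
try:
    greet(["exit.py"])
except SystemExit as e:
    status = e.code

greet(["exit.py", "David"])
```

When the script is run from a shell instead, that status is what `echo $?` reports afterward.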
01:50:54.160 --> 01:50:56.320 Questions, now, on any of these techniques, here? 01:50:58.900 --> 01:50:59.400 No. 01:50:59.400 --> 01:51:00.030 All right. 01:51:00.030 --> 01:51:02.580 How about something that's a little more powerful, too? 01:51:02.580 --> 01:51:05.880 We spend so much time in week 0 and 1 doing searching 01:51:05.880 --> 01:51:07.830 and, then, eventually, sorting in week 3. 01:51:07.830 --> 01:51:10.288 Well, it turns out, Python can help with some of this, too. 01:51:10.288 --> 01:51:12.720 Let me go ahead and create a program called names.py 01:51:12.720 --> 01:51:15.053 that's just going to be an opportunity to, maybe, search 01:51:15.053 --> 01:51:16.650 over a whole bunch of names. 01:51:16.650 --> 01:51:21.060 Let me go ahead and import sys, just so I have access to exit. 01:51:21.060 --> 01:51:22.920 And let me go ahead and create a variable 01:51:22.920 --> 01:51:26.756 called names that's going to be a list with a whole bunch of names. 01:51:26.756 --> 01:51:27.660 How about here? 01:51:27.660 --> 01:51:34.740 Charlie and Fred and George and Ginny and Percy and, lastly, Ron. 01:51:34.740 --> 01:51:36.290 So a whole bunch of names here. 01:51:36.290 --> 01:51:38.040 And it'd be a little annoying to implement 01:51:38.040 --> 01:51:42.540 code that iterates over that, from left to right, in C, searching for one 01:51:42.540 --> 01:51:43.165 of those names. 01:51:43.165 --> 01:51:43.957 In fact, what name? 01:51:43.957 --> 01:51:46.290 Well, let's go ahead and ask the user to input the name 01:51:46.290 --> 01:51:48.498 that they want to search for so that we can tell them 01:51:48.498 --> 01:51:50.460 if the name is there or not. 01:51:50.460 --> 01:51:54.670 And we could do this, similar to C, in Python, doing something like this. 
01:51:54.670 --> 01:52:00.600 So for n in names, where n is just a variable to iterate over each name-- 01:52:00.600 --> 01:52:05.595 if the name I'm looking for equals the current name in the list-- 01:52:05.595 --> 01:52:09.060 AKA n-- well, let's print out something friendly, like "Found." 01:52:09.060 --> 01:52:14.250 And then, let's do sys.exit 0 to indicate that we found whoever that is. 01:52:14.250 --> 01:52:17.460 Otherwise, if we get all the way to the bottom here, outside of this loop, 01:52:17.460 --> 01:52:20.340 let's just print "Not found" because if we haven't exited yet. 01:52:20.340 --> 01:52:22.800 And then, let's just exit with 1. 01:52:22.800 --> 01:52:25.980 Just to be clear, I can continue importing all of sys, 01:52:25.980 --> 01:52:31.920 or I could do from sys import exit, and then, I could get rid of sys dot 01:52:31.920 --> 01:52:33.240 everywhere else. 01:52:33.240 --> 01:52:36.540 But sometimes, it's helpful to know exactly where functions came from. 01:52:36.540 --> 01:52:39.675 So this, too, is just a matter of style, in this case. 01:52:39.675 --> 01:52:40.230 All right. 01:52:40.230 --> 01:52:41.522 So let's go ahead and run this. 01:52:41.522 --> 01:52:46.540 python of names.py, and let's look for Ron, all the way at the end. 01:52:46.540 --> 01:52:47.040 All right. 01:52:47.040 --> 01:52:47.910 He's found. 01:52:47.910 --> 01:52:51.570 And let's search for someone outside of the family here, like Hermione. 01:52:51.570 --> 01:52:52.700 Not found. 01:52:52.700 --> 01:52:53.200 OK. 01:52:53.200 --> 01:52:54.783 So it seems to be working in this way. 01:52:54.783 --> 01:52:58.548 But I've essentially implemented what algorithm? 01:52:58.548 --> 01:53:05.247 What algorithm would this seem to be, per line 7 and 8 to 9 and 10? 01:53:05.247 --> 01:53:05.955 AUDIENCE: Linear. 01:53:05.955 --> 01:53:06.450 DAVID MALAN: Yeah. 01:53:06.450 --> 01:53:07.350 So it's just linear search. 
01:53:07.350 --> 01:53:10.185 It's a loop, even though the syntax is a little more succinct today, 01:53:10.185 --> 01:53:12.060 and it's just iterating over the whole thing. 01:53:12.060 --> 01:53:15.240 Well, honestly, we've seen an even more terse way to do this in Python. 01:53:15.240 --> 01:53:19.230 And this, again, is what makes it a more pleasant language, sometimes. 01:53:19.230 --> 01:53:20.630 Why don't I just do this? 01:53:20.630 --> 01:53:24.790 Instead of iterating one at a time, why don't I just say this? 01:53:24.790 --> 01:53:27.840 Let me go ahead and change my condition to just 01:53:27.840 --> 01:53:33.270 be-- how about if the name we're looking for is in the names list, we're done. 01:53:33.270 --> 01:53:33.960 We found it. 01:53:33.960 --> 01:53:36.570 Use the in preposition that we've seen a couple of times, 01:53:36.570 --> 01:53:40.710 now, that itself asks the question, is something in something else? 01:53:40.710 --> 01:53:44.050 And Python will take care of linear search for us. 01:53:44.050 --> 01:53:46.080 And it's going to work exactly the same if I 01:53:46.080 --> 01:53:48.030 do python of names.py, search for Ron. 01:53:48.030 --> 01:53:50.077 It's still going to find him and it's still 01:53:50.077 --> 01:53:51.660 going to do it linearly, in this case. 01:53:51.660 --> 01:53:58.060 But I don't have to write all of the lower-level code myself, in this case. 01:53:58.060 --> 01:54:02.430 Questions, now, on any of this? 01:54:02.430 --> 01:54:05.380 The code's just getting shorter and shorter. 01:54:05.380 --> 01:54:05.880 No? 01:54:05.880 --> 01:54:07.740 What about-- let's see. 01:54:07.740 --> 01:54:09.250 What else might we have here? 01:54:09.250 --> 01:54:10.770 How about this? 01:54:10.770 --> 01:54:12.780 Let's go ahead and implement that phonebook 01:54:12.780 --> 01:54:15.690 that we started, metaphorically, with in the beginning of the course. 01:54:15.690 --> 01:54:17.940 Let's code up a program called phonebook.py.
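Looking back at the names.py search, here is a sketch comparing the explicit loop with the `in` operator. The two function names are hypothetical and return a boolean rather than printing and exiting, an assumption made so both versions can be compared side by side.

```python
names = ["Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]


def found_with_loop(name):
    # Explicit linear search: compare against each element in turn
    for n in names:
        if name == n:
            return True
    return False


def found_with_in(name):
    # The in operator performs the same linear search over a list
    return name in names


for name in ["Ron", "Hermione"]:
    print(name, "Found" if found_with_in(name) else "Not found")
```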
01:54:17.940 --> 01:54:22.440 And in this case, let's go ahead and let's create a dictionary this time. 01:54:22.440 --> 01:54:25.470 Recall that a dictionary is a little something that 01:54:25.470 --> 01:54:27.060 implements something like this-- 01:54:27.060 --> 01:54:31.140 a two-column table that's got keys and values, words 01:54:31.140 --> 01:54:33.240 and definitions, names and numbers. 01:54:33.240 --> 01:54:36.367 And let's focus on the last of those, names and numbers, in this case. 01:54:36.367 --> 01:54:38.700 Well, I claimed earlier that Python has built-in support 01:54:38.700 --> 01:54:42.780 for dictionaries-- dict objects-- that you can create with one line. 01:54:42.780 --> 01:54:45.120 I didn't need it for speller because a set is sufficient 01:54:45.120 --> 01:54:47.610 when you only want one of the keys or the values, not both. 01:54:47.610 --> 01:54:49.680 But now, I want some names and numbers. 01:54:49.680 --> 01:54:53.220 So it turns out, in Python, you can create an empty dictionary 01:54:53.220 --> 01:54:55.680 by saying dict open parenthesis, closed. 01:54:55.680 --> 01:54:58.080 And that just gives you, essentially, a chart that 01:54:58.080 --> 01:54:59.640 looks like this, with nothing in it. 01:54:59.640 --> 01:55:01.725 Or there's more succinct syntax. 01:55:01.725 --> 01:55:06.858 You can, alternatively, do this, with two curly braces, instead. 01:55:06.858 --> 01:55:09.150 And, in fact, I've been using a shortcut all this time. 01:55:09.150 --> 01:55:15.885 When I had a list, earlier, where my variable was called scores, 01:55:15.885 --> 01:55:19.860 and I did this, that was actually the shorthand version of this-- 01:55:19.860 --> 01:55:21.637 hey, Python, give me an empty list. 01:55:21.637 --> 01:55:23.970 So there's different syntax for achieving the same goal. 
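The two spellings for empty containers mentioned above can be sketched as follows. The `also_` names are assumptions used only to show both forms next to each other.

```python
# An empty dictionary, written two ways
people = dict()
also_people = {}

# An empty list, written two ways
scores = list()
also_scores = []

print(people == also_people, scores == also_scores)
```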
01:55:23.970 --> 01:55:27.540 In this case, if I want a dictionary for people, 01:55:27.540 --> 01:55:32.530 I can either do this or, more commonly, just two curly braces, like that. 01:55:32.530 --> 01:55:33.030 All right. 01:55:33.030 --> 01:55:34.360 Well, what do I want to put in this? 01:55:34.360 --> 01:55:36.360 Well, let me actually put some things in this. 01:55:36.360 --> 01:55:39.360 And I'm going to just move my closed curly brace to a new line. 01:55:39.360 --> 01:55:42.580 If I want to implement this idea of keys and values, 01:55:42.580 --> 01:55:47.220 the way you do this in Python is key colon value comma. 01:55:47.220 --> 01:55:48.230 Key colon value. 01:55:48.230 --> 01:55:50.410 So you'd implement it more in code. 01:55:50.410 --> 01:55:54.270 So, for instance, if I want Carter to be the first key in my phone book and I 01:55:54.270 --> 01:56:00.135 want his number to be +1-617-495-1000, I can put that as the corresponding 01:56:00.135 --> 01:56:00.960 value. 01:56:00.960 --> 01:56:02.010 The colon is in between. 01:56:02.010 --> 01:56:05.970 Both are strings, or strs, so I've quoted both deliberately. 01:56:05.970 --> 01:56:07.762 If I want to add myself, I can put a comma. 01:56:07.762 --> 01:56:10.970 And then, just to keep things pretty, I'm moving the cursor to the next line. 01:56:10.970 --> 01:56:12.990 But that's not strictly required, aesthetically. 01:56:12.990 --> 01:56:13.865 It's just good style. 01:56:13.865 --> 01:56:19.500 And here, I might do +1-949-468-2750. 01:56:19.500 --> 01:56:24.270 And now, I have a dictionary that, essentially, has two rows, here-- 01:56:24.270 --> 01:56:27.322 Carter and his number and David and his number, as well. 01:56:27.322 --> 01:56:30.405 And if I kept adding to this, this chart would just get longer and longer. 01:56:30.405 --> 01:56:32.430 Suppose I want to search for one of our numbers. 
01:56:32.430 --> 01:56:34.950 Well, let's prompt the user for the name, 01:56:34.950 --> 01:56:37.470 for whose number you want to search by getting string. 01:56:37.470 --> 01:56:38.560 Or you know what? 01:56:38.560 --> 01:56:39.893 We don't need this CS50 library. 01:56:39.893 --> 01:56:43.090 Let's just use input and prompt the user for a name. 01:56:43.090 --> 01:56:49.230 And now, we can use this super terse syntax and just say if name in people, 01:56:49.230 --> 01:56:53.700 print the formatted string number colon and-- 01:56:53.700 --> 01:56:57.160 here, we can do this-- people bracket name. 01:56:57.160 --> 01:56:57.930 OK. 01:56:57.930 --> 01:57:01.800 So this is getting cool quickly, confusingly. 01:57:01.800 --> 01:57:02.805 So let me run this. 01:57:02.805 --> 01:57:06.810 python of phonebook.py Let's type in Carter. 01:57:06.810 --> 01:57:08.910 And, indeed, I see his number. 01:57:08.910 --> 01:57:12.910 Let's run it again with David, and I see my number here. 01:57:12.910 --> 01:57:14.590 So what's going on? 01:57:14.590 --> 01:57:19.320 Well, it turns out that a dictionary is very similar, in spirit, to a list. 01:57:19.320 --> 01:57:22.350 It's actually very similar, in spirit, to an array in C. 01:57:22.350 --> 01:57:27.150 But instead of being limited to keys that are numbers, like bracket 0, 01:57:27.150 --> 01:57:30.690 bracket 1, bracket 2, you can actually use words. 01:57:30.690 --> 01:57:33.060 And that's all I'm doing here on line 8. 01:57:33.060 --> 01:57:36.765 If I want to check for the name Carter, which is currently 01:57:36.765 --> 01:57:39.555 in this variable called name, I can index 01:57:39.555 --> 01:57:42.660 into my people dictionary using not a number, 01:57:42.660 --> 01:57:44.830 but using, literally, a string-- 01:57:44.830 --> 01:57:48.000 the name Carter or David or anything else. 
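NOTE A sketch of the lookup just walked through. The lecture reads the name with input(); here the check is wrapped in a hypothetical helper called lookup so that it can be run without typing anything. The helper and the name "Nobody" are assumptions for illustration, not part of the lecture's phonebook.py.

```python
people = {
    "Carter": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(people, name):
    """Return the line the lecture prints, or None if the name is not a key."""
    if name in people:                    # membership test on the keys
        return f"Number: {people[name]}"  # index with a string, not an integer
    return None


print(lookup(people, "Carter"))  # Number: +1-617-495-1000
print(lookup(people, "Nobody"))  # None

# Interactive form, as in the lecture:
# name = input("Name: ")
# if name in people:
#     print(f"Number: {people[name]}")
```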
01:57:48.000 --> 01:57:50.640 To make this clearer, too, notice that I'm, at the moment, 01:57:50.640 --> 01:57:54.095 using this format string, which is adding some undue complexity. 01:57:54.095 --> 01:57:56.220 But I could clarify this, perhaps, further as this. 01:57:56.220 --> 01:57:58.080 I could give myself another variable called 01:57:58.080 --> 01:58:01.320 number, set it equal to the people dictionary, 01:58:01.320 --> 01:58:03.875 indexing into it using the current name. 01:58:03.875 --> 01:58:07.230 And now, I can shorten this to make it clearer that all I'm doing 01:58:07.230 --> 01:58:09.910 is printing the value of that. 01:58:09.910 --> 01:58:12.930 And, in fact, I can do this even more cryptically. 01:58:12.930 --> 01:58:16.710 This would be weird to do, but if I only ever want to show David's phone number 01:58:16.710 --> 01:58:21.150 and never Carter's, I can literally, quote unquote, "index into" the people 01:58:21.150 --> 01:58:24.930 dictionary because, now, when I run this, even if I type Carter, 01:58:24.930 --> 01:58:27.020 I'm going to get back my number instead. 01:58:27.020 --> 01:58:31.080 But that's all that's happening if I undo that, because that's now a bug. 01:58:31.080 --> 01:58:35.250 But I index into it using the value of name. 01:58:35.250 --> 01:58:37.230 Dictionaries are just so wonderfully convenient 01:58:37.230 --> 01:58:39.688 because, now, you can associate anything with anything else 01:58:39.688 --> 01:58:43.420 but not using numbers, but entire key words, instead. 01:58:43.420 --> 01:58:46.770 So here's how, if, in speller, we gave you not just words, 01:58:46.770 --> 01:58:50.340 but hundreds of thousands of definitions, as well, 01:58:50.340 --> 01:58:52.385 you could essentially store them as this. 
01:58:52.385 --> 01:58:55.680 And then, when the human wants to look up a definition in a proper dictionary, 01:58:55.680 --> 01:58:57.750 not just for spell checking, you could index 01:58:57.750 --> 01:59:00.290 into the dictionary using square brackets 01:59:00.290 --> 01:59:04.240 and get back the definition in English, as well. 01:59:04.240 --> 01:59:06.770 Questions on this? 01:59:06.770 --> 01:59:07.280 Yeah? 01:59:07.280 --> 01:59:09.760 AUDIENCE: Is the way this code does, as presented, 01:59:09.760 --> 01:59:11.744 saying that Python has [INAUDIBLE]? 01:59:21.390 --> 01:59:22.890 DAVID MALAN: A really good question. 01:59:22.890 --> 01:59:27.330 So, to summarize, how is Python finding that name within that dictionary? 01:59:27.330 --> 01:59:31.110 This is where, honestly, speller in p-set 5 is what Python's all about. 01:59:31.110 --> 01:59:34.215 So you have struggled, are struggling with implementing your own spell 01:59:34.215 --> 01:59:36.090 checker and implementing your own hash table. 01:59:36.090 --> 01:59:39.210 And recall that, per last week, the goal of a hash table is to, 01:59:39.210 --> 01:59:41.190 ideally, get constant time access. 01:59:41.190 --> 01:59:45.435 Not something linear, which is slow, and even better than something logarithmic, 01:59:45.435 --> 01:59:47.400 like log base 2 of n. 01:59:47.400 --> 01:59:50.130 So Python and the really smart people who invented it, 01:59:50.130 --> 01:59:53.310 they have written the code that does its best to give you 01:59:53.310 --> 01:59:55.853 constant time searches of dictionaries. 01:59:55.853 --> 01:59:58.020 And they're not always going to succeed, just as you 01:59:58.020 --> 01:59:59.430 in your own problem set are probably going 01:59:59.430 --> 02:00:01.805 to have some collisions once in a while and start to have 02:00:01.805 --> 02:00:03.440 chains of linked lists of words.
02:00:03.440 --> 02:00:05.940 But this is where, again, you defer to someone else, someone 02:00:05.940 --> 02:00:07.800 smarter than you, someone with more time than you 02:00:07.800 --> 02:00:09.270 to solve these problems for you. 02:00:09.270 --> 02:00:11.490 And if you read Python's documentation, you'll 02:00:11.490 --> 02:00:13.650 see that it doesn't guarantee constant time, 02:00:13.650 --> 02:00:15.990 but it's going to, ideally, optimize the data structure 02:00:15.990 --> 02:00:19.320 for you to get as fast as possible. 02:00:19.320 --> 02:00:22.690 And of all of the data structures like a dictionary, 02:00:22.690 --> 02:00:25.380 a hash table is, really, like the Swiss army knife of computing 02:00:25.380 --> 02:00:28.260 because it just lets you associate something with something else. 02:00:28.260 --> 02:00:30.510 And even though we keep focusing on names and numbers, 02:00:30.510 --> 02:00:32.400 that's a really powerful thing because it's 02:00:32.400 --> 02:00:34.230 more powerful than lists and arrays, which 02:00:34.230 --> 02:00:35.910 are only numbers and something else. 02:00:35.910 --> 02:00:38.690 Now, you can have any sorts of relationships, instead. 02:00:38.690 --> 02:00:39.270 All right. 02:00:39.270 --> 02:00:41.178 Let me show a few other examples before we 02:00:41.178 --> 02:00:43.470 culminate with some more powerful techniques in Python, 02:00:43.470 --> 02:00:45.000 thanks to libraries. 02:00:45.000 --> 02:00:49.480 How about this problem we encountered in week 4, which was this. 02:00:49.480 --> 02:00:54.120 Let me code up a program called, again, compare.py here but, this time, 02:00:54.120 --> 02:00:56.770 compare two strings and not numbers. 02:00:56.770 --> 02:01:01.230 So let me, for instance, get one string from the user called s. 02:01:01.230 --> 02:01:04.890 Just for the sake of discussion, let me get another string from the user 02:01:04.890 --> 02:01:07.830 called t so that we can actually do some comparison here.
02:01:07.830 --> 02:01:12.780 And if s equals equals t, let's go ahead and print out that they're the same. 02:01:12.780 --> 02:01:15.640 Else, let's go ahead and print out that they're different. 02:01:15.640 --> 02:01:17.910 So this is very similar to what we did in week 4. 02:01:17.910 --> 02:01:20.580 But in week 4, recall we did this specifically 02:01:20.580 --> 02:01:23.800 because we had encountered a problem. 02:01:23.800 --> 02:01:28.680 For instance, if I run-- whoops. 02:01:28.680 --> 02:01:34.970 If I run-- what's going on? 02:01:34.970 --> 02:01:40.396 [INAUDIBLE] Come on. 02:01:40.396 --> 02:01:41.390 Oh. 02:01:41.390 --> 02:01:41.890 OK. 02:01:41.890 --> 02:01:43.240 Wow, OK. 02:01:43.240 --> 02:01:43.840 Long day. 02:01:43.840 --> 02:01:44.380 All right. 02:01:44.380 --> 02:01:48.670 If I run the proper command, python of compare.py, then let's go ahead 02:01:48.670 --> 02:01:53.785 and type in something like "cat" in all lowercase, "cat" in all lowercase. 02:01:53.785 --> 02:01:56.110 And they're the same. 02:01:56.110 --> 02:01:59.565 If, though, I do this again with "dog" and "dog," they're the same. 02:01:59.565 --> 02:02:01.690 And, of course, "cat" and "dog," they're different. 02:02:01.690 --> 02:02:06.430 But does anyone recall, from two weeks ago, when I typed in my name twice, 02:02:06.430 --> 02:02:08.680 both identically capitalized. 02:02:08.680 --> 02:02:10.360 What did it say? 02:02:10.360 --> 02:02:13.390 That they were, in fact, different. 02:02:13.390 --> 02:02:14.110 And why was that? 02:02:14.110 --> 02:02:16.660 Why were two strings in C different, even though I typed literally 02:02:16.660 --> 02:02:17.410 the same thing? 02:02:20.040 --> 02:02:21.540 Two different places in memory. 02:02:21.540 --> 02:02:24.560 So each string might look the same, aesthetically, but, of course, 02:02:24.560 --> 02:02:25.852 was stored elsewhere in memory. 
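NOTE A sketch of compare.py as described, assuming the two strings arrive as arguments instead of through input() so that it can be run unattended. The helper name compare and the way a and b are built are illustrative assumptions. The point it shows is that == in Python compares the characters of the two strings, not where they happen to live in memory.

```python
def compare(s, t):
    """Mirror the lecture's if/else: report whether two strings match."""
    if s == t:
        return "Same"
    else:
        return "Different"


print(compare("cat", "cat"))  # Same
print(compare("cat", "dog"))  # Different

# Two strings built separately, character by character, still compare equal.
a = "".join(["D", "a", "v", "i", "d"])
b = "".join(["D", "a", "v", "i", "d"])
print(compare(a, b))  # Same
```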
02:02:25.852 --> 02:02:29.970 And yet, Python appears to be using the equality operator-- 02:02:29.970 --> 02:02:33.510 equals equals-- like you and I would expect, as humans-- actually 02:02:33.510 --> 02:02:38.510 comparing for us char by char in each of those strings for actual equality. 02:02:38.510 --> 02:02:41.610 So this is a feature of Python, in that it's just easier to do. 02:02:41.610 --> 02:02:42.210 And why? 02:02:42.210 --> 02:02:44.627 Well, this derives from the reality that, in Python, there 02:02:44.627 --> 02:02:45.630 are no pointers anymore. 02:02:45.630 --> 02:02:47.297 There's no underlying memory management. 02:02:47.297 --> 02:02:50.400 It's not up to you, now, to worry about those lower-level details. 02:02:50.400 --> 02:02:52.960 The language itself takes care of that for you. 02:02:52.960 --> 02:02:55.050 And so, similarly, if I do this and don't 02:02:55.050 --> 02:02:57.510 ask the user for two strings, but just one, 02:02:57.510 --> 02:02:59.370 and then, I do something like this. 02:02:59.370 --> 02:03:05.550 How about give myself a second variable t, set it equal to s.capitalize, which, 02:03:05.550 --> 02:03:08.040 note, is not the same as upper; capitalize, by design, 02:03:08.040 --> 02:03:12.270 per Python's documentation, will only capitalize the first letter for you-- 02:03:12.270 --> 02:03:15.240 I can now print out, say, two f-strings here-- 02:03:15.240 --> 02:03:18.240 what the value of s is and, then, let me print out, 02:03:18.240 --> 02:03:20.340 with another f-string, what the value of t is. 02:03:20.340 --> 02:03:22.995 And recall that, in C, this was a problem 02:03:22.995 --> 02:03:26.820 because if you capitalize s and store it in t, 02:03:26.820 --> 02:03:29.670 we accidentally capitalized both s and t.
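NOTE The capitalize step just described can be sketched as follows, with the input hard-coded to "cat" so the example runs on its own. The "hello world" strings are an added illustration of how capitalize differs from upper and are not from the lecture.

```python
s = "cat"
t = s.capitalize()  # returns a new string; s itself is left alone

print(f"s: {s}")  # s: cat
print(f"t: {t}")  # t: Cat

# capitalize touches only the first letter; upper changes every letter.
print("hello world".capitalize())  # Hello world
print("hello world".upper())       # HELLO WORLD
```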
02:03:29.670 --> 02:03:33.510 But in this case, in Python, when I actually run this and type in "cat" 02:03:33.510 --> 02:03:37.770 In all lowercase, the original s is unchanged 02:03:37.770 --> 02:03:42.780 because, when I use capitalize on line 3, this is, indeed, capitalizing s. 02:03:42.780 --> 02:03:47.550 But it's returning a copy of the result. It cannot change s itself 02:03:47.550 --> 02:03:50.385 because, again, for that technical term, s is immutable. 02:03:50.385 --> 02:03:53.265 Strings, once they exist, cannot be changed themselves. 02:03:53.265 --> 02:03:58.590 But you can return copies and modify mutated copies of those same strings. 02:03:58.590 --> 02:04:02.040 So, in short, all of those headaches we encountered in week 4 02:04:02.040 --> 02:04:05.070 are now solved, really, in the way you might expect. 02:04:05.070 --> 02:04:07.500 And here's another one that we dwelled on in week 4, 02:04:07.500 --> 02:04:09.660 with the colored liquid in glasses. 02:04:09.660 --> 02:04:12.150 Let me code up a program called swap.py. 02:04:12.150 --> 02:04:16.690 And in swap.py, let me set x equal to 1, y equal to 2. 02:04:16.690 --> 02:04:18.690 And then, let me just print out an fstring here. 02:04:18.690 --> 02:04:24.360 So how about x is this comma y is that. 02:04:24.360 --> 02:04:27.735 And then, let me do that twice, just for the sake of demonstration. 02:04:27.735 --> 02:04:31.005 And in here, recall that we had to create a swap function. 02:04:31.005 --> 02:04:33.630 But then, we had to pass it in by reference with the ampersand. 02:04:33.630 --> 02:04:38.460 And oh my god, that was peak complexity in C. Well, 02:04:38.460 --> 02:04:41.100 if you want to swap x and y in Python, you 02:04:41.100 --> 02:04:43.830 could do x comma y equals y comma x. 02:04:43.830 --> 02:04:49.020 And now, python of swap.py. 02:04:49.020 --> 02:04:50.130 And there we go. 02:04:50.130 --> 02:04:51.840 All of that's handled for you. 
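NOTE A sketch of swap.py as described above. The exact wording of the printed message is an assumption based on how the f-string was read aloud.

```python
x = 1
y = 2

print(f"x is {x}, y is {y}")  # x is 1, y is 2

# The right-hand side is evaluated first, then unpacked into x and y,
# so no temporary variable is needed.
x, y = y, x

print(f"x is {x}, y is {y}")  # x is 2, y is 1
```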
02:04:51.840 --> 02:04:56.350 It's like a shell game without even a temporary variable in mind. 02:04:56.350 --> 02:04:58.290 So what more can we do here? 02:04:58.290 --> 02:05:00.870 How about a few final building blocks? 02:05:00.870 --> 02:05:03.330 And these related, now, to files from that week 4. 02:05:03.330 --> 02:05:07.710 Suppose that I want to save some names and numbers in a CSV file-- 02:05:07.710 --> 02:05:11.080 Comma Separated Values, which is like a very lightweight spreadsheet. 02:05:11.080 --> 02:05:15.300 Well, first, let me create a phonebook.csv file 02:05:15.300 --> 02:05:19.458 that just has name comma number as the first row there. 02:05:19.458 --> 02:05:21.750 But after that, I'm going to go ahead, now, and code up 02:05:21.750 --> 02:05:25.170 a phonebook.py program that actually allows 02:05:25.170 --> 02:05:27.040 me to add things to this phonebook. 02:05:27.040 --> 02:05:31.020 So let me split my screen here so that we can see the old and the new. 02:05:31.020 --> 02:05:34.050 And down here, in my code for phonebook.py, 02:05:34.050 --> 02:05:36.360 in this new and improved version, I'm going 02:05:36.360 --> 02:05:40.020 to actually import a whole other library, this one called CSV. 02:05:40.020 --> 02:05:42.885 And here, too, especially for people in data science and the like, 02:05:42.885 --> 02:05:46.500 really like being able to manipulate files and data that might very well be 02:05:46.500 --> 02:05:48.060 stored in spreadsheets or CSVs-- 02:05:48.060 --> 02:05:51.510 Comma Separated Values, which we saw briefly in week 4. 02:05:51.510 --> 02:05:53.670 In phonebook.py, then, it suffices to just 02:05:53.670 --> 02:05:57.348 import CSV after reading the documentation therefore 02:05:57.348 --> 02:05:59.265 because this is going to give me functionality 02:05:59.265 --> 02:06:02.150 in code related to CSV files. 02:06:02.150 --> 02:06:04.950 So here's how I might open a file in Python. 
02:06:04.950 --> 02:06:08.340 I literally call open-- it's not fopen now; it's just open-- 02:06:08.340 --> 02:06:10.860 and I open this file called phonebook.csv. 02:06:10.860 --> 02:06:13.470 And just as in C, I'm going to open it in append mode-- 02:06:13.470 --> 02:06:15.930 not write, where it would change the whole thing. 02:06:15.930 --> 02:06:18.660 I want to append a new line at a time. 02:06:18.660 --> 02:06:21.750 After this, I want to get, maybe, a name from the user. 02:06:21.750 --> 02:06:25.350 So let's prompt the user for some input for their name. 02:06:25.350 --> 02:06:27.255 And then, let's prompt the user for a number, 02:06:27.255 --> 02:06:31.060 as well, using input prompting for number. 02:06:31.060 --> 02:06:31.560 All right. 02:06:31.560 --> 02:06:33.602 And now, this is a little cryptic, and you'd only 02:06:33.602 --> 02:06:35.050 know this from the documentation. 02:06:35.050 --> 02:06:38.370 But if you want to write rows to a CSV file 02:06:38.370 --> 02:06:41.850 that you can, then, view in Excel or the like, you can do this-- 02:06:41.850 --> 02:06:45.060 give me a variable called writer-- but I could call it anything I want. 02:06:45.060 --> 02:06:50.760 Let me use a csv.writer function that comes with this CSV library, 02:06:50.760 --> 02:06:51.885 passing in the file. 02:06:51.885 --> 02:06:56.070 This is like saying, hey, Python, treat this open file as a CSV file 02:06:56.070 --> 02:06:59.340 so that things are separated with commas and nicely formatted 02:06:59.340 --> 02:07:00.515 in rows and columns. 02:07:00.515 --> 02:07:02.100 Now, I'm going to do this-- 02:07:02.100 --> 02:07:04.030 use that writer to write a row. 02:07:04.030 --> 02:07:05.280 Well, what do I want to write?
02:07:05.280 --> 02:07:07.380 I want to write a short list-- 02:07:07.380 --> 02:07:10.200 namely, the current name and the current number-- 02:07:10.200 --> 02:07:14.790 to that file, but I don't want to use fprintf and %s and all of that stuff 02:07:14.790 --> 02:07:16.440 that we might have had in the past. 02:07:16.440 --> 02:07:19.030 And now, I just want to close the file. 02:07:19.030 --> 02:07:20.410 Let me reopen my terminal. 02:07:20.410 --> 02:07:26.102 Let me run python of phonebook.py, and let me type in David and then 02:07:26.102 --> 02:07:30.190 +1-949-468-2750 and, crossing my fingers, 02:07:30.190 --> 02:07:33.430 watching the actual CSV at top-left. 02:07:33.430 --> 02:07:35.737 My code has just added me to the file. 02:07:35.737 --> 02:07:37.570 And if I were to run it again, for instance, 02:07:37.570 --> 02:07:41.770 with Carter and +1-617-495-1000, crossing my fingers again-- 02:07:41.770 --> 02:07:42.820 we've updated the file. 02:07:42.820 --> 02:07:46.150 And it turns out, there's code now, via which I can even read that file. 02:07:46.150 --> 02:07:48.850 But I can, first, tighten this up, just so you've seen it. 02:07:48.850 --> 02:07:52.720 It turns out, in Python, it's so common to open files and close them. 02:07:52.720 --> 02:07:54.610 Humans make mistakes, and they often forget 02:07:54.610 --> 02:07:58.477 to close files, which might, then, end up using more memory than you intend. 02:07:58.477 --> 02:08:00.310 So you can, alternatively, do this in Python 02:08:00.310 --> 02:08:03.310 so that you don't have to worry about closing files. 02:08:03.310 --> 02:08:05.920 You can use this keyword instead. 02:08:05.920 --> 02:08:09.100 You can say with the opening of this file 02:08:09.100 --> 02:08:13.420 as a variable called file do all of the following underneath. 02:08:13.420 --> 02:08:15.470 So I'm indenting most of my code. 02:08:15.470 --> 02:08:18.430 I'm using this new, Python-specific keyword called with.
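NOTE A sketch of the two versions of phonebook.py described so far: one that opens and closes the file by hand, and one that uses the with keyword. Several things here are assumptions made so the example runs unattended: the helper names add_entry and add_entry_with, the temporary folder used in place of the real phonebook.csv, and the newline="" argument, which Python's csv documentation recommends but which the lecture does not mention. The lecture reads the name and number with input().

```python
import csv
import os
import tempfile


def add_entry(path, name, number):
    """First version: open in append mode, write one row, close by hand."""
    file = open(path, "a", newline="")
    writer = csv.writer(file)
    writer.writerow([name, number])  # a short list: one value per column
    file.close()


def add_entry_with(path, name, number):
    """Tightened version: the file is closed when the indented block ends."""
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])


# Demonstration in a throwaway folder so no real file is touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "phonebook.csv")
    with open(path, "w", newline="") as file:
        file.write("name,number\n")  # the header row created by hand
    add_entry(path, "David", "+1-949-468-2750")
    add_entry_with(path, "Carter", "+1-617-495-1000")
    with open(path, newline="") as file:
        rows = list(csv.reader(file))

for row in rows:
    print(row)
```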
02:08:18.430 --> 02:08:22.330 And this is just a matter of saying, with the following opening of the file, 02:08:22.330 --> 02:08:26.120 do those next four lines of code, and then, automatically close it for me 02:08:26.120 --> 02:08:27.370 at the end of the indentation. 02:08:27.370 --> 02:08:31.480 It's a minor optimization, but this, again, is the pythonic way 02:08:31.480 --> 02:08:33.250 to do things, instead. 02:08:33.250 --> 02:08:34.720 How else might I do this, too? 02:08:34.720 --> 02:08:38.860 Well, it turns out that the code I've written here-- on line 9, 02:08:38.860 --> 02:08:40.630 especially-- is a little fragile. 02:08:40.630 --> 02:08:44.350 If any human opens this spreadsheet-- the CSV file in Excel, 02:08:44.350 --> 02:08:46.000 Google Spreadsheets, Apple Numbers-- 02:08:46.000 --> 02:08:49.390 and maybe moves the columns around just because, maybe, they're fussing. 02:08:49.390 --> 02:08:52.790 They saved it, and they don't realize they've, now, changed my assumptions. 02:08:52.790 --> 02:08:55.120 I don't want to, necessarily, write name and number 02:08:55.120 --> 02:08:58.360 always in that order because what if someone screws up and flips those two 02:08:58.360 --> 02:09:01.040 columns by literally dragging and dropping? 02:09:01.040 --> 02:09:03.640 So it turns out that, instead of using a list here, 02:09:03.640 --> 02:09:06.890 we can use another feature of this library, as follows. 02:09:06.890 --> 02:09:09.520 Instead of using a writer, there's something 02:09:09.520 --> 02:09:11.530 called a dictionary writer or dict writer 02:09:11.530 --> 02:09:14.140 that takes the same argument as input-- 02:09:14.140 --> 02:09:15.580 the file that's opened. 02:09:15.580 --> 02:09:18.070 But now, the one difference here is that you 02:09:18.070 --> 02:09:25.030 need to tell this dictionary writer that your field names are name and number. 02:09:25.030 --> 02:09:27.370 And let me close the CSV here. 
02:09:27.370 --> 02:09:32.140 Name and number are the names of the fields, the columns in this CSV file. 02:09:32.140 --> 02:09:34.450 And when it comes time to write a new row, 02:09:34.450 --> 02:09:37.750 the syntax here is going to be a little uglier, but it's just a dictionary. 02:09:37.750 --> 02:09:40.120 The name I want to write to the dictionary 02:09:40.120 --> 02:09:42.310 is going to be whatever name the human typed in. 02:09:42.310 --> 02:09:45.790 The number that I want to write to the CSV file 02:09:45.790 --> 02:09:48.550 is going to be whatever the number the human typed in. 02:09:48.550 --> 02:09:51.010 But what's different, now, about this code is, 02:09:51.010 --> 02:09:55.960 by simply using a dictionary writer here instead of the generic writer, 02:09:55.960 --> 02:10:00.640 now, the columns can be in this order or this order or any order. 02:10:00.640 --> 02:10:03.010 And the dictionary writer is going to figure out, 02:10:03.010 --> 02:10:06.557 based on the first line of text in that CSV, where to put name, 02:10:06.557 --> 02:10:07.390 where to put number. 02:10:07.390 --> 02:10:08.883 So if you flip them, no big deal. 02:10:08.883 --> 02:10:11.050 It's going to notice, oh, wait, the columns changed. 02:10:11.050 --> 02:10:14.330 And it's going to insert the columns correctly. 02:10:14.330 --> 02:10:18.970 So just, again, another more powerful feature that lets you 02:10:18.970 --> 02:10:22.750 focus on real work, as opposed to actually getting 02:10:22.750 --> 02:10:27.250 tied up in the weeds of writing code like this, otherwise. 02:10:27.250 --> 02:10:30.440 Questions on this one, as well? 
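NOTE A sketch of the dictionary writer version, under the same assumptions as any unattended demo: a hypothetical helper named add_entry, a temporary folder instead of the real phonebook.csv, and newline="" as the csv documentation recommends. One detail to check against that documentation: csv.DictWriter places each value according to the fieldnames list it is given and does not read the header row already in the file, so fieldnames should list the columns in the file's actual order. What the dictionary form does buy is that the keys in each row can be written in any order.

```python
import csv
import os
import tempfile


def add_entry(path, name, number):
    """Append one row, matching values to columns by field name."""
    with open(path, "a", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["name", "number"])
        # Key order inside this dict does not matter; fieldnames decides.
        writer.writerow({"number": number, "name": name})


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "phonebook.csv")
    with open(path, "w", newline="") as file:
        file.write("name,number\n")  # header row created by hand
    add_entry(path, "David", "+1-949-468-2750")
    add_entry(path, "Carter", "+1-617-495-1000")
    with open(path, newline="") as file:
        rows = list(csv.DictReader(file))  # reads the header for the keys

for row in rows:
    print(row["name"], row["number"])
```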
02:10:30.440 --> 02:10:33.520 But what we will do, now, is come full circle 02:10:33.520 --> 02:10:37.180 to some of the more sophisticated examples with which we began, 02:10:37.180 --> 02:10:40.855 and I'm going to go back over to my own Mac laptop 02:10:40.855 --> 02:10:43.743 here, where I've got my own terminal window up and running, 02:10:43.743 --> 02:10:46.285 and I was just going to introduce a couple of final libraries 02:10:46.285 --> 02:10:49.788 that really speak to just how powerful Python can be 02:10:49.788 --> 02:10:51.580 and how quickly you can get up and running. 02:10:51.580 --> 02:10:54.330 To be fair, can't necessarily do all of these things in the cloud, 02:10:54.330 --> 02:10:57.337 like in code spaces, because you need access to your own speakers 02:10:57.337 --> 02:10:58.420 or microphone or the like. 02:10:58.420 --> 02:11:01.090 So that's why I'm doing it on my own Mac, here. 02:11:01.090 --> 02:11:05.680 But let me go ahead and open up a program called speech.py. 02:11:05.680 --> 02:11:07.300 And I'm not using VS Code here. 02:11:07.300 --> 02:11:10.150 I'm using a program called VI that's entirely terminal window based. 02:11:10.150 --> 02:11:13.105 But it's going to allow me, for instance, to import the Python 02:11:13.105 --> 02:11:16.120 text to speech version 3 library. 02:11:16.120 --> 02:11:18.790 I'm going to give myself a variable called engine that's 02:11:18.790 --> 02:11:21.610 going to be set equal to the Python text to speech 02:11:21.610 --> 02:11:26.350 3 libraries init method, which is just going to initialize this library that 02:11:26.350 --> 02:11:28.090 relates to text to speech. 02:11:28.090 --> 02:11:32.410 I'm going to, then, use the engine's say function to say something 02:11:32.410 --> 02:11:35.260 like, how about, hello comma world. 
02:11:35.260 --> 02:11:39.850 And then, as my last line, I'm going to say engine.runAndWait, capitalized 02:11:39.850 --> 02:11:44.690 as such, to tell my program, now, to run that speech and wait until it's done. 02:11:44.690 --> 02:11:45.190 All right. 02:11:45.190 --> 02:11:46.540 I'm going to save this file. 02:11:46.540 --> 02:11:49.110 I'm going to run python of speech.py. 02:11:49.110 --> 02:11:52.357 And I'm going to cross my fingers, as always, and-- 02:11:52.357 --> 02:11:53.440 INTERPRETER: Hello, world. 02:11:53.440 --> 02:11:54.398 DAVID MALAN: All right. 02:11:54.398 --> 02:11:57.130 So now, I have a program that's actually synthesizing speech 02:11:57.130 --> 02:11:58.570 using a library like this. 02:11:58.570 --> 02:12:01.285 How can I, now, modify this to be a little more interesting? 02:12:01.285 --> 02:12:02.690 Well, how about this? 02:12:02.690 --> 02:12:05.050 Let me go ahead and prompt the user for their name, 02:12:05.050 --> 02:12:08.680 like we've done several times here, using Python's built-in input function. 02:12:08.680 --> 02:12:11.665 And now, let me go ahead and use a format string in conjunction 02:12:11.665 --> 02:12:14.980 with this library, interpolating the value of name there. 02:12:14.980 --> 02:12:18.460 And-- at least, if my name is somewhat phonetically pronounceable-- 02:12:18.460 --> 02:12:23.587 let's go ahead and run python of speech.py, type in my name, and-- 02:12:23.587 --> 02:12:24.670 INTERPRETER: Hello, David. 02:12:24.670 --> 02:12:25.445 DAVID MALAN: OK. 02:12:25.445 --> 02:12:27.640 It's a weird choice of inflection, but we're 02:12:27.640 --> 02:12:30.475 starting to synthesize voice, not unlike Siri or Google Assistant 02:12:30.475 --> 02:12:32.050 or Alexa or the like. 02:12:32.050 --> 02:12:36.130 Now, we can, maybe, do something a little more advanced, too.
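NOTE A sketch of speech.py, assuming the "Python text to speech version 3 library" is the third-party package pyttsx3, installed separately. The helpers greeting and speak are hypothetical names introduced here. The import sits inside speak so the rest of the example still runs on a machine without the package or without speakers, and the call that actually talks is left commented out for the same reason.

```python
def greeting(name):
    """Build the sentence to be spoken, using a format string."""
    return f"hello, {name}"


def speak(text):
    """Say text aloud; needs pyttsx3 installed and working audio output."""
    import pyttsx3  # third-party: pip install pyttsx3

    engine = pyttsx3.init()  # initialize the text-to-speech engine
    engine.say(text)         # queue the sentence
    engine.runAndWait()      # speak it and block until finished


message = greeting("David")
print(message)  # hello, David

# On a laptop with speakers, as in the lecture:
# speak(greeting(input("What's your name? ")))
```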
02:12:36.130 --> 02:12:39.310 In addition to synthesizing speech in this way, 02:12:39.310 --> 02:12:43.270 we could synthesize, for instance, an actual graphic. 02:12:43.270 --> 02:12:45.740 Let me go ahead, now, and do something like this. 02:12:45.740 --> 02:12:48.760 Let me create a program called qr.py. 02:12:48.760 --> 02:12:50.890 I'm going to go ahead and import a library called 02:12:50.890 --> 02:12:54.860 OS, which gives you access to operating system related functionality in Python. 02:12:54.860 --> 02:12:56.860 I'm going to import a library I've pre-installed 02:12:56.860 --> 02:12:59.830 called qrcode, which is a two-dimensional barcode that you 02:12:59.830 --> 02:13:01.300 might have seen in the real world. 02:13:01.300 --> 02:13:03.715 I'm going to go ahead and create an image variable using 02:13:03.715 --> 02:13:08.260 this qrcode library's make function, which, per its documentation, 02:13:08.260 --> 02:13:10.365 takes a URL, like one of CS50's own videos. 02:13:10.365 --> 02:13:23.003 So we'll do this with youtu.be/xvF2joSPgG0. 02:13:23.003 --> 02:13:24.670 So, hopefully, that's the right lecture. 02:13:24.670 --> 02:13:27.160 And now, we've got img.save, which is going to allow 02:13:27.160 --> 02:13:30.130 me to create a file called qr.png. 02:13:30.130 --> 02:13:33.460 Think back, now, on problem set 4 and how painful it was to save files. 02:13:33.460 --> 02:13:36.940 We'll just use the save function, now, in Python and save this as a PNG file-- 02:13:36.940 --> 02:13:38.260 Portable Network Graphic. 02:13:38.260 --> 02:13:43.420 And then, lastly, let's just go ahead and open with the command open qr.png 02:13:43.420 --> 02:13:46.120 on my Mac so that, hopefully, this just automatically opens. 02:13:46.120 --> 02:13:46.660 All right. 02:13:46.660 --> 02:13:49.300 I'm going to go ahead and just double-check my syntax here 02:13:49.300 --> 02:13:51.280 so that I haven't made any mistakes.
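NOTE A sketch of qr.py, assuming the third-party qrcode package is installed. The helper names save_qr and show are hypothetical, the https:// prefix on the URL is an assumption (only youtu.be/xvF2joSPgG0 is read aloud), and the open command is specific to macOS. The calls that create and open the image are commented out so the example does not write files or launch a viewer when run.

```python
import os

# Address read aloud in the lecture; the https:// scheme is assumed.
URL = "https://youtu.be/xvF2joSPgG0"


def save_qr(url, path="qr.png"):
    """Encode url as a QR code and save it as an image file."""
    import qrcode  # third-party: pip install qrcode

    img = qrcode.make(url)
    img.save(path)
    return path


def show(path):
    """Open the image with the default viewer (macOS 'open' command)."""
    os.system(f"open {path}")


# On a Mac with qrcode installed:
# show(save_qr(URL))
```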
02:13:51.280 --> 02:13:54.235 I'm going to go ahead and run python of qr.py. 02:13:54.235 --> 02:13:55.810 Enter. 02:13:55.810 --> 02:13:57.223 That opens up this. 02:13:57.223 --> 02:13:58.390 Let me go ahead and zoom in. 02:13:58.390 --> 02:14:03.750 If you've got a phone handy and you'd like to scan this code here, 02:14:03.750 --> 02:14:07.131 whether in person or online-- 02:14:07.131 --> 02:14:08.095 I apologize. 02:14:08.095 --> 02:14:09.130 You won't appreciate it. 02:14:11.640 --> 02:14:12.140 Amazing! 02:14:12.140 --> 02:14:13.600 OK. 02:14:13.600 --> 02:14:17.230 And, lastly, let me go back into our speech example 02:14:17.230 --> 02:14:21.400 here, create a final ending here in our final moments. 02:14:21.400 --> 02:14:26.060 And how about we just say something like "This was CS50," like this. 02:14:26.060 --> 02:14:27.087 Let's go ahead, here. 02:14:27.087 --> 02:14:28.795 Fix my capitalization, just for tidiness. 02:14:28.795 --> 02:14:29.878 Let's get rid of the name. 02:14:29.878 --> 02:14:33.840 And now, with our final flourish and your introduction to Python equipped-- 02:14:33.840 --> 02:14:35.230 here we go-- 02:14:35.230 --> 02:14:36.535 INTERPRETER: This was CS50. 02:14:36.535 --> 02:14:37.000 DAVID MALAN: All right. 02:14:37.000 --> 02:14:38.000 We'll see you next time. 02:14:38.000 --> 02:14:39.460 [APPLAUSE] 02:14:41.860 --> 02:14:45.210 [MUSIC PLAYING]