WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:06.461 [MUSIC PLAYING] 00:00:48.562 --> 00:00:49.520 DAVID MALAN: All right. 00:00:49.520 --> 00:00:52.400 This is CS50, and this is week 6. 00:00:52.400 --> 00:00:55.950 And this is, again, one of those rare days where in just a bit of time 00:00:55.950 --> 00:00:58.338 you'll be able to say that you learned a new language. 00:00:58.338 --> 00:01:01.130 And that language today is going to be this language called Python. 00:01:01.130 --> 00:01:04.430 And we thought we'd begin by introducing Python 00:01:04.430 --> 00:01:06.110 by way of some more familiar friends. 00:01:06.110 --> 00:01:08.748 So this, of course, is where we began the course back in week 0 00:01:08.748 --> 00:01:10.790 when we introduced Scratch, a simple program that 00:01:10.790 --> 00:01:12.860 quite simply says "hello, world." 00:01:12.860 --> 00:01:16.100 And then very quickly, things escalated and became 00:01:16.100 --> 00:01:19.525 a lot more cryptic, a lot more arcane, and we introduced C and syntax 00:01:19.525 --> 00:01:21.650 like this, which of course does the exact same thing, 00:01:21.650 --> 00:01:25.430 just printing out "hello, world" on the screen, but with the requirement 00:01:25.430 --> 00:01:28.880 that you understand and you include all of this various syntax. 00:01:28.880 --> 00:01:34.520 So today, all of this complexity, all of the syntax from C, 00:01:34.520 --> 00:01:38.780 suddenly begins to melt away, such that we're 00:01:38.780 --> 00:01:40.880 left with this new language called Python 00:01:40.880 --> 00:01:44.530 that's going to achieve the exact same goal simply with this line of code 00:01:44.530 --> 00:01:45.030 here. 00:01:45.030 --> 00:01:48.440 Which is to say that Python tends to be more accessible, 00:01:48.440 --> 00:01:50.270 it tends to be a little easier.
00:01:50.270 --> 00:01:53.750 But that's because it's built on this tradition of having started, 00:01:53.750 --> 00:01:57.290 as humans years ago, building these low-level languages like C, 00:01:57.290 --> 00:02:00.240 realizing what features are missing, what some of the pain points are, 00:02:00.240 --> 00:02:04.340 and then layering on top of those older languages new ideas, new features, 00:02:04.340 --> 00:02:05.730 and in turn new languages. 00:02:05.730 --> 00:02:09.210 So there are dozens, hundreds really, of programming languages out there. 00:02:09.210 --> 00:02:13.400 But there's always a subset of them that tend to be very popular, very in vogue 00:02:13.400 --> 00:02:14.210 at any given time. 00:02:14.210 --> 00:02:17.090 Python is among those very popular languages. 00:02:17.090 --> 00:02:20.270 And it's the third of our languages that we'll look at, indeed, 00:02:20.270 --> 00:02:22.290 at this point in the term. 00:02:22.290 --> 00:02:25.070 So let's go ahead and introduce some of the syntax of Python, 00:02:25.070 --> 00:02:28.280 really by way of comparison with what we've seen in the past. 00:02:28.280 --> 00:02:31.130 Because no matter how new some of today's topics are, 00:02:31.130 --> 00:02:34.760 they should all be familiar in the sense that we're going to see loops again, 00:02:34.760 --> 00:02:38.540 conditions, variables, functions, return values. 00:02:38.540 --> 00:02:41.810 There's pretty much just going to be a translation of features 00:02:41.810 --> 00:02:43.378 past to now features present. 00:02:43.378 --> 00:02:45.170 So this of course, in the world of Scratch, 00:02:45.170 --> 00:02:48.380 was just one puzzle piece or a function, whose purpose in life 00:02:48.380 --> 00:02:50.300 is to say "hello, world" on the screen. 
00:02:50.300 --> 00:02:54.320 In week 1, we translated this to the more cryptic syntax here, 00:02:54.320 --> 00:02:58.850 key details being that it's printf, that you have the quote, the string, 00:02:58.850 --> 00:03:03.630 "hello, world," you have this backslash n to represent a new line character. 00:03:03.630 --> 00:03:06.800 And then of course, this kind of statement has to end with a semicolon. 00:03:06.800 --> 00:03:10.310 The equivalent line of code from today on out in this language 00:03:10.310 --> 00:03:12.930 called Python is going to be quite simply this. 00:03:12.930 --> 00:03:16.850 So it looks similar, certainly, but it's now print instead of printf. 00:03:16.850 --> 00:03:21.620 We still have the double quotes, but gone are the backslash n as well as 00:03:21.620 --> 00:03:22.470 the semicolon. 00:03:22.470 --> 00:03:25.303 So if you've been kicking yourself all too frequently for forgetting 00:03:25.303 --> 00:03:29.150 stupid things like the semicolons, Python will now be your friend. 00:03:29.150 --> 00:03:31.160 Well, let's take a look at another example 00:03:31.160 --> 00:03:35.150 here, how we might go about getting user input as well. 00:03:35.150 --> 00:03:38.360 Well, here notice that we have a puzzle piece called Ask. 00:03:38.360 --> 00:03:40.610 And it says, ask "What's your name?" and wait. 00:03:40.610 --> 00:03:43.580 And the next puzzle piece said, whatever the human had typed in, 00:03:43.580 --> 00:03:45.290 precede it with the word "hello." 00:03:45.290 --> 00:03:47.960 In C we saw code like this-- string answer 00:03:47.960 --> 00:03:50.360 equals get_string "What's your name?" 00:03:50.360 --> 00:03:54.080 and then printing out with printf, "hello %s," 00:03:54.080 --> 00:03:56.810 plugging in one value for the other. 00:03:56.810 --> 00:04:00.230 In Python, some of this complexity is about to melt away, too. 00:04:00.230 --> 00:04:03.690 And in Python, we're going to see a little something like this.
00:04:03.690 --> 00:04:07.100 So no longer present is the mention of the type of variable. 00:04:07.100 --> 00:04:10.110 No longer present is the semicolon at the end. 00:04:10.110 --> 00:04:15.102 And no longer present is the %s and that additional argument to print. 00:04:15.102 --> 00:04:17.519 So in fact, let's go ahead and see these things in action. 00:04:17.519 --> 00:04:21.320 I'm going to go ahead and go over to CS50 IDE here for just a moment. 00:04:21.320 --> 00:04:24.830 And within CS50 IDE, I'm going to go ahead and write 00:04:24.830 --> 00:04:26.930 my very first Python program. 00:04:26.930 --> 00:04:30.140 And to do that, I'm going to go ahead and create a file that we'll initially 00:04:30.140 --> 00:04:31.730 call hello.py. 00:04:31.730 --> 00:04:35.870 Much like in the world of C, Python programs have a standard file extension 00:04:35.870 --> 00:04:38.270 being .py instead of .c. 00:04:38.270 --> 00:04:41.390 And I'm just going to do what I proposed was the simplest translation. 00:04:41.390 --> 00:04:45.170 I'm just going to go ahead and say print, "hello, world." 00:04:45.170 --> 00:04:46.430 I'm going to save my file. 00:04:46.430 --> 00:04:48.110 And then I'm going to go down to my terminal window. 00:04:48.110 --> 00:04:50.420 And in the past, of course, we would have used make, 00:04:50.420 --> 00:04:54.420 and then we would have done ./hello or the like. 00:04:54.420 --> 00:04:58.940 But today, I'm quite simply going to run a command that itself is called python. 00:04:58.940 --> 00:05:01.430 I'm going to pass in the name of the file I just 00:05:01.430 --> 00:05:03.620 created as its command line argument. 00:05:03.620 --> 00:05:08.630 And voila, hitting Enter, there is my very first program in Python. 00:05:08.630 --> 00:05:09.960 So that's pretty powerful. 00:05:09.960 --> 00:05:14.398 Let's go ahead and create the second program that I proposed a moment ago.
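A minimal sketch of the hello.py file described above, assuming Python 3 (where print is a function). The whole program is a single line: there is no main function, no #include, no backslash n, and no semicolon, because print appends a newline on its own.

```python
# hello.py
# Run from a terminal with: python hello.py
print("hello, world")
```

If you ever need to suppress that automatic newline, print accepts an end parameter, for instance print("hello, world", end="").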
00:05:14.398 --> 00:05:16.190 Instead of just printing out "hello, world" 00:05:16.190 --> 00:05:18.770 the whole time, I'm also going to go ahead this time 00:05:18.770 --> 00:05:22.070 and give myself a variable that I'll call answer. 00:05:22.070 --> 00:05:25.650 I'm going to go ahead now and get input from the user. 00:05:25.650 --> 00:05:28.025 And I'm going to go ahead and use the familiar get_string 00:05:28.025 --> 00:05:29.540 that we did see in C. 00:05:29.540 --> 00:05:33.080 I'm going to go ahead and ask, "What's your name" question mark. 00:05:33.080 --> 00:05:35.370 I'm not going to bother with a semicolon. 00:05:35.370 --> 00:05:40.160 But down here, I'm going to go ahead and say print "hello," comma, and then 00:05:40.160 --> 00:05:41.870 a space inside of the quotes. 00:05:41.870 --> 00:05:46.340 And instead of doing something like %s, I'm actually going to go ahead and just 00:05:46.340 --> 00:05:50.510 do a plus operator, and then literally the word "answer." 00:05:50.510 --> 00:05:53.527 But the catch is that this isn't going to work just yet. 00:05:53.527 --> 00:05:56.360 This isn't going to work just yet, because get_string, it turns out, 00:05:56.360 --> 00:05:59.357 just like it doesn't come with C, it also doesn't come with Python. 00:05:59.357 --> 00:06:01.190 So I need to do one thing that's going to be 00:06:01.190 --> 00:06:02.732 a little bit different from the past. 00:06:02.732 --> 00:06:05.870 Instead of hash including something, I'm going to literally say 00:06:05.870 --> 00:06:09.110 from cs50 import get_string. 00:06:09.110 --> 00:06:11.310 So in the world of C, recall that we included 00:06:11.310 --> 00:06:15.680 cs50.h, which had declarations for functions like get_string and get_int 00:06:15.680 --> 00:06:16.430 and so forth. 00:06:16.430 --> 00:06:19.350 In the world of Python, we're going to show you something similar in spirit, 00:06:19.350 --> 00:06:21.100 but the syntax is just a little different. 
00:06:21.100 --> 00:06:25.460 We're going to say from cs50, which is our Python library that we the staff 00:06:25.460 --> 00:06:28.880 wrote, import, that is, include a function specifically 00:06:28.880 --> 00:06:29.915 called get_string. 00:06:29.915 --> 00:06:32.870 And now any errors that I might have seen a moment ago on the screen 00:06:32.870 --> 00:06:33.770 have disappeared. 00:06:33.770 --> 00:06:39.590 If I go ahead and save this file and now do python space hello.py and hit Enter, 00:06:39.590 --> 00:06:44.090 now I can go ahead and type in my actual name, and voila, I see "hello," comma, 00:06:44.090 --> 00:06:44.630 "David." 00:06:44.630 --> 00:06:47.120 So let's tease apart what's different about this code 00:06:47.120 --> 00:06:50.040 and consider what more we can do after this. 00:06:50.040 --> 00:06:53.870 So again, notice-- on line 3, there's no mention of string anymore. 00:06:53.870 --> 00:06:56.150 If I want a variable, I just go ahead and give myself 00:06:56.150 --> 00:06:57.620 a variable called answer. 00:06:57.620 --> 00:07:01.010 The function is still called get_string, and it still takes an argument just 00:07:01.010 --> 00:07:04.520 like the C version, but the line no longer ends with a semicolon. 00:07:04.520 --> 00:07:09.020 On my final line of code here, print is now indeed print instead of printf. 00:07:09.020 --> 00:07:10.505 And then this is new syntax. 00:07:10.505 --> 00:07:13.130 But in some sense, it's going to be a lot more straightforward. 00:07:13.130 --> 00:07:17.330 Instead of having to think in advance where I want the %s and my placeholder, 00:07:17.330 --> 00:07:20.750 this plus operator seems to be doing something for me. 00:07:20.750 --> 00:07:23.210 And let me go ahead and ask a question of the group here. 00:07:23.210 --> 00:07:26.300 What does that plus operator seem to be doing? 00:07:26.300 --> 00:07:29.840 Because it's not addition in the arithmetic sense. 
00:07:29.840 --> 00:07:32.180 We're not like adding numbers together. 00:07:32.180 --> 00:07:35.790 But the plus is clearly doing something that gives us a visual result. 00:07:35.790 --> 00:07:37.850 Any thoughts from Peter? 00:07:37.850 --> 00:07:39.080 What's this plus doing? 00:07:39.080 --> 00:07:40.973 AUDIENCE: It's concatenating strings. 00:07:40.973 --> 00:07:42.890 DAVID MALAN: Yeah, it's concatenating strings, 00:07:42.890 --> 00:07:47.220 which is the term of art to describe the joining of one string and the other. 00:07:47.220 --> 00:07:50.240 So it's quite like, therefore, Scratch's own Join block. 00:07:50.240 --> 00:07:53.360 We now have a literal translation of that Join block, 00:07:53.360 --> 00:07:57.458 which we didn't have in C. In C we had to use printf, we had to use %s. 00:07:57.458 --> 00:07:59.750 Python is going to be a little more user friendly, such 00:07:59.750 --> 00:08:01.833 that if you want to join two strings like "hello," 00:08:01.833 --> 00:08:04.490 comma, space, and the contents of that variable, 00:08:04.490 --> 00:08:06.770 we can just use this plus operator instead. 00:08:06.770 --> 00:08:09.380 And the last thing that we had to do was, of course, 00:08:09.380 --> 00:08:11.870 import this library so that we have access 00:08:11.870 --> 00:08:13.493 to the get_string function itself. 00:08:13.493 --> 00:08:16.160 Well, let's go ahead and take a tour of just some other features 00:08:16.160 --> 00:08:20.730 of Python and then dive in primarily to a lot of hands-on examples today. 00:08:20.730 --> 00:08:23.850 So recall that in the example we just saw, 00:08:23.850 --> 00:08:26.600 we had this first line of code, which gets a string from the user, 00:08:26.600 --> 00:08:29.000 stores it in a variable called answer. 00:08:29.000 --> 00:08:31.250 We had this second line of code, which as Peter notes, 00:08:31.250 --> 00:08:33.500 concatenated two values together. 
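The concatenation idea above can be sketched as a small function. To keep it runnable without the CS50 library installed, this version takes the name as a parameter; greet is a hypothetical helper for illustration only. In the lecture the name comes from the CS50 library's get_string, and Python's built-in input() is a standard-library alternative.

```python
def greet(name):
    # "+" between two strings concatenates them, much like Scratch's Join block
    return "hello, " + name


print(greet("David"))  # hello, David

# Interactive version that reads from the keyboard:
#   from cs50 import get_string      # requires the CS50 library
#   answer = get_string("What's your name? ")
# or, with only the standard library:
#   answer = input("What's your name? ")
#   print("hello, " + answer)
```

Note that + only joins a string with another string: "hello, " + 50 raises a TypeError, so a number would first need to be converted with str().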
00:08:33.500 --> 00:08:38.360 But it turns out, even though this is definitely more convenient than in C 00:08:38.360 --> 00:08:40.700 in that you can just take an existing string and another 00:08:40.700 --> 00:08:44.480 and join them together without having to use format strings or the like, 00:08:44.480 --> 00:08:47.150 well, it turns out there's another way, there are frankly many 00:08:47.150 --> 00:08:50.060 ways in languages like Python to achieve the same result. 00:08:50.060 --> 00:08:53.540 And I'm going to go ahead and propose that we now change this line here 00:08:53.540 --> 00:08:55.160 to this funky syntax. 00:08:55.160 --> 00:08:57.710 So definitely ugly at first glance, and that's 00:08:57.710 --> 00:09:01.100 partly because this is a relatively new feature of Python. 00:09:01.100 --> 00:09:06.170 But notice that in Python we can use these curly braces, the same curly braces 00:09:06.170 --> 00:09:11.280 that we have used in C, to plug in an actual value of a variable here. 00:09:11.280 --> 00:09:15.980 So instead of %s, Python's print function uses these curly braces that 00:09:15.980 --> 00:09:18.950 essentially say, plug in a value here. 00:09:18.950 --> 00:09:20.660 But there's one oddity here. 00:09:20.660 --> 00:09:25.220 You can't just start putting curly braces and variable names into strings, 00:09:25.220 --> 00:09:27.230 that is quoted strings in Python. 00:09:27.230 --> 00:09:32.750 You also have to tell the language that what follows is a formatted string. 00:09:32.750 --> 00:09:35.060 So this is perhaps the weirdest thing we've seen yet.
00:09:35.060 --> 00:09:36.950 But when you do have a pair of double quotes 00:09:36.950 --> 00:09:41.240 like I have here, prefixing it with an f will actually 00:09:41.240 --> 00:09:44.060 tell the computer to format the contents of that string, 00:09:44.060 --> 00:09:47.150 plugging in values between those curly braces, as opposed to 00:09:47.150 --> 00:09:50.540 literally printing those curly braces themselves. 00:09:50.540 --> 00:09:54.740 So let me go ahead and transition to my actual code here and try this out. 00:09:54.740 --> 00:09:58.490 Instead of using the concatenation operator as Peter described it, 00:09:58.490 --> 00:10:01.070 this plus operator, let me literally go ahead 00:10:01.070 --> 00:10:04.860 and say, "hello, answer," initially. 00:10:04.860 --> 00:10:07.580 So this is probably not going to be the right approach, 00:10:07.580 --> 00:10:10.760 because if I rerun this program, python of hello.py, 00:10:10.760 --> 00:10:12.260 it's going to ask me what's my name. 00:10:12.260 --> 00:10:14.093 I'm going to type in "David," and it's going 00:10:14.093 --> 00:10:18.140 to ignore me altogether, because I literally hardcoded "hello, answer." 00:10:18.140 --> 00:10:20.990 But it's also not going to be quite right to just start 00:10:20.990 --> 00:10:25.520 putting that in curly braces, because if I again run this program, python 00:10:25.520 --> 00:10:28.190 of hello.py, and type in my name, now it's going 00:10:28.190 --> 00:10:31.350 to say "hello, squiggly brace answer." 00:10:31.350 --> 00:10:33.620 So here is just a subtle change where I have 00:10:33.620 --> 00:10:38.390 to tell Python that this type of string between the double quotes is in fact 00:10:38.390 --> 00:10:39.830 a formatted string. 00:10:39.830 --> 00:10:43.370 And now if I rerun python of hello.py and type in "David," 00:10:43.370 --> 00:10:45.347 I now get "hello, David."
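The three attempts from the demo can be reproduced side by side. This sketch hardcodes answer = "David" in place of typed input. Note that the f prefix belongs to the string literal itself (f-strings were added in Python 3.6), so it works anywhere a string is used, not only inside print.

```python
answer = "David"  # stand-in for whatever the user typed

literal = "hello, answer"        # no braces: the word "answer" is kept as-is
braces_only = "hello, {answer}"  # braces but no f prefix: still kept as-is
formatted = f"hello, {answer}"   # f prefix: {answer} is replaced by its value

print(literal)      # hello, answer
print(braces_only)  # hello, {answer}
print(formatted)    # hello, David
```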
00:10:45.347 --> 00:10:47.930 So it's marginally more convenient than C, because, again, you 00:10:47.930 --> 00:10:50.722 don't have to have a placeholder here, a placeholder here, and then 00:10:50.722 --> 00:10:52.730 a comma-separated list of additional arguments. 00:10:52.730 --> 00:10:55.500 So it's just a more succinct way, if you will, 00:10:55.500 --> 00:10:59.900 to actually introduce more values into a string that you want to create. 00:10:59.900 --> 00:11:03.470 These are called format strings, or f-strings for short. 00:11:03.470 --> 00:11:06.740 And it's a new feature that we now have in our toolkit when programming 00:11:06.740 --> 00:11:08.540 with this new language called Python. 00:11:08.540 --> 00:11:11.660 Well, let's take a look at a few other translations of puzzle pieces 00:11:11.660 --> 00:11:13.760 to see, and then turn to Python and then start 00:11:13.760 --> 00:11:16.050 building some programs of our own. 00:11:16.050 --> 00:11:20.000 So here in Scratch, this was an example early on of a variable 00:11:20.000 --> 00:11:22.640 called counter, initializing it to 0. 00:11:22.640 --> 00:11:26.990 In C, in week 1, we started translating that to code like this-- int counter 00:11:26.990 --> 00:11:28.910 equals 0 semicolon. 00:11:28.910 --> 00:11:33.020 And that gave us a variable of type int whose initial value was 0. 00:11:33.020 --> 00:11:35.690 In Python, the code is going to be similar-- 00:11:35.690 --> 00:11:39.590 similar, but it's going to be a little simpler still. 00:11:39.590 --> 00:11:44.030 Notice that I don't have to in Python mention the type of variable I want. 00:11:44.030 --> 00:11:46.520 It will infer from context what it is. 00:11:46.520 --> 00:11:48.930 And I also don't have to have the semicolon there. 00:11:48.930 --> 00:11:53.750 So counter equals 0 in Python is going to give you a variable called counter.
00:11:53.750 --> 00:11:57.410 And because you're assigning it the value 0, Python itself 00:11:57.410 --> 00:11:59.780 the language will infer that, oh, you must 00:11:59.780 --> 00:12:02.510 mean this to be an int or an integer. 00:12:02.510 --> 00:12:04.010 What else did we see in Scratch? 00:12:04.010 --> 00:12:05.540 Change counter by 1. 00:12:05.540 --> 00:12:08.780 So this was a way of increasing the value of a variable by 1. 00:12:08.780 --> 00:12:11.600 In C, we had a few different ways to implement this. 00:12:11.600 --> 00:12:14.360 We could say counter equals counter plus 1. 00:12:14.360 --> 00:12:17.160 It's kind of pedantic, it's kind of long and tedious to type. 00:12:17.160 --> 00:12:19.610 So instead, we had some shorthand notation that 00:12:19.610 --> 00:12:23.140 allowed us to do it this way instead. 00:12:23.140 --> 00:12:27.200 In C, we were able to do counter plus equals 1, 00:12:27.200 --> 00:12:29.850 and that was going to achieve the same result. 00:12:29.850 --> 00:12:32.940 Well, in Python we actually have a couple of approaches as well. 00:12:32.940 --> 00:12:37.130 We can, much like in C, say it explicitly like this 00:12:37.130 --> 00:12:38.700 but just omit the semicolon. 00:12:38.700 --> 00:12:40.730 So counter equals counter plus 1. 00:12:40.730 --> 00:12:44.420 The logic in Python is exactly the same as in C. 00:12:44.420 --> 00:12:48.800 And as for this shorthand notation, this also exists in Python, again 00:12:48.800 --> 00:12:50.150 without the semicolon. 00:12:50.150 --> 00:12:55.310 The one thing that does not exist in Python at this point in the story is 00:12:55.310 --> 00:13:01.850 that fancy counter++ syntax, or i++, that syntactic sugar that made it even 00:13:01.850 --> 00:13:04.040 more succinct to just increment a variable, 00:13:04.040 --> 00:13:06.710 unfortunately does not exist in Python. 00:13:06.710 --> 00:13:12.170 But you can do counter plus equals 1, or whatever your variable happens to be. 
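The counter examples above translate to Python as follows. This is a small sketch; the only addition beyond the lecture is the type() call, used here just to show that Python inferred int.

```python
counter = 0            # no type declared; Python infers int from the value 0
print(type(counter))   # <class 'int'>

counter = counter + 1  # long form, same logic as in C
counter += 1           # shorthand, also valid in Python
# counter++            # not valid Python; this line would be a SyntaxError

print(counter)         # 2
```

One trap worth knowing: ++counter is syntactically valid, because Python parses it as two unary plus signs, so it runs without error but leaves counter unchanged.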
00:12:12.170 --> 00:12:14.420 Well, what else did we see in Scratch and then C? 00:12:14.420 --> 00:12:15.360 Recall this. 00:12:15.360 --> 00:12:18.290 We introduced, of course, conditions pretty early on. 00:12:18.290 --> 00:12:20.300 And those conditions use Boolean expressions 00:12:20.300 --> 00:12:23.570 to decide whether to do this, or this other thing, or something else 00:12:23.570 --> 00:12:24.410 altogether. 00:12:24.410 --> 00:12:28.850 In C, we converted this to what looked kind of similar. 00:12:28.850 --> 00:12:32.300 Indeed, the curly braces kind of hug the printf line, just 00:12:32.300 --> 00:12:36.020 like the yellow condition here hugs the purple Say block. 00:12:36.020 --> 00:12:41.060 And we had parentheses around the Boolean expression, like x less than y. 00:12:41.060 --> 00:12:43.730 We again used printf inside of the curly braces 00:12:43.730 --> 00:12:48.500 which had double quotes, a backslash n for a new line, and a semicolon. 00:12:48.500 --> 00:12:52.130 Python, nicely enough, is going to be sort of identical in spirit 00:12:52.130 --> 00:12:54.080 but simpler syntactically. 00:12:54.080 --> 00:12:57.930 What Python is going to look like henceforth is just this. 00:12:57.930 --> 00:13:01.700 So the parentheses around the x less than y go away. 00:13:01.700 --> 00:13:03.980 The curly braces go away. 00:13:03.980 --> 00:13:05.540 The new line goes away. 00:13:05.540 --> 00:13:07.550 And the semicolon goes away. 00:13:07.550 --> 00:13:11.510 And here you see just a tiny example of the evolution of humans' programming 00:13:11.510 --> 00:13:12.440 languages.
00:14:12.440 --> 00:14:14.900 If you and I have been frustrated for some time about all 00:14:14.900 --> 00:14:17.630 the stupid semicolons and curly braces all over the place, 00:14:17.630 --> 00:14:20.040 it makes it harder, in some sense, for your code to read, 00:14:20.040 --> 00:14:23.468 let alone being correct, humans decided when inventing new languages 00:14:23.468 --> 00:14:25.760 that, you know what, why don't we just say what we mean 00:14:25.760 --> 00:14:29.330 and not worry as much about all of this syntactic complexity? 00:14:29.330 --> 00:14:30.560 Let's keep things simpler. 00:14:30.560 --> 00:14:34.250 And indeed, that's what we see here, is one example in Python. 00:14:34.250 --> 00:14:35.930 But there's a key detail. 00:14:35.930 --> 00:14:37.940 If any of you have been in the habit, when 00:14:37.940 --> 00:14:41.630 writing code in C, of being a little sloppy when it comes 00:14:41.630 --> 00:14:44.900 to your indentation, and maybe style50 is constantly 00:14:44.900 --> 00:14:49.010 yelling at you to add spaces, add spaces, or remove spaces or lines, 00:14:49.010 --> 00:14:55.100 well, in Python it is now necessary to indent your code correctly. 00:14:55.100 --> 00:14:58.820 In C, of course, we, CS50 and a lot of the world in general 00:14:58.820 --> 00:15:03.260 recommend that you indent your code by 4 spaces, typically, or one tab. 00:15:03.260 --> 00:15:06.590 In the context of Python, you must do so. 00:15:06.590 --> 00:15:11.420 If you accidentally omit these spaces just to the left of the print statement 00:15:11.420 --> 00:15:14.480 here, your Python code is not going to run at all. 00:15:14.480 --> 00:15:17.400 The Python program just won't work. 00:15:17.400 --> 00:15:19.257 So no more sloppiness. 00:15:19.257 --> 00:15:20.840 Python is going to impose this on you. 00:15:20.840 --> 00:15:24.152 But the upside is you don't have to bother including the curly braces. 
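A sketch of the single-branch condition, plus a demonstration of the indentation rule. It uses the built-in compile() to check an unindented version without running it, so the script itself does not crash; the sample values of x and y are arbitrary assumptions.

```python
x, y = 1, 2  # arbitrary sample values

# No parentheses, no curly braces: a colon ends the condition,
# and the indented line beneath it is the body of the if.
if x < y:
    print("x is less than y")

# Without that indentation, Python refuses to run the code at all.
unindented = "if x < y:\nprint('x is less than y')\n"
try:
    compile(unindented, "<example>", "exec")
    rejected = False
except IndentationError:
    rejected = True

print("rejected:", rejected)  # rejected: True
```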
00:15:24.152 --> 00:15:26.360 What about a more complicated condition where there are 00:15:26.360 --> 00:15:30.470 two paths you can follow, if or else? 00:15:30.470 --> 00:15:34.460 Well, in this case in C, we translated it pretty straightforwardly like this. 00:15:34.460 --> 00:15:38.960 Again, parentheses up here, curly braces here and here, backslash n, 00:15:38.960 --> 00:15:40.640 backslash n, and semicolons. 00:15:40.640 --> 00:15:42.440 You can perhaps guess in Python that this 00:15:42.440 --> 00:15:45.140 is going to get a little more compact, because boom, 00:15:45.140 --> 00:15:47.700 now we don't need the parentheses anymore. 00:15:47.700 --> 00:15:50.380 We do need to indent, but we don't need the curly braces. 00:15:50.380 --> 00:15:53.450 We don't need the new line, and we don't need the semicolon. 00:15:53.450 --> 00:15:57.670 So we're sort of shedding features that can be taken now for granted. 00:15:57.670 --> 00:16:01.430 What about this example in Scratch when we had a three-way fork in the road, 00:16:01.430 --> 00:16:03.790 if, else, if, else? 00:16:03.790 --> 00:16:07.960 Well, in Python-- or rather in C, we would have translated this like this. 00:16:07.960 --> 00:16:09.560 And there's not much going on there. 00:16:09.560 --> 00:16:13.090 But it's a pretty substantive number of lines of code, some 12 lines, 00:16:13.090 --> 00:16:14.980 just to achieve this simple idea. 00:16:14.980 --> 00:16:17.800 In Python, notice what's going to go away here 00:16:17.800 --> 00:16:22.345 is, again those parentheses, again those curly braces, again the backslash n, 00:16:22.345 --> 00:16:24.340 and the semicolon. 00:16:24.340 --> 00:16:26.890 There's only one oddity here. 00:16:26.890 --> 00:16:28.120 There's only one oddity. 00:16:28.120 --> 00:16:31.120 What looks wrong or weird to you? 00:16:31.120 --> 00:16:34.720 Maybe, what looks like a typo to you? 00:16:34.720 --> 00:16:38.550 And I promise I haven't screwed up here.
00:16:38.550 --> 00:16:40.620 Maybe elsewhere, but not here. 00:16:40.620 --> 00:16:42.130 Andrew? 00:16:42.130 --> 00:16:46.525 AUDIENCE: I would say the elif instead of else if is different syntactically. 00:16:46.525 --> 00:16:47.400 DAVID MALAN: Exactly. 00:16:47.400 --> 00:16:53.100 So whereas in C we would literally say else if, in Python, humans years ago, 00:16:53.100 --> 00:16:57.960 decided, heck, why say else if and waste all of that time typing that out if you 00:16:57.960 --> 00:17:02.640 can more succinctly say "elif" as one word, E-L-I-F. So indeed, 00:17:02.640 --> 00:17:04.020 this is correct syntax here. 00:17:04.020 --> 00:17:05.312 And you can have more of those. 00:17:05.312 --> 00:17:10.440 You can have four forks in the road, five, six, any number thereafter. 00:17:10.440 --> 00:17:12.425 But the syntax is indeed a little different. 00:17:12.425 --> 00:17:13.800 But it's a little tighter, right? 00:17:13.800 --> 00:17:17.369 There's less syntactic distraction when you glance at this code. 00:17:17.369 --> 00:17:19.829 You don't have to ignore as many semicolons and curly 00:17:19.829 --> 00:17:21.240 braces and the like. 00:17:21.240 --> 00:17:23.807 Python tends to just be a little cleaner syntactically. 00:17:23.807 --> 00:17:25.890 And indeed, that's characteristic of a lot of more 00:17:25.890 --> 00:17:28.260 recent, more modern languages like it. 00:17:28.260 --> 00:17:31.770 All right, let's take a look at a few other blocks in Scratch and in turn C. 00:17:31.770 --> 00:17:34.890 In Scratch, when we wanted to do something again and again as a loop, 00:17:34.890 --> 00:17:38.040 perhaps forever, we would literally use the Forever block. 00:17:38.040 --> 00:17:41.460 In C, we could implement this in a few different ways. 00:17:41.460 --> 00:17:46.230 And we proposed quite simply this one-- while true print out "hello, world," 00:17:46.230 --> 00:17:47.940 again and again and again. 
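A sketch of the three-way fork in Python. Wrapping it in a function (the hypothetical compare) makes it easy to try all three branches; the messages are assumed to mirror the Scratch example.

```python
def compare(x, y):
    # Each condition ends with a colon; its body is the indented block beneath it.
    if x < y:
        return "x is less than y"
    elif x > y:  # "elif" is Python's spelling of C's "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))  # x is less than y
print(compare(2, 1))  # x is greater than y
print(compare(2, 2))  # x is equal to y
```

The final branch is a plain else rather than elif x == y: once x is neither less than nor greater than y, equality is the only case left, so a third comparison would be redundant.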
00:17:47.940 --> 00:17:50.200 And because the Boolean expression never changes, 00:17:50.200 --> 00:17:51.960 it's going to indeed execute forever. 00:17:51.960 --> 00:17:54.180 So Python is actually pretty similar, but there 00:17:54.180 --> 00:17:55.990 are a couple of subtle differences. 00:17:55.990 --> 00:17:58.350 So ingrain in your mind what this looks like here. 00:17:58.350 --> 00:18:03.420 We have true in parentheses, the curly braces, the new line, the semicolon. 00:18:03.420 --> 00:18:06.090 A lot of that's about to go away, but there's still 00:18:06.090 --> 00:18:07.440 going to be a slight difference. 00:18:07.440 --> 00:18:11.070 Notice that we're indenting, as I keep emphasizing. 00:18:11.070 --> 00:18:14.520 We no longer have the new line or the semicolon or the curly braces, 00:18:14.520 --> 00:18:15.720 but True-- 00:18:15.720 --> 00:18:17.280 and it turns out, False-- 00:18:17.280 --> 00:18:18.910 now must be capitalized. 00:18:18.910 --> 00:18:23.610 So whereas in C it was lowercase false, lowercase true, in Python 00:18:23.610 --> 00:18:26.280 it's going to be capitalized False, capitalized True. 00:18:26.280 --> 00:18:27.000 Why? 00:18:27.000 --> 00:18:28.140 Just because. 00:18:28.140 --> 00:18:32.550 But there is one other detail that's important to note, both with our loops 00:18:32.550 --> 00:18:35.010 here, as well as with our conditions.
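A sketch of the forever loop with Python's capitalized True. A truly infinite loop would never finish, so this version adds a counter and a break statement (an addition not covered in this excerpt) purely so the demonstration stops after three iterations.

```python
count = 0
while True:                # capital T; lowercase "true" would raise a NameError
    print("hello, world")
    count += 1
    if count == 3:         # added only so this demo terminates
        break

print(count)  # 3
```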
00:18:35.010 --> 00:18:38.040 Just as before, if I rewind to our most recent condition, 00:18:38.040 --> 00:18:40.770 notice that even though we've gotten rid of the curly braces 00:18:40.770 --> 00:18:43.620 and we've gotten rid of the parentheses, we now 00:18:43.620 --> 00:18:47.640 have introduced these colons, which are necessary after this expression, 00:18:47.640 --> 00:18:50.580 this expression, and this one, to make clear to Python 00:18:50.580 --> 00:18:54.150 that the lines of code that follow, indented underneath, 00:18:54.150 --> 00:18:57.910 are indeed relevant to that if, elif, or else. 00:18:57.910 --> 00:19:01.540 And we see that same feature again here in the context of a loop. 00:19:01.540 --> 00:19:02.790 We saw other loops, of course. 00:19:02.790 --> 00:19:04.560 In Scratch, when we wanted to do something 00:19:04.560 --> 00:19:08.910 a finite number of times like 3, we would repeat the following three times. 00:19:08.910 --> 00:19:11.770 In C, we had a few different approaches to this. 00:19:11.770 --> 00:19:14.100 And all of them, I dare say, were very mechanical. 00:19:14.100 --> 00:19:17.520 Like, if you want to do something three times, the onus in C 00:19:17.520 --> 00:19:21.240 is on you to declare a variable, keep track of how many times 00:19:21.240 --> 00:19:23.250 you've counted already, increment the thing. 00:19:23.250 --> 00:19:24.840 Like, there are a lot of moving parts. 00:19:24.840 --> 00:19:27.970 And so in C, one approach looked like this. 00:19:27.970 --> 00:19:30.270 We declare a variable called i equals 0-- 00:19:30.270 --> 00:19:32.110 but we could call it anything we want-- 00:19:32.110 --> 00:19:35.880 we have a while block here that's asking a Boolean expression again 00:19:35.880 --> 00:19:37.770 and again, is i less than 0-- 00:19:37.770 --> 00:19:39.480 is i less than 3? 00:19:39.480 --> 00:19:42.240 And then inside of the loop, we printed out "hello, world."
00:19:42.240 --> 00:19:46.200 And using C's syntactic sugar, the plus plus notation, 00:19:46.200 --> 00:19:49.800 we kept adding 1 to i, add 1 to i, add 1 to i, 00:19:49.800 --> 00:19:51.840 until we implicitly break out of the loop 00:19:51.840 --> 00:19:54.810 because it's, of course, no longer less than 3. 00:19:54.810 --> 00:19:59.220 So in Python, similar in spirit, but again, some of that clutter goes away. 00:19:59.220 --> 00:20:03.300 i equals 0 is all we need to say to give ourselves a variable. 00:20:03.300 --> 00:20:07.680 While i less than 3 is all we need to say there but with a colon. 00:20:07.680 --> 00:20:11.220 Then inside of that, indented properly, we print out "hello, world." 00:20:11.220 --> 00:20:15.610 And-- we can't do the plus plus, so minor disappointment-- 00:20:15.610 --> 00:20:18.300 but i plus equals 1 increments i. 00:20:18.300 --> 00:20:23.070 So this would be one way of implementing in Python the exact same thing, a loop 00:20:23.070 --> 00:20:24.720 that executes three times. 00:20:24.720 --> 00:20:27.750 But we saw other approaches, of course, in C, 00:20:27.750 --> 00:20:30.600 and there are other approaches possible in Python as well. 00:20:30.600 --> 00:20:33.527 You might recall in C that we saw this approach, the for loop. 00:20:33.527 --> 00:20:35.610 And odds are you've been reaching for the for loop 00:20:35.610 --> 00:20:38.527 pretty frequently, because even though it looks a little more cryptic, 00:20:38.527 --> 00:20:41.400 you can pack more features into that one line of code 00:20:41.400 --> 00:20:43.510 in between those semicolons, if you will. 00:20:43.510 --> 00:20:47.070 So same exact logic, it just prints out this "hello, world" 00:20:47.070 --> 00:20:49.590 three times using a for loop instead. 00:20:49.590 --> 00:20:54.180 In Python, things start to get a little elegant here now. 00:20:54.180 --> 00:20:57.210 It's a little weird at first glance, but it's definitely more succinct.
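The while loop just described, as a short Python sketch that mirrors the C version line for line:

```python
i = 0                      # no type, no semicolon
while i < 3:               # no parentheses, but a colon
    print("hello, world")  # the body is whatever is indented under the while
    i += 1                 # Python has no i++

print(i)  # 3
```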
00:20:57.210 --> 00:21:01.320 If you want to do something three times, it turns out in Python 00:21:01.320 --> 00:21:05.280 you can use a more succinct syntax for the for loop-- for i 00:21:05.280 --> 00:21:09.430 in, and then in square brackets a list of values. 00:21:09.430 --> 00:21:13.110 So just as we used in the past square brackets in a few different places 00:21:13.110 --> 00:21:18.120 to connote arrays and indexing into arrays, in the world of Python whenever 00:21:18.120 --> 00:21:22.770 you surround a bunch of values that themselves have commas in between them, 00:21:22.770 --> 00:21:26.638 and you encapsulate them all using square brackets, 00:21:26.638 --> 00:21:28.680 that's what we're going to call in Python a list. 00:21:28.680 --> 00:21:30.900 And it's very similar in spirit to an array, 00:21:30.900 --> 00:21:33.490 but we'll call it in the context of Python a list. 00:21:33.490 --> 00:21:38.250 And so what this line of code says is, for i in 0, 1, 2-- what does that mean? 00:21:38.250 --> 00:21:42.660 This is a for loop in Python that says, give me a variable called i. 00:21:42.660 --> 00:21:45.780 And on the first iteration of this loop set i equal to 0. 00:21:45.780 --> 00:21:48.270 On the second iteration of this loop set i equal to 1. 00:21:48.270 --> 00:21:51.660 And on the last iteration of this loop, set i equal to 2 for me. 00:21:51.660 --> 00:21:53.583 It just does all of that for you. 00:21:53.583 --> 00:21:55.500 Now, at the end of the day it actually doesn't 00:21:55.500 --> 00:21:59.550 matter what i is per se, because I'm not printing the value of i. 00:21:59.550 --> 00:22:00.550 And that's totally fine. 00:22:00.550 --> 00:22:03.508 Odds are you've used for loops where you did something again and again, 00:22:03.508 --> 00:22:06.120 like printing "hello, world," even though you didn't print out 00:22:06.120 --> 00:22:07.320 the value of i. 
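The two Python loops described so far, the while loop that increments with i += 1 and the for loop over a square-bracketed list, can be sketched side by side. This is a minimal sketch; the variables count_while and values_seen are illustrative additions so the behavior can be checked, not part of the lecture's code.

```python
# Version 1: a while loop, mirroring the C version.
i = 0
count_while = 0  # illustrative: counts how many times the body runs
while i < 3:
    print("hello, world")
    count_while += 1
    i += 1  # Python has no ++ operator, so += 1 increments instead

# Version 2: a for loop over an explicit list.
# i is set to 0, then 1, then 2; the body never uses i, and that's fine.
values_seen = []  # illustrative: records each value i takes on
for i in [0, 1, 2]:
    print("hello, world")
    values_seen.append(i)

print(count_while)  # 3
print(values_seen)  # [0, 1, 2]
```

Both versions print "hello, world" three times; the for loop simply hands the bookkeeping of i over to Python.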
00:22:07.320 --> 00:22:10.600 So technically, I could have put any three things in the square brackets 00:22:10.600 --> 00:22:11.100 if I want. 00:22:11.100 --> 00:22:15.450 But the convention would be to just enumerate, just like in C, 0, 1, 2, 00:22:15.450 --> 00:22:18.330 just like a computer scientist counting from 0. 00:22:18.330 --> 00:22:22.530 But this could break down pretty easily. 00:22:22.530 --> 00:22:25.500 This could become very ugly very quickly. 00:22:25.500 --> 00:22:29.520 Does anyone see a problem with for loops in Python 00:22:29.520 --> 00:22:33.120 if you have to put in between those square brackets the list of values 00:22:33.120 --> 00:22:36.060 that you want to iterate over? 00:22:36.060 --> 00:22:37.110 Noah? 00:22:37.110 --> 00:22:40.290 AUDIENCE: If you want to do, for example, a thing 50 times, 00:22:40.290 --> 00:22:42.930 you'd have to write out 0, 1, 2, 3, 4, 5, 6. 00:22:42.930 --> 00:22:43.680 DAVID MALAN: Yeah. 00:22:43.680 --> 00:22:45.330 My God, it would start to look hideous quickly. 00:22:45.330 --> 00:22:47.455 And it's funny you mention 50, because in preparing 00:22:47.455 --> 00:22:51.060 this demonstration for lecture today, I went back to week 0, 00:22:51.060 --> 00:22:55.350 when actually the analog in week 0 was to indeed print out "hello, world" 00:22:55.350 --> 00:22:56.042 50 times. 00:22:56.042 --> 00:22:57.750 And I thought to myself, damn it, this is 00:22:57.750 --> 00:22:59.910 going to look atrocious now, because I would literally 00:22:59.910 --> 00:23:03.960 have to put inside of square brackets 0, 1, 2, 3, 4, 5, 6, 7, 8, 00:23:03.960 --> 00:23:07.807 9, all the way to 49, as Noah says, which would just look atrocious. 00:23:07.807 --> 00:23:09.640 Like, surely there's got to be a better way. 00:23:09.640 --> 00:23:10.560 And there is.
00:23:10.560 --> 00:23:12.810 While this might be compelling for very short values, 00:23:12.810 --> 00:23:14.760 there's a simpler way in Python when you want 00:23:14.760 --> 00:23:16.860 to do something some number of times. 00:23:16.860 --> 00:23:21.150 We can replace this list of three values with this, 00:23:21.150 --> 00:23:25.350 a function called range that takes an input, which is the number of values 00:23:25.350 --> 00:23:26.430 that you want it to produce. 00:23:26.430 --> 00:23:30.540 And essentially, what range will do for you, passed an input like 3, 00:23:30.540 --> 00:23:35.580 is automatically generate for you a sequence of three values, 0, 1, and 2. 00:23:35.580 --> 00:23:39.360 And then Python will iterate over those three values for you. 00:23:39.360 --> 00:23:43.170 So to Noah's concern a moment ago, if I now want to iterate 50 times, 00:23:43.170 --> 00:23:45.690 I just change the 3 to a 50. I don't have 00:23:45.690 --> 00:23:51.300 to create this crazy mess of a manually typed-out list of 0 through 49, 00:23:51.300 --> 00:23:55.560 which, of course, would not be a very well-designed program, it would seem, 00:23:55.560 --> 00:23:59.710 just because of the length of it and the opportunity to mess up and the like. 00:23:59.710 --> 00:24:04.080 So in Python, this is perhaps now, if you will, the most Pythonic way 00:24:04.080 --> 00:24:05.953 to do something some number of times. 00:24:05.953 --> 00:24:08.370 And indeed, this is a term of art in the Python community. 00:24:08.370 --> 00:24:10.770 Long story short, technical people, programmers, 00:24:10.770 --> 00:24:12.990 they tend to be pretty religious in some sense 00:24:12.990 --> 00:24:15.420 when it comes to the "right way" of doing things.
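The range-based loop can be sketched as follows. One caveat on the wording above: in Python 3, range(3) does not build and store a list. It returns a range object that produces 0, 1, and 2 on demand, and calling list() on it materializes the values if you want to see them. The times counter is an illustrative addition.

```python
# The Pythonic way to do something a fixed number of times.
for i in range(3):
    print("hello, world")

# range produces values lazily; list() materializes them for inspection.
print(list(range(3)))  # [0, 1, 2]

# Scaling up to week 0's 50 repetitions means changing one number.
times = 0  # illustrative counter
for i in range(50):
    times += 1
print(times)  # 50
print(i)      # 49, since counting starts at 0
```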
00:24:15.420 --> 00:24:18.000 And indeed, within the world of Python programming, 00:24:18.000 --> 00:24:22.450 a lot of Python programmers have not only opinions 00:24:22.450 --> 00:24:26.940 but also standardized recommendations that dictate how you "should" 00:24:26.940 --> 00:24:28.350 write Python code. 00:24:28.350 --> 00:24:31.620 And tricks like this are what are considered Pythonic. 00:24:31.620 --> 00:24:35.100 You are doing something Pythonically if you're doing it the quote, unquote 00:24:35.100 --> 00:24:37.830 "right way," which doesn't mean right in the absolute. 00:24:37.830 --> 00:24:40.890 It means right in the sense that most other people 00:24:40.890 --> 00:24:42.500 agree with you. 00:24:42.500 --> 00:24:43.000 All right. 00:24:43.000 --> 00:24:46.500 Let's see a few final features of Python before we now start 00:24:46.500 --> 00:24:48.030 to build some of our own features. 00:24:48.030 --> 00:24:51.702 In C, recall, we had this whole list of data types. 00:24:51.702 --> 00:24:54.160 And there are more, and you can create your own, of course. 00:24:54.160 --> 00:24:56.100 But the primitives that we looked at initially 00:24:56.100 --> 00:25:00.690 were these-- bool, char, double, float, int, long, string, and so forth. 00:25:00.690 --> 00:25:04.290 In Python, even though I haven't needed them, 00:25:04.290 --> 00:25:08.610 because I can give myself a variable like a string or an int, 00:25:08.610 --> 00:25:13.380 just by giving it a name like counter or i or answer, 00:25:13.380 --> 00:25:15.300 and then assigning it a value, and Python 00:25:15.300 --> 00:25:18.480 infers from what you're assigning it what data type it should be, 00:25:18.480 --> 00:25:20.040 Python does have data types. 00:25:20.040 --> 00:25:21.960 It's just what's known in the programming 00:25:21.960 --> 00:25:24.390 world as a dynamically typed language.
00:25:24.390 --> 00:25:28.560 In the world of C, C is a statically typed language, 00:25:28.560 --> 00:25:32.760 where not only do types exist, but you must use them explicitly. 00:25:32.760 --> 00:25:34.920 In the world of Python, you have what's called 00:25:34.920 --> 00:25:39.960 a dynamically typed language, in which types exist, 00:25:39.960 --> 00:25:43.380 but they're determined implicitly from the values you assign. 00:25:43.380 --> 00:25:46.770 The burden is not on you, the programmer, to specify 00:25:46.770 --> 00:25:48.480 those data types incessantly. 00:25:48.480 --> 00:25:50.740 Let the computer figure it out for you. 00:25:50.740 --> 00:25:52.950 So this is our list from C. 00:25:52.950 --> 00:25:56.790 This now is going to be our analogous list in the world of Python. 00:25:56.790 --> 00:25:59.230 We're going to have bool still, True and False, 00:25:59.230 --> 00:26:01.470 but capital T, capital F. We're going to have floats, 00:26:01.470 --> 00:26:03.210 which are real numbers with decimal points. 00:26:03.210 --> 00:26:06.127 We're going to have ints, which of course are numbers like negative 1, 00:26:06.127 --> 00:26:07.600 0, and 1, and so forth. 00:26:07.600 --> 00:26:10.950 And then not strings per se, but "stirs," S-T-R. 00:26:10.950 --> 00:26:15.930 And whereas in the world of C, there was technically no "string type"-- 00:26:15.930 --> 00:26:20.010 that was a feature offered by the CS50 library, which just made more 00:26:20.010 --> 00:26:22.620 accessible the idea of a char star-- 00:26:22.620 --> 00:26:24.450 recall that C has strings. 00:26:24.450 --> 00:26:27.630 And they're called strings, but there's no data type called string. 00:26:27.630 --> 00:26:29.910 The way you give yourself a string, of course, in C 00:26:29.910 --> 00:26:31.890 is to declare something as a char star. 00:26:31.890 --> 00:26:34.650 And in CS50's library, we just gave that char star 00:26:34.650 --> 00:26:37.620 a synonym, a nickname, an alias, called "string."
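Here is a short sketch of those Python types in action. None of the variables below is declared with a type; Python attaches a type to each value, which the built-in type() function reports. The variable names and values are illustrative.

```python
answer = True     # bool: note the capital T (and capital F in False)
price = 4.99      # float: a real number with a decimal point
counter = -1      # int
name = "hello"    # str: Python's built-in string type, no char * needed

for value in (answer, price, counter, name):
    print(type(value).__name__)  # bool, float, int, str (one per line)

# ints grow as needed, so there is no separate long type to reach for.
big = 2 ** 64
print(big)  # 18446744073709551616

# A name can later be rebound to a value of a different type.
counter = "no longer an int"
print(type(counter).__name__)  # str
```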
00:26:37.620 --> 00:26:40.430 In Python, there are actual-- 00:26:40.430 --> 00:26:42.560 there is an actual data type for strings. 00:26:42.560 --> 00:26:46.000 And for short, it's called S-T-R. 00:26:46.000 --> 00:26:46.500 All right. 00:26:46.500 --> 00:26:49.550 So with that said, what other features do we 00:26:49.550 --> 00:26:51.660 have from Python that we can use here? 00:26:51.660 --> 00:26:53.690 Well, there are other data types as well in 00:26:53.690 --> 00:26:57.050 Python that are actually going to prove super useful as we begin 00:26:57.050 --> 00:26:59.210 to develop more sophisticated programs and do 00:26:59.210 --> 00:27:00.980 even cooler things with the language. 00:27:00.980 --> 00:27:02.570 We've seen range already. 00:27:02.570 --> 00:27:05.960 Strictly speaking, this is a data type of sorts within Python 00:27:05.960 --> 00:27:09.560 that gives you back a range of values, by default 0 on up, 00:27:09.560 --> 00:27:10.970 based on the input you provide. 00:27:10.970 --> 00:27:12.860 List, I keep mentioning verbally. 00:27:12.860 --> 00:27:18.920 A list is a proper data type in Python that's similar in spirit to arrays. 00:27:18.920 --> 00:27:20.300 But whereas in arrays-- 00:27:20.300 --> 00:27:23.390 recall, we've placed great emphasis over the past few weeks 00:27:23.390 --> 00:27:25.970 on noting that arrays are a fixed size. 00:27:25.970 --> 00:27:28.960 You have to decide in advance how big that array is going to be. 00:27:28.960 --> 00:27:31.940 And like last week, if you decide, oops, I need more memory, 00:27:31.940 --> 00:27:34.260 you have to dynamically allocate more space for it, 00:27:34.260 --> 00:27:36.830 copy values over, and then free up the old memory. 00:27:36.830 --> 00:27:39.740 Like, there's so much jumping through hoops, so to speak, 00:27:39.740 --> 00:27:44.030 when you want to use arrays in C if you want to grow them or even shrink them.
00:27:44.030 --> 00:27:49.170 Python and other higher-level languages like it do all of that for you. 00:27:49.170 --> 00:27:53.330 So a list is like an array that automatically resizes itself, 00:27:53.330 --> 00:27:54.500 bigger and smaller. 00:27:54.500 --> 00:27:57.350 That feature now you get for free in the language, so to speak. 00:27:57.350 --> 00:27:59.240 You don't have to implement it yourself. 00:27:59.240 --> 00:28:00.860 Python has what are called tuples. 00:28:00.860 --> 00:28:03.530 In the context of, like, math or GPS, you might 00:28:03.530 --> 00:28:06.500 have x- and y-coordinates, or latitude and longitude coordinates, 00:28:06.500 --> 00:28:08.270 so, like, comma-separated values. 00:28:08.270 --> 00:28:11.510 Tuples are one way of implementing those in Python. 00:28:11.510 --> 00:28:13.220 Dict, or dictionaries. 00:28:13.220 --> 00:28:17.690 So Python has dictionaries that allow you to store keys and values. 00:28:17.690 --> 00:28:21.150 Or literally in our human world, just as a human dictionary, 00:28:21.150 --> 00:28:24.650 for instance for English, in physical form 00:28:24.650 --> 00:28:29.450 lets you store words and their definitions, a dictionary in Python, 00:28:29.450 --> 00:28:33.350 more generally, lets you store any keys and any values. 00:28:33.350 --> 00:28:35.540 You can associate one thing with another. 00:28:35.540 --> 00:28:39.155 And we'll see that this is a wonderfully useful and versatile data structure. 00:28:39.155 --> 00:28:41.030 And then lastly for today's purposes, there are 00:28:41.030 --> 00:28:44.090 these things called sets. If you recall from math, 00:28:44.090 --> 00:28:49.100 a set is a collection of values, like a, b, c or 1, 2, 3, without duplicates. 00:28:49.100 --> 00:28:50.810 But Python manages that for you. 00:28:50.810 --> 00:28:54.050 You can add items to a set, and you can remove items from a set.
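Here is a minimal sketch of the four structures just described. The specific values, such as the coordinate pair and the sample definitions, are made up for illustration.

```python
# list: like an array, but it resizes itself; no malloc, realloc, or free.
numbers = [1, 2, 3]
numbers.append(4)      # grows to [1, 2, 3, 4]
last = numbers.pop()   # shrinks back to [1, 2, 3]; last is 4

# tuple: a small, fixed group of values, such as an (x, y) coordinate.
point = (3, 4)
x, y = point           # unpack into two variables

# dict: associate keys with values, like words with their definitions.
definitions = {"apple": "a fruit", "python": "a snake"}
definitions["cs50"] = "a course"   # add a new key-value pair
print(definitions["apple"])        # a fruit

# set: a collection of values without duplicates.
letters = {"a", "b", "c"}
letters.add("a")       # already present, so nothing changes
letters.add("d")
letters.remove("b")    # raises KeyError if the item is absent
print(sorted(letters))  # ['a', 'c', 'd']
```

In each case, Python handles the memory behind the scenes, which is the contrast with the fixed-size arrays from C.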
00:28:54.050 --> 00:28:57.110 Python will make sure that there are no duplicates for you, 00:28:57.110 --> 00:29:01.230 and it will manage all of the memory for you as well. 00:29:01.230 --> 00:29:06.920 So what we have in the way of functions, meanwhile, are a few familiar friends. 00:29:06.920 --> 00:29:10.910 Recall that in C we used the CS50 library to get chars, 00:29:10.910 --> 00:29:13.430 doubles, floats, ints, longs, and strings. 00:29:13.430 --> 00:29:17.330 In Python, thankfully, we don't have to worry about doubles or longs anymore. 00:29:17.330 --> 00:29:18.450 More on that in a bit. 00:29:18.450 --> 00:29:23.000 But the CS50 library for Python, which you saw me import a few minutes ago, 00:29:23.000 --> 00:29:25.040 does give you a function called get_float. 00:29:25.040 --> 00:29:26.960 It does give you a function called get_int, 00:29:26.960 --> 00:29:29.110 and it does give you a function called get_string. 00:29:29.110 --> 00:29:31.280 At least for this week's purposes, these are just 00:29:31.280 --> 00:29:32.720 going to make your life easier. 00:29:32.720 --> 00:29:36.140 These are training wheels that we will very quickly take off 00:29:36.140 --> 00:29:39.800 so that you're only using native Python code ultimately, 00:29:39.800 --> 00:29:41.720 and not CS50's own library. 00:29:41.720 --> 00:29:44.750 But for the sake of transitioning this week from C to Python, 00:29:44.750 --> 00:29:48.590 you'll find that these will just make your life easier before we relax 00:29:48.590 --> 00:29:52.020 and take those away, too. 00:29:52.020 --> 00:29:56.420 So in C, to use the library you had to include cs50.h. 00:29:56.420 --> 00:29:58.860 In Python, again you're going to go ahead and import 00:29:58.860 --> 00:30:03.620 cs50, or more explicitly, the specific function that you might want to import. 00:30:03.620 --> 00:30:06.020 So it turns out there are different ways to import things.
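The different import styles can be sketched using Python's standard math module as a stand-in for cs50, so the example runs without CS50's library installed. With cs50 itself, the pattern would be the same: after import cs50 you would call cs50.get_string(...), while after from cs50 import get_string you would call get_string(...) directly.

```python
import math                    # whole module: use the math. prefix
from math import sqrt          # one name: use it without a prefix
from math import floor, ceil   # a comma-separated list of names

print(math.sqrt(16))          # 4.0
print(sqrt(16))               # 4.0, the same function reached a different way
print(floor(2.5), ceil(2.5))  # 2 3
```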
00:30:06.020 --> 00:30:08.840 They ultimately achieve essentially the same goal. 00:30:08.840 --> 00:30:11.330 You can, with lines like this, explicitly 00:30:11.330 --> 00:30:15.640 import one function at a time, like I did earlier using get_string, 00:30:15.640 --> 00:30:18.290 or you can import the whole library all at once 00:30:18.290 --> 00:30:21.185 by just saying, more succinctly, import cs50. 00:30:21.185 --> 00:30:24.200 It's going to affect the syntax we have to use hereafter, 00:30:24.200 --> 00:30:28.530 but you'll see multiple ways of doing this in our examples from here on out. 00:30:28.530 --> 00:30:30.350 You can also simplify this a bit, and you 00:30:30.350 --> 00:30:35.900 can import a comma-separated list of functions from a library like ours. 00:30:35.900 --> 00:30:38.810 And this is a convention we'll see quite frequently as well. 00:30:38.810 --> 00:30:43.130 Because if we start using popular third-party libraries written 00:30:43.130 --> 00:30:46.130 by other programmers on the internet, they will very commonly 00:30:46.130 --> 00:30:48.830 give us lots of functions that we ourselves can use, 00:30:48.830 --> 00:30:52.490 and we will be able to import those one after the other, 00:30:52.490 --> 00:30:56.080 by just specifying them here in this way. 00:30:56.080 --> 00:30:56.580 All right. 00:30:56.580 --> 00:31:01.940 Let me pause here just to see if there are any questions on Python syntax. 00:31:01.940 --> 00:31:06.080 Like, that's essentially it for our crash course in Python syntax. 00:31:06.080 --> 00:31:10.135 We're now going to start building things and explore what the features of Python 00:31:10.135 --> 00:31:13.010 are and what some of the nuances are, and really the power of Python. 00:31:13.010 --> 00:31:17.890 But first, any questions on syntax? 00:31:17.890 --> 00:31:20.980 We've seen loops, conditions, variables. 00:31:20.980 --> 00:31:25.220 Olivia, question or comment?
00:31:25.220 --> 00:31:28.880 AUDIENCE: In a for loop, if you want to increment by something besides 1, 00:31:28.880 --> 00:31:32.843 but you don't want to explicitly type out the list, how would you do that? 00:31:32.843 --> 00:31:34.260 DAVID MALAN: Really good question. 00:31:34.260 --> 00:31:39.150 So if you wanted to use a for loop and iterate over a range of values, 00:31:39.150 --> 00:31:46.070 but you wanted that range to be 0, 2, 4, 6, 8, instead of 0, 1, 2, 3, 00:31:46.070 --> 00:31:48.980 let me go ahead and go back to that slide from a moment ago. 00:31:48.980 --> 00:31:51.710 And I can actually change this on the fly. 00:31:51.710 --> 00:31:54.840 Let me go into that slide, which was right here. 00:31:54.840 --> 00:31:59.630 And what I can do, actually, is specify more values, which might be this. 00:31:59.630 --> 00:32:04.790 If I change the input to range to be not one value but three values, a start, a stop, and a step, 00:32:04.790 --> 00:32:07.520 that's going to be a clue to the computer 00:32:07.520 --> 00:32:10.310 that it should count from the start up to, but not including, the stop, 00:32:10.310 --> 00:32:14.022 and that it should increment by the step, say 2 at a time, instead of the default, which is 1. 00:32:14.022 --> 00:32:15.980 And there are other capabilities there, too. 00:32:15.980 --> 00:32:17.563 You don't have to start counting at 0. 00:32:17.563 --> 00:32:20.870 You can adjust that as well, which is to say that with Python, you're 00:32:20.870 --> 00:32:23.630 going to find a lot more features come with the language, 00:32:23.630 --> 00:32:28.160 and even more powerfully, the functions that you can write 00:32:28.160 --> 00:32:31.010 and the functions that you can use in Python 00:32:31.010 --> 00:32:34.220 also can take different numbers of arguments. 00:32:34.220 --> 00:32:36.830 Sometimes it's 0, sometimes it's 1, sometimes it's 2. 00:32:36.830 --> 00:32:40.310 But it's ultimately often up to you. 00:32:40.310 --> 00:32:40.820 Good question. 00:32:40.820 --> 00:32:42.020 Other questions?
00:32:42.020 --> 00:32:45.860 AUDIENCE: Will we see sequences primarily in the for loops? 00:32:45.860 --> 00:32:48.620 Or are there other applications where they're very useful? 00:32:48.620 --> 00:32:50.162 DAVID MALAN: Sequences in what sense? 00:32:50.162 --> 00:32:53.030 In the sense of ranges or lists or something else? 00:32:53.030 --> 00:32:55.535 AUDIENCE: Yeah, in terms of ranges, specifically. 00:32:55.535 --> 00:32:56.660 DAVID MALAN: Good question. 00:32:56.660 --> 00:32:58.118 Will we use them in other contexts? 00:32:58.118 --> 00:33:01.110 Generally speaking, it's pretty rare. 00:33:01.110 --> 00:33:04.250 I mean, I'm racking my brain now as to other use cases 00:33:04.250 --> 00:33:06.257 that I have used range for. 00:33:06.257 --> 00:33:08.090 And I'm sure I could come up with something. 00:33:08.090 --> 00:33:12.050 But I think hands down, the most common case is in the context of iteration, 00:33:12.050 --> 00:33:13.308 as in a for loop. 00:33:13.308 --> 00:33:15.350 And I'll think on that to see other applications. 00:33:15.350 --> 00:33:18.110 But any time you want to generate a long list of values 00:33:18.110 --> 00:33:21.950 that follow some pattern, whether it's 0, 1, 2, or as Olivia points out, 00:33:21.950 --> 00:33:24.650 a range of values with gaps, range will allow 00:33:24.650 --> 00:33:26.870 you to avoid having to hardcode it entirely. 00:33:26.870 --> 00:33:29.840 And you can actually write your own generator function, so to speak, 00:33:29.840 --> 00:33:34.260 a function that returns whatever pattern of values that you want. 00:33:34.260 --> 00:33:36.870 Other questions or confusion? 00:33:39.610 --> 00:33:43.900 Anything on your end, Brian, from the chat or beyond? 00:33:43.900 --> 00:33:46.150 BRIAN: Looks like all the questions are answered here. 00:33:46.150 --> 00:33:46.850 DAVID MALAN: All right. 
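Following up on Olivia's question and the mention of generator functions, here is a sketch. The range function accepts up to three arguments, range(start, stop, step), and counts from start up to, but not including, stop. The squares function is a hypothetical example of a generator, a function that uses yield to produce values one at a time.

```python
# Count by 2s: start at 0, stop before 10, step by 2.
evens = list(range(0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]

# Start somewhere other than 0 (the step defaults to 1).
print(list(range(1, 4)))  # [1, 2, 3]

# Count down with a negative step.
print(list(range(3, 0, -1)))  # [3, 2, 1]


def squares(n):
    """Hypothetical generator: yields the first n perfect squares."""
    for i in range(n):
        yield i * i


for square in squares(5):
    print(square)  # 0, 1, 4, 9, 16 (one per line)
```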
00:33:46.850 --> 00:33:48.820 Well, let's go ahead now and do something more interesting 00:33:48.820 --> 00:33:49.570 than hello, world. 00:33:49.570 --> 00:33:52.660 Because after all, this is where programming really gets fun, 00:33:52.660 --> 00:33:55.810 really gets powerful, when you and I no longer 00:33:55.810 --> 00:33:59.050 have to implement those low-level details, as when 00:33:59.050 --> 00:34:02.380 you had to implement memory management for your hash table, 00:34:02.380 --> 00:34:06.040 or memory management for a linked list, or copying values in an array. 00:34:06.040 --> 00:34:08.500 We've spent the past several weeks focusing really 00:34:08.500 --> 00:34:12.070 on some low-level primitives that are useful to understand, 00:34:12.070 --> 00:34:13.989 but they're not fun to write. 00:34:13.989 --> 00:34:17.322 And I concede that they might not be fun to write in problem set form. 00:34:17.322 --> 00:34:20.530 And they're certainly not going to be fun to write for the rest of your life, 00:34:20.530 --> 00:34:23.219 every time you want to just write code to solve some problem. 00:34:23.219 --> 00:34:24.969 But again, that's where libraries come in. 00:34:24.969 --> 00:34:27.159 And now, this is where other languages come in. 00:34:27.159 --> 00:34:31.900 It turns out that Python is a much better, a much easier 00:34:31.900 --> 00:34:35.080 language to use for solving certain types of problems, 00:34:35.080 --> 00:34:39.199 among them some of the problems we have been solving in past problem sets. 00:34:39.199 --> 00:34:41.750 So in fact, let me go ahead and do this. 00:34:41.750 --> 00:34:46.030 I'm going to go ahead and grab a file here-- 00:34:46.030 --> 00:34:47.710 give me one moment-- 00:34:47.710 --> 00:34:52.389 called bridge.bmp, which you might recall from a past problem set. 00:34:52.389 --> 00:34:56.500 This is the beautiful Weeks Bridge down by the Charles River in Cambridge, Mass., 00:34:56.500 --> 00:34:57.160 by Harvard.
00:34:57.160 --> 00:35:00.430 And this is a very clear photograph taken by one of CS50's team members. 00:35:00.430 --> 00:35:02.590 And in recent weeks, of course, you wrote code 00:35:02.590 --> 00:35:06.850 to do all sorts of mutations of this image, among them blurring the image. 00:35:06.850 --> 00:35:10.510 And blur, I dare say, was not the easiest problem to solve. 00:35:10.510 --> 00:35:13.240 You had to look up, down, left, and right, and sort of average 00:35:13.240 --> 00:35:14.080 all of those pixels. 00:35:14.080 --> 00:35:17.300 You had to understand how an image is represented one pixel at a time. 00:35:17.300 --> 00:35:20.300 So there's a lot of low-level minutiae there, when at the end of the day, 00:35:20.300 --> 00:35:22.600 all you want to do is just blur an image. 00:35:22.600 --> 00:35:26.740 So whereas in past weeks we sort of had to think at and write at this lower 00:35:26.740 --> 00:35:29.770 level, now with Python it turns out we're 00:35:29.770 --> 00:35:33.070 going to have the ability to think at a higher level of abstraction 00:35:33.070 --> 00:35:35.328 and write far less code for ourselves. 00:35:35.328 --> 00:35:36.620 So let me go ahead and do this. 00:35:36.620 --> 00:35:39.430 I'm going to use my Mac for this instead of CS50 IDE, 00:35:39.430 --> 00:35:41.582 so I can open the images more quickly. 00:35:41.582 --> 00:35:43.540 This is to say that, even though we'll continue 00:35:43.540 --> 00:35:47.050 using CS50 IDE for Python and for other languages 00:35:47.050 --> 00:35:51.220 over the remainder of the course, you can also install the requisite software 00:35:51.220 --> 00:35:54.880 on a Mac, on a PC, sometimes even kind of sort of a phone 00:35:54.880 --> 00:35:59.950 today, to use Python and C and other languages on your own devices. 00:35:59.950 --> 00:36:02.600 But again, we tend to use CS50 IDE during the class 00:36:02.600 --> 00:36:05.208 so as to have a standard environment that just works.
00:36:05.208 --> 00:36:07.000 So I'm going to go ahead and write, though, 00:36:07.000 --> 00:36:11.680 on my computer a program called blur.py, py, of course, 00:36:11.680 --> 00:36:13.630 being the file extension for Python programs. 00:36:13.630 --> 00:36:15.460 So my program looks a little different now. 00:36:15.460 --> 00:36:17.620 I've got this black and blue and white window. 00:36:17.620 --> 00:36:21.130 But this is just a text editor on my own personal Mac here. 00:36:21.130 --> 00:36:22.870 I'm going to go ahead and do this. 00:36:22.870 --> 00:36:25.870 I need to have some functionality related to images 00:36:25.870 --> 00:36:27.320 in order to blur an image. 00:36:27.320 --> 00:36:30.370 So I'm going to go ahead and import from a PIL library, 00:36:30.370 --> 00:36:34.480 a Pillow library, so to speak, a special feature 00:36:34.480 --> 00:36:37.510 called Image and a special feature called ImageFilter. 00:36:37.510 --> 00:36:39.970 That is to say, these are essentially two modules of code 00:36:39.970 --> 00:36:43.360 that someone smarter than me when it comes to image manipulation wrote 00:36:43.360 --> 00:36:46.970 and made freely available on the internet as free and open source software, 00:36:46.970 --> 00:36:49.630 which means anyone can use the code, and I am allowed now 00:36:49.630 --> 00:36:54.190 to import it into my program, because I downloaded and installed 00:36:54.190 --> 00:36:55.437 it before class. 00:36:55.437 --> 00:36:57.020 Now I'm going to go ahead and do this. 00:36:57.020 --> 00:36:59.110 I'm going to give myself a variable called before. 00:36:59.110 --> 00:37:03.460 And I'm going to call Image.open on bridge.bmp. 00:37:03.460 --> 00:37:06.700 So again, even though we've never seen this before, never used this before, 00:37:06.700 --> 00:37:09.130 you can kind of glean syntactically what's going on. 00:37:09.130 --> 00:37:11.500 I've got a variable on the left called before.
00:37:11.500 --> 00:37:15.130 I've got a function on the right called Image.open, 00:37:15.130 --> 00:37:17.410 and I'm passing in the name bridge.bmp. 00:37:17.410 --> 00:37:21.130 So it sounds like this is kind of like fopen in the world of C. 00:37:21.130 --> 00:37:24.430 Now notice, this dot is kind of serving a new role here. 00:37:24.430 --> 00:37:29.530 In the past, we've used the dot operator only for structs in C, 00:37:29.530 --> 00:37:34.160 when we want to go into a person object, or into a node object, 00:37:34.160 --> 00:37:37.420 and we want to go inside of it and access some variable therein. 00:37:37.420 --> 00:37:42.310 Well, it turns out in Python, you have things similar in spirit to structs 00:37:42.310 --> 00:37:49.780 in C. But instead of containing only variables or data, like name and number, 00:37:49.780 --> 00:37:52.780 like we did for the person struct a few weeks back, 00:37:52.780 --> 00:37:56.200 in Python you can have inside of a structure 00:37:56.200 --> 00:37:59.410 not only data, that is, variables, but you can also 00:37:59.410 --> 00:38:01.900 have functions inside of structures. 00:38:01.900 --> 00:38:05.080 And that starts to open up all sorts of possibilities 00:38:05.080 --> 00:38:07.370 in terms of features available to you. 00:38:07.370 --> 00:38:13.480 So it seems that I've got this Image object, this Image struct that I've, 00:38:13.480 --> 00:38:15.040 again, imported from someone else. 00:38:15.040 --> 00:38:17.710 Inside of it is an open function that expects 00:38:17.710 --> 00:38:19.880 as input the name of a file to open. 00:38:19.880 --> 00:38:23.110 So we'll see this syntax increasingly over the course of today's examples. 00:38:23.110 --> 00:38:25.370 Let me give myself a second variable, after.
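To make the idea of structures that contain functions concrete, here is a sketch that uses Python's class syntax, which this part of the lecture has not introduced yet. The Person type, its describe function, and the sample values are hypothetical, loosely modeled on the C person struct with a name and a number. (Strictly speaking, in Pillow, Image is a module, and Image.open returns an image object whose filter and save functions are reached with the same dot.)

```python
class Person:
    """Hypothetical struct-like type holding data and a function."""

    def __init__(self, name, number):
        self.name = name      # data, like a field in a C struct
        self.number = number

    def describe(self):
        # A function that lives inside the structure and can use its data.
        return f"{self.name}: {self.number}"


person = Person("David", "+1-617-555-0100")
print(person.name)        # David (the dot reaches data, as in C)
print(person.describe())  # David: +1-617-555-0100 (the dot also reaches a function)
```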
00:38:25.370 --> 00:38:28.000 Let me go ahead now and assign to this variable called 00:38:28.000 --> 00:38:33.610 after the result of calling that before image's filter function, 00:38:33.610 --> 00:38:37.747 passing in ImageFilter.BoxBlur of 1. 00:38:37.747 --> 00:38:39.580 Now, this is a little cryptic, and we're not 00:38:39.580 --> 00:38:41.860 going to spend time on this particular syntax, because odds are, 00:38:41.860 --> 00:38:44.100 in life you're not going to have that many opportunities to want 00:38:44.100 --> 00:38:46.683 to write and run code that blurs an image. 00:38:46.683 --> 00:38:50.700 But for today's purposes, notice that inside of my before variable, 00:38:50.700 --> 00:38:54.570 because I assigned it the return value of this new feature, 00:38:54.570 --> 00:38:59.280 there are not just data but also functions, one of them 00:38:59.280 --> 00:39:00.450 now called filter. 00:39:00.450 --> 00:39:04.860 And this filter function takes as input the return value of some other function 00:39:04.860 --> 00:39:08.190 called BoxBlur that, long story short, will blur my image using 00:39:08.190 --> 00:39:12.130 a box with a 1-pixel radius. 00:39:12.130 --> 00:39:15.210 So just like your own code, if you implemented blur in C, 00:39:15.210 --> 00:39:18.840 this code is going to tell my code to look up, down, left, and right 00:39:18.840 --> 00:39:23.380 and blur the pixels by taking the average around them. 00:39:23.380 --> 00:39:24.300 And that's kind of it. 00:39:24.300 --> 00:39:26.250 After that I'm going to do after.save. 00:39:26.250 --> 00:39:28.500 And I'm going to save this as out.bmp. 00:39:28.500 --> 00:39:30.990 I just want to create a new file called out.bmp. 00:39:30.990 --> 00:39:33.450 And if I've made no mistakes, let me go ahead now 00:39:33.450 --> 00:39:37.830 and run python of blur.py and hit Enter. 
00:39:37.830 --> 00:39:40.200 No error messages, so that's usually a good thing.
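For reference, the four-line Pillow program just described is repeated in the comment at the top of the block below; running it requires Pillow and bridge.bmp. To show roughly what a box blur computes without depending on Pillow, the rest of the block is a hypothetical pure-Python sketch on a tiny grid of grayscale values. Each value becomes the average of its neighbors within radius rows and columns. At the edges, this sketch averages only neighbors that exist, as in the C problem set; Pillow's own border handling may differ.

```python
# The Pillow version from the demo (needs Pillow installed):
#   from PIL import Image, ImageFilter
#   before = Image.open("bridge.bmp")
#   after = before.filter(ImageFilter.BoxBlur(1))
#   after.save("out.bmp")


def box_blur(pixels, radius):
    """Return a blurred copy of a 2D grid of grayscale values (hypothetical helper)."""
    height, width = len(pixels), len(pixels[0])
    blurred = []
    for row in range(height):
        new_row = []
        for col in range(width):
            total = count = 0
            # Visit every in-bounds neighbor within `radius` of (row, col).
            for r in range(max(0, row - radius), min(height, row + radius + 1)):
                for c in range(max(0, col - radius), min(width, col + radius + 1)):
                    total += pixels[r][c]
                    count += 1
            new_row.append(round(total / count))
        blurred.append(new_row)
    return blurred


image = [
    [0, 0, 0],
    [0, 36, 0],
    [0, 0, 0],
]
print(box_blur(image, 1))  # [[9, 6, 9], [6, 4, 6], [9, 6, 9]]
```

Raising the radius from 1 to 10, as in the demo, simply widens the square of neighbors that gets averaged.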
00:39:40.200 --> 00:39:43.980 If I type ls now, notice that I've got bridge.bmp, 00:39:43.980 --> 00:39:48.330 which I already opened, blur.py, which I just wrote, and out.bmp. 00:39:48.330 --> 00:39:52.860 And if I go ahead and open out.bmp, let's go ahead and take a look. 00:39:52.860 --> 00:39:55.720 Here's before, here's after. 00:39:55.720 --> 00:39:56.220 Huh. 00:39:56.220 --> 00:39:58.610 Before, after. 00:39:58.610 --> 00:40:00.360 Now, over the internet it probably doesn't 00:40:00.360 --> 00:40:03.068 look that blurred, though on my Mac right here a few inches away, 00:40:03.068 --> 00:40:04.238 it definitely looks blurred. 00:40:04.238 --> 00:40:06.030 But let's do it a little more compellingly. 00:40:06.030 --> 00:40:09.030 How about, instead of looking one pixel up, down, left, and right, 00:40:09.030 --> 00:40:10.930 we look 10 pixels at a time? 00:40:10.930 --> 00:40:15.390 So we really blur it by looking at more values and averaging more. 00:40:15.390 --> 00:40:19.020 Let me go ahead now and run python of blur.py. 00:40:19.020 --> 00:40:20.880 Now let me go ahead and reopen. 00:40:20.880 --> 00:40:24.820 And now you see before and after. 00:40:24.820 --> 00:40:27.220 Before and after. 00:40:27.220 --> 00:40:28.480 So what is this to say? 00:40:28.480 --> 00:40:33.270 Well, here is, what, problem set 4 in four lines of code blurring an image. 00:40:33.270 --> 00:40:34.890 So pretty cool, pretty powerful. 00:40:34.890 --> 00:40:37.800 By standing on the shoulders of others and using their libraries, 00:40:37.800 --> 00:40:40.320 we can do other things quite quickly. 00:40:40.320 --> 00:40:45.670 What I can also do here, too, is solve a more recent problem. 00:40:45.670 --> 00:40:50.340 Let me go over to a different directory, where I have in advance-- 00:40:50.340 --> 00:40:53.070 and you can download these files off of the course's website-- 00:40:53.070 --> 00:40:55.920 a few files that we wrote before class.
00:40:55.920 --> 00:40:58.110 One is called speller.py. 00:40:58.110 --> 00:41:03.090 So long story short, speller.py is a translation from C 00:41:03.090 --> 00:41:06.065 into Python of the code in speller.c. 00:41:06.065 --> 00:41:08.940 Recall that that was part of the distribution code for problem set 5, 00:41:08.940 --> 00:41:11.910 and we've now translated speller.c to speller.py. 00:41:11.910 --> 00:41:15.450 And in dictionaries and in texts, we see the same files 00:41:15.450 --> 00:41:19.200 as in problem set 5: two different-sized dictionaries and a whole bunch 00:41:19.200 --> 00:41:21.060 of short and long texts. 00:41:21.060 --> 00:41:25.500 What hasn't been created yet is the equivalent of a dictionary.c, a.k.a. 00:41:25.500 --> 00:41:27.540 now, dictionary.py. 00:41:27.540 --> 00:41:30.240 So let me go ahead and implement my spell checker in Python. 00:41:30.240 --> 00:41:34.090 Let me go ahead and create a file called dictionary.py, as is, again, 00:41:34.090 --> 00:41:34.980 the convention. 00:41:34.980 --> 00:41:36.218 And let's go ahead. 00:41:36.218 --> 00:41:38.010 We have to implement four functions, right? 00:41:38.010 --> 00:41:40.950 We have to implement check, load, size, and unload. 00:41:40.950 --> 00:41:44.520 But I probably need like a global variable here to store my dictionary. 00:41:44.520 --> 00:41:47.700 And this is where you all implemented your hash table with a pointer, 00:41:47.700 --> 00:41:50.935 and then linked lists, and arrays, and all of that, a lot of complexity. 00:41:50.935 --> 00:41:54.060 You know what, I'm just going to go ahead and give myself a variable called 00:41:54.060 --> 00:41:56.340 words and declare it as a set. 00:41:56.340 --> 00:41:58.920 So recall that a set is just a collection of values 00:41:58.920 --> 00:42:01.140 that handles duplicates for you. 00:42:01.140 --> 00:42:02.820 And frankly, that's all I really need.
00:42:02.820 --> 00:42:05.730 I need to be able to store all of the words in a dictionary 00:42:05.730 --> 00:42:09.180 and just throw them into a set, so that there are no duplicate values 00:42:09.180 --> 00:42:13.020 and I can just check, is one word in the set or is it not. 00:42:13.020 --> 00:42:15.768 Well, let's go ahead now and load words into that set. 00:42:15.768 --> 00:42:18.060 I'm going to go ahead and define a function called load 00:42:18.060 --> 00:42:20.160 that takes the name of a file to open. 00:42:20.160 --> 00:42:22.650 And here is admittedly some new syntax. 00:42:22.650 --> 00:42:27.760 So thus far, we've only typed code into the file itself. 00:42:27.760 --> 00:42:30.300 In fact, the most striking difference thus far, 00:42:30.300 --> 00:42:33.480 dare I say, about Python versus C, is that I have never 00:42:33.480 --> 00:42:36.307 once even written a main function. 00:42:36.307 --> 00:42:37.890 And that, too, is a feature of Python. 00:42:37.890 --> 00:42:39.600 If you want to write a program, you don't 00:42:39.600 --> 00:42:43.200 have to bother writing your default code in a function called main. 00:42:43.200 --> 00:42:44.460 Just start writing your code. 00:42:44.460 --> 00:42:46.920 And that's how we were able to get hello, world 00:42:46.920 --> 00:42:50.850 down from this many lines of code in C to one line in Python. 00:42:50.850 --> 00:42:52.530 We didn't even need to have main. 00:42:52.530 --> 00:42:57.060 But if I want to define my own functions, it turns out in Python, 00:42:57.060 --> 00:43:01.080 you use the keyword def for define, then you put the name of the function, 00:43:01.080 --> 00:43:04.680 and then in parentheses, like in C, you put the names of the variables 00:43:04.680 --> 00:43:07.260 or parameters that you want the function to take. 00:43:07.260 --> 00:43:09.600 You don't have to specify data types, though. 00:43:09.600 --> 00:43:13.390 And again, we don't use curly braces; we instead use a colon.
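Here is a small sketch of the two ideas just introduced: a set silently discards duplicates and answers "is this value in here?" with in, and def defines a function with a colon and indentation instead of curly braces, with no declared types. The function name add_words is hypothetical, for illustration only.

```python
words = set()


def add_words(new_words):
    # No parameter or return types are declared, unlike in C
    for word in new_words:
        # Calling a method on the global set doesn't require the `global` keyword,
        # because we never reassign `words` itself
        words.add(word)  # adding a duplicate has no effect


add_words(["apple", "banana", "apple"])
print(len(words))         # 2, because the second "apple" was a duplicate
print("apple" in words)   # True
print("cherry" in words)  # False
```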
00:43:13.390 --> 00:43:16.980 So this says, hey, Python, give me a function called load that 00:43:16.980 --> 00:43:19.410 takes an argument called dictionary. 00:43:19.410 --> 00:43:21.250 And what should this function do? 00:43:21.250 --> 00:43:23.670 Well, the purpose of the load function in speller 00:43:23.670 --> 00:43:25.650 was to load each word from the dictionary 00:43:25.650 --> 00:43:27.517 and somehow put it into your hash table. 00:43:27.517 --> 00:43:30.600 I'm going to go ahead and do the same-- read each word from the dictionary 00:43:30.600 --> 00:43:33.752 and put it into this so-called set, my variable called words. 00:43:33.752 --> 00:43:36.960 So I'm going to go ahead and open the file, which I can do with this function 00:43:36.960 --> 00:43:37.590 here. 00:43:37.590 --> 00:43:39.730 In Python, you don't use fopen. 00:43:39.730 --> 00:43:41.490 You just use a function called open. 00:43:41.490 --> 00:43:45.630 And I'm going to assign the return value of open to a variable called file. 00:43:45.630 --> 00:43:47.790 But I could call that anything I want. 00:43:47.790 --> 00:43:49.710 This is where Python gets really cool. 00:43:49.710 --> 00:43:52.560 Recall that reading the lines 00:43:52.560 --> 00:43:55.920 from the file in C was kind of arduous, right? 00:43:55.920 --> 00:43:59.790 You had to use fread or some other function 00:43:59.790 --> 00:44:02.250 in order to read character after character 00:44:02.250 --> 00:44:04.232 after character, one line at a time. 00:44:04.232 --> 00:44:05.940 Well, here in Python, you know what, if I 00:44:05.940 --> 00:44:08.280 want to iterate over all the lines in the file, 00:44:08.280 --> 00:44:10.770 I'll just say for line in file. 00:44:10.770 --> 00:44:15.720 This is going to automatically give me a for loop that 00:44:15.720 --> 00:44:22.050 assigns the variable line to each successive line in the file for me.
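To see what for line in file hands you on each iteration, this sketch uses io.StringIO, an in-memory object that behaves like a file returned by open(); that substitution is an assumption made so the example runs without a dictionary file on disk, and the two words are hypothetical contents. Notice that each line still ends in "\n", which is exactly the problem the next step addresses.

```python
import io

# In-memory stand-in for a file returned by open() (hypothetical contents)
file = io.StringIO("cat\ncaterpillar\n")

lines = []
for line in file:
    # Each iteration yields one whole line, trailing newline included
    lines.append(line)

print(lines)  # ['cat\n', 'caterpillar\n']
```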
00:44:22.050 --> 00:44:25.140 It will figure out where all of those lines are. 00:44:25.140 --> 00:44:27.210 What do I want to do with each line? 00:44:27.210 --> 00:44:31.410 Well, I want to go ahead and add to my set of words that line. 00:44:31.410 --> 00:44:33.600 Insofar as each word-- 00:44:33.600 --> 00:44:39.660 each line represents a word, I just want to add to my global variable words 00:44:39.660 --> 00:44:40.510 that line. 00:44:40.510 --> 00:44:42.300 And that's not quite right, because what's 00:44:42.300 --> 00:44:45.000 at the end of every line in my file? 00:44:45.000 --> 00:44:48.600 Every line in my file by definition has a backslash n, right? 00:44:48.600 --> 00:44:51.090 That is why all of the words in the big dictionary 00:44:51.090 --> 00:44:53.020 we gave you are one per line. 00:44:53.020 --> 00:44:57.150 So how do you get rid of the new line at the end of a string? 00:44:57.150 --> 00:45:01.470 Well, in C, my God, we would have to use malloc to make a copy, 00:45:01.470 --> 00:45:04.860 and then move all of the characters over, and then shorten it a little bit 00:45:04.860 --> 00:45:06.480 by getting rid of the backslash n. 00:45:06.480 --> 00:45:07.050 Uh-uh. 00:45:07.050 --> 00:45:12.960 In Python, if you want to strip off the new line at the end of a string, 00:45:12.960 --> 00:45:15.030 just do rstrip. 00:45:15.030 --> 00:45:18.510 To strip characters means by default to strip off white space. 00:45:18.510 --> 00:45:21.960 White space includes the space bar, the tab character, and backslash n. 00:45:21.960 --> 00:45:25.320 And so if you want to take each line and throw away 00:45:25.320 --> 00:45:30.150 the trailing new line at the end of it, you can simply say line.rstrip. 00:45:30.150 --> 00:45:33.120 And this is where strings again in Python are powerful. 
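A quick sketch of rstrip on its own: with no arguments it removes trailing whitespace (spaces, tabs, and "\n"), and only from the right, so any leading spaces survive. If you wanted to strip only newlines, rstrip also accepts the characters to remove, as in rstrip("\n").

```python
line = "caterpillar\n"
word = line.rstrip()          # strips trailing whitespace, including "\n"
print(repr(word))             # 'caterpillar'

messy = "  cat \t\n"
print(repr(messy.rstrip()))   # '  cat' (leading spaces are untouched)

print(repr("cat\n".rstrip("\n")))  # 'cat', stripping only newline characters
```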
00:45:33.120 --> 00:45:37.200 Because they are their own data type, they have inside of them, 00:45:37.200 --> 00:45:42.810 not only all of the characters composing the string, but also functions, 00:45:42.810 --> 00:45:46.080 like rstrip which strips from the end of the line 00:45:46.080 --> 00:45:48.210 any white space that might be there. 00:45:48.210 --> 00:45:50.370 You know what, after this I think I'm done. 00:45:50.370 --> 00:45:52.740 I'm just going to go ahead and close the file, 00:45:52.740 --> 00:45:55.270 and I'm going to go ahead and return True. 00:45:55.270 --> 00:45:56.010 So that's it. 00:45:56.010 --> 00:45:58.110 That's the load function in Python. 00:45:58.110 --> 00:46:01.200 Open the dictionary, for each line in the file 00:46:01.200 --> 00:46:04.890 add it to your global variable, close the file, return True. 00:46:04.890 --> 00:46:08.910 I mean, I'm pretty sure that my code is probably several lines, and certainly 00:46:08.910 --> 00:46:11.640 many hours, shorter than your code might have 00:46:11.640 --> 00:46:13.380 been for implementing that as well. 00:46:13.380 --> 00:46:14.700 Well, what about checking? 00:46:14.700 --> 00:46:16.500 Maybe the complexity is just elsewhere. 00:46:16.500 --> 00:46:18.292 Well, let me go ahead and define a function 00:46:18.292 --> 00:46:22.380 called check that takes a specific word as input as its argument. 00:46:22.380 --> 00:46:26.760 And then I'm just going to check if that given word is in my set of words. 00:46:26.760 --> 00:46:28.710 Well, it turns out in C you would probably 00:46:28.710 --> 00:46:30.752 have to use a for loop or a while loop, and you'd 00:46:30.752 --> 00:46:32.910 have to iterate over the whole list of words 00:46:32.910 --> 00:46:35.800 that you've loaded using binary search or linear search or the like. 00:46:35.800 --> 00:46:39.000 Ugh, I'm so past that at this point so many weeks in. 
00:46:39.000 --> 00:46:48.360 I'm just going to say, if word in words, go ahead and return True, else return 00:46:48.360 --> 00:46:49.500 False. 00:46:49.500 --> 00:46:52.410 And that now is my implementation of check. 00:46:52.410 --> 00:46:53.730 Now, it's a little buggy. 00:46:53.730 --> 00:46:55.140 And I will fix this. 00:46:55.140 --> 00:46:56.460 Does anyone spot the bug? 00:46:56.460 --> 00:47:00.300 Even if you've never seen Python before, but having spent hours implementing 00:47:00.300 --> 00:47:07.950 your own version of check, is there some step I'm missing logically? 00:47:07.950 --> 00:47:10.340 There is a bug here. 00:47:10.340 --> 00:47:13.640 Does anyone spot what I'm not doing that you probably 00:47:13.640 --> 00:47:19.280 did do when checking if a given word is in fact in the dictionary? 00:47:19.280 --> 00:47:22.030 BRIAN: A couple of people are commenting on case sensitivity. 00:47:22.030 --> 00:47:23.530 DAVID MALAN: Yeah, case sensitivity. 00:47:23.530 --> 00:47:26.180 So odds are, in your implementation in C you probably 00:47:26.180 --> 00:47:29.870 forced the word to all uppercase, or you forced it to all lowercase. 00:47:29.870 --> 00:47:33.170 Totally doable, but you probably had to do it like character for character. 00:47:33.170 --> 00:47:36.380 You might have had to copy the input using malloc, or putting it 00:47:36.380 --> 00:47:38.240 into an array character for character, then 00:47:38.240 --> 00:47:43.370 using a toupper or tolower to capitalize or lowercase each individual letter. 00:47:43.370 --> 00:47:46.620 Ugh, like, that would take forever, as indeed it might have. 00:47:46.620 --> 00:47:50.180 So you know what, if you want to take a given word and lowercase it, 00:47:50.180 --> 00:47:51.770 just say word.lower. 
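Here is a sketch of the bug and its fix side by side, assuming (as with problem set 5's dictionaries) that every word in the dictionary is stored in lowercase; the two-word dictionary and the name check_buggy are hypothetical. Note too that the if/else returning True or False can collapse into returning the result of the in test directly.

```python
words = {"cat", "caterpillar"}  # assumed: dictionary words are all lowercase


def check_buggy(word):
    # Case-sensitive: "Cat" is a different string from "cat"
    return word in words


def check(word):
    # Lowercase the input first, so "Cat" and "CAT" match "cat"
    return word.lower() in words


print(check_buggy("Cat"))  # False
print(check("Cat"))        # True
```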
00:47:51.770 --> 00:47:54.920 And Python will take care of all of those steps of iterating 00:47:54.920 --> 00:47:58.730 over every character, changing each one to lowercase, and returning to you 00:47:58.730 --> 00:48:02.000 the new result. And indeed, this now, I would think, 00:48:02.000 --> 00:48:05.390 is consistent with what you did in your implementation as well. 00:48:05.390 --> 00:48:06.530 Well, how about size? 00:48:06.530 --> 00:48:09.020 Well, in size recall that you had to define 00:48:09.020 --> 00:48:13.940 a function that doesn't take any inputs but returns the number of words 00:48:13.940 --> 00:48:15.630 in the set of words. 00:48:15.630 --> 00:48:17.630 And I'm going to go ahead here-- and actually, I 00:48:17.630 --> 00:48:19.850 got my indentation slightly off here. 00:48:19.850 --> 00:48:22.790 Let me fix this real fast. 00:48:22.790 --> 00:48:25.400 If you want to return the size of your dictionary, 00:48:25.400 --> 00:48:27.660 or really the number of words in your set, 00:48:27.660 --> 00:48:31.160 you can just return the length of that global variable words. 00:48:31.160 --> 00:48:32.130 Done. 00:48:32.130 --> 00:48:35.660 And lastly, if you want to unload the dictionary, 00:48:35.660 --> 00:48:37.220 let me go ahead and unload things. 00:48:37.220 --> 00:48:38.600 It doesn't take any input either. 00:48:38.600 --> 00:48:41.690 Honestly, because I've not done any equivalent of malloc, 00:48:41.690 --> 00:48:43.880 I've not done any memory management-- why? 00:48:43.880 --> 00:48:46.010 You don't have to in Python-- 00:48:46.010 --> 00:48:51.560 I can literally just return True in all cases, because my code is undoubtedly 00:48:51.560 --> 00:48:55.190 correct, because I didn't have to bother with pointers and addresses and memory 00:48:55.190 --> 00:48:55.920 management.
00:48:55.920 --> 00:48:58.962 So all of the stress that might have been induced over the past few weeks 00:48:58.962 --> 00:49:01.880 as you understood the lower level details of memory management now 00:49:01.880 --> 00:49:09.710 go away, not because it's not happening underneath the hood, 00:49:09.710 --> 00:49:12.080 but because Python is doing it for you. 00:49:12.080 --> 00:49:14.300 And I did spot one bug here actually. 00:49:14.300 --> 00:49:16.760 Notice I kind of relapsed into C code here. 00:49:16.760 --> 00:49:20.660 What I should have said here is it's actually file.close. 00:49:20.660 --> 00:49:25.130 So here when I close the file in load, I actually have to call file.close, 00:49:25.130 --> 00:49:30.360 because now that function close is associated with that variable for me. 00:49:30.360 --> 00:49:33.380 So again, there is memory management happening. 00:49:33.380 --> 00:49:37.550 Malloc and free or realloc are all happening sort of for you 00:49:37.550 --> 00:49:38.390 underneath the hood. 00:49:38.390 --> 00:49:40.310 But what Python the language is doing for 00:49:40.310 --> 00:49:42.523 you now is managing all of that for you. 00:49:42.523 --> 00:49:45.440 That's what you get by using a so-called higher-level language instead 00:49:45.440 --> 00:49:47.030 of a lower-level language. 
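Putting the pieces together, here is one way dictionary.py might look with the file.close() fix applied. It follows the lecture's description of check, load, size, and unload, but it is a sketch rather than the course's actual file; the demo at the bottom writes a tiny temporary dictionary (hypothetical contents) so the module can run on its own, and the os and tempfile imports exist only for that demo.

```python
# dictionary.py: a sketch of problem set 5's four functions in Python
import os        # used only by the standalone demo below
import tempfile  # used only by the standalone demo below

words = set()


def check(word):
    """Return True if word is in the dictionary, ignoring case."""
    return word.lower() in words


def load(dictionary):
    """Load every word (one per line) from the file named dictionary."""
    file = open(dictionary, "r")
    for line in file:
        words.add(line.rstrip())
    file.close()
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free by hand: Python manages the set's memory for us."""
    return True


if __name__ == "__main__":
    # Write a tiny dictionary to a temporary file so the sketch runs standalone
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("cat\ncaterpillar\n")
        path = tmp.name

    load(path)
    os.remove(path)
    print(size())        # 2
    print(check("Cat"))  # True
    print(check("dog"))  # False
```

A common alternative to calling file.close() yourself is with open(dictionary) as file:, which closes the file automatically when the block ends, even if an error occurs.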
00:49:47.030 --> 00:49:49.490 You get more features, and in turn in this case, 00:49:49.490 --> 00:49:52.610 you get all of those problems taken care of for you, 00:49:52.610 --> 00:49:55.430 so that you and I can focus on building our spell checker, 00:49:55.430 --> 00:49:58.190 so you and I can focus on building our Instagram filters, 00:49:58.190 --> 00:50:02.078 not on allocating memory, copying strings, uppercasing things, which 00:50:02.078 --> 00:50:05.120 honestly, while it might have been fun and very gratifying the first time 00:50:05.120 --> 00:50:08.390 you got those things working, programming would very quickly become 00:50:08.390 --> 00:50:11.150 the most tedious thing in the world if any time you 00:50:11.150 --> 00:50:16.320 want to write a program you have to think and write code at that low level. 00:50:16.320 --> 00:50:16.820 All right. 00:50:16.820 --> 00:50:20.480 Let me go ahead and really cross my fingers that I didn't screw up here, 00:50:20.480 --> 00:50:22.200 and go ahead and run this code. 00:50:22.200 --> 00:50:25.088 So I'm going to go ahead and run python of speller.py-- 00:50:25.088 --> 00:50:28.130 which, admittedly, I wrote in advance, because just like the distribution 00:50:28.130 --> 00:50:32.248 code in speller, we wrote speller.c for you, we wrote speller.py in advance. 00:50:32.248 --> 00:50:34.040 But we won't look at the internals of that. 00:50:34.040 --> 00:50:35.832 I'm going to go ahead and test this on, how 00:50:35.832 --> 00:50:37.880 about something big like Shakespeare. 00:50:37.880 --> 00:50:40.070 And I'm going to cross my fingers here. 00:50:40.070 --> 00:50:41.810 And so far so good. 00:50:41.810 --> 00:50:44.030 The words are kind of flying by. 00:50:44.030 --> 00:50:46.273 I'm going to assume they're correct. 00:50:46.273 --> 00:50:47.690 Hopefully we'll get to the output. 00:50:47.690 --> 00:50:50.870 And it looks like, yeah, I think I see some familiar numbers here.
00:50:50.870 --> 00:50:53.450 I've got 143,091 words. 00:50:53.450 --> 00:50:57.330 And then down here, the total time involved was just under 1 second. 00:50:57.330 --> 00:50:58.790 So that's pretty darn fast. 00:50:58.790 --> 00:51:01.010 And to be clear, I'm using my Mac instead of the IDE, 00:51:01.010 --> 00:51:05.150 so my numbers might be a little different than in the cloud, but 0.9 00:51:05.150 --> 00:51:05.840 seconds. 00:51:05.840 --> 00:51:09.560 But you know what, out of curiosity, let me open up a different tab real quick, 00:51:09.560 --> 00:51:13.340 and let me go ahead and make speller from problem set 5. 00:51:13.340 --> 00:51:17.240 So I brought in advance our own implementation of speller, the staff 00:51:17.240 --> 00:51:21.530 solution, written in C in dictionary.c and speller.c, 00:51:21.530 --> 00:51:23.330 and I've just compiled it with make. 00:51:23.330 --> 00:51:29.040 And let me go ahead and run ./speller using the same text on Shakespeare. 00:51:29.040 --> 00:51:31.310 So again, I just ran the Python version, now 00:51:31.310 --> 00:51:37.140 I want to run the C version using the staff's implementation. 00:51:37.140 --> 00:51:37.640 All right. 00:51:37.640 --> 00:51:38.610 Wow. 00:51:38.610 --> 00:51:42.140 All right, it flew by way faster, kind of twice as fast. 00:51:42.140 --> 00:51:47.090 And notice, even though the numbers are the same up above, the times are not. 00:51:47.090 --> 00:51:51.290 My C version took 0.52 seconds, so half a second. 00:51:51.290 --> 00:51:55.310 My Python version took 0.9, or roughly 1 second. 00:51:55.310 --> 00:52:01.220 So it would seem that my C version is faster, my Python version is slower. 00:52:01.220 --> 00:52:04.850 Why might that be? 00:52:04.850 --> 00:52:07.310 Why might that be? 
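speller reports how long each phase takes. The sketch below shows one way to time a Python function yourself with time.perf_counter(); it is not necessarily how speller.py measures time (we don't look at its internals), and the helper name time_it and the synthetic word set are hypothetical. Actual numbers depend on your machine, so no output is shown.

```python
import time


def time_it(func, *args):
    """Call func(*args) once; return its result and the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# A synthetic "dictionary" of 100,000 made-up words
words = {f"word{i}" for i in range(100_000)}

found, seconds = time_it(lambda w: w in words, "word99999")
print(found, f"{seconds:.6f} seconds")
```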
00:52:07.310 --> 00:52:10.550 Because I'm kind of disappointed if we just spent all this time 00:52:10.550 --> 00:52:13.130 preaching the virtues of Python, and yet here we 00:52:13.130 --> 00:52:15.410 are writing worse code, in some sense. 00:52:15.410 --> 00:52:17.330 Santiago? 00:52:17.330 --> 00:52:21.270 AUDIENCE: Could it be because C, even though it's low level, 00:52:21.270 --> 00:52:24.660 explicitly tells the computer what to do, 00:52:24.660 --> 00:52:29.492 and so that makes it a little faster, whilst in Python it all 00:52:29.492 --> 00:52:31.700 happens like underneath the hood, as you were saying, 00:52:31.700 --> 00:52:33.560 so that could make it a little slower. 00:52:33.560 --> 00:52:34.310 DAVID MALAN: Yeah. 00:52:34.310 --> 00:52:36.980 In Python, you have a general-purpose solution 00:52:36.980 --> 00:52:39.787 to the problem of memory management, and capitalization, 00:52:39.787 --> 00:52:41.870 and all of these other features, that we ourselves 00:52:41.870 --> 00:52:45.890 have to implement in C. Python has general-purpose implementations 00:52:45.890 --> 00:52:46.820 of all of those. 00:52:46.820 --> 00:52:50.750 But there's a price you pay by using someone else's code to implement 00:52:50.750 --> 00:52:53.270 all of those things for you. 00:52:53.270 --> 00:52:57.140 And you pay an even greater price by using the type of language 00:52:57.140 --> 00:52:58.655 that Python is, in a sense. 00:52:58.655 --> 00:53:00.530 So there's been this other salient difference 00:53:00.530 --> 00:53:03.020 between using C and using Python. 00:53:03.020 --> 00:53:08.150 When I wrote C code, I would compile my code from source code 00:53:08.150 --> 00:53:09.080 into machine code. 00:53:09.080 --> 00:53:11.450 And recall that machine code is the 0's and 1's understood 00:53:11.450 --> 00:53:14.750 by the computer's brain, the so-called CPU, or Central Processing Unit.
00:53:14.750 --> 00:53:17.750 We always had to compile our code every time we changed the source code. 00:53:17.750 --> 00:53:21.150 And then we did like ./hello to run the program. 00:53:21.150 --> 00:53:25.860 But in every demo thus far in Python, I haven't used make or clang. 00:53:25.860 --> 00:53:32.690 I have used not ./hello, but rather python space the name of the program. 00:53:32.690 --> 00:53:33.870 And why is that? 00:53:33.870 --> 00:53:36.560 Well, it turns out that Python is often implemented as what 00:53:36.560 --> 00:53:38.990 we describe as an interpreter. 00:53:38.990 --> 00:53:42.120 So Python is not only a language like we've been writing, 00:53:42.120 --> 00:53:44.460 it's also a program unto itself. 00:53:44.460 --> 00:53:48.800 The Python program I keep running is an identically named program 00:53:48.800 --> 00:53:51.330 that understands the Python language. 00:53:51.330 --> 00:53:56.150 And what's happening, though, is that by using an interpreter, so to speak, 00:53:56.150 --> 00:53:59.810 to run my programs you're incurring some amount of overhead. 00:53:59.810 --> 00:54:01.490 You're paying a performance price. 00:54:01.490 --> 00:54:02.252 Why? 00:54:02.252 --> 00:54:04.710 Well, computers, recall from week 0, at the end of the day, 00:54:04.710 --> 00:54:06.380 only understand 0's and 1's. 00:54:06.380 --> 00:54:08.390 That's what makes them tick. 00:54:08.390 --> 00:54:11.240 But I have not outputted any 0's and 1's. 00:54:11.240 --> 00:54:13.790 I, the human, have only been writing Python. 00:54:13.790 --> 00:54:18.860 So there needs to be some kind of translation from my Python code, 00:54:18.860 --> 00:54:23.120 in this English-like syntax, into what the computer itself understands.
00:54:23.120 --> 00:54:25.670 And if you're not going to go through the effort of compiling 00:54:25.670 --> 00:54:27.963 your code every time you make a change, but instead 00:54:27.963 --> 00:54:30.380 you're just going to run your code through an interpreter, 00:54:30.380 --> 00:54:33.470 as is the norm in the Python world, you're 00:54:33.470 --> 00:54:37.760 going to pay a price, because someone had to implement a translator for you. 00:54:37.760 --> 00:54:40.610 And in fact, there's formal terminology for this. 00:54:40.610 --> 00:54:45.150 In the world of Python we have, for instance, 00:54:45.150 --> 00:54:47.240 a picture that looks more like this. 00:54:47.240 --> 00:54:50.630 Whereas in the world of C, we would actually take our source code as input 00:54:50.630 --> 00:54:52.970 and first output machine code, 00:54:52.970 --> 00:54:56.480 and then run the machine code, in the world of Python thus far, 00:54:56.480 --> 00:54:59.450 I'm writing source code, and then I'm immediately running it. 00:54:59.450 --> 00:55:01.940 I'm not compiling it into 0's and 1's in advance. 00:55:01.940 --> 00:55:05.150 I'm trusting that there's a program, coincidentally called Python, 00:55:05.150 --> 00:55:09.920 whose purpose in life is to translate that code for me 00:55:09.920 --> 00:55:12.470 into something the computer does understand. 00:55:12.470 --> 00:55:15.560 And what does that actually mean in real terms? 00:55:15.560 --> 00:55:17.960 Well, it means that if I were to think back 00:55:17.960 --> 00:55:22.250 to an algorithm like this, which is probably cryptic to many of you, 00:55:22.250 --> 00:55:25.670 though not all, it's a Spanish algorithm 00:55:25.670 --> 00:55:27.890 for searching a phone book for someone. 00:55:27.890 --> 00:55:30.380 And suppose that I don't speak Spanish at all.
00:55:30.380 --> 00:55:35.000 I might, ideally, compile this program, this algorithm, into something 00:55:35.000 --> 00:55:39.410 I do understand by using a compiler that translates Spanish to English. 00:55:39.410 --> 00:55:43.070 Like voila, this English version, much better reading and understanding this, 00:55:43.070 --> 00:55:45.050 I can execute this algorithm pretty fast, 00:55:45.050 --> 00:55:46.700 because I'm pretty good at English. 00:55:46.700 --> 00:55:50.300 But if you only give me the Spanish version, the source code, 00:55:50.300 --> 00:55:54.725 and you require that I translate it or interpret it line by line, 00:55:54.725 --> 00:55:56.600 honestly that's really going to slow me down, 00:55:56.600 --> 00:55:59.660 because it's like me having to go take like a Spanish dictionary 00:55:59.660 --> 00:56:01.490 and look up every word-- 00:56:01.490 --> 00:56:02.990 "Recoge guia telefonica." 00:56:02.990 --> 00:56:04.912 All right, well, what's "recoge"? 00:56:04.912 --> 00:56:05.870 I have to look that up. 00:56:05.870 --> 00:56:07.880 What's "guia", what's "telefonica"? 00:56:07.880 --> 00:56:08.630 Oh, OK. 00:56:08.630 --> 00:56:09.710 Pick up phone book. 00:56:09.710 --> 00:56:10.230 Got that. 00:56:10.230 --> 00:56:10.730 Step one. 00:56:10.730 --> 00:56:11.397 What's step two? 00:56:11.397 --> 00:56:13.580 "Abre a la mitad de guia telefonica." 00:56:13.580 --> 00:56:16.640 So "open to the middle"-- well, wait, I don't know that. 00:56:16.640 --> 00:56:17.420 Spoiler. 00:56:17.420 --> 00:56:18.738 What does that mean, "abre"? 00:56:18.738 --> 00:56:20.030 All right, let me look that up. 00:56:20.030 --> 00:56:20.900 And it means "open." 00:56:20.900 --> 00:56:23.870 "A la mitad," that means "to the middle." 00:56:23.870 --> 00:56:26.300 "De guia telefonica," "of the phone book." 00:56:26.300 --> 00:56:28.710 Oh, that means "open to the middle of the phone book." 00:56:28.710 --> 00:56:31.040 So I'm struggling to go back and forth here, clearly. 
00:56:31.040 --> 00:56:32.870 But it's clearly a slower process. 00:56:32.870 --> 00:56:36.110 And if I keep going, "Ve la pagina," "Look at the page," 00:56:36.110 --> 00:56:39.260 looking up, translating every line, it's undoubtedly 00:56:39.260 --> 00:56:40.890 going to slow down the process. 00:56:40.890 --> 00:56:43.160 And so that's effectively what's happening for us 00:56:43.160 --> 00:56:44.900 when we run these Python programs. 00:56:44.900 --> 00:56:48.350 There is a translator, a man in the middle, so to speak, 00:56:48.350 --> 00:56:51.420 that's looking at your source code and reading it top to bottom, 00:56:51.420 --> 00:56:55.550 left to right, and essentially translating each line respectively 00:56:55.550 --> 00:56:59.340 into the corresponding code that the computer understands. 00:56:59.340 --> 00:57:01.550 So the upside of this is that, thankfully, we 00:57:01.550 --> 00:57:02.930 don't have to run make or clang. 00:57:02.930 --> 00:57:04.800 We don't have to compile our code anymore. 00:57:04.800 --> 00:57:07.480 Like, how many people here have made a change 00:57:07.480 --> 00:57:11.830 to an earlier pset in C, forgotten to save the file but you rerun the-- 00:57:11.830 --> 00:57:14.680 sorry, you forgot to recompile the file, and you rerun it, 00:57:14.680 --> 00:57:16.450 and the program obviously has not changed 00:57:16.450 --> 00:57:19.840 because you haven't actually, not only saved but recompiled it? 00:57:19.840 --> 00:57:22.720 So that stupid, annoying human step is gone. 00:57:22.720 --> 00:57:25.960 In the world of Python, if you change your file, go ahead and just rerun it, 00:57:25.960 --> 00:57:26.950 reinterpret it. 00:57:26.950 --> 00:57:28.417 You can save that step. 00:57:28.417 --> 00:57:31.000 But the price you're going to pay is a little bit of overhead. 
00:57:31.000 --> 00:57:34.660 And indeed, we see that here in terms of my Python version 00:57:34.660 --> 00:57:37.570 taking roughly 1 second to spellcheck Shakespeare, 00:57:37.570 --> 00:57:41.450 and my C version taking only one half of a second. 00:57:41.450 --> 00:57:44.320 So here, too, I promised in past weeks this theme of trade-offs. 00:57:44.320 --> 00:57:47.320 This is so prevalent in the world of computer science and programming, 00:57:47.320 --> 00:57:48.730 and frankly in the real world. 00:57:48.730 --> 00:57:51.940 Any time you make some improvement or gain some benefit, 00:57:51.940 --> 00:57:53.860 odds are you are paying some price. 00:57:53.860 --> 00:57:57.610 Maybe it's time, maybe it's space, maybe it's money, maybe it's complexity, 00:57:57.610 --> 00:57:58.840 maybe it's anything else. 00:57:58.840 --> 00:58:01.720 There's this perpetual trade-off of resources. 00:58:01.720 --> 00:58:03.670 And being a good programmer, ultimately, is 00:58:03.670 --> 00:58:06.520 about finding those inflection points and knowing ultimately 00:58:06.520 --> 00:58:09.760 what tools to use for the trade. 00:58:09.760 --> 00:58:12.250 All right, let's go ahead here, take a 5-minute break. 00:58:12.250 --> 00:58:14.833 And when we come back, we'll look at other features of Python, 00:58:14.833 --> 00:58:19.330 we'll end ultimately today with some really powerful capabilities. 00:58:19.330 --> 00:58:21.130 Back in five. 00:58:21.130 --> 00:58:21.850 All right. 00:58:21.850 --> 00:58:22.540 We are back. 00:58:22.540 --> 00:58:24.730 And first, a retraction if I may. 00:58:24.730 --> 00:58:28.330 Brian kindly pointed out that my answer to Olivia and Noah's follow-up question 00:58:28.330 --> 00:58:31.300 unfortunately missed the mark, as I was doing things on the fly instead 00:58:31.300 --> 00:58:32.890 of reading the documentation. 
00:58:32.890 --> 00:58:36.100 So let me recall for us this example here, 00:58:36.100 --> 00:58:38.990 wherein we had the range function returning three values. 00:58:38.990 --> 00:58:43.330 So that code is correct; that gives us the values 0, 1, and 2. 00:58:43.330 --> 00:58:46.450 But what I think Olivia asked was, if you wanted to skip values, 00:58:46.450 --> 00:58:49.155 and for instance count by twos, how do we do that? 00:58:49.155 --> 00:58:51.280 And I unfortunately screwed up the syntax for that, 00:58:51.280 --> 00:58:54.610 providing only two inputs to range instead of three, 00:58:54.610 --> 00:58:56.000 as would be needed here. 00:58:56.000 --> 00:58:58.810 So for instance, suppose that we wanted to print out 00:58:58.810 --> 00:59:02.500 all of the numbers between 0 and 100, inclusive, 00:59:02.500 --> 00:59:06.520 but skipping every other-- so, 0, 2, 4, 6, 8, so all the even 00:59:06.520 --> 00:59:09.230 numbers on up through 100. 00:59:09.230 --> 00:59:12.430 We would actually want to do something like this instead. 00:59:12.430 --> 00:59:17.200 We would say, for i in range of 0 comma 101 comma 2. 00:59:17.200 --> 00:59:18.210 Why is that? 00:59:18.210 --> 00:59:20.600 Well, we'll pull up the documentation in just a moment, 00:59:20.600 --> 00:59:23.530 but 0 is where you want to start counting. 00:59:23.530 --> 00:59:26.770 The second value, 101, is where you want to stop counting. 00:59:26.770 --> 00:59:29.890 But it is by definition exclusive, so we have 00:59:29.890 --> 00:59:31.940 to go 1 past the value we care about. 00:59:31.940 --> 00:59:35.170 And then the 2, the third argument, is how much 00:59:35.170 --> 00:59:40.090 you want to increment by each time, from 0 to 2 to 4 to 6 to 8, 00:59:40.090 --> 00:59:41.920 on up through 100. 00:59:41.920 --> 00:59:44.290 So how could I have figured this out in advance 00:59:44.290 --> 00:59:45.880 rather than embarrassing myself now?
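A quick way to convince yourself that the stop value is exclusive is to compare a stop of 100 with a stop of 101 while stepping by 2, as in this short sketch.

```python
# The stop value is exclusive: with stop=100, 100 itself is never produced
print(list(range(0, 100, 2))[-1])  # 98

# So go one past the last value you want
evens = list(range(0, 101, 2))
print(evens[:4], evens[-1], len(evens))  # [0, 2, 4, 6] 100 51
```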
00:59:45.880 --> 00:59:48.880 Well, it turns out there is official documentation for Python. 00:59:48.880 --> 00:59:50.710 And we'll always link this to you. 00:59:50.710 --> 00:59:53.060 And here there is this search box at the very top. 00:59:53.060 --> 00:59:54.880 And you can see that during the break I was searching 00:59:54.880 --> 00:59:56.320 for the documentation for range. 00:59:56.320 --> 00:59:59.362 And sure enough, if I search for the range documentation, at first glance 00:59:59.362 --> 01:00:01.487 it might seem kind of overwhelming, because there's 01:00:01.487 --> 01:00:04.450 a lot of mentions of something like range in the documentation. 01:00:04.450 --> 01:00:07.460 Fortunately, the first result here is the one we want. 01:00:07.460 --> 01:00:10.300 And if I click on that, you'll see some documentation that's 01:00:10.300 --> 01:00:12.520 a little cryptic at first glance. 01:00:12.520 --> 01:00:15.040 But what's interesting about this is that range 01:00:15.040 --> 01:00:16.332 comes in two different flavors. 01:00:16.332 --> 01:00:18.207 And even though I keep calling it a function, 01:00:18.207 --> 01:00:19.880 technically it's what's called a class. 01:00:19.880 --> 01:00:21.130 But more on that another time. 01:00:21.130 --> 01:00:23.330 It behaves for our purposes as a function. 01:00:23.330 --> 01:00:24.880 Notice that there's two lines here. 01:00:24.880 --> 01:00:26.780 And they're similar but different. 01:00:26.780 --> 01:00:30.730 The first one specifies that this range function 01:00:30.730 --> 01:00:33.080 can take one input, the stop value. 01:00:33.080 --> 01:00:35.500 So at what value do you want to stop counting? 01:00:35.500 --> 01:00:40.300 So before, when we did range of 3, it stands to reason that by default, 01:00:40.300 --> 01:00:43.360 if you start counting at 0 and you stop at 3, that will 01:00:43.360 --> 01:00:47.260 get you to use i equals 0, 1, and 2. 
01:00:47.260 --> 01:00:50.500 But there's another flavor of the range function, which 01:00:50.500 --> 01:00:52.420 is not the one that I proposed exists. 01:00:52.420 --> 01:00:56.050 There's another that takes in potentially three arguments, here 01:00:56.050 --> 01:00:57.400 or technically two. 01:00:57.400 --> 01:00:59.080 But it works in the following way. 01:00:59.080 --> 01:01:02.320 When you see syntax like this in Python's documentation, 01:01:02.320 --> 01:01:04.900 this means that the alternate form of range 01:01:04.900 --> 01:01:09.490 takes an argument called start, followed by an argument called stop, 01:01:09.490 --> 01:01:13.660 followed by, optionally, a third argument called step. 01:01:13.660 --> 01:01:17.360 And I know as the reader it's optional, because it's in square brackets here. 01:01:17.360 --> 01:01:20.050 So nothing to do with lists or arrays or anything like this. 01:01:20.050 --> 01:01:21.590 This is just human documentation. 01:01:21.590 --> 01:01:23.530 Anytime you see things in square brackets, 01:01:23.530 --> 01:01:27.050 that tends to imply to the human reader that this is optional. 01:01:27.050 --> 01:01:28.130 So what does that mean? 01:01:28.130 --> 01:01:31.510 Well, notice that there is no flavor of range that 01:01:31.510 --> 01:01:36.100 lets me specify a stop and a step, which I thought there was a moment ago when 01:01:36.100 --> 01:01:37.450 answering Olivia and Noah. 01:01:37.450 --> 01:01:40.170 But rather, there is this three-input version. 01:01:40.170 --> 01:01:43.210 So if I specify I want to start at 0, I want 01:01:43.210 --> 01:01:47.230 to stop at 101, which is just past the 100 I care about, 01:01:47.230 --> 01:01:50.110 and then provide an optional step of 2, this 01:01:50.110 --> 01:01:53.200 will give me a program ultimately that will print out 01:01:53.200 --> 01:01:54.730 all of those even numbers. 01:01:54.730 --> 01:01:55.780 So let me do this. 
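The two documented forms, range(stop) and range(start, stop[, step]), can be checked side by side. This sketch only uses the built-in range and shows why a two-argument call is read as start and stop, never as stop and step.

```python
# range(stop): count from 0 up to, but not including, stop.
print(list(range(3)))         # [0, 1, 2]

# range(start, stop): two arguments mean start and stop, not stop and step.
print(list(range(2, 5)))      # [2, 3, 4]
print(list(range(10, 2)))     # [] because counting up from 10 never reaches 2

# range(start, stop, step): the optional third argument is the step.
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]
```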
01:01:55.780 --> 01:01:58.250 First let me go into a program here. 01:01:58.250 --> 01:01:59.650 I'll call it count.py. 01:01:59.650 --> 01:02:04.030 And I'm going to go ahead and start at 0, go up to but not through 101, 01:02:04.030 --> 01:02:05.350 stepping 2 at a time. 01:02:05.350 --> 01:02:07.200 And this time I'm going to print out i. 01:02:07.200 --> 01:02:09.300 And here, too, another handy feature of Python-- 01:02:09.300 --> 01:02:11.730 no more %s, and also no more %i. 01:02:11.730 --> 01:02:14.640 If you want to print out the value of a variable called i, 01:02:14.640 --> 01:02:18.120 just say print, open paren, i, close paren. 01:02:18.120 --> 01:02:20.565 You don't need another format string as in C. 01:02:20.565 --> 01:02:24.570 Let me go ahead now and run python of count.py, Enter. 01:02:24.570 --> 01:02:26.070 And it scrolled by really fast. 01:02:26.070 --> 01:02:28.350 But notice that it stopped at 100, and if I scroll 01:02:28.350 --> 01:02:30.810 to the beginning it started at 0. 01:02:30.810 --> 01:02:31.615 So my apologies. 01:02:31.615 --> 01:02:33.420 Mea culpa for messing that up earlier. 01:02:33.420 --> 01:02:36.510 But what a wonderful opportunity to introduce the official documentation 01:02:36.510 --> 01:02:39.930 for Python, which will soon become your friend, 01:02:39.930 --> 01:02:42.600 cryptic though it might feel at first glance. 01:02:42.600 --> 01:02:43.360 All right. 01:02:43.360 --> 01:02:45.840 Let's go ahead then and revisit one other program 01:02:45.840 --> 01:02:47.400 that we started with earlier. 01:02:47.400 --> 01:02:50.850 And that program was again this relatively simple Hello program 01:02:50.850 --> 01:02:52.710 that we left off in this state. 01:02:52.710 --> 01:02:56.310 We were using the get_string function from the CS50 library in Python. 01:02:56.310 --> 01:02:59.160 We had a variable called answer that was getting the return 01:02:59.160 --> 01:03:01.020 value of that version of get_string. 
01:03:01.020 --> 01:03:04.620 And we were printing out "hello," comma, so-and-so. 01:03:04.620 --> 01:03:07.620 And we were using that new cryptic feature, but handy, 01:03:07.620 --> 01:03:12.480 known as a format string or an f-string, which just means replace whatever's 01:03:12.480 --> 01:03:14.717 in curly braces with the actual value. 01:03:14.717 --> 01:03:16.800 So let's start to now take off the training wheels 01:03:16.800 --> 01:03:18.660 that we just put on only an hour ago. 01:03:18.660 --> 01:03:20.670 Let's get rid of the CS50 library. 01:03:20.670 --> 01:03:24.210 How can we actually get input in Python without using 01:03:24.210 --> 01:03:26.670 a library from someone like CS50? 01:03:26.670 --> 01:03:28.290 Well, get_string no longer exists. 01:03:28.290 --> 01:03:33.480 But thankfully there is another function we can use called, quite simply, input. 01:03:33.480 --> 01:03:38.730 Input is a function that, quite similar to get_string in both C and Python, 01:03:38.730 --> 01:03:42.030 prompts the user with a phrase, like this one here, "What's your name?"; 01:03:42.030 --> 01:03:45.120 waits for them to type in a value; and as soon as they hit Enter, 01:03:45.120 --> 01:03:48.610 it returns whatever the human has typed in for you. 01:03:48.610 --> 01:03:52.800 So if I go ahead now and rerun this program, python of hello.py, 01:03:52.800 --> 01:03:56.505 after getting rid of the CS50 library and using input instead of get_string, 01:03:56.505 --> 01:03:57.810 what's my name? 01:03:57.810 --> 01:03:58.760 David. 01:03:58.760 --> 01:03:59.820 "Hello," comma, "David." 01:03:59.820 --> 01:04:02.790 So already there now, this is raw, native Python 01:04:02.790 --> 01:04:07.080 code completely unrelated to anything CS50 specific. 
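Here is a minimal sketch of the library-free hello program, using only the built-in input and an f-string. The function name hello and its ask parameter are hypothetical, added only so the greeting can be exercised without someone typing at a keyboard.

```python
def hello(ask=input):
    """Prompt for a name and return the greeting, with no CS50 code involved.

    `ask` defaults to Python's built-in input(); it is a parameter here
    only so the function can be tried without a keyboard.
    """
    answer = ask("What's your name? ")
    return f"hello, {answer}"  # the f-string fills in {answer} with its value
```

In an actual hello.py you would simply call print(hello()) and type your name at the prompt.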
01:04:07.080 --> 01:04:10.260 But now let's go ahead, and let's keep using the CS50 library initially, 01:04:10.260 --> 01:04:13.830 because we'll see very quickly that there are advantages to using it, 01:04:13.830 --> 01:04:15.960 because we do a lot of error checking for you. 01:04:15.960 --> 01:04:19.500 But we'll eventually take those training wheels off entirely as well. 01:04:19.500 --> 01:04:22.290 But notice, indeed, how relatively simple it is to do so. 01:04:22.290 --> 01:04:26.983 Let me go ahead and open up a program that we wrote in advance. 01:04:26.983 --> 01:04:28.650 And I'm going to go ahead and grab this. 01:04:28.650 --> 01:04:31.510 This is available, as always, on the course's website. 01:04:31.510 --> 01:04:35.850 And I'm going to go ahead and open a file called addition0.c, 01:04:35.850 --> 01:04:37.830 which we've actually seen before. 01:04:37.830 --> 01:04:40.200 And I'm going to go ahead and do this fancy thing here 01:04:40.200 --> 01:04:43.350 where, in just a moment, I'm going to split my window so 01:04:43.350 --> 01:04:45.240 that I can see two files at a time. 01:04:45.240 --> 01:04:49.350 And over here I'm going to create a new file, and I'll call this addition.py. 01:04:49.350 --> 01:04:52.620 So that is to say, I'm just going to rearrange my IDE temporarily 01:04:52.620 --> 01:04:56.400 today so that we can see one language on the left, C, and then 01:04:56.400 --> 01:04:58.872 the corresponding language on the right in Python. 01:04:58.872 --> 01:05:01.080 And again, you can download all these examples online 01:05:01.080 --> 01:05:03.220 if you'd like to follow along on your own. 01:05:03.220 --> 01:05:06.820 So if I'm translating this program on the left to this program on the right, 01:05:06.820 --> 01:05:09.810 let's first recall what the program on the left actually did.
01:05:09.810 --> 01:05:13.500 This was a program that prompts the user for x, prompts the user for y, 01:05:13.500 --> 01:05:15.720 and quite simply performs addition on the two. 01:05:15.720 --> 01:05:18.300 So this is week 1 stuff, way back when now. 01:05:18.300 --> 01:05:19.990 Well, let's go ahead and translate this. 01:05:19.990 --> 01:05:22.622 I will use the get_int function from the CS50 library, 01:05:22.622 --> 01:05:25.080 because it's going to make my life a little easier for now. 01:05:25.080 --> 01:05:28.260 I'm going to say from cs50 import get_int. 01:05:28.260 --> 01:05:30.990 I'm going to then go ahead and get an int from the user using 01:05:30.990 --> 01:05:32.820 get_int and prompting them for x. 01:05:32.820 --> 01:05:36.390 I'm going to then go ahead and get an int from the user prompting them for y. 01:05:36.390 --> 01:05:41.700 I'm going to then finally go ahead and, let's say, print out x plus y. 01:05:41.700 --> 01:05:45.690 And let me go ahead down here and run python of addition.py. 01:05:45.690 --> 01:05:50.310 I'm now being prompted for x, let's type in 1, y, let's type in 2, and voila, 01:05:50.310 --> 01:05:52.140 3 is my program here. 01:05:52.140 --> 01:05:53.520 So pretty straightforward. 01:05:53.520 --> 01:05:56.550 Fewer lines of code, because one, I don't have these unnecessary 01:05:56.550 --> 01:05:58.600 includes like stdio.h. 01:05:58.600 --> 01:06:00.355 I don't have any of the curly braces. 01:06:00.355 --> 01:06:02.230 To be fair, I don't have any of the comments. 01:06:02.230 --> 01:06:03.272 So let me write comments. 01:06:03.272 --> 01:06:05.970 In Python, it's going to be a different symbol. 01:06:05.970 --> 01:06:12.960 "Prompt user for x" should be prefixed with a hash symbol now instead of a //. 01:06:12.960 --> 01:06:18.300 I'll go ahead and prompt user for y, and then, how about here, perform addition. 01:06:18.300 --> 01:06:19.890 But even still, it's pretty tight. 
01:06:19.890 --> 01:06:23.965 It's only 10 lines of code with some of those comments there. 01:06:23.965 --> 01:06:26.590 All right, well, what might I do that's a little bit different? 01:06:26.590 --> 01:06:27.990 Well, let's take off the training wheels. 01:06:27.990 --> 01:06:30.740 Let's take off the training wheels and get rid of the CS50 library 01:06:30.740 --> 01:06:32.880 again and get input here. 01:06:32.880 --> 01:06:36.240 Well, if I go ahead and get input here, get input here, 01:06:36.240 --> 01:06:40.230 assigning the values to x and y respectively, I'm going to go ahead now 01:06:40.230 --> 01:06:44.340 and run python of addition.py. 01:06:44.340 --> 01:06:48.540 x will be 1 again, y will be 2 again, and the answer, of course, is-- 01:06:48.540 --> 01:06:49.790 12. 01:06:49.790 --> 01:06:52.560 Well, that's wrong. 01:06:52.560 --> 01:06:55.020 What's going on? 01:06:55.020 --> 01:06:59.420 How did I screw up such a simple program already? 01:06:59.420 --> 01:07:02.850 Albeit in a new language for me, Python. 01:07:02.850 --> 01:07:03.950 What did I do here? 01:07:03.950 --> 01:07:05.300 Yeah, Ben? 01:07:05.300 --> 01:07:08.387 AUDIENCE: Because it's really taking it in as two strings, 01:07:08.387 --> 01:07:10.220 so it's just putting them next to each other 01:07:10.220 --> 01:07:12.020 as opposed to doing the actual math on it. 01:07:12.020 --> 01:07:13.635 It's not reading it as an int. 01:07:13.635 --> 01:07:14.510 DAVID MALAN: Exactly. 01:07:14.510 --> 01:07:17.270 So input, this function that comes with Python, 01:07:17.270 --> 01:07:19.505 really is analogous to CS50's get_string. 01:07:19.505 --> 01:07:21.380 No matter what the human types, it's going 01:07:21.380 --> 01:07:25.370 to come back as keyboard input characters, or ASCII characters, 01:07:25.370 --> 01:07:27.260 or Unicode characters from weeks past.
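Ben's diagnosis can be reproduced without any typing at all. In this sketch the two string literals stand in for whatever input would have returned, since input always hands back a str.

```python
# input() always returns a str, so these literals stand in for what the user typed.
x = "1"
y = "2"
print(x + y)    # 12, because + on two strings concatenates them
print(type(x))  # <class 'str'>
```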
01:07:27.260 --> 01:07:29.052 Even if they look like numbers, they're not 01:07:29.052 --> 01:07:32.990 going to be treated as numbers, a.k.a., integers, unless we coerce them so. 01:07:32.990 --> 01:07:37.550 Now remember in C, we had this ability to cast values from one to another. 01:07:37.550 --> 01:07:40.170 Casting meant to convert one data type to another. 01:07:40.170 --> 01:07:44.150 And we were allowed to do that for chars to ints or ints to chars, 01:07:44.150 --> 01:07:48.668 but you could not do it for strings to ints, or from ints to strings. 01:07:48.668 --> 01:07:50.210 For that we needed special functions. 01:07:50.210 --> 01:07:53.520 And some of you might have used atoi, ASCII to int, 01:07:53.520 --> 01:07:56.960 which was a function that actually looks at all of the characters in an ASCII 01:07:56.960 --> 01:07:59.690 string and converts it to the corresponding integer. 01:07:59.690 --> 01:08:02.040 In Python, frankly, it's a little simpler. 01:08:02.040 --> 01:08:04.320 We can just cast it from one thing to another. 01:08:04.320 --> 01:08:08.270 So I'm going to go ahead and cast the return value of input 01:08:08.270 --> 01:08:11.390 using this, int. 01:08:11.390 --> 01:08:16.010 And I'm going to do the same for y, passing the return value of input there 01:08:16.010 --> 01:08:18.620 to convert what looks like a string to what's-- 01:08:18.620 --> 01:08:21.800 what looks like an int to what's actually an int. 01:08:21.800 --> 01:08:25.399 And now let me go ahead and perform the addition again, python of addition.py. 01:08:25.399 --> 01:08:27.830 And notice this time, hopefully to Ben's point, 01:08:27.830 --> 01:08:31.529 it's not going to concatenate two strings, as we saw 01:08:31.529 --> 01:08:34.790 is the default behavior of plus when you have two strings left and right. 01:08:34.790 --> 01:08:39.290 Hopefully now it will do addition on x equals 1, y equals 2. 01:08:39.290 --> 01:08:42.310 And voila, now we're back in business.
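The fix is to convert each string with int before adding. In this sketch the hypothetical helper add takes the two typed-in strings as arguments rather than calling input, so it can be checked directly. In addition.py itself you would write x = int(input("x: ")) and likewise for y.

```python
def add(x_text, y_text):
    """Convert two typed-in strings to ints, then add them numerically."""
    x = int(x_text)  # int() parses the digits, roughly the job atoi did in C
    y = int(y_text)
    return x + y


print(add("1", "2"))  # 3
```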
01:08:42.310 --> 01:08:47.479 However, what if I'm not the most cooperative or sharp user, 01:08:47.479 --> 01:08:50.090 and I type in "cat" for x? 01:08:50.090 --> 01:08:52.950 Now some crazy stuff starts to happen. 01:08:52.950 --> 01:08:55.729 So notice we've triggered our very first error when 01:08:55.729 --> 01:08:58.520 it comes to running a program whereby my program won't even 01:08:58.520 --> 01:08:59.660 run in the first place. 01:08:59.660 --> 01:09:02.180 And notice I'm getting some somewhat cryptic syntax here-- 01:09:02.180 --> 01:09:06.319 traceback, most recent call last, file addition.py line 2. 01:09:06.319 --> 01:09:07.819 All right, that's at least familiar. 01:09:07.819 --> 01:09:09.800 I screwed up somewhere on line 2. 01:09:09.800 --> 01:09:11.970 It's showing me the line of code here. 01:09:11.970 --> 01:09:16.729 And it's saying "ValueError-- invalid literal for int with base 10, cat." 01:09:16.729 --> 01:09:19.430 That's a very cryptic way of saying I just 01:09:19.430 --> 01:09:23.750 have tried to cast something that's not an integer to an integer. 01:09:23.750 --> 01:09:26.600 And so this is why we use things like the CS50 library. 01:09:26.600 --> 01:09:28.970 It's actually kind of annoying to write all of the code 01:09:28.970 --> 01:09:32.450 that checks and makes sure did the user type in a number and only a number, 01:09:32.450 --> 01:09:35.270 and not "cat" or "dog" or some other cryptic string. 01:09:35.270 --> 01:09:38.450 We ourselves now would have to implement that kind of error checking 01:09:38.450 --> 01:09:40.250 if we don't want to use the CS50 library. 01:09:40.250 --> 01:09:41.430 So there, trade-off. 01:09:41.430 --> 01:09:43.760 Maybe you feel more comfortable writing all of the code yourself. 01:09:43.760 --> 01:09:46.552 You don't want to use some random person on the internet's library, 01:09:46.552 --> 01:09:49.760 whether it's CS50's or someone else's, even if it's free and open source. 
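If you do want to write that error checking yourself, one common approach is to catch the ValueError that int raises and prompt again. The sketch below is hypothetical: prompt_int and its ask parameter are illustrative names, and this is not how the CS50 library actually implements get_int.

```python
def prompt_int(prompt, ask=input):
    """Keep prompting until the user types something int() accepts.

    A hypothetical stand-in for the checking a library does for you;
    `ask` defaults to the built-in input().
    """
    while True:
        text = ask(prompt)
        try:
            return int(text)
        except ValueError:
            # e.g. int("cat") raises ValueError: invalid literal for int() with base 10
            pass
```

Feeding it "cat", then "dog", then "42" would make it ask three times and return 42.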
01:09:49.760 --> 01:09:51.140 You want to write it yourself. 01:09:51.140 --> 01:09:51.880 OK, fine. 01:09:51.880 --> 01:09:53.630 If you want to write it yourself, now I've 01:09:53.630 --> 01:09:56.630 got to add a bunch more lines of code to check, 01:09:56.630 --> 01:10:00.260 did the human type in a decimal digit one after the other, or did they 01:10:00.260 --> 01:10:02.490 type in other ASCII characters? 01:10:02.490 --> 01:10:05.300 So again, trade-off between using libraries or not. 01:10:05.300 --> 01:10:10.100 Generally, the answer is going to be: use a common library to do-- 01:10:10.100 --> 01:10:12.110 to solve these kinds of problems. 01:10:12.110 --> 01:10:14.520 Well, let's go ahead and change the program a little bit. 01:10:14.520 --> 01:10:20.810 Let me go ahead and open a new file called division.py just 01:10:20.810 --> 01:10:22.490 to do a bit of division here. 01:10:22.490 --> 01:10:25.040 And let me go ahead on the right-hand side and copy 01:10:25.040 --> 01:10:28.440 paste what we did before, but just change to division here. 01:10:28.440 --> 01:10:31.310 Let me go ahead and divide x by y. 01:10:31.310 --> 01:10:33.830 And I keep typing in 1 for x, 2 for y. 01:10:33.830 --> 01:10:36.890 In a moment I'm going to run python of division.py and type 01:10:36.890 --> 01:10:38.900 in 1 for x and 2 for y. 01:10:38.900 --> 01:10:44.780 But before I hit Enter, if this were a program in C, what would the answer be? 01:10:44.780 --> 01:10:47.240 Feel free to just respond in the chat if you'd like. 01:10:47.240 --> 01:10:51.050 If this were a program in C, and I'm dividing x by y, 01:10:51.050 --> 01:10:54.860 what would I have gotten in week 1 and every week since, Brian? 01:10:54.860 --> 01:10:56.572 BRIAN: The consensus looks like 0. 01:10:56.572 --> 01:10:58.280 DAVID MALAN: Yeah, because of truncation. 01:10:58.280 --> 01:11:04.650 If 1 divided by 2, of course, is 1/2, or 0.5, 0.5 is a float.
01:11:04.650 --> 01:11:07.340 But if I'm dealing with integers, even though it's implicitly 01:11:07.340 --> 01:11:11.180 integers thus far, and now explicitly now that I've casted them, 01:11:11.180 --> 01:11:14.360 I would seem to throw away the 0.5 and just get back 0. 01:11:14.360 --> 01:11:18.290 But let me go ahead and run python of division.py and putting x equals 1, 01:11:18.290 --> 01:11:19.190 y equals 2. 01:11:19.190 --> 01:11:24.590 And voila, wow, one of the most annoying features, or lack of features in C, 01:11:24.590 --> 01:11:25.820 seems to have been-- 01:11:25.820 --> 01:11:29.880 seems to have been solved in Python by division doing what you want. 01:11:29.880 --> 01:11:32.540 And if you divide one integer by another in Python, 01:11:32.540 --> 01:11:35.660 it turns out one of the other features of today's language 01:11:35.660 --> 01:11:37.910 is that it does what you the programmer would 01:11:37.910 --> 01:11:42.080 expect, without having to get into the weeds, of the nuances of floats 01:11:42.080 --> 01:11:42.590 and ints. 01:11:42.590 --> 01:11:46.220 Just does the quote, unquote "right thing" instead. 01:11:46.220 --> 01:11:50.720 Well, let me go ahead and open up another program here, also from week 1. 01:11:50.720 --> 01:11:54.425 This one was called conditions.c. 01:11:54.425 --> 01:11:58.340 And this one-- give me one moment to open this up on the left-- 01:11:58.340 --> 01:12:01.520 this one here was a program whose purpose in life 01:12:01.520 --> 01:12:04.910 was to get an int from the user called x, get another called y. 01:12:04.910 --> 01:12:08.840 And then it just did this-- if x less than y, print out as much. 01:12:08.840 --> 01:12:12.412 Else if x greater than y, print out as much, and so forth. 01:12:12.412 --> 01:12:14.120 Let's go ahead and translate this program 01:12:14.120 --> 01:12:18.350 into the corresponding Python code using some of the syntax we've seen already. 
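Returning to the division.py run described above, this sketch contrasts Python's two division operators. The / operator is the "right thing" behavior from the demo, and // is the closest match to C's integer division, though for negative operands it floors rather than truncating toward zero.

```python
print(1 / 2)     # 0.5: / always performs true division and returns a float
print(1 // 2)    # 0: // is floor division, closest to C's int / int
print(7 // 2)    # 3
print(-7 // 2)   # -4: floors toward negative infinity
print(int(-7 / 2))  # -3: truncating toward zero, as C would for -7 / 2
```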
01:12:18.350 --> 01:12:20.560 I'm going to go ahead and save this as conditions.py. 01:12:20.560 --> 01:12:22.310 And I think I'm going to go ahead and keep 01:12:22.310 --> 01:12:24.350 using the library, the CS50 library, so that I 01:12:24.350 --> 01:12:26.900 don't have to worry about those kinds of errors 01:12:26.900 --> 01:12:29.140 when casting bad input to another. 01:12:29.140 --> 01:12:31.910 So from cs50 import get_int. 01:12:31.910 --> 01:12:36.050 And let me go ahead and now get an int from the user, calling it x. 01:12:36.050 --> 01:12:39.560 Let's go ahead and get an int from the user, calling it y. 01:12:39.560 --> 01:12:42.470 And I won't bother typing comments this time, just for time's sake. 01:12:42.470 --> 01:12:43.820 And now let me ask the question. 01:12:43.820 --> 01:12:46.460 In C, I would have done if x less than y. 01:12:46.460 --> 01:12:48.050 Python's a little more terse. 01:12:48.050 --> 01:12:50.900 If x less than y suffices, but with a colon. 01:12:50.900 --> 01:12:55.380 Under that, I'm going to go ahead and say print "x is less than y." 01:12:55.380 --> 01:12:57.860 Elif-- this is the weird one-- 01:12:57.860 --> 01:13:02.420 x is greater than y, go ahead and print out "x is greater than y." 01:13:02.420 --> 01:13:09.050 And then else, also with a colon, print out "x is equal to y." 01:13:09.050 --> 01:13:11.250 And I think that's just about it. 01:13:11.250 --> 01:13:15.020 I'm going to go ahead down here and run python of conditions.py. 01:13:15.020 --> 01:13:19.040 I'll type in 1, I'll type in 2, and indeed x is less than y. 01:13:19.040 --> 01:13:22.040 I'll run it again, this time with 2 and 1. 01:13:22.040 --> 01:13:23.690 X is greater than y. 01:13:23.690 --> 01:13:25.880 And let me run it again with 1 and 1. 01:13:25.880 --> 01:13:27.230 X is equal to y. 01:13:27.230 --> 01:13:28.460 So that seems to have worked. 01:13:28.460 --> 01:13:30.002 And let me point out one other thing. 
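The shape of conditions.py can be sketched without the CS50 library by wrapping the if/elif/else chain in a function. The function name compare and the choice to return strings rather than print them are assumptions made so the branches are easy to check.

```python
def compare(x, y):
    """Mirror conditions.py: colons and indentation replace C's parentheses and braces."""
    if x < y:
        return "x is less than y"
    elif x > y:  # Python spells C's "else if" as elif
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))  # x is less than y
```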
01:13:30.002 --> 01:13:33.230 I mentioned earlier that you have this other shorthand syntax where 01:13:33.230 --> 01:13:36.650 you can just say import the CS50 library if you don't want to bother typing out 01:13:36.650 --> 01:13:38.000 individual function names. 01:13:38.000 --> 01:13:39.510 That's totally fine. 01:13:39.510 --> 01:13:42.470 But notice that the IDE is yelling at me at lines 3 and 4 01:13:42.470 --> 01:13:44.820 that get_int is no longer recognized. 01:13:44.820 --> 01:13:47.000 That's because Python supports this feature, 01:13:47.000 --> 01:13:52.010 when using other people's libraries, that it can namespace them for you. 01:13:52.010 --> 01:13:55.910 That is to say, you can't refer to get_int anymore directly. 01:13:55.910 --> 01:13:59.990 You have to more explicitly say, call the get_int function that's 01:13:59.990 --> 01:14:02.630 inside of the CS50 library. 01:14:02.630 --> 01:14:05.510 And so again, using our familiar dot operator, 01:14:05.510 --> 01:14:08.810 means go inside of that CS50 library, just like a C struct, 01:14:08.810 --> 01:14:11.870 and call the function called get_int therein. 01:14:11.870 --> 01:14:15.830 So I can now go ahead and rerun this, python of conditions.py, 01:14:15.830 --> 01:14:19.560 typing in 1 and 1, and voila, the code is now working again. 01:14:19.560 --> 01:14:20.390 So which is better? 01:14:20.390 --> 01:14:21.080 It depends. 01:14:21.080 --> 01:14:23.750 I mean, if it's sort of more readable to just write get_int 01:14:23.750 --> 01:14:26.917 all over the place, that's going to save you a lot of keystrokes-- you don't 01:14:26.917 --> 01:14:28.866 have to keep typing cs50 dot, cs50 dot. 01:14:28.866 --> 01:14:31.250 If, though, you're writing a pretty big program, 01:14:31.250 --> 01:14:35.300 and maybe you're using two different libraries that both implement 01:14:35.300 --> 01:14:37.460 a function called get_int, you want to be 01:14:37.460 --> 01:14:39.870 able to distinguish one from the other. 
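The cs50 module may not be installed everywhere, so this sketch shows the same namespacing idea with two standard-library modules, math and cmath, which each define a function named sqrt. The module prefix is what keeps the two from colliding.

```python
import cmath
import math
from math import sqrt  # brings the bare name sqrt into this file

# Two modules, two functions both named sqrt: the prefix says which one we mean.
print(math.sqrt(16))   # 4.0
print(cmath.sqrt(-1))  # 1j, since cmath's version handles complex results

# The bare name refers to whichever import bound it, here math's.
print(sqrt(25))        # 5.0
```

Had we written from cmath import sqrt as well, the later import would silently replace the earlier bare name, which is exactly the collision the prefixed style avoids.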
01:14:39.870 --> 01:14:42.710 So you might want to just import the libraries by their name, 01:14:42.710 --> 01:14:46.290 and then prefix the function calls, as I've done here, 01:14:46.290 --> 01:14:47.750 which is known as namespacing. 01:14:47.750 --> 01:14:51.020 Namespacing means that you can have two identically named variables 01:14:51.020 --> 01:14:54.470 or functions existing in two different namespaces. 01:14:54.470 --> 01:14:57.440 They don't collide, so long as they are inside 01:14:57.440 --> 01:15:02.300 of the CS50 library or some other library's name instead. 01:15:02.300 --> 01:15:04.530 Let me do one other thing with conditions here. 01:15:04.530 --> 01:15:07.850 Let me go ahead and open up another file from week 1. 01:15:07.850 --> 01:15:10.790 This one was agree.c. 01:15:10.790 --> 01:15:17.360 And this program prompted the user to input whether or not they agree. 01:15:17.360 --> 01:15:21.710 And we checked a little curiously that first week using equals 01:15:21.710 --> 01:15:26.930 equals quote, unquote "Y" or lowercase "y," or quote, unquote capital "N" 01:15:26.930 --> 01:15:28.385 or lowercase "n." 01:15:28.385 --> 01:15:30.260 Well, how do we go about converting this one? 01:15:30.260 --> 01:15:32.520 Let me go ahead and give myself a new file over here. 01:15:32.520 --> 01:15:35.390 I'll call it agree.py in this case. 01:15:35.390 --> 01:15:38.510 And it turns out we can solve this one in a few different ways. 01:15:38.510 --> 01:15:42.950 Let me go ahead and start off by importing from CS50 get_int, 01:15:42.950 --> 01:15:46.190 just because it's-- oh, no, get_string, rather, because it's convenient. 01:15:46.190 --> 01:15:49.200 Let me go ahead and get the user's input via get_string 01:15:49.200 --> 01:15:53.510 and ask them the same question, "Do you agree," question mark with a space. 01:15:53.510 --> 01:15:54.350 Then let me check. 
01:15:54.350 --> 01:16:01.760 If s equals equals quote, unquote "Y" or s equals equals lowercase "y," then 01:16:01.760 --> 01:16:04.970 I'm going to go ahead and print out "Agreed." 01:16:04.970 --> 01:16:11.680 Else-- oh, no, elif s equals equals capital "N" 01:16:11.680 --> 01:16:17.860 or s equals equals lowercase "n," let me go ahead and print out here quote, 01:16:17.860 --> 01:16:20.530 unquote, "Not agreed." 01:16:20.530 --> 01:16:22.930 And I think that should do it. 01:16:22.930 --> 01:16:25.120 But something's weird here. 01:16:25.120 --> 01:16:27.250 There's a few differences. 01:16:27.250 --> 01:16:31.090 What strikes you as different from C? 01:16:31.090 --> 01:16:33.640 What muscle memory might you have to break now 01:16:33.640 --> 01:16:37.660 when using conditions with multiple Boolean expressions 01:16:37.660 --> 01:16:39.512 combined in this way? 01:16:39.512 --> 01:16:40.720 And there's another subtlety. 01:16:40.720 --> 01:16:43.660 There's at least two salient differences between C and Python 01:16:43.660 --> 01:16:45.160 with just this example alone. 01:16:48.100 --> 01:16:51.600 Any thoughts in chat or [INAUDIBLE]? 01:16:51.600 --> 01:16:52.100 Ryan? 01:16:52.100 --> 01:16:53.892 AUDIENCE: I was going to say, for this one, 01:16:53.892 --> 01:16:56.300 instead of using the symbols for the logical operators, 01:16:56.300 --> 01:16:57.990 you can just type the text directly. 01:16:57.990 --> 01:16:58.740 DAVID MALAN: Yeah. 01:16:58.740 --> 01:17:01.020 We can literally just type the English word "or" 01:17:01.020 --> 01:17:03.150 if we want to express a logical or. 01:17:03.150 --> 01:17:06.445 So in C, recall on the left, we would have done this vertical bar 01:17:06.445 --> 01:17:07.320 thing, which is fine. 01:17:07.320 --> 01:17:08.140 You get used to it. 01:17:08.140 --> 01:17:10.770 But it's not very readable, at least in any English sense. 
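Here is a sketch of the first version of agree.py, using the English keyword or where C used ||. The function wrapper and the None return for unrecognized answers are assumptions; the lecture's program just prints and falls through.

```python
def agree(s):
    """Python spells logical OR as the keyword `or`, where C used ||."""
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None  # neither answer recognized; the lecture's version prints nothing here
```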
01:17:10.770 --> 01:17:13.470 Python took the approach of using more frequently 01:17:13.470 --> 01:17:17.628 actual English or English-like words that actually do read left to right. 01:17:17.628 --> 01:17:19.170 And indeed, a theme is emerging here. 01:17:19.170 --> 01:17:22.740 When you read Python code, it is closer to English 01:17:22.740 --> 01:17:26.040 than C is, because you don't trip over as much punctuation. 01:17:26.040 --> 01:17:29.340 Each line of Python code tends to read a little more 01:17:29.340 --> 01:17:32.190 like an English phrase or an English sentence. 01:17:32.190 --> 01:17:33.930 And there's one other subtlety here. 01:17:33.930 --> 01:17:37.470 On the left back in week 1, I took care to use single quotes 01:17:37.470 --> 01:17:39.600 around the Ys and the Ns. 01:17:39.600 --> 01:17:41.620 This week I'm using double quotes. 01:17:41.620 --> 01:17:43.840 But to be honest, it actually doesn't matter. 01:17:43.840 --> 01:17:48.360 I can alternatively use single quotes everywhere, so long as I'm consistent. 01:17:48.360 --> 01:17:50.610 But in Python there is no fundamental difference 01:17:50.610 --> 01:17:54.630 between double quotes and single quotes, so long as you are consistent. 01:17:54.630 --> 01:17:58.050 The reason being, when we looked at the data types that existed between C 01:17:58.050 --> 01:18:03.030 and now Python, absent from the list of Python data types was char. 01:18:03.030 --> 01:18:06.600 In Python there is no such thing as an individual char. 01:18:06.600 --> 01:18:09.810 Everything that's character-based is a string. 01:18:09.810 --> 01:18:13.650 Even if it's just one character long, everything is a string. 01:18:13.650 --> 01:18:16.290 Downside is we don't have quite as fine grained control. 
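Because Python has no char type, single and double quotes produce the same kind of object, and indexing into a string hands back another string. A quick check:

```python
# Single and double quotes build the very same kind of object.
print("Y" == 'Y')   # True
print(type('Y'))    # <class 'str'>: there is no separate char type

# Indexing into a string gives back another, one-character string.
word = "yes"
print(word[0], type(word[0]), len(word[0]))  # y <class 'str'> 1
```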
01:18:16.290 --> 01:18:21.355 Upside is we get a lot more features with those string structures, 01:18:21.355 --> 01:18:23.730 as we've already seen with, for instance, doing something 01:18:23.730 --> 01:18:26.625 like uppercase with those as well. 01:18:26.625 --> 01:18:27.750 Well, let me go ahead and-- 01:18:27.750 --> 01:18:29.310 I think I can simplify this. 01:18:29.310 --> 01:18:32.550 For instance, suppose I wanted to tolerate something like not just "Y" 01:18:32.550 --> 01:18:34.980 or "y," in uppercase or lowercase. 01:18:34.980 --> 01:18:40.230 Suppose I wanted to also tolerate "Yes" in uppercase or lowercase as well. 01:18:40.230 --> 01:18:42.720 Well, you could imagine just starting to add to the code 01:18:42.720 --> 01:18:48.128 or s equals equals "Yes," or s equals equals "yes." 01:18:48.128 --> 01:18:50.670 But wait a minute, what if the user is being a little sloppy? 01:18:50.670 --> 01:18:54.360 And what if I want to actually say like, well, what if they're yelling? 01:18:54.360 --> 01:18:57.135 Or s equals equals "YES" in all caps. 01:18:57.135 --> 01:18:59.010 And there's a few other permutations as well. 01:18:59.010 --> 01:19:02.130 Like, this is quickly devolving into quite the mess. 01:19:02.130 --> 01:19:06.270 But if at the end of the day you really just want to detect "Y" or the word 01:19:06.270 --> 01:19:11.190 "Yes," irrespective of capitalization, I bet we can be pretty clever in Python 01:19:11.190 --> 01:19:11.790 here. 01:19:11.790 --> 01:19:19.230 What if I go ahead and say, if s is in quote, unquote "y" or "yes"-- 01:19:19.230 --> 01:19:21.780 in fact, I can borrow an idea from earlier, 01:19:21.780 --> 01:19:24.300 whereby I can use the square bracket notation to give me 01:19:24.300 --> 01:19:27.300 a list, which again, is like an array, but it will automatically grow 01:19:27.300 --> 01:19:28.440 or shrink as you need it. 01:19:28.440 --> 01:19:30.690 You don't have to decide in advance how big it is. 
01:19:30.690 --> 01:19:34.950 This preposition here, in, is a new keyword in Python 01:19:34.950 --> 01:19:37.470 that will literally answer that question for me. 01:19:37.470 --> 01:19:39.030 And we've used it before earlier. 01:19:39.030 --> 01:19:44.610 When I implemented speller, I said if the word is in my set of words, return 01:19:44.610 --> 01:19:45.490 True. 01:19:45.490 --> 01:19:50.208 So if s in this list, I'll get back True or False 01:19:50.208 --> 01:19:51.750 based on the answer to that question. 01:19:51.750 --> 01:19:53.310 But again, it's not tolerating case. 01:19:53.310 --> 01:19:54.420 But no big deal-- 01:19:54.420 --> 01:19:58.170 dot lower, now I can say, is the lowercase version of s, 01:19:58.170 --> 01:20:01.920 no matter what the human typed in, in this list of two values? 01:20:01.920 --> 01:20:04.500 That means now the user can type in all caps, 01:20:04.500 --> 01:20:10.770 in alternating caps, and one capitalized letter, or any other permutation 01:20:10.770 --> 01:20:12.300 whatsoever. 01:20:12.300 --> 01:20:13.020 All right. 01:20:13.020 --> 01:20:14.880 So that, then, is our conditions. 01:20:14.880 --> 01:20:19.440 Let me pause here to see if there's any questions. 01:20:19.440 --> 01:20:21.870 Any questions or confusion that we can clear up? 01:20:21.870 --> 01:20:24.870 With syntax, with conditions, Boolean variable-- 01:20:24.870 --> 01:20:26.032 Boolean values? 01:20:26.032 --> 01:20:27.240 BRIAN: So a question came up. 01:20:27.240 --> 01:20:30.660 So in Python we are allowed to use the equals equals syntax 01:20:30.660 --> 01:20:32.032 to compare two strings? 01:20:32.032 --> 01:20:32.740 DAVID MALAN: Yes. 01:20:32.740 --> 01:20:34.740 So another really good catch. 01:20:34.740 --> 01:20:37.372 In Python, there are no pointers. 01:20:37.372 --> 01:20:39.330 Underneath the hood, there are still addresses. 01:20:39.330 --> 01:20:40.955 Like, your memory hasn't gone anywhere. 
01:20:40.955 --> 01:20:44.340 But underneath the hood, all of that is now managed for you by the language 01:20:44.340 --> 01:20:45.070 itself. 01:20:45.070 --> 01:20:49.020 So if you want to conceptually compare one string against another, 01:20:49.020 --> 01:20:53.370 just as I did here now on line 7, you can indeed use equals equals, 01:20:53.370 --> 01:20:56.590 and Python will do the quote, unquote "right thing" for you. 01:20:56.590 --> 01:21:00.647 You don't need to regress into using strcmp instead. 01:21:00.647 --> 01:21:02.730 Just for clarity, let me go ahead and update this. 01:21:02.730 --> 01:21:08.430 If s.lower in quote, unquote "n" or comma "no," 01:21:08.430 --> 01:21:12.540 we can achieve the same result there by doing the same technique. 01:21:12.540 --> 01:21:15.450 Well, let me go ahead and open up another example 01:21:15.450 --> 01:21:17.940 that you might recall we did a progression of examples 01:21:17.940 --> 01:21:22.380 to make it good, better, and then best, this one involving 01:21:22.380 --> 01:21:24.100 just a cat meowing in some form. 01:21:24.100 --> 01:21:26.760 So let me go ahead and open up from week 1 01:21:26.760 --> 01:21:31.410 an example that was called meow0, relatively straightforward, that 01:21:31.410 --> 01:21:32.730 simply did this. 01:21:32.730 --> 01:21:34.350 It simply meowed three times. 01:21:34.350 --> 01:21:37.230 So suffice it to say now, in Python, it's pretty trivial 01:21:37.230 --> 01:21:38.820 to do something three times like this. 01:21:38.820 --> 01:21:41.730 I'm going to go ahead and call this meow.py. 01:21:41.730 --> 01:21:45.240 And of course, I can just do something like print "meow." 01:21:45.240 --> 01:21:46.750 And I can just copy paste that. 01:21:46.750 --> 01:21:49.740 But of course, the whole point of this example back in week 1 01:21:49.740 --> 01:21:52.020 was not to devolve into just copy paste. 01:21:52.020 --> 01:21:53.377 Surely there's a better way.
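Here is a small sketch of string comparison with `==`. The `answer_kind` helper and its return strings are hypothetical and only approximate the agree program's yes/no branches.

```python
# Two equal strings built in different ways.
a = "yes"
b = "".join(["y", "e", "s"])

# == compares the characters themselves, not memory addresses, so no strcmp is needed.
print(a == b)  # True


def answer_kind(s):
    """Classify a yes/no reply, ignoring case (hypothetical helper)."""
    if s.lower() in ["y", "yes"]:
        return "agreed"
    elif s.lower() in ["n", "no"]:
        return "not agreed"
    return "unknown"
```

Python also has an `is` operator, but it asks whether two names refer to the same object. For comparing string contents, `==` is the right tool.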
01:21:53.377 --> 01:21:54.960 And we've seen a better way this time. 01:21:54.960 --> 01:21:57.750 If we wanted to change this into a for loop in C, 01:21:57.750 --> 01:22:04.650 we could have done something like for int i get 0, i less than 3, i++. 01:22:04.650 --> 01:22:06.630 Then in some curly braces we could have done 01:22:06.630 --> 01:22:09.780 printf of "meow," new line, semicolon. 01:22:09.780 --> 01:22:12.750 So that was the next version of our meow code in C. 01:22:12.750 --> 01:22:15.270 But in Python, of course, it's a little more succinct. 01:22:15.270 --> 01:22:22.320 I can just do for i in range 3 print quote, unquote "meow." 01:22:22.320 --> 01:22:24.792 So very similar in spirit to our hello, world of before. 01:22:24.792 --> 01:22:27.250 But again, we don't have to include any libraries for this. 01:22:27.250 --> 01:22:28.980 We don't need to have a main function. 01:22:28.980 --> 01:22:31.688 We don't need any of those curly braces or semicolons or the like. 01:22:31.688 --> 01:22:35.040 We can just dive in and focus on the code itself. 01:22:35.040 --> 01:22:39.840 But recall that we also, last time, evolved the meow program 01:22:39.840 --> 01:22:42.780 into having our own helper function, our own function that 01:22:42.780 --> 01:22:48.480 actually allowed us to create an abstraction on top of meowing. 01:22:48.480 --> 01:22:51.120 And that was in our third version, a.k.a. meow2. 01:22:51.120 --> 01:22:53.400 Let me go ahead and open up this version in a tab. 01:22:53.400 --> 01:22:56.970 And notice that this version starts to get a little involved, because one, 01:22:56.970 --> 01:22:59.580 we needed a prototype at the top, because I now 01:22:59.580 --> 01:23:02.520 have a meow function at the bottom whose purpose in life 01:23:02.520 --> 01:23:04.860 was just to print "meow," but to abstract that away 01:23:04.860 --> 01:23:07.140 as a new helper function. 01:23:07.140 --> 01:23:10.230 And then I had this code here with a for loop inside.
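The C loop `for (int i = 0; i < 3; i++)` from meow1 translates to the short Python loop below, a sketch of the version typed into meow.py:

```python
# No #include, no main, no braces or semicolons needed.
for i in range(3):
    print("meow")

# range(3) yields 0, 1, 2, so the body runs three times, just like the C loop.
```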
01:23:10.230 --> 01:23:13.800 Well, in Python it's going to work out to be a little simpler here, too. 01:23:13.800 --> 01:23:17.040 If I want to do something three times, for i in range of 3 01:23:17.040 --> 01:23:18.990 go ahead and call meow. 01:23:18.990 --> 01:23:21.460 Now of course, meow doesn't yet exist. 01:23:21.460 --> 01:23:22.770 So I can solve that problem. 01:23:22.770 --> 01:23:25.290 We've seen earlier, albeit quickly, in speller that I 01:23:25.290 --> 01:23:27.180 can define my own functions like meow. 01:23:27.180 --> 01:23:29.223 There's no more void, because if you don't 01:23:29.223 --> 01:23:31.890 want to have arguments in a function, just don't put them there. 01:23:31.890 --> 01:23:34.770 There's no return type specified in Python. 01:23:34.770 --> 01:23:36.000 It's implicit instead. 01:23:36.000 --> 01:23:37.800 So it suffices to do this. 01:23:37.800 --> 01:23:40.530 And now I can just print out "meow." 01:23:40.530 --> 01:23:44.370 So here now, I have a program that iterates three times, 01:23:44.370 --> 01:23:47.770 calling meow each time, and meow is defined down below. 01:23:47.770 --> 01:23:51.876 Let me go ahead and run this, python of meow.py. 01:23:51.876 --> 01:23:52.650 Huh. 01:23:52.650 --> 01:23:54.690 Traceback, most recent call last. 01:23:54.690 --> 01:23:59.010 There's a problem on line 2 of meow.py because of NameError-- name 01:23:59.010 --> 01:24:01.360 "meow" is not defined. 01:24:01.360 --> 01:24:05.550 Now, the language being used there by Python is a little different from C's. 01:24:05.550 --> 01:24:10.050 It's frankly a little more human-friendly. 01:24:10.050 --> 01:24:11.400 But what just happened? 01:24:11.400 --> 01:24:16.365 What problem has arisen that I haven't tripped over until now? 01:24:19.210 --> 01:24:21.620 Even if you've never programmed in Python before, 01:24:21.620 --> 01:24:26.770 and even if you haven't run help50 yet, what might be the issue there? 01:24:26.770 --> 01:24:28.750 Ginny?
01:24:28.750 --> 01:24:32.410 AUDIENCE: It's that the function is not found when we are trying to call it, 01:24:32.410 --> 01:24:35.370 because it's described below when we are calling it. 01:24:35.370 --> 01:24:36.130 DAVID MALAN: Yeah. 01:24:36.130 --> 01:24:37.703 AUDIENCE: There is no prototype. 01:24:37.703 --> 01:24:39.370 DAVID MALAN: Yeah, there's no prototype. 01:24:39.370 --> 01:24:42.370 And it turns out in Python, there isn't a notion of prototypes. 01:24:42.370 --> 01:24:44.860 So unfortunately, the solution we saw in week 1 01:24:44.860 --> 01:24:47.410 is not to just copy and paste the first line up above 01:24:47.410 --> 01:24:48.850 and end it with a semicolon. 01:24:48.850 --> 01:24:50.170 That's just not a thing. 01:24:50.170 --> 01:24:51.520 I could do this. 01:24:51.520 --> 01:24:54.760 I could just move my meow function to the top of the file, 01:24:54.760 --> 01:24:58.330 thereby defining the function first, and then using it last. 01:24:58.330 --> 01:25:01.480 And that would actually solve the problem, "meow meow meow." 01:25:01.480 --> 01:25:03.940 That, of course, doesn't really help us long term, 01:25:03.940 --> 01:25:06.940 because you could probably imagine a situation where this function wants 01:25:06.940 --> 01:25:09.273 to call this function, but this function calls this one, 01:25:09.273 --> 01:25:12.592 and you just can't really neatly order them in some safe way. 01:25:12.592 --> 01:25:14.800 And it's just not going to be as maintainable, right? 01:25:14.800 --> 01:25:18.220 Recall that one of the values of putting main at the top of our C programs 01:25:18.220 --> 01:25:21.128 was that any reasonable person who wants to understand your code 01:25:21.128 --> 01:25:23.170 is probably going to start reading top to bottom. 01:25:23.170 --> 01:25:26.003 They're not going to want to have to scroll through all of your code 01:25:26.003 --> 01:25:28.120 looking for the actual main code. 
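Python looks up a function's name when the call runs, not when the `def` is read. So what matters is that every `def` has executed before the first call. The hypothetical pair of functions below call each other, which shows why simply reordering definitions can't always put each one above its use. Calling them only after both are defined resolves the problem.

```python
def is_even(n):
    """Return True if the non-negative integer n is even."""
    if n == 0:
        return True
    return is_odd(n - 1)  # is_odd is looked up only when this line runs


def is_odd(n):
    """Return True if the non-negative integer n is odd."""
    if n == 0:
        return False
    return is_even(n - 1)


# Both names exist by the time this line runs, so the calls succeed.
print(is_even(4), is_odd(4))  # True False
```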
01:25:28.120 --> 01:25:32.050 So it turns out in Python, even though you don't need a main function, 01:25:32.050 --> 01:25:35.410 it's actually common to define one nonetheless. 01:25:35.410 --> 01:25:38.560 It's going to be implemented with something like this. 01:25:38.560 --> 01:25:41.780 And I'm just going to indent my code below that there. 01:25:41.780 --> 01:25:43.510 So now I've defined main. 01:25:43.510 --> 01:25:45.490 But I haven't executed any code yet. 01:25:45.490 --> 01:25:49.690 On line 6, I've now defined meow, but I haven't executed any code yet. 01:25:49.690 --> 01:25:50.890 And I mean that literally. 01:25:50.890 --> 01:25:53.530 If I run python of meow now and hit Enter, 01:25:53.530 --> 01:25:58.390 I would hope to see "meow meow meow," but I see nothing. 01:25:58.390 --> 01:26:00.470 And this is a little weird. 01:26:00.470 --> 01:26:02.860 But Python is doing literally what I told it to do. 01:26:02.860 --> 01:26:04.840 I told it to define a function called main, 01:26:04.840 --> 01:26:08.290 and I told it to define a function called meow. 01:26:08.290 --> 01:26:12.840 What I never told it to do is to call either of those functions. 01:26:12.840 --> 01:26:16.480 So the simplest fix here-- it's a little different from C and a little weird-- 01:26:16.480 --> 01:26:19.700 is just to call main as your very last thought in the file. 01:26:19.700 --> 01:26:23.600 So define main up at the top, just where most programmers would expect it to be, 01:26:23.600 --> 01:26:25.292 but call it all the way at the bottom. 01:26:25.292 --> 01:26:27.250 And let me go ahead now and run my program. 01:26:27.250 --> 01:26:30.640 And now voila, "meow meow meow" is back, because I've defined main, 01:26:30.640 --> 01:26:33.910 I've defined meow, and now I am calling main.
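Here is a sketch of the layout just described: main defined first, helpers below it, and a single call to main as the last line of the file.

```python
def main():
    # Top-level logic comes first, where readers expect to find it.
    for i in range(3):
        meow()


def meow():
    print("meow")


# The two defs above only created functions; nothing has run yet.
# This final call is what actually starts the program.
main()
```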
01:26:33.910 --> 01:26:37.510 Now, as an aside, you will very often see in various documentation 01:26:37.510 --> 01:26:42.250 and tutorials online a much more cryptic incantation than this, 01:26:42.250 --> 01:26:44.440 which will have you typing out this. 01:26:44.440 --> 01:26:47.050 This achieves the same goal, but it's not strictly necessary 01:26:47.050 --> 01:26:47.860 for our purposes. 01:26:47.860 --> 01:26:50.920 This line of code, if you see it in any online references, or examples, 01:26:50.920 --> 01:26:53.328 or books, or sections or the like, it is necessary 01:26:53.328 --> 01:26:55.120 only when you're implementing, essentially, 01:26:55.120 --> 01:26:58.030 your own libraries-- like your own CS50 library, 01:26:58.030 --> 01:27:00.580 or your own image blurring library or the like. 01:27:00.580 --> 01:27:03.875 It's not necessary when we're just writing individual programs of our own. 01:27:03.875 --> 01:27:07.000 So I'm going to go ahead and keep mine simple and literally just call main. 01:27:07.000 --> 01:27:10.630 And let me just wave my hand at why you'd need that syntax otherwise 01:27:10.630 --> 01:27:11.950 in this context. 01:27:11.950 --> 01:27:14.770 But let me go ahead and modify this one last time. 01:27:14.770 --> 01:27:18.520 Because recall that in C, the last version of my program 01:27:18.520 --> 01:27:21.310 had me running meow and passing it input. 01:27:21.310 --> 01:27:24.670 Because I defined meow as taking an input like n, 01:27:24.670 --> 01:27:30.370 and then doing something like for int i get 0, i less than n, i++, 01:27:30.370 --> 01:27:34.030 and then inside of my curly braces did I print meow, 01:27:34.030 --> 01:27:38.470 so that now I have a helper function that I've invented that takes one 01:27:38.470 --> 01:27:40.240 input, an int called n. 01:27:40.240 --> 01:27:43.390 And it loops that many times and prints out meow that many times. 
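The "cryptic incantation" mentioned here is, in standard Python, the `if __name__ == "__main__":` check. The sketch below wraps a program like meow.py in that check. It is only an illustration of the common idiom, which, as the lecture says, isn't required for standalone programs.

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


# __name__ is "__main__" only when this file is run directly (python meow.py).
# If another file imports it, __name__ is the module's name instead,
# so main() is not called automatically on import.
if __name__ == "__main__":
    main()
```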
01:27:43.390 --> 01:27:47.320 And now I have a real nice abstraction, and that now my program is distilled, 01:27:47.320 --> 01:27:48.820 it's just meow three times. 01:27:48.820 --> 01:27:51.130 And it doesn't matter how I implemented meow. 01:27:51.130 --> 01:27:52.900 I can do the same thing in Python. 01:27:52.900 --> 01:27:56.530 I can go ahead and say that meow takes an argument called n. 01:27:56.530 --> 01:27:58.450 I don't have to bother specifying its type. 01:27:58.450 --> 01:28:03.520 I can now say for i in range of n, and I can print "meow" that many times. 01:28:03.520 --> 01:28:07.810 And now I can get rid of my loop in main and just say "meow" three times. 01:28:07.810 --> 01:28:09.310 And so same functionality. 01:28:09.310 --> 01:28:11.770 If I run this a final time, "meow meow meow," 01:28:11.770 --> 01:28:16.570 but now I'm kind of designing my code in a more sophisticated way 01:28:16.570 --> 01:28:23.320 by actually giving myself now some of my own actual helper functions. 01:28:23.320 --> 01:28:26.350 All right, any questions, then, on this progression? 01:28:26.350 --> 01:28:28.810 Now, we're not really seeing new Python syntax. 01:28:28.810 --> 01:28:32.980 We're now just seeing a translation of some actual past C programs 01:28:32.980 --> 01:28:37.020 into Python to show really the equivalence. 01:28:39.560 --> 01:28:40.080 All right. 01:28:40.080 --> 01:28:42.247 Well, let's go ahead, then, and open another version 01:28:42.247 --> 01:28:45.590 from week 1 of a program called positive.c, 01:28:45.590 --> 01:28:49.850 which was an opportunity back then, not only to define our own helper function 01:28:49.850 --> 01:28:52.760 called get_positive_int, but it also introduced us 01:28:52.760 --> 01:28:54.440 to the familiar do while loop. 01:28:54.440 --> 01:28:57.380 And unfortunately, we're going to take that away from you now. 01:28:57.380 --> 01:28:59.390 Python does not have a do while loop. 
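This sketch shows the final meow program in Python, with meow taking an argument n whose type is never declared:

```python
def main():
    meow(3)


def meow(n):
    """Print "meow" n times."""
    for i in range(n):
        print("meow")


main()
```

Compared with C, there is no prototype and no `int n` in the signature. Any value that range() accepts will work here.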
01:28:59.390 --> 01:29:03.260 But it's, of course, a very useful thing to be able to do something 01:29:03.260 --> 01:29:04.730 while a condition is true. 01:29:04.730 --> 01:29:07.700 After all, pretty much any time we've gotten user input in the class, 01:29:07.700 --> 01:29:11.570 we've used do while, so that we prompt them at least once and then optionally 01:29:11.570 --> 01:29:14.300 again and again and again, until they cooperate. 01:29:14.300 --> 01:29:16.430 So let me go ahead and implement this in Python 01:29:16.430 --> 01:29:24.490 now in a file called positive.py, and go ahead here in positive.py, 01:29:24.490 --> 01:29:27.050 and translate this thing as follows. 01:29:27.050 --> 01:29:31.060 Let me go ahead and from cs50 import get_int. 01:29:31.060 --> 01:29:33.670 Let me go ahead and define a function called main. 01:29:33.670 --> 01:29:35.920 So now I'm just going to start to get into this habit. 01:29:35.920 --> 01:29:38.575 I'm going to go ahead and give myself a variable called i 01:29:38.575 --> 01:29:40.953 and call get_positive_int. 01:29:40.953 --> 01:29:43.120 And then I'm just going to go ahead and print out i, 01:29:43.120 --> 01:29:44.560 keeping it nice and simple. 01:29:44.560 --> 01:29:48.550 Now I have to implement get_positive_int. 01:29:48.550 --> 01:29:53.440 It doesn't need to take any input, so I'm not going to give it any arguments. 01:29:53.440 --> 01:29:55.510 And now I have to do the do while thing. 01:29:55.510 --> 01:29:59.770 So the Pythonic way to do this in Python is almost always 01:29:59.770 --> 01:30:02.320 to deliberately induce an infinite loop. 01:30:02.320 --> 01:30:04.990 And the idea being, if you want to do something again and again, 01:30:04.990 --> 01:30:07.930 just start doing it forever and then break out of the loop 01:30:07.930 --> 01:30:09.458 when you are ready to. 01:30:09.458 --> 01:30:11.500 So what do I want to do forever in this function?
01:30:11.500 --> 01:30:14.230 Well, I want to go ahead and get an int and prompt 01:30:14.230 --> 01:30:17.600 the human for a positive integer. 01:30:17.600 --> 01:30:21.490 And then I want to go ahead on the next line and ask a question. 01:30:21.490 --> 01:30:27.280 Well, if n is greater than 0, thereby making it positive, break. 01:30:27.280 --> 01:30:31.490 And the last line of code here is going to be to return n. 01:30:31.490 --> 01:30:35.170 So notice in C on the left, I did this do while thing. 01:30:35.170 --> 01:30:37.870 I had to declare n outside of the do while loop, 01:30:37.870 --> 01:30:40.630 because it had to be outside the curly braces to be in scope. 01:30:40.630 --> 01:30:44.290 But in Python here, notice what I'm doing here 01:30:44.290 --> 01:30:47.860 is actually a little bit different. 01:30:47.860 --> 01:30:50.050 And did I screw up? 01:30:53.790 --> 01:30:55.300 Oh, yes, I did screw up. 01:30:55.300 --> 01:30:56.500 OK. 01:30:56.500 --> 01:30:58.840 I have to ask the actual question, if n greater than 0. 01:30:58.840 --> 01:31:01.690 So what did I do actually differently here on the right-hand side? 01:31:01.690 --> 01:31:03.982 Well, notice, I deliberately induced this infinite loop 01:31:03.982 --> 01:31:06.550 on line 10, which just means, do the following forever. 01:31:06.550 --> 01:31:10.510 I then ask the user for a variable n with get_int, and then I check, 01:31:10.510 --> 01:31:12.460 is n greater than 0? 01:31:12.460 --> 01:31:14.408 If so, break out of the loop. 01:31:14.408 --> 01:31:15.700 How do I break out of the loop? 01:31:15.700 --> 01:31:19.160 Well, notice that the indentation here has been very consistent. 01:31:19.160 --> 01:31:22.150 So when I break out of the loop, that puts me back 01:31:22.150 --> 01:31:26.055 in line with the original indentation, which is now on line 14.
01:31:26.055 --> 01:31:28.930 Notice that the return lines up with the while loop, which means it's 01:31:28.930 --> 01:31:31.840 the first line of code that's outside of that loop. 01:31:31.840 --> 01:31:34.300 In the past, we would have had very explicit curly braces. 01:31:34.300 --> 01:31:37.752 Now we rely only on indentation that then lets me return n. 01:31:37.752 --> 01:31:39.460 So what are some of the differences here? 01:31:39.460 --> 01:31:41.530 One, the do while loop is completely gone. 01:31:41.530 --> 01:31:45.250 But two, scope is no longer an issue. 01:31:45.250 --> 01:31:48.940 It turns out in Python that the moment you declare a variable, 01:31:48.940 --> 01:31:51.980 it exists until the end of that function. 01:31:51.980 --> 01:31:55.750 You don't have to worry about the nuance of declaring a variable first like we 01:31:55.750 --> 01:31:59.620 did in C up here and then returning it down below. 01:31:59.620 --> 01:32:03.220 The moment we execute this line of code 11 here, n 01:32:03.220 --> 01:32:07.220 suddenly exists for the entirety of the remainder of that function. 01:32:07.220 --> 01:32:10.120 So even though we declared it inside of the loop, so to speak, 01:32:10.120 --> 01:32:14.800 as per the indentation, it is still accessible to the return statement 01:32:14.800 --> 01:32:16.610 here at the end of the program. 01:32:16.610 --> 01:32:17.110 All right. 01:32:17.110 --> 01:32:21.250 Let me pause there and see if there's any questions or confusion 01:32:21.250 --> 01:32:26.530 on getting user input, doing the equivalent, logically, of do while, 01:32:26.530 --> 01:32:28.870 but doing it now in this more Pythonic way. 01:32:28.870 --> 01:32:29.530 Peter? 01:32:29.530 --> 01:32:32.935 AUDIENCE: In Python, are variables accessible across functions or no? 01:32:32.935 --> 01:32:34.060 DAVID MALAN: Good question. 01:32:34.060 --> 01:32:34.660 No. 
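Below is a runnable sketch of get_positive_int using the `while True` / `break` pattern. The lecture uses get_int from the CS50 library, which re-prompts on non-numeric input. To avoid depending on that library, this version reads text with a `read` function (defaulting to the built-in input) and retries on ValueError itself. Passing in `read` is an assumption of this sketch, made so that it can be exercised without a keyboard.

```python
def get_positive_int(read=input):
    """Keep prompting until the user types an integer greater than 0."""
    while True:  # deliberately infinite; break is the only way out
        text = read("Positive integer: ")
        try:
            n = int(text)
        except ValueError:
            continue  # not an integer at all, so ask again
        if n > 0:
            break
    # n was first assigned inside the loop, yet it is still in scope here.
    return n
```

Run interactively, `get_positive_int()` behaves like the do while loop from positive.c. It asks at least once, then again until the input is positive.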
01:32:34.660 --> 01:32:37.360 So if you declare a variable inside of a function, 01:32:37.360 --> 01:32:40.100 it is scoped, so to speak, to that function. 01:32:40.100 --> 01:32:41.500 It is not available elsewhere. 01:32:41.500 --> 01:32:44.710 You would have to return it and pass it as output to input. 01:32:44.710 --> 01:32:50.740 Or you would have to define it, for instance, as a global variable instead. 01:32:50.740 --> 01:32:51.440 All right. 01:32:51.440 --> 01:32:53.500 Well, what else, then, might we translate? 01:32:53.500 --> 01:32:57.680 Well, recall from our earlier endeavors in week 1, 01:32:57.680 --> 01:32:59.680 we played around with these examples from Mario. 01:32:59.680 --> 01:33:02.920 And for instance, we wanted to print something out in Python-- 01:33:02.920 --> 01:33:07.690 in C that mimics the notion of these pyramids, or these coins, 01:33:07.690 --> 01:33:09.760 or these little bricks on the screen. 01:33:09.760 --> 01:33:13.858 Well, here let me go ahead and open up a new file called mario.py. 01:33:13.858 --> 01:33:16.900 And I'm going to transition away from always showing the before and after 01:33:16.900 --> 01:33:19.420 and now just start to focus more on the Python code. 01:33:19.420 --> 01:33:22.550 But you can always look back if you wanted the corresponding C versions. 01:33:22.550 --> 01:33:25.998 How do I go about printing out three bricks like this vertically? 01:33:25.998 --> 01:33:27.790 Well, in Python I might say something like, 01:33:27.790 --> 01:33:34.000 for i in range of 3, quite simply, as we've done a few times already, 01:33:34.000 --> 01:33:35.680 and just go ahead and print out a hash. 01:33:35.680 --> 01:33:38.740 I don't need to worry about the new line, because you get it for free, 01:33:38.740 --> 01:33:39.610 so to speak. 01:33:39.610 --> 01:33:42.700 But I'm going to go ahead now and run python of mario.py. 
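To illustrate function scope as answered above, here is a small sketch with hypothetical functions f and g. A variable assigned inside one function is invisible to another, so the value has to leave through a return value.

```python
def f():
    x = 50  # x is local to f
    return x


def g():
    # f's x does not exist here; referring to it raises NameError.
    try:
        return x
    except NameError:
        return "x is not defined in g"


y = f()  # pass the value out of f by returning it
print(y, g())
```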
01:33:42.700 --> 01:33:47.620 And voila, there's my very simple ASCII version of this Mario structure. 01:33:47.620 --> 01:33:49.870 But what if I want to do the coins instead? 01:33:49.870 --> 01:33:54.040 What if I want to do these horizontal coins that appear in these four bricks 01:33:54.040 --> 01:33:56.320 and print out a version of that? 01:33:56.320 --> 01:33:57.770 Well, how might I do that? 01:33:57.770 --> 01:34:00.430 Well, let me go ahead and change this to be-- 01:34:00.430 --> 01:34:05.150 instead in my code for i in range of 4, so I can print four of these things. 01:34:05.150 --> 01:34:09.080 Let me go ahead and print out a question mark and then run this. 01:34:09.080 --> 01:34:11.440 So let me run mario.py. 01:34:11.440 --> 01:34:13.180 And voila-- damn. 01:34:13.180 --> 01:34:15.110 Like, not what I wanted. 01:34:15.110 --> 01:34:16.750 And so here is that trade-off. 01:34:16.750 --> 01:34:18.580 You might have been kind of excited, so far 01:34:18.580 --> 01:34:21.790 as it's possible to be excited about code, that, oh, my God, 01:34:21.790 --> 01:34:24.500 you don't need to do the stupid new line characters anymore. 01:34:24.500 --> 01:34:25.870 But what if you don't want them? 01:34:25.870 --> 01:34:31.390 Now we've kind of found a downside of getting those new lines automatically. 01:34:31.390 --> 01:34:34.870 Well, it turns out if we read the documentation for the print function 01:34:34.870 --> 01:34:38.170 in Python, it, too, can take multiple arguments. 01:34:38.170 --> 01:34:41.462 And what's powerful about Python, too, is 01:34:41.462 --> 01:34:43.420 that it supports not just positional arguments, 01:34:43.420 --> 01:34:47.620 where you just do a comma-separated list of multiple arguments to a function.
01:34:47.620 --> 01:34:51.070 Python supports what are called named arguments, whereby 01:34:51.070 --> 01:34:53.950 if a function, especially one that's super powerful like print, 01:34:53.950 --> 01:34:58.540 takes multiple inputs, like this one, this other one, and this other thing, 01:34:58.540 --> 01:35:00.430 each of those inputs can have names. 01:35:00.430 --> 01:35:04.240 And you, the user of that function, can specify the name. 01:35:04.240 --> 01:35:10.330 And it turns out that print in Python supports an argument called "end." 01:35:10.330 --> 01:35:14.920 And you can explicitly say what value you want to give to that parameter 01:35:14.920 --> 01:35:16.300 by mentioning its name. 01:35:16.300 --> 01:35:18.550 And here I'm going to literally do this. 01:35:18.550 --> 01:35:20.740 I'm going to tell the print function that I 01:35:20.740 --> 01:35:25.900 want the value of "end," a parameter, an argument to it, to be quote, unquote. 01:35:25.900 --> 01:35:28.630 The reason for that is that if I read the documentation, 01:35:28.630 --> 01:35:30.160 the default is actually this. 01:35:30.160 --> 01:35:34.600 If you read the documentation, it will tell you print's default value 01:35:34.600 --> 01:35:37.510 for its end argument is backslash n. 01:35:37.510 --> 01:35:39.580 This, too, is a feature that C did not have. 01:35:39.580 --> 01:35:41.350 C did not have optional arguments. 01:35:41.350 --> 01:35:43.510 They're either there or they're not. 01:35:43.510 --> 01:35:46.870 Rather, they either have to be there, or they cannot be there. 01:35:46.870 --> 01:35:51.310 Python supports optional arguments that even have default values. 01:35:51.310 --> 01:35:54.910 And so in this case, the default value of this, per the documentation, 01:35:54.910 --> 01:35:58.240 is that end is quote, unquote backslash n, which 01:35:58.240 --> 01:36:00.880 is why every line ends with that value.
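Here is a brief sketch of named and optional arguments. `sep` is another real parameter of print, defaulting to a single space. The `greet` function is hypothetical and only shows how you might give your own functions optional parameters with default values.

```python
# print takes named arguments: sep goes between items, end goes after the last one.
print("a", "b", "c", sep="-")  # a-b-c


def greet(name, greeting="hello"):
    """Hypothetical function: greeting is optional and defaults to "hello"."""
    return f"{greeting}, {name}"


print(greet("world"))                 # hello, world
print(greet("world", greeting="hi"))  # hi, world
```

Named arguments can appear in any order, for example `greet(greeting="hey", name="David")`.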
01:36:00.880 --> 01:36:02.950 If you want to change that to be nothing, 01:36:02.950 --> 01:36:06.070 the so-called empty string, you change it to quote, unquote. 01:36:06.070 --> 01:36:09.760 So let me go ahead and run this now, and voila, closer. 01:36:09.760 --> 01:36:12.490 It's a little stupid looking, because now my cursor ended up-- 01:36:12.490 --> 01:36:14.690 my prompt ended up on the same line. 01:36:14.690 --> 01:36:18.130 So maybe after this line, let me just go ahead and print nothing, that is, 01:36:18.130 --> 01:36:19.190 a new line. 01:36:19.190 --> 01:36:23.775 And now if I run mario.py, voila, now I get the effect I want. 01:36:23.775 --> 01:36:25.900 And if you want to see what's really going on here, 01:36:25.900 --> 01:36:28.150 I can do something stupid like "HELLO." 01:36:28.150 --> 01:36:34.180 And now I can end every print with "HELLO," "HELLO," "HELLO," "HELLO." 01:36:34.180 --> 01:36:36.430 Not that you would do that, but that's all it means. 01:36:36.430 --> 01:36:40.060 It's ending every call to print with that expression. 01:36:40.060 --> 01:36:44.560 But the correct version, of course, is just to blank it out in this way. 01:36:44.560 --> 01:36:47.260 But here's something that's kind of cool. 01:36:47.260 --> 01:36:49.660 And this is where if you're kind of a geek, 01:36:49.660 --> 01:36:51.880 life starts to get really interesting fast. 01:36:51.880 --> 01:36:55.510 I can actually change my Python code to print out these four question 01:36:55.510 --> 01:37:00.760 marks in the sky to be quite simply print quote, unquote question 01:37:00.760 --> 01:37:03.100 mark times 4. 01:37:03.100 --> 01:37:06.280 And now if I rerun this program, boom, done. 01:37:06.280 --> 01:37:10.350 And here's where, again, you're getting a lot of features in the language 01:37:10.350 --> 01:37:12.100 where you don't have to think about loops, 01:37:12.100 --> 01:37:14.680 you don't have to think about a lot of syntax. 
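Here are both versions of the row of coins, as a sketch. The first is the loop with `end=""` followed by a bare `print()` for the final newline. The second is the one-liner using string repetition.

```python
# Loop version: four question marks on one line, then one newline.
for i in range(4):
    print("?", end="")
print()

# Repetition version: * with a str and an int concatenates that many copies.
coins = "?" * 4
print(coins)
```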
01:37:14.680 --> 01:37:17.590 If you want to take a question mark and do it four times, 01:37:17.590 --> 01:37:20.080 you can literally use the star operator, which 01:37:20.080 --> 01:37:24.490 has been overloaded to support not only multiplication with numbers 01:37:24.490 --> 01:37:31.340 but also automatic concatenation, if you will, with strings in this way. 01:37:31.340 --> 01:37:33.643 So let me go ahead and do one final version for mario. 01:37:33.643 --> 01:37:35.560 Recall that the last thing we built with mario 01:37:35.560 --> 01:37:37.060 looked a little something like this. 01:37:37.060 --> 01:37:40.840 Let me go ahead and change my mario code now to be for i in range of 3, 01:37:40.840 --> 01:37:44.260 because this is a 3 by 3 grid of bricks, let's say. 01:37:44.260 --> 01:37:46.960 And let's go ahead and now, inside of this loop, 01:37:46.960 --> 01:37:53.830 do another nested loop where I do three columns as well. 01:37:53.830 --> 01:37:56.950 And in here, I want to print out a single hash at a time. 01:37:56.950 --> 01:37:58.980 But I don't want to print out a new line. 01:37:58.980 --> 01:38:02.023 I only want to print out a new line here. 01:38:02.023 --> 01:38:04.440 So it turns out that essentially, because Python gives you 01:38:04.440 --> 01:38:07.710 the backslash n's automatically, essentially any logic 01:38:07.710 --> 01:38:09.870 you wrote in the past now needs to be reversed. 01:38:09.870 --> 01:38:13.200 If you ever printed a new line, now you don't want to print a new line. 01:38:13.200 --> 01:38:17.400 And if you ever didn't print a new line, now you do, in some sense. 01:38:17.400 --> 01:38:19.500 So let me go ahead and-- 01:38:19.500 --> 01:38:22.440 not make, wrong language-- python of mario.py. 01:38:22.440 --> 01:38:24.820 And voila, my 3 by 3 grid. 01:38:24.820 --> 01:38:28.320 So this is to say that in Python, we can nest loops, just 01:38:28.320 --> 01:38:31.380 like we did in C. 
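Here is a sketch of the 3-by-3 grid with nested loops. The inner loop suppresses newlines, and the outer loop ends each row with one. The `grid` variable shows an equivalent approach built from string repetition. It isn't from the lecture and is included only for comparison.

```python
for i in range(3):
    for j in range(3):
        print("#", end="")  # stay on the same row
    print()  # end the row

# Equivalent result, built as a single string:
grid = "\n".join("#" * 3 for _ in range(3))
print(grid)
```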
I can use multiple variable names, like i 01:38:31.380 --> 01:38:32.640 and j being conventional. 01:38:32.640 --> 01:38:35.010 There's no curly braces, there's no semicolons. 01:38:35.010 --> 01:38:37.920 But again, the logic, the ideas are still the same. 01:38:37.920 --> 01:38:42.070 It just takes a little bit of time to get used to, for instance, 01:38:42.070 --> 01:38:43.650 some of the new syntax. 01:38:43.650 --> 01:38:48.600 You'll recall that in C, we ran into a problem pretty early on with integers. 01:38:48.600 --> 01:38:50.880 And let me create a program here called int.py. 01:38:50.880 --> 01:38:53.970 And let me initialize a variable called i to 1. 01:38:53.970 --> 01:38:55.830 And let me go ahead and do this forever. 01:38:55.830 --> 01:38:56.890 Let me do this forever. 01:38:56.890 --> 01:38:58.140 Inside of a while True block, 01:38:58.140 --> 01:38:59.760 let me print out whatever i is. 01:38:59.760 --> 01:39:04.380 And then let me go ahead and just add 1 to i on each iteration. 01:39:04.380 --> 01:39:06.370 Let me go ahead and run this program. 01:39:06.370 --> 01:39:09.640 And let me increase the size of my window for now and just run this thing. 01:39:09.640 --> 01:39:10.740 Whoops, that was mario. 01:39:10.740 --> 01:39:16.010 Let me run this thing, python of int.py. 01:39:16.010 --> 01:39:18.510 And you'll see that it's counting up to infinity. 01:39:18.510 --> 01:39:21.020 And honestly, this is going to take a while. 01:39:21.020 --> 01:39:25.340 You know what's faster than counting by 1? Maybe multiplying by 2. 01:39:25.340 --> 01:39:28.040 So let me go ahead and multiply by 2 instead. 01:39:28.040 --> 01:39:30.440 To kill the program, just like in C, I use Control-C. 01:39:30.440 --> 01:39:32.390 And that's why I see KeyboardInterrupt. 01:39:32.390 --> 01:39:34.860 It respected my wanting to cancel the program. 01:39:34.860 --> 01:39:37.610 Let me rerun this now and just count really big.
01:39:37.610 --> 01:39:39.830 And even though the internet's being a little slow, 01:39:39.830 --> 01:39:44.120 which is why it's a little shaky, that's a really big number already 01:39:44.120 --> 01:39:46.100 if I keep doubling i. 01:39:46.100 --> 01:39:48.500 What would have happened already at this point 01:39:48.500 --> 01:39:51.620 if I were using C to implement this program? 01:39:51.620 --> 01:39:55.700 If in C I declared a variable called i, and it was an int, 01:39:55.700 --> 01:39:57.770 and I kept doubling it, again and again and again 01:39:57.770 --> 01:39:59.465 and again and again, literally forever? 01:40:02.150 --> 01:40:02.840 Any thoughts? 01:40:02.840 --> 01:40:03.740 Yeah. 01:40:03.740 --> 01:40:05.480 What would have happened in C? Joy? 01:40:08.210 --> 01:40:11.102 AUDIENCE: Yeah, I think it would have crashed. 01:40:11.102 --> 01:40:12.560 DAVID MALAN: It would have crashed? 01:40:12.560 --> 01:40:14.870 Why? 01:40:14.870 --> 01:40:17.537 AUDIENCE: Because it would be taking much memory. 01:40:17.537 --> 01:40:18.620 DAVID MALAN: Good thought. 01:40:18.620 --> 01:40:20.098 So it wouldn't crash per se. 01:40:20.098 --> 01:40:21.140 Something would go wrong. 01:40:21.140 --> 01:40:21.890 It wouldn't crash, 01:40:21.890 --> 01:40:24.170 because it's still an int, and in C at least, 01:40:24.170 --> 01:40:27.590 it would still be taking up on a typical computer 32 bits or 4 bytes. 01:40:27.590 --> 01:40:31.520 But honestly, the program probably would have started printing 0 01:40:31.520 --> 01:40:33.320 by now, or even negative numbers. 01:40:33.320 --> 01:40:35.840 Because recall, one of the limitations of C 01:40:35.840 --> 01:40:38.570 is that integers are a finite size-- 01:40:38.570 --> 01:40:40.650 only 32 bits or 4 bytes.
01:40:40.650 --> 01:40:43.910 Which means if you keep going from 1, 2, 4, 8, 16, 01:40:43.910 --> 01:40:47.000 a million, 2 million, 4 million, 8 million, 01:40:47.000 --> 01:40:49.640 and so forth, eventually you're going to get into the billions. 01:40:49.640 --> 01:40:52.820 And as soon as you cross the 2 billion threshold or maybe the 4 billion 01:40:52.820 --> 01:40:57.000 threshold, depending on whether you're using signed or unsigned numbers, it's going to get too big. 01:40:57.000 --> 01:40:58.670 You're going to have integer overflow. 01:40:58.670 --> 01:41:03.710 But in the world of Python, integer overflow is not a thing anymore. 01:41:03.710 --> 01:41:05.600 In the world of Python, your numbers will 01:41:05.600 --> 01:41:08.090 get as big as you need them to get. 01:41:08.090 --> 01:41:10.770 Python will automatically address this problem for you. 01:41:10.770 --> 01:41:15.200 Unfortunately, floating point imprecision is still a thing. 01:41:15.200 --> 01:41:18.058 So I only divided 1 by 2 earlier. 01:41:18.058 --> 01:41:21.350 But if I continued to divide other values and looked at enough decimal points, 01:41:21.350 --> 01:41:24.290 we would still suffer, unfortunately, from floating point imprecision. 01:41:24.290 --> 01:41:27.950 However, in the world of Python, like in Java and other languages, 01:41:27.950 --> 01:41:30.380 there are libraries, scientific libraries, 01:41:30.380 --> 01:41:33.500 that allow you to use as much precision as you need, 01:41:33.500 --> 01:41:35.720 or at least as much as your computer's memory allows. 01:41:35.720 --> 01:41:39.560 So those problems, too, have been better solved in more modern languages 01:41:39.560 --> 01:41:42.260 than in something out of the box like C code. 01:41:42.260 --> 01:41:45.800 But just by multiplying that number again and again, I was able, then, 01:41:45.800 --> 01:41:50.720 to demonstrate much larger numbers than we ever saw in weeks past.
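The lecture doesn't name specific libraries here; as one concrete illustration, Python's standard library ships `fractions` (exact rational arithmetic) and `decimal` (user-chosen precision). The sketch contrasts them with ordinary floats; the 50-digit precision is an arbitrary choice.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# Ordinary floats are binary, so 0.1 and 0.2 are only approximated.
float_sum = 0.1 + 0.2
print(float_sum)         # 0.30000000000000004
print(float_sum == 0.3)  # False

# Fraction keeps an exact numerator and denominator (both arbitrary-size ints).
exact = Fraction(1, 10) + Fraction(2, 10)
print(exact == Fraction(3, 10))  # True

# Decimal computes to however many significant digits the context allows.
getcontext().prec = 50
third = Decimal(1) / Decimal(3)
print(third)  # 0. followed by fifty 3s
```

The trade-off is speed: both types are implemented in software, so they are slower than hardware floats and are usually reserved for cases where exactness matters, such as money.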
01:41:50.720 --> 01:41:53.690 Well, let me go ahead and do another program here, 01:41:53.690 --> 01:41:56.690 this one called scores.py. 01:41:56.690 --> 01:41:58.850 That's going to be an example of really keeping 01:41:58.850 --> 01:42:03.140 track of scores, which was an example we did early on in week 2 of the class. 01:42:03.140 --> 01:42:05.390 And in Python, I'm going to go ahead and give myself 01:42:05.390 --> 01:42:06.890 a list of scores like this-- 01:42:06.890 --> 01:42:08.900 72, 73, and 33-- 01:42:08.900 --> 01:42:11.285 again, sort of a playful reference to our ASCII numbers. 01:42:11.285 --> 01:42:13.160 But in this context, they're quiz scores-- so 01:42:13.160 --> 01:42:16.010 two OK quiz scores, and one kind of low quiz score, 01:42:16.010 --> 01:42:18.110 assuming these things are out of like 100. 01:42:18.110 --> 01:42:19.850 But notice the syntax I'm using. 01:42:19.850 --> 01:42:22.680 Square brackets in Python give me a list. 01:42:22.680 --> 01:42:25.070 I don't have to decide in advance how big it is. 01:42:25.070 --> 01:42:27.600 It's not an array per se, but it's similar in spirit. 01:42:27.600 --> 01:42:29.480 And it will automatically grow or shrink. 01:42:29.480 --> 01:42:31.370 And the syntax is even simpler. 01:42:31.370 --> 01:42:33.618 Suppose I want to average these scores in Python. 01:42:33.618 --> 01:42:34.910 I could do something like this. 01:42:34.910 --> 01:42:39.140 I could print out that the average of these scores is, for instance-- 01:42:39.140 --> 01:42:40.880 and then I could do something like this. 01:42:40.880 --> 01:42:46.130 I could do the sum of scores divided by the length of scores. 01:42:46.130 --> 01:42:49.190 And some of this is actually kind of new already. 01:42:49.190 --> 01:42:54.710 It turns out in Python that there is a sum function that will take a list as input 01:42:54.710 --> 01:42:58.520 and return to you the sum of those items.
01:42:58.520 --> 01:43:01.790 And we've seen already there's a len function, L-E-N, 01:43:01.790 --> 01:43:03.570 that tells you the length of a list. 01:43:03.570 --> 01:43:07.460 So if I add up all my scores and then divide by the total number of scores, 01:43:07.460 --> 01:43:09.660 that should give me by definition my average. 01:43:09.660 --> 01:43:13.390 So python of scores.py, voila-- 01:43:13.390 --> 01:43:15.620 whoops, what did I do here? 01:43:15.620 --> 01:43:18.390 Ah, I screwed up. 01:43:18.390 --> 01:43:21.560 So unintended, admittedly, but let me try to save myself here. 01:43:21.560 --> 01:43:22.980 So what just happened? 01:43:22.980 --> 01:43:24.855 Well, this error message is a little cryptic. 01:43:24.855 --> 01:43:29.470 It says, "TypeError: can only concatenate str (not "float") to str." 01:43:29.470 --> 01:43:29.970 Long 01:43:29.970 --> 01:43:32.460 story short, Python in this case does not 01:43:32.460 --> 01:43:36.540 like the fact that I'm trying to take a string, average, on the left 01:43:36.540 --> 01:43:40.215 and concatenate to it a float on the right. 01:43:40.215 --> 01:43:42.090 So there are a couple of ways I can solve this. 01:43:42.090 --> 01:43:44.860 And we saw the fundamental solution earlier. 01:43:44.860 --> 01:43:47.700 If this expression here that I've highlighted 01:43:47.700 --> 01:43:52.320 is by definition mathematically a float, but I want it to become a string, 01:43:52.320 --> 01:43:56.400 I can just tell Python, using str, to convert that float to a string. 01:43:56.400 --> 01:44:00.128 So much like there's the itoa function that some of you discovered, 01:44:00.128 --> 01:44:01.920 which is the opposite of the atoi function, 01:44:01.920 --> 01:44:05.430 I can take in Python, in this case a float, 01:44:05.430 --> 01:44:07.330 and convert it to a string equivalent. 01:44:07.330 --> 01:44:13.320 So now if I run python of scores.py, voila, my average is 59.333333.
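A minimal reconstruction of the failing scores.py and its fix might look like the following, assuming the string being concatenated was something like `"Average: "` (the exact label is a guess). The `try`/`except` is there only so the sketch can show the error without crashing.

```python
scores = [72, 73, 33]
average = sum(scores) / len(scores)  # sum() totals the list; len() counts its items

# str + float is not allowed, so Python raises TypeError.
try:
    print("Average: " + average)
except TypeError as error:
    print(error)

# Converting the float with str() first makes the concatenation legal.
message = "Average: " + str(average)
print(message)
```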
01:44:13.320 --> 01:44:15.300 And you already see a bit of imprecision. 01:44:15.300 --> 01:44:19.573 There's some rounding error at the end there that is not a perfect one third. 01:44:19.573 --> 01:44:21.240 But there's another way I could do this. 01:44:21.240 --> 01:44:22.470 And it's a little uglier. 01:44:22.470 --> 01:44:25.050 But I could use one of those f-strings. 01:44:25.050 --> 01:44:27.540 I could, say, go ahead and plug in a value 01:44:27.540 --> 01:44:30.610 here and just print out the user's average. 01:44:30.610 --> 01:44:32.970 So it turns out that inside of these curly braces, 01:44:32.970 --> 01:44:35.880 you don't have to print just variables. 01:44:35.880 --> 01:44:39.040 You can actually put entire coding expressions. 01:44:39.040 --> 01:44:42.137 And I would encourage you not to paste crazy long lines of code, 01:44:42.137 --> 01:44:44.220 because it's going to very quickly get unreadable. 01:44:44.220 --> 01:44:46.290 At that point you probably should use a variable. 01:44:46.290 --> 01:44:49.920 But here I can go ahead and run python of scores.py. 01:44:49.920 --> 01:44:52.170 And voila-- I screwed up again. 01:44:52.170 --> 01:44:54.750 Also not intentional, but I can fix this. 01:44:54.750 --> 01:44:59.310 Yeah, I'm missing the f at the beginning to make this a formatted string. 01:44:59.310 --> 01:45:02.928 And now if I rerun it, voila, same exact answer. 01:45:02.928 --> 01:45:04.470 So again, I have multiple approaches. 01:45:04.470 --> 01:45:05.640 There's a third one here. 01:45:05.640 --> 01:45:09.480 I could do something-- and actually, I don't need the str in that context, 01:45:09.480 --> 01:45:11.940 because now if it's inside of a format string, 01:45:11.940 --> 01:45:15.210 Python will presume that I want to automatically convert it to a string. 01:45:15.210 --> 01:45:16.170 So that's nice. 
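The f-string variant, sketched under the same assumed label, puts an entire expression inside the curly braces and lets Python convert the value to a string, so `str()` is unnecessary. The second string reproduces the slip from the demo, where a missing `f` leaves the braces unevaluated.

```python
scores = [72, 73, 33]

# With the f prefix, the expression in braces is evaluated and converted to str.
with_f = f"Average: {sum(scores) / len(scores)}"

# Without the f prefix, the braces are just ordinary characters.
without_f = "Average: {sum(scores) / len(scores)}"

print(with_f)
print(without_f)  # Average: {sum(scores) / len(scores)}
```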
01:45:16.170 --> 01:45:18.780 Or I can just factor this out, and I can say something 01:45:18.780 --> 01:45:22.230 like this-- give me a variable called average, assign it equal to that math, 01:45:22.230 --> 01:45:23.920 and then print out the average. 01:45:23.920 --> 01:45:26.970 So again, just like in C, so many different ways to solve the problem. 01:45:26.970 --> 01:45:29.550 And which one is best depends really on what 01:45:29.550 --> 01:45:33.953 might be most readable, most maintainable, or easiest to do. 01:45:33.953 --> 01:45:36.120 Let me go ahead and add some scores dynamically now. 01:45:36.120 --> 01:45:38.010 Instead of hardcoding my three scores, let 01:45:38.010 --> 01:45:41.010 me ask myself for my scores over the course of the semester. 01:45:41.010 --> 01:45:44.850 From cs50, let me import get_int, just so I can get some numbers easily. 01:45:44.850 --> 01:45:48.270 Let me give myself an empty list of scores, the syntax for which 01:45:48.270 --> 01:45:52.380 is just open bracket, close bracket, so nothing inside of it initially. 01:45:52.380 --> 01:45:53.880 And now let me go ahead and do this. 01:45:53.880 --> 01:45:55.380 Let me get myself three scores-- 01:45:55.380 --> 01:45:56.880 maybe it's the end of the term now. 01:45:56.880 --> 01:46:03.120 For i in range of 3, let me go ahead and append to the scores list 01:46:03.120 --> 01:46:07.453 whatever the return value of get_int is like this. 01:46:07.453 --> 01:46:09.370 Now, this, too, I could do in a bunch of ways. 01:46:09.370 --> 01:46:12.042 Let me get rid of this here. 01:46:12.042 --> 01:46:12.930 Whoops. 01:46:12.930 --> 01:46:14.160 Nope, we'll leave that there. 01:46:14.160 --> 01:46:15.540 This I could do in a bunch of ways. 01:46:15.540 --> 01:46:16.623 But notice what I'm doing. 01:46:16.623 --> 01:46:20.370 I'm calling get_int, and I'm passing the return value of get_int 01:46:20.370 --> 01:46:21.870 to a new function called append.
01:46:21.870 --> 01:46:24.840 It turns out that lists, the square brackets, 01:46:24.840 --> 01:46:27.600 once you've defined them in a variable like scores, 01:46:27.600 --> 01:46:29.350 they, too, have functions built into them. 01:46:29.350 --> 01:46:34.090 So I can do scores.append in order to add a number to the list. 01:46:34.090 --> 01:46:36.840 So now let me go ahead and run this, python of scores.py. 01:46:36.840 --> 01:46:40.260 Let me manually type in my 72, my 73, and my 33. 01:46:40.260 --> 01:46:42.675 And voila, same exact answer. 01:46:42.675 --> 01:46:44.550 But think about how much of a pain this would 01:46:44.550 --> 01:46:46.950 have been in C, if you had to either decide 01:46:46.950 --> 01:46:49.800 in advance the size of the array, or not decide in advance 01:46:49.800 --> 01:46:53.460 and use malloc and realloc to keep growing and shrinking it. 01:46:53.460 --> 01:46:56.340 Python, using this append function, which 01:46:56.340 --> 01:46:59.880 comes inside of that list variable, handles 01:46:59.880 --> 01:47:03.070 all of this automatically for us. 01:47:03.070 --> 01:47:03.570 All right. 01:47:03.570 --> 01:47:06.780 So that, too, is a whole bunch of features. 01:47:06.780 --> 01:47:10.200 Any questions, though, that I can answer here? 01:47:13.150 --> 01:47:16.420 Any questions? 01:47:16.420 --> 01:47:16.920 No? 01:47:16.920 --> 01:47:19.290 Yeah, over to Santiago. 01:47:19.290 --> 01:47:20.460 AUDIENCE: Yeah. 01:47:20.460 --> 01:47:22.800 I had a question about-- 01:47:22.800 --> 01:47:28.410 so even if append reduces the amount of code you have to write, 01:47:28.410 --> 01:47:31.560 does it underneath the hood just do exactly what we 01:47:31.560 --> 01:47:35.900 were doing in C, which is like, malloc and realloc, or something like that? 01:47:35.900 --> 01:47:38.100 Is that all-- is that happening inside Python? 01:47:38.100 --> 01:47:38.970 DAVID MALAN: It is. 
01:47:38.970 --> 01:47:41.068 Yeah, that's exactly what you're getting for free, 01:47:41.068 --> 01:47:42.360 so to speak, with the language. 01:47:42.360 --> 01:47:45.030 All of that malloc stuff, realloc stuff, maybe it's 01:47:45.030 --> 01:47:47.448 implemented with an array underneath the hood, 01:47:47.448 --> 01:47:48.990 like in the actual computer's memory. 01:47:48.990 --> 01:47:51.138 Maybe it's a linked list like we saw last week. 01:47:51.138 --> 01:47:52.680 But all of that is happening for you. 01:47:52.680 --> 01:47:55.470 But that, again, is one of the reasons why the code ultimately 01:47:55.470 --> 01:47:59.370 runs a little slower, because you have someone else's code in between you 01:47:59.370 --> 01:48:03.210 and the CPU in your computer doing a bit of that work for you. 01:48:03.210 --> 01:48:06.180 Sophia? 01:48:06.180 --> 01:48:08.360 AUDIENCE: Are there efficiency differences 01:48:08.360 --> 01:48:13.640 in between the ways that we print, of utilizing the f formatting 01:48:13.640 --> 01:48:16.550 or the other forms that we've used? 01:48:16.550 --> 01:48:19.700 DAVID MALAN: You don't have to be-- if I'm understanding correctly, 01:48:19.700 --> 01:48:21.292 there are some fancy features of it. 01:48:21.292 --> 01:48:23.000 For instance, there is syntax you can use 01:48:23.000 --> 01:48:25.190 to specify how many decimal points you want 01:48:25.190 --> 01:48:27.470 to print after a floating point value. 01:48:27.470 --> 01:48:32.030 But it's no longer all of the %i, %s, %f, and so forth. 01:48:32.030 --> 01:48:34.970 They're slightly different syntax, but fortunately less of it, 01:48:34.970 --> 01:48:39.620 since you don't have to worry as much about those conventions. 01:48:39.620 --> 01:48:43.370 Other questions or confusion? 01:48:43.370 --> 01:48:43.870 No? 01:48:43.870 --> 01:48:44.240 All right. 
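Here is a sketch combining the dynamic version of scores.py with the formatting feature mentioned in the answer to Sophia. It swaps cs50's `get_int` for a hypothetical `collect_scores` function that parses a list of strings standing in for user input, so it runs without the cs50 library. `:.2f` is the format spec for two digits after the decimal point.

```python
def collect_scores(raw_values):
    """Hypothetical stand-in for repeated get_int calls: parse each string as an int."""
    scores = []  # start empty; no size has to be chosen in advance
    for raw in raw_values:
        scores.append(int(raw))  # the list grows as items are appended
    return scores


scores = collect_scores(["72", "73", "33"])
average = sum(scores) / len(scores)

# The format spec after the colon rounds the display to two decimal places.
print(f"Average: {average:.2f}")  # Average: 59.33
```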
01:48:44.240 --> 01:48:46.365 Well, let me go ahead and do one other example that 01:48:46.365 --> 01:48:48.280 might be familiar from some weeks past. 01:48:48.280 --> 01:48:51.190 Let me go ahead and whip up a quick example of uppercasing, just 01:48:51.190 --> 01:48:53.230 to tie together one of our earlier examples 01:48:53.230 --> 01:48:55.660 that we saw more organically, or lowercasing. 01:48:55.660 --> 01:48:58.392 In this case, a file called uppercase.py. 01:48:58.392 --> 01:49:00.850 Let me go ahead, and from the CS50 library, let me go ahead 01:49:00.850 --> 01:49:02.338 and import get_string. 01:49:02.338 --> 01:49:05.380 And then once I have this, let me go ahead and get a string from the user 01:49:05.380 --> 01:49:09.070 and ask them for, "Before," for instance. 01:49:09.070 --> 01:49:11.510 And then let me go ahead and do the following. 01:49:11.510 --> 01:49:13.810 Let me go ahead and print out "After," the goal being I 01:49:13.810 --> 01:49:16.950 want to uppercase this whole string for the user. 01:49:16.950 --> 01:49:18.950 And I'm going to keep this all on the same line. 01:49:18.950 --> 01:49:21.367 So again, I want a program that's going to print "Before," 01:49:21.367 --> 01:49:23.320 ask the human for some input, and then after, 01:49:23.320 --> 01:49:26.150 show the capitalized version of the whole string. 01:49:26.150 --> 01:49:27.320 So how can I do this? 01:49:27.320 --> 01:49:28.720 Well, we've seen one way already. 01:49:28.720 --> 01:49:33.160 I can do literally, for instance, s.upper. 01:49:33.160 --> 01:49:34.790 And let me go ahead and save this. 01:49:34.790 --> 01:49:37.070 And now run python of uppercase.py. 01:49:37.070 --> 01:49:39.550 Let me type in "hi" in lowercase, and boom, now 01:49:39.550 --> 01:49:41.350 I get back the uppercase version. 01:49:41.350 --> 01:49:44.200 But if you want, you can actually manipulate individual characters 01:49:44.200 --> 01:49:44.848 as well. 
01:49:44.848 --> 01:49:47.140 Let me go ahead and do this a little more pedantically. 01:49:47.140 --> 01:49:50.290 For c in s, print c. 01:49:50.290 --> 01:49:53.180 Now, this isn't quite what I want yet, but it's a stepping stone. 01:49:53.180 --> 01:49:55.930 Notice now if I type in "hi" in lowercase, 01:49:55.930 --> 01:49:59.930 I see "h," "i," exclamation point, all still lowercase. 01:49:59.930 --> 01:50:01.557 So I haven't done anything interesting. 01:50:01.557 --> 01:50:03.640 But you know what, let me get rid of the newline, 01:50:03.640 --> 01:50:06.650 just so it all stays on the same line, because that was kind of ugly. 01:50:06.650 --> 01:50:07.710 Let me do it again. 01:50:07.710 --> 01:50:08.590 OK, a little better. 01:50:08.590 --> 01:50:11.230 Let me actually add a newline at the very end of the program 01:50:11.230 --> 01:50:13.040 to move my cursor to the next line. 01:50:13.040 --> 01:50:14.860 Let's do it once more, "hi." 01:50:14.860 --> 01:50:17.420 OK, I'm not uppercasing anything. 01:50:17.420 --> 01:50:23.560 But if I change c to c.upper, I can do that as expected. 01:50:23.560 --> 01:50:25.630 And let me run it again, "hi," and boom. 01:50:25.630 --> 01:50:27.490 Now I have another working program. 01:50:27.490 --> 01:50:32.230 But the new feature now is, notice this coolness on line 5. 01:50:32.230 --> 01:50:35.080 If you want to iterate over a string's characters, 01:50:35.080 --> 01:50:39.130 you don't need to initialize i to 0 and then use square bracket notation 01:50:39.130 --> 01:50:45.460 like you did in C. You just say, for c in s, or for x in y, whatever it is. 01:50:45.460 --> 01:50:50.257 A for loop can also be used to iterate over the individual characters in a string, 01:50:50.257 --> 01:50:52.090 as you might want to do when doing something 01:50:52.090 --> 01:50:54.133 like cryptography or the like. 01:50:54.133 --> 01:50:56.800 So we don't have to just uppercase the whole string all at once.
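Both uppercase.py approaches are sketched below with a hardcoded input `"hi!"` in place of `get_string("Before: ")`, an assumption so the snippet runs without cs50. The first uses `str.upper` on the whole string; the second loops over characters and passes `end=""` to keep everything on one line.

```python
s = "hi!"  # hardcoded stand-in for user input

# Approach 1: uppercase the whole string at once.
print(f"After:  {s.upper()}")

# Approach 2: iterate over characters directly, no index variable needed.
print("After:  ", end="")     # end="" suppresses print's trailing newline
for c in s:
    print(c.upper(), end="")  # each uppercased character on the same line
print()                       # finish with a single newline
```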
01:50:56.800 --> 01:51:00.340 We can still gain access to our individual values. 01:51:00.340 --> 01:51:03.550 And there are other things you can do in Python as well that we could do in C. 01:51:03.550 --> 01:51:07.480 Let me go ahead and create a program here called argv.py, 01:51:07.480 --> 01:51:12.308 for argument vector, which, recall, was the name of the input to main 01:51:12.308 --> 01:51:14.350 that allows you to access command line arguments. 01:51:14.350 --> 01:51:17.090 Now, today we have seen that you can have a main function, 01:51:17.090 --> 01:51:19.240 and it's conventional, but you don't need to. 01:51:19.240 --> 01:51:20.810 It's not required anymore. 01:51:20.810 --> 01:51:24.370 And so we haven't seen argc or argv yet, but that's 01:51:24.370 --> 01:51:26.500 because they're elsewhere in Python. 01:51:26.500 --> 01:51:29.650 If you want to access command line arguments in Python, 01:51:29.650 --> 01:51:33.640 it turns out that you can import argv from a module called sys. 01:51:33.640 --> 01:51:37.300 And this is a little new, but it follows the same pattern as CS50's library. 01:51:37.300 --> 01:51:42.640 I'm going to import from the sys library, short for system, a feature called argv. 01:51:42.640 --> 01:51:45.700 So this just means that it comes with Python, but to use it 01:51:45.700 --> 01:51:47.980 you have to import it explicitly. 01:51:47.980 --> 01:51:49.360 And now I'm going to do this. 01:51:49.360 --> 01:51:54.490 If the length of argv equals 2, then I'm going 01:51:54.490 --> 01:51:57.340 to go ahead and print out, just like we did a few weeks ago, 01:51:57.340 --> 01:52:01.660 "hello," and then argv bracket 1. 01:52:01.660 --> 01:52:04.480 Somewhat cryptic, but I'll come back to this in a moment. 01:52:04.480 --> 01:52:07.580 Else, I'm going to go ahead and print out a default of "hello, world."
01:52:07.580 --> 01:52:09.970 So we did this some weeks ago, in week 2, 01:52:09.970 --> 01:52:14.260 whereby we ran a program that if the user typed their name at the prompt, 01:52:14.260 --> 01:52:16.250 it would say "hello, David" or "hello, Brian." 01:52:16.250 --> 01:52:18.410 If they didn't, it would just say "hello, world." 01:52:18.410 --> 01:52:22.150 So to be clear, if I run this thing and run it without any command line 01:52:22.150 --> 01:52:24.250 arguments, I just see "hello, world." 01:52:24.250 --> 01:52:27.790 If I run it again, though, and type my name in and hit Enter, 01:52:27.790 --> 01:52:29.020 now I see "hello, David." 01:52:29.020 --> 01:52:30.340 So how is that working? 01:52:30.340 --> 01:52:33.340 Well, this first line of code gives me access to argv, 01:52:33.340 --> 01:52:37.100 which is now tucked away in the sys library, if you will, 01:52:37.100 --> 01:52:38.800 the sys package, so to speak. 01:52:38.800 --> 01:52:40.300 But it works the same way. 01:52:40.300 --> 01:52:42.580 There's no argc, but no problem. 01:52:42.580 --> 01:52:46.210 If argv is a list of command line arguments, which it is, 01:52:46.210 --> 01:52:50.620 len, L-E-N, will tell me the length of that list, which is equivalent to argc. 01:52:50.620 --> 01:52:55.930 So I can reconstruct the same idea from my version in C. 01:52:55.930 --> 01:52:59.680 And here, then, I have a format string that prints out "hello," comma, 01:52:59.680 --> 01:53:01.270 and then whatever's in curly braces. 01:53:01.270 --> 01:53:02.650 And argv is a list. 01:53:02.650 --> 01:53:05.830 And just like in C, which had arrays, a list 01:53:05.830 --> 01:53:09.265 is just an array that can dynamically grow and shrink for you. 01:53:09.265 --> 01:53:13.610 You can still use square bracket notation to get at, in this case, 01:53:13.610 --> 01:53:15.790 the second thing the human typed. 01:53:15.790 --> 01:53:18.310 So let me change this just for clarity to be 0. 
01:53:18.310 --> 01:53:20.680 And if I rerun this now and type in David, 01:53:20.680 --> 01:53:23.590 it says weirdly, "hello, argv.py." 01:53:23.590 --> 01:53:25.540 So what you don't see is the word "Python." 01:53:25.540 --> 01:53:29.050 Python is the interpreter, but that's not part of your program's execution 01:53:29.050 --> 01:53:29.890 per se. 01:53:29.890 --> 01:53:36.100 argv 0 is going to be the name of the Python program you're running, 01:53:36.100 --> 01:53:39.920 and argv 1 is going to be the first word thereafter, and so forth. 01:53:39.920 --> 01:53:42.310 So we still have access to that feature, but now 01:53:42.310 --> 01:53:44.008 we can use it in Python. 01:53:44.008 --> 01:53:46.800 And in fact, if I want to print out all the command line arguments, 01:53:46.800 --> 01:53:48.330 I can just more simply do this-- 01:53:48.330 --> 01:53:52.200 for arg in argv, go ahead and print arg. 01:53:52.200 --> 01:53:55.172 So very succinct, if not obvious at first glance. 01:53:55.172 --> 01:53:56.880 Now let me go ahead and type in something 01:53:56.880 --> 01:53:58.710 like "David Malan," two words. 01:53:58.710 --> 01:54:04.710 Enter, and you now see the program's name printed, followed by each word typed after it, 01:54:04.710 --> 01:54:05.650 one per line. 01:54:05.650 --> 01:54:10.800 So here, too, notice how neatly we can iterate over a list in Python. 01:54:10.800 --> 01:54:13.470 There's no i, there's no square brackets necessarily. 01:54:13.470 --> 01:54:18.240 You can just say, for arg in argv, just like a moment ago I said for c in s. 01:54:18.240 --> 01:54:21.480 Pretty much the Python for loop is smart enough 01:54:21.480 --> 01:54:25.050 to figure out what it is you want it to iterate over, 01:54:25.050 --> 01:54:26.650 whether it's a string or a list.
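A sketch of argv.py follows. The lecture writes `from sys import argv`; this version uses `import sys` instead and wraps the greeting logic in a hypothetical `greet` function so it can be checked with a hand-built argument list. Because `sys.argv[0]` is the script name, a single extra word makes the length 2.

```python
import sys


def greet(args):
    """Return the greeting for an argv-style list (hypothetical helper)."""
    if len(args) == 2:  # script name plus exactly one word
        return f"hello, {args[1]}"
    return "hello, world"


if __name__ == "__main__":
    print(greet(sys.argv))
    for arg in sys.argv:  # iterate directly; argv[0] is the script name
        print(arg)
```

Running `python argv.py David` would print `hello, David`, then `argv.py` and `David` on separate lines.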
01:54:26.650 --> 01:54:29.553 And my God, it's just so much more fun or pleasant to program 01:54:29.553 --> 01:54:32.220 now, when you don't have to worry about all the stupid mechanics 01:54:32.220 --> 01:54:35.040 of incrementing, and plus plus, and semicolons, and all 01:54:35.040 --> 01:54:37.620 of that syntactical mess. 01:54:37.620 --> 01:54:40.162 All right, let me pause here to see if there are any questions. 01:54:40.162 --> 01:54:42.578 I know we're going through some of these examples quickly, 01:54:42.578 --> 01:54:44.430 but they're really just translations again. 01:54:44.430 --> 01:54:46.860 And with upcoming problems and problem sets, 01:54:46.860 --> 01:54:52.990 you'll be able to more methodically compare before and after as well. 01:54:52.990 --> 01:54:55.180 Anything at all on your end, Brian? 01:54:55.180 --> 01:54:56.080 BRIAN: Nothing here. 01:54:56.080 --> 01:54:57.038 DAVID MALAN: All right. 01:54:57.038 --> 01:54:59.080 So let's look at some of our final past examples. 01:54:59.080 --> 01:55:01.246 And then we'll reserve some time at the end of today 01:55:01.246 --> 01:55:02.980 to look at some even more powerful things 01:55:02.980 --> 01:55:06.230 that we can do now because of languages like Python. 01:55:06.230 --> 01:55:10.615 Let me go ahead and create a program, this time called exit.py, exit.py. 01:55:10.615 --> 01:55:12.490 And this program's purpose in life, it's just 01:55:12.490 --> 01:55:14.080 going to demonstrate exit statuses. 01:55:14.080 --> 01:55:16.270 Recall that eventually in C, we introduced 01:55:16.270 --> 01:55:20.320 the notion of returning 0, or returning 1, or any other value from main. 01:55:20.320 --> 01:55:22.720 We do have that ability now in Python, too, 01:55:22.720 --> 01:55:25.278 which you'll start to see in larger programs.
01:55:25.278 --> 01:55:27.070 Here, too, I'm going to go ahead and import 01:55:27.070 --> 01:55:30.740 sys, the whole thing this time, just to show a different way of doing this. 01:55:30.740 --> 01:55:34.420 I'm going to say, if the length of sys.argv 01:55:34.420 --> 01:55:37.960 does not equal 2, let me go ahead and yell at the user, 01:55:37.960 --> 01:55:40.570 "Missing command-line argument." 01:55:40.570 --> 01:55:44.680 And then after this, I'm going to go ahead and do sys.exit 1. 01:55:44.680 --> 01:55:48.940 Otherwise, I'm going to go ahead and print out a formatted string that 01:55:48.940 --> 01:55:54.640 says "hello," comma, and then argv bracket 1, with sys now in front of it 01:55:54.640 --> 01:55:56.420 for reasons I'll explain in a moment. 01:55:56.420 --> 01:56:01.490 And then at the end, I'm going to go ahead and by default call sys.exit with 0. 01:56:01.490 --> 01:56:01.990 All right. 01:56:01.990 --> 01:56:03.270 So what is going on here? 01:56:03.270 --> 01:56:06.730 One, because I'm now using sys for two different things, 01:56:06.730 --> 01:56:09.190 I decided not to import argv specifically, 01:56:09.190 --> 01:56:11.170 but just to import the whole library. 01:56:11.170 --> 01:56:14.770 But because I did that, I can't just write the word "argv" anywhere. 01:56:14.770 --> 01:56:18.700 I now have to prefix it with the name of the package or library that it's in. 01:56:18.700 --> 01:56:22.870 So that's why I started doing sys.argv, sys.argv. 01:56:22.870 --> 01:56:27.580 But I'm also using another feature of the sys library, which gives me access 01:56:27.580 --> 01:56:33.220 to an exit function, which is the equivalent of returning from main. 01:56:33.220 --> 01:56:34.630 So this is a bit of a dichotomy. 01:56:34.630 --> 01:56:39.430 In C, you had to return 0 or 1, or some other integer from main. 01:56:39.430 --> 01:56:44.750 In Python, you instead call sys.exit with the same kinds of numbers.
01:56:44.750 --> 01:56:48.340 So a little bit different syntactically, but it's the same fundamental idea. 01:56:48.340 --> 01:56:49.890 What's the purpose of this program? 01:56:49.890 --> 01:56:52.150 Well, if I run this thing, its purpose is just 01:56:52.150 --> 01:56:56.590 to make me type in one word and only one word after my program's name. 01:56:56.590 --> 01:56:59.080 So notice, if I just run python of exit.py, 01:56:59.080 --> 01:57:01.750 it's yelling at me, "Missing command-line argument." 01:57:01.750 --> 01:57:05.890 If I run it instead with my name after that, now it says "hello, David." 01:57:05.890 --> 01:57:07.300 So stupid program. 01:57:07.300 --> 01:57:11.200 It's only meant to demonstrate how you can now return different values 01:57:11.200 --> 01:57:14.295 or really return prematurely from a program, 01:57:14.295 --> 01:57:15.670 because you're no longer in main. 01:57:15.670 --> 01:57:21.770 You can't return per se, but you can now in Python exit as needed. 01:57:21.770 --> 01:57:24.350 So that's the comparable line there. 01:57:24.350 --> 01:57:26.435 All right, any questions, then, on exit statuses? 01:57:26.435 --> 01:57:29.060 Again, we're just kind of churning through the list of features 01:57:29.060 --> 01:57:33.350 we saw in C, even if they don't come to you super naturally-- 01:57:33.350 --> 01:57:40.540 pun not intended-- but rather, there are analogs here in the Python world. 01:57:40.540 --> 01:57:41.040 No? 01:57:41.040 --> 01:57:41.500 All right. 01:57:41.500 --> 01:57:43.440 Well, recall that after that we started focusing really 01:57:43.440 --> 01:57:44.760 in the class on algorithms. 01:57:44.760 --> 01:57:46.560 And that's when the size of our data sets 01:57:46.560 --> 01:57:49.620 and our-- the efficiency of our code started to really matter. 
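A sketch of exit.py's logic follows. `sys.exit` works by raising `SystemExit`, whose `code` attribute holds the status; the hypothetical `exit_status` helper catches that exception purely to show which status each case produces. In a real script you would just call `main(sys.argv)` at the bottom and let the exception end the program with that status.

```python
import sys


def main(args):
    """Mimic exit.py: require exactly one word after the program's name."""
    if len(args) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # nonzero status signals failure
    print(f"hello, {args[1]}")
    sys.exit(0)      # zero signals success


def exit_status(args):
    """Hypothetical demo helper: run main and return the status it exits with."""
    try:
        main(args)
    except SystemExit as e:
        return e.code


print(exit_status(["exit.py"]))           # prints the error message, then 1
print(exit_status(["exit.py", "David"]))  # prints hello, David, then 0
```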
01:57:49.620 --> 01:57:52.230 Let me go ahead and write a program called numbers.py 01:57:52.230 --> 01:57:55.440 that, for instance, contains an import at the top for sys, 01:57:55.440 --> 01:57:56.980 because I'll need that in a moment. 01:57:56.980 --> 01:57:58.050 And then it gives me-- 01:57:58.050 --> 01:58:02.850 and let me give myself an array of numbers, like 4, 6, 8, 2, 7, 5, 0. 01:58:02.850 --> 01:58:07.030 And you might recall that those were the numbers behind the doors in week 3. 01:58:07.030 --> 01:58:09.450 And suppose that I want to search for the number 0. 01:58:09.450 --> 01:58:13.830 Well, in C, to implement linear search you would use a for loop 01:58:13.830 --> 01:58:17.130 and a variable like i, and check all of the locations. 01:58:17.130 --> 01:58:18.690 Python is way simpler. 01:58:18.690 --> 01:58:25.180 If 0 in numbers, go ahead and print out "Found." 01:58:25.180 --> 01:58:31.850 And then I'll go ahead and else print out "Not found." 01:58:31.850 --> 01:58:32.750 And that's it. 01:58:32.750 --> 01:58:35.500 So let me go ahead now and do python of numbers.py. 01:58:35.500 --> 01:58:38.360 Hopefully I will see "Found," because it's in fact there. 01:58:38.360 --> 01:58:38.920 So that's it. 01:58:38.920 --> 01:58:44.620 Linear search is just a prepositional phrase: if 0 in numbers 01:58:44.620 --> 01:58:47.620 gives you the answer, True or False, that you want. 01:58:47.620 --> 01:58:49.360 So there is our linear search. 01:58:49.360 --> 01:58:50.830 What if I want to do it for names? 01:58:50.830 --> 01:58:54.070 Well, let me go ahead and give myself a second file, similar in spirit, 01:58:54.070 --> 01:58:55.600 called names.py. 01:58:55.600 --> 01:58:56.800 Let me again import-- 01:58:56.800 --> 01:59:00.400 and actually, if I really want to be identical to our C version, 01:59:00.400 --> 01:59:05.737 let me go ahead and exit with 0 here, and let me exit with 1 here.
01:59:05.737 --> 01:59:07.570 But strictly speaking, that's not necessary. 01:59:07.570 --> 01:59:11.265 That just happens to be what I did when we did this in C instead. 01:59:11.265 --> 01:59:13.390 In names, let me go ahead and do something similar. 01:59:13.390 --> 01:59:17.020 Let me give myself a names list with a whole bunch of names-- 01:59:17.020 --> 01:59:25.750 "Bill," and "Charlie," and "Fred," and "George," and "Ginny," and "Percy," 01:59:25.750 --> 01:59:28.700 and lastly "Ron," all the way at the end. 01:59:28.700 --> 01:59:31.720 And then let me just check if "Ron" is in that list using linear search. 01:59:31.720 --> 01:59:36.820 If "Ron" in names, go ahead and print out "Found." 01:59:36.820 --> 01:59:39.220 Else, go ahead and print out "Not found." 01:59:39.220 --> 01:59:43.120 And I won't bother printing out or exiting with 0 or 1 this time. 01:59:43.120 --> 01:59:45.760 But let me go ahead and run python of names-- 01:59:45.760 --> 01:59:48.010 whoops, python of names. 01:59:48.010 --> 01:59:49.840 And voila, we found "Ron." 01:59:49.840 --> 01:59:51.160 And notice, I'm not cheating. 01:59:51.160 --> 01:59:52.720 I don't think I've screwed up. 01:59:52.720 --> 01:59:55.960 If I go ahead and say "Ronald," if that was in fact his formal name, 01:59:55.960 --> 01:59:58.120 now I search for "Ron," not found. 01:59:58.120 --> 01:59:59.973 It's looking, indeed, for an exact match. 01:59:59.973 --> 02:00:02.140 So that's pretty cool, that we can distill something 02:00:02.140 --> 02:00:03.430 like that pretty readily. 02:00:03.430 --> 02:00:06.670 Well, recall that a little bit ago, I proposed that Python has other data 02:00:06.670 --> 02:00:11.080 types as well, among which are these things called dictionaries or dicts, 02:00:11.080 --> 02:00:17.380 D-I-C-T, which represent a collection of key-value pairs similar in spirit 02:00:17.380 --> 02:00:18.250 to a dictionary. 
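The `in` checks from numbers.py and names.py are shown below alongside a hypothetical `linear_search` function that spells out roughly what `in` does for a list: compare items one at a time until one is equal, and report whether any matched.

```python
def linear_search(items, target):
    """Hypothetical explicit version of `target in items` for a list."""
    for item in items:
        if item == target:  # exact equality, so "Ron" does not match "Ronald"
            return True
    return False


numbers = [4, 6, 8, 2, 7, 5, 0]
names = ["Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]

print("Found" if 0 in numbers else "Not found")    # Found
print("Found" if "Ron" in names else "Not found")  # Found

formal_names = names[:-1] + ["Ronald"]  # swap in the formal name
print("Found" if "Ron" in formal_names else "Not found")  # Not found
```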
02:00:18.250 --> 02:00:23.110 Like, the Spanish dictionary has Spanish keys and English values, converting one 02:00:23.110 --> 02:00:25.330 to the other; this English dictionary has 02:00:25.330 --> 02:00:27.700 English words and English definitions. 02:00:27.700 --> 02:00:30.700 But the same idea-- a collection of keys and values. 02:00:30.700 --> 02:00:32.863 Using one, you can find the other. 02:00:32.863 --> 02:00:35.530 Well, let's go ahead and translate this into Python in a program 02:00:35.530 --> 02:00:38.740 called phonebook.py, and implement something 02:00:38.740 --> 02:00:41.170 like our C phone book a while back, for which, recall, 02:00:41.170 --> 02:00:45.580 in C we used a couple of arrays initially, then we scratched that, 02:00:45.580 --> 02:00:48.160 and we used an array of structs instead. 02:00:48.160 --> 02:00:51.970 Now let's use a dictionary, which is a more general-purpose data 02:00:51.970 --> 02:00:54.320 structure, as follows. 02:00:54.320 --> 02:00:59.290 Let me go ahead here and from cs50 import get_string. 02:00:59.290 --> 02:01:02.410 Then let me go ahead and give myself a dictionary of people. 02:01:02.410 --> 02:01:04.600 And the syntax here is a little different, 02:01:04.600 --> 02:01:07.690 but I'm going to go ahead and preemptively use curly braces. 02:01:07.690 --> 02:01:10.330 They are back for the purposes of dictionaries. 02:01:10.330 --> 02:01:12.820 And then here's how you define key-value pairs. 02:01:12.820 --> 02:01:14.560 One key is going to be "Brian." 02:01:14.560 --> 02:01:18.910 And his value is going to be "+1-617-495-1000." 02:01:18.910 --> 02:01:19.842 That's his number. 02:01:19.842 --> 02:01:21.800 And then I'll be the other key. For now, 02:01:21.800 --> 02:01:24.430 we'll keep it a very small phone book or dictionary. 02:01:24.430 --> 02:01:29.380 Mine will be "+1-949-468-2750." 02:01:29.380 --> 02:01:31.360 And that's it.
02:01:31.360 --> 02:01:34.275 So the curly braces can technically be on different lines. 02:01:34.275 --> 02:01:36.400 I could move this up here, I could get rid of this. 02:01:36.400 --> 02:01:39.100 But there are certain style conventions in Python. 02:01:39.100 --> 02:01:43.240 The point, though, here is that a dictionary is defined with curly braces 02:01:43.240 --> 02:01:47.770 at the beginning and end; the keys and values are separated by colons; 02:01:47.770 --> 02:01:50.993 and the key-value pairs are separated by commas. 02:01:50.993 --> 02:01:53.410 So that's why it's conventional to write it the way I did. 02:01:53.410 --> 02:01:55.118 It's just a little more obvious that this 02:01:55.118 --> 02:01:58.750 is a dictionary with two keys, each of which has a value. 02:01:58.750 --> 02:02:01.640 It's just associating left with right, so to speak. 02:02:01.640 --> 02:02:02.770 Now, what does this mean? 02:02:02.770 --> 02:02:04.840 Suppose I want to search for someone's name. 02:02:04.840 --> 02:02:08.440 Well, let me go ahead and give myself a variable called name, using get_string to ask 02:02:08.440 --> 02:02:09.520 the human for a name. 02:02:09.520 --> 02:02:13.210 And let me implement my own virtual phone book, much like the Contacts app 02:02:13.210 --> 02:02:13.935 on your phone. 02:02:13.935 --> 02:02:16.060 Let me go ahead and then say, once I have the name, 02:02:16.060 --> 02:02:18.640 if name in people, that's great. 02:02:18.640 --> 02:02:20.890 If I found the name in people, let me go ahead 02:02:20.890 --> 02:02:27.100 and print out that the number for that person is people bracket name. 02:02:27.100 --> 02:02:30.100 And this is where dictionaries are going to get really powerful. 02:02:30.100 --> 02:02:32.020 Let me run it first and then explain. 02:02:32.020 --> 02:02:34.870 Python of phonebook.py, Enter-- 02:02:34.870 --> 02:02:38.350 whoops, python of phonebook.py. 02:02:38.350 --> 02:02:40.300 Let me search for Brian's number.
02:02:40.300 --> 02:02:42.685 Boom, there's Brian's number. 02:02:42.685 --> 02:02:44.560 Let me go ahead and run it with David's name. 02:02:44.560 --> 02:02:46.270 Boom, there's that number. 02:02:46.270 --> 02:02:50.380 Let me go ahead and run it with, say, Montague's name. 02:02:50.380 --> 02:02:52.630 Don't have his phone number just yet. 02:02:52.630 --> 02:02:55.810 He's unlisted, as would be anyone else that I type in. 02:02:55.810 --> 02:02:57.850 So what has gone on here? 02:02:57.850 --> 02:03:00.730 Well, at the top I'm declaring this new variable called people. 02:03:00.730 --> 02:03:03.820 And it's a dictionary, a collection of key-value pairs, left and right. 02:03:03.820 --> 02:03:07.900 Then I'm just getting a string from the user using get_string as before. 02:03:07.900 --> 02:03:09.760 And then this is powerful, too. 02:03:09.760 --> 02:03:14.830 This is essentially, on line 9, searching the whole dictionary 02:03:14.830 --> 02:03:16.030 for the given name. 02:03:16.030 --> 02:03:22.290 And it's returning to me down here the name associated with that-- or, sorry, 02:03:22.290 --> 02:03:24.760 the number associated with that person's name. 02:03:24.760 --> 02:03:27.200 And let me make this more clear by factoring this out. 02:03:27.200 --> 02:03:30.090 Let me give myself a variable called number and then more 02:03:30.090 --> 02:03:32.460 explicitly print out that variable. 02:03:32.460 --> 02:03:34.260 Here's what's different today. 02:03:34.260 --> 02:03:39.780 With "if name in people" here, what happens is that Python 02:03:39.780 --> 02:03:42.540 searches all of the keys for that name. 02:03:42.540 --> 02:03:43.650 It doesn't search values. 02:03:43.650 --> 02:03:47.280 When you say if name in a given dictionary, like people, 02:03:47.280 --> 02:03:49.050 it searches only the keys. 02:03:49.050 --> 02:03:52.290 If you've then found the key, you know definitively 02:03:52.290 --> 02:03:54.780 that "David" or "Brian" is in the dictionary.
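Here is a version of the phonebook.py lookup that drops the cs50 library so it runs anywhere. It is a minimal sketch, and the lookup helper is hypothetical. In the lecture, the lookup happens inline after get_string. The sketch shows the two ideas just described: name in people checks only the keys, and people[name] then indexes by key.

```python
people = {
    "Brian": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(name):
    """Return the number for name, or None if name is not a key in people."""
    if name in people:  # membership on a dict checks keys only, never values
        number = people[name]  # square brackets index by key, not by position
        return number
    return None


print(lookup("Brian"))     # +1-617-495-1000
print(lookup("Montague"))  # None
```

To make it interactive as in the lecture, you would pass it a name obtained with get_string from the cs50 library. Note that a phone number is not found by in people, because values are never searched. people.values() would be needed for that.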
02:03:54.780 --> 02:03:55.830 And notice this. 02:03:55.830 --> 02:03:58.560 It's just like C's array syntax. 02:03:58.560 --> 02:04:01.800 You can now use square bracket notation to index 02:04:01.800 --> 02:04:06.510 into a dictionary using a word like "David" or "Brian," 02:04:06.510 --> 02:04:09.150 and get back a value like our phone number. 02:04:09.150 --> 02:04:12.060 In C, and thus far even in Python, whenever 02:04:12.060 --> 02:04:16.770 we've seen square bracket notation, it was typically only for numbers, 02:04:16.770 --> 02:04:19.590 because arrays or lists have indices, numbers 02:04:19.590 --> 02:04:22.050 that address the first location, middle, and last, 02:04:22.050 --> 02:04:24.033 and so forth, everything in between. 02:04:24.033 --> 02:04:26.700 But what's powerful about dictionaries is that they're otherwise 02:04:26.700 --> 02:04:28.710 known as associative arrays. 02:04:28.710 --> 02:04:31.150 A dictionary is a collection of key-value pairs. 02:04:31.150 --> 02:04:33.570 And if you want to look up a value by its key, you simply 02:04:33.570 --> 02:04:35.730 use square bracket notation, just like we used 02:04:35.730 --> 02:04:37.500 to use square brackets for numbers. 02:04:37.500 --> 02:04:42.030 And because Python is a pretty fancy language, 02:04:42.030 --> 02:04:44.130 it handles the searching for you. 02:04:44.130 --> 02:04:46.920 And even better, it does not use linear search. 02:04:46.920 --> 02:04:50.460 When searching a dictionary, it aspires to give you 02:04:50.460 --> 02:04:54.600 constant time by using what we called last week a hash table. 02:04:54.600 --> 02:04:58.020 Dictionaries are typically implemented underneath the hood using something 02:04:58.020 --> 02:04:59.190 like a hash table.
02:04:59.190 --> 02:05:01.380 And recall that, even though it was really 02:05:01.380 --> 02:05:05.730 a goal of achieving constant time, if you choose a really good hash 02:05:05.730 --> 02:05:07.290 function and a really-- 02:05:07.290 --> 02:05:11.670 the right size array to hash into, you can come close to constant time. 02:05:11.670 --> 02:05:15.660 So again, among the features of a dictionary in Python 02:05:15.660 --> 02:05:18.270 are that it gives you very high performance. 02:05:18.270 --> 02:05:19.350 It's not linear search. 02:05:19.350 --> 02:05:22.470 And in fact, set-- recall that when we began playing with Python earlier, 02:05:22.470 --> 02:05:24.795 and I re-implemented speller using, what, 02:05:24.795 --> 02:05:27.420 10 or 20 lines of code max instead of the many 02:05:27.420 --> 02:05:31.440 more that you might have written for pset 5, speller used a set. 02:05:31.440 --> 02:05:33.460 And a set is just a collection of values. 02:05:33.460 --> 02:05:36.960 Long story short, it's similar in spirit to a dictionary 02:05:36.960 --> 02:05:40.140 in that it, too, underneath the hood uses a hash table 02:05:40.140 --> 02:05:42.010 to get you answers quickly. 02:05:42.010 --> 02:05:46.110 So if you think back to what that speller example was from earlier 02:05:46.110 --> 02:05:51.420 on today, when I had a line of code that just said words equals set, 02:05:51.420 --> 02:05:55.680 that one line of code was implementing pretty much the entirety 02:05:55.680 --> 02:05:57.180 of your spell checker. 02:05:57.180 --> 02:06:01.080 All of those pointers, all of those hash tables and chains and linked lists 02:06:01.080 --> 02:06:04.020 are distilled into just one line of code. 02:06:04.020 --> 02:06:07.210 You get that with the language itself. 02:06:07.210 --> 02:06:07.710 All right. 02:06:07.710 --> 02:06:10.500 Any questions, then, on dictionaries? 
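The set-based spell checker mentioned here can be sketched as follows. This is not the lecture's exact code. The function names load, check, and size simply mirror pset 5's C interface. Two simplifications are assumptions made for brevity: load reads from a list of lines rather than opening a dictionary file, and dictionary words are assumed to be lowercase.

```python
words = set()  # a set is backed by a hash table, so membership tests are fast


def load(lines):
    """Add one dictionary word per line to the set."""
    for line in lines:
        words.add(line.rstrip("\n"))
    return True


def check(word):
    """Return True if word, compared case-insensitively, is in the dictionary."""
    return word.lower() in words  # average-case constant-time lookup


def size():
    """Return the number of distinct words loaded."""
    return len(words)


load(["cat\n", "dog\n"])
print(check("Cat"), check("cow"), size())  # True False 2
```

All of the hashing, chaining, and resizing you wrote by hand in C is handled by set itself.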
02:06:10.500 --> 02:06:14.670 They will recur, and they tend to be one of the most useful data structures, 02:06:14.670 --> 02:06:17.910 because this ability to just associate something with something else 02:06:17.910 --> 02:06:24.140 is just a wonderful way, it turns out, to organize your data. 02:06:24.140 --> 02:06:27.260 Any questions here? 02:06:27.260 --> 02:06:29.225 Yeah, Sophia? 02:06:29.225 --> 02:06:31.390 AUDIENCE: Is there only a set hash function 02:06:31.390 --> 02:06:33.445 that Python has defined for these dictionaries, 02:06:33.445 --> 02:06:36.400 or can we change the hash function in any way? 02:06:36.400 --> 02:06:38.140 DAVID MALAN: Good question. 02:06:38.140 --> 02:06:40.540 It comes with a hash function for you, and Python 02:06:40.540 --> 02:06:42.430 figures all of that out for you. 02:06:42.430 --> 02:06:46.210 So that's the kind of detail that you should leave to the library, 02:06:46.210 --> 02:06:48.502 because someone else has spent all of the time thinking 02:06:48.502 --> 02:06:51.502 about how to dynamically adapt the data structure, move things around as 02:06:51.502 --> 02:06:54.040 needed, so that you no longer need to stress to the degree 02:06:54.040 --> 02:06:57.115 you might have when implementing speller yourself. 02:06:57.115 --> 02:06:58.990 And it turns out, other things get easy, too. 02:06:58.990 --> 02:07:01.710 This is not a commonly needed feature, necessarily, 02:07:01.710 --> 02:07:02.960 but it is something we can do. 02:07:02.960 --> 02:07:05.650 And let me go ahead and write a quick program called swap.py. 02:07:05.650 --> 02:07:10.810 Recall that in swap.c a couple of weeks ago, we gave x a value of 1, 02:07:10.810 --> 02:07:14.320 y a variable-- a value of 2, and then I printed out something 02:07:14.320 --> 02:07:17.740 like "x is x, y is y." 02:07:17.740 --> 02:07:20.680 But this week I'm using format strings just to print that out. 
02:07:20.680 --> 02:07:24.670 Then I did something like swap x, y, and I just kind of hoped for the best, 02:07:24.670 --> 02:07:26.860 and then I printed out those values again. 02:07:26.860 --> 02:07:30.100 Well, it turns out in Python, because you don't have pointers 02:07:30.100 --> 02:07:34.720 and you don't have addresses per se that you have access to, 02:07:34.720 --> 02:07:37.330 you can't resort to the solution from last week 02:07:37.330 --> 02:07:39.770 and pass these variables around by reference, 02:07:39.770 --> 02:07:41.140 so to speak, by their address. 02:07:41.140 --> 02:07:42.620 That's just not possible. 02:07:42.620 --> 02:07:43.630 Why is that a thing? 02:07:43.630 --> 02:07:46.005 Well, it would seem to be taking a feature away from you, 02:07:46.005 --> 02:07:49.240 but honestly, if this past week was any indication, including the week prior, 02:07:49.240 --> 02:07:50.260 pointers are hard. 02:07:50.260 --> 02:07:52.330 And segmentation faults are frequent. 02:07:52.330 --> 02:07:55.600 And getting all of that stuff right is difficult. And at worst, 02:07:55.600 --> 02:07:58.600 your programs can be compromised, because someone can access memory that 02:07:58.600 --> 02:07:59.470 they shouldn't. 02:07:59.470 --> 02:08:01.390 So Python takes that feature away. 02:08:01.390 --> 02:08:04.510 Java also takes that feature away from programmers 02:08:04.510 --> 02:08:08.320 to protect you from yourself, from screwing up like you may have, 02:08:08.320 --> 02:08:11.190 and should have, some number of times this past week. 02:08:11.190 --> 02:08:13.940 But it turns out, in Python there are solutions to these problems. 02:08:13.940 --> 02:08:16.420 And if you want to swap x and y, that's fine. 02:08:16.420 --> 02:08:17.860 Just write x, y = y, x. 02:08:17.860 --> 02:08:23.500 And so now if I run python of swap on this program, voila, boom, 02:08:23.500 --> 02:08:25.160 it's distilled into one line.
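Here is a minimal swap.py sketch that contrasts the two approaches just described. The swap function is hypothetical and exists only to show why it fails. Reassigning its parameters affects only its own local names, because Python does not let you pass a variable's address. Tuple assignment swaps the values in a single line.

```python
def swap(a, b):
    # Rebinding a and b only changes this function's local names;
    # the caller's variables are untouched (there is no pass-by-address)
    a, b = b, a


x = 1
y = 2
print(f"x is {x}, y is {y}")  # x is 1, y is 2

swap(x, y)
print(f"x is {x}, y is {y}")  # x is 1, y is 2 (unchanged)

# Tuple assignment: the right side (y, x) is evaluated first, then unpacked into x and y
x, y = y, x
print(f"x is {x}, y is {y}")  # x is 2, y is 1
```

The temporary storage still exists. It is the tuple built from the right-hand side, but you never have to name it.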
02:08:25.160 --> 02:08:28.120 So even though they take something away from us that you can do a lot of damage 02:08:28.120 --> 02:08:30.520 with or make a lot of mistakes with, we can nonetheless 02:08:30.520 --> 02:08:35.620 hand you back a more powerful feature with this one-liner for swap. 02:08:35.620 --> 02:08:39.498 And notice that it's x comma y on the left, but y comma x on the right. 02:08:39.498 --> 02:08:41.290 And that has the effect of doing what Brian 02:08:41.290 --> 02:08:43.990 did with the glasses of liquid, doing the switcheroo, 02:08:43.990 --> 02:08:47.830 even without a temporary variable explicitly there, 02:08:47.830 --> 02:08:51.290 though some magic is happening underneath the hood. 02:08:51.290 --> 02:08:54.970 Well, let's go ahead and implement a couple of final programs from week 4 02:08:54.970 --> 02:08:58.240 and then introduce a few of our own here in week 6. 02:08:58.240 --> 02:09:00.580 Let me go ahead and implement another phone book 02:09:00.580 --> 02:09:02.680 and make this one a little more persistent. 02:09:02.680 --> 02:09:09.940 Let me go ahead here and open-- create a file here called phonebook.csv. 02:09:09.940 --> 02:09:12.640 And let me go ahead and give it the headers name comma number. 02:09:12.640 --> 02:09:15.520 A CSV file, recall, is like a very simple spreadsheet. 02:09:15.520 --> 02:09:18.430 And we're going to go ahead and just create that so I have it nearby. 02:09:18.430 --> 02:09:22.180 And then I'm going to create a new file called phonebook.py that's 02:09:22.180 --> 02:09:23.500 initially empty. 02:09:23.500 --> 02:09:25.150 And this time I'm going to do this. 02:09:25.150 --> 02:09:30.040 I'm going to import from cs50 the get_string function as before. 02:09:30.040 --> 02:09:33.670 But I'm also going to import a library called the CSV library.
02:09:33.670 --> 02:09:37.690 It turns out, Python comes with a whole lot of functionality related to CSV 02:09:37.690 --> 02:09:41.980 files to make your life easier and make it easier to do things with CSVs. 02:09:41.980 --> 02:09:43.970 Among the things I might want to do is this. 02:09:43.970 --> 02:09:48.820 Let me go ahead and open up that file, phonebook.csv, in append mode, 02:09:48.820 --> 02:09:51.520 similar to fopen two weeks ago. 02:09:51.520 --> 02:09:56.050 And let me go ahead and assign that to a variable called file. 02:09:56.050 --> 02:09:58.640 Then let me go ahead and just get a name from the user. 02:09:58.640 --> 02:10:02.710 So let me use get_string to get someone's name, "Name" here. 02:10:02.710 --> 02:10:04.900 Then let me go ahead and get-- use get_string again 02:10:04.900 --> 02:10:07.810 to get someone's number here, so using "Number." 02:10:07.810 --> 02:10:10.240 And then lastly-- and this is the new code-- 02:10:10.240 --> 02:10:13.300 let me save that name and number to a file. 02:10:13.300 --> 02:10:17.278 And recall from pset 4 that saving files and writing bytes out to files 02:10:17.278 --> 02:10:18.070 is pretty involved. 02:10:18.070 --> 02:10:21.250 Like, it probably took you a while to implement recover, or blur, 02:10:21.250 --> 02:10:24.160 any of those filters that involved creating new files. 02:10:24.160 --> 02:10:26.380 Turns out the CSV library makes this pretty easy. 02:10:26.380 --> 02:10:29.230 Let me go ahead and give myself what's called a writer. 02:10:29.230 --> 02:10:34.790 And I'm going to give myself the return value of calling csv.writer of file. 02:10:34.790 --> 02:10:35.830 So what is this doing? 02:10:35.830 --> 02:10:38.890 File, again, represents the file I'm trying to open. 02:10:38.890 --> 02:10:42.490 csv.writer is some function that comes with the CSV library. 02:10:42.490 --> 02:10:45.940 And it expects as input a file that you've already opened. 
02:10:45.940 --> 02:10:49.210 And it kind of wraps that file with some fancier functionality 02:10:49.210 --> 02:10:52.000 that's going to make it way easier for me the programmer 02:10:52.000 --> 02:10:53.650 to write to that file. 02:10:53.650 --> 02:10:54.830 What am I going to do? 02:10:54.830 --> 02:10:59.230 I'm going to use that writer variable to write a row that specifically 02:10:59.230 --> 02:11:00.690 contains a name and a number. 02:11:00.690 --> 02:11:02.440 And I'm using a list, because if you think 02:11:02.440 --> 02:11:06.010 of a spreadsheet with columns and rows, a list is kind of the right idea. 02:11:06.010 --> 02:11:08.710 Each of the cells from left to right is kind of like a list. 02:11:08.710 --> 02:11:10.100 A row is like a list. 02:11:10.100 --> 02:11:11.990 So I'm going to deliberately use a list here. 02:11:11.990 --> 02:11:15.800 And then lastly, I'm going to close the file, just as I've done in the past. 02:11:15.800 --> 02:11:17.140 So it's a little cryptic here. 02:11:17.140 --> 02:11:21.910 But again, get_string-- get_string is old now. 02:11:21.910 --> 02:11:22.900 This is old now. 02:11:22.900 --> 02:11:26.050 So the only things that are new are importing the CSV. 02:11:26.050 --> 02:11:30.170 I'm opening this file in append mode, similar to what I did in C. 02:11:30.170 --> 02:11:33.950 And then these lines here involve wrapping the file with the CSV 02:11:33.950 --> 02:11:38.000 functionality, writing a row to this file with writerow, 02:11:38.000 --> 02:11:39.260 and then closing it. 02:11:39.260 --> 02:11:40.760 So let me go ahead and try this now. 02:11:40.760 --> 02:11:43.880 Let me open up a phonebook.csv, which for now only 02:11:43.880 --> 02:11:47.030 contains these headers which I created manually a moment ago. 02:11:47.030 --> 02:11:50.480 And let me go ahead and run this, python of phonebook.py. 02:11:50.480 --> 02:11:52.040 Let me go ahead and add Brian. 
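The open, write, close sequence just described can be sketched like this. It is not the lecture's exact file. The add_entry helper is hypothetical, the file lives in a temporary directory rather than the IDE's working directory, and the header row is written by code instead of by hand. It also passes newline="" to open, which the Python csv documentation recommends for files used with csv.writer. The lecture's version does not do this.

```python
import csv
import os
import tempfile


def add_entry(path, name, number):
    """Append one (name, number) row to the CSV file at path."""
    file = open(path, "a", newline="")  # "a" appends, like fopen(..., "a") in C
    writer = csv.writer(file)           # wrap the file with CSV-aware writing
    writer.writerow([name, number])     # one list = one row, one element per column
    file.close()


# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_entry(path, "name", "number")  # header row
add_entry(path, "Brian", "+1-617-495-1000")
```

Because writerow terminates every row with a line ending, each call lands on its own line. That is exactly what the manually created header line lacked in the demo.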
02:11:52.040 --> 02:11:56.810 And Brian will be +1-617-495-1000, Enter. 02:11:56.810 --> 02:12:00.420 And now let me go to my CSV file over here. 02:12:00.420 --> 02:12:02.195 Dammit, I screwed up. 02:12:02.195 --> 02:12:03.570 Pretend I didn't hit Enter there. 02:12:03.570 --> 02:12:04.630 Now it works. 02:12:04.630 --> 02:12:06.715 Let me go ahead now and do this again by input-- 02:12:06.715 --> 02:12:09.090 I should have hit Enter when I created the file manually, 02:12:09.090 --> 02:12:10.210 but I screwed up on creating it. 02:12:10.210 --> 02:12:13.252 So let me wave my hand at that and convince you that I did this correctly 02:12:13.252 --> 02:12:21.270 in code by adding myself, David, +1-949-468-2750, Enter. 02:12:21.270 --> 02:12:23.250 Let me go back to my CSV file. 02:12:23.250 --> 02:12:26.370 And voila, now it's formatted correctly, because I did-- 02:12:26.370 --> 02:12:28.930 writerow ends each row with a line break for me. 02:12:28.930 --> 02:12:32.700 And notice, too, if I download this file-- let me download phonebook.csv 02:12:32.700 --> 02:12:34.410 like I did in a past week. 02:12:34.410 --> 02:12:36.390 Let me download this to my own Mac. 02:12:36.390 --> 02:12:38.130 Let me open this CSV file. 02:12:38.130 --> 02:12:40.808 And whether you have Apple Numbers installed or Microsoft Excel, 02:12:40.808 --> 02:12:42.600 you'll open something that looks like this. 02:12:42.600 --> 02:12:46.260 And voila, I've dynamically created, using Python code now, 02:12:46.260 --> 02:12:49.067 my own sort of CSV file. 02:12:49.067 --> 02:12:51.900 And it turns out there's a way to tighten this up just a little bit. 02:12:51.900 --> 02:12:54.690 Not a big deal to do it the way I did, but it turns out 02:12:54.690 --> 02:12:59.310 that you can also open and close files a little differently. 02:12:59.310 --> 02:13:00.510 You can do this. 02:13:00.510 --> 02:13:06.000 With file-- with, rather, with open as file.
02:13:06.000 --> 02:13:09.810 Then I can indent all of this here, and I can get rid of my close line. 02:13:09.810 --> 02:13:12.780 So not a big deal to do it the way I did with open and close, 02:13:12.780 --> 02:13:15.780 but the way I've done this here is a little more Pythonic. 02:13:15.780 --> 02:13:19.260 This "with" keyword isn't analogous to anything 02:13:19.260 --> 02:13:22.230 we've seen in C: when you open a file with it, 02:13:22.230 --> 02:13:24.420 it will automatically close the file for you when the indented block ends. 02:13:24.420 --> 02:13:27.550 So you might see that in some online references or other materials. 02:13:27.550 --> 02:13:31.535 But again, it just does that for you automatically. 02:13:31.535 --> 02:13:32.910 Well, let's go ahead and do this. 02:13:32.910 --> 02:13:35.640 I like the fact that we can now manipulate CSVs. 02:13:35.640 --> 02:13:38.010 And it turns out that if you've ever used Google Forms-- 02:13:38.010 --> 02:13:41.230 that's a very popular way of collecting data from users. 02:13:41.230 --> 02:13:44.670 In fact, let me go ahead and go to a URL which is going 02:13:44.670 --> 02:13:47.160 to show you a form like this here. 02:13:47.160 --> 02:13:51.000 Brian, if you wouldn't mind typing that into the chat, go to that URL, 02:13:51.000 --> 02:13:53.458 cs50.ly/hogwarts. 02:13:53.458 --> 02:13:55.500 And if everyone wouldn't mind just playing along, 02:13:55.500 --> 02:13:59.160 just tell us what house you wish you were assigned to by the Sorting 02:13:59.160 --> 02:14:02.550 Hat in the world of Hogwarts. 02:14:02.550 --> 02:14:04.290 What house would you be in? 02:14:04.290 --> 02:14:06.300 Now, if you've used Google Forms before, you'll 02:14:06.300 --> 02:14:09.730 recall that you can see these results, certainly in the Google Form itself-- 02:14:09.730 --> 02:14:12.185 and already 122 of you have buzzed in. 02:14:12.185 --> 02:14:14.560 And we could see a distribution and a graph and so forth.
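The with version can be sketched as follows. It makes the same assumptions as before: a temporary file location stands in for phonebook.csv in the IDE, and newline="" follows the csv module's documented recommendation. The final print confirms that the file was closed as soon as the indented block ended, with no explicit close call.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")  # hypothetical location

# The with statement closes the file automatically when the indented block ends,
# even if an exception is raised inside it, so no explicit file.close() is needed
with open(path, "a", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["David", "+1-949-468-2750"])

print(file.closed)  # True: the name still exists, but the file is already closed
```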
02:14:14.560 --> 02:14:17.390 But what I want is not the distribution pictorially there. 02:14:17.390 --> 02:14:19.390 I'm going to go ahead and open up a spreadsheet. 02:14:19.390 --> 02:14:22.230 And so if you've never used Google Forms before, you can click a button, 02:14:22.230 --> 02:14:24.450 and then you can get a list of all of the responses 02:14:24.450 --> 02:14:26.160 that are coming in live right now. 02:14:26.160 --> 02:14:28.560 And by default, Google keeps track of the timestamp, 02:14:28.560 --> 02:14:32.500 when the form was submitted, and what house was actually used. 02:14:32.500 --> 02:14:34.440 So I'm going to go ahead now and do this. 02:14:34.440 --> 02:14:37.980 Let me go ahead and download that in another tab. 02:14:37.980 --> 02:14:42.090 Give me just a moment to do it on this screen here. 02:14:42.090 --> 02:14:48.270 I'm going to go ahead and download that CSV file onto my Mac 02:14:48.270 --> 02:14:53.700 locally, by going to File, Download, CSV. 02:14:53.700 --> 02:14:55.830 That's going to put it into my Downloads folder. 02:14:55.830 --> 02:14:59.970 And then I'm going to go ahead and upload this into my IDE 02:14:59.970 --> 02:15:01.530 by just dragging and dropping. 02:15:01.530 --> 02:15:03.480 Whoops, I have to open the file browser. 02:15:03.480 --> 02:15:07.700 I'm going to do this by dragging and dropping the file. 02:15:07.700 --> 02:15:08.260 All right. 02:15:08.260 --> 02:15:09.980 Now I have that file there. 02:15:09.980 --> 02:15:13.270 And let me go ahead now and make sure the file's there. 02:15:13.270 --> 02:15:15.280 I have this file called "Sorting Hat Responses-- 02:15:15.280 --> 02:15:17.510 Form Responses 1," and so forth. 
02:17:17.510 --> 02:17:20.602 Well, let me go ahead and write a program now that manipulates this data, 02:17:20.602 --> 02:17:22.810 much like you might if running a student group that's 02:17:22.810 --> 02:17:25.900 collecting data in a Google Form, or you're just collecting information 02:17:25.900 --> 02:17:28.000 in general and have it in CSV format. 02:17:28.000 --> 02:17:30.680 How might you now tally up all of the results, 02:17:30.680 --> 02:17:32.560 especially if Google weren't just telling 02:17:32.560 --> 02:17:34.480 you graphically what the results were? 02:17:34.480 --> 02:17:36.940 Well, let me go ahead and write a program called hogwarts, 02:17:36.940 --> 02:17:40.630 which is not something we've ever seen before in C. Let me go ahead 02:17:40.630 --> 02:17:42.580 and import this CSV library. 02:17:42.580 --> 02:17:44.620 Let me give myself initially a dictionary 02:17:44.620 --> 02:17:48.070 called houses that contains a whole bunch of keys, 02:17:48.070 --> 02:17:52.540 like "Gryffindor" with an initial count of 0; 02:17:52.540 --> 02:17:59.500 "Hufflepuff" with an initial count of 0; "Ravenclaw" with an initial count of 0; 02:17:59.500 --> 02:18:02.860 and also, "Slytherin" with an initial count of 0. 02:18:02.860 --> 02:18:05.590 So notice, in a dictionary, or dict in Python, 02:18:05.590 --> 02:18:09.100 the keys and values don't both need to be strings. 02:18:09.100 --> 02:18:11.712 They can certainly be strings and numbers. 02:18:11.712 --> 02:18:14.170 Because I'm going to use this dictionary ultimately to keep 02:18:14.170 --> 02:18:17.918 count of all of the votes for one house or another. 02:18:17.918 --> 02:18:19.210 So let me go ahead and do this. 02:18:19.210 --> 02:18:24.400 Let me go ahead and open up, with open, "Sorting Hat Responses - 02:18:24.400 --> 02:18:30.280 Form Responses 1.csv"-- long filename, but that's the default from Google-- 02:18:30.280 --> 02:18:31.130 as file.
02:18:31.130 --> 02:18:34.299 So I'm going to use my one-liner instead of having to open and close. 02:18:34.299 --> 02:18:38.209 I'm going to give myself this time a reader, which we did not see before. 02:18:38.209 --> 02:18:41.680 The CSV library has a reader function that allows 02:18:41.680 --> 02:18:44.570 me to read a CSV file one row at a time. 02:18:44.570 --> 02:18:46.459 I'm going to go ahead and skip the first row. 02:18:46.459 --> 02:18:48.760 next is a function that returns the next row, so calling it here skips the first row, 02:18:48.760 --> 02:18:53.290 because recall that that one is just timestamp and house, which 02:18:53.290 --> 02:18:54.200 I do want to ignore. 02:18:54.200 --> 02:18:55.690 I want the real data from you all. 02:18:55.690 --> 02:18:58.450 And here's what's cool about CSVs in Python. 02:18:58.450 --> 02:19:02.740 I can-- if I want to iterate over all of the rows that are in that spreadsheet, 02:19:02.740 --> 02:19:05.080 I can do for row in reader. 02:19:05.080 --> 02:19:12.980 And now, let me go ahead and get at, for instance, the house in question. 02:19:12.980 --> 02:19:20.080 So the house in a given row is going to be the row's second entry, row bracket 1, since lists are 0-indexed. 02:19:20.080 --> 02:19:21.590 So what is going on here? 02:19:21.590 --> 02:19:25.000 Well, let me go back to the Google spreadsheet from a moment ago. 02:19:25.000 --> 02:19:27.910 And in the Google spreadsheet, there are two columns. 02:19:27.910 --> 02:19:32.570 And the way the CSV reader works is it returns to you one row at a time-- 02:19:32.570 --> 02:19:34.639 and that's conceptually pretty straightforward. 02:19:34.639 --> 02:19:37.180 It maps perfectly to the idea of a spreadsheet. 02:19:37.180 --> 02:19:42.280 But each row is returned to you as a list, a list in this case of size 2. 02:19:42.280 --> 02:19:46.299 So row bracket 0 would give me a given timestamp, row bracket 02:19:46.299 --> 02:19:48.639 1 would give me a given house name.
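To see the shape of what csv.reader hands back, here is a small sketch. It uses an in-memory io.StringIO as a hypothetical stand-in for the downloaded Google Forms export. The header text and timestamps are made up for illustration. The sketch shows next consuming the header row and each remaining row arriving as a two-element list of strings.

```python
import csv
import io

# Hypothetical stand-in for the Google Forms export: a header row, then two columns per row
sample = io.StringIO(
    "Timestamp,House\n"
    "10/19/2020 13:00:00,Gryffindor\n"
    "10/19/2020 13:00:05,Ravenclaw\n"
)

reader = csv.reader(sample)
header = next(reader)  # next() returns the first row, which we set aside
rows = list(reader)    # the remaining rows, each a list of strings

for row in rows:
    # row[0] is the timestamp, row[1] is the house (lists are 0-indexed)
    print(row[1])
```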
02:19:48.639 --> 02:19:52.870 So that's why here in the IDE, I'm going ahead and declaring a variable called 02:19:52.870 --> 02:19:55.597 house, and I'm assigning it equal to row bracket 1, because I 02:19:55.597 --> 02:19:56.889 don't care about the timestamp. 02:19:56.889 --> 02:19:59.120 We all just did this roughly at the same time. 02:19:59.120 --> 02:20:04.150 But now that I have the house, I can now index into the dictionary, just 02:20:04.150 --> 02:20:08.719 like in C you could index into an array using a number. 02:20:08.719 --> 02:20:10.990 But in a dictionary, I can use strings. 02:20:10.990 --> 02:20:14.650 So I'm going to go ahead and say, go into the houses dictionary, which 02:20:14.650 --> 02:20:19.389 I defined up above, and go to the house key, and go ahead 02:20:19.389 --> 02:20:22.240 and increment its value by 1. 02:20:22.240 --> 02:20:23.230 And that's it. 02:20:23.230 --> 02:20:28.030 At this point, I have opened the CSV file and read it using the library. 02:20:28.030 --> 02:20:30.940 In this loop, I'm iterating over every row in the spreadsheet 02:20:30.940 --> 02:20:34.000 that you all created by filling out that form again and again. 02:20:34.000 --> 02:20:36.309 I'm just using a variable to get at whatever's 02:20:36.309 --> 02:20:40.150 in the second column, otherwise known as row bracket 1, 02:20:40.150 --> 02:20:42.250 because row bracket 0 would be the timestamp. 02:20:42.250 --> 02:20:44.230 And then I'm going into the dictionary called 02:20:44.230 --> 02:20:46.570 houses, which we defined up here. 02:20:46.570 --> 02:20:50.590 I'm indexing into it just like an array, but it's a dictionary in this case, 02:20:50.590 --> 02:20:54.219 using the house name, which looks up the appropriate key. 02:20:54.219 --> 02:20:59.270 And then plus equals 1 has the effect of incrementing its value. 02:20:59.270 --> 02:21:02.170 So it's a nice way of going into the dictionary and incrementing. 02:21:02.170 --> 02:21:03.478 Go in and increment.
02:19:03.478 --> 02:19:06.520 So now let's go ahead at the very end here and just print out the result. 02:19:06.520 --> 02:19:10.330 "For house in houses" is the fancy way to iterate over 02:19:10.330 --> 02:19:14.230 all of the keys in a dictionary, go ahead and print out a formatted string 02:19:14.230 --> 02:19:15.500 as follows. 02:19:15.500 --> 02:19:21.160 Let me print out the house name followed by a colon followed by the houses 02:19:21.160 --> 02:19:24.309 dictionary, indexing into it with house. 02:19:24.309 --> 02:19:25.090 So again, cryptic. 02:19:25.090 --> 02:19:26.590 We'll come back to this in a second. 02:19:26.590 --> 02:19:27.820 Python of hogwarts. 02:19:27.820 --> 02:19:31.870 Let me cross my fingers that I didn't screw this up. 02:19:31.870 --> 02:19:33.670 And I did. 02:19:33.670 --> 02:19:35.209 The IDE knew before I did. 02:19:35.209 --> 02:19:35.709 All right. 02:19:35.709 --> 02:19:38.270 Now let me hope that I didn't screw this up-- and dammit. 02:19:38.270 --> 02:19:38.770 All right. 02:19:38.770 --> 02:19:41.830 The file is called something slightly different. 02:19:41.830 --> 02:19:46.840 Google's name must have changed, sorry, versus when I practiced. 02:19:46.840 --> 02:19:49.310 Let me copy this. 02:19:49.310 --> 02:19:50.590 So close. 02:19:50.590 --> 02:19:52.180 Sorting hat responses. 02:19:52.180 --> 02:19:54.440 Ah, it has parentheses which I forgot. 02:19:54.440 --> 02:19:54.940 All right. 02:19:54.940 --> 02:19:58.600 Now let me cross my fingers, rerun the program, dammit. 02:19:58.600 --> 02:20:00.070 OK, no such file or-- 02:20:00.070 --> 02:20:02.890 oh, I forgot the csv, dot csv. 02:20:02.890 --> 02:20:03.460 OK. 02:20:03.460 --> 02:20:05.260 Now cross fingers and-- 02:20:05.260 --> 02:20:06.040 oh, thank God. 02:20:06.040 --> 02:20:11.440 OK so Gryffindor, not surprisingly, the most popular house. 02:20:11.440 --> 02:20:15.910 Hufflepuff at 40, Ravenclaw at 71, Slytherin-- oh, beat out Hufflepuff. 
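Putting the pieces together, here is a sketch of the hogwarts program. It follows the lecture's logic, but it is not the lecture's exact file. The tally helper is hypothetical, and the votes are made-up sample data in an io.StringIO rather than the real Google Forms download. With the real export, you would pass in the file from with open(...) as file.

```python
import csv
import io


def tally(file):
    """Count votes per house from an open CSV file whose second column is the house."""
    houses = {
        "Gryffindor": 0,
        "Hufflepuff": 0,
        "Ravenclaw": 0,
        "Slytherin": 0,
    }
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for row in reader:
        house = row[1]      # row[0] is the timestamp
        houses[house] += 1  # look up the count by key and increment it
    return houses


# Hypothetical votes, for illustration only
sample = io.StringIO("Timestamp,House\nt1,Gryffindor\nt2,Slytherin\nt3,Gryffindor\n")
houses = tally(sample)
for house in houses:  # iterating over a dict yields its keys, in insertion order
    print(f"{house}: {houses[house]}")
```

One design consequence of pre-seeding the four keys: a response with any other house string would raise a KeyError. To be more forgiving, you could check if house in houses before incrementing.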
02:20:15.910 --> 02:20:18.880 Very interesting for whatever sociological reason. 02:20:18.880 --> 02:20:21.740 But here we have a program now that analyzes the CSV. 02:20:21.740 --> 02:20:24.380 Now, we happened to do it with silly Harry Potter data. 02:20:24.380 --> 02:20:26.890 But again, imagine collecting any data you want from users, 02:20:26.890 --> 02:20:30.700 downloading it as a CSV to your Mac or PC or your IDE, 02:20:30.700 --> 02:20:34.220 then writing code that analyzes that data however you want. 02:20:34.220 --> 02:20:36.790 I did a very simple summation, but you could certainly 02:20:36.790 --> 02:20:39.295 imagine doing something fancier than that, 02:20:39.295 --> 02:20:42.640 like computing averages or standard deviations. 02:20:42.640 --> 02:20:46.570 We could get all of that functionality as well. 02:20:46.570 --> 02:20:50.230 All right, any questions on dictionaries before we now 02:20:50.230 --> 02:20:53.710 offer up some of the most powerful features we've yet 02:20:53.710 --> 02:20:55.630 seen in a programming language? 02:20:58.380 --> 02:21:01.600 Anything at all on your end, Brian? 02:21:01.600 --> 02:21:02.820 BRIAN: No hands raised here. 02:21:02.820 --> 02:21:03.778 DAVID MALAN: All right. 02:21:03.778 --> 02:21:06.690 Well, let me go ahead now, and I'm going to transition actually 02:21:06.690 --> 02:21:10.080 to my Mac where I have in advance pre-installed Python, 02:21:10.080 --> 02:21:11.870 just so that I can do things locally. 02:21:11.870 --> 02:21:13.370 It will make things a little faster. 02:21:13.370 --> 02:21:15.030 I don't have to worry about internet speeds and the like. 02:21:15.030 --> 02:21:18.090 And this is indeed the case, that on your own Mac, your own PC, 02:21:18.090 --> 02:21:21.210 you can download and install the Python interpreter, 02:21:21.210 --> 02:21:23.070 run it on your own Mac or PC.
02:21:23.070 --> 02:21:25.740 However, I would recommend you continue using this IDE, 02:21:25.740 --> 02:21:28.590 certainly for problem sets' sake until the end of the semester, 02:21:28.590 --> 02:21:31.350 maybe transitioning to your Mac or PC for final projects 02:21:31.350 --> 02:21:34.440 only, only because this weekend I spent-- 02:21:34.440 --> 02:21:37.500 wasted a huge amount of time just getting stupid libraries to work 02:21:37.500 --> 02:21:40.253 on my own Mac, which is often easier said than done, 02:21:40.253 --> 02:21:43.170 just because when programmers are writing code that's supposed to work 02:21:43.170 --> 02:21:46.980 on every possible Mac and PC in the world, you and I and everyone else 02:21:46.980 --> 02:21:49.830 have slightly different version numbers, different software installed, 02:21:49.830 --> 02:21:51.270 different incompatibilities. 02:21:51.270 --> 02:21:55.140 So those kinds of headaches very quickly arise when you're doing things locally. 02:21:55.140 --> 02:21:58.210 So let me encourage you to wait until term's end with final projects, 02:21:58.210 --> 02:22:00.960 perhaps, to move off of the IDE and do what 02:22:00.960 --> 02:22:05.640 I'm about to now do, just because you'll be able to see these demos more 02:22:05.640 --> 02:22:06.810 clearly here. 02:22:06.810 --> 02:22:08.970 I'm going to go ahead, and on my own Mac, 02:22:08.970 --> 02:22:12.540 I'm going to go ahead and create a program called speech.py. 02:22:12.540 --> 02:22:16.770 In advance, I've installed a library that supports speech synthesis. 02:22:16.770 --> 02:22:18.810 And if I want access to that functionality, 02:22:18.810 --> 02:22:24.720 it suffices to import pyttsx3, which is the name of that person's open source 02:22:24.720 --> 02:22:28.260 free library that I downloaded and installed on my Mac in advance. 02:22:28.260 --> 02:22:29.400 I read the documentation. 02:22:29.400 --> 02:22:31.980 I had literally never used this before this past week.
02:22:31.980 --> 02:22:36.270 And I found that I can declare a variable called engine, for instance. 02:22:36.270 --> 02:22:41.830 I can then call pyttsx3.init to initialize the library. 02:22:41.830 --> 02:22:42.330 Why? 02:22:42.330 --> 02:22:44.370 That's just because of how the programmer designed it. 02:22:44.370 --> 02:22:45.720 You have to initialize it first. 02:22:45.720 --> 02:22:50.610 I then can use that engine to say things like, say, "hello," comma "world." 02:22:50.610 --> 02:22:54.180 Then after that, I should run the engine and wait for it 02:22:54.180 --> 02:22:56.800 to finish before my own program quits. 02:22:56.800 --> 02:22:57.300 All right. 02:22:57.300 --> 02:23:03.420 Let me go ahead now and close that, and run python of speech.py on my own Mac 02:23:03.420 --> 02:23:04.950 here. 02:23:04.950 --> 02:23:06.678 COMPUTER VOICE: Hello, world. 02:23:06.678 --> 02:23:07.720 DAVID MALAN: Interesting. 02:23:07.720 --> 02:23:09.480 So it said what I typed in. 02:23:09.480 --> 02:23:13.050 And indeed, I can probably make this even more interesting. 02:23:13.050 --> 02:23:15.370 Let me go ahead and say something like this. 02:23:15.370 --> 02:23:19.170 Let me open up speech.py again and add some functionality. 02:23:19.170 --> 02:23:23.910 I won't use the CS50 library, but I will use maybe the input function. 02:23:23.910 --> 02:23:30.270 Let me go ahead and say name gets input, "What's your name," question mark. 02:23:30.270 --> 02:23:33.090 And then let me go ahead and say, not "hello, world," 02:23:33.090 --> 02:23:34.680 but let me use an f-string-- 02:23:34.680 --> 02:23:36.570 which doesn't have to be used in print, you 02:23:36.570 --> 02:23:39.220 can use it in any function that takes a string. 02:23:39.220 --> 02:23:42.160 Let me go ahead and say "hello" to that name. 02:23:42.160 --> 02:23:42.660 All right. 02:23:42.660 --> 02:23:45.550 Let me go ahead and run python speech.py again. 02:23:45.550 --> 02:23:46.320 Oops. 
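A minimal sketch of speech.py as described. It assumes the third-party `pyttsx3` package is installed (for example with `pip install pyttsx3`) and that your OS has a speech backend it can drive. Only `pyttsx3.init()`, `engine.say()` and `engine.runAndWait()` come from the library. The helper names `make_greeting` and `speak` are hypothetical, and the import sits inside `speak` so the greeting logic runs without the package.

```python
def make_greeting(name):
    """Build the phrase to speak; an f-string works anywhere a str is expected."""
    return f"hello, {name}"


def speak(text):
    """Say text aloud using the third-party pyttsx3 package (assumed installed)."""
    import pyttsx3  # imported here so make_greeting works even without pyttsx3

    engine = pyttsx3.init()  # the library must be initialized first
    engine.say(text)  # queue up the phrase
    engine.runAndWait()  # block until speaking finishes, so the program doesn't quit early
```

Usage, on a machine with `pyttsx3` available: `speak(make_greeting(input("What's your name? ")))`.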
02:23:46.320 --> 02:23:50.930 Let me go ahead and run python of speech.py again. 02:23:50.930 --> 02:23:51.620 What's my name? 02:23:51.620 --> 02:23:52.350 David. 02:23:52.350 --> 02:23:54.590 COMPUTER VOICE: Hello, David. 02:23:54.590 --> 02:23:57.860 DAVID MALAN: Weird choice of inflection, but indeed it synthesized it. 02:23:57.860 --> 02:23:58.790 Let's try Brian. 02:23:58.790 --> 02:24:00.253 COMPUTER VOICE: Hello, Brian. 02:24:00.253 --> 02:24:00.920 DAVID MALAN: OK. 02:24:00.920 --> 02:24:02.795 So we could probably tinker with the settings 02:24:02.795 --> 02:24:04.800 to make the voice sound a little more natural. 02:24:04.800 --> 02:24:05.960 But that's pretty cool. 02:24:05.960 --> 02:24:08.750 Well, let me go into some code I wrote in advance this time using 02:24:08.750 --> 02:24:13.430 a different library, this one related to faces and facial detection. 02:24:13.430 --> 02:24:16.130 Certainly very much in vogue when it comes to social media 02:24:16.130 --> 02:24:19.640 these days, with Facebook and other websites automatically tagging you, 02:24:19.640 --> 02:24:23.150 and, increasingly concerning, with state governments and federal governments 02:24:23.150 --> 02:24:26.540 and law enforcement using facial recognition to find people in a crowd. 02:24:26.540 --> 02:24:29.180 And let me go ahead and open up a file here, for instance, 02:24:29.180 --> 02:24:32.550 a little more benignly, like a whole bunch of people in an office. 02:24:32.550 --> 02:24:34.745 So here is a photograph of some people in an office. 02:24:34.745 --> 02:24:36.120 And there are a lot of faces there. 02:24:36.120 --> 02:24:42.020 But there are a lot of boxes of paper and other distractions besides those faces. 02:24:42.020 --> 02:24:46.400 But let me go ahead and look at, quickly, a program called detect.py. 02:24:46.400 --> 02:24:49.100 Most of this file is comments, just so that if you want at home 02:24:49.100 --> 02:24:50.930 you can follow along and see what it does.
02:25:50.930 --> 02:25:53.500 But let me just highlight a few salient lines. 02:25:53.500 --> 02:25:55.760 Here is that Pillow library again, where I'm 02:25:55.760 --> 02:25:59.660 accessing image-related functionality from a pre-installed Python library. 02:25:59.660 --> 02:26:01.160 And this one's just kind of amazing. 02:26:01.160 --> 02:26:03.470 If you want to use facial recognition technology, 02:26:03.470 --> 02:26:05.360 just import face_recognition. 02:26:05.360 --> 02:26:08.330 That is a library you can import that will give you access 02:26:08.330 --> 02:26:09.920 to that kind of power. 02:26:09.920 --> 02:26:13.790 Down here now, I only knew how to figure this out by reading some documentation, 02:26:13.790 --> 02:26:17.897 but you call the library's function face_recognition.load_image_file, 02:26:17.897 --> 02:26:19.730 which does what its name says. 02:26:19.730 --> 02:26:21.590 I'm opening up office.jpg. 02:26:21.590 --> 02:26:25.340 And then scrolling down here to the white code, which is the actual code-- 02:26:25.340 --> 02:26:28.280 all of the blue is comments, recall-- 02:26:28.280 --> 02:26:32.690 this line of code here is all that's required in Python 02:26:32.690 --> 02:26:36.050 to use the face recognition library, find all of the face locations 02:26:36.050 --> 02:26:40.820 in a given image, and store them in a list called face_locations. 02:26:40.820 --> 02:26:43.310 This line of code here is just a Python loop 02:26:43.310 --> 02:26:47.180 that iterates over every face in the faces that were detected. 02:26:47.180 --> 02:26:50.180 And then these several lines of code here, long story short, 02:26:50.180 --> 02:26:54.050 just crop out individual faces and create a new image with the found 02:26:54.050 --> 02:26:55.107 faces.
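A hedged sketch of the detect-and-crop steps just walked through. It assumes the third-party `face_recognition` and Pillow packages are installed. According to face_recognition's documentation, `face_locations` returns one `(top, right, bottom, left)` tuple per face. Pillow's `Image.crop` instead expects `(left, upper, right, lower)`, so a small conversion is needed. The helpers `to_pillow_box` and `crop_faces` are hypothetical and are not detect.py's actual code.

```python
def to_pillow_box(location):
    """Convert face_recognition's (top, right, bottom, left) to Pillow's (left, top, right, bottom)."""
    top, right, bottom, left = location
    return (left, top, right, bottom)


def crop_faces(path):
    """Return one cropped Pillow image per face detected in the photo at path.

    Needs the third-party face_recognition and Pillow packages.
    """
    import face_recognition
    from PIL import Image

    image = face_recognition.load_image_file(path)  # pixels as a NumPy array
    face_locations = face_recognition.face_locations(image)  # one box per face
    photo = Image.fromarray(image)
    return [photo.crop(to_pillow_box(location)) for location in face_locations]
```

Usage, with both packages installed: `for i, face in enumerate(crop_faces("office.jpg")): face.save(f"face{i}.png")`. The output file names here are illustrative.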
02:26:55.107 --> 02:26:58.190 So without getting too much into the details of the library, which are not 02:26:58.190 --> 02:27:01.550 that intellectually interesting, since the features are what interest us for now, 02:27:01.550 --> 02:27:04.100 let me run python of detect.py. 02:27:04.100 --> 02:27:06.890 Let me give my Mac a few seconds here to do its thing. 02:27:06.890 --> 02:27:11.810 And voila, if I zoom in here we see Phyllis, and Jim, 02:27:11.810 --> 02:27:15.470 and Roy, and pretty much every other face that 02:27:15.470 --> 02:27:19.982 was detected in that photograph, cropped out as, indeed, an individual face. 02:27:19.982 --> 02:27:22.190 So if you've ever noticed a little square on yourself 02:27:22.190 --> 02:27:25.880 on Facebook when uploading a photo, this is exactly the kind of code 02:27:25.880 --> 02:27:30.380 that Facebook and others are using on their servers in order to do that. 02:27:30.380 --> 02:27:31.970 Well, you know what, how about this? 02:27:31.970 --> 02:27:35.300 In the same office photo, you know, there's 02:27:35.300 --> 02:27:37.120 one person that always seems to stand out. 02:27:37.120 --> 02:27:38.120 No one really likes him. 02:27:38.120 --> 02:27:39.050 And that's Toby. 02:27:39.050 --> 02:27:43.370 What if we had a mug shot of Toby in a separate file like this? 02:27:43.370 --> 02:27:47.390 Can we find Toby in a crowd among these people in the office? 02:27:47.390 --> 02:27:48.140 Well, we can. 02:27:48.140 --> 02:27:50.985 Let me go ahead now and run a program called recognize.py, 02:27:50.985 --> 02:27:52.610 which you're welcome to look at online. 02:27:52.610 --> 02:27:54.830 It's similar lines of code, not terribly many, 02:27:54.830 --> 02:27:58.040 that are going to do some thinking. 02:27:58.040 --> 02:28:00.540 It's opening up both the office JPEG and this one.
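The transcript doesn't show how recognize.py decides which face is Toby. Here is a hedged sketch of the usual idea. According to its documentation, face_recognition's `compare_faces` treats two face encodings (vectors of 128 numbers) as a match when the Euclidean distance between them is at most a tolerance, which defaults to 0.6. The pure-Python helpers below only imitate that rule with toy 3-number vectors. `is_match` and `find_matches` are hypothetical and are not the library's code.

```python
import math


def is_match(known, candidate, tolerance=0.6):
    """Treat two encodings as the same face if they are close enough."""
    # Straight-line (Euclidean) distance between the two vectors
    return math.dist(known, candidate) <= tolerance


def find_matches(known, candidates, tolerance=0.6):
    """Return the positions of every candidate encoding that matches known."""
    return [
        index
        for index, candidate in enumerate(candidates)
        if is_match(known, candidate, tolerance)
    ]


# Toy 3-number "encodings"; real ones from face_recognition have 128 numbers
toby = (0.0, 0.0, 0.0)
office_faces = [(1.0, 0.0, 0.0), (0.3, 0.4, 0.0), (0.0, 0.0, 0.5)]
print(find_matches(toby, office_faces))
```

A lower tolerance makes matching stricter (fewer false positives, more misses). `math.dist` requires Python 3.8 or later.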
02:28:00.540 --> 02:28:02.780 And notice what just happened, if I zoom in, 02:28:02.780 --> 02:28:09.200 wonderfully, Toby is the only one with a big green box around his face, 02:28:09.200 --> 02:28:10.965 having indeed been recognized. 02:28:10.965 --> 02:28:12.590 So again, I'll just glance at the code. 02:28:12.590 --> 02:28:16.700 This time, if I open up recognize.py, it's a few more lines of code. 02:28:16.700 --> 02:28:19.610 But again, I'm importing face recognition and some other things. 02:28:19.610 --> 02:28:21.260 I'm loading toby.jpg. 02:28:21.260 --> 02:28:23.278 And I'm loading office.jpg. 02:28:23.278 --> 02:28:25.820 And then there's some more code here that's looking for Toby, 02:28:25.820 --> 02:28:29.990 looking for Toby, and then drawing a big green box around the face that 02:28:29.990 --> 02:28:31.260 is ultimately found. 02:28:31.260 --> 02:28:33.380 So again, at the end of the day, it's just loops. 02:28:33.380 --> 02:28:34.500 It's just functions. 02:28:34.500 --> 02:28:35.630 It's just variables. 02:28:35.630 --> 02:28:39.440 But now the functions are pretty darn fancy and powerful, 02:28:39.440 --> 02:28:43.070 because again, they're taking advantage of all of these other features 02:28:43.070 --> 02:28:46.430 that we ourselves have implemented in a language like C, 02:28:46.430 --> 02:28:50.435 or have now seen glimpses of within the world of Python. 02:28:50.435 --> 02:28:51.560 Well, let's do another one. 02:28:51.560 --> 02:28:57.830 Let me go ahead and open up real quickly a program 02:28:57.830 --> 02:29:02.630 that will allow me to create one of these 2D barcodes, a so-called QR code. 02:29:02.630 --> 02:29:07.220 Let me go ahead and create a file called qr.py. And in this file, let me go ahead 02:29:07.220 --> 02:29:08.360 and do this.
02:29:08.360 --> 02:29:10.970 Import the operating system library, for reasons 02:29:10.970 --> 02:29:14.750 we'll soon see, and let me import the QR code library, which 02:29:14.750 --> 02:29:17.000 will do all of the hard work for me. 02:29:17.000 --> 02:29:19.520 Let me go ahead and create an image called qr-- 02:29:19.520 --> 02:29:22.280 that's assigned the value of qrcode.make. 02:29:22.280 --> 02:29:25.680 And let me paste in this URL of one of the course's lecture videos, 02:29:25.680 --> 02:29:26.510 for instance. 02:29:26.510 --> 02:29:31.700 And then let me go ahead and save this image as qr.png, Portable Network 02:29:31.700 --> 02:29:34.910 Graphic, as indeed a PNG, a very popular file format 02:29:34.910 --> 02:29:36.480 for photos and other things. 02:29:36.480 --> 02:29:38.330 And then let me actually open this thing up. 02:29:38.330 --> 02:29:41.720 Open up system-- actually, nope, that's fine. 02:29:41.720 --> 02:29:42.840 Let me keep it simple. 02:29:42.840 --> 02:29:44.210 We don't need the os library. 02:29:44.210 --> 02:29:45.080 Nope, we do. 02:29:45.080 --> 02:29:49.250 Let's go ahead and open it up with "open qr.png." 02:29:49.250 --> 02:29:50.970 So three lines of code-- 02:29:50.970 --> 02:29:56.220 make a QR code with that URL, save it as qr.png, and open the file. 02:29:56.220 --> 02:29:57.730 Three lines of code. 02:29:57.730 --> 02:30:00.750 Let me go ahead and run python of qr.py. 02:30:00.750 --> 02:30:02.280 Voila, it was pretty fast. 02:30:02.280 --> 02:30:05.670 If you would like to take out your own iPhone or Android phone, 02:30:05.670 --> 02:30:08.190 turn on the camera if your phone supports this, 02:30:08.190 --> 02:30:13.800 and scan this 2D barcode by awkwardly just pointing your phone at the lecture 02:30:13.800 --> 02:30:23.040 as we speak, and it should open up YouTube for you, hopefully, and with such is-- 02:30:23.040 --> 02:30:26.730 I apologize to those-- yes, thank you for showing me what you're not seeing.
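Here is a hedged sketch of qr.py's three steps. It assumes the third-party `qrcode` package with Pillow support is installed, for example via `pip install "qrcode[pil]"`. `qrcode.make(data)` returns an image object that has a `save` method. Note that the `open` command used in the lecture exists only on macOS. The platform check in `open_command` is an illustrative assumption added here: `start` for Windows' cmd shell and `xdg-open` for typical Linux desktops. Both helper names are hypothetical.

```python
import os
import sys


def open_command(path, platform=sys.platform):
    """Return a shell command that opens path in the default viewer.

    "open" only exists on macOS; the others are common equivalents.
    Assumes path contains no spaces, since it isn't quoted.
    """
    if platform == "darwin":
        return f"open {path}"
    if platform == "win32":
        return f"start {path}"
    return f"xdg-open {path}"  # typical on Linux desktops


def make_qr(url, path="qr.png"):
    """Save a QR code for url as a PNG, then open it (needs the qrcode package)."""
    import qrcode  # third-party, e.g. pip install "qrcode[pil]"

    image = qrcode.make(url)  # encode the text as a 2D barcode image
    image.save(path)
    os.system(open_command(path))
```

Usage, with a placeholder URL: `make_qr("https://example.com")`.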
02:29:26.730 --> 02:29:28.740 I apologize for doing that yet again. 02:29:28.740 --> 02:29:29.670 Never gets old. 02:29:29.670 --> 02:29:33.330 But all we've done is embed in a two-dimensional format, details 02:29:33.330 --> 02:29:35.940 of which we won't go into in class, a URL, 02:29:35.940 --> 02:29:39.373 which suggests that you can store all sorts of data inside of these 2D barcodes, 02:29:39.373 --> 02:29:41.790 and if you point something like your camera at them, 02:29:41.790 --> 02:29:45.870 the software running on your phones these days can decode these things for you. 02:29:45.870 --> 02:29:49.590 Well, let me do something else, this time involving another sense, this one 02:29:49.590 --> 02:29:50.340 listening. 02:29:50.340 --> 02:29:53.428 Let me go into a file called listen.py. 02:29:53.428 --> 02:29:55.470 And let me go ahead and do something very simple. 02:29:55.470 --> 02:29:59.310 Let me go ahead and get a user's input in a variable called words 02:29:59.310 --> 02:30:01.290 by using the input function. 02:30:01.290 --> 02:30:02.670 "Say something." 02:30:02.670 --> 02:30:06.690 And then let me just convert it all to lowercase, just to keep things simple. 02:30:06.690 --> 02:30:07.980 And now let me do this. 02:30:07.980 --> 02:30:10.290 Once I get the user's words, let me go ahead and say, 02:30:10.290 --> 02:30:17.730 if the word "hello" is in their words, go ahead and print out "Hello to you 02:30:17.730 --> 02:30:18.370 too!" 02:30:18.370 --> 02:30:20.640 So if they say hello, I want to say hello back. 02:30:20.640 --> 02:30:27.840 Elif "how are you" in words, then let me go ahead and print out something 02:30:27.840 --> 02:30:31.290 like, "I am well, thanks," as the computer. 02:30:31.290 --> 02:30:37.590 Elif "goodbye" in words, then let me go ahead and say something reasonable 02:30:37.590 --> 02:30:42.350 like "Goodbye to you too."
02:30:42.350 --> 02:30:45.530 And then lastly, else let me go ahead and print out 02:30:45.530 --> 02:30:46.820 just something like "Huh?" 02:30:46.820 --> 02:30:48.020 Unrecognized. 02:30:48.020 --> 02:30:52.580 So if you will, here are the beginnings of an artificial intelligence, an AI-- 02:30:52.580 --> 02:30:55.940 a program that's going to somehow interact with me, the human, typing 02:30:55.940 --> 02:30:57.390 in phrases to this thing. 02:30:57.390 --> 02:31:00.830 So if I did it correctly, let me go ahead and run python of listen.py. 02:31:00.830 --> 02:31:04.130 I did not do something correctly. 02:31:04.130 --> 02:31:06.390 Oh, not "is," "in." 02:31:06.390 --> 02:31:08.090 OK, sorry. 02:31:08.090 --> 02:31:10.160 Let me go ahead and run python of listen.py. 02:31:10.160 --> 02:31:10.910 Say something. 02:31:10.910 --> 02:31:12.020 I'll say "hello." 02:31:12.020 --> 02:31:13.100 Oh, "Hello to you too." 02:31:13.100 --> 02:31:14.900 What a nice friendly program. 02:31:14.900 --> 02:31:18.230 Let me ask it how it is, "how are you," question mark. 02:31:18.230 --> 02:31:20.100 It seems to detect that. 02:31:20.100 --> 02:31:23.840 Let me go ahead and say, "ok goodbye for now." 02:31:23.840 --> 02:31:27.860 And it detects that, too, because "goodbye" is in the phrase 02:31:27.860 --> 02:31:29.130 that the user typed in. 02:31:29.130 --> 02:31:32.880 But if I say something like, "hey there," it's not recognized. 02:31:32.880 --> 02:31:33.750 So pretty cool. 02:31:33.750 --> 02:31:37.400 We can use very simple string comparisons using the in operator 02:31:37.400 --> 02:31:38.390 to detect things. 02:31:38.390 --> 02:31:40.880 But I bet-- you know, I bet if we use the right library, 02:31:40.880 --> 02:31:43.580 we can really make this more powerful, too.
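The keyword-matching logic of listen.py can be sketched as a small function, which makes it easy to try. The name `respond` is hypothetical. The replies follow the ones typed in the lecture.

```python
def respond(words):
    """Pick a canned reply based on which phrase appears in the input."""
    words = words.lower()  # make matching case-insensitive
    if "hello" in words:  # `in` tests for a substring anywhere in words
        return "Hello to you too!"
    elif "how are you" in words:
        return "I am well, thanks!"
    elif "goodbye" in words:
        return "Goodbye to you too!"
    else:
        return "Huh?"
```

Usage: `print(respond(input("Say something!\n")))`. Two consequences of this design are worth noting. First, order matters: "hello, how are you" gets the hello reply because that branch is checked first. Second, `in` matches substrings, so a word like "Othello" also triggers the hello reply.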
02:31:43.580 --> 02:31:46.940 Let me go ahead, and just like I imported facial recognition, 02:31:46.940 --> 02:31:51.710 let me import speech recognition in Python, which is yet another library 02:31:51.710 --> 02:31:53.240 that I pre-installed. 02:31:53.240 --> 02:31:56.330 Let me go ahead and now do this, recognizer equals 02:31:56.330 --> 02:31:58.320 speech_recognition.Recognizer(). 02:32:01.700 --> 02:32:04.610 And this is just creating a variable called recognizer 02:32:04.610 --> 02:32:08.420 by my having followed literally the documentation for using this library. 02:32:08.420 --> 02:32:11.030 Then let me go ahead and do this, also from the documentation, 02:32:11.030 --> 02:32:19.640 with speech_recognition.Microphone() as source. 02:32:19.640 --> 02:32:22.500 So this is opening up my microphone in some sense, 02:32:22.500 --> 02:32:24.290 again just following the documentation. 02:32:24.290 --> 02:32:28.300 Let me go ahead and say "Say something" to the user. 02:32:28.300 --> 02:32:30.760 And then after that, let me go ahead and declare 02:32:30.760 --> 02:32:36.700 a variable called audio, set it equal to the recognizer's listen function, 02:32:36.700 --> 02:32:39.310 passing in my microphone as the source. 02:32:39.310 --> 02:32:44.500 And now down here, let me go ahead and say print out "You said," 02:32:44.500 --> 02:32:51.532 and below that I will print out recognizer.recognize_google(audio)-- 02:32:51.532 --> 02:32:55.670 which is the hardest part today so far to type, for some reason. 02:32:55.670 --> 02:32:56.170 All right. 02:32:56.170 --> 02:32:57.610 So what's going on? 02:32:57.610 --> 02:33:00.040 This line of code-- these lines of code here 02:33:00.040 --> 02:33:03.010 are opening up a connection to my microphone on my Mac. 02:33:03.010 --> 02:33:07.030 It's then using the speech recognition library to listen to my microphone, 02:33:07.030 --> 02:33:11.680 and storing the audio from my microphone in a variable called audio.
02:33:11.680 --> 02:33:14.230 These lines of code down here are literally printing, 02:33:14.230 --> 02:33:20.350 "You said," and then it's passing to google.com the audio 02:33:20.350 --> 02:33:23.560 that I just recorded on my microphone, and it's printing out 02:33:23.560 --> 02:33:25.147 whatever comes back from Google. 02:33:25.147 --> 02:33:26.980 So let's see what comes out, again, crossing 02:33:26.980 --> 02:33:28.810 my fingers that I didn't mess up. 02:33:28.810 --> 02:33:32.020 Python of listen.py. 02:33:32.020 --> 02:33:32.725 Hello, world. 02:33:35.680 --> 02:33:37.940 Hoo. 02:33:37.940 --> 02:33:41.320 How are you? 02:33:41.320 --> 02:33:42.933 That's pretty good speech recognition. 02:33:42.933 --> 02:33:44.350 It's using the cloud, so to speak. 02:33:44.350 --> 02:33:45.558 It's passing it up to Google. 02:33:45.558 --> 02:33:47.350 But now let's make things a little fancier 02:33:47.350 --> 02:33:49.060 and actually respond to the human. 02:33:49.060 --> 02:33:52.360 So let me go back into here and add back some of the previous logic 02:33:52.360 --> 02:33:53.620 and say something like this. 02:33:53.620 --> 02:33:58.330 If "hello" in words, then go ahead and print out, like before, 02:33:58.330 --> 02:34:00.220 "Hello to you too." 02:34:00.220 --> 02:34:05.650 Elif "how are you" in the words that have come back from Google, 02:34:05.650 --> 02:34:08.800 go ahead and print out "I am well, thanks!" 02:34:08.800 --> 02:34:13.480 And down here, elif "goodbye" in words, 02:34:13.480 --> 02:34:20.050 then go ahead and print out "Goodbye to you too!" 02:34:20.050 --> 02:34:25.210 Else if nothing comes back that I recognize, let's just print out "Huh?" 02:34:25.210 --> 02:34:30.160 So if I did this right, let's now go ahead and let's do python of listen.py. 02:34:30.160 --> 02:34:32.560 Hello, there. 02:34:32.560 --> 02:34:33.490 Oh, dammit. 02:34:33.490 --> 02:34:35.380 OK, stand by. 02:34:35.380 --> 02:34:36.370 Da-da-da.
02:34:36.370 --> 02:34:36.985 Oh, sorry. 02:34:39.802 --> 02:34:41.010 Let me do a find and replace. 02:34:41.010 --> 02:34:43.170 I called the variable "words" instead of "audio." 02:34:43.170 --> 02:34:45.790 And I just executed a fancy command to replace it everywhere. 02:34:45.790 --> 02:34:47.890 So "audio" is what I meant to say this time. 02:34:47.890 --> 02:34:52.005 Now, let's go ahead and run this, python of listen.py. 02:34:52.005 --> 02:34:54.590 Hello, world. 02:34:54.590 --> 02:34:55.500 Dammit. 02:34:55.500 --> 02:34:57.780 AudioData is not iterable. 02:34:57.780 --> 02:34:59.820 This is a bug. 02:34:59.820 --> 02:35:03.900 Give me one second to double-check my notes. 02:35:03.900 --> 02:35:06.720 Very sorry to disappoint. 02:35:06.720 --> 02:35:09.450 The audio in-- oh, I did-- 02:35:09.450 --> 02:35:10.200 sorry. 02:35:10.200 --> 02:35:12.960 I did it right the first time but the wrong way. 02:35:12.960 --> 02:35:16.620 Let me change my variable back to words. 02:35:16.620 --> 02:35:17.310 OK. 02:35:17.310 --> 02:35:20.010 What I forgot to do was call one line of code here 02:35:20.010 --> 02:35:21.700 that's literally sitting in front of me. 02:35:21.700 --> 02:35:28.080 I need to call the recognizer's recognize_google function on the audio. 02:35:28.080 --> 02:35:32.040 I need to store the return value of passing the audio to Google, 02:35:32.040 --> 02:35:33.900 that is, the resulting text, here. 02:35:33.900 --> 02:35:37.530 And so I've stored it in the words variable here. 02:35:37.530 --> 02:35:42.810 All right, now let me go ahead and run python of listen.py. 02:35:42.810 --> 02:35:45.770 Hello, there. 02:35:45.770 --> 02:35:47.270 Very nice. 02:35:47.270 --> 02:35:48.185 How are you today? 02:35:51.200 --> 02:35:52.880 Cool. 02:35:52.880 --> 02:35:53.690 OK, goodbye. 02:35:56.430 --> 02:35:56.970 All right. 02:35:56.970 --> 02:35:59.678 So there we have an even more compelling artificial intelligence.
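The bug above came from testing `"hello" in` the raw recording, which is an `AudioData` object rather than text. Here is a hedged sketch of the fixed flow that converts the audio to a `str` first. It assumes the third-party SpeechRecognition package (imported as `speech_recognition`), PyAudio for microphone access, and an internet connection for `recognize_google`. The helper names `transcribe_once` and `reply_to` are hypothetical, and the type check in `reply_to` is an added safeguard, not something the lecture's code does.

```python
def transcribe_once():
    """Record one phrase and return Google's transcription of it as a str.

    Needs the third-party SpeechRecognition package (imported as
    speech_recognition), PyAudio for the microphone, and internet access.
    """
    import speech_recognition

    recognizer = speech_recognition.Recognizer()
    with speech_recognition.Microphone() as source:
        print("Say something!")
        audio = recognizer.listen(source)  # AudioData, not text
    return recognizer.recognize_google(audio)  # send audio to Google, get text back


def reply_to(words):
    """Match phrases in transcribed text; refuses anything that isn't a str."""
    if not isinstance(words, str):
        raise TypeError(f"expected text, got {type(words).__name__}")
    words = words.lower()
    if "hello" in words:
        return "Hello to you too!"
    if "how are you" in words:
        return "I am well, thanks!"
    if "goodbye" in words:
        return "Goodbye to you too!"
    return "Huh?"
```

Usage: `print(reply_to(transcribe_once()))`. If Google can't make sense of the audio, `recognize_google` raises `speech_recognition.UnknownValueError`. A sturdier program would catch that and print "Huh?".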
02:35:59.678 --> 02:36:03.030 Granted, it's not that intelligent, it's just looking for preordained strings. 02:36:03.030 --> 02:36:05.250 But I bet we can do something even more. 02:36:05.250 --> 02:36:08.730 And in fact, let me go ahead and step inside, and see if a colleague of mine 02:36:08.730 --> 02:36:10.440 can't help do something in real time. 02:36:10.440 --> 02:36:13.080 On a big fancy PC here in the theater, we 02:36:13.080 --> 02:36:16.148 are running some other Python program on a CPU 02:36:16.148 --> 02:36:17.940 that's fast enough to do this in real time. 02:36:17.940 --> 02:36:20.730 And we've connected one of our cameras to that PC, 02:36:20.730 --> 02:36:24.210 so that what you're about to see is the result of one of our cameras being 02:36:24.210 --> 02:36:28.860 wired into this PC, running that camera's input into Python software 02:36:28.860 --> 02:36:30.330 running on that PC. 02:36:30.330 --> 02:36:34.080 And we have trained the PC, using this Python software, 02:36:34.080 --> 02:36:37.020 to recognize certain images in the past. 02:36:37.020 --> 02:36:40.710 And let's see if we can't do this as well. 02:36:40.710 --> 02:36:43.920 Brian, would you mind putting me on screen 1? 02:36:43.920 --> 02:36:48.120 And Rongxin, do you want to go ahead and load up our first guest? 02:36:48.120 --> 02:36:49.930 I think we are live. 02:36:49.930 --> 02:36:55.950 So again, you see my mouth moving in lock step with Einstein here. 02:36:55.950 --> 02:36:57.570 His lips are matching mine. 02:36:57.570 --> 02:36:59.130 His head movements are moving-- 02:36:59.130 --> 02:36:59.760 matching mine. 02:36:59.760 --> 02:37:01.380 We can even be inquisitive. 02:37:01.380 --> 02:37:05.770 If my eyebrows go up, move my mouth this way, this way. 
02:37:05.770 --> 02:37:08.610 And you can see that the Python program in real time 02:37:08.610 --> 02:37:13.050 is mapping my facial movements onto someone else's face, of course 02:37:13.050 --> 02:37:15.820 otherwise known as a deepfake. 02:37:15.820 --> 02:37:17.865 Rongxin, could we try out Brian's photo instead? 02:37:24.150 --> 02:37:31.110 Here now we have Brian, who similarly is matching my big smile. 02:37:31.110 --> 02:37:32.940 Gets a little fake at some point. 02:37:32.940 --> 02:37:36.000 But again, if we pre-rendered all of this instead of doing it live, 02:37:36.000 --> 02:37:38.550 the PC could probably do an even better job. 02:37:38.550 --> 02:37:42.130 How about, could we invite Harvard president Larry Bacow to join us, 02:37:42.130 --> 02:37:42.630 Rongxin? 02:37:47.470 --> 02:37:51.250 This is CS50, Harvard University's introduction 02:37:51.250 --> 02:37:53.950 to the intellectual enterprises of computer science 02:37:53.950 --> 02:37:56.790 and the art of programming. 02:37:56.790 --> 02:38:00.140 How about President Peter Salovey from Yale, Rongxin? 02:38:04.660 --> 02:38:08.440 This is CS50, Yale University's introduction 02:38:08.440 --> 02:38:10.960 to the intellectual enterprises of computer science 02:38:10.960 --> 02:38:12.940 and the art of programming. 02:38:12.940 --> 02:38:15.850 Now at this point, the real-world implications of this 02:38:15.850 --> 02:38:17.420 should be getting increasingly clear.
02:38:17.420 --> 02:38:20.650 While it's all fun and games to do this on Instagram, on TikTok and the like, 02:38:20.650 --> 02:38:22.907 using various mobile applications these days, 02:38:22.907 --> 02:38:24.740 which are essentially doing the same thing-- 02:38:24.740 --> 02:38:26.290 and you can see the image doesn't quite keep up 02:38:26.290 --> 02:38:28.810 with me if I start moving a little too quickly right now-- 02:38:28.810 --> 02:38:32.987 this has very real-world implications in the world of politics, government, 02:38:32.987 --> 02:38:35.320 business, and really just the real world more generally, 02:38:35.320 --> 02:38:39.280 because I'm essentially putting my own words in someone else's mouth. 02:38:39.280 --> 02:38:42.790 And while it's clear that these examples thus far aren't really that 02:38:42.790 --> 02:38:44.560 compelling-- if I start to move too much, 02:38:44.560 --> 02:38:46.510 you see that things start to get out of sync-- 02:38:46.510 --> 02:38:48.842 just imagine that if we wait one year, our computers 02:38:48.842 --> 02:38:51.550 are going to be even faster, with even more memory and the like. 02:38:51.550 --> 02:38:53.590 Software is only getting better and more powerful, 02:38:53.590 --> 02:38:56.170 and the libraries and the artificial intelligence are getting better trained. 02:38:56.170 --> 02:38:58.545 And so among the themes for the coming weeks of the class 02:38:58.545 --> 02:39:01.270 is not just how to do some things with technology 02:39:01.270 --> 02:39:04.300 and how to write code, but frankly asking the much bigger-picture, more 02:39:04.300 --> 02:39:07.990 important question of whether you should do certain things with technology, 02:39:07.990 --> 02:39:10.690 and whether you should actually write such code. 02:39:10.690 --> 02:39:14.320 We did ask President Salovey and President Bacow for their permission 02:39:14.320 --> 02:39:16.440 in advance to spoof them in this way.
02:39:16.440 --> 02:39:18.190 But we thought we would more playfully end 02:39:18.190 --> 02:39:20.950 with just a couple of other examples that you perhaps 02:39:20.950 --> 02:39:23.980 see on Instagram, TikTok, and the like. 02:39:23.980 --> 02:39:26.290 Rongxin, could we invite Pam to join us first? 02:39:30.440 --> 02:39:32.450 And how about a certain Jim? 02:39:40.070 --> 02:39:40.670 All right. 02:39:40.670 --> 02:39:43.190 That's it for CS50 and Python today. 02:39:43.190 --> 02:39:44.960 We'll see you next time. 02:39:44.960 --> 02:39:49.810 [MUSIC PLAYING]