WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:02.988 [MUSIC PLAYING] 00:01:01.320 --> 00:01:02.670 DAVID MALAN: All right. 00:01:02.670 --> 00:01:06.240 This is CS50, and this is finally week 6. 00:01:06.240 --> 00:01:08.460 And this is that week we promised, wherein we finally 00:01:08.460 --> 00:01:13.705 transition from C, this lower-level older language via which we explored 00:01:13.705 --> 00:01:16.080 memory and how really computers work underneath the hood, 00:01:16.080 --> 00:01:19.590 to what's now called Python, which is a more modern, higher-level language, 00:01:19.590 --> 00:01:22.632 whereby we're still going to be able to solve the same types of problems. 00:01:22.632 --> 00:01:25.560 But it's going to suddenly start to get much, much easier because what 00:01:25.560 --> 00:01:29.190 Python offers, as do higher-level languages more generally, 00:01:29.190 --> 00:01:31.680 are what we might describe as abstractions 00:01:31.680 --> 00:01:35.850 over the very low-level ideas that you've been implementing in sections 00:01:35.850 --> 00:01:37.780 and problem sets and so much more. 00:01:37.780 --> 00:01:39.838 But recall from week 0, where we began. 00:01:39.838 --> 00:01:42.630 This was our simplest of programs that just printed "hello, world." 00:01:42.630 --> 00:01:44.820 Things escalated quickly thereafter in week 1, 00:01:44.820 --> 00:01:46.860 where, suddenly, we had all of this new syntax. 00:01:46.860 --> 00:01:50.790 But the idea was still the same of just printing out "hello, world." 00:01:50.790 --> 00:01:53.670 Well, as of today, a lot of that distraction, 00:01:53.670 --> 00:01:56.490 a lot of the visual distraction, goes away entirely 00:01:56.490 --> 00:02:02.410 such that what used to be this in C will now be quite simply this in Python. 00:02:02.410 --> 00:02:04.480 And that's a bit of a head fake in that we're 00:02:04.480 --> 00:02:06.850 going to see some other fancier features of Python. 
00:02:06.850 --> 00:02:09.910 But you'll find that Python's popularity in large part 00:02:09.910 --> 00:02:13.300 derives from just how relatively readable it is 00:02:13.300 --> 00:02:16.750 and also, as we'll ultimately see, just how exciting 00:02:16.750 --> 00:02:19.930 and filled the ecosystem among Python programmers. 00:02:19.930 --> 00:02:22.068 That is to say there's a lot more libraries. 00:02:22.068 --> 00:02:24.610 There's a lot more problems that people have solved in Python 00:02:24.610 --> 00:02:26.980 that you can now incorporate into your own programs 00:02:26.980 --> 00:02:30.640 in order to stand on their shoulders and get real work done faster. 00:02:30.640 --> 00:02:33.430 But recall, though, from C that we had a few steps via which 00:02:33.430 --> 00:02:35.360 to actually compile that kind of code. 00:02:35.360 --> 00:02:38.470 So we got into the habit of make to make our program called hello. 00:02:38.470 --> 00:02:41.560 And then we've been in the habit of running it with ./hello, 00:02:41.560 --> 00:02:45.070 the effect of which, of course, is to feed all of the zeros and ones that 00:02:45.070 --> 00:02:48.610 compose the hello program into the computer's memory and, in turn, 00:02:48.610 --> 00:02:49.510 the CPU. 00:02:49.510 --> 00:02:52.270 We revealed that what make is really doing 00:02:52.270 --> 00:02:56.410 is something a little more specific, namely running clang, the C language 00:02:56.410 --> 00:03:00.610 compiler specifically, with some automatic command line arguments so as 00:03:00.610 --> 00:03:04.250 to output the name that you want, link in the library that you want, 00:03:04.250 --> 00:03:05.030 and so forth. 00:03:05.030 --> 00:03:08.590 But with Python, wonderfully, we're going to get rid of those steps, 00:03:08.590 --> 00:03:11.650 too, and quite simply run it as follows. 00:03:11.650 --> 00:03:15.100 Henceforth, our programs will no longer be in files ending in .c, 00:03:15.100 --> 00:03:16.090 suffice it to say. 
00:03:16.090 --> 00:03:18.970 Our files starting today are going to start ending with .py, 00:03:18.970 --> 00:03:22.280 which is an indication to the computer-- macOS, Windows, 00:03:22.280 --> 00:03:25.450 or Linux or anything else-- that this is a Python program. 00:03:25.450 --> 00:03:30.310 But unlike C, wherein we've been in the habit of compiling our code 00:03:30.310 --> 00:03:33.580 and running it, compiling our code and running it, any time you make 00:03:33.580 --> 00:03:37.590 a change, with Python, those two steps get reduced into one, such 00:03:37.590 --> 00:03:40.090 that any time you make a change and want to rerun your code, 00:03:40.090 --> 00:03:42.310 you don't explicitly compile it anymore. 00:03:42.310 --> 00:03:46.930 You instead just run a program called python, similar in spirit to clang. 00:03:46.930 --> 00:03:49.780 But whereas clang is a compiler, python, we'll 00:03:49.780 --> 00:03:53.050 see, is not only the name of the language, but the name of a program. 00:03:53.050 --> 00:03:56.110 And the type of that program is that of an interpreter. 00:03:56.110 --> 00:03:59.710 An interpreter is a program that reads your code top to bottom, left to right, 00:03:59.710 --> 00:04:04.300 and really does what it says without having this intermediate step of first 00:04:04.300 --> 00:04:06.860 having to compile it into zeros and ones. 00:04:06.860 --> 00:04:08.570 So with that said, let me do this. 00:04:08.570 --> 00:04:10.690 Let me flip over here to VS Code. 00:04:10.690 --> 00:04:13.600 And within VS Code, let me write my first Python program. 00:04:13.600 --> 00:04:17.200 And as always, I can create a new file with the code command within VS Code. 00:04:17.200 --> 00:04:20.500 I'm going to create this file called hello.py, for instance. 00:04:20.500 --> 00:04:25.345 And quite, quite simply, I'm going to go ahead and simply do print("hello, 00:04:25.345 --> 00:04:27.560 world"). 
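As a minimal sketch, the whole of hello.py as described here is one call to print(). The comment is added for illustration and is not part of what was typed in lecture.

```python
# hello.py: the entire program is a single line.
# There is no #include, no main() function, and no semicolon.
print("hello, world")
```

Saved as hello.py, this would be run with `python hello.py`, with no separate compile step.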
00:04:27.560 --> 00:04:30.430 And if I go down to my terminal window, instead of compiling this, 00:04:30.430 --> 00:04:34.600 I'm instead going to interpret this program by running python, space, 00:04:34.600 --> 00:04:37.600 and the name of the file I want Python to interpret, hitting Enter. 00:04:37.600 --> 00:04:38.650 And voila. 00:04:38.650 --> 00:04:40.970 Now you see "hello, world." 00:04:40.970 --> 00:04:43.300 But let me go ahead and compare this at left. 00:04:43.300 --> 00:04:47.740 Let me also go ahead and bring back briefly a file called hello.c. 00:04:47.740 --> 00:04:51.250 And I'm going to do this as we did in the very first day of C, 00:04:51.250 --> 00:04:53.740 where I included standard io.h. 00:04:53.740 --> 00:04:55.540 I did int main(void). 00:04:55.540 --> 00:04:59.770 I did inside of there printf(), quote unquote, "hello, world," backslash n, 00:04:59.770 --> 00:05:01.508 close quote, semicolon. 00:05:01.508 --> 00:05:02.800 And let me go ahead in VS Code. 00:05:02.800 --> 00:05:05.390 And if you drag your file over to the right or the left, 00:05:05.390 --> 00:05:08.072 you can actually split-screen things if of help. 00:05:08.072 --> 00:05:10.780 And what I've done here is-- and let me hide my terminal window-- 00:05:10.780 --> 00:05:12.940 I've now compared these two files left and right. 00:05:12.940 --> 00:05:15.280 So here's hello.c from, say, week 1. 00:05:15.280 --> 00:05:18.100 Here's hello.py from week 6 now. 00:05:18.100 --> 00:05:20.680 And the obvious-- the differences are perhaps obvious. 00:05:20.680 --> 00:05:23.890 But there's still some-- there's a subtlety, at least one subtlety. 00:05:23.890 --> 00:05:26.530 Beyond getting rid of lots of syntax, what 00:05:26.530 --> 00:05:29.020 did I apparently omit from my Python version, 00:05:29.020 --> 00:05:32.680 even though it didn't appear to behave in any buggy way? 00:05:32.680 --> 00:05:34.180 Yeah? 00:05:34.180 --> 00:05:35.380 Sorry? 
00:05:35.380 --> 00:05:36.180 Say one more time? 00:05:36.180 --> 00:05:36.620 AUDIENCE: The library. 00:05:36.620 --> 00:05:37.700 DAVID MALAN: The library. 00:05:37.700 --> 00:05:41.570 So I didn't have to include any kind of library like the standard I/O library. 00:05:41.570 --> 00:05:44.123 print(), apparently, in Python, just works. 00:05:44.123 --> 00:05:45.290 AUDIENCE: main() [INAUDIBLE] 00:05:45.290 --> 00:05:47.790 DAVID MALAN: So I don't need to use main() anymore. 00:05:47.790 --> 00:05:51.350 So this main() function, to be clear, was required in C because that's what 00:05:51.350 --> 00:05:54.050 told the compiler what the main part of your program is. 00:05:54.050 --> 00:05:56.060 And you can't just start writing code otherwise. 00:05:56.060 --> 00:05:56.935 What else do you see? 00:05:56.935 --> 00:05:57.893 AUDIENCE: No semicolon. 00:05:57.893 --> 00:06:00.560 DAVID MALAN: So there's no more semicolon, wonderfully enough, 00:06:00.560 --> 00:06:03.090 at the end of this line, even though there was here. 00:06:03.090 --> 00:06:05.090 And things are getting a little more subtle now. 00:06:05.090 --> 00:06:06.300 What else? 00:06:06.300 --> 00:06:07.460 So the new line. 00:06:07.460 --> 00:06:10.520 So recall that in printf(), if you wanted to move the cursor to the next 00:06:10.520 --> 00:06:13.470 line when you're done printing, you had to do it yourself. 00:06:13.470 --> 00:06:15.500 So it seems as though Python-- 00:06:15.500 --> 00:06:17.810 because when I interpreted this program a moment ago, 00:06:17.810 --> 00:06:21.150 the cursor did move to the next line on its own. 00:06:21.150 --> 00:06:23.340 They sort of reversed the default behavior. 00:06:23.340 --> 00:06:25.680 So those are just some of the salient differences here. 00:06:25.680 --> 00:06:29.540 One, you don't have to explicitly include standard library, so to speak, 00:06:29.540 --> 00:06:33.728 like standard I/O. You don't need to define a main() function anymore. 
00:06:33.728 --> 00:06:35.270 You can just start writing your code. 00:06:35.270 --> 00:06:37.790 You don't need these parentheses, these curly braces. 00:06:37.790 --> 00:06:40.280 printf() is now called print(), it would seem. 00:06:40.280 --> 00:06:43.100 And you don't need the backslash n. 00:06:43.100 --> 00:06:46.250 Now, there is one thing that's also a little looser, 00:06:46.250 --> 00:06:47.700 even though I didn't do it here. 00:06:47.700 --> 00:06:51.710 Even though in C, it was required to use double quotes any times you-- 00:06:51.710 --> 00:06:56.240 any time you want to use a string, a.k.a., char*, in Python, 00:06:56.240 --> 00:07:00.210 as with a lot of languages nowadays, you can actually get away with just using 00:07:00.210 --> 00:07:03.780 single quotes so long as you are consistent. 00:07:03.780 --> 00:07:05.730 Generally speaking, some people like this 00:07:05.730 --> 00:07:07.650 because you don't have to hold Shift, and therefore, you just 00:07:07.650 --> 00:07:08.970 hit one key instead of two. 00:07:08.970 --> 00:07:11.290 So there's an argument in terms of efficiency. 00:07:11.290 --> 00:07:14.220 However, if you want to use an apostrophe in your string, 00:07:14.220 --> 00:07:15.580 then you have to escape it. 00:07:15.580 --> 00:07:19.020 And so in general, stylistically, I'll use double quotes in this way. 00:07:19.020 --> 00:07:21.570 But things are getting a little looser now with Python, 00:07:21.570 --> 00:07:24.390 whereby that's not actually a requirement. 00:07:24.390 --> 00:07:27.240 But what's especially exciting with Python, 00:07:27.240 --> 00:07:30.960 and, really, a lot of higher-level languages, is just how much real work 00:07:30.960 --> 00:07:32.642 you can get done relatively quickly. 00:07:32.642 --> 00:07:34.350 So you've just spent quite a bit of time, 00:07:34.350 --> 00:07:37.050 daresay, implementing your spell checker and implementing 00:07:37.050 --> 00:07:39.360 your own dictionary of sorts. 
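The newline and quoting behavior described above can be sketched as follows. The variable names (same, escaped, unescaped) and the sample strings are hypothetical, chosen only for illustration.

```python
# print() appends a newline by default, unlike printf() in C.
print("hello, world")        # the cursor moves to the next line on its own

# Passing end="" suppresses that newline when it is not wanted.
print("hello, ", end="")
print("world")

# Single and double quotes both delimit strings in Python.
same = 'hello, world' == "hello, world"

# An apostrophe inside single quotes has to be escaped...
escaped = 'it\'s week 6'
# ...but needs no escaping inside double quotes.
unescaped = "it's week 6"
```

Both spellings of the last string produce the same value, which is one reason to pick a quoting style and stay consistent with it.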
00:07:39.360 --> 00:07:42.420 Well, let me propose that maybe we should have asked you 00:07:42.420 --> 00:07:45.513 to do that in Python instead of C. Why? 00:07:45.513 --> 00:07:46.930 Well, let me go ahead and do this. 00:07:46.930 --> 00:07:50.170 Let me close these two tabs and reopen my terminal window. 00:07:50.170 --> 00:07:52.440 Let me go into a directory called speller 00:07:52.440 --> 00:07:55.020 that I downloaded in advance for class. 00:07:55.020 --> 00:07:57.150 And if I type ls in here, you'll notice that it's 00:07:57.150 --> 00:08:00.660 very similar to what you spent time on with problem set 5. 00:08:00.660 --> 00:08:02.670 But the file extensions are different. 00:08:02.670 --> 00:08:05.040 There's a dictionary.py instead of dictionary.c. 00:08:05.040 --> 00:08:07.620 There's a speller.py instead of a speller.c. 00:08:07.620 --> 00:08:10.980 And there's the exact same directories, dictionaries, and texts 00:08:10.980 --> 00:08:13.110 that we gave you for problem set 5. 00:08:13.110 --> 00:08:18.660 So let me just stipulate that I spent time implementing speller.c in Python. 00:08:18.660 --> 00:08:20.580 And so I gave it a name of speller.py. 00:08:20.580 --> 00:08:25.090 But I didn't go about really implementing dictionary.py yet. 00:08:25.090 --> 00:08:29.250 And so why don't we go ahead and actually implement dictionary.py 00:08:29.250 --> 00:08:30.900 together by doing this? 00:08:30.900 --> 00:08:34.530 Let me clear my terminal, do code dictionary.py. 00:08:34.530 --> 00:08:38.309 And let me propose that we implement, ultimately, four functions. 00:08:38.309 --> 00:08:40.120 And what are those functions going to be? 00:08:40.120 --> 00:08:42.960 Well, they're going to be the check() function, the load() function, 00:08:42.960 --> 00:08:45.420 the size() function, and the unload() function. 00:08:45.420 --> 00:08:50.140 But recall that in problem set 5, you implemented your own hash table. 
00:08:50.140 --> 00:08:54.333 And so while there isn't a hash table data type in Python, 00:08:54.333 --> 00:08:55.750 I'm going to go ahead and do this. 00:08:55.750 --> 00:08:58.860 I'm going to create a variable, a global variable in dictionary.py, 00:08:58.860 --> 00:09:01.620 called words, and I'm going to make it a set. 00:09:01.620 --> 00:09:03.960 In the mathematical sense, a set is a collection 00:09:03.960 --> 00:09:05.970 of things that won't contain duplicates. 00:09:05.970 --> 00:09:07.588 Any duplicates will be filtered out. 00:09:07.588 --> 00:09:10.380 So now, after creating that one global variable, 00:09:10.380 --> 00:09:13.410 I'm going to create a function called check(), just as you did. 00:09:13.410 --> 00:09:15.730 And check() takes as input a word. 00:09:15.730 --> 00:09:19.200 And if I want to check if a word is in that set of words, 00:09:19.200 --> 00:09:25.080 I can simply do word.lower() in words. 00:09:25.080 --> 00:09:26.070 And that's it. 00:09:26.070 --> 00:09:28.740 Let me now define another function called load(), which, recall, 00:09:28.740 --> 00:09:32.310 took an argument, which was the name of the dictionary you want to load 00:09:32.310 --> 00:09:33.060 into memory. 00:09:33.060 --> 00:09:35.352 Inside of my load() function, I'm now going to do this. 00:09:35.352 --> 00:09:39.525 I'm going to say with open(dictionary) as a variable called file. 00:09:39.525 --> 00:09:42.150 And in there, I'm going to go ahead and update the set of words 00:09:42.150 --> 00:09:47.670 to be the updated version of whatever's in this file as a result of reading 00:09:47.670 --> 00:09:50.580 it and then splitting its lines. Because this file has 00:09:50.580 --> 00:09:54.090 a big, long column of words, each of which is separated by a newline, 00:09:54.090 --> 00:09:57.930 splitlines() is going to split all of those into one big collection. 00:09:57.930 --> 00:10:00.270 And then I'm just going to go ahead and return True. 
00:10:00.270 --> 00:10:03.643 I'm now going to go ahead and define a size function, just as you did. 00:10:03.643 --> 00:10:06.810 But in Python, I'm going to go ahead and just go ahead and return the length 00:10:06.810 --> 00:10:12.120 of that set of words, where length, or len(), is a function itself in Python. 00:10:12.120 --> 00:10:14.170 And I'm going to do one last function. 00:10:14.170 --> 00:10:17.560 It turns out that in Python, even though, for this program, 00:10:17.560 --> 00:10:20.040 I'm going to go and implement a function called unload, 00:10:20.040 --> 00:10:22.500 there's not actually anything to unload in Python, 00:10:22.500 --> 00:10:25.410 because Python will manage your memory for you. 00:10:25.410 --> 00:10:27.750 malloc() is gone. free() is gone. 00:10:27.750 --> 00:10:29.340 Pointers are gone. 00:10:29.340 --> 00:10:33.280 It handles all of that, seemingly magically for now, for you. 00:10:33.280 --> 00:10:37.080 So here then is, I claim, what you could have done with problem set 5 00:10:37.080 --> 00:10:39.090 if implementing it in Python instead. 00:10:39.090 --> 00:10:41.470 Let me go ahead and open my terminal window. 00:10:41.470 --> 00:10:42.690 Let me increase its size. 00:10:42.690 --> 00:10:46.380 Let me run Python of speller.py, which is the name of the actual program, not 00:10:46.380 --> 00:10:48.360 the dictionary per se that I implemented. 00:10:48.360 --> 00:10:51.300 Let's run it on a file called holmes.txt because that 00:10:51.300 --> 00:10:53.140 was a particularly big file. 00:10:53.140 --> 00:10:55.680 And if I hit Enter now, we'll see, hopefully, 00:10:55.680 --> 00:10:59.630 the same output that you saw in C flying across the screen. 
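The four functions described above can be sketched as follows. This is a reconstruction from the spoken description, not a copy of the file on screen: the docstrings are added, the return value of unload() is an assumption, and the sketch assumes the dictionary file holds one lowercase word per line, as in problem set 5.

```python
# dictionary.py: a sketch of check(), load(), size(), and unload().

# Global set of words; a set keeps each distinct word at most once.
words = set()


def check(word):
    """Return True if word is in the dictionary, ignoring case."""
    return word.lower() in words


def load(dictionary):
    """Read the dictionary file (one word per line) into the set."""
    with open(dictionary) as file:
        words.update(file.read().splitlines())
    return True


def size():
    """Return how many words have been loaded."""
    return len(words)


def unload():
    """Nothing to free here, since Python manages memory itself."""
    return True  # assumption: report success, mirroring the C version
```

The set takes the place of the hand-written hash table, so membership testing is a single `in` expression rather than a walk through a bucket's linked list.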
00:10:59.630 --> 00:11:02.870 And eventually, we should see that same summary at the bottom 00:11:02.870 --> 00:11:04.820 as to how many words seem to be misspelled, 00:11:04.820 --> 00:11:09.170 how many words were in the dictionary, and, ultimately, how 00:11:09.170 --> 00:11:11.510 fast this whole process was. 00:11:11.510 --> 00:11:15.500 Now, the total amount of time required was 1.93 seconds, which was actually 00:11:15.500 --> 00:11:16.940 longer than it seemed to take. 00:11:16.940 --> 00:11:18.590 That's because we're doing this in the cloud, 00:11:18.590 --> 00:11:21.715 and it was taking some amount of time to send all of the text to my screen. 00:11:21.715 --> 00:11:26.120 But the code was only taking 1.93 seconds total on the actual server. 00:11:26.120 --> 00:11:29.300 And hopefully, these same kinds of numbers line up with your own, 00:11:29.300 --> 00:11:33.500 the difference being what I did not have to implement for this spell checker is 00:11:33.500 --> 00:11:36.500 your own hash table, is your own dictionary, literally, 00:11:36.500 --> 00:11:41.780 beyond what I've done using Python here with some of these built-in features. 00:11:41.780 --> 00:11:45.440 So why, you see, why not always use Python, 00:11:45.440 --> 00:11:48.140 assuming that you prefer the idea of being 00:11:48.140 --> 00:11:53.540 able to whip up within seconds the entirety of problem set 5? 00:11:53.540 --> 00:11:57.940 How might you choose now between languages? 00:11:57.940 --> 00:11:59.940 And I apologize if you're harboring resentment 00:11:59.940 --> 00:12:02.790 that this wasn't a week earlier. 00:12:02.790 --> 00:12:05.850 Why Python or why C? 00:12:05.850 --> 00:12:08.790 Any instincts? 00:12:08.790 --> 00:12:09.615 Any thoughts? 00:12:09.615 --> 00:12:11.190 There's hopefully a reason? 00:12:11.190 --> 00:12:13.390 Yeah, over here? 00:12:13.390 --> 00:12:15.286 Yeah? 
00:12:15.286 --> 00:12:20.813 AUDIENCE: I always thought that Python was a little slower than C [INAUDIBLE] 00:12:20.813 --> 00:12:22.480 DAVID MALAN: Ah, really good conjecture. 00:12:22.480 --> 00:12:24.730 So you always thought that Python was slower than C 00:12:24.730 --> 00:12:27.950 and takes up more space than C. Odds are that's, in fact, correct. 00:12:27.950 --> 00:12:32.050 So even though, ultimately, this 1.93 seconds is still pretty darn fast, 00:12:32.050 --> 00:12:35.230 odds are it's a little slower than the C version would have been. 00:12:35.230 --> 00:12:38.110 It's possible, too, that my version in Python 00:12:38.110 --> 00:12:41.050 actually does take up more RAM or memory underneath the hood. 00:12:41.050 --> 00:12:41.560 Why? 00:12:41.560 --> 00:12:44.950 Well, because Python itself is managing memory for you. 00:12:44.950 --> 00:12:49.420 And it doesn't necessarily know a priori how much memory you're going to need. 00:12:49.420 --> 00:12:52.840 You, the programmer, might, and you, the programmer writing in C, 00:12:52.840 --> 00:12:55.540 allocated presumably exactly as much memory 00:12:55.540 --> 00:12:58.660 as you might have needed last week with problem set 5. 00:12:58.660 --> 00:13:01.630 But Python's got to maybe do its best effort for you 00:13:01.630 --> 00:13:05.260 and try to manage memory for you, and there's going to be some overhead. 00:13:05.260 --> 00:13:08.170 The fact that I have so many fewer lines of code, 00:13:08.170 --> 00:13:11.560 the fact that these lines of code solve problem set 5 for me, 00:13:11.560 --> 00:13:17.050 means that Python, or whoever invented Python, they wrote lines of code 00:13:17.050 --> 00:13:19.190 to give me this functionality. 00:13:19.190 --> 00:13:21.440 And so if you think of Python as a middleman of sorts, 00:13:21.440 --> 00:13:23.420 it's doing more work for me. 00:13:23.420 --> 00:13:24.960 It's doing more of the heavy lifting. 
00:13:24.960 --> 00:13:26.540 So it might take me a bit more time. 00:13:26.540 --> 00:13:30.050 But, my gosh, look how much time it has saved in terms 00:13:30.050 --> 00:13:31.820 of writing this code more quickly. 00:13:31.820 --> 00:13:34.220 And arguably, this code is even more readable, 00:13:34.220 --> 00:13:38.660 or at least will be after today, week 6, once you have an eye for the syntax 00:13:38.660 --> 00:13:41.430 and features of Python itself. 00:13:41.430 --> 00:13:45.030 So beyond that, it turns out you can do other things pretty easily as well. 00:13:45.030 --> 00:13:47.490 Let me go back into my terminal window. 00:13:47.490 --> 00:13:49.400 Let me close this dictionary.py. 00:13:49.400 --> 00:13:52.220 Let me go into a folder called filter, in which 00:13:52.220 --> 00:13:56.520 I have this same bridge that we've seen in the past across the river there. 00:13:56.520 --> 00:13:57.352 So here's a bridge. 00:13:57.352 --> 00:13:59.810 This is the original version of this particular photograph. 00:13:59.810 --> 00:14:02.640 Suppose I actually want to write a program that blurs this. 00:14:02.640 --> 00:14:06.590 Well, you might recall from problem set 4 you could write that same code in C 00:14:06.590 --> 00:14:10.760 by manipulating all of the red, the green, the blue pixels that 00:14:10.760 --> 00:14:12.770 are ultimately composing that file. 00:14:12.770 --> 00:14:14.940 But let me go ahead and propose this instead. 00:14:14.940 --> 00:14:17.540 Let me create a file called blur.py. 00:14:17.540 --> 00:14:23.840 And in this file, let me go ahead and just go ahead and import a library. 00:14:23.840 --> 00:14:27.950 So from the Python image library, PIL, let me go ahead 00:14:27.950 --> 00:14:31.790 and import something called Image, capital I, and Image Filter, capital 00:14:31.790 --> 00:14:32.810 I, capital F. 00:14:32.810 --> 00:14:35.570 So I'm going to do before = Image.open("bridge.bmp"). 
00:14:38.480 --> 00:14:41.300 Then let me go ahead and create another variable called after 00:14:41.300 --> 00:14:46.220 and set that equal to before.filter, and then, in parentheses, 00:14:46.220 --> 00:14:50.328 ImageFilter, spelled as before, dot BoxBlur, 00:14:50.328 --> 00:14:51.870 and then we'll give it a value of 10. 00:14:51.870 --> 00:14:53.900 How much do I want to blur it, for instance? 00:14:53.900 --> 00:14:57.710 After that, I'm going to literally call after.save, and let's 00:14:57.710 --> 00:15:00.320 save it as a file called out.bmp. 00:15:00.320 --> 00:15:01.520 And that's it. 00:15:01.520 --> 00:15:05.640 I propose that this is how you can now write code in Python to blur an image, 00:15:05.640 --> 00:15:07.640 much like you might have for problem set 4. 00:15:07.640 --> 00:15:12.200 Now let me go ahead in my terminal window and run python of blur.py. 00:15:12.200 --> 00:15:14.403 When I hit Enter, those four lines of code will run. 00:15:14.403 --> 00:15:16.070 It seems to have happened quite quickly. 00:15:16.070 --> 00:15:19.460 Let me go ahead and open now out.bmp. 00:15:19.460 --> 00:15:24.020 And whereas the previous image looked like this a moment ago, let me go ahead 00:15:24.020 --> 00:15:26.030 and open out.bmp. 00:15:26.030 --> 00:15:28.430 And hopefully, you can indeed see that it blurred it 00:15:28.430 --> 00:15:30.597 for me using that same code. 00:15:30.597 --> 00:15:32.930 And if we want things to escalate a little more quickly, 00:15:32.930 --> 00:15:35.220 let me go ahead and do this instead. 00:15:35.220 --> 00:15:36.650 Let me close blur.bmp. 00:15:36.650 --> 00:15:39.440 Let me go ahead and open a file called edges.py. 00:15:39.440 --> 00:15:41.990 And maybe, in edges.py, we can use this same library. 00:15:41.990 --> 00:15:47.913 So from the Python Image Library, import Image and import ImageFilter. 
00:15:47.913 --> 00:15:50.330 Let me go ahead and create another variable called before, 00:15:50.330 --> 00:15:53.900 set it equal to Image.open("bridge.bmp"), 00:15:53.900 --> 00:15:54.980 just like before. 00:15:54.980 --> 00:15:57.560 Let me create another variable called after, 00:15:57.560 --> 00:16:03.200 set that equal to before.filter(ImageFilter.FIND_EDGES), 00:16:03.200 --> 00:16:06.500 which comes with this library automatically, and lastly, 00:16:06.500 --> 00:16:10.370 the same thing-- save this as a file called out.bmp. 00:16:10.370 --> 00:16:12.470 So if you struggled perhaps with this one 00:16:12.470 --> 00:16:16.310 previously, whereby you wrote for the more comfortable version of problem 00:16:16.310 --> 00:16:19.580 set 4, edge detection, so to speak, well, you 00:16:19.580 --> 00:16:23.270 might have then created a file that given an input like this, 00:16:23.270 --> 00:16:28.340 the original bridge.bmp, this new version, out.bmp, with just four 00:16:28.340 --> 00:16:31.050 lines of code, now looks like this. 00:16:31.050 --> 00:16:32.840 So, again, if this is a little frustrating 00:16:32.840 --> 00:16:36.110 that we had to do all of this in C, that was exactly 00:16:36.110 --> 00:16:39.140 the point to motivate that you now understand nonetheless 00:16:39.140 --> 00:16:40.850 what's going on underneath the hood. 00:16:40.850 --> 00:16:43.580 But with Python, you can express the solutions 00:16:43.580 --> 00:16:46.880 to problems all the more efficiently, all the more readily. 00:16:46.880 --> 00:16:50.060 And just one last one, too-- it's very common nowadays 00:16:50.060 --> 00:16:52.910 in the world of photography and social media and the like to do face 00:16:52.910 --> 00:16:54.830 detection, for better or for worse. 
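The blur and edge-detection programs above can be sketched together as follows. This assumes the third-party Pillow package is installed (it provides the PIL module). Wrapping each program in a function with src and dst parameters is a choice made here for illustration; in lecture the file names bridge.bmp and out.bmp were written directly into blur.py and edges.py.

```python
# Requires the third-party Pillow package: pip install Pillow
from PIL import Image, ImageFilter


def blur(src, dst, radius=10):
    """Save a box-blurred copy of the image at src to dst."""
    before = Image.open(src)
    after = before.filter(ImageFilter.BoxBlur(radius))
    after.save(dst)


def edges(src, dst):
    """Save an edge-detected copy of the image at src to dst."""
    before = Image.open(src)
    after = before.filter(ImageFilter.FIND_EDGES)
    after.save(dst)
```

With the bridge photograph in the current directory, `blur("bridge.bmp", "out.bmp")` or `edges("bridge.bmp", "out.bmp")` would reproduce the two demonstrations.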
00:16:54.830 --> 00:16:56.990 And it turns out that face detection, even 00:16:56.990 --> 00:16:59.240 if you want to integrate it into your own application, 00:16:59.240 --> 00:17:02.660 is something that lots of other people have integrated into their applications 00:17:02.660 --> 00:17:03.360 as well. 00:17:03.360 --> 00:17:07.880 So Python, to my point earlier of having this very rich ecosystem of libraries 00:17:07.880 --> 00:17:12.349 that other people wrote, you can literally run a command like pip 00:17:12.349 --> 00:17:19.680 install face_recognition if you want to add support to your codespace, 00:17:19.680 --> 00:17:21.930 or to your programming environment more generally, 00:17:21.930 --> 00:17:24.532 for the notion of face recognition. 00:17:24.532 --> 00:17:26.490 In fact, this is going to automatically install 00:17:26.490 --> 00:17:29.340 from some server elsewhere a library that someone else wrote 00:17:29.340 --> 00:17:30.870 called face_recognition. 00:17:30.870 --> 00:17:34.000 And with this library, you can do something like this. 00:17:34.000 --> 00:17:37.290 Let me go into a directory that I came with in advance. 00:17:37.290 --> 00:17:40.350 Let me go ahead and ls in there, and you'll see four files-- 00:17:40.350 --> 00:17:43.710 detect.py and recognize.py, which are going to detect 00:17:43.710 --> 00:17:48.150 faces and then recognize specific faces, respectively, and then two files 00:17:48.150 --> 00:17:50.260 I brought from a popular TV show, for instance. 00:17:50.260 --> 00:17:55.350 So if I open office.jpg, here is one of the early cast photos from the hit TV 00:17:55.350 --> 00:17:56.580 series The Office. 00:17:56.580 --> 00:18:03.060 And here is a photograph of someone specific from the show, Toby. 00:18:03.060 --> 00:18:05.970 Now, this is, of course, Toby's face. 00:18:05.970 --> 00:18:10.830 But what is it that makes Toby's face a face? 
00:18:10.830 --> 00:18:15.450 More generally, if I open up office.jpg, and I asked you, the human, 00:18:15.450 --> 00:18:17.917 to identify all of the faces in this picture, 00:18:17.917 --> 00:18:21.000 it wouldn't be that hard with a marker to sort of circle all of the faces. 00:18:21.000 --> 00:18:21.510 But how? 00:18:21.510 --> 00:18:22.410 Why? 00:18:22.410 --> 00:18:26.460 How do you as humans detect faces, might you think? 00:18:26.460 --> 00:18:27.030 Yeah? 00:18:27.030 --> 00:18:28.380 AUDIENCE: You have eyes, nose. 00:18:28.380 --> 00:18:30.360 DAVID MALAN: Features, yeah, like eyes, nose, 00:18:30.360 --> 00:18:32.610 generally in a similar orientation, even though we all 00:18:32.610 --> 00:18:34.140 have different faces, ultimately. 00:18:34.140 --> 00:18:38.140 But there's a pattern to the shapes that you're seeing on the screen. 00:18:38.140 --> 00:18:40.140 Well, it turns out this face_recognition library 00:18:40.140 --> 00:18:43.590 has been trained, perhaps via artificial intelligence over time, 00:18:43.590 --> 00:18:46.830 to recognize faces, but any number of different faces, 00:18:46.830 --> 00:18:48.430 perhaps among these folks here. 00:18:48.430 --> 00:18:51.250 So if I go back into my terminal window here, 00:18:51.250 --> 00:18:56.970 let me go ahead and run, say, python of detect.py, which I wrote 00:18:56.970 --> 00:18:59.190 in advance, which uses that library. 00:18:59.190 --> 00:19:03.240 And what that program is going to do-- it's going to think, do some thinking. 00:19:03.240 --> 00:19:04.830 It's just found some face. 00:19:04.830 --> 00:19:08.100 And let me go ahead now and open a file it just created 00:19:08.100 --> 00:19:12.370 called detected.jpg, which I didn't have in my folder a moment ago. 00:19:12.370 --> 00:19:16.650 But when I open this here file, you'll now see all of the faces 00:19:16.650 --> 00:19:19.290 based on this library's detection thereof. 
00:19:19.290 --> 00:19:22.500 But suppose that we're looking for a very specific face among them, 00:19:22.500 --> 00:19:23.435 maybe Toby's. 00:19:23.435 --> 00:19:25.560 Well, maybe if we write a program that doesn't just 00:19:25.560 --> 00:19:30.030 take as input the office.jpg, but a second input, toby.jpg, 00:19:30.030 --> 00:19:32.910 maybe this library, and code more generally, 00:19:32.910 --> 00:19:37.320 can distinguish Toby's face from Jim's, from Pam's, from everyone else 00:19:37.320 --> 00:19:41.880 in the show, just based on this one piece of training data, so to speak. 00:19:41.880 --> 00:19:47.580 Well let me instead run python of recognize.py and hit Enter. 00:19:47.580 --> 00:19:50.370 It's going to do some thinking, some thinking, some thinking. 00:19:50.370 --> 00:19:52.800 And it is going to output now a file called 00:19:52.800 --> 00:19:59.740 recognized.jpg, which should show me his face, ideally, specifically. 00:19:59.740 --> 00:20:01.200 And so what has it done? 00:20:01.200 --> 00:20:05.670 Well, with sort of a green marker, there is Toby among all of these faces. 00:20:05.670 --> 00:20:07.800 That's maybe a dozen or so lines of code, 00:20:07.800 --> 00:20:10.860 but it's built on top of this ecosystem of libraries. 00:20:10.860 --> 00:20:14.295 And this is, again, just one of the reasons why Python is so popular. 00:20:14.295 --> 00:20:16.920 Undoubtedly, some number of years from now, Python will be out, 00:20:16.920 --> 00:20:18.378 and something else will be back in. 00:20:18.378 --> 00:20:21.600 But that's indeed among the goals of CS50, too, is not to teach you C, 00:20:21.600 --> 00:20:23.820 not to teach you Python, not in a couple of weeks 00:20:23.820 --> 00:20:26.280 to teach you JavaScript and other languages, too, 00:20:26.280 --> 00:20:28.200 but to teach you how to program. 
00:20:28.200 --> 00:20:30.900 And indeed, all of the ideas we have explored and will now 00:20:30.900 --> 00:20:36.000 explore more today, you'll see recurring for languages in the years to come. 00:20:36.000 --> 00:20:40.530 Any questions before we now dive into how it is this code is working 00:20:40.530 --> 00:20:45.150 and why I type the things that I did before we forge ahead? 00:20:45.150 --> 00:20:48.890 Any questions along these lines? 00:20:48.890 --> 00:20:50.720 Anything at all? 00:20:50.720 --> 00:20:51.270 No? 00:20:51.270 --> 00:20:51.770 All right. 00:20:51.770 --> 00:20:54.887 So how does Python itself work? 00:20:54.887 --> 00:20:56.720 Well, let's do a quick review as we did when 00:20:56.720 --> 00:20:58.730 we transitioned from Scratch to C, this time, 00:20:58.730 --> 00:21:00.470 though, from Scratch, say, to Python. 00:21:00.470 --> 00:21:02.507 So in Python, as with many languages, there 00:21:02.507 --> 00:21:05.090 are these things called functions-- the actions and verbs that 00:21:05.090 --> 00:21:06.330 actually get things done. 00:21:06.330 --> 00:21:09.590 So here on the left, recall from week 0, was the simplest of functions. 00:21:09.590 --> 00:21:12.320 We played with, first, the say block, which just literally has 00:21:12.320 --> 00:21:13.880 the cat say something on the screen. 00:21:13.880 --> 00:21:18.980 We've seen in C, for instance, the equivalent line of code is arguably 00:21:18.980 --> 00:21:22.130 this here, with printf(), with the parentheses, the quotation marks, 00:21:22.130 --> 00:21:23.810 the backslash n, the semicolon. 00:21:23.810 --> 00:21:27.330 In Python now, it's going to indeed be a little simpler than that. 00:21:27.330 --> 00:21:30.590 But the idea is the same as it was back in week 0. 00:21:30.590 --> 00:21:33.800 Libraries-- so we've seen already in C, and now we've 00:21:33.800 --> 00:21:36.170 already seen in Python that these things exist, too. 
00:21:36.170 --> 00:21:40.460 In the world of C, recall that besides the standard ones, like standard io.h, 00:21:40.460 --> 00:21:43.978 that header file, we could very quickly introduce cs50.h, 00:21:43.978 --> 00:21:45.770 which was like your entry point, the header 00:21:45.770 --> 00:21:49.342 file for the CS50 library, which gave you a bunch of functions as well. 00:21:49.342 --> 00:21:51.300 Well, we're going to give you a similar library 00:21:51.300 --> 00:21:54.145 for at least the next week or two, training wheels for Python 00:21:54.145 --> 00:21:57.270 specifically, that, again, will take off so that you can stand on your own, 00:21:57.270 --> 00:21:59.160 even with CS50 behind you. 00:21:59.160 --> 00:22:03.130 But the syntax for using a library in Python is a little different. 00:22:03.130 --> 00:22:05.250 You don't include a .h file. 00:22:05.250 --> 00:22:09.550 You just import, instead, the name of the library. 00:22:09.550 --> 00:22:10.050 All right. 00:22:10.050 --> 00:22:11.258 What does that actually mean? 00:22:11.258 --> 00:22:14.280 Well, if there are specific functions in that library you want to use, 00:22:14.280 --> 00:22:16.050 in Python, you can be more precise. 00:22:16.050 --> 00:22:18.750 You don't just have to say, give me the whole library. 00:22:18.750 --> 00:22:23.580 For efficiency purposes, you can say, let me import the get_string() function 00:22:23.580 --> 00:22:25.800 from the CS50 library. 00:22:25.800 --> 00:22:29.610 So you have finer-grained control in Python, which can actually speed things 00:22:29.610 --> 00:22:31.590 up if you're not loading things unnecessarily 00:22:31.590 --> 00:22:35.040 into memory, if all you want is, say, one feature therein. 
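The two import styles described here can be sketched side by side. This is a minimal illustration that uses the standard library's math module as a stand-in, on the assumption that the CS50 library may not be installed; `from cs50 import get_string` follows the same pattern. As far as the import machinery goes, both forms load the module; what differs is which names end up available in your own file.

```python
# Style 1: import the whole library, then reach into it by name.
import math

# Style 2: import just the one function you plan to use.
from math import sqrt

# With style 1, the function is prefixed with the library's name.
print(math.sqrt(16))

# With style 2, the function can be called directly.
print(sqrt(16))
```

Either style is fine; the second mostly saves typing when only one or two functions are needed.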
00:22:35.040 --> 00:22:40.230 So here, for instance, in Scratch, was an example of how we might use not only 00:22:40.230 --> 00:22:45.120 a built-in function, like the say block, or, in C, in the printf(), 00:22:45.120 --> 00:22:50.250 but how we might similarly now do the same but achieve this in Python. 00:22:50.250 --> 00:22:51.610 So how might we do this? 00:22:51.610 --> 00:22:54.672 Well, in Python, or rather, in C, this code 00:22:54.672 --> 00:22:56.130 looks a little something like this. 00:22:56.130 --> 00:22:59.070 Back in week 1, we declared a variable of type string, 00:22:59.070 --> 00:23:01.530 even though later we revealed that to be char*. 00:23:01.530 --> 00:23:04.980 I gave this a variable name of answer for parity with Scratch. 00:23:04.980 --> 00:23:08.140 Then we use CS50's own get_string() function and asked, for instance, 00:23:08.140 --> 00:23:10.290 the same question as in the white oval here. 00:23:10.290 --> 00:23:14.430 And then, using this placeholder syntax, these format codes, 00:23:14.430 --> 00:23:18.870 which was printf()-specific, we could plug in that answer to this premade 00:23:18.870 --> 00:23:21.120 string where the %s is. 00:23:21.120 --> 00:23:24.370 And we saw %i and %f and a bunch of others as well. 00:23:24.370 --> 00:23:27.450 So this is sort of how, in C, you approximate 00:23:27.450 --> 00:23:31.080 the idea of concatenating two things together, joining two things, 00:23:31.080 --> 00:23:33.330 just as we did here in Scratch. 00:23:33.330 --> 00:23:36.360 So in Python, it turns out it's not only going to be a little easier, 00:23:36.360 --> 00:23:39.070 but there's going to be even more ways to do this. 00:23:39.070 --> 00:23:42.690 And so even what might seem today like a lot of different syntax, 00:23:42.690 --> 00:23:45.300 it really is just different ways, stylistically, 00:23:45.300 --> 00:23:46.392 to achieve the same goals. 
00:23:46.392 --> 00:23:48.600 And over time, as you get more comfortable with this, 00:23:48.600 --> 00:23:51.900 you too will develop your own style, or, if working for a company 00:23:51.900 --> 00:23:54.120 or working with a team, you might collectively 00:23:54.120 --> 00:23:56.950 decide which conventions you want to use. 00:23:56.950 --> 00:24:01.290 But here, for instance, is one way you could implement this same idea 00:24:01.290 --> 00:24:04.120 in Scratch but in Python instead. 00:24:04.120 --> 00:24:06.690 So notice I'm going to still use a variable called answer. 00:24:06.690 --> 00:24:08.910 I'm going to use CS50's function called get_string(). 00:24:08.910 --> 00:24:11.370 I'm still going to use, quote unquote, "What's your name?" 00:24:11.370 --> 00:24:14.408 But down here is where we see the most difference. 00:24:14.408 --> 00:24:16.200 It's, again, not called printf() in Python. 00:24:16.200 --> 00:24:18.060 It's now called just print(). 00:24:18.060 --> 00:24:22.350 And what might you infer the plus operator is doing here? 00:24:22.350 --> 00:24:26.063 It's not addition, obviously, in a mathematical sense. 00:24:26.063 --> 00:24:28.230 But those of you who have perhaps programmed before, 00:24:28.230 --> 00:24:30.195 what does the plus represent in this context? 00:24:30.195 --> 00:24:32.340 AUDIENCE: It's joining the two strings together. 00:24:32.340 --> 00:24:34.757 DAVID MALAN: It's indeed joining the two strings together. 00:24:34.757 --> 00:24:36.630 So this is indeed concatenating the thing 00:24:36.630 --> 00:24:38.830 on the left with the thing on the right. 00:24:38.830 --> 00:24:42.160 So you don't use the placeholder in this particular scenario. 00:24:42.160 --> 00:24:44.490 You can instead, a little more simply, just use plus. 00:24:44.490 --> 00:24:46.120 But you want your grammar to line up. 
00:24:46.120 --> 00:24:50.430 So I still have "hello," and then close quote 00:24:50.430 --> 00:24:52.950 because I want to form a full phrase. 00:24:52.950 --> 00:24:55.410 Notice, too, there's also one other slightly more 00:24:55.410 --> 00:24:58.200 subtle difference on the first line. 00:24:58.200 --> 00:25:01.482 Besides the fact that we don't have a semicolon, what else is different? 00:25:01.482 --> 00:25:03.690 AUDIENCE: You don't declare the type of the variable. 00:25:03.690 --> 00:25:05.982 DAVID MALAN: I didn't declare the type of the variable. 00:25:05.982 --> 00:25:08.340 So Python still has strings, as we'll see. 00:25:08.340 --> 00:25:12.290 But you don't have to tell the interpreter what type of variable 00:25:12.290 --> 00:25:12.790 it is. 00:25:12.790 --> 00:25:14.490 And this is going to save us some keystrokes, 00:25:14.490 --> 00:25:17.230 and it's just going to be a little more user-friendly over time. 00:25:17.230 --> 00:25:21.130 Meanwhile, you can do this also a little bit differently if you prefer. 00:25:21.130 --> 00:25:25.620 You can instead trust that the print() function in Python can actually do even 00:25:25.620 --> 00:25:27.210 more for you automatically. 00:25:27.210 --> 00:25:32.160 The print() function in Python can take multiple arguments separated by commas 00:25:32.160 --> 00:25:33.240 in the usual way. 00:25:33.240 --> 00:25:35.340 And by default, Python is going to insert 00:25:35.340 --> 00:25:39.767 for you a single space between its first argument and its second argument. 00:25:39.767 --> 00:25:41.850 So notice what I've done here is my first argument 00:25:41.850 --> 00:25:45.090 is, quote unquote, "hello," with a comma but no space. 00:25:45.090 --> 00:25:48.150 Then, outside of the quotes, I'm putting a comma because that just means, 00:25:48.150 --> 00:25:49.550 here comes my second argument. 00:25:49.550 --> 00:25:52.098 And then I put the same variable as before. 
00:25:52.098 --> 00:25:53.890 And I'm just going to let Python figure out 00:25:53.890 --> 00:25:56.860 that it should, by default, per its documentation, 00:25:56.860 --> 00:26:02.050 can join these two variables, putting a single space in between them. 00:26:02.050 --> 00:26:03.610 You can do this yet another way. 00:26:03.610 --> 00:26:06.850 And this way looks a little weirder, but this is actually 00:26:06.850 --> 00:26:09.910 probably the most common way nowadays in Python 00:26:09.910 --> 00:26:14.450 is to use what's called a format string, or f string, for short. 00:26:14.450 --> 00:26:16.390 And this looks weird to me still. 00:26:16.390 --> 00:26:17.170 It looks weird. 00:26:17.170 --> 00:26:21.010 But if you prefix a string in Python with an f, 00:26:21.010 --> 00:26:26.020 literally, you can then use curly braces inside of that string in Python. 00:26:26.020 --> 00:26:29.930 And Python will not print out literally a curly brace and a closed curly brace. 00:26:29.930 --> 00:26:34.330 It will instead interpolate whatever is inside of those curly braces. 00:26:34.330 --> 00:26:36.400 That is to say if answer is a variable that 00:26:36.400 --> 00:26:39.190 has some value, like "David" or something like that, 00:26:39.190 --> 00:26:42.400 saying f before the first quotation mark, 00:26:42.400 --> 00:26:44.530 and then using these curly braces therein, 00:26:44.530 --> 00:26:48.430 is going to do the exact same thing of creating a string that says "Hello," 00:26:48.430 --> 00:26:50.440 comma, space, "David." 00:26:50.440 --> 00:26:52.390 So it's going to plug in the value for you. 00:26:52.390 --> 00:26:56.350 So you can think of this as %s but without that second step of having 00:26:56.350 --> 00:26:59.635 to keep track of what you want to plug back in for %s. 00:26:59.635 --> 00:27:02.330 Instead of %s, you literally put in curly braces, 00:27:02.330 --> 00:27:04.060 what do you want to put right there? 
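The three approaches just described (the plus operator, multiple arguments to print(), and an f string) can be sketched together. This is only an illustration: the variable answer is hard-coded here as an assumption, whereas in the lecture it comes from get_string() or input(), and the names greeting_plus and greeting_f are made up for the example.

```python
# Stand-in for: answer = input("What's your name? ")
answer = "David"

# 1. Concatenation with +, so the space goes inside the quotes.
greeting_plus = "hello, " + answer
print(greeting_plus)

# 2. Multiple arguments: print() inserts a single space between them,
#    so there is no trailing space inside the quotes.
print("hello,", answer)

# 3. An f string: the f prefix tells Python to fill in the curly braces.
greeting_f = f"hello, {answer}"
print(greeting_f)
```

All three lines should print the same greeting; which one to use is largely a matter of style.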
00:27:04.060 --> 00:27:06.610 You format the string yourself. 00:27:06.610 --> 00:27:10.570 So given all of those ways, how might we actually 00:27:10.570 --> 00:27:14.410 go about implementing this or using this ourselves? 00:27:14.410 --> 00:27:18.040 Well, let me propose that we do this here. 00:27:18.040 --> 00:27:21.070 Let me propose that I go back to VS Code. 00:27:21.070 --> 00:27:24.640 Let me go ahead and open up hello.py again. 00:27:24.640 --> 00:27:28.730 And as before, instead of just printing out something like, 00:27:28.730 --> 00:27:31.330 quote unquote, "hello, world," let me actually print out 00:27:31.330 --> 00:27:33.260 something a little more interesting. 00:27:33.260 --> 00:27:36.850 So let me go ahead and, from the CS50 library, 00:27:36.850 --> 00:27:39.310 import the function called get_string(). 00:27:39.310 --> 00:27:42.010 Then let me go ahead and create a variable called answer. 00:27:42.010 --> 00:27:45.700 Let me set that equal to the return value of get_string() with, 00:27:45.700 --> 00:27:50.590 as an argument, quote unquote, "What's your name?" 00:27:50.590 --> 00:27:54.610 And then no semicolon at the end of that line, but on the next line, frankly 00:27:54.610 --> 00:27:57.650 here, I can pick any one of those potential solutions. 00:27:57.650 --> 00:27:59.090 So let me start with the first. 00:27:59.090 --> 00:28:00.415 So "hello, " + answer. 00:28:03.640 --> 00:28:07.840 And now, if I go down to my terminal window and run python of hello.py, 00:28:07.840 --> 00:28:08.950 I'm prompted for my name. 00:28:08.950 --> 00:28:12.280 I can type in D-A-V-I-D, and voila, that there then works. 00:28:12.280 --> 00:28:13.870 Or I can tweak this a little bit. 00:28:13.870 --> 00:28:19.120 I can trust that Python will concatenate its first and second argument for me. 00:28:19.120 --> 00:28:21.010 But this isn't quite right. 
00:28:21.010 --> 00:28:24.820 Let me go ahead and rerun python of hello.py, hit Enter, and type 00:28:24.820 --> 00:28:25.420 in "David." 00:28:25.420 --> 00:28:28.840 It's going to be ever-so-slightly buggy, sort of grammatically 00:28:28.840 --> 00:28:29.980 or visually, if you will. 00:28:29.980 --> 00:28:31.880 What did I do wrong here? 00:28:31.880 --> 00:28:32.380 Yeah. 00:28:32.380 --> 00:28:35.900 So I left the space in there, even though I'm getting one for free from 00:28:35.900 --> 00:28:36.400 print(). 00:28:36.400 --> 00:28:38.260 So that's an easy solution here. 00:28:38.260 --> 00:28:41.560 But let's do it one other way after running this to be sure-- 00:28:41.560 --> 00:28:42.250 D-A-V-I-D. 00:28:42.250 --> 00:28:44.620 And OK, now it looks like I intended. 00:28:44.620 --> 00:28:47.780 Well, let's go ahead and use that placeholder syntax. 00:28:47.780 --> 00:28:52.240 So let's just pass in one bigger string as our argument, do "hello," 00:28:52.240 --> 00:28:55.930 and then, in curly braces, answer, like this. 00:28:55.930 --> 00:28:58.390 Well, let me go down to my terminal window and clear it. 00:28:58.390 --> 00:29:03.700 Let me run python of hello.py and enter, type in D-A-V-I-D, and voila. 00:29:03.700 --> 00:29:05.650 OK, I made a mistake. 00:29:05.650 --> 00:29:08.150 What did I do wrong here, minor though it seems to be? 00:29:08.150 --> 00:29:08.650 Yeah? 00:29:08.650 --> 00:29:09.525 AUDIENCE: [INAUDIBLE] 00:29:09.525 --> 00:29:11.410 DAVID MALAN: So the stupid little f that you 00:29:11.410 --> 00:29:13.690 have to put before the string to tell Python 00:29:13.690 --> 00:29:16.570 that this is a special string-- it's a format string, or f 00:29:16.570 --> 00:29:20.450 string-- that it should additionally format for you. 00:29:20.450 --> 00:29:24.272 So if I rerun this after adding that f, I can do python of hello.py. 00:29:24.272 --> 00:29:24.980 What's your name? 00:29:24.980 --> 00:29:25.480 David.
00:29:25.480 --> 00:29:28.690 And now it looks the way I might intend. 00:29:28.690 --> 00:29:33.160 But it turns out in Python, you don't actually need to use get_string(). 00:29:33.160 --> 00:29:35.770 In C, recall that we introduced that because it's actually 00:29:35.770 --> 00:29:40.840 pretty annoying in C to get strings, in particular to get strings safely. 00:29:40.840 --> 00:29:44.950 Recall those short examples we did with scanf not too long ago. 00:29:44.950 --> 00:29:47.530 And scanf kind of scans what the user types at the keyboard 00:29:47.530 --> 00:29:49.150 and loads it into memory. 00:29:49.150 --> 00:29:53.890 But the fundamental danger with scanf when it comes to strings was what? 00:29:53.890 --> 00:30:00.580 Why was it dangerous to use scanf to get strings from a user? 00:30:00.580 --> 00:30:01.080 Why? 00:30:01.080 --> 00:30:02.058 Yeah? 00:30:02.058 --> 00:30:04.690 AUDIENCE: What if they give you a really long string you don't have space for? 00:30:04.690 --> 00:30:05.260 DAVID MALAN: Exactly. 00:30:05.260 --> 00:30:07.010 What if they give you a really long string 00:30:07.010 --> 00:30:08.598 that you didn't allocate space for? 00:30:08.598 --> 00:30:11.140 Because you're not going to know as the programmer in advance 00:30:11.140 --> 00:30:13.750 how long of a string the human is going to type in. 00:30:13.750 --> 00:30:15.130 So you might under-- 00:30:15.130 --> 00:30:19.895 you might undercut it and therefore have too much memory, or too many 00:30:19.895 --> 00:30:22.270 characters being put into that memory, thereby giving you 00:30:22.270 --> 00:30:26.260 some kind of buffer overflow, which might crash the computer or, minimally, 00:30:26.260 --> 00:30:27.130 your program. 00:30:27.130 --> 00:30:31.570 So it turns out in C, get_string() was especially useful. 00:30:31.570 --> 00:30:33.700 In Python, it's not really that useful. 00:30:33.700 --> 00:30:39.640 All it does is use a function that does come with Python called input(). 
00:30:39.640 --> 00:30:43.370 And, in fact, the input() function in Python, for all intents and purposes, 00:30:43.370 --> 00:30:47.270 is the same as the get_string() function that we give to you. 00:30:47.270 --> 00:30:50.140 But just to ease the transition from C to Python, 00:30:50.140 --> 00:30:52.780 we implemented a Python version of get_string() nonetheless. 00:30:52.780 --> 00:30:55.120 But this is to say if I go to VS Code here, 00:30:55.120 --> 00:30:59.030 and I just change get_string() to input(), and, in fact, 00:30:59.030 --> 00:31:03.680 I even get rid of the CS50 library at the top, this too should work fine. 00:31:03.680 --> 00:31:07.940 If I rerun python of hello.py, type in my name, David, and voila, 00:31:07.940 --> 00:31:10.710 I have that now working as well. 00:31:10.710 --> 00:31:11.210 All right. 00:31:11.210 --> 00:31:19.020 Questions about this use of get_string() or input() or any of our syntax thus 00:31:19.020 --> 00:31:19.520 far? 00:31:21.990 --> 00:31:22.490 All right. 00:31:22.490 --> 00:31:24.470 Well, what about variables? 00:31:24.470 --> 00:31:26.480 We've used variables already, and we already 00:31:26.480 --> 00:31:29.960 identified the fact that you don't have to specify the type of your variables 00:31:29.960 --> 00:31:33.710 proactively, even though, clearly, Python supports strings thus far, 00:31:33.710 --> 00:31:37.010 well, in Python, here's how you might declare 00:31:37.010 --> 00:31:40.130 a variable that not necessarily is assigned like a string, 00:31:40.130 --> 00:31:41.440 but maybe an integer instead. 00:31:41.440 --> 00:31:43.190 So in Scratch, here's how you could create 00:31:43.190 --> 00:31:47.030 a variable called counter if you want to count things and set it equal to 0. 00:31:47.030 --> 00:31:50.180 In C, what we would have done is this-- 00:31:50.180 --> 00:31:55.040 int counter = 0; that's the exact same thing as in Scratch. 
00:31:55.040 --> 00:31:59.570 But in Python, as you might imagine, we can chip away at this and type 00:31:59.570 --> 00:32:01.760 out this same idea a little more easily. 00:32:01.760 --> 00:32:03.920 One, we don't need to say int anymore. 00:32:03.920 --> 00:32:06.240 Two, we don't need the semicolon anymore. 00:32:06.240 --> 00:32:07.850 And so you just do what you intend. 00:32:07.850 --> 00:32:09.770 If you want a variable, just write it out. 00:32:09.770 --> 00:32:12.352 If you want to assign it a value, you use the equals sign. 00:32:12.352 --> 00:32:14.810 If you want to specify that value, you put it on the right. 00:32:14.810 --> 00:32:17.570 And just as in C, this is not the equality operator. 00:32:17.570 --> 00:32:20.420 It's the assignment operator from right to left. 00:32:20.420 --> 00:32:25.460 Recall that in Scratch, if you wanted to increment a variable by 1 or any value, 00:32:25.460 --> 00:32:27.260 you could use this puzzle piece here. 00:32:27.260 --> 00:32:32.310 Well, in C, you could do syntax like this, which, again, is not equality. 00:32:32.310 --> 00:32:38.600 It's saying add 1 to counter and then assign it back to the counter variable. 00:32:38.600 --> 00:32:42.750 In Python, you can do exactly the same thing minus the semicolon. 00:32:42.750 --> 00:32:44.690 So you don't need to use the semicolon here. 00:32:44.690 --> 00:32:48.380 But you might recall that in C, there was some syntactic sugar for this idea 00:32:48.380 --> 00:32:49.640 because it was pretty popular. 00:32:49.640 --> 00:32:53.810 And so you could shorten this in C, as you can in Python, 00:32:53.810 --> 00:32:59.270 to actually just this. += 1 will add to the counter variable whatever that 00:32:59.270 --> 00:33:00.560 value is. 00:33:00.560 --> 00:33:03.740 But it's not all steps forward. 00:33:03.740 --> 00:33:07.250 You might be in the habit of using ++ or --. 00:33:07.250 --> 00:33:09.870 Sorry, those are not available in Python. 00:33:09.870 --> 00:33:10.370 Why?
00:33:10.370 --> 00:33:12.590 It's because the designers of Python decided that you 00:33:12.590 --> 00:33:14.360 don't need them because this is-- 00:33:14.360 --> 00:33:16.230 gets the job done anyway. 00:33:16.230 --> 00:33:19.290 But there's a question down here in front, unless it was about the same. 00:33:19.290 --> 00:33:19.790 All right. 00:33:19.790 --> 00:33:22.070 So that's one feature we're taking away. 00:33:22.070 --> 00:33:25.130 But it's not such a big deal to do += in this case. 00:33:25.130 --> 00:33:28.520 Well, what about the actual types involved here 00:33:28.520 --> 00:33:31.970 beyond actually being able to define variables? 00:33:31.970 --> 00:33:36.620 Well, recall that in the world of C, we had at least these data 00:33:36.620 --> 00:33:39.020 types, those that came with the language in particular. 00:33:39.020 --> 00:33:41.570 And we played with quite a few of these over time. 00:33:41.570 --> 00:33:45.110 In Python, we're going to take a bunch of those away. 00:33:45.110 --> 00:33:51.650 In Python, you're only going to have access to a bool, true or false, 00:33:51.650 --> 00:33:54.110 a float, which is a real number with a decimal point, 00:33:54.110 --> 00:33:58.430 typically, an int, or an integer, and a string, now known as str. 00:33:58.430 --> 00:34:00.533 So Python here sort of cuts some corners, 00:34:00.533 --> 00:34:02.450 feels like it's too long to write out strings. 00:34:02.450 --> 00:34:08.750 So a string in Python is called str, S-T-R, but it's the exact same idea. 00:34:08.750 --> 00:34:11.870 Notice, though, that missing from this now, in particular, 00:34:11.870 --> 00:34:16.639 are double and long, which, recall, actually used more bits in order 00:34:16.639 --> 00:34:17.929 to store information. 00:34:17.929 --> 00:34:20.781 We'll see that that might not necessarily be a bad thing. 
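The counter example and the shorter list of types can be tried out in a few lines. This is a rough sketch rather than anything shown on the slides; the variable big is a made-up name, included on the understanding that a Python int simply grows as needed instead of being capped at a fixed number of bits.

```python
# No type keyword and no semicolon: just a name, =, and a value.
counter = 0

# The long form of incrementing...
counter = counter + 1

# ...and the shorthand. Writing counter++ is not valid Python.
counter += 1

# A value too large for a 64-bit integer type in C is still a plain int here.
big = 2 ** 64

# type(...).__name__ gives the short name of each value's type.
print(counter, type(counter).__name__)
print(big, type(big).__name__)
print(type(1.5).__name__, type(True).__name__, type("hi").__name__)
```

The last line should report float, bool, and str, matching the type names listed above.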
00:34:20.781 --> 00:34:22.489 In fact, Python just simplifies the world 00:34:22.489 --> 00:34:24.590 into two different types of variables but gets 00:34:24.590 --> 00:34:26.460 out of the business of you having to decide, 00:34:26.460 --> 00:34:30.923 do you want a small int or a large int or something along those lines? 00:34:30.923 --> 00:34:32.340 Well, let me go ahead and do this. 00:34:32.340 --> 00:34:34.670 Let me switch back over to VS Code here. 00:34:34.670 --> 00:34:37.940 And why don't we actually try to play around with some calculations using 00:34:37.940 --> 00:34:39.590 these data types and more? 00:34:39.590 --> 00:34:42.500 Let me go ahead and propose that we implement, 00:34:42.500 --> 00:34:48.690 like we did way back in week 1, a simple calculator. 00:34:48.690 --> 00:34:51.889 So let me do this-- code of calculator.c. 00:34:51.889 --> 00:34:55.670 So I'm indeed going to do this in C first, just so 00:34:55.670 --> 00:34:58.470 that we have a similar example at hand. 00:34:58.470 --> 00:35:02.940 So I'm going to include standard io.h here at the top. 00:35:02.940 --> 00:35:06.370 I'm going to go ahead and do int main(void). 00:35:06.370 --> 00:35:10.270 Inside of main(), I'm going to go ahead and declare a variable called x and set 00:35:10.270 --> 00:35:15.190 that equal to get_int(), and I'm going to prompt the user for that value x. 00:35:15.190 --> 00:35:18.550 But if I'm using get_int(), recall that actually is from the CS50 library. 00:35:18.550 --> 00:35:22.000 So in C, I'm going to need cs50.h, still, for this example. 00:35:22.000 --> 00:35:24.290 But back in week 1, I then did something else. 00:35:24.290 --> 00:35:28.240 I then said, give me another variable called y, set that equal to get_int(), 00:35:28.240 --> 00:35:31.960 and set that equal to that-- pass in that prompt there. 00:35:31.960 --> 00:35:34.210 And then, lastly, let's just do something super simple 00:35:34.210 --> 00:35:35.900 like add two numbers together. 
00:35:35.900 --> 00:35:37.480 So in C, I'll use printf(). 00:35:37.480 --> 00:35:41.860 I'm going to go ahead and do %i backslash n as a placeholder. 00:35:41.860 --> 00:35:44.380 And then I'm just going to plug in x + y. 00:35:44.380 --> 00:35:48.460 So all of that was in C. So it was a decent number of lines of code 00:35:48.460 --> 00:35:50.650 to accomplish that task, only three of which 00:35:50.650 --> 00:35:53.200 are really the logical part of my program. 00:35:53.200 --> 00:35:55.640 These are the three that we're really interested in. 00:35:55.640 --> 00:35:59.980 So let me instead now do this, code of calculator.py, 00:35:59.980 --> 00:36:01.510 which is going to give me a new tab. 00:36:01.510 --> 00:36:05.540 Let me just drag it over to the right so I can view these side by side. 00:36:05.540 --> 00:36:07.790 And in calculator.py, let's do this. 00:36:07.790 --> 00:36:11.910 From the CS50 library, import the get_int() function, 00:36:11.910 --> 00:36:13.430 which is also available. 00:36:13.430 --> 00:36:16.670 Then let's go ahead and create a variable called x and set it equal 00:36:16.670 --> 00:36:20.180 to the return value of get_int(), passing in the same prompt-- 00:36:20.180 --> 00:36:22.220 no semicolon, no mention of int. 00:36:22.220 --> 00:36:25.040 Let's then create a second variable y, set it equal to get_int(), 00:36:25.040 --> 00:36:29.970 prompt the user for y, as before, no int, explicitly, no semicolon. 00:36:29.970 --> 00:36:33.620 And now let's just go ahead and print out x + y. 00:36:33.620 --> 00:36:37.070 So it turns out that the print() function in Python is further flexible, 00:36:37.070 --> 00:36:39.050 that you don't need these format strings. 00:36:39.050 --> 00:36:42.470 If you want to print out an integer, just pass it an integer, 00:36:42.470 --> 00:36:45.750 even if that integer is the sum of two other integers. 00:36:45.750 --> 00:36:48.180 So it just sort of works as you might expect. 
00:36:48.180 --> 00:36:50.060 So let me go down into my terminal here. 00:36:50.060 --> 00:36:52.580 Let me run python of calculator.py. 00:36:52.580 --> 00:36:54.830 And when I hit Enter, I'm prompted for x. 00:36:54.830 --> 00:36:55.730 Let's do 1. 00:36:55.730 --> 00:36:56.630 I'm prompted for y. 00:36:56.630 --> 00:36:57.530 Let's do 2. 00:36:57.530 --> 00:37:02.070 And voila, I should see 3 as the result-- 00:37:02.070 --> 00:37:04.800 so no actual surprises there. 00:37:04.800 --> 00:37:07.948 But let me go ahead and, you know what? 00:37:07.948 --> 00:37:09.740 Let's take away this training wheel, right? 00:37:09.740 --> 00:37:12.210 We don't want to keep introducing CS50-specific things. 00:37:12.210 --> 00:37:14.960 So suppose we didn't give you get_int(). 00:37:14.960 --> 00:37:18.505 Well, it turns out that get_int() is still doing a bit of help for you, 00:37:18.505 --> 00:37:21.380 even though get_string() was kind of a throwaway and we could replace 00:37:21.380 --> 00:37:22.590 get_string() with input(). 00:37:22.590 --> 00:37:24.530 So let's try this same idea. 00:37:24.530 --> 00:37:28.880 Let's go ahead and prompt the user for input for both x and y using 00:37:28.880 --> 00:37:33.410 the input() function in Python instead of get_int() from CS50. 00:37:33.410 --> 00:37:37.280 Let me go ahead and rerun Python of calculator.py and hit Enter. 00:37:37.280 --> 00:37:38.330 So far, so good. 00:37:38.330 --> 00:37:39.470 Let me type in 1. 00:37:39.470 --> 00:37:40.490 Let me type in 2. 00:37:40.490 --> 00:37:43.230 And what answer should we see? 00:37:43.230 --> 00:37:46.260 Hopefully still 3, but nope. 00:37:46.260 --> 00:37:51.570 Now the answer is 12, or is it? 00:37:51.570 --> 00:37:54.233 Why am I seeing 12 and not 3? 00:37:54.233 --> 00:37:55.650 AUDIENCE: [INAUDIBLE] two strings. 00:37:55.650 --> 00:37:56.400 DAVID MALAN: Yeah. 00:37:56.400 --> 00:37:59.230 So it's actually concatenating what seem to be two strings. 
00:37:59.230 --> 00:38:01.980 So if we actually read the documentation for the input() function, 00:38:01.980 --> 00:38:04.530 it's behaving exactly as it's supposed to. 00:38:04.530 --> 00:38:06.810 It is getting input from the user from their keyboard. 00:38:06.810 --> 00:38:09.840 But anything you type at the keyboard is effectively a string. 00:38:09.840 --> 00:38:12.390 Even if some of the symbols happen to look like or actually 00:38:12.390 --> 00:38:15.780 be decimal numbers, they're still going to come to you as strings. 00:38:15.780 --> 00:38:20.130 And so x is a string, a.k.a., str, y is a str, 00:38:20.130 --> 00:38:24.660 and we've already seen that if you use plus in between two strings, or strs, 00:38:24.660 --> 00:38:27.390 you're going to get concatenation, not addition. 00:38:27.390 --> 00:38:32.590 So you're not seeing 12 as much as you're seeing 1 2, not 12. 00:38:32.590 --> 00:38:33.820 So how can we fix this? 00:38:33.820 --> 00:38:38.040 Well, in C, we had this technique where we could cast one thing to another 00:38:38.040 --> 00:38:40.710 by just putting int in parentheses, for instance. 00:38:40.710 --> 00:38:42.880 In Python, things are a little higher-level such 00:38:42.880 --> 00:38:46.060 that you can't quite get away with just casting 00:38:46.060 --> 00:38:51.280 one thing to another because a string, recall, is not 00:38:51.280 --> 00:38:53.080 the same thing as a char. 00:38:53.080 --> 00:38:55.450 A string has zero or more characters. 00:38:55.450 --> 00:38:56.890 A char always has one. 00:38:56.890 --> 00:39:00.490 And in C, there was a perfect mapping between single characters 00:39:00.490 --> 00:39:05.230 and single numbers in decimal, like 65 for capital A. 00:39:05.230 --> 00:39:09.430 But in Python, we can do something somewhat similar and not so much cast 00:39:09.430 --> 00:39:15.510 but convert this input() to an int and convert this input() to an int. 
00:39:15.510 --> 00:39:17.260 So just like in C, you can nest functions. 00:39:17.260 --> 00:39:19.750 You can call one function and pass its output 00:39:19.750 --> 00:39:21.700 as the input to another function. 00:39:21.700 --> 00:39:25.390 And this now will convert x and y to integers. 00:39:25.390 --> 00:39:28.780 And so now plus is going to behave as you should-- as you would expect. 00:39:28.780 --> 00:39:33.010 Let me rerun python of calculator.py, type in 1, type in 2, and now 00:39:33.010 --> 00:39:37.150 we're back to seeing 3 as the result. If this is a little unclear, 00:39:37.150 --> 00:39:39.190 this nesting, let me do this one other way. 00:39:39.190 --> 00:39:43.360 Instead of just passing input() output into int, 00:39:43.360 --> 00:39:45.760 I could also more pedantically do this. 00:39:45.760 --> 00:39:52.000 x should actually equal int(x), y should actually equal int(y). 00:39:52.000 --> 00:39:54.200 This would be the exact same effect. 00:39:54.200 --> 00:39:57.340 It's just two extra lines where it's not really necessary. 00:39:57.340 --> 00:39:58.540 But that would work fine. 00:39:58.540 --> 00:40:01.270 If you don't like that approach, we could even do it inline. 00:40:01.270 --> 00:40:05.420 We could actually convert x to an int and y to an int. 00:40:05.420 --> 00:40:05.920 Why? 00:40:05.920 --> 00:40:10.180 Well, int, I-N-T, in the context of Python itself, is a function. 00:40:10.180 --> 00:40:13.570 And it takes as input here a string, or str, 00:40:13.570 --> 00:40:19.550 and returns to you the numeric, the integral equivalent-- so similar idea, 00:40:19.550 --> 00:40:20.870 but it's actually a function. 00:40:20.870 --> 00:40:23.320 So all of the syntax that I've been tinkering with here 00:40:23.320 --> 00:40:27.800 is sort of fundamentally the same as it would be in C. But in this case, 00:40:27.800 --> 00:40:31.120 we're not casting but converting more specifically. 
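The difference between concatenating and adding can be reproduced without a keyboard. In this sketch, x and y are hard-coded as strings on the assumption that this is what input() would hand back after typing 1 and 2; the names concatenated and total are made up for the example.

```python
# What input() returns is always a str, even if the user typed digits.
x = "1"
y = "2"

# + between two strs joins them: "1" followed by "2".
concatenated = x + y
print(concatenated)

# int() converts each str to an integer, so + now means addition.
total = int(x) + int(y)
print(total)
```

In the real program, the same conversion is written by nesting the calls, as in int(input("x: ")).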
00:40:31.120 --> 00:40:33.830 Well, let me go back to these data types. 00:40:33.830 --> 00:40:37.423 These are some of the data types that are available to us in Python. 00:40:37.423 --> 00:40:39.340 It turns out there's a bunch of others as well 00:40:39.340 --> 00:40:40.923 that we'll start to dabble with today. 00:40:40.923 --> 00:40:43.490 You can get a range of values, a list of values, 00:40:43.490 --> 00:40:46.960 which is going to be like an array, but better, tuples, which 00:40:46.960 --> 00:40:50.920 are kind of like x, comma, y, often, combinations of values that 00:40:50.920 --> 00:40:51.940 don't change. 00:40:51.940 --> 00:40:57.580 dict for dictionary-- it turns out that in Python, you get dictionaries. 00:40:57.580 --> 00:40:58.917 You get hash tables for free. 00:40:58.917 --> 00:41:00.250 They're built into the language. 00:41:00.250 --> 00:41:02.125 And we already saw that Python also gives you 00:41:02.125 --> 00:41:04.060 a data type known as a set, which is just 00:41:04.060 --> 00:41:06.400 a collection of values that gives you-- 00:41:06.400 --> 00:41:08.090 gets rid of any duplicates for you. 00:41:08.090 --> 00:41:11.830 And as we saw briefly in speller-- and we'll play more with these ideas soon-- 00:41:11.830 --> 00:41:15.340 it's going to actually be pretty darn easy to get values or check 00:41:15.340 --> 00:41:17.960 for values in those there data types. 00:41:17.960 --> 00:41:22.510 So that in C, we were able to get input easily, we had all of these functions. 00:41:22.510 --> 00:41:26.253 In the CS50 library for Python, we're only going to give you these instead. 00:41:26.253 --> 00:41:27.670 They're going to be the same name. 00:41:27.670 --> 00:41:30.670 So it's still get_string(), not get_str, because we wanted the functions 00:41:30.670 --> 00:41:31.840 to remain named the same. 00:41:31.840 --> 00:41:34.690 But get_float(), get_int(), get_string() all exist. 
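The built-in types listed a moment ago (range, list, tuple, dict, set) can be sketched roughly as follows. The variable names and sample values here are assumptions chosen for illustration, not taken from the lecture's slides.

```python
# range produces a sequence of values; list() collects them into a list.
numbers = list(range(3))

# A tuple groups values that don't change, like an x, y pair.
point = (3, 4)

# A dict maps keys to values, much like a hash table.
ages = {"alice": 30, "bob": 25}

# A set discards duplicates automatically.
animals = set(["cat", "dog", "cat"])

print(numbers)         # prints [0, 1, 2]
print(point[0])        # prints 3
print(ages["alice"])   # prints 30
print(len(animals))    # prints 2
print("cat" in animals)  # prints True
```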
00:41:34.690 --> 00:41:37.280 But, again, get_string() is not all that useful. 00:41:37.280 --> 00:41:41.980 But get_int() and get_float() actually are. 00:41:41.980 --> 00:41:42.580 Why? 00:41:42.580 --> 00:41:44.830 Well, let me go back to VS Code here. 00:41:44.830 --> 00:41:47.920 And let me go back to the second version of this program, 00:41:47.920 --> 00:41:54.980 whereby I proactively converted each of these return values to integers. 00:41:54.980 --> 00:41:59.020 So recall that this is the solution to the 1 2 problem. 00:41:59.020 --> 00:42:03.550 And to be clear, if I run python of calculator.py and input 1 and 2, 00:42:03.550 --> 00:42:06.100 I get back now 3 as expected. 00:42:06.100 --> 00:42:09.850 But what I'm not showing you is that there's still potentially a bug here. 00:42:09.850 --> 00:42:13.480 Let me run python of calculator.py, and let me just not cooperate. 00:42:13.480 --> 00:42:15.400 Instead of typing what looks like a number, 00:42:15.400 --> 00:42:19.180 let me actually type something that's clearly a string, like cat. 00:42:19.180 --> 00:42:22.570 And unfortunately, we're going to see the first of our errors, 00:42:22.570 --> 00:42:23.938 the first of our runtime errors. 00:42:23.938 --> 00:42:26.230 And this, like in C, is going to look cryptic at first. 00:42:26.230 --> 00:42:28.330 But this is generally known as a traceback, where 00:42:28.330 --> 00:42:31.523 it's going to trace back for you everything your program just did, 00:42:31.523 --> 00:42:33.190 even though this one's relatively short. 00:42:33.190 --> 00:42:36.915 And you'll see that calculator.py, line 1-- 00:42:36.915 --> 00:42:39.040 I didn't even get very far before there's an error. 00:42:39.040 --> 00:42:42.300 And then, with all of these caret symbols here, this is a problem. 00:42:42.300 --> 00:42:42.810 Why? 00:42:42.810 --> 00:42:47.370 invalid literal for int() function with base 10, quote unquote, 'cat.'
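To make the error concrete: in Python, int("cat") raises a ValueError whose message reads invalid literal for int() with base 10: 'cat'. The sketch below shows one way such an error could be caught. The helper to_int_or_none is a hypothetical name introduced here for illustration, and it is not how CS50's get_int() is implemented.

```python
def to_int_or_none(text: str):
    # Hypothetical helper: try the conversion, and report failure
    # with None instead of letting the program stop with a traceback.
    try:
        return int(text)
    except ValueError:
        # Raised when text does not look like an integer, e.g. "cat".
        return None


print(to_int_or_none("1"))    # prints 1
print(to_int_or_none("cat"))  # prints None
```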
00:42:47.370 --> 00:42:49.147 Again, just like in C, it's very arcane. 00:42:49.147 --> 00:42:51.480 It's hard to understand this the first time you read it. 00:42:51.480 --> 00:42:55.920 But what it's trying to tell me is that cat is not an integer. 00:42:55.920 --> 00:42:59.850 And therefore, the int() function cannot convert it to an integer for you. 00:42:59.850 --> 00:43:01.920 We're going to leave this problem alone for now. 00:43:01.920 --> 00:43:04.950 But this is why, again, get_int()'s looking kind of good, 00:43:04.950 --> 00:43:07.980 and get_float()'s looking kind of good because those functions from 00:43:07.980 --> 00:43:12.970 CS50's library will deal with these kinds of problems for you. 00:43:12.970 --> 00:43:14.970 Now, just so you've seen it, there's another way 00:43:14.970 --> 00:43:17.370 to import functions from these things. 00:43:17.370 --> 00:43:20.430 If you were to use, for instance, in a program, get_float(), get_int(), 00:43:20.430 --> 00:43:24.180 and get_string(), you don't need to do three separate lines like this. 00:43:24.180 --> 00:43:27.420 You can actually separate them a little more cleanly with commas. 00:43:27.420 --> 00:43:33.240 And, in fact, if I go back to a version of this program here in VS Code whereby 00:43:33.240 --> 00:43:35.710 I actually do use the get_int() function-- 00:43:35.710 --> 00:43:39.810 so let me actually get rid of all this and use get_int() as before. 00:43:39.810 --> 00:43:43.000 Let me get rid of all this and use get_int() as before. 00:43:43.000 --> 00:43:48.640 Previously, the way I did this was by saying from cs50 import get_int() 00:43:48.640 --> 00:43:51.200 if you know in advance what function you want to use.
00:43:51.200 --> 00:43:54.400 But suppose, for whatever reason, you already have your own function named 00:43:54.400 --> 00:43:57.940 get_int(), and therefore, it would collide with CS50's own, 00:43:57.940 --> 00:44:01.400 you can avoid that issue, too, by just using that first statement we saw 00:44:01.400 --> 00:44:01.900 earlier. 00:44:01.900 --> 00:44:03.520 Just import the library itself. 00:44:03.520 --> 00:44:06.520 Don't specify explicitly which functions you're going to use. 00:44:06.520 --> 00:44:09.850 But thereafter-- and you could not do this in C-- 00:44:09.850 --> 00:44:14.950 you could specify cs50.get_int(), cs50.get_int(), 00:44:14.950 --> 00:44:19.450 in order to go into the library, access its get_int() function, and therefore, 00:44:19.450 --> 00:44:22.630 it doesn't matter if you or any number of other people wrote 00:44:22.630 --> 00:44:25.360 an identically-named function called get_int(). 00:44:25.360 --> 00:44:28.660 You're using here, clearly, CS50's own. 00:44:28.660 --> 00:44:34.180 So this is, again, just more ways to achieve the same solution 00:44:34.180 --> 00:44:36.410 but with different syntax. 00:44:36.410 --> 00:44:36.910 All right. 00:44:36.910 --> 00:44:44.030 Any questions about any of this syntax or features thus far? 00:44:44.030 --> 00:44:44.530 No? 00:44:44.530 --> 00:44:45.030 All right. 00:44:45.030 --> 00:44:48.010 Well, how about maybe another example here, 00:44:48.010 --> 00:44:52.450 whereby we revisit conditionals, which was the way of implementing 00:44:52.450 --> 00:44:55.510 do this thing or this thing, sort of proverbial forks in the road. 00:44:55.510 --> 00:44:58.420 In Scratch, recall, we might use building blocks like these 00:44:58.420 --> 00:45:01.990 to just check, is x less than y, and if so, say so. 00:45:01.990 --> 00:45:04.400 In C, this code looked like this. 00:45:04.400 --> 00:45:07.210 And notice that we had parentheses around the x and the y. 
00:45:07.210 --> 00:45:11.380 We had curly braces, even though I did disclaim that for single lines of code, 00:45:11.380 --> 00:45:13.160 you can actually omit the curly braces. 00:45:13.160 --> 00:45:15.790 But stylistically, we always include them in CS50's code. 00:45:15.790 --> 00:45:18.250 But you have the backslash n and the semicolon. 00:45:18.250 --> 00:45:21.770 In a moment, you're about to see the Python equivalent of this, 00:45:21.770 --> 00:45:23.270 which is almost the same. 00:45:23.270 --> 00:45:24.760 It's just a little nicer. 00:45:24.760 --> 00:45:28.160 This, then, is the Python equivalent thereof. 00:45:28.160 --> 00:45:32.840 So what's different at a glance here, just to be clear? 00:45:32.840 --> 00:45:33.620 What's different? 00:45:33.620 --> 00:45:34.475 Yeah? 00:45:34.475 --> 00:45:35.350 AUDIENCE: [INAUDIBLE] 00:45:35.350 --> 00:45:38.440 DAVID MALAN: So the conditional is not in parentheses. 00:45:38.440 --> 00:45:42.230 You can use parentheses, especially if, logically, you need to group things. 00:45:42.230 --> 00:45:45.160 But if you don't need them, don't use them is Python's mindset. 00:45:45.160 --> 00:45:46.780 What else has changed here? 00:45:46.780 --> 00:45:47.500 Yeah? 00:45:47.500 --> 00:45:48.700 AUDIENCE: No curly brackets. 00:45:48.700 --> 00:45:51.620 DAVID MALAN: No curly braces, yeah, so no curly braces around this. 00:45:51.620 --> 00:45:55.250 And even though it's one line of code, you just don't use curly braces at all. 00:45:55.250 --> 00:45:55.750 Why? 00:45:55.750 --> 00:46:00.230 Because in Python, indentation is actually really, really important. 00:46:00.230 --> 00:46:02.410 And we know from office hours and problem sets 00:46:02.410 --> 00:46:05.200 occasionally that if you forgot to run style50 00:46:05.200 --> 00:46:07.870 or you didn't manually format your code beautifully, 00:46:07.870 --> 00:46:10.960 C is not actually going to care if everything is aligned on the left. 
00:46:10.960 --> 00:46:14.830 If you never once hit the Tab character or the space bar, 00:46:14.830 --> 00:46:17.770 C, or specifically, clang, isn't really going to care. 00:46:17.770 --> 00:46:19.722 But your teaching fellow, your TA, is going 00:46:19.722 --> 00:46:22.930 to care, or your colleague in the real world, because your code's just a mess 00:46:22.930 --> 00:46:23.920 and hard to read. 00:46:23.920 --> 00:46:29.380 Python, though-- because you are not the only ones in the world that might have 00:46:29.380 --> 00:46:31.450 bad habits when it comes to style-- 00:46:31.450 --> 00:46:34.490 Python as a language decided, that's it. 00:46:34.490 --> 00:46:37.890 Everyone has to indent in order for their code to even work. 00:46:37.890 --> 00:46:40.470 So the convention in Python is to use four spaces-- 00:46:40.470 --> 00:46:43.850 so 1, 2, 3, 4, or hit Tab and let it automatically convert to the same, 00:46:43.850 --> 00:46:47.960 and use a colon instead of the curly braces, 00:46:47.960 --> 00:46:50.480 for instance, to make clear what is associated 00:46:50.480 --> 00:46:53.000 with this particular conditional. 00:46:53.000 --> 00:46:55.310 We can omit, though, the backslash n per before. 00:46:55.310 --> 00:46:56.630 We can omit the semicolon. 00:46:56.630 --> 00:46:59.750 But this is essentially the Python version thereof. 00:46:59.750 --> 00:47:03.560 Here in C-- in Scratch, if you wanted to do an if-else, 00:47:03.560 --> 00:47:08.000 like we did back in week 0, in C, it's very similar to the if, except you add 00:47:08.000 --> 00:47:11.810 the else clause and write out an additional printf() like this. 00:47:11.810 --> 00:47:13.760 In Python, we can tighten this up. 00:47:13.760 --> 00:47:16.730 if x less than y, colon, that's exactly the same. 00:47:16.730 --> 00:47:17.810 First line's the same. 00:47:17.810 --> 00:47:21.680 All we're doing now is adding an else and the second print line here. 00:47:21.680 --> 00:47:22.640 How about in Scratch?
00:47:22.640 --> 00:47:26.060 If we had a three-way fork in the road-- if, else, if, else. 00:47:26.060 --> 00:47:29.750 In C, it looked pretty much like that-- if, else, if, else. 00:47:29.750 --> 00:47:31.820 In Python, we can tighten this up. 00:47:31.820 --> 00:47:34.170 And this is not a typo. 00:47:34.170 --> 00:47:38.300 What jumps out at you as weird but you got to just get used to it? 00:47:38.300 --> 00:47:39.020 Yeah? 00:47:39.020 --> 00:47:39.820 AUDIENCE: elif. 00:47:39.820 --> 00:47:40.570 DAVID MALAN: elif. 00:47:40.570 --> 00:47:44.750 And honestly, years later, I still can't remember if it's elif or elsif 00:47:44.750 --> 00:47:49.332 because other languages actually do E-L-S-I-F. 00:47:49.332 --> 00:47:52.040 And now I've probably biased all of you into questioning this. 00:47:52.040 --> 00:47:53.390 But it's elif in Python. 00:47:53.390 --> 00:47:55.200 E-L-I-F is not a typo. 00:47:55.200 --> 00:47:58.520 It's in the spirit of let's just save ourselves some keystrokes. 00:47:58.520 --> 00:48:03.950 So elif is identical to elsif, but it's a little tighter to type it this way. 00:48:03.950 --> 00:48:04.670 All right. 00:48:04.670 --> 00:48:09.080 So if we now have this ability to express conditionals, 00:48:09.080 --> 00:48:11.250 what can we actually do with them? 00:48:11.250 --> 00:48:13.530 Well, let me go over to VS Code here. 00:48:13.530 --> 00:48:18.320 And let me propose that we revisit maybe another program from before, 00:48:18.320 --> 00:48:21.600 where we just compare two integers in particular. 00:48:21.600 --> 00:48:22.580 So I'm in VS Code. 00:48:22.580 --> 00:48:25.670 Let me open up a file called, say, compare.py. 00:48:25.670 --> 00:48:28.460 And in compare.py, we'll use the CS50 library just 00:48:28.460 --> 00:48:32.220 so we don't risk any errors, like if the human doesn't type an integer. 00:48:32.220 --> 00:48:35.450 So we're going to go ahead and say from cs50 import get_int().
00:48:35.450 --> 00:48:39.350 And in compare.py, let's get two variables-- x = get_int(), 00:48:39.350 --> 00:48:41.390 and prompt the user for x. 00:48:41.390 --> 00:48:42.830 So "What's x?" 00:48:42.830 --> 00:48:47.840 To be a bit more verbose, y = get_int("What's y? ") 00:48:47.840 --> 00:48:50.490 And then let's go ahead and just compare these two values. 00:48:50.490 --> 00:48:55.196 So if x is less than y, then go ahead and print out with print("x is less 00:48:55.196 --> 00:49:01.160 than y"), close quote, elif x is greater than y, 00:49:01.160 --> 00:49:07.220 go ahead and print out "x is greater than y," close quote, else, 00:49:07.220 --> 00:49:12.830 go ahead and print out "x is equal to y"-- so the exact same program, 00:49:12.830 --> 00:49:16.400 but I've added to the mix getting a value of x and y. 00:49:16.400 --> 00:49:18.650 Let me run python of compare.py. 00:49:18.650 --> 00:49:19.430 Enter. 00:49:19.430 --> 00:49:23.060 Let's type in 1 for x, 2 for y. x is less than y. 00:49:23.060 --> 00:49:24.500 Let's run it once more. 00:49:24.500 --> 00:49:26.240 x is 2. y is 1. 00:49:26.240 --> 00:49:27.440 x is greater than y. 00:49:27.440 --> 00:49:30.230 And just for good measure, let's run it a third time. x is 1. 00:49:30.230 --> 00:49:31.400 y is 1. 00:49:31.400 --> 00:49:32.690 x is equal to y. 00:49:32.690 --> 00:49:37.430 So the code, daresay, works exactly as you would expect, as you would hope. 00:49:37.430 --> 00:49:40.430 But it turns out that in the world of Python, 00:49:40.430 --> 00:49:44.210 we're actually going to get some other behavior that might actually 00:49:44.210 --> 00:49:48.620 have been what you expected weeks ago, even though C did not behave this way. 00:49:48.620 --> 00:49:52.610 In the world of Python and in the world of strings, a.k.a. 00:49:52.610 --> 00:49:57.090 strs, strings actually behave more like you would expect. 00:49:57.090 --> 00:49:58.070 So by that I mean this.
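The three-way conditional dictated for compare.py can be sketched as below. To keep the example runnable without a keyboard or the CS50 library, the comparison is wrapped in a function named compare, which is an assumption made here; the lecture's version reads x and y with get_int() and prints directly.

```python
def compare(x: int, y: int) -> str:
    # No parentheses around the condition, no curly braces:
    # the colon and the four-space indentation mark each branch.
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))  # prints x is less than y
print(compare(2, 1))  # prints x is greater than y
print(compare(1, 1))  # prints x is equal to y
```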
00:49:58.070 --> 00:49:59.780 Let me actually go back to this code. 00:49:59.780 --> 00:50:05.420 And instead of using integers, let me go ahead and get rid of-- 00:50:05.420 --> 00:50:08.670 I could do get_string(), but we said that that's not really necessary. 00:50:08.670 --> 00:50:11.030 So let's just go ahead and change this to input(). 00:50:11.030 --> 00:50:11.720 And actually, you know what? 00:50:11.720 --> 00:50:12.678 Let's just start fresh. 00:50:12.678 --> 00:50:16.370 Let's give myself a string called s and use the input() function and ask 00:50:16.370 --> 00:50:17.920 the user for s. 00:50:17.920 --> 00:50:21.800 Let's use another variable called t just because it comes after s and use 00:50:21.800 --> 00:50:23.870 the input() function to get t. 00:50:23.870 --> 00:50:26.698 Then let's compare if s and t are the same. 00:50:26.698 --> 00:50:28.490 Now, a couple of weeks ago, this backfired. 00:50:28.490 --> 00:50:32.670 And if I tried to compare two strings for equality, it did not work. 00:50:32.670 --> 00:50:38.100 But if I do if s == t, print("Same"), else, 00:50:38.100 --> 00:50:40.590 let's go ahead and print("Different"). 00:50:40.590 --> 00:50:44.410 I daresay, in Python, I think this is going to work as you would expect. 00:50:44.410 --> 00:50:48.480 So python of compare.py, let's type in cat and cat. 00:50:48.480 --> 00:50:49.920 And indeed those are the same. 00:50:49.920 --> 00:50:53.190 Let me run it again and type in cat and dog, respectively. 00:50:53.190 --> 00:50:55.140 And those are now different. 00:50:55.140 --> 00:51:00.240 But in C, we always got "Different," "Different," "Different," 00:51:00.240 --> 00:51:04.500 even if I typed the exact same word, be it cat or dog or hi or anything else. 00:51:04.500 --> 00:51:08.820 Why, in C, were s and t always different a couple of weeks ago? 00:51:08.820 --> 00:51:09.904 Yeah?
00:51:09.904 --> 00:51:12.946 AUDIENCE: Because it was comparing the value of the char* with the memory 00:51:12.946 --> 00:51:13.515 address. 00:51:13.515 --> 00:51:14.390 DAVID MALAN: Exactly. 00:51:14.390 --> 00:51:18.570 In C, string is the same thing as char*, which is a memory address. 00:51:18.570 --> 00:51:20.780 And because we had called get_string() twice, 00:51:20.780 --> 00:51:24.290 even if the human typed the same things, that was two different chunks of memory 00:51:24.290 --> 00:51:25.790 at two different addresses. 00:51:25.790 --> 00:51:29.090 So those two char*s were just naturally always different, 00:51:29.090 --> 00:51:31.880 even if the characters at those addresses were the same. 00:51:31.880 --> 00:51:33.802 Python is meant to be higher-level. 00:51:33.802 --> 00:51:35.510 It's meant to be a little more intuitive. 00:51:35.510 --> 00:51:38.720 It's meant to be more accessible to folks who might not necessarily 00:51:38.720 --> 00:51:41.310 know or want to understand those lower-level details. 00:51:41.310 --> 00:51:48.080 So in Python, ==, even for strings just works the way that you might expect. 00:51:48.080 --> 00:51:50.450 But in Python, we can do some other things, 00:51:50.450 --> 00:51:55.230 too, even more easily than we could in C. Let me go back to VS Code here. 00:51:55.230 --> 00:51:56.660 Let me close compare.py. 00:51:56.660 --> 00:51:59.720 And let's reimplement a program from C called agree, 00:51:59.720 --> 00:52:02.980 which allowed us to prompt the user for a yes/no question, like, 00:52:02.980 --> 00:52:05.730 do you agree to these terms and conditions or something like that. 00:52:05.730 --> 00:52:08.660 So let's do code of agree.py. 00:52:08.660 --> 00:52:12.410 And with agree.py, let me go ahead and-- 00:52:12.410 --> 00:52:14.680 actually, let's go ahead and do this. 00:52:14.680 --> 00:52:18.220 Let me also open up a file that I came with in advance. 00:52:18.220 --> 00:52:20.310 And this is called agree.c. 
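As a small sketch of the point that == compares the characters of two strs, not their addresses: the function name same_or_different is hypothetical, and the second string is deliberately built from pieces so that it is constructed separately from the first.

```python
def same_or_different(s: str, t: str) -> str:
    # == on strs compares their contents, character by character.
    if s == t:
        return "Same"
    else:
        return "Different"


s = "cat"
t = "".join(["c", "a", "t"])  # assembled separately, same characters

print(same_or_different(s, t))      # prints Same
print(same_or_different(s, "dog"))  # prints Different
```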
00:52:20.310 --> 00:52:23.340 And this is what we did some weeks ago when 00:52:23.340 --> 00:52:26.860 we wanted to check whether or not the user had agreed to something or not. 00:52:26.860 --> 00:52:29.310 So we used the CS50 library, the standard I/O library, 00:52:29.310 --> 00:52:31.290 we had a main() function, we used get_char(). 00:52:31.290 --> 00:52:35.460 And then we used == a lot, and we used the two vertical bars, 00:52:35.460 --> 00:52:36.990 which meant logical or. 00:52:36.990 --> 00:52:39.100 Is this thing true or is this thing true? 00:52:39.100 --> 00:52:42.120 And if so, printf() "Agreed" or "Not agreed." 00:52:42.120 --> 00:52:42.900 So this worked. 00:52:42.900 --> 00:52:44.160 And this is relatively simple. 00:52:44.160 --> 00:52:47.670 That's the right way to do it in C. But notice 00:52:47.670 --> 00:52:50.400 it was a little verbose because we wanted 00:52:50.400 --> 00:52:53.820 to handle uppercase and lowercase, uppercase and lowercase. 00:52:53.820 --> 00:52:56.640 So that did start to bloat the code, admittedly. 00:52:56.640 --> 00:52:58.770 So let's try to do the same thing in Python 00:52:58.770 --> 00:53:02.530 and see what we can do the same or different-- no pun intended. 00:53:02.530 --> 00:53:03.330 So let me do this. 00:53:03.330 --> 00:53:08.160 In agree.py, why don't we try to get input from the user as before? 00:53:08.160 --> 00:53:09.450 And I will use-- 00:53:09.450 --> 00:53:12.060 I could use get_string(), but I'll go ahead and use input(). 00:53:12.060 --> 00:53:17.940 So s = input("Do you agree? ") in double quotes. 
00:53:17.940 --> 00:53:21.090 And then let's go ahead and check if s == "Y"-- 00:53:24.240 --> 00:53:26.980 and it's not vertical bar now, it's actually more readable, 00:53:26.980 --> 00:53:34.830 more English-like-- or s == "y," then go ahead and print out "Agreed" as before, 00:53:34.830 --> 00:53:36.270 elsif-- 00:53:36.270 --> 00:53:43.710 see, I did it there-- elif s == "N" or s == "n," 00:53:43.710 --> 00:53:47.040 go ahead and print out "Not agreed." 00:53:47.040 --> 00:53:51.720 So it's almost the same as the C version, except that I'm using, 00:53:51.720 --> 00:53:53.980 literally, O-R instead of two vertical bars. 00:53:53.980 --> 00:53:56.430 So let's run this-- so python of agree.py. 00:53:56.430 --> 00:53:57.030 Enter. 00:53:57.030 --> 00:53:58.110 Do I agree? 00:53:58.110 --> 00:53:59.610 Yes, for little y. 00:53:59.610 --> 00:54:02.970 Let's do it again. python of agree.py, capital Y. Yes. 00:54:02.970 --> 00:54:03.910 That works there. 00:54:03.910 --> 00:54:06.630 And if I do it again with lowercase n, and if I do it 00:54:06.630 --> 00:54:10.240 with capital N, this program, too, seems to work. 00:54:10.240 --> 00:54:12.160 But what if I do this? 00:54:12.160 --> 00:54:13.980 Let me rerun python of agree.py. 00:54:13.980 --> 00:54:16.140 Let me type in Yes. 00:54:16.140 --> 00:54:17.227 OK, it just ignores me. 00:54:17.227 --> 00:54:18.060 Let me run it again. 00:54:18.060 --> 00:54:18.915 Let me type in no. 00:54:18.915 --> 00:54:20.130 It just ignores me. 00:54:20.130 --> 00:54:22.230 Let me try it very emphatically, YES in all caps. 00:54:22.230 --> 00:54:23.370 It just ignores me. 00:54:23.370 --> 00:54:26.100 So there's some explosion of possibilities 00:54:26.100 --> 00:54:27.930 that ideally we should handle, right? 00:54:27.930 --> 00:54:32.790 This is bad user interface design if I have-- the user has to type Y or N, 00:54:32.790 --> 00:54:37.260 even if yes and no in English are perfectly reasonable and logical, too. 
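The first Python version of agree can be sketched as follows. Wrapping the logic in a function named agree that returns the message is an assumption made here so the example runs without input(); returning None stands in for the lecture's program printing nothing at all.

```python
def agree(s: str):
    # Python spells logical "or" as the word or, not two vertical bars.
    if s == "Y" or s == "y":
        return "Agreed"
    elif s == "N" or s == "n":
        return "Not agreed"
    # Anything else, such as "Yes" or "YES", matches neither branch.
    return None


print(agree("y"))    # prints Agreed
print(agree("N"))    # prints Not agreed
print(agree("Yes"))  # prints None
```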
00:54:37.260 --> 00:54:39.250 So how could we handle that? 00:54:39.250 --> 00:54:43.950 Well, it turns out in Python, we can use something like an array, 00:54:43.950 --> 00:54:47.740 technically called a list, to maybe check a bunch of things at once. 00:54:47.740 --> 00:54:49.120 So let me do this. 00:54:49.120 --> 00:54:54.330 Let me instead say not equality, but let me use the in keyword in Python 00:54:54.330 --> 00:54:56.880 and check if it's in a collection of possible values. 00:54:56.880 --> 00:54:59.940 Let me say if s is in-- 00:54:59.940 --> 00:55:03.930 and here comes, in square brackets, just like-- 00:55:03.930 --> 00:55:09.630 in square brackets, quote unquote, "y", quote unquote, "yes," 00:55:09.630 --> 00:55:13.770 then we can go ahead and print out "Agreed," 00:55:13.770 --> 00:55:20.100 elif s in this list of values, lowercase "n" or lowercase "no," 00:55:20.100 --> 00:55:23.190 then we can print out, for instance, "Not agreed." 00:55:23.190 --> 00:55:25.890 But this is a bit of a step backwards because now 00:55:25.890 --> 00:55:29.010 I'm only handling lowercase. 00:55:29.010 --> 00:55:32.520 So let me go into the mix and maybe add capital "Y"-- 00:55:32.520 --> 00:55:36.780 wait a minute, then maybe capital "YES," then maybe "YeS," also-- 00:55:36.780 --> 00:55:41.310 I mean, weird, but we should probably support this and "YEs." 00:55:41.310 --> 00:55:43.480 I mean, there's a lot of combinations. 00:55:43.480 --> 00:55:45.360 So this is not going to end well. 00:55:45.360 --> 00:55:47.760 Or it's just going to bloat my code unnecessarily. 00:55:47.760 --> 00:55:51.840 And eventually, for longer words, I'm surely going to miss capitalization. 00:55:51.840 --> 00:55:54.990 So logically, whether it's in Python or C or any language, 00:55:54.990 --> 00:56:00.900 what might be a better design for this problem of handling Y and Yes, 00:56:00.900 --> 00:56:03.090 but who cares about the capitalization? 
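A sketch of the list-membership version described above, before any capitalization handling is added. As before, the function wrapper and the None return value are assumptions for illustration.

```python
def agree(s: str):
    # in checks whether s equals any element of the list.
    if s in ["y", "yes"]:
        return "Agreed"
    elif s in ["n", "no"]:
        return "Not agreed"
    return None


print(agree("yes"))  # prints Agreed
print(agree("no"))   # prints Not agreed
print(agree("YES"))  # prints None, since only lowercase is listed
```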
00:56:03.090 --> 00:56:07.500 AUDIENCE: Don't use capitals or [INAUDIBLE] 00:56:07.500 --> 00:56:10.260 DAVID MALAN: So OK, so don't use capitals. 00:56:10.260 --> 00:56:12.280 You could only support lowercase. 00:56:12.280 --> 00:56:12.780 That's fine. 00:56:12.780 --> 00:56:14.072 That's kind of a copout, right? 00:56:14.072 --> 00:56:16.225 Because now the program's usability is worse. 00:56:16.225 --> 00:56:17.100 AUDIENCE: Convert it. 00:56:17.100 --> 00:56:19.590 DAVID MALAN: Oh, we could convert it to lowercase, yeah. 00:56:19.590 --> 00:56:22.440 Though I did hear you say we could just check the first letter, 00:56:22.440 --> 00:56:24.460 I bet that's going to get us into trouble. 00:56:24.460 --> 00:56:26.700 And we probably don't want to allow any word starting 00:56:26.700 --> 00:56:31.320 with Y, any word starting with N, just because it logically-- especially you 00:56:31.320 --> 00:56:33.150 want the lawyers happy, presumably. 00:56:33.150 --> 00:56:36.960 You should probably get an explicit semantically correct word like Y or N 00:56:36.960 --> 00:56:37.890 or yes or no. 00:56:37.890 --> 00:56:41.580 But, yeah, we can actually go about converting this to something 00:56:41.580 --> 00:56:42.295 maybe smaller. 00:56:42.295 --> 00:56:43.920 But how do we go about converting this? 00:56:43.920 --> 00:56:48.450 In C, that alone was going to be pretty darn annoying because we'd have to use 00:56:48.450 --> 00:56:53.770 the tolower() function on every character and compare it for equality. 00:56:53.770 --> 00:56:55.440 It just feels like that's a bit of work. 00:56:55.440 --> 00:56:58.840 But in Python, you're going to get more functionality for free. 00:56:58.840 --> 00:57:01.680 So there might very well be a function, like in C, 00:57:01.680 --> 00:57:03.600 called tolower() or toupper(). 
00:57:03.600 --> 00:57:06.300 But the weird thing about C, perhaps in retrospect, 00:57:06.300 --> 00:57:10.090 is that those functions just kind of worked on the honor system. 00:57:10.090 --> 00:57:15.010 tolower() and toupper() just trusted that you would pass them an input, 00:57:15.010 --> 00:57:18.400 an argument, that is, in fact, a char. 00:57:18.400 --> 00:57:22.300 In Python, and in a lot of other higher-level languages, 00:57:22.300 --> 00:57:26.120 they introduced this notion of Object-Oriented Programming, 00:57:26.120 --> 00:57:28.270 which is commonly described as OOP. 00:57:28.270 --> 00:57:31.120 And in the world of Object-Oriented Programming, 00:57:31.120 --> 00:57:34.510 your values can not only-- your variables, 00:57:34.510 --> 00:57:37.450 for instance, and your data types can not only have values. 00:57:37.450 --> 00:57:40.940 They can also have functionality built into them. 00:57:40.940 --> 00:57:43.270 So if you have a data type like a string, 00:57:43.270 --> 00:57:45.370 frankly, it just makes good sense that strings 00:57:45.370 --> 00:57:48.490 should be uppercaseable, lowercaseable, capitalizable, 00:57:48.490 --> 00:57:50.990 and any number of other operations on strings. 00:57:50.990 --> 00:57:54.040 So in the world of object-oriented programming, functions 00:57:54.040 --> 00:57:58.150 like toupper() and tolower() and isupper() and islower() are not just 00:57:58.150 --> 00:57:59.980 in some random library that you can use. 00:57:59.980 --> 00:58:02.560 They're built into the strings themselves. 00:58:02.560 --> 00:58:06.400 And what this means is that in the world of strings in Python, 00:58:06.400 --> 00:58:10.360 here, for instance, is the URL of the documentation for all of the functions, 00:58:10.360 --> 00:58:13.810 otherwise known as methods, that come with strings. 00:58:13.810 --> 00:58:16.150 So you don't go check for a ctype library 00:58:16.150 --> 00:58:18.790 like we did in C.
You check the actual data 00:58:18.790 --> 00:58:20.770 type, the documentation, therefore, and you 00:58:20.770 --> 00:58:24.580 will see in Python's own documentation what functions, a.k.a. 00:58:24.580 --> 00:58:26.680 methods, come with strings. 00:58:26.680 --> 00:58:28.420 So a method is just a function. 00:58:28.420 --> 00:58:32.540 But it's a function that comes with some data type, like a string. 00:58:32.540 --> 00:58:35.830 So let me propose that we do this. 00:58:35.830 --> 00:58:38.980 In the world of object-oriented programming, 00:58:38.980 --> 00:58:41.720 we can come back to agree.py. 00:58:41.720 --> 00:58:44.620 And we can actually improve the program by getting 00:58:44.620 --> 00:58:47.440 rid of this crazy long list, which I wasn't even done with, 00:58:47.440 --> 00:58:50.260 and just canonicalize everything as lowercase. 00:58:50.260 --> 00:58:53.470 So let's just check for lowercase y and lowercase yes, lowercase 00:58:53.470 --> 00:58:55.630 n, lowercase no, and that's it. 00:58:55.630 --> 00:58:58.000 But to your suggestion, let's force everything 00:58:58.000 --> 00:59:01.600 that the user types into lowercase, not because we want 00:59:01.600 --> 00:59:03.160 to permanently change their input-- 00:59:03.160 --> 00:59:05.980 we can throw the value away thereafter-- but 00:59:05.980 --> 00:59:09.520 because we want to more easily logically compare it 00:59:09.520 --> 00:59:12.350 for membership in this list of values. 00:59:12.350 --> 00:59:18.200 So one way to do this would be to literally do s = s.lower(). 00:59:18.200 --> 00:59:19.480 So here's the difference. 00:59:19.480 --> 00:59:22.360 In the world of C, we would have done this-- 00:59:22.360 --> 00:59:25.990 tolower and pass in the value s. 00:59:25.990 --> 00:59:30.010 But in the world of Python, and, in general, object-oriented programming-- 00:59:30.010 --> 00:59:32.530 Java is another language that does this-- 00:59:32.530 --> 00:59:36.460 if s is a string, a.k.a. 
str, therefore, s is actually 00:59:36.460 --> 00:59:38.290 what's known in Python as an object. 00:59:38.290 --> 00:59:41.560 An object can not only have values or attributes inside of them, 00:59:41.560 --> 00:59:43.300 but also functionality built in. 00:59:43.300 --> 00:59:47.260 And just like in C, with a struct, if you want to go inside of something, 00:59:47.260 --> 00:59:49.030 you use the dot operator. 00:59:49.030 --> 00:59:54.250 And inside of this string, I claim, is a function, a.k.a., method, 00:59:54.250 --> 00:59:55.330 called lower(). 00:59:55.330 --> 00:59:58.510 Long story short, the only takeaway, if this is a bit abstract, 00:59:58.510 --> 01:00:01.185 is that instead of doing lower and then, in parentheses, 01:00:01.185 --> 01:00:04.060 s, in the world of object-oriented programming, you kind of flip that 01:00:04.060 --> 01:00:08.950 and you do s dot name of the method, and then open paren and close paren if you 01:00:08.950 --> 01:00:10.940 don't need to pass in any arguments. 01:00:10.940 --> 01:00:12.650 So this actually achieves the same. 01:00:12.650 --> 01:00:17.680 So let me go ahead and rerun agree.py, and let me type in lowercase y. 01:00:17.680 --> 01:00:18.460 That works. 01:00:18.460 --> 01:00:20.680 Let me run it again, type in lowercase yes. 01:00:20.680 --> 01:00:24.820 That works. Let me run it again, type in capital Y. That works. 01:00:24.820 --> 01:00:27.070 Let me type in capital YES, all capital-- 01:00:27.070 --> 01:00:28.360 all uppercase YES. 01:00:28.360 --> 01:00:29.410 That too works. 01:00:29.410 --> 01:00:31.450 Let me try no. 01:00:31.450 --> 01:00:33.640 Let me try no in lowercase. 01:00:33.640 --> 01:00:37.180 And all of these permutations now actually work 01:00:37.180 --> 01:00:38.830 because I'm forcing it to lowercase.
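The lowercase-first approach can be sketched like this. The function wrapper and the None return value are assumptions for illustration; the lecture's agree.py reads s with input() and prints the result.

```python
def agree(s: str):
    # lower() is a method built into every str; it returns a new,
    # all-lowercase str and leaves the original value untouched.
    s = s.lower()
    if s in ["y", "yes"]:
        return "Agreed"
    elif s in ["n", "no"]:
        return "Not agreed"
    return None


print(agree("YES"))  # prints Agreed
print(agree("YeS"))  # prints Agreed
print(agree("No"))   # prints Not agreed
```

Because every input is canonicalized before the membership check, the lists only need the lowercase spellings.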
01:00:38.830 --> 01:00:42.460 But even more interestingly, in Python, if you're sort of becoming a languages 01:00:42.460 --> 01:00:48.850 person, if you have a variable s that is being set to the return value of the input() 01:00:48.850 --> 01:00:52.420 function, and then you're immediately going about changing it to lowercase, 01:00:52.420 --> 01:00:57.800 you can also chain method calls together in something like Python by doing this. 01:00:57.800 --> 01:01:02.720 We can get rid of this line altogether, and then I can just do this, .lower(). 01:01:02.720 --> 01:01:05.950 And so whatever the return value of input() is, it's going to be a str. 01:01:05.950 --> 01:01:08.350 Whatever the human types in, you can then immediately 01:01:08.350 --> 01:01:12.490 force it to lowercase and then assign the whole value to this variable 01:01:12.490 --> 01:01:13.120 called s. 01:01:13.120 --> 01:01:17.260 You don't actually have to wait around and do it on a separate line 01:01:17.260 --> 01:01:20.220 altogether. 01:01:20.220 --> 01:01:23.235 Questions, then, on any of this? 01:01:26.250 --> 01:01:26.750 No? 01:01:26.750 --> 01:01:27.250 All right. 01:01:27.250 --> 01:01:30.260 Let me do one other that's reminiscent of something we did in the past. 01:01:30.260 --> 01:01:32.900 Let me go into VS Code here, clear my terminal. 01:01:32.900 --> 01:01:35.960 Let's close both the C and the Python version of agree. 01:01:35.960 --> 01:01:39.500 And let's create a program called uppercase.py, whose purpose in life 01:01:39.500 --> 01:01:41.390 is to actually uppercase a whole string. 01:01:41.390 --> 01:01:45.440 In the world of C, we had to do this character by character by character. 01:01:45.440 --> 01:01:46.200 And that's fine. 01:01:46.200 --> 01:01:48.380 I'm going to go ahead and do it similarly here 01:01:48.380 --> 01:01:53.540 in Python, whereby I want to convert it character by character.
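The chaining of input() and lower() described a moment ago might look roughly like this. The function name ask and its read parameter are hypothetical, added only so the sketch can be exercised without a person typing; in the lecture's file the call is made inline.

```python
# Hypothetical wrapper: `read` defaults to the built-in input() but can be
# swapped for a stand-in when no one is at the keyboard.
def ask(prompt, read=input):
    # input() returns a str, so .lower() can be called on its result directly,
    # with no separate line needed for the conversion.
    return read(prompt).lower()


# Stand-in for a user who types "YES" at the prompt.
s = ask("Do you agree? ", read=lambda prompt: "YES")
print(s)
```

In an interactive program the same idea is simply s = input("Do you agree? ").lower().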
01:01:53.540 --> 01:01:56.510 But unfortunately, before I can do that, I actually 01:01:56.510 --> 01:02:00.320 need some way of looping in Python, which we actually haven't seen yet. 01:02:00.320 --> 01:02:02.420 So we need one more set of building blocks. 01:02:02.420 --> 01:02:05.120 And, in fact, if we were to consult the Python documentation, 01:02:05.120 --> 01:02:06.530 we'd see this and much more. 01:02:06.530 --> 01:02:10.240 So, in fact, here's a list of all of the functions that come with Python. 01:02:10.240 --> 01:02:11.990 And it's actually not that long of a list, 01:02:11.990 --> 01:02:15.680 because so much of the functionality of Python is built into data types, 01:02:15.680 --> 01:02:18.950 like strings and integers and floats and more. 01:02:18.950 --> 01:02:22.790 Here is the canonical source of truth for Python documentation. 01:02:22.790 --> 01:02:25.950 So as opposed to using the CS50 manual for C, 01:02:25.950 --> 01:02:29.700 which is meant to be a simplified version of publicly 01:02:29.700 --> 01:02:32.520 available documentation, we'll generally, for Python, 01:02:32.520 --> 01:02:33.870 point you to the official docs. 01:02:33.870 --> 01:02:39.000 I will disclaim they're not really written for introductory students. 01:02:39.000 --> 01:02:42.150 And they'll generally leave some detail off and use arcane language. 01:02:42.150 --> 01:02:43.947 But at this point in the term, even if it 01:02:43.947 --> 01:02:45.780 might be a little frustrating at first, it's 01:02:45.780 --> 01:02:47.873 good to see documentation in the real world 01:02:47.873 --> 01:02:50.290 because that's what you're going to have after the course. 01:02:50.290 --> 01:02:52.860 And so you'll get used to it through practice over time. 01:02:52.860 --> 01:02:55.408 But with loops, let's introduce one other feature 01:02:55.408 --> 01:02:56.700 that we can compare to Scratch. 
01:02:56.700 --> 01:03:00.270 Here, for instance, in Scratch, is how we might have repeated something three 01:03:00.270 --> 01:03:01.980 times, like meowing on the screen. 01:03:01.980 --> 01:03:04.330 In C, there were a bunch of ways to do this. 01:03:04.330 --> 01:03:07.080 And the clunkiest was maybe to do it with a while loop 01:03:07.080 --> 01:03:10.080 where we declare a variable called i, set it equal to 0, 01:03:10.080 --> 01:03:14.580 and then, iteratively, increment i again and again until it exceeds-- 01:03:14.580 --> 01:03:17.970 until it equals 3, each time printing out "meow." 01:03:17.970 --> 01:03:22.620 In Python, we can do this in a few different ways as well. 01:03:22.620 --> 01:03:26.850 The nearest translation of C into Python is perhaps this. 01:03:26.850 --> 01:03:29.850 It's almost the same, and logically, it really is the same, 01:03:29.850 --> 01:03:32.730 but you don't specify int, and you don't have a semicolon. 01:03:32.730 --> 01:03:34.270 You don't have curly braces. 01:03:34.270 --> 01:03:35.520 But you do have a colon. 01:03:35.520 --> 01:03:36.690 You don't use printf(). 01:03:36.690 --> 01:03:37.800 You use print(). 01:03:37.800 --> 01:03:42.240 And you can't use i++, but you still can use i += 1. 01:03:42.240 --> 01:03:45.360 So logically, exactly the same idea as in C-- 01:03:45.360 --> 01:03:46.660 It's just a little tighter. 01:03:46.660 --> 01:03:49.920 I mean, it's a little easier to read, even though it's very mechanical, 01:03:49.920 --> 01:03:50.490 if you will. 01:03:50.490 --> 01:03:51.900 You're defining all of these. 01:03:51.900 --> 01:03:54.600 You're defining this variable and changing it incrementally. 01:03:54.600 --> 01:03:58.317 Well, recall that in C, we could also use a for loop, which at first glance 01:03:58.317 --> 01:04:00.150 was probably more cryptic than a while loop. 
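The while-loop translation described above, sketched along the lines of what the slide presumably shows (the exact slide text is an assumption):

```python
# No type for i, no semicolons, no curly braces; a colon and indentation instead.
i = 0
while i < 3:
    print("meow")
    i += 1  # Python has no i++, but i += 1 does the same job
```

The loop stops once i reaches 3, so "meow" is printed three times.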
01:04:00.150 --> 01:04:02.670 But odds are by now, you're more comfortable or more 01:04:02.670 --> 01:04:04.950 in the habit of using loops-- same exact idea. 01:04:04.950 --> 01:04:08.160 In Python, though, we might do it like this. 01:04:08.160 --> 01:04:13.530 We've seen how, in square brackets, you can have lists of values, like y, yes, 01:04:13.530 --> 01:04:14.690 and so forth. 01:04:14.690 --> 01:04:16.690 Well, let's just do the same thing with numbers. 01:04:16.690 --> 01:04:19.380 So if you want Python to do something three times, give it 01:04:19.380 --> 01:04:24.390 a list of three values, like 0, 1, 2, and then print out "hello, world" 01:04:24.390 --> 01:04:26.100 that many times. 01:04:26.100 --> 01:04:29.550 Now, this is correct, but it's bad design. 01:04:29.550 --> 01:04:33.720 Even if you've never seen Python before, extrapolate mentally from this. 01:04:33.720 --> 01:04:38.204 Why is this probably not the right way or the best way to do this looping? 01:04:38.204 --> 01:04:40.329 AUDIENCE: Because if you wanted to do it more than, 01:04:40.329 --> 01:04:42.360 like, three times, you have to [INAUDIBLE].. 01:04:42.360 --> 01:04:43.110 DAVID MALAN: Yeah. 01:04:43.110 --> 01:04:46.830 If you want to do it four times, five times, 50 times, 100 times, 01:04:46.830 --> 01:04:50.250 I mean, surely, there's a better way than enumerating all of these values. 01:04:50.250 --> 01:04:51.060 And there is. 01:04:51.060 --> 01:04:55.620 In fact, in Python, there's a function called range() that actually returns 01:04:55.620 --> 01:04:58.170 to you very efficiently a range of values. 01:04:58.170 --> 01:05:02.040 And by default, it hands you the number 0 and then 1 and then 2. 01:05:02.040 --> 01:05:05.430 And if you want more than that, you just change the argument to range() to be 01:05:05.430 --> 01:05:07.060 how many values do you want. 
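A short sketch contrasting the two for loops just discussed, the hand-written list and range(); the printed word is an arbitrary choice here:

```python
# Correct, but the list has to be typed out by hand.
for i in [0, 1, 2]:
    print("meow")

# Same behavior; range(3) supplies 0, 1, 2 without enumerating them.
for i in range(3):
    print("meow")

# range(50) would supply 0 through 49, i.e. 50 values in total.
values = list(range(50))
print(values[0], values[-1], len(values))
```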
01:05:07.060 --> 01:05:10.560 So if you passed in range of 50, you would get back 0 01:05:10.560 --> 01:05:15.430 through 49, which effectively allows you to do something 50 times in total. 01:05:15.430 --> 01:05:18.430 So this is perhaps the most Pythonic way, so to speak. 01:05:18.430 --> 01:05:20.070 And this is actually a term of art. 01:05:20.070 --> 01:05:23.340 Pythonic isn't necessarily the only way to do something. 01:05:23.340 --> 01:05:28.230 But it's the way to do something based on consensus in the Python community. 01:05:28.230 --> 01:05:30.090 So it's pretty common to do this. 01:05:30.090 --> 01:05:32.100 But there's some curiosity here. 01:05:32.100 --> 01:05:36.720 Notice I'm declaring a variable i, but I'm never actually using it. 01:05:36.720 --> 01:05:39.137 In fact, I don't even increment it because that's 01:05:39.137 --> 01:05:40.470 sort of happening automatically. 01:05:40.470 --> 01:05:43.690 Well, what's really happening here is automatically in Python, 01:05:43.690 --> 01:05:50.000 on every iteration of this loop, Python is assigning i to the next value. 01:05:50.000 --> 01:05:51.393 So initially, i is 0. 01:05:51.393 --> 01:05:52.810 Then it goes through an iteration. 01:05:52.810 --> 01:05:53.920 Then i is 1. 01:05:53.920 --> 01:05:55.210 Then i is 2. 01:05:55.210 --> 01:05:58.210 And then that's it if you only asked for three values. 01:05:58.210 --> 01:06:00.940 But there's this other technique in Python, just so you know, 01:06:00.940 --> 01:06:03.850 whereby if you're the programmer, and you know you don't actually 01:06:03.850 --> 01:06:06.100 care about the name of this variable, you 01:06:06.100 --> 01:06:10.600 can actually change it to an underscore, which has no functional effect per se. 01:06:10.600 --> 01:06:14.020 It just signals to the reader, your colleague, your teaching fellow, 01:06:14.020 --> 01:06:17.738 that it's a variable, and you need it in order to achieve a for loop. 
01:06:17.738 --> 01:06:19.780 But you don't care about the name of the variable 01:06:19.780 --> 01:06:22.150 because you're not going to use it explicitly anywhere. 01:06:22.150 --> 01:06:25.310 So that might be an even more Pythonic way of doing things. 01:06:25.310 --> 01:06:27.580 But if you're more comfortable seeing the i 01:06:27.580 --> 01:06:30.010 and using the variable more explicitly, that's fine. 01:06:30.010 --> 01:06:32.480 Underscore does not mean anything special. 01:06:32.480 --> 01:06:35.720 It's just a valid character for a variable name. 01:06:35.720 --> 01:06:38.630 So this is convention, nothing more technical than that. 01:06:38.630 --> 01:06:42.430 What about a forever loop in Scratch, like literally meow forever. 01:06:42.430 --> 01:06:46.840 Well, over here, we can just use in C, while(true) printf() "meow," 01:06:46.840 --> 01:06:49.210 again and again and again. 01:06:49.210 --> 01:06:51.977 In Python, it's almost the same. 01:06:51.977 --> 01:06:53.560 You still get rid of the curly braces. 01:06:53.560 --> 01:06:54.430 You add the colon. 01:06:54.430 --> 01:06:55.638 You get rid of the semicolon. 01:06:55.638 --> 01:06:57.940 But there's a subtlety. 01:06:57.940 --> 01:07:00.040 What else is different here? 01:07:00.040 --> 01:07:01.090 Yeah? 01:07:01.090 --> 01:07:02.620 So True is uppercase. 01:07:02.620 --> 01:07:03.310 Why? 01:07:03.310 --> 01:07:04.210 Who knows? 01:07:04.210 --> 01:07:07.210 The world decided that in Python, True is capitalized 01:07:07.210 --> 01:07:08.320 and False is capitalized. 01:07:08.320 --> 01:07:11.030 In many other languages, daresay most, they are not. 01:07:11.030 --> 01:07:15.430 It's just a difference that you have to keep in mind or remember. 01:07:15.430 --> 01:07:15.940 All right. 01:07:15.940 --> 01:07:20.480 So now that we have looping constructs, let me go back to my code here. 
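The underscore convention and the forever loop could be sketched as follows. The counter and the break statement are additions made here only so the example terminates; the lecture's forever loop has neither.

```python
# The loop variable is never used, so by convention it is named _.
for _ in range(3):
    print("meow")

# A forever loop: note the capital T in True.
count = 0  # assumption: a counter added so this sketch can stop
while True:
    print("meow")
    count += 1
    if count == 3:
        break  # not part of the lecture's version, which runs forever
```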
01:07:20.480 --> 01:07:23.680 And recall that I proposed that we re-implement a program like uppercase, 01:07:23.680 --> 01:07:25.420 force an entire string to uppercase. 01:07:25.420 --> 01:07:29.470 And in C, we would have done this with a for loop, iterating from left to right. 01:07:29.470 --> 01:07:32.410 But what's nice in Python, frankly, is that it's a lot easier 01:07:32.410 --> 01:07:37.510 to loop in Python than it is in C because you can loop over 01:07:37.510 --> 01:07:39.460 anything that is iterable. 01:07:39.460 --> 01:07:43.210 A string is iterable in the sense that you can iterate over it 01:07:43.210 --> 01:07:44.630 from left to right. 01:07:44.630 --> 01:07:45.740 So what do I mean by this? 01:07:45.740 --> 01:07:48.010 Well, let me go ahead and, in uppercase.py, 01:07:48.010 --> 01:07:51.790 let's first prompt the user for a variable called before and set that 01:07:51.790 --> 01:07:56.650 equal to the return value of input(), giving them a prompt of "Before," 01:07:56.650 --> 01:07:57.820 colon. 01:07:57.820 --> 01:08:01.720 Then let's go ahead, as we did weeks ago, and print out just the word 01:08:01.720 --> 01:08:08.110 "After," just to make clear to the user what is actually going to be printed. 01:08:08.110 --> 01:08:12.700 Then let me go ahead and specify the following loop-- 01:08:12.700 --> 01:08:16.005 for-- and previously you saw me use i, but because I'm dealing with 01:08:16.005 --> 01:08:18.130 characters, I'm actually going to do this instead-- 01:08:18.130 --> 01:08:24.910 for c in before, colon, print out c.upper(). 01:08:24.910 --> 01:08:26.290 And that's it. 01:08:26.290 --> 01:08:28.359 Now, this is a little flawed, I will concede. 01:08:28.359 --> 01:08:31.540 But let me run this-- python of uppercase.py. 01:08:31.540 --> 01:08:35.229 Let's type in something like cat, C-A-T in all lowercase. 01:08:35.229 --> 01:08:35.990 Enter. 01:08:35.990 --> 01:08:36.490 All right.
01:08:36.490 --> 01:08:39.040 Well, you see "After," and I did get it right in the sense 01:08:39.040 --> 01:08:43.359 that it is capital C, capital A, capital T, but it looks a little stupid. 01:08:43.359 --> 01:08:45.189 And in order to fix this, we actually need 01:08:45.189 --> 01:08:49.479 to introduce something that's called named parameters. 01:08:49.479 --> 01:08:55.510 So let me actually go ahead and propose that we can fix this problem 01:08:55.510 --> 01:08:59.140 by actually passing in another argument to the print() function. 01:08:59.140 --> 01:09:01.540 And this is a little different syntactically from C. 01:09:01.540 --> 01:09:04.479 But if I go back to VS Code here, it turns out 01:09:04.479 --> 01:09:06.319 that there's two aesthetic problems here. 01:09:06.319 --> 01:09:10.130 One, I did not want the new line automatically inserted after "After." 01:09:10.130 --> 01:09:10.630 Why? 01:09:10.630 --> 01:09:13.569 Because, just like in week 1, I want them to line up nicely-- 01:09:13.569 --> 01:09:15.310 or in week 2. 01:09:15.310 --> 01:09:18.367 And I don't want a new line after C-A-T. So even 01:09:18.367 --> 01:09:20.200 though at first glance a moment-- a bit ago, 01:09:20.200 --> 01:09:23.620 it might have seemed nice that Python just does the backslash n for you, 01:09:23.620 --> 01:09:27.649 it can backfire if you don't actually want a new line every time. 01:09:27.649 --> 01:09:29.660 So the syntax is going to look a little weird. 01:09:29.660 --> 01:09:32.529 But in Python, with the print() function, 01:09:32.529 --> 01:09:36.819 if you want to change the character that's automatically used at the end 01:09:36.819 --> 01:09:43.130 of every line, you can literally pass in a second argument called end and set it 01:09:43.130 --> 01:09:45.660 equal to something else. 01:09:45.660 --> 01:09:48.350 So if you want to set it equal to something else, 01:09:48.350 --> 01:09:52.620 and that something else is nothing, "", then that's fine. 
01:09:52.620 --> 01:09:57.050 You can actually specify end="". 01:09:57.050 --> 01:10:00.980 Down here, too, if you want to specify that at the end of every one of these 01:10:00.980 --> 01:10:05.330 characters should be nothing, I can specify end="". 01:10:05.330 --> 01:10:08.390 What this implies is that by default in Python, 01:10:08.390 --> 01:10:13.280 the default value of this end parameter is actually always backslash n. 01:10:13.280 --> 01:10:15.800 So if you want to override it and take that away, 01:10:15.800 --> 01:10:19.380 you just literally change it to "" instead. 01:10:19.380 --> 01:10:23.960 And now if I clear my-- if I rerun this program, uppercase.py, 01:10:23.960 --> 01:10:27.820 type in cat in all lowercase, now you'll see-- 01:10:27.820 --> 01:10:29.307 oh, two minor bugs here. 01:10:29.307 --> 01:10:30.140 One was just stupid. 01:10:30.140 --> 01:10:31.940 I had one too many spaces here. 01:10:31.940 --> 01:10:35.150 But you'll notice that I didn't move the cursor to the next line 01:10:35.150 --> 01:10:38.090 after CAT was printed in all uppercase. 01:10:38.090 --> 01:10:40.070 And that we can fix by just printing nothing. 01:10:40.070 --> 01:10:43.140 It turns out when you don't pass print() an argument at all, 01:10:43.140 --> 01:10:46.690 it automatically gives you just the line ending, nothing else. 01:10:46.690 --> 01:10:49.210 So I think this will move the cursor as expected. 01:10:49.210 --> 01:10:52.200 So let me clear it now, run python of uppercase.py 01:10:52.200 --> 01:10:55.290 and hit Enter, type in cat in all lowercase, cross my fingers this time, 01:10:55.290 --> 01:10:59.910 and now I have indeed capitalized this, character by character 01:10:59.910 --> 01:11:03.360 by character, just like we did in C. 
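Putting the pieces above together, the character-by-character version of uppercase.py might look like this sketch. The hard-coded "cat" stands in for the input("Before: ") call so the example runs without a prompt, and the exact spacing after "After:" is an assumption.

```python
before = "cat"  # stand-in for: before = input("Before: ")

# end="" overrides the default end="\n", so nothing is printed after "After:  ".
print("After:  ", end="")

# A str is iterable, so the loop visits one character at a time.
for c in before:
    print(c.upper(), end="")

# print() with no arguments prints only the line ending, moving the cursor down.
print()
```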
01:11:03.360 --> 01:11:06.060 But honestly, this, too, not really necessary-- 01:11:06.060 --> 01:11:08.610 it turns out I don't need to loop over a whole string, 01:11:08.610 --> 01:11:10.510 because strings themselves come with methods. 01:11:10.510 --> 01:11:12.930 And if you were to visit the documentation for strings, 01:11:12.930 --> 01:11:17.370 you would see that indeed, upper() is a method that comes with every string, 01:11:17.370 --> 01:11:20.880 and you don't need to call it on every character individually. 01:11:20.880 --> 01:11:25.650 I could instead get rid of all of this and just print out-- 01:11:25.650 --> 01:11:31.860 for instance, I can just print out before.upper(). 01:11:31.860 --> 01:11:35.400 And the upper() function that comes with strings will automatically be applied 01:11:35.400 --> 01:11:39.370 to every character therein and, I think, achieve the same result. 01:11:39.370 --> 01:11:42.990 So let me go ahead and try this again-- python of uppercase.py, type in cat, 01:11:42.990 --> 01:11:46.330 enter, and indeed, it works exactly the same way. 01:11:46.330 --> 01:11:48.090 Let me take this one step further. 01:11:48.090 --> 01:11:51.510 Let me go ahead and combine a couple of ideas now here. 01:11:51.510 --> 01:11:56.220 Let me go ahead and, for instance, let me get rid of this last print() line. 01:11:56.220 --> 01:12:00.090 Let me change my logic to be after equals the return value of this. 01:12:00.090 --> 01:12:04.770 And now I can use one of those f strings and plug this in maybe here, After. 01:12:04.770 --> 01:12:06.750 And I can get rid of the new line ending. 01:12:06.750 --> 01:12:08.385 I can specify this is an f string. 01:12:08.385 --> 01:12:10.260 So I'm just changing this around a little bit 01:12:10.260 --> 01:12:13.680 logically so that now I have a variable called after that 01:12:13.680 --> 01:12:15.940 is the uppercase version of before.
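A sketch of the shorter version just described, where upper() is called once on the whole string and the result is stored in a variable called after. As before, "cat" stands in for what the user would type.

```python
before = "cat"  # stand-in for: before = input("Before: ")

# upper() works on the entire string at once; no loop is needed.
after = before.upper()

# The f prefix makes this a format string, so {after} is replaced by its value.
print(f"After:  {after}")
```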
01:12:15.940 --> 01:12:21.910 And now, if I do python of uppercase.py, type in cat, that too now works. 01:12:21.910 --> 01:12:23.980 And if I-- actually let me add a space there, 01:12:23.980 --> 01:12:28.350 if I run python of uppercase.py, type in cat, that too now works. 01:12:28.350 --> 01:12:31.440 And lastly here, if you don't want to bother 01:12:31.440 --> 01:12:33.960 creating another variable like this, you can even 01:12:33.960 --> 01:12:37.830 put short bits of code inside of these format strings. 01:12:37.830 --> 01:12:40.800 So I, for instance, could go in here into these curly braces 01:12:40.800 --> 01:12:42.510 and not just put a variable name. 01:12:42.510 --> 01:12:48.120 I can actually put Python code inside of the curly braces, inside of my string. 01:12:48.120 --> 01:12:51.600 And so now if I run Python of uppercase.py, type in cat, 01:12:51.600 --> 01:12:54.360 even that too now works. 01:12:54.360 --> 01:12:55.890 Now, which one is the best? 01:12:55.890 --> 01:12:59.590 This is kind of reasonable to put the bit of code inside of the string. 01:12:59.590 --> 01:13:02.793 I would not start writing long lines of code inside of curly braces 01:13:02.793 --> 01:13:04.710 that start to wrap, no less, because then it's 01:13:04.710 --> 01:13:06.420 just going to be a matter of bad style. 01:13:06.420 --> 01:13:09.780 But this, again, is to say that there's a bunch of different ways 01:13:09.780 --> 01:13:11.590 to solve each of these problems. 01:13:11.590 --> 01:13:15.240 And so up until now, we've generally seen not named parameters. 01:13:15.240 --> 01:13:20.100 end is the first parameter we've ever seen that has a name, literally, end. 01:13:20.100 --> 01:13:23.790 Up until now in C and up until a moment ago in Python, 01:13:23.790 --> 01:13:27.840 we've always been assuming that our parameters are positional. 
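Two of the points above in one small sketch: an expression placed directly inside an f-string's curly braces, and the difference between positional arguments and the named parameter end. The strings passed to the second print() are arbitrary examples.

```python
before = "cat"  # stand-in for: before = input("Before: ")

# A short expression, not just a variable name, can go inside the braces.
line = f"After:  {before.upper()}"
print(line)

# "hello," and "world" are positional: their order determines what is printed.
# end is named: it is identified by its name, not by where it appears.
print("hello,", "world", end="")
print()
```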
01:13:27.840 --> 01:13:33.600 What matters is the order in which you specify them, not necessarily something 01:13:33.600 --> 01:13:35.280 else. 01:13:35.280 --> 01:13:35.880 Whew. 01:13:35.880 --> 01:13:37.600 OK, that was a lot. 01:13:37.600 --> 01:13:42.660 Any questions about any of this here? 01:13:42.660 --> 01:13:43.230 No? 01:13:43.230 --> 01:13:43.620 All right. 01:13:43.620 --> 01:13:44.290 It feels like a lot. 01:13:44.290 --> 01:13:45.450 Let's take our 10-minute break here. 01:13:45.450 --> 01:13:46.700 Fruit roll-ups are now served. 01:13:46.700 --> 01:13:50.310 We'll be back in 10. 01:13:50.310 --> 01:13:51.690 All right. 01:13:51.690 --> 01:13:52.830 We are back. 01:13:52.830 --> 01:13:58.440 And recall that as we left off, we had just introduced loops. 01:13:58.440 --> 01:14:01.620 And we'd seen a bunch of different ways by which 01:14:01.620 --> 01:14:03.140 we could get, say, a cat to meow. 01:14:03.140 --> 01:14:04.890 Let's actually translate that to some code 01:14:04.890 --> 01:14:08.490 and start to make sense of some of the programs with which we began, 01:14:08.490 --> 01:14:11.452 like creating our own functions, as we did for the speller example 01:14:11.452 --> 01:14:14.410 at the very beginning, and actually do this a little more methodically. 01:14:14.410 --> 01:14:16.270 So let me go over to VS Code here. 01:14:16.270 --> 01:14:20.790 Let me go ahead and create a program called meow.py, instead of meow.c 01:14:20.790 --> 01:14:22.140 as in the past. 01:14:22.140 --> 01:14:25.890 And suffice it to say if you want to implement the idea of a cat, 01:14:25.890 --> 01:14:30.270 we can do better than just saying print("meow"), print("meow"), 01:14:30.270 --> 01:14:31.260 print("meow"). 01:14:31.260 --> 01:14:32.457 This, of course, would work. 01:14:32.457 --> 01:14:35.290 This is correct if the goal is to get the thing to meow three times. 
01:14:35.290 --> 01:14:40.710 But when I run python of meow.py, it's going to work as expected, 01:14:40.710 --> 01:14:42.650 but this is just not good design, right? 01:14:42.650 --> 01:14:44.310 We should minimally be using a loop. 01:14:44.310 --> 01:14:47.870 So let me propose that we improve this per the building blocks we've seen. 01:14:47.870 --> 01:14:51.470 And I could say something like, for i in range(3), 01:14:51.470 --> 01:14:54.170 go ahead and print out now, quote unquote, "meow." 01:14:54.170 --> 01:14:58.340 So this is better in the sense that it still prints meow, meow, meow. 01:14:58.340 --> 01:15:01.490 But if I want to change this to a dog and change the meow to a woof 01:15:01.490 --> 01:15:04.370 or something like that, I can change it in one place and not three 01:15:04.370 --> 01:15:07.290 different places-- so just, in general, better design. 01:15:07.290 --> 01:15:10.460 But what if now, much like in Scratch and in C, 01:15:10.460 --> 01:15:14.270 I wanted to create my own meow() function which did not come with either 01:15:14.270 --> 01:15:15.770 of those languages as well. 01:15:15.770 --> 01:15:18.170 Well, as a teaser at the start of class, we 01:15:18.170 --> 01:15:20.600 saw that you can define your own functions 01:15:20.600 --> 01:15:24.410 with this keyword def, which is a little bit different from how C does it. 01:15:24.410 --> 01:15:29.060 But let me go ahead and do this indeed in Python and define my own function 01:15:29.060 --> 01:15:29.690 meow(). 01:15:29.690 --> 01:15:36.950 So let me go ahead and do def meow(), and then, inside of that function, 01:15:36.950 --> 01:15:41.370 I'm just going to literally do for now, quote unquote, "meow" with print(). 01:15:41.370 --> 01:15:46.910 And now down here, notice, I can actually go ahead and just call meow(). 01:15:46.910 --> 01:15:49.880 And I can go ahead and call meow(), and I can call meow(). 01:15:49.880 --> 01:15:52.370 And this is not the best design at the moment. 
01:15:52.370 --> 01:15:56.900 But Python does not constrain me to have to implement a main() function, 01:15:56.900 --> 01:15:58.200 as we've seen thus far. 01:15:58.200 --> 01:16:01.850 But I can define my own helper functions, if you will, 01:16:01.850 --> 01:16:03.590 like a helper function called meow(). 01:16:03.590 --> 01:16:06.350 So let me go ahead and just run this for demonstration's sake 01:16:06.350 --> 01:16:08.120 and run python of meow.py. 01:16:08.120 --> 01:16:09.380 That does seem to work. 01:16:09.380 --> 01:16:10.610 But this is not good design. 01:16:10.610 --> 01:16:15.800 And let me go ahead and actually do this-- for i in range(3), 01:16:15.800 --> 01:16:18.140 now let me call the meow() function. 01:16:18.140 --> 01:16:19.430 And this, too, should work. 01:16:19.430 --> 01:16:23.480 If I do python of meow.py, there we have meow, meow, meow. 01:16:23.480 --> 01:16:26.840 But I very deliberately did something clever here. 01:16:26.840 --> 01:16:29.060 I defined meow at the top of my file. 01:16:29.060 --> 01:16:31.600 But that's not the best practice because as in C, 01:16:31.600 --> 01:16:34.100 when someone opens the file for the first time, whether you, 01:16:34.100 --> 01:16:38.510 a TF, a TA, a colleague, you'd like to see the main part of the program 01:16:38.510 --> 01:16:42.050 at the top of the file, just because it's easier mentally to dive right in 01:16:42.050 --> 01:16:43.610 and know what this file is doing. 01:16:43.610 --> 01:16:47.420 So let me go ahead and practice what I'm preaching and put the main part 01:16:47.420 --> 01:16:49.670 of my code, even if there's no main() function per se, 01:16:49.670 --> 01:16:51.480 at the top of this file. 01:16:51.480 --> 01:16:53.600 So now I have the loop at the top. 01:16:53.600 --> 01:16:57.710 I'm calling meow() on line 2, and I'm defining meow() on lines 5 and 6. 01:16:57.710 --> 01:17:00.420 Well, instinctively, you can perhaps see where this is going. 
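For reference, the version that worked before the reordering, with meow() defined first and the loop below it, would be roughly:

```python
# A helper function defined with def; it takes no arguments.
def meow():
    print("meow")


# By the time this loop runs, meow already exists, so the calls succeed.
for i in range(3):
    meow()
```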
01:17:00.420 --> 01:17:02.750 If I run python of meow.py and hit Enter, 01:17:02.750 --> 01:17:06.570 there's one of those tracebacks that's tracing my error. 01:17:06.570 --> 01:17:11.060 And here, my error is apparently on line 2 in meow.py. 01:17:11.060 --> 01:17:15.120 And you'll notice that, huh, the name 'meow' is not defined. 01:17:15.120 --> 01:17:18.440 And so previously, we saw a different type of error, a ValueError. 01:17:18.440 --> 01:17:20.870 Here we're seeing a NameError in the sense 01:17:20.870 --> 01:17:23.690 that Python does not recognize the name of this function. 01:17:23.690 --> 01:17:27.630 And intuitively, why might that be, even if the error is a little cryptic? 01:17:27.630 --> 01:17:28.130 Yeah? 01:17:28.130 --> 01:17:29.660 AUDIENCE: [INAUDIBLE] top to bottom. 01:17:29.660 --> 01:17:33.680 DAVID MALAN: Yeah, Python, too-- as much fancier as it seems to be than C, 01:17:33.680 --> 01:17:36.810 it still takes things pretty literally, top to bottom, left to right. 01:17:36.810 --> 01:17:40.820 So if you define meow() on line 5, you can't use it on line 2. 01:17:40.820 --> 01:17:43.352 OK, so I could undo this, and I could flip the order. 01:17:43.352 --> 01:17:46.310 But let me just stipulate that as soon as we have a bunch of functions, 01:17:46.310 --> 01:17:49.880 it's probably naive to assume I can just keep putting my functions above, above, 01:17:49.880 --> 01:17:50.510 above, above. 01:17:50.510 --> 01:17:53.810 And honestly, that's going to move all of my main code, so to speak, 01:17:53.810 --> 01:17:57.360 to the bottom of the file, which is sort of counterproductive or less obvious. 01:17:57.360 --> 01:18:01.400 So it turns out in Python, even though you don't need a main() function, 01:18:01.400 --> 01:18:05.160 it's actually quite common to define one nonetheless. 01:18:05.160 --> 01:18:08.850 So what I could do to solve this problem is this.
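The top-to-bottom problem described above can be reproduced in a small sketch. The try/except wrapper and the variable caught are additions made here so the example keeps running instead of stopping with a traceback, as the lecture's file does.

```python
# The loop runs before Python has reached the def below, so the name meow
# does not exist yet when the first call is attempted.
try:
    for i in range(3):
        meow()
except NameError as e:
    caught = str(e)  # assumption: stored only so the message can be inspected
    print(caught)


# Defined too late for the loop above.
def meow():
    print("meow")
```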
01:18:08.850 --> 01:18:12.980 Let me go ahead and define a function called main() that takes no arguments, 01:18:12.980 --> 01:18:14.030 in this case. 01:18:14.030 --> 01:18:17.310 Let me indent that same code beneath it. 01:18:17.310 --> 01:18:20.550 And now let me keep meow() defined at the bottom of my file. 01:18:20.550 --> 01:18:24.170 So if we read this literally, on line 1, I'm defining a function called main(). 01:18:24.170 --> 01:18:27.110 And it will do what is prescribed on lines 2 and 3. 01:18:27.110 --> 01:18:30.050 On line 6, I'm defining a function called meow(), 01:18:30.050 --> 01:18:33.690 and it will do what's prescribed on line 7-- so fairly straightforward, 01:18:33.690 --> 01:18:36.260 even though the keyword def is, of course, new today. 01:18:36.260 --> 01:18:38.870 If I run, though, python of meow.py, you'd 01:18:38.870 --> 01:18:40.370 like to think I'll see three meows. 01:18:40.370 --> 01:18:43.330 But I see nothing. 01:18:43.330 --> 01:18:45.140 I don't see an error, but I see nothing. 01:18:45.140 --> 01:18:45.640 Why? 01:18:45.640 --> 01:18:50.240 Intuitively, what explains the lack of behavior? 01:18:50.240 --> 01:18:51.310 I didn't call main(). 01:18:51.310 --> 01:18:55.300 So this is the thing even though it's not required in Python to have a main() 01:18:55.300 --> 01:18:59.840 function, but it is conventional in Python to have a main() function, 01:18:59.840 --> 01:19:02.320 you have to call the function yourself. 01:19:02.320 --> 01:19:04.840 It doesn't get magically called as it does in C. 01:19:04.840 --> 01:19:06.730 So this might seem a little stupid-- 01:19:06.730 --> 01:19:09.970 and that's fine-- but it is the convention in Python. 
01:19:09.970 --> 01:19:14.050 Generally, the very last line of your file might just be to literally do this, 01:19:14.050 --> 01:19:18.310 call main(), because this satisfies the constraint that main() is defined 01:19:18.310 --> 01:19:24.020 on line 1, meow() is defined on line 6, but we don't call anything until line 01:19:24.020 --> 01:19:24.520 10. 01:19:24.520 --> 01:19:26.590 So line 10 says call main(). 01:19:26.590 --> 01:19:28.420 So that means execute this code. 01:19:28.420 --> 01:19:32.060 Line 3 says call meow(), which means execute this code. 01:19:32.060 --> 01:19:36.640 So now it all works because the last thing I'm doing is calling main(). 01:19:36.640 --> 01:19:38.920 You can think of C as just kind of secretly having 01:19:38.920 --> 01:19:41.380 this line there for you the whole time. 01:19:41.380 --> 01:19:45.210 But now that we have our own functions, notice that we can enhance this 01:19:45.210 --> 01:19:48.900 implementation of meow() to maybe be parameterized and take actually 01:19:48.900 --> 01:19:50.080 an argument itself. 01:19:50.080 --> 01:19:51.510 So let me make a tweak here. 01:19:51.510 --> 01:19:54.270 Just like in C, and just like in Scratch, 01:19:54.270 --> 01:19:58.170 I can actually let meow() meow a specific number of times. 01:19:58.170 --> 01:19:58.980 So let me do this. 01:19:58.980 --> 01:20:01.950 Wouldn't it be nice, instead of having my loop in main(), 01:20:01.950 --> 01:20:05.790 to instead just distill main() into a single line of code and just pass 01:20:05.790 --> 01:20:08.250 in the number of times you want the thing to meow? 01:20:08.250 --> 01:20:11.640 What I could do in meow() here is I have to give it a parameter. 01:20:11.640 --> 01:20:13.140 And I could call it anything I want. 01:20:13.140 --> 01:20:16.000 I'm going to call it n for number, which seems fine.
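The main() convention walked through above might be laid out like this. The blank lines are chosen so that main() starts on line 1, meow() on line 6, and the call to main() lands on line 10, matching the line numbers mentioned; the lecture's exact spacing is an assumption.

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


main()
```

By the time the final line runs, both def statements have already executed, so main() can call meow() even though meow() appears lower in the file.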
01:20:16.000 --> 01:20:18.270 And then, in the meow() function, I could do this-- 01:20:18.270 --> 01:20:25.290 for i in range of, not 3, but n now, I can tell range() to give me a range 01:20:25.290 --> 01:20:27.930 that is of variable length based on what n is. 01:20:27.930 --> 01:20:31.380 And then I indent the print() below the loop now. 01:20:31.380 --> 01:20:33.960 And this should now do what I expect, too. 01:20:33.960 --> 01:20:36.750 Let me run python of meow.py. 01:20:36.750 --> 01:20:37.530 Enter. 01:20:37.530 --> 01:20:38.730 And there's 3. 01:20:38.730 --> 01:20:43.000 But if I change the 3 to a 5 and rerun this, python of meow.py, 01:20:43.000 --> 01:20:44.560 now I'm getting five meows. 01:20:44.560 --> 01:20:48.040 So we've just seen a third way how, in Python, now we 01:20:48.040 --> 01:20:52.780 can implement the idea of meowing as its own abstracted function. 01:20:52.780 --> 01:20:54.730 And I can assume now that meow() exists. 01:20:54.730 --> 01:20:57.767 I can now treat it as out of sight, out of mind. 01:20:57.767 --> 01:20:58.600 It's an abstraction. 01:20:58.600 --> 01:21:02.530 And frankly, I could even put it into a library, import it from a file, 01:21:02.530 --> 01:21:07.880 like we've done with CS50, and make it usable by other people as well. 01:21:07.880 --> 01:21:10.720 So the takeaway here, really, though, is that in Python, you 01:21:10.720 --> 01:21:13.750 can, similarly to C, define your own functions. 01:21:13.750 --> 01:21:15.790 But you should understand the slight differences 01:21:15.790 --> 01:21:19.150 as to what gets called automatically for you. 01:21:19.150 --> 01:21:19.900 All right. 01:21:19.900 --> 01:21:22.360 Other differences or similarities with C? 01:21:22.360 --> 01:21:25.930 Well, recall that in C, truncation was an issue. 
01:21:25.930 --> 01:21:30.910 Truncation is whereby if you, for instance, divide an int by an int, 01:21:30.910 --> 01:21:34.510 and it's a fractional answer, everything after the decimal point 01:21:34.510 --> 01:21:38.440 gets truncated by default because an int divided by an int in C 01:21:38.440 --> 01:21:39.560 gives you an int. 01:21:39.560 --> 01:21:43.780 And if you can't fit the remainder in that integer, everything at the decimal 01:21:43.780 --> 01:21:44.810 gets cut off. 01:21:44.810 --> 01:21:45.800 So what does this mean? 01:21:45.800 --> 01:21:48.170 Well, let me actually go back to VS Code here. 01:21:48.170 --> 01:21:52.540 Let me go ahead and open, say, calculator.py again, 01:21:52.540 --> 01:21:54.760 and let's change up what the calculator now does. 01:21:54.760 --> 01:21:55.610 Let me do this. 01:21:55.610 --> 01:22:00.010 Let me define a variable called x, set it equal to the input() function, 01:22:00.010 --> 01:22:01.510 prompting the user for x. 01:22:01.510 --> 01:22:05.710 Let me ask the user for y, let me not repeat past mistakes, 01:22:05.710 --> 01:22:09.160 and let me proactively convert both of these to ints. 01:22:09.160 --> 01:22:13.720 And I'll do it in one pretty one-liner here so that I definitely get x and y. 01:22:13.720 --> 01:22:15.850 And on the honor system, I just won't type cat. 01:22:15.850 --> 01:22:17.980 I won't type dog, even though this program is not 01:22:17.980 --> 01:22:19.917 really complete without error checking. 01:22:19.917 --> 01:22:22.000 Now, let me go ahead and declare a third variable, 01:22:22.000 --> 01:22:26.230 z = x / y, and now let's just go ahead and print out z. 01:22:26.230 --> 01:22:27.700 I don't need a format code. 01:22:27.700 --> 01:22:28.810 I don't need an f string. 01:22:28.810 --> 01:22:32.420 If all you want to do is print a variable, print() is very flexible. 01:22:32.420 --> 01:22:35.200 You can just say print(z), in parentheses. 
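The calculator just described can be sketched as follows. One assumption for the sake of a sketch that runs unattended: x and y are hard-coded here, whereas the narrated version reads them from the user and converts them with int().

```python
# Hard-coded so the sketch runs without waiting for input; the narrated
# version uses something like x = int(input("x: ")) and y = int(input("y: "))
x = 1
y = 3

# Divide an int by an int and store the result
z = x / y

# print() accepts the variable directly; no format code or f-string needed
print(z)
```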
01:22:35.200 --> 01:22:38.260 Let me run python of calculator.py, hit Enter. 01:22:38.260 --> 01:22:42.040 Let's type in 1 for x, 3 for y. 01:22:42.040 --> 01:22:43.570 I left out a space there. 01:22:43.570 --> 01:22:46.240 And oh, interesting. 01:22:46.240 --> 01:22:48.170 What seems to have happened here? 01:22:48.170 --> 01:22:52.900 Let me fix my spacing and rerun this again-- python of calculator.py-- so 1, 01:22:52.900 --> 01:22:53.740 3. 01:22:53.740 --> 01:22:55.510 What did not happen? 01:22:55.510 --> 01:22:56.830 AUDIENCE: It doesn't truncate. 01:22:56.830 --> 01:22:57.370 DAVID MALAN: Yeah. 01:22:57.370 --> 01:22:58.400 So it didn't truncate. 01:22:58.400 --> 01:23:00.430 So Python is a little smarter when it comes 01:23:00.430 --> 01:23:02.660 to converting one value to another. 01:23:02.660 --> 01:23:05.050 So an integer divided by an integer, if it ends up 01:23:05.050 --> 01:23:07.780 giving you this fractional component, not to worry now, 01:23:07.780 --> 01:23:11.860 you'll get back what is effectively a float in Python here. 01:23:11.860 --> 01:23:17.050 Well, what else do we want to be mindful of in, say, Python? 01:23:17.050 --> 01:23:20.920 Well, recall that in C, we had this issue of floating point and precision 01:23:20.920 --> 01:23:24.760 whereby if you want to represent a number, like 1/3, and on a piece 01:23:24.760 --> 01:23:27.640 of paper, it's, like, 0.3 with a line over it 01:23:27.640 --> 01:23:29.860 because the 3 infinitely repeats-- 01:23:29.860 --> 01:23:33.040 but we saw a problem in C last time when we actually 01:23:33.040 --> 01:23:34.550 played around with some value. 01:23:34.550 --> 01:23:37.000 So, for instance, let me go back to VS Code here. 01:23:37.000 --> 01:23:40.300 And this is going to be the ugliest syntax I do think we see today. 
01:23:40.300 --> 01:23:45.700 But there was a way in C, using %f, to show more than the default number 01:23:45.700 --> 01:23:49.030 of digits after the decimal point, to see more significant digits. 01:23:49.030 --> 01:23:50.830 In Python, there's something similar. 01:23:50.830 --> 01:23:51.970 It just looks very weird. 01:23:51.970 --> 01:23:53.860 And the way you do it in Python is this. 01:23:53.860 --> 01:23:56.950 You specify that you want an f string, a format string. 01:23:56.950 --> 01:23:59.440 And I'm just going to start and finish my thought first-- 01:23:59.440 --> 01:24:01.270 f before "". 01:24:01.270 --> 01:24:04.910 If you want to print out z, you could literally just do this. 01:24:04.910 --> 01:24:08.620 And so this is just an f string, but you're interpolating z. 01:24:08.620 --> 01:24:12.040 So it doesn't do anything more than it did a moment ago when I literally just 01:24:12.040 --> 01:24:13.090 passed in z. 01:24:13.090 --> 01:24:15.880 But as soon as you have an f string, you can 01:24:15.880 --> 01:24:19.700 configure the variable to print out to a specific number of digits. 01:24:19.700 --> 01:24:24.910 So if you actually want to print out z to, say, 50 decimal points, 01:24:24.910 --> 01:24:28.210 just to see a lot, you can use crazy syntax like this. 01:24:28.210 --> 01:24:31.270 So it's just using the curly braces, as I introduced before. 01:24:31.270 --> 01:24:34.000 But you then use a dot after a colon, and then 01:24:34.000 --> 01:24:37.270 you specify the number of digits that you want and then an f to make clear 01:24:37.270 --> 01:24:37.960 it's a float. 01:24:37.960 --> 01:24:40.877 Honestly, I google this all the time when I don't remember the syntax. 01:24:40.877 --> 01:24:43.470 But the point is the functionality exists. 01:24:43.470 --> 01:24:43.970 All right. 01:24:43.970 --> 01:24:48.320 Let me go down here and rerun python of calculator.py. 
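The format syntax described above can be sketched like this, assuming z holds the result of dividing 1 by 3. The variable name formatted is only for illustration.

```python
z = 1 / 3

# A plain f-string that interpolates z; equivalent to print(z)
print(f"{z}")

# Colon, dot, a digit count, then f for float: two digits after the decimal point
print(f"{z:.2f}")

# The same idea with fifty digits after the decimal point
formatted = f"{z:.50f}"
print(formatted)
```

The part after the colon is a format specifier, so changing the number between the dot and the f changes how many digits are shown.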
01:24:48.320 --> 01:24:52.640 And unfortunately, if I divide 1 by 3, not all of my problems are solved. 01:24:52.640 --> 01:24:56.090 Floating point precision is still a thing. 01:24:56.090 --> 01:24:59.150 So be mindful of the fact that there are these limitations 01:24:59.150 --> 01:25:00.860 in the world of Python. 01:25:00.860 --> 01:25:02.240 Floating point precision remains. 01:25:02.240 --> 01:25:04.490 If you want to do even better than that, though, there 01:25:04.490 --> 01:25:07.670 exist a lot more libraries, third-party libraries, 01:25:07.670 --> 01:25:11.630 that can give you much greater precision for scientific purposes, 01:25:11.630 --> 01:25:13.830 financial purposes, or the like. 01:25:13.830 --> 01:25:16.580 But what about another problem from C, integer overflow? 01:25:16.580 --> 01:25:19.370 If you just count too high, recall that you might accidentally 01:25:19.370 --> 01:25:22.880 overflow the capacity of an integer and end up going back to 0, 01:25:22.880 --> 01:25:25.190 or worse, going negative altogether. 01:25:25.190 --> 01:25:28.430 In Python, this problem does not exist. 01:25:28.430 --> 01:25:31.610 In Python, when you have an integer, a.k.a. 01:25:31.610 --> 01:25:34.460 int, even though we haven't needed to use the keyword int, 01:25:34.460 --> 01:25:37.490 it will grow and grow and grow. 01:25:37.490 --> 01:25:41.590 And Python will reserve more and more memory for that integer to fit it. 01:25:41.590 --> 01:25:43.770 So it is not a fixed number of bits. 01:25:43.770 --> 01:25:46.900 So floating point imprecision is still a problem. 01:25:46.900 --> 01:25:51.120 Integer overflow-- not a problem in the latest versions of Python, 01:25:51.120 --> 01:25:53.250 so a difference worth knowing. 01:25:53.250 --> 01:25:56.778 But what about other features of Python that we didn't have in C?
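Both points above can be illustrated with a short sketch. The specific values (0.1 + 0.2 and 2 ** 64) are illustrative choices, not taken from the lecture, and the decimal module shown at the end is one standard-library option for greater precision rather than the third-party libraries alluded to.

```python
from decimal import Decimal

# Floating-point imprecision remains: this sum is not exactly 0.3
print(0.1 + 0.2 == 0.3)  # False

# Integers grow as needed rather than overflowing a fixed number of bits
big = 2 ** 64
print(big)      # 18446744073709551616
print(big + 1)  # 18446744073709551617

# Decimal does exact decimal arithmetic when built from strings
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```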
01:25:56.778 --> 01:25:59.820 Well, let's actually revisit one of those tracebacks, one of those errors 01:25:59.820 --> 01:26:03.130 I ran into earlier, to see how we might actually solve it. 01:26:03.130 --> 01:26:05.250 So let me go back to VS Code here. 01:26:05.250 --> 01:26:07.870 And just for fun, let me go ahead and do this. 01:26:07.870 --> 01:26:09.070 Let me clear my terminal. 01:26:09.070 --> 01:26:12.248 And let me change my calculator to actually have a get_int() function. 01:26:12.248 --> 01:26:14.040 We've seen how to define our own functions. 01:26:14.040 --> 01:26:15.930 Let me not bother with the CS50 library. 01:26:15.930 --> 01:26:18.790 Let me just invent my own get_int() function as follows. 01:26:18.790 --> 01:26:22.590 So def get_int(), and just like the CS50 function, 01:26:22.590 --> 01:26:26.250 I'm going to have get int take a prompt, a string to show the user to ask them 01:26:26.250 --> 01:26:27.150 for an integer. 01:26:27.150 --> 01:26:31.410 And now I'm going to go ahead and return the return value of input(), 01:26:31.410 --> 01:26:33.780 passing that same prompt to input()-- because input(), 01:26:33.780 --> 01:26:37.330 just like get_string(), shows the user a string of text. 01:26:37.330 --> 01:26:40.930 But I do want to convert this thing here to an int. 01:26:40.930 --> 01:26:44.730 So this is just a one-liner, really, of an implementation of get_int(). 01:26:44.730 --> 01:26:49.050 So this is kind of like what CS50 did in its Python library, but not quite. 01:26:49.050 --> 01:26:49.590 Why? 01:26:49.590 --> 01:26:51.010 Because there's a problem with it. 01:26:51.010 --> 01:26:51.760 So let me do this. 01:26:51.760 --> 01:26:54.030 Let me define a main() function just by convention. 01:26:54.030 --> 01:26:57.780 Let me use this implementation of get_int() to ask the user for x. 01:26:57.780 --> 01:27:01.170 Let me use this get_int() function to prompt the user for y. 
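A minimal sketch of the one-line get_int() and the main() being described, assuming prompts of "x: " and "y: ". The call to main() is deliberately left out here so the sketch does not wait for keyboard input when run.

```python
def get_int(prompt):
    # Naive version: int() raises ValueError if the text is not an integer
    return int(input(prompt))


def main():
    x = get_int("x: ")
    y = get_int("y: ")
    print(x + y)


# In a real file, main() would be called here as the last line
```

This version works as long as the user cooperates; typing something like cat makes int() raise a ValueError, which surfaces as a traceback.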
01:27:01.170 --> 01:27:05.190 And then let me do something simple like print out x + y. 01:27:05.190 --> 01:27:08.340 And then, very last thing, I have to call main(). 01:27:08.340 --> 01:27:10.470 And this is a minor point, but I'm deliberately 01:27:10.470 --> 01:27:13.770 putting multiple blank lines between my functions. 01:27:13.770 --> 01:27:14.970 This too is Pythonic. 01:27:14.970 --> 01:27:17.590 It's a matter of style. style50 will help you with this. 01:27:17.590 --> 01:27:21.630 It's just meant for larger files to really make your functions stand out 01:27:21.630 --> 01:27:24.400 and be a little more separated visually from others. 01:27:24.400 --> 01:27:24.900 All right. 01:27:24.900 --> 01:27:27.720 Let me go ahead and run Python of calculator.py. 01:27:27.720 --> 01:27:28.710 Enter. 01:27:28.710 --> 01:27:29.790 Let me type in 1. 01:27:29.790 --> 01:27:31.020 Let me type in 3. 01:27:31.020 --> 01:27:32.400 And that actually works. 01:27:32.400 --> 01:27:33.840 1 plus 3 is 4. 01:27:33.840 --> 01:27:35.130 Let me do the more obvious. 01:27:35.130 --> 01:27:37.200 1 plus 2 gives me 3. 01:27:37.200 --> 01:27:41.310 So the calculator is in fact working until such time as I, the human, 01:27:41.310 --> 01:27:44.410 don't cooperate and type in something like cat for x. 01:27:44.410 --> 01:27:47.490 Then we get that same traceback as before, 01:27:47.490 --> 01:27:49.390 but I'm seeing it now in this file. 01:27:49.390 --> 01:27:51.790 And let me zoom in on my terminal just to make clear. 01:27:51.790 --> 01:27:55.920 We don't need to see the old history there. 01:27:55.920 --> 01:28:00.390 Let me type in cat, Enter, and you'll see the same traceback. 01:28:00.390 --> 01:28:03.150 And you'll see that, OK, here's where now there's 01:28:03.150 --> 01:28:04.450 multiple functions involved. 01:28:04.450 --> 01:28:05.430 So what's going on? 01:28:05.430 --> 01:28:08.550 The first problem is at line 12 in main(). 
01:28:08.550 --> 01:28:12.410 But that's not actually the problem because main() calls my get_int() 01:28:12.410 --> 01:28:12.910 function. 01:28:12.910 --> 01:28:17.410 So on line 6 of calculator.py, this is really the issue-- 01:28:17.410 --> 01:28:21.330 so, again, it's tracing everything that just happened from top to bottom here-- 01:28:21.330 --> 01:28:25.440 and value error-- invalid literal for int() with base 10, 01:28:25.440 --> 01:28:30.780 'cat,' which is to say, like before, cat is not an integer in base 10 or any 01:28:30.780 --> 01:28:31.350 other base. 01:28:31.350 --> 01:28:34.090 It just cannot be converted to an integer. 01:28:34.090 --> 01:28:38.080 So how do you fix this, or, really, how does the CS50 library fix this? 01:28:38.080 --> 01:28:40.180 You won't have to write much code like this. 01:28:40.180 --> 01:28:44.220 But it turns out that Python supports what are called exceptions. 01:28:44.220 --> 01:28:47.310 And generally, an exception is a better way 01:28:47.310 --> 01:28:50.610 of handling certain types of errors because in C, recall 01:28:50.610 --> 01:28:53.160 that the only way we could really handle errors 01:28:53.160 --> 01:28:56.310 is by having functions return special values. 01:28:56.310 --> 01:29:00.330 malloc() could return null, which means it ran out of memory. 01:29:00.330 --> 01:29:01.560 Something went wrong. 01:29:01.560 --> 01:29:06.270 Some functions we wrote in C could return 1, could return 2, 01:29:06.270 --> 01:29:07.380 could return negative 1. 01:29:07.380 --> 01:29:10.260 Recall that we could write our own functions that return values 01:29:10.260 --> 01:29:12.180 to indicate something went wrong. 01:29:12.180 --> 01:29:15.690 But the problem in C is that if you're stealing certain values, 01:29:15.690 --> 01:29:23.280 be it null or 1 or 2 or 3, your function can never return null or 1 or 2 or 3 01:29:23.280 --> 01:29:24.640 as actual values. 01:29:24.640 --> 01:29:25.140 Why? 
01:29:25.140 --> 01:29:27.598 Because other people are going to interpret them as errors. 01:29:27.598 --> 01:29:30.960 So you kind of have to use up some of your possible return values 01:29:30.960 --> 01:29:35.310 in a language like C and treat them specially as errors. 01:29:35.310 --> 01:29:37.380 In Python and other languages-- 01:29:37.380 --> 01:29:39.280 Java and others-- you don't have to do that. 01:29:39.280 --> 01:29:43.260 You can instead have more out of band error handling, known as exceptions. 01:29:43.260 --> 01:29:44.760 And that's what's happening here. 01:29:44.760 --> 01:29:49.880 When I run calculator.py and I type in cat, what I'm seeing here 01:29:49.880 --> 01:29:51.810 is actually an exception. 01:29:51.810 --> 01:29:54.650 It's something exceptional, but not in a good way. 01:29:54.650 --> 01:29:57.500 This exception means this was not supposed to happen. 01:29:57.500 --> 01:30:00.663 The type of exception happens to be called a value error. 01:30:00.663 --> 01:30:03.830 And within the world of Python, there's this whole taxonomy, that is to say, 01:30:03.830 --> 01:30:05.570 a whole list of possible exceptions. 01:30:05.570 --> 01:30:07.370 ValueError is one of the most common. 01:30:07.370 --> 01:30:09.980 We saw another one before, name error, when I said 01:30:09.980 --> 01:30:12.560 meow when Python didn't know what meow meant. 01:30:12.560 --> 01:30:14.630 So this is just an example of an exception. 01:30:14.630 --> 01:30:18.320 But what this means is that there is a way for me to try to handle this 01:30:18.320 --> 01:30:19.130 myself. 01:30:19.130 --> 01:30:21.170 So I'm actually going to go ahead and do this. 01:30:21.170 --> 01:30:26.420 Instead of get_int() simply blindly returning the integer conversion 01:30:26.420 --> 01:30:31.430 of whatever input the user gives me, I'm going to instead literally try to do 01:30:31.430 --> 01:30:32.880 this instead. 01:30:32.880 --> 01:30:34.600 So it's kind of an aptly named phrase.
01:30:34.600 --> 01:30:35.600 It literally means that. 01:30:35.600 --> 01:30:39.710 Please try to do this, except if something goes wrong, 01:30:39.710 --> 01:30:45.030 except if there is a ValueError, in which case 01:30:45.030 --> 01:30:47.340 I want Python to do something else, for instance, 01:30:47.340 --> 01:30:50.380 quote unquote, "Not an integer." 01:30:50.380 --> 01:30:51.550 So what does this mean? 01:30:51.550 --> 01:30:53.130 It's a little weird, the syntax. 01:30:53.130 --> 01:30:57.790 But in the get_int() function, Python will first try to do the following. 01:30:57.790 --> 01:31:00.090 It will try to get an input from the user. 01:31:00.090 --> 01:31:01.780 It will try to convert it to an integer. 01:31:01.780 --> 01:31:03.240 And it will try to return it. 01:31:03.240 --> 01:31:07.590 But if one of those operations fails, namely the integer step in this case, 01:31:07.590 --> 01:31:09.840 then an exception could happen. 01:31:09.840 --> 01:31:11.770 And you might get what's called a ValueError. 01:31:11.770 --> 01:31:11.940 Why? 01:31:11.940 --> 01:31:14.190 Because the documentation tells you that might happen. 01:31:14.190 --> 01:31:16.170 Or, in my case, I experienced it firsthand, 01:31:16.170 --> 01:31:20.140 and now I want to catch this kind of exception in my own code. 01:31:20.140 --> 01:31:22.200 So if there is a ValueError, I'm not going 01:31:22.200 --> 01:31:24.030 to see that crazy traceback anymore. 01:31:24.030 --> 01:31:28.080 I'm instead going to see, quote unquote, "Not an integer." 01:31:28.080 --> 01:31:31.080 But what the CS50 library does for you technically is it 01:31:31.080 --> 01:31:33.510 lets you try again and again and again. 01:31:33.510 --> 01:31:36.178 Recall in the past, if I type in cat and dog and bird, 01:31:36.178 --> 01:31:38.220 it's just going to keep asking me again and again 01:31:38.220 --> 01:31:39.810 until I actually give it an int. 
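The try and except structure described above can be sketched as follows. This is the single-attempt version: on bad input it prints a message and, since nothing is returned in that branch, the function implicitly returns None.

```python
def get_int(prompt):
    try:
        # Try to read input, convert it, and return it
        return int(input(prompt))
    except ValueError:
        # Reached only if int() could not convert the text
        print("Not an integer")
```

Catching ValueError specifically, rather than every possible exception, keeps unrelated errors visible instead of hiding them.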
01:31:39.810 --> 01:31:43.620 So that kind of implies that we really need a loop inside of this function. 01:31:43.620 --> 01:31:45.810 And the easiest way to do something forever 01:31:45.810 --> 01:31:50.820 is to loop while true, just like in C, but a capital T in Python. 01:31:50.820 --> 01:31:54.810 And what I'm going to do now is implement a better version of get_int() 01:31:54.810 --> 01:31:57.030 here because what's it going to do? 01:31:57.030 --> 01:31:59.910 It is going to try-- it's going to do this forever. 01:31:59.910 --> 01:32:03.840 It's going to try to get an input, convert it to an int, and return it. 01:32:03.840 --> 01:32:07.590 And just like break breaks you out of a loop, 01:32:07.590 --> 01:32:10.650 return also breaks you out of a loop as well, right? 01:32:10.650 --> 01:32:14.010 Because once you've returned, there's no more need for this function to execute. 01:32:14.010 --> 01:32:17.370 So long story short, you won't have to write much code like this yourself. 01:32:17.370 --> 01:32:23.100 But this is essentially what the CS50 library is doing when it implements 01:32:23.100 --> 01:32:24.960 the Python version of get_int(). 01:32:24.960 --> 01:32:26.310 So what happens now? 01:32:26.310 --> 01:32:31.028 If I run python of calculator.py, and I type in cat, I get yelled at, 01:32:31.028 --> 01:32:32.820 but I'm prompted again because of the loop. 01:32:32.820 --> 01:32:33.697 I type in dog. 01:32:33.697 --> 01:32:35.280 I'm yelled at, but I'm prompted again. 01:32:35.280 --> 01:32:38.190 I type in bird, yelled at, but I'm prompted again. 01:32:38.190 --> 01:32:41.970 If I type in 1, then I type in 2, now it proceeds 01:32:41.970 --> 01:32:46.470 because it tried and succeeded this time as opposed to trying and failing 01:32:46.470 --> 01:32:47.130 last time. 01:32:47.130 --> 01:32:49.770 And technically, the CS50 library doesn't actually 01:32:49.770 --> 01:32:51.580 yell at you with "Not an integer." 
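Putting the loop and the exception handling together, a sketch of the re-prompting get_int() might look like this. It approximates the behavior described, not the CS50 library's actual source.

```python
def get_int(prompt):
    # Loop forever; return exits both the loop and the function
    while True:
        try:
            return int(input(prompt))
        except ValueError:
            # Conversion failed, so report it and go around again
            print("Not an integer")
```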
01:32:51.580 --> 01:32:54.750 So technically, if you want to handle the error, that is to say, 01:32:54.750 --> 01:32:58.440 catch the exception, you can actually just say, oh, pass, 01:32:58.440 --> 01:33:02.290 and it will just silently try again and again. 01:33:02.290 --> 01:33:06.670 So let me go ahead and run this. python of calculator.py works almost the same. 01:33:06.670 --> 01:33:09.390 But notice now it works just like the C version. 01:33:09.390 --> 01:33:13.290 It doesn't yell at you, but it does prompt you again and again and again. 01:33:13.290 --> 01:33:16.120 But I'll do 1 and 2, and that now is satisfied. 01:33:16.120 --> 01:33:18.450 So that then is exceptions which you'll encounter, 01:33:18.450 --> 01:33:21.930 but you yourself won't have to write much code along those lines. 01:33:21.930 --> 01:33:23.650 Well, what else can we now do? 01:33:23.650 --> 01:33:25.860 Well, let's revisit something like this for Mario, 01:33:25.860 --> 01:33:29.400 recall, whereby we had this two-dimensional world with things 01:33:29.400 --> 01:33:33.030 in the way for Mario, like this column of three bricks. 01:33:33.030 --> 01:33:35.610 Let me actually play around now for a moment with some loops 01:33:35.610 --> 01:33:38.040 just to see how there's different ways that might actually 01:33:38.040 --> 01:33:41.410 resonate with you just in terms of the simplicity of some of these things. 01:33:41.410 --> 01:33:44.310 Let me go ahead and create a program called mario.py. 01:33:44.310 --> 01:33:47.970 And suppose that I want to print a column of three bricks. 01:33:47.970 --> 01:33:50.580 It kind of doesn't get any easier than this in Python. 01:33:50.580 --> 01:33:55.800 So for i in range(3), just go ahead and print out a single hash-- 01:33:55.800 --> 01:33:57.150 done. 01:33:57.150 --> 01:34:00.600 That then is what took us more lines of code in the past. 01:34:00.600 --> 01:34:03.755 But if I run mario.py, that there gets the job done.
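The column of three bricks just described comes down to two lines; as a sketch:

```python
# Print a column of three bricks, one hash per line
for i in range(3):
    print("#")
```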
01:34:03.755 --> 01:34:05.880 I could change the i to an underscore, but it's not 01:34:05.880 --> 01:34:09.660 bad to remind myself that i is what's really doing my counting. 01:34:09.660 --> 01:34:11.970 Well, what else could we do beyond this? 01:34:11.970 --> 01:34:17.310 Well, recall that in the world of Mario, we prompted the user, actually, 01:34:17.310 --> 01:34:18.660 for a specific height. 01:34:18.660 --> 01:34:20.640 We didn't just always hardcode 3. 01:34:20.640 --> 01:34:23.680 So I could actually do something like this. 01:34:23.680 --> 01:34:28.860 Let me actually open up from today's code that I came with in advance 01:34:28.860 --> 01:34:31.830 and pull up this C version of Mario. 01:34:31.830 --> 01:34:34.260 So this was from some time ago, in week 1. 01:34:34.260 --> 01:34:38.250 And this is how we implemented a loop that 01:34:38.250 --> 01:34:43.400 ensures that we get a positive integer from the user by just doing while 01:34:43.400 --> 01:34:45.890 n is not positive, and then we use this for loop 01:34:45.890 --> 01:34:47.850 to actually print out that many hashes. 01:34:47.850 --> 01:34:50.250 Now, in Python, it's actually going to be pretty similar, 01:34:50.250 --> 01:34:53.900 except for the fact that in Python, there is no do while loop. 01:34:53.900 --> 01:34:55.670 But recall that a do while loop was useful 01:34:55.670 --> 01:34:59.215 because it means you can get the user to try something and then maybe try again, 01:34:59.215 --> 01:35:00.590 maybe try again, maybe try again. 01:35:00.590 --> 01:35:02.970 So it's really good for user input. 01:35:02.970 --> 01:35:04.220 So let's actually do this. 01:35:04.220 --> 01:35:07.580 Let me borrow the CS50 library's get_int() function, 01:35:07.580 --> 01:35:10.940 just so we don't have to re-implement that ourselves again and again. 01:35:10.940 --> 01:35:14.300 Let me, in Python, do this the Pythonic way.
01:35:14.300 --> 01:35:17.630 In Python, if you want to prompt the user to do something again and again 01:35:17.630 --> 01:35:20.780 and again, potentially, you deliberately, by convention, 01:35:20.780 --> 01:35:22.200 induce an infinite loop. 01:35:22.200 --> 01:35:24.200 You just get yourself into an infinite loop. 01:35:24.200 --> 01:35:27.260 But the goal is going to be try something, try something, try 01:35:27.260 --> 01:35:30.230 something, and as soon as you have what you want, break out of the loop 01:35:30.230 --> 01:35:31.050 instead. 01:35:31.050 --> 01:35:34.370 So we're implementing the idea of a do while loop ourselves. 01:35:34.370 --> 01:35:37.940 So I'm going to do this. n, for number, equals get_int(), 01:35:37.940 --> 01:35:40.220 and let's ask the user for a height. 01:35:40.220 --> 01:35:42.020 Then let's just check. 01:35:42.020 --> 01:35:44.430 If n is greater than 0, you know what? 01:35:44.430 --> 01:35:44.930 Break. 01:35:44.930 --> 01:35:46.400 We've got the value we need. 01:35:46.400 --> 01:35:50.420 And if not, it's just going to implicitly keep looping again and again 01:35:50.420 --> 01:35:51.020 and again. 01:35:51.020 --> 01:35:53.690 So in Python, this is to say-- super common-- 01:35:53.690 --> 01:35:57.770 to deliberately induce an infinite loop and break out of it when you have 01:35:57.770 --> 01:35:58.740 what you want. 01:35:58.740 --> 01:35:59.240 All right? 01:35:59.240 --> 01:36:03.770 Now I can just do the same kind of code as before. for i in range not of-- 01:36:03.770 --> 01:36:07.580 rage sometimes-- for i in range, not 3, but n, 01:36:07.580 --> 01:36:10.640 now I can go ahead and print out-- 01:36:10.640 --> 01:36:12.432 oops-- a hash like this. 01:36:12.432 --> 01:36:15.140 If I open my terminal window, it's going to work almost the same, 01:36:15.140 --> 01:36:17.400 but now mario is going to prompt me for the height. 
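The loop-and-break pattern above can be sketched as follows. Two assumptions: a small stand-in get_int() is defined locally so the sketch does not depend on the CS50 library being installed, and the helper names get_height() and draw() are hypothetical, introduced only to keep the sketch from prompting when it is merely loaded.

```python
def get_int(prompt):
    # Stand-in for the CS50 library's get_int(): re-prompt until an int is given
    while True:
        try:
            return int(input(prompt))
        except ValueError:
            pass


def get_height():
    # Python has no do-while loop, so loop forever and break once satisfied
    while True:
        n = get_int("Height: ")
        if n > 0:
            break
    return n


def draw(n):
    # Print a column of n bricks
    for i in range(n):
        print("#")
```

A typical use would be draw(get_height()), which keeps asking until the height is a positive integer and then prints that many hashes.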
01:36:17.400 --> 01:36:20.690 So I could type in 3, or I could type in 4, 01:36:20.690 --> 01:36:25.910 or I could be uncooperative and type in 0 or negative 1 or even cat. 01:36:25.910 --> 01:36:29.510 And because I'm using the CS50 library, cat is ignored. 01:36:29.510 --> 01:36:32.420 Because I'm using my while loop and breaking out 01:36:32.420 --> 01:36:37.520 of it only when n is positive, I'm also ignoring the 0 and the negative 1. 01:36:37.520 --> 01:36:43.060 So, again, this would be a Pythonic way of implementing this particular idea. 01:36:43.060 --> 01:36:49.630 If I want to maybe enhance this a bit further, let me propose that, 01:36:49.630 --> 01:36:55.450 for instance, we consider something like the two-dimensional version-- 01:36:55.450 --> 01:36:58.490 or the horizontal version of this instead. 01:36:58.490 --> 01:37:00.610 So recall that some time ago, we printed out, 01:37:00.610 --> 01:37:03.340 like, four question marks in the sky that might have 01:37:03.340 --> 01:37:05.090 looked a little something like this. 01:37:05.090 --> 01:37:09.740 Now, the very mechanical way to do this would be as follows. 01:37:09.740 --> 01:37:11.200 Let me close my C code. 01:37:11.200 --> 01:37:12.770 Let me clear my terminal. 01:37:12.770 --> 01:37:16.060 And let me just delete my old mario version here. 01:37:16.060 --> 01:37:20.800 And let's just do this-- for i in range(4), let's go ahead 01:37:20.800 --> 01:37:23.620 and print out a question mark, all right? 01:37:23.620 --> 01:37:26.140 I'm going to run python of mario.py, enter, 01:37:26.140 --> 01:37:30.430 and, ugh, it's still a column instead of a row. 01:37:30.430 --> 01:37:34.245 But what's the fix here, perhaps? 01:37:34.245 --> 01:37:34.870 What's the fix? 01:37:34.870 --> 01:37:35.770 Yeah? 01:37:35.770 --> 01:37:37.270 AUDIENCE: The end equals [INAUDIBLE] 01:37:37.270 --> 01:37:38.020 DAVID MALAN: Yeah. 
01:37:38.020 --> 01:37:41.350 We can use that named parameter and say end="" 01:37:41.350 --> 01:37:43.720 to just suppress the default backslash n. 01:37:43.720 --> 01:37:46.660 But let's give ourselves one at the very end of the loop 01:37:46.660 --> 01:37:48.500 just to move the cursor correctly. 01:37:48.500 --> 01:37:50.890 So now if I run python of mario.py, now it 01:37:50.890 --> 01:37:53.780 looks like what it might have in the sky here. 01:37:53.780 --> 01:37:57.400 But it turns out Python has some neat features, too, more syntactic sugar, 01:37:57.400 --> 01:37:59.590 if you will, for doing things a little more easily. 01:37:59.590 --> 01:38:01.850 It turns out in Python, you could also do this. 01:38:01.850 --> 01:38:03.430 You could just say print("?" * 4). 01:38:06.490 --> 01:38:11.230 And just like + means concatenation, * here means, 01:38:11.230 --> 01:38:14.170 really, multiply the string by itself that many times, so sort 01:38:14.170 --> 01:38:16.610 of automatically concatenate it with itself. 01:38:16.610 --> 01:38:19.327 So if I run python of mario.py, this too works-- so, 01:38:19.327 --> 01:38:21.910 again, just some features of Python that make it a little more 01:38:21.910 --> 01:38:25.660 pleasant to use so you don't always have to slog through implementing 01:38:25.660 --> 01:38:27.710 a loop or something along those lines. 01:38:27.710 --> 01:38:29.710 Well, what about something more two-dimensional, 01:38:29.710 --> 01:38:32.800 like in the world of this brick here? 01:38:32.800 --> 01:38:36.040 Well, in the context of this sort of grid of bricks, 01:38:36.040 --> 01:38:38.450 we might do something like this in VS Code. 01:38:38.450 --> 01:38:43.700 Let me go back to mario.py, and let me do a 3-by-3 grid for that block, 01:38:43.700 --> 01:38:44.840 like we did in week 1. 
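Both approaches to the row of question marks can be sketched side by side. The variable name row is only for illustration.

```python
# Approach 1: suppress print()'s default newline, then print one at the end
for i in range(4):
    print("?", end="")
print()

# Approach 2: repeat the string with * and print it once
row = "?" * 4
print(row)
```

Both produce the same single line of output; the second simply avoids writing the loop by hand.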
01:38:44.840 --> 01:38:47.870 So for i in range(3)-- 01:38:47.870 --> 01:38:52.340 I can nest loops, just like in C-- for j in range(3), 01:38:52.340 --> 01:38:55.190 I can then print out a hash here. 01:38:55.190 --> 01:38:58.370 And then let's leave this alone even though it's not quite right yet. 01:38:58.370 --> 01:39:00.620 Let's do python of mario.py. 01:39:00.620 --> 01:39:04.700 OK, it's, like, nine bricks all in a column, so your mind might 01:39:04.700 --> 01:39:06.770 wander to the end parameter again. 01:39:06.770 --> 01:39:10.310 So, yeah, let's fix this-- end="", but at the end of that loop, 01:39:10.310 --> 01:39:11.910 let's just print out a new line. 01:39:11.910 --> 01:39:16.020 So this logically is the same as it was in C. But in this case, 01:39:16.020 --> 01:39:19.730 I'm now doing it in Python, just a little more easily, without i++, 01:39:19.730 --> 01:39:26.120 without a conditional, I'm just relying on this for i in syntax using range(). 01:39:26.120 --> 01:39:28.080 I can tighten this up further, frankly. 01:39:28.080 --> 01:39:30.840 If I already have the outer loop, I could do something like this. 01:39:30.840 --> 01:39:34.280 I could print out a single hash times 3. 01:39:34.280 --> 01:39:37.610 And now if I run python of mario.py, that works, too. 01:39:37.610 --> 01:39:40.710 So I can combine these ideas in interesting ways as well. 01:39:40.710 --> 01:39:44.550 The goal is simply to seed you with some of these building blocks. 01:39:44.550 --> 01:39:45.050 All right. 01:39:45.050 --> 01:39:48.540 How about code that was maybe a little more logical in nature? 01:39:48.540 --> 01:39:52.640 Well, in Python, we indeed have some other features as well, namely lists. 01:39:52.640 --> 01:39:54.950 And lists are denoted by those square brackets, 01:39:54.950 --> 01:39:56.570 reminiscent of the world of arrays.
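The two versions of the 3-by-3 grid described above can be sketched as follows, first with nested loops and then with string repetition replacing the inner loop.

```python
# Nested loops: the inner loop prints one row, the outer loop repeats it
for i in range(3):
    for j in range(3):
        print("#", end="")
    # Move to the next line after each row
    print()

# Tighter version: build each row with * instead of an inner loop
for i in range(3):
    print("#" * 3)
```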
01:39:56.570 --> 01:39:59.270 But in Python, what's really nice about lists 01:39:59.270 --> 01:40:02.510 is that their memory is automatically handled for you. 01:40:02.510 --> 01:40:05.930 An array is about having values contiguously in memory. 01:40:05.930 --> 01:40:09.440 In Python, a list is more like a linked list. 01:40:09.440 --> 01:40:12.770 It will allocate memory for you and grow and shrink these things. 01:40:12.770 --> 01:40:14.990 And you do not have to know about pointers. 01:40:14.990 --> 01:40:16.520 You do not have to know about nodes. 01:40:16.520 --> 01:40:18.830 You do not have to implement linked lists yourself. 01:40:18.830 --> 01:40:22.250 You just get list as a data type in Python itself. 01:40:22.250 --> 01:40:25.490 Here, for instance, is some of the documentation for lists specifically. 01:40:25.490 --> 01:40:29.210 And in particular, lists also, like strings, or strs, 01:40:29.210 --> 01:40:31.970 have methods, functions that come with them, 01:40:31.970 --> 01:40:34.710 that just make it easy to do certain things. 01:40:34.710 --> 01:40:40.100 So, for instance, if I wanted to maybe do something like taking averages 01:40:40.100 --> 01:40:43.850 of scores, like we did some time ago, we can do that using a combination 01:40:43.850 --> 01:40:47.550 of lists and the function called len(), which I alluded to earlier, 01:40:47.550 --> 01:40:49.550 which will tell you the length of a list. 01:40:49.550 --> 01:40:50.705 Now, how might we do this? 01:40:50.705 --> 01:40:52.580 Well, if we read the documentation for len(), 01:40:52.580 --> 01:40:55.455 it turns out there's other functions there too that might be helpful. 01:40:55.455 --> 01:40:57.240 So let me go back to VS Code here. 01:40:57.240 --> 01:40:59.030 Let me close mario.py. 01:40:59.030 --> 01:41:02.360 And let me open a file called scores.py, reminiscent of something 01:41:02.360 --> 01:41:03.800 we did weeks ago, too. 
01:41:03.800 --> 01:41:06.090 Let me go ahead and, just for demonstration's sake, 01:41:06.090 --> 01:41:09.920 give myself a variable called scores that has my three test scores 01:41:09.920 --> 01:41:11.480 or whatnot from weeks ago. 01:41:11.480 --> 01:41:15.890 So I'm using square brackets, not curly braces, as in C. This is a linked list, 01:41:15.890 --> 01:41:17.780 or a list in Python. 01:41:17.780 --> 01:41:20.460 And let me get the average of these values. 01:41:20.460 --> 01:41:24.380 Well, I could do this-- average =, and it turns out in Python, 01:41:24.380 --> 01:41:26.300 you just get a lot of functionality for free. 01:41:26.300 --> 01:41:29.600 And those functions sometimes take not single arguments, 01:41:29.600 --> 01:41:31.740 but lists as their arguments. 01:41:31.740 --> 01:41:35.330 So, for instance, I can use Python's built-in sum() function and pass 01:41:35.330 --> 01:41:36.230 in those scores. 01:41:36.230 --> 01:41:41.130 I can then divide that sum by the length of the scores list as well. 01:41:41.130 --> 01:41:44.670 So length of a list just tells you how many things are in it. 01:41:44.670 --> 01:41:51.620 So this is like doing magically 72 plus 73 plus 33, all divided by 3 in total. 01:41:51.620 --> 01:41:54.840 If I want to now do the math out, I can print the result. 01:41:54.840 --> 01:41:59.630 So I can print out, using an f string and maybe some prefix text here. 01:41:59.630 --> 01:42:02.520 Let's print out that average here. 01:42:02.520 --> 01:42:06.170 So let me do python of scores.py, enter, and there 01:42:06.170 --> 01:42:08.280 is the average, slightly imprecisely. 01:42:08.280 --> 01:42:10.280 But at that point, I'm not doing so well anyway. 01:42:10.280 --> 01:42:11.100 So that's fine. 01:42:11.100 --> 01:42:17.840 So at this point, we've seen that we have sort of more functionality than C. 01:42:17.840 --> 01:42:20.490 In C, how would we have computed the average weeks ago? 
01:42:20.490 --> 01:42:22.170 I mean, we literally created a variable. 01:42:22.170 --> 01:42:22.970 We then had a loop. 01:42:22.970 --> 01:42:24.278 We iterated over the array. 01:42:24.278 --> 01:42:25.320 We added things together. 01:42:25.320 --> 01:42:26.930 It was just so much more work. 01:42:26.930 --> 01:42:30.920 It's nice when you have a language that comes with functions, among them len(), 01:42:30.920 --> 01:42:34.670 among them sum(), that just do more of this for you. 01:42:34.670 --> 01:42:37.690 But suppose you actually want to get the scores from the user. 01:42:37.690 --> 01:42:41.020 In C, we used an array, and in C, we used get_int(). 01:42:41.020 --> 01:42:42.850 We can do something a little similar here. 01:42:42.850 --> 01:42:46.300 Let me propose that instead of hardcoding those three values, 01:42:46.300 --> 01:42:49.770 let me do this. from cs50 import get_int. 01:42:49.770 --> 01:42:53.710 Now let me give myself an empty list by just saying scores 01:42:53.710 --> 01:42:55.570 equals open bracket, closed bracket. 01:42:55.570 --> 01:42:59.890 And unlike C, where you just can't do this-- you can't say give me an array 01:42:59.890 --> 01:43:03.100 and I'll figure out the length later, unless you resort 01:43:03.100 --> 01:43:06.040 to pointers and memory management or the like-- in Python 01:43:06.040 --> 01:43:09.680 you can absolutely give yourself an initially empty list. 01:43:09.680 --> 01:43:12.970 Now let's do this. for i in range(3), let's 01:43:12.970 --> 01:43:15.320 prompt the human for three test scores. 01:43:15.320 --> 01:43:18.400 So the first score will be the return value of get_int(), 01:43:18.400 --> 01:43:20.680 prompting the user for their score. 01:43:20.680 --> 01:43:25.000 And now, if I want to add this score to that otherwise empty list, 01:43:25.000 --> 01:43:27.130 here's where methods come into play, functions 01:43:27.130 --> 01:43:30.130 that come with objects, like lists.
01:43:30.130 --> 01:43:31.780 I can do scores, plural-- 01:43:31.780 --> 01:43:35.710 because that's the name of my variable from line 3-- .append, 01:43:35.710 --> 01:43:37.650 and I can append that score. 01:43:37.650 --> 01:43:40.070 So if we read the documentation for lists in Python, 01:43:40.070 --> 01:43:43.940 you will see that lists come with a function, a method called append(), 01:43:43.940 --> 01:43:47.630 which literally just tacks a value onto the end, tacks a value onto the end, 01:43:47.630 --> 01:43:52.040 like all of that annoying code we would have written in C to iterate with 01:43:52.040 --> 01:43:55.040 pointer and pointer and pointer to the end of the list, append it, 01:43:55.040 --> 01:43:56.180 malloc() a new node. 01:43:56.180 --> 01:43:58.262 Python does all of that for us. 01:43:58.262 --> 01:44:01.220 And so once you've done that, now I can do something similar to before. 01:44:01.220 --> 01:44:04.460 The average equals the sum of those scores divided 01:44:04.460 --> 01:44:06.740 by the length of that list of scores. 01:44:06.740 --> 01:44:11.660 And I can again print out, with an f string, the average value 01:44:11.660 --> 01:44:13.667 in that variable like this. 01:44:13.667 --> 01:44:16.250 So, again, you just have more building blocks at your disposal 01:44:16.250 --> 01:44:19.770 when it comes to something like this. 01:44:19.770 --> 01:44:22.790 You can also do this, just so you've seen other syntax. 01:44:22.790 --> 01:44:28.610 It turns out that instead of doing scores.append, you could also do this. 01:44:28.610 --> 01:44:36.530 You could concatenate scores with itself by adding two lists together like this. 01:44:36.530 --> 01:44:38.000 This looks a little weird. 01:44:38.000 --> 01:44:40.910 But on the left is my variable scores. 01:44:40.910 --> 01:44:44.900 On the right here, I am taking whatever is in that list, 01:44:44.900 --> 01:44:49.190 and I'm adding the current score by adding it to its own list. 
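As a rough sketch of the scores program described above: to keep it runnable without the cs50 library or keyboard input, the list typed stands in for three calls to get_int(). That stand-in, the name also_scores, and the "Average:" prefix are assumptions for illustration.

```python
# Stand-in for three get_int() prompts; these are the hard-coded
# scores used earlier in the lecture.
typed = [72, 73, 33]

scores = []  # an initially empty list
for score in typed:
    scores.append(score)  # tack each value onto the end

# Concatenation form: builds a new list and reassigns the name.
also_scores = []
for score in typed:
    also_scores = also_scores + [score]

# sum() and len() replace the hand-written loop from C.
average = sum(scores) / len(scores)
print(f"Average: {average}")
```

Both loops end with the same three values; append() modifies the existing list, while the + form creates a new list each time around.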
01:44:49.190 --> 01:44:51.900 And this will update the value as we go. 01:44:51.900 --> 01:44:54.110 But it does, in fact, change the value of scores 01:44:54.110 --> 01:44:58.070 as opposed to appending to the initial list. 01:44:58.070 --> 01:44:58.610 All right. 01:44:58.610 --> 01:45:02.120 How about some other building blocks here? 01:45:02.120 --> 01:45:04.010 Let me propose this. 01:45:04.010 --> 01:45:06.470 Let me close out scores.py. 01:45:06.470 --> 01:45:09.800 Let me open up a file called phonebook.py, 01:45:09.800 --> 01:45:12.080 reminiscent of what we did weeks ago in C. 01:45:12.080 --> 01:45:13.747 And let me give myself a list of names. 01:45:13.747 --> 01:45:15.330 We won't bother with numbers just yet. 01:45:15.330 --> 01:45:17.247 Let's just play with lists for another moment. 01:45:17.247 --> 01:45:18.830 So here is a variable called names. 01:45:18.830 --> 01:45:22.670 It has maybe three names in it-- maybe Carter and David 01:45:22.670 --> 01:45:25.220 and John Harvard, as in past weeks. 01:45:25.220 --> 01:45:29.303 And now let me go ahead and ask the user to input a name-- 01:45:29.303 --> 01:45:31.220 because this is going to be like a phone book. 01:45:31.220 --> 01:45:33.110 I want to ask the user for a name and then look up 01:45:33.110 --> 01:45:35.943 that person's name in the phone book, even though I'm not bothering 01:45:35.943 --> 01:45:38.330 to have any phone numbers just yet. 01:45:38.330 --> 01:45:41.780 How could I search for, a la linear search, someone's name? 01:45:41.780 --> 01:45:45.410 Well, in Python I could do this. for name-- 01:45:45.410 --> 01:45:53.210 rather, for n in names, if the current name equals what the human typed in, 01:45:53.210 --> 01:45:58.470 then go ahead and print out "Found," then break out of this loop. 01:45:58.470 --> 01:46:02.250 Otherwise, we'll print out "Not found" at the bottom. 01:46:02.250 --> 01:46:02.750 All right.
01:46:02.750 --> 01:46:05.360 So let's try this-- python of phonebook.py. 01:46:05.360 --> 01:46:06.960 Let's search for maybe Carter. 01:46:06.960 --> 01:46:07.460 That's easy. 01:46:07.460 --> 01:46:08.570 He's at the beginning. 01:46:08.570 --> 01:46:09.475 Oh, hmm. 01:46:09.475 --> 01:46:11.600 Well, he was found, but then I printed "Not found." 01:46:11.600 --> 01:46:13.250 So that's not quite what I want. 01:46:13.250 --> 01:46:14.240 How about David? 01:46:14.240 --> 01:46:16.820 D-A-V-I-D. "Found," "Not found"-- 01:46:16.820 --> 01:46:18.260 all right, not very correct. 01:46:18.260 --> 01:46:20.090 How about this? 01:46:20.090 --> 01:46:21.900 Let's search for Eli, not in the list. 01:46:21.900 --> 01:46:22.400 OK. 01:46:22.400 --> 01:46:25.070 So at least someone not being in the list is working. 01:46:25.070 --> 01:46:28.310 But logically, for Carter, for David, and even John, 01:46:28.310 --> 01:46:31.355 why are we seeing "Found" and then "Not found"? 01:46:37.030 --> 01:46:38.650 Why is it not found? 01:46:38.650 --> 01:46:39.170 Yeah? 01:46:39.170 --> 01:46:41.063 AUDIENCE: You need to indent the print(). 01:46:41.063 --> 01:46:41.730 DAVID MALAN: OK. 01:46:41.730 --> 01:46:45.610 I don't seem to have indented the print(), but let me try this. 01:46:45.610 --> 01:46:48.180 If I just go with the else here-- 01:46:48.180 --> 01:46:50.910 let me go up here and indent this and say else-- 01:46:50.910 --> 01:46:53.790 I'm not sure logically this is what we want, 01:46:53.790 --> 01:46:57.810 because what I think this is going to do if I search for maybe Carter-- 01:46:57.810 --> 01:46:58.660 OK, that worked. 01:46:58.660 --> 01:47:00.480 So it's partially fixed the problem. 01:47:00.480 --> 01:47:03.150 But let me try searching for maybe David. 01:47:03.150 --> 01:47:05.280 Oh, now we've sort of got the opposite problem-- 01:47:05.280 --> 01:47:06.420 "Not found," "Found." 01:47:06.420 --> 01:47:07.020 Why?
01:47:07.020 --> 01:47:09.450 Well, I don't think we want to immediately conclude 01:47:09.450 --> 01:47:12.000 that someone's not found just because they don't 01:47:12.000 --> 01:47:15.220 equal the current name in the list. 01:47:15.220 --> 01:47:19.950 So it turns out we could fix this in a couple of different ways. 01:47:19.950 --> 01:47:22.290 But there's kind of a neat feature of Python. 01:47:22.290 --> 01:47:25.980 In Python, even for loops can have an else clause. 01:47:25.980 --> 01:47:27.150 And this is weird. 01:47:27.150 --> 01:47:29.320 But the way this works is as follows. 01:47:29.320 --> 01:47:34.560 In Python, if you break out of a loop, that's it for the for loop. 01:47:34.560 --> 01:47:38.640 If, though, you get all the way through the list that you're looping over, 01:47:38.640 --> 01:47:42.520 and you never once call line 8-- you never break out of the loop-- 01:47:42.520 --> 01:47:44.950 Python is smart enough to realize, OK, you just 01:47:44.950 --> 01:47:46.510 went through lines 5 through 8. 01:47:46.510 --> 01:47:49.120 You never actually logically called break. 01:47:49.120 --> 01:47:51.490 Here's an else clause to be associated with it. 01:47:51.490 --> 01:47:52.730 Semantically, this is weird. 01:47:52.730 --> 01:47:55.610 We've only ever seen if and else associated with each other. 01:47:55.610 --> 01:47:59.330 But for loops in Python actually can have else as well. 01:47:59.330 --> 01:48:03.190 And in this case now, if I do python of phonebook.py, type in Carter, 01:48:03.190 --> 01:48:05.005 now we get only one answer. 01:48:05.005 --> 01:48:07.630 If I do it again and type in David, now we get only one answer. 01:48:07.630 --> 01:48:08.547 Do it again with John. 01:48:08.547 --> 01:48:09.700 Now we get only one answer. 01:48:09.700 --> 01:48:10.480 Do it with Eli. 01:48:10.480 --> 01:48:12.530 Now we get only one answer.
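A sketch of the for/else search described above. To make the behavior easy to check, the logic is wrapped in a hypothetical function named search that returns the message instead of printing it inside the loop; the lecture's version prints directly and reads the name with input().

```python
names = ["Carter", "David", "John"]


def search(name):
    """Linear search that reports exactly one result."""
    for n in names:
        if n == name:
            result = "Found"
            break  # leaving via break skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        result = "Not found"
    return result


print(search("David"))
print(search("Eli"))
```

The else here belongs to the for, not to the if, which is why a match no longer also produces "Not found".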
01:48:12.530 --> 01:48:14.998 So, again, you just get a few more tools in your toolkit 01:48:14.998 --> 01:48:18.040 when it comes to a language like Python that might very well make solving 01:48:18.040 --> 01:48:20.890 problems a little more pleasant. 01:48:20.890 --> 01:48:22.870 But this is kind of stupid in Python. 01:48:22.870 --> 01:48:25.060 This is correct, but it's not well designed, 01:48:25.060 --> 01:48:28.810 because I don't need to iterate over lists like this so pedantically 01:48:28.810 --> 01:48:30.700 like we've been doing for weeks in C. 01:48:30.700 --> 01:48:33.880 I can actually tighten this up, and I can just do this. 01:48:33.880 --> 01:48:38.440 I can get rid of the loop, and I can say if name in names, then print out, 01:48:38.440 --> 01:48:39.430 quote unquote, "Found." 01:48:39.430 --> 01:48:40.750 That's it in Python. 01:48:40.750 --> 01:48:44.020 If you want Python to search a whole list of values for you, 01:48:44.020 --> 01:48:45.700 just let Python do the work. 01:48:45.700 --> 01:48:50.590 And you can literally just say if the name that the human inputted is 01:48:50.590 --> 01:48:55.180 in names, which is this list here, Python will use linear search for you, 01:48:55.180 --> 01:48:59.240 search automatically from left to right, presumably, looking for the value. 01:48:59.240 --> 01:49:02.260 And if it doesn't find it then and only then will 01:49:02.260 --> 01:49:04.490 this else clause execute instead. 01:49:04.490 --> 01:49:06.970 So, again, Python's just starting to save us 01:49:06.970 --> 01:49:10.970 some time because this, too, will find Carter, but it will not find, 01:49:10.970 --> 01:49:12.800 for instance, Eli. 01:49:12.800 --> 01:49:13.300 All right? 01:49:13.300 --> 01:49:15.350 So we get that functionality for free. 01:49:15.350 --> 01:49:18.470 But what more can we perhaps do here? 
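The tightened version above, which lets the in operator do the searching, might be sketched like this. The function name lookup is an assumption added so the result can be checked.

```python
names = ["Carter", "David", "John"]


def lookup(name):
    # "in" checks membership; no explicit loop is needed.
    if name in names:
        return "Found"
    else:
        return "Not found"


print(lookup("Carter"))
print(lookup("Eli"))
```

This else is an ordinary if/else again, since there is no longer any loop to attach it to.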
01:49:18.470 --> 01:49:22.360 Well, it turns out that Python has yet other features 01:49:22.360 --> 01:49:25.930 we might want to explore, namely dictionaries, shortened as dict. 01:49:25.930 --> 01:49:29.708 And a dictionary in Python is just like it was in C and, really, 01:49:29.708 --> 01:49:31.000 in computer science in general. 01:49:31.000 --> 01:49:32.625 A dictionary was an abstract data type. 01:49:32.625 --> 01:49:36.460 And it's a collection of key value pairs. It looks a little something like this. 01:49:36.460 --> 01:49:38.980 If in C, if in Python, if, in any language, 01:49:38.980 --> 01:49:43.210 you want to associate something with something, like a name with a number, 01:49:43.210 --> 01:49:46.990 you had to, in problem set 5, implement the darn thing yourself 01:49:46.990 --> 01:49:50.950 by implementing an entire spell checker with an array and linked list 01:49:50.950 --> 01:49:53.890 to store all of those words in your dictionary. 01:49:53.890 --> 01:49:57.220 In Python, as we saw earlier, you can use a set, or you can use, 01:49:57.220 --> 01:50:02.080 more simply, a dictionary that implements for you all of problem 01:50:02.080 --> 01:50:03.730 set 5's ideas. 01:50:03.730 --> 01:50:06.370 But Python does the heavy lifting for you. 01:50:06.370 --> 01:50:11.600 A dict in Python is essentially a hash table, a collection of key value pairs. 01:50:11.600 --> 01:50:13.720 So what does this mean for me in Python? 01:50:13.720 --> 01:50:18.050 It means that I can do some pretty handy things pretty easily. 01:50:18.050 --> 01:50:21.070 So, for instance, let me go back here to VS Code, 01:50:21.070 --> 01:50:25.210 and let me change my phone book altogether to be this. 01:50:25.210 --> 01:50:29.570 Let me give myself a list of dictionaries. 01:50:29.570 --> 01:50:32.830 So people is now going to be a global list.
01:50:32.830 --> 01:50:35.977 And I'm going to demarcate it here with open square bracket 01:50:35.977 --> 01:50:37.060 and closed square bracket. 01:50:37.060 --> 01:50:39.430 And just to be nice and neat and tidy, I'm 01:50:39.430 --> 01:50:45.160 going to have these people no longer just be Carter and David and John, 01:50:45.160 --> 01:50:46.600 as in the previous example. 01:50:46.600 --> 01:50:52.060 But I want each of the elements of this list to be a key value 01:50:52.060 --> 01:50:54.800 pair, like a name and a number. 01:50:54.800 --> 01:50:56.140 So how can I do this? 01:50:56.140 --> 01:50:58.940 In Python, you can use this syntax. 01:50:58.940 --> 01:51:02.290 And this is, I think, the last of the weird looking syntax today. 01:51:02.290 --> 01:51:06.910 You can define a dictionary that is something like this 01:51:06.910 --> 01:51:10.070 by using two curly braces like this. 01:51:10.070 --> 01:51:12.820 And inside of your curly braces, you get to invent 01:51:12.820 --> 01:51:15.140 the name, the keys, and the values. 01:51:15.140 --> 01:51:18.107 So if you want one key to be the person's name, you can do, 01:51:18.107 --> 01:51:20.440 quote unquote, "name" and then, quote unquote, "Carter." 01:51:20.440 --> 01:51:24.850 If you want another key to be "number," you can do, quote unquote, "number," 01:51:24.850 --> 01:51:29.560 and then, quote unquote, something like last time, "1-617-495-1000," 01:51:29.560 --> 01:51:31.420 for instance, for Carter's number there. 01:51:31.420 --> 01:51:35.530 And collectively, everything here on line 2 represents a dictionary. 01:51:35.530 --> 01:51:40.680 It's as though, on a chalkboard, I wrote down "name, Carter, number, 01:51:40.680 --> 01:51:45.790 +1-617-495-1000," row by row by row in this table. 01:51:45.790 --> 01:51:47.700 This is simply the code equivalent thereof. 
01:51:47.700 --> 01:51:50.340 If you want to be really nitpicky or tidy, 01:51:50.340 --> 01:51:52.840 you could style your code to look like this, 01:51:52.840 --> 01:51:56.360 which makes it a little more clear, perhaps, as to what's going on. 01:51:56.360 --> 01:51:58.860 It's just starting to add a lot of whitespace to the screen. 01:51:58.860 --> 01:52:01.140 But it's just a collection of key value pairs, 01:52:01.140 --> 01:52:04.573 again, akin to a two-column table like this. 01:52:04.573 --> 01:52:07.740 I'm going to undo the whitespace just to kind of tighten things up because I 01:52:07.740 --> 01:52:09.760 want to cram two other people in here. 01:52:09.760 --> 01:52:13.740 So I'm going to go ahead and do another set of curly braces with, 01:52:13.740 --> 01:52:17.370 quote unquote, "name" and "David," quote unquote, "number"-- 01:52:17.370 --> 01:52:21.300 and we'll have the same number, so "+1-617-495-1000." 01:52:21.300 --> 01:52:25.560 And then, lastly, let's do another set of curly braces for a name of say 01:52:25.560 --> 01:52:34.140 "John," and John Harvard's number, quote unquote, "number" will be "+1"-- 01:52:34.140 --> 01:52:40.120 let's see-- "949-468-2750" is always John Harvard's number. 01:52:40.120 --> 01:52:44.090 And then, by convention, you typically end even this element with a comma. 01:52:44.090 --> 01:52:46.150 But it's not strictly necessary syntactically. 01:52:46.150 --> 01:52:48.410 But stylistically, that's often added for you. 01:52:48.410 --> 01:52:49.450 So what is people? 01:52:49.450 --> 01:52:54.470 people is now a list of dictionaries, a list of dictionaries. 01:52:54.470 --> 01:52:55.700 So what does that mean? 01:52:55.700 --> 01:52:57.700 It means I can now do code like this. 01:52:57.700 --> 01:53:02.200 I can prompt the user with the input() function for someone's name if the goal 01:53:02.200 --> 01:53:04.300 now is to look up that person's number. 01:53:04.300 --> 01:53:05.680 How can I look up that number? 
01:53:05.680 --> 01:53:10.720 Well, for each person in the list of people, let's go ahead and do this. 01:53:10.720 --> 01:53:16.930 If the current person's name equals equals whatever name the human 01:53:16.930 --> 01:53:22.930 typed in, then get that person's number by going into that person and doing, 01:53:22.930 --> 01:53:25.870 quote unquote, "number," and then go ahead 01:53:25.870 --> 01:53:31.270 and print out something like this f string "Found" that person's number. 01:53:31.270 --> 01:53:34.310 And then, since we found them, let's just break out all together. 01:53:34.310 --> 01:53:37.600 And if we get through that whole thing, let's just, at the very end, print 01:53:37.600 --> 01:53:39.310 out "Not found." 01:53:39.310 --> 01:53:40.860 So what's weird here? 01:53:40.860 --> 01:53:44.370 If I focus on this code here, this syntax obviously is new. 01:53:44.370 --> 01:53:47.760 The square brackets, though, just means, hey, Python, here comes a list. 01:53:47.760 --> 01:53:49.590 Hey, Python, that's it for the list. 01:53:49.590 --> 01:53:52.320 Inside of this list are three dictionaries. 01:53:52.320 --> 01:53:55.470 The curly braces mean, hey, Python, here comes a dictionary. 01:53:55.470 --> 01:53:57.300 Hey, Python, that's it for the dictionary. 01:53:57.300 --> 01:53:59.970 Each of these dictionaries has two key value pairs-- 01:53:59.970 --> 01:54:03.580 "name" and its value, "number" and its value. 01:54:03.580 --> 01:54:07.980 So you can think of each of these lines as being like a C struct, 01:54:07.980 --> 01:54:09.310 like with typedef and struct. 01:54:09.310 --> 01:54:12.060 But I don't have to decide in advance what the keys and the values 01:54:12.060 --> 01:54:12.727 are going to be. 01:54:12.727 --> 01:54:15.360 I can just, on the fly, create a dictionary like this, 01:54:15.360 --> 01:54:18.550 again, reminiscent of this kind of chalkboard design. 01:54:18.550 --> 01:54:19.050 All right. 
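One way the list-of-dictionaries phone book above could be written. The names and numbers are the ones read out in the lecture; the wrapper function find is a hypothetical addition so the lookup can be exercised without typing at a prompt.

```python
people = [
    {"name": "Carter", "number": "+1-617-495-1000"},
    {"name": "David", "number": "+1-617-495-1000"},
    {"name": "John", "number": "+1-949-468-2750"},
]


def find(name):
    # Each person is one dictionary with "name" and "number" keys.
    for person in people:
        if person["name"] == name:
            number = person["number"]
            message = f"Found {number}"
            break
    else:
        message = "Not found"
    return message


print(find("Carter"))
print(find("Eli"))
```

In the lecture the name comes from input() and the messages are printed in place; returning them here is only for convenience.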
01:54:19.050 --> 01:54:21.210 So what am I actually doing in code? 01:54:21.210 --> 01:54:26.490 A dictionary in Python lets you index into it, 01:54:26.490 --> 01:54:31.287 similar to an array with numbers in C. So in C, this 01:54:31.287 --> 01:54:32.370 is a little bit different. 01:54:32.370 --> 01:54:35.550 In C, you might have been in the habit of doing person.name. 01:54:35.550 --> 01:54:38.220 But because it's a dictionary, the syntax in Python 01:54:38.220 --> 01:54:41.590 is you actually use square brackets with strings 01:54:41.590 --> 01:54:45.620 as being inside the square brackets rather than numbers. 01:54:45.620 --> 01:54:49.750 But all this is now doing is it's creating a variable on line 11, 01:54:49.750 --> 01:54:53.060 setting that number equal to that same person's number. 01:54:53.060 --> 01:54:53.560 Why? 01:54:53.560 --> 01:54:56.140 Because we're inside of this loop, I'm iterating 01:54:56.140 --> 01:54:58.340 over each person one at a time. 01:54:58.340 --> 01:54:59.260 And that's what for-- 01:54:59.260 --> 01:55:00.430 that's what in does. 01:55:00.430 --> 01:55:06.280 It assigns the person variable to this dictionary, then this dictionary, 01:55:06.280 --> 01:55:08.860 then this dictionary automatically for me-- 01:55:08.860 --> 01:55:11.480 no need for i and i++ and all of that. 01:55:11.480 --> 01:55:14.020 So this is just saying, if the current person's name 01:55:14.020 --> 01:55:17.560 equals the name we're looking for, get a variable called number and assign it 01:55:17.560 --> 01:55:20.708 that person's number, and then print out that person's number. 01:55:20.708 --> 01:55:23.500 So whereas last time we were just printing "Found" and "Not found," 01:55:23.500 --> 01:55:25.310 now I'm going to print an actual number. 01:55:25.310 --> 01:55:28.270 So if I run python of phonebook.py and I search for Carter, 01:55:28.270 --> 01:55:29.680 there then is his number.
01:55:29.680 --> 01:55:34.550 If I run python of phonebook.py, type in John, there then is John's number. 01:55:34.550 --> 01:55:38.840 And if I search for someone who's not there, I instead just get "Not found." 01:55:38.840 --> 01:55:42.010 So what's interesting and compelling about dictionaries 01:55:42.010 --> 01:55:45.580 is they're kind of known as the Swiss Army knives of data structures 01:55:45.580 --> 01:55:48.010 in programming because you can just use them 01:55:48.010 --> 01:55:49.930 in so many interesting, clever ways. 01:55:49.930 --> 01:55:52.750 If you ever want to associate something with something else, 01:55:52.750 --> 01:55:54.910 a dictionary is your friend. 01:55:54.910 --> 01:55:58.390 And you no longer have to write dozens of lines of code as in P set 5. 01:55:58.390 --> 01:56:01.850 You can write single lines of code to achieve this same idea. 01:56:01.850 --> 01:56:04.990 So, for instance, if I, too, want to tighten this up, 01:56:04.990 --> 01:56:07.180 I actually don't need this loop altogether. 01:56:07.180 --> 01:56:10.240 An even better version of this code would be this. 01:56:10.240 --> 01:56:12.737 I don't need this variable, technically, even 01:56:12.737 --> 01:56:14.320 though this will look a little uglier. 01:56:14.320 --> 01:56:16.870 Notice that I'm only creating a variable called number 01:56:16.870 --> 01:56:19.180 because I want to set it equal to this person's number. 01:56:19.180 --> 01:56:21.940 But strictly speaking, any time you've declared a variable 01:56:21.940 --> 01:56:25.220 and then used it in the next line, eh, you don't really need it. 01:56:25.220 --> 01:56:26.450 So I could do this. 01:56:26.450 --> 01:56:27.950 I could get rid of that line. 01:56:27.950 --> 01:56:30.430 And instead of printing "number" in my curly braces, 01:56:30.430 --> 01:56:33.640 I could actually do person, square brackets, 01:56:33.640 --> 01:56:35.740 and you might be inclined to do this. 
01:56:35.740 --> 01:56:38.740 But this is going to confuse Python because you're mixing double quotes 01:56:38.740 --> 01:56:40.410 on the inside and the outside. 01:56:40.410 --> 01:56:43.680 But you can use single quotes here, compellingly. 01:56:43.680 --> 01:56:45.390 So you don't have to do it this way. 01:56:45.390 --> 01:56:47.850 But this is just to show you, syntactically, 01:56:47.850 --> 01:56:51.010 you can put most anything you want in these curly braces 01:56:51.010 --> 01:56:55.350 so long as you don't confuse Python by using the same syntax. 01:56:55.350 --> 01:56:58.240 But let me do one other thing here. 01:56:58.240 --> 01:56:59.910 This is even more powerful. 01:56:59.910 --> 01:57:02.400 Let me propose that if all you're storing 01:57:02.400 --> 01:57:05.760 is names and numbers, names and numbers, I can actually 01:57:05.760 --> 01:57:09.090 simplify this dictionary significantly. 01:57:09.090 --> 01:57:14.370 Let me actually redeclare this people data structure 01:57:14.370 --> 01:57:19.770 to be not a list of dictionaries, but how about just one big dictionary? 01:57:19.770 --> 01:57:22.230 Because if I'm only associating names with numbers, 01:57:22.230 --> 01:57:24.780 I don't technically need to create special keys 01:57:24.780 --> 01:57:26.040 called "name" and "number." 01:57:26.040 --> 01:57:32.730 Why don't I just associate Carter with his number, +1-617-495-1000? 01:57:32.730 --> 01:57:37.240 Why don't I just associate, quote unquote, "David" with his number, 01:57:37.240 --> 01:57:41.500 +1-617-495-1000? 01:57:41.500 --> 01:57:51.850 And then, lastly, let's just associate John with his number, +1-949-468-2750? 01:57:51.850 --> 01:57:54.280 And that too would work. 
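To illustrate the quoting point made above: inside a double-quoted f-string, the dictionary key can be written with single quotes. The single person dictionary here is just a small stand-in for one element of the people list. (As far as I know, reusing double quotes inside the braces is only accepted from Python 3.12 onward, so single quotes are the portable choice.)

```python
person = {"name": "Carter", "number": "+1-617-495-1000"}

# Single quotes inside the braces avoid clashing with the
# double quotes that delimit the f-string itself.
message = f"Found {person['number']}"
print(message)
```

This removes the need for the intermediate number variable, at the cost of a slightly busier print line.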
01:57:54.280 --> 01:57:57.520 But notice that I'm going to get rid of my list of people 01:57:57.520 --> 01:58:01.120 and instead just have one dictionary of people, the downside of which 01:58:01.120 --> 01:58:04.660 is that you can only have one key, one value, one key, one value. 01:58:04.660 --> 01:58:09.013 You can't have a name key and a number key and an email key and an address key 01:58:09.013 --> 01:58:11.680 and any number of other pieces of data that might be compelling. 01:58:11.680 --> 01:58:14.290 But if you've only got key value pairs like this, 01:58:14.290 --> 01:58:17.860 we can tighten up this code significantly so that now, down here, 01:58:17.860 --> 01:58:19.360 I can actually do this. 01:58:19.360 --> 01:58:24.580 If the name I'm looking for is somewhere in that people dictionary, 01:58:24.580 --> 01:58:27.640 then go ahead and get the person's number 01:58:27.640 --> 01:58:32.740 by going into the people dictionary, indexing into it at that person's name, 01:58:32.740 --> 01:58:39.100 and then printing out "Found," for instance, that here number, 01:58:39.100 --> 01:58:42.160 making this an f string, else you can go ahead 01:58:42.160 --> 01:58:44.827 and print out "Not found" in this case here. 01:58:44.827 --> 01:58:47.410 So, again, the difference is that the previous version created 01:58:47.410 --> 01:58:49.930 a list of dictionaries, and I very manually, 01:58:49.930 --> 01:58:52.420 methodically, iterated over it, looking for the person. 01:58:52.420 --> 01:58:55.030 But what's nice again about dictionaries is 01:58:55.030 --> 01:58:58.840 that Python gives you a lot of support for just looking into them easily. 01:58:58.840 --> 01:59:01.630 And this syntax, just like you can use it for lists, 01:59:01.630 --> 01:59:03.520 you can use it for dictionaries as well. 01:59:03.520 --> 01:59:08.210 And Python will look for that name among the keys in the dictionary. 
01:59:08.210 --> 01:59:15.610 And if it finds it, you use this syntax to get at that person's number. 01:59:15.610 --> 01:59:16.990 Whew, OK. 01:59:16.990 --> 01:59:21.250 A lot all at once, but are there any questions on this here syntax? 01:59:21.250 --> 01:59:25.220 We'll then introduce a couple of final features with a final flourish. 01:59:25.220 --> 01:59:25.910 Yes? 01:59:25.910 --> 01:59:28.722 AUDIENCE: This way [INAUDIBLE] break [INAUDIBLE]. 01:59:28.722 --> 01:59:30.930 DAVID MALAN: In this case, I do not need to use break 01:59:30.930 --> 01:59:33.030 because I don't have any loop involved. 01:59:33.030 --> 01:59:37.050 So break is only used, as we've seen it, in the context of looping 01:59:37.050 --> 01:59:39.600 over something when you want to terminate the loop early. 01:59:39.600 --> 01:59:42.360 But here Python is doing the searching for you. 01:59:42.360 --> 01:59:46.500 So Python is taking care of that automatically. 01:59:46.500 --> 01:59:47.000 All right. 01:59:47.000 --> 01:59:50.300 Just a couple of final features so that you have a couple of more building 01:59:50.300 --> 01:59:53.540 blocks-- here is the documentation for dictionaries themselves 01:59:53.540 --> 01:59:56.580 in case you want to poke around as to what more you can do with them. 01:59:56.580 --> 01:59:59.300 But it turns out that there are other libraries that 01:59:59.300 --> 02:00:01.880 come with Python, not even third-party, and one of them 02:00:01.880 --> 02:00:07.610 is the sys library, whereby you have system-related functionality. 02:00:07.610 --> 02:00:10.230 And here's its official documentation, for instance. 02:00:10.230 --> 02:00:12.980 But what this means is that certain functionality that was just 02:00:12.980 --> 02:00:17.480 immediately available in C is sometimes tucked away now into libraries 02:00:17.480 --> 02:00:18.080 in Python.
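The single-dictionary phone book discussed above might be sketched as follows, with names as keys and numbers as values. As before, the function find is a hypothetical wrapper added for illustration; the lecture reads the name with input() and prints directly.

```python
people = {
    "Carter": "+1-617-495-1000",
    "David": "+1-617-495-1000",
    "John": "+1-949-468-2750",
}


def find(name):
    # "in" on a dictionary looks among its keys.
    if name in people:
        number = people[name]  # index by key to get the value
        return f"Found {number}"
    else:
        return "Not found"


print(find("John"))
print(find("Eli"))
```

No loop and no break are needed, because the membership test and the indexing are both handled by the dictionary itself.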
02:00:18.080 --> 02:00:19.910 So, for instance, let me go over to VS Code 02:00:19.910 --> 02:00:23.150 here, and let me just create a program called greet.py, which 02:00:23.150 --> 02:00:25.490 is reminiscent of an old C program that just greets 02:00:25.490 --> 02:00:27.710 the user using command-line arguments. 02:00:27.710 --> 02:00:31.580 But in C, recall that we got access to command-line arguments with main() 02:00:31.580 --> 02:00:34.070 and argc and argv. 02:00:34.070 --> 02:00:36.870 But none of those have we seen at all today. 02:00:36.870 --> 02:00:39.540 And, in fact, main() itself is no longer required. 02:00:39.540 --> 02:00:43.010 So if you want to do command-line arguments in Python, 02:00:43.010 --> 02:00:44.270 you actually do this. 02:00:44.270 --> 02:00:48.720 From the sys library, you can import something called argv. 02:00:48.720 --> 02:00:50.260 So argv still exists. 02:00:50.260 --> 02:00:54.040 It's just tucked away inside of this library, otherwise known as a module. 02:00:54.040 --> 02:00:55.630 And I can then do this. 02:00:55.630 --> 02:01:00.450 If the length of argv, for instance, does not equal 2, 02:01:00.450 --> 02:01:02.918 well, then, we're going to go ahead and do what we did-- 02:01:02.918 --> 02:01:03.960 or rather, let's do this. 02:01:03.960 --> 02:01:06.330 If the length of argv does equal 2, we're 02:01:06.330 --> 02:01:08.580 going to go ahead and do what we did a couple of weeks 02:01:08.580 --> 02:01:11.670 ago, whereby I'm going to print out "hello," 02:01:11.670 --> 02:01:15.490 comma, and then argv bracket 1, for instance, 02:01:15.490 --> 02:01:18.540 so whatever is in location 1 of that list. 02:01:18.540 --> 02:01:21.990 Else, if the length of argv is not equal to 2-- that is, 02:01:21.990 --> 02:01:24.120 the human did not type two words at the prompt-- 02:01:24.120 --> 02:01:27.730 let's go ahead and print out "hello, world" by default. 02:01:27.730 --> 02:01:31.860 So we did the exact same thing in C. 
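A rough sketch of what greet.py as just described might look like. The greeting() helper is a hypothetical addition, not part of the lecture's code; it takes the argument list as a parameter so the same logic can be tried on hand-written lists as well as on the real argv.

```python
from sys import argv


def greeting(args):
    """Build the greeting for a list shaped like argv (hypothetical helper)."""
    # args[0] is the script's name, so one extra word makes the length 2.
    if len(args) == 2:
        return f"hello, {args[1]}"
    else:
        return "hello, world"


# Uses the real command-line arguments, e.g. when run as: python greet.py David
print(greeting(argv))

# The same logic applied to hand-written lists:
print(greeting(["greet.py", "David"]))  # hello, David
print(greeting(["greet.py"]))           # hello, world
```

Note that print() already ends its output with a newline, so no trailing backslash n is needed in the strings.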
The only difference here is that this now 02:01:31.860 --> 02:01:33.660 is how you get access to argv. 02:01:33.660 --> 02:01:38.020 So let me run this-- python of greet.py and hit Enter. "hello, world" is all I 02:01:38.020 --> 02:01:38.520 get. 02:01:38.520 --> 02:01:40.680 And actually, I got an extra line break because out of habit, 02:01:40.680 --> 02:01:43.320 I included backslash n, but I don't need that in Python. 02:01:43.320 --> 02:01:46.680 So let me fix that. python of greet.py-- "hello, world." 02:01:46.680 --> 02:01:50.160 But if I do python of greet.py, D-A-V-I-D, 02:01:50.160 --> 02:01:53.010 now notice that the length of argv equals 2. 02:01:53.010 --> 02:01:56.490 If I instead do something like Carter, the length of argv now equals 2. 02:01:56.490 --> 02:01:57.780 But there is a difference. 02:01:57.780 --> 02:02:02.280 Technically, I'm typing three words at the prompt, three words at the prompt, 02:02:02.280 --> 02:02:07.380 but the length of argv still only equals 2 because the command python is omitted from argv. 02:02:07.380 --> 02:02:11.440 It's only the name of your file and the thing you type after it. 02:02:11.440 --> 02:02:16.980 So that's then how we might print out arguments in Python using argv. 02:02:16.980 --> 02:02:20.490 Well, what else might we do using some of these here features? 02:02:20.490 --> 02:02:25.180 Well, it turns out that you can exit from programs using this same sys 02:02:25.180 --> 02:02:25.680 library. 02:02:25.680 --> 02:02:27.270 So let me close greet.py. 02:02:27.270 --> 02:02:30.545 Let me open up exit.py just for demonstration's sake. 02:02:30.545 --> 02:02:31.920 And let's do something like this. 02:02:31.920 --> 02:02:33.210 Let's import sys. 02:02:33.210 --> 02:02:37.920 And if the length of sys.argv-- 02:02:37.920 --> 02:02:40.050 so here's just another way of doing this. 02:02:40.050 --> 02:02:44.130 And actually, I'll do it the same first-- from sys import argv.
02:02:44.130 --> 02:02:49.483 If the length of argv does not equal 2-- 02:02:49.483 --> 02:02:51.650 well, let's actually yell at the user with something 02:02:51.650 --> 02:02:54.800 like "Missing command-line argument." 02:02:54.800 --> 02:03:01.220 And then what we can do is exit out of the program entirely using sys.exit(), 02:03:01.220 --> 02:03:02.750 which is a function therein. 02:03:02.750 --> 02:03:05.282 But notice that exit() is a function in sys. 02:03:05.282 --> 02:03:05.990 So you know what? 02:03:05.990 --> 02:03:07.980 It's actually more convenient in this case. 02:03:07.980 --> 02:03:09.590 Let's just import all of sys. 02:03:09.590 --> 02:03:12.800 But because that has not given me direct access to argv, 02:03:12.800 --> 02:03:16.363 let me do sys.argv here and sys.exit() here. 02:03:16.363 --> 02:03:19.280 Otherwise, if all is well, let's just go ahead and print out something 02:03:19.280 --> 02:03:27.140 like "hello, sys.argv," bracket 1, close quote, and that will print out "hello, 02:03:27.140 --> 02:03:27.920 so-and-so." 02:03:27.920 --> 02:03:31.580 And when I'm ready to exit with a non-0-- 02:03:31.580 --> 02:03:35.580 with a 0 exit status, I can actually start to specify these things here. 02:03:35.580 --> 02:03:38.900 So just like in C, if you want to exit from a program with 1 or 2 02:03:38.900 --> 02:03:41.330 or anything else, you can use sys.exit. 02:03:41.330 --> 02:03:45.570 And if you want to exit with a 0, you can do this here instead. 02:03:45.570 --> 02:03:48.090 So we have the same capabilities as in C, 02:03:48.090 --> 02:03:51.720 just accessed a little bit differently. 02:03:51.720 --> 02:03:53.700 Let me propose that-- 02:03:53.700 --> 02:03:55.530 let's see. 02:03:55.530 --> 02:03:58.200 Let me propose that-- 02:03:58.200 --> 02:03:59.530 how about this? 02:04:03.480 --> 02:04:04.590 How about this? 
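One way the exit.py idea above could be sketched, with some assumptions: main() is a hypothetical wrapper that takes the argument list as a parameter, and the loop at the bottom exists only to make the exit statuses visible. It relies on the fact that sys.exit() works by raising SystemExit, which can be caught.

```python
import sys


def main(args):
    """Mimic exit.py for a list shaped like sys.argv (hypothetical helper)."""
    if len(args) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # non-zero status signals that something went wrong
    print(f"hello, {args[1]}")
    sys.exit(0)  # zero status signals success


# sys.exit() raises SystemExit, so the status can be observed by catching it.
for args in (["exit.py"], ["exit.py", "David"]):
    try:
        main(args)
    except SystemExit as e:
        print("exit status:", e.code)
```

In a real script the call would more likely be main(sys.argv) with no try/except, so that the status is handed back to the shell.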
02:04:04.590 --> 02:04:09.120 If we want to go ahead and create something a little more interactive, 02:04:09.120 --> 02:04:12.540 recall that there was that command a while back, namely 02:04:12.540 --> 02:04:15.862 pip, whereby I ran pip install face_recognition. 02:04:15.862 --> 02:04:17.820 That's one of the examples with which we began. 02:04:17.820 --> 02:04:21.150 And that allows me to install more functionality from a third party 02:04:21.150 --> 02:04:24.207 into my own code space or my programming environment more generally. 02:04:24.207 --> 02:04:26.290 Well, we can have a little fun with this, in fact. 02:04:26.290 --> 02:04:27.780 Let me go back to VS Code here. 02:04:27.780 --> 02:04:30.750 And just like there's a command in Linux called cowsay, 02:04:30.750 --> 02:04:32.730 whereby you can get the cow to say something, 02:04:32.730 --> 02:04:35.100 you can also use this kind of thing in Python. 02:04:35.100 --> 02:04:39.330 So if I do pip install cowsay, this, if it's not installed already, 02:04:39.330 --> 02:04:41.250 will install a library called cowsay. 02:04:41.250 --> 02:04:44.160 And what this means is that if I actually want to code up a program 02:04:44.160 --> 02:04:48.540 called, like, moo.py, I can import the cowsay library, 02:04:48.540 --> 02:04:52.080 and I can do something simple like cowsay.cow, 02:04:52.080 --> 02:04:55.620 because there's a function in this library called cow(), 02:04:55.620 --> 02:04:59.970 and I can say something like "This is CS50," quote unquote. 02:04:59.970 --> 02:05:01.330 How do I run this program? 02:05:01.330 --> 02:05:05.700 I can run python of moo.py, and-- oh, underwhelming. 02:05:05.700 --> 02:05:09.210 If I increase the size of my terminal window, run python of moo.py, 02:05:09.210 --> 02:05:11.460 we have that same adorable cow as before, 02:05:11.460 --> 02:05:15.425 but I now have programmatic capabilities with which to manipulate it. 
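A small sketch of moo.py as described, assuming the third-party cowsay package has been installed with pip install cowsay. The say() helper and the plain-print fallback for a missing package are additions for illustration only, not part of the lecture's program.

```python
try:
    import cowsay  # third-party: pip install cowsay
except ImportError:
    cowsay = None  # fallback added for this sketch only


def say(text):
    """Have the cow say text, or print it plainly if cowsay is missing."""
    if cowsay is not None:
        cowsay.cow(text)  # prints an ASCII-art cow with a speech bubble
    else:
        print(text)


say("This is CS50")
```

As noted above, the terminal window may need to be enlarged for the cow to display properly.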
02:05:15.425 --> 02:05:18.300 And so, in fact, I could make this program a little more interesting. 02:05:18.300 --> 02:05:22.170 I could do something like name = quote-- 02:05:22.170 --> 02:05:25.650 or rather, name = input("What's your name?") and combine some 02:05:25.650 --> 02:05:26.580 of today's ideas. 02:05:26.580 --> 02:05:29.320 And now I can say not "This is CS50," but something like, 02:05:29.320 --> 02:05:32.520 quote unquote, "Hello," comma, person's name. 02:05:32.520 --> 02:05:36.720 And now, if I increase the size of my terminal, rerun python of moo.py, 02:05:36.720 --> 02:05:39.570 it's not going to actually moo or say "This is CS50." 02:05:39.570 --> 02:05:42.780 It's going to say something like "hello, David," and so forth. 02:05:42.780 --> 02:05:45.250 And suffice it to say through other functions, 02:05:45.250 --> 02:05:48.730 you can do not only cows but dragons and other fancy things, too. 02:05:48.730 --> 02:05:52.000 But even in Python, too, can you generate not just ASCII art, 02:05:52.000 --> 02:05:54.310 but actual art and actual images. 02:05:54.310 --> 02:05:57.845 And the note I thought we'd end on is doing one other library. 02:05:57.845 --> 02:05:59.470 I'm going to go back into VS Code here. 02:05:59.470 --> 02:06:01.240 I'm going to close moo.py. 02:06:01.240 --> 02:06:06.220 I'm going to do pip install qrcode, which is the name of a library 02:06:06.220 --> 02:06:10.570 that I might want to install to generate QR codes automatically. 02:06:10.570 --> 02:06:12.790 And QR codes are these two-dimensional bar codes. 02:06:12.790 --> 02:06:14.840 If you want to generate these things yourself, 02:06:14.840 --> 02:06:17.080 you don't have to go to a website and type in a URL. 02:06:17.080 --> 02:06:19.880 You can actually write this kind of code yourself. 02:06:19.880 --> 02:06:21.100 So how might I do this? 02:06:21.100 --> 02:06:26.890 Well, let me go into a new file called, say, qr.py. 
02:06:26.890 --> 02:06:28.370 And let me do this. 02:06:28.370 --> 02:06:34.030 Let me go ahead and import this library called qrcode. 02:06:34.030 --> 02:06:37.360 Let me go ahead and create a variable called img, or anything else. 02:06:37.360 --> 02:06:40.720 Let me set it equal to this qrcode library's function called 02:06:40.720 --> 02:06:43.090 make-- no relationship to C. It's just called make 02:06:43.090 --> 02:06:44.650 because you want to make a QR code. 02:06:44.650 --> 02:06:48.370 Let me type in, maybe, the URL of a lecture video here on YouTube-- 02:06:48.370 --> 02:07:01.990 so, like, youtu.be/xvFZjo5PgG0, quote unquote. 02:07:01.990 --> 02:07:07.490 And then I can go ahead and do img.save because inside of this img variable, 02:07:07.490 --> 02:07:10.540 which is a different data type that this library gave me-- 02:07:10.540 --> 02:07:12.130 it doesn't come with Python per se-- 02:07:12.130 --> 02:07:18.820 I can save a file like qr.png, and I can save it in the PNG format, the Portable 02:07:18.820 --> 02:07:19.665 Network Graphics. 02:07:19.665 --> 02:07:21.790 And so just to be clear, what this should hopefully 02:07:21.790 --> 02:07:26.980 do for me is create a QR code containing that particular URL, 02:07:26.980 --> 02:07:31.540 but not as text, but rather as an actual image that I can send, 02:07:31.540 --> 02:07:34.810 I can post online, or, in our case, generate into my code space, 02:07:34.810 --> 02:07:35.830 and then open. 02:07:35.830 --> 02:07:38.558 And so, with all that said, we've seen a bunch of new syntax 02:07:38.558 --> 02:07:39.850 today, a bunch of new features. 02:07:39.850 --> 02:07:45.160 But the ideas underlying Python are exactly the same as they've been in C. 02:07:45.160 --> 02:07:49.000 It's just that you don't have to do nearly as much heavy lifting yourself.
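A hedged sketch of qr.py as narrated, assuming the third-party qrcode package is installed with pip install qrcode. The make_qr() wrapper, the https:// prefix on the URL, and the handling of a missing package are assumptions added for illustration; the core remains the import, the call to qrcode.make, and the call to save.

```python
def make_qr(url, path):
    """Save a QR code for url as an image at path (hypothetical helper)."""
    import qrcode  # third-party: pip install qrcode

    img = qrcode.make(url)  # an image object provided by the library
    img.save(path)          # writes the image to disk


try:
    # The https:// prefix is an assumption; the narration gives only youtu.be/...
    make_qr("https://youtu.be/xvFZjo5PgG0", "qr.png")
    print("Saved qr.png")
except ImportError:
    print("qrcode is not installed; try: pip install qrcode")
```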
02:07:49.000 --> 02:07:51.850 And here, for instance, in just three lines of code, 02:07:51.850 --> 02:07:54.910 can you generate a massive QR code that people can scan, 02:07:54.910 --> 02:07:57.130 as you can in a moment with your phones, and actually 02:07:57.130 --> 02:07:59.870 link to something like a CS50 class. 02:07:59.870 --> 02:08:03.070 So let me go ahead and run python of qr.py. 02:08:03.070 --> 02:08:04.990 It seems to have run. 02:08:04.990 --> 02:08:09.760 Let me run code of qr.png, which is the file I created. 02:08:09.760 --> 02:08:12.760 I'll close my terminal window, allow you an opportunity 02:08:12.760 --> 02:08:17.930 to scan this here very CS50 lecture. 02:08:17.930 --> 02:08:25.183 And-- and-- is someone's volume up? 02:08:25.183 --> 02:08:26.850 [RICK ASTLEY, "NEVER GONNA GIVE YOU UP"] 02:08:26.850 --> 02:08:27.660 There we go. 02:08:27.660 --> 02:08:28.680 What a perfect ending. 02:08:28.680 --> 02:08:29.180 All right. 02:08:29.180 --> 02:08:30.120 That was CS50. 02:08:30.120 --> 02:08:32.670 We'll see you next time. 02:08:32.670 --> 02:08:35.720 [MUSIC PLAYING]