1 00:00:00,000 --> 00:00:06,461 [MUSIC PLAYING] 2 00:00:06,461 --> 00:00:48,562 3 00:00:48,562 --> 00:00:49,520 DAVID MALAN: All right. 4 00:00:49,520 --> 00:00:52,400 This is CS50, and this is week 6. 5 00:00:52,400 --> 00:00:55,950 And this is, again, one of those rare days where in just a bit of time 6 00:00:55,950 --> 00:00:58,338 you'll be able to say that you learned a new language. 7 00:00:58,338 --> 00:01:01,130 And that language today is going to be this language called Python. 8 00:01:01,130 --> 00:01:04,430 And we thought we'd begin by introducing Python 9 00:01:04,430 --> 00:01:06,110 by way of some more familiar friends. 10 00:01:06,110 --> 00:01:08,748 So this, of course, is where we began the course back in week 0 11 00:01:08,748 --> 00:01:10,790 when we introduced Scratch, a simple program that 12 00:01:10,790 --> 00:01:12,860 quite simply says "hello, world." 13 00:01:12,860 --> 00:01:16,100 And then very quickly, things escalated and became 14 00:01:16,100 --> 00:01:19,525 a lot more cryptic, a lot more arcane, and we introduced C and syntax 15 00:01:19,525 --> 00:01:21,650 like this, which of course do the exact same thing, 16 00:01:21,650 --> 00:01:25,430 just printing out "hello, world" on the screen, but with the requirement 17 00:01:25,430 --> 00:01:28,880 that you understand and you include all of this various syntax. 18 00:01:28,880 --> 00:01:34,520 So today, all of this complexity, all of the syntax from C, 19 00:01:34,520 --> 00:01:38,780 suddenly begins to melt away, such that we're 20 00:01:38,780 --> 00:01:40,880 left with this new language called Python 21 00:01:40,880 --> 00:01:44,530 that's going to achieve the exact same goal simply with this line of code 22 00:01:44,530 --> 00:01:45,030 here. 23 00:01:45,030 --> 00:01:48,440 Which is to say that Python tends to be more accessible, 24 00:01:48,440 --> 00:01:50,270 it tends to be a little easier. 
25 00:01:50,270 --> 00:01:53,750 But that's because it's built on this tradition of having started, 26 00:01:53,750 --> 00:01:57,290 as humans years ago, building these low-level languages like C, 27 00:01:57,290 --> 00:02:00,240 realizing what features are missing, what some of the pain points are, 28 00:02:00,240 --> 00:02:04,340 and then layering on top of those older languages new ideas, new features, 29 00:02:04,340 --> 00:02:05,730 and in turn new languages. 30 00:02:05,730 --> 00:02:09,210 So there are dozens, hundreds really, of programming languages out there. 31 00:02:09,210 --> 00:02:13,400 But there's always a subset of them that tend to be very popular, very in vogue 32 00:02:13,400 --> 00:02:14,210 at any given time. 33 00:02:14,210 --> 00:02:17,090 Python is among those very popular languages. 34 00:02:17,090 --> 00:02:20,270 And it's the third of our languages that we'll look at, indeed, 35 00:02:20,270 --> 00:02:22,290 at this point in the term. 36 00:02:22,290 --> 00:02:25,070 So let's go ahead and introduce some of the syntax of Python, 37 00:02:25,070 --> 00:02:28,280 really by way of comparison with what we've seen in the past. 38 00:02:28,280 --> 00:02:31,130 Because no matter how new some of today's topics are, 39 00:02:31,130 --> 00:02:34,760 they should all be familiar in the sense that we're going to see loops again, 40 00:02:34,760 --> 00:02:38,540 conditions, variables, functions, return values. 41 00:02:38,540 --> 00:02:41,810 There's pretty much just going to be a translation of features 42 00:02:41,810 --> 00:02:43,378 past to now features present. 43 00:02:43,378 --> 00:02:45,170 So this of course, in the world of Scratch, 44 00:02:45,170 --> 00:02:48,380 was just one puzzle piece or a function, whose purpose in life 45 00:02:48,380 --> 00:02:50,300 is to say "hello, world" on the screen. 
46 00:02:50,300 --> 00:02:54,320 In week 1, we translated this to the more cryptic syntax here, 47 00:02:54,320 --> 00:02:58,850 key details being that it's printf, that you have the quote, the string, 48 00:02:58,850 --> 00:03:03,630 "hello, world," you have this backslash n to represent a new line character. 49 00:03:03,630 --> 00:03:06,800 And then of course, this kind of statement has to end with a semicolon. 50 00:03:06,800 --> 00:03:10,310 The equivalent line of code today on out in this language 51 00:03:10,310 --> 00:03:12,930 called Python is going to be quite simply this. 52 00:03:12,930 --> 00:03:16,850 So it looks similar, certainly, but it's now print instead of printf. 53 00:03:16,850 --> 00:03:21,620 We still have the double quotes, but gone are the backslash n as well as 54 00:03:21,620 --> 00:03:22,470 the semicolon. 55 00:03:22,470 --> 00:03:25,303 So if you've been kicking yourself all too frequently for forgetting 56 00:03:25,303 --> 00:03:29,150 stupid things like the semicolons, Python will now be your friend. 57 00:03:29,150 --> 00:03:31,160 Well, let's take a look at another example 58 00:03:31,160 --> 00:03:35,150 here, how we might go about getting user input as well. 59 00:03:35,150 --> 00:03:38,360 Well, here notice that we have a puzzle piece called Ask. 60 00:03:38,360 --> 00:03:40,610 And it says, ask "What's your name?" and wait. 61 00:03:40,610 --> 00:03:43,580 And the next puzzle piece said, whatever the human had typed in, 62 00:03:43,580 --> 00:03:45,290 precede it with the word "hello." 63 00:03:45,290 --> 00:03:47,960 In C we saw code like this-- string_answer 64 00:03:47,960 --> 00:03:50,360 equals get_string "what's your name?" 65 00:03:50,360 --> 00:03:54,080 and then printing out with printf, "hello %s," 66 00:03:54,080 --> 00:03:56,810 plugging in one value for the other. 67 00:03:56,810 --> 00:04:00,230 In Python, some of this complexity is about to melt away, too. 
68 00:04:00,230 --> 00:04:03,690 And in Python, we're going to see a little something like this. 69 00:04:03,690 --> 00:04:07,100 So no longer present is the mention of the type of variable. 70 00:04:07,100 --> 00:04:10,110 No longer present is the semicolon at the end. 71 00:04:10,110 --> 00:04:15,102 And no longer present is the %s and that additional argument to print. 72 00:04:15,102 --> 00:04:17,519 So in fact, let's go ahead and see these things in action. 73 00:04:17,519 --> 00:04:21,320 I'm going to go ahead and go over to CS50 IDE here for just a moment. 74 00:04:21,320 --> 00:04:24,830 And within CS50 IDE, I'm going to go ahead and write 75 00:04:24,830 --> 00:04:26,930 my very first Python program. 76 00:04:26,930 --> 00:04:30,140 And to do that, I'm going to go ahead and create a file that we'll initially 77 00:04:30,140 --> 00:04:31,730 call hello.py. 78 00:04:31,730 --> 00:04:35,870 Much like in the world of C, Python programs have a standard file extension 79 00:04:35,870 --> 00:04:38,270 being .py instead of .c. 80 00:04:38,270 --> 00:04:41,390 And I'm just going to do what I proposed was the simplest translation. 81 00:04:41,390 --> 00:04:45,170 I'm just going to go ahead and say print, "hello, world." 82 00:04:45,170 --> 00:04:46,430 I'm going to save my file. 83 00:04:46,430 --> 00:04:48,110 And then I'm going to go down to my terminal window. 84 00:04:48,110 --> 00:04:50,420 And in the past, of course, we would have used make, 85 00:04:50,420 --> 00:04:54,420 and then we would have done ./hello or the like. 86 00:04:54,420 --> 00:04:58,940 But today, I'm quite simply going to run a command that itself is called Python. 87 00:04:58,940 --> 00:05:01,430 I'm going to pass in the name of the file I just 88 00:05:01,430 --> 00:05:03,620 created as its command line argument. 89 00:05:03,620 --> 00:05:08,630 And voila, hitting Enter, there is my very first program in Python. 90 00:05:08,630 --> 00:05:09,960 So that's pretty powerful. 
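[Editor's sketch: the hello.py file described above probably looks something like the following. It is reconstructed from the spoken description, not copied from the lecture's screen.]

```python
# hello.py: the one-line program described in the lecture.
# print adds the trailing newline itself, so no "\n" is written,
# and the statement needs no semicolon.
print("hello, world")
```

Saved as hello.py, it would be run from the terminal with `python hello.py`, with no separate compile step.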
91 00:05:09,960 --> 00:05:14,398 Let's go ahead and create the second program that I proposed a moment ago. 92 00:05:14,398 --> 00:05:16,190 Instead of just printing out "hello, world" 93 00:05:16,190 --> 00:05:18,770 the whole time, I'm also going to go ahead this time 94 00:05:18,770 --> 00:05:22,070 and give myself a variable that I'll call answer. 95 00:05:22,070 --> 00:05:25,650 I'm going to go ahead now and get input from the user. 96 00:05:25,650 --> 00:05:28,025 And I'm going to go ahead and use the familiar get_string 97 00:05:28,025 --> 00:05:29,540 that we did see in C. 98 00:05:29,540 --> 00:05:33,080 I'm going to go ahead and ask, "What's your name" question mark. 99 00:05:33,080 --> 00:05:35,370 I'm not going to bother with a semicolon. 100 00:05:35,370 --> 00:05:40,160 But down here, I'm going to go ahead and say print "hello," comma, and then 101 00:05:40,160 --> 00:05:41,870 a space inside of the quotes. 102 00:05:41,870 --> 00:05:46,340 And instead of doing something like %s, I'm actually going to go ahead and just 103 00:05:46,340 --> 00:05:50,510 do a plus operator, and then literally the word "answer." 104 00:05:50,510 --> 00:05:53,527 But the catch is that this isn't going to work just yet. 105 00:05:53,527 --> 00:05:56,360 This isn't going to work just yet, because get_string, it turns out, 106 00:05:56,360 --> 00:05:59,357 just like it doesn't come with C, it also doesn't come with Python. 107 00:05:59,357 --> 00:06:01,190 So I need to do one thing that's going to be 108 00:06:01,190 --> 00:06:02,732 a little bit different from the past. 109 00:06:02,732 --> 00:06:05,870 Instead of hash including something, I'm going to literally say 110 00:06:05,870 --> 00:06:09,110 from cs50 import get_string. 111 00:06:09,110 --> 00:06:11,310 So in the world of C, recall that we included 112 00:06:11,310 --> 00:06:15,680 cs50.h, which had declarations for functions like get_string and get_int 113 00:06:15,680 --> 00:06:16,430 and so forth. 
114 00:06:16,430 --> 00:06:19,350 In the world of Python, we're going to show you something similar in spirit, 115 00:06:19,350 --> 00:06:21,100 but the syntax is just a little different. 116 00:06:21,100 --> 00:06:25,460 We're going to say from cs50, which is our Python library that we the staff 117 00:06:25,460 --> 00:06:28,880 wrote, import, that is, include a function specifically 118 00:06:28,880 --> 00:06:29,915 called get_string. 119 00:06:29,915 --> 00:06:32,870 And now any errors that I might have seen a moment ago on the screen 120 00:06:32,870 --> 00:06:33,770 have disappeared. 121 00:06:33,770 --> 00:06:39,590 If I go ahead and save this file and now do python space hello.py and hit Enter, 122 00:06:39,590 --> 00:06:44,090 now I can go ahead and type in my actual name, and voila, I see "hello," comma, 123 00:06:44,090 --> 00:06:44,630 "David." 124 00:06:44,630 --> 00:06:47,120 So let's tease apart what's different about this code 125 00:06:47,120 --> 00:06:50,040 and consider what more we can do after this. 126 00:06:50,040 --> 00:06:53,870 So again, notice-- on line 3, there's no mention of string anymore. 127 00:06:53,870 --> 00:06:56,150 If I want a variable, I just go ahead and give myself 128 00:06:56,150 --> 00:06:57,620 a variable called answer. 129 00:06:57,620 --> 00:07:01,010 The function is still called get_string, and it still takes an argument just 130 00:07:01,010 --> 00:07:04,520 like the C version, but the line no longer ends with a semicolon. 131 00:07:04,520 --> 00:07:09,020 On my final line of code here, print is now indeed print instead of printf. 132 00:07:09,020 --> 00:07:10,505 And then this is new syntax. 133 00:07:10,505 --> 00:07:13,130 But in some sense, it's going to be a lot more straightforward. 134 00:07:13,130 --> 00:07:17,330 Instead of having to think in advance where I want the %s and my placeholder, 135 00:07:17,330 --> 00:07:20,750 this plus operator seems to be doing something for me. 
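[Editor's sketch: a minimal version of the program discussed above, assuming the three-line layout the speaker describes. The call to CS50's get_string is left commented out and replaced by a hard-coded stand-in so the sketch runs without the cs50 package installed.]

```python
# In the lecture, the first lines are:
#   from cs50 import get_string
#   answer = get_string("What's your name? ")
answer = "David"  # stand-in (assumption) for the value the user would type

# The + operator joins (concatenates) two strings; no %s placeholder is needed.
greeting = "hello, " + answer
print(greeting)
```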
136 00:07:20,750 --> 00:07:23,210 And let me go ahead and ask a question of the group here. 137 00:07:23,210 --> 00:07:26,300 What does that plus operator seem to be doing? 138 00:07:26,300 --> 00:07:29,840 Because it's not addition in the arithmetic sense. 139 00:07:29,840 --> 00:07:32,180 We're not like adding numbers together. 140 00:07:32,180 --> 00:07:35,790 But the plus is clearly doing something that gives us a visual result. 141 00:07:35,790 --> 00:07:37,850 Any thoughts from Peter? 142 00:07:37,850 --> 00:07:39,080 What's this plus doing? 143 00:07:39,080 --> 00:07:40,973 AUDIENCE: It's concatenating strings. 144 00:07:40,973 --> 00:07:42,890 DAVID MALAN: Yeah, it's concatenating strings, 145 00:07:42,890 --> 00:07:47,220 which is the term of art to describe the joining of one string and the other. 146 00:07:47,220 --> 00:07:50,240 So it's quite like, therefore, Scratch's own Join block. 147 00:07:50,240 --> 00:07:53,360 We now have a literal translation of that Join block, 148 00:07:53,360 --> 00:07:57,458 which we didn't have in C. In C we had to use printf, we had to use %s. 149 00:07:57,458 --> 00:07:59,750 Python is going to be a little more user friendly, such 150 00:07:59,750 --> 00:08:01,833 that if you want to join two strings like "hello," 151 00:08:01,833 --> 00:08:04,490 comma, space, and the contents of that variable, 152 00:08:04,490 --> 00:08:06,770 we can just use this plus operator instead. 153 00:08:06,770 --> 00:08:09,380 And the last thing that we had to do was, of course, 154 00:08:09,380 --> 00:08:11,870 import this library so that we have access 155 00:08:11,870 --> 00:08:13,493 to the get_string function itself. 156 00:08:13,493 --> 00:08:16,160 Well, let's go ahead and take a tour of just some other features 157 00:08:16,160 --> 00:08:20,730 of Python and then dive in primarily to a lot of hands-on examples today. 
158 00:08:20,730 --> 00:08:23,850 So recall that in the example we just saw, 159 00:08:23,850 --> 00:08:26,600 we had this first line of code, which gets a string from the user, 160 00:08:26,600 --> 00:08:29,000 stores it in a variable called answer. 161 00:08:29,000 --> 00:08:31,250 We had this second line of code, which as Peter notes, 162 00:08:31,250 --> 00:08:33,500 concatenated two values together. 163 00:08:33,500 --> 00:08:38,360 But it turns out, even though this is definitely more convenient than in C 164 00:08:38,360 --> 00:08:40,700 in that you can just take an existing string and another 165 00:08:40,700 --> 00:08:44,480 and join them together without having to use format strings or the like, 166 00:08:44,480 --> 00:08:47,150 well, it turns out there's another way, there's frankly many 167 00:08:47,150 --> 00:08:50,060 ways in languages like Python to achieve the same result. 168 00:08:50,060 --> 00:08:53,540 And I'm going to go ahead and propose that we now change this line here 169 00:08:53,540 --> 00:08:55,160 to this funky syntax. 170 00:08:55,160 --> 00:08:57,710 So definitely ugly at first glance, and that's 171 00:08:57,710 --> 00:09:01,100 partly because this is a relatively new feature of Python. 172 00:09:01,100 --> 00:09:06,170 But notice that in Python we can use these curly braces, so curly braces 173 00:09:06,170 --> 00:09:11,280 that we have used in C, to plug in an actual value of a variable here. 174 00:09:11,280 --> 00:09:15,980 So instead of %s, Python's print function uses these curly braces that 175 00:09:15,980 --> 00:09:18,950 essentially say, plug in a value here. 176 00:09:18,950 --> 00:09:20,660 But there's one oddity here. 177 00:09:20,660 --> 00:09:25,220 You can't just start putting curly braces and variable names into strings, 178 00:09:25,220 --> 00:09:27,230 that is quoted strings in Python. 179 00:09:27,230 --> 00:09:32,750 You also have to tell the language that what follows is a formatted string. 
180 00:09:32,750 --> 00:09:35,060 So this is perhaps the weirdest thing we've seen yet. 181 00:09:35,060 --> 00:09:36,950 But when you do have a pair of double quotes 182 00:09:36,950 --> 00:09:41,240 like I have here, prefixing it with an f will actually 183 00:09:41,240 --> 00:09:44,060 tell the computer to format the contents of that string, 184 00:09:44,060 --> 00:09:47,150 plugging in values between those curly braces, as opposed to 185 00:09:47,150 --> 00:09:50,540 literally printing those curly braces themselves. 186 00:09:50,540 --> 00:09:54,740 So let me go ahead and transition to my actual code here and try this out. 187 00:09:54,740 --> 00:09:58,490 Instead of using the concatenation operator as Peter described it, 188 00:09:58,490 --> 00:10:01,070 this plus operator, let me literally go ahead 189 00:10:01,070 --> 00:10:04,860 and say, "hello, answer," initially. 190 00:10:04,860 --> 00:10:07,580 So this is probably not going to be the right approach, 191 00:10:07,580 --> 00:10:10,760 because if I rerun this program, python of hello.py, 192 00:10:10,760 --> 00:10:12,260 it's going to ask me what's my name. 193 00:10:12,260 --> 00:10:14,093 I'm going to type in "David," and it's going 194 00:10:14,093 --> 00:10:18,140 to ignore me altogether, because I literally hardcoded "hello, answer." 195 00:10:18,140 --> 00:10:20,990 But it's also not going to be quite right to just start 196 00:10:20,990 --> 00:10:25,520 putting that in curly braces, because if I again run this program, python 197 00:10:25,520 --> 00:10:28,190 of hello.py, and type in my name, now it's going 198 00:10:28,190 --> 00:10:31,350 to say "hello, squiggly brace answer." 199 00:10:31,350 --> 00:10:33,620 So here is just a subtle change where I have 200 00:10:33,620 --> 00:10:38,390 to tell Python that this type of string between the double quotes is in fact 201 00:10:38,390 --> 00:10:39,830 a formatted string. 
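[Editor's sketch: the three attempts described above, side by side. The variable names plain, braces, and formatted are illustrative only, and answer is hard-coded here in place of the get_string call.]

```python
answer = "David"  # stand-in (assumption) for what get_string would return

plain = "hello, answer"          # no braces: the word "answer" is used literally
braces = "hello, {answer}"       # braces without the f prefix: braces are kept literally
formatted = f"hello, {answer}"   # f prefix: the value of answer is plugged in

print(plain)
print(braces)
print(formatted)
```

Only the third string contains the name that was typed in; the first two ignore the variable entirely.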
202 00:10:39,830 --> 00:10:43,370 And now if I rerun python of hello.py and type in "David," 203 00:10:43,370 --> 00:10:45,347 I now get "hello, David." 204 00:10:45,347 --> 00:10:47,930 So it's marginally more convenient than C, because, again, you 205 00:10:47,930 --> 00:10:50,722 don't have to have a placeholder here, a placeholder here, and then 206 00:10:50,722 --> 00:10:52,730 a comma separated list of additional arguments. 207 00:10:52,730 --> 00:10:55,500 So it's just a more succinct way, if you will, 208 00:10:55,500 --> 00:10:59,900 to actually introduce more values into a string that you want to create. 209 00:10:59,900 --> 00:11:03,470 These are called format strings, or for short f-strings. 210 00:11:03,470 --> 00:11:06,740 And it's a new feature that we now have in our toolkit when programming 211 00:11:06,740 --> 00:11:08,540 with this new language called Python. 212 00:11:08,540 --> 00:11:11,660 Well, let's take a look at a few other translations of puzzle pieces 213 00:11:11,660 --> 00:11:13,760 to see, and then turn to Python and then start 214 00:11:13,760 --> 00:11:16,050 building some programs of our own. 215 00:11:16,050 --> 00:11:20,000 So here in Scratch, this was an example early on of a variable 216 00:11:20,000 --> 00:11:22,640 called counter, initializing it to 0. 217 00:11:22,640 --> 00:11:26,990 In C, in week 1, we started translating that to code like this-- int counter 218 00:11:26,990 --> 00:11:28,910 equals 0 semicolon. 219 00:11:28,910 --> 00:11:33,020 And that gave us a variable of type int whose initial value was 0. 220 00:11:33,020 --> 00:11:35,690 In Python, the code is going to be similar-- 221 00:11:35,690 --> 00:11:39,590 similar, but it's going to be a little simpler still. 222 00:11:39,590 --> 00:11:44,030 Notice that I don't have to in Python mention the type of variable I want. 223 00:11:44,030 --> 00:11:46,520 It will infer from context what it is. 
224 00:11:46,520 --> 00:11:48,930 And I also don't have to have the semicolon there. 225 00:11:48,930 --> 00:11:53,750 So counter equals 0 in Python is going to give you a variable called counter. 226 00:11:53,750 --> 00:11:57,410 And because you're assigning it the value 0, Python itself 227 00:11:57,410 --> 00:11:59,780 the language will infer that, oh, you must 228 00:11:59,780 --> 00:12:02,510 mean this to be an int or an integer. 229 00:12:02,510 --> 00:12:04,010 What else did we see in Scratch? 230 00:12:04,010 --> 00:12:05,540 Change counter by 1. 231 00:12:05,540 --> 00:12:08,780 So this was a way of increasing the value of a variable by 1. 232 00:12:08,780 --> 00:12:11,600 In C, we had a few different ways to implement this. 233 00:12:11,600 --> 00:12:14,360 We could say counter equals counter plus 1. 234 00:12:14,360 --> 00:12:17,160 It's kind of pedantic, it's kind of long and tedious to type. 235 00:12:17,160 --> 00:12:19,610 So instead, we had some shorthand notation that 236 00:12:19,610 --> 00:12:23,140 allowed us to do it this way instead. 237 00:12:23,140 --> 00:12:27,200 In C, we were able to do counter plus equals 1, 238 00:12:27,200 --> 00:12:29,850 and that was going to achieve the same result. 239 00:12:29,850 --> 00:12:32,940 Well, in Python we actually have a couple of approaches as well. 240 00:12:32,940 --> 00:12:37,130 We can, much like in C, say it explicitly like this 241 00:12:37,130 --> 00:12:38,700 but just omit the semicolon. 242 00:12:38,700 --> 00:12:40,730 So counter equals counter plus 1. 243 00:12:40,730 --> 00:12:44,420 The logic in Python is exactly the same as in C. 244 00:12:44,420 --> 00:12:48,800 And as for this shorthand notation, this also exists in Python, again 245 00:12:48,800 --> 00:12:50,150 without the semicolon. 
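[Editor's sketch: the counter statements described above, written out as Python. The final print line is an addition for illustration.]

```python
counter = 0             # no type keyword; Python infers an int from the value 0
counter = counter + 1   # the explicit form, same logic as in C, minus the semicolon
counter += 1            # the shorthand form, also without a semicolon

print(counter, type(counter).__name__)
```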
246 00:12:50,150 --> 00:12:55,310 The one thing that does not exist in Python at this point in the story is 247 00:12:55,310 --> 00:13:01,850 that fancy counter++ syntax, or i++, that syntactic sugar that made it even 248 00:13:01,850 --> 00:13:04,040 more succinct to just increment a variable, 249 00:13:04,040 --> 00:13:06,710 unfortunately does not exist in Python. 250 00:13:06,710 --> 00:13:12,170 But you can do counter plus equals 1, or whatever your variable happens to be. 251 00:13:12,170 --> 00:13:14,420 Well, what else did we see in Scratch and then C? 252 00:13:14,420 --> 00:13:15,360 Recall this. 253 00:13:15,360 --> 00:13:18,290 We introduced, of course, conditions pretty early on. 254 00:13:18,290 --> 00:13:20,300 And those conditions use Boolean expressions 255 00:13:20,300 --> 00:13:23,570 to decide whether to do this, or this other thing, or something else 256 00:13:23,570 --> 00:13:24,410 altogether. 257 00:13:24,410 --> 00:13:28,850 In C, we converted this to what looked kind of similar. 258 00:13:28,850 --> 00:13:32,300 Indeed, the curly braces kind of hug the printf line, just 259 00:13:32,300 --> 00:13:36,020 like the yellow condition here hugs the purple Say block. 260 00:13:36,020 --> 00:13:41,060 And we had parentheses around the Boolean expression, like x less than y. 261 00:13:41,060 --> 00:13:43,730 We again used printf inside of the curly braces 262 00:13:43,730 --> 00:13:48,500 which had double quotes, a backslash n for a new line, and a semicolon. 263 00:13:48,500 --> 00:13:52,130 Python, nicely enough, is going to be sort of identical in spirit 264 00:13:52,130 --> 00:13:54,080 but simpler syntactically. 265 00:13:54,080 --> 00:13:57,930 What Python is going to look like henceforth is just this. 266 00:13:57,930 --> 00:14:01,700 So the parentheses around the x less than y go away. 267 00:14:01,700 --> 00:14:03,980 The curly braces go away. 268 00:14:03,980 --> 00:14:05,540 The new line goes away. 
269 00:14:05,540 --> 00:14:07,550 And the semicolon goes away. 270 00:14:07,550 --> 00:14:11,510 And here you see just a tiny example of evolution of humans' programming 271 00:14:11,510 --> 00:14:12,440 languages. 272 00:14:12,440 --> 00:14:14,900 If you and I have been frustrated for some time about all 273 00:14:14,900 --> 00:14:17,630 the stupid semicolons and curly braces all over the place, 274 00:14:17,630 --> 00:14:20,040 it makes it harder, in some sense, for your code to read, 275 00:14:20,040 --> 00:14:23,468 let alone being correct, humans decided when inventing new languages 276 00:14:23,468 --> 00:14:25,760 that, you know what, why don't we just say what we mean 277 00:14:25,760 --> 00:14:29,330 and not worry as much about all of this syntactic complexity? 278 00:14:29,330 --> 00:14:30,560 Let's keep things simpler. 279 00:14:30,560 --> 00:14:34,250 And indeed, that's what we see here, is one example in Python. 280 00:14:34,250 --> 00:14:35,930 But there's a key detail. 281 00:14:35,930 --> 00:14:37,940 If any of you have been in the habit, when 282 00:14:37,940 --> 00:14:41,630 writing code in C, of being a little sloppy when it comes 283 00:14:41,630 --> 00:14:44,900 to your indentation, and maybe style50 is constantly 284 00:14:44,900 --> 00:14:49,010 yelling at you to add spaces, add spaces, or remove spaces or lines, 285 00:14:49,010 --> 00:14:55,100 well, in Python it is now necessary to indent your code correctly. 286 00:14:55,100 --> 00:14:58,820 In C, of course, we, CS50 and a lot of the world in general 287 00:14:58,820 --> 00:15:03,260 recommend that you indent your code by 4 spaces, typically, or one tab. 288 00:15:03,260 --> 00:15:06,590 In the context of Python, you must do so. 289 00:15:06,590 --> 00:15:11,420 If you accidentally omit these spaces just to the left of the print statement 290 00:15:11,420 --> 00:15:14,480 here, your Python code is not going to run at all. 
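[Editor's sketch: the indented condition described above, followed by a check that the unindented version is rejected. The values of x and y, the printed message, and the names bad_source and rejected are illustrative assumptions; compile is used here only so the failure can be shown without crashing the sketch.]

```python
x = 1
y = 2

# Correctly indented: the print line belongs to the if.
if x < y:
    print("x is less than y")

# The same two lines with the indentation omitted. Python refuses to
# compile this source at all, so none of it would ever run.
bad_source = 'if x < y:\nprint("x is less than y")\n'
try:
    compile(bad_source, "<example>", "exec")
    rejected = False
except IndentationError:
    rejected = True

print(rejected)
```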
291 00:15:14,480 --> 00:15:17,400 The Python program just won't work. 292 00:15:17,400 --> 00:15:19,257 So no more sloppiness. 293 00:15:19,257 --> 00:15:20,840 Python is going to impose this on you. 294 00:15:20,840 --> 00:15:24,152 But the upside is you don't have to bother including the curly braces. 295 00:15:24,152 --> 00:15:26,360 What about a more complicated condition where there's 296 00:15:26,360 --> 00:15:30,470 two paths you can follow, if or else? 297 00:15:30,470 --> 00:15:34,460 Well, in this case in C, we translated it pretty straightforwardly like this. 298 00:15:34,460 --> 00:15:38,960 Again, parentheses up here, curly braces here and here, backslash n, 299 00:15:38,960 --> 00:15:40,640 backslash n, and semicolon. 300 00:15:40,640 --> 00:15:42,440 You can perhaps guess in Python that this 301 00:15:42,440 --> 00:15:45,140 is going to get a little more compact, because boom, 302 00:15:45,140 --> 00:15:47,700 now we don't need the parentheses anymore. 303 00:15:47,700 --> 00:15:50,380 We do need to indent, but we don't need the curly braces. 304 00:15:50,380 --> 00:15:53,450 We don't need the new line, and we don't need the semicolon. 305 00:15:53,450 --> 00:15:57,670 So we're sort of shedding features that can be taken now for granted. 306 00:15:57,670 --> 00:16:01,430 What about this example in Scratch when we had a three-way fork in the road, 307 00:16:01,430 --> 00:16:03,790 if, else, if, else? 308 00:16:03,790 --> 00:16:07,960 Well, in Python-- or rather in C, we would have translated this like this. 309 00:16:07,960 --> 00:16:09,560 And there's not much going on there. 310 00:16:09,560 --> 00:16:13,090 But it's a pretty substantive number of lines of code, some 12 lines, 311 00:16:13,090 --> 00:16:14,980 just to achieve this simple idea. 
312 00:16:14,980 --> 00:16:17,800 In Python, notice what's going to go away here 313 00:16:17,800 --> 00:16:22,345 is, again those parentheses, again those curly braces, again the backslash n, 314 00:16:22,345 --> 00:16:24,340 and the semicolon. 315 00:16:24,340 --> 00:16:26,890 There's only one oddity here. 316 00:16:26,890 --> 00:16:28,120 There's only one oddity. 317 00:16:28,120 --> 00:16:31,120 What looks wrong or weird to you? 318 00:16:31,120 --> 00:16:34,720 Maybe, what looks like a typo to you? 319 00:16:34,720 --> 00:16:38,550 And I promise I haven't screwed up here. 320 00:16:38,550 --> 00:16:40,620 Maybe elsewhere, but not here. 321 00:16:40,620 --> 00:16:42,130 Andrew? 322 00:16:42,130 --> 00:16:46,525 AUDIENCE: I would say the elif instead of else if is different syntactically. 323 00:16:46,525 --> 00:16:47,400 DAVID MALAN: Exactly. 324 00:16:47,400 --> 00:16:53,100 So whereas in C we would literally say else if, in Python, humans years ago, 325 00:16:53,100 --> 00:16:57,960 decided, heck, why say else if and waste all of that time typing that out if you 326 00:16:57,960 --> 00:17:02,640 can more succinctly say "elif" as one word, E-L-I-F. So indeed, 327 00:17:02,640 --> 00:17:04,020 this is correct syntax here. 328 00:17:04,020 --> 00:17:05,312 And you can have more of those. 329 00:17:05,312 --> 00:17:10,440 You can have four forks in the road, five, six, any number thereafter. 330 00:17:10,440 --> 00:17:12,425 But the syntax is indeed a little different. 331 00:17:12,425 --> 00:17:13,800 But it's a little tighter, right? 332 00:17:13,800 --> 00:17:17,369 There's less syntactic distraction when you glance at this code. 333 00:17:17,369 --> 00:17:19,829 You don't have to ignore as many semicolons and curly 334 00:17:19,829 --> 00:17:21,240 braces and the like. 335 00:17:21,240 --> 00:17:23,807 Python tends to just be a little cleaner syntactically. 
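[Editor's sketch: the three-way fork described above, using elif. Wrapping it in a function named compare is an illustrative choice, and the exact messages are assumptions, since the slide text is not in the transcript.]

```python
def compare(x, y):
    """Return a message describing how x relates to y (hypothetical helper)."""
    if x < y:
        return "x is less than y"
    elif x > y:  # elif, one word, where C would say "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))
print(compare(2, 1))
print(compare(1, 1))
```

Further elif branches could be added between the if and the else for four or more forks in the road.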
336 00:17:23,807 --> 00:17:25,890 And indeed, that's characteristic of a lot of more 337 00:17:25,890 --> 00:17:28,260 recent, more modern languages like it. 338 00:17:28,260 --> 00:17:31,770 All right, let's take a look at a few other blocks in Scratch and in turn C. 339 00:17:31,770 --> 00:17:34,890 In Scratch, when we wanted to do something again and again as a loop, 340 00:17:34,890 --> 00:17:38,040 perhaps forever, we would literally use the Forever block. 341 00:17:38,040 --> 00:17:41,460 In C, we could implement this in a few different ways. 342 00:17:41,460 --> 00:17:46,230 And we proposed quite simply this one-- while true print out "hello, world," 343 00:17:46,230 --> 00:17:47,940 again and again and again. 344 00:17:47,940 --> 00:17:50,200 And because the Boolean expression never changes, 345 00:17:50,200 --> 00:17:51,960 it's going to indeed execute forever. 346 00:17:51,960 --> 00:17:54,180 So Python is actually pretty similar, but there 347 00:17:54,180 --> 00:17:55,990 are a couple of subtle differences. 348 00:17:55,990 --> 00:17:58,350 So ingrain in your mind what this looks like here. 349 00:17:58,350 --> 00:18:03,420 We have true in parentheses, the curly braces, the new line, the semicolon. 350 00:18:03,420 --> 00:18:06,090 A lot of that's about to go away, but there's still 351 00:18:06,090 --> 00:18:07,440 going to be a slight difference. 352 00:18:07,440 --> 00:18:11,070 Notice that we're indenting, as I keep emphasizing. 353 00:18:11,070 --> 00:18:14,520 We no longer have the new line or the semicolon or the curly braces, 354 00:18:14,520 --> 00:18:15,720 but True-- 355 00:18:15,720 --> 00:18:17,280 and it turns out, False-- 356 00:18:17,280 --> 00:18:18,910 now must be capitalized. 357 00:18:18,910 --> 00:18:23,610 So whereas in C it was lowercase false, lowercase true, in Python 358 00:18:23,610 --> 00:18:26,280 it's going to be capitalized False, capitalized True. 359 00:18:26,280 --> 00:18:27,000 Why? 
360 00:18:27,000 --> 00:18:28,140 Just because. 361 00:18:28,140 --> 00:18:32,550 But there is one other detail that's important to note, both with our loops 362 00:18:32,550 --> 00:18:35,010 here, as well as with our conditions. 363 00:18:35,010 --> 00:18:38,040 Just as before, if I rewind to our most recent condition, 364 00:18:38,040 --> 00:18:40,770 notice that even though we've gotten rid of the curly braces 365 00:18:40,770 --> 00:18:43,620 and we've gotten rid of the parentheses, we now 366 00:18:43,620 --> 00:18:47,640 have introduced these colons, which are necessary after this expression, 367 00:18:47,640 --> 00:18:50,580 this expression, and this one, to make clear to Python 368 00:18:50,580 --> 00:18:54,150 that the lines of code that follow indented underneath 369 00:18:54,150 --> 00:18:57,910 are indeed relevant to that if, elif, or else. 370 00:18:57,910 --> 00:19:01,540 And we see that same feature again here in the context of a loop. 371 00:19:01,540 --> 00:19:02,790 We saw other loops, of course. 372 00:19:02,790 --> 00:19:04,560 In Scratch, when we wanted to do something 373 00:19:04,560 --> 00:19:08,910 a finite number of times like 3, we would repeat the following three times. 374 00:19:08,910 --> 00:19:11,770 In C, we had a few different approaches to this. 375 00:19:11,770 --> 00:19:14,100 And all of them, I dare say, were very mechanical. 376 00:19:14,100 --> 00:19:17,520 Like, if you want to do something three times, the onus in C 377 00:19:17,520 --> 00:19:21,240 is on you to declare a variable, keep track of how many times 378 00:19:21,240 --> 00:19:23,250 you've counted already, increment the thing. 379 00:19:23,250 --> 00:19:24,840 Like, there's a lot of moving parts. 380 00:19:24,840 --> 00:19:27,970 And so in C, one approach looked like this. 
381 00:19:27,970 --> 00:19:30,270 We declare a variable called i equals 0-- 382 00:19:30,270 --> 00:19:32,110 but we could call it anything we want-- 383 00:19:32,110 --> 00:19:35,880 we have a while block here that's asking a Boolean expression again 384 00:19:35,880 --> 00:19:37,770 and again, is i less than 0-- 385 00:19:37,770 --> 00:19:39,480 is i less than 3? 386 00:19:39,480 --> 00:19:42,240 And then inside of the loop, we printed out "hello, world." 387 00:19:42,240 --> 00:19:46,200 And using C's syntactic sugar, the plus plus notation, 388 00:19:46,200 --> 00:19:49,800 we kept adding 1 to i, add 1 to i, add 1 to i, 389 00:19:49,800 --> 00:19:51,840 until we implicitly break out of the loop 390 00:19:51,840 --> 00:19:54,810 because it's, of course, no longer less than 3. 391 00:19:54,810 --> 00:19:59,220 So in Python, similar in spirit, but again, some of that clutter goes away. 392 00:19:59,220 --> 00:20:03,300 i equals 0 is all we need say to give ourselves a variable. 393 00:20:03,300 --> 00:20:07,680 While i less than 3 is all we need to say there but with a colon. 394 00:20:07,680 --> 00:20:11,220 Then inside of that, indented properly, we print out "hello, world." 395 00:20:11,220 --> 00:20:15,610 And-- we can't do the plus plus, so minor disappointment-- 396 00:20:15,610 --> 00:20:18,300 but i plus equals 1 increments i. 397 00:20:18,300 --> 00:20:23,070 So this would be one way of implementing in Python the exact same thing, a loop 398 00:20:23,070 --> 00:20:24,720 that executes three times. 399 00:20:24,720 --> 00:20:27,750 But we saw other approaches, of course, in C, 400 00:20:27,750 --> 00:20:30,600 and there's other approaches possible in Python as well. 401 00:20:30,600 --> 00:20:33,527 You might recall in C that we saw this approach, the for loop.
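For readers following the transcript without the slides, the Python while loop just described can be sketched as below. This is a minimal reconstruction from the spoken description, not a copy of the slide itself.

```python
# Count from 0 up to (but not including) 3, printing each time.
i = 0
while i < 3:               # no parentheses needed, but the colon is required
    print("hello, world")  # the indentation marks the body of the loop
    i += 1                 # Python has no i++, so += 1 does the increment
```

Once the loop ends, i holds 3, which is exactly why the condition i < 3 stops being true.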
402 00:20:33,527 --> 00:20:35,610 And odds are you've been reaching for the for loop 403 00:20:35,610 --> 00:20:38,527 pretty frequently, because even though it looks a little more cryptic, 404 00:20:38,527 --> 00:20:41,400 you can pack more features into that one line of code 405 00:20:41,400 --> 00:20:43,510 in between those semicolons, if you will. 406 00:20:43,510 --> 00:20:47,070 So same exact logic, it just prints out this "hello, world" 407 00:20:47,070 --> 00:20:49,590 three times using a for loop instead. 408 00:20:49,590 --> 00:20:54,180 In Python, things start to get a little elegant here now. 409 00:20:54,180 --> 00:20:57,210 It's a little weird at first glance, but it's definitely more succinct. 410 00:20:57,210 --> 00:21:01,320 If you want to do something three times, it turns out in Python 411 00:21:01,320 --> 00:21:05,280 you can use a more succinct syntax for the for loop-- for i 412 00:21:05,280 --> 00:21:09,430 in, and then in square brackets a list of values. 413 00:21:09,430 --> 00:21:13,110 So just as we used in the past square brackets in a few different places 414 00:21:13,110 --> 00:21:18,120 to connote arrays and indexing into arrays, in the world of Python whenever 415 00:21:18,120 --> 00:21:22,770 you surround a bunch of values that themselves have commas in between them, 416 00:21:22,770 --> 00:21:26,638 and you encapsulate them all using square brackets, 417 00:21:26,638 --> 00:21:28,680 that's what we're going to call in Python a list. 418 00:21:28,680 --> 00:21:30,900 And it's very similar in spirit to an array, 419 00:21:30,900 --> 00:21:33,490 but we'll call it in the context of Python a list. 420 00:21:33,490 --> 00:21:38,250 And so what this line of code says is, for i in 0, 1, 2-- what does that mean? 421 00:21:38,250 --> 00:21:42,660 This is a for loop in Python that says, give me a variable called i. 422 00:21:42,660 --> 00:21:45,780 And on the first iteration of this loop set i equal to 0. 
423 00:21:45,780 --> 00:21:48,270 On the second iteration of this loop set i equal to 1. 424 00:21:48,270 --> 00:21:51,660 And on the last iteration of this loop, set i equal to 2 for me. 425 00:21:51,660 --> 00:21:53,583 It just does all of that for you. 426 00:21:53,583 --> 00:21:55,500 Now, at the end of the day it actually doesn't 427 00:21:55,500 --> 00:21:59,550 matter what i is per se, because I'm not printing the value of i. 428 00:21:59,550 --> 00:22:00,550 And that's totally fine. 429 00:22:00,550 --> 00:22:03,508 Odds are you've used for loops where you did something again and again, 430 00:22:03,508 --> 00:22:06,120 like printing "hello, world," even though you didn't print out 431 00:22:06,120 --> 00:22:07,320 the value of i. 432 00:22:07,320 --> 00:22:10,600 So technically, I could have put any 3 things in the square brackets 433 00:22:10,600 --> 00:22:11,100 if I want. 434 00:22:11,100 --> 00:22:15,450 But the convention would be just enumerate, just like in C, 0, 1, 2, 435 00:22:15,450 --> 00:22:18,330 just like a computer scientist counting from 0. 436 00:22:18,330 --> 00:22:22,530 But this could break down pretty easily. 437 00:22:22,530 --> 00:22:25,500 This could become very ugly very quickly. 438 00:22:25,500 --> 00:22:29,520 Does anyone see a problem with for loops in Python 439 00:22:29,520 --> 00:22:33,120 if you have to put in between those square brackets the list of values 440 00:22:33,120 --> 00:22:36,060 that you want to iterate over? 441 00:22:36,060 --> 00:22:37,110 Noah? 442 00:22:37,110 --> 00:22:40,290 AUDIENCE: If you want to do, for example, a thing 50 times, 443 00:22:40,290 --> 00:22:42,930 you'd have to write out 0, 1, 2, 3, 4, 5, 6. 444 00:22:42,930 --> 00:22:43,680 DAVID MALAN: Yeah. 445 00:22:43,680 --> 00:22:45,330 My God, it would start to look hideous quickly. 
446 00:22:45,330 --> 00:22:47,455 And it's funny you mention 50, because in preparing 447 00:22:47,455 --> 00:22:51,060 this demonstration for lecture today, I went back to week 0, 448 00:22:51,060 --> 00:22:55,350 when actually the analog in week 0 was to indeed print out "hello, world" 449 00:22:55,350 --> 00:22:56,042 50 times. 450 00:22:56,042 --> 00:22:57,750 And I thought to myself, damn it, this is 451 00:22:57,750 --> 00:22:59,910 going to look atrocious now, because I literally 452 00:22:59,910 --> 00:23:03,960 have to put inside of square brackets 0, 1, 2, 3, 4, 5, 6, 7, 8, 453 00:23:03,960 --> 00:23:07,807 9, all the way to 49, as Noah says, which would just look atrocious. 454 00:23:07,807 --> 00:23:09,640 Like, surely there's got to be a better way. 455 00:23:09,640 --> 00:23:10,560 And there is. 456 00:23:10,560 --> 00:23:12,810 While this might be compelling for very short values, 457 00:23:12,810 --> 00:23:14,760 there's a simpler way in Python when you want 458 00:23:14,760 --> 00:23:16,860 to do something some number of times. 459 00:23:16,860 --> 00:23:21,150 We can replace this list of three values with this, 460 00:23:21,150 --> 00:23:25,350 a function called range that takes an input, which is the number of things 461 00:23:25,350 --> 00:23:26,430 that you want to return. 462 00:23:26,430 --> 00:23:30,540 And essentially, what range will do for you passed an input like 3, 463 00:23:30,540 --> 00:23:35,580 it will automatically generate for you a list of three values, 0, 1, and 2. 464 00:23:35,580 --> 00:23:39,360 And then Python will iterate over those three values for you. 
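The two for-loop forms discussed here, the hand-written list and the range function, can be sketched side by side. One detail worth hedging: the lecture describes range as generating a list, and in Python 3 it more precisely produces its values on demand, so the sketch wraps it in list() only to make the values visible.

```python
# Form 1: iterate over a hand-written list of three values.
for i in [0, 1, 2]:
    print("hello, world")

# Form 2: let range produce 0, 1, 2 instead of typing them out.
for i in range(3):
    print("hello, world")

# Peek at what range(3) yields by converting it to a list.
values = list(range(3))
print(values)  # [0, 1, 2]

# Scaling up to 50 iterations only requires changing the argument.
print(len(range(50)))  # 50
```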
465 00:23:39,360 --> 00:23:43,170 So to Noah's concern a moment ago, if I now want to iterate 50 times, 466 00:23:43,170 --> 00:23:45,690 I just change the 3 to a 50, I don't have 467 00:23:45,690 --> 00:23:51,300 to create this crazy mess of a manually typed out list of 0 through 49, 468 00:23:51,300 --> 00:23:55,560 which, of course, would not be a very well designed a program, it would seem, 469 00:23:55,560 --> 00:23:59,710 just because of the length of it and the opportunity to mess up and the like. 470 00:23:59,710 --> 00:24:04,080 So in Python, this is perhaps now, if you will, the most Pythonic way 471 00:24:04,080 --> 00:24:05,953 to do something some number of times. 472 00:24:05,953 --> 00:24:08,370 And indeed, this is a term of art in the Python community. 473 00:24:08,370 --> 00:24:10,770 Long story short, technical people, programmers, 474 00:24:10,770 --> 00:24:12,990 they tend to be pretty religious in some sense 475 00:24:12,990 --> 00:24:15,420 when it comes to the "right way" of doing things. 476 00:24:15,420 --> 00:24:18,000 And indeed, within the world of Python programming, 477 00:24:18,000 --> 00:24:22,450 a lot of Python programmers do have both opinions 478 00:24:22,450 --> 00:24:26,940 but also standardized recommendations that dictate how you "should" 479 00:24:26,940 --> 00:24:28,350 write Python code. 480 00:24:28,350 --> 00:24:31,620 And tricks like this are what are considered Pythonic. 481 00:24:31,620 --> 00:24:35,100 You are doing something Pythonically if you're doing it the quote, unquote 482 00:24:35,100 --> 00:24:37,830 "right way," which doesn't mean right in the absolute, 483 00:24:37,830 --> 00:24:40,890 it means right in the sense that most other people, rather, 484 00:24:40,890 --> 00:24:42,500 agree with you in this sense. 485 00:24:42,500 --> 00:24:43,000 All right. 
486 00:24:43,000 --> 00:24:46,500 Let's see a few final features of Python before we now start 487 00:24:46,500 --> 00:24:48,030 to build some of our own features. 488 00:24:48,030 --> 00:24:51,702 In C, recall, we had this whole list of data types. 489 00:24:51,702 --> 00:24:54,160 And there are more, and you can create your own, of course. 490 00:24:54,160 --> 00:24:56,100 But the primitives that we looked at initially 491 00:24:56,100 --> 00:25:00,690 were these-- bool, char, double, float, int, long, string, and so forth. 492 00:25:00,690 --> 00:25:04,290 In Python, even though I haven't needed them, 493 00:25:04,290 --> 00:25:08,610 because I can give myself a variable like a string or an int, 494 00:25:08,610 --> 00:25:13,380 just by giving it a name like counter or i or answer, 495 00:25:13,380 --> 00:25:15,300 and then assigning it a value, and Python 496 00:25:15,300 --> 00:25:18,480 infers from what you're assigning it what data type it should be, 497 00:25:18,480 --> 00:25:20,040 Python does have data types. 498 00:25:20,040 --> 00:25:21,960 It's just what's known in the programming 499 00:25:21,960 --> 00:25:24,390 world as a loosely typed language. 500 00:25:24,390 --> 00:25:28,560 In the world of C, C is a strongly typed language, 501 00:25:28,560 --> 00:25:32,760 where, not only do types exist, you must use them explicitly. 502 00:25:32,760 --> 00:25:34,920 In the world of Python, you have what's called 503 00:25:34,920 --> 00:25:39,960 a loosely typed language, in which types exist, 504 00:25:39,960 --> 00:25:43,380 but you can often infer them implicitly. 505 00:25:43,380 --> 00:25:46,770 The burden is not on you the programmer to specify 506 00:25:46,770 --> 00:25:48,480 those data types incessantly. 507 00:25:48,480 --> 00:25:50,740 Let the computer figure it out for you. 508 00:25:50,740 --> 00:25:52,950 So this is our list from C. 509 00:25:52,950 --> 00:25:56,790 This now is going to be our analogous list in the world of Python. 
510 00:25:56,790 --> 00:25:59,230 We're going to have bool still, True and False, 511 00:25:59,230 --> 00:26:01,470 but capital T, capital F. We're going to have floats, 512 00:26:01,470 --> 00:26:03,210 which are real numbers with decimal points. 513 00:26:03,210 --> 00:26:06,127 We're going to have ints, which of course are numbers like negative 1, 514 00:26:06,127 --> 00:26:07,600 0, and 1, and so forth. 515 00:26:07,600 --> 00:26:10,950 And then not strings per se, but "strs", S-T-R. 516 00:26:10,950 --> 00:26:15,930 And whereas in the world of C, there was technically no "string type"-- 517 00:26:15,930 --> 00:26:20,010 that was a feature offered by the cs50 library, which just made more 518 00:26:20,010 --> 00:26:22,620 accessible the idea of a char star-- 519 00:26:22,620 --> 00:26:24,450 recall that C has strings. 520 00:26:24,450 --> 00:26:27,630 And they're called strings, but there's no data type called string. 521 00:26:27,630 --> 00:26:29,910 The way you give yourself a string, of course, in C 522 00:26:29,910 --> 00:26:31,890 is to declare something as a char star. 523 00:26:31,890 --> 00:26:34,650 And in cs50's library, we just gave that char star 524 00:26:34,650 --> 00:26:37,620 a synonym, a nickname, an alias, called "string." 525 00:26:37,620 --> 00:26:40,430 In Python, there are actual-- 526 00:26:40,430 --> 00:26:42,560 there is an actual data type for strings. 527 00:26:42,560 --> 00:26:46,000 And for short, it's called S-T-R. 528 00:26:46,000 --> 00:26:46,500 All right. 529 00:26:46,500 --> 00:26:49,550 So with that said, what other features do we 530 00:26:49,550 --> 00:26:51,660 have from Python that we can use here? 531 00:26:51,660 --> 00:26:53,690 Well, there's other data types as well in 532 00:26:53,690 --> 00:26:57,050 Python that are actually going to prove super useful as we begin 533 00:26:57,050 --> 00:26:59,210 to develop more sophisticated programs and do 534 00:26:59,210 --> 00:27:00,980 even cooler things with the language.
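As a small illustration of the point about inferred types, the sketch below assigns values without declaring any types and then asks Python what it inferred. The variable names and values are chosen for illustration only.

```python
# No type declarations: Python infers each type from the assigned value.
answer = "hello"   # str
counter = 0        # int
price = 9.99       # float
done = False       # bool (note the capital F)

# type(...).__name__ gives the name of the type Python inferred.
types = [type(value).__name__ for value in (answer, counter, price, done)]
print(types)  # ['str', 'int', 'float', 'bool']
```

The types still exist and can be inspected at any time; the difference from C is only that the programmer does not have to write them out.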
535 00:27:00,980 --> 00:27:02,570 We've seen range already. 536 00:27:02,570 --> 00:27:05,960 Strictly speaking, this is a data type of sorts within Python 537 00:27:05,960 --> 00:27:09,560 that gives you back a range of values, by default 0 on up, 538 00:27:09,560 --> 00:27:10,970 based on the input you provide. 539 00:27:10,970 --> 00:27:12,860 List, I keep mentioning verbally. 540 00:27:12,860 --> 00:27:18,920 A list is a proper data type in Python that's similar in spirit to arrays. 541 00:27:18,920 --> 00:27:20,300 But whereas in arrays-- 542 00:27:20,300 --> 00:27:23,390 recall, we've spent great emphasis over the past few weeks 543 00:27:23,390 --> 00:27:25,970 noting that arrays are a fixed size. 544 00:27:25,970 --> 00:27:28,960 You have to decide in advance how big that array is going to be. 545 00:27:28,960 --> 00:27:31,940 And like last week, if you decide, oops, I need more memory, 546 00:27:31,940 --> 00:27:34,260 you have to dynamically allocate more space for it, 547 00:27:34,260 --> 00:27:36,830 copy values over, and then free up the old memory. 548 00:27:36,830 --> 00:27:39,740 Like, there's so much jumping through hoops, so to speak, 549 00:27:39,740 --> 00:27:44,030 when you want to use arrays in C if you want to grow them or even shrink them. 550 00:27:44,030 --> 00:27:49,170 Python and other higher-level languages like it do all of that for you. 551 00:27:49,170 --> 00:27:53,330 So a list is like an array that automatically resizes itself, 552 00:27:53,330 --> 00:27:54,500 bigger and smaller. 553 00:27:54,500 --> 00:27:57,350 That feature now you get for free in the language, so to speak. 554 00:27:57,350 --> 00:27:59,240 You don't have to implement it yourself. 555 00:27:59,240 --> 00:28:00,860 Python has what are called tuples. 
556 00:28:00,860 --> 00:28:03,530 In the context of like math, or GPS, you might 557 00:28:03,530 --> 00:28:06,500 have x- and y-coordinates, or latitude and longitude coordinates, 558 00:28:06,500 --> 00:28:08,270 so like comma separated values. 559 00:28:08,270 --> 00:28:11,510 Tuples are one way of implementing those in Python. 560 00:28:11,510 --> 00:28:13,220 Dict, or dictionaries. 561 00:28:13,220 --> 00:28:17,690 So Python has dictionaries that allow you to store keys and values. 562 00:28:17,690 --> 00:28:21,150 Or literally in our human world, if you have a human dictionary here, 563 00:28:21,150 --> 00:28:24,650 for instance for English, much like a dictionary in physical form, 564 00:28:24,650 --> 00:28:29,450 lets you store words and their definitions, a dictionary in Python, 565 00:28:29,450 --> 00:28:33,350 more generally, lets you store any keys and any values. 566 00:28:33,350 --> 00:28:35,540 You can associate one thing with another. 567 00:28:35,540 --> 00:28:39,155 And we'll see that this is a wonderfully useful and versatile data structure. 568 00:28:39,155 --> 00:28:41,030 And then lastly for today's purposes, there's 569 00:28:41,030 --> 00:28:44,090 these things called sets which, if you recall from math, 570 00:28:44,090 --> 00:28:49,100 a set is a collection of values, like a, b, c or 1, 2, 3, without duplicates. 571 00:28:49,100 --> 00:28:50,810 But Python manages that for you. 572 00:28:50,810 --> 00:28:54,050 You can add items to a set, you can remove items from a set. 573 00:28:54,050 --> 00:28:57,110 Python will make sure that there are no duplicates for you, 574 00:28:57,110 --> 00:29:01,230 and it will manage all of the memory for you as well. 575 00:29:01,230 --> 00:29:06,920 So what we have in the way of functions, meanwhile, is a few familiar friends. 576 00:29:06,920 --> 00:29:10,910 Recall that in C we used the cs50 library to get chars, 577 00:29:10,910 --> 00:29:13,430 doubles, floats, ints, longs, and strings. 
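The four collection types introduced just before this point (list, tuple, dict, set) can be sketched briefly as follows. The specific values, including the coordinates and the dictionary entries, are made up for illustration.

```python
# list: like an array, but it grows and shrinks automatically.
numbers = [1, 2, 3]
numbers.append(4)    # grows without any manual memory management
numbers.remove(1)    # shrinks just as easily
print(numbers)       # [2, 3, 4]

# tuple: a fixed group of comma-separated values, such as coordinates.
point = (42.37, -71.12)   # hypothetical latitude and longitude
latitude, longitude = point

# dict: associates keys with values, like words with definitions.
definitions = {"apple": "a fruit"}
definitions["python"] = "a programming language"

# set: a collection with no duplicates.
letters = {"a", "b", "c"}
letters.add("a")        # already present, so the set is unchanged
letters.discard("b")    # remove an item
print(sorted(letters))  # ['a', 'c']
```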
578 00:29:13,430 --> 00:29:17,330 In Python, thankfully, we don't have to worry about doubles or longs anymore. 579 00:29:17,330 --> 00:29:18,450 More on that in a bit. 580 00:29:18,450 --> 00:29:23,000 But the cs50 library for Python, which you saw me import a few minutes ago, 581 00:29:23,000 --> 00:29:25,040 does give you a function called get_float. 582 00:29:25,040 --> 00:29:26,960 It does give you a function called get_int, 583 00:29:26,960 --> 00:29:29,110 it does give you a function called get_string, 584 00:29:29,110 --> 00:29:31,280 that, at least for this week's purposes, are just 585 00:29:31,280 --> 00:29:32,720 going to make your life easier. 586 00:29:32,720 --> 00:29:36,140 These, too, are training wheels that we will very quickly take off 587 00:29:36,140 --> 00:29:39,800 so that you're only using native Python code ultimately, 588 00:29:39,800 --> 00:29:41,720 and not CS50's own library. 589 00:29:41,720 --> 00:29:44,750 But for the sake of transitioning this week from C to Python, 590 00:29:44,750 --> 00:29:48,590 you'll find that these will just make your life easier before we relax 591 00:29:48,590 --> 00:29:52,020 and take those away, too. 592 00:29:52,020 --> 00:29:56,420 So in C, to use the library you had to include cs50.h. 593 00:29:56,420 --> 00:29:58,860 In Python, again you're going to go ahead and import 594 00:29:58,860 --> 00:30:03,620 cs50, or more explicitly, the specific function that you might want to import. 595 00:30:03,620 --> 00:30:06,020 So it turns out there's different ways to import things. 596 00:30:06,020 --> 00:30:08,840 They ultimately achieve essentially the same goal. 597 00:30:08,840 --> 00:30:11,330 You can, with lines like this, explicitly 598 00:30:11,330 --> 00:30:15,640 import one function at a time, like I did earlier using get_string, 599 00:30:15,640 --> 00:30:18,290 or you can import the whole library all at once 600 00:30:18,290 --> 00:30:21,185 by just saying more succinctly, import cs50.
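The import styles discussed in this part of the lecture can be sketched with Python's built-in math module standing in for the cs50 library. That substitution is an assumption made so the snippet runs without installing anything; the same forms apply to cs50 and its get_string, get_int, and get_float functions.

```python
# Style 1: import one specific function, then call it by its bare name.
from math import sqrt
print(sqrt(16))        # 4.0

# Style 2: import the whole library, then prefix calls with its name.
import math
print(math.sqrt(16))   # 4.0

# Style 3: import a comma-separated list of functions in one line.
from math import floor, ceil
print(floor(2.5), ceil(2.5))  # 2 3
```

The choice affects how calls are written afterward: with the whole-library form, every call carries the library name as a prefix.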
601 00:30:21,185 --> 00:30:24,200 It's going to affect the syntax we have to use hereafter, 602 00:30:24,200 --> 00:30:28,530 but you'll see multiple ways of doing this in our examples here on out. 603 00:30:28,530 --> 00:30:30,350 You can also simplify this a bit, and you 604 00:30:30,350 --> 00:30:35,900 can import a comma separated list of functions from a library like ours. 605 00:30:35,900 --> 00:30:38,810 And this is a convention we'll see quite frequently as well. 606 00:30:38,810 --> 00:30:43,130 Because if we start using popular third-party libraries written 607 00:30:43,130 --> 00:30:46,130 by other programmers on the internet, they will very commonly 608 00:30:46,130 --> 00:30:48,830 give us lots of functions that we ourselves can use, 609 00:30:48,830 --> 00:30:52,490 and we will be able to import those one after the other, 610 00:30:52,490 --> 00:30:56,080 by just specifying them here in this way. 611 00:30:56,080 --> 00:30:56,580 All right. 612 00:30:56,580 --> 00:31:01,940 Let me pause here just to see if there's any questions on Python syntax. 613 00:31:01,940 --> 00:31:06,080 Like, that's essentially it for our crash course in Python syntax. 614 00:31:06,080 --> 00:31:10,135 We're now going to start building things and explore what the features of Python 615 00:31:10,135 --> 00:31:13,010 are and what some of the nuances are, and really the power of Python. 616 00:31:13,010 --> 00:31:17,890 But first, any questions on syntax? 617 00:31:17,890 --> 00:31:20,980 We've seen loops, conditions, variables. 618 00:31:20,980 --> 00:31:25,220 Olivia, question or comment. 619 00:31:25,220 --> 00:31:28,880 AUDIENCE: In a for loop, if you want to increment by something besides 1, 620 00:31:28,880 --> 00:31:32,843 but you don't want to explicitly type out the list, how would you do that? 621 00:31:32,843 --> 00:31:34,260 DAVID MALAN: Really good question. 
622 00:31:34,260 --> 00:31:39,150 So if you wanted to use a for loop and iterate over a range of values, 623 00:31:39,150 --> 00:31:46,070 but you wanted that range to be 0, 2, 4, 6, 8, instead of 0, 1, 2, 3, 624 00:31:46,070 --> 00:31:48,980 let me go ahead and go back to that slide from a moment ago. 625 00:31:48,980 --> 00:31:51,710 And I can actually change this on the fly. 626 00:31:51,710 --> 00:31:54,840 Let me go into that slide, which was right here. 627 00:31:54,840 --> 00:31:59,630 And what I can do, actually, is specify another value, which might be this. 628 00:31:59,630 --> 00:32:04,790 If I change the input to range to be not one value but two values, 629 00:32:04,790 --> 00:32:07,520 that's going to be a clue to the computer 630 00:32:07,520 --> 00:32:10,310 that it should count a total of three values, 631 00:32:10,310 --> 00:32:14,022 but it should increment 2 at a time instead of the default, which is 1. 632 00:32:14,022 --> 00:32:15,980 And there's even other capabilities there, too. 633 00:32:15,980 --> 00:32:17,563 You don't have to start counting at 0. 634 00:32:17,563 --> 00:32:20,870 You can adjust that as well, which is to say that with Python, you're 635 00:32:20,870 --> 00:32:23,630 going to find a lot more features come with the language, 636 00:32:23,630 --> 00:32:28,160 and even more powerfully, the functions that you can write 637 00:32:28,160 --> 00:32:31,010 and the functions that you can use in Python 638 00:32:31,010 --> 00:32:34,220 also can take different numbers of arguments. 639 00:32:34,220 --> 00:32:36,830 Sometimes it's 0, sometimes it's 1, sometimes it's 2. 640 00:32:36,830 --> 00:32:40,310 But it's ultimately often up to you. 641 00:32:40,310 --> 00:32:40,820 Good catch. 642 00:32:40,820 --> 00:32:42,020 Other questions? 643 00:32:42,020 --> 00:32:45,860 AUDIENCE: Will we see sequences primarily in the for loops? 644 00:32:45,860 --> 00:32:48,620 Or are there other applications where they're very useful? 
645 00:32:48,620 --> 00:32:50,162 DAVID MALAN: Sequences in what sense? 646 00:32:50,162 --> 00:32:53,030 In the sense of ranges or lists or something else? 647 00:32:53,030 --> 00:32:55,535 AUDIENCE: Yeah, in terms of ranges, specifically. 648 00:32:55,535 --> 00:32:56,660 DAVID MALAN: Good question. 649 00:32:56,660 --> 00:32:58,118 Will we use them in other contexts? 650 00:32:58,118 --> 00:33:01,110 Generally speaking, it's pretty rare. 651 00:33:01,110 --> 00:33:04,250 I mean, I'm racking my brain now as to other use cases 652 00:33:04,250 --> 00:33:06,257 that I have used range for. 653 00:33:06,257 --> 00:33:08,090 And I'm sure I could come up with something. 654 00:33:08,090 --> 00:33:12,050 But I think hands down, the most common case is in the context of iteration, 655 00:33:12,050 --> 00:33:13,308 as in a for loop. 656 00:33:13,308 --> 00:33:15,350 And I'll think on that to see other applications. 657 00:33:15,350 --> 00:33:18,110 But any time you want to generate a long list of values 658 00:33:18,110 --> 00:33:21,950 that follow some pattern, whether it's 0, 1, 2, or as Olivia points out, 659 00:33:21,950 --> 00:33:24,650 a range of values with gaps, range will allow 660 00:33:24,650 --> 00:33:26,870 you to avoid having to hardcode it entirely. 661 00:33:26,870 --> 00:33:29,840 And you can actually write your own generator function, so to speak, 662 00:33:29,840 --> 00:33:34,260 a function that returns whatever pattern of values that you want. 663 00:33:34,260 --> 00:33:36,870 Other questions or confusion? 664 00:33:36,870 --> 00:33:39,610 665 00:33:39,610 --> 00:33:43,900 Anything on your end, Brian, from the chat or beyond? 666 00:33:43,900 --> 00:33:46,150 BRIAN: Looks like all the questions are answered here. 667 00:33:46,150 --> 00:33:46,850 DAVID MALAN: All right. 668 00:33:46,850 --> 00:33:48,820 Well, let's go ahead now and do something more interesting 669 00:33:48,820 --> 00:33:49,570 than hello, world. 
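A short sketch of the two ideas from this Q&A: counting by something other than 1, and writing a generator function of one's own. In Python's built-in range, the step is passed as the third argument, after a start and a stop. The squares function is a hypothetical example of a generator, not something shown in the lecture.

```python
# range(start, stop, step): count from 0 up to (not including) 10, by 2s.
evens = list(range(0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]

# range(start, stop): counting does not have to begin at 0.
from_five = list(range(5, 8))
print(from_five)  # [5, 6, 7]


def squares(n):
    """Hypothetical generator: yield the first n square numbers."""
    for i in range(n):
        yield i * i   # hand back one value at a time


print(list(squares(4)))  # [0, 1, 4, 9]
```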
670 00:33:49,570 --> 00:33:52,660 Because after all, this is where programming really gets fun, 671 00:33:52,660 --> 00:33:55,810 really gets powerful, when you and I no longer 672 00:33:55,810 --> 00:33:59,050 have to implement those low-level implementation details, when 673 00:33:59,050 --> 00:34:02,380 you had to implement memory management for your hash table, 674 00:34:02,380 --> 00:34:06,040 or memory management for a linked list, or copying values in an array. 675 00:34:06,040 --> 00:34:08,500 We've spent the past several weeks focusing really 676 00:34:08,500 --> 00:34:12,070 on some low-level primitives that are useful to understand, 677 00:34:12,070 --> 00:34:13,989 but they're not fun to write. 678 00:34:13,989 --> 00:34:17,322 And I concede that they might not be fun to write in problem set form. 679 00:34:17,322 --> 00:34:20,530 And they're certainly not going to be fun to write for the rest of your life, 680 00:34:20,530 --> 00:34:23,219 every time you want to just write code to solve some problem. 681 00:34:23,219 --> 00:34:24,969 But again, that's where libraries come in. 682 00:34:24,969 --> 00:34:27,159 And now, this is where other languages come in. 683 00:34:27,159 --> 00:34:31,900 It turns out that Python is a much better, a much easier 684 00:34:31,900 --> 00:34:35,080 language to use for solving certain types of problems, 685 00:34:35,080 --> 00:34:39,199 among them some of the problems we have been solving in past problems sets. 686 00:34:39,199 --> 00:34:41,750 So in fact, let me go ahead and do this. 687 00:34:41,750 --> 00:34:46,030 I'm going to go ahead and grab a file here-- 688 00:34:46,030 --> 00:34:47,710 give me one moment-- 689 00:34:47,710 --> 00:34:52,389 called bridge.bmp, which you might recall from a past problem set. 690 00:34:52,389 --> 00:34:56,500 This is the beautiful Weeks bridge down by the Charles River in Cambridge, Mass 691 00:34:56,500 --> 00:34:57,160 by Harvard. 
692 00:34:57,160 --> 00:35:00,430 And this is a very clear photograph taken by one of CS50's team members. 693 00:35:00,430 --> 00:35:02,590 And in recent weeks, of course, you wrote code 694 00:35:02,590 --> 00:35:06,850 to do all sorts of mutations of this image, among them blurring the image. 695 00:35:06,850 --> 00:35:10,510 And blur, I dare say, was not the easiest problem to solve. 696 00:35:10,510 --> 00:35:13,240 You had to look up, down, left, and right, sort of average 697 00:35:13,240 --> 00:35:14,080 all of those pixels. 698 00:35:14,080 --> 00:35:17,300 You had to understand how an image is represented one pixel at a time. 699 00:35:17,300 --> 00:35:20,300 So there's a lot of low-level minutia there, when at the end of the day, 700 00:35:20,300 --> 00:35:22,600 all you want to do is just blur an image. 701 00:35:22,600 --> 00:35:26,740 So whereas in past weeks we sort of had to think at and write at this lower 702 00:35:26,740 --> 00:35:29,770 level, now with Python it turns out we're 703 00:35:29,770 --> 00:35:33,070 going to have the ability to think at a higher level of abstraction 704 00:35:33,070 --> 00:35:35,328 and write far less code for ourselves. 705 00:35:35,328 --> 00:35:36,620 So let me go ahead and do this. 706 00:35:36,620 --> 00:35:39,430 I'm going to use my Mac for this instead of CS50 IDE, 707 00:35:39,430 --> 00:35:41,582 so I can open the images more quickly. 708 00:35:41,582 --> 00:35:43,540 This is to say that, even though we'll continue 709 00:35:43,540 --> 00:35:47,050 using CS50 IDE for Python and for other languages 710 00:35:47,050 --> 00:35:51,220 over the remainder of the course, you can also install the requisite software 711 00:35:51,220 --> 00:35:54,880 on a Mac, on a PC, sometimes even kind of sort of a phone 712 00:35:54,880 --> 00:35:59,950 today, to use Python and C and other languages on your own devices.
713 00:35:59,950 --> 00:36:02,600 But again, we tend to use CS50 IDE during the class 714 00:36:02,600 --> 00:36:05,208 so as to have a standard environment that just works. 715 00:36:05,208 --> 00:36:07,000 So I'm going to go ahead and write, though, 716 00:36:07,000 --> 00:36:11,680 on my computer a program called blur.py, py, of course, 717 00:36:11,680 --> 00:36:13,630 being the file extension for Python programs. 718 00:36:13,630 --> 00:36:15,460 So my program looks a little different now. 719 00:36:15,460 --> 00:36:17,620 I've got this black and blue and white window. 720 00:36:17,620 --> 00:36:21,130 But this is just a text editor on my own personal Mac here. 721 00:36:21,130 --> 00:36:22,870 I'm going to go ahead and do this. 722 00:36:22,870 --> 00:36:25,870 I need to have some functionality related to images 723 00:36:25,870 --> 00:36:27,320 in order to blur an image. 724 00:36:27,320 --> 00:36:30,370 So I'm going to go ahead and import from a PIL library, 725 00:36:30,370 --> 00:36:34,480 a Pillow library, so to speak, a special feature 726 00:36:34,480 --> 00:36:37,510 called Image and a special feature called ImageFilter. 727 00:36:37,510 --> 00:36:39,970 That is to say, these are essentially two functions 728 00:36:39,970 --> 00:36:43,360 that someone else smarter than me when it comes to image manipulation wrote, 729 00:36:43,360 --> 00:36:46,970 they made their code freely available on the internet free and open source, 730 00:36:46,970 --> 00:36:49,630 which means anyone can use the code, and I am allowed now 731 00:36:49,630 --> 00:36:54,190 to import it into my program, because I before class downloaded and installed 732 00:36:54,190 --> 00:36:55,437 it beforehand. 733 00:36:55,437 --> 00:36:57,020 Now I'm going to go ahead and do this. 734 00:36:57,020 --> 00:36:59,110 I'm going to give myself a variable called before. 735 00:36:59,110 --> 00:37:03,460 And I'm going to call Image.open on bridge.bmp.
736 00:37:03,460 --> 00:37:06,700 So again, even though we've never seen this before, never used this before, 737 00:37:06,700 --> 00:37:09,130 you can kind of glean syntactically what's going on. 738 00:37:09,130 --> 00:37:11,500 I've got a variable on the left called before. 739 00:37:11,500 --> 00:37:15,130 I've got a function on the right called Image.open, 740 00:37:15,130 --> 00:37:17,410 and I'm passing in the name bridge.bmp. 741 00:37:17,410 --> 00:37:21,130 So it sounds like this is kind of like fopen in the world of C. 742 00:37:21,130 --> 00:37:24,430 Now notice, this dot is kind of serving a new role here. 743 00:37:24,430 --> 00:37:29,530 In the past, we've used the dot operator only for structs in C, 744 00:37:29,530 --> 00:37:34,160 when we want to go into a person object, or into a node object, 745 00:37:34,160 --> 00:37:37,420 and we want to go inside of it and access some variable therein. 746 00:37:37,420 --> 00:37:42,310 Well, it turns out in Python, you have things similar in spirit to structs 747 00:37:42,310 --> 00:37:49,780 in C. But instead of containing only variables or data, like name and number 748 00:37:49,780 --> 00:37:52,780 like we did for the person struct a few weeks back, 749 00:37:52,780 --> 00:37:56,200 in Python you can have inside of a structure 750 00:37:56,200 --> 00:37:59,410 not only data, that is variables, you can also 751 00:37:59,410 --> 00:38:01,900 have functions inside of structures. 752 00:38:01,900 --> 00:38:05,080 And that starts to open up all sorts of possibilities 753 00:38:05,080 --> 00:38:07,370 in terms of features available to you. 754 00:38:07,370 --> 00:38:13,480 So it seems that I've got this Image object, this Image struct that I've, 755 00:38:13,480 --> 00:38:15,040 again, imported from someone else. 756 00:38:15,040 --> 00:38:17,710 Inside of it is an open function that expects 757 00:38:17,710 --> 00:38:19,880 as input the name of a file to open.
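To make the idea of a structure that holds both data and functions concrete, here is a minimal sketch using a hypothetical Person class modeled on the person struct mentioned above. The class, its greeting function, and the sample values are all invented for illustration and have nothing to do with how the Pillow library is written.

```python
class Person:
    """Hypothetical structure holding data (name, number) and a function."""

    def __init__(self, name, number):
        # Data stored inside the structure, like fields of a C struct.
        self.name = name
        self.number = number

    def greeting(self):
        # A function stored inside the structure; it can use the data.
        return f"hello, {self.name}"


person = Person("Emma", "617-555-0100")  # made-up sample values
print(person.name)        # the dot reaches a variable inside: Emma
print(person.greeting())  # the dot reaches a function inside: hello, Emma
```

The same dot that reads a field of a struct in C is used here both to read data and to call a function that lives alongside it.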
758 00:38:19,880 --> 00:38:23,110 So we'll see this syntax increasingly over the course of today's examples. 759 00:38:23,110 --> 00:38:25,370 Let me give myself a second variable, after. 760 00:38:25,370 --> 00:38:28,000 Let me go ahead now and assign to this variable called 761 00:38:28,000 --> 00:38:33,610 after the results of calling that before image's filter function, 762 00:38:33,610 --> 00:38:37,747 passing in ImageFilter.BoxBlur of 1. 763 00:38:37,747 --> 00:38:39,580 Now, this is a little cryptic, and we're not 764 00:38:39,580 --> 00:38:41,860 going to spend time on this particular syntax, because odds are, 765 00:38:41,860 --> 00:38:44,100 in life you're not going to have that many opportunities to want 766 00:38:44,100 --> 00:38:46,683 to blur an image for which you're going to run and write code. 767 00:38:46,683 --> 00:38:50,700 But for today's purposes, notice that inside of my before variable, 768 00:38:50,700 --> 00:38:54,570 because I assigned it the return value of this new feature, 769 00:38:54,570 --> 00:38:59,280 it has inside of it not just data but also functions, one of them 770 00:38:59,280 --> 00:39:00,450 now called filter. 771 00:39:00,450 --> 00:39:04,860 And this filter function takes as input the return value of some other function 772 00:39:04,860 --> 00:39:08,190 called BoxBlur that, long story short, will blur my image using 773 00:39:08,190 --> 00:39:12,130 a box of a 1-pixel radius. 774 00:39:12,130 --> 00:39:15,210 So just like your own code, if you implemented blur in C, 775 00:39:15,210 --> 00:39:18,840 this code is going to tell my code to look up, down, left, and right 776 00:39:18,840 --> 00:39:23,380 and blur the pixels by taking the average around them. 777 00:39:23,380 --> 00:39:24,300 And that's kind of it. 778 00:39:24,300 --> 00:39:26,250 After that I'm going to do after.save. 779 00:39:26,250 --> 00:39:28,500 And I'm going to save this as out.bmp.
780 00:39:28,500 --> 00:39:30,990 I just want to create a new file called out.bmp. 781 00:39:30,990 --> 00:39:33,450 And if I've made no mistakes, let me go ahead now 782 00:39:33,450 --> 00:39:37,830 and run python of blur.py and hit Enter. 783 00:39:37,830 --> 00:39:40,200 No error messages, so that's usually a good thing. 784 00:39:40,200 --> 00:39:43,980 If I type ls now, notice that I've got bridge.bmp, 785 00:39:43,980 --> 00:39:48,330 which I already opened, blur.py, which I just wrote, and out.bmp. 786 00:39:48,330 --> 00:39:52,860 And if I go ahead and open out.bmp, let's go ahead and take a look. 787 00:39:52,860 --> 00:39:55,720 Here's before, here's after. 788 00:39:55,720 --> 00:39:56,220 Huh. 789 00:39:56,220 --> 00:39:58,610 Before, after. 790 00:39:58,610 --> 00:40:00,360 Now, over the internet it probably doesn't 791 00:40:00,360 --> 00:40:03,068 look that blurred, though on my Mac right here a few inches away, 792 00:40:03,068 --> 00:40:04,238 it definitely looks blurred. 793 00:40:04,238 --> 00:40:06,030 But let's do it a little more compellingly. 794 00:40:06,030 --> 00:40:09,030 How about, instead of looking one pixel up, down, left, and right, 795 00:40:09,030 --> 00:40:10,930 why don't we look 10 pixels at a time? 796 00:40:10,930 --> 00:40:15,390 So we really blur it by looking at more values and averaging more. 797 00:40:15,390 --> 00:40:19,020 Let me go ahead now and run python of blur.py. 798 00:40:19,020 --> 00:40:20,880 Now let me go ahead and reopen. 799 00:40:20,880 --> 00:40:24,820 And now you see before and after. 800 00:40:24,820 --> 00:40:27,220 Before and after. 801 00:40:27,220 --> 00:40:28,480 So what is this to say? 802 00:40:28,480 --> 00:40:33,270 Well, here is, what, problem set 4 in four lines of code blurring an image. 803 00:40:33,270 --> 00:40:34,890 So pretty cool, pretty powerful. 
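The averaging that a box blur performs can be sketched in plain Python. This is only an illustration of the idea, not Pillow's actual implementation: the function name box_blur and the tiny grayscale grid are made up for the example, the box here includes diagonal neighbors as well as up, down, left, and right, and pixels near an edge simply average whichever neighbors exist.

```python
def box_blur(pixels, radius=1):
    """Return a blurred copy of a 2D list of grayscale values."""
    height = len(pixels)
    width = len(pixels[0])
    blurred = []
    for row in range(height):
        new_row = []
        for col in range(width):
            total = 0
            count = 0
            # Visit every pixel within `radius` rows and columns,
            # skipping positions that fall outside the image.
            for r in range(row - radius, row + radius + 1):
                for c in range(col - radius, col + radius + 1):
                    if 0 <= r < height and 0 <= c < width:
                        total += pixels[r][c]
                        count += 1
            # Integer average of the pixel and its neighbors.
            new_row.append(total // count)
        blurred.append(new_row)
    return blurred


# One bright pixel surrounded by dark ones (hypothetical data).
image = [
    [0, 0, 0],
    [0, 90, 0],
    [0, 0, 0],
]
result = box_blur(image, radius=1)
print(result)
```

A larger radius, like the 10 used in the second run, averages over more neighbors, which is why the result looks more blurred.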
804 00:40:34,890 --> 00:40:37,800 By standing on the shoulders of others and using their libraries can 805 00:40:37,800 --> 00:40:40,320 we do other things quite quickly. 806 00:40:40,320 --> 00:40:45,670 Notice what I can also do here, too, is solve a more recent problem. 807 00:40:45,670 --> 00:40:50,340 Let me go over to a different directory, where I have in advance-- 808 00:40:50,340 --> 00:40:53,070 and you can download these files off of the course's website-- 809 00:40:53,070 --> 00:40:55,920 a few files that we wrote before class. 810 00:40:55,920 --> 00:40:58,110 One is called speller.py. 811 00:40:58,110 --> 00:41:03,090 So long story short, speller.py is a translation from C 812 00:41:03,090 --> 00:41:06,065 into Python the code for speller.c. 813 00:41:06,065 --> 00:41:08,940 Recall that that was part of the distribution code for problem set 5, 814 00:41:08,940 --> 00:41:11,910 and in speller.c, we translated it now to speller.py. 815 00:41:11,910 --> 00:41:15,450 And in dictionaries and in texts, we see the same files, 816 00:41:15,450 --> 00:41:19,200 as in problem set 5, two different sized dictionaries and a whole bunch 817 00:41:19,200 --> 00:41:21,060 of short and long texts. 818 00:41:21,060 --> 00:41:25,500 What hasn't been created yet is the equivalent of a dictionary.c, a.k.a. 819 00:41:25,500 --> 00:41:27,540 now, dictionary.py. 820 00:41:27,540 --> 00:41:30,240 So let me go ahead and implement my spell checker in Python. 821 00:41:30,240 --> 00:41:34,090 Let me go ahead and create a file called dictionary.py, as is again, 822 00:41:34,090 --> 00:41:34,980 the convention. 823 00:41:34,980 --> 00:41:36,218 And let's go ahead. 824 00:41:36,218 --> 00:41:38,010 We have to implement four functions, right? 825 00:41:38,010 --> 00:41:40,950 We have to implement check, load, size, and unload. 826 00:41:40,950 --> 00:41:44,520 But I probably need like a global variable here to store my dictionary. 
827 00:41:44,520 --> 00:41:47,700 And this is where you all implemented your hash table with a pointer, 828 00:41:47,700 --> 00:41:50,935 and then linked lists, and arrays, and all of that, a lot of complexity. 829 00:41:50,935 --> 00:41:54,060 You know what, I'm just going to go ahead and give myself a variable called 830 00:41:54,060 --> 00:41:56,340 words and declare it as a set. 831 00:41:56,340 --> 00:41:58,920 So recall that a set is just a collection of values 832 00:41:58,920 --> 00:42:01,140 that handles duplicates for you. 833 00:42:01,140 --> 00:42:02,820 And frankly, that's all I really need. 834 00:42:02,820 --> 00:42:05,730 I need to be able to store all of the words in a dictionary 835 00:42:05,730 --> 00:42:09,180 and just throw them into a set, so that there's no duplicate values 836 00:42:09,180 --> 00:42:13,020 and I can just check, is one word in the set or is it not. 837 00:42:13,020 --> 00:42:15,768 Well, let's go ahead now and load words into that set. 838 00:42:15,768 --> 00:42:18,060 I'm going to go ahead and define a function called load 839 00:42:18,060 --> 00:42:20,160 that takes the name of a file to open. 840 00:42:20,160 --> 00:42:22,650 And here is, admittedly, some new syntax. 841 00:42:22,650 --> 00:42:27,760 So thus far, we've only typed code into the file itself. 842 00:42:27,760 --> 00:42:30,300 In fact, the most striking difference thus far, 843 00:42:30,300 --> 00:42:33,480 I dare say, about Python versus C, is that I have never 844 00:42:33,480 --> 00:42:36,307 once even written a main function. 845 00:42:36,307 --> 00:42:37,890 And that, too, is a feature of Python. 846 00:42:37,890 --> 00:42:39,600 If you want to write a program, you don't 847 00:42:39,600 --> 00:42:43,200 have to bother writing your default code in a function called main. 848 00:42:43,200 --> 00:42:44,460 Just start writing your code.
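The set behavior mentioned above can be sketched in a few lines. The sample words are made up for illustration; only the variable name words comes from the lecture.

```python
# An empty set that will hold the dictionary's words.
words = set()

words.add("cat")
words.add("dog")
words.add("cat")  # A duplicate: the set quietly keeps just one copy.

print(len(words))       # Number of distinct words stored
print("cat" in words)   # Membership test for a word that was added
print("bird" in words)  # Membership test for a word that was not
```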
849 00:42:44,460 --> 00:42:46,920 And that's how we were able to get hello, world 850 00:42:46,920 --> 00:42:50,850 down from this many lines of code in C to one line in Python. 851 00:42:50,850 --> 00:42:52,530 We didn't even need to have main. 852 00:42:52,530 --> 00:42:57,060 But if I want to define my own functions, it turns out in Python, 853 00:42:57,060 --> 00:43:01,080 you use the keyword def for define, then you put the name of the function, 854 00:43:01,080 --> 00:43:04,680 and then in parentheses, like in C, you put the names of the variables 855 00:43:04,680 --> 00:43:07,260 or parameters that you want the function to take. 856 00:43:07,260 --> 00:43:09,600 You don't have to specify data types, though. 857 00:43:09,600 --> 00:43:13,390 And again, we don't use curly braces, we're instead using a colon. 858 00:43:13,390 --> 00:43:16,980 So this says, hey, Python, give me a function called load that 859 00:43:16,980 --> 00:43:19,410 takes an argument called dictionary. 860 00:43:19,410 --> 00:43:21,250 And what should this function do? 861 00:43:21,250 --> 00:43:23,670 Well, the purpose of the load function in speller 862 00:43:23,670 --> 00:43:25,650 was to load each word from the dictionary 863 00:43:25,650 --> 00:43:27,517 and somehow put it into your hash table. 864 00:43:27,517 --> 00:43:30,600 I'm going to go ahead and do the same-- read each word from the dictionary 865 00:43:30,600 --> 00:43:33,752 and put it into this so-called set, my variable called words. 866 00:43:33,752 --> 00:43:36,960 So I'm going to go ahead and open the file, which I can do with this function 867 00:43:36,960 --> 00:43:37,590 here. 868 00:43:37,590 --> 00:43:39,730 In Python, you don't use fopen. 869 00:43:39,730 --> 00:43:41,490 You just use a function called open. 870 00:43:41,490 --> 00:43:45,630 And I'm going to assign the return value of open to a variable called file. 871 00:43:45,630 --> 00:43:47,790 But I could call that anything I want.
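As a small illustration of the def syntax described above, here is a sketch of a function definition with no data types, no curly braces, and no main. The function greet is hypothetical and chosen only to keep the example short; it is not part of the lecture's code.

```python
# Code at the top level of the file simply runs; no main function is required.

def greet(name):
    # The indented lines after the colon form the function's body.
    # Neither the parameter nor the return value declares a data type.
    return "hello, " + name


print(greet("world"))
```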
872 00:43:47,790 --> 00:43:49,710 This is where Python gets really cool. 873 00:43:49,710 --> 00:43:52,560 Recall that reading the lines from Python-- 874 00:43:52,560 --> 00:43:55,920 from the file in C was kind of arduous, right? 875 00:43:55,920 --> 00:43:59,790 You had to use fread or some other function 876 00:43:59,790 --> 00:44:02,250 in order to read character after character 877 00:44:02,250 --> 00:44:04,232 after character, one line at a time. 878 00:44:04,232 --> 00:44:05,940 Well, here in Python, you know what, if I 879 00:44:05,940 --> 00:44:08,280 want to iterate over all the lines in the file, 880 00:44:08,280 --> 00:44:10,770 we'll just say for line in file. 881 00:44:10,770 --> 00:44:15,720 This is going to automatically give me a for loop that 882 00:44:15,720 --> 00:44:22,050 assigns the variable line to each successive line in the file for me. 883 00:44:22,050 --> 00:44:25,140 It will figure out where all of those lines are. 884 00:44:25,140 --> 00:44:27,210 What do I want to do with each line? 885 00:44:27,210 --> 00:44:31,410 Well, I want to go ahead and add to my set of words that line. 886 00:44:31,410 --> 00:44:33,600 Insofar as each word-- 887 00:44:33,600 --> 00:44:39,660 each line represents a word, I just want to add to my global variable words 888 00:44:39,660 --> 00:44:40,510 that line. 889 00:44:40,510 --> 00:44:42,300 And that's not quite right, because what's 890 00:44:42,300 --> 00:44:45,000 at the end of every line in my file? 891 00:44:45,000 --> 00:44:48,600 Every line in my file by definition has a backslash n, right? 892 00:44:48,600 --> 00:44:51,090 That is why all of the words in the big dictionary 893 00:44:51,090 --> 00:44:53,020 we gave you are one per line. 894 00:44:53,020 --> 00:44:57,150 So how do you get rid of the new line at the end of a string? 
895 00:44:57,150 --> 00:45:01,470 Well, in C, my God, we would have to use malloc to make a copy, 896 00:45:01,470 --> 00:45:04,860 and then move all of the characters over, and then shorten it a little bit 897 00:45:04,860 --> 00:45:06,480 by getting rid of the backslash n. 898 00:45:06,480 --> 00:45:07,050 Uh-uh. 899 00:45:07,050 --> 00:45:12,960 In Python, if you want to strip off the new line at the end of a string, 900 00:45:12,960 --> 00:45:15,030 just do rstrip. 901 00:45:15,030 --> 00:45:18,510 To strip characters means by default to strip off white space. 902 00:45:18,510 --> 00:45:21,960 White space includes the space bar, the tab character, and backslash n. 903 00:45:21,960 --> 00:45:25,320 And so if you want to take each line and throw away 904 00:45:25,320 --> 00:45:30,150 the trailing new line at the end of it, you can simply say line.rstrip. 905 00:45:30,150 --> 00:45:33,120 And this is where strings again in Python are powerful. 906 00:45:33,120 --> 00:45:37,200 Because they are their own data type, they have inside of them, 907 00:45:37,200 --> 00:45:42,810 not only all of the characters composing the string, but also functions, 908 00:45:42,810 --> 00:45:46,080 like rstrip which strips from the end of the line 909 00:45:46,080 --> 00:45:48,210 any white space that might be there. 910 00:45:48,210 --> 00:45:50,370 You know what, after this I think I'm done. 911 00:45:50,370 --> 00:45:52,740 I'm just going to go ahead and close the file, 912 00:45:52,740 --> 00:45:55,270 and I'm going to go ahead and return True. 913 00:45:55,270 --> 00:45:56,010 So that's it. 914 00:45:56,010 --> 00:45:58,110 That's the load function in Python. 915 00:45:58,110 --> 00:46:01,200 Open the dictionary, for each line in the file 916 00:46:01,200 --> 00:46:04,890 add it to your global variable, close the file, return True. 
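The load function walked through above might look roughly like this. It is a reconstruction from the spoken description, so details such as the exact arguments passed to open are assumptions; the temporary sample file and its three words exist only so the sketch can run on its own, and the with blocks around it are just a convenience for that demo.

```python
import os
import tempfile

# Global set that holds every word from the dictionary.
words = set()


def load(dictionary):
    """Read one word per line from the named file into the set."""
    file = open(dictionary)
    for line in file:
        # rstrip() removes trailing whitespace, including the newline.
        words.add(line.rstrip())
    file.close()
    return True


# Demo with a small, made-up dictionary file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "small.txt")
    with open(path, "w") as sample:
        sample.write("cat\ncaterpillar\ndog\n")
    loaded = load(path)

print(loaded, sorted(words))
```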
917 00:46:04,890 --> 00:46:08,910 I mean, I'm pretty sure that my code is probably several lines, and certainly 918 00:46:08,910 --> 00:46:11,640 many hours, shorter than your code might have 919 00:46:11,640 --> 00:46:13,380 been for implementing that as well. 920 00:46:13,380 --> 00:46:14,700 Well, what about checking? 921 00:46:14,700 --> 00:46:16,500 Maybe the complexity is just elsewhere. 922 00:46:16,500 --> 00:46:18,292 Well, let me go ahead and define a function 923 00:46:18,292 --> 00:46:22,380 called check that takes a specific word as input as its argument. 924 00:46:22,380 --> 00:46:26,760 And then I'm just going to check if that given word is in my set of words. 925 00:46:26,760 --> 00:46:28,710 Well, it turns out in C you would probably 926 00:46:28,710 --> 00:46:30,752 have to use a for loop or a while loop, and you'd 927 00:46:30,752 --> 00:46:32,910 have to iterate over the whole list of words 928 00:46:32,910 --> 00:46:35,800 that you've loaded using binary search or linear search or the like. 929 00:46:35,800 --> 00:46:39,000 Ugh, I'm so past that at this point so many weeks in. 930 00:46:39,000 --> 00:46:48,360 I'm just going to say, if word in words, go ahead and return True, else return 931 00:46:48,360 --> 00:46:49,500 False. 932 00:46:49,500 --> 00:46:52,410 And that now is my implementation of check. 933 00:46:52,410 --> 00:46:53,730 Now, it's a little buggy. 934 00:46:53,730 --> 00:46:55,140 And I will fix this. 935 00:46:55,140 --> 00:46:56,460 Does anyone spot the bug? 936 00:46:56,460 --> 00:47:00,300 Even if you've never seen Python before, but having spent hours implementing 937 00:47:00,300 --> 00:47:07,950 your own version of check, is there some step I'm missing logically? 938 00:47:07,950 --> 00:47:10,340 There is a bug here. 939 00:47:10,340 --> 00:47:13,640 Does anyone spot what I'm not doing that you probably 940 00:47:13,640 --> 00:47:19,280 did do when checking if a given word is in fact in the dictionary? 
941 00:47:19,280 --> 00:47:22,030 BRIAN: A couple of people are commenting on case sensitivity. 942 00:47:22,030 --> 00:47:23,530 DAVID MALAN: Yeah, case sensitivity. 943 00:47:23,530 --> 00:47:26,180 So odds are, in your implementation in C you probably 944 00:47:26,180 --> 00:47:29,870 forced the word to all uppercase, or you forced it to all lowercase. 945 00:47:29,870 --> 00:47:33,170 Totally doable, but you probably had to do it like character for character. 946 00:47:33,170 --> 00:47:36,380 You might have had to copy the input using malloc, or putting it 947 00:47:36,380 --> 00:47:38,240 into an array character for character, then 948 00:47:38,240 --> 00:47:43,370 using a toupper or tolower to capitalize or lowercase each individual letter. 949 00:47:43,370 --> 00:47:46,620 Ugh, like, that would take forever, as indeed it might have. 950 00:47:46,620 --> 00:47:50,180 So you know what, if you want to take a given word and lowercase it, 951 00:47:50,180 --> 00:47:51,770 just say word.lower. 952 00:47:51,770 --> 00:47:54,920 And Python will take care of all of those steps of iterating 953 00:47:54,920 --> 00:47:58,730 over every character, changing each one to lowercase, and returning to you 954 00:47:58,730 --> 00:48:02,000 the new result. And indeed, this now, I would think, 955 00:48:02,000 --> 00:48:05,390 is consistent with what you did in your example as well. 956 00:48:05,390 --> 00:48:06,530 Well, how about size? 957 00:48:06,530 --> 00:48:09,020 Well, in size recall that you had to define 958 00:48:09,020 --> 00:48:13,940 a function that doesn't take any inputs but returns the number of words 959 00:48:13,940 --> 00:48:15,630 in the set of words. 960 00:48:15,630 --> 00:48:17,630 And I'm going to go ahead here-- and actually, I 961 00:48:17,630 --> 00:48:19,850 got my indentation slightly off here. 962 00:48:19,850 --> 00:48:22,790 Let me fix this real fast.
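The case-insensitive check described above can be sketched as follows. The two-word set stands in for a loaded dictionary and is an assumption of this example, as is the premise that the dictionary itself is stored in lowercase.

```python
# Stand-in for the global set filled by load (hypothetical contents).
words = {"cat", "dog"}


def check(word):
    """Return True if the word, ignoring case, is in the dictionary."""
    # lower() returns a new lowercase string; the original is unchanged.
    if word.lower() in words:
        return True
    else:
        return False


print(check("CAT"))
print(check("Bird"))
```

Without the call to lower, a capitalized word such as "CAT" would be reported as misspelled even though "cat" is in the set.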
963 00:48:22,790 --> 00:48:25,400 If you want to return the size of your dictionary, 964 00:48:25,400 --> 00:48:27,660 or really the number of words in your set, 965 00:48:27,660 --> 00:48:31,160 you can just return the length of that global variable words. 966 00:48:31,160 --> 00:48:32,130 Done. 967 00:48:32,130 --> 00:48:35,660 And lastly, if you want to unload the dictionary, 968 00:48:35,660 --> 00:48:37,220 let me go ahead and unload things. 969 00:48:37,220 --> 00:48:38,600 Doesn't take input as well. 970 00:48:38,600 --> 00:48:41,690 Honestly, because I've not done any equivalent of malloc, 971 00:48:41,690 --> 00:48:43,880 I've not done any memory management-- why? 972 00:48:43,880 --> 00:48:46,010 You don't have to in Python-- 973 00:48:46,010 --> 00:48:51,560 I can literally just return True in all cases, because my code is undoubtedly 974 00:48:51,560 --> 00:48:55,190 correct, because I didn't have to bother with pointers and addresses and memory 975 00:48:55,190 --> 00:48:55,920 management. 976 00:48:55,920 --> 00:48:58,962 So all of the stress that might have been induced over the past few weeks 977 00:48:58,962 --> 00:49:01,880 as you understood the lower level details of memory management now 978 00:49:01,880 --> 00:49:09,710 go away, not because it's not happening underneath the hood, 979 00:49:09,710 --> 00:49:12,080 but because Python is doing it for you. 980 00:49:12,080 --> 00:49:14,300 And I did spot one bug here actually. 981 00:49:14,300 --> 00:49:16,760 Notice I kind of relapsed into C code here. 982 00:49:16,760 --> 00:49:20,660 What I should have said here is it's actually file.close. 983 00:49:20,660 --> 00:49:25,130 So here when I close the file in load, I actually have to call file.close, 984 00:49:25,130 --> 00:49:30,360 because now that function close is associated with that variable for me. 985 00:49:30,360 --> 00:49:33,380 So again, there is memory management happening. 
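The remaining two functions are short enough to sketch together. The three-word set is a made-up stand-in for a loaded dictionary.

```python
# Stand-in for the global set filled by load (hypothetical contents).
words = {"cat", "dog", "fish"}


def size():
    """Return the number of words currently in the set."""
    return len(words)


def unload():
    """Nothing to free by hand: Python manages the set's memory."""
    return True


print(size())
print(unload())
```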
986 00:49:33,380 --> 00:49:37,550 Malloc and free or realloc are all happening sort of for you 987 00:49:37,550 --> 00:49:38,390 underneath the hood. 988 00:49:38,390 --> 00:49:40,310 But what Python the language is doing for 989 00:49:40,310 --> 00:49:42,523 you now is managing all of that for you. 990 00:49:42,523 --> 00:49:45,440 That's what you get by using a so-called higher-level language instead 991 00:49:45,440 --> 00:49:47,030 of a lower-level language. 992 00:49:47,030 --> 00:49:49,490 You get more features, and in turn in this case, 993 00:49:49,490 --> 00:49:52,610 you get all of those problems taken care of for you, 994 00:49:52,610 --> 00:49:55,430 so that you and I can focus on building our spell checker, 995 00:49:55,430 --> 00:49:58,190 so you and I can focus on building our Instagram filters, 996 00:49:58,190 --> 00:50:02,078 not on allocating memory, copying strings, uppercasing things, which 997 00:50:02,078 --> 00:50:05,120 honestly, while it might have been fun and very gratifying the first time 998 00:50:05,120 --> 00:50:08,390 you got those things working, programming would very quickly become 999 00:50:08,390 --> 00:50:11,150 the most tedious thing in the world if any time you 1000 00:50:11,150 --> 00:50:16,320 want to write a program you have to think and write code at that low level. 1001 00:50:16,320 --> 00:50:16,820 All right. 1002 00:50:16,820 --> 00:50:20,480 Let me go ahead and really cross my fingers that I didn't screw up here, 1003 00:50:20,480 --> 00:50:22,200 and go ahead and run this code. 1004 00:50:22,200 --> 00:50:25,088 So I'm going to go ahead and run python of speller.py-- 1005 00:50:25,088 --> 00:50:28,130 which, admittedly, I wrote in advance, because just like the distribution 1006 00:50:28,130 --> 00:50:32,248 code in speller, we wrote speller.c for you, we wrote speller.py in advance. 1007 00:50:32,248 --> 00:50:34,040 But we won't look at the internals of that.
1008 00:50:34,040 --> 00:50:35,832 I'm going to go ahead and test this on, how 1009 00:50:35,832 --> 00:50:37,880 about something big like Shakespeare. 1010 00:50:37,880 --> 00:50:40,070 And I'm going to cross my fingers here. 1011 00:50:40,070 --> 00:50:41,810 And so far so good. 1012 00:50:41,810 --> 00:50:44,030 The words are kind of flying by. 1013 00:50:44,030 --> 00:50:46,273 I'm going to assume they're correct. 1014 00:50:46,273 --> 00:50:47,690 Hopefully we'll get to the output. 1015 00:50:47,690 --> 00:50:50,870 And it looks like, yeah, I think I see some familiar numbers here. 1016 00:50:50,870 --> 00:50:53,450 I've got 143,091 words. 1017 00:50:53,450 --> 00:50:57,330 And then down here, the total time involved was just under 1 second. 1018 00:50:57,330 --> 00:50:58,790 So that's pretty darn fast. 1019 00:50:58,790 --> 00:51:01,010 And to be clear, I'm using my Mac instead of the IDE, 1020 00:51:01,010 --> 00:51:05,150 so my numbers might be a little different than in the cloud, but 0.9 1021 00:51:05,150 --> 00:51:05,840 seconds. 1022 00:51:05,840 --> 00:51:09,560 But you know what, out of curiosity, let me open up a different tab real quick, 1023 00:51:09,560 --> 00:51:13,340 and let me go ahead and make speller from problem set 5. 1024 00:51:13,340 --> 00:51:17,240 So I brought in advance our own implementation of speller, the staff 1025 00:51:17,240 --> 00:51:21,530 solution, written in C in dictionary.c and speller.c, 1026 00:51:21,530 --> 00:51:23,330 and I've just compiled it with make. 1027 00:51:23,330 --> 00:51:29,040 And let me go ahead and run ./speller using the same text on Shakespeare. 1028 00:51:29,040 --> 00:51:31,310 So again, I just ran the Python version, now 1029 00:51:31,310 --> 00:51:37,140 I want to run the C version using the staff's implementation. 1030 00:51:37,140 --> 00:51:37,640 All right. 1031 00:51:37,640 --> 00:51:38,610 Wow. 1032 00:51:38,610 --> 00:51:42,140 All right, it flew by way faster, kind of twice as fast. 
1033 00:51:42,140 --> 00:51:47,090 And notice, even though the numbers are the same up above, the times are not. 1034 00:51:47,090 --> 00:51:51,290 My C version took 0.52 seconds, so half a second. 1035 00:51:51,290 --> 00:51:55,310 My Python version took 0.9, or roughly 1 second. 1036 00:51:55,310 --> 00:52:01,220 So it would seem that my C version is faster, my Python version is slower. 1037 00:52:01,220 --> 00:52:04,850 Why might that be? 1038 00:52:04,850 --> 00:52:07,310 Why might that be? 1039 00:52:07,310 --> 00:52:10,550 Because I'm kind of disappointed if we just spent all this time 1040 00:52:10,550 --> 00:52:13,130 preaching the virtues of Python, and yet here we 1041 00:52:13,130 --> 00:52:15,410 are writing worse code, in some sense. 1042 00:52:15,410 --> 00:52:17,330 Santiago? 1043 00:52:17,330 --> 00:52:21,270 AUDIENCE: Could it be because C, even though it's low level, 1044 00:52:21,270 --> 00:52:24,660 it explicitly tells the computer what to do, 1045 00:52:24,660 --> 00:52:29,492 and so that makes it a little faster, whilst in Python it all 1046 00:52:29,492 --> 00:52:31,700 happens like underneath the hood, as you were saying, 1047 00:52:31,700 --> 00:52:33,560 so that could make it a little slower. 1048 00:52:33,560 --> 00:52:34,310 DAVID MALAN: Yeah. 1049 00:52:34,310 --> 00:52:36,980 In Python, you have a general-purpose solution 1050 00:52:36,980 --> 00:52:39,787 to the problem of memory management, and capitalization, 1051 00:52:39,787 --> 00:52:41,870 and all of these other features, that we ourselves 1052 00:52:41,870 --> 00:52:45,890 have to implement ourselves in C. Python has general-purpose implementations 1053 00:52:45,890 --> 00:52:46,820 of all of those. 1054 00:52:46,820 --> 00:52:50,750 But there's a price you pay by using someone else's code to implement 1055 00:52:50,750 --> 00:52:53,270 all of those things for you. 
1056 00:52:53,270 --> 00:52:57,140 And you pay an even greater price by using the type of language 1057 00:52:57,140 --> 00:52:58,655 that Python is in a sense. 1058 00:52:58,655 --> 00:53:00,530 So there's been this other salient difference 1059 00:53:00,530 --> 00:53:03,020 between using C and using Python. 1060 00:53:03,020 --> 00:53:08,150 When I wrote C code, I would compile my code from source code 1061 00:53:08,150 --> 00:53:09,080 into machine code. 1062 00:53:09,080 --> 00:53:11,450 And recall that machine code are 0's and 1's understood 1063 00:53:11,450 --> 00:53:14,750 by the computer's brain, the so-called CPU, or Central Processing Unit. 1064 00:53:14,750 --> 00:53:17,750 We always had to compile our code every time we changed the source code. 1065 00:53:17,750 --> 00:53:21,150 And then we did like ./hello to run the program. 1066 00:53:21,150 --> 00:53:25,860 But every demo thus far in Python, I haven't used make or clang. 1067 00:53:25,860 --> 00:53:32,690 I have used not ./hello, but rather python space the name of the program. 1068 00:53:32,690 --> 00:53:33,870 And why is that? 1069 00:53:33,870 --> 00:53:36,560 Well, it turns out that Python is often implemented as what 1070 00:53:36,560 --> 00:53:38,990 we describe with an interpreter. 1071 00:53:38,990 --> 00:53:42,120 So Python is not only a language like we've been writing, 1072 00:53:42,120 --> 00:53:44,460 it's also a program unto itself. 1073 00:53:44,460 --> 00:53:48,800 The Python program I keep running is an identically named program 1074 00:53:48,800 --> 00:53:51,330 that understands the Python language. 1075 00:53:51,330 --> 00:53:56,150 And what's happening, though, is that by using an interpreter, so to speak, 1076 00:53:56,150 --> 00:53:59,810 to run my programs you're incurring some amount of overhead. 1077 00:53:59,810 --> 00:54:01,490 You're paying a performance price. 1078 00:54:01,490 --> 00:54:02,252 Why? 
1079 00:54:02,252 --> 00:54:04,710 Well, computers, recall from week 0, at the end of the day, 1080 00:54:04,710 --> 00:54:06,380 only understand 0's and 1's. 1081 00:54:06,380 --> 00:54:08,390 That's what makes them tick. 1082 00:54:08,390 --> 00:54:11,240 But I have not outputted any 0's and 1's. 1083 00:54:11,240 --> 00:54:13,790 I the human have only been writing Python. 1084 00:54:13,790 --> 00:54:18,860 So there needs to be some kind of translation between my Python code, 1085 00:54:18,860 --> 00:54:23,120 in this English-like syntax, into what the computer itself understands. 1086 00:54:23,120 --> 00:54:25,670 And if you're not going to go through the effort of compiling 1087 00:54:25,670 --> 00:54:27,963 your code every time you make a change, but instead 1088 00:54:27,963 --> 00:54:30,380 you're just going to run your code through an interpreter, 1089 00:54:30,380 --> 00:54:33,470 as is the norm in the Python world, you're 1090 00:54:33,470 --> 00:54:37,760 going to pay a price, because someone had to implement a translator for you. 1091 00:54:37,760 --> 00:54:40,610 And in fact, there's formal terminology for this. 1092 00:54:40,610 --> 00:54:45,150 In the world of Python we have, for instance, 1093 00:54:45,150 --> 00:54:47,240 a picture that looks more like this. 1094 00:54:47,240 --> 00:54:50,630 Whereas in the world of C, we would actually take our source code as input 1095 00:54:50,630 --> 00:54:52,970 and output, first machine code is output, 1096 00:54:52,970 --> 00:54:56,480 and then run the machine code, in the world of Python thus far, 1097 00:54:56,480 --> 00:54:59,450 I'm writing source code, and then I'm immediately running it. 1098 00:54:59,450 --> 00:55:01,940 I'm not compiling it into 0's and 1's in advance. 
1099 00:55:01,940 --> 00:55:05,150 I'm trusting that there's a program, coincidentally called Python, 1100 00:55:05,150 --> 00:55:09,920 whose purpose in life is to translate that code for me 1101 00:55:09,920 --> 00:55:12,470 into something the computer does understand. 1102 00:55:12,470 --> 00:55:15,560 And what does that actually mean in real terms? 1103 00:55:15,560 --> 00:55:17,960 Well, it means that if I were to think back 1104 00:55:17,960 --> 00:55:22,250 to an algorithm like this, which is probably cryptic to many of you, 1105 00:55:22,250 --> 00:55:25,670 though not all, might be a Spanish algorithm 1106 00:55:25,670 --> 00:55:27,890 for searching a phone book for someone. 1107 00:55:27,890 --> 00:55:30,380 And suppose that I don't speak Spanish at all. 1108 00:55:30,380 --> 00:55:35,000 I might, ideally, compile this program, this algorithm, into something 1109 00:55:35,000 --> 00:55:39,410 I do understand by using a compiler that translates Spanish to English. 1110 00:55:39,410 --> 00:55:43,070 Like voila, this English version, much better reading and understanding this, 1111 00:55:43,070 --> 00:55:45,050 I can execute this algorithm pretty fast, 1112 00:55:45,050 --> 00:55:46,700 because I'm pretty good at English. 1113 00:55:46,700 --> 00:55:50,300 But if you only give me the Spanish version, the source code, 1114 00:55:50,300 --> 00:55:54,725 and you require that I translate it or interpret it line by line, 1115 00:55:54,725 --> 00:55:56,600 honestly that's really going to slow me down, 1116 00:55:56,600 --> 00:55:59,660 because it's like me having to go take like a Spanish dictionary 1117 00:55:59,660 --> 00:56:01,490 and look up every word-- 1118 00:56:01,490 --> 00:56:02,990 "Recoge guia telefonica." 1119 00:56:02,990 --> 00:56:04,912 All right, well, what's "recoge"? 1120 00:56:04,912 --> 00:56:05,870 I have to look that up. 1121 00:56:05,870 --> 00:56:07,880 What's "guia", what's "telefonica"? 1122 00:56:07,880 --> 00:56:08,630 Oh, OK.
1123 00:56:08,630 --> 00:56:09,710 Pick up phone book. 1124 00:56:09,710 --> 00:56:10,230 Got that. 1125 00:56:10,230 --> 00:56:10,730 Step one. 1126 00:56:10,730 --> 00:56:11,397 What's step two? 1127 00:56:11,397 --> 00:56:13,580 "Abre a la mitad de guia telefonica." 1128 00:56:13,580 --> 00:56:16,640 So "open to the middle"-- well, wait, I don't know that. 1129 00:56:16,640 --> 00:56:17,420 Spoiler. 1130 00:56:17,420 --> 00:56:18,738 What does that mean, "abre"? 1131 00:56:18,738 --> 00:56:20,030 All right, let me look that up. 1132 00:56:20,030 --> 00:56:20,900 And it means "open." 1133 00:56:20,900 --> 00:56:23,870 "A la mitad," that means "to the middle." 1134 00:56:23,870 --> 00:56:26,300 "De guia telefonica," "of the phone book." 1135 00:56:26,300 --> 00:56:28,710 Oh, that means "open to the middle of the phone book." 1136 00:56:28,710 --> 00:56:31,040 So I'm struggling to go back and forth here, clearly. 1137 00:56:31,040 --> 00:56:32,870 But it's clearly a slower process. 1138 00:56:32,870 --> 00:56:36,110 And if I keep going, "Ve la pagina," "Look at the page," 1139 00:56:36,110 --> 00:56:39,260 looking up, translating every line, it's undoubtedly 1140 00:56:39,260 --> 00:56:40,890 going to slow down the process. 1141 00:56:40,890 --> 00:56:43,160 And so that's effectively what's happening for us 1142 00:56:43,160 --> 00:56:44,900 when we run these Python programs. 1143 00:56:44,900 --> 00:56:48,350 There is a translator, a man in the middle, so to speak, 1144 00:56:48,350 --> 00:56:51,420 that's looking at your source code and reading it top to bottom, 1145 00:56:51,420 --> 00:56:55,550 left to right, and essentially translating each line respectively 1146 00:56:55,550 --> 00:56:59,340 into the corresponding code that the computer understands. 1147 00:56:59,340 --> 00:57:01,550 So the upside of this is that, thankfully, we 1148 00:57:01,550 --> 00:57:02,930 don't have to run make or clang. 
1149 00:57:02,930 --> 00:57:04,800 We don't have to compile our code anymore. 1150 00:57:04,800 --> 00:57:07,480 Like, how many people here have made a change 1151 00:57:07,480 --> 00:57:11,830 to an earlier pset in C, forgotten to save the file but you rerun the-- 1152 00:57:11,830 --> 00:57:14,680 sorry, you forgot to recompile the file, and you rerun it, 1153 00:57:14,680 --> 00:57:16,450 and the program obviously has not changed 1154 00:57:16,450 --> 00:57:19,840 because you haven't actually, not only saved but recompiled it? 1155 00:57:19,840 --> 00:57:22,720 So that stupid, annoying human step is gone. 1156 00:57:22,720 --> 00:57:25,960 In the world of Python, if you change your file, go ahead and just rerun it, 1157 00:57:25,960 --> 00:57:26,950 reinterpret it. 1158 00:57:26,950 --> 00:57:28,417 You can save that step. 1159 00:57:28,417 --> 00:57:31,000 But the price you're going to pay is a little bit of overhead. 1160 00:57:31,000 --> 00:57:34,660 And indeed, we see that here in terms of my Python version 1161 00:57:34,660 --> 00:57:37,570 taking roughly 1 second to spellcheck Shakespeare, 1162 00:57:37,570 --> 00:57:41,450 and my C version taking only one half of a second. 1163 00:57:41,450 --> 00:57:44,320 So here, too, I promised in past weeks this theme of trade-offs. 1164 00:57:44,320 --> 00:57:47,320 This is so prevalent in the world of computer science and programming, 1165 00:57:47,320 --> 00:57:48,730 and frankly in the real world. 1166 00:57:48,730 --> 00:57:51,940 Any time you make some improvement or gain some benefit, 1167 00:57:51,940 --> 00:57:53,860 odds are you are paying some price. 1168 00:57:53,860 --> 00:57:57,610 Maybe it's time, maybe it's space, maybe it's money, maybe it's complexity, 1169 00:57:57,610 --> 00:57:58,840 maybe it's anything else. 1170 00:57:58,840 --> 00:58:01,720 There's this perpetual trade-off of resources. 
1171 00:58:01,720 --> 00:58:03,670 And being a good programmer, ultimately, is 1172 00:58:03,670 --> 00:58:06,520 about finding those inflection points and knowing ultimately 1173 00:58:06,520 --> 00:58:09,760 what tools to use for the trade. 1174 00:58:09,760 --> 00:58:12,250 All right, let's go ahead here, take a 5-minute break. 1175 00:58:12,250 --> 00:58:14,833 And when we come back, we'll look at other features of Python, 1176 00:58:14,833 --> 00:58:19,330 and we'll end ultimately today with some really powerful capabilities. 1177 00:58:19,330 --> 00:58:21,130 Back in five. 1178 00:58:21,130 --> 00:58:21,850 All right. 1179 00:58:21,850 --> 00:58:22,540 We are back. 1180 00:58:22,540 --> 00:58:24,730 And first, a retraction if I may. 1181 00:58:24,730 --> 00:58:28,330 Brian kindly pointed out that my answer to Olivia and Noah's follow-up question 1182 00:58:28,330 --> 00:58:31,300 unfortunately missed the mark, as I was doing things on the fly instead 1183 00:58:31,300 --> 00:58:32,890 of reading the documentation. 1184 00:58:32,890 --> 00:58:36,100 So let me recall for us this example here, 1185 00:58:36,100 --> 00:58:38,990 wherein we had the range function returning three values. 1186 00:58:38,990 --> 00:58:43,330 So that code is correct; it gives us the values 0, 1, and 2. 1187 00:58:43,330 --> 00:58:46,450 But what I think Olivia asked was that if you wanted to skip values, 1188 00:58:46,450 --> 00:58:49,155 and for instance do every two digits, how do we do that? 1189 00:58:49,155 --> 00:58:51,280 And I unfortunately screwed up the syntax for that, 1190 00:58:51,280 --> 00:58:54,610 providing only two inputs to range instead of three, 1191 00:58:54,610 --> 00:58:56,000 as would be needed here. 
1192 00:58:56,000 --> 00:58:58,810 So for instance, suppose that we wanted to print out 1193 00:58:58,810 --> 00:59:02,500 all of the numbers between 0 and 100, inclusive, 1194 00:59:02,500 --> 00:59:06,520 but skipping every other-- so, 0, 2, 4, 6, 8, so all the even 1195 00:59:06,520 --> 00:59:09,230 numbers on up through 100. 1196 00:59:09,230 --> 00:59:12,430 We would actually want to do something like this instead. 1197 00:59:12,430 --> 00:59:17,200 We would say, for i in range of 0 comma 101 comma 2. 1198 00:59:17,200 --> 00:59:18,210 Why is that? 1199 00:59:18,210 --> 00:59:20,600 Well, we'll pull up the documentation in just a moment, 1200 00:59:20,600 --> 00:59:23,530 but 0 is where you want to start counting. 1201 00:59:23,530 --> 00:59:26,770 The second value, 101, is where you want to stop counting. 1202 00:59:26,770 --> 00:59:29,890 But it is by definition exclusive, so we have 1203 00:59:29,890 --> 00:59:31,940 to go 1 past the value we care about. 1204 00:59:31,940 --> 00:59:35,170 And then the 2, the third argument, is how many 1205 00:59:35,170 --> 00:59:40,090 numbers do you want to increment at a time, from 0 to 2 to 4 to 6 to 8, 1206 00:59:40,090 --> 00:59:41,920 on up through 100. 1207 00:59:41,920 --> 00:59:44,290 So how could I have figured this out in advance 1208 00:59:44,290 --> 00:59:45,880 rather than embarrassing myself now? 1209 00:59:45,880 --> 00:59:48,880 Well, it turns out there is official documentation for Python. 1210 00:59:48,880 --> 00:59:50,710 And we'll always link this to you. 1211 00:59:50,710 --> 00:59:53,060 And here there is this search box at the very top. 1212 00:59:53,060 --> 00:59:54,880 And you can see that during the break I was searching 1213 00:59:54,880 --> 00:59:56,320 for the documentation for range. 
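As a rough sketch of the three-argument form just described, the snippet below collects the even numbers from 0 through 100 into a list so that both endpoints are easy to inspect. The variable name `evens` is only illustrative and does not come from the lecture.

```python
# range(start, stop, step): stop is exclusive, so 101 is needed to include 100.
evens = list(range(0, 101, 2))

print(evens[:5])   # the first few values: 0, 2, 4, 6, 8
print(evens[-1])   # the last value: 100
```

Wrapping `range` in `list` is just for inspection here; a plain `for i in range(0, 101, 2)` loop visits the same values.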
1214 00:59:56,320 --> 00:59:59,362 And sure enough, if I search for the range documentation, at first glance 1215 00:59:59,362 --> 01:00:01,487 it might seem kind of overwhelming, because there's 1216 01:00:01,487 --> 01:00:04,450 a lot of mentions of something like range in the documentation. 1217 01:00:04,450 --> 01:00:07,460 Fortunately, the first result here is the one we want. 1218 01:00:07,460 --> 01:00:10,300 And if I click on that, you'll see some documentation that's 1219 01:00:10,300 --> 01:00:12,520 a little cryptic at first glance. 1220 01:00:12,520 --> 01:00:15,040 But what's interesting about this is that range 1221 01:00:15,040 --> 01:00:16,332 comes in two different flavors. 1222 01:00:16,332 --> 01:00:18,207 And even though I keep calling it a function, 1223 01:00:18,207 --> 01:00:19,880 technically it's what's called a class. 1224 01:00:19,880 --> 01:00:21,130 But more on that another time. 1225 01:00:21,130 --> 01:00:23,330 It behaves for our purposes as a function. 1226 01:00:23,330 --> 01:00:24,880 Notice that there's two lines here. 1227 01:00:24,880 --> 01:00:26,780 And they're similar but different. 1228 01:00:26,780 --> 01:00:30,730 The first one specifies that this range function 1229 01:00:30,730 --> 01:00:33,080 can take one input, the stop value. 1230 01:00:33,080 --> 01:00:35,500 So at what value do you want to stop counting? 1231 01:00:35,500 --> 01:00:40,300 So before, when we did range of 3, it stands to reason that by default, 1232 01:00:40,300 --> 01:00:43,360 if you start counting at 0 and you stop at 3, that will 1233 01:00:43,360 --> 01:00:47,260 get you to use i equals 0, 1, and 2. 1234 01:00:47,260 --> 01:00:50,500 But there's another flavor of the range function, which 1235 01:00:50,500 --> 01:00:52,420 is not the one that I proposed exists. 1236 01:00:52,420 --> 01:00:56,050 There's another that takes in potentially three arguments, here 1237 01:00:56,050 --> 01:00:57,400 or technically two. 
1238 01:00:57,400 --> 01:00:59,080 But it works in the following way. 1239 01:00:59,080 --> 01:01:02,320 When you see syntax like this in Python's documentation, 1240 01:01:02,320 --> 01:01:04,900 this means that the alternate form of range 1241 01:01:04,900 --> 01:01:09,490 takes an argument called start, followed by an argument called stop, 1242 01:01:09,490 --> 01:01:13,660 followed by, optionally, a third argument called step. 1243 01:01:13,660 --> 01:01:17,360 And I know as the reader it's optional, because it's in square brackets here. 1244 01:01:17,360 --> 01:01:20,050 So nothing to do with lists or arrays or anything like this. 1245 01:01:20,050 --> 01:01:21,590 This is just human documentation. 1246 01:01:21,590 --> 01:01:23,530 Anytime you see things in square brackets, 1247 01:01:23,530 --> 01:01:27,050 that tends to imply to the human reader that this is optional. 1248 01:01:27,050 --> 01:01:28,130 So what does that mean? 1249 01:01:28,130 --> 01:01:31,510 Well, notice that there is no flavor of range that 1250 01:01:31,510 --> 01:01:36,100 lets me specify a stop and a step, which I thought there was a moment ago when 1251 01:01:36,100 --> 01:01:37,450 answering Olivia and Noah. 1252 01:01:37,450 --> 01:01:40,170 But rather, there is this three-input version. 1253 01:01:40,170 --> 01:01:43,210 So if I specify I want to start at 0, I want 1254 01:01:43,210 --> 01:01:47,230 to stop at 101, which is just past the 100 I care about, 1255 01:01:47,230 --> 01:01:50,110 and then provide an optional step of 2, this 1256 01:01:50,110 --> 01:01:53,200 will give me a program ultimately that will print out 1257 01:01:53,200 --> 01:01:54,730 all of those even numbers. 1258 01:01:54,730 --> 01:01:55,780 So let me do this. 1259 01:01:55,780 --> 01:01:58,250 First let me go into a program here. 1260 01:01:58,250 --> 01:01:59,650 I'll call it count.py. 
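To make the two documented forms concrete, here is a small sketch comparing them. It also shows what happens, as far as the documented signature implies, when only two inputs are passed in the hope that they mean stop and step: they are read as start and stop instead. The variable names are illustrative.

```python
# One-input form: range(stop). Counting starts at 0 by default.
one_input = list(range(3))

# Alternate form: range(start, stop[, step]). The step defaults to 1.
two_inputs = list(range(0, 3))
three_inputs = list(range(0, 3, 1))

# There is no (stop, step) form. These two inputs are read as start=101, stop=2,
# and counting up from 101 never reaches a value below 2, so nothing is produced.
mistaken = list(range(101, 2))

print(one_input, two_inputs, three_inputs, mistaken)
```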
1261 01:01:59,650 --> 01:02:04,030 And I'm going to go ahead and start at 0, go up to but not through 101, 1262 01:02:04,030 --> 01:02:05,350 stepping 2 at a time. 1263 01:02:05,350 --> 01:02:07,200 And this time I'm going to print out i. 1264 01:02:07,200 --> 01:02:09,300 And here, too, another handy feature of Python-- 1265 01:02:09,300 --> 01:02:11,730 no more %s, and also no more %i. 1266 01:02:11,730 --> 01:02:14,640 If you want to print out the value of a variable called i, 1267 01:02:14,640 --> 01:02:18,120 just say print, open paren, i, close paren. 1268 01:02:18,120 --> 01:02:20,565 You don't need another format string as in C. 1269 01:02:20,565 --> 01:02:24,570 Let me go ahead now and run python of count.py, Enter. 1270 01:02:24,570 --> 01:02:26,070 And it scrolled by really fast. 1271 01:02:26,070 --> 01:02:28,350 But notice that it stopped at 100, and if I scroll 1272 01:02:28,350 --> 01:02:30,810 to the beginning it started at 0. 1273 01:02:30,810 --> 01:02:31,615 So my apologies. 1274 01:02:31,615 --> 01:02:33,420 Mea culpa for messing that up earlier. 1275 01:02:33,420 --> 01:02:36,510 But what a wonderful opportunity to introduce the official documentation 1276 01:02:36,510 --> 01:02:39,930 for Python, which will soon become your friend, 1277 01:02:39,930 --> 01:02:42,600 cryptic though it might feel at first glance. 1278 01:02:42,600 --> 01:02:43,360 All right. 1279 01:02:43,360 --> 01:02:45,840 Let's go ahead then and revisit one other program 1280 01:02:45,840 --> 01:02:47,400 that we started with earlier. 1281 01:02:47,400 --> 01:02:50,850 And that program was again this relatively simple Hello program 1282 01:02:50,850 --> 01:02:52,710 that we left off in this state. 1283 01:02:52,710 --> 01:02:56,310 We were using the get_string function from the CS50 library in Python. 1284 01:02:56,310 --> 01:02:59,160 We had a variable called answer that was getting the return 1285 01:02:59,160 --> 01:03:01,020 value of that version of get_string. 
1286 01:03:01,020 --> 01:03:04,620 And we were printing out "hello," comma, so-and-so. 1287 01:03:04,620 --> 01:03:07,620 And we were using that new cryptic feature, but handy, 1288 01:03:07,620 --> 01:03:12,480 known as a format string or an f-string, which just means replace whatever's 1289 01:03:12,480 --> 01:03:14,717 in curly braces with the actual value. 1290 01:03:14,717 --> 01:03:16,800 So let's start to now take off the training wheels 1291 01:03:16,800 --> 01:03:18,660 that we just put on only an hour ago. 1292 01:03:18,660 --> 01:03:20,670 Let's get rid of the CS50 library. 1293 01:03:20,670 --> 01:03:24,210 How can we actually get input in Python without using 1294 01:03:24,210 --> 01:03:26,670 a library from someone like CS50? 1295 01:03:26,670 --> 01:03:28,290 Well, get_string no longer exists. 1296 01:03:28,290 --> 01:03:33,480 But thankfully there is another function we can use called, quite simply, input. 1297 01:03:33,480 --> 01:03:38,730 Input is a function that, quite similar to get_string in both C and Python, 1298 01:03:38,730 --> 01:03:42,030 prompts the user with a phrase, like this one here, "What's your name?"; 1299 01:03:42,030 --> 01:03:45,120 waits for them to type in a value; and as soon as they hit Enter, 1300 01:03:45,120 --> 01:03:48,610 it returns whatever the human has typed in for you. 1301 01:03:48,610 --> 01:03:52,800 So if I go ahead now and rerun this program, python of hello.py, 1302 01:03:52,800 --> 01:03:56,505 after getting rid of the CS50 library and using input instead of get_string, 1303 01:03:56,505 --> 01:03:57,810 what's my name? 1304 01:03:57,810 --> 01:03:58,760 David. 1305 01:03:58,760 --> 01:03:59,820 "Hello," comma, "David." 1306 01:03:59,820 --> 01:04:02,790 So already there now, this is raw, native Python 1307 01:04:02,790 --> 01:04:07,080 code completely unrelated to anything CS50 specific. 
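A minimal sketch of the library-free hello program described above might look like the following. The `hello` function and its `read` parameter are assumptions added for illustration, so that the example can be exercised without a keyboard; the lecture's version simply calls `input` directly.

```python
def hello(read=input):
    """Prompt for a name and return the greeting as a string."""
    # By default, read is Python's built-in input(), which returns what was typed as a str.
    answer = read("What's your name? ")
    # The f-string replaces {answer} with the variable's actual value.
    return f"hello, {answer}"


# Interactive use would be: print(hello())
# Here a stand-in reader pretends the user typed "David".
print(hello(read=lambda prompt: "David"))
```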
1308 01:04:07,080 --> 01:04:10,260 But now let's go ahead, and let's keep using the CS50 library initially, 1309 01:04:10,260 --> 01:04:13,830 because we'll see that very quickly there are advantages of using it, 1310 01:04:13,830 --> 01:04:15,960 because we do a lot of error checking for you. 1311 01:04:15,960 --> 01:04:19,500 But we'll eventually take those training wheels off entirely as well. 1312 01:04:19,500 --> 01:04:22,290 But notice, indeed, how relatively simple it is to do so. 1313 01:04:22,290 --> 01:04:26,983 Let me go ahead and open up a program that we wrote in advance. 1314 01:04:26,983 --> 01:04:28,650 And I'm going to go ahead and grab this. 1315 01:04:28,650 --> 01:04:31,510 This is available, as always, on the course's website. 1316 01:04:31,510 --> 01:04:35,850 And I'm going to go ahead and open a file called addition0.c, 1317 01:04:35,850 --> 01:04:37,830 which we've actually seen before. 1318 01:04:37,830 --> 01:04:40,200 And I'm going to go ahead and do this fancy thing here 1319 01:04:40,200 --> 01:04:43,350 where, in just a moment, I'm going to split my window so 1320 01:04:43,350 --> 01:04:45,240 that I can see two files at a time. 1321 01:04:45,240 --> 01:04:49,350 And over here I'm going to create a new file, and I'll call this addition.py. 1322 01:04:49,350 --> 01:04:52,620 So that is to say, I'm just going to rearrange my IDE temporarily 1323 01:04:52,620 --> 01:04:56,400 today so that we can see one language on the left, C, and then 1324 01:04:56,400 --> 01:04:58,872 the corresponding language on the right in Python. 1325 01:04:58,872 --> 01:05:01,080 And again, you can download all these examples online 1326 01:05:01,080 --> 01:05:03,220 if you'd like to follow along on your own. 1327 01:05:03,220 --> 01:05:06,820 So if I'm translating this program on the left to this program on the right, 1328 01:05:06,820 --> 01:05:09,810 let's first recall what the program on the left actually did. 
1329 01:05:09,810 --> 01:05:13,500 This was a program that prompts the user for x, prompts the user for y, 1330 01:05:13,500 --> 01:05:15,720 and quite simply performs addition on the two. 1331 01:05:15,720 --> 01:05:18,300 So this is week 1 stuff, way back when now. 1332 01:05:18,300 --> 01:05:19,990 Well, let's go ahead and translate this. 1333 01:05:19,990 --> 01:05:22,622 I will use the get_int function from the CS50 library, 1334 01:05:22,622 --> 01:05:25,080 because it's going to make my life a little easier for now. 1335 01:05:25,080 --> 01:05:28,260 I'm going to say from cs50 import get_int. 1336 01:05:28,260 --> 01:05:30,990 I'm going to then go ahead and get an int from the user using 1337 01:05:30,990 --> 01:05:32,820 get_int and prompting them for x. 1338 01:05:32,820 --> 01:05:36,390 I'm going to then go ahead and get an int from the user prompting them for y. 1339 01:05:36,390 --> 01:05:41,700 I'm going to then finally go ahead and, let's say, print out x plus y. 1340 01:05:41,700 --> 01:05:45,690 And let me go ahead down here and run python of addition.py. 1341 01:05:45,690 --> 01:05:50,310 I'm now being prompted for x, let's type in 1, y, let's type in 2, and voila, 1342 01:05:50,310 --> 01:05:52,140 3 is my program here. 1343 01:05:52,140 --> 01:05:53,520 So pretty straightforward. 1344 01:05:53,520 --> 01:05:56,550 Fewer lines of code, because one, I don't have these unnecessary 1345 01:05:56,550 --> 01:05:58,600 includes like stdio.h. 1346 01:05:58,600 --> 01:06:00,355 I don't have any of the curly braces. 1347 01:06:00,355 --> 01:06:02,230 To be fair, I don't have any of the comments. 1348 01:06:02,230 --> 01:06:03,272 So let me write comments. 1349 01:06:03,272 --> 01:06:05,970 In Python, it's going to be a different symbol. 1350 01:06:05,970 --> 01:06:12,960 "Prompt user for x" should be prefixed with a hash symbol now instead of a //. 
1351 01:06:12,960 --> 01:06:18,300 I'll go ahead and prompt user for y, and then, how about here, perform addition. 1352 01:06:18,300 --> 01:06:19,890 But even still, it's pretty tight. 1353 01:06:19,890 --> 01:06:23,965 It's only 10 lines of code with some of those comments there. 1354 01:06:23,965 --> 01:06:26,590 All right, well, what might I do that's a little bit different? 1355 01:06:26,590 --> 01:06:27,990 Well, let's take off the training wheels. 1356 01:06:27,990 --> 01:06:30,740 Let's take off the training wheels and get rid of the CS50 library 1357 01:06:30,740 --> 01:06:32,880 again and get input here. 1358 01:06:32,880 --> 01:06:36,240 Well, if I go ahead and get input here, get input here, 1359 01:06:36,240 --> 01:06:40,230 assigning the values to x and y respectively, I'm going to go ahead now 1360 01:06:40,230 --> 01:06:44,340 and run python of addition.py. 1361 01:06:44,340 --> 01:06:48,540 x will be 1 again, y will be 2 again, and the answer, of course, is-- 1362 01:06:48,540 --> 01:06:49,790 12. 1363 01:06:49,790 --> 01:06:52,560 Well, that's wrong. 1364 01:06:52,560 --> 01:06:55,020 What's going on? 1365 01:06:55,020 --> 01:06:59,420 How did I screw up such a simple program already? 1366 01:06:59,420 --> 01:07:02,850 Albeit in a new language for me, Python. 1367 01:07:02,850 --> 01:07:03,950 What did I do here? 1368 01:07:03,950 --> 01:07:05,300 Yeah, Ben? 1369 01:07:05,300 --> 01:07:08,387 AUDIENCE: Because it's really taking it in as two strings, 1370 01:07:08,387 --> 01:07:10,220 so it's just putting them next to each other 1371 01:07:10,220 --> 01:07:12,020 as opposed to doing the actual math on it. 1372 01:07:12,020 --> 01:07:13,635 It's not reading it as an int. 1373 01:07:13,635 --> 01:07:14,510 DAVID MALAN: Exactly. 1374 01:07:14,510 --> 01:07:17,270 So input, this function that comes with Python, 1375 01:07:17,270 --> 01:07:19,505 really is analogous to CS50's get_string. 
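The surprising 12 can be reproduced without any keyboard input. In this small sketch, the two string literals stand in for what `input` would hand back if the user typed 1 and then 2.

```python
# input() always returns a str, so these stand in for the typed values.
x = "1"
y = "2"

# With two strings, + concatenates instead of adding.
print(x + y)      # 12
print(type(x))    # <class 'str'>
```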
1376 01:07:19,505 --> 01:07:21,380 No matter what the human types, it's going 1377 01:07:21,380 --> 01:07:25,370 to come back as keyboard input characters, or ASCII characters, 1378 01:07:25,370 --> 01:07:27,260 or Unicode characters from weeks past. 1379 01:07:27,260 --> 01:07:29,052 Even if they look like numbers, they're not 1380 01:07:29,052 --> 01:07:32,990 going to be treated as numbers, a.k.a. integers, unless we coerce them so. 1381 01:07:32,990 --> 01:07:37,550 Now remember in C, we had this ability to cast values from one to another. 1382 01:07:37,550 --> 01:07:40,170 Casting meant to convert one data type to another. 1383 01:07:40,170 --> 01:07:44,150 And we were allowed to do that for chars to ints or ints to chars, 1384 01:07:44,150 --> 01:07:48,668 but you could not do it for strings to ints, or from ints to strings. 1385 01:07:48,668 --> 01:07:50,210 For that we needed special functions. 1386 01:07:50,210 --> 01:07:53,520 And some of you might have used atoi, ASCII to int, 1387 01:07:53,520 --> 01:07:56,960 which was a function that actually looks at all of the characters in an ASCII 1388 01:07:56,960 --> 01:07:59,690 string and converts it to the corresponding integer. 1389 01:07:59,690 --> 01:08:02,040 In Python, frankly, it's a little simpler. 1390 01:08:02,040 --> 01:08:04,320 We can just cast it from one thing to another. 1391 01:08:04,320 --> 01:08:08,270 So I'm going to go ahead and cast the return value of input 1392 01:08:08,270 --> 01:08:11,390 using this, int. 1393 01:08:11,390 --> 01:08:16,010 And I'm going to do the same for y, passing the return value of input there 1394 01:08:16,010 --> 01:08:18,620 to convert what looks like a string to what's-- 1395 01:08:18,620 --> 01:08:21,800 what looks like an int to what's actually an int. 1396 01:08:21,800 --> 01:08:25,399 And now let me go ahead and perform the additions again, python of addition.py. 
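Here is a brief sketch of that conversion, again with string literals standing in for typed input. One hedged aside: although the lecture calls this casting, `int` applied to a string actually parses the characters, so in spirit it is closer to C's `atoi` than to a C cast.

```python
# Stand-ins for what input() would return.
x_text = "1"
y_text = "2"

# int() parses each string into an actual integer.
x = int(x_text)
y = int(y_text)

# With two ints, + performs arithmetic addition.
print(x + y)      # 3
print(type(x))    # <class 'int'>
```

In the real program the same effect comes from wrapping the call itself, as in `x = int(input("x: "))`.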
1397 01:08:25,399 --> 01:08:27,830 And notice this time, hopefully to Ben's point, 1398 01:08:27,830 --> 01:08:31,529 it's not going to concatenate two strings, as we saw 1399 01:08:31,529 --> 01:08:34,790 is the default behavior of plus when you have two strings left and right. 1400 01:08:34,790 --> 01:08:39,290 Hopefully now it will do addition on x equals 1, y equals 2. 1401 01:08:39,290 --> 01:08:42,310 And voila, now we're back in business. 1402 01:08:42,310 --> 01:08:47,479 However, what if I'm not the most cooperative or sharp user, 1403 01:08:47,479 --> 01:08:50,090 and I type in "cat" for x? 1404 01:08:50,090 --> 01:08:52,950 Now some crazy stuff starts to happen. 1405 01:08:52,950 --> 01:08:55,729 So notice we've triggered our very first error when 1406 01:08:55,729 --> 01:08:58,520 it comes to running a program whereby my program won't even 1407 01:08:58,520 --> 01:08:59,660 run in the first place. 1408 01:08:59,660 --> 01:09:02,180 And notice I'm getting some somewhat cryptic syntax here-- 1409 01:09:02,180 --> 01:09:06,319 traceback, most recent call last, file addition.py line 2. 1410 01:09:06,319 --> 01:09:07,819 All right, that's at least familiar. 1411 01:09:07,819 --> 01:09:09,800 I screwed up somewhere on line 2. 1412 01:09:09,800 --> 01:09:11,970 It's showing me the line of code here. 1413 01:09:11,970 --> 01:09:16,729 And it's saying "ValueError-- invalid literal for int with base 10, cat." 1414 01:09:16,729 --> 01:09:19,430 That's a very cryptic way of saying I just 1415 01:09:19,430 --> 01:09:23,750 have tried to cast something that's not an integer to an integer. 1416 01:09:23,750 --> 01:09:26,600 And so this is why we use things like the CS50 library. 1417 01:09:26,600 --> 01:09:28,970 It's actually kind of annoying to write all of the code 1418 01:09:28,970 --> 01:09:32,450 that checks and makes sure did the user type in a number and only a number, 1419 01:09:32,450 --> 01:09:35,270 and not "cat" or "dog" or some other cryptic string. 
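To give a feel for the kind of checking a library can do on your behalf, here is a hedged sketch of a retry loop built on that ValueError. The name `prompt_for_int` and its `read` parameter are hypothetical and chosen for illustration; this is not the CS50 library's actual implementation, only one plausible way to get similar behavior.

```python
def prompt_for_int(prompt, read=input):
    """Keep asking until the reply converts cleanly to an int."""
    while True:
        try:
            # int() raises ValueError for text like "cat".
            return int(read(prompt))
        except ValueError:
            # Not an integer, so loop around and ask again.
            continue


# Simulate a user who types "cat", then "dog", then finally "42".
replies = iter(["cat", "dog", "42"])
value = prompt_for_int("x: ", read=lambda prompt: next(replies))
print(value)   # 42
```

Passing the reader in as a parameter is a design choice made here only so the loop can be demonstrated with canned replies; in an ordinary program the default `input` would be used.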
1420 01:09:35,270 --> 01:09:38,450 We ourselves now would have to implement that kind of error checking 1421 01:09:38,450 --> 01:09:40,250 if we don't want to use the CS50 library. 1422 01:09:40,250 --> 01:09:41,430 So there, trade-off. 1423 01:09:41,430 --> 01:09:43,760 Maybe you feel more comfortable writing all of the code yourself. 1424 01:09:43,760 --> 01:09:46,552 You don't want to use some random person on the internet's library, 1425 01:09:46,552 --> 01:09:49,760 whether it's CS50's or someone else's, even if it's free and open source. 1426 01:09:49,760 --> 01:09:51,140 You want to write it yourself. 1427 01:09:51,140 --> 01:09:51,880 OK, fine. 1428 01:09:51,880 --> 01:09:53,630 If you want to write it yourself, now I've 1429 01:09:53,630 --> 01:09:56,630 got to add a bunch more lines of code to check, 1430 01:09:56,630 --> 01:10:00,260 did the human type in a decimal digit one after the other, or did they 1431 01:10:00,260 --> 01:10:02,490 type in other ASCII characters? 1432 01:10:02,490 --> 01:10:05,300 So again, trade-off between using libraries or not. 1433 01:10:05,300 --> 01:10:10,100 Generally, the answer is going to be use a common library to do-- 1434 01:10:10,100 --> 01:10:12,110 to solve these kinds of problems. 1435 01:10:12,110 --> 01:10:14,520 Well, let's go ahead and change the program a little bit. 1436 01:10:14,520 --> 01:10:20,810 Let me go ahead and open a new file called division.py just 1437 01:10:20,810 --> 01:10:22,490 to do a bit of division here. 1438 01:10:22,490 --> 01:10:25,040 And let me go ahead on the right-hand side and copy 1439 01:10:25,040 --> 01:10:28,440 paste what we did before, but just change to division here. 1440 01:10:28,440 --> 01:10:31,310 Let me go ahead and divide x by y. 1441 01:10:31,310 --> 01:10:33,830 And I keep typing in 1 for x, 2 for y. 1442 01:10:33,830 --> 01:10:36,890 In a moment I'm going to run python of division.py and type 1443 01:10:36,890 --> 01:10:38,900 in 1 for x and 2 for y. 
1444 01:10:38,900 --> 01:10:44,780 But before I hit Enter, if this were a program in C, what would the answer be? 1445 01:10:44,780 --> 01:10:47,240 Feel free to just respond in the chat if you'd like. 1446 01:10:47,240 --> 01:10:51,050 If this were a program in C, and I'm dividing x by y, 1447 01:10:51,050 --> 01:10:54,860 what would I have gotten in week 1 and every week since, Brian? 1448 01:10:54,860 --> 01:10:56,572 BRIAN: The consensus looks like 0. 1449 01:10:56,572 --> 01:10:58,280 DAVID MALAN: Yeah, because of truncation. 1450 01:10:58,280 --> 01:11:04,650 If 1 divided by 2, of course, is 1/2, or 0.5, 0.5 is a float. 1451 01:11:04,650 --> 01:11:07,340 But if I'm dealing with integers, even though it's implicitly 1452 01:11:07,340 --> 01:11:11,180 integers thus far, and now explicitly now that I've casted them, 1453 01:11:11,180 --> 01:11:14,360 I would seem to throw away the 0.5 and just get back 0. 1454 01:11:14,360 --> 01:11:18,290 But let me go ahead and run python of division.py and putting x equals 1, 1455 01:11:18,290 --> 01:11:19,190 y equals 2. 1456 01:11:19,190 --> 01:11:24,590 And voila, wow, one of the most annoying features, or lack of features in C, 1457 01:11:24,590 --> 01:11:25,820 seems to have been-- 1458 01:11:25,820 --> 01:11:29,880 seems to have been solved in Python by division doing what you want. 1459 01:11:29,880 --> 01:11:32,540 And if you divide one integer by another in Python, 1460 01:11:32,540 --> 01:11:35,660 it turns out one of the other features of today's language 1461 01:11:35,660 --> 01:11:37,910 is that it does what you the programmer would 1462 01:11:37,910 --> 01:11:42,080 expect, without having to get into the weeds, of the nuances of floats 1463 01:11:42,080 --> 01:11:42,590 and ints. 1464 01:11:42,590 --> 01:11:46,220 Just does the quote, unquote "right thing" instead. 1465 01:11:46,220 --> 01:11:50,720 Well, let me go ahead and open up another program here, also from week 1. 
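The division behavior is easy to check directly. In the sketch below, the `//` operator is not something the lecture has mentioned; it is included only to show that the C-style truncated result is still available in Python when it is wanted.

```python
x = 1
y = 2

# In Python 3, / between two ints produces a float rather than truncating.
print(x / y)    # 0.5

# Floor division with // gives the integer result that C would have produced here.
print(x // y)   # 0

# Even when the division is exact, / still returns a float.
print(4 / 2)    # 2.0
```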
1466 01:11:50,720 --> 01:11:54,425 This one was called conditions.c. 1467 01:11:54,425 --> 01:11:58,340 And this one-- give me one moment to open this up on the left-- 1468 01:11:58,340 --> 01:12:01,520 this one here was a program whose purpose in life 1469 01:12:01,520 --> 01:12:04,910 was to get an int from the user called x, get another called y. 1470 01:12:04,910 --> 01:12:08,840 And then it just did this-- if x less than y, print out as much. 1471 01:12:08,840 --> 01:12:12,412 Else if x greater than y, print out as much, and so forth. 1472 01:12:12,412 --> 01:12:14,120 Let's go ahead and translate this program 1473 01:12:14,120 --> 01:12:18,350 into the corresponding Python code using some of the syntax we've seen already. 1474 01:12:18,350 --> 01:12:20,560 I'm going to go ahead and save this as conditions.py. 1475 01:12:20,560 --> 01:12:22,310 And I think I'm going to go ahead and keep 1476 01:12:22,310 --> 01:12:24,350 using the library, the CS50 library, so that I 1477 01:12:24,350 --> 01:12:26,900 don't have to worry about those kinds of errors 1478 01:12:26,900 --> 01:12:29,140 when casting bad input to another. 1479 01:12:29,140 --> 01:12:31,910 So from cs50 import get_int. 1480 01:12:31,910 --> 01:12:36,050 And let me go ahead and now get an int from the user, calling it x. 1481 01:12:36,050 --> 01:12:39,560 Let's go ahead and get an int from the user, calling it y. 1482 01:12:39,560 --> 01:12:42,470 And I won't bother typing comments this time, just for time's sake. 1483 01:12:42,470 --> 01:12:43,820 And now let me ask the question. 1484 01:12:43,820 --> 01:12:46,460 In C, I would have done if x less than y. 1485 01:12:46,460 --> 01:12:48,050 Python's a little more terse. 1486 01:12:48,050 --> 01:12:50,900 If x less than y suffices, but with a colon. 1487 01:12:50,900 --> 01:12:55,380 Under that, I'm going to go ahead and say print "x is less than y." 
1488 01:12:55,380 --> 01:12:57,860 Elif-- this is the weird one-- 1489 01:12:57,860 --> 01:13:02,420 x is greater than y, go ahead and print out "x is greater than y." 1490 01:13:02,420 --> 01:13:09,050 And then else, also with a colon, print out "x is equal to y." 1491 01:13:09,050 --> 01:13:11,250 And I think that's just about it. 1492 01:13:11,250 --> 01:13:15,020 I'm going to go ahead down here and run python of conditions.py. 1493 01:13:15,020 --> 01:13:19,040 I'll type in 1, I'll type in 2, and indeed x is less than y. 1494 01:13:19,040 --> 01:13:22,040 I'll run it again, this time with 2 and 1. 1495 01:13:22,040 --> 01:13:23,690 X is greater than y. 1496 01:13:23,690 --> 01:13:25,880 And let me run it again with 1 and 1. 1497 01:13:25,880 --> 01:13:27,230 X is equal to y. 1498 01:13:27,230 --> 01:13:28,460 So that seems to have worked. 1499 01:13:28,460 --> 01:13:30,002 And let me point out one other thing. 1500 01:13:30,002 --> 01:13:33,230 I mentioned earlier that you have this other shorthand syntax where 1501 01:13:33,230 --> 01:13:36,650 you can just say import the CS50 library if you don't want to bother typing out 1502 01:13:36,650 --> 01:13:38,000 individual function names. 1503 01:13:38,000 --> 01:13:39,510 That's totally fine. 1504 01:13:39,510 --> 01:13:42,470 But notice that the IDE is yelling at me at lines 3 and 4 1505 01:13:42,470 --> 01:13:44,820 that get_int is no longer recognized. 1506 01:13:44,820 --> 01:13:47,000 That's because Python supports this feature, 1507 01:13:47,000 --> 01:13:52,010 when using other people's libraries, that it can namespace them for you. 1508 01:13:52,010 --> 01:13:55,910 That is to say, you can't refer to get_int anymore directly. 1509 01:13:55,910 --> 01:13:59,990 You have to more explicitly say, call the get_int function that's 1510 01:13:59,990 --> 01:14:02,630 inside of the CS50 library. 
1511 01:14:02,630 --> 01:14:05,510 And so again, using our familiar dot operator, 1512 01:14:05,510 --> 01:14:08,810 means go inside of that CS50 library, just like a C struct, 1513 01:14:08,810 --> 01:14:11,870 and call the function called get_int therein. 1514 01:14:11,870 --> 01:14:15,830 So I can now go ahead and rerun this, python of conditions.py, 1515 01:14:15,830 --> 01:14:19,560 typing in 1 and 1, and voila, the code is now working again. 1516 01:14:19,560 --> 01:14:20,390 So which is better? 1517 01:14:20,390 --> 01:14:21,080 It depends. 1518 01:14:21,080 --> 01:14:23,750 I mean, if it's sort of more readable to just write get_int 1519 01:14:23,750 --> 01:14:26,917 all over the place, that's going to save you a lot of keystrokes-- you don't 1520 01:14:26,917 --> 01:14:28,866 have to keep typing cs50 dot, cs50 dot. 1521 01:14:28,866 --> 01:14:31,250 If, though, you're writing a pretty big program, 1522 01:14:31,250 --> 01:14:35,300 and maybe you're using two different libraries that both implement 1523 01:14:35,300 --> 01:14:37,460 a function called get_int, you want to be 1524 01:14:37,460 --> 01:14:39,870 able to distinguish one from the other. 1525 01:14:39,870 --> 01:14:42,710 So you might want to just import the libraries by their name, 1526 01:14:42,710 --> 01:14:46,290 and then prefix the function calls, as I've done here, 1527 01:14:46,290 --> 01:14:47,750 which is known as namespacing. 1528 01:14:47,750 --> 01:14:51,020 Namespacing means that you can have two identically named variables 1529 01:14:51,020 --> 01:14:54,470 or functions existing in two different namespaces. 1530 01:14:54,470 --> 01:14:57,440 They don't collide, so long as they are inside 1531 01:14:57,440 --> 01:15:02,300 of the CS50 library or some other library's name instead. 1532 01:15:02,300 --> 01:15:04,530 Let me do one other thing with conditions here. 1533 01:15:04,530 --> 01:15:07,850 Let me go ahead and open up another file from week 1. 
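The two import styles, and the collision that namespacing avoids, can be sketched with modules that ship with Python. Here `math` and `cmath` are stand-ins chosen because both happen to define a function named `sqrt`; the lecture's own example uses the CS50 library, which may not be installed everywhere.

```python
# Style 1: import the modules by name, then prefix each call with the module.
import math
import cmath

real_root = math.sqrt(4)        # the sqrt inside math
complex_root = cmath.sqrt(-4)   # a different sqrt inside cmath; no collision

# Style 2: import one function directly and call it without a prefix.
from math import sqrt

unprefixed_root = sqrt(4)       # fewer keystrokes, but only one sqrt can own this name

print(real_root, complex_root, unprefixed_root)
```

With the CS50 library the same pattern would read `import cs50` followed by `cs50.get_int(...)`, as described above.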
1534 01:15:07,850 --> 01:15:10,790 This one was agree.c. 1535 01:15:10,790 --> 01:15:17,360 And this program prompted the user to input whether or not they agree. 1536 01:15:17,360 --> 01:15:21,710 And we checked a little curiously that first week using equals 1537 01:15:21,710 --> 01:15:26,930 equals quote, unquote "Y" or lowercase "y," or quote, unquote capital "N" 1538 01:15:26,930 --> 01:15:28,385 or lowercase "n." 1539 01:15:28,385 --> 01:15:30,260 Well, how do we go about converting this one? 1540 01:15:30,260 --> 01:15:32,520 Let me go ahead and give myself a new file over here. 1541 01:15:32,520 --> 01:15:35,390 I'll call it agree.py in this case. 1542 01:15:35,390 --> 01:15:38,510 And it turns out we can solve this one in a few different ways. 1543 01:15:38,510 --> 01:15:42,950 Let me go ahead and start off by importing from CS50 get_int, 1544 01:15:42,950 --> 01:15:46,190 just because it's-- oh, no, get_string, rather, because it's convenient. 1545 01:15:46,190 --> 01:15:49,200 Let me go ahead and get the user's input via get_string 1546 01:15:49,200 --> 01:15:53,510 and ask them the same question, "Do you agree," question mark with a space. 1547 01:15:53,510 --> 01:15:54,350 Then let me check. 1548 01:15:54,350 --> 01:16:01,760 If s equals equals quote, unquote "Y" or s equals equals lowercase "y," then 1549 01:16:01,760 --> 01:16:04,970 I'm going to go ahead and print out "Agreed." 1550 01:16:04,970 --> 01:16:11,680 Else-- oh, no, elif s equals equals capital "N" 1551 01:16:11,680 --> 01:16:17,860 or s equals equals lowercase "n," let me go ahead and print out here quote, 1552 01:16:17,860 --> 01:16:20,530 unquote, "Not agreed." 1553 01:16:20,530 --> 01:16:22,930 And I think that should do it. 1554 01:16:22,930 --> 01:16:25,120 But something's weird here. 1555 01:16:25,120 --> 01:16:27,250 There's a few differences. 1556 01:16:27,250 --> 01:16:31,090 What strikes you as different from C? 
1557 01:16:31,090 --> 01:16:33,640 What muscle memory might you have to break now 1558 01:16:33,640 --> 01:16:37,660 when using conditions with multiple Boolean expressions 1559 01:16:37,660 --> 01:16:39,512 combined in this way? 1560 01:16:39,512 --> 01:16:40,720 And there's another subtlety. 1561 01:16:40,720 --> 01:16:43,660 There's at least two salient differences between C and Python 1562 01:16:43,660 --> 01:16:45,160 with just this example alone. 1563 01:16:45,160 --> 01:16:48,100 1564 01:16:48,100 --> 01:16:51,600 Any thoughts in chat or [INAUDIBLE]? 1565 01:16:51,600 --> 01:16:52,100 Ryan? 1566 01:16:52,100 --> 01:16:53,892 AUDIENCE: I was going to say, for this one, 1567 01:16:53,892 --> 01:16:56,300 instead of using the symbols for the logical operators, 1568 01:16:56,300 --> 01:16:57,990 you can just type the text directly. 1569 01:16:57,990 --> 01:16:58,740 DAVID MALAN: Yeah. 1570 01:16:58,740 --> 01:17:01,020 We can literally just type the English word "or" 1571 01:17:01,020 --> 01:17:03,150 if we want to express a logical or. 1572 01:17:03,150 --> 01:17:06,445 So in C, recall on the left, we would have done this vertical bar 1573 01:17:06,445 --> 01:17:07,320 thing, which is fine. 1574 01:17:07,320 --> 01:17:08,140 You get used to it. 1575 01:17:08,140 --> 01:17:10,770 But it's not very readable, at least in any English sense. 1576 01:17:10,770 --> 01:17:13,470 Python took the approach of using more frequently 1577 01:17:13,470 --> 01:17:17,628 actual English or English-like words that actually do read left to right. 1578 01:17:17,628 --> 01:17:19,170 And indeed, a theme is emerging here. 1579 01:17:19,170 --> 01:17:22,740 When you read Python code, it is closer to English 1580 01:17:22,740 --> 01:17:26,040 than C is, because you don't trip over as much punctuation. 1581 01:17:26,040 --> 01:17:29,340 Each line of Python code tends to read a little more 1582 01:17:29,340 --> 01:17:32,190 like an English phrase or an English sentence. 
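The first version of agree.py described above might look roughly like this. To keep the sketch runnable without the CS50 library, the answer is passed in as a parameter rather than read with `get_string`, and `respond` is a hypothetical helper name that does not appear in the lecture.

```python
def respond(s):
    """Return the reply for a one-letter answer, or None if unrecognized."""
    # Python spells the logical operator as the word "or" rather than ||.
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None


print(respond("y"))
print(respond("N"))
```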
1583 01:17:32,190 --> 01:17:33,930 And there's one other subtlety here. 1584 01:17:33,930 --> 01:17:37,470 On the left back in week 1, I took care to use single quotes 1585 01:17:37,470 --> 01:17:39,600 around the Ys and the Ns. 1586 01:17:39,600 --> 01:17:41,620 This week I'm using double quotes. 1587 01:17:41,620 --> 01:17:43,840 But to be honest, it actually doesn't matter. 1588 01:17:43,840 --> 01:17:48,360 I can alternatively use single quotes everywhere, so long as I'm consistent. 1589 01:17:48,360 --> 01:17:50,610 But in Python there is no fundamental difference 1590 01:17:50,610 --> 01:17:54,630 between double quotes and single quotes, so long as you are consistent. 1591 01:17:54,630 --> 01:17:58,050 The reason being, when we looked at the data types that existed between C 1592 01:17:58,050 --> 01:18:03,030 and now Python, absent from the list of Python data types was char. 1593 01:18:03,030 --> 01:18:06,600 In Python there is no such thing as an individual char. 1594 01:18:06,600 --> 01:18:09,810 Everything that's character-based is a string. 1595 01:18:09,810 --> 01:18:13,650 Even if it's just one character long, everything is a string. 1596 01:18:13,650 --> 01:18:16,290 Downside is we don't have quite as fine grained control. 1597 01:18:16,290 --> 01:18:21,355 Upside is we get a lot more features with those string structures, 1598 01:18:21,355 --> 01:18:23,730 as we've already seen with, for instance, doing something 1599 01:18:23,730 --> 01:18:26,625 like uppercase with those as well. 1600 01:18:26,625 --> 01:18:27,750 Well, let me go ahead and-- 1601 01:18:27,750 --> 01:18:29,310 I think I can simplify this. 1602 01:18:29,310 --> 01:18:32,550 For instance, suppose I wanted to tolerate something like not just "Y" 1603 01:18:32,550 --> 01:18:34,980 or "y," in uppercase or lowercase. 1604 01:18:34,980 --> 01:18:40,230 Suppose I wanted to also tolerate "Yes" in uppercase or lowercase as well. 
1605 01:18:40,230 --> 01:18:42,720 Well, you could imagine just starting to add to the code 1606 01:18:42,720 --> 01:18:48,128 or s equals equals "Yes," or s equals equals "yes." 1607 01:18:48,128 --> 01:18:50,670 But wait a minute, what if the user is being a little sloppy? 1608 01:18:50,670 --> 01:18:54,360 And what if I want to actually say like, well, what if they're yelling? 1609 01:18:54,360 --> 01:18:57,135 Or s equals equals "YES" in all caps. 1610 01:18:57,135 --> 01:18:59,010 And there's a few other permutations as well. 1611 01:18:59,010 --> 01:19:02,130 Like, this is quickly devolving into quite the mess. 1612 01:19:02,130 --> 01:19:06,270 But if at the end of the day you really just want to detect "Y" or the word 1613 01:19:06,270 --> 01:19:11,190 "Yes," irrespective of capitalization, I bet we can be pretty clever in Python 1614 01:19:11,190 --> 01:19:11,790 here. 1615 01:19:11,790 --> 01:19:19,230 What if I go ahead and say, if s is in quote, unquote "y" or "yes"-- 1616 01:19:19,230 --> 01:19:21,780 in fact, I can borrow an idea from earlier, 1617 01:19:21,780 --> 01:19:24,300 whereby I can use the square bracket notation to give me 1618 01:19:24,300 --> 01:19:27,300 a list, which again, is like an array, but it will automatically grow 1619 01:19:27,300 --> 01:19:28,440 or shrink as you need it. 1620 01:19:28,440 --> 01:19:30,690 You don't have to decide in advance how big it is. 1621 01:19:30,690 --> 01:19:34,950 This preposition here, in, is a new keyword in Python 1622 01:19:34,950 --> 01:19:37,470 that will literally answer that question for me. 1623 01:19:37,470 --> 01:19:39,030 And we've used it before earlier. 1624 01:19:39,030 --> 01:19:44,610 When I implemented speller, I said if the word is in my set of words, return 1625 01:19:44,610 --> 01:19:45,490 True. 1626 01:19:45,490 --> 01:19:50,208 So if s in this list, I'll get back True or False 1627 01:19:50,208 --> 01:19:51,750 based on the answer to that question. 
1628 01:19:51,750 --> 01:19:53,310 But again, it's not tolerating case. 1629 01:19:53,310 --> 01:19:54,420 But no big deal-- 1630 01:19:54,420 --> 01:19:58,170 dot lower, now I can say, is the lowercase version of s, 1631 01:19:58,170 --> 01:20:01,920 no matter what the human typed in, in this list of two values? 1632 01:20:01,920 --> 01:20:04,500 That means now the user can type in all caps, 1633 01:20:04,500 --> 01:20:10,770 in alternating caps, and one capitalized letter, or any other permutation 1634 01:20:10,770 --> 01:20:12,300 whatsoever. 1635 01:20:12,300 --> 01:20:13,020 All right. 1636 01:20:13,020 --> 01:20:14,880 So that, then, is our conditions. 1637 01:20:14,880 --> 01:20:19,440 Let me pause here to see if there's any questions. 1638 01:20:19,440 --> 01:20:21,870 Any questions or confusion that we can clear up? 1639 01:20:21,870 --> 01:20:24,870 With syntax, with conditions, Boolean variable-- 1640 01:20:24,870 --> 01:20:26,032 Boolean values? 1641 01:20:26,032 --> 01:20:27,240 BRIAN: So a question came up. 1642 01:20:27,240 --> 01:20:30,660 So in Python we are allowed to use the equals equals syntax 1643 01:20:30,660 --> 01:20:32,032 to compare two strings? 1644 01:20:32,032 --> 01:20:32,740 DAVID MALAN: Yes. 1645 01:20:32,740 --> 01:20:34,740 So another really good catch. 1646 01:20:34,740 --> 01:20:37,372 In Python, there are no pointers. 1647 01:20:37,372 --> 01:20:39,330 Underneath the hood, there are still addresses. 1648 01:20:39,330 --> 01:20:40,955 Like, your memory hasn't gone anywhere. 1649 01:20:40,955 --> 01:20:44,340 But underneath the hood, all of that is now managed for you by the language 1650 01:20:44,340 --> 01:20:45,070 itself. 1651 01:20:45,070 --> 01:20:49,020 So if you want to conceptually compare one string against another, 1652 01:20:49,020 --> 01:20:53,370 just as I did here now on line 7, you can indeed use equals equals, 1653 01:20:53,370 --> 01:20:56,590 and Python will do the quote, unquote "right thing" for you.
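A minimal sketch of the membership test described above, again with the input passed in as a parameter instead of coming from `get_string` (an assumption so the example runs anywhere). The helper name `respond` is hypothetical, and the "no" branch simply mirrors the "yes" branch.

```python
def respond(s):
    """Classify an answer regardless of how it is capitalized."""
    # Lowercase the input first, then ask whether it is in the list.
    if s.lower() in ["y", "yes"]:
        return "Agreed."
    elif s.lower() in ["n", "no"]:
        return "Not agreed."
    return None


print(respond("YES"))
print(respond("No"))

# == compares the characters of two strings, not their addresses.
print("yes" == "YES".lower())
```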
1654 01:20:56,590 --> 01:21:00,647 You don't need to regress into using strcmp instead. 1655 01:21:00,647 --> 01:21:02,730 Just for clarity, let me go ahead and update this. 1656 01:21:02,730 --> 01:21:08,430 If s.lower in quote, unquote "n" or comma "no," 1657 01:21:08,430 --> 01:21:12,540 we can achieve the same result there by doing the same technique. 1658 01:21:12,540 --> 01:21:15,450 Well, let me go ahead and open up another example 1659 01:21:15,450 --> 01:21:17,940 that you might recall we did a progression of examples 1660 01:21:17,940 --> 01:21:22,380 to make it good, better, and then best, this one involving 1661 01:21:22,380 --> 01:21:24,100 just a cat meowing in some form. 1662 01:21:24,100 --> 01:21:26,760 So let me go ahead and open up from week 1 1663 01:21:26,760 --> 01:21:31,410 an example that was called meow0, relatively straightforward, that 1664 01:21:31,410 --> 01:21:32,730 simply did this. 1665 01:21:32,730 --> 01:21:34,350 It simply meowed three times. 1666 01:21:34,350 --> 01:21:37,230 So suffice it to say now, in Python, it's pretty trivial 1667 01:21:37,230 --> 01:21:38,820 to do something three times like this. 1668 01:21:38,820 --> 01:21:41,730 I'm going to go ahead and call this meow.py. 1669 01:21:41,730 --> 01:21:45,240 And of course, I can just do something like print "meow." 1670 01:21:45,240 --> 01:21:46,750 And I can just copy paste that. 1671 01:21:46,750 --> 01:21:49,740 But of course, the whole point of this example back in week 1 1672 01:21:49,740 --> 01:21:52,020 was not to devolve into just copy paste. 1673 01:21:52,020 --> 01:21:53,377 Surely there's a better way. 1674 01:21:53,377 --> 01:21:54,960 And we've seen a better way this time. 1675 01:21:54,960 --> 01:21:57,750 If we wanted to change this into a for loop in C, 1676 01:21:57,750 --> 01:22:04,650 we could have done something like for int i get 0, i less than 3, i++. 
1677 01:22:04,650 --> 01:22:06,630 Then in some curly braces we could have done 1678 01:22:06,630 --> 01:22:09,780 printf of "meow," new line, semicolon. 1679 01:22:09,780 --> 01:22:12,750 So that was the next version of our meow code in C. 1680 01:22:12,750 --> 01:22:15,270 But in Python, of course, it's a little more succinct. 1681 01:22:15,270 --> 01:22:22,320 I can just do for i in range 3 print quote, unquote "meow." 1682 01:22:22,320 --> 01:22:24,792 So very similar in spirit to our hello, world of before. 1683 01:22:24,792 --> 01:22:27,250 But again, we don't have to include any libraries for this. 1684 01:22:27,250 --> 01:22:28,980 We don't need to have a main function. 1685 01:22:28,980 --> 01:22:31,688 We don't need any of those curly braces or semicolon or the like. 1686 01:22:31,688 --> 01:22:35,040 We can just dive in and focus on the code itself. 1687 01:22:35,040 --> 01:22:39,840 But recall that we also, last time, evolved the meow program 1688 01:22:39,840 --> 01:22:42,780 into having our own helper function, our own function that 1689 01:22:42,780 --> 01:22:48,480 actually allowed us to create an abstraction on top of meowing. 1690 01:22:48,480 --> 01:22:51,120 And that was in our third version, a.k.a., meow2. 1691 01:22:51,120 --> 01:22:53,400 Let me go ahead and open up this version in a tab. 1692 01:22:53,400 --> 01:22:56,970 And notice that this version starts to get a little involved, because one, 1693 01:22:56,970 --> 01:22:59,580 we needed a prototype at the top, because I now 1694 01:22:59,580 --> 01:23:02,520 have meow function at the bottom whose purpose in life 1695 01:23:02,520 --> 01:23:04,860 was just to print "meow," but to abstract that away 1696 01:23:04,860 --> 01:23:07,140 as a new helper function. 1697 01:23:07,140 --> 01:23:10,230 And then I had this code here with a for loop inside. 1698 01:23:10,230 --> 01:23:13,800 Well, in Python it's going to work out to be a little simpler here, too. 
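The loop version of meow.py described above can be sketched in two lines. No library import, main function, curly braces, or semicolons are involved.

```python
# range(3) yields 0, 1, 2, so the indented body runs three times.
for i in range(3):
    print("meow")
```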
1699 01:23:13,800 --> 01:23:17,040 If I want to do something three times, for i in range of 3 1700 01:23:17,040 --> 01:23:18,990 go ahead and call meow. 1701 01:23:18,990 --> 01:23:21,460 Now of course, meow doesn't yet exist. 1702 01:23:21,460 --> 01:23:22,770 So I can solve that problem. 1703 01:23:22,770 --> 01:23:25,290 We've seen earlier, albeit quickly, in speller that I 1704 01:23:25,290 --> 01:23:27,180 can define my own functions like meow. 1705 01:23:27,180 --> 01:23:29,223 There's no more void, because if you don't 1706 01:23:29,223 --> 01:23:31,890 want to have arguments in a function, just don't put them there. 1707 01:23:31,890 --> 01:23:34,770 There's no return value specified in Python. 1708 01:23:34,770 --> 01:23:36,000 They're implicit instead. 1709 01:23:36,000 --> 01:23:37,800 So it suffices to do this. 1710 01:23:37,800 --> 01:23:40,530 And now I can just print out "meow." 1711 01:23:40,530 --> 01:23:44,370 So here now, I have a program that iterates three times, 1712 01:23:44,370 --> 01:23:47,770 calling meow each time, and meow is defined down below. 1713 01:23:47,770 --> 01:23:51,876 Let me go ahead and run this, python of meow.py. 1714 01:23:51,876 --> 01:23:52,650 Huh. 1715 01:23:52,650 --> 01:23:54,690 Traceback, most recent call last. 1716 01:23:54,690 --> 01:23:59,010 There's a problem on line 2 of meow.py because of NameError-- name 1717 01:23:59,010 --> 01:24:01,360 "meow" is not defined. 1718 01:24:01,360 --> 01:24:05,550 Now, the language being used there by Python is a little different from C's. 1719 01:24:05,550 --> 01:24:10,050 It's frankly a little more human friendly. 1720 01:24:10,050 --> 01:24:11,400 But what just happened? 1721 01:24:11,400 --> 01:24:16,365 What problem has arisen that I yet haven't tripped over until now? 
1722 01:24:16,365 --> 01:24:19,210 1723 01:24:19,210 --> 01:24:21,620 Even if you've never programmed in Python before, 1724 01:24:21,620 --> 01:24:26,770 and even if you haven't run help50 yet, what might be the issue there? 1725 01:24:26,770 --> 01:24:28,750 Ginny? 1726 01:24:28,750 --> 01:24:32,410 AUDIENCE: It's that the function is not found when we are trying to call it, 1727 01:24:32,410 --> 01:24:35,370 because it's described below when we are calling it. 1728 01:24:35,370 --> 01:24:36,130 DAVID MALAN: Yeah. 1729 01:24:36,130 --> 01:24:37,703 AUDIENCE: There is no prototype. 1730 01:24:37,703 --> 01:24:39,370 DAVID MALAN: Yeah, there's no prototype. 1731 01:24:39,370 --> 01:24:42,370 And it turns out in Python, there isn't a notion of prototypes. 1732 01:24:42,370 --> 01:24:44,860 So unfortunately, the solution we saw in week 1 1733 01:24:44,860 --> 01:24:47,410 is not to just copy and paste the first line up above 1734 01:24:47,410 --> 01:24:48,850 and end it with a semicolon. 1735 01:24:48,850 --> 01:24:50,170 That's just not a thing. 1736 01:24:50,170 --> 01:24:51,520 I could do this. 1737 01:24:51,520 --> 01:24:54,760 I could just move my meow function to the top of the file, 1738 01:24:54,760 --> 01:24:58,330 thereby defining the function first, and then using it last. 1739 01:24:58,330 --> 01:25:01,480 And that would actually solve the problem, "meow meow meow." 1740 01:25:01,480 --> 01:25:03,940 That, of course, doesn't really help us long term, 1741 01:25:03,940 --> 01:25:06,940 because you could probably imagine a situation where this function wants 1742 01:25:06,940 --> 01:25:09,273 to call this function, but this function calls this one, 1743 01:25:09,273 --> 01:25:12,592 and you just can't really neatly order them in some safe way. 1744 01:25:12,592 --> 01:25:14,800 And it's just not going to be as maintainable, right? 
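The ordering problem can be reproduced in a small sketch. The `try`/`except` wrapper is not part of the lecture's code; it is added here only so the example can show the error and then keep running.

```python
caught = None
try:
    meow()  # at this point Python has not yet seen a definition of meow
except NameError as error:
    caught = error
    print("NameError:", error)


def meow():
    print("meow")


meow()  # works now, because the def statement above has been executed
```

Python runs a file from top to bottom, so a name exists only after the line that defines it has executed.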
1745 01:25:14,800 --> 01:25:18,220 Recall that one of the values of putting main at the top of our C programs 1746 01:25:18,220 --> 01:25:21,128 was that any reasonable person who wants to understand your code 1747 01:25:21,128 --> 01:25:23,170 is probably going to start reading top to bottom. 1748 01:25:23,170 --> 01:25:26,003 They're not going to want to have to scroll through all of your code 1749 01:25:26,003 --> 01:25:28,120 looking for the actual main code. 1750 01:25:28,120 --> 01:25:32,050 So it turns out in Python, even though you don't need a main function, 1751 01:25:32,050 --> 01:25:35,410 it's actually common to define one nonetheless. 1752 01:25:35,410 --> 01:25:38,560 It's going to be implemented with something like this. 1753 01:25:38,560 --> 01:25:41,780 And I'm just going to indent my code below that there. 1754 01:25:41,780 --> 01:25:43,510 So now I've defined main. 1755 01:25:43,510 --> 01:25:45,490 But I haven't executed any code yet. 1756 01:25:45,490 --> 01:25:49,690 On line 6, I've now defined meow, but I haven't executed any code yet. 1757 01:25:49,690 --> 01:25:50,890 And I mean that literally. 1758 01:25:50,890 --> 01:25:53,530 If I run python of meow now and hit Enter, 1759 01:25:53,530 --> 01:25:58,390 I would hope to see "meow meow meow," but I see nothing. 1760 01:25:58,390 --> 01:26:00,470 And this is a little weird. 1761 01:26:00,470 --> 01:26:02,860 But Python is doing literally what I told it to do. 1762 01:26:02,860 --> 01:26:04,840 I told it to define a function called main, 1763 01:26:04,840 --> 01:26:08,290 and I told it to define a function called meow. 1764 01:26:08,290 --> 01:26:12,840 What I never told it to do is to call either of those functions. 1765 01:26:12,840 --> 01:26:16,480 So the simplest fix here-- it's a little different from C and a little weird-- 1766 01:26:16,480 --> 01:26:19,700 is to just call main as your very last thought in the file.
1767 01:26:19,700 --> 01:26:23,600 So define main up at the top, just where most programmers would expect it to be, 1768 01:26:23,600 --> 01:26:25,292 but call it all the way at the bottom. 1769 01:26:25,292 --> 01:26:27,250 And let me go ahead now and run my program. 1770 01:26:27,250 --> 01:26:30,640 And now voila, "meow meow meow" is back, because I've defined main, 1771 01:26:30,640 --> 01:26:33,910 I've defined meow, and now I am calling main. 1772 01:26:33,910 --> 01:26:37,510 Now, as an aside, you will very often see in various documentation 1773 01:26:37,510 --> 01:26:42,250 and tutorials online a much more cryptic incantation than this, 1774 01:26:42,250 --> 01:26:44,440 which will have you typing out this. 1775 01:26:44,440 --> 01:26:47,050 This achieves the same goal, but it's not strictly necessary 1776 01:26:47,050 --> 01:26:47,860 for our purposes. 1777 01:26:47,860 --> 01:26:50,920 This line of code, if you see it in any online references, or examples, 1778 01:26:50,920 --> 01:26:53,328 or books, or sections or the like, it is necessary 1779 01:26:53,328 --> 01:26:55,120 only when you're implementing, essentially, 1780 01:26:55,120 --> 01:26:58,030 your own libraries-- like your own CS50 library, 1781 01:26:58,030 --> 01:27:00,580 or your own image blurring library or the like. 1782 01:27:00,580 --> 01:27:03,875 It's not necessary when we're just writing individual programs of our own. 1783 01:27:03,875 --> 01:27:07,000 So I'm going to go ahead and keep mine simple and literally just call main. 1784 01:27:07,000 --> 01:27:10,630 And let me just wave my hand at why you'd need that syntax otherwise 1785 01:27:10,630 --> 01:27:11,950 in this context. 1786 01:27:11,950 --> 01:27:14,770 But let me go ahead and modify this one last time. 1787 01:27:14,770 --> 01:27:18,520 Because recall that in C, the last version of my program 1788 01:27:18,520 --> 01:27:21,310 had me running meow and passing it input.
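Put together, the define-main-first, call-it-last layout might look like the sketch below. The comment about `if __name__ == "__main__":` names the idiom that tutorials commonly use; the transcript does not spell out the exact line shown on screen, so treat that as an assumption.

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


# Both functions are defined by now, but nothing has run yet.
# This call starts the program. Tutorials often write it as:
#     if __name__ == "__main__":
#         main()
main()
```

By the time `main()` is called on the last line, Python has already executed the `def meow` statement, so the order of the two definitions no longer matters.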
1789 01:27:21,310 --> 01:27:24,670 Because I defined meow as taking an input like n, 1790 01:27:24,670 --> 01:27:30,370 and then doing something like for int i get 0, i less than n, i++, 1791 01:27:30,370 --> 01:27:34,030 and then inside of my curly braces did I print meow, 1792 01:27:34,030 --> 01:27:38,470 so that now I have a helper function that I've invented that takes one 1793 01:27:38,470 --> 01:27:40,240 input, an int called n. 1794 01:27:40,240 --> 01:27:43,390 And it loops that many times and prints out meow that many times. 1795 01:27:43,390 --> 01:27:47,320 And now I have a real nice abstraction, in that now my program is distilled: 1796 01:27:47,320 --> 01:27:48,820 it's just meow three times. 1797 01:27:48,820 --> 01:27:51,130 And it doesn't matter how I implemented meow. 1798 01:27:51,130 --> 01:27:52,900 I can do the same thing in Python. 1799 01:27:52,900 --> 01:27:56,530 I can go ahead and say that meow takes an argument called n. 1800 01:27:56,530 --> 01:27:58,450 I don't have to bother specifying its type. 1801 01:27:58,450 --> 01:28:03,520 I can now say for i in range of n, and I can print "meow" that many times. 1802 01:28:03,520 --> 01:28:07,810 And now I can get rid of my loop in main and just say "meow" three times. 1803 01:28:07,810 --> 01:28:09,310 And so same functionality. 1804 01:28:09,310 --> 01:28:11,770 If I run this a final time, "meow meow meow," 1805 01:28:11,770 --> 01:28:16,570 but now I'm kind of designing my code in a more sophisticated way 1806 01:28:16,570 --> 01:28:23,320 by actually giving myself now some of my own actual helper functions. 1807 01:28:23,320 --> 01:28:26,350 All right, any questions, then, on this progression? 1808 01:28:26,350 --> 01:28:28,810 Now, we're not really seeing new Python syntax. 1809 01:28:28,810 --> 01:28:32,980 We're now just seeing a translation of some actual past C programs 1810 01:28:32,980 --> 01:28:37,020 into Python to show really the equivalence.
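The final version described above, with the loop moved into a parameterized helper, could be sketched like this:

```python
def main():
    meow(3)  # main no longer needs its own loop


def meow(n):
    # n has no declared type; range(n) repeats the body n times.
    for i in range(n):
        print("meow")


main()
```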
1811 01:28:37,020 --> 01:28:39,560 1812 01:28:39,560 --> 01:28:40,080 All right. 1813 01:28:40,080 --> 01:28:42,247 Well, let's go ahead, then, and open another version 1814 01:28:42,247 --> 01:28:45,590 from week 1 of a program called positive.c, 1815 01:28:45,590 --> 01:28:49,850 which was an opportunity back then, not only to define our own helper function 1816 01:28:49,850 --> 01:28:52,760 called get_positive_int, but it also introduced us 1817 01:28:52,760 --> 01:28:54,440 to the familiar do while loop. 1818 01:28:54,440 --> 01:28:57,380 And unfortunately, we're going to take that away from you now. 1819 01:28:57,380 --> 01:28:59,390 Python does not have a do while loop. 1820 01:28:59,390 --> 01:29:03,260 But it's, of course, a very useful thing to be able to do something 1821 01:29:03,260 --> 01:29:04,730 while a condition is true. 1822 01:29:04,730 --> 01:29:07,700 After all, pretty much any time we've gotten user input in the class, 1823 01:29:07,700 --> 01:29:11,570 we've used do while, so that we prompt them at least once and then optionally 1824 01:29:11,570 --> 01:29:14,300 again and again and again, until they cooperate. 1825 01:29:14,300 --> 01:29:16,430 So let me go ahead and implement this in Python 1826 01:29:16,430 --> 01:29:24,490 now in a file called positive.py, and go ahead here in positive.py, 1827 01:29:24,490 --> 01:29:27,050 and translate this thing as follows. 1828 01:29:27,050 --> 01:29:31,060 Let me go ahead and from cs50 import get_int. 1829 01:29:31,060 --> 01:29:33,670 Let me go ahead and define a function called main. 1830 01:29:33,670 --> 01:29:35,920 So now I'm just going to start to get into this habit. 1831 01:29:35,920 --> 01:29:38,575 I'm going to go ahead and give myself a variable called i 1832 01:29:38,575 --> 01:29:40,953 and call get_positive_int. 1833 01:29:40,953 --> 01:29:43,120 And then I'm just going to go ahead and print out i, 1834 01:29:43,120 --> 01:29:44,560 keeping it nice and simple. 
1835 01:29:44,560 --> 01:29:48,550 Now I have to implement get_positive_int. 1836 01:29:48,550 --> 01:29:53,440 It doesn't need to take any input, so I'm not going to give it any arguments. 1837 01:29:53,440 --> 01:29:55,510 And now I have to do the do while thing. 1838 01:29:55,510 --> 01:29:59,770 So the Pythonic way to do this in Python is almost always 1839 01:29:59,770 --> 01:30:02,320 to deliberately induce an infinite loop. 1840 01:30:02,320 --> 01:30:04,990 And the idea being, if you want to do something again and again, 1841 01:30:04,990 --> 01:30:07,930 just start doing it forever and then break out of the loop 1842 01:30:07,930 --> 01:30:09,458 when you are ready to. 1843 01:30:09,458 --> 01:30:11,500 So what do I want to do forever in this function? 1844 01:30:11,500 --> 01:30:14,230 Well, I want to go ahead and get an int and prompt 1845 01:30:14,230 --> 01:30:17,600 the human for a positive integer. 1846 01:30:17,600 --> 01:30:21,490 And then I want to go ahead on the next line and ask a question. 1847 01:30:21,490 --> 01:30:27,280 Well, if n is greater than 0, thereby making it positive, break. 1848 01:30:27,280 --> 01:30:31,490 And the last line of code here is going to be to return n. 1849 01:30:31,490 --> 01:30:35,170 So notice in C on the left, I did this do while thing. 1850 01:30:35,170 --> 01:30:37,870 I had to declare n outside of the do while loop, 1851 01:30:37,870 --> 01:30:40,630 because it had to be outside the curly braces to be in scope. 1852 01:30:40,630 --> 01:30:44,290 But in Python here, notice what I'm doing here 1853 01:30:44,290 --> 01:30:47,860 is actually a little bit different. 1854 01:30:47,860 --> 01:30:50,050 And did I screw up? 1855 01:30:50,050 --> 01:30:53,790 1856 01:30:53,790 --> 01:30:55,300 Oh, yes, I did screw up. 1857 01:30:55,300 --> 01:30:56,500 OK. 1858 01:30:56,500 --> 01:30:58,840 I'll ask the actual question, if n greater than 0.
1859 01:30:58,840 --> 01:31:01,690 So what did I do actually differently here on the right-hand side? 1860 01:31:01,690 --> 01:31:03,982 Well, notice, I deliberately induced this infinite loop 1861 01:31:03,982 --> 01:31:06,550 on line 10, which just means, do the following forever. 1862 01:31:06,550 --> 01:31:10,510 I then ask the user for a variable n with get_int, and then I check, 1863 01:31:10,510 --> 01:31:12,460 is n greater than 0? 1864 01:31:12,460 --> 01:31:14,408 If so, break out of the loop. 1865 01:31:14,408 --> 01:31:15,700 How do I break out of the loop? 1866 01:31:15,700 --> 01:31:19,160 Well, notice that the indentation here has been very consistent. 1867 01:31:19,160 --> 01:31:22,150 So when I break out of the loop, that puts me back 1868 01:31:22,150 --> 01:31:26,055 in line with the original indentation which is now on line 14. 1869 01:31:26,055 --> 01:31:28,930 Notice that the return lines up with the while loop, which means it's 1870 01:31:28,930 --> 01:31:31,840 the first line of code that's outside of that loop. 1871 01:31:31,840 --> 01:31:34,300 In the past, we would have had very explicit curly braces. 1872 01:31:34,300 --> 01:31:37,752 Now we rely only on indentation that then lets me return n. 1873 01:31:37,752 --> 01:31:39,460 So what are some of the differences here? 1874 01:31:39,460 --> 01:31:41,530 One, the do while loop is completely gone. 1875 01:31:41,530 --> 01:31:45,250 But two, scope is no longer an issue. 1876 01:31:45,250 --> 01:31:48,940 It turns out in Python that the moment you declare a variable, 1877 01:31:48,940 --> 01:31:51,980 it exists until the end of that function. 1878 01:31:51,980 --> 01:31:55,750 You don't have to worry about the nuance of declaring a variable first like we 1879 01:31:55,750 --> 01:31:59,620 did in C up here and then returning it down below. 
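A sketch of that loop-forever-then-break pattern follows. So that it runs without the CS50 library or a keyboard, the function takes a hypothetical `read_int` callable standing in for the lecture's `get_int` call, and a short list of canned answers plays the part of the user. Neither of those names comes from the lecture.

```python
def get_positive_int(read_int):
    """Keep asking until read_int() returns a positive number."""
    while True:           # deliberately loop forever...
        n = read_int()    # stand-in for the lecture's get_int prompt
        if n > 0:
            break         # ...and leave once the input is acceptable
    # n was first assigned inside the loop but is still visible here.
    return n


# Simulated user who types -1, then 0, then 5.
answers = iter([-1, 0, 5])
print(get_positive_int(lambda: next(answers)))
```

The `return` lines up with `while`, which is what places it outside the loop.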
1880 01:31:59,620 --> 01:32:03,220 The moment we execute this line of code 11 here, n 1881 01:32:03,220 --> 01:32:07,220 suddenly exists for the entirety of the remainder of that function. 1882 01:32:07,220 --> 01:32:10,120 So even though we declared it inside of the loop, so to speak, 1883 01:32:10,120 --> 01:32:14,800 as per the indentation, it is still accessible to the return statement 1884 01:32:14,800 --> 01:32:16,610 here at the end of the program. 1885 01:32:16,610 --> 01:32:17,110 All right. 1886 01:32:17,110 --> 01:32:21,250 Let me pause there and see if there's any questions or confusion 1887 01:32:21,250 --> 01:32:26,530 on getting user input, doing the equivalent, logically, of do while, 1888 01:32:26,530 --> 01:32:28,870 but doing it now in this more Pythonic way. 1889 01:32:28,870 --> 01:32:29,530 Peter? 1890 01:32:29,530 --> 01:32:32,935 AUDIENCE: In Python, are variables accessible across functions or no? 1891 01:32:32,935 --> 01:32:34,060 DAVID MALAN: Good question. 1892 01:32:34,060 --> 01:32:34,660 No. 1893 01:32:34,660 --> 01:32:37,360 So if you declare a variable inside of a function, 1894 01:32:37,360 --> 01:32:40,100 it is scoped, so to speak, to that function. 1895 01:32:40,100 --> 01:32:41,500 It is not available elsewhere. 1896 01:32:41,500 --> 01:32:44,710 You would have to return it and pass it as output to input. 1897 01:32:44,710 --> 01:32:50,740 Or you would have to define it, for instance, as a global variable instead. 1898 01:32:50,740 --> 01:32:51,440 All right. 1899 01:32:51,440 --> 01:32:53,500 Well, what else, then, might we translate? 1900 01:32:53,500 --> 01:32:57,680 Well, recall from our earlier endeavors in week 1, 1901 01:32:57,680 --> 01:32:59,680 we played around with these examples from Mario. 
1902 01:32:59,680 --> 01:33:02,920 And for instance, we wanted to print something out in Python-- 1903 01:33:02,920 --> 01:33:07,690 in C that mimics the notion of these pyramids, or these coins, 1904 01:33:07,690 --> 01:33:09,760 or these little bricks on the screen. 1905 01:33:09,760 --> 01:33:13,858 Well, here let me go ahead and open up a new file called mario.py. 1906 01:33:13,858 --> 01:33:16,900 And I'm going to transition away from always showing the before and after 1907 01:33:16,900 --> 01:33:19,420 and now just start to focus more on the Python code. 1908 01:33:19,420 --> 01:33:22,550 But you can always look back if you wanted the corresponding C versions. 1909 01:33:22,550 --> 01:33:25,998 How do I go about printing out three bricks like this vertically? 1910 01:33:25,998 --> 01:33:27,790 Well, in Python I might say something like, 1911 01:33:27,790 --> 01:33:34,000 for i in range of 3, quite simply, as we've done a few times already, 1912 01:33:34,000 --> 01:33:35,680 and just go ahead and print out a hash. 1913 01:33:35,680 --> 01:33:38,740 I don't need to worry about the new line, because you get it for free, 1914 01:33:38,740 --> 01:33:39,610 so to speak. 1915 01:33:39,610 --> 01:33:42,700 But I'm going to go ahead now and run python of mario.py. 1916 01:33:42,700 --> 01:33:47,620 And voila, there's my very simple ASCII version of this Mario structure. 1917 01:33:47,620 --> 01:33:49,870 But what if I want to do the coins instead? 1918 01:33:49,870 --> 01:33:54,040 What if I want to do this horizontal coins that appear in these four bricks 1919 01:33:54,040 --> 01:33:56,320 and print out a version of that? 1920 01:33:56,320 --> 01:33:57,770 Well, how might I do that? 1921 01:33:57,770 --> 01:34:00,430 Well, let me go ahead and change this to be-- 1922 01:34:00,430 --> 01:34:05,150 instead in my code for i in range of 4, so I can print four of these things. 
1923 01:34:05,150 --> 01:34:09,080 Let me go ahead and print out a question mark and then run this. 1924 01:34:09,080 --> 01:34:11,440 So let me run mario.py. 1925 01:34:11,440 --> 01:34:13,180 And voila-- damn. 1926 01:34:13,180 --> 01:34:15,110 Like, not what I wanted. 1927 01:34:15,110 --> 01:34:16,750 And so here is that trade-off. 1928 01:34:16,750 --> 01:34:18,580 You might have been kind of excited, so far 1929 01:34:18,580 --> 01:34:21,790 as it's possible to be excited about code, that, oh, my God, 1930 01:34:21,790 --> 01:34:24,500 you don't need to do the stupid new line characters anymore. 1931 01:34:24,500 --> 01:34:25,870 But what if you don't want it? 1932 01:34:25,870 --> 01:34:31,390 Now we've kind of found a downside of getting those new lines automatically. 1933 01:34:31,390 --> 01:34:34,870 Well, it turns out if we read the documentation for the print function 1934 01:34:34,870 --> 01:34:38,170 in Python, it, too, can take multiple arguments. 1935 01:34:38,170 --> 01:34:41,462 And what's powerful about Python, too, is 1936 01:34:41,462 --> 01:34:43,420 that it supports not just positional arguments, 1937 01:34:43,420 --> 01:34:47,620 where you just do a comma separated list of multiple arguments to a function. 1938 01:34:47,620 --> 01:34:51,070 Python supports what are called named arguments, whereby 1939 01:34:51,070 --> 01:34:53,950 if a function, especially one that's super powerful like print, 1940 01:34:53,950 --> 01:34:58,540 takes multiple inputs, like this one, this other one, and this other thing. 1941 01:34:58,540 --> 01:35:00,430 Each of those inputs can have names. 1942 01:35:00,430 --> 01:35:04,240 And you, the user of that function, can specify the name. 1943 01:35:04,240 --> 01:35:10,330 And it turns out that print in Python supports an argument called "end." 1944 01:35:10,330 --> 01:35:14,920 And you can explicitly say what value you want to give to that parameter 1945 01:35:14,920 --> 01:35:16,300 by mentioning its name. 
1946 01:35:16,300 --> 01:35:18,550 And here I'm going to literally do this. 1947 01:35:18,550 --> 01:35:20,740 I'm going to tell the print function that I 1948 01:35:20,740 --> 01:35:25,900 want the value of "end," a parameter, an argument to it, to be quote, unquote. 1949 01:35:25,900 --> 01:35:28,630 The reason for that is that if I read the documentation, 1950 01:35:28,630 --> 01:35:30,160 the default is actually this. 1951 01:35:30,160 --> 01:35:34,600 If you read the documentation, it will tell you print's default value 1952 01:35:34,600 --> 01:35:37,510 for its end argument is backslash n. 1953 01:35:37,510 --> 01:35:39,580 This, too, is a feature that C did not have. 1954 01:35:39,580 --> 01:35:41,350 C did not have optional arguments. 1955 01:35:41,350 --> 01:35:43,510 They're either there or they're not. 1956 01:35:43,510 --> 01:35:46,870 Rather, they either have to be there, or they cannot be there. 1957 01:35:46,870 --> 01:35:51,310 Python supports optional arguments that even have default values. 1958 01:35:51,310 --> 01:35:54,910 And so in this case, the default value of this, per the documentation, 1959 01:35:54,910 --> 01:35:58,240 is that end is quote, unquote backslash n, which 1960 01:35:58,240 --> 01:36:00,880 is why every line ends with that value. 1961 01:36:00,880 --> 01:36:02,950 If you want to change that to be nothing, 1962 01:36:02,950 --> 01:36:06,070 the so-called empty string, you change it to quote, unquote. 1963 01:36:06,070 --> 01:36:09,760 So let me go ahead and run this now, and voila, closer. 1964 01:36:09,760 --> 01:36:12,490 It's a little stupid looking, because now my cursor ended up-- 1965 01:36:12,490 --> 01:36:14,690 my prompt ended up on the same line. 1966 01:36:14,690 --> 01:36:18,130 So maybe after this line, let me just go ahead and print nothing, that is, 1967 01:36:18,130 --> 01:36:19,190 a new line. 1968 01:36:19,190 --> 01:36:23,775 And now if I run mario.py, voila, now I get the effect I want. 
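The named argument end can be sketched like this. The function name print_row is an assumption for illustration; the calls to print, with end="" and then with no arguments, are the ones described above.

```python
def print_row(width):
    # end="" overrides print's default end="\n",
    # so the question marks stay on one line.
    for i in range(width):
        print("?", end="")
    # A bare print() emits only the newline, moving the prompt down.
    print()


print_row(4)
```

Passing a different string, such as end="HELLO", would append that text after every call instead.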
1969 01:36:23,775 --> 01:36:25,900 And if you want to see what's really going on here, 1970 01:36:25,900 --> 01:36:28,150 I can do something stupid like "HELLO." 1971 01:36:28,150 --> 01:36:34,180 And now I can end every print with "HELLO," "HELLO," "HELLO," "HELLO." 1972 01:36:34,180 --> 01:36:36,430 Not that you would do that, but that's all it means. 1973 01:36:36,430 --> 01:36:40,060 It's ending every call to print with that expression. 1974 01:36:40,060 --> 01:36:44,560 But the correct version, of course, is just to blank it out in this way. 1975 01:36:44,560 --> 01:36:47,260 But here's something that's kind of cool. 1976 01:36:47,260 --> 01:36:49,660 And this is where if you're kind of a geek, 1977 01:36:49,660 --> 01:36:51,880 life starts to get really interesting fast. 1978 01:36:51,880 --> 01:36:55,510 I can actually change my Python code to print out these four question 1979 01:36:55,510 --> 01:37:00,760 marks in the sky to be quite simply print quote, unquote question 1980 01:37:00,760 --> 01:37:03,100 mark times 4. 1981 01:37:03,100 --> 01:37:06,280 And now if I rerun this program, boom, done. 1982 01:37:06,280 --> 01:37:10,350 And here's where, again, you're getting a lot of features in the language 1983 01:37:10,350 --> 01:37:12,100 where you don't have to think about loops, 1984 01:37:12,100 --> 01:37:14,680 you don't have to think about a lot of syntax. 1985 01:37:14,680 --> 01:37:17,590 If you want to take a question mark and do it four times, 1986 01:37:17,590 --> 01:37:20,080 you can literally use the star operator, which 1987 01:37:20,080 --> 01:37:24,490 has been overloaded to support not only multiplication with numbers 1988 01:37:24,490 --> 01:37:31,340 but also automatic concatenation, if you will, with strings in this way. 1989 01:37:31,340 --> 01:37:33,643 So let me go ahead and do one final version for mario. 
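As a small sketch of the star operator on strings, the loop collapses to one expression. The variable name row is an assumption for illustration.

```python
# Multiplying a str by an int repeats the string that many times.
row = "?" * 4
print(row)
```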
1990 01:37:33,643 --> 01:37:35,560 Recall that the last thing we built with mario 1991 01:37:35,560 --> 01:37:37,060 looked a little something like this. 1992 01:37:37,060 --> 01:37:40,840 Let me go ahead and change my mario code now to be for i in range of 3, 1993 01:37:40,840 --> 01:37:44,260 because this is a 3 by 3 grid of bricks, let's say. 1994 01:37:44,260 --> 01:37:46,960 And let's go ahead and now, inside of this loop, 1995 01:37:46,960 --> 01:37:53,830 do another nested loop where I do three columns as well. 1996 01:37:53,830 --> 01:37:56,950 And in here, I want to print out a single hash at a time. 1997 01:37:56,950 --> 01:37:58,980 But I don't want to print out a new line. 1998 01:37:58,980 --> 01:38:02,023 I only want to print out a new line here. 1999 01:38:02,023 --> 01:38:04,440 So it turns out that essentially, because Python gives you 2000 01:38:04,440 --> 01:38:07,710 the backslash n's automatically, essentially any logic 2001 01:38:07,710 --> 01:38:09,870 you wrote in the past now needs to be reversed. 2002 01:38:09,870 --> 01:38:13,200 If you ever printed a new line, now you don't want to print a new line. 2003 01:38:13,200 --> 01:38:17,400 And if you ever didn't print a new line, now you do, in some sense. 2004 01:38:17,400 --> 01:38:19,500 So let me go ahead and-- 2005 01:38:19,500 --> 01:38:22,440 not make, wrong language-- python of mario.py. 2006 01:38:22,440 --> 01:38:24,820 And voila, my 3 by 3 grid. 2007 01:38:24,820 --> 01:38:28,320 So this is to say that in Python, we can nest loops, just 2008 01:38:28,320 --> 01:38:31,380 like we did in C. I can use multiple variable names, like i 2009 01:38:31,380 --> 01:38:32,640 and j being conventional. 2010 01:38:32,640 --> 01:38:35,010 There's no curly braces, there's no semicolons. 2011 01:38:35,010 --> 01:38:37,920 But again, the logic, the ideas are still the same. 
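The nested loops just described could be sketched as below. The function name print_grid and its size parameter are assumptions; the lecture hardcodes 3 for both loops.

```python
def print_grid(size):
    for i in range(size):
        # Inner loop: one row of hashes, with the newline suppressed.
        for j in range(size):
            print("#", end="")
        # After each row, print only the newline.
        print()


print_grid(3)
```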
2012 01:38:37,920 --> 01:38:42,070 It just takes a little bit of time to get used to, for instance, 2013 01:38:42,070 --> 01:38:43,650 some of the new syntax. 2014 01:38:43,650 --> 01:38:48,600 You'll recall that in C, we ran into a problem pretty early on with integers. 2015 01:38:48,600 --> 01:38:50,880 And let me create a program here called int.py. 2016 01:38:50,880 --> 01:38:53,970 And let me initialize a variable called i to 1. 2017 01:38:53,970 --> 01:38:55,830 And let me go ahead and do this forever. 2018 01:38:55,830 --> 01:38:56,890 Let me do this forever. 2019 01:38:56,890 --> 01:38:58,140 Inside of a while True block, 2020 01:38:58,140 --> 01:38:59,760 let me print out whatever i is. 2021 01:38:59,760 --> 01:39:04,380 And then let me go ahead and just add 1 to i on each iteration. 2022 01:39:04,380 --> 01:39:06,370 Let me go ahead and run this program. 2023 01:39:06,370 --> 01:39:09,640 And let me increase the size of my window for now and just run this thing. 2024 01:39:09,640 --> 01:39:10,740 Whoops, that was mario. 2025 01:39:10,740 --> 01:39:16,010 Let me run this thing, python of int.py. 2026 01:39:16,010 --> 01:39:18,510 And you'll see that it's counting up to infinity. 2027 01:39:18,510 --> 01:39:21,020 And honestly, this is going to take a while. 2028 01:39:21,020 --> 01:39:25,340 You know what's faster than counting by 1? Maybe multiplying by 2. 2029 01:39:25,340 --> 01:39:28,040 So let me go ahead and multiply by 2 instead. 2030 01:39:28,040 --> 01:39:30,440 To kill the program, just like in C I used Control-C. 2031 01:39:30,440 --> 01:39:32,390 And that's why I see keyboard interrupt. 2032 01:39:32,390 --> 01:39:34,860 It respected my wanting to cancel the program. 2033 01:39:34,860 --> 01:39:37,610 Let me rerun this now and just count really big.
2034 01:39:37,610 --> 01:39:39,830 And even though the internet's being a little slow, 2035 01:39:39,830 --> 01:39:44,120 which is why it's a little shaky, that's a really big number already 2036 01:39:44,120 --> 01:39:46,100 if I keep doubling i. 2037 01:39:46,100 --> 01:39:48,500 What would have happened already at this point 2038 01:39:48,500 --> 01:39:51,620 if I were using C to implement this program? 2039 01:39:51,620 --> 01:39:55,700 If in C I declared a variable called i, and it was an int, 2040 01:39:55,700 --> 01:39:57,770 and I kept doubling it, again and again and again 2041 01:39:57,770 --> 01:39:59,465 and again and again, literally forever? 2042 01:39:59,465 --> 01:40:02,150 2043 01:40:02,150 --> 01:40:02,840 Any thoughts? 2044 01:40:02,840 --> 01:40:03,740 Yeah. 2045 01:40:03,740 --> 01:40:05,480 What would have happened in C. Joy? 2046 01:40:05,480 --> 01:40:08,210 2047 01:40:08,210 --> 01:40:11,102 AUDIENCE: Yeah, I think it would have crashed. 2048 01:40:11,102 --> 01:40:12,560 DAVID MALAN: It would have crashed? 2049 01:40:12,560 --> 01:40:14,870 Why? 2050 01:40:14,870 --> 01:40:17,537 AUDIENCE: Because it would be taking much memory. 2051 01:40:17,537 --> 01:40:18,620 DAVID MALAN: Good thought. 2052 01:40:18,620 --> 01:40:20,098 So it wouldn't crash per se. 2053 01:40:20,098 --> 01:40:21,140 Something would go wrong. 2054 01:40:21,140 --> 01:40:21,890 It wouldn't crash. 2055 01:40:21,890 --> 01:40:24,170 Because it's still an int, and in C at least, 2056 01:40:24,170 --> 01:40:27,590 it would still be taking up on a typical computer 32 bits or 4 bytes. 2057 01:40:27,590 --> 01:40:31,520 But honestly, the program probably would have started printing 0 2058 01:40:31,520 --> 01:40:33,320 by now, or even negative numbers. 2059 01:40:33,320 --> 01:40:35,840 Because recall, one of the limitations of C 2060 01:40:35,840 --> 01:40:38,570 is that integers are a finite size-- 2061 01:40:38,570 --> 01:40:40,650 only 32 bits or 4 bytes. 
2062 01:40:40,650 --> 01:40:43,910 Which means if you keep going from 1, 2, 4, 8, 16, 2063 01:40:43,910 --> 01:40:47,000 a million, 2 million, 4 million, 8 million, 2064 01:40:47,000 --> 01:40:49,640 and so forth, eventually you're going to get into the billions. 2065 01:40:49,640 --> 01:40:52,820 And as soon as you cross the 2 billion threshold or maybe the 4 billion 2066 01:40:52,820 --> 01:40:57,000 threshold, if using signed or unsigned numbers, it's going to get too big. 2067 01:40:57,000 --> 01:40:58,670 You're going to have integer overflow. 2068 01:40:58,670 --> 01:41:03,710 But in the world of Python, integer overflow, not a thing anymore. 2069 01:41:03,710 --> 01:41:05,600 In the world of Python, your numbers will 2070 01:41:05,600 --> 01:41:08,090 get as big as you need them to get. 2071 01:41:08,090 --> 01:41:10,770 They will automatically address this problem for you. 2072 01:41:10,770 --> 01:41:15,200 Unfortunately, floating point imprecision, still a thing. 2073 01:41:15,200 --> 01:41:18,058 So I only divided 1 by 2 earlier. 2074 01:41:18,058 --> 01:41:21,350 But if I continue to divide other values and I looked at enough decimal points, 2075 01:41:21,350 --> 01:41:24,290 we would still suffer, unfortunately, from floating point imprecision. 2076 01:41:24,290 --> 01:41:27,950 However, in the world of Python, like in Java and other languages, 2077 01:41:27,950 --> 01:41:30,380 there are libraries, scientific libraries 2078 01:41:30,380 --> 01:41:33,500 that allow you to use as much precision as you need, 2079 01:41:33,500 --> 01:41:35,720 or at least as much memory as your computer has. 2080 01:41:35,720 --> 01:41:39,560 So those problems, too, have been better solved in more modern languages 2081 01:41:39,560 --> 01:41:42,260 than in something out of the box like C code.
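Both claims above can be checked with a short sketch. It doubles a bounded number of times rather than forever, which is a change from the int.py shown in lecture, and it uses the standard library's decimal module as one example of a higher-precision library, not necessarily the one the lecture has in mind.

```python
from decimal import Decimal

# Double i 64 times instead of looping forever.
i = 1
for _ in range(64):
    i *= 2

# 2**64 is too large for a 32-bit C int, yet Python's int just grows.
print(i)
print(i > 2**31 - 1)

# Binary floating point is still imprecise in Python...
print(0.1 + 0.2 == 0.3)

# ...while Decimal does its arithmetic in base 10.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```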
2082 01:41:42,260 --> 01:41:45,800 But just by multiplying that number again and again was I able, then, 2083 01:41:45,800 --> 01:41:50,720 to demonstrate much larger numbers than we ever saw in weeks past. 2084 01:41:50,720 --> 01:41:53,690 Well, let me go ahead and do another program here, 2085 01:41:53,690 --> 01:41:56,690 this one called scores.py. 2086 01:41:56,690 --> 01:41:58,850 That's going to be an example of really keeping 2087 01:41:58,850 --> 01:42:03,140 track of scores, which was an example we did early on in week 2 of the class. 2088 01:42:03,140 --> 01:42:05,390 And in Python, I'm going to go ahead and give myself 2089 01:42:05,390 --> 01:42:06,890 a list of scores like this-- 2090 01:42:06,890 --> 01:42:08,900 72, 73, and 33-- 2091 01:42:08,900 --> 01:42:11,285 again, sort of a playful reference to our ASCII numbers. 2092 01:42:11,285 --> 01:42:13,160 But in this context, they're quiz scores-- so 2093 01:42:13,160 --> 01:42:16,010 two OK quiz scores, and one kind of low quiz score, 2094 01:42:16,010 --> 01:42:18,110 assuming these things are out of like 100. 2095 01:42:18,110 --> 01:42:19,850 But notice the syntax I'm using. 2096 01:42:19,850 --> 01:42:22,680 Square brackets in Python give me a list. 2097 01:42:22,680 --> 01:42:25,070 I don't have to decide in advance how big it is. 2098 01:42:25,070 --> 01:42:27,600 It's not an array per se, but it's similar in spirit. 2099 01:42:27,600 --> 01:42:29,480 But it will automatically grow or shrink. 2100 01:42:29,480 --> 01:42:31,370 And the syntax is even simpler. 2101 01:42:31,370 --> 01:42:33,618 Suppose I want to average these scores in Python. 2102 01:42:33,618 --> 01:42:34,910 I could do something like this. 2103 01:42:34,910 --> 01:42:39,140 I could print out that the average of these scores is, for instance-- 2104 01:42:39,140 --> 01:42:40,880 and then I could do something like this. 2105 01:42:40,880 --> 01:42:46,130 I could do the sum of scores divided by the length of scores. 
2106 01:42:46,130 --> 01:42:49,190 And some of this is actually kind of new already. 2107 01:42:49,190 --> 01:42:54,710 It turns out in Python that there is a sum function that will take a list as input 2108 01:42:54,710 --> 01:42:58,520 and return to you the sum of those items. 2109 01:42:58,520 --> 01:43:01,790 And we've seen already there's a len function, L-E-N, 2110 01:43:01,790 --> 01:43:03,570 that tells you the length of a list. 2111 01:43:03,570 --> 01:43:07,460 So if I add up all my scores and then divide by the total number of scores, 2112 01:43:07,460 --> 01:43:09,660 that should give me by definition my average. 2113 01:43:09,660 --> 01:43:13,390 So python of scores.py, voila-- 2114 01:43:13,390 --> 01:43:15,620 whoops, what did I do here? 2115 01:43:15,620 --> 01:43:18,390 Ah, I screwed up. 2116 01:43:18,390 --> 01:43:21,560 So unintended, admittedly, but let me try to save myself here. 2117 01:43:21,560 --> 01:43:22,980 So what just happened? 2118 01:43:22,980 --> 01:43:24,855 Well, this error message is a little cryptic. 2119 01:43:24,855 --> 01:43:29,470 It says, "TypeError-- can only concatenate str, not float, to str." 2120 01:43:29,470 --> 01:43:29,970 Long 2121 01:43:29,970 --> 01:43:32,460 story short, Python in this case does not 2122 01:43:32,460 --> 01:43:36,540 like the fact that I'm trying to take a string, average, on the left 2123 01:43:36,540 --> 01:43:40,215 and concatenate to it a float on the right. 2124 01:43:40,215 --> 01:43:42,090 So there's a couple of ways I can solve this. 2125 01:43:42,090 --> 01:43:44,860 And we saw the fundamental solution earlier. 2126 01:43:44,860 --> 01:43:47,700 If this expression here that I've highlighted 2127 01:43:47,700 --> 01:43:52,320 is by definition mathematically a float, but I want it to become a string, 2128 01:43:52,320 --> 01:43:56,400 I can just tell Python, convert that float to a string.
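The error and its fix can be sketched like this. The exact wording of the printed label ("Average: ") is an assumption, since the transcript does not spell out the string used in scores.py.

```python
scores = [72, 73, 33]

# sum() adds up the list, len() counts its items;
# dividing with / always produces a float.
average = sum(scores) / len(scores)

# "Average: " + average would raise TypeError, because + will not
# join a str and a float. Converting with str() first avoids that.
message = "Average: " + str(average)
print(message)
```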
2129 01:43:56,400 --> 01:44:00,128 So much like there's the itoa function that some of you discovered, 2130 01:44:00,128 --> 01:44:01,920 which is the opposite of the atoi function, 2131 01:44:01,920 --> 01:44:05,430 I can take in Python, in this case a float, 2132 01:44:05,430 --> 01:44:07,330 and convert it to a string equivalent. 2133 01:44:07,330 --> 01:44:13,320 So now if I run python of scores.py, voila, my average is 59.333333. 2134 01:44:13,320 --> 01:44:15,300 And you already see a bit of imprecision. 2135 01:44:15,300 --> 01:44:19,573 There's some rounding error at the end there that is not a perfect one third. 2136 01:44:19,573 --> 01:44:21,240 But there's another way I could do this. 2137 01:44:21,240 --> 01:44:22,470 And it's a little uglier. 2138 01:44:22,470 --> 01:44:25,050 But I could use one of those f-strings. 2139 01:44:25,050 --> 01:44:27,540 I could, say, go ahead and plug in a value 2140 01:44:27,540 --> 01:44:30,610 here and just print out the user's average. 2141 01:44:30,610 --> 01:44:32,970 So it turns out that inside of these curly braces, 2142 01:44:32,970 --> 01:44:35,880 you don't have to print just variables. 2143 01:44:35,880 --> 01:44:39,040 You can actually put entire coding expressions. 2144 01:44:39,040 --> 01:44:42,137 And I would encourage you not to paste crazy long lines of code, 2145 01:44:42,137 --> 01:44:44,220 because it's going to very quickly get unreadable. 2146 01:44:44,220 --> 01:44:46,290 At that point you probably should use a variable. 2147 01:44:46,290 --> 01:44:49,920 But here I can go ahead and run python of scores.py. 2148 01:44:49,920 --> 01:44:52,170 And voila-- I screwed up again. 2149 01:44:52,170 --> 01:44:54,750 Also not intentional, but I can fix this. 2150 01:44:54,750 --> 01:44:59,310 Yeah, I'm missing the f at the beginning to make this a formatted string. 2151 01:44:59,310 --> 01:45:02,928 And now if I rerun it, voila, same exact answer. 
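A rough sketch of the f-string version, including what happens when the f is forgotten, follows. The variable names formatted and plain, and the label text, are assumptions for illustration.

```python
scores = [72, 73, 33]

# With the f prefix, the expression inside the braces is evaluated
# and converted to a string automatically.
formatted = f"Average: {sum(scores) / len(scores)}"
print(formatted)

# Without the f prefix, the braces and their contents print literally.
plain = "Average: {sum(scores) / len(scores)}"
print(plain)
```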
2152 01:45:02,928 --> 01:45:04,470 So again, I have multiple approaches. 2153 01:45:04,470 --> 01:45:05,640 There's a third one here. 2154 01:45:05,640 --> 01:45:09,480 I could do something-- and actually, I don't need the str in that context, 2155 01:45:09,480 --> 01:45:11,940 because now if it's inside of a format string, 2156 01:45:11,940 --> 01:45:15,210 Python will presume that I want to automatically convert it to a string. 2157 01:45:15,210 --> 01:45:16,170 So that's nice. 2158 01:45:16,170 --> 01:45:18,780 Or I can just factor this out, and I can say something 2159 01:45:18,780 --> 01:45:22,230 like this-- give me a variable called average, assign it equal to that math, 2160 01:45:22,230 --> 01:45:23,920 and then print out the average. 2161 01:45:23,920 --> 01:45:26,970 So again, just like in C, so many different ways to solve the problem. 2162 01:45:26,970 --> 01:45:29,550 And which one is best depends really on what 2163 01:45:29,550 --> 01:45:33,953 might be most readable, most maintainable, or easiest to do. 2164 01:45:33,953 --> 01:45:36,120 Let me go ahead and add some scores dynamically now. 2165 01:45:36,120 --> 01:45:38,010 Instead of hardcoding my three scores, let 2166 01:45:38,010 --> 01:45:41,010 me ask myself for my scores over the course of the semester. 2167 01:45:41,010 --> 01:45:44,850 From cs50 let me get_int, just so I can get some numbers easily. 2168 01:45:44,850 --> 01:45:48,270 Let me give myself an empty list of scores, the syntax for which 2169 01:45:48,270 --> 01:45:52,380 is just open bracket, close bracket, so nothing inside of it initially. 2170 01:45:52,380 --> 01:45:53,880 And now let me go ahead and do this. 2171 01:45:53,880 --> 01:45:55,380 Let me get myself three scores-- 2172 01:45:55,380 --> 01:45:56,880 maybe it's the end of the term now. 
2173 01:45:56,880 --> 01:46:03,120 For i in range of 3, let me go ahead and append to the scores array 2174 01:46:03,120 --> 01:46:07,453 whatever the return value of get_int is like this. 2175 01:46:07,453 --> 01:46:09,370 Now, this, too, I could do in a bunch of ways. 2176 01:46:09,370 --> 01:46:12,042 Let me get rid of this here. 2177 01:46:12,042 --> 01:46:12,930 Whoops. 2178 01:46:12,930 --> 01:46:14,160 Nope, we'll leave that there. 2179 01:46:14,160 --> 01:46:15,540 This I could do in a bunch of ways. 2180 01:46:15,540 --> 01:46:16,623 But notice what I'm doing. 2181 01:46:16,623 --> 01:46:20,370 I'm getting int, and I'm passing the return value of int 2182 01:46:20,370 --> 01:46:21,870 to a new function called append. 2183 01:46:21,870 --> 01:46:24,840 It turns out that lists, the square brackets, 2184 01:46:24,840 --> 01:46:27,600 once you've defined them in a variable like scores, 2185 01:46:27,600 --> 01:46:29,350 they, too, have functions built into them. 2186 01:46:29,350 --> 01:46:34,090 So I can do scores.append in order to add a number to the list. 2187 01:46:34,090 --> 01:46:36,840 So now let me go ahead and run this, python of scores.py. 2188 01:46:36,840 --> 01:46:40,260 Let me manually type in my 72, my 73, and my 33. 2189 01:46:40,260 --> 01:46:42,675 And voila, same exact answer. 2190 01:46:42,675 --> 01:46:44,550 But think about how much of a pain this would 2191 01:46:44,550 --> 01:46:46,950 have been in C, if you had to either decide 2192 01:46:46,950 --> 01:46:49,800 in advance the size of the array, or not decide in advance 2193 01:46:49,800 --> 01:46:53,460 and use malloc and realloc to keep growing and shrinking it. 2194 01:46:53,460 --> 01:46:56,340 Python, using this append function, which 2195 01:46:56,340 --> 01:46:59,880 comes inside of that list variable, handles 2196 01:46:59,880 --> 01:47:03,070 all of this automatically for us. 2197 01:47:03,070 --> 01:47:03,570 All right. 
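The append pattern can be sketched without the CS50 library. Here collect_scores and its entries argument are hypothetical stand-ins for three calls to get_int, so the example runs without prompting anyone.

```python
def collect_scores(entries):
    # Start with an empty list; no size is declared in advance.
    scores = []
    for entry in entries:
        # append grows the list by one item each time.
        scores.append(int(entry))
    return scores


scores = collect_scores(["72", "73", "33"])
print(f"Average: {sum(scores) / len(scores)}")
```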
2198 01:47:03,570 --> 01:47:06,780 So that, too, is a whole bunch of features. 2199 01:47:06,780 --> 01:47:10,200 Any questions, though, that I can answer here? 2200 01:47:10,200 --> 01:47:13,150 2201 01:47:13,150 --> 01:47:16,420 Any questions? 2202 01:47:16,420 --> 01:47:16,920 No? 2203 01:47:16,920 --> 01:47:19,290 Yeah, over to Santiago. 2204 01:47:19,290 --> 01:47:20,460 AUDIENCE: Yeah. 2205 01:47:20,460 --> 01:47:22,800 I had a question about-- 2206 01:47:22,800 --> 01:47:28,410 so even if append reduces the amount of code you have to write, 2207 01:47:28,410 --> 01:47:31,560 does it underneath the hood just do exactly what we 2208 01:47:31,560 --> 01:47:35,900 were doing in C, which is like, malloc and realloc, or something like that? 2209 01:47:35,900 --> 01:47:38,100 Is that all-- is that happening inside Python? 2210 01:47:38,100 --> 01:47:38,970 DAVID MALAN: It is. 2211 01:47:38,970 --> 01:47:41,068 Yeah, that's exactly what you're getting for free, 2212 01:47:41,068 --> 01:47:42,360 so to speak, with the language. 2213 01:47:42,360 --> 01:47:45,030 All of that malloc stuff, realloc stuff, maybe it's 2214 01:47:45,030 --> 01:47:47,448 implemented with an array underneath the hood, 2215 01:47:47,448 --> 01:47:48,990 like in the actual computer's memory. 2216 01:47:48,990 --> 01:47:51,138 Maybe it's a linked list like we saw last week. 2217 01:47:51,138 --> 01:47:52,680 But all of that is happening for you. 2218 01:47:52,680 --> 01:47:55,470 But that, again, is one of the reasons why the code ultimately 2219 01:47:55,470 --> 01:47:59,370 runs a little slower, because you have someone else's code in between you 2220 01:47:59,370 --> 01:48:03,210 and the CPU in your computer doing a bit of that work for you. 2221 01:48:03,210 --> 01:48:06,180 Sophia? 
2222 01:48:06,180 --> 01:48:08,360 AUDIENCE: Are there efficiency differences 2223 01:48:08,360 --> 01:48:13,640 in between the ways that we print, of utilizing the f formatting 2224 01:48:13,640 --> 01:48:16,550 or the other forms that we've used? 2225 01:48:16,550 --> 01:48:19,700 DAVID MALAN: You don't have to be-- if I'm understanding correctly, 2226 01:48:19,700 --> 01:48:21,292 there are some fancy features of it. 2227 01:48:21,292 --> 01:48:23,000 For instance, there is syntax you can use 2228 01:48:23,000 --> 01:48:25,190 to specify how many decimal points you want 2229 01:48:25,190 --> 01:48:27,470 to print after a floating point value. 2230 01:48:27,470 --> 01:48:32,030 But it's no longer all of the %i, %s, %f, and so forth. 2231 01:48:32,030 --> 01:48:34,970 They're slightly different syntax, but fortunately less of it, 2232 01:48:34,970 --> 01:48:39,620 since you don't have to worry as much about those conventions. 2233 01:48:39,620 --> 01:48:43,370 Other questions or confusion? 2234 01:48:43,370 --> 01:48:43,870 No? 2235 01:48:43,870 --> 01:48:44,240 All right. 2236 01:48:44,240 --> 01:48:46,365 Well, let me go ahead and do one other example that 2237 01:48:46,365 --> 01:48:48,280 might be familiar from some weeks past. 2238 01:48:48,280 --> 01:48:51,190 Let me go ahead and whip up a quick example of uppercasing, just 2239 01:48:51,190 --> 01:48:53,230 to tie together one of our earlier examples 2240 01:48:53,230 --> 01:48:55,660 that we saw more organically, or lowercasing. 2241 01:48:55,660 --> 01:48:58,392 In this case, a file called uppercase.py. 2242 01:48:58,392 --> 01:49:00,850 Let me go ahead, and from the CS50 library, let me go ahead 2243 01:49:00,850 --> 01:49:02,338 and import get_string. 2244 01:49:02,338 --> 01:49:05,380 And then once I have this, let me go ahead and get a string from the user 2245 01:49:05,380 --> 01:49:09,070 and ask them for, "Before," for instance. 
2246 01:49:09,070 --> 01:49:11,510 And then let me go ahead and do the following. 2247 01:49:11,510 --> 01:49:13,810 Let me go ahead and print out "After," the goal being I 2248 01:49:13,810 --> 01:49:16,950 want to uppercase this whole string for the user. 2249 01:49:16,950 --> 01:49:18,950 And I'm going to keep this all on the same line. 2250 01:49:18,950 --> 01:49:21,367 So again, I want a program that's going to print "Before," 2251 01:49:21,367 --> 01:49:23,320 ask the human for some input, and then after, 2252 01:49:23,320 --> 01:49:26,150 show the capitalized version of the whole string. 2253 01:49:26,150 --> 01:49:27,320 So how can I do this? 2254 01:49:27,320 --> 01:49:28,720 Well, we've seen one way already. 2255 01:49:28,720 --> 01:49:33,160 I can do literally, for instance, s.upper. 2256 01:49:33,160 --> 01:49:34,790 And let me go ahead and save this. 2257 01:49:34,790 --> 01:49:37,070 And now run python of uppercase.py. 2258 01:49:37,070 --> 01:49:39,550 Let me type in "hi" in lowercase, and boom, now 2259 01:49:39,550 --> 01:49:41,350 I get back the uppercase version. 2260 01:49:41,350 --> 01:49:44,200 But if you want, you can actually manipulate individual characters 2261 01:49:44,200 --> 01:49:44,848 as well. 2262 01:49:44,848 --> 01:49:47,140 Let me go ahead and a little more pedantically do this. 2263 01:49:47,140 --> 01:49:50,290 For c in s, print c. 2264 01:49:50,290 --> 01:49:53,180 Now, this isn't quite what I want yet, but it's a stepping stone. 2265 01:49:53,180 --> 01:49:55,930 Notice now if I type in "hi" in lowercase, 2266 01:49:55,930 --> 01:49:59,930 I see "h," "i," exclamation point, all still lowercase. 2267 01:49:59,930 --> 01:50:01,557 So I haven't done anything interesting. 2268 01:50:01,557 --> 01:50:03,640 But you know what, let me get rid of the new line, 2269 01:50:03,640 --> 01:50:06,650 just so it all stays on the same line, because that was kind of ugly. 2270 01:50:06,650 --> 01:50:07,710 Let me do it again. 
2271 01:50:07,710 --> 01:50:08,590 OK, a little better. 2272 01:50:08,590 --> 01:50:11,230 Let me actually add a new line at the very end of the program 2273 01:50:11,230 --> 01:50:13,040 to move my cursor to the new line. 2274 01:50:13,040 --> 01:50:14,860 Let's do it once more, "hi." 2275 01:50:14,860 --> 01:50:17,420 OK, I'm not uppercasing anything. 2276 01:50:17,420 --> 01:50:23,560 But if I change c to c.upper, I can do that as expected. 2277 01:50:23,560 --> 01:50:25,630 And let me run it again, "hi," and boom. 2278 01:50:25,630 --> 01:50:27,490 Now I have another working program. 2279 01:50:27,490 --> 01:50:32,230 But the new feature now is, notice this coolness on line 5. 2280 01:50:32,230 --> 01:50:35,080 If you want to iterate over a string's characters, 2281 01:50:35,080 --> 01:50:39,130 you don't need to initialize i to 0 and then use square bracket notation 2282 01:50:39,130 --> 01:50:45,460 like you did in C. You just say, for c in s, or for x in y, whatever it is. 2283 01:50:45,460 --> 01:50:50,257 For can also be used to iterate over the individual characters in a string, 2284 01:50:50,257 --> 01:50:52,090 as you might want to do when doing something 2285 01:50:52,090 --> 01:50:54,133 like cryptography or the like. 2286 01:50:54,133 --> 01:50:56,800 So we don't have to just uppercase the whole string all at once. 2287 01:50:56,800 --> 01:51:00,340 We can still gain access to our individual values. 2288 01:51:00,340 --> 01:51:03,550 And there's other things you can do in Python as well that we could do in C. 2289 01:51:03,550 --> 01:51:07,480 Let me go ahead and create a program here called argv.py, 2290 01:51:07,480 --> 01:51:12,308 for argument vector, which, recall, was the name of the input to main 2291 01:51:12,308 --> 01:51:14,350 that allows you to access command line arguments.
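The character-by-character version of uppercase.py discussed above could be sketched as follows. The function name print_upper and the hardcoded input "hi!" are assumptions that replace the get_string prompt, so the sketch runs on its own.

```python
def print_upper(s):
    # Keep the label and the output on one line.
    print("After:  ", end="")
    # A for loop over a string yields one character at a time.
    for c in s:
        print(c.upper(), end="")
    # Finish with a newline so the prompt moves down.
    print()


print_upper("hi!")

# The whole-string version does the same thing in one call.
print("After:  " + "hi!".upper())
```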
2292 01:51:14,350 --> 01:51:17,090 Now today, we have seen that you can have a main function 2293 01:51:17,090 --> 01:51:19,240 but you don't need to, but it's conventional. 2294 01:51:19,240 --> 01:51:20,810 It's not required anymore. 2295 01:51:20,810 --> 01:51:24,370 And so we haven't seen argc or argv yet, but that's 2296 01:51:24,370 --> 01:51:26,500 because they're elsewhere in Python. 2297 01:51:26,500 --> 01:51:29,650 If you want to access command line arguments in Python, 2298 01:51:29,650 --> 01:51:33,640 it turns out that you can import a module called argv. 2299 01:51:33,640 --> 01:51:37,300 And this is a little new, but it follows the same pattern as the CS50's library. 2300 01:51:37,300 --> 01:51:42,640 I'm going to import from the System library a feature called argv. 2301 01:51:42,640 --> 01:51:45,700 So this just means that it comes with Python, but to use it 2302 01:51:45,700 --> 01:51:47,980 you have to import it explicitly. 2303 01:51:47,980 --> 01:51:49,360 And now I'm going to do this. 2304 01:51:49,360 --> 01:51:54,490 If the length of argv equals 2, then I'm going 2305 01:51:54,490 --> 01:51:57,340 to go ahead and print out, just like we did a few weeks ago, 2306 01:51:57,340 --> 01:52:01,660 "hello," and then argv bracket 1. 2307 01:52:01,660 --> 01:52:04,480 Somewhat cryptic, but I'll come back to this in a moment. 2308 01:52:04,480 --> 01:52:07,580 Else, I'm going to go ahead and print out a default of "hello, world." 2309 01:52:07,580 --> 01:52:09,970 So we did this some weeks ago, in week 2, 2310 01:52:09,970 --> 01:52:14,260 whereby we ran a program that if the user typed their name at the prompt, 2311 01:52:14,260 --> 01:52:16,250 it would say "hello, David" or "hello, Brian." 2312 01:52:16,250 --> 01:52:18,410 If they didn't, it would just say "hello, world." 
2313 01:52:18,410 --> 01:52:22,150 So to be clear, if I run this thing and run it without any command line 2314 01:52:22,150 --> 01:52:24,250 arguments, I just see "hello, world." 2315 01:52:24,250 --> 01:52:27,790 If I run it again, though, and type my name in and hit Enter, 2316 01:52:27,790 --> 01:52:29,020 now I see "hello, David." 2317 01:52:29,020 --> 01:52:30,340 So how is that working? 2318 01:52:30,340 --> 01:52:33,340 Well, this first line of code gives me access to argv, 2319 01:52:33,340 --> 01:52:37,100 which is now tucked away in the sys library, if you will, 2320 01:52:37,100 --> 01:52:38,800 the sys package, so to speak. 2321 01:52:38,800 --> 01:52:40,300 But it works the same way. 2322 01:52:40,300 --> 01:52:42,580 There's no argc, but no problem. 2323 01:52:42,580 --> 01:52:46,210 If argv is a list of command line arguments, which it is, 2324 01:52:46,210 --> 01:52:50,620 len, L-E-N, will tell me the length of that list, which is equivalent to argc. 2325 01:52:50,620 --> 01:52:55,930 So I can reconstruct the same idea from my version in C. 2326 01:52:55,930 --> 01:52:59,680 And here, then, I have a format string that prints out "hello," comma, 2327 01:52:59,680 --> 01:53:01,270 and then whatever's in curly braces. 2328 01:53:01,270 --> 01:53:02,650 And argv is a list. 2329 01:53:02,650 --> 01:53:05,830 And just like in C, which had arrays, a list 2330 01:53:05,830 --> 01:53:09,265 is just an array that can dynamically grow and shrink for you. 2331 01:53:09,265 --> 01:53:13,610 You can still use square bracket notation to get at, in this case, 2332 01:53:13,610 --> 01:53:15,790 the second thing the human typed. 2333 01:53:15,790 --> 01:53:18,310 So let me change this just for clarity to be 0. 2334 01:53:18,310 --> 01:53:20,680 And if I rerun this now and type in David, 2335 01:53:20,680 --> 01:53:23,590 it says weirdly, "hello, argv.py." 2336 01:53:23,590 --> 01:53:25,540 So what you don't see is the word "Python." 
2337 01:53:25,540 --> 01:53:29,050 Python is the interpreter, but that's not part of your program's execution 2338 01:53:29,050 --> 01:53:29,890 per se. 2339 01:53:29,890 --> 01:53:36,100 argv 0 is going to be the name of the Python program you're running, 2340 01:53:36,100 --> 01:53:39,920 and argv 1 is going to be the first word thereafter, and so forth. 2341 01:53:39,920 --> 01:53:42,310 So we still have access to that feature, but now 2342 01:53:42,310 --> 01:53:44,008 we can convert it now to Python. 2343 01:53:44,008 --> 01:53:46,800 And in fact, if I want to print out all the command line arguments, 2344 01:53:46,800 --> 01:53:48,330 I can just more simply do this-- 2345 01:53:48,330 --> 01:53:52,200 for arg in argv, go ahead and print arg. 2346 01:53:52,200 --> 01:53:55,172 So very succinct, if not obvious at first glance. 2347 01:53:55,172 --> 01:53:56,880 Now let me go ahead and type in something 2348 01:53:56,880 --> 01:53:58,710 like "David Malan," two words. 2349 01:53:58,710 --> 01:54:04,710 Enter, you now see everything printed or typed after the program's name, 2350 01:54:04,710 --> 01:54:05,650 and so forth. 2351 01:54:05,650 --> 01:54:10,800 So here, too, notice how neatly we can iterate over a list in Python. 2352 01:54:10,800 --> 01:54:13,470 There's no i, there's no square brackets necessarily. 2353 01:54:13,470 --> 01:54:18,240 You can just say, for arg in argv, just like a moment ago I said for c in s. 2354 01:54:18,240 --> 01:54:21,480 Pretty much the Python for loop is smart enough 2355 01:54:21,480 --> 01:54:25,050 to figure out what it is you want it to iterate over, 2356 01:54:25,050 --> 01:54:26,650 whether it's a string or a list. 
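A sketch of the loop over argv described above. The function name print_args is an assumption added for illustration; the lecture writes the for loop directly at the top level.

```python
import sys


def print_args(args):
    """Print every argument on its own line (hypothetical helper)."""
    # The for loop walks the list directly: no counter, no indexing.
    for arg in args:
        print(arg)


print_args(sys.argv)
```

Note that the list starts with the program's own name at index 0, so that is printed first, followed by each word typed after it.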
2357 01:54:26,650 --> 01:54:29,553 And my God, it's just so much more fun or pleasant to program 2358 01:54:29,553 --> 01:54:32,220 now, when you don't have to worry about all the stupid mechanics 2359 01:54:32,220 --> 01:54:35,040 of incrementing, and plus plus, and semicolons, and all 2360 01:54:35,040 --> 01:54:37,620 of that syntactical mess. 2361 01:54:37,620 --> 01:54:40,162 All right, let me pause here to see if there's any questions. 2362 01:54:40,162 --> 01:54:42,578 I know we're going through some of these examples quickly, 2363 01:54:42,578 --> 01:54:44,430 but they're really just translations again. 2364 01:54:44,430 --> 01:54:46,860 And for upcoming problems and problem sets 2365 01:54:46,860 --> 01:54:52,990 you will be able to more methodically compare before and after as well. 2366 01:54:52,990 --> 01:54:55,180 Anything at all on your end, Brian? 2367 01:54:55,180 --> 01:54:56,080 BRIAN: Nothing here. 2368 01:54:56,080 --> 01:54:57,038 DAVID MALAN: All right. 2369 01:54:57,038 --> 01:54:59,080 So let's look at some of our final past examples. 2370 01:54:59,080 --> 01:55:01,246 And then we'll reserve some time at the end of today 2371 01:55:01,246 --> 01:55:02,980 to look at some even more powerful things 2372 01:55:02,980 --> 01:55:06,230 that we can do now because of languages like Python. 2373 01:55:06,230 --> 01:55:10,615 Let me go ahead and create a program, this time called exit.py, exit.py. 2374 01:55:10,615 --> 01:55:12,490 And this program's purpose in life, it's just 2375 01:55:12,490 --> 01:55:14,080 going to demonstrate exit statuses. 2376 01:55:14,080 --> 01:55:16,270 Recall that eventually in C, we introduced 2377 01:55:16,270 --> 01:55:20,320 the notion of returning 0, or returning 1, or any other value from main. 2378 01:55:20,320 --> 01:55:22,720 We do have that ability now in Python, too, 2379 01:55:22,720 --> 01:55:25,278 that you'll start to see in larger programs.
2380 01:55:25,278 --> 01:55:27,070 Here, too, I'm going to go ahead and import 2381 01:55:27,070 --> 01:55:30,740 sys, the whole thing this time, just to show a different way of doing this. 2382 01:55:30,740 --> 01:55:34,420 I'm going to say, if the length of sys.argv 2383 01:55:34,420 --> 01:55:37,960 does not equal 2, let me go ahead and yell at the user, 2384 01:55:37,960 --> 01:55:40,570 "Missing command-line argument." 2385 01:55:40,570 --> 01:55:44,680 And then after this, I'm going to go ahead and do sys.exit 1. 2386 01:55:44,680 --> 01:55:48,940 Otherwise, I'm going to go ahead and print out a formatted string that 2387 01:55:48,940 --> 01:55:54,640 says "hello," comma argv bracket 1, with sys now in front of it 2388 01:55:54,640 --> 01:55:56,420 for reasons I'll explain in a moment. 2389 01:55:56,420 --> 01:56:01,490 And then at the end, I'm going to go ahead and by default call sys.exit 0. 2390 01:56:01,490 --> 01:56:01,990 All right. 2391 01:56:01,990 --> 01:56:03,270 So what is going on here? 2392 01:56:03,270 --> 01:56:06,730 One, because I'm now using sys for two different things, 2393 01:56:06,730 --> 01:56:09,190 I decided not to import argv specifically, 2394 01:56:09,190 --> 01:56:11,170 but just to import the whole library. 2395 01:56:11,170 --> 01:56:14,770 But because I did that, I can't just write the word "argv" anywhere. 2396 01:56:14,770 --> 01:56:18,700 I now have to prefix it with the name of the package or library that it's in. 2397 01:56:18,700 --> 01:56:22,870 So that's why I started doing sys.argv, sys.argv. 2398 01:56:22,870 --> 01:56:27,580 But I'm also using another feature of the sys library, which gives me access 2399 01:56:27,580 --> 01:56:33,220 to an exit function, which is the equivalent to returning from main. 2400 01:56:33,220 --> 01:56:34,630 So this is a bit of a dichotomy. 2401 01:56:34,630 --> 01:56:39,430 In C, you had to return 0 or 1, or some other integer from main.
2402 01:56:39,430 --> 01:56:44,750 In Python, you instead call sys.exit with the same kinds of numbers. 2403 01:56:44,750 --> 01:56:48,340 So a little bit different syntactically, but it's the same fundamental idea. 2404 01:56:48,340 --> 01:56:49,890 What's the purpose of this program? 2405 01:56:49,890 --> 01:56:52,150 Well, if I run this thing, its purpose is just 2406 01:56:52,150 --> 01:56:56,590 to make me type in one word and only one word after my program's name. 2407 01:56:56,590 --> 01:56:59,080 So notice, if I just run python of exit.py, 2408 01:56:59,080 --> 01:57:01,750 it's yelling at me, "Missing command-line argument." 2409 01:57:01,750 --> 01:57:05,890 If I run it instead with my name after that, now it says "hello, David." 2410 01:57:05,890 --> 01:57:07,300 So stupid program. 2411 01:57:07,300 --> 01:57:11,200 It's only meant to demonstrate how you can now return different values 2412 01:57:11,200 --> 01:57:14,295 or really return prematurely from a program, 2413 01:57:14,295 --> 01:57:15,670 because you're no longer in main. 2414 01:57:15,670 --> 01:57:21,770 You can't return per se, but you can now in Python exit as needed. 2415 01:57:21,770 --> 01:57:24,350 So that's the comparable line there. 2416 01:57:24,350 --> 01:57:26,435 All right, any questions, then, on exit statuses? 2417 01:57:26,435 --> 01:57:29,060 Again, we're just kind of churning through the list of features 2418 01:57:29,060 --> 01:57:33,350 we saw in C, even if they don't come to you super naturally-- 2419 01:57:33,350 --> 01:57:40,540 pun not intended-- but rather, there are analogs here in the Python world. 2420 01:57:40,540 --> 01:57:41,040 No? 2421 01:57:41,040 --> 01:57:41,500 All right. 2422 01:57:41,500 --> 01:57:43,440 Well, recall that after that we started focusing really 2423 01:57:43,440 --> 01:57:44,760 in the class on algorithms. 
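The exit.py program walked through above can be sketched like this. Wrapping the logic in a function named check is an assumption made here so it can be called with a chosen argument list; the lecture writes the same statements at the top level of the file.

```python
import sys


def check(args):
    """Exit with status 1 on a usage error, 0 on success (hypothetical helper)."""
    if len(args) != 2:
        print("Missing command-line argument")
        # Nonzero status signals failure, like "return 1" from main in C.
        sys.exit(1)
    # With "import sys", names are prefixed (sys.argv, sys.exit).
    print(f"hello, {args[1]}")
    sys.exit(0)
```

To use it as a program, call check(sys.argv) on the last line of the file. sys.exit works by raising the built-in SystemExit exception, which is why it can end the program from anywhere, not just from main.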
2424 01:57:44,760 --> 01:57:46,560 And that's when the size of our data sets 2425 01:57:46,560 --> 01:57:49,620 and our-- the efficiency of our code started to really matter. 2426 01:57:49,620 --> 01:57:52,230 Let me go ahead and write a program called numbers.py 2427 01:57:52,230 --> 01:57:55,440 that, for instance, contains an import at the top for sys, 2428 01:57:55,440 --> 01:57:56,980 because I'll need that in a moment. 2429 01:57:56,980 --> 01:57:58,050 And then it gives me-- 2430 01:57:58,050 --> 01:58:02,850 and let me give myself an array of numbers, like 4, 6, 8, 2, 7, 5, 0. 2431 01:58:02,850 --> 01:58:07,030 And you might recall that those were the numbers behind the doors in week 3. 2432 01:58:07,030 --> 01:58:09,450 And suppose that I want to search for the number 0. 2433 01:58:09,450 --> 01:58:13,830 Well, in C, to implement linear search you would use a for loop 2434 01:58:13,830 --> 01:58:17,130 and a variable like i, and check all of the locations. 2435 01:58:17,130 --> 01:58:18,690 Python is way simpler. 2436 01:58:18,690 --> 01:58:25,180 If 0 in numbers, go ahead and print out "Found." 2437 01:58:25,180 --> 01:58:31,850 And then I'll go ahead and else print out "Not found." 2438 01:58:31,850 --> 01:58:32,750 And that's it. 2439 01:58:32,750 --> 01:58:35,500 So let me go ahead now and do python of numbers.py. 2440 01:58:35,500 --> 01:58:38,360 Hopefully I will see [INAUDIBLE] found, because it's in fact there. 2441 01:58:38,360 --> 01:58:38,920 So that's it. 2442 01:58:38,920 --> 01:58:44,620 Linear search is just a prepositional phrase, if 0 in numbers, 2443 01:58:44,620 --> 01:58:47,620 gives you the answer True or False that you want. 2444 01:58:47,620 --> 01:58:49,360 So there is our linear search. 2445 01:58:49,360 --> 01:58:50,830 What if I want to do it for names? 2446 01:58:50,830 --> 01:58:54,070 Well, let me go ahead and give myself a second file, similar in spirit, 2447 01:58:54,070 --> 01:58:55,600 called names.py. 
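The linear search in numbers.py, as described above, comes down to a few lines. This sketch leaves out the import of sys and the exit statuses, which the lecture itself calls optional.

```python
numbers = [4, 6, 8, 2, 7, 5, 0]

# "in" checks the list elements one after another, from left to right,
# which is linear search without an explicit loop or index variable.
if 0 in numbers:
    print("Found")
else:
    print("Not found")
```

The expression 0 in numbers evaluates to True or False on its own, so it can also be stored in a variable or returned from a function.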
2448 01:58:55,600 --> 01:58:56,800 Let me again import-- 2449 01:58:56,800 --> 01:59:00,400 and actually, if I really want to be identical to our C version, 2450 01:59:00,400 --> 01:59:05,737 let me go ahead and exit with 0 here, and let me exit with 1 here. 2451 01:59:05,737 --> 01:59:07,570 But strictly speaking, that's not necessary. 2452 01:59:07,570 --> 01:59:11,265 That just happens to be what I did when we did this in C instead. 2453 01:59:11,265 --> 01:59:13,390 In names, let me go ahead and do something similar. 2454 01:59:13,390 --> 01:59:17,020 Let me give myself a names list with a whole bunch of names-- 2455 01:59:17,020 --> 01:59:25,750 "Bill," and "Charlie," and "Fred," and "George," and "Ginny," and "Percy," 2456 01:59:25,750 --> 01:59:28,700 and lastly "Ron," all the way at the end. 2457 01:59:28,700 --> 01:59:31,720 And then let me just check if "Ron" is in that list using linear search. 2458 01:59:31,720 --> 01:59:36,820 If "Ron" in names, go ahead and print out "Found." 2459 01:59:36,820 --> 01:59:39,220 Else, go ahead and print out "Not found." 2460 01:59:39,220 --> 01:59:43,120 And I won't bother printing out or exiting with 0 or 1 this time. 2461 01:59:43,120 --> 01:59:45,760 But let me go ahead and run python of names-- 2462 01:59:45,760 --> 01:59:48,010 whoops, python of names. 2463 01:59:48,010 --> 01:59:49,840 And voila, we found "Ron." 2464 01:59:49,840 --> 01:59:51,160 And notice, I'm not cheating. 2465 01:59:51,160 --> 01:59:52,720 I don't think I've screwed up. 2466 01:59:52,720 --> 01:59:55,960 If I go ahead and say "Ronald," if that was in fact his formal name, 2467 01:59:55,960 --> 01:59:58,120 now I search for "Ron," not found. 2468 01:59:58,120 --> 01:59:59,973 It's looking, indeed, for an exact match. 2469 01:59:59,973 --> 02:00:02,140 So that's pretty cool, that we can distill something 2470 02:00:02,140 --> 02:00:03,430 like that pretty readily. 
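The same search over names, sketched below, also shows the exact-match behavior just demonstrated. The final line is an addition for illustration and is not part of the lecture's file.

```python
names = ["Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]

if "Ron" in names:
    print("Found")
else:
    print("Not found")

# Against a list, "in" compares whole elements, so "Ron" does not
# match an entry spelled "Ronald".
print("Ron" in ["Bill", "Ronald"])
```

One caveat worth knowing: against a single string rather than a list, in tests for a substring, so "Ron" in "Ronald" would be True.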
2471 02:00:03,430 --> 02:00:06,670 Well, recall that a little bit ago, I proposed that Python has other data 2472 02:00:06,670 --> 02:00:11,080 types as well, among which are these things called dictionaries or dicts, 2473 02:00:11,080 --> 02:00:17,380 D-I-C-T, which represent a collection of key-value pairs similar in spirit 2474 02:00:17,380 --> 02:00:18,250 to a dictionary. 2475 02:00:18,250 --> 02:00:23,110 Like, the Spanish dictionary has Spanish keys and English values converting one 2476 02:00:23,110 --> 02:00:25,330 to the other, this English dictionary has 2477 02:00:25,330 --> 02:00:27,700 English words and English definitions. 2478 02:00:27,700 --> 02:00:30,700 But the same idea-- a collection of keys and values. 2479 02:00:30,700 --> 02:00:32,863 Using one, you can find the other. 2480 02:00:32,863 --> 02:00:35,530 Well, let's go ahead and translate this into Python in a program 2481 02:00:35,530 --> 02:00:38,740 called phonebook.py, and implement something 2482 02:00:38,740 --> 02:00:41,170 like our C phone book a while back, which, recall, 2483 02:00:41,170 --> 02:00:45,580 in C, we used a couple of arrays initially, then we scratched that, 2484 02:00:45,580 --> 02:00:48,160 and we used an array of structs instead. 2485 02:00:48,160 --> 02:00:51,970 Now let's use a dictionary, which is a more general-purpose data 2486 02:00:51,970 --> 02:00:54,320 structure, as follows. 2487 02:00:54,320 --> 02:00:59,290 Let me go ahead here and from cs50 import get_string. 2488 02:00:59,290 --> 02:01:02,410 Then let me go ahead and give myself a dictionary of people. 2489 02:01:02,410 --> 02:01:04,600 And the syntax here is a little different, 2490 02:01:04,600 --> 02:01:07,690 but I'm going to go ahead and preemptively use curly braces. 2491 02:01:07,690 --> 02:01:10,330 They are back for the purposes of dictionaries. 2492 02:01:10,330 --> 02:01:12,820 And then here's how you define key-value pairs. 2493 02:01:12,820 --> 02:01:14,560 One key is going to be "Brian."
2494 02:01:14,560 --> 02:01:18,910 And his value is going to be "+1-617-495-1000." 2495 02:01:18,910 --> 02:01:19,842 That's his number. 2496 02:01:19,842 --> 02:01:21,800 And then I'll be one of the other keys for now. 2497 02:01:21,800 --> 02:01:24,430 We'll keep it a very small phone book or dictionary. 2498 02:01:24,430 --> 02:01:29,380 Mine will be "+1-949-468-2750." 2499 02:01:29,380 --> 02:01:31,360 And that's it. 2500 02:01:31,360 --> 02:01:34,275 So the curly braces can technically be on different lines. 2501 02:01:34,275 --> 02:01:36,400 I could move this up here, I could get rid of this. 2502 02:01:36,400 --> 02:01:39,100 But there are certain style conventions in Python. 2503 02:01:39,100 --> 02:01:43,240 The point, though, here is that a dictionary is defined with curly braces 2504 02:01:43,240 --> 02:01:47,770 at the beginning and end; the keys and values are separated by colons; 2505 02:01:47,770 --> 02:01:50,993 and the key-value pairs are separated by commas. 2506 02:01:50,993 --> 02:01:53,410 So that's why it's conventional to write it the way I did. 2507 02:01:53,410 --> 02:01:55,118 It's just a little more obvious that this 2508 02:01:55,118 --> 02:01:58,750 is a dictionary with two keys, each of which has a value. 2509 02:01:58,750 --> 02:02:01,640 It's just associating left with right, so to speak. 2510 02:02:01,640 --> 02:02:02,770 Now, what does this mean? 2511 02:02:02,770 --> 02:02:04,840 Suppose I want to search for someone's name. 2512 02:02:04,840 --> 02:02:08,440 Well, let me go ahead and give myself a name variable, calling get_string, asking 2513 02:02:08,440 --> 02:02:09,520 the human for a name. 2514 02:02:09,520 --> 02:02:13,210 And let me implement my own virtual phone book, much like the Contacts app 2515 02:02:13,210 --> 02:02:13,935 on your phone. 2516 02:02:13,935 --> 02:02:16,060 Let me go ahead and then say, once I have the name, 2517 02:02:16,060 --> 02:02:18,640 if name in people, that's great.
2518 02:02:18,640 --> 02:02:20,890 If I found the name in people, let me go ahead 2519 02:02:20,890 --> 02:02:27,100 and print out that the number for that person is people bracket name. 2520 02:02:27,100 --> 02:02:30,100 And this is where dictionaries are going to get really powerful. 2521 02:02:30,100 --> 02:02:32,020 Let me run it first and then explain. 2522 02:02:32,020 --> 02:02:34,870 Python of phonebook.py, Enter-- 2523 02:02:34,870 --> 02:02:38,350 whoops, python of phonebook.py. 2524 02:02:38,350 --> 02:02:40,300 Let me search for Brian's number. 2525 02:02:40,300 --> 02:02:42,685 Boom, there's Brian's number. 2526 02:02:42,685 --> 02:02:44,560 Let me go ahead and run it with David's name. 2527 02:02:44,560 --> 02:02:46,270 Boom, there's that number. 2528 02:02:46,270 --> 02:02:50,380 Let me go ahead and run it with, say, Montague's name. 2529 02:02:50,380 --> 02:02:52,630 Don't have his phone number just yet. 2530 02:02:52,630 --> 02:02:55,810 He's unlisted, as would be anyone else that I type in. 2531 02:02:55,810 --> 02:02:57,850 So what has gone on here? 2532 02:02:57,850 --> 02:03:00,730 Well, at the top I'm declaring this new variable called people. 2533 02:03:00,730 --> 02:03:03,820 And it's a dictionary, a set of key-value pairs left and right. 2534 02:03:03,820 --> 02:03:07,900 Then I'm just getting a string from the user using get_string as before. 2535 02:03:07,900 --> 02:03:09,760 And then this is powerful, too. 2536 02:03:09,760 --> 02:03:14,830 This is essentially, on line 9, searching the whole dictionary 2537 02:03:14,830 --> 02:03:16,030 for the given name. 2538 02:03:16,030 --> 02:03:22,290 And it's returning to me down here the name associated with that-- or, sorry, 2539 02:03:22,290 --> 02:03:24,760 the number associated with that person's name. 2540 02:03:24,760 --> 02:03:27,200 And let me make this more clear by factoring this out. 
2541 02:03:27,200 --> 02:03:30,090 Let me give myself a variable called number and then more 2542 02:03:30,090 --> 02:03:32,460 explicitly print out that variable's name. 2543 02:03:32,460 --> 02:03:34,260 Here's what's different today. 2544 02:03:34,260 --> 02:03:39,780 "If name in people" in here, what this does is Python 2545 02:03:39,780 --> 02:03:42,540 searches all of the keys for that name. 2546 02:03:42,540 --> 02:03:43,650 It doesn't search values. 2547 02:03:43,650 --> 02:03:47,280 When you say if name in a given dictionary, like people is, 2548 02:03:47,280 --> 02:03:49,050 it searches only the keys. 2549 02:03:49,050 --> 02:03:52,290 If you've then found the key, I know definitively 2550 02:03:52,290 --> 02:03:54,780 that "David" or "Brian" are in the dictionary. 2551 02:03:54,780 --> 02:03:55,830 And notice this. 2552 02:03:55,830 --> 02:03:58,560 It's just like C's array syntax. 2553 02:03:58,560 --> 02:04:01,800 You can now use square bracket notation to index 2554 02:04:01,800 --> 02:04:06,510 into a dictionary using a word like "David" or "Brian," 2555 02:04:06,510 --> 02:04:09,150 and get back a value like our phone number. 2556 02:04:09,150 --> 02:04:12,060 In C, and thus far even in Python, whenever 2557 02:04:12,060 --> 02:04:16,770 we've seen square bracket notation, it would only be typically for numbers, 2558 02:04:16,770 --> 02:04:19,590 because arrays or lists have indices, numbers 2559 02:04:19,590 --> 02:04:22,050 that address the first location, middle, and last, 2560 02:04:22,050 --> 02:04:24,033 and so forth, everything in between. 2561 02:04:24,033 --> 02:04:26,700 But what's powerful about dictionaries is that they're otherwise 2562 02:04:26,700 --> 02:04:28,710 known as associative arrays. 2563 02:04:28,710 --> 02:04:31,150 A dictionary is a collection of key-value pairs.
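The dictionary-based phone book described above can be sketched as follows. To keep the sketch free of the cs50 library and of keyboard input, the name is passed to a function called lookup; that function and its return convention are assumptions made here, whereas the lecture reads the name with get_string and prints the result.

```python
# Keys (names) map to values (numbers): a colon separates each key from
# its value, and commas separate the key-value pairs.
people = {
    "Brian": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(name):
    """Return a message with the person's number, or None if unlisted."""
    # "in" on a dictionary checks the keys only, never the values.
    if name in people:
        # Square brackets index by key, much like an array indexed by a word.
        number = people[name]
        return f"Number: {number}"
    return None


print(lookup("Brian"))
```

Indexing with a key that is absent, such as people["Montague"], raises a KeyError, which is why the membership check comes first.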
2564 02:04:31,150 --> 02:04:33,570 And if you want to look up a key, you simply 2565 02:04:33,570 --> 02:04:35,730 use square bracket notation, just like we used 2566 02:04:35,730 --> 02:04:37,500 to use square brackets for numbers. 2567 02:04:37,500 --> 02:04:42,030 And because Python is a pretty fancy language, 2568 02:04:42,030 --> 02:04:44,130 it handles the searching for you. 2569 02:04:44,130 --> 02:04:46,920 And even better, it does not use linear search. 2570 02:04:46,920 --> 02:04:50,460 When searching a dictionary, it aspires to give you 2571 02:04:50,460 --> 02:04:54,600 constant time by using what we called last week a hash table. 2572 02:04:54,600 --> 02:04:58,020 Dictionaries are typically implemented underneath the hood using something 2573 02:04:58,020 --> 02:04:59,190 like a hash table. 2574 02:04:59,190 --> 02:05:01,380 And recall that, even though it was really 2575 02:05:01,380 --> 02:05:05,730 a goal of achieving constant time, if you choose a really good hash 2576 02:05:05,730 --> 02:05:07,290 function and a really-- 2577 02:05:07,290 --> 02:05:11,670 the right size array to hash into, you can come close to constant time. 2578 02:05:11,670 --> 02:05:15,660 So again, among the features of a dictionary in Python 2579 02:05:15,660 --> 02:05:18,270 are that it gives you very high performance. 2580 02:05:18,270 --> 02:05:19,350 It's not linear search. 2581 02:05:19,350 --> 02:05:22,470 And in fact, set-- recall that when we began playing with Python earlier, 2582 02:05:22,470 --> 02:05:24,795 and I re-implemented speller using, what, 2583 02:05:24,795 --> 02:05:27,420 10 or 20 lines of code max instead of the many 2584 02:05:27,420 --> 02:05:31,440 more that you might have written for pset 5, speller used a set. 2585 02:05:31,440 --> 02:05:33,460 And a set is just a collection of values. 
2586 02:05:33,460 --> 02:05:36,960 Long story short, it's similar in spirit to a dictionary 2587 02:05:36,960 --> 02:05:40,140 in that it, too, underneath the hood uses a hash table 2588 02:05:40,140 --> 02:05:42,010 to get you answers quickly. 2589 02:05:42,010 --> 02:05:46,110 So if you think back to what that speller example was from earlier 2590 02:05:46,110 --> 02:05:51,420 on today, when I had a line of code that just said words equals set, 2591 02:05:51,420 --> 02:05:55,680 that one line of code was implementing pretty much the entirety 2592 02:05:55,680 --> 02:05:57,180 of your spell checker. 2593 02:05:57,180 --> 02:06:01,080 All of those pointers, all of those hash tables and chains and linked lists 2594 02:06:01,080 --> 02:06:04,020 are distilled into just one line of code. 2595 02:06:04,020 --> 02:06:07,210 You get that with the language itself. 2596 02:06:07,210 --> 02:06:07,710 All right. 2597 02:06:07,710 --> 02:06:10,500 Any questions, then, on dictionaries? 2598 02:06:10,500 --> 02:06:14,670 They will recur, and they tend to be one of the most useful data structures, 2599 02:06:14,670 --> 02:06:17,910 because this ability to just associate something with something else 2600 02:06:17,910 --> 02:06:24,140 is just a wonderful way, it turns out, to organize your data. 2601 02:06:24,140 --> 02:06:27,260 Any questions here? 2602 02:06:27,260 --> 02:06:29,225 Yeah, Sophia? 2603 02:06:29,225 --> 02:06:31,390 AUDIENCE: Is there only a set hash function 2604 02:06:31,390 --> 02:06:33,445 that Python has defined for these dictionaries, 2605 02:06:33,445 --> 02:06:36,400 or can we change the hash function in any way? 2606 02:06:36,400 --> 02:06:38,140 DAVID MALAN: Good question. 2607 02:06:38,140 --> 02:06:40,540 It comes with a hash function for you, and Python 2608 02:06:40,540 --> 02:06:42,430 figures all of that out for you. 
2609 02:06:42,430 --> 02:06:46,210 So that's the kind of detail that you should leave to the library, 2610 02:06:46,210 --> 02:06:48,502 because someone else has spent all of the time thinking 2611 02:06:48,502 --> 02:06:51,502 about how to dynamically adapt the data structure, move things around as 2612 02:06:51,502 --> 02:06:54,040 needed, so that you no longer need to stress to the degree 2613 02:06:54,040 --> 02:06:57,115 you might have when implementing speller yourself. 2614 02:06:57,115 --> 02:06:58,990 And it turns out, other things get easy, too. 2615 02:06:58,990 --> 02:07:01,710 This is not a commonly needed feature, necessarily, 2616 02:07:01,710 --> 02:07:02,960 but it is something we can do. 2617 02:07:02,960 --> 02:07:05,650 And let me go ahead and write a quick program called swap.py. 2618 02:07:05,650 --> 02:07:10,810 Recall that in swap.c a couple of weeks ago, we gave x a value of 1, 2619 02:07:10,810 --> 02:07:14,320 y a variable-- a value of 2, and then I printed out something 2620 02:07:14,320 --> 02:07:17,740 like "x is x, y is y." 2621 02:07:17,740 --> 02:07:20,680 But this week I'm using format strings just to print that out. 2622 02:07:20,680 --> 02:07:24,670 Then I did something like swap x, y, and I just kind of hoped for the best, 2623 02:07:24,670 --> 02:07:26,860 and then I printed out those values again. 2624 02:07:26,860 --> 02:07:30,100 Well it turns out in Python, because you don't have pointers 2625 02:07:30,100 --> 02:07:34,720 and you don't have addresses per se that you have access to, 2626 02:07:34,720 --> 02:07:37,330 you can't resort to the solution like last week 2627 02:07:37,330 --> 02:07:39,770 and pass these variables around by reference, 2628 02:07:39,770 --> 02:07:41,140 so to speak, by their address. 2629 02:07:41,140 --> 02:07:42,620 That's just not possible. 2630 02:07:42,620 --> 02:07:43,630 Why is that a thing? 
2631 02:07:43,630 --> 02:07:46,005 Well, it would seem to be taking a feature away from you, 2632 02:07:46,005 --> 02:07:49,240 but honestly, if this past week was any indication, including the week prior, 2633 02:07:49,240 --> 02:07:50,260 pointers are hard. 2634 02:07:50,260 --> 02:07:52,330 And segmentation faults are frequent. 2635 02:07:52,330 --> 02:07:55,600 And getting all of that stuff right is difficult. And at worst, 2636 02:07:55,600 --> 02:07:58,600 your programs can be compromised, because someone can access memory that 2637 02:07:58,600 --> 02:07:59,470 they shouldn't. 2638 02:07:59,470 --> 02:08:01,390 So Python takes that feature away. 2639 02:08:01,390 --> 02:08:04,510 Java also takes that feature away from programmers 2640 02:08:04,510 --> 02:08:08,320 to protect you against yourself from screwing up, like you may have 2641 02:08:08,320 --> 02:08:11,190 and should have in some number of times this past week. 2642 02:08:11,190 --> 02:08:13,940 But it turns out, in Python there are solutions to these problems. 2643 02:08:13,940 --> 02:08:16,420 And if you want to swap x and y, that's fine. 2644 02:08:16,420 --> 02:08:17,860 Swap x and y. 2645 02:08:17,860 --> 02:08:23,500 And so now if I run python of swap on this program, voila, boom, 2646 02:08:23,500 --> 02:08:25,160 it's distilled into one other line. 2647 02:08:25,160 --> 02:08:28,120 So even though they take something away from us that you can do a lot of damage 2648 02:08:28,120 --> 02:08:30,520 with or make a lot of mistakes with, we can nonetheless 2649 02:08:30,520 --> 02:08:35,620 hand you back a more powerful feature with this one liner for swap. 2650 02:08:35,620 --> 02:08:39,498 And notice that it's x comma y on the left, but y comma x on the right. 
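The one-line swap described above can be sketched as a complete swap.py. The exact wording of the printed messages is a guess at what the lecture shows on screen.

```python
x = 1
y = 2

print(f"x is {x}, y is {y}")

# The right-hand side (y, x) is evaluated first, then its two values
# are assigned to x and y on the left, so no temporary variable is
# written out explicitly.
x, y = y, x

print(f"x is {x}, y is {y}")
```

After the swap, x holds 2 and y holds 1.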
2651 02:08:39,498 --> 02:08:41,290 And that has the effect of doing what Brian 2652 02:08:41,290 --> 02:08:43,990 did with the glasses of liquid of doing the switcheroo, 2653 02:08:43,990 --> 02:08:47,830 even without a temporary variable explicitly there, 2654 02:08:47,830 --> 02:08:51,290 though some magic is happening underneath the hood. 2655 02:08:51,290 --> 02:08:54,970 Well, let's go ahead and implement a couple of final programs from week 4 2656 02:08:54,970 --> 02:08:58,240 and then introduce a few of our own here in week 6. 2657 02:08:58,240 --> 02:09:00,580 Let me go ahead and implement another phone book 2658 02:09:00,580 --> 02:09:02,680 that this one's a little more persistent. 2659 02:09:02,680 --> 02:09:09,940 Let me go ahead here and open-- create a file here called phonebook.csv. 2660 02:09:09,940 --> 02:09:12,640 And let me go ahead and name this name comma number. 2661 02:09:12,640 --> 02:09:15,520 So CSV file, recall, is like a very simple spreadsheet. 2662 02:09:15,520 --> 02:09:18,430 And we're going to go ahead and just create that so I have it nearby. 2663 02:09:18,430 --> 02:09:22,180 And then I'm going to create a new file called phonebook.py that's 2664 02:09:22,180 --> 02:09:23,500 initially empty. 2665 02:09:23,500 --> 02:09:25,150 And this time I'm going to do this. 2666 02:09:25,150 --> 02:09:30,040 I'm going to import from cs50 the get_string function as before. 2667 02:09:30,040 --> 02:09:33,670 But I'm also going to import a library called the CSV library. 2668 02:09:33,670 --> 02:09:37,690 It turns out, Python comes with a whole lot of functionality related to CSV 2669 02:09:37,690 --> 02:09:41,980 files to make your life easier and make it easier to do things with CSVs. 2670 02:09:41,980 --> 02:09:43,970 Among the things I might want to do is this. 2671 02:09:43,970 --> 02:09:48,820 Let me go ahead and open up that file, phonebook.csv, in append mode, 2672 02:09:48,820 --> 02:09:51,520 similar to fopen two weeks ago. 
2673 02:09:51,520 --> 02:09:56,050 And let me go ahead and assign that to a variable called file. 2674 02:09:56,050 --> 02:09:58,640 Then let me go ahead and just get a name from the user. 2675 02:09:58,640 --> 02:10:02,710 So let me use get_string to get someone's name, "Name" here. 2676 02:10:02,710 --> 02:10:04,900 Then let me go ahead and get-- use get_string again 2677 02:10:04,900 --> 02:10:07,810 to get someone's number here, so using "Number." 2678 02:10:07,810 --> 02:10:10,240 And then lastly-- and this is the new code-- 2679 02:10:10,240 --> 02:10:13,300 let me save that name and number to a file. 2680 02:10:13,300 --> 02:10:17,278 And recall from pset 4 that saving files and writing bytes out to files 2681 02:10:17,278 --> 02:10:18,070 is pretty involved. 2682 02:10:18,070 --> 02:10:21,250 Like, it probably took you a while to implement recover, or blur, 2683 02:10:21,250 --> 02:10:24,160 any of those filters that involved creating new files. 2684 02:10:24,160 --> 02:10:26,380 Turns out the CSV library makes this pretty easy. 2685 02:10:26,380 --> 02:10:29,230 Let me go ahead and give myself what's called a writer. 2686 02:10:29,230 --> 02:10:34,790 And I'm going to give myself the return value of calling csv.writer of file. 2687 02:10:34,790 --> 02:10:35,830 So what is this doing? 2688 02:10:35,830 --> 02:10:38,890 File, again, represents the file I'm trying to open. 2689 02:10:38,890 --> 02:10:42,490 csv.writer is some function that comes with the CSV library. 2690 02:10:42,490 --> 02:10:45,940 And it expects as input a file that you've already opened. 2691 02:10:45,940 --> 02:10:49,210 And it kind of wraps that file with some fancier functionality 2692 02:10:49,210 --> 02:10:52,000 that's going to make it way easier for me the programmer 2693 02:10:52,000 --> 02:10:53,650 to write to that file. 2694 02:10:53,650 --> 02:10:54,830 What am I going to do? 
2695 02:10:54,830 --> 02:10:59,230 I'm going to use that writer variable to write a row that specifically 2696 02:10:59,230 --> 02:11:00,690 contains a name and a number. 2697 02:11:00,690 --> 02:11:02,440 And I'm using a list, because if you think 2698 02:11:02,440 --> 02:11:06,010 of a spreadsheet with columns and rows, a list is kind of the right idea. 2699 02:11:06,010 --> 02:11:08,710 Each of the cells from left to right is kind of like a list. 2700 02:11:08,710 --> 02:11:10,100 A row is like a list. 2701 02:11:10,100 --> 02:11:11,990 So I'm going to deliberately use a list here. 2702 02:11:11,990 --> 02:11:15,800 And then lastly, I'm going to close the file, just as I've done in the past. 2703 02:11:15,800 --> 02:11:17,140 So it's a little cryptic here. 2704 02:11:17,140 --> 02:11:21,910 But again, get_string-- get_string is old now. 2705 02:11:21,910 --> 02:11:22,900 This is old now. 2706 02:11:22,900 --> 02:11:26,050 So the only things that are new are importing the CSV. 2707 02:11:26,050 --> 02:11:30,170 I'm opening this file in append mode, similar to what I did in C. 2708 02:11:30,170 --> 02:11:33,950 And then these lines here involve wrapping the file with the CSV 2709 02:11:33,950 --> 02:11:38,000 functionality, writing a row to this file with writerow, 2710 02:11:38,000 --> 02:11:39,260 and then closing it. 2711 02:11:39,260 --> 02:11:40,760 So let me go ahead and try this now. 2712 02:11:40,760 --> 02:11:43,880 Let me open up a phonebook.csv, which for now only 2713 02:11:43,880 --> 02:11:47,030 contains these headers which I created manually a moment ago. 2714 02:11:47,030 --> 02:11:50,480 And let me go ahead and run this, python of phonebook.py. 2715 02:11:50,480 --> 02:11:52,040 Let me go ahead and add Brian. 2716 02:11:52,040 --> 02:11:56,810 And Brian will be +1-617-495-1000, Enter. 2717 02:11:56,810 --> 02:12:00,420 And now let me go to my CSV file over here. 2718 02:12:00,420 --> 02:12:02,195 Dammit, I screwed up. 
2719 02:12:02,195 --> 02:12:03,570 Pretend I didn't hit Enter there. 2720 02:12:03,570 --> 02:12:04,630 Now it works. 2721 02:12:04,630 --> 02:12:06,715 Let me go ahead now and do this again by input-- 2722 02:12:06,715 --> 02:12:09,090 I should have hit Enter when I created the file manually, 2723 02:12:09,090 --> 02:12:10,210 but I screwed up on creating it. 2724 02:12:10,210 --> 02:12:13,252 So let me wave my hand at that and convince you that I did this correctly 2725 02:12:13,252 --> 02:12:21,270 in code by adding myself, David, +1-949-468-2750, Enter. 2726 02:12:21,270 --> 02:12:23,250 Let me go back to my CSV file. 2727 02:12:23,250 --> 02:12:26,370 And voila, now it's formatted correctly, because I did-- 2728 02:12:26,370 --> 02:12:28,930 writerow includes a line for me. 2729 02:12:28,930 --> 02:12:32,700 And notice, too, if I download this file-- let me download phonebook.csv 2730 02:12:32,700 --> 02:12:34,410 like I did in a past week. 2731 02:12:34,410 --> 02:12:36,390 Let me download this to my own Mac. 2732 02:12:36,390 --> 02:12:38,130 Let me open this CSV file. 2733 02:12:38,130 --> 02:12:40,808 And whether you have Apple Numbers installed or Microsoft Excel, 2734 02:12:40,808 --> 02:12:42,600 you'll open something that looks like this. 2735 02:12:42,600 --> 02:12:46,260 And voila, I've dynamically created, using Python code now, 2736 02:12:46,260 --> 02:12:49,067 my own sort of CSV file. 2737 02:12:49,067 --> 02:12:51,900 And it turns out there's a way to tighten this up just a little bit. 2738 02:12:51,900 --> 02:12:54,690 Not a big deal to do it the way I did, but it turns out 2739 02:12:54,690 --> 02:12:59,310 that you can also open and close files a little differently. 2740 02:12:59,310 --> 02:13:00,510 You can do this. 2741 02:13:00,510 --> 02:13:06,000 With file-- with, rather, with open as file. 2742 02:13:06,000 --> 02:13:09,810 Then I can indent all of this here, and I can get rid of my close line. 
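The phonebook program described above, in its "with" form, can be sketched roughly as follows. This is a minimal sketch, not the file typed in lecture: the lecture prompts with get_string from the CS50 library, whereas this version takes the name and number as arguments so that it depends only on Python's standard library. The function name add_entry is hypothetical, and newline="" follows the csv module's documentation rather than anything shown on screen.

```python
import csv


def add_entry(path, name, number):
    """Append one name/number row to the CSV file at path."""
    # "a" opens the file in append mode; the with block closes it
    # automatically, so no explicit file.close() is needed.
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)        # wrap the already-open file
        writer.writerow([name, number])  # a row is written from a list
```

Calling add_entry("phonebook.csv", "Brian", "+1-617-495-1000") would add one row below whatever the file already contains.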
2743 02:13:09,810 --> 02:13:12,780 So not a big deal to do it the way I did with open and close, 2744 02:13:12,780 --> 02:13:15,780 but the way I've done this here is a little more Pythonic. 2745 02:13:15,780 --> 02:13:19,260 This "with" keyword, which is not something analogous to anything 2746 02:13:19,260 --> 02:13:22,230 we've seen in C, the with keyword, when you open a file, 2747 02:13:22,230 --> 02:13:24,420 it will automatically close it for you eventually. 2748 02:13:24,420 --> 02:13:27,550 So you might see that in some online references or other materials. 2749 02:13:27,550 --> 02:13:31,535 But again, it just does that for you automatically. 2750 02:13:31,535 --> 02:13:32,910 Well, let's go ahead and do this. 2751 02:13:32,910 --> 02:13:35,640 I like the fact that we can now manipulate CSV. 2752 02:13:35,640 --> 02:13:38,010 And it turns out that if you've ever used Google Forms-- 2753 02:13:38,010 --> 02:13:41,230 that's a very popular way of collecting data from users. 2754 02:13:41,230 --> 02:13:44,670 In fact, let me go ahead and go to a URL which is going 2755 02:13:44,670 --> 02:13:47,160 to show you a form like this here. 2756 02:13:47,160 --> 02:13:51,000 Brian, if you wouldn't mind typing that into the chat, go to that URL, 2757 02:13:51,000 --> 02:13:53,458 cs50.ly.hogwarts. 2758 02:13:53,458 --> 02:13:55,500 And if everyone wouldn't mind just playing along, 2759 02:13:55,500 --> 02:13:59,160 just tell us what house you wish you were assigned to by the Sorting 2760 02:13:59,160 --> 02:14:02,550 Hat in the world of Hogwarts. 2761 02:14:02,550 --> 02:14:04,290 What house would you be in? 2762 02:14:04,290 --> 02:14:06,300 Now, if you've used Google Forms before, you'll 2763 02:14:06,300 --> 02:14:09,730 recall that you can see these results, certainly in the Google Form itself-- 2764 02:14:09,730 --> 02:14:12,185 and already 122 of you have buzzed in. 2765 02:14:12,185 --> 02:14:14,560 And we could see a distribution and a graph and so forth.
2766 02:14:14,560 --> 02:14:17,390 But what I want is not the distribution pictorially there. 2767 02:14:17,390 --> 02:14:19,390 I'm going to go ahead and open up a spreadsheet. 2768 02:14:19,390 --> 02:14:22,230 And so if you've never used Google Forms before, you can click a button, 2769 02:14:22,230 --> 02:14:24,450 and then you can get a list of all of the responses 2770 02:14:24,450 --> 02:14:26,160 that are coming in live right now. 2771 02:14:26,160 --> 02:14:28,560 And by default, Google keeps track of the timestamp, 2772 02:14:28,560 --> 02:14:32,500 when the form was submitted, and what house was actually used. 2773 02:14:32,500 --> 02:14:34,440 So I'm going to go ahead now and do this. 2774 02:14:34,440 --> 02:14:37,980 Let me go ahead and download that in another tab. 2775 02:14:37,980 --> 02:14:42,090 Give me just a moment to do it on this screen here. 2776 02:14:42,090 --> 02:14:48,270 I'm going to go ahead and download that CSV file onto my Mac 2777 02:14:48,270 --> 02:14:53,700 locally, by going to File, Download, CSV. 2778 02:14:53,700 --> 02:14:55,830 That's going to put it into my Downloads folder. 2779 02:14:55,830 --> 02:14:59,970 And then I'm going to go ahead and upload this into my IDE 2780 02:14:59,970 --> 02:15:01,530 by just dragging and dropping. 2781 02:15:01,530 --> 02:15:03,480 Whoops, I have to open the file browser. 2782 02:15:03,480 --> 02:15:07,700 I'm going to do this by dragging and dropping the file. 2783 02:15:07,700 --> 02:15:08,260 All right. 2784 02:15:08,260 --> 02:15:09,980 Now I have that file there. 2785 02:15:09,980 --> 02:15:13,270 And let me go ahead now and make sure the file's there. 2786 02:15:13,270 --> 02:15:15,280 I have this file called "Sorting Hat Responses-- 2787 02:15:15,280 --> 02:15:17,510 Form Responses 1," and so forth. 
2788 02:15:17,510 --> 02:15:20,602 Well, let me go ahead and write a program now that manipulates this data, 2789 02:15:20,602 --> 02:15:22,810 much like you might if running a student group that's 2790 02:15:22,810 --> 02:15:25,900 collecting data in a Google Form, or you're just collecting information 2791 02:15:25,900 --> 02:15:28,000 in general and have it in CSV format. 2792 02:15:28,000 --> 02:15:30,680 How might you now tally up all of the results, 2793 02:15:30,680 --> 02:15:32,560 especially if Google weren't just telling 2794 02:15:32,560 --> 02:15:34,480 you graphically what the results were? 2795 02:15:34,480 --> 02:15:36,940 Well, let me go ahead and write a program called hogwarts, 2796 02:15:36,940 --> 02:15:40,630 which was not something that we've seen ever before in C. Let me go ahead 2797 02:15:40,630 --> 02:15:42,580 and import this CSV library. 2798 02:15:42,580 --> 02:15:44,620 Let me give myself initially a dictionary 2799 02:15:44,620 --> 02:15:48,070 called houses that contains a whole bunch of keys, 2800 02:15:48,070 --> 02:15:52,540 like "Gryffindor" with initial count of 0; 2801 02:15:52,540 --> 02:15:59,500 "Hufflepuff" with an initial count of 0; "Ravenclaw" with an initial count of 0; 2802 02:15:59,500 --> 02:16:02,860 and also, "Slytherin" with an initial count of 0. 2803 02:16:02,860 --> 02:16:05,590 So notice, in a dictionary, or dict in Python, 2804 02:16:05,590 --> 02:16:09,100 the keys and values don't need to be strings and strings. 2805 02:16:09,100 --> 02:16:11,712 It can certainly be strings and numbers. 2806 02:16:11,712 --> 02:16:14,170 Because I'm going to use this dictionary ultimately to keep 2807 02:16:14,170 --> 02:16:17,918 count of all of the votes for one house or another. 2808 02:16:17,918 --> 02:16:19,210 So let me go ahead and do this. 
2809 02:16:19,210 --> 02:16:24,400 Let me go ahead and open up, with open, the "Sorting Hat File-- 2810 02:16:24,400 --> 02:16:30,280 Form Responses 1.csv"-- long filename, but that's the default from Google-- 2811 02:16:30,280 --> 02:16:31,130 as file. 2812 02:16:31,130 --> 02:16:34,299 So I'm going to use my one liner instead of having to open and close. 2813 02:16:34,299 --> 02:16:38,209 I'm going to give myself this time a reader, which we did not see before. 2814 02:16:38,209 --> 02:16:41,680 CSV library has a reader function that allows 2815 02:16:41,680 --> 02:16:44,570 me to read a CSV file automatically. 2816 02:16:44,570 --> 02:16:46,459 I'm going to go ahead and skip the first row. 2817 02:16:46,459 --> 02:16:48,760 Next is a function that just skips the first row, 2818 02:16:48,760 --> 02:16:53,290 because recall that that one is just timestamp and house, which 2819 02:16:53,290 --> 02:16:54,200 I do want to ignore. 2820 02:16:54,200 --> 02:16:55,690 I want the real data from you all. 2821 02:16:55,690 --> 02:16:58,450 And here's what's cool about CSVs in Python. 2822 02:16:58,450 --> 02:17:02,740 I can-- if I want to iterate over all of the rows that are in that spreadsheet, 2823 02:17:02,740 --> 02:17:05,080 I can do for row in reader. 2824 02:17:05,080 --> 02:17:12,980 And now, let me go ahead and get at, for instance, the house in question. 2825 02:17:12,980 --> 02:17:20,080 So the house in a given row is going to be the row's first entry, 0 indexed. 2826 02:17:20,080 --> 02:17:21,590 So what is going on here? 2827 02:17:21,590 --> 02:17:25,000 Well, let me go back to the Google spreadsheet a moment ago. 2828 02:17:25,000 --> 02:17:27,910 And in the Google spreadsheet, there's two columns. 2829 02:17:27,910 --> 02:17:32,570 And the way the CSV reader works is it returns to you one row at a time-- 2830 02:17:32,570 --> 02:17:34,639 and that's conceptually pretty straightforward.
2831 02:17:34,639 --> 02:17:37,180 It maps perfectly to the idea of a spreadsheet. 2832 02:17:37,180 --> 02:17:42,280 But each row is returned to you as a list, a list in this case of size 2. 2833 02:17:42,280 --> 02:17:46,299 So row bracket 0 would give me a given timestamp, row bracket 2834 02:17:46,299 --> 02:17:48,639 1 would give me a given house name. 2835 02:17:48,639 --> 02:17:52,870 So that's why here in the IDE, I'm going ahead and declaring a variable called 2836 02:17:52,870 --> 02:17:55,597 house, and I'm assigning it equal to row bracket 1, because I 2837 02:17:55,597 --> 02:17:56,889 don't care about the timestamp. 2838 02:17:56,889 --> 02:17:59,120 We all just did this roughly at the same time. 2839 02:17:59,120 --> 02:18:04,150 But now that I have the house, I can now index into the dictionary, just 2840 02:18:04,150 --> 02:18:08,719 like in C you could index into an array using a number. 2841 02:18:08,719 --> 02:18:10,990 But in a dictionary, I can use strings. 2842 02:18:10,990 --> 02:18:14,650 So I'm going to go ahead and say, go into the houses dictionary, which 2843 02:18:14,650 --> 02:18:19,389 I defined up above, and go to the house key, and go ahead 2844 02:18:19,389 --> 02:18:22,240 and increment it by 1. 2845 02:18:22,240 --> 02:18:23,230 And that's it. 2846 02:18:23,230 --> 02:18:28,030 At this point, I have opened the CSV file and read it using the library. 2847 02:18:28,030 --> 02:18:30,940 In this loop, I'm iterating over every row in the spreadsheet 2848 02:18:30,940 --> 02:18:34,000 that you all created by filling out that form again and again. 2849 02:18:34,000 --> 02:18:36,309 I'm just using a variable to get at whatever's 2850 02:18:36,309 --> 02:18:40,150 in the second column, otherwise known as row bracket 1, 2851 02:18:40,150 --> 02:18:42,250 because row bracket 0 would be the timestamp. 
2852 02:18:42,250 --> 02:18:44,230 And then I'm going into the dictionary called 2853 02:18:44,230 --> 02:18:46,570 houses, which we defined up here. 2854 02:18:46,570 --> 02:18:50,590 I'm indexing into it just like an array, but it's a dictionary in this case, 2855 02:18:50,590 --> 02:18:54,219 using its house name, which looks up the appropriate key. 2856 02:18:54,219 --> 02:18:59,270 And then plus equals 1 has the effect of incrementing its value. 2857 02:18:59,270 --> 02:19:02,170 So it's a nice way of going into the dictionary and incrementing. 2858 02:19:02,170 --> 02:19:03,478 Go in and increment. 2859 02:19:03,478 --> 02:19:06,520 So now let's go ahead at the very end here and just print out the result. 2860 02:19:06,520 --> 02:19:10,330 "For house in houses" is the fancy way to iterate over 2861 02:19:10,330 --> 02:19:14,230 all of the keys in a dictionary, go ahead and print out a formatted string 2862 02:19:14,230 --> 02:19:15,500 as follows. 2863 02:19:15,500 --> 02:19:21,160 Let me print out the house name followed by a colon followed by the houses 2864 02:19:21,160 --> 02:19:24,309 dictionary, indexing into it with house. 2865 02:19:24,309 --> 02:19:25,090 So again, cryptic. 2866 02:19:25,090 --> 02:19:26,590 We'll come back to this in a second. 2867 02:19:26,590 --> 02:19:27,820 Python of hogwarts. 2868 02:19:27,820 --> 02:19:31,870 Let me cross my fingers that I didn't screw this up. 2869 02:19:31,870 --> 02:19:33,670 And I did. 2870 02:19:33,670 --> 02:19:35,209 The IDE knew before I did. 2871 02:19:35,209 --> 02:19:35,709 All right. 2872 02:19:35,709 --> 02:19:38,270 Now let me hope that I didn't screw this up-- and dammit. 2873 02:19:38,270 --> 02:19:38,770 All right. 2874 02:19:38,770 --> 02:19:41,830 The file is called something slightly different. 2875 02:19:41,830 --> 02:19:46,840 Google's name must have changed, sorry, versus when I practiced. 2876 02:19:46,840 --> 02:19:49,310 Let me copy this. 2877 02:19:49,310 --> 02:19:50,590 So close.
2878 02:19:50,590 --> 02:19:52,180 Sorting hat responses. 2879 02:19:52,180 --> 02:19:54,440 Ah, it has parentheses which I forgot. 2880 02:19:54,440 --> 02:19:54,940 All right. 2881 02:19:54,940 --> 02:19:58,600 Now let me cross my fingers, rerun the program, dammit. 2882 02:19:58,600 --> 02:20:00,070 OK, no such file or-- 2883 02:20:00,070 --> 02:20:02,890 oh, I forgot the csv, dot csv. 2884 02:20:02,890 --> 02:20:03,460 OK. 2885 02:20:03,460 --> 02:20:05,260 Now cross fingers and-- 2886 02:20:05,260 --> 02:20:06,040 oh, thank God. 2887 02:20:06,040 --> 02:20:11,440 OK so Gryffindor, not surprisingly, the most popular house. 2888 02:20:11,440 --> 02:20:15,910 Hufflepuff at 40, Ravenclaw at 71, Slytherin-- oh, beat out Hufflepuff. 2889 02:20:15,910 --> 02:20:18,880 Very interesting for whatever sociological reason. 2890 02:20:18,880 --> 02:20:21,740 But here we have a program now that analyzed the CSV. 2891 02:20:21,740 --> 02:20:24,380 Now, we happened to do it with silly Harry Potter data. 2892 02:20:24,380 --> 02:20:26,890 But again, imagine collecting any data you want from users, 2893 02:20:26,890 --> 02:20:30,700 downloading it as a CSV to your Mac or PC or your IDE, 2894 02:20:30,700 --> 02:20:34,220 then writing code that analyzes that data however you want. 2895 02:20:34,220 --> 02:20:36,790 I did a very simple summation, but you could certainly 2896 02:20:36,790 --> 02:20:39,295 imagine doing something fancier than that, 2897 02:20:39,295 --> 02:20:42,640 like doing summations or averages, standard deviations. 2898 02:20:42,640 --> 02:20:46,570 All of that functionality could we get as well. 2899 02:20:46,570 --> 02:20:50,230 All right, any questions on dictionaries before we now 2900 02:20:50,230 --> 02:20:53,710 offer up some of the most powerful features we've yet 2901 02:20:53,710 --> 02:20:55,630 seen in a programming language? 2902 02:20:55,630 --> 02:20:58,380 2903 02:20:58,380 --> 02:21:01,600 Anything at all on your end, Brian? 
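The Hogwarts tally walked through above can be sketched as follows. This is a hedged reconstruction from the spoken description, not the exact file from lecture: the functions tally and report are hypothetical names, the path is passed in because the real Google filename is not spelled out in full, and newline="" follows the csv module's documentation.

```python
import csv


def tally(path):
    """Count votes per house in a two-column (timestamp, house) CSV."""
    # Keys are strings, values are integer counts that start at 0.
    houses = {
        "Gryffindor": 0,
        "Hufflepuff": 0,
        "Ravenclaw": 0,
        "Slytherin": 0,
    }
    with open(path, newline="") as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row (timestamp, house)
        for row in reader:
            house = row[1]      # row[0] is the timestamp
            houses[house] += 1  # KeyError if the house is not listed above
    return houses


def report(houses):
    """Build one "House: count" string per key in the dictionary."""
    return [f"{house}: {houses[house]}" for house in houses]
```

Printing each string from report(tally(path)) gives the kind of per-house summary shown in the demo.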
2904 02:21:01,600 --> 02:21:02,820 BRIAN: No hands raised here. 2905 02:21:02,820 --> 02:21:03,778 DAVID MALAN: All right. 2906 02:21:03,778 --> 02:21:06,690 Well, let me go ahead now, and I'm going to transition actually 2907 02:21:06,690 --> 02:21:10,080 to my Mac where I have in advance pre-installed Python, 2908 02:21:10,080 --> 02:21:11,870 just so that I can do things locally. 2909 02:21:11,870 --> 02:21:13,370 It will make things a little faster. 2910 02:21:13,370 --> 02:21:15,030 I don't have to worry about internet speeds and the like. 2911 02:21:15,030 --> 02:21:18,090 And this is indeed the case, that on your own Mac, your own PC, 2912 02:21:18,090 --> 02:21:21,210 you can download and install the Python interpreter, 2913 02:21:21,210 --> 02:21:23,070 run it on your own Mac and PC. 2914 02:21:23,070 --> 02:21:25,740 However, I would recommend you continue using this IDE, 2915 02:21:25,740 --> 02:21:28,590 certainly for problem sets' sake until the end of the semester, 2916 02:21:28,590 --> 02:21:31,350 maybe transitioning to your Mac or PC for final projects 2917 02:21:31,350 --> 02:21:34,440 only, only because what I did this weekend was spent-- 2918 02:21:34,440 --> 02:21:37,500 waste a huge amount of time just getting stupid libraries to work 2919 02:21:37,500 --> 02:21:40,253 on my own Mac, which is often easier said than done, 2920 02:21:40,253 --> 02:21:43,170 just because when programmers are writing code that's supposed to work 2921 02:21:43,170 --> 02:21:46,980 on every possible Mac and PC in the world, you and I and everyone else 2922 02:21:46,980 --> 02:21:49,830 have slightly different version numbers, different software install, 2923 02:21:49,830 --> 02:21:51,270 different incompatibilities. 2924 02:21:51,270 --> 02:21:55,140 So those kinds of headaches very quickly arise when you're doing things locally. 
2925 02:21:55,140 --> 02:21:58,210 So let me encourage you to wait until term's end with final projects, 2926 02:21:58,210 --> 02:22:00,960 perhaps, to move off of the IDE and do what 2927 02:22:00,960 --> 02:22:05,640 I'm about to now do, just because you'll be able to see these demos more 2928 02:22:05,640 --> 02:22:06,810 clearly here. 2929 02:22:06,810 --> 02:22:08,970 I'm going to go ahead, and on my own Mac, 2930 02:22:08,970 --> 02:22:12,540 I'm going to go ahead and create a program called speech.py. 2931 02:22:12,540 --> 02:22:16,770 In advance, I've installed a library that supports speech synthesis. 2932 02:22:16,770 --> 02:22:18,810 And if I want access to that functionality, 2933 02:22:18,810 --> 02:22:24,720 it suffices to import pyttsx3, which is the name of that person's open source 2934 02:22:24,720 --> 02:22:28,260 free library that I downloaded and installed on my Mac in advance. 2935 02:22:28,260 --> 02:22:29,400 I read the documentation. 2936 02:22:29,400 --> 02:22:31,980 I literally never used this before this past week. 2937 02:22:31,980 --> 02:22:36,270 And I found that I can declare a variable called engine, for instance. 2938 02:22:36,270 --> 02:22:41,830 I can then call pyttsx3.init to initialize the library. 2939 02:22:41,830 --> 02:22:42,330 Why? 2940 02:22:42,330 --> 02:22:44,370 That's just because of how the programmer designed it. 2941 02:22:44,370 --> 02:22:45,720 You have to initialize it first. 2942 02:22:45,720 --> 02:22:50,610 I then can use that engine to say things like, say, "hello," comma "world." 2943 02:22:50,610 --> 02:22:54,180 Then after that, I should run the engine and wait for it 2944 02:22:54,180 --> 02:22:56,800 to finish before my own program quits. 2945 02:22:56,800 --> 02:22:57,300 All right. 2946 02:22:57,300 --> 02:23:03,420 Let me go ahead now and close that, and run python of speech.py on my own Mac 2947 02:23:03,420 --> 02:23:04,950 here. 2948 02:23:04,950 --> 02:23:06,678 COMPUTER VOICE: Hello, world.
2949 02:23:06,678 --> 02:23:07,720 DAVID MALAN: Interesting. 2950 02:23:07,720 --> 02:23:09,480 So it said what I typed in. 2951 02:23:09,480 --> 02:23:13,050 And indeed, I can probably make this even more interesting. 2952 02:23:13,050 --> 02:23:15,370 Let me go ahead and say something like this. 2953 02:23:15,370 --> 02:23:19,170 Let me open up speech.py again and add some functionality. 2954 02:23:19,170 --> 02:23:23,910 I won't use the CS50 library, but I will use maybe the input function. 2955 02:23:23,910 --> 02:23:30,270 Let me go ahead and say name gets input, "What's your name," question mark. 2956 02:23:30,270 --> 02:23:33,090 And then let me go ahead and say, not "hello, world," 2957 02:23:33,090 --> 02:23:34,680 but let me use an f-string-- 2958 02:23:34,680 --> 02:23:36,570 which doesn't have to be used in print, you 2959 02:23:36,570 --> 02:23:39,220 can use it in any function that takes a string. 2960 02:23:39,220 --> 02:23:42,160 Let me go ahead and say "hello" to that name. 2961 02:23:42,160 --> 02:23:42,660 All right. 2962 02:23:42,660 --> 02:23:45,550 Let me go ahead and run python speech.py again. 2963 02:23:45,550 --> 02:23:46,320 Oops. 2964 02:23:46,320 --> 02:23:50,930 Let me go ahead and run python of speech.py again. 2965 02:23:50,930 --> 02:23:51,620 What's my name? 2966 02:23:51,620 --> 02:23:52,350 David. 2967 02:23:52,350 --> 02:23:54,590 COMPUTER VOICE: Hello, David. 2968 02:23:54,590 --> 02:23:57,860 DAVID MALAN: Weird choice of inflection, but indeed it synthesized it. 2969 02:23:57,860 --> 02:23:58,790 Let's try Brian. 2970 02:23:58,790 --> 02:24:00,253 COMPUTER VOICE: Hello, Brian. 2971 02:24:00,253 --> 02:24:00,920 DAVID MALAN: OK. 2972 02:24:00,920 --> 02:24:02,795 So we could probably tinker with the settings 2973 02:24:02,795 --> 02:24:04,800 to make the voice sound a little more natural. 2974 02:24:04,800 --> 02:24:05,960 But that's pretty cool. 
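The two versions of speech.py described above can be sketched together like this. It is a sketch under assumptions: pyttsx3 is a third-party library that has to be installed first, the function names greeting and speak are hypothetical, and the import sits inside speak only so that greeting can be tried on a machine without the library.

```python
def greeting(name):
    """Build the phrase to speak; an f-string works anywhere a str is expected."""
    return f"hello, {name}"


def speak(text):
    """Speak text aloud with pyttsx3 (third-party; must be installed)."""
    import pyttsx3  # imported here so greeting() works without the library

    engine = pyttsx3.init()  # initialize the speech engine first
    engine.say(text)         # queue the phrase to be spoken
    engine.runAndWait()      # block until speaking finishes
```

With the library installed, speak(greeting(input("What's your name? "))) would correspond to the second demo.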
2975 02:24:05,960 --> 02:24:08,750 Well, let me go into some code I wrote in advance this time using 2976 02:24:08,750 --> 02:24:13,430 a different library, this one related to faces and facial detection. 2977 02:24:13,430 --> 02:24:16,130 Certainly very much in vogue when it comes to social media 2978 02:24:16,130 --> 02:24:19,640 these days, with Facebook and other websites automatically tagging you, 2979 02:24:19,640 --> 02:24:23,150 very concerning increasingly with state governments and federal governments 2980 02:24:23,150 --> 02:24:26,540 and law enforcement using facial detection to find people in a crowd. 2981 02:24:26,540 --> 02:24:29,180 And let me go ahead and open up a file here, for instance, 2982 02:24:29,180 --> 02:24:32,550 a little more benignly, like a whole bunch of people in an office. 2983 02:24:32,550 --> 02:24:34,745 So here is a photograph of some people in an office. 2984 02:24:34,745 --> 02:24:36,120 And there's a lot of faces there. 2985 02:24:36,120 --> 02:24:42,020 But there's a lot of boxes of paper and other distractions besides those faces. 2986 02:24:42,020 --> 02:24:46,400 But let me go ahead and look at, quickly, a program called detect.py. 2987 02:24:46,400 --> 02:24:49,100 Most of this file is comments, just so that if you want at home 2988 02:24:49,100 --> 02:24:50,930 you can follow along and see what it does. 2989 02:24:50,930 --> 02:24:53,500 But let me just highlight a few salient lines. 2990 02:24:53,500 --> 02:24:55,760 Here is that Pillow library again, where I'm 2991 02:24:55,760 --> 02:24:59,660 accessing image related functionality from a pre-installed Python function. 2992 02:24:59,660 --> 02:25:01,160 And this one's just kind of amazing. 2993 02:25:01,160 --> 02:25:03,470 If you want to use facial recognition technology, 2994 02:25:03,470 --> 02:25:05,360 just import face_recognition. 
2995 02:25:05,360 --> 02:25:08,330 That is a library you can import that will give you access 2996 02:25:08,330 --> 02:25:09,920 to that kind of power. 2997 02:25:09,920 --> 02:25:13,790 Down here now, I only knew how to figure this out by reading some documentation, 2998 02:25:13,790 --> 02:25:17,897 but you access the library called face_recognition.load_image_file, 2999 02:25:17,897 --> 02:25:19,730 which is a function that does what it means. 3000 02:25:19,730 --> 02:25:21,590 I'm opening up office.jpg. 3001 02:25:21,590 --> 02:25:25,340 And then scrolling down here to the white code, which is the actual code-- 3002 02:25:25,340 --> 02:25:28,280 all of the blue is comments, recall-- 3003 02:25:28,280 --> 02:25:32,690 this line of code here is all that's required in Python 3004 02:25:32,690 --> 02:25:36,050 to use the face recognition library, find all of the face locations 3005 02:25:36,050 --> 02:25:40,820 in a given image, and store them in a list called face_locations. 3006 02:25:40,820 --> 02:25:43,310 This line of code here is just a Python loop 3007 02:25:43,310 --> 02:25:47,180 that iterates over every face in the faces that were detected. 3008 02:25:47,180 --> 02:25:50,180 And then these several lines of code here, long story short, 3009 02:25:50,180 --> 02:25:54,050 just crop out individual faces and create a new image with the found 3010 02:25:54,050 --> 02:25:55,107 faces. 3011 02:25:55,107 --> 02:25:58,190 So without getting too much into the details of the library, which are not 3012 02:25:58,190 --> 02:26:01,550 that intellectually interesting, the features are interesting to us for now, 3013 02:26:01,550 --> 02:26:04,100 let me run python of detect.py. 3014 02:26:04,100 --> 02:26:06,890 Let me give my Mac a few seconds here to do its thing. 
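The steps just described for detect.py (load the image, find the face locations, loop over them, crop each one) might look roughly like the sketch below. This is not the lecture's file. It assumes the third-party face_recognition and Pillow libraries are installed, it returns a list of cropped images rather than assembling one new image, and the names to_pil_box and crop_faces are hypothetical.

```python
def to_pil_box(location):
    """Convert (top, right, bottom, left) to Pillow's (left, upper, right, lower)."""
    top, right, bottom, left = location
    return (left, top, right, bottom)


def crop_faces(path):
    """Return one cropped Pillow image per face found in the file at path."""
    # Third-party libraries, imported here so to_pil_box() works without them.
    import face_recognition
    from PIL import Image

    image = face_recognition.load_image_file(path)           # pixels as an array
    face_locations = face_recognition.face_locations(image)  # one tuple per face
    picture = Image.fromarray(image)
    return [picture.crop(to_pil_box(loc)) for loc in face_locations]
```

The small conversion helper exists because face_recognition reports each face as (top, right, bottom, left), while Pillow's crop expects (left, upper, right, lower).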
3015 02:26:06,890 --> 02:26:11,810 And voila, if I zoom in here we see Phyllis, and Jim, 3016 02:26:11,810 --> 02:26:15,470 and Roy, and pretty much every other face that 3017 02:26:15,470 --> 02:26:19,982 was detected in that photograph, cropped out as, indeed, an individual face. 3018 02:26:19,982 --> 02:26:22,190 So if you've ever noticed a little square on yourself 3019 02:26:22,190 --> 02:26:25,880 in Facebook when uploading a photo, this is exactly the kind of code 3020 02:26:25,880 --> 02:26:30,380 that Facebook and others are using on their servers in order to execute that. 3021 02:26:30,380 --> 02:26:31,970 Well, you know what, how about this? 3022 02:26:31,970 --> 02:26:35,300 In the same office photo, you know, there's 3023 02:26:35,300 --> 02:26:37,120 one person that always seems to stand out. 3024 02:26:37,120 --> 02:26:38,120 No one really likes him. 3025 02:26:38,120 --> 02:26:39,050 And that's Toby. 3026 02:26:39,050 --> 02:26:43,370 What if we had a mug shot of Toby in a separate file like this? 3027 02:26:43,370 --> 02:26:47,390 Can we find Toby in a crowd among these people in the office? 3028 02:26:47,390 --> 02:26:48,140 Well, we can. 3029 02:26:48,140 --> 02:26:50,985 Let me go ahead now and run a program called recognize.py, 3030 02:26:50,985 --> 02:26:52,610 which you're welcome to look at online. 3031 02:26:52,610 --> 02:26:54,830 It's similar lines of code, It's not terribly many, 3032 02:26:54,830 --> 02:26:58,040 that is going to do some thinking. 3033 02:26:58,040 --> 02:27:00,540 It's opening up both the office JPEG and this one. 3034 02:27:00,540 --> 02:27:02,780 And notice what just happened, if I zoom in, 3035 02:27:02,780 --> 02:27:09,200 wonderfully, Toby is the only one with a big green box around his face, 3036 02:27:09,200 --> 02:27:10,965 having indeed been recognized. 3037 02:27:10,965 --> 02:27:12,590 So again, I'll just glance at the code. 
3038 02:27:12,590 --> 02:27:16,700 This time, if I open up recognize.py, it's a few more lines of code. 3039 02:27:16,700 --> 02:27:19,610 But again, I'm importing face recognition and some other things. 3040 02:27:19,610 --> 02:27:21,260 I'm loading toby.jpg. 3041 02:27:21,260 --> 02:27:23,278 And I'm loading office.jpg. 3042 02:27:23,278 --> 02:27:25,820 And then there's some more code here that's looking for Toby, 3043 02:27:25,820 --> 02:27:29,990 looking for Toby, and then drawing a big green box around the face that 3044 02:27:29,990 --> 02:27:31,260 is ultimately found. 3045 02:27:31,260 --> 02:27:33,380 So again, at the end of the day, it's just loops. 3046 02:27:33,380 --> 02:27:34,500 It's just functions. 3047 02:27:34,500 --> 02:27:35,630 It's just variables. 3048 02:27:35,630 --> 02:27:39,440 But now the functions are pretty darn fancy and powerful, 3049 02:27:39,440 --> 02:27:43,070 because again, they're taking advantage of all of these other features 3050 02:27:43,070 --> 02:27:46,430 that we ourselves have implemented in a language like C, 3051 02:27:46,430 --> 02:27:50,435 or have now seen glimpses of within the world of Python. 3052 02:27:50,435 --> 02:27:51,560 Well, let's do another one. 3053 02:27:51,560 --> 02:27:57,830 Let me go ahead and open up real quickly a program 3054 02:27:57,830 --> 02:28:02,630 that will allow me to create one of these 2D barcodes, a so-called QR code. 3055 02:28:02,630 --> 02:28:07,220 Let me go ahead and create a file called qr.py And in this file, let me go ahead 3056 02:28:07,220 --> 02:28:08,360 and do this. 3057 02:28:08,360 --> 02:28:10,970 Import the operating system library, for reasons 3058 02:28:10,970 --> 02:28:14,750 we'll soon see, and let me import the QR code library, which 3059 02:28:14,750 --> 02:28:17,000 will do all of the hard work for me. 
3060 02:28:17,000 --> 02:28:19,520 Let me go ahead and create an image called qr-- 3061 02:28:19,520 --> 02:28:22,280 that's assigned the value of qrcode.make. 3062 02:28:22,280 --> 02:28:25,680 And let me paste in this URL of one of the course's lecture videos, 3063 02:28:25,680 --> 02:28:26,510 for instance. 3064 02:28:26,510 --> 02:28:31,700 And then let me go ahead and save this image as qr.png, Portable Network 3065 02:28:31,700 --> 02:28:34,910 Graphic, as indeed a PNG, a very popular file format 3066 02:28:34,910 --> 02:28:36,480 for photos and other things. 3067 02:28:36,480 --> 02:28:38,330 And then let me actually open this thing up. 3068 02:28:38,330 --> 02:28:41,720 Open up system-- actually, nope, that's fine. 3069 02:28:41,720 --> 02:28:42,840 Let me keep it simple. 3070 02:28:42,840 --> 02:28:44,210 We don't need the os library. 3071 02:28:44,210 --> 02:28:45,080 Nope, we do. 3072 02:28:45,080 --> 02:28:49,250 Let's go ahead and open it up with "open qr.png." 3073 02:28:49,250 --> 02:28:50,970 So three lines of code-- 3074 02:28:50,970 --> 02:28:56,220 make a QR code with that URL, save it as qr.png, and open the file. 3075 02:28:56,220 --> 02:28:57,730 Three lines of code. 3076 02:28:57,730 --> 02:29:00,750 Let me go ahead and run python of qr.py. 3077 02:29:00,750 --> 02:29:02,280 Voila, it was pretty fast. 3078 02:29:02,280 --> 02:29:05,670 If you would like to take out your own iPhone or Android phone, 3079 02:29:05,670 --> 02:29:08,190 turn on the camera if your phone supports this, 3080 02:29:08,190 --> 02:29:13,800 and scan this 2D barcode by awkwardly just pointing your phone at the lecture 3081 02:29:13,800 --> 02:29:23,040 as we speak, it should open up YouTube for you, hopefully, and with such is-- 3082 02:29:23,040 --> 02:29:26,730 I apologize to those-- yes, thank you for showing me what you're not seeing. 3083 02:29:26,730 --> 02:29:28,740 I apologize for doing that yet again. 3084 02:29:28,740 --> 02:29:29,670 Never gets old.
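The three steps of qr.py (make the code, save it, open the file) can be sketched as follows. The sketch assumes the third-party qrcode library is installed. The function names make_qr and show are hypothetical, the URL in the usage note is a placeholder because the lecture's actual URL is not given in the transcript, and the "open" command is specific to macOS.

```python
import os


def make_qr(url, path="qr.png"):
    """Encode url as a QR code and save the image at path."""
    import qrcode  # third-party; imported here so this file loads without it

    img = qrcode.make(url)  # build the QR code image from the text
    img.save(path)          # write the image to disk
    return path


def show(path):
    """Open the saved image with the macOS "open" command."""
    os.system(f"open {path}")
```

With the library installed on a Mac, show(make_qr("https://www.example.com/")) would generate and display a code for that placeholder address.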
3085 02:29:29,670 --> 02:29:33,330 But all we've done is embed in a two-dimensional format, details 3086 02:29:33,330 --> 02:29:35,940 of which we won't go into in class, a URL, 3087 02:29:35,940 --> 02:29:39,373 which suggests that you can store anything inside of these 2D barcodes, 3088 02:29:39,373 --> 02:29:41,790 and if you scan them with something like your camera, 3089 02:29:41,790 --> 02:29:45,870 the software running on your phones these days can decode these things for you. 3090 02:29:45,870 --> 02:29:49,590 Well, let me do something else, this time involving another sense, this one 3091 02:29:49,590 --> 02:29:50,340 listening. 3092 02:29:50,340 --> 02:29:53,428 Let me go into a file called listen.py. 3093 02:29:53,428 --> 02:29:55,470 And let me go ahead and do something very simple. 3094 02:29:55,470 --> 02:29:59,310 Let me go ahead and get a user's input in a variable called words 3095 02:29:59,310 --> 02:30:01,290 by using the input function. 3096 02:30:01,290 --> 02:30:02,670 Say something. 3097 02:30:02,670 --> 02:30:06,690 And then let me just send it all to lowercase, just to keep things simple. 3098 02:30:06,690 --> 02:30:07,980 And now let me do this. 3099 02:30:07,980 --> 02:30:10,290 Once I get the user's words, let me go ahead and say, 3100 02:30:10,290 --> 02:30:17,730 if the word "hello" is in their words, go ahead and print out "Hello to you 3101 02:30:17,730 --> 02:30:18,370 too!" 3102 02:30:18,370 --> 02:30:20,640 So if they say hello, I want to say hello back. 3103 02:30:20,640 --> 02:30:27,840 Elif, "how are you" in words, then let me go ahead and print out something 3104 02:30:27,840 --> 02:30:31,290 like, "I am well, thanks," as the computer. 3105 02:30:31,290 --> 02:30:37,590 Elif "goodbye" in words, then let me go ahead and say something reasonable 3106 02:30:37,590 --> 02:30:42,350 like "Goodbye to you too."
3107 02:30:42,350 --> 02:30:45,530 And then lastly, else let me go ahead and print out 3108 02:30:45,530 --> 02:30:46,820 just something like "Huh?" 3109 02:30:46,820 --> 02:30:48,020 Unrecognized. 3110 02:30:48,020 --> 02:30:52,580 So if you will, here is the beginnings of an artificial intelligence, an AI-- 3111 02:30:52,580 --> 02:30:55,940 a program that's going to somehow interact with me the human typing 3112 02:30:55,940 --> 02:30:57,390 in phrases to this thing. 3113 02:30:57,390 --> 02:31:00,830 So if I did it correctly, let me go ahead and run python of listen.py. 3114 02:31:00,830 --> 02:31:04,130 I did not do something correctly. 3115 02:31:04,130 --> 02:31:06,390 Oh, not "is," "in." 3116 02:31:06,390 --> 02:31:08,090 OK, sorry. 3117 02:31:08,090 --> 02:31:10,160 Let me go ahead and run python of listen.py. 3118 02:31:10,160 --> 02:31:10,910 Say something. 3119 02:31:10,910 --> 02:31:12,020 I'll say "hello." 3120 02:31:12,020 --> 02:31:13,100 Oh, Hello to you too. 3121 02:31:13,100 --> 02:31:14,900 What a nice friendly program. 3122 02:31:14,900 --> 02:31:18,230 Let me ask it how it is, "how are you," question mark. 3123 02:31:18,230 --> 02:31:20,100 It seems to detect that. 3124 02:31:20,100 --> 02:31:23,840 Let me go ahead and say, "ok goodbye for now." 3125 02:31:23,840 --> 02:31:27,860 And it detects that, too, because "goodbye" is in the phrase 3126 02:31:27,860 --> 02:31:29,130 that the user typed in. 3127 02:31:29,130 --> 02:31:32,880 But if I say something like, "hey there," it's not recognized. 3128 02:31:32,880 --> 02:31:33,750 So pretty cool. 3129 02:31:33,750 --> 02:31:37,400 We can use very simple string comparisons using the in preposition 3130 02:31:37,400 --> 02:31:38,390 to detect things. 3131 02:31:38,390 --> 02:31:40,880 But I bet-- you know, I bet if we use the right library, 3132 02:31:40,880 --> 02:31:43,580 we can really make this more powerful, too. 
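The typed version of listen.py walked through above can be sketched as follows. This is an illustration rather than the exact file from lecture: the replies are wrapped in a hypothetical function named respond, and a fixed list of phrases stands in for the call to input so the sketch runs without anyone typing.

```python
def respond(words):
    """Return a canned reply chosen by substring matches in words."""
    words = words.lower()  # normalize case, as the lecture does after input()
    if "hello" in words:
        return "Hello to you too!"
    elif "how are you" in words:
        return "I am well, thanks!"
    elif "goodbye" in words:
        return "Goodbye to you too!"
    else:
        return "Huh?"


# The phrases typed during the demo; in lecture they come from
# input("Say something: ") instead of a fixed list.
for phrase in ["hello", "how are you?", "ok goodbye for now", "hey there"]:
    print(f"{phrase!r} -> {respond(phrase)}")
```

Because in does substring matching on strings, "ok goodbye for now" still matches the "goodbye" branch, while "hey there" falls through to the else.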
3133 02:31:43,580 --> 02:31:46,940 Let me go ahead, and just like I imported facial recognition, 3134 02:31:46,940 --> 02:31:51,710 let me import speech recognition in Python, which is yet another library 3135 02:31:51,710 --> 02:31:53,240 that I pre-installed. 3136 02:31:53,240 --> 02:31:56,330 Let me go ahead and now do this, recognizer equals 3137 02:31:56,330 --> 02:31:58,320 speech_recognition.Recognizer. 3138 02:31:58,320 --> 02:32:01,700 3139 02:32:01,700 --> 02:32:04,610 And this is just creating a variable called recognizer 3140 02:32:04,610 --> 02:32:08,420 by my having followed literally the documentation for using this library. 3141 02:32:08,420 --> 02:32:11,030 Then let me go ahead and do this, also from the documentation, 3142 02:32:11,030 --> 02:32:19,640 with speech_recognition.Microphone as source. 3143 02:32:19,640 --> 02:32:22,500 So this is opening up my microphone in some sense, 3144 02:32:22,500 --> 02:32:24,290 again just following the documentation. 3145 02:32:24,290 --> 02:32:28,300 Let me go ahead and say "Say something" to the user. 3146 02:32:28,300 --> 02:32:30,760 And then after that, let me go ahead and declare 3147 02:32:30,760 --> 02:32:36,700 a variable called audio, set it equal to the recognizer's listen function, 3148 02:32:36,700 --> 02:32:39,310 passing in my microphone as the source. 3149 02:32:39,310 --> 02:32:44,500 And now down here, let me go ahead and say print out "You said," 3150 02:32:44,500 --> 02:32:51,532 and below that I will print out recognizer.recognize, 3151 02:32:51,532 --> 02:32:55,670 is the hardest part today so far for some reason, google audio. 3152 02:32:55,670 --> 02:32:56,170 All right. 3153 02:32:56,170 --> 02:32:57,610 So what's going on? 3154 02:32:57,610 --> 02:33:00,040 This line of code-- these lines of code here 3155 02:33:00,040 --> 02:33:03,010 are opening up a connection to my microphone on my Mac. 
3156 02:33:03,010 --> 02:33:07,030 It's then using the speech recognition library to listen to my microphone, 3157 02:33:07,030 --> 02:33:11,680 and storing the audio from my microphone in a variable called audio. 3158 02:33:11,680 --> 02:33:14,230 These lines of code down here are literally printing, 3159 02:33:14,230 --> 02:33:20,350 "You said," and then it's passing to the, the google.com, the file of audio 3160 02:33:20,350 --> 02:33:23,560 that I just recorded on my microphone, and it's printing out 3161 02:33:23,560 --> 02:33:25,147 whatever comes back from Google. 3162 02:33:25,147 --> 02:33:26,980 So let's see what comes out, again, crossing 3163 02:33:26,980 --> 02:33:28,810 my fingers that I didn't mess up. 3164 02:33:28,810 --> 02:33:32,020 Python of listen. 3165 02:33:32,020 --> 02:33:32,725 Hello, world. 3166 02:33:32,725 --> 02:33:35,680 3167 02:33:35,680 --> 02:33:37,940 Hoo. 3168 02:33:37,940 --> 02:33:41,320 How are you? 3169 02:33:41,320 --> 02:33:42,933 It's a pretty good speech recognition. 3170 02:33:42,933 --> 02:33:44,350 It's using the cloud, so to speak. 3171 02:33:44,350 --> 02:33:45,558 It's passing it up to Google. 3172 02:33:45,558 --> 02:33:47,350 But now let's make things a little fancier 3173 02:33:47,350 --> 02:33:49,060 and actually respond to the human. 3174 02:33:49,060 --> 02:33:52,360 So let me go back into here and add back some of the previous logic 3175 02:33:52,360 --> 02:33:53,620 and say something like this. 3176 02:33:53,620 --> 02:33:58,330 If "hello" in words, then go ahead and print out, like before, 3177 02:33:58,330 --> 02:34:00,220 "Hello to you too." 3178 02:34:00,220 --> 02:34:05,650 Elif "how are you" in the words that have come back from Google, 3179 02:34:05,650 --> 02:34:08,800 go ahead and print out "I am well, thanks!" 3180 02:34:08,800 --> 02:34:13,480 And down here if I said "goodbye" in words, 3181 02:34:13,480 --> 02:34:20,050 then go ahead and print out "Goodbye to you too!" 
3182 02:34:20,050 --> 02:34:25,210 Else if nothing comes back that I recognize, let's just print out "Huh?" 3183 02:34:25,210 --> 02:34:30,160 So if I did this right, let's now go ahead and let's do python of listen.py. 3184 02:34:30,160 --> 02:34:32,560 Hello, there. 3185 02:34:32,560 --> 02:34:33,490 Oh, dammit. 3186 02:34:33,490 --> 02:34:35,380 OK, standby. 3187 02:34:35,380 --> 02:34:36,370 Da-da-da. 3188 02:34:36,370 --> 02:34:36,985 Oh, sorry. 3189 02:34:36,985 --> 02:34:39,802 3190 02:34:39,802 --> 02:34:41,010 Let me do a find and replace. 3191 02:34:41,010 --> 02:34:43,170 I called the variable "words" instead of "audio." 3192 02:34:43,170 --> 02:34:45,790 And I just executed a fancy command to replace it everywhere. 3193 02:34:45,790 --> 02:34:47,890 So "audio" is what I meant to say this time. 3194 02:34:47,890 --> 02:34:52,005 Now, let's go ahead and run this, python of listen.py. 3195 02:34:52,005 --> 02:34:54,590 Hello, world. 3196 02:34:54,590 --> 02:34:55,500 Dammit. 3197 02:34:55,500 --> 02:34:57,780 AudioData is not iterable. 3198 02:34:57,780 --> 02:34:59,820 This is a bug. 3199 02:34:59,820 --> 02:35:03,900 Give me one second to double check my notes. 3200 02:35:03,900 --> 02:35:06,720 Very sorry to disappoint. 3201 02:35:06,720 --> 02:35:09,450 The audio in-- oh, I did-- 3202 02:35:09,450 --> 02:35:10,200 sorry. 3203 02:35:10,200 --> 02:35:12,960 I did it right the first time but the wrong way. 3204 02:35:12,960 --> 02:35:16,620 Let me change my variable back to words. 3205 02:35:16,620 --> 02:35:17,310 OK. 3206 02:35:17,310 --> 02:35:20,010 What I forgot to do was call one line of code here 3207 02:35:20,010 --> 02:35:21,700 that's literally sitting in front of me. 3208 02:35:21,700 --> 02:35:28,080 I need to convert the recognizer's return value, recognize_google audio. 3209 02:35:28,080 --> 02:35:32,040 I need to store the return value of passing the audio to Google 3210 02:35:32,040 --> 02:35:33,900 and storing the resulting text here. 
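The fix described here, keeping the text that comes back from recognize_google in words before testing it with in, might be sketched like this. It assumes the third-party SpeechRecognition package is installed, plus PyAudio for microphone access. The function names transcribe_from_microphone and reply are hypothetical, added so that the microphone is only touched when the first function is called, and the call to lower is a small addition not mentioned in this part of the lecture.

```python
def transcribe_from_microphone():
    """Record one utterance from the microphone and return the recognized text."""
    # Assumption: the third-party SpeechRecognition package and PyAudio are installed.
    import speech_recognition

    recognizer = speech_recognition.Recognizer()
    with speech_recognition.Microphone() as source:
        print("Say something:")
        audio = recognizer.listen(source)  # an AudioData object, not a string

    # The missing step: pass the audio to Google and keep the returned string.
    # Testing `"hello" in audio` instead fails, because AudioData is not iterable.
    return recognizer.recognize_google(audio)


def reply(words):
    """Pick a response based on what was recognized."""
    words = words.lower()  # assumption: normalize case before matching
    if "hello" in words:
        return "Hello to you too!"
    elif "how are you" in words:
        return "I am well, thanks!"
    elif "goodbye" in words:
        return "Goodbye to you too!"
    else:
        return "Huh?"


# Example use (needs a microphone and a network connection):
# words = transcribe_from_microphone()
# print(f"You said: {words}")
# print(reply(words))
```

Keeping the matching logic in its own function means it can be checked with plain strings, without a microphone.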
3211 02:35:33,900 --> 02:35:37,530 And so I have re-stored, using the words variable here. 3212 02:35:37,530 --> 02:35:42,810 All right now let me go ahead and run python of listen.py. 3213 02:35:42,810 --> 02:35:45,770 Hello, there. 3214 02:35:45,770 --> 02:35:47,270 Very nice. 3215 02:35:47,270 --> 02:35:48,185 How are you today? 3216 02:35:48,185 --> 02:35:51,200 3217 02:35:51,200 --> 02:35:52,880 Cool. 3218 02:35:52,880 --> 02:35:53,690 OK, goodbye. 3219 02:35:53,690 --> 02:35:56,430 3220 02:35:56,430 --> 02:35:56,970 All right. 3221 02:35:56,970 --> 02:35:59,678 So there we have an even more compelling artificial intelligence. 3222 02:35:59,678 --> 02:36:03,030 Granted, it's not that intelligent, it's just looking for preordained strings. 3223 02:36:03,030 --> 02:36:05,250 But I bet we can do something even more. 3224 02:36:05,250 --> 02:36:08,730 And in fact, let me go ahead and step inside, and see if a colleague of mine 3225 02:36:08,730 --> 02:36:10,440 can't help do something in real time. 3226 02:36:10,440 --> 02:36:13,080 On a big fancy PC here in the theater, we 3227 02:36:13,080 --> 02:36:16,148 are running some other Python program on a CPU 3228 02:36:16,148 --> 02:36:17,940 that's fast enough to do this in real time. 3229 02:36:17,940 --> 02:36:20,730 And we've connected one of our cameras to that PC, 3230 02:36:20,730 --> 02:36:24,210 so that what you're about to see is the result of one of our cameras being 3231 02:36:24,210 --> 02:36:28,860 wired into this PC, running that camera's input into Python software 3232 02:36:28,860 --> 02:36:30,330 running on that PC. 3233 02:36:30,330 --> 02:36:34,080 And we have trained the PC, using this Python software, 3234 02:36:34,080 --> 02:36:37,020 to recognize certain images in the past. 3235 02:36:37,020 --> 02:36:40,710 And let's see if we can't do this as well. 3236 02:36:40,710 --> 02:36:43,920 Brian, would you mind putting me on screen 1? 
3237 02:36:43,920 --> 02:36:48,120 And Rongxin, do you want to go ahead and load up our first guest? 3238 02:36:48,120 --> 02:36:49,930 I think we are live. 3239 02:36:49,930 --> 02:36:55,950 So again, you see my mouth moving in lock step with Einstein here. 3240 02:36:55,950 --> 02:36:57,570 His lips are matching mine. 3241 02:36:57,570 --> 02:36:59,130 His head movements are moving-- 3242 02:36:59,130 --> 02:36:59,760 matching mine. 3243 02:36:59,760 --> 02:37:01,380 We can even be inquisitive. 3244 02:37:01,380 --> 02:37:05,770 If my eyebrows go up, move my mouth this way, this way. 3245 02:37:05,770 --> 02:37:08,610 And you can see that the Python program in real time 3246 02:37:08,610 --> 02:37:13,050 is mapping my facial movements onto someone else's face, of course 3247 02:37:13,050 --> 02:37:15,820 otherwise known as a deep fake. 3248 02:37:15,820 --> 02:37:17,865 Rongxin, could we try out Brian's photo instead? 3249 02:37:17,865 --> 02:37:24,150 3250 02:37:24,150 --> 02:37:31,110 Here now we have Brian who similarly is matching big smile. 3251 02:37:31,110 --> 02:37:32,940 Gets a little fake at some point. 3252 02:37:32,940 --> 02:37:36,000 But again, if we pre-rendered all of this instead of doing it live, 3253 02:37:36,000 --> 02:37:38,550 the PC could probably do an even better job. 3254 02:37:38,550 --> 02:37:42,130 How about, could we invite Harvard president Larry Bacow to join us, 3255 02:37:42,130 --> 02:37:42,630 Rongxin? 3256 02:37:42,630 --> 02:37:47,470 3257 02:37:47,470 --> 02:37:51,250 This is CS50, Harvard University's introduction 3258 02:37:51,250 --> 02:37:53,950 to the intellectual enterprises of computer science 3259 02:37:53,950 --> 02:37:56,790 and the art of programming. 3260 02:37:56,790 --> 02:38:00,140 How about President Peter Salovey from Yale, Rongxin?
3261 02:38:00,140 --> 02:38:04,660 3262 02:38:04,660 --> 02:38:08,440 This is CS50, Yale University's introduction 3263 02:38:08,440 --> 02:38:10,960 to the intellectual enterprises of computer science 3264 02:38:10,960 --> 02:38:12,940 and the art of programming. 3265 02:38:12,940 --> 02:38:15,850 Now at this point, the real-world implications of this 3266 02:38:15,850 --> 02:38:17,420 should be getting increasingly clear. 3267 02:38:17,420 --> 02:38:20,650 While it's all fun and games to do this on Instagram, in TikTok and the like, 3268 02:38:20,650 --> 02:38:22,907 using various mobile applications these days, 3269 02:38:22,907 --> 02:38:24,740 which are essentially doing the same thing-- 3270 02:38:24,740 --> 02:38:26,290 and you can see the image doesn't quite keep up 3271 02:38:26,290 --> 02:38:28,810 with me if I start moving a little too quickly right now-- 3272 02:38:28,810 --> 02:38:32,987 this has very real-world implications in the world of politics, government, 3273 02:38:32,987 --> 02:38:35,320 business, and really just the real world more generally, 3274 02:38:35,320 --> 02:38:39,280 because I'm essentially putting in someone else's mouth my own words. 3275 02:38:39,280 --> 02:38:42,790 And while it's clear that these examples thus far aren't really that 3276 02:38:42,790 --> 02:38:44,560 compelling-- if I start to move too much, 3277 02:38:44,560 --> 02:38:46,510 you see that things start to get out of sync-- 3278 02:38:46,510 --> 02:38:48,842 just imagine that if we wait one year, our computers 3279 02:38:48,842 --> 02:38:51,550 are going to be twice as fast with even more memory and the like. 3280 02:38:51,550 --> 02:38:53,590 Software is only getting better and more powerful, 3281 02:38:53,590 --> 02:38:56,170 the libraries and the artificial intelligence are getting more trained.
3282 02:38:56,170 --> 02:38:58,545 And so among the themes for the coming weeks of the class 3283 02:38:58,545 --> 02:39:01,270 is not just how to do some things with technology 3284 02:39:01,270 --> 02:39:04,300 and how to write code, but frankly asking the much bigger, more 3285 02:39:04,300 --> 02:39:07,990 important picture question of should you do certain things with technology, 3286 02:39:07,990 --> 02:39:10,690 and should you actually write such code. 3287 02:39:10,690 --> 02:39:14,320 We did ask President Salovey and President Bacow for their permission 3288 02:39:14,320 --> 02:39:16,440 in advance to spoof them in this way. 3289 02:39:16,440 --> 02:39:18,190 But we thought we would more playfully end 3290 02:39:18,190 --> 02:39:20,950 with just a couple of other examples that you perhaps 3291 02:39:20,950 --> 02:39:23,980 see on Instagram, TikTok, and the like. 3292 02:39:23,980 --> 02:39:26,290 Rongxin, could we invite Pam to join us first? 3293 02:39:26,290 --> 02:39:30,440 3294 02:39:30,440 --> 02:39:32,450 And how about a certain Jim? 3295 02:39:32,450 --> 02:39:40,070 3296 02:39:40,070 --> 02:39:40,670 All right. 3297 02:39:40,670 --> 02:39:43,190 That's it for CS50 and Python today. 3298 02:39:43,190 --> 02:39:44,960 We'll see you next time. 3299 02:39:44,960 --> 02:39:49,810 [MUSIC PLAYING] 3300 02:39:49,810 --> 02:40:45,000