1 00:00:00,000 --> 00:00:03,479 [MUSIC PLAYING] 2 00:00:03,479 --> 00:01:22,999 3 00:01:22,999 --> 00:01:24,520 DAVID MALAN: All right. 4 00:01:24,520 --> 00:01:26,200 This is CS50. 5 00:01:26,200 --> 00:01:27,375 Welcome back to all. 6 00:01:27,375 --> 00:01:30,250 And this is one of those rare days, where, in just a couple of hours, 7 00:01:30,250 --> 00:01:33,095 you'll be able to say that you've learned a new language. 8 00:01:33,095 --> 00:01:35,470 Or if you have a little bit of Python background already, 9 00:01:35,470 --> 00:01:38,053 you'll be able to say hopefully that you know it all the more, 10 00:01:38,053 --> 00:01:41,380 because even though we've spent the past several weeks focusing on C, 11 00:01:41,380 --> 00:01:44,500 one of the overarching goals of the class is not to teach you C-- 12 00:01:44,500 --> 00:01:47,080 and indeed, C is officially now behind us-- 13 00:01:47,080 --> 00:01:48,760 but really to teach you how to program. 14 00:01:48,760 --> 00:01:52,540 But realize, too, that even as we dive into a new language today, 15 00:01:52,540 --> 00:01:56,230 the goal is not to take a course on one language or another. 16 00:01:56,230 --> 00:01:58,540 Indeed, I, myself, back in the day took CS50 and just 17 00:01:58,540 --> 00:02:00,957 one other follow-on class, where I learned how to program. 18 00:02:00,957 --> 00:02:03,370 And every language since then have I pretty much 19 00:02:03,370 --> 00:02:06,670 taught myself, learned from others, learned by reading other code, 20 00:02:06,670 --> 00:02:08,538 and really bootstrapping myself from that. 21 00:02:08,538 --> 00:02:10,330 So after just this term, hopefully will you 22 00:02:10,330 --> 00:02:12,670 have the power to teach yourselves new languages. 23 00:02:12,670 --> 00:02:15,400 And today, we start that together. 24 00:02:15,400 --> 00:02:15,970 All right. 25 00:02:15,970 --> 00:02:16,850 So where do we begin? 
26 00:02:16,850 --> 00:02:19,030 Back in week 0-- this is, recall, where we began, 27 00:02:19,030 --> 00:02:21,820 just making a little cat on the screen say "Hello world." 28 00:02:21,820 --> 00:02:23,830 And very quickly, things escalated a week later 29 00:02:23,830 --> 00:02:25,570 and started looking like this. 30 00:02:25,570 --> 00:02:27,490 Now, hopefully, over the past several weeks, 31 00:02:27,490 --> 00:02:31,630 you've begun to see through the syntax and see the underlying concepts 32 00:02:31,630 --> 00:02:33,220 and ideas that actually matter. 33 00:02:33,220 --> 00:02:35,740 But even so, there's a lot of cognitive overhead. 34 00:02:35,740 --> 00:02:39,160 There's a lot of syntactic overhead just to getting something simple done 35 00:02:39,160 --> 00:02:40,322 in this language called C. 36 00:02:40,322 --> 00:02:42,280 So starting today, we're going to introduce you 37 00:02:42,280 --> 00:02:45,070 to another programming language called Python 38 00:02:45,070 --> 00:02:48,340 that has been gaining steam in recent years and is wonderfully applicable, 39 00:02:48,340 --> 00:02:50,440 not only for the sort of command line programs 40 00:02:50,440 --> 00:02:52,440 that we've been writing in our terminal windows, 41 00:02:52,440 --> 00:02:56,380 but also in data science applications, analytics of large data sets, 42 00:02:56,380 --> 00:02:57,920 web programming, and the like. 43 00:02:57,920 --> 00:03:01,060 So this is the type of language that can actually solve many problems. 44 00:03:01,060 --> 00:03:03,610 And wonderfully, if we want to say "Hello, world" 45 00:03:03,610 --> 00:03:08,410 starting today in this new language, Python, all we need type is this-- 46 00:03:08,410 --> 00:03:10,970 typing the commands that you actually ultimately care about. 47 00:03:10,970 --> 00:03:13,030 So how do we get to that point ultimately? 
48 00:03:13,030 --> 00:03:16,540 Well, recall that in C, we had this process of compiling our code and then 49 00:03:16,540 --> 00:03:20,020 running it, as with make or more specifically, as with clang, 50 00:03:20,020 --> 00:03:22,330 and then running it with the file ./hello, 51 00:03:22,330 --> 00:03:24,760 representing a file in your current working directory. 52 00:03:24,760 --> 00:03:27,250 Today, even that process gets a little easier 53 00:03:27,250 --> 00:03:30,130 in that it's no longer a two-step process to write and run code. 54 00:03:30,130 --> 00:03:32,020 It's now just one. 55 00:03:32,020 --> 00:03:34,990 But it's a little bit different from the past, whereas in the past, 56 00:03:34,990 --> 00:03:38,410 we've, indeed, compiled our code from source code into machine code and then 57 00:03:38,410 --> 00:03:40,180 done ./ in order to run it. 58 00:03:40,180 --> 00:03:42,610 Whereas on a Mac or PC, you would double click an icon, 59 00:03:42,610 --> 00:03:44,710 Python is used a little differently. 60 00:03:44,710 --> 00:03:46,960 And other languages are used in the same way, too. 61 00:03:46,960 --> 00:03:49,720 You don't run the programs directly per se. 62 00:03:49,720 --> 00:03:52,600 You instead, literally, starting today, run a program 63 00:03:52,600 --> 00:03:54,400 that itself is called Python. 64 00:03:54,400 --> 00:03:58,750 And you pass as input to it the name of the file containing your source code. 65 00:03:58,750 --> 00:04:00,310 So Python itself is the program. 66 00:04:00,310 --> 00:04:02,220 It supports command line arguments. 67 00:04:02,220 --> 00:04:03,970 And one of those arguments can be the name 68 00:04:03,970 --> 00:04:07,390 of your very program, which means we don't have to very annoyingly keep 69 00:04:07,390 --> 00:04:10,390 compiling and recompiling our code every time we make a change.
70 00:04:10,390 --> 00:04:12,807 If you want to make a change to your code, all you need do 71 00:04:12,807 --> 00:04:15,190 is save your file and rerun this command. 72 00:04:15,190 --> 00:04:16,540 So let's put this into context. 73 00:04:16,540 --> 00:04:20,410 Let me go over to CS50 IDE, which for Python, you can continue using, 74 00:04:20,410 --> 00:04:21,170 as well. 75 00:04:21,170 --> 00:04:25,450 Let me go ahead and create a new file called, for instance, hello.py. 76 00:04:25,450 --> 00:04:28,060 So instead of hello.c, I'll use hello.py-- 77 00:04:28,060 --> 00:04:30,640 py being the convention for Python-based programs. 78 00:04:30,640 --> 00:04:31,390 And you know what? 79 00:04:31,390 --> 00:04:34,630 If I want to print "hello world," I'm just going to go ahead and say 80 00:04:34,630 --> 00:04:36,580 print("hello, world"). 81 00:04:36,580 --> 00:04:38,310 I'm going to go ahead and save my file. 82 00:04:38,310 --> 00:04:40,810 And then, in my terminal window, there's no need to compile. 83 00:04:40,810 --> 00:04:44,200 I can now run the program called Python, which is identically 84 00:04:44,200 --> 00:04:45,700 named to the language itself. 85 00:04:45,700 --> 00:04:48,670 And I'm going to go ahead and run the file called hello.py 86 00:04:48,670 --> 00:04:50,170 as input into that program. 87 00:04:50,170 --> 00:04:53,440 And voila, my very first program in Python. 88 00:04:53,440 --> 00:04:57,430 No curly braces, no int, no main, no void, no include-- 89 00:04:57,430 --> 00:05:00,250 you can just start to get real work done. 90 00:05:00,250 --> 00:05:03,080 But to get more interesting real work done, 91 00:05:03,080 --> 00:05:05,778 let's start to bootstrap things from where we left off 92 00:05:05,778 --> 00:05:07,820 when we made comparisons between Scratch and C, 93 00:05:07,820 --> 00:05:10,362 doing the same thing, again, this time between Scratch and C, 94 00:05:10,362 --> 00:05:11,770 but now Python, as well.
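A minimal sketch of the hello.py file described above, assuming Python 3 is installed and invoked as `python` (on some systems the command is `python3`). The second call is an extra illustration, not part of hello.py: print appends a newline on its own, and its standard `end` keyword argument replaces that newline.

```python
# The one line that hello.py needs. No main, no #include, no semicolons.
# Run it from a terminal with:  python hello.py
print("hello, world")

# Extra illustration: print supplies the trailing newline itself, and
# end="" suppresses it, much like printf("hello, world") without \n in C.
print("hello, world", end="")
print()  # end the line ourselves
```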
95 00:05:11,770 --> 00:05:14,770 So in the world of Scratch, if you wanted to say "hello, world," 96 00:05:14,770 --> 00:05:18,130 you would use this purple block, a function, as it was called at the time. 97 00:05:18,130 --> 00:05:21,410 And we translated that a few weeks back now to the corresponding C code-- 98 00:05:21,410 --> 00:05:23,080 printf("hello, world\n"); 99 00:05:23,080 --> 00:05:25,540 And there were a few nuances and things to trip over. 100 00:05:25,540 --> 00:05:26,290 It's printf. 101 00:05:26,290 --> 00:05:27,280 It's not print. 102 00:05:27,280 --> 00:05:29,710 You've got the backslash n and the semicolon. 103 00:05:29,710 --> 00:05:32,590 Today, in Python, if you want to achieve that same goal, 104 00:05:32,590 --> 00:05:37,030 as I just did in the IDE, you can simplify this to just that. 105 00:05:37,030 --> 00:05:41,260 So just to be super clear, what has changed from C to Python? 106 00:05:41,260 --> 00:05:43,690 What do you no longer need to worry about in Python-- 107 00:05:43,690 --> 00:05:44,990 some observations? 108 00:05:44,990 --> 00:05:45,575 Yeah. 109 00:05:45,575 --> 00:05:46,450 AUDIENCE: Semicolons. 110 00:05:46,450 --> 00:05:49,210 DAVID MALAN: No more semicolons-- those are officially gone. 111 00:05:49,210 --> 00:05:49,903 Other comments? 112 00:05:49,903 --> 00:05:51,070 AUDIENCE: No more new lines. 113 00:05:51,070 --> 00:05:52,403 DAVID MALAN: No more new lines-- 114 00:05:52,403 --> 00:05:55,127 print will actually give you one if you simply call print. 115 00:05:55,127 --> 00:05:55,960 Let me go over here. 116 00:05:55,960 --> 00:05:57,377 AUDIENCE: Print instead of printf. 117 00:05:57,377 --> 00:06:00,430 DAVID MALAN: And it's print instead of printf and-- 118 00:06:00,430 --> 00:06:03,970 this is going to end poorly today, because my arm will eventually fail. 119 00:06:03,970 --> 00:06:06,380 Are there any other differences that jump out? 120 00:06:06,380 --> 00:06:06,880 Maybe?
121 00:06:06,880 --> 00:06:08,320 AUDIENCE: No more standard I/O. 122 00:06:08,320 --> 00:06:09,865 DAVID MALAN: No more standard I/O-- 123 00:06:09,865 --> 00:06:11,740 so there's none of the overhead that we need. 124 00:06:11,740 --> 00:06:13,210 I'm not going to give you a stress ball, though, for that one 125 00:06:13,210 --> 00:06:15,490 just because it wasn't in the previous slide for C. 126 00:06:15,490 --> 00:06:17,982 But indeed, there's no overhead needed, the includes 127 00:06:17,982 --> 00:06:19,690 and so forth, just to get real work done. 128 00:06:19,690 --> 00:06:21,040 AUDIENCE: No backslash [INAUDIBLE]. 129 00:06:21,040 --> 00:06:22,707 DAVID MALAN: Oh, that was taken already. 130 00:06:22,707 --> 00:06:23,350 So I'm sorry. 131 00:06:23,350 --> 00:06:24,340 The stress ball's again given out. 132 00:06:24,340 --> 00:06:24,840 Yeah. 133 00:06:24,840 --> 00:06:26,050 AUDIENCE: No %s. 134 00:06:26,050 --> 00:06:28,450 DAVID MALAN: No %s, but not germane here, 135 00:06:28,450 --> 00:06:30,295 because I'm not yet plugging anything in. 136 00:06:30,295 --> 00:06:32,170 So, in fact, let me just move on, because I'm 137 00:06:32,170 --> 00:06:35,210 pretty sure there are no other differences or stress balls for this one. 138 00:06:35,210 --> 00:06:37,903 So let's take a look, though, at a variant of this, 139 00:06:37,903 --> 00:06:40,570 where we wanted to do something more interesting than just print 140 00:06:40,570 --> 00:06:44,170 statically-- that is, hardcoded-- the same thing again and again-- 141 00:06:44,170 --> 00:06:45,730 hello, world-- something like this. 142 00:06:45,730 --> 00:06:47,730 And now, I'll come back to you in just a moment. 143 00:06:47,730 --> 00:06:50,740 If you want to get users' input, in Scratch, we used this Ask block. 144 00:06:50,740 --> 00:06:55,000 That gave us access to a special return value or variable called answer.
145 00:06:55,000 --> 00:06:57,130 And then, we could use "join" and creatively 146 00:06:57,130 --> 00:07:01,180 use the Say block to concatenate, or join those two values together. 147 00:07:01,180 --> 00:07:05,620 In C, this ended up being this, where you declare a variable on the left. 148 00:07:05,620 --> 00:07:09,340 You assign it the return value on the right, as with the first line there. 149 00:07:09,340 --> 00:07:12,130 And then, you go ahead and print out not just hello, 150 00:07:12,130 --> 00:07:15,580 but hello, %s, which then plugged in that value. 151 00:07:15,580 --> 00:07:17,530 In Python, you can achieve the same goal. 152 00:07:17,530 --> 00:07:19,130 But it's going to be a little simpler. 153 00:07:19,130 --> 00:07:21,730 We can now do it with just this. 154 00:07:21,730 --> 00:07:24,375 So what has disappeared clearly from the screen? 155 00:07:24,375 --> 00:07:26,500 What do we no longer need to worry about in Python? 156 00:07:26,500 --> 00:07:27,000 Yeah. 157 00:07:27,000 --> 00:07:30,070 AUDIENCE: Well, you could just do plus answer instead of, like, 158 00:07:30,070 --> 00:07:33,075 having to do it with a comma and the %s answer. 159 00:07:33,075 --> 00:07:33,950 DAVID MALAN: Exactly. 160 00:07:33,950 --> 00:07:35,150 So there's no %s. 161 00:07:35,150 --> 00:07:38,060 We're just using this plus operator, which is new for strings in Python. 162 00:07:38,060 --> 00:07:40,850 This is actually now called the concatenation operator. 163 00:07:40,850 --> 00:07:43,100 And if you've studied Java or a few other languages, 164 00:07:43,100 --> 00:07:45,225 you know that this will join the string on the left 165 00:07:45,225 --> 00:07:46,433 with the string on the right. 166 00:07:46,433 --> 00:07:48,641 So we can sort of construct this phrase that we want. 167 00:07:48,641 --> 00:07:50,433 And because you called out the %s earlier-- 168 00:07:50,433 --> 00:07:51,020 AUDIENCE: Oh. 169 00:07:51,020 --> 00:07:52,220 DAVID MALAN: --let me be fair there.
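A sketch of the greeting built with the + operator. The function name `greet` and the sample name are hypothetical; to read a real name from the keyboard you could use Python's built-in input(), which appears only in a comment here so the example runs without anyone typing.

```python
def greet(answer):
    """Build the greeting by joining two strings with +, the concatenation operator."""
    return "hello, " + answer


# Interactively, you might read the name with the built-in input(), e.g.
#     answer = input("What's your name? ")
# A fixed sample value keeps this sketch runnable without a keyboard.
answer = "David"
print(greet(answer))  # hello, David
```

Note that + only joins a str with another str: `"hello, " + 50` raises a TypeError, so a number must first be converted with str().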
170 00:07:52,220 --> 00:07:52,940 Yeah. 171 00:07:52,940 --> 00:07:55,340 AUDIENCE: We didn't have to identify answer as a string. 172 00:07:55,340 --> 00:07:56,090 DAVID MALAN: Good. 173 00:07:56,090 --> 00:07:59,630 We don't have to identify answer, which is, indeed, our variable, as a string, 174 00:07:59,630 --> 00:08:02,990 because even though Python does have data types-- 175 00:08:02,990 --> 00:08:05,180 and it does know what type of value you're storing-- 176 00:08:05,180 --> 00:08:08,055 you don't have to, pedantically as the programmer, tell the computer. 177 00:08:08,055 --> 00:08:10,460 The computer can figure it out from context. 178 00:08:10,460 --> 00:08:12,035 Any other distinctions? 179 00:08:12,035 --> 00:08:13,540 AUDIENCE: No semicolons. 180 00:08:13,540 --> 00:08:15,460 DAVID MALAN: No, no semicolons, as well, 181 00:08:15,460 --> 00:08:18,210 and I was hoping no one would raise their hands from farther away. 182 00:08:18,210 --> 00:08:20,460 But here we go. 183 00:08:20,460 --> 00:08:20,960 Oh. 184 00:08:20,960 --> 00:08:21,710 [LAUGHTER] 185 00:08:21,710 --> 00:08:22,440 OK. 186 00:08:22,440 --> 00:08:22,940 My bad. 187 00:08:22,940 --> 00:08:23,130 Good. 188 00:08:23,130 --> 00:08:23,310 Good. 189 00:08:23,310 --> 00:08:23,660 Good. 190 00:08:23,660 --> 00:08:23,900 OK. 191 00:08:23,900 --> 00:08:25,942 So there are a few differences, but the short of it 192 00:08:25,942 --> 00:08:28,250 is that it's, indeed, simpler this time. 193 00:08:28,250 --> 00:08:29,544 Indeed, I don't need the %-- 194 00:08:29,544 --> 00:08:32,169 the backslash n either, because I'm going to get that for free. 195 00:08:32,169 --> 00:08:34,909 So let's fly through a few other comparisons, as well, not just 196 00:08:34,909 --> 00:08:38,270 on the string here or here, but now using a different approach. 197 00:08:38,270 --> 00:08:40,770 It turns out that you can use print in a few different ways.
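A short illustration of the point about types, using hypothetical variable names: none of these assignments declares a type, yet the built-in type() still reports one for each value, and a name can later be rebound to a value of a different type.

```python
# No type declarations: each value's type is determined when it is assigned.
answer = "David"
counter = 0
ratio = 0.5
done = False

# The values still have types, which type() reports at runtime.
print(type(answer).__name__)   # str
print(type(counter).__name__)  # int
print(type(ratio).__name__)    # float
print(type(done).__name__)     # bool

# A name can even be rebound to a value of another type.
counter = "zero"
print(type(counter).__name__)  # str
```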
198 00:08:40,770 --> 00:08:43,640 You can, indeed, just concatenate one string with another 199 00:08:43,640 --> 00:08:45,203 by using that plus operator. 200 00:08:45,203 --> 00:08:47,120 Or if you read the documentation, it turns out 201 00:08:47,120 --> 00:08:49,155 that print takes multiple arguments. 202 00:08:49,155 --> 00:08:51,530 So the first one might be the first word you want to say. 203 00:08:51,530 --> 00:08:53,660 The second argument might be the second thing you want to say. 204 00:08:53,660 --> 00:08:56,450 And by default, what print will do, per its documentation, 205 00:08:56,450 --> 00:08:59,780 is automatically join, or concatenate those two strings 206 00:08:59,780 --> 00:09:01,580 automatically by adding a space. 207 00:09:01,580 --> 00:09:04,307 So it's not a typo that I removed the space after the comma. 208 00:09:04,307 --> 00:09:06,140 I'm going to get that for free, so to speak, 209 00:09:06,140 --> 00:09:08,590 because print is going to do that for me. 210 00:09:08,590 --> 00:09:10,340 Now, this one's about to be a little ugly. 211 00:09:10,340 --> 00:09:13,560 But it's an increasingly common approach in Python to do the same thing. 212 00:09:13,560 --> 00:09:16,010 And it's a little more reminiscent of C. But it turns out 213 00:09:16,010 --> 00:09:18,110 we'll see over time it's a little more powerful. 214 00:09:18,110 --> 00:09:21,270 You can also achieve the same result like this. 215 00:09:21,270 --> 00:09:21,770 All right. 216 00:09:21,770 --> 00:09:23,062 So it's a little weird looking. 217 00:09:23,062 --> 00:09:26,480 But once you start to recognize the pattern, it's pretty straightforward. 218 00:09:26,480 --> 00:09:28,130 So it's still the function print. 219 00:09:28,130 --> 00:09:30,590 There's still a double quoted string, though it turns out 220 00:09:30,590 --> 00:09:32,810 you can use single quotes, as well in Python. 221 00:09:32,810 --> 00:09:34,610 Answer is the variable we want to print. 
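A sketch of passing print several arguments. print separates its arguments with its `sep` keyword argument, which defaults to a single space, so no space is needed after the comma inside "hello,". The `file` argument sends output to an in-memory buffer so the exact text can be inspected; the sample name is hypothetical.

```python
import io

answer = "David"  # hypothetical sample value

# Two arguments, joined by the default separator " ".
print("hello,", answer)           # hello, David

# sep is a standard keyword argument of print; changing it changes the joiner.
print("hello,", answer, sep="")   # hello,David

# Capture the output in a buffer to confirm exactly what print writes.
buffer = io.StringIO()
print("hello,", answer, file=buffer)
captured = buffer.getvalue()      # "hello, David\n"
```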
222 00:09:34,610 --> 00:09:39,800 So what's new now is these curly braces, which say interpolate the value 223 00:09:39,800 --> 00:09:44,465 in between those curly braces-- that is, substitute it in just like %s works. 224 00:09:44,465 --> 00:09:47,090 But there's one more oddity, definitely worthy of a stress ball 225 00:09:47,090 --> 00:09:50,810 here, that's not a typo, but does distinguish this from C. Yeah. 226 00:09:50,810 --> 00:09:51,920 AUDIENCE: The f. 227 00:09:51,920 --> 00:09:54,295 DAVID MALAN: The f-- and this is one that-- here you go-- 228 00:09:54,295 --> 00:09:56,520 the weirdest features of-- oh, my bad. 229 00:09:56,520 --> 00:09:57,020 [LAUGHS] 230 00:09:57,020 --> 00:10:00,830 This is one of the weirdest things about recent versions of Python 231 00:10:00,830 --> 00:10:01,550 in recent years. 232 00:10:01,550 --> 00:10:04,460 This is what's called a format string, or f string. 233 00:10:04,460 --> 00:10:09,380 If you don't have this weird f in the beginning of the string immediately 234 00:10:09,380 --> 00:10:12,740 to the left of the double quotes, you will literally print on the screen 235 00:10:12,740 --> 00:10:19,280 H-E-L-L-O comma space curly brace ANSWER curly brace. 236 00:10:19,280 --> 00:10:20,030 And that's it. 237 00:10:20,030 --> 00:10:23,750 So f in front of this turns the string into an f string or format 238 00:10:23,750 --> 00:10:26,900 string, which tells Python, don't print this literally. 239 00:10:26,900 --> 00:10:29,730 Plug the value in that I've placed between the curly braces. 240 00:10:29,730 --> 00:10:32,480 So it's pretty powerful once you pick up the convention like that. 241 00:10:32,480 --> 00:10:32,880 All right. 242 00:10:32,880 --> 00:10:34,100 Let's look at a few other examples. 243 00:10:34,100 --> 00:10:35,017 This, on the example-- 244 00:10:35,017 --> 00:10:37,580 on the left was an-- this on the left was an example 245 00:10:37,580 --> 00:10:41,060 of what type of programming feature? 
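A sketch contrasting an f-string with an ordinary string, assuming Python 3.6 or later (the version that introduced f-strings). The sample name is hypothetical.

```python
answer = "David"  # hypothetical sample value

with_f = f"hello, {answer}"    # f-string: {answer} is replaced by the variable's value
without_f = "hello, {answer}"  # ordinary string: the braces are kept literally

print(with_f)     # hello, David
print(without_f)  # hello, {answer}

# Any expression can go inside the braces, not just a variable name.
print(f"{answer} has {len(answer)} letters")  # David has 5 letters
```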
246 00:10:41,060 --> 00:10:42,680 What do we call this-- 247 00:10:42,680 --> 00:10:43,520 the counter? 248 00:10:43,520 --> 00:10:43,660 Yeah. 249 00:10:43,660 --> 00:10:44,420 AUDIENCE: The variable. 250 00:10:44,420 --> 00:10:46,087 DAVID MALAN: So this is just a variable. 251 00:10:46,087 --> 00:10:48,285 So a variable here and let me not-- well, 252 00:10:48,285 --> 00:10:50,768 this is getting a little easier for the stress balls. 253 00:10:50,768 --> 00:10:51,560 This is a variable. 254 00:10:51,560 --> 00:10:53,690 And in C, it corresponded to a line like this. 255 00:10:53,690 --> 00:10:56,150 So in Python, this, too, gets a little simpler. 256 00:10:56,150 --> 00:10:59,750 Instead of saying int counter equals zero semicolon, 257 00:10:59,750 --> 00:11:01,820 now, you want a variable called counter? 258 00:11:01,820 --> 00:11:02,960 Just make it so. 259 00:11:02,960 --> 00:11:05,180 Use the equals sign as the assignment operator. 260 00:11:05,180 --> 00:11:07,310 Set it equal to some value on the right-hand side, 261 00:11:07,310 --> 00:11:09,830 but no semicolon anymore. 262 00:11:09,830 --> 00:11:11,990 This, on the left, for instance, was an example 263 00:11:11,990 --> 00:11:15,950 of Scratch updating the value of a variable by one, 264 00:11:15,950 --> 00:11:17,420 incrementing it, so to speak. 265 00:11:17,420 --> 00:11:21,020 In C, we achieved that same result by just saying counter equals counter 266 00:11:21,020 --> 00:11:25,340 plus 1 semicolon, assuming the variable already existed. 267 00:11:25,340 --> 00:11:27,180 We could also do this in another way. 268 00:11:27,180 --> 00:11:29,810 But in Python, we can do this like this. 269 00:11:29,810 --> 00:11:32,040 It's identical, but no semicolon. 270 00:11:32,040 --> 00:11:36,200 But in C, we could also do it like this-- counter plus equals 1 semicolon. 271 00:11:36,200 --> 00:11:39,178 That was just a little shorter than having to type the whole thing out.
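The three forms just described, written out in Python as a quick sketch.

```python
counter = 0              # create and initialize; no type, no semicolon
counter = counter + 1    # longhand increment
counter += 1             # shorthand with the same effect
print(counter)           # 2
```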
272 00:11:39,178 --> 00:11:40,970 In Python, you can do the exact same thing. 273 00:11:40,970 --> 00:11:42,622 But it's going to look different how? 274 00:11:42,622 --> 00:11:43,580 AUDIENCE: No semicolon. 275 00:11:43,580 --> 00:11:45,890 DAVID MALAN: No semicolon for this one, as well-- 276 00:11:45,890 --> 00:11:48,380 but there's one thing you cannot do, for better or for worse. In C, 277 00:11:48,380 --> 00:11:50,942 you have an even more succinct trick. 278 00:11:50,942 --> 00:11:52,900 What could you do in C to increment a variable? 279 00:11:52,900 --> 00:11:53,547 Yeah. 280 00:11:53,547 --> 00:11:54,795 AUDIENCE: Type in plus plus. 281 00:11:54,795 --> 00:11:57,920 DAVID MALAN: You could do the plus plus operator after the variable's name. 282 00:11:57,920 --> 00:11:59,600 That does not exist in Python. 283 00:11:59,600 --> 00:12:00,200 Here we go. 284 00:12:00,200 --> 00:12:01,790 That does not exist-- sorry. 285 00:12:01,790 --> 00:12:04,010 It doesn't exist in Python. 286 00:12:04,010 --> 00:12:05,490 It's simply not in the language. 287 00:12:05,490 --> 00:12:07,880 So you have to start using this approach to be the most succinct. 288 00:12:07,880 --> 00:12:09,422 Well, what else do we have in Python? 289 00:12:09,422 --> 00:12:12,020 Here is, in Scratch, an example of a condition 290 00:12:12,020 --> 00:12:15,910 that only if x is less than y, does it say something on the screen like this. 291 00:12:15,910 --> 00:12:18,470 In C, a little ugly at first, but you've probably 292 00:12:18,470 --> 00:12:21,530 gotten used to this after multiple weeks of coding in C. 293 00:12:21,530 --> 00:12:23,900 Now, in Python, this is going to get simpler, too. 294 00:12:23,900 --> 00:12:25,910 The semicolon's definitely going away. 295 00:12:25,910 --> 00:12:28,070 The backslash n is definitely going away. 296 00:12:28,070 --> 00:12:31,340 Printf is about to become print, but also going away 297 00:12:31,340 --> 00:12:32,940 is most everything else.
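A sketch of the ++ gotcha. Writing counter++ is a syntax error, which the built-in compile() confirms here without stopping the example. More subtly, ++counter is accepted but means two unary plus signs, +(+counter), so it silently leaves the variable unchanged.

```python
# counter++ is not valid Python: the parser rejects it outright.
try:
    compile("counter++", "<example>", "exec")
    postfix_is_valid = True
except SyntaxError:
    postfix_is_valid = False

# ++counter parses, but only as +(+counter): it evaluates to counter
# and does not change it.
counter = 5
result = ++counter
print(result, counter)  # 5 5

counter += 1            # the idiomatic way to increment
print(counter)          # 6
```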
298 00:12:32,940 --> 00:12:35,120 So there are no curly braces anymore. 299 00:12:35,120 --> 00:12:37,940 There is now a colon after the condition, 300 00:12:37,940 --> 00:12:39,410 or the Boolean expression there. 301 00:12:39,410 --> 00:12:41,960 Indentation is now necessary. 302 00:12:41,960 --> 00:12:44,780 So those of you who've been a little loose with style50 303 00:12:44,780 --> 00:12:47,420 and favoring instead just writing all of your code 304 00:12:47,420 --> 00:12:50,120 over on the left-hand side of the terminal, that 305 00:12:50,120 --> 00:12:54,170 has to stop now, even if style50 hasn't broken you of that habit already. 306 00:12:54,170 --> 00:12:57,140 Python is sensitive to whitespace, which means 307 00:12:57,140 --> 00:13:01,430 that if you want to use a condition and execute code inside of that condition, 308 00:13:01,430 --> 00:13:06,080 it must be indented consistently, by convention, four spaces. 309 00:13:06,080 --> 00:13:09,470 And it should always be four spaces or four more spaces and so forth. 310 00:13:09,470 --> 00:13:11,620 The curly braces, though, are now gone. 311 00:13:11,620 --> 00:13:12,870 How about something like this? 312 00:13:12,870 --> 00:13:16,220 If we have an if else statement, just like we did in week 0, in week 1, 313 00:13:16,220 --> 00:13:20,990 we translated that to C as such, introducing if and else this time. 314 00:13:20,990 --> 00:13:22,250 That, too, gets simpler. 315 00:13:22,250 --> 00:13:23,627 Now, it can be distilled as this. 316 00:13:23,627 --> 00:13:24,710 The curly braces are gone. 317 00:13:24,710 --> 00:13:26,360 The backslash n's are gone. 318 00:13:26,360 --> 00:13:29,450 But we've, again, added some colons, some colons, 319 00:13:29,450 --> 00:13:33,170 and some explicit indentation that now matters all the more. 320 00:13:33,170 --> 00:13:37,040 How about an if else if else-- so a three-way fork in the road, 321 00:13:37,040 --> 00:13:37,560 if you will?
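A sketch of the if and if/else blocks, using hypothetical values for x and y. The last part hands an unindented version of the same code to compile() to show that indentation is part of Python's syntax, not just style.

```python
x, y = 1, 2  # hypothetical sample values

# The colon ends the condition; the indented lines (four spaces) are the body.
if x < y:
    print("x is less than y")

if x < y:
    print("x is less than y")
else:
    print("x is not less than y")

# Without indentation, the same if statement is rejected before it runs.
unindented = "if x < y:\nprint('x is less than y')\n"
try:
    compile(unindented, "<example>", "exec")
    compiles = True
except IndentationError:
    compiles = False
print(compiles)  # False
```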
322 00:13:37,560 --> 00:13:41,510 In C, you just continue that same logic, asking if else if else. 323 00:13:41,510 --> 00:13:43,880 Python's not only going to get more succinct. 324 00:13:43,880 --> 00:13:46,880 It's also going to get a little weird, but not a typo. 325 00:13:46,880 --> 00:13:50,730 What jumps out at you here with Python that seems a little misleading? 326 00:13:50,730 --> 00:13:51,230 Yeah. 327 00:13:51,230 --> 00:13:52,675 AUDIENCE: Else if becomes elif. 328 00:13:52,675 --> 00:13:55,550 DAVID MALAN: Yeah, so else if was apparently too laborious for humans 329 00:13:55,550 --> 00:13:56,050 to type. 330 00:13:56,050 --> 00:13:58,370 And so now, in Python, that's just elif-- 331 00:13:58,370 --> 00:14:01,500 E-L-I-F-- but it means exactly the same thing. 332 00:14:01,500 --> 00:14:02,000 All right. 333 00:14:02,000 --> 00:14:02,630 How about this? 334 00:14:02,630 --> 00:14:04,250 This is a loop in Scratch. 335 00:14:04,250 --> 00:14:05,690 It does something forever. 336 00:14:05,690 --> 00:14:09,117 This wasn't super straightforward to convert to C, because in C, 337 00:14:09,117 --> 00:14:10,700 you don't really have a forever block. 338 00:14:10,700 --> 00:14:12,742 But we did decide that you can use while and just 339 00:14:12,742 --> 00:14:15,470 say true, true being a Boolean value that 340 00:14:15,470 --> 00:14:17,540 evaluates always to true by definition. 341 00:14:17,540 --> 00:14:19,600 So this would print out hello world forever. 342 00:14:19,600 --> 00:14:21,650 In Python, it's almost the same. 343 00:14:21,650 --> 00:14:24,470 But in Python, it's going to look like this. 344 00:14:24,470 --> 00:14:25,940 So the curly braces are gone. 345 00:14:25,940 --> 00:14:27,020 The semicolon is gone. 346 00:14:27,020 --> 00:14:28,130 The hand is already up. 347 00:14:28,130 --> 00:14:29,047 What's different here? 348 00:14:29,047 --> 00:14:30,680 AUDIENCE: I have a question about if. 349 00:14:30,680 --> 00:14:31,430 DAVID MALAN: Sure. 
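A sketch of the three-way fork using elif, wrapped in a hypothetical compare function so each branch can be exercised.

```python
def compare(x, y):
    """Describe how x relates to y using if / elif / else."""
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))  # x is less than y
print(compare(3, 2))  # x is greater than y
print(compare(2, 2))  # x is equal to y
```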
350 00:14:31,430 --> 00:14:32,998 What's the question about if? 351 00:14:32,998 --> 00:14:36,420 AUDIENCE: We didn't use curly brackets to solve the if. 352 00:14:36,420 --> 00:14:39,585 So like, we just indent back to [INAUDIBLE]. 353 00:14:39,585 --> 00:14:40,460 DAVID MALAN: Correct. 354 00:14:40,460 --> 00:14:43,340 But you don't-- because we don't have curly braces, 355 00:14:43,340 --> 00:14:47,660 it's not necessarily obvious at first glance where the code you want 356 00:14:47,660 --> 00:14:49,790 to execute conditionally begins and ends, 357 00:14:49,790 --> 00:14:52,263 unless you rely on the indentation. 358 00:14:52,263 --> 00:14:54,680 So if you wanted to do something outside of the condition, 359 00:14:54,680 --> 00:14:56,780 you just un-indent and move on your way. 360 00:14:56,780 --> 00:14:59,360 So it's identical to how you should have been writing C code. 361 00:14:59,360 --> 00:15:00,380 There are no curly braces. 362 00:15:00,380 --> 00:15:02,900 But now, the indentation matters. 363 00:15:02,900 --> 00:15:05,600 So back to the while loop here-- this will loop infinitely in C. 364 00:15:05,600 --> 00:15:07,320 In Python, I claim it looks like this. 365 00:15:07,320 --> 00:15:10,070 And the only new difference here that's worth noting is-- is what? 366 00:15:10,070 --> 00:15:11,060 AUDIENCE: True is capitalized. 367 00:15:11,060 --> 00:15:12,110 DAVID MALAN: True is capitalized. 368 00:15:12,110 --> 00:15:12,610 Why? 369 00:15:12,610 --> 00:15:15,560 Just because. In Python, the two Boolean values, True and False, 370 00:15:15,560 --> 00:15:17,900 are, indeed, capitalized as here. 371 00:15:17,900 --> 00:15:18,420 All right. 372 00:15:18,420 --> 00:15:20,570 So let's finish out with a few more blocks. 373 00:15:20,570 --> 00:15:23,780 Recall that we implemented a coughing cat early on. 374 00:15:23,780 --> 00:15:26,630 And this is how you might do that three times specifically.
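A sketch of the forever loop with the capitalized True. The counter and the break are added only so this example terminates; the loop described above runs forever. The check at the end shows that lowercase true is not a keyword in Python.

```python
count = 0
while True:            # capital T: True is the Boolean keyword
    print("hello, world")
    count += 1
    if count == 3:
        break          # added only so this sketch stops; the original never ends

# Lowercase true is not a keyword, just an undefined name.
try:
    eval("true")
    lowercase_works = True
except NameError:
    lowercase_works = False
print(lowercase_works)  # False
```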
375 00:15:26,630 --> 00:15:28,460 In C, you can do this in a couple of ways. 376 00:15:28,460 --> 00:15:31,700 And the first way we proposed in week 1 was 377 00:15:31,700 --> 00:15:34,340 that you give yourself a counting variable like i, 378 00:15:34,340 --> 00:15:35,660 but you could call it anything. 379 00:15:35,660 --> 00:15:40,323 And then, you do something while i is greater than some target value, like 0. 380 00:15:40,323 --> 00:15:43,490 And then, you go ahead and cough again and again and again on each iteration 381 00:15:43,490 --> 00:15:46,070 decrementing-- that is, decreasing the value of i-- 382 00:15:46,070 --> 00:15:47,870 and then, keep checking that condition. 383 00:15:47,870 --> 00:15:50,240 So in Python, we can do pretty much the same thing. 384 00:15:50,240 --> 00:15:53,480 This converts pretty tightly to just this, which 385 00:15:53,480 --> 00:15:56,300 is pretty equivalent, except for the semicolons, the curly braces, 386 00:15:56,300 --> 00:16:01,742 and so forth, noting this time that we have the colon after the while condition. 387 00:16:01,742 --> 00:16:03,200 But you can do this in another way. 388 00:16:03,200 --> 00:16:05,840 And indeed, we implemented it using a for loop, which is probably 389 00:16:05,840 --> 00:16:08,257 something you've gotten pretty familiar with and hopefully 390 00:16:08,257 --> 00:16:09,800 pretty comfortable with by now. 391 00:16:09,800 --> 00:16:11,987 These don't map directly to Python. 392 00:16:11,987 --> 00:16:13,070 You can do the same thing. 393 00:16:13,070 --> 00:16:17,000 But it's actually a little easier at least once you get used to it. 394 00:16:17,000 --> 00:16:20,720 So here, we had a variable called i initialized to 0. 395 00:16:20,720 --> 00:16:26,042 It kept getting incremented by 1 up to but not including the value 3. 396 00:16:26,042 --> 00:16:29,000 And on each iteration, we printed cough, thereby achieving three coughs 397 00:16:29,000 --> 00:16:29,930 on the screen.
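A sketch of the while-loop version of coughing three times. The `coughs` list is an addition for illustration that records each iteration so the count can be checked. Python has no decrement operator, so `i -= 1` takes the place of C's `i--`.

```python
coughs = []  # hypothetical record of each cough, for checking the count

i = 3
while i > 0:         # colon after the condition; no parentheses or braces
    print("cough")
    coughs.append("cough")
    i -= 1           # decrement; Python has no i--

print(len(coughs))   # 3
```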
398 00:16:29,930 --> 00:16:32,930 In Python, we can change this to the following. 399 00:16:32,930 --> 00:16:34,235 You still have the keyword for. 400 00:16:34,235 --> 00:16:35,360 But there are no parentheses. 401 00:16:35,360 --> 00:16:36,710 There are no semicolons. 402 00:16:36,710 --> 00:16:42,110 And you a little more casually say for i in the following list of values. 403 00:16:42,110 --> 00:16:44,570 So in Python, square brackets represent what 404 00:16:44,570 --> 00:16:46,070 we're going to start calling a list. 405 00:16:46,070 --> 00:16:49,340 It's pretty much the same thing as an array, but with many more features. 406 00:16:49,340 --> 00:16:52,100 You can grow and shrink these lists 407 00:16:52,100 --> 00:16:52,730 in Python. 408 00:16:52,730 --> 00:16:55,250 You could not do that in C. 409 00:16:55,250 --> 00:16:57,590 And so in this case, this is, on the first iteration, 410 00:16:57,590 --> 00:16:58,900 going to set i equal to 0. 411 00:16:58,900 --> 00:16:59,900 And it's going to cough. 412 00:16:59,900 --> 00:17:03,050 It's then going to automatically set i equal to 1 and then cough. 413 00:17:03,050 --> 00:17:05,302 It's then going to set i equal to 2 and then cough. 414 00:17:05,302 --> 00:17:07,010 And even though you're not doing anything 415 00:17:07,010 --> 00:17:10,369 with the value of i, because there are three values in this list-- 416 00:17:10,369 --> 00:17:13,940 0, 1, 2-- it's going to cough three times. 417 00:17:13,940 --> 00:17:18,050 But there's a way to do this even more succinctly, because how would you 418 00:17:18,050 --> 00:17:21,505 implement this same idea if you wanted to cough 10 times or 50 times? 419 00:17:21,505 --> 00:17:23,630 I mean, that would get pretty atrocious if you just 420 00:17:23,630 --> 00:17:27,020 had to make a really big list with 0 through 49. 421 00:17:27,020 --> 00:17:27,890 You don't have to.
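A sketch of the for loop over an explicit list; the `values` list is added only to record which values i took on each iteration.

```python
values = []
for i in [0, 1, 2]:   # i takes each value in the list, in order
    print("cough")
    values.append(i)

print(values)  # [0, 1, 2]
```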
422 00:17:27,890 --> 00:17:29,960 There's a special function in Python called 423 00:17:29,960 --> 00:17:32,960 range that does that work for you. 424 00:17:32,960 --> 00:17:37,210 If you want to iterate three times, you literally say range open paren 425 00:17:37,210 --> 00:17:38,362 3 close paren. 426 00:17:38,362 --> 00:17:40,070 And what that's going to do for your code 427 00:17:40,070 --> 00:17:44,150 is, essentially, hand you back three values from 0 to 1 to 2 428 00:17:44,150 --> 00:17:47,990 automatically without you having to hard-code them or write them explicitly. 429 00:17:47,990 --> 00:17:51,773 So now, if you want to cough 50 times, you just change the 3 to a 50. 430 00:17:51,773 --> 00:17:54,690 You don't have to, of course, write everything out in square brackets. 431 00:17:54,690 --> 00:17:58,130 So this is a very common paradigm then in Python for loops. 432 00:17:58,130 --> 00:17:59,240 Well, what about types? 433 00:17:59,240 --> 00:18:00,920 Even this world gets a little simpler. 434 00:18:00,920 --> 00:18:04,310 These were the data types we focused on in C. But a bunch of them 435 00:18:04,310 --> 00:18:06,350 now go away in Python. 436 00:18:06,350 --> 00:18:08,930 We still have bool, with the capitalized True and False. 437 00:18:08,930 --> 00:18:12,120 We still have ints and floats, it turns out. 438 00:18:12,120 --> 00:18:16,520 But we also have strs, which is just a shorter version of the word string. 439 00:18:16,520 --> 00:18:20,780 And whereas in C, we definitely had the notion, the concept of strings, 440 00:18:20,780 --> 00:18:24,890 but we pretended that the word string existed, thanks to the CS50 library-- 441 00:18:24,890 --> 00:18:28,220 in Python, there actually is a data type called str-- 442 00:18:28,220 --> 00:18:29,660 you can just call it string-- 443 00:18:29,660 --> 00:18:33,480 that gives us even more functionality than the CS50 library did.
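As a small sketch of the two for-loop forms and the basic types just described, here are an explicit list, `range`, and the built-in type names. The helper name `cough` is hypothetical.

```python
# An explicit list: on each iteration i is set to 0, then 1, then 2.
for i in [0, 1, 2]:
    print("cough")

# range(3) hands back the same values, 0 up to but not including 3.
print(list(range(3)))  # prints [0, 1, 2]


def cough(times):
    """Cough `times` times; only the argument changes for 3 or 50 coughs."""
    for _ in range(times):  # _ is a conventional name for an unused variable
        print("cough")


cough(3)

# The basic types mentioned above; note that True is capitalized.
print(type(True).__name__, type(50).__name__, type(0.5).__name__, type("hi").__name__)
# prints: bool int float str
```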
444 00:18:33,480 --> 00:18:35,780 So that was just a stepping stone to what exists here. 445 00:18:35,780 --> 00:18:37,940 And there are other data types in Python, too. 446 00:18:37,940 --> 00:18:39,860 In fact, a few of them are just here. 447 00:18:39,860 --> 00:18:42,200 And we'll play today with a few of these data types, 448 00:18:42,200 --> 00:18:44,700 because if you think about what we did the past two or three 449 00:18:44,700 --> 00:18:49,100 weeks introducing not only arrays, but then linked lists and hash tables 450 00:18:49,100 --> 00:18:53,720 and trees and tries and stacks and queues, this whole toolkit of data structures 451 00:18:53,720 --> 00:18:55,250 did we start talking about-- 452 00:18:55,250 --> 00:18:59,930 in Python, wonderfully, if you want a hash table, it comes with the language. 453 00:18:59,930 --> 00:19:03,770 If you want a linked list, it comes with the language-- no more pointers, 454 00:19:03,770 --> 00:19:07,130 no more creation of those low-level data structures yourself. 455 00:19:07,130 --> 00:19:09,140 You can just use them out of the box. 456 00:19:09,140 --> 00:19:12,800 So here's a list, then, to summarize some of the more powerful data types 457 00:19:12,800 --> 00:19:16,430 we get in Python that we did not have in C, unless we wrote them ourselves. 458 00:19:16,430 --> 00:19:19,670 You can have a range, like we just saw, which is just a sequence of numbers, 459 00:19:19,670 --> 00:19:22,130 like 0, 1, 2, or anything else. 460 00:19:22,130 --> 00:19:25,400 We can have a list, which is a sequence of mutable values, which 461 00:19:25,400 --> 00:19:28,940 is a fancy way of saying they are values that can be changed. 462 00:19:28,940 --> 00:19:31,940 Mutable, like mutation, just means you can change those values. 463 00:19:31,940 --> 00:19:36,050 So you can add to, remove, and replace the values in the initial list.
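A minimal sketch of the three list operations named above (add, remove, replace), using made-up values:

```python
values = [0, 1, 2]
values.append(3)    # add: the list grows on its own, with no malloc or realloc
values.remove(0)    # remove: deletes the first element equal to 0
values[0] = 10      # replace: lists are mutable, so elements can be reassigned
print(values)       # prints [10, 2, 3]
print(len(values))  # prints 3
```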
464 00:19:36,050 --> 00:19:39,800 A list, then, in Python is like an array in C, 465 00:19:39,800 --> 00:19:43,670 but that can be automatically increased in size or decreased in size. 466 00:19:43,670 --> 00:19:46,740 So you don't have to do all of that malloc or realloc stuff anymore. 467 00:19:46,740 --> 00:19:49,490 A tuple is a sequence of immutable values, which 468 00:19:49,490 --> 00:19:52,790 is a fancy way of saying a sequence of values that once you put them there, 469 00:19:52,790 --> 00:19:54,050 you can't change them. 470 00:19:54,050 --> 00:19:56,900 So this is sometimes useful for, like, coordinates, x comma y, 471 00:19:56,900 --> 00:19:58,740 for GPS coordinates or the like. 472 00:19:58,740 --> 00:20:01,073 So when you know you're not going to change the values, 473 00:20:01,073 --> 00:20:02,690 you can use a tuple instead. 474 00:20:02,690 --> 00:20:06,900 Dict, or dictionary, is a collection of key-value pairs. 475 00:20:06,900 --> 00:20:11,240 And this is the abstract data type, to borrow a word from a couple weeks ago, 476 00:20:11,240 --> 00:20:14,600 that underneath the hood is implemented with the thing we called-- 477 00:20:14,600 --> 00:20:16,610 and you built for Pset5-- 478 00:20:16,610 --> 00:20:17,870 a hash table. 479 00:20:17,870 --> 00:20:19,790 So Python comes with hash tables. 480 00:20:19,790 --> 00:20:23,030 They're called dictionaries, abbreviated dict in the language. 481 00:20:23,030 --> 00:20:26,690 And this simply will allow you to-- if you want a hash table, 482 00:20:26,690 --> 00:20:29,380 just declare it, just like you would an int or a float. 483 00:20:29,380 --> 00:20:31,487 There's no more implementing that yourself. 484 00:20:31,487 --> 00:20:34,070 And then, lastly, at least among the ones we'll look at today, 485 00:20:34,070 --> 00:20:36,170 a set is a collection of unique values. 486 00:20:36,170 --> 00:20:38,085 You might recall this term from a math class.
487 00:20:38,085 --> 00:20:39,710 So this is just a collection of values. 488 00:20:39,710 --> 00:20:42,630 But even if you put multiple copies of the same value in there, 489 00:20:42,630 --> 00:20:44,750 it's going to throw the duplicates away for you, 490 00:20:44,750 --> 00:20:46,633 which is sometimes just convenient. 491 00:20:46,633 --> 00:20:48,050 And there are other data types, too. 492 00:20:48,050 --> 00:20:50,730 But that's more than enough to get us started today. 493 00:20:50,730 --> 00:20:52,860 Indeed, everything we're going to look at today 494 00:20:52,860 --> 00:20:55,160 ultimately is derivative of the documentation. 495 00:20:55,160 --> 00:20:57,650 And Python's documentation is very thorough. 496 00:20:57,650 --> 00:21:00,290 But I will disclaim it's not super user friendly. 497 00:21:00,290 --> 00:21:03,135 And so starting this week and beyond, in really any language, 498 00:21:03,135 --> 00:21:04,760 Google is going to be your friend. 499 00:21:04,760 --> 00:21:06,680 And sometimes Stack Overflow is going to be your friend. 500 00:21:06,680 --> 00:21:09,260 And your teaching fellows in this course, for instance, will certainly 501 00:21:09,260 --> 00:21:11,968 be your friends, not in the sense that you should start googling 502 00:21:11,968 --> 00:21:14,960 how to implement problem set 6, but rather, how 503 00:21:14,960 --> 00:21:17,480 do you iterate over values in Python? 504 00:21:17,480 --> 00:21:21,050 Or how do you convert a string to lowercase? 505 00:21:21,050 --> 00:21:23,435 Those kinds of building blocks that, frankly, 506 00:21:23,435 --> 00:21:26,060 are not intellectually interesting to memorize from our class-- 507 00:21:26,060 --> 00:21:28,852 you can just grab them off the shelf or off Google when you need them-- 508 00:21:28,852 --> 00:21:31,640 is exactly how folks like Brian and me and [INAUDIBLE] and Rodrigo 509 00:21:31,640 --> 00:21:32,855 program every day.
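The tuple, dict, and set types described above can be sketched together with made-up values. The variable names are for illustration only.

```python
# tuple: an immutable sequence, handy for a fixed (x, y) coordinate
point = (3, 4)
x, y = point  # unpack both values at once
try:
    point[0] = 5  # tuples reject item assignment
except TypeError:
    print("tuples cannot be changed")

# dict: key-value pairs; CPython implements dict as a hash table
counts = {"cat": 1}
counts["dog"] = 2        # insert a new key
counts["cat"] += 1       # update an existing key
print(counts["cat"])     # prints 2
print("bird" in counts)  # prints False

# set: unique values only; duplicates are discarded automatically
animals = set()
for name in ["cat", "dog", "cat", "bird", "dog"]:
    animals.add(name)
print(sorted(animals))   # sets are unordered, so sort for display: ['bird', 'cat', 'dog']
```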
510 00:21:32,855 --> 00:21:35,480 You don't necessarily memorize everything in the documentation. 511 00:21:35,480 --> 00:21:37,180 But you know how to find it. 512 00:21:37,180 --> 00:21:38,930 And indeed, among the goals for this class 513 00:21:38,930 --> 00:21:41,090 is to take off the last of those training wheels 514 00:21:41,090 --> 00:21:45,440 and actually have you teach yourself new things on your own, 515 00:21:45,440 --> 00:21:48,240 having done it with the support structure of the class itself. 516 00:21:48,240 --> 00:21:51,920 So with that said, let's go ahead and do a couple of demonstrations 517 00:21:51,920 --> 00:21:53,720 of just what we can do with this language 518 00:21:53,720 --> 00:21:57,710 and why it's not only so powerful, but also so popular right now. 519 00:21:57,710 --> 00:22:02,870 I'm going to go ahead, for instance, and open up a file called-- 520 00:22:02,870 --> 00:22:04,730 let's call it blur.py. 521 00:22:04,730 --> 00:22:08,300 And blur.py might be reminiscent of what we did a few weeks back 522 00:22:08,300 --> 00:22:11,090 in Pset4, where in C, you implemented a set of filters. 523 00:22:11,090 --> 00:22:13,430 And blurring an image was one of them. 524 00:22:13,430 --> 00:22:17,510 And let me go ahead and open up the image here, for instance. 525 00:22:17,510 --> 00:22:25,070 I have in the src6 directory today a whole bunch of examples, such as-- 526 00:22:25,070 --> 00:22:28,550 the image I want is going to be in filter. 527 00:22:28,550 --> 00:22:31,030 This was the one we looked at some weeks ago. 528 00:22:31,030 --> 00:22:34,830 So we had this nice picture of [INAUDIBLE] bridge down by the river. 529 00:22:34,830 --> 00:22:38,300 And it's super pristine, nice and clear, because it's a very high-quality photo. 530 00:22:38,300 --> 00:22:41,292 But let's try to blur this, this time using Python. 531 00:22:41,292 --> 00:22:42,750 So I'm going to go over to blur.py.
532 00:22:42,750 --> 00:22:46,710 And I'm going to go ahead and do the equivalent in Python of including 533 00:22:46,710 --> 00:22:48,390 some library or some header files. 534 00:22:48,390 --> 00:22:50,310 But you don't say include in Python. 535 00:22:50,310 --> 00:22:51,810 You, instead, say import. 536 00:22:51,810 --> 00:22:55,040 And I'm going to say from PIL, which is the Pillow library-- 537 00:22:55,040 --> 00:22:56,790 I'm going to go ahead and import something 538 00:22:56,790 --> 00:22:58,950 called Image and something called ImageFilter. 539 00:22:58,950 --> 00:23:02,160 I only know these exist by having read the documentation for them 540 00:23:02,160 --> 00:23:07,440 and knowing that I can include or import those special features. 541 00:23:07,440 --> 00:23:08,770 And let's go ahead and do this. 542 00:23:08,770 --> 00:23:11,100 I'm going to go ahead and open up the image as it stands now. 543 00:23:11,100 --> 00:23:12,190 And I'll store that in a variable called before. 544 00:23:12,190 --> 00:23:17,580 So I'm going to go ahead and open an image called bridge.bmp. 545 00:23:17,580 --> 00:23:20,400 And then, I'm going to go ahead and after that, say, you know what? 546 00:23:20,400 --> 00:23:27,630 Go ahead and run the before image through a filter from ImageFilter, 547 00:23:27,630 --> 00:23:30,960 specifically ImageFilter.BLUR. 548 00:23:30,960 --> 00:23:36,540 And then, after that, I'm going to go ahead and say after.save("out.bmp"). 549 00:23:36,540 --> 00:23:38,140 And I'm going to save my file. 550 00:23:38,140 --> 00:23:41,400 So once this has been read here-- 551 00:23:41,400 --> 00:23:43,480 there we go-- once this has been saved here, 552 00:23:43,480 --> 00:23:45,460 now I'm going to go ahead and do the following. 553 00:23:45,460 --> 00:23:47,760 Let me go into my file directory here. 554 00:23:47,760 --> 00:23:49,990 Let me open my terminal window here.
555 00:23:49,990 --> 00:23:53,460 Let me go ahead and grab a copy of this from my src6 directory here, 556 00:23:53,460 --> 00:23:55,980 which is in my filter subdirectory today-- 557 00:23:55,980 --> 00:23:57,600 bridge.bmp. 558 00:23:57,600 --> 00:24:01,228 And let me go ahead now and run python blur.py. 559 00:24:01,228 --> 00:24:03,020 So I'm going to go ahead and hit Enter now. 560 00:24:03,020 --> 00:24:05,957 Notice that another file was just created in my directory here. 561 00:24:05,957 --> 00:24:08,040 Let's go ahead and look at the nice pretty bridge, 562 00:24:08,040 --> 00:24:09,550 which is where we started. 563 00:24:09,550 --> 00:24:11,260 Let me shrink my terminal window here. 564 00:24:11,260 --> 00:24:13,680 Let me open now out.bmp. 565 00:24:13,680 --> 00:24:22,890 And voila-- blurred-- before, after, before, after. 566 00:24:22,890 --> 00:24:26,290 But what's more important-- three lines of code-- 567 00:24:26,290 --> 00:24:28,350 so that's how you would implement the same thing 568 00:24:28,350 --> 00:24:31,080 as Pset4's blur feature in Python. 569 00:24:31,080 --> 00:24:31,800 But wait. 570 00:24:31,800 --> 00:24:33,090 There's more. 571 00:24:33,090 --> 00:24:34,710 What about Pset5? 572 00:24:34,710 --> 00:24:38,328 In Pset5, recall, you implemented a hash table. 573 00:24:38,328 --> 00:24:41,620 And indeed, you decided how to implement the underlying linked list and the array 574 00:24:41,620 --> 00:24:42,162 and so forth. 575 00:24:42,162 --> 00:24:42,995 Well, you know what? 576 00:24:42,995 --> 00:24:46,080 Let me go ahead and create another file, this time, in Python, which 577 00:24:46,080 --> 00:24:48,598 wasn't allowed two weeks ago, but is allowed now. 578 00:24:48,598 --> 00:24:50,640 And I'm going to go ahead and implement this how? 579 00:24:50,640 --> 00:24:53,557 Well, I had a few different data structures to choose from in Python-- 580 00:24:53,557 --> 00:24:58,170 dict for dictionary and list and range and so forth and then also set.
581 00:24:58,170 --> 00:25:00,128 And I could use dict or dictionary. 582 00:25:00,128 --> 00:25:02,920 But I'm actually going to use a set, because what really is a dictionary? 583 00:25:02,920 --> 00:25:04,045 It's a set of unique words. 584 00:25:04,045 --> 00:25:06,030 So I'm going to use something called a set. 585 00:25:06,030 --> 00:25:08,840 So I'm going to go ahead and give myself a variable called words. 586 00:25:08,840 --> 00:25:11,340 And I'm going to initialize it to an empty set, if you will, 587 00:25:11,340 --> 00:25:13,620 just a container that can grow to fit values. 588 00:25:13,620 --> 00:25:16,500 But just in case I screw up and put duplicates in there, that's OK. 589 00:25:16,500 --> 00:25:18,420 The set is going to get rid of them for me. 590 00:25:18,420 --> 00:25:19,710 And then, recall for-- 591 00:25:19,710 --> 00:25:24,750 or sorry-- for this program, not speller.py, but rather dictionary.py 592 00:25:24,750 --> 00:25:29,370 to correspond with dictionary.c, we had a few functions. 593 00:25:29,370 --> 00:25:31,560 Now, in Python, the way you implement a function 594 00:25:31,560 --> 00:25:34,800 is not by saying int main void or something like that. 595 00:25:34,800 --> 00:25:37,470 You, instead, more simply say def for define 596 00:25:37,470 --> 00:25:40,620 and then the name of the function you want, like check, and then the inputs 597 00:25:40,620 --> 00:25:42,255 to that function, like word. 598 00:25:42,255 --> 00:25:43,380 And I'll come back to this. 599 00:25:43,380 --> 00:25:45,090 And I'm just going to say TODO for a moment, 600 00:25:45,090 --> 00:25:46,882 because I'm going to go ahead and predefine 601 00:25:46,882 --> 00:25:51,305 my other functions, like load, which took a dictionary file name as input. 602 00:25:51,305 --> 00:25:53,430 So I'm going to go ahead and come back and do that. 603 00:25:53,430 --> 00:25:56,100 I, then, had a size function that took no inputs. 604 00:25:56,100 --> 00:25:58,420 I'm going to go ahead and do that.
605 00:25:58,420 --> 00:26:01,150 And then, down here, I had an unload function. 606 00:26:01,150 --> 00:26:03,520 So I'm going to go ahead and come back and do that. 607 00:26:03,520 --> 00:26:05,943 So how do I now implement each of these functions? 608 00:26:05,943 --> 00:26:07,110 Well, let's start with load. 609 00:26:07,110 --> 00:26:10,460 After all, if I'm handed the dictionary, first thing I wanted to do in Pset4-- 610 00:26:10,460 --> 00:26:12,810 or Pset5-- was load it into memory. 611 00:26:12,810 --> 00:26:15,570 Well, it turns out in Python, you can do something like this-- 612 00:26:15,570 --> 00:26:21,427 file = open(dictionary), which is so close to C. But it's open instead of fopen. 613 00:26:21,427 --> 00:26:23,010 And I'm going to open it in read mode. 614 00:26:23,010 --> 00:26:26,040 So far, this actually looks quite like the C version. 615 00:26:26,040 --> 00:26:29,260 But now, if I want to iterate over every word in the file, 616 00:26:29,260 --> 00:26:31,950 it turns out I can use a for loop, because a for loop in Python 617 00:26:31,950 --> 00:26:34,290 is way more powerful than a for loop in C. 618 00:26:34,290 --> 00:26:39,090 I can literally say for line in file. 619 00:26:39,090 --> 00:26:42,930 And then, here, I can go ahead and add to my set of words, 620 00:26:42,930 --> 00:26:46,350 which is in this variable called words, literally using a function called 621 00:26:46,350 --> 00:26:48,750 add, passing in that particular line-- 622 00:26:48,750 --> 00:26:50,460 that is, the word from the file. 623 00:26:50,460 --> 00:26:52,342 And then, you know, after that, file.close() 624 00:26:52,342 --> 00:26:53,550 is how I'm going to close it. 625 00:26:53,550 --> 00:26:54,630 And then, all seems well. 626 00:26:54,630 --> 00:26:56,850 I'm going to go ahead and return True. 627 00:26:56,850 --> 00:26:58,680 Now, there's one bug here at the moment.
628 00:26:58,680 --> 00:27:03,120 Every line in the dictionary actually ended with what character 629 00:27:03,120 --> 00:27:05,220 technically, even though you don't see it, per se? 630 00:27:05,220 --> 00:27:06,270 AUDIENCE: A new line. 631 00:27:06,270 --> 00:27:07,562 DAVID MALAN: A new line, right? 632 00:27:07,562 --> 00:27:09,562 Every word in the file ended with a backslash n, 633 00:27:09,562 --> 00:27:12,020 even though when you open the file, we humans don't see it. 634 00:27:12,020 --> 00:27:12,780 But it is there. 635 00:27:12,780 --> 00:27:13,450 So that's OK. 636 00:27:13,450 --> 00:27:16,110 If you want to go ahead and strip off the trailing new line, so 637 00:27:16,110 --> 00:27:19,080 to speak, at the end of every line, you can just 638 00:27:19,080 --> 00:27:22,590 take the current line from the file and call rstrip on it, 639 00:27:22,590 --> 00:27:24,480 where the r in rstrip means right, as in strip from the right. 640 00:27:24,480 --> 00:27:27,420 So remove from the end of the string what character? 641 00:27:27,420 --> 00:27:30,110 Backslash n-- and that's going to now look at the line, 642 00:27:30,110 --> 00:27:33,470 chop off the backslash n, and pass as input to this 643 00:27:33,470 --> 00:27:36,660 add function the word from the dictionary. 644 00:27:36,660 --> 00:27:37,160 All right. 645 00:27:37,160 --> 00:27:37,850 What remains? 646 00:27:37,850 --> 00:27:40,290 Well, up here, how do I check the dictionary? 647 00:27:40,290 --> 00:27:43,310 Well, it turns out in Python, you can use conditions 648 00:27:43,310 --> 00:27:45,260 even more powerfully than in C. And if you 649 00:27:45,260 --> 00:27:49,740 want to know if a word is in a variable, like a word is in a set called words, 650 00:27:49,740 --> 00:27:54,410 you just ask the question, if word in words, you know what? 651 00:27:54,410 --> 00:27:56,520 Go ahead and return True.
652 00:27:56,520 --> 00:28:00,890 Else, go ahead and return False, although slight bug-- 653 00:28:00,890 --> 00:28:03,770 we also had to deal with capitalization in Pset5, right? 654 00:28:03,770 --> 00:28:07,550 The user's input from the file, the text, might be uppercase or lowercase. 655 00:28:07,550 --> 00:28:09,642 No big deal-- you want to lowercase a word? 656 00:28:09,642 --> 00:28:11,600 You don't have to do it character by character. 657 00:28:11,600 --> 00:28:16,790 Just take word, which is the word you're looking for, dot, which means go inside 658 00:28:16,790 --> 00:28:19,640 of it, just like a struct in C. And here, 659 00:28:19,640 --> 00:28:23,760 call a function that's built into that string called lower. 660 00:28:23,760 --> 00:28:24,260 All right. 661 00:28:24,260 --> 00:28:26,610 Well, I'm getting a little bored with implementing this. 662 00:28:26,610 --> 00:28:27,590 So let's finish this up. 663 00:28:27,590 --> 00:28:28,257 Let me go ahead. 664 00:28:28,257 --> 00:28:31,680 And how do I check how many words are in my dictionary? 665 00:28:31,680 --> 00:28:35,690 Well, just ask what the length of that set is. 666 00:28:35,690 --> 00:28:37,340 And how do you go about in free-- 667 00:28:37,340 --> 00:28:41,240 how do you go about freeing all of the memory used by your program in Python? 668 00:28:41,240 --> 00:28:43,730 How do you go about undoing the effects? 669 00:28:43,730 --> 00:28:44,960 Well, you don't. 670 00:28:44,960 --> 00:28:45,950 It's done for you. 671 00:28:45,950 --> 00:28:48,390 So we'll just return True. 672 00:28:48,390 --> 00:28:50,210 So this, then, is-- 673 00:28:50,210 --> 00:28:51,860 I'm sad to say-- 674 00:28:51,860 --> 00:28:57,710 I mean, excited to say-- is the entirety of Pset5 implemented in Python. 675 00:28:57,710 --> 00:29:01,100 So why did we do what we did? 676 00:29:01,100 --> 00:29:03,650 Well, let's actually run an example here.
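Putting the pieces above together, dictionary.py might look roughly like the sketch below. It follows the four functions dictated in the lecture, with two small liberties. First, `with open(...)` closes the file automatically instead of calling `file.close()` explicitly. Second, `check` returns the membership test directly instead of using `if`/`else`. It also assumes, as in Pset5, that the dictionary file holds one lowercase word per line.

```python
# dictionary.py: a sketch of Pset5's functions using a built-in set
words = set()


def check(word):
    """Return True if word is in the dictionary, ignoring case."""
    return word.lower() in words


def load(dictionary):
    """Load every word in the file named `dictionary` into memory."""
    with open(dictionary, "r") as file:   # the file is closed automatically
        for line in file:
            words.add(line.rstrip("\n"))  # chop off the trailing newline
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free: Python reclaims the memory for us."""
    return True
```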
677 00:29:03,650 --> 00:29:06,702 So I've got two windows open now-- two terminal windows-- 678 00:29:06,702 --> 00:29:07,910 on the left and on the right. 679 00:29:07,910 --> 00:29:09,800 On the left is my implementation of speller 680 00:29:09,800 --> 00:29:12,230 in C from a couple of weeks ago. 681 00:29:12,230 --> 00:29:15,110 Let me go ahead and run speller on one of the bigger files, 682 00:29:15,110 --> 00:29:17,400 like Shakespeare was one of the bigger files. 683 00:29:17,400 --> 00:29:20,840 So let's go ahead and see all of the misspelled words in Shakespeare, 684 00:29:20,840 --> 00:29:23,600 and using a hash table two weeks ago, looks 685 00:29:23,600 --> 00:29:31,190 like it took me 0.51 seconds to look for misspellings in shakespeare.txt. 686 00:29:31,190 --> 00:29:32,210 How about in Python? 687 00:29:32,210 --> 00:29:35,420 Well, over here, I have a copy of what we just wrote. 688 00:29:35,420 --> 00:29:39,022 This is also using a program called speller.py, which I didn't pull up, 689 00:29:39,022 --> 00:29:39,980 but I wrote in advance. 690 00:29:39,980 --> 00:29:41,690 And this is not the code that's timed. 691 00:29:41,690 --> 00:29:44,257 Only dictionary.c and dictionary.py are timed. 692 00:29:44,257 --> 00:29:47,090 So I'm going to go ahead and run my Python version of speller, which 693 00:29:47,090 --> 00:29:50,090 is going to use dictionary.py that I just 694 00:29:50,090 --> 00:29:54,140 wrote on shakespeare.txt-- same file, right-hand side. 695 00:29:54,140 --> 00:29:57,200 You'll see the same words quickly flying by on the screen, 696 00:29:57,200 --> 00:30:02,180 but you might notice something already. 697 00:30:02,180 --> 00:30:06,200 So there's always a tradeoff in computer science and certainly in programming. 698 00:30:06,200 --> 00:30:07,820 There's always a price paid.
699 00:30:07,820 --> 00:30:11,990 Wowed as you were by how fast this is, relatively speaking, and more 700 00:30:11,990 --> 00:30:16,010 compellingly how many seconds it took me to implement Pset5 701 00:30:16,010 --> 00:30:20,540 in Python and presumably how many hours it took you to implement Pset5 in C, 702 00:30:20,540 --> 00:30:24,390 that, too, developer time is a resource, a human resource. 703 00:30:24,390 --> 00:30:25,820 But we are paying a price. 704 00:30:25,820 --> 00:30:27,950 And based on the output of C on the left and Python 705 00:30:27,950 --> 00:30:31,930 on the right, what apparently is at least one of the prices paid? 706 00:30:31,930 --> 00:30:33,340 AUDIENCE: It's slow. 707 00:30:33,340 --> 00:30:33,700 DAVID MALAN: Say it again. 708 00:30:33,700 --> 00:30:34,340 AUDIENCE: Slower. 709 00:30:34,340 --> 00:30:35,673 DAVID MALAN: It's slower, right? 710 00:30:35,673 --> 00:30:39,680 Whereas this took 0.51 seconds in C, the same problem solved in Python 711 00:30:39,680 --> 00:30:42,927 took 1.45 seconds in Python. 712 00:30:42,927 --> 00:30:45,260 Now, frankly, thinking back two weeks and the many hours 713 00:30:45,260 --> 00:30:48,050 you probably spent on Pset5, who cares? 714 00:30:48,050 --> 00:30:49,590 Like, oh, my God. 715 00:30:49,590 --> 00:30:50,090 Sure. 716 00:30:50,090 --> 00:30:51,530 It's three times slower. 717 00:30:51,530 --> 00:30:53,990 But my God, the number of hours it took to implement 718 00:30:53,990 --> 00:30:57,020 that solution-- but it really depends on what your goals are, right? 719 00:30:57,020 --> 00:31:00,030 If you're optimizing for spending as little time as possible on a P set, 720 00:31:00,030 --> 00:31:02,030 odds are you're going to want to go with Python. 
721 00:31:02,030 --> 00:31:04,880 But if you're implementing a spell checker used every day 722 00:31:04,880 --> 00:31:08,840 by thousands or millions of people, for instance, on Google or Facebook 723 00:31:08,840 --> 00:31:11,600 or even in Google Docs and the like, you know what? 724 00:31:11,600 --> 00:31:16,010 You probably don't want to spend three times as many seconds or fractions 725 00:31:16,010 --> 00:31:19,070 of seconds just because it's easier to write it in Python, 726 00:31:19,070 --> 00:31:22,842 because that threefold increase might cost your users more time. 727 00:31:22,842 --> 00:31:24,800 It might cost you three times as much hardware. 728 00:31:24,800 --> 00:31:26,592 It might cost you three times as much money 729 00:31:26,592 --> 00:31:29,790 to buy three times as many servers to do the exact same work. 730 00:31:29,790 --> 00:31:33,260 So again, this is going to be representative 731 00:31:33,260 --> 00:31:37,070 of the types of tradeoffs in programming, 732 00:31:37,070 --> 00:31:39,540 but my apologies for not mentioning this two weeks ago. 733 00:31:39,540 --> 00:31:40,040 All right. 734 00:31:40,040 --> 00:31:43,130 So let's now see if we can't tease apart some 735 00:31:43,130 --> 00:31:45,380 of the differences in this language by way of examples 736 00:31:45,380 --> 00:31:48,560 by walking through a number of the examples we've done in weeks past. 737 00:31:48,560 --> 00:31:50,510 And to make it easier to see before and after, 738 00:31:50,510 --> 00:31:52,730 let me go ahead and use this feature of the IDE-- 739 00:31:52,730 --> 00:31:55,100 turns out if you click this little white icon here, 740 00:31:55,100 --> 00:31:56,720 you can split your screen like this. 741 00:31:56,720 --> 00:31:58,720 So I'm going to adopt the habit for a little bit 742 00:31:58,720 --> 00:32:04,550 now of opening one file on the left in C and one file on the right in Python 743 00:32:04,550 --> 00:32:05,270 instead.
744 00:32:05,270 --> 00:32:08,180 So let's go into, for instance, this directory called 745 00:32:08,180 --> 00:32:12,350 One, which has all of my programs from week 1 written in C, 746 00:32:12,350 --> 00:32:16,170 as well as some new ones for today that we'll write mostly in real time. 747 00:32:16,170 --> 00:32:19,460 So here is a program in week 1 that simply did this. 748 00:32:19,460 --> 00:32:20,920 It gets the user's name. 749 00:32:20,920 --> 00:32:23,090 How do we go about implementing this in Python? 750 00:32:23,090 --> 00:32:26,700 Well, let me go ahead and create a file called string.py. 751 00:32:26,700 --> 00:32:32,100 And as before, I'm going to go ahead now and convert this from before to after. 752 00:32:32,100 --> 00:32:34,800 However, this get_string function is, for the moment, something 753 00:32:34,800 --> 00:32:36,210 that we give you in CS50. 754 00:32:36,210 --> 00:32:37,897 There is a CS50 library for Python. 755 00:32:37,897 --> 00:32:40,230 But we're only going to use it for a week or two's time. 756 00:32:40,230 --> 00:32:41,855 And we'll take that training wheel off. 757 00:32:41,855 --> 00:32:45,570 To use it, you can either say quite simply import cs50, 758 00:32:45,570 --> 00:32:48,900 which is similar to include cs50.h. 759 00:32:48,900 --> 00:32:51,590 Or you can more explicitly say from cs50, 760 00:32:51,590 --> 00:32:54,393 import the actual function you want, like get_string. 761 00:32:54,393 --> 00:32:57,060 So I'm going to go ahead and do it the more explicit way for now 762 00:32:57,060 --> 00:32:59,850 so that I can then do s gets get_string. 763 00:32:59,850 --> 00:33:02,340 What's your name question mark? 764 00:33:02,340 --> 00:33:05,550 And I will put a backslash n in here, because get_string is not print. 765 00:33:05,550 --> 00:33:07,785 It doesn't presumptuously give you a new line.
766 00:33:07,785 --> 00:33:10,410 And then, I'm going to go ahead and print out the user's name-- 767 00:33:10,410 --> 00:33:13,490 hello comma plus s. 768 00:33:13,490 --> 00:33:16,080 I'm going to save my file, go down to my terminal window, 769 00:33:16,080 --> 00:33:18,260 and run Python on string.py. 770 00:33:18,260 --> 00:33:21,280 I'm going to go ahead then and when prompted, type my name David. 771 00:33:21,280 --> 00:33:24,030 And hopefully, it's going to say hello comma David. 772 00:33:24,030 --> 00:33:26,910 Just to warm up here, too, we don't need to use the plus operator. 773 00:33:26,910 --> 00:33:29,910 I can, instead, pass s to print as a second argument, 774 00:33:29,910 --> 00:33:34,247 getting rid of the space inside of hello and now rerun this program. 775 00:33:34,247 --> 00:33:37,080 And I'm hopefully going to see the exact same effect-- for instance, 776 00:33:37,080 --> 00:33:39,750 if Brian types his name, hello, Brian. 777 00:33:39,750 --> 00:33:42,060 And if I really want to get fancy, recall 778 00:33:42,060 --> 00:33:43,560 there's one other way I can do this. 779 00:33:43,560 --> 00:33:47,490 If I want to plug in the user's name here, as in Scratch, 780 00:33:47,490 --> 00:33:49,650 I can put what in between curly braces? 781 00:33:49,650 --> 00:33:50,670 AUDIENCE: s. 782 00:33:50,670 --> 00:33:54,270 DAVID MALAN: s, which is the name of the variable I've chosen, but notice this. 783 00:33:54,270 --> 00:33:58,860 If I get a little sloppy and I just use the curly braces and then I run Python 784 00:33:58,860 --> 00:34:02,070 on string.py, and type in, for instance, Emma's name-- 785 00:34:02,070 --> 00:34:03,248 that is not Emma's name. 786 00:34:03,248 --> 00:34:04,290 It's taking me literally. 787 00:34:04,290 --> 00:34:06,600 I have to turn it into an f-string or format string, 788 00:34:06,600 --> 00:34:08,370 even though that syntax looks weird.
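The greeting variants walked through above can be sketched without the CS50 library by passing the name in directly instead of calling `get_string`. The helper name `greetings` is hypothetical. The third entry in its list reproduces the sloppy version that forgot the `f`.

```python
def greetings(s):
    """Build the lecture's greeting in several ways for a given name s."""
    return [
        "hello, " + s,  # concatenation with +
        f"hello, {s}",  # f-string: {s} is replaced by the value of s
        "hello, {s}",   # no f prefix: the braces are kept literally
    ]


name = "Emma"  # stands in for what get_string would return
for greeting in greetings(name):
    print(greeting)
print("hello,", name)  # print puts a space between its arguments
```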
789 00:34:08,370 --> 00:34:10,920 Now, if I rerun it and type Emma, we'll hopefully 790 00:34:10,920 --> 00:34:15,250 be greeting, indeed, Emma-- so just some warm-ups to map one to the other. 791 00:34:15,250 --> 00:34:18,690 But let's see what else we can do here in Python. 792 00:34:18,690 --> 00:34:23,580 Well, recall in C, we had this example, int.c. 793 00:34:23,580 --> 00:34:26,332 And this was a relatively simple example whose purpose in life 794 00:34:26,332 --> 00:34:28,290 was just to get an integer and then actually do 795 00:34:28,290 --> 00:34:32,159 some math by multiplying age by 365 to figure out roughly 796 00:34:32,159 --> 00:34:33,880 how many days old you are. 797 00:34:33,880 --> 00:34:36,270 Well, in Python, we can do this pretty similarly. 798 00:34:36,270 --> 00:34:42,270 Let me go ahead and open up a file that I will call int.py. 799 00:34:42,270 --> 00:34:46,072 And at the top of this file, I'm going to do from cs50 import get_int, 800 00:34:46,072 --> 00:34:48,239 because that's the function I want to use this time. 801 00:34:48,239 --> 00:34:50,989 I'm going to go ahead and get the user's age with get_int and say, 802 00:34:50,989 --> 00:34:53,120 what's your age backslash n. 803 00:34:53,120 --> 00:34:55,620 And then, I'm going to go ahead and print out-- not printf-- 804 00:34:55,620 --> 00:35:00,222 but print out the same thing as last time-- you are at least-- 805 00:35:00,222 --> 00:35:02,430 let me go ahead and give myself a little more room-- 806 00:35:02,430 --> 00:35:05,175 you are at least-- 807 00:35:05,175 --> 00:35:09,100 I'll come back to this-- something days period. 808 00:35:09,100 --> 00:35:10,600 So how do I now do this? 809 00:35:10,600 --> 00:35:14,010 Well, it turns out that you can plug in not just values, but expressions. 810 00:35:14,010 --> 00:35:19,352 I can actually say age times 365 inside the curly braces.
811 00:35:19,352 --> 00:35:21,810 So I don't need to, therefore, give myself another variable 812 00:35:21,810 --> 00:35:23,370 or use any commas. 813 00:35:23,370 --> 00:35:26,073 But of course, I'm missing one thing still. 814 00:35:26,073 --> 00:35:26,880 AUDIENCE: F. 815 00:35:26,880 --> 00:35:28,930 DAVID MALAN: The f to make this a format string, 816 00:35:28,930 --> 00:35:30,388 and you'll notice the IDE is smart. 817 00:35:30,388 --> 00:35:32,790 As soon as it notices, oh, that's a format string, 818 00:35:32,790 --> 00:35:35,100 it highlights in different colors the values 819 00:35:35,100 --> 00:35:37,740 that will be interpolated, the code inside your string 820 00:35:37,740 --> 00:35:39,010 that will be executed. 821 00:35:39,010 --> 00:35:44,400 So now, if I do Python of int.py and type in my age, for instance, 50, 822 00:35:44,400 --> 00:35:49,180 looks like I'm at least 18,000 days old, in this case. 823 00:35:49,180 --> 00:35:49,680 All right. 824 00:35:49,680 --> 00:35:52,080 So let's see what more we have in Python. 825 00:35:52,080 --> 00:35:55,020 Well, it turns out we had conditions in C. Let me go ahead 826 00:35:55,020 --> 00:35:59,580 and open up, for instance, conditions.c from last time. 827 00:35:59,580 --> 00:36:02,010 And we had this program here, where we prompted the user 828 00:36:02,010 --> 00:36:03,960 for a couple of integers, x and y. 829 00:36:03,960 --> 00:36:06,960 And then, we just compared the two and said x is less than y, 830 00:36:06,960 --> 00:36:09,030 or x is greater than y. 831 00:36:09,030 --> 00:36:11,010 Or x is equal to y. 832 00:36:11,010 --> 00:36:13,800 Well, this one I can type up pretty succinctly, too-- 833 00:36:13,800 --> 00:36:20,345 conditions.py-- let me go ahead and say from cs50 import get_int. 834 00:36:20,345 --> 00:36:22,470 Then, let me go ahead and get an int from the user. 835 00:36:22,470 --> 00:36:24,000 And I'm going to call it x. 
836 00:36:24,000 --> 00:36:26,340 Let me go ahead and get another int from the user. 837 00:36:26,340 --> 00:36:27,600 And I'll call it-- oops-- 838 00:36:27,600 --> 00:36:29,760 get_int-- get_int. 839 00:36:29,760 --> 00:36:31,597 Let me go ahead and call that y. 840 00:36:31,597 --> 00:36:33,180 And then, let's just ask the question. 841 00:36:33,180 --> 00:36:34,670 If x is less than y-- oops-- 842 00:36:34,670 --> 00:36:35,250 [LAUGHS] 843 00:36:35,250 --> 00:36:42,030 --if x is less than y, go ahead and print x is less than y. 844 00:36:42,030 --> 00:36:44,235 Else if or-- 845 00:36:44,235 --> 00:36:45,110 AUDIENCE: [INAUDIBLE] 846 00:36:45,110 --> 00:36:46,580 DAVID MALAN: --elif-- slightly more succinct-- 847 00:36:46,580 --> 00:36:48,270 so you'll have to get used to it. 848 00:36:48,270 --> 00:36:49,830 x is greater than y. 849 00:36:49,830 --> 00:36:55,440 Let's go ahead and print out x is greater than y else-- 850 00:36:55,440 --> 00:37:02,790 I'm going to go ahead and say by deduction, that x must be equal to y. 851 00:37:02,790 --> 00:37:04,020 I'll save that file. 852 00:37:04,020 --> 00:37:06,960 I'll go ahead and run Python on conditions.py. 853 00:37:06,960 --> 00:37:10,020 I'll give myself two numbers just to do a quick cursory test. 854 00:37:10,020 --> 00:37:11,580 And indeed, x is less than y. 855 00:37:11,580 --> 00:37:13,440 And I trust if I keep running it, hopefully 856 00:37:13,440 --> 00:37:16,560 it should bear out that the rest of it is correct, as well. 857 00:37:16,560 --> 00:37:17,060 All right. 858 00:37:17,060 --> 00:37:18,930 So pretty one-to-one mapping here-- let's 859 00:37:18,930 --> 00:37:21,388 now start to do something that's a little more interesting. 860 00:37:21,388 --> 00:37:25,560 You might recall from week 1, we had this simple agreement program, 861 00:37:25,560 --> 00:37:27,720 where we prompted the user for a char. 
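The conditions.py program walked through above might look like the following sketch. The comparison is pulled into a hypothetical `compare` function so it can be exercised without prompting; note `elif` and the colons where C uses `else if` and curly braces.

```python
def compare(x, y):
    # No parentheses needed around the condition, but each branch needs a colon.
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        # By deduction, x must be equal to y.
        return "x is equal to y"


print(compare(1, 2))  # x is less than y
```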
862 00:37:27,720 --> 00:37:30,890 And then, we asked did the user type in y or-- 863 00:37:30,890 --> 00:37:33,530 Y or y or N or n. 864 00:37:33,530 --> 00:37:35,540 And we said agreed or not agreed, accordingly, 865 00:37:35,540 --> 00:37:38,665 just like a program that prompts you to agree to some terms and conditions, 866 00:37:38,665 --> 00:37:39,500 for instance. 867 00:37:39,500 --> 00:37:43,220 Well, let's go ahead and create another file over here called agree.py 868 00:37:43,220 --> 00:37:46,100 and do this in one or more ways. 869 00:37:46,100 --> 00:37:52,280 Let me go ahead and do from cs50 import get_char. 870 00:37:52,280 --> 00:37:53,130 This is subtle. 871 00:37:53,130 --> 00:37:55,370 But what, recall, is there not in Python? 872 00:37:55,370 --> 00:37:56,470 AUDIENCE: Chars. 873 00:37:56,470 --> 00:38:00,230 DAVID MALAN: Chars-- so what do you think the best approximation of a char 874 00:38:00,230 --> 00:38:03,020 is in a language that does not have chars, per se? 875 00:38:03,020 --> 00:38:04,250 AUDIENCE: A string. 876 00:38:04,250 --> 00:38:05,330 DAVID MALAN: A string-- and we'll just have 877 00:38:05,330 --> 00:38:07,640 to enforce on ourselves that the strings we're using 878 00:38:07,640 --> 00:38:09,170 are only going to be one character. 879 00:38:09,170 --> 00:38:13,280 So I'm going to go ahead and keep using get_string for this case. 880 00:38:13,280 --> 00:38:17,210 And I'm going to go ahead now and prompt the user for a string. 881 00:38:17,210 --> 00:38:20,570 And I'm going to ask them, do you agree question mark? 882 00:38:20,570 --> 00:38:26,180 And then, I'm going to ask the question if s equals equals Y-- 883 00:38:26,180 --> 00:38:27,920 that would be one possibility. 884 00:38:27,920 --> 00:38:33,830 I'm going to go ahead and say print("Agreed.") elif s equals equals 885 00:38:33,830 --> 00:38:34,490 N-- 886 00:38:34,490 --> 00:38:40,970 I'm going to go ahead and print("Not agreed.") just as in the C version.
887 00:38:40,970 --> 00:38:44,510 So is this identical? 888 00:38:44,510 --> 00:38:46,884 Or what feature is missing still? 889 00:38:46,884 --> 00:38:48,692 AUDIENCE: [INAUDIBLE] 890 00:38:48,692 --> 00:38:50,440 DAVID MALAN: Yeah, the lower case, right? 891 00:38:50,440 --> 00:38:53,273 So obviously, the lower case-- so you might be inclined to do, well, 892 00:38:53,273 --> 00:38:55,570 or s equals equals y. 893 00:38:55,570 --> 00:38:58,630 But no, in Python, if you want to say something or something else, 894 00:38:58,630 --> 00:39:00,670 you can literally just say or now. 895 00:39:00,670 --> 00:39:01,630 And in C-- 896 00:39:01,630 --> 00:39:04,860 Python here, we can say or s equals equals n. 897 00:39:04,860 --> 00:39:05,860 We can do the same here. 898 00:39:05,860 --> 00:39:10,553 Now, if I go ahead and run Python on agree.py and I type something like Y-- 899 00:39:10,553 --> 00:39:11,470 I seem to have agreed. 900 00:39:11,470 --> 00:39:13,356 If I type something like y-- 901 00:39:13,356 --> 00:39:14,980 oops-- let's do this again. 902 00:39:14,980 --> 00:39:19,090 If I do it again and type y, it should work, as well. 903 00:39:19,090 --> 00:39:22,780 And then, just for good measure, let's say no with a N-- 904 00:39:22,780 --> 00:39:23,410 Not agreed. 905 00:39:23,410 --> 00:39:25,400 So I'm checking in a couple of ways. 906 00:39:25,400 --> 00:39:27,640 But there's other ways you can do this, right? 907 00:39:27,640 --> 00:39:30,370 We've seen a hint of other features here. 908 00:39:30,370 --> 00:39:32,172 This gets a little verbose. 909 00:39:32,172 --> 00:39:33,880 I could actually say something like this. 910 00:39:33,880 --> 00:39:38,200 If s is in the following list of possible values, 911 00:39:38,200 --> 00:39:40,593 I could ask the question like this instead, 912 00:39:40,593 --> 00:39:42,010 and I could do the same down here. 
913 00:39:42,010 --> 00:39:43,780 If s is n-- 914 00:39:43,780 --> 00:39:51,520 if s in N and n, I could similarly now determine that the user has not agreed. 915 00:39:51,520 --> 00:39:54,670 But now, things get more powerful without getting super long and verbose. 916 00:39:54,670 --> 00:39:58,600 Suppose I wanted to support not just Y or y, but Yes or yes 917 00:39:58,600 --> 00:40:00,250 in uppercase and lowercase. 918 00:40:00,250 --> 00:40:03,940 Well, I could actually enumerate other possibilities, like this. 919 00:40:03,940 --> 00:40:04,690 But you know what? 920 00:40:04,690 --> 00:40:08,350 Design-wise, I bet I can do better than this. 921 00:40:08,350 --> 00:40:09,520 I bet I can shrink this. 922 00:40:09,520 --> 00:40:11,370 And heck, I can keep going-- nope. 923 00:40:11,370 --> 00:40:12,390 And nope. 924 00:40:12,390 --> 00:40:14,890 How could I improve the design of this, even if you've never 925 00:40:14,890 --> 00:40:17,620 seen Python before today? 926 00:40:17,620 --> 00:40:22,060 How could I avoid explicitly typing so many values, a few of them 927 00:40:22,060 --> 00:40:22,980 quite similar? 928 00:40:22,980 --> 00:40:23,643 Yeah. 929 00:40:23,643 --> 00:40:26,347 AUDIENCE: By using, like, something similar to tolower. 930 00:40:26,347 --> 00:40:28,680 DAVID MALAN: Yeah, something similar to tolower-- 931 00:40:28,680 --> 00:40:32,130 recall that in C, you were able to lowercase individual characters. 932 00:40:32,130 --> 00:40:35,700 But just a few moments ago when we re-implemented speller for Pset5, 933 00:40:35,700 --> 00:40:37,282 we could lowercase a whole word. 934 00:40:37,282 --> 00:40:37,990 So you know what? 935 00:40:37,990 --> 00:40:40,770 I could just say if s.lower. 936 00:40:40,770 --> 00:40:43,110 This treats s as the string that it is.
937 00:40:43,110 --> 00:40:46,050 But just like in C, there are these things called structs, 938 00:40:46,050 --> 00:40:50,910 so are data types in Python, like strings, also structures themselves. 939 00:40:50,910 --> 00:40:53,310 And inside of those structures are not only values, 940 00:40:53,310 --> 00:40:55,380 like the individual characters that compose them, 941 00:40:55,380 --> 00:40:58,770 but also built-in functions, otherwise known as methods. 942 00:40:58,770 --> 00:41:02,070 And so you can say s.lower and just lowercase the whole string 943 00:41:02,070 --> 00:41:03,190 automatically. 944 00:41:03,190 --> 00:41:05,190 So now, I can get rid of this. 945 00:41:05,190 --> 00:41:09,838 I can get rid of this, although can I? 946 00:41:09,838 --> 00:41:10,380 AUDIENCE: No. 947 00:41:10,380 --> 00:41:13,255 DAVID MALAN: No, I probably-- if I'm forcing everything to lowercase, 948 00:41:13,255 --> 00:41:14,770 I have to let things match up. 949 00:41:14,770 --> 00:41:18,150 So I'm going to go ahead and do the same thing down here-- s.lower. 950 00:41:18,150 --> 00:41:26,440 And I'm going to check, in this case, if it's equal to n or no like this. 951 00:41:26,440 --> 00:41:28,260 So now, if I go ahead and save that, rerun 952 00:41:28,260 --> 00:41:32,610 the program, and type in not just y, but maybe something like Yes, I'm agreed. 953 00:41:32,610 --> 00:41:35,640 And even if I do something weird like this-- 954 00:41:35,640 --> 00:41:39,060 Y, S, but e for whatever accidental reason, 955 00:41:39,060 --> 00:41:41,450 that, too, is tolerated, as well. 956 00:41:41,450 --> 00:41:45,710 So you can make your programs more user-friendly in this way. 957 00:41:45,710 --> 00:41:46,210 All right. 958 00:41:46,210 --> 00:41:50,225 Before we forge ahead, any questions on what we've done thus far 959 00:41:50,225 --> 00:41:51,100 or syntax we've seen? 960 00:41:51,100 --> 00:41:52,763 Yeah.
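Putting the agree.py refinements together, a sketch might contrast the first version, which chains comparisons with `or`, with the refined one, which lowercases the answer once with the string method `lower()` and then tests membership with `in`. The function names are hypothetical and take the answer as a parameter instead of calling `get_string`. Returning `None` for any other input is an assumption, standing in for the lecture's program simply printing nothing in that case.

```python
def agree_or(s):
    # First version: spell out each accepted value with `or`.
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None


def agree(s):
    # Refined version: lowercase once, then test membership in a list.
    s = s.lower()
    if s in ["y", "yes"]:
        return "Agreed."
    elif s in ["n", "no"]:
        return "Not agreed."
    return None


for answer in ["Y", "yes", "YeS", "nO", "maybe"]:
    print(answer, "->", agree(answer))
```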
961 00:41:52,763 --> 00:41:56,691 AUDIENCE: [INAUDIBLE] 962 00:41:56,691 --> 00:42:01,825 963 00:42:01,825 --> 00:42:03,950 DAVID MALAN: Yes, can-- so to restate the question, 964 00:42:03,950 --> 00:42:09,230 can we alternatively still simply check if the first letter of the user's input 965 00:42:09,230 --> 00:42:10,130 is y? 966 00:42:10,130 --> 00:42:11,063 We absolutely could. 967 00:42:11,063 --> 00:42:12,980 And I think there's arguments for and against. 968 00:42:12,980 --> 00:42:15,500 You don't want to necessarily tolerate any word that starts 969 00:42:15,500 --> 00:42:17,235 with y or any word that starts with n. 970 00:42:17,235 --> 00:42:20,360 But let me come back to that in a little bit of time-- turns out in Python, 971 00:42:20,360 --> 00:42:23,277 there's a feature known as regular expressions, where you can actually 972 00:42:23,277 --> 00:42:26,060 define a pattern of characters that you're looking for. 973 00:42:26,060 --> 00:42:28,560 And I think that will let us solve that even more elegantly. 974 00:42:28,560 --> 00:42:31,050 So we'll come back to that before long. 975 00:42:31,050 --> 00:42:31,550 All right. 976 00:42:31,550 --> 00:42:32,967 Well, let's-- yeah, over in front. 977 00:42:32,967 --> 00:42:35,175 AUDIENCE: Is the difference between Python and C 978 00:42:35,175 --> 00:42:39,594 just C [INAUDIBLE] programming, or is there 979 00:42:39,594 --> 00:42:42,550 anything you can do in one language that you can't in the other? 980 00:42:42,550 --> 00:42:44,383 DAVID MALAN: Really good question-- is there 981 00:42:44,383 --> 00:42:47,200 anything you can do in Python that you can't do in C or vice versa? 982 00:42:47,200 --> 00:42:48,730 Short answer-- no. 983 00:42:48,730 --> 00:42:50,650 The languages we're looking at in this course 984 00:42:50,650 --> 00:42:53,860 can all effectively be used to solve the same problems. 985 00:42:53,860 --> 00:42:59,410 However, some languages are designed for or better suited for certain domains. 
986 00:42:59,410 --> 00:43:01,590 Honestly, even the few examples we've done now 987 00:43:01,590 --> 00:43:04,090 were so much more pleasant to write in Python than they ever 988 00:43:04,090 --> 00:43:07,552 were in C, not to mention the filter example and the speller example 989 00:43:07,552 --> 00:43:09,760 and a bunch more that we're going to see before long. 990 00:43:09,760 --> 00:43:11,920 Similarly, with C, it would be a nightmare 991 00:43:11,920 --> 00:43:15,790 to implement a web-based application in C, because you 992 00:43:15,790 --> 00:43:18,370 have to implement so much of the plumbing, so to speak, 993 00:43:18,370 --> 00:43:20,710 the underlying code yourself. 994 00:43:20,710 --> 00:43:22,720 However, using something like Python or Ruby 995 00:43:22,720 --> 00:43:26,680 or PHP or Java these days gives you a lot more features out of the box. 996 00:43:26,680 --> 00:43:28,240 But you do pay a price. 997 00:43:28,240 --> 00:43:30,880 And that price, compared with C, for instance, is performance. 998 00:43:30,880 --> 00:43:32,668 You give up some bit of time. 999 00:43:32,668 --> 00:43:34,210 But you gain other features, as well. 1000 00:43:34,210 --> 00:43:37,240 And the fact truly that Python does not have pointers 1001 00:43:37,240 --> 00:43:40,150 is a feature not just because pointers were hard, but 1002 00:43:40,150 --> 00:43:42,940 because it's so easy with pointers to make mistakes, 1003 00:43:42,940 --> 00:43:44,830 as you probably experienced yourself. 1004 00:43:44,830 --> 00:43:46,090 Segfaults are gone. 1005 00:43:46,090 --> 00:43:49,930 And null pointers are gone, because the language protects you from yourself. 1006 00:43:49,930 --> 00:43:52,720 And the reason why humans have dozens, hundreds 1007 00:43:52,720 --> 00:43:55,930 of programming languages in the wild today is because a lot of people 1008 00:43:55,930 --> 00:43:59,260 keep trying to improve upon languages from yesteryear.
1009 00:43:59,260 --> 00:44:01,840 So we'll see other features distinguishing the two in a bit. 1010 00:44:01,840 --> 00:44:02,080 All right. 1011 00:44:02,080 --> 00:44:03,997 Let me go ahead and create another file called 1012 00:44:03,997 --> 00:44:07,120 cough.py just to show how we can also bootstrap ourselves 1013 00:44:07,120 --> 00:44:10,870 from something very simple and naive to a better designed version in Python. 1014 00:44:10,870 --> 00:44:14,170 Recall from week 0, we wanted the cat to cough three times. 1015 00:44:14,170 --> 00:44:16,840 And in week 1, we re-implemented that same idea 1016 00:44:16,840 --> 00:44:19,940 with a little bit of copy/paste, but in a way that works. 1017 00:44:19,940 --> 00:44:22,030 So notice this is a Python program. 1018 00:44:22,030 --> 00:44:23,530 And it's going to cough three times. 1019 00:44:23,530 --> 00:44:25,530 And I'm not going to keep running every program, 1020 00:44:25,530 --> 00:44:27,460 because let me just stipulate that it will. 1021 00:44:27,460 --> 00:44:29,847 But in this case here, even though I claim 1022 00:44:29,847 --> 00:44:32,680 this is a program that will cough three times, let's be super clear. 1023 00:44:32,680 --> 00:44:37,930 With this in all prior examples, what have I not put in the file, as well? 1024 00:44:37,930 --> 00:44:40,120 Like, what is missing vis a vis C programs? 1025 00:44:40,120 --> 00:44:41,780 AUDIENCE: [INAUDIBLE] 1026 00:44:41,780 --> 00:44:42,928 DAVID MALAN: No what? 1027 00:44:42,928 --> 00:44:44,170 AUDIENCE: Int main void. 1028 00:44:44,170 --> 00:44:45,753 DAVID MALAN: There's no int main void. 1029 00:44:45,753 --> 00:44:47,158 And there's no main whatsoever. 1030 00:44:47,158 --> 00:44:50,200 So another feature of Python is that if you want to just write a program, 1031 00:44:50,200 --> 00:44:51,658 you just start writing the program. 1032 00:44:51,658 --> 00:44:53,200 You don't need a main function. 
1033 00:44:53,200 --> 00:44:55,870 Now, I'm going to walk that back a little bit, that claim, 1034 00:44:55,870 --> 00:44:58,900 because there are some situations in which you do want a main function. 1035 00:44:58,900 --> 00:45:01,010 But unlike in C, it's not necessary. 1036 00:45:01,010 --> 00:45:03,280 Now, back in weeks 0 and 1, a bunch of people 1037 00:45:03,280 --> 00:45:07,600 commented that surely, we can implement this better, not using three prints. 1038 00:45:07,600 --> 00:45:09,010 But let's use a loop instead. 1039 00:45:09,010 --> 00:45:13,480 So in Python, you could say for i in [0, 1, 2], 1040 00:45:13,480 --> 00:45:17,410 go ahead and print out "cough," but of course, this is going to get annoying, 1041 00:45:17,410 --> 00:45:19,810 because if you want to print four times or-- sorry-- 1042 00:45:19,810 --> 00:45:23,620 four times or five times or six times or seven times zero index, 1043 00:45:23,620 --> 00:45:25,610 you have to keep enumerating the stupid values. 1044 00:45:25,610 --> 00:45:27,400 So that's why we use what function? 1045 00:45:27,400 --> 00:45:28,175 AUDIENCE: Range. 1046 00:45:28,175 --> 00:45:30,550 DAVID MALAN: Range-- so that is the same thing now that's 1047 00:45:30,550 --> 00:45:32,690 going to print cough three times. 1048 00:45:32,690 --> 00:45:34,635 But what if we wanted to now start to define 1049 00:45:34,635 --> 00:45:36,010 our own coughing function, right? 1050 00:45:36,010 --> 00:45:38,170 The goal of weeks 1 and 2 and onward was to start 1051 00:45:38,170 --> 00:45:41,320 to abstract away and build our own reusable puzzle 1052 00:45:41,320 --> 00:45:43,370 pieces, albeit in a different language. 1053 00:45:43,370 --> 00:45:45,643 How could I go about doing this in Python? 1054 00:45:45,643 --> 00:45:47,560 Well, suppose that I want to do the following. 1055 00:45:47,560 --> 00:45:51,130 For i in range 3, I want to just cough.
1056 00:45:51,130 --> 00:45:55,060 And I want cough to be an abstraction, a custom function or a Scratch puzzle 1057 00:45:55,060 --> 00:45:57,460 piece, that someone else or maybe I wrote 1058 00:45:57,460 --> 00:45:59,190 that does this notion of coughing. 1059 00:45:59,190 --> 00:46:00,940 Well, in Python, what's the keyword we can 1060 00:46:00,940 --> 00:46:02,482 use to give ourselves a new function? 1061 00:46:02,482 --> 00:46:03,130 AUDIENCE: Def. 1062 00:46:03,130 --> 00:46:04,372 DAVID MALAN: Def for define-- 1063 00:46:04,372 --> 00:46:06,580 and I can just say the name of the function is cough. 1064 00:46:06,580 --> 00:46:07,850 And it takes no arguments. 1065 00:46:07,850 --> 00:46:10,957 So unlike C, I don't specify a return type. 1066 00:46:10,957 --> 00:46:13,540 And I don't specify the types of the inputs, but in this case, 1067 00:46:13,540 --> 00:46:15,700 that's moot, because there are no inputs to cough. 1068 00:46:15,700 --> 00:46:17,630 This function is super simple. 1069 00:46:17,630 --> 00:46:21,070 It just wants to say print("cough"). 1070 00:46:21,070 --> 00:46:24,850 And so here, I now have a function that's going to quite simply do this. 1071 00:46:24,850 --> 00:46:28,090 And it's an abstraction in the sense that it can be all the way down here 1072 00:46:28,090 --> 00:46:29,200 out of sight, out of mind. 1073 00:46:29,200 --> 00:46:31,240 I don't care anymore how it's implemented. 1074 00:46:31,240 --> 00:46:32,800 Maybe even a friend implemented it. 1075 00:46:32,800 --> 00:46:35,110 And I've imported their code. 1076 00:46:35,110 --> 00:46:38,480 But the problem arises now as follows. 1077 00:46:38,480 --> 00:46:40,870 Let me go ahead and save this without all the whitespace. 1078 00:46:40,870 --> 00:46:43,490 I seem to be practicing what I'm preaching-- no main function. 1079 00:46:43,490 --> 00:46:45,490 Just start writing the code, but use def. 1080 00:46:45,490 --> 00:46:48,730 But let me go ahead and run now Python of cough.py. 
1081 00:46:48,730 --> 00:46:51,687 I think-- yeah, I'm going to see the first of our errors. 1082 00:46:51,687 --> 00:46:53,270 Python errors look a little different. 1083 00:46:53,270 --> 00:46:55,090 You're going to see this word traceback a lot, 1084 00:46:55,090 --> 00:46:57,880 which is like trace back in time of everything that just happened. 1085 00:46:57,880 --> 00:46:59,110 But you do see some clues. 1086 00:46:59,110 --> 00:47:00,400 cough.py is the file. 1087 00:47:00,400 --> 00:47:01,990 Line 2 is the problem. 1088 00:47:01,990 --> 00:47:04,220 Name cough is not defined. 1089 00:47:04,220 --> 00:47:04,970 But wait a minute. 1090 00:47:04,970 --> 00:47:06,040 It is. 1091 00:47:06,040 --> 00:47:10,030 Cough is defined literally with the word def right here on line 4. 1092 00:47:10,030 --> 00:47:13,503 But there's a problem on line 2, which is here. 1093 00:47:13,503 --> 00:47:15,670 So even if you've never programmed in Python before, 1094 00:47:15,670 --> 00:47:19,258 what's the intuition for this bug? 1095 00:47:19,258 --> 00:47:20,050 Why is this broken? 1096 00:47:20,050 --> 00:47:20,240 Yeah. 1097 00:47:20,240 --> 00:47:22,475 AUDIENCE: You didn't define your function before using it. 1098 00:47:22,475 --> 00:47:24,392 DAVID MALAN: Yeah, I didn't define my function 1099 00:47:24,392 --> 00:47:27,550 before using it, which was exactly a problem we ran into in C. 1100 00:47:27,550 --> 00:47:31,870 Unfortunately, in Python, there's no notion of prototypes. 1101 00:47:31,870 --> 00:47:34,300 So we have one or two solutions. 1102 00:47:34,300 --> 00:47:36,640 I can just move the function up here. 1103 00:47:36,640 --> 00:47:38,320 But there's arguments against this.
1104 00:47:38,320 --> 00:47:41,050 Right now, as with main, in general, it's 1105 00:47:41,050 --> 00:47:45,100 a little bit annoying to put, like, all of your functions on top, 1106 00:47:45,100 --> 00:47:48,190 because then, the reader or you have to go fishing through bigger files 1107 00:47:48,190 --> 00:47:49,490 if you've written more lines. 1108 00:47:49,490 --> 00:47:51,160 Where is the main part of this program? 1109 00:47:51,160 --> 00:47:54,880 So in general, it's better to put the main code up top and the helper code 1110 00:47:54,880 --> 00:47:56,330 down below. 1111 00:47:56,330 --> 00:47:59,740 So the way to solve this conventionally is actually 1112 00:47:59,740 --> 00:48:01,772 going to be to define a main function. 1113 00:48:01,772 --> 00:48:03,730 Technically, it doesn't have to be called main. 1114 00:48:03,730 --> 00:48:06,640 It does not have a special significance like in C. 1115 00:48:06,640 --> 00:48:09,370 But humans adopt this paradigm and just define themselves 1116 00:48:09,370 --> 00:48:10,630 a function called main. 1117 00:48:10,630 --> 00:48:13,060 And they put it up top by convention, too. 1118 00:48:13,060 --> 00:48:15,310 But now, I've introduced a new problem. 1119 00:48:15,310 --> 00:48:20,830 Python of cough.py enter doesn't do anything. 1120 00:48:20,830 --> 00:48:21,860 Well, why is that? 1121 00:48:21,860 --> 00:48:23,800 Python is going to take you literally. 1122 00:48:23,800 --> 00:48:25,415 You've defined a function called main. 1123 00:48:25,415 --> 00:48:27,040 You've defined a function called cough. 1124 00:48:27,040 --> 00:48:29,480 What have I not apparently done explicitly? 1125 00:48:29,480 --> 00:48:30,897 AUDIENCE: You haven't called main. 1126 00:48:30,897 --> 00:48:32,355 DAVID MALAN: I haven't called main. 1127 00:48:32,355 --> 00:48:34,100 Now, in C, you get this feature for free. 1128 00:48:34,100 --> 00:48:35,770 If you write main, it will be called. 
1129 00:48:35,770 --> 00:48:38,590 Python-- those training wheels are off, too. 1130 00:48:38,590 --> 00:48:40,410 You have to call main explicitly. 1131 00:48:40,410 --> 00:48:42,040 So this looks a little stupid. 1132 00:48:42,040 --> 00:48:46,210 But this is the solution conventionally to this problem, where you literally 1133 00:48:46,210 --> 00:48:50,350 call main at the bottom of your file, but you define main at the top. 1134 00:48:50,350 --> 00:48:55,060 And this ensures that by the time line 8 is read by the computer, 1135 00:48:55,060 --> 00:48:58,690 by the Python program, the interpreter, it's going to realize, oh, that's OK. 1136 00:48:58,690 --> 00:48:59,980 You've defined main earlier. 1137 00:48:59,980 --> 00:49:01,670 I know now what it is. 1138 00:49:01,670 --> 00:49:05,590 So now, if I run it again, I see cough, cough, cough. 1139 00:49:05,590 --> 00:49:06,210 All right. 1140 00:49:06,210 --> 00:49:09,040 Let's make one final tweak here now so that I 1141 00:49:09,040 --> 00:49:12,640 can factor out my loop here and instead change 1142 00:49:12,640 --> 00:49:16,930 my cough function just as we did in week 0 and 1 to cough some number of times. 1143 00:49:16,930 --> 00:49:19,960 How do I define a Python function that takes an input? 1144 00:49:19,960 --> 00:49:21,940 It's actually relatively straightforward. 1145 00:49:21,940 --> 00:49:24,280 Recall that you don't have to specify types. 1146 00:49:24,280 --> 00:49:25,960 But you do have to specify names. 1147 00:49:25,960 --> 00:49:31,882 And what might be a good name for the input to cough for a number? 1148 00:49:31,882 --> 00:49:34,840 n, right, barring something else-- you could call it anything you want. 1149 00:49:34,840 --> 00:49:37,370 But n is kind of a go-to for an integer. 1150 00:49:37,370 --> 00:49:39,980 So if you're going to cough n times, what do I want to do? 1151 00:49:39,980 --> 00:49:44,710 For i in range of n, I can go ahead and cough n times. 
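Here is a sketch of where cough.py ends up at this point: `main` defined at the top, the helper below it, and an explicit call to `main()` at the very bottom, so that both functions already exist by the time it runs. `cough(n)` now takes the number of coughs as its input.

```python
def main():
    cough(3)


def cough(n):
    # range(n) yields 0, 1, ..., n - 1, so the body runs n times.
    for i in range(n):
        print("cough")


# Call main only after every function above has been defined.
main()
```

Moving `main()` above the `def cough` line would reproduce the "name 'cough' is not defined" error from earlier, because Python executes the file top to bottom.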
1152 00:49:44,710 --> 00:49:46,810 So this program is functionally the same. 1153 00:49:46,810 --> 00:49:50,860 But now, notice my custom function, just like in week 0 and 1, is more powerful. 1154 00:49:50,860 --> 00:49:53,060 It takes input and produces output. 1155 00:49:53,060 --> 00:49:57,760 So now, I can abstract away the notion of coughing to just say cough 3. 1156 00:49:57,760 --> 00:50:00,910 So again, same exact ideas as we encountered a while back, 1157 00:50:00,910 --> 00:50:04,480 but now, we have the ability to do this now in Python. 1158 00:50:04,480 --> 00:50:07,480 Any questions, then, on those examples thus far? 1159 00:50:07,480 --> 00:50:08,200 This is too fast. 1160 00:50:08,200 --> 00:50:10,870 By all means, push back. 1161 00:50:10,870 --> 00:50:12,100 And ask now. 1162 00:50:12,100 --> 00:50:12,838 Yeah. 1163 00:50:12,838 --> 00:50:18,316 AUDIENCE: I [INAUDIBLE] for Python, and I remember it saying like, 1164 00:50:18,316 --> 00:50:24,680 if [INAUDIBLE] cough times [INAUDIBLE]. 1165 00:50:24,680 --> 00:50:25,600 DAVID MALAN: Yes, OK. 1166 00:50:25,600 --> 00:50:28,990 Would you like your mind to really be blown here then? 1167 00:50:28,990 --> 00:50:32,980 Yes, you can also in Python do this. 1168 00:50:32,980 --> 00:50:39,820 If you want to cough three times, you can just multiply the string by three. 1169 00:50:39,820 --> 00:50:44,600 So now-- and if you're impressed by this, now you're really geeks, 1170 00:50:44,600 --> 00:50:45,410 but here we go-- 1171 00:50:45,410 --> 00:50:45,910 [LAUGHTER] 1172 00:50:45,910 --> 00:50:49,300 --cough, cough, cough-- in a good way. 1173 00:50:49,300 --> 00:50:50,858 This is very Pythonic, right? 1174 00:50:50,858 --> 00:50:51,400 So all right. 1175 00:50:51,400 --> 00:50:52,942 So now, we can let you into the club. 1176 00:50:52,942 --> 00:50:55,045 So there's this expression in the world of Python. 
1177 00:50:55,045 --> 00:50:56,920 And there's a lot of programming communities, 1178 00:50:56,920 --> 00:50:59,200 where things are considered Pythonic if-- which 1179 00:50:59,200 --> 00:51:01,313 means this is the way to do it. 1180 00:51:01,313 --> 00:51:02,230 It's not the only way. 1181 00:51:02,230 --> 00:51:03,897 And it's arguably not even the best way. 1182 00:51:03,897 --> 00:51:06,580 But it's the way everyone does it, sort of in double quotes. 1183 00:51:06,580 --> 00:51:08,830 People are very religious when it comes, though, to their languages. 1184 00:51:08,830 --> 00:51:11,260 And so a Pythonic way of doing this-- and the reason why 1185 00:51:11,260 --> 00:51:14,300 there are memes making fun of this is that this is the Pythonic way. 1186 00:51:14,300 --> 00:51:17,890 Like, boom-- no loops whatsoever, just multiply the thing you want. 1187 00:51:17,890 --> 00:51:19,600 Now, to be fair, it's a little buggy. 1188 00:51:19,600 --> 00:51:21,300 Like, I actually have an extra new line. 1189 00:51:21,300 --> 00:51:23,800 So I probably have to try a little harder to get that right. 1190 00:51:23,800 --> 00:51:26,065 But yes, there are hidden tricks in Python, 1191 00:51:26,065 --> 00:51:27,940 a few of which we'll encounter today that let 1192 00:51:27,940 --> 00:51:30,760 you do very fancy one-liners to save time, too. 1193 00:51:30,760 --> 00:51:35,430 AUDIENCE: Why in some scenarios you said that we don't need backslashes, 1194 00:51:35,430 --> 00:51:36,998 but like, for this one, we do? 1195 00:51:36,998 --> 00:51:38,790 DAVID MALAN: Oh, really good question-- why 1196 00:51:38,790 --> 00:51:41,580 do you sometimes not need backslash n, but sometimes you do? 1197 00:51:41,580 --> 00:51:44,790 Print is going to give us a new line at the end of what it's printing. 1198 00:51:44,790 --> 00:51:48,630 So let me go ahead now and rerun this without the explicit backslash n.
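A small sketch of the string-multiplication trick and the extra blank line it causes: `"cough\n" * 3` repeats the string three times, and because `print` appends its own newline, passing `end=""` (a standard parameter of the built-in `print`) is one way to suppress the extra one. The `coughs` helper name is hypothetical.

```python
def coughs(n):
    # Multiplying a string by an int repeats it; each copy carries its own newline.
    return "cough\n" * n


# print adds "\n" after its argument by default, which would leave a blank line;
# end="" suppresses that extra newline.
print(coughs(3), end="")

# Without "\n" inside the string, the copies run together on one line.
print("cough" * 3)  # coughcoughcough
```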
1199 00:51:48,630 --> 00:51:52,420 You might be able to intuitively guess cough, cough, cough. 1200 00:51:52,420 --> 00:51:54,630 You're not wrong, per se, but not what I intended. 1201 00:51:54,630 --> 00:51:56,505 So that's why I need to put it back manually. 1202 00:51:56,505 --> 00:51:57,450 AUDIENCE: OK. 1203 00:51:57,450 --> 00:52:00,310 DAVID MALAN: Good question-- other questions on this here? 1204 00:52:00,310 --> 00:52:00,810 All right. 1205 00:52:00,810 --> 00:52:03,870 A few more examples from week 1 before we'll take things up 1206 00:52:03,870 --> 00:52:06,180 to the more interesting problems from week 2 onward. 1207 00:52:06,180 --> 00:52:08,350 Let me go ahead and split my screen once more. 1208 00:52:08,350 --> 00:52:11,340 Let me go ahead and on the left, open up positive.c, 1209 00:52:11,340 --> 00:52:13,590 which was a program recall that allowed us to define 1210 00:52:13,590 --> 00:52:15,930 a function getting a positive integer. 1211 00:52:15,930 --> 00:52:18,040 And we used a special-- 1212 00:52:18,040 --> 00:52:22,470 a type of loop in week 1 when implementing this, that of a 1213 00:52:22,470 --> 00:52:23,580 do while loop. 1214 00:52:23,580 --> 00:52:27,180 Unfortunately, in Python, just as you don't have the plus plus operator, 1215 00:52:27,180 --> 00:52:29,730 you also don't have a do while loop, which 1216 00:52:29,730 --> 00:52:32,850 would seem problematic for very simple ideas like this, where you want 1217 00:52:32,850 --> 00:52:36,390 the human to do something at least once and then maybe again 1218 00:52:36,390 --> 00:52:37,680 and again and again. 1219 00:52:37,680 --> 00:52:38,880 But that's OK, right? 1220 00:52:38,880 --> 00:52:41,790 You have more than enough tools in the toolkit, both in C and Python, 1221 00:52:41,790 --> 00:52:44,980 to do this without the more familiar, more comfortable structure. 1222 00:52:44,980 --> 00:52:47,400 So let me write a program called positive.py. 
1223 00:52:47,400 --> 00:52:52,080 Let me go ahead and from cs50 import get_int. 1224 00:52:52,080 --> 00:52:54,180 Let me go ahead and define a main function, 1225 00:52:54,180 --> 00:52:56,340 just as I did before just so I can demonstrate 1226 00:52:56,340 --> 00:52:58,950 how you can get a positive int from the user 1227 00:52:58,950 --> 00:53:01,470 and then print it out-- so a super simple example 1228 00:53:01,470 --> 00:53:03,690 that's equivalent, for the moment, to what I'm 1229 00:53:03,690 --> 00:53:05,640 doing over here back from week 1. 1230 00:53:05,640 --> 00:53:06,930 So nothing on the left is new. 1231 00:53:06,930 --> 00:53:10,650 It's all back from week 1, even if it's a bit far back now. 1232 00:53:10,650 --> 00:53:14,280 Let me go ahead now and define also on the right-hand side def 1233 00:53:14,280 --> 00:53:16,410 get_positive_int. 1234 00:53:16,410 --> 00:53:18,120 It's not going to take any arguments. 1235 00:53:18,120 --> 00:53:23,190 But I need to implement this notion of doing something while it's still true. 1236 00:53:23,190 --> 00:53:26,490 And the most Pythonic or conventional way of doing this in Python 1237 00:53:26,490 --> 00:53:28,110 is actually like this. 1238 00:53:28,110 --> 00:53:30,868 Deliberately induce an infinite loop for yourself, 1239 00:53:30,868 --> 00:53:32,910 because you can break out of it anytime you want. 1240 00:53:32,910 --> 00:53:35,160 So this is a common Python paradigm. 1241 00:53:35,160 --> 00:53:38,160 Go ahead, and at least once, get an int from the user asking them 1242 00:53:38,160 --> 00:53:40,440 for a positive integer. 1243 00:53:40,440 --> 00:53:44,070 And then, after that, under what circumstances do I probably 1244 00:53:44,070 --> 00:53:50,922 want to break out of this infinite loop if the goal is to get a positive int? 1245 00:53:50,922 --> 00:53:52,380 What questions should I ask myself? 1246 00:53:52,380 --> 00:53:52,880 Yeah.
1247 00:53:52,880 --> 00:53:53,830 AUDIENCE: [INAUDIBLE] 1248 00:53:53,830 --> 00:53:55,830 DAVID MALAN: Yeah, quite simply, if n is greater 1249 00:53:55,830 --> 00:53:59,610 than 0-- no need for parentheses, but I do need the colon. 1250 00:53:59,610 --> 00:54:02,610 I can, just as in C, use the break command, 1251 00:54:02,610 --> 00:54:08,160 which breaks me out of the loop, at which point I can go ahead and return n. 1252 00:54:08,160 --> 00:54:10,200 So it's different from what you see on the left. 1253 00:54:10,200 --> 00:54:11,760 But it's logically the same. 1254 00:54:11,760 --> 00:54:15,900 And honestly you could go back in week 1 and implement this logic in C, 1255 00:54:15,900 --> 00:54:17,040 because we had while loops. 1256 00:54:17,040 --> 00:54:18,893 We had the word true, albeit in lowercase. 1257 00:54:18,893 --> 00:54:21,810 And we had all of this same code, too, even though we had curly braces 1258 00:54:21,810 --> 00:54:23,910 and semicolons and a few other things. 1259 00:54:23,910 --> 00:54:27,910 This, though, is the equivalent Python way of doing it here. 1260 00:54:27,910 --> 00:54:31,590 But there is, it seems, a bug. 1261 00:54:31,590 --> 00:54:34,150 Or rather, there is what you would think is a bug. 1262 00:54:34,150 --> 00:54:36,000 This is OK, not a problem there. 1263 00:54:36,000 --> 00:54:39,030 That'll go away eventually hopefully. 1264 00:54:39,030 --> 00:54:39,725 Go. 1265 00:54:39,725 --> 00:54:41,828 [LAUGHS] 1266 00:54:41,828 --> 00:54:42,870 Pay no attention to that. 1267 00:54:42,870 --> 00:54:44,340 The code is right, I believe. 1268 00:54:44,340 --> 00:54:46,470 So there seems to be a bug. 1269 00:54:46,470 --> 00:54:48,030 And this one is super subtle. 1270 00:54:48,030 --> 00:54:52,120 But in weeks 1 through 5 when we were writing in C-- oh, see? 1271 00:54:52,120 --> 00:54:52,913 It went away. 1272 00:54:52,913 --> 00:54:54,330 Just ignore the problem sometimes.
1273 00:54:54,330 --> 00:54:54,997 It will go away. 1274 00:54:54,997 --> 00:54:57,730 [LAUGHTER] 1275 00:54:57,730 --> 00:55:00,667 There is a seemingly subtle bug here. 1276 00:55:00,667 --> 00:55:02,250 But it's not actually a bug in Python. 1277 00:55:02,250 --> 00:55:06,480 But it would have been in C. So what am I doing wrong, 1278 00:55:06,480 --> 00:55:11,810 at least in C, even though I claim this is going to work? 1279 00:55:11,810 --> 00:55:17,400 And if you compare left and right, it might become more obvious. 1280 00:55:17,400 --> 00:55:18,660 What am I doing? 1281 00:55:18,660 --> 00:55:20,712 Is that a-- yeah, in back. 1282 00:55:20,712 --> 00:55:22,504 AUDIENCE: You're breaking before returning. 1283 00:55:22,504 --> 00:55:24,295 DAVID MALAN: I'm breaking before returning. 1284 00:55:24,295 --> 00:55:26,940 That's OK, because this break statement if n is greater than 0 1285 00:55:26,940 --> 00:55:29,790 is going to break me out of the indentation, out of the loop. 1286 00:55:29,790 --> 00:55:31,600 So that's OK. 1287 00:55:31,600 --> 00:55:36,210 But I think your concern is related if we can put our finger on it 1288 00:55:36,210 --> 00:55:37,980 a little more precisely. 1289 00:55:37,980 --> 00:55:38,480 Yeah. 1290 00:55:38,480 --> 00:55:42,600 AUDIENCE: Like, you're not-- you're returning n, but n is [INAUDIBLE]. 1291 00:55:42,600 --> 00:55:45,310 DAVID MALAN: Yes, so this is maybe the second part of your claim. 1292 00:55:45,310 --> 00:55:48,300 The n is being returned on line 12. 1293 00:55:48,300 --> 00:55:49,830 And I claim this is actually fine. 1294 00:55:49,830 --> 00:55:52,650 But n was declared albeit implicitly-- that is, 1295 00:55:52,650 --> 00:55:54,390 without any data type in Python-- 1296 00:55:54,390 --> 00:55:55,350 on line 9.
1297 00:55:55,350 --> 00:55:58,350 If we had done that in C over here, it would not 1298 00:55:58,350 --> 00:56:00,120 have worked, because recall in C, there's 1299 00:56:00,120 --> 00:56:02,460 this notion of scope, where when you define a variable, 1300 00:56:02,460 --> 00:56:06,540 it only exists inside of the curly braces that encapsulate it. 1301 00:56:06,540 --> 00:56:08,280 Now, Python doesn't have curly braces. 1302 00:56:08,280 --> 00:56:11,040 But there's still indentation, which implies the same. 1303 00:56:11,040 --> 00:56:15,532 But in Python, your variables, even if they're declared under, under, 1304 00:56:15,532 --> 00:56:19,110 under, under conditions or variables-- or loops, 1305 00:56:19,110 --> 00:56:23,100 they will be accessible to you outside of those conditions and loops. 1306 00:56:23,100 --> 00:56:24,390 So it's a nice feature. 1307 00:56:24,390 --> 00:56:28,290 And it allows me, then, to run this program, Python of positive.py. 1308 00:56:28,290 --> 00:56:30,640 Let me go ahead and provide-- 1309 00:56:30,640 --> 00:56:34,420 oops-- hmm, turns out there is a bug. 1310 00:56:34,420 --> 00:56:34,920 Yeah. 1311 00:56:34,920 --> 00:56:36,120 AUDIENCE: [INAUDIBLE] main. 1312 00:56:36,120 --> 00:56:37,980 DAVID MALAN: Yeah, so I have to call main at the bottom 1313 00:56:37,980 --> 00:56:39,240 even though that looks a little silly. 1314 00:56:39,240 --> 00:56:41,282 But now, let me go ahead and run the program. 1315 00:56:41,282 --> 00:56:43,380 Oh, now, it's prompting me for a positive integer. 1316 00:56:43,380 --> 00:56:46,930 Let's not cooperate-- negative 1, 0, 1. 1317 00:56:46,930 --> 00:56:48,003 Now, it, in fact, works. 1318 00:56:48,003 --> 00:56:50,670 So again, sometimes you might have to think a little harder when 1319 00:56:50,670 --> 00:56:54,180 it comes to implementing something in Python as opposed to C. 1320 00:56:54,180 --> 00:56:56,910 But indeed, it is very much possible. 1321 00:56:56,910 --> 00:56:57,410 Yeah.
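A self-contained sketch of positive.py as described above. To keep it runnable without the CS50 library, `get_positive_int` here takes a `read_int` function as a parameter. That parameter is an assumption, not the lecture's signature: in the lecture, the function takes no arguments and calls `get_int` from cs50 directly. The prompt text is also assumed.

```python
def main():
    # In the lecture this is just get_positive_int(), which uses cs50's get_int.
    # Here, read_int wraps input() and int() instead (assumption).
    i = get_positive_int(lambda prompt: int(input(prompt)))
    print(i)


def get_positive_int(read_int):
    # Python has no do-while loop, so deliberately loop "forever"...
    while True:
        n = read_int("Positive integer: ")
        # ...and break out as soon as the input is acceptable.
        if n > 0:
            break
    # n was first assigned inside the loop, but Python scopes variables to the
    # function, not to the indented block, so n is still usable here.
    return n
```

As in the lecture, a real positive.py would end with a call to `main()` at the bottom. Passing the reader in as a parameter is only a design choice for this sketch: it lets the loop be exercised with canned answers such as -1, 0, 1 instead of a keyboard.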
1322 00:56:57,410 --> 00:57:00,182 AUDIENCE: Are variables identical accessible across functions? 1323 00:57:00,182 --> 00:57:03,140 DAVID MALAN: Good question-- are variables accessible across functions? 1324 00:57:03,140 --> 00:57:05,360 No, they will be isolated to the function, 1325 00:57:05,360 --> 00:57:09,200 but not to the indentation level in which they were defined. 1326 00:57:09,200 --> 00:57:15,330 Well, let's go back for just a moment to a place we saw some weeks ago, 1327 00:57:15,330 --> 00:57:18,290 which was this here. 1328 00:57:18,290 --> 00:57:20,930 You'll recall that in Mario, we did a few examples early on, 1329 00:57:20,930 --> 00:57:23,347 where we wanted to replicate the idea, printing out, like, 1330 00:57:23,347 --> 00:57:25,370 four question marks in a row here. 1331 00:57:25,370 --> 00:57:30,448 And we wanted to print out something like three squares in a column. 1332 00:57:30,448 --> 00:57:32,990 And then, we also had this two-dimensional structure printing 1333 00:57:32,990 --> 00:57:33,680 bricks. 1334 00:57:33,680 --> 00:57:36,590 Let's see how we can implement those same ideas now 1335 00:57:36,590 --> 00:57:40,290 using Python a bit more simply than before. 1336 00:57:40,290 --> 00:57:41,750 So let me go ahead here. 1337 00:57:41,750 --> 00:57:46,640 And I'll create a program called mario.py in which to whip these up, 1338 00:57:46,640 --> 00:57:47,180 as well. 1339 00:57:47,180 --> 00:57:50,310 So mario.py-- the first goal is to do something like this. 1340 00:57:50,310 --> 00:57:54,290 So I want to go ahead and print out four question marks in the sky or just 1341 00:57:54,290 --> 00:57:57,470 in simple ASCII terms, just four question marks on the screen. 1342 00:57:57,470 --> 00:57:59,870 So I can obviously just do 1, 2, 3, 4. 1343 00:57:59,870 --> 00:58:02,020 But this is not particularly well designed.
1344 00:58:02,020 --> 00:58:04,670 I can make it a little more reusable, a little more dynamic 1345 00:58:04,670 --> 00:58:07,532 by saying for i in range(4). 1346 00:58:07,532 --> 00:58:09,740 And then, I can go ahead and print out, for instance, 1347 00:58:09,740 --> 00:58:12,170 a single question mark instead. 1348 00:58:12,170 --> 00:58:14,870 But something's going to backfire now. 1349 00:58:14,870 --> 00:58:18,880 If I run this, what am I going to see that I don't want to see? 1350 00:58:18,880 --> 00:58:19,380 Yeah. 1351 00:58:19,380 --> 00:58:22,595 AUDIENCE: It will be a question mark [INAUDIBLE]. 1352 00:58:22,595 --> 00:58:23,470 DAVID MALAN: Exactly. 1353 00:58:23,470 --> 00:58:25,300 It's going to be question marks in a vertical column. 1354 00:58:25,300 --> 00:58:25,930 Why? 1355 00:58:25,930 --> 00:58:29,080 Well, finally, we were so happy to get rid of the backslash n's. 1356 00:58:29,080 --> 00:58:32,650 Now, it's come back to bite us, because sometimes you don't want the backslash 1357 00:58:32,650 --> 00:58:33,160 n's. 1358 00:58:33,160 --> 00:58:36,760 So here's where Python's functions are parameterizable 1359 00:58:36,760 --> 00:58:38,860 in a little different way from C. 1360 00:58:38,860 --> 00:58:41,950 Most every function we've seen in C might have taken 1361 00:58:41,950 --> 00:58:44,690 zero or more arguments inside the parentheses, 1362 00:58:44,690 --> 00:58:46,360 and you just separate them with commas. 1363 00:58:46,360 --> 00:58:50,050 Python's a little fancier in that it has what are called named arguments, where 1364 00:58:50,050 --> 00:58:53,290 you don't just specify something, comma, something, comma, something. 1365 00:58:53,290 --> 00:58:56,770 You can, instead, specify the name of an argument or a parameter, 1366 00:58:56,770 --> 00:58:59,000 an equals sign, and then its value. 1367 00:58:59,000 --> 00:59:01,780 So you would only know this from Python's documentation.
1368 00:59:01,780 --> 00:59:06,450 But it turns out that the print function takes an argument called end-- 1369 00:59:06,450 --> 00:59:10,540 E-N-D-- whose value can equal whatever you want it to. 1370 00:59:10,540 --> 00:59:14,710 By default, it literally equals backslash n. 1371 00:59:14,710 --> 00:59:17,710 It sort of happens automatically, but you can override this. 1372 00:59:17,710 --> 00:59:19,270 You can actually say, you know what? 1373 00:59:19,270 --> 00:59:23,920 I don't want anything at the end of each thing I'm printing. 1374 00:59:23,920 --> 00:59:25,630 So let me just do quote unquote. 1375 00:59:25,630 --> 00:59:28,180 Let me rerun mario.py now. 1376 00:59:28,180 --> 00:59:29,710 And now, I almost have what I want. 1377 00:59:29,710 --> 00:59:30,752 But it's a little sloppy. 1378 00:59:30,752 --> 00:59:32,543 I still want to move the cursor to the end. 1379 00:59:32,543 --> 00:59:33,190 But that's OK. 1380 00:59:33,190 --> 00:59:34,982 I can just print nothing, because I'm going 1381 00:59:34,982 --> 00:59:37,810 to get a new line for free at the bottom of the program. 1382 00:59:37,810 --> 00:59:40,930 So that is how I can implement this same idea. 1383 00:59:40,930 --> 00:59:42,180 But you can put anything here. 1384 00:59:42,180 --> 00:59:43,305 It might be a little weird. 1385 00:59:43,305 --> 00:59:45,250 But I could put commas in between. 1386 00:59:45,250 --> 00:59:48,940 And then, I could rerun mario.py and now get question mark comma question mark 1387 00:59:48,940 --> 00:59:52,840 comma question mark comma, because I'm printing a comma after each one. 1388 00:59:52,840 --> 00:59:56,672 But for our purposes, it suffices just to override that, in this case. 1389 00:59:56,672 --> 00:59:58,880 Well, how can I go about doing this a little fancier? 1390 00:59:58,880 --> 01:00:01,510 Well, you proposed-- or the meme you saw proposed 1391 01:00:01,510 --> 01:00:04,420 that we can instead do this.
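The row of question marks, sketched with print's named `end` parameter. The helper name `print_question_marks` and its `separator` parameter are illustrative, not from the lecture. Inside the loop, each call overrides the default `end="\n"`. A final bare `print()` then supplies the one newline wanted at the very end.

```python
def print_question_marks(count, separator=""):
    # print's named parameter `end` defaults to "\n"; overriding it keeps
    # every question mark on the same line.
    for i in range(count):
        print("?", end=separator)
    # A bare print() supplies the single newline we do want, at the very end.
    print()


print_question_marks(4)        # ????
print_question_marks(4, ",")   # ?,?,?,?,
```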
1392 01:00:04,420 --> 01:00:10,060 We can just print, for instance, print question mark times 4. 1393 01:00:10,060 --> 01:00:11,810 Now, we can rerun the program. 1394 01:00:11,810 --> 01:00:14,530 And voila-- even more Pythonic-- 1395 01:00:14,530 --> 01:00:18,970 not necessarily as obvious or reusable, but certainly more succinct. 1396 01:00:18,970 --> 01:00:21,070 Let's do one more this time for-- 1397 01:00:21,070 --> 01:00:22,630 how about this? 1398 01:00:22,630 --> 01:00:25,420 Recall that we wanted to print a column of three bricks. 1399 01:00:25,420 --> 01:00:26,895 So how might we do this? 1400 01:00:26,895 --> 01:00:29,020 Well, let me go ahead and do it the simplistic way. 1401 01:00:29,020 --> 01:00:33,550 For i in range of 3, let me go ahead and print out a brick like that. 1402 01:00:33,550 --> 01:00:35,830 Let me run the program now, mario.py. 1403 01:00:35,830 --> 01:00:37,630 And voila, that one's pretty easy. 1404 01:00:37,630 --> 01:00:43,030 But I can actually do this a little more cleverly if I do this-- 1405 01:00:43,030 --> 01:00:45,340 print one of these-- 1406 01:00:45,340 --> 01:00:48,400 backslash n times 3. 1407 01:00:48,400 --> 01:00:51,490 But let's fix that bug that came up earlier, as well. 1408 01:00:51,490 --> 01:00:52,420 That's almost right. 1409 01:00:52,420 --> 01:00:54,380 But I claim that this was a little messy. 1410 01:00:54,380 --> 01:00:56,530 So what is the solution for fixing this bug, where 1411 01:00:56,530 --> 01:00:57,730 I'm just being a little nitpicky? 1412 01:00:57,730 --> 01:01:00,040 I don't want this extra blank line at the end, which 1413 01:01:00,040 --> 01:01:02,440 I'm getting for free from print itself. 1414 01:01:02,440 --> 01:01:04,900 The blank lines-- the new lines in the middle 1415 01:01:04,900 --> 01:01:07,870 are coming from the quoted string here. 1416 01:01:07,870 --> 01:01:10,670 What's the fix to get rid of that extra new line at the very end? 1417 01:01:10,670 --> 01:01:11,170 Yeah.
1418 01:01:11,170 --> 01:01:13,570 AUDIENCE: You could change end to nothing. 1419 01:01:13,570 --> 01:01:15,788 DAVID MALAN: Yeah, just say end equals quote unquote. 1420 01:01:15,788 --> 01:01:18,080 So the syntax is starting to get a little funky, right? 1421 01:01:18,080 --> 01:01:20,260 Like, it's a little harder to parse visually. 1422 01:01:20,260 --> 01:01:23,600 But this is, indeed, just the paradigm we've seen before. 1423 01:01:23,600 --> 01:01:25,200 Here is one argument on the left. 1424 01:01:25,200 --> 01:01:26,930 Here is another argument on the right. 1425 01:01:26,930 --> 01:01:28,638 The only thing that's different in Python 1426 01:01:28,638 --> 01:01:32,140 is that now, some arguments can have explicit names that you only 1427 01:01:32,140 --> 01:01:33,940 know from the documentation. 1428 01:01:33,940 --> 01:01:36,550 So now, if I rerun this after saving, now, I've 1429 01:01:36,550 --> 01:01:38,800 got the effect that I actually want. 1430 01:01:38,800 --> 01:01:41,560 Well, let's do one more with Mario here, this time 1431 01:01:41,560 --> 01:01:44,980 to do something a little two-dimensional and print out a brick that's like a 3 1432 01:01:44,980 --> 01:01:48,340 by 3 brick of hashes instead. 1433 01:01:48,340 --> 01:01:50,290 Well, let's go back to my code here. 1434 01:01:50,290 --> 01:01:54,050 And let me go ahead and do a first example in Python of a nested loop. 1435 01:01:54,050 --> 01:01:58,690 So let me go ahead and do for i in range of 3. 1436 01:01:58,690 --> 01:02:00,550 That gives me my rows. 1437 01:02:00,550 --> 01:02:04,300 And then, I can just do for j in range 3 also. 1438 01:02:04,300 --> 01:02:07,660 And then, in here, I can go ahead and print out just a hash mark. 1439 01:02:07,660 --> 01:02:10,660 But I don't want to print out new lines every time. 1440 01:02:10,660 --> 01:02:13,330 Otherwise, it's going to be a super tall column of hashes.
1441 01:02:13,330 --> 01:02:17,110 But after I print a row, I do want to print a new line. 1442 01:02:17,110 --> 01:02:18,397 So I think this suffices. 1443 01:02:18,397 --> 01:02:19,730 I'm going a little quickly here. 1444 01:02:19,730 --> 01:02:22,180 But again, this-- the logic is from week 1. 1445 01:02:22,180 --> 01:02:24,910 The syntax is now from week 6. 1446 01:02:24,910 --> 01:02:26,150 Let me run this again-- 1447 01:02:26,150 --> 01:02:27,560 mario.py. 1448 01:02:27,560 --> 01:02:28,060 Nope. 1449 01:02:28,060 --> 01:02:29,710 I screwed up. 1450 01:02:29,710 --> 01:02:31,670 What did I do wrong? 1451 01:02:31,670 --> 01:02:33,490 I didn't actually override what I intended. 1452 01:02:33,490 --> 01:02:36,338 1453 01:02:36,338 --> 01:02:37,880 What's-- yeah, over there on the left. 1454 01:02:37,880 --> 01:02:39,505 AUDIENCE: You included the backslash n. 1455 01:02:39,505 --> 01:02:41,463 DAVID MALAN: Yeah, and the whole point of using 1456 01:02:41,463 --> 01:02:43,320 the end parameter was to override it. 1457 01:02:43,320 --> 01:02:46,100 So let me change it to that, and let's see what happens now. 1458 01:02:46,100 --> 01:02:46,670 Voila. 1459 01:02:46,670 --> 01:02:48,920 Now I've implemented that same idea. 1460 01:02:48,920 --> 01:02:51,410 Whoo, I think Rice Krispie Treats await us in the lobby. 1461 01:02:51,410 --> 01:02:53,990 We'll see you in five minutes. 1462 01:02:53,990 --> 01:02:54,785 All right. 1463 01:02:54,785 --> 01:02:57,800 1464 01:02:57,800 --> 01:02:59,930 We are back. 1465 01:02:59,930 --> 01:03:03,650 And let's now look back at where we started this conversation 1466 01:03:03,650 --> 01:03:05,420 of comparing C against Python. 1467 01:03:05,420 --> 01:03:08,090 And recall that one of the earliest examples we did today 1468 01:03:08,090 --> 01:03:11,030 involved strings and using the CS50 library.
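The three Mario shapes from before the break, collected in one sketch. `print_grid` is an illustrative helper name, not from the lecture. The column uses string repetition plus `end=""`, so print does not add a fourth newline on top of the three already inside the string. The grid uses nested loops: the inner loop stays on one row, and a bare `print()` finishes each row.

```python
# Four question marks on one line: repeat the string, no loop needed.
print("?" * 4)

# A column of three bricks. The "\n" inside the string separates the rows;
# end="" stops print from adding an extra blank line at the bottom.
print("#\n" * 3, end="")


def print_grid(size):
    """Print a size-by-size square of hashes using nested loops."""
    for i in range(size):        # one iteration per row
        for j in range(size):    # one iteration per column
            print("#", end="")   # stay on the current row
        print()                  # finish the row with a newline


print_grid(3)
```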
1469 01:03:11,030 --> 01:03:14,690 But the CS50 library-- we're going to very quickly take away, indeed, 1470 01:03:14,690 --> 01:03:18,080 just after a few problems that you implement in problem set 6. 1471 01:03:18,080 --> 01:03:21,830 But we'll see now just how easily that can be done. 1472 01:03:21,830 --> 01:03:25,610 It turns out in Python, you don't need to use get_string or the CS50 library 1473 01:03:25,610 --> 01:03:29,960 itself, because there actually exists a function quite simply called input. 1474 01:03:29,960 --> 01:03:32,120 And indeed, I can get rid of get_string, replace it 1475 01:03:32,120 --> 01:03:35,780 with this function called input, and actually store the return value in s. 1476 01:03:35,780 --> 01:03:40,860 And for the most part, that will behave identically to get_string. 1477 01:03:40,860 --> 01:03:42,980 If I go ahead and run Python on string.py, 1478 01:03:42,980 --> 01:03:45,260 I can go ahead and type my name in. 1479 01:03:45,260 --> 01:03:47,390 And it still works as expected. 1480 01:03:47,390 --> 01:03:51,320 But I need to be mindful now that input, by definition, 1481 01:03:51,320 --> 01:03:53,660 in Python's documentation, always returns 1482 01:03:53,660 --> 01:03:57,255 a string, which means that if I'm going to get rid of get_int 1483 01:03:57,255 --> 01:04:00,380 and maybe get_float, another function you might want to use for problem set 1484 01:04:00,380 --> 01:04:06,830 6, and use input instead, it's no longer sufficient to just call input and store 1485 01:04:06,830 --> 01:04:08,610 the answer in a variable called age. 1486 01:04:08,610 --> 01:04:09,110 Why? 1487 01:04:09,110 --> 01:04:13,190 Even though I've not specified the type of age on line 1, 1488 01:04:13,190 --> 01:04:16,850 what apparently will its type be as I've just defined? 1489 01:04:16,850 --> 01:04:18,350 AUDIENCE: It's going to be a string. 1490 01:04:18,350 --> 01:04:19,975 DAVID MALAN: It's going to be a string. 
1491 01:04:19,975 --> 01:04:23,130 Input, by definition in Python, returns a string. 1492 01:04:23,130 --> 01:04:26,330 So if you want to convert it to an integer, you need to know how. 1493 01:04:26,330 --> 01:04:30,860 And the simplest way to do it is to convert it with a function 1494 01:04:30,860 --> 01:04:31,820 called int. 1495 01:04:31,820 --> 01:04:33,983 So this is actually very similar to casting in C. 1496 01:04:33,983 --> 01:04:35,150 But it's a little backwards. 1497 01:04:35,150 --> 01:04:38,390 In C, you would say open parenthesis, int, close parenthesis. 1498 01:04:38,390 --> 01:04:41,930 In Python, you say int open paren, whatever 1499 01:04:41,930 --> 01:04:44,300 it is you want to convert, and then close parentheses. 1500 01:04:44,300 --> 01:04:46,340 You call it as an actual function. 1501 01:04:46,340 --> 01:04:48,140 But this is going to be a little fragile. 1502 01:04:48,140 --> 01:04:52,610 It turns out that if you just blindly pass the user's input to this int 1503 01:04:52,610 --> 01:04:55,920 function, if it doesn't look like an int, bad things are going to happen. 1504 01:04:55,920 --> 01:04:59,250 You're going to see some kind of traceback or error message on the screen. 1505 01:04:59,250 --> 01:05:02,315 That's why, for this first week, we used the CS50 library and get_int 1506 01:05:02,315 --> 01:05:04,550 and get_string and get_float just because it's 1507 01:05:04,550 --> 01:05:10,910 a little harder using the library to accidentally mistreat input. 1508 01:05:10,910 --> 01:05:12,380 But you don't need to use this. 1509 01:05:12,380 --> 01:05:17,010 And you needn't-- you won't use it after just a week or so more time. 1510 01:05:17,010 --> 01:05:17,510 All right. 1511 01:05:17,510 --> 01:05:20,600 A few other examples, and we'll build ultimately 1512 01:05:20,600 --> 01:05:22,940 to some of the more powerful examples we can do even 1513 01:05:22,940 --> 01:05:26,120 after just two hours of Python programming.
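A minimal sketch of converting input safely without the CS50 library. `parse_age` is a hypothetical helper, not part of the lecture. `int()` raises `ValueError` on text that doesn't look like an integer, and that exception is what produces the traceback described above. Returning `None` is just one possible policy. The CS50 library instead re-prompts the user.

```python
def parse_age(text):
    """Return text as an int, or None if it isn't a valid integer."""
    try:
        # int(...) converts, much like a cast in C, but it is called like a function.
        return int(text)
    except ValueError:
        # Without this, int("twenty") would stop the program with a traceback.
        return None


# input() always returns a str, so the raw value is text, not a number:
# age = input("Age: ")              # e.g. "20" (a str)
# age = parse_age(input("Age: "))   # 20 (an int), or None for bad input
```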
1514 01:05:26,120 --> 01:05:30,500 Let me go ahead and open up, first of all, overflow.c, 1515 01:05:30,500 --> 01:05:33,020 which you might recall from a few weeks back 1516 01:05:33,020 --> 01:05:36,350 was a problem, because as soon as I kept doubling and doubling and doubling 1517 01:05:36,350 --> 01:05:39,020 an integer in C and printing it out, what eventually happened? 1518 01:05:39,020 --> 01:05:41,185 AUDIENCE: [INAUDIBLE] 1519 01:05:41,185 --> 01:05:43,060 DAVID MALAN: Slight spoiler in the file name. 1520 01:05:43,060 --> 01:05:44,433 AUDIENCE: It overflowed. 1521 01:05:44,433 --> 01:05:45,850 DAVID MALAN: It overflowed, right? 1522 01:05:45,850 --> 01:05:48,743 And it rolled around, so to speak, to 0, because all of the bits 1523 01:05:48,743 --> 01:05:50,660 eventually rolled-- you carried too many ones. 1524 01:05:50,660 --> 01:05:52,570 And voila, you were left with all zeros. 1525 01:05:52,570 --> 01:05:54,520 Python is actually kind of cool. 1526 01:05:54,520 --> 01:05:58,500 Let me go ahead and open up a file here called overflow.py 1527 01:05:58,500 --> 01:06:03,010 and implement this same idea this time in Python. 1528 01:06:03,010 --> 01:06:07,960 Let me go ahead and save this as overflow.py, which now might actually 1529 01:06:07,960 --> 01:06:09,040 be a bit of a misnomer. 1530 01:06:09,040 --> 01:06:11,740 I'm going to go ahead and do this. 1531 01:06:11,740 --> 01:06:13,570 i equals 1 initially. 1532 01:06:13,570 --> 01:06:15,940 While True, do the following forever. 1533 01:06:15,940 --> 01:06:17,100 Go ahead and print out i. 1534 01:06:17,100 --> 01:06:18,100 And then, you know what? 1535 01:06:18,100 --> 01:06:20,590 Let me go ahead and sleep for one second and then, 1536 01:06:20,590 --> 01:06:24,760 go ahead and multiply i times 2, which I can also more 1537 01:06:24,760 --> 01:06:27,690 succinctly write as i star equals 2-- 1538 01:06:27,690 --> 01:06:31,360 so almost identical to C, except no semicolon here. 
1539 01:06:31,360 --> 01:06:33,670 However, sleep you don't just get automatically. 1540 01:06:33,670 --> 01:06:36,790 It turns out sleep is in a library called time. 1541 01:06:36,790 --> 01:06:39,560 So I'm going to have to import sleep, so to speak, 1542 01:06:39,560 --> 01:06:41,500 by using this one-liner up top. 1543 01:06:41,500 --> 01:06:44,200 Let me go ahead and run this as Python of overflow.py. 1544 01:06:44,200 --> 01:06:50,360 Let me go ahead and increase the size of this window here and run this. 1545 01:06:50,360 --> 01:06:51,162 OK. 1546 01:06:51,162 --> 01:06:52,120 I'm a little impatient. 1547 01:06:52,120 --> 01:06:53,380 That seems a little slow. 1548 01:06:53,380 --> 01:06:57,270 In Python, you can actually sleep for fractions of sentence-- frackish-- 1549 01:06:57,270 --> 01:06:59,643 blah, blah-- fractions of seconds. 1550 01:06:59,643 --> 01:07:00,685 So let me do this faster. 1551 01:07:00,685 --> 01:07:04,181 AUDIENCE: [INAUDIBLE] 1552 01:07:04,181 --> 01:07:04,833 1553 01:07:04,833 --> 01:07:05,500 DAVID MALAN: OK. 1554 01:07:05,500 --> 01:07:06,400 Now, I'm not counting. 1555 01:07:06,400 --> 01:07:09,192 But I'm pretty sure that's more than 4 billion, which you'll recall 1556 01:07:09,192 --> 01:07:11,068 was the upper bound the last time around. 1557 01:07:11,068 --> 01:07:13,610 And in fact, even though the internet is a little slow here-- 1558 01:07:13,610 --> 01:07:17,620 so that's why it's not churning it out at a super fast rate-- 1559 01:07:17,620 --> 01:07:18,910 these are really big numbers. 1560 01:07:18,910 --> 01:07:22,300 And amazingly in Python, indeed, it's great for data science and analytics 1561 01:07:22,300 --> 01:07:22,960 and such. 1562 01:07:22,960 --> 01:07:24,610 Ints have no upper bounds. 1563 01:07:24,610 --> 01:07:26,080 You cannot overflow an int. 1564 01:07:26,080 --> 01:07:28,330 It will just grow and grow and grow until, frankly, it 1565 01:07:28,330 --> 01:07:29,470 takes over your computer. 
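A sketch of overflow.py's doubling loop. The lecture's version loops forever with `while True:`. This one stops after a given number of doublings, so it can be run and checked. The function name and its parameters are illustrative. Python ints grow as needed, so `i *= 2` never wraps around to 0 the way a fixed-width C int does.

```python
from time import sleep


def double_repeatedly(times, delay=0.1):
    """Print 1, 2, 4, ... (`times` values) and return the next power of 2."""
    i = 1
    for _ in range(times):
        print(i)
        sleep(delay)   # time.sleep accepts fractions of a second
        i *= 2         # Python ints grow as needed; no wraparound to 0
    return i
```

For example, `double_repeatedly(70, delay=0)` returns `2 ** 70`, which is 1180591620717411303424. That is larger than any 64-bit C integer can hold.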
1566 01:07:29,470 --> 01:07:34,210 But there is no fixed limit, as there was in C, which is wonderful. 1567 01:07:34,210 --> 01:07:36,610 Downside, though: Python floats are still 1568 01:07:36,610 --> 01:07:38,422 imprecise-- so there are libraries, though. 1569 01:07:38,422 --> 01:07:40,630 There is code that other people have written, though, 1570 01:07:40,630 --> 01:07:43,160 to mitigate that problem in Python, as well. 1571 01:07:43,160 --> 01:07:43,660 All right. 1572 01:07:43,660 --> 01:07:46,360 Let's move now to where we left off in week 2, 1573 01:07:46,360 --> 01:07:49,930 where we started introducing arrays that we're now going to start calling lists. 1574 01:07:49,930 --> 01:07:51,790 Let me go ahead and split my window again. 1575 01:07:51,790 --> 01:07:58,790 Let me go ahead and open from week 2 an example like scores2.c, which 1576 01:07:58,790 --> 01:08:00,290 looked a little something like this. 1577 01:08:00,290 --> 01:08:01,165 So it's been a while. 1578 01:08:01,165 --> 01:08:03,230 But we did see this example a while back, 1579 01:08:03,230 --> 01:08:06,630 which just initializes an array with three values-- 1580 01:08:06,630 --> 01:08:10,930 72, 73, 33-- and then computes the average using a bit of arithmetic 1581 01:08:10,930 --> 01:08:11,660 down below. 1582 01:08:11,660 --> 01:08:14,960 So a while back, but all it did was quite simply that. 1583 01:08:14,960 --> 01:08:18,220 Let me go ahead and create a file called scores.py on the right-hand side 1584 01:08:18,220 --> 01:08:19,560 now in Python. 1585 01:08:19,560 --> 01:08:22,750 And let me go ahead and just give myself an array now called a list. 1586 01:08:22,750 --> 01:08:25,060 And it's a list in the sense, like a linked list, 1587 01:08:25,060 --> 01:08:27,770 that it can grow and shrink automatically-- 1588 01:08:27,770 --> 01:08:29,529 so no more malloc or realloc.
1589 01:08:29,529 --> 01:08:32,649 So in fact, if I want to add something to this list, 1590 01:08:32,649 --> 01:08:35,470 I can literally say scores, which is the name of the variable, 1591 01:08:35,470 --> 01:08:39,580 go inside of it just like a struct in C, and use a function, otherwise known 1592 01:08:39,580 --> 01:08:41,890 now as a method that's inside of a structure, 1593 01:08:41,890 --> 01:08:44,080 and just append a value like 72. 1594 01:08:44,080 --> 01:08:46,450 I can then do this again and append 73. 1595 01:08:46,450 --> 01:08:49,720 And I can then do this again and append 33. 1596 01:08:49,720 --> 01:08:51,790 And now, I can go ahead and print out an average. 1597 01:08:51,790 --> 01:08:54,220 Let's go ahead and say average, just like before. 1598 01:08:54,220 --> 01:08:57,220 And it turns out Python has some fancy functions that are useful here. 1599 01:08:57,220 --> 01:08:59,260 I can take the sum of all of those scores 1600 01:08:59,260 --> 01:09:04,240 and divide by the length of that list, thereby giving me, hopefully, 1601 01:09:04,240 --> 01:09:06,310 the total count-- 1602 01:09:06,310 --> 01:09:09,010 the total sum of the scores divided by the total count of scores 1603 01:09:09,010 --> 01:09:12,200 and getting an average-- so python scores.py. 1604 01:09:12,200 --> 01:09:13,850 Oh, no, I forgot what? 1605 01:09:13,850 --> 01:09:14,350 AUDIENCE: f. 1606 01:09:14,350 --> 01:09:15,880 DAVID MALAN: Just the f for an f-string. 1607 01:09:15,880 --> 01:09:16,380 All right. 1608 01:09:16,380 --> 01:09:18,100 So let me go ahead now and rerun that. 1609 01:09:18,100 --> 01:09:21,490 And voila-- it looks like with those three values, the average works out 1610 01:09:21,490 --> 01:09:25,332 actually to, for instance, 59.33333. 1611 01:09:25,332 --> 01:09:28,540 And if I actually started poking around, we would really see the imprecision. 1612 01:09:28,540 --> 01:09:31,183 And we're starting to see it on the screen here already.
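A sketch of scores.py as described. The "Average:" label is assumed. `sum()` adds the list's elements and `len()` counts them. The f prefix is what makes `{average}` get substituted. The last lines show one standard-library option for the float imprecision mentioned earlier: `fractions.Fraction`. The lecture does not use it, and `decimal.Decimal` is another choice.

```python
from fractions import Fraction

# A list grows as needed: no malloc or realloc.
scores = []
scores.append(72)
scores.append(73)
scores.append(33)
# Equivalent, more succinct: scores = [72, 73, 33]

# Total of the scores divided by how many scores there are.
average = sum(scores) / len(scores)
print(f"Average: {average}")   # without the f, {average} would print literally

# A float can only approximate 178/3; a Fraction keeps the exact ratio.
exact = Fraction(sum(scores), len(scores))
print(exact)
```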
1613 01:09:31,183 --> 01:09:33,100 Well, let me go ahead and make this more succinct. 1614 01:09:33,100 --> 01:09:34,892 I don't need to use append, append, append. 1615 01:09:34,892 --> 01:09:40,569 In Python, I can just say scores 72, 73, 33, not unlike the curly brace notation 1616 01:09:40,569 --> 01:09:43,029 you might recall seeing at some points in C. 1617 01:09:43,029 --> 01:09:46,149 But it's a little more commonly used here in Python. 1618 01:09:46,149 --> 01:09:49,465 So this, too, is going to work exactly the same, the point being lists 1619 01:09:49,465 --> 01:09:51,010 can grow and shrink. 1620 01:09:51,010 --> 01:09:52,359 If you want a list, just use it. 1621 01:09:52,359 --> 01:09:55,820 You don't have to think as hard anymore about using that type of structure. 1622 01:09:55,820 --> 01:09:56,320 All right. 1623 01:09:56,320 --> 01:09:58,120 Let me open up one of the first problems, 1624 01:09:58,120 --> 01:09:59,830 though, we encountered in week 2. 1625 01:09:59,830 --> 01:10:03,250 And that was, for instance, in string2.c. 1626 01:10:03,250 --> 01:10:08,920 In string2.c, recall that I simply wanted to iterate over 1627 01:10:08,920 --> 01:10:10,377 all of the characters in a string. 1628 01:10:10,377 --> 01:10:13,210 And this problem we were able to solve pretty straightforwardly in C 1629 01:10:13,210 --> 01:10:16,330 by using the square bracket notation-- turns out in Python, 1630 01:10:16,330 --> 01:10:18,130 we can do this a little more succinctly. 1631 01:10:18,130 --> 01:10:21,622 Let me go ahead and call this string.py. 1632 01:10:21,622 --> 01:10:27,880 I'm going to go ahead and now import from cs50 the get_string function 1633 01:10:27,880 --> 01:10:30,130 just to make user input a little easier today. 1634 01:10:30,130 --> 01:10:32,800 I'm going to go ahead and get a string from the user, 1635 01:10:32,800 --> 01:10:34,150 asking them for their inputs.
1636 01:10:34,150 --> 01:10:37,450 And then, I'm just going to go ahead and print out the word Output. 1637 01:10:37,450 --> 01:10:39,550 And then, I'm going to suppress the newline, just 1638 01:10:39,550 --> 01:10:41,800 to keep things all on the same line. 1639 01:10:41,800 --> 01:10:44,590 And then, I want to iterate now over the user's input 1640 01:10:44,590 --> 01:10:46,720 and print it character by character. 1641 01:10:46,720 --> 01:10:51,010 Well, in C, I did this with square bracket notation and a very verbose 1642 01:10:51,010 --> 01:10:51,880 for loop. 1643 01:10:51,880 --> 01:10:57,288 In Python, I can do something pretty similar-- for i in range of the length of s, 1644 01:10:57,288 --> 01:11:00,080 because the length of the string is the total number of characters. 1645 01:11:00,080 --> 01:11:01,960 If I pass that as input to range, that lets 1646 01:11:01,960 --> 01:11:04,120 me iterate once for every character. 1647 01:11:04,120 --> 01:11:05,650 And I can use the same notation. 1648 01:11:05,650 --> 01:11:08,080 I can print s bracket i in Python. 1649 01:11:08,080 --> 01:11:11,732 And let me get rid of the newlines so that I only have one at the very end. 1650 01:11:11,732 --> 01:11:12,940 So again, I'm typing quickly. 1651 01:11:12,940 --> 01:11:15,315 But range just counts some number of times. 1652 01:11:15,315 --> 01:11:15,940 How many times? 1653 01:11:15,940 --> 01:11:19,060 However many characters there are, as per the length of the string, 1654 01:11:19,060 --> 01:11:22,600 and on each iteration, print the ith character of s. 1655 01:11:22,600 --> 01:11:24,900 Let me go ahead and run this-- python of string.py. 1656 01:11:24,900 --> 01:11:27,850 Let me type in, for instance-- oops. 1657 01:11:27,850 --> 01:11:28,840 Do that again. 1658 01:11:28,840 --> 01:11:31,360 After I see the prompt for input, let me type Emma's name. 1659 01:11:31,360 --> 01:11:32,430 And there's the output, right?
1660 01:11:32,430 --> 01:11:34,347 It looks the same, even though I'm technically 1661 01:11:34,347 --> 01:11:36,090 printing it character by character. 1662 01:11:36,090 --> 01:11:37,510 But Python is kind of fancy. 1663 01:11:37,510 --> 01:11:39,970 And you don't need all of this mechanical stuff, 1664 01:11:39,970 --> 01:11:42,700 like counting numbers and square bracket notation. 1665 01:11:42,700 --> 01:11:46,900 If you want to iterate over a string character by character, 1666 01:11:46,900 --> 01:11:51,037 you can just say for c in s, print c. 1667 01:11:51,037 --> 01:11:53,620 And it will figure out how to get the character that you want. 1668 01:11:53,620 --> 01:11:55,570 Technically, let me override the newline. 1669 01:11:55,570 --> 01:11:57,290 But this is much more pleasant now. 1670 01:11:57,290 --> 01:12:00,580 Now, if I want to type in the same thing, voila, works the same, 1671 01:12:00,580 --> 01:12:03,500 less code, getting more work done, getting back to other things 1672 01:12:03,500 --> 01:12:05,410 I really want to do instead. 1673 01:12:05,410 --> 01:12:08,440 Let's look at another case from week 2, 1674 01:12:08,440 --> 01:12:10,570 where we had this uppercase code. 1675 01:12:10,570 --> 01:12:14,215 The goal here, recall, was to take a string s from the user, 1676 01:12:14,215 --> 01:12:18,880 and then go ahead and capitalize all of the letters therein. 1677 01:12:18,880 --> 01:12:21,760 So how might I do this in-- oops-- how might I do this in Python? 1678 01:12:21,760 --> 01:12:23,800 Well, we've seen hints of this already. 1679 01:12:23,800 --> 01:12:27,350 In a file called uppercase.py, 1680 01:12:27,350 --> 01:12:32,562 I'm going to go ahead and, from cs50, import get_string as before. 1681 01:12:32,562 --> 01:12:35,020 Then, I'm going to go ahead and get a string from the user, 1682 01:12:35,020 --> 01:12:37,300 asking them for the before version.
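Below is a small sketch that puts the two string.py loops next to each other. To keep it self-contained, it assumes a hard-coded string in place of the CS50 library's get_string prompt. Both loops pass end="" to suppress print's newline, and both print "Output: Emma".

```python
s = "Emma"  # assumed stand-in for get_string("Input: ")

# Index-based loop, mirroring the C version's s[i]
print("Output: ", end="")
for i in range(len(s)):
    print(s[i], end="")
print()

# Pythonic loop: iterate over the characters directly
print("Output: ", end="")
for c in s:
    print(c, end="")
print()
```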
1683 01:12:37,300 --> 01:12:40,370 And then, here, I'm going to go ahead and print out After. 1684 01:12:40,370 --> 01:12:43,000 And then, I'm going to go ahead and suppress the newline. 1685 01:12:43,000 --> 01:12:43,390 And you know what? 1686 01:12:43,390 --> 01:12:45,265 If I want to print the string, I'm just going 1687 01:12:45,265 --> 01:12:49,210 to go ahead and print s.upper() and be done with it today. 1688 01:12:49,210 --> 01:12:51,910 So now, if I do python of upper-- up-- 1689 01:12:51,910 --> 01:12:56,170 oops-- python of uppercase.py, and let's type in Emma's name this time in all 1690 01:12:56,170 --> 01:12:57,220 lowercase-- 1691 01:12:57,220 --> 01:12:57,942 voila-- done. 1692 01:12:57,942 --> 01:12:59,650 And you don't have to worry about getting 1693 01:12:59,650 --> 01:13:02,920 into the weeds of each individual character. 1694 01:13:02,920 --> 01:13:05,650 Variables of type string, like s in this case, 1695 01:13:05,650 --> 01:13:07,930 have functions, or methods, built in, like upper. 1696 01:13:07,930 --> 01:13:10,080 And we saw lower, as well, earlier. 1697 01:13:10,080 --> 01:13:10,580 All right. 1698 01:13:10,580 --> 01:13:13,705 Someone asked during the break about command line arguments, the things you 1699 01:13:13,705 --> 01:13:15,382 can type after the program's name at the prompt. 1700 01:13:15,382 --> 01:13:17,590 Well, it's a little weird with Python, because you're 1701 01:13:17,590 --> 01:13:20,470 running a program called Python whose command line argument 1702 01:13:20,470 --> 01:13:22,240 is the name of your program. 1703 01:13:22,240 --> 01:13:26,380 But you can still provide command line arguments to your own program 1704 01:13:26,380 --> 01:13:28,070 after the name of the file. 1705 01:13:28,070 --> 01:13:29,680 So it's kind of offset by one. 1706 01:13:29,680 --> 01:13:31,550 But you can, nonetheless, do this. 1707 01:13:31,550 --> 01:13:36,280 So let me go ahead and open up from week 2, say, argv1.c.
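As a minimal sketch of uppercase.py, here a hard-coded lowercase name stands in for get_string (an assumption made so the snippet runs on its own). Note that str.upper() returns a new string and leaves s itself unchanged.

```python
s = "emma"  # assumed stand-in for get_string("Before: ")

print("After:  ", end="")  # end="" suppresses the newline
print(s.upper())           # upper() returns a new, capitalized string: EMMA
```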
1708 01:13:36,280 --> 01:13:37,840 And this is from a few weeks back. 1709 01:13:37,840 --> 01:13:40,240 And the purpose of this program in C was just 1710 01:13:40,240 --> 01:13:43,630 to print each command line argument one at a time. 1711 01:13:43,630 --> 01:13:46,630 In Python, today, I'm going to call this argv.py. 1712 01:13:46,630 --> 01:13:48,745 And this is a little different. 1713 01:13:48,745 --> 01:13:50,620 If you want to access command line arguments, 1714 01:13:50,620 --> 01:13:53,410 you can't just use argv and argc because there 1715 01:13:53,410 --> 01:13:59,320 is no int main(void), or specifically, int main(int argc, string argv[]), 1716 01:13:59,320 --> 01:14:00,190 as there was in C. 1717 01:14:00,190 --> 01:14:01,330 That's gone. 1718 01:14:01,330 --> 01:14:04,480 But argv and command line arguments more generally 1719 01:14:04,480 --> 01:14:06,430 are exposed to you in another library. 1720 01:14:06,430 --> 01:14:08,460 It happens to be called sys for system. 1721 01:14:08,460 --> 01:14:11,210 And you can literally just import argv from it if you want. 1722 01:14:11,210 --> 01:14:13,270 So it's a little different, but same exact idea. 1723 01:14:13,270 --> 01:14:17,650 And if I want to print each of those, I can say for i in range-- 1724 01:14:17,650 --> 01:14:19,270 now I want to say argc. 1725 01:14:19,270 --> 01:14:22,700 My goal at hand, again, per the left, is just 1726 01:14:22,700 --> 01:14:25,880 to print each command line argument and be done with it. 1727 01:14:25,880 --> 01:14:27,160 But I don't have argc. 1728 01:14:27,160 --> 01:14:30,160 And you might like to do this, but that doesn't exist. 1729 01:14:30,160 --> 01:14:30,970 But that's OK. 1730 01:14:30,970 --> 01:14:35,530 How do you think I could get the number of arguments in argv? 1731 01:14:35,530 --> 01:14:37,676 The number of strings in argv? 1732 01:14:37,676 --> 01:14:39,350 AUDIENCE: [INAUDIBLE] 1733 01:14:39,350 --> 01:14:41,100 DAVID MALAN: Yeah, go with your instincts.
1734 01:14:41,100 --> 01:14:42,933 We've only seen a few building blocks today. 1735 01:14:42,933 --> 01:14:46,590 But if argv is a list of all command line arguments, 1736 01:14:46,590 --> 01:14:51,630 it stands to reason that the length of that list is the same thing as argc. 1737 01:14:51,630 --> 01:14:54,002 In C, the length of something and the something 1738 01:14:54,002 --> 01:14:55,710 were kept in separate variables. 1739 01:14:55,710 --> 01:14:57,960 In Python, you only need the thing itself 1740 01:14:57,960 --> 01:15:00,660 because you can just ask it, what is your length? 1741 01:15:00,660 --> 01:15:02,700 So if I go ahead and do this, I can now go ahead 1742 01:15:02,700 --> 01:15:06,750 and print out argv bracket i. 1743 01:15:06,750 --> 01:15:07,470 And let's see. 1744 01:15:07,470 --> 01:15:09,370 Python of argv.py. 1745 01:15:09,370 --> 01:15:10,920 Enter. 1746 01:15:10,920 --> 01:15:13,150 Nothing printed except the program's name. 1747 01:15:13,150 --> 01:15:15,270 But what if I type in foo? 1748 01:15:15,270 --> 01:15:16,650 What if I type in bar? 1749 01:15:16,650 --> 01:15:17,910 What if I type in baz? 1750 01:15:17,910 --> 01:15:20,460 These are just weird go-to words that computer scientists use 1751 01:15:20,460 --> 01:15:22,740 when they need a placeholder like xyz. 1752 01:15:22,740 --> 01:15:27,333 It's indeed printing all of the words after my program's name. 1753 01:15:27,333 --> 01:15:29,250 Of course, I don't need to get into the weeds. 1754 01:15:29,250 --> 01:15:31,542 As before, if you want to iterate over all of the words 1755 01:15:31,542 --> 01:15:36,285 in a list, instead of for i, you can say, for instance, for arg in argv, 1756 01:15:36,285 --> 01:15:37,410 and just go ahead and print it. 1757 01:15:37,410 --> 01:15:37,910 Voila. 1758 01:15:37,910 --> 01:15:38,580 Python. 1759 01:15:38,580 --> 01:15:40,680 Much faster to do the same thing.
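The following is a sketch of argv.py using the Pythonic loop. The index-based version that uses len(argv) in place of C's argc is left in a comment. sys.argv is a list of strings whose first element is the script's own name.

```python
from sys import argv

# argv is a list of strings; argv[0] is the script's own name.
# len(argv) plays the role argc played in C, so this would also work:
#     for i in range(len(argv)):
#         print(argv[i])
# Iterating over the list directly is simpler:
for arg in argv:
    print(arg)
```

Running `python argv.py foo bar baz` should print argv.py, foo, bar, and baz, each on its own line.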
1760 01:15:40,680 --> 01:15:43,830 So it reads a lot more like English even though it's a little terse, 1761 01:15:43,830 --> 01:15:47,970 but the end result is going to be the same thing here. 1762 01:15:47,970 --> 01:15:51,330 A couple more quick examples of building blocks that you might assume 1763 01:15:51,330 --> 01:15:53,370 exist, and indeed do. 1764 01:15:53,370 --> 01:15:56,790 In exit.c, a few weeks back, we just introduced the notion 1765 01:15:56,790 --> 01:15:59,970 of returning 0 or returning 1 or some other value 1766 01:15:59,970 --> 01:16:02,220 just to signify that something worked or did not work. 1767 01:16:02,220 --> 01:16:04,140 This was success or failure. 1768 01:16:04,140 --> 01:16:07,860 Python offers the same feature, but the syntax is a little different. 1769 01:16:07,860 --> 01:16:10,980 Let me create a file called exit.py. 1770 01:16:10,980 --> 01:16:14,710 And I can get access to both argv and exit like this. 1771 01:16:14,710 --> 01:16:19,953 Let me go ahead and from sys import argv and a function called exit. 1772 01:16:19,953 --> 01:16:22,620 So in Python, you don't just magically have access to functions. 1773 01:16:22,620 --> 01:16:25,140 Sometimes you do need, as in C, to import them. 1774 01:16:25,140 --> 01:16:27,792 And you only know what exists from the documentation. 1775 01:16:27,792 --> 01:16:29,250 And I'm going to do the same thing. 1776 01:16:29,250 --> 01:16:33,390 So where I wanted to say in C, if argc does not equal 2, the equivalent in Python 1777 01:16:33,390 --> 01:16:36,930 is if the length of argv does not equal 2. 1778 01:16:36,930 --> 01:16:37,890 What do I want to do? 1779 01:16:37,890 --> 01:16:42,570 I want to go ahead and print missing command line argument. 1780 01:16:42,570 --> 01:16:45,180 And then I'm going to go ahead and exit 1.
1781 01:16:45,180 --> 01:16:47,520 So whereas in C we said return 1 because we 1782 01:16:47,520 --> 01:16:50,278 had a special main function, in Python, for now, 1783 01:16:50,278 --> 01:16:51,570 we're just going to say exit 1. 1784 01:16:51,570 --> 01:16:53,880 Same idea, slightly different name. 1785 01:16:53,880 --> 01:16:59,940 Otherwise I'm going to go ahead and print out hello, placeholder, argv bracket 1, 1786 01:16:59,940 --> 01:17:01,140 with an f-string. 1787 01:17:01,140 --> 01:17:03,690 So this one's a little faster. 1788 01:17:03,690 --> 01:17:06,830 But just to be super clear, all I'm doing is converting from left to right. 1789 01:17:06,830 --> 01:17:09,330 And we'll have all of these examples on the course's website 1790 01:17:09,330 --> 01:17:11,250 if you want to look at them more slowly, left and right. 1791 01:17:11,250 --> 01:17:14,020 The only new detail here is instead of returning 1 on error, 1792 01:17:14,020 --> 01:17:15,570 I'm going to start calling exit 1. 1793 01:17:15,570 --> 01:17:19,415 And I have to access that function after importing it from the sys library. 1794 01:17:19,415 --> 01:17:20,790 That's all that's different here. 1795 01:17:20,790 --> 01:17:25,530 Returning 0, then, is the same as exiting 0. 1796 01:17:25,530 --> 01:17:26,190 All right. 1797 01:17:26,190 --> 01:17:28,650 What more building blocks might we like? 1798 01:17:28,650 --> 01:17:33,020 How about-- oh, this is interesting to me. 1799 01:17:33,020 --> 01:17:37,080 Here, let's go ahead and open up names.py, or rather-- 1800 01:17:37,080 --> 01:17:39,450 let's see. 1801 01:17:39,450 --> 01:17:41,920 Actually, let's go ahead and do this one from scratch. 1802 01:17:41,920 --> 01:17:45,330 I'm going to go ahead and do a quick linear search style algorithm, 1803 01:17:45,330 --> 01:17:47,420 this one called names.py.
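This variant of exit.py wraps the logic in a function that returns an exit status, the way main's return value worked in C. The function name main and the argument list it receives are assumptions for illustration, not necessarily how the lecture's file is laid out. It keeps the same checks: exactly one argument after the script's name, 1 for failure, 0 for success.

```python
def main(argv):
    # Mirrors "if (argc != 2)" from exit.c, with len(argv) standing in for argc
    if len(argv) != 2:
        print("missing command-line argument")
        return 1  # nonzero status signals failure, as in C
    print(f"hello, {argv[1]}")
    return 0  # zero signals success
```

In a script, you would hand that status to the operating system with `import sys` and then `sys.exit(main(sys.argv))` at the bottom. Calling `exit(1)` directly, as described above, has the same effect.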
1804 01:17:47,420 --> 01:17:53,310 Let me go ahead and, from sys, import exit 1805 01:17:53,310 --> 01:17:55,967 just so I can return 0 or 1 as needed. 1806 01:17:55,967 --> 01:17:58,800 Let me give myself a list of names just like we did a few weeks ago. 1807 01:17:58,800 --> 01:18:02,760 Emma, and Rodrigo, and Brian, and my own. 1808 01:18:02,760 --> 01:18:05,787 All in caps, just for consistency with a few weeks back. 1809 01:18:05,787 --> 01:18:07,620 Suppose I want to search for just one of us. 1810 01:18:07,620 --> 01:18:09,370 And suppose this program is only searching 1811 01:18:09,370 --> 01:18:12,280 for Emma to see if she's in a list, just as we did a few weeks back. 1812 01:18:12,280 --> 01:18:14,170 Well, in the past, you would do a for loop. 1813 01:18:14,170 --> 01:18:16,420 You would iterate over every darn element in the list, 1814 01:18:16,420 --> 01:18:19,830 checking if it equals equals Emma or using strcmp to compare against Emma. 1815 01:18:19,830 --> 01:18:20,490 Oh my god, no. 1816 01:18:20,490 --> 01:18:21,865 We don't need to do that anymore. 1817 01:18:21,865 --> 01:18:27,960 If you want to know if something is in a list, just say if Emma in names, print 1818 01:18:27,960 --> 01:18:29,300 found. 1819 01:18:29,300 --> 01:18:32,700 And then I'm going to go ahead and exit 0 for success. 1820 01:18:32,700 --> 01:18:36,210 And down here, I'm going to assume, if I get this far, print not found. 1821 01:18:36,210 --> 01:18:38,340 And I'll exit 1. 1822 01:18:38,340 --> 01:18:42,910 So if I run Python of names.py. 1823 01:18:42,910 --> 01:18:43,620 Enter. 1824 01:18:43,620 --> 01:18:44,640 Emma is found. 1825 01:18:44,640 --> 01:18:48,600 Suppose I change her name to Emma Humphrey up here. 1826 01:18:48,600 --> 01:18:52,110 Now it's not going to be found because Emma is not technically in the list. 1827 01:18:52,110 --> 01:18:53,470 Emma Humphrey is in the list. 1828 01:18:53,470 --> 01:18:55,890 So now if I rerun it she's not found.
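This sketch checks the `in` one-liner against an explicit linear search of the kind we used to write by hand. The names are capitalized to match the list described above. The helper linear_search is a hypothetical name introduced for comparison, and the exit calls are left out so the snippet can run anywhere. The last line shows that `in` compares whole elements, which is why an entry of "EMMA HUMPHREY" doesn't match "EMMA".

```python
names = ["EMMA", "RODRIGO", "BRIAN", "DAVID"]

def linear_search(names, target):
    """Hypothetical helper: conceptually what `target in names` does for a list."""
    for name in names:
        if name == target:
            return True
    return False

# The one-liner...
print("Found" if "EMMA" in names else "Not found")
# ...agrees with the explicit loop
print("Found" if linear_search(names, "EMMA") else "Not found")

# "in" compares whole elements, so a longer entry doesn't match
print("EMMA" in ["EMMA HUMPHREY", "RODRIGO"])  # False
```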
1829 01:18:55,890 --> 01:19:00,240 But I have distilled into a succinct one-liner all of the logic 1830 01:19:00,240 --> 01:19:04,840 that for weeks we've been implementing with for loops and the like. 1831 01:19:04,840 --> 01:19:05,340 All right. 1832 01:19:05,340 --> 01:19:10,020 Any questions before we now introduce some new Python-specific capabilities? 1833 01:19:10,020 --> 01:19:10,656 Yeah. 1834 01:19:10,656 --> 01:19:13,392 AUDIENCE: [INAUDIBLE] 1835 01:19:13,392 --> 01:19:14,373 1836 01:19:14,373 --> 01:19:15,790 DAVID MALAN: Really good question. 1837 01:19:15,790 --> 01:19:18,570 What would be the big O notation for doing this here? 1838 01:19:18,570 --> 01:19:19,762 This is well-documented. 1839 01:19:19,762 --> 01:19:21,720 So if you actually read Python's documentation, 1840 01:19:21,720 --> 01:19:24,270 for each of its data structures, searching something like a list 1841 01:19:24,270 --> 01:19:25,560 will give you big O of n. 1842 01:19:25,560 --> 01:19:26,850 That is well-defined. 1843 01:19:26,850 --> 01:19:29,635 A dictionary, too, has a well-defined running time, constant with high probability, 1844 01:19:29,635 --> 01:19:31,260 and we'll come to that in a little bit. 1845 01:19:31,260 --> 01:19:33,843 You would read the documentation to know exactly those things. 1846 01:19:33,843 --> 01:19:35,850 So having familiarity with that big O notation 1847 01:19:35,850 --> 01:19:39,280 can actually help you answer those things from docs as well. 1848 01:19:39,280 --> 01:19:39,780 All right. 1849 01:19:39,780 --> 01:19:43,320 Let's go ahead and open up a fancier example, 1850 01:19:43,320 --> 01:19:46,650 or write one, called phonebook.py, the goal of which 1851 01:19:46,650 --> 01:19:48,510 is to represent the notion of a phone book. 1852 01:19:48,510 --> 01:19:52,110 Let me go ahead now and still from sys import exit 1853 01:19:52,110 --> 01:19:54,000 just so I can terminate if we fail. 
1854 01:19:54,000 --> 01:19:56,040 Let me go ahead and define a bunch of people.
1855 01:19:56,040 --> 01:19:59,880 But instead of putting people in a list like before, now 1856 01:19:59,880 --> 01:20:02,400 I want to use something like a hash table. 1857 01:20:02,400 --> 01:20:06,122 A hash table, recall, has inputs and outputs like keys and values. 1858 01:20:06,122 --> 01:20:07,830 Or more generally, this is now what we're 1859 01:20:07,830 --> 01:20:09,340 going to start calling a dictionary. 1860 01:20:09,340 --> 01:20:11,160 A dictionary, just like in the human world, 1861 01:20:11,160 --> 01:20:13,922 has a lot of words with a lot of definitions. 1862 01:20:13,922 --> 01:20:15,630 A phone book is essentially a dictionary. 1863 01:20:15,630 --> 01:20:18,030 It's got a lot of names and a lot of numbers. 1864 01:20:18,030 --> 01:20:20,770 Those are keys and values respectively. 1865 01:20:20,770 --> 01:20:25,950 So a dict in Python takes as input keys and produces as output values. 1866 01:20:25,950 --> 01:20:28,080 And it happens to be implemented typically 1867 01:20:28,080 --> 01:20:31,410 by the people who invented Python using a hash table. 1868 01:20:31,410 --> 01:20:34,500 So the hash table you all wrote is now a building block 1869 01:20:34,500 --> 01:20:37,740 to these data structures or abstract data structures that we'll now call, 1870 01:20:37,740 --> 01:20:40,450 for instance, a dictionary more generally. 1871 01:20:40,450 --> 01:20:43,770 So curly braces are back only in the context 1872 01:20:43,770 --> 01:20:47,070 here of defining what's a dict or dictionary. 1873 01:20:47,070 --> 01:20:49,530 I'm going to go ahead and define a key called Emma 1874 01:20:49,530 --> 01:20:51,988 and I'm going to give her the same phone number we gave her 1875 01:20:51,988 --> 01:20:53,340 a while back of this. 1876 01:20:53,340 --> 01:20:54,210 Notice the colon. 1877 01:20:54,210 --> 01:20:57,497 Notice the double quotes around each value. 1878 01:20:57,497 --> 01:20:59,580 Let me go ahead and put Rodrigo in the phone book. 
1879 01:20:59,580 --> 01:21:04,320 And his number is going to be 617-555-0101 as before. 1880 01:21:04,320 --> 01:21:07,230 Let me go ahead and put Brian in there, also separated with a colon. 1881 01:21:07,230 --> 01:21:09,060 555-0102. 1882 01:21:09,060 --> 01:21:14,520 And I'll put myself in there with 617-555-0103. 1883 01:21:14,520 --> 01:21:16,530 So this is a little different-looking. 1884 01:21:16,530 --> 01:21:18,210 The curly braces say, hey, Python. 1885 01:21:18,210 --> 01:21:19,795 Here comes a dictionary. 1886 01:21:19,795 --> 01:21:22,920 A dictionary has keys and values, just like a dictionary in the human world 1887 01:21:22,920 --> 01:21:26,490 has keys, which are words, and values, which are definitions. 1888 01:21:26,490 --> 01:21:27,750 A phone book is the same idea. 1889 01:21:27,750 --> 01:21:30,990 Names and numbers are our keys and values. 1890 01:21:30,990 --> 01:21:33,480 I'm separating each key and value with a colon, 1891 01:21:33,480 --> 01:21:36,720 and I'm separating those pairs with a comma. 1892 01:21:36,720 --> 01:21:37,220 All right. 1893 01:21:37,220 --> 01:21:38,770 So why is this useful? 1894 01:21:38,770 --> 01:21:42,030 This is now the simplest way to represent a phone book or even 1895 01:21:42,030 --> 01:21:45,270 a dictionary with words and definitions in Python. 1896 01:21:45,270 --> 01:21:48,990 I can now ask a question like if Emma in people. 1897 01:21:48,990 --> 01:21:50,880 Well, let me go ahead and get her number. 1898 01:21:50,880 --> 01:21:55,740 Let me go ahead and say Found, people, bracket, 1899 01:21:55,740 --> 01:21:58,570 Emma, using some newer syntax. 1900 01:21:58,570 --> 01:22:00,620 But I'll come back to this in a moment. 1901 01:22:00,620 --> 01:22:02,365 And let's just start with this. 1902 01:22:02,365 --> 01:22:04,740 So this is not going to work until I make it an f-string, 1903 01:22:04,740 --> 01:22:06,030 but let's see why this works. 
1904 01:22:06,030 --> 01:22:08,070 Python phonebook.py.
1905 01:22:08,070 --> 01:22:09,660 Am I going to find Emma? 1906 01:22:09,660 --> 01:22:10,350 Indeed. 1907 01:22:10,350 --> 01:22:11,160 I found her number. 1908 01:22:11,160 --> 01:22:16,570 If I change this to myself, David, and save and rerun it-- 1909 01:22:16,570 --> 01:22:17,933 oh. 1910 01:22:17,933 --> 01:22:19,350 You have to change this here, too. 1911 01:22:19,350 --> 01:22:20,620 David. 1912 01:22:20,620 --> 01:22:21,400 Sorry. 1913 01:22:21,400 --> 01:22:22,910 Now I get my number as well. 1914 01:22:22,910 --> 01:22:24,380 So what's going on here? 1915 01:22:24,380 --> 01:22:29,127 So this is the Pythonic way of just asking, is a key in a data structure? 1916 01:22:29,127 --> 01:22:30,460 You don't have to use for loops. 1917 01:22:30,460 --> 01:22:33,430 You don't have to traverse chains or linked lists or the like. 1918 01:22:33,430 --> 01:22:36,190 You can just ask the question as on line 10 here. 1919 01:22:36,190 --> 01:22:38,290 This is somewhat new syntax. 1920 01:22:38,290 --> 01:22:41,260 But what's cool about dictionaries in Python 1921 01:22:41,260 --> 01:22:43,240 is that if the dictionary's called people-- 1922 01:22:43,240 --> 01:22:46,000 and you know it's a dictionary only from these curly braces. 1923 01:22:46,000 --> 01:22:48,490 If the dictionary is called people, you can treat it 1924 01:22:48,490 --> 01:22:53,770 like an array whose indices are not numbers 0, 1, 2, 3, 1925 01:22:53,770 --> 01:22:56,230 but words. 1926 01:22:56,230 --> 01:22:58,360 So another name for a dictionary in programming 1927 01:22:58,360 --> 01:23:01,510 is an associative array, which is almost a better name, because it 1928 01:23:01,510 --> 01:23:03,010 makes it sound like an array. 1929 01:23:03,010 --> 01:23:06,700 But it's associative in the sense that you can associate words with values, 1930 01:23:06,700 --> 01:23:08,590 not just numbers with values.
1931 01:23:08,590 --> 01:23:10,270 So a dictionary, to be clear-- 1932 01:23:10,270 --> 01:23:11,650 key-value pairs. 1933 01:23:11,650 --> 01:23:13,870 The keys here are strings. 1934 01:23:13,870 --> 01:23:15,493 And the values are anything you want. 1935 01:23:15,493 --> 01:23:16,910 In this case, they're phone numbers. 1936 01:23:16,910 --> 01:23:21,260 But they could be definitions of actual English words in a dictionary. 1937 01:23:21,260 --> 01:23:21,760 All right. 1938 01:23:21,760 --> 01:23:23,710 And I can go ahead and clean this up, too. 1939 01:23:23,710 --> 01:23:25,120 I can change this back to Emma. 1940 01:23:25,120 --> 01:23:28,490 And if I find her, I can go ahead and say exit 0. 1941 01:23:28,490 --> 01:23:32,470 And if I don't find her, I could just say print not found and exit 1. 1942 01:23:32,470 --> 01:23:34,180 But the exits aren't strictly necessary. 1943 01:23:34,180 --> 01:23:36,040 The program will still quit. 1944 01:23:36,040 --> 01:23:36,768 Yeah. 1945 01:23:36,768 --> 01:23:39,482 AUDIENCE: [INAUDIBLE] 1946 01:23:39,482 --> 01:23:41,690 DAVID MALAN: Really good question, and that's a subtlety 1947 01:23:41,690 --> 01:23:43,310 that I didn't mention explicitly. 1948 01:23:43,310 --> 01:23:45,500 The single quotes are necessary here because Python 1949 01:23:45,500 --> 01:23:49,280 would get confused if I've got inner double quotes here and outer double quotes here 1950 01:23:49,280 --> 01:23:51,420 on the beginning and end of line 11. 1951 01:23:51,420 --> 01:23:54,500 So I'm deliberately using single quotes, which are OK in Python. 1952 01:23:54,500 --> 01:23:55,880 You can use double or single. 1953 01:23:55,880 --> 01:23:59,420 Unlike in C, where double quotes were for strings and single quotes were for chars, 1954 01:23:59,420 --> 01:24:00,650 there are no chars in Python. 1955 01:24:00,650 --> 01:24:03,020 So you get to use both for either purpose. 1956 01:24:03,020 --> 01:24:03,699 Yeah.
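Here is a sketch of the phone book dictionary and its lookup. Several details are assumptions. The keys are capitalized to match the earlier names list. Emma's number isn't spelled out in the transcript, so 617-555-0100 is only a placeholder. Brian's 617 area code is likewise assumed. The exit calls are omitted. Note that `in` on a dict checks keys, not values. The single quotes inside the double-quoted f-string keep the lookup from ending the string early.

```python
people = {
    "EMMA": "617-555-0100",     # placeholder number (assumed)
    "RODRIGO": "617-555-0101",
    "BRIAN": "617-555-0102",    # area code assumed
    "DAVID": "617-555-0103",
}

# "in" on a dict checks the keys, not the values
if "EMMA" in people:
    # Index by key, like an array whose indices are words
    print(f"Found {people['EMMA']}")
else:
    print("Not found")
```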
1957 01:24:03,699 --> 01:24:07,691 AUDIENCE: [INAUDIBLE] 1958 01:24:07,691 --> 01:24:10,273 1959 01:24:10,273 --> 01:24:11,690 DAVID MALAN: Really good question. 1960 01:24:11,690 --> 01:24:15,020 So in pset 5, you implemented a hash table, 1961 01:24:15,020 --> 01:24:17,950 which is the lower-level notion of a dictionary. 1962 01:24:17,950 --> 01:24:21,430 What I mean by that is that you stored words in the dictionary. 1963 01:24:21,430 --> 01:24:24,430 But sometimes you had collisions, and so you used linked lists. 1964 01:24:24,430 --> 01:24:25,330 That's fine. 1965 01:24:25,330 --> 01:24:29,620 But your check function, recall, in pset 5 only returns true or false. 1966 01:24:29,620 --> 01:24:31,630 Is the word in the dictionary or not? 1967 01:24:31,630 --> 01:24:34,330 The check function did not reveal any information 1968 01:24:34,330 --> 01:24:38,110 about how long it took to find that word or how far down the chain 1969 01:24:38,110 --> 01:24:39,340 it actually was. 1970 01:24:39,340 --> 01:24:43,030 A dictionary is an abstraction similar in spirit to your check 1971 01:24:43,030 --> 01:24:43,820 function. 1972 01:24:43,820 --> 01:24:44,320 Yes. 1973 01:24:44,320 --> 01:24:47,740 Technically, underneath the hood, Emma and Rodrigo, 1974 01:24:47,740 --> 01:24:51,430 for whatever reason, might hash to the same bucket, like the buckets on stage. 1975 01:24:51,430 --> 01:24:53,440 But all you care about is the value. 1976 01:24:53,440 --> 01:24:57,700 The dictionary's purpose in life is to go find Emma's value for you 1977 01:24:57,700 --> 01:25:01,240 or Rodrigo's value for you and return it as quickly as possible. 1978 01:25:01,240 --> 01:25:03,700 The fact that it happens to lead to a linked list, 1979 01:25:03,700 --> 01:25:07,960 maybe, is an implementation detail that is not exposed to me, 1980 01:25:07,960 --> 01:25:10,570 the programmer who just wants to store keys and values.
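To make the abstraction point concrete, here is a toy hash table with separate chaining, in the spirit of pset 5. The class name and method names are hypothetical. Be aware that this illustrates the idea rather than how Python's own dict works: CPython's dict uses open addressing rather than linked chains. Callers only see set, get, and `in`, never how long a chain grew.

```python
class ChainedHashTable:
    """Toy hash table: each bucket is a list of (key, value) pairs (separate chaining)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() maps the key to an int; % picks one of the buckets
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite its value
                return
        bucket.append((key, value))  # new key, possibly colliding: extend the chain

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __contains__(self, key):
        return any(k == key for k, _ in self._bucket(key))


table = ChainedHashTable(num_buckets=1)  # a single bucket forces every key to collide
table.set("EMMA", "617-555-0100")
table.set("RODRIGO", "617-555-0101")
print(table.get("RODRIGO"))  # the caller never sees how far down the chain it was
print("EMMA" in table)
```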
1981 01:25:10,570 --> 01:25:13,810 And that's the difference between an abstract data type like a dictionary 1982 01:25:13,810 --> 01:25:17,110 and an actual data structure like a hash table. 1983 01:25:17,110 --> 01:25:21,600 You use the latter to implement the former. 1984 01:25:21,600 --> 01:25:22,150 All right. 1985 01:25:22,150 --> 01:25:25,820 A few final examples before we make things more real-world. 1986 01:25:25,820 --> 01:25:29,230 You'll recall from week 4, the last of the past weeks that we'll look at, 1987 01:25:29,230 --> 01:25:32,260 we had a few problems that we encountered, for instance, 1988 01:25:32,260 --> 01:25:34,420 with comparing strings. 1989 01:25:34,420 --> 01:25:36,790 This is a couple of weeks back now. 1990 01:25:36,790 --> 01:25:40,420 But recall that this example was initially problematic 1991 01:25:40,420 --> 01:25:43,660 because you could not just compare s equals equals t. 1992 01:25:43,660 --> 01:25:45,580 You had to use strcmp. 1993 01:25:45,580 --> 01:25:48,820 Why could you not just say if s equals equals t to compare two strings 1994 01:25:48,820 --> 01:25:50,470 and see? 1995 01:25:50,470 --> 01:25:51,168 Yeah. 1996 01:25:51,168 --> 01:25:53,585 AUDIENCE: We could [INAUDIBLE]. 1997 01:25:53,585 --> 01:25:54,460 DAVID MALAN: Exactly. 1998 01:25:54,460 --> 01:25:57,640 They were pointers to chars or addresses of strings. 1999 01:25:57,640 --> 01:26:00,970 And you would be comparing the addresses of those strings, which 2000 01:26:00,970 --> 01:26:04,210 might look the same but are stored in different locations. 2001 01:26:04,210 --> 01:26:07,870 In Python, that nuance is now gone. 2002 01:26:07,870 --> 01:26:11,230 If in Python you want to compare two strings, by god, 2003 01:26:11,230 --> 01:26:14,050 just compare those two strings like this. 2004 01:26:14,050 --> 01:26:16,060 Let me call this compare.py. 
2005 01:26:16,060 --> 01:26:19,540 Let me go ahead and from the cs50 library import get_string.
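Here is a minimal sketch of the comparison compare.py makes. To avoid the CS50 library, it assumes two hard-coded strings, with one built at runtime so it is a separate string object with equal contents. In Python, `==` compares the characters. The `is` operator compares object identity, which is closer to C's address comparison and isn't a reliable way to test string equality.

```python
s = "David"                 # assumed stand-in for get_string("s: ")
t = "".join(["Da", "vid"])  # an equal string built separately at runtime

# == compares the contents, not the addresses
if s == t:
    print("Same")
else:
    print("Different")
```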
2006 01:26:19,540 --> 01:26:21,750 Let me go ahead and get two strings from the user. 2007 01:26:21,750 --> 01:26:25,330 For instance, s and t, arbitrarily as before. 2008 01:26:25,330 --> 01:26:27,730 get_string. 2009 01:26:27,730 --> 01:26:28,300 Here we go. 2010 01:26:28,300 --> 01:26:30,520 Quote, unquote t. 2011 01:26:30,520 --> 01:26:34,810 And then if you want to check if s equals equals t, just ask the question 2012 01:26:34,810 --> 01:26:36,520 and say Same if so. 2013 01:26:36,520 --> 01:26:39,730 Else, go ahead and say Different. 2014 01:26:39,730 --> 01:26:45,640 Now if I run this program as compare.py, Python of compare.py, 2015 01:26:45,640 --> 01:26:50,680 let me go ahead and type in, say, my name here and then my name again. 2016 01:26:50,680 --> 01:26:54,310 Technically in C, s and t were stored in different locations. 2017 01:26:54,310 --> 01:26:56,290 And in Python, they technically are, too. 2018 01:26:56,290 --> 01:26:57,100 Doesn't matter. 2019 01:26:57,100 --> 01:26:59,380 The equal equal operator in Python is going 2020 01:26:59,380 --> 01:27:02,810 to compare literally what you intended. 2021 01:27:02,810 --> 01:27:03,310 All right. 2022 01:27:03,310 --> 01:27:04,105 What about this? 2023 01:27:04,105 --> 01:27:08,800 This one was painful and sparked the whole exploration down 2024 01:27:08,800 --> 01:27:11,740 the rabbit hole of pointers and addresses and the like. 2025 01:27:11,740 --> 01:27:13,570 Suppose you just want to swap two values, 2026 01:27:13,570 --> 01:27:16,900 x and y initialized a couple weeks ago to 1 and 2. 2027 01:27:16,900 --> 01:27:20,860 My god, the hoops we had to jump through in C just to swap two values. 2028 01:27:20,860 --> 01:27:25,090 Hopefully by the end, you understood why there was this fundamental issue. 2029 01:27:25,090 --> 01:27:28,570 And that, again, had to do with memory and moving things around and copying. 2030 01:27:28,570 --> 01:27:30,730 But in Python, guess what? 
2031 01:27:30,730 --> 01:27:34,197 Let me go ahead in Python and call a program swap.py. 2032 01:27:34,197 --> 01:27:36,280 And let me go ahead and give myself two variables. 2033 01:27:36,280 --> 01:27:38,197 That alone is already faster because you don't 2034 01:27:38,197 --> 01:27:40,720 have to worry about data types or semicolons. 2035 01:27:40,720 --> 01:27:43,150 Let me go ahead and just print that x is 2036 01:27:43,150 --> 01:27:48,670 x, y is y, just so we can see what these values are. 2037 01:27:48,670 --> 01:27:50,500 Though I could also just use debug50. 2038 01:27:50,500 --> 01:27:53,530 You can also debug Python programs in the IDE as well. 2039 01:27:53,530 --> 01:27:57,920 I'm going to do this twice, recall, the goal now being to swap two values. 2040 01:27:57,920 --> 01:28:00,400 So if I want to swap x and y, guess what? 2041 01:28:00,400 --> 01:28:02,980 In Python, no big deal. 2042 01:28:02,980 --> 01:28:05,560 x comma y equals y comma x. 2043 01:28:05,560 --> 01:28:06,290 All right. 2044 01:28:06,290 --> 01:28:07,180 Python 2045 01:28:07,180 --> 01:28:09,080 of swap.py. 2046 01:28:09,080 --> 01:28:10,220 Oh, my god. 2047 01:28:10,220 --> 01:28:12,830 You get it for free with the language. 2048 01:28:12,830 --> 01:28:17,750 So now let's actually start to take things 2049 01:28:17,750 --> 01:28:20,270 in the direction we did in week 4 with file I/O. 2050 01:28:20,270 --> 01:28:23,060 Let me open up phonebook.c. 2051 01:28:23,060 --> 01:28:27,740 This was another example of phone book manipulation where, recall, 2052 01:28:27,740 --> 01:28:31,760 we opened a file called phonebook.csv, which is like a lightweight Excel file. 2053 01:28:31,760 --> 01:28:33,080 Comma-separated values. 2054 01:28:33,080 --> 01:28:34,460 Simple text file. 2055 01:28:34,460 --> 01:28:36,020 We opened it with fopen. 2056 01:28:36,020 --> 01:28:38,210 We then got a name and a number from the human.
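This is a sketch of swap.py, using the values 1 and 2 from the C version. Python evaluates the right-hand side first, building the tuple (2, 1). It then unpacks that tuple into x and y, so no temporary variable or pointers are needed.

```python
x = 1
y = 2
print(f"x is {x}, y is {y}")

# Right-hand side becomes the tuple (y, x), which is then unpacked into x and y
x, y = y, x
print(f"x is {x}, y is {y}")
```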
2057 01:28:38,210 --> 01:28:40,220 And then we use this new function fprintf-- 2058 01:28:40,220 --> 01:28:45,290 file printf-- to just print something percent s comma something else. 2059 01:28:45,290 --> 01:28:46,910 The name comma number to the file. 2060 01:28:46,910 --> 01:28:51,980 And this is how I was able to add the heads' names and numbers to that CSV. 2061 01:28:51,980 --> 01:28:55,040 Well, we can actually do the same thing in Python 2062 01:28:55,040 --> 01:28:57,470 but a little more simply as well. 2063 01:28:57,470 --> 01:29:01,010 Although the syntax is going to look a little cryptic at first glance. 2064 01:29:01,010 --> 01:29:04,970 Let me go ahead and save this file also as phonebook.py, 2065 01:29:04,970 --> 01:29:06,950 although a fancier version now. 2066 01:29:06,950 --> 01:29:12,440 Let me go ahead and open up here phonebook.csv, 2067 01:29:12,440 --> 01:29:15,230 which I've already populated with name comma number, 2068 01:29:15,230 --> 01:29:18,710 just so that if we were to open it in Excel we would have column headings. 2069 01:29:18,710 --> 01:29:20,293 And I'm going to go ahead and do this. 2070 01:29:20,293 --> 01:29:22,390 In Python, if you want to deal with CSV files, 2071 01:29:22,390 --> 01:29:24,980 there's actually a package called csv. 2072 01:29:24,980 --> 01:29:27,410 Package is a Python word for a library. 2073 01:29:27,410 --> 01:29:30,860 And in that package is a lot of CSV-related functionality. 2074 01:29:30,860 --> 01:29:34,980 And I'm also going to import from cs50 again get_string. 2075 01:29:34,980 --> 01:29:35,480 All right. 2076 01:29:35,480 --> 01:29:36,530 What do I want to do? 2077 01:29:36,530 --> 01:29:38,690 First line is going to be pretty similar to C. 2078 01:29:38,690 --> 01:29:41,840 I'm going to open the file using open instead of fopen. 2079 01:29:41,840 --> 01:29:44,030 And I'm going to call the file phonebook.csv. 2080 01:29:44,030 --> 01:29:46,370 And I'm going to open it in quote, unquote, a mode.
2081 01:29:46,370 --> 01:29:48,510 What was a again? 2082 01:29:48,510 --> 01:29:49,010 Append. 2083 01:29:49,010 --> 01:29:52,570 If I used w, it would overwrite the file again and again. 2084 01:29:52,570 --> 01:29:54,440 Append will keep adding to the file. 2085 01:29:54,440 --> 01:29:56,730 So we can keep adding more TFs to the file. 2086 01:29:56,730 --> 01:29:57,230 All right. 2087 01:29:57,230 --> 01:29:59,438 Now let me go ahead and just get a name from someone. 2088 01:29:59,438 --> 01:30:01,620 So get_string Name. 2089 01:30:01,620 --> 01:30:04,120 Let me go ahead and get their number via get_string as well. 2090 01:30:04,120 --> 01:30:04,850 Whoops. 2091 01:30:04,850 --> 01:30:07,730 Number equals get_string Number. 2092 01:30:07,730 --> 01:30:09,020 And get that from the human. 2093 01:30:09,020 --> 01:30:10,460 And now this part's a little new. 2094 01:30:10,460 --> 01:30:13,070 But again, this is the kind of thing that you just Google 2095 01:30:13,070 --> 01:30:15,800 when you forget the syntax for something like this. 2096 01:30:15,800 --> 01:30:17,750 I'm going to declare a variable called writer, 2097 01:30:17,750 --> 01:30:19,070 though I could call it anything I want. 2098 01:30:19,070 --> 01:30:21,740 Its purpose in life is going to be to write stuff to the file. 2099 01:30:21,740 --> 01:30:24,110 I'm going to go inside of the csv package, 2100 01:30:24,110 --> 01:30:26,810 again, the library that I imported up top. 2101 01:30:26,810 --> 01:30:30,535 And I'm going to pass to a writer function the file. 2102 01:30:30,535 --> 01:30:32,660 So you would only know this from the documentation. 2103 01:30:32,660 --> 01:30:35,530 But what I've highlighted here means: hey, Python. 2104 01:30:35,530 --> 01:30:39,650 Pass the open file to this library that's going to make it easier 2105 01:30:39,650 --> 01:30:41,960 for me to write to it as a CSV file. 2106 01:30:41,960 --> 01:30:43,130 Rows and columns.
2107 01:30:43,130 --> 01:30:44,150 That's all. 2108 01:30:44,150 --> 01:30:46,580 Now let me go ahead and do this. writer-- 2109 01:30:46,580 --> 01:30:49,640 oops. writer.writerow. 2110 01:30:49,640 --> 01:30:53,870 So writerow is a function that's built into the csv library's 2111 01:30:53,870 --> 01:31:01,080 functionality that quite simply lets me write a name and a number to that file. 2112 01:31:01,080 --> 01:31:03,170 It will take care of the commas. 2113 01:31:03,170 --> 01:31:05,330 It will take care of quoting anything. 2114 01:31:05,330 --> 01:31:08,090 As an aside, if one of us were to have a comma in our name, 2115 01:31:08,090 --> 01:31:12,050 like Brian Yu, comma, Junior, that comma could be problematic 2116 01:31:12,050 --> 01:31:14,900 because it could break the CSV's implicit assumption that 2117 01:31:14,900 --> 01:31:16,250 commas separate values. 2118 01:31:16,250 --> 01:31:18,420 But you could put quotes around Brian's full name, 2119 01:31:18,420 --> 01:31:21,140 even if he had a comma, Junior or whatever in his name. 2120 01:31:21,140 --> 01:31:23,810 This library takes care of all of that headache for you. 2121 01:31:23,810 --> 01:31:25,070 But there is a subtlety. 2122 01:31:25,070 --> 01:31:27,440 I mentioned something called a tuple before. 2123 01:31:27,440 --> 01:31:30,110 For low-level, uninteresting reasons, you actually 2124 01:31:30,110 --> 01:31:31,490 need double parentheses here. 2125 01:31:31,490 --> 01:31:35,210 So you're technically passing in one thing, a tuple, in parens. 2126 01:31:35,210 --> 01:31:37,440 But more on that another time. 2127 01:31:37,440 --> 01:31:40,400 Now let me go ahead and close the file. 2128 01:31:40,400 --> 01:31:41,630 file.close. 2129 01:31:41,630 --> 01:31:43,520 So let me go ahead and run this. 2130 01:31:43,520 --> 01:31:45,780 Python phonebook.py. 2131 01:31:45,780 --> 01:31:46,670 Whoops. 2132 01:31:46,670 --> 01:31:47,720 Invalid syntax. 2133 01:31:47,720 --> 01:31:49,490 I forgot an equal sign.
2134 01:31:49,490 --> 01:31:52,198 And just as in C, you'll see that the red things appear sometimes 2135 01:31:52,198 --> 01:31:55,282 when it knows what you've done wrong, but it takes a little while for them 2136 01:31:55,282 --> 01:31:56,330 to disappear sometimes. 2137 01:31:56,330 --> 01:31:56,830 Name. 2138 01:31:56,830 --> 01:32:01,640 Let's go ahead and add Emma, all caps just for consistency. 2139 01:32:01,640 --> 01:32:04,980 617-555-0101 was her number. 2140 01:32:04,980 --> 01:32:05,480 All right. 2141 01:32:05,480 --> 01:32:07,820 Hopefully, hopefully. 2142 01:32:07,820 --> 01:32:09,930 Come on. 2143 01:32:09,930 --> 01:32:10,470 Come on. 2144 01:32:10,470 --> 01:32:13,690 2145 01:32:13,690 --> 01:32:14,190 Oh wait. 2146 01:32:14,190 --> 01:32:15,367 That's the wrong file. 2147 01:32:15,367 --> 01:32:16,560 [LAUGHTER] 2148 01:32:16,560 --> 01:32:17,310 Here we go. 2149 01:32:17,310 --> 01:32:18,540 Because I created a new one. 2150 01:32:18,540 --> 01:32:20,840 So, cheating. 2151 01:32:20,840 --> 01:32:21,870 Name, number. 2152 01:32:21,870 --> 01:32:24,360 I ran my program in a different directory 2153 01:32:24,360 --> 01:32:25,860 which meant it created a new file. 2154 01:32:25,860 --> 01:32:27,330 So I'm not actually cheating there. 2155 01:32:27,330 --> 01:32:28,580 I was just in the wrong place. 2156 01:32:28,580 --> 01:32:29,170 User error. 2157 01:32:29,170 --> 01:32:30,490 Let's run it once more. 2158 01:32:30,490 --> 01:32:31,950 Rodrigo. 2159 01:32:31,950 --> 01:32:34,970 617-555-0101. 2160 01:32:34,970 --> 01:32:35,610 Enter. 2161 01:32:35,610 --> 01:32:36,760 There we go. 2162 01:32:36,760 --> 01:32:39,630 Let's run it again, this time with Brian. 2163 01:32:39,630 --> 01:32:44,470 Brian, 617-555-0102, and so forth. 2164 01:32:44,470 --> 01:32:47,272 So this code admittedly is not super straightforward. 
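The phonebook.py program dictated above can be sketched roughly as follows. This is a hedged reconstruction, not the lecture's exact file. It wraps the steps in a hypothetical `add_entry` function and takes the name and number as arguments instead of calling `get_string`, so it runs without the CS50 library. It also passes `newline=""` to `open`, which the `csv` module's documentation recommends for files handed to `csv.writer`.

```python
import csv


def add_entry(path, name, number):
    # "a" (append) keeps existing rows and adds new ones at the end;
    # "w" would truncate the file every time the program runs.
    file = open(path, "a", newline="")

    # csv.writer takes care of the commas and of any quoting.
    writer = csv.writer(file)

    # writerow takes ONE argument, a sequence of fields,
    # hence the double parentheses around the tuple (name, number).
    writer.writerow((name, number))

    file.close()
```

For example, `add_entry("phonebook.csv", "Brian Yu, Jr.", "617-555-0102")` would append the row `"Brian Yu, Jr.",617-555-0102`. The writer adds the quotes itself because the name contains a comma.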
2165 01:32:47,272 --> 01:32:49,230 And honestly, this is exactly the kind of stuff 2166 01:32:49,230 --> 01:32:52,350 that I Google when I forget actually how to manipulate the CSV. 2167 01:32:52,350 --> 01:32:55,652 But that's what the documentation is indeed there for. 2168 01:32:55,652 --> 01:32:57,610 And in fact, let me clean this up a little bit. 2169 01:32:57,610 --> 01:32:59,700 It turns out you can write this code a little differently. 2170 01:32:59,700 --> 01:33:02,160 And online, you'll see slightly different approaches. 2171 01:33:02,160 --> 01:33:05,190 You'll see a keyword in Python called with, which makes 2172 01:33:05,190 --> 01:33:07,290 it a little tighter to write your code. 2173 01:33:07,290 --> 01:33:09,360 If you use this keyword, with, as you'll see 2174 01:33:09,360 --> 01:33:12,060 in documentation and some of the staff's sample code, 2175 01:33:12,060 --> 01:33:14,020 you don't have to close the file. 2176 01:33:14,020 --> 01:33:16,740 It will automatically be closed for you, thereby just saving 2177 01:33:16,740 --> 01:33:19,840 you one line of code. 2178 01:33:19,840 --> 01:33:21,120 All right. 2179 01:33:21,120 --> 01:33:24,540 Any questions on that? 2180 01:33:24,540 --> 01:33:25,040 All right. 2181 01:33:25,040 --> 01:33:28,490 And now if we can, enough with the sort of syntactic details. 2182 01:33:28,490 --> 01:33:29,840 Like, that's Python. 2183 01:33:29,840 --> 01:33:34,453 That's going to get you like 80%, 90% of the way through learning Python, 2184 01:33:34,453 --> 01:33:37,370 even though you'll invariably have to lean on the slides and the notes 2185 01:33:37,370 --> 01:33:40,130 and Google and Stack Overflow for little syntactic details 2186 01:33:40,130 --> 01:33:43,730 as you translate your C programs 2187 01:33:43,730 --> 01:33:46,510 to Python programs in problem set 6. 2188 01:33:46,510 --> 01:33:47,760 But regular expressions.
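The `with` form described above might look like this. It is again a sketch with hypothetical function names. `read_entries` goes beyond what the lecture shows: it demonstrates that `csv.reader` undoes the quoting, so a name containing a comma comes back as a single field.

```python
import csv


def add_entry(path, name, number):
    # The file is closed automatically when the with block ends,
    # even if an exception is raised inside it, so no file.close().
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow((name, number))


def read_entries(path):
    # Return every row of the CSV file as a list of string fields.
    with open(path, newline="") as file:
        return list(csv.reader(file))
```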
2189 01:33:47,760 --> 01:33:50,385 Now let's introduce some new powerful features of this language 2190 01:33:50,385 --> 01:33:52,970 that C did not have but other languages do have, too. 2191 01:33:52,970 --> 01:33:56,580 Regular expressions I alluded to earlier as representative of a feature 2192 01:33:56,580 --> 01:33:58,580 where you can define patterns when you're trying 2193 01:33:58,580 --> 01:34:01,890 to detect patterns in users' input. 2194 01:34:01,890 --> 01:34:03,890 And it turns out in regular expressions, there are 2195 01:34:03,890 --> 01:34:07,280 a few pieces of syntax that are useful to know. 2196 01:34:07,280 --> 01:34:12,020 Dot in the examples we're about to do represents any character. 2197 01:34:12,020 --> 01:34:14,300 So if you don't know what character you're expecting, 2198 01:34:14,300 --> 01:34:16,940 you can just say dot to represent any character. 2199 01:34:16,940 --> 01:34:19,670 Dot star is going to mean zero or more characters. 2200 01:34:19,670 --> 01:34:21,817 Dot plus is going to mean one or more characters. 2201 01:34:21,817 --> 01:34:23,900 Question mark is going to mean something optional. 2202 01:34:23,900 --> 01:34:25,770 And there's some other syntax as well. 2203 01:34:25,770 --> 01:34:27,860 But let's make this more real first. 2204 01:34:27,860 --> 01:34:32,930 If I go back from before into the very simple agreement example 2205 01:34:32,930 --> 01:34:37,760 that we did a while back, you may recall that we had this code here 2206 01:34:37,760 --> 01:34:43,340 where I enumerated explicitly yes and y and no and n. 2207 01:34:43,340 --> 01:34:47,180 But as someone noted, these already kind of follow a pattern. 2208 01:34:47,180 --> 01:34:50,030 And it turns out it might be sufficient just to check for a word 2209 01:34:50,030 --> 01:34:52,580 starting with y, or maybe I could check a little more 2210 01:34:52,580 --> 01:34:54,828 succinctly for multiple values at once.
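The four pieces of syntax just listed can be tried out with Python's built-in `re` module. `matches` is a hypothetical helper, and the sample strings are arbitrary.

```python
import re


def matches(pattern, text):
    # re.search scans text for the first place the pattern matches.
    # It returns a Match object on success, or None on failure.
    return re.search(pattern, text) is not None


# "."  any single character
print(matches("c.t", "cat"))   # True
print(matches("c.t", "ct"))    # False: . needs exactly one character
# ".*" zero or more characters
print(matches("c.*t", "ct"))   # True
# ".+" one or more characters
print(matches("c.+t", "ct"))   # False
print(matches("c.+t", "coat")) # True
# "?"  zero or one of the preceding item, i.e., it is optional
print(matches("colou?r", "color"))   # True
print(matches("colou?r", "colour"))  # True
```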
2211 01:34:54,828 --> 01:34:56,120 So let me go ahead and do this. 2212 01:34:56,120 --> 01:35:00,620 It turns out Python has a library called regular expressions, or re. 2213 01:35:00,620 --> 01:35:04,430 In this library is a bunch of fancier functionality. 2214 01:35:04,430 --> 01:35:07,520 I can change this if condition to be this instead. 2215 01:35:07,520 --> 01:35:12,410 I can go ahead and use re.search, which is a function whose purpose in life 2216 01:35:12,410 --> 01:35:15,260 is going to be to search a string for a pattern 2217 01:35:15,260 --> 01:35:17,900 that you care about, like something starting with y. 2218 01:35:17,900 --> 01:35:21,950 And the way I'm going to do this is initially search for yes. 2219 01:35:21,950 --> 01:35:24,890 And the string I'm going to search is s. 2220 01:35:24,890 --> 01:35:27,980 And that is going to return effectively true or false. 2221 01:35:27,980 --> 01:35:31,370 So I'm going to change my code to just quite simply be this. 2222 01:35:31,370 --> 01:35:32,840 This says: hey, Python. 2223 01:35:32,840 --> 01:35:36,720 Search the string s for this word here. 2224 01:35:36,720 --> 01:35:37,220 All right. 2225 01:35:37,220 --> 01:35:38,190 Let's test this out. 2226 01:35:38,190 --> 01:35:40,520 So Python of agree-- 2227 01:35:40,520 --> 01:35:42,060 whoops, now in this version. 2228 01:35:42,060 --> 01:35:42,560 Whoops. 2229 01:35:42,560 --> 01:35:45,640 I forgot my own-- 2230 01:35:45,640 --> 01:35:46,760 let's see. 2231 01:35:46,760 --> 01:35:47,930 I forgot my colons. 2232 01:35:47,930 --> 01:35:49,410 So Python of agree. 2233 01:35:49,410 --> 01:35:50,510 Enter. 2234 01:35:50,510 --> 01:35:51,290 Do I agree? 2235 01:35:51,290 --> 01:35:53,690 I'm going to go ahead and type in yes. Agreed. 2236 01:35:53,690 --> 01:35:57,530 But at the moment, y by itself does not work. 2237 01:35:57,530 --> 01:35:58,760 So let's make it work. 2238 01:35:58,760 --> 01:36:01,160 Well, I could do this in a couple of ways.
2239 01:36:01,160 --> 01:36:05,220 In regular expressions, you can say yes or some other value. 2240 01:36:05,220 --> 01:36:07,200 So a vertical bar just means or. 2241 01:36:07,200 --> 01:36:09,200 So it's not the word or, and it's not double bars 2242 01:36:09,200 --> 01:36:11,210 in this context of patterns. 2243 01:36:11,210 --> 01:36:13,970 It's just a single vertical bar. 2244 01:36:13,970 --> 01:36:16,400 But now I can type y or yes. 2245 01:36:16,400 --> 01:36:18,490 But there's some cleverness here, right? 2246 01:36:18,490 --> 01:36:20,292 Like, yes already starts with y. 2247 01:36:20,292 --> 01:36:21,500 So I could actually say this. 2248 01:36:21,500 --> 01:36:25,310 Let me arbitrarily put parentheses around es initially. 2249 01:36:25,310 --> 01:36:27,830 But then put a question mark at the end. 2250 01:36:27,830 --> 01:36:28,970 This is funky syntax. 2251 01:36:28,970 --> 01:36:31,512 And again, what we're talking about now is not Python per se. 2252 01:36:31,512 --> 01:36:34,980 These are regular expressions, patterns of text. 2253 01:36:34,980 --> 01:36:39,500 This just means look for a y and maybe an es but maybe not an es. 2254 01:36:39,500 --> 01:36:45,050 So the question mark means 0 or 1 instance of the thing to the left. 2255 01:36:45,050 --> 01:36:45,800 It's optional. 2256 01:36:45,800 --> 01:36:48,530 So now I can run this again and say yes. 2257 01:36:48,530 --> 01:36:50,090 And that seems to work. 2258 01:36:50,090 --> 01:36:52,640 Or I can say y, and that seems to work. 2259 01:36:52,640 --> 01:36:55,910 But this does not work. 2260 01:36:55,910 --> 01:37:00,910 So how could I fix this and make it case-insensitive? 2261 01:37:00,910 --> 01:37:04,650 I could actually just call lower and just force everything to lowercase.
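At this stage of the agree.py example, the check might be written as below. This is a sketch: the function name `agrees` is hypothetical. The last call shows a side effect of an unanchored search, which the anchors introduced next address.

```python
import re


def agrees(s):
    # "y|yes": a single vertical bar means "or" inside a pattern.
    # "y(es)?": a y, optionally followed by es; ? applies to the group.
    # With re.search the two patterns accept the same inputs, because
    # any string containing "yes" also contains "y".
    # s.lower() makes the check case-insensitive.
    return re.search("y(es)?", s.lower()) is not None


print(agrees("yes"))   # True
print(agrees("Y"))     # True
print(agrees("no"))    # False
print(agrees("okay"))  # True: the pattern can match anywhere in s
```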
2262 01:37:04,650 --> 01:37:06,650 Or it turns out, if you read the documentation-- 2263 01:37:06,650 --> 01:37:07,820 this looks a little weird-- 2264 01:37:07,820 --> 01:37:10,707 you can also pass in a third argument, which weirdly is all caps 2265 01:37:10,707 --> 01:37:11,540 like you're yelling. 2266 01:37:11,540 --> 01:37:14,420 But this is re.IGNORECASE. 2267 01:37:14,420 --> 01:37:18,800 And this will just force everything to be treated as lowercase or uppercase. 2268 01:37:18,800 --> 01:37:20,120 It doesn't matter. 2269 01:37:20,120 --> 01:37:22,520 But we'll see here this is actually going 2270 01:37:22,520 --> 01:37:25,860 to make it a lot easier to search for certain patterns. 2271 01:37:25,860 --> 01:37:29,755 We can say no similarly here by just starting to construct patterns. 2272 01:37:29,755 --> 01:37:32,630 And again, you don't sit down generally and write regular expressions 2273 01:37:32,630 --> 01:37:34,970 that just work like this. 2274 01:37:34,970 --> 01:37:37,700 You build them up piece by piece as I already am. 2275 01:37:37,700 --> 01:37:40,010 So let me fix this real quick. 2276 01:37:40,010 --> 01:37:41,360 What did I just do wrong? 2277 01:37:41,360 --> 01:37:42,380 Here we go. 2278 01:37:42,380 --> 01:37:44,330 Let me do one last thing. 2279 01:37:44,330 --> 01:37:46,190 Suppose I agree. 2280 01:37:46,190 --> 01:37:48,060 Yes. 2281 01:37:48,060 --> 01:37:48,560 OK. 2282 01:37:48,560 --> 01:37:49,610 That's OK. 2283 01:37:49,610 --> 01:37:53,300 Because I'm searching the whole string s. 2284 01:37:53,300 --> 01:37:56,630 But if I want to search for literally the beginning of the string, 2285 01:37:56,630 --> 01:37:58,104 I can use a caret symbol here. 2286 01:37:58,104 --> 01:38:01,354 And to search all the way to the end of the string, you can use a dollar sign. 2287 01:38:01,354 --> 01:38:03,710 Why these are the way they are, I don't know. 2288 01:38:03,710 --> 01:38:04,910 It's hideous.
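The anchored, case-insensitive pattern being built here can be sketched as follows, again with a hypothetical `agrees` function. `re.search` accepts flags such as `re.IGNORECASE` as its third argument.

```python
import re


def agrees(s):
    # ^ anchors the match to the start of s and $ to the end, so the
    # whole input must be "y" or "yes". re.IGNORECASE also accepts
    # "Y", "Yes", and "YES".
    return re.search("^y(es)?$", s, re.IGNORECASE) is not None


for answer in ["y", "YES", "Yes", "okay", "yes please"]:
    print(answer, agrees(answer))
```

One design note: `$` also matches just before a trailing newline. `re.fullmatch("y(es)?", s, re.IGNORECASE)` is a slightly stricter alternative that needs no anchors.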
2289 01:38:04,910 --> 01:38:07,040 But caret means start of string. 2290 01:38:07,040 --> 01:38:09,270 Dollar sign means end of string. 2291 01:38:09,270 --> 01:38:16,730 And if it's not crazy enough now, yes is not going to work. 2292 01:38:16,730 --> 01:38:18,080 No agreement. 2293 01:38:18,080 --> 01:38:21,020 But yes literally will. 2294 01:38:21,020 --> 01:38:23,560 Because this means the human must type literally 2295 01:38:23,560 --> 01:38:27,880 at the beginning of their input a y followed optionally by an es. 2296 01:38:27,880 --> 01:38:30,730 And then per the dollar sign, that's got to be it for their input. 2297 01:38:30,730 --> 01:38:33,730 You can make it really tight around the user's input 2298 01:38:33,730 --> 01:38:36,310 to control what they are typing in, especially 2299 01:38:36,310 --> 01:38:38,410 for something like an agreement. 2300 01:38:38,410 --> 01:38:38,980 All right. 2301 01:38:38,980 --> 01:38:42,157 So now let's do something more fun. 2302 01:38:42,157 --> 01:38:45,490 So now that we have Python, it turns out we can do some more interesting things. 2303 01:38:45,490 --> 01:38:48,040 And it turns out you can do these even on your own Mac or PC. 2304 01:38:48,040 --> 01:38:49,810 I've been using the IDE all this time. 2305 01:38:49,810 --> 01:38:53,985 But Python is even easier than C to get working on your own Mac or PC. 2306 01:38:53,985 --> 01:38:56,110 And so indeed, before class, I literally downloaded 2307 01:38:56,110 --> 01:38:59,560 a program called Python, installed it on my Mac-- and you could do it on a PC 2308 01:38:59,560 --> 01:39:03,280 as well-- which allows me on my own Mac to use something like this terminal 2309 01:39:03,280 --> 01:39:08,820 window in order to run Python programs on my own Mac without the IDE 2310 01:39:08,820 --> 01:39:09,430 in the way. 2311 01:39:09,430 --> 01:39:12,760 What this means, in particular, is that I can use hardware on my own Mac or PC.
2312 01:39:12,760 --> 01:39:15,190 For instance, like the microphone built in. 2313 01:39:15,190 --> 01:39:17,560 So let me go ahead and make a program here 2314 01:39:17,560 --> 01:39:21,340 that's going to be called, for instance, voice. 2315 01:39:21,340 --> 01:39:23,940 Let me go ahead and open voice.py. 2316 01:39:23,940 --> 01:39:25,900 I'm going to use a different text editing program. 2317 01:39:25,900 --> 01:39:28,417 It's not the IDE, but it's going to let me write code. 2318 01:39:28,417 --> 01:39:29,750 And let me go ahead and do this. 2319 01:39:29,750 --> 01:39:33,670 Let me go ahead and get input from the user, not even using the CS50 library. 2320 01:39:33,670 --> 01:39:38,317 But I'm just going to ask the human to say something backslash n. 2321 01:39:38,317 --> 01:39:40,900 And then I'm going to force the user's input to lowercase just 2322 01:39:40,900 --> 01:39:42,282 to make my life a little easier. 2323 01:39:42,282 --> 01:39:43,990 And now I'm going to ask a few questions. 2324 01:39:43,990 --> 01:39:48,100 If the word hello is in the user's words, 2325 01:39:48,100 --> 01:39:52,240 well, let me go ahead and say hello to you, too. 2326 01:39:52,240 --> 01:39:53,200 That's nice. 2327 01:39:53,200 --> 01:39:57,280 elif, for instance, how are you in words. 2328 01:39:57,280 --> 01:40:03,100 Let me go ahead and say something like, print, for instance, I am well. 2329 01:40:03,100 --> 01:40:04,620 Thanks. 2330 01:40:04,620 --> 01:40:09,100 elif, how about goodbye in words. 2331 01:40:09,100 --> 01:40:12,670 Let me go ahead and print goodbye to you, too. 2332 01:40:12,670 --> 01:40:15,130 Though I could certainly say most anything I want here. 2333 01:40:15,130 --> 01:40:18,670 else, I don't know what's going on, so I'm just going to say huh. 2334 01:40:18,670 --> 01:40:21,670 So what is the essence of this program? 2335 01:40:21,670 --> 01:40:22,612 What have I done?
2336 01:40:22,612 --> 01:40:24,820 Like, this is kind of, sort of, definitely a stretch, 2337 01:40:24,820 --> 01:40:27,310 but the beginnings of artificial intelligence, if you will. 2338 01:40:27,310 --> 01:40:29,680 It's a program that's interacting with me. 2339 01:40:29,680 --> 01:40:32,350 And way back when, some of the earliest programs in AI 2340 01:40:32,350 --> 01:40:34,323 were just text-based like this. 2341 01:40:34,323 --> 01:40:36,490 Artificial intelligence is essentially like creating 2342 01:40:36,490 --> 01:40:40,900 a human that's sentient and actually can respond to and react to a human 2343 01:40:40,900 --> 01:40:42,950 as though they too are human themselves. 2344 01:40:42,950 --> 01:40:44,470 So let me go ahead and run this. 2345 01:40:44,470 --> 01:40:49,720 Python voice.py as though I'm talking to it and say, hello there. 2346 01:40:49,720 --> 01:40:51,820 That's grammatically wrong, but we won't care. 2347 01:40:51,820 --> 01:40:53,740 Hello to you, too. 2348 01:40:53,740 --> 01:40:55,630 How are you? 2349 01:40:55,630 --> 01:40:58,690 I am well, thanks. That's kind of cool. 2350 01:40:58,690 --> 01:41:00,130 Goodbye. 2351 01:41:00,130 --> 01:41:01,120 Goodbye to you, too. 2352 01:41:01,120 --> 01:41:02,440 Now why did that work? 2353 01:41:02,440 --> 01:41:06,400 I'm just using Python's in operator, searching the user's words, 2354 01:41:06,400 --> 01:41:09,670 which are just strings that have been typed in via the input function. 2355 01:41:09,670 --> 01:41:12,563 And again, the input function is almost the same as get_string, 2356 01:41:12,563 --> 01:41:14,230 but it's the one that comes with Python. 2357 01:41:14,230 --> 01:41:18,400 And I'm just doing if else, if else, if else, if else, printing out things. 2358 01:41:18,400 --> 01:41:21,220 But it turns out with Python-- and honestly, other languages, 2359 01:41:21,220 --> 01:41:25,150 but Python especially-- it's easy to do even fancier things, too.
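The text-only voice.py described above can be sketched as a function, so the logic can be tested without typing at a prompt. The function name `respond` and the exact reply strings are illustrative, not the lecture's file.

```python
def respond(words):
    # Lowercase first so "Hello" and "HELLO" are treated like "hello".
    words = words.lower()

    # On strings, `in` is a substring test, so each check succeeds
    # wherever the phrase appears. The first matching branch wins.
    if "hello" in words:
        return "Hello to you, too!"
    elif "how are you" in words:
        return "I am well, thanks!"
    elif "goodbye" in words:
        return "Goodbye to you, too!"
    else:
        return "Huh?"


print(respond("Hello there"))   # Hello to you, too!
print(respond("How are you?"))  # I am well, thanks!
```

In an interactive version, `words` would come from `input("Say something!\n")`. Because `in` matches substrings, an input like "Othello" would also trigger the greeting.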
2360 01:41:25,150 --> 01:41:28,060 Let me go ahead and not get the human's words from the keyboard, 2361 01:41:28,060 --> 01:41:32,380 but let me import speech_recognition, which is a library that I've 2362 01:41:32,380 --> 01:41:34,300 installed on my computer in advance. 2363 01:41:34,300 --> 01:41:37,180 And let me go ahead and change this a little bit. 2364 01:41:37,180 --> 01:41:39,970 Let me go ahead and say something like this. 2365 01:41:39,970 --> 01:41:45,760 recognizer gets speech_recognition.Recognizer. 2366 01:41:45,760 --> 01:41:47,730 And I literally did not know what I was doing. 2367 01:41:47,730 --> 01:41:49,480 I was simply following the directions when 2368 01:41:49,480 --> 01:41:51,280 I downloaded the library initially. 2369 01:41:51,280 --> 01:42:00,290 But I learned that I can say with speech_recognition.Microphone as source. 2370 01:42:00,290 --> 01:42:00,790 Print. 2371 01:42:00,790 --> 01:42:05,140 Now let's go ahead and say something to the human so they provide input. 2372 01:42:05,140 --> 01:42:06,640 Then let me get some audio from the user. 2373 01:42:06,640 --> 01:42:11,200 recognizer.listen to that source being the microphone. 2374 01:42:11,200 --> 01:42:16,760 And then down here I'm going to say, Google Speech Recognition 2375 01:42:16,760 --> 01:42:19,660 thinks that you said. 2376 01:42:19,660 --> 01:42:25,210 And then print recognizer.recognize_google of audio. 2377 01:42:25,210 --> 01:42:28,162 So it's OK if we don't understand each and every line. 2378 01:42:28,162 --> 01:42:31,120 I didn't last night when I was sort of experimenting with this example. 2379 01:42:31,120 --> 01:42:34,300 The key, though, is that I've imported a very powerful library that's 2380 01:42:34,300 --> 01:42:35,890 open source and freely available. 2381 01:42:35,890 --> 01:42:39,070 It happens to talk to Google's back-end infrastructure, 2382 01:42:39,070 --> 01:42:42,170 where they implement a number of artificial intelligence features.
2383 01:42:42,170 --> 01:42:44,920 And if I didn't screw up, let's see how this one works. 2384 01:42:44,920 --> 01:42:48,630 Python of voices.py. 2385 01:42:48,630 --> 01:42:52,410 2386 01:42:52,410 --> 01:42:53,370 Hello, world. 2387 01:42:53,370 --> 01:42:59,420 2388 01:42:59,420 --> 01:43:00,450 How are you? 2389 01:43:00,450 --> 01:43:05,010 2390 01:43:05,010 --> 01:43:05,930 Goodbye, world. 2391 01:43:05,930 --> 01:43:09,720 2392 01:43:09,720 --> 01:43:10,340 OK. 2393 01:43:10,340 --> 01:43:12,230 Pretty, pretty amazing. 2394 01:43:12,230 --> 01:43:12,830 [APPLAUSE] 2395 01:43:12,830 --> 01:43:13,430 Thank you. 2396 01:43:13,430 --> 01:43:17,990 2397 01:43:17,990 --> 01:43:22,820 Let me go in, and for time's sake, let me open up a variant of this 2398 01:43:22,820 --> 01:43:23,930 that I wrote in advance. 2399 01:43:23,930 --> 01:43:26,390 This one now is exactly the same. 2400 01:43:26,390 --> 01:43:29,930 But now notice insofar as Google is handing me back a bunch of words, 2401 01:43:29,930 --> 01:43:32,240 I can certainly just use some Python syntax and say, 2402 01:43:32,240 --> 01:43:34,210 is hello in the user's words? 2403 01:43:34,210 --> 01:43:36,030 Is how are you in the user's words? 2404 01:43:36,030 --> 01:43:38,795 And is goodbye in the user's words? 2405 01:43:38,795 --> 01:43:39,920 So let me run this version. 2406 01:43:39,920 --> 01:43:42,800 Python voices2, which is available-- 2407 01:43:42,800 --> 01:43:46,700 I can't talk while I'm doing this demo. 2408 01:43:46,700 --> 01:43:47,420 Hello, world. 2409 01:43:47,420 --> 01:43:52,210 2410 01:43:52,210 --> 01:43:53,250 How are you today? 2411 01:43:53,250 --> 01:43:57,900 2412 01:43:57,900 --> 01:43:58,790 Goodbye, world. 2413 01:43:58,790 --> 01:44:03,665 2414 01:44:03,665 --> 01:44:04,165 OK.
2415 01:44:04,165 --> 01:44:05,130 [LAUGHTER] 2416 01:44:05,130 --> 01:44:09,480 Now let me take it up a notch and introduce, in this case, 2417 01:44:09,480 --> 01:44:12,270 an example using regular expressions. 2418 01:44:12,270 --> 01:44:13,380 So notice this. 2419 01:44:13,380 --> 01:44:15,840 At a quick glance, it uses re.search. 2420 01:44:15,840 --> 01:44:19,380 And it's searching for the words my name is, 2421 01:44:19,380 --> 01:44:22,800 which is to say that hopefully this will detect if I 2422 01:44:22,800 --> 01:44:24,630 have said my name is such and such. 2423 01:44:24,630 --> 01:44:27,570 And it's then going to say hey to whatever matches. 2424 01:44:27,570 --> 01:44:31,590 You can use regular expressions to extract information from input. 2425 01:44:31,590 --> 01:44:36,390 So I'm extracting with parentheses here whatever comes after the word is. 2426 01:44:36,390 --> 01:44:37,650 So here we go again. 2427 01:44:37,650 --> 01:44:39,340 Python, this time of voices3. 2428 01:44:39,340 --> 01:44:43,550 2429 01:44:43,550 --> 01:44:44,520 Hello, there. 2430 01:44:44,520 --> 01:44:45,350 My name is David. 2431 01:44:45,350 --> 01:44:48,170 2432 01:44:48,170 --> 01:44:50,080 Ho, ho, ho! 2433 01:44:50,080 --> 01:44:52,870 Now your computer is indeed sentient. 2434 01:44:52,870 --> 01:44:56,860 Let's do something else more powerful. 2435 01:44:56,860 --> 01:44:59,800 And I hope you'll forgive me if we go, like, two minutes over today. 2436 01:44:59,800 --> 01:45:01,870 I hope it's going to be worth it. 2437 01:45:01,870 --> 01:45:05,410 Let me go ahead, and in today's examples, too, for week 6, 2438 01:45:05,410 --> 01:45:08,470 let me open up something like faces. 2439 01:45:08,470 --> 01:45:11,300 In this case here, we have, for instance, 2440 01:45:11,300 --> 01:45:14,620 a whole bunch of our Yale staff from some weeks ago. 2441 01:45:14,620 --> 01:45:17,530 So you'll see here a whole bunch of faces in Yale.
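Returning to the voices3 example, extracting text with a capturing group can be sketched as below. The transcript does not show the exact pattern in the lecture's file, so `"my name is (.*)"` and the helper `greet` are assumptions. Typed text stands in for the speech-recognition output.

```python
import re


def greet(words):
    # The parentheses form a capturing group: whatever follows
    # "my name is " is captured and returned by match.group(1).
    match = re.search("my name is (.*)", words, re.IGNORECASE)
    if match:
        return f"Hey, {match.group(1)}."
    return None


print(greet("Hello there. My name is David"))  # Hey, David.
print(greet("Hello there"))                    # None
```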
2442 01:45:17,530 --> 01:45:19,690 And now I'm going to go ahead and, in advance, I 2443 01:45:19,690 --> 01:45:25,360 wrote a program here called detect to detect faces. 2444 01:45:25,360 --> 01:45:28,360 I'm going to go ahead and run this program called detect.py. 2445 01:45:28,360 --> 01:45:31,300 It's written in Python but we'll let you see the code online. 2446 01:45:31,300 --> 01:45:34,630 It's going to open that Yale JPEG file. 2447 01:45:34,630 --> 01:45:38,290 It's going to analyze it looking for things that look like faces. 2448 01:45:38,290 --> 01:45:41,090 Eyes, and nose, and mouth, and so forth. 2449 01:45:41,090 --> 01:45:49,160 And if it finds them, it's going to open and extract each and every one of them, 2450 01:45:49,160 --> 01:45:51,190 for better or for worse. 2451 01:45:51,190 --> 01:45:56,020 Better still, suppose we have this photo which is a photo of most of CS50 staff 2452 01:45:56,020 --> 01:45:57,620 here at Harvard this year. 2453 01:45:57,620 --> 01:46:00,850 And if you see, I am among them somewhere. 2454 01:46:00,850 --> 01:46:04,240 Well, I wrote another program thanks to a nice tutorial online, 2455 01:46:04,240 --> 01:46:09,460 this one called recognize.py, that's going to analyze harvard.jpg this time 2456 01:46:09,460 --> 01:46:14,290 and actually find, hopefully, me. 2457 01:46:14,290 --> 01:46:18,640 Because I also have fed this program as input one photo of myself 2458 01:46:18,640 --> 01:46:20,380 from CS50's website. 2459 01:46:20,380 --> 01:46:23,740 And in just a moment, hopefully this will open up 2460 01:46:23,740 --> 01:46:29,050 a file containing an analyzed version. 2461 01:46:29,050 --> 01:46:34,550 And indeed, if we look for Waldo, there I am in the back. 2462 01:46:34,550 --> 01:46:37,870 And the program in Python drew that green box. 2463 01:46:37,870 --> 01:46:40,100 Let's do one final example. 2464 01:46:40,100 --> 01:46:43,120 This one is going to be called qr.py. 
2465 01:46:43,120 --> 01:46:45,220 And it turns out, if you're familiar with QR 2466 01:46:45,220 --> 01:46:48,310 codes, those two-dimensional barcodes you sometimes see online 2467 01:46:48,310 --> 01:46:52,000 and in the real world, you can import a library called qrcode. 2468 01:46:52,000 --> 01:46:56,920 I can then generate an image using qrcode's built-in function make. 2469 01:46:56,920 --> 01:46:59,320 And let me go ahead and make a QR code containing, like, 2470 01:46:59,320 --> 01:47:01,620 a link to one of the course's videos. 2471 01:47:01,620 --> 01:47:12,820 Https://youtu.be/OHG5SJYRHA0. 2472 01:47:12,820 --> 01:47:15,310 Let me just double check that there are no typos. 2473 01:47:15,310 --> 01:47:16,374 OHG5SJYRHA0. 2474 01:47:16,374 --> 01:47:20,980 2475 01:47:20,980 --> 01:47:24,070 So that's going to embed that URL in a two-dimensional barcode. 2476 01:47:24,070 --> 01:47:29,200 I'm then going to do image.save qr.png, which is a graphic format-- 2477 01:47:29,200 --> 01:47:30,657 indeed, the PNG format. 2478 01:47:30,657 --> 01:47:31,240 And that's it. 2479 01:47:31,240 --> 01:47:32,710 Two lines of code. 2480 01:47:32,710 --> 01:47:36,970 I'm going to go ahead now and run for my final example homemade in Python, 2481 01:47:36,970 --> 01:47:39,700 two lines of code, qr.py. 2482 01:47:39,700 --> 01:47:41,110 That was super quick. 2483 01:47:41,110 --> 01:47:45,160 And if I now go into my directory, you will see qr.png. 2484 01:47:45,160 --> 01:47:49,960 And if you'd like to take out your iPhone or Android, open your camera, 2485 01:47:49,960 --> 01:47:51,400 point it at the code. 2486 01:47:51,400 --> 01:47:53,770 You might need to zoom in. 2487 01:47:53,770 --> 01:47:55,270 Hopefully this will work. 2488 01:47:55,270 --> 01:47:58,180 2489 01:47:58,180 --> 01:48:04,490 [MUSIC - RICK ASTLEY, "NEVER GONNA GIVE YOU UP"] 2490 01:48:04,490 --> 01:48:05,800 That's it for CS50. 2491 01:48:05,800 --> 01:48:07,940 We'll see you next time.
2492 01:48:07,940 --> 01:48:09,117