1 00:00:00,000 --> 00:00:01,882 2 00:00:01,882 --> 00:00:02,590 CARTER ZENKE: OK. 3 00:00:02,590 --> 00:00:06,470 Well, hello one and all, and welcome to CS50's Week 6 session. 4 00:00:06,470 --> 00:00:07,510 My name is Carter Zenke. 5 00:00:07,510 --> 00:00:09,280 I'm the course's Preceptor here on campus. 6 00:00:09,280 --> 00:00:12,940 And the goal of these sections is to help you bridge the gap between lecture 7 00:00:12,940 --> 00:00:14,840 and this week's problem set. 8 00:00:14,840 --> 00:00:18,130 So this week, we learned all about this new language 9 00:00:18,130 --> 00:00:22,900 called Python, which is hopefully a little bit more high level than C, 10 00:00:22,900 --> 00:00:25,712 a little bit easier to grasp the syntax of. 11 00:00:25,712 --> 00:00:27,670 So I hope you are as excited as I am to dive in 12 00:00:27,670 --> 00:00:32,020 and learn even more about how Python can serve us in this week's problem set. 13 00:00:32,020 --> 00:00:34,090 Now I have a few topics for today. 14 00:00:34,090 --> 00:00:35,500 These are among them. 15 00:00:35,500 --> 00:00:38,170 We'll talk about Python generally, but focus 16 00:00:38,170 --> 00:00:41,900 in particular on this idea of strings and loops, 17 00:00:41,900 --> 00:00:45,400 and this new type, if you will, called a dictionary, 18 00:00:45,400 --> 00:00:48,100 where we can store keys and values. 19 00:00:48,100 --> 00:00:50,740 We'll also, at a higher level, talk about the idea 20 00:00:50,740 --> 00:00:55,700 of a library, where we can actually have some code somebody else has written, 21 00:00:55,700 --> 00:00:58,030 including that in our own Python code. 22 00:00:58,030 --> 00:01:03,200 And then, towards the end, we'll talk about file I/O, or file input and file 23 00:01:03,200 --> 00:01:07,430 output, in particular, how we can read data from files 24 00:01:07,430 --> 00:01:09,270 and write data to files. 
25 00:01:09,270 --> 00:01:15,050 And even more particularly, talking about CSVs, or comma-separated value 26 00:01:15,050 --> 00:01:15,990 files. 27 00:01:15,990 --> 00:01:20,390 So let's jump right in here and talk about strings to begin with. 28 00:01:20,390 --> 00:01:23,660 Strings, as we saw earlier, are these collections of text. 29 00:01:23,660 --> 00:01:27,620 We have individual characters, like A, and B, and C. 30 00:01:27,620 --> 00:01:31,640 But if we wanted to string those characters together, like into a word, 31 00:01:31,640 --> 00:01:34,910 let's say, we would get what we call a string. 32 00:01:34,910 --> 00:01:38,090 And one of my favorite examples of strings in the real world 33 00:01:38,090 --> 00:01:39,262 are really just these books. 34 00:01:39,262 --> 00:01:40,970 And one of my favorite books as a kid was 35 00:01:40,970 --> 00:01:44,510 this one called Goodnight Moon, where, if you're not familiar, basically, 36 00:01:44,510 --> 00:01:48,560 you get to read a book and say goodnight to the various household objects, 37 00:01:48,560 --> 00:01:51,770 like the fireplace, the picture of the cow jumping over the moon, 38 00:01:51,770 --> 00:01:53,640 and finally, the moon itself. 39 00:01:53,640 --> 00:01:56,790 And there are a few lines in this book, like the following, 40 00:01:56,790 --> 00:02:00,680 "In the great green room," is how this book begins. 41 00:02:00,680 --> 00:02:05,510 And actually, I think it's interesting now how we can take these strings, 42 00:02:05,510 --> 00:02:10,160 and we have things like AI that can actually generate pictures for us 43 00:02:10,160 --> 00:02:11,370 from these strings. 44 00:02:11,370 --> 00:02:16,970 So just for fun, I gave this string to this model called DALL-E 2 by OpenAI, 45 00:02:16,970 --> 00:02:18,470 and here's what it came up with. 46 00:02:18,470 --> 00:02:21,610 In the great green room gave me a room with some blinds, 47 00:02:21,610 --> 00:02:24,020 and you see some green in the back there. 
48 00:02:24,020 --> 00:02:27,710 I gave it that next piece of text, this string here, which is, 49 00:02:27,710 --> 00:02:30,200 "There was a telephone and a red balloon." 50 00:02:30,200 --> 00:02:33,080 And here's what DALL-E came up with here, 51 00:02:33,080 --> 00:02:36,952 a telephone attached to a red balloon, so kind of creative in some ways. 52 00:02:36,952 --> 00:02:38,660 So that's all just for fun, but I thought 53 00:02:38,660 --> 00:02:41,747 we'd see the applications of strings, and also 54 00:02:41,747 --> 00:02:43,580 where they can come from, too, in this case, 55 00:02:43,580 --> 00:02:45,840 from some of our favorite childhood books. 56 00:02:45,840 --> 00:02:50,930 So let's compare then how C and Python actually work with strings. 57 00:02:50,930 --> 00:02:57,170 And here is one example where we see C on the top and Python on the bottom. 58 00:02:57,170 --> 00:03:00,170 And I'm curious, if you're here live, go ahead and tell me 59 00:03:00,170 --> 00:03:01,970 a few differences you spot. 60 00:03:01,970 --> 00:03:04,670 Let's play a find-the-difference game. 61 00:03:04,670 --> 00:03:08,460 What do you see different between the top and the bottom? 62 00:03:08,460 --> 00:03:11,930 I'm seeing there's no more semicolon in Python, so that's correct. 63 00:03:11,930 --> 00:03:17,070 Generally, in Python, you shouldn't be ending your lines with a semicolon. 64 00:03:17,070 --> 00:03:22,080 There's no what we call char star, or this idea of a pointer to a character. 65 00:03:22,080 --> 00:03:27,140 So in C, we have to tell C that a string is literally a pointer 66 00:03:27,140 --> 00:03:29,900 to the first character of a string. 67 00:03:29,900 --> 00:03:32,030 In Python, though, it's more abstract. 68 00:03:32,030 --> 00:03:33,380 It's a higher level. 69 00:03:33,380 --> 00:03:35,150 It doesn't care too much about pointers. 70 00:03:35,150 --> 00:03:37,558 We just tell it we want this string in general. 
71 00:03:37,558 --> 00:03:40,850 You go figure it out and find out where to put the pointers to make that string 72 00:03:40,850 --> 00:03:42,530 to begin with. 73 00:03:42,530 --> 00:03:44,960 Another observation here is that we didn't even 74 00:03:44,960 --> 00:03:47,250 have to declare the data type. 75 00:03:47,250 --> 00:03:50,900 So up above, you'll see we had this variable called text, 76 00:03:50,900 --> 00:03:55,280 and we declared it as a type char star, or a pointer to a character. 77 00:03:55,280 --> 00:03:58,190 Down below, there's no type declaration at all. 78 00:03:58,190 --> 00:04:01,100 Python just infers for us what type we might 79 00:04:01,100 --> 00:04:03,750 want to use in this particular case. 80 00:04:03,750 --> 00:04:07,070 So those are a few key differences between C and Python 81 00:04:07,070 --> 00:04:09,080 here that you'll keep in mind as you go off 82 00:04:09,080 --> 00:04:12,740 and work now in Python, and less so in C. 83 00:04:12,740 --> 00:04:15,270 Here's another example here. 84 00:04:15,270 --> 00:04:17,800 Here's comparing two strings. 85 00:04:17,800 --> 00:04:20,160 So let's again play spot the difference. 86 00:04:20,160 --> 00:04:23,520 Let's look up above and down below, where the above is in C, 87 00:04:23,520 --> 00:04:25,260 and the bottom is in Python. 88 00:04:25,260 --> 00:04:28,350 What do you see different here? 89 00:04:28,350 --> 00:04:30,330 I see no curly braces. 90 00:04:30,330 --> 00:04:33,630 That is certainly a big difference between C and Python. 91 00:04:33,630 --> 00:04:38,070 So C, up above, specifies things like the blocks 92 00:04:38,070 --> 00:04:40,680 for an if statement, or the blocks for a loop 93 00:04:40,680 --> 00:04:43,710 by putting that code inside of curly braces. 94 00:04:43,710 --> 00:04:46,770 Python, though, focuses on indentation. 
95 00:04:46,770 --> 00:04:49,830 So notice how if I wanted to have a block of code that 96 00:04:49,830 --> 00:04:57,000 executes in response to this conditional here, I have to indent it by one level. 97 00:04:57,000 --> 00:04:57,930 Let's see, what else? 98 00:04:57,930 --> 00:05:01,590 There are no parentheses, so that's fair. 99 00:05:01,590 --> 00:05:06,930 Up above, we had parentheses for our if statement, down below, not so. 100 00:05:06,930 --> 00:05:09,820 We also don't have to call our very own function. 101 00:05:09,820 --> 00:05:13,740 So up above, in C, we had to call a function to compare two strings. 102 00:05:13,740 --> 00:05:17,610 In Python, it just kind of infers that if I want to compare two strings, 103 00:05:17,610 --> 00:05:24,840 it better do, basically, the same procedure that strcmp, up above, in C, 104 00:05:24,840 --> 00:05:25,540 would do. 105 00:05:25,540 --> 00:05:27,910 So it's much more, I would say, intuitive. 106 00:05:27,910 --> 00:05:30,670 In this case, no need to call any particular function. 107 00:05:30,670 --> 00:05:34,850 Python just knows what we'd like to do in this case. 108 00:05:34,850 --> 00:05:39,580 And let's maybe do one more here. 109 00:05:39,580 --> 00:05:42,370 The code above, in C, and the code down below, 110 00:05:42,370 --> 00:05:47,050 in Python, trying to access some character of a string. 111 00:05:47,050 --> 00:05:48,745 Let's play spot the difference here. 112 00:05:48,745 --> 00:05:51,900 113 00:05:51,900 --> 00:05:54,840 And I'm seeing some hesitancy. 114 00:05:54,840 --> 00:05:56,200 There's no difference. 115 00:05:56,200 --> 00:05:59,580 So that's the idea here, is that in C, in Python, these are both the same. 116 00:05:59,580 --> 00:06:02,940 In order to access some individual character of some string, 117 00:06:02,940 --> 00:06:05,730 we simply use the bracket notation and put 118 00:06:05,730 --> 00:06:08,850 in the index of the character you want to actually get. 
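The three comparisons above can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the slides: the variable names and the literal strings are assumptions, standing in for whatever `input` would return.

```python
# Hypothetical values standing in for what input() would return.
text = "In the great green room"   # no type declaration, no semicolon
other = "In the great green room"

# == compares the contents of two strings, much like strcmp(...) == 0 in C.
if text == other:
    same = True
else:
    same = False

# Bracket notation works as it does in C, and indexing starts at 0.
first = text[0]
second = text[1]

print(same, first, second)
```

Note that the block under `if` is marked by indentation alone, with no curly braces and no parentheses around the condition.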
119 00:06:08,850 --> 00:06:13,200 So for instance, text bracket 0 would give us that very first character 120 00:06:13,200 --> 00:06:15,030 in this string called text. 121 00:06:15,030 --> 00:06:18,060 Text bracket one would give us that second character, 122 00:06:18,060 --> 00:06:19,860 and so on, and so forth. 123 00:06:19,860 --> 00:06:24,390 In Python and C, these are both what we call zero indexed. 124 00:06:24,390 --> 00:06:27,240 So let's pause here and ask, what questions do you 125 00:06:27,240 --> 00:06:32,400 have on some of these differences so far between C and Python? 126 00:06:32,400 --> 00:06:34,260 Any that I can answer while we're here? 127 00:06:34,260 --> 00:06:38,370 128 00:06:38,370 --> 00:06:43,100 I'll give folks just a minute to think. 129 00:06:43,100 --> 00:06:47,720 A good question I see, so do we not have pointers in Python? 130 00:06:47,720 --> 00:06:50,460 I think it is possible. 131 00:06:50,460 --> 00:06:54,590 I'm going to go out on a limb here and say, in base Python, 132 00:06:54,590 --> 00:06:56,900 you're not going to be dealing with pointers at all. 133 00:06:56,900 --> 00:07:01,140 There are probably ways to work with pointers in Python if you want to. 134 00:07:01,140 --> 00:07:03,950 But the creators of Python made it so you do not 135 00:07:03,950 --> 00:07:05,690 have to deal with pointers at all. 136 00:07:05,690 --> 00:07:08,430 137 00:07:08,430 --> 00:07:11,220 A good question, how does Python know when a string ends? 138 00:07:11,220 --> 00:07:14,010 So in C, we saw that strings end with what 139 00:07:14,010 --> 00:07:17,550 we called the NUL character, N-U-L, which denotes 140 00:07:17,550 --> 00:07:19,500 the end of this particular string. 
141 00:07:19,500 --> 00:07:21,750 The fascinating thing about Python is that it probably 142 00:07:21,750 --> 00:07:24,870 does the very same thing, that strings in Python 143 00:07:24,870 --> 00:07:28,980 do have, underneath the hood, pointers to the initial character, 144 00:07:28,980 --> 00:07:31,080 and a NUL character at the end of that string. 145 00:07:31,080 --> 00:07:33,750 But what Python does for us is it allows us 146 00:07:33,750 --> 00:07:35,820 to not think about these underlying details 147 00:07:35,820 --> 00:07:38,100 and instead think at a higher level. 148 00:07:38,100 --> 00:07:42,450 So at the end of the day, in terms of our actual memory inside a computer, 149 00:07:42,450 --> 00:07:44,460 Python and C might be doing similar things. 150 00:07:44,460 --> 00:07:48,360 But we can tell Python to do those things with much less precision, 151 00:07:48,360 --> 00:07:51,990 and in this case, a little more intuition and taking advantage of some 152 00:07:51,990 --> 00:07:55,640 of its syntax it offers us at the end. 153 00:07:55,640 --> 00:07:58,700 And one final question we'll take here, if I go back 154 00:07:58,700 --> 00:08:04,520 to this particular comparison, we said that Python infers 155 00:08:04,520 --> 00:08:07,550 the type of this text is a string. 156 00:08:07,550 --> 00:08:12,960 And that's because this input function here returns to us a string. 157 00:08:12,960 --> 00:08:16,760 So this function input always returns to us a string. 158 00:08:16,760 --> 00:08:21,200 And it's up to us to cast it, to change it to an integer or some other type 159 00:08:21,200 --> 00:08:24,120 if we want to at the end. 160 00:08:24,120 --> 00:08:25,890 OK, so let's keep going. 161 00:08:25,890 --> 00:08:29,310 And one of the nice features about Python 162 00:08:29,310 --> 00:08:34,289 is that it's able to have a lot of built-in functions 163 00:08:34,289 --> 00:08:38,940 that simplify things that we saw in C. 
And some of these built-in functions 164 00:08:38,940 --> 00:08:42,580 we can use using Python's new dot notation. 165 00:08:42,580 --> 00:08:44,670 So we saw this a bit in lecture, but let's 166 00:08:44,670 --> 00:08:47,500 say I want to get some input from the user. 167 00:08:47,500 --> 00:08:50,130 And here is my line of code to do so. 168 00:08:50,130 --> 00:08:53,010 I say, let's make this new variable called text. 169 00:08:53,010 --> 00:08:57,900 And I'm going to set it equal to the result of calling the function input. 170 00:08:57,900 --> 00:09:02,820 And let's say the user types in this particular string on the right. 171 00:09:02,820 --> 00:09:07,560 Now what do you notice that's a bit, well, not so 172 00:09:07,560 --> 00:09:11,660 clean about the string on the right? 173 00:09:11,660 --> 00:09:14,440 What's the matter with it? 174 00:09:14,440 --> 00:09:16,870 Yes, there are spaces at the start and the end of it. 175 00:09:16,870 --> 00:09:21,400 So often when I'm filling out a form, I might be doing so very quickly. 176 00:09:21,400 --> 00:09:26,080 And I might actually add in a space at the end, some spaces at the beginning 177 00:09:26,080 --> 00:09:28,660 just by a typo on my keyboard. 178 00:09:28,660 --> 00:09:33,340 So it's common that users will actually give us data that is dirty in this way, 179 00:09:33,340 --> 00:09:37,550 that it has extra spaces, it's not capitalized correctly, and so on. 180 00:09:37,550 --> 00:09:40,300 So there's this interesting Python function 181 00:09:40,300 --> 00:09:42,440 that we could use that looks a bit like this. 182 00:09:42,440 --> 00:09:46,240 I could say text.strip, text.strip. 183 00:09:46,240 --> 00:09:49,120 And what would happen is, I would take that same text 184 00:09:49,120 --> 00:09:52,490 and I would then convert it to this. 185 00:09:52,490 --> 00:09:54,070 So notice the difference here? 186 00:09:54,070 --> 00:09:57,070 I had spaces at the beginning and spaces at the end. 
187 00:09:57,070 --> 00:10:02,590 With text.strip, I can strip those and be left with just the characters that 188 00:10:02,590 --> 00:10:04,820 are inside this particular string. 189 00:10:04,820 --> 00:10:09,910 So the purpose of dot strip is to take out beginning and trailing 190 00:10:09,910 --> 00:10:11,830 whitespace characters. 191 00:10:11,830 --> 00:10:16,490 Now notice here how strip isn't actually a standalone function. 192 00:10:16,490 --> 00:10:19,490 I didn't say, strip, and then gave the input as text. 193 00:10:19,490 --> 00:10:22,490 I instead said, text.strip. 194 00:10:22,490 --> 00:10:25,160 So this is Python's dot notation coming into play. 195 00:10:25,160 --> 00:10:28,978 And we'll see why this actually happens in just a little bit. 196 00:10:28,978 --> 00:10:30,770 Let's focus on some other ones we could use 197 00:10:30,770 --> 00:10:33,140 that are a bit handy for working with strings. 198 00:10:33,140 --> 00:10:37,490 Here we have, let's say, a new kind of input from the user. 199 00:10:37,490 --> 00:10:41,120 And what looks wrong about this one? 200 00:10:41,120 --> 00:10:43,470 The capitalization is just a little bit off. 201 00:10:43,470 --> 00:10:45,420 I think they might have had some typos here. 202 00:10:45,420 --> 00:10:49,370 So one way to fix this is to use the dot capital-- 203 00:10:49,370 --> 00:10:53,130 or, well, in this case, the dot lower method, or dot lower function. 204 00:10:53,130 --> 00:10:56,520 So I could say text.lower, and what would that do? 205 00:10:56,520 --> 00:11:00,810 Well, it would convert these characters all to lowercase. 206 00:11:00,810 --> 00:11:04,310 So I would take this with mixed capitalization 207 00:11:04,310 --> 00:11:07,880 and then bring it down to all lowercase. 
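A short sketch of the two methods just described, assuming hypothetical user input in place of a real call to `input`. The variable names `stripped` and `lowered` are only for illustration.

```python
# Hypothetical "dirty" input with extra spaces at both ends.
text = "   In the great green room   "

# strip() returns a copy without leading and trailing whitespace.
stripped = text.strip()

# Hypothetical input with mixed capitalization.
mixed = "In tHe GreAT grEEn rOOm"

# lower() returns a copy with every character converted to lowercase.
lowered = mixed.lower()

print(stripped)
print(lowered)
```

Both calls use dot notation on the string itself rather than passing the string to a standalone function.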
208 00:11:07,880 --> 00:11:11,750 And as I just said before, we also have this one we could use called 209 00:11:11,750 --> 00:11:15,110 dot capitalize, where dot capitalize takes a string 210 00:11:15,110 --> 00:11:21,420 and makes the first character only uppercase, so some options there. 211 00:11:21,420 --> 00:11:23,480 Now with dot lower, we also have dot upper. 212 00:11:23,480 --> 00:11:27,770 And there's actually a lot of these we can ever use in Python. 213 00:11:27,770 --> 00:11:31,970 We can find all of these in the Python documentation. 214 00:11:31,970 --> 00:11:37,760 So one note of caution here is that when I actually do text.capitalize, 215 00:11:37,760 --> 00:11:41,840 I need to reassign the result back to my variable, 216 00:11:41,840 --> 00:11:44,360 otherwise the changes won't actually stay. 217 00:11:44,360 --> 00:11:47,240 So, for example, I'll go back to my code space. 218 00:11:47,240 --> 00:11:53,900 And I'll open up here, let's say this one is just capitalize.py. 219 00:11:53,900 --> 00:11:55,730 And I'll have the same piece of code. 220 00:11:55,730 --> 00:12:00,950 I'll say, text equals input, and maybe for cleanliness sake, 221 00:12:00,950 --> 00:12:03,200 I'll say, Enter a string. 222 00:12:03,200 --> 00:12:07,820 And then I'll say, text.input, or text.capitalize, 223 00:12:07,820 --> 00:12:10,550 like this, hoping to uppercase that text. 224 00:12:10,550 --> 00:12:12,350 Then I'll say, text. 225 00:12:12,350 --> 00:12:15,710 I'll say, let's print out text here to see the result. 226 00:12:15,710 --> 00:12:20,600 I'll say, python of capitalize.py to run this program, as we saw in lecture. 227 00:12:20,600 --> 00:12:25,820 And now I'll say, In THE GREAT green Room. 228 00:12:25,820 --> 00:12:30,200 And if I hit Enter, hopefully I should see this text being capitalized, 229 00:12:30,200 --> 00:12:33,710 such that that very first character at the beginning is capital, 230 00:12:33,710 --> 00:12:35,960 but all the rest are lowercase. 
231 00:12:35,960 --> 00:12:38,350 So I'll hit Enter here. 232 00:12:38,350 --> 00:12:41,800 And do you see any changes? 233 00:12:41,800 --> 00:12:42,760 I don't seem to. 234 00:12:42,760 --> 00:12:47,950 So I think what we should do is say, text equals text.capitalize, 235 00:12:47,950 --> 00:12:52,360 so reassign the capitalized version of text back to text, 236 00:12:52,360 --> 00:12:54,710 and then print the result. So let's try this. 237 00:12:54,710 --> 00:13:00,700 I'll say python of capitalize.py, and I'll say, In the GREAT GREEN room. 238 00:13:00,700 --> 00:13:05,670 Hit Enter, and now I see my text being capitalized. 239 00:13:05,670 --> 00:13:09,000 So let's again ask, what questions do we have? 240 00:13:09,000 --> 00:13:13,560 We've seen a few what we call string methods. 241 00:13:13,560 --> 00:13:16,170 A question, can we still make functions in Python? 242 00:13:16,170 --> 00:13:17,350 We certainly can. 243 00:13:17,350 --> 00:13:20,010 So you'll be able to see in this week's problem set 244 00:13:20,010 --> 00:13:25,140 some of the syntax via which you can make your very own functions in Python. 245 00:13:25,140 --> 00:13:28,990 A question about reassignment being necessary for dot upper and dot lower? 246 00:13:28,990 --> 00:13:29,490 Yes. 247 00:13:29,490 --> 00:13:33,990 So if I were to use dot lower or dot upper, which would make these lowercase 248 00:13:33,990 --> 00:13:36,300 or uppercase all characters respectively, 249 00:13:36,300 --> 00:13:39,570 I should still reassign the result if I want to change text 250 00:13:39,570 --> 00:13:42,488 as a whole in my program. 251 00:13:42,488 --> 00:13:43,530 Question about dot strip. 252 00:13:43,530 --> 00:13:48,530 Does it allow us to, let's say, take out a middle whitespace character? 253 00:13:48,530 --> 00:13:49,830 So let's try this. 254 00:13:49,830 --> 00:13:53,730 It's no longer capitalizing, but I'll still say, python capitalize.py. 
255 00:13:53,730 --> 00:14:01,880 I'll enter a string, like H, and then space, ell, space, o, hit Enter, 256 00:14:01,880 --> 00:14:04,850 and I still see the whitespace in the middle. 257 00:14:04,850 --> 00:14:10,670 So strip only removes white space at the beginning and at the end of our string, 258 00:14:10,670 --> 00:14:14,870 or what we call the leading and trailing whitespace. 259 00:14:14,870 --> 00:14:20,420 A question, can we make our own dot notation functions, or more precisely, 260 00:14:20,420 --> 00:14:21,500 methods? 261 00:14:21,500 --> 00:14:25,980 You could, and we'll see a bit more about how you could just now. 262 00:14:25,980 --> 00:14:29,900 So let's dive underneath the hood and talk about where these dot functions, 263 00:14:29,900 --> 00:14:31,980 dot methods actually come from. 264 00:14:31,980 --> 00:14:39,920 So unlike in C, in Python, this string is actually what we call an object. 265 00:14:39,920 --> 00:14:45,980 So an object in programming is basically some particular element 266 00:14:45,980 --> 00:14:48,320 you can reuse throughout your code. 267 00:14:48,320 --> 00:14:54,080 And it has accessible to it not just certain values, like pieces of data, 268 00:14:54,080 --> 00:14:56,570 but also, certain functions. 269 00:14:56,570 --> 00:14:58,550 And we actually call those, because they're 270 00:14:58,550 --> 00:15:00,530 associated with an object, methods. 271 00:15:00,530 --> 00:15:04,020 So functions associated with objects are often called methods. 272 00:15:04,020 --> 00:15:07,710 And for your own benefit, Python actually 273 00:15:07,710 --> 00:15:13,090 tells you all of the possible string methods in the Python documentation. 274 00:15:13,090 --> 00:15:16,830 So if you go to Docs.Python.org, you'll be 275 00:15:16,830 --> 00:15:20,640 able to see the entire Python manual that tells you 276 00:15:20,640 --> 00:15:26,430 all the possible methods, and functions, and objects that are built into Python. 
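The two points from the capitalize.py demo, that string methods return a new string and that strip leaves interior whitespace alone, can be sketched as follows. The literal strings are assumptions standing in for what was typed at the prompt.

```python
# Hypothetical input standing in for what the user typed.
text = "In THE GREAT green Room"

# Calling the method without assigning the result changes nothing:
# strings are immutable, so capitalize() returns a new string.
text.capitalize()
unchanged = text

# Reassigning keeps the capitalized version.
text = text.capitalize()

# strip() only removes leading and trailing whitespace, not spaces in the middle.
spaced = "  H ell o  ".strip()

print(unchanged)
print(text)
print(spaced)
```

The same reassignment pattern applies to `lower`, `upper`, and `strip`.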
277 00:15:26,430 --> 00:15:30,000 And if you're first encountering this idea of an object, 278 00:15:30,000 --> 00:15:32,070 I think it's a little bit abstract. 279 00:15:32,070 --> 00:15:36,460 I said it was some entity you can use multiple times in your code. 280 00:15:36,460 --> 00:15:39,690 So I want to make it a little more concrete for you, hopefully. 281 00:15:39,690 --> 00:15:44,880 Now if you recall from an earlier problem set, let's say, problem set 3, 282 00:15:44,880 --> 00:15:48,780 I believe, we had this idea of a candidate. 283 00:15:48,780 --> 00:15:52,380 And do you remember how we implemented the idea of a candidate? 284 00:15:52,380 --> 00:15:58,100 What did we do in code to create a candidate? 285 00:15:58,100 --> 00:15:59,280 What did we do in code? 286 00:15:59,280 --> 00:16:02,820 So we had the idea of creating some kind of struct. 287 00:16:02,820 --> 00:16:07,700 So in C, we had a struct that allowed us to combine different data types. 288 00:16:07,700 --> 00:16:10,970 We could have a string for the candidate's name 289 00:16:10,970 --> 00:16:13,100 and an integer for the number of votes. 290 00:16:13,100 --> 00:16:17,090 And that, as a whole, was what we called a candidate. 291 00:16:17,090 --> 00:16:20,930 So to visualize here, we had a single candidate, 292 00:16:20,930 --> 00:16:26,610 and we gave that candidate some name, which we accessed via candidate.name. 293 00:16:26,610 --> 00:16:29,240 We also perhaps gave that candidate some number 294 00:16:29,240 --> 00:16:32,780 of votes, which we accessed via candidate.votes. 295 00:16:32,780 --> 00:16:37,250 So these are what we called attributes of this struct, different data 296 00:16:37,250 --> 00:16:42,190 values that were associated with this idea of a candidate. 297 00:16:42,190 --> 00:16:45,590 Now an object is very much like this. 298 00:16:45,590 --> 00:16:49,990 We can assign it some particular data that composes that object. 
299 00:16:49,990 --> 00:16:52,090 In this case, a candidate could itself be 300 00:16:52,090 --> 00:16:55,660 an object that has some attributes, like name and votes. 301 00:16:55,660 --> 00:16:58,360 But where things differ is where we actually 302 00:16:58,360 --> 00:17:02,800 allow us to have not just values, but actually functions associated 303 00:17:02,800 --> 00:17:04,730 with some particular object. 304 00:17:04,730 --> 00:17:08,890 So let's say here, a str, which is itself an object in Python, 305 00:17:08,890 --> 00:17:14,410 the str represents a string, it has not just certain attributes like a length. 306 00:17:14,410 --> 00:17:18,430 It also has some functions associated with it, like, in this case, 307 00:17:18,430 --> 00:17:19,270 capitalize. 308 00:17:19,270 --> 00:17:22,180 So you could think of it as taking out a toolbox 309 00:17:22,180 --> 00:17:27,250 and being able to modify this particular value, the value it's storing. 310 00:17:27,250 --> 00:17:29,620 You could think of this, too, with dot lower. 311 00:17:29,620 --> 00:17:32,710 This is another function, another tool in your toolbox 312 00:17:32,710 --> 00:17:36,410 by which you could update this value of a string. 313 00:17:36,410 --> 00:17:39,340 So, again, the key difference here is that objects 314 00:17:39,340 --> 00:17:44,120 allow us to have functions associated with them and not just values, 315 00:17:44,120 --> 00:17:48,450 as we saw with structs in C. 316 00:17:48,450 --> 00:17:50,430 Let me pause here and ask for questions. 317 00:17:50,430 --> 00:17:54,750 What questions do you have after this high-level overview 318 00:17:54,750 --> 00:17:57,420 of an object in Python? 319 00:17:57,420 --> 00:18:04,050 320 00:18:04,050 --> 00:18:06,390 Question, so lower is actually a function? 321 00:18:06,390 --> 00:18:08,310 It is, so lower is a function. 
322 00:18:08,310 --> 00:18:11,880 It takes as input, let's say, the value that the string holds, 323 00:18:11,880 --> 00:18:13,240 the text that's inside of it. 324 00:18:13,240 --> 00:18:17,340 And it returns to us the lowercase version of that. 325 00:18:17,340 --> 00:18:22,200 In that case, it is a function but it's also associated with an object. 326 00:18:22,200 --> 00:18:24,360 I can't use lower on anything. 327 00:18:24,360 --> 00:18:27,150 I can only use it on strs. 328 00:18:27,150 --> 00:18:31,170 So in that case, it is what we call a method, a function associated 329 00:18:31,170 --> 00:18:34,590 with some object, in this case. 330 00:18:34,590 --> 00:18:36,930 Question, is int an object? 331 00:18:36,930 --> 00:18:41,837 So a spoiler is that everything in Python is an object. 332 00:18:41,837 --> 00:18:43,670 So if you want to learn more about this, you 333 00:18:43,670 --> 00:18:45,870 could learn about object-oriented programming. 334 00:18:45,870 --> 00:18:48,373 But Python is an object-oriented language. 335 00:18:48,373 --> 00:18:51,540 You'll often learn more about that as you take higher level computer science 336 00:18:51,540 --> 00:18:54,030 courses and get into differences in how we can actually 337 00:18:54,030 --> 00:18:58,790 program things and design languages. 338 00:18:58,790 --> 00:19:02,390 Let's see, are capitalize and upper the very same functions? 339 00:19:02,390 --> 00:19:03,300 They're different. 340 00:19:03,300 --> 00:19:08,000 So upper takes the entire string and uppercases all characters. 341 00:19:08,000 --> 00:19:12,050 Capitalize takes that string and uppercases only the very first 342 00:19:12,050 --> 00:19:12,960 character. 343 00:19:12,960 --> 00:19:16,340 So think of it like you'd capitalize a sentence, 344 00:19:16,340 --> 00:19:18,650 the very first character is capitalized. 
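To make the struct-versus-object comparison concrete, here is a minimal sketch of a candidate written as a Python object. The `Candidate` class, its `add_vote` method, and the name "Alice" are hypothetical, invented here for illustration; they are not from the problem set. The second half contrasts `upper` with `capitalize` on a plain `str`.

```python
class Candidate:
    """Hypothetical Python counterpart to the C candidate struct."""

    def __init__(self, name, votes=0):
        # Attributes: the data values the object holds, like struct fields.
        self.name = name
        self.votes = votes

    def add_vote(self):
        # A method: a function associated with the object.
        self.votes += 1


candidate = Candidate("Alice")
candidate.add_vote()
print(candidate.name, candidate.votes)

# str objects come with their own methods.
text = "in the great green room"
shouted = text.upper()        # every character uppercased
sentence = text.capitalize()  # only the first character uppercased
print(shouted)
print(sentence)
```

The attributes play the role the struct fields played in C, while `add_vote` is the part a C struct could not hold.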
345 00:19:18,650 --> 00:19:21,560 346 00:19:21,560 --> 00:19:25,340 Question, is there a summary somewhere of the most important objects 347 00:19:25,340 --> 00:19:26,930 and their associated methods? 348 00:19:26,930 --> 00:19:29,850 Yes, that is called the Python Documentation. 349 00:19:29,850 --> 00:19:32,120 So if you go to Docs.Python.org, you'll be 350 00:19:32,120 --> 00:19:36,020 able to find all of the important, really, actually, 351 00:19:36,020 --> 00:19:39,110 not just the important ones, really every possible object 352 00:19:39,110 --> 00:19:43,760 and its associated methods in the Python Documentation. 353 00:19:43,760 --> 00:19:47,130 OK, so let's keep going. 354 00:19:47,130 --> 00:19:50,880 And I think now that we have this idea of strings, what we can do with them, 355 00:19:50,880 --> 00:19:53,630 it's also interesting to bring in loops to the mix. 356 00:19:53,630 --> 00:19:57,500 So loops, we saw earlier, are about going through some piece of code 357 00:19:57,500 --> 00:19:58,820 multiple times. 358 00:19:58,820 --> 00:20:03,150 And actually, in Python we get access to a very special kind of loop. 359 00:20:03,150 --> 00:20:06,680 So in lecture, we saw for loops and while loops. 360 00:20:06,680 --> 00:20:12,500 But I find this one to be maybe the most exciting for new Python programmers. 361 00:20:12,500 --> 00:20:15,260 So this loop looks a bit like this. 362 00:20:15,260 --> 00:20:20,260 For c in text, print c. 363 00:20:20,260 --> 00:20:21,960 For c in text, print c. 364 00:20:21,960 --> 00:20:23,710 And I'm going to ask you in the chat, what 365 00:20:23,710 --> 00:20:25,540 do you think is going to happen here? 366 00:20:25,540 --> 00:20:27,500 What will we see on our screen? 367 00:20:27,500 --> 00:20:32,080 If we have the following input, which is, in the great green room, 368 00:20:32,080 --> 00:20:33,490 what do you think we'll see? 
369 00:20:33,490 --> 00:20:38,437 370 00:20:38,437 --> 00:20:40,270 Yeah, so I'm seeing people saying that we're 371 00:20:40,270 --> 00:20:45,070 going to see the string printed out character by character. 372 00:20:45,070 --> 00:20:46,430 Now how would this happen? 373 00:20:46,430 --> 00:20:49,180 Well, Python just has a special kind of loop 374 00:20:49,180 --> 00:20:52,660 called this for blank in blank loop, where I can simply take something 375 00:20:52,660 --> 00:20:57,790 like a string and say, for blank, it could be c, could be x, could be y. 376 00:20:57,790 --> 00:21:00,760 And then I could, let's say, print that value. 377 00:21:00,760 --> 00:21:04,420 And every iteration of the loop, the value of c 378 00:21:04,420 --> 00:21:10,220 will update to be the next, let's say, character in this string, for instance. 379 00:21:10,220 --> 00:21:15,580 So to visualize this, I decided to call this element c. So 380 00:21:15,580 --> 00:21:17,770 on the very first iteration of this loop, 381 00:21:17,770 --> 00:21:22,270 c will equal capital I. On the very next iteration, 382 00:21:22,270 --> 00:21:28,990 c will equal n. On the next iteration, c will equal space, and then t, and then 383 00:21:28,990 --> 00:21:30,530 h, and so on. 384 00:21:30,530 --> 00:21:33,430 Now it's not special I called this c. 385 00:21:33,430 --> 00:21:35,860 I could have called it x, or y, or z. 386 00:21:35,860 --> 00:21:39,620 As long as I'm being consistent and say, for x in text, 387 00:21:39,620 --> 00:21:46,960 print x, or for y in text, print y, I'll get the very same result. 388 00:21:46,960 --> 00:21:51,480 So that's how we can go through character by character 389 00:21:51,480 --> 00:21:54,270 and look at these strings. 390 00:21:54,270 --> 00:21:58,570 Now there's one more thing we could do that we could use these types of loops 391 00:21:58,570 --> 00:21:59,070 for. 392 00:21:59,070 --> 00:22:03,450 They're not just good for breaking apart strings in individual characters. 
393 00:22:03,450 --> 00:22:06,790 They're also good for working with lists. 394 00:22:06,790 --> 00:22:11,760 So let's say I want to take this text, In the great green room, 395 00:22:11,760 --> 00:22:16,710 and I want to put every individual word into its own string, 396 00:22:16,710 --> 00:22:20,790 but then have those strings be part of some list, let's say. 397 00:22:20,790 --> 00:22:24,720 So if I use text.split, what this will do 398 00:22:24,720 --> 00:22:28,780 is break apart this string at all of the whitespace characters, 399 00:22:28,780 --> 00:22:30,570 in this case, the space characters. 400 00:22:30,570 --> 00:22:33,690 And I'll say, get this list at the end. 401 00:22:33,690 --> 00:22:37,545 The first element is In, the next element is the, 402 00:22:37,545 --> 00:22:40,950 the next element is great, and so on, then green, then room. 403 00:22:40,950 --> 00:22:45,810 So I've basically taken this string, split it at all the spaces, 404 00:22:45,810 --> 00:22:49,750 and now I have a list of individual words. 405 00:22:49,750 --> 00:22:54,830 And this kind of loop is good for going through lists as well. 406 00:22:54,830 --> 00:22:58,040 So I could say something like, for word in words, 407 00:22:58,040 --> 00:23:00,040 let's print that particular word. 408 00:23:00,040 --> 00:23:02,260 And the result will be like this. 409 00:23:02,260 --> 00:23:06,290 First word will equal In, and I'll print In. 410 00:23:06,290 --> 00:23:09,910 Then, word will update and go to the next word. 411 00:23:09,910 --> 00:23:12,490 And I'll print the, and so on, and so forth. 412 00:23:12,490 --> 00:23:16,490 I could go to great and print great, and find at the very end, 413 00:23:16,490 --> 00:23:22,750 I would print the entire piece of text, but word by word. 414 00:23:22,750 --> 00:23:25,930 So I'm seeing a few questions here that I can answer. 415 00:23:25,930 --> 00:23:30,070 Let's see, so then words is not a string but a list. 
416 00:23:30,070 --> 00:23:31,550 That is actually true. 417 00:23:31,550 --> 00:23:35,950 So in this case, when I take a string, like text, and decide 418 00:23:35,950 --> 00:23:38,920 to split it using the string method split, 419 00:23:38,920 --> 00:23:43,600 I get back a list of the strings that were 420 00:23:43,600 --> 00:23:47,360 found by splitting on a whitespace character, in this case, 421 00:23:47,360 --> 00:23:50,420 the space itself. 422 00:23:50,420 --> 00:23:52,520 Does text.split always return a list? 423 00:23:52,520 --> 00:23:53,270 Yes, it does. 424 00:23:53,270 --> 00:23:59,280 So it will always return to us a list, whether it's empty or not. 425 00:23:59,280 --> 00:24:04,400 And a question here, let me find it again. 426 00:24:04,400 --> 00:24:07,950 So I don't have to specify what I want c to equal? 427 00:24:07,950 --> 00:24:11,390 I'll go back to this one here. 428 00:24:11,390 --> 00:24:12,858 It's a good question. 429 00:24:12,858 --> 00:24:14,150 So here it's kind of confusing. 430 00:24:14,150 --> 00:24:18,480 I just magically said c, and suddenly c popped into existence, 431 00:24:18,480 --> 00:24:19,820 which is kind of what happened. 432 00:24:19,820 --> 00:24:23,480 I could say for c in text, and magically c 433 00:24:23,480 --> 00:24:28,160 is now some variable I can use for the duration of this particular loop. 434 00:24:28,160 --> 00:24:31,310 So I wouldn't rely on c unindented. 435 00:24:31,310 --> 00:24:33,050 That's not part of this loop anymore. 436 00:24:33,050 --> 00:24:38,970 But I can use c while I'm indented inside of this particular for loop. 437 00:24:38,970 --> 00:24:43,620 And in this case, Python has some rules 438 00:24:43,620 --> 00:24:45,830 it uses to determine what c should equal. 439 00:24:45,830 --> 00:24:52,580 If I say for blank in some string, I'll get back every individual character 440 00:24:52,580 --> 00:24:53,480 in that string.
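The split-then-loop idea discussed above can be sketched roughly like this (the variable named empty is a hypothetical addition, included only to illustrate the "whether it's empty or not" remark):

```python
# Split a string into a list of words, then loop over that list.
text = "In the great green room"

words = text.split()  # with no argument, split breaks the string at whitespace
print(words)          # a list of strings, not a single string

for word in words:
    print(word)       # word is "In", then "the", then "great", and so on

empty = "".split()    # splitting an empty string still gives back a list
print(empty)
```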
441 00:24:53,480 --> 00:24:57,470 And whatever I say, like c, will equal every individual character 442 00:24:57,470 --> 00:24:58,370 in that string. 443 00:24:58,370 --> 00:25:03,290 If, as we saw here, I give it a list, 444 00:25:03,290 --> 00:25:07,790 it will then equal every individual element of that list, 445 00:25:07,790 --> 00:25:11,140 if that makes sense. 446 00:25:11,140 --> 00:25:12,520 Other questions here? 447 00:25:12,520 --> 00:25:14,820 Can you declare c before the loop? 448 00:25:14,820 --> 00:25:18,480 I believe-- not quite sure. 449 00:25:18,480 --> 00:25:19,470 We can test it out. 450 00:25:19,470 --> 00:25:21,690 I'll go over to my Python code here. 451 00:25:21,690 --> 00:25:24,150 I'll say, code loop.py. 452 00:25:24,150 --> 00:25:28,620 Let's say our text is In the great green room. 453 00:25:28,620 --> 00:25:34,230 And we wanted to figure out, maybe c is first text bracket 0. 454 00:25:34,230 --> 00:25:38,820 Then I'll say, for c in text, print c. 455 00:25:38,820 --> 00:25:42,570 I'll say python loop.py. 456 00:25:42,570 --> 00:25:45,270 I don't seem to get an error, so I think you could. 457 00:25:45,270 --> 00:25:53,380 My guess, though, is that it will simply overwrite c, and set it first 458 00:25:53,380 --> 00:25:55,552 equal to that very first character. 459 00:25:55,552 --> 00:25:57,760 So it doesn't quite matter what c is assigned before. 460 00:25:57,760 --> 00:26:00,410 You're going to update it during your loop in the end. 461 00:26:00,410 --> 00:26:02,275 So I hope that answers your question there. 462 00:26:02,275 --> 00:26:05,060 463 00:26:05,060 --> 00:26:08,410 OK, what if we had a list of integers? 464 00:26:08,410 --> 00:26:09,410 A good question. 465 00:26:09,410 --> 00:26:11,290 It would still be the same thing. 466 00:26:11,290 --> 00:26:15,320 I'd go through the list element by element. 467 00:26:15,320 --> 00:26:21,590 So we have a few rules here, or a few pieces of advice.
468 00:26:21,590 --> 00:26:25,180 So generally, Python's for-in syntax helps 469 00:26:25,180 --> 00:26:29,230 you iterate through components of what we call an iterable, while referring 470 00:26:29,230 --> 00:26:31,520 to them by some convenient name. 471 00:26:31,520 --> 00:26:34,270 So before I said c, and then I said words, 472 00:26:34,270 --> 00:26:37,180 I could have used any name for those. 473 00:26:37,180 --> 00:26:41,680 If the iterable, the thing I could iterate over in a loop, is a list, 474 00:26:41,680 --> 00:26:44,240 I'll iterate over every element of that list. 475 00:26:44,240 --> 00:26:50,260 And if it's a string, I'll iterate over every character of that string. 476 00:26:50,260 --> 00:26:54,660 So feel free to keep these in mind as you work in Python 477 00:26:54,660 --> 00:26:56,610 with these particular kinds of loops. 478 00:26:56,610 --> 00:27:01,500 And indeed, this vocabulary word called an iterable just 479 00:27:01,500 --> 00:27:04,140 stands for anything we could really have a loop over, 480 00:27:04,140 --> 00:27:08,850 whether a string, or a list, or other data types, other objects altogether, 481 00:27:08,850 --> 00:27:11,660 too. 482 00:27:11,660 --> 00:27:15,440 OK, so let's get into some practice. 483 00:27:15,440 --> 00:27:20,300 Here I have several files called text.py. 484 00:27:20,300 --> 00:27:24,117 And we'll play a game where I'll show you a loop in Python, 485 00:27:24,117 --> 00:27:26,450 and you'll tell me what you think should come out of it. 486 00:27:26,450 --> 00:27:30,770 And we'll run that program and see if our guess is correct. 487 00:27:30,770 --> 00:27:36,740 So I'll go back to my terminal and I'll code text0.py, 488 00:27:36,740 --> 00:27:40,290 where I have this particular example. 489 00:27:40,290 --> 00:27:43,190 So I'm curious, for those of you who are here, 490 00:27:43,190 --> 00:27:47,375 what do you think we'll see if I run Python of text0.py? 
491 00:27:47,375 --> 00:27:52,340 492 00:27:52,340 --> 00:27:54,380 Yeah, so I'm seeing some good examples here. 493 00:27:54,380 --> 00:27:59,580 Basically, we will see every word printed to our screen. 494 00:27:59,580 --> 00:28:03,680 So I'll say Python text0.py, hit Enter. 495 00:28:03,680 --> 00:28:07,970 And now I see, In the great green room. 496 00:28:07,970 --> 00:28:10,310 And now a question here is, why are we seeing 497 00:28:10,310 --> 00:28:17,340 these new lines, In, new line, the, new line, great, new line, and so on? 498 00:28:17,340 --> 00:28:21,090 Where do you think that's coming from? 499 00:28:21,090 --> 00:28:23,960 So as you saw in lecture, when we use print, 500 00:28:23,960 --> 00:28:27,780 print automatically appends a new line for us. 501 00:28:27,780 --> 00:28:31,430 So when I say, for word in text.split, I'm 502 00:28:31,430 --> 00:28:38,400 taking this string called text, turning it into a list of the individual words. 503 00:28:38,400 --> 00:28:44,420 And then I'm calling every element just this basic convenient name called word. 504 00:28:44,420 --> 00:28:45,960 I'll print it out as I go. 505 00:28:45,960 --> 00:28:50,450 So first, word will be equal to In, then the, then great. 506 00:28:50,450 --> 00:28:53,150 And as I print these out, I'm printing out not just 507 00:28:53,150 --> 00:28:58,880 In, or the, or great, I'm also printing out the new line associated with it. 508 00:28:58,880 --> 00:29:02,420 I could override this if I say end equals nothing here, 509 00:29:02,420 --> 00:29:04,220 perhaps end equals space. 510 00:29:04,220 --> 00:29:09,830 And now I've just recreated that same string but going through it as a list, 511 00:29:09,830 --> 00:29:11,400 if that makes sense. 512 00:29:11,400 --> 00:29:14,270 So I'll put this back to what it was. 513 00:29:14,270 --> 00:29:16,640 All right, let's find another example here. 514 00:29:16,640 --> 00:29:20,640 I'll code up text1.py, our next challenge. 
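Here is a hedged sketch of what text0.py appears to contain, reconstructed from the description above rather than copied from the actual file, along with the end variation that was mentioned:

```python
# Reconstruction of the word-by-word example; the real text0.py may differ.
text = "In the great green room"

# Default behavior: print appends a newline after each word.
for word in text.split():
    print(word)

# Overriding the end parameter: each word is followed by a space instead.
for word in text.split():
    print(word, end=" ")
print()  # finish the line once the loop is done
```

The second loop puts the words back on one line, though with a trailing space after the last word.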
515 00:29:20,640 --> 00:29:22,965 What do you think we'll see here? 516 00:29:22,965 --> 00:29:28,190 517 00:29:28,190 --> 00:29:29,720 What do you think we'll see here? 518 00:29:29,720 --> 00:29:33,640 519 00:29:33,640 --> 00:29:38,493 Yeah, so I'm seeing we should see every character, but no spaces. 520 00:29:38,493 --> 00:29:39,410 So let's try that out. 521 00:29:39,410 --> 00:29:43,810 Let's say, python text1.py to run this program. 522 00:29:43,810 --> 00:29:46,790 And I think you just about hit the nail on the head. 523 00:29:46,790 --> 00:29:51,610 I see I-N T-H-E, so it seems like we're going to have every character, 524 00:29:51,610 --> 00:29:54,310 but there are no longer any spaces. 525 00:29:54,310 --> 00:29:57,070 And I'm curious, what happened to those spaces? 526 00:29:57,070 --> 00:29:58,570 Where do you think they went? 527 00:29:58,570 --> 00:30:02,820 528 00:30:02,820 --> 00:30:03,320 Yeah. 529 00:30:03,320 --> 00:30:08,040 So when we used dot split here, the split string method, we said, 530 00:30:08,040 --> 00:30:12,720 let's take this piece of text and turn it into a list of words. 531 00:30:12,720 --> 00:30:16,430 And when we do that, we're getting rid of the spaces between the words, 532 00:30:16,430 --> 00:30:20,160 such that we just have a list of individual words, no spaces at all. 533 00:30:20,160 --> 00:30:25,070 So word here still refers to an individual word in our list of words, 534 00:30:25,070 --> 00:30:29,810 but then c goes through every individual word we have and prints them 535 00:30:29,810 --> 00:30:33,460 out character by character. 536 00:30:33,460 --> 00:30:34,270 All right. 537 00:30:34,270 --> 00:30:35,410 So let's keep going. 538 00:30:35,410 --> 00:30:36,740 Let's do code text2.py. 539 00:30:36,740 --> 00:30:39,730 540 00:30:39,730 --> 00:30:42,340 What do you think we'll see here? 541 00:30:42,340 --> 00:30:45,340 Some new syntax. 542 00:30:45,340 --> 00:30:47,260 Intuitively, what do you think will happen? 
543 00:30:47,260 --> 00:30:52,810 544 00:30:52,810 --> 00:30:58,030 Yeah, so maybe we only see those words that have g in them. 545 00:30:58,030 --> 00:30:58,810 A good guess. 546 00:30:58,810 --> 00:31:02,230 So I'll say python of text2.py, hit Enter, 547 00:31:02,230 --> 00:31:05,620 and here I only see great and green. 548 00:31:05,620 --> 00:31:10,180 So here we see another use of this in syntax in Python. 549 00:31:10,180 --> 00:31:15,250 So when it's used with a for loop, I'm able to iterate over something 550 00:31:15,250 --> 00:31:20,650 like a list or a string and extract those sub elements. 551 00:31:20,650 --> 00:31:25,510 Here, though, if I use it with a conditional, if g in word, 552 00:31:25,510 --> 00:31:28,640 I could do something like linear search and say, 553 00:31:28,640 --> 00:31:33,610 is this character g inside this particular string that I'm giving you? 554 00:31:33,610 --> 00:31:38,590 And if so, it will then say, yes, it is, and do whatever is indented, 555 00:31:38,590 --> 00:31:44,220 or if not, it will say no, it isn't, and just pass right on through. 556 00:31:44,220 --> 00:31:49,000 So I'll say, do this again, and we see great and green. 557 00:31:49,000 --> 00:31:51,530 The question is, what if the word doesn't start with g 558 00:31:51,530 --> 00:31:54,260 but has g inside of it? 559 00:31:54,260 --> 00:31:55,620 This will still work. 560 00:31:55,620 --> 00:31:56,520 So let me try this. 561 00:31:56,520 --> 00:31:59,300 I'll say, have a typo here intentionally. 562 00:31:59,300 --> 00:32:06,260 I'll change great to rgeat and say python text2.py. 563 00:32:06,260 --> 00:32:10,640 And I'm still printing out rgeat because g is in that typo of a word 564 00:32:10,640 --> 00:32:11,950 I just created. 565 00:32:11,950 --> 00:32:12,950 I hope that makes sense. 566 00:32:12,950 --> 00:32:15,570 567 00:32:15,570 --> 00:32:16,550 All right. 568 00:32:16,550 --> 00:32:20,540 And let's do a few more, text3.py.
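The membership test from text2.py might look roughly like the sketch below. This is a reconstruction from the description, and the list named matches is a hypothetical helper added only so the result can be checked:

```python
# Print only the words that contain the letter g anywhere inside them.
text = "In the great green room"

matches = []  # hypothetical helper list, not part of the original example
for word in text.split():
    if "g" in word:      # True when g appears at any position in the word
        print(word)
        matches.append(word)

# The same check works when g is not the first letter.
print("g" in "rgeat")
```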
569 00:32:20,540 --> 00:32:24,900 Here also is some new syntax. 570 00:32:24,900 --> 00:32:26,610 What do you think we'll see here? 571 00:32:26,610 --> 00:32:30,390 572 00:32:30,390 --> 00:32:31,890 It's OK if you don't know. 573 00:32:31,890 --> 00:32:32,775 We'll explain. 574 00:32:32,775 --> 00:32:35,960 575 00:32:35,960 --> 00:32:39,200 I'm seeing maybe we'll print out every two words. 576 00:32:39,200 --> 00:32:43,705 Maybe we'll print out just great green room. 577 00:32:43,705 --> 00:32:45,080 I think we're on the right track. 578 00:32:45,080 --> 00:32:45,920 I like these ideas. 579 00:32:45,920 --> 00:32:50,720 So let me try python of text3.py and see what we get. 580 00:32:50,720 --> 00:32:55,280 Looks like we get only great, green, and room. 581 00:32:55,280 --> 00:32:58,880 So notice here we have some familiar things. 582 00:32:58,880 --> 00:33:00,710 We have text.split. 583 00:33:00,710 --> 00:33:03,320 We have for word in text.split, print word. 584 00:33:03,320 --> 00:33:06,080 But the only addition here is the addition 585 00:33:06,080 --> 00:33:12,530 of the brackets with some syntax in the middle, like this 2 and this colon. 586 00:33:12,530 --> 00:33:16,010 So let me hopefully give you an idea of how these particular things work 587 00:33:16,010 --> 00:33:17,370 if you'd like to use them. 588 00:33:17,370 --> 00:33:21,350 Let me go ahead and say, code brackets.py, 589 00:33:21,350 --> 00:33:22,910 just so you can see how to use these. 590 00:33:22,910 --> 00:33:25,880 I'll make my text the very same. 591 00:33:25,880 --> 00:33:31,490 And I will now have words equals text.split. 592 00:33:31,490 --> 00:33:40,080 And let's see what happens if I print words bracket 2 colon. 593 00:33:40,080 --> 00:33:43,110 I'll say python brackets.py. 594 00:33:43,110 --> 00:33:49,110 And now I see a list but it only includes great, green, and room. 595 00:33:49,110 --> 00:33:49,960 What if I did this? 
596 00:33:49,960 --> 00:33:54,180 What if I did words bracket 1 colon? 597 00:33:54,180 --> 00:33:57,930 Well, now I see the great green room. 598 00:33:57,930 --> 00:34:06,540 What if I did this, words, maybe 0 colon? Python of brackets.py. 599 00:34:06,540 --> 00:34:10,690 Now I see, In the great green room, the entire list. 600 00:34:10,690 --> 00:34:15,135 So now let me ask you again, what do you think this syntax is doing? 601 00:34:15,135 --> 00:34:19,150 602 00:34:19,150 --> 00:34:22,670 Yeah, so it's changing where our list starts. 603 00:34:22,670 --> 00:34:26,800 And if you remember, we can still use bracket notation 604 00:34:26,800 --> 00:34:29,679 to access some particular element of our list. 605 00:34:29,679 --> 00:34:35,860 If I say words bracket 0, I'll get the first word, In, words bracket 1, 606 00:34:35,860 --> 00:34:38,800 I'll get that second word, the. 607 00:34:38,800 --> 00:34:44,080 But if I want to get not just that first or second word, but all the rest 608 00:34:44,080 --> 00:34:47,210 as well, I can include a colon after it. 609 00:34:47,210 --> 00:34:54,820 And now I get the very element I'm asking for and all the rest. 610 00:34:54,820 --> 00:34:57,400 I could further decide to subset my list. 611 00:34:57,400 --> 00:35:03,220 I could say maybe 1 to 2, like this, and I get just the. 612 00:35:03,220 --> 00:35:05,080 Let me try 1 to 3. 613 00:35:05,080 --> 00:35:07,000 Now I get the great. 614 00:35:07,000 --> 00:35:11,500 So it turns out that in Python, this index, if you're using this colon here, 615 00:35:11,500 --> 00:35:12,730 is inclusive. 616 00:35:12,730 --> 00:35:18,560 It's going to give you that particular indexed value. Followed by a colon, 617 00:35:18,560 --> 00:35:21,830 I could say, I want everything, with no index here. 618 00:35:21,830 --> 00:35:25,560 But I could also say, let's stop at some particular index. 619 00:35:25,560 --> 00:35:27,810 And don't give me back that particular index.
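The slices tried so far in brackets.py can be collected into one short sketch. The file's exact contents are an assumption here; only the slice expressions come from the discussion:

```python
# Slicing a list: the start index is included, the stop index is not.
text = "In the great green room"
words = text.split()

print(words[2:])   # from index 2 to the end
print(words[1:])   # from index 1 to the end
print(words[0:])   # from index 0 to the end, so the whole list
print(words[1:2])  # index 1 only, since index 2 is excluded
print(words[1:3])  # indexes 1 and 2, since index 3 is excluded
```

Each result is itself a list, even when it holds a single word.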
620 00:35:27,810 --> 00:35:32,390 So I'll say, give me back 1 and 2, not including 3. 621 00:35:32,390 --> 00:35:34,520 I'll get back, the great. 622 00:35:34,520 --> 00:35:37,910 I could say 1 through 4, not including 4. 623 00:35:37,910 --> 00:35:39,740 Now I get, the great green. 624 00:35:39,740 --> 00:35:44,210 So I'm able to manipulate my list and extract certain pieces of it 625 00:35:44,210 --> 00:35:46,650 as I would like to. 626 00:35:46,650 --> 00:35:49,200 So it can sometimes be handy for you as you're working 627 00:35:49,200 --> 00:35:53,670 in Python to manipulate your lists. 628 00:35:53,670 --> 00:35:55,990 The question, is it like an automatic loop? 629 00:35:55,990 --> 00:36:01,110 So there's nothing about this that is particular to loops in general. 630 00:36:01,110 --> 00:36:02,790 I could use this with or without a loop. 631 00:36:02,790 --> 00:36:06,030 But it does help you if you want to only loop 632 00:36:06,030 --> 00:36:11,120 through some elements of your list, and not, let's say, all of them. 633 00:36:11,120 --> 00:36:13,390 A good question. 634 00:36:13,390 --> 00:36:17,890 Yeah, and a question here is, it's like start and stop, so very similar, 635 00:36:17,890 --> 00:36:22,570 where the first number is the index to start at, inclusive, 636 00:36:22,570 --> 00:36:26,410 the next number is the index to end at, exclusive. 637 00:36:26,410 --> 00:36:32,180 I won't get whatever value is indicated by this number here. 638 00:36:32,180 --> 00:36:34,225 OK, other questions, too? 639 00:36:34,225 --> 00:36:39,660 640 00:36:39,660 --> 00:36:45,030 OK, seeing none for now, so let's learn some more exciting things about Python. 641 00:36:45,030 --> 00:36:50,100 One of my favorite new structures in Python is this idea of a dictionary. 642 00:36:50,100 --> 00:36:51,660 And dictionaries are so handy. 
643 00:36:51,660 --> 00:36:54,960 So we spent all this time in the last problem 644 00:36:54,960 --> 00:36:59,130 set making our very own dictionary of sorts in speller, 645 00:36:59,130 --> 00:37:03,720 where you were able to add words to a hash table and look them up to see, 646 00:37:03,720 --> 00:37:07,020 are they actually in this hash table, or are they not? 647 00:37:07,020 --> 00:37:10,440 So a dictionary allows you to have some very similar functionality, 648 00:37:10,440 --> 00:37:13,320 but it all comes so easily in Python. 649 00:37:13,320 --> 00:37:17,730 So to help us conceptualize what a dictionary actually is, 650 00:37:17,730 --> 00:37:19,980 I want to go through a bit of a diagram here. 651 00:37:19,980 --> 00:37:22,870 You could think of it very similar to a real dictionary. 652 00:37:22,870 --> 00:37:26,230 So here, let's say I have a blank piece of paper. 653 00:37:26,230 --> 00:37:31,450 And I want to associate some authors with the book that they have written. 654 00:37:31,450 --> 00:37:35,040 So let's assume that every book is written by one author, 655 00:37:35,040 --> 00:37:37,840 and every author writes one book. 656 00:37:37,840 --> 00:37:41,950 So here I could say, I have a dictionary called authors, 657 00:37:41,950 --> 00:37:46,540 and there are a few keys in this dictionary, like Goodnight Moon, 658 00:37:46,540 --> 00:37:49,160 like Corduroy, like Curious George. 659 00:37:49,160 --> 00:37:53,290 And if I look up those particular titles just below, 660 00:37:53,290 --> 00:37:56,050 I'll then see the author that wrote that book. 661 00:37:56,050 --> 00:37:59,080 So let's say I want to look up Goodnight Moon. 662 00:37:59,080 --> 00:38:01,870 Who wrote-- who is the author for Goodnight Moon? 663 00:38:01,870 --> 00:38:03,940 Well, it's Margaret Wise Brown. 664 00:38:03,940 --> 00:38:05,290 Same for Corduroy. 665 00:38:05,290 --> 00:38:06,340 Who wrote Corduroy? 666 00:38:06,340 --> 00:38:09,820 Well, that was, let's see, Don Freeman. 
667 00:38:09,820 --> 00:38:13,360 So this is the idea of associating, in this case, book titles 668 00:38:13,360 --> 00:38:15,400 with their authors. 669 00:38:15,400 --> 00:38:20,170 And more particularly, we say, this is the key in our dictionary. 670 00:38:20,170 --> 00:38:24,910 And the result that we get by using that key is the value. 671 00:38:24,910 --> 00:38:29,830 Now there are other ways to use dictionaries. 672 00:38:29,830 --> 00:38:33,740 And ideally, we'd probably want something a bit like this. 673 00:38:33,740 --> 00:38:37,017 Maybe I want to store information on books. 674 00:38:37,017 --> 00:38:38,350 Well, I could very well do that. 675 00:38:38,350 --> 00:38:42,700 I could say, let's make a dictionary called book and give it two keys, 676 00:38:42,700 --> 00:38:44,860 like title and author. 677 00:38:44,860 --> 00:38:47,590 And if I had many of these dictionaries, I 678 00:38:47,590 --> 00:38:51,020 could give different values for those keys. 679 00:38:51,020 --> 00:38:54,340 So I have a book here, the title of which is Goodnight Moon, 680 00:38:54,340 --> 00:38:56,710 and the author is Margaret Wise Brown. 681 00:38:56,710 --> 00:38:58,630 But you could imagine I have maybe perhaps 682 00:38:58,630 --> 00:39:01,090 many of these dictionaries with different titles 683 00:39:01,090 --> 00:39:03,410 and different authors in the end. 684 00:39:03,410 --> 00:39:05,950 So different ways to use dictionaries, one 685 00:39:05,950 --> 00:39:09,100 to keep track of just all authors and their book titles, 686 00:39:09,100 --> 00:39:11,590 or here, keeping track of individual books 687 00:39:11,590 --> 00:39:16,320 and the information on that single book as well. 688 00:39:16,320 --> 00:39:18,590 So let's dive in and see some syntax where 689 00:39:18,590 --> 00:39:20,520 we can actually create dictionaries. 690 00:39:20,520 --> 00:39:25,790 So here I have one example, book equals dict. 
691 00:39:25,790 --> 00:39:30,710 Now dict is the way to create some new blank dictionary. 692 00:39:30,710 --> 00:39:33,480 And to visualize what this is doing on the right-hand side, 693 00:39:33,480 --> 00:39:37,880 I get the following, basically a blank piece of paper, if you will, 694 00:39:37,880 --> 00:39:40,430 that's called book. 695 00:39:40,430 --> 00:39:44,570 Now let's say I want to add a title to this book. 696 00:39:44,570 --> 00:39:48,260 Well, I want to add a key called title, and I 697 00:39:48,260 --> 00:39:51,290 want to set the value equal to some book title. 698 00:39:51,290 --> 00:39:56,690 So I could do that like this, book, bracket, and then the key name, 699 00:39:56,690 --> 00:40:01,340 in this case, title, and then equals some particular book title, 700 00:40:01,340 --> 00:40:04,260 like Corduroy, which is another children's book here. 701 00:40:04,260 --> 00:40:09,230 So notice how my bracket syntax is back but I'm no longer using 702 00:40:09,230 --> 00:40:11,240 an index for this dictionary. 703 00:40:11,240 --> 00:40:15,440 I'm using a string which functions as my key. 704 00:40:15,440 --> 00:40:19,400 Now let's go on and I'll say, I want not just the title for this book, 705 00:40:19,400 --> 00:40:22,320 I also want to add in, let's say, the author. 706 00:40:22,320 --> 00:40:26,160 So I'll add a new key and set it equal to some value. 707 00:40:26,160 --> 00:40:29,960 So book, bracket, author now equals Don Freeman. 708 00:40:29,960 --> 00:40:33,890 So now I'm able to see that, in this particular book, the title is Corduroy 709 00:40:33,890 --> 00:40:37,400 and the author is Don Freeman. 710 00:40:37,400 --> 00:40:41,480 And maybe later on in my code I want to print out, 711 00:40:41,480 --> 00:40:47,660 let's say, the author, or the, let's see, print out the title of this book. 712 00:40:47,660 --> 00:40:48,750 I could do so like this. 713 00:40:48,750 --> 00:40:51,740 I could say, print, book, bracket, title. 
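The dictionary steps walked through above can be sketched as follows, using the same keys and values mentioned in the discussion:

```python
# Create an empty dictionary, then add keys and values one at a time.
book = dict()

book["title"] = "Corduroy"      # the string "title" is the key, "Corduroy" the value
book["author"] = "Don Freeman"  # a second key in the same dictionary

print(book["title"])            # look up the value stored under the key "title"
```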
714 00:40:51,740 --> 00:40:54,950 And without the assignment operator, I should just 715 00:40:54,950 --> 00:40:58,910 see printed the title of the book, which is Corduroy. 716 00:40:58,910 --> 00:41:02,810 So here we see not just how to create a dictionary 717 00:41:02,810 --> 00:41:05,660 and assign some keys and values, but also how 718 00:41:05,660 --> 00:41:09,840 to access the values at certain keys. 719 00:41:09,840 --> 00:41:14,940 Let me ask then, what questions do we have on this syntax here? 720 00:41:14,940 --> 00:41:17,770 721 00:41:17,770 --> 00:41:19,120 Book is just one book, right? 722 00:41:19,120 --> 00:41:19,870 So it is. 723 00:41:19,870 --> 00:41:21,250 So this is a single book. 724 00:41:21,250 --> 00:41:23,620 It wouldn't make sense for this book dictionary 725 00:41:23,620 --> 00:41:28,030 to have more than one title in it because it's only a single book. 726 00:41:28,030 --> 00:41:30,610 And in fact, a limitation of a dictionary 727 00:41:30,610 --> 00:41:35,870 is you can only have one particular value for a particular key. 728 00:41:35,870 --> 00:41:43,150 So I couldn't, let's say, have more than one key called title in here. 729 00:41:43,150 --> 00:41:45,340 Are dictionaries like structs in C? 730 00:41:45,340 --> 00:41:48,880 So it's an interesting comparison. 731 00:41:48,880 --> 00:41:52,120 I think that there's some similar functionality, where 732 00:41:52,120 --> 00:41:56,470 with a struct in C, you could give certain attributes a name. 733 00:41:56,470 --> 00:41:59,000 Like in candidate, we had name and votes. 734 00:41:59,000 --> 00:42:02,530 That's a similar idea to assigning keys and values, where the attribute 735 00:42:02,530 --> 00:42:05,710 name is the key, and the value is the value that you store there. 736 00:42:05,710 --> 00:42:08,867 They are functioning, I believe, a little bit differently 737 00:42:08,867 --> 00:42:09,700 underneath the hood.
738 00:42:09,700 --> 00:42:11,807 So a dictionary is good for just straight 739 00:42:11,807 --> 00:42:13,390 up storing keys and values as a whole. 740 00:42:13,390 --> 00:42:14,710 You can store a lot of them. 741 00:42:14,710 --> 00:42:19,390 A struct often represents some particular entity, like a candidate. 742 00:42:19,390 --> 00:42:22,000 Although, I guess this is also representing a book. 743 00:42:22,000 --> 00:42:25,270 Let's just say, they're very similar in use case, but different in actually 744 00:42:25,270 --> 00:42:29,260 how they're implemented underneath the hood. 745 00:42:29,260 --> 00:42:32,380 Could we have a list of values for a key like title? 746 00:42:32,380 --> 00:42:33,890 We absolutely could. 747 00:42:33,890 --> 00:42:37,930 So although you can only have one particular key 748 00:42:37,930 --> 00:42:42,650 and associate it with one particular value, that value could be a list. 749 00:42:42,650 --> 00:42:44,210 It could be itself a dictionary. 750 00:42:44,210 --> 00:42:46,850 It could be any object in Python you'd like to associate 751 00:42:46,850 --> 00:42:50,730 with that particular key. 752 00:42:50,730 --> 00:42:53,670 And a question here, if we have a list of books and authors, 753 00:42:53,670 --> 00:42:56,320 how would we actually get that to work? 754 00:42:56,320 --> 00:42:57,840 So it's a good question. 755 00:42:57,840 --> 00:43:00,070 And we're going to see that in just a moment here. 756 00:43:00,070 --> 00:43:02,460 I want to pose one scenario, though. 757 00:43:02,460 --> 00:43:08,110 Let's say I mess up and I do something a bit like this. 758 00:43:08,110 --> 00:43:16,350 I print not book title, but I confuse things and I say Corduroy as my key 759 00:43:16,350 --> 00:43:16,980 instead. 760 00:43:16,980 --> 00:43:20,578 What do you think might happen? 
761 00:43:20,578 --> 00:43:23,120 You might not know unless you've programmed in Python before, 762 00:43:23,120 --> 00:43:27,120 but the result here is I'll get what we call a key error. 763 00:43:27,120 --> 00:43:31,700 So the key error says, Corduroy is not a key in this dictionary. 764 00:43:31,700 --> 00:43:33,770 It's a value, but it's not a key. 765 00:43:33,770 --> 00:43:37,640 I can't look some value up by using this name, Corduroy. 766 00:43:37,640 --> 00:43:42,275 I can only do that with my keys, which in this case, are title and author. 767 00:43:42,275 --> 00:43:46,400 768 00:43:46,400 --> 00:43:47,780 And I actually see a good-- 769 00:43:47,780 --> 00:43:50,990 someone saved me here with comparing structs and dictionaries. 770 00:43:50,990 --> 00:43:53,570 With a struct, you have specified attributes. 771 00:43:53,570 --> 00:43:57,080 You can't really go ahead and add or remove the attributes later. 772 00:43:57,080 --> 00:43:58,880 With a dictionary, you can. 773 00:43:58,880 --> 00:44:00,750 You can add or remove keys and values. 774 00:44:00,750 --> 00:44:06,240 So that's another difference between dictionaries and structs in C. Good 775 00:44:06,240 --> 00:44:08,910 question. 776 00:44:08,910 --> 00:44:09,840 All right. 777 00:44:09,840 --> 00:44:15,390 So we've seen here how we could have a single book. 778 00:44:15,390 --> 00:44:18,810 If you wanted to actually define it all the way up front, 779 00:44:18,810 --> 00:44:23,820 you could use this syntax as well using curly braces, followed by, let's say, 780 00:44:23,820 --> 00:44:25,620 the key, colon, the value. 781 00:44:25,620 --> 00:44:27,520 We saw this in lecture. 782 00:44:27,520 --> 00:44:34,030 But we could also create not just a single book but a list of books, too. 783 00:44:34,030 --> 00:44:37,750 So here is what this visually would look like. 784 00:44:37,750 --> 00:44:42,060 Notice how I have some square brackets? 
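To make the key error from a moment ago concrete, here is a minimal sketch that also uses the curly-brace form. The try and except lines and the variable named missing are assumptions added only so the sketch keeps running after the failed lookup; they were not part of the example shown:

```python
# Define the dictionary up front with curly braces: key, colon, value.
book = {"title": "Corduroy", "author": "Don Freeman"}

missing = None  # hypothetical variable recording which key failed
try:
    print(book["Corduroy"])  # "Corduroy" is a value here, not a key
except KeyError as error:
    missing = error.args[0]  # the key that could not be found
    print("KeyError:", missing)

print(book["title"])         # looking up an actual key works as expected
```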
785 00:44:42,060 --> 00:44:49,740 And in this list, separated by commas are my individual dictionaries. 786 00:44:49,740 --> 00:44:52,530 These dictionaries have the same keys, which is OK 787 00:44:52,530 --> 00:44:54,300 because they're different dictionaries. 788 00:44:54,300 --> 00:44:58,230 But now I could see how I could develop a list of books. 789 00:44:58,230 --> 00:45:00,490 Here I have the list itself. 790 00:45:00,490 --> 00:45:03,610 And here I have the individual dictionaries. 791 00:45:03,610 --> 00:45:08,660 So I'm taking multiple dictionaries and putting them in the same list. 792 00:45:08,660 --> 00:45:13,307 And now it's kind of similar to having a list of books. 793 00:45:13,307 --> 00:45:16,390 Now let's see an example of this so we can make it a little more concrete. 794 00:45:16,390 --> 00:45:19,600 I have this program called books.py. 795 00:45:19,600 --> 00:45:22,060 And I'm going to complete it so that a user is 796 00:45:22,060 --> 00:45:25,090 able to add books to their bookshelf. 797 00:45:25,090 --> 00:45:31,600 So I'll go back to my codespace and I'll open up books.py. 798 00:45:31,600 --> 00:45:35,470 And we'll see here, I have some pieces of code already. 799 00:45:35,470 --> 00:45:39,190 I have a list called books. 800 00:45:39,190 --> 00:45:44,020 I know this is a list because it has brackets here, square brackets. 801 00:45:44,020 --> 00:45:50,500 I have a for loop here that will loop three times, we saw in lecture. 802 00:45:50,500 --> 00:45:52,630 And I want to complete this part where I'm 803 00:45:52,630 --> 00:45:57,130 able to add three books to my shelf, or really, my books list up top. 804 00:45:57,130 --> 00:46:02,330 And then, finally, I'm going to print out that list of books down below. 805 00:46:02,330 --> 00:46:07,750 So let's think about just adding a single book inside this for loop. 806 00:46:07,750 --> 00:46:08,930 What could I do?
807 00:46:08,930 --> 00:46:15,640 Well, we saw earlier, I could make a new dictionary by giving it some name, 808 00:46:15,640 --> 00:46:19,960 like book, and saying that is equal to dict. 809 00:46:19,960 --> 00:46:24,100 That gives me a blank dictionary called book, some blank piece of paper 810 00:46:24,100 --> 00:46:28,310 that I could use to associate keys and values. 811 00:46:28,310 --> 00:46:34,480 And now my question to you is, let's say I want to add a key called author. 812 00:46:34,480 --> 00:46:37,030 How could I add a key called author? 813 00:46:37,030 --> 00:46:38,545 What syntax could I use now? 814 00:46:38,545 --> 00:46:42,783 815 00:46:42,783 --> 00:46:44,200 I could use some syntax like this. 816 00:46:44,200 --> 00:46:48,040 I'd say book, and then bracket, the key name. 817 00:46:48,040 --> 00:46:49,920 So the key name is author. 818 00:46:49,920 --> 00:46:53,410 And I'll then set that equal to-- 819 00:46:53,410 --> 00:46:59,730 well, before we saw something hard coded, like maybe Margaret Wise Brown, 820 00:46:59,730 --> 00:47:01,720 but I want to get input from the user. 821 00:47:01,720 --> 00:47:03,130 So I could very well do this. 822 00:47:03,130 --> 00:47:06,330 I could say, it's the result of calling input. 823 00:47:06,330 --> 00:47:10,990 And I'll say Enter an author, like this. 824 00:47:10,990 --> 00:47:15,210 So now whatever the user types in will be associated with the author 825 00:47:15,210 --> 00:47:18,060 key inside this dictionary called book. 826 00:47:18,060 --> 00:47:18,910 Let's try this. 827 00:47:18,910 --> 00:47:22,140 I'll say, book, and I want to have a key called title. 828 00:47:22,140 --> 00:47:29,860 So I'll say book, bracket, title, in quotes, input, Enter a title. 829 00:47:29,860 --> 00:47:33,750 So now I'm doing the same thing twice for different keys. 
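The two assignments just described might look roughly like this. It is a sketch rather than the exact contents of books.py: the function name `make_book` and the `ask` parameter are assumptions, added so the prompts can be swapped out when experimenting.

```python
def make_book(ask=input):
    """Build one book dictionary from two prompts."""
    book = dict()  # a blank dictionary
    # Whatever the user types is stored under the given key.
    book["author"] = ask("Enter an author: ")
    book["title"] = ask("Enter a title: ")
    return book
```

Calling `make_book()` with no arguments prompts at the terminal, since `ask` defaults to the built-in `input`.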
830 00:47:33,750 --> 00:47:37,080 And now I can do that final step where I'm 831 00:47:37,080 --> 00:47:40,560 trying to add this dictionary to my list of books. 832 00:47:40,560 --> 00:47:43,900 Now we saw in lecture that to add some item to a list, 833 00:47:43,900 --> 00:47:46,760 I could simply say the name of that list, 834 00:47:46,760 --> 00:47:50,470 and then dot append to add some new item. 835 00:47:50,470 --> 00:47:56,630 And I'll say book, books.append, book, which is this new dictionary I created. 836 00:47:56,630 --> 00:48:00,490 So every loop will create this new blank dictionary, 837 00:48:00,490 --> 00:48:04,510 call it book, add in a new key and value for author and title, 838 00:48:04,510 --> 00:48:06,880 and then add that book to our list. 839 00:48:06,880 --> 00:48:08,290 Let's try that out here. 840 00:48:08,290 --> 00:48:10,780 I'll say python books.py. 841 00:48:10,780 --> 00:48:14,020 I'll say Margaret Wise Brown. 842 00:48:14,020 --> 00:48:18,580 And then I'll say, Goodnight Moon is one book. 843 00:48:18,580 --> 00:48:21,280 Notice how I'm prompted again for another book. 844 00:48:21,280 --> 00:48:25,720 I'll say Corduroy and Don Freeman. 845 00:48:25,720 --> 00:48:32,110 One more book, I'll say Curious George, and then H.A. Rey. 846 00:48:32,110 --> 00:48:36,130 Hit Enter, and then what happened? 847 00:48:36,130 --> 00:48:39,040 What can we do down below? 848 00:48:39,040 --> 00:48:41,780 I still need to print my list of books. 849 00:48:41,780 --> 00:48:45,250 So I'm curious, for those of you who are here, what kind of loop 850 00:48:45,250 --> 00:48:49,600 do you think would be good for looping over every individual dictionary 851 00:48:49,600 --> 00:48:51,280 and printing out what's inside? 852 00:48:51,280 --> 00:48:56,040 853 00:48:56,040 --> 00:48:57,960 Yeah, so I'm seeing maybe a for loop. 854 00:48:57,960 --> 00:49:02,820 I'll say for, and I could take advantage of Python's special for-loop syntax.
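Putting the append step together with the loop that runs three times, a sketch of the collecting part of the program could look like the following. The function name `collect_books` and its `count` and `ask` parameters are assumptions for illustration; the session's books.py does this at the top level of the file.

```python
def collect_books(count=3, ask=input):
    """Prompt for `count` books and return them as a list of dictionaries."""
    books = []  # square brackets: an empty list
    for _ in range(count):
        book = dict()  # a fresh blank dictionary on every pass
        book["author"] = ask("Enter an author: ")
        book["title"] = ask("Enter a title: ")
        books.append(book)  # add this dictionary to the end of the list
    return books
```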
855 00:49:02,820 --> 00:49:07,860 I could say something like, for book in books, colon, 856 00:49:07,860 --> 00:49:11,280 and then figure out what to do inside that loop. 857 00:49:11,280 --> 00:49:13,650 But this is pretty much why people like Python so much. 858 00:49:13,650 --> 00:49:18,810 I could say, write in English, for every book in my list of books, 859 00:49:18,810 --> 00:49:20,415 do something, right? 860 00:49:20,415 --> 00:49:21,540 I could call this anything. 861 00:49:21,540 --> 00:49:25,320 I could say, for novel in books, that's also valid 862 00:49:25,320 --> 00:49:27,690 as long as I use novel down below here. 863 00:49:27,690 --> 00:49:30,420 But here I'll stick to book. 864 00:49:30,420 --> 00:49:32,340 That's actually kind of a convention. 865 00:49:32,340 --> 00:49:37,290 If you have a list that is a plural, like books, 866 00:49:37,290 --> 00:49:40,350 and you have a for loop like this, it's often 867 00:49:40,350 --> 00:49:44,220 convention to make this the singular version of that noun, 868 00:49:44,220 --> 00:49:46,800 and this the plural version of that noun. 869 00:49:46,800 --> 00:49:48,190 So let's see what we can do. 870 00:49:48,190 --> 00:49:50,850 I could print out book, let's say. 871 00:49:50,850 --> 00:49:57,640 So I'll say python books.py, and let me hit Enter. 872 00:49:57,640 --> 00:50:02,230 I'll say Goodnight Moon. 873 00:50:02,230 --> 00:50:03,640 Oh, actually, that's wrong. 874 00:50:03,640 --> 00:50:05,470 That is not the author of this book. 875 00:50:05,470 --> 00:50:06,830 That is the title of this book. 876 00:50:06,830 --> 00:50:09,970 So if you want to exit your Python program, you can Control-C, 877 00:50:09,970 --> 00:50:12,310 and I will then do Python books again. 878 00:50:12,310 --> 00:50:18,010 And I will say Margaret Wise Brown, and then type in Goodnight Moon. 879 00:50:18,010 --> 00:50:23,470 I'll then do Don Freeman and Corduroy. 880 00:50:23,470 --> 00:50:28,720 And then I'll do H.A.
Rey, and then I'll do Curious George, 881 00:50:28,720 --> 00:50:30,980 so three new books here. 882 00:50:30,980 --> 00:50:34,750 I'll hit Enter, and now I'll see those three 883 00:50:34,750 --> 00:50:37,070 dictionaries printed to the screen. 884 00:50:37,070 --> 00:50:39,850 So notice here we have them denoted in curly braces, 885 00:50:39,850 --> 00:50:44,050 the key author, and the key title, all associated, hopefully, 886 00:50:44,050 --> 00:50:46,830 as I would like them to be. 887 00:50:46,830 --> 00:50:54,385 So questions then on this short program we wrote to add books to our list? 888 00:50:54,385 --> 00:51:00,000 889 00:51:00,000 --> 00:51:01,140 Any questions? 890 00:51:01,140 --> 00:51:06,390 891 00:51:06,390 --> 00:51:09,370 What if we want to print the keys? 892 00:51:09,370 --> 00:51:10,245 It's a good question. 893 00:51:10,245 --> 00:51:13,590 894 00:51:13,590 --> 00:51:14,890 Maybe this is helpful here. 895 00:51:14,890 --> 00:51:17,460 So one thing you'll learn as you do more Python 896 00:51:17,460 --> 00:51:21,190 is that dictionaries themselves have their own methods. 897 00:51:21,190 --> 00:51:24,760 And one of these methods is the dot keys method. 898 00:51:24,760 --> 00:51:28,170 So I could say, print book.keys, and that 899 00:51:28,170 --> 00:51:31,710 should return to me all the keys that are in this dictionary 900 00:51:31,710 --> 00:51:33,520 without their values. 901 00:51:33,520 --> 00:51:34,450 So let me try this. 902 00:51:34,450 --> 00:51:39,060 I'll say, python of books.py, and I'll go through this very quickly. 903 00:51:39,060 --> 00:51:42,970 I'll just say 1, 2, 3, 4, 5, 6. 904 00:51:42,970 --> 00:51:48,060 And notice here how when I said print book.keys, 905 00:51:48,060 --> 00:51:51,870 I'm now actually getting the keys associated 906 00:51:51,870 --> 00:51:54,000 with that particular dictionary. 907 00:51:54,000 --> 00:51:59,505 And notice how all three dictionaries have the same keys in this case.
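The printing loop and the `.keys()` call can be tried with a hard-coded list, so no typing at a prompt is needed. This is a small sketch; the two books are taken from the session's example and `key_lists` is a name made up here to collect the results.

```python
books = [
    {"author": "Margaret Wise Brown", "title": "Goodnight Moon"},
    {"author": "Don Freeman", "title": "Corduroy"},
]

for book in books:
    print(book)         # the whole dictionary, shown in curly braces
    print(book.keys())  # only the keys, without their values

# Converting each keys view to a list makes the result easy to compare.
key_lists = [list(book.keys()) for book in books]
```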
908 00:51:59,505 --> 00:52:03,340 909 00:52:03,340 --> 00:52:06,810 Let's see, other questions, too? 910 00:52:06,810 --> 00:52:10,680 If there is a book with the same name but different author, 911 00:52:10,680 --> 00:52:13,500 could you have it search for the right one? 912 00:52:13,500 --> 00:52:14,520 You probably could. 913 00:52:14,520 --> 00:52:17,400 So there is a way that's a little more advanced here. 914 00:52:17,400 --> 00:52:23,940 I could say something like, for book in books, if the author equals, 915 00:52:23,940 --> 00:52:30,640 let's say, Margaret Wise Brown, then print the book, like this. 916 00:52:30,640 --> 00:52:34,090 There's all kinds of different logic you could use here in Python. 917 00:52:34,090 --> 00:52:39,907 You could, in that way, differentiate between different, 918 00:52:39,907 --> 00:52:41,740 let's say a book has the same title but it's 919 00:52:41,740 --> 00:52:43,420 written by Margaret Wise Brown versus somebody else. 920 00:52:43,420 --> 00:52:45,170 You could differentiate books like this. 921 00:52:45,170 --> 00:52:47,290 So I hope that gives you some idea of what 922 00:52:47,290 --> 00:52:51,190 you could do to help filter your books. 923 00:52:51,190 --> 00:52:55,210 Let's see, the question, is it possible to remove the curly braces 924 00:52:55,210 --> 00:52:56,870 in the answer, which is a good one. 925 00:52:56,870 --> 00:53:00,670 If I scroll up a little bit, notice how we saw here curly 926 00:53:00,670 --> 00:53:02,950 braces when we print out the dictionary. 927 00:53:02,950 --> 00:53:06,880 That's just Python's default printing for dictionaries. 928 00:53:06,880 --> 00:53:09,550 But if I wanted to get fancy and make this a little prettier, 929 00:53:09,550 --> 00:53:10,880 I certainly could. 930 00:53:10,880 --> 00:53:16,780 I could, for book in books, print not just the dictionary itself, 931 00:53:16,780 --> 00:53:19,545 but let's say have a full sentence I print. 
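The filtering idea mentioned for same-title books could be sketched as below. The second entry and its author "Someone Else" are hypothetical, invented only so that two books share a title.

```python
books = [
    {"author": "Margaret Wise Brown", "title": "Goodnight Moon"},
    {"author": "Someone Else", "title": "Goodnight Moon"},  # hypothetical duplicate title
]

matches = []
for book in books:
    # Compare the value stored under the "author" key.
    if book["author"] == "Margaret Wise Brown":
        print(book)
        matches.append(book)
```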
932 00:53:19,545 --> 00:53:21,170 Let's say I wanted something like this. 933 00:53:21,170 --> 00:53:26,560 I could say, Author wrote book. 934 00:53:26,560 --> 00:53:28,310 That's the structure of my sentence here, 935 00:53:28,310 --> 00:53:32,780 Author wrote book, where author is the author, the actual author of that book, 936 00:53:32,780 --> 00:53:35,720 and book is the title of that book. 937 00:53:35,720 --> 00:53:41,390 Well, we saw in lecture, Python allows us to interpolate certain values 938 00:53:41,390 --> 00:53:43,050 using f-strings. 939 00:53:43,050 --> 00:53:45,560 So here I'll mark it as an f-string. 940 00:53:45,560 --> 00:53:47,060 We're putting f at the beginning. 941 00:53:47,060 --> 00:53:52,850 And I could use curly braces to substitute in some value of a variable. 942 00:53:52,850 --> 00:53:57,170 I'll say, book, bracket, author, which will give me, 943 00:53:57,170 --> 00:54:01,010 in this case, book, bracket, author, the author associated 944 00:54:01,010 --> 00:54:04,430 with a particular book and interpolate that in my sentence. 945 00:54:04,430 --> 00:54:05,490 Now I'll do this. 946 00:54:05,490 --> 00:54:10,350 I'll say, book, bracket, title, like this. 947 00:54:10,350 --> 00:54:14,480 And now I should be able to have three sentences that say, 948 00:54:14,480 --> 00:54:19,520 Margaret Wise Brown wrote Goodnight Moon, or Don Freeman wrote Corduroy, 949 00:54:19,520 --> 00:54:22,040 or H.A. Rey wrote Curious George. 950 00:54:22,040 --> 00:54:24,140 If you're curious, I'll invite you to try this out 951 00:54:24,140 --> 00:54:25,932 on your own computer on your own code space 952 00:54:25,932 --> 00:54:28,160 to see how that works well for you. 953 00:54:28,160 --> 00:54:30,680 954 00:54:30,680 --> 00:54:31,180 All right. 955 00:54:31,180 --> 00:54:35,280 956 00:54:35,280 --> 00:54:36,390 Other questions, too?
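For anyone trying the f-string version, here is one possible sketch with a hard-coded list of two of the session's books. Single quotes are used for the keys inside the double-quoted f-string so the quotes do not clash; the `sentences` list is an addition here, kept only to make the result easy to inspect.

```python
books = [
    {"author": "Margaret Wise Brown", "title": "Goodnight Moon"},
    {"author": "Don Freeman", "title": "Corduroy"},
]

sentences = []
for book in books:
    # Curly braces inside an f-string are replaced by the value of the expression.
    sentence = f"{book['author']} wrote {book['title']}"
    print(sentence)
    sentences.append(sentence)
```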
957 00:54:36,390 --> 00:54:40,410 958 00:54:40,410 --> 00:54:43,455 A question, how could we print it like key 1 equals value 1? 959 00:54:43,455 --> 00:54:46,790 960 00:54:46,790 --> 00:54:56,680 You could do something like, author wrote book, 961 00:54:56,680 --> 00:55:05,080 or author equals book author, title equals book title. 962 00:55:05,080 --> 00:55:07,900 Either way you have to access the value associated 963 00:55:07,900 --> 00:55:10,240 with that particular key using an f-string, 964 00:55:10,240 --> 00:55:13,040 and then substitute it in to your string at the end. 965 00:55:13,040 --> 00:55:15,363 So I hope that helps. 966 00:55:15,363 --> 00:55:17,780 There are other ways to do this besides f-strings, though. 967 00:55:17,780 --> 00:55:19,430 You could use pluses and so on. 968 00:55:19,430 --> 00:55:25,020 We'll focus here on f-strings given time. 969 00:55:25,020 --> 00:55:25,940 All right. 970 00:55:25,940 --> 00:55:29,090 So just a few more things to wrap us up, one of which 971 00:55:29,090 --> 00:55:32,060 is this idea of working with libraries and modules. 972 00:55:32,060 --> 00:55:36,890 So we saw here how to build up our very own bookshelf of books 973 00:55:36,890 --> 00:55:39,920 by having the user type in their books manually. 974 00:55:39,920 --> 00:55:42,650 But odds are, nobody wants to sit around typing 975 00:55:42,650 --> 00:55:44,630 in manually all the books they own. 976 00:55:44,630 --> 00:55:47,300 It's better if we could use some kind of file 977 00:55:47,300 --> 00:55:51,030 to actually read in all possible books. 978 00:55:51,030 --> 00:55:55,190 So I want to introduce you to this idea of libraries and modules, 979 00:55:55,190 --> 00:56:00,630 and in particular, this idea of being able to load in some list of books. 980 00:56:00,630 --> 00:56:04,160 So let's say I am very conscientious. 
981 00:56:04,160 --> 00:56:07,500 I want to keep track of all the books that I own and I have a spreadsheet, 982 00:56:07,500 --> 00:56:12,530 a bit like this, where I have a title column and an author column. 983 00:56:12,530 --> 00:56:15,620 And each of these rows is some particular book 984 00:56:15,620 --> 00:56:18,720 with the title and author associated between them. 985 00:56:18,720 --> 00:56:21,620 So in Excel, it'll look a bit like this. 986 00:56:21,620 --> 00:56:26,640 But actually, I could export this file into what we call a CSV, 987 00:56:26,640 --> 00:56:29,250 or comma-separated values file. 988 00:56:29,250 --> 00:56:32,730 This is a very common file format because it's so portable, 989 00:56:32,730 --> 00:56:35,890 because it's so useful, and it's actually pretty simple. 990 00:56:35,890 --> 00:56:40,530 So here, notice how the first row is title, comma, author. 991 00:56:40,530 --> 00:56:43,350 These are the names of my columns. 992 00:56:43,350 --> 00:56:48,540 Then, for every new line that I have in this file, books.csv, 993 00:56:48,540 --> 00:56:53,670 I have a title, comma, the author, so Goodnight Moon, comma, 994 00:56:53,670 --> 00:56:58,870 Margaret Wise Brown, Corduroy, comma, Don Freeman, and so on, and so forth. 995 00:56:58,870 --> 00:57:04,710 So basically, I have the same kind of idea here, but now in a single file, 996 00:57:04,710 --> 00:57:08,640 where every element is separated by a comma 997 00:57:08,640 --> 00:57:11,550 to associate the titles and the authors. 998 00:57:11,550 --> 00:57:15,730 So Python actually works very well with CSV files. 999 00:57:15,730 --> 00:57:18,660 And they have a very own library or a module 1000 00:57:18,660 --> 00:57:22,860 called CSV that gives you access to functions, 1001 00:57:22,860 --> 00:57:26,830 methods you can use to read a CSV file. 
1002 00:57:26,830 --> 00:57:30,840 So let's actually try reading in some CSV file 1003 00:57:30,840 --> 00:57:34,720 and visualizing what would happen along the way. 1004 00:57:34,720 --> 00:57:39,690 So here I have, at the top of my Python program, import csv. 1005 00:57:39,690 --> 00:57:46,470 This tells Python I want to load in this library, this module, called csv. 1006 00:57:46,470 --> 00:57:48,630 Import csv. 1007 00:57:48,630 --> 00:57:51,750 Now it's kind of similar to me telling my program, 1008 00:57:51,750 --> 00:57:55,410 give me that big box of stuff called csv. 1009 00:57:55,410 --> 00:57:59,650 And it has many features inside of it, many functionalities, et cetera, 1010 00:57:59,650 --> 00:58:03,660 but I just want that entire box of things that you call csv. 1011 00:58:03,660 --> 00:58:07,830 And maybe inside that box, there are some particular books, 1012 00:58:07,830 --> 00:58:09,930 there are some particular values or functions 1013 00:58:09,930 --> 00:58:13,890 I could use, like DictReader, DictWriter, reader, and writer. 1014 00:58:13,890 --> 00:58:17,550 I would only know these things if I read the Python Documentation. 1015 00:58:17,550 --> 00:58:21,040 But I could say, Python, give me that entire box, 1016 00:58:21,040 --> 00:58:23,490 and give me access to the functions that are inside 1017 00:58:23,490 --> 00:58:26,830 of it, among them these here. 1018 00:58:26,830 --> 00:58:29,970 So if I wanted to use, let's say, DictReader, 1019 00:58:29,970 --> 00:58:32,490 which is a function that allows me to read 1020 00:58:32,490 --> 00:58:39,900 a CSV as a dictionary, every row as a dictionary, I could say csv.DictReader. 1021 00:58:39,900 --> 00:58:43,350 So I'm saying, csv, the module name, and then 1022 00:58:43,350 --> 00:58:47,040 dot the function, method, object, whatever 1023 00:58:47,040 --> 00:58:51,720 it is I want from that particular module or that particular library. 
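A small sketch of the module-dot-name pattern described above. To keep it runnable without a file on disk, it assumes an in-memory stand-in (`io.StringIO`) holding two rows from the session's books example; that stand-in is a convenience for this sketch, not something used in the session.

```python
import csv
import io

# In-memory stand-in for a CSV file (an assumption of this sketch).
sample = io.StringIO(
    "title,author\n"
    "Goodnight Moon,Margaret Wise Brown\n"
    "Corduroy,Don Freeman\n"
)

# Module name, then a dot, then the thing wanted from inside the module.
reader = csv.DictReader(sample)
rows = list(reader)
print(rows[0]["title"])
```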
1024 00:58:51,720 --> 00:58:54,480 So that gives me access to DictReader. 1025 00:58:54,480 --> 00:58:56,760 Let's say I want reader as well. 1026 00:58:56,760 --> 00:58:59,160 I could use csv.reader. 1027 00:58:59,160 --> 00:59:02,580 So here, again, is some example of Python's dot syntax, 1028 00:59:02,580 --> 00:59:07,920 where I'm now using the library name, and then dot the function, or object, 1029 00:59:07,920 --> 00:59:14,870 or whatever it is I want that's inside that particular library or module. 1030 00:59:14,870 --> 00:59:16,860 I could also do things a bit like this. 1031 00:59:16,860 --> 00:59:22,820 I could say, import csv, but then only give me some particular function 1032 00:59:22,820 --> 00:59:23,750 or object. 1033 00:59:23,750 --> 00:59:29,540 Import csv, or let's say, from csv import DictReader. 1034 00:59:29,540 --> 00:59:31,940 This is another way of importing modules, 1035 00:59:31,940 --> 00:59:36,750 or functions, et cetera, from csv import some particular element of it. 1036 00:59:36,750 --> 00:59:39,650 So here, I'm only getting DictReader. 1037 00:59:39,650 --> 00:59:45,230 And I could then use DictReader without the csv dot in front of it, 1038 00:59:45,230 --> 00:59:50,270 because Python knows, well, DictReader comes from the csv library. 1039 00:59:50,270 --> 00:59:57,680 So two ways here, often folks will tend to import an entire library if they're 1040 00:59:57,680 --> 01:00:00,170 using many pieces of that library. 1041 01:00:00,170 --> 01:00:03,770 If they're only using one or two, though, they might be more particular 1042 01:00:03,770 --> 01:00:07,070 and say, from csv import that particular function 1043 01:00:07,070 --> 01:00:11,510 or object they want from that library, if that makes sense. 1044 01:00:11,510 --> 01:00:16,950 So let's get into how we could use this to read and write from files. 
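The two import styles can be compared side by side. This sketch again assumes an in-memory string in place of a real file, and the variable names are made up for illustration.

```python
import csv                  # whole module: names are reached as csv.reader, csv.DictReader
from csv import DictReader  # one name only: usable without the csv. prefix
import io

text = "title,author\nCorduroy,Don Freeman\n"

rows_with_prefix = list(csv.DictReader(io.StringIO(text)))
rows_without_prefix = list(DictReader(io.StringIO(text)))

# csv.reader gives each row as a plain list instead of a dictionary.
plain_rows = list(csv.reader(io.StringIO(text)))
```

Both spellings refer to the same `DictReader`; the choice mostly affects how the rest of the file reads.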
1045 01:00:16,950 --> 01:00:21,890 So here, generally the syntax we'll use to work with reading and writing 1046 01:00:21,890 --> 01:00:30,770 files in Python, with open FILENAME as file, with open FILENAME as file. 1047 01:00:30,770 --> 01:00:37,820 Now this is allowing us to open up some file using this function called open, 1048 01:00:37,820 --> 01:00:40,800 and open it up with the file name. 1049 01:00:40,800 --> 01:00:42,920 So let's say I want to open up books.csv. 1050 01:00:42,920 --> 01:00:48,600 I could say, with open books.csv as file, colon. 1051 01:00:48,600 --> 01:00:52,850 And now, as long as I'm indented in this block of code here, 1052 01:00:52,850 --> 01:00:59,250 I'm able to access that file through the name I gave it, file, in this case. 1053 01:00:59,250 --> 01:01:03,800 So let's say I want to read in all the text from that file. 1054 01:01:03,800 --> 01:01:05,490 I could do something a bit like this. 1055 01:01:05,490 --> 01:01:08,990 I could say, create a new variable called text, 1056 01:01:08,990 --> 01:01:14,430 and set it equal to the result of calling file.read. 1057 01:01:14,430 --> 01:01:19,980 So when I use open, open returns to me some file object. 1058 01:01:19,980 --> 01:01:24,640 And one of the methods of that file object is read. 1059 01:01:24,640 --> 01:01:28,140 So if I called my file, just plain old file here, 1060 01:01:28,140 --> 01:01:32,140 I could say file.read to simply take all the data inside of it 1061 01:01:32,140 --> 01:01:37,170 and store it inside a Python variable, this one called text. 1062 01:01:37,170 --> 01:01:41,130 That works really well for plain text files, like dot txt files, 1063 01:01:41,130 --> 01:01:41,710 for instance. 1064 01:01:41,710 --> 01:01:44,040 I can just get and grab all the contents and put it 1065 01:01:44,040 --> 01:01:47,490 inside some particular text variable. 1066 01:01:47,490 --> 01:01:54,420 For a CSV, though, I might want to use Python's CSV module, CSV library. 
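A sketch of `with open(...) as file` followed by `file.read()`. So that it runs anywhere, it first writes a small text file into a temporary folder; the file name `notes.txt` and the temporary location are assumptions of this sketch.

```python
import os
import tempfile

# Create a small plain-text file to read back (hypothetical name and location).
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as file:
    file.write("In the great green room\n")

# Inside the indented block, the open file is available under the name `file`.
with open(path) as file:
    text = file.read()  # the entire contents as one string

print(text)
```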
1067 01:01:54,420 --> 01:01:59,640 So for that, I could say, maybe I need to make a particular kind of reader 1068 01:01:59,640 --> 01:02:02,160 for this file. 1069 01:02:02,160 --> 01:02:06,930 I could say, file_reader equals csv.DictReader 1070 01:02:06,930 --> 01:02:09,177 given the particular file. 1071 01:02:09,177 --> 01:02:11,260 And we'll see how this works in just a little bit, 1072 01:02:11,260 --> 01:02:16,530 but DictReader basically lets me say, I want to read every row of this CSV file 1073 01:02:16,530 --> 01:02:18,270 as a dictionary. 1074 01:02:18,270 --> 01:02:22,470 And by specifying this variable that I called file_reader, 1075 01:02:22,470 --> 01:02:25,920 it will serve as a helper for me to read this file. 1076 01:02:25,920 --> 01:02:31,320 And it will return to me every row of that CSV as a dictionary that I 1077 01:02:31,320 --> 01:02:34,560 could actually iterate as a dictionary. 1078 01:02:34,560 --> 01:02:40,060 And I could iterate over all those dictionaries in my particular file. 1079 01:02:40,060 --> 01:02:41,850 So let's try this step here. 1080 01:02:41,850 --> 01:02:47,580 For row in file_reader, this is me iterating over my file_reader. 1081 01:02:47,580 --> 01:02:51,270 And every time I do, it will return to me some dictionary 1082 01:02:51,270 --> 01:02:53,310 I could use in my program. 1083 01:02:53,310 --> 01:02:56,082 And I could do something with it in the end. 1084 01:02:56,082 --> 01:02:59,040 So I think it's worth being a little more concrete here and seeing this 1085 01:02:59,040 --> 01:03:00,370 actually in action. 1086 01:03:00,370 --> 01:03:01,540 So I'll go to my code space. 1087 01:03:01,540 --> 01:03:06,210 And I'll show you what we could do to import a file called books.csv 1088 01:03:06,210 --> 01:03:10,830 using a Python program called reads.py so you can build up 1089 01:03:10,830 --> 01:03:13,720 a list of children's books. 1090 01:03:13,720 --> 01:03:17,860 So I'll go to my code space here and I'll do this. 
1091 01:03:17,860 --> 01:03:20,910 I'll say, code reads.py. 1092 01:03:20,910 --> 01:03:23,490 And notice what I already have here. 1093 01:03:23,490 --> 01:03:27,060 I have imported the csv library. 1094 01:03:27,060 --> 01:03:30,180 I have a blank list of books. 1095 01:03:30,180 --> 01:03:36,780 And my first step is to add book to the shelf by reading from books.csv. 1096 01:03:36,780 --> 01:03:41,340 So to open up this file, it's generally good practice in Python 1097 01:03:41,340 --> 01:03:43,900 to use the with syntax. 1098 01:03:43,900 --> 01:03:49,320 So I'll say, with open, and I'll give it some file name, like books.csv. 1099 01:03:49,320 --> 01:03:54,710 I want to open the file that's already stored on my computer called books.csv. 1100 01:03:54,710 --> 01:04:01,080 I want to call that file in my program simply file, like this. 1101 01:04:01,080 --> 01:04:08,990 Now if I say text equals file.read, and then, finally, maybe print text, 1102 01:04:08,990 --> 01:04:10,880 let's see what we get. 1103 01:04:10,880 --> 01:04:15,890 I'll say python of reads.py, and now I see, printed to 1104 01:04:15,890 --> 01:04:19,670 my screen, the contents of books.csv. 1105 01:04:19,670 --> 01:04:24,110 To prove it to you, I could say code books.csv. 1106 01:04:24,110 --> 01:04:28,040 And here is the contents of books.csv. 1107 01:04:28,040 --> 01:04:31,160 Notice how I just saw those printed to my terminal 1108 01:04:31,160 --> 01:04:37,230 all by using file.read and this with open syntax. 1109 01:04:37,230 --> 01:04:41,720 So to answer your question here, dot read does not read only one line, 1110 01:04:41,720 --> 01:04:47,440 and in fact, reads all the lines in a particular file. 1111 01:04:47,440 --> 01:04:49,860 Now this isn't quite helpful to me because I don't want 1112 01:04:49,860 --> 01:04:52,020 to read just everything all at once. 
1113 01:04:52,020 --> 01:04:56,640 If I have this, let's see, if I have just all this text all 1114 01:04:56,640 --> 01:04:59,730 in the same variable, not super helpful for me. 1115 01:04:59,730 --> 01:05:03,540 What I would love is if we could do the same thing as earlier, where I instead 1116 01:05:03,540 --> 01:05:06,280 have a list of dictionaries. 1117 01:05:06,280 --> 01:05:13,900 So for that, I could make use of the CSV library's DictReader function. 1118 01:05:13,900 --> 01:05:14,950 So I could do this. 1119 01:05:14,950 --> 01:05:20,970 I could say, let's create a reader for my CSV, a reader that 1120 01:05:20,970 --> 01:05:25,530 is csv.DictReader, given my file. 1121 01:05:25,530 --> 01:05:30,000 Now this is simply a helper the makers of the Python CSV module, 1122 01:05:30,000 --> 01:05:34,350 they decided that they would create this helper called DictReader 1123 01:05:34,350 --> 01:05:39,270 that, when given a file, allows you to iterate over it and get 1124 01:05:39,270 --> 01:05:42,790 a dictionary for every row in the CSV. 1125 01:05:42,790 --> 01:05:43,660 So let's try this. 1126 01:05:43,660 --> 01:05:47,890 I'll say, iterate over this reader for row in reader, 1127 01:05:47,890 --> 01:05:49,330 and let's see what's happening. 1128 01:05:49,330 --> 01:05:51,250 I'll just print row. 1129 01:05:51,250 --> 01:05:56,830 I'll go back to my terminal and I will then say, python of reads.py. 1130 01:05:56,830 --> 01:05:59,710 And now I see something a little better formatted. 1131 01:05:59,710 --> 01:06:05,140 I actually see dictionaries, where they have a title, and an author key, 1132 01:06:05,140 --> 01:06:07,340 and values associated with them. 1133 01:06:07,340 --> 01:06:12,010 So Python basically converted my books.csv file 1134 01:06:12,010 --> 01:06:18,310 into a list of dictionaries with the given keys 1135 01:06:18,310 --> 01:06:21,820 and values I gave it in books.csv. 
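The DictReader loop can be sketched end to end as follows. The CSV contents are assumed from the two rows read aloud earlier, and the file is written to a temporary folder first so the example is self-contained; `titles_seen` is a made-up name used only to record what the loop visited.

```python
import csv
import os
import tempfile

# Write an assumed books.csv into a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
with open(path, "w", newline="") as file:
    file.write("title,author\nGoodnight Moon,Margaret Wise Brown\nCorduroy,Don Freeman\n")

titles_seen = []
with open(path, newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)  # each row arrives as a dictionary keyed by the header names
        titles_seen.append(row["title"])
```

Passing `newline=""` to `open` follows the recommendation in the csv module's documentation for files handed to a reader or writer.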
1136 01:06:21,820 --> 01:06:28,060 Notice that the key title and key author are each in the dictionaries 1137 01:06:28,060 --> 01:06:31,600 I have available to me here. 1138 01:06:31,600 --> 01:06:36,960 So now what do I want to do as I loop over this reader which gives me 1139 01:06:36,960 --> 01:06:39,690 every row as a single dictionary? 1140 01:06:39,690 --> 01:06:42,180 Well, I could probably just add it to my list of books. 1141 01:06:42,180 --> 01:06:50,100 I could say, books.append, and I want to append a given row I read from my CSV. 1142 01:06:50,100 --> 01:06:57,110 So if I do now, for book in books, print book, let's see what happens. 1143 01:06:57,110 --> 01:07:00,410 I'll say python of reads.py, and I should 1144 01:07:00,410 --> 01:07:05,150 see the same thing, which I could tidy up, as we saw in the last one to say, 1145 01:07:05,150 --> 01:07:08,420 for example, Margaret Wise Brown wrote Goodnight Moon, 1146 01:07:08,420 --> 01:07:12,920 Don Freeman wrote Corduroy, et cetera. 1147 01:07:12,920 --> 01:07:16,840 So let's pause here and let me ask, what questions do we have? 1148 01:07:16,840 --> 01:07:18,580 I see a few already. 1149 01:07:18,580 --> 01:07:21,370 One is, what happens if the file has no header? 1150 01:07:21,370 --> 01:07:25,060 So in books.csv, I had title and author. 1151 01:07:25,060 --> 01:07:29,050 What happens if I actually remove this from books.csv? 1152 01:07:29,050 --> 01:07:30,850 We could try it and see what happens. 1153 01:07:30,850 --> 01:07:39,700 I'll say python of reads.py, and something odd has happened here. 1154 01:07:39,700 --> 01:07:44,710 I seem to be saying my key is Goodnight Moon. 1155 01:07:44,710 --> 01:07:47,530 Another key is Margaret Wise Brown. 1156 01:07:47,530 --> 01:07:49,480 Why do you think that happened? 
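Both steps, appending each row to the list and removing the header line, can be reproduced in a few lines. This is a sketch under assumptions: `load_books` is a hypothetical helper, and the CSV text is passed in as a string rather than read from disk.

```python
import csv
import io

def load_books(csv_text):
    """Read CSV text into a list of dictionaries, one per row."""
    books = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        books.append(row)
    return books

with_header = load_books(
    "title,author\nGoodnight Moon,Margaret Wise Brown\nCorduroy,Don Freeman\n"
)
without_header = load_books(
    "Goodnight Moon,Margaret Wise Brown\nCorduroy,Don Freeman\n"
)

print(with_header)
print(without_header)  # the first line has been used up as the key names
```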
1157 01:07:49,480 --> 01:07:53,440 1158 01:07:53,440 --> 01:07:58,195 It seems like, to me, the first line, whatever it is, 1159 01:07:58,195 --> 01:08:01,400 is interpreted as the keys I want in my dictionaries, 1160 01:08:01,400 --> 01:08:03,730 so long as I'm using DictReader. 1161 01:08:03,730 --> 01:08:09,220 So I better be sure to include the key names that I want in my CSV, 1162 01:08:09,220 --> 01:08:12,655 otherwise they'll be interpreted as key values even when I don't want them to. 1163 01:08:12,655 --> 01:08:15,590 1164 01:08:15,590 --> 01:08:17,960 Let's see, other questions here? 1165 01:08:17,960 --> 01:08:20,120 What data type is reader? 1166 01:08:20,120 --> 01:08:25,340 Reader, if it's helpful, is a kind of iterable. 1167 01:08:25,340 --> 01:08:27,680 Remember earlier we said an iterable is anything 1168 01:08:27,680 --> 01:08:29,880 you can iterate over, like in a loop? 1169 01:08:29,880 --> 01:08:36,439 So it is an iterator or iterable that returns to us every loop a given 1170 01:08:36,439 --> 01:08:39,724 row as a dictionary from our CSV. 1171 01:08:39,724 --> 01:08:42,260 1172 01:08:42,260 --> 01:08:44,060 Other questions here? 1173 01:08:44,060 --> 01:08:47,540 Are the words module and library synonymous? 1174 01:08:47,540 --> 01:08:48,630 Great question. 1175 01:08:48,630 --> 01:08:52,520 They are technically different in Python. 1176 01:08:52,520 --> 01:08:54,260 This varies by language. 1177 01:08:54,260 --> 01:08:57,287 A module and a library are two different things. 1178 01:08:57,287 --> 01:08:59,870 I've been saying both interchangeably because it doesn't quite 1179 01:08:59,870 --> 01:09:01,649 matter for our purposes here. 1180 01:09:01,649 --> 01:09:03,859 But if you want to, you can go off and read more 1181 01:09:03,859 --> 01:09:07,340 about the difference between a library and a module in Python. 1182 01:09:07,340 --> 01:09:14,069 They are two distinct things that have very slightly different definitions. 
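On the earlier question of what kind of thing the reader is, a quick experiment may help. This sketch assumes an in-memory CSV; `next` is the built-in that asks an iterator for its next item.

```python
import csv
import io

reader = csv.DictReader(io.StringIO("title,author\nCorduroy,Don Freeman\n"))

print(type(reader))        # the reader's own type, not a list
first = next(reader)       # ask for one row by hand
leftover = list(reader)    # nothing remains after the only row was consumed
header = reader.fieldnames # the key names taken from the first line
```

One consequence worth noting: looping over the same reader a second time yields nothing, which is one reason to append rows to a list while reading.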
1183 01:09:14,069 --> 01:09:15,630 Let's see, other ones, too? 1184 01:09:15,630 --> 01:09:18,569 1185 01:09:18,569 --> 01:09:22,729 Do we have to specify the mode that we're reading the file in? 1186 01:09:22,729 --> 01:09:26,229 You can if you'd like to, often good practice. 1187 01:09:26,229 --> 01:09:30,330 So in C, remember, we had reading and writing mode, and other modes, too. 1188 01:09:30,330 --> 01:09:31,979 In Python, we have the same thing. 1189 01:09:31,979 --> 01:09:36,279 I could give it the filename, comma, the mode I want to use. 1190 01:09:36,279 --> 01:09:38,250 In this case, I'm just reading so I'll say 1191 01:09:38,250 --> 01:09:42,990 the mode is r. I could also create a new file using w. 1192 01:09:42,990 --> 01:09:47,399 And there are other modes, too, if you want to read, write, et cetera, 1193 01:09:47,399 --> 01:09:48,419 to files as well. 1194 01:09:48,419 --> 01:09:51,830 1195 01:09:51,830 --> 01:09:58,200 Let's see, why do we use with? 1196 01:09:58,200 --> 01:10:00,740 So another good question here. 1197 01:10:00,740 --> 01:10:07,530 With basically handles a few things for us that we might otherwise forget. 1198 01:10:07,530 --> 01:10:12,120 So in Python, we could very much do this same thing as follows. 1199 01:10:12,120 --> 01:10:20,090 I could say, file equals open books.csv in reading mode. 1200 01:10:20,090 --> 01:10:24,680 And then I could unindent these, and I have access to file, 1201 01:10:24,680 --> 01:10:28,040 because I said file equals open books.csv. 1202 01:10:28,040 --> 01:10:35,060 But now, if I run this program python reads.py, I see it still works. 1203 01:10:35,060 --> 01:10:38,660 But for those of you who are observant or know Python a bit 1204 01:10:38,660 --> 01:10:41,060 already, what have I forgotten to do? 1205 01:10:41,060 --> 01:10:44,820 1206 01:10:44,820 --> 01:10:47,000 Any ideas? 1207 01:10:47,000 --> 01:10:48,630 I've forgotten to close the file.
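A rough sketch of the mode argument and of opening a file without `with`. The temporary path is an assumption so the example does not touch any real books.csv.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "books.csv")

# "w" creates the file (or overwrites an existing one) for writing.
file = open(path, "w")
file.write("title,author\nCorduroy,Don Freeman\n")
file.close()

# "r" is read mode; it is also what open uses when no mode is given.
file = open(path, "r")
text = file.read()
file.close()  # without with, closing the file is up to the programmer
```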
1208 01:10:48,630 --> 01:10:51,260 So just like in C where we opened a file and had 1209 01:10:51,260 --> 01:10:55,430 to remember to close it at the end, here I didn't close this file. 1210 01:10:55,430 --> 01:11:02,268 I think I can do so with file.close, and that should take care of that for me. 1211 01:11:02,268 --> 01:11:04,310 But I don't want to have to remember every time I 1212 01:11:04,310 --> 01:11:06,090 open the file to close it later on. 1213 01:11:06,090 --> 01:11:08,040 So I use with instead. 1214 01:11:08,040 --> 01:11:13,550 And what with does, with open as file, indent these, 1215 01:11:13,550 --> 01:11:18,750 as long as I'm indented inside this with statement, my file is open, 1216 01:11:18,750 --> 01:11:20,090 I can use it as I'd like. 1217 01:11:20,090 --> 01:11:24,740 But as soon as I unindent, my file is automatically closed for me. 1218 01:11:24,740 --> 01:11:26,900 I don't have to call file.close. 1219 01:11:26,900 --> 01:11:29,207 Python does that for me automatically. 1220 01:11:29,207 --> 01:11:31,040 So it's a special way of reading and writing 1221 01:11:31,040 --> 01:11:34,880 files where it just makes things easier to open and close things for me 1222 01:11:34,880 --> 01:11:35,780 automatically. 1223 01:11:35,780 --> 01:11:40,240 1224 01:11:40,240 --> 01:11:43,465 Other good questions here, like how to get the user to upload a file? 1225 01:11:43,465 --> 01:11:45,340 That is more in the realm of web development. 1226 01:11:45,340 --> 01:11:48,370 So if you're interested in that, go and check out CS50W with Brian. 1227 01:11:48,370 --> 01:11:51,100 1228 01:11:51,100 --> 01:11:52,300 All right. 1229 01:11:52,300 --> 01:11:55,390 Any other questions here? 1230 01:11:55,390 --> 01:11:59,150 1231 01:11:59,150 --> 01:12:01,540 OK, not seeing too many. 1232 01:12:01,540 --> 01:12:02,710 Oh, one more. 1233 01:12:02,710 --> 01:12:06,040 Do you need to specify the stored location of the CSV file? 1234 01:12:06,040 --> 01:12:06,580 Yes. 
1235 01:12:06,580 --> 01:12:12,100 So here, let me show you, if I'm in my terminal and I type something like ls, 1236 01:12:12,100 --> 01:12:18,760 notice how books.csv and books.py are in the same folder? 1237 01:12:18,760 --> 01:12:23,080 That is what allows me to simply use books.csv. 1238 01:12:23,080 --> 01:12:25,660 If, though, this was in some other folder, 1239 01:12:25,660 --> 01:12:29,080 like let's say it was in data slash books.csv, 1240 01:12:29,080 --> 01:12:32,230 I would have to specify that to open so it can go ahead and find 1241 01:12:32,230 --> 01:12:35,890 that file for me on my file system. 1242 01:12:35,890 --> 01:12:37,810 Good questions. 1243 01:12:37,810 --> 01:12:38,920 OK. 1244 01:12:38,920 --> 01:12:42,670 So that just about brings us to the end of this section, in which 1245 01:12:42,670 --> 01:12:46,335 we've been able to read and write to files, learn about strings, et cetera. 1246 01:12:46,335 --> 01:12:48,460 This is kind of a lot as an introduction to Python. 1247 01:12:48,460 --> 01:12:50,590 But I hope you enjoyed getting to dive in, 1248 01:12:50,590 --> 01:12:53,290 and I hope you feel equipped to tackle this week's problem set. 1249 01:12:53,290 --> 01:12:55,630 Certainly feel free to ask any questions you'd like to, 1250 01:12:55,630 --> 01:12:58,680 but we'll hopefully see you next week. 1251 01:12:58,680 --> 01:13:02,000
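A minimal sketch combining the last two answers: using with so the file is closed automatically, and passing a path that includes the folder when the CSV lives somewhere other than next to the program. The temporary project folder, the data subfolder, and the sample contents are assumptions made so the example runs on its own.

```python
import csv
import os
import tempfile

# Build a hypothetical layout: <project>/data/books.csv
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "data"))
path = os.path.join(project, "data", "books.csv")

# Create the sample file; it is closed as soon as the indented block ends
with open(path, "w") as file:
    file.write("title,author\nGoodnight Moon,Margaret Wise Brown\n")

# The path passed to open includes the folder, not just the filename
titles = []
with open(path, "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        titles.append(row["title"])

# Unindented: Python has already closed the file, with no call to file.close()
print(titles)
print(file.closed)
```

If the program were run from the project folder itself, the relative path data/books.csv passed to open would find the same file.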