1 00:00:00,000 --> 00:00:03,269 2 00:00:03,269 --> 00:00:05,280 CARTER ZENKE: OK, hello, everyone. 3 00:00:05,280 --> 00:00:06,690 It is so good to see you here. 4 00:00:06,690 --> 00:00:07,700 My name is Carter Zenke. 5 00:00:07,700 --> 00:00:10,220 I'm the course preceptor here on campus. 6 00:00:10,220 --> 00:00:12,380 This is our week six section for CS50. 7 00:00:12,380 --> 00:00:14,150 We'll dive into Python. 8 00:00:14,150 --> 00:00:16,400 So in lecture, we saw a few different topics 9 00:00:16,400 --> 00:00:18,890 and we have those same topics on the board today. 10 00:00:18,890 --> 00:00:22,400 Some including strings, this new dot notation for Python, 11 00:00:22,400 --> 00:00:26,580 loops, dictionaries, libraries, and how we read and write from files. 12 00:00:26,580 --> 00:00:30,620 So we'll touch a bit on each of these different topics during this section. 13 00:00:30,620 --> 00:00:34,022 And we'll spend a bit more time on things like loops and dictionaries 14 00:00:34,022 --> 00:00:36,230 and file writing and file reading to help prepare you 15 00:00:36,230 --> 00:00:38,602 for this week's problem set. 16 00:00:38,602 --> 00:00:41,060 So that's all those topics this week, but we'll try to dive 17 00:00:41,060 --> 00:00:42,810 into a few of them in particular. 18 00:00:42,810 --> 00:00:45,470 So let's actually start off with this idea of these strings. 19 00:00:45,470 --> 00:00:48,020 So we saw strings in C. And these strings 20 00:00:48,020 --> 00:00:50,102 are actually still here in Python. 21 00:00:50,102 --> 00:00:53,310 And there are all kinds of interesting things you can really do with strings. 22 00:00:53,310 --> 00:00:56,120 So one of them might be taking in some information from a book. 23 00:00:56,120 --> 00:00:58,105 So maybe you've read Goodnight Moon. 24 00:00:58,105 --> 00:01:00,230 And if you have, you know it's this children's book 25 00:01:00,230 --> 00:01:02,480 that has lots of simple text that is involved with it. 
26 00:01:02,480 --> 00:01:04,160 And maybe the book starts off like this. 27 00:01:04,160 --> 00:01:07,795 It says, in the great green room, and maybe we're 28 00:01:07,795 --> 00:01:10,170 interested in seeing what a computer does with this text. 29 00:01:10,170 --> 00:01:13,250 So one thing we can now do with Python is have more access 30 00:01:13,250 --> 00:01:16,130 to all these different kinds of libraries and things we can use. 31 00:01:16,130 --> 00:01:18,060 Actually use AI and such. 32 00:01:18,060 --> 00:01:22,550 And so maybe we give this piece of text to this AI called DALL-E from OpenAI. 33 00:01:22,550 --> 00:01:25,580 And we get back maybe this kind of image here in the great green room 34 00:01:25,580 --> 00:01:28,010 that DALL-E generated just from seeing this piece of text. 35 00:01:28,010 --> 00:01:29,885 And we could also do it with the next phrase. 36 00:01:29,885 --> 00:01:32,570 We could say, there was a telephone and a red balloon, 37 00:01:32,570 --> 00:01:35,420 and we'll get back some text-- or some images a bit like this. 38 00:01:35,420 --> 00:01:40,520 This telephone here, this red balloon generated by the AI, again, here. 39 00:01:40,520 --> 00:01:43,940 Now, all of this-- this fancy AI, this image generation, 40 00:01:43,940 --> 00:01:46,860 comes down to just giving text to a computer. 41 00:01:46,860 --> 00:01:50,420 And so let's think about how we do that in Python here as compared to C. 42 00:01:50,420 --> 00:01:53,690 So we saw in C we had this top level code here, 43 00:01:53,690 --> 00:01:58,520 char *text gets this get_string function whenever we prompt the user for text. 44 00:01:58,520 --> 00:02:02,990 And then, in Python, we had this text gets the value of input. 45 00:02:02,990 --> 00:02:05,270 And so take a moment here, maybe pause the video 46 00:02:05,270 --> 00:02:09,380 and think, what are the differences you see between the top level 47 00:02:09,380 --> 00:02:11,000 code and the bottom level code? 
48 00:02:11,000 --> 00:02:13,790 The top one in C and the bottom one in Python. 49 00:02:13,790 --> 00:02:18,080 50 00:02:18,080 --> 00:02:20,690 So maybe you notice that, in Python, we no longer 51 00:02:20,690 --> 00:02:23,340 have to say the type of the variable we're working with. 52 00:02:23,340 --> 00:02:25,400 So down below, we're still working with strings, 53 00:02:25,400 --> 00:02:27,020 as we are in the top piece of code. 54 00:02:27,020 --> 00:02:30,350 But notice how, in Python, we just get to say the variable name, like just 55 00:02:30,350 --> 00:02:31,250 text, right? 56 00:02:31,250 --> 00:02:34,700 Not string text, not char *text, just text. 57 00:02:34,700 --> 00:02:38,280 And so C is what we call a statically typed language. 58 00:02:38,280 --> 00:02:41,120 We have to declare the type before we use a given variable. 59 00:02:41,120 --> 00:02:44,900 Like with text, we have to declare it as a string or a char *. 60 00:02:44,900 --> 00:02:47,788 Whereas, in Python, which is a dynamically typed language, 61 00:02:47,788 --> 00:02:48,830 we don't have to do that. 62 00:02:48,830 --> 00:02:51,540 We can say, Python, please infer for me the data type 63 00:02:51,540 --> 00:02:52,790 that we're talking about here. 64 00:02:52,790 --> 00:02:55,580 Saying text gets whatever input gives us. 65 00:02:55,580 --> 00:02:58,790 And because we know this input function always gives us a string, 66 00:02:58,790 --> 00:03:03,090 well text, of course, is going to be a string. 67 00:03:03,090 --> 00:03:07,440 OK, so that's one difference between C and Python in getting these variables 68 00:03:07,440 --> 00:03:08,550 and getting these strings. 69 00:03:08,550 --> 00:03:10,620 But how would we actually compare them perhaps? 70 00:03:10,620 --> 00:03:12,600 So in C, we had this top code. 71 00:03:12,600 --> 00:03:14,538 And in Python, we now have this bottom code. 
72 00:03:14,538 --> 00:03:16,830 And so take a moment to yourself, maybe pause the video 73 00:03:16,830 --> 00:03:19,110 and think, what differences do you notice here 74 00:03:19,110 --> 00:03:20,580 between the top and the bottom? 75 00:03:20,580 --> 00:03:25,540 76 00:03:25,540 --> 00:03:27,290 And so maybe you've seen that, in the top, 77 00:03:27,290 --> 00:03:29,322 we actually are comparing text with hello 78 00:03:29,322 --> 00:03:30,780 using a different kind of function. 79 00:03:30,780 --> 00:03:34,250 We're saying, let's use strcmp to look at text and look at hello 80 00:03:34,250 --> 00:03:36,290 and check the return value of that and see 81 00:03:36,290 --> 00:03:39,048 if it's 0, where 0 indicates that these are the same. 82 00:03:39,048 --> 00:03:41,090 Well, in Python, we don't have to do any of that. 83 00:03:41,090 --> 00:03:43,110 No special functions involved here. 84 00:03:43,110 --> 00:03:46,160 All we get to say is, if text is equal to hello, 85 00:03:46,160 --> 00:03:49,760 if this text is hello, let's do something that's indented. 86 00:03:49,760 --> 00:03:52,760 And maybe you notice as well that, in the top, in our if statement, 87 00:03:52,760 --> 00:03:54,530 we have to have these curly braces. 88 00:03:54,530 --> 00:03:57,650 And the code that gets run if this condition is true 89 00:03:57,650 --> 00:03:59,840 is inside those curly braces. 90 00:03:59,840 --> 00:04:04,643 Whereas, in Python down below, we see that we have this if text equals hello. 91 00:04:04,643 --> 00:04:06,560 Then, we actually do the code that's indented. 92 00:04:06,560 --> 00:04:10,520 So no longer do we need these curly braces here. 93 00:04:10,520 --> 00:04:13,040 We can rely only on code that is indented. 94 00:04:13,040 --> 00:04:17,600 And so while you may be able to throw out the braces and the semicolons, 95 00:04:17,600 --> 00:04:20,899 you also have to make sure you have your code indented. 
96 00:04:20,899 --> 00:04:25,417 That's what can matter more in Python versus these curly braces up top. 97 00:04:25,417 --> 00:04:28,250 So comparing strings works a little differently in Python, but let's 98 00:04:28,250 --> 00:04:29,480 take a look at how we actually get access 99 00:04:29,480 --> 00:04:31,447 to individual characters of a string. 100 00:04:31,447 --> 00:04:33,530 Because remember that strings are just collections 101 00:04:33,530 --> 00:04:35,330 of characters or strings of characters. 102 00:04:35,330 --> 00:04:38,180 And to see individual characters in Python, 103 00:04:38,180 --> 00:04:41,270 we can actually do the same thing we did in C. We have the same bracket 104 00:04:41,270 --> 00:04:45,440 syntax, like text [i] to get access to a particular character 105 00:04:45,440 --> 00:04:47,100 inside our piece of text. 106 00:04:47,100 --> 00:04:48,780 And this also works for lists. 107 00:04:48,780 --> 00:04:52,580 So let's say we had a list in Python similar to an array in C. 108 00:04:52,580 --> 00:04:56,150 We could try to actually have these curly brace-- these brackets here 109 00:04:56,150 --> 00:05:00,680 that get us access to an individual element inside of our list. 110 00:05:00,680 --> 00:05:02,810 So some differences in strings here. 111 00:05:02,810 --> 00:05:05,300 But mostly, we can actually use some familiar syntax 112 00:05:05,300 --> 00:05:08,570 with these accessing individual characters here. 113 00:05:08,570 --> 00:05:11,450 Now, strings are a little more powerful than just comparing 114 00:05:11,450 --> 00:05:15,170 them or getting them in your program or even getting individual characters. 115 00:05:15,170 --> 00:05:17,810 And in fact, in C, we have-- not in C. 
In Python-- 116 00:05:17,810 --> 00:05:20,780 sorry-- we have access to these individual functions 117 00:05:20,780 --> 00:05:25,670 or the individual methods that belong to this type called str in Python, 118 00:05:25,670 --> 00:05:29,370 or in long form what we call the string. 119 00:05:29,370 --> 00:05:32,750 So here we have this dot notation that we can introduce. 120 00:05:32,750 --> 00:05:35,802 So let's say we're trying to give this string to our program. 121 00:05:35,802 --> 00:05:38,510 Well, we could have some code a bit like this like we saw before. 122 00:05:38,510 --> 00:05:40,670 We could say that text equals input. 123 00:05:40,670 --> 00:05:43,380 And maybe our text looks like this on the right hand side. 124 00:05:43,380 --> 00:05:45,620 And if you pause the video, ask yourself, 125 00:05:45,620 --> 00:05:49,700 what looks a little odd about this text on the right? 126 00:05:49,700 --> 00:05:51,770 What is a little messy about it? 127 00:05:51,770 --> 00:05:54,620 128 00:05:54,620 --> 00:05:56,620 And so maybe you've noticed that, in general, we 129 00:05:56,620 --> 00:06:00,160 have some spaces before the text and some spaces after the text. 130 00:06:00,160 --> 00:06:02,080 And ideally, those shouldn't be there. 131 00:06:02,080 --> 00:06:04,510 That's going to be a remnant of the user typing in some information 132 00:06:04,510 --> 00:06:05,635 and they typed it in wrong. 133 00:06:05,635 --> 00:06:06,830 We want to get rid of that. 134 00:06:06,830 --> 00:06:09,950 Well, luckily, in Python, we actually have these methods 135 00:06:09,950 --> 00:06:12,910 we can use to get rid of that whitespace on either end 136 00:06:12,910 --> 00:06:14,380 to strip that whitespace off. 137 00:06:14,380 --> 00:06:17,060 And this method is actually called dot strip. 138 00:06:17,060 --> 00:06:19,030 So we can say text.strip. 
139 00:06:19,030 --> 00:06:21,550 And if you run this line of code now, we'll 140 00:06:21,550 --> 00:06:26,090 see that those white spaces on either end, actually, go away. 141 00:06:26,090 --> 00:06:27,462 So here is what we had before. 142 00:06:27,462 --> 00:06:29,920 It's a piece of messy text with white space on either side. 143 00:06:29,920 --> 00:06:32,260 But running text.strip, well, we've got rid of that. 144 00:06:32,260 --> 00:06:34,180 Now, we just have, in the great green room, 145 00:06:34,180 --> 00:06:37,400 the actual characters inside of our string. 146 00:06:37,400 --> 00:06:38,925 OK, and that's actually pretty good. 147 00:06:38,925 --> 00:06:40,800 But what if we had some other kinds of input? 148 00:06:40,800 --> 00:06:43,910 Let's say we had maybe miscapitalized letters. 149 00:06:43,910 --> 00:06:48,500 Like IN, all caps, thE with E capitalized, and ROom with RO 150 00:06:48,500 --> 00:06:49,130 capitalized. 151 00:06:49,130 --> 00:06:51,830 Well, how could we make sure this is all standardized? 152 00:06:51,830 --> 00:06:53,983 Well, we could use another method that belongs 153 00:06:53,983 --> 00:06:55,400 to strings, this one called lower. 154 00:06:55,400 --> 00:06:56,840 We could say text.lower. 155 00:06:56,840 --> 00:07:00,800 And that then gives us this same string, but in lowercase. 156 00:07:00,800 --> 00:07:03,140 And similarly, maybe we want to actually capitalize this 157 00:07:03,140 --> 00:07:03,950 because it's a sentence. 158 00:07:03,950 --> 00:07:05,700 We could actually do that very same thing. 159 00:07:05,700 --> 00:07:09,980 We could say text.capitalize to make sure that the I in this string 160 00:07:09,980 --> 00:07:13,850 is capitalized and the rest is here lowercased. 161 00:07:13,850 --> 00:07:16,850 So all of these are what we call methods that 162 00:07:16,850 --> 00:07:20,210 really belong to this idea of the string-- this data type called 163 00:07:20,210 --> 00:07:22,220 the string in Python. 
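As a small sketch of the three methods just described, the snippet below runs strip, lower, and capitalize on a sample string. The messy input value is an assumption made up for illustration (the slide's exact text isn't in the transcript); the methods themselves are standard str methods. Each call returns a new string rather than changing the original.

```python
# Hypothetical messy input: extra whitespace and odd capitalization (assumed for illustration)
text = "   IN thE great green ROom   "

stripped = text.strip()             # drops whitespace on both ends
lowered = stripped.lower()          # every letter lowercased
capitalized = lowered.capitalize()  # first letter uppercased, the rest lowercased

print(repr(stripped))
print(lowered)
print(capitalized)
```

Because each method returns a new string, the calls can also be chained, as in text.strip().lower().capitalize().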
164 00:07:22,220 --> 00:07:26,240 Some Python developers long ago decided that because these functions, 165 00:07:26,240 --> 00:07:32,870 like lower or capitalize or split and so on-- or not split, but strip, 166 00:07:32,870 --> 00:07:34,423 belong to this thing called a string. 167 00:07:34,423 --> 00:07:36,590 They're so integral to what it means to be a string, 168 00:07:36,590 --> 00:07:41,400 we actually have to include them inside this string data type, so to speak. 169 00:07:41,400 --> 00:07:45,080 So if we keep going here, we can actually think of other ways 170 00:07:45,080 --> 00:07:46,850 to use these methods. 171 00:07:46,850 --> 00:07:50,840 But first, let's actually dive into how they work, why they exist, 172 00:07:50,840 --> 00:07:51,650 and where they are, 173 00:07:51,650 --> 00:07:54,900 so you can figure out what kinds of methods you want to use in your own code. 174 00:07:54,900 --> 00:07:59,990 So again, these strings, or more succinctly called strs in Python, S-T-R 175 00:07:59,990 --> 00:08:03,020 for short, actually have a variety of methods you can find 176 00:08:03,020 --> 00:08:04,500 in the Python documentation. 177 00:08:04,500 --> 00:08:07,530 So if you go to docs.python.org, you'll see something like this. 178 00:08:07,530 --> 00:08:10,040 And if you scroll down to the str section, 179 00:08:10,040 --> 00:08:11,780 you'll see string methods inside. 180 00:08:11,780 --> 00:08:15,830 You get to find all the methods you could call on your strings in Python. 181 00:08:15,830 --> 00:08:19,230 We have capitalize down here and some others if we scroll down. 182 00:08:19,230 --> 00:08:22,140 But notice how we have access to this dot notation. 183 00:08:22,140 --> 00:08:26,360 It's not capitalize, and then, give the string as input to capitalize. 184 00:08:26,360 --> 00:08:30,060 It's actually str.capitalize or str.lower. 185 00:08:30,060 --> 00:08:32,870 And so why does this come up in Python? 
186 00:08:32,870 --> 00:08:36,890 Well, if we actually bring it back to what we saw in C with our structs, 187 00:08:36,890 --> 00:08:38,490 we might get some intuition here. 188 00:08:38,490 --> 00:08:42,530 So in C, we had this idea of a struct called a candidate, for example, 189 00:08:42,530 --> 00:08:43,789 in an earlier problem set. 190 00:08:43,789 --> 00:08:47,430 And a candidate looks a bit like this, just a single person over here. 191 00:08:47,430 --> 00:08:49,910 And remember how this candidate had different attributes, 192 00:08:49,910 --> 00:08:54,350 like they had a name, or they had maybe a number of votes. 193 00:08:54,350 --> 00:08:56,270 And this candidate was some data type we had 194 00:08:56,270 --> 00:08:58,310 constructed to have these attributes. 195 00:08:58,310 --> 00:09:00,890 It had a name and some number of votes. 196 00:09:00,890 --> 00:09:04,860 And to get access to those, well, we did this very same thing of candidate.votes 197 00:09:04,860 --> 00:09:06,050 to get access to votes. 198 00:09:06,050 --> 00:09:09,720 And then, candidate.name to get access to name. 199 00:09:09,720 --> 00:09:14,000 So in Python, this str data type now is somewhat similar, 200 00:09:14,000 --> 00:09:16,130 but you can think of it as more of a toolkit. 201 00:09:16,130 --> 00:09:18,320 It's a data type that has some tools inside 202 00:09:18,320 --> 00:09:21,440 you can use on the string you're talking about in this case. 203 00:09:21,440 --> 00:09:24,350 So we could say str.capitalize. 204 00:09:24,350 --> 00:09:26,580 And there's a tool inside of this str type 205 00:09:26,580 --> 00:09:29,580 we can use-- this function we can use, or more particularly this method, 206 00:09:29,580 --> 00:09:32,060 we can use on this str data type. 
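One way to see the "tool that lives inside the str type" idea is the short sketch below. It is only an illustration, not something shown in section: the sample string is assumed, and it relies on the fact that a method can be looked up on the str type itself and handed the string explicitly, which gives the same result as the usual dot notation on the string.

```python
text = "in the great green room"  # sample string, assumed for illustration

# Usual dot notation: call the method on the string itself
via_instance = text.capitalize()

# Same tool, looked up on the str type, with the string passed in explicitly
via_type = str.capitalize(text)

print(via_instance)
print(via_instance == via_type)
```

In everyday code the first form is the one to use; the second just makes visible that capitalize belongs to str.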
207 00:09:32,060 --> 00:09:35,150 And similarly, we could even have str.lower, this other tool 208 00:09:35,150 --> 00:09:38,420 inside of our toolbox for strs we can use to actually make 209 00:09:38,420 --> 00:09:40,590 the str do what we want it to do. 210 00:09:40,590 --> 00:09:45,725 So very similar in spirit to this idea of attributes for our structs in C 211 00:09:45,725 --> 00:09:49,040 where structs were often defined as our own data types, similarly, 212 00:09:49,040 --> 00:09:51,350 in Python, we have our own data type called 213 00:09:51,350 --> 00:09:53,930 a str that now has not just attributes, but also 214 00:09:53,930 --> 00:09:58,910 functions that can belong to it and that can operate on its own self. 215 00:09:58,910 --> 00:10:01,840 So that's all for dot syntax. 216 00:10:01,840 --> 00:10:06,610 And take a look at the Python documentation for all the functions you can use here. 217 00:10:06,610 --> 00:10:09,370 But let's actually take a look at loops in Python 218 00:10:09,370 --> 00:10:11,720 and how strings come in with loops. 219 00:10:11,720 --> 00:10:15,700 So if we remember from lecture, we see that there are maybe 220 00:10:15,700 --> 00:10:17,500 the same kinds of loops in Python. 221 00:10:17,500 --> 00:10:21,340 We have the while loop, the for loop, but they look a little different. 222 00:10:21,340 --> 00:10:24,340 And then, one big example of this difference 223 00:10:24,340 --> 00:10:28,000 is Python's for blank in blank syntax. 224 00:10:28,000 --> 00:10:31,400 And we'll see this very often in Python because it's so convenient to use. 225 00:10:31,400 --> 00:10:34,930 So let's say, for example, I have some piece of text, again, 226 00:10:34,930 --> 00:10:37,160 in the great green room on the right hand side. 227 00:10:37,160 --> 00:10:39,770 And I want to loop through this piece of text. 228 00:10:39,770 --> 00:10:44,230 Well, I could do that much more simply than I really could in C. 
All I 229 00:10:44,230 --> 00:10:47,650 have to do is say for C in text. 230 00:10:47,650 --> 00:10:49,870 Maybe we want to print out every character. 231 00:10:49,870 --> 00:10:52,840 And so what we'll do is make a new variable called C. 232 00:10:52,840 --> 00:10:56,680 And it'll actually loop through all the characters inside of this string called 233 00:10:56,680 --> 00:11:01,040 text and make sure that on each iteration C updates as we go through. 234 00:11:01,040 --> 00:11:02,870 So for example, let's say I run this code. 235 00:11:02,870 --> 00:11:07,270 I would see that C gets that very first character in text, like the I. 236 00:11:07,270 --> 00:11:11,320 On the next iteration, C will get that next character called n, and then 237 00:11:11,320 --> 00:11:14,650 the space, and then the t, and then the h, and then the e. 238 00:11:14,650 --> 00:11:17,920 And this will keep going and going and going all the way through our string. 239 00:11:17,920 --> 00:11:19,990 Now, we could call C anything we want to call it. 240 00:11:19,990 --> 00:11:21,250 We could call it z. 241 00:11:21,250 --> 00:11:22,690 We could call it s. 242 00:11:22,690 --> 00:11:25,537 We could call it character, just like a long variable name. 243 00:11:25,537 --> 00:11:27,370 We could even call it zebra if we wanted to. 244 00:11:27,370 --> 00:11:30,730 But the main thing here is that Python takes this string 245 00:11:30,730 --> 00:11:35,290 and infers that when it sees this loop called for blank in whatever string 246 00:11:35,290 --> 00:11:38,590 you have, it's going to take every individual character, 247 00:11:38,590 --> 00:11:43,840 assign it some name as you loop through, and make sure that each time that name 248 00:11:43,840 --> 00:11:46,210 refers to a particular element inside of your string 249 00:11:46,210 --> 00:11:49,760 where the element is now, in this case, a character. 250 00:11:49,760 --> 00:11:53,470 Now, this is actually-- this works beyond simply strings. 
251 00:11:53,470 --> 00:11:55,900 We also have this kind of syntax for lists. 252 00:11:55,900 --> 00:11:58,780 And so let's say we want to turn this piece of text, which 253 00:11:58,780 --> 00:12:02,503 is all one string, into really a list of different strings 254 00:12:02,503 --> 00:12:03,670 where each string is a word. 255 00:12:03,670 --> 00:12:06,220 Well, we could have a different kind of method for a string. 256 00:12:06,220 --> 00:12:08,050 This one called .split. 257 00:12:08,050 --> 00:12:11,290 So here we have, again, our text on the right hand side. 258 00:12:11,290 --> 00:12:14,890 And let's make this new variable called words that will get our text 259 00:12:14,890 --> 00:12:17,320 but split up now into individual words. 260 00:12:17,320 --> 00:12:21,190 And when you call .split on a piece of text, it's going to automatically, 261 00:12:21,190 --> 00:12:26,290 or by default, look for spaces in that string and give you back substrings 262 00:12:26,290 --> 00:12:29,540 that are smaller strings that are between those spaces. 263 00:12:29,540 --> 00:12:32,500 So for example, if we were to run text.split here, what we'll get 264 00:12:32,500 --> 00:12:34,460 is now a list of individual words. 265 00:12:34,460 --> 00:12:39,610 So see how this changes from one long string into multiple individual words 266 00:12:39,610 --> 00:12:41,500 that are part of this Python list. 267 00:12:41,500 --> 00:12:46,238 And as a thought question here, how do we know this is a list? 268 00:12:46,238 --> 00:12:48,280 Take a look at the syntax on the right hand side. 269 00:12:48,280 --> 00:12:50,140 Let me show it back to you again. 270 00:12:50,140 --> 00:12:51,865 How do we know this is a list? 271 00:12:51,865 --> 00:12:55,050 272 00:12:55,050 --> 00:12:58,300 Well, you might have thought that it's because of the brackets on either side. 273 00:12:58,300 --> 00:13:02,040 We see these square brackets, and that in Python denotes a list. 
274 00:13:02,040 --> 00:13:05,910 But we also see these commas in between our words or their individual strings. 275 00:13:05,910 --> 00:13:10,500 So we say in, which is a string, comma, the, which is a string, comma. 276 00:13:10,500 --> 00:13:12,810 Those are all inside these square brackets. 277 00:13:12,810 --> 00:13:16,440 So that denotes to us a Python list, which is similar in spirit 278 00:13:16,440 --> 00:13:21,340 to a C array, but it's much more flexible overall for us. 279 00:13:21,340 --> 00:13:24,333 OK, so now, we split our words into-- 280 00:13:24,333 --> 00:13:26,500 split our text into individual words, let's actually 281 00:13:26,500 --> 00:13:28,940 see how we can loop through these words as a whole. 282 00:13:28,940 --> 00:13:34,900 So if we then use our for blank in blank syntax with a list as that second blank 283 00:13:34,900 --> 00:13:38,000 there, we could say, for word in words. 284 00:13:38,000 --> 00:13:39,280 Let's print out the word. 285 00:13:39,280 --> 00:13:43,090 And what this will do for us visually is say, on this first iteration, 286 00:13:43,090 --> 00:13:45,550 word will get the value in. 287 00:13:45,550 --> 00:13:49,480 And on the second iteration, word will get the value the. 288 00:13:49,480 --> 00:13:51,817 And on the third, it will get the value great and so on. 289 00:13:51,817 --> 00:13:55,150 And so you might be able to guess, well, on the fourth iteration, what will word 290 00:13:55,150 --> 00:13:56,605 get? 291 00:13:56,605 --> 00:13:58,020 It will get green. 292 00:13:58,020 --> 00:14:00,621 And on the fifth iteration, what will word get? 293 00:14:00,621 --> 00:14:02,405 It will get room, right? 294 00:14:02,405 --> 00:14:03,780 It will get room at the very end. 295 00:14:03,780 --> 00:14:08,540 So we see that word is going through and getting these individual words 296 00:14:08,540 --> 00:14:09,750 inside of our list. 297 00:14:09,750 --> 00:14:10,730 Well, why is that? 
298 00:14:10,730 --> 00:14:14,300 Why is it the actual words now inside of our list 299 00:14:14,300 --> 00:14:16,640 as opposed to the characters inside of our string? 300 00:14:16,640 --> 00:14:20,390 Well, what matters here is the kind of data type 301 00:14:20,390 --> 00:14:23,010 you are asking Python to iterate over. 302 00:14:23,010 --> 00:14:26,480 So this for/in syntax is helpful if you want 303 00:14:26,480 --> 00:14:30,362 to have some kind of loop going through every element of a-- 304 00:14:30,362 --> 00:14:33,320 what we call iterable, where an iterable means you can iterate over it. 305 00:14:33,320 --> 00:14:35,630 It has some elements you can go over individually. 306 00:14:35,630 --> 00:14:38,780 If you have a list, what Python will do is go over 307 00:14:38,780 --> 00:14:40,950 every element inside that list. 308 00:14:40,950 --> 00:14:45,760 So in this case, our strings are the elements of our list, right? 309 00:14:45,760 --> 00:14:50,920 But if we have just a single string, like in or the or great or the string 310 00:14:50,920 --> 00:14:53,470 as a whole, in the great green room, well, Python 311 00:14:53,470 --> 00:14:55,840 will decide to go through every individual character. 312 00:14:55,840 --> 00:14:59,830 Because in this case, the characters are that subpiece, that subelement inside 313 00:14:59,830 --> 00:15:01,832 of our longer string, in this case. 314 00:15:01,832 --> 00:15:03,790 So I encourage you to actually go out and check 315 00:15:03,790 --> 00:15:06,220 what Python does in these different cases 316 00:15:06,220 --> 00:15:13,110 to actually see how this changes as you change what you iterate over in Python. 
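To try that out, here is a minimal sketch of the two cases side by side. The exact capitalization of the sample text is an assumption; the point is only that the same for ... in ... syntax hands back characters when given a string and whole words when given the list that split returns.

```python
text = "In the great green room"  # sample text; exact capitalization is assumed

# Looping over a string: each element is a single character, spaces included
for c in text:
    print(c)

# split() with no arguments breaks the string on whitespace and returns a list
words = text.split()
print(words)

# Looping over a list: each element is a whole word
for word in words:
    print(word)
```

Swapping the thing after in is all it takes to change what each iteration receives.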
317 00:15:13,110 --> 00:15:16,200 So our first exercise together will be to actually take 318 00:15:16,200 --> 00:15:18,330 a look at this piece of code that will have 319 00:15:18,330 --> 00:15:21,540 us look at a variety of different loops and figure out which 320 00:15:21,540 --> 00:15:24,250 one is going to print out which thing. 321 00:15:24,250 --> 00:15:26,370 So let's look back at our code here. 322 00:15:26,370 --> 00:15:30,430 I'll go over to my codespace and I'll go ahead and sign in over here. 323 00:15:30,430 --> 00:15:32,775 I'll let you get your codespace up too. 324 00:15:32,775 --> 00:15:34,150 So we have our codespace loaded. 325 00:15:34,150 --> 00:15:35,910 So let's actually go ahead and pull up our piece of code 326 00:15:35,910 --> 00:15:37,493 that we'll take a look at together. 327 00:15:37,493 --> 00:15:39,420 So we'll open up text.py. 328 00:15:39,420 --> 00:15:42,032 And I'll zoom in a bit on it for you. 329 00:15:42,032 --> 00:15:43,740 Now, the goal for this exercise is to take 330 00:15:43,740 --> 00:15:46,860 a look at each of these loops in Python and figure out 331 00:15:46,860 --> 00:15:48,610 what they're going to do for us. 332 00:15:48,610 --> 00:15:51,210 So notice at the very top, we have the same text from before, 333 00:15:51,210 --> 00:15:52,590 in the great green room. 334 00:15:52,590 --> 00:15:54,960 And we split it up by word. 335 00:15:54,960 --> 00:15:59,460 So we're saying let's split up our text and make this list called words. 336 00:15:59,460 --> 00:16:02,700 But we have different loops down here to actually loop through those words 337 00:16:02,700 --> 00:16:05,242 and print those words out in a variety of different ways. 338 00:16:05,242 --> 00:16:06,690 So let's look at round one here. 339 00:16:06,690 --> 00:16:08,010 What might we see? 340 00:16:08,010 --> 00:16:11,130 And feel free to pause the video and write it out for yourself. 
341 00:16:11,130 --> 00:16:12,750 What might we see in this round one? 342 00:16:12,750 --> 00:16:17,150 343 00:16:17,150 --> 00:16:21,630 So we might see, as we saw before, every individual word being printed. 344 00:16:21,630 --> 00:16:25,580 So we might see first in, and then the, and then 345 00:16:25,580 --> 00:16:28,850 great, and then green, and then room. 346 00:16:28,850 --> 00:16:31,220 These are going to be on new lines because, as we know, 347 00:16:31,220 --> 00:16:36,050 print at the very end always includes that new line for us automatically. 348 00:16:36,050 --> 00:16:39,770 If we wanted to change that, let's say, print these all on the same line, 349 00:16:39,770 --> 00:16:45,740 like in the great green room on that same line, well, 350 00:16:45,740 --> 00:16:48,822 we could change the ending character of this print statement. 351 00:16:48,822 --> 00:16:49,530 We could do this. 352 00:16:49,530 --> 00:16:51,470 We could say end equals blank. 353 00:16:51,470 --> 00:16:53,720 Or end equals, in this case, it's going to be a space. 354 00:16:53,720 --> 00:16:58,760 So what'll happen here is Python will print out in, that very first word, 355 00:16:58,760 --> 00:17:02,420 add a space at the end, and then, print out that next word going 356 00:17:02,420 --> 00:17:04,670 and going and going through. 357 00:17:04,670 --> 00:17:06,950 So up to you, but we can maybe just leave this 358 00:17:06,950 --> 00:17:11,342 as printing out the individual words on single lines. 359 00:17:11,342 --> 00:17:12,800 Let's take a look at round two now. 360 00:17:12,800 --> 00:17:15,217 And feel free to pause the video and ask yourself, 361 00:17:15,217 --> 00:17:16,550 what might get printed out here? 362 00:17:16,550 --> 00:17:23,829 363 00:17:23,829 --> 00:17:26,619 So if we look through this, we have for word in words. 364 00:17:26,619 --> 00:17:28,980 It's the very same loop we had before. 
365 00:17:28,980 --> 00:17:32,680 We're getting access to every individual word in our list of words. 366 00:17:32,680 --> 00:17:39,720 So first in, then the, then great, then green, then room. 367 00:17:39,720 --> 00:17:41,790 But now, we have for c in word. 368 00:17:41,790 --> 00:17:46,260 And so recall, when we have a list we're looping over, like in this first loop, 369 00:17:46,260 --> 00:17:48,990 we're getting access to the elements of that list. 370 00:17:48,990 --> 00:17:52,330 But word now is simply an individual string. 371 00:17:52,330 --> 00:17:53,790 So what are we printing out here? 372 00:17:53,790 --> 00:17:58,030 Probably the individual characters inside of that particular word. 373 00:17:58,030 --> 00:18:03,510 So if we loop through this, we might see first I and then n. 374 00:18:03,510 --> 00:18:08,400 And then, would we see the space or would we not? 375 00:18:08,400 --> 00:18:11,308 Maybe take a guess? 376 00:18:11,308 --> 00:18:13,350 In this case, we actually wouldn't see the space. 377 00:18:13,350 --> 00:18:17,040 And that's because we've split our words into a list of words. 378 00:18:17,040 --> 00:18:20,010 And then, I actually try to print this out for you to see. 379 00:18:20,010 --> 00:18:26,070 I'll go ahead and copy this and I'll open up a new one called list.py. 380 00:18:26,070 --> 00:18:27,240 I'll paste this here. 381 00:18:27,240 --> 00:18:28,653 I'll print words. 382 00:18:28,653 --> 00:18:29,820 Let's go ahead and run this. 383 00:18:29,820 --> 00:18:31,830 Python of list.py. 384 00:18:31,830 --> 00:18:35,400 Well, we see now in the great green room. 385 00:18:35,400 --> 00:18:38,430 Notice how there aren't any spaces inside 386 00:18:38,430 --> 00:18:41,520 of these individual strings that are inside of our list. 387 00:18:41,520 --> 00:18:45,510 So if we go back here and we loop through the individual characters, 388 00:18:45,510 --> 00:18:50,050 first in this word, which is I and n, well, there are no spaces there. 
389 00:18:50,050 --> 00:18:52,020 And if we loop through this next word like, 390 00:18:52,020 --> 00:18:54,600 T-H-E, there are no spaces there, either. 391 00:18:54,600 --> 00:18:58,750 So we'll say T-H-E and so on and so forth. 392 00:18:58,750 --> 00:19:00,300 You can keep going like this. 393 00:19:00,300 --> 00:19:03,150 We're printing out every individual character except for the spaces, 394 00:19:03,150 --> 00:19:05,120 in this case. 395 00:19:05,120 --> 00:19:09,800 OK, let's look now at our third round where we have for word in words. 396 00:19:09,800 --> 00:19:10,940 Same thing we saw before. 397 00:19:10,940 --> 00:19:12,560 But now, if g-- 398 00:19:12,560 --> 00:19:16,520 this character g is in the word, what will we do? 399 00:19:16,520 --> 00:19:17,600 Print the word. 400 00:19:17,600 --> 00:19:21,890 So take a guess as to what you might see in this loop, keeping in mind that this 401 00:19:21,890 --> 00:19:25,340 is our text, in the great green room. 402 00:19:25,340 --> 00:19:26,345 What would we see here? 403 00:19:26,345 --> 00:19:32,090 404 00:19:32,090 --> 00:19:35,140 So you might see only the words that have g in them. 405 00:19:35,140 --> 00:19:38,590 And in Python, it's just this easy to ask if some string is 406 00:19:38,590 --> 00:19:39,760 part of another string. 407 00:19:39,760 --> 00:19:42,250 That's because we're asking if g, this string, 408 00:19:42,250 --> 00:19:45,830 will be inside this other string, this word right here. 409 00:19:45,830 --> 00:19:49,900 So let's go ahead and say in, it doesn't have any g's in it. 410 00:19:49,900 --> 00:19:51,760 The, no g's. 411 00:19:51,760 --> 00:19:54,610 Great has a g at the beginning, so maybe we print that. 412 00:19:54,610 --> 00:19:57,500 And green does too, but room does not. 413 00:19:57,500 --> 00:20:00,880 So we probably-- it would be safe to say that we would see great here 414 00:20:00,880 --> 00:20:07,060 and we would see green printed out to the screen on these new lines.
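Round three can be sketched like this. The words list is assumed to come from split(), and matches is a hypothetical list added only so the result can be inspected afterwards.

```python
# Assumed setup: the same list of words as in the earlier rounds.
words = "In the great green room".split()

matches = []  # hypothetical helper, not in the original program
for word in words:
    # "g" in word is True when the string "g" appears anywhere in word
    if "g" in word:
        matches.append(word)
        print(word)
```

The same in operator works for longer substrings too, for example "gre" in "green".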
415 00:20:07,060 --> 00:20:10,150 And now, we have perhaps some new syntax here. 416 00:20:10,150 --> 00:20:13,367 For word in words[2:]. 417 00:20:13,367 --> 00:20:15,958 418 00:20:15,958 --> 00:20:18,000 And maybe you're not familiar with that, but feel 419 00:20:18,000 --> 00:20:22,650 free to take a guess as to what you might think will happen here. 420 00:20:22,650 --> 00:20:23,745 Maybe pause the video. 421 00:20:23,745 --> 00:20:27,110 422 00:20:27,110 --> 00:20:31,642 And if we take a look at this, well, let's go back to our list.py file 423 00:20:31,642 --> 00:20:34,100 and try to get a grasp on what this is really doing for us. 424 00:20:34,100 --> 00:20:37,690 So we see words[2:]. 425 00:20:37,690 --> 00:20:39,490 And this is somewhat familiar to us. 426 00:20:39,490 --> 00:20:42,760 We've seen words[2] before. 427 00:20:42,760 --> 00:20:45,180 What this will do if I run it as Python of list.py. 428 00:20:45,180 --> 00:20:47,560 Well, that will give us just great, right? 429 00:20:47,560 --> 00:20:50,530 It will give us the second indexed element 430 00:20:50,530 --> 00:20:55,400 where this is 0, 1, and 2 in our list. 431 00:20:55,400 --> 00:20:56,750 Just point that out. 432 00:20:56,750 --> 00:21:00,870 But what if we wanted not just that element, but that one and all the rest 433 00:21:00,870 --> 00:21:01,370 after it? 434 00:21:01,370 --> 00:21:03,200 Well, Python comes with this fancy feature. 435 00:21:03,200 --> 00:21:06,890 We can say a colon here to say, get me that word at index 2 436 00:21:06,890 --> 00:21:08,638 and all the rest, right? 437 00:21:08,638 --> 00:21:09,430 So I could do this. 438 00:21:09,430 --> 00:21:12,680 I could say Python of list.py and see great, green, and room. 439 00:21:12,680 --> 00:21:18,080 Now, I've sliced my list into this smaller piece here. 440 00:21:18,080 --> 00:21:20,305 That only includes great and green and room. 441 00:21:20,305 --> 00:21:21,680 That's the technical term for it.
442 00:21:21,680 --> 00:21:22,310 Slicing. 443 00:21:22,310 --> 00:21:26,600 We're slicing a list using this colon syntax here. 444 00:21:26,600 --> 00:21:29,210 Now, I could even add an end state. 445 00:21:29,210 --> 00:21:30,770 Let's say I don't want room. 446 00:21:30,770 --> 00:21:32,690 I only want great and green. 447 00:21:32,690 --> 00:21:34,560 Well, I could modify my slice like this. 448 00:21:34,560 --> 00:21:38,090 I could say start at index 2, in this case, that's great, again, 449 00:21:38,090 --> 00:21:39,110 we're 0-indexed. 450 00:21:39,110 --> 00:21:41,780 0, 1, and 2. 451 00:21:41,780 --> 00:21:46,460 And then, go up to, but not including, that last index here. 452 00:21:46,460 --> 00:21:48,050 And that last index is 4. 453 00:21:48,050 --> 00:21:50,540 Again, 0, 1, 2, 3, 4. 454 00:21:50,540 --> 00:21:51,470 So we'll go here. 455 00:21:51,470 --> 00:21:53,910 2:4, Python of list.py. 456 00:21:53,910 --> 00:21:55,880 Now, I see great and green. 457 00:21:55,880 --> 00:22:00,770 This worked because the very first number we put in is inclusive. 458 00:22:00,770 --> 00:22:02,840 We're going to get this index back. 459 00:22:02,840 --> 00:22:06,720 So we're going to get this value here, this string here. 460 00:22:06,720 --> 00:22:08,480 And we're also going to get 3-- 461 00:22:08,480 --> 00:22:09,470 index 3. 462 00:22:09,470 --> 00:22:11,570 But we won't get index 4. 463 00:22:11,570 --> 00:22:13,100 This is exclusive. 464 00:22:13,100 --> 00:22:17,100 The very first number is inclusive, the last number is exclusive. 465 00:22:17,100 --> 00:22:21,945 So to get just, for example, great, we could also do this. 466 00:22:21,945 --> 00:22:23,570 But that's not really necessary, right? 467 00:22:23,570 --> 00:22:27,320 We only really need 2:4 to get great and green. 468 00:22:27,320 --> 00:22:32,563 Or if we wanted to, just 2: to get all the rest of them in this list, OK?
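The slices walked through above can be sketched as follows. The words list is an assumed reconstruction of list.py, and the last line (a one-element slice) is a guess at what "we could also do this" showed on screen.

```python
# Assumed reconstruction of list.py; the real file is not shown here.
words = "In the great green room".split()

print(words[2])    # a single element: great
print(words[2:])   # index 2 through the end: ['great', 'green', 'room']
print(words[2:4])  # index 2 up to, but not including, 4: ['great', 'green']
print(words[2:3])  # a one-element list: ['great']
```

Note that words[2] gives back a string, while words[2:3] gives back a list holding that one string.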
469 00:22:32,563 --> 00:22:35,230 So now that we've seen that, what do you think will happen here? 470 00:22:35,230 --> 00:22:38,800 Well, it will probably print out every individual word that is in this list 471 00:22:38,800 --> 00:22:41,270 only after the second index. 472 00:22:41,270 --> 00:22:44,150 So we'll probably see great-- 473 00:22:44,150 --> 00:22:49,280 we'll see great and green and we'll see room overall. 474 00:22:49,280 --> 00:22:50,570 OK, last one here. 475 00:22:50,570 --> 00:22:54,305 We have for word in words, print Goodnight Moon. 476 00:22:54,305 --> 00:22:57,305 Now, what do you think you will see here if you were to pause the video? 477 00:22:57,305 --> 00:23:00,890 478 00:23:00,890 --> 00:23:04,340 And maybe you've noticed that, for a Python loop, 479 00:23:04,340 --> 00:23:07,910 we have this idea of going through every element we have in our list. 480 00:23:07,910 --> 00:23:11,010 A lot of Python for loops are really built on lists. 481 00:23:11,010 --> 00:23:14,000 So if we have for word in words, well, our list, 482 00:23:14,000 --> 00:23:19,710 again, is the same one in list.py if I print out just words now. 483 00:23:19,710 --> 00:23:21,840 In the great green room. 484 00:23:21,840 --> 00:23:25,930 Well, this will actually iterate over every element in that list. 485 00:23:25,930 --> 00:23:32,490 We'll say, OK, first in, then the, then great, then green, then room. 486 00:23:32,490 --> 00:23:36,210 And it doesn't matter-- if we're not doing anything with word, 487 00:23:36,210 --> 00:23:37,990 we're just looping that many times. 488 00:23:37,990 --> 00:23:40,530 So we're going to loop, in this case, five times and print 489 00:23:40,530 --> 00:23:42,240 out Goodnight Moon. 490 00:23:42,240 --> 00:23:45,678 But it seems a little odd to do it that way. 491 00:23:45,678 --> 00:23:47,970 And actually, I think I might have a syntax error here. 
492 00:23:47,970 --> 00:23:54,100 I don't think we need these parentheses here for word in words. 493 00:23:54,100 --> 00:23:57,540 If we did this, well, why do we have to call it word? 494 00:23:57,540 --> 00:23:59,290 We're not really using word at all. 495 00:23:59,290 --> 00:24:03,845 So we could just say underscore, meaning that, look, this name doesn't matter. 496 00:24:03,845 --> 00:24:04,720 It could be anything. 497 00:24:04,720 --> 00:24:10,180 It could be z, it could be f, it could be zebra, whatever we want it to be. 498 00:24:10,180 --> 00:24:12,220 But we're not going to use this variable name, 499 00:24:12,220 --> 00:24:16,920 so let's just call it underscore just to signify that this variable name doesn't 500 00:24:16,920 --> 00:24:17,760 quite matter. 501 00:24:17,760 --> 00:24:20,550 All we're interested in doing is looping through it. 502 00:24:20,550 --> 00:24:22,545 It doesn't change the output of this for loop. 503 00:24:22,545 --> 00:24:24,420 It doesn't change anything about how it runs. 504 00:24:24,420 --> 00:24:25,837 It just changes the variable name. 505 00:24:25,837 --> 00:24:28,840 So it signifies we don't really care what the name is, in this case. 506 00:24:28,840 --> 00:24:31,130 So I'll change it back to for word in words. 507 00:24:31,130 --> 00:24:32,880 Let's go ahead and run this piece of code. 508 00:24:32,880 --> 00:24:34,410 We'll run Python of text.py. 509 00:24:34,410 --> 00:24:38,160 And now we'll see, in round five, five Goodnight Moons. 510 00:24:38,160 --> 00:24:40,110 So let's keep scrolling up again. 511 00:24:40,110 --> 00:24:44,280 Round one, we did see, in the great green room. 512 00:24:44,280 --> 00:24:49,590 Round two, we see all the individual characters in the great green room. 513 00:24:49,590 --> 00:24:53,480 Round three, we just see great and green. 514 00:24:53,480 --> 00:24:56,472 Round four, we see great green room.
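A small sketch of round five using the underscore convention described above. The words list is assumed, and count is a hypothetical counter added only to show how many times the loop runs.

```python
# Assumed setup: five words, so the loop body runs five times.
words = "In the great green room".split()

count = 0  # hypothetical counter, not in the original program
for _ in words:
    # _ is an ordinary variable name; by convention it signals
    # that the loop variable is never used in the body.
    print("Goodnight Moon")
    count += 1
```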
515 00:24:56,472 --> 00:24:58,430 And of course, in round five, as we saw before, 516 00:24:58,430 --> 00:25:02,890 we do see five instances of Goodnight Moon. 517 00:25:02,890 --> 00:25:06,470 OK, so that covers a lot of Python loops. 518 00:25:06,470 --> 00:25:09,220 And in general, if you're going to use this kind of Python syntax, 519 00:25:09,220 --> 00:25:10,720 I think you'll find it really handy. 520 00:25:10,720 --> 00:25:13,750 But it just takes some practice to get to know what each of these loops 521 00:25:13,750 --> 00:25:17,450 is doing and how they work with different data types, in this case. 522 00:25:17,450 --> 00:25:19,910 So let's keep going here. 523 00:25:19,910 --> 00:25:23,140 And let's take a look at this new Python data type. 524 00:25:23,140 --> 00:25:24,880 This one called dictionary. 525 00:25:24,880 --> 00:25:27,700 And dictionaries are really important in Python, very easy to use, 526 00:25:27,700 --> 00:25:31,300 and often very useful for us, as you'll see in the problem set this week. 527 00:25:31,300 --> 00:25:32,750 So a dictionary. 528 00:25:32,750 --> 00:25:36,350 Well, if we think of a dictionary, it's this piece of paper, so to speak. 529 00:25:36,350 --> 00:25:40,930 We might have some idea of hosting some keys and some values. 530 00:25:40,930 --> 00:25:44,240 Maybe words and their definitions, like a real dictionary has. 531 00:25:44,240 --> 00:25:47,530 So here we have, for example, maybe a dictionary of authors. 532 00:25:47,530 --> 00:25:50,830 So maybe Goodnight Moon is one of our keys. 533 00:25:50,830 --> 00:25:54,040 And the value associated with that key is Margaret Wise Brown. 534 00:25:54,040 --> 00:25:57,100 Or Corduroy, that's our key, the book title, 535 00:25:57,100 --> 00:26:01,210 is now associated with the value Don Freeman, that author there. 536 00:26:01,210 --> 00:26:03,760 And Curious George associated with H.A. Rey, 537 00:26:03,760 --> 00:26:07,640 where we have Curious George as a key and H.A.
Rey as the value. 538 00:26:07,640 --> 00:26:10,540 So what this gives us is this ability to look up, 539 00:26:10,540 --> 00:26:14,860 like in a dictionary, the actual author of some piece of text given the title. 540 00:26:14,860 --> 00:26:18,100 So we see, again, Goodnight Moon is an example of a key here 541 00:26:18,100 --> 00:26:20,900 and Margaret Wise Brown is an example of a value. 542 00:26:20,900 --> 00:26:26,710 So in this dictionary, authors, I could ask for a title-- like a book title. 543 00:26:26,710 --> 00:26:29,335 I could say, give me a book title and I'll say, Goodnight Moon. 544 00:26:29,335 --> 00:26:31,168 And the dictionary will actually give back 545 00:26:31,168 --> 00:26:34,270 the value associated with that key, like Margaret Wise Brown. 546 00:26:34,270 --> 00:26:39,190 Now, this example here is a collection of multiple objects 547 00:26:39,190 --> 00:26:40,540 in the same dictionary. 548 00:26:40,540 --> 00:26:43,592 We have multiple books here all in that same dictionary. 549 00:26:43,592 --> 00:26:46,300 We have Goodnight Moon, we have Corduroy, we have Curious George. 550 00:26:46,300 --> 00:26:49,810 But we can also have a single dictionary for a single object. 551 00:26:49,810 --> 00:26:53,560 We could also have, for example, a single book dictionary. 552 00:26:53,560 --> 00:26:56,302 And this dictionary has a title key and an author key, 553 00:26:56,302 --> 00:26:57,760 the two things that make it a book. 554 00:26:57,760 --> 00:26:59,020 It has a title and an author. 555 00:26:59,020 --> 00:27:01,450 Well, the title of this book is Goodnight Moon 556 00:27:01,450 --> 00:27:03,970 and the author is Margaret Wise Brown. 557 00:27:03,970 --> 00:27:08,170 So I could simply ask this dictionary for the title of the book by saying, 558 00:27:08,170 --> 00:27:09,220 give me the value-- 559 00:27:09,220 --> 00:27:10,408 the key title. 560 00:27:10,408 --> 00:27:11,950 I'm going to give you Goodnight Moon.
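To make the two shapes of dictionary concrete, here is a hedged sketch. It uses Python's curly-brace literal syntax, and the variable names authors and book are assumptions based on how the slides are described rather than code shown at this point.

```python
# One dictionary holding several books: each title maps to its author.
authors = {
    "Goodnight Moon": "Margaret Wise Brown",
    "Corduroy": "Don Freeman",
    "Curious George": "H.A. Rey",
}
print(authors["Goodnight Moon"])  # look up an author by title

# One dictionary describing a single book through two keys.
book = {"title": "Goodnight Moon", "author": "Margaret Wise Brown"}
print(book["title"])
print(book["author"])
```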
561 00:27:11,950 --> 00:27:13,783 I could also ask for the author of this book 562 00:27:13,783 --> 00:27:16,670 by asking for the value associated with the key author, 563 00:27:16,670 --> 00:27:20,590 and I get back Margaret Wise Brown, in this case. 564 00:27:20,590 --> 00:27:24,100 OK, so let's see an example of this in actual syntax. 565 00:27:24,100 --> 00:27:25,990 This is a pretty good theoretical overview, 566 00:27:25,990 --> 00:27:28,330 but let's think about it in syntax form. 567 00:27:28,330 --> 00:27:32,260 Here I have a new dictionary, book equals dict. 568 00:27:32,260 --> 00:27:34,300 And what this is doing for me is saying that, 569 00:27:34,300 --> 00:27:37,810 give me an empty dictionary, nothing in it yet, and call it book. 570 00:27:37,810 --> 00:27:42,410 So on the right hand side, I'll see this dictionary, a blank slate called book. 571 00:27:42,410 --> 00:27:46,302 Now, let's say I want to add in a key and a value that's associated with it. 572 00:27:46,302 --> 00:27:47,010 So I can do this. 573 00:27:47,010 --> 00:27:51,680 I can say, make sure you add this key called title 574 00:27:51,680 --> 00:27:53,180 and give it the value Corduroy. 575 00:27:53,180 --> 00:27:55,852 So this bracket notation is back. 576 00:27:55,852 --> 00:27:58,560 But now, to add a key to the dictionary, we're going to use that. 577 00:27:58,560 --> 00:28:03,290 So we're going to say book["title"] to insert a new key called title and give 578 00:28:03,290 --> 00:28:04,478 it the value Corduroy. 579 00:28:04,478 --> 00:28:06,020 And we can also do it for the author. 580 00:28:06,020 --> 00:28:10,440 We could say book["author"] is Don Freeman, in this case. 581 00:28:10,440 --> 00:28:14,990 So we're going to say that, in this case, this book's author is Don Freeman. 582 00:28:14,990 --> 00:28:18,290 Now, if I want to get back some value from my dictionary, I could do this. 583 00:28:18,290 --> 00:28:21,320 I could say, book, bracket, title, and print it out.
584 00:28:21,320 --> 00:28:22,880 And what would I get in this case? 585 00:28:22,880 --> 00:28:24,965 What do you think? 586 00:28:24,965 --> 00:28:25,840 I might get Corduroy. 587 00:28:25,840 --> 00:28:29,110 I would see Corduroy printed out to the screen down below. 588 00:28:29,110 --> 00:28:32,780 That's because saying book ["title"] is saying, 589 00:28:32,780 --> 00:28:38,380 look for the value associated with this key called title inside this dictionary 590 00:28:38,380 --> 00:28:41,280 that we're calling book, right? 591 00:28:41,280 --> 00:28:46,748 OK, now though, what if I asked for the key Corduroy? 592 00:28:46,748 --> 00:28:48,040 What do you think would happen? 593 00:28:48,040 --> 00:28:51,140 594 00:28:51,140 --> 00:28:55,075 Well, if we look at our dictionary, do we have a key in Corduroy? 595 00:28:55,075 --> 00:28:56,200 It doesn't look like we do. 596 00:28:56,200 --> 00:28:58,900 We only have a key for title and a key for author. 597 00:28:58,900 --> 00:29:02,650 Corduroy is a value, but it's not really a key, right? 598 00:29:02,650 --> 00:29:06,590 It's associated with this key title, but it isn't a key itself. 599 00:29:06,590 --> 00:29:09,160 So if we did this, we get what Python calls 600 00:29:09,160 --> 00:29:11,800 a key error where a key error is simply we're 601 00:29:11,800 --> 00:29:15,910 trying to access some key in our dictionary that doesn't exist. 602 00:29:15,910 --> 00:29:18,560 They'll tell us that this key is not part of dictionary. 603 00:29:18,560 --> 00:29:24,000 You can't look up the value for a key that doesn't exist, in this case. 604 00:29:24,000 --> 00:29:27,750 OK, so that gives us access to these dictionaries. 605 00:29:27,750 --> 00:29:30,010 We're going to make them in code. 606 00:29:30,010 --> 00:29:32,790 But what if we wanted to do something a little more advanced? 607 00:29:32,790 --> 00:29:36,330 This is a single book, but what if we had multiple books? 
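The steps above, from the empty dictionary through the failed lookup, can be sketched like this. The try/except wrapper is an addition for illustration so the example keeps running; without it, Python would stop with a KeyError traceback.

```python
book = dict()                   # an empty dictionary, nothing in it yet
book["title"] = "Corduroy"      # add the key "title" with its value
book["author"] = "Don Freeman"  # add the key "author" with its value

print(book["title"])  # looks up the value stored under "title"

# "Corduroy" is a value, not a key, so this lookup fails.
try:
    print(book["Corduroy"])
except KeyError as error:
    print("KeyError:", error)
```

When a key might be missing, book.get("Corduroy") is another option: it returns None instead of raising an error.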
608 00:29:36,330 --> 00:29:39,720 Well, we could maybe shorten our syntax a little bit here. 609 00:29:39,720 --> 00:29:43,980 We could say that if I want a new book, let's just define it all in one breath. 610 00:29:43,980 --> 00:29:47,610 Here we have a new dictionary denoted by these curly braces now. 611 00:29:47,610 --> 00:29:49,900 Not square brackets, but curly braces. 612 00:29:49,900 --> 00:29:54,210 And we have the key, like title and the value, like Goodnight Moon, 613 00:29:54,210 --> 00:29:58,080 and the key author and the value Margaret Wise Brown. 614 00:29:58,080 --> 00:30:01,410 To give you the full picture here it looks like-- a bit like this. 615 00:30:01,410 --> 00:30:03,230 But again, this is only one book. 616 00:30:03,230 --> 00:30:06,170 So how could I actually get multiple books to actually represent 617 00:30:06,170 --> 00:30:07,830 multiple books in our code? 618 00:30:07,830 --> 00:30:10,432 Well, we could keep this same style of dictionary 619 00:30:10,432 --> 00:30:12,140 where we have a dictionary for every book 620 00:30:12,140 --> 00:30:14,660 and it has the two keys, title and author. 621 00:30:14,660 --> 00:30:17,503 We could also make a list of them-- a list of dictionaries. 622 00:30:17,503 --> 00:30:19,295 So let's take a look at this. We could see-- 623 00:30:19,295 --> 00:30:22,150 OK, here I have this list. 624 00:30:22,150 --> 00:30:27,230 And how do you know it's a list if you take a look at this piece of code? 625 00:30:27,230 --> 00:30:29,390 Well, we see those square brackets on either end. 626 00:30:29,390 --> 00:30:30,560 Again, the square brackets that are there 627 00:30:30,560 --> 00:30:32,880 at the beginning and square brackets at the end. 628 00:30:32,880 --> 00:30:35,900 And we also see we have some commas in the middle, 629 00:30:35,900 --> 00:30:37,380 just to highlight that here. 630 00:30:37,380 --> 00:30:39,350 This is a list.
631 00:30:39,350 --> 00:30:43,130 But inside of this list, instead of individual strings, 632 00:30:43,130 --> 00:30:46,490 for example, as we saw earlier, we now have full dictionaries. 633 00:30:46,490 --> 00:30:50,480 We have this dictionary for Goodnight Moon, this dictionary for Corduroy, 634 00:30:50,480 --> 00:30:53,180 and this dictionary for Curious George. 635 00:30:53,180 --> 00:30:58,070 So this is helpful for us because we can represent all kinds of different things 636 00:30:58,070 --> 00:31:01,400 using dictionaries, but make sure we have multiple of them 637 00:31:01,400 --> 00:31:05,610 by keeping a list of these very same dictionaries. 638 00:31:05,610 --> 00:31:07,010 So let's get some practice here. 639 00:31:07,010 --> 00:31:08,300 Let's go to books.py. 640 00:31:08,300 --> 00:31:11,480 And inside of books.py, we'll actually make 641 00:31:11,480 --> 00:31:15,020 sure we can prompt the user for a title of book and an author. 642 00:31:15,020 --> 00:31:19,260 And we'll add that to our bookshelf, which is a list of books, in this case. 643 00:31:19,260 --> 00:31:20,670 So let's go back over here. 644 00:31:20,670 --> 00:31:23,210 And let's close out some old files and maybe we'll 645 00:31:23,210 --> 00:31:27,310 go ahead and code up books.py. 646 00:31:27,310 --> 00:31:30,940 So notice how we have part of our code imprinted for us. 647 00:31:30,940 --> 00:31:33,160 We have this list of books that is empty. 648 00:31:33,160 --> 00:31:37,090 And this is a list, again, because it's simply two empty square braces. 649 00:31:37,090 --> 00:31:40,300 Nothing inside this list, but there could be eventually. 650 00:31:40,300 --> 00:31:44,140 And now, we have this for loop for i in range 3. 651 00:31:44,140 --> 00:31:48,040 We saw range in lecture, which simply gives us a list from 0 all the way up 652 00:31:48,040 --> 00:31:49,280 to 2, in this case. 653 00:31:49,280 --> 00:31:52,780 We can do 0, 1, 2, loop three times. 
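As a sketch of the list of dictionaries on the slide, plus what range(3) produces: the exact slide contents are not reproduced in the transcript, so the entries below are an assumed reconstruction.

```python
# Square brackets make the list; each curly-brace block is one dictionary.
books = [
    {"title": "Goodnight Moon", "author": "Margaret Wise Brown"},
    {"title": "Corduroy", "author": "Don Freeman"},
    {"title": "Curious George", "author": "H.A. Rey"},
]

print(len(books))          # three dictionaries in the list
print(books[1]["author"])  # index into the list, then into the dictionary

# range(3) yields 0, 1, 2, so a loop over it runs three times.
print(list(range(3)))
```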
654 00:31:52,780 --> 00:31:56,890 And inside this loop, we'll actually make sure we have our new dictionary 655 00:31:56,890 --> 00:31:58,885 and we add it to our list of books. 656 00:31:58,885 --> 00:32:00,760 And finally, at the very end, we'll print out 657 00:32:00,760 --> 00:32:03,890 our bookshelf, our list of books, in this case. 658 00:32:03,890 --> 00:32:06,775 So if we wanted to start here, well, I can go back to my syntax 659 00:32:06,775 --> 00:32:07,900 as we saw before. 660 00:32:07,900 --> 00:32:09,280 If I go back here and I see-- 661 00:32:09,280 --> 00:32:13,840 if I want to make a new dictionary, I just need to ask for a blank dict. 662 00:32:13,840 --> 00:32:18,670 So I'll go back over here and I'll say, give me a new dictionary called book, 663 00:32:18,670 --> 00:32:20,060 in this case. 664 00:32:20,060 --> 00:32:23,150 And maybe I want to add a key to this dictionary. 665 00:32:23,150 --> 00:32:27,022 So think to yourself, what is the syntax for that? 666 00:32:27,022 --> 00:32:28,230 And it looks a bit like this. 667 00:32:28,230 --> 00:32:31,350 We could say, OK, I want to have a title here. 668 00:32:31,350 --> 00:32:34,350 The key to this dictionary will be title. 669 00:32:34,350 --> 00:32:37,800 And I'll say, make sure that the title is whatever the user inputs. 670 00:32:37,800 --> 00:32:40,950 And I'll ask them for a title. 671 00:32:40,950 --> 00:32:45,600 And similarly, I could also say, OK, let's add an author to this book. 672 00:32:45,600 --> 00:32:50,170 And I'll say, the input for that will be asking the user for an author. 673 00:32:50,170 --> 00:32:52,320 So now, we have this blank dictionary. 674 00:32:52,320 --> 00:32:55,110 We've asked the user to give us a new key 675 00:32:55,110 --> 00:32:59,430 for the-- a new value for the key title, and similarly, 676 00:32:59,430 --> 00:33:01,620 a new value for the key author. 677 00:33:01,620 --> 00:33:05,220 We've made these keys and given them some value from the user.
678 00:33:05,220 --> 00:33:07,388 All right, we have our book. 679 00:33:07,388 --> 00:33:09,180 We could even print it out if we wanted to. 680 00:33:09,180 --> 00:33:11,480 You can print book down here. 681 00:33:11,480 --> 00:33:15,510 Let's run Python of books.py. 682 00:33:15,510 --> 00:33:24,060 We'll say let's get, in this case, Goodnight Moon by Margaret Wise Brown. 683 00:33:24,060 --> 00:33:28,320 And we see that, down below here, we do have that dictionary being printed out. 684 00:33:28,320 --> 00:33:31,650 And it's a dictionary because we see it has these curly braces on either end 685 00:33:31,650 --> 00:33:34,480 and it has these keys associated with these values. 686 00:33:34,480 --> 00:33:37,170 So let's actually end our program here. 687 00:33:37,170 --> 00:33:40,812 What we can do is try to add this book to our list. 688 00:33:40,812 --> 00:33:43,020 And if you are not familiar with this yet, that's OK. 689 00:33:43,020 --> 00:33:47,320 We can actually go ahead and say books.append to add to our list. 690 00:33:47,320 --> 00:33:48,990 So currently, our list is empty. 691 00:33:48,990 --> 00:33:53,610 But we could use this method associated with a list called append to actually 692 00:33:53,610 --> 00:33:55,650 insert this book into our list. 693 00:33:55,650 --> 00:33:58,410 books.append an individual book. 694 00:33:58,410 --> 00:33:59,470 So let's try this. 695 00:33:59,470 --> 00:34:01,220 Let's actually go ahead and run books.py. 696 00:34:01,220 --> 00:34:03,090 We'll do Python of books.py. 697 00:34:03,090 --> 00:34:05,190 Let's go ahead and add Goodnight Moon. 698 00:34:05,190 --> 00:34:07,890 This one written by, let's say, CS50. 699 00:34:07,890 --> 00:34:10,050 We could also add Corduroy. 700 00:34:10,050 --> 00:34:12,300 And maybe, [INAUDIBLE] CS52.
701 00:34:12,300 --> 00:34:14,850 And maybe we could add-- 702 00:34:14,850 --> 00:34:16,710 oh, I don't know, we can add Curious George, 703 00:34:16,710 --> 00:34:19,150 and that one is also written by CS50. 704 00:34:19,150 --> 00:34:21,210 So now we see down below-- 705 00:34:21,210 --> 00:34:22,530 if I go full screen-- 706 00:34:22,530 --> 00:34:24,193 we have this list of dictionaries. 707 00:34:24,193 --> 00:34:25,110 So what we saw before. 708 00:34:25,110 --> 00:34:29,110 We have the brackets starting our list on either end. 709 00:34:29,110 --> 00:34:31,170 The commas separate our dictionaries. 710 00:34:31,170 --> 00:34:37,500 And on the inside, we have this dictionary, this dictionary, 711 00:34:37,500 --> 00:34:40,120 and this dictionary down below here. 712 00:34:40,120 --> 00:34:43,889 So now, we have some individual books on our bookshelf. 713 00:34:43,889 --> 00:34:44,915 And now, what can we do? 714 00:34:44,915 --> 00:34:47,040 We could maybe decide to print out just the titles. 715 00:34:47,040 --> 00:34:48,570 I can go back through my books. 716 00:34:48,570 --> 00:34:52,949 I could say for book in books, looping through my list of books, 717 00:34:52,949 --> 00:34:56,547 print out the book's title like this. 718 00:34:56,547 --> 00:34:58,380 So now, instead of printing the entire list, 719 00:34:58,380 --> 00:34:59,670 I could print out just the titles. 720 00:34:59,670 --> 00:35:00,337 I could do this. 721 00:35:00,337 --> 00:35:02,100 I could say Python of books.py. 722 00:35:02,100 --> 00:35:07,440 I'll say Goodnight Moon by CS50. 723 00:35:07,440 --> 00:35:11,550 I'll say Corduroy by CS50. 724 00:35:11,550 --> 00:35:17,900 And I will say, in this case, Curious George by CS50. 725 00:35:17,900 --> 00:35:21,470 And now, I see Goodnight Moon, Corduroy, and Curious George. 
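A hedged sketch of the books.py program built up above. The real file calls input() directly inside the loop; here the loop is wrapped in a hypothetical function, collect_books, with a hypothetical ask parameter, so the same logic can be run with canned answers instead of typing at a prompt.

```python
def collect_books(count, ask=input):
    """Prompt for `count` books and return them as a list of dictionaries.

    collect_books and ask are illustrative names, not from the lecture.
    """
    books = []
    for i in range(count):
        book = dict()
        book["title"] = ask("Title: ")
        book["author"] = ask("Author: ")
        books.append(book)  # add this dictionary to the end of the list
    return books


# Canned answers standing in for what the user would type.
answers = iter([
    "Goodnight Moon", "CS50",
    "Corduroy", "CS50",
    "Curious George", "CS50",
])
books = collect_books(3, ask=lambda prompt: next(answers))

# Print only the titles rather than the whole list.
for book in books:
    print(book["title"])
```

Calling collect_books(3) with no second argument would fall back to input and prompt at the terminal, which is closer to what the section's program does.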
726 00:35:21,470 --> 00:35:24,770 So helpful because we're able to structure our data inside 727 00:35:24,770 --> 00:35:28,500 of a dictionary and put that inside of our list here. 728 00:35:28,500 --> 00:35:32,720 OK, so what if we wanted to make sure the user 729 00:35:32,720 --> 00:35:35,990 couldn't type in a really awkward title. 730 00:35:35,990 --> 00:35:38,030 Like let's say, if I do this again, I might 731 00:35:38,030 --> 00:35:44,240 type in space, space, Goodnight Moon. 732 00:35:44,240 --> 00:35:46,670 And that isn't quite what I want, right? 733 00:35:46,670 --> 00:35:50,272 That isn't really good on me as a user to give this kind of data, 734 00:35:50,272 --> 00:35:51,980 but it's also not good of the programmer 735 00:35:51,980 --> 00:35:54,772 to assume that users are going to give me exactly what I want here. 736 00:35:54,772 --> 00:35:57,890 So instead of just accepting user input, I actually 737 00:35:57,890 --> 00:36:00,570 could go through and try to sanitize it a little bit. 738 00:36:00,570 --> 00:36:01,658 Make sure to clean it up. 739 00:36:01,658 --> 00:36:03,950 So I actually have it in the format I want it to be in. 740 00:36:03,950 --> 00:36:07,310 I could instead say, OK, whenever I get this input, 741 00:36:07,310 --> 00:36:11,640 I want to afterwards strip the white space. 742 00:36:11,640 --> 00:36:14,640 And I also want to-- and then capitalize it. 743 00:36:14,640 --> 00:36:17,610 So what this is doing here is stringing this dot notation together. 744 00:36:17,610 --> 00:36:21,860 So here I have input, which gives me back a string. 745 00:36:21,860 --> 00:36:25,640 And remember, strings have access to these methods, like dot strip 746 00:36:25,640 --> 00:36:27,080 and dot capitalize. 747 00:36:27,080 --> 00:36:30,020 So first, we're going to get some input from the user, some string. 748 00:36:30,020 --> 00:36:31,760 We're going to strip it using this.
749 00:36:31,760 --> 00:36:34,010 And then, we're going to run capitalize on it like this. 750 00:36:34,010 --> 00:36:35,010 So let's try that again. 751 00:36:35,010 --> 00:36:39,230 Let's do, in this case, print out just the title right after we get it. 752 00:36:39,230 --> 00:36:42,048 We'll print out book title. 753 00:36:42,048 --> 00:36:43,340 And let's go ahead and do this. 754 00:36:43,340 --> 00:36:45,560 We'd say Python of books.py. 755 00:36:45,560 --> 00:36:52,970 We'll say Goodnight Moon. 756 00:36:52,970 --> 00:36:54,620 And it's a little better, right? 757 00:36:54,620 --> 00:36:58,190 We capitalized the G in Goodnight Moon and we made sure 758 00:36:58,190 --> 00:37:00,170 that everything else is in lowercase. 759 00:37:00,170 --> 00:37:05,200 And there's no white space in front of anything we added here. 760 00:37:05,200 --> 00:37:08,540 OK, so just some handy syntax for cleaning up user input 761 00:37:08,540 --> 00:37:11,390 and making sure that you can make sure your data 762 00:37:11,390 --> 00:37:16,020 is formatted correctly in your own programs here. 763 00:37:16,020 --> 00:37:21,150 OK, so now that we have some dictionaries and this ability 764 00:37:21,150 --> 00:37:23,310 to represent data in this way, we can think 765 00:37:23,310 --> 00:37:26,380 of getting a little more advanced with our programs. 766 00:37:26,380 --> 00:37:29,040 If I go back to our slides, we might think 767 00:37:29,040 --> 00:37:33,210 of not just getting this shelf of books that the user types in, 768 00:37:33,210 --> 00:37:35,157 but really using some data in our programs. 769 00:37:35,157 --> 00:37:37,740 And we'll see this in action during the problem set this week. 770 00:37:37,740 --> 00:37:41,700 How do we get data and use it inside of our programs, especially using Python. 771 00:37:41,700 --> 00:37:45,480 Well, you can think of these libraries and these modules, 772 00:37:45,480 --> 00:37:48,090 where a library is some code somebody else has written.
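Looking back at the input clean-up step, the chained methods can be sketched as follows. A fixed string stands in for what input() would return, so raw_title is an assumed example value rather than real user input.

```python
# Stand-in for what the user typed, with leading spaces and no capitals.
raw_title = "  goodnight moon"

# strip() removes whitespace at both ends; capitalize() then uppercases
# the first character and lowercases the rest.
title = raw_title.strip().capitalize()
print(title)  # Goodnight moon
```

In the program itself the same chain would hang off the input call, along the lines of input("Title: ").strip().capitalize(). If every word should be capitalized instead, str.title() is one alternative.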
773 00:37:48,090 --> 00:37:52,230 And in Python, more specifically, we often call this an individual module. 774 00:37:52,230 --> 00:37:54,210 And so, in this example, we'll actually see 775 00:37:54,210 --> 00:37:58,960 a CSV module to work with data that's inside of a CSV file. 776 00:37:58,960 --> 00:38:00,540 But what is a CSV file? 777 00:38:00,540 --> 00:38:04,710 So on your computer, maybe you have Excel or Google Sheets or something 778 00:38:04,710 --> 00:38:05,280 like that. 779 00:38:05,280 --> 00:38:08,730 And you could store data in different rows and different columns. 780 00:38:08,730 --> 00:38:12,480 So notice how here I have a title column and an author column 781 00:38:12,480 --> 00:38:13,993 and individual rows for every book. 782 00:38:13,993 --> 00:38:17,160 So I see Goodnight Moon with Margaret Wise Brown, Corduroy with Don Freeman, 783 00:38:17,160 --> 00:38:21,310 all the way down for these 15 or so books that I have in my data set. 784 00:38:21,310 --> 00:38:24,330 So this is what it looks like in Google Sheets or Excel. 785 00:38:24,330 --> 00:38:27,960 But under the hood, in the actual computer's file, 786 00:38:27,960 --> 00:38:33,660 you'll see something that looks a bit like this with title,author Goodnight Moon, 787 00:38:33,660 --> 00:38:36,930 Margaret Wise Brown, and Corduroy, Don Freeman. 788 00:38:36,930 --> 00:38:40,680 So a CSV stands for Comma Separated Values, 789 00:38:40,680 --> 00:38:43,815 where notice how every individual row is a single book 790 00:38:43,815 --> 00:38:45,690 except for that first one, which is the row-- 791 00:38:45,690 --> 00:38:47,550 is the column titles. 792 00:38:47,550 --> 00:38:52,200 And for every row we have, we have multiple columns 793 00:38:52,200 --> 00:38:54,000 separated by these commas.
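To make the "under the hood" view concrete, here is a sketch that treats a CSV as plain text. The csv_text string is an assumed two-book excerpt, not the full data set, and splitting on commas by hand is shown only to expose the structure.

```python
# A CSV file is just text: one header row, then one row per book.
csv_text = (
    "title,author\n"
    "Goodnight Moon,Margaret Wise Brown\n"
    "Corduroy,Don Freeman\n"
)

lines = csv_text.splitlines()
header = lines[0].split(",")     # the column titles
first_row = lines[1].split(",")  # the first book's values

print(header)     # ['title', 'author']
print(first_row)  # ['Goodnight Moon', 'Margaret Wise Brown']
```

Naive splitting like this breaks as soon as a value itself contains a comma, which is one reason to lean on a library written for the format.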
794 00:38:54,000 --> 00:38:56,310 So Goodnight Moon is a title of this book 795 00:38:56,310 --> 00:38:59,520 and Margaret Wise Brown is the author of this book that's on the same row 796 00:38:59,520 --> 00:39:00,310 right here. 797 00:39:00,310 --> 00:39:03,480 And similarly, Winnie the Pooh is the title of this book and A.A. 798 00:39:03,480 --> 00:39:05,580 Milne is title of this book-- 799 00:39:05,580 --> 00:39:08,350 is the author of this book on that same row. 800 00:39:08,350 --> 00:39:12,770 OK, so to read in these kinds of files, we 801 00:39:12,770 --> 00:39:15,590 might want to use a specialized system that understands 802 00:39:15,590 --> 00:39:17,750 how this data is formatted, right? 803 00:39:17,750 --> 00:39:20,690 It would be a lot of work for you to go through and parse every comma. 804 00:39:20,690 --> 00:39:22,790 To figure out, OK, if there's a comma here, 805 00:39:22,790 --> 00:39:26,120 I need to put this piece of data in that dictionary or this dictionary. 806 00:39:26,120 --> 00:39:27,090 Let's not do that. 807 00:39:27,090 --> 00:39:30,390 Let's actually rely on somebody else who's done that work for us here. 808 00:39:30,390 --> 00:39:35,300 So in Python, there is this CSV library, or CSV module, 809 00:39:35,300 --> 00:39:40,010 that has various methods or functions given inside of it 810 00:39:40,010 --> 00:39:42,540 that can help us read CSV files. 811 00:39:42,540 --> 00:39:45,980 So here, if you go to the Python documentation, docs.python.org, 812 00:39:45,980 --> 00:39:48,230 and you look at this CSV module, you'll be 813 00:39:48,230 --> 00:39:50,510 able to see all the kinds of information on what 814 00:39:50,510 --> 00:39:55,268 is defined inside the CSV module and what you get as part of that module. 815 00:39:55,268 --> 00:39:56,810 Now, how would I use this in my code? 816 00:39:56,810 --> 00:39:57,768 We saw this in lecture. 
817 00:39:57,768 --> 00:40:01,340 I could simply write import csv, similar to 818 00:40:01,340 --> 00:40:04,130 #include <stdio.h> or #include <cs50.h>. 819 00:40:04,130 --> 00:40:07,100 Here, I'm simply including, or importing, 820 00:40:07,100 --> 00:40:10,130 the CSV library that contains all this functionality I 821 00:40:10,130 --> 00:40:13,160 saw in the documentation. 822 00:40:13,160 --> 00:40:15,070 So we can think visually of this. 823 00:40:15,070 --> 00:40:17,740 It's a bit like getting a big box of stuff. 824 00:40:17,740 --> 00:40:22,960 We have this big box of code we can use in our program now called CSV. 825 00:40:22,960 --> 00:40:24,460 This is giving us a big box of code. 826 00:40:24,460 --> 00:40:27,400 And inside of that are some individual functions we could use. 827 00:40:27,400 --> 00:40:30,970 We could use maybe DictReader, DictWriter, reader, or writer. 828 00:40:30,970 --> 00:40:35,140 All this is defined inside of the CSV library. 829 00:40:35,140 --> 00:40:39,130 But how do we know from this big box of stuff 830 00:40:39,130 --> 00:40:41,990 what we actually want to use in our program? 831 00:40:41,990 --> 00:40:46,300 So if we just import the entire module, this entire big box of stuff, 832 00:40:46,300 --> 00:40:49,630 well, to be more specific about what we want to use, we have to use that dot syntax. 833 00:40:49,630 --> 00:40:53,870 We could say something like this. csv.DictReader, for example, 834 00:40:53,870 --> 00:40:57,430 to read our CSV as this collection of dictionaries. 835 00:40:57,430 --> 00:41:01,810 We could say csv.DictReader saying, go to that big box of stuff 836 00:41:01,810 --> 00:41:04,960 in the CSV module and give me the DictReader part of it-- 837 00:41:04,960 --> 00:41:07,210 the DictReader function, right? 838 00:41:07,210 --> 00:41:11,150 We could also do csv.reader to get the reader aspect and so on.
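A minimal sketch of the dot syntax just described. The `data` string and the `io.StringIO` wrapper are assumptions for illustration only; they stand in for a real books.csv so the snippet runs on its own.

```python
import csv
import io

# Assumed sample data: the header row plus the first two books from the slide.
data = "title,author\nGoodnight Moon,Margaret Wise Brown\nCorduroy,Don Freeman\n"

# "Go to the csv module and give me DictReader": module name, dot, function name.
dict_rows = list(csv.DictReader(io.StringIO(data)))

# The same dot syntax reaches any other name in the module, such as reader.
list_rows = list(csv.reader(io.StringIO(data)))

print(dict_rows[0]["title"])
print(list_rows[1])
```

Both calls come out of the same imported module; only the name after the dot changes.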
839 00:41:11,150 --> 00:41:14,020 So this dot syntax is coming back, but now, it's 840 00:41:14,020 --> 00:41:18,530 enabling us to access individual parts of our module. 841 00:41:18,530 --> 00:41:23,575 But let's say we don't want the entirety of this entire big box of CSV module. 842 00:41:23,575 --> 00:41:24,950 We don't want everything in here. 843 00:41:24,950 --> 00:41:26,540 Well, we could do this also. 844 00:41:26,540 --> 00:41:30,470 We could say, instead of import CSV as a whole, well, we could just say, 845 00:41:30,470 --> 00:41:34,643 give me from the CSV library the DictReader portion. 846 00:41:34,643 --> 00:41:35,310 I could do this. 847 00:41:35,310 --> 00:41:38,480 I could simply use now from here on out just DictReader. 848 00:41:38,480 --> 00:41:43,250 So from CSV, from this big box of stuff, import just DictReader. 849 00:41:43,250 --> 00:41:47,180 And then, I can simply use DictReader without qualifying where it comes from 850 00:41:47,180 --> 00:41:50,500 or what module it's part of just using DictReader now. 851 00:41:50,500 --> 00:41:54,440 Now, in general, we might prefer to actually do this the other way, 852 00:41:54,440 --> 00:41:55,970 to use it this way. 853 00:41:55,970 --> 00:41:58,153 And why do you think that might be? 854 00:41:58,153 --> 00:41:59,445 Think to yourself for a moment. 855 00:41:59,445 --> 00:42:03,870 856 00:42:03,870 --> 00:42:06,000 Well, it's often handy to do it this way. 857 00:42:06,000 --> 00:42:07,890 Because we do it this way, we actually are 858 00:42:07,890 --> 00:42:10,620 able to make sure we don't collide our name. 859 00:42:10,620 --> 00:42:13,410 So maybe my own program has this function 860 00:42:13,410 --> 00:42:16,290 called reader, by chance, right? 861 00:42:16,290 --> 00:42:20,910 Here, if I say csv.reader, that differentiates this reader function 862 00:42:20,910 --> 00:42:22,630 from my own reader function. 
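To make the name-collision point concrete, here is a small sketch. The local `reader` function and the sample `data` string are hypothetical, invented only for this illustration.

```python
import csv
import io
from csv import DictReader  # import just one name out of the module


def reader(text):
    """A hypothetical function of our own that happens to be called reader."""
    return text.upper()


# Assumed sample data standing in for books.csv.
data = "title,author\nCorduroy,Don Freeman\n"

# The qualified name csv.reader cannot be confused with our own reader.
rows = list(csv.reader(io.StringIO(data)))
shout = reader("corduroy")

# DictReader was imported by name, so it needs no csv. prefix.
books = list(DictReader(io.StringIO(data)))

print(rows)
print(shout)
print(books[0]["author"])
```

Had the script used `from csv import reader` instead, the later `def reader` would have replaced the imported name, which is the collision the qualified form avoids.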
863 00:42:22,630 --> 00:42:25,380 So it's helpful if you're actually defining your own function that 864 00:42:25,380 --> 00:42:28,980 might collide names with the functions you get from other modules in Python. 865 00:42:28,980 --> 00:42:30,750 But you can, of course, do it this way, 866 00:42:30,750 --> 00:42:32,792 if you'd like, and you're very certain you don't have 867 00:42:32,792 --> 00:42:35,800 any function called DictReader here. 868 00:42:35,800 --> 00:42:41,560 OK, so let's see some of the differences between using these various functions 869 00:42:41,560 --> 00:42:43,750 inside of this CSV library. 870 00:42:43,750 --> 00:42:47,170 So we saw here before we had DictReader, DictWriter, reader, and writer. 871 00:42:47,170 --> 00:42:48,920 But what are the differences between these, 872 00:42:48,920 --> 00:42:51,020 and why would we even use one versus another? 873 00:42:51,020 --> 00:42:55,230 So to do that, let's actually dive into reading files and maybe writing 874 00:42:55,230 --> 00:42:55,730 to them. 875 00:42:55,730 --> 00:43:00,160 So let's actually go back to our code space now and think about this CSV. 876 00:43:00,160 --> 00:43:02,643 We have books.csv, 877 00:43:02,643 --> 00:43:04,060 and it's the same thing we saw before. 878 00:43:04,060 --> 00:43:07,420 We have title, author as our column names, 879 00:43:07,420 --> 00:43:13,460 and we have the title and the author on individual rows separated by commas. 880 00:43:13,460 --> 00:43:17,800 So if I want to read these, I'll say, code reads.py. 881 00:43:17,800 --> 00:43:21,820 I now have this list of books and I've imported the CSV library 882 00:43:21,820 --> 00:43:27,270 so I can read books from this CSV and add them to my shelf, so to speak. 883 00:43:27,270 --> 00:43:32,340 So to open a file in Python, well, there's a few different ways to do it. 884 00:43:32,340 --> 00:43:33,860 I could simply just call open. 885 00:43:33,860 --> 00:43:37,190 I could say open books.csv.
886 00:43:37,190 --> 00:43:41,870 But if I do that, I later have to do something with the file here. 887 00:43:41,870 --> 00:43:44,772 I'll say it like, file equals open. 888 00:43:44,772 --> 00:43:45,980 And then, I have to close it. 889 00:43:45,980 --> 00:43:49,603 I could say close file like this. 890 00:43:49,603 --> 00:43:50,270 Or is it fclose? 891 00:43:50,270 --> 00:43:52,942 892 00:43:52,942 --> 00:43:53,900 Pretty sure it's close. 893 00:43:53,900 --> 00:43:56,150 But you can double check me on that. 894 00:43:56,150 --> 00:43:58,460 But we actually 895 00:43:58,460 --> 00:44:02,150 don't have to worry about this at all if we just say let's not just open 896 00:44:02,150 --> 00:44:03,120 it like this. 897 00:44:03,120 --> 00:44:06,320 Let's actually open it within a certain context. 898 00:44:06,320 --> 00:44:08,045 Only open it for a little bit. 899 00:44:08,045 --> 00:44:10,920 And once we're done with that file, go ahead and close it afterwards. 900 00:44:10,920 --> 00:44:15,890 We can say with open this file and let's call it something, like file, 901 00:44:15,890 --> 00:44:16,500 in this case. 902 00:44:16,500 --> 00:44:21,890 So we're going to open books.csv and call it file with open this file 903 00:44:21,890 --> 00:44:22,880 as file. 904 00:44:22,880 --> 00:44:25,220 Let's do the following code indented. 905 00:44:25,220 --> 00:44:28,800 So while we're indented here, our file will be open. 906 00:44:28,800 --> 00:44:30,630 We can do all kinds of things with it. 907 00:44:30,630 --> 00:44:35,130 But once we unindent, we go back out, our file will be closed. 908 00:44:35,130 --> 00:44:37,850 We can't do anything more with it here. 909 00:44:37,850 --> 00:44:40,760 So this takes care of running close on our file 910 00:44:40,760 --> 00:44:42,760 or figuring out when to open it versus close it. 911 00:44:42,760 --> 00:44:47,900 Python handles all of that now for us using this with syntax here.
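The two ways of opening a file can be compared side by side in a short sketch. In Python, closing is a method on the file object, `file.close()`, rather than a standalone `fclose` as in C. The temporary directory and the sample contents below are assumptions, used only so the example runs without an existing books.csv.

```python
import os
import tempfile

# Assumed setup: write a tiny stand-in for books.csv into a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
with open(path, "w") as file:
    file.write("title,author\nGoodnight Moon,Margaret Wise Brown\n")

# Manual approach: open, use, and remember to close.
file = open(path)
text = file.read()
file.close()

# Context-manager approach: the file is open only inside the indented block.
with open(path) as file:
    text_again = file.read()
    open_inside = not file.closed

# Back at the outer indentation level, Python has already closed the file.
print(open_inside)
print(file.closed)
```

Both approaches read the same text; the `with` form simply removes the need to remember the `close()` call.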
912 00:44:47,900 --> 00:44:50,490 OK, so let's see this in a little more depth. 913 00:44:50,490 --> 00:44:51,620 We saw with open, 914 00:44:51,620 --> 00:44:53,488 whatever file name we have, as file. 915 00:44:53,488 --> 00:44:54,530 You can change file here. 916 00:44:54,530 --> 00:44:59,130 We can also call this maybe even books_file. 917 00:44:59,130 --> 00:45:01,230 Doesn't have to be just file. 918 00:45:01,230 --> 00:45:03,530 But here, we'll call it file. 919 00:45:03,530 --> 00:45:06,330 We can then read it in a few different ways. 920 00:45:06,330 --> 00:45:08,490 And one way, it doesn't even use the CSV module. 921 00:45:08,490 --> 00:45:09,230 We could do this. 922 00:45:09,230 --> 00:45:15,050 Text equals file.read where .read is some method associated with a file that 923 00:45:15,050 --> 00:45:18,650 lets us read in all the data that's part of it and store it inside some 924 00:45:18,650 --> 00:45:19,230 variable. 925 00:45:19,230 --> 00:45:20,355 So here, let's do the same. 926 00:45:20,355 --> 00:45:24,810 Let's say, text equals file.read. 927 00:45:24,810 --> 00:45:26,700 And this is not using the CSV library. 928 00:45:26,700 --> 00:45:28,410 We don't even need this right now. 929 00:45:28,410 --> 00:45:30,940 We could simply say, Python of reads.py. 930 00:45:30,940 --> 00:45:33,360 And we've maybe read our file. 931 00:45:33,360 --> 00:45:36,480 It's hard to tell, so let's go ahead and maybe print out the text. 932 00:45:36,480 --> 00:45:38,220 And run Python of reads.py. 933 00:45:38,220 --> 00:45:43,110 And now we see, in our terminal, well, all the same text we had before. 934 00:45:43,110 --> 00:45:44,340 Title, author. 935 00:45:44,340 --> 00:45:45,930 Goodnight Moon, Margaret Wise Brown. 936 00:45:45,930 --> 00:45:47,760 All of that is in our terminal now. 937 00:45:47,760 --> 00:45:49,920 But this isn't very useful, right?
938 00:45:49,920 --> 00:45:53,670 If we wanted to actually read in some data, store it in our bookshelf, 939 00:45:53,670 --> 00:45:57,690 well, that isn't going to help us add to our list of books, right? 940 00:45:57,690 --> 00:46:00,417 All of these books are still jumbled together. 941 00:46:00,417 --> 00:46:01,500 We don't really want that. 942 00:46:01,500 --> 00:46:03,540 We want to actually differentiate them and have 943 00:46:03,540 --> 00:46:07,080 dictionaries for every book in our CSV. 944 00:46:07,080 --> 00:46:12,030 So that's where this function of-- this method called a DictReader comes in. 945 00:46:12,030 --> 00:46:16,770 We can actually use the CSV module to define a special way to read our file. 946 00:46:16,770 --> 00:46:18,660 And we can then use it like this. 947 00:46:18,660 --> 00:46:21,375 We could say, give us a new file reader, this one 948 00:46:21,375 --> 00:46:23,910 is special from the csv.DictReader function. 949 00:46:23,910 --> 00:46:26,760 And let's actually use that to go through every individual row 950 00:46:26,760 --> 00:46:28,920 in our file and do something for those rows. 951 00:46:28,920 --> 00:46:30,810 So it's best shown by example here. 952 00:46:30,810 --> 00:46:34,080 If we go back to our code, let's not just read all the text. 953 00:46:34,080 --> 00:46:36,960 Let's go ahead and get a special reader for our file. 954 00:46:36,960 --> 00:46:39,960 Let's say I want a file reader. 955 00:46:39,960 --> 00:46:46,400 And I'm going to make sure this is the DictReader inside of the CSV module. 956 00:46:46,400 --> 00:46:50,710 Well, if I read the documentation, I know that DictReader needs access 957 00:46:50,710 --> 00:46:51,910 to a certain file to read. 958 00:46:51,910 --> 00:46:56,110 So I'll give it my file here that I've opened, books.csv. 959 00:46:56,110 --> 00:46:58,630 And DictReader will give us back some special data 960 00:46:58,630 --> 00:47:01,300 structure that we'll call file reader.
961 00:47:01,300 --> 00:47:03,620 And we can use this in our code as follows. 962 00:47:03,620 --> 00:47:07,330 I have to loop over it similar to a list in Python. 963 00:47:07,330 --> 00:47:08,830 And I can do that with for syntax. 964 00:47:08,830 --> 00:47:12,768 I could say for whatever in file reader. 965 00:47:12,768 --> 00:47:14,560 And let's maybe call this-- because we know 966 00:47:14,560 --> 00:47:16,477 we're looking at books-- let's call this book. 967 00:47:16,477 --> 00:47:18,400 For book in file reader. 968 00:47:18,400 --> 00:47:21,860 And now let's just print out book to see what we get here. 969 00:47:21,860 --> 00:47:24,760 So I want it to just say Python of reads.py. 970 00:47:24,760 --> 00:47:29,300 And now, I see individual books as dictionaries. 971 00:47:29,300 --> 00:47:31,240 So DictReader has done a lot of stuff for us. 972 00:47:31,240 --> 00:47:34,970 It said, well, I know that from your CSV file, 973 00:47:34,970 --> 00:47:38,200 you have these columns called title and author. 974 00:47:38,200 --> 00:47:40,677 And I also know that every individual row 975 00:47:40,677 --> 00:47:42,760 is going to be some particular element that you're 976 00:47:42,760 --> 00:47:46,480 interested in where maybe this column corresponds to title 977 00:47:46,480 --> 00:47:48,820 and this column corresponds to author. 978 00:47:48,820 --> 00:47:51,460 So what I'll do is I'll give you each of those rows 979 00:47:51,460 --> 00:47:55,660 as a dictionary with the keys that are your column names 980 00:47:55,660 --> 00:48:00,320 and the values that are whatever is inside every individual row right here. 981 00:48:00,320 --> 00:48:02,800 Notice how we have, in this case, title as the key. 982 00:48:02,800 --> 00:48:04,180 Goodnight Moon is the value. 983 00:48:04,180 --> 00:48:06,880 Author is the key and Margaret Wise Brown is the value. 984 00:48:06,880 --> 00:48:09,730 And it's the very same thing all the way through our CSV. 
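A sketch of the loop just walked through, under a few assumptions: the file is written to a temporary folder first so the example is self-contained, it holds only two sample rows, and `newline=""` is passed to `open` as the csv module documentation recommends (the section's own code may not include it).

```python
import csv
import os
import tempfile

# Assumed setup: a two-book stand-in for books.csv in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
with open(path, "w", newline="") as file:
    file.write("title,author\nGoodnight Moon,Margaret Wise Brown\nCorduroy,Don Freeman\n")

seen = []
with open(path, newline="") as file:
    file_reader = csv.DictReader(file)
    for book in file_reader:
        # Each row arrives as a dictionary keyed by the column names.
        print(book)
        seen.append(dict(book))
    # The header row was consumed to build the keys.
    columns = file_reader.fieldnames

print(columns)
```

The loop must run while the file is still open, which is why it sits inside the indented `with` block.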
985 00:48:09,730 --> 00:48:13,950 Now, printed out in our terminal as individual dictionaries. 986 00:48:13,950 --> 00:48:17,470 So if we've printed these out, adding them to our list is pretty simple. 987 00:48:17,470 --> 00:48:21,950 We can just say books.append an individual book. 988 00:48:21,950 --> 00:48:25,693 And now, if we clear our terminal and run Python of reads.py, 989 00:48:25,693 --> 00:48:28,110 well, we don't see anything because I didn't print it out. 990 00:48:28,110 --> 00:48:31,850 So do print books down below here. 991 00:48:31,850 --> 00:48:35,720 And now, we see all of our books inside of our list. 992 00:48:35,720 --> 00:48:38,660 And this is helpful because, again, we could just print out 993 00:48:38,660 --> 00:48:40,670 individual books or book titles. 994 00:48:40,670 --> 00:48:43,910 I could say for book in books. 995 00:48:43,910 --> 00:48:46,860 Let me go ahead and print out the book title. 996 00:48:46,860 --> 00:48:48,180 And I should see-- 997 00:48:48,180 --> 00:48:51,060 instead of all of my information on all my books, 998 00:48:51,060 --> 00:48:56,130 like we saw earlier with our just .read method, I can say Python of reads.py. 999 00:48:56,130 --> 00:49:01,320 Now, I see just the titles formatted very nicely for myself here. 1000 00:49:01,320 --> 00:49:05,690 So if we go back, this is how we're going to actually read CSV files. 1001 00:49:05,690 --> 00:49:08,660 There's other ways too, like csv.reader. 1002 00:49:08,660 --> 00:49:11,150 But this isn't going to quite be as useful for us 1003 00:49:11,150 --> 00:49:12,567 because let's take a look at this. 1004 00:49:12,567 --> 00:49:19,140 If we say csv.reader, let's go ahead and say for row in file reader, print row. 1005 00:49:19,140 --> 00:49:21,840 Let's see what we get instead. 1006 00:49:21,840 --> 00:49:22,800 Python reads.py. 1007 00:49:22,800 --> 00:49:24,870 Well, we get just a list.
1008 00:49:24,870 --> 00:49:28,740 And notice how it's actually included the column names in a single list. 1009 00:49:28,740 --> 00:49:33,540 So reader gives us back every row of our file as a list. 1010 00:49:33,540 --> 00:49:35,850 But this isn't quite as handy because I have 1011 00:49:35,850 --> 00:49:40,410 to know that, for example, if I want to print out the book title, well, 1012 00:49:40,410 --> 00:49:44,910 that's going to be a row where my titles are in the very first index. 1013 00:49:44,910 --> 00:49:48,083 So I'll say row [0] to get the titles. 1014 00:49:48,083 --> 00:49:48,750 Python reads.py. 1015 00:49:48,750 --> 00:49:52,320 And I get title Goodnight Moon, Corduroy. 1016 00:49:52,320 --> 00:49:55,020 That works, but it's not quite as clean as being 1017 00:49:55,020 --> 00:49:58,660 able to name the actual attributes of my book that I want. 1018 00:49:58,660 --> 00:50:00,300 And that's where DictReader comes in. 1019 00:50:00,300 --> 00:50:03,120 By reading our rows as dictionaries, we get access 1020 00:50:03,120 --> 00:50:06,230 to those individual keys we can use throughout our code here. 1021 00:50:06,230 --> 00:50:07,980 So for book in file reader, I can instead 1022 00:50:07,980 --> 00:50:12,130 print, in this case, the book's title. 1023 00:50:12,130 --> 00:50:16,440 Oops, the book title. 1024 00:50:16,440 --> 00:50:17,700 Python reads.py. 1025 00:50:17,700 --> 00:50:20,860 And now, I see all of this. 1026 00:50:20,860 --> 00:50:22,930 And DictReader also knows, if you're curious, 1027 00:50:22,930 --> 00:50:26,650 not to print out the actual first row because it assumes that these are going 1028 00:50:26,650 --> 00:50:29,350 to be the key names, unlike reader, which 1029 00:50:29,350 --> 00:50:34,270 does not make that same assumption, OK?
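The contrast between the two readers can be sketched as follows. The in-memory `data` string is an assumed stand-in for books.csv, wrapped in `io.StringIO` so no file is needed.

```python
import csv
import io

# Assumed sample data: the header row plus two books.
data = "title,author\nGoodnight Moon,Margaret Wise Brown\nCorduroy,Don Freeman\n"

# csv.reader: every row is a list, and the header row is included.
titles_by_index = [row[0] for row in csv.reader(io.StringIO(data))]

# csv.DictReader: the header row becomes the keys and is not yielded as data.
books = []
for book in csv.DictReader(io.StringIO(data)):
    books.append(book)
titles_by_key = [book["title"] for book in books]

print(titles_by_index)
print(titles_by_key)
```

With `csv.reader`, the position `0` has to be remembered and the header shows up among the titles; with `csv.DictReader`, the column is named directly and the header is already handled.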
1030 00:50:34,270 --> 00:50:38,440 So having done this tour of these libraries and these modules 1031 00:50:38,440 --> 00:50:40,652 and how to read in different pieces of data, 1032 00:50:40,652 --> 00:50:42,610 this is really going to give you a lot of tools 1033 00:50:42,610 --> 00:50:44,350 to use on this week's problem set. 1034 00:50:44,350 --> 00:50:48,040 You'll be actually working with these very similar files, CSVs, 1035 00:50:48,040 --> 00:50:49,625 or even just textual files. 1036 00:50:49,625 --> 00:50:52,750 And as you go through, it's good to keep in mind all that we learned right here. 1037 00:50:52,750 --> 00:50:56,780 How to open a file, how to read it in using DictReader, and so on. 1038 00:50:56,780 --> 00:51:00,580 And as you go off into this week, feel free to use all this stuff 1039 00:51:00,580 --> 00:51:02,028 from the section. 1040 00:51:02,028 --> 00:51:03,320 Thank you all for coming today. 1041 00:51:03,320 --> 00:51:04,195 Wonderful to see you. 1042 00:51:04,195 --> 00:51:06,500 We'll see you next week. 1043 00:51:06,500 --> 00:51:07,000