WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:03.269 --> 00:00:05.280 CARTER ZENKE: OK, hello, everyone. 00:00:05.280 --> 00:00:06.690 It is so good to see you here. 00:00:06.690 --> 00:00:07.700 My name is Carter Zenke. 00:00:07.700 --> 00:00:10.220 I'm the course preceptor here on campus. 00:00:10.220 --> 00:00:12.380 This is our week six section for CS50. 00:00:12.380 --> 00:00:14.150 We'll dive into Python. 00:00:14.150 --> 00:00:16.400 So in lecture, we saw a few different topics 00:00:16.400 --> 00:00:18.890 and we have those same topics on the board today. 00:00:18.890 --> 00:00:22.400 Some including strings, this new dot notation for Python, 00:00:22.400 --> 00:00:26.580 loops, dictionaries, libraries, and how we read and write from files. 00:00:26.580 --> 00:00:30.620 So we'll touch a bit on each of these different topics during this section. 00:00:30.620 --> 00:00:34.022 And we'll spend a bit more time on things like loops and dictionaries 00:00:34.022 --> 00:00:36.230 and file writing and file reading to help prepare you 00:00:36.230 --> 00:00:38.602 for this week's problem set. 00:00:38.602 --> 00:00:41.060 So that's all those topics this week, but we'll try to dive 00:00:41.060 --> 00:00:42.810 into a few of them in particular. 00:00:42.810 --> 00:00:45.470 So let's actually start off with this idea of these strings. 00:00:45.470 --> 00:00:48.020 So we saw strings in C. And these strings 00:00:48.020 --> 00:00:50.102 are actually still here in Python. 00:00:50.102 --> 00:00:53.310 And there are all kinds of interesting things you can really do with strings. 00:00:53.310 --> 00:00:56.120 So one of them might be taking in some information from a book. 00:00:56.120 --> 00:00:58.105 So maybe you've read Goodnight Moon. 00:00:58.105 --> 00:01:00.230 And if you have, you know it's this children's book 00:01:00.230 --> 00:01:02.480 that has lots of simple text that is involved with it. 
00:01:02.480 --> 00:01:04.160 And maybe the book starts off like this. 00:01:04.160 --> 00:01:07.795 It says, in the great green room, and maybe we're 00:01:07.795 --> 00:01:10.170 interested in seeing what a computer does with this text. 00:01:10.170 --> 00:01:13.250 So one thing we can now do with Python is have more access 00:01:13.250 --> 00:01:16.130 to all these different kinds of libraries and things we can use. 00:01:16.130 --> 00:01:18.060 Actually use AI and such. 00:01:18.060 --> 00:01:22.550 And so maybe we give this piece of text to this AI called DALL-E with OpenAI. 00:01:22.550 --> 00:01:25.580 And we get back maybe this kind of image here, in the great green room, 00:01:25.580 --> 00:01:28.010 that DALL-E generated just from seeing this piece of text. 00:01:28.010 --> 00:01:29.885 And we could also do it with the next phrase. 00:01:29.885 --> 00:01:32.570 We could say, there was a telephone and a red balloon, 00:01:32.570 --> 00:01:35.420 and we'll get back some text-- or some images a bit like this. 00:01:35.420 --> 00:01:40.520 This telephone here, this red balloon generated by the AI, again, here. 00:01:40.520 --> 00:01:43.940 Now, all of this-- this fancy AI, this image generation, 00:01:43.940 --> 00:01:46.860 comes down to just giving text to a computer. 00:01:46.860 --> 00:01:50.420 And so let's think about how we do that in Python here as compared to C. 00:01:50.420 --> 00:01:53.690 So we saw in C we had this top level code here, 00:01:53.690 --> 00:01:58.520 char *text gets this get_string function whenever we prompt the user for text. 00:01:58.520 --> 00:02:02.990 And then, in Python, we had this text gets the value of input. 00:02:02.990 --> 00:02:05.270 And so take a moment here, maybe pause the video 00:02:05.270 --> 00:02:09.380 and think, what are the differences you see between the top level 00:02:09.380 --> 00:02:11.000 code and the bottom level code? 00:02:11.000 --> 00:02:13.790 The top one in C and the bottom one in Python. 
00:02:18.080 --> 00:02:20.690 So maybe you notice that, in Python, we no longer 00:02:20.690 --> 00:02:23.340 have to say the type of the variable we're working with. 00:02:23.340 --> 00:02:25.400 So down below, we're still working with strings, 00:02:25.400 --> 00:02:27.020 as we are in the top piece of code. 00:02:27.020 --> 00:02:30.350 But notice how, in Python, we just get to say the variable name, like just 00:02:30.350 --> 00:02:31.250 text, right? 00:02:31.250 --> 00:02:34.700 Not string text, not char *text, just text. 00:02:34.700 --> 00:02:38.280 And so C is what we call a statically typed language. 00:02:38.280 --> 00:02:41.120 We have to declare the type before we use a given variable. 00:02:41.120 --> 00:02:44.900 Like text has to declare it as a string or a char *. 00:02:44.900 --> 00:02:47.788 Whereas, in Python, which is a dynamically typed language, 00:02:47.788 --> 00:02:48.830 we don't have to do that. 00:02:48.830 --> 00:02:51.540 We can say, Python, please infer for me the data type 00:02:51.540 --> 00:02:52.790 that we're talking about here. 00:02:52.790 --> 00:02:55.580 Saying text gets whatever input gives us. 00:02:55.580 --> 00:02:58.790 And because we know this input function always gives us a string, 00:02:58.790 --> 00:03:03.090 well text, of course, is going to be a string. 00:03:03.090 --> 00:03:07.440 OK, so it's one difference between C and Python in getting these variables 00:03:07.440 --> 00:03:08.550 and getting these strings. 00:03:08.550 --> 00:03:10.620 But how would we actually compare them perhaps? 00:03:10.620 --> 00:03:12.600 So in C, we had this top code. 00:03:12.600 --> 00:03:14.538 And in Python, we now have this bottom code. 00:03:14.538 --> 00:03:16.830 And so take a moment to yourself, maybe pause the video 00:03:16.830 --> 00:03:19.110 and think, what differences do you notice here 00:03:19.110 --> 00:03:20.580 between the top and the bottom? 
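A minimal sketch of the typing point above. A string literal stands in for `input(...)` so the snippet runs without a keyboard; the variable `count` is a hypothetical addition for illustration, not something from the slides.

```python
# In the section, this line would be: text = input("Text: ")
# input() always returns a str, so a literal stands in for it here.
text = "In the great green room"

# No type was declared; Python works out the type from the value.
print(type(text).__name__)

# Another name, again with no declaration, this time bound to an int.
count = len(text)
print(type(count).__name__)
```

Compare this with C, where `text` would have to be declared as `string` or `char *` before use.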
00:03:25.540 --> 00:03:27.290 And so maybe you've seen that, in the top, 00:03:27.290 --> 00:03:29.322 we actually are comparing text with hello 00:03:29.322 --> 00:03:30.780 using a different kind of function. 00:03:30.780 --> 00:03:34.250 We're saying, let's use strcmp to look at text and look at hello 00:03:34.250 --> 00:03:36.290 and check the return value of that and see 00:03:36.290 --> 00:03:39.048 if it's 0, where 0 indicates that these are the same. 00:03:39.048 --> 00:03:41.090 Well, in Python, we don't have to do any of that. 00:03:41.090 --> 00:03:43.110 No special functions involved here. 00:03:43.110 --> 00:03:46.160 All we get to say is, if text is equal to hello, 00:03:46.160 --> 00:03:49.760 if this text is hello, let's do something that's indented. 00:03:49.760 --> 00:03:52.760 And maybe you notice as well that, in the top, in our if statement, 00:03:52.760 --> 00:03:54.530 we have to have these curly braces. 00:03:54.530 --> 00:03:57.650 And the code that gets run if this condition is true 00:03:57.650 --> 00:03:59.840 is inside those curly braces. 00:03:59.840 --> 00:04:04.643 Whereas, in Python down below, we see that we have this if text equals hello. 00:04:04.643 --> 00:04:06.560 Then, we actually do the code that's indented. 00:04:06.560 --> 00:04:10.520 So no longer do we need these curly braces here. 00:04:10.520 --> 00:04:13.040 We rely only on code that is indented. 00:04:13.040 --> 00:04:17.600 And so while you may be able to throw out the braces and the semicolons, 00:04:17.600 --> 00:04:20.899 you also have to make sure your code is indented. 00:04:20.899 --> 00:04:25.417 That's what matters in Python versus these curly braces up top. 00:04:25.417 --> 00:04:28.250 So comparing strings works a little differently in Python, but let's 00:04:28.250 --> 00:04:29.480 take a look at how we actually get access 00:04:29.480 --> 00:04:31.447 to individual characters of a string. 
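The comparison described above can be sketched as follows. The slide's exact code is not in the transcript, so the `greeting` variable and both messages are hypothetical, and a literal again stands in for `input(...)`.

```python
text = "hello"  # stand-in for: text = input("Text: ")

# == compares the characters of the two strings directly; no strcmp needed.
if text == "hello":
    # The indented lines are the body of the if; there are no curly braces.
    greeting = "Hi there!"
else:
    greeting = "Who are you?"

print(greeting)
```

Removing the indentation under `if` would make Python reject the program with an `IndentationError`, which is why indentation matters here in a way it did not in C.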
00:04:31.447 --> 00:04:33.530 Remember that strings are just collections 00:04:33.530 --> 00:04:35.330 of characters or strings of characters. 00:04:35.330 --> 00:04:38.180 And to see individual characters in Python, 00:04:38.180 --> 00:04:41.270 we can actually do the same thing we did in C. We have the same bracket 00:04:41.270 --> 00:04:45.440 syntax, like text[i], to get access to a particular character 00:04:45.440 --> 00:04:47.100 inside our piece of text. 00:04:47.100 --> 00:04:48.780 And this also works for lists. 00:04:48.780 --> 00:04:52.580 So let's say we had a list in Python similar to an array in C. 00:04:52.580 --> 00:04:56.150 We could actually have these square brackets here 00:04:56.150 --> 00:05:00.680 that get us access to an individual element inside of our list. 00:05:00.680 --> 00:05:02.810 So some differences in strings here. 00:05:02.810 --> 00:05:05.300 But mostly, we can actually use some familiar syntax 00:05:05.300 --> 00:05:08.570 when accessing individual characters here. 00:05:08.570 --> 00:05:11.450 Now, strings are a little more powerful than just comparing 00:05:11.450 --> 00:05:15.170 them or getting them in your program or even getting individual characters. 00:05:15.170 --> 00:05:17.810 And in fact, in C, we have-- not in C. In Python-- 00:05:17.810 --> 00:05:20.780 sorry-- we have access to these individual functions 00:05:20.780 --> 00:05:25.670 or the individual methods that belong to this type called str in Python, 00:05:25.670 --> 00:05:29.370 or, in long form, what we call the string. 00:05:29.370 --> 00:05:32.750 So here we have this dot notation that we can introduce. 00:05:32.750 --> 00:05:35.802 So let's say we're trying to give this string to our program. 00:05:35.802 --> 00:05:38.510 Well, we could have some code a bit like this like we saw before. 00:05:38.510 --> 00:05:40.670 We could say that text equals input. 
00:05:40.670 --> 00:05:43.380 And maybe our text looks like this on the right hand side. 00:05:43.380 --> 00:05:45.620 And if you pause the video and ask yourself, 00:05:45.620 --> 00:05:49.700 what looks a little odd about this text on the right? 00:05:49.700 --> 00:05:51.770 What is a little messy about it? 00:05:54.620 --> 00:05:56.620 And so maybe you've noticed that, in general, we 00:05:56.620 --> 00:06:00.160 have some spaces before the text and some spaces after the text. 00:06:00.160 --> 00:06:02.080 And ideally, those shouldn't be there. 00:06:02.080 --> 00:06:04.510 That's going to be a remnant of the user typing in some information 00:06:04.510 --> 00:06:05.635 and they typed it in wrong. 00:06:05.635 --> 00:06:06.830 We want to get rid of that. 00:06:06.830 --> 00:06:09.950 Well, luckily, in Python, we actually have these methods 00:06:09.950 --> 00:06:12.910 we can use to get rid of that whitespace on either end 00:06:12.910 --> 00:06:14.380 to strip that whitespace off. 00:06:14.380 --> 00:06:17.060 And this method is actually called dot strip. 00:06:17.060 --> 00:06:19.030 So we can say text.strip. 00:06:19.030 --> 00:06:21.550 And if you run this line of code now, we'll 00:06:21.550 --> 00:06:26.090 see that the whitespace on either end actually goes away. 00:06:26.090 --> 00:06:27.462 So here is what we had before. 00:06:27.462 --> 00:06:29.920 It's a piece of messy text with whitespace on either side. 00:06:29.920 --> 00:06:32.260 But running text.strip, well, we've got rid of that. 00:06:32.260 --> 00:06:34.180 Now, we just have, in the great green room, 00:06:34.180 --> 00:06:37.400 the actual characters inside of our string. 00:06:37.400 --> 00:06:38.925 OK, and that's actually pretty good. 00:06:38.925 --> 00:06:40.800 But what if we had some other kinds of input? 00:06:40.800 --> 00:06:43.910 Let's say we had maybe miscapitalized letters. 
00:06:43.910 --> 00:06:48.500 Like IN, all caps, thE with E capitalized, and ROom with RO 00:06:48.500 --> 00:06:49.130 capitalized. 00:06:49.130 --> 00:06:51.830 Well, how could we make sure this is all standardized? 00:06:51.830 --> 00:06:53.983 Well, we could use another method that belongs 00:06:53.983 --> 00:06:55.400 to strings, this one called lower. 00:06:55.400 --> 00:06:56.840 We could say text.lower. 00:06:56.840 --> 00:07:00.800 And that then gives us this same string, but in lowercase. 00:07:00.800 --> 00:07:03.140 And similarly, maybe we want to actually capitalize this 00:07:03.140 --> 00:07:03.950 because it's a sentence. 00:07:03.950 --> 00:07:05.700 We could actually do that very same thing. 00:07:05.700 --> 00:07:09.980 We could say text.capitalize to make sure that the I in this string 00:07:09.980 --> 00:07:13.850 is capitalized and the rest is here lowercased. 00:07:13.850 --> 00:07:16.850 So all of these are what we call methods that 00:07:16.850 --> 00:07:20.210 really belong to this idea of the string-- this data type called 00:07:20.210 --> 00:07:22.220 the string in Python. 00:07:22.220 --> 00:07:26.240 Some Python developers long ago decided that because these functions, 00:07:26.240 --> 00:07:32.870 like lower or capitalize or split and so on-- or not split, but strip, 00:07:32.870 --> 00:07:34.423 belong to this thing called a string. 00:07:34.423 --> 00:07:36.590 They're so integral to what it means to be a string, 00:07:36.590 --> 00:07:41.400 we actually have to include them inside this string data type, so to speak. 00:07:41.400 --> 00:07:45.080 So if we keep going here, we can actually think of other ways 00:07:45.080 --> 00:07:46.850 to use these methods. 00:07:46.850 --> 00:07:50.840 But first, let's actually dive into how they work, why they exist, 00:07:50.840 --> 00:07:51.650 and where they are. 00:07:51.650 --> 00:07:54.900 Compare what kinds of methods you want to use in your own code. 
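A short sketch of the three string methods just described: `strip`, `lower`, and `capitalize`. The messy input string is an assumption that combines the two slide examples (extra whitespace and odd capitalization) into one value.

```python
text = "   IN thE great green ROom   "  # hypothetical messy input

stripped = text.strip()          # whitespace removed from both ends
lowered = stripped.lower()       # every letter lowercased
sentence = lowered.capitalize()  # first character uppercased, rest lowercased

print(repr(stripped))
print(lowered)
print(sentence)

# Each method returns a new string; the original is left untouched.
print(repr(text))
```

Because each call returns a string, the calls can also be chained: `text.strip().lower().capitalize()` gives the same result as `sentence`.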
00:07:54.900 --> 00:07:59.990 So again, these strings, or more succinctly called strs in Python, S-T-R 00:07:59.990 --> 00:08:03.020 for short, actually have a variety of methods you can look up 00:08:03.020 --> 00:08:04.500 in the Python documentation. 00:08:04.500 --> 00:08:07.530 So if you go to docs.python.org, you'll see something like this. 00:08:07.530 --> 00:08:10.040 And if you scroll down to the str section, 00:08:10.040 --> 00:08:11.780 you'll see string methods inside. 00:08:11.780 --> 00:08:15.830 You get to find all the methods you could call on your strings in Python. 00:08:15.830 --> 00:08:19.230 We have capitalize down here and some others if we scroll down. 00:08:19.230 --> 00:08:22.140 But notice how we have access to this dot notation. 00:08:22.140 --> 00:08:26.360 It's not capitalize, and then, give the string as input to capitalize. 00:08:26.360 --> 00:08:30.060 It's actually str.capitalize or str.lower. 00:08:30.060 --> 00:08:32.870 And so why does this come up in Python? 00:08:32.870 --> 00:08:36.890 Well, if we actually bring it back to what we saw in C with our structs, 00:08:36.890 --> 00:08:38.490 we might get some intuition here. 00:08:38.490 --> 00:08:42.530 So in C, we had this idea of a struct called a candidate, for example, 00:08:42.530 --> 00:08:43.789 in an earlier problem set. 00:08:43.789 --> 00:08:47.430 And a candidate looks a bit like this, just a single person over here. 00:08:47.430 --> 00:08:49.910 And remember how this candidate had different attributes, 00:08:49.910 --> 00:08:54.350 like they had a name, or they had maybe a number of votes. 00:08:54.350 --> 00:08:56.270 And this candidate was some data type we had 00:08:56.270 --> 00:08:58.310 constructed to have these attributes. 00:08:58.310 --> 00:09:00.890 It had a name and some number of votes. 
00:09:00.890 --> 00:09:04.860 And to get access to those, well, we did this very same thing of candidate.votes 00:09:04.860 --> 00:09:06.050 to get access to votes. 00:09:06.050 --> 00:09:09.720 And then, candidate.name to get access to name. 00:09:09.720 --> 00:09:14.000 So in Python, this str data type now is somewhat similar, 00:09:14.000 --> 00:09:16.130 but you can think of it as more of a toolkit. 00:09:16.130 --> 00:09:18.320 It's a data type that has some tools inside 00:09:18.320 --> 00:09:21.440 you can use on the string you're talking about in this case. 00:09:21.440 --> 00:09:24.350 So we could say str.capitalize. 00:09:24.350 --> 00:09:26.580 And there's a tool inside of this str type 00:09:26.580 --> 00:09:29.580 we can use-- this function we can use, or more particularly this method, 00:09:29.580 --> 00:09:32.060 we can use on this str data type. 00:09:32.060 --> 00:09:35.150 And similarly, we could even have str.lower, this other tool 00:09:35.150 --> 00:09:38.420 inside of our toolbox for strs we can use to actually make 00:09:38.420 --> 00:09:40.590 the str do what we want it to do. 00:09:40.590 --> 00:09:45.725 So very similar in spirit to this idea of attributes for our structs in C 00:09:45.725 --> 00:09:49.040 where structs were often defined as our own data types, similarly, 00:09:49.040 --> 00:09:51.350 in Python, we have our own data type called 00:09:51.350 --> 00:09:53.930 a str that now has not just attributes, but also 00:09:53.930 --> 00:09:58.910 functions that can belong to it and that can operate on its own self. 00:09:58.910 --> 00:10:01.840 So that's all for dot syntax. 00:10:01.840 --> 00:10:06.610 And take a look at the Python documentation for all the functions you can use here. 00:10:06.610 --> 00:10:09.370 But let's actually take a look at loops in Python 00:10:09.370 --> 00:10:11.720 and how strings come in with loops. 
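To make the struct analogy above concrete, here is a rough sketch. The `Candidate` class is a hypothetical Python stand-in for the C struct from the earlier problem set, and the name and vote count are made up.

```python
class Candidate:
    """Hypothetical Python counterpart of the C candidate struct."""

    def __init__(self, name, votes):
        self.name = name
        self.votes = votes


candidate = Candidate("Alice", 3)

# Dot notation reaches an attribute, just like candidate.votes in C.
print(candidate.name, candidate.votes)

text = "in the great green room"

# Dot notation on a str reaches a method: a function that belongs to the type.
print(text.capitalize())

# The same method can be reached through the type itself, which is how
# the documentation writes it: str.capitalize.
print(str.capitalize(text))
```

The last two calls do the same thing; `text.capitalize()` is the usual way to write it.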
00:10:11.720 --> 00:10:15.700 So if we remember from lecture, we see that there are maybe 00:10:15.700 --> 00:10:17.500 the same kinds of loops in Python. 00:10:17.500 --> 00:10:21.340 We have the while loop, the for loop, but they look a little different. 00:10:21.340 --> 00:10:24.340 And then, one big example of this difference 00:10:24.340 --> 00:10:28.000 is Python's for blank in blank syntax. 00:10:28.000 --> 00:10:31.400 And we'll see this very often in Python because it's so convenient to use. 00:10:31.400 --> 00:10:34.930 So let's say, for example, I have some piece of text, again, 00:10:34.930 --> 00:10:37.160 in the great green room on the right hand side. 00:10:37.160 --> 00:10:39.770 And I want to loop through this piece of text. 00:10:39.770 --> 00:10:44.230 Well, I could do that much more simply than I really could in C. All I 00:10:44.230 --> 00:10:47.650 have to do is say for c in text. 00:10:47.650 --> 00:10:49.870 Maybe we want to print out every character. 00:10:49.870 --> 00:10:52.840 And so what we'll do is make a new variable called c. 00:10:52.840 --> 00:10:56.680 And it'll actually loop through all the characters inside of this string called 00:10:56.680 --> 00:11:01.040 text and make sure that on each iteration c updates as we go through. 00:11:01.040 --> 00:11:02.870 So for example, let's say I run this code. 00:11:02.870 --> 00:11:07.270 I would see that c gets that very first character in text, like the I. 00:11:07.270 --> 00:11:11.320 On the next iteration, c will get that next character, n, and then 00:11:11.320 --> 00:11:14.650 the space, and then the t, and then the h, and then the e. 00:11:14.650 --> 00:11:17.920 And this will keep going and going and going all the way through our string. 00:11:17.920 --> 00:11:19.990 Now, we could call c anything we want to call it. 00:11:19.990 --> 00:11:21.250 We could call it z. 00:11:21.250 --> 00:11:22.690 We could call it s. 
00:11:22.690 --> 00:11:25.537 We could call it character, just like a long variable name. 00:11:25.537 --> 00:11:27.370 We could even call it zebra if we wanted to. 00:11:27.370 --> 00:11:30.730 But the main thing here is that Python takes this string 00:11:30.730 --> 00:11:35.290 and infers that when it sees this loop called for blank in whatever string 00:11:35.290 --> 00:11:38.590 you have, it's going to take every individual character, 00:11:38.590 --> 00:11:43.840 assign it some name as you loop through, and make sure that each time that name 00:11:43.840 --> 00:11:46.210 refers to a particular element inside of your string 00:11:46.210 --> 00:11:49.760 where the element is now, in this case, a character. 00:11:49.760 --> 00:11:53.470 Now, this is actually-- this works beyond simply strings. 00:11:53.470 --> 00:11:55.900 We also have this kind of syntax for lists. 00:11:55.900 --> 00:11:58.780 And so let's say we want to turn this piece of text, which 00:11:58.780 --> 00:12:02.503 is all one string, into really a list of different strings 00:12:02.503 --> 00:12:03.670 where each string is a word. 00:12:03.670 --> 00:12:06.220 Well, we could have a different kind of method for a string. 00:12:06.220 --> 00:12:08.050 This one called .split. 00:12:08.050 --> 00:12:11.290 So here we have, again, our text on the right hand side. 00:12:11.290 --> 00:12:14.890 And let's make this new variable called words that will get our text 00:12:14.890 --> 00:12:17.320 but split up now into individual words. 00:12:17.320 --> 00:12:21.190 And when you call .split on a piece of text, it's going to automatically, 00:12:21.190 --> 00:12:26.290 or by default, look for whitespace, like spaces, in that string and give you back substrings 00:12:26.290 --> 00:12:29.540 that are smaller strings that are between those spaces. 00:12:29.540 --> 00:12:32.500 So for example, if we were to run text.split here, what we'll get 00:12:32.500 --> 00:12:34.460 is now a list of individual words. 
00:12:34.460 --> 00:12:39.610 So see how this changes from one long string into multiple individual words 00:12:39.610 --> 00:12:41.500 that are part of this Python list. 00:12:41.500 --> 00:12:46.238 And as a thought question here, how do we know this is a list? 00:12:46.238 --> 00:12:48.280 Take a look at the syntax on the right hand side. 00:12:48.280 --> 00:12:50.140 Let me show it back to you again. 00:12:50.140 --> 00:12:51.865 How do we know this is a list? 00:12:55.050 --> 00:12:58.300 Well, you might have thought that it's because of the brackets on either side. 00:12:58.300 --> 00:13:02.040 We see these square brackets, and that in Python denotes a list. 00:13:02.040 --> 00:13:05.910 But we also see these commas in between our words, or the individual strings. 00:13:05.910 --> 00:13:10.500 So we say in, which is a string, comma, the, which is a string, comma. 00:13:10.500 --> 00:13:12.810 Those are all inside these square brackets. 00:13:12.810 --> 00:13:16.440 So that denotes to us a Python list, which is similar in spirit 00:13:16.440 --> 00:13:21.340 to a C array, but it's much more flexible overall for us. 00:13:21.340 --> 00:13:24.333 OK, so now that we've split our words into-- 00:13:24.333 --> 00:13:26.500 split our text into individual words, let's actually 00:13:26.500 --> 00:13:28.940 see how we can loop through these words as a whole. 00:13:28.940 --> 00:13:34.900 So if we then use our for blank in blank syntax with a list as that second blank 00:13:34.900 --> 00:13:38.000 there, we could say, for word in words. 00:13:38.000 --> 00:13:39.280 Let's print out the word. 00:13:39.280 --> 00:13:43.090 And what this will do for us visually is say, on this first iteration, 00:13:43.090 --> 00:13:45.550 word will get the value in. 00:13:45.550 --> 00:13:49.480 And on the second iteration, word will get the value the. 00:13:49.480 --> 00:13:51.817 And on the third, it will get the value great and so on. 
00:13:51.817 --> 00:13:55.150 And so you might be able to guess, well, on the fourth iteration, what will word 00:13:55.150 --> 00:13:56.605 get? 00:13:56.605 --> 00:13:58.020 It will get green. 00:13:58.020 --> 00:14:00.621 And on the fifth iteration, what will word get? 00:14:00.621 --> 00:14:02.405 It will get room, right? 00:14:02.405 --> 00:14:03.780 It will get room at the very end. 00:14:03.780 --> 00:14:08.540 So we see that word is going through and getting these individual words 00:14:08.540 --> 00:14:09.750 inside of our list. 00:14:09.750 --> 00:14:10.730 Well, why is that? 00:14:10.730 --> 00:14:14.300 Why is it the actual words now inside of our list 00:14:14.300 --> 00:14:16.640 as opposed to the characters inside of our string? 00:14:16.640 --> 00:14:20.390 Well, what matters here is the kind of data type 00:14:20.390 --> 00:14:23.010 you are asking Python to iterate over. 00:14:23.010 --> 00:14:26.480 So this for/in syntax is helpful if you want 00:14:26.480 --> 00:14:30.362 to have some kind of loop going through every element of a-- 00:14:30.362 --> 00:14:33.320 what we call iterable, where an iterable means you can iterate over it. 00:14:33.320 --> 00:14:35.630 It has some elements you can go over individually. 00:14:35.630 --> 00:14:38.780 If you have a list, what Python will do is go over 00:14:38.780 --> 00:14:40.950 every element inside that list. 00:14:40.950 --> 00:14:45.760 So in this case, our strings are the elements of our list, right? 00:14:45.760 --> 00:14:50.920 But if we have just a single string, like in or the or great or the string 00:14:50.920 --> 00:14:53.470 as a whole, in the great green room, well, Python 00:14:53.470 --> 00:14:55.840 will decide to go through every individual character. 00:14:55.840 --> 00:14:59.830 Because in this case, the characters are that subpiece, that subelement inside 00:14:59.830 --> 00:15:01.832 of our longer string, in this case. 
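The difference between iterating over a string and iterating over a list, as described above, can be sketched like this. The exact capitalization of the slide text is an assumption, and the `count` variable is a hypothetical addition so that the character loop does not print 23 lines.

```python
text = "In the great green room"

# .split() with no arguments splits on whitespace and returns a list of strings.
words = text.split()
print(words)

# Iterating over a list visits each element; here each element is a word.
for word in words:
    print(word)

# Iterating over a str visits each character, spaces included.
count = 0
for c in text:
    count += 1
print(count)

# A single word is still a str, so looping over it yields its characters.
for c in words[0]:
    print(c)
```

The loop syntax is identical in every case; only the type of the thing being iterated over changes what each iteration receives.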
00:15:01.832 --> 00:15:03.790 So I encourage you to actually go out and check 00:15:03.790 --> 00:15:06.220 what Python does in these different cases 00:15:06.220 --> 00:15:13.110 to actually see how this changes as you change what you iterate over in Python. 00:15:13.110 --> 00:15:16.200 So our first exercise together will be to actually take 00:15:16.200 --> 00:15:18.330 a look at this piece of code that will have 00:15:18.330 --> 00:15:21.540 us look at a variety of different loops and figure out which 00:15:21.540 --> 00:15:24.250 one is going to print out which thing. 00:15:24.250 --> 00:15:26.370 So let's look back at our code here. 00:15:26.370 --> 00:15:30.430 I'll go over to my codespace and I'll go ahead and sign in over here. 00:15:30.430 --> 00:15:32.775 I'll let you get your codespace up too. 00:15:32.775 --> 00:15:34.150 So we have our codespace loaded. 00:15:34.150 --> 00:15:35.910 So let's actually go ahead and pull up our piece of code 00:15:35.910 --> 00:15:37.493 that we'll take a look at together. 00:15:37.493 --> 00:15:39.420 So we'll open up text.py. 00:15:39.420 --> 00:15:42.032 And I'll zoom in a bit on it for you. 00:15:42.032 --> 00:15:43.740 Now, the goal for this exercise is to take 00:15:43.740 --> 00:15:46.860 a look at each of these loops in Python and figure out 00:15:46.860 --> 00:15:48.610 what they're going to do for us. 00:15:48.610 --> 00:15:51.210 So notice at the very top, we have the same text from before, 00:15:51.210 --> 00:15:52.590 in the great green room. 00:15:52.590 --> 00:15:54.960 And we split it up by word. 00:15:54.960 --> 00:15:59.460 So we're saying let's split up our text and make this list called words. 00:15:59.460 --> 00:16:02.700 But we have different loops down here to actually loop through those words 00:16:02.700 --> 00:16:05.242 and print those words out in a variety of different ways. 00:16:05.242 --> 00:16:06.690 So let's look at round one here. 00:16:06.690 --> 00:16:08.010 What might we see? 
00:16:08.010 --> 00:16:11.130 And feel free to pause the video and write it out for yourself. 00:16:11.130 --> 00:16:12.750 What might we see in this round one? 00:16:17.150 --> 00:16:21.630 So we might see, as we saw before, every individual word being printed. 00:16:21.630 --> 00:16:25.580 So we might see first in, and then the, and then 00:16:25.580 --> 00:16:28.850 great, and then green, and then room. 00:16:28.850 --> 00:16:31.220 These are going to be on new lines because, as we know, 00:16:31.220 --> 00:16:36.050 print at the very end includes that new line for us automatically by default. 00:16:36.050 --> 00:16:39.770 If we wanted to change that, let's say, print these all on the same line, 00:16:39.770 --> 00:16:45.740 like in the great green room on that same line, well, 00:16:45.740 --> 00:16:48.822 we could change the ending character of this print statement. 00:16:48.822 --> 00:16:49.530 We could do this. 00:16:49.530 --> 00:16:51.470 We could say end equals blank. 00:16:51.470 --> 00:16:53.720 Or end equals, in this case, it's going to be a space. 00:16:53.720 --> 00:16:58.760 So what'll happen here is Python will print out in, that very first word, 00:16:58.760 --> 00:17:02.420 add a space at the end, and then, print out that next word going 00:17:02.420 --> 00:17:04.670 and going and going through. 00:17:04.670 --> 00:17:06.950 So up to you, but we can maybe just leave this 00:17:06.950 --> 00:17:11.342 as printing out the individual words on single lines. 00:17:11.342 --> 00:17:12.800 Let's take a look at round two now. 00:17:12.800 --> 00:17:15.217 And feel free to pause the video and ask yourself, 00:17:15.217 --> 00:17:16.550 what might get printed out here? 00:17:23.829 --> 00:17:26.619 So if we look through this, we have for word in words. 00:17:26.619 --> 00:17:28.980 It's the very same loop we had before. 00:17:28.980 --> 00:17:32.680 We're getting access to every individual word in our list of words. 
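Round one and the `end` variation can be sketched as below. The actual contents of text.py are not shown in the transcript, so this is a reconstruction under that assumption.

```python
words = "In the great green room".split()

# Round one: print adds a newline after each word by default.
for word in words:
    print(word)

# Same loop, but with end=" " each word is followed by a space instead,
# so everything lands on one line.
for word in words:
    print(word, end=" ")

# A bare print() finishes that line with a newline.
print()
```

Note that `end=" "` also leaves a trailing space after the last word, which is usually harmless for this kind of output.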
00:17:32.680 --> 00:17:39.720 So first in, then the, then great, then green, then room. 00:17:39.720 --> 00:17:41.790 But now, we have for c in word. 00:17:41.790 --> 00:17:46.260 And so recall, when we have a list we're looping over, like in this first loop, 00:17:46.260 --> 00:17:48.990 we're getting access to the elements of that list. 00:17:48.990 --> 00:17:52.330 But word now is simply an individual string. 00:17:52.330 --> 00:17:53.790 So what are we printing out here? 00:17:53.790 --> 00:17:58.030 Probably the individual characters inside of that particular word. 00:17:58.030 --> 00:18:03.510 So if we loop through this, we might see first I and then n. 00:18:03.510 --> 00:18:08.400 And then, would we see the space or would we not? 00:18:08.400 --> 00:18:11.308 Maybe take a guess? 00:18:11.308 --> 00:18:13.350 In this case, we actually wouldn't see the space. 00:18:13.350 --> 00:18:17.040 And that's because we've split our words into a list of words. 00:18:17.040 --> 00:18:20.010 And then, I actually try to print this out for you to see. 00:18:20.010 --> 00:18:26.070 I'll go ahead and copy this and I'll open up a new one called list.py. 00:18:26.070 --> 00:18:27.240 I'll paste this here. 00:18:27.240 --> 00:18:28.653 I'll print words. 00:18:28.653 --> 00:18:29.820 Let's go ahead and run this. 00:18:29.820 --> 00:18:31.830 Python of list.py. 00:18:31.830 --> 00:18:35.400 Well, we see now in the great green room. 00:18:35.400 --> 00:18:38.430 Notice how there aren't any spaces inside 00:18:38.430 --> 00:18:41.520 of these individual strings that are inside of our list. 00:18:41.520 --> 00:18:45.510 So if we go back here and we loop through the individual characters, 00:18:45.510 --> 00:18:50.050 first in this word, which is I and n, well, there are no spaces there. 00:18:50.050 --> 00:18:52.020 And if we loop through this next word like, 00:18:52.020 --> 00:18:54.600 T-H-E, there are no spaces there, either. 
00:18:54.600 --> 00:18:58.750 So we'll say T-H-E and so on and so forth. 00:18:58.750 --> 00:19:00.300 You can keep going like this. 00:19:00.300 --> 00:19:03.150 We're printing out every individual character except for the spaces, 00:19:03.150 --> 00:19:05.120 in this case. 00:19:05.120 --> 00:19:09.800 OK, let's look at now our third round where we have for word in words. 00:19:09.800 --> 00:19:10.940 Same thing we saw before. 00:19:10.940 --> 00:19:12.560 But now, if g-- 00:19:12.560 --> 00:19:16.520 this character g is in the word, what will we do? 00:19:16.520 --> 00:19:17.600 Print the word. 00:19:17.600 --> 00:19:21.890 So take a guess as to what you might see in this loop, keeping in mind that this 00:19:21.890 --> 00:19:25.340 is our text, in the great green room. 00:19:25.340 --> 00:19:26.345 What would we see here? 00:19:32.090 --> 00:19:35.140 So you might see only the words that have g in them. 00:19:35.140 --> 00:19:38.590 And in Python, it's just this easy to ask if some string is 00:19:38.590 --> 00:19:39.760 part of another string. 00:19:39.760 --> 00:19:42.250 Here we're asking if g, this one-character string, 00:19:42.250 --> 00:19:45.830 is inside this other string, this word right here. 00:19:45.830 --> 00:19:49.900 So let's go ahead and say in, it doesn't have any g's in it. 00:19:49.900 --> 00:19:51.760 The, no g's. 00:19:51.760 --> 00:19:54.610 Great has a g at the beginning, so maybe we print that. 00:19:54.610 --> 00:19:57.500 And green does too, but room does not. 00:19:57.500 --> 00:20:00.880 So we probably-- it would be safe to say that we would see great here 00:20:00.880 --> 00:20:07.060 and we would see green printed out to the screen on these new lines. 00:20:07.060 --> 00:20:10.150 And now, we have perhaps some new syntax here. 00:20:10.150 --> 00:20:13.367 for word in words[2:]. 
00:20:15.958 --> 00:20:18.000 And maybe you're not familiar with that, but feel 00:20:18.000 --> 00:20:22.650 free to take a guess as to what you might think will happen here. 00:20:22.650 --> 00:20:23.745 Maybe pause the video. 00:20:27.110 --> 00:20:31.642 And if we take a look at this, well, let's go back to our list.py file 00:20:31.642 --> 00:20:34.100 and try to get a grasp on what this is really doing for us. 00:20:34.100 --> 00:20:37.690 So we see words[2:]. 00:20:37.690 --> 00:20:39.490 And this is somewhat familiar to us. 00:20:39.490 --> 00:20:42.760 We've seen words[2] before. 00:20:42.760 --> 00:20:45.180 What will this do if I run it as Python of list.py? 00:20:45.180 --> 00:20:47.560 Well, that will give us just great, right? 00:20:47.560 --> 00:20:50.530 It will give us the element at index 2, 00:20:50.530 --> 00:20:55.400 where this is 0, 1, and 2 in our list. 00:20:55.400 --> 00:20:56.750 Just to point that out. 00:20:56.750 --> 00:21:00.870 But what if we wanted not just that element, but that one and all the rest 00:21:00.870 --> 00:21:01.370 after it? 00:21:01.370 --> 00:21:03.200 Well, Python comes with this fancy feature. 00:21:03.200 --> 00:21:06.890 We can say a colon here to say, get me that word at index 2 00:21:06.890 --> 00:21:08.638 and all the rest, right? 00:21:08.638 --> 00:21:09.430 So I could do this. 00:21:09.430 --> 00:21:12.680 I could say Python of list.py and see great, green, and room. 00:21:12.680 --> 00:21:18.080 Now, I've sliced my list into this smaller piece here 00:21:18.080 --> 00:21:20.305 that only includes great and green and room. 00:21:20.305 --> 00:21:21.680 That's the technical term for it. 00:21:21.680 --> 00:21:22.310 Slicing. 00:21:22.310 --> 00:21:26.600 We're slicing a list using this colon syntax here. 00:21:26.600 --> 00:21:29.210 Now, I could even add an end index. 00:21:29.210 --> 00:21:30.770 Let's say I don't want room. 00:21:30.770 --> 00:21:32.690 I only want great and green.
00:21:32.690 --> 00:21:34.560 Well, I could modify my slice like this. 00:21:34.560 --> 00:21:38.090 I could say start at index 2, in this case, that's great, again, 00:21:38.090 --> 00:21:39.110 we're 0-indexed. 00:21:39.110 --> 00:21:41.780 0, 1, and 2. 00:21:41.780 --> 00:21:46.460 And then, go up to, but not including, that last index here. 00:21:46.460 --> 00:21:48.050 And that last index is 4. 00:21:48.050 --> 00:21:50.540 Again, 0, 1, 2, 3, 4. 00:21:50.540 --> 00:21:51.470 So we'll go here. 00:21:51.470 --> 00:21:53.910 2:4, Python of list.py. 00:21:53.910 --> 00:21:55.880 Now, I see great and green. 00:21:55.880 --> 00:22:00.770 This worked because the very first number we put in is inclusive. 00:22:00.770 --> 00:22:02.840 We're going to get this index back. 00:22:02.840 --> 00:22:06.720 So we're going to get this value here, this string here. 00:22:06.720 --> 00:22:08.480 And we're also going to get 3-- 00:22:08.480 --> 00:22:09.470 index 3. 00:22:09.470 --> 00:22:11.570 But we won't get index 4. 00:22:11.570 --> 00:22:13.100 This is exclusive. 00:22:13.100 --> 00:22:17.100 The very first number is inclusive, the last number is exclusive. 00:22:17.100 --> 00:22:21.945 So to get just, for example, great, we could also do this. 00:22:21.945 --> 00:22:23.570 But that's not really necessary, right? 00:22:23.570 --> 00:22:27.320 We only really need 2:4 to get great and green. 00:22:27.320 --> 00:22:32.563 Or if we wanted to, just 2: to get all the rest of them in this list, OK? 00:22:32.563 --> 00:22:35.230 So now that we've seen that, what do you think will happen here? 00:22:35.230 --> 00:22:38.800 Well, it will probably print out every individual word that is in this list, 00:22:38.800 --> 00:22:41.270 but only from index 2 onward. 00:22:41.270 --> 00:22:44.150 So we'll probably see great-- 00:22:44.150 --> 00:22:49.280 we'll see great and green and we'll see room overall. 00:22:49.280 --> 00:22:50.570 OK, last one here.
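Here is a small sketch of the indexing and slicing just described, assuming the same five-word list. The variable names on the left are made up for illustration, and words[2:3] is only a guess at what "we could also do this" refers to on screen:

```python
# Assumed setup: same illustrative list as before.
words = "In the great green room".split()

third = words[2]         # one element: the string at index 2
rest = words[2:]         # a slice: index 2 through the end of the list
middle = words[2:4]      # start is inclusive, end is exclusive: indexes 2 and 3
only_great = words[2:3]  # assumed example: a one-element slice, still a list

# Round 4: loop over just the sliced part of the list.
for word in words[2:]:
    print(word)
```

Note that words[2] gives back a string, while every slice gives back a list, even when that list holds a single element.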
00:22:50.570 --> 00:22:54.305 We have for word in words, print Goodnight Moon. 00:22:54.305 --> 00:22:57.305 Now, what do you think you will see here if you were to pause the video? 00:23:00.890 --> 00:23:04.340 And maybe you've noticed that, for a Python loop, 00:23:04.340 --> 00:23:07.910 we have this idea of going through every element we have in our list. 00:23:07.910 --> 00:23:11.010 A lot of Python for loops are really built on lists. 00:23:11.010 --> 00:23:14.000 So if we have for word in words, well, our list, 00:23:14.000 --> 00:23:19.710 again, is the same one in list.py if I print out just words now. 00:23:19.710 --> 00:23:21.840 In the great green room. 00:23:21.840 --> 00:23:25.930 Well, this will actually iterate over every element in that list. 00:23:25.930 --> 00:23:32.490 We'll say, OK, first in, then the, then great, then green, then room. 00:23:32.490 --> 00:23:36.210 And it doesn't matter-- if we're not doing anything with word, 00:23:36.210 --> 00:23:37.990 we're just looping that many times. 00:23:37.990 --> 00:23:40.530 So we're going to loop, in this case, five times and print 00:23:40.530 --> 00:23:42.240 out Goodnight Moon. 00:23:42.240 --> 00:23:45.678 But it seems a little odd to do it that way. 00:23:45.678 --> 00:23:47.970 And actually, I think I might have a syntax error here. 00:23:47.970 --> 00:23:54.100 I don't think we need these parentheses here for word in words. 00:23:54.100 --> 00:23:57.540 If we did this, well, why do we have to call it word? 00:23:57.540 --> 00:23:59.290 We're not really using word at all. 00:23:59.290 --> 00:24:03.845 So we could just say underscore, meaning that, look, this name doesn't matter. 00:24:03.845 --> 00:24:04.720 It could be anything. 00:24:04.720 --> 00:24:10.180 It could be z, it could be f, it could be zebra, whatever we want it to be.
00:24:10.180 --> 00:24:12.220 But we're not going to use this variable name, 00:24:12.220 --> 00:24:16.920 so let's just call it underscore just to signify that this variable name doesn't 00:24:16.920 --> 00:24:17.760 quite matter. 00:24:17.760 --> 00:24:20.550 All we're interested in doing is looping through it. 00:24:20.550 --> 00:24:22.545 It doesn't change the output of this for loop. 00:24:22.545 --> 00:24:24.420 It doesn't change anything about how it runs. 00:24:24.420 --> 00:24:25.837 It just changes the variable name. 00:24:25.837 --> 00:24:28.840 So it signifies we don't really care what the name is, in this case. 00:24:28.840 --> 00:24:31.130 So I'll change it back to for word in words. 00:24:31.130 --> 00:24:32.880 Let's go ahead and run this piece of code. 00:24:32.880 --> 00:24:34.410 We'll run Python of text.py. 00:24:34.410 --> 00:24:38.160 And now we'll see, in round five, five Goodnight Moons. 00:24:38.160 --> 00:24:40.110 So let's keep scrolling up again. 00:24:40.110 --> 00:24:44.280 Round one, we did see, in the great green room. 00:24:44.280 --> 00:24:49.590 Round two, we see all the individual characters in the great green room. 00:24:49.590 --> 00:24:53.480 Round three, we just see great and green. 00:24:53.480 --> 00:24:56.472 Round four, we see great green room. 00:24:56.472 --> 00:24:58.430 And of course, in round five, as we saw before, 00:24:58.430 --> 00:25:02.890 we do see five instances of Goodnight Moon. 00:25:02.890 --> 00:25:06.470 OK, so that covers a lot of Python loops. 00:25:06.470 --> 00:25:09.220 And in general, if you're going to use this kind of Python syntax, 00:25:09.220 --> 00:25:10.720 I think you'll find it really handy. 00:25:10.720 --> 00:25:13.750 But it just takes some practice to get to know what each of these loops 00:25:13.750 --> 00:25:17.450 is doing and how they work with different data types, in this case. 00:25:17.450 --> 00:25:19.910 So let's keep going here.
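Round five with the underscore can be sketched like this, assuming the same list of five words (the list printed is only there so the result can be checked afterwards):

```python
# Assumed setup: same illustrative list as before.
words = "In the great green room".split()

# Round 5: the loop variable is never used, so by convention we name it _.
printed = []
for _ in words:
    printed.append("Goodnight Moon")
    print("Goodnight Moon")
```

The underscore is an ordinary variable name as far as Python is concerned. It's only a signal to the reader that the value is ignored and the loop is there to repeat something once per element.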
00:25:19.910 --> 00:25:23.140 And let's take a look at this new Python data type. 00:25:23.140 --> 00:25:24.880 This one called dictionary. 00:25:24.880 --> 00:25:27.700 And dictionaries are really important in Python, very easy to use, 00:25:27.700 --> 00:25:31.300 and often very useful for us, as you'll see in the problem set this week. 00:25:31.300 --> 00:25:32.750 So a dictionary. 00:25:32.750 --> 00:25:36.350 Well, if we think of a dictionary, it's this piece of paper, so to speak. 00:25:36.350 --> 00:25:40.930 We might have some idea of hosting some keys and some values. 00:25:40.930 --> 00:25:44.240 Maybe words and their definitions, like a real dictionary has. 00:25:44.240 --> 00:25:47.530 So here we have, for example, maybe a dictionary of authors. 00:25:47.530 --> 00:25:50.830 So maybe Goodnight Moon is one of our keys. 00:25:50.830 --> 00:25:54.040 And the value associated with that key is Margaret Wise Brown. 00:25:54.040 --> 00:25:57.100 Or Corduroy, that's our key, the book title, 00:25:57.100 --> 00:26:01.210 is now associated with the value Don Freeman, that author there. 00:26:01.210 --> 00:26:03.760 And Curious George associated with H.A. Rey, 00:26:03.760 --> 00:26:07.640 where we have Curious George as a key and H.A. Rey as the value. 00:26:07.640 --> 00:26:10.540 So what this gives us is this ability to look up, 00:26:10.540 --> 00:26:14.860 like in a dictionary, the actual author of some piece of text given the title. 00:26:14.860 --> 00:26:18.100 So we see, again, Goodnight Moon is an example of a key here 00:26:18.100 --> 00:26:20.900 and Margaret Wise Brown is an example of a value. 00:26:20.900 --> 00:26:26.710 So in this dictionary, authors, I could ask for a title-- like a book title. 00:26:26.710 --> 00:26:29.335 I could say, give me a book title and I'll say, Goodnight Moon. 00:26:29.335 --> 00:26:31.168 And the dictionary will actually give back 00:26:31.168 --> 00:26:34.270 the value associated with that key, like Margaret Wise Brown.
00:26:34.270 --> 00:26:39.190 Now, this example here is a collection of multiple objects 00:26:39.190 --> 00:26:40.540 in the same dictionary. 00:26:40.540 --> 00:26:43.592 Have multiple books here all in that same dictionary. 00:26:43.592 --> 00:26:46.300 We have Goodnight Moon, we have Corduroy, we have Curious George. 00:26:46.300 --> 00:26:49.810 But we can also have a single dictionary for a single object. 00:26:49.810 --> 00:26:53.560 We could also have, for example, a single book dictionary. 00:26:53.560 --> 00:26:56.302 And this dictionary has a title key and an author key, 00:26:56.302 --> 00:26:57.760 the two things that make it a book. 00:26:57.760 --> 00:26:59.020 It has a title and an author. 00:26:59.020 --> 00:27:01.450 Well, the title of this book is Goodnight Moon 00:27:01.450 --> 00:27:03.970 and the author is Margaret Wise Brown. 00:27:03.970 --> 00:27:08.170 So I could simply ask this dictionary for the title of the book by saying, 00:27:08.170 --> 00:27:09.220 give me the value-- 00:27:09.220 --> 00:27:10.408 the key title. 00:27:10.408 --> 00:27:11.950 I'm going to give you Goodnight Moon. 00:27:11.950 --> 00:27:13.783 I could also ask for the author of this book 00:27:13.783 --> 00:27:16.670 by asking for the value associated with the key author, 00:27:16.670 --> 00:27:20.590 and I get back Margaret Wise Brown, in this case. 00:27:20.590 --> 00:27:24.100 OK, so let's see an example of this in actual syntax. 00:27:24.100 --> 00:27:25.990 This is a pretty good theoretical overview, 00:27:25.990 --> 00:27:28.330 but let's think about it in syntax form. 00:27:28.330 --> 00:27:32.260 Here I have a new dictionary, book equals dict. 00:27:32.260 --> 00:27:34.300 And what this is doing for me is saying that, 00:27:34.300 --> 00:27:37.810 give me an empty dictionary, nothing in it yet, and call it book. 00:27:37.810 --> 00:27:42.410 So on the right hand side, I'll see this dictionary, a blank slate called book. 
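To make those two shapes of dictionary concrete before we build one up step by step, here is a sketch that writes both all at once using curly braces, a shorthand that comes up again shortly. The variable names authors and book follow the slides as described; the lookups are just illustrative:

```python
# One dictionary for many objects: each key is a title, each value an author.
authors = {
    "Goodnight Moon": "Margaret Wise Brown",
    "Corduroy": "Don Freeman",
    "Curious George": "H.A. Rey",
}

# One dictionary for a single object: the keys name the parts of a book.
book = {
    "title": "Goodnight Moon",
    "author": "Margaret Wise Brown",
}

# Looking up a key in square brackets gives back its value.
print(authors["Goodnight Moon"])
print(book["title"])
print(book["author"])
```

In the first dictionary we look things up by title. In the second we look things up by the name of the field we want.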
00:27:42.410 --> 00:27:46.302 Now, let's say I want to add in a key and a value that's associated with it. 00:27:46.302 --> 00:27:47.010 So I can do this. 00:27:47.010 --> 00:27:51.680 I can say, make sure you add this key called title 00:27:51.680 --> 00:27:53.180 and give it the value Corduroy. 00:27:53.180 --> 00:27:55.852 So this bracket notation is back. 00:27:55.852 --> 00:27:58.560 But now, to add a key to the dictionary, we're going to use that. 00:27:58.560 --> 00:28:03.290 So we're going to say book["title"] to insert a new key called title and give 00:28:03.290 --> 00:28:04.478 it the value Corduroy. 00:28:04.478 --> 00:28:06.020 And we can also do it for the author. 00:28:06.020 --> 00:28:10.440 We could say book["author"] is Don Freeman, in this case. 00:28:10.440 --> 00:28:14.990 So we're going to say that, in this case, this book's author is Don Freeman. 00:28:14.990 --> 00:28:18.290 Now, if I want to get back some value from my dictionary, I could do this. 00:28:18.290 --> 00:28:21.320 I could say, book, bracket, title, and print it out. 00:28:21.320 --> 00:28:22.880 And what would I get in this case? 00:28:22.880 --> 00:28:24.965 What do you think? 00:28:24.965 --> 00:28:25.840 I might get Corduroy. 00:28:25.840 --> 00:28:29.110 I would see Corduroy printed out to the screen down below. 00:28:29.110 --> 00:28:32.780 That's because saying book["title"] is saying, 00:28:32.780 --> 00:28:38.380 look for the value associated with this key called title inside this dictionary 00:28:38.380 --> 00:28:41.280 that we're calling book, right? 00:28:41.280 --> 00:28:46.748 OK, now though, what if I asked for the key Corduroy? 00:28:46.748 --> 00:28:48.040 What do you think would happen? 00:28:51.140 --> 00:28:55.075 Well, if we look at our dictionary, do we have a key called Corduroy? 00:28:55.075 --> 00:28:56.200 It doesn't look like we do. 00:28:56.200 --> 00:28:58.900 We only have a key for title and a key for author.
00:28:58.900 --> 00:29:02.650 Corduroy is a value, but it's not really a key, right? 00:29:02.650 --> 00:29:06.590 It's associated with this key title, but it isn't a key itself. 00:29:06.590 --> 00:29:09.160 So if we did this, we get what Python calls 00:29:09.160 --> 00:29:11.800 a KeyError, where a KeyError simply means we're 00:29:11.800 --> 00:29:15.910 trying to access some key in our dictionary that doesn't exist. 00:29:15.910 --> 00:29:18.560 It'll tell us that this key is not part of the dictionary. 00:29:18.560 --> 00:29:24.000 You can't look up the value for a key that doesn't exist, in this case. 00:29:24.000 --> 00:29:27.750 OK, so that gives us access to these dictionaries. 00:29:27.750 --> 00:29:30.010 We're going to make them in code. 00:29:30.010 --> 00:29:32.790 But what if we wanted to do something a little more advanced? 00:29:32.790 --> 00:29:36.330 This is a single book, but what if we had multiple books? 00:29:36.330 --> 00:29:39.720 Well, we could maybe shorten our syntax a little bit here. 00:29:39.720 --> 00:29:43.980 We could say that if I want a new book, let's just define it all in one breath. 00:29:43.980 --> 00:29:47.610 Here we have a new dictionary denoted by these curly braces now. 00:29:47.610 --> 00:29:49.900 Not square brackets, but curly braces. 00:29:49.900 --> 00:29:54.210 And we have the key, like title, and the value, like Goodnight Moon, 00:29:54.210 --> 00:29:58.080 and the key author and the value Margaret Wise Brown. 00:29:58.080 --> 00:30:01.410 To give you the full picture here it looks like-- a bit like this. 00:30:01.410 --> 00:30:03.230 But again, this is only one book. 00:30:03.230 --> 00:30:06.170 So how could I actually represent 00:30:06.170 --> 00:30:07.830 multiple books in our code?
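Before answering that, here is a sketch of the steps so far: building a dictionary key by key, looking a value up, and seeing what happens with a key that isn't there. The try and except around the bad lookup is an addition for illustration, so the sketch keeps running instead of stopping at the error:

```python
# Start with an empty dictionary and add keys with bracket notation.
book = dict()
book["title"] = "Corduroy"
book["author"] = "Don Freeman"

# Looking up an existing key gives back its value.
print(book["title"])

# "Corduroy" is a value, not a key, so this lookup raises a KeyError.
missing = None
try:
    book["Corduroy"]
except KeyError as error:
    missing = error.args[0]  # the key Python could not find
    print("No such key:", missing)

# The same kind of dictionary, defined all in one breath with curly braces.
another_book = {"title": "Goodnight Moon", "author": "Margaret Wise Brown"}
```

If you'd rather check before looking up, "Corduroy" in book asks whether that key exists, and here it would be False.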
00:30:07.830 --> 00:30:10.432 Well, we could keep this same style of dictionary 00:30:10.432 --> 00:30:12.140 where we have a dictionary for every book 00:30:12.140 --> 00:30:14.660 and it has the two keys, title and author. 00:30:14.660 --> 00:30:17.503 We could also make a list of them-- a list of dictionaries. 00:30:17.503 --> 00:30:19.295 So let's take a look at this we could see-- 00:30:19.295 --> 00:30:22.150 OK, here I have this list. 00:30:22.150 --> 00:30:27.230 And how do you know it's a list if you take a look at this piece of code? 00:30:27.230 --> 00:30:29.390 Well, we see those square brackets on either end. 00:30:29.390 --> 00:30:30.560 Again, the square brackets that are there 00:30:30.560 --> 00:30:32.880 at the beginning and square brackets at the end. 00:30:32.880 --> 00:30:35.900 And we also see we have some commas in the middle, 00:30:35.900 --> 00:30:37.380 just to highlight that here. 00:30:37.380 --> 00:30:39.350 This is a list. 00:30:39.350 --> 00:30:43.130 But inside of this list, instead of individual strings, 00:30:43.130 --> 00:30:46.490 for example, as we saw earlier, we now have full dictionaries. 00:30:46.490 --> 00:30:50.480 We have this dictionary for Goodnight Moon, this dictionary for Corduroy, 00:30:50.480 --> 00:30:53.180 and this dictionary for Curious George. 00:30:53.180 --> 00:30:58.070 So this is helpful for us because we can represent all kinds of different things 00:30:58.070 --> 00:31:01.400 using dictionaries, but make sure we have multiple of them 00:31:01.400 --> 00:31:05.610 by keeping a list of these very same dictionaries. 00:31:05.610 --> 00:31:07.010 So let's get some practice here. 00:31:07.010 --> 00:31:08.300 Let's go to books.py. 00:31:08.300 --> 00:31:11.480 And inside of books.py, we'll actually make 00:31:11.480 --> 00:31:15.020 sure we can prompt the user for a title of book and an author. 00:31:15.020 --> 00:31:19.260 And we'll add that to our bookshelf, which is a list of books, in this case. 
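A list of dictionaries like the one on the slide might look roughly like this. The three books are the ones mentioned earlier; the loop at the end and the titles list are additions to show how you'd get at each dictionary:

```python
# Square brackets on the outside make a list; each element is a dictionary.
books = [
    {"title": "Goodnight Moon", "author": "Margaret Wise Brown"},
    {"title": "Corduroy", "author": "Don Freeman"},
    {"title": "Curious George", "author": "H.A. Rey"},
]

# Index into the list first, then into the dictionary.
print(books[1]["author"])

# Or loop over the list, where each element is one whole dictionary.
titles = []
for book in books:
    titles.append(book["title"])
    print(book["title"], "by", book["author"])
```

Every dictionary uses the same two keys, which is what lets one loop handle every book the same way.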
00:31:19.260 --> 00:31:20.670 So let's go back over here. 00:31:20.670 --> 00:31:23.210 And let's close out some old files and maybe we'll 00:31:23.210 --> 00:31:27.310 go ahead and code up books.py. 00:31:27.310 --> 00:31:30.940 So notice how we have part of our code prewritten for us. 00:31:30.940 --> 00:31:33.160 We have this list of books that is empty. 00:31:33.160 --> 00:31:37.090 And this is a list, again, because it's simply two empty square brackets. 00:31:37.090 --> 00:31:40.300 Nothing inside this list, but there could be eventually. 00:31:40.300 --> 00:31:44.140 And now, we have this for loop for i in range 3. 00:31:44.140 --> 00:31:48.040 We saw range in lecture, which simply gives us the numbers from 0 all the way up 00:31:48.040 --> 00:31:49.280 to 2, in this case. 00:31:49.280 --> 00:31:52.780 We can do 0, 1, 2, loop three times. 00:31:52.780 --> 00:31:56.890 And inside this loop, we'll actually make sure we have our new dictionary 00:31:56.890 --> 00:31:58.885 and we add it to our list of books. 00:31:58.885 --> 00:32:00.760 And finally, at the very end, we'll print out 00:32:00.760 --> 00:32:03.890 our bookshelf, our list of books, in this case. 00:32:03.890 --> 00:32:06.775 So if we wanted to start here, well, I can go back to my syntax 00:32:06.775 --> 00:32:07.900 as we saw before. 00:32:07.900 --> 00:32:09.280 If I go back here and I see-- 00:32:09.280 --> 00:32:13.840 if I want to make a new dictionary, I just need to ask for a blank dict. 00:32:13.840 --> 00:32:18.670 So I'll go back over here and I'll say, give me a new dictionary called book, 00:32:18.670 --> 00:32:20.060 in this case. 00:32:20.060 --> 00:32:23.150 And maybe I want to add a key to this dictionary. 00:32:23.150 --> 00:32:27.022 So think to yourself, what is the syntax for that? 00:32:27.022 --> 00:32:28.230 And it looks a bit like this. 00:32:28.230 --> 00:32:31.350 We could say, OK, I want to have a title here.
00:32:31.350 --> 00:32:34.350 The key to this dictionary will be title. 00:32:34.350 --> 00:32:37.800 And I'll say, make sure that the title is whatever the user inputs. 00:32:37.800 --> 00:32:40.950 And I'll ask them for a title. 00:32:40.950 --> 00:32:45.600 And similarly, I could also say, OK, let's add an author to this book. 00:32:45.600 --> 00:32:50.170 And I'll say, the input for that will be asking the user for an author. 00:32:50.170 --> 00:32:52.320 So now, we have this blank dictionary. 00:32:52.320 --> 00:32:55.110 We've asked the user to give us a new key 00:32:55.110 --> 00:32:59.430 for the-- a new value for the key title, and similarly, 00:32:59.430 --> 00:33:01.620 a new value for the key author. 00:33:01.620 --> 00:33:05.220 We've made these keys and given them some value from the user. 00:33:05.220 --> 00:33:07.388 All right, we have our book. 00:33:07.388 --> 00:33:09.180 We could even print it out if we wanted to. 00:33:09.180 --> 00:33:11.480 You can print book down here. 00:33:11.480 --> 00:33:15.510 Let's run Python of books.py. 00:33:15.510 --> 00:33:24.060 We'll say let's get, in this case, Goodnight Moon by Margaret Wise Brown. 00:33:24.060 --> 00:33:28.320 And we see that, down below here, we do have that dictionary being printed out. 00:33:28.320 --> 00:33:31.650 And it's dictionary because we see it has these curly braces on either end 00:33:31.650 --> 00:33:34.480 and it has these keys associated with these values. 00:33:34.480 --> 00:33:37.170 So let's actually end our program here. 00:33:37.170 --> 00:33:40.812 What we can do is try to add this book to our list. 00:33:40.812 --> 00:33:43.020 And if you are not familiar with this yet, that's OK. 00:33:43.020 --> 00:33:47.320 We can actually go ahead and say books.append to add to our list. 00:33:47.320 --> 00:33:48.990 So currently, our list is empty. 
00:33:48.990 --> 00:33:53.610 But we could use this method associated with a list called append to actually 00:33:53.610 --> 00:33:55.650 insert this book into our list. 00:33:55.650 --> 00:33:58.410 books.append an individual book. 00:33:58.410 --> 00:33:59.470 So let's try this. 00:33:59.470 --> 00:34:01.220 Let's actually go ahead and run books.py. 00:34:01.220 --> 00:34:03.090 We'll do Python of books.py. 00:34:03.090 --> 00:34:05.190 Let's go ahead and add Goodnight Moon. 00:34:05.190 --> 00:34:07.890 This one written by, let's say, CS50. 00:34:07.890 --> 00:34:10.050 We could also add Corduroy. 00:34:10.050 --> 00:34:12.300 And maybe, [INAUDIBLE] CS52. 00:34:12.300 --> 00:34:14.850 And maybe we could add-- 00:34:14.850 --> 00:34:16.710 oh, I don't know, we can add Curious George, 00:34:16.710 --> 00:34:19.150 and that one is also written by CS50. 00:34:19.150 --> 00:34:21.210 So now we see down below-- 00:34:21.210 --> 00:34:22.530 if I go full screen-- 00:34:22.530 --> 00:34:24.193 we have this list of dictionaries. 00:34:24.193 --> 00:34:25.110 So what we saw before. 00:34:25.110 --> 00:34:29.110 We have the brackets starting our list on either end. 00:34:29.110 --> 00:34:31.170 The commas separate our dictionaries. 00:34:31.170 --> 00:34:37.500 And on the inside, we have this dictionary, this dictionary, 00:34:37.500 --> 00:34:40.120 and this dictionary down below here. 00:34:40.120 --> 00:34:43.889 So now, we have some individual books on our bookshelf. 00:34:43.889 --> 00:34:44.915 And now, what can we do? 00:34:44.915 --> 00:34:47.040 We could maybe decide to print out just the titles. 00:34:47.040 --> 00:34:48.570 I can go back through my books. 00:34:48.570 --> 00:34:52.949 I could say for book in books, looping through my list of books, 00:34:52.949 --> 00:34:56.547 print out the book's title like this. 00:34:56.547 --> 00:34:58.380 So now, instead of printing the entire list, 00:34:58.380 --> 00:34:59.670 I could print out just the titles.
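Putting those pieces of books.py together might look something like the sketch below. The function build_bookshelf, its ask parameter, and the prompt strings "Title: " and "Author: " are all assumptions made for illustration, not what's on screen. Wrapping the loop in a function with a swappable ask is only there so the sketch can be run with canned answers instead of waiting at the keyboard:

```python
def build_bookshelf(ask=input, count=3):
    """Prompt for count books and return them as a list of dictionaries."""
    books = []
    for i in range(count):
        book = dict()
        book["title"] = ask("Title: ")    # prompt text is an assumption
        book["author"] = ask("Author: ")  # prompt text is an assumption
        books.append(book)                # add this dictionary to the list
    return books


# Canned answers standing in for what the user would type.
answers = iter([
    "Goodnight Moon", "CS50",
    "Corduroy", "CS50",
    "Curious George", "CS50",
])
shelf = build_bookshelf(ask=lambda prompt: next(answers))

# Print just the titles rather than the whole list.
for book in shelf:
    print(book["title"])
```

Calling build_bookshelf() with no arguments would use the real input function and prompt three times, much like the version typed in during the section.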
00:34:59.670 --> 00:35:00.337 I could do this. 00:35:00.337 --> 00:35:02.100 I could say Python of books.py. 00:35:02.100 --> 00:35:07.440 I'll say Goodnight Moon by CS50. 00:35:07.440 --> 00:35:11.550 I'll say Corduroy by CS50. 00:35:11.550 --> 00:35:17.900 And I will say, in this case, Curious George by CS50. 00:35:17.900 --> 00:35:21.470 And now, I see Goodnight Moon, Corduroy, and Curious George. 00:35:21.470 --> 00:35:24.770 So helpful because we're able to structure our data inside 00:35:24.770 --> 00:35:28.500 of a dictionary and put that inside of our list here. 00:35:28.500 --> 00:35:32.720 OK, so what if we wanted to make sure the user 00:35:32.720 --> 00:35:35.990 couldn't type in a really awkward title? 00:35:35.990 --> 00:35:38.030 Like let's say, if I do this again, I might 00:35:38.030 --> 00:35:44.240 type in space, space, Goodnight Moon. 00:35:44.240 --> 00:35:46.670 And that isn't quite what I want, right? 00:35:46.670 --> 00:35:50.272 That isn't really good on me as a user to give this kind of data, 00:35:50.272 --> 00:35:51.980 but it's also not good on the programmer 00:35:51.980 --> 00:35:54.772 to assume that users are going to give me exactly what I want here. 00:35:54.772 --> 00:35:57.890 So instead of just accepting user input, I actually 00:35:57.890 --> 00:36:00.570 could go through and try to sanitize it a little bit. 00:36:00.570 --> 00:36:01.658 Make sure to clean it up. 00:36:01.658 --> 00:36:03.950 So I actually have it in the format I want it to be in. 00:36:03.950 --> 00:36:07.310 I could instead say, OK, whenever I get this input, 00:36:07.310 --> 00:36:11.640 I want to afterwards strip the white space. 00:36:11.640 --> 00:36:14.640 And I also want to capitalize it. 00:36:14.640 --> 00:36:17.610 So what this is doing here is stringing this dot notation together. 00:36:17.610 --> 00:36:21.860 So here I have input, which gives me back a string.
00:36:21.860 --> 00:36:25.640 And remember, strings have access to these methods, like dot strip 00:36:25.640 --> 00:36:27.080 and dot capitalize. 00:36:27.080 --> 00:36:30.020 So first, we're going to get some input from the user, some string. 00:36:30.020 --> 00:36:31.760 We're going to strip it using this. 00:36:31.760 --> 00:36:34.010 And then, we're to run capitalize on it like this. 00:36:34.010 --> 00:36:35.010 So let's try that again. 00:36:35.010 --> 00:36:39.230 Let's do, in this case, print out just the title right after we get it. 00:36:39.230 --> 00:36:42.048 We'll print out book title. 00:36:42.048 --> 00:36:43.340 And let's go ahead and do this. 00:36:43.340 --> 00:36:45.560 We'd say Python of books.py. 00:36:45.560 --> 00:36:52.970 We'll say Goodnight Moon. 00:36:52.970 --> 00:36:54.620 And it's a little better, right? 00:36:54.620 --> 00:36:58.190 We capitalized the G in Goodnight Moon and we made sure 00:36:58.190 --> 00:37:00.170 that everything else is in lowercase. 00:37:00.170 --> 00:37:05.200 And there's no white space in front of anything we added here. 00:37:05.200 --> 00:37:08.540 OK, so just some handy syntax for cleaning up user input 00:37:08.540 --> 00:37:11.390 and making sure that you can make sure your data 00:37:11.390 --> 00:37:16.020 is formatted correctly in your own programs here. 00:37:16.020 --> 00:37:21.150 OK, so now that we have some dictionaries and this ability 00:37:21.150 --> 00:37:23.310 to represent data in this way, we can think 00:37:23.310 --> 00:37:26.380 of getting a little more advanced with our programs. 00:37:26.380 --> 00:37:29.040 If I go back to our slides, we might think 00:37:29.040 --> 00:37:33.210 of not just getting this shelf of books that the user types in, 00:37:33.210 --> 00:37:35.157 but really using some data in our programs. 00:37:35.157 --> 00:37:37.740 And we'll see this in action during the problem set this week. 
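The chaining described a moment ago can be sketched like this. A fixed string stands in for what input would give back, so the sketch doesn't need a keyboard; the variable names are made up for illustration:

```python
# Stand-in for input("Title: "), with awkward leading and trailing spaces.
raw = "  Goodnight Moon  "

# Each method returns a new string, so the calls can be chained with dots.
stripped = raw.strip()             # whitespace removed from both ends
title = raw.strip().capitalize()   # then first letter upper, the rest lower

print(repr(stripped))
print(repr(title))
```

Notice that capitalize lowercases everything after the first character, so the M in Moon becomes lowercase too. That matches what we saw in the terminal, and it's worth keeping in mind if you want to preserve the capitals inside a title.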
00:37:37.740 --> 00:37:41.700 How do we get data and use it inside of our programs, especially using Python? 00:37:41.700 --> 00:37:45.480 Well, you can think of these libraries and these modules, 00:37:45.480 --> 00:37:48.090 where a library is some code somebody else has written. 00:37:48.090 --> 00:37:52.230 And in Python, we more specifically often call this an individual module. 00:37:52.230 --> 00:37:54.210 And so, in this example, we'll actually see 00:37:54.210 --> 00:37:58.960 a CSV module to work with data that's inside of a CSV file. 00:37:58.960 --> 00:38:00.540 But what is a CSV file? 00:38:00.540 --> 00:38:04.710 So on your computer, maybe you have Excel or Google Sheets or something 00:38:04.710 --> 00:38:05.280 like that. 00:38:05.280 --> 00:38:08.730 And you could store data in different rows and different columns. 00:38:08.730 --> 00:38:12.480 So notice how here I have a title column and an author column 00:38:12.480 --> 00:38:13.993 and individual rows for every book. 00:38:13.993 --> 00:38:17.160 So I see Goodnight Moon with Margaret Wise Brown, Corduroy with Don Freeman, 00:38:17.160 --> 00:38:21.310 all the way down for these 15 or so books that I have in my data set. 00:38:21.310 --> 00:38:24.330 So this is what it looks like in Google Sheets or Excel. 00:38:24.330 --> 00:38:27.960 But under the hood, in the actual computer's file, 00:38:27.960 --> 00:38:33.660 you'll see something that looks a bit like this with title,author Goodnight Moon, 00:38:33.660 --> 00:38:36.930 Margaret Wise Brown, and Corduroy, Don Freeman. 00:38:36.930 --> 00:38:40.680 So a CSV stands for Comma Separated Values, 00:38:40.680 --> 00:38:43.815 where notice how every individual row is a single book 00:38:43.815 --> 00:38:45.690 except for that first one, which is the row-- 00:38:45.690 --> 00:38:47.550 is the column titles. 00:38:47.550 --> 00:38:52.200 And for every row we have, we have multiple columns 00:38:52.200 --> 00:38:54.000 separated by these commas.
00:38:54.000 --> 00:38:56.310 So Goodnight Moon is the title of this book 00:38:56.310 --> 00:38:59.520 and Margaret Wise Brown is the author of this book that's on the same row 00:38:59.520 --> 00:39:00.310 right here. 00:39:00.310 --> 00:39:03.480 And similarly, Winnie the Pooh is the title of this book and A.A. 00:39:03.480 --> 00:39:05.580 Milne is the title of this book-- 00:39:05.580 --> 00:39:08.350 is the author of this book on that same row. 00:39:08.350 --> 00:39:12.770 OK, so to read in these kinds of files, we 00:39:12.770 --> 00:39:15.590 might want to use a specialized system that understands 00:39:15.590 --> 00:39:17.750 how this data is formatted, right? 00:39:17.750 --> 00:39:20.690 It would be a lot of work for you to go through and parse every comma. 00:39:20.690 --> 00:39:22.790 To figure out, OK, if there's a comma here, 00:39:22.790 --> 00:39:26.120 I need to put this piece of data in that dictionary or this dictionary. 00:39:26.120 --> 00:39:27.090 Let's not do that. 00:39:27.090 --> 00:39:30.390 Let's actually rely on somebody else who's done that work for us here. 00:39:30.390 --> 00:39:35.300 So in Python, there is this CSV library, or CSV module, 00:39:35.300 --> 00:39:40.010 that has various methods or functions given inside of it 00:39:40.010 --> 00:39:42.540 that can help us read CSV files. 00:39:42.540 --> 00:39:45.980 So here, if you go to the Python documentation, docs.python.org, 00:39:45.980 --> 00:39:48.230 and you look at this CSV module, you'll be 00:39:48.230 --> 00:39:50.510 able to see all the kinds of information on what 00:39:50.510 --> 00:39:55.268 is defined inside the CSV module and what you get as part of that module. 00:39:55.268 --> 00:39:56.810 Now, how would I use this in my code? 00:39:56.810 --> 00:39:57.768 We saw this in lecture. 00:39:57.768 --> 00:40:01.340 I could simply write import csv, similar to hashtag 00:40:01.340 --> 00:40:04.130 include stdio or hashtag include cs50.
00:40:04.130 --> 00:40:07.100 Here, I'm simply including, or importing, 00:40:07.100 --> 00:40:10.130 the CSV library that contains all this functionality I 00:40:10.130 --> 00:40:13.160 saw in the documentation. 00:40:13.160 --> 00:40:15.070 So we can think visually of this. 00:40:15.070 --> 00:40:17.740 It's a bit like getting a big box of stuff. 00:40:17.740 --> 00:40:22.960 We have this big box of code we can use in our program now called CSV. 00:40:22.960 --> 00:40:24.460 This is giving us a big box of code. 00:40:24.460 --> 00:40:27.400 And inside of that are some individual functions we could use. 00:40:27.400 --> 00:40:30.970 We could use maybe DictReader, DictWriter, reader, or writer. 00:40:30.970 --> 00:40:35.140 All this is defined inside of the CSV library. 00:40:35.140 --> 00:40:39.130 But how do we know from this big box of stuff 00:40:39.130 --> 00:40:41.990 what we actually want to use in our program? 00:40:41.990 --> 00:40:46.300 So if we just import the entire module, this entire big box of stuff, 00:40:46.300 --> 00:40:49.630 well, to be more specific, what we want to use, we have to use that dot syntax. 00:40:49.630 --> 00:40:53.870 We could say something like this. csv.DictReader, for example, 00:40:53.870 --> 00:40:57.430 to read our CSV as this collection of dictionaries. 00:40:57.430 --> 00:41:01.810 We could say csv.DictReader saying, go to that big box of stuff 00:41:01.810 --> 00:41:04.960 in the CSV module and give me the DictReader part of it-- 00:41:04.960 --> 00:41:07.210 the DictReader function, right? 00:41:07.210 --> 00:41:11.150 We could also do csv.reader to get the reader aspect and so on. 00:41:11.150 --> 00:41:14.020 So this dot syntax is coming back, but now, it's 00:41:14.020 --> 00:41:18.530 enabling us to access individual parts of our module. 00:41:18.530 --> 00:41:23.575 But let's say we don't want the entirety of this entire big box of CSV module. 00:41:23.575 --> 00:41:24.950 We don't want everything in here. 
00:41:24.950 --> 00:41:26.540 Well, we could do this also. 00:41:26.540 --> 00:41:30.470 We could say, instead of import CSV as a whole, well, we could just say, 00:41:30.470 --> 00:41:34.643 give me from the CSV library the DictReader portion. 00:41:34.643 --> 00:41:35.310 I could do this. 00:41:35.310 --> 00:41:38.480 I could simply use now from here on out just DictReader. 00:41:38.480 --> 00:41:43.250 So from CSV, from this big box of stuff, import just DictReader. 00:41:43.250 --> 00:41:47.180 And then, I can simply use DictReader without qualifying where it comes from 00:41:47.180 --> 00:41:50.500 or what module it's part of, just using DictReader now. 00:41:50.500 --> 00:41:54.440 Now, in general, we might prefer to actually do this the other way, 00:41:54.440 --> 00:41:55.970 to use it this way. 00:41:55.970 --> 00:41:58.153 And why do you think that might be? 00:41:58.153 --> 00:41:59.445 Think to yourself for a moment. 00:42:03.870 --> 00:42:06.000 Well, it's often handy to do it this way. 00:42:06.000 --> 00:42:07.890 Because if we do it this way, we actually are 00:42:07.890 --> 00:42:10.620 able to make sure our names don't collide. 00:42:10.620 --> 00:42:13.410 So maybe my own program has this function 00:42:13.410 --> 00:42:16.290 called reader, by chance, right? 00:42:16.290 --> 00:42:20.910 Here, if I say csv.reader, that differentiates this reader function 00:42:20.910 --> 00:42:22.630 from my own reader function. 00:42:22.630 --> 00:42:25.380 So it's helpful if you're actually defining your own function that 00:42:25.380 --> 00:42:28.980 might collide names with the functions you get from other modules in Python. 00:42:28.980 --> 00:42:30.750 But you can, of course, do it this way. 00:42:30.750 --> 00:42:32.792 If you'd like, and you're very certain you don't have 00:42:32.792 --> 00:42:35.800 any function called DictReader here.
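To make the two import styles concrete, here is a minimal sketch. The `reader` function defined below is a hypothetical helper of our own, an assumption made up only to show the name collision described above.

```python
import csv                  # whole module: names are reached as csv.<name>
from csv import DictReader  # single name: usable without the csv. prefix


def reader(text):
    """Hypothetical function of our own that shares a name with csv.reader."""
    return text.upper()


# Plain reader is ours; csv.reader is still the module's function.
shout = reader("goodnight moon")
names_differ = csv.reader is not reader

# Both spellings refer to the very same DictReader.
same_class = DictReader is csv.DictReader

print(shout, names_differ, same_class)
```

If the file had instead said `from csv import reader`, whichever definition ran last would win the name `reader`, which is the collision the dot syntax avoids.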
00:42:35.800 --> 00:42:41.560 OK, so let's see some of the differences between using these various functions 00:42:41.560 --> 00:42:43.750 inside of this CSV library. 00:42:43.750 --> 00:42:47.170 So we saw here before we had DictReader, DictWriter, reader, and writer. 00:42:47.170 --> 00:42:48.920 But what are the difference between these, 00:42:48.920 --> 00:42:51.020 and why would we even use one versus another? 00:42:51.020 --> 00:42:55.230 So to do that, let's actually dive into reading files and maybe writing 00:42:55.230 --> 00:42:55.730 to them. 00:42:55.730 --> 00:43:00.160 So let's actually go back to our code space now and think about this CSV. 00:43:00.160 --> 00:43:02.643 We have books.csv. 00:43:02.643 --> 00:43:04.060 and it's same thing we saw before. 00:43:04.060 --> 00:43:07.420 We have title, author as our column names, 00:43:07.420 --> 00:43:13.460 and we have the title and the author on individual rows separated by commas. 00:43:13.460 --> 00:43:17.800 So if I want to read these, I'll say, code reads.py. 00:43:17.800 --> 00:43:21.820 I now have this list of books and I've imported the CSV library 00:43:21.820 --> 00:43:27.270 so I can read books from this CSV and add them to my shelf, so to speak. 00:43:27.270 --> 00:43:32.340 So to open a file in Python, well, there's a few different ways to do it. 00:43:32.340 --> 00:43:33.860 I could simply just call open. 00:43:33.860 --> 00:43:37.190 I could say open books.csv. 00:43:37.190 --> 00:43:41.870 But if I do that, I later have to do something with the file here. 00:43:41.870 --> 00:43:44.772 I'll say it like, file equals open. 00:43:44.772 --> 00:43:45.980 And then, I have to close it. 00:43:45.980 --> 00:43:49.603 I could say close file like this. 00:43:49.603 --> 00:43:50.270 Or is it fclose? 00:43:52.942 --> 00:43:53.900 Pretty sure it's close. 00:43:53.900 --> 00:43:56.150 But you can double check me on that. 
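To double check the open and close step, here is a small sketch. In Python, `close` is a method called on the file object itself, and there is no `fclose` as in C. The temporary path and the one-book sample contents are assumptions standing in for the real books.csv.

```python
import os
import tempfile

# Assumption: a tiny stand-in for books.csv, written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
out = open(path, "w")
out.write("title,author\nGoodnight Moon,Margaret Wise Brown\n")
out.close()

# Open the file, use it, then close it ourselves.
file = open(path)
first_line = file.readline()
file.close()               # a method on the file object, not close(file)

was_closed = file.closed   # the file object remembers that it was closed
print(first_line.strip())
print(was_closed)
```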
00:43:56.150 --> 00:43:58.460 But we actually don't have to-- we actually 00:43:58.460 --> 00:44:02.150 don't have to worry about this at all if we just say let's not just open 00:44:02.150 --> 00:44:03.120 it like this. 00:44:03.120 --> 00:44:06.320 Let's actually open it within a certain context. 00:44:06.320 --> 00:44:08.045 Only open it for a little bit. 00:44:08.045 --> 00:44:10.920 And once we're done with that file, go ahead and close it afterwards. 00:44:10.920 --> 00:44:15.890 We can say with open this file and let's call it something, like file, 00:44:15.890 --> 00:44:16.500 in this case. 00:44:16.500 --> 00:44:21.890 So we're going to open books.csv and call it file with open this file 00:44:21.890 --> 00:44:22.880 as file. 00:44:22.880 --> 00:44:25.220 Let's do the following code indented. 00:44:25.220 --> 00:44:28.800 So while we're indented here, our file will be open. 00:44:28.800 --> 00:44:30.630 We can do all kinds of things with it. 00:44:30.630 --> 00:44:35.130 But once we unindent, we go back out, our file will be closed. 00:44:35.130 --> 00:44:37.850 We can't do anything more with it here. 00:44:37.850 --> 00:44:40.760 So this takes care of running close on our file 00:44:40.760 --> 00:44:42.760 or figuring out when to open it versus close it. 00:44:42.760 --> 00:44:47.900 Python handles all of that now for us using this with syntax here. 00:44:47.900 --> 00:44:50.490 OK, so let's see this in a little more depth. 00:44:50.490 --> 00:44:51.620 We saw with open. 00:44:51.620 --> 00:44:53.488 This whatever file name we have as file. 00:44:53.488 --> 00:44:54.530 You can change file here. 00:44:54.530 --> 00:44:59.130 We can also call this maybe even books_file. 00:44:59.130 --> 00:45:01.230 Doesn't have to be just file. 00:45:01.230 --> 00:45:03.530 But here, we'll call it file. 00:45:03.530 --> 00:45:06.330 We can then read it in a few different ways. 00:45:06.330 --> 00:45:08.490 And one way, it doesn't even use the CSV module.
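Here is a short sketch of that with syntax, checking that the file is open while the code is indented and closed once we unindent. The temporary path and the sample contents are assumptions standing in for the real books.csv.

```python
import os
import tempfile

# Assumption: a tiny stand-in for books.csv, written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
with open(path, "w") as out:
    out.write("title,author\nGoodnight Moon,Margaret Wise Brown\n")

with open(path) as file:
    # Indented: the file is open here and we can work with it.
    open_inside = not file.closed

# Unindented: Python has already closed the file for us.
closed_after = file.closed
print(open_inside, closed_after)
```

The name after `as` is our choice, so `books_file` would work just as well as `file`.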
00:45:08.490 --> 00:45:09.230 We could do this. 00:45:09.230 --> 00:45:15.050 Text equals file.read where .read is some method associated with a file that 00:45:15.050 --> 00:45:18.650 lets us read in all the data that's part of it and store it inside some 00:45:18.650 --> 00:45:19.230 variables. 00:45:19.230 --> 00:45:20.355 So here, let's do the same. 00:45:20.355 --> 00:45:24.810 Let's say, text equals file.read. 00:45:24.810 --> 00:45:26.700 And this is not using the CSV library. 00:45:26.700 --> 00:45:28.410 We don't even need this right now. 00:45:28.410 --> 00:45:30.940 We could simply say, Python of reads.py. 00:45:30.940 --> 00:45:33.360 And we've maybe read our file. 00:45:33.360 --> 00:45:36.480 It's hard to tell, so let's go ahead and maybe print out the text. 00:45:36.480 --> 00:45:38.220 And our Python of reads.py. 00:45:38.220 --> 00:45:43.110 And now we see, in our terminal, well, all the same text we had before. 00:45:43.110 --> 00:45:44.340 Title, author. 00:45:44.340 --> 00:45:45.930 Goodnight Moon, Margaret Wise Brown. 00:45:45.930 --> 00:45:47.760 All of that is in our terminal now. 00:45:47.760 --> 00:45:49.920 But this isn't very useful, right? 00:45:49.920 --> 00:45:53.670 If we wanted to actually read in some data, store it in our bookshelf, 00:45:53.670 --> 00:45:57.690 well, that isn't going to help us add to our list of books, right? 00:45:57.690 --> 00:46:00.417 All of these books are still jumbled together. 00:46:00.417 --> 00:46:01.500 We don't really want that. 00:46:01.500 --> 00:46:03.540 We want to actually differentiate them and have 00:46:03.540 --> 00:46:07.080 dictionaries for every book in our CSV. 00:46:07.080 --> 00:46:12.030 So that's where this function of-- this method called a DictReader comes in. 00:46:12.030 --> 00:46:16.770 We can actually use the CSV module to define a special way to read our file. 00:46:16.770 --> 00:46:18.660 And we can then use it like this. 
00:46:18.660 --> 00:46:21.375 We could say, give us a new file reader, this one 00:46:21.375 --> 00:46:23.910 is special from the csv.DictReader function. 00:46:23.910 --> 00:46:26.760 And let's actually use that to go through every individual row 00:46:26.760 --> 00:46:28.920 in our file and do something for those rows. 00:46:28.920 --> 00:46:30.810 So it's best shown by example here. 00:46:30.810 --> 00:46:34.080 If we go back to our code, let's not just read all the text. 00:46:34.080 --> 00:46:36.960 Let's go ahead and get a special reader for our file. 00:46:36.960 --> 00:46:39.960 Let's say I want a file reader. 00:46:39.960 --> 00:46:46.400 And I'm going to make sure this is the DictReader inside of the CSV module. 00:46:46.400 --> 00:46:50.710 Well, if I read the documentation, I know that DictReader needs access 00:46:50.710 --> 00:46:51.910 to a certain file to read. 00:46:51.910 --> 00:46:56.110 So I'll give it my file here that I've opened, books.csv. 00:46:56.110 --> 00:46:58.630 And DictReader will give us back some special data 00:46:58.630 --> 00:47:01.300 structure that we'll call file reader. 00:47:01.300 --> 00:47:03.620 And we can use this in our code as follows. 00:47:03.620 --> 00:47:07.330 I have to loop over it similar to a list in Python. 00:47:07.330 --> 00:47:08.830 And I can do that with for syntax. 00:47:08.830 --> 00:47:12.768 I could say for whatever in file reader. 00:47:12.768 --> 00:47:14.560 And let's maybe call this-- because we know 00:47:14.560 --> 00:47:16.477 we're looking at books-- let's call this book. 00:47:16.477 --> 00:47:18.400 For book in file reader. 00:47:18.400 --> 00:47:21.860 And now let's just print out book to see what we get here. 00:47:21.860 --> 00:47:24.760 So I want it to just say Python of reads.py. 00:47:24.760 --> 00:47:29.300 And now, I see individual books as dictionaries. 00:47:29.300 --> 00:47:31.240 So DictReader has done a lot of stuff for us.
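The loop just described could look roughly like this. It is a sketch, not the exact reads.py from the section: the temporary path and the two sample rows are assumptions standing in for the real books.csv, and `newline=""` is what the csv documentation recommends when opening a file, even though it was not typed in the demo.

```python
import csv
import os
import tempfile

# Assumption: a two-book stand-in for books.csv, written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
with open(path, "w", newline="") as out:
    out.write("title,author\n")
    out.write("Goodnight Moon,Margaret Wise Brown\n")
    out.write("Winnie the Pooh,A.A. Milne\n")

rows = []
with open(path, newline="") as file:
    file_reader = csv.DictReader(file)
    for book in file_reader:
        print(book)              # each row arrives as a dictionary
        rows.append(dict(book))  # kept only so we can look at the rows afterwards

first_title = rows[0]["title"]
```

The keys come from the first line of the file, so `book["title"]` and `book["author"]` are available on every row.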
00:47:31.240 --> 00:47:34.970 It said, well, I know that from your CSV file, 00:47:34.970 --> 00:47:38.200 you have these columns called title and author. 00:47:38.200 --> 00:47:40.677 And I also know that every individual row 00:47:40.677 --> 00:47:42.760 is going to be some particular element that you're 00:47:42.760 --> 00:47:46.480 interested in where maybe this column corresponds to title 00:47:46.480 --> 00:47:48.820 and this column corresponds to author. 00:47:48.820 --> 00:47:51.460 So what I'll do is I'll give you each of those rows 00:47:51.460 --> 00:47:55.660 as a dictionary with the keys that are your column names 00:47:55.660 --> 00:48:00.320 and the values that are whatever is inside every individual row right here. 00:48:00.320 --> 00:48:02.800 Notice how we have, in this case, title as the key. 00:48:02.800 --> 00:48:04.180 Goodnight Moon is the value. 00:48:04.180 --> 00:48:06.880 Author is the key and Margaret Wise Brown is the value. 00:48:06.880 --> 00:48:09.730 And it's the very same thing all the way through our CSV. 00:48:09.730 --> 00:48:13.950 Now, printed out in our terminal as individual dictionaries. 00:48:13.950 --> 00:48:17.470 So if we've printed these out, adding them to our list is pretty simple. 00:48:17.470 --> 00:48:21.950 We can just say books.append an individual book. 00:48:21.950 --> 00:48:25.693 And now, if we clear our terminal and rerun Python of reads.py, 00:48:25.693 --> 00:48:28.110 well, we don't see anything because I didn't print it out. 00:48:28.110 --> 00:48:31.850 So do print books down below here. 00:48:31.850 --> 00:48:35.720 And now, we see all of our books inside of our list. 00:48:35.720 --> 00:48:38.660 And this is helpful because, again, we could just print out 00:48:38.660 --> 00:48:40.670 individual books or book titles. 00:48:40.670 --> 00:48:43.910 I could say for book in books. 00:48:43.910 --> 00:48:46.860 Let me go ahead and print out the book title.
00:48:46.860 --> 00:48:48.180 And I should see-- 00:48:48.180 --> 00:48:51.060 instead of all of my information on all my books, 00:48:51.060 --> 00:48:56.130 like we saw earlier with our just .read method, I can say Python of reads.py. 00:48:56.130 --> 00:49:01.320 Now, I see just the titles formatted very nicely for myself here. 00:49:01.320 --> 00:49:05.690 So if we go back, this is how we're going to actually read CSV files. 00:49:05.690 --> 00:49:08.660 There's other ways too, like csv.reader. 00:49:08.660 --> 00:49:11.150 But this isn't going to quite be as useful for us 00:49:11.150 --> 00:49:12.567 because let's take a look at this. 00:49:12.567 --> 00:49:19.140 If we say csv.reader, let's go ahead and say for row in file reader, print row. 00:49:19.140 --> 00:49:21.840 Let's see what we get instead. 00:49:21.840 --> 00:49:22.800 Python reads.py. 00:49:22.800 --> 00:49:24.870 Well, we get just a list. 00:49:24.870 --> 00:49:28.740 And notice how it's actually included the column names in a single list. 00:49:28.740 --> 00:49:33.540 So reader gives us back every row of our file as a list. 00:49:33.540 --> 00:49:35.850 But this isn't quite as handy because I have 00:49:35.850 --> 00:49:40.410 to know that, for example, if I want to print out the book title, well, 00:49:40.410 --> 00:49:44.910 that's going to be a row where my titles are in the very first index. 00:49:44.910 --> 00:49:48.083 So I'll say row [0] to get the titles. 00:49:48.083 --> 00:49:48.750 Python reads.py. 00:49:48.750 --> 00:49:52.320 And I get title Goodnight Moon, Corduroy. 00:49:52.320 --> 00:49:55.020 That works, but it's not quite as clean as being 00:49:55.020 --> 00:49:58.660 able to name the actual attributes of my book that I want. 00:49:58.660 --> 00:50:00.300 And that's where DictReader comes in. 00:50:00.300 --> 00:50:03.120 By reading our rows as dictionaries, we get access 00:50:03.120 --> 00:50:06.230 to those individual keys we can use throughout our code here.
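To put the two readers side by side, here is a sketch under the same assumptions as before: the temporary path and the two sample rows are made up to stand in for the real books.csv.

```python
import csv
import os
import tempfile

# Assumption: a two-book stand-in for books.csv, written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "books.csv")
with open(path, "w", newline="") as out:
    out.write("title,author\n")
    out.write("Goodnight Moon,Margaret Wise Brown\n")
    out.write("Winnie the Pooh,A.A. Milne\n")

# csv.reader: every row is a list, and the column names are just another row.
first_column = []
with open(path, newline="") as file:
    for row in csv.reader(file):
        first_column.append(row[0])  # we have to remember that index 0 is the title

# csv.DictReader: every row is a dictionary keyed by the column names.
books = []
with open(path, newline="") as file:
    for book in csv.DictReader(file):
        books.append(book)
titles = [book["title"] for book in books]

print(first_column)
print(titles)
```

With `csv.reader` the word title shows up as the first entry, because the header line is treated like any other row, while `csv.DictReader` uses that line for its keys and starts at the first book.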
00:50:06.230 --> 00:50:07.980 So for book in file reader, I can instead 00:50:07.980 --> 00:50:12.130 print, in this case, the book's title. 00:50:12.130 --> 00:50:16.440 Oops, the book title. 00:50:16.440 --> 00:50:17.700 Python reads.py. 00:50:17.700 --> 00:50:20.860 And now, I see all of this. 00:50:20.860 --> 00:50:22.930 And DictReader also knows, if you're curious, 00:50:22.930 --> 00:50:26.650 not to print out the actual first row because it assumes that these are going 00:50:26.650 --> 00:50:29.350 to be the key names, unlike reader, which 00:50:29.350 --> 00:50:34.270 does not make that same assumption, OK? 00:50:34.270 --> 00:50:38.440 So having done this tour of these libraries and these modules 00:50:38.440 --> 00:50:40.652 and how to read in different pieces of data, 00:50:40.652 --> 00:50:42.610 this is really going to give you a lot of tools 00:50:42.610 --> 00:50:44.350 to use on this week's problem set. 00:50:44.350 --> 00:50:48.040 You'll be actually working with these very similar files, CSVs, 00:50:48.040 --> 00:50:49.625 or even just textual files. 00:50:49.625 --> 00:50:52.750 And as you go through, it's good to keep in mind all of what we learned right here. 00:50:52.750 --> 00:50:56.780 How to open a file, how to read it in using DictReader, and so on. 00:50:56.780 --> 00:51:00.580 And as you go off into this week, feel free to use all this stuff 00:51:00.580 --> 00:51:02.028 from the section. 00:51:02.028 --> 00:51:03.320 Thank you all for coming today. 00:51:03.320 --> 00:51:04.195 Wonderful to see you. 00:51:04.195 --> 00:51:06.500 We'll see you next week.