[CROWD MURMURING] [MUSIC PLAYING] DAVID MALAN: All right, this is CS50's Introduction to Programming with Python. My name is David Malan, and this is our week on File I/O, Input and Output of files. So up until now, most every program we've written just stores all the information that it collects in memory-- that is, in variables or inside of the program itself, a downside of which is that, as soon as the program exits, anything you typed in, anything that you did with that program is lost. Now, with files, of course, on your Mac or PC, you can hang on to information long term. And File I/O within the context of programming is all about writing code that can read from, that is load information from, or write to, that is save information to, files themselves. So let's see if we can't transition then from only using memory and variables and the like to actually writing code that saves some files for us and, therefore, data persistently. Well, to do this, let me propose that we first consider a familiar data structure, a familiar type of variable that we've seen before, that of a list. And using lists, we've been able to store more than one piece of information in the past. Using one variable, we typically store one value. But if that variable is a list, we can store multiple values. Unfortunately, lists are stored in the computer's memory. And so once your program exits, even the contents of those disappear. But let's at least give ourselves a starting point. So I'm over here in VS Code. And I'm going to go ahead and create a simple program using code of names.py, a program that just collects people's names, students' names, if you will. And I'm going to do it super simply initially in a manner consistent with what we've done in the past to get user input and print it back out. I'm going to say something like this, name equals input, quote/unquote, what's your name? Thereby storing in a variable called name the return value of input, as always. 
And as always, I'm going to go ahead and very simply print out a nice f string that says, hello, comma, and then, in curly braces, name to print out Hello, David, hello, world, whoever happens to be using the program. Let me go ahead and run this just to remind myself what I should expect. And if I run python of names.py and hit Enter, type in my name like David, of course, I now see Hello, comma, David. Suppose, though, that we wanted to add support not just for one name, but multiple names-- maybe three names for the sake of discussion so that we can begin to accumulate some amount of information in the program, such that it's really going to be a downside if we keep throwing it away once the program exits. Well, let me go back into names.py up here at top. Let me proactively give myself a variable, this time called names, plural. And set it equal to an empty list. Recall that the square bracket notation, especially if nothing's inside of it, just means, give me an empty list that we can add things to over time. Well, what do we want to add to it? Well, let's add three names, each from the user. And let me say something like this, for underscore in range of 3, let me go ahead and prompt the user with the input function and getting their name in this variable. And then using list syntax, I can say, names.append name to that list. And now I have, in that list, that given name-- 1, 2, 3 of them. Other points to note is, I could use a variable here, like i, which is conventional. But if I'm not actually using i explicitly on any subsequent lines, I might as well just use underscore, which is a Pythonic convention. And actually, if I want to clean this up a little bit right now, notice that my name variable doesn't really need to exist because I'm assigning it a value and then immediately appending it. Well, I could tighten this up further by just getting rid of that variable altogether and just appending immediately the return value of input. 
I think we could go both ways in terms of design here. On the one hand, it's a pretty short line, and it's readable. On the other hand, if I were to eventually change this phrase to be not what's your name but something longer, we might want to break it out again into two lines. But for now, I think it's pretty readable. Now later in the program, let's just go ahead and print out those same names, but let's sort them alphabetically so that it makes sense to be gathering them all together, then sorting them, and printing them. So how can I do that? Well, in Python, the simplest way to sort a list in a loop is probably to do something like this. For name in names-- but wait. Let's sort the names first. Recall that there's a function called sorted which will return a sorted version of that list. Now let's go ahead and print out an f string that says, again, hello, bracket, name, close quotes. All right, let me go ahead and run this. So Python of names.py, and let me go ahead and type in a few names this time. How about Hermione? How about Harry? How about Ron? And notice that they're not quite in alphabetical order. But when I hit Enter and that loop kicks in, it's going to print out, hello, Harry, hello, Hermione, hello, Ron, in sorted order. But of course, now, if I run this program again, all of the names are lost. And if this is a bigger program than this, that might actually be pretty painful to have to re-input the same information again, and again, and again. Wouldn't it be nice, like most any program today on a phone, or a laptop, or desktop, or cloud to be able to save this information somehow instead? And that's where File I/O comes in. And that's where files come in. They are a way of storing information persistently on your own phone, or Mac, or PC, or some cloud server's disk so that they're there when you come back and run the program again. 
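As a rough sketch of the names.py program described so far: the function names collect_names and greet_sorted are made up for this illustration and are not what was typed in the lecture. The demo at the bottom uses fixed names in place of typed input so that it runs without anyone at the keyboard.

```python
def collect_names(count=3):
    """Prompt for `count` names and return them as a list."""
    names = []
    for _ in range(count):
        # Append the return value of input directly, without a temporary variable
        names.append(input("What's your name? "))
    return names


def greet_sorted(names):
    """Return one greeting per name, in alphabetical order."""
    return [f"hello, {name}" for name in sorted(names)]


# Fixed names stand in for typed input so the sketch runs unattended;
# an interactive run would call greet_sorted(collect_names()) instead.
greetings = greet_sorted(["Hermione", "Harry", "Ron"])
for greeting in greetings:
    print(greeting)
```

Either way, the list lives only in memory, so the names are gone as soon as the program exits.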
So how can we go about saving all three of these names in a file as opposed to having to type them again and again? Let me go ahead and simplify this file and, again, give myself just a single variable called name, and set that variable equal to the return value of input. So what's your name, as before, quote/unquote. And now let me go ahead, and let me do something more with this value. Instead of just adding it to a list or printing it immediately out, let's save the value of the person's name that's just been typed in to a file. Well, how do we go about doing that? Well, in Python, there's this function called open whose purpose in life is to do just that, to open a file, but to open it up programmatically so that you, the programmer, can actually read information from it or write information to it. So open is like the programmer's equivalent of double clicking on an icon on your Mac or PC. But it's a programmer's technique because it's going to allow you to specify exactly what you want to read from or write to that file. Formally, its documentation is here, and you'll see that its usage is relatively straightforward. It minimally just requires the name of the file that we want to open and, optionally, how we want to open it. So let me go back to VS Code here, and let me propose now that I do this. I'm going to go ahead and call this function called open, passing in an argument for names.txt, which is the name of the file I would like to store all of these names in. I could call it anything I want. But because it's going to be just text, it's conventional to call it something.txt. But I'm also going to tell the open function that I plan to write to this file. So as a second argument to open, I'm going to put literally, quote/unquote, w, for Write, and that's going to tell open to open the file in a way that's going to allow me to change the content. And better yet, if it doesn't even exist yet, it's going to create the file for me. 
Now, open returns what's called a file handle, a special value that allows me to access that file subsequently. So I'm going to go ahead and assign it to a variable like file. And now I'm going to go ahead and, quite simply, write this person's name to that file. So I'm going to literally type file, which is the variable linking to that file, .write, which is a function otherwise known as a method that comes with open files that allows me to write that name to the file. And then lastly, I'm quite simply going to go ahead and say, file.close, which will close and effectively save the file. So these three lines of code here are essentially the programmer's equivalent to double clicking an icon on your Mac or PC, making some changes in Microsoft Word or some other program, and going to File, Save. We're doing that all in code with just these three lines here. Well, let's see, now, how this works. Let me go ahead now and run python of names.py and Enter. Let's type in a name. I'll type in Hermione, Enter. All right, where did she end up? Well, let me go ahead now and type code of names.txt, which is a file that happens now to exist because I opened it in write mode. And if I open this in a tab, we'll see there is Hermione. Well, let's go ahead and run names.py once more. I'm going to go ahead and run python of names.py, Enter, and this time, I'll type in Harry. Let me go ahead and run it one more time. And this time, I'll type in Ron. And now let me go up to names.txt, where, hopefully, I'll see all three of them here. But no. I've just actually seen Ron. What might explain what happened to Hermione and Harry, even though I'm pretty sure I ran the program three times, and I definitely wrote the code that writes their name to that file? What's going on here, do you think? AUDIENCE: I think because we're not appending them, we should append the names. 
Since we are writing directly, it is erasing the old content, and it is replacing with the last set of characters that we mentioned. DAVID MALAN: Exactly. Unfortunately, quote/unquote w is a little dangerous. Not only will it create the file for you, it will also recreate the file for you every time you open the file in that mode. So if you open the file once and write Hermione, that worked just fine, as we saw. But if you do it again for Harry, if you do it again for Ron, the code is working. But each time, it's opening the file and recreating it with brand-new contents, so we had one version with Hermione, and one version with Harry, and one final version with Ron. But ideally, I think we probably want to be appending, as Vishal says, each of those names to the file, not just clobbering-- that is, overwriting the file each time. So how can I do this? It's actually a relatively easy fix. Let me go ahead and do this as follows. I'm going to first remove the old version of names.txt. And now I'm going to change my code to do this. I'm going to change the w, quote/unquote, to just a, quote/unquote-- a for Append, which means to add to the bottom, to the bottom, to the bottom, again and again. Now let me go ahead and rerun python of names.py, Enter. I'll again start from scratch with Hermione because I'm creating the file new. Notice that if I now do code of names.txt, Enter, we do see that Hermione is back. So after removing the file, it did get recreated, even though I'm using append, which is good. But now let's see what happens when I go back to my terminal. And this time, I run python of names.py again-- this time, typing in Harry. And let me run it one more time-- this time, typing in Ron. So hopefully, this time, in that second tab, names.txt, I should now see all three of them. But, but, but, but this doesn't look ideal. What have I clearly done wrong? 
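To make the difference between the two modes concrete, here is a minimal sketch that simulates the three runs of names.py, first in "w" mode and then in "a" mode. The helper names save_name and read_all are invented for this illustration, and the file is placed in a temporary folder (an assumption made so the sketch cleans up after itself) rather than next to the program.

```python
import os
import shutil
import tempfile


def save_name(path, name, mode):
    """Open, write, close: the same three steps as in names.py."""
    file = open(path, mode)  # "w" recreates the file; "a" adds to the end
    file.write(name)
    file.close()


def read_all(path):
    """Return the whole file as one string."""
    file = open(path)
    contents = file.read()
    file.close()
    return contents


folder = tempfile.mkdtemp()
path = os.path.join(folder, "names.txt")

# Three "runs" in write mode: each one clobbers the previous contents
for name in ["Hermione", "Harry", "Ron"]:
    save_name(path, name, "w")
after_write = read_all(path)

# Remove the old file, then three "runs" in append mode
os.remove(path)
for name in ["Hermione", "Harry", "Ron"]:
    save_name(path, name, "a")
after_append = read_all(path)

shutil.rmtree(folder)
print(after_write)   # Ron
print(after_append)  # HermioneHarryRon
```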
Something tells me, even though all three names are there, it's not going to be easy to read those back unless you know where each name ends and begins. AUDIENCE: The English format is not correct. The English format is not correct. It's incorrect. It's concatenating them. DAVID MALAN: It is. Well, it appears to be concatenating. But technically speaking, it's just appending to the file-- first Hermione, then Harry, then Ron. It has the effect of combining them back to back, but it's not concatenating, per se. It really is just appending. Let's go to another hand here. What really have I done wrong? Or equivalently, how might I fix? It would be nice if there were some kind of gaps between each of the names, so we could read them more cleanly. AUDIENCE: Hello. We should add a new line before we write new name. DAVID MALAN: Good. We want to add a new line ourselves. So whereas print by default, recall, always outputs, automatically, a line ending of backslash n. Unless we override it with the named parameter called end, write does not do that. Write takes you literally. And if you say write Hermione, that's it. You're getting the H through the e. If you say, write Harry, you get the H through the y. You don't get any extra new lines automatically. So if you want to have a new line at the end of each of these names, we've got to do that manually. So let me, again, close names.txt, and let me remove the current file. And let me go back up to my code here. And I can fix this in any number of ways, but I'm just going to go ahead and do this. I'm going to write out an f string that contains name and backslash n at the end. We could do this in different ways. We could manually print just the new line or some other technique, but I'm going to go ahead and use my f strings, as I'm in the habit of doing, and just print the name and the new line all at once. I'm going to go ahead now and down to my terminal window, run python of names.py again, Enter. We'll type in Hermione. 
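A small sketch of that fix, assuming the same open, write, close steps as before: the only change is that the f string now ends in backslash n. The function name append_name and the temporary folder are choices made for this illustration only.

```python
import os
import shutil
import tempfile


def append_name(path, name):
    """Append one name to the file, followed by a newline."""
    file = open(path, "a")
    file.write(f"{name}\n")  # write adds nothing on its own, so the \n is explicit
    file.close()


folder = tempfile.mkdtemp()
path = os.path.join(folder, "names.txt")

# Simulate running the program three times
for name in ["Hermione", "Harry", "Ron"]:
    append_name(path, name)

file = open(path)
contents = file.read()
file.close()
shutil.rmtree(folder)

# repr makes the line endings visible
print(repr(contents))  # 'Hermione\nHarry\nRon\n'
```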
I'm going to run it again, type in Harry. I'm going to run it again and this time, Ron. Now I'm going to run code of names.txt and open that file. And now it looks like the file is a bit cleaner. Indeed, I have each of the names on its own line as well as a line ending, which ensures that we can separate one from the other. Now, if I were writing code, I bet I could parse, that is, read the previous file by looking at differences between lowercase and uppercase letters. But that's going to get messy quickly. Generally speaking, when storing data long-term in a file, you should probably do it somehow cleanly, like doing one name at a time. Well, let's now go back, and I'll propose that this code is now working correctly, but we can design it a little bit better. It turns out that it's all too easy when writing code to sometimes forget to close files. And sometimes, this isn't necessarily a big deal. But sometimes, it can create problems. Files could get corrupted or accidentally deleted or the like, depending on what happens in your code. So it turns out that you don't strictly need to call close on the file yourself if you take another approach instead. More Pythonic when manipulating files is to do this, to introduce this other keyword called, quite simply, with that allows you to specify that, in this context, I want you to open and automatically close some file. So how do we use with? It simply looks like this. Let me go back to my code here. I've gotten rid of the close line. And I'm now just going to say this instead. Instead of saying, file equals open, I'm going to say, with open, then the same arguments as before, and somewhat curiously, I'm going to put the variable at the end of the line. Why? That's just the way this is done. You say, with, you call the function in question, and then you say as and specify the name of the variable that should be assigned the return value of open. 
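Here is a minimal sketch of the with form just described. The wrapper function append_name and the temporary folder are assumptions added for illustration; the with open(...) as file line itself is the part that matters. The closed attribute of a file object is used only to confirm that the file really was closed without calling close.

```python
import os
import tempfile


def append_name(path, name):
    """Append one name using with, so no explicit close is needed."""
    with open(path, "a") as file:
        file.write(f"{name}\n")
    # Past the indented block, the file has already been closed for us
    return file.closed


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")

    closed_flags = []
    for name in ["Hermione", "Harry", "Ron"]:
        closed_flags.append(append_name(path, name))

    with open(path) as file:
        contents = file.read()

print(closed_flags)
print(repr(contents))
```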
Then I'm going to go ahead and indent the line underneath so that the line of code that's writing the name is now in the context of this with statement, which just ensures that, automatically, if I had more code in this file down below no longer indented, the file would be automatically closed as soon as line 4 is done executing. So it doesn't change what has just happened, but it does automate the process of at least closing things for us just to ensure I don't forget and so that something doesn't go wrong. But suppose, now, that I wanted to read these names from the file. All I've done thus far is write code that writes names to the file. But let's assume, now, that we have all of these names in the file. And heck, let's go ahead and add one more. Let me go ahead and run this one more time-- python of names.py. And let's add in Draco to the mix. So now that we have all four of these names here, how might we want to read them back? Well, let me propose that we go into names.py now, or we could create another program altogether. But I'm going to keep reusing the same name just to keep us focused on this. And now I'm going to write code that reads an existing file with Hermione, Harry, Ron, and Draco together. And how do I do this? Well, it's similar in spirit. I'm going to start this time with with open, and then the first argument is going to be the name of the file that I want to open, as before. And I'm going to open it, this time, in read mode-- quote/unquote, r. And to read a file just means to load it, not to save it. And I'm going to name the return value file. And now I'm going to do this. And there's a number of ways I can do this, but one way to read all of the lines from the file at once would be this. Let me declare a variable called lines. Let me access that file and call a function or a method that comes with it called readlines. 
So if you read the documentation on File I/O in Python, you'll see that open files come with a special method whose purpose in life is to read all the lines from the file and return them to me as a list. So what this line 2 is doing is it's reading all of the lines from that file, storing them in a variable called lines. Now, suppose I want to iterate over all of those lines and print out each of those names. For line in lines, this is just a standard for loop in Python. Lines is a list. Line is the variable that will be automatically set to each of those lines. Let me go ahead and print out something like, oh, hello, comma, and then I'll print out the line itself. All right, so let me go to my terminal window, run python of names.py now-- I have not deleted names.txt, so it still contains all four of those names-- and hit Enter, and OK, it's not bad, but it's a little ugly here. What's going on? When I ran names.py, it's saying Hello to Hermione, to Harry, to Ron, to Draco. But there's these gaps now between the lines. What explains that symptom? If nothing else, it just looks ugly. AUDIENCE: It happens because in the text file, we have new line symbols in between those names, and the print always adds another new line at the end. So you use the same symbol twice. DAVID MALAN: Perfect. And here's a good example of a bug, a mistake in a program. But if you just think about those first principles, like, how does each of the lines of code that I'm using work? You should be able to reason, exactly as Ripal did there, that, all right, well, one of those new lines is coming from the file after each name. And then, of course, print, all of these weeks later, is still giving us for free that extra new line. So there's a couple possible solutions. I could certainly do this, which we've done in the past, and pass in a named argument to print, like end="". And that's fine. 
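The doubled line endings can be seen directly in a small sketch. It assumes a names.txt with the four names, created here in a temporary folder. To inspect what print produces, the output is sent to in-memory buffers with print's file parameter; that is a convenience for this illustration and not something done in the lecture.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:
        file.write("Hermione\nHarry\nRon\nDraco\n")

    with open(path, "r") as file:
        lines = file.readlines()  # a list of strings, line endings included

print(lines)

# Default print: the \n from the file plus the \n from print
doubled = io.StringIO()
for line in lines:
    print("hello,", line, file=doubled)

# With end="", only the \n from the file remains
single = io.StringIO()
for line in lines:
    print("hello,", line, end="", file=single)

print(repr(doubled.getvalue()))
print(repr(single.getvalue()))
```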
I would argue a little better than that might actually be to do this, to strip off of the end of the line the actual new line itself so that print is handling the printing of everything, the person's name as well as the new line. But you're just stripping off what is really just an implementation detail in the file. We chose to use new lines in my text file to separate one name from another. So arguably, it should be a little cleaner in terms of design to strip that off and then let print print out what is really just now a name. But that's ultimately a design decision. The effect is going to be exactly the same. Well, if I'm going to open this file and read all the lines and then iterate over all of those lines and print them each out, I could actually combine this into one thing because, right now, I'm doing twice as much work. I'm reading all of the lines, then I'm iterating over all of the lines just to print out each of them. Well, in Python, with files, you can actually do this. I'm going to erase almost all of these lines now, keeping only with statement at top. And inside of this with statement, I'm going to say this, for line in file, go ahead and print out, quote/unquote, hello, comma, and then line.rstrip. So I'm going to take the approach of stripping off the end of the line. But notice how elegant this is, so to speak. I've opened the file in line 1. And if I want to iterate over every line in the file, I don't have to very explicitly read all the lines, then iterate over all of the lines. I can combine this into one thought. In Python, you can simply say, for line in file, and that's going to have the effect of giving you a for loop that iterates over every line in the file, one at a time, and on each iteration, updating the value of this variable line to be Hermione, then Harry, then Ron, then Draco. So this, again, is one of the appealing aspects of Python is that it reads rather like English-- for line in file, print this. 
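A short sketch of that more compact version, looping over the file itself and stripping the line ending with rstrip. The function name greet_each is invented here, and it collects the greetings in a list (rather than printing inside the loop) only so the result is easy to inspect; the temporary names.txt is likewise an assumption of the sketch.

```python
import os
import tempfile


def greet_each(path):
    """Return a greeting for every line of the file, in file order."""
    greetings = []
    with open(path, "r") as file:
        for line in file:  # one line per iteration, no readlines needed
            greetings.append(f"hello, {line.rstrip()}")
    return greetings


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:
        file.write("Hermione\nHarry\nRon\nDraco\n")
    greetings = greet_each(path)

for greeting in greetings:
    print(greeting)
```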
It's a little more compact when written this way. Well, what if, though, I don't want quite this behavior? Because notice now, if I run python of names.py, it's correct. I'm seeing each of the names and each of the hellos, and there's no extra spaces in between. But just to be difficult, I'd really like us to be sorting these hellos. Really, I'd like to see Draco first, then Harry, then Hermione, then Ron, no matter what order they appear in the file. So I could go in, of course, to the file and manually change the file. But if that file is changing over time based on who is typing their name into the program, that's not really a good solution. In code, I should be able to load the file, no matter what it looks like, and just sort it all at once. Now, here is a reason to not do what I've just done. I can't iterate over each line in the file and print it out but sort everything in advance. Logically, if I'm looking at each line one at a time and printing it out, it's too late to sort. I really need to read all of the lines first without printing them, sort them, then print them. So we have to take a step back in order to add now this new feature. So how can I do this? Well, let me combine some ideas from before. Let me go ahead and start fresh with this. Let me give myself a list called names, and assign it an empty list, just so I have a variable in which to accumulate all of these lines. And now let me open the file with open, quote/unquote, names.txt. And it turns out, I can tighten this up a little bit. It turns out, if you're opening a file to read it, you don't need to specify, quote/unquote, r. That is the implicit default. So you can tighten things up by just saying, open names.txt. And you'll be able to read the file but not write it. I'm going to give myself a variable called file, as before. I am going to iterate over the file in the same way, for line in file. But instead of printing each line, I'm going to do this. 
I'm going to take my names list and append to it. And this is appending to a list in memory, not appending to the file itself. I'm going to go ahead and append the current line, but I'm going to strip off the new line at the end so that all I'm adding to this list is each of the students' names. Now I can use that familiar technique from before. Let me go outside of this with statement because now I've read the entire file, presumably. So by the time I'm done with lines 4 and 5, again, and again, and again, for each line in the file, I'm done with the file. It can close. I now have all of the students' names in this list variable. Let me do this. For name in, not just names, but the sorted names, using our Python function sorted, which does just that, and do print, quote/unquote, with an f string, hello, comma, and now I'll plug in bracket name. So now, what have I done? I'm creating a list at the beginning, just so I have a place to gather my data. I then, on lines 3 through 5, iterate over the file from top to bottom, reading in each line, one at a time, stripping off the new line and adding just the student's name to this list. And the reason I'm doing that is so that on line 7, I can sort all of those names, now that they're all in memory, and print them in order. I need to load them all into memory before I can sort them. Otherwise, I'd be printing them out prematurely, and Draco would end up last instead of first. So let me go ahead in my terminal window and run python of names.py now, and hit Enter. And there we go. The same list of four hellos, but now they're sorted. And this is a very common technique. When dealing with files and information more generally, if you want to change that data in some way, like sorting it, creating some kind of variable at the top of your program, like a list, adding or appending information to it just to collect it in one place, and then do something interesting with that collection, that list, is exactly what I've done here. 
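The read-everything-first, then sort, then print pattern might look like the following sketch. The function name load_names and the temporary names.txt are assumptions for the sake of a self-contained example; the loop body mirrors the steps described above.

```python
import os
import tempfile


def load_names(path):
    """Read every line into a list, stripping the trailing newline."""
    names = []
    with open(path) as file:  # "r" is the default mode
        for line in file:
            names.append(line.rstrip())  # appends to the list in memory, not the file
    return names


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:
        file.write("Hermione\nHarry\nRon\nDraco\n")
    names = load_names(path)

# The file is closed by now; sorting happens on the in-memory list
for name in sorted(names):
    print(f"hello, {name}")
```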
Now, I should note that if we just want to sort the file, we can actually do this even more simply in Python, particularly by not bothering with this names list, nor the second for loop. And let me go ahead and, instead, just do more simply this. Let me go ahead and tell Python that we want the file itself to be sorted using that same sorted function, but this time on the file itself. And then inside of that for loop, let's just go ahead and print right away our hello, comma, followed by the line itself, but still stripping off of the end of it any white space therein. If we go ahead and run this same program now with python of names.py and hit Enter, we get the same result. But of course, it's a lot more compact. But for the sake of discussion, let's assume that we do actually want to potentially make some changes to the data as we iterate over it. So let me undo those changes, leave things as is. Whereby now, we'll continue to accumulate all of the names first into a list, maybe do something to them, maybe forcing them to uppercase or lowercase or the like, and then sort and print out each item. Let me pause and see if there's any questions now on File I/O reading or writing or now accumulating all of these values in some list. AUDIENCE: Hi. Is there a way to sort the files-- instead if you want it from alphabetically from A to Z, is there a way to reverse it from Z to A. Is there a little extension that you can add to the end to do that? Or would you have to create a new function? DAVID MALAN: If you wanted to reverse the contents of the file? AUDIENCE: Yeah, so if you, instead of sorting them from A to Z in ascending order, if you wanted them in descending order, is there an extension for that? DAVID MALAN: There is, indeed. And as always, the documentation is your friend. 
So if the goal is to sort them, not in alphabetical order, which is the default, but maybe reverse alphabetical order, you can take a look, for instance, at the formal Python documentation there. And what you'll see is this summary. You'll see that the sorted function takes the first argument, generally known as an iterable. And something that's iterable means that you can iterate over it. That is you can loop over it one thing at a time. What the rest of this line here means is that you can specify a key, like, how you want to sort it, but more on that later. But this last named parameter here is reverse. And by default, per the documentation, it's false. It will not be reversed by default. But if we change that to true, I bet we can do that. So let me go back to VS Code here and do just that. Let me go ahead and pass in a second argument to sorted in addition to this iterable, which is my names list-- iterable, again, in the sense that it can be looped over. And let me pass in reverse=True, thereby overriding the default of false. Let me now run python of names.py. And now Ron's at the top, and Draco's at the bottom. So there, too, whenever you have a question like that moving forward, consider, what does the documentation say? And see if there's a germ of an idea there because, odds are, if you have some problem, odds are, some programmer before you has had the same question. Other thoughts? AUDIENCE: Can we limit the number or numbers of names? And the second question, can we find a specific name in list? DAVID MALAN: Really good question, can we limit the number of the names in the file? And can we find a specific one? We absolutely could. If we were to write code, we could, for instance, open the file first, count how many lines are already there, and then if there's too many already, we could just exit with sys.exit or some other message to indicate to the user that, sorry, the class is full. As for finding someone specifically, absolutely. 
You could imagine opening the file, iterating over it with a for loop again and again and then adding a conditional. Like, if the current line equals equals Harry, then we found the chosen one. And you can print something like that. So you can absolutely combine these ideas with previous ideas, like conditionals, to ask those same questions. How about one other question on File I/O? AUDIENCE: So I just thought about this function, like read all the lines. And it looks like it separates all the lines by this special character, backslash. But it looks like we don't need that character, and we always strip it. And it looks like some bad design of the function. Why wouldn't we just strip it inside this function? DAVID MALAN: A really good question. So we are, in my examples thus far, using rstrip to strip from the end of the line all of this white space. You might not want to do that. In this case, I am stripping it away because I know that each of those lines isn't some generic line of text. Each line really represents a name that I have put there myself. I'm using the new line just to separate one value from another. In other scenarios, you might very well want to keep that line ending because it's a very long series of text, or a paragraph, or something like that, where you want to keep it distinct from the others. But it's just a convention. We have to use something, presumably, to separate one chunk of text from another. There are other functions in Python that will, in fact, handle the removal of that white space for you. Readlines, though, does literally that. It reads all of the lines as is. Well, allow me to turn our attention back to where we left off here, which is just names, to propose that, with names.txt, we have an ability, it seems, to store each of these names pretty straightforwardly. But what if we wanted to keep track of other information as well? 
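The answers from this round of questions can be sketched in code as follows. None of these helper functions were written in the lecture; load_names, is_full, and contains are hypothetical names, and the limit of 4 is an arbitrary value picked for the demo. Only sorted with reverse=True and the equals-equals comparison come straight from the discussion.

```python
import os
import tempfile


def load_names(path):
    """Read every line into a list, stripping the trailing newline."""
    names = []
    with open(path) as file:
        for line in file:
            names.append(line.rstrip())
    return names


def is_full(path, limit):
    """Count the lines already in the file and compare against a limit."""
    count = 0
    with open(path) as file:
        for _ in file:
            count += 1
    # A real program might call sys.exit with a message when this is True
    return count >= limit


def contains(path, target):
    """Loop over the file with a conditional, looking for one name."""
    with open(path) as file:
        for line in file:
            if line.rstrip() == target:
                return True
    return False


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:
        file.write("Hermione\nHarry\nRon\nDraco\n")

    descending = sorted(load_names(path), reverse=True)
    full = is_full(path, 4)
    found_harry = contains(path, "Harry")
    found_neville = contains(path, "Neville")

print(descending)  # ['Ron', 'Hermione', 'Harry', 'Draco']
print(full, found_harry, found_neville)
```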
Suppose that we wanted to store information, including a student's name and their house at Hogwarts, be it Gryffindor, or Slytherin, or something else. Well, where do we go about putting that? Hermione lives in Gryffindor, so we could do something like this in our text file. Harry lives in Gryffindor, so we could do that. Ron lives in Gryffindor, so we could do that. And Draco lives in Slytherin, so we could do that. But I worry here-- but I worry now that we're mixing apples and oranges, so to speak. Some lines are names. Some lines are houses. So this probably isn't the best design, if only because it's confusing, or it's ambiguous. So maybe what we could do is adopt a convention. And indeed, this is, in fact, what a lot of programmers do. They change this file not to be names.txt, but instead, let me create a new file called names.csv. CSV stands for Comma-Separated Values. And it's a very common convention to store multiple pieces of information that are related in the same file. And so to do this, I'm going to separate each of these types of data, not with another new line, but simply with a comma. I'm going to keep each student on their own line, but I'm going to separate the information about each student using a comma instead. And so now we sort of have a two-dimensional file, if you will. Row by row, we have our students. But if you think of these commas as representing a column, even though it's not perfectly straight because of the lengths of these names, it's a little jagged. You can think of these commas as representing a column. And it turns out, these CSV files are very commonly used when you use something like Microsoft Excel, Apple Numbers, or Google Spreadsheets, and you want to export the data to share with someone else as a CSV file. Or conversely, if you want to import a CSV file into your preferred spreadsheet software, like Excel, or Numbers, or Google Spreadsheets, you can do that as well. 
So CSV is a very common, very simple text format that just separates values with commas and different types of values, ultimately, with new lines as well. Let me go ahead and run code of students.csv to create a brand-new file that's initially empty. And we'll add to it those same names but also some other information as well. So if I now have this new file, students.csv, inside of which is one column of names, so to speak, and one column of houses, how do I go about changing my code to read not just those names but also those names and houses so that they're not all on one line-- we somehow have access to both type of value separately? Well, let me go ahead and create a new program here called students.py. And in this program, let's go about reading, not a text file, per se, but a specific type of text file, a CSV, a Comma-Separated Values file. And to do this, I'm going to use similar code as before. I'm going to say with open, quote/unquote, students.csv. I'm not going to bother specifying, quote/unquote, r because, again, that's the default. But I'm going to give myself a variable name of file. And then in this file, I'm going to go ahead and do this. For line in file, as before, and now I have to be a bit clever here. Let me go back to students.csv, looking at this file, and it seems that on my loop on each iteration, I'm going to get access to the whole line of text. I'm not going to automatically get access to just Hermione or just Gryffindor. Recall that the loop is going to give me each full line of text. So logically, what would you propose that we do inside of a for loop that's reading a whole line of text at once, but we now want to get access to the individual values, like Hermione and Gryffindor, Harry and Gryffindor? How do we go about taking one line of text and gaining access to those individual values, do you think? Just instinctively, even if you're not sure what the name of the functions would be. 
AUDIENCE: You can access it as you would if you were using a dictionary, like using a key and value. DAVID MALAN: So ideally, we would access it using a key and value. But at this point in the story, all we have is this loop, and this loop is giving me one line of text at a time. I'm the programmer now. I have to solve this. There is no dictionary yet in question. How about another suggestion here? AUDIENCE: So you can somehow split the two words based on the comma? DAVID MALAN: Yeah, even if you're not quite sure what function is going to do this, intuitively, you want to take this whole line of text-- Hermione, comma, Gryffindor, Harry, comma, Gryffindor, and so forth-- and split that line into two pieces, if you will. And it turns out wonderfully, the function we'll use is actually called split that can split on any characters, but you can tell it what character to use. So I'm going to go back into students.py, and inside of this loop, I'm going to go ahead and do this. I'm going to take the current line. I'm going to remove the white space at the end, as always, using rstrip here. And then whatever the result of that is, I'm going to now call split and, quote/unquote, comma. So the split function or method comes with strings. Strs in Python-- any str has this method built-in. And if you pass in an argument, like a comma, what this split function will do is split that current string into 1, 2, 3, maybe more pieces by looking for that character again and again. Ultimately, split is going to return to us a list of all of the individual parts to the left and to the right of those commas. So I can give myself a variable called row here. And this is a common paradigm. When you know you're iterating over a file, specifically a CSV, it's common to think of each line of it as being a row and each of the values therein separated by commas as columns, so to speak. So I'm going to deliberately name my variable row, just to be consistent with that convention. 
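As a minimal sketch of the rstrip-then-split step just described, the snippet below works on a single hard-coded string rather than on a line read from students.csv. The sample text is an assumption made only so the example runs on its own.

```python
# One line as it would arrive from the file, trailing newline included.
# (Hard-coded here as an assumption; in students.py it comes from the loop.)
line = "Hermione,Gryffindor\n"

# rstrip() removes the trailing newline, then split(",") cuts the
# remaining text at each comma and returns the pieces as a list.
row = line.rstrip().split(",")

print(row)     # ['Hermione', 'Gryffindor']
print(row[0])  # Hermione
print(row[1])  # Gryffindor
```

Because split always returns a list, even a line with no comma at all would come back as a one-element list rather than raising an error.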
And now what do I want to print? Well, I'm going to go ahead and say this. Print, how about the following, an f string that starts with curly braces-- well, how do I get access to the first thing in that row? Well, the row is going to have how many parts? Two, because if I'm splitting on commas, and there's one comma per line, that's going to give me a left part and a right part, like Hermione and Gryffindor, Harry and Gryffindor. When I have a list like row, how do I get access to individual values? Well, I can do this. I can say, row, bracket, 0. And that's going to go to the first element of the list, which should hopefully be the student's name. Then after that, I'm going to say, is in, and I'm going to have another curly brace here for row, bracket, 1. And then I'm going to close my whole quote. So it looks a little cryptic at first glance. But most of this is just f string syntax with curly braces to plug in values. And what values am I plugging in? Well, row, again, is a list, and it has two elements, presumably-- Hermione in one and Gryffindor in the other, and so forth. So bracket 0 is the first element because, remember, we start indexing at 0 in Python. And 1 is going to be the second element. So let me go ahead and run this now and see what happens-- python of students.py, Enter. And we see Hermione is in Gryffindor. Harry's in Gryffindor. Ron is in Gryffindor. And Draco is in Slytherin. So we have now implemented our own code from scratch that actually parses, that is, reads and interprets a CSV file ultimately here. Now, let me pause to see if there's any questions. But we'll make this even easier to read in just a moment. Any questions on what we've just done here by splitting by comma? AUDIENCE: So my question is, can we edit any line of code any time we want? Or the only option that we have is to append the lines? Or let's say, we want to, let's say, change Harry's house to Slytherin or some other house. DAVID MALAN: Yeah, a really good question. 
What if you want to, in Python, change a line in the file and not just append to the end? You would have to implement that logic yourself. So for instance, you could imagine now opening the file and reading all of the contents in, then maybe iterating over each of those lines. And as soon as you see that the current name equals equals Harry, you could maybe change his house to Slytherin. And then it would be up to you, though, to write all of those changes back to the file. So in that case, you might want to, in simplest form, read the file once and let it close. Then open it again, but open for writing, and change the whole file. It's not really possible or easy to go in and change just part of the file, though you can do it. It's easier to actually read the whole file, make your changes in memory, then write the whole file out. But for larger files where that might be quite slow, you can be more clever than that. Well, let me propose now that we clean this up a little bit because I actually think this is a little cryptic to read-- row, bracket, 0, row, bracket, 1-- it's not that well-written at the moment, I would say. But it turns out that when you have a variable that's a list like row, you don't have to throw all of those variables into a list. You can actually unpack that whole sequence at once. That is to say, if you know that a function like split returns a list, but you know in advance that it's going to return two values in a list, the first and the second, you don't have to throw them all into a variable that itself is a list. You can actually unpack them simultaneously into two variables, doing name, comma, house. So this is a nice Python technique to not only create, but assign, automatically, in parallel, two variables at once, rather than just one. So this will have the effect of putting the name in the left, Hermione, and it will have the effect of putting Gryffindor the house in the right variable. And we now no longer have a row. 
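The unpacking idea can be sketched as follows. To keep the example self-contained, a list of strings called `lines` stands in for the open file; that list is an assumption for illustration, and the loop body is the part that matters.

```python
# Stand-in for the lines of students.csv (an assumption so this runs alone).
lines = [
    "Hermione,Gryffindor\n",
    "Harry,Gryffindor\n",
    "Ron,Gryffindor\n",
    "Draco,Slytherin\n",
]

for line in lines:
    # split returns a two-element list here, so both values can be
    # assigned at once instead of going through row[0] and row[1].
    name, house = line.rstrip().split(",")
    print(f"{name} is in {house}")
```

Unpacking only works when the number of variables on the left matches the number of items in the list, so this form assumes exactly one comma per line.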
We can now make our code a little more readable by now literally just saying name down here and, for instance, house down here. So just a little more readable, even though, functionally, the code now is exactly the same. All right, so this now works. And I'll confirm as much by just running it once more-- python of students.py, Enter. And we see that the text is as intended. But suppose, for the sake of discussion, that I'd like to sort this list of output. I'd like to say hello, again, to Draco first, then hello to Harry, then Hermione, then Ron. How can I go about doing this? Well, let's take some inspiration from the previous example, where we were only dealing with names and, instead, do it with these full phrases. So and so is in house. Well, let me go ahead and do this. I'm going to go ahead and start from scratch and give myself a list called students, equal to an empty list, initially. And then with open students.csv as file, I'm going to go ahead and say this-- for line in file. And then below this, I'm going to do exactly as before-- name, comma, house equals the current line, stripping off the white space at the end, splitting it on a comma-- so that's the exact same as before. But this time, before I go about printing the sentence, I'm going to store it temporarily in a list so that I can accumulate all of these sentences and then sort them later. So let me go ahead and do this. Students, which is my list, .append-- let me append the actual sentence I want to show on the screen-- so another f string. So name is in house, just as before. But notice, I'm not printing that sentence. I'm appending it to my list-- not a file, but to my list. Why am I doing this? Well, just because, as before, I want to do this. For student in the sorted students, I want to go ahead and print out student, like this. Well, let me go ahead and run python of students.py, and hit Enter now. And I think we'll see, indeed, Draco is now first. Harry is second. Hermione is third. 
And Ron is fourth. But this is arguably a little sloppy, right? It seems a little hackish that I'm constructing these sentences. And even though I technically want to sort by name, I'm technically sorting by these whole English sentences. So it's not wrong. It's achieving the intended result, but it's not really well designed because I'm just getting lucky that English is reading from left to right. And therefore, when I print this out, it's sorting properly. It would be better, really, to come up with a technique for sorting by the students' names, not by some English sentence that I've constructed here on line 6. So to achieve this, I'm going to need to make my life more complicated for a moment. And I'm going to need to collect information about each student before I bother assembling that sentence. So let me propose that we do this. Let me go ahead and undo these last few lines of code so that we currently have two variables, name and house, each of which has name and the student's house respectively. And we still have our global variable, students. But let me do this. Recall that Python supports dictionaries. And dictionaries are just collections of keys and values. So you can associate something with something else, like, a name with Hermione, like, a house with Gryffindor. That really is a dictionary. So let me do this. Let me temporarily create a dictionary that stores this association of name with house. Let me go ahead and do this. Let me say that the student here is going to be represented initially by an empty dictionary. And just like you can create an empty list with square brackets, you can create an empty dictionary with curly braces. So give me an empty dictionary that will soon have two keys, name and house. How do I do that? Well, I could do it this way-- student, open bracket, name equals the student's name that we got from the line. Student, bracket, house equals the house that we got from the line. 
And now I'm going to append to the students list-- plural-- that particular student. Now, why have I done this? I've admittedly made my code more complicated. It's more lines of code, but I've now collected all of the information I have about students while still keeping track-- what's a name, what's a house. The list, meanwhile, has all of the students' names and houses together. Now, why have I done this? Well, let me, for the moment, just do something simple. Let me do for student in students, and let me very simply now say, print the following f string, the current student with this name is in this current student's house. And now notice one detail. Inside of this f string, I'm using my curly braces, as always. I'm using, inside of those curly braces, the name of a variable, as always. But then I'm using not bracket 0 or 1 because these are dictionaries now, not list. But why am I using single quotes to surround house and to surround name? Why single quotes inside of this f string to access those keys? AUDIENCE: Yes, because you have double quotes in that line 12. And so you have to tell Python to differentiate. DAVID MALAN: Exactly, because I'm already using double quotes outside of the f string, if I want to put quotes around any strings on the inside, which I do need to do for dictionaries because, recall, when you index into a dictionary, you don't use numbers like lists-- 0, 1, 2, onward-- you, instead, use strings, which need to be quoted. But if you're already using double quotes, it's easiest to then use single quotes on the inside, so Python doesn't get confused about what lines up with what. So at the moment, when I run this program, it's going to print out those hellos. But they're not yet sorted. In fact, what I now have is a list of dictionaries, and nothing is yet sorted. But let me tighten up the code too to point out that it doesn't need to be quite as verbose. 
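Here is a small sketch of the list-of-dictionaries approach and the quoting inside the f string. The `rows` list of name and house pairs is an assumption standing in for the values parsed from the file.

```python
# Stand-in for the (name, house) pairs parsed from students.csv.
rows = [
    ("Hermione", "Gryffindor"),
    ("Harry", "Gryffindor"),
    ("Ron", "Gryffindor"),
    ("Draco", "Slytherin"),
]

students = []
for name, house in rows:
    student = {}               # start with an empty dictionary
    student["name"] = name     # add the two keys one at a time
    student["house"] = house
    students.append(student)

for student in students:
    # Double quotes delimit the f string, so the dictionary keys
    # inside the curly braces use single quotes.
    print(f"{student['name']} is in {student['house']}")
```

The output is in file order, not alphabetical order, since nothing has sorted the list yet.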
If you're in the habit of creating an empty dictionary, like this on line 6, and then immediately putting in two keys, name and house, each with two values, name and house respectively, you can actually do this all at once. So let me show you a slightly different syntax. I can do this. Give me a variable called student, and let me use curly braces on the right-hand side here. But instead of leaving them empty, let's just define those keys and those values now. Quote/unquote name will be name, and quote/unquote house will be house. This achieves the exact same effect in one line instead of three. It creates a new non-empty dictionary containing a name key, the value of which is the student's name, and a house key, the value of which is the student's house. Nothing else needs to change. That will still just work so that if I, again, run python of students.py, I'm still seeing those greetings, but they're still not quite actually sorted. Well, what might I go about doing here in order to-- what could I do to improve upon this further? Well, we need some mechanism now of sorting those students. But unfortunately, you can't do this. We can't sort all of the students now because those students are not names like they were before. They aren't sentences like they were before. Each of the students is a dictionary, and it's not obvious how you would sort a dictionary inside of a list. So ideally, what do we want to do? If at the moment we hit line 9, we have a list of all of these students, and inside of that list is one dictionary per student, and each of those dictionaries has two keys, name and house, wouldn't it be nice if there were a way in code to tell Python, sort this list by looking at this key in each dictionary? Because that would give us the ability to sort either by name, or even by house, or even by any other field that we add to that file. So it turns out, we can do this. We can tell the sorted function not just to reverse things or not. 
It takes another positional-- it takes another named parameter called key, where you can specify what key should be used in order to sort some list of dictionaries. And I'm going to propose that we do this. I'm going to first define a function-- temporarily, for now-- called get_name. And this function's purpose in life, given a student, is to, quite simply, return the student's name from that particular dictionary. So if student is a dictionary, this is going to return literally the student's name, and that's it. That's the sole purpose of this function in life. What do I now want to do? Well now that I have a function that, given a student, will return to me the student's name, I can do this. I can change sorted to say, use a key that's equal to whatever the return value of get_name is. And this now is a feature of Python. Python allows you to pass functions as arguments into other functions. So get_name is a function. Sorted is a function. And I'm passing in get_name to sorted as the value of that key parameter. Now, why am I doing that? Well, if you think of the get_name function, it's just a block of code that will get the name of a student. That's handy because that's the capability that sorted needs. When given a list of students, each of which is a dictionary, sorted needs to know, how do I get the name of the student? In order to do alphabetical sorting for you. The authors of Python didn't know that we were going to be creating students here in this class, so they couldn't have anticipated writing code in advance that specifically sorts on a field called student, let alone called name, let alone house. So what did they do? They instead built into the sorted function this named parameter key that allows us, all these years later, to tell their function sorted how to sort this list of dictionaries. So now watch what happens. If I run python of students.py and hit Enter, I now have a sorted list of output. Why? 
Because now that list of dictionaries has all been sorted by the student's name. I can further do this. If, as before, we want to reverse the whole thing by saying reverse equals true, we can do that too. Let me rerun Python of students.py, and hit Enter. Now it's reversed. Now it's Ron, then Hermione, Harry, and Draco. But we can do something different as well. What if I want to sort, for instance, by house name reversed? I could do this. I could change this function from get_name to get_house. I could change the implementation up here to be get_house. And I can return not the student's name but the student's house. And so now notice, if I run python of students.py, Enter, notice now it is sorted by house in reverse order. Slytherin is first, and then Gryffindor. If I get rid of the reverse but keep the get_house and rerun this program, now it's sorted by house. Gryffindor is first, and Slytherin is last. And the upside now of this is, because I'm using this list of dictionaries and keeping the students data together until the last minute when I'm finally doing the printing, I now have full control over the information itself, and I can sort by this or that. I don't have to construct those sentences in advance, like I rather hackishly did the first time. All right, that was a lot. Let me pause here to see if there are questions. AUDIENCE: So when we are sorting the files, every time, should we use the loops, or a text dictionary, or any kind of list? Can we sort by just sorting, not looping or any kind of stuff? DAVID MALAN: A good question, and the short answer is, with Python alone, you're the programmer. You need to do the sorting. With libraries and other techniques, absolutely. You can do more of this automatically because someone else has written that code. What we're doing at the moment is doing everything from scratch ourselves. But absolutely, with other functions or libraries, some of this could be made easier. 
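The sorting described above, by name, by name in reverse, and by house, can be sketched like this. The `students` list is written out by hand here as an assumption, in place of the one built from the file, and the result variables (`by_name` and so on) are names chosen for illustration.

```python
# Hand-written stand-in for the list built from students.csv.
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]


def get_name(student):
    """Return the value sorted should compare when sorting by name."""
    return student["name"]


def get_house(student):
    """Return the value sorted should compare when sorting by house."""
    return student["house"]


# The functions are passed by name, without parentheses, so that
# sorted can call them itself on each dictionary.
by_name = sorted(students, key=get_name)
by_name_reversed = sorted(students, key=get_name, reverse=True)
by_house = sorted(students, key=get_house)

for student in by_name:
    print(f"{student['name']} is in {student['house']}")
```

Note that sorted returns a new list each time and leaves the original `students` list in its original order.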
Other questions on this technique here? AUDIENCE: If key is equal to the return value of the function, can it be equal to just a variable or a value? DAVID MALAN: Well, yes. It should equal a value. And I should clarify, actually, since this was not obvious. So when you pass in a function like get_name or get_house to the sorted function as the value of key, that function is automatically called by the sorted function for you on each of the dictionaries in the list. And it uses the return value of get_name or get_house to decide what strings to actually use to compare in order to decide which is alphabetically correct. So this function, which you pass just by name, you do not pass in parentheses at the end, is called by the sorted function in order to figure out for you how to compare these same values. AUDIENCE: How can we use nested dictionaries? I have read about nested dictionaries. What is the difference between nested dictionaries and the dictionary inside a list? I think it is that. DAVID MALAN: Sure. So we are using a list of dictionaries. Why? Because each of those dictionaries represents a student. And a student has a name and a house, and we want to, I claim, maintain that association. And it's a list of students because we've got multiple students-- four, in this case. You could create a structure that is a dictionary of dictionaries. But I would argue, it just doesn't solve a problem. I don't need a dictionary of dictionaries. I need a list of key-value pairs right now. That's all. So let me propose, if we go back to students.py here, and we revert back to the approach where we have get_name as the function, both used and defined here, and that function returns the student's name, what happens, to be clear, is that the sorted function will use the value of key-- get_name, in this case-- calling that function on every dictionary in the list that it's supposed to sort. 
And that function, get_name, returns the string that sorted will actually use to decide whether things go in this order, left-right, or in this order, right-left. It alphabetizes these things based on that return value. So notice that I'm not calling the function get_name here with parentheses. I'm passing it in only by its name so that the sorted function can call that get name function for me. Now, it turns out, as always, if you're defining something, be it a variable or, in this case, a function, and then immediately using it but never, once again, needing the name of that function, like, get_name, we can actually tighten this code up further. I can actually do this. I can get rid of the get_name function all together, just like I could get rid of a variable that isn't strictly necessary. And instead of passing key, the name of a function, I can actually pass key what's called a lambda function, which is an anonymous function, a function that just has no name. Why? Because you don't need to give it a name if you're only going to call it in one place. And the syntax for this in Python is a little weird. But if I do key equals literally the word lambda, then something like student, which is the name of the parameter I expect this function to take, and then I don't even type the Return key. I instead just say, student, bracket, name. So what am I doing here with my code? This code here that I've highlighted is equivalent to the get_name function I implemented a moment ago. The syntax is admittedly a little different. I don't use def. I didn't even give it a name, like get_name. I, instead, am using this other keyword in Python called lambda, which says, hey, Python, here comes a function, but it has no name. It's anonymous. That function takes a parameter. I could call it anything I want. I'm calling it student. Why? Because this function that's passed in as key is called on every one of the students in that list, every one of the dictionaries in that list. 
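A minimal sketch of the lambda form follows, using a hand-written `students` list as an assumption in place of the one read from the file.

```python
# Hand-written stand-in for the list built from students.csv.
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]

# lambda student: student["name"] is an unnamed function that takes one
# parameter, student, and returns student["name"]. There is no def and
# no return keyword; the expression after the colon is the return value.
by_name = sorted(students, key=lambda student: student["name"])

for student in by_name:
    print(f"{student['name']} is in {student['house']}")
```

This behaves the same as passing a separately defined get_name function; the only difference is that the function is written inline at the one place it is used.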
What do I want this anonymous function to return? Well given a student, I want to index into that dictionary and access their name so that the string Hermione, and Harry, and Ron, and Draco is ultimately returned. And that's what the sorted function uses to decide how to sort these bigger dictionaries that have other keys, like house, as well. So if I now go back to my terminal window and run python of students.py, it still seems to work the same, but it's arguably a little better design because I didn't waste lines of code by defining some other function, calling it in one and only one place. I've done it all sort of in one breath, if you will. All right, let me pause here to see if there's any questions specifically about lambda, or anonymous functions, and this tightening up of the code. AUDIENCE: I have a question, like whether we could define lambda twice. DAVID MALAN: You can use lambda twice. You can create as many anonymous functions as you'd like. And you generally use them in contexts like this, where you want to pass to some other function a function that itself does not need a name. So you can absolutely use it in more than one place. I just have only one use case for it. How about one other question on lambda or anonymous functions specifically? AUDIENCE: What if our lambda would take more than one line, for example? DAVID MALAN: Sure, if your lambda function takes multiple parameters, that is fine. You can simply specify commas followed by the names of those parameters, maybe x and y or so forth, after the name student. So here too, lambda looks a little different from def in that you don't have parentheses, you don't have the keyword def, you don't have a function name. But ultimately, they achieve that same effect. They create a function anonymously and allow you to pass it in, for instance, as some value here. So let's now change students.csv to contain not students' houses at Hogwarts, but their homes where they grew up. 
So Draco, for instance, grew up in Malfoy Manor. Ron grew up in The Burrow. Harry grew up in Number Four, Privet Drive. And according to the internet, no one knows where Hermione grew up. The movies apparently took certain liberties with where she grew up. So for this purpose, we're actually going to remove Hermione because it is unknown exactly where she was born. So we still have some three students. But if anyone can spot the potential problem now, how might this be a bad thing? Well, let's go and try and run our own code here. Let me go back to students.py here. And let me propose that I just change my semantics because I'm now not thinking about Hogwarts houses but the students' own homes. So I'm just going to change some variables. I'm going to change this house to a home, this house to a home, as well as this one here. I'm still going to sort the students by name, but I'm going to say that they're not in a house, but rather, from a home. So I've just changed the names of my variables and my grammar in English here, ultimately, to print out that, for instance, Harry is from Number Four, Privet Drive, and so forth. But let's see what happens here when I run Python of this version of students.py, having changed students.csv to contain those homes and not houses. Enter. Huh, our first value error, like the program just doesn't work. What might explain this value error? The explanation of which rather cryptically is, too many values to unpack. And the line in question is this one involving split. How did, all of a sudden, after all of these successful runs of this program, did line 5 suddenly now break? AUDIENCE: In the line in students.csv, you have three values. There's a line that you have three values and in students. DAVID MALAN: Yeah, I spent a lot of time trying to figure out where every student should be from so that we could create this problem for us. And wonderfully, like, the first sentence of the book is Number Four, Privet Drive. 
And so the fact that address has a comma in it is problematic. Why? Because you and I decided sometime ago to just standardize on commas-- CSV, Comma-Separated Values-- to denote the-- we standardized on commas in order to delineate one value from another. And if we have commas grammatically in the student's home, we're clearly confusing it as this special symbol. And the split function is now, for just Harry, trying to split it into three values, not just two. And that's why there's too many values to unpack because we're only trying to assign two variables, name and house. Now, what could we do here? Well, we could just change our approach, for instance. One paradigm that is not uncommon is to use something a little less common, like a vertical bar. So I could go in and change all of my commas to vertical bars. That, too, could eventually come back to bite us in that if my file eventually has vertical bars somewhere, it might still break. So maybe that's not the best approach. I could maybe do something like this. I could escape the data, as I've done in the past. And maybe I could put quotes around any English string that itself contains a comma. And that's fine. I could do that, but then my code, students.py, is going to have to change too because I can't just naively split on a comma now. I'm going to have to be smarter about it. I'm going to have to take into account split only on the commas that are not inside of quotes. And oh, it's getting complicated fast. And at this point, you need to take a step back and consider, you know what, if we're having this problem, odds are, many other people before us have had this same problem. It is incredibly common to store data in files. It is incredibly common to use CSV files specifically. And so you know what. Why don't we see if there's a library in Python that exists to read and/or write CSV files? 
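The failure can be reproduced in isolation with a sketch like the one below. The line for Harry is hard-coded as an assumption, and the try/except is there only so the example prints the error message instead of stopping.

```python
# Harry's line as it might appear in students.csv (hard-coded assumption).
line = "Harry,Number Four, Privet Drive\n"

# Splitting naively on commas yields three pieces, not two, because the
# address itself contains a comma.
parts = line.rstrip().split(",")
print(parts)  # ['Harry', 'Number Four', ' Privet Drive']

try:
    # Two variables on the left, three items on the right.
    name, home = parts
except ValueError as error:
    message = str(error)
    print(message)  # begins with "too many values to unpack"
```

Note also the leading space in the third piece: split cuts exactly at the comma and does nothing about the white space that follows it.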
Rather than reinvent the wheel, so to speak, let's see if we can write better code by standing on the shoulders of others who have come before us-- programmers past-- and actually use their code to do the reading and writing of CSVs, so we can focus on the part of our problem that you and I care about. So let's propose that we go back to our code here and see how we might use the CSV library. Indeed, within Python, there is a module called CSV. The documentation for it is at this URL here in Python's official documentation. But there's a few functions that are pretty readily accessible if we just dive right in. And let me propose that we do this. Let me go back to my code here. And instead of re-inventing this wheel and reading the file line by line, and splitting on commas, and dealing now with quotes, and Privet Drives, and so forth, let's do this instead. At the start of my program, let me go up and import the CSV module. Let's use this library that someone else has written that's dealing with all of these corner cases, if you will. I'm still going to give myself a list, initially empty, in which to store all these students. But I'm going to change my approach here now just a little bit. When I open this file with with, let me go in here and change this a little bit. I'm going to go in here now and say this. Reader equals csv.reader, passing in file as input. So it turns out, if you read the documentation for the CSV module, it comes with a function called reader whose purpose in life is to read a CSV file for you and figure out, where are the commas, where are the quotes, where are all the potential corner cases, and just deal with them for you. You can override certain defaults or assumptions in case you're using not a comma, but a pipe or something else. But by default, I think it's just going to work. Now, how do I iterate over a reader and not the raw file itself? It's almost the same. The library allows you still to do this. 
For each row in the reader-- so you're not iterating over the file directly now. You're iterating over the reader, which is, again, going to handle all of the parsing of commas, and new lines, and more. For each row in the reader, what am I going to do? Well, at the moment, I'm going to do this. I'm going to append to my students list the following dictionary, a dictionary that has a name whose value is the current row's first column, and whose house, or rather, home now is the row's second column. Now, it's worth noting that the reader for each line in the file, indeed, returns to me a row. But it returns to me a row that's a list, which is to say that the first element of that list is going to be the student's name, as before. The second element of that list is going to be the student's home, as before. But if I want to access each of those elements, remember that lists are 0 indexed. We start counting at 0 and then 1, rather than 1 and then 2. So if I want to get at the student's name, I use row, bracket, 0. And if I want to get at the student's home, I use row, bracket, 1. But in my for loop, we can do that same unpacking as before. If I know the CSV is only going to have two columns, I could even do this-- for name, home in reader. And now I don't need to use list notation. I can unpack things all at once and say, name here, and home here. The rest of my code can stay exactly the same because, what am I doing now on line 8? I'm still constructing the same dictionary as before, albeit for homes instead of houses. And I'm grabbing those values now, not from the file itself and my use of split, but the reader. And again, what the reader is going to do is figure out, where are those commas, where are the quotes? And just solve that problem for you. So let me go now down to my terminal window and run python of students.py, and hit Enter. And now we see successfully, sorted no less, that Draco is from Malfoy Manor. 
Harry is from Number Four, comma, Privet Drive. And Ron is from The Burrow. Questions now on this technique of using CSV reader from that CSV module, which, again, is just getting us out of the business of reading each line ourselves and reading each of those commas and splitting? AUDIENCE: So my questions are related to something in the past. I recognize that you are reading a file every time-- well, we assume that we have the CSV file to hand already in this case. Is it possible to make a file readable and writable? So in this case, you could write such stuff to the file, but then at the same time, you could have another function that reads through the file and does changes to it as you go along? DAVID MALAN: A really good question. And the short answer is, yes. However, historically, the mental model for a file is that of a cassette tape. Years ago, not really in use anymore, but cassette tapes are sequential whereby they start at the beginning, and if you want to get to the end, you kind of have to unwind the tape to get to that point. The closest analog nowadays would be something like Netflix or any streaming service, where there's a scrubber that you have to go left to right. You can't just jump there or jump there. You don't have random access. So the problem with files, if you want to read and write them, you or some library needs to keep track of where you are in the file so that if you're reading from the top and then you write at the bottom, and you want to start reading again, you seek back to the beginning. So it's not something we'll do here in class. It's more involved, but it's absolutely doable. For our purposes, we'll generally recommend, read the file. And then if you want to change it, write it back out, rather than trying to make more piecemeal changes, which is good, though, if the file is massive, and it would just be very expensive time-wise to change the whole thing. Other questions on this CSV reader? 
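To make the csv.reader technique described above concrete, here is a minimal sketch of what students.py might look like at this point. The sample rows, the temporary folder, and the newline="" argument (which the csv documentation recommends) are assumptions added so the sketch runs on its own; they are not dictated in the lecture.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for students.csv, written to a
# temporary folder so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w", newline="") as file:
    file.write('Harry,"Number Four, Privet Drive"\n')
    file.write("Ron,The Burrow\n")
    file.write("Draco,Malfoy Manor\n")

students = []

# newline="" is what the csv documentation recommends when opening CSV files.
with open(path, newline="") as file:
    reader = csv.reader(file)
    # Each row is a list such as ["Harry", "Number Four, Privet Drive"],
    # so it can be unpacked straight into two variables.
    for name, home in reader:
        students.append({"name": name, "home": home})

for student in sorted(students, key=lambda student: student["name"]):
    print(f"{student['name']} is from {student['home']}")
```

The quoted comma in Harry's home stays inside a single field, which is the corner case that splitting on commas by hand would get wrong.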
AUDIENCE: It's possible to write a paragraph in that file? DAVID MALAN: Absolutely. Right now, I'm writing very small strings, just names or houses, as I did before. But you can absolutely write as much text as you want, indeed. Other questions on CSV reader? AUDIENCE: Can a user choose a key himself? Like, input key will be a name or code. DAVID MALAN: So short answer, yes, we could absolutely write a program that prompts the user for a name and a home, a name and a home. And we could write out those values. And in a moment, we'll see how you can write to a CSV file. For now, I'm assuming, as the programmer who created students.csv, that I know what the columns are going to be. And therefore, I'm naming my variables accordingly. However, this is a good segue to one final feature of reading CSVs, which is that you don't have to rely on either getting a row as a list and using bracket 0 or bracket 1, and you don't have to unpack things manually in this way. We could actually be smarter and start storing the names of these columns in the CSV file itself. And in fact, if any of you have ever opened a spreadsheet file before, be it in Excel, Apple Numbers, Google Spreadsheets or the like, odds are, you've noticed that the first row, very frequently, is a little different. It actually is boldface sometimes, or it actually contains the names of those columns, the names of those attributes below. And we can do this here. In students.csv, I don't have to just keep assuming that the student's name is first and that the student's home is second. I can explicitly bake that information into the file just to reduce the probability of mistakes down the road. I can literally use the first row of this file and say, name, comma, home. So notice that name is not literally someone's name, and home is not literally someone's home. It is literally the words, name and home, separated by comma. 
And if I now go back into students.py and don't use CSV reader, but instead, I use a dictionary reader, I can actually treat my CSV file even more flexibly, not just for this, but for other examples too. Let me do this. Instead of using a CSV reader, let me use a CSV dict reader, which will now iterate over the file top to bottom, loading in each line of text not as a list of columns but as a dictionary of columns. What's nice about this is that it's going to give me automatic access now to those columns' names. I'm going to revert to just saying, for row in reader, and now I'm going to append a name and a home. But how am I going to get access to the current row's name and the current row's home? Well, earlier, I used bracket 0 for the first and bracket 1 for the second when I was using a reader. A reader returns lists. A dict reader or dictionary reader returns dictionaries, one at a time. And so if I want to access the current row's name, I can say, row, quote/unquote, name. I can say here for home, row, quote/unquote, home. And I now have access to those same values. The only change I had to make, to be clear, was in my CSV file, I had to include, on the very first row, little hints as to what these columns are. And if I now run this code, I think it should behave pretty much the same-- python of students.py. And indeed, we get the same sentences. But now my code is more robust against changes in this data. If I were to open the CSV file in Excel, or Google Spreadsheets, or Apple Numbers, and for whatever reason change the columns around, maybe this is a file that you're sharing with someone else, and just because, they decide to sort things differently left to right by moving the columns around, previously, my code would have broken because I was assuming that name is always first, and home is always second. But if I did this-- be it manually in one of those programs or here-- home, comma, name, and suppose, I reversed all of this. 
The home comes first, followed by Harry, The Burrow, then by Ron, and then lastly, Malfoy Manor, then Draco, notice that my file is now completely flipped. The first column is now the second, and the second's the first. But I took care to update the header of that file, the first row. Notice my Python code, I'm not going to touch it at all. I'm going to rerun python of students.py, and hit Enter. And it still just works. And this, too, is an example of coding defensively. What if someone changes your CSV file, your data file? Ideally, that won't happen. But even if it does now, because I'm using a dictionary reader that's going to infer from that first row for me what the columns are called, my code just keeps working. And so it keeps getting, if you will, better and better. Any questions now on this approach? AUDIENCE: Yeah, what is the importance of new line in the CSV file? DAVID MALAN: What's the importance of the new line in the CSV file? It's partly a convention. In the world of text files, we humans have just been, for decades, in the habit of storing data line by line. It's visually convenient. It's just easy to extract from the file because you just look for the new lines. So the new line just separates some data from some other data. We could use any other symbol on the keyboard, but it's just common to hit Enter to just move the data to the next line. Just a convention. Other questions? AUDIENCE: It seems to be working fine if you just have name and home. I'm wondering what will happen if you want to put in more data. Say, you wanted to add a house to both the name and the home. 
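The claim that a dictionary reader keeps working when the columns are swapped can be checked with a short sketch. The two in-memory strings below are hypothetical stand-ins for the two versions of students.csv, and the load function is a name introduced only for illustration.

```python
import csv
import io

# Two in-memory stand-ins for students.csv (hypothetical sample data):
# the same students, but with the columns in opposite order.
name_first = 'name,home\nHarry,"Number Four, Privet Drive"\nRon,The Burrow\n'
home_first = 'home,name\n"Number Four, Privet Drive",Harry\nThe Burrow,Ron\n'


def load(text):
    """Read CSV text whose first row names the columns."""
    students = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # row is a dictionary keyed by the header row, so column order
        # in the file no longer matters.
        students.append({"name": row["name"], "home": row["home"]})
    return students


# Both layouts produce the same list of students.
print(load(name_first) == load(home_first))  # prints True
```

The same code would also tolerate an extra column such as house, since it only asks each row for the keys it needs.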
DAVID MALAN: Sure, if you wanted to add the house back-- so if I go in here and add house last, and I go here and say, Gryffindor for Harry, Gryffindor for Ron, and Slytherin for Draco, now I have three columns, effectively, if you will-- home on the left, name in the middle, house on the right, each separated by commas with weird things, like Number Four, comma, Privet Drive still quoted. Notice, if I go back to students.py, and I don't change the code at all and run python of students.py, it still just works. And this is what's so powerful about a dictionary reader. It can change over time. It can have more and more columns. Your existing code is not going to break. Your code would break, would be much more fragile, so to speak, if you were making assumptions like, the first column's always going to be name. The second column is always going to be house. Things will break fast if those assumptions break down-- so not a problem in this case. Well, let me propose that, besides reading CSVs, let's at least take a peek at how we might write a CSV too. If you're writing a program in which you want to store not just students' names, but maybe their homes as well in a file, how can we keep adding to this file? Let me go ahead and delete the contents of students.csv and just re-add a single simple row, name, comma, home, so as to anticipate inserting more names and homes into this file. And then let me go to students.py, and let me just start fresh so as to write out data this time. I'm still going to go ahead and Import CSV. I'm going to go ahead now and prompt the user for their name-- so input, quote/unquote, What's your name? And I'm going to go ahead and prompt the user for their home-- so home equals input, quote/unquote, Where's your home? Now I'm going to go ahead and open the file, but this time for writing instead of reading, as follows-- with open, quote/unquote, students.csv. 
I'm going to open it in append mode so that I keep adding more and more students and homes to the file, rather than just overwriting the entire file itself. And I'm going to use a variable name of file. I'm then going to go ahead and give myself a variable called writer, and I'm going to set it equal to the return value of another function in the CSV module called csv.writer. And that writer function takes as its sole argument the file variable there. Now I'm going to go ahead and just do this. I'm going to say, writer.writerow, and I'm going to pass into writerow the line that I want to write to the file specifically as a list. So I'm going to give this a list of name, comma, home, which, of course, are the contents of those variables. Now I'm going to go ahead and save the file. I'm going to go ahead and rerun python of students.py, hit Enter. And what's your name? Well, let me go ahead and type in Harry as my name and Number Four, comma, Privet Drive, Enter. Now notice, that input itself did have a comma. And so if I go to my CSV file now, notice that it's automatically been quoted for me so that subsequent reads from this file don't confuse that comma with the actual comma between Harry and his home. Well, let me go ahead and run it a couple of more times. Let me go ahead and rerun python of students.py. Let me go ahead and input this time Ron and his home as The Burrow. Let's go back to students.csv to see what it looks like. Now we see Ron, comma, The Burrow has been added automatically to the file. And let's do one more-- python of students.py, Enter. Let's go ahead and give Draco's name and his home, which would be Malfoy Manor, Enter. And if we go back to students.csv, now, we see that Draco is in the file itself. And the library took care of not only writing each of those rows, per the function's name. It also handled the escaping, so to speak, of any strings that themselves contained a comma, like Harry's own home. 
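Here is a minimal sketch of the csv.writer version just described. To keep it runnable without typing at a prompt, it wraps the writing in a hypothetical add_student function and passes the names in directly instead of calling input; the temporary folder and newline="" are likewise assumptions of this sketch.

```python
import csv
import os
import tempfile


def add_student(path, name, home):
    """Append one student to the CSV file at path."""
    # "a" appends to the file instead of overwriting it; newline="" lets
    # the csv module control the line endings, as its documentation suggests.
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        # writerow takes the row as a list, in left-to-right column order.
        writer.writerow([name, home])


# Start from a file that holds only the header row, as in the lecture.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w", newline="") as file:
    file.write("name,home\n")

add_student(path, "Harry", "Number Four, Privet Drive")
add_student(path, "Ron", "The Burrow")

with open(path) as file:
    print(file.read())
```

In the lecture's version, the two values would instead come from input("What's your name? ") and input("Where's your home? "). Either way, the home containing a comma is quoted automatically when it is written.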
Well, it turns out, there's yet another way we could implement this same program without having to worry about precisely that order again and again and just passing in a list. It turns out, if we're keeping track of what's the name and what's the home, we could use something like a dictionary to associate those keys with those values. So let me go ahead and back up and remove these students from the file, leaving only the header row again-- name, comma, home. And let me go over to students.py. And this time, instead of using CSV writer, I'm going to go ahead and use csv.DictWriter, which is a dictionary writer, that's going to open the file in much the same way. But rather than write a row as this list of name, comma, home, what I'm now going to do is as follows. I'm going to first output an actual dictionary, the first key of which is name, colon, and then the value thereof is going to be the name that was typed in. And I'm going to pass in a key of home, quote/unquote, the value of which, of course, is the home that was typed in. But with DictWriter, I do need to give it a hint as to the order in which those columns are when writing it out so that, subsequently, they could be read, even if those orderings change. Let me go ahead and pass in fieldnames, which is a second argument to DictWriter, equals, and then a list of the actual columns that I know are in this file, which, of course, are name, comma, home. Those, too, are in quotes because those are, indeed, the string names of the columns, so to speak, that I intend to write to in that file. All right, now let me go ahead and go to my terminal window, run python of students.py. This time, I'll type in Harry's name again. I'll, again, type in Number Four, comma, Privet Drive, Enter. Let's now go back to students.csv. And voila, Harry is back in the file, and it's properly escaped or quoted. 
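The DictWriter approach can be sketched the same way. As an assumption for the sake of a runnable example, the values are passed to a hypothetical add_student function rather than read with input, and the file lives in a temporary folder.

```python
import csv
import os
import tempfile


def add_student(path, name, home):
    """Append one student to the CSV file at path using a dictionary."""
    with open(path, "a", newline="") as file:
        # fieldnames tells the writer which column comes first, second, ...
        writer = csv.DictWriter(file, fieldnames=["name", "home"])
        # The keys can appear in any order in the dictionary itself.
        writer.writerow({"home": home, "name": name})


# Start from a file that holds only the header row.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w", newline="") as file:
    file.write("name,home\n")

add_student(path, "Harry", "Number Four, Privet Drive")

with open(path) as file:
    print(file.read())
```

Even though home is listed before name in the dictionary, the row lands in the file as name first and home second, because fieldnames decides the column order.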
I'm sure that if we do this again with Ron and The Burrow, and let's go ahead and run it a third time with Draco and Malfoy Manor, Enter. Let's go back to students.csv. And via this dictionary writer, we now have all three of those students as well. So whereas with CSV writer, the onus is on us to pass in a list of all of the values that we want to put from left to right, with a dictionary writer, technically, they could be in any order in the dictionary. In fact, I could just as correctly have done this, passing in home followed by name. But it's a dictionary. And so the ordering in this case does not matter so long as the key is there and the value is there. And because I have passed in fieldnames as the second argument to DictWriter, it ensures that the library knows exactly which column contains name or home, respectively. Are there any questions now on dictionary reading, dictionary writing, or CSVs more generally? AUDIENCE: Is there any specific situation for me to use a single quotation or double quotation? Because after the print, we use single quotation to represent the key of the dictionary. But after the reading or writing, we use the double quotation. DAVID MALAN: It's a good question. In Python, you can generally use double quotes, or you can use single quotes. And it doesn't matter. You should just be self-consistent so that stylistically your code looks the same all throughout. Sometimes, though, it is necessary to alternate. If you're already using double quotes, as I was earlier for a long f string, but inside that f string, I was interpolating the values of some variables using curly braces, and those variables were dictionaries. And in order to index into a dictionary, you use square brackets and then quotes. But if you're already using double quotes out here, you should generally use single quotes here, or vice versa. But otherwise, I'm in the habit of using double quotes everywhere. Others are in the habit of using single quotes everywhere. 
It only matters sometimes if one might be confused for the other. Other questions on dictionary writing or reading? AUDIENCE: Yeah, my question is, can we use multiple CSV files in any program? DAVID MALAN: Absolutely. You can use as many CSV files as you want. And it's just one of the formats that you can use to save data. Other questions on CSVs or File I/O? AUDIENCE: Thanks for taking my question. So when you're reading from the file as a dictionary, you had the fields called. When you're reading, couldn't you just call the row? In the previous version of the students.py file, when you're reading each row, you were splitting out the fields by name. Yeah, so when you're appending to the students list, couldn't you just call for row in reader, students.append row, rather than naming each of the fields? DAVID MALAN: Oh, very clever. Short answer, yes, in so far as DictReader returns one dictionary at a time, when you loop over it, row is already going to be a dictionary. So yes, you could actually get away with doing this. And the effect would really be the same in this case. Good observation. How about one more question on CSVs? AUDIENCE: Yeah, when reading in CSVs from my past work with data, a lot of things can go wrong. I don't know if it's a fair question that you can answer in a few sentences. But are there any best practices to double check that no mistakes occurred? DAVID MALAN: It's a really good question. And I would say, in general, if you're using code to generate the CSVs and to read the CSVs, and you're using a good library, theoretically, nothing should go wrong. It should be 100% correct if the libraries are 100% correct. You and I tend to be the problem. When you let a human touch the CSV, or when Excel, or Apple Numbers, or some other tool is involved that might not be aligned with your code's expectations, things then, yes, can break. The goal-- sometimes, honestly, the solution is manual fixes. 
You go in and fix the CSV, or you have a lot of error checking, or you have a lot of try, except just to tolerate mistakes in the data. But generally, I would say, if you're using CSV or any file format internally to a program to both read and write it, you shouldn't have concerns there. You and I, the humans, are the problem, generally speaking-- and not the programmers, the users of those files, instead. All right, allow me to propose that we leave CSVs behind but to note that they're not the only file format you can use in order to read or write data. In fact, they're a popular format, as is just raw text files-- .txt files. But you can store data, really, any way that you want. We've just picked CSVs because it's representative of how you might read and write from a file and do so in a structured way, where you can somehow have multiple keys, multiple values all in the same file without having to resort to what would be otherwise known as a binary file. So a binary file is a file that's really just zeros and ones. And they can be laid out in any pattern you might want, particularly if you want to store not textual information, but maybe graphical, or audio, or video information as well. So it turns out that Python is really good when it comes to having libraries for, really, everything. And in fact, there's a popular library called pillow that allows you to navigate image files as well and to perform operations on image files. You can apply filters, a la Instagram. You can animate them as well. And so what I thought we'd do is leave behind text files for now and tackle one more demonstration, this time, focusing on this particular library and image files instead. So let me propose that we go over here to VS Code and create a program, ultimately, that creates an animated GIF. These things are everywhere nowadays in the form of memes, and animations, and stickers, and the like. 
And an animated GIF is really just an image file that has multiple images inside of it. And your computer or your phone shows you those images, one after another, sometimes on an endless loop, again and again. And so long as there's enough images, it creates the illusion of animation because your mind and mine kind of fills in the gaps visually and just assumes that if something is moving, even though you're only seeing one frame per second, or some sequence thereof, it looks like an animation. So it's like a simplistic version of a video file. Well, let me propose that we start with maybe a couple of costumes from another popular programming language. And let me go ahead and open up my first costume here, number 1. So suppose here that this is a costume or, really, just a static image here, costume1.gif. And it's just a static picture of a cat, no movement at all. Let me go ahead now and open up a second one, costume2.gif, that looks a little bit different. Notice-- and I'll go back and forth-- this cat's legs are a little bit aligned differently so that this was version 1, and this was version 2. Now, these cats come from a programming language from MIT called Scratch that allows you, very graphically, to animate all this and more. But we'll use just these two static images, costume1 and costume2 to create our own animated GIF that, after this, you could text to a friend or message them, much like any meme online. Well, let me propose that we create this animated GIF, not by just using some off-the-shelf program that we downloaded, but by writing our own code. Let me go ahead and run code of costumes.py and create our very own program that's going to take, as input, two or even more image files and then generate an animated GIF from them by essentially creating this animated GIF by toggling back and forth endlessly between those two images. Well, how am I going to do this? 
Well, let's assume that this will be a program called costumes.py that expects two command line arguments, the names of the files, the individual costumes that we want to animate back and forth. So to do that, I'm going to import sys so that we ultimately have access to sys.argv. I'm then, from this pillow library, going to import support for images specifically. So from PIL import Image-- capital I, as per the library's documentation. Now I'm going to give myself an empty list called images, just so I have a list in which to store one, or two, or more of these images. And now let me do this. For each argument in sys.argv, I'm going to go ahead and create a new image variable, set it equal to this Image.open function, passing in arg. Now, what is this doing? I'm proposing that, eventually, I want to be able to run python of costumes.py, and then as command line argument, specify costume1.gif, space, costume2.gif. So I want to take in those file names from the command line as my arguments. So what am I doing here? Well, I'm iterating over sys.argv all of the words in my command line arguments. I'm creating a variable called image, and I'm passing to this function, Image.open from the pillow library, that specific argument. And that library is essentially going to open that image in a way that gives me a lot of functionality for manipulating it, like animating. Now I'm going to go ahead and append to my images list that particular image. And that's it. So this loop's purpose in life is just to iterate over the command line arguments and open those images using this library. The last line is pretty straightforward. I'm going to say this. I'm going to grab the first of those images, which is going to be in my list at location 0, and I'm going to save it to disk. That is, I'm going to save this file. Now, in the past when we use CSVs or text files, I had to do the file opening. I had to do the file writing, maybe even the closing. 
I don't need to do that with this library. The pillow library takes care of the opening, the closing, and the saving for me by just calling save. I'm going to call this save function. And just to leave space, because I have a number of arguments to pass, I'm going to move to another line so it fits. I'm going to pass in the name of the file that I want to create, costumes.gif-- that will be the name of my animated GIF. I'm going to tell this library to save all of the frames that I pass to it-- so the first costume, the second costume, and even more if I gave them. I'm going to then append to this first image-- the images 0-- the following images, equals this list of images. And this is a bit clever, but I'm going to do this. I want to append the next image there, images[1]. And now I want to specify a duration of 200 milliseconds for each of these frames, and I want this to loop forever. And if you specify loop=0, that is time 0, it means it's just not going to loop a finite number of times, but an infinite number of times instead. And I need to do one other thing. Recall that sys.argv contains not just the words I typed after my program's name, but what else does sys.argv contain? If you think back to our discussion of command line arguments, what else is sys.argv besides the words I'm about to type, like costume1.gif and costume2? AUDIENCE: Yeah, so we'll actually get the original name of the program we want to run, the costumes.py. DAVID MALAN: Indeed, we'll get the original name of the program, costumes.py in this case, which is not a GIF, obviously. So remember that using slices in Python, we can do this. If sys.argv is a list, and we want to get a slice of that list, everything after the first element, we can do 1, colon, which says, start at location 1, not 0, and take a slice all the way to the end. So give me everything except the first thing in that list, which, to McKenzie's point, is the name of the program. 
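Put together, costumes.py as dictated above might look roughly like the sketch below. A few things here are assumptions rather than the lecture's exact code: the work is split into two hypothetical functions, frame_paths and make_gif; the import of the third-party Pillow library sits inside make_gif so the rest of the file loads without it; append_images uses the slice images[1:] instead of the single images[1], so that more than two costumes would also work; and the final check for at least two file names is an added safeguard.

```python
import sys


def frame_paths(argv):
    """Return the command-line arguments after the program's own name."""
    # argv[0] is the program's name (costumes.py), which is not an image,
    # so the slice starts at location 1 and runs to the end.
    return argv[1:]


def make_gif(paths, output="costumes.gif"):
    """Combine the image files in paths into one animated GIF."""
    # Pillow is a third-party library; it is imported under the name PIL.
    from PIL import Image

    images = []
    for path in paths:
        image = Image.open(path)
        images.append(image)

    # Save the first image and append the rest as additional frames,
    # showing each for 200 milliseconds and looping forever (loop=0).
    images[0].save(
        output,
        save_all=True,
        append_images=images[1:],
        duration=200,
        loop=0,
    )


if __name__ == "__main__":
    paths = frame_paths(sys.argv)
    if len(paths) >= 2:
        make_gif(paths)
```

Run as python costumes.py costume1.gif costume2.gif, this would write costumes.gif alongside the two costumes, assuming Pillow is installed and both files exist.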
Now, if I haven't made any mistakes, let's see what happens. I'm going to run python of costumes.py, and now I'm going to specify the two images that I want to animate-- so costume1.gif and costume2.gif. What is the code now going to do? Well, to recap, we're using the sys library to access those command line arguments. We're using the pillow library to treat those files as images and with all the functionality that comes with that library. I'm using this images list just to accumulate all of these images, one at a time from the command line. And in lines 7 through 9, I'm just using a loop to iterate over all of them and just add them to this list after opening them with the library. And the last step, which is really just one line of code broken onto three so that it all fits, I'm going to save the first image, but I'm asking the library to append this other image to it as well-- not bracket 0, but bracket 1. And if I had more, I could express those as well. I want to save all of these files together. I want to pause 200 milliseconds-- a fifth of a second in between each frame. And I want it to loop infinitely many times. So now if I cross my fingers as always, hit Enter, nothing bad happened, and that's almost always a good thing. Let me now run code of costumes.gif to open up in VS Code the final image. And what I think I should see is a very happy cat? And indeed. So now we've seen not only that we can read and write files, be it textually. We can read and now write files that are binary zeros and ones. We've just scratched the surface. This is using the library called pillow. But ultimately, this is going to give us the ability to read and write files however we want. So we've now seen that via File I/O, we can manipulate not just textual files, be it TXT files, or CSVs, but even binary files as well. In this case, they happen to be images. 
But if we dived in deeper, we could explore audio, and video, and so much more all by way of these simple primitives, this ability, somehow, to read and write files. That's it for now. We'll see you next time.