[MUSIC PLAYING] SPEAKER: Well, hello, one and all, and welcome to our short on reading and writing CSVs. Now, CSVs, files of comma-separated values, are great for storing data, and they're great for analyzing data, particularly when coupled with Python. So I have here a program called views.py, which is not implemented yet, but will be soon. Its goal is to read in some data from views.csv, the comma-separated values in this file here. And notice how I have, in this case, the names of each of the 36 views of Mount Fuji, a series of prints produced by the artist Hokusai in the 1830s. Now, I have here three columns of data, if you will: one called id, which identifies the print; one called English title, which is the English title of the print; and one called Japanese title, which is the Japanese title of the print. So notice here, if I open 1.jpeg, I actually have a picture of The Great Wave off Kanagawa. If I open 2.jpeg, I'll see Fine Wind, Clear Morning. So there's a correspondence between the ids in this file and the actual pictures, the .jpeg files I have stored in my codespace. So I'll go ahead and close these. Our goal is to read in this CSV of views and do a bit of image analysis on the prints. I think it would be interesting to find out how bright or dark each of these images is. In fact, we have a function here called calculate_brightness, which takes a file name as input and returns a number on a scale from 0 to 1 describing how bright the image is: 1 being brightest, completely white, and 0 being darkest, completely black. So let's start off by just reading whatever is inside this CSV file, and focus on the analysis a little bit later. Our first step, then, will be to open this CSV file, get a sense for the data inside it, and make sure we can actually see that data in Python, too.
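The short treats calculate_brightness as a ready-made helper and doesn't show its body. To make the 0-to-1 scale concrete, here is a minimal sketch of one plausible approach, assuming an image has already been reduced to a list of grayscale pixel values from 0 (black) to 255 (white). The function name mean_brightness and that pixel representation are assumptions for illustration, not the short's implementation.

```python
def mean_brightness(pixels):
    """Hypothetical stand-in for calculate_brightness.

    Assumes pixels is a non-empty sequence of grayscale values from
    0 (black) to 255 (white); returns their mean rescaled to 0.0 to 1.0.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    return sum(pixels) / (255 * len(pixels))


print(mean_brightness([255, 255, 255]))  # 1.0: completely white
print(mean_brightness([0, 0, 0]))        # 0.0: completely black
print(mean_brightness([0, 255]))         # 0.5: half black, half white
```

A real version would also need to load the image file and convert it to grayscale before averaging; that step is left out of this sketch.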
So to read a CSV file, we've seen we should use the csv library, so I'll import csv up top. And we've also seen that the csv library comes with an object called DictReader, which allows us to take a file and read each row of it as a dictionary. But before we can get there, we need to open our file. To do that, we can use the keyword with, which takes care of opening and closing our file automatically for us, thanks to the magic of Python. So I can type with open, and then give the name of the file I'm hoping to open, in this case, views.csv. Now, I only want to read the data inside views.csv. Reading means I don't want to change anything; I just want to see what's inside. So as the second argument to open, I'll pass "r", which stands for read mode. Next, I'll give a temporary name to the file I've opened. Here, I'm typing with open("views.csv", "r") as file, meaning file now refers to views.csv, opened for reading. Within this block of code that I've started with with, I can do all kinds of things while the file is open. One thing I could do, as we said before, is call csv.DictReader and pass file as input, giving myself a reader I can use to loop over every row in views.csv. And we've also seen I can type for row in reader, meaning I want to keep looping over every row this reader gives back to me until there are no more rows. And to make sure everything's working, I could type print(row), just like this. So what have I done? I've opened views.csv and called it file, and it stays open so long as we're inside this indented block, lines 8 through 10. I'm looping through every row I receive from the DictReader built into the csv library, and I'm printing each row.
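Here is a minimal, self-contained sketch of that reading step: a with block opens the file in read mode, csv.DictReader turns each line into a dictionary keyed by the header, and the file closes automatically when the block ends. Since the real views.csv isn't reproduced here, the sketch first writes a two-row stand-in to a temporary folder; its header names (id, english_title, japanese_title) are assumptions. The newline="" argument isn't used in the short; the csv documentation recommends it when opening CSV files.

```python
import csv
import os
import tempfile

# Illustrative stand-in for views.csv. The header names id, english_title,
# and japanese_title are assumptions; check the real file's first line.
SAMPLE_VIEWS = (
    "id,english_title,japanese_title\n"
    "1,The Great Wave off Kanagawa,神奈川沖浪裏\n"
    '2,"Fine Wind, Clear Morning",凱風快晴\n'
)


def read_views(path):
    """Open a CSV for reading and return each row as a dictionary."""
    # "r" opens the file in read mode; newline="" lets the csv module
    # handle line endings itself, as its documentation recommends.
    with open(path, "r", newline="", encoding="utf-8") as file:
        reader = csv.DictReader(file)
        rows = []
        for row in reader:
            print(row)
            rows.append(row)
    # Leaving the with block has already closed the file.
    return rows


def read_sample_views():
    """Write SAMPLE_VIEWS to a temporary views.csv, then read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "views.csv")
        with open(path, "w", newline="", encoding="utf-8") as out:
            out.write(SAMPLE_VIEWS)
        return read_views(path)


if __name__ == "__main__":
    read_sample_views()
```

Because "Fine Wind, Clear Morning" contains a comma, the sample wraps it in double quotes; DictReader strips the quotes and keeps the comma as part of the value.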
Let's go ahead and run Python of views.py and, hmm, here we actually see a lot of dictionaries. I'll scroll up a little bit here, and we'll see our very first dictionary: id 1, which makes sense, because we have a column called id; English title, The Great Wave off Kanagawa, which makes sense, because we have a heading called English title; and then Japanese title, with this text right here. So it seems we've gotten back every row in our CSV as a dictionary. This is likely the most convenient way to read your CSVs: every row is its own dictionary, and you can access the values in that row using keys, which are the same names you'll find in the header at the top of your CSV file. So that seems to be going pretty well. Let's see what else we could do. Our ultimate goal is to open these .jpeg files and calculate how bright they are on a scale of 0 to 1. Recall that the ids in these rows match the file names: The Great Wave has id 1, meaning it matches the file 1.jpeg. So let's focus on the ids. As we said, every row is a dictionary, and I can access a value in a given row by using the column name as the key. Here, our header was id, so I'll change my print to print(row["id"]). That gives me 1 for this particular row, and 2, 3, 4, 5, and so on for the others. I'll go ahead and run Python of views.py, and notice how now I'm getting the id from each row. Then we can use this to calculate the brightness. Again, all calculate_brightness needs is a file name. So I could say brightness equals the result of calling calculate_brightness, given some particular file name. For that file name, I'll use a Python f-string and begin with row["id"] in curly braces. So if row["id"] were 1, I'd have 1; if it were 2, I'd have 2.
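As a sketch of this id-to-file-name step, the snippet below reads rows with DictReader and builds each name with an f-string. It uses io.StringIO, an in-memory file, instead of views.csv so that it runs on its own; the header names are assumptions. Every value DictReader returns is a string, which is convenient here because the id can go straight into the f-string.

```python
import csv
import io

# Illustrative data standing in for views.csv; header names are assumptions.
VIEWS_TEXT = (
    "id,english_title\n"
    "1,The Great Wave off Kanagawa\n"
    '2,"Fine Wind, Clear Morning"\n'
)


def image_filenames(csv_file):
    """Build each print's image file name from its id column."""
    reader = csv.DictReader(csv_file)
    # row["id"] is the string "1", "2", ..., so the f-string yields "1.jpeg".
    return [f"{row['id']}.jpeg" for row in reader]


print(image_filenames(io.StringIO(VIEWS_TEXT)))  # ['1.jpeg', '2.jpeg']
```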
I'll then type .jpeg, which completes the file name. So now, for each row, calculate_brightness receives the name of the file whose brightness it should calculate, and the result is stored in brightness. I'll go ahead and print brightness now, and try running this down below. I'll type Python of views.py, and, oops: brightness does not seem to be defined. Aha. OK, so I should not have done this. As we've seen before, == compares two values, testing whether they're equal; it doesn't set anything. What I want instead is a single =, which assigns the value on the right to the name on the left. So that was a typo on my part. Now I'll run Python of views.py, and we'll see a lot of brightness values. As we said before, we're getting brightness on a scale of 0 to 1, so higher numbers mean more brightness, and lower numbers mean less brightness, or more darkness. These numbers have a lot of decimal places, so I could round them if I wanted to. I could type round(brightness, 2) to round to two decimal places, and I think we'll get friendlier output. Now I see a brightness value for each file I've opened. But these numbers don't really mean much to me if they're not alongside the file names and the actual titles of the prints, too. So why don't we make that happen as well? What I need to do is not only read views.csv, but also write a new CSV that includes a new column, say one called brightness, containing the value we found for each print in this collection. To do that, I can open not just one file, but two at one time, using a single with statement.
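The == typo is easy to reproduce in isolation. In this sketch, typo_version uses == where = was intended: Python evaluates the comparison, which requires looking up brightness, and since nothing ever assigned that name, it raises NameError. fixed_version assigns with a single = and then rounds. Both function names and the value 0.3456 are placeholders chosen for illustration.

```python
def typo_version():
    """Reproduce the bug: == compares, so brightness is never assigned."""
    try:
        brightness == 0.3456  # evaluates a comparison and discards the result
    except NameError as error:
        return str(error)
    return "no error"


def fixed_version():
    """A single = assigns; round(..., 2) keeps two decimal places."""
    brightness = 0.3456
    return round(brightness, 2)


print(typo_version())   # name 'brightness' is not defined
print(fixed_version())  # 0.35
```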
I can specify that I want to open one file for reading right here, but also one for writing. I'll type a comma and then open again, in this case opening a new CSV called analysis.csv. I'll open this one, of course, in write mode: "w" stands for writing. And I'll give this one a name we can use inside the indented block of code that's part of this with statement. So now I actually have two files open at once: views.csv in read mode, which we're calling file for now, and analysis.csv in write mode, which we're calling analysis. For consistency, why don't I rename file to views, and I'll update that down below as well. So with these two files open, what can I do? I can read from one file and write to the other. To write to a CSV, we have an object very similar to DictReader, called DictWriter. So I'll make myself a new object called writer, which is a csv.DictWriter that takes dictionaries as input and writes each one as its own row in my new CSV file. I'll say that this DictWriter should operate on the file we called analysis, all the way over here. But to create this writer, I still need more information. Notice how views.csv actually has what's called a header: column names, if you will, like id, English title, and Japanese title. We'd like to specify the same headers for our new file, analysis.csv, and we do so when we create this DictWriter. In fact, DictWriter requires another argument, fieldnames, which refers to the header names you'll see up top. So what should fieldnames be for analysis? Well, they should be the same column names as before: id, English title, Japanese title. I could type these out by hand, or I could make use of the reader itself.
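A minimal sketch of opening two files in one with statement: views.csv for reading and analysis.csv for writing, with a csv.DictWriter on the second. DictWriter requires the fieldnames argument; here it's written out by hand, which the short mentions is one option. The file names match the short, but the two-column header and the temporary folder are assumptions so the example can run anywhere.

```python
import csv
import os
import tempfile


def copy_rows(views_path, analysis_path):
    """Read rows from one CSV and write them to another via DictWriter."""
    # A single with statement can open several files; all close at the end.
    with open(views_path, "r", newline="", encoding="utf-8") as views, \
            open(analysis_path, "w", newline="", encoding="utf-8") as analysis:
        reader = csv.DictReader(views)
        # fieldnames is required; these two column names are assumptions.
        writer = csv.DictWriter(analysis, fieldnames=["id", "english_title"])
        writer.writeheader()
        for row in reader:
            writer.writerow(row)


def demo():
    """Create a stand-in views.csv, copy it, and return analysis.csv's text."""
    with tempfile.TemporaryDirectory() as folder:
        views_path = os.path.join(folder, "views.csv")
        analysis_path = os.path.join(folder, "analysis.csv")
        with open(views_path, "w", newline="", encoding="utf-8") as out:
            out.write("id,english_title\n1,The Great Wave off Kanagawa\n")
        copy_rows(views_path, analysis_path)
        with open(analysis_path, "r", newline="", encoding="utf-8") as result:
            return result.read()


print(demo())
```

Opening a file in "w" mode creates it if it doesn't exist and erases its contents if it does, so each run rewrites analysis.csv from scratch.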
The reader stores those same field names in an attribute called fieldnames, so I can type reader.fieldnames. What we're saying here is that this new file, analysis, should have the same field names, the same header, as views.csv, since they're stored in reader.fieldnames on the DictReader we created from the views file. But we also want a brightness column, a new field name. reader.fieldnames is a list, so I can combine it with another list using +, here a list containing a single value, brightness. So now this writer on analysis has not just the headers id, English title, and Japanese title, but also one called brightness. And just to be sure we're making progress, I could try writing the header. It turns out this writer object has a method called writeheader, just like this. If I run this code as is, we should hopefully see a new file called analysis.csv with those headers in place. I'll run Python of views.py, and we'll see our brightness values being printed out, thanks to lines 12 through 14, but we'll also see analysis.csv, which has all the same headers as before, as well as one called brightness. So now we have the basis for analysis.csv, but we still need our rows. So in views.py, what can we do? As we loop over the rows in this reader object, we could also write some rows: read in a row, find the brightness for that print, and then write a new row in analysis.csv. So let's try this. Inside this loop on line 12, I still want to calculate the brightness just like this, but rather than printing it, what I really want to do is write a new row.
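Here is a sketch of building the new header from reader.fieldnames. Because fieldnames is a list, + produces a new list with brightness appended, and writeheader writes that list as the first row. The example uses io.StringIO in place of both files, and the column names are assumptions. One edge case the short doesn't hit: for an empty input file, reader.fieldnames is None, so the sketch checks for that before concatenating.

```python
import csv
import io


def header_with_brightness(views_file, analysis_file):
    """Write a header to analysis_file: views_file's columns plus brightness."""
    reader = csv.DictReader(views_file)
    # Accessing fieldnames reads the header line; it is None for an empty file.
    if reader.fieldnames is None:
        raise ValueError("input CSV has no header row")
    # fieldnames is a list, so + builds a new list with one more name.
    fieldnames = reader.fieldnames + ["brightness"]
    writer = csv.DictWriter(analysis_file, fieldnames=fieldnames)
    writer.writeheader()
    return fieldnames


def demo():
    """Return the header text written for an illustrative input."""
    views = io.StringIO(
        "id,english_title,japanese_title\n"
        "1,The Great Wave off Kanagawa,神奈川沖浪裏\n"
    )
    analysis = io.StringIO()
    print(header_with_brightness(views, analysis))
    return analysis.getvalue()


print(repr(demo()))
```

DictWriter ends each row with "\r\n" by default (the excel dialect), which is why the header text ends that way.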
So writer, this csv.DictWriter object, has a method called writerow, which lets me pass in a dictionary that will then be written to the file, in this case analysis. Here, I can specify the keys, which should match the headers of our CSV file (id, English title, Japanese title, and brightness), and provide the value this row should have for each of those columns. I'll type id, which is a key, the same name we used as a field name in the header. Then I'll say row["id"]. So the value for id in this new row should match the id of the row we just read. That makes sense: first 1, then 2 in views.csv, and we'll have 1, then 2 in analysis.csv as well. It's really the same thing for English title: it should be the same as the current row's English title, where row refers to a row from views.csv. So if we had The Great Wave there, we'd also see The Great Wave as the English title in analysis.csv. Same thing for Japanese title: we'd have the Japanese title from the row. And now here comes the change we're making. If I want to include brightness, I can do that as well. I can specify a key called brightness, matching the column named brightness, and give it the value that should appear in each row, in this case the brightness we calculated up above. Let's go ahead and try this out. And again, here's our full code. I'll run Python of views.py down below, and we'll see no output, at least in our terminal. But now if I look at analysis.csv, I'll see the very same data from views.csv, plus this new column called brightness, with the brightness value we calculated. I probably still want to round this, though. So I'll go back to views.py, round brightness just like this, and run Python of views.py again.
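Putting the pieces together, here is a sketch of the loop as written so far, with writerow receiving a dictionary built key by key. Instead of calling calculate_brightness directly, the hypothetical function write_analysis takes a brightness_of parameter, so the example can run with a placeholder in place of real image analysis; in the short, that role belongs to calculate_brightness. The header names and the placeholder value 0.3456 are assumptions, not measurements.

```python
import csv
import io


def write_analysis(views_file, analysis_file, brightness_of):
    """Copy each row to analysis_file as a new dict with a brightness column.

    brightness_of is a hypothetical parameter standing in for
    calculate_brightness: it maps a name like "1.jpeg" to a number.
    The column names are assumptions about the header of views.csv.
    """
    reader = csv.DictReader(views_file)
    writer = csv.DictWriter(
        analysis_file, fieldnames=reader.fieldnames + ["brightness"]
    )
    writer.writeheader()
    for row in reader:
        brightness = brightness_of(f"{row['id']}.jpeg")
        # Build the output row key by key; keys must match the fieldnames.
        writer.writerow({
            "id": row["id"],
            "english_title": row["english_title"],
            "japanese_title": row["japanese_title"],
            "brightness": round(brightness, 2),
        })


def demo():
    """Run write_analysis on one illustrative row and return the CSV text."""
    views = io.StringIO(
        "id,english_title,japanese_title\n"
        "1,The Great Wave off Kanagawa,神奈川沖浪裏\n"
    )
    analysis = io.StringIO()
    # Placeholder brightness function: always 0.3456, not a real measurement.
    write_analysis(views, analysis, lambda name: 0.3456)
    return analysis.getvalue()


print(demo())
```

Passing the brightness function in as a parameter is only a convenience for running the sketch without image files; the short calls calculate_brightness inside the loop.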
If I go back and look, I'll have those rounded brightness values as well. Now let's test visually whether this is actually correct. We said before that numbers closer to 0 were darker and numbers closer to 1 were brighter. Here I have the print with id 3 at 0.34, which seems like a pretty low value compared to the others. Let's check it out. I'll open 3.jpeg, and we'll see a pretty dark print. If I go back to analysis.csv, let's find a pretty bright one. This one, id 35, looks like a view of Mount Fuji, and it seems pretty bright, at 0.75. If we look at it, we'll see it is in fact among the brighter of these prints. So, to be clear, 3 is among the darker and 35 among the brighter. A neat way of analyzing these images by their brightness. So I'd argue our code seems to work: we're both reading from and writing to a CSV. But our next step could be to improve the design of our code. Notice here on lines 15 through 20, there's a lot of redundancy. We said before that we want the same data from the rows we're reading to show up in the rows we're writing. So it turns out I don't need to make a brand new dictionary; I could just add on to the row dictionary itself. I'll show you what I mean. Just underneath the line that calculates brightness, rather than making a whole new dictionary, I could add a new key to row. Recall that row already has the values for id, English title, and Japanese title. All we're doing is adding a new key, brightness, with a new value, just like this. And then, to write this row, I can simply type writer.writerow(row), passing the row we're currently reading, now with this new key called brightness. So much simpler, with far fewer lines of code.
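A sketch of the refactored loop, which adds a brightness key to each row from DictReader and passes that same dictionary to writerow. It also adds a hypothetical helper, darkest_and_brightest, that automates the visual check described above by reading the analysis data back and comparing brightness values; because everything read from a CSV is a string, it converts with float before comparing. The placeholder brightness function (id divided by 10) and the titles in demo are made up for illustration and are not results from the short.

```python
import csv
import io


def write_analysis(views_file, analysis_file, brightness_of):
    """Copy rows to analysis_file, adding a rounded brightness column.

    brightness_of is a hypothetical parameter standing in for
    calculate_brightness: it maps a name like "1.jpeg" to a number.
    """
    reader = csv.DictReader(views_file)
    writer = csv.DictWriter(
        analysis_file, fieldnames=reader.fieldnames + ["brightness"]
    )
    writer.writeheader()
    for row in reader:
        # row already holds every original column; add just one more key.
        row["brightness"] = round(brightness_of(f"{row['id']}.jpeg"), 2)
        writer.writerow(row)


def darkest_and_brightest(analysis_file):
    """Return the ids of the darkest and brightest rows in analysis data."""
    rows = list(csv.DictReader(analysis_file))
    # CSV values come back as strings, so compare them as floats.
    darkest = min(rows, key=lambda row: float(row["brightness"]))
    brightest = max(rows, key=lambda row: float(row["brightness"]))
    return darkest["id"], brightest["id"]


def demo():
    """Run write_analysis on placeholder data and return the CSV text."""
    views = io.StringIO("id,english_title\n1,Title 1\n2,Title 2\n3,Title 3\n")
    analysis = io.StringIO()
    # Placeholder brightness: id / 10, so 1.jpeg -> 0.1 (not real data).
    write_analysis(views, analysis, lambda name: int(name.split(".")[0]) / 10)
    return analysis.getvalue()


print(demo())
print(darkest_and_brightest(io.StringIO(demo())))  # ('1', '3')
```

Adding a key works here because every key in row also appears in fieldnames; by default, DictWriter raises ValueError if a dictionary has keys that aren't in fieldnames.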
I'll go ahead and run Python of views.py, and we'll see the same results, though of course no longer rounded. So I'll go back and round this to two decimal places. Now I'll run Python of views.py, and voila: there's our new CSV, analysis.csv, with that same information plus brightness. So this was our brief foray into both reading and writing CSVs, thanks to the csv library and objects like DictReader and DictWriter. We'll see you next time.