CARTER ZENKE: Well, hello one and all, and welcome back to CS50-- Instruction to Databases with SQL. My name is Carter Zenke and up until now you've been learning how to design databases and how to add data to them, like how to insert, update, and delete some values all together. Now, along the way, we've added some complexity. That is, we went from one table to multiple, , and we had relationships, too, among all those tables, as well. 

Now, what we'll introduce today is a way to simplify some of that complexity and give you a way to see your data all the more clearly-- . this tool called a view. Now, we'll begin with a familiar data set-- this data set of books. And we learn about this data set of books back in week zero. If you remember, we had a table of 78 books that had all been nominated for the International Booker Prize. It was a single table, just of these books. 

Now quickly, in week one, we introduced more than just this single table. We actually had multiple tables-- a table for books, a table for authors, for example, and also some relationship between those authors and those books. We said that an author could write not just one book, but multiple books. And similarly, a book could be written by not just one author, but multiple as well. This is what we call a many-to-many relationship. 

Now, we also saw this not just visually, but in our database, in terms of tables, as well. We had this kind of schema, where we had a table for authors, a table for books, and then a table in the middle, to talk about the relationship between authors and books-- which author wrote which book. We can see this in the author table, looking at these IDs here. 

And now to recap, if I wanted to find the book that Han here has written, let me ask, how could I do that? How could I find the book that Han, in the authors table, has written, over here in the books table? Feel free to raise your hand. 

AUDIENCE: By looking at the author_id, 31. 

CARTER ZENKE: Nice. So start by looking at the author_id. That's a great idea. So here we see that Han's author_id is 31. And we could make this link between the authors table and the authored table. So I'll highlight this here, so we can see it visually. 

Han has the ID 31. And here on the authored table, we see the author_id of 31 next to, actually, some book_id. This book_id is 74. Well what book has that ID, 74? Turns out it's the white book, all the way over here. Now, this is adding some complexity to our queries. If I want to find who wrote what book, I have to go through the authors table, then the authored table, and then the book table. 

That's just a lot of steps to go through. And we could, though, try to simplify this, to have a better view of this data, to see it with just a single column for authors and a single column for title. So let's try to visualize this together. What I might do first is try to flip around this primary key of authors, this ID column. Let's put it closer to the author table, just visually, metaphorically, here. 

Now I might see that these IDs line, up one to another. And we saw in a previous lecture that I could use a join in SQL to put these tables into a single table. And what a join does, if you remember, is take the primary key in one table, and the foreign key in another, and just find which ones match, to put the rows of each table together. A bit like touching your fingertips together, like this. 

So I'll then visually move these together, join these tables into a single table. And notice how if I, now, just get rid of these IDs-- metaphorically, just visually here-- I now have a much more simple table, a simple view of this data. And if I want to find which book Han has written, well, I could do that. Just look at a single row here. Look at the name column and then the title column here. 

Now, what we're doing, by creating this new view of our data set, is actually creating a view. And a view is defined by this here-- a virtual table that is itself defined by a query. Now, what does this mean? What is a virtual table? What is a query? 

Let's go back a few steps and see this in action. So we started with, in this case, three tables-- authors, books, and authored. And presumably, there's some query here that gets us from point A, these three tables, to point B, this single table. That query might involve some selects, some join we talked about earlier. But there is some query we can write to get us from point A to point B. 

And notice here that the result of this query is itself a table. Like, this has rows and columns and what we can do is see all of those in terms of a table we could later on query. So a view is a way of writing some query to see our data in a new way and then saving this view of our data, so we can query it later on. 

Now, why might you want to build a view? Well, you might want to for a few reasons here. You could use it to simplify your data, as we just saw here. You can use it to aggregate your data, to try to sum up some values and store that in a single table. You could use it to partition your data, or divide it into logical pieces. And you could even use it to secure your data, to perhaps hide some columns you don't want somebody to see. 

There are more reasons, too, to use views. We'll focus on these four today. And let's begin by focusing on simplifying. So we saw earlier a way to simplify these multiple tables down in to one. 

If I go back here, we'll see, we got from this set of three tables to a single table, that had both author names and author titles. So let's see here exactly how much we're reducing complexity by trying out different queries, and then adding in a view to see all the benefits of this new view we have here. 

So we'll go back. And we'll see, in this case, our three tables-- authors, books, and authored. And let's say I want to find the books a particular author has written. I could write a query to do just that. 

And let's go back to SQLite here, to run that query ourselves. I'll come back to my computer, here in SQLite. And my goal will be to find which author wrote which particular books. So if I go to my database, here, I can go to my terminal, and type sqlite3 longlist.db, to reopen this database. 

And now, I should see, if I type .schema, all the tables that are inside of it. I have a table for authors, if I scroll up a little bit, a table for books, and a table for authored-- that is, relationship between authors and books. Now, let's say I want to write a query to find which books Fernanda Melchor wrote. Well, I probably first need to find Fernanda's ID, as we just saw visually in our earlier example. 

So let's find Fernanda's ID. I could say SELECT, let's say, id FROM the authors table, where the name equals Fernanda Melchor. So I'll say SELECT id FROM authors. Then, WHERE the "name" equals Fernanda-- 'Fernanda Melchor' semicolon. So it seems Fernanda's ID is 24. 

And I could use this ID in the rest of my query, but I could probably do better than this. I could use what we saw before, something called a subquery, to take this query I just wrote and put it inside some other one here. So let's take the next step and find the book_ids that Fernando wrote. So let's try looking in the authored table for that, which has author_ids and book_ids, side-by-side. 

OK, I'll SELECT, in this case, the book_id from the authored table, WHERE, in this case, the author_id is equal to-- well, I could say 24, but I want to do better-- I could actually have a very own subquery here to find Fernanda's ID dynamically. I could put parentheses here, then hit one, two, three, four, spaces, and say SELECT id from the authors table, WHERE the name equals Fernanda Melchor-- that same query from before. I'll close this out, close out my subquery. And then if I hit Enter, I should see now the book IDs Fernanda has written. 

But there's still one more step here. What do I now need to do? I have book IDs that Fernando wrote but how do you think I could find the titles from those book IDs? 

AUDIENCE: You could select the title from the books. 

CARTER ZENKE: Nice. When you select title from that books table-- so if you remember, we have titles inside of our books table, but we still need to link those titles through their ID to the book IDs Fernanda has written. So let's do one last step here. I'll come back to my terminal. And I'll write the query in full now. 

So I know I want to end up with titles. I'll say SELECT, in this case, title from the books table. Now, I want to find particular titles where the ID is in that list of books Fernanda has written. So I'll say WHERE the id is IN some subquery, the query we actually just wrote. So let me write it again. 

I'll say one, two, three, four spaces here to indent for style. Then I'll say SELECT the book_id from the authored table, in this case, WHERE the author_id is equal to some number-- in this case, Fernanda's ID. But what is Fernanda's ID? Let's find out. 

I'll make another subquery here to find Fernanda's ID, indenting four times just for style here. Then I'll say SELECT id from the authors table-- and then again indent-- WHERE the name equals Fernanda Melchor. Then I'll close out all my queries to create this single long query to find all the books Fernanda has written. 

And, again, the order here is first find Fernanda's ID, then find the book IDs associated with that ID, then find the titles associated with those book IDs themselves. Now I'll hit Enter and I should see all those books that Fernanda has written. 

Now, as we talked about, this is a lot of complexity, a lot of tables to go through, and there's probably a better way to do this. So let me open up a new terminal and try out a different way of going about this. I'll make a new connection to my database. I'll make a new terminal and I'll say sqlite3 longlist.db, hit Enter. Now I should see that I have a brand new connection to my database. And I could try to find a better way to do this. Let me try to do what we did visually and join these tables together. 

Well I could try SELECT name and title FROM the authors table. But notice how authors table doesn't have titles in it. It does have the name of the author, but it doesn't have book title. So to get titles, I have to join in some new table here. I'll hit Enter and I'll say JOIN, let's say, the authored table on authors dot id equals authored dot author_id. And all I'm doing here is saying I'll take first the authors table and then why don't I try to combine this table with the authored table, linking it up by looking at the authors dot ID column and the authored dot author_id column. 

And visually, let's take a look what we're doing here in this join. If I look at this in my slides here, I should see-- I started with my authors table. Here I had names of those authors and I had an ID column in that author table. Now I'm trying to join the authored table by telling SQL that these two columns, author_id and id, should align. They should line up like this so their rows are all together. 

But there's still one more step. I've joined the author table and the authored table, but what's the next step to find book titles? I still have to join the books table. So let's try that here. I'll come back over and instead of joining just the authored table, let me now try to also join-- let me hit Enter-- to also join the books table. I'll JOIN books ON books dot id equals, in this case, authored dot book_id. And to be clear, this line here means that what I'm trying to do is now join this book table, which is over here, with this author table, aligning it so that their IDs match up into a single row, as we see here with 74 and 74, 1 and 1, and so on. I'll come back. 

And now let me try this. Come back over here. And I'll just try hitting Enter to see what I can see from this query. Let me zoom out a little bit and you'll see I actually do have author names next to book titles all in one table, just as we thought we would. So this is useful to me, but what if I wanted to have this stored as a way to see my tables? I want to be able to query this kind of table I see right here. 

Well for that, I could try to introduce a view. Now a view starts with a query, as we just saw. I wrote a query to select names and titles from three different tables, joining them together. But if I want to save the results of that query so I can query them later on, I could use this syntax here create a view that has a name that I can give to it as this query down below here. So CREATE VIEW given some name AS the result of this query gives me a view that is then saved and part of my schema. So let's try that here to create a view by which you can see both author and book titles side by side. 

I'll come back to SQLite here and I'm already in this brand new terminal. So I could try to remake this view and save it now for later usage. I'll say create a view and call it longlist, the same name we gave our single table back in Week 0. Now I'll say this view is the result of this following query AS, let's say, SELECT name and title from the authors table-- but not just the author's table-- join, let's say, the authored table-- JOIN authored on authors dot id equals authored dot author_id. And then join again our book table as we saw visually here. JOIN books ON books dot id equals authored dot book_id. Now that is the query I used before to see author names and book titles side by side. If I create this as a view-- hitting semicolon and Enter-- nothing quite happens. But now if I type .schema and hit Enter, I should see a view called longlist that has two columns, name and title, and actually see the query I use to create this view. 

So let me think now. I want to see what's inside this view. And to query a view I can use it exactly as I would a table. I could say SELECT star FROM longlist semicolon hit Enter, and now I should see author names and book titles side by side. So if I have this kind of table with authors and book title side by side, what then does my query become to find the books written by Fernanda? Let me ask our audience here. What query could I use now on this view to find the books written by Fernanda? 

AUDIENCE: Thank you, Mr. [INAUDIBLE]. We can query the book or all of the books using her specific name. 

CARTER ZENKE: A good idea. I can use a particular name in this case and perhaps just use a simple WHERE. So let's try that here. I'll go back to my terminal. 

And if I want to find the books written by Fernanda, well, I could just query for them in this single table that I created called longlist-- or not really a table but now a view. I'll say, in this case, SELECT the title from-- SELECT title from longlist WHERE-- to Muhammad's point-- I could just say name equals Fernanda Melchor semicolon. I'll hit Enter here and I see those same books Fernanda has written Paradais and Hurricane Season. 

So let's see where we started here. We started with this kind of query-- long, multiple lines, many subqueries involved-- just to find the answer to which books Fernanda had written. Now, though, that we have a new view, I can simplify this query to just this here-- SELECT title FROM longlist WHERE the name equals Fernanda. And I can do all of this without using much, if any, more disk space. A view, remember, is a virtual table, not a real one. And I can then have the result organized like this while still being in those underlying tables stored essentially elsewhere. So let me ask then-- what questions we have on how views can simplify our data? 

AUDIENCE: Can we then somehow manipulate the views to maybe order the results or display them just differently? 

CARTER ZENKE: A great idea. So if I wanted to maybe order these books, I could absolutely do that in the same way I would do in a table. So let me go back and show you how to do that. Let's say I want to organize these books alphabetically by title. So if you remember, using our regular old table, we could use ORDER BY. I could do the same thing here. I could say SELECT both the name and title columns from my longlist table. And now let me try to ORDER BY the title column in this case and I'll hit semicolon. Now, if I hit Enter, I should see the book titles organized alphabetically. Now, the view itself does not have these titles automatically alphabetically sorted for me, but I could make it do that. If I had, when I was making this view included ORDER BY, let's say, title at the end of that query, I would then have those titles saved in this view automatically as A to Z. So very powerful if you want to not just find particular ways of seeing your data but also organizing it or ordering it too. 

So with that note, we've seen how to simplify one example of having multiple tables down into one. We'll come back in just a few and see what we can aggregate data too, putting that data into a single table for us to see some statistics and such as well. Come back in a few. 

And we're back. So we just saw how to simplify some data, how to take multiple tables and see them as one. What we'll do now is think about how we can aggregate data, that is to take many individual data points and summarize them to find an average, or a min, or a max, but see those results inside of a view. 

So if you remember in our longlist database, we had a table of ratings. And in this case, books had individual ratings from individual people, like maybe I gave book_id 1 a four out of five stars, but David gave it only a three. Now, we store these individual ratings in our database, but ultimately I don't care so much what I rated it, or maybe what David rated it, I care about the average rating, let's say, for each book. 

But if we care about those averages, why then would we store all the individual ratings? Let's think of this as a design question here. Can you think of a reason we might want to actually have these individual ratings stored in our database? What does that give us at the end of the day? 

AUDIENCE: Perhaps reliability, to know how many persons rated the book. 

CARTER ZENKE: Nice. So it's nice to have some more data. I could know how, let's say, I rated a book or how you rated a book. That's good here too. And I would argue that from having this additional data, we could compute more statistics. I could find not just the average rating but also the minimum rating, or the maximum rating, or the median rating, and so on. So these individual results allow us to find and compute more statistics, like averages, mins, and maxes, as opposed to just storing the average. 

Now, if I wanted to find the average of these results, what could I do? We saw before-- I could try to group ratings buy individual book IDs. Let's say I want to find the ratings for book_id 1 and book_id 2. I could group by the book ID column, turning these into individual groups. And let's say I want to find the average here using a SQL query. I could say find me the average of the rating column when I have groups by book ID. That effectively takes my table like this and says, OK, I'll combine all the ratings for each individual book ID and find for you the average. 

I'll first compress this data together, literally putting it one on top the other here, and then find the average of all those results-- let's say, 3.67 book 1 and 2.5 for book 2-- compressing all of our data into a single result-- in this case, the average. So let's walk through then how we found the average using a SQL query and then we'll turn that into a view. 

So I'll come back over here to that computer and I'll open up longlist.db. I'll already here, so I can type .schema ratings to see the schema for this ratings table. And you'll see it does have a book_id column and a rating column as well. So if I want to find the average rating for each individual book ID, what could I do? 

I could try to SELECT the book_id column and the rating column like this. But I don't want to select just the rating column itself that gives me individual values. I want to find the average rating instead. I could instead say find me the average rating from this table. Now let me give this a new name. I'll say AS rating, calling the average rating now newly, in this case, rating. And I'll select this FROM the ratings table. And finally I need to GROUP BY something. I need to GROUP BY book_id. That means I'll calculate the average rating for each book ID. I'll hit semicolon here and Enter and I should see all of these book IDs, top to bottom, with their own rating. 

But here it doesn't look so pretty. What could I do to try to make this not so long, like 3.774019431-- there's a better way to do this and we saw it before. What could I do to enhance this view of my ratings table? 

AUDIENCE: I see some suggestions about rounding the ratings. But I was also going to suggest adding the book titles, of course. 

CARTER ZENKE: Ah, good point here too. So we first want to add the-- first we want to probably add the book title to see what titles are talking about-- not just, in this case, individual book IDs-- but also a good point to say let's try rounding these so we don't see 3.7740194. Let's try to round that maybe to two decimal places just to be safe. So let's first try rounding this and then we'll try adding in some book titles to your suggestion here. 

I'll come back over and let me just try rounding it first. So I will come back to my terminal and I'll say I want to find the average rating for each book_id. And I want to select, not just the average rating as we did before, but the rounded average rating. So I can use ROUND and say take the average rating and round it to two decimal places. And I'll call this column just rating in the final result. And I'll select this from the ratings table. And now I can group by, in this case, book_id Enter. I should see I have rounded ratings now, a little more pretty than before. 

Now, to your point, let's try to add in not just book IDs but also book titles. What does it mean for a book ID to have some rating? I want to see titles here in the end. So let's do that. I'll say SELECT book_id, and also SELECT title, and maybe even select the year that book was nominated for the Booker Prize. I'll then select the rounded average rating to two decimal places as rating. And I want to select this from the ratings table, from ratings. 

But what do you notice? If I were to select just from ratings, what data am I missing and where could I find it? 

AUDIENCE: Hey, you're missing the titles from the books table. 

CARTER ZENKE: Yeah, I have to have titles and those are stored in the books table. So what should we do? Probably just join in the books table matching up that primary key with some foreign keys. So I'll come back over here and we'll do that. I want to select from ratings, of course, but I also want to join, in this case, the book table. So I'll hit Enter and now I'll say JOIN as well the books table. Now, how can I match up the books table with the rating table? Well, I know that the rating table has a foreign key called book_id. And the book table has a primary key called id. So I'll tell SQL I want to match these two up a bit like this. 

I'll say JOIN books ON ratings dot book_id-- that foreign key that's inside of ratings-- equals, in this case, books dot id. Now, there's one more step, which is to group by book_id still. So I'll hit Enter and GROUP BY book_id. But now I should be able to hit semicolon and Enter and I should see, not just book IDs, but now titles and the years those books were nominated as well. 

So this view is getting better and better. And I might say we can actually go ahead and just add this to our schema. We could create a view that returns to us the results of this query. So let's try that. 

I'll say, create a view called maybe average book ratings, average book ratings for people to come in and do analysis on if they'd like. Then I could say, I want to create this view as that query we wrote before. So I'll say, again, select the book_id, the title, and let's say the year column, the rounded average rating to two decimal places. 

And then I want to select as rating. I'll select this from the ratings table, from ratings. And then I'll join, as we did before, the book table from ratings and then join books on that primary key and not foreign key. So I'll say on ratings.book_id that foreign key in ratings, and set that equal to books.id, that primary key we're going to use to join these tables together. 

Then I'll say group by book_id. And I should have if I hit Enter my very own view called average book ratings that I can use to then see the averages for each of these books. I'll say select-- let's say star-- from average_book_ratings semicolon. And now I should see all of those results just as we had before. 

Now notably, because this is in a view, I'm not using much, if any, additional space in my database. I still have underlying this view individual ratings. But because I saved this query and the results it created in a view, I can actually have access to the rounded average ratings for each of these. And I can query that view as well. 

So let me ask then what questions we have on how we've used views to aggregate some data so far? 

AUDIENCE: Actually, I have a question. Can we use both the same name, like before it was long list. And here we're using for average something. Can use long list for both of them? 

CARTER ZENKE: Yeah, so if you remember before we had this view called long list. And long list had book titles and authors inside of it. And I think you're asking, could we try to join, let's say, the long list table with our ratings to get book titles and so on? We couldn't do that because we don't have the ID column in that long list view. 

We could though perhaps write some queries that involve that long list view as well. OK, let's keep going here and seeing what advantages this view might give us. So I'll come back over here. And one more advantage is that if I were to add in some new rating to my ratings column, I could just requery this view to find the updated average. 

So let's say we use some INSERT INTO statement here and I update some books ratings. Well, if I'm a data analyst and I come back in a little later and I want to find the updated rating, I can do it just again, by selecting from, selecting star, from average book ratings like this, average book ratings. Hit Enter. And now I see perhaps some updated averages as well. Every time I query this view, I'll rerun essentially that query from before. And now I'll see the updated averages for each book. 

Now, sometimes as a data analyst, I might want to create a view but not store it long term as part of my schema. Like here if I type dot schema, I see both our prior view called long list right here. And I also see our average book ratings view. But if I am somebody who comes in and I maybe want to find the average book ratings not by each book, but by each year, I could do that, but then not store that view long term. 

Maybe I don't want others to see this view. Or maybe it's just from my own analysis on this given day. Well, to do that, I could move beyond just create view. And I could also enhance this to be create temporary view, where a temporary view exists only for the duration of my connection with the database. 

So to be clear, that means if I type SQLite three long list dot DB, I'm opening a connection to the database. If I then type dot quit, I would then lose that connection, quit that connection. And that view would no longer be there. So let's try this. 

I'll come back to my terminal. And we'll create a temporary view that shows us the average ratings for the books by the year they were nominated, not by every individual book here. So I could try doing the same query from before, like select, let's say, the average rating select it from the ratings table, join books, and titles, and so on. But what do I have already to build on top of? If I wanted to find a view that has a title, a rating, and even let's say the year that book was published, what do I already have in my schema for that? 

AUDIENCE: So we could use the view that we just created? 

CARTER ZENKE: Yeah, a nice idea. So we could actually use the same view we just created to create ourselves a brand new view. So just because something is a view doesn't mean we can't use it to then create another view as well. So we'll do exactly that. We'll come back over here. And I can type dot schema to see I still have this view called average book ratings. And in it I have book IDs, their title, the year they were nominated, and their average rating. 

But now I want to find, let's say, the average rating for each individual year, not for each individual book. So what could I do? I could say select the year column and the rounded average rating, in this case, from the rating column in average_book_ratings. And I'll wrap this query here. 

Let me zoom out, so you can see it all on one line. Now, I'll say I want to group by the year column. So here I'm now saying, find me the average rating of each book, of each book's average rating, for each individual year. I'll hit Enter. And we should see, if I zoom back in, for 2018 we have 3.75. For 2019, we have 3.64. So this means that apparently in 2023, the average rating for books that were longlisted for the international Booker Prize was 3.78, and same for 2022, 3.87. 2021 was 3.69. 

So I have this new view here. I could store this temporarily inside of a view that I could use while I'm doing some analysis on this given day. So I'll say create a temporary view. And I'll call this one average ratings by year. How were the books rated each individual year on average? And I'll say this is the query that I just did before. Select, in this case, the year and the rounded average rating to two decimal places, as the rating column from average book ratings, that same view from before. 

Now, I'm creating a view from my view. I'll group by here, group by year. And then I'll hit Enter. And I should see nothing popping up. But now if I wanted to query this view, I could say select star from average ratings by year. And I'll see the results of that view. Now, though, here's the catch and why it's temporary. If I were to type dot quit now, dot quit, and try it again, SQLite 3 long list dot DB. Let me retype that. SQLite 3 long list dot DB. 

If I now try to select star from average ratings by year semicolon, I'll get no such table and really no such view. I don't have a view called average ratings by year anymore because this was temporary. So let me ask, what questions do we have on temporary tables, if any? Temporary views rather. 

AUDIENCE: Right, so thank you. So the first thing that I notice is that it just helps us to do some kind of testing when we use temporary views. Basically, the views will not be saved in the database. But again, it's just a good way just to test the query, whether it is usable or not. 

CARTER ZENKE: That's a great use case for them. And in general, if you are working on a database and you want to organize the data in some way, but you don't want to store that organization long term, a temporary view is great for that. If though, you are a developer and you want to ingrain this view as part of your database, you might just use create view, plain and simple, to have it always be in your database for you wherever you connect to it and whenever you connect to it. Great question. 

OK, let's keep going. And let's introduce now something called a common table expression, or a CTE. And a common table expression is simply a view that exists for the duration of a single query. So a regular view, as we saw before, exists forever in my schema. A temporary view exists for a single connection. But a CTE exists for a single query. And the syntax looks just a bit different. 

With a CTE I can write a single query, but at the beginning of that query, define the CTE, the Common Table Expression. To do so, I would say with and give it some name, as and then inside these parentheses say the query I want to be this CTE here. I could then optionally follow it up with more CTEs for this same query. Or I could omit the comma, and then use just select from the name of a CTE up above. 

This is useful if you want to have some very temporary view that you don't want to store in your schema, but you do want to use for this particular query. So let's see an example now recreating our average ratings by year, but now using a CTE. So I'll come back to my computer. And I'm already in my SQLite terminal. What I should do, though, is try to drop the prior view. I want to remove that view called average book ratings to create room to have now a CTE with that very same name. 

So I'll say, drop view and the name of the view I want to drop, average book ratings, average book ratings, semicolon. And now I should see that view is no longer in my schema. So I want to recreate though what we did before, which was finding the average ratings of books year over year. 

So let me try to write this query. But it would be handy if in that query I had access to a view that gave me the average rating for each book. So I'll write the query like this. I'll use a CTE. And I'll say, with average book ratings, average book ratings, as now this query I'll write now inside. And this query should be familiar. It's the same query we use to find the average rating for each book. 

I'll say here, I want to select the book ID column, the title column, the year column, and the rounded average rating to two decimal places, calling it rating, in this case from my ratings table, like this. Now, what do I need to do? 

I also need to, as you saw before, join in my books table. Join books on the primary and foreign keys aligning. So I'll say the book ID column, that foreign key in the ratings table, should equal books dot ID, that primary key column in books. Then I will group by the book ID to find the average rating for each book. And now, let me close out this CTE. I'll close my parentheses. 

And because I'm only using one CTE, I don't need a comma. I can then just write the rest of my query. I'll hit Enter for style sake. And now with this CTE called average book ratings, I want to find the average rating year over year grouping not by book, but by year. So I'll say select, in this case, year, and then select the rounded average rating still to two decimal places as a new rating column. 

Then I'll select this from, in this case, my CTE. So I'll hit Enter here as well from average book ratings, that same CTE I defined earlier in my query. Now finally, to find the average rating across individual years, I'll say group by year semicolon. And now hopefully, if I hit Enter, I should see the average rating for each year in my database. 

OK, so this CTE is useful when you have subqueries you don't want to repeat or when you want to make a view that only exists for the duration of a query. And it's with all of these tools, like views, temporary views, and CTEs that we can get away with storing statistics about some data in our data set and not storing them in the actual database, but only inside this view we might have here. 

What we'll do next is see how we can use views to create not just these aggregations, but also partitions of our data. A way to split our data up into logical pieces for use for our own sake or even for application sake as well. We'll come back In just a few. 

And we're back. So we've seen so far how to use views to simplify and also aggregate our data. What we'll see now though, is how to use views to partition our data. That is to break it into smaller pieces that will be useful to us or to some application. Now, on the International Booker Prize website, they have a page for each year of longlisted books. But in our table, we have all the books together, regardless of their year. 

Well, maybe it will be useful in the application, even for me personally, to have a table one for each year of books that have been nominated for this prize. But before I create that view, I should ask, how do I try to find the books nominated in 2023? If my table looks like this, I have a table of books with ID, title, and year columns, what query could I use to find only those books nominated in 2023? 

AUDIENCE: Use the WHERE clause with the year? 

CARTER ZENKE: Yeah, I could use the WHERE clause and try to search for a given year. Like I could try to say, where, in this case, the year is 2023 or equal to 2023. And that would give me back these two rows if only two books were nominated in 2023. In reality it's 13. If I wanted to find the next ones from 2022, what could I do? I could just change the WHERE clause to say where year equals 2022. And then same thing for 2021. 

I could say where year equals 2021. Now, each of these segments of my data is what we'll call a partition, a certain breaking down of my data into smaller pieces, in this case, organized by year. So let's now create smaller tables, one for each year that might be in our data set here. I'll come back to my terminal. And let's think about writing this query from before. 

So if I wanted to find the books that were nominated in 2022, I could say select the ID column and the title column from my books table. And then I want to find only those books, where the year equals 2022, then semicolon, and then Enter. And I should see only those books that were nominated in 2022. But now, I want to create a view that might actually store the results of this query more long term. 

I could say create view and call this, let's say, 2022 as now same query from before, select ID and title from my books table, in this case where the year equals 2022 semicolon. Now, if I ever want to find only those books nominated in 2022, I could now use my view. I could say select star from 2022 quote and then semicolon, hit Enter. And now I should see those very same books. 

So what could I do now? Maybe I want those books from 2021 as well. Let me ask the audience, what could you do to create the view for the books in 2021 based on what we've just seen now? 

AUDIENCE: We could just change the year from 2022 to 2021. 

CARTER ZENKE: Yeah, just change the year. And what would you propose we call this new view? if you had to take the reins on naming this view, what would you name it? 

AUDIENCE: Books that were written in 2021. 

CARTER ZENKE: Yeah, we could call it books that were nominated in 2021. But I would argue that might be a bit of a long view title. So let's shorten just a little bit. And maybe in keeping with our current naming, we'll say 2021 instead. So I'll come back over here. And let's try this. I'll say create a view. And we could, by all accounts and purposes, we could say books nominated in 2021. And that's totally OK. 

But if I were to pass this view on to a new programmer, a new database analyst, well, I kind of want to shorten it for them, make it more simple for them to read. So I could just call this 2021. Of course, this title is kind of ambiguous. Like if I pass you a view called 2021, are you automatically going to know what's inside that view? Probably not. So it's actually a balance between how descriptive you want to be with your names here. 

We'll stick though with 2021 to our earlier example here. Now I'll say create view 2021 as select ID, title from books. And to Fabian's point, I'll say where the year equals 2021 semicolon Enter. And now I could say, select star from 2021 semicolon. So this is a good way to break apart a large table into logical pieces. It's useful for both you as a programmer and also for applications that might only need to access some subset of your data, where some condition is true. 

So based on what we've seen so far, what questions do we have on partitioning data with views? 

AUDIENCE: We have created two views for 2021 and 2022. Rather than creating two views, is it possible to simplify it as one and then use it rather like where you simply change that alone? 

CARTER ZENKE: Yeah, I mean, it's a trade off here. So you could imagine trying to do, let's say, a single view that maybe has just titles and just years in it. But then you'd have to rewrite the query later on to find only those books for a certain year. Here, what we're trying to do is just like break up this big table into smaller pieces. And you can imagine it's being particularly useful if you had not just 78 books, but even like thousands, or tens of thousands, or millions of books. It could be useful to have a view for each logical segment of those 10,000 or more books. 

Let's take one more question here. 

AUDIENCE: I would like to ask about updating the view itself. If I update a single table, any element in each table, their view can be updated automatically? Or maybe it [INAUDIBLE]. 

CARTER ZENKE: OK, a good forward-thinking question here. You can, you're right, update a table. You can say update the table and set some column to some value. What you can't do actually is update a view. And that's because a view doesn't have any underlying-- actually, it doesn't have any data inside of it exactly. It only pulls data from underlying tables. 

So let me show you how I could update this view, or at least try to update this view. I'll come back to my computer. And let's say we wanted to change a book in the 2021 view. I could say select star from 2021 semicolon. And maybe it turns out there's a typo in here. So instead of a minor detail, it's called something like the minor detail, this book on ID 34 here. I could try to update this view. I could say, update 2021 and set the title equal to the minor detail, where the title currently is minor detail, like this, minor detail. 

Then I'll hit semicolon. And if I do this, I get this error, cannot modify 2021 because it is a view. Now, why can't we modify a view? Well, recall that a view is simply the combination of data from underlying tables. I've assembled data from different pieces of my database and put them into a single view. But I can't then update that data because that data is located in multiple individual tables around my database. 

What I should instead do is try to update the underlying table, which I can do as we saw before with just update and set to. So more on how we can update, or at least try to update, views a little bit later. We'll take a break and come back to talk about how we can secure data using views as well. 

And we're back. So one more use case for views is being able to secure the data that's in your database. You might imagine having a table. But some columns in that table you don't want to share with somebody else. Well, a common practice in security is to only give someone the information they need to know. With views you could do that very principle. You could accomplish that by taking a table and reducing the number of columns somebody else might see. 

So let's consider here some rideshare company. And rideshare companies keep track of a lot of information about us. They keep track of where we're going, where we're coming from, who's in the car with us, and who drove us there even sometimes. Let's say that a rideshare company has a table called rides. And inside this table is the origin of several rides and the destination along with the rider who took that ride. 

Now, if I were to ask you here, let's say, I want to give this data to an analyst to figure out which rides are most popular or which rides are highly rated. What data should I probably omit? If I wanted to share this data with somebody else, what should I remove from this table? 

AUDIENCE: The data you could remove would be the rider. 

CARTER ZENKE: Yeah, good point. So if I look at this table and I think of what information is personally identifiable, PII, Personally Identifiable Information, it seems like this rider column could identify somebody who took a ride from, let's say, Good Egg Galaxy to Honeyhive Galaxy, and we don't want to leak that information if I gave this to some analyst. Well, with a view I could keep this underlying table. But I could create a view that only has the origin and the destination, that omits the rider column. And I could more confidently share this view with an analyst, not the underlying table. 

So let me try to implement this view, so we can see it in action inside this database here. I'll come back to my computer. And in this case, I have a new database. I have one called rideshare dot DB. I'll type SQLite 3 rideshare dot DB to open up this database. And if I type dot schema to see the schema of the database, I should see I have one table called rides, where each ride has an ID, an origin, a destination, and a rider. 

Now, I'll select star from the rides table, select star from rides, hit semicolon. And I should see the very same data we just saw in our slides, where Peach is going from Good Egg Galaxy to Honeyhive Galaxy, Mario going from Castle Courtyard to Cascade Kingdom, and so on. So now, I know I don't want to share this data with somebody else. What could I do? 

Instead of selecting everything, I could try to omit this rider column to the point before. So let's try that. I'll select just ID, and just origin, and just destination from rides. I'll deliberately omit that rider column. Now, I'll hit Enter. And I should see the origin and destination. But I actually don't see the rider column. So this is a decent view. But I could probably improve this even more. 

It turns out that in SQLite if I want to fill a column with some arbitrary value, I can do that a bit like this. Why don't I try to select ID and select origin and destination. But I also want to let somebody know that this data has been anonymized. There is rider information in the underlying table. But you don't have access to it. What I'll do instead is say, let me make this new column that's filled with this value anonymous as a string. 

I'll say quote unquote "anonymous" as this column name called rider. And I'll select all of this from the rides table semicolon. Now I have more helpfully indicated that there is a rider column. But you deliberately don't have access to see who those riders are. 

So my last step here is probably to turn this into a view. I have a good query. But I want to save it so I could query it later on. Well, I could type select or not select. I could type Create View first. I'll create a view. This one just called analysis. I'll give this to my data analysts here. Create view analysis. And then I'll say as let's go with select ID, origin, and destination, as well as the anonymous column as, in this case, the rider. And I'll select all of this from my rides table semicolon, hit Enter. 

Now, if I type dot schema, I see my new view analysis. And if I say select star from analysis semicolon, I should see my "anonymized", quote unquote, this data. But this is helpful. I'd argue it's not very secure, at least in SQLite. And if you're thinking adversarially here, what might you be able to do still? If I gave you access to this database, what could you do to de-anonymize this data and to see the data I didn't want you to see? 

AUDIENCE: You can just query the riders table and find what are the riders values in the riders column. 

CARTER ZENKE: OK, so you're correct that we can still see the rides table. Maybe I look at the ID column in the analysis view and I see those IDs. I might be able to link those IDs with the IDs in the rides table. And nothing's stopping me at least in SQLite from saying select star from the rides table. So I'll come back over here. And I'll try that. I'll say, in this case, select star from rides, hit semicolon. 

And now I see all that ride information, although I hid it in this view called analysis. And this is a downside of SQLite. In SQLite we can't set access controls, meaning I either give you the entire database or nothing at all. In other DBMSes though, I could set access controls. I could give David access only to the analysis view. I can give myself access to both the rides table and the analysis view. 

So keep that in mind when you're working with SQLite. OK, so we've seen here how to secure some data using SQLite and using views in this as well. Let's come back in just a minute and see how we could actually use SQLite and views to not just secure our data, but also to work on soft deletions, which you saw just a bit ago. 

And we're back for one final application of views. We saw just a few weeks ago how we can implement soft deletions, that is not fully deleting something in our database, but only marking it as deleted. And we saw this in the context of Boston's own Museum of Fine Art. It's keeping track of items inside their collection. 

So let's say we had this table called collections full of items in the MFA, the Museum of Fine Arts in Boston. Here I have four items currently in our collections ID one, ID two, three, and four. But I also have this deleted column, which is either 0 or 1, depending on whether an item has been deleted or not. So I want to delete, let's say, Farmers Working at Dawn, this first piece here. Instead of deleting it from the table fully, I could set this deleted column from a 0 to a 1, just like this. 

And now, it's only soft deleted. I still have the data here. I still have the title. But now instead of being a 0, it's marked as a 1. Now, this is useful. But what could I do to only see now the items that have not been deleted? If I show you this database, collections here, the table here, I have titles and the deleted column. But if I only want to see those items that are not deleted, what query could I use for that? 

AUDIENCE: You would use a WHERE statement. 

CARTER ZENKE: I could use a SELECT statement. And I could use probably a WHERE clause as well. I could say select star where perhaps deleted equals 0 because in this case, when deleted is 0, we know this item is still supposed to be in our table. So here's the query here, select star from collections where deleted equals 0. 

Now, we can try to convert this query into a view, so we could always only see the items in our current collection, not those we have marked as deleted. So I'll come back over here. And we'll see we have our collections table. But we also have now perhaps a new view of this table called current collections. 

Well, if I set current collections to be the result of this query, I should see when I update deleted to be 1 for some of these items here, it should be removed from this current collections table. So let's try that. I could take Farmers Working at Dawn and delete it. I could see this correspondence between this row here and this row here. 

It's in both our collections table and current collections. But if I set deleted from a 0 to 1, I should see that item removed from current collections, thus updating my view and keeping track of my items that are not currently in my collection. So let's come back here and implement this in terms of our own database. I'll come back over here. And I will open up MFA dot DB, that old database from a few weeks ago. I'll say SQLite 3 MFA dot DB. And now if I type dot schema, I should see the tables inside of this database, collections, artists, and created. 

We'll focus in particular on the collections table. So if I say select star from collections, I should see that I have the title, the accession number, and the date this piece was acquired by the MFA. Now, though, if I want to add a deleted column, I could do so by using alter table. I could say, let's alter table collections and add a column called deleted. I'll say this column has the type integer. It stores whole numbers. And the default number is 0 semicolon. 

Now, if I say select star from collections semicolon, I should see there is this new deleted column where all the values are 0 at first. So if I want to delete something, I could instead of typing delete from as we saw before I can just update this deleted value. I could say maybe update collections and set deleted equal to 1 where a certain title is present. I'll say this that the title is equal to Farmers-- title equals Farmers Working at Dawn semicolon. 

So now, I've soft deleted this item. It's still in my table, at least the title is. But now I've marked it as being deleted. It should not be in my ultimate view. I'll hit Enter here. And I'll say select star from collections. I'll see that in this case, I still see Farmers Working at Dawn. But what do I want? I want a view that only shows me those items that have not been deleted. 

So I'll try this. I'll say select star from collections and apply a condition to it, to our earlier point before. So I'll say where, in this case, deleted equals 0 semicolon, hit Enter. And now I should see only those items that are not marked as deleted. But I want to turn this into a view, which I could. I could say maybe create a view called current collections. And then I'll make this the following query as select ID, select title, accession number, and acquired. But now I'll actually omit that deleted column. 

I don't want this column to show up in my current collections. I only want it to be in the underlying table. So I'll select all of these columns from the collections table. And then I'll say where deleted equals 0 to apply this condition and only get those items that are not marked as deleted. Now, I'll hit Enter. And I should be able to do this, select star from current collections semicolon. 

And now I'll see only those items that have not been deleted. So this seems worthwhile. I'm now able to view the items that are currently in my collection. But a problem remains. Like if I tried to update this table and insert something into it, what might happen? Let me ask our audience here. Based on what we saw before, what would happen if I tried to insert into this table here, this view perhaps? 

AUDIENCE: I think we cannot update the view itself, as we saw before. 

CARTER ZENKE: Yeah, we can't update the view itself. We can't insert new data. We can't even delete data. So let me try first to delete data. And then we'll see how to insert some data a little bit later. I'll come back over to my terminal. And what I wanted to delete, Imaginative Landscape, I could say delete from current collections where the title equals-- let me make a new line here-- where the title equals Imaginative Landscape semicolon. 

Now, I'll see that same error we got before when we tried to update some view. I can't modify this because it is a view. Well, it turns out there is a way to work around this. We can't modify the view. But we can modify the underlying table. So to do this, we rank from old friends, in this case, those called triggers. So if you remember, a trigger is a way to specify some SQL statement I want to run when some other SQL statement is run. 

We saw before we could use before or after to say run this SQL statement either before or after some other SQL statement here. What we didn't see until now is this INSTEAD OF trigger. I can create a trigger that has a name. And I can run it instead of some other SQL statement. So I could say INSTEAD OF deleting on this view, let me give you the proper query to run, in this case. So instead of deleting on this view, I will for each row run this other statement. 

I'll say begin. This is the statement I want you to run instead. And in this statement, we could have something that actually updates not the view, but the underlying table. And we then end that query here. So based on what we know about our view and the underlying table called collections, what query should we put in here to actually delete something from our table? Let me ask our audience. Instead of deleting from our view, what should we instead do to our table? 

AUDIENCE: Yeah, you should update the deleted column in the collections table. 

CARTER ZENKE: Yeah, so instead of deleting from our view, what I really want to do is update that table. So let's try to trigger it to do just that. I'll come back to my computer here. And I'll start on this trigger. To our point, I want to create a trigger that runs instead of deleting on this view. And what I want to instead happen is an update on the underlying table. 

So I'll say create a trigger, this one called delete. And then I'll say instead of delete on current collections, that is whenever SQL listens and hears a delete on the current collections view, I don't want that to happen. I instead want this other statement to happen. I'll say for each row affected by that delete, for each row I ask delete on current collections, run instead this other statement. 

I'll say begin. Here is that statement. One, two, three, four indents for style sake. And now I'll say I want to update collections and set deleted equal to 1. But I don't want to leave this just as is. If I say update collections, set deleted equal to 1, what's the problem with that statement, if I end it there? 

AUDIENCE: So yeah, it will update all the columns, whether it is we want it to be deleted or not. 

CARTER ZENKE: Yeah, with no condition it'll update all the columns here. So I want to apply a condition to make sure it only affects the particular row I'm working with. Well, what uniquely identifies each row? The ID does. So let's try using the ID here. I'll come back. And I'll say I want to update collections, set deleted equal to 1 but only when some condition is true. 

I'll indent again four times and say where, in this case, the ID column is equal to the old ID. Now, old here is a special keyword that means the row we just deleted or that we would have deleted from current collections. I want to update that same row with that same ID in the underlying collections table. 

So I'll put a semicolon here to end. And then I'll say end this statement. This is the entire statement I want to run after I try to delete all my current collections table. Now, I'll hit Enter. And I'll see nothing changes. But if I say dot schema, I should then see down below here, I have a trigger called delete that will instead run an update when I try to delete from this view. 

So let's try that. What I can do now is try again to delete from current collections. I'll say delete from current collections where. In this case, the title equals Imaginative Landscape. Let's say we sold this painting. It's no longer part of our collection. Imaginative Landscape semicolon. And now it actually seems to work. I get no error here. So it seems like it went OK. 

Well, if I select star from current collections, I actually see that it's gone. But is it really gone? If I try select star from collections, not current collections, the underlying table for current collections semicolon, what do I see? That the ID column has been updated from a 0 now to a 1. So a good way of trying to use triggers here to update our views by instead modifying the underlying table. 

Now, going to be one more issue we encounter here, which is the following. Let's say I'm at this current state in my tables. I have collections, where Farmers Working at Dawn is deleted. It has a 1 right here. And we thus can't see it inside of current collections. Well, let's say I want to not delete from current collections, but also add to current collections. I could try to insert into this view. But I probably actually won't be able to do that because it is a view. 

Let's think then. If I wanted to insert into this view, what should I really do instead? 

AUDIENCE: You should add it to the collections table instead of the view. 

CARTER ZENKE: I like that idea. So I should add this row to the collections table instead of the view. But now, here's a bit of a trick. Let's say I add Farmers Working at Dawn. We reacquired it. What should I do in that case, when I'm adding a painting I previously deleted? 

AUDIENCE: So what we could do is we know that the deleted column-- we can check if the deleted column is gone and then based on that, we could update that particular column. Or if there is no title in the collections table, we can just add it, insert it. 

CARTER ZENKE: Yeah, I like your thinking there. First trying to check, do we have this item already in our table? Or do we not? If we do already have it in our table, we should just flip this bit from 1 to a 0. If we don't though, let's add a brand new row, as we heard before. So let's think then. What could we use to conditionally run some code on the collections table? 

Well, it turns out that you can use some other aspects of triggers. In this case, triggers have not just unconditional use cases, but also conditional ones too. So let's try this. I could try to insert on this view and then run some code down below. But to our discussion a little bit earlier, I don't want to unconditionally do this. I want to check, is this item in my table? Or is it not already? 

So for that, I could use this keyword called when. I could say for each row when some condition is true run this trigger. And you can imagine having two triggers to insert into our view, one that runs when the item is already in our table. We could then flip that bit from a 1 to a 0. We could also have a different trigger to actually insert a new item. In that case, actually inserting a brand new row to our underlying collections table. 

So let's work on this now to finish off and polish our collections view. I'll come back to our terminal here. And what we'll try to do is now insert into this table. I'll create a new trigger, in this case, called insert when exists. I'll say create trigger insert when exists. So this trigger will run when an item we're trying to insert is already in our collections table. 

I'll say instead of insert on the current collections table, I want this to happen instead. I'll say for each row. Now I'll Enter again and say when-- just for style sake-- when new dot accession number is in my table already. I want to find which accession numbers I already have. And if you remember, accession number is a unique ID for the museum. It's not the ID column of my table, but a unique ID to keep track of items in the museum. 

So if I've already given this item an accession number, I should already see it in my table already. So I'll say then Enter, Enter, Enter for style sake, select accession number from collections. So this subquery tells me what accession numbers do we already have. And this condition tells me is the new row I'm inserting, does that have the same accession number or not? 

Now, I'll close this subquery. And I'll continue my trigger. I'll say begin. Well, in the case that this item already exists in my collections table, I just want to update. I want to flip that integer from a 1 back down to a 0. So I'll say update collections and set deleted equal to 0. Now, I want to do this not on my entire table, but only where the accession number is equal to the new accession number, that is the accession number I'm inserting here as well. 

I'll then end like this. And now I have my entire trigger. I'll say instead of inserting into my view, I want to run this instead. Check if I've already added this item. And if I have, change that deleted column from a 1 to a 0. Now I'll hit Enter. And let me hit Enter again. And let me check. I needed to have a semicolon after my where. So let me try to run this again. 

I'll do Control D to restart. And I'll say SQLite 3. Let's do MFA dot DB. And I'll restart this trigger. So here I'll say create trigger insert when exists. We gave the name before. Then I'll say instead of insert on current collections, what I want to do is run this. I'll say for each row I insert, check some condition. When the new accession number that is the unique ID the museum has for this item is in my already available accession numbers. I'll say select in this case, select accession number from collections. 

And then I'll close the subquery. Now, I'll say I want to begin the same way I want to run. Begin now update, in this case, my collections table and set the deleted column from a 1 back down to a 0. Update collections and set deleted equals 0 where, in this case, the accession number is equal to the new accession number, only modifying that particular row. 

Now semicolon to close this statement. And then I'll say end below, a second semicolon, Enter. Now if I type dot schema, I should see my new trigger to insert into my collections table when I try to insert into my view. So let's try this. I'll say maybe I want to insert into my current collections. Well, let me try this. I'll say select star from collections. And let's say I want to reinsert, in this case, Profusion of Flowers. 

So I'll say insert into my current collections. Actually, reinsert into current collections. And I want to insert Imaginative Landscape. So I'll say insert into current collections some set of values along with some columns. I'll say the title is Perfusion of Flowers as Imaginative Landscape later on, then accession number, then the date it was acquired, like this. Now I'll hit Enter and provide the values. 

I'll say the title is Imaginative Landscape like this. The accession number is 56.496. And the acquired date is going to be null, in this case. We don't know when we got it. I'll hit semicolon here and Enter. So if I select star from collections now, select star from collections semicolon, I should see that I didn't actually insert a new row. I only flipped that deleted value from a 1 back down to a 0. 

Now, there's certainly more to do here. I could write a trigger to actually add a brand new row to this table here. I could also try to update this data, change, for example, the acquired date from null to a date that we actually do know. For now though, we leave all that up to you. And we'll see you next time.