1 00:00:00,000 --> 00:00:17,330 2 00:00:17,330 --> 00:00:20,460 CARTER ZENKE: Well, hello one and all, and welcome back to CS50-- 3 00:00:20,460 --> 00:00:22,940 Instruction to Databases with SQL. 4 00:00:22,940 --> 00:00:26,030 My name is Carter Zenke and up until now you've 5 00:00:26,030 --> 00:00:28,460 been learning how to design databases and how 6 00:00:28,460 --> 00:00:32,420 to add data to them, like how to insert, update, and delete 7 00:00:32,420 --> 00:00:34,580 some values all together. 8 00:00:34,580 --> 00:00:37,070 Now, along the way, we've added some complexity. 9 00:00:37,070 --> 00:00:41,120 That is, we went from one table to multiple, , and we had relationships, 10 00:00:41,120 --> 00:00:43,860 too, among all those tables, as well. 11 00:00:43,860 --> 00:00:47,930 Now, what we'll introduce today is a way to simplify some of that complexity 12 00:00:47,930 --> 00:00:51,890 and give you a way to see your data all the more clearly-- . 13 00:00:51,890 --> 00:00:54,230 this tool called a view. 14 00:00:54,230 --> 00:00:58,250 Now, we'll begin with a familiar data set-- this data set of books. 15 00:00:58,250 --> 00:01:01,730 And we learn about this data set of books back in week zero. 16 00:01:01,730 --> 00:01:04,430 If you remember, we had a table of 78 books 17 00:01:04,430 --> 00:01:08,330 that had all been nominated for the International Booker Prize. 18 00:01:08,330 --> 00:01:11,030 It was a single table, just of these books. 19 00:01:11,030 --> 00:01:15,170 Now quickly, in week one, we introduced more than just this single table. 20 00:01:15,170 --> 00:01:20,000 We actually had multiple tables-- a table for books, a table for authors, 21 00:01:20,000 --> 00:01:25,670 for example, and also some relationship between those authors and those books. 22 00:01:25,670 --> 00:01:31,370 We said that an author could write not just one book, but multiple books. 23 00:01:31,370 --> 00:01:35,420 And similarly, a book could be written by not just one author, 24 00:01:35,420 --> 00:01:37,320 but multiple as well. 25 00:01:37,320 --> 00:01:40,650 This is what we call a many-to-many relationship. 26 00:01:40,650 --> 00:01:45,080 Now, we also saw this not just visually, but in our database, 27 00:01:45,080 --> 00:01:46,790 in terms of tables, as well. 28 00:01:46,790 --> 00:01:52,520 We had this kind of schema, where we had a table for authors, a table for books, 29 00:01:52,520 --> 00:01:57,410 and then a table in the middle, to talk about the relationship between authors 30 00:01:57,410 --> 00:02:00,440 and books-- which author wrote which book. 31 00:02:00,440 --> 00:02:05,270 We can see this in the author table, looking at these IDs here. 32 00:02:05,270 --> 00:02:11,060 And now to recap, if I wanted to find the book that Han here has written, 33 00:02:11,060 --> 00:02:14,010 let me ask, how could I do that? 34 00:02:14,010 --> 00:02:17,690 How could I find the book that Han, in the authors table, 35 00:02:17,690 --> 00:02:21,890 has written, over here in the books table? 36 00:02:21,890 --> 00:02:24,320 Feel free to raise your hand. 37 00:02:24,320 --> 00:02:26,768 AUDIENCE: By looking at the author_id, 31. 38 00:02:26,768 --> 00:02:27,560 CARTER ZENKE: Nice. 39 00:02:27,560 --> 00:02:28,760 So start by looking at the author_id. 40 00:02:28,760 --> 00:02:29,880 That's a great idea. 41 00:02:29,880 --> 00:02:33,350 So here we see that Han's author_id is 31. 42 00:02:33,350 --> 00:02:37,700 And we could make this link between the authors table and the authored table. 43 00:02:37,700 --> 00:02:40,160 So I'll highlight this here, so we can see it visually. 44 00:02:40,160 --> 00:02:42,080 Han has the ID 31. 45 00:02:42,080 --> 00:02:46,700 And here on the authored table, we see the author_id of 31 next 46 00:02:46,700 --> 00:02:49,280 to, actually, some book_id. 47 00:02:49,280 --> 00:02:51,050 This book_id is 74. 48 00:02:51,050 --> 00:02:53,840 Well what book has that ID, 74? 49 00:02:53,840 --> 00:02:57,830 Turns out it's the white book, all the way over here. 50 00:02:57,830 --> 00:03:02,790 Now, this is adding some complexity to our queries. 51 00:03:02,790 --> 00:03:05,510 If I want to find who wrote what book, I have 52 00:03:05,510 --> 00:03:09,080 to go through the authors table, then the authored table, and then 53 00:03:09,080 --> 00:03:09,800 the book table. 54 00:03:09,800 --> 00:03:12,390 That's just a lot of steps to go through. 55 00:03:12,390 --> 00:03:15,530 And we could, though, try to simplify this, 56 00:03:15,530 --> 00:03:19,100 to have a better view of this data, to see it 57 00:03:19,100 --> 00:03:23,600 with just a single column for authors and a single column for title. 58 00:03:23,600 --> 00:03:27,170 So let's try to visualize this together. 59 00:03:27,170 --> 00:03:31,760 What I might do first is try to flip around this primary key of authors, 60 00:03:31,760 --> 00:03:32,810 this ID column. 61 00:03:32,810 --> 00:03:36,980 Let's put it closer to the author table, just visually, metaphorically, here. 62 00:03:36,980 --> 00:03:42,200 Now I might see that these IDs line, up one to another. 63 00:03:42,200 --> 00:03:46,910 And we saw in a previous lecture that I could use a join in SQL 64 00:03:46,910 --> 00:03:50,420 to put these tables into a single table. 65 00:03:50,420 --> 00:03:55,550 And what a join does, if you remember, is take the primary key in one table, 66 00:03:55,550 --> 00:03:59,655 and the foreign key in another, and just find which ones match, 67 00:03:59,655 --> 00:04:01,280 to put the rows of each table together. 68 00:04:01,280 --> 00:04:04,290 A bit like touching your fingertips together, like this. 69 00:04:04,290 --> 00:04:06,600 So I'll then visually move these together, 70 00:04:06,600 --> 00:04:09,410 join these tables into a single table. 71 00:04:09,410 --> 00:04:13,100 And notice how if I, now, just get rid of these IDs-- 72 00:04:13,100 --> 00:04:15,090 metaphorically, just visually here-- 73 00:04:15,090 --> 00:04:21,140 I now have a much more simple table, a simple view of this data. 74 00:04:21,140 --> 00:04:24,950 And if I want to find which book Han has written, well, I could do that. 75 00:04:24,950 --> 00:04:26,870 Just look at a single row here. 76 00:04:26,870 --> 00:04:31,880 Look at the name column and then the title column here. 77 00:04:31,880 --> 00:04:37,190 Now, what we're doing, by creating this new view of our data set, 78 00:04:37,190 --> 00:04:39,200 is actually creating a view. 79 00:04:39,200 --> 00:04:43,850 And a view is defined by this here-- a virtual table 80 00:04:43,850 --> 00:04:46,700 that is itself defined by a query. 81 00:04:46,700 --> 00:04:48,290 Now, what does this mean? 82 00:04:48,290 --> 00:04:49,430 What is a virtual table? 83 00:04:49,430 --> 00:04:50,330 What is a query? 84 00:04:50,330 --> 00:04:53,310 Let's go back a few steps and see this in action. 85 00:04:53,310 --> 00:04:56,030 So we started with, in this case, three tables-- 86 00:04:56,030 --> 00:04:58,730 authors, books, and authored. 87 00:04:58,730 --> 00:05:04,160 And presumably, there's some query here that gets us from point A, these three 88 00:05:04,160 --> 00:05:07,400 tables, to point B, this single table. 89 00:05:07,400 --> 00:05:11,880 That query might involve some selects, some join we talked about earlier. 90 00:05:11,880 --> 00:05:17,350 But there is some query we can write to get us from point A to point B. 91 00:05:17,350 --> 00:05:22,540 And notice here that the result of this query is itself a table. 92 00:05:22,540 --> 00:05:26,830 Like, this has rows and columns and what we can do 93 00:05:26,830 --> 00:05:32,030 is see all of those in terms of a table we could later on query. 94 00:05:32,030 --> 00:05:38,020 So a view is a way of writing some query to see our data in a new way 95 00:05:38,020 --> 00:05:44,540 and then saving this view of our data, so we can query it later on. 96 00:05:44,540 --> 00:05:47,050 Now, why might you want to build a view? 97 00:05:47,050 --> 00:05:49,330 Well, you might want to for a few reasons here. 98 00:05:49,330 --> 00:05:53,200 You could use it to simplify your data, as we just saw here. 99 00:05:53,200 --> 00:05:57,050 You can use it to aggregate your data, to try to sum up some values 100 00:05:57,050 --> 00:05:58,820 and store that in a single table. 101 00:05:58,820 --> 00:06:02,710 You could use it to partition your data, or divide it into logical pieces. 102 00:06:02,710 --> 00:06:05,560 And you could even use it to secure your data, 103 00:06:05,560 --> 00:06:09,580 to perhaps hide some columns you don't want somebody to see. 104 00:06:09,580 --> 00:06:11,890 There are more reasons, too, to use views. 105 00:06:11,890 --> 00:06:14,420 We'll focus on these four today. 106 00:06:14,420 --> 00:06:18,110 And let's begin by focusing on simplifying. 107 00:06:18,110 --> 00:06:23,920 So we saw earlier a way to simplify these multiple tables down in to one. 108 00:06:23,920 --> 00:06:28,330 If I go back here, we'll see, we got from this set of three tables 109 00:06:28,330 --> 00:06:33,310 to a single table, that had both author names and author titles. 110 00:06:33,310 --> 00:06:38,440 So let's see here exactly how much we're reducing complexity 111 00:06:38,440 --> 00:06:40,900 by trying out different queries, and then 112 00:06:40,900 --> 00:06:45,800 adding in a view to see all the benefits of this new view we have here. 113 00:06:45,800 --> 00:06:46,570 So we'll go back. 114 00:06:46,570 --> 00:06:49,090 And we'll see, in this case, our three tables-- 115 00:06:49,090 --> 00:06:51,700 authors, books, and authored. 116 00:06:51,700 --> 00:06:56,000 And let's say I want to find the books a particular author has written. 117 00:06:56,000 --> 00:06:58,540 I could write a query to do just that. 118 00:06:58,540 --> 00:07:02,600 And let's go back to SQLite here, to run that query ourselves. 119 00:07:02,600 --> 00:07:05,410 I'll come back to my computer, here in SQLite. 120 00:07:05,410 --> 00:07:09,940 And my goal will be to find which author wrote which particular books. 121 00:07:09,940 --> 00:07:13,770 So if I go to my database, here, I can go to my terminal, and type sqlite3 122 00:07:13,770 --> 00:07:18,700 longlist.db, to reopen this database. 123 00:07:18,700 --> 00:07:21,430 And now, I should see, if I type .schema, 124 00:07:21,430 --> 00:07:23,560 all the tables that are inside of it. 125 00:07:23,560 --> 00:07:27,580 I have a table for authors, if I scroll up a little bit, a table for books, 126 00:07:27,580 --> 00:07:33,340 and a table for authored-- that is, relationship between authors and books. 127 00:07:33,340 --> 00:07:37,240 Now, let's say I want to write a query to find 128 00:07:37,240 --> 00:07:40,570 which books Fernanda Melchor wrote. 129 00:07:40,570 --> 00:07:44,500 Well, I probably first need to find Fernanda's ID, as we just 130 00:07:44,500 --> 00:07:47,210 saw visually in our earlier example. 131 00:07:47,210 --> 00:07:49,570 So let's find Fernanda's ID. 132 00:07:49,570 --> 00:07:53,920 I could say SELECT, let's say, id FROM the authors table, 133 00:07:53,920 --> 00:07:56,980 where the name equals Fernanda Melchor. 134 00:07:56,980 --> 00:07:58,930 So I'll say SELECT id FROM authors. 135 00:07:58,930 --> 00:08:06,610 Then, WHERE the "name" equals Fernanda-- 'Fernanda Melchor' semicolon. 136 00:08:06,610 --> 00:08:10,330 So it seems Fernanda's ID is 24. 137 00:08:10,330 --> 00:08:13,810 And I could use this ID in the rest of my query, 138 00:08:13,810 --> 00:08:15,920 but I could probably do better than this. 139 00:08:15,920 --> 00:08:18,130 I could use what we saw before, something 140 00:08:18,130 --> 00:08:21,550 called a subquery, to take this query I just wrote 141 00:08:21,550 --> 00:08:24,560 and put it inside some other one here. 142 00:08:24,560 --> 00:08:29,560 So let's take the next step and find the book_ids that Fernando wrote. 143 00:08:29,560 --> 00:08:34,090 So let's try looking in the authored table for that, which has author_ids 144 00:08:34,090 --> 00:08:37,419 and book_ids, side-by-side. 145 00:08:37,419 --> 00:08:43,929 OK, I'll SELECT, in this case, the book_id from the authored table, WHERE, 146 00:08:43,929 --> 00:08:48,940 in this case, the author_id is equal to-- well, I could say 24, 147 00:08:48,940 --> 00:08:50,110 but I want to do better-- 148 00:08:50,110 --> 00:08:52,270 I could actually have a very own subquery here 149 00:08:52,270 --> 00:08:54,430 to find Fernanda's ID dynamically. 150 00:08:54,430 --> 00:08:58,810 I could put parentheses here, then hit one, two, three, four, spaces, 151 00:08:58,810 --> 00:09:05,560 and say SELECT id from the authors table, WHERE 152 00:09:05,560 --> 00:09:08,530 the name equals Fernanda Melchor-- 153 00:09:08,530 --> 00:09:10,390 that same query from before. 154 00:09:10,390 --> 00:09:13,360 I'll close this out, close out my subquery. 155 00:09:13,360 --> 00:09:20,090 And then if I hit Enter, I should see now the book IDs Fernanda has written. 156 00:09:20,090 --> 00:09:22,900 But there's still one more step here. 157 00:09:22,900 --> 00:09:24,580 What do I now need to do? 158 00:09:24,580 --> 00:09:28,060 I have book IDs that Fernando wrote but how 159 00:09:28,060 --> 00:09:33,160 do you think I could find the titles from those book IDs? 160 00:09:33,160 --> 00:09:35,698 AUDIENCE: You could select the title from the books. 161 00:09:35,698 --> 00:09:36,490 CARTER ZENKE: Nice. 162 00:09:36,490 --> 00:09:38,780 When you select title from that books table-- 163 00:09:38,780 --> 00:09:42,490 so if you remember, we have titles inside of our books table, 164 00:09:42,490 --> 00:09:46,240 but we still need to link those titles through their ID 165 00:09:46,240 --> 00:09:48,380 to the book IDs Fernanda has written. 166 00:09:48,380 --> 00:09:50,410 So let's do one last step here. 167 00:09:50,410 --> 00:09:52,670 I'll come back to my terminal. 168 00:09:52,670 --> 00:09:54,940 And I'll write the query in full now. 169 00:09:54,940 --> 00:09:57,430 So I know I want to end up with titles. 170 00:09:57,430 --> 00:10:02,230 I'll say SELECT, in this case, title from the books table. 171 00:10:02,230 --> 00:10:05,890 Now, I want to find particular titles where 172 00:10:05,890 --> 00:10:09,910 the ID is in that list of books Fernanda has written. 173 00:10:09,910 --> 00:10:16,660 So I'll say WHERE the id is IN some subquery, the query we actually 174 00:10:16,660 --> 00:10:17,590 just wrote. 175 00:10:17,590 --> 00:10:18,820 So let me write it again. 176 00:10:18,820 --> 00:10:22,390 I'll say one, two, three, four spaces here to indent for style. 177 00:10:22,390 --> 00:10:29,110 Then I'll say SELECT the book_id from the authored table, in this case, 178 00:10:29,110 --> 00:10:34,180 WHERE the author_id is equal to some number-- 179 00:10:34,180 --> 00:10:36,370 in this case, Fernanda's ID. 180 00:10:36,370 --> 00:10:38,170 But what is Fernanda's ID? 181 00:10:38,170 --> 00:10:39,190 Let's find out. 182 00:10:39,190 --> 00:10:44,140 I'll make another subquery here to find Fernanda's ID, indenting 183 00:10:44,140 --> 00:10:46,270 four times just for style here. 184 00:10:46,270 --> 00:10:50,200 Then I'll say SELECT id from the authors table-- 185 00:10:50,200 --> 00:10:51,640 and then again indent-- 186 00:10:51,640 --> 00:10:56,380 WHERE the name equals Fernanda Melchor. 187 00:10:56,380 --> 00:11:01,600 Then I'll close out all my queries to create this single long query to find 188 00:11:01,600 --> 00:11:04,370 all the books Fernanda has written. 189 00:11:04,370 --> 00:11:08,170 And, again, the order here is first find Fernanda's ID, 190 00:11:08,170 --> 00:11:11,440 then find the book IDs associated with that ID, 191 00:11:11,440 --> 00:11:16,300 then find the titles associated with those book IDs themselves. 192 00:11:16,300 --> 00:11:23,110 Now I'll hit Enter and I should see all those books that Fernanda has written. 193 00:11:23,110 --> 00:11:26,150 Now, as we talked about, this is a lot of complexity, 194 00:11:26,150 --> 00:11:28,480 a lot of tables to go through, and there's probably 195 00:11:28,480 --> 00:11:30,110 a better way to do this. 196 00:11:30,110 --> 00:11:32,590 So let me open up a new terminal and try out 197 00:11:32,590 --> 00:11:35,140 a different way of going about this. 198 00:11:35,140 --> 00:11:37,960 I'll make a new connection to my database. 199 00:11:37,960 --> 00:11:43,420 I'll make a new terminal and I'll say sqlite3 longlist.db, hit Enter. 200 00:11:43,420 --> 00:11:48,160 Now I should see that I have a brand new connection to my database. 201 00:11:48,160 --> 00:11:52,220 And I could try to find a better way to do this. 202 00:11:52,220 --> 00:11:56,770 Let me try to do what we did visually and join these tables together. 203 00:11:56,770 --> 00:12:04,820 Well I could try SELECT name and title FROM the authors table. 204 00:12:04,820 --> 00:12:09,100 But notice how authors table doesn't have titles in it. 205 00:12:09,100 --> 00:12:12,850 It does have the name of the author, but it doesn't have book title. 206 00:12:12,850 --> 00:12:17,060 So to get titles, I have to join in some new table here. 207 00:12:17,060 --> 00:12:24,490 I'll hit Enter and I'll say JOIN, let's say, the authored table on authors 208 00:12:24,490 --> 00:12:30,820 dot id equals authored dot author_id. 209 00:12:30,820 --> 00:12:36,610 And all I'm doing here is saying I'll take first the authors table and then 210 00:12:36,610 --> 00:12:40,540 why don't I try to combine this table with the authored table, 211 00:12:40,540 --> 00:12:44,320 linking it up by looking at the authors dot ID column 212 00:12:44,320 --> 00:12:47,050 and the authored dot author_id column. 213 00:12:47,050 --> 00:12:50,380 And visually, let's take a look what we're doing here in this join. 214 00:12:50,380 --> 00:12:54,640 If I look at this in my slides here, I should see-- 215 00:12:54,640 --> 00:12:56,980 I started with my authors table. 216 00:12:56,980 --> 00:13:03,370 Here I had names of those authors and I had an ID column in that author table. 217 00:13:03,370 --> 00:13:06,850 Now I'm trying to join the authored table 218 00:13:06,850 --> 00:13:12,940 by telling SQL that these two columns, author_id and id, should align. 219 00:13:12,940 --> 00:13:17,170 They should line up like this so their rows are all together. 220 00:13:17,170 --> 00:13:19,030 But there's still one more step. 221 00:13:19,030 --> 00:13:24,260 I've joined the author table and the authored table, 222 00:13:24,260 --> 00:13:27,940 but what's the next step to find book titles? 223 00:13:27,940 --> 00:13:31,280 I still have to join the books table. 224 00:13:31,280 --> 00:13:32,800 So let's try that here. 225 00:13:32,800 --> 00:13:36,970 I'll come back over and instead of joining just the authored table, 226 00:13:36,970 --> 00:13:39,970 let me now try to also join-- 227 00:13:39,970 --> 00:13:41,530 let me hit Enter-- 228 00:13:41,530 --> 00:13:44,080 to also join the books table. 229 00:13:44,080 --> 00:13:55,960 I'll JOIN books ON books dot id equals, in this case, authored dot book_id. 230 00:13:55,960 --> 00:14:00,760 And to be clear, this line here means that what I'm trying to do 231 00:14:00,760 --> 00:14:07,270 is now join this book table, which is over here, with this author table, 232 00:14:07,270 --> 00:14:11,530 aligning it so that their IDs match up into a single row, 233 00:14:11,530 --> 00:14:17,180 as we see here with 74 and 74, 1 and 1, and so on. 234 00:14:17,180 --> 00:14:17,950 I'll come back. 235 00:14:17,950 --> 00:14:20,690 And now let me try this. 236 00:14:20,690 --> 00:14:21,700 Come back over here. 237 00:14:21,700 --> 00:14:28,300 And I'll just try hitting Enter to see what I can see from this query. 238 00:14:28,300 --> 00:14:31,000 Let me zoom out a little bit and you'll see 239 00:14:31,000 --> 00:14:38,020 I actually do have author names next to book titles all in one table, 240 00:14:38,020 --> 00:14:40,970 just as we thought we would. 241 00:14:40,970 --> 00:14:47,680 So this is useful to me, but what if I wanted to have this stored as a way 242 00:14:47,680 --> 00:14:49,670 to see my tables? 243 00:14:49,670 --> 00:14:54,530 I want to be able to query this kind of table I see right here. 244 00:14:54,530 --> 00:14:59,230 Well for that, I could try to introduce a view. 245 00:14:59,230 --> 00:15:03,670 Now a view starts with a query, as we just saw. 246 00:15:03,670 --> 00:15:08,980 I wrote a query to select names and titles from three different tables, 247 00:15:08,980 --> 00:15:10,880 joining them together. 248 00:15:10,880 --> 00:15:14,180 But if I want to save the results of that query 249 00:15:14,180 --> 00:15:19,700 so I can query them later on, I could use this syntax here create 250 00:15:19,700 --> 00:15:26,310 a view that has a name that I can give to it as this query down below here. 251 00:15:26,310 --> 00:15:30,230 So CREATE VIEW given some name AS the result of this query 252 00:15:30,230 --> 00:15:35,400 gives me a view that is then saved and part of my schema. 253 00:15:35,400 --> 00:15:40,190 So let's try that here to create a view by which you can see both author 254 00:15:40,190 --> 00:15:43,220 and book titles side by side. 255 00:15:43,220 --> 00:15:48,650 I'll come back to SQLite here and I'm already in this brand new terminal. 256 00:15:48,650 --> 00:15:54,740 So I could try to remake this view and save it now for later usage. 257 00:15:54,740 --> 00:15:58,760 I'll say create a view and call it longlist, 258 00:15:58,760 --> 00:16:03,260 the same name we gave our single table back in Week 0. 259 00:16:03,260 --> 00:16:10,430 Now I'll say this view is the result of this following query AS, 260 00:16:10,430 --> 00:16:16,400 let's say, SELECT name and title from the authors table-- 261 00:16:16,400 --> 00:16:18,650 but not just the author's table-- 262 00:16:18,650 --> 00:16:22,010 join, let's say, the authored table-- 263 00:16:22,010 --> 00:16:30,530 JOIN authored on authors dot id equals authored dot author_id. 264 00:16:30,530 --> 00:16:34,760 And then join again our book table as we saw visually here. 265 00:16:34,760 --> 00:16:46,310 JOIN books ON books dot id equals authored dot book_id. 266 00:16:46,310 --> 00:16:52,220 Now that is the query I used before to see author names and book titles side 267 00:16:52,220 --> 00:16:53,000 by side. 268 00:16:53,000 --> 00:16:55,130 If I create this as a view-- 269 00:16:55,130 --> 00:16:57,230 hitting semicolon and Enter-- 270 00:16:57,230 --> 00:16:58,910 nothing quite happens. 271 00:16:58,910 --> 00:17:07,579 But now if I type .schema and hit Enter, I should see a view called longlist 272 00:17:07,579 --> 00:17:12,560 that has two columns, name and title, and actually see the query I use 273 00:17:12,560 --> 00:17:15,470 to create this view. 274 00:17:15,470 --> 00:17:17,390 So let me think now. 275 00:17:17,390 --> 00:17:20,030 I want to see what's inside this view. 276 00:17:20,030 --> 00:17:24,290 And to query a view I can use it exactly as I would a table. 277 00:17:24,290 --> 00:17:30,380 I could say SELECT star FROM longlist semicolon hit Enter, 278 00:17:30,380 --> 00:17:36,300 and now I should see author names and book titles side by side. 279 00:17:36,300 --> 00:17:42,620 So if I have this kind of table with authors and book title side by side, 280 00:17:42,620 --> 00:17:48,110 what then does my query become to find the books written by Fernanda? 281 00:17:48,110 --> 00:17:50,510 Let me ask our audience here. 282 00:17:50,510 --> 00:17:57,170 What query could I use now on this view to find the books written by Fernanda? 283 00:17:57,170 --> 00:17:58,820 AUDIENCE: Thank you, Mr. [INAUDIBLE]. 284 00:17:58,820 --> 00:18:05,787 We can query the book or all of the books using her specific name. 285 00:18:05,787 --> 00:18:06,870 CARTER ZENKE: A good idea. 286 00:18:06,870 --> 00:18:10,882 I can use a particular name in this case and perhaps just use a simple WHERE. 287 00:18:10,882 --> 00:18:11,840 So let's try that here. 288 00:18:11,840 --> 00:18:13,740 I'll go back to my terminal. 289 00:18:13,740 --> 00:18:16,850 And if I want to find the books written by Fernanda, 290 00:18:16,850 --> 00:18:19,310 well, I could just query for them in this single table 291 00:18:19,310 --> 00:18:23,270 that I created called longlist-- or not really a table but now a view. 292 00:18:23,270 --> 00:18:28,115 I'll say, in this case, SELECT the title from-- 293 00:18:28,115 --> 00:18:31,370 SELECT title from longlist WHERE-- 294 00:18:31,370 --> 00:18:32,460 to Muhammad's point-- 295 00:18:32,460 --> 00:18:40,340 I could just say name equals Fernanda Melchor semicolon. 296 00:18:40,340 --> 00:18:43,850 I'll hit Enter here and I see those same books 297 00:18:43,850 --> 00:18:47,990 Fernanda has written Paradais and Hurricane Season. 298 00:18:47,990 --> 00:18:49,940 So let's see where we started here. 299 00:18:49,940 --> 00:18:52,880 We started with this kind of query-- 300 00:18:52,880 --> 00:18:57,650 long, multiple lines, many subqueries involved-- just 301 00:18:57,650 --> 00:19:01,520 to find the answer to which books Fernanda had written. 302 00:19:01,520 --> 00:19:06,620 Now, though, that we have a new view, I can simplify this query 303 00:19:06,620 --> 00:19:08,210 to just this here-- 304 00:19:08,210 --> 00:19:12,740 SELECT title FROM longlist WHERE the name equals Fernanda. 305 00:19:12,740 --> 00:19:18,170 And I can do all of this without using much, if any, more disk space. 306 00:19:18,170 --> 00:19:21,560 A view, remember, is a virtual table, not a real one. 307 00:19:21,560 --> 00:19:25,220 And I can then have the result organized like this while still 308 00:19:25,220 --> 00:19:30,840 being in those underlying tables stored essentially elsewhere. 309 00:19:30,840 --> 00:19:32,150 So let me ask then-- 310 00:19:32,150 --> 00:19:37,670 what questions we have on how views can simplify our data? 311 00:19:37,670 --> 00:19:43,580 AUDIENCE: Can we then somehow manipulate the views to maybe order the results 312 00:19:43,580 --> 00:19:46,185 or display them just differently? 313 00:19:46,185 --> 00:19:47,310 CARTER ZENKE: A great idea. 314 00:19:47,310 --> 00:19:49,760 So if I wanted to maybe order these books, 315 00:19:49,760 --> 00:19:53,700 I could absolutely do that in the same way I would do in a table. 316 00:19:53,700 --> 00:19:55,670 So let me go back and show you how to do that. 317 00:19:55,670 --> 00:20:00,390 Let's say I want to organize these books alphabetically by title. 318 00:20:00,390 --> 00:20:04,910 So if you remember, using our regular old table, we could use ORDER BY. 319 00:20:04,910 --> 00:20:06,510 I could do the same thing here. 320 00:20:06,510 --> 00:20:13,520 I could say SELECT both the name and title columns from my longlist table. 321 00:20:13,520 --> 00:20:20,060 And now let me try to ORDER BY the title column in this case 322 00:20:20,060 --> 00:20:21,430 and I'll hit semicolon. 323 00:20:21,430 --> 00:20:28,300 Now, if I hit Enter, I should see the book titles organized alphabetically. 324 00:20:28,300 --> 00:20:31,690 Now, the view itself does not have these titles 325 00:20:31,690 --> 00:20:36,190 automatically alphabetically sorted for me, but I could make it do that. 326 00:20:36,190 --> 00:20:41,290 If I had, when I was making this view included ORDER BY, let's say, title 327 00:20:41,290 --> 00:20:46,480 at the end of that query, I would then have those titles saved in this view 328 00:20:46,480 --> 00:20:49,900 automatically as A to Z. So very powerful 329 00:20:49,900 --> 00:20:53,200 if you want to not just find particular ways of seeing your data 330 00:20:53,200 --> 00:20:57,080 but also organizing it or ordering it too. 331 00:20:57,080 --> 00:21:02,080 So with that note, we've seen how to simplify one example of having 332 00:21:02,080 --> 00:21:04,360 multiple tables down into one. 333 00:21:04,360 --> 00:21:07,990 We'll come back in just a few and see what we can aggregate data too, 334 00:21:07,990 --> 00:21:10,180 putting that data into a single table for us 335 00:21:10,180 --> 00:21:12,850 to see some statistics and such as well. 336 00:21:12,850 --> 00:21:15,040 Come back in a few. 337 00:21:15,040 --> 00:21:16,300 And we're back. 338 00:21:16,300 --> 00:21:21,070 So we just saw how to simplify some data, how to take multiple tables 339 00:21:21,070 --> 00:21:23,350 and see them as one. 340 00:21:23,350 --> 00:21:26,230 What we'll do now is think about how we can aggregate 341 00:21:26,230 --> 00:21:29,050 data, that is to take many individual data points 342 00:21:29,050 --> 00:21:33,100 and summarize them to find an average, or a min, or a max, 343 00:21:33,100 --> 00:21:36,640 but see those results inside of a view. 344 00:21:36,640 --> 00:21:42,250 So if you remember in our longlist database, we had a table of ratings. 345 00:21:42,250 --> 00:21:47,290 And in this case, books had individual ratings from individual people, 346 00:21:47,290 --> 00:21:50,950 like maybe I gave book_id 1 a four out of five stars, 347 00:21:50,950 --> 00:21:53,530 but David gave it only a three. 348 00:21:53,530 --> 00:21:57,070 Now, we store these individual ratings in our database, 349 00:21:57,070 --> 00:22:00,700 but ultimately I don't care so much what I rated it, 350 00:22:00,700 --> 00:22:04,540 or maybe what David rated it, I care about the average rating, let's say, 351 00:22:04,540 --> 00:22:06,760 for each book. 352 00:22:06,760 --> 00:22:10,480 But if we care about those averages, why then would 353 00:22:10,480 --> 00:22:13,270 we store all the individual ratings? 354 00:22:13,270 --> 00:22:15,490 Let's think of this as a design question here. 355 00:22:15,490 --> 00:22:17,920 Can you think of a reason we might want to actually 356 00:22:17,920 --> 00:22:22,450 have these individual ratings stored in our database? 357 00:22:22,450 --> 00:22:25,360 What does that give us at the end of the day? 358 00:22:25,360 --> 00:22:29,458 AUDIENCE: Perhaps reliability, to know how many persons rated the book. 359 00:22:29,458 --> 00:22:30,250 CARTER ZENKE: Nice. 360 00:22:30,250 --> 00:22:32,080 So it's nice to have some more data. 361 00:22:32,080 --> 00:22:35,110 I could know how, let's say, I rated a book or how you rated a book. 362 00:22:35,110 --> 00:22:36,580 That's good here too. 363 00:22:36,580 --> 00:22:39,550 And I would argue that from having this additional data, 364 00:22:39,550 --> 00:22:42,430 we could compute more statistics. 365 00:22:42,430 --> 00:22:45,460 I could find not just the average rating but also 366 00:22:45,460 --> 00:22:49,520 the minimum rating, or the maximum rating, or the median rating, 367 00:22:49,520 --> 00:22:50,060 and so on. 368 00:22:50,060 --> 00:22:53,590 So these individual results allow us to find and compute 369 00:22:53,590 --> 00:22:57,250 more statistics, like averages, mins, and maxes, 370 00:22:57,250 --> 00:23:00,880 as opposed to just storing the average. 371 00:23:00,880 --> 00:23:05,890 Now, if I wanted to find the average of these results, what could I do? 372 00:23:05,890 --> 00:23:06,880 We saw before-- 373 00:23:06,880 --> 00:23:11,470 I could try to group ratings buy individual book IDs. 374 00:23:11,470 --> 00:23:15,400 Let's say I want to find the ratings for book_id 1 and book_id 2. 375 00:23:15,400 --> 00:23:20,620 I could group by the book ID column, turning these into individual groups. 376 00:23:20,620 --> 00:23:25,120 And let's say I want to find the average here using a SQL query. 377 00:23:25,120 --> 00:23:28,810 I could say find me the average of the rating column 378 00:23:28,810 --> 00:23:31,840 when I have groups by book ID. 379 00:23:31,840 --> 00:23:35,960 That effectively takes my table like this and says, 380 00:23:35,960 --> 00:23:40,240 OK, I'll combine all the ratings for each individual book ID 381 00:23:40,240 --> 00:23:42,670 and find for you the average. 382 00:23:42,670 --> 00:23:46,510 I'll first compress this data together, literally putting it one on 383 00:23:46,510 --> 00:23:50,180 top the other here, and then find the average of all those results-- 384 00:23:50,180 --> 00:23:55,000 let's say, 3.67 book 1 and 2.5 for book 2-- 385 00:23:55,000 --> 00:23:57,520 compressing all of our data into a single result-- 386 00:23:57,520 --> 00:24:00,170 in this case, the average. 387 00:24:00,170 --> 00:24:05,200 So let's walk through then how we found the average using a SQL query 388 00:24:05,200 --> 00:24:08,420 and then we'll turn that into a view. 389 00:24:08,420 --> 00:24:13,660 So I'll come back over here to that computer and I'll open up longlist.db. 390 00:24:13,660 --> 00:24:19,540 I'll already here, so I can type .schema ratings to see the schema for this 391 00:24:19,540 --> 00:24:20,630 ratings table. 392 00:24:20,630 --> 00:24:26,630 And you'll see it does have a book_id column and a rating column as well. 393 00:24:26,630 --> 00:24:31,120 So if I want to find the average rating for each individual book ID, 394 00:24:31,120 --> 00:24:32,330 what could I do? 395 00:24:32,330 --> 00:24:39,710 I could try to SELECT the book_id column and the rating column like this. 396 00:24:39,710 --> 00:24:43,570 But I don't want to select just the rating column itself 397 00:24:43,570 --> 00:24:46,210 that gives me individual values. 398 00:24:46,210 --> 00:24:50,230 I want to find the average rating instead. 399 00:24:50,230 --> 00:24:55,960 I could instead say find me the average rating from this table. 400 00:24:55,960 --> 00:24:58,000 Now let me give this a new name. 401 00:24:58,000 --> 00:25:03,730 I'll say AS rating, calling the average rating now newly, in this case, rating. 402 00:25:03,730 --> 00:25:06,560 And I'll select this FROM the ratings table. 403 00:25:06,560 --> 00:25:10,150 And finally I need to GROUP BY something. 404 00:25:10,150 --> 00:25:12,880 I need to GROUP BY book_id. 405 00:25:12,880 --> 00:25:18,740 That means I'll calculate the average rating for each book ID. 406 00:25:18,740 --> 00:25:25,820 I'll hit semicolon here and Enter and I should see all of these book IDs, 407 00:25:25,820 --> 00:25:30,110 top to bottom, with their own rating. 408 00:25:30,110 --> 00:25:33,140 But here it doesn't look so pretty. 409 00:25:33,140 --> 00:25:39,560 What could I do to try to make this not so long, like 3.774019431-- 410 00:25:39,560 --> 00:25:42,950 there's a better way to do this and we saw it before. 411 00:25:42,950 --> 00:25:47,000 What could I do to enhance this view of my ratings table? 412 00:25:47,000 --> 00:25:50,690 AUDIENCE: I see some suggestions about rounding the ratings. 413 00:25:50,690 --> 00:25:55,505 But I was also going to suggest adding the book titles, of course. 414 00:25:55,505 --> 00:25:57,290 CARTER ZENKE: Ah, good point here too. 415 00:25:57,290 --> 00:26:00,550 So we first want to add the-- first we want to probably add the book title 416 00:26:00,550 --> 00:26:03,620 to see what titles are talking about-- not just, in this case, 417 00:26:03,620 --> 00:26:04,910 individual book IDs-- 418 00:26:04,910 --> 00:26:08,240 but also a good point to say let's try rounding these 419 00:26:08,240 --> 00:26:11,360 so we don't see 3.7740194. 420 00:26:11,360 --> 00:26:14,850 Let's try to round that maybe to two decimal places just to be safe. 421 00:26:14,850 --> 00:26:17,990 So let's first try rounding this and then 422 00:26:17,990 --> 00:26:21,170 we'll try adding in some book titles to your suggestion here. 423 00:26:21,170 --> 00:26:24,540 I'll come back over and let me just try rounding it first. 424 00:26:24,540 --> 00:26:28,280 So I will come back to my terminal and I'll 425 00:26:28,280 --> 00:26:32,540 say I want to find the average rating for each book_id. 426 00:26:32,540 --> 00:26:35,780 And I want to select, not just the average rating 427 00:26:35,780 --> 00:26:39,150 as we did before, but the rounded average rating. 428 00:26:39,150 --> 00:26:43,310 So I can use ROUND and say take the average rating 429 00:26:43,310 --> 00:26:46,370 and round it to two decimal places. 430 00:26:46,370 --> 00:26:50,260 And I'll call this column just rating in the final result. 431 00:26:50,260 --> 00:26:53,360 And I'll select this from the ratings table. 432 00:26:53,360 --> 00:26:59,420 And now I can group by, in this case, book_id Enter. 433 00:26:59,420 --> 00:27:04,490 I should see I have rounded ratings now, a little more pretty than before. 434 00:27:04,490 --> 00:27:10,100 Now, to your point, let's try to add in not just book IDs but also book titles. 435 00:27:10,100 --> 00:27:12,330 What does it mean for a book ID to have some rating? 436 00:27:12,330 --> 00:27:15,000 I want to see titles here in the end. 437 00:27:15,000 --> 00:27:16,520 So let's do that. 438 00:27:16,520 --> 00:27:22,130 I'll say SELECT book_id, and also SELECT title, 439 00:27:22,130 --> 00:27:27,050 and maybe even select the year that book was nominated for the Booker Prize. 440 00:27:27,050 --> 00:27:34,730 I'll then select the rounded average rating to two decimal places as rating. 441 00:27:34,730 --> 00:27:39,860 And I want to select this from the ratings table, from ratings. 442 00:27:39,860 --> 00:27:41,720 But what do you notice? 443 00:27:41,720 --> 00:27:46,520 If I were to select just from ratings, what data am I missing 444 00:27:46,520 --> 00:27:49,550 and where could I find it? 445 00:27:49,550 --> 00:27:54,352 AUDIENCE: Hey, you're missing the titles from the books table. 446 00:27:54,352 --> 00:27:56,060 CARTER ZENKE: Yeah, I have to have titles 447 00:27:56,060 --> 00:27:57,930 and those are stored in the books table. 448 00:27:57,930 --> 00:27:59,060 So what should we do? 449 00:27:59,060 --> 00:28:03,410 Probably just join in the books table matching up that primary key 450 00:28:03,410 --> 00:28:04,583 with some foreign keys. 451 00:28:04,583 --> 00:28:06,500 So I'll come back over here and we'll do that. 452 00:28:06,500 --> 00:28:09,120 I want to select from ratings, of course, 453 00:28:09,120 --> 00:28:12,740 but I also want to join, in this case, the book table. 454 00:28:12,740 --> 00:28:18,360 So I'll hit Enter and now I'll say JOIN as well the books table. 455 00:28:18,360 --> 00:28:22,020 Now, how can I match up the books table with the rating table? 456 00:28:22,020 --> 00:28:27,266 Well, I know that the rating table has a foreign key called book_id. 457 00:28:27,266 --> 00:28:30,800 And the book table has a primary key called id. 458 00:28:30,800 --> 00:28:34,890 So I'll tell SQL I want to match these two up a bit like this. 459 00:28:34,890 --> 00:28:39,590 I'll say JOIN books ON ratings dot book_id-- 460 00:28:39,590 --> 00:28:42,170 that foreign key that's inside of ratings-- 461 00:28:42,170 --> 00:28:47,450 equals, in this case, books dot id. 462 00:28:47,450 --> 00:28:52,070 Now, there's one more step, which is to group by book_id still. 463 00:28:52,070 --> 00:28:55,490 So I'll hit Enter and GROUP BY book_id. 464 00:28:55,490 --> 00:28:59,270 But now I should be able to hit semicolon and Enter 465 00:28:59,270 --> 00:29:06,440 and I should see, not just book IDs, but now titles and the years those books 466 00:29:06,440 --> 00:29:08,730 were nominated as well. 467 00:29:08,730 --> 00:29:11,390 So this view is getting better and better. 468 00:29:11,390 --> 00:29:14,780 And I might say we can actually go ahead and just add this to our schema. 469 00:29:14,780 --> 00:29:19,200 We could create a view that returns to us the results of this query. 470 00:29:19,200 --> 00:29:20,970 So let's try that. 471 00:29:20,970 --> 00:29:27,050 I'll say, create a view called maybe average book ratings, average book 472 00:29:27,050 --> 00:29:30,140 ratings for people to come in and do analysis on if they'd like. 473 00:29:30,140 --> 00:29:36,870 Then I could say, I want to create this view as that query we wrote before. 474 00:29:36,870 --> 00:29:42,230 So I'll say, again, select the book_id, the title, 475 00:29:42,230 --> 00:29:48,350 and let's say the year column, the rounded average rating 476 00:29:48,350 --> 00:29:51,200 to two decimal places. 477 00:29:51,200 --> 00:29:54,710 And then I want to select as rating. 478 00:29:54,710 --> 00:30:00,050 I'll select this from the ratings table, from ratings. 479 00:30:00,050 --> 00:30:03,350 And then I'll join, as we did before, the book table from ratings 480 00:30:03,350 --> 00:30:08,300 and then join books on that primary key and not foreign key. 481 00:30:08,300 --> 00:30:14,270 So I'll say on ratings.book_id that foreign key in ratings, 482 00:30:14,270 --> 00:30:19,650 and set that equal to books.id, that primary key we're going to use to join 483 00:30:19,650 --> 00:30:21,030 these tables together. 484 00:30:21,030 --> 00:30:24,120 Then I'll say group by book_id. 485 00:30:24,120 --> 00:30:28,890 And I should have if I hit Enter my very own view called average book 486 00:30:28,890 --> 00:30:33,420 ratings that I can use to then see the averages for each of these books. 487 00:30:33,420 --> 00:30:41,260 I'll say select-- let's say star-- from average_book_ratings semicolon. 488 00:30:41,260 --> 00:30:47,080 And now I should see all of those results just as we had before. 489 00:30:47,080 --> 00:30:49,950 Now notably, because this is in a view, I'm 490 00:30:49,950 --> 00:30:54,270 not using much, if any, additional space in my database. 491 00:30:54,270 --> 00:30:58,620 I still have underlying this view individual ratings. 492 00:30:58,620 --> 00:31:03,000 But because I saved this query and the results it created in a view, 493 00:31:03,000 --> 00:31:06,990 I can actually have access to the rounded average ratings 494 00:31:06,990 --> 00:31:08,050 for each of these. 495 00:31:08,050 --> 00:31:11,290 And I can query that view as well. 496 00:31:11,290 --> 00:31:16,590 So let me ask then what questions we have on how we've used views 497 00:31:16,590 --> 00:31:20,250 to aggregate some data so far? 498 00:31:20,250 --> 00:31:22,060 AUDIENCE: Actually, I have a question. 499 00:31:22,060 --> 00:31:28,350 Can we use both the same name, like before it was long list. 500 00:31:28,350 --> 00:31:31,530 And here we're using for average something. 501 00:31:31,530 --> 00:31:34,587 Can use long list for both of them? 502 00:31:34,587 --> 00:31:37,170 CARTER ZENKE: Yeah, so if you remember before we had this view 503 00:31:37,170 --> 00:31:38,490 called long list. 504 00:31:38,490 --> 00:31:43,780 And long list had book titles and authors inside of it. 505 00:31:43,780 --> 00:31:47,250 And I think you're asking, could we try to join, let's say, 506 00:31:47,250 --> 00:31:50,430 the long list table with our ratings to get book titles and so on? 507 00:31:50,430 --> 00:31:54,720 We couldn't do that because we don't have the ID column in that long list 508 00:31:54,720 --> 00:31:55,860 view. 509 00:31:55,860 --> 00:31:58,410 We could though perhaps write some queries that 510 00:31:58,410 --> 00:32:02,910 involve that long list view as well. 511 00:32:02,910 --> 00:32:07,170 OK, let's keep going here and seeing what advantages this view 512 00:32:07,170 --> 00:32:07,920 might give us. 513 00:32:07,920 --> 00:32:09,960 So I'll come back over here. 514 00:32:09,960 --> 00:32:14,640 And one more advantage is that if I were to add in some new rating 515 00:32:14,640 --> 00:32:18,870 to my ratings column, I could just requery this view 516 00:32:18,870 --> 00:32:20,980 to find the updated average. 517 00:32:20,980 --> 00:32:24,490 So let's say we use some INSERT INTO statement here 518 00:32:24,490 --> 00:32:26,400 and I update some books ratings. 519 00:32:26,400 --> 00:32:30,570 Well, if I'm a data analyst and I come back in a little later 520 00:32:30,570 --> 00:32:34,080 and I want to find the updated rating, I can do it just again, 521 00:32:34,080 --> 00:32:40,080 by selecting from, selecting star, from average book ratings like this, 522 00:32:40,080 --> 00:32:42,960 average book ratings. 523 00:32:42,960 --> 00:32:44,130 Hit Enter. 524 00:32:44,130 --> 00:32:47,880 And now I see perhaps some updated averages as well. 525 00:32:47,880 --> 00:32:52,350 Every time I query this view, I'll rerun essentially that query from before. 526 00:32:52,350 --> 00:32:57,420 And now I'll see the updated averages for each book. 527 00:32:57,420 --> 00:33:03,540 Now, sometimes as a data analyst, I might want to create a view 528 00:33:03,540 --> 00:33:07,500 but not store it long term as part of my schema. 529 00:33:07,500 --> 00:33:11,910 Like here if I type dot schema, I see both our prior view 530 00:33:11,910 --> 00:33:14,220 called long list right here. 531 00:33:14,220 --> 00:33:17,880 And I also see our average book ratings view. 532 00:33:17,880 --> 00:33:21,120 But if I am somebody who comes in and I maybe 533 00:33:21,120 --> 00:33:26,580 want to find the average book ratings not by each book, but by each year, 534 00:33:26,580 --> 00:33:30,780 I could do that, but then not store that view long term. 535 00:33:30,780 --> 00:33:32,910 Maybe I don't want others to see this view. 536 00:33:32,910 --> 00:33:37,000 Or maybe it's just from my own analysis on this given day. 537 00:33:37,000 --> 00:33:41,730 Well, to do that, I could move beyond just create view. 538 00:33:41,730 --> 00:33:46,320 And I could also enhance this to be create temporary view, where 539 00:33:46,320 --> 00:33:49,500 a temporary view exists only for the duration 540 00:33:49,500 --> 00:33:52,300 of my connection with the database. 541 00:33:52,300 --> 00:33:56,580 So to be clear, that means if I type SQLite three long list dot DB, 542 00:33:56,580 --> 00:33:59,100 I'm opening a connection to the database. 543 00:33:59,100 --> 00:34:03,720 If I then type dot quit, I would then lose that connection, 544 00:34:03,720 --> 00:34:04,650 quit that connection. 545 00:34:04,650 --> 00:34:07,810 And that view would no longer be there. 546 00:34:07,810 --> 00:34:09,060 So let's try this. 547 00:34:09,060 --> 00:34:10,449 I'll come back to my terminal. 548 00:34:10,449 --> 00:34:12,630 And we'll create a temporary view that shows 549 00:34:12,630 --> 00:34:16,469 us the average ratings for the books by the year they were nominated, 550 00:34:16,469 --> 00:34:19,570 not by every individual book here. 551 00:34:19,570 --> 00:34:23,550 So I could try doing the same query from before, 552 00:34:23,550 --> 00:34:27,300 like select, let's say, the average rating select it 553 00:34:27,300 --> 00:34:31,030 from the ratings table, join books, and titles, and so on. 554 00:34:31,030 --> 00:34:34,710 But what do I have already to build on top of? 555 00:34:34,710 --> 00:34:41,820 If I wanted to find a view that has a title, a rating, and even 556 00:34:41,820 --> 00:34:44,130 let's say the year that book was published, 557 00:34:44,130 --> 00:34:48,610 what do I already have in my schema for that? 558 00:34:48,610 --> 00:34:52,857 AUDIENCE: So we could use the view that we just created? 559 00:34:52,857 --> 00:34:54,190 CARTER ZENKE: Yeah, a nice idea. 560 00:34:54,190 --> 00:34:57,450 So we could actually use the same view we just created 561 00:34:57,450 --> 00:34:59,700 to create ourselves a brand new view. 562 00:34:59,700 --> 00:35:01,890 So just because something is a view doesn't mean 563 00:35:01,890 --> 00:35:05,140 we can't use it to then create another view as well. 564 00:35:05,140 --> 00:35:06,465 So we'll do exactly that. 565 00:35:06,465 --> 00:35:08,400 We'll come back over here. 566 00:35:08,400 --> 00:35:12,570 And I can type dot schema to see I still have 567 00:35:12,570 --> 00:35:15,510 this view called average book ratings. 568 00:35:15,510 --> 00:35:20,160 And in it I have book IDs, their title, the year they were nominated, 569 00:35:20,160 --> 00:35:21,810 and their average rating. 570 00:35:21,810 --> 00:35:25,080 But now I want to find, let's say, the average rating 571 00:35:25,080 --> 00:35:28,680 for each individual year, not for each individual book. 572 00:35:28,680 --> 00:35:30,160 So what could I do? 573 00:35:30,160 --> 00:35:36,360 I could say select the year column and the rounded average rating, 574 00:35:36,360 --> 00:35:46,290 in this case, from the rating column in average_book_ratings. 575 00:35:46,290 --> 00:35:47,585 And I'll wrap this query here. 576 00:35:47,585 --> 00:35:49,710 Let me zoom out, so you can see it all on one line. 577 00:35:49,710 --> 00:35:55,270 Now, I'll say I want to group by the year column. 578 00:35:55,270 --> 00:35:58,950 So here I'm now saying, find me the average rating 579 00:35:58,950 --> 00:36:03,900 of each book, of each book's average rating, for each individual year. 580 00:36:03,900 --> 00:36:04,770 I'll hit Enter. 581 00:36:04,770 --> 00:36:09,930 And we should see, if I zoom back in, for 2018 we have 3.75. 582 00:36:09,930 --> 00:36:11,550 For 2019, we have 3.64. 583 00:36:11,550 --> 00:36:16,012 So this means that apparently in 2023, the average rating 584 00:36:16,012 --> 00:36:18,720 for books that were longlisted for the international Booker Prize 585 00:36:18,720 --> 00:36:22,800 was 3.78, and same for 2022, 3.87. 586 00:36:22,800 --> 00:36:25,110 2021 was 3.69. 587 00:36:25,110 --> 00:36:27,310 So I have this new view here. 588 00:36:27,310 --> 00:36:30,390 I could store this temporarily inside of a view 589 00:36:30,390 --> 00:36:34,330 that I could use while I'm doing some analysis on this given day. 590 00:36:34,330 --> 00:36:37,740 So I'll say create a temporary view. 591 00:36:37,740 --> 00:36:42,030 And I'll call this one average ratings by year. 592 00:36:42,030 --> 00:36:46,530 How were the books rated each individual year on average? 593 00:36:46,530 --> 00:36:49,770 And I'll say this is the query that I just did before. 594 00:36:49,770 --> 00:36:54,660 Select, in this case, the year and the rounded average rating 595 00:36:54,660 --> 00:37:02,280 to two decimal places, as the rating column from average book ratings, 596 00:37:02,280 --> 00:37:04,390 that same view from before. 597 00:37:04,390 --> 00:37:08,190 Now, I'm creating a view from my view. 598 00:37:08,190 --> 00:37:11,610 I'll group by here, group by year. 599 00:37:11,610 --> 00:37:13,140 And then I'll hit Enter. 600 00:37:13,140 --> 00:37:15,510 And I should see nothing popping up. 601 00:37:15,510 --> 00:37:18,810 But now if I wanted to query this view, I 602 00:37:18,810 --> 00:37:23,730 could say select star from average ratings by year. 603 00:37:23,730 --> 00:37:28,230 And I'll see the results of that view. 604 00:37:28,230 --> 00:37:31,470 Now, though, here's the catch and why it's temporary. 605 00:37:31,470 --> 00:37:37,770 If I were to type dot quit now, dot quit, and try it again, 606 00:37:37,770 --> 00:37:41,040 SQLite 3 long list dot DB. 607 00:37:41,040 --> 00:37:42,840 Let me retype that. 608 00:37:42,840 --> 00:37:45,690 SQLite 3 long list dot DB. 609 00:37:45,690 --> 00:37:54,450 If I now try to select star from average ratings by year semicolon, 610 00:37:54,450 --> 00:37:57,060 I'll get no such table and really no such view. 611 00:37:57,060 --> 00:37:59,460 I don't have a view called average ratings by year 612 00:37:59,460 --> 00:38:03,450 anymore because this was temporary. 613 00:38:03,450 --> 00:38:10,200 So let me ask, what questions do we have on temporary tables, if any? 614 00:38:10,200 --> 00:38:12,030 Temporary views rather. 615 00:38:12,030 --> 00:38:13,990 AUDIENCE: Right, so thank you. 616 00:38:13,990 --> 00:38:17,610 So the first thing that I notice is that it just 617 00:38:17,610 --> 00:38:23,850 helps us to do some kind of testing when we use temporary views. 618 00:38:23,850 --> 00:38:29,490 Basically, the views will not be saved in the database. 619 00:38:29,490 --> 00:38:35,280 But again, it's just a good way just to test the query, whether it is usable 620 00:38:35,280 --> 00:38:36,332 or not. 621 00:38:36,332 --> 00:38:38,290 CARTER ZENKE: That's a great use case for them. 622 00:38:38,290 --> 00:38:41,160 And in general, if you are working on a database 623 00:38:41,160 --> 00:38:44,070 and you want to organize the data in some way, 624 00:38:44,070 --> 00:38:46,980 but you don't want to store that organization long term, 625 00:38:46,980 --> 00:38:49,500 a temporary view is great for that. 626 00:38:49,500 --> 00:38:52,500 If though, you are a developer and you want 627 00:38:52,500 --> 00:38:55,210 to ingrain this view as part of your database, 628 00:38:55,210 --> 00:38:57,780 you might just use create view, plain and simple, 629 00:38:57,780 --> 00:39:01,300 to have it always be in your database for you wherever you connect to it 630 00:39:01,300 --> 00:39:03,000 and whenever you connect to it. 631 00:39:03,000 --> 00:39:04,726 Great question. 632 00:39:04,726 --> 00:39:06,090 OK, let's keep going. 633 00:39:06,090 --> 00:39:12,810 And let's introduce now something called a common table expression, or a CTE. 634 00:39:12,810 --> 00:39:16,230 And a common table expression is simply a view 635 00:39:16,230 --> 00:39:19,810 that exists for the duration of a single query. 636 00:39:19,810 --> 00:39:24,750 So a regular view, as we saw before, exists forever in my schema. 637 00:39:24,750 --> 00:39:28,200 A temporary view exists for a single connection. 638 00:39:28,200 --> 00:39:32,280 But a CTE exists for a single query. 639 00:39:32,280 --> 00:39:34,245 And the syntax looks just a bit different. 640 00:39:34,245 --> 00:39:39,720 With a CTE I can write a single query, but at the beginning of that query, 641 00:39:39,720 --> 00:39:43,440 define the CTE, the Common Table Expression. 642 00:39:43,440 --> 00:39:48,690 To do so, I would say with and give it some name, as and then 643 00:39:48,690 --> 00:39:54,580 inside these parentheses say the query I want to be this CTE here. 644 00:39:54,580 --> 00:39:59,820 I could then optionally follow it up with more CTEs for this same query. 645 00:39:59,820 --> 00:40:05,580 Or I could omit the comma, and then use just select from the name of a CTE 646 00:40:05,580 --> 00:40:06,450 up above. 647 00:40:06,450 --> 00:40:11,010 This is useful if you want to have some very temporary view that you don't 648 00:40:11,010 --> 00:40:16,480 want to store in your schema, but you do want to use for this particular query. 649 00:40:16,480 --> 00:40:21,990 So let's see an example now recreating our average ratings by year, 650 00:40:21,990 --> 00:40:24,750 but now using a CTE. 651 00:40:24,750 --> 00:40:26,910 So I'll come back to my computer. 652 00:40:26,910 --> 00:40:29,970 And I'm already in my SQLite terminal. 653 00:40:29,970 --> 00:40:33,930 What I should do, though, is try to drop the prior view. 654 00:40:33,930 --> 00:40:37,230 I want to remove that view called average book 655 00:40:37,230 --> 00:40:41,950 ratings to create room to have now a CTE with that very same name. 656 00:40:41,950 --> 00:40:46,110 So I'll say, drop view and the name of the view I want to drop, 657 00:40:46,110 --> 00:40:53,200 average book ratings, average book ratings, semicolon. 658 00:40:53,200 --> 00:40:58,620 And now I should see that view is no longer in my schema. 659 00:40:58,620 --> 00:41:01,810 So I want to recreate though what we did before, 660 00:41:01,810 --> 00:41:06,520 which was finding the average ratings of books year over year. 661 00:41:06,520 --> 00:41:09,180 So let me try to write this query. 662 00:41:09,180 --> 00:41:12,990 But it would be handy if in that query I had access 663 00:41:12,990 --> 00:41:16,410 to a view that gave me the average rating for each book. 664 00:41:16,410 --> 00:41:18,510 So I'll write the query like this. 665 00:41:18,510 --> 00:41:19,800 I'll use a CTE. 666 00:41:19,800 --> 00:41:26,370 And I'll say, with average book ratings, average book ratings, 667 00:41:26,370 --> 00:41:31,080 as now this query I'll write now inside. 668 00:41:31,080 --> 00:41:32,760 And this query should be familiar. 669 00:41:32,760 --> 00:41:38,280 It's the same query we use to find the average rating for each book. 670 00:41:38,280 --> 00:41:47,820 I'll say here, I want to select the book ID column, the title column, the year 671 00:41:47,820 --> 00:41:53,700 column, and the rounded average rating to two decimal places, 672 00:41:53,700 --> 00:42:02,880 calling it rating, in this case from my ratings table, like this. 673 00:42:02,880 --> 00:42:04,360 Now, what do I need to do? 674 00:42:04,360 --> 00:42:08,850 I also need to, as you saw before, join in my books table. 675 00:42:08,850 --> 00:42:13,380 Join books on the primary and foreign keys aligning. 676 00:42:13,380 --> 00:42:19,440 So I'll say the book ID column, that foreign key in the ratings table, 677 00:42:19,440 --> 00:42:25,140 should equal books dot ID, that primary key column in books. 678 00:42:25,140 --> 00:42:32,160 Then I will group by the book ID to find the average rating for each book. 679 00:42:32,160 --> 00:42:35,820 And now, let me close out this CTE. 680 00:42:35,820 --> 00:42:38,010 I'll close my parentheses. 681 00:42:38,010 --> 00:42:42,300 And because I'm only using one CTE, I don't need a comma. 682 00:42:42,300 --> 00:42:46,170 I can then just write the rest of my query. 683 00:42:46,170 --> 00:42:48,250 I'll hit Enter for style sake. 684 00:42:48,250 --> 00:42:52,320 And now with this CTE called average book ratings, 685 00:42:52,320 --> 00:42:55,920 I want to find the average rating year over year 686 00:42:55,920 --> 00:42:58,290 grouping not by book, but by year. 687 00:42:58,290 --> 00:43:02,430 So I'll say select, in this case, year, and then 688 00:43:02,430 --> 00:43:09,900 select the rounded average rating still to two decimal places as a new rating 689 00:43:09,900 --> 00:43:11,100 column. 690 00:43:11,100 --> 00:43:16,290 Then I'll select this from, in this case, my CTE. 691 00:43:16,290 --> 00:43:22,080 So I'll hit Enter here as well from average book ratings, 692 00:43:22,080 --> 00:43:26,070 that same CTE I defined earlier in my query. 693 00:43:26,070 --> 00:43:30,000 Now finally, to find the average rating across individual years, 694 00:43:30,000 --> 00:43:34,060 I'll say group by year semicolon. 695 00:43:34,060 --> 00:43:38,940 And now hopefully, if I hit Enter, I should see the average rating 696 00:43:38,940 --> 00:43:42,950 for each year in my database. 697 00:43:42,950 --> 00:43:48,900 OK, so this CTE is useful when you have subqueries you don't want to repeat 698 00:43:48,900 --> 00:43:51,450 or when you want to make a view that only 699 00:43:51,450 --> 00:43:53,910 exists for the duration of a query. 700 00:43:53,910 --> 00:43:58,680 And it's with all of these tools, like views, temporary views, and CTEs 701 00:43:58,680 --> 00:44:02,810 that we can get away with storing statistics about some data in our data 702 00:44:02,810 --> 00:44:07,200 set and not storing them in the actual database, 703 00:44:07,200 --> 00:44:10,290 but only inside this view we might have here. 704 00:44:10,290 --> 00:44:13,230 What we'll do next is see how we can use views 705 00:44:13,230 --> 00:44:17,925 to create not just these aggregations, but also partitions of our data. 706 00:44:17,925 --> 00:44:23,310 A way to split our data up into logical pieces for use for our own sake or even 707 00:44:23,310 --> 00:44:25,380 for application sake as well. 708 00:44:25,380 --> 00:44:27,390 We'll come back In just a few. 709 00:44:27,390 --> 00:44:28,840 And we're back. 710 00:44:28,840 --> 00:44:34,410 So we've seen so far how to use views to simplify and also aggregate our data. 711 00:44:34,410 --> 00:44:38,400 What we'll see now though, is how to use views to partition our data. 712 00:44:38,400 --> 00:44:41,280 That is to break it into smaller pieces that will be useful 713 00:44:41,280 --> 00:44:44,040 to us or to some application. 714 00:44:44,040 --> 00:44:47,580 Now, on the International Booker Prize website, 715 00:44:47,580 --> 00:44:51,210 they have a page for each year of longlisted books. 716 00:44:51,210 --> 00:44:56,820 But in our table, we have all the books together, regardless of their year. 717 00:44:56,820 --> 00:45:00,660 Well, maybe it will be useful in the application, even for me 718 00:45:00,660 --> 00:45:04,380 personally, to have a table one for each year of books that 719 00:45:04,380 --> 00:45:07,140 have been nominated for this prize. 720 00:45:07,140 --> 00:45:09,330 But before I create that view, I should ask, 721 00:45:09,330 --> 00:45:14,910 how do I try to find the books nominated in 2023? 722 00:45:14,910 --> 00:45:18,870 If my table looks like this, I have a table of books with ID, 723 00:45:18,870 --> 00:45:22,560 title, and year columns, what query could 724 00:45:22,560 --> 00:45:28,230 I use to find only those books nominated in 2023? 725 00:45:28,230 --> 00:45:31,240 AUDIENCE: Use the WHERE clause with the year? 726 00:45:31,240 --> 00:45:33,240 CARTER ZENKE: Yeah, I could use the WHERE clause 727 00:45:33,240 --> 00:45:35,010 and try to search for a given year. 728 00:45:35,010 --> 00:45:38,970 Like I could try to say, where, in this case, the year is 2023 729 00:45:38,970 --> 00:45:40,890 or equal to 2023. 730 00:45:40,890 --> 00:45:43,230 And that would give me back these two rows 731 00:45:43,230 --> 00:45:45,420 if only two books were nominated in 2023. 732 00:45:45,420 --> 00:45:47,380 In reality it's 13. 733 00:45:47,380 --> 00:45:51,700 If I wanted to find the next ones from 2022, what could I do? 734 00:45:51,700 --> 00:45:57,120 I could just change the WHERE clause to say where year equals 2022. 735 00:45:57,120 --> 00:45:59,250 And then same thing for 2021. 736 00:45:59,250 --> 00:46:03,030 I could say where year equals 2021. 737 00:46:03,030 --> 00:46:06,480 Now, each of these segments of my data is 738 00:46:06,480 --> 00:46:10,440 what we'll call a partition, a certain breaking down 739 00:46:10,440 --> 00:46:15,930 of my data into smaller pieces, in this case, organized by year. 740 00:46:15,930 --> 00:46:20,910 So let's now create smaller tables, one for each year that 741 00:46:20,910 --> 00:46:22,950 might be in our data set here. 742 00:46:22,950 --> 00:46:25,000 I'll come back to my terminal. 743 00:46:25,000 --> 00:46:28,660 And let's think about writing this query from before. 744 00:46:28,660 --> 00:46:33,420 So if I wanted to find the books that were nominated in 2022, 745 00:46:33,420 --> 00:46:40,770 I could say select the ID column and the title column from my books table. 746 00:46:40,770 --> 00:46:48,240 And then I want to find only those books, where the year equals 2022, then 747 00:46:48,240 --> 00:46:50,100 semicolon, and then Enter. 748 00:46:50,100 --> 00:46:56,280 And I should see only those books that were nominated in 2022. 749 00:46:56,280 --> 00:46:59,490 But now, I want to create a view that might actually 750 00:46:59,490 --> 00:47:02,930 store the results of this query more long term. 751 00:47:02,930 --> 00:47:06,380 I could say create view and call this, let's 752 00:47:06,380 --> 00:47:15,720 say, 2022 as now same query from before, select ID and title from my books 753 00:47:15,720 --> 00:47:25,190 table, in this case where the year equals 2022 semicolon. 754 00:47:25,190 --> 00:47:30,530 Now, if I ever want to find only those books nominated in 2022, 755 00:47:30,530 --> 00:47:31,970 I could now use my view. 756 00:47:31,970 --> 00:47:39,050 I could say select star from 2022 quote and then semicolon, hit Enter. 757 00:47:39,050 --> 00:47:43,310 And now I should see those very same books. 758 00:47:43,310 --> 00:47:44,960 So what could I do now? 759 00:47:44,960 --> 00:47:48,330 Maybe I want those books from 2021 as well. 760 00:47:48,330 --> 00:47:50,600 Let me ask the audience, what could you do 761 00:47:50,600 --> 00:47:57,020 to create the view for the books in 2021 based on what we've just seen now? 762 00:47:57,020 --> 00:48:02,242 AUDIENCE: We could just change the year from 2022 to 2021. 763 00:48:02,242 --> 00:48:03,950 CARTER ZENKE: Yeah, just change the year. 764 00:48:03,950 --> 00:48:07,140 And what would you propose we call this new view? 765 00:48:07,140 --> 00:48:10,790 if you had to take the reins on naming this view, what would you name it? 766 00:48:10,790 --> 00:48:14,120 AUDIENCE: Books that were written in 2021. 767 00:48:14,120 --> 00:48:17,600 CARTER ZENKE: Yeah, we could call it books that were nominated in 2021. 768 00:48:17,600 --> 00:48:20,522 But I would argue that might be a bit of a long view title. 769 00:48:20,522 --> 00:48:21,980 So let's shorten just a little bit. 770 00:48:21,980 --> 00:48:25,830 And maybe in keeping with our current naming, we'll say 2021 instead. 771 00:48:25,830 --> 00:48:27,270 So I'll come back over here. 772 00:48:27,270 --> 00:48:28,400 And let's try this. 773 00:48:28,400 --> 00:48:32,090 I'll say create a view. 774 00:48:32,090 --> 00:48:34,440 And we could, by all accounts and purposes, 775 00:48:34,440 --> 00:48:41,360 we could say books nominated in 2021. 776 00:48:41,360 --> 00:48:43,340 And that's totally OK. 777 00:48:43,340 --> 00:48:48,188 But if I were to pass this view on to a new programmer, a new database analyst, 778 00:48:48,188 --> 00:48:51,230 well, I kind of want to shorten it for them, make it more simple for them 779 00:48:51,230 --> 00:48:51,840 to read. 780 00:48:51,840 --> 00:48:55,130 So I could just call this 2021. 781 00:48:55,130 --> 00:48:58,100 Of course, this title is kind of ambiguous. 782 00:48:58,100 --> 00:49:02,360 Like if I pass you a view called 2021, are you automatically going 783 00:49:02,360 --> 00:49:04,500 to know what's inside that view? 784 00:49:04,500 --> 00:49:05,360 Probably not. 785 00:49:05,360 --> 00:49:07,970 So it's actually a balance between how descriptive 786 00:49:07,970 --> 00:49:10,130 you want to be with your names here. 787 00:49:10,130 --> 00:49:14,460 We'll stick though with 2021 to our earlier example here. 788 00:49:14,460 --> 00:49:22,220 Now I'll say create view 2021 as select ID, title from books. 789 00:49:22,220 --> 00:49:28,910 And to Fabian's point, I'll say where the year equals 2021 semicolon Enter. 790 00:49:28,910 --> 00:49:36,320 And now I could say, select star from 2021 semicolon. 791 00:49:36,320 --> 00:49:42,080 So this is a good way to break apart a large table into logical pieces. 792 00:49:42,080 --> 00:49:46,910 It's useful for both you as a programmer and also for applications 793 00:49:46,910 --> 00:49:50,300 that might only need to access some subset of your data, 794 00:49:50,300 --> 00:49:53,430 where some condition is true. 795 00:49:53,430 --> 00:49:56,750 So based on what we've seen so far, what questions 796 00:49:56,750 --> 00:50:00,560 do we have on partitioning data with views? 797 00:50:00,560 --> 00:50:05,300 AUDIENCE: We have created two views for 2021 and 2022. 798 00:50:05,300 --> 00:50:11,780 Rather than creating two views, is it possible to simplify it as one 799 00:50:11,780 --> 00:50:17,327 and then use it rather like where you simply change that alone? 800 00:50:17,327 --> 00:50:19,410 CARTER ZENKE: Yeah, I mean, it's a trade off here. 801 00:50:19,410 --> 00:50:22,740 So you could imagine trying to do, let's say, 802 00:50:22,740 --> 00:50:27,380 a single view that maybe has just titles and just years in it. 803 00:50:27,380 --> 00:50:29,630 But then you'd have to rewrite the query later on 804 00:50:29,630 --> 00:50:32,000 to find only those books for a certain year. 805 00:50:32,000 --> 00:50:35,030 Here, what we're trying to do is just like break up this big table 806 00:50:35,030 --> 00:50:36,118 into smaller pieces. 807 00:50:36,118 --> 00:50:37,910 And you can imagine it's being particularly 808 00:50:37,910 --> 00:50:41,090 useful if you had not just 78 books, but even like 809 00:50:41,090 --> 00:50:43,895 thousands, or tens of thousands, or millions of books. 810 00:50:43,895 --> 00:50:49,220 It could be useful to have a view for each logical segment of those 10,000 811 00:50:49,220 --> 00:50:51,290 or more books. 812 00:50:51,290 --> 00:50:53,090 Let's take one more question here. 813 00:50:53,090 --> 00:50:57,740 AUDIENCE: I would like to ask about updating the view itself. 814 00:50:57,740 --> 00:51:03,860 If I update a single table, any element in each table, 815 00:51:03,860 --> 00:51:06,800 their view can be updated automatically? 816 00:51:06,800 --> 00:51:10,797 Or maybe it [INAUDIBLE]. 817 00:51:10,797 --> 00:51:13,130 CARTER ZENKE: OK, a good forward-thinking question here. 818 00:51:13,130 --> 00:51:15,590 You can, you're right, update a table. 819 00:51:15,590 --> 00:51:20,180 You can say update the table and set some column to some value. 820 00:51:20,180 --> 00:51:23,480 What you can't do actually is update a view. 821 00:51:23,480 --> 00:51:26,645 And that's because a view doesn't have any underlying-- actually, 822 00:51:26,645 --> 00:51:29,060 it doesn't have any data inside of it exactly. 823 00:51:29,060 --> 00:51:32,460 It only pulls data from underlying tables. 824 00:51:32,460 --> 00:51:35,540 So let me show you how I could update this view, 825 00:51:35,540 --> 00:51:37,730 or at least try to update this view. 826 00:51:37,730 --> 00:51:40,160 I'll come back to my computer. 827 00:51:40,160 --> 00:51:44,210 And let's say we wanted to change a book in the 2021 view. 828 00:51:44,210 --> 00:51:48,830 I could say select star from 2021 semicolon. 829 00:51:48,830 --> 00:51:51,870 And maybe it turns out there's a typo in here. 830 00:51:51,870 --> 00:51:54,920 So instead of a minor detail, it's called something 831 00:51:54,920 --> 00:52:00,420 like the minor detail, this book on ID 34 here. 832 00:52:00,420 --> 00:52:02,660 I could try to update this view. 833 00:52:02,660 --> 00:52:07,790 I could say, update 2021 and set the title 834 00:52:07,790 --> 00:52:17,780 equal to the minor detail, where the title currently is minor detail, 835 00:52:17,780 --> 00:52:19,910 like this, minor detail. 836 00:52:19,910 --> 00:52:21,680 Then I'll hit semicolon. 837 00:52:21,680 --> 00:52:28,340 And if I do this, I get this error, cannot modify 2021 because it is 838 00:52:28,340 --> 00:52:29,330 a view. 839 00:52:29,330 --> 00:52:31,700 Now, why can't we modify a view? 840 00:52:31,700 --> 00:52:36,200 Well, recall that a view is simply the combination 841 00:52:36,200 --> 00:52:38,780 of data from underlying tables. 842 00:52:38,780 --> 00:52:41,510 I've assembled data from different pieces of my database 843 00:52:41,510 --> 00:52:43,520 and put them into a single view. 844 00:52:43,520 --> 00:52:46,760 But I can't then update that data because that data 845 00:52:46,760 --> 00:52:50,600 is located in multiple individual tables around my database. 846 00:52:50,600 --> 00:52:52,460 What I should instead do is try to update 847 00:52:52,460 --> 00:52:56,870 the underlying table, which I can do as we saw before with just update 848 00:52:56,870 --> 00:52:59,810 and set to. 849 00:52:59,810 --> 00:53:03,920 So more on how we can update, or at least try to update, 850 00:53:03,920 --> 00:53:05,120 views a little bit later. 851 00:53:05,120 --> 00:53:07,280 We'll take a break and come back to talk about how 852 00:53:07,280 --> 00:53:10,530 we can secure data using views as well. 853 00:53:10,530 --> 00:53:11,870 And we're back. 854 00:53:11,870 --> 00:53:15,740 So one more use case for views is being able to secure 855 00:53:15,740 --> 00:53:17,280 the data that's in your database. 856 00:53:17,280 --> 00:53:19,970 You might imagine having a table. 857 00:53:19,970 --> 00:53:24,590 But some columns in that table you don't want to share with somebody else. 858 00:53:24,590 --> 00:53:28,610 Well, a common practice in security is to only give someone 859 00:53:28,610 --> 00:53:31,470 the information they need to know. 860 00:53:31,470 --> 00:53:34,070 With views you could do that very principle. 861 00:53:34,070 --> 00:53:36,020 You could accomplish that by taking a table 862 00:53:36,020 --> 00:53:40,350 and reducing the number of columns somebody else might see. 863 00:53:40,350 --> 00:53:43,370 So let's consider here some rideshare company. 864 00:53:43,370 --> 00:53:46,370 And rideshare companies keep track of a lot of information about us. 865 00:53:46,370 --> 00:53:49,640 They keep track of where we're going, where we're coming from, 866 00:53:49,640 --> 00:53:54,110 who's in the car with us, and who drove us there even sometimes. 867 00:53:54,110 --> 00:53:58,850 Let's say that a rideshare company has a table called rides. 868 00:53:58,850 --> 00:54:03,110 And inside this table is the origin of several rides 869 00:54:03,110 --> 00:54:08,210 and the destination along with the rider who took that ride. 870 00:54:08,210 --> 00:54:10,880 Now, if I were to ask you here, let's say, 871 00:54:10,880 --> 00:54:14,600 I want to give this data to an analyst to figure out 872 00:54:14,600 --> 00:54:17,840 which rides are most popular or which rides are highly rated. 873 00:54:17,840 --> 00:54:21,260 What data should I probably omit? 874 00:54:21,260 --> 00:54:24,050 If I wanted to share this data with somebody else, 875 00:54:24,050 --> 00:54:27,150 what should I remove from this table? 876 00:54:27,150 --> 00:54:32,608 AUDIENCE: The data you could remove would be the rider. 877 00:54:32,608 --> 00:54:33,900 CARTER ZENKE: Yeah, good point. 878 00:54:33,900 --> 00:54:37,850 So if I look at this table and I think of what information 879 00:54:37,850 --> 00:54:43,170 is personally identifiable, PII, Personally Identifiable Information, 880 00:54:43,170 --> 00:54:47,570 it seems like this rider column could identify somebody who took a ride from, 881 00:54:47,570 --> 00:54:50,210 let's say, Good Egg Galaxy to Honeyhive Galaxy, 882 00:54:50,210 --> 00:54:52,670 and we don't want to leak that information 883 00:54:52,670 --> 00:54:54,920 if I gave this to some analyst. 884 00:54:54,920 --> 00:54:59,210 Well, with a view I could keep this underlying table. 885 00:54:59,210 --> 00:55:04,280 But I could create a view that only has the origin and the destination, 886 00:55:04,280 --> 00:55:06,710 that omits the rider column. 887 00:55:06,710 --> 00:55:11,240 And I could more confidently share this view with an analyst, not 888 00:55:11,240 --> 00:55:13,190 the underlying table. 889 00:55:13,190 --> 00:55:15,980 So let me try to implement this view, so we can see it 890 00:55:15,980 --> 00:55:18,690 in action inside this database here. 891 00:55:18,690 --> 00:55:20,940 I'll come back to my computer. 892 00:55:20,940 --> 00:55:23,610 And in this case, I have a new database. 893 00:55:23,610 --> 00:55:26,250 I have one called rideshare dot DB. 894 00:55:26,250 --> 00:55:31,510 I'll type SQLite 3 rideshare dot DB to open up this database. 895 00:55:31,510 --> 00:55:35,580 And if I type dot schema to see the schema of the database, 896 00:55:35,580 --> 00:55:38,670 I should see I have one table called rides, 897 00:55:38,670 --> 00:55:44,790 where each ride has an ID, an origin, a destination, and a rider. 898 00:55:44,790 --> 00:55:51,660 Now, I'll select star from the rides table, select star from rides, 899 00:55:51,660 --> 00:55:52,740 hit semicolon. 900 00:55:52,740 --> 00:55:57,032 And I should see the very same data we just saw in our slides, 901 00:55:57,032 --> 00:55:58,740 where Peach is going from Good Egg Galaxy 902 00:55:58,740 --> 00:56:02,310 to Honeyhive Galaxy, Mario going from Castle Courtyard to Cascade Kingdom, 903 00:56:02,310 --> 00:56:04,030 and so on. 904 00:56:04,030 --> 00:56:09,480 So now, I know I don't want to share this data with somebody else. 905 00:56:09,480 --> 00:56:11,580 What could I do? 906 00:56:11,580 --> 00:56:13,800 Instead of selecting everything, I could try 907 00:56:13,800 --> 00:56:16,630 to omit this rider column to the point before. 908 00:56:16,630 --> 00:56:17,610 So let's try that. 909 00:56:17,610 --> 00:56:24,690 I'll select just ID, and just origin, and just destination from rides. 910 00:56:24,690 --> 00:56:28,200 I'll deliberately omit that rider column. 911 00:56:28,200 --> 00:56:30,300 Now, I'll hit Enter. 912 00:56:30,300 --> 00:56:33,576 And I should see the origin and destination. 913 00:56:33,576 --> 00:56:36,840 But I actually don't see the rider column. 914 00:56:36,840 --> 00:56:38,820 So this is a decent view. 915 00:56:38,820 --> 00:56:41,700 But I could probably improve this even more. 916 00:56:41,700 --> 00:56:47,040 It turns out that in SQLite if I want to fill a column with some arbitrary 917 00:56:47,040 --> 00:56:49,590 value, I can do that a bit like this. 918 00:56:49,590 --> 00:56:56,350 Why don't I try to select ID and select origin and destination. 919 00:56:56,350 --> 00:57:01,620 But I also want to let somebody know that this data has been anonymized. 920 00:57:01,620 --> 00:57:04,350 There is rider information in the underlying table. 921 00:57:04,350 --> 00:57:06,340 But you don't have access to it. 922 00:57:06,340 --> 00:57:10,200 What I'll do instead is say, let me make this new column that's filled 923 00:57:10,200 --> 00:57:13,470 with this value anonymous as a string. 924 00:57:13,470 --> 00:57:19,560 I'll say quote unquote "anonymous" as this column name called rider. 925 00:57:19,560 --> 00:57:24,630 And I'll select all of this from the rides table semicolon. 926 00:57:24,630 --> 00:57:29,760 Now I have more helpfully indicated that there is a rider column. 927 00:57:29,760 --> 00:57:34,800 But you deliberately don't have access to see who those riders are. 928 00:57:34,800 --> 00:57:38,610 So my last step here is probably to turn this into a view. 929 00:57:38,610 --> 00:57:40,320 I have a good query. 930 00:57:40,320 --> 00:57:43,140 But I want to save it so I could query it later on. 931 00:57:43,140 --> 00:57:45,570 Well, I could type select or not select. 932 00:57:45,570 --> 00:57:47,130 I could type Create View first. 933 00:57:47,130 --> 00:57:48,210 I'll create a view. 934 00:57:48,210 --> 00:57:49,980 This one just called analysis. 935 00:57:49,980 --> 00:57:51,810 I'll give this to my data analysts here. 936 00:57:51,810 --> 00:57:53,760 Create view analysis. 937 00:57:53,760 --> 00:58:01,570 And then I'll say as let's go with select ID, origin, and destination, 938 00:58:01,570 --> 00:58:08,820 as well as the anonymous column as, in this case, the rider. 939 00:58:08,820 --> 00:58:16,020 And I'll select all of this from my rides table semicolon, hit Enter. 940 00:58:16,020 --> 00:58:21,070 Now, if I type dot schema, I see my new view analysis. 941 00:58:21,070 --> 00:58:27,240 And if I say select star from analysis semicolon, 942 00:58:27,240 --> 00:58:33,870 I should see my "anonymized", quote unquote, this data. 943 00:58:33,870 --> 00:58:36,040 But this is helpful. 944 00:58:36,040 --> 00:58:40,380 I'd argue it's not very secure, at least in SQLite. 945 00:58:40,380 --> 00:58:44,340 And if you're thinking adversarially here, 946 00:58:44,340 --> 00:58:46,470 what might you be able to do still? 947 00:58:46,470 --> 00:58:49,740 If I gave you access to this database, what 948 00:58:49,740 --> 00:58:53,100 could you do to de-anonymize this data and to see 949 00:58:53,100 --> 00:58:55,560 the data I didn't want you to see? 950 00:58:55,560 --> 00:58:58,620 AUDIENCE: You can just query the riders table 951 00:58:58,620 --> 00:59:03,216 and find what are the riders values in the riders column. 952 00:59:03,216 --> 00:59:06,360 CARTER ZENKE: OK, so you're correct that we can still see the rides table. 953 00:59:06,360 --> 00:59:11,400 Maybe I look at the ID column in the analysis view and I see those IDs. 954 00:59:11,400 --> 00:59:15,180 I might be able to link those IDs with the IDs in the rides table. 955 00:59:15,180 --> 00:59:17,490 And nothing's stopping me at least in SQLite 956 00:59:17,490 --> 00:59:20,553 from saying select star from the rides table. 957 00:59:20,553 --> 00:59:21,720 So I'll come back over here. 958 00:59:21,720 --> 00:59:22,900 And I'll try that. 959 00:59:22,900 --> 00:59:28,600 I'll say, in this case, select star from rides, hit semicolon. 960 00:59:28,600 --> 00:59:32,190 And now I see all that ride information, although I hid it 961 00:59:32,190 --> 00:59:34,350 in this view called analysis. 962 00:59:34,350 --> 00:59:37,260 And this is a downside of SQLite. 963 00:59:37,260 --> 00:59:41,790 In SQLite we can't set access controls, meaning I either give you 964 00:59:41,790 --> 00:59:44,890 the entire database or nothing at all. 965 00:59:44,890 --> 00:59:48,870 In other DBMSes though, I could set access controls. 966 00:59:48,870 --> 00:59:52,690 I could give David access only to the analysis view. 967 00:59:52,690 --> 00:59:57,780 I can give myself access to both the rides table and the analysis view. 968 00:59:57,780 --> 01:00:01,590 So keep that in mind when you're working with SQLite. 969 01:00:01,590 --> 01:00:06,870 OK, so we've seen here how to secure some data using SQLite and using 970 01:00:06,870 --> 01:00:08,337 views in this as well. 971 01:00:08,337 --> 01:00:10,920 Let's come back in just a minute and see how we could actually 972 01:00:10,920 --> 01:00:15,340 use SQLite and views to not just secure our data, 973 01:00:15,340 --> 01:00:19,180 but also to work on soft deletions, which you saw just a bit ago. 974 01:00:19,180 --> 01:00:22,800 And we're back for one final application of views. 975 01:00:22,800 --> 01:00:27,390 We saw just a few weeks ago how we can implement soft deletions, that 976 01:00:27,390 --> 01:00:29,730 is not fully deleting something in our database, 977 01:00:29,730 --> 01:00:31,860 but only marking it as deleted. 978 01:00:31,860 --> 01:00:36,180 And we saw this in the context of Boston's own Museum of Fine Art. 979 01:00:36,180 --> 01:00:39,130 It's keeping track of items inside their collection. 980 01:00:39,130 --> 01:00:44,310 So let's say we had this table called collections full of items in the MFA, 981 01:00:44,310 --> 01:00:46,710 the Museum of Fine Arts in Boston. 982 01:00:46,710 --> 01:00:52,470 Here I have four items currently in our collections ID one, ID two, three, 983 01:00:52,470 --> 01:00:53,490 and four. 984 01:00:53,490 --> 01:00:58,530 But I also have this deleted column, which is either 0 or 1, depending on 985 01:00:58,530 --> 01:01:01,660 whether an item has been deleted or not. 986 01:01:01,660 --> 01:01:06,300 So I want to delete, let's say, Farmers Working at Dawn, this first piece here. 987 01:01:06,300 --> 01:01:08,580 Instead of deleting it from the table fully, 988 01:01:08,580 --> 01:01:13,360 I could set this deleted column from a 0 to a 1, just like this. 989 01:01:13,360 --> 01:01:16,260 And now, it's only soft deleted. 990 01:01:16,260 --> 01:01:17,430 I still have the data here. 991 01:01:17,430 --> 01:01:18,840 I still have the title. 992 01:01:18,840 --> 01:01:23,070 But now instead of being a 0, it's marked as a 1. 993 01:01:23,070 --> 01:01:25,420 Now, this is useful. 994 01:01:25,420 --> 01:01:31,950 But what could I do to only see now the items that have not been deleted? 995 01:01:31,950 --> 01:01:35,820 If I show you this database, collections here, the table here, 996 01:01:35,820 --> 01:01:38,890 I have titles and the deleted column. 997 01:01:38,890 --> 01:01:44,310 But if I only want to see those items that are not deleted, what query could 998 01:01:44,310 --> 01:01:46,290 I use for that? 999 01:01:46,290 --> 01:01:48,225 AUDIENCE: You would use a WHERE statement. 1000 01:01:48,225 --> 01:01:50,100 CARTER ZENKE: I could use a SELECT statement. 1001 01:01:50,100 --> 01:01:52,600 And I could use probably a WHERE clause as well. 1002 01:01:52,600 --> 01:01:56,580 I could say select star where perhaps deleted 1003 01:01:56,580 --> 01:02:00,210 equals 0 because in this case, when deleted is 0, 1004 01:02:00,210 --> 01:02:04,540 we know this item is still supposed to be in our table. 1005 01:02:04,540 --> 01:02:08,100 So here's the query here, select star from collections 1006 01:02:08,100 --> 01:02:10,980 where deleted equals 0. 1007 01:02:10,980 --> 01:02:14,520 Now, we can try to convert this query into a view, 1008 01:02:14,520 --> 01:02:18,420 so we could always only see the items in our current collection, 1009 01:02:18,420 --> 01:02:21,373 not those we have marked as deleted. 1010 01:02:21,373 --> 01:02:22,540 So I'll come back over here. 1011 01:02:22,540 --> 01:02:25,150 And we'll see we have our collections table. 1012 01:02:25,150 --> 01:02:28,680 But we also have now perhaps a new view of this table 1013 01:02:28,680 --> 01:02:30,670 called current collections. 1014 01:02:30,670 --> 01:02:35,730 Well, if I set current collections to be the result of this query, 1015 01:02:35,730 --> 01:02:41,010 I should see when I update deleted to be 1 for some of these items here, 1016 01:02:41,010 --> 01:02:44,620 it should be removed from this current collections table. 1017 01:02:44,620 --> 01:02:46,170 So let's try that. 1018 01:02:46,170 --> 01:02:49,530 I could take Farmers Working at Dawn and delete it. 1019 01:02:49,530 --> 01:02:54,720 I could see this correspondence between this row here and this row here. 1020 01:02:54,720 --> 01:02:58,320 It's in both our collections table and current collections. 1021 01:02:58,320 --> 01:03:02,460 But if I set deleted from a 0 to 1, I should 1022 01:03:02,460 --> 01:03:08,310 see that item removed from current collections, thus updating my view 1023 01:03:08,310 --> 01:03:12,970 and keeping track of my items that are not currently in my collection. 1024 01:03:12,970 --> 01:03:17,430 So let's come back here and implement this in terms of our own database. 1025 01:03:17,430 --> 01:03:19,440 I'll come back over here. 1026 01:03:19,440 --> 01:03:24,330 And I will open up MFA dot DB, that old database from a few weeks ago. 1027 01:03:24,330 --> 01:03:28,500 I'll say SQLite 3 MFA dot DB. 1028 01:03:28,500 --> 01:03:32,040 And now if I type dot schema, I should see 1029 01:03:32,040 --> 01:03:36,840 the tables inside of this database, collections, artists, and created. 1030 01:03:36,840 --> 01:03:39,460 We'll focus in particular on the collections table. 1031 01:03:39,460 --> 01:03:43,080 So if I say select star from collections, 1032 01:03:43,080 --> 01:03:47,700 I should see that I have the title, the accession number, and the date 1033 01:03:47,700 --> 01:03:50,460 this piece was acquired by the MFA. 1034 01:03:50,460 --> 01:03:54,510 Now, though, if I want to add a deleted column, 1035 01:03:54,510 --> 01:03:57,840 I could do so by using alter table. 1036 01:03:57,840 --> 01:04:03,180 I could say, let's alter table collections 1037 01:04:03,180 --> 01:04:07,050 and add a column called deleted. 1038 01:04:07,050 --> 01:04:10,590 I'll say this column has the type integer. 1039 01:04:10,590 --> 01:04:12,120 It stores whole numbers. 1040 01:04:12,120 --> 01:04:16,320 And the default number is 0 semicolon. 1041 01:04:16,320 --> 01:04:23,790 Now, if I say select star from collections semicolon, 1042 01:04:23,790 --> 01:04:27,090 I should see there is this new deleted column where 1043 01:04:27,090 --> 01:04:30,820 all the values are 0 at first. 1044 01:04:30,820 --> 01:04:35,010 So if I want to delete something, I could instead of typing 1045 01:04:35,010 --> 01:04:39,840 delete from as we saw before I can just update this deleted value. 1046 01:04:39,840 --> 01:04:48,210 I could say maybe update collections and set deleted equal to 1 1047 01:04:48,210 --> 01:04:51,330 where a certain title is present. 1048 01:04:51,330 --> 01:04:57,870 I'll say this that the title is equal to Farmers-- 1049 01:04:57,870 --> 01:05:03,210 title equals Farmers Working at Dawn semicolon. 1050 01:05:03,210 --> 01:05:06,510 So now, I've soft deleted this item. 1051 01:05:06,510 --> 01:05:09,180 It's still in my table, at least the title is. 1052 01:05:09,180 --> 01:05:11,910 But now I've marked it as being deleted. 1053 01:05:11,910 --> 01:05:14,400 It should not be in my ultimate view. 1054 01:05:14,400 --> 01:05:15,840 I'll hit Enter here. 1055 01:05:15,840 --> 01:05:18,570 And I'll say select star from collections. 1056 01:05:18,570 --> 01:05:23,610 I'll see that in this case, I still see Farmers Working at Dawn. 1057 01:05:23,610 --> 01:05:24,790 But what do I want? 1058 01:05:24,790 --> 01:05:31,000 I want a view that only shows me those items that have not been deleted. 1059 01:05:31,000 --> 01:05:32,220 So I'll try this. 1060 01:05:32,220 --> 01:05:36,120 I'll say select star from collections and apply a condition to it, 1061 01:05:36,120 --> 01:05:38,740 to our earlier point before. 1062 01:05:38,740 --> 01:05:44,160 So I'll say where, in this case, deleted equals 0 semicolon, hit Enter. 1063 01:05:44,160 --> 01:05:49,710 And now I should see only those items that are not marked as deleted. 1064 01:05:49,710 --> 01:05:52,275 But I want to turn this into a view, which I could. 1065 01:05:52,275 --> 01:05:58,080 I could say maybe create a view called current collections. 1066 01:05:58,080 --> 01:06:04,380 And then I'll make this the following query as select ID, 1067 01:06:04,380 --> 01:06:08,730 select title, accession number, and acquired. 1068 01:06:08,730 --> 01:06:12,390 But now I'll actually omit that deleted column. 1069 01:06:12,390 --> 01:06:15,960 I don't want this column to show up in my current collections. 1070 01:06:15,960 --> 01:06:19,980 I only want it to be in the underlying table. 1071 01:06:19,980 --> 01:06:24,160 So I'll select all of these columns from the collections table. 1072 01:06:24,160 --> 01:06:28,980 And then I'll say where deleted equals 0 to apply this condition 1073 01:06:28,980 --> 01:06:33,250 and only get those items that are not marked as deleted. 1074 01:06:33,250 --> 01:06:34,980 Now, I'll hit Enter. 1075 01:06:34,980 --> 01:06:40,080 And I should be able to do this, select star from current collections 1076 01:06:40,080 --> 01:06:41,230 semicolon. 1077 01:06:41,230 --> 01:06:47,410 And now I'll see only those items that have not been deleted. 1078 01:06:47,410 --> 01:06:49,190 So this seems worthwhile. 1079 01:06:49,190 --> 01:06:53,970 I'm now able to view the items that are currently in my collection. 1080 01:06:53,970 --> 01:06:56,190 But a problem remains. 1081 01:06:56,190 --> 01:07:01,200 Like if I tried to update this table and insert 1082 01:07:01,200 --> 01:07:04,380 something into it, what might happen? 1083 01:07:04,380 --> 01:07:06,450 Let me ask our audience here. 1084 01:07:06,450 --> 01:07:08,850 Based on what we saw before, what would happen 1085 01:07:08,850 --> 01:07:14,190 if I tried to insert into this table here, this view perhaps? 1086 01:07:14,190 --> 01:07:20,553 AUDIENCE: I think we cannot update the view itself, as we saw before. 1087 01:07:20,553 --> 01:07:22,720 CARTER ZENKE: Yeah, we can't update the view itself. 1088 01:07:22,720 --> 01:07:24,030 We can't insert new data. 1089 01:07:24,030 --> 01:07:25,780 We can't even delete data. 1090 01:07:25,780 --> 01:07:28,140 So let me try first to delete data. 1091 01:07:28,140 --> 01:07:30,960 And then we'll see how to insert some data a little bit later. 1092 01:07:30,960 --> 01:07:33,210 I'll come back over to my terminal. 1093 01:07:33,210 --> 01:07:36,570 And what I wanted to delete, Imaginative Landscape, 1094 01:07:36,570 --> 01:07:44,250 I could say delete from current collections where the title equals-- 1095 01:07:44,250 --> 01:07:45,960 let me make a new line here-- 1096 01:07:45,960 --> 01:07:54,180 where the title equals Imaginative Landscape semicolon. 1097 01:07:54,180 --> 01:07:56,895 Now, I'll see that same error we got before when 1098 01:07:56,895 --> 01:07:58,500 we tried to update some view. 1099 01:07:58,500 --> 01:08:02,760 I can't modify this because it is a view. 1100 01:08:02,760 --> 01:08:06,360 Well, it turns out there is a way to work around this. 1101 01:08:06,360 --> 01:08:08,310 We can't modify the view. 1102 01:08:08,310 --> 01:08:12,000 But we can modify the underlying table. 1103 01:08:12,000 --> 01:08:17,920 So to do this, we rank from old friends, in this case, those called triggers. 1104 01:08:17,920 --> 01:08:21,390 So if you remember, a trigger is a way to specify 1105 01:08:21,390 --> 01:08:26,340 some SQL statement I want to run when some other SQL statement is run. 1106 01:08:26,340 --> 01:08:29,609 We saw before we could use before or after 1107 01:08:29,609 --> 01:08:34,050 to say run this SQL statement either before or after some other SQL 1108 01:08:34,050 --> 01:08:35,100 statement here. 1109 01:08:35,100 --> 01:08:40,170 What we didn't see until now is this INSTEAD OF trigger. 1110 01:08:40,170 --> 01:08:42,450 I can create a trigger that has a name. 1111 01:08:42,450 --> 01:08:46,810 And I can run it instead of some other SQL statement. 1112 01:08:46,810 --> 01:08:50,760 So I could say INSTEAD OF deleting on this view, 1113 01:08:50,760 --> 01:08:54,630 let me give you the proper query to run, in this case. 1114 01:08:54,630 --> 01:08:58,620 So instead of deleting on this view, I will for each row 1115 01:08:58,620 --> 01:09:00,420 run this other statement. 1116 01:09:00,420 --> 01:09:01,870 I'll say begin. 1117 01:09:01,870 --> 01:09:04,090 This is the statement I want you to run instead. 1118 01:09:04,090 --> 01:09:08,100 And in this statement, we could have something that actually updates 1119 01:09:08,100 --> 01:09:11,399 not the view, but the underlying table. 1120 01:09:11,399 --> 01:09:14,680 And we then end that query here. 1121 01:09:14,680 --> 01:09:19,920 So based on what we know about our view and the underlying table 1122 01:09:19,920 --> 01:09:25,470 called collections, what query should we put in here to actually delete 1123 01:09:25,470 --> 01:09:28,510 something from our table? 1124 01:09:28,510 --> 01:09:30,390 Let me ask our audience. 1125 01:09:30,390 --> 01:09:36,593 Instead of deleting from our view, what should we instead do to our table? 1126 01:09:36,593 --> 01:09:38,760 AUDIENCE: Yeah, you should update the deleted column 1127 01:09:38,760 --> 01:09:41,130 in the collections table. 1128 01:09:41,130 --> 01:09:44,399 CARTER ZENKE: Yeah, so instead of deleting from our view, what I really 1129 01:09:44,399 --> 01:09:46,720 want to do is update that table. 1130 01:09:46,720 --> 01:09:48,645 So let's try to trigger it to do just that. 1131 01:09:48,645 --> 01:09:51,540 I'll come back to my computer here. 1132 01:09:51,540 --> 01:09:54,420 And I'll start on this trigger. 1133 01:09:54,420 --> 01:09:57,390 To our point, I want to create a trigger that 1134 01:09:57,390 --> 01:10:00,120 runs instead of deleting on this view. 1135 01:10:00,120 --> 01:10:04,710 And what I want to instead happen is an update on the underlying table. 1136 01:10:04,710 --> 01:10:09,420 So I'll say create a trigger, this one called delete. 1137 01:10:09,420 --> 01:10:15,840 And then I'll say instead of delete on current collections, that 1138 01:10:15,840 --> 01:10:21,780 is whenever SQL listens and hears a delete on the current collections view, 1139 01:10:21,780 --> 01:10:23,460 I don't want that to happen. 1140 01:10:23,460 --> 01:10:26,190 I instead want this other statement to happen. 1141 01:10:26,190 --> 01:10:29,370 I'll say for each row affected by that delete, 1142 01:10:29,370 --> 01:10:32,760 for each row I ask delete on current collections, run 1143 01:10:32,760 --> 01:10:34,560 instead this other statement. 1144 01:10:34,560 --> 01:10:35,550 I'll say begin. 1145 01:10:35,550 --> 01:10:36,990 Here is that statement. 1146 01:10:36,990 --> 01:10:39,400 One, two, three, four indents for style sake. 1147 01:10:39,400 --> 01:10:51,120 And now I'll say I want to update collections and set deleted equal to 1. 1148 01:10:51,120 --> 01:10:54,300 But I don't want to leave this just as is. 1149 01:10:54,300 --> 01:10:58,770 If I say update collections, set deleted equal to 1, 1150 01:10:58,770 --> 01:11:03,160 what's the problem with that statement, if I end it there? 1151 01:11:03,160 --> 01:11:05,820 AUDIENCE: So yeah, it will update all the columns, 1152 01:11:05,820 --> 01:11:09,540 whether it is we want it to be deleted or not. 1153 01:11:09,540 --> 01:11:12,970 CARTER ZENKE: Yeah, with no condition it'll update all the columns here. 1154 01:11:12,970 --> 01:11:15,690 So I want to apply a condition to make sure it only affects 1155 01:11:15,690 --> 01:11:18,340 the particular row I'm working with. 1156 01:11:18,340 --> 01:11:20,280 Well, what uniquely identifies each row? 1157 01:11:20,280 --> 01:11:21,490 The ID does. 1158 01:11:21,490 --> 01:11:23,460 So let's try using the ID here. 1159 01:11:23,460 --> 01:11:24,390 I'll come back. 1160 01:11:24,390 --> 01:11:28,530 And I'll say I want to update collections, set deleted equal to 1 1161 01:11:28,530 --> 01:11:31,800 but only when some condition is true. 1162 01:11:31,800 --> 01:11:37,600 I'll indent again four times and say where, in this case, 1163 01:11:37,600 --> 01:11:44,310 the ID column is equal to the old ID. 1164 01:11:44,310 --> 01:11:48,750 Now, old here is a special keyword that means the row we just deleted 1165 01:11:48,750 --> 01:11:51,540 or that we would have deleted from current collections. 1166 01:11:51,540 --> 01:11:58,260 I want to update that same row with that same ID in the underlying collections 1167 01:11:58,260 --> 01:11:59,460 table. 1168 01:11:59,460 --> 01:12:02,400 So I'll put a semicolon here to end. 1169 01:12:02,400 --> 01:12:04,980 And then I'll say end this statement. 1170 01:12:04,980 --> 01:12:06,870 This is the entire statement I want to run 1171 01:12:06,870 --> 01:12:10,980 after I try to delete all my current collections table. 1172 01:12:10,980 --> 01:12:12,780 Now, I'll hit Enter. 1173 01:12:12,780 --> 01:12:14,200 And I'll see nothing changes. 1174 01:12:14,200 --> 01:12:18,490 But if I say dot schema, I should then see down below here, 1175 01:12:18,490 --> 01:12:21,750 I have a trigger called delete that will instead run an update when 1176 01:12:21,750 --> 01:12:24,610 I try to delete from this view. 1177 01:12:24,610 --> 01:12:25,710 So let's try that. 1178 01:12:25,710 --> 01:12:30,570 What I can do now is try again to delete from current collections. 1179 01:12:30,570 --> 01:12:36,270 I'll say delete from current collections where. 1180 01:12:36,270 --> 01:12:41,267 In this case, the title equals Imaginative Landscape. 1181 01:12:41,267 --> 01:12:42,600 Let's say we sold this painting. 1182 01:12:42,600 --> 01:12:44,460 It's no longer part of our collection. 1183 01:12:44,460 --> 01:12:47,140 Imaginative Landscape semicolon. 1184 01:12:47,140 --> 01:12:50,040 And now it actually seems to work. 1185 01:12:50,040 --> 01:12:52,060 I get no error here. 1186 01:12:52,060 --> 01:12:54,060 So it seems like it went OK. 1187 01:12:54,060 --> 01:12:59,460 Well, if I select star from current collections, 1188 01:12:59,460 --> 01:13:01,920 I actually see that it's gone. 1189 01:13:01,920 --> 01:13:03,270 But is it really gone? 1190 01:13:03,270 --> 01:13:08,220 If I try select star from collections, not current collections, the underlying 1191 01:13:08,220 --> 01:13:12,780 table for current collections semicolon, what do I see? 1192 01:13:12,780 --> 01:13:18,780 That the ID column has been updated from a 0 now to a 1. 1193 01:13:18,780 --> 01:13:21,420 So a good way of trying to use triggers here 1194 01:13:21,420 --> 01:13:27,240 to update our views by instead modifying the underlying table. 1195 01:13:27,240 --> 01:13:32,650 Now, going to be one more issue we encounter here, which is the following. 1196 01:13:32,650 --> 01:13:36,430 Let's say I'm at this current state in my tables. 1197 01:13:36,430 --> 01:13:39,450 I have collections, where Farmers Working at Dawn is deleted. 1198 01:13:39,450 --> 01:13:41,190 It has a 1 right here. 1199 01:13:41,190 --> 01:13:45,340 And we thus can't see it inside of current collections. 1200 01:13:45,340 --> 01:13:49,080 Well, let's say I want to not delete from current collections, 1201 01:13:49,080 --> 01:13:51,930 but also add to current collections. 1202 01:13:51,930 --> 01:13:54,660 I could try to insert into this view. 1203 01:13:54,660 --> 01:13:59,070 But I probably actually won't be able to do that because it is a view. 1204 01:13:59,070 --> 01:14:00,390 Let's think then. 1205 01:14:00,390 --> 01:14:06,900 If I wanted to insert into this view, what should I really do instead? 1206 01:14:06,900 --> 01:14:11,318 AUDIENCE: You should add it to the collections table instead of the view. 1207 01:14:11,318 --> 01:14:12,610 CARTER ZENKE: I like that idea. 1208 01:14:12,610 --> 01:14:18,480 So I should add this row to the collections table instead of the view. 1209 01:14:18,480 --> 01:14:21,120 But now, here's a bit of a trick. 1210 01:14:21,120 --> 01:14:24,870 Let's say I add Farmers Working at Dawn. 1211 01:14:24,870 --> 01:14:27,030 We reacquired it. 1212 01:14:27,030 --> 01:14:34,630 What should I do in that case, when I'm adding a painting I previously deleted? 1213 01:14:34,630 --> 01:14:40,200 AUDIENCE: So what we could do is we know that the deleted column-- 1214 01:14:40,200 --> 01:14:45,990 we can check if the deleted column is gone and then based on that, 1215 01:14:45,990 --> 01:14:49,530 we could update that particular column. 1216 01:14:49,530 --> 01:14:57,430 Or if there is no title in the collections table, 1217 01:14:57,430 --> 01:14:59,782 we can just add it, insert it. 1218 01:14:59,782 --> 01:15:01,740 CARTER ZENKE: Yeah, I like your thinking there. 1219 01:15:01,740 --> 01:15:05,820 First trying to check, do we have this item already in our table? 1220 01:15:05,820 --> 01:15:07,290 Or do we not? 1221 01:15:07,290 --> 01:15:09,690 If we do already have it in our table, we should just 1222 01:15:09,690 --> 01:15:12,780 flip this bit from 1 to a 0. 1223 01:15:12,780 --> 01:15:17,890 If we don't though, let's add a brand new row, as we heard before. 1224 01:15:17,890 --> 01:15:19,350 So let's think then. 1225 01:15:19,350 --> 01:15:23,610 What could we use to conditionally run some code on the collections table? 1226 01:15:23,610 --> 01:15:27,180 Well, it turns out that you can use some other aspects of triggers. 1227 01:15:27,180 --> 01:15:32,970 In this case, triggers have not just unconditional use cases, but also 1228 01:15:32,970 --> 01:15:35,190 conditional ones too. 1229 01:15:35,190 --> 01:15:37,390 So let's try this. 1230 01:15:37,390 --> 01:15:43,200 I could try to insert on this view and then run some code down below. 1231 01:15:43,200 --> 01:15:45,400 But to our discussion a little bit earlier, 1232 01:15:45,400 --> 01:15:47,700 I don't want to unconditionally do this. 1233 01:15:47,700 --> 01:15:50,700 I want to check, is this item in my table? 1234 01:15:50,700 --> 01:15:52,540 Or is it not already? 1235 01:15:52,540 --> 01:15:56,070 So for that, I could use this keyword called when. 1236 01:15:56,070 --> 01:16:02,050 I could say for each row when some condition is true run this trigger. 1237 01:16:02,050 --> 01:16:04,710 And you can imagine having two triggers to insert 1238 01:16:04,710 --> 01:16:09,540 into our view, one that runs when the item is already in our table. 1239 01:16:09,540 --> 01:16:13,560 We could then flip that bit from a 1 to a 0. 1240 01:16:13,560 --> 01:16:18,300 We could also have a different trigger to actually insert a new item. 1241 01:16:18,300 --> 01:16:21,300 In that case, actually inserting a brand new row 1242 01:16:21,300 --> 01:16:23,920 to our underlying collections table. 1243 01:16:23,920 --> 01:16:28,620 So let's work on this now to finish off and polish our collections view. 1244 01:16:28,620 --> 01:16:31,480 I'll come back to our terminal here. 1245 01:16:31,480 --> 01:16:35,760 And what we'll try to do is now insert into this table. 1246 01:16:35,760 --> 01:16:41,040 I'll create a new trigger, in this case, called insert when exists. 1247 01:16:41,040 --> 01:16:46,930 I'll say create trigger insert when exists. 1248 01:16:46,930 --> 01:16:49,830 So this trigger will run when an item we're trying to insert 1249 01:16:49,830 --> 01:16:52,440 is already in our collections table. 1250 01:16:52,440 --> 01:16:57,510 I'll say instead of insert on the current collections table, 1251 01:16:57,510 --> 01:17:00,090 I want this to happen instead. 1252 01:17:00,090 --> 01:17:03,940 I'll say for each row. 1253 01:17:03,940 --> 01:17:07,320 Now I'll Enter again and say when-- 1254 01:17:07,320 --> 01:17:16,650 just for style sake-- when new dot accession number is in my table 1255 01:17:16,650 --> 01:17:17,620 already. 1256 01:17:17,620 --> 01:17:20,950 I want to find which accession numbers I already have. 1257 01:17:20,950 --> 01:17:24,900 And if you remember, accession number is a unique ID for the museum. 1258 01:17:24,900 --> 01:17:27,270 It's not the ID column of my table, but a unique 1259 01:17:27,270 --> 01:17:29,598 ID to keep track of items in the museum. 1260 01:17:29,598 --> 01:17:31,890 So if I've already given this item an accession number, 1261 01:17:31,890 --> 01:17:34,260 I should already see it in my table already. 1262 01:17:34,260 --> 01:17:37,620 So I'll say then Enter, Enter, Enter for style sake, 1263 01:17:37,620 --> 01:17:48,280 select accession number from collections. 1264 01:17:48,280 --> 01:17:52,650 So this subquery tells me what accession numbers do we already have. 1265 01:17:52,650 --> 01:17:56,670 And this condition tells me is the new row I'm inserting, 1266 01:17:56,670 --> 01:17:59,640 does that have the same accession number or not? 1267 01:17:59,640 --> 01:18:02,550 Now, I'll close this subquery. 1268 01:18:02,550 --> 01:18:05,040 And I'll continue my trigger. 1269 01:18:05,040 --> 01:18:06,780 I'll say begin. 1270 01:18:06,780 --> 01:18:11,640 Well, in the case that this item already exists in my collections table, 1271 01:18:11,640 --> 01:18:12,930 I just want to update. 1272 01:18:12,930 --> 01:18:17,130 I want to flip that integer from a 1 back down to a 0. 1273 01:18:17,130 --> 01:18:26,470 So I'll say update collections and set deleted equal to 0. 1274 01:18:26,470 --> 01:18:29,850 Now, I want to do this not on my entire table, 1275 01:18:29,850 --> 01:18:36,510 but only where the accession number is equal to the new accession 1276 01:18:36,510 --> 01:18:41,440 number, that is the accession number I'm inserting here as well. 1277 01:18:41,440 --> 01:18:43,390 I'll then end like this. 1278 01:18:43,390 --> 01:18:45,690 And now I have my entire trigger. 1279 01:18:45,690 --> 01:18:50,760 I'll say instead of inserting into my view, I want to run this instead. 1280 01:18:50,760 --> 01:18:53,280 Check if I've already added this item. 1281 01:18:53,280 --> 01:18:58,200 And if I have, change that deleted column from a 1 to a 0. 1282 01:18:58,200 --> 01:19:00,030 Now I'll hit Enter. 1283 01:19:00,030 --> 01:19:03,030 And let me hit Enter again. 1284 01:19:03,030 --> 01:19:06,030 And let me check. 1285 01:19:06,030 --> 01:19:09,210 I needed to have a semicolon after my where. 1286 01:19:09,210 --> 01:19:12,300 So let me try to run this again. 1287 01:19:12,300 --> 01:19:14,280 I'll do Control D to restart. 1288 01:19:14,280 --> 01:19:16,890 And I'll say SQLite 3. 1289 01:19:16,890 --> 01:19:18,780 Let's do MFA dot DB. 1290 01:19:18,780 --> 01:19:20,800 And I'll restart this trigger. 1291 01:19:20,800 --> 01:19:26,130 So here I'll say create trigger insert when exists. 1292 01:19:26,130 --> 01:19:28,020 We gave the name before. 1293 01:19:28,020 --> 01:19:33,750 Then I'll say instead of insert on current collections, what 1294 01:19:33,750 --> 01:19:35,460 I want to do is run this. 1295 01:19:35,460 --> 01:19:39,330 I'll say for each row I insert, check some condition. 1296 01:19:39,330 --> 01:19:43,680 When the new accession number that is the unique ID the museum 1297 01:19:43,680 --> 01:19:48,330 has for this item is in my already available accession numbers. 1298 01:19:48,330 --> 01:20:00,430 I'll say select in this case, select accession number from collections. 1299 01:20:00,430 --> 01:20:03,010 And then I'll close the subquery. 1300 01:20:03,010 --> 01:20:06,760 Now, I'll say I want to begin the same way I want to run. 1301 01:20:06,760 --> 01:20:11,770 Begin now update, in this case, my collections table and 1302 01:20:11,770 --> 01:20:14,500 set the deleted column from a 1 back down to a 0. 1303 01:20:14,500 --> 01:20:24,560 Update collections and set deleted equals 0 where, in this case, 1304 01:20:24,560 --> 01:20:30,130 the accession number is equal to the new accession number, 1305 01:20:30,130 --> 01:20:32,900 only modifying that particular row. 1306 01:20:32,900 --> 01:20:35,350 Now semicolon to close this statement. 1307 01:20:35,350 --> 01:20:39,320 And then I'll say end below, a second semicolon, Enter. 1308 01:20:39,320 --> 01:20:43,000 Now if I type dot schema, I should see my new trigger 1309 01:20:43,000 --> 01:20:47,650 to insert into my collections table when I try to insert into my view. 1310 01:20:47,650 --> 01:20:48,790 So let's try this. 1311 01:20:48,790 --> 01:20:54,323 I'll say maybe I want to insert into my current collections. 1312 01:20:54,323 --> 01:20:55,240 Well, let me try this. 1313 01:20:55,240 --> 01:21:01,180 I'll say select star from collections. 1314 01:21:01,180 --> 01:21:05,830 And let's say I want to reinsert, in this case, Profusion of Flowers. 1315 01:21:05,830 --> 01:21:09,760 So I'll say insert into my current collections. 1316 01:21:09,760 --> 01:21:14,300 Actually, reinsert into current collections. 1317 01:21:14,300 --> 01:21:17,200 And I want to insert Imaginative Landscape. 1318 01:21:17,200 --> 01:21:21,310 So I'll say insert into current collections some set of values along 1319 01:21:21,310 --> 01:21:22,070 with some columns. 1320 01:21:22,070 --> 01:21:25,900 I'll say the title is Perfusion of Flowers as Imaginative Landscape 1321 01:21:25,900 --> 01:21:32,905 later on, then accession number, then the date it was acquired, like this. 1322 01:21:32,905 --> 01:21:35,590 Now I'll hit Enter and provide the values. 1323 01:21:35,590 --> 01:21:40,030 I'll say the title is Imaginative Landscape like this. 1324 01:21:40,030 --> 01:21:43,450 The accession number is 56.496. 1325 01:21:43,450 --> 01:21:47,090 And the acquired date is going to be null, in this case. 1326 01:21:47,090 --> 01:21:48,820 We don't know when we got it. 1327 01:21:48,820 --> 01:21:51,520 I'll hit semicolon here and Enter. 1328 01:21:51,520 --> 01:21:55,390 So if I select star from collections now, select star from collections 1329 01:21:55,390 --> 01:22:00,040 semicolon, I should see that I didn't actually insert a new row. 1330 01:22:00,040 --> 01:22:05,140 I only flipped that deleted value from a 1 back down to a 0. 1331 01:22:05,140 --> 01:22:07,700 Now, there's certainly more to do here. 1332 01:22:07,700 --> 01:22:12,170 I could write a trigger to actually add a brand new row to this table here. 1333 01:22:12,170 --> 01:22:16,180 I could also try to update this data, change, for example, the acquired 1334 01:22:16,180 --> 01:22:19,780 date from null to a date that we actually do know. 1335 01:22:19,780 --> 01:22:21,850 For now though, we leave all that up to you. 1336 01:22:21,850 --> 01:22:24,630 And we'll see you next time. 1337 01:22:24,630 --> 01:22:26,000