1
00:00:00,000 --> 00:00:17,330


2
00:00:17,330 --> 00:00:20,460
CARTER ZENKE: Well, hello one and
all, and welcome back to CS50--

3
00:00:20,460 --> 00:00:22,940
Instruction to Databases with SQL.

4
00:00:22,940 --> 00:00:26,030
My name is Carter Zenke
and up until now you've

5
00:00:26,030 --> 00:00:28,460
been learning how to
design databases and how

6
00:00:28,460 --> 00:00:32,420
to add data to them, like how
to insert, update, and delete

7
00:00:32,420 --> 00:00:34,580
some values all together.

8
00:00:34,580 --> 00:00:37,070
Now, along the way, we've
added some complexity.

9
00:00:37,070 --> 00:00:41,120
That is, we went from one table to
multiple, , and we had relationships,

10
00:00:41,120 --> 00:00:43,860
too, among all those tables, as well.

11
00:00:43,860 --> 00:00:47,930
Now, what we'll introduce today is a
way to simplify some of that complexity

12
00:00:47,930 --> 00:00:51,890
and give you a way to see your
data all the more clearly-- .

13
00:00:51,890 --> 00:00:54,230
this tool called a view.

14
00:00:54,230 --> 00:00:58,250
Now, we'll begin with a familiar
data set-- this data set of books.

15
00:00:58,250 --> 00:01:01,730
And we learn about this data
set of books back in week zero.

16
00:01:01,730 --> 00:01:04,430
If you remember, we
had a table of 78 books

17
00:01:04,430 --> 00:01:08,330
that had all been nominated for
the International Booker Prize.

18
00:01:08,330 --> 00:01:11,030
It was a single table,
just of these books.

19
00:01:11,030 --> 00:01:15,170
Now quickly, in week one, we introduced
more than just this single table.

20
00:01:15,170 --> 00:01:20,000
We actually had multiple tables-- a
table for books, a table for authors,

21
00:01:20,000 --> 00:01:25,670
for example, and also some relationship
between those authors and those books.

22
00:01:25,670 --> 00:01:31,370
We said that an author could write
not just one book, but multiple books.

23
00:01:31,370 --> 00:01:35,420
And similarly, a book could be
written by not just one author,

24
00:01:35,420 --> 00:01:37,320
but multiple as well.

25
00:01:37,320 --> 00:01:40,650
This is what we call a
many-to-many relationship.

26
00:01:40,650 --> 00:01:45,080
Now, we also saw this not just
visually, but in our database,

27
00:01:45,080 --> 00:01:46,790
in terms of tables, as well.

28
00:01:46,790 --> 00:01:52,520
We had this kind of schema, where we had
a table for authors, a table for books,

29
00:01:52,520 --> 00:01:57,410
and then a table in the middle, to talk
about the relationship between authors

30
00:01:57,410 --> 00:02:00,440
and books-- which
author wrote which book.

31
00:02:00,440 --> 00:02:05,270
We can see this in the author
table, looking at these IDs here.

32
00:02:05,270 --> 00:02:11,060
And now to recap, if I wanted to find
the book that Han here has written,

33
00:02:11,060 --> 00:02:14,010
let me ask, how could I do that?

34
00:02:14,010 --> 00:02:17,690
How could I find the book that
Han, in the authors table,

35
00:02:17,690 --> 00:02:21,890
has written, over here
in the books table?

36
00:02:21,890 --> 00:02:24,320
Feel free to raise your hand.

37
00:02:24,320 --> 00:02:26,768
AUDIENCE: By looking
at the author_id, 31.

38
00:02:26,768 --> 00:02:27,560
CARTER ZENKE: Nice.

39
00:02:27,560 --> 00:02:28,760
So start by looking at the author_id.

40
00:02:28,760 --> 00:02:29,880
That's a great idea.

41
00:02:29,880 --> 00:02:33,350
So here we see that
Han's author_id is 31.

42
00:02:33,350 --> 00:02:37,700
And we could make this link between the
authors table and the authored table.

43
00:02:37,700 --> 00:02:40,160
So I'll highlight this here,
so we can see it visually.

44
00:02:40,160 --> 00:02:42,080
Han has the ID 31.

45
00:02:42,080 --> 00:02:46,700
And here on the authored table,
we see the author_id of 31 next

46
00:02:46,700 --> 00:02:49,280
to, actually, some book_id.

47
00:02:49,280 --> 00:02:51,050
This book_id is 74.

48
00:02:51,050 --> 00:02:53,840
Well what book has that ID, 74?

49
00:02:53,840 --> 00:02:57,830
Turns out it's the white
book, all the way over here.

50
00:02:57,830 --> 00:03:02,790
Now, this is adding some
complexity to our queries.

51
00:03:02,790 --> 00:03:05,510
If I want to find who
wrote what book, I have

52
00:03:05,510 --> 00:03:09,080
to go through the authors table,
then the authored table, and then

53
00:03:09,080 --> 00:03:09,800
the book table.

54
00:03:09,800 --> 00:03:12,390
That's just a lot of
steps to go through.

55
00:03:12,390 --> 00:03:15,530
And we could, though,
try to simplify this,

56
00:03:15,530 --> 00:03:19,100
to have a better view
of this data, to see it

57
00:03:19,100 --> 00:03:23,600
with just a single column for authors
and a single column for title.

58
00:03:23,600 --> 00:03:27,170
So let's try to visualize this together.

59
00:03:27,170 --> 00:03:31,760
What I might do first is try to flip
around this primary key of authors,

60
00:03:31,760 --> 00:03:32,810
this ID column.

61
00:03:32,810 --> 00:03:36,980
Let's put it closer to the author table,
just visually, metaphorically, here.

62
00:03:36,980 --> 00:03:42,200
Now I might see that these
IDs line, up one to another.

63
00:03:42,200 --> 00:03:46,910
And we saw in a previous lecture
that I could use a join in SQL

64
00:03:46,910 --> 00:03:50,420
to put these tables into a single table.

65
00:03:50,420 --> 00:03:55,550
And what a join does, if you remember,
is take the primary key in one table,

66
00:03:55,550 --> 00:03:59,655
and the foreign key in another,
and just find which ones match,

67
00:03:59,655 --> 00:04:01,280
to put the rows of each table together.

68
00:04:01,280 --> 00:04:04,290
A bit like touching your
fingertips together, like this.

69
00:04:04,290 --> 00:04:06,600
So I'll then visually
move these together,

70
00:04:06,600 --> 00:04:09,410
join these tables into a single table.

71
00:04:09,410 --> 00:04:13,100
And notice how if I, now,
just get rid of these IDs--

72
00:04:13,100 --> 00:04:15,090
metaphorically, just visually here--

73
00:04:15,090 --> 00:04:21,140
I now have a much more simple
table, a simple view of this data.

74
00:04:21,140 --> 00:04:24,950
And if I want to find which book Han
has written, well, I could do that.

75
00:04:24,950 --> 00:04:26,870
Just look at a single row here.

76
00:04:26,870 --> 00:04:31,880
Look at the name column and
then the title column here.

77
00:04:31,880 --> 00:04:37,190
Now, what we're doing, by creating
this new view of our data set,

78
00:04:37,190 --> 00:04:39,200
is actually creating a view.

79
00:04:39,200 --> 00:04:43,850
And a view is defined by
this here-- a virtual table

80
00:04:43,850 --> 00:04:46,700
that is itself defined by a query.

81
00:04:46,700 --> 00:04:48,290
Now, what does this mean?

82
00:04:48,290 --> 00:04:49,430
What is a virtual table?

83
00:04:49,430 --> 00:04:50,330
What is a query?

84
00:04:50,330 --> 00:04:53,310
Let's go back a few steps
and see this in action.

85
00:04:53,310 --> 00:04:56,030
So we started with, in
this case, three tables--

86
00:04:56,030 --> 00:04:58,730
authors, books, and authored.

87
00:04:58,730 --> 00:05:04,160
And presumably, there's some query here
that gets us from point A, these three

88
00:05:04,160 --> 00:05:07,400
tables, to point B, this single table.

89
00:05:07,400 --> 00:05:11,880
That query might involve some selects,
some join we talked about earlier.

90
00:05:11,880 --> 00:05:17,350
But there is some query we can write
to get us from point A to point B.

91
00:05:17,350 --> 00:05:22,540
And notice here that the result
of this query is itself a table.

92
00:05:22,540 --> 00:05:26,830
Like, this has rows and
columns and what we can do

93
00:05:26,830 --> 00:05:32,030
is see all of those in terms of
a table we could later on query.

94
00:05:32,030 --> 00:05:38,020
So a view is a way of writing some
query to see our data in a new way

95
00:05:38,020 --> 00:05:44,540
and then saving this view of our
data, so we can query it later on.

96
00:05:44,540 --> 00:05:47,050
Now, why might you want to build a view?

97
00:05:47,050 --> 00:05:49,330
Well, you might want to
for a few reasons here.

98
00:05:49,330 --> 00:05:53,200
You could use it to simplify
your data, as we just saw here.

99
00:05:53,200 --> 00:05:57,050
You can use it to aggregate your
data, to try to sum up some values

100
00:05:57,050 --> 00:05:58,820
and store that in a single table.

101
00:05:58,820 --> 00:06:02,710
You could use it to partition your
data, or divide it into logical pieces.

102
00:06:02,710 --> 00:06:05,560
And you could even use
it to secure your data,

103
00:06:05,560 --> 00:06:09,580
to perhaps hide some columns
you don't want somebody to see.

104
00:06:09,580 --> 00:06:11,890
There are more reasons,
too, to use views.

105
00:06:11,890 --> 00:06:14,420
We'll focus on these four today.

106
00:06:14,420 --> 00:06:18,110
And let's begin by
focusing on simplifying.

107
00:06:18,110 --> 00:06:23,920
So we saw earlier a way to simplify
these multiple tables down in to one.

108
00:06:23,920 --> 00:06:28,330
If I go back here, we'll see, we
got from this set of three tables

109
00:06:28,330 --> 00:06:33,310
to a single table, that had both
author names and author titles.

110
00:06:33,310 --> 00:06:38,440
So let's see here exactly how
much we're reducing complexity

111
00:06:38,440 --> 00:06:40,900
by trying out different
queries, and then

112
00:06:40,900 --> 00:06:45,800
adding in a view to see all the
benefits of this new view we have here.

113
00:06:45,800 --> 00:06:46,570
So we'll go back.

114
00:06:46,570 --> 00:06:49,090
And we'll see, in this
case, our three tables--

115
00:06:49,090 --> 00:06:51,700
authors, books, and authored.

116
00:06:51,700 --> 00:06:56,000
And let's say I want to find the
books a particular author has written.

117
00:06:56,000 --> 00:06:58,540
I could write a query to do just that.

118
00:06:58,540 --> 00:07:02,600
And let's go back to SQLite here,
to run that query ourselves.

119
00:07:02,600 --> 00:07:05,410
I'll come back to my
computer, here in SQLite.

120
00:07:05,410 --> 00:07:09,940
And my goal will be to find which
author wrote which particular books.

121
00:07:09,940 --> 00:07:13,770
So if I go to my database, here, I can
go to my terminal, and type sqlite3

122
00:07:13,770 --> 00:07:18,700
longlist.db, to reopen this database.

123
00:07:18,700 --> 00:07:21,430
And now, I should see,
if I type .schema,

124
00:07:21,430 --> 00:07:23,560
all the tables that are inside of it.

125
00:07:23,560 --> 00:07:27,580
I have a table for authors, if I scroll
up a little bit, a table for books,

126
00:07:27,580 --> 00:07:33,340
and a table for authored-- that is,
relationship between authors and books.

127
00:07:33,340 --> 00:07:37,240
Now, let's say I want
to write a query to find

128
00:07:37,240 --> 00:07:40,570
which books Fernanda Melchor wrote.

129
00:07:40,570 --> 00:07:44,500
Well, I probably first need to
find Fernanda's ID, as we just

130
00:07:44,500 --> 00:07:47,210
saw visually in our earlier example.

131
00:07:47,210 --> 00:07:49,570
So let's find Fernanda's ID.

132
00:07:49,570 --> 00:07:53,920
I could say SELECT, let's say,
id FROM the authors table,

133
00:07:53,920 --> 00:07:56,980
where the name equals Fernanda Melchor.

134
00:07:56,980 --> 00:07:58,930
So I'll say SELECT id FROM authors.

135
00:07:58,930 --> 00:08:06,610
Then, WHERE the "name" equals
Fernanda-- 'Fernanda Melchor' semicolon.

136
00:08:06,610 --> 00:08:10,330
So it seems Fernanda's ID is 24.

137
00:08:10,330 --> 00:08:13,810
And I could use this ID
in the rest of my query,

138
00:08:13,810 --> 00:08:15,920
but I could probably
do better than this.

139
00:08:15,920 --> 00:08:18,130
I could use what we
saw before, something

140
00:08:18,130 --> 00:08:21,550
called a subquery, to take
this query I just wrote

141
00:08:21,550 --> 00:08:24,560
and put it inside some other one here.

142
00:08:24,560 --> 00:08:29,560
So let's take the next step and find
the book_ids that Fernando wrote.

143
00:08:29,560 --> 00:08:34,090
So let's try looking in the authored
table for that, which has author_ids

144
00:08:34,090 --> 00:08:37,419
and book_ids, side-by-side.

145
00:08:37,419 --> 00:08:43,929
OK, I'll SELECT, in this case, the
book_id from the authored table, WHERE,

146
00:08:43,929 --> 00:08:48,940
in this case, the author_id is
equal to-- well, I could say 24,

147
00:08:48,940 --> 00:08:50,110
but I want to do better--

148
00:08:50,110 --> 00:08:52,270
I could actually have a
very own subquery here

149
00:08:52,270 --> 00:08:54,430
to find Fernanda's ID dynamically.

150
00:08:54,430 --> 00:08:58,810
I could put parentheses here, then
hit one, two, three, four, spaces,

151
00:08:58,810 --> 00:09:05,560
and say SELECT id from
the authors table, WHERE

152
00:09:05,560 --> 00:09:08,530
the name equals Fernanda Melchor--

153
00:09:08,530 --> 00:09:10,390
that same query from before.

154
00:09:10,390 --> 00:09:13,360
I'll close this out,
close out my subquery.

155
00:09:13,360 --> 00:09:20,090
And then if I hit Enter, I should see
now the book IDs Fernanda has written.

156
00:09:20,090 --> 00:09:22,900
But there's still one more step here.

157
00:09:22,900 --> 00:09:24,580
What do I now need to do?

158
00:09:24,580 --> 00:09:28,060
I have book IDs that
Fernando wrote but how

159
00:09:28,060 --> 00:09:33,160
do you think I could find the
titles from those book IDs?

160
00:09:33,160 --> 00:09:35,698
AUDIENCE: You could select
the title from the books.

161
00:09:35,698 --> 00:09:36,490
CARTER ZENKE: Nice.

162
00:09:36,490 --> 00:09:38,780
When you select title
from that books table--

163
00:09:38,780 --> 00:09:42,490
so if you remember, we have
titles inside of our books table,

164
00:09:42,490 --> 00:09:46,240
but we still need to link
those titles through their ID

165
00:09:46,240 --> 00:09:48,380
to the book IDs Fernanda has written.

166
00:09:48,380 --> 00:09:50,410
So let's do one last step here.

167
00:09:50,410 --> 00:09:52,670
I'll come back to my terminal.

168
00:09:52,670 --> 00:09:54,940
And I'll write the query in full now.

169
00:09:54,940 --> 00:09:57,430
So I know I want to end up with titles.

170
00:09:57,430 --> 00:10:02,230
I'll say SELECT, in this case,
title from the books table.

171
00:10:02,230 --> 00:10:05,890
Now, I want to find
particular titles where

172
00:10:05,890 --> 00:10:09,910
the ID is in that list of
books Fernanda has written.

173
00:10:09,910 --> 00:10:16,660
So I'll say WHERE the id is IN some
subquery, the query we actually

174
00:10:16,660 --> 00:10:17,590
just wrote.

175
00:10:17,590 --> 00:10:18,820
So let me write it again.

176
00:10:18,820 --> 00:10:22,390
I'll say one, two, three, four
spaces here to indent for style.

177
00:10:22,390 --> 00:10:29,110
Then I'll say SELECT the book_id from
the authored table, in this case,

178
00:10:29,110 --> 00:10:34,180
WHERE the author_id is
equal to some number--

179
00:10:34,180 --> 00:10:36,370
in this case, Fernanda's ID.

180
00:10:36,370 --> 00:10:38,170
But what is Fernanda's ID?

181
00:10:38,170 --> 00:10:39,190
Let's find out.

182
00:10:39,190 --> 00:10:44,140
I'll make another subquery here
to find Fernanda's ID, indenting

183
00:10:44,140 --> 00:10:46,270
four times just for style here.

184
00:10:46,270 --> 00:10:50,200
Then I'll say SELECT id
from the authors table--

185
00:10:50,200 --> 00:10:51,640
and then again indent--

186
00:10:51,640 --> 00:10:56,380
WHERE the name equals Fernanda Melchor.

187
00:10:56,380 --> 00:11:01,600
Then I'll close out all my queries to
create this single long query to find

188
00:11:01,600 --> 00:11:04,370
all the books Fernanda has written.

189
00:11:04,370 --> 00:11:08,170
And, again, the order here
is first find Fernanda's ID,

190
00:11:08,170 --> 00:11:11,440
then find the book IDs
associated with that ID,

191
00:11:11,440 --> 00:11:16,300
then find the titles associated
with those book IDs themselves.

192
00:11:16,300 --> 00:11:23,110
Now I'll hit Enter and I should see all
those books that Fernanda has written.

193
00:11:23,110 --> 00:11:26,150
Now, as we talked about,
this is a lot of complexity,

194
00:11:26,150 --> 00:11:28,480
a lot of tables to go
through, and there's probably

195
00:11:28,480 --> 00:11:30,110
a better way to do this.

196
00:11:30,110 --> 00:11:32,590
So let me open up a new
terminal and try out

197
00:11:32,590 --> 00:11:35,140
a different way of going about this.

198
00:11:35,140 --> 00:11:37,960
I'll make a new
connection to my database.

199
00:11:37,960 --> 00:11:43,420
I'll make a new terminal and I'll
say sqlite3 longlist.db, hit Enter.

200
00:11:43,420 --> 00:11:48,160
Now I should see that I have a
brand new connection to my database.

201
00:11:48,160 --> 00:11:52,220
And I could try to find
a better way to do this.

202
00:11:52,220 --> 00:11:56,770
Let me try to do what we did visually
and join these tables together.

203
00:11:56,770 --> 00:12:04,820
Well I could try SELECT name and
title FROM the authors table.

204
00:12:04,820 --> 00:12:09,100
But notice how authors table
doesn't have titles in it.

205
00:12:09,100 --> 00:12:12,850
It does have the name of the author,
but it doesn't have book title.

206
00:12:12,850 --> 00:12:17,060
So to get titles, I have to
join in some new table here.

207
00:12:17,060 --> 00:12:24,490
I'll hit Enter and I'll say JOIN, let's
say, the authored table on authors

208
00:12:24,490 --> 00:12:30,820
dot id equals authored dot author_id.

209
00:12:30,820 --> 00:12:36,610
And all I'm doing here is saying I'll
take first the authors table and then

210
00:12:36,610 --> 00:12:40,540
why don't I try to combine this
table with the authored table,

211
00:12:40,540 --> 00:12:44,320
linking it up by looking at
the authors dot ID column

212
00:12:44,320 --> 00:12:47,050
and the authored dot author_id column.

213
00:12:47,050 --> 00:12:50,380
And visually, let's take a look
what we're doing here in this join.

214
00:12:50,380 --> 00:12:54,640
If I look at this in my
slides here, I should see--

215
00:12:54,640 --> 00:12:56,980
I started with my authors table.

216
00:12:56,980 --> 00:13:03,370
Here I had names of those authors and I
had an ID column in that author table.

217
00:13:03,370 --> 00:13:06,850
Now I'm trying to join
the authored table

218
00:13:06,850 --> 00:13:12,940
by telling SQL that these two columns,
author_id and id, should align.

219
00:13:12,940 --> 00:13:17,170
They should line up like this
so their rows are all together.

220
00:13:17,170 --> 00:13:19,030
But there's still one more step.

221
00:13:19,030 --> 00:13:24,260
I've joined the author table
and the authored table,

222
00:13:24,260 --> 00:13:27,940
but what's the next step
to find book titles?

223
00:13:27,940 --> 00:13:31,280
I still have to join the books table.

224
00:13:31,280 --> 00:13:32,800
So let's try that here.

225
00:13:32,800 --> 00:13:36,970
I'll come back over and instead of
joining just the authored table,

226
00:13:36,970 --> 00:13:39,970
let me now try to also join--

227
00:13:39,970 --> 00:13:41,530
let me hit Enter--

228
00:13:41,530 --> 00:13:44,080
to also join the books table.

229
00:13:44,080 --> 00:13:55,960
I'll JOIN books ON books dot id equals,
in this case, authored dot book_id.

230
00:13:55,960 --> 00:14:00,760
And to be clear, this line here
means that what I'm trying to do

231
00:14:00,760 --> 00:14:07,270
is now join this book table, which
is over here, with this author table,

232
00:14:07,270 --> 00:14:11,530
aligning it so that their IDs
match up into a single row,

233
00:14:11,530 --> 00:14:17,180
as we see here with 74 and
74, 1 and 1, and so on.

234
00:14:17,180 --> 00:14:17,950
I'll come back.

235
00:14:17,950 --> 00:14:20,690
And now let me try this.

236
00:14:20,690 --> 00:14:21,700
Come back over here.

237
00:14:21,700 --> 00:14:28,300
And I'll just try hitting Enter to
see what I can see from this query.

238
00:14:28,300 --> 00:14:31,000
Let me zoom out a little
bit and you'll see

239
00:14:31,000 --> 00:14:38,020
I actually do have author names next
to book titles all in one table,

240
00:14:38,020 --> 00:14:40,970
just as we thought we would.

241
00:14:40,970 --> 00:14:47,680
So this is useful to me, but what if
I wanted to have this stored as a way

242
00:14:47,680 --> 00:14:49,670
to see my tables?

243
00:14:49,670 --> 00:14:54,530
I want to be able to query this
kind of table I see right here.

244
00:14:54,530 --> 00:14:59,230
Well for that, I could
try to introduce a view.

245
00:14:59,230 --> 00:15:03,670
Now a view starts with
a query, as we just saw.

246
00:15:03,670 --> 00:15:08,980
I wrote a query to select names and
titles from three different tables,

247
00:15:08,980 --> 00:15:10,880
joining them together.

248
00:15:10,880 --> 00:15:14,180
But if I want to save
the results of that query

249
00:15:14,180 --> 00:15:19,700
so I can query them later on, I
could use this syntax here create

250
00:15:19,700 --> 00:15:26,310
a view that has a name that I can give
to it as this query down below here.

251
00:15:26,310 --> 00:15:30,230
So CREATE VIEW given some name
AS the result of this query

252
00:15:30,230 --> 00:15:35,400
gives me a view that is then
saved and part of my schema.

253
00:15:35,400 --> 00:15:40,190
So let's try that here to create a
view by which you can see both author

254
00:15:40,190 --> 00:15:43,220
and book titles side by side.

255
00:15:43,220 --> 00:15:48,650
I'll come back to SQLite here and I'm
already in this brand new terminal.

256
00:15:48,650 --> 00:15:54,740
So I could try to remake this view
and save it now for later usage.

257
00:15:54,740 --> 00:15:58,760
I'll say create a view
and call it longlist,

258
00:15:58,760 --> 00:16:03,260
the same name we gave our
single table back in Week 0.

259
00:16:03,260 --> 00:16:10,430
Now I'll say this view is the
result of this following query AS,

260
00:16:10,430 --> 00:16:16,400
let's say, SELECT name and
title from the authors table--

261
00:16:16,400 --> 00:16:18,650
but not just the author's table--

262
00:16:18,650 --> 00:16:22,010
join, let's say, the authored table--

263
00:16:22,010 --> 00:16:30,530
JOIN authored on authors dot id
equals authored dot author_id.

264
00:16:30,530 --> 00:16:34,760
And then join again our book
table as we saw visually here.

265
00:16:34,760 --> 00:16:46,310
JOIN books ON books dot id
equals authored dot book_id.

266
00:16:46,310 --> 00:16:52,220
Now that is the query I used before to
see author names and book titles side

267
00:16:52,220 --> 00:16:53,000
by side.

268
00:16:53,000 --> 00:16:55,130
If I create this as a view--

269
00:16:55,130 --> 00:16:57,230
hitting semicolon and Enter--

270
00:16:57,230 --> 00:16:58,910
nothing quite happens.

271
00:16:58,910 --> 00:17:07,579
But now if I type .schema and hit Enter,
I should see a view called longlist

272
00:17:07,579 --> 00:17:12,560
that has two columns, name and title,
and actually see the query I use

273
00:17:12,560 --> 00:17:15,470
to create this view.

274
00:17:15,470 --> 00:17:17,390
So let me think now.

275
00:17:17,390 --> 00:17:20,030
I want to see what's inside this view.

276
00:17:20,030 --> 00:17:24,290
And to query a view I can use
it exactly as I would a table.

277
00:17:24,290 --> 00:17:30,380
I could say SELECT star FROM
longlist semicolon hit Enter,

278
00:17:30,380 --> 00:17:36,300
and now I should see author names
and book titles side by side.

279
00:17:36,300 --> 00:17:42,620
So if I have this kind of table with
authors and book title side by side,

280
00:17:42,620 --> 00:17:48,110
what then does my query become to
find the books written by Fernanda?

281
00:17:48,110 --> 00:17:50,510
Let me ask our audience here.

282
00:17:50,510 --> 00:17:57,170
What query could I use now on this view
to find the books written by Fernanda?

283
00:17:57,170 --> 00:17:58,820
AUDIENCE: Thank you, Mr. [INAUDIBLE].

284
00:17:58,820 --> 00:18:05,787
We can query the book or all of
the books using her specific name.

285
00:18:05,787 --> 00:18:06,870
CARTER ZENKE: A good idea.

286
00:18:06,870 --> 00:18:10,882
I can use a particular name in this case
and perhaps just use a simple WHERE.

287
00:18:10,882 --> 00:18:11,840
So let's try that here.

288
00:18:11,840 --> 00:18:13,740
I'll go back to my terminal.

289
00:18:13,740 --> 00:18:16,850
And if I want to find the
books written by Fernanda,

290
00:18:16,850 --> 00:18:19,310
well, I could just query for
them in this single table

291
00:18:19,310 --> 00:18:23,270
that I created called longlist-- or
not really a table but now a view.

292
00:18:23,270 --> 00:18:28,115
I'll say, in this case,
SELECT the title from--

293
00:18:28,115 --> 00:18:31,370
SELECT title from longlist WHERE--

294
00:18:31,370 --> 00:18:32,460
to Muhammad's point--

295
00:18:32,460 --> 00:18:40,340
I could just say name equals
Fernanda Melchor semicolon.

296
00:18:40,340 --> 00:18:43,850
I'll hit Enter here and
I see those same books

297
00:18:43,850 --> 00:18:47,990
Fernanda has written Paradais
and Hurricane Season.

298
00:18:47,990 --> 00:18:49,940
So let's see where we started here.

299
00:18:49,940 --> 00:18:52,880
We started with this kind of query--

300
00:18:52,880 --> 00:18:57,650
long, multiple lines, many
subqueries involved-- just

301
00:18:57,650 --> 00:19:01,520
to find the answer to which
books Fernanda had written.

302
00:19:01,520 --> 00:19:06,620
Now, though, that we have a new
view, I can simplify this query

303
00:19:06,620 --> 00:19:08,210
to just this here--

304
00:19:08,210 --> 00:19:12,740
SELECT title FROM longlist
WHERE the name equals Fernanda.

305
00:19:12,740 --> 00:19:18,170
And I can do all of this without
using much, if any, more disk space.

306
00:19:18,170 --> 00:19:21,560
A view, remember, is a
virtual table, not a real one.

307
00:19:21,560 --> 00:19:25,220
And I can then have the result
organized like this while still

308
00:19:25,220 --> 00:19:30,840
being in those underlying tables
stored essentially elsewhere.

309
00:19:30,840 --> 00:19:32,150
So let me ask then--

310
00:19:32,150 --> 00:19:37,670
what questions we have on how
views can simplify our data?

311
00:19:37,670 --> 00:19:43,580
AUDIENCE: Can we then somehow manipulate
the views to maybe order the results

312
00:19:43,580 --> 00:19:46,185
or display them just differently?

313
00:19:46,185 --> 00:19:47,310
CARTER ZENKE: A great idea.

314
00:19:47,310 --> 00:19:49,760
So if I wanted to maybe
order these books,

315
00:19:49,760 --> 00:19:53,700
I could absolutely do that in the
same way I would do in a table.

316
00:19:53,700 --> 00:19:55,670
So let me go back and
show you how to do that.

317
00:19:55,670 --> 00:20:00,390
Let's say I want to organize these
books alphabetically by title.

318
00:20:00,390 --> 00:20:04,910
So if you remember, using our regular
old table, we could use ORDER BY.

319
00:20:04,910 --> 00:20:06,510
I could do the same thing here.

320
00:20:06,510 --> 00:20:13,520
I could say SELECT both the name and
title columns from my longlist table.

321
00:20:13,520 --> 00:20:20,060
And now let me try to ORDER BY
the title column in this case

322
00:20:20,060 --> 00:20:21,430
and I'll hit semicolon.

323
00:20:21,430 --> 00:20:28,300
Now, if I hit Enter, I should see the
book titles organized alphabetically.

324
00:20:28,300 --> 00:20:31,690
Now, the view itself does
not have these titles

325
00:20:31,690 --> 00:20:36,190
automatically alphabetically sorted
for me, but I could make it do that.

326
00:20:36,190 --> 00:20:41,290
If I had, when I was making this view
included ORDER BY, let's say, title

327
00:20:41,290 --> 00:20:46,480
at the end of that query, I would then
have those titles saved in this view

328
00:20:46,480 --> 00:20:49,900
automatically as A to
Z. So very powerful

329
00:20:49,900 --> 00:20:53,200
if you want to not just find
particular ways of seeing your data

330
00:20:53,200 --> 00:20:57,080
but also organizing
it or ordering it too.

331
00:20:57,080 --> 00:21:02,080
So with that note, we've seen how
to simplify one example of having

332
00:21:02,080 --> 00:21:04,360
multiple tables down into one.

333
00:21:04,360 --> 00:21:07,990
We'll come back in just a few and
see what we can aggregate data too,

334
00:21:07,990 --> 00:21:10,180
putting that data into
a single table for us

335
00:21:10,180 --> 00:21:12,850
to see some statistics and such as well.

336
00:21:12,850 --> 00:21:15,040
Come back in a few.

337
00:21:15,040 --> 00:21:16,300
And we're back.

338
00:21:16,300 --> 00:21:21,070
So we just saw how to simplify some
data, how to take multiple tables

339
00:21:21,070 --> 00:21:23,350
and see them as one.

340
00:21:23,350 --> 00:21:26,230
What we'll do now is think
about how we can aggregate

341
00:21:26,230 --> 00:21:29,050
data, that is to take many
individual data points

342
00:21:29,050 --> 00:21:33,100
and summarize them to find an
average, or a min, or a max,

343
00:21:33,100 --> 00:21:36,640
but see those results inside of a view.

344
00:21:36,640 --> 00:21:42,250
So if you remember in our longlist
database, we had a table of ratings.

345
00:21:42,250 --> 00:21:47,290
And in this case, books had individual
ratings from individual people,

346
00:21:47,290 --> 00:21:50,950
like maybe I gave book_id
1 a four out of five stars,

347
00:21:50,950 --> 00:21:53,530
but David gave it only a three.

348
00:21:53,530 --> 00:21:57,070
Now, we store these individual
ratings in our database,

349
00:21:57,070 --> 00:22:00,700
but ultimately I don't care
so much what I rated it,

350
00:22:00,700 --> 00:22:04,540
or maybe what David rated it, I care
about the average rating, let's say,

351
00:22:04,540 --> 00:22:06,760
for each book.

352
00:22:06,760 --> 00:22:10,480
But if we care about those
averages, why then would

353
00:22:10,480 --> 00:22:13,270
we store all the individual ratings?

354
00:22:13,270 --> 00:22:15,490
Let's think of this as
a design question here.

355
00:22:15,490 --> 00:22:17,920
Can you think of a reason
we might want to actually

356
00:22:17,920 --> 00:22:22,450
have these individual ratings
stored in our database?

357
00:22:22,450 --> 00:22:25,360
What does that give us
at the end of the day?

358
00:22:25,360 --> 00:22:29,458
AUDIENCE: Perhaps reliability, to
know how many persons rated the book.

359
00:22:29,458 --> 00:22:30,250
CARTER ZENKE: Nice.

360
00:22:30,250 --> 00:22:32,080
So it's nice to have some more data.

361
00:22:32,080 --> 00:22:35,110
I could know how, let's say, I rated
a book or how you rated a book.

362
00:22:35,110 --> 00:22:36,580
That's good here too.

363
00:22:36,580 --> 00:22:39,550
And I would argue that from
having this additional data,

364
00:22:39,550 --> 00:22:42,430
we could compute more statistics.

365
00:22:42,430 --> 00:22:45,460
I could find not just the
average rating but also

366
00:22:45,460 --> 00:22:49,520
the minimum rating, or the maximum
rating, or the median rating,

367
00:22:49,520 --> 00:22:50,060
and so on.

368
00:22:50,060 --> 00:22:53,590
So these individual results
allow us to find and compute

369
00:22:53,590 --> 00:22:57,250
more statistics, like
averages, mins, and maxes,

370
00:22:57,250 --> 00:23:00,880
as opposed to just storing the average.

371
00:23:00,880 --> 00:23:05,890
Now, if I wanted to find the average
of these results, what could I do?

372
00:23:05,890 --> 00:23:06,880
We saw before--

373
00:23:06,880 --> 00:23:11,470
I could try to group ratings
buy individual book IDs.

374
00:23:11,470 --> 00:23:15,400
Let's say I want to find the
ratings for book_id 1 and book_id 2.

375
00:23:15,400 --> 00:23:20,620
I could group by the book ID column,
turning these into individual groups.

376
00:23:20,620 --> 00:23:25,120
And let's say I want to find the
average here using a SQL query.

377
00:23:25,120 --> 00:23:28,810
I could say find me the
average of the rating column

378
00:23:28,810 --> 00:23:31,840
when I have groups by book ID.

379
00:23:31,840 --> 00:23:35,960
That effectively takes my
table like this and says,

380
00:23:35,960 --> 00:23:40,240
OK, I'll combine all the ratings
for each individual book ID

381
00:23:40,240 --> 00:23:42,670
and find for you the average.

382
00:23:42,670 --> 00:23:46,510
I'll first compress this data
together, literally putting it one on

383
00:23:46,510 --> 00:23:50,180
top the other here, and then find
the average of all those results--

384
00:23:50,180 --> 00:23:55,000
let's say, 3.67 book
1 and 2.5 for book 2--

385
00:23:55,000 --> 00:23:57,520
compressing all of our
data into a single result--

386
00:23:57,520 --> 00:24:00,170
in this case, the average.

387
00:24:00,170 --> 00:24:05,200
So let's walk through then how we
found the average using a SQL query

388
00:24:05,200 --> 00:24:08,420
and then we'll turn that into a view.

389
00:24:08,420 --> 00:24:13,660
So I'll come back over here to that
computer and I'll open up longlist.db.

390
00:24:13,660 --> 00:24:19,540
I'll already here, so I can type .schema
ratings to see the schema for this

391
00:24:19,540 --> 00:24:20,630
ratings table.

392
00:24:20,630 --> 00:24:26,630
And you'll see it does have a book_id
column and a rating column as well.

393
00:24:26,630 --> 00:24:31,120
So if I want to find the average
rating for each individual book ID,

394
00:24:31,120 --> 00:24:32,330
what could I do?

395
00:24:32,330 --> 00:24:39,710
I could try to SELECT the book_id
column and the rating column like this.

396
00:24:39,710 --> 00:24:43,570
But I don't want to select
just the rating column itself

397
00:24:43,570 --> 00:24:46,210
that gives me individual values.

398
00:24:46,210 --> 00:24:50,230
I want to find the
average rating instead.

399
00:24:50,230 --> 00:24:55,960
I could instead say find me the
average rating from this table.

400
00:24:55,960 --> 00:24:58,000
Now let me give this a new name.

401
00:24:58,000 --> 00:25:03,730
I'll say AS rating, calling the average
rating now newly, in this case, rating.

402
00:25:03,730 --> 00:25:06,560
And I'll select this
FROM the ratings table.

403
00:25:06,560 --> 00:25:10,150
And finally I need to
GROUP BY something.

404
00:25:10,150 --> 00:25:12,880
I need to GROUP BY book_id.

405
00:25:12,880 --> 00:25:18,740
That means I'll calculate the
average rating for each book ID.

406
00:25:18,740 --> 00:25:25,820
I'll hit semicolon here and Enter and
I should see all of these book IDs,

407
00:25:25,820 --> 00:25:30,110
top to bottom, with their own rating.

408
00:25:30,110 --> 00:25:33,140
But here it doesn't look so pretty.

409
00:25:33,140 --> 00:25:39,560
What could I do to try to make this
not so long, like 3.774019431--

410
00:25:39,560 --> 00:25:42,950
there's a better way to do
this and we saw it before.

411
00:25:42,950 --> 00:25:47,000
What could I do to enhance
this view of my ratings table?

412
00:25:47,000 --> 00:25:50,690
AUDIENCE: I see some suggestions
about rounding the ratings.

413
00:25:50,690 --> 00:25:55,505
But I was also going to suggest
adding the book titles, of course.

414
00:25:55,505 --> 00:25:57,290
CARTER ZENKE: Ah, good point here too.

415
00:25:57,290 --> 00:26:00,550
So we first want to add the-- first
we want to probably add the book title

416
00:26:00,550 --> 00:26:03,620
to see what titles are talking
about-- not just, in this case,

417
00:26:03,620 --> 00:26:04,910
individual book IDs--

418
00:26:04,910 --> 00:26:08,240
but also a good point to
say let's try rounding these

419
00:26:08,240 --> 00:26:11,360
so we don't see 3.7740194.

420
00:26:11,360 --> 00:26:14,850
Let's try to round that maybe to
two decimal places just to be safe.

421
00:26:14,850 --> 00:26:17,990
So let's first try
rounding this and then

422
00:26:17,990 --> 00:26:21,170
we'll try adding in some book
titles to your suggestion here.

423
00:26:21,170 --> 00:26:24,540
I'll come back over and let
me just try rounding it first.

424
00:26:24,540 --> 00:26:28,280
So I will come back to
my terminal and I'll

425
00:26:28,280 --> 00:26:32,540
say I want to find the average
rating for each book_id.

426
00:26:32,540 --> 00:26:35,780
And I want to select, not
just the average rating

427
00:26:35,780 --> 00:26:39,150
as we did before, but the
rounded average rating.

428
00:26:39,150 --> 00:26:43,310
So I can use ROUND and say
take the average rating

429
00:26:43,310 --> 00:26:46,370
and round it to two decimal places.

430
00:26:46,370 --> 00:26:50,260
And I'll call this column just
rating in the final result.

431
00:26:50,260 --> 00:26:53,360
And I'll select this
from the ratings table.

432
00:26:53,360 --> 00:26:59,420
And now I can group by, in
this case, book_id Enter.

433
00:26:59,420 --> 00:27:04,490
I should see I have rounded ratings
now, a little more pretty than before.

434
00:27:04,490 --> 00:27:10,100
Now, to your point, let's try to add in
not just book IDs but also book titles.

435
00:27:10,100 --> 00:27:12,330
What does it mean for a
book ID to have some rating?

436
00:27:12,330 --> 00:27:15,000
I want to see titles here in the end.

437
00:27:15,000 --> 00:27:16,520
So let's do that.

438
00:27:16,520 --> 00:27:22,130
I'll say SELECT book_id,
and also SELECT title,

439
00:27:22,130 --> 00:27:27,050
and maybe even select the year that
book was nominated for the Booker Prize.

440
00:27:27,050 --> 00:27:34,730
I'll then select the rounded average
rating to two decimal places as rating.

441
00:27:34,730 --> 00:27:39,860
And I want to select this from
the ratings table, from ratings.

442
00:27:39,860 --> 00:27:41,720
But what do you notice?

443
00:27:41,720 --> 00:27:46,520
If I were to select just from
ratings, what data am I missing

444
00:27:46,520 --> 00:27:49,550
and where could I find it?

445
00:27:49,550 --> 00:27:54,352
AUDIENCE: Hey, you're missing
the titles from the books table.

446
00:27:54,352 --> 00:27:56,060
CARTER ZENKE: Yeah,
I have to have titles

447
00:27:56,060 --> 00:27:57,930
and those are stored in the books table.

448
00:27:57,930 --> 00:27:59,060
So what should we do?

449
00:27:59,060 --> 00:28:03,410
Probably just join in the books
table matching up that primary key

450
00:28:03,410 --> 00:28:04,583
with some foreign keys.

451
00:28:04,583 --> 00:28:06,500
So I'll come back over
here and we'll do that.

452
00:28:06,500 --> 00:28:09,120
I want to select from
ratings, of course,

453
00:28:09,120 --> 00:28:12,740
but I also want to join, in
this case, the book table.

454
00:28:12,740 --> 00:28:18,360
So I'll hit Enter and now I'll
say JOIN as well the books table.

455
00:28:18,360 --> 00:28:22,020
Now, how can I match up the books
table with the rating table?

456
00:28:22,020 --> 00:28:27,266
Well, I know that the rating table
has a foreign key called book_id.

457
00:28:27,266 --> 00:28:30,800
And the book table has
a primary key called id.

458
00:28:30,800 --> 00:28:34,890
So I'll tell SQL I want to match
these two up a bit like this.

459
00:28:34,890 --> 00:28:39,590
I'll say JOIN books ON
ratings dot book_id--

460
00:28:39,590 --> 00:28:42,170
that foreign key that's
inside of ratings--

461
00:28:42,170 --> 00:28:47,450
equals, in this case, books dot id.

462
00:28:47,450 --> 00:28:52,070
Now, there's one more step, which
is to group by book_id still.

463
00:28:52,070 --> 00:28:55,490
So I'll hit Enter and GROUP BY book_id.

464
00:28:55,490 --> 00:28:59,270
But now I should be able
to hit semicolon and Enter

465
00:28:59,270 --> 00:29:06,440
and I should see, not just book IDs,
but now titles and the years those books

466
00:29:06,440 --> 00:29:08,730
were nominated as well.

467
00:29:08,730 --> 00:29:11,390
So this view is getting
better and better.

468
00:29:11,390 --> 00:29:14,780
And I might say we can actually go
ahead and just add this to our schema.

469
00:29:14,780 --> 00:29:19,200
We could create a view that returns
to us the results of this query.

470
00:29:19,200 --> 00:29:20,970
So let's try that.

471
00:29:20,970 --> 00:29:27,050
I'll say, create a view called maybe
average book ratings, average book

472
00:29:27,050 --> 00:29:30,140
ratings for people to come in and
do analysis on if they'd like.

473
00:29:30,140 --> 00:29:36,870
Then I could say, I want to create this
view as that query we wrote before.

474
00:29:36,870 --> 00:29:42,230
So I'll say, again, select
the book_id, the title,

475
00:29:42,230 --> 00:29:48,350
and let's say the year column,
the rounded average rating

476
00:29:48,350 --> 00:29:51,200
to two decimal places.

477
00:29:51,200 --> 00:29:54,710
And then I want to select as rating.

478
00:29:54,710 --> 00:30:00,050
I'll select this from the
ratings table, from ratings.

479
00:30:00,050 --> 00:30:03,350
And then I'll join, as we did
before, the book table from ratings

480
00:30:03,350 --> 00:30:08,300
and then join books on that
primary key and not foreign key.

481
00:30:08,300 --> 00:30:14,270
So I'll say on ratings.book_id
that foreign key in ratings,

482
00:30:14,270 --> 00:30:19,650
and set that equal to books.id, that
primary key we're going to use to join

483
00:30:19,650 --> 00:30:21,030
these tables together.

484
00:30:21,030 --> 00:30:24,120
Then I'll say group by book_id.

485
00:30:24,120 --> 00:30:28,890
And I should have if I hit Enter my
very own view called average book

486
00:30:28,890 --> 00:30:33,420
ratings that I can use to then see
the averages for each of these books.

487
00:30:33,420 --> 00:30:41,260
I'll say select-- let's say star--
from average_book_ratings semicolon.

488
00:30:41,260 --> 00:30:47,080
And now I should see all of those
results just as we had before.

489
00:30:47,080 --> 00:30:49,950
Now notably, because
this is in a view, I'm

490
00:30:49,950 --> 00:30:54,270
not using much, if any,
additional space in my database.

491
00:30:54,270 --> 00:30:58,620
I still have underlying this
view individual ratings.

492
00:30:58,620 --> 00:31:03,000
But because I saved this query and
the results it created in a view,

493
00:31:03,000 --> 00:31:06,990
I can actually have access to
the rounded average ratings

494
00:31:06,990 --> 00:31:08,050
for each of these.

495
00:31:08,050 --> 00:31:11,290
And I can query that view as well.

496
00:31:11,290 --> 00:31:16,590
So let me ask then what questions
we have on how we've used views

497
00:31:16,590 --> 00:31:20,250
to aggregate some data so far?

498
00:31:20,250 --> 00:31:22,060
AUDIENCE: Actually, I have a question.

499
00:31:22,060 --> 00:31:28,350
Can we use both the same name,
like before it was long list.

500
00:31:28,350 --> 00:31:31,530
And here we're using
for average something.

501
00:31:31,530 --> 00:31:34,587
Can use long list for both of them?

502
00:31:34,587 --> 00:31:37,170
CARTER ZENKE: Yeah, so if you
remember before we had this view

503
00:31:37,170 --> 00:31:38,490
called long list.

504
00:31:38,490 --> 00:31:43,780
And long list had book titles
and authors inside of it.

505
00:31:43,780 --> 00:31:47,250
And I think you're asking,
could we try to join, let's say,

506
00:31:47,250 --> 00:31:50,430
the long list table with our ratings
to get book titles and so on?

507
00:31:50,430 --> 00:31:54,720
We couldn't do that because we don't
have the ID column in that long list

508
00:31:54,720 --> 00:31:55,860
view.

509
00:31:55,860 --> 00:31:58,410
We could though perhaps
write some queries that

510
00:31:58,410 --> 00:32:02,910
involve that long list view as well.

511
00:32:02,910 --> 00:32:07,170
OK, let's keep going here and
seeing what advantages this view

512
00:32:07,170 --> 00:32:07,920
might give us.

513
00:32:07,920 --> 00:32:09,960
So I'll come back over here.

514
00:32:09,960 --> 00:32:14,640
And one more advantage is that if
I were to add in some new rating

515
00:32:14,640 --> 00:32:18,870
to my ratings column, I
could just requery this view

516
00:32:18,870 --> 00:32:20,980
to find the updated average.

517
00:32:20,980 --> 00:32:24,490
So let's say we use some
INSERT INTO statement here

518
00:32:24,490 --> 00:32:26,400
and I update some books ratings.

519
00:32:26,400 --> 00:32:30,570
Well, if I'm a data analyst and
I come back in a little later

520
00:32:30,570 --> 00:32:34,080
and I want to find the updated
rating, I can do it just again,

521
00:32:34,080 --> 00:32:40,080
by selecting from, selecting star,
from average book ratings like this,

522
00:32:40,080 --> 00:32:42,960
average book ratings.

523
00:32:42,960 --> 00:32:44,130
Hit Enter.

524
00:32:44,130 --> 00:32:47,880
And now I see perhaps some
updated averages as well.

525
00:32:47,880 --> 00:32:52,350
Every time I query this view, I'll rerun
essentially that query from before.

526
00:32:52,350 --> 00:32:57,420
And now I'll see the updated
averages for each book.

527
00:32:57,420 --> 00:33:03,540
Now, sometimes as a data analyst,
I might want to create a view

528
00:33:03,540 --> 00:33:07,500
but not store it long
term as part of my schema.

529
00:33:07,500 --> 00:33:11,910
Like here if I type dot schema,
I see both our prior view

530
00:33:11,910 --> 00:33:14,220
called long list right here.

531
00:33:14,220 --> 00:33:17,880
And I also see our
average book ratings view.

532
00:33:17,880 --> 00:33:21,120
But if I am somebody
who comes in and I maybe

533
00:33:21,120 --> 00:33:26,580
want to find the average book ratings
not by each book, but by each year,

534
00:33:26,580 --> 00:33:30,780
I could do that, but then not
store that view long term.

535
00:33:30,780 --> 00:33:32,910
Maybe I don't want
others to see this view.

536
00:33:32,910 --> 00:33:37,000
Or maybe it's just from my own
analysis on this given day.

537
00:33:37,000 --> 00:33:41,730
Well, to do that, I could
move beyond just create view.

538
00:33:41,730 --> 00:33:46,320
And I could also enhance this to
be create temporary view, where

539
00:33:46,320 --> 00:33:49,500
a temporary view exists
only for the duration

540
00:33:49,500 --> 00:33:52,300
of my connection with the database.

541
00:33:52,300 --> 00:33:56,580
So to be clear, that means if I
type SQLite three long list dot DB,

542
00:33:56,580 --> 00:33:59,100
I'm opening a connection
to the database.

543
00:33:59,100 --> 00:34:03,720
If I then type dot quit, I
would then lose that connection,

544
00:34:03,720 --> 00:34:04,650
quit that connection.

545
00:34:04,650 --> 00:34:07,810
And that view would no longer be there.

546
00:34:07,810 --> 00:34:09,060
So let's try this.

547
00:34:09,060 --> 00:34:10,449
I'll come back to my terminal.

548
00:34:10,449 --> 00:34:12,630
And we'll create a
temporary view that shows

549
00:34:12,630 --> 00:34:16,469
us the average ratings for the books
by the year they were nominated,

550
00:34:16,469 --> 00:34:19,570
not by every individual book here.

551
00:34:19,570 --> 00:34:23,550
So I could try doing the
same query from before,

552
00:34:23,550 --> 00:34:27,300
like select, let's say, the
average rating select it

553
00:34:27,300 --> 00:34:31,030
from the ratings table, join
books, and titles, and so on.

554
00:34:31,030 --> 00:34:34,710
But what do I have already
to build on top of?

555
00:34:34,710 --> 00:34:41,820
If I wanted to find a view that
has a title, a rating, and even

556
00:34:41,820 --> 00:34:44,130
let's say the year that
book was published,

557
00:34:44,130 --> 00:34:48,610
what do I already have
in my schema for that?

558
00:34:48,610 --> 00:34:52,857
AUDIENCE: So we could use the
view that we just created?

559
00:34:52,857 --> 00:34:54,190
CARTER ZENKE: Yeah, a nice idea.

560
00:34:54,190 --> 00:34:57,450
So we could actually use the
same view we just created

561
00:34:57,450 --> 00:34:59,700
to create ourselves a brand new view.

562
00:34:59,700 --> 00:35:01,890
So just because something
is a view doesn't mean

563
00:35:01,890 --> 00:35:05,140
we can't use it to then
create another view as well.

564
00:35:05,140 --> 00:35:06,465
So we'll do exactly that.

565
00:35:06,465 --> 00:35:08,400
We'll come back over here.

566
00:35:08,400 --> 00:35:12,570
And I can type dot schema
to see I still have

567
00:35:12,570 --> 00:35:15,510
this view called average book ratings.

568
00:35:15,510 --> 00:35:20,160
And in it I have book IDs, their
title, the year they were nominated,

569
00:35:20,160 --> 00:35:21,810
and their average rating.

570
00:35:21,810 --> 00:35:25,080
But now I want to find,
let's say, the average rating

571
00:35:25,080 --> 00:35:28,680
for each individual year,
not for each individual book.

572
00:35:28,680 --> 00:35:30,160
So what could I do?

573
00:35:30,160 --> 00:35:36,360
I could say select the year column
and the rounded average rating,

574
00:35:36,360 --> 00:35:46,290
in this case, from the rating
column in average_book_ratings.

575
00:35:46,290 --> 00:35:47,585
And I'll wrap this query here.

576
00:35:47,585 --> 00:35:49,710
Let me zoom out, so you
can see it all on one line.

577
00:35:49,710 --> 00:35:55,270
Now, I'll say I want to
group by the year column.

578
00:35:55,270 --> 00:35:58,950
So here I'm now saying,
find me the average rating

579
00:35:58,950 --> 00:36:03,900
of each book, of each book's average
rating, for each individual year.

580
00:36:03,900 --> 00:36:04,770
I'll hit Enter.

581
00:36:04,770 --> 00:36:09,930
And we should see, if I zoom
back in, for 2018 we have 3.75.

582
00:36:09,930 --> 00:36:11,550
For 2019, we have 3.64.

583
00:36:11,550 --> 00:36:16,012
So this means that apparently
in 2023, the average rating

584
00:36:16,012 --> 00:36:18,720
for books that were longlisted
for the international Booker Prize

585
00:36:18,720 --> 00:36:22,800
was 3.78, and same for 2022, 3.87.

586
00:36:22,800 --> 00:36:25,110
2021 was 3.69.

587
00:36:25,110 --> 00:36:27,310
So I have this new view here.

588
00:36:27,310 --> 00:36:30,390
I could store this
temporarily inside of a view

589
00:36:30,390 --> 00:36:34,330
that I could use while I'm doing
some analysis on this given day.

590
00:36:34,330 --> 00:36:37,740
So I'll say create a temporary view.

591
00:36:37,740 --> 00:36:42,030
And I'll call this one
average ratings by year.

592
00:36:42,030 --> 00:36:46,530
How were the books rated each
individual year on average?

593
00:36:46,530 --> 00:36:49,770
And I'll say this is the
query that I just did before.

594
00:36:49,770 --> 00:36:54,660
Select, in this case, the year
and the rounded average rating

595
00:36:54,660 --> 00:37:02,280
to two decimal places, as the rating
column from average book ratings,

596
00:37:02,280 --> 00:37:04,390
that same view from before.

597
00:37:04,390 --> 00:37:08,190
Now, I'm creating a view from my view.

598
00:37:08,190 --> 00:37:11,610
I'll group by here, group by year.

599
00:37:11,610 --> 00:37:13,140
And then I'll hit Enter.

600
00:37:13,140 --> 00:37:15,510
And I should see nothing popping up.

601
00:37:15,510 --> 00:37:18,810
But now if I wanted
to query this view, I

602
00:37:18,810 --> 00:37:23,730
could say select star from
average ratings by year.

603
00:37:23,730 --> 00:37:28,230
And I'll see the results of that view.

604
00:37:28,230 --> 00:37:31,470
Now, though, here's the
catch and why it's temporary.

605
00:37:31,470 --> 00:37:37,770
If I were to type dot quit now,
dot quit, and try it again,

606
00:37:37,770 --> 00:37:41,040
SQLite 3 long list dot DB.

607
00:37:41,040 --> 00:37:42,840
Let me retype that.

608
00:37:42,840 --> 00:37:45,690
SQLite 3 long list dot DB.

609
00:37:45,690 --> 00:37:54,450
If I now try to select star from
average ratings by year semicolon,

610
00:37:54,450 --> 00:37:57,060
I'll get no such table
and really no such view.

611
00:37:57,060 --> 00:37:59,460
I don't have a view called
average ratings by year

612
00:37:59,460 --> 00:38:03,450
anymore because this was temporary.

613
00:38:03,450 --> 00:38:10,200
So let me ask, what questions do we
have on temporary tables, if any?

614
00:38:10,200 --> 00:38:12,030
Temporary views rather.

615
00:38:12,030 --> 00:38:13,990
AUDIENCE: Right, so thank you.

616
00:38:13,990 --> 00:38:17,610
So the first thing that
I notice is that it just

617
00:38:17,610 --> 00:38:23,850
helps us to do some kind of testing
when we use temporary views.

618
00:38:23,850 --> 00:38:29,490
Basically, the views will
not be saved in the database.

619
00:38:29,490 --> 00:38:35,280
But again, it's just a good way just
to test the query, whether it is usable

620
00:38:35,280 --> 00:38:36,332
or not.

621
00:38:36,332 --> 00:38:38,290
CARTER ZENKE: That's a
great use case for them.

622
00:38:38,290 --> 00:38:41,160
And in general, if you
are working on a database

623
00:38:41,160 --> 00:38:44,070
and you want to organize
the data in some way,

624
00:38:44,070 --> 00:38:46,980
but you don't want to store
that organization long term,

625
00:38:46,980 --> 00:38:49,500
a temporary view is great for that.

626
00:38:49,500 --> 00:38:52,500
If though, you are a
developer and you want

627
00:38:52,500 --> 00:38:55,210
to ingrain this view as
part of your database,

628
00:38:55,210 --> 00:38:57,780
you might just use create
view, plain and simple,

629
00:38:57,780 --> 00:39:01,300
to have it always be in your database
for you wherever you connect to it

630
00:39:01,300 --> 00:39:03,000
and whenever you connect to it.

631
00:39:03,000 --> 00:39:04,726
Great question.

632
00:39:04,726 --> 00:39:06,090
OK, let's keep going.

633
00:39:06,090 --> 00:39:12,810
And let's introduce now something called
a common table expression, or a CTE.

634
00:39:12,810 --> 00:39:16,230
And a common table
expression is simply a view

635
00:39:16,230 --> 00:39:19,810
that exists for the
duration of a single query.

636
00:39:19,810 --> 00:39:24,750
So a regular view, as we saw
before, exists forever in my schema.

637
00:39:24,750 --> 00:39:28,200
A temporary view exists
for a single connection.

638
00:39:28,200 --> 00:39:32,280
But a CTE exists for a single query.

639
00:39:32,280 --> 00:39:34,245
And the syntax looks
just a bit different.

640
00:39:34,245 --> 00:39:39,720
With a CTE I can write a single query,
but at the beginning of that query,

641
00:39:39,720 --> 00:39:43,440
define the CTE, the
Common Table Expression.

642
00:39:43,440 --> 00:39:48,690
To do so, I would say with and
give it some name, as and then

643
00:39:48,690 --> 00:39:54,580
inside these parentheses say the
query I want to be this CTE here.

644
00:39:54,580 --> 00:39:59,820
I could then optionally follow it up
with more CTEs for this same query.

645
00:39:59,820 --> 00:40:05,580
Or I could omit the comma, and then
use just select from the name of a CTE

646
00:40:05,580 --> 00:40:06,450
up above.

647
00:40:06,450 --> 00:40:11,010
This is useful if you want to have
some very temporary view that you don't

648
00:40:11,010 --> 00:40:16,480
want to store in your schema, but you do
want to use for this particular query.

649
00:40:16,480 --> 00:40:21,990
So let's see an example now recreating
our average ratings by year,

650
00:40:21,990 --> 00:40:24,750
but now using a CTE.

651
00:40:24,750 --> 00:40:26,910
So I'll come back to my computer.

652
00:40:26,910 --> 00:40:29,970
And I'm already in my SQLite terminal.

653
00:40:29,970 --> 00:40:33,930
What I should do, though, is
try to drop the prior view.

654
00:40:33,930 --> 00:40:37,230
I want to remove that
view called average book

655
00:40:37,230 --> 00:40:41,950
ratings to create room to have now
a CTE with that very same name.

656
00:40:41,950 --> 00:40:46,110
So I'll say, drop view and the
name of the view I want to drop,

657
00:40:46,110 --> 00:40:53,200
average book ratings, average
book ratings, semicolon.

658
00:40:53,200 --> 00:40:58,620
And now I should see that view
is no longer in my schema.

659
00:40:58,620 --> 00:41:01,810
So I want to recreate
though what we did before,

660
00:41:01,810 --> 00:41:06,520
which was finding the average
ratings of books year over year.

661
00:41:06,520 --> 00:41:09,180
So let me try to write this query.

662
00:41:09,180 --> 00:41:12,990
But it would be handy if
in that query I had access

663
00:41:12,990 --> 00:41:16,410
to a view that gave me the
average rating for each book.

664
00:41:16,410 --> 00:41:18,510
So I'll write the query like this.

665
00:41:18,510 --> 00:41:19,800
I'll use a CTE.

666
00:41:19,800 --> 00:41:26,370
And I'll say, with average book
ratings, average book ratings,

667
00:41:26,370 --> 00:41:31,080
as now this query I'll write now inside.

668
00:41:31,080 --> 00:41:32,760
And this query should be familiar.

669
00:41:32,760 --> 00:41:38,280
It's the same query we use to find
the average rating for each book.

670
00:41:38,280 --> 00:41:47,820
I'll say here, I want to select the book
ID column, the title column, the year

671
00:41:47,820 --> 00:41:53,700
column, and the rounded average
rating to two decimal places,

672
00:41:53,700 --> 00:42:02,880
calling it rating, in this case
from my ratings table, like this.

673
00:42:02,880 --> 00:42:04,360
Now, what do I need to do?

674
00:42:04,360 --> 00:42:08,850
I also need to, as you saw
before, join in my books table.

675
00:42:08,850 --> 00:42:13,380
Join books on the primary
and foreign keys aligning.

676
00:42:13,380 --> 00:42:19,440
So I'll say the book ID column, that
foreign key in the ratings table,

677
00:42:19,440 --> 00:42:25,140
should equal books dot ID, that
primary key column in books.

678
00:42:25,140 --> 00:42:32,160
Then I will group by the book ID to
find the average rating for each book.

679
00:42:32,160 --> 00:42:35,820
And now, let me close out this CTE.

680
00:42:35,820 --> 00:42:38,010
I'll close my parentheses.

681
00:42:38,010 --> 00:42:42,300
And because I'm only using
one CTE, I don't need a comma.

682
00:42:42,300 --> 00:42:46,170
I can then just write
the rest of my query.

683
00:42:46,170 --> 00:42:48,250
I'll hit Enter for style sake.

684
00:42:48,250 --> 00:42:52,320
And now with this CTE
called average book ratings,

685
00:42:52,320 --> 00:42:55,920
I want to find the average
rating year over year

686
00:42:55,920 --> 00:42:58,290
grouping not by book, but by year.

687
00:42:58,290 --> 00:43:02,430
So I'll say select, in
this case, year, and then

688
00:43:02,430 --> 00:43:09,900
select the rounded average rating still
to two decimal places as a new rating

689
00:43:09,900 --> 00:43:11,100
column.

690
00:43:11,100 --> 00:43:16,290
Then I'll select this
from, in this case, my CTE.

691
00:43:16,290 --> 00:43:22,080
So I'll hit Enter here as well
from average book ratings,

692
00:43:22,080 --> 00:43:26,070
that same CTE I defined
earlier in my query.

693
00:43:26,070 --> 00:43:30,000
Now finally, to find the average
rating across individual years,

694
00:43:30,000 --> 00:43:34,060
I'll say group by year semicolon.

695
00:43:34,060 --> 00:43:38,940
And now hopefully, if I hit Enter,
I should see the average rating

696
00:43:38,940 --> 00:43:42,950
for each year in my database.

697
00:43:42,950 --> 00:43:48,900
OK, so this CTE is useful when you have
subqueries you don't want to repeat

698
00:43:48,900 --> 00:43:51,450
or when you want to
make a view that only

699
00:43:51,450 --> 00:43:53,910
exists for the duration of a query.

700
00:43:53,910 --> 00:43:58,680
And it's with all of these tools,
like views, temporary views, and CTEs

701
00:43:58,680 --> 00:44:02,810
that we can get away with storing
statistics about some data in our data

702
00:44:02,810 --> 00:44:07,200
set and not storing them
in the actual database,

703
00:44:07,200 --> 00:44:10,290
but only inside this
view we might have here.

704
00:44:10,290 --> 00:44:13,230
What we'll do next is
see how we can use views

705
00:44:13,230 --> 00:44:17,925
to create not just these aggregations,
but also partitions of our data.

706
00:44:17,925 --> 00:44:23,310
A way to split our data up into logical
pieces for use for our own sake or even

707
00:44:23,310 --> 00:44:25,380
for application sake as well.

708
00:44:25,380 --> 00:44:27,390
We'll come back In just a few.

709
00:44:27,390 --> 00:44:28,840
And we're back.

710
00:44:28,840 --> 00:44:34,410
So we've seen so far how to use views
to simplify and also aggregate our data.

711
00:44:34,410 --> 00:44:38,400
What we'll see now though, is how
to use views to partition our data.

712
00:44:38,400 --> 00:44:41,280
That is to break it into smaller
pieces that will be useful

713
00:44:41,280 --> 00:44:44,040
to us or to some application.

714
00:44:44,040 --> 00:44:47,580
Now, on the International
Booker Prize website,

715
00:44:47,580 --> 00:44:51,210
they have a page for each
year of longlisted books.

716
00:44:51,210 --> 00:44:56,820
But in our table, we have all the books
together, regardless of their year.

717
00:44:56,820 --> 00:45:00,660
Well, maybe it will be useful
in the application, even for me

718
00:45:00,660 --> 00:45:04,380
personally, to have a table
one for each year of books that

719
00:45:04,380 --> 00:45:07,140
have been nominated for this prize.

720
00:45:07,140 --> 00:45:09,330
But before I create
that view, I should ask,

721
00:45:09,330 --> 00:45:14,910
how do I try to find the
books nominated in 2023?

722
00:45:14,910 --> 00:45:18,870
If my table looks like this, I
have a table of books with ID,

723
00:45:18,870 --> 00:45:22,560
title, and year columns,
what query could

724
00:45:22,560 --> 00:45:28,230
I use to find only those
books nominated in 2023?

725
00:45:28,230 --> 00:45:31,240
AUDIENCE: Use the WHERE
clause with the year?

726
00:45:31,240 --> 00:45:33,240
CARTER ZENKE: Yeah, I
could use the WHERE clause

727
00:45:33,240 --> 00:45:35,010
and try to search for a given year.

728
00:45:35,010 --> 00:45:38,970
Like I could try to say, where,
in this case, the year is 2023

729
00:45:38,970 --> 00:45:40,890
or equal to 2023.

730
00:45:40,890 --> 00:45:43,230
And that would give
me back these two rows

731
00:45:43,230 --> 00:45:45,420
if only two books were
nominated in 2023.

732
00:45:45,420 --> 00:45:47,380
In reality it's 13.

733
00:45:47,380 --> 00:45:51,700
If I wanted to find the next
ones from 2022, what could I do?

734
00:45:51,700 --> 00:45:57,120
I could just change the WHERE clause
to say where year equals 2022.

735
00:45:57,120 --> 00:45:59,250
And then same thing for 2021.

736
00:45:59,250 --> 00:46:03,030
I could say where year equals 2021.

737
00:46:03,030 --> 00:46:06,480
Now, each of these
segments of my data is

738
00:46:06,480 --> 00:46:10,440
what we'll call a partition,
a certain breaking down

739
00:46:10,440 --> 00:46:15,930
of my data into smaller pieces,
in this case, organized by year.

740
00:46:15,930 --> 00:46:20,910
So let's now create smaller
tables, one for each year that

741
00:46:20,910 --> 00:46:22,950
might be in our data set here.

742
00:46:22,950 --> 00:46:25,000
I'll come back to my terminal.

743
00:46:25,000 --> 00:46:28,660
And let's think about writing
this query from before.

744
00:46:28,660 --> 00:46:33,420
So if I wanted to find the books
that were nominated in 2022,

745
00:46:33,420 --> 00:46:40,770
I could say select the ID column and
the title column from my books table.

746
00:46:40,770 --> 00:46:48,240
And then I want to find only those
books, where the year equals 2022, then

747
00:46:48,240 --> 00:46:50,100
semicolon, and then Enter.

748
00:46:50,100 --> 00:46:56,280
And I should see only those books
that were nominated in 2022.

749
00:46:56,280 --> 00:46:59,490
But now, I want to create
a view that might actually

750
00:46:59,490 --> 00:47:02,930
store the results of this
query more long term.

751
00:47:02,930 --> 00:47:06,380
I could say create view
and call this, let's

752
00:47:06,380 --> 00:47:15,720
say, 2022 as now same query from before,
select ID and title from my books

753
00:47:15,720 --> 00:47:25,190
table, in this case where the
year equals 2022 semicolon.

754
00:47:25,190 --> 00:47:30,530
Now, if I ever want to find only
those books nominated in 2022,

755
00:47:30,530 --> 00:47:31,970
I could now use my view.

756
00:47:31,970 --> 00:47:39,050
I could say select star from 2022
quote and then semicolon, hit Enter.

757
00:47:39,050 --> 00:47:43,310
And now I should see
those very same books.

758
00:47:43,310 --> 00:47:44,960
So what could I do now?

759
00:47:44,960 --> 00:47:48,330
Maybe I want those
books from 2021 as well.

760
00:47:48,330 --> 00:47:50,600
Let me ask the audience,
what could you do

761
00:47:50,600 --> 00:47:57,020
to create the view for the books in
2021 based on what we've just seen now?

762
00:47:57,020 --> 00:48:02,242
AUDIENCE: We could just change
the year from 2022 to 2021.

763
00:48:02,242 --> 00:48:03,950
CARTER ZENKE: Yeah,
just change the year.

764
00:48:03,950 --> 00:48:07,140
And what would you propose
we call this new view?

765
00:48:07,140 --> 00:48:10,790
if you had to take the reins on naming
this view, what would you name it?

766
00:48:10,790 --> 00:48:14,120
AUDIENCE: Books that
were written in 2021.

767
00:48:14,120 --> 00:48:17,600
CARTER ZENKE: Yeah, we could call it
books that were nominated in 2021.

768
00:48:17,600 --> 00:48:20,522
But I would argue that might
be a bit of a long view title.

769
00:48:20,522 --> 00:48:21,980
So let's shorten just a little bit.

770
00:48:21,980 --> 00:48:25,830
And maybe in keeping with our current
naming, we'll say 2021 instead.

771
00:48:25,830 --> 00:48:27,270
So I'll come back over here.

772
00:48:27,270 --> 00:48:28,400
And let's try this.

773
00:48:28,400 --> 00:48:32,090
I'll say create a view.

774
00:48:32,090 --> 00:48:34,440
And we could, by all
accounts and purposes,

775
00:48:34,440 --> 00:48:41,360
we could say books nominated in 2021.

776
00:48:41,360 --> 00:48:43,340
And that's totally OK.

777
00:48:43,340 --> 00:48:48,188
But if I were to pass this view on to a
new programmer, a new database analyst,

778
00:48:48,188 --> 00:48:51,230
well, I kind of want to shorten it
for them, make it more simple for them

779
00:48:51,230 --> 00:48:51,840
to read.

780
00:48:51,840 --> 00:48:55,130
So I could just call this 2021.

781
00:48:55,130 --> 00:48:58,100
Of course, this title
is kind of ambiguous.

782
00:48:58,100 --> 00:49:02,360
Like if I pass you a view called
2021, are you automatically going

783
00:49:02,360 --> 00:49:04,500
to know what's inside that view?

784
00:49:04,500 --> 00:49:05,360
Probably not.

785
00:49:05,360 --> 00:49:07,970
So it's actually a balance
between how descriptive

786
00:49:07,970 --> 00:49:10,130
you want to be with your names here.

787
00:49:10,130 --> 00:49:14,460
We'll stick though with 2021
to our earlier example here.

788
00:49:14,460 --> 00:49:22,220
Now I'll say create view 2021
as select ID, title from books.

789
00:49:22,220 --> 00:49:28,910
And to Fabian's point, I'll say where
the year equals 2021 semicolon Enter.

790
00:49:28,910 --> 00:49:36,320
And now I could say, select
star from 2021 semicolon.

791
00:49:36,320 --> 00:49:42,080
So this is a good way to break apart
a large table into logical pieces.

792
00:49:42,080 --> 00:49:46,910
It's useful for both you as a
programmer and also for applications

793
00:49:46,910 --> 00:49:50,300
that might only need to access
some subset of your data,

794
00:49:50,300 --> 00:49:53,430
where some condition is true.

795
00:49:53,430 --> 00:49:56,750
So based on what we've
seen so far, what questions

796
00:49:56,750 --> 00:50:00,560
do we have on partitioning
data with views?

797
00:50:00,560 --> 00:50:05,300
AUDIENCE: We have created
two views for 2021 and 2022.

798
00:50:05,300 --> 00:50:11,780
Rather than creating two views, is
it possible to simplify it as one

799
00:50:11,780 --> 00:50:17,327
and then use it rather like where
you simply change that alone?

800
00:50:17,327 --> 00:50:19,410
CARTER ZENKE: Yeah, I mean,
it's a trade off here.

801
00:50:19,410 --> 00:50:22,740
So you could imagine
trying to do, let's say,

802
00:50:22,740 --> 00:50:27,380
a single view that maybe has
just titles and just years in it.

803
00:50:27,380 --> 00:50:29,630
But then you'd have to
rewrite the query later on

804
00:50:29,630 --> 00:50:32,000
to find only those books
for a certain year.

805
00:50:32,000 --> 00:50:35,030
Here, what we're trying to do is
just like break up this big table

806
00:50:35,030 --> 00:50:36,118
into smaller pieces.

807
00:50:36,118 --> 00:50:37,910
And you can imagine
it's being particularly

808
00:50:37,910 --> 00:50:41,090
useful if you had not just
78 books, but even like

809
00:50:41,090 --> 00:50:43,895
thousands, or tens of
thousands, or millions of books.

810
00:50:43,895 --> 00:50:49,220
It could be useful to have a view for
each logical segment of those 10,000

811
00:50:49,220 --> 00:50:51,290
or more books.

812
00:50:51,290 --> 00:50:53,090
Let's take one more question here.

813
00:50:53,090 --> 00:50:57,740
AUDIENCE: I would like to ask
about updating the view itself.

814
00:50:57,740 --> 00:51:03,860
If I update a single table,
any element in each table,

815
00:51:03,860 --> 00:51:06,800
their view can be updated automatically?

816
00:51:06,800 --> 00:51:10,797
Or maybe it [INAUDIBLE].

817
00:51:10,797 --> 00:51:13,130
CARTER ZENKE: OK, a good
forward-thinking question here.

818
00:51:13,130 --> 00:51:15,590
You can, you're right, update a table.

819
00:51:15,590 --> 00:51:20,180
You can say update the table and
set some column to some value.

820
00:51:20,180 --> 00:51:23,480
What you can't do
actually is update a view.

821
00:51:23,480 --> 00:51:26,645
And that's because a view doesn't
have any underlying-- actually,

822
00:51:26,645 --> 00:51:29,060
it doesn't have any data
inside of it exactly.

823
00:51:29,060 --> 00:51:32,460
It only pulls data
from underlying tables.

824
00:51:32,460 --> 00:51:35,540
So let me show you how I
could update this view,

825
00:51:35,540 --> 00:51:37,730
or at least try to update this view.

826
00:51:37,730 --> 00:51:40,160
I'll come back to my computer.

827
00:51:40,160 --> 00:51:44,210
And let's say we wanted to
change a book in the 2021 view.

828
00:51:44,210 --> 00:51:48,830
I could say select star
from 2021 semicolon.

829
00:51:48,830 --> 00:51:51,870
And maybe it turns out
there's a typo in here.

830
00:51:51,870 --> 00:51:54,920
So instead of a minor
detail, it's called something

831
00:51:54,920 --> 00:52:00,420
like the minor detail,
this book on ID 34 here.

832
00:52:00,420 --> 00:52:02,660
I could try to update this view.

833
00:52:02,660 --> 00:52:07,790
I could say, update
2021 and set the title

834
00:52:07,790 --> 00:52:17,780
equal to the minor detail, where
the title currently is minor detail,

835
00:52:17,780 --> 00:52:19,910
like this, minor detail.

836
00:52:19,910 --> 00:52:21,680
Then I'll hit semicolon.

837
00:52:21,680 --> 00:52:28,340
And if I do this, I get this error,
cannot modify 2021 because it is

838
00:52:28,340 --> 00:52:29,330
a view.

839
00:52:29,330 --> 00:52:31,700
Now, why can't we modify a view?

840
00:52:31,700 --> 00:52:36,200
Well, recall that a view
is simply the combination

841
00:52:36,200 --> 00:52:38,780
of data from underlying tables.

842
00:52:38,780 --> 00:52:41,510
I've assembled data from
different pieces of my database

843
00:52:41,510 --> 00:52:43,520
and put them into a single view.

844
00:52:43,520 --> 00:52:46,760
But I can't then update
that data because that data

845
00:52:46,760 --> 00:52:50,600
is located in multiple individual
tables around my database.

846
00:52:50,600 --> 00:52:52,460
What I should instead
do is try to update

847
00:52:52,460 --> 00:52:56,870
the underlying table, which I can
do as we saw before with just update

848
00:52:56,870 --> 00:52:59,810
and set to.

849
00:52:59,810 --> 00:53:03,920
So more on how we can update,
or at least try to update,

850
00:53:03,920 --> 00:53:05,120
views a little bit later.

851
00:53:05,120 --> 00:53:07,280
We'll take a break and
come back to talk about how

852
00:53:07,280 --> 00:53:10,530
we can secure data using views as well.

853
00:53:10,530 --> 00:53:11,870
And we're back.

854
00:53:11,870 --> 00:53:15,740
So one more use case for
views is being able to secure

855
00:53:15,740 --> 00:53:17,280
the data that's in your database.

856
00:53:17,280 --> 00:53:19,970
You might imagine having a table.

857
00:53:19,970 --> 00:53:24,590
But some columns in that table you
don't want to share with somebody else.

858
00:53:24,590 --> 00:53:28,610
Well, a common practice in
security is to only give someone

859
00:53:28,610 --> 00:53:31,470
the information they need to know.

860
00:53:31,470 --> 00:53:34,070
With views you could
do that very principle.

861
00:53:34,070 --> 00:53:36,020
You could accomplish
that by taking a table

862
00:53:36,020 --> 00:53:40,350
and reducing the number of
columns somebody else might see.

863
00:53:40,350 --> 00:53:43,370
So let's consider here
some rideshare company.

864
00:53:43,370 --> 00:53:46,370
And rideshare companies keep track
of a lot of information about us.

865
00:53:46,370 --> 00:53:49,640
They keep track of where we're
going, where we're coming from,

866
00:53:49,640 --> 00:53:54,110
who's in the car with us, and who
drove us there even sometimes.

867
00:53:54,110 --> 00:53:58,850
Let's say that a rideshare
company has a table called rides.

868
00:53:58,850 --> 00:54:03,110
And inside this table is
the origin of several rides

869
00:54:03,110 --> 00:54:08,210
and the destination along with
the rider who took that ride.

870
00:54:08,210 --> 00:54:10,880
Now, if I were to ask
you here, let's say,

871
00:54:10,880 --> 00:54:14,600
I want to give this data
to an analyst to figure out

872
00:54:14,600 --> 00:54:17,840
which rides are most popular or
which rides are highly rated.

873
00:54:17,840 --> 00:54:21,260
What data should I probably omit?

874
00:54:21,260 --> 00:54:24,050
If I wanted to share this
data with somebody else,

875
00:54:24,050 --> 00:54:27,150
what should I remove from this table?

876
00:54:27,150 --> 00:54:32,608
AUDIENCE: The data you could
remove would be the rider.

877
00:54:32,608 --> 00:54:33,900
CARTER ZENKE: Yeah, good point.

878
00:54:33,900 --> 00:54:37,850
So if I look at this table and
I think of what information

879
00:54:37,850 --> 00:54:43,170
is personally identifiable, PII,
Personally Identifiable Information,

880
00:54:43,170 --> 00:54:47,570
it seems like this rider column could
identify somebody who took a ride from,

881
00:54:47,570 --> 00:54:50,210
let's say, Good Egg Galaxy
to Honeyhive Galaxy,

882
00:54:50,210 --> 00:54:52,670
and we don't want to
leak that information

883
00:54:52,670 --> 00:54:54,920
if I gave this to some analyst.

884
00:54:54,920 --> 00:54:59,210
Well, with a view I could
keep this underlying table.

885
00:54:59,210 --> 00:55:04,280
But I could create a view that only
has the origin and the destination,

886
00:55:04,280 --> 00:55:06,710
that omits the rider column.

887
00:55:06,710 --> 00:55:11,240
And I could more confidently share
this view with an analyst, not

888
00:55:11,240 --> 00:55:13,190
the underlying table.

889
00:55:13,190 --> 00:55:15,980
So let me try to implement
this view, so we can see it

890
00:55:15,980 --> 00:55:18,690
in action inside this database here.

891
00:55:18,690 --> 00:55:20,940
I'll come back to my computer.

892
00:55:20,940 --> 00:55:23,610
And in this case, I have a new database.

893
00:55:23,610 --> 00:55:26,250
I have one called rideshare dot DB.

894
00:55:26,250 --> 00:55:31,510
I'll type SQLite 3 rideshare
dot DB to open up this database.

895
00:55:31,510 --> 00:55:35,580
And if I type dot schema to
see the schema of the database,

896
00:55:35,580 --> 00:55:38,670
I should see I have
one table called rides,

897
00:55:38,670 --> 00:55:44,790
where each ride has an ID, an
origin, a destination, and a rider.

898
00:55:44,790 --> 00:55:51,660
Now, I'll select star from the
rides table, select star from rides,

899
00:55:51,660 --> 00:55:52,740
hit semicolon.

900
00:55:52,740 --> 00:55:57,032
And I should see the very same
data we just saw in our slides,

901
00:55:57,032 --> 00:55:58,740
where Peach is going
from Good Egg Galaxy

902
00:55:58,740 --> 00:56:02,310
to Honeyhive Galaxy, Mario going from
Castle Courtyard to Cascade Kingdom,

903
00:56:02,310 --> 00:56:04,030
and so on.

904
00:56:04,030 --> 00:56:09,480
So now, I know I don't want to
share this data with somebody else.

905
00:56:09,480 --> 00:56:11,580
What could I do?

906
00:56:11,580 --> 00:56:13,800
Instead of selecting
everything, I could try

907
00:56:13,800 --> 00:56:16,630
to omit this rider column
to the point before.

908
00:56:16,630 --> 00:56:17,610
So let's try that.

909
00:56:17,610 --> 00:56:24,690
I'll select just ID, and just origin,
and just destination from rides.

910
00:56:24,690 --> 00:56:28,200
I'll deliberately omit
that rider column.

911
00:56:28,200 --> 00:56:30,300
Now, I'll hit Enter.

912
00:56:30,300 --> 00:56:33,576
And I should see the
origin and destination.

913
00:56:33,576 --> 00:56:36,840
But I actually don't
see the rider column.

914
00:56:36,840 --> 00:56:38,820
So this is a decent view.

915
00:56:38,820 --> 00:56:41,700
But I could probably
improve this even more.

916
00:56:41,700 --> 00:56:47,040
It turns out that in SQLite if I want
to fill a column with some arbitrary

917
00:56:47,040 --> 00:56:49,590
value, I can do that a bit like this.

918
00:56:49,590 --> 00:56:56,350
Why don't I try to select ID and
select origin and destination.

919
00:56:56,350 --> 00:57:01,620
But I also want to let somebody know
that this data has been anonymized.

920
00:57:01,620 --> 00:57:04,350
There is rider information
in the underlying table.

921
00:57:04,350 --> 00:57:06,340
But you don't have access to it.

922
00:57:06,340 --> 00:57:10,200
What I'll do instead is say, let me
make this new column that's filled

923
00:57:10,200 --> 00:57:13,470
with this value anonymous as a string.

924
00:57:13,470 --> 00:57:19,560
I'll say quote unquote "anonymous"
as this column name called rider.

925
00:57:19,560 --> 00:57:24,630
And I'll select all of this
from the rides table semicolon.

926
00:57:24,630 --> 00:57:29,760
Now I have more helpfully indicated
that there is a rider column.

927
00:57:29,760 --> 00:57:34,800
But you deliberately don't have
access to see who those riders are.

928
00:57:34,800 --> 00:57:38,610
So my last step here is probably
to turn this into a view.

929
00:57:38,610 --> 00:57:40,320
I have a good query.

930
00:57:40,320 --> 00:57:43,140
But I want to save it so
I could query it later on.

931
00:57:43,140 --> 00:57:45,570
Well, I could type select or not select.

932
00:57:45,570 --> 00:57:47,130
I could type Create View first.

933
00:57:47,130 --> 00:57:48,210
I'll create a view.

934
00:57:48,210 --> 00:57:49,980
This one just called analysis.

935
00:57:49,980 --> 00:57:51,810
I'll give this to my data analysts here.

936
00:57:51,810 --> 00:57:53,760
Create view analysis.

937
00:57:53,760 --> 00:58:01,570
And then I'll say as let's go with
select ID, origin, and destination,

938
00:58:01,570 --> 00:58:08,820
as well as the anonymous column
as, in this case, the rider.

939
00:58:08,820 --> 00:58:16,020
And I'll select all of this from my
rides table semicolon, hit Enter.

940
00:58:16,020 --> 00:58:21,070
Now, if I type dot schema,
I see my new view analysis.

941
00:58:21,070 --> 00:58:27,240
And if I say select star
from analysis semicolon,

942
00:58:27,240 --> 00:58:33,870
I should see my "anonymized",
quote unquote, this data.

943
00:58:33,870 --> 00:58:36,040
But this is helpful.

944
00:58:36,040 --> 00:58:40,380
I'd argue it's not very
secure, at least in SQLite.

945
00:58:40,380 --> 00:58:44,340
And if you're thinking
adversarially here,

946
00:58:44,340 --> 00:58:46,470
what might you be able to do still?

947
00:58:46,470 --> 00:58:49,740
If I gave you access
to this database, what

948
00:58:49,740 --> 00:58:53,100
could you do to de-anonymize
this data and to see

949
00:58:53,100 --> 00:58:55,560
the data I didn't want you to see?

950
00:58:55,560 --> 00:58:58,620
AUDIENCE: You can just
query the riders table

951
00:58:58,620 --> 00:59:03,216
and find what are the riders
values in the riders column.

952
00:59:03,216 --> 00:59:06,360
CARTER ZENKE: OK, so you're correct
that we can still see the rides table.

953
00:59:06,360 --> 00:59:11,400
Maybe I look at the ID column in the
analysis view and I see those IDs.

954
00:59:11,400 --> 00:59:15,180
I might be able to link those IDs
with the IDs in the rides table.

955
00:59:15,180 --> 00:59:17,490
And nothing's stopping
me at least in SQLite

956
00:59:17,490 --> 00:59:20,553
from saying select star
from the rides table.

957
00:59:20,553 --> 00:59:21,720
So I'll come back over here.

958
00:59:21,720 --> 00:59:22,900
And I'll try that.

959
00:59:22,900 --> 00:59:28,600
I'll say, in this case, select
star from rides, hit semicolon.

960
00:59:28,600 --> 00:59:32,190
And now I see all that ride
information, although I hid it

961
00:59:32,190 --> 00:59:34,350
in this view called analysis.

962
00:59:34,350 --> 00:59:37,260
And this is a downside of SQLite.

963
00:59:37,260 --> 00:59:41,790
In SQLite we can't set access
controls, meaning I either give you

964
00:59:41,790 --> 00:59:44,890
the entire database or nothing at all.

965
00:59:44,890 --> 00:59:48,870
In other DBMSes though, I
could set access controls.

966
00:59:48,870 --> 00:59:52,690
I could give David access
only to the analysis view.

967
00:59:52,690 --> 00:59:57,780
I can give myself access to both the
rides table and the analysis view.

968
00:59:57,780 --> 01:00:01,590
So keep that in mind when
you're working with SQLite.

969
01:00:01,590 --> 01:00:06,870
OK, so we've seen here how to secure
some data using SQLite and using

970
01:00:06,870 --> 01:00:08,337
views in this as well.

971
01:00:08,337 --> 01:00:10,920
Let's come back in just a minute
and see how we could actually

972
01:00:10,920 --> 01:00:15,340
use SQLite and views to
not just secure our data,

973
01:00:15,340 --> 01:00:19,180
but also to work on soft deletions,
which you saw just a bit ago.

974
01:00:19,180 --> 01:00:22,800
And we're back for one
final application of views.

975
01:00:22,800 --> 01:00:27,390
We saw just a few weeks ago how we
can implement soft deletions, that

976
01:00:27,390 --> 01:00:29,730
is not fully deleting
something in our database,

977
01:00:29,730 --> 01:00:31,860
but only marking it as deleted.

978
01:00:31,860 --> 01:00:36,180
And we saw this in the context of
Boston's own Museum of Fine Art.

979
01:00:36,180 --> 01:00:39,130
It's keeping track of items
inside their collection.

980
01:00:39,130 --> 01:00:44,310
So let's say we had this table called
collections full of items in the MFA,

981
01:00:44,310 --> 01:00:46,710
the Museum of Fine Arts in Boston.

982
01:00:46,710 --> 01:00:52,470
Here I have four items currently in
our collections ID one, ID two, three,

983
01:00:52,470 --> 01:00:53,490
and four.

984
01:00:53,490 --> 01:00:58,530
But I also have this deleted column,
which is either 0 or 1, depending on

985
01:00:58,530 --> 01:01:01,660
whether an item has been deleted or not.

986
01:01:01,660 --> 01:01:06,300
So I want to delete, let's say, Farmers
Working at Dawn, this first piece here.

987
01:01:06,300 --> 01:01:08,580
Instead of deleting it
from the table fully,

988
01:01:08,580 --> 01:01:13,360
I could set this deleted column
from a 0 to a 1, just like this.

989
01:01:13,360 --> 01:01:16,260
And now, it's only soft deleted.

990
01:01:16,260 --> 01:01:17,430
I still have the data here.

991
01:01:17,430 --> 01:01:18,840
I still have the title.

992
01:01:18,840 --> 01:01:23,070
But now instead of being
a 0, it's marked as a 1.

993
01:01:23,070 --> 01:01:25,420
Now, this is useful.

994
01:01:25,420 --> 01:01:31,950
But what could I do to only see now
the items that have not been deleted?

995
01:01:31,950 --> 01:01:35,820
If I show you this database,
collections here, the table here,

996
01:01:35,820 --> 01:01:38,890
I have titles and the deleted column.

997
01:01:38,890 --> 01:01:44,310
But if I only want to see those items
that are not deleted, what query could

998
01:01:44,310 --> 01:01:46,290
I use for that?

999
01:01:46,290 --> 01:01:48,225
AUDIENCE: You would
use a WHERE statement.

1000
01:01:48,225 --> 01:01:50,100
CARTER ZENKE: I could
use a SELECT statement.

1001
01:01:50,100 --> 01:01:52,600
And I could use probably
a WHERE clause as well.

1002
01:01:52,600 --> 01:01:56,580
I could say select star
where perhaps deleted

1003
01:01:56,580 --> 01:02:00,210
equals 0 because in this
case, when deleted is 0,

1004
01:02:00,210 --> 01:02:04,540
we know this item is still
supposed to be in our table.

1005
01:02:04,540 --> 01:02:08,100
So here's the query here,
select star from collections

1006
01:02:08,100 --> 01:02:10,980
where deleted equals 0.

1007
01:02:10,980 --> 01:02:14,520
Now, we can try to convert
this query into a view,

1008
01:02:14,520 --> 01:02:18,420
so we could always only see the
items in our current collection,

1009
01:02:18,420 --> 01:02:21,373
not those we have marked as deleted.

1010
01:02:21,373 --> 01:02:22,540
So I'll come back over here.

1011
01:02:22,540 --> 01:02:25,150
And we'll see we have
our collections table.

1012
01:02:25,150 --> 01:02:28,680
But we also have now perhaps
a new view of this table

1013
01:02:28,680 --> 01:02:30,670
called current collections.

1014
01:02:30,670 --> 01:02:35,730
Well, if I set current collections
to be the result of this query,

1015
01:02:35,730 --> 01:02:41,010
I should see when I update deleted
to be 1 for some of these items here,

1016
01:02:41,010 --> 01:02:44,620
it should be removed from this
current collections table.

1017
01:02:44,620 --> 01:02:46,170
So let's try that.

1018
01:02:46,170 --> 01:02:49,530
I could take Farmers Working
at Dawn and delete it.

1019
01:02:49,530 --> 01:02:54,720
I could see this correspondence between
this row here and this row here.

1020
01:02:54,720 --> 01:02:58,320
It's in both our collections
table and current collections.

1021
01:02:58,320 --> 01:03:02,460
But if I set deleted
from a 0 to 1, I should

1022
01:03:02,460 --> 01:03:08,310
see that item removed from current
collections, thus updating my view

1023
01:03:08,310 --> 01:03:12,970
and keeping track of my items that
are not currently in my collection.

1024
01:03:12,970 --> 01:03:17,430
So let's come back here and implement
this in terms of our own database.

1025
01:03:17,430 --> 01:03:19,440
I'll come back over here.

1026
01:03:19,440 --> 01:03:24,330
And I will open up MFA dot DB, that
old database from a few weeks ago.

1027
01:03:24,330 --> 01:03:28,500
I'll say SQLite 3 MFA dot DB.

1028
01:03:28,500 --> 01:03:32,040
And now if I type dot
schema, I should see

1029
01:03:32,040 --> 01:03:36,840
the tables inside of this database,
collections, artists, and created.

1030
01:03:36,840 --> 01:03:39,460
We'll focus in particular
on the collections table.

1031
01:03:39,460 --> 01:03:43,080
So if I say select
star from collections,

1032
01:03:43,080 --> 01:03:47,700
I should see that I have the title,
the accession number, and the date

1033
01:03:47,700 --> 01:03:50,460
this piece was acquired by the MFA.

1034
01:03:50,460 --> 01:03:54,510
Now, though, if I want
to add a deleted column,

1035
01:03:54,510 --> 01:03:57,840
I could do so by using alter table.

1036
01:03:57,840 --> 01:04:03,180
I could say, let's
alter table collections

1037
01:04:03,180 --> 01:04:07,050
and add a column called deleted.

1038
01:04:07,050 --> 01:04:10,590
I'll say this column
has the type integer.

1039
01:04:10,590 --> 01:04:12,120
It stores whole numbers.

1040
01:04:12,120 --> 01:04:16,320
And the default number is 0 semicolon.

1041
01:04:16,320 --> 01:04:23,790
Now, if I say select star
from collections semicolon,

1042
01:04:23,790 --> 01:04:27,090
I should see there is this
new deleted column where

1043
01:04:27,090 --> 01:04:30,820
all the values are 0 at first.

1044
01:04:30,820 --> 01:04:35,010
So if I want to delete something,
I could instead of typing

1045
01:04:35,010 --> 01:04:39,840
delete from as we saw before I can
just update this deleted value.

1046
01:04:39,840 --> 01:04:48,210
I could say maybe update collections
and set deleted equal to 1

1047
01:04:48,210 --> 01:04:51,330
where a certain title is present.

1048
01:04:51,330 --> 01:04:57,870
I'll say this that the
title is equal to Farmers--

1049
01:04:57,870 --> 01:05:03,210
title equals Farmers
Working at Dawn semicolon.

1050
01:05:03,210 --> 01:05:06,510
So now, I've soft deleted this item.

1051
01:05:06,510 --> 01:05:09,180
It's still in my table,
at least the title is.

1052
01:05:09,180 --> 01:05:11,910
But now I've marked it as being deleted.

1053
01:05:11,910 --> 01:05:14,400
It should not be in my ultimate view.

1054
01:05:14,400 --> 01:05:15,840
I'll hit Enter here.

1055
01:05:15,840 --> 01:05:18,570
And I'll say select
star from collections.

1056
01:05:18,570 --> 01:05:23,610
I'll see that in this case, I
still see Farmers Working at Dawn.

1057
01:05:23,610 --> 01:05:24,790
But what do I want?

1058
01:05:24,790 --> 01:05:31,000
I want a view that only shows me those
items that have not been deleted.

1059
01:05:31,000 --> 01:05:32,220
So I'll try this.

1060
01:05:32,220 --> 01:05:36,120
I'll say select star from collections
and apply a condition to it,

1061
01:05:36,120 --> 01:05:38,740
to our earlier point before.

1062
01:05:38,740 --> 01:05:44,160
So I'll say where, in this case,
deleted equals 0 semicolon, hit Enter.

1063
01:05:44,160 --> 01:05:49,710
And now I should see only those
items that are not marked as deleted.

1064
01:05:49,710 --> 01:05:52,275
But I want to turn this
into a view, which I could.

1065
01:05:52,275 --> 01:05:58,080
I could say maybe create a view
called current collections.

1066
01:05:58,080 --> 01:06:04,380
And then I'll make this the
following query as select ID,

1067
01:06:04,380 --> 01:06:08,730
select title, accession
number, and acquired.

1068
01:06:08,730 --> 01:06:12,390
But now I'll actually
omit that deleted column.

1069
01:06:12,390 --> 01:06:15,960
I don't want this column to show
up in my current collections.

1070
01:06:15,960 --> 01:06:19,980
I only want it to be in
the underlying table.

1071
01:06:19,980 --> 01:06:24,160
So I'll select all of these
columns from the collections table.

1072
01:06:24,160 --> 01:06:28,980
And then I'll say where deleted
equals 0 to apply this condition

1073
01:06:28,980 --> 01:06:33,250
and only get those items that
are not marked as deleted.

1074
01:06:33,250 --> 01:06:34,980
Now, I'll hit Enter.

1075
01:06:34,980 --> 01:06:40,080
And I should be able to do this,
select star from current collections

1076
01:06:40,080 --> 01:06:41,230
semicolon.

1077
01:06:41,230 --> 01:06:47,410
And now I'll see only those
items that have not been deleted.

1078
01:06:47,410 --> 01:06:49,190
So this seems worthwhile.

1079
01:06:49,190 --> 01:06:53,970
I'm now able to view the items that
are currently in my collection.

1080
01:06:53,970 --> 01:06:56,190
But a problem remains.

1081
01:06:56,190 --> 01:07:01,200
Like if I tried to update
this table and insert

1082
01:07:01,200 --> 01:07:04,380
something into it, what might happen?

1083
01:07:04,380 --> 01:07:06,450
Let me ask our audience here.

1084
01:07:06,450 --> 01:07:08,850
Based on what we saw
before, what would happen

1085
01:07:08,850 --> 01:07:14,190
if I tried to insert into this
table here, this view perhaps?

1086
01:07:14,190 --> 01:07:20,553
AUDIENCE: I think we cannot update
the view itself, as we saw before.

1087
01:07:20,553 --> 01:07:22,720
CARTER ZENKE: Yeah, we can't
update the view itself.

1088
01:07:22,720 --> 01:07:24,030
We can't insert new data.

1089
01:07:24,030 --> 01:07:25,780
We can't even delete data.

1090
01:07:25,780 --> 01:07:28,140
So let me try first to delete data.

1091
01:07:28,140 --> 01:07:30,960
And then we'll see how to insert
some data a little bit later.

1092
01:07:30,960 --> 01:07:33,210
I'll come back over to my terminal.

1093
01:07:33,210 --> 01:07:36,570
And what I wanted to delete,
Imaginative Landscape,

1094
01:07:36,570 --> 01:07:44,250
I could say delete from current
collections where the title equals--

1095
01:07:44,250 --> 01:07:45,960
let me make a new line here--

1096
01:07:45,960 --> 01:07:54,180
where the title equals
Imaginative Landscape semicolon.

1097
01:07:54,180 --> 01:07:56,895
Now, I'll see that same
error we got before when

1098
01:07:56,895 --> 01:07:58,500
we tried to update some view.

1099
01:07:58,500 --> 01:08:02,760
I can't modify this
because it is a view.

1100
01:08:02,760 --> 01:08:06,360
Well, it turns out there is
a way to work around this.

1101
01:08:06,360 --> 01:08:08,310
We can't modify the view.

1102
01:08:08,310 --> 01:08:12,000
But we can modify the underlying table.

1103
01:08:12,000 --> 01:08:17,920
So to do this, we rank from old friends,
in this case, those called triggers.

1104
01:08:17,920 --> 01:08:21,390
So if you remember, a
trigger is a way to specify

1105
01:08:21,390 --> 01:08:26,340
some SQL statement I want to run
when some other SQL statement is run.

1106
01:08:26,340 --> 01:08:29,609
We saw before we could
use before or after

1107
01:08:29,609 --> 01:08:34,050
to say run this SQL statement either
before or after some other SQL

1108
01:08:34,050 --> 01:08:35,100
statement here.

1109
01:08:35,100 --> 01:08:40,170
What we didn't see until now
is this INSTEAD OF trigger.

1110
01:08:40,170 --> 01:08:42,450
I can create a trigger that has a name.

1111
01:08:42,450 --> 01:08:46,810
And I can run it instead of
some other SQL statement.

1112
01:08:46,810 --> 01:08:50,760
So I could say INSTEAD
OF deleting on this view,

1113
01:08:50,760 --> 01:08:54,630
let me give you the proper
query to run, in this case.

1114
01:08:54,630 --> 01:08:58,620
So instead of deleting on
this view, I will for each row

1115
01:08:58,620 --> 01:09:00,420
run this other statement.

1116
01:09:00,420 --> 01:09:01,870
I'll say begin.

1117
01:09:01,870 --> 01:09:04,090
This is the statement I
want you to run instead.

1118
01:09:04,090 --> 01:09:08,100
And in this statement, we could
have something that actually updates

1119
01:09:08,100 --> 01:09:11,399
not the view, but the underlying table.

1120
01:09:11,399 --> 01:09:14,680
And we then end that query here.

1121
01:09:14,680 --> 01:09:19,920
So based on what we know about
our view and the underlying table

1122
01:09:19,920 --> 01:09:25,470
called collections, what query should
we put in here to actually delete

1123
01:09:25,470 --> 01:09:28,510
something from our table?

1124
01:09:28,510 --> 01:09:30,390
Let me ask our audience.

1125
01:09:30,390 --> 01:09:36,593
Instead of deleting from our view,
what should we instead do to our table?

1126
01:09:36,593 --> 01:09:38,760
AUDIENCE: Yeah, you should
update the deleted column

1127
01:09:38,760 --> 01:09:41,130
in the collections table.

1128
01:09:41,130 --> 01:09:44,399
CARTER ZENKE: Yeah, so instead of
deleting from our view, what I really

1129
01:09:44,399 --> 01:09:46,720
want to do is update that table.

1130
01:09:46,720 --> 01:09:48,645
So let's try to trigger
it to do just that.

1131
01:09:48,645 --> 01:09:51,540
I'll come back to my computer here.

1132
01:09:51,540 --> 01:09:54,420
And I'll start on this trigger.

1133
01:09:54,420 --> 01:09:57,390
To our point, I want to
create a trigger that

1134
01:09:57,390 --> 01:10:00,120
runs instead of deleting on this view.

1135
01:10:00,120 --> 01:10:04,710
And what I want to instead happen is
an update on the underlying table.

1136
01:10:04,710 --> 01:10:09,420
So I'll say create a trigger,
this one called delete.

1137
01:10:09,420 --> 01:10:15,840
And then I'll say instead of
delete on current collections, that

1138
01:10:15,840 --> 01:10:21,780
is whenever SQL listens and hears a
delete on the current collections view,

1139
01:10:21,780 --> 01:10:23,460
I don't want that to happen.

1140
01:10:23,460 --> 01:10:26,190
I instead want this other
statement to happen.

1141
01:10:26,190 --> 01:10:29,370
I'll say for each row
affected by that delete,

1142
01:10:29,370 --> 01:10:32,760
for each row I ask delete
on current collections, run

1143
01:10:32,760 --> 01:10:34,560
instead this other statement.

1144
01:10:34,560 --> 01:10:35,550
I'll say begin.

1145
01:10:35,550 --> 01:10:36,990
Here is that statement.

1146
01:10:36,990 --> 01:10:39,400
One, two, three, four
indents for style sake.

1147
01:10:39,400 --> 01:10:51,120
And now I'll say I want to update
collections and set deleted equal to 1.

1148
01:10:51,120 --> 01:10:54,300
But I don't want to
leave this just as is.

1149
01:10:54,300 --> 01:10:58,770
If I say update collections,
set deleted equal to 1,

1150
01:10:58,770 --> 01:11:03,160
what's the problem with that
statement, if I end it there?

1151
01:11:03,160 --> 01:11:05,820
AUDIENCE: So yeah, it will
update all the columns,

1152
01:11:05,820 --> 01:11:09,540
whether it is we want
it to be deleted or not.

1153
01:11:09,540 --> 01:11:12,970
CARTER ZENKE: Yeah, with no condition
it'll update all the columns here.

1154
01:11:12,970 --> 01:11:15,690
So I want to apply a condition
to make sure it only affects

1155
01:11:15,690 --> 01:11:18,340
the particular row I'm working with.

1156
01:11:18,340 --> 01:11:20,280
Well, what uniquely identifies each row?

1157
01:11:20,280 --> 01:11:21,490
The ID does.

1158
01:11:21,490 --> 01:11:23,460
So let's try using the ID here.

1159
01:11:23,460 --> 01:11:24,390
I'll come back.

1160
01:11:24,390 --> 01:11:28,530
And I'll say I want to update
collections, set deleted equal to 1

1161
01:11:28,530 --> 01:11:31,800
but only when some condition is true.

1162
01:11:31,800 --> 01:11:37,600
I'll indent again four times
and say where, in this case,

1163
01:11:37,600 --> 01:11:44,310
the ID column is equal to the old ID.

1164
01:11:44,310 --> 01:11:48,750
Now, old here is a special keyword
that means the row we just deleted

1165
01:11:48,750 --> 01:11:51,540
or that we would have deleted
from current collections.

1166
01:11:51,540 --> 01:11:58,260
I want to update that same row with that
same ID in the underlying collections

1167
01:11:58,260 --> 01:11:59,460
table.

1168
01:11:59,460 --> 01:12:02,400
So I'll put a semicolon here to end.

1169
01:12:02,400 --> 01:12:04,980
And then I'll say end this statement.

1170
01:12:04,980 --> 01:12:06,870
This is the entire
statement I want to run

1171
01:12:06,870 --> 01:12:10,980
after I try to delete all my
current collections table.

1172
01:12:10,980 --> 01:12:12,780
Now, I'll hit Enter.

1173
01:12:12,780 --> 01:12:14,200
And I'll see nothing changes.

1174
01:12:14,200 --> 01:12:18,490
But if I say dot schema, I
should then see down below here,

1175
01:12:18,490 --> 01:12:21,750
I have a trigger called delete that
will instead run an update when

1176
01:12:21,750 --> 01:12:24,610
I try to delete from this view.

1177
01:12:24,610 --> 01:12:25,710
So let's try that.

1178
01:12:25,710 --> 01:12:30,570
What I can do now is try again to
delete from current collections.

1179
01:12:30,570 --> 01:12:36,270
I'll say delete from
current collections where.

1180
01:12:36,270 --> 01:12:41,267
In this case, the title
equals Imaginative Landscape.

1181
01:12:41,267 --> 01:12:42,600
Let's say we sold this painting.

1182
01:12:42,600 --> 01:12:44,460
It's no longer part of our collection.

1183
01:12:44,460 --> 01:12:47,140
Imaginative Landscape semicolon.

1184
01:12:47,140 --> 01:12:50,040
And now it actually seems to work.

1185
01:12:50,040 --> 01:12:52,060
I get no error here.

1186
01:12:52,060 --> 01:12:54,060
So it seems like it went OK.

1187
01:12:54,060 --> 01:12:59,460
Well, if I select star
from current collections,

1188
01:12:59,460 --> 01:13:01,920
I actually see that it's gone.

1189
01:13:01,920 --> 01:13:03,270
But is it really gone?

1190
01:13:03,270 --> 01:13:08,220
If I try select star from collections,
not current collections, the underlying

1191
01:13:08,220 --> 01:13:12,780
table for current collections
semicolon, what do I see?

1192
01:13:12,780 --> 01:13:18,780
That the ID column has been
updated from a 0 now to a 1.

1193
01:13:18,780 --> 01:13:21,420
So a good way of trying
to use triggers here

1194
01:13:21,420 --> 01:13:27,240
to update our views by instead
modifying the underlying table.

1195
01:13:27,240 --> 01:13:32,650
Now, going to be one more issue we
encounter here, which is the following.

1196
01:13:32,650 --> 01:13:36,430
Let's say I'm at this
current state in my tables.

1197
01:13:36,430 --> 01:13:39,450
I have collections, where Farmers
Working at Dawn is deleted.

1198
01:13:39,450 --> 01:13:41,190
It has a 1 right here.

1199
01:13:41,190 --> 01:13:45,340
And we thus can't see it
inside of current collections.

1200
01:13:45,340 --> 01:13:49,080
Well, let's say I want to not
delete from current collections,

1201
01:13:49,080 --> 01:13:51,930
but also add to current collections.

1202
01:13:51,930 --> 01:13:54,660
I could try to insert into this view.

1203
01:13:54,660 --> 01:13:59,070
But I probably actually won't be
able to do that because it is a view.

1204
01:13:59,070 --> 01:14:00,390
Let's think then.

1205
01:14:00,390 --> 01:14:06,900
If I wanted to insert into this view,
what should I really do instead?

1206
01:14:06,900 --> 01:14:11,318
AUDIENCE: You should add it to the
collections table instead of the view.

1207
01:14:11,318 --> 01:14:12,610
CARTER ZENKE: I like that idea.

1208
01:14:12,610 --> 01:14:18,480
So I should add this row to the
collections table instead of the view.

1209
01:14:18,480 --> 01:14:21,120
But now, here's a bit of a trick.

1210
01:14:21,120 --> 01:14:24,870
Let's say I add Farmers Working at Dawn.

1211
01:14:24,870 --> 01:14:27,030
We reacquired it.

1212
01:14:27,030 --> 01:14:34,630
What should I do in that case, when I'm
adding a painting I previously deleted?

1213
01:14:34,630 --> 01:14:40,200
AUDIENCE: So what we could do is
we know that the deleted column--

1214
01:14:40,200 --> 01:14:45,990
we can check if the deleted column
is gone and then based on that,

1215
01:14:45,990 --> 01:14:49,530
we could update that particular column.

1216
01:14:49,530 --> 01:14:57,430
Or if there is no title
in the collections table,

1217
01:14:57,430 --> 01:14:59,782
we can just add it, insert it.

1218
01:14:59,782 --> 01:15:01,740
CARTER ZENKE: Yeah, I
like your thinking there.

1219
01:15:01,740 --> 01:15:05,820
First trying to check, do we have
this item already in our table?

1220
01:15:05,820 --> 01:15:07,290
Or do we not?

1221
01:15:07,290 --> 01:15:09,690
If we do already have it in
our table, we should just

1222
01:15:09,690 --> 01:15:12,780
flip this bit from 1 to a 0.

1223
01:15:12,780 --> 01:15:17,890
If we don't though, let's add a
brand new row, as we heard before.

1224
01:15:17,890 --> 01:15:19,350
So let's think then.

1225
01:15:19,350 --> 01:15:23,610
What could we use to conditionally run
some code on the collections table?

1226
01:15:23,610 --> 01:15:27,180
Well, it turns out that you can
use some other aspects of triggers.

1227
01:15:27,180 --> 01:15:32,970
In this case, triggers have not just
unconditional use cases, but also

1228
01:15:32,970 --> 01:15:35,190
conditional ones too.

1229
01:15:35,190 --> 01:15:37,390
So let's try this.

1230
01:15:37,390 --> 01:15:43,200
I could try to insert on this view
and then run some code down below.

1231
01:15:43,200 --> 01:15:45,400
But to our discussion
a little bit earlier,

1232
01:15:45,400 --> 01:15:47,700
I don't want to unconditionally do this.

1233
01:15:47,700 --> 01:15:50,700
I want to check, is
this item in my table?

1234
01:15:50,700 --> 01:15:52,540
Or is it not already?

1235
01:15:52,540 --> 01:15:56,070
So for that, I could use
this keyword called when.

1236
01:15:56,070 --> 01:16:02,050
I could say for each row when some
condition is true run this trigger.

1237
01:16:02,050 --> 01:16:04,710
And you can imagine having
two triggers to insert

1238
01:16:04,710 --> 01:16:09,540
into our view, one that runs when
the item is already in our table.

1239
01:16:09,540 --> 01:16:13,560
We could then flip that
bit from a 1 to a 0.

1240
01:16:13,560 --> 01:16:18,300
We could also have a different
trigger to actually insert a new item.

1241
01:16:18,300 --> 01:16:21,300
In that case, actually
inserting a brand new row

1242
01:16:21,300 --> 01:16:23,920
to our underlying collections table.

1243
01:16:23,920 --> 01:16:28,620
So let's work on this now to finish
off and polish our collections view.

1244
01:16:28,620 --> 01:16:31,480
I'll come back to our terminal here.

1245
01:16:31,480 --> 01:16:35,760
And what we'll try to do is
now insert into this table.

1246
01:16:35,760 --> 01:16:41,040
I'll create a new trigger, in this
case, called insert when exists.

1247
01:16:41,040 --> 01:16:46,930
I'll say create trigger
insert when exists.

1248
01:16:46,930 --> 01:16:49,830
So this trigger will run when
an item we're trying to insert

1249
01:16:49,830 --> 01:16:52,440
is already in our collections table.

1250
01:16:52,440 --> 01:16:57,510
I'll say instead of insert on
the current collections table,

1251
01:16:57,510 --> 01:17:00,090
I want this to happen instead.

1252
01:17:00,090 --> 01:17:03,940
I'll say for each row.

1253
01:17:03,940 --> 01:17:07,320
Now I'll Enter again and say when--

1254
01:17:07,320 --> 01:17:16,650
just for style sake-- when new dot
accession number is in my table

1255
01:17:16,650 --> 01:17:17,620
already.

1256
01:17:17,620 --> 01:17:20,950
I want to find which accession
numbers I already have.

1257
01:17:20,950 --> 01:17:24,900
And if you remember, accession
number is a unique ID for the museum.

1258
01:17:24,900 --> 01:17:27,270
It's not the ID column
of my table, but a unique

1259
01:17:27,270 --> 01:17:29,598
ID to keep track of items in the museum.

1260
01:17:29,598 --> 01:17:31,890
So if I've already given this
item an accession number,

1261
01:17:31,890 --> 01:17:34,260
I should already see
it in my table already.

1262
01:17:34,260 --> 01:17:37,620
So I'll say then Enter,
Enter, Enter for style sake,

1263
01:17:37,620 --> 01:17:48,280
select accession number
from collections.

1264
01:17:48,280 --> 01:17:52,650
So this subquery tells me what
accession numbers do we already have.

1265
01:17:52,650 --> 01:17:56,670
And this condition tells me
is the new row I'm inserting,

1266
01:17:56,670 --> 01:17:59,640
does that have the same
accession number or not?

1267
01:17:59,640 --> 01:18:02,550
Now, I'll close this subquery.

1268
01:18:02,550 --> 01:18:05,040
And I'll continue my trigger.

1269
01:18:05,040 --> 01:18:06,780
I'll say begin.

1270
01:18:06,780 --> 01:18:11,640
Well, in the case that this item
already exists in my collections table,

1271
01:18:11,640 --> 01:18:12,930
I just want to update.

1272
01:18:12,930 --> 01:18:17,130
I want to flip that integer
from a 1 back down to a 0.

1273
01:18:17,130 --> 01:18:26,470
So I'll say update collections
and set deleted equal to 0.

1274
01:18:26,470 --> 01:18:29,850
Now, I want to do this
not on my entire table,

1275
01:18:29,850 --> 01:18:36,510
but only where the accession number
is equal to the new accession

1276
01:18:36,510 --> 01:18:41,440
number, that is the accession
number I'm inserting here as well.

1277
01:18:41,440 --> 01:18:43,390
I'll then end like this.

1278
01:18:43,390 --> 01:18:45,690
And now I have my entire trigger.

1279
01:18:45,690 --> 01:18:50,760
I'll say instead of inserting into
my view, I want to run this instead.

1280
01:18:50,760 --> 01:18:53,280
Check if I've already added this item.

1281
01:18:53,280 --> 01:18:58,200
And if I have, change that
deleted column from a 1 to a 0.

1282
01:18:58,200 --> 01:19:00,030
Now I'll hit Enter.

1283
01:19:00,030 --> 01:19:03,030
And let me hit Enter again.

1284
01:19:03,030 --> 01:19:06,030
And let me check.

1285
01:19:06,030 --> 01:19:09,210
I needed to have a
semicolon after my where.

1286
01:19:09,210 --> 01:19:12,300
So let me try to run this again.

1287
01:19:12,300 --> 01:19:14,280
I'll do Control D to restart.

1288
01:19:14,280 --> 01:19:16,890
And I'll say SQLite 3.

1289
01:19:16,890 --> 01:19:18,780
Let's do MFA dot DB.

1290
01:19:18,780 --> 01:19:20,800
And I'll restart this trigger.

1291
01:19:20,800 --> 01:19:26,130
So here I'll say create
trigger insert when exists.

1292
01:19:26,130 --> 01:19:28,020
We gave the name before.

1293
01:19:28,020 --> 01:19:33,750
Then I'll say instead of insert
on current collections, what

1294
01:19:33,750 --> 01:19:35,460
I want to do is run this.

1295
01:19:35,460 --> 01:19:39,330
I'll say for each row I
insert, check some condition.

1296
01:19:39,330 --> 01:19:43,680
When the new accession number
that is the unique ID the museum

1297
01:19:43,680 --> 01:19:48,330
has for this item is in my already
available accession numbers.

1298
01:19:48,330 --> 01:20:00,430
I'll say select in this case, select
accession number from collections.

1299
01:20:00,430 --> 01:20:03,010
And then I'll close the subquery.

1300
01:20:03,010 --> 01:20:06,760
Now, I'll say I want to begin
the same way I want to run.

1301
01:20:06,760 --> 01:20:11,770
Begin now update, in this
case, my collections table and

1302
01:20:11,770 --> 01:20:14,500
set the deleted column
from a 1 back down to a 0.

1303
01:20:14,500 --> 01:20:24,560
Update collections and set deleted
equals 0 where, in this case,

1304
01:20:24,560 --> 01:20:30,130
the accession number is equal
to the new accession number,

1305
01:20:30,130 --> 01:20:32,900
only modifying that particular row.

1306
01:20:32,900 --> 01:20:35,350
Now semicolon to close this statement.

1307
01:20:35,350 --> 01:20:39,320
And then I'll say end below,
a second semicolon, Enter.

1308
01:20:39,320 --> 01:20:43,000
Now if I type dot schema,
I should see my new trigger

1309
01:20:43,000 --> 01:20:47,650
to insert into my collections table
when I try to insert into my view.

1310
01:20:47,650 --> 01:20:48,790
So let's try this.

1311
01:20:48,790 --> 01:20:54,323
I'll say maybe I want to insert
into my current collections.

1312
01:20:54,323 --> 01:20:55,240
Well, let me try this.

1313
01:20:55,240 --> 01:21:01,180
I'll say select star from collections.

1314
01:21:01,180 --> 01:21:05,830
And let's say I want to reinsert,
in this case, Profusion of Flowers.

1315
01:21:05,830 --> 01:21:09,760
So I'll say insert into
my current collections.

1316
01:21:09,760 --> 01:21:14,300
Actually, reinsert into
current collections.

1317
01:21:14,300 --> 01:21:17,200
And I want to insert
Imaginative Landscape.

1318
01:21:17,200 --> 01:21:21,310
So I'll say insert into current
collections some set of values along

1319
01:21:21,310 --> 01:21:22,070
with some columns.

1320
01:21:22,070 --> 01:21:25,900
I'll say the title is Perfusion of
Flowers as Imaginative Landscape

1321
01:21:25,900 --> 01:21:32,905
later on, then accession number, then
the date it was acquired, like this.

1322
01:21:32,905 --> 01:21:35,590
Now I'll hit Enter and
provide the values.

1323
01:21:35,590 --> 01:21:40,030
I'll say the title is
Imaginative Landscape like this.

1324
01:21:40,030 --> 01:21:43,450
The accession number is 56.496.

1325
01:21:43,450 --> 01:21:47,090
And the acquired date is going
to be null, in this case.

1326
01:21:47,090 --> 01:21:48,820
We don't know when we got it.

1327
01:21:48,820 --> 01:21:51,520
I'll hit semicolon here and Enter.

1328
01:21:51,520 --> 01:21:55,390
So if I select star from collections
now, select star from collections

1329
01:21:55,390 --> 01:22:00,040
semicolon, I should see that I
didn't actually insert a new row.

1330
01:22:00,040 --> 01:22:05,140
I only flipped that deleted
value from a 1 back down to a 0.

1331
01:22:05,140 --> 01:22:07,700
Now, there's certainly more to do here.

1332
01:22:07,700 --> 01:22:12,170
I could write a trigger to actually
add a brand new row to this table here.

1333
01:22:12,170 --> 01:22:16,180
I could also try to update this data,
change, for example, the acquired

1334
01:22:16,180 --> 01:22:19,780
date from null to a date
that we actually do know.

1335
01:22:19,780 --> 01:22:21,850
For now though, we leave
all that up to you.

1336
01:22:21,850 --> 01:22:24,630
And we'll see you next time.

1337
01:22:24,630 --> 01:22:26,000