WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:03.437
[AUDIO LOGO]

00:00:16.720 --> 00:00:20.080
[? CARTER ZENKE: ?] Well, hello, one
and all, and welcome back to CS50's

00:00:20.080 --> 00:00:22.480
Introduction to Databases with SQL.

00:00:22.480 --> 00:00:25.720
My name is Carter [? Zenke, ?]
and last we left off,

00:00:25.720 --> 00:00:28.450
we learned how to create our
very own database schemas--

00:00:28.450 --> 00:00:31.600
that is, a way to organize
data in our database.

00:00:31.600 --> 00:00:34.450
Today, we'll learn how to actually
add data to our databases--

00:00:34.450 --> 00:00:38.150
to insert, to update, and delete data.

00:00:38.150 --> 00:00:43.330
We'll do all of this in the context of
Boston's very own Museum of Fine Arts,

00:00:43.330 --> 00:00:45.170
or the MFA for short.

00:00:45.170 --> 00:00:47.617
So the MFA here in Boston is a century--

00:00:47.617 --> 00:00:49.450
well, maybe, about a
century-old museum that

00:00:49.450 --> 00:00:54.640
has many artifacts and artwork inside
of it, both historical and contemporary.

00:00:54.640 --> 00:00:58.720
And it's worth asking, how do they
keep track of the thousands of items

00:00:58.720 --> 00:01:00.410
that are in their collections?

00:01:00.410 --> 00:01:02.290
What could they possibly use?

00:01:02.290 --> 00:01:06.580
Well, chances are, they're likely
using some kind of database.

00:01:06.580 --> 00:01:10.630
And on that database, they
want to do four actions

00:01:10.630 --> 00:01:13.180
we've learned about back in week zero.

00:01:13.180 --> 00:01:17.620
They could, perhaps, create data-- to
add data or insert data to the database

00:01:17.620 --> 00:01:20.270
when they get some new piece
of artwork, for instance.

00:01:20.270 --> 00:01:22.970
They might want to query the
database, to read from it.

00:01:22.970 --> 00:01:27.250
They could also update data, change the
artist, change the artwork in some way.

00:01:27.250 --> 00:01:31.340
And they could also just delete
data to remove it all together.

00:01:31.340 --> 00:01:32.920
But if we think about these--

00:01:32.920 --> 00:01:36.370
creating, reading, updating,
and deleting-- we'll

00:01:36.370 --> 00:01:39.040
notice that reading,
updating, and deleting-- we

00:01:39.040 --> 00:01:42.910
can't do those if we don't
actually have data in our database.

00:01:42.910 --> 00:01:46.210
And so today, we'll see
how to create data--

00:01:46.210 --> 00:01:50.860
how to insert data into
our very own database.

00:01:50.860 --> 00:01:54.400
Now, let's think about the MFA's
[? collections, ?] [? their ?]

00:01:54.400 --> 00:01:55.983
[? collection ?] of art and artifacts.

00:01:55.983 --> 00:01:58.275
And let me propose that they
have a database [? that ?]

00:01:58.275 --> 00:01:59.540
looks a bit like this.

00:01:59.540 --> 00:02:03.700
It's a single table, and it has
a title column and accession

00:02:03.700 --> 00:02:08.380
number, which is a fancy way of saying
a unique ID internal to the museum,

00:02:08.380 --> 00:02:10.960
and also, a date it was acquired.

00:02:10.960 --> 00:02:15.610
We have, of course, a primary
key on this table called ID.

00:02:15.610 --> 00:02:16.630
And let's think.

00:02:16.630 --> 00:02:19.660
Well, the museum might want
to acquire this piece here--

00:02:19.660 --> 00:02:22.390
this one called Profusion of Flowers.

00:02:22.390 --> 00:02:27.940
Well, how could they log that this
artwork is inside of their database?

00:02:27.940 --> 00:02:30.160
They could maybe just add a new row.

00:02:30.160 --> 00:02:32.290
They could say, let's
put Profusion of Flowers

00:02:32.290 --> 00:02:34.900
as the first item in
our collections here.

00:02:34.900 --> 00:02:37.430
We'll give it a title and
accession number-- which, again,

00:02:37.430 --> 00:02:40.100
is just a unique ID
internal to the museum--

00:02:40.100 --> 00:02:42.250
and the date it was acquired.

00:02:42.250 --> 00:02:45.340
And that row then has
its own primary key

00:02:45.340 --> 00:02:49.000
to identify this row
uniquely in our database.

00:02:49.000 --> 00:02:51.040
Now let's say they get
another piece of artwork.

00:02:51.040 --> 00:02:54.130
They get this one, called
Farmers Working at Dawn,

00:02:54.130 --> 00:02:56.453
and they want to add this
one to their table, too.

00:02:56.453 --> 00:02:58.120
Well, they could do the very same thing.

00:02:58.120 --> 00:02:59.380
They could just add a new row.

00:02:59.380 --> 00:03:01.930
They could say, let's make a
title, an accession number,

00:03:01.930 --> 00:03:04.240
and a date it was acquired,
and this brand-new row

00:03:04.240 --> 00:03:06.670
here for that piece of artwork.

00:03:06.670 --> 00:03:08.830
And maybe they get another one, too.

00:03:08.830 --> 00:03:09.610
Same thing.

00:03:09.610 --> 00:03:11.770
Maybe they'll get back
Spring Outing, and they

00:03:11.770 --> 00:03:13.720
want to add this to their collection.

00:03:13.720 --> 00:03:17.810
They could simply add
another row like this.

00:03:17.810 --> 00:03:22.060
Now, it turns out that the database
administrator behind the MFA

00:03:22.060 --> 00:03:26.620
might be running a SQL statement
that looks a bit like this--

00:03:26.620 --> 00:03:28.870
INSERT INTO.

00:03:28.870 --> 00:03:34.180
We can use INSERT INTO to add
a new row to any given table.

00:03:34.180 --> 00:03:38.900
And notice how insert into needs to
know a few pieces of information.

00:03:38.900 --> 00:03:42.550
The first is what table to insert into.

00:03:42.550 --> 00:03:45.010
What is the name of that table?

00:03:45.010 --> 00:03:47.710
Second, it needs to
know what columns are

00:03:47.710 --> 00:03:51.130
we adding data to inside of this table?

00:03:51.130 --> 00:03:53.380
We give it a list of those columns here.

00:03:53.380 --> 00:03:57.130
Then, of course, it needs to
know what new values should

00:03:57.130 --> 00:04:00.790
go into this new row for
each of these columns.

00:04:00.790 --> 00:04:03.820
Like we saw before, is
it Profusion of Flowers?

00:04:03.820 --> 00:04:05.200
Is it Spring Outing?

00:04:05.200 --> 00:04:06.310
Et cetera.

00:04:06.310 --> 00:04:09.380
Here, we can see that we
have this list of values.

00:04:09.380 --> 00:04:12.610
And notice how value 0, the
first value in this list,

00:04:12.610 --> 00:04:17.560
corresponds to the new value that will
be inserted [? to ?] this first column.

00:04:17.560 --> 00:04:21.459
And we can keep having value 1,
or column 1, value 2 and column 2,

00:04:21.459 --> 00:04:25.910
each one aligning with that
particular column there.

00:04:25.910 --> 00:04:29.300
So let's see an example
of this actually in code,

00:04:29.300 --> 00:04:31.540
understand it a bit more concretely.

00:04:31.540 --> 00:04:33.730
I'll go back to my computer here.

00:04:33.730 --> 00:04:36.070
Let's actually create
our very own database

00:04:36.070 --> 00:04:40.990
that involves this schema of having our
very own table which can keep artists

00:04:40.990 --> 00:04:44.630
and artwork and artifacts as well.

00:04:44.630 --> 00:04:45.400
So I'll type

00:04:45.400 --> 00:04:49.510
1: [? TRANSCRIPT BREAK ?]
So I'll type sqlite3 mfa.db

00:04:49.510 --> 00:04:55.450
to create a database for the Museum
of Fine Arts, abbreviated as MFA.

00:04:55.450 --> 00:04:56.690
I'll hit Enter here.

00:04:56.690 --> 00:05:03.310
And notice how I can now have access
to my very own mfa.db inside my SQLite

00:05:03.310 --> 00:05:04.390
environment.

00:05:04.390 --> 00:05:09.510
Well, now, I can type .schema to
see the schema of this database.

00:05:09.510 --> 00:05:12.483
So if I hit Enter here,
well, nothing's there.

00:05:12.483 --> 00:05:13.900
Because I just made this database.

00:05:13.900 --> 00:05:16.000
There's nothing in it yet.

00:05:16.000 --> 00:05:19.840
Well, it turns out I do actually
have a schema file prepared for me

00:05:19.840 --> 00:05:22.330
already in schema.sql.

00:05:22.330 --> 00:05:26.530
Here, I propose we make a table called
"collections," and like our table

00:05:26.530 --> 00:05:29.320
visually, it has let's say four columns.

00:05:29.320 --> 00:05:35.515
One as an "id," the primary key
of this table, one for a "title,"

00:05:35.515 --> 00:05:39.610
one for the "accession_number," the
unique ID internal to the museum,

00:05:39.610 --> 00:05:42.670
and finally, the date it was acquired.

00:05:42.670 --> 00:05:47.720
And notice here, that "title" has to
be actually added to our database.

00:05:47.720 --> 00:05:50.500
It can't be [? NULL. ?] It
can't have an empty value.

00:05:50.500 --> 00:05:52.690
Same thing for "accession_number."

00:05:52.690 --> 00:05:56.135
And further, "accession_number"
has to be unique.

00:05:56.135 --> 00:05:58.510
I can't have two items [? with ?]
[? a ?] [? different ?]

00:05:58.510 --> 00:05:59.590
"accession_number."

00:05:59.590 --> 00:06:02.380
Otherwise, I might get them
confused in my database

00:06:02.380 --> 00:06:05.330
or inside my museum archives as well.

00:06:05.330 --> 00:06:09.550
So let me then add this
schema to my database.

00:06:09.550 --> 00:06:12.280
And I can do so with
this command right here.

00:06:12.280 --> 00:06:16.780
This is a SQLite command called .read,
that we saw a little bit last week

00:06:16.780 --> 00:06:17.690
as well.

00:06:17.690 --> 00:06:21.440
I could type .read and then the
name of this file I want to read.

00:06:21.440 --> 00:06:25.620
So I'll say .read schema.sql.

00:06:25.620 --> 00:06:28.020
Hit Enter.

00:06:28.020 --> 00:06:33.450
And now if I type, let's say .schema,
I can see that same [? sql ?]

00:06:33.450 --> 00:06:39.280
[? schema, ?] same database
schema. now inside of my terminal.

00:06:39.280 --> 00:06:44.420
OK, so now as promised, let's try
adding some rows to this table.

00:06:44.420 --> 00:06:48.040
Because right now, if I do SELECT
[? star ?] from [? collections ?]

00:06:48.040 --> 00:06:52.930
[? semicolon, ?] I don't see anything
because nothing is inside yet.

00:06:52.930 --> 00:06:56.000
But I could add something
using INSERT INTO.

00:06:56.000 --> 00:06:57.390
So let's try that.

00:06:57.390 --> 00:07:01.630
I'll say INSERT INTO
the "collections table."

00:07:01.630 --> 00:07:05.380
And now, I have to ask, what
columns inside of "collections"

00:07:05.380 --> 00:07:08.080
[? do I ?] [? want ?] to add data to.

00:07:08.080 --> 00:07:11.620
Well, I probably want to add
to the first the "id" column,

00:07:11.620 --> 00:07:13.480
that primary key column.

00:07:13.480 --> 00:07:17.470
Then maybe, I want to add a
"title" to this row, and also

00:07:17.470 --> 00:07:21.310
an "accession_number," and also,
of course, the date this piece

00:07:21.310 --> 00:07:22.670
was "acquired."

00:07:22.670 --> 00:07:25.750
So now, I have the table
I'm inserting into along

00:07:25.750 --> 00:07:28.240
with the columns I'm adding values for.

00:07:28.240 --> 00:07:31.060
So for style's sake,
I'll hit Enter here.

00:07:31.060 --> 00:07:35.710
And now, I could type a list of
values to insert into this new row.

00:07:35.710 --> 00:07:40.060
I could say VALUES, and then
inside parentheses, the values

00:07:40.060 --> 00:07:41.650
I want to insert.

00:07:41.650 --> 00:07:45.010
Maybe the first primary key
that I give to this item

00:07:45.010 --> 00:07:49.420
is going to be just [? 1, ?] start
at 1, and add up as we add new items.

00:07:49.420 --> 00:07:53.530
For the title, I'll say let's call
this one "Profusion of flowers."

00:07:53.530 --> 00:07:56.830
This is the piece we recently
acquired into our collection.

00:07:56.830 --> 00:08:00.400
The accession number that
we gave it was 56.257.

00:08:00.400 --> 00:08:06.120
And the date it was acquired
was back in 1956-04-12.

00:08:06.120 --> 00:08:07.680
Now, I have all these values here.

00:08:07.680 --> 00:08:12.030
I'll type [? semicolon ?] hit
Enter, and nothing seems to happen.

00:08:12.030 --> 00:08:17.370
But if I type, let's say, select
[? star ?] from collections

00:08:17.370 --> 00:08:22.830
[? semicolon, ?] what do I
see, but this new row inside.

00:08:22.830 --> 00:08:24.540
Let's do it again to get a hang of this.

00:08:24.540 --> 00:08:27.600
I want to now add "Farmers
working at dawn" to my collection.

00:08:27.600 --> 00:08:31.080
So I'll do the same thing,
I'll say let's INSERT INTO

00:08:31.080 --> 00:08:32.669
the "collections" table.

00:08:32.669 --> 00:08:36.929
And let's add values for
the "id" column, the "title"

00:08:36.929 --> 00:08:42.659
column, the "accession_number," and
also the "acquired" column like this.

00:08:42.659 --> 00:08:44.640
Again for style's sake, I'll hit Enter.

00:08:44.640 --> 00:08:48.780
Now, I give some list of values
to add into this new row.

00:08:48.780 --> 00:08:53.280
I'll say VALUES here, then this list
of values to add for each column.

00:08:53.280 --> 00:08:57.360
Well, the first column, I have
to give a value for is "id."

00:08:57.360 --> 00:09:02.460
If you remember, our last id was
1, so what should this id be?

00:09:02.460 --> 00:09:03.540
Maybe 2.

00:09:03.540 --> 00:09:08.190
So I'll type 2 here as the next
increment of my primary key,

00:09:08.190 --> 00:09:09.330
then I'll give it a title.

00:09:09.330 --> 00:09:13.470
And this title was
"Farmers working at dawn."

00:09:13.470 --> 00:09:15.810
We gave it an accession
number to keep track of it

00:09:15.810 --> 00:09:18.910
in our own internal museum
records, our own archives here.

00:09:18.910 --> 00:09:25.110
So I'll say 11.6152 is the
[? accession ?] [? number ?] for this

00:09:25.110 --> 00:09:32.740
particular item, then I'll say we
acquired it back in 1911-08-03.

00:09:32.740 --> 00:09:36.010
Now, I'll hit a semicolon
here, hit Enter.

00:09:36.010 --> 00:09:39.520
And now, I should be able to see if
I type [? SELECT ?] [? star ?] FROM

00:09:39.520 --> 00:09:42.520
"collections," SELECT
[? star ?] FROM "collections"

00:09:42.520 --> 00:09:48.740
[? semicolon, ?] I now have these
two items inside of my collection.

00:09:48.740 --> 00:09:51.440
Now, let's do one more here.

00:09:51.440 --> 00:09:55.640
But let's focus in particular
on this primary key.

00:09:55.640 --> 00:09:59.570
Notice here, how we've been actually
inserting our very own primary key

00:09:59.570 --> 00:10:01.400
1 and then 2.

00:10:01.400 --> 00:10:05.223
But maybe that's not the best design.

00:10:05.223 --> 00:10:06.140
Let me try this again.

00:10:06.140 --> 00:10:12.080
I'll say INSERT INTO, let's go for,
"collections" [? is ?] our table name.

00:10:12.080 --> 00:10:16.790
And maybe I'll try again to add
to the "id" column, the "title"

00:10:16.790 --> 00:10:23.930
column, the "accession_number,"
and the "acquired" column here.

00:10:23.930 --> 00:10:29.960
What could go wrong do you think if I
try to specify the primary key myself?

00:10:29.960 --> 00:10:31.880
Let me ask this as an
audience question here.

00:10:31.880 --> 00:10:33.140
Feel free raise your hand.

00:10:33.140 --> 00:10:37.700
What might go wrong if I try
to make the primary key myself?

00:10:37.700 --> 00:10:39.185
Insert a value myself?

00:10:43.010 --> 00:10:44.900
Let's go to [INAUDIBLE].

00:10:44.900 --> 00:10:49.940
SPEAKER 2: Yeah, what we could do when
we have to [INAUDIBLE] a lot of values,

00:10:49.940 --> 00:10:56.810
like maybe like big data or something,
is there a way to add it to a CSV file

00:10:56.810 --> 00:10:59.725
or something rather than typing
[? insert ?] [? into ?] tables?

00:10:59.725 --> 00:11:00.850
SPEAKER 1: A good question.

00:11:00.850 --> 00:11:02.558
I think you're onto
something here, which

00:11:02.558 --> 00:11:05.600
is we're inserting one row at a time,
which could get easily repetitive.

00:11:05.600 --> 00:11:07.897
So let me hold that thought
for just a minute here

00:11:07.897 --> 00:11:10.730
and focus on the primary keys before
we see some more efficient ways

00:11:10.730 --> 00:11:12.960
to actually add data to our table.

00:11:12.960 --> 00:11:15.800
So I'm with you on this
idea of maybe we don't want

00:11:15.800 --> 00:11:17.697
to duplicate this primary key, right?

00:11:17.697 --> 00:11:19.530
Maybe we could do a
little better than that.

00:11:19.530 --> 00:11:21.780
So I'll come back to my computer here.

00:11:21.780 --> 00:11:25.670
And if I specify the primary
key, I might actually

00:11:25.670 --> 00:11:27.840
add a value that's already in there.

00:11:27.840 --> 00:11:32.180
So I could, thankfully, leave it
up to SQLite to actually increment

00:11:32.180 --> 00:11:33.620
this value for me.

00:11:33.620 --> 00:11:36.440
Let me try leaving
off "id" in this case.

00:11:36.440 --> 00:11:39.350
I'll omit the "id"
altogether, which seems

00:11:39.350 --> 00:11:42.170
at first like a bit of a bad
thought, like shouldn't every row

00:11:42.170 --> 00:11:43.280
have its own id?

00:11:43.280 --> 00:11:45.830
Well, let's just try and
figure out what happened.

00:11:45.830 --> 00:11:47.060
I'll hit Enter here.

00:11:47.060 --> 00:11:49.560
And now I'll say, let
me give the VALUES.

00:11:49.560 --> 00:11:52.770
Well, now the first column
is the title column.

00:11:52.770 --> 00:11:55.970
So I'll go ahead and say that this
one will be called "Spring outing."

00:11:55.970 --> 00:11:58.520
We're going to add this
one to our collection here.

00:11:58.520 --> 00:12:01.610
The accession number,
in this case, is 14.76.

00:12:01.610 --> 00:12:08.540
And we acquired this one all the way
back in 1914-01-08 [? semicolon. ?]

00:12:08.540 --> 00:12:10.170
So I'll hit Enter here.

00:12:10.170 --> 00:12:13.340
And notice how there's no
[? id ?] that I have specified.

00:12:13.340 --> 00:12:16.850
But now if I hit Enter, seems to work.

00:12:16.850 --> 00:12:20.120
I'll type SELECT [? star ?]
[? FROM ?] "collections,"

00:12:20.120 --> 00:12:24.710
and what do I see but
the new primary key of 3.

00:12:24.710 --> 00:12:30.290
So it seems like SQLite actually
increments the primary key for me.

00:12:30.290 --> 00:12:35.180
If I add some new row, it looks, what
is the current highest primary key

00:12:35.180 --> 00:12:38.120
and adds 1 to that automatically for me.

00:12:38.120 --> 00:12:41.390
And I've gotten that by
specifying in my schema

00:12:41.390 --> 00:12:45.050
this primary key
constraint down on line 6.

00:12:45.050 --> 00:12:47.840
So pretty handy for me here.

00:12:47.840 --> 00:12:50.030
I see a few other questions too.

00:12:50.030 --> 00:12:54.570
So let's take those
before we move on as well.

00:12:54.570 --> 00:12:55.760
Let's go to [? Andre. ?]

00:12:55.760 --> 00:12:59.030
SPEAKER 3: I just want to ask
if I delete the, let's say,

00:12:59.030 --> 00:13:06.410
the [? first ?] record and type again,
will it number it [? on ?] the fourth

00:13:06.410 --> 00:13:07.503
number or the first one.

00:13:07.503 --> 00:13:08.670
SPEAKER 1: A great question.

00:13:08.670 --> 00:13:10.378
We'll see this a little
bit later on too.

00:13:10.378 --> 00:13:14.990
But if I delete some row, let's say, I
deleted the row with primary key of 1,

00:13:14.990 --> 00:13:17.990
well, what SQLite will
do, at least by default,

00:13:17.990 --> 00:13:20.060
is actually take the highest value.

00:13:20.060 --> 00:13:23.340
Let's say the highest value
is still three in my database.

00:13:23.340 --> 00:13:26.510
It will add 1 to that, and I
want to insert that new row it'll

00:13:26.510 --> 00:13:29.673
have the [? id ?] of 4 in this case.

00:13:29.673 --> 00:13:34.430
OK, so let's explore a bit more
some of these constraints on tables.

00:13:34.430 --> 00:13:37.850
Here, we're talking about the primary
key constraint, but we also have,

00:13:37.850 --> 00:13:40.310
as we saw before, the
[? NOT ?] [? NULL ?] constraint

00:13:40.310 --> 00:13:42.000
and the [? UNIQUE ?] constraint.

00:13:42.000 --> 00:13:45.230
So let's try inserting given
those constraints here.

00:13:45.230 --> 00:13:47.280
[? I'll go ?] back to my computer.

00:13:47.280 --> 00:13:52.340
And again, notice in my schema,
I specified that this "title"

00:13:52.340 --> 00:13:55.512
column should always be NOT NULL.

00:13:55.512 --> 00:13:57.720
It should never have a
[? null ?] value inside of it.

00:13:57.720 --> 00:14:01.280
And similarly, "accession_number"
should also be NOT NULL.

00:14:01.280 --> 00:14:05.270
It should also be [? unique. ?]
I should have no two rows

00:14:05.270 --> 00:14:07.550
that have the same accession number.

00:14:07.550 --> 00:14:11.300
And maybe kind of a
playfully, subversively here,

00:14:11.300 --> 00:14:13.458
let's try to run
against this constraint.

00:14:13.458 --> 00:14:16.500
Let's actually try to add the same
accession number and see what happens.

00:14:16.500 --> 00:14:20.900
So I'll come back to my
[? TERMINAL. ?] Let me just reinsert,

00:14:20.900 --> 00:14:22.490
let's say "Spring outing."

00:14:22.490 --> 00:14:27.300
Notice how if I SELECT [? star ?] FROM
"collections," it's already in here.

00:14:27.300 --> 00:14:30.750
But I'll try to add it yet again with
the same [? accession ?] [? number. ?]

00:14:30.750 --> 00:14:32.408
So I'll say INSERT INTO "collections."

00:14:32.408 --> 00:14:35.450
And by now this is [? hopefully ?]
[? becoming ?] a little more familiar.

00:14:35.450 --> 00:14:38.870
I now want to say the columns
I want to add data to.

00:14:38.870 --> 00:14:46.520
So I'll say the "title" column,
the "accession_number" column,

00:14:46.520 --> 00:14:49.550
and also the "acquired"
date I'll hit Enter.

00:14:49.550 --> 00:14:51.620
And now I'll say the VALUES again.

00:14:51.620 --> 00:14:54.650
Let's reinsert [? "Spring ?]
[? outing." ?] I'll say "Spring outing"

00:14:54.650 --> 00:14:58.640
as a title, the accession
number is 14.76,

00:14:58.640 --> 00:15:04.670
and the date it was acquired
was again 1914-01-08 semicolon.

00:15:04.670 --> 00:15:09.100
Now if I hit Enter, what do we see?

00:15:09.100 --> 00:15:12.490
Runtime error-- UNIQUE
constraint failed.

00:15:12.490 --> 00:15:14.890
So it seems we ran
against this constraint

00:15:14.890 --> 00:15:18.460
here that we specified on line 4 here.

00:15:18.460 --> 00:15:20.830
[? "Accession number" ?]
should always be unique.

00:15:20.830 --> 00:15:24.820
But by trying to add a new row that
had that same accession number,

00:15:24.820 --> 00:15:29.780
we ran into this runtime error and
our operation was not completed.

00:15:29.780 --> 00:15:33.190
If I say in this case, SELECT
[? star ?] FROM "collections,"

00:15:33.190 --> 00:15:38.050
notice how I guarded myself from
adding Spring outing more than once

00:15:38.050 --> 00:15:40.070
with the same accession number.

00:15:40.070 --> 00:15:43.630
So some usefulness here to
these kinds of constraints.

00:15:43.630 --> 00:15:46.180
Let's try violating NOT NULL too.

00:15:46.180 --> 00:15:49.252
I'll try adding a title that is
actually [? null, ?] non-existent.

00:15:49.252 --> 00:15:51.460
It's trying to add a painting
without a title itself.

00:15:51.460 --> 00:15:52.420
Let's try this.

00:15:52.420 --> 00:15:56.020
I might try INSERT INTO
the "collections" table,

00:15:56.020 --> 00:16:01.660
and I'll add to the "title" columns,
the "accession_number," and also

00:16:01.660 --> 00:16:03.940
the "acquired" column.

00:16:03.940 --> 00:16:06.020
Now, I'll say the VALUES here.

00:16:06.020 --> 00:16:07.870
But remember, this [? null ?] value.

00:16:07.870 --> 00:16:09.280
Null meaning nothing.

00:16:09.280 --> 00:16:10.630
This value doesn't exist.

00:16:10.630 --> 00:16:14.260
I could insert that into my
table, or at least try to.

00:16:14.260 --> 00:16:16.690
I'll say NULL is a title.

00:16:16.690 --> 00:16:19.610
And similarly, NULL is
the accession number.

00:16:19.610 --> 00:16:25.450
Now let's say, just playfully, we
got this painting back in 1900-01-10

00:16:25.450 --> 00:16:26.770
like this.

00:16:26.770 --> 00:16:28.960
OK now, semicolon.

00:16:28.960 --> 00:16:33.970
I'll hit Enter, and we see the same
runtime error-- constraint failed,

00:16:33.970 --> 00:16:36.200
in this case, the NOT NULL constraint.

00:16:36.200 --> 00:16:41.950
So notice again how in my schema, I
specified "title" should be NOT NULL.

00:16:41.950 --> 00:16:46.090
But here when I try to insert a
[? null ?] [? value, ?] I run into that

00:16:46.090 --> 00:16:48.670
constraint and I can't
insert that value.

00:16:48.670 --> 00:16:51.400
If I try to SELECT [? star ?]
FROM "collections,"

00:16:51.400 --> 00:16:57.410
I should see in this case
that insert didn't work.

00:16:57.410 --> 00:17:00.520
So these constraints are
really guardrails against me,

00:17:00.520 --> 00:17:02.980
against other users to
adding values that we really

00:17:02.980 --> 00:17:06.480
shouldn't add to our table.

00:17:06.480 --> 00:17:12.400
OK, so presumably here, we're onto
something by adding some rows.

00:17:12.400 --> 00:17:17.730
But if the museum acquires more
than one item, maybe 100 at a time,

00:17:17.730 --> 00:17:21.119
I don't want to be the programmer who's
sitting there writing 100 INSERT INTO.

00:17:21.119 --> 00:17:22.740
That's like not what I want to do.

00:17:22.740 --> 00:17:24.630
There's probably a
better way to do this.

00:17:24.630 --> 00:17:27.040
[? And ?] [? we ?] show
you one way to do this.

00:17:27.040 --> 00:17:32.190
One way is to instead of inserting
one row, like this, to instead insert

00:17:32.190 --> 00:17:34.900
multiple separated by commas.

00:17:34.900 --> 00:17:39.870
So here, I might say this is my
first row for each of these columns.

00:17:39.870 --> 00:17:43.080
This is my second row, again,
for each of these columns,

00:17:43.080 --> 00:17:45.370
and so on and so forth.

00:17:45.370 --> 00:17:49.530
And it's worth noting, this is
not just a manner of convenience.

00:17:49.530 --> 00:17:54.130
Like if I tried to insert 100 rows,
this is certainly convenient for me.

00:17:54.130 --> 00:17:57.210
But it's also most often
in most cases going

00:17:57.210 --> 00:18:01.200
to be faster for me to
insert 100 rows in one go,

00:18:01.200 --> 00:18:04.390
than to insert one row 100 times.

00:18:04.390 --> 00:18:06.930
So this is both a convenience
thing for programmers

00:18:06.930 --> 00:18:11.740
and also an efficiency thing to
actually optimize your database as well.

00:18:11.740 --> 00:18:16.350
So let's try this syntax now to avoid
me sitting there for hours and hours

00:18:16.350 --> 00:18:18.240
writing many insert interviews.

00:18:18.240 --> 00:18:20.460
I'll go back to my computer.

00:18:20.460 --> 00:18:23.940
And this time, let's say we
got two paintings at once.

00:18:23.940 --> 00:18:24.750
We got two.

00:18:24.750 --> 00:18:26.670
So I'll try to add both of them.

00:18:26.670 --> 00:18:30.510
I'll say INSERT INTO
the "collections" table.

00:18:30.510 --> 00:18:35.730
And again, I want to keep adding to my
"title" column, my "accession_number"

00:18:35.730 --> 00:18:40.200
column, and even my
"acquired" column, like this.

00:18:40.200 --> 00:18:42.640
Now, I want to add in some values.

00:18:42.640 --> 00:18:44.370
So I'll type VALUES here.

00:18:44.370 --> 00:18:48.390
And as a matter of style, let me
just say I'll make a new line.

00:18:48.390 --> 00:18:49.650
I'll Enter here.

00:18:49.650 --> 00:18:54.360
And now I can type each of my
new rows on their very own line.

00:18:54.360 --> 00:18:59.220
So again, I'll type all of the
values that I want in my first row.

00:18:59.220 --> 00:19:02.220
I might call this one
"Imaginative landscape."

00:19:02.220 --> 00:19:03.785
We got this one back in--

00:19:03.785 --> 00:19:05.910
actually, we don't quite
know when we got this one.

00:19:05.910 --> 00:19:12.210
If I type 56.496 is the accession
number, like, yeah, 56.496.

00:19:12.210 --> 00:19:15.430
The MFA actually doesn't know
when they got this painting.

00:19:15.430 --> 00:19:18.810
So if they don't know,
let's just insert NULL.

00:19:18.810 --> 00:19:22.800
This is intentionally left blank.

00:19:22.800 --> 00:19:24.150
This is my first row.

00:19:24.150 --> 00:19:26.970
Now, I want to add a
second one in one go.

00:19:26.970 --> 00:19:28.690
I'll follow this up with a comma.

00:19:28.690 --> 00:19:32.250
And now, I'll type out
my next set of values.

00:19:32.250 --> 00:19:36.000
This next one we acquired is
called "Peonies and butterfly."

00:19:36.000 --> 00:19:40.710
Now, I'll say the [? accession ?]
[? number ?] is 06.1899.

00:19:40.710 --> 00:19:48.570
And we got this one-- it was back in
1906-01-01 now I'll hit semicolon here.

00:19:48.570 --> 00:19:52.300
And now, if I hit
Enter, nothing happens.

00:19:52.300 --> 00:19:57.330
But if I type Select [? star ?]
FROM "collections" semicolon,

00:19:57.330 --> 00:20:01.960
what do we see, but now two
new rows being inserted.

00:20:01.960 --> 00:20:05.500
So a handy way to insert
more than one value.

00:20:05.500 --> 00:20:10.510
And also, if you have a lot of
values, a more efficient way as well.

00:20:10.510 --> 00:20:12.030
So we pause here.

00:20:12.030 --> 00:20:17.930
And ask what questions we have on
INSERT INTO, Whether adding one row

00:20:17.930 --> 00:20:19.550
or adding multiple.

00:20:19.550 --> 00:20:21.320
Let's go to [INAUDIBLE].

00:20:21.320 --> 00:20:24.070
SPEAKER 4: [? So ?] [? imagine ?]
you are [? writing ?] [? code ?]

00:20:24.070 --> 00:20:28.700
or inserting a row, so by mistake
you have entered the wrong spelling

00:20:28.700 --> 00:20:31.717
of the title, so how we will rename it?

00:20:31.717 --> 00:20:33.050
SPEAKER 1: A great question too.

00:20:33.050 --> 00:20:35.900
And often as is the case with
me, I make typos all the time.

00:20:35.900 --> 00:20:40.040
I might add the artwork title
and misspell it for instance.

00:20:40.040 --> 00:20:44.600
Well, in that case, I can't
use INSERT INTO to correct it,

00:20:44.600 --> 00:20:48.410
but I can use a new statement we'll
see a little later called [? update. ?]

00:20:48.410 --> 00:20:50.210
And so with [? update, ?]
you can actually

00:20:50.210 --> 00:20:51.830
change the spellings of things.

00:20:51.830 --> 00:20:53.930
And we'll see that a little later today.

00:20:53.930 --> 00:20:57.770
But great question to
peek ahead as well.

00:20:57.770 --> 00:21:00.270
OK, so let's keep going here.

00:21:00.270 --> 00:21:04.500
Let's think about how we can keep
adding values to our data set.

00:21:04.500 --> 00:21:09.620
And so far we've seen INSERT INTO
with one row and with multiple.

00:21:09.620 --> 00:21:13.640
But one more way to keep in
mind is you might already

00:21:13.640 --> 00:21:17.300
have your data, perhaps
in some other format.

00:21:17.300 --> 00:21:22.940
And one common format looks a bit
like this on our screen over here.

00:21:22.940 --> 00:21:29.750
This file is called CSV
for Comma Separated Values.

00:21:29.750 --> 00:21:31.760
Now, why Comma Separated Values?

00:21:31.760 --> 00:21:33.350
Well, let's just look at it.

00:21:33.350 --> 00:21:36.710
Here we see some presumably
column or column names,

00:21:36.710 --> 00:21:39.530
like [? id, ?] [? title, ?]
[? accession number, ?]

00:21:39.530 --> 00:21:42.980
and [? acquired, ?] but what
separates these column names?

00:21:42.980 --> 00:21:44.510
Well, it looks like commas--

00:21:44.510 --> 00:21:48.470
"id" comma "title," comma
"accession_number," and so on.

00:21:48.470 --> 00:21:51.350
Every row is still its own row.

00:21:51.350 --> 00:21:52.670
That makes sense here.

00:21:52.670 --> 00:21:57.410
But now, those row values
are also separated by commas.

00:21:57.410 --> 00:22:01.100
Notice here, the first
value before the first comma

00:22:01.100 --> 00:22:05.210
corresponds to the first
column, this id is 1.

00:22:05.210 --> 00:22:09.050
Similarly, the next Comma Separated
Value Profusion of flowers

00:22:09.050 --> 00:22:12.450
belongs to that next column
here-- "title" as well,

00:22:12.450 --> 00:22:14.070
and so on and so forth.

00:22:14.070 --> 00:22:16.820
You could if you wanted
to draw a snaky line

00:22:16.820 --> 00:22:21.780
to see how these columns
correspond down our file here.

00:22:21.780 --> 00:22:25.720
Now, you might often have
data already in this format.

00:22:25.720 --> 00:22:27.470
And it's actually
pretty convenient to try

00:22:27.470 --> 00:22:31.760
to import this data into
SQLite into your very own table

00:22:31.760 --> 00:22:34.170
so you can use [? SQL ?] on it.

00:22:34.170 --> 00:22:37.730
So let's try doing that
here, taking a CSV like this

00:22:37.730 --> 00:22:40.610
and adding it to our database.

00:22:40.610 --> 00:22:42.600
I'll come back to my computer.

00:22:42.600 --> 00:22:45.870
And at this point, I want to start over.

00:22:45.870 --> 00:22:48.590
Let's say, I didn't use
[? insert ?] at all.

00:22:48.590 --> 00:22:52.560
I actually got a CSV from the MFA
of all the items in our collection.

00:22:52.560 --> 00:22:55.340
So I'll type .quit to
leave this database.

00:22:55.340 --> 00:23:01.040
And now, I'll type rm mfa.db to
remove this file all together.

00:23:01.040 --> 00:23:07.070
Now, let me show you mfa.csv, the
CSV file I have that looks exactly

00:23:07.070 --> 00:23:08.480
like what we just saw.

00:23:08.480 --> 00:23:11.300
I'll type code mfa.csv.

00:23:11.300 --> 00:23:15.440
And now I can indeed see, I
have an iD column, a title

00:23:15.440 --> 00:23:18.830
column, and accession_number,
and the date these pieces were

00:23:18.830 --> 00:23:22.590
acquired all separated by commas.

00:23:22.590 --> 00:23:28.280
So let's say I want to quickly import
this data into its very own database

00:23:28.280 --> 00:23:30.050
to run [? SQL ?] [? queries ?] on it.

00:23:30.050 --> 00:23:33.200
As it stands, I can't use [? SQL ?]
[? queries ?] on this table

00:23:33.200 --> 00:23:36.110
because it's not inside
of a SQLite database.

00:23:36.110 --> 00:23:37.370
But let me add one.

00:23:37.370 --> 00:23:43.470
I'll type SQLite3 mfa.db to
remake [? MFA ?] down below.

00:23:43.470 --> 00:23:48.020
Let me now introduce this new
command that is not a SQL keyword,

00:23:48.020 --> 00:23:50.400
but is actually a SQLite command.

00:23:50.400 --> 00:23:54.170
And this command is called .import.

00:23:54.170 --> 00:23:57.560
.import lets you take
a file, like a CSV,

00:23:57.560 --> 00:24:04.010
and automatically insert it row by row
into a table of your own making or one

00:24:04.010 --> 00:24:06.590
you can let SQLite create for you.

00:24:06.590 --> 00:24:08.660
Let's actually create
the table ourselves,

00:24:08.660 --> 00:24:12.080
and then insert this CSV using .import.

00:24:12.080 --> 00:24:13.990
So I'll go back to SQLite.

00:24:13.990 --> 00:24:15.830
Let me recreate the schema.

00:24:15.830 --> 00:24:20.490
Right now, there's no tables inside
this database, but let me create one.

00:24:20.490 --> 00:24:26.240
I'll instead type .read schema.sql
to read in that old schema.sql file,

00:24:26.240 --> 00:24:28.460
so I can have my own table here.

00:24:28.460 --> 00:24:30.000
I'll type that schema again.

00:24:30.000 --> 00:24:33.150
And now I see, I have that table back.

00:24:33.150 --> 00:24:38.090
But if I type SELECT [? star ?]
FROM "collections" semicolon,

00:24:38.090 --> 00:24:39.980
nothing is inside.

00:24:39.980 --> 00:24:41.070
Well, I could fix this.

00:24:41.070 --> 00:24:46.220
I could to import this CSV into
collections to insert all at once from

00:24:46.220 --> 00:24:49.350
[? its ?] [? own, ?] from the
CSV file we have right here.

00:24:49.350 --> 00:24:52.910
So to do this, let me type .import.

00:24:52.910 --> 00:24:59.270
Now before I finish this off, there are
a few arguments or options I can give

00:24:59.270 --> 00:25:02.660
.import to make sure it works properly.

00:25:02.660 --> 00:25:05.430
The ones for this look as follows.

00:25:05.430 --> 00:25:10.730
I should type .import, and then
[? dash ?] [? dash ?] CSV to say

00:25:10.730 --> 00:25:12.560
I'm importing a CSV.

00:25:12.560 --> 00:25:15.620
If I don't type
[? dash ?] [? dash ?] CSV,

00:25:15.620 --> 00:25:19.040
SQLite I might assume something
about this file that just isn't true.

00:25:19.040 --> 00:25:22.490
It should know that numbers
are here separated by commas,

00:25:22.490 --> 00:25:25.130
[? numbers, ?] or [? values, ?]
whatever's inside my table are

00:25:25.130 --> 00:25:27.300
separated by commas.

00:25:27.300 --> 00:25:29.840
And then [? dash ?] [? dash ?] skip 1--

00:25:29.840 --> 00:25:31.070
well, let's take a look.

00:25:31.070 --> 00:25:34.280
If I go back to this
file here, let's see.

00:25:34.280 --> 00:25:38.900
I have the first row, the second
row, the third row, the fourth row.

00:25:38.900 --> 00:25:43.915
Are there any rows I probably
shouldn't insert into my table here?

00:25:43.915 --> 00:25:44.915
Let me ask the audience.

00:25:44.915 --> 00:25:46.460
You're free to raise your hand.

00:25:46.460 --> 00:25:49.775
Are there any rows I shouldn't
insert into this table?

00:25:52.970 --> 00:25:54.320
Let's go to [? Sukanya. ?]

00:25:54.320 --> 00:25:57.410
SPEAKER 5: Yes, I think they
shouldn't include the first one,

00:25:57.410 --> 00:25:59.910
because it doesn't give
us any value that we need.

00:25:59.910 --> 00:26:00.450
SPEAKER 1: Yeah, you're right.

00:26:00.450 --> 00:26:02.630
So if I look at this
header row, as we call it,

00:26:02.630 --> 00:26:08.300
I see id, title, accession_number, and
acquired-- the names for my columns.

00:26:08.300 --> 00:26:12.920
But notice in my schema, I already
have those column names existing.

00:26:12.920 --> 00:26:17.840
So I shouldn't insert the
value id into my "id" column.

00:26:17.840 --> 00:26:21.060
I want to just skip that row
and only do the next one.

00:26:21.060 --> 00:26:25.880
So here by typing [? dash ?] [? dash ?]
skip 1, I'll skip that header row.

00:26:25.880 --> 00:26:26.900
So let's try this now.

00:26:26.900 --> 00:26:29.420
I'll say .import
[? dash ?] [? dash ?] CSV.

00:26:29.420 --> 00:26:31.294
I'm going to import a CSV.

00:26:31.294 --> 00:26:34.640
Now, [? dash ?] [? dash ?] skip,
how many rows should I skip?

00:26:34.640 --> 00:26:37.040
Well, just one, the header row there.

00:26:37.040 --> 00:26:39.500
Now, I say the file to import.

00:26:39.500 --> 00:26:44.960
So I'll type mfa.csv, the CSV
file to import into my database.

00:26:44.960 --> 00:26:49.040
Now, I type the name of the
table I'm importing into,

00:26:49.040 --> 00:26:51.030
collections, in this case.

00:26:51.030 --> 00:26:56.060
And notice how I no longer need to
[? quote ?] collections or [? quote ?]

00:26:56.060 --> 00:27:01.508
CSV, because this is not a SQL
statement this is a SQLite statement,

00:27:01.508 --> 00:27:03.800
so I can get away with not
[? quoting ?] anything here.

00:27:03.800 --> 00:27:05.810
Now, I'll hit Enter.

00:27:05.810 --> 00:27:08.570
And again, nothing seems to happen,
but it's probably a good sign.

00:27:08.570 --> 00:27:14.240
Let me try selecting [? star ?]
[? from ?] "collections" semicolon,

00:27:14.240 --> 00:27:15.140
Enter.

00:27:15.140 --> 00:27:17.960
And now all my data is
just magically in there.

00:27:17.960 --> 00:27:23.903
It's went from my CSV
into my very own table.

00:27:23.903 --> 00:27:27.590
OK, so that's a pretty
nice step forward.

00:27:27.590 --> 00:27:31.100
No longer do we have to use
just single line inserts or even

00:27:31.100 --> 00:27:33.530
bulk inserts, those multi-line inserts.

00:27:33.530 --> 00:27:37.790
We can now just import an
entire table from a CSV.

00:27:37.790 --> 00:27:42.080
But I think I've been showing you a
bit of an inaccuracy here, at least

00:27:42.080 --> 00:27:44.480
that's not often going to happen to you.

00:27:44.480 --> 00:27:49.280
If I go back to this
file here, this CSV.

00:27:49.280 --> 00:27:53.480
What you might notice is that
I specified the primary key.

00:27:53.480 --> 00:27:57.470
I said [? id ?] of 1,
[? id ?] of 2, [? id ?] of 3.

00:27:57.470 --> 00:28:01.190
But if we'd heard before, this
might not be the best design.

00:28:01.190 --> 00:28:04.610
Because what if I import
this CSV, and there's already

00:28:04.610 --> 00:28:07.820
an item that has the id
of [? 1 ?] or the id of 2.

00:28:07.820 --> 00:28:12.020
Ideally, I could let SQLite
create its very own primary keys

00:28:12.020 --> 00:28:13.700
for each of these rows.

00:28:13.700 --> 00:28:15.950
And more often than
not, you'll likely have

00:28:15.950 --> 00:28:20.870
a database or CSV that looks a bit
more like this without the primary key.

00:28:20.870 --> 00:28:25.710
But you'll still want to have a primary
key when you import into your database.

00:28:25.710 --> 00:28:29.000
So let's try working
with this kind of CSV now

00:28:29.000 --> 00:28:32.870
and having SQLite generate some
of the primary keys for us.

00:28:32.870 --> 00:28:34.730
Come back to my computer.

00:28:34.730 --> 00:28:40.440
And let's update our mfa.csv just to
remove those primary key column here.

00:28:40.440 --> 00:28:44.900
So I will open up again mfa.csv.

00:28:44.900 --> 00:28:47.600
And let me try to just delete
this column all together.

00:28:47.600 --> 00:28:54.680
I'll move [? id, ?] remove 1,
and 2, and 3, and 4, and 5.

00:28:54.680 --> 00:28:59.250
Let me save it, and now
that id column is gone.

00:28:59.250 --> 00:29:01.020
But now there's a problem.

00:29:01.020 --> 00:29:05.550
Like if I go to my schema,
and I type .schema,

00:29:05.550 --> 00:29:09.300
notice how there are four
columns in this table--

00:29:09.300 --> 00:29:13.350
"id," "title," "accession_number,"
and "acquired."

00:29:13.350 --> 00:29:16.990
Well, now my CSV only has three columns.

00:29:16.990 --> 00:29:20.520
So if I try to import
this CSV into this table,

00:29:20.520 --> 00:29:23.730
I'm going to run into trouble, because
I have different numbers of columns

00:29:23.730 --> 00:29:26.755
[? between ?] my CSV and my table.

00:29:26.755 --> 00:29:29.130
Well, what can I do instead?

00:29:29.130 --> 00:29:32.910
It turns out that I could
actually import this data

00:29:32.910 --> 00:29:36.480
into a temporary table,
and then take that data

00:29:36.480 --> 00:29:40.050
and insert that data in that
temporary table into my real one

00:29:40.050 --> 00:29:41.670
called "collections" here.

00:29:41.670 --> 00:29:47.670
I can use both .import and INSERT
INTO to accomplish that task for me.

00:29:47.670 --> 00:29:48.810
So let's try this.

00:29:48.810 --> 00:29:50.760
I'll say .import.

00:29:50.760 --> 00:29:54.990
I want to import a CSV now-- [? dash ?]
[? dash ?] [? CSV. ?] Which one?

00:29:54.990 --> 00:29:58.080
Well, mfa.csv in this case.

00:29:58.080 --> 00:30:01.902
Now, though, I want to create a brand
new table that doesn't yet exist.

00:30:01.902 --> 00:30:03.360
So I'll type the name of the table.

00:30:03.360 --> 00:30:06.900
And I just call it temp for
now to temporarily import

00:30:06.900 --> 00:30:09.390
this data into a brand new table.

00:30:09.390 --> 00:30:12.260
And notice how here I'm not
using [? dash ?] [? dash ?]

00:30:12.260 --> 00:30:16.350
skip 1, because now I want to take
advantage of those header rows.

00:30:16.350 --> 00:30:22.810
SQLite, if I import into a new table,
will notice I have a header row

00:30:22.810 --> 00:30:27.600
and make my columns the very same
names that are in my header row--

00:30:27.600 --> 00:30:30.210
title, accession number, and acquired.

00:30:30.210 --> 00:30:31.380
I don't want to skip them.

00:30:31.380 --> 00:30:36.610
I want SQLite to see them and create
this table using those header rows.

00:30:36.610 --> 00:30:39.990
So now, I'll hit Enter, and
nothing seems to happen.

00:30:39.990 --> 00:30:45.000
But if I type .schema, what do I see,
but a brand new table called "temp"

00:30:45.000 --> 00:30:49.050
that SQLite has just made automatically
and notice how it used that header row.

00:30:49.050 --> 00:30:54.795
I have "title," "accession_number,"
and "acquired" as my column names.

00:30:54.795 --> 00:30:57.380
OK, let's now look
inside the temp table,

00:30:57.380 --> 00:31:02.520
I'll say SELECT [? star ?]
FROM "temp" semicolon.

00:31:02.520 --> 00:31:04.350
And now, I see all that data in there.

00:31:04.350 --> 00:31:06.200
But I don't have primary keys yet.

00:31:06.200 --> 00:31:10.070
So my goal [? is ?] [? really ?]
to take all this data and insert it

00:31:10.070 --> 00:31:14.630
into my collections table to give it
that primary key I've been wanting

00:31:14.630 --> 00:31:15.350
here.

00:31:15.350 --> 00:31:20.998
Again, if I type SELECT [? star ?]
FROM "collections," well, nothing-- oh,

00:31:20.998 --> 00:31:22.040
something is still there.

00:31:22.040 --> 00:31:24.217
Let me actually just
delete this [INAUDIBLE]

00:31:24.217 --> 00:31:27.050
I'll say DELETE FROM "collections"
to remove it all together-- we'll

00:31:27.050 --> 00:31:28.550
[? get ?] [INAUDIBLE] just a minute.

00:31:28.550 --> 00:31:33.020
And now let me try, SELECT
[? star ?] FROM "collections,"

00:31:33.020 --> 00:31:36.750
and I should see nothing
inside of collections.

00:31:36.750 --> 00:31:38.040
So what could I do?

00:31:38.040 --> 00:31:41.630
I could [? INSERT ?] [? INTO ?]
[? collections ?] using the data from

00:31:41.630 --> 00:31:43.130
my temp table.

00:31:43.130 --> 00:31:43.980
So I'll try that.

00:31:43.980 --> 00:31:49.460
I'll say INSERT INTO "collections"
and choose those columns yet again.

00:31:49.460 --> 00:31:54.450
I'll say the "title," and
the "accession_number,"

00:31:54.450 --> 00:31:58.900
and now let's go for the
"acquired" column as well.

00:31:58.900 --> 00:32:00.370
I'll hit Enter.

00:32:00.370 --> 00:32:04.330
But instead of typing
many new lines of values,

00:32:04.330 --> 00:32:10.240
I actually have a way to dynamically
select all the values I want to insert,

00:32:10.240 --> 00:32:12.170
and it looks a bit like this.

00:32:12.170 --> 00:32:18.890
I could say INSERT INTO [? some ?]
table and some columns of that table.

00:32:18.890 --> 00:32:23.330
But I want to insert the results
of this SELECT down below.

00:32:23.330 --> 00:32:26.720
So I'll [? SELECT ?] some columns
[? FROM ?] some separate table.

00:32:26.720 --> 00:32:31.850
And so long as these columns align, I'll
be able to actually take the results

00:32:31.850 --> 00:32:35.180
of that [? SELECT ?] and [? insert ?]
all of them [? into ?] this new table

00:32:35.180 --> 00:32:37.070
using INSERT INTO.

00:32:37.070 --> 00:32:39.920
So let me finish my statement
here, we can see the results.

00:32:39.920 --> 00:32:42.050
Come back to my computer.

00:32:42.050 --> 00:32:47.690
And here, I'll type a SELECT to get
back all the items from my temp table.

00:32:47.690 --> 00:32:52.070
I'll say SELECT-- in
this case, SELECT, let's

00:32:52.070 --> 00:32:56.450
go for the "title" column,
the "accession_number"

00:32:56.450 --> 00:33:01.100
column, and the "acquired"
column from my "temp" table.

00:33:01.100 --> 00:33:05.270
Now, if I hit semicolon, I should see--

00:33:05.270 --> 00:33:06.470
well, nothing at first.

00:33:06.470 --> 00:33:11.000
But if I type SELECT
[? star ?] FROM "collections,"

00:33:11.000 --> 00:33:16.610
I see all my data now in there
selected from my temporary table.

00:33:16.610 --> 00:33:19.623
And now if I type
.schema, I still see temp.

00:33:19.623 --> 00:33:20.540
But what can I do now?

00:33:20.540 --> 00:33:22.710
I could just delete
that table altogether.

00:33:22.710 --> 00:33:28.370
I could say DROP TABLE "temp," as we
saw last week, semicolon, hit Enter,

00:33:28.370 --> 00:33:30.090
.schema again.

00:33:30.090 --> 00:33:33.680
And now, we're back where we
want to be with a single table,

00:33:33.680 --> 00:33:39.030
and now we've import our data
from our CSV with primary keys.

00:33:39.030 --> 00:33:44.650
If I type "collections" here,
we see it all in this table.

00:33:44.650 --> 00:33:48.190
OK, so we've seen several
versions of import, one

00:33:48.190 --> 00:33:52.460
to import into an existing table and
one to import into a brand new table.

00:33:52.460 --> 00:33:58.540
Let me ask then what questions we have
on how to use .import or how to insert

00:33:58.540 --> 00:34:03.240
more data into our database.

00:34:03.240 --> 00:34:04.430
Let's go [? Louise. ?]

00:34:04.430 --> 00:34:08.330
SPEAKER 6: Can we ask the
INSERT INTO command to place

00:34:08.330 --> 00:34:10.387
a column in a specific position?

00:34:10.387 --> 00:34:11.429
SPEAKER 1: Good question.

00:34:11.429 --> 00:34:16.028
Can we ask INSERT INTO the place
a column in a certain position?

00:34:16.028 --> 00:34:19.070
There's a few ways of getting [? it-- ?]
what I think you're asking here,

00:34:19.070 --> 00:34:20.695
so let me show you a few of those here.

00:34:20.695 --> 00:34:26.389
I'll come back to my computer, and let
me pull up the syntax for INSERT INTO

00:34:26.389 --> 00:34:27.210
again.

00:34:27.210 --> 00:34:31.800
I will choose this, and I'll
show it on the screen over here.

00:34:31.800 --> 00:34:36.409
So I'll say INSERT INTO some
table given some columns.

00:34:36.409 --> 00:34:39.139
And notice how down below,
I actually have [? values ?]

00:34:39.139 --> 00:34:41.600
to insert into those columns.

00:34:41.600 --> 00:34:45.120
I could, to your question,
rearrange these values.

00:34:45.120 --> 00:34:49.670
So I might have this first value here is
some value that goes into this column.

00:34:49.670 --> 00:34:52.159
The second column here is
attached to that column.

00:34:52.159 --> 00:34:56.000
And that would rearrange the
values I insert into those columns.

00:34:56.000 --> 00:34:59.270
If you're asking, though, if
I can reorder the columns,

00:34:59.270 --> 00:35:02.733
that is up to [? CREATE ?] [? TABLE ?]
and [? CREATE ?] [? TABLE ?] only.

00:35:02.733 --> 00:35:05.900
And in general, I encourage you not so
much out of ordering of your columns,

00:35:05.900 --> 00:35:08.233
[? because ?] [? it ?] could
be in any order whatsoever.

00:35:08.233 --> 00:35:10.880
But you could just rearrange
your selection of columns

00:35:10.880 --> 00:35:14.780
here to insert the data
you want to insert.

00:35:14.780 --> 00:35:17.610
OK, let's take one more question here.

00:35:17.610 --> 00:35:20.390
SPEAKER 7: What happens if one of
the rows you're trying to insert

00:35:20.390 --> 00:35:22.340
violates a constraint on the table?

00:35:22.340 --> 00:35:25.140
SPEAKER 1: Yeah, so here
we're inserting multiple rows.

00:35:25.140 --> 00:35:28.430
And if one actually
violates some constraint,

00:35:28.430 --> 00:35:31.610
then we won't actually
insert any of those rows.

00:35:31.610 --> 00:35:34.010
And this is because this
insert is bundled inside

00:35:34.010 --> 00:35:38.640
of a transaction, which we'll learn
more about later on in the course.

00:35:38.640 --> 00:35:40.233
Let's take another one.

00:35:40.233 --> 00:35:43.400
SPEAKER 8: I noticed that when you did
SELECT [? star ?] FROM "collections,"

00:35:43.400 --> 00:35:45.740
that one of the acquired
dates was just blank.

00:35:45.740 --> 00:35:47.030
It didn't say NULL.

00:35:47.030 --> 00:35:51.290
Is that because the CSV itself just
had an empty value next to the comma?

00:35:51.290 --> 00:35:52.790
SPEAKER 1: Yeah, great observations.

00:35:52.790 --> 00:35:53.850
So let me try this again.

00:35:53.850 --> 00:35:55.808
So we can see the results
of this [? SELECT. ?]

00:35:55.808 --> 00:36:00.020
I'll come back to my computer
here, and let me show you again.

00:36:00.020 --> 00:36:03.140
If I say SELECT [? star ?]
FROM "collections"

00:36:03.140 --> 00:36:07.970
semicolon, notice how this acquired
column-- this acquired cell here, if I

00:36:07.970 --> 00:36:11.330
go back to my screen, it's just blank.

00:36:11.330 --> 00:36:14.720
But we saw before that if I
[? selected ?] [? star, ?] I would have

00:36:14.720 --> 00:36:19.460
seen [? NULL ?] there if this
value was truly [? NULL. ?] Well,

00:36:19.460 --> 00:36:24.890
it turns out one downside of importing
from a CSV is that all of your data is

00:36:24.890 --> 00:36:27.230
imported initially as text.

00:36:27.230 --> 00:36:30.920
And if I have just a
blank cell in my CSV,

00:36:30.920 --> 00:36:34.130
it won't be converted to
[? NULL ?] automatically.

00:36:34.130 --> 00:36:36.830
I need to do that myself
perhaps using an [? update ?]

00:36:36.830 --> 00:36:38.790
statement that we'll see in just a bit.

00:36:38.790 --> 00:36:42.140
So be wary of this if you do want to
keep track of [? null ?] [? values ?]

00:36:42.140 --> 00:36:42.930
and so on.

00:36:42.930 --> 00:36:46.070
If you don't actually manually
make this value a [? null, ?]

00:36:46.070 --> 00:36:49.230
it'll appear as just a
blank value not a null,

00:36:49.230 --> 00:36:52.770
which is different in this respect.

00:36:52.770 --> 00:36:58.380
OK so here, we've seen how to insert
not just one row, but multiple,

00:36:58.380 --> 00:37:01.840
and also how to import data from a CSV.

00:37:01.840 --> 00:37:05.980
When we come back, we'll see how to
actually update our data altogether,

00:37:05.980 --> 00:37:07.275
and even delete it too.

00:37:07.275 --> 00:37:09.150
So come back in just a
few and talk about how

00:37:09.150 --> 00:37:10.950
to delete data from our tables.

00:37:10.950 --> 00:37:12.690
And we're back.

00:37:12.690 --> 00:37:16.540
So we just saw how to insert
some rows into our database,

00:37:16.540 --> 00:37:18.480
and also to import some data.

00:37:18.480 --> 00:37:22.320
But presumably, we also want to be
able to delete some data from our table

00:37:22.320 --> 00:37:23.290
as well.

00:37:23.290 --> 00:37:26.130
You can imagine the MFA,
the Museum of Fine Arts,

00:37:26.130 --> 00:37:29.460
maybe they're selling a piece of
artwork, or maybe they've lost one,

00:37:29.460 --> 00:37:30.750
or maybe it was stolen.

00:37:30.750 --> 00:37:34.980
But either way, they want to
remove the row from their table.

00:37:34.980 --> 00:37:36.390
Well, let's see here.

00:37:36.390 --> 00:37:41.100
We can go back to our schema with
a table of artifacts and artwork

00:37:41.100 --> 00:37:43.620
that is inside of the MFA.

00:37:43.620 --> 00:37:45.870
Now if I want to delete
a particular piece,

00:37:45.870 --> 00:37:47.950
you could visually think
of it a bit like this.

00:37:47.950 --> 00:37:51.270
I could first identify
that row I want to delete.

00:37:51.270 --> 00:37:52.980
Let's say it's Spring outing here.

00:37:52.980 --> 00:37:54.810
We've sold this piece.

00:37:54.810 --> 00:37:58.210
Well, I could visually
just remove this row.

00:37:58.210 --> 00:38:02.460
So it's no longer there and shift
the remaining ones up metaphorically.

00:38:02.460 --> 00:38:05.610
And it turns out that
to do this in SQL, we

00:38:05.610 --> 00:38:08.220
have our very own statement we can use.

00:38:08.220 --> 00:38:14.830
This one reads DELETE FROM some
table WHERE a condition is true.

00:38:14.830 --> 00:38:17.550
So we see our old
friend WHERE back again.

00:38:17.550 --> 00:38:20.610
And WHERE is vitally
important to this DELETE FROM.

00:38:20.610 --> 00:38:23.910
If I say DELETE FROM
table with no WHERE,

00:38:23.910 --> 00:38:26.790
what do you think is going to happen?

00:38:26.790 --> 00:38:29.850
We might drop everything
from our table, right?

00:38:29.850 --> 00:38:32.910
But if I instead say WHERE
some condition is true,

00:38:32.910 --> 00:38:38.170
I can select the rows I want to
delete and only delete those rows.

00:38:38.170 --> 00:38:39.900
So let's try this here.

00:38:39.900 --> 00:38:43.080
I'll try to delete some artwork
from our collections table,

00:38:43.080 --> 00:38:45.610
for example, maybe we sold
it and we want get rid of it.

00:38:45.610 --> 00:38:51.960
I'll come back to my computer over here,
and I will open up our SQLite database.

00:38:51.960 --> 00:38:54.540
I'll say sqlite3 mfa.db.

00:38:54.540 --> 00:38:58.290
And now, I can type SELECT
[? star ?] [? FROM ?] "collections,"

00:38:58.290 --> 00:39:03.990
and I see a few more items than last
time now all inside of our table.

00:39:03.990 --> 00:39:07.650
But per the visual, I want
to delete Spring outing,

00:39:07.650 --> 00:39:09.820
to remove it from this table.

00:39:09.820 --> 00:39:10.900
So what could I do?

00:39:10.900 --> 00:39:15.270
I could try DELETE FROM and then
the table I want to delete from.

00:39:15.270 --> 00:39:20.400
In this case, I'll use the let's say
[? collections' ?] [? name, ?] DELETE

00:39:20.400 --> 00:39:22.320
FROM "collections."

00:39:22.320 --> 00:39:23.620
But I don't want to do this.

00:39:23.620 --> 00:39:25.590
I don't want to say
DELETE FROM "collections"

00:39:25.590 --> 00:39:30.780
semicolon end of statement,
because then I'll delete every row.

00:39:30.780 --> 00:39:33.540
What I should instead do is this.

00:39:33.540 --> 00:39:37.590
DELETE FROM "collections"
WHERE some condition is true.

00:39:37.590 --> 00:39:41.370
Maybe the title in this case is
a title I'm trying to remove.

00:39:41.370 --> 00:39:47.430
So I'll say "title" equals "Spring
outing," just like this, semicolon,

00:39:47.430 --> 00:39:48.510
Enter.

00:39:48.510 --> 00:39:49.740
Now, nothing seems to happen.

00:39:49.740 --> 00:39:55.140
But if I do SELECT [? star ?]
FROM "collections" semicolon,

00:39:55.140 --> 00:39:57.420
we no longer see Spring outing.

00:39:57.420 --> 00:40:02.580
And notice here, how the id
column, the id of 3 is now gone.

00:40:02.580 --> 00:40:05.880
If I were to insert some new
row here, I would start over

00:40:05.880 --> 00:40:10.110
with the highest number, which is
7 in this case, going from 7 to 8.

00:40:10.110 --> 00:40:14.150
But 3 is no longer part of our database.

00:40:14.150 --> 00:40:18.170
Let me try this again, notice how
this one called Imaginative Landscape,

00:40:18.170 --> 00:40:19.940
the acquired date is NULL.

00:40:19.940 --> 00:40:21.813
We don't know when we got it.

00:40:21.813 --> 00:40:23.480
Well, let's say that we eventually sell.

00:40:23.480 --> 00:40:25.010
It's no longer part of our database.

00:40:25.010 --> 00:40:29.810
We could use a condition based on NULL
to remove this particular artwork.

00:40:29.810 --> 00:40:30.810
So I'll say this.

00:40:30.810 --> 00:40:36.770
Let me try to DELETE FROM
"collections" WHERE "title" equals--

00:40:36.770 --> 00:40:39.920
not Spring outing, but WHERE
the acquired date-- sorry,

00:40:39.920 --> 00:40:44.480
the "acquired" date IS NULL semicolon.

00:40:44.480 --> 00:40:47.490
I could also use is NOT NULL--

00:40:47.490 --> 00:40:51.350
WHERE "acquired" IS NOT NULL, but that
would delete all the pieces of artwork

00:40:51.350 --> 00:40:53.510
that actually have an acquired date.

00:40:53.510 --> 00:40:57.260
Here, I want to leave only those
that do not have an acquired date.

00:40:57.260 --> 00:41:00.770
So I'll say where "acquired" IS NULL.

00:41:00.770 --> 00:41:03.320
Now, I'll hit Enter,
nothing seems to happen,

00:41:03.320 --> 00:41:07.850
but I'll say Select [? star ?]
FROM "collections" semicolon.

00:41:07.850 --> 00:41:12.020
And I'll see that piece
of artwork is now gone.

00:41:12.020 --> 00:41:16.820
I only have those that
have an acquired date.

00:41:16.820 --> 00:41:22.880
OK, so similar to [? INSERT, ?] we've
been able to [? delete ?] one row

00:41:22.880 --> 00:41:23.990
at a time.

00:41:23.990 --> 00:41:28.370
But it would probably be helpful for
us to delete multiple rows at a time

00:41:28.370 --> 00:41:29.550
as well.

00:41:29.550 --> 00:41:31.460
Let's look again at our table.

00:41:31.460 --> 00:41:36.200
Here we have our artwork, as
it currently stands, and maybe

00:41:36.200 --> 00:41:42.020
let's say we want to delete those
pieces that we acquired before 1909.

00:41:42.020 --> 00:41:46.970
Here in Boston, the MFA actually
moved locations in 1909 to a new place

00:41:46.970 --> 00:41:49.770
still in Boston, but a brand
new building altogether.

00:41:49.770 --> 00:41:53.090
So let's say that perhaps they left
some items at their old location,

00:41:53.090 --> 00:41:55.760
they're no longer part
of their collections.

00:41:55.760 --> 00:42:02.920
Via what condition do you think
could I select these three rows from

00:42:02.920 --> 00:42:08.670
my [? DELETE ?] [? FROM? ?] What could
I put in my condition to delete these

00:42:08.670 --> 00:42:09.900
three rows?

00:42:09.900 --> 00:42:11.670
Let me ask the audience here.

00:42:11.670 --> 00:42:15.780
What might you propose for [? delete ?]
[? from ?] here to delete those three

00:42:15.780 --> 00:42:16.575
particular rows?

00:42:21.540 --> 00:42:22.650
Let's go to [? Shiv. ?]

00:42:22.650 --> 00:42:26.468
SPEAKER 9: Greater than [? 5. ?]
[? id ?] equals greater than 5.

00:42:26.468 --> 00:42:28.010
SPEAKER 1: That's a good observation.

00:42:28.010 --> 00:42:29.468
So we could look at our table here.

00:42:29.468 --> 00:42:33.490
We see the id could be greater
than or equal to 5, that would

00:42:33.490 --> 00:42:36.220
remove these three rows certainly.

00:42:36.220 --> 00:42:38.380
There's probably another
way to do this too.

00:42:38.380 --> 00:42:42.190
Any other ideas besides the
id, what else can we use?

00:42:49.860 --> 00:42:51.710
Let's go to [? Yasavi. ?]

00:42:51.710 --> 00:42:55.070
SPEAKER 10: So we could compare the
dates and use greater than or less

00:42:55.070 --> 00:42:58.340
than sign with the acquired dates.

00:42:58.340 --> 00:43:00.800
SPEAKER 1: So you're proposing
to use the date column too.

00:43:00.800 --> 00:43:02.120
That makes sense to me as well.

00:43:02.120 --> 00:43:05.300
We could see acquired
here, perhaps we could see,

00:43:05.300 --> 00:43:11.100
is the acquired value less than 1909,
that year that we move locations,

00:43:11.100 --> 00:43:14.510
we could also probably use the title,
the [? accession ?] [? number, ?]

00:43:14.510 --> 00:43:15.390
and so on.

00:43:15.390 --> 00:43:17.600
And each has their own [? trade-offs. ?]

00:43:17.600 --> 00:43:20.030
Remember, though, for this
particular query, which

00:43:20.030 --> 00:43:23.480
is deleting those items
that were part of the museum

00:43:23.480 --> 00:43:29.210
acquired before 1909, probably
best to use the acquired

00:43:29.210 --> 00:43:32.240
date to actually delete those rows.

00:43:32.240 --> 00:43:35.930
What if, for instance, our ids
are actually not like this.

00:43:35.930 --> 00:43:38.210
But they're kind of interspersed around.

00:43:38.210 --> 00:43:41.870
I couldn't really do a query like if
the id is greater than or equal to 5,

00:43:41.870 --> 00:43:45.110
because I could also include those
that were maybe acquired 1911,

00:43:45.110 --> 00:43:48.680
[? 1956, ?] and so on. [? It's ?]
[? probably ?] ideal for me to use this

00:43:48.680 --> 00:43:52.140
date if it's the date that
I ultimately care about.

00:43:52.140 --> 00:43:53.210
So let's try this.

00:43:53.210 --> 00:43:55.580
I'll go back to my environment.

00:43:55.580 --> 00:43:59.090
And I'll try to delete
these three particular rows.

00:43:59.090 --> 00:44:03.750
I'll try DELETE FROM
"collections," as we saw before.

00:44:03.750 --> 00:44:05.630
But now here comes my condition.

00:44:05.630 --> 00:44:12.230
I'll say WHERE the "acquired" column
is less than some date that I'll give.

00:44:12.230 --> 00:44:17.180
And here, I'll give 1909-01-01.

00:44:17.180 --> 00:44:21.260
It turns out that SQLite has
a few ways to represent dates.

00:44:21.260 --> 00:44:26.300
One of which is this format that
follows the YYYY or the year,

00:44:26.300 --> 00:44:31.280
four digits, and then dash, and then
MM or the month with two digits,

00:44:31.280 --> 00:44:35.340
and then dash again, DD, in this
case, the day in two digits.

00:44:35.340 --> 00:44:42.470
So 1909-01-01 means January 1st, 1909.

00:44:42.470 --> 00:44:46.430
And I can use these same
operators with dates.

00:44:46.430 --> 00:44:51.230
I could say greater than, or less
than, or equal to this particular date.

00:44:51.230 --> 00:44:54.380
And SQLite will be able to parse this
for me to understand what I mean,

00:44:54.380 --> 00:44:58.010
which is earlier than 1909.

00:44:58.010 --> 00:44:59.330
So let's try this then.

00:44:59.330 --> 00:45:02.450
I'll hit Enter, and
nothing seems to happen.

00:45:02.450 --> 00:45:05.870
But if I do SELECT [? star ?]
FROM "collections,"

00:45:05.870 --> 00:45:10.820
I now see I'm down to two paintings,
two pieces of artwork that

00:45:10.820 --> 00:45:15.720
were acquired only after 1909.

00:45:15.720 --> 00:45:23.190
OK, so given this part on DELETE, we're
able to delete one row or even multiple

00:45:23.190 --> 00:45:24.315
rows at a time.

00:45:24.315 --> 00:45:26.190
But there are still
instances where you might

00:45:26.190 --> 00:45:31.080
want to think about, should we delete
something or can we delete something.

00:45:31.080 --> 00:45:35.010
We'll talk about those in just a
minute in terms of constraints.

00:45:35.010 --> 00:45:36.180
We back in a few.

00:45:36.180 --> 00:45:37.860
And we're back.

00:45:37.860 --> 00:45:43.260
So we just saw how to delete
one row, even multiple rows.

00:45:43.260 --> 00:45:46.620
We haven't yet talked about
whether we should delete some data

00:45:46.620 --> 00:45:49.020
or whether we should delete
some data and particularly

00:45:49.020 --> 00:45:51.180
in the context of these
constraints, where

00:45:51.180 --> 00:45:53.340
you might have maybe a
piece of data you actually

00:45:53.340 --> 00:45:57.460
shouldn't delete to maintain
the integrity of your table.

00:45:57.460 --> 00:46:00.870
Now, one example of this is
a [? Foreign ?] [? Key ?]

00:46:00.870 --> 00:46:05.880
[? Constraint, ?] which means that you
have some table with a primary key that

00:46:05.880 --> 00:46:08.163
is referenced by some other table.

00:46:08.163 --> 00:46:10.080
And we'll get concrete
about this in a minute.

00:46:10.080 --> 00:46:13.380
But it might mean that if you
were to delete that primary key,

00:46:13.380 --> 00:46:16.210
that other table would
have nothing to reference.

00:46:16.210 --> 00:46:20.610
So let's update our schema here for
the MFA, the Museum of Fine Arts.

00:46:20.610 --> 00:46:25.140
And let's say that they have not just
a collection now, but also artists

00:46:25.140 --> 00:46:25.890
involved.

00:46:25.890 --> 00:46:29.970
And there exists a [? many ?] [? to ?]
[? many ?] relationship among artists

00:46:29.970 --> 00:46:32.160
and items in the collection.

00:46:32.160 --> 00:46:36.480
We could say that an artist might
make more than one piece of art

00:46:36.480 --> 00:46:37.710
in the collection.

00:46:37.710 --> 00:46:42.480
And a piece of art might be
created by more than one artist.

00:46:42.480 --> 00:46:47.050
Now concretely in our database,
this might look as follows.

00:46:47.050 --> 00:46:51.480
I could have a table for artists
and a table for collections.

00:46:51.480 --> 00:46:54.780
And notice how each has
their own primary key.

00:46:54.780 --> 00:46:58.110
Artists has an id column,
called our primary key column.

00:46:58.110 --> 00:47:02.400
Collections too has an id called
our primary key column here.

00:47:02.400 --> 00:47:06.420
Now, in the middle is this
table created that symbolizes,

00:47:06.420 --> 00:47:11.280
that represents the relationship among
artists and items in the collection.

00:47:11.280 --> 00:47:16.380
Here on the first row, we see the
artist id of 1 created the piece of art

00:47:16.380 --> 00:47:18.750
with the collection id of 2.

00:47:18.750 --> 00:47:20.830
So who did what here?

00:47:20.830 --> 00:47:25.200
Well, we can see that the artist
with the ID of 1 is Li Yin.

00:47:25.200 --> 00:47:31.740
Li Yin created this piece of art, the
id of 2, which is Imaginative landscape.

00:47:31.740 --> 00:47:35.340
So we can kind of relate in this
case artists with collections

00:47:35.340 --> 00:47:37.890
as we saw just a few weeks ago.

00:47:37.890 --> 00:47:41.880
But let's say we decide to
delete a particular artist,

00:47:41.880 --> 00:47:45.510
maybe we delete Unidentified
artists down here.

00:47:45.510 --> 00:47:48.100
Well, we could just delete
from the artist table.

00:47:48.100 --> 00:47:53.940
Maybe we find a condition to select
this row and we delete from artists.

00:47:53.940 --> 00:47:56.610
We'll do that here remove that row.

00:47:56.610 --> 00:47:59.520
But what have we done wrong?

00:47:59.520 --> 00:48:03.560
If we look at our created
table in particular,

00:48:03.560 --> 00:48:06.350
what have we now done wrong?

00:48:06.350 --> 00:48:13.860
What kind of problem might we [? run ?]
[? into? ?] Let's go to [? Karim. ?]

00:48:13.860 --> 00:48:17.315
SPEAKER 11: [? Yeah ?] [INAUDIBLE]
[? was ?] [? only ?] [? deleted ?]

00:48:17.315 --> 00:48:19.940
[? from ?] [? the ?] [? artist ?]
[INAUDIBLE] without the other

00:48:19.940 --> 00:48:20.753
[INAUDIBLE].

00:48:20.753 --> 00:48:21.920
SPEAKER 1: Yeah, good point.

00:48:21.920 --> 00:48:27.650
We only deleted the artist with
the id of 3 from the artists table.

00:48:27.650 --> 00:48:34.343
Now if I look at the created table and
I look for the artist with id of 3,

00:48:34.343 --> 00:48:35.010
do you see them?

00:48:35.010 --> 00:48:36.190
They don't exist anymore.

00:48:36.190 --> 00:48:38.190
So I can't understand this relationship.

00:48:38.190 --> 00:48:42.120
I don't know the artist
with the id of 3.

00:48:42.120 --> 00:48:47.790
So let's try this now with a new
kind of schema now in our database.

00:48:47.790 --> 00:48:50.280
I'll go back to my computer.

00:48:50.280 --> 00:48:53.850
And let me open up our new
version of our database.

00:48:53.850 --> 00:49:01.440
I'll type sqlite3, in this case, mfa.db,
same name, but now a different schema.

00:49:01.440 --> 00:49:05.640
If I type .schema,
notice a few things here.

00:49:05.640 --> 00:49:09.090
I have those same
tables we saw visually.

00:49:09.090 --> 00:49:13.740
I have collections, artists,
and a created table.

00:49:13.740 --> 00:49:16.680
And [? let me ?] focus in
particular on this created table.

00:49:16.680 --> 00:49:21.090
If I say .schema, in this
case, created, Enter,

00:49:21.090 --> 00:49:26.610
I'll see I have a FOREIGN KEY
constraint on my "artist_id"

00:49:26.610 --> 00:49:30.040
column and my "collection_id" column.

00:49:30.040 --> 00:49:36.300
So if I tried to delete an artist that
is referenced in my created table,

00:49:36.300 --> 00:49:39.990
I would probably raise
a constraint error--

00:49:39.990 --> 00:49:42.370
a FOREIGN KEY [? constraint ?] error.

00:49:42.370 --> 00:49:45.670
So let's just try this
and see what happens.

00:49:45.670 --> 00:49:50.550
Let me try DELETE FROM the
[? artist's ?] [? table, ?] DELETE FROM

00:49:50.550 --> 00:49:59.060
"artists" WHERE, let's say the "name"
of the artist is literally "Unidentified

00:49:59.060 --> 00:49:59.940
artist."

00:49:59.940 --> 00:50:03.560
This is the name they have in the
Museum of Fine Arts collections--

00:50:03.560 --> 00:50:05.270
Unidentified artist.

00:50:05.270 --> 00:50:08.270
So now I want to delete
them from that table.

00:50:08.270 --> 00:50:14.130
Well, if I hit Enter, I do get
that FOREIGN KEY constraint error.

00:50:14.130 --> 00:50:19.610
So does this mean I just can't delete
this artist or is there a workaround?

00:50:19.610 --> 00:50:21.290
Let's look at it visually again.

00:50:21.290 --> 00:50:24.740
I'll go back to what we
had before as our table,

00:50:24.740 --> 00:50:27.620
and let me ask the audience here.

00:50:27.620 --> 00:50:29.210
What's the solution?

00:50:29.210 --> 00:50:33.140
If I can't delete Unidentified
artist, because they

00:50:33.140 --> 00:50:38.300
have an id referenced by this
table, what should I maybe do first?

00:50:38.300 --> 00:50:39.695
What should I do instead?

00:50:39.695 --> 00:50:41.570
SPEAKER 12: Yeah, I want
to say that first we

00:50:41.570 --> 00:50:46.160
can delete the id that is being
referenced by this [? id ?]

00:50:46.160 --> 00:50:48.740
that we want to delete,
then we can delete it.

00:50:48.740 --> 00:50:51.170
In fact, if we want to
delete it from the artists,

00:50:51.170 --> 00:50:54.950
first we have to delete from the
created table [INAUDIBLE] table, which

00:50:54.950 --> 00:50:58.070
it is referencing, then we can
delete from the artist [INAUDIBLE]

00:50:58.070 --> 00:51:00.060
SPEAKER 1: Yeah, I like
your thinking here.

00:51:00.060 --> 00:51:02.480
So if we look at the
created table, you notice

00:51:02.480 --> 00:51:05.510
that we can't delete
Unidentified artist, because it's

00:51:05.510 --> 00:51:07.430
referenced in the created table.

00:51:07.430 --> 00:51:12.230
What we should maybe first do is delete
this row delete their affiliation

00:51:12.230 --> 00:51:16.100
with their work, and then
delete the artist so we don't

00:51:16.100 --> 00:51:18.900
run into this FOREIGN KEY constraint.

00:51:18.900 --> 00:51:22.520
So let's try this in
our SQLite environment.

00:51:22.520 --> 00:51:24.780
Come back to my computer.

00:51:24.780 --> 00:51:30.320
And again, our plan was to first delete
the artists affiliation with their work

00:51:30.320 --> 00:51:32.210
from the created table.

00:51:32.210 --> 00:51:34.460
So let me show you
first the created table.

00:51:34.460 --> 00:51:38.240
I'll say SELECT
[? star ?] from "created,"

00:51:38.240 --> 00:51:42.080
and see how it's exactly like
the table we just visualized.

00:51:42.080 --> 00:51:46.820
Now, let me try delete this artist
affiliation with their work.

00:51:46.820 --> 00:51:48.870
I'm not deleting the artist.

00:51:48.870 --> 00:51:52.280
I'm deleting their affiliation
with their authorship,

00:51:52.280 --> 00:51:55.710
their [? artist ?] [? ship, ?] whatever
you want to call it for their work.

00:51:55.710 --> 00:51:59.810
Now, I'll say DELETE FROM--

00:51:59.810 --> 00:52:07.310
let's say, DELETE FROM "artists"
WHERE the "artist_id" is equal to

00:52:07.310 --> 00:52:10.600
or equals 3.

00:52:10.600 --> 00:52:12.770
Well, I could do this.

00:52:12.770 --> 00:52:16.840
But there's probably a better way,
like I know the id, and that's fine.

00:52:16.840 --> 00:52:19.370
But we could also use, as
we saw a little bit ago,

00:52:19.370 --> 00:52:22.510
which were [? subqueries, ?]
a way to write a query that

00:52:22.510 --> 00:52:25.960
returns me the result. And that
result gets included in another query

00:52:25.960 --> 00:52:26.840
altogether.

00:52:26.840 --> 00:52:28.270
So let's try that instead.

00:52:28.270 --> 00:52:31.030
Let's say I don't know the artist_id.

00:52:31.030 --> 00:52:33.130
What I could do is
make a [? subquery. ?]

00:52:33.130 --> 00:52:39.430
I could say inside of this, let me
select the "id" FROM the "artists"

00:52:39.430 --> 00:52:48.550
table WHERE, in this case, the "name"
equals "Unidentified artist," and then,

00:52:48.550 --> 00:52:51.130
let me close out this
query, hit a semicolon.

00:52:51.130 --> 00:52:54.340
And I already see a typo in this,
but I'm just going to try it anyway.

00:52:54.340 --> 00:52:57.040
I will go ahead, and hit Enter.

00:52:57.040 --> 00:53:03.700
And let's see if I type SELECT
[? star ?] FROM "artists," Enter,

00:53:03.700 --> 00:53:06.550
I still see Unidentified artist.

00:53:06.550 --> 00:53:08.480
And I think my typo here was as follows.

00:53:08.480 --> 00:53:11.050
If I up my query using the
up arrow on my keyboard,

00:53:11.050 --> 00:53:16.300
I see DELETE from "artists," where the
"artist_id" equals some value here.

00:53:16.300 --> 00:53:19.270
But do I want to delete from artists?

00:53:19.270 --> 00:53:19.990
I don't.

00:53:19.990 --> 00:53:21.610
I actually can't delete from artists.

00:53:21.610 --> 00:53:25.340
What I should do instead is
delete from the created table.

00:53:25.340 --> 00:53:26.350
So let's try this.

00:53:26.350 --> 00:53:30.640
I'll say DELETE FROM
"created," making sure I only

00:53:30.640 --> 00:53:33.370
have double quotes
around created, then I'll

00:53:33.370 --> 00:53:38.470
say WHERE the artist_id
equals, let's say, either 3--

00:53:38.470 --> 00:53:41.530
but it could also be the
result of this [? subquery. ?]

00:53:41.530 --> 00:53:47.410
I'll say 1, 2, 3, 4,
and then SELECT the "id"

00:53:47.410 --> 00:53:55.880
FROM the "artist" table WHERE the
"name" equals "Unidentified artist."

00:53:55.880 --> 00:54:00.230
Then I'll close this [? subquery, ?]
hitting semicolon, hit Enter.

00:54:00.230 --> 00:54:05.750
And now, if I say SELECT [? star ?]
FROM "created" semicolon,

00:54:05.750 --> 00:54:09.500
I should see that this artist's
affiliation with their work

00:54:09.500 --> 00:54:11.600
no longer exists.

00:54:11.600 --> 00:54:13.110
And because it no longer exists.

00:54:13.110 --> 00:54:15.990
I can now delete them
from the artists table.

00:54:15.990 --> 00:54:20.210
I don't have this FOREIGN
KEY artist_id referencing

00:54:20.210 --> 00:54:22.520
the primary key of this artist.

00:54:22.520 --> 00:54:23.900
So let's now try that.

00:54:23.900 --> 00:54:26.960
I could say SELECT
[? star ?] FROM "artists,"

00:54:26.960 --> 00:54:30.650
I see that Unidentified
artist is still in here.

00:54:30.650 --> 00:54:32.510
But let's delete them now.

00:54:32.510 --> 00:54:36.290
I'll say delete from,
let's say, "artists" WHERE

00:54:36.290 --> 00:54:42.620
the "name" equals "Unidentified artist,"
closing my quotes, semicolon, hit

00:54:42.620 --> 00:54:43.280
Enter.

00:54:43.280 --> 00:54:47.780
And now if I Select [? star ?]
FROM "artists" semicolon,

00:54:47.780 --> 00:54:52.520
I should see they're no longer in
this table to our earlier example.

00:54:52.520 --> 00:54:54.750
This becomes a 2-step process.

00:54:54.750 --> 00:54:59.480
First delete their affiliation,
then delete their name.

00:54:59.480 --> 00:55:07.910
OK, So let's try this yet again,
but now using some additional tools

00:55:07.910 --> 00:55:10.140
that we have at our disposal.

00:55:10.140 --> 00:55:16.860
So let's look back at this FOREIGN KEY
constraint that existed in this table.

00:55:16.860 --> 00:55:19.220
If you remember, it
looked a bit like this.

00:55:19.220 --> 00:55:21.980
In my created table, I had this line.

00:55:21.980 --> 00:55:28.220
FOREIGN KEY, artist_id, REFERENCES
the "id" column in "artists."

00:55:28.220 --> 00:55:34.430
Well, if that is the case, I can't
delete the artist with the id

00:55:34.430 --> 00:55:39.170
referenced by this [? foreign ?]
[? key. ?] But it turns out that I

00:55:39.170 --> 00:55:44.390
could specify some alternative action
that happens when I try to delete

00:55:44.390 --> 00:55:48.590
the id that is referenced by this
[? foreign ?] [? key. ?] I can specify

00:55:48.590 --> 00:55:52.040
that using a keyword called ON DELETE.

00:55:52.040 --> 00:55:54.860
So after FOREIGN KEY REFERENCES
[? "artist id," ?] I could say

00:55:54.860 --> 00:55:59.600
afterwards ON DELETE, and then specify
some action I want to actually happen

00:55:59.600 --> 00:56:03.500
when I tried to delete the primary
key referenced by this [? foreign ?]

00:56:03.500 --> 00:56:04.430
[? key. ?]

00:56:04.430 --> 00:56:08.000
One thing I could use
would be RESTRICT, which

00:56:08.000 --> 00:56:09.980
is kind of like what we saw before.

00:56:09.980 --> 00:56:14.600
If I try to delete a primary key is
[? referenced ?] by this [? foreign ?]

00:56:14.600 --> 00:56:17.090
[? key, ?] I will not
be allowed to do it.

00:56:17.090 --> 00:56:19.720
That action is restricted.

00:56:19.720 --> 00:56:23.410
I could also decide to take NO ACTION.

00:56:23.410 --> 00:56:28.150
In this case, I could delete the
primary key [? referenced ?] by this

00:56:28.150 --> 00:56:30.560
[? foreign ?] [? key, ?]
and nothing would happen.

00:56:30.560 --> 00:56:35.290
I would actually be allowed to do that,
which may be unwise in some cases,

00:56:35.290 --> 00:56:38.050
but I could give myself that power.

00:56:38.050 --> 00:56:41.590
I could also decide to SET NULL.

00:56:41.590 --> 00:56:46.330
That is if I delete the primary key
that is [? referenced ?] by this

00:56:46.330 --> 00:56:49.690
[? foreign ?] [? key, ?] what I'll do
is set the [? foreign ?] [? key ?] to be

00:56:49.690 --> 00:56:53.657
NULL, meaning that
value no longer exists.

00:56:53.657 --> 00:56:56.740
I can alternatively [? set ?] it to a
[? default ?] value for that column.

00:56:56.740 --> 00:57:02.770
Or perhaps, most compellingly, I
could try to CASCADE the deletion.

00:57:02.770 --> 00:57:06.590
Where CASCADE means if I delete
the artist, for instance,

00:57:06.590 --> 00:57:11.590
let's also delete their affiliation
with their artwork all in one go.

00:57:11.590 --> 00:57:16.430
This converts our 2-step
process into a 1-step process.

00:57:16.430 --> 00:57:18.710
So let's visualize this.

00:57:18.710 --> 00:57:24.040
Let's say we have now applied
this constraint ON DELETE CASCADE.

00:57:24.040 --> 00:57:27.850
So if I delete the artist, I'll
also delete their affiliation

00:57:27.850 --> 00:57:29.110
with their work.

00:57:29.110 --> 00:57:30.610
We'll have the same two tables--

00:57:30.610 --> 00:57:33.610
artists, collections, and created.

00:57:33.610 --> 00:57:39.430
Now again, artist_id references
this primary key in artists.

00:57:39.430 --> 00:57:40.510
Now, I'm going to try.

00:57:40.510 --> 00:57:43.550
I want to delete
Unidentified artist here.

00:57:43.550 --> 00:57:44.510
Well, I could do that.

00:57:44.510 --> 00:57:45.940
I could just delete their row.

00:57:45.940 --> 00:57:50.440
And now, instead of a FOREIGN
KEY constraint error, what I get

00:57:50.440 --> 00:57:51.490
is the following.

00:57:51.490 --> 00:57:55.690
[INAUDIBLE] [? created ?]
that this row is also gone.

00:57:55.690 --> 00:57:59.960
The row that had the
artist_id of 3 gets removed.

00:57:59.960 --> 00:58:04.960
We've cascaded the deletion
onward to the created table.

00:58:04.960 --> 00:58:08.760
So let's try this now with
a new database schema.

00:58:08.760 --> 00:58:10.590
I'll go back to my computer here.

00:58:14.497 --> 00:58:16.080
So I'm back in my [? TERMINAL ?] here.

00:58:16.080 --> 00:58:21.180
And I can type sqlite3 mfa.db
to reopen this database.

00:58:21.180 --> 00:58:26.160
And notice how if I type .schema, I've
updated this schema to now have ON

00:58:26.160 --> 00:58:27.150
DELETE CASCADE.

00:58:27.150 --> 00:58:29.130
Let me show you over here on the screen.

00:58:29.130 --> 00:58:34.290
Here on the created table, I now have
the very same kind of table schema,

00:58:34.290 --> 00:58:38.610
but now my only difference is
I've applied this ON DELETE action

00:58:38.610 --> 00:58:40.560
to my [? foreign ?]
[? key ?] constraints.

00:58:40.560 --> 00:58:42.870
In particular, I'm going
to [? cascade ?] the delete

00:58:42.870 --> 00:58:46.570
from the artist's table
to the created table.

00:58:46.570 --> 00:58:49.110
So let's try that out in SQLite.

00:58:49.110 --> 00:58:52.770
I'll come back over here,
and I will now try just

00:58:52.770 --> 00:59:01.440
DELETE FROM the "artists" table WHERE
the "name" equals "Unidentified artist.

00:59:01.440 --> 00:59:02.430
semicolon.

00:59:02.430 --> 00:59:08.280
Now, I'll hit Enter, and I don't get
a FOREIGN KEY constraint anymore.

00:59:08.280 --> 00:59:12.780
But if I say SELECT [? star ?]
FROM "created" semicolon,

00:59:12.780 --> 00:59:18.130
notice how I've also deleted the
artist's affiliation with their work.

00:59:18.130 --> 00:59:21.960
So that is wherever in
the artist_id column,

00:59:21.960 --> 00:59:27.940
I saw the id for the artist I
deleted, I would to delete that row,

00:59:27.940 --> 00:59:34.290
so I have no references to that primary
key, which is now gone from my table.

00:59:34.290 --> 00:59:36.340
So let me ask here.

00:59:36.340 --> 00:59:41.610
We've seen how to delete single
rows, how to delete multiple rows,

00:59:41.610 --> 00:59:44.595
and now how to delete data
among some constraints,

00:59:44.595 --> 00:59:46.470
like our [? foreign ?]
[? key ?] constraints.

00:59:46.470 --> 00:59:51.630
What questions do we
have on those techniques?

00:59:51.630 --> 00:59:52.650
Let's go to [? Han. ?]

00:59:52.650 --> 00:59:55.740
SPEAKER 13: I have a
question regarding deleting.

00:59:55.740 --> 01:00:01.290
The [? id ?] numbers have been removed
while we were removing the record.

01:00:01.290 --> 01:00:03.900
I was wondering if do we
have to clean it up somehow

01:00:03.900 --> 01:00:07.917
later or will they be populated
with the new data as it comes along?

01:00:07.917 --> 01:00:09.250
SPEAKER 1: Yeah, great question.

01:00:09.250 --> 01:00:13.690
So the question is, what happens to our
primary keys when we delete our data.

01:00:13.690 --> 01:00:15.900
So for this one, let
me show you the visual

01:00:15.900 --> 01:00:19.680
again that we had before of our
tables nicely printed on the side.

01:00:19.680 --> 01:00:26.790
So I'll come back here, and I will go
back to our idea of these joint tables

01:00:26.790 --> 01:00:31.440
where we had an artists table, a
collections table, and a created table.

01:00:31.440 --> 01:00:33.330
And we saw before that
we're going to delete

01:00:33.330 --> 01:00:36.000
this artist called Unidentified artist.

01:00:36.000 --> 01:00:40.290
So I'll delete them, and to your
point, [? well ?] the idea of 3

01:00:40.290 --> 01:00:43.410
no longer exists in this case.

01:00:43.410 --> 01:00:50.100
Now by default, at least in SQLite,
if I insert a new row, what I'll do

01:00:50.100 --> 01:00:53.940
is take the highest id
value and I will then

01:00:53.940 --> 01:00:58.440
make that the new primary key
for the new row that I insert.

01:00:58.440 --> 01:01:01.140
That is the default situation.

01:01:01.140 --> 01:01:04.170
I could if I wanted
to get more specific,

01:01:04.170 --> 01:01:11.550
and if I had done in my id integer
column here in my [? create ?]

01:01:11.550 --> 01:01:14.700
[? table, ?] if I had also said
this [? keyword ?] [? call ?]

01:01:14.700 --> 01:01:19.680
[? autoincrement, ?] all one word,
what would happen instead is I would

01:01:19.680 --> 01:01:23.710
actually reinsert an
ID that is not used.

01:01:23.710 --> 01:01:28.090
So in this case 3 is not used,
I could insert that one here.

01:01:28.090 --> 01:01:30.360
So up to you what you want to do.

01:01:30.360 --> 01:01:35.670
In general, SQlite by default will take
the highest ID, add one from there.

01:01:35.670 --> 01:01:38.940
If you specify this constraint
[? called ?] [? autoincrement, ?] you

01:01:38.940 --> 01:01:44.310
will instead take whatever id is not
being used and use that in your insert

01:01:44.310 --> 01:01:46.460
afterwards.

01:01:46.460 --> 01:01:51.530
OK, so now we've seen how to
insert, and how to delete data.

01:01:51.530 --> 01:01:55.560
But of course, we make mistakes when we
add data or even when we delete data.

01:01:55.560 --> 01:01:58.580
So we'll see in just a minute
how to update our values

01:01:58.580 --> 01:02:01.910
as well to correct typos and
even to update associations

01:02:01.910 --> 01:02:04.790
between artists and artwork.

01:02:04.790 --> 01:02:06.510
And we're back.

01:02:06.510 --> 01:02:10.250
So we've so far seen how to
insert some data into our tables

01:02:10.250 --> 01:02:12.080
and how to delete it.

01:02:12.080 --> 01:02:15.470
But sometimes, we don't want
to fully delete something,

01:02:15.470 --> 01:02:18.500
we just want to change its
value to correct some typo

01:02:18.500 --> 01:02:20.480
or correct some association.

01:02:20.480 --> 01:02:24.950
So let's think to our MFA example,
[? where ?] the Museum of Fine Arts

01:02:24.950 --> 01:02:27.270
has some tables that look like this.

01:02:27.270 --> 01:02:30.050
They have artists in their
collection, and they also

01:02:30.050 --> 01:02:32.180
have art work in their collection.

01:02:32.180 --> 01:02:36.950
They also have a created table to
associate artists with their artwork.

01:02:36.950 --> 01:02:43.190
Now in this case, we know that
I have this Unidentified artist.

01:02:43.190 --> 01:02:47.420
And we can see that they authored
this item in the collections,

01:02:47.420 --> 01:02:50.000
"Farmers working at dawn."

01:02:50.000 --> 01:02:55.730
But let's say that later on, we find
out it wasn't an Unidentified artist.

01:02:55.730 --> 01:03:00.020
It was instead Li Yin, who
created "Farmers working at dawn."

01:03:00.020 --> 01:03:06.350
How could we update our tables to
make sure it's Li Yin, who we have

01:03:06.350 --> 01:03:10.660
creating "Farmers working at dawn."

01:03:10.660 --> 01:03:15.154
SPEAKER 14: So what we could do is--

01:03:15.154 --> 01:03:20.690
we know that in created, we have our
artist_id and [? collection id, ?]

01:03:20.690 --> 01:03:23.920
so we could update it there.

01:03:23.920 --> 01:03:28.840
But somehow, we need to cascade it over
to the [? artists ?] [INAUDIBLE] now

01:03:28.840 --> 01:03:32.380
this unidentified [? artist ?]
[? through-- ?] I think we probably

01:03:32.380 --> 01:03:33.107
don't need it.

01:03:33.107 --> 01:03:34.690
SPEAKER 1: Yeah, so a good point here.

01:03:34.690 --> 01:03:39.250
We could actually probably change the
created table to [? re-associate ?]

01:03:39.250 --> 01:03:41.750
an artist with some new artwork here.

01:03:41.750 --> 01:03:42.880
So let's visualize this.

01:03:42.880 --> 01:03:44.540
I go to my created table.

01:03:44.540 --> 01:03:51.010
And here the artist_id is currently
3, but I want it really to be 1.

01:03:51.010 --> 01:03:55.870
I want Li Yin to be associated with this
piece called "Farmers working at dawn."

01:03:55.870 --> 01:04:01.600
So I could update the created table
to instead of having 3 here have 1.

01:04:01.600 --> 01:04:06.220
Now, we see Li Yin created "Farmers
working at dawn." [? And if ?]

01:04:06.220 --> 01:04:08.830
you're concerned about this
Unidentified artist here,

01:04:08.830 --> 01:04:11.380
I think it's OK to have
an artist in our table

01:04:11.380 --> 01:04:14.350
who may or may not have
an item in collections.

01:04:14.350 --> 01:04:17.240
We'll say that's OK at least for now.

01:04:17.240 --> 01:04:22.270
So if we can update our associations
between artists and collections

01:04:22.270 --> 01:04:26.770
like this, let's actually try to do
that in our very own database here.

01:04:26.770 --> 01:04:33.010
I'll go back to my SQLite environment,
and now, let me try to open it first.

01:04:33.010 --> 01:04:35.560
I'll use sqlite3 mfa.db.

01:04:35.560 --> 01:04:39.460
And now let me type .schema to show you
[INAUDIBLE] the very same schema from

01:04:39.460 --> 01:04:40.250
before.

01:04:40.250 --> 01:04:45.340
So I want to update the artists
association between Li Yin and "Farmers

01:04:45.340 --> 01:04:46.570
working at dawn."

01:04:46.570 --> 01:04:51.370
So let's say, I'll SELECT [? star ?]
FROM "created," like this, semicolon.

01:04:51.370 --> 01:04:55.990
Here, I have my artist_ids
and my collection_ids.

01:04:55.990 --> 01:05:00.340
I see that we have the
Unidentified artist, id of 3,

01:05:00.340 --> 01:05:05.690
creating "Farmers working at dawn,"
with the id of 1 in our collection here.

01:05:05.690 --> 01:05:11.170
So now, let's try to update the artist
who created this particular painting.

01:05:11.170 --> 01:05:16.660
Here, I have 3 associated with 1,
but I want 1 associated with 1,

01:05:16.660 --> 01:05:18.580
where 1 is the id for Li Yin.

01:05:18.580 --> 01:05:22.858
And [? 1 ?] [? in ?] my collection
id is the id for this artwork here.

01:05:22.858 --> 01:05:23.650
So let me try this.

01:05:23.650 --> 01:05:29.620
I'll say UPDATE "created" and
SET, let's say, "artist_id"

01:05:29.620 --> 01:05:32.770
equal to some particular value.

01:05:32.770 --> 01:05:35.650
Well, what value should I set it to?

01:05:35.650 --> 01:05:39.940
I could try to set it to Li Yin's
id, which we know is just 1.

01:05:39.940 --> 01:05:43.300
But let me try instead to
use a [? subquery ?] here.

01:05:43.300 --> 01:05:49.120
Let me try to say parentheses, and then
write some query to update this value.

01:05:49.120 --> 01:05:55.300
I'll say SELECT "id" from
"artists" like this Enter again,

01:05:55.300 --> 01:06:00.190
WHERE the "name" equals Li Yin.

01:06:00.190 --> 01:06:04.090
And then let me close
this [? subquery. ?] Well,

01:06:04.090 --> 01:06:10.510
if I try to run this query, what might
Happen I'm updating the created table,

01:06:10.510 --> 01:06:16.600
I'm setting the artist_id equal to the
id for Li Yin, but what I've forgotten

01:06:16.600 --> 01:06:21.520
is this WHERE to only
choose some rows to update.

01:06:21.520 --> 01:06:23.200
[INAUDIBLE] not close it yet.

01:06:23.200 --> 01:06:29.140
I'll instead say WHERE, in this
case, the "collection_id," the piece

01:06:29.140 --> 01:06:34.120
of artwork in our collection,
is equal to [? while ?] the id

01:06:34.120 --> 01:06:35.330
for this painting.

01:06:35.330 --> 01:06:39.100
I'll say SELECT "id" from
"collections," and then I'll

01:06:39.100 --> 01:06:45.710
say WHERE the "title" equals
"Farmers working at dawn."

01:06:45.710 --> 01:06:49.700
Now, I'll close this
[? subquery, ?] and hit semicolon.

01:06:49.700 --> 01:06:53.600
And here, we've seen our first
example of an UPDATE query.

01:06:53.600 --> 01:06:56.510
I'm trying to update
the artist_id column

01:06:56.510 --> 01:06:59.990
in created to be the id for Li Yin.

01:06:59.990 --> 01:07:03.050
I only want to do that
though on the row where

01:07:03.050 --> 01:07:07.460
collection_id is equal to the
id for this particular painting.

01:07:07.460 --> 01:07:09.470
[INAUDIBLE] change the attribution for.

01:07:09.470 --> 01:07:11.360
So now, I'll hit Enter.

01:07:11.360 --> 01:07:14.370
And if I SELECT
[? star ?] FROM "created,"

01:07:14.370 --> 01:07:20.270
I should hopefully see in this
case that the artist_id associated

01:07:20.270 --> 01:07:25.680
with this painting is 2 and also
down below here is 1 as well.

01:07:25.680 --> 01:07:31.290
So I have Li Yin associated
with now two paintings overall.

01:07:31.290 --> 01:07:34.760
So let's get a grasp on what
this [? UPDATE ?] syntax really

01:07:34.760 --> 01:07:36.140
looks like in general.

01:07:36.140 --> 01:07:38.720
And for that, let's show the slide here.

01:07:38.720 --> 01:07:44.090
We have this [? UPDATE ?] [? keyword, ?]
[? UPDATE ?] statement in SQL to take

01:07:44.090 --> 01:07:47.510
a table name and update
the columns inside of it.

01:07:47.510 --> 01:07:51.410
I say UPDATE, then the name
of the table I want to update,

01:07:51.410 --> 01:07:56.810
then I say SET some column
equal to some value.

01:07:56.810 --> 01:07:59.910
I could if I wanted to have
more than one column here.

01:07:59.910 --> 01:08:03.620
I could say maybe title
and even maybe if we're

01:08:03.620 --> 01:08:07.040
talking about authors, authors
over here, or even acquired date.

01:08:07.040 --> 01:08:11.930
I could update more than one column
in my [? SET ?] [? portion ?] here.

01:08:11.930 --> 01:08:16.370
Then comes this WHERE portion,
where some condition is true.

01:08:16.370 --> 01:08:19.970
I want to make sure I don't
update all of my rows.

01:08:19.970 --> 01:08:24.270
I only update those where some
condition is actually true.

01:08:24.270 --> 01:08:28.069
So this is your syntax
for updating some columns.

01:08:28.069 --> 01:08:30.859
Let's say, if you want to
change an artist's attribution

01:08:30.859 --> 01:08:34.950
or if you want to change
a typo you've made.

01:08:34.950 --> 01:08:38.810
So let's see this now, not
just in terms of changing

01:08:38.810 --> 01:08:42.394
artists and their attributions,
let's see a use case for update

01:08:42.394 --> 01:08:45.300
where we've made some
mistakes in our data.

01:08:45.300 --> 01:08:49.609
And let's say the museum decides
to host some kind of event

01:08:49.609 --> 01:08:52.160
where people vote on their
favorite piece of artwork,

01:08:52.160 --> 01:08:54.920
they kind of [? handwrite ?] or
type it into some online form.

01:08:54.920 --> 01:08:58.010
Well, when I get back, a
CSV of those responses,

01:08:58.010 --> 01:09:02.600
some Comma Separated Values, one
line for each vote from our people

01:09:02.600 --> 01:09:04.640
who've attended this convention.

01:09:04.640 --> 01:09:08.479
Let's go back over here,
and I'll show you that CSV.

01:09:08.479 --> 01:09:11.060
Let me go to my environment.

01:09:11.060 --> 01:09:16.939
And I'll type code mfa.csv to
open this CSV that I already have.

01:09:16.939 --> 01:09:19.430
And here, actually,
it's not called mfa.csv.

01:09:19.430 --> 01:09:22.430
It's called code votes.csv.

01:09:22.430 --> 01:09:27.890
And now here we can see, I
have a table of one column

01:09:27.890 --> 01:09:29.870
that has several votes inside of it.

01:09:29.870 --> 01:09:33.140
Let's see, we have maybe
20 votes to be exact.

01:09:33.140 --> 01:09:35.840
So the first row is the header row.

01:09:35.840 --> 01:09:39.649
I have in this CSV, one
column called title.

01:09:39.649 --> 01:09:44.029
And each line here is one vote for
a [? museum-goers ?] favorite piece

01:09:44.029 --> 01:09:44.750
of artwork.

01:09:44.750 --> 01:09:48.319
We see "Farmers working at dawn,"
we see "Imaginative landscape,"

01:09:48.319 --> 01:09:49.790
"Profusion of flowers."

01:09:49.790 --> 01:09:55.280
And our goal is to count up these
votes to see which is the most popular.

01:09:55.280 --> 01:10:01.670
Well, let's try using .import again to
actually turn this CSV into our very

01:10:01.670 --> 01:10:03.530
own SQLite database.

01:10:03.530 --> 01:10:08.240
Let me go back to my [? TERMINAL, ?]
and I'll type sqlite3 votes.db

01:10:08.240 --> 01:10:10.580
to create a votes database.

01:10:10.580 --> 01:10:11.630
I'll hit Enter.

01:10:11.630 --> 01:10:14.390
And now, I can use .import.

01:10:14.390 --> 01:10:21.440
I could say .import a CSV called
votes.csv into a table also called

01:10:21.440 --> 01:10:22.370
votes.

01:10:22.370 --> 01:10:23.390
I'll Enter now.

01:10:23.390 --> 01:10:28.280
And now if I SELECT [? star ?]
FROM "votes" semicolon,

01:10:28.280 --> 01:10:33.210
I'll see all of my votes
now in a single table.

01:10:33.210 --> 01:10:35.040
So I want to tally these votes.

01:10:35.040 --> 01:10:37.790
And we could do that using a
technique we learned back in the week

01:10:37.790 --> 01:10:40.010
on querying and relating.

01:10:40.010 --> 01:10:44.930
What if I wanted to group by my
titles and count up those groups,

01:10:44.930 --> 01:10:45.990
I could do that.

01:10:45.990 --> 01:10:50.990
I could say SELECT, let's say,
"title," and also the COUNT of "title,"

01:10:50.990 --> 01:10:55.530
and [? they ?] count up the votes
for each title FROM, in this case,

01:10:55.530 --> 01:10:58.050
my "votes" table like this.

01:10:58.050 --> 01:11:03.510
Now, I want to count not just each vote
individually, but I want to group them.

01:11:03.510 --> 01:11:07.870
I want to count votes by group
where that group is, in this case,

01:11:07.870 --> 01:11:09.610
the "title" itself.

01:11:09.610 --> 01:11:13.110
So let me try running this
query and hitting Enter.

01:11:13.110 --> 01:11:17.370
Well, [? I'll ?] [? see ?] I've
started to count some votes,

01:11:17.370 --> 01:11:23.130
like I have Imaginative landscape here,
Spring outing, Farmers working at dawn,

01:11:23.130 --> 01:11:25.110
Spring outing down below here too.

01:11:25.110 --> 01:11:28.950
But I seem to be missing something,
like I shouldn't have this many groups.

01:11:28.950 --> 01:11:30.570
I should only have four groups.

01:11:30.570 --> 01:11:32.940
Why do I have so many?

01:11:32.940 --> 01:11:35.010
Well, it seems like there
are some typos here.

01:11:35.010 --> 01:11:37.660
What typos do you see?

01:11:37.660 --> 01:11:40.680
Feel free to raise your hand.

01:11:40.680 --> 01:11:43.920
SPEAKER 15: So I think
that it is a title--

01:11:43.920 --> 01:11:49.215
when you count it is case sensitive,
so maybe it is counting one by one.

01:11:49.215 --> 01:11:50.340
SPEAKER 1: Yeah, good idea.

01:11:50.340 --> 01:11:53.670
So maybe it's case sensitive our
[? GROUP ?] [? BY, ?] and you'd be

01:11:53.670 --> 01:11:56.550
correct if you said that. [? GROUP ?]
[? BY ?] is, in this case,

01:11:56.550 --> 01:11:57.910
case sensitive.

01:11:57.910 --> 01:12:01.020
So if we see here, we see
Farmers Working at Dawn,

01:12:01.020 --> 01:12:04.890
all capitalized, at least the
Farmers Working and Dawn capitalized.

01:12:04.890 --> 01:12:07.230
Well, that won't be
part of the same group

01:12:07.230 --> 01:12:13.590
as let's say Farmers working at dawn
all properly capitalized, in this case,

01:12:13.590 --> 01:12:17.940
with lowercase working and lowercase
dawn as it is the MFA collections.

01:12:17.940 --> 01:12:21.360
So we could probably fix
some capitalization here.

01:12:21.360 --> 01:12:23.240
What other things could we fix?

01:12:23.240 --> 01:12:25.665
So let's hear from one more person.

01:12:25.665 --> 01:12:26.790
Let's go to [? Lawrence. ?]

01:12:26.790 --> 01:12:29.663
SPEAKER 16: [? The ?]
[? count ?] on each typo.

01:12:29.663 --> 01:12:31.830
SPEAKER 1: There are some
typos in here, absolutely.

01:12:31.830 --> 01:12:35.400
So if I look at some of
these, I see not farmers,

01:12:35.400 --> 01:12:37.620
but [? Famers ?] working at dawn.

01:12:37.620 --> 01:12:40.410
I might see instead of
Imaginative landscape,

01:12:40.410 --> 01:12:43.470
I just see [? Imagintive ?]
landscape, so definitely some

01:12:43.470 --> 01:12:46.120
typos in here to fix as well.

01:12:46.120 --> 01:12:49.380
So let's start working on using our
[? UPDATE ?] tool or [? UPDATE ?]

01:12:49.380 --> 01:12:53.230
keyword to fix some of this data,
to clean it up, so to speak,

01:12:53.230 --> 01:12:56.460
so we can count up our
votes for these artworks.

01:12:56.460 --> 01:13:02.550
Come back to my [? TERMINAL. ?] And now,
let's try updating to at least remove

01:13:02.550 --> 01:13:04.170
some of this [? whitespace ?] here.

01:13:04.170 --> 01:13:07.900
You might notice that Imaginative
landscape has a space before it,

01:13:07.900 --> 01:13:09.570
and so does Spring outing here.

01:13:09.570 --> 01:13:13.860
That space counts as part of the
title, and I want to remove it.

01:13:13.860 --> 01:13:15.850
So what could I do?

01:13:15.850 --> 01:13:21.750
I could try using a function in SQLite,
one called [? trim. ?] [? Trim ?] takes

01:13:21.750 --> 01:13:26.020
a string and just removes leading and
trailing [? whitespace, ?] [INAUDIBLE]

01:13:26.020 --> 01:13:28.650
back only the characters in the middle.

01:13:28.650 --> 01:13:29.830
So let me try that.

01:13:29.830 --> 01:13:34.410
I'll say UPDATE "votes,"
my table here, and SET

01:13:34.410 --> 01:13:38.710
the "title" column equal to what?

01:13:38.710 --> 01:13:41.130
Well, the trimmed version of title.

01:13:41.130 --> 01:13:42.810
Trim looks like this--

01:13:42.810 --> 01:13:47.500
trim, and then some parentheses
to say the input to this function.

01:13:47.500 --> 01:13:49.620
I'll give it the "title" column.

01:13:49.620 --> 01:13:51.610
Now, I'll hit semicolon.

01:13:51.610 --> 01:13:54.250
And this query will run as follows.

01:13:54.250 --> 01:13:56.370
It will update the votes table.

01:13:56.370 --> 01:13:59.880
And inside the votes table
set, the title column

01:13:59.880 --> 01:14:04.950
equal to the trimmed version of
whatever values are inside that column,

01:14:04.950 --> 01:14:09.210
trimming here means remove trailing
and leading [? whitespace. ?]

01:14:09.210 --> 01:14:11.850
I don't need to apply any condition.

01:14:11.850 --> 01:14:15.270
I don't need to say WHERE dot
dot dot, because I want this

01:14:15.270 --> 01:14:18.910
to apply to all rows in my data set.

01:14:18.910 --> 01:14:20.220
Now, I'll hit Enter.

01:14:20.220 --> 01:14:21.750
And let me try running this again.

01:14:21.750 --> 01:14:24.840
I'll say up arrow to
get back my old query.

01:14:24.840 --> 01:14:29.430
Now, I'll try counting again, and
I think we're making some progress,

01:14:29.430 --> 01:14:31.620
like I can see that I've removed--

01:14:31.620 --> 01:14:34.890
[? removed ?] [? quote ?] [? unquote, ?]
those titles that had leading

01:14:34.890 --> 01:14:37.800
or trailing [? whitespace. ?]
They're now part of a group

01:14:37.800 --> 01:14:41.400
because their titles
now match those groups.

01:14:41.400 --> 01:14:42.120
Let's keep going.

01:14:42.120 --> 01:14:44.620
We noticed before, we had
some capitalization issues.

01:14:44.620 --> 01:14:46.930
So let's fix those as well.

01:14:46.930 --> 01:14:50.670
I can go back to my [? TERMINAL, ?] and
we could introduce this other function,

01:14:50.670 --> 01:14:52.170
one called [? upper. ?]

01:14:52.170 --> 01:14:55.620
[? Upper ?] converts
a string like Farmers

01:14:55.620 --> 01:14:59.490
working at dawn to be all in
uppercase, kind of shout it out.

01:14:59.490 --> 01:15:03.060
And that's useful for us here because
we could kind of force everything

01:15:03.060 --> 01:15:07.840
to uppercase, thereby removing any
inconsistencies in capitalization.

01:15:07.840 --> 01:15:08.950
So let me try that.

01:15:08.950 --> 01:15:12.130
I'll say UPDATE, again,
the "votes" table.

01:15:12.130 --> 01:15:18.480
And this time, I'll set "title" equal
to the uppercase version of "title."

01:15:18.480 --> 01:15:22.200
So upper here is a
function in SQLite that

01:15:22.200 --> 01:15:25.530
takes some values associated with
the column and [? uppercases ?]

01:15:25.530 --> 01:15:30.420
the entire thing to remove any
inconsistencies in capitalization.

01:15:30.420 --> 01:15:34.620
Again, no condition here, because
I want to apply this to all rows.

01:15:34.620 --> 01:15:35.730
I'll hit Enter.

01:15:35.730 --> 01:15:41.340
And now if I SELECT [? star ?]
FROM "collections" semicolon--

01:15:41.340 --> 01:15:43.618
oops, SELECT [? star ?]
FROM "votes," in this case--

01:15:43.618 --> 01:15:44.910
no longer as collections table.

01:15:44.910 --> 01:15:48.030
I'll say SELECT [? star ?]
FROM "votes" semicolon,

01:15:48.030 --> 01:15:51.390
I should see all of
my titles capitalized.

01:15:51.390 --> 01:15:53.130
But now, I want to group by.

01:15:53.130 --> 01:15:54.060
So let's try this.

01:15:54.060 --> 01:16:00.120
I'll go up arrow to get back to my old
query that GROUP BY "title," hit Enter.

01:16:00.120 --> 01:16:02.760
And now, we're still
making some more progress.

01:16:02.760 --> 01:16:05.970
I'm reducing my number of
groups until I get to 4.

01:16:05.970 --> 01:16:10.260
But there's still some typos, like
we see Famers working at dawn,

01:16:10.260 --> 01:16:14.130
or Farmers working, or
Farmesr working at dawn.

01:16:14.130 --> 01:16:16.560
We want these to be included
as part of this group--

01:16:16.560 --> 01:16:18.870
Farmers working at dawn.

01:16:18.870 --> 01:16:22.240
So we should fix some
of these typos here.

01:16:22.240 --> 01:16:25.380
Let me go back to my [? TERMINAL. ?]
Now, let's try this out.

01:16:25.380 --> 01:16:29.040
I could, in this case, just
manually correct these.

01:16:29.040 --> 01:16:31.170
You could imagine me
doing something like this.

01:16:31.170 --> 01:16:36.900
UPDATE "votes" and SET the
"title" equal to all caps

01:16:36.900 --> 01:16:43.170
"FARMERS WORKING AT DAWN," now
including a condition, WHERE the "title"

01:16:43.170 --> 01:16:50.617
presently is equal to for "FAMERS
WORKING" like this, semicolon--

01:16:50.617 --> 01:16:51.450
or not Famers work--

01:16:51.450 --> 01:16:54.480
I think it's Farmers Working, just all--

01:16:54.480 --> 01:16:55.620
in two words.

01:16:55.620 --> 01:16:56.703
That was one title we saw.

01:16:56.703 --> 01:16:57.620
[? TRANSCRIPT BREAK ?]

01:16:57.620 --> 01:16:59.950
SPEAKER: I could manually
update it a bit like this.

01:16:59.950 --> 01:17:02.610
Let me hit Enter, and
let me do the same query

01:17:02.610 --> 01:17:05.550
by hitting the up arrow on my keyboard.

01:17:05.550 --> 01:17:07.910
Let me find this one,
hit Enter, and now we

01:17:07.910 --> 01:17:10.880
can see that that title,
"Farmers Working,"

01:17:10.880 --> 01:17:13.040
is now just part of the main group.

01:17:13.040 --> 01:17:15.470
"Farmers Working At Dawn."

01:17:15.470 --> 01:17:17.040
I could keep going here.

01:17:17.040 --> 01:17:21.560
I could try to convert "Farmers Working
At Dawn" to "Farmers Working At Dawn."

01:17:21.560 --> 01:17:22.500
Let me try it.

01:17:22.500 --> 01:17:24.110
I could say select.

01:17:24.110 --> 01:17:25.952
Let's say-- not select.

01:17:25.952 --> 01:17:26.535
Let me update.

01:17:26.535 --> 01:17:29.690
Let me update in this
case votes, and set

01:17:29.690 --> 01:17:37.340
title equal to "Farmers Working
At Dawn" where the title equals

01:17:37.340 --> 01:17:41.330
in this case "Farmers Working At Dawn."

01:17:41.330 --> 01:17:47.330
Semicolon, Enter, let me hit up arrow
to go back to my old query to group by,

01:17:47.330 --> 01:17:48.890
and now we're getting somewhere.

01:17:48.890 --> 01:17:52.370
I'm seeing now only two groups
for "Farmers Working At Dawn,"

01:17:52.370 --> 01:17:55.190
but one more to fix.

01:17:55.190 --> 01:17:59.150
Now we'd be here for a while
if I spend all this time

01:17:59.150 --> 01:18:04.380
one by one fixing these typos,
so there's probably a better way.

01:18:04.380 --> 01:18:05.150
There is.

01:18:05.150 --> 01:18:08.090
This one is going to be
using the like keyword.

01:18:08.090 --> 01:18:10.340
We don't have to say equal so much.

01:18:10.340 --> 01:18:11.840
We can just say like.

01:18:11.840 --> 01:18:16.850
Now, I could try to match any title that
looks like "Farmers Working At Dawn"

01:18:16.850 --> 01:18:19.400
and update it to be the correct title.

01:18:19.400 --> 01:18:21.390
So let's try that here.

01:18:21.390 --> 01:18:29.150
I could now try update votes and
set title equal to "Farmers Working

01:18:29.150 --> 01:18:30.980
At Dawn."

01:18:30.980 --> 01:18:38.490
"Farmers Working At Dawn"
where the title is like--

01:18:38.490 --> 01:18:42.000
the title is like some
pattern I could give.

01:18:42.000 --> 01:18:48.210
Now for this data set, it turns out that
the only artwork that begins with F-A

01:18:48.210 --> 01:18:49.980
is Farmers Working At Dawn.

01:18:49.980 --> 01:18:55.110
So to match any title with F-A
and then anything afterwards,

01:18:55.110 --> 01:19:01.110
I could use F-A percent, and
then close out that pattern.

01:19:01.110 --> 01:19:04.800
Notice how I don't have to
have consistent capitalization.

01:19:04.800 --> 01:19:09.420
Like is case insensitive, and
notice how, too, the percent matches

01:19:09.420 --> 01:19:15.870
anything that comes after F-A.
Now in this data set, this is OK,

01:19:15.870 --> 01:19:20.490
but in a larger data set, you probably
want to avoid this kind of query

01:19:20.490 --> 01:19:23.580
because you could match much more
than Farmers Working At Dawn.

01:19:23.580 --> 01:19:27.600
You'd match anything that begins
with F-A, so be careful with this

01:19:27.600 --> 01:19:32.370
and think through what pattern
should you use to fix up your titles.

01:19:32.370 --> 01:19:36.120
Let me just hit Enter here, and now
let me try hitting up arrow again

01:19:36.120 --> 01:19:40.530
to show you the result, and now
we're in a pretty good place.

01:19:40.530 --> 01:19:44.820
I see all of my titles like Farmers
Working At Dawn are correctly

01:19:44.820 --> 01:19:50.350
capitalized, consistently
spelled, and all in one place.

01:19:50.350 --> 01:19:52.200
OK, let's try the others here.

01:19:52.200 --> 01:19:55.530
I also have "Imaginative landscape
which could be a little bit

01:19:55.530 --> 01:19:56.790
better designed here.

01:19:56.790 --> 01:20:00.210
A little bit better grouped as
well by fixing some of these typos.

01:20:00.210 --> 01:20:05.190
Well, here, I could try to
match on to maybe imaginative.

01:20:05.190 --> 01:20:08.380
That seems like a long enough word
that it could be useful for me here.

01:20:08.380 --> 01:20:10.260
So I'll say something like select--

01:20:10.260 --> 01:20:10.770
not select.

01:20:10.770 --> 01:20:19.050
I'll do update votes and set the title
equal to "Imaginative Landscape."

01:20:19.050 --> 01:20:26.640
Now I want to match any rows that
have a title like "Imaginative space

01:20:26.640 --> 01:20:30.150
percent," end quote, semicolon.

01:20:30.150 --> 01:20:35.010
Now I'll match any title that
has imaginative as a full word

01:20:35.010 --> 01:20:37.650
at the very beginning of this phrase.

01:20:37.650 --> 01:20:41.100
This is, I would say,
a better query than F-A

01:20:41.100 --> 01:20:44.100
because F-A could match much
more than this painting name.

01:20:44.100 --> 01:20:48.150
Here, if we have only one painting
that begins with imaginative,

01:20:48.150 --> 01:20:50.232
this could work for us now.

01:20:50.232 --> 01:20:50.940
So I'll try this.

01:20:50.940 --> 01:20:55.100
I'll say Enter, and now,
let me rerun the query.

01:20:55.100 --> 01:20:57.832
Group by, and we're so close.

01:20:57.832 --> 01:20:58.790
There's still one more.

01:20:58.790 --> 01:21:02.090
"Imagintive Landscape,"
so for this one, I

01:21:02.090 --> 01:21:04.970
might just need to
manually update that piece.

01:21:04.970 --> 01:21:11.480
I could say let's do update,
votes, and set title equal to

01:21:11.480 --> 01:21:16.460
in this case "Imaginative Landscape"
where, of course, the title currently

01:21:16.460 --> 01:21:21.140
is "Imagintive Landscape"
without the A in there.

01:21:21.140 --> 01:21:23.240
Now hit semicolon to run this query.

01:21:23.240 --> 01:21:27.800
I'll show you the results, and now,
I think we're literally almost there.

01:21:27.800 --> 01:21:29.150
There's only one more to do.

01:21:29.150 --> 01:21:30.530
Profusion of Flowers.

01:21:30.530 --> 01:21:32.090
Let's fix that one now.

01:21:32.090 --> 01:21:34.880
I could say select votes--

01:21:34.880 --> 01:21:35.720
not select.

01:21:35.720 --> 01:21:36.290
Update.

01:21:36.290 --> 01:21:37.332
I'm always updating here.

01:21:37.332 --> 01:21:41.510
Update votes, and then
we want to set the title

01:21:41.510 --> 01:21:45.110
column to be equal to profusion--

01:21:45.110 --> 01:21:49.160
"Profusion of Flowers" like this.

01:21:49.160 --> 01:21:55.880
Then where the title is like, in this
case, profusion at the beginning.

01:21:55.880 --> 01:21:59.330
Notice how I can't use
perfusion space and then

01:21:59.330 --> 01:22:04.190
afterwards because we have one title
that is literally just "Profusion,"

01:22:04.190 --> 01:22:05.030
plain and simple.

01:22:05.030 --> 01:22:06.095
No spaces afterwards.

01:22:06.095 --> 01:22:07.220
No spaces at the beginning.

01:22:07.220 --> 01:22:10.860
To match that, I should just
say profusion at the beginning.

01:22:10.860 --> 01:22:14.660
Now I can hit Enter, and
if I do my usual query

01:22:14.660 --> 01:22:17.270
to show you the votes
grouped by title, I

01:22:17.270 --> 01:22:21.800
see I'm finally at my correct,
fully-formatted data set.

01:22:21.800 --> 01:22:24.230
Farmers Working At Dawn has six votes.

01:22:24.230 --> 01:22:28.460
Imaginative Landscape has
five, and so on down the row.

01:22:28.460 --> 01:22:33.080
So here we've seen update being
used not just to associate artists

01:22:33.080 --> 01:22:36.140
with new paintings, but
also to clean our data,

01:22:36.140 --> 01:22:39.110
to make sure we can actually
tally votes up in a way that's

01:22:39.110 --> 01:22:42.750
consistent and clean in this case.

01:22:42.750 --> 01:22:45.420
What questions do we have on update now?

01:22:48.380 --> 01:22:50.200
Let's go to [? Vixay. ?]

01:22:50.200 --> 01:22:51.950
[? VIXAY: ?] Yeah, so
what I wanted to ask

01:22:51.950 --> 01:22:55.430
is if there is a specific
function for like lowercase.

01:22:55.430 --> 01:23:00.037
So there's a function for upper, and
these are the ones for all lowercase.

01:23:00.037 --> 01:23:02.120
SPEAKER: Yeah, a good
corresponding question here.

01:23:02.120 --> 01:23:03.980
We used upper, but is there lower?

01:23:03.980 --> 01:23:05.600
Turns out, there is a lower function.

01:23:05.600 --> 01:23:09.680
It's spelled lower, and is in
all lowercase like upper was,

01:23:09.680 --> 01:23:11.570
so I could use that as well.

01:23:11.570 --> 01:23:15.950
There is a whole list of what SQLite
calls these scalar functions, functions

01:23:15.950 --> 01:23:19.940
that take in some set of values
and return to you some modified

01:23:19.940 --> 01:23:21.480
version of those values.

01:23:21.480 --> 01:23:24.170
So if you do some googling, look
for SQLite scalar functions.

01:23:24.170 --> 01:23:30.420
You can find all those functions in
one place in the SQLite documentation.

01:23:30.420 --> 01:23:32.200
Let's take one more question here.

01:23:32.200 --> 01:23:33.620
Let's go to Rupinder.

01:23:33.620 --> 01:23:38.390
RUPINDER: So, yeah, my question is so
instead of updating all the titles,

01:23:38.390 --> 01:23:42.320
is it possible to create a new
column with the four categories

01:23:42.320 --> 01:23:45.355
that we want to have actually?

01:23:45.355 --> 01:23:47.230
SPEAKER: Yeah, I could
see that working, too.

01:23:47.230 --> 01:23:48.855
So there's a few ways to fix this data.

01:23:48.855 --> 01:23:53.060
You could imagine adding a
new column, and maybe you

01:23:53.060 --> 01:23:59.542
assign values to the rows in that column
based on what you see in the title

01:23:59.542 --> 01:24:00.500
that they've given you.

01:24:00.500 --> 01:24:03.260
So if you can match
something like imaginative,

01:24:03.260 --> 01:24:06.500
and you'd know that all your
paintings associated with imaginative

01:24:06.500 --> 01:24:10.340
have the same title, you could group
them into the one category, the two

01:24:10.340 --> 01:24:12.110
category, and so on.

01:24:12.110 --> 01:24:14.270
Here, you wouldn't be
modifying those titles.

01:24:14.270 --> 01:24:16.970
You'd just be trying
to strategically apply

01:24:16.970 --> 01:24:19.610
different categories, which could
also work as a solution here,

01:24:19.610 --> 01:24:22.430
too, so good thinking.

01:24:22.430 --> 01:24:27.740
OK, so we've seen now how to
insert data, how to update data,

01:24:27.740 --> 01:24:30.250
and previously, how to delete data.

01:24:30.250 --> 01:24:35.900
What we can still do though, are
learn how to have other SQL command--

01:24:35.900 --> 01:24:39.818
other SQL statements that can run
after we make our very own SQL

01:24:39.818 --> 01:24:40.610
statements as well.

01:24:40.610 --> 01:24:44.150
We'll come back and talk about
these things called triggers.

01:24:44.150 --> 01:24:45.840
And we're back.

01:24:45.840 --> 01:24:50.630
So we've seen now a whole collection
of ways to write data to a database.

01:24:50.630 --> 01:24:56.080
We've seen how to insert data, how
to update data, and even delete data.

01:24:56.080 --> 01:24:59.000
What we'll see now though,
this idea of a trigger,

01:24:59.000 --> 01:25:04.670
a way of writing a SQL statement to run
in response to some other SQL statement

01:25:04.670 --> 01:25:08.780
like an insert, an update, or a delete.

01:25:08.780 --> 01:25:12.890
Let's consider our museum
yet again, and let's say they

01:25:12.890 --> 01:25:15.860
have a schema a bit like this one now.

01:25:15.860 --> 01:25:18.920
They have items in their
collections, but they also

01:25:18.920 --> 01:25:21.290
have a transactions table.

01:25:21.290 --> 01:25:26.390
Now wouldn't it be nice if whenever I
deleted something from the collections

01:25:26.390 --> 01:25:31.190
table, it would show up in
transactions as sold, having been

01:25:31.190 --> 01:25:33.120
sold from the museum's collections.

01:25:33.120 --> 01:25:36.120
Well, I could use a
trigger to do just that.

01:25:36.120 --> 01:25:38.270
Let's see it visually first though.

01:25:38.270 --> 01:25:43.160
I'll say, let's try to delete "Spring
Outing" from our collections table.

01:25:43.160 --> 01:25:47.300
Well, if I do that, I could pretty
easily do it with delete from, right?

01:25:47.300 --> 01:25:50.540
Now it's no longer part
of my database, but if I

01:25:50.540 --> 01:25:55.760
have created a trigger to
run an insert whenever I hear

01:25:55.760 --> 01:25:59.210
or have listened to a delete
on my collections table,

01:25:59.210 --> 01:26:04.370
I could then see that same
title in my transactions table

01:26:04.370 --> 01:26:08.250
as sold all automatically.

01:26:08.250 --> 01:26:12.320
Let's think to maybe the museum
acquires some piece of artwork.

01:26:12.320 --> 01:26:15.440
Maybe they actually add something
new to their collection.

01:26:15.440 --> 01:26:19.880
We want that to show up in our
transactions as well with the action of

01:26:19.880 --> 01:26:20.570
bought.

01:26:20.570 --> 01:26:21.440
Well, let's try it.

01:26:21.440 --> 01:26:23.490
I could probably add a new row here.

01:26:23.490 --> 01:26:28.280
I could say let's add "Peonies and
Butterfly" to my collection here.

01:26:28.280 --> 01:26:31.700
Now, if I have the
appropriate trigger, I

01:26:31.700 --> 01:26:36.320
could run an insert into on
transactions automatically

01:26:36.320 --> 01:26:40.190
that would insert this same
title with the action of bot,

01:26:40.190 --> 01:26:45.110
thereby keeping a log of all of our
sold pieces of artwork and all of our

01:26:45.110 --> 01:26:47.360
bought pieces of artwork.

01:26:47.360 --> 01:26:50.000
Now to create what we're
calling a trigger here,

01:26:50.000 --> 01:26:52.430
we can use the following syntax.

01:26:52.430 --> 01:26:57.620
I could say first create
trigger, and then some name.

01:26:57.620 --> 01:27:00.260
So I tend to give triggers
names to identify them

01:27:00.260 --> 01:27:04.670
among all of my database [INAUDIBLE],,
like my tables and so on.

01:27:04.670 --> 01:27:08.840
Here, after I say create
trigger, I have to specify.

01:27:08.840 --> 01:27:15.080
Should this trigger, this statement,
run after or before some other SQL

01:27:15.080 --> 01:27:16.190
statement?

01:27:16.190 --> 01:27:18.350
Let's keep going with before here.

01:27:18.350 --> 01:27:22.850
Maybe it runs before an
insert on some table.

01:27:22.850 --> 01:27:23.960
That's fine.

01:27:23.960 --> 01:27:29.190
It could also run before an
update of a column on some table.

01:27:29.190 --> 01:27:34.190
It could even run before a delete
on some table, and keep in mind,

01:27:34.190 --> 01:27:38.450
this could be before or after any
of these kinds of SQL statements.

01:27:38.450 --> 01:27:42.770
In this case, we have
insert, update, and delete.

01:27:42.770 --> 01:27:48.050
Now after we say this, what we
should say, too, is for each row.

01:27:48.050 --> 01:27:53.210
For each row means that if I were
to maybe delete multiple rows,

01:27:53.210 --> 01:27:57.410
I should run my SQL statement
for each row that I delete.

01:27:57.410 --> 01:28:03.980
If I delete two rows, I should run
this statement coming soon two times.

01:28:03.980 --> 01:28:09.470
OK, so now we've set up our trigger,
at least kind of the data, the setup

01:28:09.470 --> 01:28:10.550
for it here.

01:28:10.550 --> 01:28:15.620
What we haven't done yet is specify
what statement should run whenever

01:28:15.620 --> 01:28:18.020
we see a delete on particular table.

01:28:18.020 --> 01:28:19.640
Well, let's try this.

01:28:19.640 --> 01:28:21.590
Here I could say begin.

01:28:21.590 --> 01:28:24.860
Begin means here comes my
statement I want to run

01:28:24.860 --> 01:28:28.370
whenever I hear a delete on this table.

01:28:28.370 --> 01:28:29.840
Now, all right.

01:28:29.840 --> 01:28:31.430
That query down below.

01:28:31.430 --> 01:28:34.700
That statement inside my
begin, and I'll finish it off

01:28:34.700 --> 01:28:37.910
with an end to say this is
the end of my statement,

01:28:37.910 --> 01:28:41.600
and my entire trigger looks like
all of this lines of code here

01:28:41.600 --> 01:28:47.210
to say listen for a delete and run
this statement in here for each row

01:28:47.210 --> 01:28:50.530
that I delete, or update, or insert.

01:28:50.530 --> 01:28:53.900
So let's try actually
implementing in this case

01:28:53.900 --> 01:28:57.460
the museum's collections where
they're trying to sell or buy

01:28:57.460 --> 01:28:59.800
pieces of artwork, and
automatically add them

01:28:59.800 --> 01:29:03.760
to their transaction table whenever
they create or delete some piece of data

01:29:03.760 --> 01:29:05.710
from their collections.

01:29:05.710 --> 01:29:10.300
Go back to my terminal here,
and let me pull up a database.

01:29:10.300 --> 01:29:14.260
I'll do SQLite 3 mfa.db,
and you'll notice

01:29:14.260 --> 01:29:17.380
that this is the same schema
that we've had before.

01:29:17.380 --> 01:29:21.490
We have collections, artists,
and the created table.

01:29:21.490 --> 01:29:24.620
What's missing here, though,
is our transactions table.

01:29:24.620 --> 01:29:26.530
So let me make that for us here.

01:29:26.530 --> 01:29:30.580
I'll say let me go create
table, transactions,

01:29:30.580 --> 01:29:36.000
and now let me give it an
ID column of type integer.

01:29:36.000 --> 01:29:41.030
Now, I'll also give it a title column
as we saw before called "Text."

01:29:41.030 --> 01:29:43.310
It's in this column
[INAUDIBLE],, the title

01:29:43.310 --> 01:29:46.100
of the piece we either bought or sold.

01:29:46.100 --> 01:29:50.960
Now, let me also include the
action column, also of type text

01:29:50.960 --> 01:29:55.210
to say whether we bought
or sold this piece.

01:29:55.210 --> 01:30:00.050
OK, finally, I'll make the
ID column my primary key.

01:30:00.050 --> 01:30:06.080
And if I say semicolon here and hit
Enter, I now have created this table.

01:30:06.080 --> 01:30:10.210
So if I say dot schema, notice
how I can now see my transactions

01:30:10.210 --> 01:30:13.990
table down below, and
I'm safe to run insert

01:30:13.990 --> 01:30:18.100
on this table to add some data to it.

01:30:18.100 --> 01:30:21.490
Well, let's try now creating
our very first trigger.

01:30:21.490 --> 01:30:25.300
We're going to try to create a trigger
that whenever an item is deleted

01:30:25.300 --> 01:30:28.990
from our collections table, we actually
add it to our transactions table

01:30:28.990 --> 01:30:33.200
with the action of sold, meaning
we sold this particular item.

01:30:33.200 --> 01:30:36.550
So let me say create
trigger, as we saw before,

01:30:36.550 --> 01:30:38.830
and give it some name to identify it.

01:30:38.830 --> 01:30:41.030
I'll call this one sell.

01:30:41.030 --> 01:30:47.290
Now, I'll say I want to run this trigger
before I delete on the collections

01:30:47.290 --> 01:30:48.170
table.

01:30:48.170 --> 01:30:52.870
So before I run the delete
query on my collections table,

01:30:52.870 --> 01:30:56.980
I want to run this statement instead.

01:30:56.980 --> 01:31:00.830
I'll say for each row, begin.

01:31:00.830 --> 01:31:04.180
I want to now give you some
statement to run for each row

01:31:04.180 --> 01:31:06.730
that I delete from collections.

01:31:06.730 --> 01:31:08.350
Let's make it this query here.

01:31:08.350 --> 01:31:12.160
On 1, 2, 3, 4 spaces
to indent and make sure

01:31:12.160 --> 01:31:18.850
this is clearly the query I want to run
after I run the delete on collections.

01:31:18.850 --> 01:31:23.020
I'll say I want to
insert into transactions.

01:31:23.020 --> 01:31:27.730
I want to insert into the
title and action columns.

01:31:27.730 --> 01:31:29.410
What value should I insert?

01:31:29.410 --> 01:31:34.210
Well, I should insert the old
title, and actually, in triggers,

01:31:34.210 --> 01:31:37.330
you get access to this
keyword called old.

01:31:37.330 --> 01:31:41.600
This is the title that we've
just deleted from our collection.

01:31:41.600 --> 01:31:46.540
So old.title gives me access to
the old row, the row we deleted,

01:31:46.540 --> 01:31:49.160
its title column in particular.

01:31:49.160 --> 01:31:54.210
Now, let me add in the action, which
is just plain and simple "sold."

01:31:54.210 --> 01:31:58.820
So this is the query to run
before I delete on collections.

01:31:58.820 --> 01:32:03.350
I will take the title of the row
I'm about to delete, and insert it

01:32:03.350 --> 01:32:06.950
into transactions along
with the action sold.

01:32:06.950 --> 01:32:11.070
Let me type a semicolon here to say this
is my entire query, hit Enter again,

01:32:11.070 --> 01:32:14.000
and now I can end that
particular statement

01:32:14.000 --> 01:32:18.110
I want to run after I hear
a delete on collections.

01:32:18.110 --> 01:32:21.770
So I'll hit Enter now, and
let me type dot schema.

01:32:21.770 --> 01:32:26.480
I can actually see this
trigger is part of my schema.

01:32:26.480 --> 01:32:29.810
I can see create trigger sell
before delete on transactions

01:32:29.810 --> 01:32:33.410
underneath the transactions table.

01:32:33.410 --> 01:32:36.680
OK, let's try putting
this into action here.

01:32:36.680 --> 01:32:39.470
Let me try actually deleting
something from our collections table.

01:32:39.470 --> 01:32:44.090
I'll say delete from collections
where the title equals

01:32:44.090 --> 01:32:45.920
"Profusion of Flowers."

01:32:45.920 --> 01:32:49.460
I want to sell this particular piece
of artwork from our collection.

01:32:49.460 --> 01:32:54.530
I'll hit semicolon here, and now
nothing seems to happen as usual,

01:32:54.530 --> 01:32:58.290
but if I select star from collections--

01:32:58.290 --> 01:33:02.820
from collections, I
see that piece is gone.

01:33:02.820 --> 01:33:08.070
It's no longer in here, and if I
select star from transactions--

01:33:08.070 --> 01:33:12.720
select star from transactions,
I actually see it's been sold.

01:33:12.720 --> 01:33:15.840
So automatically, I added
"Profusion of Flowers"

01:33:15.840 --> 01:33:18.810
with the action of sold
to my transactions table.

01:33:18.810 --> 01:33:21.400
I did not type insert myself.

01:33:21.400 --> 01:33:24.960
I instead created a trigger
to run this statement,

01:33:24.960 --> 01:33:29.720
insert into, whenever I
delete it on collections.

01:33:29.720 --> 01:33:31.010
Let's try one more.

01:33:31.010 --> 01:33:32.900
We want to not just be
able to sell artwork,

01:33:32.900 --> 01:33:36.260
but also add it to our
collection by buying it.

01:33:36.260 --> 01:33:39.050
So I create a trigger called buy.

01:33:39.050 --> 01:33:44.750
Let me type create
trigger "buy," and now I

01:33:44.750 --> 01:33:52.460
want to run this statement forthcoming
after an insert on collections.

01:33:52.460 --> 01:33:58.820
So I want to first insert a new item to
collections, and when I do, after I do,

01:33:58.820 --> 01:34:01.610
I want to run this query coming next.

01:34:01.610 --> 01:34:07.590
Now, I want to, for each row,
begin the statement I want to run.

01:34:07.590 --> 01:34:12.380
So as we said before, I want to
insert into the transactions table.

01:34:12.380 --> 01:34:18.090
I want to insert into the title
and action columns, just like this.

01:34:18.090 --> 01:34:21.380
Now, I want to insert some values.

01:34:21.380 --> 01:34:23.060
The values I want to insert--

01:34:23.060 --> 01:34:27.410
well, I want to insert the new
title, the new row I've just added.

01:34:27.410 --> 01:34:30.740
So in addition to old,
we also have new when

01:34:30.740 --> 01:34:33.320
I'm talking when I'm using
a trigger on an insert.

01:34:33.320 --> 01:34:40.100
So I'll say new.title, where again, new
is the new row or rows I'm inserting,

01:34:40.100 --> 01:34:44.930
and the dot title is
a column called title.

01:34:44.930 --> 01:34:47.360
Now, here I'll say
the action was bought.

01:34:47.360 --> 01:34:49.070
We bought this artwork.

01:34:49.070 --> 01:34:52.250
I'll hit semicolon here,
and now, I'll say end

01:34:52.250 --> 01:34:56.510
to say this is the end of my statement
I want to run for every row I

01:34:56.510 --> 01:34:58.580
insert on collections.

01:34:58.580 --> 01:35:00.200
Let me hit Enter here.

01:35:00.200 --> 01:35:01.520
I've created this trigger.

01:35:01.520 --> 01:35:06.180
I can type dot schema to
see it in my actual schema.

01:35:06.180 --> 01:35:07.910
Now, let's try inserting.

01:35:07.910 --> 01:35:11.630
I'll say insert into collections.

01:35:11.630 --> 01:35:15.680
I want to insert into the
title and accession number,

01:35:15.680 --> 01:35:18.950
as well as the acquired columns here.

01:35:18.950 --> 01:35:21.260
I want to insert some new artwork.

01:35:21.260 --> 01:35:24.630
Maybe I sold Profusion of Flowers,
but now I want to just buy it back.

01:35:24.630 --> 01:35:30.510
So I'll say I'll insert "Profusion
of Flowers" is the title.

01:35:30.510 --> 01:35:37.160
The accession number was 56.257,
and the date we acquired it, well,

01:35:37.160 --> 01:35:41.360
that was in 1956/04/12.

01:35:41.360 --> 01:35:43.310
Let's hit semicolon here.

01:35:43.310 --> 01:35:48.770
Now, I'll insert, and I'll say
select star from collections.

01:35:48.770 --> 01:35:53.310
I should see Profusion of
Flowers back in my table.

01:35:53.310 --> 01:35:56.780
Now though, if I say select
star from transactions--

01:35:56.780 --> 01:36:00.020
select star from
transactions, I should see

01:36:00.020 --> 01:36:03.980
I previously sold Profusion of
Flowers, and later bought it back.

01:36:03.980 --> 01:36:08.040
So again, I didn't run
this insert myself.

01:36:08.040 --> 01:36:14.160
I instead triggered it by making my very
own insert on the collections table.

01:36:14.160 --> 01:36:16.430
So this is the power of triggers.

01:36:16.430 --> 01:36:18.830
You're able to run
statements that I myself

01:36:18.830 --> 01:36:23.030
didn't create in response to
statements that I actually did.

01:36:23.030 --> 01:36:27.440
So let me ask here, what questions
do we have on these triggers

01:36:27.440 --> 01:36:29.000
and what they can do for us?

01:36:31.650 --> 01:36:33.230
Let's go to Simon.

01:36:33.230 --> 01:36:38.390
SIMON: I'd like to know
if there are only--

01:36:38.390 --> 01:36:42.440
there is only one SQL
inquiry that you can write

01:36:42.440 --> 01:36:46.170
in a trigger or multiple inquiries.

01:36:46.170 --> 01:36:49.700
SPEAKER: Yeah, we saw begin
and end in our triggers

01:36:49.700 --> 01:36:52.290
here, where begin indicates
the start of our statement,

01:36:52.290 --> 01:36:54.020
and end indicates the end.

01:36:54.020 --> 01:36:56.480
You actually can have
multiple statements

01:36:56.480 --> 01:36:59.450
inside of the begin
and end separated by,

01:36:59.450 --> 01:37:03.130
of course, that usual
semicolon that we have.

01:37:03.130 --> 01:37:07.030
OK, so one final idea here
that we're getting at-- we're

01:37:07.030 --> 01:37:09.370
kind of touching at with
this transactions table is

01:37:09.370 --> 01:37:12.700
this idea of having a soft deletion.

01:37:12.700 --> 01:37:17.380
So when I dropped something
from my collections table,

01:37:17.380 --> 01:37:20.890
it actually ended up in
my transactions table,

01:37:20.890 --> 01:37:24.490
so it wasn't fully deleted in the
sense that I'm keeping a record of it

01:37:24.490 --> 01:37:26.330
somewhere else.

01:37:26.330 --> 01:37:28.930
There's also one more
way among many others

01:37:28.930 --> 01:37:32.123
to implement this idea of soft
deletions, that is not quite fully

01:37:32.123 --> 01:37:34.540
deleting something from my
database but instead keeping it

01:37:34.540 --> 01:37:37.970
around with a log that we've
removed it in some way.

01:37:37.970 --> 01:37:39.940
Let's look at this table here.

01:37:39.940 --> 01:37:45.250
I have collections, but I also
have this column called "deleted,"

01:37:45.250 --> 01:37:49.000
and by default, the
value here will be 0.

01:37:49.000 --> 01:37:54.190
That is, by default, I'll add some
artwork to my collections table

01:37:54.190 --> 01:37:56.980
and deleted will be 0.

01:37:56.980 --> 01:38:01.210
If I want to delete a piece of
artwork though, what could I do?

01:38:01.210 --> 01:38:05.050
Instead of deleting it fully from
my table or removing the row,

01:38:05.050 --> 01:38:07.090
I could mark it as deleted.

01:38:07.090 --> 01:38:10.630
I could change the deleted
column from a 0 to a 1.

01:38:10.630 --> 01:38:13.270
Let's say I wanted to delete
Farmers Working At Dawn

01:38:13.270 --> 01:38:16.330
instead of writing literally
delete from collection

01:38:16.330 --> 01:38:18.670
where title equals
Farmers Working At Dawn, I

01:38:18.670 --> 01:38:23.500
could say update the deleted column of
collections, where the title is Farmers

01:38:23.500 --> 01:38:27.490
Working At Dawn, and
make it not 0, but 1.

01:38:27.490 --> 01:38:30.490
Now, I've marked this item as deleted.

01:38:30.490 --> 01:38:33.430
I could run a query that
excludes deleted items,

01:38:33.430 --> 01:38:37.880
but this row is still
around in my table.

01:38:37.880 --> 01:38:41.410
So let's try implementing
this idea of a soft deletion

01:38:41.410 --> 01:38:43.960
inside of our collections
table so that we

01:38:43.960 --> 01:38:48.310
don't lose records of things that
actually had in our collection.

01:38:48.310 --> 01:38:51.250
I'll come back to my
computer here, and let's work

01:38:51.250 --> 01:38:54.940
on altering our table
for collections here.

01:38:54.940 --> 01:38:58.360
If I type dot schema collections--

01:38:58.360 --> 01:39:04.480
dot schema collections, you should see
I have a table "collections," and also

01:39:04.480 --> 01:39:06.430
the triggers associated with it.

01:39:06.430 --> 01:39:10.420
What I don't have yet in collections,
if I go over here and show you,

01:39:10.420 --> 01:39:12.910
I don't have a deleted column.

01:39:12.910 --> 01:39:18.640
I have ID, title, accession number,
and acquired, but I don't have deleted.

01:39:18.640 --> 01:39:22.390
So as you learned last
week, we can use alter table

01:39:22.390 --> 01:39:27.550
to add a column to our collections
table, and by default, make the value

01:39:27.550 --> 01:39:28.300
0.

01:39:28.300 --> 01:39:29.940
So let's try that.

01:39:29.940 --> 01:39:34.450
I'll go back to my environment,
and now I'll run alter table.

01:39:34.450 --> 01:39:41.440
I might say alter table, and a table I
want to alter, which is "collections."

01:39:41.440 --> 01:39:46.930
Now I want to add the column
"deleted" to collections,

01:39:46.930 --> 01:39:50.410
and I want to make this type
integer, so our whole numbers

01:39:50.410 --> 01:39:56.020
whether positive or negative, and the
default value will be 0 in this case.

01:39:56.020 --> 01:40:01.210
Now, I'll hit Enter, and if I
type dot schema collections,

01:40:01.210 --> 01:40:07.630
I should see I have my very own
deleted column inside of collections.

01:40:07.630 --> 01:40:14.120
If I type, for instance, select
star from collections semicolon,

01:40:14.120 --> 01:40:20.330
I'll see that by default, all of these
values are 0 in my new deleted column.

01:40:20.330 --> 01:40:25.580
So instead of trying to literally
delete from collections,

01:40:25.580 --> 01:40:27.230
I could instead just update it.

01:40:27.230 --> 01:40:31.880
I could say take the updated
column, and flip it from a 0 to a 1.

01:40:31.880 --> 01:40:38.330
Let me try update collections
and set the deleted column

01:40:38.330 --> 01:40:45.830
equal to 1, where let's say the
title equals Farmers Working At Dawn.

01:40:45.830 --> 01:40:48.900
And my query wraps here, but
it should still work just fine.

01:40:48.900 --> 01:40:51.050
I'll hit semicolon, Enter.

01:40:51.050 --> 01:40:53.600
Now, I didn't use delete from.

01:40:53.600 --> 01:40:54.740
I used update.

01:40:54.740 --> 01:40:59.570
If I select star from
collections semicolon here,

01:40:59.570 --> 01:41:03.420
I'll see, well, Farmers Working
At Dawn is in this table,

01:41:03.420 --> 01:41:07.560
but technically-- at least
we marked it as deleted.

01:41:07.560 --> 01:41:12.830
So it seems now, I don't want to use
just select star from collections

01:41:12.830 --> 01:41:14.540
to see what's inside my collection.

01:41:14.540 --> 01:41:18.230
I want to apply a filter
to remove those that have

01:41:18.230 --> 01:41:21.745
a deleted value that's not equal to 0.

01:41:21.745 --> 01:41:22.370
Let's try this.

01:41:22.370 --> 01:41:30.650
I'll say select star from collections
where deleted does not equal 1.

01:41:30.650 --> 01:41:33.710
If I hit semicolon here, what do I see?

01:41:33.710 --> 01:41:37.518
Only those values that are not deleted.

01:41:37.518 --> 01:41:38.810
I could go ahead and find them.

01:41:38.810 --> 01:41:45.620
I could say maybe select star from
collections where deleted actually

01:41:45.620 --> 01:41:46.700
is 1.

01:41:46.700 --> 01:41:50.870
Really equals 1 like this,
and now I see all those rows

01:41:50.870 --> 01:41:53.630
that I marked as deleted.

01:41:53.630 --> 01:41:55.940
Now this has some advantages.

01:41:55.940 --> 01:41:57.970
One of them is we keep data around.

01:41:57.970 --> 01:41:59.530
We don't actually formally delete it.

01:41:59.530 --> 01:42:04.460
We can still recover it later on, but it
also has some tricky ethical questions,

01:42:04.460 --> 01:42:04.960
too.

01:42:04.960 --> 01:42:07.240
Like it's OK if we're
talking about artwork here,

01:42:07.240 --> 01:42:12.520
but if you're talking about user data,
is it right to only soft delete it

01:42:12.520 --> 01:42:14.020
if they ask you to delete it?

01:42:14.020 --> 01:42:17.800
Particularly in conversation
with new frameworks like GDPR,

01:42:17.800 --> 01:42:20.570
and the right to be forgotten,
and so on, it's up to you

01:42:20.570 --> 01:42:22.570
as a programmer to make
the right decision here.

01:42:22.570 --> 01:42:25.480
When should you find data that
[INAUDIBLE] and actually delete it,

01:42:25.480 --> 01:42:28.570
or when is it better to just
soft delete it and keep it around

01:42:28.570 --> 01:42:31.210
in case you need it for later?

01:42:31.210 --> 01:42:36.490
So we'll end here on this note where
if I say select star from collections,

01:42:36.490 --> 01:42:43.570
and I want to find those only where
deleted is not equal to 0 like this--

01:42:43.570 --> 01:42:49.150
or not equal to 1, this gives me back
all of the items that are not deleted.

01:42:49.150 --> 01:42:51.670
But wouldn't it be nice
if I could actually

01:42:51.670 --> 01:42:57.580
run a query on some brand new temporary
table that actually only will ever have

01:42:57.580 --> 01:43:00.250
those items that are not soft deleted?

01:43:00.250 --> 01:43:03.140
Turns out, we can do that
with an idea called views,

01:43:03.140 --> 01:43:06.130
and we'll see those in
much more depth next week.

01:43:06.130 --> 01:43:08.010
We'll see you then.