WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.437 [AUDIO LOGO] 00:00:16.720 --> 00:00:20.080 [? CARTER ZENKE: ?] Well, hello, one and all, and welcome back to CS50's 00:00:20.080 --> 00:00:22.480 Introduction to Databases with SQL. 00:00:22.480 --> 00:00:25.720 My name is Carter [? Zenke, ?] and last we left off, 00:00:25.720 --> 00:00:28.450 we learned how to create our very own database schemas-- 00:00:28.450 --> 00:00:31.600 that is, a way to organize data in our database. 00:00:31.600 --> 00:00:34.450 Today, we'll learn how to actually add data to our databases-- 00:00:34.450 --> 00:00:38.150 to insert, to update, and delete data. 00:00:38.150 --> 00:00:43.330 We'll do all of this in the context of Boston's very own Museum of Fine Arts, 00:00:43.330 --> 00:00:45.170 or the MFA for short. 00:00:45.170 --> 00:00:47.617 So the MFA here in Boston is a century-- 00:00:47.617 --> 00:00:49.450 well, maybe, about a century-old museum that 00:00:49.450 --> 00:00:54.640 has many artifacts and artwork inside of it, both historical and contemporary. 00:00:54.640 --> 00:00:58.720 And it's worth asking, how do they keep track of the thousands of items 00:00:58.720 --> 00:01:00.410 that are in their collections? 00:01:00.410 --> 00:01:02.290 What could they possibly use? 00:01:02.290 --> 00:01:06.580 Well, chances are, they're likely using some kind of database. 00:01:06.580 --> 00:01:10.630 And on that database, they want to do four actions 00:01:10.630 --> 00:01:13.180 we've learned about back in week zero. 00:01:13.180 --> 00:01:17.620 They could, perhaps, create data-- to add data or insert data to the database 00:01:17.620 --> 00:01:20.270 when they get some new piece of artwork, for instance. 00:01:20.270 --> 00:01:22.970 They might want to query the database, to read from it. 00:01:22.970 --> 00:01:27.250 They could also update data, change the artist, change the artwork in some way. 00:01:27.250 --> 00:01:31.340 And they could also just delete data to remove it all together. 00:01:31.340 --> 00:01:32.920 But if we think about these-- 00:01:32.920 --> 00:01:36.370 creating, reading, updating, and deleting-- we'll 00:01:36.370 --> 00:01:39.040 notice that reading, updating, and deleting-- we 00:01:39.040 --> 00:01:42.910 can't do those if we don't actually have data in our database. 00:01:42.910 --> 00:01:46.210 And so today, we'll see how to create data-- 00:01:46.210 --> 00:01:50.860 how to insert data into our very own database. 00:01:50.860 --> 00:01:54.400 Now, let's think about the MFA's [? collections, ?] [? their ?] 00:01:54.400 --> 00:01:55.983 [? collection ?] of art and artifacts. 00:01:55.983 --> 00:01:58.275 And let me propose that they have a database [? that ?] 00:01:58.275 --> 00:01:59.540 looks a bit like this. 00:01:59.540 --> 00:02:03.700 It's a single table, and it has a title column and accession 00:02:03.700 --> 00:02:08.380 number, which is a fancy way of saying a unique ID internal to the museum, 00:02:08.380 --> 00:02:10.960 and also, a date it was acquired. 00:02:10.960 --> 00:02:15.610 We have, of course, a primary key on this table called ID. 00:02:15.610 --> 00:02:16.630 And let's think. 00:02:16.630 --> 00:02:19.660 Well, the museum might want to acquire this piece here-- 00:02:19.660 --> 00:02:22.390 this one called Profusion of Flowers. 00:02:22.390 --> 00:02:27.940 Well, how could they log that this artwork is inside of their database? 00:02:27.940 --> 00:02:30.160 They could maybe just add a new row. 00:02:30.160 --> 00:02:32.290 They could say, let's put Profusion of Flowers 00:02:32.290 --> 00:02:34.900 as the first item in our collections here. 00:02:34.900 --> 00:02:37.430 We'll give it a title and accession number-- which, again, 00:02:37.430 --> 00:02:40.100 is just a unique ID internal to the museum-- 00:02:40.100 --> 00:02:42.250 and the date it was acquired. 00:02:42.250 --> 00:02:45.340 And that row then has its own primary key 00:02:45.340 --> 00:02:49.000 to identify this row uniquely in our database. 00:02:49.000 --> 00:02:51.040 Now let's say they get another piece of artwork. 00:02:51.040 --> 00:02:54.130 They get this one, called Farmers Working at Dawn, 00:02:54.130 --> 00:02:56.453 and they want to add this one to their table, too. 00:02:56.453 --> 00:02:58.120 Well, they could do the very same thing. 00:02:58.120 --> 00:02:59.380 They could just add a new row. 00:02:59.380 --> 00:03:01.930 They could say, let's make a title, an accession number, 00:03:01.930 --> 00:03:04.240 and a date it was acquired, and this brand-new row 00:03:04.240 --> 00:03:06.670 here for that piece of artwork. 00:03:06.670 --> 00:03:08.830 And maybe they get another one, too. 00:03:08.830 --> 00:03:09.610 Same thing. 00:03:09.610 --> 00:03:11.770 Maybe they'll get back Spring Outing, and they 00:03:11.770 --> 00:03:13.720 want to add this to their collection. 00:03:13.720 --> 00:03:17.810 They could simply add another row like this. 00:03:17.810 --> 00:03:22.060 Now, it turns out that the database administrator behind the MFA 00:03:22.060 --> 00:03:26.620 might be running a SQL statement that looks a bit like this-- 00:03:26.620 --> 00:03:28.870 INSERT INTO. 00:03:28.870 --> 00:03:34.180 We can use INSERT INTO to add a new row to any given table. 00:03:34.180 --> 00:03:38.900 And notice how insert into needs to know a few pieces of information. 00:03:38.900 --> 00:03:42.550 The first is what table to insert into. 00:03:42.550 --> 00:03:45.010 What is the name of that table? 00:03:45.010 --> 00:03:47.710 Second, it needs to know what columns are 00:03:47.710 --> 00:03:51.130 we adding data to inside of this table? 00:03:51.130 --> 00:03:53.380 We give it a list of those columns here. 00:03:53.380 --> 00:03:57.130 Then, of course, it needs to know what new values should 00:03:57.130 --> 00:04:00.790 go into this new row for each of these columns. 00:04:00.790 --> 00:04:03.820 Like we saw before, is it Profusion of Flowers? 00:04:03.820 --> 00:04:05.200 Is it Spring Outing? 00:04:05.200 --> 00:04:06.310 Et cetera. 00:04:06.310 --> 00:04:09.380 Here, we can see that we have this list of values. 00:04:09.380 --> 00:04:12.610 And notice how value 0, the first value in this list, 00:04:12.610 --> 00:04:17.560 corresponds to the new value that will be inserted [? to ?] this first column. 00:04:17.560 --> 00:04:21.459 And we can keep having value 1, or column 1, value 2 and column 2, 00:04:21.459 --> 00:04:25.910 each one aligning with that particular column there. 00:04:25.910 --> 00:04:29.300 So let's see an example of this actually in code, 00:04:29.300 --> 00:04:31.540 understand it a bit more concretely. 00:04:31.540 --> 00:04:33.730 I'll go back to my computer here. 00:04:33.730 --> 00:04:36.070 Let's actually create our very own database 00:04:36.070 --> 00:04:40.990 that involves this schema of having our very own table which can keep artists 00:04:40.990 --> 00:04:44.630 and artwork and artifacts as well. 00:04:44.630 --> 00:04:45.400 So I'll type 00:04:45.400 --> 00:04:49.510 1: [? TRANSCRIPT BREAK ?] So I'll type sqlite3 mfa.db 00:04:49.510 --> 00:04:55.450 to create a database for the Museum of Fine Arts, abbreviated as MFA. 00:04:55.450 --> 00:04:56.690 I'll hit Enter here. 00:04:56.690 --> 00:05:03.310 And notice how I can now have access to my very own mfa.db inside my SQLite 00:05:03.310 --> 00:05:04.390 environment. 00:05:04.390 --> 00:05:09.510 Well, now, I can type .schema to see the schema of this database. 00:05:09.510 --> 00:05:12.483 So if I hit Enter here, well, nothing's there. 00:05:12.483 --> 00:05:13.900 Because I just made this database. 00:05:13.900 --> 00:05:16.000 There's nothing in it yet. 00:05:16.000 --> 00:05:19.840 Well, it turns out I do actually have a schema file prepared for me 00:05:19.840 --> 00:05:22.330 already in schema.sql. 00:05:22.330 --> 00:05:26.530 Here, I propose we make a table called "collections," and like our table 00:05:26.530 --> 00:05:29.320 visually, it has let's say four columns. 00:05:29.320 --> 00:05:35.515 One as an "id," the primary key of this table, one for a "title," 00:05:35.515 --> 00:05:39.610 one for the "accession_number," the unique ID internal to the museum, 00:05:39.610 --> 00:05:42.670 and finally, the date it was acquired. 00:05:42.670 --> 00:05:47.720 And notice here, that "title" has to be actually added to our database. 00:05:47.720 --> 00:05:50.500 It can't be [? NULL. ?] It can't have an empty value. 00:05:50.500 --> 00:05:52.690 Same thing for "accession_number." 00:05:52.690 --> 00:05:56.135 And further, "accession_number" has to be unique. 00:05:56.135 --> 00:05:58.510 I can't have two items [? with ?] [? a ?] [? different ?] 00:05:58.510 --> 00:05:59.590 "accession_number." 00:05:59.590 --> 00:06:02.380 Otherwise, I might get them confused in my database 00:06:02.380 --> 00:06:05.330 or inside my museum archives as well. 00:06:05.330 --> 00:06:09.550 So let me then add this schema to my database. 00:06:09.550 --> 00:06:12.280 And I can do so with this command right here. 00:06:12.280 --> 00:06:16.780 This is a SQLite command called .read, that we saw a little bit last week 00:06:16.780 --> 00:06:17.690 as well. 00:06:17.690 --> 00:06:21.440 I could type .read and then the name of this file I want to read. 00:06:21.440 --> 00:06:25.620 So I'll say .read schema.sql. 00:06:25.620 --> 00:06:28.020 Hit Enter. 00:06:28.020 --> 00:06:33.450 And now if I type, let's say .schema, I can see that same [? sql ?] 00:06:33.450 --> 00:06:39.280 [? schema, ?] same database schema. now inside of my terminal. 00:06:39.280 --> 00:06:44.420 OK, so now as promised, let's try adding some rows to this table. 00:06:44.420 --> 00:06:48.040 Because right now, if I do SELECT [? star ?] from [? collections ?] 00:06:48.040 --> 00:06:52.930 [? semicolon, ?] I don't see anything because nothing is inside yet. 00:06:52.930 --> 00:06:56.000 But I could add something using INSERT INTO. 00:06:56.000 --> 00:06:57.390 So let's try that. 00:06:57.390 --> 00:07:01.630 I'll say INSERT INTO the "collections table." 00:07:01.630 --> 00:07:05.380 And now, I have to ask, what columns inside of "collections" 00:07:05.380 --> 00:07:08.080 [? do I ?] [? want ?] to add data to. 00:07:08.080 --> 00:07:11.620 Well, I probably want to add to the first the "id" column, 00:07:11.620 --> 00:07:13.480 that primary key column. 00:07:13.480 --> 00:07:17.470 Then maybe, I want to add a "title" to this row, and also 00:07:17.470 --> 00:07:21.310 an "accession_number," and also, of course, the date this piece 00:07:21.310 --> 00:07:22.670 was "acquired." 00:07:22.670 --> 00:07:25.750 So now, I have the table I'm inserting into along 00:07:25.750 --> 00:07:28.240 with the columns I'm adding values for. 00:07:28.240 --> 00:07:31.060 So for style's sake, I'll hit Enter here. 00:07:31.060 --> 00:07:35.710 And now, I could type a list of values to insert into this new row. 00:07:35.710 --> 00:07:40.060 I could say VALUES, and then inside parentheses, the values 00:07:40.060 --> 00:07:41.650 I want to insert. 00:07:41.650 --> 00:07:45.010 Maybe the first primary key that I give to this item 00:07:45.010 --> 00:07:49.420 is going to be just [? 1, ?] start at 1, and add up as we add new items. 00:07:49.420 --> 00:07:53.530 For the title, I'll say let's call this one "Profusion of flowers." 00:07:53.530 --> 00:07:56.830 This is the piece we recently acquired into our collection. 00:07:56.830 --> 00:08:00.400 The accession number that we gave it was 56.257. 00:08:00.400 --> 00:08:06.120 And the date it was acquired was back in 1956-04-12. 00:08:06.120 --> 00:08:07.680 Now, I have all these values here. 00:08:07.680 --> 00:08:12.030 I'll type [? semicolon ?] hit Enter, and nothing seems to happen. 00:08:12.030 --> 00:08:17.370 But if I type, let's say, select [? star ?] from collections 00:08:17.370 --> 00:08:22.830 [? semicolon, ?] what do I see, but this new row inside. 00:08:22.830 --> 00:08:24.540 Let's do it again to get a hang of this. 00:08:24.540 --> 00:08:27.600 I want to now add "Farmers working at dawn" to my collection. 00:08:27.600 --> 00:08:31.080 So I'll do the same thing, I'll say let's INSERT INTO 00:08:31.080 --> 00:08:32.669 the "collections" table. 00:08:32.669 --> 00:08:36.929 And let's add values for the "id" column, the "title" 00:08:36.929 --> 00:08:42.659 column, the "accession_number," and also the "acquired" column like this. 00:08:42.659 --> 00:08:44.640 Again for style's sake, I'll hit Enter. 00:08:44.640 --> 00:08:48.780 Now, I give some list of values to add into this new row. 00:08:48.780 --> 00:08:53.280 I'll say VALUES here, then this list of values to add for each column. 00:08:53.280 --> 00:08:57.360 Well, the first column, I have to give a value for is "id." 00:08:57.360 --> 00:09:02.460 If you remember, our last id was 1, so what should this id be? 00:09:02.460 --> 00:09:03.540 Maybe 2. 00:09:03.540 --> 00:09:08.190 So I'll type 2 here as the next increment of my primary key, 00:09:08.190 --> 00:09:09.330 then I'll give it a title. 00:09:09.330 --> 00:09:13.470 And this title was "Farmers working at dawn." 00:09:13.470 --> 00:09:15.810 We gave it an accession number to keep track of it 00:09:15.810 --> 00:09:18.910 in our own internal museum records, our own archives here. 00:09:18.910 --> 00:09:25.110 So I'll say 11.6152 is the [? accession ?] [? number ?] for this 00:09:25.110 --> 00:09:32.740 particular item, then I'll say we acquired it back in 1911-08-03. 00:09:32.740 --> 00:09:36.010 Now, I'll hit a semicolon here, hit Enter. 00:09:36.010 --> 00:09:39.520 And now, I should be able to see if I type [? SELECT ?] [? star ?] FROM 00:09:39.520 --> 00:09:42.520 "collections," SELECT [? star ?] FROM "collections" 00:09:42.520 --> 00:09:48.740 [? semicolon, ?] I now have these two items inside of my collection. 00:09:48.740 --> 00:09:51.440 Now, let's do one more here. 00:09:51.440 --> 00:09:55.640 But let's focus in particular on this primary key. 00:09:55.640 --> 00:09:59.570 Notice here, how we've been actually inserting our very own primary key 00:09:59.570 --> 00:10:01.400 1 and then 2. 00:10:01.400 --> 00:10:05.223 But maybe that's not the best design. 00:10:05.223 --> 00:10:06.140 Let me try this again. 00:10:06.140 --> 00:10:12.080 I'll say INSERT INTO, let's go for, "collections" [? is ?] our table name. 00:10:12.080 --> 00:10:16.790 And maybe I'll try again to add to the "id" column, the "title" 00:10:16.790 --> 00:10:23.930 column, the "accession_number," and the "acquired" column here. 00:10:23.930 --> 00:10:29.960 What could go wrong do you think if I try to specify the primary key myself? 00:10:29.960 --> 00:10:31.880 Let me ask this as an audience question here. 00:10:31.880 --> 00:10:33.140 Feel free raise your hand. 00:10:33.140 --> 00:10:37.700 What might go wrong if I try to make the primary key myself? 00:10:37.700 --> 00:10:39.185 Insert a value myself? 00:10:43.010 --> 00:10:44.900 Let's go to [INAUDIBLE]. 00:10:44.900 --> 00:10:49.940 SPEAKER 2: Yeah, what we could do when we have to [INAUDIBLE] a lot of values, 00:10:49.940 --> 00:10:56.810 like maybe like big data or something, is there a way to add it to a CSV file 00:10:56.810 --> 00:10:59.725 or something rather than typing [? insert ?] [? into ?] tables? 00:10:59.725 --> 00:11:00.850 SPEAKER 1: A good question. 00:11:00.850 --> 00:11:02.558 I think you're onto something here, which 00:11:02.558 --> 00:11:05.600 is we're inserting one row at a time, which could get easily repetitive. 00:11:05.600 --> 00:11:07.897 So let me hold that thought for just a minute here 00:11:07.897 --> 00:11:10.730 and focus on the primary keys before we see some more efficient ways 00:11:10.730 --> 00:11:12.960 to actually add data to our table. 00:11:12.960 --> 00:11:15.800 So I'm with you on this idea of maybe we don't want 00:11:15.800 --> 00:11:17.697 to duplicate this primary key, right? 00:11:17.697 --> 00:11:19.530 Maybe we could do a little better than that. 00:11:19.530 --> 00:11:21.780 So I'll come back to my computer here. 00:11:21.780 --> 00:11:25.670 And if I specify the primary key, I might actually 00:11:25.670 --> 00:11:27.840 add a value that's already in there. 00:11:27.840 --> 00:11:32.180 So I could, thankfully, leave it up to SQLite to actually increment 00:11:32.180 --> 00:11:33.620 this value for me. 00:11:33.620 --> 00:11:36.440 Let me try leaving off "id" in this case. 00:11:36.440 --> 00:11:39.350 I'll omit the "id" altogether, which seems 00:11:39.350 --> 00:11:42.170 at first like a bit of a bad thought, like shouldn't every row 00:11:42.170 --> 00:11:43.280 have its own id? 00:11:43.280 --> 00:11:45.830 Well, let's just try and figure out what happened. 00:11:45.830 --> 00:11:47.060 I'll hit Enter here. 00:11:47.060 --> 00:11:49.560 And now I'll say, let me give the VALUES. 00:11:49.560 --> 00:11:52.770 Well, now the first column is the title column. 00:11:52.770 --> 00:11:55.970 So I'll go ahead and say that this one will be called "Spring outing." 00:11:55.970 --> 00:11:58.520 We're going to add this one to our collection here. 00:11:58.520 --> 00:12:01.610 The accession number, in this case, is 14.76. 00:12:01.610 --> 00:12:08.540 And we acquired this one all the way back in 1914-01-08 [? semicolon. ?] 00:12:08.540 --> 00:12:10.170 So I'll hit Enter here. 00:12:10.170 --> 00:12:13.340 And notice how there's no [? id ?] that I have specified. 00:12:13.340 --> 00:12:16.850 But now if I hit Enter, seems to work. 00:12:16.850 --> 00:12:20.120 I'll type SELECT [? star ?] [? FROM ?] "collections," 00:12:20.120 --> 00:12:24.710 and what do I see but the new primary key of 3. 00:12:24.710 --> 00:12:30.290 So it seems like SQLite actually increments the primary key for me. 00:12:30.290 --> 00:12:35.180 If I add some new row, it looks, what is the current highest primary key 00:12:35.180 --> 00:12:38.120 and adds 1 to that automatically for me. 00:12:38.120 --> 00:12:41.390 And I've gotten that by specifying in my schema 00:12:41.390 --> 00:12:45.050 this primary key constraint down on line 6. 00:12:45.050 --> 00:12:47.840 So pretty handy for me here. 00:12:47.840 --> 00:12:50.030 I see a few other questions too. 00:12:50.030 --> 00:12:54.570 So let's take those before we move on as well. 00:12:54.570 --> 00:12:55.760 Let's go to [? Andre. ?] 00:12:55.760 --> 00:12:59.030 SPEAKER 3: I just want to ask if I delete the, let's say, 00:12:59.030 --> 00:13:06.410 the [? first ?] record and type again, will it number it [? on ?] the fourth 00:13:06.410 --> 00:13:07.503 number or the first one. 00:13:07.503 --> 00:13:08.670 SPEAKER 1: A great question. 00:13:08.670 --> 00:13:10.378 We'll see this a little bit later on too. 00:13:10.378 --> 00:13:14.990 But if I delete some row, let's say, I deleted the row with primary key of 1, 00:13:14.990 --> 00:13:17.990 well, what SQLite will do, at least by default, 00:13:17.990 --> 00:13:20.060 is actually take the highest value. 00:13:20.060 --> 00:13:23.340 Let's say the highest value is still three in my database. 00:13:23.340 --> 00:13:26.510 It will add 1 to that, and I want to insert that new row it'll 00:13:26.510 --> 00:13:29.673 have the [? id ?] of 4 in this case. 00:13:29.673 --> 00:13:34.430 OK, so let's explore a bit more some of these constraints on tables. 00:13:34.430 --> 00:13:37.850 Here, we're talking about the primary key constraint, but we also have, 00:13:37.850 --> 00:13:40.310 as we saw before, the [? NOT ?] [? NULL ?] constraint 00:13:40.310 --> 00:13:42.000 and the [? UNIQUE ?] constraint. 00:13:42.000 --> 00:13:45.230 So let's try inserting given those constraints here. 00:13:45.230 --> 00:13:47.280 [? I'll go ?] back to my computer. 00:13:47.280 --> 00:13:52.340 And again, notice in my schema, I specified that this "title" 00:13:52.340 --> 00:13:55.512 column should always be NOT NULL. 00:13:55.512 --> 00:13:57.720 It should never have a [? null ?] value inside of it. 00:13:57.720 --> 00:14:01.280 And similarly, "accession_number" should also be NOT NULL. 00:14:01.280 --> 00:14:05.270 It should also be [? unique. ?] I should have no two rows 00:14:05.270 --> 00:14:07.550 that have the same accession number. 00:14:07.550 --> 00:14:11.300 And maybe kind of a playfully, subversively here, 00:14:11.300 --> 00:14:13.458 let's try to run against this constraint. 00:14:13.458 --> 00:14:16.500 Let's actually try to add the same accession number and see what happens. 00:14:16.500 --> 00:14:20.900 So I'll come back to my [? TERMINAL. ?] Let me just reinsert, 00:14:20.900 --> 00:14:22.490 let's say "Spring outing." 00:14:22.490 --> 00:14:27.300 Notice how if I SELECT [? star ?] FROM "collections," it's already in here. 00:14:27.300 --> 00:14:30.750 But I'll try to add it yet again with the same [? accession ?] [? number. ?] 00:14:30.750 --> 00:14:32.408 So I'll say INSERT INTO "collections." 00:14:32.408 --> 00:14:35.450 And by now this is [? hopefully ?] [? becoming ?] a little more familiar. 00:14:35.450 --> 00:14:38.870 I now want to say the columns I want to add data to. 00:14:38.870 --> 00:14:46.520 So I'll say the "title" column, the "accession_number" column, 00:14:46.520 --> 00:14:49.550 and also the "acquired" date I'll hit Enter. 00:14:49.550 --> 00:14:51.620 And now I'll say the VALUES again. 00:14:51.620 --> 00:14:54.650 Let's reinsert [? "Spring ?] [? outing." ?] I'll say "Spring outing" 00:14:54.650 --> 00:14:58.640 as a title, the accession number is 14.76, 00:14:58.640 --> 00:15:04.670 and the date it was acquired was again 1914-01-08 semicolon. 00:15:04.670 --> 00:15:09.100 Now if I hit Enter, what do we see? 00:15:09.100 --> 00:15:12.490 Runtime error-- UNIQUE constraint failed. 00:15:12.490 --> 00:15:14.890 So it seems we ran against this constraint 00:15:14.890 --> 00:15:18.460 here that we specified on line 4 here. 00:15:18.460 --> 00:15:20.830 [? "Accession number" ?] should always be unique. 00:15:20.830 --> 00:15:24.820 But by trying to add a new row that had that same accession number, 00:15:24.820 --> 00:15:29.780 we ran into this runtime error and our operation was not completed. 00:15:29.780 --> 00:15:33.190 If I say in this case, SELECT [? star ?] FROM "collections," 00:15:33.190 --> 00:15:38.050 notice how I guarded myself from adding Spring outing more than once 00:15:38.050 --> 00:15:40.070 with the same accession number. 00:15:40.070 --> 00:15:43.630 So some usefulness here to these kinds of constraints. 00:15:43.630 --> 00:15:46.180 Let's try violating NOT NULL too. 00:15:46.180 --> 00:15:49.252 I'll try adding a title that is actually [? null, ?] non-existent. 00:15:49.252 --> 00:15:51.460 It's trying to add a painting without a title itself. 00:15:51.460 --> 00:15:52.420 Let's try this. 00:15:52.420 --> 00:15:56.020 I might try INSERT INTO the "collections" table, 00:15:56.020 --> 00:16:01.660 and I'll add to the "title" columns, the "accession_number," and also 00:16:01.660 --> 00:16:03.940 the "acquired" column. 00:16:03.940 --> 00:16:06.020 Now, I'll say the VALUES here. 00:16:06.020 --> 00:16:07.870 But remember, this [? null ?] value. 00:16:07.870 --> 00:16:09.280 Null meaning nothing. 00:16:09.280 --> 00:16:10.630 This value doesn't exist. 00:16:10.630 --> 00:16:14.260 I could insert that into my table, or at least try to. 00:16:14.260 --> 00:16:16.690 I'll say NULL is a title. 00:16:16.690 --> 00:16:19.610 And similarly, NULL is the accession number. 00:16:19.610 --> 00:16:25.450 Now let's say, just playfully, we got this painting back in 1900-01-10 00:16:25.450 --> 00:16:26.770 like this. 00:16:26.770 --> 00:16:28.960 OK now, semicolon. 00:16:28.960 --> 00:16:33.970 I'll hit Enter, and we see the same runtime error-- constraint failed, 00:16:33.970 --> 00:16:36.200 in this case, the NOT NULL constraint. 00:16:36.200 --> 00:16:41.950 So notice again how in my schema, I specified "title" should be NOT NULL. 00:16:41.950 --> 00:16:46.090 But here when I try to insert a [? null ?] [? value, ?] I run into that 00:16:46.090 --> 00:16:48.670 constraint and I can't insert that value. 00:16:48.670 --> 00:16:51.400 If I try to SELECT [? star ?] FROM "collections," 00:16:51.400 --> 00:16:57.410 I should see in this case that insert didn't work. 00:16:57.410 --> 00:17:00.520 So these constraints are really guardrails against me, 00:17:00.520 --> 00:17:02.980 against other users to adding values that we really 00:17:02.980 --> 00:17:06.480 shouldn't add to our table. 00:17:06.480 --> 00:17:12.400 OK, so presumably here, we're onto something by adding some rows. 00:17:12.400 --> 00:17:17.730 But if the museum acquires more than one item, maybe 100 at a time, 00:17:17.730 --> 00:17:21.119 I don't want to be the programmer who's sitting there writing 100 INSERT INTO. 00:17:21.119 --> 00:17:22.740 That's like not what I want to do. 00:17:22.740 --> 00:17:24.630 There's probably a better way to do this. 00:17:24.630 --> 00:17:27.040 [? And ?] [? we ?] show you one way to do this. 00:17:27.040 --> 00:17:32.190 One way is to instead of inserting one row, like this, to instead insert 00:17:32.190 --> 00:17:34.900 multiple separated by commas. 00:17:34.900 --> 00:17:39.870 So here, I might say this is my first row for each of these columns. 00:17:39.870 --> 00:17:43.080 This is my second row, again, for each of these columns, 00:17:43.080 --> 00:17:45.370 and so on and so forth. 00:17:45.370 --> 00:17:49.530 And it's worth noting, this is not just a manner of convenience. 00:17:49.530 --> 00:17:54.130 Like if I tried to insert 100 rows, this is certainly convenient for me. 00:17:54.130 --> 00:17:57.210 But it's also most often in most cases going 00:17:57.210 --> 00:18:01.200 to be faster for me to insert 100 rows in one go, 00:18:01.200 --> 00:18:04.390 than to insert one row 100 times. 00:18:04.390 --> 00:18:06.930 So this is both a convenience thing for programmers 00:18:06.930 --> 00:18:11.740 and also an efficiency thing to actually optimize your database as well. 00:18:11.740 --> 00:18:16.350 So let's try this syntax now to avoid me sitting there for hours and hours 00:18:16.350 --> 00:18:18.240 writing many insert interviews. 00:18:18.240 --> 00:18:20.460 I'll go back to my computer. 00:18:20.460 --> 00:18:23.940 And this time, let's say we got two paintings at once. 00:18:23.940 --> 00:18:24.750 We got two. 00:18:24.750 --> 00:18:26.670 So I'll try to add both of them. 00:18:26.670 --> 00:18:30.510 I'll say INSERT INTO the "collections" table. 00:18:30.510 --> 00:18:35.730 And again, I want to keep adding to my "title" column, my "accession_number" 00:18:35.730 --> 00:18:40.200 column, and even my "acquired" column, like this. 00:18:40.200 --> 00:18:42.640 Now, I want to add in some values. 00:18:42.640 --> 00:18:44.370 So I'll type VALUES here. 00:18:44.370 --> 00:18:48.390 And as a matter of style, let me just say I'll make a new line. 00:18:48.390 --> 00:18:49.650 I'll Enter here. 00:18:49.650 --> 00:18:54.360 And now I can type each of my new rows on their very own line. 00:18:54.360 --> 00:18:59.220 So again, I'll type all of the values that I want in my first row. 00:18:59.220 --> 00:19:02.220 I might call this one "Imaginative landscape." 00:19:02.220 --> 00:19:03.785 We got this one back in-- 00:19:03.785 --> 00:19:05.910 actually, we don't quite know when we got this one. 00:19:05.910 --> 00:19:12.210 If I type 56.496 is the accession number, like, yeah, 56.496. 00:19:12.210 --> 00:19:15.430 The MFA actually doesn't know when they got this painting. 00:19:15.430 --> 00:19:18.810 So if they don't know, let's just insert NULL. 00:19:18.810 --> 00:19:22.800 This is intentionally left blank. 00:19:22.800 --> 00:19:24.150 This is my first row. 00:19:24.150 --> 00:19:26.970 Now, I want to add a second one in one go. 00:19:26.970 --> 00:19:28.690 I'll follow this up with a comma. 00:19:28.690 --> 00:19:32.250 And now, I'll type out my next set of values. 00:19:32.250 --> 00:19:36.000 This next one we acquired is called "Peonies and butterfly." 00:19:36.000 --> 00:19:40.710 Now, I'll say the [? accession ?] [? number ?] is 06.1899. 00:19:40.710 --> 00:19:48.570 And we got this one-- it was back in 1906-01-01 now I'll hit semicolon here. 00:19:48.570 --> 00:19:52.300 And now, if I hit Enter, nothing happens. 00:19:52.300 --> 00:19:57.330 But if I type Select [? star ?] FROM "collections" semicolon, 00:19:57.330 --> 00:20:01.960 what do we see, but now two new rows being inserted. 00:20:01.960 --> 00:20:05.500 So a handy way to insert more than one value. 00:20:05.500 --> 00:20:10.510 And also, if you have a lot of values, a more efficient way as well. 00:20:10.510 --> 00:20:12.030 So we pause here. 00:20:12.030 --> 00:20:17.930 And ask what questions we have on INSERT INTO, Whether adding one row 00:20:17.930 --> 00:20:19.550 or adding multiple. 00:20:19.550 --> 00:20:21.320 Let's go to [INAUDIBLE]. 00:20:21.320 --> 00:20:24.070 SPEAKER 4: [? So ?] [? imagine ?] you are [? writing ?] [? code ?] 00:20:24.070 --> 00:20:28.700 or inserting a row, so by mistake you have entered the wrong spelling 00:20:28.700 --> 00:20:31.717 of the title, so how we will rename it? 00:20:31.717 --> 00:20:33.050 SPEAKER 1: A great question too. 00:20:33.050 --> 00:20:35.900 And often as is the case with me, I make typos all the time. 00:20:35.900 --> 00:20:40.040 I might add the artwork title and misspell it for instance. 00:20:40.040 --> 00:20:44.600 Well, in that case, I can't use INSERT INTO to correct it, 00:20:44.600 --> 00:20:48.410 but I can use a new statement we'll see a little later called [? update. ?] 00:20:48.410 --> 00:20:50.210 And so with [? update, ?] you can actually 00:20:50.210 --> 00:20:51.830 change the spellings of things. 00:20:51.830 --> 00:20:53.930 And we'll see that a little later today. 00:20:53.930 --> 00:20:57.770 But great question to peek ahead as well. 00:20:57.770 --> 00:21:00.270 OK, so let's keep going here. 00:21:00.270 --> 00:21:04.500 Let's think about how we can keep adding values to our data set. 00:21:04.500 --> 00:21:09.620 And so far we've seen INSERT INTO with one row and with multiple. 00:21:09.620 --> 00:21:13.640 But one more way to keep in mind is you might already 00:21:13.640 --> 00:21:17.300 have your data, perhaps in some other format. 00:21:17.300 --> 00:21:22.940 And one common format looks a bit like this on our screen over here. 00:21:22.940 --> 00:21:29.750 This file is called CSV for Comma Separated Values. 00:21:29.750 --> 00:21:31.760 Now, why Comma Separated Values? 00:21:31.760 --> 00:21:33.350 Well, let's just look at it. 00:21:33.350 --> 00:21:36.710 Here we see some presumably column or column names, 00:21:36.710 --> 00:21:39.530 like [? id, ?] [? title, ?] [? accession number, ?] 00:21:39.530 --> 00:21:42.980 and [? acquired, ?] but what separates these column names? 00:21:42.980 --> 00:21:44.510 Well, it looks like commas-- 00:21:44.510 --> 00:21:48.470 "id" comma "title," comma "accession_number," and so on. 00:21:48.470 --> 00:21:51.350 Every row is still its own row. 00:21:51.350 --> 00:21:52.670 That makes sense here. 00:21:52.670 --> 00:21:57.410 But now, those row values are also separated by commas. 00:21:57.410 --> 00:22:01.100 Notice here, the first value before the first comma 00:22:01.100 --> 00:22:05.210 corresponds to the first column, this id is 1. 00:22:05.210 --> 00:22:09.050 Similarly, the next Comma Separated Value Profusion of flowers 00:22:09.050 --> 00:22:12.450 belongs to that next column here-- "title" as well, 00:22:12.450 --> 00:22:14.070 and so on and so forth. 00:22:14.070 --> 00:22:16.820 You could if you wanted to draw a snaky line 00:22:16.820 --> 00:22:21.780 to see how these columns correspond down our file here. 00:22:21.780 --> 00:22:25.720 Now, you might often have data already in this format. 00:22:25.720 --> 00:22:27.470 And it's actually pretty convenient to try 00:22:27.470 --> 00:22:31.760 to import this data into SQLite into your very own table 00:22:31.760 --> 00:22:34.170 so you can use [? SQL ?] on it. 00:22:34.170 --> 00:22:37.730 So let's try doing that here, taking a CSV like this 00:22:37.730 --> 00:22:40.610 and adding it to our database. 00:22:40.610 --> 00:22:42.600 I'll come back to my computer. 00:22:42.600 --> 00:22:45.870 And at this point, I want to start over. 00:22:45.870 --> 00:22:48.590 Let's say, I didn't use [? insert ?] at all. 00:22:48.590 --> 00:22:52.560 I actually got a CSV from the MFA of all the items in our collection. 00:22:52.560 --> 00:22:55.340 So I'll type .quit to leave this database. 00:22:55.340 --> 00:23:01.040 And now, I'll type rm mfa.db to remove this file all together. 00:23:01.040 --> 00:23:07.070 Now, let me show you mfa.csv, the CSV file I have that looks exactly 00:23:07.070 --> 00:23:08.480 like what we just saw. 00:23:08.480 --> 00:23:11.300 I'll type code mfa.csv. 00:23:11.300 --> 00:23:15.440 And now I can indeed see, I have an iD column, a title 00:23:15.440 --> 00:23:18.830 column, and accession_number, and the date these pieces were 00:23:18.830 --> 00:23:22.590 acquired all separated by commas. 00:23:22.590 --> 00:23:28.280 So let's say I want to quickly import this data into its very own database 00:23:28.280 --> 00:23:30.050 to run [? SQL ?] [? queries ?] on it. 00:23:30.050 --> 00:23:33.200 As it stands, I can't use [? SQL ?] [? queries ?] on this table 00:23:33.200 --> 00:23:36.110 because it's not inside of a SQLite database. 00:23:36.110 --> 00:23:37.370 But let me add one. 00:23:37.370 --> 00:23:43.470 I'll type SQLite3 mfa.db to remake [? MFA ?] down below. 00:23:43.470 --> 00:23:48.020 Let me now introduce this new command that is not a SQL keyword, 00:23:48.020 --> 00:23:50.400 but is actually a SQLite command. 00:23:50.400 --> 00:23:54.170 And this command is called .import. 00:23:54.170 --> 00:23:57.560 .import lets you take a file, like a CSV, 00:23:57.560 --> 00:24:04.010 and automatically insert it row by row into a table of your own making or one 00:24:04.010 --> 00:24:06.590 you can let SQLite create for you. 00:24:06.590 --> 00:24:08.660 Let's actually create the table ourselves, 00:24:08.660 --> 00:24:12.080 and then insert this CSV using .import. 00:24:12.080 --> 00:24:13.990 So I'll go back to SQLite. 00:24:13.990 --> 00:24:15.830 Let me recreate the schema. 00:24:15.830 --> 00:24:20.490 Right now, there's no tables inside this database, but let me create one. 00:24:20.490 --> 00:24:26.240 I'll instead type .read schema.sql to read in that old schema.sql file, 00:24:26.240 --> 00:24:28.460 so I can have my own table here. 00:24:28.460 --> 00:24:30.000 I'll type that schema again. 00:24:30.000 --> 00:24:33.150 And now I see, I have that table back. 00:24:33.150 --> 00:24:38.090 But if I type SELECT [? star ?] FROM "collections" semicolon, 00:24:38.090 --> 00:24:39.980 nothing is inside. 00:24:39.980 --> 00:24:41.070 Well, I could fix this. 00:24:41.070 --> 00:24:46.220 I could to import this CSV into collections to insert all at once from 00:24:46.220 --> 00:24:49.350 [? its ?] [? own, ?] from the CSV file we have right here. 00:24:49.350 --> 00:24:52.910 So to do this, let me type .import. 00:24:52.910 --> 00:24:59.270 Now before I finish this off, there are a few arguments or options I can give 00:24:59.270 --> 00:25:02.660 .import to make sure it works properly. 00:25:02.660 --> 00:25:05.430 The ones for this look as follows. 00:25:05.430 --> 00:25:10.730 I should type .import, and then [? dash ?] [? dash ?] CSV to say 00:25:10.730 --> 00:25:12.560 I'm importing a CSV. 00:25:12.560 --> 00:25:15.620 If I don't type [? dash ?] [? dash ?] CSV, 00:25:15.620 --> 00:25:19.040 SQLite I might assume something about this file that just isn't true. 00:25:19.040 --> 00:25:22.490 It should know that numbers are here separated by commas, 00:25:22.490 --> 00:25:25.130 [? numbers, ?] or [? values, ?] whatever's inside my table are 00:25:25.130 --> 00:25:27.300 separated by commas. 00:25:27.300 --> 00:25:29.840 And then [? dash ?] [? dash ?] skip 1-- 00:25:29.840 --> 00:25:31.070 well, let's take a look. 00:25:31.070 --> 00:25:34.280 If I go back to this file here, let's see. 00:25:34.280 --> 00:25:38.900 I have the first row, the second row, the third row, the fourth row. 00:25:38.900 --> 00:25:43.915 Are there any rows I probably shouldn't insert into my table here? 00:25:43.915 --> 00:25:44.915 Let me ask the audience. 00:25:44.915 --> 00:25:46.460 You're free to raise your hand. 00:25:46.460 --> 00:25:49.775 Are there any rows I shouldn't insert into this table? 00:25:52.970 --> 00:25:54.320 Let's go to [? Sukanya. ?] 00:25:54.320 --> 00:25:57.410 SPEAKER 5: Yes, I think they shouldn't include the first one, 00:25:57.410 --> 00:25:59.910 because it doesn't give us any value that we need. 00:25:59.910 --> 00:26:00.450 SPEAKER 1: Yeah, you're right. 00:26:00.450 --> 00:26:02.630 So if I look at this header row, as we call it, 00:26:02.630 --> 00:26:08.300 I see id, title, accession_number, and acquired-- the names for my columns. 00:26:08.300 --> 00:26:12.920 But notice in my schema, I already have those column names existing. 00:26:12.920 --> 00:26:17.840 So I shouldn't insert the value id into my "id" column. 00:26:17.840 --> 00:26:21.060 I want to just skip that row and only do the next one. 00:26:21.060 --> 00:26:25.880 So here by typing [? dash ?] [? dash ?] skip 1, I'll skip that header row. 00:26:25.880 --> 00:26:26.900 So let's try this now. 00:26:26.900 --> 00:26:29.420 I'll say .import [? dash ?] [? dash ?] CSV. 00:26:29.420 --> 00:26:31.294 I'm going to import a CSV. 00:26:31.294 --> 00:26:34.640 Now, [? dash ?] [? dash ?] skip, how many rows should I skip? 00:26:34.640 --> 00:26:37.040 Well, just one, the header row there. 00:26:37.040 --> 00:26:39.500 Now, I say the file to import. 00:26:39.500 --> 00:26:44.960 So I'll type mfa.csv, the CSV file to import into my database. 00:26:44.960 --> 00:26:49.040 Now, I type the name of the table I'm importing into, 00:26:49.040 --> 00:26:51.030 collections, in this case. 00:26:51.030 --> 00:26:56.060 And notice how I no longer need to [? quote ?] collections or [? quote ?] 00:26:56.060 --> 00:27:01.508 CSV, because this is not a SQL statement this is a SQLite statement, 00:27:01.508 --> 00:27:03.800 so I can get away with not [? quoting ?] anything here. 00:27:03.800 --> 00:27:05.810 Now, I'll hit Enter. 00:27:05.810 --> 00:27:08.570 And again, nothing seems to happen, but it's probably a good sign. 00:27:08.570 --> 00:27:14.240 Let me try selecting [? star ?] [? from ?] "collections" semicolon, 00:27:14.240 --> 00:27:15.140 Enter. 00:27:15.140 --> 00:27:17.960 And now all my data is just magically in there. 00:27:17.960 --> 00:27:23.903 It's went from my CSV into my very own table. 00:27:23.903 --> 00:27:27.590 OK, so that's a pretty nice step forward. 00:27:27.590 --> 00:27:31.100 No longer do we have to use just single line inserts or even 00:27:31.100 --> 00:27:33.530 bulk inserts, those multi-line inserts. 00:27:33.530 --> 00:27:37.790 We can now just import an entire table from a CSV. 00:27:37.790 --> 00:27:42.080 But I think I've been showing you a bit of an inaccuracy here, at least 00:27:42.080 --> 00:27:44.480 that's not often going to happen to you. 00:27:44.480 --> 00:27:49.280 If I go back to this file here, this CSV. 00:27:49.280 --> 00:27:53.480 What you might notice is that I specified the primary key. 00:27:53.480 --> 00:27:57.470 I said [? id ?] of 1, [? id ?] of 2, [? id ?] of 3. 00:27:57.470 --> 00:28:01.190 But if we'd heard before, this might not be the best design. 00:28:01.190 --> 00:28:04.610 Because what if I import this CSV, and there's already 00:28:04.610 --> 00:28:07.820 an item that has the id of [? 1 ?] or the id of 2. 00:28:07.820 --> 00:28:12.020 Ideally, I could let SQLite create its very own primary keys 00:28:12.020 --> 00:28:13.700 for each of these rows. 00:28:13.700 --> 00:28:15.950 And more often than not, you'll likely have 00:28:15.950 --> 00:28:20.870 a database or CSV that looks a bit more like this without the primary key. 00:28:20.870 --> 00:28:25.710 But you'll still want to have a primary key when you import into your database. 00:28:25.710 --> 00:28:29.000 So let's try working with this kind of CSV now 00:28:29.000 --> 00:28:32.870 and having SQLite generate some of the primary keys for us. 00:28:32.870 --> 00:28:34.730 Come back to my computer. 00:28:34.730 --> 00:28:40.440 And let's update our mfa.csv just to remove those primary key column here. 00:28:40.440 --> 00:28:44.900 So I will open up again mfa.csv. 00:28:44.900 --> 00:28:47.600 And let me try to just delete this column all together. 00:28:47.600 --> 00:28:54.680 I'll move [? id, ?] remove 1, and 2, and 3, and 4, and 5. 00:28:54.680 --> 00:28:59.250 Let me save it, and now that id column is gone. 00:28:59.250 --> 00:29:01.020 But now there's a problem. 00:29:01.020 --> 00:29:05.550 Like if I go to my schema, and I type .schema, 00:29:05.550 --> 00:29:09.300 notice how there are four columns in this table-- 00:29:09.300 --> 00:29:13.350 "id," "title," "accession_number," and "acquired." 00:29:13.350 --> 00:29:16.990 Well, now my CSV only has three columns. 00:29:16.990 --> 00:29:20.520 So if I try to import this CSV into this table, 00:29:20.520 --> 00:29:23.730 I'm going to run into trouble, because I have different numbers of columns 00:29:23.730 --> 00:29:26.755 [? between ?] my CSV and my table. 00:29:26.755 --> 00:29:29.130 Well, what can I do instead? 00:29:29.130 --> 00:29:32.910 It turns out that I could actually import this data 00:29:32.910 --> 00:29:36.480 into a temporary table, and then take that data 00:29:36.480 --> 00:29:40.050 and insert that data in that temporary table into my real one 00:29:40.050 --> 00:29:41.670 called "collections" here. 00:29:41.670 --> 00:29:47.670 I can use both .import and INSERT INTO to accomplish that task for me. 00:29:47.670 --> 00:29:48.810 So let's try this. 00:29:48.810 --> 00:29:50.760 I'll say .import. 00:29:50.760 --> 00:29:54.990 I want to import a CSV now-- [? dash ?] [? dash ?] [? CSV. ?] Which one? 00:29:54.990 --> 00:29:58.080 Well, mfa.csv in this case. 00:29:58.080 --> 00:30:01.902 Now, though, I want to create a brand new table that doesn't yet exist. 00:30:01.902 --> 00:30:03.360 So I'll type the name of the table. 00:30:03.360 --> 00:30:06.900 And I just call it temp for now to temporarily import 00:30:06.900 --> 00:30:09.390 this data into a brand new table. 00:30:09.390 --> 00:30:12.260 And notice how here I'm not using [? dash ?] [? dash ?] 00:30:12.260 --> 00:30:16.350 skip 1, because now I want to take advantage of those header rows. 00:30:16.350 --> 00:30:22.810 SQLite, if I import into a new table, will notice I have a header row 00:30:22.810 --> 00:30:27.600 and make my columns the very same names that are in my header row-- 00:30:27.600 --> 00:30:30.210 title, accession number, and acquired. 00:30:30.210 --> 00:30:31.380 I don't want to skip them. 00:30:31.380 --> 00:30:36.610 I want SQLite to see them and create this table using those header rows. 00:30:36.610 --> 00:30:39.990 So now, I'll hit Enter, and nothing seems to happen. 00:30:39.990 --> 00:30:45.000 But if I type .schema, what do I see, but a brand new table called "temp" 00:30:45.000 --> 00:30:49.050 that SQLite has just made automatically and notice how it used that header row. 00:30:49.050 --> 00:30:54.795 I have "title," "accession_number," and "acquired" as my column names. 00:30:54.795 --> 00:30:57.380 OK, let's now look inside the temp table, 00:30:57.380 --> 00:31:02.520 I'll say SELECT [? star ?] FROM "temp" semicolon. 00:31:02.520 --> 00:31:04.350 And now, I see all that data in there. 00:31:04.350 --> 00:31:06.200 But I don't have primary keys yet. 00:31:06.200 --> 00:31:10.070 So my goal [? is ?] [? really ?] to take all this data and insert it 00:31:10.070 --> 00:31:14.630 into my collections table to give it that primary key I've been wanting 00:31:14.630 --> 00:31:15.350 here. 00:31:15.350 --> 00:31:20.998 Again, if I type SELECT [? star ?] FROM "collections," well, nothing-- oh, 00:31:20.998 --> 00:31:22.040 something is still there. 00:31:22.040 --> 00:31:24.217 Let me actually just delete this [INAUDIBLE] 00:31:24.217 --> 00:31:27.050 I'll say DELETE FROM "collections" to remove it all together-- we'll 00:31:27.050 --> 00:31:28.550 [? get ?] [INAUDIBLE] just a minute. 00:31:28.550 --> 00:31:33.020 And now let me try, SELECT [? star ?] FROM "collections," 00:31:33.020 --> 00:31:36.750 and I should see nothing inside of collections. 00:31:36.750 --> 00:31:38.040 So what could I do? 00:31:38.040 --> 00:31:41.630 I could [? INSERT ?] [? INTO ?] [? collections ?] using the data from 00:31:41.630 --> 00:31:43.130 my temp table. 00:31:43.130 --> 00:31:43.980 So I'll try that. 00:31:43.980 --> 00:31:49.460 I'll say INSERT INTO "collections" and choose those columns yet again. 00:31:49.460 --> 00:31:54.450 I'll say the "title," and the "accession_number," 00:31:54.450 --> 00:31:58.900 and now let's go for the "acquired" column as well. 00:31:58.900 --> 00:32:00.370 I'll hit Enter. 00:32:00.370 --> 00:32:04.330 But instead of typing many new lines of values, 00:32:04.330 --> 00:32:10.240 I actually have a way to dynamically select all the values I want to insert, 00:32:10.240 --> 00:32:12.170 and it looks a bit like this. 00:32:12.170 --> 00:32:18.890 I could say INSERT INTO [? some ?] table and some columns of that table. 00:32:18.890 --> 00:32:23.330 But I want to insert the results of this SELECT down below. 00:32:23.330 --> 00:32:26.720 So I'll [? SELECT ?] some columns [? FROM ?] some separate table. 00:32:26.720 --> 00:32:31.850 And so long as these columns align, I'll be able to actually take the results 00:32:31.850 --> 00:32:35.180 of that [? SELECT ?] and [? insert ?] all of them [? into ?] this new table 00:32:35.180 --> 00:32:37.070 using INSERT INTO. 00:32:37.070 --> 00:32:39.920 So let me finish my statement here, we can see the results. 00:32:39.920 --> 00:32:42.050 Come back to my computer. 00:32:42.050 --> 00:32:47.690 And here, I'll type a SELECT to get back all the items from my temp table. 00:32:47.690 --> 00:32:52.070 I'll say SELECT-- in this case, SELECT, let's 00:32:52.070 --> 00:32:56.450 go for the "title" column, the "accession_number" 00:32:56.450 --> 00:33:01.100 column, and the "acquired" column from my "temp" table. 00:33:01.100 --> 00:33:05.270 Now, if I hit semicolon, I should see-- 00:33:05.270 --> 00:33:06.470 well, nothing at first. 00:33:06.470 --> 00:33:11.000 But if I type SELECT [? star ?] FROM "collections," 00:33:11.000 --> 00:33:16.610 I see all my data now in there selected from my temporary table. 00:33:16.610 --> 00:33:19.623 And now if I type .schema, I still see temp. 00:33:19.623 --> 00:33:20.540 But what can I do now? 00:33:20.540 --> 00:33:22.710 I could just delete that table altogether. 00:33:22.710 --> 00:33:28.370 I could say DROP TABLE "temp," as we saw last week, semicolon, hit Enter, 00:33:28.370 --> 00:33:30.090 .schema again. 00:33:30.090 --> 00:33:33.680 And now, we're back where we want to be with a single table, 00:33:33.680 --> 00:33:39.030 and now we've import our data from our CSV with primary keys. 00:33:39.030 --> 00:33:44.650 If I type "collections" here, we see it all in this table. 00:33:44.650 --> 00:33:48.190 OK, so we've seen several versions of import, one 00:33:48.190 --> 00:33:52.460 to import into an existing table and one to import into a brand new table. 00:33:52.460 --> 00:33:58.540 Let me ask then what questions we have on how to use .import or how to insert 00:33:58.540 --> 00:34:03.240 more data into our database. 00:34:03.240 --> 00:34:04.430 Let's go [? Louise. ?] 00:34:04.430 --> 00:34:08.330 SPEAKER 6: Can we ask the INSERT INTO command to place 00:34:08.330 --> 00:34:10.387 a column in a specific position? 00:34:10.387 --> 00:34:11.429 SPEAKER 1: Good question. 00:34:11.429 --> 00:34:16.028 Can we ask INSERT INTO the place a column in a certain position? 00:34:16.028 --> 00:34:19.070 There's a few ways of getting [? it-- ?] what I think you're asking here, 00:34:19.070 --> 00:34:20.695 so let me show you a few of those here. 00:34:20.695 --> 00:34:26.389 I'll come back to my computer, and let me pull up the syntax for INSERT INTO 00:34:26.389 --> 00:34:27.210 again. 00:34:27.210 --> 00:34:31.800 I will choose this, and I'll show it on the screen over here. 00:34:31.800 --> 00:34:36.409 So I'll say INSERT INTO some table given some columns. 00:34:36.409 --> 00:34:39.139 And notice how down below, I actually have [? values ?] 00:34:39.139 --> 00:34:41.600 to insert into those columns. 00:34:41.600 --> 00:34:45.120 I could, to your question, rearrange these values. 00:34:45.120 --> 00:34:49.670 So I might have this first value here is some value that goes into this column. 00:34:49.670 --> 00:34:52.159 The second column here is attached to that column. 00:34:52.159 --> 00:34:56.000 And that would rearrange the values I insert into those columns. 00:34:56.000 --> 00:34:59.270 If you're asking, though, if I can reorder the columns, 00:34:59.270 --> 00:35:02.733 that is up to [? CREATE ?] [? TABLE ?] and [? CREATE ?] [? TABLE ?] only. 00:35:02.733 --> 00:35:05.900 And in general, I encourage you not so much out of ordering of your columns, 00:35:05.900 --> 00:35:08.233 [? because ?] [? it ?] could be in any order whatsoever. 00:35:08.233 --> 00:35:10.880 But you could just rearrange your selection of columns 00:35:10.880 --> 00:35:14.780 here to insert the data you want to insert. 00:35:14.780 --> 00:35:17.610 OK, let's take one more question here. 00:35:17.610 --> 00:35:20.390 SPEAKER 7: What happens if one of the rows you're trying to insert 00:35:20.390 --> 00:35:22.340 violates a constraint on the table? 00:35:22.340 --> 00:35:25.140 SPEAKER 1: Yeah, so here we're inserting multiple rows. 00:35:25.140 --> 00:35:28.430 And if one actually violates some constraint, 00:35:28.430 --> 00:35:31.610 then we won't actually insert any of those rows. 00:35:31.610 --> 00:35:34.010 And this is because this insert is bundled inside 00:35:34.010 --> 00:35:38.640 of a transaction, which we'll learn more about later on in the course. 00:35:38.640 --> 00:35:40.233 Let's take another one. 00:35:40.233 --> 00:35:43.400 SPEAKER 8: I noticed that when you did SELECT [? star ?] FROM "collections," 00:35:43.400 --> 00:35:45.740 that one of the acquired dates was just blank. 00:35:45.740 --> 00:35:47.030 It didn't say NULL. 00:35:47.030 --> 00:35:51.290 Is that because the CSV itself just had an empty value next to the comma? 00:35:51.290 --> 00:35:52.790 SPEAKER 1: Yeah, great observations. 00:35:52.790 --> 00:35:53.850 So let me try this again. 00:35:53.850 --> 00:35:55.808 So we can see the results of this [? SELECT. ?] 00:35:55.808 --> 00:36:00.020 I'll come back to my computer here, and let me show you again. 00:36:00.020 --> 00:36:03.140 If I say SELECT [? star ?] FROM "collections" 00:36:03.140 --> 00:36:07.970 semicolon, notice how this acquired column-- this acquired cell here, if I 00:36:07.970 --> 00:36:11.330 go back to my screen, it's just blank. 00:36:11.330 --> 00:36:14.720 But we saw before that if I [? selected ?] [? star, ?] I would have 00:36:14.720 --> 00:36:19.460 seen [? NULL ?] there if this value was truly [? NULL. ?] Well, 00:36:19.460 --> 00:36:24.890 it turns out one downside of importing from a CSV is that all of your data is 00:36:24.890 --> 00:36:27.230 imported initially as text. 00:36:27.230 --> 00:36:30.920 And if I have just a blank cell in my CSV, 00:36:30.920 --> 00:36:34.130 it won't be converted to [? NULL ?] automatically. 00:36:34.130 --> 00:36:36.830 I need to do that myself perhaps using an [? update ?] 00:36:36.830 --> 00:36:38.790 statement that we'll see in just a bit. 00:36:38.790 --> 00:36:42.140 So be wary of this if you do want to keep track of [? null ?] [? values ?] 00:36:42.140 --> 00:36:42.930 and so on. 00:36:42.930 --> 00:36:46.070 If you don't actually manually make this value a [? null, ?] 00:36:46.070 --> 00:36:49.230 it'll appear as just a blank value not a null, 00:36:49.230 --> 00:36:52.770 which is different in this respect. 00:36:52.770 --> 00:36:58.380 OK so here, we've seen how to insert not just one row, but multiple, 00:36:58.380 --> 00:37:01.840 and also how to import data from a CSV. 00:37:01.840 --> 00:37:05.980 When we come back, we'll see how to actually update our data altogether, 00:37:05.980 --> 00:37:07.275 and even delete it too. 00:37:07.275 --> 00:37:09.150 So come back in just a few and talk about how 00:37:09.150 --> 00:37:10.950 to delete data from our tables. 00:37:10.950 --> 00:37:12.690 And we're back. 00:37:12.690 --> 00:37:16.540 So we just saw how to insert some rows into our database, 00:37:16.540 --> 00:37:18.480 and also to import some data. 00:37:18.480 --> 00:37:22.320 But presumably, we also want to be able to delete some data from our table 00:37:22.320 --> 00:37:23.290 as well. 00:37:23.290 --> 00:37:26.130 You can imagine the MFA, the Museum of Fine Arts, 00:37:26.130 --> 00:37:29.460 maybe they're selling a piece of artwork, or maybe they've lost one, 00:37:29.460 --> 00:37:30.750 or maybe it was stolen. 00:37:30.750 --> 00:37:34.980 But either way, they want to remove the row from their table. 00:37:34.980 --> 00:37:36.390 Well, let's see here. 00:37:36.390 --> 00:37:41.100 We can go back to our schema with a table of artifacts and artwork 00:37:41.100 --> 00:37:43.620 that is inside of the MFA. 00:37:43.620 --> 00:37:45.870 Now if I want to delete a particular piece, 00:37:45.870 --> 00:37:47.950 you could visually think of it a bit like this. 00:37:47.950 --> 00:37:51.270 I could first identify that row I want to delete. 00:37:51.270 --> 00:37:52.980 Let's say it's Spring outing here. 00:37:52.980 --> 00:37:54.810 We've sold this piece. 00:37:54.810 --> 00:37:58.210 Well, I could visually just remove this row. 00:37:58.210 --> 00:38:02.460 So it's no longer there and shift the remaining ones up metaphorically. 00:38:02.460 --> 00:38:05.610 And it turns out that to do this in SQL, we 00:38:05.610 --> 00:38:08.220 have our very own statement we can use. 00:38:08.220 --> 00:38:14.830 This one reads DELETE FROM some table WHERE a condition is true. 00:38:14.830 --> 00:38:17.550 So we see our old friend WHERE back again. 00:38:17.550 --> 00:38:20.610 And WHERE is vitally important to this DELETE FROM. 00:38:20.610 --> 00:38:23.910 If I say DELETE FROM table with no WHERE, 00:38:23.910 --> 00:38:26.790 what do you think is going to happen? 00:38:26.790 --> 00:38:29.850 We might drop everything from our table, right? 00:38:29.850 --> 00:38:32.910 But if I instead say WHERE some condition is true, 00:38:32.910 --> 00:38:38.170 I can select the rows I want to delete and only delete those rows. 00:38:38.170 --> 00:38:39.900 So let's try this here. 00:38:39.900 --> 00:38:43.080 I'll try to delete some artwork from our collections table, 00:38:43.080 --> 00:38:45.610 for example, maybe we sold it and we want get rid of it. 00:38:45.610 --> 00:38:51.960 I'll come back to my computer over here, and I will open up our SQLite database. 00:38:51.960 --> 00:38:54.540 I'll say sqlite3 mfa.db. 00:38:54.540 --> 00:38:58.290 And now, I can type SELECT [? star ?] [? FROM ?] "collections," 00:38:58.290 --> 00:39:03.990 and I see a few more items than last time now all inside of our table. 00:39:03.990 --> 00:39:07.650 But per the visual, I want to delete Spring outing, 00:39:07.650 --> 00:39:09.820 to remove it from this table. 00:39:09.820 --> 00:39:10.900 So what could I do? 00:39:10.900 --> 00:39:15.270 I could try DELETE FROM and then the table I want to delete from. 00:39:15.270 --> 00:39:20.400 In this case, I'll use the let's say [? collections' ?] [? name, ?] DELETE 00:39:20.400 --> 00:39:22.320 FROM "collections." 00:39:22.320 --> 00:39:23.620 But I don't want to do this. 00:39:23.620 --> 00:39:25.590 I don't want to say DELETE FROM "collections" 00:39:25.590 --> 00:39:30.780 semicolon end of statement, because then I'll delete every row. 00:39:30.780 --> 00:39:33.540 What I should instead do is this. 00:39:33.540 --> 00:39:37.590 DELETE FROM "collections" WHERE some condition is true. 00:39:37.590 --> 00:39:41.370 Maybe the title in this case is a title I'm trying to remove. 00:39:41.370 --> 00:39:47.430 So I'll say "title" equals "Spring outing," just like this, semicolon, 00:39:47.430 --> 00:39:48.510 Enter. 00:39:48.510 --> 00:39:49.740 Now, nothing seems to happen. 00:39:49.740 --> 00:39:55.140 But if I do SELECT [? star ?] FROM "collections" semicolon, 00:39:55.140 --> 00:39:57.420 we no longer see Spring outing. 00:39:57.420 --> 00:40:02.580 And notice here, how the id column, the id of 3 is now gone. 00:40:02.580 --> 00:40:05.880 If I were to insert some new row here, I would start over 00:40:05.880 --> 00:40:10.110 with the highest number, which is 7 in this case, going from 7 to 8. 00:40:10.110 --> 00:40:14.150 But 3 is no longer part of our database. 00:40:14.150 --> 00:40:18.170 Let me try this again, notice how this one called Imaginative Landscape, 00:40:18.170 --> 00:40:19.940 the acquired date is NULL. 00:40:19.940 --> 00:40:21.813 We don't know when we got it. 00:40:21.813 --> 00:40:23.480 Well, let's say that we eventually sell. 00:40:23.480 --> 00:40:25.010 It's no longer part of our database. 00:40:25.010 --> 00:40:29.810 We could use a condition based on NULL to remove this particular artwork. 00:40:29.810 --> 00:40:30.810 So I'll say this. 00:40:30.810 --> 00:40:36.770 Let me try to DELETE FROM "collections" WHERE "title" equals-- 00:40:36.770 --> 00:40:39.920 not Spring outing, but WHERE the acquired date-- sorry, 00:40:39.920 --> 00:40:44.480 the "acquired" date IS NULL semicolon. 00:40:44.480 --> 00:40:47.490 I could also use is NOT NULL-- 00:40:47.490 --> 00:40:51.350 WHERE "acquired" IS NOT NULL, but that would delete all the pieces of artwork 00:40:51.350 --> 00:40:53.510 that actually have an acquired date. 00:40:53.510 --> 00:40:57.260 Here, I want to leave only those that do not have an acquired date. 00:40:57.260 --> 00:41:00.770 So I'll say where "acquired" IS NULL. 00:41:00.770 --> 00:41:03.320 Now, I'll hit Enter, nothing seems to happen, 00:41:03.320 --> 00:41:07.850 but I'll say Select [? star ?] FROM "collections" semicolon. 00:41:07.850 --> 00:41:12.020 And I'll see that piece of artwork is now gone. 00:41:12.020 --> 00:41:16.820 I only have those that have an acquired date. 00:41:16.820 --> 00:41:22.880 OK, so similar to [? INSERT, ?] we've been able to [? delete ?] one row 00:41:22.880 --> 00:41:23.990 at a time. 00:41:23.990 --> 00:41:28.370 But it would probably be helpful for us to delete multiple rows at a time 00:41:28.370 --> 00:41:29.550 as well. 00:41:29.550 --> 00:41:31.460 Let's look again at our table. 00:41:31.460 --> 00:41:36.200 Here we have our artwork, as it currently stands, and maybe 00:41:36.200 --> 00:41:42.020 let's say we want to delete those pieces that we acquired before 1909. 00:41:42.020 --> 00:41:46.970 Here in Boston, the MFA actually moved locations in 1909 to a new place 00:41:46.970 --> 00:41:49.770 still in Boston, but a brand new building altogether. 00:41:49.770 --> 00:41:53.090 So let's say that perhaps they left some items at their old location, 00:41:53.090 --> 00:41:55.760 they're no longer part of their collections. 00:41:55.760 --> 00:42:02.920 Via what condition do you think could I select these three rows from 00:42:02.920 --> 00:42:08.670 my [? DELETE ?] [? FROM? ?] What could I put in my condition to delete these 00:42:08.670 --> 00:42:09.900 three rows? 00:42:09.900 --> 00:42:11.670 Let me ask the audience here. 00:42:11.670 --> 00:42:15.780 What might you propose for [? delete ?] [? from ?] here to delete those three 00:42:15.780 --> 00:42:16.575 particular rows? 00:42:21.540 --> 00:42:22.650 Let's go to [? Shiv. ?] 00:42:22.650 --> 00:42:26.468 SPEAKER 9: Greater than [? 5. ?] [? id ?] equals greater than 5. 00:42:26.468 --> 00:42:28.010 SPEAKER 1: That's a good observation. 00:42:28.010 --> 00:42:29.468 So we could look at our table here. 00:42:29.468 --> 00:42:33.490 We see the id could be greater than or equal to 5, that would 00:42:33.490 --> 00:42:36.220 remove these three rows certainly. 00:42:36.220 --> 00:42:38.380 There's probably another way to do this too. 00:42:38.380 --> 00:42:42.190 Any other ideas besides the id, what else can we use? 00:42:49.860 --> 00:42:51.710 Let's go to [? Yasavi. ?] 00:42:51.710 --> 00:42:55.070 SPEAKER 10: So we could compare the dates and use greater than or less 00:42:55.070 --> 00:42:58.340 than sign with the acquired dates. 00:42:58.340 --> 00:43:00.800 SPEAKER 1: So you're proposing to use the date column too. 00:43:00.800 --> 00:43:02.120 That makes sense to me as well. 00:43:02.120 --> 00:43:05.300 We could see acquired here, perhaps we could see, 00:43:05.300 --> 00:43:11.100 is the acquired value less than 1909, that year that we move locations, 00:43:11.100 --> 00:43:14.510 we could also probably use the title, the [? accession ?] [? number, ?] 00:43:14.510 --> 00:43:15.390 and so on. 00:43:15.390 --> 00:43:17.600 And each has their own [? trade-offs. ?] 00:43:17.600 --> 00:43:20.030 Remember, though, for this particular query, which 00:43:20.030 --> 00:43:23.480 is deleting those items that were part of the museum 00:43:23.480 --> 00:43:29.210 acquired before 1909, probably best to use the acquired 00:43:29.210 --> 00:43:32.240 date to actually delete those rows. 00:43:32.240 --> 00:43:35.930 What if, for instance, our ids are actually not like this. 00:43:35.930 --> 00:43:38.210 But they're kind of interspersed around. 00:43:38.210 --> 00:43:41.870 I couldn't really do a query like if the id is greater than or equal to 5, 00:43:41.870 --> 00:43:45.110 because I could also include those that were maybe acquired 1911, 00:43:45.110 --> 00:43:48.680 [? 1956, ?] and so on. [? It's ?] [? probably ?] ideal for me to use this 00:43:48.680 --> 00:43:52.140 date if it's the date that I ultimately care about. 00:43:52.140 --> 00:43:53.210 So let's try this. 00:43:53.210 --> 00:43:55.580 I'll go back to my environment. 00:43:55.580 --> 00:43:59.090 And I'll try to delete these three particular rows. 00:43:59.090 --> 00:44:03.750 I'll try DELETE FROM "collections," as we saw before. 00:44:03.750 --> 00:44:05.630 But now here comes my condition. 00:44:05.630 --> 00:44:12.230 I'll say WHERE the "acquired" column is less than some date that I'll give. 00:44:12.230 --> 00:44:17.180 And here, I'll give 1909-01-01. 00:44:17.180 --> 00:44:21.260 It turns out that SQLite has a few ways to represent dates. 00:44:21.260 --> 00:44:26.300 One of which is this format that follows the YYYY or the year, 00:44:26.300 --> 00:44:31.280 four digits, and then dash, and then MM or the month with two digits, 00:44:31.280 --> 00:44:35.340 and then dash again, DD, in this case, the day in two digits. 00:44:35.340 --> 00:44:42.470 So 1909-01-01 means January 1st, 1909. 00:44:42.470 --> 00:44:46.430 And I can use these same operators with dates. 00:44:46.430 --> 00:44:51.230 I could say greater than, or less than, or equal to this particular date. 00:44:51.230 --> 00:44:54.380 And SQLite will be able to parse this for me to understand what I mean, 00:44:54.380 --> 00:44:58.010 which is earlier than 1909. 00:44:58.010 --> 00:44:59.330 So let's try this then. 00:44:59.330 --> 00:45:02.450 I'll hit Enter, and nothing seems to happen. 00:45:02.450 --> 00:45:05.870 But if I do SELECT [? star ?] FROM "collections," 00:45:05.870 --> 00:45:10.820 I now see I'm down to two paintings, two pieces of artwork that 00:45:10.820 --> 00:45:15.720 were acquired only after 1909. 00:45:15.720 --> 00:45:23.190 OK, so given this part on DELETE, we're able to delete one row or even multiple 00:45:23.190 --> 00:45:24.315 rows at a time. 00:45:24.315 --> 00:45:26.190 But there are still instances where you might 00:45:26.190 --> 00:45:31.080 want to think about, should we delete something or can we delete something. 00:45:31.080 --> 00:45:35.010 We'll talk about those in just a minute in terms of constraints. 00:45:35.010 --> 00:45:36.180 We back in a few. 00:45:36.180 --> 00:45:37.860 And we're back. 00:45:37.860 --> 00:45:43.260 So we just saw how to delete one row, even multiple rows. 00:45:43.260 --> 00:45:46.620 We haven't yet talked about whether we should delete some data 00:45:46.620 --> 00:45:49.020 or whether we should delete some data and particularly 00:45:49.020 --> 00:45:51.180 in the context of these constraints, where 00:45:51.180 --> 00:45:53.340 you might have maybe a piece of data you actually 00:45:53.340 --> 00:45:57.460 shouldn't delete to maintain the integrity of your table. 00:45:57.460 --> 00:46:00.870 Now, one example of this is a [? Foreign ?] [? Key ?] 00:46:00.870 --> 00:46:05.880 [? Constraint, ?] which means that you have some table with a primary key that 00:46:05.880 --> 00:46:08.163 is referenced by some other table. 00:46:08.163 --> 00:46:10.080 And we'll get concrete about this in a minute. 00:46:10.080 --> 00:46:13.380 But it might mean that if you were to delete that primary key, 00:46:13.380 --> 00:46:16.210 that other table would have nothing to reference. 00:46:16.210 --> 00:46:20.610 So let's update our schema here for the MFA, the Museum of Fine Arts. 00:46:20.610 --> 00:46:25.140 And let's say that they have not just a collection now, but also artists 00:46:25.140 --> 00:46:25.890 involved. 00:46:25.890 --> 00:46:29.970 And there exists a [? many ?] [? to ?] [? many ?] relationship among artists 00:46:29.970 --> 00:46:32.160 and items in the collection. 00:46:32.160 --> 00:46:36.480 We could say that an artist might make more than one piece of art 00:46:36.480 --> 00:46:37.710 in the collection. 00:46:37.710 --> 00:46:42.480 And a piece of art might be created by more than one artist. 00:46:42.480 --> 00:46:47.050 Now concretely in our database, this might look as follows. 00:46:47.050 --> 00:46:51.480 I could have a table for artists and a table for collections. 00:46:51.480 --> 00:46:54.780 And notice how each has their own primary key. 00:46:54.780 --> 00:46:58.110 Artists has an id column, called our primary key column. 00:46:58.110 --> 00:47:02.400 Collections too has an id called our primary key column here. 00:47:02.400 --> 00:47:06.420 Now, in the middle is this table created that symbolizes, 00:47:06.420 --> 00:47:11.280 that represents the relationship among artists and items in the collection. 00:47:11.280 --> 00:47:16.380 Here on the first row, we see the artist id of 1 created the piece of art 00:47:16.380 --> 00:47:18.750 with the collection id of 2. 00:47:18.750 --> 00:47:20.830 So who did what here? 00:47:20.830 --> 00:47:25.200 Well, we can see that the artist with the ID of 1 is Li Yin. 00:47:25.200 --> 00:47:31.740 Li Yin created this piece of art, the id of 2, which is Imaginative landscape. 00:47:31.740 --> 00:47:35.340 So we can kind of relate in this case artists with collections 00:47:35.340 --> 00:47:37.890 as we saw just a few weeks ago. 00:47:37.890 --> 00:47:41.880 But let's say we decide to delete a particular artist, 00:47:41.880 --> 00:47:45.510 maybe we delete Unidentified artists down here. 00:47:45.510 --> 00:47:48.100 Well, we could just delete from the artist table. 00:47:48.100 --> 00:47:53.940 Maybe we find a condition to select this row and we delete from artists. 00:47:53.940 --> 00:47:56.610 We'll do that here remove that row. 00:47:56.610 --> 00:47:59.520 But what have we done wrong? 00:47:59.520 --> 00:48:03.560 If we look at our created table in particular, 00:48:03.560 --> 00:48:06.350 what have we now done wrong? 00:48:06.350 --> 00:48:13.860 What kind of problem might we [? run ?] [? into? ?] Let's go to [? Karim. ?] 00:48:13.860 --> 00:48:17.315 SPEAKER 11: [? Yeah ?] [INAUDIBLE] [? was ?] [? only ?] [? deleted ?] 00:48:17.315 --> 00:48:19.940 [? from ?] [? the ?] [? artist ?] [INAUDIBLE] without the other 00:48:19.940 --> 00:48:20.753 [INAUDIBLE]. 00:48:20.753 --> 00:48:21.920 SPEAKER 1: Yeah, good point. 00:48:21.920 --> 00:48:27.650 We only deleted the artist with the id of 3 from the artists table. 00:48:27.650 --> 00:48:34.343 Now if I look at the created table and I look for the artist with id of 3, 00:48:34.343 --> 00:48:35.010 do you see them? 00:48:35.010 --> 00:48:36.190 They don't exist anymore. 00:48:36.190 --> 00:48:38.190 So I can't understand this relationship. 00:48:38.190 --> 00:48:42.120 I don't know the artist with the id of 3. 00:48:42.120 --> 00:48:47.790 So let's try this now with a new kind of schema now in our database. 00:48:47.790 --> 00:48:50.280 I'll go back to my computer. 00:48:50.280 --> 00:48:53.850 And let me open up our new version of our database. 00:48:53.850 --> 00:49:01.440 I'll type sqlite3, in this case, mfa.db, same name, but now a different schema. 00:49:01.440 --> 00:49:05.640 If I type .schema, notice a few things here. 00:49:05.640 --> 00:49:09.090 I have those same tables we saw visually. 00:49:09.090 --> 00:49:13.740 I have collections, artists, and a created table. 00:49:13.740 --> 00:49:16.680 And [? let me ?] focus in particular on this created table. 00:49:16.680 --> 00:49:21.090 If I say .schema, in this case, created, Enter, 00:49:21.090 --> 00:49:26.610 I'll see I have a FOREIGN KEY constraint on my "artist_id" 00:49:26.610 --> 00:49:30.040 column and my "collection_id" column. 00:49:30.040 --> 00:49:36.300 So if I tried to delete an artist that is referenced in my created table, 00:49:36.300 --> 00:49:39.990 I would probably raise a constraint error-- 00:49:39.990 --> 00:49:42.370 a FOREIGN KEY [? constraint ?] error. 00:49:42.370 --> 00:49:45.670 So let's just try this and see what happens. 00:49:45.670 --> 00:49:50.550 Let me try DELETE FROM the [? artist's ?] [? table, ?] DELETE FROM 00:49:50.550 --> 00:49:59.060 "artists" WHERE, let's say the "name" of the artist is literally "Unidentified 00:49:59.060 --> 00:49:59.940 artist." 00:49:59.940 --> 00:50:03.560 This is the name they have in the Museum of Fine Arts collections-- 00:50:03.560 --> 00:50:05.270 Unidentified artist. 00:50:05.270 --> 00:50:08.270 So now I want to delete them from that table. 00:50:08.270 --> 00:50:14.130 Well, if I hit Enter, I do get that FOREIGN KEY constraint error. 00:50:14.130 --> 00:50:19.610 So does this mean I just can't delete this artist or is there a workaround? 00:50:19.610 --> 00:50:21.290 Let's look at it visually again. 00:50:21.290 --> 00:50:24.740 I'll go back to what we had before as our table, 00:50:24.740 --> 00:50:27.620 and let me ask the audience here. 00:50:27.620 --> 00:50:29.210 What's the solution? 00:50:29.210 --> 00:50:33.140 If I can't delete Unidentified artist, because they 00:50:33.140 --> 00:50:38.300 have an id referenced by this table, what should I maybe do first? 00:50:38.300 --> 00:50:39.695 What should I do instead? 00:50:39.695 --> 00:50:41.570 SPEAKER 12: Yeah, I want to say that first we 00:50:41.570 --> 00:50:46.160 can delete the id that is being referenced by this [? id ?] 00:50:46.160 --> 00:50:48.740 that we want to delete, then we can delete it. 00:50:48.740 --> 00:50:51.170 In fact, if we want to delete it from the artists, 00:50:51.170 --> 00:50:54.950 first we have to delete from the created table [INAUDIBLE] table, which 00:50:54.950 --> 00:50:58.070 it is referencing, then we can delete from the artist [INAUDIBLE] 00:50:58.070 --> 00:51:00.060 SPEAKER 1: Yeah, I like your thinking here. 00:51:00.060 --> 00:51:02.480 So if we look at the created table, you notice 00:51:02.480 --> 00:51:05.510 that we can't delete Unidentified artist, because it's 00:51:05.510 --> 00:51:07.430 referenced in the created table. 00:51:07.430 --> 00:51:12.230 What we should maybe first do is delete this row delete their affiliation 00:51:12.230 --> 00:51:16.100 with their work, and then delete the artist so we don't 00:51:16.100 --> 00:51:18.900 run into this FOREIGN KEY constraint. 00:51:18.900 --> 00:51:22.520 So let's try this in our SQLite environment. 00:51:22.520 --> 00:51:24.780 Come back to my computer. 00:51:24.780 --> 00:51:30.320 And again, our plan was to first delete the artists affiliation with their work 00:51:30.320 --> 00:51:32.210 from the created table. 00:51:32.210 --> 00:51:34.460 So let me show you first the created table. 00:51:34.460 --> 00:51:38.240 I'll say SELECT [? star ?] from "created," 00:51:38.240 --> 00:51:42.080 and see how it's exactly like the table we just visualized. 00:51:42.080 --> 00:51:46.820 Now, let me try delete this artist affiliation with their work. 00:51:46.820 --> 00:51:48.870 I'm not deleting the artist. 00:51:48.870 --> 00:51:52.280 I'm deleting their affiliation with their authorship, 00:51:52.280 --> 00:51:55.710 their [? artist ?] [? ship, ?] whatever you want to call it for their work. 00:51:55.710 --> 00:51:59.810 Now, I'll say DELETE FROM-- 00:51:59.810 --> 00:52:07.310 let's say, DELETE FROM "artists" WHERE the "artist_id" is equal to 00:52:07.310 --> 00:52:10.600 or equals 3. 00:52:10.600 --> 00:52:12.770 Well, I could do this. 00:52:12.770 --> 00:52:16.840 But there's probably a better way, like I know the id, and that's fine. 00:52:16.840 --> 00:52:19.370 But we could also use, as we saw a little bit ago, 00:52:19.370 --> 00:52:22.510 which were [? subqueries, ?] a way to write a query that 00:52:22.510 --> 00:52:25.960 returns me the result. And that result gets included in another query 00:52:25.960 --> 00:52:26.840 altogether. 00:52:26.840 --> 00:52:28.270 So let's try that instead. 00:52:28.270 --> 00:52:31.030 Let's say I don't know the artist_id. 00:52:31.030 --> 00:52:33.130 What I could do is make a [? subquery. ?] 00:52:33.130 --> 00:52:39.430 I could say inside of this, let me select the "id" FROM the "artists" 00:52:39.430 --> 00:52:48.550 table WHERE, in this case, the "name" equals "Unidentified artist," and then, 00:52:48.550 --> 00:52:51.130 let me close out this query, hit a semicolon. 00:52:51.130 --> 00:52:54.340 And I already see a typo in this, but I'm just going to try it anyway. 00:52:54.340 --> 00:52:57.040 I will go ahead, and hit Enter. 00:52:57.040 --> 00:53:03.700 And let's see if I type SELECT [? star ?] FROM "artists," Enter, 00:53:03.700 --> 00:53:06.550 I still see Unidentified artist. 00:53:06.550 --> 00:53:08.480 And I think my typo here was as follows. 00:53:08.480 --> 00:53:11.050 If I up my query using the up arrow on my keyboard, 00:53:11.050 --> 00:53:16.300 I see DELETE from "artists," where the "artist_id" equals some value here. 00:53:16.300 --> 00:53:19.270 But do I want to delete from artists? 00:53:19.270 --> 00:53:19.990 I don't. 00:53:19.990 --> 00:53:21.610 I actually can't delete from artists. 00:53:21.610 --> 00:53:25.340 What I should do instead is delete from the created table. 00:53:25.340 --> 00:53:26.350 So let's try this. 00:53:26.350 --> 00:53:30.640 I'll say DELETE FROM "created," making sure I only 00:53:30.640 --> 00:53:33.370 have double quotes around created, then I'll 00:53:33.370 --> 00:53:38.470 say WHERE the artist_id equals, let's say, either 3-- 00:53:38.470 --> 00:53:41.530 but it could also be the result of this [? subquery. ?] 00:53:41.530 --> 00:53:47.410 I'll say 1, 2, 3, 4, and then SELECT the "id" 00:53:47.410 --> 00:53:55.880 FROM the "artist" table WHERE the "name" equals "Unidentified artist." 00:53:55.880 --> 00:54:00.230 Then I'll close this [? subquery, ?] hitting semicolon, hit Enter. 00:54:00.230 --> 00:54:05.750 And now, if I say SELECT [? star ?] FROM "created" semicolon, 00:54:05.750 --> 00:54:09.500 I should see that this artist's affiliation with their work 00:54:09.500 --> 00:54:11.600 no longer exists. 00:54:11.600 --> 00:54:13.110 And because it no longer exists. 00:54:13.110 --> 00:54:15.990 I can now delete them from the artists table. 00:54:15.990 --> 00:54:20.210 I don't have this FOREIGN KEY artist_id referencing 00:54:20.210 --> 00:54:22.520 the primary key of this artist. 00:54:22.520 --> 00:54:23.900 So let's now try that. 00:54:23.900 --> 00:54:26.960 I could say SELECT [? star ?] FROM "artists," 00:54:26.960 --> 00:54:30.650 I see that Unidentified artist is still in here. 00:54:30.650 --> 00:54:32.510 But let's delete them now. 00:54:32.510 --> 00:54:36.290 I'll say delete from, let's say, "artists" WHERE 00:54:36.290 --> 00:54:42.620 the "name" equals "Unidentified artist," closing my quotes, semicolon, hit 00:54:42.620 --> 00:54:43.280 Enter. 00:54:43.280 --> 00:54:47.780 And now if I Select [? star ?] FROM "artists" semicolon, 00:54:47.780 --> 00:54:52.520 I should see they're no longer in this table to our earlier example. 00:54:52.520 --> 00:54:54.750 This becomes a 2-step process. 00:54:54.750 --> 00:54:59.480 First delete their affiliation, then delete their name. 00:54:59.480 --> 00:55:07.910 OK, So let's try this yet again, but now using some additional tools 00:55:07.910 --> 00:55:10.140 that we have at our disposal. 00:55:10.140 --> 00:55:16.860 So let's look back at this FOREIGN KEY constraint that existed in this table. 00:55:16.860 --> 00:55:19.220 If you remember, it looked a bit like this. 00:55:19.220 --> 00:55:21.980 In my created table, I had this line. 00:55:21.980 --> 00:55:28.220 FOREIGN KEY, artist_id, REFERENCES the "id" column in "artists." 00:55:28.220 --> 00:55:34.430 Well, if that is the case, I can't delete the artist with the id 00:55:34.430 --> 00:55:39.170 referenced by this [? foreign ?] [? key. ?] But it turns out that I 00:55:39.170 --> 00:55:44.390 could specify some alternative action that happens when I try to delete 00:55:44.390 --> 00:55:48.590 the id that is referenced by this [? foreign ?] [? key. ?] I can specify 00:55:48.590 --> 00:55:52.040 that using a keyword called ON DELETE. 00:55:52.040 --> 00:55:54.860 So after FOREIGN KEY REFERENCES [? "artist id," ?] I could say 00:55:54.860 --> 00:55:59.600 afterwards ON DELETE, and then specify some action I want to actually happen 00:55:59.600 --> 00:56:03.500 when I tried to delete the primary key referenced by this [? foreign ?] 00:56:03.500 --> 00:56:04.430 [? key. ?] 00:56:04.430 --> 00:56:08.000 One thing I could use would be RESTRICT, which 00:56:08.000 --> 00:56:09.980 is kind of like what we saw before. 00:56:09.980 --> 00:56:14.600 If I try to delete a primary key is [? referenced ?] by this [? foreign ?] 00:56:14.600 --> 00:56:17.090 [? key, ?] I will not be allowed to do it. 00:56:17.090 --> 00:56:19.720 That action is restricted. 00:56:19.720 --> 00:56:23.410 I could also decide to take NO ACTION. 00:56:23.410 --> 00:56:28.150 In this case, I could delete the primary key [? referenced ?] by this 00:56:28.150 --> 00:56:30.560 [? foreign ?] [? key, ?] and nothing would happen. 00:56:30.560 --> 00:56:35.290 I would actually be allowed to do that, which may be unwise in some cases, 00:56:35.290 --> 00:56:38.050 but I could give myself that power. 00:56:38.050 --> 00:56:41.590 I could also decide to SET NULL. 00:56:41.590 --> 00:56:46.330 That is if I delete the primary key that is [? referenced ?] by this 00:56:46.330 --> 00:56:49.690 [? foreign ?] [? key, ?] what I'll do is set the [? foreign ?] [? key ?] to be 00:56:49.690 --> 00:56:53.657 NULL, meaning that value no longer exists. 00:56:53.657 --> 00:56:56.740 I can alternatively [? set ?] it to a [? default ?] value for that column. 00:56:56.740 --> 00:57:02.770 Or perhaps, most compellingly, I could try to CASCADE the deletion. 00:57:02.770 --> 00:57:06.590 Where CASCADE means if I delete the artist, for instance, 00:57:06.590 --> 00:57:11.590 let's also delete their affiliation with their artwork all in one go. 00:57:11.590 --> 00:57:16.430 This converts our 2-step process into a 1-step process. 00:57:16.430 --> 00:57:18.710 So let's visualize this. 00:57:18.710 --> 00:57:24.040 Let's say we have now applied this constraint ON DELETE CASCADE. 00:57:24.040 --> 00:57:27.850 So if I delete the artist, I'll also delete their affiliation 00:57:27.850 --> 00:57:29.110 with their work. 00:57:29.110 --> 00:57:30.610 We'll have the same two tables-- 00:57:30.610 --> 00:57:33.610 artists, collections, and created. 00:57:33.610 --> 00:57:39.430 Now again, artist_id references this primary key in artists. 00:57:39.430 --> 00:57:40.510 Now, I'm going to try. 00:57:40.510 --> 00:57:43.550 I want to delete Unidentified artist here. 00:57:43.550 --> 00:57:44.510 Well, I could do that. 00:57:44.510 --> 00:57:45.940 I could just delete their row. 00:57:45.940 --> 00:57:50.440 And now, instead of a FOREIGN KEY constraint error, what I get 00:57:50.440 --> 00:57:51.490 is the following. 00:57:51.490 --> 00:57:55.690 [INAUDIBLE] [? created ?] that this row is also gone. 00:57:55.690 --> 00:57:59.960 The row that had the artist_id of 3 gets removed. 00:57:59.960 --> 00:58:04.960 We've cascaded the deletion onward to the created table. 00:58:04.960 --> 00:58:08.760 So let's try this now with a new database schema. 00:58:08.760 --> 00:58:10.590 I'll go back to my computer here. 00:58:14.497 --> 00:58:16.080 So I'm back in my [? TERMINAL ?] here. 00:58:16.080 --> 00:58:21.180 And I can type sqlite3 mfa.db to reopen this database. 00:58:21.180 --> 00:58:26.160 And notice how if I type .schema, I've updated this schema to now have ON 00:58:26.160 --> 00:58:27.150 DELETE CASCADE. 00:58:27.150 --> 00:58:29.130 Let me show you over here on the screen. 00:58:29.130 --> 00:58:34.290 Here on the created table, I now have the very same kind of table schema, 00:58:34.290 --> 00:58:38.610 but now my only difference is I've applied this ON DELETE action 00:58:38.610 --> 00:58:40.560 to my [? foreign ?] [? key ?] constraints. 00:58:40.560 --> 00:58:42.870 In particular, I'm going to [? cascade ?] the delete 00:58:42.870 --> 00:58:46.570 from the artist's table to the created table. 00:58:46.570 --> 00:58:49.110 So let's try that out in SQLite. 00:58:49.110 --> 00:58:52.770 I'll come back over here, and I will now try just 00:58:52.770 --> 00:59:01.440 DELETE FROM the "artists" table WHERE the "name" equals "Unidentified artist. 00:59:01.440 --> 00:59:02.430 semicolon. 00:59:02.430 --> 00:59:08.280 Now, I'll hit Enter, and I don't get a FOREIGN KEY constraint anymore. 00:59:08.280 --> 00:59:12.780 But if I say SELECT [? star ?] FROM "created" semicolon, 00:59:12.780 --> 00:59:18.130 notice how I've also deleted the artist's affiliation with their work. 00:59:18.130 --> 00:59:21.960 So that is wherever in the artist_id column, 00:59:21.960 --> 00:59:27.940 I saw the id for the artist I deleted, I would to delete that row, 00:59:27.940 --> 00:59:34.290 so I have no references to that primary key, which is now gone from my table. 00:59:34.290 --> 00:59:36.340 So let me ask here. 00:59:36.340 --> 00:59:41.610 We've seen how to delete single rows, how to delete multiple rows, 00:59:41.610 --> 00:59:44.595 and now how to delete data among some constraints, 00:59:44.595 --> 00:59:46.470 like our [? foreign ?] [? key ?] constraints. 00:59:46.470 --> 00:59:51.630 What questions do we have on those techniques? 00:59:51.630 --> 00:59:52.650 Let's go to [? Han. ?] 00:59:52.650 --> 00:59:55.740 SPEAKER 13: I have a question regarding deleting. 00:59:55.740 --> 01:00:01.290 The [? id ?] numbers have been removed while we were removing the record. 01:00:01.290 --> 01:00:03.900 I was wondering if do we have to clean it up somehow 01:00:03.900 --> 01:00:07.917 later or will they be populated with the new data as it comes along? 01:00:07.917 --> 01:00:09.250 SPEAKER 1: Yeah, great question. 01:00:09.250 --> 01:00:13.690 So the question is, what happens to our primary keys when we delete our data. 01:00:13.690 --> 01:00:15.900 So for this one, let me show you the visual 01:00:15.900 --> 01:00:19.680 again that we had before of our tables nicely printed on the side. 01:00:19.680 --> 01:00:26.790 So I'll come back here, and I will go back to our idea of these joint tables 01:00:26.790 --> 01:00:31.440 where we had an artists table, a collections table, and a created table. 01:00:31.440 --> 01:00:33.330 And we saw before that we're going to delete 01:00:33.330 --> 01:00:36.000 this artist called Unidentified artist. 01:00:36.000 --> 01:00:40.290 So I'll delete them, and to your point, [? well ?] the idea of 3 01:00:40.290 --> 01:00:43.410 no longer exists in this case. 01:00:43.410 --> 01:00:50.100 Now by default, at least in SQLite, if I insert a new row, what I'll do 01:00:50.100 --> 01:00:53.940 is take the highest id value and I will then 01:00:53.940 --> 01:00:58.440 make that the new primary key for the new row that I insert. 01:00:58.440 --> 01:01:01.140 That is the default situation. 01:01:01.140 --> 01:01:04.170 I could if I wanted to get more specific, 01:01:04.170 --> 01:01:11.550 and if I had done in my id integer column here in my [? create ?] 01:01:11.550 --> 01:01:14.700 [? table, ?] if I had also said this [? keyword ?] [? call ?] 01:01:14.700 --> 01:01:19.680 [? autoincrement, ?] all one word, what would happen instead is I would 01:01:19.680 --> 01:01:23.710 actually reinsert an ID that is not used. 01:01:23.710 --> 01:01:28.090 So in this case 3 is not used, I could insert that one here. 01:01:28.090 --> 01:01:30.360 So up to you what you want to do. 01:01:30.360 --> 01:01:35.670 In general, SQlite by default will take the highest ID, add one from there. 01:01:35.670 --> 01:01:38.940 If you specify this constraint [? called ?] [? autoincrement, ?] you 01:01:38.940 --> 01:01:44.310 will instead take whatever id is not being used and use that in your insert 01:01:44.310 --> 01:01:46.460 afterwards. 01:01:46.460 --> 01:01:51.530 OK, so now we've seen how to insert, and how to delete data. 01:01:51.530 --> 01:01:55.560 But of course, we make mistakes when we add data or even when we delete data. 01:01:55.560 --> 01:01:58.580 So we'll see in just a minute how to update our values 01:01:58.580 --> 01:02:01.910 as well to correct typos and even to update associations 01:02:01.910 --> 01:02:04.790 between artists and artwork. 01:02:04.790 --> 01:02:06.510 And we're back. 01:02:06.510 --> 01:02:10.250 So we've so far seen how to insert some data into our tables 01:02:10.250 --> 01:02:12.080 and how to delete it. 01:02:12.080 --> 01:02:15.470 But sometimes, we don't want to fully delete something, 01:02:15.470 --> 01:02:18.500 we just want to change its value to correct some typo 01:02:18.500 --> 01:02:20.480 or correct some association. 01:02:20.480 --> 01:02:24.950 So let's think to our MFA example, [? where ?] the Museum of Fine Arts 01:02:24.950 --> 01:02:27.270 has some tables that look like this. 01:02:27.270 --> 01:02:30.050 They have artists in their collection, and they also 01:02:30.050 --> 01:02:32.180 have art work in their collection. 01:02:32.180 --> 01:02:36.950 They also have a created table to associate artists with their artwork. 01:02:36.950 --> 01:02:43.190 Now in this case, we know that I have this Unidentified artist. 01:02:43.190 --> 01:02:47.420 And we can see that they authored this item in the collections, 01:02:47.420 --> 01:02:50.000 "Farmers working at dawn." 01:02:50.000 --> 01:02:55.730 But let's say that later on, we find out it wasn't an Unidentified artist. 01:02:55.730 --> 01:03:00.020 It was instead Li Yin, who created "Farmers working at dawn." 01:03:00.020 --> 01:03:06.350 How could we update our tables to make sure it's Li Yin, who we have 01:03:06.350 --> 01:03:10.660 creating "Farmers working at dawn." 01:03:10.660 --> 01:03:15.154 SPEAKER 14: So what we could do is-- 01:03:15.154 --> 01:03:20.690 we know that in created, we have our artist_id and [? collection id, ?] 01:03:20.690 --> 01:03:23.920 so we could update it there. 01:03:23.920 --> 01:03:28.840 But somehow, we need to cascade it over to the [? artists ?] [INAUDIBLE] now 01:03:28.840 --> 01:03:32.380 this unidentified [? artist ?] [? through-- ?] I think we probably 01:03:32.380 --> 01:03:33.107 don't need it. 01:03:33.107 --> 01:03:34.690 SPEAKER 1: Yeah, so a good point here. 01:03:34.690 --> 01:03:39.250 We could actually probably change the created table to [? re-associate ?] 01:03:39.250 --> 01:03:41.750 an artist with some new artwork here. 01:03:41.750 --> 01:03:42.880 So let's visualize this. 01:03:42.880 --> 01:03:44.540 I go to my created table. 01:03:44.540 --> 01:03:51.010 And here the artist_id is currently 3, but I want it really to be 1. 01:03:51.010 --> 01:03:55.870 I want Li Yin to be associated with this piece called "Farmers working at dawn." 01:03:55.870 --> 01:04:01.600 So I could update the created table to instead of having 3 here have 1. 01:04:01.600 --> 01:04:06.220 Now, we see Li Yin created "Farmers working at dawn." [? And if ?] 01:04:06.220 --> 01:04:08.830 you're concerned about this Unidentified artist here, 01:04:08.830 --> 01:04:11.380 I think it's OK to have an artist in our table 01:04:11.380 --> 01:04:14.350 who may or may not have an item in collections. 01:04:14.350 --> 01:04:17.240 We'll say that's OK at least for now. 01:04:17.240 --> 01:04:22.270 So if we can update our associations between artists and collections 01:04:22.270 --> 01:04:26.770 like this, let's actually try to do that in our very own database here. 01:04:26.770 --> 01:04:33.010 I'll go back to my SQLite environment, and now, let me try to open it first. 01:04:33.010 --> 01:04:35.560 I'll use sqlite3 mfa.db. 01:04:35.560 --> 01:04:39.460 And now let me type .schema to show you [INAUDIBLE] the very same schema from 01:04:39.460 --> 01:04:40.250 before. 01:04:40.250 --> 01:04:45.340 So I want to update the artists association between Li Yin and "Farmers 01:04:45.340 --> 01:04:46.570 working at dawn." 01:04:46.570 --> 01:04:51.370 So let's say, I'll SELECT [? star ?] FROM "created," like this, semicolon. 01:04:51.370 --> 01:04:55.990 Here, I have my artist_ids and my collection_ids. 01:04:55.990 --> 01:05:00.340 I see that we have the Unidentified artist, id of 3, 01:05:00.340 --> 01:05:05.690 creating "Farmers working at dawn," with the id of 1 in our collection here. 01:05:05.690 --> 01:05:11.170 So now, let's try to update the artist who created this particular painting. 01:05:11.170 --> 01:05:16.660 Here, I have 3 associated with 1, but I want 1 associated with 1, 01:05:16.660 --> 01:05:18.580 where 1 is the id for Li Yin. 01:05:18.580 --> 01:05:22.858 And [? 1 ?] [? in ?] my collection id is the id for this artwork here. 01:05:22.858 --> 01:05:23.650 So let me try this. 01:05:23.650 --> 01:05:29.620 I'll say UPDATE "created" and SET, let's say, "artist_id" 01:05:29.620 --> 01:05:32.770 equal to some particular value. 01:05:32.770 --> 01:05:35.650 Well, what value should I set it to? 01:05:35.650 --> 01:05:39.940 I could try to set it to Li Yin's id, which we know is just 1. 01:05:39.940 --> 01:05:43.300 But let me try instead to use a [? subquery ?] here. 01:05:43.300 --> 01:05:49.120 Let me try to say parentheses, and then write some query to update this value. 01:05:49.120 --> 01:05:55.300 I'll say SELECT "id" from "artists" like this Enter again, 01:05:55.300 --> 01:06:00.190 WHERE the "name" equals Li Yin. 01:06:00.190 --> 01:06:04.090 And then let me close this [? subquery. ?] Well, 01:06:04.090 --> 01:06:10.510 if I try to run this query, what might Happen I'm updating the created table, 01:06:10.510 --> 01:06:16.600 I'm setting the artist_id equal to the id for Li Yin, but what I've forgotten 01:06:16.600 --> 01:06:21.520 is this WHERE to only choose some rows to update. 01:06:21.520 --> 01:06:23.200 [INAUDIBLE] not close it yet. 01:06:23.200 --> 01:06:29.140 I'll instead say WHERE, in this case, the "collection_id," the piece 01:06:29.140 --> 01:06:34.120 of artwork in our collection, is equal to [? while ?] the id 01:06:34.120 --> 01:06:35.330 for this painting. 01:06:35.330 --> 01:06:39.100 I'll say SELECT "id" from "collections," and then I'll 01:06:39.100 --> 01:06:45.710 say WHERE the "title" equals "Farmers working at dawn." 01:06:45.710 --> 01:06:49.700 Now, I'll close this [? subquery, ?] and hit semicolon. 01:06:49.700 --> 01:06:53.600 And here, we've seen our first example of an UPDATE query. 01:06:53.600 --> 01:06:56.510 I'm trying to update the artist_id column 01:06:56.510 --> 01:06:59.990 in created to be the id for Li Yin. 01:06:59.990 --> 01:07:03.050 I only want to do that though on the row where 01:07:03.050 --> 01:07:07.460 collection_id is equal to the id for this particular painting. 01:07:07.460 --> 01:07:09.470 [INAUDIBLE] change the attribution for. 01:07:09.470 --> 01:07:11.360 So now, I'll hit Enter. 01:07:11.360 --> 01:07:14.370 And if I SELECT [? star ?] FROM "created," 01:07:14.370 --> 01:07:20.270 I should hopefully see in this case that the artist_id associated 01:07:20.270 --> 01:07:25.680 with this painting is 2 and also down below here is 1 as well. 01:07:25.680 --> 01:07:31.290 So I have Li Yin associated with now two paintings overall. 01:07:31.290 --> 01:07:34.760 So let's get a grasp on what this [? UPDATE ?] syntax really 01:07:34.760 --> 01:07:36.140 looks like in general. 01:07:36.140 --> 01:07:38.720 And for that, let's show the slide here. 01:07:38.720 --> 01:07:44.090 We have this [? UPDATE ?] [? keyword, ?] [? UPDATE ?] statement in SQL to take 01:07:44.090 --> 01:07:47.510 a table name and update the columns inside of it. 01:07:47.510 --> 01:07:51.410 I say UPDATE, then the name of the table I want to update, 01:07:51.410 --> 01:07:56.810 then I say SET some column equal to some value. 01:07:56.810 --> 01:07:59.910 I could if I wanted to have more than one column here. 01:07:59.910 --> 01:08:03.620 I could say maybe title and even maybe if we're 01:08:03.620 --> 01:08:07.040 talking about authors, authors over here, or even acquired date. 01:08:07.040 --> 01:08:11.930 I could update more than one column in my [? SET ?] [? portion ?] here. 01:08:11.930 --> 01:08:16.370 Then comes this WHERE portion, where some condition is true. 01:08:16.370 --> 01:08:19.970 I want to make sure I don't update all of my rows. 01:08:19.970 --> 01:08:24.270 I only update those where some condition is actually true. 01:08:24.270 --> 01:08:28.069 So this is your syntax for updating some columns. 01:08:28.069 --> 01:08:30.859 Let's say, if you want to change an artist's attribution 01:08:30.859 --> 01:08:34.950 or if you want to change a typo you've made. 01:08:34.950 --> 01:08:38.810 So let's see this now, not just in terms of changing 01:08:38.810 --> 01:08:42.394 artists and their attributions, let's see a use case for update 01:08:42.394 --> 01:08:45.300 where we've made some mistakes in our data. 01:08:45.300 --> 01:08:49.609 And let's say the museum decides to host some kind of event 01:08:49.609 --> 01:08:52.160 where people vote on their favorite piece of artwork, 01:08:52.160 --> 01:08:54.920 they kind of [? handwrite ?] or type it into some online form. 01:08:54.920 --> 01:08:58.010 Well, when I get back, a CSV of those responses, 01:08:58.010 --> 01:09:02.600 some Comma Separated Values, one line for each vote from our people 01:09:02.600 --> 01:09:04.640 who've attended this convention. 01:09:04.640 --> 01:09:08.479 Let's go back over here, and I'll show you that CSV. 01:09:08.479 --> 01:09:11.060 Let me go to my environment. 01:09:11.060 --> 01:09:16.939 And I'll type code mfa.csv to open this CSV that I already have. 01:09:16.939 --> 01:09:19.430 And here, actually, it's not called mfa.csv. 01:09:19.430 --> 01:09:22.430 It's called code votes.csv. 01:09:22.430 --> 01:09:27.890 And now here we can see, I have a table of one column 01:09:27.890 --> 01:09:29.870 that has several votes inside of it. 01:09:29.870 --> 01:09:33.140 Let's see, we have maybe 20 votes to be exact. 01:09:33.140 --> 01:09:35.840 So the first row is the header row. 01:09:35.840 --> 01:09:39.649 I have in this CSV, one column called title. 01:09:39.649 --> 01:09:44.029 And each line here is one vote for a [? museum-goers ?] favorite piece 01:09:44.029 --> 01:09:44.750 of artwork. 01:09:44.750 --> 01:09:48.319 We see "Farmers working at dawn," we see "Imaginative landscape," 01:09:48.319 --> 01:09:49.790 "Profusion of flowers." 01:09:49.790 --> 01:09:55.280 And our goal is to count up these votes to see which is the most popular. 01:09:55.280 --> 01:10:01.670 Well, let's try using .import again to actually turn this CSV into our very 01:10:01.670 --> 01:10:03.530 own SQLite database. 01:10:03.530 --> 01:10:08.240 Let me go back to my [? TERMINAL, ?] and I'll type sqlite3 votes.db 01:10:08.240 --> 01:10:10.580 to create a votes database. 01:10:10.580 --> 01:10:11.630 I'll hit Enter. 01:10:11.630 --> 01:10:14.390 And now, I can use .import. 01:10:14.390 --> 01:10:21.440 I could say .import a CSV called votes.csv into a table also called 01:10:21.440 --> 01:10:22.370 votes. 01:10:22.370 --> 01:10:23.390 I'll Enter now. 01:10:23.390 --> 01:10:28.280 And now if I SELECT [? star ?] FROM "votes" semicolon, 01:10:28.280 --> 01:10:33.210 I'll see all of my votes now in a single table. 01:10:33.210 --> 01:10:35.040 So I want to tally these votes. 01:10:35.040 --> 01:10:37.790 And we could do that using a technique we learned back in the week 01:10:37.790 --> 01:10:40.010 on querying and relating. 01:10:40.010 --> 01:10:44.930 What if I wanted to group by my titles and count up those groups, 01:10:44.930 --> 01:10:45.990 I could do that. 01:10:45.990 --> 01:10:50.990 I could say SELECT, let's say, "title," and also the COUNT of "title," 01:10:50.990 --> 01:10:55.530 and [? they ?] count up the votes for each title FROM, in this case, 01:10:55.530 --> 01:10:58.050 my "votes" table like this. 01:10:58.050 --> 01:11:03.510 Now, I want to count not just each vote individually, but I want to group them. 01:11:03.510 --> 01:11:07.870 I want to count votes by group where that group is, in this case, 01:11:07.870 --> 01:11:09.610 the "title" itself. 01:11:09.610 --> 01:11:13.110 So let me try running this query and hitting Enter. 01:11:13.110 --> 01:11:17.370 Well, [? I'll ?] [? see ?] I've started to count some votes, 01:11:17.370 --> 01:11:23.130 like I have Imaginative landscape here, Spring outing, Farmers working at dawn, 01:11:23.130 --> 01:11:25.110 Spring outing down below here too. 01:11:25.110 --> 01:11:28.950 But I seem to be missing something, like I shouldn't have this many groups. 01:11:28.950 --> 01:11:30.570 I should only have four groups. 01:11:30.570 --> 01:11:32.940 Why do I have so many? 01:11:32.940 --> 01:11:35.010 Well, it seems like there are some typos here. 01:11:35.010 --> 01:11:37.660 What typos do you see? 01:11:37.660 --> 01:11:40.680 Feel free to raise your hand. 01:11:40.680 --> 01:11:43.920 SPEAKER 15: So I think that it is a title-- 01:11:43.920 --> 01:11:49.215 when you count it is case sensitive, so maybe it is counting one by one. 01:11:49.215 --> 01:11:50.340 SPEAKER 1: Yeah, good idea. 01:11:50.340 --> 01:11:53.670 So maybe it's case sensitive our [? GROUP ?] [? BY, ?] and you'd be 01:11:53.670 --> 01:11:56.550 correct if you said that. [? GROUP ?] [? BY ?] is, in this case, 01:11:56.550 --> 01:11:57.910 case sensitive. 01:11:57.910 --> 01:12:01.020 So if we see here, we see Farmers Working at Dawn, 01:12:01.020 --> 01:12:04.890 all capitalized, at least the Farmers Working and Dawn capitalized. 01:12:04.890 --> 01:12:07.230 Well, that won't be part of the same group 01:12:07.230 --> 01:12:13.590 as let's say Farmers working at dawn all properly capitalized, in this case, 01:12:13.590 --> 01:12:17.940 with lowercase working and lowercase dawn as it is the MFA collections. 01:12:17.940 --> 01:12:21.360 So we could probably fix some capitalization here. 01:12:21.360 --> 01:12:23.240 What other things could we fix? 01:12:23.240 --> 01:12:25.665 So let's hear from one more person. 01:12:25.665 --> 01:12:26.790 Let's go to [? Lawrence. ?] 01:12:26.790 --> 01:12:29.663 SPEAKER 16: [? The ?] [? count ?] on each typo. 01:12:29.663 --> 01:12:31.830 SPEAKER 1: There are some typos in here, absolutely. 01:12:31.830 --> 01:12:35.400 So if I look at some of these, I see not farmers, 01:12:35.400 --> 01:12:37.620 but [? Famers ?] working at dawn. 01:12:37.620 --> 01:12:40.410 I might see instead of Imaginative landscape, 01:12:40.410 --> 01:12:43.470 I just see [? Imagintive ?] landscape, so definitely some 01:12:43.470 --> 01:12:46.120 typos in here to fix as well. 01:12:46.120 --> 01:12:49.380 So let's start working on using our [? UPDATE ?] tool or [? UPDATE ?] 01:12:49.380 --> 01:12:53.230 keyword to fix some of this data, to clean it up, so to speak, 01:12:53.230 --> 01:12:56.460 so we can count up our votes for these artworks. 01:12:56.460 --> 01:13:02.550 Come back to my [? TERMINAL. ?] And now, let's try updating to at least remove 01:13:02.550 --> 01:13:04.170 some of this [? whitespace ?] here. 01:13:04.170 --> 01:13:07.900 You might notice that Imaginative landscape has a space before it, 01:13:07.900 --> 01:13:09.570 and so does Spring outing here. 01:13:09.570 --> 01:13:13.860 That space counts as part of the title, and I want to remove it. 01:13:13.860 --> 01:13:15.850 So what could I do? 01:13:15.850 --> 01:13:21.750 I could try using a function in SQLite, one called [? trim. ?] [? Trim ?] takes 01:13:21.750 --> 01:13:26.020 a string and just removes leading and trailing [? whitespace, ?] [INAUDIBLE] 01:13:26.020 --> 01:13:28.650 back only the characters in the middle. 01:13:28.650 --> 01:13:29.830 So let me try that. 01:13:29.830 --> 01:13:34.410 I'll say UPDATE "votes," my table here, and SET 01:13:34.410 --> 01:13:38.710 the "title" column equal to what? 01:13:38.710 --> 01:13:41.130 Well, the trimmed version of title. 01:13:41.130 --> 01:13:42.810 Trim looks like this-- 01:13:42.810 --> 01:13:47.500 trim, and then some parentheses to say the input to this function. 01:13:47.500 --> 01:13:49.620 I'll give it the "title" column. 01:13:49.620 --> 01:13:51.610 Now, I'll hit semicolon. 01:13:51.610 --> 01:13:54.250 And this query will run as follows. 01:13:54.250 --> 01:13:56.370 It will update the votes table. 01:13:56.370 --> 01:13:59.880 And inside the votes table set, the title column 01:13:59.880 --> 01:14:04.950 equal to the trimmed version of whatever values are inside that column, 01:14:04.950 --> 01:14:09.210 trimming here means remove trailing and leading [? whitespace. ?] 01:14:09.210 --> 01:14:11.850 I don't need to apply any condition. 01:14:11.850 --> 01:14:15.270 I don't need to say WHERE dot dot dot, because I want this 01:14:15.270 --> 01:14:18.910 to apply to all rows in my data set. 01:14:18.910 --> 01:14:20.220 Now, I'll hit Enter. 01:14:20.220 --> 01:14:21.750 And let me try running this again. 01:14:21.750 --> 01:14:24.840 I'll say up arrow to get back my old query. 01:14:24.840 --> 01:14:29.430 Now, I'll try counting again, and I think we're making some progress, 01:14:29.430 --> 01:14:31.620 like I can see that I've removed-- 01:14:31.620 --> 01:14:34.890 [? removed ?] [? quote ?] [? unquote, ?] those titles that had leading 01:14:34.890 --> 01:14:37.800 or trailing [? whitespace. ?] They're now part of a group 01:14:37.800 --> 01:14:41.400 because their titles now match those groups. 01:14:41.400 --> 01:14:42.120 Let's keep going. 01:14:42.120 --> 01:14:44.620 We noticed before, we had some capitalization issues. 01:14:44.620 --> 01:14:46.930 So let's fix those as well. 01:14:46.930 --> 01:14:50.670 I can go back to my [? TERMINAL, ?] and we could introduce this other function, 01:14:50.670 --> 01:14:52.170 one called [? upper. ?] 01:14:52.170 --> 01:14:55.620 [? Upper ?] converts a string like Farmers 01:14:55.620 --> 01:14:59.490 working at dawn to be all in uppercase, kind of shout it out. 01:14:59.490 --> 01:15:03.060 And that's useful for us here because we could kind of force everything 01:15:03.060 --> 01:15:07.840 to uppercase, thereby removing any inconsistencies in capitalization. 01:15:07.840 --> 01:15:08.950 So let me try that. 01:15:08.950 --> 01:15:12.130 I'll say UPDATE, again, the "votes" table. 01:15:12.130 --> 01:15:18.480 And this time, I'll set "title" equal to the uppercase version of "title." 01:15:18.480 --> 01:15:22.200 So upper here is a function in SQLite that 01:15:22.200 --> 01:15:25.530 takes some values associated with the column and [? uppercases ?] 01:15:25.530 --> 01:15:30.420 the entire thing to remove any inconsistencies in capitalization. 01:15:30.420 --> 01:15:34.620 Again, no condition here, because I want to apply this to all rows. 01:15:34.620 --> 01:15:35.730 I'll hit Enter. 01:15:35.730 --> 01:15:41.340 And now if I SELECT [? star ?] FROM "collections" semicolon-- 01:15:41.340 --> 01:15:43.618 oops, SELECT [? star ?] FROM "votes," in this case-- 01:15:43.618 --> 01:15:44.910 no longer as collections table. 01:15:44.910 --> 01:15:48.030 I'll say SELECT [? star ?] FROM "votes" semicolon, 01:15:48.030 --> 01:15:51.390 I should see all of my titles capitalized. 01:15:51.390 --> 01:15:53.130 But now, I want to group by. 01:15:53.130 --> 01:15:54.060 So let's try this. 01:15:54.060 --> 01:16:00.120 I'll go up arrow to get back to my old query that GROUP BY "title," hit Enter. 01:16:00.120 --> 01:16:02.760 And now, we're still making some more progress. 01:16:02.760 --> 01:16:05.970 I'm reducing my number of groups until I get to 4. 01:16:05.970 --> 01:16:10.260 But there's still some typos, like we see Famers working at dawn, 01:16:10.260 --> 01:16:14.130 or Farmers working, or Farmesr working at dawn. 01:16:14.130 --> 01:16:16.560 We want these to be included as part of this group-- 01:16:16.560 --> 01:16:18.870 Farmers working at dawn. 01:16:18.870 --> 01:16:22.240 So we should fix some of these typos here. 01:16:22.240 --> 01:16:25.380 Let me go back to my [? TERMINAL. ?] Now, let's try this out. 01:16:25.380 --> 01:16:29.040 I could, in this case, just manually correct these. 01:16:29.040 --> 01:16:31.170 You could imagine me doing something like this. 01:16:31.170 --> 01:16:36.900 UPDATE "votes" and SET the "title" equal to all caps 01:16:36.900 --> 01:16:43.170 "FARMERS WORKING AT DAWN," now including a condition, WHERE the "title" 01:16:43.170 --> 01:16:50.617 presently is equal to for "FAMERS WORKING" like this, semicolon-- 01:16:50.617 --> 01:16:51.450 or not Famers work-- 01:16:51.450 --> 01:16:54.480 I think it's Farmers Working, just all-- 01:16:54.480 --> 01:16:55.620 in two words. 01:16:55.620 --> 01:16:56.703 That was one title we saw. 01:16:56.703 --> 01:16:57.620 [? TRANSCRIPT BREAK ?] 01:16:57.620 --> 01:16:59.950 SPEAKER: I could manually update it a bit like this. 01:16:59.950 --> 01:17:02.610 Let me hit Enter, and let me do the same query 01:17:02.610 --> 01:17:05.550 by hitting the up arrow on my keyboard. 01:17:05.550 --> 01:17:07.910 Let me find this one, hit Enter, and now we 01:17:07.910 --> 01:17:10.880 can see that that title, "Farmers Working," 01:17:10.880 --> 01:17:13.040 is now just part of the main group. 01:17:13.040 --> 01:17:15.470 "Farmers Working At Dawn." 01:17:15.470 --> 01:17:17.040 I could keep going here. 01:17:17.040 --> 01:17:21.560 I could try to convert "Farmers Working At Dawn" to "Farmers Working At Dawn." 01:17:21.560 --> 01:17:22.500 Let me try it. 01:17:22.500 --> 01:17:24.110 I could say select. 01:17:24.110 --> 01:17:25.952 Let's say-- not select. 01:17:25.952 --> 01:17:26.535 Let me update. 01:17:26.535 --> 01:17:29.690 Let me update in this case votes, and set 01:17:29.690 --> 01:17:37.340 title equal to "Farmers Working At Dawn" where the title equals 01:17:37.340 --> 01:17:41.330 in this case "Farmers Working At Dawn." 01:17:41.330 --> 01:17:47.330 Semicolon, Enter, let me hit up arrow to go back to my old query to group by, 01:17:47.330 --> 01:17:48.890 and now we're getting somewhere. 01:17:48.890 --> 01:17:52.370 I'm seeing now only two groups for "Farmers Working At Dawn," 01:17:52.370 --> 01:17:55.190 but one more to fix. 01:17:55.190 --> 01:17:59.150 Now we'd be here for a while if I spend all this time 01:17:59.150 --> 01:18:04.380 one by one fixing these typos, so there's probably a better way. 01:18:04.380 --> 01:18:05.150 There is. 01:18:05.150 --> 01:18:08.090 This one is going to be using the like keyword. 01:18:08.090 --> 01:18:10.340 We don't have to say equal so much. 01:18:10.340 --> 01:18:11.840 We can just say like. 01:18:11.840 --> 01:18:16.850 Now, I could try to match any title that looks like "Farmers Working At Dawn" 01:18:16.850 --> 01:18:19.400 and update it to be the correct title. 01:18:19.400 --> 01:18:21.390 So let's try that here. 01:18:21.390 --> 01:18:29.150 I could now try update votes and set title equal to "Farmers Working 01:18:29.150 --> 01:18:30.980 At Dawn." 01:18:30.980 --> 01:18:38.490 "Farmers Working At Dawn" where the title is like-- 01:18:38.490 --> 01:18:42.000 the title is like some pattern I could give. 01:18:42.000 --> 01:18:48.210 Now for this data set, it turns out that the only artwork that begins with F-A 01:18:48.210 --> 01:18:49.980 is Farmers Working At Dawn. 01:18:49.980 --> 01:18:55.110 So to match any title with F-A and then anything afterwards, 01:18:55.110 --> 01:19:01.110 I could use F-A percent, and then close out that pattern. 01:19:01.110 --> 01:19:04.800 Notice how I don't have to have consistent capitalization. 01:19:04.800 --> 01:19:09.420 Like is case insensitive, and notice how, too, the percent matches 01:19:09.420 --> 01:19:15.870 anything that comes after F-A. Now in this data set, this is OK, 01:19:15.870 --> 01:19:20.490 but in a larger data set, you probably want to avoid this kind of query 01:19:20.490 --> 01:19:23.580 because you could match much more than Farmers Working At Dawn. 01:19:23.580 --> 01:19:27.600 You'd match anything that begins with F-A, so be careful with this 01:19:27.600 --> 01:19:32.370 and think through what pattern should you use to fix up your titles. 01:19:32.370 --> 01:19:36.120 Let me just hit Enter here, and now let me try hitting up arrow again 01:19:36.120 --> 01:19:40.530 to show you the result, and now we're in a pretty good place. 01:19:40.530 --> 01:19:44.820 I see all of my titles like Farmers Working At Dawn are correctly 01:19:44.820 --> 01:19:50.350 capitalized, consistently spelled, and all in one place. 01:19:50.350 --> 01:19:52.200 OK, let's try the others here. 01:19:52.200 --> 01:19:55.530 I also have "Imaginative landscape which could be a little bit 01:19:55.530 --> 01:19:56.790 better designed here. 01:19:56.790 --> 01:20:00.210 A little bit better grouped as well by fixing some of these typos. 01:20:00.210 --> 01:20:05.190 Well, here, I could try to match on to maybe imaginative. 01:20:05.190 --> 01:20:08.380 That seems like a long enough word that it could be useful for me here. 01:20:08.380 --> 01:20:10.260 So I'll say something like select-- 01:20:10.260 --> 01:20:10.770 not select. 01:20:10.770 --> 01:20:19.050 I'll do update votes and set the title equal to "Imaginative Landscape." 01:20:19.050 --> 01:20:26.640 Now I want to match any rows that have a title like "Imaginative space 01:20:26.640 --> 01:20:30.150 percent," end quote, semicolon. 01:20:30.150 --> 01:20:35.010 Now I'll match any title that has imaginative as a full word 01:20:35.010 --> 01:20:37.650 at the very beginning of this phrase. 01:20:37.650 --> 01:20:41.100 This is, I would say, a better query than F-A 01:20:41.100 --> 01:20:44.100 because F-A could match much more than this painting name. 01:20:44.100 --> 01:20:48.150 Here, if we have only one painting that begins with imaginative, 01:20:48.150 --> 01:20:50.232 this could work for us now. 01:20:50.232 --> 01:20:50.940 So I'll try this. 01:20:50.940 --> 01:20:55.100 I'll say Enter, and now, let me rerun the query. 01:20:55.100 --> 01:20:57.832 Group by, and we're so close. 01:20:57.832 --> 01:20:58.790 There's still one more. 01:20:58.790 --> 01:21:02.090 "Imagintive Landscape," so for this one, I 01:21:02.090 --> 01:21:04.970 might just need to manually update that piece. 01:21:04.970 --> 01:21:11.480 I could say let's do update, votes, and set title equal to 01:21:11.480 --> 01:21:16.460 in this case "Imaginative Landscape" where, of course, the title currently 01:21:16.460 --> 01:21:21.140 is "Imagintive Landscape" without the A in there. 01:21:21.140 --> 01:21:23.240 Now hit semicolon to run this query. 01:21:23.240 --> 01:21:27.800 I'll show you the results, and now, I think we're literally almost there. 01:21:27.800 --> 01:21:29.150 There's only one more to do. 01:21:29.150 --> 01:21:30.530 Profusion of Flowers. 01:21:30.530 --> 01:21:32.090 Let's fix that one now. 01:21:32.090 --> 01:21:34.880 I could say select votes-- 01:21:34.880 --> 01:21:35.720 not select. 01:21:35.720 --> 01:21:36.290 Update. 01:21:36.290 --> 01:21:37.332 I'm always updating here. 01:21:37.332 --> 01:21:41.510 Update votes, and then we want to set the title 01:21:41.510 --> 01:21:45.110 column to be equal to profusion-- 01:21:45.110 --> 01:21:49.160 "Profusion of Flowers" like this. 01:21:49.160 --> 01:21:55.880 Then where the title is like, in this case, profusion at the beginning. 01:21:55.880 --> 01:21:59.330 Notice how I can't use perfusion space and then 01:21:59.330 --> 01:22:04.190 afterwards because we have one title that is literally just "Profusion," 01:22:04.190 --> 01:22:05.030 plain and simple. 01:22:05.030 --> 01:22:06.095 No spaces afterwards. 01:22:06.095 --> 01:22:07.220 No spaces at the beginning. 01:22:07.220 --> 01:22:10.860 To match that, I should just say profusion at the beginning. 01:22:10.860 --> 01:22:14.660 Now I can hit Enter, and if I do my usual query 01:22:14.660 --> 01:22:17.270 to show you the votes grouped by title, I 01:22:17.270 --> 01:22:21.800 see I'm finally at my correct, fully-formatted data set. 01:22:21.800 --> 01:22:24.230 Farmers Working At Dawn has six votes. 01:22:24.230 --> 01:22:28.460 Imaginative Landscape has five, and so on down the row. 01:22:28.460 --> 01:22:33.080 So here we've seen update being used not just to associate artists 01:22:33.080 --> 01:22:36.140 with new paintings, but also to clean our data, 01:22:36.140 --> 01:22:39.110 to make sure we can actually tally votes up in a way that's 01:22:39.110 --> 01:22:42.750 consistent and clean in this case. 01:22:42.750 --> 01:22:45.420 What questions do we have on update now? 01:22:48.380 --> 01:22:50.200 Let's go to [? Vixay. ?] 01:22:50.200 --> 01:22:51.950 [? VIXAY: ?] Yeah, so what I wanted to ask 01:22:51.950 --> 01:22:55.430 is if there is a specific function for like lowercase. 01:22:55.430 --> 01:23:00.037 So there's a function for upper, and these are the ones for all lowercase. 01:23:00.037 --> 01:23:02.120 SPEAKER: Yeah, a good corresponding question here. 01:23:02.120 --> 01:23:03.980 We used upper, but is there lower? 01:23:03.980 --> 01:23:05.600 Turns out, there is a lower function. 01:23:05.600 --> 01:23:09.680 It's spelled lower, and is in all lowercase like upper was, 01:23:09.680 --> 01:23:11.570 so I could use that as well. 01:23:11.570 --> 01:23:15.950 There is a whole list of what SQLite calls these scalar functions, functions 01:23:15.950 --> 01:23:19.940 that take in some set of values and return to you some modified 01:23:19.940 --> 01:23:21.480 version of those values. 01:23:21.480 --> 01:23:24.170 So if you do some googling, look for SQLite scalar functions. 01:23:24.170 --> 01:23:30.420 You can find all those functions in one place in the SQLite documentation. 01:23:30.420 --> 01:23:32.200 Let's take one more question here. 01:23:32.200 --> 01:23:33.620 Let's go to Rupinder. 01:23:33.620 --> 01:23:38.390 RUPINDER: So, yeah, my question is so instead of updating all the titles, 01:23:38.390 --> 01:23:42.320 is it possible to create a new column with the four categories 01:23:42.320 --> 01:23:45.355 that we want to have actually? 01:23:45.355 --> 01:23:47.230 SPEAKER: Yeah, I could see that working, too. 01:23:47.230 --> 01:23:48.855 So there's a few ways to fix this data. 01:23:48.855 --> 01:23:53.060 You could imagine adding a new column, and maybe you 01:23:53.060 --> 01:23:59.542 assign values to the rows in that column based on what you see in the title 01:23:59.542 --> 01:24:00.500 that they've given you. 01:24:00.500 --> 01:24:03.260 So if you can match something like imaginative, 01:24:03.260 --> 01:24:06.500 and you'd know that all your paintings associated with imaginative 01:24:06.500 --> 01:24:10.340 have the same title, you could group them into the one category, the two 01:24:10.340 --> 01:24:12.110 category, and so on. 01:24:12.110 --> 01:24:14.270 Here, you wouldn't be modifying those titles. 01:24:14.270 --> 01:24:16.970 You'd just be trying to strategically apply 01:24:16.970 --> 01:24:19.610 different categories, which could also work as a solution here, 01:24:19.610 --> 01:24:22.430 too, so good thinking. 01:24:22.430 --> 01:24:27.740 OK, so we've seen now how to insert data, how to update data, 01:24:27.740 --> 01:24:30.250 and previously, how to delete data. 01:24:30.250 --> 01:24:35.900 What we can still do though, are learn how to have other SQL command-- 01:24:35.900 --> 01:24:39.818 other SQL statements that can run after we make our very own SQL 01:24:39.818 --> 01:24:40.610 statements as well. 01:24:40.610 --> 01:24:44.150 We'll come back and talk about these things called triggers. 01:24:44.150 --> 01:24:45.840 And we're back. 01:24:45.840 --> 01:24:50.630 So we've seen now a whole collection of ways to write data to a database. 01:24:50.630 --> 01:24:56.080 We've seen how to insert data, how to update data, and even delete data. 01:24:56.080 --> 01:24:59.000 What we'll see now though, this idea of a trigger, 01:24:59.000 --> 01:25:04.670 a way of writing a SQL statement to run in response to some other SQL statement 01:25:04.670 --> 01:25:08.780 like an insert, an update, or a delete. 01:25:08.780 --> 01:25:12.890 Let's consider our museum yet again, and let's say they 01:25:12.890 --> 01:25:15.860 have a schema a bit like this one now. 01:25:15.860 --> 01:25:18.920 They have items in their collections, but they also 01:25:18.920 --> 01:25:21.290 have a transactions table. 01:25:21.290 --> 01:25:26.390 Now wouldn't it be nice if whenever I deleted something from the collections 01:25:26.390 --> 01:25:31.190 table, it would show up in transactions as sold, having been 01:25:31.190 --> 01:25:33.120 sold from the museum's collections. 01:25:33.120 --> 01:25:36.120 Well, I could use a trigger to do just that. 01:25:36.120 --> 01:25:38.270 Let's see it visually first though. 01:25:38.270 --> 01:25:43.160 I'll say, let's try to delete "Spring Outing" from our collections table. 01:25:43.160 --> 01:25:47.300 Well, if I do that, I could pretty easily do it with delete from, right? 01:25:47.300 --> 01:25:50.540 Now it's no longer part of my database, but if I 01:25:50.540 --> 01:25:55.760 have created a trigger to run an insert whenever I hear 01:25:55.760 --> 01:25:59.210 or have listened to a delete on my collections table, 01:25:59.210 --> 01:26:04.370 I could then see that same title in my transactions table 01:26:04.370 --> 01:26:08.250 as sold all automatically. 01:26:08.250 --> 01:26:12.320 Let's think to maybe the museum acquires some piece of artwork. 01:26:12.320 --> 01:26:15.440 Maybe they actually add something new to their collection. 01:26:15.440 --> 01:26:19.880 We want that to show up in our transactions as well with the action of 01:26:19.880 --> 01:26:20.570 bought. 01:26:20.570 --> 01:26:21.440 Well, let's try it. 01:26:21.440 --> 01:26:23.490 I could probably add a new row here. 01:26:23.490 --> 01:26:28.280 I could say let's add "Peonies and Butterfly" to my collection here. 01:26:28.280 --> 01:26:31.700 Now, if I have the appropriate trigger, I 01:26:31.700 --> 01:26:36.320 could run an insert into on transactions automatically 01:26:36.320 --> 01:26:40.190 that would insert this same title with the action of bot, 01:26:40.190 --> 01:26:45.110 thereby keeping a log of all of our sold pieces of artwork and all of our 01:26:45.110 --> 01:26:47.360 bought pieces of artwork. 01:26:47.360 --> 01:26:50.000 Now to create what we're calling a trigger here, 01:26:50.000 --> 01:26:52.430 we can use the following syntax. 01:26:52.430 --> 01:26:57.620 I could say first create trigger, and then some name. 01:26:57.620 --> 01:27:00.260 So I tend to give triggers names to identify them 01:27:00.260 --> 01:27:04.670 among all of my database [INAUDIBLE],, like my tables and so on. 01:27:04.670 --> 01:27:08.840 Here, after I say create trigger, I have to specify. 01:27:08.840 --> 01:27:15.080 Should this trigger, this statement, run after or before some other SQL 01:27:15.080 --> 01:27:16.190 statement? 01:27:16.190 --> 01:27:18.350 Let's keep going with before here. 01:27:18.350 --> 01:27:22.850 Maybe it runs before an insert on some table. 01:27:22.850 --> 01:27:23.960 That's fine. 01:27:23.960 --> 01:27:29.190 It could also run before an update of a column on some table. 01:27:29.190 --> 01:27:34.190 It could even run before a delete on some table, and keep in mind, 01:27:34.190 --> 01:27:38.450 this could be before or after any of these kinds of SQL statements. 01:27:38.450 --> 01:27:42.770 In this case, we have insert, update, and delete. 01:27:42.770 --> 01:27:48.050 Now after we say this, what we should say, too, is for each row. 01:27:48.050 --> 01:27:53.210 For each row means that if I were to maybe delete multiple rows, 01:27:53.210 --> 01:27:57.410 I should run my SQL statement for each row that I delete. 01:27:57.410 --> 01:28:03.980 If I delete two rows, I should run this statement coming soon two times. 01:28:03.980 --> 01:28:09.470 OK, so now we've set up our trigger, at least kind of the data, the setup 01:28:09.470 --> 01:28:10.550 for it here. 01:28:10.550 --> 01:28:15.620 What we haven't done yet is specify what statement should run whenever 01:28:15.620 --> 01:28:18.020 we see a delete on particular table. 01:28:18.020 --> 01:28:19.640 Well, let's try this. 01:28:19.640 --> 01:28:21.590 Here I could say begin. 01:28:21.590 --> 01:28:24.860 Begin means here comes my statement I want to run 01:28:24.860 --> 01:28:28.370 whenever I hear a delete on this table. 01:28:28.370 --> 01:28:29.840 Now, all right. 01:28:29.840 --> 01:28:31.430 That query down below. 01:28:31.430 --> 01:28:34.700 That statement inside my begin, and I'll finish it off 01:28:34.700 --> 01:28:37.910 with an end to say this is the end of my statement, 01:28:37.910 --> 01:28:41.600 and my entire trigger looks like all of this lines of code here 01:28:41.600 --> 01:28:47.210 to say listen for a delete and run this statement in here for each row 01:28:47.210 --> 01:28:50.530 that I delete, or update, or insert. 01:28:50.530 --> 01:28:53.900 So let's try actually implementing in this case 01:28:53.900 --> 01:28:57.460 the museum's collections where they're trying to sell or buy 01:28:57.460 --> 01:28:59.800 pieces of artwork, and automatically add them 01:28:59.800 --> 01:29:03.760 to their transaction table whenever they create or delete some piece of data 01:29:03.760 --> 01:29:05.710 from their collections. 01:29:05.710 --> 01:29:10.300 Go back to my terminal here, and let me pull up a database. 01:29:10.300 --> 01:29:14.260 I'll do SQLite 3 mfa.db, and you'll notice 01:29:14.260 --> 01:29:17.380 that this is the same schema that we've had before. 01:29:17.380 --> 01:29:21.490 We have collections, artists, and the created table. 01:29:21.490 --> 01:29:24.620 What's missing here, though, is our transactions table. 01:29:24.620 --> 01:29:26.530 So let me make that for us here. 01:29:26.530 --> 01:29:30.580 I'll say let me go create table, transactions, 01:29:30.580 --> 01:29:36.000 and now let me give it an ID column of type integer. 01:29:36.000 --> 01:29:41.030 Now, I'll also give it a title column as we saw before called "Text." 01:29:41.030 --> 01:29:43.310 It's in this column [INAUDIBLE],, the title 01:29:43.310 --> 01:29:46.100 of the piece we either bought or sold. 01:29:46.100 --> 01:29:50.960 Now, let me also include the action column, also of type text 01:29:50.960 --> 01:29:55.210 to say whether we bought or sold this piece. 01:29:55.210 --> 01:30:00.050 OK, finally, I'll make the ID column my primary key. 01:30:00.050 --> 01:30:06.080 And if I say semicolon here and hit Enter, I now have created this table. 01:30:06.080 --> 01:30:10.210 So if I say dot schema, notice how I can now see my transactions 01:30:10.210 --> 01:30:13.990 table down below, and I'm safe to run insert 01:30:13.990 --> 01:30:18.100 on this table to add some data to it. 01:30:18.100 --> 01:30:21.490 Well, let's try now creating our very first trigger. 01:30:21.490 --> 01:30:25.300 We're going to try to create a trigger that whenever an item is deleted 01:30:25.300 --> 01:30:28.990 from our collections table, we actually add it to our transactions table 01:30:28.990 --> 01:30:33.200 with the action of sold, meaning we sold this particular item. 01:30:33.200 --> 01:30:36.550 So let me say create trigger, as we saw before, 01:30:36.550 --> 01:30:38.830 and give it some name to identify it. 01:30:38.830 --> 01:30:41.030 I'll call this one sell. 01:30:41.030 --> 01:30:47.290 Now, I'll say I want to run this trigger before I delete on the collections 01:30:47.290 --> 01:30:48.170 table. 01:30:48.170 --> 01:30:52.870 So before I run the delete query on my collections table, 01:30:52.870 --> 01:30:56.980 I want to run this statement instead. 01:30:56.980 --> 01:31:00.830 I'll say for each row, begin. 01:31:00.830 --> 01:31:04.180 I want to now give you some statement to run for each row 01:31:04.180 --> 01:31:06.730 that I delete from collections. 01:31:06.730 --> 01:31:08.350 Let's make it this query here. 01:31:08.350 --> 01:31:12.160 On 1, 2, 3, 4 spaces to indent and make sure 01:31:12.160 --> 01:31:18.850 this is clearly the query I want to run after I run the delete on collections. 01:31:18.850 --> 01:31:23.020 I'll say I want to insert into transactions. 01:31:23.020 --> 01:31:27.730 I want to insert into the title and action columns. 01:31:27.730 --> 01:31:29.410 What value should I insert? 01:31:29.410 --> 01:31:34.210 Well, I should insert the old title, and actually, in triggers, 01:31:34.210 --> 01:31:37.330 you get access to this keyword called old. 01:31:37.330 --> 01:31:41.600 This is the title that we've just deleted from our collection. 01:31:41.600 --> 01:31:46.540 So old.title gives me access to the old row, the row we deleted, 01:31:46.540 --> 01:31:49.160 its title column in particular. 01:31:49.160 --> 01:31:54.210 Now, let me add in the action, which is just plain and simple "sold." 01:31:54.210 --> 01:31:58.820 So this is the query to run before I delete on collections. 01:31:58.820 --> 01:32:03.350 I will take the title of the row I'm about to delete, and insert it 01:32:03.350 --> 01:32:06.950 into transactions along with the action sold. 01:32:06.950 --> 01:32:11.070 Let me type a semicolon here to say this is my entire query, hit Enter again, 01:32:11.070 --> 01:32:14.000 and now I can end that particular statement 01:32:14.000 --> 01:32:18.110 I want to run after I hear a delete on collections. 01:32:18.110 --> 01:32:21.770 So I'll hit Enter now, and let me type dot schema. 01:32:21.770 --> 01:32:26.480 I can actually see this trigger is part of my schema. 01:32:26.480 --> 01:32:29.810 I can see create trigger sell before delete on transactions 01:32:29.810 --> 01:32:33.410 underneath the transactions table. 01:32:33.410 --> 01:32:36.680 OK, let's try putting this into action here. 01:32:36.680 --> 01:32:39.470 Let me try actually deleting something from our collections table. 01:32:39.470 --> 01:32:44.090 I'll say delete from collections where the title equals 01:32:44.090 --> 01:32:45.920 "Profusion of Flowers." 01:32:45.920 --> 01:32:49.460 I want to sell this particular piece of artwork from our collection. 01:32:49.460 --> 01:32:54.530 I'll hit semicolon here, and now nothing seems to happen as usual, 01:32:54.530 --> 01:32:58.290 but if I select star from collections-- 01:32:58.290 --> 01:33:02.820 from collections, I see that piece is gone. 01:33:02.820 --> 01:33:08.070 It's no longer in here, and if I select star from transactions-- 01:33:08.070 --> 01:33:12.720 select star from transactions, I actually see it's been sold. 01:33:12.720 --> 01:33:15.840 So automatically, I added "Profusion of Flowers" 01:33:15.840 --> 01:33:18.810 with the action of sold to my transactions table. 01:33:18.810 --> 01:33:21.400 I did not type insert myself. 01:33:21.400 --> 01:33:24.960 I instead created a trigger to run this statement, 01:33:24.960 --> 01:33:29.720 insert into, whenever I delete it on collections. 01:33:29.720 --> 01:33:31.010 Let's try one more. 01:33:31.010 --> 01:33:32.900 We want to not just be able to sell artwork, 01:33:32.900 --> 01:33:36.260 but also add it to our collection by buying it. 01:33:36.260 --> 01:33:39.050 So I create a trigger called buy. 01:33:39.050 --> 01:33:44.750 Let me type create trigger "buy," and now I 01:33:44.750 --> 01:33:52.460 want to run this statement forthcoming after an insert on collections. 01:33:52.460 --> 01:33:58.820 So I want to first insert a new item to collections, and when I do, after I do, 01:33:58.820 --> 01:34:01.610 I want to run this query coming next. 01:34:01.610 --> 01:34:07.590 Now, I want to, for each row, begin the statement I want to run. 01:34:07.590 --> 01:34:12.380 So as we said before, I want to insert into the transactions table. 01:34:12.380 --> 01:34:18.090 I want to insert into the title and action columns, just like this. 01:34:18.090 --> 01:34:21.380 Now, I want to insert some values. 01:34:21.380 --> 01:34:23.060 The values I want to insert-- 01:34:23.060 --> 01:34:27.410 well, I want to insert the new title, the new row I've just added. 01:34:27.410 --> 01:34:30.740 So in addition to old, we also have new when 01:34:30.740 --> 01:34:33.320 I'm talking when I'm using a trigger on an insert. 01:34:33.320 --> 01:34:40.100 So I'll say new.title, where again, new is the new row or rows I'm inserting, 01:34:40.100 --> 01:34:44.930 and the dot title is a column called title. 01:34:44.930 --> 01:34:47.360 Now, here I'll say the action was bought. 01:34:47.360 --> 01:34:49.070 We bought this artwork. 01:34:49.070 --> 01:34:52.250 I'll hit semicolon here, and now, I'll say end 01:34:52.250 --> 01:34:56.510 to say this is the end of my statement I want to run for every row I 01:34:56.510 --> 01:34:58.580 insert on collections. 01:34:58.580 --> 01:35:00.200 Let me hit Enter here. 01:35:00.200 --> 01:35:01.520 I've created this trigger. 01:35:01.520 --> 01:35:06.180 I can type dot schema to see it in my actual schema. 01:35:06.180 --> 01:35:07.910 Now, let's try inserting. 01:35:07.910 --> 01:35:11.630 I'll say insert into collections. 01:35:11.630 --> 01:35:15.680 I want to insert into the title and accession number, 01:35:15.680 --> 01:35:18.950 as well as the acquired columns here. 01:35:18.950 --> 01:35:21.260 I want to insert some new artwork. 01:35:21.260 --> 01:35:24.630 Maybe I sold Profusion of Flowers, but now I want to just buy it back. 01:35:24.630 --> 01:35:30.510 So I'll say I'll insert "Profusion of Flowers" is the title. 01:35:30.510 --> 01:35:37.160 The accession number was 56.257, and the date we acquired it, well, 01:35:37.160 --> 01:35:41.360 that was in 1956/04/12. 01:35:41.360 --> 01:35:43.310 Let's hit semicolon here. 01:35:43.310 --> 01:35:48.770 Now, I'll insert, and I'll say select star from collections. 01:35:48.770 --> 01:35:53.310 I should see Profusion of Flowers back in my table. 01:35:53.310 --> 01:35:56.780 Now though, if I say select star from transactions-- 01:35:56.780 --> 01:36:00.020 select star from transactions, I should see 01:36:00.020 --> 01:36:03.980 I previously sold Profusion of Flowers, and later bought it back. 01:36:03.980 --> 01:36:08.040 So again, I didn't run this insert myself. 01:36:08.040 --> 01:36:14.160 I instead triggered it by making my very own insert on the collections table. 01:36:14.160 --> 01:36:16.430 So this is the power of triggers. 01:36:16.430 --> 01:36:18.830 You're able to run statements that I myself 01:36:18.830 --> 01:36:23.030 didn't create in response to statements that I actually did. 01:36:23.030 --> 01:36:27.440 So let me ask here, what questions do we have on these triggers 01:36:27.440 --> 01:36:29.000 and what they can do for us? 01:36:31.650 --> 01:36:33.230 Let's go to Simon. 01:36:33.230 --> 01:36:38.390 SIMON: I'd like to know if there are only-- 01:36:38.390 --> 01:36:42.440 there is only one SQL inquiry that you can write 01:36:42.440 --> 01:36:46.170 in a trigger or multiple inquiries. 01:36:46.170 --> 01:36:49.700 SPEAKER: Yeah, we saw begin and end in our triggers 01:36:49.700 --> 01:36:52.290 here, where begin indicates the start of our statement, 01:36:52.290 --> 01:36:54.020 and end indicates the end. 01:36:54.020 --> 01:36:56.480 You actually can have multiple statements 01:36:56.480 --> 01:36:59.450 inside of the begin and end separated by, 01:36:59.450 --> 01:37:03.130 of course, that usual semicolon that we have. 01:37:03.130 --> 01:37:07.030 OK, so one final idea here that we're getting at-- we're 01:37:07.030 --> 01:37:09.370 kind of touching at with this transactions table is 01:37:09.370 --> 01:37:12.700 this idea of having a soft deletion. 01:37:12.700 --> 01:37:17.380 So when I dropped something from my collections table, 01:37:17.380 --> 01:37:20.890 it actually ended up in my transactions table, 01:37:20.890 --> 01:37:24.490 so it wasn't fully deleted in the sense that I'm keeping a record of it 01:37:24.490 --> 01:37:26.330 somewhere else. 01:37:26.330 --> 01:37:28.930 There's also one more way among many others 01:37:28.930 --> 01:37:32.123 to implement this idea of soft deletions, that is not quite fully 01:37:32.123 --> 01:37:34.540 deleting something from my database but instead keeping it 01:37:34.540 --> 01:37:37.970 around with a log that we've removed it in some way. 01:37:37.970 --> 01:37:39.940 Let's look at this table here. 01:37:39.940 --> 01:37:45.250 I have collections, but I also have this column called "deleted," 01:37:45.250 --> 01:37:49.000 and by default, the value here will be 0. 01:37:49.000 --> 01:37:54.190 That is, by default, I'll add some artwork to my collections table 01:37:54.190 --> 01:37:56.980 and deleted will be 0. 01:37:56.980 --> 01:38:01.210 If I want to delete a piece of artwork though, what could I do? 01:38:01.210 --> 01:38:05.050 Instead of deleting it fully from my table or removing the row, 01:38:05.050 --> 01:38:07.090 I could mark it as deleted. 01:38:07.090 --> 01:38:10.630 I could change the deleted column from a 0 to a 1. 01:38:10.630 --> 01:38:13.270 Let's say I wanted to delete Farmers Working At Dawn 01:38:13.270 --> 01:38:16.330 instead of writing literally delete from collection 01:38:16.330 --> 01:38:18.670 where title equals Farmers Working At Dawn, I 01:38:18.670 --> 01:38:23.500 could say update the deleted column of collections, where the title is Farmers 01:38:23.500 --> 01:38:27.490 Working At Dawn, and make it not 0, but 1. 01:38:27.490 --> 01:38:30.490 Now, I've marked this item as deleted. 01:38:30.490 --> 01:38:33.430 I could run a query that excludes deleted items, 01:38:33.430 --> 01:38:37.880 but this row is still around in my table. 01:38:37.880 --> 01:38:41.410 So let's try implementing this idea of a soft deletion 01:38:41.410 --> 01:38:43.960 inside of our collections table so that we 01:38:43.960 --> 01:38:48.310 don't lose records of things that actually had in our collection. 01:38:48.310 --> 01:38:51.250 I'll come back to my computer here, and let's work 01:38:51.250 --> 01:38:54.940 on altering our table for collections here. 01:38:54.940 --> 01:38:58.360 If I type dot schema collections-- 01:38:58.360 --> 01:39:04.480 dot schema collections, you should see I have a table "collections," and also 01:39:04.480 --> 01:39:06.430 the triggers associated with it. 01:39:06.430 --> 01:39:10.420 What I don't have yet in collections, if I go over here and show you, 01:39:10.420 --> 01:39:12.910 I don't have a deleted column. 01:39:12.910 --> 01:39:18.640 I have ID, title, accession number, and acquired, but I don't have deleted. 01:39:18.640 --> 01:39:22.390 So as you learned last week, we can use alter table 01:39:22.390 --> 01:39:27.550 to add a column to our collections table, and by default, make the value 01:39:27.550 --> 01:39:28.300 0. 01:39:28.300 --> 01:39:29.940 So let's try that. 01:39:29.940 --> 01:39:34.450 I'll go back to my environment, and now I'll run alter table. 01:39:34.450 --> 01:39:41.440 I might say alter table, and a table I want to alter, which is "collections." 01:39:41.440 --> 01:39:46.930 Now I want to add the column "deleted" to collections, 01:39:46.930 --> 01:39:50.410 and I want to make this type integer, so our whole numbers 01:39:50.410 --> 01:39:56.020 whether positive or negative, and the default value will be 0 in this case. 01:39:56.020 --> 01:40:01.210 Now, I'll hit Enter, and if I type dot schema collections, 01:40:01.210 --> 01:40:07.630 I should see I have my very own deleted column inside of collections. 01:40:07.630 --> 01:40:14.120 If I type, for instance, select star from collections semicolon, 01:40:14.120 --> 01:40:20.330 I'll see that by default, all of these values are 0 in my new deleted column. 01:40:20.330 --> 01:40:25.580 So instead of trying to literally delete from collections, 01:40:25.580 --> 01:40:27.230 I could instead just update it. 01:40:27.230 --> 01:40:31.880 I could say take the updated column, and flip it from a 0 to a 1. 01:40:31.880 --> 01:40:38.330 Let me try update collections and set the deleted column 01:40:38.330 --> 01:40:45.830 equal to 1, where let's say the title equals Farmers Working At Dawn. 01:40:45.830 --> 01:40:48.900 And my query wraps here, but it should still work just fine. 01:40:48.900 --> 01:40:51.050 I'll hit semicolon, Enter. 01:40:51.050 --> 01:40:53.600 Now, I didn't use delete from. 01:40:53.600 --> 01:40:54.740 I used update. 01:40:54.740 --> 01:40:59.570 If I select star from collections semicolon here, 01:40:59.570 --> 01:41:03.420 I'll see, well, Farmers Working At Dawn is in this table, 01:41:03.420 --> 01:41:07.560 but technically-- at least we marked it as deleted. 01:41:07.560 --> 01:41:12.830 So it seems now, I don't want to use just select star from collections 01:41:12.830 --> 01:41:14.540 to see what's inside my collection. 01:41:14.540 --> 01:41:18.230 I want to apply a filter to remove those that have 01:41:18.230 --> 01:41:21.745 a deleted value that's not equal to 0. 01:41:21.745 --> 01:41:22.370 Let's try this. 01:41:22.370 --> 01:41:30.650 I'll say select star from collections where deleted does not equal 1. 01:41:30.650 --> 01:41:33.710 If I hit semicolon here, what do I see? 01:41:33.710 --> 01:41:37.518 Only those values that are not deleted. 01:41:37.518 --> 01:41:38.810 I could go ahead and find them. 01:41:38.810 --> 01:41:45.620 I could say maybe select star from collections where deleted actually 01:41:45.620 --> 01:41:46.700 is 1. 01:41:46.700 --> 01:41:50.870 Really equals 1 like this, and now I see all those rows 01:41:50.870 --> 01:41:53.630 that I marked as deleted. 01:41:53.630 --> 01:41:55.940 Now this has some advantages. 01:41:55.940 --> 01:41:57.970 One of them is we keep data around. 01:41:57.970 --> 01:41:59.530 We don't actually formally delete it. 01:41:59.530 --> 01:42:04.460 We can still recover it later on, but it also has some tricky ethical questions, 01:42:04.460 --> 01:42:04.960 too. 01:42:04.960 --> 01:42:07.240 Like it's OK if we're talking about artwork here, 01:42:07.240 --> 01:42:12.520 but if you're talking about user data, is it right to only soft delete it 01:42:12.520 --> 01:42:14.020 if they ask you to delete it? 01:42:14.020 --> 01:42:17.800 Particularly in conversation with new frameworks like GDPR, 01:42:17.800 --> 01:42:20.570 and the right to be forgotten, and so on, it's up to you 01:42:20.570 --> 01:42:22.570 as a programmer to make the right decision here. 01:42:22.570 --> 01:42:25.480 When should you find data that [INAUDIBLE] and actually delete it, 01:42:25.480 --> 01:42:28.570 or when is it better to just soft delete it and keep it around 01:42:28.570 --> 01:42:31.210 in case you need it for later? 01:42:31.210 --> 01:42:36.490 So we'll end here on this note where if I say select star from collections, 01:42:36.490 --> 01:42:43.570 and I want to find those only where deleted is not equal to 0 like this-- 01:42:43.570 --> 01:42:49.150 or not equal to 1, this gives me back all of the items that are not deleted. 01:42:49.150 --> 01:42:51.670 But wouldn't it be nice if I could actually 01:42:51.670 --> 01:42:57.580 run a query on some brand new temporary table that actually only will ever have 01:42:57.580 --> 01:43:00.250 those items that are not soft deleted? 01:43:00.250 --> 01:43:03.140 Turns out, we can do that with an idea called views, 01:43:03.140 --> 01:43:06.130 and we'll see those in much more depth next week. 01:43:06.130 --> 01:43:08.010 We'll see you then.