1 00:00:00,000 --> 00:00:03,437 [AUDIO LOGO] 2 00:00:03,437 --> 00:00:16,720 3 00:00:16,720 --> 00:00:20,080 [? CARTER ZENKE: ?] Well, hello, one and all, and welcome back to CS50's 4 00:00:20,080 --> 00:00:22,480 Introduction to Databases with SQL. 5 00:00:22,480 --> 00:00:25,720 My name is Carter [? Zenke, ?] and last we left off, 6 00:00:25,720 --> 00:00:28,450 we learned how to create our very own database schemas-- 7 00:00:28,450 --> 00:00:31,600 that is, a way to organize data in our database. 8 00:00:31,600 --> 00:00:34,450 Today, we'll learn how to actually add data to our databases-- 9 00:00:34,450 --> 00:00:38,150 to insert, to update, and delete data. 10 00:00:38,150 --> 00:00:43,330 We'll do all of this in the context of Boston's very own Museum of Fine Arts, 11 00:00:43,330 --> 00:00:45,170 or the MFA for short. 12 00:00:45,170 --> 00:00:47,617 So the MFA here in Boston is a century-- 13 00:00:47,617 --> 00:00:49,450 well, maybe, about a century-old museum that 14 00:00:49,450 --> 00:00:54,640 has many artifacts and artwork inside of it, both historical and contemporary. 15 00:00:54,640 --> 00:00:58,720 And it's worth asking, how do they keep track of the thousands of items 16 00:00:58,720 --> 00:01:00,410 that are in their collections? 17 00:01:00,410 --> 00:01:02,290 What could they possibly use? 18 00:01:02,290 --> 00:01:06,580 Well, chances are, they're likely using some kind of database. 19 00:01:06,580 --> 00:01:10,630 And on that database, they want to do four actions 20 00:01:10,630 --> 00:01:13,180 we've learned about back in week zero. 21 00:01:13,180 --> 00:01:17,620 They could, perhaps, create data-- to add data or insert data to the database 22 00:01:17,620 --> 00:01:20,270 when they get some new piece of artwork, for instance. 23 00:01:20,270 --> 00:01:22,970 They might want to query the database, to read from it. 24 00:01:22,970 --> 00:01:27,250 They could also update data, change the artist, change the artwork in some way. 25 00:01:27,250 --> 00:01:31,340 And they could also just delete data to remove it all together. 26 00:01:31,340 --> 00:01:32,920 But if we think about these-- 27 00:01:32,920 --> 00:01:36,370 creating, reading, updating, and deleting-- we'll 28 00:01:36,370 --> 00:01:39,040 notice that reading, updating, and deleting-- we 29 00:01:39,040 --> 00:01:42,910 can't do those if we don't actually have data in our database. 30 00:01:42,910 --> 00:01:46,210 And so today, we'll see how to create data-- 31 00:01:46,210 --> 00:01:50,860 how to insert data into our very own database. 32 00:01:50,860 --> 00:01:54,400 Now, let's think about the MFA's [? collections, ?] [? their ?] 33 00:01:54,400 --> 00:01:55,983 [? collection ?] of art and artifacts. 34 00:01:55,983 --> 00:01:58,275 And let me propose that they have a database [? that ?] 35 00:01:58,275 --> 00:01:59,540 looks a bit like this. 36 00:01:59,540 --> 00:02:03,700 It's a single table, and it has a title column and accession 37 00:02:03,700 --> 00:02:08,380 number, which is a fancy way of saying a unique ID internal to the museum, 38 00:02:08,380 --> 00:02:10,960 and also, a date it was acquired. 39 00:02:10,960 --> 00:02:15,610 We have, of course, a primary key on this table called ID. 40 00:02:15,610 --> 00:02:16,630 And let's think. 41 00:02:16,630 --> 00:02:19,660 Well, the museum might want to acquire this piece here-- 42 00:02:19,660 --> 00:02:22,390 this one called Profusion of Flowers. 43 00:02:22,390 --> 00:02:27,940 Well, how could they log that this artwork is inside of their database? 44 00:02:27,940 --> 00:02:30,160 They could maybe just add a new row. 45 00:02:30,160 --> 00:02:32,290 They could say, let's put Profusion of Flowers 46 00:02:32,290 --> 00:02:34,900 as the first item in our collections here. 47 00:02:34,900 --> 00:02:37,430 We'll give it a title and accession number-- which, again, 48 00:02:37,430 --> 00:02:40,100 is just a unique ID internal to the museum-- 49 00:02:40,100 --> 00:02:42,250 and the date it was acquired. 50 00:02:42,250 --> 00:02:45,340 And that row then has its own primary key 51 00:02:45,340 --> 00:02:49,000 to identify this row uniquely in our database. 52 00:02:49,000 --> 00:02:51,040 Now let's say they get another piece of artwork. 53 00:02:51,040 --> 00:02:54,130 They get this one, called Farmers Working at Dawn, 54 00:02:54,130 --> 00:02:56,453 and they want to add this one to their table, too. 55 00:02:56,453 --> 00:02:58,120 Well, they could do the very same thing. 56 00:02:58,120 --> 00:02:59,380 They could just add a new row. 57 00:02:59,380 --> 00:03:01,930 They could say, let's make a title, an accession number, 58 00:03:01,930 --> 00:03:04,240 and a date it was acquired, and this brand-new row 59 00:03:04,240 --> 00:03:06,670 here for that piece of artwork. 60 00:03:06,670 --> 00:03:08,830 And maybe they get another one, too. 61 00:03:08,830 --> 00:03:09,610 Same thing. 62 00:03:09,610 --> 00:03:11,770 Maybe they'll get back Spring Outing, and they 63 00:03:11,770 --> 00:03:13,720 want to add this to their collection. 64 00:03:13,720 --> 00:03:17,810 They could simply add another row like this. 65 00:03:17,810 --> 00:03:22,060 Now, it turns out that the database administrator behind the MFA 66 00:03:22,060 --> 00:03:26,620 might be running a SQL statement that looks a bit like this-- 67 00:03:26,620 --> 00:03:28,870 INSERT INTO. 68 00:03:28,870 --> 00:03:34,180 We can use INSERT INTO to add a new row to any given table. 69 00:03:34,180 --> 00:03:38,900 And notice how insert into needs to know a few pieces of information. 70 00:03:38,900 --> 00:03:42,550 The first is what table to insert into. 71 00:03:42,550 --> 00:03:45,010 What is the name of that table? 72 00:03:45,010 --> 00:03:47,710 Second, it needs to know what columns are 73 00:03:47,710 --> 00:03:51,130 we adding data to inside of this table? 74 00:03:51,130 --> 00:03:53,380 We give it a list of those columns here. 75 00:03:53,380 --> 00:03:57,130 Then, of course, it needs to know what new values should 76 00:03:57,130 --> 00:04:00,790 go into this new row for each of these columns. 77 00:04:00,790 --> 00:04:03,820 Like we saw before, is it Profusion of Flowers? 78 00:04:03,820 --> 00:04:05,200 Is it Spring Outing? 79 00:04:05,200 --> 00:04:06,310 Et cetera. 80 00:04:06,310 --> 00:04:09,380 Here, we can see that we have this list of values. 81 00:04:09,380 --> 00:04:12,610 And notice how value 0, the first value in this list, 82 00:04:12,610 --> 00:04:17,560 corresponds to the new value that will be inserted [? to ?] this first column. 83 00:04:17,560 --> 00:04:21,459 And we can keep having value 1, or column 1, value 2 and column 2, 84 00:04:21,459 --> 00:04:25,910 each one aligning with that particular column there. 85 00:04:25,910 --> 00:04:29,300 So let's see an example of this actually in code, 86 00:04:29,300 --> 00:04:31,540 understand it a bit more concretely. 87 00:04:31,540 --> 00:04:33,730 I'll go back to my computer here. 88 00:04:33,730 --> 00:04:36,070 Let's actually create our very own database 89 00:04:36,070 --> 00:04:40,990 that involves this schema of having our very own table which can keep artists 90 00:04:40,990 --> 00:04:44,630 and artwork and artifacts as well. 91 00:04:44,630 --> 00:04:45,400 So I'll type 92 00:04:45,400 --> 00:04:49,510 1: [? TRANSCRIPT BREAK ?] So I'll type sqlite3 mfa.db 93 00:04:49,510 --> 00:04:55,450 to create a database for the Museum of Fine Arts, abbreviated as MFA. 94 00:04:55,450 --> 00:04:56,690 I'll hit Enter here. 95 00:04:56,690 --> 00:05:03,310 And notice how I can now have access to my very own mfa.db inside my SQLite 96 00:05:03,310 --> 00:05:04,390 environment. 97 00:05:04,390 --> 00:05:09,510 Well, now, I can type .schema to see the schema of this database. 98 00:05:09,510 --> 00:05:12,483 So if I hit Enter here, well, nothing's there. 99 00:05:12,483 --> 00:05:13,900 Because I just made this database. 100 00:05:13,900 --> 00:05:16,000 There's nothing in it yet. 101 00:05:16,000 --> 00:05:19,840 Well, it turns out I do actually have a schema file prepared for me 102 00:05:19,840 --> 00:05:22,330 already in schema.sql. 103 00:05:22,330 --> 00:05:26,530 Here, I propose we make a table called "collections," and like our table 104 00:05:26,530 --> 00:05:29,320 visually, it has let's say four columns. 105 00:05:29,320 --> 00:05:35,515 One as an "id," the primary key of this table, one for a "title," 106 00:05:35,515 --> 00:05:39,610 one for the "accession_number," the unique ID internal to the museum, 107 00:05:39,610 --> 00:05:42,670 and finally, the date it was acquired. 108 00:05:42,670 --> 00:05:47,720 And notice here, that "title" has to be actually added to our database. 109 00:05:47,720 --> 00:05:50,500 It can't be [? NULL. ?] It can't have an empty value. 110 00:05:50,500 --> 00:05:52,690 Same thing for "accession_number." 111 00:05:52,690 --> 00:05:56,135 And further, "accession_number" has to be unique. 112 00:05:56,135 --> 00:05:58,510 I can't have two items [? with ?] [? a ?] [? different ?] 113 00:05:58,510 --> 00:05:59,590 "accession_number." 114 00:05:59,590 --> 00:06:02,380 Otherwise, I might get them confused in my database 115 00:06:02,380 --> 00:06:05,330 or inside my museum archives as well. 116 00:06:05,330 --> 00:06:09,550 So let me then add this schema to my database. 117 00:06:09,550 --> 00:06:12,280 And I can do so with this command right here. 118 00:06:12,280 --> 00:06:16,780 This is a SQLite command called .read, that we saw a little bit last week 119 00:06:16,780 --> 00:06:17,690 as well. 120 00:06:17,690 --> 00:06:21,440 I could type .read and then the name of this file I want to read. 121 00:06:21,440 --> 00:06:25,620 So I'll say .read schema.sql. 122 00:06:25,620 --> 00:06:28,020 Hit Enter. 123 00:06:28,020 --> 00:06:33,450 And now if I type, let's say .schema, I can see that same [? sql ?] 124 00:06:33,450 --> 00:06:39,280 [? schema, ?] same database schema. now inside of my terminal. 125 00:06:39,280 --> 00:06:44,420 OK, so now as promised, let's try adding some rows to this table. 126 00:06:44,420 --> 00:06:48,040 Because right now, if I do SELECT [? star ?] from [? collections ?] 127 00:06:48,040 --> 00:06:52,930 [? semicolon, ?] I don't see anything because nothing is inside yet. 128 00:06:52,930 --> 00:06:56,000 But I could add something using INSERT INTO. 129 00:06:56,000 --> 00:06:57,390 So let's try that. 130 00:06:57,390 --> 00:07:01,630 I'll say INSERT INTO the "collections table." 131 00:07:01,630 --> 00:07:05,380 And now, I have to ask, what columns inside of "collections" 132 00:07:05,380 --> 00:07:08,080 [? do I ?] [? want ?] to add data to. 133 00:07:08,080 --> 00:07:11,620 Well, I probably want to add to the first the "id" column, 134 00:07:11,620 --> 00:07:13,480 that primary key column. 135 00:07:13,480 --> 00:07:17,470 Then maybe, I want to add a "title" to this row, and also 136 00:07:17,470 --> 00:07:21,310 an "accession_number," and also, of course, the date this piece 137 00:07:21,310 --> 00:07:22,670 was "acquired." 138 00:07:22,670 --> 00:07:25,750 So now, I have the table I'm inserting into along 139 00:07:25,750 --> 00:07:28,240 with the columns I'm adding values for. 140 00:07:28,240 --> 00:07:31,060 So for style's sake, I'll hit Enter here. 141 00:07:31,060 --> 00:07:35,710 And now, I could type a list of values to insert into this new row. 142 00:07:35,710 --> 00:07:40,060 I could say VALUES, and then inside parentheses, the values 143 00:07:40,060 --> 00:07:41,650 I want to insert. 144 00:07:41,650 --> 00:07:45,010 Maybe the first primary key that I give to this item 145 00:07:45,010 --> 00:07:49,420 is going to be just [? 1, ?] start at 1, and add up as we add new items. 146 00:07:49,420 --> 00:07:53,530 For the title, I'll say let's call this one "Profusion of flowers." 147 00:07:53,530 --> 00:07:56,830 This is the piece we recently acquired into our collection. 148 00:07:56,830 --> 00:08:00,400 The accession number that we gave it was 56.257. 149 00:08:00,400 --> 00:08:06,120 And the date it was acquired was back in 1956-04-12. 150 00:08:06,120 --> 00:08:07,680 Now, I have all these values here. 151 00:08:07,680 --> 00:08:12,030 I'll type [? semicolon ?] hit Enter, and nothing seems to happen. 152 00:08:12,030 --> 00:08:17,370 But if I type, let's say, select [? star ?] from collections 153 00:08:17,370 --> 00:08:22,830 [? semicolon, ?] what do I see, but this new row inside. 154 00:08:22,830 --> 00:08:24,540 Let's do it again to get a hang of this. 155 00:08:24,540 --> 00:08:27,600 I want to now add "Farmers working at dawn" to my collection. 156 00:08:27,600 --> 00:08:31,080 So I'll do the same thing, I'll say let's INSERT INTO 157 00:08:31,080 --> 00:08:32,669 the "collections" table. 158 00:08:32,669 --> 00:08:36,929 And let's add values for the "id" column, the "title" 159 00:08:36,929 --> 00:08:42,659 column, the "accession_number," and also the "acquired" column like this. 160 00:08:42,659 --> 00:08:44,640 Again for style's sake, I'll hit Enter. 161 00:08:44,640 --> 00:08:48,780 Now, I give some list of values to add into this new row. 162 00:08:48,780 --> 00:08:53,280 I'll say VALUES here, then this list of values to add for each column. 163 00:08:53,280 --> 00:08:57,360 Well, the first column, I have to give a value for is "id." 164 00:08:57,360 --> 00:09:02,460 If you remember, our last id was 1, so what should this id be? 165 00:09:02,460 --> 00:09:03,540 Maybe 2. 166 00:09:03,540 --> 00:09:08,190 So I'll type 2 here as the next increment of my primary key, 167 00:09:08,190 --> 00:09:09,330 then I'll give it a title. 168 00:09:09,330 --> 00:09:13,470 And this title was "Farmers working at dawn." 169 00:09:13,470 --> 00:09:15,810 We gave it an accession number to keep track of it 170 00:09:15,810 --> 00:09:18,910 in our own internal museum records, our own archives here. 171 00:09:18,910 --> 00:09:25,110 So I'll say 11.6152 is the [? accession ?] [? number ?] for this 172 00:09:25,110 --> 00:09:32,740 particular item, then I'll say we acquired it back in 1911-08-03. 173 00:09:32,740 --> 00:09:36,010 Now, I'll hit a semicolon here, hit Enter. 174 00:09:36,010 --> 00:09:39,520 And now, I should be able to see if I type [? SELECT ?] [? star ?] FROM 175 00:09:39,520 --> 00:09:42,520 "collections," SELECT [? star ?] FROM "collections" 176 00:09:42,520 --> 00:09:48,740 [? semicolon, ?] I now have these two items inside of my collection. 177 00:09:48,740 --> 00:09:51,440 Now, let's do one more here. 178 00:09:51,440 --> 00:09:55,640 But let's focus in particular on this primary key. 179 00:09:55,640 --> 00:09:59,570 Notice here, how we've been actually inserting our very own primary key 180 00:09:59,570 --> 00:10:01,400 1 and then 2. 181 00:10:01,400 --> 00:10:05,223 But maybe that's not the best design. 182 00:10:05,223 --> 00:10:06,140 Let me try this again. 183 00:10:06,140 --> 00:10:12,080 I'll say INSERT INTO, let's go for, "collections" [? is ?] our table name. 184 00:10:12,080 --> 00:10:16,790 And maybe I'll try again to add to the "id" column, the "title" 185 00:10:16,790 --> 00:10:23,930 column, the "accession_number," and the "acquired" column here. 186 00:10:23,930 --> 00:10:29,960 What could go wrong do you think if I try to specify the primary key myself? 187 00:10:29,960 --> 00:10:31,880 Let me ask this as an audience question here. 188 00:10:31,880 --> 00:10:33,140 Feel free raise your hand. 189 00:10:33,140 --> 00:10:37,700 What might go wrong if I try to make the primary key myself? 190 00:10:37,700 --> 00:10:39,185 Insert a value myself? 191 00:10:39,185 --> 00:10:43,010 192 00:10:43,010 --> 00:10:44,900 Let's go to [INAUDIBLE]. 193 00:10:44,900 --> 00:10:49,940 SPEAKER 2: Yeah, what we could do when we have to [INAUDIBLE] a lot of values, 194 00:10:49,940 --> 00:10:56,810 like maybe like big data or something, is there a way to add it to a CSV file 195 00:10:56,810 --> 00:10:59,725 or something rather than typing [? insert ?] [? into ?] tables? 196 00:10:59,725 --> 00:11:00,850 SPEAKER 1: A good question. 197 00:11:00,850 --> 00:11:02,558 I think you're onto something here, which 198 00:11:02,558 --> 00:11:05,600 is we're inserting one row at a time, which could get easily repetitive. 199 00:11:05,600 --> 00:11:07,897 So let me hold that thought for just a minute here 200 00:11:07,897 --> 00:11:10,730 and focus on the primary keys before we see some more efficient ways 201 00:11:10,730 --> 00:11:12,960 to actually add data to our table. 202 00:11:12,960 --> 00:11:15,800 So I'm with you on this idea of maybe we don't want 203 00:11:15,800 --> 00:11:17,697 to duplicate this primary key, right? 204 00:11:17,697 --> 00:11:19,530 Maybe we could do a little better than that. 205 00:11:19,530 --> 00:11:21,780 So I'll come back to my computer here. 206 00:11:21,780 --> 00:11:25,670 And if I specify the primary key, I might actually 207 00:11:25,670 --> 00:11:27,840 add a value that's already in there. 208 00:11:27,840 --> 00:11:32,180 So I could, thankfully, leave it up to SQLite to actually increment 209 00:11:32,180 --> 00:11:33,620 this value for me. 210 00:11:33,620 --> 00:11:36,440 Let me try leaving off "id" in this case. 211 00:11:36,440 --> 00:11:39,350 I'll omit the "id" altogether, which seems 212 00:11:39,350 --> 00:11:42,170 at first like a bit of a bad thought, like shouldn't every row 213 00:11:42,170 --> 00:11:43,280 have its own id? 214 00:11:43,280 --> 00:11:45,830 Well, let's just try and figure out what happened. 215 00:11:45,830 --> 00:11:47,060 I'll hit Enter here. 216 00:11:47,060 --> 00:11:49,560 And now I'll say, let me give the VALUES. 217 00:11:49,560 --> 00:11:52,770 Well, now the first column is the title column. 218 00:11:52,770 --> 00:11:55,970 So I'll go ahead and say that this one will be called "Spring outing." 219 00:11:55,970 --> 00:11:58,520 We're going to add this one to our collection here. 220 00:11:58,520 --> 00:12:01,610 The accession number, in this case, is 14.76. 221 00:12:01,610 --> 00:12:08,540 And we acquired this one all the way back in 1914-01-08 [? semicolon. ?] 222 00:12:08,540 --> 00:12:10,170 So I'll hit Enter here. 223 00:12:10,170 --> 00:12:13,340 And notice how there's no [? id ?] that I have specified. 224 00:12:13,340 --> 00:12:16,850 But now if I hit Enter, seems to work. 225 00:12:16,850 --> 00:12:20,120 I'll type SELECT [? star ?] [? FROM ?] "collections," 226 00:12:20,120 --> 00:12:24,710 and what do I see but the new primary key of 3. 227 00:12:24,710 --> 00:12:30,290 So it seems like SQLite actually increments the primary key for me. 228 00:12:30,290 --> 00:12:35,180 If I add some new row, it looks, what is the current highest primary key 229 00:12:35,180 --> 00:12:38,120 and adds 1 to that automatically for me. 230 00:12:38,120 --> 00:12:41,390 And I've gotten that by specifying in my schema 231 00:12:41,390 --> 00:12:45,050 this primary key constraint down on line 6. 232 00:12:45,050 --> 00:12:47,840 So pretty handy for me here. 233 00:12:47,840 --> 00:12:50,030 I see a few other questions too. 234 00:12:50,030 --> 00:12:54,570 So let's take those before we move on as well. 235 00:12:54,570 --> 00:12:55,760 Let's go to [? Andre. ?] 236 00:12:55,760 --> 00:12:59,030 SPEAKER 3: I just want to ask if I delete the, let's say, 237 00:12:59,030 --> 00:13:06,410 the [? first ?] record and type again, will it number it [? on ?] the fourth 238 00:13:06,410 --> 00:13:07,503 number or the first one. 239 00:13:07,503 --> 00:13:08,670 SPEAKER 1: A great question. 240 00:13:08,670 --> 00:13:10,378 We'll see this a little bit later on too. 241 00:13:10,378 --> 00:13:14,990 But if I delete some row, let's say, I deleted the row with primary key of 1, 242 00:13:14,990 --> 00:13:17,990 well, what SQLite will do, at least by default, 243 00:13:17,990 --> 00:13:20,060 is actually take the highest value. 244 00:13:20,060 --> 00:13:23,340 Let's say the highest value is still three in my database. 245 00:13:23,340 --> 00:13:26,510 It will add 1 to that, and I want to insert that new row it'll 246 00:13:26,510 --> 00:13:29,673 have the [? id ?] of 4 in this case. 247 00:13:29,673 --> 00:13:34,430 OK, so let's explore a bit more some of these constraints on tables. 248 00:13:34,430 --> 00:13:37,850 Here, we're talking about the primary key constraint, but we also have, 249 00:13:37,850 --> 00:13:40,310 as we saw before, the [? NOT ?] [? NULL ?] constraint 250 00:13:40,310 --> 00:13:42,000 and the [? UNIQUE ?] constraint. 251 00:13:42,000 --> 00:13:45,230 So let's try inserting given those constraints here. 252 00:13:45,230 --> 00:13:47,280 [? I'll go ?] back to my computer. 253 00:13:47,280 --> 00:13:52,340 And again, notice in my schema, I specified that this "title" 254 00:13:52,340 --> 00:13:55,512 column should always be NOT NULL. 255 00:13:55,512 --> 00:13:57,720 It should never have a [? null ?] value inside of it. 256 00:13:57,720 --> 00:14:01,280 And similarly, "accession_number" should also be NOT NULL. 257 00:14:01,280 --> 00:14:05,270 It should also be [? unique. ?] I should have no two rows 258 00:14:05,270 --> 00:14:07,550 that have the same accession number. 259 00:14:07,550 --> 00:14:11,300 And maybe kind of a playfully, subversively here, 260 00:14:11,300 --> 00:14:13,458 let's try to run against this constraint. 261 00:14:13,458 --> 00:14:16,500 Let's actually try to add the same accession number and see what happens. 262 00:14:16,500 --> 00:14:20,900 So I'll come back to my [? TERMINAL. ?] Let me just reinsert, 263 00:14:20,900 --> 00:14:22,490 let's say "Spring outing." 264 00:14:22,490 --> 00:14:27,300 Notice how if I SELECT [? star ?] FROM "collections," it's already in here. 265 00:14:27,300 --> 00:14:30,750 But I'll try to add it yet again with the same [? accession ?] [? number. ?] 266 00:14:30,750 --> 00:14:32,408 So I'll say INSERT INTO "collections." 267 00:14:32,408 --> 00:14:35,450 And by now this is [? hopefully ?] [? becoming ?] a little more familiar. 268 00:14:35,450 --> 00:14:38,870 I now want to say the columns I want to add data to. 269 00:14:38,870 --> 00:14:46,520 So I'll say the "title" column, the "accession_number" column, 270 00:14:46,520 --> 00:14:49,550 and also the "acquired" date I'll hit Enter. 271 00:14:49,550 --> 00:14:51,620 And now I'll say the VALUES again. 272 00:14:51,620 --> 00:14:54,650 Let's reinsert [? "Spring ?] [? outing." ?] I'll say "Spring outing" 273 00:14:54,650 --> 00:14:58,640 as a title, the accession number is 14.76, 274 00:14:58,640 --> 00:15:04,670 and the date it was acquired was again 1914-01-08 semicolon. 275 00:15:04,670 --> 00:15:09,100 Now if I hit Enter, what do we see? 276 00:15:09,100 --> 00:15:12,490 Runtime error-- UNIQUE constraint failed. 277 00:15:12,490 --> 00:15:14,890 So it seems we ran against this constraint 278 00:15:14,890 --> 00:15:18,460 here that we specified on line 4 here. 279 00:15:18,460 --> 00:15:20,830 [? "Accession number" ?] should always be unique. 280 00:15:20,830 --> 00:15:24,820 But by trying to add a new row that had that same accession number, 281 00:15:24,820 --> 00:15:29,780 we ran into this runtime error and our operation was not completed. 282 00:15:29,780 --> 00:15:33,190 If I say in this case, SELECT [? star ?] FROM "collections," 283 00:15:33,190 --> 00:15:38,050 notice how I guarded myself from adding Spring outing more than once 284 00:15:38,050 --> 00:15:40,070 with the same accession number. 285 00:15:40,070 --> 00:15:43,630 So some usefulness here to these kinds of constraints. 286 00:15:43,630 --> 00:15:46,180 Let's try violating NOT NULL too. 287 00:15:46,180 --> 00:15:49,252 I'll try adding a title that is actually [? null, ?] non-existent. 288 00:15:49,252 --> 00:15:51,460 It's trying to add a painting without a title itself. 289 00:15:51,460 --> 00:15:52,420 Let's try this. 290 00:15:52,420 --> 00:15:56,020 I might try INSERT INTO the "collections" table, 291 00:15:56,020 --> 00:16:01,660 and I'll add to the "title" columns, the "accession_number," and also 292 00:16:01,660 --> 00:16:03,940 the "acquired" column. 293 00:16:03,940 --> 00:16:06,020 Now, I'll say the VALUES here. 294 00:16:06,020 --> 00:16:07,870 But remember, this [? null ?] value. 295 00:16:07,870 --> 00:16:09,280 Null meaning nothing. 296 00:16:09,280 --> 00:16:10,630 This value doesn't exist. 297 00:16:10,630 --> 00:16:14,260 I could insert that into my table, or at least try to. 298 00:16:14,260 --> 00:16:16,690 I'll say NULL is a title. 299 00:16:16,690 --> 00:16:19,610 And similarly, NULL is the accession number. 300 00:16:19,610 --> 00:16:25,450 Now let's say, just playfully, we got this painting back in 1900-01-10 301 00:16:25,450 --> 00:16:26,770 like this. 302 00:16:26,770 --> 00:16:28,960 OK now, semicolon. 303 00:16:28,960 --> 00:16:33,970 I'll hit Enter, and we see the same runtime error-- constraint failed, 304 00:16:33,970 --> 00:16:36,200 in this case, the NOT NULL constraint. 305 00:16:36,200 --> 00:16:41,950 So notice again how in my schema, I specified "title" should be NOT NULL. 306 00:16:41,950 --> 00:16:46,090 But here when I try to insert a [? null ?] [? value, ?] I run into that 307 00:16:46,090 --> 00:16:48,670 constraint and I can't insert that value. 308 00:16:48,670 --> 00:16:51,400 If I try to SELECT [? star ?] FROM "collections," 309 00:16:51,400 --> 00:16:57,410 I should see in this case that insert didn't work. 310 00:16:57,410 --> 00:17:00,520 So these constraints are really guardrails against me, 311 00:17:00,520 --> 00:17:02,980 against other users to adding values that we really 312 00:17:02,980 --> 00:17:06,480 shouldn't add to our table. 313 00:17:06,480 --> 00:17:12,400 OK, so presumably here, we're onto something by adding some rows. 314 00:17:12,400 --> 00:17:17,730 But if the museum acquires more than one item, maybe 100 at a time, 315 00:17:17,730 --> 00:17:21,119 I don't want to be the programmer who's sitting there writing 100 INSERT INTO. 316 00:17:21,119 --> 00:17:22,740 That's like not what I want to do. 317 00:17:22,740 --> 00:17:24,630 There's probably a better way to do this. 318 00:17:24,630 --> 00:17:27,040 [? And ?] [? we ?] show you one way to do this. 319 00:17:27,040 --> 00:17:32,190 One way is to instead of inserting one row, like this, to instead insert 320 00:17:32,190 --> 00:17:34,900 multiple separated by commas. 321 00:17:34,900 --> 00:17:39,870 So here, I might say this is my first row for each of these columns. 322 00:17:39,870 --> 00:17:43,080 This is my second row, again, for each of these columns, 323 00:17:43,080 --> 00:17:45,370 and so on and so forth. 324 00:17:45,370 --> 00:17:49,530 And it's worth noting, this is not just a manner of convenience. 325 00:17:49,530 --> 00:17:54,130 Like if I tried to insert 100 rows, this is certainly convenient for me. 326 00:17:54,130 --> 00:17:57,210 But it's also most often in most cases going 327 00:17:57,210 --> 00:18:01,200 to be faster for me to insert 100 rows in one go, 328 00:18:01,200 --> 00:18:04,390 than to insert one row 100 times. 329 00:18:04,390 --> 00:18:06,930 So this is both a convenience thing for programmers 330 00:18:06,930 --> 00:18:11,740 and also an efficiency thing to actually optimize your database as well. 331 00:18:11,740 --> 00:18:16,350 So let's try this syntax now to avoid me sitting there for hours and hours 332 00:18:16,350 --> 00:18:18,240 writing many insert interviews. 333 00:18:18,240 --> 00:18:20,460 I'll go back to my computer. 334 00:18:20,460 --> 00:18:23,940 And this time, let's say we got two paintings at once. 335 00:18:23,940 --> 00:18:24,750 We got two. 336 00:18:24,750 --> 00:18:26,670 So I'll try to add both of them. 337 00:18:26,670 --> 00:18:30,510 I'll say INSERT INTO the "collections" table. 338 00:18:30,510 --> 00:18:35,730 And again, I want to keep adding to my "title" column, my "accession_number" 339 00:18:35,730 --> 00:18:40,200 column, and even my "acquired" column, like this. 340 00:18:40,200 --> 00:18:42,640 Now, I want to add in some values. 341 00:18:42,640 --> 00:18:44,370 So I'll type VALUES here. 342 00:18:44,370 --> 00:18:48,390 And as a matter of style, let me just say I'll make a new line. 343 00:18:48,390 --> 00:18:49,650 I'll Enter here. 344 00:18:49,650 --> 00:18:54,360 And now I can type each of my new rows on their very own line. 345 00:18:54,360 --> 00:18:59,220 So again, I'll type all of the values that I want in my first row. 346 00:18:59,220 --> 00:19:02,220 I might call this one "Imaginative landscape." 347 00:19:02,220 --> 00:19:03,785 We got this one back in-- 348 00:19:03,785 --> 00:19:05,910 actually, we don't quite know when we got this one. 349 00:19:05,910 --> 00:19:12,210 If I type 56.496 is the accession number, like, yeah, 56.496. 350 00:19:12,210 --> 00:19:15,430 The MFA actually doesn't know when they got this painting. 351 00:19:15,430 --> 00:19:18,810 So if they don't know, let's just insert NULL. 352 00:19:18,810 --> 00:19:22,800 This is intentionally left blank. 353 00:19:22,800 --> 00:19:24,150 This is my first row. 354 00:19:24,150 --> 00:19:26,970 Now, I want to add a second one in one go. 355 00:19:26,970 --> 00:19:28,690 I'll follow this up with a comma. 356 00:19:28,690 --> 00:19:32,250 And now, I'll type out my next set of values. 357 00:19:32,250 --> 00:19:36,000 This next one we acquired is called "Peonies and butterfly." 358 00:19:36,000 --> 00:19:40,710 Now, I'll say the [? accession ?] [? number ?] is 06.1899. 359 00:19:40,710 --> 00:19:48,570 And we got this one-- it was back in 1906-01-01 now I'll hit semicolon here. 360 00:19:48,570 --> 00:19:52,300 And now, if I hit Enter, nothing happens. 361 00:19:52,300 --> 00:19:57,330 But if I type Select [? star ?] FROM "collections" semicolon, 362 00:19:57,330 --> 00:20:01,960 what do we see, but now two new rows being inserted. 363 00:20:01,960 --> 00:20:05,500 So a handy way to insert more than one value. 364 00:20:05,500 --> 00:20:10,510 And also, if you have a lot of values, a more efficient way as well. 365 00:20:10,510 --> 00:20:12,030 So we pause here. 366 00:20:12,030 --> 00:20:17,930 And ask what questions we have on INSERT INTO, Whether adding one row 367 00:20:17,930 --> 00:20:19,550 or adding multiple. 368 00:20:19,550 --> 00:20:21,320 Let's go to [INAUDIBLE]. 369 00:20:21,320 --> 00:20:24,070 SPEAKER 4: [? So ?] [? imagine ?] you are [? writing ?] [? code ?] 370 00:20:24,070 --> 00:20:28,700 or inserting a row, so by mistake you have entered the wrong spelling 371 00:20:28,700 --> 00:20:31,717 of the title, so how we will rename it? 372 00:20:31,717 --> 00:20:33,050 SPEAKER 1: A great question too. 373 00:20:33,050 --> 00:20:35,900 And often as is the case with me, I make typos all the time. 374 00:20:35,900 --> 00:20:40,040 I might add the artwork title and misspell it for instance. 375 00:20:40,040 --> 00:20:44,600 Well, in that case, I can't use INSERT INTO to correct it, 376 00:20:44,600 --> 00:20:48,410 but I can use a new statement we'll see a little later called [? update. ?] 377 00:20:48,410 --> 00:20:50,210 And so with [? update, ?] you can actually 378 00:20:50,210 --> 00:20:51,830 change the spellings of things. 379 00:20:51,830 --> 00:20:53,930 And we'll see that a little later today. 380 00:20:53,930 --> 00:20:57,770 But great question to peek ahead as well. 381 00:20:57,770 --> 00:21:00,270 OK, so let's keep going here. 382 00:21:00,270 --> 00:21:04,500 Let's think about how we can keep adding values to our data set. 383 00:21:04,500 --> 00:21:09,620 And so far we've seen INSERT INTO with one row and with multiple. 384 00:21:09,620 --> 00:21:13,640 But one more way to keep in mind is you might already 385 00:21:13,640 --> 00:21:17,300 have your data, perhaps in some other format. 386 00:21:17,300 --> 00:21:22,940 And one common format looks a bit like this on our screen over here. 387 00:21:22,940 --> 00:21:29,750 This file is called CSV for Comma Separated Values. 388 00:21:29,750 --> 00:21:31,760 Now, why Comma Separated Values? 389 00:21:31,760 --> 00:21:33,350 Well, let's just look at it. 390 00:21:33,350 --> 00:21:36,710 Here we see some presumably column or column names, 391 00:21:36,710 --> 00:21:39,530 like [? id, ?] [? title, ?] [? accession number, ?] 392 00:21:39,530 --> 00:21:42,980 and [? acquired, ?] but what separates these column names? 393 00:21:42,980 --> 00:21:44,510 Well, it looks like commas-- 394 00:21:44,510 --> 00:21:48,470 "id" comma "title," comma "accession_number," and so on. 395 00:21:48,470 --> 00:21:51,350 Every row is still its own row. 396 00:21:51,350 --> 00:21:52,670 That makes sense here. 397 00:21:52,670 --> 00:21:57,410 But now, those row values are also separated by commas. 398 00:21:57,410 --> 00:22:01,100 Notice here, the first value before the first comma 399 00:22:01,100 --> 00:22:05,210 corresponds to the first column, this id is 1. 400 00:22:05,210 --> 00:22:09,050 Similarly, the next Comma Separated Value Profusion of flowers 401 00:22:09,050 --> 00:22:12,450 belongs to that next column here-- "title" as well, 402 00:22:12,450 --> 00:22:14,070 and so on and so forth. 403 00:22:14,070 --> 00:22:16,820 You could if you wanted to draw a snaky line 404 00:22:16,820 --> 00:22:21,780 to see how these columns correspond down our file here. 405 00:22:21,780 --> 00:22:25,720 Now, you might often have data already in this format. 406 00:22:25,720 --> 00:22:27,470 And it's actually pretty convenient to try 407 00:22:27,470 --> 00:22:31,760 to import this data into SQLite into your very own table 408 00:22:31,760 --> 00:22:34,170 so you can use [? SQL ?] on it. 409 00:22:34,170 --> 00:22:37,730 So let's try doing that here, taking a CSV like this 410 00:22:37,730 --> 00:22:40,610 and adding it to our database. 411 00:22:40,610 --> 00:22:42,600 I'll come back to my computer. 412 00:22:42,600 --> 00:22:45,870 And at this point, I want to start over. 413 00:22:45,870 --> 00:22:48,590 Let's say, I didn't use [? insert ?] at all. 414 00:22:48,590 --> 00:22:52,560 I actually got a CSV from the MFA of all the items in our collection. 415 00:22:52,560 --> 00:22:55,340 So I'll type .quit to leave this database. 416 00:22:55,340 --> 00:23:01,040 And now, I'll type rm mfa.db to remove this file all together. 417 00:23:01,040 --> 00:23:07,070 Now, let me show you mfa.csv, the CSV file I have that looks exactly 418 00:23:07,070 --> 00:23:08,480 like what we just saw. 419 00:23:08,480 --> 00:23:11,300 I'll type code mfa.csv. 420 00:23:11,300 --> 00:23:15,440 And now I can indeed see, I have an iD column, a title 421 00:23:15,440 --> 00:23:18,830 column, and accession_number, and the date these pieces were 422 00:23:18,830 --> 00:23:22,590 acquired all separated by commas. 423 00:23:22,590 --> 00:23:28,280 So let's say I want to quickly import this data into its very own database 424 00:23:28,280 --> 00:23:30,050 to run [? SQL ?] [? queries ?] on it. 425 00:23:30,050 --> 00:23:33,200 As it stands, I can't use [? SQL ?] [? queries ?] on this table 426 00:23:33,200 --> 00:23:36,110 because it's not inside of a SQLite database. 427 00:23:36,110 --> 00:23:37,370 But let me add one. 428 00:23:37,370 --> 00:23:43,470 I'll type SQLite3 mfa.db to remake [? MFA ?] down below. 429 00:23:43,470 --> 00:23:48,020 Let me now introduce this new command that is not a SQL keyword, 430 00:23:48,020 --> 00:23:50,400 but is actually a SQLite command. 431 00:23:50,400 --> 00:23:54,170 And this command is called .import. 432 00:23:54,170 --> 00:23:57,560 .import lets you take a file, like a CSV, 433 00:23:57,560 --> 00:24:04,010 and automatically insert it row by row into a table of your own making or one 434 00:24:04,010 --> 00:24:06,590 you can let SQLite create for you. 435 00:24:06,590 --> 00:24:08,660 Let's actually create the table ourselves, 436 00:24:08,660 --> 00:24:12,080 and then insert this CSV using .import. 437 00:24:12,080 --> 00:24:13,990 So I'll go back to SQLite. 438 00:24:13,990 --> 00:24:15,830 Let me recreate the schema. 439 00:24:15,830 --> 00:24:20,490 Right now, there's no tables inside this database, but let me create one. 440 00:24:20,490 --> 00:24:26,240 I'll instead type .read schema.sql to read in that old schema.sql file, 441 00:24:26,240 --> 00:24:28,460 so I can have my own table here. 442 00:24:28,460 --> 00:24:30,000 I'll type that schema again. 443 00:24:30,000 --> 00:24:33,150 And now I see, I have that table back. 444 00:24:33,150 --> 00:24:38,090 But if I type SELECT [? star ?] FROM "collections" semicolon, 445 00:24:38,090 --> 00:24:39,980 nothing is inside. 446 00:24:39,980 --> 00:24:41,070 Well, I could fix this. 447 00:24:41,070 --> 00:24:46,220 I could to import this CSV into collections to insert all at once from 448 00:24:46,220 --> 00:24:49,350 [? its ?] [? own, ?] from the CSV file we have right here. 449 00:24:49,350 --> 00:24:52,910 So to do this, let me type .import. 450 00:24:52,910 --> 00:24:59,270 Now before I finish this off, there are a few arguments or options I can give 451 00:24:59,270 --> 00:25:02,660 .import to make sure it works properly. 452 00:25:02,660 --> 00:25:05,430 The ones for this look as follows. 453 00:25:05,430 --> 00:25:10,730 I should type .import, and then [? dash ?] [? dash ?] CSV to say 454 00:25:10,730 --> 00:25:12,560 I'm importing a CSV. 455 00:25:12,560 --> 00:25:15,620 If I don't type [? dash ?] [? dash ?] CSV, 456 00:25:15,620 --> 00:25:19,040 SQLite I might assume something about this file that just isn't true. 457 00:25:19,040 --> 00:25:22,490 It should know that numbers are here separated by commas, 458 00:25:22,490 --> 00:25:25,130 [? numbers, ?] or [? values, ?] whatever's inside my table are 459 00:25:25,130 --> 00:25:27,300 separated by commas. 460 00:25:27,300 --> 00:25:29,840 And then [? dash ?] [? dash ?] skip 1-- 461 00:25:29,840 --> 00:25:31,070 well, let's take a look. 462 00:25:31,070 --> 00:25:34,280 If I go back to this file here, let's see. 463 00:25:34,280 --> 00:25:38,900 I have the first row, the second row, the third row, the fourth row. 464 00:25:38,900 --> 00:25:43,915 Are there any rows I probably shouldn't insert into my table here? 465 00:25:43,915 --> 00:25:44,915 Let me ask the audience. 466 00:25:44,915 --> 00:25:46,460 You're free to raise your hand. 467 00:25:46,460 --> 00:25:49,775 Are there any rows I shouldn't insert into this table? 468 00:25:49,775 --> 00:25:52,970 469 00:25:52,970 --> 00:25:54,320 Let's go to [? Sukanya. ?] 470 00:25:54,320 --> 00:25:57,410 SPEAKER 5: Yes, I think they shouldn't include the first one, 471 00:25:57,410 --> 00:25:59,910 because it doesn't give us any value that we need. 472 00:25:59,910 --> 00:26:00,450 SPEAKER 1: Yeah, you're right. 473 00:26:00,450 --> 00:26:02,630 So if I look at this header row, as we call it, 474 00:26:02,630 --> 00:26:08,300 I see id, title, accession_number, and acquired-- the names for my columns. 475 00:26:08,300 --> 00:26:12,920 But notice in my schema, I already have those column names existing. 476 00:26:12,920 --> 00:26:17,840 So I shouldn't insert the value id into my "id" column. 477 00:26:17,840 --> 00:26:21,060 I want to just skip that row and only do the next one. 478 00:26:21,060 --> 00:26:25,880 So here by typing [? dash ?] [? dash ?] skip 1, I'll skip that header row. 479 00:26:25,880 --> 00:26:26,900 So let's try this now. 480 00:26:26,900 --> 00:26:29,420 I'll say .import [? dash ?] [? dash ?] CSV. 481 00:26:29,420 --> 00:26:31,294 I'm going to import a CSV. 482 00:26:31,294 --> 00:26:34,640 Now, [? dash ?] [? dash ?] skip, how many rows should I skip? 483 00:26:34,640 --> 00:26:37,040 Well, just one, the header row there. 484 00:26:37,040 --> 00:26:39,500 Now, I say the file to import. 485 00:26:39,500 --> 00:26:44,960 So I'll type mfa.csv, the CSV file to import into my database. 486 00:26:44,960 --> 00:26:49,040 Now, I type the name of the table I'm importing into, 487 00:26:49,040 --> 00:26:51,030 collections, in this case. 488 00:26:51,030 --> 00:26:56,060 And notice how I no longer need to [? quote ?] collections or [? quote ?] 489 00:26:56,060 --> 00:27:01,508 CSV, because this is not a SQL statement this is a SQLite statement, 490 00:27:01,508 --> 00:27:03,800 so I can get away with not [? quoting ?] anything here. 491 00:27:03,800 --> 00:27:05,810 Now, I'll hit Enter. 492 00:27:05,810 --> 00:27:08,570 And again, nothing seems to happen, but it's probably a good sign. 493 00:27:08,570 --> 00:27:14,240 Let me try selecting [? star ?] [? from ?] "collections" semicolon, 494 00:27:14,240 --> 00:27:15,140 Enter. 495 00:27:15,140 --> 00:27:17,960 And now all my data is just magically in there. 496 00:27:17,960 --> 00:27:23,903 It's went from my CSV into my very own table. 497 00:27:23,903 --> 00:27:27,590 OK, so that's a pretty nice step forward. 498 00:27:27,590 --> 00:27:31,100 No longer do we have to use just single line inserts or even 499 00:27:31,100 --> 00:27:33,530 bulk inserts, those multi-line inserts. 500 00:27:33,530 --> 00:27:37,790 We can now just import an entire table from a CSV. 501 00:27:37,790 --> 00:27:42,080 But I think I've been showing you a bit of an inaccuracy here, at least 502 00:27:42,080 --> 00:27:44,480 that's not often going to happen to you. 503 00:27:44,480 --> 00:27:49,280 If I go back to this file here, this CSV. 504 00:27:49,280 --> 00:27:53,480 What you might notice is that I specified the primary key. 505 00:27:53,480 --> 00:27:57,470 I said [? id ?] of 1, [? id ?] of 2, [? id ?] of 3. 506 00:27:57,470 --> 00:28:01,190 But if we'd heard before, this might not be the best design. 507 00:28:01,190 --> 00:28:04,610 Because what if I import this CSV, and there's already 508 00:28:04,610 --> 00:28:07,820 an item that has the id of [? 1 ?] or the id of 2. 509 00:28:07,820 --> 00:28:12,020 Ideally, I could let SQLite create its very own primary keys 510 00:28:12,020 --> 00:28:13,700 for each of these rows. 511 00:28:13,700 --> 00:28:15,950 And more often than not, you'll likely have 512 00:28:15,950 --> 00:28:20,870 a database or CSV that looks a bit more like this without the primary key. 513 00:28:20,870 --> 00:28:25,710 But you'll still want to have a primary key when you import into your database. 514 00:28:25,710 --> 00:28:29,000 So let's try working with this kind of CSV now 515 00:28:29,000 --> 00:28:32,870 and having SQLite generate some of the primary keys for us. 516 00:28:32,870 --> 00:28:34,730 Come back to my computer. 517 00:28:34,730 --> 00:28:40,440 And let's update our mfa.csv just to remove those primary key column here. 518 00:28:40,440 --> 00:28:44,900 So I will open up again mfa.csv. 519 00:28:44,900 --> 00:28:47,600 And let me try to just delete this column all together. 520 00:28:47,600 --> 00:28:54,680 I'll move [? id, ?] remove 1, and 2, and 3, and 4, and 5. 521 00:28:54,680 --> 00:28:59,250 Let me save it, and now that id column is gone. 522 00:28:59,250 --> 00:29:01,020 But now there's a problem. 523 00:29:01,020 --> 00:29:05,550 Like if I go to my schema, and I type .schema, 524 00:29:05,550 --> 00:29:09,300 notice how there are four columns in this table-- 525 00:29:09,300 --> 00:29:13,350 "id," "title," "accession_number," and "acquired." 526 00:29:13,350 --> 00:29:16,990 Well, now my CSV only has three columns. 527 00:29:16,990 --> 00:29:20,520 So if I try to import this CSV into this table, 528 00:29:20,520 --> 00:29:23,730 I'm going to run into trouble, because I have different numbers of columns 529 00:29:23,730 --> 00:29:26,755 [? between ?] my CSV and my table. 530 00:29:26,755 --> 00:29:29,130 Well, what can I do instead? 531 00:29:29,130 --> 00:29:32,910 It turns out that I could actually import this data 532 00:29:32,910 --> 00:29:36,480 into a temporary table, and then take that data 533 00:29:36,480 --> 00:29:40,050 and insert that data in that temporary table into my real one 534 00:29:40,050 --> 00:29:41,670 called "collections" here. 535 00:29:41,670 --> 00:29:47,670 I can use both .import and INSERT INTO to accomplish that task for me. 536 00:29:47,670 --> 00:29:48,810 So let's try this. 537 00:29:48,810 --> 00:29:50,760 I'll say .import. 538 00:29:50,760 --> 00:29:54,990 I want to import a CSV now-- [? dash ?] [? dash ?] [? CSV. ?] Which one? 539 00:29:54,990 --> 00:29:58,080 Well, mfa.csv in this case. 540 00:29:58,080 --> 00:30:01,902 Now, though, I want to create a brand new table that doesn't yet exist. 541 00:30:01,902 --> 00:30:03,360 So I'll type the name of the table. 542 00:30:03,360 --> 00:30:06,900 And I just call it temp for now to temporarily import 543 00:30:06,900 --> 00:30:09,390 this data into a brand new table. 544 00:30:09,390 --> 00:30:12,260 And notice how here I'm not using [? dash ?] [? dash ?] 545 00:30:12,260 --> 00:30:16,350 skip 1, because now I want to take advantage of those header rows. 546 00:30:16,350 --> 00:30:22,810 SQLite, if I import into a new table, will notice I have a header row 547 00:30:22,810 --> 00:30:27,600 and make my columns the very same names that are in my header row-- 548 00:30:27,600 --> 00:30:30,210 title, accession number, and acquired. 549 00:30:30,210 --> 00:30:31,380 I don't want to skip them. 550 00:30:31,380 --> 00:30:36,610 I want SQLite to see them and create this table using those header rows. 551 00:30:36,610 --> 00:30:39,990 So now, I'll hit Enter, and nothing seems to happen. 552 00:30:39,990 --> 00:30:45,000 But if I type .schema, what do I see, but a brand new table called "temp" 553 00:30:45,000 --> 00:30:49,050 that SQLite has just made automatically and notice how it used that header row. 554 00:30:49,050 --> 00:30:54,795 I have "title," "accession_number," and "acquired" as my column names. 555 00:30:54,795 --> 00:30:57,380 OK, let's now look inside the temp table, 556 00:30:57,380 --> 00:31:02,520 I'll say SELECT [? star ?] FROM "temp" semicolon. 557 00:31:02,520 --> 00:31:04,350 And now, I see all that data in there. 558 00:31:04,350 --> 00:31:06,200 But I don't have primary keys yet. 559 00:31:06,200 --> 00:31:10,070 So my goal [? is ?] [? really ?] to take all this data and insert it 560 00:31:10,070 --> 00:31:14,630 into my collections table to give it that primary key I've been wanting 561 00:31:14,630 --> 00:31:15,350 here. 562 00:31:15,350 --> 00:31:20,998 Again, if I type SELECT [? star ?] FROM "collections," well, nothing-- oh, 563 00:31:20,998 --> 00:31:22,040 something is still there. 564 00:31:22,040 --> 00:31:24,217 Let me actually just delete this [INAUDIBLE] 565 00:31:24,217 --> 00:31:27,050 I'll say DELETE FROM "collections" to remove it all together-- we'll 566 00:31:27,050 --> 00:31:28,550 [? get ?] [INAUDIBLE] just a minute. 567 00:31:28,550 --> 00:31:33,020 And now let me try, SELECT [? star ?] FROM "collections," 568 00:31:33,020 --> 00:31:36,750 and I should see nothing inside of collections. 569 00:31:36,750 --> 00:31:38,040 So what could I do? 570 00:31:38,040 --> 00:31:41,630 I could [? INSERT ?] [? INTO ?] [? collections ?] using the data from 571 00:31:41,630 --> 00:31:43,130 my temp table. 572 00:31:43,130 --> 00:31:43,980 So I'll try that. 573 00:31:43,980 --> 00:31:49,460 I'll say INSERT INTO "collections" and choose those columns yet again. 574 00:31:49,460 --> 00:31:54,450 I'll say the "title," and the "accession_number," 575 00:31:54,450 --> 00:31:58,900 and now let's go for the "acquired" column as well. 576 00:31:58,900 --> 00:32:00,370 I'll hit Enter. 577 00:32:00,370 --> 00:32:04,330 But instead of typing many new lines of values, 578 00:32:04,330 --> 00:32:10,240 I actually have a way to dynamically select all the values I want to insert, 579 00:32:10,240 --> 00:32:12,170 and it looks a bit like this. 580 00:32:12,170 --> 00:32:18,890 I could say INSERT INTO [? some ?] table and some columns of that table. 581 00:32:18,890 --> 00:32:23,330 But I want to insert the results of this SELECT down below. 582 00:32:23,330 --> 00:32:26,720 So I'll [? SELECT ?] some columns [? FROM ?] some separate table. 583 00:32:26,720 --> 00:32:31,850 And so long as these columns align, I'll be able to actually take the results 584 00:32:31,850 --> 00:32:35,180 of that [? SELECT ?] and [? insert ?] all of them [? into ?] this new table 585 00:32:35,180 --> 00:32:37,070 using INSERT INTO. 586 00:32:37,070 --> 00:32:39,920 So let me finish my statement here, we can see the results. 587 00:32:39,920 --> 00:32:42,050 Come back to my computer. 588 00:32:42,050 --> 00:32:47,690 And here, I'll type a SELECT to get back all the items from my temp table. 589 00:32:47,690 --> 00:32:52,070 I'll say SELECT-- in this case, SELECT, let's 590 00:32:52,070 --> 00:32:56,450 go for the "title" column, the "accession_number" 591 00:32:56,450 --> 00:33:01,100 column, and the "acquired" column from my "temp" table. 592 00:33:01,100 --> 00:33:05,270 Now, if I hit semicolon, I should see-- 593 00:33:05,270 --> 00:33:06,470 well, nothing at first. 594 00:33:06,470 --> 00:33:11,000 But if I type SELECT [? star ?] FROM "collections," 595 00:33:11,000 --> 00:33:16,610 I see all my data now in there selected from my temporary table. 596 00:33:16,610 --> 00:33:19,623 And now if I type .schema, I still see temp. 597 00:33:19,623 --> 00:33:20,540 But what can I do now? 598 00:33:20,540 --> 00:33:22,710 I could just delete that table altogether. 599 00:33:22,710 --> 00:33:28,370 I could say DROP TABLE "temp," as we saw last week, semicolon, hit Enter, 600 00:33:28,370 --> 00:33:30,090 .schema again. 601 00:33:30,090 --> 00:33:33,680 And now, we're back where we want to be with a single table, 602 00:33:33,680 --> 00:33:39,030 and now we've import our data from our CSV with primary keys. 603 00:33:39,030 --> 00:33:44,650 If I type "collections" here, we see it all in this table. 604 00:33:44,650 --> 00:33:48,190 OK, so we've seen several versions of import, one 605 00:33:48,190 --> 00:33:52,460 to import into an existing table and one to import into a brand new table. 606 00:33:52,460 --> 00:33:58,540 Let me ask then what questions we have on how to use .import or how to insert 607 00:33:58,540 --> 00:34:03,240 more data into our database. 608 00:34:03,240 --> 00:34:04,430 Let's go [? Louise. ?] 609 00:34:04,430 --> 00:34:08,330 SPEAKER 6: Can we ask the INSERT INTO command to place 610 00:34:08,330 --> 00:34:10,387 a column in a specific position? 611 00:34:10,387 --> 00:34:11,429 SPEAKER 1: Good question. 612 00:34:11,429 --> 00:34:16,028 Can we ask INSERT INTO the place a column in a certain position? 613 00:34:16,028 --> 00:34:19,070 There's a few ways of getting [? it-- ?] what I think you're asking here, 614 00:34:19,070 --> 00:34:20,695 so let me show you a few of those here. 615 00:34:20,695 --> 00:34:26,389 I'll come back to my computer, and let me pull up the syntax for INSERT INTO 616 00:34:26,389 --> 00:34:27,210 again. 617 00:34:27,210 --> 00:34:31,800 I will choose this, and I'll show it on the screen over here. 618 00:34:31,800 --> 00:34:36,409 So I'll say INSERT INTO some table given some columns. 619 00:34:36,409 --> 00:34:39,139 And notice how down below, I actually have [? values ?] 620 00:34:39,139 --> 00:34:41,600 to insert into those columns. 621 00:34:41,600 --> 00:34:45,120 I could, to your question, rearrange these values. 622 00:34:45,120 --> 00:34:49,670 So I might have this first value here is some value that goes into this column. 623 00:34:49,670 --> 00:34:52,159 The second column here is attached to that column. 624 00:34:52,159 --> 00:34:56,000 And that would rearrange the values I insert into those columns. 625 00:34:56,000 --> 00:34:59,270 If you're asking, though, if I can reorder the columns, 626 00:34:59,270 --> 00:35:02,733 that is up to [? CREATE ?] [? TABLE ?] and [? CREATE ?] [? TABLE ?] only. 627 00:35:02,733 --> 00:35:05,900 And in general, I encourage you not so much out of ordering of your columns, 628 00:35:05,900 --> 00:35:08,233 [? because ?] [? it ?] could be in any order whatsoever. 629 00:35:08,233 --> 00:35:10,880 But you could just rearrange your selection of columns 630 00:35:10,880 --> 00:35:14,780 here to insert the data you want to insert. 631 00:35:14,780 --> 00:35:17,610 OK, let's take one more question here. 632 00:35:17,610 --> 00:35:20,390 SPEAKER 7: What happens if one of the rows you're trying to insert 633 00:35:20,390 --> 00:35:22,340 violates a constraint on the table? 634 00:35:22,340 --> 00:35:25,140 SPEAKER 1: Yeah, so here we're inserting multiple rows. 635 00:35:25,140 --> 00:35:28,430 And if one actually violates some constraint, 636 00:35:28,430 --> 00:35:31,610 then we won't actually insert any of those rows. 637 00:35:31,610 --> 00:35:34,010 And this is because this insert is bundled inside 638 00:35:34,010 --> 00:35:38,640 of a transaction, which we'll learn more about later on in the course. 639 00:35:38,640 --> 00:35:40,233 Let's take another one. 640 00:35:40,233 --> 00:35:43,400 SPEAKER 8: I noticed that when you did SELECT [? star ?] FROM "collections," 641 00:35:43,400 --> 00:35:45,740 that one of the acquired dates was just blank. 642 00:35:45,740 --> 00:35:47,030 It didn't say NULL. 643 00:35:47,030 --> 00:35:51,290 Is that because the CSV itself just had an empty value next to the comma? 644 00:35:51,290 --> 00:35:52,790 SPEAKER 1: Yeah, great observations. 645 00:35:52,790 --> 00:35:53,850 So let me try this again. 646 00:35:53,850 --> 00:35:55,808 So we can see the results of this [? SELECT. ?] 647 00:35:55,808 --> 00:36:00,020 I'll come back to my computer here, and let me show you again. 648 00:36:00,020 --> 00:36:03,140 If I say SELECT [? star ?] FROM "collections" 649 00:36:03,140 --> 00:36:07,970 semicolon, notice how this acquired column-- this acquired cell here, if I 650 00:36:07,970 --> 00:36:11,330 go back to my screen, it's just blank. 651 00:36:11,330 --> 00:36:14,720 But we saw before that if I [? selected ?] [? star, ?] I would have 652 00:36:14,720 --> 00:36:19,460 seen [? NULL ?] there if this value was truly [? NULL. ?] Well, 653 00:36:19,460 --> 00:36:24,890 it turns out one downside of importing from a CSV is that all of your data is 654 00:36:24,890 --> 00:36:27,230 imported initially as text. 655 00:36:27,230 --> 00:36:30,920 And if I have just a blank cell in my CSV, 656 00:36:30,920 --> 00:36:34,130 it won't be converted to [? NULL ?] automatically. 657 00:36:34,130 --> 00:36:36,830 I need to do that myself perhaps using an [? update ?] 658 00:36:36,830 --> 00:36:38,790 statement that we'll see in just a bit. 659 00:36:38,790 --> 00:36:42,140 So be wary of this if you do want to keep track of [? null ?] [? values ?] 660 00:36:42,140 --> 00:36:42,930 and so on. 661 00:36:42,930 --> 00:36:46,070 If you don't actually manually make this value a [? null, ?] 662 00:36:46,070 --> 00:36:49,230 it'll appear as just a blank value not a null, 663 00:36:49,230 --> 00:36:52,770 which is different in this respect. 664 00:36:52,770 --> 00:36:58,380 OK so here, we've seen how to insert not just one row, but multiple, 665 00:36:58,380 --> 00:37:01,840 and also how to import data from a CSV. 666 00:37:01,840 --> 00:37:05,980 When we come back, we'll see how to actually update our data altogether, 667 00:37:05,980 --> 00:37:07,275 and even delete it too. 668 00:37:07,275 --> 00:37:09,150 So come back in just a few and talk about how 669 00:37:09,150 --> 00:37:10,950 to delete data from our tables. 670 00:37:10,950 --> 00:37:12,690 And we're back. 671 00:37:12,690 --> 00:37:16,540 So we just saw how to insert some rows into our database, 672 00:37:16,540 --> 00:37:18,480 and also to import some data. 673 00:37:18,480 --> 00:37:22,320 But presumably, we also want to be able to delete some data from our table 674 00:37:22,320 --> 00:37:23,290 as well. 675 00:37:23,290 --> 00:37:26,130 You can imagine the MFA, the Museum of Fine Arts, 676 00:37:26,130 --> 00:37:29,460 maybe they're selling a piece of artwork, or maybe they've lost one, 677 00:37:29,460 --> 00:37:30,750 or maybe it was stolen. 678 00:37:30,750 --> 00:37:34,980 But either way, they want to remove the row from their table. 679 00:37:34,980 --> 00:37:36,390 Well, let's see here. 680 00:37:36,390 --> 00:37:41,100 We can go back to our schema with a table of artifacts and artwork 681 00:37:41,100 --> 00:37:43,620 that is inside of the MFA. 682 00:37:43,620 --> 00:37:45,870 Now if I want to delete a particular piece, 683 00:37:45,870 --> 00:37:47,950 you could visually think of it a bit like this. 684 00:37:47,950 --> 00:37:51,270 I could first identify that row I want to delete. 685 00:37:51,270 --> 00:37:52,980 Let's say it's Spring outing here. 686 00:37:52,980 --> 00:37:54,810 We've sold this piece. 687 00:37:54,810 --> 00:37:58,210 Well, I could visually just remove this row. 688 00:37:58,210 --> 00:38:02,460 So it's no longer there and shift the remaining ones up metaphorically. 689 00:38:02,460 --> 00:38:05,610 And it turns out that to do this in SQL, we 690 00:38:05,610 --> 00:38:08,220 have our very own statement we can use. 691 00:38:08,220 --> 00:38:14,830 This one reads DELETE FROM some table WHERE a condition is true. 692 00:38:14,830 --> 00:38:17,550 So we see our old friend WHERE back again. 693 00:38:17,550 --> 00:38:20,610 And WHERE is vitally important to this DELETE FROM. 694 00:38:20,610 --> 00:38:23,910 If I say DELETE FROM table with no WHERE, 695 00:38:23,910 --> 00:38:26,790 what do you think is going to happen? 696 00:38:26,790 --> 00:38:29,850 We might drop everything from our table, right? 697 00:38:29,850 --> 00:38:32,910 But if I instead say WHERE some condition is true, 698 00:38:32,910 --> 00:38:38,170 I can select the rows I want to delete and only delete those rows. 699 00:38:38,170 --> 00:38:39,900 So let's try this here. 700 00:38:39,900 --> 00:38:43,080 I'll try to delete some artwork from our collections table, 701 00:38:43,080 --> 00:38:45,610 for example, maybe we sold it and we want get rid of it. 702 00:38:45,610 --> 00:38:51,960 I'll come back to my computer over here, and I will open up our SQLite database. 703 00:38:51,960 --> 00:38:54,540 I'll say sqlite3 mfa.db. 704 00:38:54,540 --> 00:38:58,290 And now, I can type SELECT [? star ?] [? FROM ?] "collections," 705 00:38:58,290 --> 00:39:03,990 and I see a few more items than last time now all inside of our table. 706 00:39:03,990 --> 00:39:07,650 But per the visual, I want to delete Spring outing, 707 00:39:07,650 --> 00:39:09,820 to remove it from this table. 708 00:39:09,820 --> 00:39:10,900 So what could I do? 709 00:39:10,900 --> 00:39:15,270 I could try DELETE FROM and then the table I want to delete from. 710 00:39:15,270 --> 00:39:20,400 In this case, I'll use the let's say [? collections' ?] [? name, ?] DELETE 711 00:39:20,400 --> 00:39:22,320 FROM "collections." 712 00:39:22,320 --> 00:39:23,620 But I don't want to do this. 713 00:39:23,620 --> 00:39:25,590 I don't want to say DELETE FROM "collections" 714 00:39:25,590 --> 00:39:30,780 semicolon end of statement, because then I'll delete every row. 715 00:39:30,780 --> 00:39:33,540 What I should instead do is this. 716 00:39:33,540 --> 00:39:37,590 DELETE FROM "collections" WHERE some condition is true. 717 00:39:37,590 --> 00:39:41,370 Maybe the title in this case is a title I'm trying to remove. 718 00:39:41,370 --> 00:39:47,430 So I'll say "title" equals "Spring outing," just like this, semicolon, 719 00:39:47,430 --> 00:39:48,510 Enter. 720 00:39:48,510 --> 00:39:49,740 Now, nothing seems to happen. 721 00:39:49,740 --> 00:39:55,140 But if I do SELECT [? star ?] FROM "collections" semicolon, 722 00:39:55,140 --> 00:39:57,420 we no longer see Spring outing. 723 00:39:57,420 --> 00:40:02,580 And notice here, how the id column, the id of 3 is now gone. 724 00:40:02,580 --> 00:40:05,880 If I were to insert some new row here, I would start over 725 00:40:05,880 --> 00:40:10,110 with the highest number, which is 7 in this case, going from 7 to 8. 726 00:40:10,110 --> 00:40:14,150 But 3 is no longer part of our database. 727 00:40:14,150 --> 00:40:18,170 Let me try this again, notice how this one called Imaginative Landscape, 728 00:40:18,170 --> 00:40:19,940 the acquired date is NULL. 729 00:40:19,940 --> 00:40:21,813 We don't know when we got it. 730 00:40:21,813 --> 00:40:23,480 Well, let's say that we eventually sell. 731 00:40:23,480 --> 00:40:25,010 It's no longer part of our database. 732 00:40:25,010 --> 00:40:29,810 We could use a condition based on NULL to remove this particular artwork. 733 00:40:29,810 --> 00:40:30,810 So I'll say this. 734 00:40:30,810 --> 00:40:36,770 Let me try to DELETE FROM "collections" WHERE "title" equals-- 735 00:40:36,770 --> 00:40:39,920 not Spring outing, but WHERE the acquired date-- sorry, 736 00:40:39,920 --> 00:40:44,480 the "acquired" date IS NULL semicolon. 737 00:40:44,480 --> 00:40:47,490 I could also use is NOT NULL-- 738 00:40:47,490 --> 00:40:51,350 WHERE "acquired" IS NOT NULL, but that would delete all the pieces of artwork 739 00:40:51,350 --> 00:40:53,510 that actually have an acquired date. 740 00:40:53,510 --> 00:40:57,260 Here, I want to leave only those that do not have an acquired date. 741 00:40:57,260 --> 00:41:00,770 So I'll say where "acquired" IS NULL. 742 00:41:00,770 --> 00:41:03,320 Now, I'll hit Enter, nothing seems to happen, 743 00:41:03,320 --> 00:41:07,850 but I'll say Select [? star ?] FROM "collections" semicolon. 744 00:41:07,850 --> 00:41:12,020 And I'll see that piece of artwork is now gone. 745 00:41:12,020 --> 00:41:16,820 I only have those that have an acquired date. 746 00:41:16,820 --> 00:41:22,880 OK, so similar to [? INSERT, ?] we've been able to [? delete ?] one row 747 00:41:22,880 --> 00:41:23,990 at a time. 748 00:41:23,990 --> 00:41:28,370 But it would probably be helpful for us to delete multiple rows at a time 749 00:41:28,370 --> 00:41:29,550 as well. 750 00:41:29,550 --> 00:41:31,460 Let's look again at our table. 751 00:41:31,460 --> 00:41:36,200 Here we have our artwork, as it currently stands, and maybe 752 00:41:36,200 --> 00:41:42,020 let's say we want to delete those pieces that we acquired before 1909. 753 00:41:42,020 --> 00:41:46,970 Here in Boston, the MFA actually moved locations in 1909 to a new place 754 00:41:46,970 --> 00:41:49,770 still in Boston, but a brand new building altogether. 755 00:41:49,770 --> 00:41:53,090 So let's say that perhaps they left some items at their old location, 756 00:41:53,090 --> 00:41:55,760 they're no longer part of their collections. 757 00:41:55,760 --> 00:42:02,920 Via what condition do you think could I select these three rows from 758 00:42:02,920 --> 00:42:08,670 my [? DELETE ?] [? FROM? ?] What could I put in my condition to delete these 759 00:42:08,670 --> 00:42:09,900 three rows? 760 00:42:09,900 --> 00:42:11,670 Let me ask the audience here. 761 00:42:11,670 --> 00:42:15,780 What might you propose for [? delete ?] [? from ?] here to delete those three 762 00:42:15,780 --> 00:42:16,575 particular rows? 763 00:42:16,575 --> 00:42:21,540 764 00:42:21,540 --> 00:42:22,650 Let's go to [? Shiv. ?] 765 00:42:22,650 --> 00:42:26,468 SPEAKER 9: Greater than [? 5. ?] [? id ?] equals greater than 5. 766 00:42:26,468 --> 00:42:28,010 SPEAKER 1: That's a good observation. 767 00:42:28,010 --> 00:42:29,468 So we could look at our table here. 768 00:42:29,468 --> 00:42:33,490 We see the id could be greater than or equal to 5, that would 769 00:42:33,490 --> 00:42:36,220 remove these three rows certainly. 770 00:42:36,220 --> 00:42:38,380 There's probably another way to do this too. 771 00:42:38,380 --> 00:42:42,190 Any other ideas besides the id, what else can we use? 772 00:42:42,190 --> 00:42:49,860 773 00:42:49,860 --> 00:42:51,710 Let's go to [? Yasavi. ?] 774 00:42:51,710 --> 00:42:55,070 SPEAKER 10: So we could compare the dates and use greater than or less 775 00:42:55,070 --> 00:42:58,340 than sign with the acquired dates. 776 00:42:58,340 --> 00:43:00,800 SPEAKER 1: So you're proposing to use the date column too. 777 00:43:00,800 --> 00:43:02,120 That makes sense to me as well. 778 00:43:02,120 --> 00:43:05,300 We could see acquired here, perhaps we could see, 779 00:43:05,300 --> 00:43:11,100 is the acquired value less than 1909, that year that we move locations, 780 00:43:11,100 --> 00:43:14,510 we could also probably use the title, the [? accession ?] [? number, ?] 781 00:43:14,510 --> 00:43:15,390 and so on. 782 00:43:15,390 --> 00:43:17,600 And each has their own [? trade-offs. ?] 783 00:43:17,600 --> 00:43:20,030 Remember, though, for this particular query, which 784 00:43:20,030 --> 00:43:23,480 is deleting those items that were part of the museum 785 00:43:23,480 --> 00:43:29,210 acquired before 1909, probably best to use the acquired 786 00:43:29,210 --> 00:43:32,240 date to actually delete those rows. 787 00:43:32,240 --> 00:43:35,930 What if, for instance, our ids are actually not like this. 788 00:43:35,930 --> 00:43:38,210 But they're kind of interspersed around. 789 00:43:38,210 --> 00:43:41,870 I couldn't really do a query like if the id is greater than or equal to 5, 790 00:43:41,870 --> 00:43:45,110 because I could also include those that were maybe acquired 1911, 791 00:43:45,110 --> 00:43:48,680 [? 1956, ?] and so on. [? It's ?] [? probably ?] ideal for me to use this 792 00:43:48,680 --> 00:43:52,140 date if it's the date that I ultimately care about. 793 00:43:52,140 --> 00:43:53,210 So let's try this. 794 00:43:53,210 --> 00:43:55,580 I'll go back to my environment. 795 00:43:55,580 --> 00:43:59,090 And I'll try to delete these three particular rows. 796 00:43:59,090 --> 00:44:03,750 I'll try DELETE FROM "collections," as we saw before. 797 00:44:03,750 --> 00:44:05,630 But now here comes my condition. 798 00:44:05,630 --> 00:44:12,230 I'll say WHERE the "acquired" column is less than some date that I'll give. 799 00:44:12,230 --> 00:44:17,180 And here, I'll give 1909-01-01. 800 00:44:17,180 --> 00:44:21,260 It turns out that SQLite has a few ways to represent dates. 801 00:44:21,260 --> 00:44:26,300 One of which is this format that follows the YYYY or the year, 802 00:44:26,300 --> 00:44:31,280 four digits, and then dash, and then MM or the month with two digits, 803 00:44:31,280 --> 00:44:35,340 and then dash again, DD, in this case, the day in two digits. 804 00:44:35,340 --> 00:44:42,470 So 1909-01-01 means January 1st, 1909. 805 00:44:42,470 --> 00:44:46,430 And I can use these same operators with dates. 806 00:44:46,430 --> 00:44:51,230 I could say greater than, or less than, or equal to this particular date. 807 00:44:51,230 --> 00:44:54,380 And SQLite will be able to parse this for me to understand what I mean, 808 00:44:54,380 --> 00:44:58,010 which is earlier than 1909. 809 00:44:58,010 --> 00:44:59,330 So let's try this then. 810 00:44:59,330 --> 00:45:02,450 I'll hit Enter, and nothing seems to happen. 811 00:45:02,450 --> 00:45:05,870 But if I do SELECT [? star ?] FROM "collections," 812 00:45:05,870 --> 00:45:10,820 I now see I'm down to two paintings, two pieces of artwork that 813 00:45:10,820 --> 00:45:15,720 were acquired only after 1909. 814 00:45:15,720 --> 00:45:23,190 OK, so given this part on DELETE, we're able to delete one row or even multiple 815 00:45:23,190 --> 00:45:24,315 rows at a time. 816 00:45:24,315 --> 00:45:26,190 But there are still instances where you might 817 00:45:26,190 --> 00:45:31,080 want to think about, should we delete something or can we delete something. 818 00:45:31,080 --> 00:45:35,010 We'll talk about those in just a minute in terms of constraints. 819 00:45:35,010 --> 00:45:36,180 We back in a few. 820 00:45:36,180 --> 00:45:37,860 And we're back. 821 00:45:37,860 --> 00:45:43,260 So we just saw how to delete one row, even multiple rows. 822 00:45:43,260 --> 00:45:46,620 We haven't yet talked about whether we should delete some data 823 00:45:46,620 --> 00:45:49,020 or whether we should delete some data and particularly 824 00:45:49,020 --> 00:45:51,180 in the context of these constraints, where 825 00:45:51,180 --> 00:45:53,340 you might have maybe a piece of data you actually 826 00:45:53,340 --> 00:45:57,460 shouldn't delete to maintain the integrity of your table. 827 00:45:57,460 --> 00:46:00,870 Now, one example of this is a [? Foreign ?] [? Key ?] 828 00:46:00,870 --> 00:46:05,880 [? Constraint, ?] which means that you have some table with a primary key that 829 00:46:05,880 --> 00:46:08,163 is referenced by some other table. 830 00:46:08,163 --> 00:46:10,080 And we'll get concrete about this in a minute. 831 00:46:10,080 --> 00:46:13,380 But it might mean that if you were to delete that primary key, 832 00:46:13,380 --> 00:46:16,210 that other table would have nothing to reference. 833 00:46:16,210 --> 00:46:20,610 So let's update our schema here for the MFA, the Museum of Fine Arts. 834 00:46:20,610 --> 00:46:25,140 And let's say that they have not just a collection now, but also artists 835 00:46:25,140 --> 00:46:25,890 involved. 836 00:46:25,890 --> 00:46:29,970 And there exists a [? many ?] [? to ?] [? many ?] relationship among artists 837 00:46:29,970 --> 00:46:32,160 and items in the collection. 838 00:46:32,160 --> 00:46:36,480 We could say that an artist might make more than one piece of art 839 00:46:36,480 --> 00:46:37,710 in the collection. 840 00:46:37,710 --> 00:46:42,480 And a piece of art might be created by more than one artist. 841 00:46:42,480 --> 00:46:47,050 Now concretely in our database, this might look as follows. 842 00:46:47,050 --> 00:46:51,480 I could have a table for artists and a table for collections. 843 00:46:51,480 --> 00:46:54,780 And notice how each has their own primary key. 844 00:46:54,780 --> 00:46:58,110 Artists has an id column, called our primary key column. 845 00:46:58,110 --> 00:47:02,400 Collections too has an id called our primary key column here. 846 00:47:02,400 --> 00:47:06,420 Now, in the middle is this table created that symbolizes, 847 00:47:06,420 --> 00:47:11,280 that represents the relationship among artists and items in the collection. 848 00:47:11,280 --> 00:47:16,380 Here on the first row, we see the artist id of 1 created the piece of art 849 00:47:16,380 --> 00:47:18,750 with the collection id of 2. 850 00:47:18,750 --> 00:47:20,830 So who did what here? 851 00:47:20,830 --> 00:47:25,200 Well, we can see that the artist with the ID of 1 is Li Yin. 852 00:47:25,200 --> 00:47:31,740 Li Yin created this piece of art, the id of 2, which is Imaginative landscape. 853 00:47:31,740 --> 00:47:35,340 So we can kind of relate in this case artists with collections 854 00:47:35,340 --> 00:47:37,890 as we saw just a few weeks ago. 855 00:47:37,890 --> 00:47:41,880 But let's say we decide to delete a particular artist, 856 00:47:41,880 --> 00:47:45,510 maybe we delete Unidentified artists down here. 857 00:47:45,510 --> 00:47:48,100 Well, we could just delete from the artist table. 858 00:47:48,100 --> 00:47:53,940 Maybe we find a condition to select this row and we delete from artists. 859 00:47:53,940 --> 00:47:56,610 We'll do that here remove that row. 860 00:47:56,610 --> 00:47:59,520 But what have we done wrong? 861 00:47:59,520 --> 00:48:03,560 If we look at our created table in particular, 862 00:48:03,560 --> 00:48:06,350 what have we now done wrong? 863 00:48:06,350 --> 00:48:13,860 What kind of problem might we [? run ?] [? into? ?] Let's go to [? Karim. ?] 864 00:48:13,860 --> 00:48:17,315 SPEAKER 11: [? Yeah ?] [INAUDIBLE] [? was ?] [? only ?] [? deleted ?] 865 00:48:17,315 --> 00:48:19,940 [? from ?] [? the ?] [? artist ?] [INAUDIBLE] without the other 866 00:48:19,940 --> 00:48:20,753 [INAUDIBLE]. 867 00:48:20,753 --> 00:48:21,920 SPEAKER 1: Yeah, good point. 868 00:48:21,920 --> 00:48:27,650 We only deleted the artist with the id of 3 from the artists table. 869 00:48:27,650 --> 00:48:34,343 Now if I look at the created table and I look for the artist with id of 3, 870 00:48:34,343 --> 00:48:35,010 do you see them? 871 00:48:35,010 --> 00:48:36,190 They don't exist anymore. 872 00:48:36,190 --> 00:48:38,190 So I can't understand this relationship. 873 00:48:38,190 --> 00:48:42,120 I don't know the artist with the id of 3. 874 00:48:42,120 --> 00:48:47,790 So let's try this now with a new kind of schema now in our database. 875 00:48:47,790 --> 00:48:50,280 I'll go back to my computer. 876 00:48:50,280 --> 00:48:53,850 And let me open up our new version of our database. 877 00:48:53,850 --> 00:49:01,440 I'll type sqlite3, in this case, mfa.db, same name, but now a different schema. 878 00:49:01,440 --> 00:49:05,640 If I type .schema, notice a few things here. 879 00:49:05,640 --> 00:49:09,090 I have those same tables we saw visually. 880 00:49:09,090 --> 00:49:13,740 I have collections, artists, and a created table. 881 00:49:13,740 --> 00:49:16,680 And [? let me ?] focus in particular on this created table. 882 00:49:16,680 --> 00:49:21,090 If I say .schema, in this case, created, Enter, 883 00:49:21,090 --> 00:49:26,610 I'll see I have a FOREIGN KEY constraint on my "artist_id" 884 00:49:26,610 --> 00:49:30,040 column and my "collection_id" column. 885 00:49:30,040 --> 00:49:36,300 So if I tried to delete an artist that is referenced in my created table, 886 00:49:36,300 --> 00:49:39,990 I would probably raise a constraint error-- 887 00:49:39,990 --> 00:49:42,370 a FOREIGN KEY [? constraint ?] error. 888 00:49:42,370 --> 00:49:45,670 So let's just try this and see what happens. 889 00:49:45,670 --> 00:49:50,550 Let me try DELETE FROM the [? artist's ?] [? table, ?] DELETE FROM 890 00:49:50,550 --> 00:49:59,060 "artists" WHERE, let's say the "name" of the artist is literally "Unidentified 891 00:49:59,060 --> 00:49:59,940 artist." 892 00:49:59,940 --> 00:50:03,560 This is the name they have in the Museum of Fine Arts collections-- 893 00:50:03,560 --> 00:50:05,270 Unidentified artist. 894 00:50:05,270 --> 00:50:08,270 So now I want to delete them from that table. 895 00:50:08,270 --> 00:50:14,130 Well, if I hit Enter, I do get that FOREIGN KEY constraint error. 896 00:50:14,130 --> 00:50:19,610 So does this mean I just can't delete this artist or is there a workaround? 897 00:50:19,610 --> 00:50:21,290 Let's look at it visually again. 898 00:50:21,290 --> 00:50:24,740 I'll go back to what we had before as our table, 899 00:50:24,740 --> 00:50:27,620 and let me ask the audience here. 900 00:50:27,620 --> 00:50:29,210 What's the solution? 901 00:50:29,210 --> 00:50:33,140 If I can't delete Unidentified artist, because they 902 00:50:33,140 --> 00:50:38,300 have an id referenced by this table, what should I maybe do first? 903 00:50:38,300 --> 00:50:39,695 What should I do instead? 904 00:50:39,695 --> 00:50:41,570 SPEAKER 12: Yeah, I want to say that first we 905 00:50:41,570 --> 00:50:46,160 can delete the id that is being referenced by this [? id ?] 906 00:50:46,160 --> 00:50:48,740 that we want to delete, then we can delete it. 907 00:50:48,740 --> 00:50:51,170 In fact, if we want to delete it from the artists, 908 00:50:51,170 --> 00:50:54,950 first we have to delete from the created table [INAUDIBLE] table, which 909 00:50:54,950 --> 00:50:58,070 it is referencing, then we can delete from the artist [INAUDIBLE] 910 00:50:58,070 --> 00:51:00,060 SPEAKER 1: Yeah, I like your thinking here. 911 00:51:00,060 --> 00:51:02,480 So if we look at the created table, you notice 912 00:51:02,480 --> 00:51:05,510 that we can't delete Unidentified artist, because it's 913 00:51:05,510 --> 00:51:07,430 referenced in the created table. 914 00:51:07,430 --> 00:51:12,230 What we should maybe first do is delete this row delete their affiliation 915 00:51:12,230 --> 00:51:16,100 with their work, and then delete the artist so we don't 916 00:51:16,100 --> 00:51:18,900 run into this FOREIGN KEY constraint. 917 00:51:18,900 --> 00:51:22,520 So let's try this in our SQLite environment. 918 00:51:22,520 --> 00:51:24,780 Come back to my computer. 919 00:51:24,780 --> 00:51:30,320 And again, our plan was to first delete the artists affiliation with their work 920 00:51:30,320 --> 00:51:32,210 from the created table. 921 00:51:32,210 --> 00:51:34,460 So let me show you first the created table. 922 00:51:34,460 --> 00:51:38,240 I'll say SELECT [? star ?] from "created," 923 00:51:38,240 --> 00:51:42,080 and see how it's exactly like the table we just visualized. 924 00:51:42,080 --> 00:51:46,820 Now, let me try delete this artist affiliation with their work. 925 00:51:46,820 --> 00:51:48,870 I'm not deleting the artist. 926 00:51:48,870 --> 00:51:52,280 I'm deleting their affiliation with their authorship, 927 00:51:52,280 --> 00:51:55,710 their [? artist ?] [? ship, ?] whatever you want to call it for their work. 928 00:51:55,710 --> 00:51:59,810 Now, I'll say DELETE FROM-- 929 00:51:59,810 --> 00:52:07,310 let's say, DELETE FROM "artists" WHERE the "artist_id" is equal to 930 00:52:07,310 --> 00:52:10,600 or equals 3. 931 00:52:10,600 --> 00:52:12,770 Well, I could do this. 932 00:52:12,770 --> 00:52:16,840 But there's probably a better way, like I know the id, and that's fine. 933 00:52:16,840 --> 00:52:19,370 But we could also use, as we saw a little bit ago, 934 00:52:19,370 --> 00:52:22,510 which were [? subqueries, ?] a way to write a query that 935 00:52:22,510 --> 00:52:25,960 returns me the result. And that result gets included in another query 936 00:52:25,960 --> 00:52:26,840 altogether. 937 00:52:26,840 --> 00:52:28,270 So let's try that instead. 938 00:52:28,270 --> 00:52:31,030 Let's say I don't know the artist_id. 939 00:52:31,030 --> 00:52:33,130 What I could do is make a [? subquery. ?] 940 00:52:33,130 --> 00:52:39,430 I could say inside of this, let me select the "id" FROM the "artists" 941 00:52:39,430 --> 00:52:48,550 table WHERE, in this case, the "name" equals "Unidentified artist," and then, 942 00:52:48,550 --> 00:52:51,130 let me close out this query, hit a semicolon. 943 00:52:51,130 --> 00:52:54,340 And I already see a typo in this, but I'm just going to try it anyway. 944 00:52:54,340 --> 00:52:57,040 I will go ahead, and hit Enter. 945 00:52:57,040 --> 00:53:03,700 And let's see if I type SELECT [? star ?] FROM "artists," Enter, 946 00:53:03,700 --> 00:53:06,550 I still see Unidentified artist. 947 00:53:06,550 --> 00:53:08,480 And I think my typo here was as follows. 948 00:53:08,480 --> 00:53:11,050 If I up my query using the up arrow on my keyboard, 949 00:53:11,050 --> 00:53:16,300 I see DELETE from "artists," where the "artist_id" equals some value here. 950 00:53:16,300 --> 00:53:19,270 But do I want to delete from artists? 951 00:53:19,270 --> 00:53:19,990 I don't. 952 00:53:19,990 --> 00:53:21,610 I actually can't delete from artists. 953 00:53:21,610 --> 00:53:25,340 What I should do instead is delete from the created table. 954 00:53:25,340 --> 00:53:26,350 So let's try this. 955 00:53:26,350 --> 00:53:30,640 I'll say DELETE FROM "created," making sure I only 956 00:53:30,640 --> 00:53:33,370 have double quotes around created, then I'll 957 00:53:33,370 --> 00:53:38,470 say WHERE the artist_id equals, let's say, either 3-- 958 00:53:38,470 --> 00:53:41,530 but it could also be the result of this [? subquery. ?] 959 00:53:41,530 --> 00:53:47,410 I'll say 1, 2, 3, 4, and then SELECT the "id" 960 00:53:47,410 --> 00:53:55,880 FROM the "artist" table WHERE the "name" equals "Unidentified artist." 961 00:53:55,880 --> 00:54:00,230 Then I'll close this [? subquery, ?] hitting semicolon, hit Enter. 962 00:54:00,230 --> 00:54:05,750 And now, if I say SELECT [? star ?] FROM "created" semicolon, 963 00:54:05,750 --> 00:54:09,500 I should see that this artist's affiliation with their work 964 00:54:09,500 --> 00:54:11,600 no longer exists. 965 00:54:11,600 --> 00:54:13,110 And because it no longer exists. 966 00:54:13,110 --> 00:54:15,990 I can now delete them from the artists table. 967 00:54:15,990 --> 00:54:20,210 I don't have this FOREIGN KEY artist_id referencing 968 00:54:20,210 --> 00:54:22,520 the primary key of this artist. 969 00:54:22,520 --> 00:54:23,900 So let's now try that. 970 00:54:23,900 --> 00:54:26,960 I could say SELECT [? star ?] FROM "artists," 971 00:54:26,960 --> 00:54:30,650 I see that Unidentified artist is still in here. 972 00:54:30,650 --> 00:54:32,510 But let's delete them now. 973 00:54:32,510 --> 00:54:36,290 I'll say delete from, let's say, "artists" WHERE 974 00:54:36,290 --> 00:54:42,620 the "name" equals "Unidentified artist," closing my quotes, semicolon, hit 975 00:54:42,620 --> 00:54:43,280 Enter. 976 00:54:43,280 --> 00:54:47,780 And now if I Select [? star ?] FROM "artists" semicolon, 977 00:54:47,780 --> 00:54:52,520 I should see they're no longer in this table to our earlier example. 978 00:54:52,520 --> 00:54:54,750 This becomes a 2-step process. 979 00:54:54,750 --> 00:54:59,480 First delete their affiliation, then delete their name. 980 00:54:59,480 --> 00:55:07,910 OK, So let's try this yet again, but now using some additional tools 981 00:55:07,910 --> 00:55:10,140 that we have at our disposal. 982 00:55:10,140 --> 00:55:16,860 So let's look back at this FOREIGN KEY constraint that existed in this table. 983 00:55:16,860 --> 00:55:19,220 If you remember, it looked a bit like this. 984 00:55:19,220 --> 00:55:21,980 In my created table, I had this line. 985 00:55:21,980 --> 00:55:28,220 FOREIGN KEY, artist_id, REFERENCES the "id" column in "artists." 986 00:55:28,220 --> 00:55:34,430 Well, if that is the case, I can't delete the artist with the id 987 00:55:34,430 --> 00:55:39,170 referenced by this [? foreign ?] [? key. ?] But it turns out that I 988 00:55:39,170 --> 00:55:44,390 could specify some alternative action that happens when I try to delete 989 00:55:44,390 --> 00:55:48,590 the id that is referenced by this [? foreign ?] [? key. ?] I can specify 990 00:55:48,590 --> 00:55:52,040 that using a keyword called ON DELETE. 991 00:55:52,040 --> 00:55:54,860 So after FOREIGN KEY REFERENCES [? "artist id," ?] I could say 992 00:55:54,860 --> 00:55:59,600 afterwards ON DELETE, and then specify some action I want to actually happen 993 00:55:59,600 --> 00:56:03,500 when I tried to delete the primary key referenced by this [? foreign ?] 994 00:56:03,500 --> 00:56:04,430 [? key. ?] 995 00:56:04,430 --> 00:56:08,000 One thing I could use would be RESTRICT, which 996 00:56:08,000 --> 00:56:09,980 is kind of like what we saw before. 997 00:56:09,980 --> 00:56:14,600 If I try to delete a primary key is [? referenced ?] by this [? foreign ?] 998 00:56:14,600 --> 00:56:17,090 [? key, ?] I will not be allowed to do it. 999 00:56:17,090 --> 00:56:19,720 That action is restricted. 1000 00:56:19,720 --> 00:56:23,410 I could also decide to take NO ACTION. 1001 00:56:23,410 --> 00:56:28,150 In this case, I could delete the primary key [? referenced ?] by this 1002 00:56:28,150 --> 00:56:30,560 [? foreign ?] [? key, ?] and nothing would happen. 1003 00:56:30,560 --> 00:56:35,290 I would actually be allowed to do that, which may be unwise in some cases, 1004 00:56:35,290 --> 00:56:38,050 but I could give myself that power. 1005 00:56:38,050 --> 00:56:41,590 I could also decide to SET NULL. 1006 00:56:41,590 --> 00:56:46,330 That is if I delete the primary key that is [? referenced ?] by this 1007 00:56:46,330 --> 00:56:49,690 [? foreign ?] [? key, ?] what I'll do is set the [? foreign ?] [? key ?] to be 1008 00:56:49,690 --> 00:56:53,657 NULL, meaning that value no longer exists. 1009 00:56:53,657 --> 00:56:56,740 I can alternatively [? set ?] it to a [? default ?] value for that column. 1010 00:56:56,740 --> 00:57:02,770 Or perhaps, most compellingly, I could try to CASCADE the deletion. 1011 00:57:02,770 --> 00:57:06,590 Where CASCADE means if I delete the artist, for instance, 1012 00:57:06,590 --> 00:57:11,590 let's also delete their affiliation with their artwork all in one go. 1013 00:57:11,590 --> 00:57:16,430 This converts our 2-step process into a 1-step process. 1014 00:57:16,430 --> 00:57:18,710 So let's visualize this. 1015 00:57:18,710 --> 00:57:24,040 Let's say we have now applied this constraint ON DELETE CASCADE. 1016 00:57:24,040 --> 00:57:27,850 So if I delete the artist, I'll also delete their affiliation 1017 00:57:27,850 --> 00:57:29,110 with their work. 1018 00:57:29,110 --> 00:57:30,610 We'll have the same two tables-- 1019 00:57:30,610 --> 00:57:33,610 artists, collections, and created. 1020 00:57:33,610 --> 00:57:39,430 Now again, artist_id references this primary key in artists. 1021 00:57:39,430 --> 00:57:40,510 Now, I'm going to try. 1022 00:57:40,510 --> 00:57:43,550 I want to delete Unidentified artist here. 1023 00:57:43,550 --> 00:57:44,510 Well, I could do that. 1024 00:57:44,510 --> 00:57:45,940 I could just delete their row. 1025 00:57:45,940 --> 00:57:50,440 And now, instead of a FOREIGN KEY constraint error, what I get 1026 00:57:50,440 --> 00:57:51,490 is the following. 1027 00:57:51,490 --> 00:57:55,690 [INAUDIBLE] [? created ?] that this row is also gone. 1028 00:57:55,690 --> 00:57:59,960 The row that had the artist_id of 3 gets removed. 1029 00:57:59,960 --> 00:58:04,960 We've cascaded the deletion onward to the created table. 1030 00:58:04,960 --> 00:58:08,760 So let's try this now with a new database schema. 1031 00:58:08,760 --> 00:58:10,590 I'll go back to my computer here. 1032 00:58:10,590 --> 00:58:14,497 1033 00:58:14,497 --> 00:58:16,080 So I'm back in my [? TERMINAL ?] here. 1034 00:58:16,080 --> 00:58:21,180 And I can type sqlite3 mfa.db to reopen this database. 1035 00:58:21,180 --> 00:58:26,160 And notice how if I type .schema, I've updated this schema to now have ON 1036 00:58:26,160 --> 00:58:27,150 DELETE CASCADE. 1037 00:58:27,150 --> 00:58:29,130 Let me show you over here on the screen. 1038 00:58:29,130 --> 00:58:34,290 Here on the created table, I now have the very same kind of table schema, 1039 00:58:34,290 --> 00:58:38,610 but now my only difference is I've applied this ON DELETE action 1040 00:58:38,610 --> 00:58:40,560 to my [? foreign ?] [? key ?] constraints. 1041 00:58:40,560 --> 00:58:42,870 In particular, I'm going to [? cascade ?] the delete 1042 00:58:42,870 --> 00:58:46,570 from the artist's table to the created table. 1043 00:58:46,570 --> 00:58:49,110 So let's try that out in SQLite. 1044 00:58:49,110 --> 00:58:52,770 I'll come back over here, and I will now try just 1045 00:58:52,770 --> 00:59:01,440 DELETE FROM the "artists" table WHERE the "name" equals "Unidentified artist. 1046 00:59:01,440 --> 00:59:02,430 semicolon. 1047 00:59:02,430 --> 00:59:08,280 Now, I'll hit Enter, and I don't get a FOREIGN KEY constraint anymore. 1048 00:59:08,280 --> 00:59:12,780 But if I say SELECT [? star ?] FROM "created" semicolon, 1049 00:59:12,780 --> 00:59:18,130 notice how I've also deleted the artist's affiliation with their work. 1050 00:59:18,130 --> 00:59:21,960 So that is wherever in the artist_id column, 1051 00:59:21,960 --> 00:59:27,940 I saw the id for the artist I deleted, I would to delete that row, 1052 00:59:27,940 --> 00:59:34,290 so I have no references to that primary key, which is now gone from my table. 1053 00:59:34,290 --> 00:59:36,340 So let me ask here. 1054 00:59:36,340 --> 00:59:41,610 We've seen how to delete single rows, how to delete multiple rows, 1055 00:59:41,610 --> 00:59:44,595 and now how to delete data among some constraints, 1056 00:59:44,595 --> 00:59:46,470 like our [? foreign ?] [? key ?] constraints. 1057 00:59:46,470 --> 00:59:51,630 What questions do we have on those techniques? 1058 00:59:51,630 --> 00:59:52,650 Let's go to [? Han. ?] 1059 00:59:52,650 --> 00:59:55,740 SPEAKER 13: I have a question regarding deleting. 1060 00:59:55,740 --> 01:00:01,290 The [? id ?] numbers have been removed while we were removing the record. 1061 01:00:01,290 --> 01:00:03,900 I was wondering if do we have to clean it up somehow 1062 01:00:03,900 --> 01:00:07,917 later or will they be populated with the new data as it comes along? 1063 01:00:07,917 --> 01:00:09,250 SPEAKER 1: Yeah, great question. 1064 01:00:09,250 --> 01:00:13,690 So the question is, what happens to our primary keys when we delete our data. 1065 01:00:13,690 --> 01:00:15,900 So for this one, let me show you the visual 1066 01:00:15,900 --> 01:00:19,680 again that we had before of our tables nicely printed on the side. 1067 01:00:19,680 --> 01:00:26,790 So I'll come back here, and I will go back to our idea of these joint tables 1068 01:00:26,790 --> 01:00:31,440 where we had an artists table, a collections table, and a created table. 1069 01:00:31,440 --> 01:00:33,330 And we saw before that we're going to delete 1070 01:00:33,330 --> 01:00:36,000 this artist called Unidentified artist. 1071 01:00:36,000 --> 01:00:40,290 So I'll delete them, and to your point, [? well ?] the idea of 3 1072 01:00:40,290 --> 01:00:43,410 no longer exists in this case. 1073 01:00:43,410 --> 01:00:50,100 Now by default, at least in SQLite, if I insert a new row, what I'll do 1074 01:00:50,100 --> 01:00:53,940 is take the highest id value and I will then 1075 01:00:53,940 --> 01:00:58,440 make that the new primary key for the new row that I insert. 1076 01:00:58,440 --> 01:01:01,140 That is the default situation. 1077 01:01:01,140 --> 01:01:04,170 I could if I wanted to get more specific, 1078 01:01:04,170 --> 01:01:11,550 and if I had done in my id integer column here in my [? create ?] 1079 01:01:11,550 --> 01:01:14,700 [? table, ?] if I had also said this [? keyword ?] [? call ?] 1080 01:01:14,700 --> 01:01:19,680 [? autoincrement, ?] all one word, what would happen instead is I would 1081 01:01:19,680 --> 01:01:23,710 actually reinsert an ID that is not used. 1082 01:01:23,710 --> 01:01:28,090 So in this case 3 is not used, I could insert that one here. 1083 01:01:28,090 --> 01:01:30,360 So up to you what you want to do. 1084 01:01:30,360 --> 01:01:35,670 In general, SQlite by default will take the highest ID, add one from there. 1085 01:01:35,670 --> 01:01:38,940 If you specify this constraint [? called ?] [? autoincrement, ?] you 1086 01:01:38,940 --> 01:01:44,310 will instead take whatever id is not being used and use that in your insert 1087 01:01:44,310 --> 01:01:46,460 afterwards. 1088 01:01:46,460 --> 01:01:51,530 OK, so now we've seen how to insert, and how to delete data. 1089 01:01:51,530 --> 01:01:55,560 But of course, we make mistakes when we add data or even when we delete data. 1090 01:01:55,560 --> 01:01:58,580 So we'll see in just a minute how to update our values 1091 01:01:58,580 --> 01:02:01,910 as well to correct typos and even to update associations 1092 01:02:01,910 --> 01:02:04,790 between artists and artwork. 1093 01:02:04,790 --> 01:02:06,510 And we're back. 1094 01:02:06,510 --> 01:02:10,250 So we've so far seen how to insert some data into our tables 1095 01:02:10,250 --> 01:02:12,080 and how to delete it. 1096 01:02:12,080 --> 01:02:15,470 But sometimes, we don't want to fully delete something, 1097 01:02:15,470 --> 01:02:18,500 we just want to change its value to correct some typo 1098 01:02:18,500 --> 01:02:20,480 or correct some association. 1099 01:02:20,480 --> 01:02:24,950 So let's think to our MFA example, [? where ?] the Museum of Fine Arts 1100 01:02:24,950 --> 01:02:27,270 has some tables that look like this. 1101 01:02:27,270 --> 01:02:30,050 They have artists in their collection, and they also 1102 01:02:30,050 --> 01:02:32,180 have art work in their collection. 1103 01:02:32,180 --> 01:02:36,950 They also have a created table to associate artists with their artwork. 1104 01:02:36,950 --> 01:02:43,190 Now in this case, we know that I have this Unidentified artist. 1105 01:02:43,190 --> 01:02:47,420 And we can see that they authored this item in the collections, 1106 01:02:47,420 --> 01:02:50,000 "Farmers working at dawn." 1107 01:02:50,000 --> 01:02:55,730 But let's say that later on, we find out it wasn't an Unidentified artist. 1108 01:02:55,730 --> 01:03:00,020 It was instead Li Yin, who created "Farmers working at dawn." 1109 01:03:00,020 --> 01:03:06,350 How could we update our tables to make sure it's Li Yin, who we have 1110 01:03:06,350 --> 01:03:10,660 creating "Farmers working at dawn." 1111 01:03:10,660 --> 01:03:15,154 SPEAKER 14: So what we could do is-- 1112 01:03:15,154 --> 01:03:20,690 we know that in created, we have our artist_id and [? collection id, ?] 1113 01:03:20,690 --> 01:03:23,920 so we could update it there. 1114 01:03:23,920 --> 01:03:28,840 But somehow, we need to cascade it over to the [? artists ?] [INAUDIBLE] now 1115 01:03:28,840 --> 01:03:32,380 this unidentified [? artist ?] [? through-- ?] I think we probably 1116 01:03:32,380 --> 01:03:33,107 don't need it. 1117 01:03:33,107 --> 01:03:34,690 SPEAKER 1: Yeah, so a good point here. 1118 01:03:34,690 --> 01:03:39,250 We could actually probably change the created table to [? re-associate ?] 1119 01:03:39,250 --> 01:03:41,750 an artist with some new artwork here. 1120 01:03:41,750 --> 01:03:42,880 So let's visualize this. 1121 01:03:42,880 --> 01:03:44,540 I go to my created table. 1122 01:03:44,540 --> 01:03:51,010 And here the artist_id is currently 3, but I want it really to be 1. 1123 01:03:51,010 --> 01:03:55,870 I want Li Yin to be associated with this piece called "Farmers working at dawn." 1124 01:03:55,870 --> 01:04:01,600 So I could update the created table to instead of having 3 here have 1. 1125 01:04:01,600 --> 01:04:06,220 Now, we see Li Yin created "Farmers working at dawn." [? And if ?] 1126 01:04:06,220 --> 01:04:08,830 you're concerned about this Unidentified artist here, 1127 01:04:08,830 --> 01:04:11,380 I think it's OK to have an artist in our table 1128 01:04:11,380 --> 01:04:14,350 who may or may not have an item in collections. 1129 01:04:14,350 --> 01:04:17,240 We'll say that's OK at least for now. 1130 01:04:17,240 --> 01:04:22,270 So if we can update our associations between artists and collections 1131 01:04:22,270 --> 01:04:26,770 like this, let's actually try to do that in our very own database here. 1132 01:04:26,770 --> 01:04:33,010 I'll go back to my SQLite environment, and now, let me try to open it first. 1133 01:04:33,010 --> 01:04:35,560 I'll use sqlite3 mfa.db. 1134 01:04:35,560 --> 01:04:39,460 And now let me type .schema to show you [INAUDIBLE] the very same schema from 1135 01:04:39,460 --> 01:04:40,250 before. 1136 01:04:40,250 --> 01:04:45,340 So I want to update the artists association between Li Yin and "Farmers 1137 01:04:45,340 --> 01:04:46,570 working at dawn." 1138 01:04:46,570 --> 01:04:51,370 So let's say, I'll SELECT [? star ?] FROM "created," like this, semicolon. 1139 01:04:51,370 --> 01:04:55,990 Here, I have my artist_ids and my collection_ids. 1140 01:04:55,990 --> 01:05:00,340 I see that we have the Unidentified artist, id of 3, 1141 01:05:00,340 --> 01:05:05,690 creating "Farmers working at dawn," with the id of 1 in our collection here. 1142 01:05:05,690 --> 01:05:11,170 So now, let's try to update the artist who created this particular painting. 1143 01:05:11,170 --> 01:05:16,660 Here, I have 3 associated with 1, but I want 1 associated with 1, 1144 01:05:16,660 --> 01:05:18,580 where 1 is the id for Li Yin. 1145 01:05:18,580 --> 01:05:22,858 And [? 1 ?] [? in ?] my collection id is the id for this artwork here. 1146 01:05:22,858 --> 01:05:23,650 So let me try this. 1147 01:05:23,650 --> 01:05:29,620 I'll say UPDATE "created" and SET, let's say, "artist_id" 1148 01:05:29,620 --> 01:05:32,770 equal to some particular value. 1149 01:05:32,770 --> 01:05:35,650 Well, what value should I set it to? 1150 01:05:35,650 --> 01:05:39,940 I could try to set it to Li Yin's id, which we know is just 1. 1151 01:05:39,940 --> 01:05:43,300 But let me try instead to use a [? subquery ?] here. 1152 01:05:43,300 --> 01:05:49,120 Let me try to say parentheses, and then write some query to update this value. 1153 01:05:49,120 --> 01:05:55,300 I'll say SELECT "id" from "artists" like this Enter again, 1154 01:05:55,300 --> 01:06:00,190 WHERE the "name" equals Li Yin. 1155 01:06:00,190 --> 01:06:04,090 And then let me close this [? subquery. ?] Well, 1156 01:06:04,090 --> 01:06:10,510 if I try to run this query, what might Happen I'm updating the created table, 1157 01:06:10,510 --> 01:06:16,600 I'm setting the artist_id equal to the id for Li Yin, but what I've forgotten 1158 01:06:16,600 --> 01:06:21,520 is this WHERE to only choose some rows to update. 1159 01:06:21,520 --> 01:06:23,200 [INAUDIBLE] not close it yet. 1160 01:06:23,200 --> 01:06:29,140 I'll instead say WHERE, in this case, the "collection_id," the piece 1161 01:06:29,140 --> 01:06:34,120 of artwork in our collection, is equal to [? while ?] the id 1162 01:06:34,120 --> 01:06:35,330 for this painting. 1163 01:06:35,330 --> 01:06:39,100 I'll say SELECT "id" from "collections," and then I'll 1164 01:06:39,100 --> 01:06:45,710 say WHERE the "title" equals "Farmers working at dawn." 1165 01:06:45,710 --> 01:06:49,700 Now, I'll close this [? subquery, ?] and hit semicolon. 1166 01:06:49,700 --> 01:06:53,600 And here, we've seen our first example of an UPDATE query. 1167 01:06:53,600 --> 01:06:56,510 I'm trying to update the artist_id column 1168 01:06:56,510 --> 01:06:59,990 in created to be the id for Li Yin. 1169 01:06:59,990 --> 01:07:03,050 I only want to do that though on the row where 1170 01:07:03,050 --> 01:07:07,460 collection_id is equal to the id for this particular painting. 1171 01:07:07,460 --> 01:07:09,470 [INAUDIBLE] change the attribution for. 1172 01:07:09,470 --> 01:07:11,360 So now, I'll hit Enter. 1173 01:07:11,360 --> 01:07:14,370 And if I SELECT [? star ?] FROM "created," 1174 01:07:14,370 --> 01:07:20,270 I should hopefully see in this case that the artist_id associated 1175 01:07:20,270 --> 01:07:25,680 with this painting is 2 and also down below here is 1 as well. 1176 01:07:25,680 --> 01:07:31,290 So I have Li Yin associated with now two paintings overall. 1177 01:07:31,290 --> 01:07:34,760 So let's get a grasp on what this [? UPDATE ?] syntax really 1178 01:07:34,760 --> 01:07:36,140 looks like in general. 1179 01:07:36,140 --> 01:07:38,720 And for that, let's show the slide here. 1180 01:07:38,720 --> 01:07:44,090 We have this [? UPDATE ?] [? keyword, ?] [? UPDATE ?] statement in SQL to take 1181 01:07:44,090 --> 01:07:47,510 a table name and update the columns inside of it. 1182 01:07:47,510 --> 01:07:51,410 I say UPDATE, then the name of the table I want to update, 1183 01:07:51,410 --> 01:07:56,810 then I say SET some column equal to some value. 1184 01:07:56,810 --> 01:07:59,910 I could if I wanted to have more than one column here. 1185 01:07:59,910 --> 01:08:03,620 I could say maybe title and even maybe if we're 1186 01:08:03,620 --> 01:08:07,040 talking about authors, authors over here, or even acquired date. 1187 01:08:07,040 --> 01:08:11,930 I could update more than one column in my [? SET ?] [? portion ?] here. 1188 01:08:11,930 --> 01:08:16,370 Then comes this WHERE portion, where some condition is true. 1189 01:08:16,370 --> 01:08:19,970 I want to make sure I don't update all of my rows. 1190 01:08:19,970 --> 01:08:24,270 I only update those where some condition is actually true. 1191 01:08:24,270 --> 01:08:28,069 So this is your syntax for updating some columns. 1192 01:08:28,069 --> 01:08:30,859 Let's say, if you want to change an artist's attribution 1193 01:08:30,859 --> 01:08:34,950 or if you want to change a typo you've made. 1194 01:08:34,950 --> 01:08:38,810 So let's see this now, not just in terms of changing 1195 01:08:38,810 --> 01:08:42,394 artists and their attributions, let's see a use case for update 1196 01:08:42,394 --> 01:08:45,300 where we've made some mistakes in our data. 1197 01:08:45,300 --> 01:08:49,609 And let's say the museum decides to host some kind of event 1198 01:08:49,609 --> 01:08:52,160 where people vote on their favorite piece of artwork, 1199 01:08:52,160 --> 01:08:54,920 they kind of [? handwrite ?] or type it into some online form. 1200 01:08:54,920 --> 01:08:58,010 Well, when I get back, a CSV of those responses, 1201 01:08:58,010 --> 01:09:02,600 some Comma Separated Values, one line for each vote from our people 1202 01:09:02,600 --> 01:09:04,640 who've attended this convention. 1203 01:09:04,640 --> 01:09:08,479 Let's go back over here, and I'll show you that CSV. 1204 01:09:08,479 --> 01:09:11,060 Let me go to my environment. 1205 01:09:11,060 --> 01:09:16,939 And I'll type code mfa.csv to open this CSV that I already have. 1206 01:09:16,939 --> 01:09:19,430 And here, actually, it's not called mfa.csv. 1207 01:09:19,430 --> 01:09:22,430 It's called code votes.csv. 1208 01:09:22,430 --> 01:09:27,890 And now here we can see, I have a table of one column 1209 01:09:27,890 --> 01:09:29,870 that has several votes inside of it. 1210 01:09:29,870 --> 01:09:33,140 Let's see, we have maybe 20 votes to be exact. 1211 01:09:33,140 --> 01:09:35,840 So the first row is the header row. 1212 01:09:35,840 --> 01:09:39,649 I have in this CSV, one column called title. 1213 01:09:39,649 --> 01:09:44,029 And each line here is one vote for a [? museum-goers ?] favorite piece 1214 01:09:44,029 --> 01:09:44,750 of artwork. 1215 01:09:44,750 --> 01:09:48,319 We see "Farmers working at dawn," we see "Imaginative landscape," 1216 01:09:48,319 --> 01:09:49,790 "Profusion of flowers." 1217 01:09:49,790 --> 01:09:55,280 And our goal is to count up these votes to see which is the most popular. 1218 01:09:55,280 --> 01:10:01,670 Well, let's try using .import again to actually turn this CSV into our very 1219 01:10:01,670 --> 01:10:03,530 own SQLite database. 1220 01:10:03,530 --> 01:10:08,240 Let me go back to my [? TERMINAL, ?] and I'll type sqlite3 votes.db 1221 01:10:08,240 --> 01:10:10,580 to create a votes database. 1222 01:10:10,580 --> 01:10:11,630 I'll hit Enter. 1223 01:10:11,630 --> 01:10:14,390 And now, I can use .import. 1224 01:10:14,390 --> 01:10:21,440 I could say .import a CSV called votes.csv into a table also called 1225 01:10:21,440 --> 01:10:22,370 votes. 1226 01:10:22,370 --> 01:10:23,390 I'll Enter now. 1227 01:10:23,390 --> 01:10:28,280 And now if I SELECT [? star ?] FROM "votes" semicolon, 1228 01:10:28,280 --> 01:10:33,210 I'll see all of my votes now in a single table. 1229 01:10:33,210 --> 01:10:35,040 So I want to tally these votes. 1230 01:10:35,040 --> 01:10:37,790 And we could do that using a technique we learned back in the week 1231 01:10:37,790 --> 01:10:40,010 on querying and relating. 1232 01:10:40,010 --> 01:10:44,930 What if I wanted to group by my titles and count up those groups, 1233 01:10:44,930 --> 01:10:45,990 I could do that. 1234 01:10:45,990 --> 01:10:50,990 I could say SELECT, let's say, "title," and also the COUNT of "title," 1235 01:10:50,990 --> 01:10:55,530 and [? they ?] count up the votes for each title FROM, in this case, 1236 01:10:55,530 --> 01:10:58,050 my "votes" table like this. 1237 01:10:58,050 --> 01:11:03,510 Now, I want to count not just each vote individually, but I want to group them. 1238 01:11:03,510 --> 01:11:07,870 I want to count votes by group where that group is, in this case, 1239 01:11:07,870 --> 01:11:09,610 the "title" itself. 1240 01:11:09,610 --> 01:11:13,110 So let me try running this query and hitting Enter. 1241 01:11:13,110 --> 01:11:17,370 Well, [? I'll ?] [? see ?] I've started to count some votes, 1242 01:11:17,370 --> 01:11:23,130 like I have Imaginative landscape here, Spring outing, Farmers working at dawn, 1243 01:11:23,130 --> 01:11:25,110 Spring outing down below here too. 1244 01:11:25,110 --> 01:11:28,950 But I seem to be missing something, like I shouldn't have this many groups. 1245 01:11:28,950 --> 01:11:30,570 I should only have four groups. 1246 01:11:30,570 --> 01:11:32,940 Why do I have so many? 1247 01:11:32,940 --> 01:11:35,010 Well, it seems like there are some typos here. 1248 01:11:35,010 --> 01:11:37,660 What typos do you see? 1249 01:11:37,660 --> 01:11:40,680 Feel free to raise your hand. 1250 01:11:40,680 --> 01:11:43,920 SPEAKER 15: So I think that it is a title-- 1251 01:11:43,920 --> 01:11:49,215 when you count it is case sensitive, so maybe it is counting one by one. 1252 01:11:49,215 --> 01:11:50,340 SPEAKER 1: Yeah, good idea. 1253 01:11:50,340 --> 01:11:53,670 So maybe it's case sensitive our [? GROUP ?] [? BY, ?] and you'd be 1254 01:11:53,670 --> 01:11:56,550 correct if you said that. [? GROUP ?] [? BY ?] is, in this case, 1255 01:11:56,550 --> 01:11:57,910 case sensitive. 1256 01:11:57,910 --> 01:12:01,020 So if we see here, we see Farmers Working at Dawn, 1257 01:12:01,020 --> 01:12:04,890 all capitalized, at least the Farmers Working and Dawn capitalized. 1258 01:12:04,890 --> 01:12:07,230 Well, that won't be part of the same group 1259 01:12:07,230 --> 01:12:13,590 as let's say Farmers working at dawn all properly capitalized, in this case, 1260 01:12:13,590 --> 01:12:17,940 with lowercase working and lowercase dawn as it is the MFA collections. 1261 01:12:17,940 --> 01:12:21,360 So we could probably fix some capitalization here. 1262 01:12:21,360 --> 01:12:23,240 What other things could we fix? 1263 01:12:23,240 --> 01:12:25,665 So let's hear from one more person. 1264 01:12:25,665 --> 01:12:26,790 Let's go to [? Lawrence. ?] 1265 01:12:26,790 --> 01:12:29,663 SPEAKER 16: [? The ?] [? count ?] on each typo. 1266 01:12:29,663 --> 01:12:31,830 SPEAKER 1: There are some typos in here, absolutely. 1267 01:12:31,830 --> 01:12:35,400 So if I look at some of these, I see not farmers, 1268 01:12:35,400 --> 01:12:37,620 but [? Famers ?] working at dawn. 1269 01:12:37,620 --> 01:12:40,410 I might see instead of Imaginative landscape, 1270 01:12:40,410 --> 01:12:43,470 I just see [? Imagintive ?] landscape, so definitely some 1271 01:12:43,470 --> 01:12:46,120 typos in here to fix as well. 1272 01:12:46,120 --> 01:12:49,380 So let's start working on using our [? UPDATE ?] tool or [? UPDATE ?] 1273 01:12:49,380 --> 01:12:53,230 keyword to fix some of this data, to clean it up, so to speak, 1274 01:12:53,230 --> 01:12:56,460 so we can count up our votes for these artworks. 1275 01:12:56,460 --> 01:13:02,550 Come back to my [? TERMINAL. ?] And now, let's try updating to at least remove 1276 01:13:02,550 --> 01:13:04,170 some of this [? whitespace ?] here. 1277 01:13:04,170 --> 01:13:07,900 You might notice that Imaginative landscape has a space before it, 1278 01:13:07,900 --> 01:13:09,570 and so does Spring outing here. 1279 01:13:09,570 --> 01:13:13,860 That space counts as part of the title, and I want to remove it. 1280 01:13:13,860 --> 01:13:15,850 So what could I do? 1281 01:13:15,850 --> 01:13:21,750 I could try using a function in SQLite, one called [? trim. ?] [? Trim ?] takes 1282 01:13:21,750 --> 01:13:26,020 a string and just removes leading and trailing [? whitespace, ?] [INAUDIBLE] 1283 01:13:26,020 --> 01:13:28,650 back only the characters in the middle. 1284 01:13:28,650 --> 01:13:29,830 So let me try that. 1285 01:13:29,830 --> 01:13:34,410 I'll say UPDATE "votes," my table here, and SET 1286 01:13:34,410 --> 01:13:38,710 the "title" column equal to what? 1287 01:13:38,710 --> 01:13:41,130 Well, the trimmed version of title. 1288 01:13:41,130 --> 01:13:42,810 Trim looks like this-- 1289 01:13:42,810 --> 01:13:47,500 trim, and then some parentheses to say the input to this function. 1290 01:13:47,500 --> 01:13:49,620 I'll give it the "title" column. 1291 01:13:49,620 --> 01:13:51,610 Now, I'll hit semicolon. 1292 01:13:51,610 --> 01:13:54,250 And this query will run as follows. 1293 01:13:54,250 --> 01:13:56,370 It will update the votes table. 1294 01:13:56,370 --> 01:13:59,880 And inside the votes table set, the title column 1295 01:13:59,880 --> 01:14:04,950 equal to the trimmed version of whatever values are inside that column, 1296 01:14:04,950 --> 01:14:09,210 trimming here means remove trailing and leading [? whitespace. ?] 1297 01:14:09,210 --> 01:14:11,850 I don't need to apply any condition. 1298 01:14:11,850 --> 01:14:15,270 I don't need to say WHERE dot dot dot, because I want this 1299 01:14:15,270 --> 01:14:18,910 to apply to all rows in my data set. 1300 01:14:18,910 --> 01:14:20,220 Now, I'll hit Enter. 1301 01:14:20,220 --> 01:14:21,750 And let me try running this again. 1302 01:14:21,750 --> 01:14:24,840 I'll say up arrow to get back my old query. 1303 01:14:24,840 --> 01:14:29,430 Now, I'll try counting again, and I think we're making some progress, 1304 01:14:29,430 --> 01:14:31,620 like I can see that I've removed-- 1305 01:14:31,620 --> 01:14:34,890 [? removed ?] [? quote ?] [? unquote, ?] those titles that had leading 1306 01:14:34,890 --> 01:14:37,800 or trailing [? whitespace. ?] They're now part of a group 1307 01:14:37,800 --> 01:14:41,400 because their titles now match those groups. 1308 01:14:41,400 --> 01:14:42,120 Let's keep going. 1309 01:14:42,120 --> 01:14:44,620 We noticed before, we had some capitalization issues. 1310 01:14:44,620 --> 01:14:46,930 So let's fix those as well. 1311 01:14:46,930 --> 01:14:50,670 I can go back to my [? TERMINAL, ?] and we could introduce this other function, 1312 01:14:50,670 --> 01:14:52,170 one called [? upper. ?] 1313 01:14:52,170 --> 01:14:55,620 [? Upper ?] converts a string like Farmers 1314 01:14:55,620 --> 01:14:59,490 working at dawn to be all in uppercase, kind of shout it out. 1315 01:14:59,490 --> 01:15:03,060 And that's useful for us here because we could kind of force everything 1316 01:15:03,060 --> 01:15:07,840 to uppercase, thereby removing any inconsistencies in capitalization. 1317 01:15:07,840 --> 01:15:08,950 So let me try that. 1318 01:15:08,950 --> 01:15:12,130 I'll say UPDATE, again, the "votes" table. 1319 01:15:12,130 --> 01:15:18,480 And this time, I'll set "title" equal to the uppercase version of "title." 1320 01:15:18,480 --> 01:15:22,200 So upper here is a function in SQLite that 1321 01:15:22,200 --> 01:15:25,530 takes some values associated with the column and [? uppercases ?] 1322 01:15:25,530 --> 01:15:30,420 the entire thing to remove any inconsistencies in capitalization. 1323 01:15:30,420 --> 01:15:34,620 Again, no condition here, because I want to apply this to all rows. 1324 01:15:34,620 --> 01:15:35,730 I'll hit Enter. 1325 01:15:35,730 --> 01:15:41,340 And now if I SELECT [? star ?] FROM "collections" semicolon-- 1326 01:15:41,340 --> 01:15:43,618 oops, SELECT [? star ?] FROM "votes," in this case-- 1327 01:15:43,618 --> 01:15:44,910 no longer as collections table. 1328 01:15:44,910 --> 01:15:48,030 I'll say SELECT [? star ?] FROM "votes" semicolon, 1329 01:15:48,030 --> 01:15:51,390 I should see all of my titles capitalized. 1330 01:15:51,390 --> 01:15:53,130 But now, I want to group by. 1331 01:15:53,130 --> 01:15:54,060 So let's try this. 1332 01:15:54,060 --> 01:16:00,120 I'll go up arrow to get back to my old query that GROUP BY "title," hit Enter. 1333 01:16:00,120 --> 01:16:02,760 And now, we're still making some more progress. 1334 01:16:02,760 --> 01:16:05,970 I'm reducing my number of groups until I get to 4. 1335 01:16:05,970 --> 01:16:10,260 But there's still some typos, like we see Famers working at dawn, 1336 01:16:10,260 --> 01:16:14,130 or Farmers working, or Farmesr working at dawn. 1337 01:16:14,130 --> 01:16:16,560 We want these to be included as part of this group-- 1338 01:16:16,560 --> 01:16:18,870 Farmers working at dawn. 1339 01:16:18,870 --> 01:16:22,240 So we should fix some of these typos here. 1340 01:16:22,240 --> 01:16:25,380 Let me go back to my [? TERMINAL. ?] Now, let's try this out. 1341 01:16:25,380 --> 01:16:29,040 I could, in this case, just manually correct these. 1342 01:16:29,040 --> 01:16:31,170 You could imagine me doing something like this. 1343 01:16:31,170 --> 01:16:36,900 UPDATE "votes" and SET the "title" equal to all caps 1344 01:16:36,900 --> 01:16:43,170 "FARMERS WORKING AT DAWN," now including a condition, WHERE the "title" 1345 01:16:43,170 --> 01:16:50,617 presently is equal to for "FAMERS WORKING" like this, semicolon-- 1346 01:16:50,617 --> 01:16:51,450 or not Famers work-- 1347 01:16:51,450 --> 01:16:54,480 I think it's Farmers Working, just all-- 1348 01:16:54,480 --> 01:16:55,620 in two words. 1349 01:16:55,620 --> 01:16:56,703 That was one title we saw. 1350 01:16:56,703 --> 01:16:57,620 [? TRANSCRIPT BREAK ?] 1351 01:16:57,620 --> 01:16:59,950 SPEAKER: I could manually update it a bit like this. 1352 01:16:59,950 --> 01:17:02,610 Let me hit Enter, and let me do the same query 1353 01:17:02,610 --> 01:17:05,550 by hitting the up arrow on my keyboard. 1354 01:17:05,550 --> 01:17:07,910 Let me find this one, hit Enter, and now we 1355 01:17:07,910 --> 01:17:10,880 can see that that title, "Farmers Working," 1356 01:17:10,880 --> 01:17:13,040 is now just part of the main group. 1357 01:17:13,040 --> 01:17:15,470 "Farmers Working At Dawn." 1358 01:17:15,470 --> 01:17:17,040 I could keep going here. 1359 01:17:17,040 --> 01:17:21,560 I could try to convert "Farmers Working At Dawn" to "Farmers Working At Dawn." 1360 01:17:21,560 --> 01:17:22,500 Let me try it. 1361 01:17:22,500 --> 01:17:24,110 I could say select. 1362 01:17:24,110 --> 01:17:25,952 Let's say-- not select. 1363 01:17:25,952 --> 01:17:26,535 Let me update. 1364 01:17:26,535 --> 01:17:29,690 Let me update in this case votes, and set 1365 01:17:29,690 --> 01:17:37,340 title equal to "Farmers Working At Dawn" where the title equals 1366 01:17:37,340 --> 01:17:41,330 in this case "Farmers Working At Dawn." 1367 01:17:41,330 --> 01:17:47,330 Semicolon, Enter, let me hit up arrow to go back to my old query to group by, 1368 01:17:47,330 --> 01:17:48,890 and now we're getting somewhere. 1369 01:17:48,890 --> 01:17:52,370 I'm seeing now only two groups for "Farmers Working At Dawn," 1370 01:17:52,370 --> 01:17:55,190 but one more to fix. 1371 01:17:55,190 --> 01:17:59,150 Now we'd be here for a while if I spend all this time 1372 01:17:59,150 --> 01:18:04,380 one by one fixing these typos, so there's probably a better way. 1373 01:18:04,380 --> 01:18:05,150 There is. 1374 01:18:05,150 --> 01:18:08,090 This one is going to be using the like keyword. 1375 01:18:08,090 --> 01:18:10,340 We don't have to say equal so much. 1376 01:18:10,340 --> 01:18:11,840 We can just say like. 1377 01:18:11,840 --> 01:18:16,850 Now, I could try to match any title that looks like "Farmers Working At Dawn" 1378 01:18:16,850 --> 01:18:19,400 and update it to be the correct title. 1379 01:18:19,400 --> 01:18:21,390 So let's try that here. 1380 01:18:21,390 --> 01:18:29,150 I could now try update votes and set title equal to "Farmers Working 1381 01:18:29,150 --> 01:18:30,980 At Dawn." 1382 01:18:30,980 --> 01:18:38,490 "Farmers Working At Dawn" where the title is like-- 1383 01:18:38,490 --> 01:18:42,000 the title is like some pattern I could give. 1384 01:18:42,000 --> 01:18:48,210 Now for this data set, it turns out that the only artwork that begins with F-A 1385 01:18:48,210 --> 01:18:49,980 is Farmers Working At Dawn. 1386 01:18:49,980 --> 01:18:55,110 So to match any title with F-A and then anything afterwards, 1387 01:18:55,110 --> 01:19:01,110 I could use F-A percent, and then close out that pattern. 1388 01:19:01,110 --> 01:19:04,800 Notice how I don't have to have consistent capitalization. 1389 01:19:04,800 --> 01:19:09,420 Like is case insensitive, and notice how, too, the percent matches 1390 01:19:09,420 --> 01:19:15,870 anything that comes after F-A. Now in this data set, this is OK, 1391 01:19:15,870 --> 01:19:20,490 but in a larger data set, you probably want to avoid this kind of query 1392 01:19:20,490 --> 01:19:23,580 because you could match much more than Farmers Working At Dawn. 1393 01:19:23,580 --> 01:19:27,600 You'd match anything that begins with F-A, so be careful with this 1394 01:19:27,600 --> 01:19:32,370 and think through what pattern should you use to fix up your titles. 1395 01:19:32,370 --> 01:19:36,120 Let me just hit Enter here, and now let me try hitting up arrow again 1396 01:19:36,120 --> 01:19:40,530 to show you the result, and now we're in a pretty good place. 1397 01:19:40,530 --> 01:19:44,820 I see all of my titles like Farmers Working At Dawn are correctly 1398 01:19:44,820 --> 01:19:50,350 capitalized, consistently spelled, and all in one place. 1399 01:19:50,350 --> 01:19:52,200 OK, let's try the others here. 1400 01:19:52,200 --> 01:19:55,530 I also have "Imaginative landscape which could be a little bit 1401 01:19:55,530 --> 01:19:56,790 better designed here. 1402 01:19:56,790 --> 01:20:00,210 A little bit better grouped as well by fixing some of these typos. 1403 01:20:00,210 --> 01:20:05,190 Well, here, I could try to match on to maybe imaginative. 1404 01:20:05,190 --> 01:20:08,380 That seems like a long enough word that it could be useful for me here. 1405 01:20:08,380 --> 01:20:10,260 So I'll say something like select-- 1406 01:20:10,260 --> 01:20:10,770 not select. 1407 01:20:10,770 --> 01:20:19,050 I'll do update votes and set the title equal to "Imaginative Landscape." 1408 01:20:19,050 --> 01:20:26,640 Now I want to match any rows that have a title like "Imaginative space 1409 01:20:26,640 --> 01:20:30,150 percent," end quote, semicolon. 1410 01:20:30,150 --> 01:20:35,010 Now I'll match any title that has imaginative as a full word 1411 01:20:35,010 --> 01:20:37,650 at the very beginning of this phrase. 1412 01:20:37,650 --> 01:20:41,100 This is, I would say, a better query than F-A 1413 01:20:41,100 --> 01:20:44,100 because F-A could match much more than this painting name. 1414 01:20:44,100 --> 01:20:48,150 Here, if we have only one painting that begins with imaginative, 1415 01:20:48,150 --> 01:20:50,232 this could work for us now. 1416 01:20:50,232 --> 01:20:50,940 So I'll try this. 1417 01:20:50,940 --> 01:20:55,100 I'll say Enter, and now, let me rerun the query. 1418 01:20:55,100 --> 01:20:57,832 Group by, and we're so close. 1419 01:20:57,832 --> 01:20:58,790 There's still one more. 1420 01:20:58,790 --> 01:21:02,090 "Imagintive Landscape," so for this one, I 1421 01:21:02,090 --> 01:21:04,970 might just need to manually update that piece. 1422 01:21:04,970 --> 01:21:11,480 I could say let's do update, votes, and set title equal to 1423 01:21:11,480 --> 01:21:16,460 in this case "Imaginative Landscape" where, of course, the title currently 1424 01:21:16,460 --> 01:21:21,140 is "Imagintive Landscape" without the A in there. 1425 01:21:21,140 --> 01:21:23,240 Now hit semicolon to run this query. 1426 01:21:23,240 --> 01:21:27,800 I'll show you the results, and now, I think we're literally almost there. 1427 01:21:27,800 --> 01:21:29,150 There's only one more to do. 1428 01:21:29,150 --> 01:21:30,530 Profusion of Flowers. 1429 01:21:30,530 --> 01:21:32,090 Let's fix that one now. 1430 01:21:32,090 --> 01:21:34,880 I could say select votes-- 1431 01:21:34,880 --> 01:21:35,720 not select. 1432 01:21:35,720 --> 01:21:36,290 Update. 1433 01:21:36,290 --> 01:21:37,332 I'm always updating here. 1434 01:21:37,332 --> 01:21:41,510 Update votes, and then we want to set the title 1435 01:21:41,510 --> 01:21:45,110 column to be equal to profusion-- 1436 01:21:45,110 --> 01:21:49,160 "Profusion of Flowers" like this. 1437 01:21:49,160 --> 01:21:55,880 Then where the title is like, in this case, profusion at the beginning. 1438 01:21:55,880 --> 01:21:59,330 Notice how I can't use perfusion space and then 1439 01:21:59,330 --> 01:22:04,190 afterwards because we have one title that is literally just "Profusion," 1440 01:22:04,190 --> 01:22:05,030 plain and simple. 1441 01:22:05,030 --> 01:22:06,095 No spaces afterwards. 1442 01:22:06,095 --> 01:22:07,220 No spaces at the beginning. 1443 01:22:07,220 --> 01:22:10,860 To match that, I should just say profusion at the beginning. 1444 01:22:10,860 --> 01:22:14,660 Now I can hit Enter, and if I do my usual query 1445 01:22:14,660 --> 01:22:17,270 to show you the votes grouped by title, I 1446 01:22:17,270 --> 01:22:21,800 see I'm finally at my correct, fully-formatted data set. 1447 01:22:21,800 --> 01:22:24,230 Farmers Working At Dawn has six votes. 1448 01:22:24,230 --> 01:22:28,460 Imaginative Landscape has five, and so on down the row. 1449 01:22:28,460 --> 01:22:33,080 So here we've seen update being used not just to associate artists 1450 01:22:33,080 --> 01:22:36,140 with new paintings, but also to clean our data, 1451 01:22:36,140 --> 01:22:39,110 to make sure we can actually tally votes up in a way that's 1452 01:22:39,110 --> 01:22:42,750 consistent and clean in this case. 1453 01:22:42,750 --> 01:22:45,420 What questions do we have on update now? 1454 01:22:45,420 --> 01:22:48,380 1455 01:22:48,380 --> 01:22:50,200 Let's go to [? Vixay. ?] 1456 01:22:50,200 --> 01:22:51,950 [? VIXAY: ?] Yeah, so what I wanted to ask 1457 01:22:51,950 --> 01:22:55,430 is if there is a specific function for like lowercase. 1458 01:22:55,430 --> 01:23:00,037 So there's a function for upper, and these are the ones for all lowercase. 1459 01:23:00,037 --> 01:23:02,120 SPEAKER: Yeah, a good corresponding question here. 1460 01:23:02,120 --> 01:23:03,980 We used upper, but is there lower? 1461 01:23:03,980 --> 01:23:05,600 Turns out, there is a lower function. 1462 01:23:05,600 --> 01:23:09,680 It's spelled lower, and is in all lowercase like upper was, 1463 01:23:09,680 --> 01:23:11,570 so I could use that as well. 1464 01:23:11,570 --> 01:23:15,950 There is a whole list of what SQLite calls these scalar functions, functions 1465 01:23:15,950 --> 01:23:19,940 that take in some set of values and return to you some modified 1466 01:23:19,940 --> 01:23:21,480 version of those values. 1467 01:23:21,480 --> 01:23:24,170 So if you do some googling, look for SQLite scalar functions. 1468 01:23:24,170 --> 01:23:30,420 You can find all those functions in one place in the SQLite documentation. 1469 01:23:30,420 --> 01:23:32,200 Let's take one more question here. 1470 01:23:32,200 --> 01:23:33,620 Let's go to Rupinder. 1471 01:23:33,620 --> 01:23:38,390 RUPINDER: So, yeah, my question is so instead of updating all the titles, 1472 01:23:38,390 --> 01:23:42,320 is it possible to create a new column with the four categories 1473 01:23:42,320 --> 01:23:45,355 that we want to have actually? 1474 01:23:45,355 --> 01:23:47,230 SPEAKER: Yeah, I could see that working, too. 1475 01:23:47,230 --> 01:23:48,855 So there's a few ways to fix this data. 1476 01:23:48,855 --> 01:23:53,060 You could imagine adding a new column, and maybe you 1477 01:23:53,060 --> 01:23:59,542 assign values to the rows in that column based on what you see in the title 1478 01:23:59,542 --> 01:24:00,500 that they've given you. 1479 01:24:00,500 --> 01:24:03,260 So if you can match something like imaginative, 1480 01:24:03,260 --> 01:24:06,500 and you'd know that all your paintings associated with imaginative 1481 01:24:06,500 --> 01:24:10,340 have the same title, you could group them into the one category, the two 1482 01:24:10,340 --> 01:24:12,110 category, and so on. 1483 01:24:12,110 --> 01:24:14,270 Here, you wouldn't be modifying those titles. 1484 01:24:14,270 --> 01:24:16,970 You'd just be trying to strategically apply 1485 01:24:16,970 --> 01:24:19,610 different categories, which could also work as a solution here, 1486 01:24:19,610 --> 01:24:22,430 too, so good thinking. 1487 01:24:22,430 --> 01:24:27,740 OK, so we've seen now how to insert data, how to update data, 1488 01:24:27,740 --> 01:24:30,250 and previously, how to delete data. 1489 01:24:30,250 --> 01:24:35,900 What we can still do though, are learn how to have other SQL command-- 1490 01:24:35,900 --> 01:24:39,818 other SQL statements that can run after we make our very own SQL 1491 01:24:39,818 --> 01:24:40,610 statements as well. 1492 01:24:40,610 --> 01:24:44,150 We'll come back and talk about these things called triggers. 1493 01:24:44,150 --> 01:24:45,840 And we're back. 1494 01:24:45,840 --> 01:24:50,630 So we've seen now a whole collection of ways to write data to a database. 1495 01:24:50,630 --> 01:24:56,080 We've seen how to insert data, how to update data, and even delete data. 1496 01:24:56,080 --> 01:24:59,000 What we'll see now though, this idea of a trigger, 1497 01:24:59,000 --> 01:25:04,670 a way of writing a SQL statement to run in response to some other SQL statement 1498 01:25:04,670 --> 01:25:08,780 like an insert, an update, or a delete. 1499 01:25:08,780 --> 01:25:12,890 Let's consider our museum yet again, and let's say they 1500 01:25:12,890 --> 01:25:15,860 have a schema a bit like this one now. 1501 01:25:15,860 --> 01:25:18,920 They have items in their collections, but they also 1502 01:25:18,920 --> 01:25:21,290 have a transactions table. 1503 01:25:21,290 --> 01:25:26,390 Now wouldn't it be nice if whenever I deleted something from the collections 1504 01:25:26,390 --> 01:25:31,190 table, it would show up in transactions as sold, having been 1505 01:25:31,190 --> 01:25:33,120 sold from the museum's collections. 1506 01:25:33,120 --> 01:25:36,120 Well, I could use a trigger to do just that. 1507 01:25:36,120 --> 01:25:38,270 Let's see it visually first though. 1508 01:25:38,270 --> 01:25:43,160 I'll say, let's try to delete "Spring Outing" from our collections table. 1509 01:25:43,160 --> 01:25:47,300 Well, if I do that, I could pretty easily do it with delete from, right? 1510 01:25:47,300 --> 01:25:50,540 Now it's no longer part of my database, but if I 1511 01:25:50,540 --> 01:25:55,760 have created a trigger to run an insert whenever I hear 1512 01:25:55,760 --> 01:25:59,210 or have listened to a delete on my collections table, 1513 01:25:59,210 --> 01:26:04,370 I could then see that same title in my transactions table 1514 01:26:04,370 --> 01:26:08,250 as sold all automatically. 1515 01:26:08,250 --> 01:26:12,320 Let's think to maybe the museum acquires some piece of artwork. 1516 01:26:12,320 --> 01:26:15,440 Maybe they actually add something new to their collection. 1517 01:26:15,440 --> 01:26:19,880 We want that to show up in our transactions as well with the action of 1518 01:26:19,880 --> 01:26:20,570 bought. 1519 01:26:20,570 --> 01:26:21,440 Well, let's try it. 1520 01:26:21,440 --> 01:26:23,490 I could probably add a new row here. 1521 01:26:23,490 --> 01:26:28,280 I could say let's add "Peonies and Butterfly" to my collection here. 1522 01:26:28,280 --> 01:26:31,700 Now, if I have the appropriate trigger, I 1523 01:26:31,700 --> 01:26:36,320 could run an insert into on transactions automatically 1524 01:26:36,320 --> 01:26:40,190 that would insert this same title with the action of bot, 1525 01:26:40,190 --> 01:26:45,110 thereby keeping a log of all of our sold pieces of artwork and all of our 1526 01:26:45,110 --> 01:26:47,360 bought pieces of artwork. 1527 01:26:47,360 --> 01:26:50,000 Now to create what we're calling a trigger here, 1528 01:26:50,000 --> 01:26:52,430 we can use the following syntax. 1529 01:26:52,430 --> 01:26:57,620 I could say first create trigger, and then some name. 1530 01:26:57,620 --> 01:27:00,260 So I tend to give triggers names to identify them 1531 01:27:00,260 --> 01:27:04,670 among all of my database [INAUDIBLE],, like my tables and so on. 1532 01:27:04,670 --> 01:27:08,840 Here, after I say create trigger, I have to specify. 1533 01:27:08,840 --> 01:27:15,080 Should this trigger, this statement, run after or before some other SQL 1534 01:27:15,080 --> 01:27:16,190 statement? 1535 01:27:16,190 --> 01:27:18,350 Let's keep going with before here. 1536 01:27:18,350 --> 01:27:22,850 Maybe it runs before an insert on some table. 1537 01:27:22,850 --> 01:27:23,960 That's fine. 1538 01:27:23,960 --> 01:27:29,190 It could also run before an update of a column on some table. 1539 01:27:29,190 --> 01:27:34,190 It could even run before a delete on some table, and keep in mind, 1540 01:27:34,190 --> 01:27:38,450 this could be before or after any of these kinds of SQL statements. 1541 01:27:38,450 --> 01:27:42,770 In this case, we have insert, update, and delete. 1542 01:27:42,770 --> 01:27:48,050 Now after we say this, what we should say, too, is for each row. 1543 01:27:48,050 --> 01:27:53,210 For each row means that if I were to maybe delete multiple rows, 1544 01:27:53,210 --> 01:27:57,410 I should run my SQL statement for each row that I delete. 1545 01:27:57,410 --> 01:28:03,980 If I delete two rows, I should run this statement coming soon two times. 1546 01:28:03,980 --> 01:28:09,470 OK, so now we've set up our trigger, at least kind of the data, the setup 1547 01:28:09,470 --> 01:28:10,550 for it here. 1548 01:28:10,550 --> 01:28:15,620 What we haven't done yet is specify what statement should run whenever 1549 01:28:15,620 --> 01:28:18,020 we see a delete on particular table. 1550 01:28:18,020 --> 01:28:19,640 Well, let's try this. 1551 01:28:19,640 --> 01:28:21,590 Here I could say begin. 1552 01:28:21,590 --> 01:28:24,860 Begin means here comes my statement I want to run 1553 01:28:24,860 --> 01:28:28,370 whenever I hear a delete on this table. 1554 01:28:28,370 --> 01:28:29,840 Now, all right. 1555 01:28:29,840 --> 01:28:31,430 That query down below. 1556 01:28:31,430 --> 01:28:34,700 That statement inside my begin, and I'll finish it off 1557 01:28:34,700 --> 01:28:37,910 with an end to say this is the end of my statement, 1558 01:28:37,910 --> 01:28:41,600 and my entire trigger looks like all of this lines of code here 1559 01:28:41,600 --> 01:28:47,210 to say listen for a delete and run this statement in here for each row 1560 01:28:47,210 --> 01:28:50,530 that I delete, or update, or insert. 1561 01:28:50,530 --> 01:28:53,900 So let's try actually implementing in this case 1562 01:28:53,900 --> 01:28:57,460 the museum's collections where they're trying to sell or buy 1563 01:28:57,460 --> 01:28:59,800 pieces of artwork, and automatically add them 1564 01:28:59,800 --> 01:29:03,760 to their transaction table whenever they create or delete some piece of data 1565 01:29:03,760 --> 01:29:05,710 from their collections. 1566 01:29:05,710 --> 01:29:10,300 Go back to my terminal here, and let me pull up a database. 1567 01:29:10,300 --> 01:29:14,260 I'll do SQLite 3 mfa.db, and you'll notice 1568 01:29:14,260 --> 01:29:17,380 that this is the same schema that we've had before. 1569 01:29:17,380 --> 01:29:21,490 We have collections, artists, and the created table. 1570 01:29:21,490 --> 01:29:24,620 What's missing here, though, is our transactions table. 1571 01:29:24,620 --> 01:29:26,530 So let me make that for us here. 1572 01:29:26,530 --> 01:29:30,580 I'll say let me go create table, transactions, 1573 01:29:30,580 --> 01:29:36,000 and now let me give it an ID column of type integer. 1574 01:29:36,000 --> 01:29:41,030 Now, I'll also give it a title column as we saw before called "Text." 1575 01:29:41,030 --> 01:29:43,310 It's in this column [INAUDIBLE],, the title 1576 01:29:43,310 --> 01:29:46,100 of the piece we either bought or sold. 1577 01:29:46,100 --> 01:29:50,960 Now, let me also include the action column, also of type text 1578 01:29:50,960 --> 01:29:55,210 to say whether we bought or sold this piece. 1579 01:29:55,210 --> 01:30:00,050 OK, finally, I'll make the ID column my primary key. 1580 01:30:00,050 --> 01:30:06,080 And if I say semicolon here and hit Enter, I now have created this table. 1581 01:30:06,080 --> 01:30:10,210 So if I say dot schema, notice how I can now see my transactions 1582 01:30:10,210 --> 01:30:13,990 table down below, and I'm safe to run insert 1583 01:30:13,990 --> 01:30:18,100 on this table to add some data to it. 1584 01:30:18,100 --> 01:30:21,490 Well, let's try now creating our very first trigger. 1585 01:30:21,490 --> 01:30:25,300 We're going to try to create a trigger that whenever an item is deleted 1586 01:30:25,300 --> 01:30:28,990 from our collections table, we actually add it to our transactions table 1587 01:30:28,990 --> 01:30:33,200 with the action of sold, meaning we sold this particular item. 1588 01:30:33,200 --> 01:30:36,550 So let me say create trigger, as we saw before, 1589 01:30:36,550 --> 01:30:38,830 and give it some name to identify it. 1590 01:30:38,830 --> 01:30:41,030 I'll call this one sell. 1591 01:30:41,030 --> 01:30:47,290 Now, I'll say I want to run this trigger before I delete on the collections 1592 01:30:47,290 --> 01:30:48,170 table. 1593 01:30:48,170 --> 01:30:52,870 So before I run the delete query on my collections table, 1594 01:30:52,870 --> 01:30:56,980 I want to run this statement instead. 1595 01:30:56,980 --> 01:31:00,830 I'll say for each row, begin. 1596 01:31:00,830 --> 01:31:04,180 I want to now give you some statement to run for each row 1597 01:31:04,180 --> 01:31:06,730 that I delete from collections. 1598 01:31:06,730 --> 01:31:08,350 Let's make it this query here. 1599 01:31:08,350 --> 01:31:12,160 On 1, 2, 3, 4 spaces to indent and make sure 1600 01:31:12,160 --> 01:31:18,850 this is clearly the query I want to run after I run the delete on collections. 1601 01:31:18,850 --> 01:31:23,020 I'll say I want to insert into transactions. 1602 01:31:23,020 --> 01:31:27,730 I want to insert into the title and action columns. 1603 01:31:27,730 --> 01:31:29,410 What value should I insert? 1604 01:31:29,410 --> 01:31:34,210 Well, I should insert the old title, and actually, in triggers, 1605 01:31:34,210 --> 01:31:37,330 you get access to this keyword called old. 1606 01:31:37,330 --> 01:31:41,600 This is the title that we've just deleted from our collection. 1607 01:31:41,600 --> 01:31:46,540 So old.title gives me access to the old row, the row we deleted, 1608 01:31:46,540 --> 01:31:49,160 its title column in particular. 1609 01:31:49,160 --> 01:31:54,210 Now, let me add in the action, which is just plain and simple "sold." 1610 01:31:54,210 --> 01:31:58,820 So this is the query to run before I delete on collections. 1611 01:31:58,820 --> 01:32:03,350 I will take the title of the row I'm about to delete, and insert it 1612 01:32:03,350 --> 01:32:06,950 into transactions along with the action sold. 1613 01:32:06,950 --> 01:32:11,070 Let me type a semicolon here to say this is my entire query, hit Enter again, 1614 01:32:11,070 --> 01:32:14,000 and now I can end that particular statement 1615 01:32:14,000 --> 01:32:18,110 I want to run after I hear a delete on collections. 1616 01:32:18,110 --> 01:32:21,770 So I'll hit Enter now, and let me type dot schema. 1617 01:32:21,770 --> 01:32:26,480 I can actually see this trigger is part of my schema. 1618 01:32:26,480 --> 01:32:29,810 I can see create trigger sell before delete on transactions 1619 01:32:29,810 --> 01:32:33,410 underneath the transactions table. 1620 01:32:33,410 --> 01:32:36,680 OK, let's try putting this into action here. 1621 01:32:36,680 --> 01:32:39,470 Let me try actually deleting something from our collections table. 1622 01:32:39,470 --> 01:32:44,090 I'll say delete from collections where the title equals 1623 01:32:44,090 --> 01:32:45,920 "Profusion of Flowers." 1624 01:32:45,920 --> 01:32:49,460 I want to sell this particular piece of artwork from our collection. 1625 01:32:49,460 --> 01:32:54,530 I'll hit semicolon here, and now nothing seems to happen as usual, 1626 01:32:54,530 --> 01:32:58,290 but if I select star from collections-- 1627 01:32:58,290 --> 01:33:02,820 from collections, I see that piece is gone. 1628 01:33:02,820 --> 01:33:08,070 It's no longer in here, and if I select star from transactions-- 1629 01:33:08,070 --> 01:33:12,720 select star from transactions, I actually see it's been sold. 1630 01:33:12,720 --> 01:33:15,840 So automatically, I added "Profusion of Flowers" 1631 01:33:15,840 --> 01:33:18,810 with the action of sold to my transactions table. 1632 01:33:18,810 --> 01:33:21,400 I did not type insert myself. 1633 01:33:21,400 --> 01:33:24,960 I instead created a trigger to run this statement, 1634 01:33:24,960 --> 01:33:29,720 insert into, whenever I delete it on collections. 1635 01:33:29,720 --> 01:33:31,010 Let's try one more. 1636 01:33:31,010 --> 01:33:32,900 We want to not just be able to sell artwork, 1637 01:33:32,900 --> 01:33:36,260 but also add it to our collection by buying it. 1638 01:33:36,260 --> 01:33:39,050 So I create a trigger called buy. 1639 01:33:39,050 --> 01:33:44,750 Let me type create trigger "buy," and now I 1640 01:33:44,750 --> 01:33:52,460 want to run this statement forthcoming after an insert on collections. 1641 01:33:52,460 --> 01:33:58,820 So I want to first insert a new item to collections, and when I do, after I do, 1642 01:33:58,820 --> 01:34:01,610 I want to run this query coming next. 1643 01:34:01,610 --> 01:34:07,590 Now, I want to, for each row, begin the statement I want to run. 1644 01:34:07,590 --> 01:34:12,380 So as we said before, I want to insert into the transactions table. 1645 01:34:12,380 --> 01:34:18,090 I want to insert into the title and action columns, just like this. 1646 01:34:18,090 --> 01:34:21,380 Now, I want to insert some values. 1647 01:34:21,380 --> 01:34:23,060 The values I want to insert-- 1648 01:34:23,060 --> 01:34:27,410 well, I want to insert the new title, the new row I've just added. 1649 01:34:27,410 --> 01:34:30,740 So in addition to old, we also have new when 1650 01:34:30,740 --> 01:34:33,320 I'm talking when I'm using a trigger on an insert. 1651 01:34:33,320 --> 01:34:40,100 So I'll say new.title, where again, new is the new row or rows I'm inserting, 1652 01:34:40,100 --> 01:34:44,930 and the dot title is a column called title. 1653 01:34:44,930 --> 01:34:47,360 Now, here I'll say the action was bought. 1654 01:34:47,360 --> 01:34:49,070 We bought this artwork. 1655 01:34:49,070 --> 01:34:52,250 I'll hit semicolon here, and now, I'll say end 1656 01:34:52,250 --> 01:34:56,510 to say this is the end of my statement I want to run for every row I 1657 01:34:56,510 --> 01:34:58,580 insert on collections. 1658 01:34:58,580 --> 01:35:00,200 Let me hit Enter here. 1659 01:35:00,200 --> 01:35:01,520 I've created this trigger. 1660 01:35:01,520 --> 01:35:06,180 I can type dot schema to see it in my actual schema. 1661 01:35:06,180 --> 01:35:07,910 Now, let's try inserting. 1662 01:35:07,910 --> 01:35:11,630 I'll say insert into collections. 1663 01:35:11,630 --> 01:35:15,680 I want to insert into the title and accession number, 1664 01:35:15,680 --> 01:35:18,950 as well as the acquired columns here. 1665 01:35:18,950 --> 01:35:21,260 I want to insert some new artwork. 1666 01:35:21,260 --> 01:35:24,630 Maybe I sold Profusion of Flowers, but now I want to just buy it back. 1667 01:35:24,630 --> 01:35:30,510 So I'll say I'll insert "Profusion of Flowers" is the title. 1668 01:35:30,510 --> 01:35:37,160 The accession number was 56.257, and the date we acquired it, well, 1669 01:35:37,160 --> 01:35:41,360 that was in 1956/04/12. 1670 01:35:41,360 --> 01:35:43,310 Let's hit semicolon here. 1671 01:35:43,310 --> 01:35:48,770 Now, I'll insert, and I'll say select star from collections. 1672 01:35:48,770 --> 01:35:53,310 I should see Profusion of Flowers back in my table. 1673 01:35:53,310 --> 01:35:56,780 Now though, if I say select star from transactions-- 1674 01:35:56,780 --> 01:36:00,020 select star from transactions, I should see 1675 01:36:00,020 --> 01:36:03,980 I previously sold Profusion of Flowers, and later bought it back. 1676 01:36:03,980 --> 01:36:08,040 So again, I didn't run this insert myself. 1677 01:36:08,040 --> 01:36:14,160 I instead triggered it by making my very own insert on the collections table. 1678 01:36:14,160 --> 01:36:16,430 So this is the power of triggers. 1679 01:36:16,430 --> 01:36:18,830 You're able to run statements that I myself 1680 01:36:18,830 --> 01:36:23,030 didn't create in response to statements that I actually did. 1681 01:36:23,030 --> 01:36:27,440 So let me ask here, what questions do we have on these triggers 1682 01:36:27,440 --> 01:36:29,000 and what they can do for us? 1683 01:36:29,000 --> 01:36:31,650 1684 01:36:31,650 --> 01:36:33,230 Let's go to Simon. 1685 01:36:33,230 --> 01:36:38,390 SIMON: I'd like to know if there are only-- 1686 01:36:38,390 --> 01:36:42,440 there is only one SQL inquiry that you can write 1687 01:36:42,440 --> 01:36:46,170 in a trigger or multiple inquiries. 1688 01:36:46,170 --> 01:36:49,700 SPEAKER: Yeah, we saw begin and end in our triggers 1689 01:36:49,700 --> 01:36:52,290 here, where begin indicates the start of our statement, 1690 01:36:52,290 --> 01:36:54,020 and end indicates the end. 1691 01:36:54,020 --> 01:36:56,480 You actually can have multiple statements 1692 01:36:56,480 --> 01:36:59,450 inside of the begin and end separated by, 1693 01:36:59,450 --> 01:37:03,130 of course, that usual semicolon that we have. 1694 01:37:03,130 --> 01:37:07,030 OK, so one final idea here that we're getting at-- we're 1695 01:37:07,030 --> 01:37:09,370 kind of touching at with this transactions table is 1696 01:37:09,370 --> 01:37:12,700 this idea of having a soft deletion. 1697 01:37:12,700 --> 01:37:17,380 So when I dropped something from my collections table, 1698 01:37:17,380 --> 01:37:20,890 it actually ended up in my transactions table, 1699 01:37:20,890 --> 01:37:24,490 so it wasn't fully deleted in the sense that I'm keeping a record of it 1700 01:37:24,490 --> 01:37:26,330 somewhere else. 1701 01:37:26,330 --> 01:37:28,930 There's also one more way among many others 1702 01:37:28,930 --> 01:37:32,123 to implement this idea of soft deletions, that is not quite fully 1703 01:37:32,123 --> 01:37:34,540 deleting something from my database but instead keeping it 1704 01:37:34,540 --> 01:37:37,970 around with a log that we've removed it in some way. 1705 01:37:37,970 --> 01:37:39,940 Let's look at this table here. 1706 01:37:39,940 --> 01:37:45,250 I have collections, but I also have this column called "deleted," 1707 01:37:45,250 --> 01:37:49,000 and by default, the value here will be 0. 1708 01:37:49,000 --> 01:37:54,190 That is, by default, I'll add some artwork to my collections table 1709 01:37:54,190 --> 01:37:56,980 and deleted will be 0. 1710 01:37:56,980 --> 01:38:01,210 If I want to delete a piece of artwork though, what could I do? 1711 01:38:01,210 --> 01:38:05,050 Instead of deleting it fully from my table or removing the row, 1712 01:38:05,050 --> 01:38:07,090 I could mark it as deleted. 1713 01:38:07,090 --> 01:38:10,630 I could change the deleted column from a 0 to a 1. 1714 01:38:10,630 --> 01:38:13,270 Let's say I wanted to delete Farmers Working At Dawn 1715 01:38:13,270 --> 01:38:16,330 instead of writing literally delete from collection 1716 01:38:16,330 --> 01:38:18,670 where title equals Farmers Working At Dawn, I 1717 01:38:18,670 --> 01:38:23,500 could say update the deleted column of collections, where the title is Farmers 1718 01:38:23,500 --> 01:38:27,490 Working At Dawn, and make it not 0, but 1. 1719 01:38:27,490 --> 01:38:30,490 Now, I've marked this item as deleted. 1720 01:38:30,490 --> 01:38:33,430 I could run a query that excludes deleted items, 1721 01:38:33,430 --> 01:38:37,880 but this row is still around in my table. 1722 01:38:37,880 --> 01:38:41,410 So let's try implementing this idea of a soft deletion 1723 01:38:41,410 --> 01:38:43,960 inside of our collections table so that we 1724 01:38:43,960 --> 01:38:48,310 don't lose records of things that actually had in our collection. 1725 01:38:48,310 --> 01:38:51,250 I'll come back to my computer here, and let's work 1726 01:38:51,250 --> 01:38:54,940 on altering our table for collections here. 1727 01:38:54,940 --> 01:38:58,360 If I type dot schema collections-- 1728 01:38:58,360 --> 01:39:04,480 dot schema collections, you should see I have a table "collections," and also 1729 01:39:04,480 --> 01:39:06,430 the triggers associated with it. 1730 01:39:06,430 --> 01:39:10,420 What I don't have yet in collections, if I go over here and show you, 1731 01:39:10,420 --> 01:39:12,910 I don't have a deleted column. 1732 01:39:12,910 --> 01:39:18,640 I have ID, title, accession number, and acquired, but I don't have deleted. 1733 01:39:18,640 --> 01:39:22,390 So as you learned last week, we can use alter table 1734 01:39:22,390 --> 01:39:27,550 to add a column to our collections table, and by default, make the value 1735 01:39:27,550 --> 01:39:28,300 0. 1736 01:39:28,300 --> 01:39:29,940 So let's try that. 1737 01:39:29,940 --> 01:39:34,450 I'll go back to my environment, and now I'll run alter table. 1738 01:39:34,450 --> 01:39:41,440 I might say alter table, and a table I want to alter, which is "collections." 1739 01:39:41,440 --> 01:39:46,930 Now I want to add the column "deleted" to collections, 1740 01:39:46,930 --> 01:39:50,410 and I want to make this type integer, so our whole numbers 1741 01:39:50,410 --> 01:39:56,020 whether positive or negative, and the default value will be 0 in this case. 1742 01:39:56,020 --> 01:40:01,210 Now, I'll hit Enter, and if I type dot schema collections, 1743 01:40:01,210 --> 01:40:07,630 I should see I have my very own deleted column inside of collections. 1744 01:40:07,630 --> 01:40:14,120 If I type, for instance, select star from collections semicolon, 1745 01:40:14,120 --> 01:40:20,330 I'll see that by default, all of these values are 0 in my new deleted column. 1746 01:40:20,330 --> 01:40:25,580 So instead of trying to literally delete from collections, 1747 01:40:25,580 --> 01:40:27,230 I could instead just update it. 1748 01:40:27,230 --> 01:40:31,880 I could say take the updated column, and flip it from a 0 to a 1. 1749 01:40:31,880 --> 01:40:38,330 Let me try update collections and set the deleted column 1750 01:40:38,330 --> 01:40:45,830 equal to 1, where let's say the title equals Farmers Working At Dawn. 1751 01:40:45,830 --> 01:40:48,900 And my query wraps here, but it should still work just fine. 1752 01:40:48,900 --> 01:40:51,050 I'll hit semicolon, Enter. 1753 01:40:51,050 --> 01:40:53,600 Now, I didn't use delete from. 1754 01:40:53,600 --> 01:40:54,740 I used update. 1755 01:40:54,740 --> 01:40:59,570 If I select star from collections semicolon here, 1756 01:40:59,570 --> 01:41:03,420 I'll see, well, Farmers Working At Dawn is in this table, 1757 01:41:03,420 --> 01:41:07,560 but technically-- at least we marked it as deleted. 1758 01:41:07,560 --> 01:41:12,830 So it seems now, I don't want to use just select star from collections 1759 01:41:12,830 --> 01:41:14,540 to see what's inside my collection. 1760 01:41:14,540 --> 01:41:18,230 I want to apply a filter to remove those that have 1761 01:41:18,230 --> 01:41:21,745 a deleted value that's not equal to 0. 1762 01:41:21,745 --> 01:41:22,370 Let's try this. 1763 01:41:22,370 --> 01:41:30,650 I'll say select star from collections where deleted does not equal 1. 1764 01:41:30,650 --> 01:41:33,710 If I hit semicolon here, what do I see? 1765 01:41:33,710 --> 01:41:37,518 Only those values that are not deleted. 1766 01:41:37,518 --> 01:41:38,810 I could go ahead and find them. 1767 01:41:38,810 --> 01:41:45,620 I could say maybe select star from collections where deleted actually 1768 01:41:45,620 --> 01:41:46,700 is 1. 1769 01:41:46,700 --> 01:41:50,870 Really equals 1 like this, and now I see all those rows 1770 01:41:50,870 --> 01:41:53,630 that I marked as deleted. 1771 01:41:53,630 --> 01:41:55,940 Now this has some advantages. 1772 01:41:55,940 --> 01:41:57,970 One of them is we keep data around. 1773 01:41:57,970 --> 01:41:59,530 We don't actually formally delete it. 1774 01:41:59,530 --> 01:42:04,460 We can still recover it later on, but it also has some tricky ethical questions, 1775 01:42:04,460 --> 01:42:04,960 too. 1776 01:42:04,960 --> 01:42:07,240 Like it's OK if we're talking about artwork here, 1777 01:42:07,240 --> 01:42:12,520 but if you're talking about user data, is it right to only soft delete it 1778 01:42:12,520 --> 01:42:14,020 if they ask you to delete it? 1779 01:42:14,020 --> 01:42:17,800 Particularly in conversation with new frameworks like GDPR, 1780 01:42:17,800 --> 01:42:20,570 and the right to be forgotten, and so on, it's up to you 1781 01:42:20,570 --> 01:42:22,570 as a programmer to make the right decision here. 1782 01:42:22,570 --> 01:42:25,480 When should you find data that [INAUDIBLE] and actually delete it, 1783 01:42:25,480 --> 01:42:28,570 or when is it better to just soft delete it and keep it around 1784 01:42:28,570 --> 01:42:31,210 in case you need it for later? 1785 01:42:31,210 --> 01:42:36,490 So we'll end here on this note where if I say select star from collections, 1786 01:42:36,490 --> 01:42:43,570 and I want to find those only where deleted is not equal to 0 like this-- 1787 01:42:43,570 --> 01:42:49,150 or not equal to 1, this gives me back all of the items that are not deleted. 1788 01:42:49,150 --> 01:42:51,670 But wouldn't it be nice if I could actually 1789 01:42:51,670 --> 01:42:57,580 run a query on some brand new temporary table that actually only will ever have 1790 01:42:57,580 --> 01:43:00,250 those items that are not soft deleted? 1791 01:43:00,250 --> 01:43:03,140 Turns out, we can do that with an idea called views, 1792 01:43:03,140 --> 01:43:06,130 and we'll see those in much more depth next week. 1793 01:43:06,130 --> 01:43:08,010 We'll see you then. 1794 01:43:08,010 --> 01:43:10,000