[AUDIO LOGO] CARTER ZENKE: Well, hello, one and all, and welcome back to CS50's Introduction to Databases with SQL. My name is Carter Zenke, and last we left off, we learned how to create our very own database schemas-- that is, a way to organize data in our database. Today, we'll learn how to actually add data to our databases-- to insert, to update, and delete data. We'll do all of this in the context of Boston's very own Museum of Fine Arts, or the MFA for short. 

So the MFA here in Boston is a century-- well, maybe, about a century-old museum that has many artifacts and artwork inside of it, both historical and contemporary. And it's worth asking, how do they keep track of the thousands of items that are in their collections? What could they possibly use? 

Well, chances are, they're likely using some kind of database. And on that database, they want to do four actions we've learned about back in week zero. They could, perhaps, create data-- to add data or insert data to the database when they get some new piece of artwork, for instance. They might want to query the database, to read from it. They could also update data, change the artist, change the artwork in some way. And they could also just delete data to remove it all together. 

But if we think about these-- creating, reading, updating, and deleting-- we'll notice that reading, updating, and deleting-- we can't do those if we don't actually have data in our database. And so today, we'll see how to create data-- how to insert data into our very own database. Now, let's think about the MFA's collections, their collection of art and artifacts. And let me propose that they have a database that looks a bit like this. 

It's a single table, and it has a title column and accession number, which is a fancy way of saying a unique ID internal to the museum, and also, a date it was acquired. We have, of course, a primary key on this table called ID. And let's think. Well, the museum might want to acquire this piece here-- this one called Profusion of Flowers. Well, how could they log that this artwork is inside of their database? 

They could maybe just add a new row. They could say, let's put Profusion of Flowers as the first item in our collections here. We'll give it a title and accession number-- which, again, is just a unique ID internal to the museum-- and the date it was acquired. And that row then has its own primary key to identify this row uniquely in our database. 

Now let's say they get another piece of artwork. They get this one, called Farmers Working at Dawn, and they want to add this one to their table, too. Well, they could do the very same thing. They could just add a new row. They could say, let's make a title, an accession number, and a date it was acquired, and this brand-new row here for that piece of artwork. 

And maybe they get another one, too. Same thing. Maybe they'll get back Spring Outing, and they want to add this to their collection. They could simply add another row like this. Now, it turns out that the database administrator behind the MFA might be running a SQL statement that looks a bit like this-- INSERT INTO. We can use INSERT INTO to add a new row to any given table. 

And notice how insert into needs to know a few pieces of information. The first is what table to insert into. What is the name of that table? Second, it needs to know what columns are we adding data to inside of this table? We give it a list of those columns here. 

Then, of course, it needs to know what new values should go into this new row for each of these columns. Like we saw before, is it Profusion of Flowers? Is it Spring Outing? Et cetera. Here, we can see that we have this list of values. And notice how value 0, the first value in this list, corresponds to the new value that will be inserted to this first column. And we can keep having value 1, or column 1, value 2 and column 2, each one aligning with that particular column there. 

So let's see an example of this actually in code, understand it a bit more concretely. I'll go back to my computer here. Let's actually create our very own database that involves this schema of having our very own table which can keep artists and artwork and artifacts as well. 

So I'll type sqlite3 mfa.db to create a database for the Museum of Fine Arts, abbreviated as MFA. I'll hit Enter here. And notice how I can now have access to my very own mfa.db inside my SQLite environment. 

Well, now, I can type .schema to see the schema of this database. So if I hit Enter here, well, nothing's there. Because I just made this database. There's nothing in it yet. 

Well, it turns out I do actually have a schema file prepared for me already in schema.sql. Here, I propose we make a table called "collections," and like our table visually, it has let's say four columns. One as an "id," the primary key of this table, one for a "title," one for the "accession_number," the unique ID internal to the museum, and finally, the date it was acquired. 

And notice here, that "title" has to be actually added to our database. It can't be NULL. It can't have an empty value. Same thing for "accession_number." And further, "accession_number" has to be unique. I can't have two items with a different "accession_number." Otherwise, I might get them confused in my database or inside my museum archives as well. 

So let me then add this schema to my database. And I can do so with this command right here. This is a SQLite command called .read, that we saw a little bit last week as well. 

I could type .read and then the name of this file I want to read. So I'll say .read schema.sql. Hit Enter. And now if I type, let's say .schema, I can see that same sql schema, same database schema. now inside of my terminal. 

OK, so now as promised, let's try adding some rows to this table. Because right now, if I do SELECT star from collections, semicolon, I don't see anything because nothing is inside yet. But I could add something using INSERT INTO. So let's try that. 

I'll say INSERT INTO the "collections" table. And now I have to ask, what columns inside of "collections" do I have to add data to. Well, I probably want to add to the first the "id" column, that primary key column. Then maybe, I want to add a "title" to this row, and also an "accession_number," and also, of course, the date this piece was "acquired." So now, I have the table I'm inserting into along with the columns I'm adding values for. 

So for style's sake, I'll hit Enter here. And now, I could type a list of values to insert into this new row. I could say VALUES, and then inside parentheses, the values I want to insert. Maybe the first primary key that I give to this item is going to be just 1. Start at 1, and add up as we add new items. 

For the title, I'll say let's call this one "Profusion of flowers." This is the piece we recently acquired into our collection. The accession number that we gave it was 56.257. And the date it was acquired was back in 1956-04-12. 

Now, I have all these values here. I'll type semicolon, hit Enter, and nothing seems to happen. But if I type, let's say, select star from "collections" semicolon, what do I see but this new row inside? 

Let's do it again to get a hang of this. I want to now add "Farmers working at dawn" to my collection. So I'll do the same thing, I'll say let's INSERT INTO the "collections" table. And let's add values for the "id" column, the "title" column, the "accession_number," and also the "acquired" column like this. Again for style's sake, I'll hit Enter. 

Now, I give some list of values to add into this new row. I'll say VALUES here, then this list of values to add for each column. Well, the first column, I have to give a value for is "id." If you remember, our last id was 1, so what should this id be? Maybe 2. So I'll type 2 here as the next increment of my primary key, then I'll give it a title. 

And this title was "Farmers working at dawn." We gave it an accession number to keep track of it in our own internal museum records, our own archives here. So I'll say 11.6152 is the accession number for this particular item, then I'll say we acquired it back in 1911-08-03. 

Now, I'll hit a semicolon here, hit Enter. And now, I should be able to see if I type SELECT star FROM "collections," SELECT star FROM "collections" semicolon, I now have these two items inside of my collection. 

Now, let's do one more here. But let's focus in particular on this primary key. Notice here, how we've been actually inserting our very own primary key 1 and then 2. But maybe that's not the best design. 

Let me try this again. I'll say INSERT INTO, let's go for "collections" is our table name. And maybe I'll try again to add to the "id" column, the "title" column, the "accession_number," and the "acquired" column here. What could go wrong do you think if I try to specify the primary key myself? 

Let me ask this as an audience question here. Feel free raise your hand. What might go wrong if I try to make the primary key myself? Insert a value myself? Let's go to [INAUDIBLE]. 

SPEAKER 1: Yeah, what we could do when we have to lower a lot of values, like maybe like big data or something, is there a way to add it to a CSV file or something rather than typing insert into tables? 

CARTER ZENKE: A good question. I think you're onto something here, which is we're inserting one row at a time, which could get easily repetitive. So let me hold that thought for just a minute here and focus on the primary keys before we see some more efficient ways to actually add data to our table. So I'm with you on this idea of maybe we don't want to duplicate this primary key, right? Maybe we could do a little better than that. 

So I'll come back to my computer here. And if I specify the primary key, I might actually add a value that's already in there. So I could, thankfully, leave it up to SQLite to actually increment this value for me. Let me try leaving off "id" in this case. I'll omit the "id" altogether, which seems at first like a bit of a bad thought, like shouldn't every row have its own id? Well, let's just try and figure out what happened. I'll hit Enter here. 

And now I'll say, let me give the VALUES. Well, now the first column is the title column. So I'll go ahead and say that this one will be called "Spring outing." We're going to add this one to our collection here. The accession number, in this case, is 14.76. And we acquired this one all the way back in 1914-01-08 semicolon. 

So I'll hit Enter here. And notice how there's no ID that I have specified. But now if I hit Enter, seems to work. I'll type SELECT star FROM "collections," and what do I see but the new primary key of 3. So it seems like SQLite actually increments the primary key for me. 

If I add some new row, it looks, what is the current highest primary key and adds 1 to that automatically for me. And I've gotten that by specifying in my schema this primary key constraint down on line 6. So pretty handy for me here. I see a few other questions too. So let's take those before we move on as well. Let's go to Andre. 

SPEAKER 2: I just want to ask if I delete, let's say, the first record and type again, will it number it on the fourth number or the first one? 

CARTER ZENKE: A great question. We'll see this a little bit later on too. But if I delete some row, let's say, I deleted the row with primary key of 1, well, what SQLite will do, at least by default, is actually take the highest value. Let's say the highest value is still three in my database. It will add 1 to that, and I want to insert that new row it'll have the ID of 4 in this case. 

OK, so let's explore a bit more some of these constraints on tables. Here, we're talking about the primary key constraint, but we also have, as we saw before, the NOT NULL constraint and the UNIQUE constraint. So let's try inserting given those constraints here. 

Go back to my computer. And again, notice in my schema, I specified that this "title" column should always be NOT NULL. It should never have a NULL value inside of it. And similarly, "accession_number" should also be NOT NULL. 

It should also be unique. I should have no two rows that have the same accession number. And maybe kind of a playfully, subversively here, let's try to run against this constraint. Let's actually try to add the same accession number and see what happens. 

So I'll come back to my TERMINAL. Let me just reinsert, let's say "Spring outing." Notice how if I SELECT star FROM "collections," it's already in here. But I'll try to add it yet again with the same accession number. 

So I'll say INSERT INTO "collections." And by now, this is hopefully becoming a little more familiar. I now want to say the columns I want to add data to. So I'll say the "title" column, the "accession_number" column, and also the "acquired" date I'll hit Enter. 

And now I'll say the VALUES again. Let's reinsert "Spring outing." I'll say "Spring outing" as a title, the accession number is 14.76, and the date it was acquired was again 1914-01-08 semicolon. Now if I hit Enter, what do we see? 

Runtime error-- UNIQUE constraint failed. So it seems we ran against this constraint here that we specified on line 4 here. Accession number should always be unique. But by trying to add a new row that had that same accession number, we ran into this runtime error and our operation was not completed. 

If I say in this case, SELECT star FROM "collections," notice how I guarded myself from adding Spring outing more than once with the same accession number. So some usefulness here to these kinds of constraints. 

Let's try violating NOT NULL too. I'll try adding a title that is actually null, non-existent. It's trying to add a painting without a title itself. Let's try this. 

I might try INSERT INTO the "collections" table, and I'll add to the "title" columns, the "accession_number," and also the "acquired" column. Now, I'll say the VALUES here. But remember, this null value. Null meaning nothing. This value doesn't exist. 

I could insert that into my table, or at least try to. I'll say NULL is a title. And similarly, NULL is the accession number. Now let's say, just playfully, we got this painting back in 1900-01-10 like this. OK now, semicolon. 

I'll hit Enter, and we see the same runtime error-- constraint failed, in this case, the NOT NULL constraint. So notice again how in my schema, I specified "title" should be NOT NULL. But here when I try to insert a null value, I run into that constraint and I can't insert that value. If I try to SELECT star FROM "collections," I should see in this case that insert didn't work. So these constraints are really guardrails against me, against other users to adding values that we really shouldn't add to our table. 

OK, so presumably here, we're onto something by adding some rows. But if the museum acquires more than one item, maybe 100 at a time, I don't want to be the programmer who's sitting there writing 100 INSERT INTO. That's like not what I want to do. There's probably a better way to do this. And we show you one way to do this. 

One way is to instead of inserting one row, like this, to instead insert multiple separated by commas. So here, I might say this is my first row for each of these columns. This is my second row, again, for each of these columns, and so on and so forth. And it's worth noting, this is not just a manner of convenience. Like if I tried to insert 100 rows, this is certainly convenient for me. 

But it's also most often in most cases going to be faster for me to insert 100 rows in one go, than to insert one row 100 times. So this is both a convenience thing for programmers and also an efficiency thing to actually optimize your database as well. So let's try this syntax now to avoid me sitting there for hours and hours writing many insert interviews. 

I'll go back to my computer. And this time, let's say we got two paintings at once. We got two. So I'll try to add both of them. I'll say INSERT INTO the "collections" table. And again, I want to keep adding to my "title" column, my "accession_number" column, and even my "acquired" column, like this. 

Now, I want to add in some values. So I'll type VALUES here. And as a matter of style, let me just say I'll make a new line. I'll Enter here. And now I can type each of my new rows on their very own line. 

So again, I'll type all of the values that I want in my first row. I might call this one "Imaginative landscape." We got this one back in-- actually, we don't quite know when we got this one. If I type 56.496 is the accession number, like, yeah, 56.496. 

The MFA actually doesn't know when they got this painting. So if they don't know, let's just insert NULL. This is intentionally left blank. This is my first row. 

Now, I want to add a second one in one go. I'll follow this up with a comma. And now, I'll type out my next set of values. This next one we acquired is called "Peonies and butterfly." Now, I'll say the accession number is 06.1899. 

And we got this one-- it was back in 1906-01-01 now I'll hit semicolon here. And now, if I hit Enter, nothing happens. But if I type Select star FROM "collections" semicolon, what do we see, but now two new rows being inserted. 

So a handy way to insert more than one value. And also, if you have a lot of values, a more efficient way as well. So we pause here. And ask what questions we have on INSERT INTO, Whether adding one row or adding multiple. Let's go to [INAUDIBLE]. 

SPEAKER 3: So imagine you are writing code or inserting a row. So by mistake, you have entered the wrong spelling of the title, so how we will rename it? 

CARTER ZENKE: A great question too. And often as is the case with me, I make typos all the time. I might add the artwork title and misspell it for instance. Well, in that case, I can't use INSERT INTO to correct it, but I can use a new statement we'll see a little later called update. 

And so with update, you can actually change the spellings of things. And we'll see that a little later today. But great question to peek ahead as well. 

OK, so let's keep going here. Let's think about how we can keep adding values to our data set. And so far we've seen INSERT INTO with one row and with multiple. But one more way to keep in mind is you might already have your data, perhaps in some other format. And one common format looks a bit like this on our screen over here. 

This file is called CSV for Comma Separated Values. Now, why Comma Separated Values? Well, let's just look at it. Here we see some presumably column or column names, like id, title, accession_number, and acquired, but what separates these column names? Well, it looks like commas-- "id" comma "title," comma "accession_number," and so on. Every row is still its own row. That makes sense here. But now, those row values are also separated by commas. 

Notice here, the first value before the first comma corresponds to the first column, this id is 1. Similarly, the next Comma Separated Value Profusion of flowers belongs to that next column here-- "title" as well, and so on and so forth. You could if you wanted to draw a snaky line to see how these columns correspond down our file here. 

Now, you might often have data already in this format. And it's actually pretty convenient to try to import this data into SQLite into your very own table so you can use SQL on it. So let's try doing that here, taking a CSV like this and adding it to our database. 

I'll come back to my computer. And at this point, I want to start over. Let's say, I didn't use insert at all. I actually got a CSV from the MFA of all the items in our collection. So I'll type .quit to leave this database. And now, I'll type rm mfa.db to remove this file all together. 

Now, let me show you mfa.csv, the CSV file I have that looks exactly like what we just saw. I'll type code mfa.csv. And now I can indeed see, I have an iD column, a title column, and accession_number, and the date these pieces were acquired all separated by commas. So let's say I want to quickly import this data into its very own database to run SQL queries on it. 

As it stands, I can't use SQL queries on this table because it's not inside of a SQLite database. But let me add one. I'll type SQLite3 mfa.db to remake MFA down below. Let me now introduce this new command that is not a SQL keyword, but is actually a SQLite command. 

And this command is called .import. .import lets you take a file, like a CSV, and automatically insert it row by row into a table of your own making or one you can let SQLite create for you. Let's actually create the table ourselves, and then insert this CSV using .import. 

So I'll go back to SQLite. Let me recreate the schema. Right now, there's no tables inside this database, but let me create one. I'll instead type .read schema.sql to read in that old schema.sql file, so I can have my own table here. I'll type that schema again. 

And now I see, I have that table back. But if I type SELECT star FROM "collections" semicolon, nothing is inside. Well, I could fix this. I could to import this CSV into collections to insert all at once from this own-- from the CSV file we have right here. 

So to do this, let me type .import. Now before I finish this off, there are a few arguments or options I can give .import to make sure it works properly. The ones for this look as follows. 

I should type .import, and then --csv to say I'm importing a CSV. If I don't type --csv, SQLite I might assume something about this file that just isn't true. It should know that numbers are here separated by commas. Numbers or values, whatever's inside my table are separated by commas. 

And then --skip 1, well, let's take a look. If I go back to this file here, let's see. I have the first row, the second row, the third row, the fourth row. Are there any rows I probably shouldn't insert into my table here? 

Let me ask the audience. You're free to raise your hand. Are there any rows I shouldn't insert into this table? Let's go to Sukanya. 

SPEAKER 4: Yes, I think they shouldn't include the first one, because it doesn't give us any value that we need. 

CARTER ZENKE: Yeah, you're right. So if I look at this header row, as we call it, I see id, title, accession_number, and acquired-- the names for my columns. But notice in my schema, I already have those column names existing. So I shouldn't insert the value id into my "id" column. I want to just skip that row and only do the next one. 

So here by typing --skip 1, I'll skip that header row. So let's try this now. I'll say .import --csv. I'm going to import a CSV. Now, --skip, how many rows should I skip? 

Well, just one, the header row there. Now, I say the file to import. So I'll type mfa.csv, the CSV file to import into my database. Now, I type the name of the table I'm importing into, collections, in this case. And notice how I no longer need to quote collections or quote mfa.csv because this is not a SQL statement this is a SQLite statement, so I can get away with not quoting anything here. 

Now, I'll hit Enter. And again, nothing seems to happen, but it's probably a good sign. Let me try selecting star FROM "collections" semicolon, Enter. And now all my data is just magically in there. It's went from my CSV into my very own table. 

OK, so that's a pretty nice step forward. No longer do we have to use just single line inserts or even bulk inserts, those multi-line inserts. We can now just import an entire table from a CSV. But I think I've been showing you a bit of an inaccuracy here, at least that's not often going to happen to you. 

If I go back to this file here, this CSV. What you might notice is that I specified the primary key. I said id of 1, id of 2, id of 3. But if we'd heard before, this might not be the best design. Because what if I import this CSV, and there's already an item that has the id of 1 or the id of 2. 

Ideally, I could let SQLite create its very own primary keys for each of these rows. And more often than not, you'll likely have a database or CSV that looks a bit more like this without the primary key. But you'll still want to have a primary key when you import into your database. So let's try working with this kind of CSV now and having SQLite generate some of the primary keys for us. 

Come back to my computer. And let's update our mfa.csv just to remove those primary key column here. So I will open up again mfa.csv. And let me try to just delete this column all together. I'll remove id, remove 1, and 2, and 3, and 4, and 5. 

Let me save it, and now that id column is gone. But now there's a problem. Like if I go to my schema, and I type .schema, notice how there are four columns in this table-- "id," "title," "accession_number," and "acquired." Well, now my CSV only has three columns. So if I try to import this CSV into this table, I'm going to run into trouble, because I have different numbers of columns between my CSV and my table. 

Well, what can I do instead? It turns out that I could actually import this data into a temporary table, and then take that data and insert that data in that temporary table into my real one called "collections" here. I can use both .import and INSERT INTO to accomplish that task for me. 

So let's try this. I'll say .import. I want to import a CSV now-- --csv. Which one? Well, mfa.csv in this case. 

Now, though, I want to create a brand new table that doesn't yet exist. So I'll type the name of the table. And I just call it temp for now to temporarily import this data into a brand new table. And notice how here I'm not using --skip 1 because now I want to take advantage of those header rows. 

SQLite, if I import into a new table, will notice I have a header row and make my columns the very same names that are in my header row-- title, accession number, and acquired. I don't want to skip them. I want SQLite to see them and create this table using those header rows. 

So now, I'll hit Enter, and nothing seems to happen. But if I type .schema, what do I see, but a brand new table called "temp" that SQLite has just made automatically and notice how it used that header row. I have "title," "accession_number," and "acquired" as my column names. 

OK, let's now look inside the temp table, I'll say SELECT star FROM "temp" semicolon. And now, I see all that data in there. But I don't have primary keys yet. So my goal is really to take all this data and insert it into my collections table to give it that primary key I've been wanting here. 

Again, if I type SELECT star FROM "collections," well, nothing-- oh, something is still there. Let me actually just delete this for demonstration. I'll say DELETE FROM "collections" to remove it all together. We'll get that in just a minute. 

And now let me try, SELECT star FROM "collections," and I should see nothing inside of collections. So what could I do? I could INSERT INTO collections using the data from my temp table. So I'll try that. I'll say INSERT INTO "collections" and choose those columns yet again. 

I'll say the "title," and the "accession_number," and now let's go for the "acquired" column as well. I'll hit Enter. But instead of typing many new lines of values, I actually have a way to dynamically select all the values I want to insert, and it looks a bit like this. 

I could say INSERT INTO some table and some columns of that table. But I want to insert the results of this SELECT down below. So I'll SELECT some columns FROM some separate table. And so long as these columns align, I'll be able to actually take the results of that SELECT and insert all of them into this new table using INSERT INTO. So let me finish my statement here, we can see the results. 

Come back to my computer. And here, I'll type a SELECT to get back all the items from my temp table. I'll say SELECT-- in this case, SELECT, let's go for the "title" column, the "accession_number" column, and the "acquired" column from my "temp" table. Now, if I hit semicolon, I should see-- well, nothing at first. But if I type SELECT star FROM "collections," I see all my data now in there selected from my temporary table. 

And now if I type .schema, I still see temp. But what can I do now? I could just delete that table altogether. 

I could say DROP TABLE "temp," as we saw last week, semicolon, hit Enter, .schema again. And now, we're back where we want to be with a single table, and now we've import our data from our CSV with primary keys. If I type "collections" here, we see it all in this table. 

OK, so we've seen several versions of import, one to import into an existing table and one to import into a brand new table. Let me ask then what questions we have on how to use .import or how to insert more data into our database. Let's go Louise. 

SPEAKER 5: Can we ask the INSERT INTO command to place a column in a specific position? 

CARTER ZENKE: Good question. Can we ask INSERT INTO the place a column in a certain position? There's a few ways of getting at what I think you're asking here, so let me show you a few of those here. I'll come back to my computer, and let me pull up the syntax for INSERT INTO again. 

I will choose this, and I'll show it on the screen over here. So I'll say INSERT INTO some table given some columns. And notice how down below, I actually have values to insert into those columns. I could, to your question, rearrange these values. 

So I might have this first value here is some value that goes into this column. The second column here is attached to that column. And that would rearrange the values I insert into those columns. 

If you're asking, though, if I can reorder the columns, that is up to CREATE TABLE and CREATE TABLE only. And in general, I encourage you not so much out of ordering of your columns, because it could be in any order whatsoever. But you could just rearrange your selection of columns here to insert the data you want to insert. OK, let's take one more question here. 

SPEAKER 6: What happens if one of the rows you're trying to insert violates a constraint on the table? 

CARTER ZENKE: Yeah, so here we're inserting multiple rows. And if one actually violates some constraint, then we won't actually insert any of those rows. And this is because this insert is bundled inside of a transaction, which we'll learn more about later on in the course. Let's take another one. 

SPEAKER 7: I noticed that when you did SELECT star FROM "collections," that one of the acquired dates was just blank. It didn't say NULL. Is that because the CSV itself just had an empty value next to the comma? 

CARTER ZENKE: Yeah, great observations. So let me try this again. So we can see the results of this SELECT. I'll come back to my computer here, and let me show you again. If I say SELECT star FROM "collections" semicolon, notice how this acquired column-- this acquired cell here, if I go back to my screen, it's just blank. 

But we saw before that if I selected star, I would have seen NULL there if this value was truly NULL. Well, it turns out one downside of importing from a CSV is that all of your data is imported initially as text. And if I have just a blank cell in my CSV, it won't be converted to NULL automatically. 

I need to do that myself perhaps using an update statement that we'll see in just a bit. So be wary of this if you do want to keep track of null values and so on. If you don't actually manually make this value a null, it'll appear as just a blank value not a null, which is different in this respect. 

OK so here, we've seen how to insert not just one row, but multiple, and also how to import data from a CSV. When we come back, we'll see how to actually update our data altogether, and even delete it too. So come back in just a few and talk about how to delete data from our tables. 

And we're back. So we just saw how to insert some rows into our database, and also to import some data. But presumably, we also want to be able to delete some data from our table as well. You can imagine the MFA, the Museum of Fine Arts, maybe they're selling a piece of artwork, or maybe they've lost one, or maybe it was stolen. But either way, they want to remove the row from their table. 

Well, let's see here. We can go back to our schema with a table of artifacts and artwork that is inside of the MFA. Now if I want to delete a particular piece, you could visually think of it a bit like this. I could first identify that row I want to delete. Let's say it's Spring outing here. 

We've sold this piece. Well, I could visually just remove this row. So it's no longer there and shift the remaining ones up metaphorically. And it turns out that to do this in SQL, we have our very own statement we can use. This one reads DELETE FROM some table WHERE a condition is true. 

So we see our old friend WHERE back again. And WHERE is vitally important to this DELETE FROM. If I say DELETE FROM table with no WHERE, what do you think is going to happen? We might drop everything from our table, right? But if I instead say WHERE some condition is true, I can select the rows I want to delete and only delete those rows. 

So let's try this here. I'll try to delete some artwork from our collections table, for example, maybe we sold it and we want get rid of it. I'll come back to my computer over here, and I will open up our SQLite database. I'll say sqlite3 mfa.db. 

And now, I can type SELECT star FROM "collections," and I see a few more items than last time now all inside of our table. But per the visual, I want to delete Spring outing, to remove it from this table. So what could I do? 

I could try DELETE FROM and then the table I want to delete from. In this case, I'll use the let's say "collections" name, DELETE FROM "collections." But I don't want to do this. I don't want to say DELETE FROM "collections" semicolon end of statement, because then I'll delete every row. 

What I should instead do is this. DELETE FROM "collections" WHERE some condition is true. Maybe the title in this case is a title I'm trying to remove. So I'll say "title" equals "Spring outing," just like this, semicolon, Enter. Now, nothing seems to happen. 

But if I do SELECT star FROM "collections" semicolon, we no longer see Spring outing. And notice here, how the id column, the id of 3 is now gone. If I were to insert some new row here, I would start over with the highest number, which is 7 in this case, going from 7 to 8. But 3 is no longer part of our database. 

Let me try this again, notice how this one called Imaginative Landscape, the acquired date is NULL. We don't know when we got it. Well, let's say that we eventually sell. It's no longer part of our database. We could use a condition based on NULL to remove this particular artwork. 

So I'll say this. Let me try to DELETE FROM "collections" WHERE "title" equals-- not Spring outing, but WHERE the acquired date-- sorry, the "acquired" date IS NULL semicolon. I could also use is NOT NULL-- WHERE "acquired" IS NOT NULL, but that would delete all the pieces of artwork that actually have an acquired date. Here, I want to leave only those that do not have an acquired date. So I'll say where "acquired" IS NULL. 

Now, I'll hit Enter, nothing seems to happen, but I'll say Select star FROM "collections" semicolon. And I'll see that piece of artwork is now gone. I only have those that have an acquired date. 

OK, so similar to INSERT, we've been able to delete one row at a time. But it would probably be helpful for us to delete multiple rows at a time as well. Let's look again at our table. Here we have our artwork, as it currently stands, and maybe let's say we want to delete those pieces that we acquired before 1909. Here in Boston, the MFA actually moved locations in 1909 to a new place still in Boston, but a brand new building altogether. 

So let's say that perhaps they left some items at their old location, they're no longer part of their collections. Via what condition do you think could I select these three rows from my DELETE FROM? What could I put in my condition to delete these three rows? 

Let me ask the audience here. What might you propose for DELETE FROM here to delete those three particular rows? Let's go to Shiv. 

SPEAKER 8: Greater than 5. Id equals greater than 5. 

CARTER ZENKE: That's a good observation. So we could look at our table here. We see the id could be greater than or equal to 5, that would remove these three rows certainly. 

There's probably another way to do this too. Any other ideas besides the id, what else can we use? Let's go to Yashvi. 

SPEAKER 19: So we could compare the dates and use greater than or less than sign with the acquired dates. 

CARTER ZENKE: So you're proposing to use the date column too. That makes sense to me as well. We could see acquired here, perhaps we could see, is the acquired value less than 1909, that year that we move locations, we could also probably use the title, the accession number, and so on. And each has their own trade-offs. 

Remember, though, for this particular query, which is deleting those items that were part of the museum acquired before 1909, probably best to use the acquired date to actually delete those rows. What if, for instance, our ids are actually not like this. But they're kind of interspersed around. I couldn't really do a query like if the id is greater than or equal to 5, because I could also include those that were maybe acquired 1911, 1956, and so on. Probably ideal for me to use this date if it's the date that I ultimately care about. 

So let's try this. I'll go back to my environment. And I'll try to delete these three particular rows. I'll try DELETE FROM "collections," as we saw before. But now here comes my condition. I'll say WHERE the "acquired" column is less than some date that I'll give. 

And here, I'll give 1909-01-01. It turns out that SQLite has a few ways to represent dates. One of which is this format that follows the YYYY or the year, four digits, and then dash, and then MM or the month with two digits, and then dash again, DD, in this case, the day in two digits. So 1909-01-01 means January 1st, 1909. 

And I can use these same operators with dates. I could say greater than, or less than, or equal to this particular date. And SQLite will be able to parse this for me to understand what I mean, which is earlier than 1909. 

So let's try this then. I'll hit Enter, and nothing seems to happen. But if I do SELECT star FROM "collections," I now see I'm down to two paintings, two pieces of artwork that were acquired only after 1909. 

OK, so given this part on DELETE, we're able to delete one row or even multiple rows at a time. But there are still instances where you might want to think about, should we delete something or can we delete something. We'll talk about those in just a minute in terms of constraints. We back in a few. 

And we're back. So we just saw how to delete one row, even multiple rows. We haven't yet talked about whether we should delete some data or whether we should delete some data and particularly in the context of these constraints, where you might have maybe a piece of data you actually shouldn't delete to maintain the integrity of your table. 

Now, one example of this is a Foreign Key Constraint, which means that you have some table with a primary key that is referenced by some other table. And we'll get concrete about this in a minute. But it might mean that if you were to delete that primary key, that other table would have nothing to reference. 

So let's update our schema here for the MFA, the Museum of Fine Arts. And let's say that they have not just a collection now, but also artists involved. And there exists a many-to-many relationship among artists and items in the collection. We could say that an artist might make more than one piece of art in the collection. And a piece of art might be created by more than one artist. 

Now concretely in our database, this might look as follows. I could have a table for artists and a table for collections. And notice how each has their own primary key. Artists has an id column, called our primary key column. Collections too has an id called our primary key column here. 

Now, in the middle is this table created that symbolizes, that represents the relationship among artists and items in the collection. Here on the first row, we see the artist id of 1 created the piece of art with the collection id of 2. So who did what here? Well, we can see that the artist with the ID of 1 is Li Yin. Li Yin created this piece of art, the id of 2, which is Imaginative landscape. 

So we can kind of relate in this case artists with collections as we saw just a few weeks ago. But let's say we decide to delete a particular artist, maybe we delete Unidentified artists down here. Well, we could just delete from the artist table. Maybe we find a condition to select this row and we delete from artists. We'll do that here remove that row. But what have we done wrong? If we look at our created table in particular, what have we now done wrong? What kind of problem might we run into? Let's go to Karim. 

SPEAKER 10: [INAUDIBLE] was only deleted from the artists table without the other table. 

CARTER ZENKE: Yeah, good point. We only deleted the artist with the id of 3 from the artists table. Now if I look at the created table and I look for the artist with id of 3, do you see them? They don't exist anymore. 

So I can't understand this relationship. I don't know the artist with the id of 3. So let's try this now with a new kind of schema now in our database. 

I'll go back to my computer. And let me open up our new version of our database. I'll type sqlite3, in this case, mfa.db, same name, but now a different schema. If I type .schema, notice a few things here. I have those same tables we saw visually. 

I have collections, artists, and a created table. And let me focus in particular on this created table. If I say .schema, in this case, created, Enter, I'll see I have a FOREIGN KEY constraint on my "artist_id" column and my "collection_id" column. So if I tried to delete an artist that is referenced in my created table, I would probably raise a constraint error-- a FOREIGN KEY constraint error. 

So let's just try this and see what happens. Let me try DELETE FROM the "artists" table, DELETE FROM "artists" WHERE, let's say the "name" of the artist is literally "Unidentified artist." This is the name they have in the Museum of Fine Arts collections-- Unidentified artist. 

So now I want to delete them from that table. Well, if I hit Enter, I do get that FOREIGN KEY constraint error. So does this mean I just can't delete this artist or is there a workaround? Let's look at it visually again. 

I'll go back to what we had before as our table, and let me ask the audience here. What's the solution? If I can't delete Unidentified artist, because they have an id referenced by this table, what should I maybe do first? What should I do instead? 

SPEAKER 11: Yeah, I want to say that first we can delete the id that is being referenced by this id that we want to delete, then we can delete it. In fact, if we want to delete it from the artists, first we have to delete from the created table [INAUDIBLE] table, which it is referencing, then we can delete from the artist tables. 

CARTER ZENKE: Yeah, I like your thinking here. So if we look at the created table, you notice that we can't delete Unidentified artist, because it's referenced in the created table. What we should maybe first do is delete this row delete their affiliation with their work, and then delete the artist so we don't run into this FOREIGN KEY constraint. So let's try this in our SQLite environment. 

Come back to my computer. And again, our plan was to first delete the artists affiliation with their work from the created table. So let me show you first the created table. I'll say SELECT star from "created," and see how it's exactly like the table we just visualized. Now, let me try delete this artist affiliation with their work. 

I'm not deleting the artist. I'm deleting their affiliation with their authorship, their artist-ship, whatever you want to call it for their work. Now, I'll say DELETE FROM-- let's say, DELETE FROM "artists" WHERE the "artist_id" is equal to or equals 3. 

Well, I could do this. But there's probably a better way, like I know the id, and that's fine. But we could also use, as we saw a little bit ago, which were subqueries, a way to write a query that returns me the result. And that result gets included in another query altogether. So let's try that instead. 

Let's say I don't know the artist_id. What I could do is make a subquery. I could say inside of this, let me select the "id" FROM the "artists" table WHERE, in this case, the "name" equals "Unidentified artist," and then, let me close out this query, hit a semicolon. And I already see a typo in this, but I'm just going to try it anyway. I will go ahead, and hit Enter. 

And let's see if I type SELECT star FROM "artists," Enter, I still see Unidentified artist. And I think my typo here was as follows. If I up my query using the up arrow on my keyboard, I see DELETE from "artists," where the "artist_id" equals some value here. But do I want to delete from artists? 

I don't. I actually can't delete from artists. What I should do instead is delete from the created table. So let's try this. I'll say DELETE FROM "created," making sure I only have double quotes around created, then I'll say WHERE the artist_id equals, let's say, either 3-- but it could also be the result of this subquery. 

I'll say 1, 2, 3, 4, and then SELECT the "id" FROM the "artist" table WHERE the "name" equals "Unidentified artist." Then I'll close this subquery, hitting semicolon, hit Enter. And now, if I say SELECT star FROM "created" semicolon, I should see that this artist's affiliation with their work no longer exists. 

And because it no longer exists. I can now delete them from the artists table. I don't have this FOREIGN KEY artist_id referencing the primary key of this artist. 

So let's now try that. I could say SELECT star FROM "artists," I see that Unidentified artist is still in here. But let's delete them now. I'll say delete from, let's say, "artists" WHERE the "name" equals "Unidentified artist," closing my quotes, semicolon, hit Enter. And now if I Select star FROM "artists" semicolon, I should see they're no longer in this table to our earlier example. This becomes a 2-step process. First delete their affiliation, then delete their name. 

OK, So let's try this yet again, but now using some additional tools that we have at our disposal. So let's look back at this FOREIGN KEY constraint that existed in this table. If you remember, it looked a bit like this. In my created table, I had this line. FOREIGN KEY, artist_id, REFERENCES the "id" column in "artists." 

Well, if that is the case, I can't delete the artist with the id referenced by this foreign key. But it turns out that I could specify some alternative action that happens when I try to delete the id that is referenced by this foreign key. I can specify that using a keyword called ON DELETE. So after FOREIGN KEY REFERENCES "artist_id," I could say afterwards ON DELETE, and then specify some action I want to actually happen when I tried to delete the primary key referenced by this foreign key. 

One thing I could use would be RESTRICT, which is kind of like what we saw before. If I try to delete a primary key is referenced by this foreign key, I will not be allowed to do it. That action is restricted. 

I could also decide to take NO ACTION. In this case, I could delete the primary key referenced by this foreign key, and nothing would happen. I would actually be allowed to do that, which may be unwise in some cases, but I could give myself that power. 

I could also decide to SET NULL. That is if I delete the primary key that is referenced by this foreign key, what I'll do is set the foreign key to be NULL, meaning that value no longer exists. I can alternatively set it to a default value for that column. 

Or perhaps, most compellingly, I could try to CASCADE the deletion. Where CASCADE means if I delete the artist, for instance, let's also delete their affiliation with their artwork all in one go. This converts our 2-step process into a 1-step process. So let's visualize this. 

Let's say we have now applied this constraint ON DELETE CASCADE. So if I delete the artist, I'll also delete their affiliation with their work. We'll have the same two tables-- artists, collections, and created. Now again, artist_id references this primary key in artists. 

Now, I'm going to try. I want to delete Unidentified artist here. Well, I could do that. I could just delete their row. And now, instead of a FOREIGN KEY constraint error, what I get is the following. 

I see and created that this row is also gone. The row that had the artist_id of 3 gets removed. We've cascaded the deletion onward to the created table. So let's try this now with a new database schema. I'll go back to my computer here. 

So I'm back in my TERMINAL here. And I can type sqlite3 mfa.db to reopen this database. And notice how if I type .schema, I've updated this schema to now have ON DELETE CASCADE. Let me show you over here on the screen. 

Here on the created table, I now have the very same kind of table schema, but now my only difference is I've applied this ON DELETE action to my foreign key constraints. In particular, I'm going to cascade the delete from the artist's table to the created table. So let's try that out in SQLite. 

I'll come back over here, and I will now try just DELETE FROM the "artists" table WHERE the "name" equals "Unidentified artist. semicolon. Now, I'll hit Enter, and I don't get a FOREIGN KEY constraint anymore. But if I say SELECT star FROM "created" semicolon, notice how I've also deleted the artist's affiliation with their work. So that is wherever in the artist_id column, I saw the id for the artist I deleted, I would to delete that row, so I have no references to that primary key, which is now gone from my table. 

So let me ask here. We've seen how to delete single rows, how to delete multiple rows, and now how to delete data among some constraints, like our foreign key constraints. What questions do we have on those techniques? Let's go to Han. 

SPEAKER 12: I have a question regarding deleting. The id numbers have been removed while we were removing the record. I was wondering if do we have to clean it up somehow later or will they be populated with the new data as it comes along? 

CARTER ZENKE: Yeah, great question. So the question is, what happens to our primary keys when we delete our data. So for this one, let me show you the visual again that we had before of our tables nicely printed on the side. So I'll come back here, and I will go back to our idea of these joint tables where we had an artists table, a collections table, and a created table. 

And we saw before that we're going to delete this artist called Unidentified artist. So I'll delete them, and to your point, well, the idea of 3 no longer exists in this case. Now by default, at least in SQLite, if I insert a new row, what I'll do is take the highest id value and I will then make that the new primary key for the new row that I insert. That is the default situation. 

I could if I wanted to get more specific, and if I had done in my id integer column here in my create table, if I had also said this keyword called autoincrement, all one word, what would happen instead is I would actually reinsert an ID that is not used. So in this case 3 is not used, I could insert that one here. So up to you what you want to do. 

In general, SQlite by default will take the highest ID, add one from there. If you specify this constraint called autoincrement, you will instead take whatever id is not being used and use that in your insert afterwards. 

OK, so now we've seen how to insert, and how to delete data. But of course, we make mistakes when we add data or even when we delete data. So we'll see in just a minute how to update our values as well to correct typos and even to update associations between artists and artwork. 

And we're back. So we've so far seen how to insert some data into our tables and how to delete it. But sometimes, we don't want to fully delete something, we just want to change its value to correct some typo or correct some association. 

So let's think to our MFA example, where the Museum of Fine Arts has some tables that look like this. They have artists in their collection, and they also have art work in their collection. They also have a created table to associate artists with their artwork. 

Now in this case, we know that I have this Unidentified artist. And we can see that they authored this item in the collections, "Farmers working at dawn." But let's say that later on, we find out it wasn't an Unidentified artist. It was instead Li Yin, who created "Farmers working at dawn." How could we update our tables to make sure it's Li Yin, who we have creating "Farmers working at dawn." 

SPEAKER 13: So what we could do is-- we know that in created, we have our artist_id and the collection_id. So we could update it there, but somehow we need to cascade it over to the artists table because now this unidentified artist through-- I think we probably don't need it. 

CARTER ZENKE: Yeah, so a good point here. We could actually probably change the created table to re-associate an artist with some new artwork here. So let's visualize this. I go to my created table. And here the artist_id is currently 3, but I want it really to be 1. I want Li Yin to be associated with this piece called "Farmers working at dawn." 

So I could update the created table to instead of having 3 here have 1. Now, we see Li Yin created "Farmers working at dawn." And to your concern about this Unidentified artist here, I think it's OK to have an artist in our table who may or may not have an item in collections. We'll say that's OK at least for now. 

So if we can update our associations between artists and collections like this, let's actually try to do that in our very own database here. I'll go back to my SQLite environment, and now, let me try to open it first. I'll use sqlite3 mfa.db. And now let me type .schema to show you I have the very same schema from before. So I want to update the artists association between Li Yin and "Farmers working at dawn." 

So let's say, I'll SELECT star FROM "created," like this, semicolon. Here, I have my artist_ids and my collection_ids. I see that we have the Unidentified artist, id of 3, creating "Farmers working at dawn," with the id of 1 in our collection here. 

So now, let's try to update the artist who created this particular painting. Here, I have 3 associated with 1, but I want 1 associated with 1, where 1 is the id for Li Yin. And 1 in my collection id is the id for this artwork here. So let me try this. 

I'll say UPDATE "created" and SET, let's say, "artist_id" equal to some particular value. Well, what value should I set it to? I could try to set it to Li Yin's id, which we know is just 1. But let me try instead to use a subquery here. 

Let me try to say parentheses, and then write some query to update this value. I'll say SELECT "id" from "artists" like this Enter again, WHERE the "name" equals Li Yin. And then let me close this subquery. Well, if I try to run this query, what might Happen I'm updating the created table, I'm setting the artist_id equal to the id for Li Yin, but what I've forgotten is this WHERE to only choose some rows to update. 

So I'm going to not close it yet. I'll instead say WHERE, in this case, the "collection_id," the piece of artwork in our collection, is equal to, well, the id for this painting. I'll say SELECT "id" from "collections," and then I'll say WHERE the "title" equals "Farmers working at dawn." 

Now, I'll close this subquery, and hit semicolon. And here, we've seen our first example of an UPDATE query. I'm trying to update the artist_id column in created to be the id for Li Yin. I only want to do that though on the row where collection_id is equal to the id for this particular painting we want to change the attribution for. 

So now, I'll hit Enter. And if I SELECT star FROM "created," I should hopefully see in this case that the artist_id associated with this painting is 2 and also down below here is 1 as well. So I have Li Yin associated with now two paintings overall. 

So let's get a grasp on what this UPDATE syntax really looks like in general. And for that, let's show the slide here. We have this UPDATE keyword, UPDATE statement in SQL to take a table name and update the columns inside of it. I say UPDATE, then the name of the table I want to update, then I say SET some column equal to some value. 

I could if I wanted to have more than one column here. I could say maybe title and even maybe if we're talking about authors, authors over here, or even acquired date. I could update more than one column in my SET portion here. 

Then comes this WHERE portion, where some condition is true. I want to make sure I don't update all of my rows. I only update those where some condition is actually true. So this is your syntax for updating some columns. Let's say, if you want to change an artist's attribution or if you want to change a typo you've made. 

So let's see this now, not just in terms of changing artists and their attributions, let's see a use case for update where we've made some mistakes in our data. And let's say the museum decides to host some kind of event where people vote on their favorite piece of artwork. They handwrite or type it into some online form. Well, when I get back, a CSV of those responses, some Comma Separated Values, one line for each vote from our people who've attended this convention. 

Let's go back over here, and I'll show you that CSV. Let me go to my environment. And I'll type code mfa.csv to open this CSV that I already have. And here, actually, it's not called mfa.csv. 

It's called code votes.csv. And now here we can see, I have a table of one column that has several votes inside of it. Let's see, we have maybe 20 votes to be exact. 

So the first row is the header row. I have in this CSV, one column called title. And each line here is one vote for a museum-goers favorite piece of artwork. We see "Farmers working at dawn," we see "Imaginative landscape," "Profusion of flowers." 

And our goal is to count up these votes to see which is the most popular. Well, let's try using .import again to actually turn this CSV into our very own SQLite database. 

Let me go back to my TERMINAL, and I'll type sqlite3 votes.db to create a votes database. I'll hit Enter. And now, I can use .import. 

I could say .import a CSV called votes.csv into a table also called votes. I'll Enter now. And now if I SELECT star FROM "votes" semicolon, I'll see all of my votes now in a single table. 

So I want to tally these votes. And we could do that using a technique we learned back in the week on querying and relating. What if I wanted to group by my titles and count up those groups, I could do that. I could say SELECT, let's say, "title," and also the COUNT of "title," and they count up the votes for each title FROM, in this case, my "votes" table like this. 

Now, I want to count not just each vote individually, but I want to group them. I want to count votes by group where that group is, in this case, the "title" itself. So let me try running this query and hitting Enter. 

Well, I'll see I've started to count some votes, like I have Imaginative landscape here, Spring outing, Farmers working at dawn, Spring outing down below here too. But I seem to be missing something, like I shouldn't have this many groups. I should only have four groups. Why do I have so many? 

Well, it seems like there are some typos here. What typos do you see? Feel free to raise your hand. 

SPEAKER 14: So I think that it is a title-- when you count it is case sensitive, so maybe it is counting one by one. 

CARTER ZENKE: Yeah, good idea. So maybe it's case sensitive our GROUP BY, and you'd be correct if you said that. GROUP BY is, in this case, case sensitive. So if we see here, we see Farmers Working at Dawn, all capitalized, at least the Farmers Working and Dawn capitalized. Well, that won't be part of the same group as let's say Farmers working at dawn all properly capitalized, in this case, with lowercase working and lowercase dawn as it is the MFA collections. 

So we could probably fix some capitalization here. What other things could we fix? So let's hear from one more person. Let's go to Lawrence. 

SPEAKER 15: The count on each typo. 

CARTER ZENKE: There are some typos in here, absolutely. So if I look at some of these, I see not "Farmers," but "Famers" working at dawn. I might see instead of Imaginative landscape, I just see "Imagintive" landscape, so definitely some typos in here to fix as well. So let's start working on using our UPDATE tool or UPDATE keyword to fix some of this data, to clean it up, so to speak, so we can count up our votes for these artworks. 

Come back to my TERMINAL. And now, let's try updating to at least remove some of this whitespace here. You might notice that Imaginative landscape has a space before it, and so does Spring outing here. That space counts as part of the title, and I want to remove it. So what could I do? 

I could try using a function in SQLite, one called trim. Trim takes a string and just removes leading and trailing whitespace, giving back only the characters in the middle. So let me try that. 

I'll say UPDATE "votes," my table here, and SET the "title" column equal to what? Well, the trimmed version of title. Trim looks like this-- trim, and then some parentheses to say the input to this function. I'll give it the "title" column. Now, I'll hit semicolon. 

And this query will run as follows. It will update the votes table. And inside the votes table set, the title column equal to the trimmed version of whatever values are inside that column, trimming here means remove trailing and leading whitespace. I don't need to apply any condition. I don't need to say WHERE dot dot dot, because I want this to apply to all rows in my data set. 

Now, I'll hit Enter. And let me try running this again. I'll say up arrow to get back my old query. Now, I'll try counting again, and I think we're making some progress, like I can see that I've removed-- "removed," quote-unquote-- those titles that had leading or trailing whitespace. They're now part of a group because their titles now match those groups. 

Let's keep going. We noticed before, we had some capitalization issues. So let's fix those as well. I can go back to my TERMINAL, and we could introduce this other function, one called upper. 

Upper converts a string like Farmers working at dawn to be all in uppercase, kind of shout it out. And that's useful for us here because we could kind of force everything to uppercase, thereby removing any inconsistencies in capitalization. So let me try that. 

I'll say UPDATE, again, the "votes" table. And this time, I'll set "title" equal to the uppercase version of "title." So upper here is a function in SQLite that takes some values associated with the column and uppercases the entire thing to remove any inconsistencies in capitalization. Again, no condition here, because I want to apply this to all rows. I'll hit Enter. 

And now if I SELECT star FROM "collections" semicolon-- oops, SELECT star FROM "votes," in this case-- no longer as collections table. I'll say SELECT star FROM "votes" semicolon, I should see all of my titles capitalized. But now, I want to group by. So let's try this. 

I'll go up arrow to get back to my old query that GROUP BY "title," hit Enter. And now, we're still making some more progress. I'm reducing my number of groups until I get to 4. But there's still some typos, like we see Famers working at dawn, or Farmers working, or Farmesr working at dawn. We want these to be included as part of this group-- Farmers working at dawn. 

So we should fix some of these typos here. Let me go back to my TERMINAL. Now, let's try this out. I could, in this case, just manually correct these. 

You could imagine me doing something like this. UPDATE "votes" and SET the "title" equal to all caps "FARMERS WORKING AT DAWN," now including a condition, WHERE the "title" presently is equal to for "FAMERS WORKING" like this, semicolon-- or not Famers work-- I think it's Farmers Working, just all-- in two words. That was one title we saw. 

I could manually update it a bit like this. Let me hit Enter, and let me do the same query by hitting the up arrow on my keyboard. Let me find this one, hit Enter, and now we can see that that title, "Farmers Working," is now just part of the main group. "Farmers Working At Dawn." 

I could keep going here. I could try to convert "Farmers Working At Dawn" to "Farmers Working At Dawn." Let me try it. I could say select. Let's say-- not select. Let me update. 

Let me update in this case votes, and set title equal to "Farmers Working At Dawn" where the title equals in this case "Farmers Working At Dawn." Semicolon, Enter, let me hit up arrow to go back to my old query to group by, and now we're getting somewhere. I'm seeing now only two groups for "Farmers Working At Dawn," but one more to fix. 

Now we'd be here for a while if I spend all this time one by one fixing these typos, so there's probably a better way. There is. This one is going to be using the like keyword. We don't have to say equal so much. We can just say like. Now, I could try to match any title that looks like "Farmers Working At Dawn" and update it to be the correct title. 

So let's try that here. I could now try update votes and set title equal to "Farmers Working At Dawn." "Farmers Working At Dawn" where the title is like-- the title is like some pattern I could give. 

Now for this data set, it turns out that the only artwork that begins with F-A is Farmers Working At Dawn. So to match any title with F-A and then anything afterwards, I could use F-A percent, and then close out that pattern. 

Notice how I don't have to have consistent capitalization. Like is case insensitive, and notice how, too, the percent matches anything that comes after F-A. Now in this data set, this is OK, but in a larger data set, you probably want to avoid this kind of query because you could match much more than Farmers Working At Dawn. You'd match anything that begins with F-A, so be careful with this and think through what pattern should you use to fix up your titles. 

Let me just hit Enter here, and now let me try hitting up arrow again to show you the result, and now we're in a pretty good place. I see all of my titles like Farmers Working At Dawn are correctly capitalized, consistently spelled, and all in one place. 

OK, let's try the others here. I also have "Imaginative landscape which could be a little bit better designed here. A little bit better grouped as well by fixing some of these typos. Well, here, I could try to match on to maybe imaginative. That seems like a long enough word that it could be useful for me here. 

So I'll say something like select-- not select. I'll do update votes and set the title equal to "Imaginative Landscape." Now I want to match any rows that have a title like "Imaginative space percent," end quote, semicolon. 

Now I'll match any title that has imaginative as a full word at the very beginning of this phrase. This is, I would say, a better query than F-A because F-A could match much more than this painting name. Here, if we have only one painting that begins with imaginative, this could work for us now. 

So I'll try this. I'll say Enter, and now, let me rerun the query. Group by, and we're so close. There's still one more. "Imagintive Landscape," so for this one, I might just need to manually update that piece. 

I could say let's do update, votes, and set title equal to in this case "Imaginative Landscape" where, of course, the title currently is "Imagintive Landscape" without the A in there. Now hit semicolon to run this query. I'll show you the results, and now, I think we're literally almost there. 

There's only one more to do. Profusion of Flowers. Let's fix that one now. I could say select votes-- not select. Update. I'm always updating here. Update votes, and then we want to set the title column to be equal to profusion-- "Profusion of Flowers" like this. 

Then where the title is like, in this case, profusion at the beginning. Notice how I can't use perfusion space and then afterwards because we have one title that is literally just "Profusion," plain and simple. No spaces afterwards. No spaces at the beginning. To match that, I should just say profusion at the beginning. 

Now I can hit Enter, and if I do my usual query to show you the votes grouped by title, I see I'm finally at my correct, fully-formatted data set. Farmers Working At Dawn has six votes. Imaginative Landscape has five, and so on down the row. 

So here we've seen update being used not just to associate artists with new paintings, but also to clean our data, to make sure we can actually tally votes up in a way that's consistent and clean in this case. What questions do we have on update now? Let's go to [INAUDIBLE] 

SPEAKER 16: Yeah, so what I wanted to ask is if there is a specific function for like lowercase. So there's a function for upper, and these are the ones for all lowercase. 

CARTER ZENKE: Yeah, a good corresponding question here. We used upper, but is there lower? Turns out, there is a lower function. It's spelled lower, and is in all lowercase like upper was, so I could use that as well. 

There is a whole list of what SQLite calls these scalar functions, functions that take in some set of values and return to you some modified version of those values. So if you do some googling, look for SQLite scalar functions. You can find all those functions in one place in the SQLite documentation. 

Let's take one more question here. Let's go to Rupinder. 

SPEAKER 17: So, yeah, my question is so instead of updating all the titles, is it possible to create a new column with the four categories that we want to have actually? 

CARTER ZENKE: Yeah, I could see that working, too. So there's a few ways to fix this data. You could imagine adding a new column, and maybe you assign values to the rows in that column based on what you see in the title that they've given you. So if you can match something like imaginative, and you'd know that all your paintings associated with imaginative have the same title, you could group them into the one category, the two category, and so on. 

Here, you wouldn't be modifying those titles. You'd just be trying to strategically apply different categories, which could also work as a solution here, too, so good thinking. 

OK, so we've seen now how to insert data, how to update data, and previously, how to delete data. What we can still do though, are learn how to have other SQL command-- other SQL statements that can run after we make our very own SQL statements as well. We'll come back and talk about these things called triggers. 

And we're back. So we've seen now a whole collection of ways to write data to a database. We've seen how to insert data, how to update data, and even delete data. What we'll see now though, this idea of a trigger, a way of writing a SQL statement to run in response to some other SQL statement like an insert, an update, or a delete. 

Let's consider our museum yet again, and let's say they have a schema a bit like this one now. They have items in their collections, but they also have a transactions table. Now wouldn't it be nice if whenever I deleted something from the collections table, it would show up in transactions as sold, having been sold from the museum's collections. Well, I could use a trigger to do just that. 

Let's see it visually first though. I'll say, let's try to delete "Spring Outing" from our collections table. Well, if I do that, I could pretty easily do it with delete from, right? Now it's no longer part of my database, but if I have created a trigger to run an insert whenever I hear or have listened to a delete on my collections table, I could then see that same title in my transactions table as sold all automatically. 

Let's think to maybe the museum acquires some piece of artwork. Maybe they actually add something new to their collection. We want that to show up in our transactions as well with the action of bought. Well, let's try it. I could probably add a new row here. 

I could say let's add "Peonies and Butterfly" to my collection here. Now, if I have the appropriate trigger, I could run an insert into on transactions automatically that would insert this same title with the action of bot, thereby keeping a log of all of our sold pieces of artwork and all of our bought pieces of artwork. 

Now to create what we're calling a trigger here, we can use the following syntax. I could say first create trigger, and then some name. So I tend to give triggers names to identify them among all of my database schemata, like my tables and so on. 

Here, after I say create trigger, I have to specify. Should this trigger, this statement, run after or before some other SQL statement? Let's keep going with before here. 

Maybe it runs before an insert on some table. That's fine. It could also run before an update of a column on some table. It could even run before a delete on some table, and keep in mind, this could be before or after any of these kinds of SQL statements. 

In this case, we have insert, update, and delete. Now after we say this, what we should say, too, is for each row. For each row means that if I were to maybe delete multiple rows, I should run my SQL statement for each row that I delete. If I delete two rows, I should run this statement coming soon two times. 

OK, so now we've set up our trigger, at least kind of the data, the setup for it here. What we haven't done yet is specify what statement should run whenever we see a delete on particular table. Well, let's try this. Here I could say begin. Begin means here comes my statement I want to run whenever I hear a delete on this table. 

Now, all right. That query down below. That statement inside my begin, and I'll finish it off with an end to say this is the end of my statement, and my entire trigger looks like all of this lines of code here to say listen for a delete and run this statement in here for each row that I delete, or update, or insert. 

So let's try actually implementing in this case the museum's collections where they're trying to sell or buy pieces of artwork, and automatically add them to their transaction table whenever they create or delete some piece of data from their collections. Go back to my terminal here, and let me pull up a database. I'll do SQLite 3 mfa.db, and you'll notice that this is the same schema that we've had before. We have collections, artists, and the created table. 

What's missing here, though, is our transactions table. So let me make that for us here. I'll say let me go create table, transactions, and now let me give it an ID column of type integer. Now, I'll also give it a title column as we saw before called "Text." It's in this column I'll store the title of the piece we either bought or sold. Now, let me also include the action column, also of type text to say whether we bought or sold this piece. 

OK, finally, I'll make the ID column my primary key. And if I say semicolon here and hit Enter, I now have created this table. So if I say dot schema, notice how I can now see my transactions table down below, and I'm safe to run insert on this table to add some data to it. 

Well, let's try now creating our very first trigger. We're going to try to create a trigger that whenever an item is deleted from our collections table, we actually add it to our transactions table with the action of sold, meaning we sold this particular item. 

So let me say create trigger, as we saw before, and give it some name to identify it. I'll call this one sell. Now, I'll say I want to run this trigger before I delete on the collections table. So before I run the delete query on my collections table, I want to run this statement instead. 

I'll say for each row, begin. I want to now give you some statement to run for each row that I delete from collections. Let's make it this query here. On 1, 2, 3, 4 spaces to indent and make sure this is clearly the query I want to run after I run the delete on collections. 

I'll say I want to insert into transactions. I want to insert into the title and action columns. What value should I insert? Well, I should insert the old title, and actually, in triggers, you get access to this keyword called old. This is the title that we've just deleted from our collection. 

So old.title gives me access to the old row, the row we deleted, its title column in particular. Now, let me add in the action, which is just plain and simple "sold." So this is the query to run before I delete on collections. I will take the title of the row I'm about to delete, and insert it into transactions along with the action sold. 

Let me type a semicolon here to say this is my entire query, hit Enter again, and now I can end that particular statement I want to run after I hear a delete on collections. So I'll hit Enter now, and let me type dot schema. I can actually see this trigger is part of my schema. I can see create trigger sell before delete on transactions underneath the transactions table. 

OK, let's try putting this into action here. Let me try actually deleting something from our collections table. I'll say delete from collections where the title equals "Profusion of Flowers." I want to sell this particular piece of artwork from our collection. 

I'll hit semicolon here, and now nothing seems to happen as usual, but if I select star from collections-- from collections, I see that piece is gone. It's no longer in here, and if I select star from transactions-- select star from transactions, I actually see it's been sold. 

So automatically, I added "Profusion of Flowers" with the action of sold to my transactions table. I did not type insert myself. I instead created a trigger to run this statement, insert into, whenever I delete it on collections. 

Let's try one more. We want to not just be able to sell artwork, but also add it to our collection by buying it. So I create a trigger called buy. Let me type create trigger "buy," and now I want to run this statement forthcoming after an insert on collections. 

So I want to first insert a new item to collections, and when I do, after I do, I want to run this query coming next. Now, I want to, for each row, begin the statement I want to run. So as we said before, I want to insert into the transactions table. I want to insert into the title and action columns, just like this. 

Now, I want to insert some values. The values I want to insert-- well, I want to insert the new title, the new row I've just added. So in addition to old, we also have new when I'm talking when I'm using a trigger on an insert. So I'll say new.title, where again, new is the new row or rows I'm inserting, and the dot title is a column called title. 

Now, here I'll say the action was bought. We bought this artwork. I'll hit semicolon here, and now, I'll say end to say this is the end of my statement I want to run for every row I insert on collections. Let me hit Enter here. I've created this trigger. I can type dot schema to see it in my actual schema. 

Now, let's try inserting. I'll say insert into collections. I want to insert into the title and accession number, as well as the acquired columns here. I want to insert some new artwork. Maybe I sold Profusion of Flowers, but now I want to just buy it back. So I'll say I'll insert "Profusion of Flowers" is the title. The accession number was 56.257, and the date we acquired it, well, that was in 1956/04/12. 

Let's hit semicolon here. Now, I'll insert, and I'll say select star from collections. I should see Profusion of Flowers back in my table. Now though, if I say select star from transactions-- select star from transactions, I should see I previously sold Profusion of Flowers, and later bought it back. 

So again, I didn't run this insert myself. I instead triggered it by making my very own insert on the collections table. So this is the power of triggers. You're able to run statements that I myself didn't create in response to statements that I actually did. 

So let me ask here, what questions do we have on these triggers and what they can do for us? Let's go to Simon. 

SPEAKER 17: I'd like to know if there are only-- there is only one SQL inquiry that you can write in a trigger or multiple inquiries. 

CARTER ZENKE: Yeah, we saw begin and end in our triggers here, where begin indicates the start of our statement, and end indicates the end. You actually can have multiple statements inside of the begin and end separated by, of course, that usual semicolon that we have. 

OK, so one final idea here that we're getting at-- we're kind of touching at with this transactions table is this idea of having a soft deletion. So when I dropped something from my collections table, it actually ended up in my transactions table, so it wasn't fully deleted in the sense that I'm keeping a record of it somewhere else. 

There's also one more way among many others to implement this idea of soft deletions, that is not quite fully deleting something from my database but instead keeping it around with a log that we've removed it in some way. Let's look at this table here. 

I have collections, but I also have this column called "deleted," and by default, the value here will be 0. That is, by default, I'll add some artwork to my collections table and deleted will be 0. If I want to delete a piece of artwork though, what could I do? Instead of deleting it fully from my table or removing the row, I could mark it as deleted. I could change the deleted column from a 0 to a 1. 

Let's say I wanted to delete Farmers Working At Dawn instead of writing literally delete from collection where title equals Farmers Working At Dawn, I could say update the deleted column of collections, where the title is Farmers Working At Dawn, and make it not 0, but 1. 

Now, I've marked this item as deleted. I could run a query that excludes deleted items, but this row is still around in my table. So let's try implementing this idea of a soft deletion inside of our collections table so that we don't lose records of things that actually had in our collection. 

I'll come back to my computer here, and let's work on altering our table for collections here. If I type dot schema collections-- dot schema collections, you should see I have a table "collections," and also the triggers associated with it. 

What I don't have yet in collections, if I go over here and show you, I don't have a deleted column. I have ID, title, accession number, and acquired, but I don't have deleted. 

So as you learned last week, we can use alter table to add a column to our collections table, and by default, make the value 0. So let's try that. 

I'll go back to my environment, and now I'll run alter table. I might say alter table, and a table I want to alter, which is "collections." Now I want to add the column "deleted" to collections, and I want to make this type integer, so our whole numbers whether positive or negative, and the default value will be 0 in this case. 

Now, I'll hit Enter, and if I type dot schema collections, I should see I have my very own deleted column inside of collections. If I type, for instance, select star from collections semicolon, I'll see that by default, all of these values are 0 in my new deleted column. 

So instead of trying to literally delete from collections, I could instead just update it. I could say take the updated column, and flip it from a 0 to a 1. Let me try update collections and set the deleted column equal to 1, where let's say the title equals Farmers Working At Dawn. And my query wraps here, but it should still work just fine. 

I'll hit semicolon, Enter. Now, I didn't use delete from. I used update. If I select star from collections semicolon here, I'll see, well, Farmers Working At Dawn is in this table, but technically-- at least we marked it as deleted. 

So it seems now, I don't want to use just select star from collections to see what's inside my collection. I want to apply a filter to remove those that have a deleted value that's not equal to 0. Let's try this. I'll say select star from collections where deleted does not equal 1. 

If I hit semicolon here, what do I see? Only those values that are not deleted. I could go ahead and find them. I could say maybe select star from collections where deleted actually is 1. Really equals 1 like this, and now I see all those rows that I marked as deleted. 

Now this has some advantages. One of them is we keep data around. We don't actually formally delete it. We can still recover it later on, but it also has some tricky ethical questions, too. Like it's OK if we're talking about artwork here, but if you're talking about user data, is it right to only soft delete it if they ask you to delete it? 

Particularly in conversation with new frameworks like GDPR, and the right to be forgotten, and so on, it's up to you as a programmer to make the right decision here. When should you find data that's [INAUDIBLE] and actually delete it, or when is it better to just soft delete it and keep it around in case you need it for later? 

So we'll end here on this note where if I say select star from collections, and I want to find those only where deleted is not equal to 0 like this-- or not equal to 1, this gives me back all of the items that are not deleted. 

But wouldn't it be nice if I could actually run a query on some brand new temporary table that actually only will ever have those items that are not soft deleted? Turns out, we can do that with an idea called views, and we'll see those in much more depth next week. We'll see you then.