WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:04.910 [MUSIC PLAYING] 00:00:17.155 --> 00:00:18.780 CARTER ZENKE: Well, hello, one and all. 00:00:18.780 --> 00:00:22.890 And welcome back to CS50's Introduction to Databases with SQL. 00:00:22.890 --> 00:00:24.360 My name is Carter Zenke. 00:00:24.360 --> 00:00:26.820 And last we left off, we learned about relating-- 00:00:26.820 --> 00:00:31.360 that is, how to have multiple tables in our database for people, places, 00:00:31.360 --> 00:00:34.500 and things, and how to have them relate to one another in the way 00:00:34.500 --> 00:00:36.580 you might do in the real world. 00:00:36.580 --> 00:00:38.550 Now, today, we'll take a step forward. 00:00:38.550 --> 00:00:41.370 And we'll talk about how to put you in the driver's seat, 00:00:41.370 --> 00:00:45.870 designing your very own database schemas to organize your data. 00:00:45.870 --> 00:00:49.890 Now, we'll pick up where we left off, which is with this database of books. 00:00:49.890 --> 00:00:52.050 So we had this database. 00:00:52.050 --> 00:00:56.760 It was full of books that have been longlisted for the International Booker 00:00:56.760 --> 00:00:57.450 Prize. 00:00:57.450 --> 00:01:01.740 To be longlisted means to be nominated for some prize, let's say. 00:01:01.740 --> 00:01:05.190 So we had the past five years of books in this database. 00:01:05.190 --> 00:01:10.170 And we worked on improving this database over time from week 0 to week 1. 00:01:10.170 --> 00:01:13.080 Now, this week, we'll actually take a look underneath the hood 00:01:13.080 --> 00:01:17.980 and see what commands we had used to create these varied databases. 00:01:17.980 --> 00:01:21.660 So let's reveal now what we had done all along. 00:01:21.660 --> 00:01:24.010 I'll go back to my terminal here. 00:01:24.010 --> 00:01:26.040 And if you remember, I could use a command 00:01:26.040 --> 00:01:29.730 to open up a database file, which was this command here, 00:01:29.730 --> 00:01:34.450 SQL sqlite3, and then the name of the file I want to open. 00:01:34.450 --> 00:01:35.910 So let's try this. 00:01:35.910 --> 00:01:39.420 I'll go back to week 0, to my week 0 folder like this. 00:01:39.420 --> 00:01:43.035 And I'll open up my long list database, week0/longlist.db. 00:01:46.170 --> 00:01:47.430 I'll hit Enter. 00:01:47.430 --> 00:01:49.720 Now, I'm in my sqlite prompt. 00:01:49.720 --> 00:01:52.230 So I could work on typing some SQL commands, 00:01:52.230 --> 00:01:58.030 SQL statements or queries inside of this terminal prompt here. 00:01:58.030 --> 00:02:01.660 So if I want to get a feel for what's inside this database, 00:02:01.660 --> 00:02:04.230 we saw I could use a command called select. 00:02:04.230 --> 00:02:07.860 So I'll select now some rows from this table. 00:02:07.860 --> 00:02:14.370 I'll say SELECT, let's say, the title and also the author columns 00:02:14.370 --> 00:02:19.740 from my longlist table, semicolon, and Enter. 00:02:19.740 --> 00:02:23.760 Now, I'll see all the titles and all the authors that 00:02:23.760 --> 00:02:27.030 were inside of this long list table. 00:02:27.030 --> 00:02:29.250 But I only want to peek here. 00:02:29.250 --> 00:02:32.760 I only want to see roughly what kind of data is inside. 00:02:32.760 --> 00:02:36.130 So I could probably improve this command here. 00:02:36.130 --> 00:02:43.530 I could instead say SELECT the author and title columns from longlist-- 00:02:43.530 --> 00:02:49.470 from longlist, but limit now to the first five rows we saw before. 00:02:49.470 --> 00:02:52.440 Semicolon, and I'll hit Enter on this query. 00:02:52.440 --> 00:02:56.590 And now, I'll see only the top five rows. 00:02:56.590 --> 00:03:02.860 So I'm able now to see what kind of data is inside my database. 00:03:02.860 --> 00:03:08.790 But what I can't yet see is what command was used to create this table 00:03:08.790 --> 00:03:11.950 and what kind of data could be stored inside of it. 00:03:11.950 --> 00:03:16.140 So let's reveal now what was going on underneath the hood all this time. 00:03:16.140 --> 00:03:20.040 I'll say, this new command, a sqlite command-- not a SQL keyword, 00:03:20.040 --> 00:03:24.580 but sqlite command called .schema, .schema. 00:03:24.580 --> 00:03:28.350 Now, if I hit Enter, I'll see the following-- 00:03:28.350 --> 00:03:30.840 the command, the statement, the query that 00:03:30.840 --> 00:03:34.260 was used to create this longlist table. 00:03:34.260 --> 00:03:37.980 And notice how I have many columns inside this table. 00:03:37.980 --> 00:03:43.480 I have an ISBN column, a title column, an author column, and so on. 00:03:43.480 --> 00:03:47.340 And each column seems to have some kind of data 00:03:47.340 --> 00:03:52.080 that could be stored inside of it like text, or integers, or real, 00:03:52.080 --> 00:03:56.270 or floating point values, or decimals, if you're familiar. 00:03:56.270 --> 00:04:01.510 So this is how we created the very first version of longlist.db. 00:04:01.510 --> 00:04:04.165 But let's also see how we created the second. 00:04:04.165 --> 00:04:09.190 So I'll type .quit to leave this version of longlist.db. 00:04:09.190 --> 00:04:12.650 And now, let me open up the next version we had created, 00:04:12.650 --> 00:04:13.910 which is more relational. 00:04:13.910 --> 00:04:17.350 It had tables inside of it that could relate to one another. 00:04:17.350 --> 00:04:21.100 So I'm going to type sqlite3, then long-- 00:04:21.100 --> 00:04:26.320 sqlite3, and go to week 1, and then type longlist.db. 00:04:26.320 --> 00:04:27.700 I'll hit Enter. 00:04:27.700 --> 00:04:31.720 Now, I'm on my next version of longlist.db. 00:04:31.720 --> 00:04:32.960 Well, what could I do? 00:04:32.960 --> 00:04:36.050 I could type SELECT and look at some of the tables in here 00:04:36.050 --> 00:04:38.200 to see what kind of data is inside. 00:04:38.200 --> 00:04:44.350 I could perhaps say SELECT maybe the names from the authors table 00:04:44.350 --> 00:04:48.220 here, from the authors table, and hit semicolon. 00:04:48.220 --> 00:04:54.280 Now, I'll see all the names of authors that are inside of the authors table. 00:04:54.280 --> 00:04:55.930 I could do the same thing for books. 00:04:55.930 --> 00:04:58.150 I could maybe look at the titles of books. 00:04:58.150 --> 00:05:05.350 I could say SELECT title from the books table, semicolon, and Enter. 00:05:05.350 --> 00:05:10.490 Now, I see all of the titles that are inside my books table. 00:05:10.490 --> 00:05:14.230 But what I haven't seen yet is the schema of this database. 00:05:14.230 --> 00:05:17.080 It is the way it is organized and the commands 00:05:17.080 --> 00:05:19.700 that were used to create these tables. 00:05:19.700 --> 00:05:21.730 So let me work on that now. 00:05:21.730 --> 00:05:24.460 I'll clear my terminal using Control-L. 00:05:24.460 --> 00:05:28.090 And now, let me type .schema again. 00:05:28.090 --> 00:05:34.420 I'll type .schema to see what commands were used to create this database. 00:05:34.420 --> 00:05:37.630 Hit Enter, and I can see-- 00:05:37.630 --> 00:05:39.670 I mean, there are quite a lot of commands here. 00:05:39.670 --> 00:05:42.610 Now, if this feels overwhelming, I mean, I'd be right there with you. 00:05:42.610 --> 00:05:45.610 This is a lot of commands to parse through and read. 00:05:45.610 --> 00:05:49.850 So there's probably a better way to do this. 00:05:49.850 --> 00:05:55.660 And one way to try is to type .schema, and then give it some table name. 00:05:55.660 --> 00:05:58.930 Let's say, want to understand the schema for just the books table, 00:05:58.930 --> 00:06:00.770 like just that for now. 00:06:00.770 --> 00:06:04.060 So I'll say .schema, and then the table name-- 00:06:04.060 --> 00:06:05.720 books, in this case. 00:06:05.720 --> 00:06:07.840 Then I can hit Enter. 00:06:07.840 --> 00:06:11.500 Now, I'll see the schema, the organization, 00:06:11.500 --> 00:06:15.860 the command that we used to create the books table, in this case. 00:06:15.860 --> 00:06:18.250 And notice again, we have several columns-- 00:06:18.250 --> 00:06:22.420 ID, ISBN, title, publisher ID, and so on. 00:06:22.420 --> 00:06:26.380 Each one has their own kind of data they could support or take 00:06:26.380 --> 00:06:30.590 into this column like integers, text, and so on. 00:06:30.590 --> 00:06:33.700 So again, what we'll do today is have you all 00:06:33.700 --> 00:06:36.640 learn how to write your very own create table 00:06:36.640 --> 00:06:41.050 commands to build your very own databases that represent what you 00:06:41.050 --> 00:06:44.360 want to represent in the real world. 00:06:44.360 --> 00:06:47.860 So let me exit this prompt here. 00:06:47.860 --> 00:06:51.370 And let me propose that we'll have a bit of a design challenge 00:06:51.370 --> 00:06:55.600 today to actually try to represent a real world entity with some database 00:06:55.600 --> 00:06:56.240 here. 00:06:56.240 --> 00:06:59.050 And if you're not already familiar, Boston 00:06:59.050 --> 00:07:02.110 is perhaps famous for being among the first cities 00:07:02.110 --> 00:07:05.560 to have a subway system in the United States. 00:07:05.560 --> 00:07:09.730 So here is a picture from the late 1800s of a subway being 00:07:09.730 --> 00:07:11.950 built in Boston's city streets. 00:07:11.950 --> 00:07:14.170 Underneath the streets here, there would be 00:07:14.170 --> 00:07:18.100 trolley cars that would go and transport people across Boston. 00:07:18.100 --> 00:07:20.410 Here's another picture of a trolley actually working 00:07:20.410 --> 00:07:21.790 underneath the streets. 00:07:21.790 --> 00:07:23.920 So people would go down underneath. 00:07:23.920 --> 00:07:25.360 They would hop on a trolley. 00:07:25.360 --> 00:07:29.950 They'd be able to different parts of Boston, perhaps from Harvard to MIT 00:07:29.950 --> 00:07:33.550 or downtown up to, let's say, Braintree or down to Braintree, 00:07:33.550 --> 00:07:36.070 which is more south of Boston, for example. 00:07:36.070 --> 00:07:39.280 One of the famous stops is the Park Street stop, 00:07:39.280 --> 00:07:41.530 which is right down in the middle of Boston, 00:07:41.530 --> 00:07:45.010 one of the central hubs of this subway system. 00:07:45.010 --> 00:07:50.480 And now, these photos are all from, let's say, the early 1900s, late 1800s, 00:07:50.480 --> 00:07:51.230 and so on. 00:07:51.230 --> 00:07:54.070 But the subway has gotten a lot more modern since then. 00:07:54.070 --> 00:07:59.000 And actually, now, we have several lines that span the entire city and beyond. 00:07:59.000 --> 00:08:02.740 So here, we have the Red Line, of which Harvard and MIT are a part. 00:08:02.740 --> 00:08:05.800 We have the green line, which brings you kind of west to east-- 00:08:05.800 --> 00:08:09.680 the blue line, the orange line, and so on. 00:08:09.680 --> 00:08:14.790 So many more lines and stations have been added to this system. 00:08:14.790 --> 00:08:18.460 It's a big design challenge to represent all of these stations, 00:08:18.460 --> 00:08:24.560 all of these lines, and all of these people who might ride this subway, too. 00:08:24.560 --> 00:08:31.220 So the question then becomes, how can we create a schema for this data? 00:08:31.220 --> 00:08:36.350 And again, by schema, we mean what kinds of tables should we have, 00:08:36.350 --> 00:08:40.610 what kinds of columns might those tables have, and what kind of data 00:08:40.610 --> 00:08:44.159 should we put in each of those columns, for instance. 00:08:44.159 --> 00:08:46.320 So let me propose this. 00:08:46.320 --> 00:08:49.790 Let's say we start just first with names and stations. 00:08:49.790 --> 00:08:53.750 So Charlie here, our very first rider on this system, 00:08:53.750 --> 00:08:56.810 is going to be at the Kendall at MIT station. 00:08:56.810 --> 00:08:58.910 So this is what this table represents now. 00:08:58.910 --> 00:09:01.370 But what more could we have? 00:09:01.370 --> 00:09:05.750 Well, we might also want to have maybe what Charlie is doing at that station. 00:09:05.750 --> 00:09:09.050 Maybe he's entering the station, for instance. 00:09:09.050 --> 00:09:11.390 And if you're familiar with the subway system, 00:09:11.390 --> 00:09:17.550 you often have to pay to get onto a train or get onto the station itself. 00:09:17.550 --> 00:09:23.570 So let's say Charlie pays some fare to enter into the Kendall/MIT station. 00:09:23.570 --> 00:09:26.900 Well, back in the mid 1900s, the fare was only about a dime. 00:09:26.900 --> 00:09:27.800 It was $0.10. 00:09:27.800 --> 00:09:33.820 So we'll say Charlie paid $0.10 to enter the Kendall/MIT station. 00:09:33.820 --> 00:09:36.140 And now, this seems pretty good. 00:09:36.140 --> 00:09:41.300 But if I am the Transit Authority, the person who runs the subway system, 00:09:41.300 --> 00:09:45.890 I probably want to know, does Charlie have enough money to get on the train? 00:09:45.890 --> 00:09:48.730 And if so I want to make sure that, OK, well, Charlie actually 00:09:48.730 --> 00:09:50.590 could get on this train. 00:09:50.590 --> 00:09:53.420 So let's say not only does Charlie pay a fare. 00:09:53.420 --> 00:09:56.570 He has some remaining balance afterwards. 00:09:56.570 --> 00:10:00.160 So Charlie here has gotten onto the Kendall/MIT stop. 00:10:00.160 --> 00:10:05.330 He's paid the fare of $0.10 and has $0.05 left. 00:10:05.330 --> 00:10:08.750 OK, so here's a bit of a table. 00:10:08.750 --> 00:10:11.030 We probably add more information to it. 00:10:11.030 --> 00:10:15.230 Let's say Charlie then leaves at the Jamaica Plain stop. 00:10:15.230 --> 00:10:18.650 And the fare to leave is about a nickel, $0.05. 00:10:18.650 --> 00:10:21.410 And now, Charlie has no cents left over. 00:10:21.410 --> 00:10:27.680 So again, Charlie paid $0.10 to get on, had $0.05 left, paid $0.05 to get off, 00:10:27.680 --> 00:10:31.930 and now has no remaining balance here anymore. 00:10:31.930 --> 00:10:32.440 OK. 00:10:32.440 --> 00:10:34.040 So that's Charlie's story. 00:10:34.040 --> 00:10:35.860 Let's look at Alice, though. 00:10:35.860 --> 00:10:38.530 Let's say Alice gets on at the Harvard stop. 00:10:38.530 --> 00:10:42.310 They too pay $0.10 to get on at the Harvard stop. 00:10:42.310 --> 00:10:45.370 And they have a remaining balance of $0.20. 00:10:45.370 --> 00:10:49.510 Alice will go, let's say, to Park Street, get off at Park Street, 00:10:49.510 --> 00:10:51.160 pay the nickel to leave. 00:10:51.160 --> 00:10:55.430 And now, they'll have a balance of $0.15 at the end. 00:10:55.430 --> 00:10:56.620 Let's go to Bob. 00:10:56.620 --> 00:10:59.260 Bob enters the Alewife station. 00:10:59.260 --> 00:11:00.940 They pay $0.10. 00:11:00.940 --> 00:11:02.800 They have remaining balance of $0.30. 00:11:02.800 --> 00:11:08.337 And let's say they leave at Park Street and have a fare of $0.10 to leave 00:11:08.337 --> 00:11:09.670 because it's a further distance. 00:11:09.670 --> 00:11:14.490 Now, they'll have a running balance of $0.20 overall. 00:11:14.490 --> 00:11:18.840 So this table is OK, I might admit. 00:11:18.840 --> 00:11:23.100 I mean, last time, we learned about having what we called primary keys 00:11:23.100 --> 00:11:24.340 and foreign keys. 00:11:24.340 --> 00:11:26.220 So it seems like that's missing here. 00:11:26.220 --> 00:11:28.020 Let's go ahead and add that here. 00:11:28.020 --> 00:11:32.940 I'll give each row a unique ID, so I can know who entered, who exited, 00:11:32.940 --> 00:11:35.580 and so on, give that a unique ID here. 00:11:35.580 --> 00:11:41.950 But I might even say that this table could really be improved substantially. 00:11:41.950 --> 00:11:49.410 And I want to ask you what redundancies or inefficiencies do you see here? 00:11:49.410 --> 00:11:52.680 If we're trying to represent riders and stations, 00:11:52.680 --> 00:11:56.990 what can we improve about this design? 00:11:56.990 --> 00:12:01.740 AUDIENCE: So probably the redundancy will be the names and the stations, 00:12:01.740 --> 00:12:02.240 too. 00:12:02.240 --> 00:12:06.760 For example, if Charlie will go to the train daily, 00:12:06.760 --> 00:12:10.875 then he will become most of the names in the data. 00:12:10.875 --> 00:12:12.250 CARTER ZENKE: Yeah, a good point. 00:12:12.250 --> 00:12:13.450 If I'm hearing what you're saying, [? Loren, ?] 00:12:13.450 --> 00:12:16.420 let me show you some examples that I highlighted here. 00:12:16.420 --> 00:12:20.290 One example could be, to your point, about these names. 00:12:20.290 --> 00:12:24.370 These names seem to be telling us the name of a person. 00:12:24.370 --> 00:12:27.640 But here, we have only three names-- 00:12:27.640 --> 00:12:29.680 Charlie, Alice, and Bob. 00:12:29.680 --> 00:12:32.170 Well, my name is Carter. 00:12:32.170 --> 00:12:35.110 And what if somebody else named Carter also 00:12:35.110 --> 00:12:37.810 tried to get on and leave at some stop? 00:12:37.810 --> 00:12:39.820 Well, I wouldn't be able to know which Carter 00:12:39.820 --> 00:12:44.210 was which or which Charlie was which, which Alice was which, and so on. 00:12:44.210 --> 00:12:47.920 So we probably need a way to represent people and their names a little better 00:12:47.920 --> 00:12:50.220 here, too. 00:12:50.220 --> 00:12:55.870 What other ideas do we have for how to improve the design of this table? 00:12:55.870 --> 00:13:00.330 AUDIENCE: Yes, I think we can have a singular ID for a singular person. 00:13:00.330 --> 00:13:02.998 That way, we'll be better able to track their activities. 00:13:02.998 --> 00:13:03.790 CARTER ZENKE: Nice. 00:13:03.790 --> 00:13:06.555 So we probably have an ID for each person, 00:13:06.555 --> 00:13:10.830 a bit what we learned about last week, putting people in their own table 00:13:10.830 --> 00:13:14.890 and giving them their own unique ID, a primary key. 00:13:14.890 --> 00:13:16.380 Let's show that here. 00:13:16.380 --> 00:13:18.150 I'll go to some slides. 00:13:18.150 --> 00:13:21.345 And I'll pick out one that shows us just riders. 00:13:21.345 --> 00:13:23.220 So to your point, [? Soqanya, ?] we could try 00:13:23.220 --> 00:13:25.830 to have maybe a table for just riders. 00:13:25.830 --> 00:13:29.130 And maybe to simplify, this table has only two columns. 00:13:29.130 --> 00:13:32.850 It has a column for ID and a column for name. 00:13:32.850 --> 00:13:38.250 So here, we have Charlie, Alice, and Bob all in their own table. 00:13:38.250 --> 00:13:42.270 Well, let me propose to you, we could do the same thing for stations. 00:13:42.270 --> 00:13:44.760 Let's say we have a table of stations now. 00:13:44.760 --> 00:13:48.030 And we give each one their very own ID as well, 00:13:48.030 --> 00:13:50.400 our own primary key for this table. 00:13:50.400 --> 00:13:52.540 We have Harvard, Kendall, and Park Street. 00:13:52.540 --> 00:13:57.130 We can differentiate between them using their IDs here. 00:13:57.130 --> 00:14:00.000 So a few improvements could be made. 00:14:00.000 --> 00:14:04.560 And as we're making these improvements, splitting one table into many, 00:14:04.560 --> 00:14:08.010 and debating what kind of data to store in each table, 00:14:08.010 --> 00:14:12.180 the process we're going through is one called normalizing. 00:14:12.180 --> 00:14:14.640 We're normalizing our data. 00:14:14.640 --> 00:14:18.210 To normalize means to reduce redundancies, effectively, 00:14:18.210 --> 00:14:19.530 to take a table-- 00:14:19.530 --> 00:14:22.230 take one table, for instance, split up into multiple, 00:14:22.230 --> 00:14:26.550 and have each entity be part of its very own table. 00:14:26.550 --> 00:14:31.590 Some academics in the world have named different normal forms, quote unquote. 00:14:31.590 --> 00:14:36.120 There's like first normal form, second normal form, third normal form, 00:14:36.120 --> 00:14:39.420 this progression of making your day more and more efficient. 00:14:39.420 --> 00:14:41.160 You can use those as heuristics. 00:14:41.160 --> 00:14:43.770 But end of the day, a few principles might apply. 00:14:43.770 --> 00:14:48.750 First, take your entities, like in this case stations and riders, 00:14:48.750 --> 00:14:51.840 and put them each in their own table. 00:14:51.840 --> 00:14:56.760 And if you add more data, make sure that if I were to add a column, let's say, 00:14:56.760 --> 00:15:00.780 to riders, it is only a data point about riders-- 00:15:00.780 --> 00:15:05.190 not about stations, not about fares, only about riders. 00:15:05.190 --> 00:15:08.220 And that process can help us make a data set that 00:15:08.220 --> 00:15:14.050 is more dynamic, more easy to reproduce, and more easy to write queries on. 00:15:14.050 --> 00:15:18.090 So that is the process here of normalizing. 00:15:18.090 --> 00:15:18.960 OK. 00:15:18.960 --> 00:15:25.830 So if we have now riders and stations, we want to represent them in our table. 00:15:25.830 --> 00:15:28.050 Well, we could use what we learned about relating 00:15:28.050 --> 00:15:30.960 last week to ask, how could we actually represent 00:15:30.960 --> 00:15:33.790 these riders and these stations? 00:15:33.790 --> 00:15:36.640 So let's say here I can have riders and stations. 00:15:36.640 --> 00:15:40.330 I want to make sure that I have the right relationship between them. 00:15:40.330 --> 00:15:42.360 Well, if you're familiar with subways, we 00:15:42.360 --> 00:15:45.870 might say that a rider goes to one station. 00:15:45.870 --> 00:15:50.070 And this big T here is the symbol for a station here in Boston, 00:15:50.070 --> 00:15:52.890 for the T's that we call it, for the subway. 00:15:52.890 --> 00:15:56.010 So a rider might go to one station. 00:15:56.010 --> 00:15:58.530 But of course, that might not be the full picture. 00:15:58.530 --> 00:16:01.660 A rider also gets off at some station on. 00:16:01.660 --> 00:16:06.990 So a rider could be associated with not just one station, but multiple. 00:16:06.990 --> 00:16:10.380 And if you're familiar, at least with any subway system or the Boston one, 00:16:10.380 --> 00:16:12.660 too, it can often get pretty busy. 00:16:12.660 --> 00:16:16.890 And so riders might not just go to, of course, one station or two. 00:16:16.890 --> 00:16:22.750 Stations could also have multiple riders that are on a particular station here. 00:16:22.750 --> 00:16:28.290 So to recap, one rider might be associated with more than one station. 00:16:28.290 --> 00:16:33.030 They might get on at this first one and get off at this later one. 00:16:33.030 --> 00:16:37.350 But each station could presumably have more than one rider. 00:16:37.350 --> 00:16:41.130 Each station here could have rider A or rider B, the rider 00:16:41.130 --> 00:16:47.050 up here or the rider down below, and even many more than that as well. 00:16:47.050 --> 00:16:50.550 So to put it in the language of our ER diagrams, 00:16:50.550 --> 00:16:53.160 our entity relation diagrams from last week, 00:16:53.160 --> 00:16:57.840 we could look at it bit like this, where we have riders and stations. 00:16:57.840 --> 00:17:02.550 Riders visit stations, and they're associated like this. 00:17:02.550 --> 00:17:07.540 A rider must have at least one station associated with them. 00:17:07.540 --> 00:17:09.270 That's what this horizontal line means. 00:17:09.270 --> 00:17:13.140 If they aren't at a station, they aren't really a rider, right? 00:17:13.140 --> 00:17:16.960 A rider though could have many stations associated with them. 00:17:16.960 --> 00:17:19.470 That's what this three prongs down here means. 00:17:19.470 --> 00:17:22.440 They could have one, two, three, four, they 00:17:22.440 --> 00:17:26.290 could have many stations they get on and get off of. 00:17:26.290 --> 00:17:31.030 Now a station could have anywhere between zero riders, 00:17:31.030 --> 00:17:35.290 if it's maybe out of commission or isn't very popular, upwards to many. 00:17:35.290 --> 00:17:39.730 It could have two, three, four, five, even hundreds of riders associated 00:17:39.730 --> 00:17:42.070 with this particular station. 00:17:42.070 --> 00:17:47.350 So here is our entity relation diagram for these particular riders 00:17:47.350 --> 00:17:50.030 and these stations here. 00:17:50.030 --> 00:17:55.660 So let me ask, what questions do we have on these relationships between riders 00:17:55.660 --> 00:17:59.700 and stations and how to design this table so far? 00:17:59.700 --> 00:18:01.450 AUDIENCE: Then I want to ask that you have 00:18:01.450 --> 00:18:06.370 used the same ID for stations and riders so that maybe 00:18:06.370 --> 00:18:08.895 give us a problem in coding? 00:18:08.895 --> 00:18:10.520 CARTER ZENKE: Yeah, a good observation. 00:18:10.520 --> 00:18:14.950 So you might have noticed that in the riders table and in the stations table, 00:18:14.950 --> 00:18:17.140 I gave the same kind of ID. 00:18:17.140 --> 00:18:19.602 Like I had one, two, three for each of them. 00:18:19.602 --> 00:18:21.310 And let me just show you that again here. 00:18:21.310 --> 00:18:24.950 I'll come back to some slides, and I'll show you again, 00:18:24.950 --> 00:18:30.730 the riders table where we had Charlie, Alice, and Bob, ID 1, 2, 3. 00:18:30.730 --> 00:18:31.870 Same for the stations. 00:18:31.870 --> 00:18:36.730 We had stations Harvard, Kendall, Park Street, ID 1, 2, 3. 00:18:36.730 --> 00:18:39.950 And to your question, isn't that a problem? 00:18:39.950 --> 00:18:43.540 Well, I would argue in this case, it's not, 00:18:43.540 --> 00:18:48.460 so long as we keep clear that these IDs are for stations 00:18:48.460 --> 00:18:50.800 and these IDs are for riders. 00:18:50.800 --> 00:18:54.550 And we'll see how to do that using our SQL keywords later on. 00:18:54.550 --> 00:18:59.500 But again, so long as we have an ID just for our riders and an ID 00:18:59.500 --> 00:19:02.770 just for our stations, we can keep these separate 00:19:02.770 --> 00:19:05.560 even if they might have the same values. 00:19:05.560 --> 00:19:07.990 But a great question here. 00:19:07.990 --> 00:19:10.440 Let's take just one more. 00:19:10.440 --> 00:19:12.870 AUDIENCE: Regarding the entity relationship diagram, 00:19:12.870 --> 00:19:17.340 how is it possible for a station to have a possibility of zero riders, 00:19:17.340 --> 00:19:21.303 but riders must compulsorily have at least one station. 00:19:21.303 --> 00:19:22.720 CARTER ZENKE: Yeah, good question. 00:19:22.720 --> 00:19:25.470 So this might be up to you and how you formulate it, 00:19:25.470 --> 00:19:28.380 but for me, let me show that diagram again. 00:19:28.380 --> 00:19:32.280 I'll go back to over here. 00:19:32.280 --> 00:19:36.540 In my mind, to be a rider, you have to visit a station. 00:19:36.540 --> 00:19:39.780 If you aren't visiting a station, you aren't really a rider, right? 00:19:39.780 --> 00:19:43.740 Now presumably, there are stations that were built but aren't really being 00:19:43.740 --> 00:19:46.560 used right now or aren't really in service yet. 00:19:46.560 --> 00:19:49.230 That could be a station that has no visitors. 00:19:49.230 --> 00:19:50.910 So you could argue-- 00:19:50.910 --> 00:19:54.090 let's make sure every station has at least one rider 00:19:54.090 --> 00:19:56.910 and every rider may or may not have to visit a station. 00:19:56.910 --> 00:19:59.640 For that I would say, we could probably reasonably 00:19:59.640 --> 00:20:03.300 disagree there and talk about how we could represent the diagram here, too. 00:20:03.300 --> 00:20:09.940 But a great observation and a good other critique of this system here. 00:20:09.940 --> 00:20:11.150 All right. 00:20:11.150 --> 00:20:16.000 So let's now turn to representing this in our database. 00:20:16.000 --> 00:20:19.180 I'll go back to my computer and we'll learn 00:20:19.180 --> 00:20:23.440 about this new SQL keyword, SQL statement, 00:20:23.440 --> 00:20:26.200 this one called Create Table. 00:20:26.200 --> 00:20:31.400 Create table allows us to, as the name suggests, create a brand new table. 00:20:31.400 --> 00:20:34.450 So let's do just that in our new database 00:20:34.450 --> 00:20:37.480 to represent riders and stations. 00:20:37.480 --> 00:20:42.820 I'll go into my terminal, and I want to make a brand new database. 00:20:42.820 --> 00:20:49.120 I'll call this one mbta.db, because mbta stands for the Massachusetts Bay 00:20:49.120 --> 00:20:52.630 Transportation Authority, the people who run the subway, essentially. 00:20:52.630 --> 00:20:57.610 So I'll do sqlite3, mbta.db, hit Enter, and I'll 00:20:57.610 --> 00:21:04.220 type Y to say yes, I want to create this brand new database. 00:21:04.220 --> 00:21:09.010 Now if I type dot schema, I see nothing. 00:21:09.010 --> 00:21:10.150 But that's expected. 00:21:10.150 --> 00:21:11.530 I don't have any tables yet. 00:21:11.530 --> 00:21:13.870 I have nothing inside this database. 00:21:13.870 --> 00:21:17.350 It's up to me to create these tables myself. 00:21:17.350 --> 00:21:22.240 So what I'll do is clear my terminal, and let's start first with riders. 00:21:22.240 --> 00:21:23.890 I might create a table for riders. 00:21:23.890 --> 00:21:26.440 I'll say, create table. 00:21:26.440 --> 00:21:29.500 And now I have to give that table some name. 00:21:29.500 --> 00:21:31.960 I might call it riders here. 00:21:31.960 --> 00:21:36.310 And then in parentheses, like this, I can specify what 00:21:36.310 --> 00:21:39.500 columns should be part of this table. 00:21:39.500 --> 00:21:40.540 So let's start first. 00:21:40.540 --> 00:21:44.320 I'll hit Enter here and continue this query. 00:21:44.320 --> 00:21:47.770 Now, all by convention, I'll just indent four spaces-- 00:21:47.770 --> 00:21:52.760 one, two, three, four, and I'll give an ID to each of these riders 00:21:52.760 --> 00:21:54.430 as we saw before. 00:21:54.430 --> 00:21:58.180 I'll say ID here is one of my columns. 00:21:58.180 --> 00:22:04.300 Now to create a new column, I'll follow this up with a comma and hit Enter. 00:22:04.300 --> 00:22:08.350 I'll again, by convention, for style, just indent four spaces. 00:22:08.350 --> 00:22:09.730 And what's my next column? 00:22:09.730 --> 00:22:12.160 Perhaps a name for these riders. 00:22:12.160 --> 00:22:17.530 I'll give this column the name, name, and I'll leave it at that. 00:22:17.530 --> 00:22:22.660 Once I'm done adding columns, I no longer need to have a comma. 00:22:22.660 --> 00:22:26.320 I could simply close out this query, this statement. 00:22:26.320 --> 00:22:29.740 I could hit Enter here, say close in parentheses 00:22:29.740 --> 00:22:33.400 to close the top parentheses here, semicolon, hit Enter. 00:22:33.400 --> 00:22:36.730 And now nothing seems to happen. 00:22:36.730 --> 00:22:38.330 But that's actually a good sign. 00:22:38.330 --> 00:22:41.290 So let me type dot schema, hit Enter, and I'll 00:22:41.290 --> 00:22:44.650 see the result of that statement before. 00:22:44.650 --> 00:22:48.820 Create table, if it doesn't already exist, riders, and riders 00:22:48.820 --> 00:22:53.590 has inside of it two columns, ID and name. 00:22:53.590 --> 00:22:54.740 OK, let's keep going. 00:22:54.740 --> 00:22:56.800 Let's make one for stations, too. 00:22:56.800 --> 00:22:59.380 I'll clear my terminal and I'll say, create me 00:22:59.380 --> 00:23:02.740 a table called stations and include-- 00:23:02.740 --> 00:23:04.330 actually, not station. 00:23:04.330 --> 00:23:09.140 If you ever want to, let's say fix this kind of table here, 00:23:09.140 --> 00:23:14.500 let me try closing the parentheses, hit semicolon, Enter. 00:23:14.500 --> 00:23:15.850 I'll get a syntax error. 00:23:15.850 --> 00:23:17.050 I can restart. 00:23:17.050 --> 00:23:22.180 I'll do Control L. Now I'll do create table, stations, plural. 00:23:22.180 --> 00:23:24.250 Open parentheses, Enter. 00:23:24.250 --> 00:23:25.870 I'll indent by four spaces. 00:23:25.870 --> 00:23:27.430 One, two, three, four. 00:23:27.430 --> 00:23:31.870 And now, I'll similarly include an ID for each of these stations. 00:23:31.870 --> 00:23:36.220 I'll say ID comma, and then what all should I have? 00:23:36.220 --> 00:23:40.840 Well, stations tend to have a name, like the Kendall MIT station, 00:23:40.840 --> 00:23:43.480 the Harvard station, the Park Street Station. 00:23:43.480 --> 00:23:48.760 So I'll say, I'll give each of these their very own name, comma. 00:23:48.760 --> 00:23:50.770 What else do stations have? 00:23:50.770 --> 00:23:54.220 Well, they might also have a line that they're on. 00:23:54.220 --> 00:23:58.700 Let's say it's the red line, or the blue line, or the green line and so on. 00:23:58.700 --> 00:24:03.720 I'll have a space for them to write in their line that they're a part of. 00:24:03.720 --> 00:24:06.160 OK, and maybe I'll leave it at that to keep it simple. 00:24:06.160 --> 00:24:09.490 I'll say, stations have an ID, a name and a line. 00:24:09.490 --> 00:24:10.750 Now I'll close this out. 00:24:10.750 --> 00:24:15.430 I'll say end parentheses, semicolon, hit Enter, and nothing seems to happen. 00:24:15.430 --> 00:24:20.770 But if I type dot schema, I'll now see my riders and my stations 00:24:20.770 --> 00:24:24.870 tables inside of my database. 00:24:24.870 --> 00:24:26.760 Now one more step here. 00:24:26.760 --> 00:24:31.920 We have riders, we have stations, but we have to relate to them. 00:24:31.920 --> 00:24:34.800 We have to relate them using this many to many relationships, 00:24:34.800 --> 00:24:37.420 as we saw last week. 00:24:37.420 --> 00:24:40.950 So let me try making a table to implement 00:24:40.950 --> 00:24:43.110 this many to many relationship. 00:24:43.110 --> 00:24:47.400 And if you remember, we might call this kind of table a junction table, 00:24:47.400 --> 00:24:50.040 an associative entity, a join table. 00:24:50.040 --> 00:24:53.730 It has a lot of names, but it looks a bit like this. 00:24:53.730 --> 00:24:58.710 I'll create this new table to represent, let's say, visits. 00:24:58.710 --> 00:25:01.110 A rider visits a station. 00:25:01.110 --> 00:25:03.780 So I'll call this table visits. 00:25:03.780 --> 00:25:09.840 And inside, I'll make sure it has two columns, one for a rider 00:25:09.840 --> 00:25:17.130 ID to represent a rider, and one, let's say, for a station ID, 00:25:17.130 --> 00:25:19.020 to represent a station. 00:25:19.020 --> 00:25:24.630 Now when I see a rider ID next to a station ID in this table, 00:25:24.630 --> 00:25:27.870 I'll know the rider with that certain ID visited 00:25:27.870 --> 00:25:30.180 the station with that certain ID. 00:25:30.180 --> 00:25:31.500 So I'll close this out. 00:25:31.500 --> 00:25:36.330 I'll say end, parentheses here, semicolon, Enter, and finally, clear 00:25:36.330 --> 00:25:38.940 my terminal, type dot schema. 00:25:38.940 --> 00:25:45.000 And I can see, I have riders, stations, and visits between riders and stations 00:25:45.000 --> 00:25:48.630 in this associative entity, this junction table or a joined table. 00:25:48.630 --> 00:25:51.990 Up to you what you might want to call it in this case. 00:25:51.990 --> 00:25:55.840 Now what questions do we have? 00:25:55.840 --> 00:25:58.480 AUDIENCE: Why we have not use the primary key 00:25:58.480 --> 00:26:00.563 and secondary key in this table? 00:26:00.563 --> 00:26:01.730 CARTER ZENKE: Good question. 00:26:01.730 --> 00:26:03.605 So we're going to get there in just a minute. 00:26:03.605 --> 00:26:07.120 But if I look back at my terminal here, my schema, I'll see, 00:26:07.120 --> 00:26:09.340 I really just have column names. 00:26:09.340 --> 00:26:13.840 And we saw before and we typed dot schema on our longlist.db, 00:26:13.840 --> 00:26:16.150 we had more than just column names. 00:26:16.150 --> 00:26:19.400 We had column names, we had perhaps data types, 00:26:19.400 --> 00:26:21.807 we had primary keys and foreign keys. 00:26:21.807 --> 00:26:23.390 So we'll get to that in just a minute. 00:26:23.390 --> 00:26:28.300 But suffice to say for now, we're going to keep improving this over time. 00:26:28.300 --> 00:26:30.580 Let's take one more. 00:26:30.580 --> 00:26:33.300 AUDIENCE: Is it required to put spaces-- 00:26:33.300 --> 00:26:37.422 the four spaces indents, or that's just for the visual look? 00:26:37.422 --> 00:26:38.880 CARTER ZENKE: Yeah, great question. 00:26:38.880 --> 00:26:43.260 Is it required to have these four spaces before each column name. 00:26:43.260 --> 00:26:46.720 And in fact, no, it's not, but it makes the code more readable. 00:26:46.720 --> 00:26:49.590 So I could put this all in one line-- 00:26:49.590 --> 00:26:51.270 I shouldn't, but I could. 00:26:51.270 --> 00:26:55.620 And if I have instead this new line followed by four spaces, 00:26:55.620 --> 00:27:00.390 I can make this more readable for myself and for my colleagues, too. 00:27:00.390 --> 00:27:02.060 Good question. 00:27:02.060 --> 00:27:07.430 OK, so to our earlier point, there are things 00:27:07.430 --> 00:27:09.680 that are missing from this schema. 00:27:09.680 --> 00:27:13.800 Like, we have column names, but as we saw before, 00:27:13.800 --> 00:27:16.550 we should ideally specify what kind of data 00:27:16.550 --> 00:27:20.870 should be able to go into each of these columns. 00:27:20.870 --> 00:27:24.990 And for that, we'll need some new ideas to talk about here. 00:27:24.990 --> 00:27:30.950 So let's focus now on this idea of data types and storage classes. 00:27:30.950 --> 00:27:36.680 Data types and storage classes are two very similar but distinct IDs, 00:27:36.680 --> 00:27:40.230 and they're distinct in a way we'll talk about in just a minute. 00:27:40.230 --> 00:27:45.770 Now SQLite has five storage classes, five kind of storage-- 00:27:45.770 --> 00:27:49.520 kind of type, so to speak, of values that can hold. 00:27:49.520 --> 00:27:51.920 So let's talk about the first one, null, for instance. 00:27:51.920 --> 00:27:54.230 Null in this case means nothing. 00:27:54.230 --> 00:27:57.170 There's nothing that actually is inside of this value. 00:27:57.170 --> 00:27:59.990 It's kind of a central value to mean nothing is here. 00:27:59.990 --> 00:28:04.700 Integer means a whole number, like 1, 2, 3, 4, or 5. 00:28:04.700 --> 00:28:10.880 Real talks about decimals like floating points like 1.2 or 2.4 and so on. 00:28:10.880 --> 00:28:12.740 Text is used for characters. 00:28:12.740 --> 00:28:19.740 And blob, kind of a funny one, blob stands for Binary Large Object, 00:28:19.740 --> 00:28:24.570 and it represents the data exactly as I give it to this value. 00:28:24.570 --> 00:28:30.800 If I tell it to store 101010, it will store exactly 101010 in binary. 00:28:30.800 --> 00:28:34.220 So useful for storing, in this case like images and video files 00:28:34.220 --> 00:28:37.160 and audio files, things that have some structure 00:28:37.160 --> 00:28:40.030 we don't want to mess around with. 00:28:40.030 --> 00:28:44.290 Now let's focus on this idea of a storage class. 00:28:44.290 --> 00:28:50.830 These, I'll say, are storage classes and not data types in SQLite. 00:28:50.830 --> 00:28:58.390 Now a storage class like integer can comprise, can hold several data types. 00:28:58.390 --> 00:29:01.480 Notice how there are seven different data 00:29:01.480 --> 00:29:05.380 types that could be stored under this integer storage class. 00:29:05.380 --> 00:29:10.960 We have a 6-byte integer, 2-byte integer, a 8 and 0 and so on. 00:29:10.960 --> 00:29:13.750 It could be any of these particular types, 00:29:13.750 --> 00:29:19.330 but each of these types under the umbrella of this integer storage class. 00:29:19.330 --> 00:29:23.740 And SQLite itself will take care of making sure 00:29:23.740 --> 00:29:25.750 it uses the appropriate data type. 00:29:25.750 --> 00:29:30.910 Like if I give a very large number, like let's say 4 billion or 6 billion 00:29:30.910 --> 00:29:35.200 or even bigger than that, it will probably use a longer-- 00:29:35.200 --> 00:29:39.010 that is, a bigger byte integer to store that kind of value. 00:29:39.010 --> 00:29:42.610 If I give it a smaller one, like 1, 2, 3, or 4, 00:29:42.610 --> 00:29:47.140 it will probably use a 1-byte or a 2-byte integer for that. 00:29:47.140 --> 00:29:50.560 But SQLite's idea is that I, as a programmer, 00:29:50.560 --> 00:29:54.910 shouldn't have to care if I use an 8-byte or a 1-byte or a 2-byte integer, 00:29:54.910 --> 00:29:58.180 I just care that I'm using integers, whole numbers, 00:29:58.180 --> 00:30:03.670 and they give me a storage class to use any of these up to their choice 00:30:03.670 --> 00:30:05.960 here as well. 00:30:05.960 --> 00:30:10.930 Now let's look at a few examples of values in SQLite that we could store. 00:30:10.930 --> 00:30:14.590 Well, we have perhaps the red line as some text. 00:30:14.590 --> 00:30:16.690 And because this is characters, it's quoted, 00:30:16.690 --> 00:30:22.330 we could use the text storage class to represent this particular value here. 00:30:22.330 --> 00:30:25.630 We could have maybe an image, and to the earlier point, 00:30:25.630 --> 00:30:29.350 we could say, well, this image might be best represented 00:30:29.350 --> 00:30:33.490 using a blob, a binary large object, to keep all of these pixels 00:30:33.490 --> 00:30:36.790 exactly as they are in this image. 00:30:36.790 --> 00:30:40.330 But we do get some choice, some interesting design 00:30:40.330 --> 00:30:43.720 challenges when we look at the idea of fares. 00:30:43.720 --> 00:30:50.230 So let's say to our point earlier, fares are $0.10 back in the 1950s or so. 00:30:50.230 --> 00:30:55.960 Well, $0.10 we could store as an integer, which seems just fine. 00:30:55.960 --> 00:30:58.480 But this could get confused. 00:30:58.480 --> 00:31:00.790 I'm talking about dollars here or cents? 00:31:00.790 --> 00:31:03.205 Maybe it would be better, let's say, if I did this. 00:31:03.205 --> 00:31:07.610 A dollar sign, 0.10. 00:31:07.610 --> 00:31:09.680 And what might that be stored as? 00:31:09.680 --> 00:31:11.630 Well, probably text, right? 00:31:11.630 --> 00:31:14.840 I could say this dollar sign isn't really a number, 00:31:14.840 --> 00:31:18.140 but now I have to include it, so I'll say this will be quoted, essentially, 00:31:18.140 --> 00:31:22.358 as dollar sign 0.10. 00:31:22.358 --> 00:31:23.900 Now there's some downsides here, too. 00:31:23.900 --> 00:31:25.550 Like let's say I have-- 00:31:25.550 --> 00:31:26.840 I want to add these up. 00:31:26.840 --> 00:31:28.580 Well, I can't add up text. 00:31:28.580 --> 00:31:33.140 Like, what does it mean to say dollar sign 0.10 plus dollar sign 0.20. 00:31:33.140 --> 00:31:35.630 I can't do math with text. 00:31:35.630 --> 00:31:41.090 So maybe it'll be better after all if I used a real or a decimal like this, 00:31:41.090 --> 00:31:43.970 0.10. 00:31:43.970 --> 00:31:46.880 But I mean even here, you'll run into some problems. 00:31:46.880 --> 00:31:50.780 If you are familiar with how the number is represented in binary, 00:31:50.780 --> 00:31:54.380 you might know that decimal values or floating point values 00:31:54.380 --> 00:31:57.020 can't be perfectly, precisely represented. 00:31:57.020 --> 00:32:04.850 And if I talk about 0.10, the computer might store 0.10000056789. 00:32:04.850 --> 00:32:09.450 It could get very wonky out to the many, many decimal digits down below here. 00:32:09.450 --> 00:32:13.460 So trade offs and challenges overall. 00:32:13.460 --> 00:32:15.080 Let's look at these three, though. 00:32:15.080 --> 00:32:17.480 I have the first one to store as an integer. 00:32:17.480 --> 00:32:23.000 I'm trying to store fares here, second one as text, and the third one 00:32:23.000 --> 00:32:27.320 as a floating point or a real in this case. 00:32:27.320 --> 00:32:30.350 Let me ask for some opinions here, which one 00:32:30.350 --> 00:32:36.297 would you use and why, for trying represent fares in this case? 00:32:36.297 --> 00:32:37.130 AUDIENCE: Thank you. 00:32:37.130 --> 00:32:41.700 I prefer using integers because of course, 00:32:41.700 --> 00:32:44.690 I need to get the calculation very accurately. 00:32:44.690 --> 00:32:45.830 That's my point of view. 00:32:45.830 --> 00:32:50.030 Well, sometimes I can use float but, you know, 00:32:50.030 --> 00:32:52.310 like you said before, it can get very wonky 00:32:52.310 --> 00:32:59.815 if I really need that kind of precision, I don't really recommend using floats. 00:32:59.815 --> 00:33:01.190 CARTER ZENKE: Yeah, a good point. 00:33:01.190 --> 00:33:04.820 So if I go back to some slides here, you might argue for the integer 00:33:04.820 --> 00:33:07.940 because you know you can precisely represent integers. 00:33:07.940 --> 00:33:12.290 And let's say I want to add up fares over a lot, a lot of riders. 00:33:12.290 --> 00:33:15.020 This might be useful for me because I know that each number will 00:33:15.020 --> 00:33:16.820 be perfectly, precisely represented. 00:33:16.820 --> 00:33:20.720 I can do lots of big kind of math with this number here. 00:33:20.720 --> 00:33:24.260 To your point, this decimal might kind of, as you said, 00:33:24.260 --> 00:33:26.840 get wonky later on towards the later decimal points. 00:33:26.840 --> 00:33:30.980 I might get some unexpected results if we add up these overall. 00:33:30.980 --> 00:33:36.380 Let me ask, though, are there any proponents of this floating point 00:33:36.380 --> 00:33:40.120 value or a real value? 00:33:40.120 --> 00:33:42.930 AUDIENCE: So I think the-- 00:33:42.930 --> 00:33:48.810 I think the float is the number of, for example, 00:33:48.810 --> 00:33:56.190 for each pair, like the answers like truncation 00:33:56.190 --> 00:33:58.875 probably suggest by the comments there. 00:33:58.875 --> 00:34:00.250 CARTER ZENKE: Yeah, a good point. 00:34:00.250 --> 00:34:02.083 So if you talk about using this float value, 00:34:02.083 --> 00:34:06.491 I mean, one thing we could say for it is that this decimal could be-- 00:34:06.491 --> 00:34:09.449 it's more accurate to say, like, where you're working with dollars now, 00:34:09.449 --> 00:34:13.259 and we could have maybe $0.10, which is only 0.1 of a dollar. 00:34:13.259 --> 00:34:15.030 I totally hear that point as well. 00:34:15.030 --> 00:34:17.280 And the point we're making here is that they're really 00:34:17.280 --> 00:34:20.580 just trade offs among these data storage classes 00:34:20.580 --> 00:34:23.699 to use, whether you're using integers or real values, 00:34:23.699 --> 00:34:27.730 it just depends on your use case and what you're designing for. 00:34:27.730 --> 00:34:30.900 So be sure to read up on trade offs among these data types, 00:34:30.900 --> 00:34:35.420 and determine for yourself which one should you best use. 00:34:35.420 --> 00:34:36.409 OK. 00:34:36.409 --> 00:34:41.090 So we have now these storage classes to store values, 00:34:41.090 --> 00:34:47.210 and it turns out that columns can be made to store certain storage classes 00:34:47.210 --> 00:34:49.460 or prioritize certain classes. 00:34:49.460 --> 00:34:53.750 And the key distinction here is that columns in SQLite 00:34:53.750 --> 00:34:57.440 don't always store one particular type. 00:34:57.440 --> 00:35:01.250 Instead, they have type affinities, meaning 00:35:01.250 --> 00:35:07.280 that they'll try to convert some value you insert into a given cell or given 00:35:07.280 --> 00:35:11.660 row to the type they have the affinity for. 00:35:11.660 --> 00:35:16.490 Now there are, let's say five type affinities in SQLite-- 00:35:16.490 --> 00:35:17.270 text. 00:35:17.270 --> 00:35:19.610 Columns can be of the type affinity text, 00:35:19.610 --> 00:35:22.520 meaning they store just characters at the end of the day. 00:35:22.520 --> 00:35:27.950 There's also numeric, which stores either integer values or real values, 00:35:27.950 --> 00:35:31.010 depending on which one seems best to convert to. 00:35:31.010 --> 00:35:33.980 You have integer type affinity, which means it can store whole numbers. 00:35:33.980 --> 00:35:36.990 Real, to store floating points or decimal values. 00:35:36.990 --> 00:35:39.120 And we have blob here again, our old friend 00:35:39.120 --> 00:35:42.030 to store binary exactly as we get it. 00:35:42.030 --> 00:35:44.750 So whereas before we were talking about storage classes-- 00:35:44.750 --> 00:35:47.090 those are associated with individual values-- 00:35:47.090 --> 00:35:51.210 type affinities are now associated with individual columns. 00:35:51.210 --> 00:35:55.460 So let's see some example of how this might work in SQLite. 00:35:55.460 --> 00:35:59.000 Let's say I have this table of fares, and we've 00:35:59.000 --> 00:36:02.720 decided to store fares as integers. 00:36:02.720 --> 00:36:09.470 Well, if I said that this column called amount has the affinity for the text 00:36:09.470 --> 00:36:14.300 storage class, what would happen is if I insert this integer, 10, 00:36:14.300 --> 00:36:16.490 it would look a bit like this later on. 00:36:16.490 --> 00:36:20.250 It would be converted to text, it would be quoted in a sense, 00:36:20.250 --> 00:36:24.240 to represent-- it's now been converted to some set of characters. 00:36:24.240 --> 00:36:27.050 Let's say I insert this value 25. 00:36:27.050 --> 00:36:31.250 Well, 25 has a storage class right now of an integer. 00:36:31.250 --> 00:36:33.560 It is a whole number. 00:36:33.560 --> 00:36:38.480 But if I insert this into a column that has the text affinity, 00:36:38.480 --> 00:36:44.250 it will be converted into, in this case, text at the end of the day. 00:36:44.250 --> 00:36:45.920 Let's do the opposite. 00:36:45.920 --> 00:36:51.270 Let's say I have my fare as a string, some text in this case. 00:36:51.270 --> 00:36:54.690 I want to insert it into this column called amount, 00:36:54.690 --> 00:36:58.820 but now amount has the integer type affinity. 00:36:58.820 --> 00:37:03.830 Well, if I insert 10, quote unquote, into the column amount, 00:37:03.830 --> 00:37:10.850 I'll get back not 10, the text, but 10 the integer, because again, amount-- 00:37:10.850 --> 00:37:16.310 this column here-- has an affinity for the integer storage class. 00:37:16.310 --> 00:37:17.210 Let's try this. 00:37:17.210 --> 00:37:21.650 25, some text again I'll insert it into this table. 00:37:21.650 --> 00:37:26.690 Now I'll have 25 as an integer. 00:37:26.690 --> 00:37:31.300 So this is how SQLite allows us to give certain columns 00:37:31.300 --> 00:37:33.610 an affinity for certain types, that they'll 00:37:33.610 --> 00:37:38.860 try to store values of that type, so long as we insert values it could be 00:37:38.860 --> 00:37:41.900 feasibly converted to that type here. 00:37:41.900 --> 00:37:45.820 So let's go back to our schema and try to improve it now, 00:37:45.820 --> 00:37:50.860 to use not just column names, but also type affinities to store 00:37:50.860 --> 00:37:53.140 certain data of a certain type. 00:37:53.140 --> 00:37:58.000 Go back to my computer here, and let's improve this once more. 00:37:58.000 --> 00:38:00.910 So I'll go over to my table. 00:38:00.910 --> 00:38:05.890 And now, I probably want to improve the design here. 00:38:05.890 --> 00:38:08.680 And often, if I want to improve this, I might just 00:38:08.680 --> 00:38:10.910 need to erase what I've already done. 00:38:10.910 --> 00:38:16.210 So let me introduce this new keyword, this new statement called Drop Table. 00:38:16.210 --> 00:38:19.460 To drop a table means to delete it, to remove it, effectively. 00:38:19.460 --> 00:38:23.080 So let me try doing this for riders, stations, and visits. 00:38:23.080 --> 00:38:28.840 I'll type drop table writers, semicolon, Enter. 00:38:28.840 --> 00:38:34.240 Nothing seems to happen, but if I type dot schema, well, riders is gone. 00:38:34.240 --> 00:38:40.960 I'll try drop table stations, then semicolon, hit Enter, and type, 00:38:40.960 --> 00:38:42.760 let's say dot schema again. 00:38:42.760 --> 00:38:44.320 No more stations. 00:38:44.320 --> 00:38:49.030 I'll try drop table visits, semicolon, Enter, 00:38:49.030 --> 00:38:52.510 and then dot schema, and our table-- 00:38:52.510 --> 00:38:54.160 our database is really gone. 00:38:54.160 --> 00:38:57.280 There are no more tables inside of it. 00:38:57.280 --> 00:39:02.370 So let me propose that instead of working inside of the SQLite 00:39:02.370 --> 00:39:05.730 prompt, like typing out again, and again, and again, create table riders, 00:39:05.730 --> 00:39:07.650 create table stations, create table visits, 00:39:07.650 --> 00:39:10.650 let me be more efficient about this and create myself 00:39:10.650 --> 00:39:14.790 a schema file that I could reuse throughout this lesson 00:39:14.790 --> 00:39:18.870 and also later on while I'm working on this database on my own. 00:39:18.870 --> 00:39:23.850 To do that, let me quit my SQLite prompt here, and let me 00:39:23.850 --> 00:39:27.300 type something like code schema dot sql. 00:39:27.300 --> 00:39:31.620 I'm just creating this file called schema.sql. 00:39:31.620 --> 00:39:36.060 Now a dot SQL file allows me to type in SQL keywords 00:39:36.060 --> 00:39:38.790 and have them be syntax highlighted, so I know 00:39:38.790 --> 00:39:42.040 what's going on inside of this file. 00:39:42.040 --> 00:39:44.130 So let's just try this once more. 00:39:44.130 --> 00:39:49.170 I'll type Create table riders, and inside I'll 00:39:49.170 --> 00:39:53.820 say it has the ID column of what type affinity? 00:39:53.820 --> 00:39:58.270 Well, IDs are whole numbers, so perhaps integer in this case. 00:39:58.270 --> 00:40:02.160 I could say, ID has the integer type affinity. 00:40:02.160 --> 00:40:05.010 Now let me say the riders also have a name, 00:40:05.010 --> 00:40:07.710 and how could names be best represented? 00:40:07.710 --> 00:40:09.210 Maybe text, right? 00:40:09.210 --> 00:40:10.120 Characters here. 00:40:10.120 --> 00:40:12.750 So I'll say name and text. 00:40:12.750 --> 00:40:14.850 Now I'll include a semicolon to say, this 00:40:14.850 --> 00:40:17.820 is the end of my create table statement. 00:40:17.820 --> 00:40:21.630 And before, remember how I had to kind of air out 00:40:21.630 --> 00:40:25.560 or I had to like, backspace and so on to improve the design? 00:40:25.560 --> 00:40:28.890 Here I can literally just point and click and edit this file 00:40:28.890 --> 00:40:30.630 to improve my schema. 00:40:30.630 --> 00:40:35.130 And I'll later on apply this in my database using 00:40:35.130 --> 00:40:37.418 a command that we'll see a bit later. 00:40:37.418 --> 00:40:38.460 So let's keep going here. 00:40:38.460 --> 00:40:44.040 I'll say create table stations, and inside of the stations table, 00:40:44.040 --> 00:40:48.330 I'll make sure it has an ID column of type integer, 00:40:48.330 --> 00:40:51.960 a name column that probably stores some text, 00:40:51.960 --> 00:40:57.270 and a line column that also stores some text, where name 00:40:57.270 --> 00:41:00.130 is the name of the station like Kendell MIT, Harvard, 00:41:00.130 --> 00:41:05.640 and line is blue line or green line, what line it's part of in our subway. 00:41:05.640 --> 00:41:07.290 Let me try now visits. 00:41:07.290 --> 00:41:08.850 I'll say, create table visits. 00:41:08.850 --> 00:41:13.620 And then I'll do rider ID, which has what type affinity? 00:41:13.620 --> 00:41:15.030 Probably integer. 00:41:15.030 --> 00:41:19.680 And then I'll do station ID, which has this same type of affinity. 00:41:19.680 --> 00:41:23.070 It's relying on integers, whole numbers here for IDs. 00:41:23.070 --> 00:41:25.450 So my colon to finish this statement. 00:41:25.450 --> 00:41:29.040 Now this is my schema as a whole. 00:41:29.040 --> 00:41:31.830 I have riders, stations, and visits. 00:41:31.830 --> 00:41:36.760 But now I want to apply this schema to my database. 00:41:36.760 --> 00:41:37.780 So what could I do? 00:41:37.780 --> 00:41:39.330 I could reopen it, let's say. 00:41:39.330 --> 00:41:43.510 I'll do sqlite3 mbta.db in my terminal. 00:41:43.510 --> 00:41:47.370 And now I want to read in this schema.sql file, 00:41:47.370 --> 00:41:51.340 just run the commands that are inside of this file. 00:41:51.340 --> 00:41:52.660 So what could I do? 00:41:52.660 --> 00:41:59.040 I could say dot read schema.sql, where dot read is a command that 00:41:59.040 --> 00:42:01.860 says take whatever file you give me, like schema.sql 00:42:01.860 --> 00:42:08.070 and read it into this database, running any SQL keywords you come across there. 00:42:08.070 --> 00:42:09.450 I'll hit Enter. 00:42:09.450 --> 00:42:11.640 And now nothing seems to happen. 00:42:11.640 --> 00:42:17.658 But if I type dot schema, I now see my schema improved. 00:42:17.658 --> 00:42:19.950 And this is helpful for me, because what I could do now 00:42:19.950 --> 00:42:23.580 is I could just edit my schema.sql file and rerun it and rerun 00:42:23.580 --> 00:42:29.770 it to make sure I now have these tables being improved over time. 00:42:29.770 --> 00:42:30.730 OK. 00:42:30.730 --> 00:42:37.720 So this, then, is our new representation of our database. 00:42:37.720 --> 00:42:42.280 We have riders, of course, their own entity, and stations. 00:42:42.280 --> 00:42:47.170 They have an ID and a name, and stations have an ID, a name, and a line. 00:42:47.170 --> 00:42:52.810 We've also now included these type affinities-- integer, text, integer, 00:42:52.810 --> 00:42:56.320 text, to tell SQL what kinds of storage classes 00:42:56.320 --> 00:43:00.490 could be put inside of each of these columns. 00:43:00.490 --> 00:43:03.580 Now before we keep improving this, let me ask, 00:43:03.580 --> 00:43:10.780 what questions do we have on these storage classes and type affinities? 00:43:10.780 --> 00:43:12.400 AUDIENCE: That you would use-- 00:43:12.400 --> 00:43:17.870 when you were creating the table, the table was not in line. 00:43:17.870 --> 00:43:21.340 So when we search for the authors of books, 00:43:21.340 --> 00:43:25.982 so it comes with a perfect table, so how can we make a perfect table in the SQL? 00:43:25.982 --> 00:43:27.940 CARTER ZENKE: Yeah, do you mind clarifying what 00:43:27.940 --> 00:43:30.160 you mean by like, the perfect table? 00:43:30.160 --> 00:43:33.760 AUDIENCE: So I mean that it was arranged in something 00:43:33.760 --> 00:43:36.443 like in boxes, block boxes. 00:43:36.443 --> 00:43:37.610 CARTER ZENKE: Good question. 00:43:37.610 --> 00:43:41.770 So before we able to see the results of our queries 00:43:41.770 --> 00:43:46.660 inside some boxes in our terminal, and that is actually a mode of SQLite, 00:43:46.660 --> 00:43:51.370 I think type like dot mode table to see your results in that version. 00:43:51.370 --> 00:43:54.937 Here though, we have no data inside of our tables, 00:43:54.937 --> 00:43:56.770 so I can't really select anything from them. 00:43:56.770 --> 00:44:01.300 Like, if I go to my terminal here and I try 00:44:01.300 --> 00:44:04.690 to select some data from this riders table, 00:44:04.690 --> 00:44:10.510 let me say select star from rider's, semicolon, I won't get anything back. 00:44:10.510 --> 00:44:15.040 Next week, though, we'll see how to insert, and add, and update, and delete 00:44:15.040 --> 00:44:17.990 data inside of these tables, at which point, 00:44:17.990 --> 00:44:21.460 you could write, select star from riders and see some data 00:44:21.460 --> 00:44:23.560 you've inserted yourself. 00:44:23.560 --> 00:44:25.120 Great question. 00:44:25.120 --> 00:44:27.580 Let's take one more here. 00:44:27.580 --> 00:44:28.360 AUDIENCE: Yes. 00:44:28.360 --> 00:44:33.130 I would like to know if you have a column of the type boolean. 00:44:33.130 --> 00:44:36.730 CARTER ZENKE: Yeah, do we have a Boolean type affinity, let's say. 00:44:36.730 --> 00:44:39.850 So here, we don't, at least in SQLite. 00:44:39.850 --> 00:44:45.610 Some other DBMS', Database Management Systems might have bool or Boolean, 00:44:45.610 --> 00:44:47.110 true or false, right? 00:44:47.110 --> 00:44:48.700 Let me show you this instead. 00:44:48.700 --> 00:44:52.840 If I go to my terminal, I can see-- if I type dot schema, 00:44:52.840 --> 00:44:56.320 I haven't used Boolean, there's no need for me in this case, 00:44:56.320 --> 00:44:58.150 but I have used integer. 00:44:58.150 --> 00:45:02.350 And integer for SQLite can kind of serve the same purpose. 00:45:02.350 --> 00:45:06.550 I could have the integer 0 or the integer 1 to be true 00:45:06.550 --> 00:45:09.880 or to be false or true, respectively, 0 being false and true being 1, 00:45:09.880 --> 00:45:11.140 I believe in this case. 00:45:11.140 --> 00:45:13.970 But good question. 00:45:13.970 --> 00:45:14.750 OK. 00:45:14.750 --> 00:45:19.190 So to the earlier question, like, we've improved our tables. 00:45:19.190 --> 00:45:21.950 We now have type affinities for our columns. 00:45:21.950 --> 00:45:26.490 But we don't yet have this ID we talked about last week, 00:45:26.490 --> 00:45:29.450 which was primary keys and foreign keys. 00:45:29.450 --> 00:45:34.280 This idea of trying to uniquely represent each item in our own table 00:45:34.280 --> 00:45:36.770 using primary keys and trying to reference 00:45:36.770 --> 00:45:40.800 that primary key from some other table using foreign keys. 00:45:40.800 --> 00:45:43.820 So let's try to work on that now. 00:45:43.820 --> 00:45:48.980 And for this, we'll need this new ID called a table constraint. 00:45:48.980 --> 00:45:54.500 In SQLite, you can apply what's called a constraint to your entire table. 00:45:54.500 --> 00:45:59.300 A constraint means that some values have to be a certain way. 00:45:59.300 --> 00:46:04.340 Like let's say for a primary key, primary keys must be unique. 00:46:04.340 --> 00:46:07.730 They can't repeat, and they must be, at least in our case, 00:46:07.730 --> 00:46:12.990 is going to be integers, to be able to quickly add on to them over time. 00:46:12.990 --> 00:46:16.280 Similarly, for foreign keys, well, a constraint 00:46:16.280 --> 00:46:19.220 is that if you have a foreign key, you better 00:46:19.220 --> 00:46:22.850 find that value in some other table, otherwise 00:46:22.850 --> 00:46:26.250 you violated this constraint to having a foreign key. 00:46:26.250 --> 00:46:29.940 So we have two kinds of table constraints, among others, 00:46:29.940 --> 00:46:33.500 but two of these are primary key and foreign key. 00:46:33.500 --> 00:46:36.800 And we can apply these to our table by applying them 00:46:36.800 --> 00:46:42.030 underneath the columns we tend to say will be inside of our table. 00:46:42.030 --> 00:46:43.850 Let's try these two here now. 00:46:43.850 --> 00:46:47.840 So we come back to my terminal here so we implement our very own primary key 00:46:47.840 --> 00:46:49.730 and foreign key constraints. 00:46:49.730 --> 00:46:52.920 We'll go back to my SQLite terminal and clear my screen, 00:46:52.920 --> 00:46:55.670 and let's then pull up our schema.sql file 00:46:55.670 --> 00:46:58.130 so we can keep modifying our schema. 00:46:58.130 --> 00:47:01.610 I can now see I have the riders table, stations, and visits, 00:47:01.610 --> 00:47:06.080 and I have some columns that could be primary keys or foreign keys, 00:47:06.080 --> 00:47:08.850 but I need to declare them as such. 00:47:08.850 --> 00:47:12.120 So here in the riders table, what was our primary key? 00:47:12.120 --> 00:47:13.520 Well, it was ID. 00:47:13.520 --> 00:47:16.520 Every rider should have their own, unique ID that should not 00:47:16.520 --> 00:47:19.460 be duplicated across any two riders. 00:47:19.460 --> 00:47:24.410 So to ensure that constraint is applied, I could follow this up with a comma 00:47:24.410 --> 00:47:28.770 and then say, primary key ID, just like this. 00:47:28.770 --> 00:47:32.690 Now ID is the primary key of this rider's table. 00:47:32.690 --> 00:47:36.450 I can go down to stations and ask, what was my primary key? 00:47:36.450 --> 00:47:40.520 Well, similarly, it was ID, this ID column on line 8. 00:47:40.520 --> 00:47:45.590 So I'll type a comma followed up with primary key ID, 00:47:45.590 --> 00:47:49.170 and now that ID column is a primary key. 00:47:49.170 --> 00:47:52.580 It has that constraint applied in stations. 00:47:52.580 --> 00:47:56.900 But now if I get down to visits, we'll have a few more options. 00:47:56.900 --> 00:48:01.700 Visits here actually doesn't have its own ID column that I created. 00:48:01.700 --> 00:48:05.750 I instead have a rider ID and a station ID column. 00:48:05.750 --> 00:48:07.110 So a few options here. 00:48:07.110 --> 00:48:10.730 One option is actually to make a joint primary key. 00:48:10.730 --> 00:48:17.630 I could have a primary key composed of two columns, both rider ID and station 00:48:17.630 --> 00:48:18.170 ID. 00:48:18.170 --> 00:48:20.780 If I applied that constraint, that would mean 00:48:20.780 --> 00:48:24.110 that if I were to ever insert a row that had the same rider 00:48:24.110 --> 00:48:26.990 ID and the same station ID as another row, 00:48:26.990 --> 00:48:29.810 I would trigger a constraint violation. 00:48:29.810 --> 00:48:33.080 Every row that has a rider ID and a station ID 00:48:33.080 --> 00:48:36.860 has to be unique in their combination of those two values. 00:48:36.860 --> 00:48:39.770 To write that kind of scenario, I could follow this up 00:48:39.770 --> 00:48:43.190 and I could say similarly, the primary key of this table 00:48:43.190 --> 00:48:48.410 is not just rider ID like this, but it's also station ID. 00:48:48.410 --> 00:48:52.080 Now this is a joint primary key constraint. 00:48:52.080 --> 00:48:54.110 But if we think about this logically, I mean, 00:48:54.110 --> 00:48:57.620 it kind of stands to reason that somebody would visit a station more 00:48:57.620 --> 00:49:01.850 than once, and I don't want to make sure that every combination of rider 00:49:01.850 --> 00:49:03.420 and station ID should be unique. 00:49:03.420 --> 00:49:06.600 I want people to able to visit a station more than once. 00:49:06.600 --> 00:49:09.900 So maybe not the best design for this table, but I could certainly use it. 00:49:09.900 --> 00:49:12.530 In other cases or other contexts. 00:49:12.530 --> 00:49:15.410 One other option would be to do it before 00:49:15.410 --> 00:49:19.670 and to have maybe the ID column here of type integer, 00:49:19.670 --> 00:49:23.670 and then down below, make that our primary key, a bit like this. 00:49:23.670 --> 00:49:27.140 And now visits has its own ID column. 00:49:27.140 --> 00:49:33.395 But actually, SQLite by default will give me its very own primary key, 00:49:33.395 --> 00:49:35.810 one called row ID. 00:49:35.810 --> 00:49:39.170 It's implicit, I can't see it, but I actually could query it. 00:49:39.170 --> 00:49:41.810 I could query it for row ID, all one word, 00:49:41.810 --> 00:49:46.070 and get back a unique primary key for this table SQLite has automatically 00:49:46.070 --> 00:49:47.950 created for me. 00:49:47.950 --> 00:49:50.880 Now we have seen the primary key options, 00:49:50.880 --> 00:49:52.420 what are our foreign key options? 00:49:52.420 --> 00:49:55.180 Well, it seems like rider ID and station ID 00:49:55.180 --> 00:49:59.320 are the foreign keys of this table where rider ID references the ID 00:49:59.320 --> 00:50:05.020 column in rider's, and station ID references the ID column in stations. 00:50:05.020 --> 00:50:08.350 So to codify that, to make that a reality in our schema, 00:50:08.350 --> 00:50:11.540 I could follow this up with the foreign key. 00:50:11.540 --> 00:50:14.260 The foreign key of this table is rider ID, 00:50:14.260 --> 00:50:19.690 and it references the rider's table and the ID column 00:50:19.690 --> 00:50:22.840 inside of it using this syntax here. 00:50:22.840 --> 00:50:24.530 Now I could keep going. 00:50:24.530 --> 00:50:26.840 I could say I have more than one foreign key. 00:50:26.840 --> 00:50:29.500 I also have a foreign key-- 00:50:29.500 --> 00:50:32.020 foreign key called station ID. 00:50:32.020 --> 00:50:34.150 And that references, like we said before, 00:50:34.150 --> 00:50:37.670 the stations table and the ID column inside of it. 00:50:37.670 --> 00:50:40.060 So now here I have my completed schema. 00:50:40.060 --> 00:50:43.780 I have a primary key for the tables I've declared an explicit column 00:50:43.780 --> 00:50:47.800 for a primary key, and now I also have foreign key constraints 00:50:47.800 --> 00:50:51.710 for those columns that should be foreign keys. 00:50:51.710 --> 00:50:55.570 So now let me ask, what questions do we have on the schema 00:50:55.570 --> 00:50:59.508 or on primary keys and foreign keys? 00:50:59.508 --> 00:51:00.300 AUDIENCE: So, yeah. 00:51:00.300 --> 00:51:03.570 I just noticed that whenever before we hadn't 00:51:03.570 --> 00:51:06.210 added the affinities and the keys, we were not 00:51:06.210 --> 00:51:09.340 applying commas after each column name. 00:51:09.340 --> 00:51:11.295 So what is the difference there? 00:51:11.295 --> 00:51:12.670 CARTER ZENKE: Yeah, a good catch. 00:51:12.670 --> 00:51:15.128 Let me kind of show you what this looks like in my terminal 00:51:15.128 --> 00:51:17.310 so you can see it live. 00:51:17.310 --> 00:51:20.430 You, I think, had noticed that before we had, 00:51:20.430 --> 00:51:25.110 let's say, this primary key ID constraint in riders, 00:51:25.110 --> 00:51:26.740 we had done something like this. 00:51:26.740 --> 00:51:28.290 Let me just copy paste that. 00:51:28.290 --> 00:51:30.450 And we had removed this last column-- 00:51:30.450 --> 00:51:32.190 or comma from the name column. 00:51:32.190 --> 00:51:33.090 Is that right? 00:51:33.090 --> 00:51:38.290 And if that's the case, well, it's just convention, just style here. 00:51:38.290 --> 00:51:41.100 So if I want to keep adding some constraint 00:51:41.100 --> 00:51:45.000 or like a new line to my table, I should include a comma. 00:51:45.000 --> 00:51:51.720 Here, this name column was the last portion of my table I had specified. 00:51:51.720 --> 00:51:55.770 I have this column called name that has type affinity text, right? 00:51:55.770 --> 00:51:58.590 But now, if I add this new constraint, we'll 00:51:58.590 --> 00:52:03.360 have to follow it up with-- follow it up after a comma from this new column 00:52:03.360 --> 00:52:04.290 here. 00:52:04.290 --> 00:52:10.230 Notice now, this constraint, primary key ID, is the last-- 00:52:10.230 --> 00:52:12.810 let's say attribute of my table I specified. 00:52:12.810 --> 00:52:15.870 I no longer need to include a comma at the end of it. 00:52:15.870 --> 00:52:20.430 So whatever is the last portion, I should not have a comma after, 00:52:20.430 --> 00:52:23.460 but everything else, I should. 00:52:23.460 --> 00:52:25.800 Let's take one more question here, too. 00:52:25.800 --> 00:52:29.813 AUDIENCE: Would it be OK for the visits table to have an ID column as well? 00:52:29.813 --> 00:52:31.230 CARTER ZENKE: Yeah, good question. 00:52:31.230 --> 00:52:35.490 Would it be OK for the visits table to have an ID column as well. 00:52:35.490 --> 00:52:36.450 It certainly would be. 00:52:36.450 --> 00:52:39.870 We could define our very own primary key for this table, too. 00:52:39.870 --> 00:52:42.210 So let me go back and show you how that could work. 00:52:42.210 --> 00:52:44.880 I'll go to my visits table here. 00:52:44.880 --> 00:52:49.440 And I could try to add my own primary key to this table. 00:52:49.440 --> 00:52:54.870 I could say ID, make a new column here, make it a type affinity integer, 00:52:54.870 --> 00:52:55.900 like this. 00:52:55.900 --> 00:52:56.700 Let me scroll up. 00:52:56.700 --> 00:53:00.090 And now let me add some new constraint. 00:53:00.090 --> 00:53:03.450 I could say, because I've made my very own primary key, 00:53:03.450 --> 00:53:06.310 I'll say primary key ID. 00:53:06.310 --> 00:53:12.510 Now, this table has a primary key that I've created called ID. 00:53:12.510 --> 00:53:15.310 And this will be in place of SQLite, so that it 00:53:15.310 --> 00:53:21.730 would have made called row ID itself but hidden from my own view. 00:53:21.730 --> 00:53:24.450 OK, let's keep going then. 00:53:24.450 --> 00:53:27.570 We've seen table constraints, we've seen type affinities, 00:53:27.570 --> 00:53:30.540 but we could probably do more to improve the design 00:53:30.540 --> 00:53:32.740 of this table or this database. 00:53:32.740 --> 00:53:35.850 So let's introduce one more kind of constraint, 00:53:35.850 --> 00:53:38.740 this one called a column constraint. 00:53:38.740 --> 00:53:42.900 So whereas a table constraint applies to the table as a whole, 00:53:42.900 --> 00:53:47.160 a column constraint applies to a particular column, let's say. 00:53:47.160 --> 00:53:51.930 Maybe I want to column to have certain data inside of it. 00:53:51.930 --> 00:53:54.720 Maybe I want to make sure it doesn't have null values, and so on. 00:53:54.720 --> 00:53:57.690 I could do that with column constraints. 00:53:57.690 --> 00:54:00.510 There are four in SQLite-- 00:54:00.510 --> 00:54:07.440 check, default, not null, and unique, and each one does something different. 00:54:07.440 --> 00:54:11.340 Check allows me to make my very own check, like check to be sure 00:54:11.340 --> 00:54:14.250 that this amount is greater than 0. 00:54:14.250 --> 00:54:17.250 Or I could use default. Default means if I 00:54:17.250 --> 00:54:20.310 don't supply a value when I add a new row, 00:54:20.310 --> 00:54:23.640 it'll just use a default value instead. 00:54:23.640 --> 00:54:29.730 Not null means I can't insert null or empty values into this column. 00:54:29.730 --> 00:54:31.680 And in fact, it's required. 00:54:31.680 --> 00:54:37.770 Unique means, I want to make sure that every row in this column 00:54:37.770 --> 00:54:39.390 is a unique value. 00:54:39.390 --> 00:54:42.370 It doesn't appear twice in my data set. 00:54:42.370 --> 00:54:46.440 So let's try applying a few of these to our schema here. 00:54:46.440 --> 00:54:48.270 Go back to my terminal. 00:54:48.270 --> 00:54:50.920 And now let me check out this. 00:54:50.920 --> 00:54:55.020 Well, I could try applying the not null constraint 00:54:55.020 --> 00:54:59.190 when I know I want a column to be required, effectively. 00:54:59.190 --> 00:55:01.530 Now where could I best apply that? 00:55:01.530 --> 00:55:05.790 Maybe I could apply that to the name column in stations, 00:55:05.790 --> 00:55:07.870 like stations must have a name. 00:55:07.870 --> 00:55:11.140 So I'll say, the name column cannot be null. 00:55:11.140 --> 00:55:14.310 It cannot be empty in here. 00:55:14.310 --> 00:55:17.280 Line also should probably be not null. 00:55:17.280 --> 00:55:19.710 A station must be part of some line. 00:55:19.710 --> 00:55:21.780 I can't have an empty value for line. 00:55:21.780 --> 00:55:26.240 So I'll say, too, this should be not null. 00:55:26.240 --> 00:55:28.790 Now I could apply this up at name. 00:55:28.790 --> 00:55:31.610 I could say, riders must have a name, too. 00:55:31.610 --> 00:55:32.310 Let me try that. 00:55:32.310 --> 00:55:34.220 I'll say text not null. 00:55:34.220 --> 00:55:35.630 Or I could leave it optional. 00:55:35.630 --> 00:55:39.020 I could say, maybe text just on its own and let 00:55:39.020 --> 00:55:42.680 riders choose to supply a name or not. 00:55:42.680 --> 00:55:48.650 Now the question here is, should I apply not null to my primary key columns, 00:55:48.650 --> 00:55:53.300 like ID not null or ID not null here? 00:55:53.300 --> 00:55:56.420 You might think that you should, for thoroughness sake. 00:55:56.420 --> 00:56:01.940 Well it turns out that when you apply the primary key table constraint down 00:56:01.940 --> 00:56:06.440 below here, this already ensures that there are several constraints applied 00:56:06.440 --> 00:56:12.800 to this particular column called ID, among them being that ID cannot be 00:56:12.800 --> 00:56:13.350 null. 00:56:13.350 --> 00:56:17.480 So no need to duplicate this and say that this ID cannot be null when I 00:56:17.480 --> 00:56:23.240 already have it specified down below that ID is a primary key. 00:56:23.240 --> 00:56:25.520 Let me check others here. 00:56:25.520 --> 00:56:29.800 You might also think, could I do it for rider ID and station ID? 00:56:29.800 --> 00:56:32.470 Should I include not null here? 00:56:32.470 --> 00:56:36.490 Rider ID, not null, station ID, not null. 00:56:36.490 --> 00:56:39.250 Well, that would be a good thought, but again, we're 00:56:39.250 --> 00:56:43.540 taken care of by our table constraint using our foreign key here. 00:56:43.540 --> 00:56:48.520 Again, this constraint will say, if rider ID doesn't already 00:56:48.520 --> 00:56:53.380 exist in the ID column of rider's, I can't insert that value. 00:56:53.380 --> 00:56:58.090 And we could probably presume that if rider's ID is a primary key, 00:56:58.090 --> 00:57:01.130 well, null will not be part of this column, 00:57:01.130 --> 00:57:06.430 and therefore, I already can't insert null for rider ID or station ID. 00:57:06.430 --> 00:57:08.860 This would be in this case, redundant. 00:57:08.860 --> 00:57:12.910 So not null is good when you have columns that are neither 00:57:12.910 --> 00:57:15.910 primary keys nor foreign keys and you want 00:57:15.910 --> 00:57:20.010 to make sure that they have-- they always have a value, that they 00:57:20.010 --> 00:57:22.170 are never null, effectively. 00:57:25.010 --> 00:57:27.320 So that is not null. 00:57:27.320 --> 00:57:28.730 So we could keep going here. 00:57:28.730 --> 00:57:34.190 We also had one called unique that makes sure every value, 00:57:34.190 --> 00:57:37.100 every row in this column is unique. 00:57:37.100 --> 00:57:38.820 Where could we apply this? 00:57:38.820 --> 00:57:42.140 I could try to apply it, let's say, to the name of a station. 00:57:42.140 --> 00:57:44.820 Like, station should have unique names. 00:57:44.820 --> 00:57:47.600 So I'll say not null and unique. 00:57:47.600 --> 00:57:50.220 Now this column has two constraints. 00:57:50.220 --> 00:57:51.680 The first, not null. 00:57:51.680 --> 00:57:53.720 It should always have a value. 00:57:53.720 --> 00:57:55.190 The second, unique. 00:57:55.190 --> 00:57:58.220 The value shouldn't repeat throughout this column. 00:57:58.220 --> 00:58:01.100 Line I might leave without this constraint. 00:58:01.100 --> 00:58:04.370 I could imagine two stations being on the same line, like both on blue, 00:58:04.370 --> 00:58:07.170 I'll allow that in this case. 00:58:07.170 --> 00:58:12.560 Now again, we could try to apply unique to our primary keys or our foreign keys 00:58:12.560 --> 00:58:15.920 as I just did here, but it's already taken 00:58:15.920 --> 00:58:19.520 care of for us using this primary key constraint. 00:58:19.520 --> 00:58:24.050 A primary key, again, is always going to be unique and never null. 00:58:24.050 --> 00:58:29.720 So we'll take advantage of that already using our primary key and foreign key 00:58:29.720 --> 00:58:32.490 constraints here. 00:58:32.490 --> 00:58:33.270 OK. 00:58:33.270 --> 00:58:36.180 So we've seen unique and not null. 00:58:36.180 --> 00:58:39.510 And I might argue, we're at the point where this schema seems 00:58:39.510 --> 00:58:44.400 to be fairly optimized, at least using our column constraints, our table 00:58:44.400 --> 00:58:47.560 constraints, our type affinities and so on. 00:58:47.560 --> 00:58:55.440 So let's ask then, what questions do we have on not null and unique, if any? 00:58:55.440 --> 00:58:58.770 AUDIENCE: So basically, to recap if I understood correctly, it's 00:58:58.770 --> 00:59:01.620 not precisely about not null and unique but about 00:59:01.620 --> 00:59:06.060 the concept of the key, labeling key immediately 00:59:06.060 --> 00:59:10.135 gives the attribute of not null, unique, and to be referenced, right? 00:59:10.135 --> 00:59:11.260 CARTER ZENKE: That is true. 00:59:11.260 --> 00:59:14.640 So when you use a primary key or a foreign key constraint, 00:59:14.640 --> 00:59:17.880 there are other constraints that go along with that constraint. 00:59:17.880 --> 00:59:23.410 A primary key, for instance, must not be null, it must be unique, and so on. 00:59:23.410 --> 00:59:25.290 So it would be redundant to apply that again, 00:59:25.290 --> 00:59:29.010 to say that this primary key should be unique or not null. 00:59:29.010 --> 00:59:31.160 Good clarification there. 00:59:31.160 --> 00:59:32.000 OK. 00:59:32.000 --> 00:59:36.590 So I think we're at the point where this schema is pretty well set for us, 00:59:36.590 --> 00:59:42.380 and we're going to need to think about how we use this in the real world. 00:59:42.380 --> 00:59:48.630 Like, if this is our schema here, we have riders and stations, well, 00:59:48.630 --> 00:59:49.940 what could we do? 00:59:49.940 --> 00:59:52.340 I mean, riders tend to-- 00:59:52.340 --> 00:59:54.710 I mean, they could register for the subway. 00:59:54.710 --> 00:59:56.900 And riders, well, they do visit stations. 00:59:56.900 --> 00:59:59.360 But I think if we applied this to the real world, 00:59:59.360 --> 01:00:02.630 we'd see that this isn't quite how it's actually done. 01:00:02.630 --> 01:00:05.510 Like, riders don't really register that often. 01:00:05.510 --> 01:00:08.570 If a rider who's new to the city comes in, they want to ride the subway, 01:00:08.570 --> 01:00:10.760 they should be able to ride, too. 01:00:10.760 --> 01:00:15.110 So it turns out at least here in Boston, the MBTA doesn't really 01:00:15.110 --> 01:00:20.360 track riders, per se, but they do track what we call CharlieCards. 01:00:20.360 --> 01:00:23.780 If you want to ride a subway, you get a CharlieCard. 01:00:23.780 --> 01:00:26.330 A CharlieCard allows you access to the subway, 01:00:26.330 --> 01:00:28.910 keeps track of your fare, your balance and so on, 01:00:28.910 --> 01:00:31.730 and allows you to swipe in to certain stations. 01:00:31.730 --> 01:00:33.800 So when we come back from a break here, we'll 01:00:33.800 --> 01:00:37.550 see how we can actually implement these CharlieCards instead of riders 01:00:37.550 --> 01:00:40.460 to make our system more efficient and more in line with what 01:00:40.460 --> 01:00:42.140 happens in the real world. 01:00:42.140 --> 01:00:45.220 We'll see you all in just a few. 01:00:45.220 --> 01:00:46.940 And we're back. 01:00:46.940 --> 01:00:49.750 So we saw last time, this challenge of designing 01:00:49.750 --> 01:00:53.080 a system for the Massachusetts Bay Transportation Authority 01:00:53.080 --> 01:00:55.540 to represent riders and stations-- people 01:00:55.540 --> 01:00:58.480 who get on a subway to go around Boston. 01:00:58.480 --> 01:01:03.460 But as we also saw, we learned that the MBTA doesn't 01:01:03.460 --> 01:01:09.100 keep track of riders themselves, they keep track of CharlieCards, this card 01:01:09.100 --> 01:01:11.650 that a rider might carry in their pocket. 01:01:11.650 --> 01:01:17.230 And they can scan this card at a station to enter or even to exit in some cases, 01:01:17.230 --> 01:01:21.650 to make sure that their fare is counted as they enter that station. 01:01:21.650 --> 01:01:27.520 So let's think now how to improve our schema to represent not just riders, 01:01:27.520 --> 01:01:31.540 like me, But CharlieCards themselves people might carry around 01:01:31.540 --> 01:01:34.120 when they enter the subway station. 01:01:34.120 --> 01:01:37.120 Well, we saw before we had riders and cards. 01:01:37.120 --> 01:01:43.030 But our goal now is to remove riders from the picture, focus only on cards. 01:01:43.030 --> 01:01:47.470 Well cards, as we've seen, might make us swipe at a station. 01:01:47.470 --> 01:01:51.790 If I enter Harvard station, I might swipe my CharlieCard 01:01:51.790 --> 01:01:54.380 to enter that station. 01:01:54.380 --> 01:01:59.270 And we could see that a card would have maybe many swipes involved. 01:01:59.270 --> 01:02:04.520 Like if I swipe at Harvard, I might also swipe it MIT or swipe it Park Street, 01:02:04.520 --> 01:02:05.810 and so on. 01:02:05.810 --> 01:02:10.410 We could see that a swipe can only happen at a single station at a time, 01:02:10.410 --> 01:02:10.910 though. 01:02:10.910 --> 01:02:14.990 Like if I swipe at Park Street, I'm only swiping at Park Street here, 01:02:14.990 --> 01:02:20.050 and similarly, a swipe might only involve one, single card. 01:02:20.050 --> 01:02:23.040 So if we think about these entities and how they relate, 01:02:23.040 --> 01:02:27.810 we could also think about what kinds of columns you have inside of each entity. 01:02:27.810 --> 01:02:30.730 In this case, I would argue we have something a bit like this. 01:02:30.730 --> 01:02:36.150 We could say that a card has a ID for its primary key, in this case. 01:02:36.150 --> 01:02:42.360 Similarly, this card makes a swipe, and this swipe has, itself, an ID, 01:02:42.360 --> 01:02:46.260 as well as a type, some time that that swipe happened, 01:02:46.260 --> 01:02:49.480 and an amount or a transaction that is involved. 01:02:49.480 --> 01:02:54.090 So for example, let's say I swipe in at the Harvard Square Station. 01:02:54.090 --> 01:02:59.880 That type of swipe might be to enter the station at some certain time. 01:02:59.880 --> 01:03:03.510 Now associated with that swipe is also some amount 01:03:03.510 --> 01:03:07.320 in dollars that happened to be subtracted from my card. 01:03:07.320 --> 01:03:10.620 Like let's say the fare nowadays is $2.40, well, 01:03:10.620 --> 01:03:15.830 that amount is subtracted from my card from this swipe. 01:03:15.830 --> 01:03:20.270 Now, of course, I do all of this at a station, which has our same columns 01:03:20.270 --> 01:03:23.630 from before, ID, name, and line. 01:03:23.630 --> 01:03:28.130 So a similar idea now, but we're replacing riders with cards 01:03:28.130 --> 01:03:31.190 and adding more information to these visits. 01:03:31.190 --> 01:03:34.940 There are instead swipes that could be maybe entering the station, 01:03:34.940 --> 01:03:39.240 exiting the station or just adding funds to my balance, for instance. 01:03:39.240 --> 01:03:41.690 So let's see how we could improve our schema now 01:03:41.690 --> 01:03:47.720 using some new SQL keywords to alter our tables and add some new ones, too. 01:03:47.720 --> 01:03:49.790 I'll go back to my computer. 01:03:49.790 --> 01:03:54.560 And let's see how we could try to alter the table we've already created. 01:03:54.560 --> 01:03:58.910 We already have a riders table, a visits table and a stations table, 01:03:58.910 --> 01:04:02.450 but we could learn some new statements, some new queries to actually 01:04:02.450 --> 01:04:06.030 update and alter these tables as well. 01:04:06.030 --> 01:04:08.100 The first one as we saw a little bit before 01:04:08.100 --> 01:04:11.270 is this one called drop table, arguably the most dramatic thing 01:04:11.270 --> 01:04:14.100 you can do to table, is just drop it, delete it like this. 01:04:14.100 --> 01:04:18.080 So let's try just deleting the riders table from our database. 01:04:18.080 --> 01:04:21.140 I'll go back to my-- 01:04:21.140 --> 01:04:24.050 go back to my database here. 01:04:24.050 --> 01:04:29.370 I'll type sqlite mbta.db to pull up this database again. 01:04:29.370 --> 01:04:34.160 And now if I type dot schema, I'll see I have a riders, stations, and visits 01:04:34.160 --> 01:04:34.790 table. 01:04:34.790 --> 01:04:38.510 But no longer-- I don't want to have the riders table. 01:04:38.510 --> 01:04:39.990 I want to remove that. 01:04:39.990 --> 01:04:45.200 So I'll say, drop table riders, semicolon. 01:04:45.200 --> 01:04:46.880 Now I'll hit enter. 01:04:46.880 --> 01:04:48.650 No more riders table. 01:04:48.650 --> 01:04:52.830 If I type dot schema, that's gone for my database. 01:04:52.830 --> 01:04:54.990 Well, what could I do now? 01:04:54.990 --> 01:04:58.740 I've dropped the table, but I'd still need to maybe update 01:04:58.740 --> 01:05:00.660 visits to instead be swipes. 01:05:00.660 --> 01:05:02.850 I could probably leave stations as is. 01:05:02.850 --> 01:05:06.360 But I want to update a table or alter its schema, 01:05:06.360 --> 01:05:09.370 I can use some new commands as well. 01:05:09.370 --> 01:05:12.960 I'll use this one called alter table. 01:05:12.960 --> 01:05:16.300 Alter table looks a bit like this. 01:05:16.300 --> 01:05:22.320 I can use the alter table statement here and give it some table name. 01:05:22.320 --> 01:05:27.990 Like let's say alter table visits, and then inside this visits table, 01:05:27.990 --> 01:05:30.370 I get to ask, what do I want to do? 01:05:30.370 --> 01:05:33.030 Do I want to, let say, rename the table? 01:05:33.030 --> 01:05:34.240 I could do that. 01:05:34.240 --> 01:05:37.890 I could also decide to add a column, rename a column, 01:05:37.890 --> 01:05:40.440 or drop a column altogether. 01:05:40.440 --> 01:05:43.530 Let's try first looking at rename to. 01:05:43.530 --> 01:05:47.790 I want to rename this visits table to a swipes table, 01:05:47.790 --> 01:05:51.510 representing not just a visit to the station, but a swipe of a card. 01:05:51.510 --> 01:05:53.340 So let's try this one here. 01:05:53.340 --> 01:05:57.240 I'll go back to my computer and I'll go back to SQLite. 01:05:57.240 --> 01:06:01.740 And I'll say, I no longer want visits to be called visits. 01:06:01.740 --> 01:06:06.330 I ideally want visits to actually be called swipes. 01:06:06.330 --> 01:06:07.590 So let me try this. 01:06:07.590 --> 01:06:13.500 I'll say alter table, as we saw before, visits 01:06:13.500 --> 01:06:18.180 rename to swipes, like this, semicolon. 01:06:18.180 --> 01:06:19.770 Now I'll hit enter. 01:06:19.770 --> 01:06:24.212 And now if I type dot schema, again, I see-- 01:06:24.212 --> 01:06:25.170 oops, kind of strange-- 01:06:25.170 --> 01:06:28.500 I'll hit dot schema again, I'll see swipes, 01:06:28.500 --> 01:06:32.730 no longer called visits, but now called swipes. 01:06:32.730 --> 01:06:36.390 What we saw-- I'd ideally like to add a type of the swipe, 01:06:36.390 --> 01:06:40.200 maybe I'm entering the station, maybe I'm exiting. 01:06:40.200 --> 01:06:43.770 Maybe I'm just adding funds or depositing some funds. 01:06:43.770 --> 01:06:47.940 So let me try adding, let's say, a new column to swipes. 01:06:47.940 --> 01:06:51.630 I'll say alter table, add column-- 01:06:51.630 --> 01:06:57.910 alter table, swipes, I'm going to name that table, swipes, like this. 01:06:57.910 --> 01:07:00.840 And then let's add a column-- 01:07:00.840 --> 01:07:03.510 add a column called type. 01:07:03.510 --> 01:07:06.600 And this will have the type affinity text. 01:07:06.600 --> 01:07:08.580 I'll hit semicolon, enter. 01:07:08.580 --> 01:07:12.060 And now if I type dot schema, I'll see-- 01:07:12.060 --> 01:07:16.820 well, I have a column called a ttpe. 01:07:16.820 --> 01:07:18.270 So clearly, I made a typo here. 01:07:18.270 --> 01:07:20.570 I had rider ID, station ID. 01:07:20.570 --> 01:07:26.120 This new column down below called ttpe, I kind of want to fix this, right? 01:07:26.120 --> 01:07:28.430 I don't want ttpe, I want type. 01:07:28.430 --> 01:07:31.760 So probably good chance to use my rename column. 01:07:31.760 --> 01:07:34.250 I'll come back here and I'll try that. 01:07:34.250 --> 01:07:37.610 I'll instead do alter table-- 01:07:37.610 --> 01:07:43.160 alter table, swipes, and I will rename a column now. 01:07:43.160 --> 01:07:51.170 I'll say rename the column, ttpe to type, spelled correctly now. 01:07:51.170 --> 01:07:55.220 Hit semicolon, clear my terminal, type dot schema. 01:07:55.220 --> 01:08:01.070 And now I see over here that type has been added to my table of swipes. 01:08:01.070 --> 01:08:06.170 I see rider ID, station ID, and now a new column called type. 01:08:06.170 --> 01:08:11.630 So through alter table, we go ahead and add new columns, rename them even, 01:08:11.630 --> 01:08:15.650 or if I wanted to just drop the column altogether. 01:08:15.650 --> 01:08:19.160 Let's say I add this column type and I change my mind, 01:08:19.160 --> 01:08:20.580 I don't want it here anymore. 01:08:20.580 --> 01:08:24.260 I go back to my computer and I could try dropping a particular column. 01:08:24.260 --> 01:08:28.850 Let me try in this case, alter table, swipes. 01:08:28.850 --> 01:08:33.859 And now let me drop column type, semicolon. 01:08:33.859 --> 01:08:34.640 Hit enter. 01:08:34.640 --> 01:08:40.529 And now if I type dot schema, I'm back to where I began. 01:08:40.529 --> 01:08:45.330 So these are the commands we can use to alter our tables, to improve them 01:08:45.330 --> 01:08:48.450 if you make a mistake during our first create table command, 01:08:48.450 --> 01:08:52.930 or if you want to add more functionality later down the line. 01:08:52.930 --> 01:08:58.050 So ideally, I could keep using alter table, add table, create table, drop 01:08:58.050 --> 01:08:58.930 table, and so on. 01:08:58.930 --> 01:09:01.290 But what I want to do here is just start from scratch. 01:09:01.290 --> 01:09:04.029 Like I have stations and swipes and so on. 01:09:04.029 --> 01:09:07.899 Why don't I just go back to my schema file and use that instead? 01:09:07.899 --> 01:09:11.910 So what I'll do is I'll drop table for stations and-- 01:09:11.910 --> 01:09:12.779 oops, semicolon. 01:09:12.779 --> 01:09:17.490 I will drop table for swipes now, semicolon. 01:09:17.490 --> 01:09:22.439 I'll type dot schema, and now I'll see nothing in here. 01:09:22.439 --> 01:09:26.760 I'll quit and I'll type code schema.sql. 01:09:26.760 --> 01:09:29.970 Let me just start from scratch using this schema.sql. 01:09:29.970 --> 01:09:35.790 So we no longer want to have riders, we only want to have cards. 01:09:35.790 --> 01:09:38.670 So I could just rename this table here. 01:09:38.670 --> 01:09:39.930 I'll call it cards. 01:09:39.930 --> 01:09:41.609 Create table, cards. 01:09:41.609 --> 01:09:47.520 Now cards don't have a name, they only have some unique ID in this case. 01:09:47.520 --> 01:09:50.460 Now I'll leave stations just as it is. 01:09:50.460 --> 01:09:51.870 This seems perfectly fine to me. 01:09:51.870 --> 01:09:55.800 Stations have an ID, a name and a line with these constraints 01:09:55.800 --> 01:09:58.690 applied to the name and line columns. 01:09:58.690 --> 01:10:00.720 But now the visits table. 01:10:00.720 --> 01:10:04.500 Well, the visits table is no longer a visit to the station per se, 01:10:04.500 --> 01:10:07.570 it's more a swipe of the card at that station. 01:10:07.570 --> 01:10:10.950 So let's now say visits becomes swipes. 01:10:10.950 --> 01:10:15.370 And among these new columns to add are the following. 01:10:15.370 --> 01:10:18.690 I want to have not just an ID for each swipe, 01:10:18.690 --> 01:10:22.230 not an ID for rider ID and station ID and so on, 01:10:22.230 --> 01:10:26.310 but also I want to have a type of swipe. 01:10:26.310 --> 01:10:30.870 Am I entering the station, exiting, or depositing some funds? 01:10:30.870 --> 01:10:34.980 So I'll say the type of this is going to be a new column, 01:10:34.980 --> 01:10:40.200 and the data type, this whole will be text, enter, exit, 01:10:40.200 --> 01:10:42.900 or deposit for funds. 01:10:42.900 --> 01:10:45.270 Now let me try another column, two. 01:10:45.270 --> 01:10:47.700 I'll include a date time. 01:10:47.700 --> 01:10:49.260 A date time is like a timestamp. 01:10:49.260 --> 01:10:51.810 What time did this swipe actually happen? 01:10:51.810 --> 01:10:54.570 And I'll make this type numeric. 01:10:54.570 --> 01:11:00.650 Numeric can store all kinds of dates and times for me in this table. 01:11:00.650 --> 01:11:02.950 Now let me add one final column. 01:11:02.950 --> 01:11:04.990 This one will be an amount. 01:11:04.990 --> 01:11:09.910 I'll also use numeric for this kind of column here. 01:11:09.910 --> 01:11:13.150 And I'll say that this column called amount 01:11:13.150 --> 01:11:18.500 can store, in this case integers or real numbers, like floats and so on. 01:11:18.500 --> 01:11:24.310 I'll probably decide on that when I actually add some data to this table. 01:11:24.310 --> 01:11:28.650 So here we've updated our schema to represent that diagram we saw before. 01:11:28.650 --> 01:11:31.380 I have cards, I have stations, and I have 01:11:31.380 --> 01:11:36.240 swipes that have some type associated with them, enter, exit, deposit, 01:11:36.240 --> 01:11:39.750 some date, time, and some amount that was charged to me 01:11:39.750 --> 01:11:41.340 while I made this swipe. 01:11:41.340 --> 01:11:44.070 Either I added some funds, in which case amount is positive, 01:11:44.070 --> 01:11:50.000 or I subtracted some funds, entering exiting, in this case from the station. 01:11:50.000 --> 01:11:50.990 All right. 01:11:50.990 --> 01:11:53.390 So I have these tables now. 01:11:53.390 --> 01:11:56.750 And now I want to probably apply some of those same column 01:11:56.750 --> 01:11:58.490 constraints we saw before. 01:11:58.490 --> 01:12:02.840 Like here, it's fine, but I also want to make sure I'm not adding some data 01:12:02.840 --> 01:12:05.280 that I don't want to add to this table. 01:12:05.280 --> 01:12:09.230 So I could go back to my old friends, these column constraints. 01:12:09.230 --> 01:12:14.450 And we saw before, we had default and not null, unique and check. 01:12:14.450 --> 01:12:20.400 We've used not null and unique, but we haven't check or default. 01:12:20.400 --> 01:12:22.970 So let's start using more than just not null and unique 01:12:22.970 --> 01:12:27.720 and also focus on check and default, what they can do for us here. 01:12:27.720 --> 01:12:29.670 I'll go back to my schema. 01:12:29.670 --> 01:12:32.970 And let me just make sure that I'm making all the columns that I 01:12:32.970 --> 01:12:35.610 want to be required actually required. 01:12:35.610 --> 01:12:39.990 I'll go into my swipes table and I'll say that type-- 01:12:39.990 --> 01:12:41.700 I mean, this should be required. 01:12:41.700 --> 01:12:46.380 I should know whether I entered, exited, or deposited some funds. 01:12:46.380 --> 01:12:48.750 So I'll say text not null. 01:12:48.750 --> 01:12:52.590 Similarly, for the timestamp, the time this swipe happened, 01:12:52.590 --> 01:12:54.970 I want that to be not null as well. 01:12:54.970 --> 01:12:57.660 I must know what time I swiped the card. 01:12:57.660 --> 01:13:00.270 And also, it makes sense for there to always be 01:13:00.270 --> 01:13:02.820 some amount associated with this swipe. 01:13:02.820 --> 01:13:07.450 Either I added some funds to my balance or I removed some funds overall. 01:13:07.450 --> 01:13:11.260 So I'll make this not null as well. 01:13:11.260 --> 01:13:14.910 Well, let's see what we could do here with default. 01:13:14.910 --> 01:13:18.900 Default gives me some default value, some initial value 01:13:18.900 --> 01:13:25.320 to add to this column if I don't specify what kind of value to add, 01:13:25.320 --> 01:13:27.840 what be good for date time. 01:13:27.840 --> 01:13:30.390 Date time here is, again, the timestamp. 01:13:30.390 --> 01:13:35.080 The time I swiped this card to enter, let's say, Harvard station. 01:13:35.080 --> 01:13:40.210 Well, if I want this to always have the current time, 01:13:40.210 --> 01:13:43.690 I could use default. I could say the default 01:13:43.690 --> 01:13:48.220 value for this column is this special value here, 01:13:48.220 --> 01:13:52.160 CURRENT TIMESTAMP, in all caps. 01:13:52.160 --> 01:13:57.470 CURRENT TIMESTAMP will show me the year, the month, the day, the hour, 01:13:57.470 --> 01:14:02.600 the minute, the second, all in one value, and insert that into my table. 01:14:02.600 --> 01:14:08.330 So as soon as I add new row, if I don't supply a time for this datetime column, 01:14:08.330 --> 01:14:15.860 I'll instead get back the current time exactly as it is represented in SQLite. 01:14:15.860 --> 01:14:17.760 Now what can I do further than this? 01:14:17.760 --> 01:14:21.750 I could also try to add a-- my very own check. 01:14:21.750 --> 01:14:26.780 Maybe I want to make sure that the amounts here are never equal to zero. 01:14:26.780 --> 01:14:31.220 Like, nobody should ever be able to make a $0 transaction or $0 swipe. 01:14:31.220 --> 01:14:33.860 They're always being charged some money or they're always 01:14:33.860 --> 01:14:36.050 depositing some money, in this case. 01:14:36.050 --> 01:14:41.540 So I could say, amount here has my very own check on this column. 01:14:41.540 --> 01:14:45.980 And inside check, I can actually write my very own expression to check for. 01:14:45.980 --> 01:14:51.170 I could say, for example, amount is not equal to zero. 01:14:51.170 --> 01:14:54.500 Using those same operators we saw back in week zero, 01:14:54.500 --> 01:15:01.810 this will ensure that any value inside amount will not be equal to zero. 01:15:01.810 --> 01:15:03.760 Let's try also for type. 01:15:03.760 --> 01:15:06.730 I mean, type can only have a few values. 01:15:06.730 --> 01:15:11.170 We saw enter, exit, deposit some funds. 01:15:11.170 --> 01:15:16.060 I could make sure that only those values are included inside my type column. 01:15:16.060 --> 01:15:20.500 I could say check that type is in some list of values. 01:15:20.500 --> 01:15:23.560 Going back to week one here, we talk about in. 01:15:23.560 --> 01:15:31.460 I could say, maybe the type is in enter exit or is in deposit. 01:15:31.460 --> 01:15:34.570 So now when I have this table called swipes, 01:15:34.570 --> 01:15:38.740 I'm representing what I'm actually doing when I go to Harvard station. 01:15:38.740 --> 01:15:42.633 I have here a visit for myself, my very own ID 01:15:42.633 --> 01:15:44.050 here, which will update in second. 01:15:44.050 --> 01:15:47.290 I have a station ID where I'm actually going to visit. 01:15:47.290 --> 01:15:52.270 I have a type that I'm going to use to enter, exit, or deposit 01:15:52.270 --> 01:15:54.100 some funds at this station. 01:15:54.100 --> 01:15:56.710 I'm doing it at a certain time, and I have an amount 01:15:56.710 --> 01:16:00.200 associated with this transaction. 01:16:00.200 --> 01:16:02.500 Now, there's one thing to fix here, which 01:16:02.500 --> 01:16:07.040 is that we're still talking about riders inside our swipes table. 01:16:07.040 --> 01:16:08.360 So let's fix that here, too. 01:16:08.360 --> 01:16:12.460 I'll go back to my computer, and let's try fixing this. 01:16:12.460 --> 01:16:19.120 I have a rider ID inside of my swipes table, but no longer do I have riders, 01:16:19.120 --> 01:16:20.680 I have cards. 01:16:20.680 --> 01:16:25.780 So let me say that this is now a card ID, and down below in my foreign key, 01:16:25.780 --> 01:16:32.650 I'll say that this card ID column references the ID column in cards. 01:16:32.650 --> 01:16:35.140 And I think this should represent everything 01:16:35.140 --> 01:16:39.800 I want to represent about swipes at the station. 01:16:39.800 --> 01:16:44.690 So let me ask now what questions we have about this new schema 01:16:44.690 --> 01:16:47.270 and the constraints we've applied. 01:16:47.270 --> 01:16:50.210 AUDIENCE: I wonder, how could you delete or drop 01:16:50.210 --> 01:16:56.510 the table of riders when you use ID as a foreign key? 01:16:56.510 --> 01:17:00.128 I try to do that, but I got an error. 01:17:00.128 --> 01:17:03.170 CARTER ZENKE: Yeah, so you're getting into some more advanced stuff here. 01:17:03.170 --> 01:17:06.500 And suffice to say for now, my foreign key constraints 01:17:06.500 --> 01:17:09.830 aren't actually being checked right now, but yours might be. 01:17:09.830 --> 01:17:14.090 If you try to drop a table that actually has some data that's 01:17:14.090 --> 01:17:17.900 referenced by a foreign key, SQLite will warn you, perhaps to tell you, 01:17:17.900 --> 01:17:21.840 you can't do that because this ID is referenced from this table over there. 01:17:21.840 --> 01:17:26.780 So in that case, best to delete the value that has that foreign key 01:17:26.780 --> 01:17:30.470 and then proceed with dropping that table altogether. 01:17:30.470 --> 01:17:31.910 Good question there. 01:17:31.910 --> 01:17:34.630 Yeah, let's take one more. 01:17:34.630 --> 01:17:43.570 AUDIENCE: How might this syntax send the other SQL languages, same as SQLite 01:17:43.570 --> 01:17:46.580 languages, how much is syntax same? 01:17:46.580 --> 01:17:48.080 CARTER ZENKE: Yeah, a good question. 01:17:48.080 --> 01:17:52.000 So here we're using the SQLite database management system. 01:17:52.000 --> 01:17:55.060 It is similar to but distinct in some ways 01:17:55.060 --> 01:17:57.820 from others like MySQL or PostgreSQL. 01:17:57.820 --> 01:18:00.220 I would say that most of what you're doing here 01:18:00.220 --> 01:18:04.822 could also be ported to MySQL and PostgreSQL with a few exceptions 01:18:04.822 --> 01:18:07.030 that you might need to treat on a case by case basis. 01:18:07.030 --> 01:18:10.220 In fact, the developers of SQLite built things 01:18:10.220 --> 01:18:14.800 so that it would be easy to port their database schemas to another schema 01:18:14.800 --> 01:18:18.140 like MySQL or PostgreSQL as well. 01:18:18.140 --> 01:18:21.500 But good question there. 01:18:21.500 --> 01:18:23.960 All right, let's take one more question here. 01:18:23.960 --> 01:18:30.260 AUDIENCE: Imagine if we put in the ID, we don't put any data type, 01:18:30.260 --> 01:18:32.090 it's going to give us-- 01:18:32.090 --> 01:18:37.032 in this schema, it's going to give us a text or-- 01:18:37.032 --> 01:18:38.490 CARTER ZENKE: Yeah, great question. 01:18:38.490 --> 01:18:42.980 So you're asking if I didn't tell SQLite what kind of type affinity 01:18:42.980 --> 01:18:46.220 a column had, what type of affinity would it actually have? 01:18:46.220 --> 01:18:47.360 A great question. 01:18:47.360 --> 01:18:52.460 In this case, by default, SQLite gives the numeric type affinity, 01:18:52.460 --> 01:18:56.250 where numeric can store integers or real values. 01:18:56.250 --> 01:18:59.180 But if you give it like a string of text, like let's say red line, 01:18:59.180 --> 01:19:04.980 it will store that for you, too, kind of non-intuitively, but it will. 01:19:04.980 --> 01:19:08.310 But the default type affinity is numeric in this case, 01:19:08.310 --> 01:19:10.340 if you don't otherwise specify. 01:19:10.340 --> 01:19:11.900 Good question. 01:19:11.900 --> 01:19:17.410 OK, let's come back then and let's focus on wrapping up on a final few pieces 01:19:17.410 --> 01:19:17.910 here. 01:19:17.910 --> 01:19:20.810 So we've seen some table constraints, which 01:19:20.810 --> 01:19:23.360 we apply primary keys and foreign keys. 01:19:23.360 --> 01:19:26.360 We saw column constraints, where we could make sure 01:19:26.360 --> 01:19:28.970 that certain values were given to us through not null. 01:19:28.970 --> 01:19:31.730 We could also make sure that the-- 01:19:31.730 --> 01:19:33.925 let's say the value is in some list of values, 01:19:33.925 --> 01:19:37.460 they're using check or making sure that it's not some value, 01:19:37.460 --> 01:19:39.320 also using check as well. 01:19:39.320 --> 01:19:44.930 Default allows us to specify a given value for every new insertion of rows 01:19:44.930 --> 01:19:45.560 here. 01:19:45.560 --> 01:19:49.550 And this is actually pretty important to have. 01:19:49.550 --> 01:19:53.750 Not just schemas that have column names and let's 01:19:53.750 --> 01:19:57.320 say type affinities as well, but also constraints, 01:19:57.320 --> 01:20:00.210 make sure the data we insert works well for us. 01:20:00.210 --> 01:20:02.960 And there's actually a story behind this person 01:20:02.960 --> 01:20:07.820 who's on the CharlieCard, this person who's on the subway. 01:20:07.820 --> 01:20:11.420 His name is Charlie, and he's perhaps the most famous subway 01:20:11.420 --> 01:20:13.880 rider in all of Boston. 01:20:13.880 --> 01:20:17.900 Back in the 19-- let's say 1900s, the band 01:20:17.900 --> 01:20:21.290 called The Kingston Trio wrote a song about this man named Charlie. 01:20:21.290 --> 01:20:26.030 Charlie, supposedly, got on at the Kendall Square station where MIT is, 01:20:26.030 --> 01:20:27.950 and he made for Jamaica Plain. 01:20:27.950 --> 01:20:30.320 But once he got to Jamaica Plane, the conductor 01:20:30.320 --> 01:20:34.040 asked him for one more nickel, and well, he didn't have that nickel. 01:20:34.040 --> 01:20:36.860 So he got stuck on the train for years, and years, and years. 01:20:36.860 --> 01:20:38.750 He couldn't get off of the subway. 01:20:38.750 --> 01:20:42.532 So keep in mind, Charlie, when you're writing your own database constraints, 01:20:42.532 --> 01:20:44.240 making sure that if you get on the train, 01:20:44.240 --> 01:20:46.610 you're able to get off of it at some point, right? 01:20:46.610 --> 01:20:49.080 Don't end up like Charlie in this case. 01:20:49.080 --> 01:20:53.720 So with this in mind, you're able to design your very own database 01:20:53.720 --> 01:20:56.840 schemas that keep not just certain columns involved, 01:20:56.840 --> 01:20:59.210 but also type affinities for those columns, types-- 01:20:59.210 --> 01:21:01.790 those data types-- types the columns can store. 01:21:01.790 --> 01:21:04.940 You're also able to apply constraints to those columns to make sure 01:21:04.940 --> 01:21:07.130 that the data you're inserting is that you actually 01:21:07.130 --> 01:21:09.380 want to have in that column. 01:21:09.380 --> 01:21:13.100 Next time, we'll focus on actually adding data to our columns, 01:21:13.100 --> 01:21:16.260 actually write data to a database file to insert, 01:21:16.260 --> 01:21:18.720 update, and delete that data altogether. 01:21:18.720 --> 01:21:22.480 So with that in mind, we'll see you next time.