1 00:00:00,000 --> 00:00:04,910 [MUSIC PLAYING] 2 00:00:04,910 --> 00:00:17,155 3 00:00:17,155 --> 00:00:18,780 CARTER ZENKE: Well, hello, one and all. 4 00:00:18,780 --> 00:00:22,890 And welcome back to CS50's Introduction to Databases with SQL. 5 00:00:22,890 --> 00:00:24,360 My name is Carter Zenke. 6 00:00:24,360 --> 00:00:26,820 And last we left off, we learned about relating-- 7 00:00:26,820 --> 00:00:31,360 that is, how to have multiple tables in our database for people, places, 8 00:00:31,360 --> 00:00:34,500 and things, and how to have them relate to one another in the way 9 00:00:34,500 --> 00:00:36,580 you might do in the real world. 10 00:00:36,580 --> 00:00:38,550 Now, today, we'll take a step forward. 11 00:00:38,550 --> 00:00:41,370 And we'll talk about how to put you in the driver's seat, 12 00:00:41,370 --> 00:00:45,870 designing your very own database schemas to organize your data. 13 00:00:45,870 --> 00:00:49,890 Now, we'll pick up where we left off, which is with this database of books. 14 00:00:49,890 --> 00:00:52,050 So we had this database. 15 00:00:52,050 --> 00:00:56,760 It was full of books that have been longlisted for the International Booker 16 00:00:56,760 --> 00:00:57,450 Prize. 17 00:00:57,450 --> 00:01:01,740 To be longlisted means to be nominated for some prize, let's say. 18 00:01:01,740 --> 00:01:05,190 So we had the past five years of books in this database. 19 00:01:05,190 --> 00:01:10,170 And we worked on improving this database over time from week 0 to week 1. 20 00:01:10,170 --> 00:01:13,080 Now, this week, we'll actually take a look underneath the hood 21 00:01:13,080 --> 00:01:17,980 and see what commands we had used to create these varied databases. 22 00:01:17,980 --> 00:01:21,660 So let's reveal now what we had done all along. 23 00:01:21,660 --> 00:01:24,010 I'll go back to my terminal here. 24 00:01:24,010 --> 00:01:26,040 And if you remember, I could use a command 25 00:01:26,040 --> 00:01:29,730 to open up a database file, which was this command here, 26 00:01:29,730 --> 00:01:34,450 SQL sqlite3, and then the name of the file I want to open. 27 00:01:34,450 --> 00:01:35,910 So let's try this. 28 00:01:35,910 --> 00:01:39,420 I'll go back to week 0, to my week 0 folder like this. 29 00:01:39,420 --> 00:01:43,035 And I'll open up my long list database, week0/longlist.db. 30 00:01:43,035 --> 00:01:46,170 31 00:01:46,170 --> 00:01:47,430 I'll hit Enter. 32 00:01:47,430 --> 00:01:49,720 Now, I'm in my sqlite prompt. 33 00:01:49,720 --> 00:01:52,230 So I could work on typing some SQL commands, 34 00:01:52,230 --> 00:01:58,030 SQL statements or queries inside of this terminal prompt here. 35 00:01:58,030 --> 00:02:01,660 So if I want to get a feel for what's inside this database, 36 00:02:01,660 --> 00:02:04,230 we saw I could use a command called select. 37 00:02:04,230 --> 00:02:07,860 So I'll select now some rows from this table. 38 00:02:07,860 --> 00:02:14,370 I'll say SELECT, let's say, the title and also the author columns 39 00:02:14,370 --> 00:02:19,740 from my longlist table, semicolon, and Enter. 40 00:02:19,740 --> 00:02:23,760 Now, I'll see all the titles and all the authors that 41 00:02:23,760 --> 00:02:27,030 were inside of this long list table. 42 00:02:27,030 --> 00:02:29,250 But I only want to peek here. 43 00:02:29,250 --> 00:02:32,760 I only want to see roughly what kind of data is inside. 44 00:02:32,760 --> 00:02:36,130 So I could probably improve this command here. 45 00:02:36,130 --> 00:02:43,530 I could instead say SELECT the author and title columns from longlist-- 46 00:02:43,530 --> 00:02:49,470 from longlist, but limit now to the first five rows we saw before. 47 00:02:49,470 --> 00:02:52,440 Semicolon, and I'll hit Enter on this query. 48 00:02:52,440 --> 00:02:56,590 And now, I'll see only the top five rows. 49 00:02:56,590 --> 00:03:02,860 So I'm able now to see what kind of data is inside my database. 50 00:03:02,860 --> 00:03:08,790 But what I can't yet see is what command was used to create this table 51 00:03:08,790 --> 00:03:11,950 and what kind of data could be stored inside of it. 52 00:03:11,950 --> 00:03:16,140 So let's reveal now what was going on underneath the hood all this time. 53 00:03:16,140 --> 00:03:20,040 I'll say, this new command, a sqlite command-- not a SQL keyword, 54 00:03:20,040 --> 00:03:24,580 but sqlite command called .schema, .schema. 55 00:03:24,580 --> 00:03:28,350 Now, if I hit Enter, I'll see the following-- 56 00:03:28,350 --> 00:03:30,840 the command, the statement, the query that 57 00:03:30,840 --> 00:03:34,260 was used to create this longlist table. 58 00:03:34,260 --> 00:03:37,980 And notice how I have many columns inside this table. 59 00:03:37,980 --> 00:03:43,480 I have an ISBN column, a title column, an author column, and so on. 60 00:03:43,480 --> 00:03:47,340 And each column seems to have some kind of data 61 00:03:47,340 --> 00:03:52,080 that could be stored inside of it like text, or integers, or real, 62 00:03:52,080 --> 00:03:56,270 or floating point values, or decimals, if you're familiar. 63 00:03:56,270 --> 00:04:01,510 So this is how we created the very first version of longlist.db. 64 00:04:01,510 --> 00:04:04,165 But let's also see how we created the second. 65 00:04:04,165 --> 00:04:09,190 So I'll type .quit to leave this version of longlist.db. 66 00:04:09,190 --> 00:04:12,650 And now, let me open up the next version we had created, 67 00:04:12,650 --> 00:04:13,910 which is more relational. 68 00:04:13,910 --> 00:04:17,350 It had tables inside of it that could relate to one another. 69 00:04:17,350 --> 00:04:21,100 So I'm going to type sqlite3, then long-- 70 00:04:21,100 --> 00:04:26,320 sqlite3, and go to week 1, and then type longlist.db. 71 00:04:26,320 --> 00:04:27,700 I'll hit Enter. 72 00:04:27,700 --> 00:04:31,720 Now, I'm on my next version of longlist.db. 73 00:04:31,720 --> 00:04:32,960 Well, what could I do? 74 00:04:32,960 --> 00:04:36,050 I could type SELECT and look at some of the tables in here 75 00:04:36,050 --> 00:04:38,200 to see what kind of data is inside. 76 00:04:38,200 --> 00:04:44,350 I could perhaps say SELECT maybe the names from the authors table 77 00:04:44,350 --> 00:04:48,220 here, from the authors table, and hit semicolon. 78 00:04:48,220 --> 00:04:54,280 Now, I'll see all the names of authors that are inside of the authors table. 79 00:04:54,280 --> 00:04:55,930 I could do the same thing for books. 80 00:04:55,930 --> 00:04:58,150 I could maybe look at the titles of books. 81 00:04:58,150 --> 00:05:05,350 I could say SELECT title from the books table, semicolon, and Enter. 82 00:05:05,350 --> 00:05:10,490 Now, I see all of the titles that are inside my books table. 83 00:05:10,490 --> 00:05:14,230 But what I haven't seen yet is the schema of this database. 84 00:05:14,230 --> 00:05:17,080 It is the way it is organized and the commands 85 00:05:17,080 --> 00:05:19,700 that were used to create these tables. 86 00:05:19,700 --> 00:05:21,730 So let me work on that now. 87 00:05:21,730 --> 00:05:24,460 I'll clear my terminal using Control-L. 88 00:05:24,460 --> 00:05:28,090 And now, let me type .schema again. 89 00:05:28,090 --> 00:05:34,420 I'll type .schema to see what commands were used to create this database. 90 00:05:34,420 --> 00:05:37,630 Hit Enter, and I can see-- 91 00:05:37,630 --> 00:05:39,670 I mean, there are quite a lot of commands here. 92 00:05:39,670 --> 00:05:42,610 Now, if this feels overwhelming, I mean, I'd be right there with you. 93 00:05:42,610 --> 00:05:45,610 This is a lot of commands to parse through and read. 94 00:05:45,610 --> 00:05:49,850 So there's probably a better way to do this. 95 00:05:49,850 --> 00:05:55,660 And one way to try is to type .schema, and then give it some table name. 96 00:05:55,660 --> 00:05:58,930 Let's say, want to understand the schema for just the books table, 97 00:05:58,930 --> 00:06:00,770 like just that for now. 98 00:06:00,770 --> 00:06:04,060 So I'll say .schema, and then the table name-- 99 00:06:04,060 --> 00:06:05,720 books, in this case. 100 00:06:05,720 --> 00:06:07,840 Then I can hit Enter. 101 00:06:07,840 --> 00:06:11,500 Now, I'll see the schema, the organization, 102 00:06:11,500 --> 00:06:15,860 the command that we used to create the books table, in this case. 103 00:06:15,860 --> 00:06:18,250 And notice again, we have several columns-- 104 00:06:18,250 --> 00:06:22,420 ID, ISBN, title, publisher ID, and so on. 105 00:06:22,420 --> 00:06:26,380 Each one has their own kind of data they could support or take 106 00:06:26,380 --> 00:06:30,590 into this column like integers, text, and so on. 107 00:06:30,590 --> 00:06:33,700 So again, what we'll do today is have you all 108 00:06:33,700 --> 00:06:36,640 learn how to write your very own create table 109 00:06:36,640 --> 00:06:41,050 commands to build your very own databases that represent what you 110 00:06:41,050 --> 00:06:44,360 want to represent in the real world. 111 00:06:44,360 --> 00:06:47,860 So let me exit this prompt here. 112 00:06:47,860 --> 00:06:51,370 And let me propose that we'll have a bit of a design challenge 113 00:06:51,370 --> 00:06:55,600 today to actually try to represent a real world entity with some database 114 00:06:55,600 --> 00:06:56,240 here. 115 00:06:56,240 --> 00:06:59,050 And if you're not already familiar, Boston 116 00:06:59,050 --> 00:07:02,110 is perhaps famous for being among the first cities 117 00:07:02,110 --> 00:07:05,560 to have a subway system in the United States. 118 00:07:05,560 --> 00:07:09,730 So here is a picture from the late 1800s of a subway being 119 00:07:09,730 --> 00:07:11,950 built in Boston's city streets. 120 00:07:11,950 --> 00:07:14,170 Underneath the streets here, there would be 121 00:07:14,170 --> 00:07:18,100 trolley cars that would go and transport people across Boston. 122 00:07:18,100 --> 00:07:20,410 Here's another picture of a trolley actually working 123 00:07:20,410 --> 00:07:21,790 underneath the streets. 124 00:07:21,790 --> 00:07:23,920 So people would go down underneath. 125 00:07:23,920 --> 00:07:25,360 They would hop on a trolley. 126 00:07:25,360 --> 00:07:29,950 They'd be able to different parts of Boston, perhaps from Harvard to MIT 127 00:07:29,950 --> 00:07:33,550 or downtown up to, let's say, Braintree or down to Braintree, 128 00:07:33,550 --> 00:07:36,070 which is more south of Boston, for example. 129 00:07:36,070 --> 00:07:39,280 One of the famous stops is the Park Street stop, 130 00:07:39,280 --> 00:07:41,530 which is right down in the middle of Boston, 131 00:07:41,530 --> 00:07:45,010 one of the central hubs of this subway system. 132 00:07:45,010 --> 00:07:50,480 And now, these photos are all from, let's say, the early 1900s, late 1800s, 133 00:07:50,480 --> 00:07:51,230 and so on. 134 00:07:51,230 --> 00:07:54,070 But the subway has gotten a lot more modern since then. 135 00:07:54,070 --> 00:07:59,000 And actually, now, we have several lines that span the entire city and beyond. 136 00:07:59,000 --> 00:08:02,740 So here, we have the Red Line, of which Harvard and MIT are a part. 137 00:08:02,740 --> 00:08:05,800 We have the green line, which brings you kind of west to east-- 138 00:08:05,800 --> 00:08:09,680 the blue line, the orange line, and so on. 139 00:08:09,680 --> 00:08:14,790 So many more lines and stations have been added to this system. 140 00:08:14,790 --> 00:08:18,460 It's a big design challenge to represent all of these stations, 141 00:08:18,460 --> 00:08:24,560 all of these lines, and all of these people who might ride this subway, too. 142 00:08:24,560 --> 00:08:31,220 So the question then becomes, how can we create a schema for this data? 143 00:08:31,220 --> 00:08:36,350 And again, by schema, we mean what kinds of tables should we have, 144 00:08:36,350 --> 00:08:40,610 what kinds of columns might those tables have, and what kind of data 145 00:08:40,610 --> 00:08:44,159 should we put in each of those columns, for instance. 146 00:08:44,159 --> 00:08:46,320 So let me propose this. 147 00:08:46,320 --> 00:08:49,790 Let's say we start just first with names and stations. 148 00:08:49,790 --> 00:08:53,750 So Charlie here, our very first rider on this system, 149 00:08:53,750 --> 00:08:56,810 is going to be at the Kendall at MIT station. 150 00:08:56,810 --> 00:08:58,910 So this is what this table represents now. 151 00:08:58,910 --> 00:09:01,370 But what more could we have? 152 00:09:01,370 --> 00:09:05,750 Well, we might also want to have maybe what Charlie is doing at that station. 153 00:09:05,750 --> 00:09:09,050 Maybe he's entering the station, for instance. 154 00:09:09,050 --> 00:09:11,390 And if you're familiar with the subway system, 155 00:09:11,390 --> 00:09:17,550 you often have to pay to get onto a train or get onto the station itself. 156 00:09:17,550 --> 00:09:23,570 So let's say Charlie pays some fare to enter into the Kendall/MIT station. 157 00:09:23,570 --> 00:09:26,900 Well, back in the mid 1900s, the fare was only about a dime. 158 00:09:26,900 --> 00:09:27,800 It was $0.10. 159 00:09:27,800 --> 00:09:33,820 So we'll say Charlie paid $0.10 to enter the Kendall/MIT station. 160 00:09:33,820 --> 00:09:36,140 And now, this seems pretty good. 161 00:09:36,140 --> 00:09:41,300 But if I am the Transit Authority, the person who runs the subway system, 162 00:09:41,300 --> 00:09:45,890 I probably want to know, does Charlie have enough money to get on the train? 163 00:09:45,890 --> 00:09:48,730 And if so I want to make sure that, OK, well, Charlie actually 164 00:09:48,730 --> 00:09:50,590 could get on this train. 165 00:09:50,590 --> 00:09:53,420 So let's say not only does Charlie pay a fare. 166 00:09:53,420 --> 00:09:56,570 He has some remaining balance afterwards. 167 00:09:56,570 --> 00:10:00,160 So Charlie here has gotten onto the Kendall/MIT stop. 168 00:10:00,160 --> 00:10:05,330 He's paid the fare of $0.10 and has $0.05 left. 169 00:10:05,330 --> 00:10:08,750 OK, so here's a bit of a table. 170 00:10:08,750 --> 00:10:11,030 We probably add more information to it. 171 00:10:11,030 --> 00:10:15,230 Let's say Charlie then leaves at the Jamaica Plain stop. 172 00:10:15,230 --> 00:10:18,650 And the fare to leave is about a nickel, $0.05. 173 00:10:18,650 --> 00:10:21,410 And now, Charlie has no cents left over. 174 00:10:21,410 --> 00:10:27,680 So again, Charlie paid $0.10 to get on, had $0.05 left, paid $0.05 to get off, 175 00:10:27,680 --> 00:10:31,930 and now has no remaining balance here anymore. 176 00:10:31,930 --> 00:10:32,440 OK. 177 00:10:32,440 --> 00:10:34,040 So that's Charlie's story. 178 00:10:34,040 --> 00:10:35,860 Let's look at Alice, though. 179 00:10:35,860 --> 00:10:38,530 Let's say Alice gets on at the Harvard stop. 180 00:10:38,530 --> 00:10:42,310 They too pay $0.10 to get on at the Harvard stop. 181 00:10:42,310 --> 00:10:45,370 And they have a remaining balance of $0.20. 182 00:10:45,370 --> 00:10:49,510 Alice will go, let's say, to Park Street, get off at Park Street, 183 00:10:49,510 --> 00:10:51,160 pay the nickel to leave. 184 00:10:51,160 --> 00:10:55,430 And now, they'll have a balance of $0.15 at the end. 185 00:10:55,430 --> 00:10:56,620 Let's go to Bob. 186 00:10:56,620 --> 00:10:59,260 Bob enters the Alewife station. 187 00:10:59,260 --> 00:11:00,940 They pay $0.10. 188 00:11:00,940 --> 00:11:02,800 They have remaining balance of $0.30. 189 00:11:02,800 --> 00:11:08,337 And let's say they leave at Park Street and have a fare of $0.10 to leave 190 00:11:08,337 --> 00:11:09,670 because it's a further distance. 191 00:11:09,670 --> 00:11:14,490 Now, they'll have a running balance of $0.20 overall. 192 00:11:14,490 --> 00:11:18,840 So this table is OK, I might admit. 193 00:11:18,840 --> 00:11:23,100 I mean, last time, we learned about having what we called primary keys 194 00:11:23,100 --> 00:11:24,340 and foreign keys. 195 00:11:24,340 --> 00:11:26,220 So it seems like that's missing here. 196 00:11:26,220 --> 00:11:28,020 Let's go ahead and add that here. 197 00:11:28,020 --> 00:11:32,940 I'll give each row a unique ID, so I can know who entered, who exited, 198 00:11:32,940 --> 00:11:35,580 and so on, give that a unique ID here. 199 00:11:35,580 --> 00:11:41,950 But I might even say that this table could really be improved substantially. 200 00:11:41,950 --> 00:11:49,410 And I want to ask you what redundancies or inefficiencies do you see here? 201 00:11:49,410 --> 00:11:52,680 If we're trying to represent riders and stations, 202 00:11:52,680 --> 00:11:56,990 what can we improve about this design? 203 00:11:56,990 --> 00:12:01,740 AUDIENCE: So probably the redundancy will be the names and the stations, 204 00:12:01,740 --> 00:12:02,240 too. 205 00:12:02,240 --> 00:12:06,760 For example, if Charlie will go to the train daily, 206 00:12:06,760 --> 00:12:10,875 then he will become most of the names in the data. 207 00:12:10,875 --> 00:12:12,250 CARTER ZENKE: Yeah, a good point. 208 00:12:12,250 --> 00:12:13,450 If I'm hearing what you're saying, [? Loren, ?] 209 00:12:13,450 --> 00:12:16,420 let me show you some examples that I highlighted here. 210 00:12:16,420 --> 00:12:20,290 One example could be, to your point, about these names. 211 00:12:20,290 --> 00:12:24,370 These names seem to be telling us the name of a person. 212 00:12:24,370 --> 00:12:27,640 But here, we have only three names-- 213 00:12:27,640 --> 00:12:29,680 Charlie, Alice, and Bob. 214 00:12:29,680 --> 00:12:32,170 Well, my name is Carter. 215 00:12:32,170 --> 00:12:35,110 And what if somebody else named Carter also 216 00:12:35,110 --> 00:12:37,810 tried to get on and leave at some stop? 217 00:12:37,810 --> 00:12:39,820 Well, I wouldn't be able to know which Carter 218 00:12:39,820 --> 00:12:44,210 was which or which Charlie was which, which Alice was which, and so on. 219 00:12:44,210 --> 00:12:47,920 So we probably need a way to represent people and their names a little better 220 00:12:47,920 --> 00:12:50,220 here, too. 221 00:12:50,220 --> 00:12:55,870 What other ideas do we have for how to improve the design of this table? 222 00:12:55,870 --> 00:13:00,330 AUDIENCE: Yes, I think we can have a singular ID for a singular person. 223 00:13:00,330 --> 00:13:02,998 That way, we'll be better able to track their activities. 224 00:13:02,998 --> 00:13:03,790 CARTER ZENKE: Nice. 225 00:13:03,790 --> 00:13:06,555 So we probably have an ID for each person, 226 00:13:06,555 --> 00:13:10,830 a bit what we learned about last week, putting people in their own table 227 00:13:10,830 --> 00:13:14,890 and giving them their own unique ID, a primary key. 228 00:13:14,890 --> 00:13:16,380 Let's show that here. 229 00:13:16,380 --> 00:13:18,150 I'll go to some slides. 230 00:13:18,150 --> 00:13:21,345 And I'll pick out one that shows us just riders. 231 00:13:21,345 --> 00:13:23,220 So to your point, [? Soqanya, ?] we could try 232 00:13:23,220 --> 00:13:25,830 to have maybe a table for just riders. 233 00:13:25,830 --> 00:13:29,130 And maybe to simplify, this table has only two columns. 234 00:13:29,130 --> 00:13:32,850 It has a column for ID and a column for name. 235 00:13:32,850 --> 00:13:38,250 So here, we have Charlie, Alice, and Bob all in their own table. 236 00:13:38,250 --> 00:13:42,270 Well, let me propose to you, we could do the same thing for stations. 237 00:13:42,270 --> 00:13:44,760 Let's say we have a table of stations now. 238 00:13:44,760 --> 00:13:48,030 And we give each one their very own ID as well, 239 00:13:48,030 --> 00:13:50,400 our own primary key for this table. 240 00:13:50,400 --> 00:13:52,540 We have Harvard, Kendall, and Park Street. 241 00:13:52,540 --> 00:13:57,130 We can differentiate between them using their IDs here. 242 00:13:57,130 --> 00:14:00,000 So a few improvements could be made. 243 00:14:00,000 --> 00:14:04,560 And as we're making these improvements, splitting one table into many, 244 00:14:04,560 --> 00:14:08,010 and debating what kind of data to store in each table, 245 00:14:08,010 --> 00:14:12,180 the process we're going through is one called normalizing. 246 00:14:12,180 --> 00:14:14,640 We're normalizing our data. 247 00:14:14,640 --> 00:14:18,210 To normalize means to reduce redundancies, effectively, 248 00:14:18,210 --> 00:14:19,530 to take a table-- 249 00:14:19,530 --> 00:14:22,230 take one table, for instance, split up into multiple, 250 00:14:22,230 --> 00:14:26,550 and have each entity be part of its very own table. 251 00:14:26,550 --> 00:14:31,590 Some academics in the world have named different normal forms, quote unquote. 252 00:14:31,590 --> 00:14:36,120 There's like first normal form, second normal form, third normal form, 253 00:14:36,120 --> 00:14:39,420 this progression of making your day more and more efficient. 254 00:14:39,420 --> 00:14:41,160 You can use those as heuristics. 255 00:14:41,160 --> 00:14:43,770 But end of the day, a few principles might apply. 256 00:14:43,770 --> 00:14:48,750 First, take your entities, like in this case stations and riders, 257 00:14:48,750 --> 00:14:51,840 and put them each in their own table. 258 00:14:51,840 --> 00:14:56,760 And if you add more data, make sure that if I were to add a column, let's say, 259 00:14:56,760 --> 00:15:00,780 to riders, it is only a data point about riders-- 260 00:15:00,780 --> 00:15:05,190 not about stations, not about fares, only about riders. 261 00:15:05,190 --> 00:15:08,220 And that process can help us make a data set that 262 00:15:08,220 --> 00:15:14,050 is more dynamic, more easy to reproduce, and more easy to write queries on. 263 00:15:14,050 --> 00:15:18,090 So that is the process here of normalizing. 264 00:15:18,090 --> 00:15:18,960 OK. 265 00:15:18,960 --> 00:15:25,830 So if we have now riders and stations, we want to represent them in our table. 266 00:15:25,830 --> 00:15:28,050 Well, we could use what we learned about relating 267 00:15:28,050 --> 00:15:30,960 last week to ask, how could we actually represent 268 00:15:30,960 --> 00:15:33,790 these riders and these stations? 269 00:15:33,790 --> 00:15:36,640 So let's say here I can have riders and stations. 270 00:15:36,640 --> 00:15:40,330 I want to make sure that I have the right relationship between them. 271 00:15:40,330 --> 00:15:42,360 Well, if you're familiar with subways, we 272 00:15:42,360 --> 00:15:45,870 might say that a rider goes to one station. 273 00:15:45,870 --> 00:15:50,070 And this big T here is the symbol for a station here in Boston, 274 00:15:50,070 --> 00:15:52,890 for the T's that we call it, for the subway. 275 00:15:52,890 --> 00:15:56,010 So a rider might go to one station. 276 00:15:56,010 --> 00:15:58,530 But of course, that might not be the full picture. 277 00:15:58,530 --> 00:16:01,660 A rider also gets off at some station on. 278 00:16:01,660 --> 00:16:06,990 So a rider could be associated with not just one station, but multiple. 279 00:16:06,990 --> 00:16:10,380 And if you're familiar, at least with any subway system or the Boston one, 280 00:16:10,380 --> 00:16:12,660 too, it can often get pretty busy. 281 00:16:12,660 --> 00:16:16,890 And so riders might not just go to, of course, one station or two. 282 00:16:16,890 --> 00:16:22,750 Stations could also have multiple riders that are on a particular station here. 283 00:16:22,750 --> 00:16:28,290 So to recap, one rider might be associated with more than one station. 284 00:16:28,290 --> 00:16:33,030 They might get on at this first one and get off at this later one. 285 00:16:33,030 --> 00:16:37,350 But each station could presumably have more than one rider. 286 00:16:37,350 --> 00:16:41,130 Each station here could have rider A or rider B, the rider 287 00:16:41,130 --> 00:16:47,050 up here or the rider down below, and even many more than that as well. 288 00:16:47,050 --> 00:16:50,550 So to put it in the language of our ER diagrams, 289 00:16:50,550 --> 00:16:53,160 our entity relation diagrams from last week, 290 00:16:53,160 --> 00:16:57,840 we could look at it bit like this, where we have riders and stations. 291 00:16:57,840 --> 00:17:02,550 Riders visit stations, and they're associated like this. 292 00:17:02,550 --> 00:17:07,540 A rider must have at least one station associated with them. 293 00:17:07,540 --> 00:17:09,270 That's what this horizontal line means. 294 00:17:09,270 --> 00:17:13,140 If they aren't at a station, they aren't really a rider, right? 295 00:17:13,140 --> 00:17:16,960 A rider though could have many stations associated with them. 296 00:17:16,960 --> 00:17:19,470 That's what this three prongs down here means. 297 00:17:19,470 --> 00:17:22,440 They could have one, two, three, four, they 298 00:17:22,440 --> 00:17:26,290 could have many stations they get on and get off of. 299 00:17:26,290 --> 00:17:31,030 Now a station could have anywhere between zero riders, 300 00:17:31,030 --> 00:17:35,290 if it's maybe out of commission or isn't very popular, upwards to many. 301 00:17:35,290 --> 00:17:39,730 It could have two, three, four, five, even hundreds of riders associated 302 00:17:39,730 --> 00:17:42,070 with this particular station. 303 00:17:42,070 --> 00:17:47,350 So here is our entity relation diagram for these particular riders 304 00:17:47,350 --> 00:17:50,030 and these stations here. 305 00:17:50,030 --> 00:17:55,660 So let me ask, what questions do we have on these relationships between riders 306 00:17:55,660 --> 00:17:59,700 and stations and how to design this table so far? 307 00:17:59,700 --> 00:18:01,450 AUDIENCE: Then I want to ask that you have 308 00:18:01,450 --> 00:18:06,370 used the same ID for stations and riders so that maybe 309 00:18:06,370 --> 00:18:08,895 give us a problem in coding? 310 00:18:08,895 --> 00:18:10,520 CARTER ZENKE: Yeah, a good observation. 311 00:18:10,520 --> 00:18:14,950 So you might have noticed that in the riders table and in the stations table, 312 00:18:14,950 --> 00:18:17,140 I gave the same kind of ID. 313 00:18:17,140 --> 00:18:19,602 Like I had one, two, three for each of them. 314 00:18:19,602 --> 00:18:21,310 And let me just show you that again here. 315 00:18:21,310 --> 00:18:24,950 I'll come back to some slides, and I'll show you again, 316 00:18:24,950 --> 00:18:30,730 the riders table where we had Charlie, Alice, and Bob, ID 1, 2, 3. 317 00:18:30,730 --> 00:18:31,870 Same for the stations. 318 00:18:31,870 --> 00:18:36,730 We had stations Harvard, Kendall, Park Street, ID 1, 2, 3. 319 00:18:36,730 --> 00:18:39,950 And to your question, isn't that a problem? 320 00:18:39,950 --> 00:18:43,540 Well, I would argue in this case, it's not, 321 00:18:43,540 --> 00:18:48,460 so long as we keep clear that these IDs are for stations 322 00:18:48,460 --> 00:18:50,800 and these IDs are for riders. 323 00:18:50,800 --> 00:18:54,550 And we'll see how to do that using our SQL keywords later on. 324 00:18:54,550 --> 00:18:59,500 But again, so long as we have an ID just for our riders and an ID 325 00:18:59,500 --> 00:19:02,770 just for our stations, we can keep these separate 326 00:19:02,770 --> 00:19:05,560 even if they might have the same values. 327 00:19:05,560 --> 00:19:07,990 But a great question here. 328 00:19:07,990 --> 00:19:10,440 Let's take just one more. 329 00:19:10,440 --> 00:19:12,870 AUDIENCE: Regarding the entity relationship diagram, 330 00:19:12,870 --> 00:19:17,340 how is it possible for a station to have a possibility of zero riders, 331 00:19:17,340 --> 00:19:21,303 but riders must compulsorily have at least one station. 332 00:19:21,303 --> 00:19:22,720 CARTER ZENKE: Yeah, good question. 333 00:19:22,720 --> 00:19:25,470 So this might be up to you and how you formulate it, 334 00:19:25,470 --> 00:19:28,380 but for me, let me show that diagram again. 335 00:19:28,380 --> 00:19:32,280 I'll go back to over here. 336 00:19:32,280 --> 00:19:36,540 In my mind, to be a rider, you have to visit a station. 337 00:19:36,540 --> 00:19:39,780 If you aren't visiting a station, you aren't really a rider, right? 338 00:19:39,780 --> 00:19:43,740 Now presumably, there are stations that were built but aren't really being 339 00:19:43,740 --> 00:19:46,560 used right now or aren't really in service yet. 340 00:19:46,560 --> 00:19:49,230 That could be a station that has no visitors. 341 00:19:49,230 --> 00:19:50,910 So you could argue-- 342 00:19:50,910 --> 00:19:54,090 let's make sure every station has at least one rider 343 00:19:54,090 --> 00:19:56,910 and every rider may or may not have to visit a station. 344 00:19:56,910 --> 00:19:59,640 For that I would say, we could probably reasonably 345 00:19:59,640 --> 00:20:03,300 disagree there and talk about how we could represent the diagram here, too. 346 00:20:03,300 --> 00:20:09,940 But a great observation and a good other critique of this system here. 347 00:20:09,940 --> 00:20:11,150 All right. 348 00:20:11,150 --> 00:20:16,000 So let's now turn to representing this in our database. 349 00:20:16,000 --> 00:20:19,180 I'll go back to my computer and we'll learn 350 00:20:19,180 --> 00:20:23,440 about this new SQL keyword, SQL statement, 351 00:20:23,440 --> 00:20:26,200 this one called Create Table. 352 00:20:26,200 --> 00:20:31,400 Create table allows us to, as the name suggests, create a brand new table. 353 00:20:31,400 --> 00:20:34,450 So let's do just that in our new database 354 00:20:34,450 --> 00:20:37,480 to represent riders and stations. 355 00:20:37,480 --> 00:20:42,820 I'll go into my terminal, and I want to make a brand new database. 356 00:20:42,820 --> 00:20:49,120 I'll call this one mbta.db, because mbta stands for the Massachusetts Bay 357 00:20:49,120 --> 00:20:52,630 Transportation Authority, the people who run the subway, essentially. 358 00:20:52,630 --> 00:20:57,610 So I'll do sqlite3, mbta.db, hit Enter, and I'll 359 00:20:57,610 --> 00:21:04,220 type Y to say yes, I want to create this brand new database. 360 00:21:04,220 --> 00:21:09,010 Now if I type dot schema, I see nothing. 361 00:21:09,010 --> 00:21:10,150 But that's expected. 362 00:21:10,150 --> 00:21:11,530 I don't have any tables yet. 363 00:21:11,530 --> 00:21:13,870 I have nothing inside this database. 364 00:21:13,870 --> 00:21:17,350 It's up to me to create these tables myself. 365 00:21:17,350 --> 00:21:22,240 So what I'll do is clear my terminal, and let's start first with riders. 366 00:21:22,240 --> 00:21:23,890 I might create a table for riders. 367 00:21:23,890 --> 00:21:26,440 I'll say, create table. 368 00:21:26,440 --> 00:21:29,500 And now I have to give that table some name. 369 00:21:29,500 --> 00:21:31,960 I might call it riders here. 370 00:21:31,960 --> 00:21:36,310 And then in parentheses, like this, I can specify what 371 00:21:36,310 --> 00:21:39,500 columns should be part of this table. 372 00:21:39,500 --> 00:21:40,540 So let's start first. 373 00:21:40,540 --> 00:21:44,320 I'll hit Enter here and continue this query. 374 00:21:44,320 --> 00:21:47,770 Now, all by convention, I'll just indent four spaces-- 375 00:21:47,770 --> 00:21:52,760 one, two, three, four, and I'll give an ID to each of these riders 376 00:21:52,760 --> 00:21:54,430 as we saw before. 377 00:21:54,430 --> 00:21:58,180 I'll say ID here is one of my columns. 378 00:21:58,180 --> 00:22:04,300 Now to create a new column, I'll follow this up with a comma and hit Enter. 379 00:22:04,300 --> 00:22:08,350 I'll again, by convention, for style, just indent four spaces. 380 00:22:08,350 --> 00:22:09,730 And what's my next column? 381 00:22:09,730 --> 00:22:12,160 Perhaps a name for these riders. 382 00:22:12,160 --> 00:22:17,530 I'll give this column the name, name, and I'll leave it at that. 383 00:22:17,530 --> 00:22:22,660 Once I'm done adding columns, I no longer need to have a comma. 384 00:22:22,660 --> 00:22:26,320 I could simply close out this query, this statement. 385 00:22:26,320 --> 00:22:29,740 I could hit Enter here, say close in parentheses 386 00:22:29,740 --> 00:22:33,400 to close the top parentheses here, semicolon, hit Enter. 387 00:22:33,400 --> 00:22:36,730 And now nothing seems to happen. 388 00:22:36,730 --> 00:22:38,330 But that's actually a good sign. 389 00:22:38,330 --> 00:22:41,290 So let me type dot schema, hit Enter, and I'll 390 00:22:41,290 --> 00:22:44,650 see the result of that statement before. 391 00:22:44,650 --> 00:22:48,820 Create table, if it doesn't already exist, riders, and riders 392 00:22:48,820 --> 00:22:53,590 has inside of it two columns, ID and name. 393 00:22:53,590 --> 00:22:54,740 OK, let's keep going. 394 00:22:54,740 --> 00:22:56,800 Let's make one for stations, too. 395 00:22:56,800 --> 00:22:59,380 I'll clear my terminal and I'll say, create me 396 00:22:59,380 --> 00:23:02,740 a table called stations and include-- 397 00:23:02,740 --> 00:23:04,330 actually, not station. 398 00:23:04,330 --> 00:23:09,140 If you ever want to, let's say fix this kind of table here, 399 00:23:09,140 --> 00:23:14,500 let me try closing the parentheses, hit semicolon, Enter. 400 00:23:14,500 --> 00:23:15,850 I'll get a syntax error. 401 00:23:15,850 --> 00:23:17,050 I can restart. 402 00:23:17,050 --> 00:23:22,180 I'll do Control L. Now I'll do create table, stations, plural. 403 00:23:22,180 --> 00:23:24,250 Open parentheses, Enter. 404 00:23:24,250 --> 00:23:25,870 I'll indent by four spaces. 405 00:23:25,870 --> 00:23:27,430 One, two, three, four. 406 00:23:27,430 --> 00:23:31,870 And now, I'll similarly include an ID for each of these stations. 407 00:23:31,870 --> 00:23:36,220 I'll say ID comma, and then what all should I have? 408 00:23:36,220 --> 00:23:40,840 Well, stations tend to have a name, like the Kendall MIT station, 409 00:23:40,840 --> 00:23:43,480 the Harvard station, the Park Street Station. 410 00:23:43,480 --> 00:23:48,760 So I'll say, I'll give each of these their very own name, comma. 411 00:23:48,760 --> 00:23:50,770 What else do stations have? 412 00:23:50,770 --> 00:23:54,220 Well, they might also have a line that they're on. 413 00:23:54,220 --> 00:23:58,700 Let's say it's the red line, or the blue line, or the green line and so on. 414 00:23:58,700 --> 00:24:03,720 I'll have a space for them to write in their line that they're a part of. 415 00:24:03,720 --> 00:24:06,160 OK, and maybe I'll leave it at that to keep it simple. 416 00:24:06,160 --> 00:24:09,490 I'll say, stations have an ID, a name and a line. 417 00:24:09,490 --> 00:24:10,750 Now I'll close this out. 418 00:24:10,750 --> 00:24:15,430 I'll say end parentheses, semicolon, hit Enter, and nothing seems to happen. 419 00:24:15,430 --> 00:24:20,770 But if I type dot schema, I'll now see my riders and my stations 420 00:24:20,770 --> 00:24:24,870 tables inside of my database. 421 00:24:24,870 --> 00:24:26,760 Now one more step here. 422 00:24:26,760 --> 00:24:31,920 We have riders, we have stations, but we have to relate to them. 423 00:24:31,920 --> 00:24:34,800 We have to relate them using this many to many relationships, 424 00:24:34,800 --> 00:24:37,420 as we saw last week. 425 00:24:37,420 --> 00:24:40,950 So let me try making a table to implement 426 00:24:40,950 --> 00:24:43,110 this many to many relationship. 427 00:24:43,110 --> 00:24:47,400 And if you remember, we might call this kind of table a junction table, 428 00:24:47,400 --> 00:24:50,040 an associative entity, a join table. 429 00:24:50,040 --> 00:24:53,730 It has a lot of names, but it looks a bit like this. 430 00:24:53,730 --> 00:24:58,710 I'll create this new table to represent, let's say, visits. 431 00:24:58,710 --> 00:25:01,110 A rider visits a station. 432 00:25:01,110 --> 00:25:03,780 So I'll call this table visits. 433 00:25:03,780 --> 00:25:09,840 And inside, I'll make sure it has two columns, one for a rider 434 00:25:09,840 --> 00:25:17,130 ID to represent a rider, and one, let's say, for a station ID, 435 00:25:17,130 --> 00:25:19,020 to represent a station. 436 00:25:19,020 --> 00:25:24,630 Now when I see a rider ID next to a station ID in this table, 437 00:25:24,630 --> 00:25:27,870 I'll know the rider with that certain ID visited 438 00:25:27,870 --> 00:25:30,180 the station with that certain ID. 439 00:25:30,180 --> 00:25:31,500 So I'll close this out. 440 00:25:31,500 --> 00:25:36,330 I'll say end, parentheses here, semicolon, Enter, and finally, clear 441 00:25:36,330 --> 00:25:38,940 my terminal, type dot schema. 442 00:25:38,940 --> 00:25:45,000 And I can see, I have riders, stations, and visits between riders and stations 443 00:25:45,000 --> 00:25:48,630 in this associative entity, this junction table or a joined table. 444 00:25:48,630 --> 00:25:51,990 Up to you what you might want to call it in this case. 445 00:25:51,990 --> 00:25:55,840 Now what questions do we have? 446 00:25:55,840 --> 00:25:58,480 AUDIENCE: Why we have not use the primary key 447 00:25:58,480 --> 00:26:00,563 and secondary key in this table? 448 00:26:00,563 --> 00:26:01,730 CARTER ZENKE: Good question. 449 00:26:01,730 --> 00:26:03,605 So we're going to get there in just a minute. 450 00:26:03,605 --> 00:26:07,120 But if I look back at my terminal here, my schema, I'll see, 451 00:26:07,120 --> 00:26:09,340 I really just have column names. 452 00:26:09,340 --> 00:26:13,840 And we saw before and we typed dot schema on our longlist.db, 453 00:26:13,840 --> 00:26:16,150 we had more than just column names. 454 00:26:16,150 --> 00:26:19,400 We had column names, we had perhaps data types, 455 00:26:19,400 --> 00:26:21,807 we had primary keys and foreign keys. 456 00:26:21,807 --> 00:26:23,390 So we'll get to that in just a minute. 457 00:26:23,390 --> 00:26:28,300 But suffice to say for now, we're going to keep improving this over time. 458 00:26:28,300 --> 00:26:30,580 Let's take one more. 459 00:26:30,580 --> 00:26:33,300 AUDIENCE: Is it required to put spaces-- 460 00:26:33,300 --> 00:26:37,422 the four spaces indents, or that's just for the visual look? 461 00:26:37,422 --> 00:26:38,880 CARTER ZENKE: Yeah, great question. 462 00:26:38,880 --> 00:26:43,260 Is it required to have these four spaces before each column name. 463 00:26:43,260 --> 00:26:46,720 And in fact, no, it's not, but it makes the code more readable. 464 00:26:46,720 --> 00:26:49,590 So I could put this all in one line-- 465 00:26:49,590 --> 00:26:51,270 I shouldn't, but I could. 466 00:26:51,270 --> 00:26:55,620 And if I have instead this new line followed by four spaces, 467 00:26:55,620 --> 00:27:00,390 I can make this more readable for myself and for my colleagues, too. 468 00:27:00,390 --> 00:27:02,060 Good question. 469 00:27:02,060 --> 00:27:07,430 OK, so to our earlier point, there are things 470 00:27:07,430 --> 00:27:09,680 that are missing from this schema. 471 00:27:09,680 --> 00:27:13,800 Like, we have column names, but as we saw before, 472 00:27:13,800 --> 00:27:16,550 we should ideally specify what kind of data 473 00:27:16,550 --> 00:27:20,870 should be able to go into each of these columns. 474 00:27:20,870 --> 00:27:24,990 And for that, we'll need some new ideas to talk about here. 475 00:27:24,990 --> 00:27:30,950 So let's focus now on this idea of data types and storage classes. 476 00:27:30,950 --> 00:27:36,680 Data types and storage classes are two very similar but distinct IDs, 477 00:27:36,680 --> 00:27:40,230 and they're distinct in a way we'll talk about in just a minute. 478 00:27:40,230 --> 00:27:45,770 Now SQLite has five storage classes, five kind of storage-- 479 00:27:45,770 --> 00:27:49,520 kind of type, so to speak, of values that can hold. 480 00:27:49,520 --> 00:27:51,920 So let's talk about the first one, null, for instance. 481 00:27:51,920 --> 00:27:54,230 Null in this case means nothing. 482 00:27:54,230 --> 00:27:57,170 There's nothing that actually is inside of this value. 483 00:27:57,170 --> 00:27:59,990 It's kind of a central value to mean nothing is here. 484 00:27:59,990 --> 00:28:04,700 Integer means a whole number, like 1, 2, 3, 4, or 5. 485 00:28:04,700 --> 00:28:10,880 Real talks about decimals like floating points like 1.2 or 2.4 and so on. 486 00:28:10,880 --> 00:28:12,740 Text is used for characters. 487 00:28:12,740 --> 00:28:19,740 And blob, kind of a funny one, blob stands for Binary Large Object, 488 00:28:19,740 --> 00:28:24,570 and it represents the data exactly as I give it to this value. 489 00:28:24,570 --> 00:28:30,800 If I tell it to store 101010, it will store exactly 101010 in binary. 490 00:28:30,800 --> 00:28:34,220 So useful for storing, in this case like images and video files 491 00:28:34,220 --> 00:28:37,160 and audio files, things that have some structure 492 00:28:37,160 --> 00:28:40,030 we don't want to mess around with. 493 00:28:40,030 --> 00:28:44,290 Now let's focus on this idea of a storage class. 494 00:28:44,290 --> 00:28:50,830 These, I'll say, are storage classes and not data types in SQLite. 495 00:28:50,830 --> 00:28:58,390 Now a storage class like integer can comprise, can hold several data types. 496 00:28:58,390 --> 00:29:01,480 Notice how there are seven different data 497 00:29:01,480 --> 00:29:05,380 types that could be stored under this integer storage class. 498 00:29:05,380 --> 00:29:10,960 We have a 6-byte integer, 2-byte integer, a 8 and 0 and so on. 499 00:29:10,960 --> 00:29:13,750 It could be any of these particular types, 500 00:29:13,750 --> 00:29:19,330 but each of these types under the umbrella of this integer storage class. 501 00:29:19,330 --> 00:29:23,740 And SQLite itself will take care of making sure 502 00:29:23,740 --> 00:29:25,750 it uses the appropriate data type. 503 00:29:25,750 --> 00:29:30,910 Like if I give a very large number, like let's say 4 billion or 6 billion 504 00:29:30,910 --> 00:29:35,200 or even bigger than that, it will probably use a longer-- 505 00:29:35,200 --> 00:29:39,010 that is, a bigger byte integer to store that kind of value. 506 00:29:39,010 --> 00:29:42,610 If I give it a smaller one, like 1, 2, 3, or 4, 507 00:29:42,610 --> 00:29:47,140 it will probably use a 1-byte or a 2-byte integer for that. 508 00:29:47,140 --> 00:29:50,560 But SQLite's idea is that I, as a programmer, 509 00:29:50,560 --> 00:29:54,910 shouldn't have to care if I use an 8-byte or a 1-byte or a 2-byte integer, 510 00:29:54,910 --> 00:29:58,180 I just care that I'm using integers, whole numbers, 511 00:29:58,180 --> 00:30:03,670 and they give me a storage class to use any of these up to their choice 512 00:30:03,670 --> 00:30:05,960 here as well. 513 00:30:05,960 --> 00:30:10,930 Now let's look at a few examples of values in SQLite that we could store. 514 00:30:10,930 --> 00:30:14,590 Well, we have perhaps the red line as some text. 515 00:30:14,590 --> 00:30:16,690 And because this is characters, it's quoted, 516 00:30:16,690 --> 00:30:22,330 we could use the text storage class to represent this particular value here. 517 00:30:22,330 --> 00:30:25,630 We could have maybe an image, and to the earlier point, 518 00:30:25,630 --> 00:30:29,350 we could say, well, this image might be best represented 519 00:30:29,350 --> 00:30:33,490 using a blob, a binary large object, to keep all of these pixels 520 00:30:33,490 --> 00:30:36,790 exactly as they are in this image. 521 00:30:36,790 --> 00:30:40,330 But we do get some choice, some interesting design 522 00:30:40,330 --> 00:30:43,720 challenges when we look at the idea of fares. 523 00:30:43,720 --> 00:30:50,230 So let's say to our point earlier, fares are $0.10 back in the 1950s or so. 524 00:30:50,230 --> 00:30:55,960 Well, $0.10 we could store as an integer, which seems just fine. 525 00:30:55,960 --> 00:30:58,480 But this could get confused. 526 00:30:58,480 --> 00:31:00,790 I'm talking about dollars here or cents? 527 00:31:00,790 --> 00:31:03,205 Maybe it would be better, let's say, if I did this. 528 00:31:03,205 --> 00:31:07,610 A dollar sign, 0.10. 529 00:31:07,610 --> 00:31:09,680 And what might that be stored as? 530 00:31:09,680 --> 00:31:11,630 Well, probably text, right? 531 00:31:11,630 --> 00:31:14,840 I could say this dollar sign isn't really a number, 532 00:31:14,840 --> 00:31:18,140 but now I have to include it, so I'll say this will be quoted, essentially, 533 00:31:18,140 --> 00:31:22,358 as dollar sign 0.10. 534 00:31:22,358 --> 00:31:23,900 Now there's some downsides here, too. 535 00:31:23,900 --> 00:31:25,550 Like let's say I have-- 536 00:31:25,550 --> 00:31:26,840 I want to add these up. 537 00:31:26,840 --> 00:31:28,580 Well, I can't add up text. 538 00:31:28,580 --> 00:31:33,140 Like, what does it mean to say dollar sign 0.10 plus dollar sign 0.20. 539 00:31:33,140 --> 00:31:35,630 I can't do math with text. 540 00:31:35,630 --> 00:31:41,090 So maybe it'll be better after all if I used a real or a decimal like this, 541 00:31:41,090 --> 00:31:43,970 0.10. 542 00:31:43,970 --> 00:31:46,880 But I mean even here, you'll run into some problems. 543 00:31:46,880 --> 00:31:50,780 If you are familiar with how the number is represented in binary, 544 00:31:50,780 --> 00:31:54,380 you might know that decimal values or floating point values 545 00:31:54,380 --> 00:31:57,020 can't be perfectly, precisely represented. 546 00:31:57,020 --> 00:32:04,850 And if I talk about 0.10, the computer might store 0.10000056789. 547 00:32:04,850 --> 00:32:09,450 It could get very wonky out to the many, many decimal digits down below here. 548 00:32:09,450 --> 00:32:13,460 So trade offs and challenges overall. 549 00:32:13,460 --> 00:32:15,080 Let's look at these three, though. 550 00:32:15,080 --> 00:32:17,480 I have the first one to store as an integer. 551 00:32:17,480 --> 00:32:23,000 I'm trying to store fares here, second one as text, and the third one 552 00:32:23,000 --> 00:32:27,320 as a floating point or a real in this case. 553 00:32:27,320 --> 00:32:30,350 Let me ask for some opinions here, which one 554 00:32:30,350 --> 00:32:36,297 would you use and why, for trying represent fares in this case? 555 00:32:36,297 --> 00:32:37,130 AUDIENCE: Thank you. 556 00:32:37,130 --> 00:32:41,700 I prefer using integers because of course, 557 00:32:41,700 --> 00:32:44,690 I need to get the calculation very accurately. 558 00:32:44,690 --> 00:32:45,830 That's my point of view. 559 00:32:45,830 --> 00:32:50,030 Well, sometimes I can use float but, you know, 560 00:32:50,030 --> 00:32:52,310 like you said before, it can get very wonky 561 00:32:52,310 --> 00:32:59,815 if I really need that kind of precision, I don't really recommend using floats. 562 00:32:59,815 --> 00:33:01,190 CARTER ZENKE: Yeah, a good point. 563 00:33:01,190 --> 00:33:04,820 So if I go back to some slides here, you might argue for the integer 564 00:33:04,820 --> 00:33:07,940 because you know you can precisely represent integers. 565 00:33:07,940 --> 00:33:12,290 And let's say I want to add up fares over a lot, a lot of riders. 566 00:33:12,290 --> 00:33:15,020 This might be useful for me because I know that each number will 567 00:33:15,020 --> 00:33:16,820 be perfectly, precisely represented. 568 00:33:16,820 --> 00:33:20,720 I can do lots of big kind of math with this number here. 569 00:33:20,720 --> 00:33:24,260 To your point, this decimal might kind of, as you said, 570 00:33:24,260 --> 00:33:26,840 get wonky later on towards the later decimal points. 571 00:33:26,840 --> 00:33:30,980 I might get some unexpected results if we add up these overall. 572 00:33:30,980 --> 00:33:36,380 Let me ask, though, are there any proponents of this floating point 573 00:33:36,380 --> 00:33:40,120 value or a real value? 574 00:33:40,120 --> 00:33:42,930 AUDIENCE: So I think the-- 575 00:33:42,930 --> 00:33:48,810 I think the float is the number of, for example, 576 00:33:48,810 --> 00:33:56,190 for each pair, like the answers like truncation 577 00:33:56,190 --> 00:33:58,875 probably suggest by the comments there. 578 00:33:58,875 --> 00:34:00,250 CARTER ZENKE: Yeah, a good point. 579 00:34:00,250 --> 00:34:02,083 So if you talk about using this float value, 580 00:34:02,083 --> 00:34:06,491 I mean, one thing we could say for it is that this decimal could be-- 581 00:34:06,491 --> 00:34:09,449 it's more accurate to say, like, where you're working with dollars now, 582 00:34:09,449 --> 00:34:13,259 and we could have maybe $0.10, which is only 0.1 of a dollar. 583 00:34:13,259 --> 00:34:15,030 I totally hear that point as well. 584 00:34:15,030 --> 00:34:17,280 And the point we're making here is that they're really 585 00:34:17,280 --> 00:34:20,580 just trade offs among these data storage classes 586 00:34:20,580 --> 00:34:23,699 to use, whether you're using integers or real values, 587 00:34:23,699 --> 00:34:27,730 it just depends on your use case and what you're designing for. 588 00:34:27,730 --> 00:34:30,900 So be sure to read up on trade offs among these data types, 589 00:34:30,900 --> 00:34:35,420 and determine for yourself which one should you best use. 590 00:34:35,420 --> 00:34:36,409 OK. 591 00:34:36,409 --> 00:34:41,090 So we have now these storage classes to store values, 592 00:34:41,090 --> 00:34:47,210 and it turns out that columns can be made to store certain storage classes 593 00:34:47,210 --> 00:34:49,460 or prioritize certain classes. 594 00:34:49,460 --> 00:34:53,750 And the key distinction here is that columns in SQLite 595 00:34:53,750 --> 00:34:57,440 don't always store one particular type. 596 00:34:57,440 --> 00:35:01,250 Instead, they have type affinities, meaning 597 00:35:01,250 --> 00:35:07,280 that they'll try to convert some value you insert into a given cell or given 598 00:35:07,280 --> 00:35:11,660 row to the type they have the affinity for. 599 00:35:11,660 --> 00:35:16,490 Now there are, let's say five type affinities in SQLite-- 600 00:35:16,490 --> 00:35:17,270 text. 601 00:35:17,270 --> 00:35:19,610 Columns can be of the type affinity text, 602 00:35:19,610 --> 00:35:22,520 meaning they store just characters at the end of the day. 603 00:35:22,520 --> 00:35:27,950 There's also numeric, which stores either integer values or real values, 604 00:35:27,950 --> 00:35:31,010 depending on which one seems best to convert to. 605 00:35:31,010 --> 00:35:33,980 You have integer type affinity, which means it can store whole numbers. 606 00:35:33,980 --> 00:35:36,990 Real, to store floating points or decimal values. 607 00:35:36,990 --> 00:35:39,120 And we have blob here again, our old friend 608 00:35:39,120 --> 00:35:42,030 to store binary exactly as we get it. 609 00:35:42,030 --> 00:35:44,750 So whereas before we were talking about storage classes-- 610 00:35:44,750 --> 00:35:47,090 those are associated with individual values-- 611 00:35:47,090 --> 00:35:51,210 type affinities are now associated with individual columns. 612 00:35:51,210 --> 00:35:55,460 So let's see some example of how this might work in SQLite. 613 00:35:55,460 --> 00:35:59,000 Let's say I have this table of fares, and we've 614 00:35:59,000 --> 00:36:02,720 decided to store fares as integers. 615 00:36:02,720 --> 00:36:09,470 Well, if I said that this column called amount has the affinity for the text 616 00:36:09,470 --> 00:36:14,300 storage class, what would happen is if I insert this integer, 10, 617 00:36:14,300 --> 00:36:16,490 it would look a bit like this later on. 618 00:36:16,490 --> 00:36:20,250 It would be converted to text, it would be quoted in a sense, 619 00:36:20,250 --> 00:36:24,240 to represent-- it's now been converted to some set of characters. 620 00:36:24,240 --> 00:36:27,050 Let's say I insert this value 25. 621 00:36:27,050 --> 00:36:31,250 Well, 25 has a storage class right now of an integer. 622 00:36:31,250 --> 00:36:33,560 It is a whole number. 623 00:36:33,560 --> 00:36:38,480 But if I insert this into a column that has the text affinity, 624 00:36:38,480 --> 00:36:44,250 it will be converted into, in this case, text at the end of the day. 625 00:36:44,250 --> 00:36:45,920 Let's do the opposite. 626 00:36:45,920 --> 00:36:51,270 Let's say I have my fare as a string, some text in this case. 627 00:36:51,270 --> 00:36:54,690 I want to insert it into this column called amount, 628 00:36:54,690 --> 00:36:58,820 but now amount has the integer type affinity. 629 00:36:58,820 --> 00:37:03,830 Well, if I insert 10, quote unquote, into the column amount, 630 00:37:03,830 --> 00:37:10,850 I'll get back not 10, the text, but 10 the integer, because again, amount-- 631 00:37:10,850 --> 00:37:16,310 this column here-- has an affinity for the integer storage class. 632 00:37:16,310 --> 00:37:17,210 Let's try this. 633 00:37:17,210 --> 00:37:21,650 25, some text again I'll insert it into this table. 634 00:37:21,650 --> 00:37:26,690 Now I'll have 25 as an integer. 635 00:37:26,690 --> 00:37:31,300 So this is how SQLite allows us to give certain columns 636 00:37:31,300 --> 00:37:33,610 an affinity for certain types, that they'll 637 00:37:33,610 --> 00:37:38,860 try to store values of that type, so long as we insert values it could be 638 00:37:38,860 --> 00:37:41,900 feasibly converted to that type here. 639 00:37:41,900 --> 00:37:45,820 So let's go back to our schema and try to improve it now, 640 00:37:45,820 --> 00:37:50,860 to use not just column names, but also type affinities to store 641 00:37:50,860 --> 00:37:53,140 certain data of a certain type. 642 00:37:53,140 --> 00:37:58,000 Go back to my computer here, and let's improve this once more. 643 00:37:58,000 --> 00:38:00,910 So I'll go over to my table. 644 00:38:00,910 --> 00:38:05,890 And now, I probably want to improve the design here. 645 00:38:05,890 --> 00:38:08,680 And often, if I want to improve this, I might just 646 00:38:08,680 --> 00:38:10,910 need to erase what I've already done. 647 00:38:10,910 --> 00:38:16,210 So let me introduce this new keyword, this new statement called Drop Table. 648 00:38:16,210 --> 00:38:19,460 To drop a table means to delete it, to remove it, effectively. 649 00:38:19,460 --> 00:38:23,080 So let me try doing this for riders, stations, and visits. 650 00:38:23,080 --> 00:38:28,840 I'll type drop table writers, semicolon, Enter. 651 00:38:28,840 --> 00:38:34,240 Nothing seems to happen, but if I type dot schema, well, riders is gone. 652 00:38:34,240 --> 00:38:40,960 I'll try drop table stations, then semicolon, hit Enter, and type, 653 00:38:40,960 --> 00:38:42,760 let's say dot schema again. 654 00:38:42,760 --> 00:38:44,320 No more stations. 655 00:38:44,320 --> 00:38:49,030 I'll try drop table visits, semicolon, Enter, 656 00:38:49,030 --> 00:38:52,510 and then dot schema, and our table-- 657 00:38:52,510 --> 00:38:54,160 our database is really gone. 658 00:38:54,160 --> 00:38:57,280 There are no more tables inside of it. 659 00:38:57,280 --> 00:39:02,370 So let me propose that instead of working inside of the SQLite 660 00:39:02,370 --> 00:39:05,730 prompt, like typing out again, and again, and again, create table riders, 661 00:39:05,730 --> 00:39:07,650 create table stations, create table visits, 662 00:39:07,650 --> 00:39:10,650 let me be more efficient about this and create myself 663 00:39:10,650 --> 00:39:14,790 a schema file that I could reuse throughout this lesson 664 00:39:14,790 --> 00:39:18,870 and also later on while I'm working on this database on my own. 665 00:39:18,870 --> 00:39:23,850 To do that, let me quit my SQLite prompt here, and let me 666 00:39:23,850 --> 00:39:27,300 type something like code schema dot sql. 667 00:39:27,300 --> 00:39:31,620 I'm just creating this file called schema.sql. 668 00:39:31,620 --> 00:39:36,060 Now a dot SQL file allows me to type in SQL keywords 669 00:39:36,060 --> 00:39:38,790 and have them be syntax highlighted, so I know 670 00:39:38,790 --> 00:39:42,040 what's going on inside of this file. 671 00:39:42,040 --> 00:39:44,130 So let's just try this once more. 672 00:39:44,130 --> 00:39:49,170 I'll type Create table riders, and inside I'll 673 00:39:49,170 --> 00:39:53,820 say it has the ID column of what type affinity? 674 00:39:53,820 --> 00:39:58,270 Well, IDs are whole numbers, so perhaps integer in this case. 675 00:39:58,270 --> 00:40:02,160 I could say, ID has the integer type affinity. 676 00:40:02,160 --> 00:40:05,010 Now let me say the riders also have a name, 677 00:40:05,010 --> 00:40:07,710 and how could names be best represented? 678 00:40:07,710 --> 00:40:09,210 Maybe text, right? 679 00:40:09,210 --> 00:40:10,120 Characters here. 680 00:40:10,120 --> 00:40:12,750 So I'll say name and text. 681 00:40:12,750 --> 00:40:14,850 Now I'll include a semicolon to say, this 682 00:40:14,850 --> 00:40:17,820 is the end of my create table statement. 683 00:40:17,820 --> 00:40:21,630 And before, remember how I had to kind of air out 684 00:40:21,630 --> 00:40:25,560 or I had to like, backspace and so on to improve the design? 685 00:40:25,560 --> 00:40:28,890 Here I can literally just point and click and edit this file 686 00:40:28,890 --> 00:40:30,630 to improve my schema. 687 00:40:30,630 --> 00:40:35,130 And I'll later on apply this in my database using 688 00:40:35,130 --> 00:40:37,418 a command that we'll see a bit later. 689 00:40:37,418 --> 00:40:38,460 So let's keep going here. 690 00:40:38,460 --> 00:40:44,040 I'll say create table stations, and inside of the stations table, 691 00:40:44,040 --> 00:40:48,330 I'll make sure it has an ID column of type integer, 692 00:40:48,330 --> 00:40:51,960 a name column that probably stores some text, 693 00:40:51,960 --> 00:40:57,270 and a line column that also stores some text, where name 694 00:40:57,270 --> 00:41:00,130 is the name of the station like Kendell MIT, Harvard, 695 00:41:00,130 --> 00:41:05,640 and line is blue line or green line, what line it's part of in our subway. 696 00:41:05,640 --> 00:41:07,290 Let me try now visits. 697 00:41:07,290 --> 00:41:08,850 I'll say, create table visits. 698 00:41:08,850 --> 00:41:13,620 And then I'll do rider ID, which has what type affinity? 699 00:41:13,620 --> 00:41:15,030 Probably integer. 700 00:41:15,030 --> 00:41:19,680 And then I'll do station ID, which has this same type of affinity. 701 00:41:19,680 --> 00:41:23,070 It's relying on integers, whole numbers here for IDs. 702 00:41:23,070 --> 00:41:25,450 So my colon to finish this statement. 703 00:41:25,450 --> 00:41:29,040 Now this is my schema as a whole. 704 00:41:29,040 --> 00:41:31,830 I have riders, stations, and visits. 705 00:41:31,830 --> 00:41:36,760 But now I want to apply this schema to my database. 706 00:41:36,760 --> 00:41:37,780 So what could I do? 707 00:41:37,780 --> 00:41:39,330 I could reopen it, let's say. 708 00:41:39,330 --> 00:41:43,510 I'll do sqlite3 mbta.db in my terminal. 709 00:41:43,510 --> 00:41:47,370 And now I want to read in this schema.sql file, 710 00:41:47,370 --> 00:41:51,340 just run the commands that are inside of this file. 711 00:41:51,340 --> 00:41:52,660 So what could I do? 712 00:41:52,660 --> 00:41:59,040 I could say dot read schema.sql, where dot read is a command that 713 00:41:59,040 --> 00:42:01,860 says take whatever file you give me, like schema.sql 714 00:42:01,860 --> 00:42:08,070 and read it into this database, running any SQL keywords you come across there. 715 00:42:08,070 --> 00:42:09,450 I'll hit Enter. 716 00:42:09,450 --> 00:42:11,640 And now nothing seems to happen. 717 00:42:11,640 --> 00:42:17,658 But if I type dot schema, I now see my schema improved. 718 00:42:17,658 --> 00:42:19,950 And this is helpful for me, because what I could do now 719 00:42:19,950 --> 00:42:23,580 is I could just edit my schema.sql file and rerun it and rerun 720 00:42:23,580 --> 00:42:29,770 it to make sure I now have these tables being improved over time. 721 00:42:29,770 --> 00:42:30,730 OK. 722 00:42:30,730 --> 00:42:37,720 So this, then, is our new representation of our database. 723 00:42:37,720 --> 00:42:42,280 We have riders, of course, their own entity, and stations. 724 00:42:42,280 --> 00:42:47,170 They have an ID and a name, and stations have an ID, a name, and a line. 725 00:42:47,170 --> 00:42:52,810 We've also now included these type affinities-- integer, text, integer, 726 00:42:52,810 --> 00:42:56,320 text, to tell SQL what kinds of storage classes 727 00:42:56,320 --> 00:43:00,490 could be put inside of each of these columns. 728 00:43:00,490 --> 00:43:03,580 Now before we keep improving this, let me ask, 729 00:43:03,580 --> 00:43:10,780 what questions do we have on these storage classes and type affinities? 730 00:43:10,780 --> 00:43:12,400 AUDIENCE: That you would use-- 731 00:43:12,400 --> 00:43:17,870 when you were creating the table, the table was not in line. 732 00:43:17,870 --> 00:43:21,340 So when we search for the authors of books, 733 00:43:21,340 --> 00:43:25,982 so it comes with a perfect table, so how can we make a perfect table in the SQL? 734 00:43:25,982 --> 00:43:27,940 CARTER ZENKE: Yeah, do you mind clarifying what 735 00:43:27,940 --> 00:43:30,160 you mean by like, the perfect table? 736 00:43:30,160 --> 00:43:33,760 AUDIENCE: So I mean that it was arranged in something 737 00:43:33,760 --> 00:43:36,443 like in boxes, block boxes. 738 00:43:36,443 --> 00:43:37,610 CARTER ZENKE: Good question. 739 00:43:37,610 --> 00:43:41,770 So before we able to see the results of our queries 740 00:43:41,770 --> 00:43:46,660 inside some boxes in our terminal, and that is actually a mode of SQLite, 741 00:43:46,660 --> 00:43:51,370 I think type like dot mode table to see your results in that version. 742 00:43:51,370 --> 00:43:54,937 Here though, we have no data inside of our tables, 743 00:43:54,937 --> 00:43:56,770 so I can't really select anything from them. 744 00:43:56,770 --> 00:44:01,300 Like, if I go to my terminal here and I try 745 00:44:01,300 --> 00:44:04,690 to select some data from this riders table, 746 00:44:04,690 --> 00:44:10,510 let me say select star from rider's, semicolon, I won't get anything back. 747 00:44:10,510 --> 00:44:15,040 Next week, though, we'll see how to insert, and add, and update, and delete 748 00:44:15,040 --> 00:44:17,990 data inside of these tables, at which point, 749 00:44:17,990 --> 00:44:21,460 you could write, select star from riders and see some data 750 00:44:21,460 --> 00:44:23,560 you've inserted yourself. 751 00:44:23,560 --> 00:44:25,120 Great question. 752 00:44:25,120 --> 00:44:27,580 Let's take one more here. 753 00:44:27,580 --> 00:44:28,360 AUDIENCE: Yes. 754 00:44:28,360 --> 00:44:33,130 I would like to know if you have a column of the type boolean. 755 00:44:33,130 --> 00:44:36,730 CARTER ZENKE: Yeah, do we have a Boolean type affinity, let's say. 756 00:44:36,730 --> 00:44:39,850 So here, we don't, at least in SQLite. 757 00:44:39,850 --> 00:44:45,610 Some other DBMS', Database Management Systems might have bool or Boolean, 758 00:44:45,610 --> 00:44:47,110 true or false, right? 759 00:44:47,110 --> 00:44:48,700 Let me show you this instead. 760 00:44:48,700 --> 00:44:52,840 If I go to my terminal, I can see-- if I type dot schema, 761 00:44:52,840 --> 00:44:56,320 I haven't used Boolean, there's no need for me in this case, 762 00:44:56,320 --> 00:44:58,150 but I have used integer. 763 00:44:58,150 --> 00:45:02,350 And integer for SQLite can kind of serve the same purpose. 764 00:45:02,350 --> 00:45:06,550 I could have the integer 0 or the integer 1 to be true 765 00:45:06,550 --> 00:45:09,880 or to be false or true, respectively, 0 being false and true being 1, 766 00:45:09,880 --> 00:45:11,140 I believe in this case. 767 00:45:11,140 --> 00:45:13,970 But good question. 768 00:45:13,970 --> 00:45:14,750 OK. 769 00:45:14,750 --> 00:45:19,190 So to the earlier question, like, we've improved our tables. 770 00:45:19,190 --> 00:45:21,950 We now have type affinities for our columns. 771 00:45:21,950 --> 00:45:26,490 But we don't yet have this ID we talked about last week, 772 00:45:26,490 --> 00:45:29,450 which was primary keys and foreign keys. 773 00:45:29,450 --> 00:45:34,280 This idea of trying to uniquely represent each item in our own table 774 00:45:34,280 --> 00:45:36,770 using primary keys and trying to reference 775 00:45:36,770 --> 00:45:40,800 that primary key from some other table using foreign keys. 776 00:45:40,800 --> 00:45:43,820 So let's try to work on that now. 777 00:45:43,820 --> 00:45:48,980 And for this, we'll need this new ID called a table constraint. 778 00:45:48,980 --> 00:45:54,500 In SQLite, you can apply what's called a constraint to your entire table. 779 00:45:54,500 --> 00:45:59,300 A constraint means that some values have to be a certain way. 780 00:45:59,300 --> 00:46:04,340 Like let's say for a primary key, primary keys must be unique. 781 00:46:04,340 --> 00:46:07,730 They can't repeat, and they must be, at least in our case, 782 00:46:07,730 --> 00:46:12,990 is going to be integers, to be able to quickly add on to them over time. 783 00:46:12,990 --> 00:46:16,280 Similarly, for foreign keys, well, a constraint 784 00:46:16,280 --> 00:46:19,220 is that if you have a foreign key, you better 785 00:46:19,220 --> 00:46:22,850 find that value in some other table, otherwise 786 00:46:22,850 --> 00:46:26,250 you violated this constraint to having a foreign key. 787 00:46:26,250 --> 00:46:29,940 So we have two kinds of table constraints, among others, 788 00:46:29,940 --> 00:46:33,500 but two of these are primary key and foreign key. 789 00:46:33,500 --> 00:46:36,800 And we can apply these to our table by applying them 790 00:46:36,800 --> 00:46:42,030 underneath the columns we tend to say will be inside of our table. 791 00:46:42,030 --> 00:46:43,850 Let's try these two here now. 792 00:46:43,850 --> 00:46:47,840 So we come back to my terminal here so we implement our very own primary key 793 00:46:47,840 --> 00:46:49,730 and foreign key constraints. 794 00:46:49,730 --> 00:46:52,920 We'll go back to my SQLite terminal and clear my screen, 795 00:46:52,920 --> 00:46:55,670 and let's then pull up our schema.sql file 796 00:46:55,670 --> 00:46:58,130 so we can keep modifying our schema. 797 00:46:58,130 --> 00:47:01,610 I can now see I have the riders table, stations, and visits, 798 00:47:01,610 --> 00:47:06,080 and I have some columns that could be primary keys or foreign keys, 799 00:47:06,080 --> 00:47:08,850 but I need to declare them as such. 800 00:47:08,850 --> 00:47:12,120 So here in the riders table, what was our primary key? 801 00:47:12,120 --> 00:47:13,520 Well, it was ID. 802 00:47:13,520 --> 00:47:16,520 Every rider should have their own, unique ID that should not 803 00:47:16,520 --> 00:47:19,460 be duplicated across any two riders. 804 00:47:19,460 --> 00:47:24,410 So to ensure that constraint is applied, I could follow this up with a comma 805 00:47:24,410 --> 00:47:28,770 and then say, primary key ID, just like this. 806 00:47:28,770 --> 00:47:32,690 Now ID is the primary key of this rider's table. 807 00:47:32,690 --> 00:47:36,450 I can go down to stations and ask, what was my primary key? 808 00:47:36,450 --> 00:47:40,520 Well, similarly, it was ID, this ID column on line 8. 809 00:47:40,520 --> 00:47:45,590 So I'll type a comma followed up with primary key ID, 810 00:47:45,590 --> 00:47:49,170 and now that ID column is a primary key. 811 00:47:49,170 --> 00:47:52,580 It has that constraint applied in stations. 812 00:47:52,580 --> 00:47:56,900 But now if I get down to visits, we'll have a few more options. 813 00:47:56,900 --> 00:48:01,700 Visits here actually doesn't have its own ID column that I created. 814 00:48:01,700 --> 00:48:05,750 I instead have a rider ID and a station ID column. 815 00:48:05,750 --> 00:48:07,110 So a few options here. 816 00:48:07,110 --> 00:48:10,730 One option is actually to make a joint primary key. 817 00:48:10,730 --> 00:48:17,630 I could have a primary key composed of two columns, both rider ID and station 818 00:48:17,630 --> 00:48:18,170 ID. 819 00:48:18,170 --> 00:48:20,780 If I applied that constraint, that would mean 820 00:48:20,780 --> 00:48:24,110 that if I were to ever insert a row that had the same rider 821 00:48:24,110 --> 00:48:26,990 ID and the same station ID as another row, 822 00:48:26,990 --> 00:48:29,810 I would trigger a constraint violation. 823 00:48:29,810 --> 00:48:33,080 Every row that has a rider ID and a station ID 824 00:48:33,080 --> 00:48:36,860 has to be unique in their combination of those two values. 825 00:48:36,860 --> 00:48:39,770 To write that kind of scenario, I could follow this up 826 00:48:39,770 --> 00:48:43,190 and I could say similarly, the primary key of this table 827 00:48:43,190 --> 00:48:48,410 is not just rider ID like this, but it's also station ID. 828 00:48:48,410 --> 00:48:52,080 Now this is a joint primary key constraint. 829 00:48:52,080 --> 00:48:54,110 But if we think about this logically, I mean, 830 00:48:54,110 --> 00:48:57,620 it kind of stands to reason that somebody would visit a station more 831 00:48:57,620 --> 00:49:01,850 than once, and I don't want to make sure that every combination of rider 832 00:49:01,850 --> 00:49:03,420 and station ID should be unique. 833 00:49:03,420 --> 00:49:06,600 I want people to able to visit a station more than once. 834 00:49:06,600 --> 00:49:09,900 So maybe not the best design for this table, but I could certainly use it. 835 00:49:09,900 --> 00:49:12,530 In other cases or other contexts. 836 00:49:12,530 --> 00:49:15,410 One other option would be to do it before 837 00:49:15,410 --> 00:49:19,670 and to have maybe the ID column here of type integer, 838 00:49:19,670 --> 00:49:23,670 and then down below, make that our primary key, a bit like this. 839 00:49:23,670 --> 00:49:27,140 And now visits has its own ID column. 840 00:49:27,140 --> 00:49:33,395 But actually, SQLite by default will give me its very own primary key, 841 00:49:33,395 --> 00:49:35,810 one called row ID. 842 00:49:35,810 --> 00:49:39,170 It's implicit, I can't see it, but I actually could query it. 843 00:49:39,170 --> 00:49:41,810 I could query it for row ID, all one word, 844 00:49:41,810 --> 00:49:46,070 and get back a unique primary key for this table SQLite has automatically 845 00:49:46,070 --> 00:49:47,950 created for me. 846 00:49:47,950 --> 00:49:50,880 Now we have seen the primary key options, 847 00:49:50,880 --> 00:49:52,420 what are our foreign key options? 848 00:49:52,420 --> 00:49:55,180 Well, it seems like rider ID and station ID 849 00:49:55,180 --> 00:49:59,320 are the foreign keys of this table where rider ID references the ID 850 00:49:59,320 --> 00:50:05,020 column in rider's, and station ID references the ID column in stations. 851 00:50:05,020 --> 00:50:08,350 So to codify that, to make that a reality in our schema, 852 00:50:08,350 --> 00:50:11,540 I could follow this up with the foreign key. 853 00:50:11,540 --> 00:50:14,260 The foreign key of this table is rider ID, 854 00:50:14,260 --> 00:50:19,690 and it references the rider's table and the ID column 855 00:50:19,690 --> 00:50:22,840 inside of it using this syntax here. 856 00:50:22,840 --> 00:50:24,530 Now I could keep going. 857 00:50:24,530 --> 00:50:26,840 I could say I have more than one foreign key. 858 00:50:26,840 --> 00:50:29,500 I also have a foreign key-- 859 00:50:29,500 --> 00:50:32,020 foreign key called station ID. 860 00:50:32,020 --> 00:50:34,150 And that references, like we said before, 861 00:50:34,150 --> 00:50:37,670 the stations table and the ID column inside of it. 862 00:50:37,670 --> 00:50:40,060 So now here I have my completed schema. 863 00:50:40,060 --> 00:50:43,780 I have a primary key for the tables I've declared an explicit column 864 00:50:43,780 --> 00:50:47,800 for a primary key, and now I also have foreign key constraints 865 00:50:47,800 --> 00:50:51,710 for those columns that should be foreign keys. 866 00:50:51,710 --> 00:50:55,570 So now let me ask, what questions do we have on the schema 867 00:50:55,570 --> 00:50:59,508 or on primary keys and foreign keys? 868 00:50:59,508 --> 00:51:00,300 AUDIENCE: So, yeah. 869 00:51:00,300 --> 00:51:03,570 I just noticed that whenever before we hadn't 870 00:51:03,570 --> 00:51:06,210 added the affinities and the keys, we were not 871 00:51:06,210 --> 00:51:09,340 applying commas after each column name. 872 00:51:09,340 --> 00:51:11,295 So what is the difference there? 873 00:51:11,295 --> 00:51:12,670 CARTER ZENKE: Yeah, a good catch. 874 00:51:12,670 --> 00:51:15,128 Let me kind of show you what this looks like in my terminal 875 00:51:15,128 --> 00:51:17,310 so you can see it live. 876 00:51:17,310 --> 00:51:20,430 You, I think, had noticed that before we had, 877 00:51:20,430 --> 00:51:25,110 let's say, this primary key ID constraint in riders, 878 00:51:25,110 --> 00:51:26,740 we had done something like this. 879 00:51:26,740 --> 00:51:28,290 Let me just copy paste that. 880 00:51:28,290 --> 00:51:30,450 And we had removed this last column-- 881 00:51:30,450 --> 00:51:32,190 or comma from the name column. 882 00:51:32,190 --> 00:51:33,090 Is that right? 883 00:51:33,090 --> 00:51:38,290 And if that's the case, well, it's just convention, just style here. 884 00:51:38,290 --> 00:51:41,100 So if I want to keep adding some constraint 885 00:51:41,100 --> 00:51:45,000 or like a new line to my table, I should include a comma. 886 00:51:45,000 --> 00:51:51,720 Here, this name column was the last portion of my table I had specified. 887 00:51:51,720 --> 00:51:55,770 I have this column called name that has type affinity text, right? 888 00:51:55,770 --> 00:51:58,590 But now, if I add this new constraint, we'll 889 00:51:58,590 --> 00:52:03,360 have to follow it up with-- follow it up after a comma from this new column 890 00:52:03,360 --> 00:52:04,290 here. 891 00:52:04,290 --> 00:52:10,230 Notice now, this constraint, primary key ID, is the last-- 892 00:52:10,230 --> 00:52:12,810 let's say attribute of my table I specified. 893 00:52:12,810 --> 00:52:15,870 I no longer need to include a comma at the end of it. 894 00:52:15,870 --> 00:52:20,430 So whatever is the last portion, I should not have a comma after, 895 00:52:20,430 --> 00:52:23,460 but everything else, I should. 896 00:52:23,460 --> 00:52:25,800 Let's take one more question here, too. 897 00:52:25,800 --> 00:52:29,813 AUDIENCE: Would it be OK for the visits table to have an ID column as well? 898 00:52:29,813 --> 00:52:31,230 CARTER ZENKE: Yeah, good question. 899 00:52:31,230 --> 00:52:35,490 Would it be OK for the visits table to have an ID column as well. 900 00:52:35,490 --> 00:52:36,450 It certainly would be. 901 00:52:36,450 --> 00:52:39,870 We could define our very own primary key for this table, too. 902 00:52:39,870 --> 00:52:42,210 So let me go back and show you how that could work. 903 00:52:42,210 --> 00:52:44,880 I'll go to my visits table here. 904 00:52:44,880 --> 00:52:49,440 And I could try to add my own primary key to this table. 905 00:52:49,440 --> 00:52:54,870 I could say ID, make a new column here, make it a type affinity integer, 906 00:52:54,870 --> 00:52:55,900 like this. 907 00:52:55,900 --> 00:52:56,700 Let me scroll up. 908 00:52:56,700 --> 00:53:00,090 And now let me add some new constraint. 909 00:53:00,090 --> 00:53:03,450 I could say, because I've made my very own primary key, 910 00:53:03,450 --> 00:53:06,310 I'll say primary key ID. 911 00:53:06,310 --> 00:53:12,510 Now, this table has a primary key that I've created called ID. 912 00:53:12,510 --> 00:53:15,310 And this will be in place of SQLite, so that it 913 00:53:15,310 --> 00:53:21,730 would have made called row ID itself but hidden from my own view. 914 00:53:21,730 --> 00:53:24,450 OK, let's keep going then. 915 00:53:24,450 --> 00:53:27,570 We've seen table constraints, we've seen type affinities, 916 00:53:27,570 --> 00:53:30,540 but we could probably do more to improve the design 917 00:53:30,540 --> 00:53:32,740 of this table or this database. 918 00:53:32,740 --> 00:53:35,850 So let's introduce one more kind of constraint, 919 00:53:35,850 --> 00:53:38,740 this one called a column constraint. 920 00:53:38,740 --> 00:53:42,900 So whereas a table constraint applies to the table as a whole, 921 00:53:42,900 --> 00:53:47,160 a column constraint applies to a particular column, let's say. 922 00:53:47,160 --> 00:53:51,930 Maybe I want to column to have certain data inside of it. 923 00:53:51,930 --> 00:53:54,720 Maybe I want to make sure it doesn't have null values, and so on. 924 00:53:54,720 --> 00:53:57,690 I could do that with column constraints. 925 00:53:57,690 --> 00:54:00,510 There are four in SQLite-- 926 00:54:00,510 --> 00:54:07,440 check, default, not null, and unique, and each one does something different. 927 00:54:07,440 --> 00:54:11,340 Check allows me to make my very own check, like check to be sure 928 00:54:11,340 --> 00:54:14,250 that this amount is greater than 0. 929 00:54:14,250 --> 00:54:17,250 Or I could use default. Default means if I 930 00:54:17,250 --> 00:54:20,310 don't supply a value when I add a new row, 931 00:54:20,310 --> 00:54:23,640 it'll just use a default value instead. 932 00:54:23,640 --> 00:54:29,730 Not null means I can't insert null or empty values into this column. 933 00:54:29,730 --> 00:54:31,680 And in fact, it's required. 934 00:54:31,680 --> 00:54:37,770 Unique means, I want to make sure that every row in this column 935 00:54:37,770 --> 00:54:39,390 is a unique value. 936 00:54:39,390 --> 00:54:42,370 It doesn't appear twice in my data set. 937 00:54:42,370 --> 00:54:46,440 So let's try applying a few of these to our schema here. 938 00:54:46,440 --> 00:54:48,270 Go back to my terminal. 939 00:54:48,270 --> 00:54:50,920 And now let me check out this. 940 00:54:50,920 --> 00:54:55,020 Well, I could try applying the not null constraint 941 00:54:55,020 --> 00:54:59,190 when I know I want a column to be required, effectively. 942 00:54:59,190 --> 00:55:01,530 Now where could I best apply that? 943 00:55:01,530 --> 00:55:05,790 Maybe I could apply that to the name column in stations, 944 00:55:05,790 --> 00:55:07,870 like stations must have a name. 945 00:55:07,870 --> 00:55:11,140 So I'll say, the name column cannot be null. 946 00:55:11,140 --> 00:55:14,310 It cannot be empty in here. 947 00:55:14,310 --> 00:55:17,280 Line also should probably be not null. 948 00:55:17,280 --> 00:55:19,710 A station must be part of some line. 949 00:55:19,710 --> 00:55:21,780 I can't have an empty value for line. 950 00:55:21,780 --> 00:55:26,240 So I'll say, too, this should be not null. 951 00:55:26,240 --> 00:55:28,790 Now I could apply this up at name. 952 00:55:28,790 --> 00:55:31,610 I could say, riders must have a name, too. 953 00:55:31,610 --> 00:55:32,310 Let me try that. 954 00:55:32,310 --> 00:55:34,220 I'll say text not null. 955 00:55:34,220 --> 00:55:35,630 Or I could leave it optional. 956 00:55:35,630 --> 00:55:39,020 I could say, maybe text just on its own and let 957 00:55:39,020 --> 00:55:42,680 riders choose to supply a name or not. 958 00:55:42,680 --> 00:55:48,650 Now the question here is, should I apply not null to my primary key columns, 959 00:55:48,650 --> 00:55:53,300 like ID not null or ID not null here? 960 00:55:53,300 --> 00:55:56,420 You might think that you should, for thoroughness sake. 961 00:55:56,420 --> 00:56:01,940 Well it turns out that when you apply the primary key table constraint down 962 00:56:01,940 --> 00:56:06,440 below here, this already ensures that there are several constraints applied 963 00:56:06,440 --> 00:56:12,800 to this particular column called ID, among them being that ID cannot be 964 00:56:12,800 --> 00:56:13,350 null. 965 00:56:13,350 --> 00:56:17,480 So no need to duplicate this and say that this ID cannot be null when I 966 00:56:17,480 --> 00:56:23,240 already have it specified down below that ID is a primary key. 967 00:56:23,240 --> 00:56:25,520 Let me check others here. 968 00:56:25,520 --> 00:56:29,800 You might also think, could I do it for rider ID and station ID? 969 00:56:29,800 --> 00:56:32,470 Should I include not null here? 970 00:56:32,470 --> 00:56:36,490 Rider ID, not null, station ID, not null. 971 00:56:36,490 --> 00:56:39,250 Well, that would be a good thought, but again, we're 972 00:56:39,250 --> 00:56:43,540 taken care of by our table constraint using our foreign key here. 973 00:56:43,540 --> 00:56:48,520 Again, this constraint will say, if rider ID doesn't already 974 00:56:48,520 --> 00:56:53,380 exist in the ID column of rider's, I can't insert that value. 975 00:56:53,380 --> 00:56:58,090 And we could probably presume that if rider's ID is a primary key, 976 00:56:58,090 --> 00:57:01,130 well, null will not be part of this column, 977 00:57:01,130 --> 00:57:06,430 and therefore, I already can't insert null for rider ID or station ID. 978 00:57:06,430 --> 00:57:08,860 This would be in this case, redundant. 979 00:57:08,860 --> 00:57:12,910 So not null is good when you have columns that are neither 980 00:57:12,910 --> 00:57:15,910 primary keys nor foreign keys and you want 981 00:57:15,910 --> 00:57:20,010 to make sure that they have-- they always have a value, that they 982 00:57:20,010 --> 00:57:22,170 are never null, effectively. 983 00:57:22,170 --> 00:57:25,010 984 00:57:25,010 --> 00:57:27,320 So that is not null. 985 00:57:27,320 --> 00:57:28,730 So we could keep going here. 986 00:57:28,730 --> 00:57:34,190 We also had one called unique that makes sure every value, 987 00:57:34,190 --> 00:57:37,100 every row in this column is unique. 988 00:57:37,100 --> 00:57:38,820 Where could we apply this? 989 00:57:38,820 --> 00:57:42,140 I could try to apply it, let's say, to the name of a station. 990 00:57:42,140 --> 00:57:44,820 Like, station should have unique names. 991 00:57:44,820 --> 00:57:47,600 So I'll say not null and unique. 992 00:57:47,600 --> 00:57:50,220 Now this column has two constraints. 993 00:57:50,220 --> 00:57:51,680 The first, not null. 994 00:57:51,680 --> 00:57:53,720 It should always have a value. 995 00:57:53,720 --> 00:57:55,190 The second, unique. 996 00:57:55,190 --> 00:57:58,220 The value shouldn't repeat throughout this column. 997 00:57:58,220 --> 00:58:01,100 Line I might leave without this constraint. 998 00:58:01,100 --> 00:58:04,370 I could imagine two stations being on the same line, like both on blue, 999 00:58:04,370 --> 00:58:07,170 I'll allow that in this case. 1000 00:58:07,170 --> 00:58:12,560 Now again, we could try to apply unique to our primary keys or our foreign keys 1001 00:58:12,560 --> 00:58:15,920 as I just did here, but it's already taken 1002 00:58:15,920 --> 00:58:19,520 care of for us using this primary key constraint. 1003 00:58:19,520 --> 00:58:24,050 A primary key, again, is always going to be unique and never null. 1004 00:58:24,050 --> 00:58:29,720 So we'll take advantage of that already using our primary key and foreign key 1005 00:58:29,720 --> 00:58:32,490 constraints here. 1006 00:58:32,490 --> 00:58:33,270 OK. 1007 00:58:33,270 --> 00:58:36,180 So we've seen unique and not null. 1008 00:58:36,180 --> 00:58:39,510 And I might argue, we're at the point where this schema seems 1009 00:58:39,510 --> 00:58:44,400 to be fairly optimized, at least using our column constraints, our table 1010 00:58:44,400 --> 00:58:47,560 constraints, our type affinities and so on. 1011 00:58:47,560 --> 00:58:55,440 So let's ask then, what questions do we have on not null and unique, if any? 1012 00:58:55,440 --> 00:58:58,770 AUDIENCE: So basically, to recap if I understood correctly, it's 1013 00:58:58,770 --> 00:59:01,620 not precisely about not null and unique but about 1014 00:59:01,620 --> 00:59:06,060 the concept of the key, labeling key immediately 1015 00:59:06,060 --> 00:59:10,135 gives the attribute of not null, unique, and to be referenced, right? 1016 00:59:10,135 --> 00:59:11,260 CARTER ZENKE: That is true. 1017 00:59:11,260 --> 00:59:14,640 So when you use a primary key or a foreign key constraint, 1018 00:59:14,640 --> 00:59:17,880 there are other constraints that go along with that constraint. 1019 00:59:17,880 --> 00:59:23,410 A primary key, for instance, must not be null, it must be unique, and so on. 1020 00:59:23,410 --> 00:59:25,290 So it would be redundant to apply that again, 1021 00:59:25,290 --> 00:59:29,010 to say that this primary key should be unique or not null. 1022 00:59:29,010 --> 00:59:31,160 Good clarification there. 1023 00:59:31,160 --> 00:59:32,000 OK. 1024 00:59:32,000 --> 00:59:36,590 So I think we're at the point where this schema is pretty well set for us, 1025 00:59:36,590 --> 00:59:42,380 and we're going to need to think about how we use this in the real world. 1026 00:59:42,380 --> 00:59:48,630 Like, if this is our schema here, we have riders and stations, well, 1027 00:59:48,630 --> 00:59:49,940 what could we do? 1028 00:59:49,940 --> 00:59:52,340 I mean, riders tend to-- 1029 00:59:52,340 --> 00:59:54,710 I mean, they could register for the subway. 1030 00:59:54,710 --> 00:59:56,900 And riders, well, they do visit stations. 1031 00:59:56,900 --> 00:59:59,360 But I think if we applied this to the real world, 1032 00:59:59,360 --> 01:00:02,630 we'd see that this isn't quite how it's actually done. 1033 01:00:02,630 --> 01:00:05,510 Like, riders don't really register that often. 1034 01:00:05,510 --> 01:00:08,570 If a rider who's new to the city comes in, they want to ride the subway, 1035 01:00:08,570 --> 01:00:10,760 they should be able to ride, too. 1036 01:00:10,760 --> 01:00:15,110 So it turns out at least here in Boston, the MBTA doesn't really 1037 01:00:15,110 --> 01:00:20,360 track riders, per se, but they do track what we call CharlieCards. 1038 01:00:20,360 --> 01:00:23,780 If you want to ride a subway, you get a CharlieCard. 1039 01:00:23,780 --> 01:00:26,330 A CharlieCard allows you access to the subway, 1040 01:00:26,330 --> 01:00:28,910 keeps track of your fare, your balance and so on, 1041 01:00:28,910 --> 01:00:31,730 and allows you to swipe in to certain stations. 1042 01:00:31,730 --> 01:00:33,800 So when we come back from a break here, we'll 1043 01:00:33,800 --> 01:00:37,550 see how we can actually implement these CharlieCards instead of riders 1044 01:00:37,550 --> 01:00:40,460 to make our system more efficient and more in line with what 1045 01:00:40,460 --> 01:00:42,140 happens in the real world. 1046 01:00:42,140 --> 01:00:45,220 We'll see you all in just a few. 1047 01:00:45,220 --> 01:00:46,940 And we're back. 1048 01:00:46,940 --> 01:00:49,750 So we saw last time, this challenge of designing 1049 01:00:49,750 --> 01:00:53,080 a system for the Massachusetts Bay Transportation Authority 1050 01:00:53,080 --> 01:00:55,540 to represent riders and stations-- people 1051 01:00:55,540 --> 01:00:58,480 who get on a subway to go around Boston. 1052 01:00:58,480 --> 01:01:03,460 But as we also saw, we learned that the MBTA doesn't 1053 01:01:03,460 --> 01:01:09,100 keep track of riders themselves, they keep track of CharlieCards, this card 1054 01:01:09,100 --> 01:01:11,650 that a rider might carry in their pocket. 1055 01:01:11,650 --> 01:01:17,230 And they can scan this card at a station to enter or even to exit in some cases, 1056 01:01:17,230 --> 01:01:21,650 to make sure that their fare is counted as they enter that station. 1057 01:01:21,650 --> 01:01:27,520 So let's think now how to improve our schema to represent not just riders, 1058 01:01:27,520 --> 01:01:31,540 like me, But CharlieCards themselves people might carry around 1059 01:01:31,540 --> 01:01:34,120 when they enter the subway station. 1060 01:01:34,120 --> 01:01:37,120 Well, we saw before we had riders and cards. 1061 01:01:37,120 --> 01:01:43,030 But our goal now is to remove riders from the picture, focus only on cards. 1062 01:01:43,030 --> 01:01:47,470 Well cards, as we've seen, might make us swipe at a station. 1063 01:01:47,470 --> 01:01:51,790 If I enter Harvard station, I might swipe my CharlieCard 1064 01:01:51,790 --> 01:01:54,380 to enter that station. 1065 01:01:54,380 --> 01:01:59,270 And we could see that a card would have maybe many swipes involved. 1066 01:01:59,270 --> 01:02:04,520 Like if I swipe at Harvard, I might also swipe it MIT or swipe it Park Street, 1067 01:02:04,520 --> 01:02:05,810 and so on. 1068 01:02:05,810 --> 01:02:10,410 We could see that a swipe can only happen at a single station at a time, 1069 01:02:10,410 --> 01:02:10,910 though. 1070 01:02:10,910 --> 01:02:14,990 Like if I swipe at Park Street, I'm only swiping at Park Street here, 1071 01:02:14,990 --> 01:02:20,050 and similarly, a swipe might only involve one, single card. 1072 01:02:20,050 --> 01:02:23,040 So if we think about these entities and how they relate, 1073 01:02:23,040 --> 01:02:27,810 we could also think about what kinds of columns you have inside of each entity. 1074 01:02:27,810 --> 01:02:30,730 In this case, I would argue we have something a bit like this. 1075 01:02:30,730 --> 01:02:36,150 We could say that a card has a ID for its primary key, in this case. 1076 01:02:36,150 --> 01:02:42,360 Similarly, this card makes a swipe, and this swipe has, itself, an ID, 1077 01:02:42,360 --> 01:02:46,260 as well as a type, some time that that swipe happened, 1078 01:02:46,260 --> 01:02:49,480 and an amount or a transaction that is involved. 1079 01:02:49,480 --> 01:02:54,090 So for example, let's say I swipe in at the Harvard Square Station. 1080 01:02:54,090 --> 01:02:59,880 That type of swipe might be to enter the station at some certain time. 1081 01:02:59,880 --> 01:03:03,510 Now associated with that swipe is also some amount 1082 01:03:03,510 --> 01:03:07,320 in dollars that happened to be subtracted from my card. 1083 01:03:07,320 --> 01:03:10,620 Like let's say the fare nowadays is $2.40, well, 1084 01:03:10,620 --> 01:03:15,830 that amount is subtracted from my card from this swipe. 1085 01:03:15,830 --> 01:03:20,270 Now, of course, I do all of this at a station, which has our same columns 1086 01:03:20,270 --> 01:03:23,630 from before, ID, name, and line. 1087 01:03:23,630 --> 01:03:28,130 So a similar idea now, but we're replacing riders with cards 1088 01:03:28,130 --> 01:03:31,190 and adding more information to these visits. 1089 01:03:31,190 --> 01:03:34,940 There are instead swipes that could be maybe entering the station, 1090 01:03:34,940 --> 01:03:39,240 exiting the station or just adding funds to my balance, for instance. 1091 01:03:39,240 --> 01:03:41,690 So let's see how we could improve our schema now 1092 01:03:41,690 --> 01:03:47,720 using some new SQL keywords to alter our tables and add some new ones, too. 1093 01:03:47,720 --> 01:03:49,790 I'll go back to my computer. 1094 01:03:49,790 --> 01:03:54,560 And let's see how we could try to alter the table we've already created. 1095 01:03:54,560 --> 01:03:58,910 We already have a riders table, a visits table and a stations table, 1096 01:03:58,910 --> 01:04:02,450 but we could learn some new statements, some new queries to actually 1097 01:04:02,450 --> 01:04:06,030 update and alter these tables as well. 1098 01:04:06,030 --> 01:04:08,100 The first one as we saw a little bit before 1099 01:04:08,100 --> 01:04:11,270 is this one called drop table, arguably the most dramatic thing 1100 01:04:11,270 --> 01:04:14,100 you can do to table, is just drop it, delete it like this. 1101 01:04:14,100 --> 01:04:18,080 So let's try just deleting the riders table from our database. 1102 01:04:18,080 --> 01:04:21,140 I'll go back to my-- 1103 01:04:21,140 --> 01:04:24,050 go back to my database here. 1104 01:04:24,050 --> 01:04:29,370 I'll type sqlite mbta.db to pull up this database again. 1105 01:04:29,370 --> 01:04:34,160 And now if I type dot schema, I'll see I have a riders, stations, and visits 1106 01:04:34,160 --> 01:04:34,790 table. 1107 01:04:34,790 --> 01:04:38,510 But no longer-- I don't want to have the riders table. 1108 01:04:38,510 --> 01:04:39,990 I want to remove that. 1109 01:04:39,990 --> 01:04:45,200 So I'll say, drop table riders, semicolon. 1110 01:04:45,200 --> 01:04:46,880 Now I'll hit enter. 1111 01:04:46,880 --> 01:04:48,650 No more riders table. 1112 01:04:48,650 --> 01:04:52,830 If I type dot schema, that's gone for my database. 1113 01:04:52,830 --> 01:04:54,990 Well, what could I do now? 1114 01:04:54,990 --> 01:04:58,740 I've dropped the table, but I'd still need to maybe update 1115 01:04:58,740 --> 01:05:00,660 visits to instead be swipes. 1116 01:05:00,660 --> 01:05:02,850 I could probably leave stations as is. 1117 01:05:02,850 --> 01:05:06,360 But I want to update a table or alter its schema, 1118 01:05:06,360 --> 01:05:09,370 I can use some new commands as well. 1119 01:05:09,370 --> 01:05:12,960 I'll use this one called alter table. 1120 01:05:12,960 --> 01:05:16,300 Alter table looks a bit like this. 1121 01:05:16,300 --> 01:05:22,320 I can use the alter table statement here and give it some table name. 1122 01:05:22,320 --> 01:05:27,990 Like let's say alter table visits, and then inside this visits table, 1123 01:05:27,990 --> 01:05:30,370 I get to ask, what do I want to do? 1124 01:05:30,370 --> 01:05:33,030 Do I want to, let say, rename the table? 1125 01:05:33,030 --> 01:05:34,240 I could do that. 1126 01:05:34,240 --> 01:05:37,890 I could also decide to add a column, rename a column, 1127 01:05:37,890 --> 01:05:40,440 or drop a column altogether. 1128 01:05:40,440 --> 01:05:43,530 Let's try first looking at rename to. 1129 01:05:43,530 --> 01:05:47,790 I want to rename this visits table to a swipes table, 1130 01:05:47,790 --> 01:05:51,510 representing not just a visit to the station, but a swipe of a card. 1131 01:05:51,510 --> 01:05:53,340 So let's try this one here. 1132 01:05:53,340 --> 01:05:57,240 I'll go back to my computer and I'll go back to SQLite. 1133 01:05:57,240 --> 01:06:01,740 And I'll say, I no longer want visits to be called visits. 1134 01:06:01,740 --> 01:06:06,330 I ideally want visits to actually be called swipes. 1135 01:06:06,330 --> 01:06:07,590 So let me try this. 1136 01:06:07,590 --> 01:06:13,500 I'll say alter table, as we saw before, visits 1137 01:06:13,500 --> 01:06:18,180 rename to swipes, like this, semicolon. 1138 01:06:18,180 --> 01:06:19,770 Now I'll hit enter. 1139 01:06:19,770 --> 01:06:24,212 And now if I type dot schema, again, I see-- 1140 01:06:24,212 --> 01:06:25,170 oops, kind of strange-- 1141 01:06:25,170 --> 01:06:28,500 I'll hit dot schema again, I'll see swipes, 1142 01:06:28,500 --> 01:06:32,730 no longer called visits, but now called swipes. 1143 01:06:32,730 --> 01:06:36,390 What we saw-- I'd ideally like to add a type of the swipe, 1144 01:06:36,390 --> 01:06:40,200 maybe I'm entering the station, maybe I'm exiting. 1145 01:06:40,200 --> 01:06:43,770 Maybe I'm just adding funds or depositing some funds. 1146 01:06:43,770 --> 01:06:47,940 So let me try adding, let's say, a new column to swipes. 1147 01:06:47,940 --> 01:06:51,630 I'll say alter table, add column-- 1148 01:06:51,630 --> 01:06:57,910 alter table, swipes, I'm going to name that table, swipes, like this. 1149 01:06:57,910 --> 01:07:00,840 And then let's add a column-- 1150 01:07:00,840 --> 01:07:03,510 add a column called type. 1151 01:07:03,510 --> 01:07:06,600 And this will have the type affinity text. 1152 01:07:06,600 --> 01:07:08,580 I'll hit semicolon, enter. 1153 01:07:08,580 --> 01:07:12,060 And now if I type dot schema, I'll see-- 1154 01:07:12,060 --> 01:07:16,820 well, I have a column called a ttpe. 1155 01:07:16,820 --> 01:07:18,270 So clearly, I made a typo here. 1156 01:07:18,270 --> 01:07:20,570 I had rider ID, station ID. 1157 01:07:20,570 --> 01:07:26,120 This new column down below called ttpe, I kind of want to fix this, right? 1158 01:07:26,120 --> 01:07:28,430 I don't want ttpe, I want type. 1159 01:07:28,430 --> 01:07:31,760 So probably good chance to use my rename column. 1160 01:07:31,760 --> 01:07:34,250 I'll come back here and I'll try that. 1161 01:07:34,250 --> 01:07:37,610 I'll instead do alter table-- 1162 01:07:37,610 --> 01:07:43,160 alter table, swipes, and I will rename a column now. 1163 01:07:43,160 --> 01:07:51,170 I'll say rename the column, ttpe to type, spelled correctly now. 1164 01:07:51,170 --> 01:07:55,220 Hit semicolon, clear my terminal, type dot schema. 1165 01:07:55,220 --> 01:08:01,070 And now I see over here that type has been added to my table of swipes. 1166 01:08:01,070 --> 01:08:06,170 I see rider ID, station ID, and now a new column called type. 1167 01:08:06,170 --> 01:08:11,630 So through alter table, we go ahead and add new columns, rename them even, 1168 01:08:11,630 --> 01:08:15,650 or if I wanted to just drop the column altogether. 1169 01:08:15,650 --> 01:08:19,160 Let's say I add this column type and I change my mind, 1170 01:08:19,160 --> 01:08:20,580 I don't want it here anymore. 1171 01:08:20,580 --> 01:08:24,260 I go back to my computer and I could try dropping a particular column. 1172 01:08:24,260 --> 01:08:28,850 Let me try in this case, alter table, swipes. 1173 01:08:28,850 --> 01:08:33,859 And now let me drop column type, semicolon. 1174 01:08:33,859 --> 01:08:34,640 Hit enter. 1175 01:08:34,640 --> 01:08:40,529 And now if I type dot schema, I'm back to where I began. 1176 01:08:40,529 --> 01:08:45,330 So these are the commands we can use to alter our tables, to improve them 1177 01:08:45,330 --> 01:08:48,450 if you make a mistake during our first create table command, 1178 01:08:48,450 --> 01:08:52,930 or if you want to add more functionality later down the line. 1179 01:08:52,930 --> 01:08:58,050 So ideally, I could keep using alter table, add table, create table, drop 1180 01:08:58,050 --> 01:08:58,930 table, and so on. 1181 01:08:58,930 --> 01:09:01,290 But what I want to do here is just start from scratch. 1182 01:09:01,290 --> 01:09:04,029 Like I have stations and swipes and so on. 1183 01:09:04,029 --> 01:09:07,899 Why don't I just go back to my schema file and use that instead? 1184 01:09:07,899 --> 01:09:11,910 So what I'll do is I'll drop table for stations and-- 1185 01:09:11,910 --> 01:09:12,779 oops, semicolon. 1186 01:09:12,779 --> 01:09:17,490 I will drop table for swipes now, semicolon. 1187 01:09:17,490 --> 01:09:22,439 I'll type dot schema, and now I'll see nothing in here. 1188 01:09:22,439 --> 01:09:26,760 I'll quit and I'll type code schema.sql. 1189 01:09:26,760 --> 01:09:29,970 Let me just start from scratch using this schema.sql. 1190 01:09:29,970 --> 01:09:35,790 So we no longer want to have riders, we only want to have cards. 1191 01:09:35,790 --> 01:09:38,670 So I could just rename this table here. 1192 01:09:38,670 --> 01:09:39,930 I'll call it cards. 1193 01:09:39,930 --> 01:09:41,609 Create table, cards. 1194 01:09:41,609 --> 01:09:47,520 Now cards don't have a name, they only have some unique ID in this case. 1195 01:09:47,520 --> 01:09:50,460 Now I'll leave stations just as it is. 1196 01:09:50,460 --> 01:09:51,870 This seems perfectly fine to me. 1197 01:09:51,870 --> 01:09:55,800 Stations have an ID, a name and a line with these constraints 1198 01:09:55,800 --> 01:09:58,690 applied to the name and line columns. 1199 01:09:58,690 --> 01:10:00,720 But now the visits table. 1200 01:10:00,720 --> 01:10:04,500 Well, the visits table is no longer a visit to the station per se, 1201 01:10:04,500 --> 01:10:07,570 it's more a swipe of the card at that station. 1202 01:10:07,570 --> 01:10:10,950 So let's now say visits becomes swipes. 1203 01:10:10,950 --> 01:10:15,370 And among these new columns to add are the following. 1204 01:10:15,370 --> 01:10:18,690 I want to have not just an ID for each swipe, 1205 01:10:18,690 --> 01:10:22,230 not an ID for rider ID and station ID and so on, 1206 01:10:22,230 --> 01:10:26,310 but also I want to have a type of swipe. 1207 01:10:26,310 --> 01:10:30,870 Am I entering the station, exiting, or depositing some funds? 1208 01:10:30,870 --> 01:10:34,980 So I'll say the type of this is going to be a new column, 1209 01:10:34,980 --> 01:10:40,200 and the data type, this whole will be text, enter, exit, 1210 01:10:40,200 --> 01:10:42,900 or deposit for funds. 1211 01:10:42,900 --> 01:10:45,270 Now let me try another column, two. 1212 01:10:45,270 --> 01:10:47,700 I'll include a date time. 1213 01:10:47,700 --> 01:10:49,260 A date time is like a timestamp. 1214 01:10:49,260 --> 01:10:51,810 What time did this swipe actually happen? 1215 01:10:51,810 --> 01:10:54,570 And I'll make this type numeric. 1216 01:10:54,570 --> 01:11:00,650 Numeric can store all kinds of dates and times for me in this table. 1217 01:11:00,650 --> 01:11:02,950 Now let me add one final column. 1218 01:11:02,950 --> 01:11:04,990 This one will be an amount. 1219 01:11:04,990 --> 01:11:09,910 I'll also use numeric for this kind of column here. 1220 01:11:09,910 --> 01:11:13,150 And I'll say that this column called amount 1221 01:11:13,150 --> 01:11:18,500 can store, in this case integers or real numbers, like floats and so on. 1222 01:11:18,500 --> 01:11:24,310 I'll probably decide on that when I actually add some data to this table. 1223 01:11:24,310 --> 01:11:28,650 So here we've updated our schema to represent that diagram we saw before. 1224 01:11:28,650 --> 01:11:31,380 I have cards, I have stations, and I have 1225 01:11:31,380 --> 01:11:36,240 swipes that have some type associated with them, enter, exit, deposit, 1226 01:11:36,240 --> 01:11:39,750 some date, time, and some amount that was charged to me 1227 01:11:39,750 --> 01:11:41,340 while I made this swipe. 1228 01:11:41,340 --> 01:11:44,070 Either I added some funds, in which case amount is positive, 1229 01:11:44,070 --> 01:11:50,000 or I subtracted some funds, entering exiting, in this case from the station. 1230 01:11:50,000 --> 01:11:50,990 All right. 1231 01:11:50,990 --> 01:11:53,390 So I have these tables now. 1232 01:11:53,390 --> 01:11:56,750 And now I want to probably apply some of those same column 1233 01:11:56,750 --> 01:11:58,490 constraints we saw before. 1234 01:11:58,490 --> 01:12:02,840 Like here, it's fine, but I also want to make sure I'm not adding some data 1235 01:12:02,840 --> 01:12:05,280 that I don't want to add to this table. 1236 01:12:05,280 --> 01:12:09,230 So I could go back to my old friends, these column constraints. 1237 01:12:09,230 --> 01:12:14,450 And we saw before, we had default and not null, unique and check. 1238 01:12:14,450 --> 01:12:20,400 We've used not null and unique, but we haven't check or default. 1239 01:12:20,400 --> 01:12:22,970 So let's start using more than just not null and unique 1240 01:12:22,970 --> 01:12:27,720 and also focus on check and default, what they can do for us here. 1241 01:12:27,720 --> 01:12:29,670 I'll go back to my schema. 1242 01:12:29,670 --> 01:12:32,970 And let me just make sure that I'm making all the columns that I 1243 01:12:32,970 --> 01:12:35,610 want to be required actually required. 1244 01:12:35,610 --> 01:12:39,990 I'll go into my swipes table and I'll say that type-- 1245 01:12:39,990 --> 01:12:41,700 I mean, this should be required. 1246 01:12:41,700 --> 01:12:46,380 I should know whether I entered, exited, or deposited some funds. 1247 01:12:46,380 --> 01:12:48,750 So I'll say text not null. 1248 01:12:48,750 --> 01:12:52,590 Similarly, for the timestamp, the time this swipe happened, 1249 01:12:52,590 --> 01:12:54,970 I want that to be not null as well. 1250 01:12:54,970 --> 01:12:57,660 I must know what time I swiped the card. 1251 01:12:57,660 --> 01:13:00,270 And also, it makes sense for there to always be 1252 01:13:00,270 --> 01:13:02,820 some amount associated with this swipe. 1253 01:13:02,820 --> 01:13:07,450 Either I added some funds to my balance or I removed some funds overall. 1254 01:13:07,450 --> 01:13:11,260 So I'll make this not null as well. 1255 01:13:11,260 --> 01:13:14,910 Well, let's see what we could do here with default. 1256 01:13:14,910 --> 01:13:18,900 Default gives me some default value, some initial value 1257 01:13:18,900 --> 01:13:25,320 to add to this column if I don't specify what kind of value to add, 1258 01:13:25,320 --> 01:13:27,840 what be good for date time. 1259 01:13:27,840 --> 01:13:30,390 Date time here is, again, the timestamp. 1260 01:13:30,390 --> 01:13:35,080 The time I swiped this card to enter, let's say, Harvard station. 1261 01:13:35,080 --> 01:13:40,210 Well, if I want this to always have the current time, 1262 01:13:40,210 --> 01:13:43,690 I could use default. I could say the default 1263 01:13:43,690 --> 01:13:48,220 value for this column is this special value here, 1264 01:13:48,220 --> 01:13:52,160 CURRENT TIMESTAMP, in all caps. 1265 01:13:52,160 --> 01:13:57,470 CURRENT TIMESTAMP will show me the year, the month, the day, the hour, 1266 01:13:57,470 --> 01:14:02,600 the minute, the second, all in one value, and insert that into my table. 1267 01:14:02,600 --> 01:14:08,330 So as soon as I add new row, if I don't supply a time for this datetime column, 1268 01:14:08,330 --> 01:14:15,860 I'll instead get back the current time exactly as it is represented in SQLite. 1269 01:14:15,860 --> 01:14:17,760 Now what can I do further than this? 1270 01:14:17,760 --> 01:14:21,750 I could also try to add a-- my very own check. 1271 01:14:21,750 --> 01:14:26,780 Maybe I want to make sure that the amounts here are never equal to zero. 1272 01:14:26,780 --> 01:14:31,220 Like, nobody should ever be able to make a $0 transaction or $0 swipe. 1273 01:14:31,220 --> 01:14:33,860 They're always being charged some money or they're always 1274 01:14:33,860 --> 01:14:36,050 depositing some money, in this case. 1275 01:14:36,050 --> 01:14:41,540 So I could say, amount here has my very own check on this column. 1276 01:14:41,540 --> 01:14:45,980 And inside check, I can actually write my very own expression to check for. 1277 01:14:45,980 --> 01:14:51,170 I could say, for example, amount is not equal to zero. 1278 01:14:51,170 --> 01:14:54,500 Using those same operators we saw back in week zero, 1279 01:14:54,500 --> 01:15:01,810 this will ensure that any value inside amount will not be equal to zero. 1280 01:15:01,810 --> 01:15:03,760 Let's try also for type. 1281 01:15:03,760 --> 01:15:06,730 I mean, type can only have a few values. 1282 01:15:06,730 --> 01:15:11,170 We saw enter, exit, deposit some funds. 1283 01:15:11,170 --> 01:15:16,060 I could make sure that only those values are included inside my type column. 1284 01:15:16,060 --> 01:15:20,500 I could say check that type is in some list of values. 1285 01:15:20,500 --> 01:15:23,560 Going back to week one here, we talk about in. 1286 01:15:23,560 --> 01:15:31,460 I could say, maybe the type is in enter exit or is in deposit. 1287 01:15:31,460 --> 01:15:34,570 So now when I have this table called swipes, 1288 01:15:34,570 --> 01:15:38,740 I'm representing what I'm actually doing when I go to Harvard station. 1289 01:15:38,740 --> 01:15:42,633 I have here a visit for myself, my very own ID 1290 01:15:42,633 --> 01:15:44,050 here, which will update in second. 1291 01:15:44,050 --> 01:15:47,290 I have a station ID where I'm actually going to visit. 1292 01:15:47,290 --> 01:15:52,270 I have a type that I'm going to use to enter, exit, or deposit 1293 01:15:52,270 --> 01:15:54,100 some funds at this station. 1294 01:15:54,100 --> 01:15:56,710 I'm doing it at a certain time, and I have an amount 1295 01:15:56,710 --> 01:16:00,200 associated with this transaction. 1296 01:16:00,200 --> 01:16:02,500 Now, there's one thing to fix here, which 1297 01:16:02,500 --> 01:16:07,040 is that we're still talking about riders inside our swipes table. 1298 01:16:07,040 --> 01:16:08,360 So let's fix that here, too. 1299 01:16:08,360 --> 01:16:12,460 I'll go back to my computer, and let's try fixing this. 1300 01:16:12,460 --> 01:16:19,120 I have a rider ID inside of my swipes table, but no longer do I have riders, 1301 01:16:19,120 --> 01:16:20,680 I have cards. 1302 01:16:20,680 --> 01:16:25,780 So let me say that this is now a card ID, and down below in my foreign key, 1303 01:16:25,780 --> 01:16:32,650 I'll say that this card ID column references the ID column in cards. 1304 01:16:32,650 --> 01:16:35,140 And I think this should represent everything 1305 01:16:35,140 --> 01:16:39,800 I want to represent about swipes at the station. 1306 01:16:39,800 --> 01:16:44,690 So let me ask now what questions we have about this new schema 1307 01:16:44,690 --> 01:16:47,270 and the constraints we've applied. 1308 01:16:47,270 --> 01:16:50,210 AUDIENCE: I wonder, how could you delete or drop 1309 01:16:50,210 --> 01:16:56,510 the table of riders when you use ID as a foreign key? 1310 01:16:56,510 --> 01:17:00,128 I try to do that, but I got an error. 1311 01:17:00,128 --> 01:17:03,170 CARTER ZENKE: Yeah, so you're getting into some more advanced stuff here. 1312 01:17:03,170 --> 01:17:06,500 And suffice to say for now, my foreign key constraints 1313 01:17:06,500 --> 01:17:09,830 aren't actually being checked right now, but yours might be. 1314 01:17:09,830 --> 01:17:14,090 If you try to drop a table that actually has some data that's 1315 01:17:14,090 --> 01:17:17,900 referenced by a foreign key, SQLite will warn you, perhaps to tell you, 1316 01:17:17,900 --> 01:17:21,840 you can't do that because this ID is referenced from this table over there. 1317 01:17:21,840 --> 01:17:26,780 So in that case, best to delete the value that has that foreign key 1318 01:17:26,780 --> 01:17:30,470 and then proceed with dropping that table altogether. 1319 01:17:30,470 --> 01:17:31,910 Good question there. 1320 01:17:31,910 --> 01:17:34,630 Yeah, let's take one more. 1321 01:17:34,630 --> 01:17:43,570 AUDIENCE: How might this syntax send the other SQL languages, same as SQLite 1322 01:17:43,570 --> 01:17:46,580 languages, how much is syntax same? 1323 01:17:46,580 --> 01:17:48,080 CARTER ZENKE: Yeah, a good question. 1324 01:17:48,080 --> 01:17:52,000 So here we're using the SQLite database management system. 1325 01:17:52,000 --> 01:17:55,060 It is similar to but distinct in some ways 1326 01:17:55,060 --> 01:17:57,820 from others like MySQL or PostgreSQL. 1327 01:17:57,820 --> 01:18:00,220 I would say that most of what you're doing here 1328 01:18:00,220 --> 01:18:04,822 could also be ported to MySQL and PostgreSQL with a few exceptions 1329 01:18:04,822 --> 01:18:07,030 that you might need to treat on a case by case basis. 1330 01:18:07,030 --> 01:18:10,220 In fact, the developers of SQLite built things 1331 01:18:10,220 --> 01:18:14,800 so that it would be easy to port their database schemas to another schema 1332 01:18:14,800 --> 01:18:18,140 like MySQL or PostgreSQL as well. 1333 01:18:18,140 --> 01:18:21,500 But good question there. 1334 01:18:21,500 --> 01:18:23,960 All right, let's take one more question here. 1335 01:18:23,960 --> 01:18:30,260 AUDIENCE: Imagine if we put in the ID, we don't put any data type, 1336 01:18:30,260 --> 01:18:32,090 it's going to give us-- 1337 01:18:32,090 --> 01:18:37,032 in this schema, it's going to give us a text or-- 1338 01:18:37,032 --> 01:18:38,490 CARTER ZENKE: Yeah, great question. 1339 01:18:38,490 --> 01:18:42,980 So you're asking if I didn't tell SQLite what kind of type affinity 1340 01:18:42,980 --> 01:18:46,220 a column had, what type of affinity would it actually have? 1341 01:18:46,220 --> 01:18:47,360 A great question. 1342 01:18:47,360 --> 01:18:52,460 In this case, by default, SQLite gives the numeric type affinity, 1343 01:18:52,460 --> 01:18:56,250 where numeric can store integers or real values. 1344 01:18:56,250 --> 01:18:59,180 But if you give it like a string of text, like let's say red line, 1345 01:18:59,180 --> 01:19:04,980 it will store that for you, too, kind of non-intuitively, but it will. 1346 01:19:04,980 --> 01:19:08,310 But the default type affinity is numeric in this case, 1347 01:19:08,310 --> 01:19:10,340 if you don't otherwise specify. 1348 01:19:10,340 --> 01:19:11,900 Good question. 1349 01:19:11,900 --> 01:19:17,410 OK, let's come back then and let's focus on wrapping up on a final few pieces 1350 01:19:17,410 --> 01:19:17,910 here. 1351 01:19:17,910 --> 01:19:20,810 So we've seen some table constraints, which 1352 01:19:20,810 --> 01:19:23,360 we apply primary keys and foreign keys. 1353 01:19:23,360 --> 01:19:26,360 We saw column constraints, where we could make sure 1354 01:19:26,360 --> 01:19:28,970 that certain values were given to us through not null. 1355 01:19:28,970 --> 01:19:31,730 We could also make sure that the-- 1356 01:19:31,730 --> 01:19:33,925 let's say the value is in some list of values, 1357 01:19:33,925 --> 01:19:37,460 they're using check or making sure that it's not some value, 1358 01:19:37,460 --> 01:19:39,320 also using check as well. 1359 01:19:39,320 --> 01:19:44,930 Default allows us to specify a given value for every new insertion of rows 1360 01:19:44,930 --> 01:19:45,560 here. 1361 01:19:45,560 --> 01:19:49,550 And this is actually pretty important to have. 1362 01:19:49,550 --> 01:19:53,750 Not just schemas that have column names and let's 1363 01:19:53,750 --> 01:19:57,320 say type affinities as well, but also constraints, 1364 01:19:57,320 --> 01:20:00,210 make sure the data we insert works well for us. 1365 01:20:00,210 --> 01:20:02,960 And there's actually a story behind this person 1366 01:20:02,960 --> 01:20:07,820 who's on the CharlieCard, this person who's on the subway. 1367 01:20:07,820 --> 01:20:11,420 His name is Charlie, and he's perhaps the most famous subway 1368 01:20:11,420 --> 01:20:13,880 rider in all of Boston. 1369 01:20:13,880 --> 01:20:17,900 Back in the 19-- let's say 1900s, the band 1370 01:20:17,900 --> 01:20:21,290 called The Kingston Trio wrote a song about this man named Charlie. 1371 01:20:21,290 --> 01:20:26,030 Charlie, supposedly, got on at the Kendall Square station where MIT is, 1372 01:20:26,030 --> 01:20:27,950 and he made for Jamaica Plain. 1373 01:20:27,950 --> 01:20:30,320 But once he got to Jamaica Plane, the conductor 1374 01:20:30,320 --> 01:20:34,040 asked him for one more nickel, and well, he didn't have that nickel. 1375 01:20:34,040 --> 01:20:36,860 So he got stuck on the train for years, and years, and years. 1376 01:20:36,860 --> 01:20:38,750 He couldn't get off of the subway. 1377 01:20:38,750 --> 01:20:42,532 So keep in mind, Charlie, when you're writing your own database constraints, 1378 01:20:42,532 --> 01:20:44,240 making sure that if you get on the train, 1379 01:20:44,240 --> 01:20:46,610 you're able to get off of it at some point, right? 1380 01:20:46,610 --> 01:20:49,080 Don't end up like Charlie in this case. 1381 01:20:49,080 --> 01:20:53,720 So with this in mind, you're able to design your very own database 1382 01:20:53,720 --> 01:20:56,840 schemas that keep not just certain columns involved, 1383 01:20:56,840 --> 01:20:59,210 but also type affinities for those columns, types-- 1384 01:20:59,210 --> 01:21:01,790 those data types-- types the columns can store. 1385 01:21:01,790 --> 01:21:04,940 You're also able to apply constraints to those columns to make sure 1386 01:21:04,940 --> 01:21:07,130 that the data you're inserting is that you actually 1387 01:21:07,130 --> 01:21:09,380 want to have in that column. 1388 01:21:09,380 --> 01:21:13,100 Next time, we'll focus on actually adding data to our columns, 1389 01:21:13,100 --> 01:21:16,260 actually write data to a database file to insert, 1390 01:21:16,260 --> 01:21:18,720 update, and delete that data altogether. 1391 01:21:18,720 --> 01:21:22,480 So with that in mind, we'll see you next time. 1392 01:21:22,480 --> 01:21:24,000