1 00:00:00,000 --> 00:00:03,458 [MUSIC PLAYING] 2 00:00:03,458 --> 00:00:16,810 3 00:00:16,810 --> 00:00:19,270 CARTER ZENKE: OK, well, hello, one and all, and welcome 4 00:00:19,270 --> 00:00:22,810 back to CS50 Introduction to Databases with SQL. 5 00:00:22,810 --> 00:00:25,270 My name is Carter Zenke, and this week is 6 00:00:25,270 --> 00:00:28,900 the culmination of all you've learned over the past several weeks 7 00:00:28,900 --> 00:00:29,830 in the course. 8 00:00:29,830 --> 00:00:33,070 Today, we'll give the tools to all that you've done in the past, 9 00:00:33,070 --> 00:00:35,710 but now just at a bigger scale. 10 00:00:35,710 --> 00:00:40,900 Now, when we're talking about scale, it's worth defining what scale is 11 00:00:40,900 --> 00:00:45,340 and what it means for some application to have scalability. 12 00:00:45,340 --> 00:00:48,790 For an application to be scalable or to have scalability, 13 00:00:48,790 --> 00:00:52,270 it means it can take some number of requests and handle those, 14 00:00:52,270 --> 00:00:55,120 whether that request is small or large. 15 00:00:55,120 --> 00:00:58,240 And so to kick things off here, let me ask a question. 16 00:00:58,240 --> 00:01:01,480 What are some famous tech companies you've seen 17 00:01:01,480 --> 00:01:04,000 that have needed to scale over time? 18 00:01:04,000 --> 00:01:05,540 Let's brainstorm here. 19 00:01:05,540 --> 00:01:07,960 Which ones are you familiar with that needed 20 00:01:07,960 --> 00:01:12,190 to scale over time to handle more users, more data? 21 00:01:12,190 --> 00:01:13,630 What have you seen? 22 00:01:13,630 --> 00:01:16,090 AUDIENCE: Any social media sites, such as Facebook? 23 00:01:16,090 --> 00:01:17,950 CARTER ZENKE: Yeah, any social media sites. 24 00:01:17,950 --> 00:01:22,240 You can think of things like Facebook, now Meta, Instagram, and similar. 25 00:01:22,240 --> 00:01:27,220 And these kinds of sites might have some really large number of users 26 00:01:27,220 --> 00:01:30,080 that are always accessing their data. 27 00:01:30,080 --> 00:01:33,610 But even though they have a constantly large number of users, 28 00:01:33,610 --> 00:01:37,090 there might even be spikes in the number of requests that they get. 29 00:01:37,090 --> 00:01:40,600 Maybe during the World Cup, for example, everyone is on Facebook, 30 00:01:40,600 --> 00:01:41,920 everyone's on Instagram. 31 00:01:41,920 --> 00:01:45,820 That would be a spike our application should be able to handle, 32 00:01:45,820 --> 00:01:48,280 if it is truly scalable. 33 00:01:48,280 --> 00:01:51,160 Let's take one more idea here. 34 00:01:51,160 --> 00:01:56,930 What other kinds of companies-- tech, prominently-- have needed to scale? 35 00:01:56,930 --> 00:02:01,760 AUDIENCE: So I think for banking systems, they might start locally, 36 00:02:01,760 --> 00:02:04,310 but if we consider national and international level banking? 37 00:02:04,310 --> 00:02:05,560 CARTER ZENKE: So you're right. 38 00:02:05,560 --> 00:02:10,419 A banking system might need to scale as they gain more users, or more accounts, 39 00:02:10,419 --> 00:02:13,490 or they want to process more transactions. 40 00:02:13,490 --> 00:02:18,940 Now, today, we'll introduce you to a few new DBMSes, or database management 41 00:02:18,940 --> 00:02:24,340 systems, like MySQL and PostgreSQL-- or just Postgres for short. 42 00:02:24,340 --> 00:02:26,860 And developers often reach to these when they 43 00:02:26,860 --> 00:02:30,130 want to scale some application or some database 44 00:02:30,130 --> 00:02:36,490 because, whereas SQLite was a embedded database, MySQL and Postgres, 45 00:02:36,490 --> 00:02:38,650 these are database servers. 46 00:02:38,650 --> 00:02:42,970 That means they often run on their own dedicated hardware 47 00:02:42,970 --> 00:02:47,320 that you connect to over the internet to run your SQL queries. 48 00:02:47,320 --> 00:02:50,950 And now, being a database server, they have a few advantages. 49 00:02:50,950 --> 00:02:55,880 They can store data not just on disk, like on an SSD or a hard drive, 50 00:02:55,880 --> 00:03:01,840 but in RAM, or Random Access Memory, which often allows for faster querying. 51 00:03:01,840 --> 00:03:04,870 They also have a full feature set to take advantage 52 00:03:04,870 --> 00:03:08,170 of ways you could scale, like replication and sharding, 53 00:03:08,170 --> 00:03:11,030 that we'll talk about later on today. 54 00:03:11,030 --> 00:03:16,690 Now, before then, we'll actually compare MySQL and Postgres with SQLite, 55 00:03:16,690 --> 00:03:19,300 paying attention, in particular, to types. 56 00:03:19,300 --> 00:03:23,840 But later on, we'll focus on their features for scaling as well. 57 00:03:23,840 --> 00:03:27,070 Now, we'll do all of this in the context of the MBTA, 58 00:03:27,070 --> 00:03:29,980 or the Massachusetts Bay Transportation Authority. 59 00:03:29,980 --> 00:03:32,950 This is the subway system here in Boston. 60 00:03:32,950 --> 00:03:35,200 And if you're ever in town and you want to get around, 61 00:03:35,200 --> 00:03:38,200 you might use one of these subway lines. 62 00:03:38,200 --> 00:03:43,840 You could enter under the ground, get on a train, and go across town on either, 63 00:03:43,840 --> 00:03:46,280 let's say, the red line, on watch Harvard and MIT are 64 00:03:46,280 --> 00:03:50,287 a part, the green line, the orange line, or the blue line. 65 00:03:50,287 --> 00:03:52,120 And if you recall, in our week on designing, 66 00:03:52,120 --> 00:03:58,030 we talked about how to represent all of the riders and stations on the MBTA 67 00:03:58,030 --> 00:03:59,140 here. 68 00:03:59,140 --> 00:04:02,470 Now, one thing we discussed was that the MBTA doesn't 69 00:04:02,470 --> 00:04:04,540 keep track of individual people. 70 00:04:04,540 --> 00:04:07,960 You often don't register your name with them. 71 00:04:07,960 --> 00:04:10,990 You, instead, get your very own CharlieCard 72 00:04:10,990 --> 00:04:14,170 that you might swipe when you enter some station. 73 00:04:14,170 --> 00:04:16,690 Often, if I'm getting on the Harvard Square stop, 74 00:04:16,690 --> 00:04:19,750 I'll take out my CharlieCard and enter that station. 75 00:04:19,750 --> 00:04:24,310 I'll swipe it against some counter, and that will log my fare. 76 00:04:24,310 --> 00:04:27,580 I could also swipe my card to deposit some funds, 77 00:04:27,580 --> 00:04:29,660 to pay my fare in the future. 78 00:04:29,660 --> 00:04:33,160 And in some stops, I might actually swipe to leave the station 79 00:04:33,160 --> 00:04:35,890 and pay some extra fare there. 80 00:04:35,890 --> 00:04:41,635 Now, we decided, then, to represent, in this case, cards, swipes, and stations 81 00:04:41,635 --> 00:04:43,990 with their own table for each of them. 82 00:04:43,990 --> 00:04:49,030 Cards just had their own unique ID, and those cards made some swipe. 83 00:04:49,030 --> 00:04:52,270 And like we just mentioned, this swipe has a certain type. 84 00:04:52,270 --> 00:04:55,280 Did I swipe it to, let's say, enter the station, 85 00:04:55,280 --> 00:04:58,630 did I swipe it to exit the station, or did I 86 00:04:58,630 --> 00:05:02,530 swipe my card to deposit some funds into my account? 87 00:05:02,530 --> 00:05:06,610 And each swipe has, as well, its own date and time it occurred 88 00:05:06,610 --> 00:05:09,250 as well as some amount associated with it. 89 00:05:09,250 --> 00:05:12,460 Did I pay some money to enter, or did I deposit some money 90 00:05:12,460 --> 00:05:15,730 to actually add to the funds in my account? 91 00:05:15,730 --> 00:05:19,300 And then, finally, this all happens at individual stations, 92 00:05:19,300 --> 00:05:22,900 with their own IDs, their own names, like the Harvard Square stop 93 00:05:22,900 --> 00:05:26,140 or the Kendall/MIT stop, and their own line, like the red, 94 00:05:26,140 --> 00:05:28,990 green, blue, or orange lines. 95 00:05:28,990 --> 00:05:34,420 So our goal will be to first compare MySQL with SQLite, seeing 96 00:05:34,420 --> 00:05:39,140 how we can translate a schema like this from SQLite to MySQL. 97 00:05:39,140 --> 00:05:43,090 OK, let me come back to my computer here and open up our very first instance 98 00:05:43,090 --> 00:05:45,130 of a MySQL server. 99 00:05:45,130 --> 00:05:47,920 So to connect to this server here, I can run 100 00:05:47,920 --> 00:05:50,770 some command that begins with mysql. 101 00:05:50,770 --> 00:05:53,170 This is a command line program I can run to connect 102 00:05:53,170 --> 00:05:55,720 to the server over the internet. 103 00:05:55,720 --> 00:06:01,150 Now, unlike SQLite, MySQL actually has individual user accounts 104 00:06:01,150 --> 00:06:03,290 to log into the database. 105 00:06:03,290 --> 00:06:09,110 And here, I want to log in as the root user or the admin for this database. 106 00:06:09,110 --> 00:06:13,540 Now, to say the user I want to log in as, I can type dash u, where 107 00:06:13,540 --> 00:06:17,380 u is in lowercase here, followed by the name of the user 108 00:06:17,380 --> 00:06:19,690 I'm logging in as-- in this case, root. 109 00:06:19,690 --> 00:06:25,270 And again, root to just some synonym for the admin of some database. 110 00:06:25,270 --> 00:06:27,940 Now, I connect to this database over the internet. 111 00:06:27,940 --> 00:06:32,350 And I might often include the IP address of the computer, the unique identifier 112 00:06:32,350 --> 00:06:34,720 of that computer on the internet. 113 00:06:34,720 --> 00:06:38,840 For my sake, though, this database is on my very own computer here. 114 00:06:38,840 --> 00:06:40,030 So for the host-- 115 00:06:40,030 --> 00:06:42,910 or I type dash h, where h is in lowercase-- 116 00:06:42,910 --> 00:06:48,070 I can say 127.0.0.1, which, if you're familiar, 117 00:06:48,070 --> 00:06:53,900 is the IP address for my own home computer, also known as local host. 118 00:06:53,900 --> 00:06:56,930 Now, I can keep going, and I could tell MySQL 119 00:06:56,930 --> 00:06:59,720 what the port I'm connecting to is. 120 00:06:59,720 --> 00:07:03,650 In this case, I'll type dash p, where p is an uppercase, and then 121 00:07:03,650 --> 00:07:04,520 the port number-- 122 00:07:04,520 --> 00:07:11,370 3306 by default. And then I want to be prompted for the root password. 123 00:07:11,370 --> 00:07:14,330 So I'll type dash p, where p is lowercase. 124 00:07:14,330 --> 00:07:16,980 And finally, I'll hit Enter to connect. 125 00:07:16,980 --> 00:07:20,210 I'll type in my password, and I should see 126 00:07:20,210 --> 00:07:24,200 that I'm connected to my MySQL server. 127 00:07:24,200 --> 00:07:26,990 Now, this is a full database server. 128 00:07:26,990 --> 00:07:30,870 And as such, it could have multiple databases inside of it. 129 00:07:30,870 --> 00:07:33,620 In fact, there are a few here already. 130 00:07:33,620 --> 00:07:37,010 If I type SHOW DATABASES, just like this, 131 00:07:37,010 --> 00:07:42,440 you should see, by default, on your own MySQL installation on your own server, 132 00:07:42,440 --> 00:07:45,470 a few default databases, like information schema, 133 00:07:45,470 --> 00:07:48,380 or MySQL, or performance schema, or system. 134 00:07:48,380 --> 00:07:52,160 And these are tables or databases that have some meta 135 00:07:52,160 --> 00:07:55,940 information on the server, like the user accounts, how performance 136 00:07:55,940 --> 00:07:58,820 is configured, and other details that you might care about 137 00:07:58,820 --> 00:08:01,970 if you're a system administrator. 138 00:08:01,970 --> 00:08:06,860 So we've seen here how to log in to a MySQL server and some of these default 139 00:08:06,860 --> 00:08:07,940 databases. 140 00:08:07,940 --> 00:08:13,130 Now let me ask, what questions do we have on MySQL so far? 141 00:08:13,130 --> 00:08:14,207 AUDIENCE: What's -u root? 142 00:08:14,207 --> 00:08:16,290 CARTER ZENKE: Yeah, let me go back to my computer. 143 00:08:16,290 --> 00:08:19,457 I can show you what that looked like so we can refresh our memories on that. 144 00:08:19,457 --> 00:08:22,280 So if you recall-- let me open up a new terminal here-- 145 00:08:22,280 --> 00:08:29,630 to connect to this server I typed mysql u root, or -u root. 146 00:08:29,630 --> 00:08:33,080 And what this means is when I want to connect to my server, 147 00:08:33,080 --> 00:08:36,320 I have to connect to it using some user account. 148 00:08:36,320 --> 00:08:39,559 Unlike SQL, where I could just kind of log in without any users, 149 00:08:39,559 --> 00:08:41,750 MySQL has users. 150 00:08:41,750 --> 00:08:46,730 So the -u says, here comes the user I want to log in as. 151 00:08:46,730 --> 00:08:50,480 And root, in this case, is a name for the admin, 152 00:08:50,480 --> 00:08:55,220 just some geek historical term for the admin of some system. 153 00:08:55,220 --> 00:09:01,160 I could also very well use -u carter if my username was just Carter. 154 00:09:01,160 --> 00:09:03,230 But here, we're using route to get access 155 00:09:03,230 --> 00:09:07,380 to some administrative privileges. 156 00:09:07,380 --> 00:09:08,590 OK, good question. 157 00:09:08,590 --> 00:09:10,000 Good to clarify that. 158 00:09:10,000 --> 00:09:11,410 So let's keep going. 159 00:09:11,410 --> 00:09:15,480 And right now, we have access to our database server, 160 00:09:15,480 --> 00:09:17,920 with the databases inside of it. 161 00:09:17,920 --> 00:09:20,910 But our goal is to design the MBTA system 162 00:09:20,910 --> 00:09:24,600 and create our very own database, now in MySQL. 163 00:09:24,600 --> 00:09:27,570 So let me try creating some new database. 164 00:09:27,570 --> 00:09:31,030 I'll say CREATE DATABASE, like this. 165 00:09:31,030 --> 00:09:32,730 And now I'll give it some name. 166 00:09:32,730 --> 00:09:35,940 And in MySQL, I don't use double quotes, like I did in SQLite. 167 00:09:35,940 --> 00:09:38,970 I use back ticks, like this. 168 00:09:38,970 --> 00:09:42,030 CREATE DATABASE, back tick, mbta for title, 169 00:09:42,030 --> 00:09:44,700 back tick again to close that all out. 170 00:09:44,700 --> 00:09:47,010 Now I'll hit Enter. 171 00:09:47,010 --> 00:09:50,190 And I'll see the query goes, OK, one row affected. 172 00:09:50,190 --> 00:09:53,100 If I now type SHOW DATABASES-- 173 00:09:53,100 --> 00:10:00,640 SHOW DATABASES-- I should see that MBTA is now part of this server. 174 00:10:00,640 --> 00:10:04,740 And if I want to run my future queries on this database, 175 00:10:04,740 --> 00:10:08,040 I could use another command, another statement here, USE. 176 00:10:08,040 --> 00:10:11,670 I'll say USE mbta, semicolon, hit Enter. 177 00:10:11,670 --> 00:10:13,750 And I see Database changed. 178 00:10:13,750 --> 00:10:21,450 So now all my future queries will be run on this MBTA database that I just made. 179 00:10:21,450 --> 00:10:24,230 So presently, our database is empty. 180 00:10:24,230 --> 00:10:27,150 And it's up to us to add in some tables. 181 00:10:27,150 --> 00:10:31,790 So let's think through how we could add tables to our very own database we've 182 00:10:31,790 --> 00:10:33,810 just made here. 183 00:10:33,810 --> 00:10:38,240 But one thing we should focus on is this idea of the types for our columns. 184 00:10:38,240 --> 00:10:40,625 I think we know already what tables we'll have 185 00:10:40,625 --> 00:10:45,170 and what names we'll give them, but it remains to be seen what kind of columns 186 00:10:45,170 --> 00:10:47,300 should we add. 187 00:10:47,300 --> 00:10:51,620 And actually, in MySQL, we get a lot more control over types. 188 00:10:51,620 --> 00:10:55,070 We no longer have just text and numeric and integer. 189 00:10:55,070 --> 00:10:59,240 We have a lot more control and a lot more variety. 190 00:10:59,240 --> 00:11:02,150 So let's look here at our cards table. 191 00:11:02,150 --> 00:11:05,330 We want to recreate this now in MySQL. 192 00:11:05,330 --> 00:11:07,220 And this is from SQLite here. 193 00:11:07,220 --> 00:11:11,870 Recall how we had a table called cards that had an id column, with the type 194 00:11:11,870 --> 00:11:13,520 affinity of INTEGER. 195 00:11:13,520 --> 00:11:16,830 And that was our primary key down here. 196 00:11:16,830 --> 00:11:20,420 So our goal is now to think about how do we translate this from SQL 197 00:11:20,420 --> 00:11:22,430 right to MySQL. 198 00:11:22,430 --> 00:11:28,280 So we should think, what kind of types might MySQL have for integers? 199 00:11:28,280 --> 00:11:30,710 Well, it turns out there's more than just one. 200 00:11:30,710 --> 00:11:32,270 There are actually several here. 201 00:11:32,270 --> 00:11:38,180 We have tiny int, small int, medium int, int, and then big int, 202 00:11:38,180 --> 00:11:41,090 so lots of choices over our integers. 203 00:11:41,090 --> 00:11:44,150 And these vary just based on their size. 204 00:11:44,150 --> 00:11:46,370 I mean, they all store whole numbers. 205 00:11:46,370 --> 00:11:49,310 But they do vary based on their size. 206 00:11:49,310 --> 00:11:52,310 So a tiny int is only a single byte. 207 00:11:52,310 --> 00:11:53,630 It's about 8 bits. 208 00:11:53,630 --> 00:11:59,010 But a big int, that's 8 bytes, or 64 bits. 209 00:11:59,010 --> 00:12:01,670 Now, why do we have so many of these? 210 00:12:01,670 --> 00:12:04,100 Why can't we just have one? 211 00:12:04,100 --> 00:12:07,280 Well, let me ask you all to use your own intuition. 212 00:12:07,280 --> 00:12:11,450 I mean, what does using more bytes and more bits 213 00:12:11,450 --> 00:12:15,110 get me when I'm trying to represent an integer? 214 00:12:15,110 --> 00:12:20,600 MySQL basically, it depends on the need of the DB or the organization. 215 00:12:20,600 --> 00:12:24,140 If the data requires much more storage, then it 216 00:12:24,140 --> 00:12:27,870 means we need to have a bigger data type. 217 00:12:27,870 --> 00:12:33,980 But if currently if we need smaller sizes, then it means 218 00:12:33,980 --> 00:12:37,437 we need to use smaller data types. 219 00:12:37,437 --> 00:12:39,270 CARTER ZENKE: Yeah, so I like your thinking. 220 00:12:39,270 --> 00:12:43,730 And I think kind of behind that is this idea that if we used more bits, 221 00:12:43,730 --> 00:12:46,160 we'd be able to count higher. 222 00:12:46,160 --> 00:12:49,760 Just as I have on my hands, like 10 fingers, if I had more fingers, 223 00:12:49,760 --> 00:12:52,220 I could count higher in unitary notation, right? 224 00:12:52,220 --> 00:12:55,490 Or if I had more digits, I could count higher using decimal. 225 00:12:55,490 --> 00:12:59,780 So it's the same thing in binary here, where I could count higher 226 00:12:59,780 --> 00:13:01,950 the more bits that I have. 227 00:13:01,950 --> 00:13:05,240 So if I take a look at this table here, each data type 228 00:13:05,240 --> 00:13:08,990 has a range of values that it could possibly store. 229 00:13:08,990 --> 00:13:12,810 If I'm talking about both negative and positive values, 230 00:13:12,810 --> 00:13:15,890 here's the range for each of these data types. 231 00:13:15,890 --> 00:13:21,020 You could see for a tiny int we can only count up to 127. 232 00:13:21,020 --> 00:13:21,680 And that is it. 233 00:13:21,680 --> 00:13:25,340 We cannot count higher using a single byte. 234 00:13:25,340 --> 00:13:30,110 But if I used a big int, well, I could count all the way up to 2 235 00:13:30,110 --> 00:13:32,870 to the 63rd power minus 1. 236 00:13:32,870 --> 00:13:34,970 That is almost unpronounceably big. 237 00:13:34,970 --> 00:13:38,300 It's something like a quintillion number of options 238 00:13:38,300 --> 00:13:41,810 that you might have between these two minimum values 239 00:13:41,810 --> 00:13:43,800 and the maximum values here. 240 00:13:43,800 --> 00:13:47,720 So all trade off what you want to represent. 241 00:13:47,720 --> 00:13:51,230 But our goal here is to translate a primary key. 242 00:13:51,230 --> 00:13:53,780 A primary key, it isn't negative. 243 00:13:53,780 --> 00:13:58,460 So we could perhaps use something called an unsigned integer. 244 00:13:58,460 --> 00:14:03,620 An unsigned integer has no negative value, starts at 0, 245 00:14:03,620 --> 00:14:07,430 but can then go to a higher positive value 246 00:14:07,430 --> 00:14:11,000 because we're now using some of those combinations of ones and zeros 247 00:14:11,000 --> 00:14:15,410 that represented negative numbers now for positive numbers too. 248 00:14:15,410 --> 00:14:22,760 So here an int can go up to almost 4.2 billion values, between 0 249 00:14:22,760 --> 00:14:24,290 and 4.2 billion. 250 00:14:24,290 --> 00:14:28,200 So an int is already pretty darn big. 251 00:14:28,200 --> 00:14:33,480 So with these in mind, tiny int, small int, medium int, int, and big int, 252 00:14:33,480 --> 00:14:37,230 let's go back to our SQLite CREATE TABLE statement. 253 00:14:37,230 --> 00:14:42,420 And let me actually ask you all, what would you substitute in place here? 254 00:14:42,420 --> 00:14:45,870 We no longer have integer for our primary key. 255 00:14:45,870 --> 00:14:49,320 But which of these do you think seems the best 256 00:14:49,320 --> 00:14:58,800 substitute for our primary key on cards, tiny, small, medium, int, or big int? 257 00:14:58,800 --> 00:15:04,240 AUDIENCE: So in case we know that it's not going to be really huge, 258 00:15:04,240 --> 00:15:06,240 we can just use an int. 259 00:15:06,240 --> 00:15:12,872 But if it's going to be very large, like say a timestamp, we could use big int. 260 00:15:12,872 --> 00:15:14,580 CARTER ZENKE: Yeah, I like your thinking. 261 00:15:14,580 --> 00:15:18,300 So if it's big but not too big, maybe an int would be good. 262 00:15:18,300 --> 00:15:21,660 If we know for sure though it's going to be huge, well, 263 00:15:21,660 --> 00:15:23,640 a big int is probably best. 264 00:15:23,640 --> 00:15:26,290 And let me offer some intuition here. 265 00:15:26,290 --> 00:15:29,850 So if there are fewer than four billion people 266 00:15:29,850 --> 00:15:31,950 on this Earth, which I believe is a true fact, 267 00:15:31,950 --> 00:15:35,910 and we're using one of these IDs to represent one card, 268 00:15:35,910 --> 00:15:40,200 well, if everyone on Earth visited Boston and got their own CharlieCard, 269 00:15:40,200 --> 00:15:44,290 we wouldn't run out of values using an int here. 270 00:15:44,290 --> 00:15:46,740 So it seems like that's a pretty safe bet, 271 00:15:46,740 --> 00:15:49,830 unless we knew that everyone in Boston was 272 00:15:49,830 --> 00:15:54,390 going to have 10 CharlieCards, or 20, or more, then we could use a big int. 273 00:15:54,390 --> 00:15:57,670 But in this case, an int seems pretty safe. 274 00:15:57,670 --> 00:15:59,380 So let's update this. 275 00:15:59,380 --> 00:16:02,970 Let's go from SQLite now to MySQL. 276 00:16:02,970 --> 00:16:07,080 So here I had the ID column with type affinity integer. 277 00:16:07,080 --> 00:16:10,020 And my first step in converting should probably 278 00:16:10,020 --> 00:16:12,930 be to get rid of these double quotes. 279 00:16:12,930 --> 00:16:14,640 I no longer need to use double quotes. 280 00:16:14,640 --> 00:16:18,690 I should use back ticks as a matter of style in MySQL. 281 00:16:18,690 --> 00:16:24,360 But now I should update this type from integer perhaps to a plain old int. 282 00:16:24,360 --> 00:16:26,880 As we discussed before, some intuition for that is we'll 283 00:16:26,880 --> 00:16:32,430 probably never have more than four billion CharlieCards in our table. 284 00:16:32,430 --> 00:16:38,460 Now, there's one more thing to add too, which is this, auto increment. 285 00:16:38,460 --> 00:16:44,130 So SQLite for free gave us a primary key that if I inserted some value 286 00:16:44,130 --> 00:16:46,290 and didn't provide a value for the primary key, 287 00:16:46,290 --> 00:16:48,520 SQLite would generate that for me. 288 00:16:48,520 --> 00:16:53,040 But in MySQL, I have to actually say auto increment. 289 00:16:53,040 --> 00:16:58,290 If I add some new row, make sure you take the highest primary key right now, 290 00:16:58,290 --> 00:17:02,140 increase it by 1, and assign it to that new value. 291 00:17:02,140 --> 00:17:06,660 So you'll often see primary keys being defined with the name ID, 292 00:17:06,660 --> 00:17:12,839 and the type int, and the auto increment property of that column there. 293 00:17:12,839 --> 00:17:18,300 So let me ask now, what questions do we have on these integers and types 294 00:17:18,300 --> 00:17:19,730 so far? 295 00:17:19,730 --> 00:17:23,000 AUDIENCE: For ID we have using integer type. 296 00:17:23,000 --> 00:17:27,530 Why we know that ID cannot be negative it will always start from one and go 297 00:17:27,530 --> 00:17:28,650 onward. 298 00:17:28,650 --> 00:17:30,458 So we should use it and [INAUDIBLE]. 299 00:17:30,458 --> 00:17:31,250 CARTER ZENKE: Nice. 300 00:17:31,250 --> 00:17:33,650 So here we didn't include it partly for slides sake. 301 00:17:33,650 --> 00:17:37,970 But you could if you wanted to make sure that the ID is unsigned. 302 00:17:37,970 --> 00:17:42,530 So unsigned means going from 0 all the way up to some larger number. 303 00:17:42,530 --> 00:17:45,020 Whereas not unsigned, or just assigned integer, 304 00:17:45,020 --> 00:17:47,360 goes from negative to some positive value. 305 00:17:47,360 --> 00:17:49,870 Absolutely right. 306 00:17:49,870 --> 00:17:51,500 OK, let's keep going then. 307 00:17:51,500 --> 00:17:54,200 And let's think through our next table here. 308 00:17:54,200 --> 00:17:57,760 So our next table is not just the cards table. 309 00:17:57,760 --> 00:17:59,770 It's also the stations table. 310 00:17:59,770 --> 00:18:03,490 And actually, before we do that, let's actually try to add in our cards table 311 00:18:03,490 --> 00:18:04,790 to our server. 312 00:18:04,790 --> 00:18:07,540 So I'll go to MySQL here. 313 00:18:07,540 --> 00:18:11,830 And now, let me try to create this table to store it inside of our server 314 00:18:11,830 --> 00:18:13,450 before we move on to stations. 315 00:18:13,450 --> 00:18:17,710 So I'll say create table cards. 316 00:18:17,710 --> 00:18:20,215 And then I'll follow some very similar syntax 317 00:18:20,215 --> 00:18:22,960 from what you've seen before, create table cards. 318 00:18:22,960 --> 00:18:29,660 Now, let me add an ID column, int, and auto increment like this. 319 00:18:29,660 --> 00:18:35,170 And now, I'll make the primary key ID, like this. 320 00:18:35,170 --> 00:18:39,160 Now, I'll close out my create table with a semicolon, hit Enter. 321 00:18:39,160 --> 00:18:43,440 And I should see the query is OK. 322 00:18:43,440 --> 00:18:49,350 Now, if I use the MBTA database, as I did before, I typed use MBTA, 323 00:18:49,350 --> 00:18:54,930 I could show tables, like this, semicolon, hit Enter. 324 00:18:54,930 --> 00:19:01,080 And now I'll see I have a cards table inside the MBTA database. 325 00:19:01,080 --> 00:19:05,730 So kind of like, for example, typing dot schema or dot tables, in this instance, 326 00:19:05,730 --> 00:19:09,870 in SQLite to see what tables I have. 327 00:19:09,870 --> 00:19:11,710 Now though, let me try this. 328 00:19:11,710 --> 00:19:14,940 Let me try describe cards. 329 00:19:14,940 --> 00:19:20,550 Describe is a bit more like dot schema to show me the schema of this table. 330 00:19:20,550 --> 00:19:22,200 I'll describe cards. 331 00:19:22,200 --> 00:19:24,990 And I'll see a few different columns here. 332 00:19:24,990 --> 00:19:27,810 Let me come over to this board here. 333 00:19:27,810 --> 00:19:35,320 And we'll see that this table, cards, has one column, one field, called ID. 334 00:19:35,320 --> 00:19:39,360 The type of this column is an integer, as we said it was before 335 00:19:39,360 --> 00:19:41,130 in our create table statement. 336 00:19:41,130 --> 00:19:45,060 Now, this null here, this is whether this column 337 00:19:45,060 --> 00:19:47,890 can accept a null value, yes or no. 338 00:19:47,890 --> 00:19:49,560 Yes means it can accept a null value. 339 00:19:49,560 --> 00:19:50,880 It can hold null. 340 00:19:50,880 --> 00:19:54,430 No means it should never hold a null value. 341 00:19:54,430 --> 00:20:00,280 So this ID should never include a null value. 342 00:20:00,280 --> 00:20:05,140 Now, the key, key column tells us what role this column plays 343 00:20:05,140 --> 00:20:07,270 in terms of the keys of this table. 344 00:20:07,270 --> 00:20:11,540 PRI here, P-R-I, stands for primary key. 345 00:20:11,540 --> 00:20:14,230 This is the primary key of our table. 346 00:20:14,230 --> 00:20:19,578 Default means the initial value given to any value. 347 00:20:19,578 --> 00:20:21,370 If we inserted some row with no value, we'd 348 00:20:21,370 --> 00:20:23,620 get back this default value here, null. 349 00:20:23,620 --> 00:20:26,650 And the extra is what we included auto increment. 350 00:20:26,650 --> 00:20:31,150 It should increase by one every time we add some new row. 351 00:20:31,150 --> 00:20:34,540 So using describe and show tables, you can actually 352 00:20:34,540 --> 00:20:38,660 see what you're creating along the way. 353 00:20:38,660 --> 00:20:39,910 So now, let's keep going. 354 00:20:39,910 --> 00:20:42,550 Let's actually focus on creating our stations table. 355 00:20:42,550 --> 00:20:48,680 And for that, we'll pull up our SQLite syntax we use to create that table. 356 00:20:48,680 --> 00:20:55,540 So here, here was what we used to create the stations table in SQLite. 357 00:20:55,540 --> 00:21:01,570 We gave each station an ID that was an integer, the primary key of this table. 358 00:21:01,570 --> 00:21:05,080 We then gave them each a name and a line. 359 00:21:05,080 --> 00:21:08,440 Like the name could be the Harvard Square Station or the Kendall MIT 360 00:21:08,440 --> 00:21:09,040 station. 361 00:21:09,040 --> 00:21:12,550 And the line could be red, green, blue, whatever line 362 00:21:12,550 --> 00:21:14,350 this station is a part of. 363 00:21:14,350 --> 00:21:19,240 And of course, the primary key is our ID column, as we just said. 364 00:21:19,240 --> 00:21:22,100 Now here, we've seen how to handle this integer type. 365 00:21:22,100 --> 00:21:25,060 But we haven't yet seen how to handle text. 366 00:21:25,060 --> 00:21:30,250 Like could we do better than this in MySQL compared to SQLite? 367 00:21:30,250 --> 00:21:31,840 It turns out that we can. 368 00:21:31,840 --> 00:21:35,860 We have a variety of options for representing strings, basically 369 00:21:35,860 --> 00:21:37,790 this collection of text here. 370 00:21:37,790 --> 00:21:42,850 And two of the more popular ones are going to be char and varchar. 371 00:21:42,850 --> 00:21:49,150 So char is a fixed width string, a string that only has and will ever 372 00:21:49,150 --> 00:21:52,660 have the same number of characters. 373 00:21:52,660 --> 00:21:55,840 You could think of, for example, in the US, 374 00:21:55,840 --> 00:22:01,540 we abbreviate state names using two letters, like MA from Massachusetts, 375 00:22:01,540 --> 00:22:03,220 or VA for Virginia. 376 00:22:03,220 --> 00:22:06,470 You might do the same thing with countries as well. 377 00:22:06,470 --> 00:22:10,480 Now, that would be useful for a char because this char will always 378 00:22:10,480 --> 00:22:13,940 hold only two characters. 379 00:22:13,940 --> 00:22:17,900 But often you're not sure how long the string is. 380 00:22:17,900 --> 00:22:19,610 You're going to add to your column. 381 00:22:19,610 --> 00:22:22,670 Like let's say you're trying to store names of people. 382 00:22:22,670 --> 00:22:24,350 Well, my name is Carter. 383 00:22:24,350 --> 00:22:27,210 But yours could be longer or shorter. 384 00:22:27,210 --> 00:22:30,140 So a varchar works well for that scenario, 385 00:22:30,140 --> 00:22:35,120 where a varchar can accept a variable length string up 386 00:22:35,120 --> 00:22:38,160 to some number of characters. 387 00:22:38,160 --> 00:22:41,030 And when you use these, MySQL will ask you 388 00:22:41,030 --> 00:22:46,410 to provide how much space this type should take up using M here. 389 00:22:46,410 --> 00:22:50,360 So I could say char parentheses 2 to get back 390 00:22:50,360 --> 00:22:54,140 a fixed width string of two characters. 391 00:22:54,140 --> 00:22:58,010 For varchar I could say varchar and put in the middle here like, let's say, 392 00:22:58,010 --> 00:23:00,620 16, or 32, or 64. 393 00:23:00,620 --> 00:23:04,160 And that would be the maximum size of the string 394 00:23:04,160 --> 00:23:07,730 that I could represent using varchar. 395 00:23:07,730 --> 00:23:12,500 So good here for strings that are going to be a bit shorter in length. 396 00:23:12,500 --> 00:23:15,290 What if though you want to store the text of a book 397 00:23:15,290 --> 00:23:17,430 or something a bit longer than that? 398 00:23:17,430 --> 00:23:21,320 Well, you could use another type, this one called text. 399 00:23:21,320 --> 00:23:24,500 And this is different than SQLite's text. 400 00:23:24,500 --> 00:23:28,970 In SQLite's text I could store short strings, long strings, et cetera. 401 00:23:28,970 --> 00:23:35,760 In MySQL text is good for longer chunks of text, paragraphs, pages of books, 402 00:23:35,760 --> 00:23:36,800 and so on. 403 00:23:36,800 --> 00:23:42,350 And you actually have some options for the length or the size of this text. 404 00:23:42,350 --> 00:23:46,430 You could use tiny text, medium text, or long text. 405 00:23:46,430 --> 00:23:51,530 Each just uses some variable amount of space on your computer to store, 406 00:23:51,530 --> 00:23:54,020 in this case, characters that are going to be 407 00:23:54,020 --> 00:23:56,840 longer than any particular string, like a name, 408 00:23:56,840 --> 00:24:00,910 and more like some page in a book. 409 00:24:00,910 --> 00:24:04,590 Now, besides text we do have our old friend blob. 410 00:24:04,590 --> 00:24:08,880 And blob still stores binary exactly as you give it. 411 00:24:08,880 --> 00:24:13,650 And more interestingly though, we also have enum and set. 412 00:24:13,650 --> 00:24:14,920 And these are brand new. 413 00:24:14,920 --> 00:24:16,620 These are not in SQLite. 414 00:24:16,620 --> 00:24:19,260 But they're very useful for us. 415 00:24:19,260 --> 00:24:24,480 If you remember back in our week on designing, we introduced constraints. 416 00:24:24,480 --> 00:24:30,240 Maybe a constraint is that this column should only have certain values. 417 00:24:30,240 --> 00:24:33,330 Like let's say if we're calculating t-shirt sizes, 418 00:24:33,330 --> 00:24:39,120 I should only have xs for extra small, s for small, m for medium. 419 00:24:39,120 --> 00:24:43,900 There's some fixed range of values this column could hold. 420 00:24:43,900 --> 00:24:48,000 Well, I could use enum for that, where enum 421 00:24:48,000 --> 00:24:53,910 lets me enumerate the possible options I could put into this column, 422 00:24:53,910 --> 00:24:59,970 like xs, s, m if I'm doing t-shirt sizes, or something else altogether. 423 00:24:59,970 --> 00:25:03,270 Now, set complements enum. 424 00:25:03,270 --> 00:25:07,320 Enum means I can only have one of these options. 425 00:25:07,320 --> 00:25:13,830 If I had xs, x, s, m and so on for t-shirt sizes, I can only choose one. 426 00:25:13,830 --> 00:25:19,680 But a set will allow me to choose more than one to store in that same cell. 427 00:25:19,680 --> 00:25:22,560 You could think of, for example, movie genres. 428 00:25:22,560 --> 00:25:26,160 A movie isn't just a horror, or a fiction. 429 00:25:26,160 --> 00:25:27,420 Or it isn't just comedy. 430 00:25:27,420 --> 00:25:29,200 It could be all of those together. 431 00:25:29,200 --> 00:25:33,960 And so set would allow you to choose not just one option from your list, 432 00:25:33,960 --> 00:25:39,580 but multiple and put it inside of that single cell there. 433 00:25:39,580 --> 00:25:43,380 So let's go back to our SQLite create table statement. 434 00:25:43,380 --> 00:25:45,780 And let's update this for MySQL. 435 00:25:45,780 --> 00:25:47,880 Well, the first thing you do is what? 436 00:25:47,880 --> 00:25:50,290 Change these double quotes to back ticks. 437 00:25:50,290 --> 00:25:52,420 So I'll do that here. 438 00:25:52,420 --> 00:25:54,780 And now, let's update this ID. 439 00:25:54,780 --> 00:26:00,420 We saw before we could use regular old int and auto increment, like this. 440 00:26:00,420 --> 00:26:02,640 But now we have some more choice. 441 00:26:02,640 --> 00:26:06,000 Like for this name field that should store 442 00:26:06,000 --> 00:26:10,980 the name of a station, let me actually ask you all, what do you think 443 00:26:10,980 --> 00:26:13,080 is the best type to use? 444 00:26:13,080 --> 00:26:16,410 I'm just wanting the name of some station. 445 00:26:16,410 --> 00:26:19,560 AUDIENCE: It would be a varchar simply because you don't know 446 00:26:19,560 --> 00:26:21,405 how long the station name actually is. 447 00:26:21,405 --> 00:26:22,780 CARTER ZENKE: Yeah, a good point. 448 00:26:22,780 --> 00:26:25,560 So we're always creating new stations. 449 00:26:25,560 --> 00:26:28,440 And maybe some station name is pretty long. 450 00:26:28,440 --> 00:26:29,550 Or maybe it's short. 451 00:26:29,550 --> 00:26:31,140 We can't really tell, right? 452 00:26:31,140 --> 00:26:35,850 So a varchar is good if we have some variable amount of characters 453 00:26:35,850 --> 00:26:38,460 inside the text we're trying to store. 454 00:26:38,460 --> 00:26:43,050 And notice how a station name isn't paragraphs of text. 455 00:26:43,050 --> 00:26:44,890 It's just maybe a few words. 456 00:26:44,890 --> 00:26:47,040 That's good for a varchar. 457 00:26:47,040 --> 00:26:50,220 And in general here I could say, well, let's try to use a varchar 458 00:26:50,220 --> 00:26:55,120 and say we could use up to 32 characters including spaces. 459 00:26:55,120 --> 00:26:59,460 I think that will cover us pretty well for names of stations. 460 00:26:59,460 --> 00:27:02,130 Now let's talk about the line. 461 00:27:02,130 --> 00:27:05,110 Now, in Boston, there are a few lines. 462 00:27:05,110 --> 00:27:10,740 There's the blue line, the green line, the red line, even the silver line. 463 00:27:10,740 --> 00:27:17,190 But what type should we use for storing lines then 464 00:27:17,190 --> 00:27:19,240 that these stations are a part of? 465 00:27:19,240 --> 00:27:21,480 AUDIENCE: In this case, I believe that there 466 00:27:21,480 --> 00:27:26,340 can be two options, where it would be a small integer number 467 00:27:26,340 --> 00:27:29,780 and point it to another table storing the colors. 468 00:27:29,780 --> 00:27:36,330 Another one will be limited char taking into account that perhaps the longest 469 00:27:36,330 --> 00:27:39,390 color name will be like aquamarine, or something like that, 470 00:27:39,390 --> 00:27:43,032 and truncate it just to that size will be another option as well. 471 00:27:43,032 --> 00:27:45,990 CARTER ZENKE: Yeah, I love your thinking through multiple options here. 472 00:27:45,990 --> 00:27:48,240 And indeed, there's usually more than one option 473 00:27:48,240 --> 00:27:50,710 for any given type of a table here. 474 00:27:50,710 --> 00:27:56,040 So for this one, I mean, you could actually have line be a foreign key 475 00:27:56,040 --> 00:28:00,250 and reference some other table, to your point, of those colors. 476 00:28:00,250 --> 00:28:02,740 So line might then be an integer. 477 00:28:02,740 --> 00:28:05,993 We could also make line another varchar. 478 00:28:05,993 --> 00:28:07,410 Maybe this one's a little shorter. 479 00:28:07,410 --> 00:28:10,900 Like it's always going to be just blue, green, red, or so on, 480 00:28:10,900 --> 00:28:14,610 so a pretty short string, but still variable length nonetheless. 481 00:28:14,610 --> 00:28:17,960 So you could say varchar maybe 9, for instance, 482 00:28:17,960 --> 00:28:19,840 if we're representing a color here. 483 00:28:19,840 --> 00:28:26,340 One other option though exists, one that kind of constrains us, 484 00:28:26,340 --> 00:28:30,210 but that might still be a good fit. 485 00:28:30,210 --> 00:28:32,168 AUDIENCE: Maybe it would be good to use a set. 486 00:28:32,168 --> 00:28:33,210 CARTER ZENKE: To use set. 487 00:28:33,210 --> 00:28:34,600 You could use set as well. 488 00:28:34,600 --> 00:28:38,910 So you could say that a station might be on the blue line and the green line 489 00:28:38,910 --> 00:28:40,290 as exists in the real world. 490 00:28:40,290 --> 00:28:41,940 You could certainly use set for that. 491 00:28:41,940 --> 00:28:46,770 For our purposes, I might just go ahead and simplify things and just say enum. 492 00:28:46,770 --> 00:28:50,880 A station will only be on one line at a time for simplicity. 493 00:28:50,880 --> 00:28:55,300 But if you wanted a station on more than one line, you could use set as well. 494 00:28:55,300 --> 00:28:59,670 And in this enum or in this set we could include the possible lines 495 00:28:59,670 --> 00:29:03,090 that currently exist in the MBTA. 496 00:29:03,090 --> 00:29:06,850 So great reasoning through some different data types here. 497 00:29:06,850 --> 00:29:10,590 Let's now build this into our MySQL server here. 498 00:29:10,590 --> 00:29:11,490 I'll come back. 499 00:29:11,490 --> 00:29:14,970 And I will create this table. 500 00:29:14,970 --> 00:29:18,600 So we saw before I could simply use create table. 501 00:29:18,600 --> 00:29:20,310 I could say create table. 502 00:29:20,310 --> 00:29:23,670 And then I could say give it a name called stations. 503 00:29:23,670 --> 00:29:27,180 And inside I'll make sure to include-- 504 00:29:27,180 --> 00:29:31,290 let me indent four spaces here, one, two, three, four-- 505 00:29:31,290 --> 00:29:37,030 give it an ID, which has the type integer and auto increment. 506 00:29:37,030 --> 00:29:41,280 Then afterwards, we also want to include the name of our station, which 507 00:29:41,280 --> 00:29:44,430 could be a varchar up to size 32. 508 00:29:44,430 --> 00:29:46,890 It could include up to 32 characters. 509 00:29:46,890 --> 00:29:50,760 And this should also be not null and unique. 510 00:29:50,760 --> 00:29:55,470 Those same kind of column constraints, they're still here in MySQL. 511 00:29:55,470 --> 00:29:59,100 Now I could also include the line column. 512 00:29:59,100 --> 00:30:04,920 And for simplicity, I could just assume every station is part of only one line. 513 00:30:04,920 --> 00:30:07,060 So I'll say enum here. 514 00:30:07,060 --> 00:30:12,420 And I'll choose blue, and green, and orange, and red. 515 00:30:12,420 --> 00:30:16,920 These are four lines for the subway that exist here in Boston. 516 00:30:16,920 --> 00:30:20,100 And I'll make sure this has to have a value. 517 00:30:20,100 --> 00:30:21,000 It can't be null. 518 00:30:21,000 --> 00:30:22,950 Put that not null constraint here. 519 00:30:22,950 --> 00:30:27,510 Then I'll say my primary key is the ID column. 520 00:30:27,510 --> 00:30:31,710 And I will go ahead and close this out, hit semicolon and Enter. 521 00:30:31,710 --> 00:30:33,990 And I'll see query OK. 522 00:30:33,990 --> 00:30:40,020 So now, if I show tables, Enter, I'll see both cards and stations. 523 00:30:40,020 --> 00:30:46,420 And if I were to describe stations, what do I see here? 524 00:30:46,420 --> 00:30:47,310 Let's take a look. 525 00:30:47,310 --> 00:30:48,570 I'll come back over here. 526 00:30:48,570 --> 00:30:53,280 And I'll see I have three columns, ID, name, and line. 527 00:30:53,280 --> 00:30:56,070 And each of these has their very own type, 528 00:30:56,070 --> 00:30:59,610 as I see in this column of my describes output. 529 00:30:59,610 --> 00:31:02,700 I see here that the type of ID is an integer. 530 00:31:02,700 --> 00:31:05,520 The type of name is a varchar 32. 531 00:31:05,520 --> 00:31:07,680 And the type of line is an enum. 532 00:31:07,680 --> 00:31:12,910 Blue, green, orange, or red are the possible values for that column. 533 00:31:12,910 --> 00:31:17,170 Now, none of these columns can have a null value inside of them. 534 00:31:17,170 --> 00:31:18,870 But here's the interesting part. 535 00:31:18,870 --> 00:31:22,650 I see ID has a key, PRI, primary. 536 00:31:22,650 --> 00:31:25,690 That means ID is my primary key. 537 00:31:25,690 --> 00:31:30,330 But I also see uni, where uni stands for unique. 538 00:31:30,330 --> 00:31:33,210 Now, we saw before this name column. 539 00:31:33,210 --> 00:31:37,350 I applied a unique constraint, meaning there can only 540 00:31:37,350 --> 00:31:40,360 be a given station with a unique name. 541 00:31:40,360 --> 00:31:42,870 I can't duplicate station names. 542 00:31:42,870 --> 00:31:47,400 And then of course, over here we also saw the extra column auto increment 543 00:31:47,400 --> 00:31:51,360 that I applied to the ID column here. 544 00:31:51,360 --> 00:31:52,580 So let me ask. 545 00:31:52,580 --> 00:31:58,850 We've seen now some new types for text, some new types for enum, and set, 546 00:31:58,850 --> 00:31:59,720 and so on. 547 00:31:59,720 --> 00:32:04,850 What questions do we have on these new types and how we've used them? 548 00:32:04,850 --> 00:32:08,360 AUDIENCE: I was wondering if it would be perhaps possible to include 549 00:32:08,360 --> 00:32:10,700 a table as the argument to enum? 550 00:32:10,700 --> 00:32:12,420 CARTER ZENKE: Oh, a good question. 551 00:32:12,420 --> 00:32:14,480 I actually am not sure. 552 00:32:14,480 --> 00:32:21,390 I could imagine the possibility of using a nested select for that. 553 00:32:21,390 --> 00:32:24,290 But I mean, let's reason through this here. 554 00:32:24,290 --> 00:32:28,040 If I had a table that was growing, would I 555 00:32:28,040 --> 00:32:33,740 want to use those table results in my create table statement? 556 00:32:33,740 --> 00:32:37,680 I would argue probably not because then things are going to change over time. 557 00:32:37,680 --> 00:32:42,590 So probably worth just enumerating one at a time explicitly 558 00:32:42,590 --> 00:32:44,720 what might go into that enum. 559 00:32:44,720 --> 00:32:48,810 But I encourage you to try that out too and see what happens. 560 00:32:48,810 --> 00:32:49,738 Let's take one more. 561 00:32:49,738 --> 00:32:51,530 AUDIENCE: Here's the problem for a database 562 00:32:51,530 --> 00:32:57,080 if I, for example, I have no idea how long could be the name of the station. 563 00:32:57,080 --> 00:33:01,623 And if I write a varchar, for example, 100 or 300. 564 00:33:01,623 --> 00:33:04,040 CARTER ZENKE: Yeah, so maybe you're thinking of a scenario 565 00:33:04,040 --> 00:33:06,630 where you don't know how long it's going to be. 566 00:33:06,630 --> 00:33:08,240 So you'd be kind of extra safe. 567 00:33:08,240 --> 00:33:10,663 You'd say like varchar 300, for instance, 568 00:33:10,663 --> 00:33:12,830 just to accommodate anything you could put in there. 569 00:33:12,830 --> 00:33:14,510 That is OK. 570 00:33:14,510 --> 00:33:19,010 But the trade off is you're then using up to 300 bytes 571 00:33:19,010 --> 00:33:21,890 every time you store a smaller string. 572 00:33:21,890 --> 00:33:25,430 Like if I use varchar 300 and I only ever use, 573 00:33:25,430 --> 00:33:28,490 let's say, like 10 characters on my longest string, well, 574 00:33:28,490 --> 00:33:29,840 I'm wasting some space. 575 00:33:29,840 --> 00:33:32,270 So it's worthwhile to think through beforehand, 576 00:33:32,270 --> 00:33:34,340 what should I actually use here? 577 00:33:34,340 --> 00:33:37,700 And something to keep in mind is you can always 578 00:33:37,700 --> 00:33:41,970 expand a varchar using MySQL's alter table statement. 579 00:33:41,970 --> 00:33:44,630 So we'll see that in just a bit, how to use alter table. 580 00:33:44,630 --> 00:33:48,460 But you could always expand if you need to. 581 00:33:48,460 --> 00:33:49,760 OK, good questions here. 582 00:33:49,760 --> 00:33:51,080 Let's keep going then. 583 00:33:51,080 --> 00:33:55,900 And our next table is going to be the table that represents swipes. 584 00:33:55,900 --> 00:33:58,960 We have cards and stations. 585 00:33:58,960 --> 00:34:03,310 But now we have to represent swipes at those stations. 586 00:34:03,310 --> 00:34:09,550 Well, let's come back to our schema for swipes in SQLite. 587 00:34:09,550 --> 00:34:12,880 Here, I had this table called swipes that 588 00:34:12,880 --> 00:34:16,659 had some columns up top and some primary foreign keys down below. 589 00:34:16,659 --> 00:34:21,909 But the important part for us now is going to be these three columns, type, 590 00:34:21,909 --> 00:34:23,889 date time, and amount. 591 00:34:23,889 --> 00:34:29,080 And remember that on the MBTA, I can take my CharlieCard. 592 00:34:29,080 --> 00:34:35,020 I can make a swipe when I enter, exit, or deposit some money at a station. 593 00:34:35,020 --> 00:34:36,850 That's that type column here. 594 00:34:36,850 --> 00:34:41,170 It could be either enter, exit, or deposit. 595 00:34:41,170 --> 00:34:45,610 Now, date time is the time and date I swipe that card 596 00:34:45,610 --> 00:34:48,790 to enter, exit, or deposit some funds. 597 00:34:48,790 --> 00:34:55,179 And amount is the amount in dollars associated with that transaction. 598 00:34:55,179 --> 00:34:58,060 Now, the fare nowadays is like $2.40. 599 00:34:58,060 --> 00:35:00,790 Back in the '50s it was maybe like a dime or so. 600 00:35:00,790 --> 00:35:02,090 It changes over time. 601 00:35:02,090 --> 00:35:05,660 So it's worth keeping track of in our table here. 602 00:35:05,660 --> 00:35:09,670 So let's think through some options that MySQL gives us. 603 00:35:09,670 --> 00:35:12,890 Among them are options for dates and time. 604 00:35:12,890 --> 00:35:17,410 So whereas in SQLite we had numeric for dates and times, 605 00:35:17,410 --> 00:35:21,820 in MySQL we actually have explicit date and time types. 606 00:35:21,820 --> 00:35:24,520 For instance, we have all these here, date, 607 00:35:24,520 --> 00:35:28,640 which is just a date, like the year, the month, the day, time, 608 00:35:28,640 --> 00:35:32,740 which is the actual time, like hours, minutes, and seconds. 609 00:35:32,740 --> 00:35:36,970 We then have date time, which combines those, timestamp, 610 00:35:36,970 --> 00:35:40,810 which is similar but used with a bit more precision often for logging 611 00:35:40,810 --> 00:35:45,670 events on your own software, and then year, which is just 2023, 612 00:35:45,670 --> 00:35:49,830 or 2020 for instance, in a single column. 613 00:35:49,830 --> 00:35:53,820 Now, you can get more particular about how precise you 614 00:35:53,820 --> 00:35:55,950 want to be with these times. 615 00:35:55,950 --> 00:36:00,810 For time, date time, and timestamp, I can 616 00:36:00,810 --> 00:36:04,740 specify how many decimal digits I want to keep 617 00:36:04,740 --> 00:36:07,830 track of after the decimal point. 618 00:36:07,830 --> 00:36:11,850 Do I want to keep track of tenths of seconds, hundredths, or thousandths? 619 00:36:11,850 --> 00:36:16,980 I could specify that using this (fsp). 620 00:36:16,980 --> 00:36:20,750 In this case, I could say one, two, three, four, up to I 621 00:36:20,750 --> 00:36:24,300 believe six to figure out how precisely do I 622 00:36:24,300 --> 00:36:29,830 want to keep track of time, where more digits means more precision. 623 00:36:29,830 --> 00:36:32,820 So I could use these for dates and times. 624 00:36:32,820 --> 00:36:37,080 That could get us through part of our table migration here. 625 00:36:37,080 --> 00:36:41,760 But I should also think about how to represent real numbers, that is 626 00:36:41,760 --> 00:36:44,800 numbers with a decimal point in them. 627 00:36:44,800 --> 00:36:51,690 Now, you might think back to SQLite and remember the real data type. 628 00:36:51,690 --> 00:36:54,180 And we have some similar options here. 629 00:36:54,180 --> 00:36:59,250 We have float and double precision, where the only difference is 630 00:36:59,250 --> 00:37:04,590 float stores a number in size 4 bytes. 631 00:37:04,590 --> 00:37:10,060 Whereas, double precision stores that number in 8 bytes. 632 00:37:10,060 --> 00:37:13,170 And you might be thinking, well, if they both store 633 00:37:13,170 --> 00:37:19,440 a decimal number, a real number, why would I choose one over the other? 634 00:37:19,440 --> 00:37:23,970 Or why would I use more space when float does the same thing for me 635 00:37:23,970 --> 00:37:26,070 in fewer bytes? 636 00:37:26,070 --> 00:37:28,980 Well, it turns out that in computer science, 637 00:37:28,980 --> 00:37:34,380 we run into this problem of floating point imprecision, where there 638 00:37:34,380 --> 00:37:37,050 are an infinite number of real numbers. 639 00:37:37,050 --> 00:37:42,870 Like if I asked you to find me all of the decimal numbers between 1 and 2, 640 00:37:42,870 --> 00:37:45,480 you would be here for literally an eternity. 641 00:37:45,480 --> 00:37:50,940 And if we have so many numbers to represent, but so few bits, 642 00:37:50,940 --> 00:37:54,190 it goes to say that we can only represent some of them 643 00:37:54,190 --> 00:37:55,840 and not all of them. 644 00:37:55,840 --> 00:38:00,960 So at the end of the day, floating point numbers, decimals, real numbers, 645 00:38:00,960 --> 00:38:04,110 will be imprecisely represented. 646 00:38:04,110 --> 00:38:09,600 Now, a double precision could get us more of the way there. 647 00:38:09,600 --> 00:38:12,750 We could use more bits to more precisely represent 648 00:38:12,750 --> 00:38:16,380 some number, which you might use if you're a scientist, someone who wants 649 00:38:16,380 --> 00:38:18,960 to keep track of things very precisely. 650 00:38:18,960 --> 00:38:22,590 If you don't care as much though, you could use a float 651 00:38:22,590 --> 00:38:26,160 and just use a little bit less space. 652 00:38:26,160 --> 00:38:32,940 Now thankfully, this trade off isn't quite so salient in MySQL 653 00:38:32,940 --> 00:38:38,400 because you also get access to fixed precision types, where 654 00:38:38,400 --> 00:38:44,340 fixed precision means I can specify I want to always have two places 655 00:38:44,340 --> 00:38:47,670 after the decimal precisely represented. 656 00:38:47,670 --> 00:38:52,440 And one type for this will be the decimal type, where I can use a decimal 657 00:38:52,440 --> 00:38:57,600 and then say, I want to store a certain number of digits 658 00:38:57,600 --> 00:39:01,540 with some number coming after the decimal point. 659 00:39:01,540 --> 00:39:05,610 Now, for example, I could say decimal 5 comma 1. 660 00:39:05,610 --> 00:39:14,760 And I could store any number between negative 999.99 up to 999.99 661 00:39:14,760 --> 00:39:21,450 so long as there are only two places after the decimal here. 662 00:39:21,450 --> 00:39:22,810 Now, I could increase this. 663 00:39:22,810 --> 00:39:25,350 I could say I want to go beyond this range. 664 00:39:25,350 --> 00:39:26,970 I want to have more digits. 665 00:39:26,970 --> 00:39:32,220 I could increase this first parameter here from five, let's say, to six. 666 00:39:32,220 --> 00:39:34,530 And now I have a wider range. 667 00:39:34,530 --> 00:39:42,640 I have 9,999 all the way down to negative 9,999, in both cases 668 00:39:42,640 --> 00:39:48,810 0.992 decimal digits after this decimal point. 669 00:39:48,810 --> 00:39:49,830 Now, I can keep going. 670 00:39:49,830 --> 00:39:52,650 I could say 7 and get an even bigger range. 671 00:39:52,650 --> 00:39:56,820 But if I want to present something precisely after the decimal point, what 672 00:39:56,820 --> 00:39:57,940 could I do? 673 00:39:57,940 --> 00:40:03,960 I could change this one too from 2, let's say, to 3 or even to 4. 674 00:40:03,960 --> 00:40:05,040 I could keep going. 675 00:40:05,040 --> 00:40:08,220 But notice how my range shrinks. 676 00:40:08,220 --> 00:40:12,510 What I get in return is just more digits after that decimal point 677 00:40:12,510 --> 00:40:14,280 precisely represented. 678 00:40:14,280 --> 00:40:17,040 I don't have to worry about floating point imprecision, 679 00:40:17,040 --> 00:40:21,840 so long as I'm using decimal and telling MySQL how many digits I want 680 00:40:21,840 --> 00:40:25,540 and how many come after the decimal point. 681 00:40:25,540 --> 00:40:28,090 OK, so let's try this here. 682 00:40:28,090 --> 00:40:32,500 I want to migrate this table from SQLite to MySQL. 683 00:40:32,500 --> 00:40:34,960 The first thing to do, change these quotes 684 00:40:34,960 --> 00:40:37,360 from double quotes to back ticks. 685 00:40:37,360 --> 00:40:40,900 I could then update my primary key, and so on. 686 00:40:40,900 --> 00:40:43,390 But let's focus now on this type. 687 00:40:43,390 --> 00:40:48,160 This type column should only have some set of values, 688 00:40:48,160 --> 00:40:51,160 like enter, exit, or deposit. 689 00:40:51,160 --> 00:40:53,650 And when you see that, you could think I should probably 690 00:40:53,650 --> 00:40:58,570 use an enum to only store some data types, 691 00:40:58,570 --> 00:41:00,970 some values that are inside of this list. 692 00:41:00,970 --> 00:41:03,460 So I'll make this now an enum. 693 00:41:03,460 --> 00:41:08,150 But the question then becomes, what should I use for date time? 694 00:41:08,150 --> 00:41:12,980 What type have we seen so far that could be useful for storing the date 695 00:41:12,980 --> 00:41:16,410 and time I swiped my CharlieCard? 696 00:41:16,410 --> 00:41:19,790 AUDIENCE: Yes, I believe we can use that date time data type. 697 00:41:19,790 --> 00:41:22,820 CARTER ZENKE: I could use date time, so kind of self-explanatory here. 698 00:41:22,820 --> 00:41:24,920 I mean, the column is called date time. 699 00:41:24,920 --> 00:41:27,710 And coincidentally, MySQL also has a type 700 00:41:27,710 --> 00:41:33,140 called date time that stores both a date year, year, year, mm, dd, 701 00:41:33,140 --> 00:41:34,350 and then the hour. 702 00:41:34,350 --> 00:41:40,070 Along with that, I could say date time here to get both day and time. 703 00:41:40,070 --> 00:41:42,920 Now, the next one, amount. 704 00:41:42,920 --> 00:41:47,840 Now, it's worth noting I want to be precise with this amount. 705 00:41:47,840 --> 00:41:51,590 Like if I'm going to sum up many, many amounts, many fares, 706 00:41:51,590 --> 00:41:53,760 I want it to be accurate. 707 00:41:53,760 --> 00:41:58,070 So I don't want to deal with, let's say, float or double precision. 708 00:41:58,070 --> 00:42:01,370 I want to specify how precise I should be. 709 00:42:01,370 --> 00:42:07,610 Let me ask, what data type might be good for storing the amount of some given 710 00:42:07,610 --> 00:42:08,960 fare? 711 00:42:08,960 --> 00:42:11,190 AUDIENCE: I believe we can use the decimal data type. 712 00:42:11,190 --> 00:42:14,190 CARTER ZENKE: Yeah, you could absolutely use the decimal data type here. 713 00:42:14,190 --> 00:42:19,050 So decimal was part of MySQL's fixed precision kinds of types. 714 00:42:19,050 --> 00:42:23,880 So we could use that here to precisely represent the amount column here. 715 00:42:23,880 --> 00:42:28,040 I could change this from numeric in SQLite to decimal now, 716 00:42:28,040 --> 00:42:33,980 where decimal I could say takes five possible digits, like five total. 717 00:42:33,980 --> 00:42:37,350 And two now could come after the decimal point. 718 00:42:37,350 --> 00:42:40,610 So I could have amounts or fares that go up 719 00:42:40,610 --> 00:42:43,370 to hundreds of dollars, which often isn't going to happen. 720 00:42:43,370 --> 00:42:47,290 But I could be safe here if I need to be. 721 00:42:47,290 --> 00:42:51,980 OK, let's go ahead and combine this and put this inside of our MySQL server 722 00:42:51,980 --> 00:42:52,480 here. 723 00:42:52,480 --> 00:42:54,680 I'll come back to my computer here. 724 00:42:54,680 --> 00:42:58,150 And my goal is to run this create table statement to create 725 00:42:58,150 --> 00:43:00,250 this table on our server. 726 00:43:00,250 --> 00:43:04,840 I'll say create a table called swipes, like this. 727 00:43:04,840 --> 00:43:09,490 And I'll include a few columns we didn't show on the slides just for simplicity. 728 00:43:09,490 --> 00:43:14,050 All first include the ID column with type int. 729 00:43:14,050 --> 00:43:18,580 And I also want to make sure this auto increments as I add to it. 730 00:43:18,580 --> 00:43:20,080 I'll then go on. 731 00:43:20,080 --> 00:43:25,450 And I will also include a card ID column that has a type int. 732 00:43:25,450 --> 00:43:30,620 And then I'll include a station ID column, which also has type int. 733 00:43:30,620 --> 00:43:36,280 So now, I can see which card ID is swiped at which station. 734 00:43:36,280 --> 00:43:42,090 Now, when they swipe here at the MBTA, swipe can mean a few different things. 735 00:43:42,090 --> 00:43:43,540 You can have a different type. 736 00:43:43,540 --> 00:43:47,620 One type we might care about is the enter type, 737 00:43:47,620 --> 00:43:50,470 when I swipe to enter some station. 738 00:43:50,470 --> 00:43:55,990 I could also swipe to exit some station or to deposit 739 00:43:55,990 --> 00:43:58,600 some funds into my account. 740 00:43:58,600 --> 00:44:02,000 And a type should never be null. 741 00:44:02,000 --> 00:44:05,240 It has to be not null constraint here. 742 00:44:05,240 --> 00:44:09,990 Now, a swipe happens at some particular date and time. 743 00:44:09,990 --> 00:44:16,730 So I'll include the date time column and say that this is of type date time. 744 00:44:16,730 --> 00:44:20,610 I want to make sure this column always has some value. 745 00:44:20,610 --> 00:44:21,740 So it's not null. 746 00:44:21,740 --> 00:44:24,710 And the default when I insert some new row 747 00:44:24,710 --> 00:44:31,350 will be this special value here, current timestamp, the current date and time. 748 00:44:31,350 --> 00:44:32,600 Now I'll go on. 749 00:44:32,600 --> 00:44:35,450 And I'll include the amount of some swipe. 750 00:44:35,450 --> 00:44:37,470 I'll hit space. 751 00:44:37,470 --> 00:44:39,260 And then I'll say amount here. 752 00:44:39,260 --> 00:44:42,200 This is the amount associated with that swipe. 753 00:44:42,200 --> 00:44:43,370 Was I charged money? 754 00:44:43,370 --> 00:44:45,380 Was I adding money? 755 00:44:45,380 --> 00:44:48,530 Here I'll say I want this to be a decimal type because you 756 00:44:48,530 --> 00:44:50,480 said we want this to be precise. 757 00:44:50,480 --> 00:44:52,985 And decimal is a fixed precision type. 758 00:44:52,985 --> 00:44:59,010 I'll say I want five decimal digits with two after the decimal point. 759 00:44:59,010 --> 00:45:01,910 Now, decimal should be not null. 760 00:45:01,910 --> 00:45:04,310 And I'll include our same check constraint. 761 00:45:04,310 --> 00:45:09,980 I'll check that amount with back ticks here does not equal 0. 762 00:45:09,980 --> 00:45:13,910 And if that's the case, I'll be able to insert some new swipe. 763 00:45:13,910 --> 00:45:19,130 Now, I'll assign my table constraints, like primary keys and foreign keys. 764 00:45:19,130 --> 00:45:25,940 I'll say my primary key is the ID column, like this. 765 00:45:25,940 --> 00:45:34,010 And then if I hit Enter again, my foreign key, in this case card ID, 766 00:45:34,010 --> 00:45:40,550 should reference, it references cards table and the ID column 767 00:45:40,550 --> 00:45:42,860 in cards, like this. 768 00:45:42,860 --> 00:45:48,020 So that card ID column references the ID column in the cards table. 769 00:45:48,020 --> 00:45:50,760 Now, I'll include my last foreign key. 770 00:45:50,760 --> 00:45:58,070 The foreign key station ID references the station's table 771 00:45:58,070 --> 00:46:01,550 and the ID column in stations. 772 00:46:01,550 --> 00:46:05,310 And now to quickly review to make sure I typed everything in correctly here, 773 00:46:05,310 --> 00:46:10,340 so we have the ID column, the card ID column, station ID, type of enum, 774 00:46:10,340 --> 00:46:11,775 date time, amount. 775 00:46:11,775 --> 00:46:12,900 I think we'll be good here. 776 00:46:12,900 --> 00:46:17,300 So if I hit Enter and then use parentheses semicolon, 777 00:46:17,300 --> 00:46:21,590 I should see query OK zero rows affected. 778 00:46:21,590 --> 00:46:25,820 Now, if I type, in this case, show tables Enter, 779 00:46:25,820 --> 00:46:28,230 I'll see I have a new swipes table. 780 00:46:28,230 --> 00:46:33,480 And if I type describe swipes, like this, 781 00:46:33,480 --> 00:46:36,740 I should see some familiar output. 782 00:46:36,740 --> 00:46:41,240 If I go back to my screen over here, we can take a look. 783 00:46:41,240 --> 00:46:47,240 We have again, our fields, ID, card ID, and so on, types, int, enum, 784 00:46:47,240 --> 00:46:48,620 date time, decimal. 785 00:46:48,620 --> 00:46:51,650 Some can have null values, some should not. 786 00:46:51,650 --> 00:46:53,660 We have a primary key here. 787 00:46:53,660 --> 00:46:58,520 But now the interesting thing is this M-U-L. Well, 788 00:46:58,520 --> 00:47:06,590 if I look at card ID and station ID, what do we call those two columns? 789 00:47:06,590 --> 00:47:11,750 What kind of constraints did I apply to card ID and station ID? 790 00:47:11,750 --> 00:47:15,930 AUDIENCE: So we uses the foreign key constraints, the foreign key. 791 00:47:15,930 --> 00:47:18,692 So it's referencing the station and the card [INAUDIBLE].. 792 00:47:18,692 --> 00:47:20,150 CARTER ZENKE: You're exactly right. 793 00:47:20,150 --> 00:47:25,380 We applied the foreign key constraint to the card ID and station ID columns. 794 00:47:25,380 --> 00:47:30,410 So this M-U-L here is referring to the fact that these are foreign keys. 795 00:47:30,410 --> 00:47:33,560 And M-U-L stands for something like multiple or multi. 796 00:47:33,560 --> 00:47:39,410 It means I could have, in this case, multiple of the same value 797 00:47:39,410 --> 00:47:43,080 because it is a foreign key in this table. 798 00:47:43,080 --> 00:47:46,280 Now, the default here we now see current timestamp. 799 00:47:46,280 --> 00:47:49,700 When I insert some value into this table, 800 00:47:49,700 --> 00:47:54,140 insert some rows, sorry, into this table and don't supply a value for the date 801 00:47:54,140 --> 00:47:57,440 time column, I'll instead make that default value 802 00:47:57,440 --> 00:47:59,600 the current date and the current time. 803 00:47:59,600 --> 00:48:03,680 And here we see some extras, like auto increment that we saw before, 804 00:48:03,680 --> 00:48:07,850 and now default generated, meaning this is a generated column that 805 00:48:07,850 --> 00:48:13,670 will automatically create some value for me when I insert some new row. 806 00:48:13,670 --> 00:48:19,520 OK, so let me ask, what questions do we have then on these new types, dates 807 00:48:19,520 --> 00:48:22,190 and times, and more? 808 00:48:22,190 --> 00:48:25,130 AUDIENCE: Yes, I wanted to know the precedence 809 00:48:25,130 --> 00:48:32,730 in which the attributes, the specific constraints, the attributes are added. 810 00:48:32,730 --> 00:48:37,520 So is it in the format of the describe table? 811 00:48:37,520 --> 00:48:39,860 In the describe table, there's a setting format 812 00:48:39,860 --> 00:48:44,390 in which the data is the priority in which you do it? 813 00:48:44,390 --> 00:48:46,520 Let's say an integer before. 814 00:48:46,520 --> 00:48:49,760 Let's say a unique constraint before. 815 00:48:49,760 --> 00:48:50,720 And not null. 816 00:48:50,720 --> 00:48:52,070 What's the precedent? 817 00:48:52,070 --> 00:48:57,003 And secondly, is there any type affinity in MySQL? 818 00:48:57,003 --> 00:48:58,670 CARTER ZENKE: Yeah, both good questions. 819 00:48:58,670 --> 00:49:00,540 I'll answer the first one first. 820 00:49:00,540 --> 00:49:02,900 So the first one is, what's the precedence, 821 00:49:02,900 --> 00:49:05,330 I believe, of these constraints we're applying? 822 00:49:05,330 --> 00:49:09,350 If I look at this output of describe swipes here, 823 00:49:09,350 --> 00:49:11,540 I see that I applied some constraints. 824 00:49:11,540 --> 00:49:15,470 Among them are the foreign key constraint, the primary key constraint. 825 00:49:15,470 --> 00:49:20,000 Some I made sure that I couldn't have some null value in here. 826 00:49:20,000 --> 00:49:24,690 In effect, they all combine to work for me here. 827 00:49:24,690 --> 00:49:26,930 So when I made that create table statement, 828 00:49:26,930 --> 00:49:31,700 I applied multiple constraints, like not null and unique. 829 00:49:31,700 --> 00:49:34,190 And I could put those in any order. 830 00:49:34,190 --> 00:49:36,830 MySQL is designed so I can do it in any order 831 00:49:36,830 --> 00:49:39,860 and get the same result forwards or backwards. 832 00:49:39,860 --> 00:49:44,870 When I read this table, I might just read it left to right here and kind of 833 00:49:44,870 --> 00:49:49,370 get a feel for which constraints are being applied in no particular order, 834 00:49:49,370 --> 00:49:50,810 in this case. 835 00:49:50,810 --> 00:49:55,190 AUDIENCE: Yeah, the second question has to do with type affinity. 836 00:49:55,190 --> 00:49:58,590 With SQLite we had something like the type 837 00:49:58,590 --> 00:50:03,510 affinity where a particular value is appropriate for a particular role. 838 00:50:03,510 --> 00:50:06,410 So is there anything like that in MySQL? 839 00:50:06,410 --> 00:50:07,590 CARTER ZENKE: Yes, there is. 840 00:50:07,590 --> 00:50:13,100 So in SQLite, you're right, we had type affinities, where we defined a table 841 00:50:13,100 --> 00:50:17,840 and said that this column should store values of this type. 842 00:50:17,840 --> 00:50:23,090 We're doing the very same thing in MySQL when we say ID of type int 843 00:50:23,090 --> 00:50:28,280 or the name of a station of type varchar 16, for instance. 844 00:50:28,280 --> 00:50:31,790 One difference though is that SQLite will 845 00:50:31,790 --> 00:50:36,440 allow you to add in some value that doesn't actually 846 00:50:36,440 --> 00:50:38,940 have the type of that given column. 847 00:50:38,940 --> 00:50:46,340 So if I had a SQLite column of type text or of type, let's say, type numeric 848 00:50:46,340 --> 00:50:49,490 and I added some text, it would try to convert. 849 00:50:49,490 --> 00:50:52,040 But if it couldn't, it would actually just store text. 850 00:50:52,040 --> 00:50:54,230 MySQL is more strict than that. 851 00:50:54,230 --> 00:50:56,930 It will make sure you have the proper type when 852 00:50:56,930 --> 00:51:04,650 you insert some new rows, at least by default. OK, great questions here. 853 00:51:04,650 --> 00:51:08,780 Let's keep going and show some more features of MySQL. 854 00:51:08,780 --> 00:51:15,450 So one of them is the ability to alter tables in a more fundamental way. 855 00:51:15,450 --> 00:51:16,880 So let me zoom back in here. 856 00:51:16,880 --> 00:51:20,960 In SQLite we had some options to alter tables. 857 00:51:20,960 --> 00:51:24,380 But at the end of the day, they just weren't all that powerful. 858 00:51:24,380 --> 00:51:30,440 With MySQL, though, we're actually able to alter tables a little bit 859 00:51:30,440 --> 00:51:31,460 more fundamentally. 860 00:51:31,460 --> 00:51:37,520 So let's say that the MBTA adds in a new line, a new subway line. 861 00:51:37,520 --> 00:51:40,400 And actually, the silver line, if you can see it here, 862 00:51:40,400 --> 00:51:42,440 is currently a bus line. 863 00:51:42,440 --> 00:51:46,410 But it was planned to be a subway line. 864 00:51:46,410 --> 00:51:51,650 So if I wanted to add in a subway line called the silver line, 865 00:51:51,650 --> 00:51:58,160 well, I can't currently do that because my type for the lines is an enum. 866 00:51:58,160 --> 00:52:03,530 It only has red, green, blue, and orange, not silver. 867 00:52:03,530 --> 00:52:06,830 So I should actually edit this create table 868 00:52:06,830 --> 00:52:09,710 to now be able to include the silver line 869 00:52:09,710 --> 00:52:13,400 as a possible value among that enum. 870 00:52:13,400 --> 00:52:15,080 So I'll come back over here. 871 00:52:15,080 --> 00:52:19,440 And I'll introduce you to MySQL's alter table syntax. 872 00:52:19,440 --> 00:52:24,050 So in MySQL we have still alter table, similar to SQLite. 873 00:52:24,050 --> 00:52:26,250 They're all based on the SQL standard. 874 00:52:26,250 --> 00:52:31,550 But now we have modify, where modify can take some column definition 875 00:52:31,550 --> 00:52:34,080 and change it on the fly. 876 00:52:34,080 --> 00:52:36,380 So I'll go to MySQL here. 877 00:52:36,380 --> 00:52:41,240 And let me describe again the stations table. 878 00:52:41,240 --> 00:52:42,410 I'll hit Enter now. 879 00:52:42,410 --> 00:52:45,950 And we'll see that the line is an enum. 880 00:52:45,950 --> 00:52:50,090 And it has a few options, blue, green, orange, and red. 881 00:52:50,090 --> 00:52:52,860 But now I want to add silver here. 882 00:52:52,860 --> 00:52:54,540 So what could I do? 883 00:52:54,540 --> 00:52:55,920 I could do this. 884 00:52:55,920 --> 00:52:57,080 I could use alter table. 885 00:52:57,080 --> 00:53:00,440 And I could say, I want to alter table. 886 00:53:00,440 --> 00:53:03,470 I want to alter the stations table. 887 00:53:03,470 --> 00:53:05,730 And I want to modify. 888 00:53:05,730 --> 00:53:10,370 I want to modify the line column to be this. 889 00:53:10,370 --> 00:53:19,720 I want it to be an enum with these options, blue, green, orange, red, 890 00:53:19,720 --> 00:53:23,450 and silver, like this. 891 00:53:23,450 --> 00:53:26,170 Now I have more options in my enum. 892 00:53:26,170 --> 00:53:30,040 I could also include not null too, hit Enter. 893 00:53:30,040 --> 00:53:31,390 And I should see-- 894 00:53:31,390 --> 00:53:35,493 oops, I think I might have forgotten a quote somewhere. 895 00:53:35,493 --> 00:53:36,160 I believe I did. 896 00:53:36,160 --> 00:53:36,850 Let me fix this. 897 00:53:36,850 --> 00:53:39,905 I will finish the quote, hit semicolon. 898 00:53:39,905 --> 00:53:41,530 And let me show you how I noticed that. 899 00:53:41,530 --> 00:53:45,880 If I go back to my screen here, see in MySQL, it's kind of helpful. 900 00:53:45,880 --> 00:53:50,500 If I finish a line and I haven't closed this quote here, like I did in red, 901 00:53:50,500 --> 00:53:55,420 it will alert me on this next line, with this single quote here saying, hey, 902 00:53:55,420 --> 00:53:58,420 you didn't close that thing that you probably thought you were going to. 903 00:53:58,420 --> 00:54:00,580 So let me restart. 904 00:54:00,580 --> 00:54:01,900 I'll come back over here. 905 00:54:01,900 --> 00:54:04,250 And I will try this again. 906 00:54:04,250 --> 00:54:06,790 I will alter table stations. 907 00:54:06,790 --> 00:54:10,760 And then I will modify the line column, as we said before, 908 00:54:10,760 --> 00:54:13,270 to be an enum with a few different values, 909 00:54:13,270 --> 00:54:20,350 blue, green, orange, red, closing both quotes now, and then silver, 910 00:54:20,350 --> 00:54:21,760 again closing both quotes. 911 00:54:21,760 --> 00:54:23,140 And let me double check. 912 00:54:23,140 --> 00:54:24,340 I believe we're all set. 913 00:54:24,340 --> 00:54:27,910 I'll then say not null, semicolon, Enter. 914 00:54:27,910 --> 00:54:30,550 And I'll see the query is OK. 915 00:54:30,550 --> 00:54:37,960 Now if I do describe stations, like this, semicolon, 916 00:54:37,960 --> 00:54:42,220 and zoom out just a bit, I see I've added to my possible values 917 00:54:42,220 --> 00:54:43,000 for this enum. 918 00:54:43,000 --> 00:54:47,020 I have blue, green, orange, red, and silver. 919 00:54:47,020 --> 00:54:52,930 So this is one example of using alter table to adjust a column definition. 920 00:54:52,930 --> 00:54:57,250 To an earlier question, you could also use this to expand the size of varchars 921 00:54:57,250 --> 00:54:59,410 or to change your table all together. 922 00:54:59,410 --> 00:55:02,050 You can do so a bit more powerfully than you can 923 00:55:02,050 --> 00:55:07,740 with SQLite, just because MySQL decided to focus on this particular feature. 924 00:55:07,740 --> 00:55:14,210 So let me ask now what questions we have on altering tables in MySQL. 925 00:55:14,210 --> 00:55:18,530 AUDIENCE: This is generally regarding to MySQL and Postgres. 926 00:55:18,530 --> 00:55:21,470 Can we use those applications instead of v score 927 00:55:21,470 --> 00:55:23,700 [INAUDIBLE] that user interface? 928 00:55:23,700 --> 00:55:25,200 CARTER ZENKE: Yeah, a good question. 929 00:55:25,200 --> 00:55:27,590 So I think you're asking, what would be the difference, 930 00:55:27,590 --> 00:55:32,760 or how could I use MySQL and Postgres compared with VS Code? 931 00:55:32,760 --> 00:55:35,040 In effect, they're kind of different use cases. 932 00:55:35,040 --> 00:55:39,210 So VS Code, as you might know, is an IDE, Integrated Development 933 00:55:39,210 --> 00:55:39,710 Environment. 934 00:55:39,710 --> 00:55:43,160 It allows you to write code in a file, to run that code, 935 00:55:43,160 --> 00:55:45,590 and to edit it and see it all in one place. 936 00:55:45,590 --> 00:55:48,740 Whereas MySQL and PostgreSQL, those allow 937 00:55:48,740 --> 00:55:51,710 you to write SQL statements on a command line, 938 00:55:51,710 --> 00:55:54,090 talking to some underlying server. 939 00:55:54,090 --> 00:55:55,710 So a bit different. 940 00:55:55,710 --> 00:56:02,450 You could, for example, use VS Code and connect to a MySQL or PostgreSQL server 941 00:56:02,450 --> 00:56:06,860 to run your SQL queries you're creating in VS Code. 942 00:56:06,860 --> 00:56:10,600 Good question there for workflows. 943 00:56:10,600 --> 00:56:13,690 OK, so what we'll do is take a quick break here. 944 00:56:13,690 --> 00:56:16,660 And we'll come back and talk about stored procedures, 945 00:56:16,660 --> 00:56:21,730 some functions we can define to run over, and over, and over again. 946 00:56:21,730 --> 00:56:24,180 We'll see you in a few. 947 00:56:24,180 --> 00:56:25,830 OK, and we're back. 948 00:56:25,830 --> 00:56:28,920 Now we'll next talk about stored procedures, 949 00:56:28,920 --> 00:56:32,340 which are a way to automate the running of SQL statements 950 00:56:32,340 --> 00:56:34,990 over, and over, and over again. 951 00:56:34,990 --> 00:56:39,690 This is useful as your scaling because if you have some process you often 952 00:56:39,690 --> 00:56:43,200 call, you can put it inside of a stored procedure 953 00:56:43,200 --> 00:56:45,870 and run it all the more simply. 954 00:56:45,870 --> 00:56:48,930 So to demonstrate stored procedures, what we'll do 955 00:56:48,930 --> 00:56:52,740 is go back to our example of the Museum of Fine Arts, 956 00:56:52,740 --> 00:56:57,360 one of Boston's oldest museums, also called the MFA. 957 00:56:57,360 --> 00:57:01,230 And if you remember, in the MFA, we had a certain number 958 00:57:01,230 --> 00:57:05,490 of tables, where one of these tables was a collections table. 959 00:57:05,490 --> 00:57:09,750 It is the items the MFA has in their collection, the items 960 00:57:09,750 --> 00:57:11,730 they put on display, for instance. 961 00:57:11,730 --> 00:57:15,180 Here we had a few pieces, Farmers Working at Dawn, 962 00:57:15,180 --> 00:57:17,610 Imaginative Landscape, and so on. 963 00:57:17,610 --> 00:57:19,230 Each had their own ID. 964 00:57:19,230 --> 00:57:24,540 And they also had a deleted column to implement what we call a soft delete, 965 00:57:24,540 --> 00:57:28,080 where to delete something we wouldn't remove the row. 966 00:57:28,080 --> 00:57:33,570 We would instead just mark this deleted column from a 0 to a 1, 967 00:57:33,570 --> 00:57:38,340 sort of hiding it or marking it as having been deleted to keep data 968 00:57:38,340 --> 00:57:42,120 around, not removing it all together. 969 00:57:42,120 --> 00:57:46,710 Now to find all of the items that weren't marked as deleted, 970 00:57:46,710 --> 00:57:51,600 we could use this query, select star from collections where 971 00:57:51,600 --> 00:57:56,280 deleted equals 0, at least in SQLite. 972 00:57:56,280 --> 00:57:59,190 Now, we then created a view. 973 00:57:59,190 --> 00:58:04,740 We said, maybe I want to just always be able to view those items that have not 974 00:58:04,740 --> 00:58:09,060 been marked as deleted, one table called current collections, 975 00:58:09,060 --> 00:58:12,600 or one virtual table, a view called current collections. 976 00:58:12,600 --> 00:58:17,430 If I were to update Farmers Working at Dawn to be marked as deleted, 977 00:58:17,430 --> 00:58:18,330 what would happen? 978 00:58:18,330 --> 00:58:20,670 This deleted column would become a 1. 979 00:58:20,670 --> 00:58:23,880 And I would not see it in current collection. 980 00:58:23,880 --> 00:58:28,560 So our goal then is to re-implement something like this, 981 00:58:28,560 --> 00:58:30,780 but now not using a view. 982 00:58:30,780 --> 00:58:34,080 In this case, using a stored procedure, we 983 00:58:34,080 --> 00:58:41,230 could simply call to find all the items that have not been marked as deleted. 984 00:58:41,230 --> 00:58:46,060 So to create a stored procedure, we have some syntax we can use. 985 00:58:46,060 --> 00:58:49,000 And that syntax looks a bit like this. 986 00:58:49,000 --> 00:58:53,880 We could say create procedure and give it some name. 987 00:58:53,880 --> 00:58:57,660 Well, this procedure we could call maybe current collections, 988 00:58:57,660 --> 00:59:03,090 being able to view what is inside our collection and not marked as deleted. 989 00:59:03,090 --> 00:59:07,590 We could then, similar to a trigger, say begin 990 00:59:07,590 --> 00:59:14,080 after which we say the SQL statements we want to run as part of this procedure. 991 00:59:14,080 --> 00:59:18,300 So begin, we then type the SQL statements we want to run, 992 00:59:18,300 --> 00:59:23,340 and finally say end, this is the end of our procedure. 993 00:59:23,340 --> 00:59:27,300 And then later on, we could use a statement called call 994 00:59:27,300 --> 00:59:31,570 to call this procedure over and over again. 995 00:59:31,570 --> 00:59:35,790 So let's work on creating our very first stored procedure 996 00:59:35,790 --> 00:59:40,560 to find all the items in our collection that have not been deleted. 997 00:59:40,560 --> 00:59:44,070 Well, I'll go back to my MySQL server. 998 00:59:44,070 --> 00:59:47,370 And I'll show you a new database I've made here. 999 00:59:47,370 --> 00:59:51,520 I can type show databases semicolon. 1000 00:59:51,520 --> 00:59:56,260 And now I should see I have not just the default MySQL databases. 1001 00:59:56,260 --> 00:59:59,710 I also have the MFA database. 1002 00:59:59,710 --> 01:00:06,780 So if I type use MFA semicolon, I should see database changed. 1003 01:00:06,780 --> 01:00:11,640 Now all my future queries will run on this MFA database. 1004 01:00:11,640 --> 01:00:17,220 And so I could type show tables to see all the tables inside of this database. 1005 01:00:17,220 --> 01:00:18,060 Hit Enter. 1006 01:00:18,060 --> 01:00:21,960 And I'll see I have artists, collections, and created. 1007 01:00:21,960 --> 01:00:26,100 I could even if I wanted to describe the collections table, 1008 01:00:26,100 --> 01:00:28,780 describe collections semicolon. 1009 01:00:28,780 --> 01:00:32,940 And now I'll see the description for this collections table here. 1010 01:00:32,940 --> 01:00:38,580 I have an ID, title, accession number column, and an acquired column too. 1011 01:00:38,580 --> 01:00:40,816 But what am I missing? 1012 01:00:40,816 --> 01:00:44,880 I want to be able to implement soft deletes. 1013 01:00:44,880 --> 01:00:47,820 If you were to look at this describe output, 1014 01:00:47,820 --> 01:00:53,520 what am I missing if I want to implement soft deletes? 1015 01:00:53,520 --> 01:00:55,500 AUDIENCE: You're missing the deleted column. 1016 01:00:55,500 --> 01:00:58,500 CARTER ZENKE: Yeah, the deleted column is not yet in this table. 1017 01:00:58,500 --> 01:01:00,550 I know because I use describe here. 1018 01:01:00,550 --> 01:01:02,070 So I don't have deleted yet. 1019 01:01:02,070 --> 01:01:04,200 But I can add it using alter table. 1020 01:01:04,200 --> 01:01:06,030 So I'll come back to my computer. 1021 01:01:06,030 --> 01:01:10,170 And let me use alter table to add in this particular column. 1022 01:01:10,170 --> 01:01:14,700 I could say alter table collections, like this. 1023 01:01:14,700 --> 01:01:17,220 And then I could use add column. 1024 01:01:17,220 --> 01:01:20,910 Modify was good for modifying a column that already existed. 1025 01:01:20,910 --> 01:01:23,760 Add column is good for adding a brand new column. 1026 01:01:23,760 --> 01:01:25,500 I'll call this one deleted. 1027 01:01:25,500 --> 01:01:29,680 And its type will actually be a tiny int, a tiny int. 1028 01:01:29,680 --> 01:01:30,180 Why? 1029 01:01:30,180 --> 01:01:33,420 Well, it's only ever going to be 1 or 0. 1030 01:01:33,420 --> 01:01:35,910 I should use as few bits as possible. 1031 01:01:35,910 --> 01:01:38,430 And tiny int gives that to me. 1032 01:01:38,430 --> 01:01:42,090 Now I'll say the default value is 0 for not deleted. 1033 01:01:42,090 --> 01:01:43,350 And I'll hit Enter. 1034 01:01:43,350 --> 01:01:48,270 And now if I describe collections again, I 1035 01:01:48,270 --> 01:01:52,050 should see that I'm back to having a deleted column of type 1036 01:01:52,050 --> 01:01:56,430 tiny int with a default value of 0. 1037 01:01:56,430 --> 01:02:00,210 OK, so now, we saw I could select everything 1038 01:02:00,210 --> 01:02:04,140 from collections that's not deleted by saying something like select star 1039 01:02:04,140 --> 01:02:07,530 from collections where deleted equals 0. 1040 01:02:07,530 --> 01:02:12,720 I can implement the same thing here, but now using a stored procedure, something 1041 01:02:12,720 --> 01:02:15,960 I could call again, and again, and again to speed up 1042 01:02:15,960 --> 01:02:18,160 this process just a little bit. 1043 01:02:18,160 --> 01:02:22,200 So I'll say I want to create this procedure, current collections. 1044 01:02:22,200 --> 01:02:26,460 But there's something a little interesting about the MySQL syntax 1045 01:02:26,460 --> 01:02:27,000 here. 1046 01:02:27,000 --> 01:02:29,130 If you remember, if I go back to some slides 1047 01:02:29,130 --> 01:02:32,460 here, in the middle of begin and end, we're 1048 01:02:32,460 --> 01:02:37,950 going to put our very own SQL statements to run as part of this procedure. 1049 01:02:37,950 --> 01:02:43,680 Well, to end a SQL statement, we have to have a semicolon. 1050 01:02:43,680 --> 01:02:48,990 But MySQL might actually prematurely end this statement. 1051 01:02:48,990 --> 01:02:53,490 If I type quit procedure name begin some statement then a semicolon, 1052 01:02:53,490 --> 01:02:54,870 it'll say, OK, you're done. 1053 01:02:54,870 --> 01:02:56,280 But I'm actually not. 1054 01:02:56,280 --> 01:02:57,780 I might want to add more statements. 1055 01:02:57,780 --> 01:03:00,760 I might want to conclude with this end down here. 1056 01:03:00,760 --> 01:03:05,970 So a workaround is to temporarily declare a different delimiter. 1057 01:03:05,970 --> 01:03:09,900 Instead of ending my queries with a semicolon here, 1058 01:03:09,900 --> 01:03:12,750 I'll end them with, let's say, slash slash. 1059 01:03:12,750 --> 01:03:14,740 And we'll see that in just a moment here. 1060 01:03:14,740 --> 01:03:17,220 I'll come back to my computer. 1061 01:03:17,220 --> 01:03:20,550 And for the sake of writing this procedure, 1062 01:03:20,550 --> 01:03:25,950 I'll change my delimiter, that is what icon represents the end of my query 1063 01:03:25,950 --> 01:03:28,980 to slash slash, like this. 1064 01:03:28,980 --> 01:03:31,020 Then I'll hit Enter. 1065 01:03:31,020 --> 01:03:33,570 Now I can start creating my procedure. 1066 01:03:33,570 --> 01:03:36,270 I'll say create procedure. 1067 01:03:36,270 --> 01:03:42,240 And I'll call this one current collections with a open and closed 1068 01:03:42,240 --> 01:03:43,290 parentheses after this. 1069 01:03:43,290 --> 01:03:45,930 This is common for names here. 1070 01:03:45,930 --> 01:03:48,390 Then I'll say begin this procedure. 1071 01:03:48,390 --> 01:03:56,400 And in this procedure, I want to select the title, accession number, 1072 01:03:56,400 --> 01:03:59,610 and acquired columns. 1073 01:03:59,610 --> 01:04:06,810 I want to do it from the collections table, where deleted equals 0. 1074 01:04:06,810 --> 01:04:11,640 And now I can safely include a semicolon because my delimiter 1075 01:04:11,640 --> 01:04:14,970 for this entire query is a slash slash. 1076 01:04:14,970 --> 01:04:19,210 So this is the end of this particular query inside my procedure. 1077 01:04:19,210 --> 01:04:22,680 It's also the end of my procedure in general. 1078 01:04:22,680 --> 01:04:25,170 And now to declare the end of this procedure, 1079 01:04:25,170 --> 01:04:27,720 the end of this overarching statement, I could 1080 01:04:27,720 --> 01:04:31,380 say slash slash instead of semicolon. 1081 01:04:31,380 --> 01:04:35,130 Now, if I hit Enter, I'll see query OK. 1082 01:04:35,130 --> 01:04:38,980 And now I should change back to, in this case, semicolon, 1083 01:04:38,980 --> 01:04:43,020 so I can get back to what I'm used to using to end my queries. 1084 01:04:43,020 --> 01:04:44,070 I'll hit Enter. 1085 01:04:44,070 --> 01:04:45,960 And now let's see what we can do. 1086 01:04:45,960 --> 01:04:50,190 I could say call current collection. 1087 01:04:50,190 --> 01:04:52,560 I think we called it collection, singular. 1088 01:04:52,560 --> 01:04:56,070 And I'll say this, semicolon there. 1089 01:04:56,070 --> 01:05:02,440 Now, I should see all of the items that are marked as not deleted. 1090 01:05:02,440 --> 01:05:07,690 So using call, can I then call this procedure I've made and stored? 1091 01:05:07,690 --> 01:05:09,300 Let's try deleting something. 1092 01:05:09,300 --> 01:05:12,240 I'll try maybe deleting Farmers Working at Dawn. 1093 01:05:12,240 --> 01:05:19,800 I could say update, in this case, collections and set deleted equal to 1 1094 01:05:19,800 --> 01:05:28,380 where the title equals Farmers Working at Dawn, semicolon. 1095 01:05:28,380 --> 01:05:29,550 I'll Enter again. 1096 01:05:29,550 --> 01:05:31,710 I'll see one row has changed. 1097 01:05:31,710 --> 01:05:38,220 If I call current collection again, like this, semicolon Enter, 1098 01:05:38,220 --> 01:05:41,880 I'll see that particular artwork is now gone. 1099 01:05:41,880 --> 01:05:47,820 It's not returned to me by the query that was part of my procedure. 1100 01:05:47,820 --> 01:05:52,910 So this then is one example of a stored procedure, one way of saving a query 1101 01:05:52,910 --> 01:05:57,330 and calling it again and again if I might use it very often. 1102 01:05:57,330 --> 01:06:00,800 But what questions do we have so far on stored procedures 1103 01:06:00,800 --> 01:06:01,973 and how we can use them? 1104 01:06:01,973 --> 01:06:03,140 AUDIENCE: I have a question. 1105 01:06:03,140 --> 01:06:05,717 Just like in other languages, do we have the provisioning 1106 01:06:05,717 --> 01:06:08,300 that we can pass the parameters and arguments in the function, 1107 01:06:08,300 --> 01:06:10,670 in the procedure as well, in MySQL? 1108 01:06:10,670 --> 01:06:14,420 And secondly, can we call other procedures also in one procedure, 1109 01:06:14,420 --> 01:06:17,387 like in other languages we have one function call from other? 1110 01:06:17,387 --> 01:06:20,220 CARTER ZENKE: A great question about the power of stored procedures. 1111 01:06:20,220 --> 01:06:23,600 So the first question, could I add parameters or arguments 1112 01:06:23,600 --> 01:06:24,470 to this procedure? 1113 01:06:24,470 --> 01:06:27,260 And could I change what I do based on that input? 1114 01:06:27,260 --> 01:06:30,225 The answer is absolutely yes, you can in MySQL. 1115 01:06:30,225 --> 01:06:32,100 And I'll show you that in just a minute here. 1116 01:06:32,100 --> 01:06:37,160 The next one, which is, can I call other procedures inside my procedure? 1117 01:06:37,160 --> 01:06:38,910 Yes to that as well. 1118 01:06:38,910 --> 01:06:44,380 Any statement you might write, you could probably put in a procedure too. 1119 01:06:44,380 --> 01:06:46,420 OK, let's take one more. 1120 01:06:46,420 --> 01:06:48,520 AUDIENCE: Yes, I have a question. 1121 01:06:48,520 --> 01:06:54,580 There is any way that I can add any note or comment to the table? 1122 01:06:54,580 --> 01:06:57,580 CARTER ZENKE: Can you add any note or comment to your table? 1123 01:06:57,580 --> 01:07:00,700 A good intuition, like it's good to include notes to yourself 1124 01:07:00,700 --> 01:07:02,290 or to include notes to others. 1125 01:07:02,290 --> 01:07:06,010 To my knowledge you would best do that in something 1126 01:07:06,010 --> 01:07:08,560 like a schema dot SQL file. 1127 01:07:08,560 --> 01:07:12,910 If you have a text editor, you could make a dot SQL file 1128 01:07:12,910 --> 01:07:15,210 and store your schema inside of that. 1129 01:07:15,210 --> 01:07:17,710 At which point, you could leave comments to other developers 1130 01:07:17,710 --> 01:07:21,100 to make them know what you are thinking as you were building this. 1131 01:07:21,100 --> 01:07:23,770 There might be a way to leave comments in MySQL. 1132 01:07:23,770 --> 01:07:25,720 But I don't personally know what that is. 1133 01:07:25,720 --> 01:07:29,140 You could do some research to look that up too. 1134 01:07:29,140 --> 01:07:30,560 OK, great question. 1135 01:07:30,560 --> 01:07:31,550 Let's keep going then. 1136 01:07:31,550 --> 01:07:34,060 And to our question about procedures and trying 1137 01:07:34,060 --> 01:07:38,240 to make sure they could have inputs, let's see that as well. 1138 01:07:38,240 --> 01:07:43,930 Well, we saw too that I might want to mark some particular item as sold 1139 01:07:43,930 --> 01:07:46,880 as soon as I mark it as being deleted. 1140 01:07:46,880 --> 01:07:49,540 So for that I'll need a new table here. 1141 01:07:49,540 --> 01:07:57,490 If I type show tables semicolon, I'll see artists, collections, and created. 1142 01:07:57,490 --> 01:08:02,650 But back in our week on the MFA, we also had a table 1143 01:08:02,650 --> 01:08:06,730 to keep track of transactions, when we might buy and sell 1144 01:08:06,730 --> 01:08:08,210 some piece of artwork. 1145 01:08:08,210 --> 01:08:10,750 So let's recreate that table here. 1146 01:08:10,750 --> 01:08:16,640 I'll create a table called transactions, create table transactions, like this. 1147 01:08:16,640 --> 01:08:23,779 And then I'll give it an ID column with the type int and auto increment. 1148 01:08:23,779 --> 01:08:29,080 Then I'll give it a title to mark as being sold or bought. 1149 01:08:29,080 --> 01:08:34,180 Let's say the title is a varchar of size up to 64. 1150 01:08:34,180 --> 01:08:36,160 And it can't be null. 1151 01:08:36,160 --> 01:08:40,600 Then I'll say the action, which can be either bought 1152 01:08:40,600 --> 01:08:46,180 or sold-- did we buy or sell this piece of artwork with this particular title? 1153 01:08:46,180 --> 01:08:52,029 So I'll say this is an enum with the possibility of bought or sold. 1154 01:08:52,029 --> 01:08:54,430 Then I'll say this is not null. 1155 01:08:54,430 --> 01:08:59,170 And I'll then say the primary key of this table is ID. 1156 01:08:59,170 --> 01:09:03,550 And I'll close out this create table statement here hitting Enter. 1157 01:09:03,550 --> 01:09:08,109 Now I can see I have a transactions table to log the buying 1158 01:09:08,109 --> 01:09:11,350 and selling of pieces of artwork. 1159 01:09:11,350 --> 01:09:17,800 Well, it stands to reason that when I update my collections table and I 1160 01:09:17,800 --> 01:09:22,149 set some item to be deleted from a 0 to a 1 in that deleted column, 1161 01:09:22,149 --> 01:09:27,550 I want to add it to the transactions table and also mark it as sold. 1162 01:09:27,550 --> 01:09:32,050 So normally, that is two steps, updating the table 1163 01:09:32,050 --> 01:09:34,300 and also adding it to transactions. 1164 01:09:34,300 --> 01:09:40,040 With a procedure though, I could give that entire sequence one name. 1165 01:09:40,040 --> 01:09:41,979 So let's try that here. 1166 01:09:41,979 --> 01:09:45,959 I'll say I want to make my delimiter slash slash. 1167 01:09:45,959 --> 01:09:48,069 I'm going to create some procedure here. 1168 01:09:48,069 --> 01:09:51,100 I'm going to end it with slash slash. 1169 01:09:51,100 --> 01:09:54,160 So I'll say create procedure. 1170 01:09:54,160 --> 01:09:57,400 And I'll call this one sell, like this. 1171 01:09:57,400 --> 01:10:01,570 But now instead of open parentheses and closed parentheses, 1172 01:10:01,570 --> 01:10:05,980 I actually want to give some input to this function. 1173 01:10:05,980 --> 01:10:10,630 And let me actually ask you all here, what kind of input 1174 01:10:10,630 --> 01:10:13,330 should I give to this sell function, so I can 1175 01:10:13,330 --> 01:10:16,960 identify the painting I want to sell? 1176 01:10:16,960 --> 01:10:19,780 AUDIENCE: It should be with a primary key 1177 01:10:19,780 --> 01:10:21,882 and refer to the name of the painting probably. 1178 01:10:21,882 --> 01:10:23,590 CARTER ZENKE: Yeah, I like your thinking. 1179 01:10:23,590 --> 01:10:25,990 We could think about what identifies this artwork. 1180 01:10:25,990 --> 01:10:28,390 Maybe it's the ID or the title. 1181 01:10:28,390 --> 01:10:32,080 In this case, I like your intuition for the ID, which is unique, 1182 01:10:32,080 --> 01:10:33,140 will never change. 1183 01:10:33,140 --> 01:10:37,360 So let's actually use the ID here of an item in the collections table 1184 01:10:37,360 --> 01:10:39,100 to mark it as sold. 1185 01:10:39,100 --> 01:10:41,560 So I'll come back to my computer. 1186 01:10:41,560 --> 01:10:45,250 And I'll say that sell actually takes an input. 1187 01:10:45,250 --> 01:10:50,050 And I can say in here to note a value as an input. 1188 01:10:50,050 --> 01:10:55,540 I'll say the input is something called sold ID, the item we're selling. 1189 01:10:55,540 --> 01:10:59,540 And the type of this is an int, like this. 1190 01:10:59,540 --> 01:11:05,770 So now I have a procedure that takes in some value in integer called sold ID. 1191 01:11:05,770 --> 01:11:08,680 Now, I'll say let's begin this procedure. 1192 01:11:08,680 --> 01:11:10,120 Let's begin down below. 1193 01:11:10,120 --> 01:11:15,490 And the first thing I should do is update collections, like this, 1194 01:11:15,490 --> 01:11:21,310 and set deleted equal to 1 under what condition? 1195 01:11:21,310 --> 01:11:28,360 Well, perhaps under the condition where the ID is equal to our input, which 1196 01:11:28,360 --> 01:11:30,310 is called sold ID. 1197 01:11:30,310 --> 01:11:35,080 So I'm using the input of sold ID to find in collections which 1198 01:11:35,080 --> 01:11:38,470 artwork I'm marking now as deleted. 1199 01:11:38,470 --> 01:11:39,820 But the next step-- 1200 01:11:39,820 --> 01:11:42,220 I'll say semicolon here to end that step-- 1201 01:11:42,220 --> 01:11:44,890 is to then insert it into the transactions table. 1202 01:11:44,890 --> 01:11:46,780 I've marked it as being deleted. 1203 01:11:46,780 --> 01:11:49,420 But now I need to insert into transactions. 1204 01:11:49,420 --> 01:11:59,530 I'll say then I want to insert into transactions in the title and action 1205 01:11:59,530 --> 01:12:02,720 columns some values here. 1206 01:12:02,720 --> 01:12:04,010 Well, what values? 1207 01:12:04,010 --> 01:12:08,710 The first value is actually the title of this artwork. 1208 01:12:08,710 --> 01:12:11,500 And I only have the ID right now. 1209 01:12:11,500 --> 01:12:14,980 But I could use the ID to get back the title. 1210 01:12:14,980 --> 01:12:18,250 I could say write a subquery that is select, 1211 01:12:18,250 --> 01:12:25,960 in this case, title from collections where the ID equals sold, 1212 01:12:25,960 --> 01:12:28,240 sold ID like this. 1213 01:12:28,240 --> 01:12:32,770 That is a subquery that gives me back the title of this artwork. 1214 01:12:32,770 --> 01:12:36,430 Now, I'll also log sold here, like this. 1215 01:12:36,430 --> 01:12:41,320 And then I think that's the end for that particular query there. 1216 01:12:41,320 --> 01:12:43,960 Let me just double check this, insert into transactions, 1217 01:12:43,960 --> 01:12:46,780 title, action, value, select title from collections 1218 01:12:46,780 --> 01:12:49,270 where ID equals sold ID, sold. 1219 01:12:49,270 --> 01:12:50,650 OK, I think we're good. 1220 01:12:50,650 --> 01:12:51,820 I'll hit Enter on that. 1221 01:12:51,820 --> 01:12:55,990 And then I think I can safely end my procedure. 1222 01:12:55,990 --> 01:12:59,480 I'll say end here slash slash, Enter again. 1223 01:12:59,480 --> 01:13:01,570 I see everything was OK. 1224 01:13:01,570 --> 01:13:06,940 So now if I change my delimiter back to a semicolon, 1225 01:13:06,940 --> 01:13:11,680 I can call sell on some particular ID. 1226 01:13:11,680 --> 01:13:15,280 Like let's say I now want to sell the particular item. 1227 01:13:15,280 --> 01:13:19,630 I could say maybe select star from collections to see what I have. 1228 01:13:19,630 --> 01:13:25,240 And now I want to sell Imaginative Landscape, which has the ID of 2. 1229 01:13:25,240 --> 01:13:33,760 Well, I could try call sell and now pass in that value 2, call sell 2. 1230 01:13:33,760 --> 01:13:39,140 And if I now hit Enter, I should see some rows were affected. 1231 01:13:39,140 --> 01:13:40,270 This procedure worked. 1232 01:13:40,270 --> 01:13:40,870 Let me check. 1233 01:13:40,870 --> 01:13:44,800 I'll say select star from collections semicolon. 1234 01:13:44,800 --> 01:13:48,190 And I'll see that piece now has been updated. 1235 01:13:48,190 --> 01:13:49,600 Its value was 0. 1236 01:13:49,600 --> 01:13:51,970 But now it's 1 for the deleted column. 1237 01:13:51,970 --> 01:13:57,310 Now if I say select star from transactions semicolon, 1238 01:13:57,310 --> 01:14:02,710 I'll then see Imaginative Landscape being sold here as well. 1239 01:14:02,710 --> 01:14:04,528 So good check here. 1240 01:14:04,528 --> 01:14:06,070 And I'll actually just check on this. 1241 01:14:06,070 --> 01:14:09,862 So I think the ID was 2. 1242 01:14:09,862 --> 01:14:11,320 And that actually makes sense here. 1243 01:14:11,320 --> 01:14:14,590 So the ID we're thinking about [INAUDIBLE] and is different here. 1244 01:14:14,590 --> 01:14:17,250 So this is the ID of the transaction, which 1245 01:14:17,250 --> 01:14:20,520 is different than the idea of the painting, which is 2, 1246 01:14:20,520 --> 01:14:23,310 just to clarify that. 1247 01:14:23,310 --> 01:14:28,170 OK, so what other questions then do we have on stored procedures? 1248 01:14:28,170 --> 01:14:32,580 We've seen some without inputs, now some with inputs. 1249 01:14:32,580 --> 01:14:34,870 What else are you wondering? 1250 01:14:34,870 --> 01:14:38,237 AUDIENCE: Is it possible to pass in multiple inputs to a stored procedure? 1251 01:14:38,237 --> 01:14:40,320 CARTER ZENKE: Yeah, could it have multiple inputs? 1252 01:14:40,320 --> 01:14:44,880 It absolutely could separated by some commas inside of those parentheses. 1253 01:14:44,880 --> 01:14:50,890 So you could imagine in sold ID int comma some other input as well. 1254 01:14:50,890 --> 01:14:54,880 And a procedure can also output some value. 1255 01:14:54,880 --> 01:14:58,170 You could say I want to store the result in some variable I 1256 01:14:58,170 --> 01:15:01,000 could keep track of in the server too. 1257 01:15:01,000 --> 01:15:02,200 Let's keep going then. 1258 01:15:02,200 --> 01:15:08,340 And we'll kind of end stored procedures on this note here, which is I seem to-- 1259 01:15:08,340 --> 01:15:11,340 if I go back to what I had discussed before, 1260 01:15:11,340 --> 01:15:15,420 I was able to update some item in my collections table 1261 01:15:15,420 --> 01:15:18,960 and then add it to the transactions table too. 1262 01:15:18,960 --> 01:15:23,700 So if I say select star from collections, there is some danger here. 1263 01:15:23,700 --> 01:15:27,900 Like if I try to sell something that has already been sold, 1264 01:15:27,900 --> 01:15:33,300 I might run the procedure and mark something as being sold twice. 1265 01:15:33,300 --> 01:15:36,750 Now, I'll actually leave this up to you as your own exercise. 1266 01:15:36,750 --> 01:15:38,970 But I want to give you a few tools to think about 1267 01:15:38,970 --> 01:15:40,710 for writing your own procedures. 1268 01:15:40,710 --> 01:15:42,690 Among them are some constructs you might be 1269 01:15:42,690 --> 01:15:45,340 familiar with if you've programmed before. 1270 01:15:45,340 --> 01:15:48,750 So here you might notice a stored procedure can actually 1271 01:15:48,750 --> 01:15:52,710 be compatible with some familiar programming syntax, like conditions, 1272 01:15:52,710 --> 01:15:56,610 for example, if, else if, and else, and also loops, 1273 01:15:56,610 --> 01:15:58,590 like the ability to do something more than once 1274 01:15:58,590 --> 01:16:02,260 or to repeat something over, and over, and over again. 1275 01:16:02,260 --> 01:16:06,210 So you can use these to really build up your own stored procedures 1276 01:16:06,210 --> 01:16:09,150 to combine statements and automate processes that you're 1277 01:16:09,150 --> 01:16:14,410 going to do very frequently on your database at large scale. 1278 01:16:14,410 --> 01:16:17,550 So we've seen here some options for stored procedures. 1279 01:16:17,550 --> 01:16:20,790 What we'll do is come back and talk next about Postgres. 1280 01:16:20,790 --> 01:16:22,320 See you in a few. 1281 01:16:22,320 --> 01:16:23,800 And we're back. 1282 01:16:23,800 --> 01:16:28,530 So we've seen so far MySQL, which is a new DBMS that [INAUDIBLE] 1283 01:16:28,530 --> 01:16:32,190 just a bit more than SQLite at a larger scale. 1284 01:16:32,190 --> 01:16:35,820 It's worth also taking a brief detour into Postgres 1285 01:16:35,820 --> 01:16:38,072 just to get a taste for what it looks like and get 1286 01:16:38,072 --> 01:16:39,780 an understanding for how it can represent 1287 01:16:39,780 --> 01:16:43,450 some of these same tables in a different environment. 1288 01:16:43,450 --> 01:16:45,690 So we'll talk here about Postgres. 1289 01:16:45,690 --> 01:16:49,350 And our goal will be to do the same thing we did with MySQL, converting 1290 01:16:49,350 --> 01:16:53,730 our tables from SQLite over to Postgres taking advantage 1291 01:16:53,730 --> 01:16:56,700 of some new types and some new affordances 1292 01:16:56,700 --> 01:17:01,180 it might offer that are particular to Postgres itself. 1293 01:17:01,180 --> 01:17:05,280 So here then is our table we used for cards in SQLite, 1294 01:17:05,280 --> 01:17:07,140 just to remind you again. 1295 01:17:07,140 --> 01:17:09,990 We had an ID column of type integer. 1296 01:17:09,990 --> 01:17:12,540 And that was our primary key. 1297 01:17:12,540 --> 01:17:15,900 What we saw in MySQL, we had a variety of integers. 1298 01:17:15,900 --> 01:17:19,170 And here we actually have the same thing in Postgres. 1299 01:17:19,170 --> 01:17:21,670 But they just look a little bit slightly different. 1300 01:17:21,670 --> 01:17:28,830 So here I have a table of integers in Postgres, the signed and signed values 1301 01:17:28,830 --> 01:17:31,420 over here going from negative to positive. 1302 01:17:31,420 --> 01:17:35,310 So unlike MySQL, which had tiny int, medium int, 1303 01:17:35,310 --> 01:17:41,040 and so on, in Postgres we only have small int, int, and big int, 1304 01:17:41,040 --> 01:17:43,890 just a little bit fewer options because the developers thought 1305 01:17:43,890 --> 01:17:46,350 you might not need a tiny int or medium int, 1306 01:17:46,350 --> 01:17:49,740 just give you three particular options. 1307 01:17:49,740 --> 01:17:52,740 Now although there are fewer integer options, 1308 01:17:52,740 --> 01:17:55,890 we also get something a little bit different. 1309 01:17:55,890 --> 01:18:03,540 I can also use a serial type, small serial, serial, or big serial. 1310 01:18:03,540 --> 01:18:08,430 Now, a serial is still an integer, a whole number. 1311 01:18:08,430 --> 01:18:14,190 But it actually does what we try to do in MySQL for me, which was try 1312 01:18:14,190 --> 01:18:16,890 to include an auto increment attribute. 1313 01:18:16,890 --> 01:18:19,980 A serial will automatically increment for me, 1314 01:18:19,980 --> 01:18:23,320 making it good for primary keys. 1315 01:18:23,320 --> 01:18:26,370 So here, if I go back to my cards table, and I 1316 01:18:26,370 --> 01:18:29,760 had this ID column of type integer, I could 1317 01:18:29,760 --> 01:18:35,400 update that to now be a serial, where a serial means some whole integer 1318 01:18:35,400 --> 01:18:40,930 number that will increase as I add more and more data to my table. 1319 01:18:40,930 --> 01:18:42,880 Now, what else do you notice here? 1320 01:18:42,880 --> 01:18:47,070 Well, it seems like these double quotes are back in Postgres. 1321 01:18:47,070 --> 01:18:49,380 So in MySQL you had back ticks. 1322 01:18:49,380 --> 01:18:51,330 In SQLite you had double quotes. 1323 01:18:51,330 --> 01:18:56,230 Well, in Postgres you have double quotes for your identifiers here. 1324 01:18:56,230 --> 01:19:00,000 So this then is how we've changed our cards table. 1325 01:19:00,000 --> 01:19:03,810 And now let's actually do this inside of Postgres. 1326 01:19:03,810 --> 01:19:08,910 I'll log in and create my very own database and my own table. 1327 01:19:08,910 --> 01:19:11,820 So I'll come back to my computer here. 1328 01:19:11,820 --> 01:19:13,890 And now let me do this. 1329 01:19:13,890 --> 01:19:18,630 I will log in to Postgres using this command right here. 1330 01:19:18,630 --> 01:19:24,510 I'm going to open PSQL, which is essentially the command line interface, 1331 01:19:24,510 --> 01:19:28,140 like MySQL, but now just for Postgres. 1332 01:19:28,140 --> 01:19:29,520 I'll hit Enter here. 1333 01:19:29,520 --> 01:19:32,910 And I'll log in as the default Postgres user. 1334 01:19:32,910 --> 01:19:35,730 This is the admin for Postgres now. 1335 01:19:35,730 --> 01:19:37,380 I'll type in my password. 1336 01:19:37,380 --> 01:19:39,105 And I'll see a prompt. 1337 01:19:39,105 --> 01:19:42,660 It looks just a little bit different, but the same kind of spirit here. 1338 01:19:42,660 --> 01:19:47,610 If I want to list all databases, I don't type show databases. 1339 01:19:47,610 --> 01:19:50,880 I just type backslash l for list. 1340 01:19:50,880 --> 01:19:52,110 And I'll Enter here. 1341 01:19:52,110 --> 01:19:54,990 And now I'll see a collection of default databases 1342 01:19:54,990 --> 01:19:57,690 that come with the Postgres installation. 1343 01:19:57,690 --> 01:20:00,480 Similar to MySQL, these are databases that 1344 01:20:00,480 --> 01:20:02,940 contain configuration information, metadata 1345 01:20:02,940 --> 01:20:05,910 on this installation of Postgres. 1346 01:20:05,910 --> 01:20:10,440 But now I want to create my own database, one called MBTA. 1347 01:20:10,440 --> 01:20:14,280 So I'll say create database, same as before, 1348 01:20:14,280 --> 01:20:18,670 and call it MBTA now using double quotes. 1349 01:20:18,670 --> 01:20:24,160 If I hit Enter now, I should see create database, confirming this statement. 1350 01:20:24,160 --> 01:20:28,260 And if I type backslash l I'll now see MBTA 1351 01:20:28,260 --> 01:20:32,440 is included in my list of databases. 1352 01:20:32,440 --> 01:20:38,100 So if I want to connect to a database, in MySQL I used use. 1353 01:20:38,100 --> 01:20:41,520 In Postgres I'll use backslash C. 1354 01:20:41,520 --> 01:20:46,770 So I'll say backslash C and then MBTA, hit Enter. 1355 01:20:46,770 --> 01:20:50,370 And now I'm connected to the database MBTA 1356 01:20:50,370 --> 01:20:54,780 as this user whose name is Postgres. 1357 01:20:54,780 --> 01:20:57,420 And now if I want to see what tables I already 1358 01:20:57,420 --> 01:21:02,250 have, I could type backslash dt, hit Enter. 1359 01:21:02,250 --> 01:21:07,110 And I'll see no relations, no tables, but just another way of figuring out 1360 01:21:07,110 --> 01:21:10,150 this is an empty database. 1361 01:21:10,150 --> 01:21:15,810 So we've seen so far how to log in to Postgres and some basic navigational 1362 01:21:15,810 --> 01:21:19,780 statements we can use to work our way around this database. 1363 01:21:19,780 --> 01:21:22,650 But it turns out there is some stuff that's familiar here. 1364 01:21:22,650 --> 01:21:25,560 Like if I want to create a new table, I can do it 1365 01:21:25,560 --> 01:21:28,350 the very same way I did it before. 1366 01:21:28,350 --> 01:21:30,270 I could use create table. 1367 01:21:30,270 --> 01:21:33,660 I could say create table, let's say, cards. 1368 01:21:33,660 --> 01:21:38,460 And then let me include, in this case, an ID 1369 01:21:38,460 --> 01:21:45,060 column of type serial, which means an integer that auto increments for me. 1370 01:21:45,060 --> 01:21:50,760 Then I'll apply the primary key constraint, familiar old friend here. 1371 01:21:50,760 --> 01:21:54,510 Then I'll close out cards, like this, hit Enter. 1372 01:21:54,510 --> 01:21:58,530 And I should see it confirms I created this table. 1373 01:21:58,530 --> 01:22:03,600 If I now type slash dt to list my tables, 1374 01:22:03,600 --> 01:22:07,380 I'll see I have a table called cards. 1375 01:22:07,380 --> 01:22:13,110 And if I want to look at this table in particular to understand its schema, 1376 01:22:13,110 --> 01:22:17,220 I could type slash d and the name of that table, 1377 01:22:17,220 --> 01:22:19,470 cards, like this, hit Enter. 1378 01:22:19,470 --> 01:22:24,060 And now I should see some more information on this cards table. 1379 01:22:24,060 --> 01:22:29,580 It has a column called ID, a type called integer, and some more information 1380 01:22:29,580 --> 01:22:34,810 like we saw in MySQL, but now just in a different format. 1381 01:22:34,810 --> 01:22:36,270 So not to worry. 1382 01:22:36,270 --> 01:22:41,400 If these commands seem maybe arcane to you at first glance, 1383 01:22:41,400 --> 01:22:43,650 you'll get used to it with some practice if you choose 1384 01:22:43,650 --> 01:22:48,090 to learn either MySQL or Postgres here. 1385 01:22:48,090 --> 01:22:52,460 So let me pause before we go any further and just ask, what questions so far 1386 01:22:52,460 --> 01:22:54,950 do we have on Postgres, if any? 1387 01:22:54,950 --> 01:22:58,750 AUDIENCE: What's the sort of thing that indicates there's an error? 1388 01:22:58,750 --> 01:23:00,000 CARTER ZENKE: A good question. 1389 01:23:00,000 --> 01:23:03,380 So similar to SQLite or MySQL, you'll know 1390 01:23:03,380 --> 01:23:07,850 you have an error when you hit Enter and it doesn't say either query 1391 01:23:07,850 --> 01:23:11,600 OK or ptu, that command you just used. 1392 01:23:11,600 --> 01:23:15,140 It might give you a particular error, like maybe you forgot a comma, 1393 01:23:15,140 --> 01:23:16,820 maybe you forgot a quote somewhere. 1394 01:23:16,820 --> 01:23:18,830 It should of give you an idea of what you 1395 01:23:18,830 --> 01:23:20,810 didn't do correctly with the syntax. 1396 01:23:20,810 --> 01:23:27,130 The best way to see it is to try it and figure out what it tells you as you go. 1397 01:23:27,130 --> 01:23:28,520 OK, let's keep going then. 1398 01:23:28,520 --> 01:23:32,500 And we'll see a few differences between Postgres and MySQL too. 1399 01:23:32,500 --> 01:23:35,350 So we have our cards table. 1400 01:23:35,350 --> 01:23:38,680 But our other goal is to create a stations table. 1401 01:23:38,680 --> 01:23:41,395 So let me pull up what we did for this one. 1402 01:23:41,395 --> 01:23:44,390 Let me find stations here. 1403 01:23:44,390 --> 01:23:50,110 So this was our stations table with Postgres, or sorry, with SQLite. 1404 01:23:50,110 --> 01:23:55,000 We had an ID, a name, and a line column, integer, text, and text 1405 01:23:55,000 --> 01:23:56,500 for those types. 1406 01:23:56,500 --> 01:24:01,150 Now we've seen in Postgres I could change this integer for my primary key 1407 01:24:01,150 --> 01:24:02,110 to be a serial. 1408 01:24:02,110 --> 01:24:04,730 That's a little more apt for this use case here. 1409 01:24:04,730 --> 01:24:10,300 And it turns out that in Postgres I also have varchars I could use inside 1410 01:24:10,300 --> 01:24:11,620 of this stations table. 1411 01:24:11,620 --> 01:24:16,180 I could make name a varchar here up to size 32 characters. 1412 01:24:16,180 --> 01:24:19,600 And for simplicity, though we used an enum earlier, 1413 01:24:19,600 --> 01:24:24,680 I'll now use a varchar for line as well just to keep things simple. 1414 01:24:24,680 --> 01:24:30,040 So here we've updated our stations table using what Postgres might offer us. 1415 01:24:30,040 --> 01:24:33,580 The real differences though come not in our stations table 1416 01:24:33,580 --> 01:24:38,110 but actually inside of our other table, this one called swipes. 1417 01:24:38,110 --> 01:24:42,130 So this here was our swipes table, or some section of it. 1418 01:24:42,130 --> 01:24:47,210 We had IDs up top and primary key and foreign key constraints down below. 1419 01:24:47,210 --> 01:24:53,380 Now remember, we had type of text, date time of numeric, and amount of numeric 1420 01:24:53,380 --> 01:24:54,250 as well. 1421 01:24:54,250 --> 01:24:58,900 Well, Postgres offers some other types we could use to update these as well. 1422 01:24:58,900 --> 01:25:01,840 Among them are enum, our old friend again. 1423 01:25:01,840 --> 01:25:06,080 But we use enum a little bit differently. 1424 01:25:06,080 --> 01:25:11,290 So instead of including enum inside of our column definition, 1425 01:25:11,290 --> 01:25:15,100 we actually create our very own type. 1426 01:25:15,100 --> 01:25:17,440 Postgres supports something like this. 1427 01:25:17,440 --> 01:25:20,750 Create a type and give it some name. 1428 01:25:20,750 --> 01:25:25,250 And then I could say what that type actually is. 1429 01:25:25,250 --> 01:25:30,260 So here I've created some type and called it swipe type and then said, 1430 01:25:30,260 --> 01:25:35,650 this type can have the following values, enter, exit, or deposit. 1431 01:25:35,650 --> 01:25:41,710 And I can then use this name, swipe type, as my very own type in future 1432 01:25:41,710 --> 01:25:46,870 create table statements, so something that Postgres has that MySQL might not 1433 01:25:46,870 --> 01:25:48,700 in this form. 1434 01:25:48,700 --> 01:25:50,830 Let's consider then dates and times. 1435 01:25:50,830 --> 01:25:53,050 So dates and times, largely similar. 1436 01:25:53,050 --> 01:25:56,370 I have timestamp, date, time, and interval, 1437 01:25:56,370 --> 01:25:59,970 a particular type representing the distance between times, 1438 01:25:59,970 --> 01:26:02,310 how long did something take? 1439 01:26:02,310 --> 01:26:06,090 But we still have our old friend's date and time for dates and times 1440 01:26:06,090 --> 01:26:11,010 respectively, and also timestamp to combine these two into one. 1441 01:26:11,010 --> 01:26:14,160 And then we also have precision with these. 1442 01:26:14,160 --> 01:26:18,030 I could say I want to represent to the hundreds place, the thousands 1443 01:26:18,030 --> 01:26:23,200 place, or so on for each of timestamp, time, and interval. 1444 01:26:23,200 --> 01:26:26,920 Continuing on, we could also work with real numbers, which 1445 01:26:26,920 --> 01:26:30,400 are largely the same amenities between MySQL and Postgres 1446 01:26:30,400 --> 01:26:32,570 with a few differences. 1447 01:26:32,570 --> 01:26:37,570 One is that at least in Postgres we call decimal numeric. 1448 01:26:37,570 --> 01:26:41,080 So numeric in this case can specify some precision, that 1449 01:26:41,080 --> 01:26:43,700 is the number of digits, and the scale, that 1450 01:26:43,700 --> 01:26:45,700 is the number that come after the decimal point. 1451 01:26:45,700 --> 01:26:51,280 And newer to Postgres is a type particularly for money, 1452 01:26:51,280 --> 01:26:54,010 which depending on your settings in your database 1453 01:26:54,010 --> 01:26:57,550 will put to you either a certain number of decimal points 1454 01:26:57,550 --> 01:27:01,850 and include perhaps a dollar sign, or a euro sign, or a pound sign, 1455 01:27:01,850 --> 01:27:04,060 whatever it is you're using and whatever it 1456 01:27:04,060 --> 01:27:06,920 is you've configured in your database. 1457 01:27:06,920 --> 01:27:08,110 So let's update now. 1458 01:27:08,110 --> 01:27:11,290 Let's go ahead and improve this statement in swipes 1459 01:27:11,290 --> 01:27:13,390 from SQLite to Postgres. 1460 01:27:13,390 --> 01:27:15,100 Here, I had type. 1461 01:27:15,100 --> 01:27:18,610 Well, that could now be our very own swipe type. 1462 01:27:18,610 --> 01:27:21,820 Notice here I defined up above as an enum. 1463 01:27:21,820 --> 01:27:26,050 Now I'm going to use it down below in my swipes table. 1464 01:27:26,050 --> 01:27:29,890 This type column has the type swipe type. 1465 01:27:29,890 --> 01:27:31,450 And it cannot be null. 1466 01:27:31,450 --> 01:27:33,250 Then I'll keep going. 1467 01:27:33,250 --> 01:27:35,950 And I'll make datetime just timestamp. 1468 01:27:35,950 --> 01:27:38,500 Timestamp is the newer version in Postgres 1469 01:27:38,500 --> 01:27:44,380 of what MySQL had called date time and what SQLite called numeric. 1470 01:27:44,380 --> 01:27:47,830 And then instead of, let's say, current timestamp, 1471 01:27:47,830 --> 01:27:51,910 Postgres just uses this function called now, the current timestamp it 1472 01:27:51,910 --> 01:27:55,330 will get for as you add some new row. 1473 01:27:55,330 --> 01:27:58,570 And then finally, amount, we could use money. 1474 01:27:58,570 --> 01:28:00,220 We could use numeric. 1475 01:28:00,220 --> 01:28:04,000 We'll stick in this case to just numeric, like we did for MySQL 1476 01:28:04,000 --> 01:28:10,450 and say numeric up to five digits, two of which come after the decimal point. 1477 01:28:10,450 --> 01:28:15,990 So a whirlwind tour of some differences in types between MySQL and Postgres. 1478 01:28:15,990 --> 01:28:18,720 What questions do we have on these? 1479 01:28:18,720 --> 01:28:21,480 I'm not seeing too many here. 1480 01:28:21,480 --> 01:28:22,380 Let me keep going. 1481 01:28:22,380 --> 01:28:25,420 I'll come back to my computer here. 1482 01:28:25,420 --> 01:28:29,880 So it's worth showing that we're able to create a table. 1483 01:28:29,880 --> 01:28:32,610 We can then show it and describe it like this. 1484 01:28:32,610 --> 01:28:36,300 But then the question remains, how do you get out of Postgres? 1485 01:28:36,300 --> 01:28:40,510 Well, for that you can use-- let me clear my screen-- simply slash q. 1486 01:28:40,510 --> 01:28:42,960 Whereas in MySQL you could use quit. 1487 01:28:42,960 --> 01:28:45,420 In Postgres it's slash q. 1488 01:28:45,420 --> 01:28:47,220 So I'll hit Enter on this. 1489 01:28:47,220 --> 01:28:49,650 And we'll say goodbye to Postgres for now. 1490 01:28:49,650 --> 01:28:53,040 But after the break, we'll come back and keep working with MySQL 1491 01:28:53,040 --> 01:28:57,450 and focus on actually what features we can use to scale up our applications. 1492 01:28:57,450 --> 01:29:03,000 What does MySQL and Postgres offer to serve more users, more frequently? 1493 01:29:03,000 --> 01:29:05,240 Back in a few. 1494 01:29:05,240 --> 01:29:06,570 And we're back. 1495 01:29:06,570 --> 01:29:11,030 So we've gotten a taste so far of both MySQL and Postgres. 1496 01:29:11,030 --> 01:29:14,000 But now it's time to think through the features each 1497 01:29:14,000 --> 01:29:17,550 offers to scale up our application. 1498 01:29:17,550 --> 01:29:23,390 So let's envision a sample application here, one that starts off pretty small. 1499 01:29:23,390 --> 01:29:29,390 We have users trying to read from this database about 100 times per minute. 1500 01:29:29,390 --> 01:29:32,270 But they're also writing to the database, 1501 01:29:32,270 --> 01:29:36,080 adding new data about 50 times per minute. 1502 01:29:36,080 --> 01:29:40,370 And maybe this application over time it grows a little bit. 1503 01:29:40,370 --> 01:29:45,590 Maybe instead of 100 reads, it goes up to 1,000 and multiplies by 10, 1504 01:29:45,590 --> 01:29:47,610 and same thing for our writes. 1505 01:29:47,610 --> 01:29:51,170 They also multiply on average by about 10. 1506 01:29:51,170 --> 01:29:55,730 And this application, it gets even more popular, increases by 10-fold again. 1507 01:29:55,730 --> 01:30:00,030 Now we have 10,000 reads per minute, 5,000 writes per minute. 1508 01:30:00,030 --> 01:30:03,650 And then finally, it keeps growing another 10-fold. 1509 01:30:03,650 --> 01:30:09,500 And now we're at 100,000 reads per minute and 50,000 writes per minute. 1510 01:30:09,500 --> 01:30:12,230 What might be the problem? 1511 01:30:12,230 --> 01:30:18,500 If I had a single server, like this, why might my application 1512 01:30:18,500 --> 01:30:21,980 start to slow down? 1513 01:30:21,980 --> 01:30:25,910 AUDIENCE: So yeah, I would like to say that whenever the requests are 1514 01:30:25,910 --> 01:30:32,330 increasing with the time to which the server would reply will also increase. 1515 01:30:32,330 --> 01:30:35,002 And the list would seem to grow on and on. 1516 01:30:35,002 --> 01:30:36,960 CARTER ZENKE: Yeah, I like your thinking there. 1517 01:30:36,960 --> 01:30:39,110 So you can think of the time that each query takes. 1518 01:30:39,110 --> 01:30:43,260 As we saw in our optimizing lecture, we could time our queries. 1519 01:30:43,260 --> 01:30:47,520 And if I have many, many, many of those queries running all at one time, 1520 01:30:47,520 --> 01:30:49,770 I mean that could add up substantially. 1521 01:30:49,770 --> 01:30:53,690 So it's worth thinking about whether one computer can really 1522 01:30:53,690 --> 01:30:57,960 handle all of these reads and all of these writes. 1523 01:30:57,960 --> 01:31:01,220 So if you find yourself writing some application 1524 01:31:01,220 --> 01:31:06,350 and it behaves a little bit slower than you want it to, particularly at scale, 1525 01:31:06,350 --> 01:31:09,140 you have a few options. 1526 01:31:09,140 --> 01:31:12,050 And I want to ask you just to use your own intuition here. 1527 01:31:12,050 --> 01:31:17,150 Like if you encountered this problem of your application slowing down, 1528 01:31:17,150 --> 01:31:22,550 what might you try to do to solve that problem? 1529 01:31:22,550 --> 01:31:28,400 AUDIENCE: You could try to optimize your queries. 1530 01:31:28,400 --> 01:31:30,500 CARTER ZENKE: Yeah, try to optimize your queries, 1531 01:31:30,500 --> 01:31:33,260 make sure they take a little less time than before. 1532 01:31:33,260 --> 01:31:36,890 Or you could even try to throw more hardware at the problem. 1533 01:31:36,890 --> 01:31:39,200 Like maybe you have optimized your queries 1534 01:31:39,200 --> 01:31:40,910 to the maximum you think you can. 1535 01:31:40,910 --> 01:31:43,160 And it's just still not fast enough. 1536 01:31:43,160 --> 01:31:47,270 Well, one approach might be what's called vertical scaling, 1537 01:31:47,270 --> 01:31:51,770 trying to make this single server all the more quicker 1538 01:31:51,770 --> 01:31:54,150 at replying to your queries. 1539 01:31:54,150 --> 01:31:58,520 So vertical scaling is about increasing your application's capacity 1540 01:31:58,520 --> 01:32:02,960 by increasing the capacity of a single server, the computing 1541 01:32:02,960 --> 01:32:04,920 power of that server. 1542 01:32:04,920 --> 01:32:07,940 So whereas this was our regular, old server, 1543 01:32:07,940 --> 01:32:11,420 you could imagine trying to vertically scale this server 1544 01:32:11,420 --> 01:32:14,300 just making it more powerful, optimize your queries 1545 01:32:14,300 --> 01:32:19,280 and have them run on this server faster than they did before. 1546 01:32:19,280 --> 01:32:22,670 But there's also another approach you could take. 1547 01:32:22,670 --> 01:32:25,860 Here we focused on one server. 1548 01:32:25,860 --> 01:32:30,620 But what else could you logically do to just have more resources 1549 01:32:30,620 --> 01:32:32,480 at your disposal? 1550 01:32:32,480 --> 01:32:36,320 AUDIENCE: Maybe we can horizontally scale up the server. 1551 01:32:36,320 --> 01:32:39,532 We can add two or three more servers so we can manage it. 1552 01:32:39,532 --> 01:32:41,240 CARTER ZENKE: Yeah, I like your thinking. 1553 01:32:41,240 --> 01:32:45,210 So maybe we buy a fancier server like this. 1554 01:32:45,210 --> 01:32:49,280 But if we ever run out of this server's capacity, 1555 01:32:49,280 --> 01:32:52,550 well the only logical thing is to buy more servers. 1556 01:32:52,550 --> 01:32:55,160 Like let's have not just one, but three for instance. 1557 01:32:55,160 --> 01:32:58,520 And this is an example of horizontal scaling. 1558 01:32:58,520 --> 01:33:02,450 So horizontal scaling is increasing your application's capacity 1559 01:33:02,450 --> 01:33:05,930 not by increasing a single server's computing power, 1560 01:33:05,930 --> 01:33:08,900 but by spreading it out, buying more servers 1561 01:33:08,900 --> 01:33:12,710 to handle that load across more, in this case, individuals, 1562 01:33:12,710 --> 01:33:15,810 people who can do work on those queries. 1563 01:33:15,810 --> 01:33:19,230 So horizontal scaling might look a bit more like this. 1564 01:33:19,230 --> 01:33:24,180 We have not just one server, but now three overall. 1565 01:33:24,180 --> 01:33:28,640 So this process of having not just one server, but multiple, 1566 01:33:28,640 --> 01:33:31,470 it's a process called replication. 1567 01:33:31,470 --> 01:33:37,520 So replication is keeping copies of your database not just on one server, 1568 01:33:37,520 --> 01:33:39,620 but on multiple. 1569 01:33:39,620 --> 01:33:41,990 And this seems like a good idea. 1570 01:33:41,990 --> 01:33:46,050 Like wouldn't it be great to have a copy of your data not just in one place, 1571 01:33:46,050 --> 01:33:47,390 but in multiple? 1572 01:33:47,390 --> 01:33:51,540 Maybe you yourself have made a backup of your own data on your computer. 1573 01:33:51,540 --> 01:33:54,710 This is not a dissimilar idea. 1574 01:33:54,710 --> 01:33:57,830 But it turns out there are some challenges with this. 1575 01:33:57,830 --> 01:34:01,670 Like it's very complicated to have servers communicate with themselves 1576 01:34:01,670 --> 01:34:05,380 to keep track of who has the most recent updates, who should 1577 01:34:05,380 --> 01:34:08,140 I follow, who should lead, and so on. 1578 01:34:08,140 --> 01:34:12,580 And so for that reason, there are a few models of replication 1579 01:34:12,580 --> 01:34:14,680 that you should at least know about. 1580 01:34:14,680 --> 01:34:20,170 Among them are single-leader replication, multi-leader replication, 1581 01:34:20,170 --> 01:34:22,700 and leaderless replication. 1582 01:34:22,700 --> 01:34:28,180 So single-leader means that there is a single database server out there 1583 01:34:28,180 --> 01:34:33,130 that takes in the incoming writes to an application, the updates to some data. 1584 01:34:33,130 --> 01:34:37,150 And that single server passes them on to some copy 1585 01:34:37,150 --> 01:34:40,030 to take care of making sure it's redundant, 1586 01:34:40,030 --> 01:34:45,470 that this data is stored not just in one place, but multiple places too. 1587 01:34:45,470 --> 01:34:49,850 Multi-leader means there's not just one server listening for updates. 1588 01:34:49,850 --> 01:34:51,020 There are multiple. 1589 01:34:51,020 --> 01:34:53,480 And suffice to say, it gets a lot more complicated 1590 01:34:53,480 --> 01:34:56,510 when there are more than one leader trying to listen for updates 1591 01:34:56,510 --> 01:34:58,230 and process them. 1592 01:34:58,230 --> 01:35:01,670 And finally, there are some other ones, like leaderless and some more, 1593 01:35:01,670 --> 01:35:03,650 that actually do away with leaders altogether 1594 01:35:03,650 --> 01:35:05,430 and do something totally different. 1595 01:35:05,430 --> 01:35:09,110 But today will focus on single-leader, one of the most basic application 1596 01:35:09,110 --> 01:35:14,270 strategies you could try and one that is built into MySQL. 1597 01:35:14,270 --> 01:35:17,150 So let's focus on this one here. 1598 01:35:17,150 --> 01:35:21,410 You could imagine, in this case, that you have a computer. 1599 01:35:21,410 --> 01:35:24,980 And you're trying to connect to some social network. 1600 01:35:24,980 --> 01:35:27,840 Well, here are two databases. 1601 01:35:27,840 --> 01:35:29,180 One is a leader. 1602 01:35:29,180 --> 01:35:31,340 And one is a follower. 1603 01:35:31,340 --> 01:35:33,920 And just to check for understanding here, 1604 01:35:33,920 --> 01:35:37,370 let's say I want to upload a profile photo. 1605 01:35:37,370 --> 01:35:39,230 I'm updating some data. 1606 01:35:39,230 --> 01:35:42,720 To which server would I send that data? 1607 01:35:42,720 --> 01:35:44,960 If I want to change something on the server, 1608 01:35:44,960 --> 01:35:48,492 who's listening for that request? 1609 01:35:48,492 --> 01:35:50,450 AUDIENCE: I think that we'll send to the leader 1610 01:35:50,450 --> 01:35:53,743 because the other servers would get data from the leader. 1611 01:35:53,743 --> 01:35:55,160 They would get it from the leader. 1612 01:35:55,160 --> 01:35:59,120 But I will send to the leader because it will have the latest 1613 01:35:59,120 --> 01:36:01,513 data for other servers and other users. 1614 01:36:01,513 --> 01:36:02,930 CARTER ZENKE: Yeah, exactly right. 1615 01:36:02,930 --> 01:36:05,600 So if I want to change some piece of data, 1616 01:36:05,600 --> 01:36:10,790 I know by definition the leader has the most recent copy of that data 1617 01:36:10,790 --> 01:36:16,070 and is also the designated server to process updates to my database, 1618 01:36:16,070 --> 01:36:20,460 changes to its data, like insertions, or updates, or deletions. 1619 01:36:20,460 --> 01:36:25,310 So if I'm uploading some profile photo, I'll send this photo of me 1620 01:36:25,310 --> 01:36:26,540 right here to my leader. 1621 01:36:26,540 --> 01:36:29,240 And that leader will then take that profile photo 1622 01:36:29,240 --> 01:36:32,000 and store it on the database. 1623 01:36:32,000 --> 01:36:37,020 But before long, the leader will also talk to, communicate with the follower 1624 01:36:37,020 --> 01:36:41,580 and make sure that the follower has a copy of this data elsewhere. 1625 01:36:41,580 --> 01:36:44,900 So it will forward on this update to the follower 1626 01:36:44,900 --> 01:36:47,810 who will then process it in turn. 1627 01:36:47,810 --> 01:36:52,700 And now that both the leader and the follower have this data, well, 1628 01:36:52,700 --> 01:36:57,560 who could I ask to see the latest profile photo for Carter Zenke? 1629 01:36:57,560 --> 01:37:00,020 Which server might have that? 1630 01:37:00,020 --> 01:37:02,190 AUDIENCE: [INAUDIBLE] servers that have the data. 1631 01:37:02,190 --> 01:37:04,760 CARTER ZENKE: So I could ask either server for information 1632 01:37:04,760 --> 01:37:06,290 on this profile photo. 1633 01:37:06,290 --> 01:37:10,220 I could ask either the leader or the follower because each 1634 01:37:10,220 --> 01:37:15,860 is storing a copy of this photo, or rather a copy of its location 1635 01:37:15,860 --> 01:37:18,270 on the server itself. 1636 01:37:18,270 --> 01:37:21,770 Now let's say here I asked the follower for this profile photo 1637 01:37:21,770 --> 01:37:23,930 so I can see it on my own browser. 1638 01:37:23,930 --> 01:37:29,870 Here, I'll ask the follower and get back that image inside of my own browser. 1639 01:37:29,870 --> 01:37:32,960 But I could have asked either one to distribute 1640 01:37:32,960 --> 01:37:36,410 that load across both of these servers. 1641 01:37:36,410 --> 01:37:41,000 Now, this follower is what's known as a read replica. 1642 01:37:41,000 --> 01:37:47,030 A read replica is a server from which we would only ever read data. 1643 01:37:47,030 --> 01:37:51,230 If I ever wanted to update some photo or insert some new rows, 1644 01:37:51,230 --> 01:37:53,480 I would never ask the follower. 1645 01:37:53,480 --> 01:37:57,060 I would always ask the leader who's in charge, so to speak, 1646 01:37:57,060 --> 01:38:01,860 of handling these updates and these writes to that database. 1647 01:38:01,860 --> 01:38:07,790 So the next step is to think about not just the types of leaders that we have, 1648 01:38:07,790 --> 01:38:10,340 but also how they communicate. 1649 01:38:10,340 --> 01:38:15,410 And in general, there are two options for communication, both synchronous 1650 01:38:15,410 --> 01:38:16,970 and asynchronous. 1651 01:38:16,970 --> 01:38:20,330 So synchronous means that the leader will 1652 01:38:20,330 --> 01:38:23,900 wait for the follower to get the data and process it 1653 01:38:23,900 --> 01:38:25,610 before doing anything else. 1654 01:38:25,610 --> 01:38:29,060 It synchronizes with that follower. 1655 01:38:29,060 --> 01:38:32,450 Asynchronous though, means the leader sends that data to the follower 1656 01:38:32,450 --> 01:38:35,000 and doesn't wait for the follower to finish processing. 1657 01:38:35,000 --> 01:38:39,380 It just keeps going and going, hoping the follower is keeping up. 1658 01:38:39,380 --> 01:38:44,990 So let's focus first on a few of these, in particular on synchronous. 1659 01:38:44,990 --> 01:38:49,790 Now, in this diagram, I have a client and a server. 1660 01:38:49,790 --> 01:38:51,740 My client is my computer. 1661 01:38:51,740 --> 01:38:53,900 The server is some database. 1662 01:38:53,900 --> 01:38:58,880 And on this x-axis here, left to right, I have time. 1663 01:38:58,880 --> 01:39:02,780 Well, if I want to make a request to this server, 1664 01:39:02,780 --> 01:39:05,300 I might denote that with a line, like this. 1665 01:39:05,300 --> 01:39:07,970 My client sends some request to my server. 1666 01:39:07,970 --> 01:39:12,160 Maybe it asks for data or tries to update some photo. 1667 01:39:12,160 --> 01:39:17,080 Then that takes some time to get the request from my client to this server 1668 01:39:17,080 --> 01:39:18,340 down here. 1669 01:39:18,340 --> 01:39:21,730 And the response itself also takes some time. 1670 01:39:21,730 --> 01:39:24,280 Maybe the server responds, hey, everything's OK. 1671 01:39:24,280 --> 01:39:25,450 I got your photo. 1672 01:39:25,450 --> 01:39:26,980 Here's your data, et cetera. 1673 01:39:26,980 --> 01:39:30,100 It'll take some time for that to get back to me. 1674 01:39:30,100 --> 01:39:33,340 So between here and here, that's how long 1675 01:39:33,340 --> 01:39:38,080 it took for me to receive some data from this server. 1676 01:39:38,080 --> 01:39:40,810 Now let's focus on synchronous replication 1677 01:39:40,810 --> 01:39:45,322 here, where the leader and the follower are always in sync. 1678 01:39:45,322 --> 01:39:46,780 So this might look a bit like this. 1679 01:39:46,780 --> 01:39:50,950 If I add in the follower, I now have a leader and a follower. 1680 01:39:50,950 --> 01:39:55,970 I might send that profile photo to the leader, a bit like this. 1681 01:39:55,970 --> 01:40:00,640 The leader then sends it to the follower down below and waits. 1682 01:40:00,640 --> 01:40:03,460 It waits until the follower has confirmed 1683 01:40:03,460 --> 01:40:05,110 they have received that photo. 1684 01:40:05,110 --> 01:40:06,650 They have added to the database. 1685 01:40:06,650 --> 01:40:10,270 And now they send back a request saying, look, I got it. 1686 01:40:10,270 --> 01:40:12,520 Everything is OK. 1687 01:40:12,520 --> 01:40:15,670 And now only at this point, once the leader 1688 01:40:15,670 --> 01:40:19,930 has received the OK from the follower, does it then communicate with me. 1689 01:40:19,930 --> 01:40:22,250 It then says, everything's OK. 1690 01:40:22,250 --> 01:40:24,550 We got your photo. 1691 01:40:24,550 --> 01:40:26,550 So this is great because it's redundant. 1692 01:40:26,550 --> 01:40:32,130 Like I know for sure that my photo is on both the leader and the follower. 1693 01:40:32,130 --> 01:40:33,690 But there is a downside. 1694 01:40:33,690 --> 01:40:39,810 And looking at this diagram here, what do you think that downside might be? 1695 01:40:39,810 --> 01:40:41,060 AUDIENCE: Maybe it's too slow. 1696 01:40:41,060 --> 01:40:42,227 CARTER ZENKE: Yeah, exactly. 1697 01:40:42,227 --> 01:40:43,440 It could be just too slow. 1698 01:40:43,440 --> 01:40:48,360 Like if I have to wait for the follower to get the OK back to the leader 1699 01:40:48,360 --> 01:40:52,110 and then back to me, that's just more time for me to wait. 1700 01:40:52,110 --> 01:40:56,730 And maybe my application shouldn't wait around that long. 1701 01:40:56,730 --> 01:41:01,590 You might use this, though, if you're working in finance or healthcare, where 1702 01:41:01,590 --> 01:41:06,510 you have to be doubly sure you have data stored exactly as you want it to be. 1703 01:41:06,510 --> 01:41:10,452 But if you're like a social media site, like a Facebook or a Twitter, 1704 01:41:10,452 --> 01:41:13,410 I mean, maybe you don't care if you lose a tweet every once in a while. 1705 01:41:13,410 --> 01:41:14,520 It's OK. 1706 01:41:14,520 --> 01:41:20,480 So maybe synchronous isn't the best, most ideal scenario for that as well. 1707 01:41:20,480 --> 01:41:22,750 So let's think through now asynchronous. 1708 01:41:22,750 --> 01:41:26,590 So we'll go back to asynchronous here and think through our diagram again. 1709 01:41:26,590 --> 01:41:29,770 Well, in asynchronous, the leader doesn't wait around 1710 01:41:29,770 --> 01:41:31,610 for the follower to get the data. 1711 01:41:31,610 --> 01:41:34,570 So to visualize, my client could send this 1712 01:41:34,570 --> 01:41:37,720 to the leader, maybe a photo update, for example. 1713 01:41:37,720 --> 01:41:43,450 Then simultaneously, the leader tells me, I got your photo, everything's OK, 1714 01:41:43,450 --> 01:41:46,570 and then sends the photo to the follower. 1715 01:41:46,570 --> 01:41:50,990 The follower might later on tell the leader that everything is OK. 1716 01:41:50,990 --> 01:41:55,100 But the leader already told me, look, all good, we got your photo. 1717 01:41:55,100 --> 01:41:58,240 So there's nothing more to do after this. 1718 01:41:58,240 --> 01:42:00,430 Now, this admittedly is faster. 1719 01:42:00,430 --> 01:42:03,430 Like this amount of time is much shorter. 1720 01:42:03,430 --> 01:42:06,230 But again, there is some downside here. 1721 01:42:06,230 --> 01:42:12,660 Why might I not want to use asynchronous replication, at least in some cases? 1722 01:42:12,660 --> 01:42:16,280 AUDIENCE: I would guess that it's an issue with data 1723 01:42:16,280 --> 01:42:17,405 being corrupted by someone? 1724 01:42:17,405 --> 01:42:18,655 CARTER ZENKE: So you're right. 1725 01:42:18,655 --> 01:42:20,660 There could be some data corruption here, 1726 01:42:20,660 --> 01:42:24,080 where let's say the leader sends some data to the follower. 1727 01:42:24,080 --> 01:42:26,930 And they confirm for me that data has been received 1728 01:42:26,930 --> 01:42:30,030 across all instances of these servers. 1729 01:42:30,030 --> 01:42:32,930 Well, I mean, later on maybe the follower encounters some problem. 1730 01:42:32,930 --> 01:42:35,300 And they don't actually follow through. 1731 01:42:35,300 --> 01:42:38,930 Well, at that point, I have had a bad contract with my leader. 1732 01:42:38,930 --> 01:42:40,430 It told me everything was OK. 1733 01:42:40,430 --> 01:42:42,410 But actually, it wasn't. 1734 01:42:42,410 --> 01:42:46,250 So there are all kinds of trade offs here, whether in terms of the time 1735 01:42:46,250 --> 01:42:48,020 that things take or redundancy. 1736 01:42:48,020 --> 01:42:50,420 But it's up to you to decide which one might 1737 01:42:50,420 --> 01:42:53,720 work best for your own use case, synchronous replication 1738 01:42:53,720 --> 01:42:55,910 or asynchronous. 1739 01:42:55,910 --> 01:43:00,450 Now, in addition to replication, there is some other strategy for scaling 1740 01:43:00,450 --> 01:43:02,100 you should be at least familiar with. 1741 01:43:02,100 --> 01:43:05,300 And this is a process known as sharding. 1742 01:43:05,300 --> 01:43:08,450 Now, sharding is great for really large data 1743 01:43:08,450 --> 01:43:12,380 sets on the order of many, many, many gigabytes or terabytes of data 1744 01:43:12,380 --> 01:43:16,880 that you just can't fit on a single server or a single computer. 1745 01:43:16,880 --> 01:43:20,150 Sharding involves taking some data set like that 1746 01:43:20,150 --> 01:43:24,890 and essentially splitting it up in some logical way across multiple servers, 1747 01:43:24,890 --> 01:43:25,760 like this. 1748 01:43:25,760 --> 01:43:29,300 And maybe for simplicity you have some database of names. 1749 01:43:29,300 --> 01:43:34,070 And you decide that all of those names that begin with, let's say, A to I 1750 01:43:34,070 --> 01:43:37,160 end up on this server here, this first one. 1751 01:43:37,160 --> 01:43:42,060 But then all those names J through R, well, they get stored on this server 1752 01:43:42,060 --> 01:43:42,560 here. 1753 01:43:42,560 --> 01:43:46,620 And S through Z, they get stored on this third server here. 1754 01:43:46,620 --> 01:43:50,870 So our data has been split, or sharded, across multiple instances 1755 01:43:50,870 --> 01:43:53,090 of some database server. 1756 01:43:53,090 --> 01:43:56,660 And this is helpful because if I had a really large data set, 1757 01:43:56,660 --> 01:44:01,250 it now is smaller on every individual database server. 1758 01:44:01,250 --> 01:44:04,580 You could also organize your data by, let's say, primary key. 1759 01:44:04,580 --> 01:44:08,060 Maybe you have 3,000 rows, or 3,000 records. 1760 01:44:08,060 --> 01:44:12,380 And the first 1 through 1,000, those go on this first server. 1761 01:44:12,380 --> 01:44:17,060 The second, 1,001 to 2,000 inclusive, those go on the second server. 1762 01:44:17,060 --> 01:44:22,910 And 2,001 to 3,000 also inclusive, those go on that third server. 1763 01:44:22,910 --> 01:44:28,140 Now, there's a lot of thought that goes behind the best way to shard your data. 1764 01:44:28,140 --> 01:44:33,560 The goal is to really to avoid what we call a hotspot, wherein one database 1765 01:44:33,560 --> 01:44:38,910 server becomes more frequently accessed or requested than others. 1766 01:44:38,910 --> 01:44:43,190 Let's say for some reason the data in 1,001 to 2,000 1767 01:44:43,190 --> 01:44:45,680 gets accessed quite a lot. 1768 01:44:45,680 --> 01:44:49,820 Well, now we're actually trying to run into a problem 1769 01:44:49,820 --> 01:44:53,330 here where one server gets overloaded, the same problem we're 1770 01:44:53,330 --> 01:44:56,210 trying to guard against using sharding. 1771 01:44:56,210 --> 01:44:58,430 So it's important then to think about what's 1772 01:44:58,430 --> 01:45:01,970 the best, most logical way to distribute your data so you 1773 01:45:01,970 --> 01:45:07,730 don't have one server being accessed much more than another. 1774 01:45:07,730 --> 01:45:12,490 And there's another problem here too, which is if I'm not using replication, 1775 01:45:12,490 --> 01:45:18,130 I'm only using sharding, well, let's say that one server goes down. 1776 01:45:18,130 --> 01:45:20,390 What do we have now? 1777 01:45:20,390 --> 01:45:24,730 Maybe we're missing all the names that went to the middle of our partition. 1778 01:45:24,730 --> 01:45:27,310 Or maybe we're missing all of those rows, 1779 01:45:27,310 --> 01:45:30,400 let's see, 1,001 to 2,000 for instance. 1780 01:45:30,400 --> 01:45:34,580 And that would be what we call a single point of failure. 1781 01:45:34,580 --> 01:45:39,230 If one system goes down, our entire system is now not usable. 1782 01:45:39,230 --> 01:45:43,120 So something to keep in mind as you work on sharding is also 1783 01:45:43,120 --> 01:45:45,970 could combine it with some strategies from replication 1784 01:45:45,970 --> 01:45:50,680 to keep copies of your data across servers too. 1785 01:45:50,680 --> 01:45:55,780 So these are some features built into MySQL and Postgres for scaling. 1786 01:45:55,780 --> 01:45:58,570 But they also give us user accounts. 1787 01:45:58,570 --> 01:46:01,900 And we can perhaps enhance our security by taking advantage 1788 01:46:01,900 --> 01:46:04,130 of some of those user accounts. 1789 01:46:04,130 --> 01:46:07,720 So let's talk now about this idea of access control, 1790 01:46:07,720 --> 01:46:12,130 trying to have users and only grant some of them certain permissions. 1791 01:46:12,130 --> 01:46:18,500 So here we saw before I was logging into MySQL using the root user. 1792 01:46:18,500 --> 01:46:20,890 The root user is the admin. 1793 01:46:20,890 --> 01:46:23,740 But I could just as easily create a new user, 1794 01:46:23,740 --> 01:46:26,500 perhaps one called Carter for myself. 1795 01:46:26,500 --> 01:46:29,540 So I will open up MySQL again. 1796 01:46:29,540 --> 01:46:35,170 And in this case, I will create a new user, one named Carter. 1797 01:46:35,170 --> 01:46:38,540 So to do so, I can use this command here, 1798 01:46:38,540 --> 01:46:45,500 this statement here, create a user whose name is Carter in single quotes. 1799 01:46:45,500 --> 01:46:50,410 Now, I can say they are identified by the password. 1800 01:46:50,410 --> 01:46:55,270 And this password is literally password, P-A-S-S-W-O-R-D. 1801 01:46:55,270 --> 01:46:58,270 It is one of, unfortunately, the most popular passwords. 1802 01:46:58,270 --> 01:46:59,900 Do not use this password. 1803 01:46:59,900 --> 01:47:02,320 So I will hit Enter here. 1804 01:47:02,320 --> 01:47:05,080 And now I have created some new user named Carter. 1805 01:47:05,080 --> 01:47:09,080 And I can log in using that Carter username. 1806 01:47:09,080 --> 01:47:11,710 So I'll create myself a new terminal here. 1807 01:47:11,710 --> 01:47:16,460 And I will try to log in to MySQL again. 1808 01:47:16,460 --> 01:47:19,060 I will run this statement here. 1809 01:47:19,060 --> 01:47:24,850 I'll say MySQL, which is the command line interface for the MySQL server. 1810 01:47:24,850 --> 01:47:29,290 And I will say I want to connect using this user, Carter. 1811 01:47:29,290 --> 01:47:30,910 Before we used root. 1812 01:47:30,910 --> 01:47:32,830 But now I'll use Carter. 1813 01:47:32,830 --> 01:47:39,240 The host, the place the server is located, is 127.0.0.1, my own computer 1814 01:47:39,240 --> 01:47:39,740 here. 1815 01:47:39,740 --> 01:47:42,580 This is the IP for this very computer. 1816 01:47:42,580 --> 01:47:48,460 I can then say dash P, capital P, for the port number, 3306 by default. 1817 01:47:48,460 --> 01:47:52,550 And finally, dash P means prompt me for my password. 1818 01:47:52,550 --> 01:47:54,620 I'll type it in here. 1819 01:47:54,620 --> 01:47:55,720 So I'll hit Enter. 1820 01:47:55,720 --> 01:47:57,700 And now I get prompted for the password. 1821 01:47:57,700 --> 01:47:59,970 I'll type that in here. 1822 01:47:59,970 --> 01:48:02,228 And now-- oh wait, I typed in the wrong password. 1823 01:48:02,228 --> 01:48:03,270 So let me try this again. 1824 01:48:03,270 --> 01:48:04,260 I'll go back up. 1825 01:48:04,260 --> 01:48:08,730 And I will type in my password, which was password, hit Enter. 1826 01:48:08,730 --> 01:48:11,970 And now I'm logged in as Carter. 1827 01:48:11,970 --> 01:48:18,540 So by default, if I try something like this, show databases Enter, 1828 01:48:18,540 --> 01:48:20,850 I'll only see a few. 1829 01:48:20,850 --> 01:48:25,380 Like I'm pretty sure I had MBTA and MFA in here and a few others. 1830 01:48:25,380 --> 01:48:27,940 But I just can't see them. 1831 01:48:27,940 --> 01:48:30,840 So when I create this new user, by default 1832 01:48:30,840 --> 01:48:36,430 it has very few privileges, very few ways of accessing this database. 1833 01:48:36,430 --> 01:48:40,110 But I could decide to grant some privileges. 1834 01:48:40,110 --> 01:48:43,230 And just to show you, let me be root again for a second. 1835 01:48:43,230 --> 01:48:44,490 Here I'm the admin. 1836 01:48:44,490 --> 01:48:46,470 I can do most anything I want. 1837 01:48:46,470 --> 01:48:49,980 And notice how in here I have a new database. 1838 01:48:49,980 --> 01:48:52,320 I can type show databases Enter. 1839 01:48:52,320 --> 01:48:55,470 And I'll see I have a new database called 1840 01:48:55,470 --> 01:49:00,730 rideshare, which is similar from a prior week on viewing and security. 1841 01:49:00,730 --> 01:49:06,000 So if I say use rideshare semicolon Enter, 1842 01:49:06,000 --> 01:49:11,450 I could say select star from rideshare semicolon. 1843 01:49:11,450 --> 01:49:13,200 Whoops, sorry, the table wasn't rideshare. 1844 01:49:13,200 --> 01:49:15,960 Select star from rides. 1845 01:49:15,960 --> 01:49:17,490 Then I'll Enter here. 1846 01:49:17,490 --> 01:49:21,720 And now I'll see all of the rides in the rides table, which 1847 01:49:21,720 --> 01:49:24,240 is part of the rideshare database. 1848 01:49:24,240 --> 01:49:28,530 And if I look at this table, in our week on viewing, 1849 01:49:28,530 --> 01:49:36,120 we talked about eliminating some PII, personally identifiable information. 1850 01:49:36,120 --> 01:49:39,490 What was that PII here? 1851 01:49:39,490 --> 01:49:42,880 Seems it was the rider column, like the names of those riders. 1852 01:49:42,880 --> 01:49:45,370 We know where each rider is going. 1853 01:49:45,370 --> 01:49:48,490 But ideally we should make a new view, one 1854 01:49:48,490 --> 01:49:52,930 that allows us to see not the riders, but only their origins 1855 01:49:52,930 --> 01:49:59,230 and destinations, removing this PII of individuals' names. 1856 01:49:59,230 --> 01:50:02,680 So by default, Carter can't see this data. 1857 01:50:02,680 --> 01:50:07,330 But neither can Carter see the view I made called analysis. 1858 01:50:07,330 --> 01:50:13,270 If I go back to my computer here, I could type select star 1859 01:50:13,270 --> 01:50:17,050 from analysis, hit semicolon Enter. 1860 01:50:17,050 --> 01:50:21,190 And now I'll see the view I had made, the virtual table that 1861 01:50:21,190 --> 01:50:23,071 removes that PII. 1862 01:50:23,071 --> 01:50:25,000 But here I'm logged in as a root. 1863 01:50:25,000 --> 01:50:28,750 If Carter wants to see this, I go over here, show databases, 1864 01:50:28,750 --> 01:50:31,220 I can't even see the rideshare database. 1865 01:50:31,220 --> 01:50:38,260 So the root user needs to grant access to Carter to see this particular view. 1866 01:50:38,260 --> 01:50:42,820 Now, for that, I can use some syntax that looks a bit like this here. 1867 01:50:42,820 --> 01:50:46,390 I can use both grant and revoke. 1868 01:50:46,390 --> 01:50:51,670 Grant means to give some privilege, like select, or update, or delete 1869 01:50:51,670 --> 01:50:55,690 on some table to a particular user. 1870 01:50:55,690 --> 01:50:58,420 Revoke means simply to remove that permission. 1871 01:50:58,420 --> 01:51:02,740 If I gave you select before, I'll now revoke it down below like this. 1872 01:51:02,740 --> 01:51:04,810 And there are many privileges you can use. 1873 01:51:04,810 --> 01:51:08,020 You could use all, like everything, allow Carter 1874 01:51:08,020 --> 01:51:10,150 to do literally everything he could do. 1875 01:51:10,150 --> 01:51:13,550 There's create to create tables, and views, and so on. 1876 01:51:13,550 --> 01:51:18,400 There's insert to add data, select to read, and all the way down this list. 1877 01:51:18,400 --> 01:51:22,560 Look at documentation to figure out what you can use here. 1878 01:51:22,560 --> 01:51:27,440 So I want to give Carter it seems like a select privilege on the view I 1879 01:51:27,440 --> 01:51:30,780 created for analysis in this database. 1880 01:51:30,780 --> 01:51:32,120 So I'll come back over here. 1881 01:51:32,120 --> 01:51:33,740 And I'll try that. 1882 01:51:33,740 --> 01:51:37,190 I will log in as root here. 1883 01:51:37,190 --> 01:51:45,680 And I will try to grant the select permission 1884 01:51:45,680 --> 01:51:51,350 privilege on the rideshare database, and in particular, 1885 01:51:51,350 --> 01:51:54,320 the analysis table inside of it. 1886 01:51:54,320 --> 01:51:59,750 And I'll grant that to Carter, like this, semicolon and Enter. 1887 01:51:59,750 --> 01:52:03,050 Now I'll see the query is OK. 1888 01:52:03,050 --> 01:52:08,960 If I log back in as Carter down here, and now type show databases Enter, 1889 01:52:08,960 --> 01:52:12,540 I should see I can now see the rideshare database. 1890 01:52:12,540 --> 01:52:16,450 So I'll type use rideshare semicolon. 1891 01:52:16,450 --> 01:52:18,260 The database changed. 1892 01:52:18,260 --> 01:52:20,360 I could say show tables. 1893 01:52:20,360 --> 01:52:25,890 And what do I see but that virtual table, that view called analysis. 1894 01:52:25,890 --> 01:52:34,460 So now as Carter I can select star from analysis semicolon 1895 01:52:34,460 --> 01:52:37,010 and see that same data. 1896 01:52:37,010 --> 01:52:38,930 But I can't see the ride data. 1897 01:52:38,930 --> 01:52:43,640 If I say select star from rides, that table is currently hidden to me. 1898 01:52:43,640 --> 01:52:47,590 I get back, you don't have permission. 1899 01:52:47,590 --> 01:52:52,780 So this is the benefit of MySQL's users and privileges 1900 01:52:52,780 --> 01:52:56,500 to make sure I can have multiple users accessing this database 1901 01:52:56,500 --> 01:53:02,540 and then only allowing some to access confidential data. 1902 01:53:02,540 --> 01:53:06,760 So let me ask now what questions we have on access control 1903 01:53:06,760 --> 01:53:09,880 in MySQL and Postgres. 1904 01:53:09,880 --> 01:53:14,890 AUDIENCE: If we have to ask a test for everything, 1905 01:53:14,890 --> 01:53:17,470 is there any particular syntax for that? 1906 01:53:17,470 --> 01:53:21,473 Or should we type on all the select drive and everything? 1907 01:53:21,473 --> 01:53:23,140 CARTER ZENKE: Yeah, great question here. 1908 01:53:23,140 --> 01:53:27,130 Like if you want to give somebody multiple privileges, what could you do? 1909 01:53:27,130 --> 01:53:31,910 One approach is you could say grant and then say the privilege, 1910 01:53:31,910 --> 01:53:33,610 and then have a comma afterwards. 1911 01:53:33,610 --> 01:53:37,180 I could grant select comma insert comma et cetera, 1912 01:53:37,180 --> 01:53:42,100 including the tables that this user can insert, update, delete on, et cetera. 1913 01:53:42,100 --> 01:53:44,230 You could also use all. 1914 01:53:44,230 --> 01:53:48,790 Like if I say grant all on star dot star to Carter, 1915 01:53:48,790 --> 01:53:54,370 that means grant all permissions on everything to this user named Carter. 1916 01:53:54,370 --> 01:53:57,280 So you can absolutely do either of those approaches. 1917 01:53:57,280 --> 01:54:00,070 But best to be more fine grained particularly 1918 01:54:00,070 --> 01:54:03,650 in a production environment. 1919 01:54:03,650 --> 01:54:05,640 OK, great questions here. 1920 01:54:05,640 --> 01:54:07,280 Let's keep going. 1921 01:54:07,280 --> 01:54:12,890 So this then was how to use access controls. 1922 01:54:12,890 --> 01:54:18,440 But it's also worth thinking about ways we can prevent other kinds of attacks 1923 01:54:18,440 --> 01:54:21,260 and harden the security of our database. 1924 01:54:21,260 --> 01:54:26,490 Like here I have users and passwords, which works well. 1925 01:54:26,490 --> 01:54:30,860 But if I were to expose my application to the outside world, 1926 01:54:30,860 --> 01:54:35,210 I should also be concerned about a few other attacks here. 1927 01:54:35,210 --> 01:54:39,950 Now, one of these attacks is a SQL injection attack. 1928 01:54:39,950 --> 01:54:43,490 SQL injection means I have some query. 1929 01:54:43,490 --> 01:54:49,850 And somebody else adds in some malicious code to finish that query for me. 1930 01:54:49,850 --> 01:54:56,150 So for example, maybe I have a query like this, select ID from the users 1931 01:54:56,150 --> 01:55:00,230 table where the username equals someone's username 1932 01:55:00,230 --> 01:55:03,650 and the password equals somebody's password. 1933 01:55:03,650 --> 01:55:08,600 And if I find a user matching this username and password, 1934 01:55:08,600 --> 01:55:12,700 if I get back an ID, I'll log that user in. 1935 01:55:12,700 --> 01:55:17,460 Now, in a regular website of someone who's actually behaving nicely, 1936 01:55:17,460 --> 01:55:20,130 they could type in their username, like this. 1937 01:55:20,130 --> 01:55:22,900 And they could type in their password a bit like this. 1938 01:55:22,900 --> 01:55:24,317 This is kind of what I did before. 1939 01:55:24,317 --> 01:55:27,850 I logged in as Carter and gave the password of password. 1940 01:55:27,850 --> 01:55:32,010 And when you are entering your own username and your own password 1941 01:55:32,010 --> 01:55:35,880 on some website, something similar to this is likely happening. 1942 01:55:35,880 --> 01:55:40,020 There might be some SQL query that takes your username and takes your password 1943 01:55:40,020 --> 01:55:43,950 and inserts them into a SQL query to figure out, 1944 01:55:43,950 --> 01:55:46,680 are you who you say you are? 1945 01:55:46,680 --> 01:55:49,530 But let's think a bit more maliciously about this. 1946 01:55:49,530 --> 01:55:56,820 Like here, if I gave you permission to add anything to this dot dot dot here, 1947 01:55:56,820 --> 01:56:01,350 anything at all, what could you do to try to hack 1948 01:56:01,350 --> 01:56:04,860 this database, to log in as Carter? 1949 01:56:04,860 --> 01:56:08,970 Keep in mind that this is selecting an ID from users 1950 01:56:08,970 --> 01:56:12,650 where some condition is true. 1951 01:56:12,650 --> 01:56:17,740 What might you be able to add to this dot dot dot to log in as me? 1952 01:56:17,740 --> 01:56:21,010 AUDIENCE: Yes, you could use some kind of statements 1953 01:56:21,010 --> 01:56:27,820 like union, or [INAUDIBLE],, or delete something, et cetera. 1954 01:56:27,820 --> 01:56:31,000 CARTER ZENKE: Yeah, I could really put most anything I want in here. 1955 01:56:31,000 --> 01:56:35,680 But the SQL injection attack comes about when I deliberately put some SQL 1956 01:56:35,680 --> 01:56:39,340 statement that I think will give me access to some data I shouldn't 1957 01:56:39,340 --> 01:56:40,670 have access to. 1958 01:56:40,670 --> 01:56:43,690 So you could imagine maybe I type in my password. 1959 01:56:43,690 --> 01:56:50,170 But then I type in a single quote followed by or 1 equals 1. 1960 01:56:50,170 --> 01:56:51,350 Now, what would that do? 1961 01:56:51,350 --> 01:56:56,150 Well, notice that 1 equals 1, that condition will always be true. 1962 01:56:56,150 --> 01:56:58,540 And so this condition, where username equals 1963 01:56:58,540 --> 01:57:02,290 Carter and password equals password, well, in that case, 1964 01:57:02,290 --> 01:57:05,590 I have access to Carter's account because I 1965 01:57:05,590 --> 01:57:09,220 modified the condition just always be true whether or not 1966 01:57:09,220 --> 01:57:11,960 the password is correct. 1967 01:57:11,960 --> 01:57:13,840 Let's see one more example here. 1968 01:57:13,840 --> 01:57:16,510 Maybe I'm logging into my bank account. 1969 01:57:16,510 --> 01:57:18,520 And I want to check my balance. 1970 01:57:18,520 --> 01:57:22,630 We could maybe assume that the bank has some table of balances 1971 01:57:22,630 --> 01:57:25,210 or has some table of accounts that have balances. 1972 01:57:25,210 --> 01:57:28,990 And I could type in my account number, like this, one, two, three, four, 1973 01:57:28,990 --> 01:57:30,880 all the way up to number nine here. 1974 01:57:30,880 --> 01:57:35,110 And that will give me access to my account balance. 1975 01:57:35,110 --> 01:57:38,200 But now, let's take a look at this query here, 1976 01:57:38,200 --> 01:57:43,410 select balance from accounts where ID equals the account number. 1977 01:57:43,410 --> 01:57:49,680 What kinds of SQL statements could you include in this ID value here 1978 01:57:49,680 --> 01:57:54,060 to get access to everyone's account balances? 1979 01:57:54,060 --> 01:57:58,500 AUDIENCE: So we basically have a web which there's some ID. 1980 01:57:58,500 --> 01:58:05,520 If we just commented that part out, then it would be sort of a problem. 1981 01:58:05,520 --> 01:58:10,980 Or if we just did a similar example to the last time, like put the or 1 1982 01:58:10,980 --> 01:58:14,650 is equal to 1, that could also trigger it. 1983 01:58:14,650 --> 01:58:16,200 So there are multiple options. 1984 01:58:16,200 --> 01:58:17,470 CARTER ZENKE: Yeah, I like your thinking here. 1985 01:58:17,470 --> 01:58:19,380 There's definitely more than one option. 1986 01:58:19,380 --> 01:58:23,010 I like your thinking of maybe we include some comment 1987 01:58:23,010 --> 01:58:24,840 that kind of messes up this query. 1988 01:58:24,840 --> 01:58:29,310 Maybe it removes the condition so I just get select balance from accounts. 1989 01:58:29,310 --> 01:58:31,560 That could be one strategy. 1990 01:58:31,560 --> 01:58:36,210 I could maybe include an or that maybe makes sure 1991 01:58:36,210 --> 01:58:39,300 that this where is just always true, give me back all 1992 01:58:39,300 --> 01:58:41,160 balances from this account. 1993 01:58:41,160 --> 01:58:45,220 I could also use as well a union like this. 1994 01:58:45,220 --> 01:58:49,980 I could say make sure I add my own account ID but then union 1995 01:58:49,980 --> 01:58:54,180 some other query like this, union select ID from accounts. 1996 01:58:54,180 --> 01:58:58,230 Now I have all the account IDs from this table. 1997 01:58:58,230 --> 01:59:02,410 And I could then query them one at a time for their balance. 1998 01:59:02,410 --> 01:59:08,010 So let's try this out in MySQL to see some of the dangers of these injection 1999 01:59:08,010 --> 01:59:08,850 attacks. 2000 01:59:08,850 --> 01:59:11,670 I'll go back to my computer here. 2001 01:59:11,670 --> 01:59:16,320 And I'll show you how I already have a database set up for this. 2002 01:59:16,320 --> 01:59:23,280 If I type show databases, hit Enter, I see bank is one database here. 2003 01:59:23,280 --> 01:59:25,440 I want to use that database. 2004 01:59:25,440 --> 01:59:30,810 I'll say use bank and then hit Enter semicolon. 2005 01:59:30,810 --> 01:59:35,970 And now I can say show tables where I see an accounts table. 2006 01:59:35,970 --> 01:59:40,860 And I'll select star from accounts, Enter semicolon here, 2007 01:59:40,860 --> 01:59:45,970 to see some familiar users and some familiar balances. 2008 01:59:45,970 --> 01:59:49,170 So let's actually complete this injection 2009 01:59:49,170 --> 01:59:51,780 attack we just saw on the slides. 2010 01:59:51,780 --> 01:59:56,730 Maybe this bank has some feature where I can choose some ID. 2011 01:59:56,730 --> 02:00:00,700 I can enter in my own account ID to see my balance. 2012 02:00:00,700 --> 02:00:03,930 So on the website they complete this query for me. 2013 02:00:03,930 --> 02:00:06,390 I give them, let's say, the ID 1. 2014 02:00:06,390 --> 02:00:08,700 I am Alice in this case. 2015 02:00:08,700 --> 02:00:09,870 They might finish this. 2016 02:00:09,870 --> 02:00:15,450 They might say select star from accounts where the ID equals 1. 2017 02:00:15,450 --> 02:00:19,680 And they might automatically run this SQL query. 2018 02:00:19,680 --> 02:00:20,550 I'll hit Enter. 2019 02:00:20,550 --> 02:00:25,480 And now I'll see the balance for this account. 2020 02:00:25,480 --> 02:00:28,330 Well, now, what else could I do? 2021 02:00:28,330 --> 02:00:31,410 I could try to deliberately mess up this query 2022 02:00:31,410 --> 02:00:34,200 to run my own SQL injection attack. 2023 02:00:34,200 --> 02:00:38,190 I could select, let's say, star from accounts. 2024 02:00:38,190 --> 02:00:42,630 And then let me try where the ID equals 1. 2025 02:00:42,630 --> 02:00:43,710 I give them 1. 2026 02:00:43,710 --> 02:00:48,000 But I also give them some SQL statements they unwittingly execute. 2027 02:00:48,000 --> 02:00:56,080 I say union and then select star from accounts, a bit like this. 2028 02:00:56,080 --> 02:00:57,540 Now I'll hit Enter. 2029 02:00:57,540 --> 02:01:01,720 And if I complete this query, I see not just my own. 2030 02:01:01,720 --> 02:01:06,190 I see everyone's balances and their names too. 2031 02:01:06,190 --> 02:01:11,490 So this is one demo of how SQL injection attacks can show and leak 2032 02:01:11,490 --> 02:01:13,710 data that isn't intended to be leaked. 2033 02:01:13,710 --> 02:01:17,490 And sadly, these are fairly common today among people who just aren't 2034 02:01:17,490 --> 02:01:20,220 familiar with how to prevent them. 2035 02:01:20,220 --> 02:01:25,640 So let me ask now what questions we have on these attacks before we 2036 02:01:25,640 --> 02:01:28,590 get how to guard against them. 2037 02:01:28,590 --> 02:01:32,560 AUDIENCE: And just, can we use this in regular expressions also? 2038 02:01:32,560 --> 02:01:34,560 CARTER ZENKE: Could you use regular expressions? 2039 02:01:34,560 --> 02:01:37,100 So if you're familiar, a regular expression 2040 02:01:37,100 --> 02:01:40,670 is some pattern of characters that can match some text 2041 02:01:40,670 --> 02:01:43,250 and find values within some text. 2042 02:01:43,250 --> 02:01:45,770 I personally haven't seen regular expressions 2043 02:01:45,770 --> 02:01:47,850 being used for injection attacks. 2044 02:01:47,850 --> 02:01:49,350 But that doesn't mean they can't be. 2045 02:01:49,350 --> 02:01:51,830 And so always do your research on what you can and can't 2046 02:01:51,830 --> 02:01:54,420 use for an injection attack. 2047 02:01:54,420 --> 02:01:57,570 But a good question here. 2048 02:01:57,570 --> 02:01:59,840 So we've seen now SQL injection attacks. 2049 02:01:59,840 --> 02:02:01,640 But how do we prevent them? 2050 02:02:01,640 --> 02:02:06,200 Well one strategy is using what we call a prepared statement. 2051 02:02:06,200 --> 02:02:09,290 A prepared statement is kind of what it sounds like. 2052 02:02:09,290 --> 02:02:12,890 A prepared statement is a statement that we 2053 02:02:12,890 --> 02:02:16,730 will later on Insert some values into. 2054 02:02:16,730 --> 02:02:20,660 We might treat it like a fill in the blank, like here is my select 2055 02:02:20,660 --> 02:02:24,740 and I'll let you fill in what you want me to select. 2056 02:02:24,740 --> 02:02:30,148 But I'll make sure to clean your input before actually adding it in. 2057 02:02:30,148 --> 02:02:31,940 So let's think through a few of these here. 2058 02:02:31,940 --> 02:02:36,920 Here I have prepare name from statement. 2059 02:02:36,920 --> 02:02:43,520 This is a SQL statement I can use to prepare some statement, to take 2060 02:02:43,520 --> 02:02:47,630 some user input to make sure that when I add in that user input, 2061 02:02:47,630 --> 02:02:50,750 nothing malicious will happen. 2062 02:02:50,750 --> 02:02:53,930 And in particular, I can make sure the statement looks something 2063 02:02:53,930 --> 02:03:00,290 like this, select balance from accounts where ID equals question mark. 2064 02:03:00,290 --> 02:03:03,620 This question mark is just a question mark. 2065 02:03:03,620 --> 02:03:08,750 But syntactically it stands for the ability of this statement 2066 02:03:08,750 --> 02:03:14,300 to clean whatever I put inside of it to make sure it doesn't inadvertently 2067 02:03:14,300 --> 02:03:18,540 run SQL code I don't want it to run. 2068 02:03:18,540 --> 02:03:22,370 So let's try making our very own prepared statement to guard 2069 02:03:22,370 --> 02:03:25,100 against these kinds of insertions here. 2070 02:03:25,100 --> 02:03:27,410 I'll go back to my computer. 2071 02:03:27,410 --> 02:03:30,230 And let's try this one out. 2072 02:03:30,230 --> 02:03:33,200 I could go back to MySQL. 2073 02:03:33,200 --> 02:03:36,320 And let me prepare some statement. 2074 02:03:36,320 --> 02:03:39,380 I could say let's prepare a statement called 2075 02:03:39,380 --> 02:03:42,600 balance check, like we had before. 2076 02:03:42,600 --> 02:03:47,240 And now if I try to prepare this statement 2077 02:03:47,240 --> 02:03:55,730 from the following, select star from accounts where the ID equals 2078 02:03:55,730 --> 02:03:59,000 question mark end quote semicolon. 2079 02:03:59,000 --> 02:04:04,820 So in single quotes I have the query I'm trying to prepare to take user input. 2080 02:04:04,820 --> 02:04:07,790 And I'm giving it some name, like balance check. 2081 02:04:07,790 --> 02:04:09,680 So I'll hit Enter now. 2082 02:04:09,680 --> 02:04:13,370 And I'll see the query is OK, statement prepared. 2083 02:04:13,370 --> 02:04:18,860 Now if I want to run this statement to check on somebody's balance, 2084 02:04:18,860 --> 02:04:21,590 I can go through a few steps. 2085 02:04:21,590 --> 02:04:24,260 I could first set my own variable. 2086 02:04:24,260 --> 02:04:27,080 Like let's say I get some user input from the user. 2087 02:04:27,080 --> 02:04:28,490 They type in 1. 2088 02:04:28,490 --> 02:04:36,020 Underneath the hood in SQL, I could say set this variable called ID equal to 1. 2089 02:04:36,020 --> 02:04:41,390 This at symbol, which is a convention for variables in MySQL. 2090 02:04:41,390 --> 02:04:43,010 Now I'll hit Enter. 2091 02:04:43,010 --> 02:04:49,310 So the user input is stored separately in some variable not inside of MySQL 2092 02:04:49,310 --> 02:04:51,660 query, like we saw before. 2093 02:04:51,660 --> 02:04:53,390 Now I can execute. 2094 02:04:53,390 --> 02:05:00,260 I can say execute, in this case, the balance check statement using 2095 02:05:00,260 --> 02:05:03,830 that same variable at ID. 2096 02:05:03,830 --> 02:05:07,130 So that question mark I had before I will then 2097 02:05:07,130 --> 02:05:11,270 take whatever the user typed in and clean it, and then 2098 02:05:11,270 --> 02:05:15,140 substitute in that value into this. 2099 02:05:15,140 --> 02:05:17,300 So I'll say using ID. 2100 02:05:17,300 --> 02:05:21,290 And now I get back Alice's balance. 2101 02:05:21,290 --> 02:05:26,060 And just to be clear here, let me try to make this deliberately malicious. 2102 02:05:26,060 --> 02:05:28,470 I want to see everyone's accounts. 2103 02:05:28,470 --> 02:05:33,170 Well, we saw before that I completed this query by typing in-- 2104 02:05:33,170 --> 02:05:40,340 let's see-- set at ID equal to 1 but then union select star from accounts. 2105 02:05:40,340 --> 02:05:45,350 This was what I added to that query to show me everyone's account balances. 2106 02:05:45,350 --> 02:05:47,390 And I'll try to do the same here. 2107 02:05:47,390 --> 02:05:50,570 I'll say this is what I'm trying to substitute 2108 02:05:50,570 --> 02:05:52,850 into this prepared statement. 2109 02:05:52,850 --> 02:05:54,080 I'll hit Enter. 2110 02:05:54,080 --> 02:05:56,180 Now I'll execute again. 2111 02:05:56,180 --> 02:05:59,840 I'll say execute balance check using at ID, 2112 02:05:59,840 --> 02:06:04,500 which has an ID, but also some malicious SQL statements. 2113 02:06:04,500 --> 02:06:07,880 Now if I hit Enter, what do I see? 2114 02:06:07,880 --> 02:06:10,310 Only Alice's balance. 2115 02:06:10,310 --> 02:06:14,240 So it seems that the prepared statement cleans up my input, 2116 02:06:14,240 --> 02:06:19,650 ensures that if I have SQL statements they don't get run. 2117 02:06:19,650 --> 02:06:25,400 So we've seen now SQL injection attacks and how to guard against them. 2118 02:06:25,400 --> 02:06:27,590 What questions do we have? 2119 02:06:27,590 --> 02:06:30,230 AUDIENCE: So for these previous examples you've shown us, 2120 02:06:30,230 --> 02:06:32,900 the [INAUDIBLE] statements only take into account 2121 02:06:32,900 --> 02:06:36,440 the first acceptable condition? 2122 02:06:36,440 --> 02:06:37,373 Or how does it work? 2123 02:06:37,373 --> 02:06:38,790 CARTER ZENKE: Yeah, good question. 2124 02:06:38,790 --> 02:06:43,320 So here, let me show you again the ID that I had put inside of here. 2125 02:06:43,320 --> 02:06:49,310 So if I say select at ID, like that, I'll see this is the value of ID, 2126 02:06:49,310 --> 02:06:52,850 one union select star from accounts. 2127 02:06:52,850 --> 02:06:54,920 And what happens is this prepared statement 2128 02:06:54,920 --> 02:06:57,460 does a process called escaping. 2129 02:06:57,460 --> 02:07:02,410 It finds all of the SQL keywords, all of the values that could be malicious 2130 02:07:02,410 --> 02:07:05,900 and escapes them so they don't actually get executed. 2131 02:07:05,900 --> 02:07:08,170 So it seems like in this case what actually 2132 02:07:08,170 --> 02:07:11,350 happened was if we type in the query again balance check 2133 02:07:11,350 --> 02:07:17,140 was select star from accounts where ID equals 1. 2134 02:07:17,140 --> 02:07:19,990 It seems like it just did the equals 1 part 2135 02:07:19,990 --> 02:07:24,430 to give me back, in this case, the account balance while cleaning up 2136 02:07:24,430 --> 02:07:27,390 the rest of it overall. 2137 02:07:27,390 --> 02:07:30,250 OK, let's take one more question here. 2138 02:07:30,250 --> 02:07:33,300 AUDIENCE: So is this the same reason why we 2139 02:07:33,300 --> 02:07:39,000 shouldn't use a formatted string in Python to execute a SQL query? 2140 02:07:39,000 --> 02:07:42,030 CARTER ZENKE: Yeah, so if you're familiar with Python format strings, 2141 02:07:42,030 --> 02:07:44,460 it's a similar kind of pitfall, which is you 2142 02:07:44,460 --> 02:07:49,690 can add in any possible value for this variable you're substituting into. 2143 02:07:49,690 --> 02:07:52,570 And that can lead to some unintended consequences. 2144 02:07:52,570 --> 02:07:55,710 So similarly, should we create prepared statements 2145 02:07:55,710 --> 02:07:58,920 that can actually clean and escape the data 2146 02:07:58,920 --> 02:08:02,610 that we might want to insert into our queries or our statements over there 2147 02:08:02,610 --> 02:08:03,810 too. 2148 02:08:03,810 --> 02:08:06,660 Good question. 2149 02:08:06,660 --> 02:08:09,930 OK, so we've seen here SQL injection attacks 2150 02:08:09,930 --> 02:08:13,170 and also how to guard against them. 2151 02:08:13,170 --> 02:08:16,200 And actually, by now you're pretty well equipped 2152 02:08:16,200 --> 02:08:19,140 to go off into the world of databases. 2153 02:08:19,140 --> 02:08:22,110 And even though this is our last week, I hope you think back 2154 02:08:22,110 --> 02:08:24,480 on how far you have come. 2155 02:08:24,480 --> 02:08:30,270 So you began by querying a single table, a table of books in this case. 2156 02:08:30,270 --> 02:08:35,700 You then graduated into tables that were related of books, and authors, 2157 02:08:35,700 --> 02:08:39,300 and publishers, and so on, and querying those as well. 2158 02:08:39,300 --> 02:08:43,950 Afterwards, you saw how to design your own databases, learning about types 2159 02:08:43,950 --> 02:08:47,730 and create table statements to build something up from scratch. 2160 02:08:47,730 --> 02:08:52,420 Finally, you saw how to write, how to add data to those databases 2161 02:08:52,420 --> 02:08:53,710 you had designed. 2162 02:08:53,710 --> 02:08:55,110 And then you kept going. 2163 02:08:55,110 --> 02:08:59,670 You saw how to view databases, how to see them in a more simplified light, 2164 02:08:59,670 --> 02:09:03,000 how to optimize those queries, and finally today 2165 02:09:03,000 --> 02:09:08,230 how to scale up all of this so you can serve more and more users. 2166 02:09:08,230 --> 02:09:12,400 So I hope after all these weeks that you are proud of how much you've grown. 2167 02:09:12,400 --> 02:09:14,100 We are certainly proud of you all too. 2168 02:09:14,100 --> 02:09:16,410 Now go off and continue your learning. 2169 02:09:16,410 --> 02:09:18,980 We'll see you later. 2170 02:09:18,980 --> 02:09:21,000