[MUSIC PLAYING] CARTER ZENKE: OK, well, hello, one and all, and welcome back to CS50 Introduction to Databases with SQL. My name is Carter Zenke, and this week is the culmination of all you've learned over the past several weeks in the course. Today, we'll give the tools to all that you've done in the past, but now just at a bigger scale. 

Now, when we're talking about scale, it's worth defining what scale is and what it means for some application to have scalability. For an application to be scalable or to have scalability, it means it can take some number of requests and handle those, whether that request is small or large. And so to kick things off here, let me ask a question. What are some famous tech companies you've seen that have needed to scale over time? Let's brainstorm here. Which ones are you familiar with that needed to scale over time to handle more users, more data? What have you seen? 

AUDIENCE: Any social media sites, such as Facebook? 

CARTER ZENKE: Yeah, any social media sites. You can think of things like Facebook, now Meta, Instagram, and similar. And these kinds of sites might have some really large number of users that are always accessing their data. But even though they have a constantly large number of users, there might even be spikes in the number of requests that they get. Maybe during the World Cup, for example, everyone is on Facebook, everyone's on Instagram. That would be a spike our application should be able to handle, if it is truly scalable. Let's take one more idea here. What other kinds of companies-- tech, prominently-- have needed to scale? 

AUDIENCE: So I think for banking systems, they might start locally, but if we consider national and international level banking? 

CARTER ZENKE: So you're right. A banking system might need to scale as they gain more users, or more accounts, or they want to process more transactions. Now, today, we'll introduce you to a few new DBMSes, or database management systems, like MySQL and PostgreSQL-- or just Postgres for short. And developers often reach to these when they want to scale some application or some database because, whereas SQLite was a embedded database, MySQL and Postgres, these are database servers. That means they often run on their own dedicated hardware that you connect to over the internet to run your SQL queries. 

And now, being a database server, they have a few advantages. They can store data not just on disk, like on an SSD or a hard drive, but in RAM, or Random Access Memory, which often allows for faster querying. They also have a full feature set to take advantage of ways you could scale, like replication and sharding, that we'll talk about later on today. Now, before then, we'll actually compare MySQL and Postgres with SQLite, paying attention, in particular, to types. But later on, we'll focus on their features for scaling as well. 

Now, we'll do all of this in the context of the MBTA, or the Massachusetts Bay Transportation Authority. This is the subway system here in Boston. And if you're ever in town and you want to get around, you might use one of these subway lines. You could enter under the ground, get on a train, and go across town on either, let's say, the red line, on watch Harvard and MIT are a part, the green line, the orange line, or the blue line. And if you recall, in our week on designing, we talked about how to represent all of the riders and stations on the MBTA here. 

Now, one thing we discussed was that the MBTA doesn't keep track of individual people. You often don't register your name with them. You, instead, get your very own CharlieCard that you might swipe when you enter some station. Often, if I'm getting on the Harvard Square stop, I'll take out my CharlieCard and enter that station. I'll swipe it against some counter, and that will log my fare. I could also swipe my card to deposit some funds, to pay my fare in the future. And in some stops, I might actually swipe to leave the station and pay some extra fare there. 

Now, we decided, then, to represent, in this case, cards, swipes, and stations with their own table for each of them. Cards just had their own unique ID, and those cards made some swipe. And like we just mentioned, this swipe has a certain type. Did I swipe it to, let's say, enter the station, did I swipe it to exit the station, or did I swipe my card to deposit some funds into my account? 

And each swipe has, as well, its own date and time it occurred as well as some amount associated with it. Did I pay some money to enter, or did I deposit some money to actually add to the funds in my account? And then, finally, this all happens at individual stations, with their own IDs, their own names, like the Harvard Square stop or the Kendall/MIT stop, and their own line, like the red, green, blue, or orange lines. So our goal will be to first compare MySQL with SQLite, seeing how we can translate a schema like this from SQLite to MySQL. 

OK, let me come back to my computer here and open up our very first instance of a MySQL server. So to connect to this server here, I can run some command that begins with mysql. This is a command line program I can run to connect to the server over the internet. Now, unlike SQLite, MySQL actually has individual user accounts to log into the database. And here, I want to log in as the root user or the admin for this database. 

Now, to say the user I want to log in as, I can type dash u, where u is in lowercase here, followed by the name of the user I'm logging in as-- in this case, root. And again, root to just some synonym for the admin of some database. Now, I connect to this database over the internet. And I might often include the IP address of the computer, the unique identifier of that computer on the internet. For my sake, though, this database is on my very own computer here. So for the host-- or I type dash h, where h is in lowercase-- I can say 127.0.0.1, which, if you're familiar, is the IP address for my own home computer, also known as local host. 

Now, I can keep going, and I could tell MySQL what the port I'm connecting to is. In this case, I'll type dash p, where p is an uppercase, and then the port number-- 3306 by default. And then I want to be prompted for the root password. So I'll type dash p, where p is lowercase. And finally, I'll hit Enter to connect. I'll type in my password, and I should see that I'm connected to my MySQL server. 

Now, this is a full database server. And as such, it could have multiple databases inside of it. In fact, there are a few here already. If I type SHOW DATABASES, just like this, you should see, by default, on your own MySQL installation on your own server, a few default databases, like information schema, or MySQL, or performance schema, or system. And these are tables or databases that have some meta information on the server, like the user accounts, how performance is configured, and other details that you might care about if you're a system administrator. So we've seen here how to log in to a MySQL server and some of these default databases. Now let me ask, what questions do we have on MySQL so far? 

AUDIENCE: What's -u root? 

CARTER ZENKE: Yeah, let me go back to my computer. I can show you what that looked like so we can refresh our memories on that. So if you recall-- let me open up a new terminal here-- to connect to this server I typed mysql u root, or -u root. And what this means is when I want to connect to my server, I have to connect to it using some user account. 

Unlike SQL, where I could just kind of log in without any users, MySQL has users. So the -u says, here comes the user I want to log in as. And root, in this case, is a name for the admin, just some geek historical term for the admin of some system. I could also very well use -u carter if my username was just Carter. But here, we're using route to get access to some administrative privileges. OK, good question. Good to clarify that. 

So let's keep going. And right now, we have access to our database server, with the databases inside of it. But our goal is to design the MBTA system and create our very own database, now in MySQL. So let me try creating some new database. I'll say CREATE DATABASE, like this. And now I'll give it some name. And in MySQL, I don't use double quotes, like I did in SQLite. I use back ticks, like this. CREATE DATABASE, back tick, mbta for title, back tick again to close that all out. 

Now I'll hit Enter. And I'll see the query goes, OK, one row affected. If I now type SHOW DATABASES-- SHOW DATABASES-- I should see that MBTA is now part of this server. And if I want to run my future queries on this database, I could use another command, another statement here, USE. I'll say USE mbta, semicolon, hit Enter. And I see Database changed. So now all my future queries will be run on this MBTA database that I just made. 

So presently, our database is empty. And it's up to us to add in some tables. So let's think through how we could add tables to our very own database we've just made here. But one thing we should focus on is this idea of the types for our columns. I think we know already what tables we'll have and what names we'll give them, but it remains to be seen what kind of columns should we add. 

And actually, in MySQL, we get a lot more control over types. We no longer have just text and numeric and integer. We have a lot more control and a lot more variety. So let's look here at our cards table. We want to recreate this now in MySQL. And this is from SQLite here. Recall how we had a table called cards that had an id column, with the type affinity of INTEGER. And that was our primary key down here. So our goal is now to think about how do we translate this from SQL right to MySQL. 

So we should think, what kind of types might MySQL have for integers? Well, it turns out there's more than just one. There are actually several here. We have tiny int, small int, medium int, int, and then big int, so lots of choices over our integers. And these vary just based on their size. I mean, they all store whole numbers. But they do vary based on their size. 

So a tiny int is only a single byte. It's about 8 bits. But a big int, that's 8 bytes, or 64 bits. Now, why do we have so many of these? Why can't we just have one? Well, let me ask you all to use your own intuition. I mean, what does using more bytes and more bits get me when I'm trying to represent an integer? 

MySQL basically, it depends on the need of the DB or the organization. If the data requires much more storage, then it means we need to have a bigger data type. But if currently if we need smaller sizes, then it means we need to use smaller data types. 

CARTER ZENKE: Yeah, so I like your thinking. And I think kind of behind that is this idea that if we used more bits, we'd be able to count higher. Just as I have on my hands, like 10 fingers, if I had more fingers, I could count higher in unitary notation, right? Or if I had more digits, I could count higher using decimal. So it's the same thing in binary here, where I could count higher the more bits that I have. 

So if I take a look at this table here, each data type has a range of values that it could possibly store. If I'm talking about both negative and positive values, here's the range for each of these data types. You could see for a tiny int we can only count up to 127. And that is it. We cannot count higher using a single byte. 

But if I used a big int, well, I could count all the way up to 2 to the 63rd power minus 1. That is almost unpronounceably big. It's something like a quintillion number of options that you might have between these two minimum values and the maximum values here. So all trade off what you want to represent. 

But our goal here is to translate a primary key. A primary key, it isn't negative. So we could perhaps use something called an unsigned integer. An unsigned integer has no negative value, starts at 0, but can then go to a higher positive value because we're now using some of those combinations of ones and zeros that represented negative numbers now for positive numbers too. 

So here an int can go up to almost 4.2 billion values, between 0 and 4.2 billion. So an int is already pretty darn big. So with these in mind, tiny int, small int, medium int, int, and big int, let's go back to our SQLite CREATE TABLE statement. And let me actually ask you all, what would you substitute in place here? We no longer have integer for our primary key. But which of these do you think seems the best substitute for our primary key on cards, tiny, small, medium, int, or big int? 

AUDIENCE: So in case we know that it's not going to be really huge, we can just use an int. But if it's going to be very large, like say a timestamp, we could use big int. 

CARTER ZENKE: Yeah, I like your thinking. So if it's big but not too big, maybe an int would be good. If we know for sure though it's going to be huge, well, a big int is probably best. And let me offer some intuition here. So if there are fewer than four billion people on this Earth, which I believe is a true fact, and we're using one of these IDs to represent one card, well, if everyone on Earth visited Boston and got their own CharlieCard, we wouldn't run out of values using an int here. 

So it seems like that's a pretty safe bet, unless we knew that everyone in Boston was going to have 10 CharlieCards, or 20, or more, then we could use a big int. But in this case, an int seems pretty safe. So let's update this. Let's go from SQLite now to MySQL. So here I had the ID column with type affinity integer. And my first step in converting should probably be to get rid of these double quotes. 

I no longer need to use double quotes. I should use back ticks as a matter of style in MySQL. But now I should update this type from integer perhaps to a plain old int. As we discussed before, some intuition for that is we'll probably never have more than four billion CharlieCards in our table. 

Now, there's one more thing to add too, which is this, auto increment. So SQLite for free gave us a primary key that if I inserted some value and didn't provide a value for the primary key, SQLite would generate that for me. But in MySQL, I have to actually say auto increment. If I add some new row, make sure you take the highest primary key right now, increase it by 1, and assign it to that new value. 

So you'll often see primary keys being defined with the name ID, and the type int, and the auto increment property of that column there. So let me ask now, what questions do we have on these integers and types so far? 

AUDIENCE: For ID we have using integer type. Why we know that ID cannot be negative it will always start from one and go onward. So we should use it and [INAUDIBLE]. 

CARTER ZENKE: Nice. So here we didn't include it partly for slides sake. But you could if you wanted to make sure that the ID is unsigned. So unsigned means going from 0 all the way up to some larger number. Whereas not unsigned, or just assigned integer, goes from negative to some positive value. Absolutely right. 

OK, let's keep going then. And let's think through our next table here. So our next table is not just the cards table. It's also the stations table. And actually, before we do that, let's actually try to add in our cards table to our server. So I'll go to MySQL here. And now, let me try to create this table to store it inside of our server before we move on to stations. 

So I'll say create table cards. And then I'll follow some very similar syntax from what you've seen before, create table cards. Now, let me add an ID column, int, and auto increment like this. And now, I'll make the primary key ID, like this. Now, I'll close out my create table with a semicolon, hit Enter. And I should see the query is OK. 

Now, if I use the MBTA database, as I did before, I typed use MBTA, I could show tables, like this, semicolon, hit Enter. And now I'll see I have a cards table inside the MBTA database. So kind of like, for example, typing dot schema or dot tables, in this instance, in SQLite to see what tables I have. 

Now though, let me try this. Let me try describe cards. Describe is a bit more like dot schema to show me the schema of this table. I'll describe cards. And I'll see a few different columns here. Let me come over to this board here. And we'll see that this table, cards, has one column, one field, called ID. The type of this column is an integer, as we said it was before in our create table statement. 

Now, this null here, this is whether this column can accept a null value, yes or no. Yes means it can accept a null value. It can hold null. No means it should never hold a null value. So this ID should never include a null value. Now, the key, key column tells us what role this column plays in terms of the keys of this table. PRI here, P-R-I, stands for primary key. This is the primary key of our table. 

Default means the initial value given to any value. If we inserted some row with no value, we'd get back this default value here, null. And the extra is what we included auto increment. It should increase by one every time we add some new row. So using describe and show tables, you can actually see what you're creating along the way. 

So now, let's keep going. Let's actually focus on creating our stations table. And for that, we'll pull up our SQLite syntax we use to create that table. So here, here was what we used to create the stations table in SQLite. We gave each station an ID that was an integer, the primary key of this table. We then gave them each a name and a line. Like the name could be the Harvard Square Station or the Kendall MIT station. And the line could be red, green, blue, whatever line this station is a part of. 

And of course, the primary key is our ID column, as we just said. Now here, we've seen how to handle this integer type. But we haven't yet seen how to handle text. Like could we do better than this in MySQL compared to SQLite? It turns out that we can. We have a variety of options for representing strings, basically this collection of text here. 

And two of the more popular ones are going to be char and varchar. So char is a fixed width string, a string that only has and will ever have the same number of characters. You could think of, for example, in the US, we abbreviate state names using two letters, like MA from Massachusetts, or VA for Virginia. You might do the same thing with countries as well. 

Now, that would be useful for a char because this char will always hold only two characters. But often you're not sure how long the string is. You're going to add to your column. Like let's say you're trying to store names of people. Well, my name is Carter. But yours could be longer or shorter. So a varchar works well for that scenario, where a varchar can accept a variable length string up to some number of characters. 

And when you use these, MySQL will ask you to provide how much space this type should take up using M here. So I could say char parentheses 2 to get back a fixed width string of two characters. For varchar I could say varchar and put in the middle here like, let's say, 16, or 32, or 64. And that would be the maximum size of the string that I could represent using varchar. 

So good here for strings that are going to be a bit shorter in length. What if though you want to store the text of a book or something a bit longer than that? Well, you could use another type, this one called text. And this is different than SQLite's text. In SQLite's text I could store short strings, long strings, et cetera. In MySQL text is good for longer chunks of text, paragraphs, pages of books, and so on. 

And you actually have some options for the length or the size of this text. You could use tiny text, medium text, or long text. Each just uses some variable amount of space on your computer to store, in this case, characters that are going to be longer than any particular string, like a name, and more like some page in a book. 

Now, besides text we do have our old friend blob. And blob still stores binary exactly as you give it. And more interestingly though, we also have enum and set. And these are brand new. These are not in SQLite. But they're very useful for us. If you remember back in our week on designing, we introduced constraints. Maybe a constraint is that this column should only have certain values. 

Like let's say if we're calculating t-shirt sizes, I should only have xs for extra small, s for small, m for medium. There's some fixed range of values this column could hold. Well, I could use enum for that, where enum lets me enumerate the possible options I could put into this column, like xs, s, m if I'm doing t-shirt sizes, or something else altogether. Now, set complements enum. 

Enum means I can only have one of these options. If I had xs, x, s, m and so on for t-shirt sizes, I can only choose one. But a set will allow me to choose more than one to store in that same cell. You could think of, for example, movie genres. A movie isn't just a horror, or a fiction. Or it isn't just comedy. It could be all of those together. 

And so set would allow you to choose not just one option from your list, but multiple and put it inside of that single cell there. So let's go back to our SQLite create table statement. And let's update this for MySQL. Well, the first thing you do is what? Change these double quotes to back ticks. So I'll do that here. 

And now, let's update this ID. We saw before we could use regular old int and auto increment, like this. But now we have some more choice. Like for this name field that should store the name of a station, let me actually ask you all, what do you think is the best type to use? I'm just wanting the name of some station. 

AUDIENCE: It would be a varchar simply because you don't know how long the station name actually is. 

CARTER ZENKE: Yeah, a good point. So we're always creating new stations. And maybe some station name is pretty long. Or maybe it's short. We can't really tell, right? So a varchar is good if we have some variable amount of characters inside the text we're trying to store. And notice how a station name isn't paragraphs of text. It's just maybe a few words. That's good for a varchar. 

And in general here I could say, well, let's try to use a varchar and say we could use up to 32 characters including spaces. I think that will cover us pretty well for names of stations. Now let's talk about the line. Now, in Boston, there are a few lines. There's the blue line, the green line, the red line, even the silver line. But what type should we use for storing lines then that these stations are a part of? 

AUDIENCE: In this case, I believe that there can be two options, where it would be a small integer number and point it to another table storing the colors. Another one will be limited char taking into account that perhaps the longest color name will be like aquamarine, or something like that, and truncate it just to that size will be another option as well. 

CARTER ZENKE: Yeah, I love your thinking through multiple options here. And indeed, there's usually more than one option for any given type of a table here. So for this one, I mean, you could actually have line be a foreign key and reference some other table, to your point, of those colors. So line might then be an integer. 

We could also make line another varchar. Maybe this one's a little shorter. Like it's always going to be just blue, green, red, or so on, so a pretty short string, but still variable length nonetheless. So you could say varchar maybe 9, for instance, if we're representing a color here. One other option though exists, one that kind of constrains us, but that might still be a good fit. 

AUDIENCE: Maybe it would be good to use a set. 

CARTER ZENKE: To use set. You could use set as well. So you could say that a station might be on the blue line and the green line as exists in the real world. You could certainly use set for that. For our purposes, I might just go ahead and simplify things and just say enum. A station will only be on one line at a time for simplicity. 

But if you wanted a station on more than one line, you could use set as well. And in this enum or in this set we could include the possible lines that currently exist in the MBTA. So great reasoning through some different data types here. Let's now build this into our MySQL server here. 

I'll come back. And I will create this table. So we saw before I could simply use create table. I could say create table. And then I could say give it a name called stations. And inside I'll make sure to include-- let me indent four spaces here, one, two, three, four-- give it an ID, which has the type integer and auto increment. Then afterwards, we also want to include the name of our station, which could be a varchar up to size 32. It could include up to 32 characters. 

And this should also be not null and unique. Those same kind of column constraints, they're still here in MySQL. Now I could also include the line column. And for simplicity, I could just assume every station is part of only one line. So I'll say enum here. And I'll choose blue, and green, and orange, and red. These are four lines for the subway that exist here in Boston. 

And I'll make sure this has to have a value. It can't be null. Put that not null constraint here. Then I'll say my primary key is the ID column. And I will go ahead and close this out, hit semicolon and Enter. And I'll see query OK. So now, if I show tables, Enter, I'll see both cards and stations. And if I were to describe stations, what do I see here? 

Let's take a look. I'll come back over here. And I'll see I have three columns, ID, name, and line. And each of these has their very own type, as I see in this column of my describes output. I see here that the type of ID is an integer. The type of name is a varchar 32. And the type of line is an enum. Blue, green, orange, or red are the possible values for that column. 

Now, none of these columns can have a null value inside of them. But here's the interesting part. I see ID has a key, PRI, primary. That means ID is my primary key. But I also see uni, where uni stands for unique. Now, we saw before this name column. I applied a unique constraint, meaning there can only be a given station with a unique name. I can't duplicate station names. 

And then of course, over here we also saw the extra column auto increment that I applied to the ID column here. So let me ask. We've seen now some new types for text, some new types for enum, and set, and so on. What questions do we have on these new types and how we've used them? 

AUDIENCE: I was wondering if it would be perhaps possible to include a table as the argument to enum? 

CARTER ZENKE: Oh, a good question. I actually am not sure. I could imagine the possibility of using a nested select for that. But I mean, let's reason through this here. If I had a table that was growing, would I want to use those table results in my create table statement? I would argue probably not because then things are going to change over time. So probably worth just enumerating one at a time explicitly what might go into that enum. 

But I encourage you to try that out too and see what happens. Let's take one more. 

AUDIENCE: Here's the problem for a database if I, for example, I have no idea how long could be the name of the station. And if I write a varchar, for example, 100 or 300. 

CARTER ZENKE: Yeah, so maybe you're thinking of a scenario where you don't know how long it's going to be. So you'd be kind of extra safe. You'd say like varchar 300, for instance, just to accommodate anything you could put in there. That is OK. But the trade off is you're then using up to 300 bytes every time you store a smaller string. Like if I use varchar 300 and I only ever use, let's say, like 10 characters on my longest string, well, I'm wasting some space. 

So it's worthwhile to think through beforehand, what should I actually use here? And something to keep in mind is you can always expand a varchar using MySQL's alter table statement. So we'll see that in just a bit, how to use alter table. But you could always expand if you need to. OK, good questions here. 

Let's keep going then. And our next table is going to be the table that represents swipes. We have cards and stations. But now we have to represent swipes at those stations. Well, let's come back to our schema for swipes in SQLite. Here, I had this table called swipes that had some columns up top and some primary foreign keys down below. But the important part for us now is going to be these three columns, type, date time, and amount. 

And remember that on the MBTA, I can take my CharlieCard. I can make a swipe when I enter, exit, or deposit some money at a station. That's that type column here. It could be either enter, exit, or deposit. Now, date time is the time and date I swipe that card to enter, exit, or deposit some funds. And amount is the amount in dollars associated with that transaction. 

Now, the fare nowadays is like $2.40. Back in the '50s it was maybe like a dime or so. It changes over time. So it's worth keeping track of in our table here. So let's think through some options that MySQL gives us. Among them are options for dates and time. So whereas in SQLite we had numeric for dates and times, in MySQL we actually have explicit date and time types. 

For instance, we have all these here, date, which is just a date, like the year, the month, the day, time, which is the actual time, like hours, minutes, and seconds. We then have date time, which combines those, timestamp, which is similar but used with a bit more precision often for logging events on your own software, and then year, which is just 2023, or 2020 for instance, in a single column. 

Now, you can get more particular about how precise you want to be with these times. For time, date time, and timestamp, I can specify how many decimal digits I want to keep track of after the decimal point. Do I want to keep track of tenths of seconds, hundredths, or thousandths? I could specify that using this (fsp). In this case, I could say one, two, three, four, up to I believe six to figure out how precisely do I want to keep track of time, where more digits means more precision. 

So I could use these for dates and times. That could get us through part of our table migration here. But I should also think about how to represent real numbers, that is numbers with a decimal point in them. Now, you might think back to SQLite and remember the real data type. And we have some similar options here. We have float and double precision, where the only difference is float stores a number in size 4 bytes. Whereas, double precision stores that number in 8 bytes. 

And you might be thinking, well, if they both store a decimal number, a real number, why would I choose one over the other? Or why would I use more space when float does the same thing for me in fewer bytes? Well, it turns out that in computer science, we run into this problem of floating point imprecision, where there are an infinite number of real numbers. 

Like if I asked you to find me all of the decimal numbers between 1 and 2, you would be here for literally an eternity. And if we have so many numbers to represent, but so few bits, it goes to say that we can only represent some of them and not all of them. So at the end of the day, floating point numbers, decimals, real numbers, will be imprecisely represented. 

Now, a double precision could get us more of the way there. We could use more bits to more precisely represent some number, which you might use if you're a scientist, someone who wants to keep track of things very precisely. If you don't care as much though, you could use a float and just use a little bit less space. 

Now thankfully, this trade off isn't quite so salient in MySQL because you also get access to fixed precision types, where fixed precision means I can specify I want to always have two places after the decimal precisely represented. And one type for this will be the decimal type, where I can use a decimal and then say, I want to store a certain number of digits with some number coming after the decimal point. 

Now, for example, I could say decimal 5 comma 1. And I could store any number between negative 999.99 up to 999.99 so long as there are only two places after the decimal here. Now, I could increase this. I could say I want to go beyond this range. I want to have more digits. I could increase this first parameter here from five, let's say, to six. And now I have a wider range. I have 9,999 all the way down to negative 9,999, in both cases 0.992 decimal digits after this decimal point. 

Now, I can keep going. I could say 7 and get an even bigger range. But if I want to present something precisely after the decimal point, what could I do? I could change this one too from 2, let's say, to 3 or even to 4. I could keep going. But notice how my range shrinks. What I get in return is just more digits after that decimal point precisely represented. I don't have to worry about floating point imprecision, so long as I'm using decimal and telling MySQL how many digits I want and how many come after the decimal point. 

OK, so let's try this here. I want to migrate this table from SQLite to MySQL. The first thing to do, change these quotes from double quotes to back ticks. I could then update my primary key, and so on. But let's focus now on this type. This type column should only have some set of values, like enter, exit, or deposit. And when you see that, you could think I should probably use an enum to only store some data types, some values that are inside of this list. 

So I'll make this now an enum. But the question then becomes, what should I use for date time? What type have we seen so far that could be useful for storing the date and time I swiped my CharlieCard? 

AUDIENCE: Yes, I believe we can use that date time data type. 

CARTER ZENKE: I could use date time, so kind of self-explanatory here. I mean, the column is called date time. And coincidentally, MySQL also has a type called date time that stores both a date year, year, year, mm, dd, and then the hour. Along with that, I could say date time here to get both day and time. 

Now, the next one, amount. Now, it's worth noting I want to be precise with this amount. Like if I'm going to sum up many, many amounts, many fares, I want it to be accurate. So I don't want to deal with, let's say, float or double precision. I want to specify how precise I should be. Let me ask, what data type might be good for storing the amount of some given fare? 

AUDIENCE: I believe we can use the decimal data type. 

CARTER ZENKE: Yeah, you could absolutely use the decimal data type here. So decimal was part of MySQL's fixed precision kinds of types. So we could use that here to precisely represent the amount column here. I could change this from numeric in SQLite to decimal now, where decimal I could say takes five possible digits, like five total. And two now could come after the decimal point. 

So I could have amounts or fares that go up to hundreds of dollars, which often isn't going to happen. But I could be safe here if I need to be. OK, let's go ahead and combine this and put this inside of our MySQL server here. I'll come back to my computer here. And my goal is to run this create table statement to create this table on our server. I'll say create a table called swipes, like this. 

And I'll include a few columns we didn't show on the slides just for simplicity. All first include the ID column with type int. And I also want to make sure this auto increments as I add to it. I'll then go on. And I will also include a card ID column that has a type int. And then I'll include a station ID column, which also has type int. 

So now, I can see which card ID is swiped at which station. Now, when they swipe here at the MBTA, swipe can mean a few different things. You can have a different type. One type we might care about is the enter type, when I swipe to enter some station. I could also swipe to exit some station or to deposit some funds into my account. And a type should never be null. It has to be not null constraint here. 

Now, a swipe happens at some particular date and time. So I'll include the date time column and say that this is of type date time. I want to make sure this column always has some value. So it's not null. And the default when I insert some new row will be this special value here, current timestamp, the current date and time. Now I'll go on. And I'll include the amount of some swipe. I'll hit space. And then I'll say amount here. 

This is the amount associated with that swipe. Was I charged money? Was I adding money? Here I'll say I want this to be a decimal type because you said we want this to be precise. And decimal is a fixed precision type. I'll say I want five decimal digits with two after the decimal point. Now, decimal should be not null. And I'll include our same check constraint. I'll check that amount with back ticks here does not equal 0. And if that's the case, I'll be able to insert some new swipe. 

Now, I'll assign my table constraints, like primary keys and foreign keys. I'll say my primary key is the ID column, like this. And then if I hit Enter again, my foreign key, in this case card ID, should reference, it references cards table and the ID column in cards, like this. So that card ID column references the ID column in the cards table. Now, I'll include my last foreign key. 

The foreign key station ID references the station's table and the ID column in stations. And now to quickly review to make sure I typed everything in correctly here, so we have the ID column, the card ID column, station ID, type of enum, date time, amount. I think we'll be good here. So if I hit Enter and then use parentheses semicolon, I should see query OK zero rows affected. 

Now, if I type, in this case, show tables Enter, I'll see I have a new swipes table. And if I type describe swipes, like this, I should see some familiar output. If I go back to my screen over here, we can take a look. We have again, our fields, ID, card ID, and so on, types, int, enum, date time, decimal. Some can have null values, some should not. 

We have a primary key here. But now the interesting thing is this M-U-L. Well, if I look at card ID and station ID, what do we call those two columns? What kind of constraints did I apply to card ID and station ID? 

AUDIENCE: So we uses the foreign key constraints, the foreign key. So it's referencing the station and the card [INAUDIBLE]. 

CARTER ZENKE: You're exactly right. We applied the foreign key constraint to the card ID and station ID columns. So this M-U-L here is referring to the fact that these are foreign keys. And M-U-L stands for something like multiple or multi. It means I could have, in this case, multiple of the same value because it is a foreign key in this table. 

Now, the default here we now see current timestamp. When I insert some value into this table, insert some rows, sorry, into this table and don't supply a value for the date time column, I'll instead make that default value the current date and the current time. And here we see some extras, like auto increment that we saw before, and now default generated, meaning this is a generated column that will automatically create some value for me when I insert some new row. 

OK, so let me ask, what questions do we have then on these new types, dates and times, and more? 

AUDIENCE: Yes, I wanted to know the precedence in which the attributes, the specific constraints, the attributes are added. So is it in the format of the describe table? In the describe table, there's a setting format in which the data is the priority in which you do it? Let's say an integer before. Let's say a unique constraint before. And not null. What's the precedent? And secondly, is there any type affinity in MySQL? 

CARTER ZENKE: Yeah, both good questions. I'll answer the first one first. So the first one is, what's the precedence, I believe, of these constraints we're applying? If I look at this output of describe swipes here, I see that I applied some constraints. Among them are the foreign key constraint, the primary key constraint. Some I made sure that I couldn't have some null value in here. 

In effect, they all combine to work for me here. So when I made that create table statement, I applied multiple constraints, like not null and unique. And I could put those in any order. MySQL is designed so I can do it in any order and get the same result forwards or backwards. When I read this table, I might just read it left to right here and kind of get a feel for which constraints are being applied in no particular order, in this case. 

AUDIENCE: Yeah, the second question has to do with type affinity. With SQLite we had something like the type affinity where a particular value is appropriate for a particular role. So is there anything like that in MySQL? 

CARTER ZENKE: Yes, there is. So in SQLite, you're right, we had type affinities, where we defined a table and said that this column should store values of this type. We're doing the very same thing in MySQL when we say ID of type int or the name of a station of type varchar 16, for instance. One difference though is that SQLite will allow you to add in some value that doesn't actually have the type of that given column. 

So if I had a SQLite column of type text or of type, let's say, type numeric and I added some text, it would try to convert. But if it couldn't, it would actually just store text. MySQL is more strict than that. It will make sure you have the proper type when you insert some new rows, at least by default. OK, great questions here. 

Let's keep going and show some more features of MySQL. So one of them is the ability to alter tables in a more fundamental way. So let me zoom back in here. In SQLite we had some options to alter tables. But at the end of the day, they just weren't all that powerful. With MySQL, though, we're actually able to alter tables a little bit more fundamentally. 

So let's say that the MBTA adds in a new line, a new subway line. And actually, the silver line, if you can see it here, is currently a bus line. But it was planned to be a subway line. So if I wanted to add in a subway line called the silver line, well, I can't currently do that because my type for the lines is an enum. It only has red, green, blue, and orange, not silver. 

So I should actually edit this create table to now be able to include the silver line as a possible value among that enum. So I'll come back over here. And I'll introduce you to MySQL's alter table syntax. So in MySQL we have still alter table, similar to SQLite. They're all based on the SQL standard. But now we have modify, where modify can take some column definition and change it on the fly. 

So I'll go to MySQL here. And let me describe again the stations table. I'll hit Enter now. And we'll see that the line is an enum. And it has a few options, blue, green, orange, and red. But now I want to add silver here. So what could I do? I could do this. I could use alter table. And I could say, I want to alter table. I want to alter the stations table. And I want to modify. I want to modify the line column to be this. 

I want it to be an enum with these options, blue, green, orange, red, and silver, like this. Now I have more options in my enum. I could also include not null too, hit Enter. And I should see-- oops, I think I might have forgotten a quote somewhere. I believe I did. Let me fix this. I will finish the quote, hit semicolon. And let me show you how I noticed that. 

If I go back to my screen here, see in MySQL, it's kind of helpful. If I finish a line and I haven't closed this quote here, like I did in red, it will alert me on this next line, with this single quote here saying, hey, you didn't close that thing that you probably thought you were going to. So let me restart. I'll come back over here. And I will try this again. 

I will alter table stations. And then I will modify the line column, as we said before, to be an enum with a few different values, blue, green, orange, red, closing both quotes now, and then silver, again closing both quotes. And let me double check. I believe we're all set. I'll then say not null, semicolon, Enter. And I'll see the query is OK. 

Now if I do describe stations, like this, semicolon, and zoom out just a bit, I see I've added to my possible values for this enum. I have blue, green, orange, red, and silver. So this is one example of using alter table to adjust a column definition. To an earlier question, you could also use this to expand the size of varchars or to change your table all together. You can do so a bit more powerfully than you can with SQLite, just because MySQL decided to focus on this particular feature. 

So let me ask now what questions we have on altering tables in MySQL. 

AUDIENCE: This is generally regarding to MySQL and Postgres. Can we use those applications instead of v score [INAUDIBLE] that user interface? 

CARTER ZENKE: Yeah, a good question. So I think you're asking, what would be the difference, or how could I use MySQL and Postgres compared with VS Code? In effect, they're kind of different use cases. So VS Code, as you might know, is an IDE, Integrated Development Environment. It allows you to write code in a file, to run that code, and to edit it and see it all in one place. Whereas MySQL and PostgreSQL, those allow you to write SQL statements on a command line, talking to some underlying server. So a bit different. 

You could, for example, use VS Code and connect to a MySQL or PostgreSQL server to run your SQL queries you're creating in VS Code. Good question there for workflows. OK, so what we'll do is take a quick break here. And we'll come back and talk about stored procedures, some functions we can define to run over, and over, and over again. We'll see you in a few. 

OK, and we're back. Now we'll next talk about stored procedures, which are a way to automate the running of SQL statements over, and over, and over again. This is useful as your scaling because if you have some process you often call, you can put it inside of a stored procedure and run it all the more simply. 

So to demonstrate stored procedures, what we'll do is go back to our example of the Museum of Fine Arts, one of Boston's oldest museums, also called the MFA. And if you remember, in the MFA, we had a certain number of tables, where one of these tables was a collections table. It is the items the MFA has in their collection, the items they put on display, for instance. Here we had a few pieces, Farmers Working at Dawn, Imaginative Landscape, and so on. 

Each had their own ID. And they also had a deleted column to implement what we call a soft delete, where to delete something we wouldn't remove the row. We would instead just mark this deleted column from a 0 to a 1, sort of hiding it or marking it as having been deleted to keep data around, not removing it all together. 

Now to find all of the items that weren't marked as deleted, we could use this query, select star from collections where deleted equals 0, at least in SQLite. Now, we then created a view. We said, maybe I want to just always be able to view those items that have not been marked as deleted, one table called current collections, or one virtual table, a view called current collections. 

If I were to update Farmers Working at Dawn to be marked as deleted, what would happen? This deleted column would become a 1. And I would not see it in current collection. So our goal then is to re-implement something like this, but now not using a view. In this case, using a stored procedure, we could simply call to find all the items that have not been marked as deleted. 

So to create a stored procedure, we have some syntax we can use. And that syntax looks a bit like this. We could say create procedure and give it some name. Well, this procedure we could call maybe current collections, being able to view what is inside our collection and not marked as deleted. We could then, similar to a trigger, say begin after which we say the SQL statements we want to run as part of this procedure. 

So begin, we then type the SQL statements we want to run, and finally say end, this is the end of our procedure. And then later on, we could use a statement called call to call this procedure over and over again. So let's work on creating our very first stored procedure to find all the items in our collection that have not been deleted. 

Well, I'll go back to my MySQL server. And I'll show you a new database I've made here. I can type show databases semicolon. And now I should see I have not just the default MySQL databases. I also have the MFA database. So if I type use MFA semicolon, I should see database changed. Now all my future queries will run on this MFA database. 

And so I could type show tables to see all the tables inside of this database. Hit Enter. And I'll see I have artists, collections, and created. I could even if I wanted to describe the collections table, describe collections semicolon. And now I'll see the description for this collections table here. I have an ID, title, accession number column, and an acquired column too. 

But what am I missing? I want to be able to implement soft deletes. If you were to look at this describe output, what am I missing if I want to implement soft deletes? 

AUDIENCE: You're missing the deleted column. 

CARTER ZENKE: Yeah, the deleted column is not yet in this table. I know because I use describe here. So I don't have deleted yet. But I can add it using alter table. So I'll come back to my computer. And let me use alter table to add in this particular column. I could say alter table collections, like this. And then I could use add column. 

Modify was good for modifying a column that already existed. Add column is good for adding a brand new column. I'll call this one deleted. And its type will actually be a tiny int, a tiny int. Why? Well, it's only ever going to be 1 or 0. I should use as few bits as possible. And tiny int gives that to me. Now I'll say the default value is 0 for not deleted. And I'll hit Enter. 

And now if I describe collections again, I should see that I'm back to having a deleted column of type tiny int with a default value of 0. OK, so now, we saw I could select everything from collections that's not deleted by saying something like select star from collections where deleted equals 0. I can implement the same thing here, but now using a stored procedure, something I could call again, and again, and again to speed up this process just a little bit. 

So I'll say I want to create this procedure, current collections. But there's something a little interesting about the MySQL syntax here. If you remember, if I go back to some slides here, in the middle of begin and end, we're going to put our very own SQL statements to run as part of this procedure. Well, to end a SQL statement, we have to have a semicolon. 

But MySQL might actually prematurely end this statement. If I type quit procedure name begin some statement then a semicolon, it'll say, OK, you're done. But I'm actually not. I might want to add more statements. I might want to conclude with this end down here. So a workaround is to temporarily declare a different delimiter. Instead of ending my queries with a semicolon here, I'll end them with, let's say, slash slash. And we'll see that in just a moment here. 

I'll come back to my computer. And for the sake of writing this procedure, I'll change my delimiter, that is what icon represents the end of my query to slash slash, like this. Then I'll hit Enter. Now I can start creating my procedure. I'll say create procedure. And I'll call this one current collections with a open and closed parentheses after this. This is common for names here. 

Then I'll say begin this procedure. And in this procedure, I want to select the title, accession number, and acquired columns. I want to do it from the collections table, where deleted equals 0. And now I can safely include a semicolon because my delimiter for this entire query is a slash slash. So this is the end of this particular query inside my procedure. It's also the end of my procedure in general. 

And now to declare the end of this procedure, the end of this overarching statement, I could say slash slash instead of semicolon. Now, if I hit Enter, I'll see query OK. And now I should change back to, in this case, semicolon, so I can get back to what I'm used to using to end my queries. I'll hit Enter. And now let's see what we can do. 

I could say call current collection. I think we called it collection, singular. And I'll say this, semicolon there. Now, I should see all of the items that are marked as not deleted. So using call, can I then call this procedure I've made and stored? Let's try deleting something. I'll try maybe deleting Farmers Working at Dawn. I could say update, in this case, collections and set deleted equal to 1 where the title equals Farmers Working at Dawn, semicolon. 

I'll Enter again. I'll see one row has changed. If I call current collection again, like this, semicolon Enter, I'll see that particular artwork is now gone. It's not returned to me by the query that was part of my procedure. So this then is one example of a stored procedure, one way of saving a query and calling it again and again if I might use it very often. 

But what questions do we have so far on stored procedures and how we can use them? 

AUDIENCE: I have a question. Just like in other languages, do we have the provisioning that we can pass the parameters and arguments in the function, in the procedure as well, in MySQL? And secondly, can we call other procedures also in one procedure, like in other languages we have one function call from other? 

CARTER ZENKE: A great question about the power of stored procedures. So the first question, could I add parameters or arguments to this procedure? And could I change what I do based on that input? The answer is absolutely yes, you can in MySQL. And I'll show you that in just a minute here. The next one, which is, can I call other procedures inside my procedure? Yes to that as well. 

Any statement you might write, you could probably put in a procedure too. OK, let's take one more. 

AUDIENCE: Yes, I have a question. There is any way that I can add any note or comment to the table? 

CARTER ZENKE: Can you add any note or comment to your table? A good intuition, like it's good to include notes to yourself or to include notes to others. To my knowledge you would best do that in something like a schema dot SQL file. If you have a text editor, you could make a dot SQL file and store your schema inside of that. At which point, you could leave comments to other developers to make them know what you are thinking as you were building this. 

There might be a way to leave comments in MySQL. But I don't personally know what that is. You could do some research to look that up too. OK, great question. Let's keep going then. And to our question about procedures and trying to make sure they could have inputs, let's see that as well. Well, we saw too that I might want to mark some particular item as sold as soon as I mark it as being deleted. 

So for that I'll need a new table here. If I type show tables semicolon, I'll see artists, collections, and created. But back in our week on the MFA, we also had a table to keep track of transactions, when we might buy and sell some piece of artwork. So let's recreate that table here. I'll create a table called transactions, create table transactions, like this. And then I'll give it an ID column with the type int and auto increment. 

Then I'll give it a title to mark as being sold or bought. Let's say the title is a varchar of size up to 64. And it can't be null. Then I'll say the action, which can be either bought or sold-- did we buy or sell this piece of artwork with this particular title? So I'll say this is an enum with the possibility of bought or sold. Then I'll say this is not null. And I'll then say the primary key of this table is ID. And I'll close out this create table statement here hitting Enter. 

Now I can see I have a transactions table to log the buying and selling of pieces of artwork. Well, it stands to reason that when I update my collections table and I set some item to be deleted from a 0 to a 1 in that deleted column, I want to add it to the transactions table and also mark it as sold. So normally, that is two steps, updating the table and also adding it to transactions. 

With a procedure though, I could give that entire sequence one name. So let's try that here. I'll say I want to make my delimiter slash slash. I'm going to create some procedure here. I'm going to end it with slash slash. So I'll say create procedure. And I'll call this one sell, like this. But now instead of open parentheses and closed parentheses, I actually want to give some input to this function. 

And let me actually ask you all here, what kind of input should I give to this sell function, so I can identify the painting I want to sell? 

AUDIENCE: It should be with a primary key and refer to the name of the painting probably. 

CARTER ZENKE: Yeah, I like your thinking. We could think about what identifies this artwork. Maybe it's the ID or the title. In this case, I like your intuition for the ID, which is unique, will never change. So let's actually use the ID here of an item in the collections table to mark it as sold. 

So I'll come back to my computer. And I'll say that sell actually takes an input. And I can say in here to note a value as an input. I'll say the input is something called sold ID, the item we're selling. And the type of this is an int, like this. So now I have a procedure that takes in some value in integer called sold ID. 

Now, I'll say let's begin this procedure. Let's begin down below. And the first thing I should do is update collections, like this, and set deleted equal to 1 under what condition? Well, perhaps under the condition where the ID is equal to our input, which is called sold ID. So I'm using the input of sold ID to find in collections which artwork I'm marking now as deleted. 

But the next step-- I'll say semicolon here to end that step-- is to then insert it into the transactions table. I've marked it as being deleted. But now I need to insert into transactions. I'll say then I want to insert into transactions in the title and action columns some values here. Well, what values? 

The first value is actually the title of this artwork. And I only have the ID right now. But I could use the ID to get back the title. I could say write a subquery that is select, in this case, title from collections where the ID equals sold, sold ID like this. That is a subquery that gives me back the title of this artwork. 

Now, I'll also log sold here, like this. And then I think that's the end for that particular query there. Let me just double check this, insert into transactions, title, action, value, select title from collections where ID equals sold ID, sold. OK, I think we're good. I'll hit Enter on that. And then I think I can safely end my procedure. I'll say end here slash slash, Enter again. I see everything was OK. 

So now if I change my delimiter back to a semicolon, I can call sell on some particular ID. Like let's say I now want to sell the particular item. I could say maybe select star from collections to see what I have. And now I want to sell Imaginative Landscape, which has the ID of 2. Well, I could try call sell and now pass in that value 2, call sell 2. And if I now hit Enter, I should see some rows were affected. This procedure worked. 

Let me check. I'll say select star from collections semicolon. And I'll see that piece now has been updated. Its value was 0. But now it's 1 for the deleted column. Now if I say select star from transactions semicolon, I'll then see Imaginative Landscape being sold here as well. So good check here. And I'll actually just check on this. So I think the ID was 2. And that actually makes sense here. 

So the ID we're thinking about [INAUDIBLE] and is different here. So this is the ID of the transaction, which is different than the idea of the painting, which is 2, just to clarify that. OK, so what other questions then do we have on stored procedures? We've seen some without inputs, now some with inputs. What else are you wondering? 

AUDIENCE: Is it possible to pass in multiple inputs to a stored procedure? 

CARTER ZENKE: Yeah, could it have multiple inputs? It absolutely could separated by some commas inside of those parentheses. So you could imagine in sold ID int comma some other input as well. And a procedure can also output some value. You could say I want to store the result in some variable I could keep track of in the server too. 

Let's keep going then. And we'll kind of end stored procedures on this note here, which is I seem to-- if I go back to what I had discussed before, I was able to update some item in my collections table and then add it to the transactions table too. So if I say select star from collections, there is some danger here. 

Like if I try to sell something that has already been sold, I might run the procedure and mark something as being sold twice. Now, I'll actually leave this up to you as your own exercise. But I want to give you a few tools to think about for writing your own procedures. Among them are some constructs you might be familiar with if you've programmed before. 

So here you might notice a stored procedure can actually be compatible with some familiar programming syntax, like conditions, for example, if, else if, and else, and also loops, like the ability to do something more than once or to repeat something over, and over, and over again. So you can use these to really build up your own stored procedures to combine statements and automate processes that you're going to do very frequently on your database at large scale. 

So we've seen here some options for stored procedures. What we'll do is come back and talk next about Postgres. See you in a few. And we're back. So we've seen so far MySQL, which is a new DBMS that [INAUDIBLE] just a bit more than SQLite at a larger scale. It's worth also taking a brief detour into Postgres just to get a taste for what it looks like and get an understanding for how it can represent some of these same tables in a different environment. 

So we'll talk here about Postgres. And our goal will be to do the same thing we did with MySQL, converting our tables from SQLite over to Postgres taking advantage of some new types and some new affordances it might offer that are particular to Postgres itself. So here then is our table we used for cards in SQLite, just to remind you again. We had an ID column of type integer. And that was our primary key. 

What we saw in MySQL, we had a variety of integers. And here we actually have the same thing in Postgres. But they just look a little bit slightly different. So here I have a table of integers in Postgres, the signed and signed values over here going from negative to positive. So unlike MySQL, which had tiny int, medium int, and so on, in Postgres we only have small int, int, and big int, just a little bit fewer options because the developers thought you might not need a tiny int or medium int, just give you three particular options. 

Now although there are fewer integer options, we also get something a little bit different. I can also use a serial type, small serial, serial, or big serial. Now, a serial is still an integer, a whole number. But it actually does what we try to do in MySQL for me, which was try to include an auto increment attribute. A serial will automatically increment for me, making it good for primary keys. 

So here, if I go back to my cards table, and I had this ID column of type integer, I could update that to now be a serial, where a serial means some whole integer number that will increase as I add more and more data to my table. Now, what else do you notice here? Well, it seems like these double quotes are back in Postgres. 

So in MySQL you had back ticks. In SQLite you had double quotes. Well, in Postgres you have double quotes for your identifiers here. So this then is how we've changed our cards table. And now let's actually do this inside of Postgres. I'll log in and create my very own database and my own table. So I'll come back to my computer here. And now let me do this. 

I will log in to Postgres using this command right here. I'm going to open PSQL, which is essentially the command line interface, like MySQL, but now just for Postgres. I'll hit Enter here. And I'll log in as the default Postgres user. This is the admin for Postgres now. I'll type in my password. And I'll see a prompt. It looks just a little bit different, but the same kind of spirit here. 

If I want to list all databases, I don't type show databases. I just type backslash l for list. And I'll Enter here. And now I'll see a collection of default databases that come with the Postgres installation. Similar to MySQL, these are databases that contain configuration information, metadata on this installation of Postgres. But now I want to create my own database, one called MBTA. 

So I'll say create database, same as before, and call it MBTA now using double quotes. If I hit Enter now, I should see create database, confirming this statement. And if I type backslash l I'll now see MBTA is included in my list of databases. So if I want to connect to a database, in MySQL I used use. In Postgres I'll use backslash C. 

So I'll say backslash C and then MBTA, hit Enter. And now I'm connected to the database MBTA as this user whose name is Postgres. And now if I want to see what tables I already have, I could type backslash dt, hit Enter. And I'll see no relations, no tables, but just another way of figuring out this is an empty database. 

So we've seen so far how to log in to Postgres and some basic navigational statements we can use to work our way around this database. But it turns out there is some stuff that's familiar here. Like if I want to create a new table, I can do it the very same way I did it before. I could use create table. I could say create table, let's say, cards. And then let me include, in this case, an ID column of type serial, which means an integer that auto increments for me. 

Then I'll apply the primary key constraint, familiar old friend here. Then I'll close out cards, like this, hit Enter. And I should see it confirms I created this table. If I now type slash dt to list my tables, I'll see I have a table called cards. And if I want to look at this table in particular to understand its schema, I could type slash d and the name of that table, cards, like this, hit Enter. And now I should see some more information on this cards table. 

It has a column called ID, a type called integer, and some more information like we saw in MySQL, but now just in a different format. So not to worry. If these commands seem maybe arcane to you at first glance, you'll get used to it with some practice if you choose to learn either MySQL or Postgres here. 

So let me pause before we go any further and just ask, what questions so far do we have on Postgres, if any? 

AUDIENCE: What's the sort of thing that indicates there's an error? 

CARTER ZENKE: A good question. So similar to SQLite or MySQL, you'll know you have an error when you hit Enter and it doesn't say either query OK or ptu, that command you just used. It might give you a particular error, like maybe you forgot a comma, maybe you forgot a quote somewhere. It should of give you an idea of what you didn't do correctly with the syntax. 

The best way to see it is to try it and figure out what it tells you as you go. OK, let's keep going then. And we'll see a few differences between Postgres and MySQL too. So we have our cards table. But our other goal is to create a stations table. So let me pull up what we did for this one. Let me find stations here. 

So this was our stations table with Postgres, or sorry, with SQLite. We had an ID, a name, and a line column, integer, text, and text for those types. Now we've seen in Postgres I could change this integer for my primary key to be a serial. That's a little more apt for this use case here. And it turns out that in Postgres I also have varchars I could use inside of this stations table. 

I could make name a varchar here up to size 32 characters. And for simplicity, though we used an enum earlier, I'll now use a varchar for line as well just to keep things simple. So here we've updated our stations table using what Postgres might offer us. The real differences though come not in our stations table but actually inside of our other table, this one called swipes. 

So this here was our swipes table, or some section of it. We had IDs up top and primary key and foreign key constraints down below. Now remember, we had type of text, date time of numeric, and amount of numeric as well. Well, Postgres offers some other types we could use to update these as well. Among them are enum, our old friend again. But we use enum a little bit differently. 

So instead of including enum inside of our column definition, we actually create our very own type. Postgres supports something like this. Create a type and give it some name. And then I could say what that type actually is. So here I've created some type and called it swipe type and then said, this type can have the following values, enter, exit, or deposit. 

And I can then use this name, swipe type, as my very own type in future create table statements, so something that Postgres has that MySQL might not in this form. Let's consider then dates and times. So dates and times, largely similar. I have timestamp, date, time, and interval, a particular type representing the distance between times, how long did something take? But we still have our old friend's date and time for dates and times respectively, and also timestamp to combine these two into one. 

And then we also have precision with these. I could say I want to represent to the hundreds place, the thousands place, or so on for each of timestamp, time, and interval. Continuing on, we could also work with real numbers, which are largely the same amenities between MySQL and Postgres with a few differences. 

One is that at least in Postgres we call decimal numeric. So numeric in this case can specify some precision, that is the number of digits, and the scale, that is the number that come after the decimal point. And newer to Postgres is a type particularly for money, which depending on your settings in your database will put to you either a certain number of decimal points and include perhaps a dollar sign, or a euro sign, or a pound sign, whatever it is you're using and whatever it is you've configured in your database. 

So let's update now. Let's go ahead and improve this statement in swipes from SQLite to Postgres. Here, I had type. Well, that could now be our very own swipe type. Notice here I defined up above as an enum. Now I'm going to use it down below in my swipes table. This type column has the type swipe type. And it cannot be null. 

Then I'll keep going. And I'll make datetime just timestamp. Timestamp is the newer version in Postgres of what MySQL had called date time and what SQLite called numeric. And then instead of, let's say, current timestamp, Postgres just uses this function called now, the current timestamp it will get for as you add some new row. And then finally, amount, we could use money. We could use numeric. 

We'll stick in this case to just numeric, like we did for MySQL and say numeric up to five digits, two of which come after the decimal point. So a whirlwind tour of some differences in types between MySQL and Postgres. What questions do we have on these? I'm not seeing too many here. Let me keep going. 

I'll come back to my computer here. So it's worth showing that we're able to create a table. We can then show it and describe it like this. But then the question remains, how do you get out of Postgres? Well, for that you can use-- let me clear my screen-- simply slash q. Whereas in MySQL you could use quit. In Postgres it's slash q. 

So I'll hit Enter on this. And we'll say goodbye to Postgres for now. But after the break, we'll come back and keep working with MySQL and focus on actually what features we can use to scale up our applications. What does MySQL and Postgres offer to serve more users, more frequently? Back in a few. 

And we're back. So we've gotten a taste so far of both MySQL and Postgres. But now it's time to think through the features each offers to scale up our application. So let's envision a sample application here, one that starts off pretty small. We have users trying to read from this database about 100 times per minute. 

But they're also writing to the database, adding new data about 50 times per minute. And maybe this application over time it grows a little bit. Maybe instead of 100 reads, it goes up to 1,000 and multiplies by 10, and same thing for our writes. They also multiply on average by about 10. And this application, it gets even more popular, increases by 10-fold again. Now we have 10,000 reads per minute, 5,000 writes per minute. And then finally, it keeps growing another 10-fold. And now we're at 100,000 reads per minute and 50,000 writes per minute. 

What might be the problem? If I had a single server, like this, why might my application start to slow down? 

AUDIENCE: So yeah, I would like to say that whenever the requests are increasing with the time to which the server would reply will also increase. And the list would seem to grow on and on. 

CARTER ZENKE: Yeah, I like your thinking there. So you can think of the time that each query takes. As we saw in our optimizing lecture, we could time our queries. And if I have many, many, many of those queries running all at one time, I mean that could add up substantially. So it's worth thinking about whether one computer can really handle all of these reads and all of these writes. 

So if you find yourself writing some application and it behaves a little bit slower than you want it to, particularly at scale, you have a few options. And I want to ask you just to use your own intuition here. Like if you encountered this problem of your application slowing down, what might you try to do to solve that problem? 

AUDIENCE: You could try to optimize your queries. 

CARTER ZENKE: Yeah, try to optimize your queries, make sure they take a little less time than before. Or you could even try to throw more hardware at the problem. Like maybe you have optimized your queries to the maximum you think you can. And it's just still not fast enough. Well, one approach might be what's called vertical scaling, trying to make this single server all the more quicker at replying to your queries. 

So vertical scaling is about increasing your application's capacity by increasing the capacity of a single server, the computing power of that server. So whereas this was our regular, old server, you could imagine trying to vertically scale this server just making it more powerful, optimize your queries and have them run on this server faster than they did before. 

But there's also another approach you could take. Here we focused on one server. But what else could you logically do to just have more resources at your disposal? 

AUDIENCE: Maybe we can horizontally scale up the server. We can add two or three more servers so we can manage it. 

CARTER ZENKE: Yeah, I like your thinking. So maybe we buy a fancier server like this. But if we ever run out of this server's capacity, well the only logical thing is to buy more servers. Like let's have not just one, but three for instance. And this is an example of horizontal scaling. So horizontal scaling is increasing your application's capacity not by increasing a single server's computing power, but by spreading it out, buying more servers to handle that load across more, in this case, individuals, people who can do work on those queries. 

So horizontal scaling might look a bit more like this. We have not just one server, but now three overall. So this process of having not just one server, but multiple, it's a process called replication. So replication is keeping copies of your database not just on one server, but on multiple. And this seems like a good idea. 

Like wouldn't it be great to have a copy of your data not just in one place, but in multiple? Maybe you yourself have made a backup of your own data on your computer. This is not a dissimilar idea. But it turns out there are some challenges with this. Like it's very complicated to have servers communicate with themselves to keep track of who has the most recent updates, who should I follow, who should lead, and so on. 

And so for that reason, there are a few models of replication that you should at least know about. Among them are single-leader replication, multi-leader replication, and leaderless replication. So single-leader means that there is a single database server out there that takes in the incoming writes to an application, the updates to some data. And that single server passes them on to some copy to take care of making sure it's redundant, that this data is stored not just in one place, but multiple places too. 

Multi-leader means there's not just one server listening for updates. There are multiple. And suffice to say, it gets a lot more complicated when there are more than one leader trying to listen for updates and process them. And finally, there are some other ones, like leaderless and some more, that actually do away with leaders altogether and do something totally different. 

But today will focus on single-leader, one of the most basic application strategies you could try and one that is built into MySQL. So let's focus on this one here. You could imagine, in this case, that you have a computer. And you're trying to connect to some social network. Well, here are two databases. One is a leader. And one is a follower. And just to check for understanding here, let's say I want to upload a profile photo. 

I'm updating some data. To which server would I send that data? If I want to change something on the server, who's listening for that request? 

AUDIENCE: I think that we'll send to the leader because the other servers would get data from the leader. They would get it from the leader. But I will send to the leader because it will have the latest data for other servers and other users. 

CARTER ZENKE: Yeah, exactly right. So if I want to change some piece of data, I know by definition the leader has the most recent copy of that data and is also the designated server to process updates to my database, changes to its data, like insertions, or updates, or deletions. So if I'm uploading some profile photo, I'll send this photo of me right here to my leader. And that leader will then take that profile photo and store it on the database. 

But before long, the leader will also talk to, communicate with the follower and make sure that the follower has a copy of this data elsewhere. So it will forward on this update to the follower who will then process it in turn. And now that both the leader and the follower have this data, well, who could I ask to see the latest profile photo for Carter Zenke? Which server might have that? 

AUDIENCE: [INAUDIBLE] servers that have the data. 

CARTER ZENKE: So I could ask either server for information on this profile photo. I could ask either the leader or the follower because each is storing a copy of this photo, or rather a copy of its location on the server itself. Now let's say here I asked the follower for this profile photo so I can see it on my own browser. 

Here, I'll ask the follower and get back that image inside of my own browser. But I could have asked either one to distribute that load across both of these servers. Now, this follower is what's known as a read replica. A read replica is a server from which we would only ever read data. If I ever wanted to update some photo or insert some new rows, I would never ask the follower. I would always ask the leader who's in charge, so to speak, of handling these updates and these writes to that database. 

So the next step is to think about not just the types of leaders that we have, but also how they communicate. And in general, there are two options for communication, both synchronous and asynchronous. So synchronous means that the leader will wait for the follower to get the data and process it before doing anything else. It synchronizes with that follower. 

Asynchronous though, means the leader sends that data to the follower and doesn't wait for the follower to finish processing. It just keeps going and going, hoping the follower is keeping up. So let's focus first on a few of these, in particular on synchronous. Now, in this diagram, I have a client and a server. 

My client is my computer. The server is some database. And on this x-axis here, left to right, I have time. Well, if I want to make a request to this server, I might denote that with a line, like this. My client sends some request to my server. Maybe it asks for data or tries to update some photo. Then that takes some time to get the request from my client to this server down here. 

And the response itself also takes some time. Maybe the server responds, hey, everything's OK. I got your photo. Here's your data, et cetera. It'll take some time for that to get back to me. So between here and here, that's how long it took for me to receive some data from this server. Now let's focus on synchronous replication here, where the leader and the follower are always in sync. 

So this might look a bit like this. If I add in the follower, I now have a leader and a follower. I might send that profile photo to the leader, a bit like this. The leader then sends it to the follower down below and waits. It waits until the follower has confirmed they have received that photo. They have added to the database. And now they send back a request saying, look, I got it. Everything is OK. 

And now only at this point, once the leader has received the OK from the follower, does it then communicate with me. It then says, everything's OK. We got your photo. So this is great because it's redundant. Like I know for sure that my photo is on both the leader and the follower. But there is a downside. And looking at this diagram here, what do you think that downside might be? 

AUDIENCE: Maybe it's too slow. 

CARTER ZENKE: Yeah, exactly. It could be just too slow. Like if I have to wait for the follower to get the OK back to the leader and then back to me, that's just more time for me to wait. And maybe my application shouldn't wait around that long. You might use this, though, if you're working in finance or healthcare, where you have to be doubly sure you have data stored exactly as you want it to be. 

But if you're like a social media site, like a Facebook or a Twitter, I mean, maybe you don't care if you lose a tweet every once in a while. It's OK. So maybe synchronous isn't the best, most ideal scenario for that as well. So let's think through now asynchronous. So we'll go back to asynchronous here and think through our diagram again. 

Well, in asynchronous, the leader doesn't wait around for the follower to get the data. So to visualize, my client could send this to the leader, maybe a photo update, for example. Then simultaneously, the leader tells me, I got your photo, everything's OK, and then sends the photo to the follower. The follower might later on tell the leader that everything is OK. 

But the leader already told me, look, all good, we got your photo. So there's nothing more to do after this. Now, this admittedly is faster. Like this amount of time is much shorter. But again, there is some downside here. Why might I not want to use asynchronous replication, at least in some cases? 

AUDIENCE: I would guess that it's an issue with data being corrupted by someone? 

CARTER ZENKE: So you're right. There could be some data corruption here, where let's say the leader sends some data to the follower. And they confirm for me that data has been received across all instances of these servers. Well, I mean, later on maybe the follower encounters some problem. And they don't actually follow through. Well, at that point, I have had a bad contract with my leader. 

It told me everything was OK. But actually, it wasn't. So there are all kinds of trade offs here, whether in terms of the time that things take or redundancy. But it's up to you to decide which one might work best for your own use case, synchronous replication or asynchronous. 

Now, in addition to replication, there is some other strategy for scaling you should be at least familiar with. And this is a process known as sharding. Now, sharding is great for really large data sets on the order of many, many, many gigabytes or terabytes of data that you just can't fit on a single server or a single computer. Sharding involves taking some data set like that and essentially splitting it up in some logical way across multiple servers, like this. 

And maybe for simplicity you have some database of names. And you decide that all of those names that begin with, let's say, A to I end up on this server here, this first one. But then all those names J through R, well, they get stored on this server here. And S through Z, they get stored on this third server here. So our data has been split, or sharded, across multiple instances of some database server. 

And this is helpful because if I had a really large data set, it now is smaller on every individual database server. You could also organize your data by, let's say, primary key. Maybe you have 3,000 rows, or 3,000 records. And the first 1 through 1,000, those go on this first server. The second, 1,001 to 2,000 inclusive, those go on the second server. And 2,001 to 3,000 also inclusive, those go on that third server. 

Now, there's a lot of thought that goes behind the best way to shard your data. The goal is to really to avoid what we call a hotspot, wherein one database server becomes more frequently accessed or requested than others. Let's say for some reason the data in 1,001 to 2,000 gets accessed quite a lot. Well, now we're actually trying to run into a problem here where one server gets overloaded, the same problem we're trying to guard against using sharding. 

So it's important then to think about what's the best, most logical way to distribute your data so you don't have one server being accessed much more than another. And there's another problem here too, which is if I'm not using replication, I'm only using sharding, well, let's say that one server goes down. What do we have now? Maybe we're missing all the names that went to the middle of our partition. Or maybe we're missing all of those rows, let's see, 1,001 to 2,000 for instance. 

And that would be what we call a single point of failure. If one system goes down, our entire system is now not usable. So something to keep in mind as you work on sharding is also could combine it with some strategies from replication to keep copies of your data across servers too. So these are some features built into MySQL and Postgres for scaling. 

But they also give us user accounts. And we can perhaps enhance our security by taking advantage of some of those user accounts. So let's talk now about this idea of access control, trying to have users and only grant some of them certain permissions. So here we saw before I was logging into MySQL using the root user. 

The root user is the admin. But I could just as easily create a new user, perhaps one called Carter for myself. So I will open up MySQL again. And in this case, I will create a new user, one named Carter. So to do so, I can use this command here, this statement here, create a user whose name is Carter in single quotes. Now, I can say they are identified by the password. 

And this password is literally password, P-A-S-S-W-O-R-D. It is one of, unfortunately, the most popular passwords. Do not use this password. So I will hit Enter here. And now I have created some new user named Carter. And I can log in using that Carter username. So I'll create myself a new terminal here. And I will try to log in to MySQL again. I will run this statement here. I'll say MySQL, which is the command line interface for the MySQL server. 

And I will say I want to connect using this user, Carter. Before we used root. But now I'll use Carter. The host, the place the server is located, is 127.0.0.1, my own computer here. This is the IP for this very computer. I can then say dash P, capital P, for the port number, 3306 by default. And finally, dash P means prompt me for my password. I'll type it in here. 

So I'll hit Enter. And now I get prompted for the password. I'll type that in here. And now-- oh wait, I typed in the wrong password. So let me try this again. I'll go back up. And I will type in my password, which was password, hit Enter. And now I'm logged in as Carter. So by default, if I try something like this, show databases Enter, I'll only see a few. Like I'm pretty sure I had MBTA and MFA in here and a few others. But I just can't see them. 

So when I create this new user, by default it has very few privileges, very few ways of accessing this database. But I could decide to grant some privileges. And just to show you, let me be root again for a second. Here I'm the admin. I can do most anything I want. And notice how in here I have a new database. I can type show databases Enter. And I'll see I have a new database called rideshare, which is similar from a prior week on viewing and security. 

So if I say use rideshare semicolon Enter, I could say select star from rideshare semicolon. Whoops, sorry, the table wasn't rideshare. Select star from rides. Then I'll Enter here. And now I'll see all of the rides in the rides table, which is part of the rideshare database. And if I look at this table, in our week on viewing, we talked about eliminating some PII, personally identifiable information. What was that PII here? 

Seems it was the rider column, like the names of those riders. We know where each rider is going. But ideally we should make a new view, one that allows us to see not the riders, but only their origins and destinations, removing this PII of individuals' names. So by default, Carter can't see this data. But neither can Carter see the view I made called analysis. 

If I go back to my computer here, I could type select star from analysis, hit semicolon Enter. And now I'll see the view I had made, the virtual table that removes that PII. But here I'm logged in as a root. If Carter wants to see this, I go over here, show databases, I can't even see the rideshare database. So the root user needs to grant access to Carter to see this particular view. 

Now, for that, I can use some syntax that looks a bit like this here. I can use both grant and revoke. Grant means to give some privilege, like select, or update, or delete on some table to a particular user. Revoke means simply to remove that permission. If I gave you select before, I'll now revoke it down below like this. And there are many privileges you can use. You could use all, like everything, allow Carter to do literally everything he could do. 

There's create to create tables, and views, and so on. There's insert to add data, select to read, and all the way down this list. Look at documentation to figure out what you can use here. So I want to give Carter it seems like a select privilege on the view I created for analysis in this database. So I'll come back over here. And I'll try that. 

I will log in as root here. And I will try to grant the select permission privilege on the rideshare database, and in particular, the analysis table inside of it. And I'll grant that to Carter, like this, semicolon and Enter. Now I'll see the query is OK. If I log back in as Carter down here, and now type show databases Enter, I should see I can now see the rideshare database. 

So I'll type use rideshare semicolon. The database changed. I could say show tables. And what do I see but that virtual table, that view called analysis. So now as Carter I can select star from analysis semicolon and see that same data. But I can't see the ride data. If I say select star from rides, that table is currently hidden to me. I get back, you don't have permission. 

So this is the benefit of MySQL's users and privileges to make sure I can have multiple users accessing this database and then only allowing some to access confidential data. So let me ask now what questions we have on access control in MySQL and Postgres. 

AUDIENCE: If we have to ask a test for everything, is there any particular syntax for that? Or should we type on all the select drive and everything? 

CARTER ZENKE: Yeah, great question here. Like if you want to give somebody multiple privileges, what could you do? One approach is you could say grant and then say the privilege, and then have a comma afterwards. I could grant select comma insert comma et cetera, including the tables that this user can insert, update, delete on, et cetera. You could also use all. Like if I say grant all on star dot star to Carter, that means grant all permissions on everything to this user named Carter. 

So you can absolutely do either of those approaches. But best to be more fine grained particularly in a production environment. OK, great questions here. Let's keep going. So this then was how to use access controls. But it's also worth thinking about ways we can prevent other kinds of attacks and harden the security of our database. Like here I have users and passwords, which works well. 

But if I were to expose my application to the outside world, I should also be concerned about a few other attacks here. Now, one of these attacks is a SQL injection attack. SQL injection means I have some query. And somebody else adds in some malicious code to finish that query for me. So for example, maybe I have a query like this, select ID from the users table where the username equals someone's username and the password equals somebody's password. 

And if I find a user matching this username and password, if I get back an ID, I'll log that user in. Now, in a regular website of someone who's actually behaving nicely, they could type in their username, like this. And they could type in their password a bit like this. This is kind of what I did before. I logged in as Carter and gave the password of password. 

And when you are entering your own username and your own password on some website, something similar to this is likely happening. There might be some SQL query that takes your username and takes your password and inserts them into a SQL query to figure out, are you who you say you are? But let's think a bit more maliciously about this. 

Like here, if I gave you permission to add anything to this dot dot dot here, anything at all, what could you do to try to hack this database, to log in as Carter? Keep in mind that this is selecting an ID from users where some condition is true. What might you be able to add to this dot dot dot to log in as me? 

AUDIENCE: Yes, you could use some kind of statements like union, or [INAUDIBLE], or delete something, et cetera. 

CARTER ZENKE: Yeah, I could really put most anything I want in here. But the SQL injection attack comes about when I deliberately put some SQL statement that I think will give me access to some data I shouldn't have access to. So you could imagine maybe I type in my password. But then I type in a single quote followed by or 1 equals 1. Now, what would that do? 

Well, notice that 1 equals 1, that condition will always be true. And so this condition, where username equals Carter and password equals password, well, in that case, I have access to Carter's account because I modified the condition just always be true whether or not the password is correct. Let's see one more example here. 

Maybe I'm logging into my bank account. And I want to check my balance. We could maybe assume that the bank has some table of balances or has some table of accounts that have balances. And I could type in my account number, like this, one, two, three, four, all the way up to number nine here. And that will give me access to my account balance. But now, let's take a look at this query here, select balance from accounts where ID equals the account number. 

What kinds of SQL statements could you include in this ID value here to get access to everyone's account balances? 

AUDIENCE: So we basically have a web which there's some ID. If we just commented that part out, then it would be sort of a problem. Or if we just did a similar example to the last time, like put the or 1 is equal to 1, that could also trigger it. So there are multiple options. 

CARTER ZENKE: Yeah, I like your thinking here. There's definitely more than one option. I like your thinking of maybe we include some comment that kind of messes up this query. Maybe it removes the condition so I just get select balance from accounts. That could be one strategy. I could maybe include an or that maybe makes sure that this where is just always true, give me back all balances from this account. 

I could also use as well a union like this. I could say make sure I add my own account ID but then union some other query like this, union select ID from accounts. Now I have all the account IDs from this table. And I could then query them one at a time for their balance. So let's try this out in MySQL to see some of the dangers of these injection attacks. 

I'll go back to my computer here. And I'll show you how I already have a database set up for this. If I type show databases, hit Enter, I see bank is one database here. I want to use that database. I'll say use bank and then hit Enter semicolon. And now I can say show tables where I see an accounts table. And I'll select star from accounts, Enter semicolon here, to see some familiar users and some familiar balances. 

So let's actually complete this injection attack we just saw on the slides. Maybe this bank has some feature where I can choose some ID. I can enter in my own account ID to see my balance. So on the website they complete this query for me. I give them, let's say, the ID 1. I am Alice in this case. They might finish this. They might say select star from accounts where the ID equals 1. And they might automatically run this SQL query. 

I'll hit Enter. And now I'll see the balance for this account. Well, now, what else could I do? I could try to deliberately mess up this query to run my own SQL injection attack. I could select, let's say, star from accounts. And then let me try where the ID equals 1. I give them 1. But I also give them some SQL statements they unwittingly execute. I say union and then select star from accounts, a bit like this. 

Now I'll hit Enter. And if I complete this query, I see not just my own. I see everyone's balances and their names too. So this is one demo of how SQL injection attacks can show and leak data that isn't intended to be leaked. And sadly, these are fairly common today among people who just aren't familiar with how to prevent them. 

So let me ask now what questions we have on these attacks before we get how to guard against them. 

AUDIENCE: And just, can we use this in regular expressions also? 

CARTER ZENKE: Could you use regular expressions? So if you're familiar, a regular expression is some pattern of characters that can match some text and find values within some text. I personally haven't seen regular expressions being used for injection attacks. But that doesn't mean they can't be. 

And so always do your research on what you can and can't use for an injection attack. But a good question here. So we've seen now SQL injection attacks. But how do we prevent them? Well one strategy is using what we call a prepared statement. A prepared statement is kind of what it sounds like. A prepared statement is a statement that we will later on Insert some values into. 

We might treat it like a fill in the blank, like here is my select and I'll let you fill in what you want me to select. But I'll make sure to clean your input before actually adding it in. So let's think through a few of these here. Here I have prepare name from statement. This is a SQL statement I can use to prepare some statement, to take some user input to make sure that when I add in that user input, nothing malicious will happen. 

And in particular, I can make sure the statement looks something like this, select balance from accounts where ID equals question mark. This question mark is just a question mark. But syntactically it stands for the ability of this statement to clean whatever I put inside of it to make sure it doesn't inadvertently run SQL code I don't want it to run. 

So let's try making our very own prepared statement to guard against these kinds of insertions here. I'll go back to my computer. And let's try this one out. I could go back to MySQL. And let me prepare some statement. I could say let's prepare a statement called balance check, like we had before. And now if I try to prepare this statement from the following, select star from accounts where the ID equals question mark end quote semicolon. 

So in single quotes I have the query I'm trying to prepare to take user input. And I'm giving it some name, like balance check. So I'll hit Enter now. And I'll see the query is OK, statement prepared. Now if I want to run this statement to check on somebody's balance, I can go through a few steps. 

I could first set my own variable. Like let's say I get some user input from the user. They type in 1. Underneath the hood in SQL, I could say set this variable called ID equal to 1. This at symbol, which is a convention for variables in MySQL. Now I'll hit Enter. So the user input is stored separately in some variable not inside of MySQL query, like we saw before. 

Now I can execute. I can say execute, in this case, the balance check statement using that same variable at ID. So that question mark I had before I will then take whatever the user typed in and clean it, and then substitute in that value into this. So I'll say using ID. And now I get back Alice's balance. And just to be clear here, let me try to make this deliberately malicious. 

I want to see everyone's accounts. Well, we saw before that I completed this query by typing in-- let's see-- set at ID equal to 1 but then union select star from accounts. This was what I added to that query to show me everyone's account balances. And I'll try to do the same here. I'll say this is what I'm trying to substitute into this prepared statement. I'll hit Enter. 

Now I'll execute again. I'll say execute balance check using at ID, which has an ID, but also some malicious SQL statements. Now if I hit Enter, what do I see? Only Alice's balance. So it seems that the prepared statement cleans up my input, ensures that if I have SQL statements they don't get run. 

So we've seen now SQL injection attacks and how to guard against them. What questions do we have? 

AUDIENCE: So for these previous examples you've shown us, the [INAUDIBLE] statements only take into account the first acceptable condition? Or how does it work? 

CARTER ZENKE: Yeah, good question. So here, let me show you again the ID that I had put inside of here. So if I say select at ID, like that, I'll see this is the value of ID, one union select star from accounts. And what happens is this prepared statement does a process called escaping. It finds all of the SQL keywords, all of the values that could be malicious and escapes them so they don't actually get executed. 

So it seems like in this case what actually happened was if we type in the query again balance check was select star from accounts where ID equals 1. It seems like it just did the equals 1 part to give me back, in this case, the account balance while cleaning up the rest of it overall. OK, let's take one more question here. 

AUDIENCE: So is this the same reason why we shouldn't use a formatted string in Python to execute a SQL query? 

CARTER ZENKE: Yeah, so if you're familiar with Python format strings, it's a similar kind of pitfall, which is you can add in any possible value for this variable you're substituting into. And that can lead to some unintended consequences. So similarly, should we create prepared statements that can actually clean and escape the data that we might want to insert into our queries or our statements over there too. Good question. 

OK, so we've seen here SQL injection attacks and also how to guard against them. And actually, by now you're pretty well equipped to go off into the world of databases. And even though this is our last week, I hope you think back on how far you have come. So you began by querying a single table, a table of books in this case. You then graduated into tables that were related of books, and authors, and publishers, and so on, and querying those as well. 

Afterwards, you saw how to design your own databases, learning about types and create table statements to build something up from scratch. Finally, you saw how to write, how to add data to those databases you had designed. And then you kept going. You saw how to view databases, how to see them in a more simplified light, how to optimize those queries, and finally today how to scale up all of this so you can serve more and more users. 

So I hope after all these weeks that you are proud of how much you've grown. We are certainly proud of you all too. Now go off and continue your learning. We'll see you later.