[MUSIC PLAYING] DOUG LLOYD: In our videos on web development topics, we've mentioned the concept of a database a few times, right? So a database you're probably familiar with from say using Microsoft Excel or Google Spreadsheets. It's really just an organized set of tables, rows, and columns. And a database is where our website stores information that is important for our website to work properly. Again, a really common example here is storing usernames and passwords in a database, so that when a user logs into our website, the database can be queried to see if that user exists in the database. And if they are, checking that their password is correct. And if their password is correct, then we can give them whatever page they're requesting. So you're probably, again, familiar with this idea from Excel or Google Spreadsheets. We have databases, tables, rows, and columns. And that's really sort of the fundamental set of hierarchical breakdown here. So here's an Excel spreadsheet. And if you've ever opened this or another similar program you know that these here are rows-- 1, 2, 3, 4, 5, 6, 7. These are columns. Maybe down here, although you may not use this feature terribly much-- I'll zoom in-- we have this idea of a sheet. So maybe these sheets, if I alternate back and forth, are different tables that exist in my database. And if we continue the example all the way, the name of this database is Book 1. Maybe I have Book 2 and Book 3. So each Excel file is a database, each sheet is a table, and inside of each table I have this idea of rows and columns. So how do I work with this database? How do I get information from it? Well there's a language called SQL-- which I usually just call Sequel-- and it stands for the Structured Query Language. And it is a programming language, but it's a fairly limited programming language. It's not quite like others that we have worked with. But the purpose of this programming language is to query a database, to ask information of a database, find information in a database, and so on. We also, in CS50-- and it's a very common platform, it's called MySQL. That's what we use in the course. It's an open source platform that establishes a so-called relational database-- a database, effectively. We don't need to get into too much detail on what a relational database is. But the SQL language is very adept at working with MySQL and other similar styles of relational databases. And many installations of MySQL come with something called phpMyAdmin, which is a graphical user interface-- a GUI-- that makes it a little more user friendly to execute database queries, because databases are not just used by advanced programmers, right? Sometimes there are these small businesses, and they can't afford to hire a team of programmers, but they still need to store information in a database. Something like phpMyAdmin makes it very easy for somebody who's never programmed before to pick up and become familiar with how to work with a database. The problem is, phpMyAdmin, while it's a fantastic tool for learning about databases, it is manual. You're going to have to log into it and execute commands and type things in manually. And as we know from our example on PHP web programming, having to manually do things on our website, if we want a dynamic, active responsive website, perhaps not the best approach. We would like to find a way to perhaps automate this somehow. And SQL will enable us to do this. So when we're going to start working with SQL, we first need to have a database to work with. Creating a database is something you probably will do in phpMyAdmin, because you'll only need to do it once, and the syntax for doing so is a lot more straightforward. It's a lot easier to do it in a graphic user interface than typing it out as a command. The command can get a little cumbersome. Similarly, creating a table can get quite a bit cumbersome as well. And so things like creating a database and creating a table, which you're probably only going to do once-- once per table, once per database-- it's OK to do that in a graphical interface. In the process of creating a table, you'll also have to specify all of the columns that will be in that table. What sort of information do you want to store in the table? Maybe a user's name and date of birth, password, user ID number, and maybe city and state, right? And for every time we want to add a user to the database, we want to get all six of those pieces of information. And we do that by adding rows to the table. So we first create a database, then we create a table. As part of creating a table, we are asked to specify every column that we would like in this table. And then as we start to add information to the database and query the database more generally-- not just adding, but everything else we do-- we'll be dealing with rows of the table, which is one user's information from the entire set. So every SQL column is capable of holding data of a particular data type. So we sort of eliminated this idea of data types in PHP, but they're back here in SQL. And there's a lot of data types. Here's just 20 of them, but it's not even all of them. So we have ideas like INTs-- Integers-- we probably know that this column can hold integers. And there are variations thereon-- SMALLINT, TINYINT, MEDIUMINT, BIGINT. Maybe we don't always need four bites. Maybe we need eight bytes, and so we can use these variations on integers to be a bit more space efficient. We can do decimal numbers, we can do floating point numbers. These are pretty similar. There are some differences, and if you would like to look up the SQL sort of guide, you can see what the slight differences are between them. Maybe we want to store information about date and time. Maybe we're keeping track of when the user joined our website, and so maybe we want to have a column that's a date time or a timestamp that indicates when the user actually signed up. We can do geometries and linestrings. This is actually pretty cool. We could map out a geographical area using GIS coordinates to plot out an area. So can actually store that sort of information in a SQL column. TEXT is just giant blobs of text, maybe. ENUMs are kind of interesting. They actually exist in C. We don't talk about them because they're not terribly commonly used, at least CS50. But it's an enumerated data type, which is capable of holding limited values. A really good example here would be to create an enum where the seven possible values are Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, right? That data type Day of Week does not exist, but we could create an enumerated data type such that that column can only ever hold one of those seven possible values. We've enumerated all of the possible values. Then we have CHAR and VARCHAR, and I've color these green because we're actually going to take a second to talk about the difference between these two things. So CHAR, unlike C where CHAR was a single character, in SQL a CHAR refers to a fixed length string. And when we create this column, we actually can specify the length of the string. So in this example, we might say CHAR(10). That means that every element of that column will consist of 10 bytes of information. No more, no less. So if we try and put in a 15 bit or a 15 character element or value into this column, we're only get the first 10. If we put in the two character long value, we're going to have the two characters, and then eight null bites. We'll never be more efficient than that. A VARCHAR is kind of like our notion of a string that we're familiar with from C or from PHP. It's a variable length string. And when you create this column, you just specify the maximum possible lengths. So maybe 99, or commonly 255. That would be the maximum length. And so if we were storing 15 character string, we would use 15 bytes, maybe 16 bytes for the null terminator. If we were storing a three character string, we would use three or four bytes. But we wouldn't use the full 99. So why would we have both? Well, if we need to figure out how long something is with a VARCHAR, we have to kind of iterate across it just like we did in C and figure out where it stops. Whereas if we know that everything in this column is 10 bytes, maybe we know that information, we can jump 10 bytes, 10 bytes, 10 bytes, 10 bytes, and always find the beginning of the string. So we may have some wasted space with a CHAR, but maybe there's a trade off of having better speed in navigating the database. But maybe we want the flexibility of a VARCHAR instead of having-- If our CHAR was 255, but most of our users were only inputting three or four bytes worth of information or three or four characters worth of information. But some users were using the whole 255, maybe VARCHAR would be more appropriate there. It's sort of a trade off, and generally for purposes of CS50, you don't need to worry too much about whether you use a CHAR or a VARCHAR. But in the real world, these things do matter because all of these columns take up actual physical space. And physical space, in the real world, comes at a premium. So one other consideration when you're building a table is to pick one column to be what's called a primary key. And a primary key is a column where every single value is unique. And that means that you can easily pick out a single row just by looking at the primary key of that row. So for example, you generally, with users, don't want two users who have the same user ID number. And so maybe you have lots of information, and maybe two users can have the same name-- you have John Smith and John Smith. That's not necessarily a problem, because there are multiple people in the world named John Smith. But we only have one user ID number 10, one user ID number 11, 12, 13. We don't have two users with the same number, and so maybe user ID numbers would be a good primary key. We don't have any duplication, and we can now uniquely identify every single row just by looking at that column. Choosing primary keys can actually make subsequent table operations a lot easier because you can leverage the fact that certain rows will be unique, or a certain column of your database or table will be unique to pick out particular rows. You can also have a joint primary key, which you may find occasion to use, which is just a combination of two columns that is guaranteed to be unique. So maybe you have one column that's As and Bs, one column that's one, two, and three, but you'll only ever have a single A1, a single A2, and so on and so on. But you might have a B2, a C2, or an A1, A2, A3, A4. So you might have multiple As, multiple Bs, multiple ones, multiple twos, but you can only ever have a single A1, B2, C3, and so on. So as I said, SQL is a programming language, but it has a fairly limited vocabulary. It's not quite as expansive as C and PHP and other languages that we talk in the course. It is more verbose a language than what we're going to talk about in this video, because in this video we're going to talk about four operations that we can perform on a table. There are more than this. We can do more than this, but for our purposes, we're generally going to be using just four operations-- insert, select, update, and delete. And you can probably intuitively guess what all four of these things do. But we'll go into a bit of detail on each one. So for purposes of this video, let's assume we have the following two tables in a single database. We have a table called Users that has four columns-- ID number, username, password, and full name. And we have a second table in the same database called Moms that just stores information about a username and a mother. So for all of examples in this video, we'll be using this database and subsequent updates to it. So let's say we want to add information to a table. That's what the insert operation does. In explaining all of these commands, I'm going to give you a general skeleton to use. Because basically, the queries are going look pretty similar, we're just going to be changing slightly different pieces of information to do different things with the table. So for INSERT, the skeleton looks kind of like this. We want to insert into a particular table. Then we have an open parenthesis and a list of columns that we want to put values into. Close parentheses, the following values, and then again, we list out the values we want to put in the table. So an example of this would be the following. I want to insert into the table users the following columns-- username, password, and fullname. So a new row where I'm putting in those three columns and we're going to put in the values Newman, USMAIL, and Newman. So in this case, I'm putting the lowercase newman into the username column, the password USMAIL, and the full name capital N Newman into the fullname column. So here's what the database looked like before. Here's what the users table on the top looked like before we did this. After we execute this query, we get this. We've added a new row to the table. But notice this one thing that I didn't specify, but somehow I've got a value for, which is this 12 right here. I didn't say I wanted to put ID number in there. I wanted to put username, password, fullname. And I did that, that's fine. But I also got this 12. Why did I get this 12? Well, it turns out that when you are defining a column that is going to be your primary key, which is usually, like I said, an ID number. It's not always necessarily going to be an ID number, but it's usually a good idea to be some sort of integer value. You have an option in phpMyAdmin when you're creating your database or your table to set that column as auto incrementing. Which is a really good idea when you're working with a primary key, because you want every value in that column to be unique. And if you forget to specify it for more than one person, you now have a situation where that column's no longer unique. You have two blanks, so you can no longer uniquely identify a column-- or you can no longer uniquely identify a row based on that column. It's lost all of its value as a primary key. And so apparently what I have done here is configured the user ID column to auto increment so that every time I add information to the table, it will automatically give me a value for the primary key. So I can never forget to do it because the database will do it for me. So that's kind of nice. And so that's why we get 12 in there, because I've set that column up to auto increment. If I added somebody else it'd be 13, if I added somebody else it'd be 14, and so on. So let's just do one more insertion. We'll insert into the moms table, in particular, the username and mother column, the values kramer and Babs Kramer. And so we had this before. After we execute that SQL query, we have this. We've added Kramer and Babs Kramer to the moms table. So that's inserting. SELECT is what we use to extract information from the table. So this is how we get information out of the database. And so SELECT commands are going to be very frequently used in programming. The general framework-- the general skeleton looks like this. Select a set of columns from a table, and then optionally you can specify a condition-- or what we typically call a predicate, is usually the term we use in SQL. But it's basically what particular rows you want to get. If you want to, instead of getting everything, narrow it down, this is where you would do that. And then optionally, you can also order by a particular column. So maybe you want to have things sorted alphabetically based on one column or alphabetically based on another. Again, WHERE and ORDER BY are optional. But they'll probably be useful-- particularly WHERE will be useful to narrow down so you don't get your entire database back and have to process it, you just get the pieces of it that you care about. So for example, I might want to select ID number and fullname from users. So what might this look like? So here's my users table. I want to select idnum and fullname from users. What am I going to get? I'm going to get this. I didn't narrow it down, so I'm getting the ID number for every row and I'm getting the full name from every row. OK. What if I want to select password from users WHERE-- so now I'm adding a condition, a predicate-- where idnum is less than 12. So here's my database again, my users table the top. What am I going to get if I want to select that information, the password, where user ID or idnum is less than 12? I'm going to get this information back, right? It happens that idnum is 10, less than 12, ID number 11 less than 12. I'm getting the password for those rows. That's what I asked for. What about this? What if I want to select star from the moms table where username equals Jerry? OK, select star is the special sort of wild card so-called that we use to get everything. So they're saying select username comma mother, which happened to be the only two columns of this table, I can just select star and get everything where the username equals Jerry. And so that's what I would get if I made that particular query. Now, databases are great because they allow us to organize information perhaps a bit more efficiently than we might otherwise. We don't necessarily to store every relevant piece of information about a user in the same table. We had two tables there. We need to store everybody's mother's name, and maybe we don't have social security number, we have their date of birth. That doesn't always need to be in the same table. As long as we can define relationships between the tables-- and that's where that relational database term kind of comes into play-- as long as we can define relationships between the tables, we can sort of compartmentalize or abstract things a way, where we only have the really important information we care about in the user's table. And then we have ancillary information or extra information in other tables that we can connect back to the main users table in a particular way. So here we have these two tables, but there's a relationship between them, right? It seems like username might be something that exists in common between these two different tables. So what if we now have a situation where we want to get a user's full name from the user's table, and their mother's name from the mother table? We don't have a way to get that as it stands, right? There's no single table that contains both full name and mother's name. We don't have that option from what we've seen so far. And so we have to introduce the idea of a JOIN. And JOINs are probably the most complex-- it's really most complex operation we're going to talk about in the video. They're a little complicated, but once you get the hang of it, they're actually not too bad. It's just a special case of a SELECT. We're going to select a set of columns from a table joining in a second table on some predicate. In this case, think about it like this-- table one is one circle over here, table two is another circle over here. And that predicate part in the middle, it's sort of like if you think about as a Venn diagram, what do they have in common? We want to link these two tables based on what they have in common and create this hypothetical table that is the merger of the two together. So we'll see this in an example and maybe that'll help clear it up a little bit. So maybe you want to select user.fullname and moms.mother from users joining in the moms table in every situation where the username column is the same between them. And this is a new syntax here, this user. and moms.. If I'm doing multiple tables together, I can specify a table. I can distinguish in particular on that on at the very bottom there. I can distinguish the username column of the users table from the username column of the moms table, which are otherwise-- if we just said username equals username, that doesn't really mean anything. We want to do it where they match. So I can specify the table and the column name in case of a situation where it would be unclear what I'm talking about. So that's all I'm doing there is I'm saying this column from this table, and being very explicit. So again, I'm selecting the full name and the mother's name from the users table linked together with the moms table in every situation where they share that column-- they share that username notion. So here are the tables we had before. This is the state of our database as it exists right now. The information we're extracting is this to start with. This is the new table we're going to create combining these together. And notice we're not highlighting Newman's row in the user's table, and we're not highlighting Kramer's row in the moms table because neither one exists in both sets-- in both tables. The only information that is in common between them is Jerry is in both tables and gcostanza is in both tables. And so when we do the SQL JOIN, what we get-- and we doing actually get this. It's sort of a temporary variable. It's like a hypothetical merger of the two tables. We actually get something like this, where we've merged together the tables on the information that they have in common. So notice that users.username and moms.username column, it's exactly the same. That was the information that was consistent from the users table and the moms table. And so we merged them together. We discarded Kramer because he didn't exist in the users table, and we discarded Newman, because he didn't exist in the moms table. So this is the hypothetical merger using the JOIN operation of SELECT. And then we were looking for the user's full name and the user's mother, and so this is the information that we would get from the overall query that we made with SELECT. So we joined the tables together and we extracted those two columns, and so that's what we would get. But SQL JOINs a kind of complicated. You probably won't do them too much, but just have some idea of the skeleton that you could use to merge two tables together if you needed to. The last two are a bit simpler I promise. So updating, we can use UPDATE to change information in a table. The general format is UPDATE some table, SET some column to some value WHERE some predicate is satisfied. So for example, we might want to update the users table and set the password to yada yada, where the ID number is 10. So in this case, we're updating the users table. The ID number is 10 for that first row there, and we want to update the password to yada yada. And so that's what would happen. It's pretty straightforward, right? It's just a very simple modification to the table. DELETE is the operation we used to remove information from a table. DELETE FROM table WHERE some predicate is satisfied. We want to delete from the users table for example where the username is Newman. You can probably guess what's going to happen here after we execute that SQL query, Newman is gone from the table. So all these operations, as I've said, are very easy to do in phpMyAdmin. It's a very user friendly interface. But it does require manual effort. We don't want to employ manual effort. We want our programs to do this for us, right? So we might want to do this programmatically. We want to incorporate SQL and have something else to do this for us. But what have we seen that allows us to programmatically do something? We've seen PHP, right? It introduces some dynamism into our programs. And so fortunately, SQL and PHP play very nicely together. There's a function in PHP called query, which can be used. And you can pass as the parameter or the argument to query a SQL query that you would like to execute. And PHP will do it on your behalf. So after you've connected to your database with PHP, there's two primaries you do this. There's something called MySQLi and something called PDO. We won't go into a huge amount detail there. In CS50 we use PDO. After you've connected to your database, you can then make queries your database by passing the queries as arguments to PHP functions. And when you do that, you store the result set in an associative array. And we know how to work with associative arrays in PHP . So I might say something like this-- $results-- this is in PHP-- equals query. And then inside of the query function that argument that I'm passing to query that looks like SQL. And in fact that is SQL. That's the query string that I would like to execute on my database. And so in red, this is PHP. This is SQL that I'm integrating into PHP by making it the argument to the query function. I want to select fullname from users where ID number equals 10. And then maybe after I've done that, I might say something like this. I want to print out the message Thanks for logging in. And I want it interpolate-- I want to interpolate $results fullname. And so that's how I work with that associative array that I got back. $results fullname would basically end up printing out, thanks for logging in, Jerry Seinfeld. That was the full name where idnum equals 10. And so all I'm doing is I'm now-- I stored my query, the results of my query and results in an associative array, and fullname is the name of the column I was getting for. So that's my key into the results associative array that I want. So Thanks for logging in, $results, fullname will print out, will stick right in between those curly braces, Jerry Seinfeld. And I'll like to print out the message Thanks for logging in Jerry Seinfeld. Now, we probably don't want to hard code things like that in, right? We might want to do something like print f, where we can substitute and maybe collect different information, or maybe have the query process different information. And so query, the query function has this notion of sort of substitutions very similar to print f percent s and percent c, is question marks. And we can use question marks very analogously to print f to substitute variables. So maybe your user logged in earlier, and you saved their user ID number in $_session of PHP super global in the key ID. So maybe after they logged in, you set $_session ID equals 10, extrapolating from the example we just saw a second ago. And so when we actually execute this query the results now, it would plug in 10, or whatever the $_session ID value is. And so that allows us to be a bit more dynamic. We're not hard coding things in anymore. We're saving information somewhere and then we can use that information again to sort of generalize what we want to do, and just plug-in and change the behavior of our page based on what the user's ID number actually is after they've logged in. It's also possible, though, that your results set might consist of multiple rows. In which case, you have an array of arrays-- an array of associative arrays. And you just need to iterate through it. And we know how to iterate through an array in PHP, right? So here is probably the most complex thing we've seen so far. It actually integrates three languages together. Here in red, this is some HTML. I'm apparently starting-- this is a snippet of some HTML that I have. I'm starting a new paragraph that says the moms of TV's Seinfeld. And then immediately afterwards I'm starting a table. And then after that, I have some PHP, right? I have all of this PHP code in there. I'm apparently going make a query. And to make the query, I'm going to be using SELECT mothers FROM moms. So this is getting-- this is SQL. So the the blue is SQL. The red we saw a second ago was HTML. And the green here is PHP. So I'm making a query to my database, I'm selecting all of the mothers in the moms table. Not just narrowing it down to particular row, I'm asking for all of them. Then I check if result is not equals equals false. This is just my way of checking sort of if results is not equal to null, that we would see c for example. Basically this is just checking to make sure that it actually got data back. Because I don't want to start printing out data if I didn't get any data. Then for each results as a result the foreach syntax from PHP, all I'm doing is printing out $result mothers. And so I'm going to get a set of all of the mothers of each-- it's an array of associative arrays-- and I'm printing out each one as its own row of a table. And that's really pretty much all there is to it. I know there's a little bit going on here in this last example with arrays of arrays-- arrays of associative arrays. But it really does just boil down in SQL to making a query, usually selecting after we've already put information into the table, and then just pulling it out. And this is we would pull it out in this particular case. We would extract all of the individual mothers from the moms table. We got a whole set of them, and we want to iterate through and print out each one. So again, this is probably the most complicated example we've seen because we're mixing three different languages together, right? Again, we have HTML here in red, mixed with some SQL here in blue, mixed with some PHP in green. But all of these play nicely together, it's just a matter of developing good habits so that you can get them to work together the way you want. And the only way to really do that is to practice, practice, practice. I'm Doug Lloyd, this is CS50.