WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:05.250 --> 00:00:07.250 DOUG LLOYD: So in our videos on web development, 00:00:07.250 --> 00:00:10.420 we've talked a couple of times now about what a database is. 00:00:10.420 --> 00:00:12.920 And in this video, we're goingww to get into a bit more detail 00:00:12.920 --> 00:00:16.820 and show you exactly what a database is, why we would use it, 00:00:16.820 --> 00:00:18.920 and how we might manipulate it. 00:00:18.920 --> 00:00:23.844 In order for us to build websites that are sort of more complex than just 00:00:23.844 --> 00:00:26.010 a page where they just go and see news, for example, 00:00:26.010 --> 00:00:27.926 we might need a database to store information, 00:00:27.926 --> 00:00:30.620 such as username and password combinations 00:00:30.620 --> 00:00:32.830 so that a user attempts to log in. 00:00:32.830 --> 00:00:34.830 That log-in information is sent to the database. 00:00:34.830 --> 00:00:36.740 It is checked against information in the database 00:00:36.740 --> 00:00:39.230 to see whether that username-password combination matches. 00:00:39.230 --> 00:00:41.400 And if so, it lets the user in. 00:00:41.400 --> 00:00:44.990 We might also store other stuff for users, like their shopping history 00:00:44.990 --> 00:00:47.510 or really any other information that you might 00:00:47.510 --> 00:00:50.750 want to keep long-term for a user. 00:00:50.750 --> 00:00:54.740 Now, if you ever used programs like Microsoft Excel or Google Sheets 00:00:54.740 --> 00:01:00.830 or Numbers, you're probably familiar with the basic idea of a database. 00:01:00.830 --> 00:01:03.584 A database consists of a couple of different levels of hierarchy. 00:01:03.584 --> 00:01:05.000 Within a database, we have tables. 00:01:05.000 --> 00:01:08.130 And within each of those tables, we have rows and columns. 00:01:08.130 --> 00:01:10.880 And if you are familiar with Excel, or perhaps even if you're not, 00:01:10.880 --> 00:01:14.420 let's take a quick second and I can draw this analogy for you 00:01:14.420 --> 00:01:17.700 about what a database, a table, a row, a column is. 00:01:17.700 --> 00:01:22.700 And then we can translate that in just a moment to the same idea in SQL. 00:01:22.700 --> 00:01:26.769 So on my screen here, I've opened up an Excel file. 00:01:26.769 --> 00:01:29.060 And across the top here, you see I have these letters-- 00:01:29.060 --> 00:01:34.400 A, B, C, D, E. These would be different columns in my file. 00:01:34.400 --> 00:01:37.800 Down the left side, I have numbered rows, where I might put information. 00:01:37.800 --> 00:01:41.130 So, for example, I might just put my name here in cell A1. 00:01:41.130 --> 00:01:43.672 So it's in the A column in the first row. 00:01:43.672 --> 00:01:45.380 Now, down at the bottom left is a feature 00:01:45.380 --> 00:01:48.830 that you may not have used too much if you don't use Excel very frequently. 00:01:48.830 --> 00:01:50.660 But there's these notions of sheets. 00:01:50.660 --> 00:01:53.060 Sheets are sort of akin to different tables. 00:01:53.060 --> 00:01:56.780 Notice that when I switch to sheet 2, the data that was in sheet 1, my name, 00:01:56.780 --> 00:01:57.870 is no longer there. 00:01:57.870 --> 00:02:02.000 So each sheet has its own unique set of rows and columns. 00:02:02.000 --> 00:02:06.480 But all of these sheets are still bound up in one single file called Book 1. 00:02:06.480 --> 00:02:09.380 So the analogy here is that Book 1 is our database, which 00:02:09.380 --> 00:02:12.770 contains a different number of tables, which in Excel parlance 00:02:12.770 --> 00:02:14.090 is just a sheet. 00:02:14.090 --> 00:02:16.850 And each sheet has columns that we can put data in 00:02:16.850 --> 00:02:20.010 and rows that we can put data in, as well. 00:02:20.010 --> 00:02:23.120 So there are a couple of different types of database engines 00:02:23.120 --> 00:02:24.860 that we can use in our programs. 00:02:24.860 --> 00:02:29.780 We're going to talk about SQL, S-Q-L, which stands for the Structured Query 00:02:29.780 --> 00:02:30.921 Language. 00:02:30.921 --> 00:02:33.170 The Structure Query Language is a programming language 00:02:33.170 --> 00:02:37.160 whose sole purpose in life is to query or ask questions of or retrieve data 00:02:37.160 --> 00:02:39.120 from a database. 00:02:39.120 --> 00:02:42.361 And there are many different implementations of SQL. 00:02:42.361 --> 00:02:44.360 Two of the most popular are as follows-- we have 00:02:44.360 --> 00:02:47.510 MySQL which is an open-source platform. 00:02:47.510 --> 00:02:50.230 It is very commonly used to establish relational databases. 00:02:50.230 --> 00:02:51.260 We're not going to get into a lot of detail 00:02:51.260 --> 00:02:53.359 in this video about what a relational database is 00:02:53.359 --> 00:02:54.650 versus other types of database. 00:02:54.650 --> 00:02:56.649 But just know that there are more than one type, 00:02:56.649 --> 00:02:59.640 and we cover relational databases in this course. 00:02:59.640 --> 00:03:03.380 Another type, which we actually have used in CS50 since 2016, 00:03:03.380 --> 00:03:07.834 is SQLite, which has a very similar feature set to MySQL. 00:03:07.834 --> 00:03:09.500 But it's just a little more lightweight. 00:03:09.500 --> 00:03:12.020 It's a little easier to use on CS50 IDE. 00:03:12.020 --> 00:03:15.140 And so that's why we're actually going to be using that one in the class. 00:03:15.140 --> 00:03:20.540 Now, regardless of which implementation of SQL that you use, a lot of them 00:03:20.540 --> 00:03:25.280 will come with a tool called phpMyAdmin, which is a GUI or Graphical User 00:03:25.280 --> 00:03:28.760 Interface tool that is used to execute some 00:03:28.760 --> 00:03:33.410 of the more tedious or mundane database queries in a more user-m friendly way 00:03:33.410 --> 00:03:36.070 because you can just click and do things in the web browser. 00:03:36.070 --> 00:03:39.620 And those are most commonly used to build databases in the first place 00:03:39.620 --> 00:03:44.030 and to set up tables because that's the first thing you're 00:03:44.030 --> 00:03:47.522 going to do once you get your database configured, is to create a table. 00:03:47.522 --> 00:03:49.730 Because without a table, we have no rows and columns. 00:03:49.730 --> 00:03:52.430 And without rows and columns, we can't store any data. 00:03:52.430 --> 00:03:55.340 Tables have very cumbersome syntax that is used to set them up. 00:03:55.340 --> 00:04:00.200 And so using phpMyAdmin, the graphical interface, to build your table 00:04:00.200 --> 00:04:02.420 is definitely gonna come in handy. 00:04:02.420 --> 00:04:04.910 In the process of building your table, you're 00:04:04.910 --> 00:04:08.149 going to have to specify exactly which columns 00:04:08.149 --> 00:04:09.690 are going to be stored in that table. 00:04:09.690 --> 00:04:12.273 So at the very beginning, when you create your table, you say, 00:04:12.273 --> 00:04:14.850 my table's going to store usernames and passwords 00:04:14.850 --> 00:04:16.850 and whatever sort of other information you want. 00:04:16.850 --> 00:04:20.150 and you have to specify that before you have inserted any data into the table. 00:04:20.150 --> 00:04:24.410 So you have to design ahead of time what your table is going to look like. 00:04:24.410 --> 00:04:27.650 Once you've done that, pretty much every query except for, like, 00:04:27.650 --> 00:04:29.744 deleting the database and deleting the table 00:04:29.744 --> 00:04:31.910 that you're going to use on that table going forward 00:04:31.910 --> 00:04:33.620 is going to refer to data which is stored 00:04:33.620 --> 00:04:37.700 in the different rows of the table. 00:04:37.700 --> 00:04:40.310 Just like in C, every column of our SQL table 00:04:40.310 --> 00:04:43.420 is capable of holding data of different data types. 00:04:43.420 --> 00:04:48.080 So in C, for example, we had characters and strings and integers and floats. 00:04:48.080 --> 00:04:51.110 And SQL has a few more data types than that. 00:04:51.110 --> 00:04:54.030 This is just 20 of them, and this is not even an exhaustive list. 00:04:54.030 --> 00:04:55.610 But some of these things should look familiar. 00:04:55.610 --> 00:04:57.901 So, for example, we have int, which can store integers. 00:04:57.901 --> 00:05:00.830 But we also have these four other types, which can hold integers 00:05:00.830 --> 00:05:02.360 with different upper bounds. 00:05:02.360 --> 00:05:05.200 So you may recall from C that the upper bound of integer 00:05:05.200 --> 00:05:10.220 is 2 to the 31st power, or 2 to the 32nd power if they're unsigned integers. 00:05:10.220 --> 00:05:12.620 But here we can say small ints or tiny ints 00:05:12.620 --> 00:05:14.990 or medium or big ints, each of which have 00:05:14.990 --> 00:05:17.690 different upper bounds on the values. 00:05:17.690 --> 00:05:21.500 Decimal and float stand in place of double and float, 00:05:21.500 --> 00:05:25.850 which we're familiar with from C. We can also store date and time stamps 00:05:25.850 --> 00:05:27.590 in SQL databases. 00:05:27.590 --> 00:05:29.860 There's no data type for that that's native to C. 00:05:29.860 --> 00:05:32.760 But in SQL, there are a couple of different ways to do this. 00:05:32.760 --> 00:05:35.940 We can even do more exotic things like store geometry or line strings. 00:05:35.940 --> 00:05:36.830 And what are these? 00:05:36.830 --> 00:05:41.330 Well, geometry and line strings can be used to store in a SQL database 00:05:41.330 --> 00:05:45.820 a mapping out or a drawing out of an area on a map, such as using GIS data. 00:05:45.820 --> 00:05:48.440 And we can actually store that in our SQL table 00:05:48.440 --> 00:05:52.310 and recreate that exact drawing a little bit later on. 00:05:52.310 --> 00:05:55.160 Text sort of stands in the stead of strings 00:05:55.160 --> 00:06:00.200 for just arbitrarily large chunks of text. 00:06:00.200 --> 00:06:04.170 Enums are also a type that exists in C, but we don't talk about them too much 00:06:04.170 --> 00:06:08.650 in C. But really quickly, what an enum is is it is a column of your table 00:06:08.650 --> 00:06:11.645 that can be used to store a limited set of values. 00:06:11.645 --> 00:06:14.930 So, for example, I could have an enum that is called, 00:06:14.930 --> 00:06:17.034 like, favorite colors, for example. 00:06:17.034 --> 00:06:18.950 And I could specify when I'm building my table 00:06:18.950 --> 00:06:22.670 that it can only be capable of holding red, green, and blue. 00:06:22.670 --> 00:06:25.880 If you tried to insert a row that had purple in that place, 00:06:25.880 --> 00:06:29.510 that wouldn't work because it is not one of the enumerated values that 00:06:29.510 --> 00:06:31.520 can be stored in that column. 00:06:31.520 --> 00:06:34.310 There are also char and varchar. 00:06:34.310 --> 00:06:38.057 And these are not quite the same as what you might think they are in C. 00:06:38.057 --> 00:06:39.890 So let's just take a quick second to explain 00:06:39.890 --> 00:06:43.520 the difference between these two data types, which are pretty important. 00:06:43.520 --> 00:06:47.600 So unlike in C, char does not refer to a single character. 00:06:47.600 --> 00:06:51.180 It is actually sort of akin to our notion of a string, 00:06:51.180 --> 00:06:54.620 but with a caveat that that string is a fixed length. 00:06:54.620 --> 00:06:57.500 And typically when we specify a char or a varchar type, 00:06:57.500 --> 00:07:00.680 we have to specify the length of that string at the outset, 00:07:00.680 --> 00:07:02.810 just like we do in c. 00:07:02.810 --> 00:07:07.460 So, for example, my column type might be a char 10. 00:07:07.460 --> 00:07:10.940 That means that I can store 10 character strings 00:07:10.940 --> 00:07:15.005 in that column of my table, exactly 10 character strings. 00:07:15.005 --> 00:07:16.880 So if I tried to store, for example, the word 00:07:16.880 --> 00:07:21.420 "hi," H-I, which is just two letters, that would go into the column OK. 00:07:21.420 --> 00:07:24.020 But it would also store 8 extra-- 00:07:24.020 --> 00:07:28.130 the equivalent of null bytes, basically, so that I still had 10 characters, 00:07:28.130 --> 00:07:34.145 per se, in that column for that row. 00:07:34.145 --> 00:07:37.089 And if I tried to store, like, a 15-character string, 00:07:37.089 --> 00:07:37.880 that wouldn't work. 00:07:37.880 --> 00:07:40.100 I'd only end up storing the first 10 letters. 00:07:40.100 --> 00:07:43.430 So it's always going to be 10, every time. 00:07:43.430 --> 00:07:46.490 A varchar, on the other hand, refers to a variable-length string. 00:07:46.490 --> 00:07:49.490 So if I said that my column was varchar 99, 00:07:49.490 --> 00:07:54.740 I can store 1, 2, 3, 4, 5 character strings, up to 99 characters 00:07:54.740 --> 00:07:59.090 long, without having to have all of this extra sort of slack space 00:07:59.090 --> 00:08:02.360 or null bytes or zeros tacked onto the end. 00:08:02.360 --> 00:08:05.100 So char, fixed-length strings. 00:08:05.100 --> 00:08:06.557 Varchar, variable-length strings. 00:08:06.557 --> 00:08:08.390 We won't get into the difference between why 00:08:08.390 --> 00:08:09.440 you might want to use one or the other. 00:08:09.440 --> 00:08:12.315 But there are reasons why you might not always want to use a varchar, 00:08:12.315 --> 00:08:15.790 and you might want to use a char in some situations. 00:08:15.790 --> 00:08:19.840 Now, SQLite actually has many or all of these same data types. 00:08:19.840 --> 00:08:22.480 But the difference there is that each of those data types 00:08:22.480 --> 00:08:26.220 is affiliated with what's called a type affinity to simplify things. 00:08:26.220 --> 00:08:28.150 So that whole list of 20-- 00:08:28.150 --> 00:08:32.110 or more, really-- can be reduced to one of these five different affinities-- 00:08:32.110 --> 00:08:34.660 null, integer, real, text, and blob. 00:08:34.660 --> 00:08:36.700 Null is probably pretty obvious-- nothing. 00:08:36.700 --> 00:08:37.919 Integer-- whole numbers. 00:08:37.919 --> 00:08:40.929 Real-- that would include things like decimal and float. 00:08:40.929 --> 00:08:43.780 Text would include things like char and varchar. 00:08:43.780 --> 00:08:48.010 Blob is just-- that would be more like the geometry or data that isn't really 00:08:48.010 --> 00:08:51.070 text, but it's just a large number of bits or bytes. 00:08:51.070 --> 00:08:52.060 That would be a blob. 00:08:52.060 --> 00:08:55.420 And everything can reduce to one of these five different type affinities. 00:08:55.420 --> 00:08:58.720 So after specifying the columns, it's also really important in our SQL table 00:08:58.720 --> 00:09:01.480 to have one other consideration, which is to have 00:09:01.480 --> 00:09:04.150 one column which is our primary key. 00:09:04.150 --> 00:09:06.550 Why do we need a primary key? 00:09:06.550 --> 00:09:09.220 The reason is every row of our table, in order 00:09:09.220 --> 00:09:13.810 to make our SQL queries most effective, should be able to be uniquely 00:09:13.810 --> 00:09:15.520 and quickly identified. 00:09:15.520 --> 00:09:17.890 So choosing the right primary key will allow 00:09:17.890 --> 00:09:23.980 us to make sure that there is one value in every row that is completely unique. 00:09:23.980 --> 00:09:27.550 And if that is true, if there is one column in every row that's unique, 00:09:27.550 --> 00:09:30.130 then we can uniquely identify or very quickly identify 00:09:30.130 --> 00:09:32.500 which row we're talking about. 00:09:32.500 --> 00:09:35.090 Now, it's also possible to establish a joint primary key, 00:09:35.090 --> 00:09:37.241 which is just a combination of, say, two columns-- 00:09:37.241 --> 00:09:39.490 but it can be an arbitrarily large number of columns-- 00:09:39.490 --> 00:09:41.114 that is always guaranteed to be unique. 00:09:41.114 --> 00:09:44.200 So I could have one column that could always have A's or B's or C's. 00:09:44.200 --> 00:09:47.200 I could have another column that has ones and twos and threes and fours. 00:09:47.200 --> 00:09:50.440 I could have multiple A's and multiple B's and multiple C's, multiple ones, 00:09:50.440 --> 00:09:51.940 twos, and threes. 00:09:51.940 --> 00:09:54.080 But across all of those rows, I'm only ever allowed 00:09:54.080 --> 00:09:57.160 to have 1 combination of A1. 00:09:57.160 --> 00:09:58.630 I can have B1, C1. 00:09:58.630 --> 00:10:00.370 I can have A2, A3, A4. 00:10:00.370 --> 00:10:04.450 But there's only ever one combination of those two columns that is unique. 00:10:04.450 --> 00:10:08.457 That would also be acceptable because that is a joint primary key. 00:10:08.457 --> 00:10:11.290 Now, SQL is a programming language, like other programming languages 00:10:11.290 --> 00:10:12.190 that we've discussed. 00:10:12.190 --> 00:10:14.557 But it has a very limited vocabulary. 00:10:14.557 --> 00:10:17.140 Now, there are more things that you can do with SQL than we're 00:10:17.140 --> 00:10:18.820 going to talk about in this video because in this video 00:10:18.820 --> 00:10:21.191 we're just going to talk about the four operations 00:10:21.191 --> 00:10:24.190 that one can perform on a table, or the four main operations that you'll 00:10:24.190 --> 00:10:26.260 most likely perform on the table-- 00:10:26.260 --> 00:10:29.161 INSERT, SELECT, UPDATE, and DELETE. 00:10:29.161 --> 00:10:32.410 And these four things we'll get into a bit more detail in just a second about. 00:10:32.410 --> 00:10:34.630 But these are definitely the most common four things 00:10:34.630 --> 00:10:36.370 you'll be doing with the tables that you're building, 00:10:36.370 --> 00:10:39.460 certainly in CS50, but also probably more generally whenever you're 00:10:39.460 --> 00:10:41.269 working with databases. 00:10:41.269 --> 00:10:44.060 So for all of the examples that we're going to cover in this video, 00:10:44.060 --> 00:10:46.450 we're going to consider a database that contains 00:10:46.450 --> 00:10:50.530 these two tables called users and moms. 00:10:50.530 --> 00:10:54.370 And you can see users has four different columns-- idnum, username, password, 00:10:54.370 --> 00:10:55.720 and fullname. 00:10:55.720 --> 00:10:59.684 And moms has two different columns, username and mother. 00:10:59.684 --> 00:11:01.600 And let's now start to work with these and see 00:11:01.600 --> 00:11:05.860 how we can manipulate this database and the tables within it to-- 00:11:05.860 --> 00:11:09.190 for whatever reason we need them to-- be updated in our website. 00:11:09.190 --> 00:11:11.320 So the first operation we'll cover is insert. 00:11:11.320 --> 00:11:14.230 And as you might expect, it adds information to a table. 00:11:14.230 --> 00:11:17.604 Now, SQL queries have a certain structure to them. 00:11:17.604 --> 00:11:20.020 And so with each of these operations, what I'm going to do 00:11:20.020 --> 00:11:24.680 is give you the general sort of skeleton of what a query might look like. 00:11:24.680 --> 00:11:27.070 And then we'll use a couple of examples of that query 00:11:27.070 --> 00:11:29.350 to see how it affects our table. 00:11:29.350 --> 00:11:33.190 So an insert query, in general, looks something like this-- 00:11:33.190 --> 00:11:35.110 INSERT INTO table. 00:11:35.110 --> 00:11:37.690 So we specify what table we want to insert into. 00:11:37.690 --> 00:11:39.610 And then we have parentheses, columns-- which 00:11:39.610 --> 00:11:43.180 is a comma-separated list of all of the columns of our table 00:11:43.180 --> 00:11:45.730 that we want to insert data into. 00:11:45.730 --> 00:11:49.510 Then we have VALUES, and then a comma-separated list of the values 00:11:49.510 --> 00:11:53.150 that we want to put into those columns in the same order. 00:11:53.150 --> 00:11:59.110 So for example, I might want to INSERT INTO users into these three columns-- 00:11:59.110 --> 00:12:01.540 username, password, name. 00:12:01.540 --> 00:12:06.417 The values, respectively-- newman, in lowercase, USMAIL-- 00:12:06.417 --> 00:12:08.500 that's going to be corresponding to the password-- 00:12:08.500 --> 00:12:11.650 and then capital-N Newman for fullname. 00:12:11.650 --> 00:12:17.140 So assuming that I execute this query on the users table shown here, 00:12:17.140 --> 00:12:18.279 what's going to happen? 00:12:18.279 --> 00:12:19.820 Well, this is what's going to happen. 00:12:19.820 --> 00:12:24.350 We're going to add this row because we inserted into the users table. 00:12:24.350 --> 00:12:26.350 But you might be asking yourself, wait a minute. 00:12:26.350 --> 00:12:29.150 You never specified ID number. 00:12:29.150 --> 00:12:31.030 So how exactly did ID number get there? 00:12:31.030 --> 00:12:34.740 I mean, you specified newman and USMAIL and Newman. 00:12:34.740 --> 00:12:37.120 But that 12, that was never part of our query. 00:12:37.120 --> 00:12:38.064 And you're right. 00:12:38.064 --> 00:12:40.480 Because when you define the column that ultimately ends up 00:12:40.480 --> 00:12:43.414 being your table's primary key, it's usually a good idea 00:12:43.414 --> 00:12:44.830 to have that column be an integer. 00:12:44.830 --> 00:12:49.150 It's not a requirement, but it's usually a good idea. 00:12:49.150 --> 00:12:53.110 And because having a primary key is so important, 00:12:53.110 --> 00:12:55.600 you might want to specify it to autoincrement. 00:12:55.600 --> 00:12:58.450 And what this basically means is you can forget 00:12:58.450 --> 00:13:01.030 to include it as part of your query. 00:13:01.030 --> 00:13:05.080 And if you do, it will automatically insert a value there 00:13:05.080 --> 00:13:07.930 that is unique from every other value in that column, 00:13:07.930 --> 00:13:10.737 typically by just incrementing by one every time, 00:13:10.737 --> 00:13:12.820 so that that row is still guaranteed to be unique. 00:13:12.820 --> 00:13:15.170 Because if it wasn't set to autoincrement 00:13:15.170 --> 00:13:18.690 and we just forgot to omit it, there might 00:13:18.690 --> 00:13:22.830 be a couple of rows that are actually blank or null there. 00:13:22.830 --> 00:13:25.740 But if that's our primary key, if ID number is our primary key, 00:13:25.740 --> 00:13:30.040 and we have a couple of rows that are blank there, now they're not unique. 00:13:30.040 --> 00:13:31.707 We have two rows that are blank. 00:13:31.707 --> 00:13:33.040 We can't uniquely identify them. 00:13:33.040 --> 00:13:36.359 And so when you set up your primary key, you usually want it to be an integer, 00:13:36.359 --> 00:13:39.150 and you usually want it to autoincrement so that you can completely 00:13:39.150 --> 00:13:42.740 forget about having to worry about that and let the table do it for you. 00:13:42.740 --> 00:13:44.100 So that's how 12 got there. 00:13:44.100 --> 00:13:48.540 I had specified my idnum column do autoincrement every time I 00:13:48.540 --> 00:13:52.010 make an insert into the table. 00:13:52.010 --> 00:13:53.230 Let's do this another time. 00:13:53.230 --> 00:13:55.813 Let's insert into the other table, insert into the moms table, 00:13:55.813 --> 00:13:58.300 the following values into these following columns. 00:13:58.300 --> 00:14:01.980 We can INSERT INTO the username and mother columns of the users table 00:14:01.980 --> 00:14:05.069 the VALUES kramer and Babs Kramer. 00:14:05.069 --> 00:14:08.110 And just like what we saw before, this is what we would have as a result. 00:14:08.110 --> 00:14:10.068 We would just insert that row, and it would now 00:14:10.068 --> 00:14:13.449 be part of our database and part of our tables. 00:14:13.449 --> 00:14:14.740 So that's the insert operation. 00:14:14.740 --> 00:14:17.260 Now let's talk about the second operation, which is select. 00:14:17.260 --> 00:14:20.200 So if insert is used to put information into a table, 00:14:20.200 --> 00:14:23.110 select is used to get that information back out of the table 00:14:23.110 --> 00:14:24.850 so that we can do something with it. 00:14:24.850 --> 00:14:28.630 Select queries also have a very similar skeleton. 00:14:28.630 --> 00:14:29.770 And they look like this-- 00:14:29.770 --> 00:14:32.110 SELECT and then whatever columns you want 00:14:32.110 --> 00:14:35.766 to choose FROM whichever table you want to choose them from. 00:14:35.766 --> 00:14:38.140 And then optionally in red here, these two other things-- 00:14:38.140 --> 00:14:40.550 WHERE some condition is satisfied. 00:14:40.550 --> 00:14:43.900 And we usually, in the context of SQL, refer to a condition as a predicate. 00:14:43.900 --> 00:14:47.740 But basically, we're just checking to see that some situation is true. 00:14:47.740 --> 00:14:51.160 And we might want to, for example, order them by a specific column so 00:14:51.160 --> 00:14:54.940 that they are organized alphabetically by the values in some column 00:14:54.940 --> 00:14:55.750 or whatever else. 00:14:55.750 --> 00:14:58.770 We don't have to do either WHERE or ORDER BY. 00:14:58.770 --> 00:15:00.520 But generally WHERE in particular is going 00:15:00.520 --> 00:15:03.700 to be used so that you don't get your entire database back. 00:15:03.700 --> 00:15:06.580 And ORDER BY just helps to keep things a little more organized. 00:15:06.580 --> 00:15:09.670 So, for example, here's a select query that we could use-- 00:15:09.670 --> 00:15:13.712 SELECT idnum, fullname from the users table. 00:15:13.712 --> 00:15:15.420 So let's see what's going to happen here. 00:15:15.420 --> 00:15:19.690 So remember, I'm selecting idnum and fullname from the users table. 00:15:19.690 --> 00:15:22.420 Well, what is the select query going to return to me? 00:15:22.420 --> 00:15:23.647 This information here. 00:15:23.647 --> 00:15:25.480 It's going to look at each row of the table, 00:15:25.480 --> 00:15:28.960 and it's going to just pull out the ID number and the full name 00:15:28.960 --> 00:15:32.050 and give me all of those down the entire users table. 00:15:32.050 --> 00:15:34.100 My user table only consists of three rows, 00:15:34.100 --> 00:15:36.370 so it's given me three pairs of information, 00:15:36.370 --> 00:15:39.930 three idnum-fullname combinations. 00:15:39.930 --> 00:15:41.680 Now let's say I want to restrict my search 00:15:41.680 --> 00:15:43.180 a little bit, my query a little bit. 00:15:43.180 --> 00:15:48.200 SELECT password FROM users WHERE idnum is less than 12. 00:15:48.200 --> 00:15:51.704 So now I'm adding a predicate or a condition to my select query. 00:15:51.704 --> 00:15:52.870 What's going to happen here? 00:15:52.870 --> 00:15:55.480 Well, I'm going to get this information. 00:15:55.480 --> 00:15:58.990 This is just the password column from the users table 00:15:58.990 --> 00:16:02.360 where the ID number in that row is less than 12. 00:16:02.360 --> 00:16:05.830 So only rows 10 and 11's password get given back to me. 00:16:05.830 --> 00:16:09.610 12's does not because 12 is not less than 12. 00:16:09.610 --> 00:16:13.540 There's one other thing we can do with a select query, and that's to select *. 00:16:13.540 --> 00:16:17.750 SELECT * from moms where username equals jerry. 00:16:17.750 --> 00:16:20.800 * is just shorthand for every column. 00:16:20.800 --> 00:16:24.040 So instead of having to specify username, mother, the two columns here, 00:16:24.040 --> 00:16:26.740 or, if I was using the users table, having to specify idnum, 00:16:26.740 --> 00:16:29.650 username, password, fullname, I can just say SELECT *. 00:16:29.650 --> 00:16:31.680 Give me everything in that row. 00:16:31.680 --> 00:16:35.170 And if I SELECT * from moms where username equals jerry, 00:16:35.170 --> 00:16:39.096 I am given this set of information back. 00:16:39.096 --> 00:16:41.550 So here's the thing that's cool about databases, 00:16:41.550 --> 00:16:44.249 though-- we don't have to just have two tables. 00:16:44.249 --> 00:16:46.290 And in particular, we don't have to have just one 00:16:46.290 --> 00:16:50.130 table that stores every relevant piece of information about a user. 00:16:50.130 --> 00:16:53.050 If we wanted to store, in addition to what we have there, information 00:16:53.050 --> 00:16:57.754 like their address and their date of birth and their social security number 00:16:57.754 --> 00:16:59.670 or whatever else we wanted to keep about them, 00:16:59.670 --> 00:17:02.169 this table could get bigger and bigger and bigger and bigger 00:17:02.169 --> 00:17:05.760 to the point where it becomes almost annoying to have to use it. 00:17:05.760 --> 00:17:08.930 But we can use relationships between different tables. 00:17:08.930 --> 00:17:10.680 And that's where the term relational comes 00:17:10.680 --> 00:17:12.599 into play for a relational database. 00:17:12.599 --> 00:17:15.089 We can set up our tables within our databases 00:17:15.089 --> 00:17:18.900 carefully enough that we can use relationships between them 00:17:18.900 --> 00:17:22.319 to pull information from where we need it without all that information having 00:17:22.319 --> 00:17:25.270 to be located in the same table in the first place. 00:17:25.270 --> 00:17:30.480 So, for example, here is our database as we last left it, with three rows 00:17:30.480 --> 00:17:35.231 in each of the users and the moms tables. 00:17:35.231 --> 00:17:36.980 Now let's imagine a hypothetical situation 00:17:36.980 --> 00:17:41.720 where we want to pair a user's full name, which is currently only stored 00:17:41.720 --> 00:17:45.080 in the users table, with their mother's name, which 00:17:45.080 --> 00:17:47.474 is only stored in the mother table. 00:17:47.474 --> 00:17:49.140 They're not on the same table right now. 00:17:49.140 --> 00:17:52.190 So we can't use the same sort of select query that we used before. 00:17:52.190 --> 00:17:56.000 We have to use what's called a select join query. 00:17:56.000 --> 00:17:58.700 It looks pretty similar to the select query before. 00:17:58.700 --> 00:18:01.250 But we're adding one extra piece of information, which 00:18:01.250 --> 00:18:03.360 is what tables we're joining together. 00:18:03.360 --> 00:18:07.610 So I want to select a specific set of columns from one table, 00:18:07.610 --> 00:18:10.302 joining another table onto it just temporarily. 00:18:10.302 --> 00:18:11.510 It doesn't really merge them. 00:18:11.510 --> 00:18:16.700 But it creates this sort of hypothetical table that does merge them. 00:18:16.700 --> 00:18:18.679 On predicate, where predicate is-- basically, 00:18:18.679 --> 00:18:20.720 we're trying to find where the two tables overlap 00:18:20.720 --> 00:18:23.930 so that we can create this merged table just temporarily 00:18:23.930 --> 00:18:27.140 for purposes of this query, and then have them separate again. 00:18:27.140 --> 00:18:30.320 So, for example, I might have a query that looks like this-- 00:18:30.320 --> 00:18:36.620 SELECT users.fullname, comma, moms.mother FROM users joining the moms 00:18:36.620 --> 00:18:41.090 table onto it temporarily ON-- 00:18:41.090 --> 00:18:44.220 which is akin to WHERE-- 00:18:44.220 --> 00:18:48.055 users.username equals moms.username. 00:18:48.055 --> 00:18:50.180 And this syntax is also a little unfamiliar, right? 00:18:50.180 --> 00:18:53.820 I'm now prepending some of my column names with table names. 00:18:53.820 --> 00:18:56.570 The reason I'm doing this is particularly because of the last line 00:18:56.570 --> 00:19:03.050 there, where in each of those two tables I had a column called username. 00:19:03.050 --> 00:19:05.180 I need to disambiguate between them. 00:19:05.180 --> 00:19:07.640 So I need to specify what table and what column. 00:19:07.640 --> 00:19:10.931 That's all that's happening here, is I'm being very explicit about which column 00:19:10.931 --> 00:19:15.240 I want by specifying not only the column name, but also the table name. 00:19:15.240 --> 00:19:19.400 So here is what would happen if I executed this select query. 00:19:19.400 --> 00:19:21.980 Here are our two tables at the outset. 00:19:21.980 --> 00:19:25.520 This is all of the information that I'm trying to get to start with. 00:19:25.520 --> 00:19:31.970 I'm going to join these two tables together where where users.username 00:19:31.970 --> 00:19:34.130 equals moms.username. 00:19:34.130 --> 00:19:35.730 Newman does not appear in both tables. 00:19:35.730 --> 00:19:38.600 So I'm not going to even come close to extracting that piece of information. 00:19:38.600 --> 00:19:41.420 Kramer does not appear in both tables, so that one's out too. 00:19:41.420 --> 00:19:45.050 But jerry and gcostanza both appear in both tables. 00:19:45.050 --> 00:19:47.990 And so that is sort of the overlap point. 00:19:47.990 --> 00:19:50.810 And you can think about this as if we have 00:19:50.810 --> 00:19:54.600 our users table, which is like one circle over here, 00:19:54.600 --> 00:19:56.699 and our moms table, which is one circle over here. 00:19:56.699 --> 00:19:59.240 When we're joining, it's sort of like if it's a Venn diagram. 00:19:59.240 --> 00:20:01.864 We're trying to figure out what those two tables have in common 00:20:01.864 --> 00:20:04.805 and using that as, like, our anchor point for the join tables. 00:20:04.805 --> 00:20:05.930 So that's what we did here. 00:20:05.930 --> 00:20:08.660 We found where the two tables have data in common. 00:20:08.660 --> 00:20:12.200 And based on that, we create this hypothetical table 00:20:12.200 --> 00:20:15.800 called users & moms that contains all of the information that 00:20:15.800 --> 00:20:18.740 appears in both of those tables. 00:20:18.740 --> 00:20:22.490 But notice that the second column there sort of indicates the overlap. 00:20:22.490 --> 00:20:27.489 This column refers to data that is in both users.username and moms.username. 00:20:27.489 --> 00:20:29.030 So that's what we're extracting here. 00:20:29.030 --> 00:20:31.363 But remember that this query, we didn't want everything. 00:20:31.363 --> 00:20:35.947 It wasn't SELECT * FROM users JOIN moms WHERE users.username equals 00:20:35.947 --> 00:20:36.530 moms.username. 00:20:36.530 --> 00:20:40.650 Username We just want the full name and the mother's name. 00:20:40.650 --> 00:20:42.200 And so that's what we extract. 00:20:42.200 --> 00:20:44.210 And so that really long query just results 00:20:44.210 --> 00:20:48.650 in us getting this set of information-- users.fullname and moms.mother 00:20:48.650 --> 00:20:52.920 for every row in both tables where the username happens to be the same. 00:20:52.920 --> 00:20:53.532 All right. 00:20:53.532 --> 00:20:54.990 So that was a bit more complicated. 00:20:54.990 --> 00:20:56.540 You probably won't do too many joins. 00:20:56.540 --> 00:21:00.210 Fortunately, the last two operations are a little bit easier. 00:21:00.210 --> 00:21:02.510 So update is the third of the four major operations. 00:21:02.510 --> 00:21:04.850 This is what we use to modify information in the table 00:21:04.850 --> 00:21:06.901 or change it or update information. 00:21:06.901 --> 00:21:08.150 The skeleton looks like this-- 00:21:08.150 --> 00:21:12.740 UPDATE table SET column equals value-- so we're updating some column to have 00:21:12.740 --> 00:21:14.510 a new piece of data in it-- 00:21:14.510 --> 00:21:15.459 WHERE predicates. 00:21:15.459 --> 00:21:17.750 We're trying to figure out which row we want to update, 00:21:17.750 --> 00:21:19.800 and then we're updating that column. 00:21:19.800 --> 00:21:22.550 So, for example, I want to UPDATE users SET password equals 00:21:22.550 --> 00:21:25.566 yadayada WHERE idnum equals 10. 00:21:25.566 --> 00:21:27.690 You can probably guess what's going to happen here. 00:21:27.690 --> 00:21:29.773 We're going to find the row whose ID number is 10, 00:21:29.773 --> 00:21:34.520 and we're going update the password there from "fusilli" to "yadayada." 00:21:34.520 --> 00:21:35.630 Pretty straightforward. 00:21:35.630 --> 00:21:37.970 And the fourth and final operation that we do in SQL 00:21:37.970 --> 00:21:41.120 most commonly is to delete, which strikes or completely removes 00:21:41.120 --> 00:21:42.972 information from a table. 00:21:42.972 --> 00:21:44.930 The basic framework looks something like this-- 00:21:44.930 --> 00:21:48.140 DELETE from some table WHERE predicate, or, again, 00:21:48.140 --> 00:21:50.160 where some condition is satisfied. 00:21:50.160 --> 00:21:53.360 So, for example, you might want to DELETE from our users table WHERE 00:21:53.360 --> 00:21:54.994 the user name equals 'newman.' 00:21:54.994 --> 00:21:57.410 Now, if we did that, and this was our table to start with, 00:21:57.410 --> 00:21:59.326 you can probably guess what's going to happen. 00:21:59.326 --> 00:22:02.420 After that query executes, the row whose username is newman 00:22:02.420 --> 00:22:06.020 is no longer in our table. 00:22:06.020 --> 00:22:09.170 Now, all of the operations that we just showed 00:22:09.170 --> 00:22:12.924 are really easy to do in the graphical interface of phpMyAdmin, 00:22:12.924 --> 00:22:15.090 which, again, comes with most implementations of SQL 00:22:15.090 --> 00:22:16.574 that you might download. 00:22:16.574 --> 00:22:18.740 Here's the catch, though-- phpMyAdmin, despite being 00:22:18.740 --> 00:22:22.640 very user-friendly to use, requires manual intervention. 00:22:22.640 --> 00:22:25.640 You're going to have to log in and click on some buttons 00:22:25.640 --> 00:22:27.740 to delete information or update it or change it. 00:22:27.740 --> 00:22:29.009 And we don't want to do that. 00:22:29.009 --> 00:22:30.050 We're a webmaster, right? 00:22:30.050 --> 00:22:31.970 We want things to happen automatically for us. 00:22:31.970 --> 00:22:35.147 We want programs that we write to make those changes for us. 00:22:35.147 --> 00:22:37.730 So we don't want to go into phpMyAdmin and make those changes. 00:22:37.730 --> 00:22:40.490 We have to write code that does it for us. 00:22:40.490 --> 00:22:43.430 Fortunately, SQL integrates really nicely 00:22:43.430 --> 00:22:46.760 with a lot of modern programming languages such as Python or PHP 00:22:46.760 --> 00:22:50.570 that have functions that you can use to connect to your database. 00:22:50.570 --> 00:22:55.040 And then the programming language has other functions 00:22:55.040 --> 00:22:58.220 that will query the database for you and make those changes. 00:22:58.220 --> 00:23:01.430 Now, we're going to leave off for now exactly how those languages do that. 00:23:01.430 --> 00:23:04.007 We'll save that for a video on those languages themselves. 00:23:04.007 --> 00:23:05.840 But know that that is something that you can 00:23:05.840 --> 00:23:08.840 do to avoid having to do any sort of manual intervention 00:23:08.840 --> 00:23:13.960 when you want to update or do any sort of operation at all on your databases. 00:23:13.960 --> 00:23:15.220 My name is Doug Lloyd. 00:23:15.220 --> 00:23:20.220 This is CS50.