WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:02.622 --> 00:00:06.640 CARTER ZENKE: Well, welcome everyone to our week seven, CS50 section. 00:00:06.640 --> 00:00:07.680 My name is Carter Zenke. 00:00:07.680 --> 00:00:09.780 I'm the Course Preceptor here on campus. 00:00:09.780 --> 00:00:13.410 And the goal of these sections is really to help bridge the gap between lecture 00:00:13.410 --> 00:00:14.890 and this week's problem set. 00:00:14.890 --> 00:00:17.790 So the goal of the day is to dive into more SQL 00:00:17.790 --> 00:00:21.120 to give you more practice writing queries, working with databases, 00:00:21.120 --> 00:00:22.912 and hopefully, at the very end, you'll feel 00:00:22.912 --> 00:00:25.740 prepared to tackle this week's problem set. 00:00:25.740 --> 00:00:28.870 Now actually, this week seven, is one of my favorite weeks in the course. 00:00:28.870 --> 00:00:31.920 And I like this week so much because it's all about data, 00:00:31.920 --> 00:00:36.140 and data is often about real people, real things in the real world. 00:00:36.140 --> 00:00:38.640 So it's like really good connection between computer science 00:00:38.640 --> 00:00:40.800 and how things actually work in the real world. 00:00:40.800 --> 00:00:43.343 It kind of brings these questions about how we design things, 00:00:43.343 --> 00:00:44.760 makes them all the more important. 00:00:44.760 --> 00:00:46.570 Because we're actually going to talk about real people, 00:00:46.570 --> 00:00:48.570 real students, real things in this world, 00:00:48.570 --> 00:00:51.960 and we're going to represent inside of our own programs now. 00:00:51.960 --> 00:00:55.200 So thinking about data and representing real people, 00:00:55.200 --> 00:00:58.232 let's actually take a look into a fantasy world, one in Harry Potter. 00:00:58.232 --> 00:00:59.940 And if you're familiar with Harry Potter, 00:00:59.940 --> 00:01:03.150 maybe you're familiar with the Hall of Prophecy, 00:01:03.150 --> 00:01:06.912 which is this place where all of the prophecies go to, live, and reside. 00:01:06.912 --> 00:01:09.870 The place of we're they're going to store lots of different prophecies. 00:01:09.870 --> 00:01:13.260 And maybe be one of these prophecies is this place of where 00:01:13.260 --> 00:01:17.310 Hogwarts students, people who go to the Hogwarts School of Magic, will be 00:01:17.310 --> 00:01:20.730 could have housed, where they're going to actually live during their time 00:01:20.730 --> 00:01:21.450 at Hogwarts. 00:01:21.450 --> 00:01:25.620 So, let's think about how we might store a database of Hogwarts students 00:01:25.620 --> 00:01:27.882 inside of the different houses of Hogwarts. 00:01:27.882 --> 00:01:30.840 And if you're familiar, there's like Slytherin, Gryffindor, Hufflepuff, 00:01:30.840 --> 00:01:31.800 and Ravenclaw. 00:01:31.800 --> 00:01:34.390 And all these students live in these various houses here. 00:01:34.390 --> 00:01:36.690 So if we think of a database, we think of this space 00:01:36.690 --> 00:01:40.570 that is maybe currently empty, we could fill it with some things. 00:01:40.570 --> 00:01:43.500 So let's go ahead and call this space, for the sake of our discussion 00:01:43.500 --> 00:01:45.330 here, as roster.db. 00:01:45.330 --> 00:01:47.380 We're going to have some students in here, 00:01:47.380 --> 00:01:51.370 and they're going to go into different houses inside of this database. 00:01:51.370 --> 00:01:55.410 So all databases have some tables, some places that we're 00:01:55.410 --> 00:01:57.220 going to store data inside of them. 00:01:57.220 --> 00:02:00.320 And in this case, let's say we have this table that looks a bit like this. 00:02:00.320 --> 00:02:03.150 And if you can see that, we see we have Adelaide Murton, who's 00:02:03.150 --> 00:02:04.680 in the house for Slytherin. 00:02:04.680 --> 00:02:08.699 And the head of the Slytherin house is a professor named Severus Snape. 00:02:08.699 --> 00:02:10.620 And down below and kind of row three here, 00:02:10.620 --> 00:02:13.080 we have Anthony Goldstein, who's in Ravenclaw. 00:02:13.080 --> 00:02:17.010 And the leader, the head of Ravenclaw, is Filius Flitwick, a professor 00:02:17.010 --> 00:02:18.780 at Hogwarts School of Magic. 00:02:18.780 --> 00:02:20.580 So here, we have a table. 00:02:20.580 --> 00:02:23.150 And often, in our databases, tables are named. 00:02:23.150 --> 00:02:25.320 So let's maybe call this one students. 00:02:25.320 --> 00:02:27.210 And now, we have our full database. 00:02:27.210 --> 00:02:29.970 We have a name for our database, roster.db, 00:02:29.970 --> 00:02:33.460 and we have a table inside of it, this one called students. 00:02:33.460 --> 00:02:36.210 But what we might want to do with this table? 00:02:36.210 --> 00:02:39.130 What are some things we might want to actually do to this data. 00:02:39.130 --> 00:02:42.930 If want to chime in and chat, what's the point of storing data in this way? 00:02:42.930 --> 00:02:44.040 What could we do with it? 00:02:48.270 --> 00:02:48.770 Yeah. 00:02:48.770 --> 00:02:49.910 So I'm seeing somebody messaging me. 00:02:49.910 --> 00:02:51.785 So you could maybe actually try to figure out 00:02:51.785 --> 00:02:53.500 what else is inside of this table. 00:02:53.500 --> 00:02:54.620 You query the database. 00:02:54.620 --> 00:02:57.005 We could try to ask what's inside of it. 00:02:57.005 --> 00:02:58.880 And to do that, we can actually use these SQL 00:02:58.880 --> 00:03:00.560 keywords that begins with a select. 00:03:00.560 --> 00:03:02.940 How do we select data from this table? 00:03:02.940 --> 00:03:06.420 So let's dive into selecting data from tables as we saw in lecture. 00:03:06.420 --> 00:03:08.700 Let's go a bit more deeply into it this time. 00:03:08.700 --> 00:03:13.700 So if I wanted to select all the data from this particular table, 00:03:13.700 --> 00:03:17.642 does anyone know what kind of query I could use for this? 00:03:17.642 --> 00:03:18.600 I'm seeing in the chat. 00:03:18.600 --> 00:03:20.183 I'm seeing some select, which is good. 00:03:20.183 --> 00:03:22.382 I'm seeing some stars, which is also good. 00:03:22.382 --> 00:03:24.090 And now, let's go ahead and try this out. 00:03:24.090 --> 00:03:31.280 So if I go to my own VS code over here, I can see that I have my own roster.db. 00:03:31.280 --> 00:03:34.550 So I'll go ahead and open this up with SQLite, this kind of database engine 00:03:34.550 --> 00:03:37.060 we can use to execute SQL queries on this database, 00:03:37.060 --> 00:03:39.710 so I'll do SQLite3 roster.db. 00:03:39.710 --> 00:03:42.290 And now, just to show you, if I type dot schema 00:03:42.290 --> 00:03:44.840 to see the kinds of tables I have inside the database, 00:03:44.840 --> 00:03:47.120 I see I have this table called students. 00:03:47.120 --> 00:03:49.580 It has the very same attributes or columns 00:03:49.580 --> 00:03:53.040 we saw earlier in our visual back here. 00:03:53.040 --> 00:03:56.660 So, the hypothesis right now is we could kind of select everything 00:03:56.660 --> 00:03:59.600 in this table using this syntax I see from the chat. 00:03:59.600 --> 00:04:02.060 We could say SELECT star. 00:04:02.060 --> 00:04:05.560 We can select everything from this students table. 00:04:05.560 --> 00:04:06.810 So let's go ahead and do that. 00:04:06.810 --> 00:04:10.393 And we end all of our queries in SQL with a semicolon. 00:04:10.393 --> 00:04:11.810 So this is the end of our thought. 00:04:11.810 --> 00:04:12.920 This is our entire query. 00:04:12.920 --> 00:04:15.410 Select everything from this students table. 00:04:15.410 --> 00:04:16.850 I'll hit Enter now. 00:04:16.850 --> 00:04:19.640 And now, I can see kind of, if I zoom back out, 00:04:19.640 --> 00:04:22.530 all of the students who are inside of this roster. 00:04:22.530 --> 00:04:24.950 So if you scroll through, we have about 40 students here, 00:04:24.950 --> 00:04:26.580 and we see some of that very same data. 00:04:26.580 --> 00:04:30.350 So we see that Anthony Goldstein is in Ravenclaw, led by Filius Flitwick. 00:04:30.350 --> 00:04:34.558 We've got in all data from inside this table, so that's pretty good. 00:04:34.558 --> 00:04:35.850 Let's create a visual for this. 00:04:35.850 --> 00:04:38.840 So if you said we had this roster here with a table 00:04:38.840 --> 00:04:41.930 called students, if we do SELECT star FROM students, what we'll see 00:04:41.930 --> 00:04:44.660 is the entire table will select everything 00:04:44.660 --> 00:04:48.090 in this kind of yellow highlight here. 00:04:48.090 --> 00:04:51.410 But we could also get more specific, because it's often not 00:04:51.410 --> 00:04:52.700 good to select everything. 00:04:52.700 --> 00:04:54.825 Often we care about some particular set of students 00:04:54.825 --> 00:04:56.670 or particular column in general. 00:04:56.670 --> 00:05:00.350 So let's say I just wanted the student names, like how can I 00:05:00.350 --> 00:05:03.920 get student names from this list, disregarding the house and the head 00:05:03.920 --> 00:05:06.290 and whoever else is in this database? 00:05:06.290 --> 00:05:08.180 How could I just get student names? 00:05:11.250 --> 00:05:14.130 I'm seeing some WHERE, and that is true. 00:05:14.130 --> 00:05:15.970 We could use WHERE for other cases. 00:05:15.970 --> 00:05:21.410 But here, I just want to get all of the student names from this roster. 00:05:21.410 --> 00:05:21.910 Nice. 00:05:21.910 --> 00:05:24.327 So I'm seeing some syntax here that looks a bit like this. 00:05:24.327 --> 00:05:27.530 We could say SELECT student_name FROM students. 00:05:27.530 --> 00:05:29.900 So often, we use select. 00:05:29.900 --> 00:05:32.623 After we say SELECT, we have a column name, 00:05:32.623 --> 00:05:35.040 like the column we want to actually select from our table. 00:05:35.040 --> 00:05:37.730 So in this case, if we looked at student's table here, 00:05:37.730 --> 00:05:40.410 we have a column called student_name. 00:05:40.410 --> 00:05:44.120 So if you want to select that column, we could say SELECT student_name FROM 00:05:44.120 --> 00:05:45.600 this table called student. 00:05:45.600 --> 00:05:50.090 So if we go back to our SQLite prompt here, we actually type in SQL commands. 00:05:50.090 --> 00:05:55.670 Only we now type SELECT student_name FROM students. 00:05:55.670 --> 00:05:59.910 And what should I end this with now, because it's a SQL query? 00:05:59.910 --> 00:06:00.460 A semicolon. 00:06:00.460 --> 00:06:00.960 Good. 00:06:00.960 --> 00:06:03.180 So I'll hit semicolon here, and I'll hit Enter. 00:06:03.180 --> 00:06:06.690 And now, notice how my table just has that single column. 00:06:06.690 --> 00:06:10.500 It doesn't have every column now, but just that student_name column. 00:06:10.500 --> 00:06:11.970 So often, we're using SELECT. 00:06:11.970 --> 00:06:16.120 If we want to include the column name, we're selecting from our table. 00:06:16.120 --> 00:06:18.810 And if we just want everything, we can say star. 00:06:18.810 --> 00:06:24.010 SELECT star, meaning select everything from our given table. 00:06:24.010 --> 00:06:28.660 Now, to kind of focus in on getting particular students, 00:06:28.660 --> 00:06:31.780 let's say we cared not just about every student's name, 00:06:31.780 --> 00:06:34.570 but only those who are in a particular house. 00:06:34.570 --> 00:06:38.530 If you go back to our table here, we have student names, houses, and heads. 00:06:38.530 --> 00:06:41.590 So maybe we care about the students who are in Slytherin house. 00:06:41.590 --> 00:06:44.800 How could we find the students who are in that house? 00:06:44.800 --> 00:06:45.430 Any ideas? 00:06:49.870 --> 00:06:54.560 We just care now about students who are in Slytherin. 00:06:54.560 --> 00:06:55.060 Yeah. 00:06:55.060 --> 00:06:58.100 So here, we can bring back that WHERE clause that we saw before. 00:06:58.100 --> 00:07:01.240 So, if we want not just an entire column of students 00:07:01.240 --> 00:07:05.320 as we see here but some rows, depending on a certain condition 00:07:05.320 --> 00:07:07.540 in another column, we can use WHERE. 00:07:07.540 --> 00:07:12.110 So WHERE will give us back only certain rows from our table. 00:07:12.110 --> 00:07:12.860 So let's try this. 00:07:12.860 --> 00:07:15.460 We could say, let's SELECT student_names. 00:07:15.460 --> 00:07:19.180 Let's get that student name column from the student's table now, 00:07:19.180 --> 00:07:21.610 but let's go ahead and get only those students where 00:07:21.610 --> 00:07:26.330 the house is equal to what we're going to call Slytherin in this case. 00:07:26.330 --> 00:07:29.210 So if we go back to our table now and I type this query, 00:07:29.210 --> 00:07:33.640 I could say SELECT student_name, and I can type this all in one line. 00:07:33.640 --> 00:07:37.120 For the sake of argument here, I could say FROM students. 00:07:37.120 --> 00:07:42.280 And let's do WHERE house is Slytherin. 00:07:42.280 --> 00:07:43.880 OK, my query will wrap here. 00:07:43.880 --> 00:07:46.840 So we kind of go around the edge, but it's all one line, theoretically. 00:07:46.840 --> 00:07:47.770 Semicolon here. 00:07:47.770 --> 00:07:49.380 What do we see? 00:07:49.380 --> 00:07:53.470 We see only the students who are inside of Slytherin. 00:07:53.470 --> 00:07:56.410 And just to prove this, let's go ahead and actually update our query 00:07:56.410 --> 00:07:58.800 and let's show you how could do it on multiple lines. 00:07:58.800 --> 00:08:03.570 So we can say SELECT student_name and house. 00:08:03.570 --> 00:08:05.640 We could say student_name and house. 00:08:05.640 --> 00:08:08.670 We want this in the same query here separated by comma. 00:08:08.670 --> 00:08:12.840 Well, let's get this FROM our students table. 00:08:12.840 --> 00:08:17.490 And let's say we only want the rows where that house column is 00:08:17.490 --> 00:08:19.720 going to be Slytherin in this case. 00:08:19.720 --> 00:08:22.020 So I'll do a semicolon here and I'll hit Enter. 00:08:22.020 --> 00:08:26.550 And now, I do see that these same student names do belong to Slytherin. 00:08:26.550 --> 00:08:29.010 So with SELECT, we can always try to select 00:08:29.010 --> 00:08:32.159 not just a single column, but also multiple columns separated 00:08:32.159 --> 00:08:35.789 by commas here. 00:08:35.789 --> 00:08:39.030 Now, what questions are there so far on these SELECT, or does 00:08:39.030 --> 00:08:41.669 it seem pretty straightforward so far? 00:08:41.669 --> 00:08:42.960 Any questions in the chat? 00:08:47.760 --> 00:08:49.830 OK, not seeing any right now. 00:08:49.830 --> 00:08:52.090 But let's try to go a little more advanced now. 00:08:52.090 --> 00:08:54.000 So we've seen how we can select some data. 00:08:54.000 --> 00:08:58.510 We've seen how we can get different rows back from our table. 00:08:58.510 --> 00:09:00.690 But what if we're not kind of sure exactly what we 00:09:00.690 --> 00:09:02.470 want to look for in our table? 00:09:02.470 --> 00:09:06.180 Maybe we want to get just the students whose last name is Potter. 00:09:06.180 --> 00:09:09.380 Well, how could we do that? 00:09:09.380 --> 00:09:11.490 Any ideas what kind of clause we could use, 00:09:11.490 --> 00:09:13.995 if you want to find the students whose name end with Potter? 00:09:17.100 --> 00:09:18.660 Somebody said we should use LIKE. 00:09:18.660 --> 00:09:21.420 And as we saw in lecture, LIKE is this query 00:09:21.420 --> 00:09:23.700 that kind of lets us do a fuzzy match, meaning 00:09:23.700 --> 00:09:26.550 that we get all the rows that kind of roughly 00:09:26.550 --> 00:09:31.750 match some clause or some string we might give to our query here. 00:09:31.750 --> 00:09:36.508 So to do this, let's go ahead and look at our table again. 00:09:36.508 --> 00:09:38.550 For Slytherin, we got kind of a result like this. 00:09:38.550 --> 00:09:40.775 But now, let's look at how we could use it, LIKE. 00:09:40.775 --> 00:09:43.650 So if we're interested in students whose name ends with Harry Potter, 00:09:43.650 --> 00:09:45.240 well, we could do a query like this. 00:09:45.240 --> 00:09:49.020 We could say, SELECT student_names, obviously, FROM our student table. 00:09:49.020 --> 00:09:51.540 And now, we're only going to take the student 00:09:51.540 --> 00:09:58.350 names that are like Potter with this kind of percent sign in front. 00:09:58.350 --> 00:10:01.170 And I want to recall what that percent sign is doing for us here, 00:10:01.170 --> 00:10:03.940 if you want to chime in the chat. 00:10:03.940 --> 00:10:07.480 Why can we not just say LIKE Potter, and why do I 00:10:07.480 --> 00:10:09.340 have to say LIKE percent Potter? 00:10:12.550 --> 00:10:16.300 Yes, I'm seeing that people are calling this a wild card character. 00:10:16.300 --> 00:10:19.660 And basically, this percent sign says that I 00:10:19.660 --> 00:10:24.040 can match any character ahead of this string Potter. 00:10:24.040 --> 00:10:25.720 It could be Harry Potter. 00:10:25.720 --> 00:10:26.950 It could be Lily Potter. 00:10:26.950 --> 00:10:30.092 It could be, really, any name as long as it ends with Potter. 00:10:30.092 --> 00:10:32.050 And putting that percent sign at the beginning, 00:10:32.050 --> 00:10:36.310 well, that says that I can have any characters I want before Potter. 00:10:36.310 --> 00:10:40.900 And someone can maybe just have Harry and then percent sign after Harry, 00:10:40.900 --> 00:10:44.290 and that would give me all of the names that begin with Harry in this case. 00:10:44.290 --> 00:10:46.040 So let's go ahead and take a look at this. 00:10:46.040 --> 00:10:49.990 We can say SELECT student_name now, and we 00:10:49.990 --> 00:10:54.130 want to take student name FROM our students table. 00:10:54.130 --> 00:10:56.110 And then after this, let's say, OK, we want 00:10:56.110 --> 00:11:01.270 to get the student names that are LIKE, not equal to but LIKE, 00:11:01.270 --> 00:11:05.500 this string we're going to give called Potter, percent Potter. 00:11:05.500 --> 00:11:09.160 Meaning, we can get any character before Potter as long as at the end, 00:11:09.160 --> 00:11:12.370 we see Potter exactly, so I'll hit Enter now. 00:11:12.370 --> 00:11:15.733 And we see that there's really only one student whose name ends with Potter, 00:11:15.733 --> 00:11:16.900 which is Harry James Potter. 00:11:16.900 --> 00:11:19.733 But you might also see other characters in the Harry Potter universe 00:11:19.733 --> 00:11:22.600 whose names also end in Potter too. 00:11:22.600 --> 00:11:26.110 Now, maybe a better question here, to get more students in this query, 00:11:26.110 --> 00:11:26.800 is this one. 00:11:26.800 --> 00:11:30.370 So we could say, find the names and houses 00:11:30.370 --> 00:11:34.510 of students whose names begin with H. So let's try this one. 00:11:34.510 --> 00:11:37.870 We're trying to find students whose names begin with H. 00:11:37.870 --> 00:11:42.040 So we want to select again names. 00:11:42.040 --> 00:11:44.460 So I'll say, OK, I want to SELECT a student_name 00:11:44.460 --> 00:11:48.510 from my student_name column, and I also want to select the house. 00:11:48.510 --> 00:11:51.060 So I'll say comma house, and I'll get back now 00:11:51.060 --> 00:11:54.390 in this query student_name and house. 00:11:54.390 --> 00:11:57.440 Where am I going to get this data from though? 00:11:57.440 --> 00:11:58.580 Any ideas? 00:11:58.580 --> 00:12:00.830 Where am I selecting from in this case? 00:12:00.830 --> 00:12:01.610 In the chat? 00:12:01.610 --> 00:12:02.360 From students. 00:12:02.360 --> 00:12:05.160 So I'll say FROM, students here. 00:12:05.160 --> 00:12:10.490 Now to finish this off, I only want the students whose names begin with H. 00:12:10.490 --> 00:12:14.630 So I could say WHERE to kind of subset my rows, 00:12:14.630 --> 00:12:18.030 get a certain number of rows who only meet a certain condition. 00:12:18.030 --> 00:12:20.880 And now, the column I care about, the student_name. 00:12:20.880 --> 00:12:24.980 And I'm seeing in the chat, I could say, not equals, not particular H here. 00:12:24.980 --> 00:12:26.730 I don't want students whose name literally 00:12:26.730 --> 00:12:33.360 is H. I want students whose name is LIKE H and then any other characters that 00:12:33.360 --> 00:12:34.680 come after it. 00:12:34.680 --> 00:12:36.180 So I could hit Enter here. 00:12:36.180 --> 00:12:38.780 And now, I'll do this. 00:12:38.780 --> 00:12:41.340 Now, we see we have Hannah Abbott, whose name begins 00:12:41.340 --> 00:12:44.160 with H, Harry, and Hermione also begins with H. 00:12:44.160 --> 00:12:47.310 And they're H in Hufflepuff for Hannah, and Harry and Hermione 00:12:47.310 --> 00:12:49.470 are in Gryffindor. 00:12:49.470 --> 00:12:51.060 A question, is this case sensitive? 00:12:51.060 --> 00:12:52.117 It's a good question. 00:12:52.117 --> 00:12:53.200 Let's go ahead and try it. 00:12:53.200 --> 00:12:57.060 So we could say, SELECT student_name, and we're 00:12:57.060 --> 00:13:02.340 going to select also the houses FROM the students table, where the student 00:13:02.340 --> 00:13:06.510 name is LIKE, and we saw capital H beforehand, like this. 00:13:06.510 --> 00:13:09.060 But now, let's try a lowercase h and see if we 00:13:09.060 --> 00:13:12.060 get any students here or any different results, now that it's lowercase. 00:13:12.060 --> 00:13:15.220 So let's hit Enter now, and we still get the very same results. 00:13:15.220 --> 00:13:16.530 So what do you think? 00:13:16.530 --> 00:13:18.570 Is LIKE case sensitive? 00:13:18.570 --> 00:13:21.600 Does it matter what case I use? 00:13:21.600 --> 00:13:22.620 No, it doesn't seem to. 00:13:22.620 --> 00:13:25.620 I could use the capital H. I could say the lowercase h 00:13:25.620 --> 00:13:28.120 and I get the very same results here. 00:13:28.120 --> 00:13:30.350 So that's a great question. 00:13:30.350 --> 00:13:32.720 And now, I'm seeing another question. 00:13:32.720 --> 00:13:36.170 We need to use keywords like SELECT, like WHERE in all caps, 00:13:36.170 --> 00:13:37.080 and why is that so? 00:13:37.080 --> 00:13:40.130 So here, you'll see that I'm using SELECT in all caps, 00:13:40.130 --> 00:13:43.160 I'm using FROM in all caps, and using WHERE in all caps, 00:13:43.160 --> 00:13:44.240 and LIKE in all caps. 00:13:44.240 --> 00:13:47.510 And the convention here is, is that if you're 00:13:47.510 --> 00:13:52.880 typing in a SQL keyword, some kind of command to SQL, like SELECT or FROM 00:13:52.880 --> 00:13:55.610 or WHERE or LIKE, well, you should capitalize 00:13:55.610 --> 00:13:58.560 that to denote it as a SQL keyword. 00:13:58.560 --> 00:14:02.840 I can keep my column names though or my strings in lowercase 00:14:02.840 --> 00:14:04.580 because if I look at my table here, well, 00:14:04.580 --> 00:14:08.490 these literally are in lowercase, student_name, house is lowercase. 00:14:08.490 --> 00:14:11.330 So I want to make sure that casing matches there. 00:14:11.330 --> 00:14:14.120 But for a SQL query, I should, by convention, sort of 00:14:14.120 --> 00:14:16.407 capitalize these keywords. 00:14:16.407 --> 00:14:17.490 I could very much do this. 00:14:17.490 --> 00:14:20.790 I could say SELECT star FROM students and I 00:14:20.790 --> 00:14:23.030 would get the very same results if I zoom out a bit, 00:14:23.030 --> 00:14:24.500 but it's just not quite as clean. 00:14:24.500 --> 00:14:25.620 It's hard to tell. 00:14:25.620 --> 00:14:30.650 Let's say, if I do this, select student_name, house from students. 00:14:30.650 --> 00:14:33.110 It's kind of hard to tell where my keywords begin 00:14:33.110 --> 00:14:36.060 and where my column names actually join in, 00:14:36.060 --> 00:14:38.310 like it's hard to tell which what thing is which here. 00:14:38.310 --> 00:14:41.720 So always capitalize your SQL keywords as you go through. 00:14:44.160 --> 00:14:44.660 Great. 00:14:44.660 --> 00:14:46.243 Another question, when do we use LIKE? 00:14:46.243 --> 00:14:48.778 So as we've seen before here, we use LIKE-- 00:14:48.778 --> 00:14:50.570 let me actually just finish the query here. 00:14:50.570 --> 00:14:54.140 We use LIKE when we want to actually fuzzy match something, kind 00:14:54.140 --> 00:14:55.650 of roughly match something. 00:14:55.650 --> 00:14:59.750 So if I know that I want students whose name begins with H but isn't exactly H, 00:14:59.750 --> 00:15:03.700 LIKE is a good use case for that here. 00:15:03.700 --> 00:15:06.955 Other questions on selecting then? 00:15:13.565 --> 00:15:15.565 I'm just going through the chat here to be sure. 00:15:21.520 --> 00:15:26.340 All right, so I think that's most of our questions on this. 00:15:26.340 --> 00:15:28.090 So there's a main question about, is there 00:15:28.090 --> 00:15:29.950 a way to make search case sensitive? 00:15:29.950 --> 00:15:30.520 There is. 00:15:30.520 --> 00:15:35.710 So we saw before that LIKE is generally not case sensitive, but let's try this. 00:15:35.710 --> 00:15:39.850 If I do a search for Harry James Potter, who we know is in our database, 00:15:39.850 --> 00:15:45.690 I could do SELECT, the student_name, FROM students, 00:15:45.690 --> 00:15:51.360 WHERE the student_name is exactly Harry James Potter, 00:15:51.360 --> 00:15:53.670 and I'll hit semicolon here. 00:15:53.670 --> 00:15:56.400 And I should get, well, none other than Harry James Potter. 00:15:56.400 --> 00:16:02.250 But what if I did this, what if I said SELECT student_name, FROM students, 00:16:02.250 --> 00:16:07.530 WHERE the student_name is now all lowercase harry james potter, 00:16:07.530 --> 00:16:11.310 hit Enter now, and no one seems to match that exactly. 00:16:11.310 --> 00:16:14.520 So equals, when you're using that, is case sensitive, 00:16:14.520 --> 00:16:16.875 but LIKE, by definition, is not. 00:16:21.380 --> 00:16:24.800 So that seems to be most of what you might want to use our selecting for. 00:16:24.800 --> 00:16:27.590 We can use WHERE and we can use LIKE to kind of narrow 00:16:27.590 --> 00:16:29.840 our matches from our table. 00:16:29.840 --> 00:16:31.980 But what else could we do with these queries? 00:16:31.980 --> 00:16:35.360 So one thing I want to do, as we saw in lecture, is kind of order 00:16:35.360 --> 00:16:37.610 our students, put them in like alphabetical order 00:16:37.610 --> 00:16:40.020 or kind of roster order and so on. 00:16:40.020 --> 00:16:42.860 And we can use some new school keywords for that exactly. 00:16:42.860 --> 00:16:47.100 So we can not just select, we can also order our data. 00:16:47.100 --> 00:16:49.940 So we saw before, we can just kind of get every student we want to. 00:16:49.940 --> 00:16:53.570 I guess by SELECT star FROM students give us all the students here. 00:16:53.570 --> 00:16:56.480 But at very end of this query, once it's complete, 00:16:56.480 --> 00:16:59.150 we could have tacked on this other keyword called 00:16:59.150 --> 00:17:04.369 ORDER BY to tell SQL what column I want to sort this data by. 00:17:04.369 --> 00:17:08.180 So I'll get every column, but if I want to sort the data by particular column, 00:17:08.180 --> 00:17:11.760 I could say ORDER BY that particular column name. 00:17:11.760 --> 00:17:15.530 So here, if I do this, ORDER BY student_name, well, 00:17:15.530 --> 00:17:19.260 I should get the names in alphabetical order. 00:17:19.260 --> 00:17:25.400 So, I think if I select everything, SELECT star FROM students, 00:17:25.400 --> 00:17:28.850 I might already have these names in alphabetical order by virtue of how 00:17:28.850 --> 00:17:30.860 they were added to this data set. 00:17:30.860 --> 00:17:34.400 But if I were to do this, SELECT star FROM students, 00:17:34.400 --> 00:17:37.060 I'll zoom in down below here. 00:17:37.060 --> 00:17:44.910 Zoom in here, SELECT star FROM students, and then ORDER BY student_name, 00:17:44.910 --> 00:17:47.337 well, I kind of get the very same thing. 00:17:47.337 --> 00:17:49.170 But it's important to keep in mind, you keep 00:17:49.170 --> 00:17:53.670 in mind that when you use ORDER BY by without kind of anything 00:17:53.670 --> 00:17:57.360 after it, like this here, we're getting the names in what's 00:17:57.360 --> 00:17:59.290 called ascending order. 00:17:59.290 --> 00:18:04.350 So ascending, in SQL, means from kind of the lower Ascii value, like A, 00:18:04.350 --> 00:18:08.890 to the higher Ascii value, like Z, so A to Z in this case. 00:18:08.890 --> 00:18:13.710 But if I wanted to flip it, not having A to Z, but Z to A, 00:18:13.710 --> 00:18:16.350 well, I could change this, not to order by ascending, 00:18:16.350 --> 00:18:19.420 but to order by descending, like this. 00:18:19.420 --> 00:18:24.368 So by default, SQL always order by the ascending order like this. 00:18:24.368 --> 00:18:26.160 But if you want to order by descending, you 00:18:26.160 --> 00:18:30.000 could simply say descending at the end or DESC for descending. 00:18:30.000 --> 00:18:32.590 So let's try that now so I can actually see the results. 00:18:32.590 --> 00:18:37.170 So I'll go, clear my terminal, zoom in a bit, and I'll say SELECT. 00:18:37.170 --> 00:18:42.660 I'll say SELECT star FROM students, and then I will say, 00:18:42.660 --> 00:18:49.440 ORDER BY student_name, now descending, so from Z to A. Now, let's try this. 00:18:49.440 --> 00:18:50.580 I'll hit Enter. 00:18:50.580 --> 00:18:53.340 And now, if I zoom out, you should see. 00:18:53.340 --> 00:18:57.300 I start with Vincent now, whose name is towards the end of the alphabet, 00:18:57.300 --> 00:19:00.940 and I go up to Adelaide, whose name is at the top of the alphabet. 00:19:00.940 --> 00:19:02.980 In this case. 00:19:02.980 --> 00:19:06.160 And I mean there are other things we could sort of sort by here. 00:19:06.160 --> 00:19:09.665 So we don't have to just sort by student name, we could also sort by house. 00:19:09.665 --> 00:19:11.040 So maybe we want to sort by this. 00:19:11.040 --> 00:19:15.980 We could say SELECT student_name, or sorry, SELECT-- let's 00:19:15.980 --> 00:19:19.850 just select everything here, SELECT everything FROM students. 00:19:19.850 --> 00:19:24.080 Whoops, my internet's a little wonky here. 00:19:24.080 --> 00:19:27.251 I'll SELECT star FROM-- 00:19:27.251 --> 00:19:30.970 I'm going to give my code space a minute to breathe here. 00:19:30.970 --> 00:19:32.920 I'm going too quickly through these SELECT. 00:19:35.408 --> 00:19:36.200 I think we're back. 00:19:36.200 --> 00:19:39.280 So we'll do SELECT star FROM students. 00:19:39.280 --> 00:19:42.760 And now I'm going to order by, not student name, but by house. 00:19:42.760 --> 00:19:47.340 And so now, I should see, if I zoom out, my data organized by house. 00:19:47.340 --> 00:19:50.580 I have all the students in Gryffindor, all students in Hufflepuff, 00:19:50.580 --> 00:19:54.600 and all the students in Ravenclaw, and all the students in Slytherin. 00:19:54.600 --> 00:19:57.660 So a question now is, let's say we wanted to sort the student 00:19:57.660 --> 00:20:02.970 data in alphabetical order, but have it first sorted by house, and then 00:20:02.970 --> 00:20:04.740 by student name. 00:20:04.740 --> 00:20:08.550 So ideally, we should have all of our data from our table 00:20:08.550 --> 00:20:13.180 viewed now, but sorted first by house, alphabetical order. 00:20:13.180 --> 00:20:17.170 And then, inside of each house, sorted by name. 00:20:17.170 --> 00:20:19.050 So how could we do this, do you think? 00:20:19.050 --> 00:20:20.970 Do you want to chime in the chat? 00:20:20.970 --> 00:20:25.760 How could we have this kind of multiple, multi-layered sort going on? 00:20:29.700 --> 00:20:32.855 I've seen a few options, and one that I like here is this comma. 00:20:32.855 --> 00:20:34.980 We could use a comma, something that we did before. 00:20:34.980 --> 00:20:37.680 We could say, if we start off in SELECT, we 00:20:37.680 --> 00:20:42.030 could do SELECT student_name, and house, or we could 00:20:42.030 --> 00:20:43.680 do the very same thing for ORDER BY. 00:20:43.680 --> 00:20:47.790 We could ORDER BY multiple things at the very same time, so let's do this. 00:20:47.790 --> 00:20:52.710 We could do SELECT everything FROM students, and let's ORDER BY 00:20:52.710 --> 00:20:58.300 first house, and then by student_name. 00:20:58.300 --> 00:21:03.180 So with the comma here is saying is that the very first thing before the comma 00:21:03.180 --> 00:21:05.640 is the first thing I will sort by. 00:21:05.640 --> 00:21:09.330 Then afterwards, student name is the next thing I will sort by. 00:21:09.330 --> 00:21:15.550 So inside of my sorted list of houses, I will then sort student names, 00:21:15.550 --> 00:21:16.860 so let's try this. 00:21:16.860 --> 00:21:21.510 And now, we can see, if I zoom out, well, we have Gryffindor all sorted. 00:21:21.510 --> 00:21:24.390 And inside of the students who are in Gryffindor, well, 00:21:24.390 --> 00:21:31.450 we have those students in alphabetical order, CDGHHLN and so on. 00:21:31.450 --> 00:21:35.250 So let's try it maybe with a more advanced technique. 00:21:35.250 --> 00:21:39.910 We could go in and say SELECT everything FROM students. 00:21:39.910 --> 00:21:44.100 But now, we just want to get the houses as they are, 00:21:44.100 --> 00:21:46.770 SELECT ORDER BY house, and then the student_name, 00:21:46.770 --> 00:21:48.268 but in reverse alphabetical order. 00:21:48.268 --> 00:21:50.310 And how can I get reverse alphabetical order now? 00:21:53.540 --> 00:21:54.320 Descending, right. 00:21:54.320 --> 00:21:56.030 So I could say DESC here. 00:21:56.030 --> 00:21:57.860 And now, I'll hit Enter. 00:21:57.860 --> 00:22:04.070 And now I see, I'm still sorting by house alphabetically, G, H, R, and S. 00:22:04.070 --> 00:22:06.620 But now, inside of each house, notice how 00:22:06.620 --> 00:22:13.350 I'm going from the bottom of the alphabet to the very top. 00:22:13.350 --> 00:22:15.560 So questions on these sorts? 00:22:18.590 --> 00:22:19.750 What do you wonder about? 00:22:28.700 --> 00:22:31.033 I could write this sort case sensitive. 00:22:31.033 --> 00:22:32.450 Let's check that one out actually. 00:22:32.450 --> 00:22:38.900 So we see-- we can do SELECT star FROM students, and now, I'll do ORDER BY. 00:22:38.900 --> 00:22:43.640 And because we said house before, why don't we just try capital H house 00:22:43.640 --> 00:22:45.860 and I'll hit semicolon and Enter again? 00:22:45.860 --> 00:22:47.730 And that still seems to work. 00:22:47.730 --> 00:22:50.150 So when you're talking about SQL column names, 00:22:50.150 --> 00:22:53.230 those also are not going to be case sensitive for you. 00:22:57.485 --> 00:23:00.850 And the question, what will happen if I switch between house and student_name 00:23:00.850 --> 00:23:01.433 in that query? 00:23:01.433 --> 00:23:02.540 That's a great question. 00:23:02.540 --> 00:23:05.590 So before, we had a query like this, SELECT star FROM students, 00:23:05.590 --> 00:23:10.790 and then we did ORDER BY first house and then student_name? 00:23:10.790 --> 00:23:19.080 But what if we did this, what if we did, first, student_name and then house? 00:23:19.080 --> 00:23:21.540 So what do you think will happen in this case? 00:23:21.540 --> 00:23:25.650 In the chat, if you want to chime in, what do you think? 00:23:25.650 --> 00:23:27.590 What will happen if we flip this order now? 00:23:35.780 --> 00:23:39.560 Yes, so somebody says it'll first sort by name and then by house, 00:23:39.560 --> 00:23:40.940 so let's try running this. 00:23:40.940 --> 00:23:43.460 We'll hit Enter here and I'll zoom out. 00:23:43.460 --> 00:23:49.280 And notice how my student names are in alphabetical order. 00:23:49.280 --> 00:23:55.810 But within the C's, for example, I now have my houses in alphabetical order, 00:23:55.810 --> 00:23:58.330 so Hufflepuff, Ravenclaw, and Gryffindor. 00:23:58.330 --> 00:24:00.115 Or let's see, we have a lot of M's. 00:24:03.170 --> 00:24:05.990 So here, we have Ravenclaw, Slytherin. 00:24:05.990 --> 00:24:10.758 Well, that didn't seem to quite work in this case actually. 00:24:10.758 --> 00:24:11.675 Why did that not work? 00:24:16.900 --> 00:24:21.640 MRR, interesting. 00:24:21.640 --> 00:24:23.710 Oh, yeah, so somebody-- 00:24:23.710 --> 00:24:24.460 it's a good point. 00:24:24.460 --> 00:24:27.710 So somebody asked, there's no two identical names. 00:24:27.710 --> 00:24:30.412 So what happened here is, this isn't quite useful for us, 00:24:30.412 --> 00:24:31.870 and I'll see if I can explain this. 00:24:31.870 --> 00:24:38.200 Because what SQL will do is not sort by the very first letter of student_name, 00:24:38.200 --> 00:24:39.880 but sort by the entire string. 00:24:39.880 --> 00:24:44.455 And so, in order for the houses that we see in alphabetical order, 00:24:44.455 --> 00:24:46.330 we have to think the exact same student name. 00:24:46.330 --> 00:24:48.872 So not just the same first letter, but the same student name. 00:24:48.872 --> 00:24:52.160 And because no two student names are exactly alike, 00:24:52.160 --> 00:24:54.890 we probably won't actually see the results in this case. 00:24:54.890 --> 00:24:57.640 So good catch from somebody in the chat here. 00:25:08.280 --> 00:25:10.200 Other questions on ordering then? 00:25:17.735 --> 00:25:18.610 A good question here. 00:25:18.610 --> 00:25:21.720 So what if a name starts with a number, kind of unlikely, 00:25:21.720 --> 00:25:23.427 but a good case to consider. 00:25:23.427 --> 00:25:25.385 So if we had a name search with a number, well, 00:25:25.385 --> 00:25:28.860 it's going to go by the Ascii value for that number, I believe. 00:25:28.860 --> 00:25:32.670 So in this case, numbers would likely come before A 00:25:32.670 --> 00:25:36.070 or come before, certainly before Z, and so on. 00:25:36.070 --> 00:25:39.900 So first, numbers and symbols, and then the alphabet. 00:25:42.330 --> 00:25:43.080 Any question here? 00:25:43.080 --> 00:25:45.980 Can we order one column by ascending and the second by descending? 00:25:45.980 --> 00:25:46.820 You absolutely can. 00:25:46.820 --> 00:25:47.695 So we could say this. 00:25:47.695 --> 00:25:49.980 We can say SELECT star FROM students. 00:25:49.980 --> 00:25:55.010 Let's go ahead and ORDER BY house, descending, and then 00:25:55.010 --> 00:25:59.030 we could ORDER BY student_name ascending, like this. 00:25:59.030 --> 00:26:00.920 So certainly, if you're free to apply your 00:26:00.920 --> 00:26:07.320 own ascending or descending to each of these multiple arguments to ORDER BY 00:26:07.320 --> 00:26:07.820 here. 00:26:14.000 --> 00:26:19.890 And another question before we move on, so one is, how 00:26:19.890 --> 00:26:23.400 are lower or uppercase letters sorted? 00:26:23.400 --> 00:26:30.420 So I think, in this case, it would be uppercase would come first 00:26:30.420 --> 00:26:34.560 because I think uppercase is lower in the Ascii alphabet. 00:26:34.560 --> 00:26:37.140 If you remember how letters correspond to Ascii numbers, 00:26:37.140 --> 00:26:39.780 I think uppercase is lower there, so it would come first 00:26:39.780 --> 00:26:43.960 in the default ascending order of these characters. 00:26:43.960 --> 00:26:46.320 So let's move on from ordering, and let's think 00:26:46.320 --> 00:26:49.650 about one keyword that's actually kind of useful along with ordering. 00:26:49.650 --> 00:26:51.430 That's the use case of a limit. 00:26:51.430 --> 00:26:55.230 So if we have some query, and we won't get back just a certain number of rows 00:26:55.230 --> 00:26:58.920 from that query, all we could do is simply apply this limit one 00:26:58.920 --> 00:27:00.750 or limit five or limit 10, whatever number 00:27:00.750 --> 00:27:02.860 you want to limit at the very end of that query. 00:27:02.860 --> 00:27:05.730 This is helpful for finding the top 10 of something, for example, 00:27:05.730 --> 00:27:08.160 because you can order by a certain column 00:27:08.160 --> 00:27:10.240 and then take the top 10 that show up. 00:27:10.240 --> 00:27:11.940 So, let's go back to our query here. 00:27:11.940 --> 00:27:16.290 We can maybe say, let's want the first 10 students 00:27:16.290 --> 00:27:20.760 whose name is first in the alphabet, the first 10 students from the very first, 00:27:20.760 --> 00:27:21.520 alphabetically. 00:27:21.520 --> 00:27:24.510 So we say SELECT star FROM students. 00:27:24.510 --> 00:27:27.510 I'm going to order by student_name, and then we'll 00:27:27.510 --> 00:27:29.820 limit our query to only 10 students. 00:27:29.820 --> 00:27:30.840 I'll hit Enter here. 00:27:30.840 --> 00:27:33.900 And now, I should see only 10 students, those whose names 00:27:33.900 --> 00:27:36.195 are the very first 10 in the alphabet. 00:27:36.195 --> 00:27:37.570 And somewhat I could change this. 00:27:37.570 --> 00:27:40.860 I could say, let's not get the first 10 alphabetically, 00:27:40.860 --> 00:27:43.200 let's get the last 10 alphabetically. 00:27:43.200 --> 00:27:46.860 So I could do SELECT star FROM students. 00:27:46.860 --> 00:27:51.000 Now, let's order by student_name, but in descending order. 00:27:51.000 --> 00:27:54.120 And then let's go ahead and limit 10, and we should see, 00:27:54.120 --> 00:27:58.650 we get the very last 10 students in our alphabetical order. 00:27:58.650 --> 00:28:01.680 So limiting is helpful along with ORDER BY 00:28:01.680 --> 00:28:04.920 to kind of at the top or the bottom sort of segments of your data 00:28:04.920 --> 00:28:11.010 set, as long as you order by a certain column that you care about here. 00:28:11.010 --> 00:28:15.390 Question in the chat is, what if you don't have an ID 00:28:15.390 --> 00:28:17.920 and you select 10 queries, but you want to see the number, 00:28:17.920 --> 00:28:20.843 is there a way to do this? 00:28:20.843 --> 00:28:24.010 Just so I can understand the question, I think we're talking about these IDs 00:28:24.010 --> 00:28:25.700 here on the left hand side. 00:28:25.700 --> 00:28:28.060 And as we'll see later in section, these IDs 00:28:28.060 --> 00:28:31.690 are the primary keys, the unique identifiers for each of these students. 00:28:31.690 --> 00:28:36.940 And generally speaking, every student should have their own unique ID. 00:28:36.940 --> 00:28:40.642 If they don't have an ID, well, you might have 00:28:40.642 --> 00:28:42.100 some problems with database design. 00:28:42.100 --> 00:28:47.540 But if we're talking about ordering by here, ORDER BY student_name, 00:28:47.540 --> 00:28:50.990 for example, we're not really caring about the student ID 00:28:50.990 --> 00:28:52.560 just the student name, in this case. 00:28:52.560 --> 00:28:56.150 We could certainly ORDER BY ID if we like to, ORDER BY ID descending, 00:28:56.150 --> 00:28:57.950 ORDER BY ID ascending. 00:28:57.950 --> 00:29:01.070 But in this case, it so happens to be that the student 00:29:01.070 --> 00:29:04.670 ID corresponds to the student name kind of alphabetically sorted. 00:29:08.053 --> 00:29:10.720 All right, and I'm seeing a question about the last 10 students. 00:29:10.720 --> 00:29:13.440 So this actually is going to work somewhat similarly. 00:29:13.440 --> 00:29:16.690 So if I wanted the last 10 students alphabetically, I could do this. 00:29:16.690 --> 00:29:20.400 I could sort SELECT star FROM students, and then go ahead 00:29:20.400 --> 00:29:24.360 and say I want to ORDER BY student_name in descending order 00:29:24.360 --> 00:29:27.300 to get those bottom 10 students, and then whoops, 00:29:27.300 --> 00:29:29.820 and then I could limit 10 at the very end. 00:29:33.720 --> 00:29:35.205 Other questions too on limiting? 00:29:39.630 --> 00:29:41.100 Let's keep moving then. 00:29:41.100 --> 00:29:43.860 And what we could do here is we often-- 00:29:43.860 --> 00:29:46.080 it's nice to get the individual student names, 00:29:46.080 --> 00:29:49.583 but one thing we often care about is how many students are there? 00:29:49.583 --> 00:29:51.000 What's like an average test score? 00:29:51.000 --> 00:29:54.660 Trying to aggregate our data, not just take individual rows, 00:29:54.660 --> 00:29:57.460 but see kind of the bigger picture from our data. 00:29:57.460 --> 00:30:01.770 And for that, SQL has a variety of keywords you can use to get that data, 00:30:01.770 --> 00:30:03.760 get that aggregate data, so to speak. 00:30:03.760 --> 00:30:07.020 So here, we'll talk about aggregating, where aggregating 00:30:07.020 --> 00:30:09.630 means taking those individual rows, as I said before, 00:30:09.630 --> 00:30:12.030 but kind of summing things up or averaging them 00:30:12.030 --> 00:30:14.500 or kind of seeing the whole larger picture. 00:30:14.500 --> 00:30:17.350 So if I wanted to count the rows I get back, well, we could do this. 00:30:17.350 --> 00:30:21.150 I could say SELECT COUNT star FROM students. 00:30:21.150 --> 00:30:24.450 And this will take everything in our table, force selecting everything, 00:30:24.450 --> 00:30:27.130 and return the count of it, the number of rows that I have. 00:30:27.130 --> 00:30:29.880 So for example, to figure out how many students are in this table, 00:30:29.880 --> 00:30:35.280 I can say SELECT COUNT star FROM students, semicolon, hit Enter. 00:30:35.280 --> 00:30:38.560 And now, I see, I get 40 in this case. 00:30:38.560 --> 00:30:40.170 There are 40 total students. 00:30:40.170 --> 00:30:43.290 And this is kind of not very pretty, like the COUNT star. 00:30:43.290 --> 00:30:46.600 I'd love to just say like number of students, for example. 00:30:46.600 --> 00:30:52.530 So how could I change this column names, anyone know in the chat? 00:30:52.530 --> 00:30:53.500 Yeah, I could use AS. 00:30:53.500 --> 00:31:02.090 So I can say SELECT COUNT star AS in this case, maybe number_of_students, 00:31:02.090 --> 00:31:05.790 and then I could say we want to select that from the student's table. 00:31:05.790 --> 00:31:07.730 So now, I see I get the very same answer, 00:31:07.730 --> 00:31:10.680 but now my column is called number_of_students. 00:31:10.680 --> 00:31:13.070 So again, what we're doing here is selecting everything 00:31:13.070 --> 00:31:18.780 from our students table, counting it up, and returning that as a column named 00:31:18.780 --> 00:31:22.670 number_of_students, in this case. 00:31:22.670 --> 00:31:25.880 So what if we only wanted particular students, 00:31:25.880 --> 00:31:28.580 like let's say we just wanted to count students from Gryffindor? 00:31:28.580 --> 00:31:30.380 Well, how could I do that? 00:31:30.380 --> 00:31:35.680 In the chat, if you want to chime in, counting just students from Gryffindor. 00:31:38.640 --> 00:31:41.460 So I'm seeing the use of WHERE, which is helpful. 00:31:41.460 --> 00:31:42.420 Other ideas too? 00:31:45.360 --> 00:31:45.860 Yeah. 00:31:45.860 --> 00:31:47.710 So I'm seeing-- I want to use COUNT. 00:31:47.710 --> 00:31:51.080 I probably use WHERE here too, and we'll get to what 00:31:51.080 --> 00:31:52.650 we call GROUP BY in just a moment. 00:31:52.650 --> 00:31:54.480 But let's try WHERE first. 00:31:54.480 --> 00:31:57.292 So normally, to get all the things in Gryffindor, 00:31:57.292 --> 00:31:58.500 I would do a query like this. 00:31:58.500 --> 00:32:03.500 I could say SELECT star FROM students WHERE, in this case, 00:32:03.500 --> 00:32:08.740 house is Gryffindor, and then semicolon Enter. 00:32:08.740 --> 00:32:13.120 And now, these are all students who are in Gryffindor. 00:32:13.120 --> 00:32:16.390 But if I wanted just the count of these, not every individual row, 00:32:16.390 --> 00:32:20.470 what could modify my query, and to not SELECT star but SELECT COUNT star, 00:32:20.470 --> 00:32:22.480 so like the count of everything here. 00:32:22.480 --> 00:32:28.090 SELECT COUNT star FROM students, and then I'll say WHERE. 00:32:28.090 --> 00:32:33.070 In this case, the house is Gryffindor, semicolon, Enter. 00:32:33.070 --> 00:32:38.710 And now, I see I have 11 students who are in Gryffindor here, 00:32:38.710 --> 00:32:42.940 and that's all pretty good, but we also have to care about Slytherin. 00:32:42.940 --> 00:32:46.300 So let's say we want to SELECT the COUNT star FROM 00:32:46.300 --> 00:32:54.770 students where the house is Slytherin, and that would give us also 11. 00:32:54.770 --> 00:32:59.347 But how could I get all of the houses in one place, 00:32:59.347 --> 00:33:02.180 like not just individual areas for each house, for how many students 00:33:02.180 --> 00:33:05.090 are in them, but actually have it all in one query 00:33:05.090 --> 00:33:08.022 and have a table that says Slytherin has this many students, 00:33:08.022 --> 00:33:09.230 Gryffindor has many students? 00:33:09.230 --> 00:33:11.360 And I'm seeing, in the chat, this GROUP BY 00:33:11.360 --> 00:33:13.710 keyword that we're going to explore now here. 00:33:13.710 --> 00:33:17.180 So GROUP BY is helpful if you want to get the number of students 00:33:17.180 --> 00:33:18.710 by a certain attribute. 00:33:18.710 --> 00:33:22.770 So we want to kind of group our data by house and figure out, well, 00:33:22.770 --> 00:33:25.350 how many students are in each house? 00:33:25.350 --> 00:33:26.610 So let's try this. 00:33:26.610 --> 00:33:28.560 We can use GROUP BY as follows. 00:33:28.560 --> 00:33:35.030 I can say SELECT star FROM our students table, and let's go ahead 00:33:35.030 --> 00:33:38.870 and GROUP BY, in this case, house. 00:33:38.870 --> 00:33:42.200 We want to group our data by house, and I think 00:33:42.200 --> 00:33:43.970 we might have messed up slightly here. 00:33:43.970 --> 00:33:45.137 Let's go ahead to the query. 00:33:45.137 --> 00:33:47.130 We'll just hit Enter here. 00:33:47.130 --> 00:33:51.050 And now we can see where we have individual houses, 00:33:51.050 --> 00:33:54.470 but there's something weird going on with this student_name over here. 00:33:54.470 --> 00:33:56.703 We have much fewer pieces of data, but I'm just not 00:33:56.703 --> 00:33:58.620 sure why we're getting student_name over here. 00:33:58.620 --> 00:34:03.140 Well, instead of searching everything, why don't we select the count 00:34:03.140 --> 00:34:04.535 from each of our houses? 00:34:04.535 --> 00:34:14.659 So SELECT COUNT again FROM our students table, and then we'll GROUP BY house. 00:34:14.659 --> 00:34:16.190 I'll hit Enter here. 00:34:16.190 --> 00:34:20.469 And so I get back 11, 5, 13, and 11. 00:34:20.469 --> 00:34:23.229 What do you think this means, in this case? 00:34:26.090 --> 00:34:27.050 Any ideas in the chat? 00:34:30.985 --> 00:34:33.610 It's the number of students in each house, but it's hard for us 00:34:33.610 --> 00:34:35.290 to tell which house is which. 00:34:35.290 --> 00:34:39.010 Lets include the house name in our query, so let's not just select 00:34:39.010 --> 00:34:40.420 the count of everything. 00:34:40.420 --> 00:34:42.688 Let's select the house in our query too. 00:34:42.688 --> 00:34:43.730 Let's order it like this. 00:34:43.730 --> 00:34:46.090 Let's first select the house, and then select 00:34:46.090 --> 00:34:52.540 the COUNT of everything FROM students, as long as we're grouping by house. 00:34:52.540 --> 00:34:53.560 I'll Enter here. 00:34:53.560 --> 00:34:57.490 And now, I get Gryffindor, Hufflepuff, Ravenclaw, and Slytherin, 00:34:57.490 --> 00:35:02.410 where each one has a count next to it, like 11, 5, 13, and 11. 00:35:02.410 --> 00:35:05.800 So this is the total number of students in each group, as somebody in the chat 00:35:05.800 --> 00:35:08.560 is saying, that we're going to first group by house 00:35:08.560 --> 00:35:13.020 and then take the count by every house. 00:35:13.020 --> 00:35:16.845 Other questions on these GROUP BYs or this aggregation here? 00:35:29.840 --> 00:35:32.330 So it seems like we have most questions answered there. 00:35:32.330 --> 00:35:35.480 And now that we've kind of seen a variety of SQL keywords, 00:35:35.480 --> 00:35:37.850 we've kind of reviewed them from lecture, let's dive 00:35:37.850 --> 00:35:40.940 into something about database that probably isn't great for us. 00:35:40.940 --> 00:35:44.960 Like if we select everything here, SELECT star FROM students, 00:35:44.960 --> 00:35:48.620 let's just zoom out a bit and think about what 00:35:48.620 --> 00:35:54.190 isn't very well designed about this database if you take a look at it. 00:35:54.190 --> 00:36:03.820 What could be better designed about it, or where do you see some repetition 00:36:03.820 --> 00:36:05.290 that we maybe can get rid of? 00:36:13.415 --> 00:36:15.540 Yeah, so I'm seeing some people saying that there's 00:36:15.540 --> 00:36:16.910 some repetition in the houses. 00:36:16.910 --> 00:36:21.650 We have Slytherin here, Slytherin again, Ravenclaw, Slytherin, Hufflepuff, 00:36:21.650 --> 00:36:24.230 like a lot of these names are repeating. 00:36:24.230 --> 00:36:27.950 And similarly for the heads, notice how we have all the heads kind of 00:36:27.950 --> 00:36:29.510 repeated throughout this data set. 00:36:29.510 --> 00:36:32.420 And maybe, ideally, we wouldn't actually have 00:36:32.420 --> 00:36:37.160 to repeat Severus Snape, Severus Snape, all the way through our data set. 00:36:37.160 --> 00:36:40.490 We could just put Severus Snape in one place 00:36:40.490 --> 00:36:43.400 and reference Severus Snape from other places. 00:36:43.400 --> 00:36:45.680 And maybe for the house, we could say, well, 00:36:45.680 --> 00:36:47.360 Slytherin shows up a lot of times. 00:36:47.360 --> 00:36:50.030 What if we just had Slytherin in one table, 00:36:50.030 --> 00:36:52.860 and we reference Slytherin in this table? 00:36:52.860 --> 00:36:56.600 So the goal is to avoid repeating ourselves over and over again 00:36:56.600 --> 00:36:59.240 in this data set, like having Slytherin so many times 00:36:59.240 --> 00:37:01.220 or Hufflepuff so many times. 00:37:01.220 --> 00:37:06.740 I just want to make our own tables for individual pieces of our data sets. 00:37:06.740 --> 00:37:10.740 So if we're thinking about how we could better design this table, 00:37:10.740 --> 00:37:13.160 let's look at a few design principles we could use 00:37:13.160 --> 00:37:15.510 to actually improve this table here. 00:37:15.510 --> 00:37:16.730 So go back to our slides. 00:37:16.730 --> 00:37:18.950 Let's think about database design. 00:37:18.950 --> 00:37:23.540 So one principle for a database is that every table 00:37:23.540 --> 00:37:29.368 you have inside of your database should only represent a single entity, 00:37:29.368 --> 00:37:31.410 and we've actually kind of broken that rule here. 00:37:31.410 --> 00:37:35.600 So if I go back into my table, well, I have students in this table. 00:37:35.600 --> 00:37:37.020 It's a table of students. 00:37:37.020 --> 00:37:39.860 But I also have houses, I have heads, like there's 00:37:39.860 --> 00:37:44.690 lots of things going on here that don't just belong to students. 00:37:44.690 --> 00:37:48.810 So ideally, what we could do is try to simplify this a little bit. 00:37:48.810 --> 00:37:53.630 Let's try to have a single table for students, a single table for houses, 00:37:53.630 --> 00:37:58.710 and then a single table for the assignments of students between houses. 00:37:58.710 --> 00:38:02.570 So in this case, students wouldn't also have the house assignment. 00:38:02.570 --> 00:38:05.630 And similarly, houses wouldn't have the house assignment. 00:38:05.630 --> 00:38:08.690 But we have this extra table called student-house assignments 00:38:08.690 --> 00:38:13.050 that sort of tells us how these students and houses relate to each other. 00:38:13.050 --> 00:38:15.380 So as a visual here, let's dive in. 00:38:15.380 --> 00:38:17.390 Let's say-- this is what we currently look like. 00:38:17.390 --> 00:38:21.110 We have a student's table that has student names, houses, 00:38:21.110 --> 00:38:24.620 and the head of each house, but we want to break this up. 00:38:24.620 --> 00:38:28.740 We only want to have a single thing inside of each of our tables. 00:38:28.740 --> 00:38:29.720 So let's break this up. 00:38:29.720 --> 00:38:31.880 Let's say get rid of the houses and heads here. 00:38:31.880 --> 00:38:32.610 Those are gone. 00:38:32.610 --> 00:38:38.420 Let's just have students in this table, but where should we put our houses now? 00:38:38.420 --> 00:38:40.700 Let's make a new table called houses. 00:38:40.700 --> 00:38:43.640 And this one we'll put here, we'll call it houses. 00:38:43.640 --> 00:38:46.760 And notice how in this table called houses, 00:38:46.760 --> 00:38:49.695 well, we have some new IDs for each of these houses. 00:38:49.695 --> 00:38:52.320 But now, we have, well, in this case, it's called student_name. 00:38:52.320 --> 00:38:52.987 Let me fix that. 00:38:52.987 --> 00:38:58.580 That should be house in this case, house called 00:38:58.580 --> 00:39:01.550 Gryffindor, Hufflepuff, Ravenclaw, and Slytherin, and then 00:39:01.550 --> 00:39:04.730 the head of that house in this case, like Minerva, Pomona, Filius, 00:39:04.730 --> 00:39:06.230 and Severus. 00:39:06.230 --> 00:39:10.490 So, we now have our own separate table for houses. 00:39:10.490 --> 00:39:13.230 And we could even go one step further and say, well, 00:39:13.230 --> 00:39:16.760 how do we now know where students are assigned. 00:39:16.760 --> 00:39:18.980 And for that, we could make our new table, 00:39:18.980 --> 00:39:22.380 this one called assignments, down below. 00:39:22.380 --> 00:39:29.930 So you can see that I have assignment 1 to 4, where student ID is 1 00:39:29.930 --> 00:39:32.030 and house ID is 4. 00:39:32.030 --> 00:39:35.780 So what do you think this table now is telling us down below? 00:39:35.780 --> 00:39:40.540 If I give you the full screen below here, if you type in the chat, 00:39:40.540 --> 00:39:46.570 what is this table telling us, only this first row? 00:39:46.570 --> 00:39:50.950 It's telling us that Adelaide, Adelaide Murton, belongs to Slytherin. 00:39:50.950 --> 00:39:52.795 And how do you know that in this case? 00:39:56.870 --> 00:39:59.870 How do you know that Adelaide belongs to Slytherin now? 00:39:59.870 --> 00:40:01.100 Because of the IDs. 00:40:01.100 --> 00:40:03.620 So we saw before that here, we have ID. 00:40:03.620 --> 00:40:06.230 Student ID is one, but which student has the ID one? 00:40:06.230 --> 00:40:08.750 Well, it seems like it's Adelaide if you look at that table. 00:40:08.750 --> 00:40:12.442 And I look at the 4 here for the house ID, in which house is 4. 00:40:12.442 --> 00:40:13.400 Well, that's Slytherin. 00:40:13.400 --> 00:40:18.440 So we know that Adelaide has a belonging to Slytherin in this case. 00:40:18.440 --> 00:40:21.395 Adelaide is assigned to Slytherin house. 00:40:21.395 --> 00:40:23.270 So this brings us to another design principle 00:40:23.270 --> 00:40:25.940 here, which is that, ideally, we only want 00:40:25.940 --> 00:40:29.490 to have every piece of data stored in one place. 00:40:29.490 --> 00:40:33.860 And then from thereafter, refers that piece of data by its ID. 00:40:33.860 --> 00:40:37.980 So notice how we have, in the students table, an ID for every student. 00:40:37.980 --> 00:40:40.400 In the houses table, an ID for every house. 00:40:40.400 --> 00:40:44.120 And now, in our other tables, we just have to reference that ID. 00:40:44.120 --> 00:40:47.810 We don't have to actually reference Slytherin or the student name anymore. 00:40:47.810 --> 00:40:50.420 If those things change, we change them in one place. 00:40:50.420 --> 00:40:52.295 And now, the relationship, we can sort of 00:40:52.295 --> 00:40:56.130 define elsewhere and update that as we need to. 00:40:56.130 --> 00:40:58.410 And this is a good segue into what people 00:40:58.410 --> 00:41:01.630 are saying in the chat, which is primary keys and foreign keys. 00:41:01.630 --> 00:41:05.850 So what is the primary key now in each of these tables? 00:41:05.850 --> 00:41:07.425 Where do you see a primary key? 00:41:10.760 --> 00:41:13.350 Yes, so you're seeing in the ID columns for every table. 00:41:13.350 --> 00:41:16.950 So here in students, students is primary key. 00:41:16.950 --> 00:41:19.940 The unique ID for every student is just that ID column. 00:41:19.940 --> 00:41:23.810 And for houses, the primary key is that ID column as well, and similarly, 00:41:23.810 --> 00:41:25.220 for assignments. 00:41:25.220 --> 00:41:29.700 But now, student_id is not a primary key. 00:41:29.700 --> 00:41:32.990 It's not a unique identifier for every student here. 00:41:32.990 --> 00:41:36.010 What is it instead? 00:41:36.010 --> 00:41:37.190 It's a foreign key. 00:41:37.190 --> 00:41:41.500 So it's saying that this references a primary key in another table. 00:41:41.500 --> 00:41:44.260 It's references the unique ID in students now. 00:41:44.260 --> 00:41:48.130 And similarly, this house ID 4, well, that reference the primary key 00:41:48.130 --> 00:41:49.030 in houses. 00:41:49.030 --> 00:41:52.630 So we're going to say is that the primary key is that unique ID 00:41:52.630 --> 00:41:55.870 inside of the table, and a foreign key is 00:41:55.870 --> 00:41:59.260 a reference to that primary key from another table, a separate table 00:41:59.260 --> 00:42:01.030 in this case. 00:42:01.030 --> 00:42:06.100 So if we wanted to update this table, we could do it as follows. 00:42:06.100 --> 00:42:08.850 Let's actually go through the work of trying to update this table. 00:42:08.850 --> 00:42:10.150 Currently, looks like this. 00:42:10.150 --> 00:42:12.270 We have just a single table here. 00:42:12.270 --> 00:42:14.550 But we ideally want to make a different table 00:42:14.550 --> 00:42:16.480 for houses, a different table for assignments, 00:42:16.480 --> 00:42:19.710 so let's explore some syntax to actually make that happen for us. 00:42:19.710 --> 00:42:23.290 Now, we could create a table a bit like this, and something's missing here. 00:42:23.290 --> 00:42:25.540 But the general consensus is it looks a bit like this. 00:42:25.540 --> 00:42:28.590 We could say CREATE TABLE, call it houses, 00:42:28.590 --> 00:42:31.860 that has four, actually three different columns now, 00:42:31.860 --> 00:42:36.480 one called ID, one called house, and one called head. 00:42:36.480 --> 00:42:41.310 So the table called houses will have a column called ID, 00:42:41.310 --> 00:42:44.070 a column called house, and a column called head. 00:42:44.070 --> 00:42:47.280 And what is this last bit doing for us here? 00:42:47.280 --> 00:42:48.600 Any ideas? 00:42:48.600 --> 00:42:55.030 What is that specifying about our table, at fourth kind of row here? 00:42:59.300 --> 00:43:01.620 Yeah, so it's going to say this will be a primary key. 00:43:01.620 --> 00:43:05.240 This is going to be our unique ID inside of this house's table. 00:43:05.240 --> 00:43:07.760 So ID should be always unique for every row. 00:43:07.760 --> 00:43:11.430 It should never change for every row. 00:43:11.430 --> 00:43:16.050 So, if you want to do this, we could try this, but it won't quite work for us. 00:43:16.050 --> 00:43:19.440 That's because in SQL, there are also data types. 00:43:19.440 --> 00:43:22.310 So if we kind of table if we say, what kind of type 00:43:22.310 --> 00:43:24.300 we're storing in each of our columns? 00:43:24.300 --> 00:43:28.130 So here, we might say not just ID, but that ID is an integer, 00:43:28.130 --> 00:43:31.910 or not just house, but that house is a piece of text, and so it's head. 00:43:31.910 --> 00:43:35.330 So now, we're saying that we're going to have these three columns, ID, house, 00:43:35.330 --> 00:43:39.290 and head, and that ID is an integer, house is text, 00:43:39.290 --> 00:43:42.090 and head is text as well. 00:43:42.090 --> 00:43:44.000 And we could even get more fancy with this 00:43:44.000 --> 00:43:47.420 and apply some constraints, which means we could kind of make sure 00:43:47.420 --> 00:43:51.350 that as we insert data to this table, we adhere to some rules 00:43:51.350 --> 00:43:52.610 that we set beforehand. 00:43:52.610 --> 00:43:55.820 And maybe one of these rules is that each of these columns 00:43:55.820 --> 00:43:57.350 should never be null. 00:43:57.350 --> 00:44:02.360 No meaning empty, meaning that every column or every time we 00:44:02.360 --> 00:44:05.690 add a new row, no column can be blank. 00:44:05.690 --> 00:44:09.578 We can't have a house without a name, and we can't have a house out ahead. 00:44:09.578 --> 00:44:10.620 That wouldn't make sense. 00:44:10.620 --> 00:44:15.620 So every row has to have each of these three columns in order for us 00:44:15.620 --> 00:44:17.150 to insert it. 00:44:17.150 --> 00:44:22.940 And so, kind of a translation here is that not null is similar to required. 00:44:22.940 --> 00:44:26.270 We have to make sure this column name is required 00:44:26.270 --> 00:44:29.210 when we insert some new data here. 00:44:29.210 --> 00:44:31.780 So let's try this soon, but what questions do you 00:44:31.780 --> 00:44:34.750 have so far on this great table syntax? 00:44:38.360 --> 00:44:38.900 Let me see. 00:44:43.440 --> 00:44:44.380 A good question here. 00:44:44.380 --> 00:44:48.420 So if we update the particular house's ID in the house table, 00:44:48.420 --> 00:44:51.550 can that update the ID and the assignments table? 00:44:51.550 --> 00:44:53.713 So let's go back to our visual here. 00:44:53.713 --> 00:44:54.630 It's a great question. 00:44:54.630 --> 00:44:58.350 If we looked at the ID for Slytherin, that is 4. 00:44:58.350 --> 00:45:02.370 And in the assignments table, we have the ID 4. 00:45:02.370 --> 00:45:07.410 Now, if I changed Slytherin ID to be 5, well, this 00:45:07.410 --> 00:45:09.600 is Hyman's table would not automatically update 00:45:09.600 --> 00:45:11.520 for me, at least if I do it naively. 00:45:11.520 --> 00:45:15.210 If I just go in here and I say that this row here, Slytherin, should now 00:45:15.210 --> 00:45:18.690 have the ID 5, that will mess this reference up. 00:45:18.690 --> 00:45:21.420 We'll now actually know what to point to anymore. 00:45:21.420 --> 00:45:25.722 What I could do perhaps is better is let's say, OK, Slytherin-- perhaps it's 00:45:25.722 --> 00:45:27.180 no longer called Slytherin anymore. 00:45:27.180 --> 00:45:28.972 Maybe it's called a different kind of name. 00:45:28.972 --> 00:45:32.340 I could change this, well, this house now. 00:45:32.340 --> 00:45:35.210 Let me fix that typo here. 00:45:35.210 --> 00:45:41.160 Fix this house, and I could call it something else. 00:45:41.160 --> 00:45:43.820 And now, if I reference that same ID, I'll 00:45:43.820 --> 00:45:46.400 get a different name for the student house, 00:45:46.400 --> 00:45:49.250 but I should never change the primary key in general. 00:45:52.130 --> 00:45:53.840 And I see a question to our primary keys. 00:45:53.840 --> 00:45:57.940 So is it good practice to have an auto increment for the primary key? 00:45:57.940 --> 00:46:00.880 Meaning that if I add a new row, like let's say 00:46:00.880 --> 00:46:05.290 if I add a new student over here, that the ID automatically increases. 00:46:05.290 --> 00:46:06.060 It's a great idea. 00:46:06.060 --> 00:46:08.740 We always want to make sure that our ID is unique. 00:46:08.740 --> 00:46:13.030 And so as we add new rows, we'll make sure they're always increasing our ID, 00:46:13.030 --> 00:46:14.820 and SQL will handle that for you. 00:46:14.820 --> 00:46:16.760 If I go back to my create table statement 00:46:16.760 --> 00:46:20.710 and if I have this primary key ID constraint down the bottom, 00:46:20.710 --> 00:46:24.640 that will mean that this ID column will always auto increment for me, 00:46:24.640 --> 00:46:27.470 and often it actually won't ever be null. 00:46:27.470 --> 00:46:30.760 So primary key handles some of these constraints for me 00:46:30.760 --> 00:46:33.650 by saying that, look, this column should never be null. 00:46:33.650 --> 00:46:37.450 It should also never be a random value, which always auto increment as I 00:46:37.450 --> 00:46:39.420 add new rows. 00:46:39.420 --> 00:46:41.430 Great questions here. 00:46:41.430 --> 00:46:42.225 Other ones too? 00:46:47.815 --> 00:46:48.690 A good question here. 00:46:48.690 --> 00:46:51.410 So when we chose to introduce, to distribute 00:46:51.410 --> 00:46:55.380 the database into three tables, why is this the best distribution? 00:46:55.380 --> 00:46:59.468 So if you go back to our visual again here, why is this the best? 00:46:59.468 --> 00:47:01.760 I actually won't say this is the very best we could do. 00:47:01.760 --> 00:47:04.010 There's probably some more updates we could make here, 00:47:04.010 --> 00:47:05.570 but this is simply better. 00:47:05.570 --> 00:47:08.720 Because in this case, we're reducing some of our redundancies, 00:47:08.720 --> 00:47:11.750 and I could more easily change my data set if I'd like to. 00:47:11.750 --> 00:47:15.210 So notice how, here, going back to our example 00:47:15.210 --> 00:47:19.730 of changing some names of houses, well, if I change the name of Ravenclaw, 00:47:19.730 --> 00:47:22.837 in my prior data set like this one, I would 00:47:22.837 --> 00:47:25.670 have to go through and update all of these names, like individually, 00:47:25.670 --> 00:47:30.530 to find Ravenclaw, Ravenclaw, Ravenclaw, update that every time. 00:47:30.530 --> 00:47:33.020 But if I have this kind of arrangement, well, I 00:47:33.020 --> 00:47:35.060 can just change Ravenclaw in one table. 00:47:35.060 --> 00:47:38.840 And now, that name has changed on my entire database as I go through. 00:47:38.840 --> 00:47:42.560 So it's helpful to not repeat ourselves while we're having 00:47:42.560 --> 00:47:45.380 these databases design questions here. 00:47:49.220 --> 00:47:51.710 So let's try actually creating this table now, 00:47:51.710 --> 00:47:54.350 and our goal was to create a table called houses that 00:47:54.350 --> 00:47:55.920 had some of these constraints here. 00:47:55.920 --> 00:47:59.480 So let's go ahead and go back to our SQLite database, and let's try it. 00:47:59.480 --> 00:48:04.460 Let's go ahead and CREATE TABLE, this one called houses. 00:48:04.460 --> 00:48:07.250 And now, I'll add my parentheses to say what 00:48:07.250 --> 00:48:09.020 columns should be inside of houses. 00:48:09.020 --> 00:48:10.280 I'll hit Enter here. 00:48:10.280 --> 00:48:13.140 And now, I'll kind of, by convention, indent four spaces. 00:48:13.140 --> 00:48:14.930 I'll hit 1, 2, 3, 4. 00:48:14.930 --> 00:48:18.920 And the first column I want to have, that one is called ID. 00:48:18.920 --> 00:48:20.930 That one will be an INTEGER. 00:48:20.930 --> 00:48:23.900 And ideally, I want it to be NOT NULL, never 00:48:23.900 --> 00:48:28.130 should be able to have a blank ID in this table. 00:48:28.130 --> 00:48:31.580 Next one, I want to have a name of the house. 00:48:31.580 --> 00:48:33.872 This one, some text. 00:48:33.872 --> 00:48:35.580 And I also want to make sure that I can't 00:48:35.580 --> 00:48:37.770 have this ever be null, never blank. 00:48:37.770 --> 00:48:42.360 And similarly, I want to have a head, head name. 00:48:42.360 --> 00:48:48.330 This one called, this up type TEXT, and also never NULL. 00:48:48.330 --> 00:48:51.360 And there's one more thing I need to add here, and what is that? 00:48:54.570 --> 00:48:55.450 A primary key. 00:48:55.450 --> 00:48:59.580 So I can say PRIMARY KEY id, and then close this out down below 00:48:59.580 --> 00:49:02.610 in my parentheses, and hit a semicolon, and hit Enter. 00:49:02.610 --> 00:49:05.940 And now, if I type dot schema again, well, I 00:49:05.940 --> 00:49:11.790 should see I have a table for students now and a table for houses. 00:49:11.790 --> 00:49:15.090 And what we could do if we have more time 00:49:15.090 --> 00:49:17.400 is spend some time actually updating this database, 00:49:17.400 --> 00:49:20.520 like take data from our students table, insert into houses, 00:49:20.520 --> 00:49:24.750 make sure the students table removes the house and head columns as well. 00:49:24.750 --> 00:49:27.490 We won't spend too much time on that here today, 00:49:27.490 --> 00:49:30.210 but the goal is simply to show you how we can add more tables 00:49:30.210 --> 00:49:31.830 and insert some data. 00:49:31.830 --> 00:49:35.520 But on with inserting some data, how could we actually add heads here? 00:49:35.520 --> 00:49:40.410 Because if I do SELECT star FROM houses, well, there's nothing in there, 00:49:40.410 --> 00:49:41.160 nothing shows up. 00:49:41.160 --> 00:49:43.508 So how should I add some data here? 00:49:43.508 --> 00:49:46.300 I'm seeing insert, so let's actually see what that could look like. 00:49:46.300 --> 00:49:50.730 I could go here, and I could say our picture right now is like this. 00:49:50.730 --> 00:49:53.880 We have a blank houses table that has the right column names, 00:49:53.880 --> 00:49:56.470 but no data, so let's actually insert some data. 00:49:56.470 --> 00:49:59.130 So in general, inserting looks like a bit like this. 00:49:59.130 --> 00:50:04.170 Insert into the table name, followed by some columns I want to add data into, 00:50:04.170 --> 00:50:07.350 and then some values want to add to those columns. 00:50:07.350 --> 00:50:11.610 And to add in let's say Gryffindor, I could do this. 00:50:11.610 --> 00:50:16.710 INSERT INTO the houses table on the house and head columns Gryffindor 00:50:16.710 --> 00:50:20.250 for the house and McGonagall for the head, so let's try that. 00:50:20.250 --> 00:50:24.660 I could say, down below, INSERT INTO houses. 00:50:24.660 --> 00:50:27.510 And which columns we want to insert into now for houses? 00:50:31.460 --> 00:50:34.540 Head and house. 00:50:34.540 --> 00:50:35.950 Let's do it in the right order. 00:50:35.950 --> 00:50:40.350 Let's first do house and then head, let's then do the values. 00:50:40.350 --> 00:50:43.830 Let's say, OK, what should go into the house column? 00:50:43.830 --> 00:50:45.660 Gryffindor. 00:50:45.660 --> 00:50:47.670 And what should go into the head column? 00:50:47.670 --> 00:50:50.360 McGonagall. 00:50:50.360 --> 00:50:52.390 And then we'll hit semicolon, Enter here. 00:50:52.390 --> 00:50:53.560 And now, let's select. 00:50:53.560 --> 00:50:56.980 Let's SELECT star FROM houses, and now, we 00:50:56.980 --> 00:51:00.955 see Gryffindor as the house and the head is McGonagall, with the ID of 1. 00:51:00.955 --> 00:51:03.940 That primary key has auto incremented for us. 00:51:03.940 --> 00:51:06.850 Let's maybe add now, whoops-- 00:51:06.850 --> 00:51:08.500 let's maybe add now Slytherin. 00:51:08.500 --> 00:51:19.310 We could say INSERT INTO houses with the VALUES Slytherin and Severus Snape. 00:51:19.310 --> 00:51:23.060 And now, if we SELECT star FROM houses, we 00:51:23.060 --> 00:51:26.240 should see our ID has auto incremented, and we now 00:51:26.240 --> 00:51:30.110 have Slytherin with Severus Snape as the head. 00:51:30.110 --> 00:51:32.495 So questions on these inserts or these creating tables? 00:51:38.500 --> 00:51:41.620 Question, what if one insert more than one row at once? 00:51:41.620 --> 00:51:44.920 I believe that that is a good use case for something like Python. 00:51:44.920 --> 00:51:48.520 So we saw in lecture, not so much today, but in lecture, how we can use Python 00:51:48.520 --> 00:51:50.350 to kind of automate SQL queries. 00:51:50.350 --> 00:51:53.050 You could imagine, for example, having a Python loop, 00:51:53.050 --> 00:51:55.660 like a for loop, that kind of runs INSERT query 00:51:55.660 --> 00:51:59.580 command over and over again to add multiple rows over and over again 00:51:59.580 --> 00:52:00.080 there. 00:52:00.080 --> 00:52:03.577 So more on that next week, when we work on like-- 00:52:03.577 --> 00:52:05.660 actually in later weeks like with Flask and so on. 00:52:05.660 --> 00:52:08.440 But for now, I would refer back to lecture and look at how 00:52:08.440 --> 00:52:10.390 we can use Python along with SQL. 00:52:14.100 --> 00:52:17.990 So to close us out here, what we'll do is just 00:52:17.990 --> 00:52:21.290 figure out how we can actually modify our queries to fit 00:52:21.290 --> 00:52:24.030 this kind of more complex database. 00:52:24.030 --> 00:52:26.910 Now that we have data in all kinds of different tables, well, 00:52:26.910 --> 00:52:27.870 how do we query it? 00:52:27.870 --> 00:52:31.190 How do we get actually the information we want to get out of it? 00:52:31.190 --> 00:52:32.690 Something like this in the very end. 00:52:32.690 --> 00:52:35.520 How do we get the data we want out of this? 00:52:35.520 --> 00:52:37.520 And there are a number of ways we could do this. 00:52:37.520 --> 00:52:40.230 And let's take a look at actually an already complete database. 00:52:40.230 --> 00:52:42.920 So I'll go ahead and I will get a database 00:52:42.920 --> 00:52:48.170 that I've already created here for you. 00:52:48.170 --> 00:52:51.820 I'll do roster.db, copy it here. 00:52:51.820 --> 00:52:56.410 And now, if I do SQLite3 roster.db, type dot schema, 00:52:56.410 --> 00:53:00.230 you should see that now we have this brand new schema for our table. 00:53:00.230 --> 00:53:04.210 So we have a student's table with a student name and an ID, 00:53:04.210 --> 00:53:08.090 a houses table with the house name and a head name and an ID. 00:53:08.090 --> 00:53:10.300 And then finally, it's assignments table, where 00:53:10.300 --> 00:53:13.030 we say that a foreign key, called student_id, 00:53:13.030 --> 00:53:16.600 references the ID column in the student's table like this, 00:53:16.600 --> 00:53:22.960 and then the house_id column here references the ID column in houses 00:53:22.960 --> 00:53:24.260 just like this. 00:53:24.260 --> 00:53:29.440 And if I SELECT star FROM houses, notice how we have all of our houses 00:53:29.440 --> 00:53:30.430 and heads. 00:53:30.430 --> 00:53:36.400 If I SELECT star FROM students, notice I have all of my students and their ID. 00:53:36.400 --> 00:53:41.690 If I then SELECT star FROM the assignments table, 00:53:41.690 --> 00:53:47.670 well, I then just see all of the student IDs and the house ID they belong to. 00:53:47.670 --> 00:53:50.270 So a bit more complex now, but the goal was 00:53:50.270 --> 00:53:55.070 to reduce our repetition, to make sure we have every entity in a single table. 00:53:55.070 --> 00:53:57.930 But how do we then get the data that we want out of this? 00:53:57.930 --> 00:53:59.680 Well, there are a few ways we could do it. 00:53:59.680 --> 00:54:01.763 One way is to use these selects and these joins. 00:54:01.763 --> 00:54:04.430 Maybe we would actually try to figure out the number of students 00:54:04.430 --> 00:54:07.070 in Gryffindor, like how many students are in Gryffindor? 00:54:07.070 --> 00:54:10.340 We tried it out before, but now we have this new schema, this new table 00:54:10.340 --> 00:54:10.980 organization. 00:54:10.980 --> 00:54:12.890 So how should we change our approach? 00:54:12.890 --> 00:54:16.310 Well, we could use these selects and we can actually 00:54:16.310 --> 00:54:19.130 make sure our select are kind of nested inside of each other, 00:54:19.130 --> 00:54:22.790 like have one select work first and then feed the result to another select 00:54:22.790 --> 00:54:23.400 later on. 00:54:23.400 --> 00:54:24.390 So let's try this. 00:54:24.390 --> 00:54:28.430 Let's say I want to figure out how many students are in Gryffindor. 00:54:28.430 --> 00:54:33.320 Well, I should probably first notice that the ID column in houses 00:54:33.320 --> 00:54:37.790 corresponds with the house_id column in assignments. 00:54:37.790 --> 00:54:41.520 So maybe, what I should first do is figure out, well, 00:54:41.520 --> 00:54:44.570 what is the ID of Gryffindor? 00:54:44.570 --> 00:54:47.190 And to do that-- 00:54:47.190 --> 00:54:54.200 any ideas for a query to find out the ID for Gryffindor? 00:54:54.200 --> 00:54:56.735 What query could I use, assuming I couldn't see it before? 00:54:59.360 --> 00:55:03.560 Assuming I don't know what that ID is, how could I find the ID for Gryffindor? 00:55:06.900 --> 00:55:07.980 Here's our schema again. 00:55:14.287 --> 00:55:15.870 So maybe one thing I could do is I'd-- 00:55:15.870 --> 00:55:19.690 I don't the ID of Gryffindor off the bat, but I could try to select it. 00:55:19.690 --> 00:55:25.690 I could say SELECT maybe the ID from the houses table 00:55:25.690 --> 00:55:30.376 WHERE the house name is Gryffindor. 00:55:30.376 --> 00:55:33.880 I'll hit Enter and I get back 3 and does that match? 00:55:33.880 --> 00:55:34.380 Let's see. 00:55:34.380 --> 00:55:37.860 SELECT star FROM, in this case, houses. 00:55:37.860 --> 00:55:41.230 And I see Gryffindor in this case is 3, although admittedly, over here is one. 00:55:41.230 --> 00:55:42.313 That's a little confusing. 00:55:42.313 --> 00:55:48.100 But in our actual table, it is 3 over here. 00:55:48.100 --> 00:55:52.030 And now that I have the idea of Gryffindor, well, how could I find 00:55:52.030 --> 00:55:54.310 all the students who are in Gryffindor? 00:55:54.310 --> 00:55:55.280 I could try to do this. 00:55:55.280 --> 00:56:00.690 I can maybe say select the count of the student IDs. 00:56:00.690 --> 00:56:08.110 That's the count of student_ids from the assignments table WHERE 00:56:08.110 --> 00:56:11.020 the house_id, in this case, was 3. 00:56:11.020 --> 00:56:12.830 Gryffindor in this table here is 3. 00:56:12.830 --> 00:56:17.360 I'll hit Enter, and I get 11 students, which seems right. 00:56:17.360 --> 00:56:20.420 So this query could still be better designed though, 00:56:20.420 --> 00:56:23.450 because here, we're talking about this number three, 00:56:23.450 --> 00:56:25.580 but what could we do instead? 00:56:25.580 --> 00:56:29.090 How could we maybe not hard code this number three, 00:56:29.090 --> 00:56:31.130 if you recall from lecture? 00:56:31.130 --> 00:56:33.550 Any ideas in the chat? 00:56:33.550 --> 00:56:34.840 We could use a JOIN. 00:56:34.840 --> 00:56:37.935 We'll see those later on, but what else could we do? 00:56:43.050 --> 00:56:46.510 I'm seeing some ideas of kind of nesting these queries. 00:56:46.510 --> 00:56:50.700 So as an example here, let's say we first run this query. 00:56:50.700 --> 00:56:53.580 SELECT if FROM houses WHERE the house is Gryffindor. 00:56:53.580 --> 00:56:55.470 Give us the ID of Gryffindor. 00:56:55.470 --> 00:57:00.780 Then let's feed the results of that query into a larger query. 00:57:00.780 --> 00:57:02.290 This one, the above. 00:57:02.290 --> 00:57:05.790 So here, notice how we have some parentheses saying 00:57:05.790 --> 00:57:09.840 that this query first will be run at the very beginning. 00:57:09.840 --> 00:57:11.700 The result of this will then be passed up 00:57:11.700 --> 00:57:17.460 to our larger query, our query that's kind of nesting this one. 00:57:17.460 --> 00:57:19.080 So I'll run this first query. 00:57:19.080 --> 00:57:21.460 I'll go back one, at least in this table, 00:57:21.460 --> 00:57:25.800 and then I'll say that in this case, the house_id for Gryffindor is 1. 00:57:25.800 --> 00:57:27.930 And now, there's a query itself completes. 00:57:27.930 --> 00:57:30.680 I could say, it's like COUNT the student_id FROM assignments WHERE 00:57:30.680 --> 00:57:32.500 house_id is one in this case. 00:57:32.500 --> 00:57:33.900 So let's try it out. 00:57:33.900 --> 00:57:35.670 I could say SELECT. 00:57:35.670 --> 00:57:42.660 I know they're beginning the count of student_ids FROM the assignments table. 00:57:42.660 --> 00:57:47.073 But now, I need to figure out what's the ID of Gryffindor? 00:57:47.073 --> 00:57:49.740 Well, I know I want a house ID and I want it to equal something, 00:57:49.740 --> 00:57:51.092 but what is that something? 00:57:51.092 --> 00:57:54.300 What I could do in this case is I could say, well, I don't really quite know. 00:57:54.300 --> 00:57:55.290 I could instead-- 00:57:55.290 --> 00:57:59.010 I could say, well, go ahead and first find me, 00:57:59.010 --> 00:58:02.280 that ID for Gryffindor, SELECT id, in this case, 00:58:02.280 --> 00:58:11.540 FROM the houses table WHERE the name or the house is Gryffindor. 00:58:11.540 --> 00:58:17.660 And then I will go ahead and hit Enter here, close the query with a semicolon 00:58:17.660 --> 00:58:20.910 here, hit Enter, and I still get that same result. 00:58:20.910 --> 00:58:23.900 This is better now, because if the idea of Gryffindor changes, 00:58:23.900 --> 00:58:27.612 notice how in this table it's three, but in this table it's 1. 00:58:27.612 --> 00:58:29.570 If the idea of Gryffindor changes, well, now, I 00:58:29.570 --> 00:58:32.240 can just figure out what I'm looking for without actually caring 00:58:32.240 --> 00:58:34.920 what the ID of Gryffindor actually is. 00:58:34.920 --> 00:58:38.600 So thoughts or questions on these nest and selects? 00:58:38.600 --> 00:58:39.755 What questions do you have? 00:58:54.380 --> 00:58:56.720 A question here, can we nest without any limit? 00:58:56.720 --> 00:58:59.550 You theoretically can have as many nestings as you want to have. 00:58:59.550 --> 00:59:01.910 So we could maybe have a query here, if I look up here. 00:59:01.910 --> 00:59:04.830 Like maybe we don't know something about this query, 00:59:04.830 --> 00:59:07.310 we could say, OK, run another query to give me 00:59:07.310 --> 00:59:10.190 that result, to kind of nest them and nest them over and over again. 00:59:10.190 --> 00:59:11.570 You can absolutely do that. 00:59:11.570 --> 00:59:13.317 It comes at a cost though of performance. 00:59:13.317 --> 00:59:16.400 You want to make sure that you're kind of limiting your number of nestings 00:59:16.400 --> 00:59:18.650 to make sure you're not just looking up arbitrary data 00:59:18.650 --> 00:59:22.020 that you don't actually need in this case. 00:59:22.020 --> 00:59:25.322 So, no theoretical limit really, but there is a limit 00:59:25.322 --> 00:59:27.780 if you're concerned about performance, which you should be. 00:59:33.360 --> 00:59:35.363 So this was one way of approaching this problem, 00:59:35.363 --> 00:59:38.030 counting students of Gryffindor with this new table using a nest 00:59:38.030 --> 00:59:38.750 and selects. 00:59:38.750 --> 00:59:42.680 But another way-- this maybe a little more intuitive. 00:59:42.680 --> 00:59:45.660 In some ways, it's going to be this JOIN that we saw earlier. 00:59:45.660 --> 00:59:48.290 So this will bring us almost to the very end of our lecture, 00:59:48.290 --> 00:59:49.790 to look at JOINs here. 00:59:49.790 --> 00:59:53.090 But instead of using SELECT, we could use this idea 00:59:53.090 --> 00:59:55.400 of putting these tables together. 00:59:55.400 --> 00:59:58.940 Here in our select, which we're treating them still as separate tables. 00:59:58.940 --> 01:00:02.090 But with JOINs, we could say, let's actually put our tables together 01:00:02.090 --> 01:00:04.800 and think of it as one bigger table in this case. 01:00:04.800 --> 01:00:08.445 So if we look at our visual again, counting students in Gryffindor, 01:00:08.445 --> 01:00:10.070 we're going to see something like this. 01:00:10.070 --> 01:00:13.700 So we have three different tables, but it would be helpful, 01:00:13.700 --> 01:00:16.340 like ideally as a human, we could just simply take these tables 01:00:16.340 --> 01:00:19.280 and put them together, like just kind of LEGOs stick them together 01:00:19.280 --> 01:00:20.870 so they fit right next each other. 01:00:20.870 --> 01:00:24.440 We can see which houses align with which IDs. 01:00:24.440 --> 01:00:29.360 So let's again notice that in this case, the ID column in houses 01:00:29.360 --> 01:00:32.540 corresponds to the house_id column in assignments, 01:00:32.540 --> 01:00:35.060 and we just want to pair them up like put them right there 01:00:35.060 --> 01:00:39.500 and see that larger, bigger table, and that's what join does for you. 01:00:39.500 --> 01:00:43.580 So a JOIN, once you run it, look a bit like this. 01:00:43.580 --> 01:00:47.210 It'll put your two tables together and have them online 01:00:47.210 --> 01:00:49.620 based on the ID you specify. 01:00:49.620 --> 01:00:52.310 So here I'm saying, give me the assignments table 01:00:52.310 --> 01:00:55.050 and JOIN houses onto it. 01:00:55.050 --> 01:00:57.350 So notice how on the left hand side here, I 01:00:57.350 --> 01:00:59.810 see assignments, that is the assignments table, 01:00:59.810 --> 01:01:02.540 and I see houses, which is the houses table. 01:01:02.540 --> 01:01:05.330 But they're corresponding, they're now next each other, 01:01:05.330 --> 01:01:10.040 based on the fact that house_id in the assignment table corresponds 01:01:10.040 --> 01:01:13.670 to or references that ID column in houses. 01:01:13.670 --> 01:01:19.700 So SQL knows which rows to put together with which rows in each table based 01:01:19.700 --> 01:01:21.770 on this ID we specify. 01:01:21.770 --> 01:01:23.850 And the syntax for this looks a bit like this. 01:01:23.850 --> 01:01:28.470 We could say SELECT everything FROM the assignments table. 01:01:28.470 --> 01:01:34.820 But first, JOIN on the houses table, and make sure we use this column 01:01:34.820 --> 01:01:38.960 as the corresponding column in this case, like the assignments.house_id, 01:01:38.960 --> 01:01:42.530 meaning the house ID column in assignments, and houses.id, 01:01:42.530 --> 01:01:46.700 meaning the ID column in houses, making sure that these correspond to each 01:01:46.700 --> 01:01:48.260 other in this case. 01:01:48.260 --> 01:01:50.390 So notice how I go back to our visual here. 01:01:50.390 --> 01:01:52.580 We have the ID column with houses corresponding 01:01:52.580 --> 01:01:55.140 to the house_id column in assignment. 01:01:55.140 --> 01:01:57.080 So let's try this in SQL. 01:01:57.080 --> 01:02:02.220 Let's go ahead and SELECT star FROM our assignments table. 01:02:02.220 --> 01:02:06.170 But first, let's go ahead and JOIN on the houses table, 01:02:06.170 --> 01:02:09.590 and let's join houses on a certain column. 01:02:09.590 --> 01:02:17.550 Let's say we want the houses.id column to match up with 01:02:17.550 --> 01:02:20.070 the assignments.house_id column. 01:02:20.070 --> 01:02:20.990 And here, we do this. 01:02:20.990 --> 01:02:24.170 And if I zoom out a bit, you'll see the very same table. 01:02:24.170 --> 01:02:30.440 We have a student_id, the house_id, the ID of the house in the house's table, 01:02:30.440 --> 01:02:33.630 the house name, and the head. 01:02:33.630 --> 01:02:39.200 So what can we do now to figure out how many students are inside of Slytherin 01:02:39.200 --> 01:02:40.340 or Gryffindor in this case? 01:02:43.950 --> 01:02:46.025 I'm seeing some longer queries, which is good. 01:02:50.170 --> 01:02:52.060 We could simply count up student IDs. 01:02:52.060 --> 01:02:54.060 I'm sure we can do this, but let's try this one. 01:02:54.060 --> 01:02:58.240 We could say, I want to SELECT the COUNT of the student_ids 01:02:58.240 --> 01:03:02.950 that I have from the assignments table. 01:03:02.950 --> 01:03:04.870 But I first want to JOIN in. 01:03:04.870 --> 01:03:11.840 I want to JOIN in the houses table, making sure that, in this case, 01:03:11.840 --> 01:03:18.760 the houses.id column matches up with the assignments table 01:03:18.760 --> 01:03:21.730 with the house_id column. 01:03:21.730 --> 01:03:24.240 And now, with that bigger table, well, I only 01:03:24.240 --> 01:03:31.740 want the rows where it seems to be that the houses' house 01:03:31.740 --> 01:03:37.480 column, the house column in houses is equal to Gryffindor. 01:03:37.480 --> 01:03:38.890 Now, hit Enter here. 01:03:38.890 --> 01:03:41.218 and now, I get back 11 still, and I could 01:03:41.218 --> 01:03:42.760 do the very same thing for Slytherin. 01:03:42.760 --> 01:03:45.770 I could just change Gryffindor for Slytherin or Hufflepuff and so on. 01:03:45.770 --> 01:03:49.180 But the idea here is that we're trying to see our tables not 01:03:49.180 --> 01:03:52.957 as individual tables anymore, but now as one bigger table, 01:03:52.957 --> 01:03:55.540 and we can use that to figure out, well, what kinds of columns 01:03:55.540 --> 01:03:59.040 can I query to get the result that I want now. 01:03:59.040 --> 01:04:09.520 So what questions are there on these JOINs, either on the syntax 01:04:09.520 --> 01:04:10.240 or how they look? 01:04:13.160 --> 01:04:18.440 A good question-- so here we say, so it does not 01:04:18.440 --> 01:04:22.670 matter when we say JOIN on a foreign table in a primary table? 01:04:22.670 --> 01:04:25.100 Let me try to parse this question. 01:04:25.100 --> 01:04:27.260 What we could do-- 01:04:27.260 --> 01:04:33.800 notice how in this above query, I said houses.id equals assignments.house_id. 01:04:33.800 --> 01:04:39.230 And the reason for this was that if I have two ID columns in two tables, 01:04:39.230 --> 01:04:43.190 well, those columns are unambiguous in their own tables. 01:04:43.190 --> 01:04:46.130 But when I JOIN them, those columns of the same name 01:04:46.130 --> 01:04:49.310 are now ambiguous, meaning that they have the same name 01:04:49.310 --> 01:04:52.530 and there are two columns, so we have to differentiate them somehow. 01:04:52.530 --> 01:04:57.170 So if we say houses.id and assignments.house_id, well, 01:04:57.170 --> 01:05:01.190 if I had two columns with the same name, maybe it's just ID in assignments, 01:05:01.190 --> 01:05:05.120 I could say, houses.id equals assignments.id and therefore get 01:05:05.120 --> 01:05:10.330 the appropriate column name as we go through. 01:05:10.330 --> 01:05:12.100 A question, how not to show identical IDs, 01:05:12.100 --> 01:05:14.517 like the house_id in the house table and assignment table. 01:05:14.517 --> 01:05:21.970 So if I do this, go back to SELECT star, SELECT everything FROM our assignments 01:05:21.970 --> 01:05:33.760 table, joining in the houses table on houses.id equals assignments.house_id, 01:05:33.760 --> 01:05:35.970 notice how we had some repetition here. 01:05:35.970 --> 01:05:38.850 We had house_id and we had ID. 01:05:38.850 --> 01:05:42.000 In the way that SQLite works, this is just the way it adjoins tables. 01:05:42.000 --> 01:05:45.510 You're always going to get those duplicate columns here, 01:05:45.510 --> 01:05:48.270 but maybe helpful if you narrow down your SELECT. 01:05:48.270 --> 01:05:50.070 You could say maybe not select everything, 01:05:50.070 --> 01:05:52.830 but maybe SELECT the house_id. 01:05:52.830 --> 01:05:55.068 You could SELECT student_name and so on. 01:05:55.068 --> 01:05:57.360 So you could narrow your SELECT, not select everything. 01:05:57.360 --> 01:06:00.030 But in general, when you JOIN two tables, 01:06:00.030 --> 01:06:03.750 you are going to get those duplicate columns because SQL needs know 01:06:03.750 --> 01:06:07.140 which columns to JOIN by. 01:06:07.140 --> 01:06:08.190 Does that makes sense? 01:06:12.280 --> 01:06:17.260 And a good question, can you give an example of a JOIN with a GROUP BY? 01:06:17.260 --> 01:06:18.670 So we absolutely could. 01:06:18.670 --> 01:06:20.870 So notice how-- 01:06:20.870 --> 01:06:22.870 I remember how an earlier in section, we tried 01:06:22.870 --> 01:06:25.510 to have this GROUP BY that figured out how many students are 01:06:25.510 --> 01:06:29.770 in every house as a whole, like Slytherin, Gryffindor all in one table. 01:06:29.770 --> 01:06:33.040 We could do the same query now, but with a JOIN, with our separate tables. 01:06:33.040 --> 01:06:37.510 We could say, to be first, SELECT for me the houses, 01:06:37.510 --> 01:06:40.450 the house like the actual house name, and the COUNT 01:06:40.450 --> 01:06:46.120 of the student_ids in that house. 01:06:46.120 --> 01:06:50.110 And what I'll do now is I'll say I want to SELECT that FROM the assignments 01:06:50.110 --> 01:06:52.170 table. 01:06:52.170 --> 01:06:58.800 But I want to first JOIN in the houses table so long as the house's ID column 01:06:58.800 --> 01:07:03.700 corresponds with the assignments.house_id column. 01:07:03.700 --> 01:07:07.980 And now, what I could do is I could just simply GROUP BY house, 01:07:07.980 --> 01:07:10.440 or to be more specific, I could GROUP BY houses.house, 01:07:10.440 --> 01:07:12.270 the house column in houses. 01:07:12.270 --> 01:07:16.080 But I'll just say house for now, for consistency, and I'll hit Enter now. 01:07:16.080 --> 01:07:18.090 And now, I see that very same thing. 01:07:18.090 --> 01:07:21.930 Gryffindor has 11 students, Hufflepuff has 5, Ravenclaw has 13, 01:07:21.930 --> 01:07:24.770 and Slytherin has 11. 01:07:24.770 --> 01:07:26.420 So good question there. 01:07:26.420 --> 01:07:28.940 If we just treat our JOIN table as a regular big old table, 01:07:28.940 --> 01:07:31.715 well, we can do the very same, arithmetic as before. 01:07:39.778 --> 01:07:42.570 A good question here, so can we use AS when we JOIN the two tables? 01:07:42.570 --> 01:07:43.270 Let's try it. 01:07:43.270 --> 01:07:47.340 So, let's say SELECT house-- 01:07:47.340 --> 01:07:51.210 SELECT star, sorry, from, let's say, assignments. 01:07:51.210 --> 01:08:00.360 And then let's go ahead and JOIN on the houses table with houses.id equals, 01:08:00.360 --> 01:08:04.742 in this case, assignments.house_id. 01:08:04.742 --> 01:08:06.450 And the question was, can we use AS here? 01:08:06.450 --> 01:08:07.650 Can we say AS? 01:08:07.650 --> 01:08:12.430 Let's just call all of this ID, and let's see what happens. 01:08:12.430 --> 01:08:13.840 We'll hit Enter here. 01:08:13.840 --> 01:08:15.880 And I don't think we can do that, because we're 01:08:15.880 --> 01:08:18.460 trying to kind of rename these columns out or joining them. 01:08:18.460 --> 01:08:22.003 SQL expect the actual column names, not the aliases 01:08:22.003 --> 01:08:24.920 that we're going to use while we're joining, so a good question there. 01:08:27.930 --> 01:08:29.685 Other questions on these joins? 01:08:34.170 --> 01:08:37.660 For a visual for folks, if you are still wrapping your mind around this, 01:08:37.660 --> 01:08:44.069 it's helpful to see in this case that we're trying to simply figure out 01:08:44.069 --> 01:08:48.479 which IDs are sort of matching up with each other, and this part of our query 01:08:48.479 --> 01:08:50.134 is actually doing that matching for us. 01:08:53.135 --> 01:08:55.760 Can you join a table more than once, is a question in the chat. 01:08:55.760 --> 01:08:56.920 You absolutely can. 01:08:56.920 --> 01:09:00.979 So if I wanted to have more than one table in my joint, 01:09:00.979 --> 01:09:02.479 I can certainly do that. 01:09:02.479 --> 01:09:05.500 I could maybe get back my entirety of that original table we had before. 01:09:05.500 --> 01:09:11.979 I could say SELECT star FROM students, originally, but then I want to join, 01:09:11.979 --> 01:09:18.700 let's say, the assignments table, where the, let's see, 01:09:18.700 --> 01:09:23.399 it would be students.id is corresponding to the assignments.student_id. 01:09:25.939 --> 01:09:30.029 Let me zoom out a bit, so that JOIN is ready. 01:09:30.029 --> 01:09:33.149 Now I'll also JOIN to this table. 01:09:33.149 --> 01:09:38.580 I'll JOIN on the houses table so long as the houses.id column corresponds 01:09:38.580 --> 01:09:41.740 to the assignments.house_id column. 01:09:41.740 --> 01:09:46.109 And now, I should get back my entire table, joining these two back together. 01:09:46.109 --> 01:09:48.899 And now, it's kind of big and kind of redundant now, 01:09:48.899 --> 01:09:52.985 but I do see that very same table we had before, everything joined together, 01:09:52.985 --> 01:09:53.882 originally. 01:09:57.270 --> 01:09:58.470 Can you alias table name? 01:09:58.470 --> 01:10:00.350 I don't believe-- 01:10:00.350 --> 01:10:01.212 I'm not quite sure. 01:10:01.212 --> 01:10:02.920 If you were going to try it out yourself, 01:10:02.920 --> 01:10:05.350 I don't think you would be able to do that in this case, 01:10:05.350 --> 01:10:09.185 often because they don't show up in the query itself or the result of the query 01:10:09.185 --> 01:10:09.685 itself. 01:10:13.330 --> 01:10:15.250 Difference between JOIN and a UNION. 01:10:15.250 --> 01:10:18.040 We probably won't dive into unions so much here. 01:10:18.040 --> 01:10:20.645 But a UNION is good for finding out things that 01:10:20.645 --> 01:10:22.270 are in common with different data sets. 01:10:22.270 --> 01:10:24.340 Let's say you have one table and another, 01:10:24.340 --> 01:10:29.050 and both tables have some intersection, like datas [INAUDIBLE] 01:10:29.050 --> 01:10:32.350 a UNION would be good for that, whereas a JOIN is just good at simply combining 01:10:32.350 --> 01:10:34.300 your tables together as you go through. 01:10:37.040 --> 01:10:39.710 Other questions on joins before we wrap up? 01:10:43.400 --> 01:10:46.810 I'm happy to stay after as well. 01:10:46.810 --> 01:10:48.640 So, why don't we wrap officially here? 01:10:48.640 --> 01:10:51.225 Well, this was the end of our section for this week seven. 01:10:51.225 --> 01:10:52.600 Thank you all so much for coming. 01:10:52.600 --> 01:10:54.520 It was wonderful to see you over Zoom this week. 01:10:54.520 --> 01:10:56.460 I'm happy to stay on and answer questions in the chat, 01:10:56.460 --> 01:10:58.420 but otherwise, we'll see you in the future. 01:10:58.420 --> 01:10:59.040 Thank you all. 01:11:05.670 --> 01:11:06.600 Recording stopped. 01:11:11.957 --> 01:11:14.040 And thank you for all of your wonderful questions. 01:11:14.040 --> 01:11:15.560 I appreciate them in the chat. 01:11:15.560 --> 01:11:18.457 If you'd like to ask any others, I'm happy to stick around 01:11:18.457 --> 01:11:19.415 for a few more minutes. 01:11:24.955 --> 01:11:26.580 And thank you for that congratulations. 01:11:26.580 --> 01:11:29.290 I appreciate that. 01:11:29.290 --> 01:11:32.560 A question on GROUP BY, let's go back to that. 01:11:32.560 --> 01:11:34.900 The question was, how does GROUP BY work? 01:11:34.900 --> 01:11:36.940 What's the point of GROUP BY? 01:11:36.940 --> 01:11:43.380 It's maybe helpful if I show you how it would work in this context. 01:11:43.380 --> 01:11:45.150 Let's go ahead and join-- 01:11:45.150 --> 01:11:49.510 back to we have before, let's select everything from our students table. 01:11:49.510 --> 01:11:56.950 And let's go ahead and also JOIN in our assignments table ON assignments 01:11:56.950 --> 01:12:05.410 dot, in this case, student_id is equal to students_id. 01:12:05.410 --> 01:12:13.270 And let's join the, I can't remember, the houses table on houses.id equals 01:12:13.270 --> 01:12:15.940 assignments.house_id. 01:12:15.940 --> 01:12:18.220 So now, we have our big, big table. 01:12:18.220 --> 01:12:21.670 But let's say we want to group that data or do something 01:12:21.670 --> 01:12:24.490 with that data within a certain group, so we could say this. 01:12:24.490 --> 01:12:28.420 GROUP BY, often this case makes sense as GROUP BY house. 01:12:28.420 --> 01:12:33.400 And this is saying that if I wanted to count up anything or average anything 01:12:33.400 --> 01:12:37.420 or figure out how many of something I have, well, normally, 01:12:37.420 --> 01:12:42.640 if I didn't say GROUP BY, I would just get the count of all the students. 01:12:42.640 --> 01:12:47.305 But if I instead GROUP BY house, what I'll get-- 01:12:47.305 --> 01:12:50.750 and my code space is kind of glitching right now, 01:12:50.750 --> 01:12:54.410 but I will get back that same count, but only 01:12:54.410 --> 01:12:57.450 looking at the smaller subset of data within each group, 01:12:57.450 --> 01:12:59.750 so looking at how every house individually 01:12:59.750 --> 01:13:03.240 and counting up within that house group, if that makes sense. 01:13:03.240 --> 01:13:04.980 And if that doesn't quite make sense yet, 01:13:04.980 --> 01:13:07.940 I encourage you to experiment with it more to figure out how GROUP BY works, 01:13:07.940 --> 01:13:10.982 and there's still some resources on the web you could learn from as well. 01:13:15.300 --> 01:13:15.800 Yep. 01:13:15.800 --> 01:13:17.550 And we will certainly share the recordings 01:13:17.550 --> 01:13:21.915 for these in [INAUDIBLE] 2023, so it will be very early January as well. 01:13:24.880 --> 01:13:27.828 So other questions here? 01:13:27.828 --> 01:13:29.620 A question, are commands with capital case? 01:13:29.620 --> 01:13:31.810 That's by convention, so it helps us note 01:13:31.810 --> 01:13:36.110 between the capital of SQL keywords and our column names here, 01:13:36.110 --> 01:13:37.570 so by convention. 01:13:44.060 --> 01:13:46.130 Are sorting and GROUP BY similar? 01:13:46.130 --> 01:13:50.600 Sorting is helpful if you like ORDER BY, like is that an example of sorting? 01:13:50.600 --> 01:13:52.940 I could order my data. 01:13:52.940 --> 01:13:56.360 And that is kind of helpful for seeing groups in my data, 01:13:56.360 --> 01:13:58.310 like seeing what groups there are. 01:13:58.310 --> 01:14:01.620 But if I want to count something up within each group, 01:14:01.620 --> 01:14:04.160 well, I should still use GROUP BY in that case. 01:14:04.160 --> 01:14:07.772 So GROUP BY is often used with these aggregators, like count or average. 01:14:07.772 --> 01:14:09.980 Here, I don't have that up here, but I would normally 01:14:09.980 --> 01:14:13.580 type like COUNT star, AVERAGE star, and then a GROUP BY, so good question. 01:14:16.458 --> 01:14:18.500 Well, so glad this is helpful, and thank you all. 01:14:18.500 --> 01:14:19.833 It's great to see you over Zoom. 01:14:19.833 --> 01:14:22.300 I hope to see you next week.