WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:04.900 [MUSIC PLAYING] 00:00:17.650 --> 00:00:21.730 CARTER ZENKE: Well, hello, one and all, and welcome to CS50's Introduction to 00:00:21.730 --> 00:00:23.530 Databases with SQL. 00:00:23.530 --> 00:00:25.150 My name is Carter Zenke. 00:00:25.150 --> 00:00:29.920 And in this course, you'll learn how to represent, how to organize and manage, 00:00:29.920 --> 00:00:32.170 and how to ask questions of the data that's around you 00:00:32.170 --> 00:00:34.370 in your everyday life. 00:00:34.370 --> 00:00:36.610 But why learn those skills? 00:00:36.610 --> 00:00:40.520 Well, you may have heard we're living in the information age, where we generate 00:00:40.520 --> 00:00:43.660 so much information, so much data by virtue of interaction 00:00:43.660 --> 00:00:46.730 with computers and with each other over the internet. 00:00:46.730 --> 00:00:50.620 You might think of, let's say, Google keeping track of the sites you click on 00:00:50.620 --> 00:00:52.270 or the sites you search for. 00:00:52.270 --> 00:00:55.630 You could think of maybe the smartphone in your pocket or the smartwatch 00:00:55.630 --> 00:00:58.760 on your wrist keeping track of health information, emails, 00:00:58.760 --> 00:01:00.740 text messages, and so on. 00:01:00.740 --> 00:01:02.890 You might even think of YouTube, where you 00:01:02.890 --> 00:01:05.530 might be watching this same video keeping track of all 00:01:05.530 --> 00:01:08.470 the videos on their platform, the creators of those videos, 00:01:08.470 --> 00:01:12.190 and even the comment you might leave on this video. 00:01:12.190 --> 00:01:15.130 So although, we're living in this information age, where 00:01:15.130 --> 00:01:17.770 there is so much data, so much information, 00:01:17.770 --> 00:01:20.200 we can use these new tools, like database and SQL, 00:01:20.200 --> 00:01:23.918 to interact with that information to store it and manage it. 00:01:23.918 --> 00:01:25.960 And although we're using some of these new tools, 00:01:25.960 --> 00:01:29.560 some of the other concepts we'll learn aren't actually so new. 00:01:29.560 --> 00:01:34.690 So here is a diagram from literally a few thousand years ago. 00:01:34.690 --> 00:01:39.400 And notice how this diagram has rows and columns. 00:01:39.400 --> 00:01:43.270 And this seems to store the stipends for workers at a temple 00:01:43.270 --> 00:01:45.170 some few thousand years ago. 00:01:45.170 --> 00:01:50.200 So given what you know based on your prior knowledge, what kind of name 00:01:50.200 --> 00:01:55.030 might you give this diagram with rows and with columns? 00:01:55.030 --> 00:01:58.120 What can we give a name to for this? 00:01:58.120 --> 00:02:02.530 So I'm seeing some ideas of a table, perhaps a spreadsheet as well. 00:02:02.530 --> 00:02:05.260 For our purposes, we'll call this a table, where 00:02:05.260 --> 00:02:08.259 a table stores some set of information. 00:02:08.259 --> 00:02:12.370 And every row in that table stores one item 00:02:12.370 --> 00:02:15.760 in that set, where every column has some piece of information 00:02:15.760 --> 00:02:18.590 about that item, some attribute of that item. 00:02:18.590 --> 00:02:22.540 So here, for example, we do have a table of workers at a temple. 00:02:22.540 --> 00:02:24.790 Every row is one worker. 00:02:24.790 --> 00:02:29.210 And every column is their stipend for a particular month. 00:02:29.210 --> 00:02:33.010 So I could take this idea of a table, this very ancient idea, 00:02:33.010 --> 00:02:35.660 and apply it to a more modern context. 00:02:35.660 --> 00:02:38.810 So let's say I'm a librarian, for instance. 00:02:38.810 --> 00:02:41.230 I want to organize my library. 00:02:41.230 --> 00:02:44.530 Well, here I have book titles and authors. 00:02:44.530 --> 00:02:47.410 And I could certainly use a table to store this information. 00:02:47.410 --> 00:02:51.370 But how might you propose I store this information? 00:02:51.370 --> 00:02:54.640 What could I do with my rows and with my columns 00:02:54.640 --> 00:02:58.300 if I have book titles and book authors? 00:02:58.300 --> 00:03:03.400 I see one thing I could do is probably organize my titles and my authors 00:03:03.400 --> 00:03:05.178 next to each other like this. 00:03:05.178 --> 00:03:07.970 I could take my titles to my authors, put them right next together. 00:03:07.970 --> 00:03:10.420 So I have Song of Solomon by Toni Morrison, 00:03:10.420 --> 00:03:12.310 Goodnight Moon by Margaret Wise Brown. 00:03:12.310 --> 00:03:15.980 And notice hear how each book is one row. 00:03:15.980 --> 00:03:21.640 But every row has two columns worth two pieces of information for each book. 00:03:21.640 --> 00:03:25.970 I have a title for one column and an author for the other. 00:03:25.970 --> 00:03:29.950 And so together, I have a table of books where every column tells me 00:03:29.950 --> 00:03:31.210 one piece of information. 00:03:31.210 --> 00:03:36.440 And every row tells me one book in this data set. 00:03:36.440 --> 00:03:40.450 So thankfully, now that we're living in this information age, 00:03:40.450 --> 00:03:45.760 we no longer have to use stone tablets or perhaps pencil and paper 00:03:45.760 --> 00:03:47.110 to store our tables. 00:03:47.110 --> 00:03:51.550 We have software now like Apple Numbers, Google Sheets, and Microsoft Excel. 00:03:51.550 --> 00:03:57.050 But this isn't a course on Apple Numbers or Microsoft Excel or so on. 00:03:57.050 --> 00:04:00.340 It's actually a course on databases and on SQL. 00:04:00.340 --> 00:04:03.970 So feel free to raise your hand if you'd like, but why 00:04:03.970 --> 00:04:07.360 would we decide to move along from these spreadsheet 00:04:07.360 --> 00:04:10.420 softwares towards a database? 00:04:10.420 --> 00:04:14.620 What might that database give us that a spreadsheet might not give us? 00:04:14.620 --> 00:04:18.339 So I'm seeing a few ideas here and among them are some simplicity, 00:04:18.339 --> 00:04:20.845 some ability to organize some data. 00:04:20.845 --> 00:04:22.720 But there are a few other ideas which I think 00:04:22.720 --> 00:04:26.740 about too for why we might move beyond spreadsheets and go towards databases. 00:04:26.740 --> 00:04:29.530 Now one of these is this idea of scale. 00:04:29.530 --> 00:04:32.830 So let's say you are a Google or an Instagram. 00:04:32.830 --> 00:04:35.110 You're trying to store not just tens of thousands 00:04:35.110 --> 00:04:37.150 of users or hundreds of thousands but literally 00:04:37.150 --> 00:04:39.850 millions of users or billions of users. 00:04:39.850 --> 00:04:43.450 And with that kind of scale may be better served by a database 00:04:43.450 --> 00:04:46.250 to store that much information. 00:04:46.250 --> 00:04:49.900 Another reason to move from spreadsheets to databases 00:04:49.900 --> 00:04:53.170 is the idea of being able to update data more frequently. 00:04:53.170 --> 00:04:55.180 Maybe you're a Twitter of the world, and you're 00:04:55.180 --> 00:04:58.060 trying to have others tweet multiple times per second. 00:04:58.060 --> 00:05:00.910 Well, a database can handle that kind of capacity 00:05:00.910 --> 00:05:04.060 much better than a spreadsheet could alone. 00:05:04.060 --> 00:05:08.180 And a third reason to move beyond this might be speed. 00:05:08.180 --> 00:05:11.810 Let's say I'm trying to look up some piece of information in my database. 00:05:11.810 --> 00:05:14.320 Well, I could do that much faster with a database 00:05:14.320 --> 00:05:15.670 than I could with a spreadsheet. 00:05:15.670 --> 00:05:18.400 You could think of yourself using Command-F or Control-F to find 00:05:18.400 --> 00:05:20.260 a piece of information in your spreadsheet, 00:05:20.260 --> 00:05:22.330 going one by one through the rows. 00:05:22.330 --> 00:05:24.550 A database gives you access to more kinds 00:05:24.550 --> 00:05:28.540 of algorithms you could use to search this data much faster, ultimately. 00:05:28.540 --> 00:05:31.270 So these three reasons, among others, are 00:05:31.270 --> 00:05:36.220 those might want to move beyond spreadsheets and start using databases. 00:05:36.220 --> 00:05:40.437 Now, it's worth thinking first, what is a database? 00:05:40.437 --> 00:05:42.770 We'll be talking about databases throughout this course. 00:05:42.770 --> 00:05:44.530 So what is a database? 00:05:44.530 --> 00:05:48.430 Well, a database is simply some way to organize your data such 00:05:48.430 --> 00:05:54.070 that you can actually create data, update data, read data, and delete 00:05:54.070 --> 00:05:54.580 data. 00:05:54.580 --> 00:05:57.760 And often these our for interactions that we'll do with a database-- 00:05:57.760 --> 00:06:02.170 like adding some data, looking at data, deleting data, and even updating it 00:06:02.170 --> 00:06:03.550 along the way. 00:06:03.550 --> 00:06:06.640 But the database isn't the only thing in our picture here. 00:06:06.640 --> 00:06:09.715 We also have a database management system, 00:06:09.715 --> 00:06:13.180 a way to interact with our database. 00:06:13.180 --> 00:06:17.350 So you might think of perhaps writing a program on your computer. 00:06:17.350 --> 00:06:19.510 You have some interface with to write that program, 00:06:19.510 --> 00:06:21.250 like VS Code for instance. 00:06:21.250 --> 00:06:23.860 Or you might think of your own desktop on your own computer. 00:06:23.860 --> 00:06:27.340 You have icons that you interact with the underlying operating system. 00:06:27.340 --> 00:06:31.660 In the same way, we can use this software called a database management 00:06:31.660 --> 00:06:35.440 system, or a DBMS for short, to interact with a database perhaps 00:06:35.440 --> 00:06:40.470 using a graphical interface or using a textual language too. 00:06:40.470 --> 00:06:46.025 Now, there are a few varieties of database management softwares. 00:06:46.025 --> 00:06:47.400 And these are a few of them here. 00:06:47.400 --> 00:06:51.240 MySQL, Oracle, PostgreSQL, and SQLite. 00:06:51.240 --> 00:06:53.440 But this is a non-exhaustive list. 00:06:53.440 --> 00:06:57.690 So let me also ask again, what kinds of other database management systems 00:06:57.690 --> 00:07:01.120 have you perhaps heard of in this case? 00:07:01.120 --> 00:07:05.640 I'm seeing one for Microsoft Access, perhaps, MongoDB. 00:07:05.640 --> 00:07:08.790 There are other kinds of softwares, other companies out there that make 00:07:08.790 --> 00:07:10.900 these ways to interact with a database. 00:07:10.900 --> 00:07:13.440 And this is, again, a nonexhaustive list. 00:07:13.440 --> 00:07:16.260 Now, if you are a database administrator, 00:07:16.260 --> 00:07:19.410 or maybe you're somebody who's making a choice of which software to use, 00:07:19.410 --> 00:07:21.550 you have a few trade-offs to consider. 00:07:21.550 --> 00:07:25.320 Let's say you might think of one being proprietary, for instance, costing 00:07:25.320 --> 00:07:26.730 money to work with. 00:07:26.730 --> 00:07:29.250 What you get for that money is additional support 00:07:29.250 --> 00:07:31.950 to actually implement your own database. 00:07:31.950 --> 00:07:35.610 On the other hand, you might have open source software or free software 00:07:35.610 --> 00:07:39.210 to use, stuff like PostgreSQL, MySQL, and SQLite. 00:07:39.210 --> 00:07:43.500 But the downside is you are then responsible for actually implementing 00:07:43.500 --> 00:07:45.670 that database. 00:07:45.670 --> 00:07:48.960 Another thing to consider too is that maybe some 00:07:48.960 --> 00:07:52.350 are going to be heavier weight than others, more fully featured 00:07:52.350 --> 00:07:54.690 as a consequence but perhaps heavier weight requiring 00:07:54.690 --> 00:07:56.520 more computation to run. 00:07:56.520 --> 00:07:58.830 You can think of those like MySQL or PostgreSQL 00:07:58.830 --> 00:08:02.610 being a little bit heavier weight but being more fully featured, 00:08:02.610 --> 00:08:06.240 whereas SQLite down below will be a little lighter weight, as the name 00:08:06.240 --> 00:08:09.780 might imply, but allow you to do most of the same work 00:08:09.780 --> 00:08:13.330 that these other softwares could allow you to do as well. 00:08:13.330 --> 00:08:16.410 And in this course, we'll actually be using SQLite for you 00:08:16.410 --> 00:08:18.150 to work with your own databases. 00:08:18.150 --> 00:08:24.400 But gradually, we'll move on to MySQL and PostgreSQL too. 00:08:24.400 --> 00:08:29.180 So let me go ahead and talk about then SQL in this case. 00:08:29.180 --> 00:08:34.030 You might notice that in each of these, MySQL and PostgreSQL and SQLite, 00:08:34.030 --> 00:08:36.789 each of them have this idea of SQL in them. 00:08:36.789 --> 00:08:40.750 And SQL is that language that we'll use to interact with our database. 00:08:40.750 --> 00:08:45.130 Now, let me ask, what does SQL stand for, perhaps? 00:08:45.130 --> 00:08:51.260 We talk about S-Q-L or SQL, but what might SQL stand for? 00:08:51.260 --> 00:08:53.870 So I'm saying it stands for Structured Query Language, 00:08:53.870 --> 00:08:56.787 which is good if you already know this, but not to worry if you don't. 00:08:56.787 --> 00:09:00.200 So SQL does stand for Structured Query Language. 00:09:00.200 --> 00:09:02.790 As we'll see in this course, it is structured. 00:09:02.790 --> 00:09:05.720 It does have some keywords you can use to interact with the database. 00:09:05.720 --> 00:09:08.810 And it is a query language-- it can be used to ask 00:09:08.810 --> 00:09:13.110 questions of data inside a database. 00:09:13.110 --> 00:09:18.440 We'll see that this is the language we can use to create data, to read data, 00:09:18.440 --> 00:09:24.360 to update data, and delete data all with SQL in this case. 00:09:24.360 --> 00:09:28.350 And our next thing will be to talk about this idea of querying. 00:09:28.350 --> 00:09:32.830 So SQL is a query language, but what can we do with SQL? 00:09:32.830 --> 00:09:35.150 Well, the first thing we can do, we'll focus on first 00:09:35.150 --> 00:09:37.310 in this course is writing queries-- 00:09:37.310 --> 00:09:41.270 trying to ask questions of data using SQL. 00:09:41.270 --> 00:09:43.730 Well, what kinds of questions could we ask? 00:09:43.730 --> 00:09:48.320 Well, you might imagine perhaps working at an Instagram or a Facebook trying 00:09:48.320 --> 00:09:51.290 to work as an engineer to figure out what kinds of posts 00:09:51.290 --> 00:09:53.280 are the most liked on your platform. 00:09:53.280 --> 00:09:57.050 That's a question you can answer with databases and with SQL. 00:09:57.050 --> 00:10:00.500 You might also think of whether your numbers of daily users 00:10:00.500 --> 00:10:03.590 are growing or shrinking if you work at a startup for instance. 00:10:03.590 --> 00:10:07.530 Even maybe you might be working for some a company like Spotify that could ask, 00:10:07.530 --> 00:10:11.570 how could we play songs that are like those a user just played? 00:10:11.570 --> 00:10:17.690 This too is a question you can answer with databases and with SQL. 00:10:17.690 --> 00:10:21.440 Now today, we'll be focusing on this database of books. 00:10:21.440 --> 00:10:24.500 And in particular, books that have been longlisted, 00:10:24.500 --> 00:10:27.528 quote, unquote, "for the International Booker Prize." 00:10:27.528 --> 00:10:29.820 The International Booker Prize, if you're not familiar, 00:10:29.820 --> 00:10:32.570 is an award given to books written around the world 00:10:32.570 --> 00:10:34.250 by authors from many countries. 00:10:34.250 --> 00:10:38.600 And it's designed to award books of fiction particularly good 00:10:38.600 --> 00:10:39.500 in some cases. 00:10:39.500 --> 00:10:43.040 And every year, the committee selects 13 books 00:10:43.040 --> 00:10:47.360 to include on a longlist for consideration for this prize. 00:10:47.360 --> 00:10:50.960 And our database then has five years worth of longlists 00:10:50.960 --> 00:10:53.870 for the International Booker Prize inside of it. 00:10:53.870 --> 00:10:55.730 We could use this database, perhaps, if we 00:10:55.730 --> 00:11:00.680 were a librarian trying to find books for our library or even as a book 00:11:00.680 --> 00:11:03.620 reader, an avid reader myself trying to find books to read that I 00:11:03.620 --> 00:11:06.420 could put on my own shelf overall. 00:11:06.420 --> 00:11:08.550 So we'll look at this database. 00:11:08.550 --> 00:11:10.580 But we'll need a few tools in our toolkit, 00:11:10.580 --> 00:11:13.650 metaphorically, to actually interact with this database. 00:11:13.650 --> 00:11:17.510 And one of them is going to be Visual Studio Code. 00:11:17.510 --> 00:11:22.460 Visual Studio code is an IDE, integrated development environment, to write code 00:11:22.460 --> 00:11:24.020 and to edit files with. 00:11:24.020 --> 00:11:27.370 It's also often called VS Code. 00:11:27.370 --> 00:11:32.700 Now, in VS Code, we'll also be able to use SQLite, this database management 00:11:32.700 --> 00:11:37.600 system, or a DBMS for short, to actually interact with that database. 00:11:37.600 --> 00:11:39.780 So we'll be using these two tools combined 00:11:39.780 --> 00:11:44.100 to work with the database of longlisted books for the International Booker 00:11:44.100 --> 00:11:45.520 Prize. 00:11:45.520 --> 00:11:49.540 And although we'll be using it here, SQLite is not just used in this course. 00:11:49.540 --> 00:11:51.940 It's used in a variety of applications. 00:11:51.940 --> 00:11:54.100 You could think too of phone applications, 00:11:54.100 --> 00:11:58.540 where SQLite is often used on those devices have much lower memory. 00:11:58.540 --> 00:12:02.140 You could think too of it being used on desktop applications 00:12:02.140 --> 00:12:05.320 to simplify the process of storing data there too. 00:12:05.320 --> 00:12:07.990 You could even think of it being used on websites 00:12:07.990 --> 00:12:12.890 to help store information that user submits via a form, for example. 00:12:12.890 --> 00:12:15.760 So we'll jump into using SQLite, but keep in mind 00:12:15.760 --> 00:12:18.340 that not just in this course you'll use it, 00:12:18.340 --> 00:12:23.180 but also, it's used in a variety of applications here too. 00:12:23.180 --> 00:12:27.530 So why don't we just jump right into things and start using our environment 00:12:27.530 --> 00:12:28.850 and start using SQLite? 00:12:28.850 --> 00:12:34.490 So I'll go over here to my computer and will open up, let's say, VS Code. 00:12:34.490 --> 00:12:35.330 We're here. 00:12:35.330 --> 00:12:37.730 You can see I have my terminal environment. 00:12:37.730 --> 00:12:40.190 And if you're familiar, you can type things like ls 00:12:40.190 --> 00:12:43.110 to see the files that are in your current folder. 00:12:43.110 --> 00:12:44.810 So I'll type ls right here. 00:12:44.810 --> 00:12:48.800 And I'll see this database called longlist.db. 00:12:48.800 --> 00:12:52.370 Again, working with books that have been longlisted or considered 00:12:52.370 --> 00:12:54.920 for the International Booker Prize. 00:12:54.920 --> 00:13:00.380 So if I want to open up this file, I can use this command. 00:13:00.380 --> 00:13:04.910 Then this command is going to be called SQLite 3. 00:13:04.910 --> 00:13:09.590 Or I can take some file that I have, like longlist.db, and open 00:13:09.590 --> 00:13:12.680 it using this program called SQLite 3. 00:13:12.680 --> 00:13:16.700 Well, it's called SQLite 3 because this is the third version of the SQLite 00:13:16.700 --> 00:13:17.870 software. 00:13:17.870 --> 00:13:20.360 So let's try this in our terminal. 00:13:20.360 --> 00:13:27.110 I'll go back over here, and I'll say sqlite3 longlist.db. 00:13:27.110 --> 00:13:29.180 And now, I'll hit Enter. 00:13:29.180 --> 00:13:32.010 And notice how my terminal prompt changes. 00:13:32.010 --> 00:13:33.470 It's no longer dollar sign. 00:13:33.470 --> 00:13:35.960 It now says sqlite in front. 00:13:35.960 --> 00:13:39.840 This means I'm inside of my SQLite environment. 00:13:39.840 --> 00:13:42.900 So to clear things up, let me just clear my terminal. 00:13:42.900 --> 00:13:44.780 I can use Control-L for this. 00:13:44.780 --> 00:13:48.050 And now I have just that prompt up top. 00:13:48.050 --> 00:13:52.040 And now a question I want to answer in this case first is, what 00:13:52.040 --> 00:13:55.520 data do I actually have in my database? 00:13:55.520 --> 00:14:00.650 What data is actually here for me to look at and to ask questions about? 00:14:00.650 --> 00:14:04.520 Now, for this question, I can use my very first SQL 00:14:04.520 --> 00:14:07.100 keyword, which will be called SELECT. 00:14:07.100 --> 00:14:13.370 So SELECT is a way for me to select some rows in a table inside of my database. 00:14:13.370 --> 00:14:15.890 Using SELECT, I can get back certain rows 00:14:15.890 --> 00:14:20.010 or, in this case, perhaps all of them just to get a taste of what's inside. 00:14:20.010 --> 00:14:23.210 So let's try using SELECT on this database to understand 00:14:23.210 --> 00:14:26.550 what rows we have in our table here. 00:14:26.550 --> 00:14:30.440 Let me go back to my computer and to my SQLite environment. 00:14:30.440 --> 00:14:33.920 And I will try this very first SQL keyword. 00:14:33.920 --> 00:14:35.690 I'll say SELECT. 00:14:35.690 --> 00:14:39.980 And I can use this star operator here to say select everything. 00:14:39.980 --> 00:14:45.050 I want every row and every column from this table. 00:14:45.050 --> 00:14:49.940 Now, it's not enough for me to simply say select star and end my query. 00:14:49.940 --> 00:14:54.140 I had to tell SQL which table do I want to select rows from. 00:14:54.140 --> 00:14:57.260 In this case, I know I table was called longlist. 00:14:57.260 --> 00:15:02.510 So I'll say SELECT star from "longlist," quote, unquote. 00:15:02.510 --> 00:15:06.440 And to end my query, I'll say Semicolon. 00:15:06.440 --> 00:15:09.230 And then finally, I can hit Enter. 00:15:09.230 --> 00:15:11.690 And notice how I get a lot of data back. 00:15:11.690 --> 00:15:13.285 This is a lot of data all at once. 00:15:13.285 --> 00:15:15.410 But it's because my terminal is a little bit small, 00:15:15.410 --> 00:15:17.250 there's a lot of rows and columns here. 00:15:17.250 --> 00:15:20.420 So I could probably simplify this just a little bit. 00:15:20.420 --> 00:15:24.740 And instead of saying SELECT star, I could also select a particular column 00:15:24.740 --> 00:15:26.390 from my table. 00:15:26.390 --> 00:15:29.360 I could say, for instance, SELECT, let's say, 00:15:29.360 --> 00:15:35.100 just the title column from my database for my table like this. 00:15:35.100 --> 00:15:39.170 I just know already that there is a column called title. 00:15:39.170 --> 00:15:43.370 So now, I'll try this instead-- not select star, but select title instead. 00:15:43.370 --> 00:15:44.390 I'll hit Enter. 00:15:44.390 --> 00:15:46.640 And now, this looks a little bit better. 00:15:46.640 --> 00:15:53.970 I can see the titles inside of this table from top to bottom. 00:15:53.970 --> 00:15:58.185 Now, the neat thing here is I can select more than one column too. 00:15:58.185 --> 00:15:59.810 Let's say I don't want just the title-- 00:15:59.810 --> 00:16:02.360 I want titles and authors in my search. 00:16:02.360 --> 00:16:04.260 Well, I could do that as well. 00:16:04.260 --> 00:16:05.150 Let me try this. 00:16:05.150 --> 00:16:09.020 SELECT not just "title," quote, unquote, but then I'll 00:16:09.020 --> 00:16:12.740 say comma and some new column to select. 00:16:12.740 --> 00:16:16.700 I'll select also the authors from this table. 00:16:16.700 --> 00:16:20.510 And I'll select them from the longlist table like this. 00:16:20.510 --> 00:16:22.850 Now I'll hit Semicolon to end my query. 00:16:22.850 --> 00:16:23.960 Hit Enter. 00:16:23.960 --> 00:16:27.260 And now I'll see a variety of columns here, 00:16:27.260 --> 00:16:31.450 in particular the title and the author column. 00:16:31.450 --> 00:16:34.183 Now, this is going to be like all of my columns so far. 00:16:34.183 --> 00:16:36.600 And there is a way for me to get back just to some of them 00:16:36.600 --> 00:16:37.920 later on that we'll see. 00:16:37.920 --> 00:16:40.560 But for now, let's ask, what questions do 00:16:40.560 --> 00:16:47.180 you have on this SQL SELECT statement and how we're getting back these rows? 00:16:47.180 --> 00:16:50.712 SPEAKER: Do I need to use the quotes around the words like you are? 00:16:50.712 --> 00:16:52.170 CARTER ZENKE: Yeah, great question. 00:16:52.170 --> 00:16:55.130 Do I have to use quotes around the words like I am? 00:16:55.130 --> 00:16:59.570 In general, it's a good practice to use these double quotes around your table 00:16:59.570 --> 00:17:01.430 names and your column names. 00:17:01.430 --> 00:17:03.920 These are called SQL identifiers. 00:17:03.920 --> 00:17:08.180 Later on, we'll see we'll also have strings in SQL-- strings 00:17:08.180 --> 00:17:10.040 being collections of characters we can use. 00:17:10.040 --> 00:17:12.319 For those, we'll simply single-quote them 00:17:12.319 --> 00:17:16.440 to note the difference between a string and an actual column name. 00:17:16.440 --> 00:17:19.970 So good style convention here to use-- double quotes for column names 00:17:19.970 --> 00:17:23.300 and single quotes for string names. 00:17:23.300 --> 00:17:25.190 Other questions too? 00:17:25.190 --> 00:17:25.740 SPEAKER: OK. 00:17:25.740 --> 00:17:29.990 I wanted to know, where did we take all this information from. 00:17:29.990 --> 00:17:34.340 Like, this data, I don't know this list of books, 00:17:34.340 --> 00:17:36.638 where did we take all this information from? 00:17:36.638 --> 00:17:38.180 CARTER ZENKE: Yeah, a great question. 00:17:38.180 --> 00:17:39.990 Where do we take this information from? 00:17:39.990 --> 00:17:42.270 So some of this data is publicly available. 00:17:42.270 --> 00:17:44.690 In fact, if you look at the Booker Prize website, 00:17:44.690 --> 00:17:47.780 you can find a set of longlisted books over the years. 00:17:47.780 --> 00:17:51.650 In this case, we have books from 2018 to 2023. 00:17:51.650 --> 00:17:56.888 We'll also see later on this table has data on the ratings of those books 00:17:56.888 --> 00:17:59.180 and the number of votes that were given to those books. 00:17:59.180 --> 00:18:02.630 That data is from Goodreads, the site that aggregates 00:18:02.630 --> 00:18:06.930 reviews from people like you, people like me who rate books online. 00:18:06.930 --> 00:18:11.380 So we've taken data from a variety of sources and combined it into one here. 00:18:11.380 --> 00:18:14.497 Let's take one more question to from Vinayak. 00:18:14.497 --> 00:18:15.080 SPEAKER: Yeah. 00:18:15.080 --> 00:18:20.340 So I wanted to ask regarding the syntax of what you used in the terminal. 00:18:20.340 --> 00:18:24.830 So is the whole SQLite 3 is case-sensitive 00:18:24.830 --> 00:18:30.020 because while using the syntax you used capital letters, whereas can we 00:18:30.020 --> 00:18:32.568 use small-case letters as well? 00:18:32.568 --> 00:18:33.860 CARTER ZENKE: A great question. 00:18:33.860 --> 00:18:36.580 So here, I used capital letters for SQL keywords 00:18:36.580 --> 00:18:39.280 and lowercase, my table names and column names. 00:18:39.280 --> 00:18:40.810 Do I have to do that? 00:18:40.810 --> 00:18:43.210 I, in some cases, do, some cases don't. 00:18:43.210 --> 00:18:46.390 I think I could use lowercase for these SQL keywords, 00:18:46.390 --> 00:18:48.860 but it's not very good style, for instance. 00:18:48.860 --> 00:18:52.460 So let me just show you an example of this while I go back to my computer. 00:18:52.460 --> 00:18:57.550 So the question again was, can I use lowercase for SQL keywords? 00:18:57.550 --> 00:19:00.850 I think I could, but the question is, should I? 00:19:00.850 --> 00:19:01.618 And probably not. 00:19:01.618 --> 00:19:02.410 So let me try this. 00:19:02.410 --> 00:19:05.445 I'll say select, let's say, title from-- 00:19:05.445 --> 00:19:06.820 I'm in the habit of uppercase it. 00:19:06.820 --> 00:19:11.650 So I'll say from in lowercase longlist like this Semicolon. 00:19:11.650 --> 00:19:12.610 Hit Enter. 00:19:12.610 --> 00:19:14.470 And that still works. 00:19:14.470 --> 00:19:17.110 But the problem you might run into is someone 00:19:17.110 --> 00:19:21.160 who's reading your query particularly, a long one, might want to know, 00:19:21.160 --> 00:19:22.810 what are the SQL keywords? 00:19:22.810 --> 00:19:26.380 And what are your column names and other identifiers here? 00:19:26.380 --> 00:19:29.410 By capitalizing your SQL keywords, you can make it clear 00:19:29.410 --> 00:19:35.280 that this is a SQL keyword and not some other name overall. 00:19:35.280 --> 00:19:35.880 OK. 00:19:35.880 --> 00:19:37.020 So let's keep going. 00:19:37.020 --> 00:19:41.790 And we saw just a little bit ago that we could select "title" from "longlist." 00:19:41.790 --> 00:19:46.350 And we would get back a whole list of titles, literally all the titles that 00:19:46.350 --> 00:19:48.400 are in this database. 00:19:48.400 --> 00:19:53.100 But it's often maybe good practice for me to not look at all the data. 00:19:53.100 --> 00:19:56.430 Like, imagine if we had millions of rows in this column, 00:19:56.430 --> 00:19:57.990 but only to give back some. 00:19:57.990 --> 00:20:00.690 Take a peek of what's inside this database. 00:20:00.690 --> 00:20:05.820 And for that, I could use this other SQL keyword, this one called LIMIT. 00:20:05.820 --> 00:20:09.120 So LIMIT, as its name might imply, limits 00:20:09.120 --> 00:20:13.050 the number of queries or the number of rows I get back from my query. 00:20:13.050 --> 00:20:16.200 I could say LIMIT 3, for instance, or LIMIT 5 00:20:16.200 --> 00:20:20.130 to get back only the top 3 or the top 5 rows from my table. 00:20:20.130 --> 00:20:22.500 And let me ask folks, if I wanted to peak 00:20:22.500 --> 00:20:27.890 in this data set, how many rows should I try to limit it to? 00:20:27.890 --> 00:20:30.200 I could say SELECT title from, let's say, 00:20:30.200 --> 00:20:33.960 longlist, but limit to what number? 00:20:33.960 --> 00:20:35.100 I'm seeing 10. 00:20:35.100 --> 00:20:36.350 So let's try 10 first. 00:20:36.350 --> 00:20:38.180 I'll go back to my computer. 00:20:38.180 --> 00:20:40.610 I'll come back over here, and I'll say, why don't I 00:20:40.610 --> 00:20:45.700 select "title" from "longlist?" 00:20:45.700 --> 00:20:53.680 But now, instead of hitting Semicolon, I will instead say LIMIT 10 Semicolon. 00:20:53.680 --> 00:20:54.970 And I'll hit Enter. 00:20:54.970 --> 00:21:00.010 Now I see only the first 10 rows in my data set. 00:21:00.010 --> 00:21:03.380 Handy for peeking in at the top of my data set here too. 00:21:03.380 --> 00:21:05.210 Let me try not just 10 but 5. 00:21:05.210 --> 00:21:13.910 So I'll say SELECT "title" from "longlist" LIMIT 5 Semicolon. 00:21:13.910 --> 00:21:19.375 This then gives me just the top 5 titles in my database. 00:21:19.375 --> 00:21:21.500 Just in whatever order they were added to my table, 00:21:21.500 --> 00:21:25.800 I'll see them in that order here too. 00:21:25.800 --> 00:21:31.120 So using LIMIT, we can actually try to get back a certain number of rows. 00:21:31.120 --> 00:21:33.745 But this isn't quite that interesting. 00:21:33.745 --> 00:21:35.370 It's good for peaking in your data set. 00:21:35.370 --> 00:21:38.200 I think we've answered that question of, what data do we have? 00:21:38.200 --> 00:21:40.950 But let's say we want to make more advanced queries. 00:21:40.950 --> 00:21:44.580 We want to find the books that were nominated in 2023 or perhaps books 00:21:44.580 --> 00:21:45.810 by a certain author. 00:21:45.810 --> 00:21:50.730 Well, for that, we could use this next SQL keyword, this one called WHERE. 00:21:50.730 --> 00:21:56.880 So WHERE allows me to get back not all rows, but only 00:21:56.880 --> 00:22:00.970 some rows where some condition is true. 00:22:00.970 --> 00:22:04.500 So WHERE is often combined with other conditions 00:22:04.500 --> 00:22:09.730 to make sure I only get back some rows, where that condition is true. 00:22:09.730 --> 00:22:11.850 Let's try looking at that in SQLite to get 00:22:11.850 --> 00:22:14.170 a feel for what it can do for us here. 00:22:14.170 --> 00:22:17.550 So I'll go back to my WHERE. 00:22:17.550 --> 00:22:20.610 And I will then go back to SQLite here. 00:22:20.610 --> 00:22:23.490 Hit Control-L to clear my terminal. 00:22:23.490 --> 00:22:25.530 And I'll then try this query. 00:22:25.530 --> 00:22:28.800 I'll say SELECT "title." 00:22:28.800 --> 00:22:29.790 SELECT "title." 00:22:29.790 --> 00:22:32.460 And why don't we also select author along the way? 00:22:32.460 --> 00:22:34.290 Two columns here. 00:22:34.290 --> 00:22:37.380 And I'll select them from my longlist table. 00:22:37.380 --> 00:22:39.450 But I don't want all rows. 00:22:39.450 --> 00:22:45.240 I only want, let's say, those titles and authors that were longlisted in 2023. 00:22:45.240 --> 00:22:46.260 So I'll do this. 00:22:46.260 --> 00:22:53.070 I'll say WHERE the "year" column is equal to 2023. 00:22:53.070 --> 00:22:56.880 And notice here how 2023 is not in quotes because it 00:22:56.880 --> 00:22:59.110 is an actual number, an integer. 00:22:59.110 --> 00:23:02.610 So I don't need to quote it like I would a string some collection of characters 00:23:02.610 --> 00:23:05.710 or a table or column name. 00:23:05.710 --> 00:23:07.480 So I'll hit Enter here. 00:23:07.480 --> 00:23:08.370 And what do I see? 00:23:08.370 --> 00:23:12.870 Well, I see only those books that were nominated in 2023. 00:23:12.870 --> 00:23:14.200 Let's try this again. 00:23:14.200 --> 00:23:17.130 I might try not just 2023-- 00:23:17.130 --> 00:23:20.520 I might try 2022 like this. 00:23:20.520 --> 00:23:21.870 Hit Semicolon. 00:23:21.870 --> 00:23:25.380 Now, I'll see those books nominated in 2022. 00:23:25.380 --> 00:23:26.500 I could keep going. 00:23:26.500 --> 00:23:28.260 I could say why not 2022? 00:23:28.260 --> 00:23:29.820 Why not 2021? 00:23:29.820 --> 00:23:34.450 Now, I have all those books nominated in 2021. 00:23:34.450 --> 00:23:37.040 So this is handy. 00:23:37.040 --> 00:23:40.930 We can set things equal to or not equal to make some condition here. 00:23:40.930 --> 00:23:42.970 And we also have others we could use. 00:23:42.970 --> 00:23:44.650 We saw equals just now. 00:23:44.650 --> 00:23:46.960 But we similarly have not equals. 00:23:46.960 --> 00:23:50.170 And we have this kind of obscure operator down here. 00:23:50.170 --> 00:23:55.850 This one being also equivalent to not equals as we'll see in just a minute. 00:23:55.850 --> 00:23:58.120 But let me first ask now, what questions do 00:23:58.120 --> 00:24:02.170 we have on how to use WHERE or using SELECT so far? 00:24:02.170 --> 00:24:04.760 SPEAKER: Why are the subsets of SQL? 00:24:04.760 --> 00:24:06.010 CARTER ZENKE: A good question. 00:24:06.010 --> 00:24:07.730 Are there subsets of SQL? 00:24:07.730 --> 00:24:09.160 So there are. 00:24:09.160 --> 00:24:12.870 In fact, S-Q-L or SQL was defined by the, 00:24:12.870 --> 00:24:15.010 I believe it's the ANSI, like standard corporation. 00:24:15.010 --> 00:24:17.290 They have a whole set of the SQL language 00:24:17.290 --> 00:24:19.370 that is like the official version of it. 00:24:19.370 --> 00:24:22.240 You might be able to use some subset of that version 00:24:22.240 --> 00:24:24.740 with the database manager system that you actually use. 00:24:24.740 --> 00:24:29.020 So for SQLite, we're using a subset of SQL that works with SQLite. 00:24:29.020 --> 00:24:33.340 Similarly, if you were using another software like PostgreSQL or MySQL, 00:24:33.340 --> 00:24:36.770 you could use another subset there too. 00:24:36.770 --> 00:24:40.350 Let's take another question from Tayas, perhaps. 00:24:40.350 --> 00:24:45.793 SPEAKER: Then I want to know that, can we add 2022 and 2021 in a terminal? 00:24:45.793 --> 00:24:47.210 CARTER ZENKE: Yeah, good question. 00:24:47.210 --> 00:24:52.880 Could I, perhaps, try to filter by 2021 and 2022? 00:24:52.880 --> 00:24:53.582 I could do that. 00:24:53.582 --> 00:24:55.290 And we'll see that in just a moment here. 00:24:55.290 --> 00:24:58.250 So let's keep going and exploring some other options with not equals. 00:24:58.250 --> 00:25:02.090 And then we'll see how we can combine conditions using WHERE too. 00:25:02.090 --> 00:25:05.300 So let's go back, and let's focus first on trying 00:25:05.300 --> 00:25:07.730 to use these not equal operators. 00:25:07.730 --> 00:25:12.380 We saw the exclamation point equals and this greater than, less than sign put 00:25:12.380 --> 00:25:13.500 together. 00:25:13.500 --> 00:25:15.180 So let's try a few of those. 00:25:15.180 --> 00:25:20.070 Let's say I want to find books that are written by a certain author. 00:25:20.070 --> 00:25:21.620 Well, I could use equals for that. 00:25:21.620 --> 00:25:24.170 But let's say I also want to find books that are not 00:25:24.170 --> 00:25:26.810 written in the hardcover format, like they 00:25:26.810 --> 00:25:28.440 tend to be more expensive and so on. 00:25:28.440 --> 00:25:30.620 So I don't want hardcover books. 00:25:30.620 --> 00:25:32.550 Well, I could try a query like this. 00:25:32.550 --> 00:25:38.090 I could say SELECT "title" and "format," where format is either hardcover 00:25:38.090 --> 00:25:42.050 or paperback, FROM my "longlist" table. 00:25:42.050 --> 00:25:49.370 And now I'll say, WHERE the "format" is not equal to hardcover Semicolon. 00:25:49.370 --> 00:25:53.660 So notice here I'm using single quotes for hardcover. 00:25:53.660 --> 00:25:54.960 This is a string. 00:25:54.960 --> 00:25:57.530 It's not a table name or a column name. 00:25:57.530 --> 00:25:58.580 It is just a string. 00:25:58.580 --> 00:26:00.770 So I'm using single quotes here. 00:26:00.770 --> 00:26:05.000 Everything else, though, like format, longlist, title, et cetera, 00:26:05.000 --> 00:26:08.400 those are all table names or column names. 00:26:08.400 --> 00:26:09.860 So again, I'll hit Enter. 00:26:09.860 --> 00:26:15.090 And now I'll see that these are all in paperback according to my table. 00:26:15.090 --> 00:26:17.990 I've omitted those that are hardcover. 00:26:17.990 --> 00:26:21.980 Well, I could also use in this case, the greater than or less 00:26:21.980 --> 00:26:25.230 than sign put together to say not equals as well. 00:26:25.230 --> 00:26:27.200 Let me just hit the Up Arrow on my computer 00:26:27.200 --> 00:26:29.540 to reveal what I just previously typed. 00:26:29.540 --> 00:26:36.410 I'll then tab back over and say not exclamation point equals, but less than 00:26:36.410 --> 00:26:37.670 and then greater than. 00:26:37.670 --> 00:26:38.840 Hit Enter now. 00:26:38.840 --> 00:26:41.150 And I should see the very same results. 00:26:41.150 --> 00:26:44.270 But all I did was change this operator from exclamation point 00:26:44.270 --> 00:26:47.030 equals to less than and greater than. 00:26:47.030 --> 00:26:49.250 It tends to be that the exclamation point equals 00:26:49.250 --> 00:26:55.430 is the more common operator in this case, but they do the very same thing. 00:26:55.430 --> 00:26:59.260 Now, one more keyword I could use here that's worth mentioning 00:26:59.260 --> 00:27:01.570 is this keyword called NOT. 00:27:01.570 --> 00:27:05.950 So here, I was able to use exclamation point equals or the less 00:27:05.950 --> 00:27:08.290 than or greater than sign to say not equals. 00:27:08.290 --> 00:27:12.070 But I could also negate a condition using NOT. 00:27:12.070 --> 00:27:16.150 So let's try using this one in our SQLite terminal too. 00:27:16.150 --> 00:27:17.500 I'll go back over here. 00:27:17.500 --> 00:27:21.190 And I'll bring back my SQLite terminal. 00:27:21.190 --> 00:27:27.520 I'll say let's do not this operator here, but instead use NOT. 00:27:27.520 --> 00:27:35.740 So I might, in front of WHERE, say WHERE NOT "format" equals "hardcover." 00:27:35.740 --> 00:27:38.500 So I have a condition, "format" equals "hardcover," 00:27:38.500 --> 00:27:41.680 but now I'm going to negate it, take the opposite of it 00:27:41.680 --> 00:27:46.600 and get back the very same results here too. 00:27:46.600 --> 00:27:48.060 OK. 00:27:48.060 --> 00:27:52.050 So to the question earlier, how we can combine these conditionals. 00:27:52.050 --> 00:27:55.000 So let's try that here in just a minute. 00:27:55.000 --> 00:27:58.770 So let's say I wanted to find the books that 00:27:58.770 --> 00:28:05.370 were not just written in 2022 or 2023 alone, but all the books together. 00:28:05.370 --> 00:28:08.040 Well for this, I could use a few other SQL keywords that 00:28:08.040 --> 00:28:11.530 might be a little familiar to you too. 00:28:11.530 --> 00:28:14.560 Let's try looking at some of these over here. 00:28:14.560 --> 00:28:20.580 So here, we have these called AND, OR, and these parentheses here too. 00:28:20.580 --> 00:28:25.230 So using AND and OR, I can change conditionals. 00:28:25.230 --> 00:28:28.830 I can put them together to make a more complex conditional, a compound 00:28:28.830 --> 00:28:29.880 condition. 00:28:29.880 --> 00:28:32.760 And I could also use these parentheses to symbolize 00:28:32.760 --> 00:28:35.190 that this condition should come first and then 00:28:35.190 --> 00:28:37.690 some condition should come afterwards as well. 00:28:37.690 --> 00:28:40.780 So let's try these in SQLite as well. 00:28:40.780 --> 00:28:42.310 I'll go back to my computer. 00:28:42.310 --> 00:28:47.220 And again, our goal is to find not just books in 2022 or 2023, 00:28:47.220 --> 00:28:51.100 but books that work across those years as well. 00:28:51.100 --> 00:28:53.220 So I'll, in this case say, SELECT-- 00:28:53.220 --> 00:28:59.590 let's go for a "title" and "author" from my "longlist" table. 00:28:59.590 --> 00:29:05.110 Now, I'll say WHERE the "year" is 2022, as we did 00:29:05.110 --> 00:29:10.360 before, OR perhaps the "year" is 2023. 00:29:10.360 --> 00:29:14.230 And notice how my query is kind of wrapping around my terminal. 00:29:14.230 --> 00:29:19.190 I could leave it like this, or if I backspace just a little bit, 00:29:19.190 --> 00:29:21.320 I could hit Enter. 00:29:21.320 --> 00:29:25.170 And now I'm on a new line to continue my query. 00:29:25.170 --> 00:29:30.380 So here, I'll say OR "year" is 2023. 00:29:30.380 --> 00:29:32.340 And now my query is done. 00:29:32.340 --> 00:29:33.680 So I'll hit Semicolon. 00:29:33.680 --> 00:29:42.560 And I should see those books published in or nominated in 2022 or 2023. 00:29:42.560 --> 00:29:44.580 Let's try a few more here. 00:29:44.580 --> 00:29:46.790 Let's try using our parentheses as well. 00:29:46.790 --> 00:29:49.580 Maybe I want not just these books but also 00:29:49.580 --> 00:29:52.290 those that are formatted in a hardcover format. 00:29:52.290 --> 00:29:53.600 So I'll say-- 00:29:53.600 --> 00:30:00.190 I'll clear my terminal, Control-L again, and SELECT "title" as well as, 00:30:00.190 --> 00:30:05.710 let's say, "format" from my "longlist" table. 00:30:05.710 --> 00:30:09.040 And now, I'll hit my new line to extend my query 00:30:09.040 --> 00:30:10.990 without wrapping it on my terminal. 00:30:10.990 --> 00:30:11.980 I'll hit Enter. 00:30:11.980 --> 00:30:20.440 And I'll say WHERE the "year" is 2022 OR the "year" is 2023. 00:30:20.440 --> 00:30:22.210 That's one condition. 00:30:22.210 --> 00:30:26.170 And I can denote that with a single set of parentheses here. 00:30:26.170 --> 00:30:33.650 I also want it to be true that the "format" is not "hardcover." 00:30:33.650 --> 00:30:36.650 So now, I'm adding another condition in here. 00:30:36.650 --> 00:30:39.050 Now, I'll say Semicolon, hit Enter. 00:30:39.050 --> 00:30:43.100 And I'll get back only those books that are published in the paperback 00:30:43.100 --> 00:30:49.190 format in 2022 and 2023. 00:30:49.190 --> 00:30:50.420 OK. 00:30:50.420 --> 00:30:53.840 So let me pause again and ask if there are any questions so far 00:30:53.840 --> 00:31:00.960 in how we've been using WHERE and SELECT and other conditions as well. 00:31:00.960 --> 00:31:05.120 SPEAKER: I would like to know about the-- 00:31:05.120 --> 00:31:10.040 we can have a list of the top titles available in the database. 00:31:10.040 --> 00:31:15.380 Like you have mentioned, the title, author, where we can know about that. 00:31:15.380 --> 00:31:19.418 What are the titles available in the database using command below? 00:31:19.418 --> 00:31:20.210 CARTER ZENKE: Yeah. 00:31:20.210 --> 00:31:23.420 So here I've been using these column names called "title" and "author." 00:31:23.420 --> 00:31:26.670 And I think your question is, how would I know that I have these columns? 00:31:26.670 --> 00:31:29.520 Well, as we'll see in future times together, 00:31:29.520 --> 00:31:32.600 I'll be able to actually look at the schema of my database, what 00:31:32.600 --> 00:31:33.890 columns are inside of it. 00:31:33.890 --> 00:31:37.280 For now, just take it on my own word that I 00:31:37.280 --> 00:31:40.430 knew what was inside the database before I actually started querying it. 00:31:40.430 --> 00:31:42.380 We'll see later on how you can take a database 00:31:42.380 --> 00:31:44.630 and understand the columns you have there too. 00:31:44.630 --> 00:31:45.840 A good question. 00:31:45.840 --> 00:31:48.080 Let's jump into some more queries then. 00:31:48.080 --> 00:31:49.610 I'll go back to my computer. 00:31:49.610 --> 00:31:53.580 And let's see what else we could do with these conditions. 00:31:53.580 --> 00:31:57.380 Well, not only could I try to make compound conditions, 00:31:57.380 --> 00:32:02.480 I could also try to find, let's say, which data is missing. 00:32:02.480 --> 00:32:06.860 So I know in this table, I have not just authors of books, 00:32:06.860 --> 00:32:09.620 but translators of those books. 00:32:09.620 --> 00:32:11.660 Often, books for the International Booker Prize 00:32:11.660 --> 00:32:15.590 were translated from some other language into English-- 00:32:15.590 --> 00:32:18.590 but some weren't or at least they didn't have a translator 00:32:18.590 --> 00:32:20.480 that was separate from the author. 00:32:20.480 --> 00:32:23.570 So to think about what data is missing from our table, 00:32:23.570 --> 00:32:25.880 we should introduce this new idea-- 00:32:25.880 --> 00:32:28.440 this one called NULL. 00:32:28.440 --> 00:32:29.730 So I'll walk over here. 00:32:29.730 --> 00:32:33.650 And we'll see that we have this type called NULL, where this 00:32:33.650 --> 00:32:35.480 means that this value doesn't exist. 00:32:35.480 --> 00:32:37.770 It's not in our database. 00:32:37.770 --> 00:32:42.290 We can actually put together a condition around this idea of null, something 00:32:42.290 --> 00:32:44.000 not being there. 00:32:44.000 --> 00:32:47.570 We could use IS NULL to figure out if a value is null. 00:32:47.570 --> 00:32:48.228 It's not there. 00:32:48.228 --> 00:32:49.520 It's missing from our database. 00:32:49.520 --> 00:32:54.283 Or IS IT NULL, meaning that it actually is there. 00:32:54.283 --> 00:32:56.450 So I'll go back to SQLite and show you what we could 00:32:56.450 --> 00:32:59.970 do with some of these concepts here. 00:32:59.970 --> 00:33:01.400 Let me go back to my terminal. 00:33:01.400 --> 00:33:04.520 And let's say I do want to find those translators that 00:33:04.520 --> 00:33:06.600 don't exist in my database. 00:33:06.600 --> 00:33:13.400 Well, I could use SELECT, let's say, "title" and "translator" from my 00:33:13.400 --> 00:33:14.450 "longlist." 00:33:14.450 --> 00:33:18.560 And I want to make sure that these translators are null-- 00:33:18.560 --> 00:33:19.920 they don't exist. 00:33:19.920 --> 00:33:25.250 So I'll say WHERE "translator" IS NULL Semicolon. 00:33:25.250 --> 00:33:26.780 Now, I'll hit Enter. 00:33:26.780 --> 00:33:28.218 And I should see two books. 00:33:28.218 --> 00:33:31.010 Titles are The Perfect Nine and the indictment of The Enlightenment 00:33:31.010 --> 00:33:31.927 of The Greengate Tree. 00:33:31.927 --> 00:33:35.330 But notice how over here, this value is null-- 00:33:35.330 --> 00:33:38.070 it doesn't exist in my table. 00:33:38.070 --> 00:33:43.340 I could conversely find those books that do have translators using IS NOT NULL. 00:33:43.340 --> 00:33:46.050 And I will try this one again. 00:33:46.050 --> 00:33:53.900 But in this case, I'll say, WHERE "translator" IS NOT NULL Semicolon. 00:33:53.900 --> 00:33:55.670 And I'll hit Enter. 00:33:55.670 --> 00:33:58.340 And now, let me just zoom out a little bit. 00:33:58.340 --> 00:34:03.470 I can see that I have both titles on the left-hand side 00:34:03.470 --> 00:34:05.780 and translators on the right-hand side. 00:34:05.780 --> 00:34:08.040 All of these actually exists. 00:34:08.040 --> 00:34:11.719 These are books that did have translators in this case. 00:34:11.719 --> 00:34:14.810 So a good way to find data that's missing in your table using 00:34:14.810 --> 00:34:17.632 NULL or IS NOT NULL. 00:34:17.632 --> 00:34:19.840 So let's come back over here and figure out what more 00:34:19.840 --> 00:34:22.510 we can do with some of these queries. 00:34:22.510 --> 00:34:27.460 We've kind of exhausted our work with some of our conditions, 00:34:27.460 --> 00:34:29.810 like chaining them together and using NULL and so on. 00:34:29.810 --> 00:34:34.540 But one more thing we could do is trying to use this idea of matching 00:34:34.540 --> 00:34:36.889 some kind of pattern in my database. 00:34:36.889 --> 00:34:40.480 So maybe I'm a book reader. 00:34:40.480 --> 00:34:44.469 And I want to find a book with the word "love" somewhere in the title. 00:34:44.469 --> 00:34:47.020 Well, for this, I could use another keyword-- 00:34:47.020 --> 00:34:49.270 this one called LIKE. 00:34:49.270 --> 00:34:54.429 So LIKE is a good keyword to use when I want to roughly match 00:34:54.429 --> 00:34:56.710 some string in my database. 00:34:56.710 --> 00:34:59.140 Let's say I want to look at book titles and find 00:34:59.140 --> 00:35:01.660 if some word exists in that title. 00:35:01.660 --> 00:35:04.070 I could use LIKE for that. 00:35:04.070 --> 00:35:06.430 And LIKE becomes powerful when you combine it 00:35:06.430 --> 00:35:10.830 with these other operators, namely this percent sign and this underscore. 00:35:10.830 --> 00:35:15.610 The percent sign can match any character around a string I give it. 00:35:15.610 --> 00:35:18.610 And the underscore can match any single character 00:35:18.610 --> 00:35:20.980 that I pass in with my string. 00:35:20.980 --> 00:35:23.450 It's probably best shown with an example. 00:35:23.450 --> 00:35:26.260 So let me show you some in my terminal here. 00:35:26.260 --> 00:35:27.310 I'll walk back. 00:35:27.310 --> 00:35:29.380 And, again, we'll try to find these books that 00:35:29.380 --> 00:35:32.230 have "love" somewhere in the title. 00:35:32.230 --> 00:35:38.200 So I'll say in this case, SELECT, let's say, "title" from "longlist." 00:35:38.200 --> 00:35:39.880 But I don't want all titles. 00:35:39.880 --> 00:35:43.940 I only want those that have "love" somewhere in this title. 00:35:43.940 --> 00:35:53.650 So I'll say WHERE "title" LIKE, let's say, percent love percent Semicolon. 00:35:53.650 --> 00:35:57.640 Now before I run this, let me explain what this is doing. 00:35:57.640 --> 00:36:01.600 I have here a SELECT query asking for the "title" 00:36:01.600 --> 00:36:04.600 column from my "longlist" table. 00:36:04.600 --> 00:36:11.110 But I'll only get back those rows where "title" is LIKE percent love percent. 00:36:11.110 --> 00:36:12.530 But what does that mean? 00:36:12.530 --> 00:36:16.330 Well, the percent, remember, matches any string of characters. 00:36:16.330 --> 00:36:18.580 It could match a, b, c, 1, 2, 3. 00:36:18.580 --> 00:36:22.270 As long as any string of characters comes after and has "love," 00:36:22.270 --> 00:36:25.460 I could match that value here. 00:36:25.460 --> 00:36:28.045 Similarly, the percent sign after says anything 00:36:28.045 --> 00:36:32.630 that comes after "love," as long as "love" is somewhere in the middle. 00:36:32.630 --> 00:36:36.220 So anything before, anything after, but so long as "love" 00:36:36.220 --> 00:36:38.530 is just somewhere in there, I'll get it back. 00:36:38.530 --> 00:36:42.280 So let me try that running this query then and come back over here. 00:36:42.280 --> 00:36:44.530 I will hit Enter on my query. 00:36:44.530 --> 00:36:47.050 And I'll see I get back four books-- 00:36:47.050 --> 00:36:51.110 Love in the Big City, More Than I Love My Life and so on. 00:36:51.110 --> 00:36:54.880 So notice how if I come back over here that each of these titles 00:36:54.880 --> 00:36:56.710 has "love" somewhere in it. 00:36:56.710 --> 00:37:00.130 For this one, I match love up front and then 00:37:00.130 --> 00:37:02.930 had any string of characters coming after it like this. 00:37:02.930 --> 00:37:05.680 For this one, I had More Than I Love My Life. 00:37:05.680 --> 00:37:07.390 I got some string before it. 00:37:07.390 --> 00:37:12.580 And then afterwards, any string after it, "love" is somewhere in the middle. 00:37:12.580 --> 00:37:14.650 Let me show you another example too, where 00:37:14.650 --> 00:37:17.360 we use percent in a different way. 00:37:17.360 --> 00:37:19.870 Let's say I want to find only those books that 00:37:19.870 --> 00:37:23.320 have "The" at the very beginning of the title. 00:37:23.320 --> 00:37:24.410 Let me try this. 00:37:24.410 --> 00:37:30.880 I'll say a SELECT "title" from "longlist" then WHERE, 00:37:30.880 --> 00:37:39.960 let's say, the "title" is LIKE the percent Semicolon. 00:37:39.960 --> 00:37:41.860 Now, I've changed something up. 00:37:41.860 --> 00:37:47.760 I have not percent in front and behind, but only after the "The." 00:37:47.760 --> 00:37:50.670 So in this case, I'll get back not anything 00:37:50.670 --> 00:37:53.670 that has "The" the title wherever, but now 00:37:53.670 --> 00:37:56.610 at the very beginning of this string. 00:37:56.610 --> 00:37:59.640 And I see perhaps a style mistake. 00:37:59.640 --> 00:38:01.950 Let me ask the audience, what style mistake did 00:38:01.950 --> 00:38:05.680 I just make when I typed in this query? 00:38:05.680 --> 00:38:09.595 So I'm seeing maybe I used double quotes when I should have used single quotes. 00:38:09.595 --> 00:38:11.220 So let me come back and fix that first. 00:38:11.220 --> 00:38:12.720 I'll come back over here. 00:38:12.720 --> 00:38:17.760 And, again, by convention, we tend to use single quotes for our strings. 00:38:17.760 --> 00:38:19.800 So let me fix that right here. 00:38:19.800 --> 00:38:22.230 And now let me run this query to see what we get back. 00:38:22.230 --> 00:38:23.280 I'll hit Enter. 00:38:23.280 --> 00:38:28.250 And I'll see only those books that begin with "The." 00:38:28.250 --> 00:38:32.800 Now, let me show this query again, though. 00:38:32.800 --> 00:38:34.900 This was our query here. 00:38:34.900 --> 00:38:39.070 Knowing what we know about the percent sign, 00:38:39.070 --> 00:38:45.010 what other titles might I accidentally get back by running this query? 00:38:45.010 --> 00:38:47.305 I have "The" percent. 00:38:47.305 --> 00:38:51.250 But what other words actually begin these book titles 00:38:51.250 --> 00:38:53.770 if I were to run this query here? 00:38:53.770 --> 00:38:57.100 We saw only those with "The," but if I had other book titles, what might 00:38:57.100 --> 00:38:58.620 I get back? 00:38:58.620 --> 00:39:00.570 So I might get back those book titles that 00:39:00.570 --> 00:39:04.020 have not just "The" the beginning but also, let's say, 00:39:04.020 --> 00:39:06.220 "There" or "They" or so on. 00:39:06.220 --> 00:39:08.805 There are many words beginning with T-H-E. 00:39:08.805 --> 00:39:10.680 And if I had the percent sign right after it, 00:39:10.680 --> 00:39:15.400 I might match one of those words like Y or R-E or so on. 00:39:15.400 --> 00:39:17.650 So I didn't have any of those titles in this database. 00:39:17.650 --> 00:39:22.910 But you can imagine a different database where I have that kind of data. 00:39:22.910 --> 00:39:24.260 Let's fix this then. 00:39:24.260 --> 00:39:28.950 I say, I want to match not just "The" but "The" and a space, 00:39:28.950 --> 00:39:31.100 and match any characters after that to make this 00:39:31.100 --> 00:39:35.380 query better designed in this instance. 00:39:35.380 --> 00:39:36.210 OK. 00:39:36.210 --> 00:39:38.940 So let me pause here then and ask what questions we 00:39:38.940 --> 00:39:44.180 have on using percent along with LIKE. 00:39:44.180 --> 00:39:48.138 SPEAKER: Can we use two percent signs between two words? 00:39:48.138 --> 00:39:49.430 CARTER ZENKE: Yeah, I think so. 00:39:49.430 --> 00:39:52.740 So let me go back to my terminal here and let me try to answer this question. 00:39:52.740 --> 00:39:56.790 So can we use two percent signs to say something's between two words? 00:39:56.790 --> 00:39:57.890 So let me say this. 00:39:57.890 --> 00:40:06.710 I'll go with SELECT, let's say, "title" from "longlist" WHERE maybe 00:40:06.710 --> 00:40:10.220 the "title" has something like-- 00:40:10.220 --> 00:40:11.630 maybe it begins with "The." 00:40:11.630 --> 00:40:17.210 So I'll say WHERE "title" LIKE "The." 00:40:17.210 --> 00:40:20.780 And then I'll take any set of characters afterwards. 00:40:20.780 --> 00:40:23.630 But I want to have "love" also in here too. 00:40:23.630 --> 00:40:25.070 So I'll say "love." 00:40:25.070 --> 00:40:31.500 And then I'll take any other characters after it, percent again, single quote, 00:40:31.500 --> 00:40:32.880 Semicolon. 00:40:32.880 --> 00:40:37.410 Now, I don't know if I have any books that actually fit this search. 00:40:37.410 --> 00:40:41.760 But I could say the percent love percent to mean give me 00:40:41.760 --> 00:40:44.730 back any title that has "The" at the beginning 00:40:44.730 --> 00:40:48.510 then any words or characters then "love" then any words or characters 00:40:48.510 --> 00:40:49.680 after that. 00:40:49.680 --> 00:40:50.490 I'll hit Enter. 00:40:50.490 --> 00:40:54.310 And I see I don't have any books that fit that kind of description. 00:40:54.310 --> 00:40:58.410 But if I did, I would see them here with "The" at the beginning and "love" 00:40:58.410 --> 00:41:01.700 somewhere in the title. 00:41:01.700 --> 00:41:02.200 All right. 00:41:02.200 --> 00:41:03.033 So let's keep going. 00:41:03.033 --> 00:41:08.010 Let's focus not just on this percent sign, but also on this underscore. 00:41:08.010 --> 00:41:09.330 So if I want to find-- 00:41:09.330 --> 00:41:13.950 let's say, I don't know what a particular character is in my title. 00:41:13.950 --> 00:41:17.580 I could use this underscore to match any particular character. 00:41:17.580 --> 00:41:20.820 Not any string of characters, but any single character too. 00:41:20.820 --> 00:41:23.250 So let's try this in our terminal. 00:41:23.250 --> 00:41:26.430 And there is a book, this one called a Pyre. 00:41:26.430 --> 00:41:29.160 And I actually keep forgetting how it's spelled. 00:41:29.160 --> 00:41:34.690 I don't know whether it's P-I-R-E or P-Y-R-E. It could be either one, 00:41:34.690 --> 00:41:36.760 but I want to find it in my database. 00:41:36.760 --> 00:41:37.770 So let me try this. 00:41:37.770 --> 00:41:44.670 I'll say SELECT, let's say, "title" FROM "longlist," WHERE, in this case, 00:41:44.670 --> 00:41:46.800 "title" is LIKE-- 00:41:46.800 --> 00:41:49.740 well, I know that it starts with a P. And I 00:41:49.740 --> 00:41:54.000 don't know if this is an I or a Y. But I'll at least leave it as an underscore 00:41:54.000 --> 00:41:56.880 now to say it could be any character here. 00:41:56.880 --> 00:42:02.800 Then I'll say R-E single quote, Semicolon. 00:42:02.800 --> 00:42:04.860 And now I could try hitting Enter. 00:42:04.860 --> 00:42:08.820 And I'll see I get back this title called Pyre. 00:42:08.820 --> 00:42:12.060 So notice in this case that the underscore is matching 00:42:12.060 --> 00:42:14.310 literally any single character. 00:42:14.310 --> 00:42:20.230 This could be a Y. It could be an I. But in this case, I have this Y here. 00:42:20.230 --> 00:42:20.835 OK. 00:42:20.835 --> 00:42:21.460 Let me go back. 00:42:21.460 --> 00:42:23.920 And let's actually ask in this case what questions we 00:42:23.920 --> 00:42:28.077 have on using LIKE with these single underscores if any. 00:42:28.077 --> 00:42:28.660 SPEAKER: Yeah. 00:42:28.660 --> 00:42:33.230 So Carter, I wanted to ask you, that as you use the underscore sign here, 00:42:33.230 --> 00:42:37.390 so for multiple characters, can we use multiple underscores in order 00:42:37.390 --> 00:42:39.698 to find something in the database? 00:42:39.698 --> 00:42:40.990 CARTER ZENKE: A great question. 00:42:40.990 --> 00:42:43.767 Could we use more than one underscore to try to find 00:42:43.767 --> 00:42:45.100 some characters in our database? 00:42:45.100 --> 00:42:45.975 You absolutely could. 00:42:45.975 --> 00:42:47.500 So let me try that myself. 00:42:47.500 --> 00:42:49.630 I'll go back to my terminal. 00:42:49.630 --> 00:42:52.510 And let me try to find a book title this could work with. 00:42:52.510 --> 00:42:56.380 I'll say maybe SELECT "title" from "longlist" 00:42:56.380 --> 00:42:58.660 to get back all the books in this table. 00:42:58.660 --> 00:42:59.980 I'll hit Semicolon. 00:42:59.980 --> 00:43:02.755 And maybe I will go with-- 00:43:05.930 --> 00:43:08.720 let's try this one called Tyll. 00:43:08.720 --> 00:43:13.760 Well, maybe I want to just find the titles that have a Y or an I in here, 00:43:13.760 --> 00:43:17.693 but I also don't know if it's one L or maybe two L's, for instance. 00:43:17.693 --> 00:43:18.860 Let me go back and try this. 00:43:18.860 --> 00:43:26.930 I'll say SELECT, let's say, "title" FROM "longlist" WHERE my "title" is LIKE-- 00:43:26.930 --> 00:43:31.490 I know it begins with a T. I don't know if this is a Y or an I. 00:43:31.490 --> 00:43:36.110 And maybe I know that it has maybe one or two characters after it. 00:43:36.110 --> 00:43:37.610 So I'll try this one. 00:43:37.610 --> 00:43:40.820 Now I have three underscores, single underscore. 00:43:40.820 --> 00:43:45.200 So match any book title that has T and then any three 00:43:45.200 --> 00:43:47.000 individual characters after it. 00:43:47.000 --> 00:43:48.290 I'll hit Enter. 00:43:48.290 --> 00:43:50.300 And I'll see I get back Tyll. 00:43:50.300 --> 00:43:56.000 This is the only title that has T and then three characters after it. 00:43:56.000 --> 00:43:57.690 I could try to get better with this. 00:43:57.690 --> 00:44:00.980 I could say maybe I'll accept five or six characters like that. 00:44:00.980 --> 00:44:02.030 Hit Enter. 00:44:02.030 --> 00:44:04.497 And I'll see-- whoops, I didn't compute my query here. 00:44:04.497 --> 00:44:06.080 Let me just try it from the top again. 00:44:06.080 --> 00:44:08.870 I'll say SELECT "title" from "longlist" where 00:44:08.870 --> 00:44:12.770 title is like, let's say, T-Y underscore, underscore, 00:44:12.770 --> 00:44:13.970 underscore, underscore. 00:44:13.970 --> 00:44:16.790 Hit Semicolon. 00:44:16.790 --> 00:44:18.250 And now I get no matches. 00:44:18.250 --> 00:44:22.090 So there is no book in the database that has T-Y 00:44:22.090 --> 00:44:28.140 and then, let's say, three or four underscores for any character after it. 00:44:28.140 --> 00:44:28.830 OK. 00:44:28.830 --> 00:44:32.910 So this covers our use of LIKE, but let's 00:44:32.910 --> 00:44:35.130 keep going and building more complex conditions 00:44:35.130 --> 00:44:40.440 to find even more answers to questions we have about this data over here. 00:44:40.440 --> 00:44:43.530 Let me think what we should show next. 00:44:43.530 --> 00:44:44.790 We've seen LIKE. 00:44:44.790 --> 00:44:46.860 We've seen some compound conditionals. 00:44:46.860 --> 00:44:51.390 Well, let's go back to trying to find books that are in a certain year. 00:44:51.390 --> 00:44:54.540 So we saw earlier we had this kind of query. 00:44:54.540 --> 00:45:00.810 We could say SELECT "title" and "year" FROM, let's say, our "longlist." 00:45:00.810 --> 00:45:04.380 Now I can try to find those books that are written or nominated 00:45:04.380 --> 00:45:06.990 in 2022 and 2023. 00:45:06.990 --> 00:45:08.880 But let's say I want to go further. 00:45:08.880 --> 00:45:15.130 I want those from 2019 to 2022, a span of multiple years here. 00:45:15.130 --> 00:45:18.990 So I could try it like this, WHERE "year" equals-- 00:45:18.990 --> 00:45:21.300 let's go ahead and say 2019-- 00:45:21.300 --> 00:45:28.160 OR "year" is 2020 OR "year" is 2021. 00:45:28.160 --> 00:45:30.380 And let me make a new line again. 00:45:30.380 --> 00:45:36.190 OR "year" is 2022 Semicolon. 00:45:36.190 --> 00:45:39.940 And before I run this query, let me ask our audience, 00:45:39.940 --> 00:45:44.910 what strikes you as being not very well designed about this query? 00:45:44.910 --> 00:45:47.560 What could I be doing better here? 00:45:47.560 --> 00:45:52.650 So I'm seeing maybe one improvement is that I don't need to write out 00:45:52.650 --> 00:45:55.950 OR "year" is this, OR "year" is that. 00:45:55.950 --> 00:45:57.900 I could probably do better with this. 00:45:57.900 --> 00:46:00.630 And let's introduce some new keywords for working 00:46:00.630 --> 00:46:04.240 with ranges in terms of our conditions. 00:46:04.240 --> 00:46:08.460 So here, we can see some new operators to use. 00:46:08.460 --> 00:46:12.450 We have this greater than sign, this less than sign, 00:46:12.450 --> 00:46:15.360 greater than or equal to, and less than or equal to. 00:46:15.360 --> 00:46:19.800 And we can use these to build ranges inside of our queries to say, 00:46:19.800 --> 00:46:24.630 I want something to be greater than this number or less than this number too. 00:46:24.630 --> 00:46:27.180 And we can combine these with AND and our OR 00:46:27.180 --> 00:46:31.440 we saw before to get back in this case some set of rows 00:46:31.440 --> 00:46:34.230 that match what we intend to find. 00:46:34.230 --> 00:46:36.450 So let me go back and try some of these out. 00:46:36.450 --> 00:46:39.840 I'll try to improve the design of this query. 00:46:39.840 --> 00:46:46.350 Let me first run it, and we'll see we do get back 2019 to 2022. 00:46:46.350 --> 00:46:47.980 But I could probably do better. 00:46:47.980 --> 00:46:50.760 So let's try using our new operators here 00:46:50.760 --> 00:46:53.370 that can give us some range capabilities. 00:46:53.370 --> 00:46:59.790 I'll say SELECT "title" and also SELECT "year" from "longlist." 00:46:59.790 --> 00:47:07.080 But now I want those rows where the year is greater than or equal to 2019, 00:47:07.080 --> 00:47:13.380 and the year is less than or equal to 2022 Semicolon. 00:47:13.380 --> 00:47:14.950 I'll hit Enter. 00:47:14.950 --> 00:47:17.110 Notice I'll get the very same results. 00:47:17.110 --> 00:47:19.020 So I get all those same rows. 00:47:19.020 --> 00:47:21.700 But now my query is much smaller. 00:47:21.700 --> 00:47:26.340 It's making use of these range operators I've seen so far. 00:47:26.340 --> 00:47:28.930 I could even further improve this. 00:47:28.930 --> 00:47:31.380 I could make this a little better designed to. 00:47:31.380 --> 00:47:34.590 Let me go back to some slides and show you we could use these keywords 00:47:34.590 --> 00:47:39.510 BETWEEN blank AND blank, where this can be some condition 00:47:39.510 --> 00:47:41.230 or some number in this case. 00:47:41.230 --> 00:47:46.320 I could say between, let's say, 2019 AND 2022. 00:47:46.320 --> 00:47:47.740 This is inclusive. 00:47:47.740 --> 00:47:56.280 So if I say 2019 AND 2022, I'll get back a query that includes 2019 and 2022. 00:47:56.280 --> 00:47:57.900 So let me try this one. 00:47:57.900 --> 00:47:59.130 I'll go back over here. 00:47:59.130 --> 00:48:04.170 And I will now try SELECT "title" and "year" 00:48:04.170 --> 00:48:12.370 from "longlist," WHERE the "year" is between 2019 AND 2022. 00:48:12.370 --> 00:48:17.990 Same results now, which is a different way of writing this same query. 00:48:17.990 --> 00:48:21.120 Now what else can we do with these ranges? 00:48:21.120 --> 00:48:26.480 Well, as we've said before, these books actually have some ratings involved. 00:48:26.480 --> 00:48:29.030 These ratings are crowdsourced from Goodreads, 00:48:29.030 --> 00:48:31.400 a site you can review books online. 00:48:31.400 --> 00:48:35.570 And I want to find maybe the books that have a rating of 4.0 or higher. 00:48:35.570 --> 00:48:41.270 Well, I could do that now with my ranges I could say SELECT "title" and "rating" 00:48:41.270 --> 00:48:43.010 from my "longlist." 00:48:43.010 --> 00:48:48.920 And I could say WHERE the "rating" is greater than 4.0 Semicolon. 00:48:48.920 --> 00:48:49.940 I'll hit Enter. 00:48:49.940 --> 00:48:55.100 And I'll see now only those books that have a rating of 4.0 00:48:55.100 --> 00:48:58.660 or higher like this. 00:48:58.660 --> 00:49:00.970 I could even combine conditions. 00:49:00.970 --> 00:49:03.510 So I know that these books have a certain rating, 00:49:03.510 --> 00:49:05.730 but how many votes do they really get? 00:49:05.730 --> 00:49:07.210 Well, let's take a peek. 00:49:07.210 --> 00:49:08.460 I'll come back over here. 00:49:08.460 --> 00:49:10.800 And let me try this one. 00:49:10.800 --> 00:49:13.780 I could say SELECT "title." 00:49:13.780 --> 00:49:17.120 Oops, let me clear my terminal again so it's back up top. 00:49:17.120 --> 00:49:22.880 SELECT "title" and "rating" and the number of 00:49:22.880 --> 00:49:28.040 votes that these books got from, let's say, our "longlist" table. 00:49:28.040 --> 00:49:34.400 Now, I want to find those that have a rating of greater than 4.0, 00:49:34.400 --> 00:49:38.680 and, let's say, a number of votes-- 00:49:38.680 --> 00:49:41.080 a number of votes which is greater than at least-- 00:49:41.080 --> 00:49:43.330 let's go with 10,000. 00:49:43.330 --> 00:49:46.960 So at least we know a good number of folks actually voted on these books 00:49:46.960 --> 00:49:48.970 to find the best among them. 00:49:48.970 --> 00:49:49.990 I'll Enter. 00:49:49.990 --> 00:49:53.620 And now I'll see we're only down to a few books, 00:49:53.620 --> 00:49:58.270 four in fact, where each one has a rating higher than 4.0. 00:49:58.270 --> 00:50:06.490 And indeed, every vote row has a vote total greater than 10,000 in this case. 00:50:06.490 --> 00:50:12.100 So a good way to try to find the top books in our data set here. 00:50:12.100 --> 00:50:13.920 Let's keep going with these ranges. 00:50:13.920 --> 00:50:17.190 And let's think about one more thing we could do. 00:50:17.190 --> 00:50:20.040 Maybe I want to find books that are less than a certain length. 00:50:20.040 --> 00:50:21.690 So I'll try that as well. 00:50:21.690 --> 00:50:27.180 I'll say SELECT, let's say, "title" and "pages" from my "longlist." 00:50:27.180 --> 00:50:30.000 And now I can make a condition based on pages. 00:50:30.000 --> 00:50:33.690 I'll say WHERE "pages" is less than 300. 00:50:33.690 --> 00:50:34.800 Hit Enter. 00:50:34.800 --> 00:50:37.800 And now I should see that I have all these books that 00:50:37.800 --> 00:50:43.270 are less than 300 pages long when they were first published. 00:50:43.270 --> 00:50:45.330 So let's pause here and ask what questions 00:50:45.330 --> 00:50:49.020 we have on these range conditions. 00:50:49.020 --> 00:50:55.230 SPEAKER: I just wanted to check if for a proper query in this case 00:50:55.230 --> 00:51:00.240 to be able to run operations, they have to be integers in the database. 00:51:00.240 --> 00:51:04.080 And my second question is for when we're matching a string, 00:51:04.080 --> 00:51:06.203 is it case-sensitive or not? 00:51:06.203 --> 00:51:07.870 CARTER ZENKE: Yeah, two great questions. 00:51:07.870 --> 00:51:12.520 So the first one here is going to be, do I have to use integers in this case? 00:51:12.520 --> 00:51:14.310 And what types, maybe, should I use? 00:51:14.310 --> 00:51:17.850 And the second one being, could I match strings like case insensitively? 00:51:17.850 --> 00:51:20.940 So for the first one, in this case, it'll 00:51:20.940 --> 00:51:23.110 depend on the design of your database. 00:51:23.110 --> 00:51:25.140 So we'll see later on in the course, how we 00:51:25.140 --> 00:51:27.420 can choose the types for our columns. 00:51:27.420 --> 00:51:31.260 And how that might impact the types we actually use in our queries. 00:51:31.260 --> 00:51:33.400 For now, I made this database. 00:51:33.400 --> 00:51:36.750 So I just know that my year column is an integer, 00:51:36.750 --> 00:51:40.020 my ratings column is a real number or a float, if you're familiar, 00:51:40.020 --> 00:51:41.560 and my votes is an integer. 00:51:41.560 --> 00:51:43.800 So I just know to use those numbers there. 00:51:43.800 --> 00:51:47.460 To the question of matching things case-insensitively, 00:51:47.460 --> 00:51:50.820 let's actually revisit LIKE just briefly here to show you what that can do. 00:51:50.820 --> 00:51:58.160 So I go back to my terminal, and let's say I want to find just a book title. 00:51:58.160 --> 00:51:59.910 And I want to type it in kind of sloppily. 00:51:59.910 --> 00:52:02.118 I don't want to capitalize it like capital books are. 00:52:02.118 --> 00:52:06.750 So I'll say SELECT let's say "title" from "longlist." 00:52:06.750 --> 00:52:10.930 And maybe, I'll want to find that book Pyre again. 00:52:10.930 --> 00:52:17.710 So I could say WHERE "title" is LIKE 'pyre', but all in lowercase. 00:52:17.710 --> 00:52:19.110 Now I'll hit Enter. 00:52:19.110 --> 00:52:21.540 And I'll see I do get back Pyre. 00:52:21.540 --> 00:52:27.510 So even though I said WHERE "title" is LIKE lowercase 'pyre', I got back 00:52:27.510 --> 00:52:29.880 capital Pyre. 00:52:29.880 --> 00:52:35.100 Now this is in contrast to saying WHERE "title" equals lowercase 'pyre'. 00:52:35.100 --> 00:52:36.610 Let's try that. 00:52:36.610 --> 00:52:37.830 I'll come back over here. 00:52:37.830 --> 00:52:43.560 And I'll say again SELECT "title" from "longlist," 00:52:43.560 --> 00:52:52.830 but now WHERE "title" equals, quote, unquote, 'pyre' Semicolon. 00:52:52.830 --> 00:52:53.790 I'll hit Enter. 00:52:53.790 --> 00:52:55.570 And now I see no results. 00:52:55.570 --> 00:52:59.010 So in this case, the equal is going to be case-sensitive. 00:52:59.010 --> 00:53:05.320 Case matters in this case, but LIKE is case-insensitive. 00:53:05.320 --> 00:53:06.990 OK. 00:53:06.990 --> 00:53:09.848 Why don't we keep going then? 00:53:09.848 --> 00:53:11.640 And let's take a look at a few other things 00:53:11.640 --> 00:53:15.450 we can do with these SQL keywords for querying. 00:53:15.450 --> 00:53:18.240 Well, earlier, we were trying to find a way 00:53:18.240 --> 00:53:21.480 to find the best books in our data set. 00:53:21.480 --> 00:53:24.810 And we did that by filtering them based on some ranges. 00:53:24.810 --> 00:53:27.210 But we could probably do that a little bit more 00:53:27.210 --> 00:53:33.180 methodically in this case using a new keyword, this one called ORDER BY. 00:53:33.180 --> 00:53:37.980 So ORDER BY allows us to take the results of our query 00:53:37.980 --> 00:53:43.000 and order them, as it suggests, by some column itself. 00:53:43.000 --> 00:53:47.670 So we could put them in alphabetical order or in order by a number of votes 00:53:47.670 --> 00:53:50.340 or in order by number of ratings. 00:53:50.340 --> 00:53:54.360 And let's just try an example of this to see how ORDER BY works for us. 00:53:54.360 --> 00:53:58.710 But in the end, we'll see it can arrange columns, arrange rows for us 00:53:58.710 --> 00:54:00.430 in our resulting query. 00:54:00.430 --> 00:54:02.790 So I'll go back to my computer. 00:54:02.790 --> 00:54:04.830 And let's try this question here. 00:54:04.830 --> 00:54:09.400 I want to try to find the top 10 books in my table. 00:54:09.400 --> 00:54:16.880 So I'll say SELECT "title" and "rating" from "longlist." 00:54:16.880 --> 00:54:18.080 Enter. 00:54:18.080 --> 00:54:19.820 Not only query yet though. 00:54:19.820 --> 00:54:24.640 Now I'll say ORDER BY the rating. 00:54:24.640 --> 00:54:26.800 And let's only take the top 10. 00:54:26.800 --> 00:54:31.120 So I'll say limit 10 in this instance Semicolon. 00:54:31.120 --> 00:54:34.330 So now I've combined some of my prior keywords. 00:54:34.330 --> 00:54:35.500 I'm using SELECT. 00:54:35.500 --> 00:54:36.790 I'm using ORDER BY. 00:54:36.790 --> 00:54:38.840 And I'm still using our old friend LIMIT. 00:54:38.840 --> 00:54:40.690 So let me hit Enter here. 00:54:40.690 --> 00:54:46.050 And I'll get back, well, not quite the top 10. 00:54:46.050 --> 00:54:52.240 I see rating of 3.05 here and rating of 3.42 down here. 00:54:52.240 --> 00:55:00.000 So based on this, what do you think the default ordering of ORDER BY is? 00:55:00.000 --> 00:55:02.020 So you might be from least to greatest. 00:55:02.020 --> 00:55:05.430 So we saw here that we have rating being pretty small, 00:55:05.430 --> 00:55:07.140 but we said ORDER BY our rating. 00:55:07.140 --> 00:55:10.440 So it starts from small and goes down to large. 00:55:10.440 --> 00:55:12.420 So we need to fix this in some way. 00:55:12.420 --> 00:55:15.240 And let's introduce a new addition to ORDER BY 00:55:15.240 --> 00:55:18.820 to have us fix this query overall. 00:55:18.820 --> 00:55:24.930 So let me show you that ORDER BY does by default sort from least to greatest. 00:55:24.930 --> 00:55:27.570 But let's try some addition here. 00:55:27.570 --> 00:55:31.830 We have not just ORDER BY but ORDER BY some column 00:55:31.830 --> 00:55:35.530 and then ascending or descending. 00:55:35.530 --> 00:55:39.480 So ascending is the default. It means from least to greatest. 00:55:39.480 --> 00:55:44.260 Descending, we can specify meaning from greatest to smallest. 00:55:44.260 --> 00:55:47.310 So let's try using ORDER BY but now with this other keyword 00:55:47.310 --> 00:55:50.520 called DESC for descending here. 00:55:50.520 --> 00:55:53.760 I'll go back to my terminal. 00:55:53.760 --> 00:55:57.210 And let's rewrite this query to include DESC. 00:55:57.210 --> 00:56:02.700 I'll say SELECT "title" and "rating" from "longlist." 00:56:02.700 --> 00:56:04.632 And let me-- before I run this query, let 00:56:04.632 --> 00:56:06.840 me just clear my terminal so it's back up at the top. 00:56:06.840 --> 00:56:08.580 I'll backspace this. 00:56:08.580 --> 00:56:13.050 And then a moment here, I'll press Control-L. Now I'm back at the top. 00:56:13.050 --> 00:56:21.847 I'll say SELECT "title", SELECT "title" and "rating" from "longlist" WHERE-- 00:56:21.847 --> 00:56:22.680 actually, not WHERE. 00:56:22.680 --> 00:56:23.820 We're not filtering yet. 00:56:23.820 --> 00:56:30.082 I'll say ORDER BY rating but not by ascending by default-- 00:56:30.082 --> 00:56:31.290 going from least to greatest. 00:56:31.290 --> 00:56:33.010 I want greatest to least. 00:56:33.010 --> 00:56:35.370 So I'll say DESC here. 00:56:35.370 --> 00:56:38.880 Now, I can say LIMIT 10 Semicolon. 00:56:38.880 --> 00:56:39.840 Hit Enter. 00:56:39.840 --> 00:56:42.540 And now I'll see the top 10 books. 00:56:42.540 --> 00:56:47.160 Here, I have The Eighth Life coming in at 4.52 and The Books of Jacob 00:56:47.160 --> 00:56:49.680 coming in at 4.06. 00:56:49.680 --> 00:56:53.460 So now we're going from greatest to smallest. 00:56:53.460 --> 00:56:56.880 Well, I could order by not just these ratings, 00:56:56.880 --> 00:56:59.220 but also by the number of votes. 00:56:59.220 --> 00:57:00.900 It seems there's a tie to break here. 00:57:00.900 --> 00:57:04.890 If I look at Still Born and When We Cease to Understand the World, 00:57:04.890 --> 00:57:07.800 those both have a rating of 4.14. 00:57:07.800 --> 00:57:12.430 But presumably, one book has maybe more votes than the other. 00:57:12.430 --> 00:57:18.360 So I could try to break this tie by ordering not just by rating but also 00:57:18.360 --> 00:57:22.530 by votes, the number votes this book actually received on Good Reads. 00:57:22.530 --> 00:57:25.290 So let's try that then to break this tie. 00:57:25.290 --> 00:57:26.730 I'll come back over here. 00:57:26.730 --> 00:57:29.490 And I'll try this query now. 00:57:29.490 --> 00:57:35.790 I'll say, again, SELECT "title" and "rating" from "longlist." 00:57:35.790 --> 00:57:41.860 Now I'll order by, first, the rating column in descending order. 00:57:41.860 --> 00:57:47.100 But I also want to order by the number of votes after I order by rating. 00:57:47.100 --> 00:57:50.730 So I'm saying first order by rating, but afterwards, 00:57:50.730 --> 00:57:54.210 followed by a comma, let's order by the number of votes 00:57:54.210 --> 00:57:58.500 also in descending order-- from greatest to smallest. 00:57:58.500 --> 00:58:01.170 Now I'll just continue my query on the next line. 00:58:01.170 --> 00:58:04.080 And I'll say LIMIT 10 Semicolon. 00:58:04.080 --> 00:58:09.120 This then gives me, let's see, the books but now 00:58:09.120 --> 00:58:12.970 they're going to be in the order that allows us to see the number of votes. 00:58:12.970 --> 00:58:14.380 Let me just actually refine this. 00:58:14.380 --> 00:58:17.112 Let me say not just title and rating, let's make 00:58:17.112 --> 00:58:18.570 sure we can see the votes here too. 00:58:18.570 --> 00:58:23.580 So SELECT "title" and "rating" and "votes" from "longlist." 00:58:23.580 --> 00:58:24.810 Hit Enter on my query. 00:58:24.810 --> 00:58:27.750 Now I'll say ORDER BY "rating" and "votes." 00:58:27.750 --> 00:58:29.550 Then I'll say LIMIT 10. 00:58:29.550 --> 00:58:32.160 And here, I'm just getting the Up Arrow on my computer. 00:58:32.160 --> 00:58:33.660 I'll hit Enter. 00:58:33.660 --> 00:58:36.150 And now I'll see the votes included. 00:58:36.150 --> 00:58:38.070 So let me show you this on the big screen. 00:58:38.070 --> 00:58:40.440 Here, we see that the tie is broken. 00:58:40.440 --> 00:58:42.600 So when we cease to understand the world, 00:58:42.600 --> 00:58:46.560 these both have 4.14 along with Still Born. 00:58:46.560 --> 00:58:51.540 But here, this book has more votes and so is higher in our order 00:58:51.540 --> 00:58:55.910 now that we've ordered by multiple columns. 00:58:55.910 --> 00:58:58.240 So let me pause here and ask what questions 00:58:58.240 --> 00:59:01.630 we have on ordering with data. 00:59:01.630 --> 00:59:07.480 Ordering by one column or multiple, and how we can sort data like this. 00:59:07.480 --> 00:59:14.355 SPEAKER: Sir, I want to know that can we write rating to 4.93 to 4.9? 00:59:14.355 --> 00:59:15.730 CARTER ZENKE: Yeah good question. 00:59:15.730 --> 00:59:18.580 I think if I understand you correctly, how can we select a rating 00:59:18.580 --> 00:59:22.820 or try to find a rating that's like equal to 4.92 or things like that. 00:59:22.820 --> 00:59:23.780 Let's try that here. 00:59:23.780 --> 00:59:26.350 So if I want to find a particular rating, 00:59:26.350 --> 00:59:30.250 I could simply use my WHERE friend from before. 00:59:30.250 --> 00:59:34.390 I could say SELECT, let's say, "title" and "rating." 00:59:34.390 --> 00:59:37.750 And maybe I could try to find a particular rating for a book 00:59:37.750 --> 00:59:39.400 from longlist. 00:59:39.400 --> 00:59:46.870 I could say then WHERE this rating is equal to, let's say, 4.932 Semicolon. 00:59:46.870 --> 00:59:49.160 If this book exists, it will get it back here. 00:59:49.160 --> 00:59:50.170 So I'll hit Enter. 00:59:50.170 --> 00:59:55.810 And I see there's no book with this particular rating, 4.392. 00:59:55.810 --> 01:00:01.510 But good question for how to find particular ratings for our books here. 01:00:01.510 --> 01:00:02.380 OK. 01:00:02.380 --> 01:00:09.130 Other questions too on how we've been able to sort our data and use ORDER BY? 01:00:09.130 --> 01:00:13.000 SPEAKER: Will descend work on a string on an alphabetic basis? 01:00:13.000 --> 01:00:18.098 Or do we need to have special conditions for alphabetic characters? 01:00:18.098 --> 01:00:19.640 CARTER ZENKE: Yeah, a great question. 01:00:19.640 --> 01:00:23.740 So how could we use ORDER BY with some characters or strings or some text 01:00:23.740 --> 01:00:24.590 in our database? 01:00:24.590 --> 01:00:27.790 Let's try that one out too and see how that works with ASC 01:00:27.790 --> 01:00:30.760 for ascending and DESC for descending. 01:00:30.760 --> 01:00:32.350 So I'll go back to my terminal. 01:00:32.350 --> 01:00:35.710 And I'll demonstrate here how we can use this for some text. 01:00:35.710 --> 01:00:40.780 So let's try to simply sorting our books alphabetically for, let's say, 01:00:40.780 --> 01:00:41.800 our library. 01:00:41.800 --> 01:00:46.900 I'll say SELECT "title" from "longlist," Enter. 01:00:46.900 --> 01:00:50.210 And I want to order by title, just plain and simple. 01:00:50.210 --> 01:00:51.590 And then hit Semicolon. 01:00:51.590 --> 01:00:52.750 Let's see what happens. 01:00:52.750 --> 01:00:54.160 I'll hit Enter. 01:00:54.160 --> 01:00:57.700 And now I'll see that these books are ordered. 01:00:57.700 --> 01:00:59.980 But they seem to be ordered alphabetically. 01:00:59.980 --> 01:01:03.700 So here, we have some titles lower in the alphabet. 01:01:03.700 --> 01:01:06.320 And up here, we have titles earlier in the alphabet. 01:01:06.320 --> 01:01:10.780 So by default ORDER BY seems to order alphabetically. 01:01:10.780 --> 01:01:15.070 If I change that default, though, from ascending to descending, 01:01:15.070 --> 01:01:16.960 let's see what happens. 01:01:16.960 --> 01:01:18.250 I'll go back over here. 01:01:18.250 --> 01:01:22.990 And I'll try the same query but now using DESC. 01:01:22.990 --> 01:01:24.760 SELECT "title" from "longlist." 01:01:24.760 --> 01:01:29.110 ORDER BY "title" now in descending order. 01:01:29.110 --> 01:01:30.400 Hit Enter. 01:01:30.400 --> 01:01:34.630 And now I'll see these titles in reverse alphabetical order. 01:01:34.630 --> 01:01:39.190 So notice how earlier on, we have titles that are lower in the alphabet. 01:01:39.190 --> 01:01:42.740 But down below, we have titles that are earlier in the alphabet here. 01:01:42.740 --> 01:01:46.330 So you can use ORDER BY with these texts. 01:01:46.330 --> 01:01:49.690 But you then have to specify whether you want it in alphabetical order 01:01:49.690 --> 01:01:53.960 or in reverse alphabetical order. 01:01:53.960 --> 01:01:58.700 OK, so let's show a few other concepts here 01:01:58.700 --> 01:02:01.740 we can use alongside of these orderings. 01:02:01.740 --> 01:02:06.080 One thing we could also do is try to find more information 01:02:06.080 --> 01:02:07.850 about the ratings of these books. 01:02:07.850 --> 01:02:10.970 So let's say I want not just to order these books 01:02:10.970 --> 01:02:15.710 but try to find the average rating, or to try to find the number of books, 01:02:15.710 --> 01:02:21.080 or try to find let's say maybe the sum of my total votes on each of my books. 01:02:21.080 --> 01:02:24.860 Well, for this, we could introduce some new concepts, 01:02:24.860 --> 01:02:29.060 these ones called SQL's aggregate functions. 01:02:29.060 --> 01:02:35.390 These allow us to take a whole set of rows and return not each of those rows 01:02:35.390 --> 01:02:36.560 individually. 01:02:36.560 --> 01:02:42.290 But instead, in this case, one number based on the values in those rows. 01:02:42.290 --> 01:02:45.350 You could imagine trying to count the number of rows you have 01:02:45.350 --> 01:02:47.960 or take the average of the number of rows, 01:02:47.960 --> 01:02:50.630 or take the average of let's say a rating, for instance. 01:02:50.630 --> 01:02:52.700 Finding the minimum rating of the maximum rating 01:02:52.700 --> 01:02:55.190 or finding the sum of some votes. 01:02:55.190 --> 01:02:58.230 And we'll see each of these in action here. 01:02:58.230 --> 01:03:01.460 Let's go back to our terminal, try some of these out. 01:03:01.460 --> 01:03:05.720 I will try, in this case first trying to find 01:03:05.720 --> 01:03:08.620 the average rating from my longlist. 01:03:08.620 --> 01:03:12.050 Well, I just from experience, and as you now know too, 01:03:12.050 --> 01:03:14.870 I can try to find the average of some column 01:03:14.870 --> 01:03:18.390 by using the AVG aggregate function. 01:03:18.390 --> 01:03:22.970 So I'll say SELECT not just rating in this case 01:03:22.970 --> 01:03:27.920 but select the average rating FROM "longlist." 01:03:27.920 --> 01:03:33.560 Notice how in this case, I'm using this kind of syntax, where I take rating, 01:03:33.560 --> 01:03:38.330 my column I want to aggregate or to sum up or to average like this. 01:03:38.330 --> 01:03:41.510 And I apply the function by saying its name 01:03:41.510 --> 01:03:45.180 followed by some parentheses around that column name. 01:03:45.180 --> 01:03:47.630 So this will return to me not all of the rating 01:03:47.630 --> 01:03:52.430 rows but the average of the rating rows in one single cell. 01:03:52.430 --> 01:03:54.080 Let me try this. 01:03:54.080 --> 01:03:56.990 I'll come back, and I will then hit Enter. 01:03:56.990 --> 01:04:00.440 And I'll see this is the average rating. 01:04:00.440 --> 01:04:08.720 We have 3.7537179471795 is our average rating for all of these books. 01:04:08.720 --> 01:04:11.780 But of course, this isn't great. 01:04:11.780 --> 01:04:15.920 What might I want to do if I was going to show this to somebody else? 01:04:15.920 --> 01:04:20.940 I could probably improve the presentation of this in some way. 01:04:20.940 --> 01:04:25.640 So I could probably round this result. I have 3.75371, 01:04:25.640 --> 01:04:28.520 we can probably stop after two decimal points, right? 01:04:28.520 --> 01:04:30.620 Just simply like 3.75. 01:04:30.620 --> 01:04:35.330 So I could introduce some new keyword here, this one around the results. 01:04:35.330 --> 01:04:37.850 Let me show you this one in action. 01:04:37.850 --> 01:04:41.540 I'll come back, and I'll try not just select average rating 01:04:41.540 --> 01:04:44.370 but select the rounded average rating. 01:04:44.370 --> 01:04:51.920 So I'll say SELECT ROUND and then take average of "rating" and round 01:04:51.920 --> 01:04:58.220 to 2 decimal points FROM "longlist" Semicolon. 01:04:58.220 --> 01:05:04.910 So now this query decides to first find the average of the rating column. 01:05:04.910 --> 01:05:10.860 Then take the result and round it using two decimal points. 01:05:10.860 --> 01:05:14.510 Notice how round takes two inputs or two arguments, the first one 01:05:14.510 --> 01:05:18.500 being the rating, the average rating, the second one being number 01:05:18.500 --> 01:05:20.300 of decimal points to round, 2. 01:05:20.300 --> 01:05:24.630 And we complete our query in the way we usually do by saying FROM this table. 01:05:24.630 --> 01:05:26.870 So let's try this one to figure this out. 01:05:26.870 --> 01:05:27.620 I'll come back. 01:05:27.620 --> 01:05:29.120 And I'll hit Enter. 01:05:29.120 --> 01:05:33.110 And now I'll see we do get back 3.75. 01:05:33.110 --> 01:05:35.720 But there is still one thing to improve here. 01:05:35.720 --> 01:05:39.590 When I write this query, I see this ugly title name-- 01:05:39.590 --> 01:05:41.990 ROUND average "rating" comma 2. 01:05:41.990 --> 01:05:44.775 I wouldn't send this to my boss or somebody else who I 01:05:44.775 --> 01:05:46.400 work for or maybe even a friend, right? 01:05:46.400 --> 01:05:49.170 I want to make sure it's pretty so they can read it correctly. 01:05:49.170 --> 01:05:53.350 So what could I do then to try to make this prettier? 01:05:53.350 --> 01:05:55.090 I could maybe rename this column. 01:05:55.090 --> 01:05:59.680 I could try to take this and make it not just this ugly mess of SQL 01:05:59.680 --> 01:06:02.810 keyword but to give it some name I could use instead. 01:06:02.810 --> 01:06:05.170 So for this, we'll introduce a brand new one-- 01:06:05.170 --> 01:06:07.300 new brand new keyword called AS. 01:06:07.300 --> 01:06:09.100 Let's try this one too. 01:06:09.100 --> 01:06:19.120 I'll come back, and I'll say SELECT, again, ROUND average "rating" comma 2. 01:06:19.120 --> 01:06:25.260 But now, I'll select it AS, let's say, average rating. 01:06:25.260 --> 01:06:27.093 And now before I actually finish this query, 01:06:27.093 --> 01:06:30.260 let me try to bring it up to the top my terminal so we can see it all in one 01:06:30.260 --> 01:06:30.830 go. 01:06:30.830 --> 01:06:32.150 I'll backspace this. 01:06:32.150 --> 01:06:40.450 And I'll say SELECT the rounded version of the average rating rounded 01:06:40.450 --> 01:06:42.760 to 2 decimal points AS-- 01:06:42.760 --> 01:06:45.670 let's call this one "average rating." 01:06:45.670 --> 01:06:46.780 Now hit Enter. 01:06:46.780 --> 01:06:51.370 And I'll say FROM my "longlist" table Semicolon. 01:06:51.370 --> 01:06:53.830 Now, I see it's much prettier overall. 01:06:53.830 --> 01:06:57.010 I have no longer these SQL keywords but instead 01:06:57.010 --> 01:07:01.120 just average rating as my column name. 01:07:01.120 --> 01:07:02.560 OK. 01:07:02.560 --> 01:07:04.960 So let me pause here and ask questions then 01:07:04.960 --> 01:07:12.490 on using average or using ROUND or using AS in these cases. 01:07:12.490 --> 01:07:16.360 SPEAKER: I'm wondering, do these sorts of commands have a funct-- 01:07:16.360 --> 01:07:21.760 are these commands encode something like data types or just encode something? 01:07:21.760 --> 01:07:23.998 Do these commands also have a name? 01:07:23.998 --> 01:07:24.790 CARTER ZENKE: Yeah. 01:07:24.790 --> 01:07:28.630 And can I ask are you referring to the AVG like COUNT like SUM 01:07:28.630 --> 01:07:29.860 those kinds of things, or? 01:07:29.860 --> 01:07:30.550 Yeah. 01:07:30.550 --> 01:07:33.140 So these functions do have a name. 01:07:33.140 --> 01:07:34.750 They are called aggregate functions. 01:07:34.750 --> 01:07:37.850 And aggregate means to combine, to bring together. 01:07:37.850 --> 01:07:41.410 So they're called aggregate functions because they take some number of rows, 01:07:41.410 --> 01:07:45.610 like all my ratings for instance, and bring it down to one single cell, 01:07:45.610 --> 01:07:48.530 like the average or the sum or the count. 01:07:48.530 --> 01:07:52.030 So if you look up or read more about SQL aggregate functions, 01:07:52.030 --> 01:07:54.070 you'll see all of these and perhaps some more 01:07:54.070 --> 01:07:57.750 depending on the software you're using. 01:07:57.750 --> 01:07:58.530 OK. 01:07:58.530 --> 01:08:02.970 So let's keep going then and try to start counting some other rows 01:08:02.970 --> 01:08:04.950 and use our other aggregate functions here. 01:08:04.950 --> 01:08:06.840 I'll go back to my terminal. 01:08:06.840 --> 01:08:13.140 And so far, we've seen average, as well as we have seen ROUND and so on. 01:08:13.140 --> 01:08:18.600 But why don't I try to find the maximum or the minimum rating in my table? 01:08:18.600 --> 01:08:25.560 I'll say SELECT, let's say, the MAX "rating" FROM my "longlist" Semicolon. 01:08:25.560 --> 01:08:26.670 Hit Enter. 01:08:26.670 --> 01:08:31.529 Now I see the highest rated book had a rating of 4.52. 01:08:31.529 --> 01:08:33.029 Well, what about the minimum rating? 01:08:33.029 --> 01:08:34.200 I could use MIN here too. 01:08:34.200 --> 01:08:40.310 I could say SELECT, let's say, MIN of my "rating" column FROM my "longlist" 01:08:40.310 --> 01:08:41.000 table. 01:08:41.000 --> 01:08:42.710 I'll hit Semicolon. 01:08:42.710 --> 01:08:44.689 And I'll see it 3.05. 01:08:44.689 --> 01:08:48.979 It is the lowest-rated book I have in this set. 01:08:48.979 --> 01:08:53.029 Well, as we've seen, let me try to view this for you all. 01:08:53.029 --> 01:08:59.390 I could say SELECT "title" and "votes" FROM my "longlist." 01:08:59.390 --> 01:09:01.430 SELECT "title" and "votes" from my "longlist." 01:09:01.430 --> 01:09:06.170 Here, I have many books with many user-generated votes. 01:09:06.170 --> 01:09:09.740 Maybe people on the internet decided to rate this book out of 5 01:09:09.740 --> 01:09:15.120 and maybe Go, Went, Gone got about, let's say, 592 votes. 01:09:15.120 --> 01:09:20.700 So I'm curious then, how many total votes do I have in my data set? 01:09:20.700 --> 01:09:23.600 Well, for that, I could use the SUM aggregate function. 01:09:23.600 --> 01:09:27.859 Try to count up each one of these rows and/or turn it back to me 01:09:27.859 --> 01:09:29.550 in a single cell. 01:09:29.550 --> 01:09:30.979 So I'll use SUM here. 01:09:30.979 --> 01:09:36.229 I'll come back and I'll say I want to find the sum of my votes column. 01:09:36.229 --> 01:09:42.890 I'll say SELECT, let's say, the SUM of my "votes" column 01:09:42.890 --> 01:09:45.510 FROM my "longlist" table. 01:09:45.510 --> 01:09:46.729 And then I'll just Enter-- 01:09:46.729 --> 01:09:47.960 Semicolon Enter. 01:09:47.960 --> 01:09:52.670 And I'll see over 600,000 people offer to vote 01:09:52.670 --> 01:09:56.810 for each of these books that were longlisted for the International Booker 01:09:56.810 --> 01:09:58.620 Prize. 01:09:58.620 --> 01:10:00.070 Now there'd be a few more here. 01:10:00.070 --> 01:10:03.240 Let's check out what else we have left to do and our aggregate functions. 01:10:03.240 --> 01:10:08.200 We could also try to count up just the number of books in our data set. 01:10:08.200 --> 01:10:11.910 So why don't I try to find the number of rows I have? 01:10:11.910 --> 01:10:14.070 For that, I could use count. 01:10:14.070 --> 01:10:17.100 And often to find the number of rows in your data set, 01:10:17.100 --> 01:10:20.460 you might use count and star as we saw a little earlier. 01:10:20.460 --> 01:10:25.200 I could say SELECT COUNT star from "longlist" 01:10:25.200 --> 01:10:30.270 And this means, again star means give me every row and every column. 01:10:30.270 --> 01:10:32.850 Give me basically my whole table, right? 01:10:32.850 --> 01:10:36.750 And if I say COUNT star, that means count up the number of rows 01:10:36.750 --> 01:10:39.280 that I have in my database. 01:10:39.280 --> 01:10:41.550 So I'll say COUNT star from "longlist." 01:10:41.550 --> 01:10:45.750 And I get back 78 books in this database. 01:10:45.750 --> 01:10:48.720 Well, let me try counting up the number of translators here. 01:10:48.720 --> 01:10:54.840 I'll say SELECT, let's say, COUNT of "translators" FROM "longlist" 01:10:54.840 --> 01:10:55.890 Semicolon. 01:10:55.890 --> 01:10:57.300 Hit Enter. 01:10:57.300 --> 01:11:00.670 And now I see 76. 01:11:00.670 --> 01:11:08.680 So I have 78 books, but if I count translators, I have 76 of them. 01:11:08.680 --> 01:11:10.870 So why might that be? 01:11:10.870 --> 01:11:14.080 Free to raise your hand and try to guess at this. 01:11:14.080 --> 01:11:19.053 Why do I have 78 rows but 76 translators? 01:11:19.053 --> 01:11:20.000 SPEAKER: Hi. 01:11:20.000 --> 01:11:22.412 I actually had raised my hand for the question. 01:11:22.412 --> 01:11:23.620 CARTER ZENKE: Yeah, go ahead. 01:11:23.620 --> 01:11:26.210 SPEAKER: Also, I wanted to know whether the MAX 01:11:26.210 --> 01:11:30.020 and the MIN functions can be used for finding the longest or the shortest 01:11:30.020 --> 01:11:30.710 string as well? 01:11:30.710 --> 01:11:32.543 Or do you have a different command for that? 01:11:32.543 --> 01:11:33.710 CARTER ZENKE: Good question. 01:11:33.710 --> 01:11:37.040 Could we use MAX and MIN to find the longest or shortest string? 01:11:37.040 --> 01:11:38.140 That's a good question. 01:11:38.140 --> 01:11:41.570 So let's actually pause on this counting here and try that out real quick. 01:11:41.570 --> 01:11:43.440 So I'll come back to my terminal. 01:11:43.440 --> 01:11:47.480 And let me try to use MAX and MIN with some book titles. 01:11:47.480 --> 01:11:50.090 So I'll say Select-- 01:11:50.090 --> 01:11:52.790 let's select the MAX title. 01:11:52.790 --> 01:11:57.780 And at the same time, why don't we select the MIN title as well? 01:11:57.780 --> 01:12:00.920 And I'll select these from my "longlist" table. 01:12:00.920 --> 01:12:02.370 I'll hit Semicolon. 01:12:02.370 --> 01:12:05.110 And now, let me try this out. 01:12:05.110 --> 01:12:09.100 I'll get back Wretchedness and A New Name-- 01:12:09.100 --> 01:12:12.570 Septology VI-VII. 01:12:12.570 --> 01:12:14.940 Now, there's a few hypotheses here. 01:12:14.940 --> 01:12:19.530 It does seem that our MAX "title" is shorter than our MIN "title." 01:12:19.530 --> 01:12:23.860 So it's probably not that MAX gives us the length of the string, 01:12:23.860 --> 01:12:25.420 but what do you notice? 01:12:25.420 --> 01:12:28.935 Well, I see MIN is really early on in the alphabet. 01:12:28.935 --> 01:12:34.290 It has an A here, whereas MAX has a W, pretty low in the alphabet. 01:12:34.290 --> 01:12:37.320 And I would bet if we ordered these book titles, 01:12:37.320 --> 01:12:43.140 we would see a new name up at the very top and a Wretchedness, the book here, 01:12:43.140 --> 01:12:44.710 down at the bottom. 01:12:44.710 --> 01:12:48.750 So MAX seems to give us the lowest alphabetically, which 01:12:48.750 --> 01:12:51.630 is kind of contradictory with titles here or strings. 01:12:51.630 --> 01:12:58.190 And MIN gives us the earliest in the alphabet using this A as well. 01:12:58.190 --> 01:12:58.940 OK. 01:12:58.940 --> 01:12:59.910 So a good question. 01:12:59.910 --> 01:13:01.840 Let's come back to our counting here. 01:13:01.840 --> 01:13:03.080 Let's go back to my terminal. 01:13:03.080 --> 01:13:09.140 And again, we had, in this case, 78 rows, but only 76 translators. 01:13:09.140 --> 01:13:12.710 So, again, if I did SELECT COUNT star from, let's say, 01:13:12.710 --> 01:13:17.000 "longlist" then Semicolon, I get back 78. 01:13:17.000 --> 01:13:25.280 But if I do SELECT COUNT of "translator" from "longlist," I get back 76. 01:13:25.280 --> 01:13:31.640 And let me ask again, why do we have 78 rows but 76 translators? 01:13:31.640 --> 01:13:34.135 Feel free to say it. 01:13:34.135 --> 01:13:39.350 OK, so I'm seeing maybe we have some number of rows, 78. 01:13:39.350 --> 01:13:43.850 But for our translators, you remember, two of those were null values. 01:13:43.850 --> 01:13:45.810 They didn't exist in our table. 01:13:45.810 --> 01:13:50.640 So it seems like if we use COUNT star, we're counting all the rows. 01:13:50.640 --> 01:13:54.710 But if we use COUNT "translator," some column that has null values, 01:13:54.710 --> 01:13:59.010 we're only getting back those rows or those values that aren't null. 01:13:59.010 --> 01:14:02.060 So COUNT, when given a column, counts only those that 01:14:02.060 --> 01:14:06.620 are not null that exist in our table. 01:14:06.620 --> 01:14:09.240 OK, let's look at one more example here for counting. 01:14:09.240 --> 01:14:11.980 And let's try this. 01:14:11.980 --> 01:14:16.080 Let's say I want to find all of the publishers in this database. 01:14:16.080 --> 01:14:22.230 I'll say SELECT COUNT of "publisher" from my "longlist." 01:14:22.230 --> 01:14:24.390 And I'll hit Semicolon. 01:14:24.390 --> 01:14:30.330 So you might think that I have 78 publishers in this long list. 01:14:30.330 --> 01:14:32.190 But would it be accurate if I were to say 01:14:32.190 --> 01:14:37.160 I have 78 different publishers in this longlist? 01:14:37.160 --> 01:14:40.080 Could I say that? 01:14:40.080 --> 01:14:41.550 I'm seeing no, right? 01:14:41.550 --> 01:14:44.280 I couldn't try to count up these publishers 01:14:44.280 --> 01:14:46.260 and then say I have 78 different ones. 01:14:46.260 --> 01:14:49.380 I might double count the publisher along the way. 01:14:49.380 --> 01:14:51.100 And let me show you what we mean here. 01:14:51.100 --> 01:14:52.680 So I'll go back to my table. 01:14:52.680 --> 01:14:57.360 And let me try to select from publishers or select the publisher column 01:14:57.360 --> 01:14:58.530 from longlist. 01:14:58.530 --> 01:15:02.100 I'll select "publisher" from "longlist." 01:15:02.100 --> 01:15:03.870 Hit Semicolon. 01:15:03.870 --> 01:15:06.690 Oops, and now I see something a little odd. 01:15:06.690 --> 01:15:13.070 Let me scroll back up and maybe ask for a raised hand here. 01:15:13.070 --> 01:15:19.560 Why might I get this odd result? 01:15:19.560 --> 01:15:21.318 SPEAKER: Because of the quotes? 01:15:21.318 --> 01:15:22.110 CARTER ZENKE: Yeah. 01:15:22.110 --> 01:15:24.360 So I think I mistyped some of my query here. 01:15:24.360 --> 01:15:28.650 I said it looks like "pubsliher" instead of "publisher." 01:15:28.650 --> 01:15:31.860 And in this case, SQL will give me what I asked for. 01:15:31.860 --> 01:15:34.400 I said SELECT "pubsliher" from "longlist." 01:15:34.400 --> 01:15:35.760 And says, OK, here it is. 01:15:35.760 --> 01:15:38.830 But that column doesn't exist so it creates this data for me. 01:15:38.830 --> 01:15:40.860 So let me try this again. 01:15:40.860 --> 01:15:41.520 I'll go back. 01:15:41.520 --> 01:15:43.470 And I'll hopefully type this correctly now. 01:15:43.470 --> 01:15:52.360 I'll say SELECT, let's say, "publisher," this one, FROM "longlist" Semicolon. 01:15:52.360 --> 01:15:57.270 And now I'll see all of the publishers that I have in my table. 01:15:57.270 --> 01:15:58.118 But what do you see? 01:15:58.118 --> 01:15:59.160 Well, some repeat, right? 01:15:59.160 --> 01:16:02.340 I have Harvill Secker multiple times here. 01:16:02.340 --> 01:16:06.010 I have similarly MacLehose Press multiple times as well. 01:16:06.010 --> 01:16:09.120 So if I count it up, these publishers, I would 01:16:09.120 --> 01:16:12.720 get each one counted one time, which I want to find 01:16:12.720 --> 01:16:14.760 the distinct ones the different ones. 01:16:14.760 --> 01:16:16.800 I need a new keyword for this. 01:16:16.800 --> 01:16:20.190 And for this, we'll use this keyword indeed called DISTINCT, trying 01:16:20.190 --> 01:16:23.650 to find unique values from our column. 01:16:23.650 --> 01:16:24.960 So let's try this. 01:16:24.960 --> 01:16:30.360 I'll go back over here and I will now select not just publishers, 01:16:30.360 --> 01:16:32.070 but distinct publishers. 01:16:32.070 --> 01:16:39.820 I'll say SELECT DISTINCT publisher from "longlist" Semicolon. 01:16:39.820 --> 01:16:44.920 Now, if I scroll through here, I should see each publisher in here 01:16:44.920 --> 01:16:46.750 only one time. 01:16:46.750 --> 01:16:49.330 If they have the same name, they've been filtered out. 01:16:49.330 --> 01:16:52.540 And now they're only the same publisher here too. 01:16:52.540 --> 01:16:54.850 So I will then try to say SELECT, let's say, 01:16:54.850 --> 01:17:01.017 COUNT of "publisher," SELECT COUNT of "publisher" from-- 01:17:01.017 --> 01:17:02.850 COUNT of DISTINCT "publisher," for instance. 01:17:02.850 --> 01:17:15.140 COUNT distinct publisher-- oh, typo, "publisher" FROM "longlist" Semicolon. 01:17:15.140 --> 01:17:19.700 And I'll see I have 33 distinct publishers. 01:17:19.700 --> 01:17:21.050 OK. 01:17:21.050 --> 01:17:24.620 So this just about brings us to the conclusion 01:17:24.620 --> 01:17:27.080 of all of these new SQL keywords here. 01:17:27.080 --> 01:17:30.560 We've seen so far that we have several here to use. 01:17:30.560 --> 01:17:33.175 But let's figure out how to actually exit this prompt. 01:17:33.175 --> 01:17:35.300 So you might it be in your SQLite prompt right now. 01:17:35.300 --> 01:17:39.860 If you want to leave it, you could also use this command, dot quit. 01:17:39.860 --> 01:17:41.540 Dot quit is not a SQL keyword. 01:17:41.540 --> 01:17:44.240 It's a SQLite keyword to leave your terminal 01:17:44.240 --> 01:17:47.220 and go back to where you started. 01:17:47.220 --> 01:17:50.810 So just to review then, what we've seen so far 01:17:50.810 --> 01:17:53.360 is how to select data from our table. 01:17:53.360 --> 01:17:56.870 We can use select column to take some column from our table 01:17:56.870 --> 01:18:02.060 and give us back all of those rows from that table for that column. 01:18:02.060 --> 01:18:05.120 We've seen we can apply some aggregate functions to take 01:18:05.120 --> 01:18:08.310 maybe the count of our columns or the average or so on. 01:18:08.310 --> 01:18:10.860 And we can get back not just all of our rows, 01:18:10.860 --> 01:18:16.230 but only some of them using our WHERE clause here along with a condition. 01:18:16.230 --> 01:18:20.270 We could have multiple conditions, having not just one but perhaps two. 01:18:20.270 --> 01:18:23.480 Like, let's say, here, condition 0 and condition 1. 01:18:23.480 --> 01:18:27.110 And we could also use, we saw before, this idea of equals and LIKE 01:18:27.110 --> 01:18:31.010 to match some pattern or to make something exactly equal over here. 01:18:31.010 --> 01:18:33.200 We could, again, use AND and OR. 01:18:33.200 --> 01:18:35.870 And we saw later on how we could order our data 01:18:35.870 --> 01:18:41.940 and use our LIMIT function to get back only some number of rows. 01:18:41.940 --> 01:18:45.530 Now, this then is our interaction to querying. 01:18:45.530 --> 01:18:48.830 And so far, we've seen this world of books. 01:18:48.830 --> 01:18:52.310 And the table we've had so far really just has books inside of it. 01:18:52.310 --> 01:18:57.020 But next time, what we'll see is how to take this world of books and split 01:18:57.020 --> 01:18:58.260 into multiple tables. 01:18:58.260 --> 01:19:02.255 How do we find information on publishers or books or authors too? 01:19:02.255 --> 01:19:04.130 And how do we try to put that in a table that 01:19:04.130 --> 01:19:07.380 can present the relationships among all of these different entities? 01:19:07.380 --> 01:19:10.005 We'll talk about all that and more when we come back next time. 01:19:10.005 --> 01:19:11.920 And we'll see you there.