1 00:00:00,000 --> 00:00:02,946 [MUSIC PLAYING] 2 00:00:02,946 --> 00:00:12,047 3 00:00:12,047 --> 00:00:13,630 SPEAKER: All right, let's get started. 4 00:00:13,630 --> 00:00:15,320 So welcome back, everyone, to CS50 Beyond. 5 00:00:15,320 --> 00:00:17,444 So just to give everyone a little bit of a roadmap, 6 00:00:17,444 --> 00:00:20,660 we are here on day three of CS50 Beyond, which is the middle of the course. 7 00:00:20,660 --> 00:00:23,540 We spent the first half of the class talking about the building 8 00:00:23,540 --> 00:00:26,630 and designing of web applications using HTML, CSS. 9 00:00:26,630 --> 00:00:28,490 We took a look at Flask on the second day, 10 00:00:28,490 --> 00:00:29,990 looking at how to build web servers. 11 00:00:29,990 --> 00:00:31,948 And then we took a look at JavaScript, in order 12 00:00:31,948 --> 00:00:35,390 to write programs on the client side, in order to create more interactive user 13 00:00:35,390 --> 00:00:36,161 interfaces. 14 00:00:36,161 --> 00:00:38,910 Today, we'll continue our discussion of building web applications, 15 00:00:38,910 --> 00:00:40,619 in particular by talking about databases, 16 00:00:40,619 --> 00:00:43,368 which are going to be particularly important if you ever go around 17 00:00:43,368 --> 00:00:46,040 to trying to store information inside of a web application, 18 00:00:46,040 --> 00:00:49,370 managing users, or managing any kind of data in an application. 19 00:00:49,370 --> 00:00:51,009 We saw a little bit of this in CS50. 20 00:00:51,009 --> 00:00:54,050 We'll dive a little bit deeper, look at some other options for databases, 21 00:00:54,050 --> 00:00:56,090 look at some more advanced tools like ORMs 22 00:00:56,090 --> 00:00:58,970 that allow you to interact with databases a little more easily. 23 00:00:58,970 --> 00:01:01,610 And then in the latter half of the week, starting tomorrow, 24 00:01:01,610 --> 00:01:03,710 we'll introduce a framework called React. 25 00:01:03,710 --> 00:01:06,100 And React is a JavaScript-based framework 26 00:01:06,100 --> 00:01:08,960 that is built by Facebook that's designed to make it much easier 27 00:01:08,960 --> 00:01:11,392 in order to design and build interactive user interfaces. 28 00:01:11,392 --> 00:01:14,600 You might have found that yesterday, as you were writing JavaScript-based web 29 00:01:14,600 --> 00:01:16,691 applications, you were running into situations 30 00:01:16,691 --> 00:01:19,940 where you were writing a lot of code to do things that should have been fairly 31 00:01:19,940 --> 00:01:24,050 simple, writing code to update variables and change the way that the page looked 32 00:01:24,050 --> 00:01:25,790 based on the values of those variables. 33 00:01:25,790 --> 00:01:28,640 React is going to simplify that process, make it a whole lot easier, 34 00:01:28,640 --> 00:01:31,910 and you'll find that it's a very, very powerful tool to very easily build 35 00:01:31,910 --> 00:01:33,160 interactive user interfaces. 36 00:01:33,160 --> 00:01:35,150 So we'll take a look at that starting tomorrow. 37 00:01:35,150 --> 00:01:38,450 But today the main focus is going to be databases. 38 00:01:38,450 --> 00:01:41,390 And so we saw this in CS50, but the basic idea 39 00:01:41,390 --> 00:01:43,610 of what a database is, is you can think of it-- 40 00:01:43,610 --> 00:01:46,434 at least in our terms-- as a collection of tables. 41 00:01:46,434 --> 00:01:48,350 And a table looks something along these lines, 42 00:01:48,350 --> 00:01:51,410 or you have columns that each refer to a different category of data 43 00:01:51,410 --> 00:01:53,870 that you might want to store, and then rows, one 44 00:01:53,870 --> 00:01:55,901 for each entry inside of a database. 45 00:01:55,901 --> 00:01:57,650 And the example that I'm going to be using 46 00:01:57,650 --> 00:01:59,400 throughout this morning and this afternoon 47 00:01:59,400 --> 00:02:01,820 is, in particular, thinking about the types of data 48 00:02:01,820 --> 00:02:05,000 that I might want to store if I were building a database for a web 49 00:02:05,000 --> 00:02:06,500 application for an airline. 50 00:02:06,500 --> 00:02:10,160 For instance, managing the flights of that airline, services, 51 00:02:10,160 --> 00:02:12,950 and the passengers that are riding on planes on that airline. 52 00:02:12,950 --> 00:02:15,260 And so that's the data that we're going to be trying to keep track of, 53 00:02:15,260 --> 00:02:17,240 and we'll see examples that pertain to that. 54 00:02:17,240 --> 00:02:19,115 And so you might imagine that if we're trying 55 00:02:19,115 --> 00:02:22,382 to create a table to keep track of all the flights that an airline has, 56 00:02:22,382 --> 00:02:25,340 for instance, we might care about the origin of that flight, where it's 57 00:02:25,340 --> 00:02:28,250 going from, the destination of that flight, 58 00:02:28,250 --> 00:02:31,909 and then a duration in number of minutes long that that flight takes. 59 00:02:31,909 --> 00:02:33,950 And what we'd like to do is come up with some way 60 00:02:33,950 --> 00:02:37,590 of interacting with this database in order to make queries to it. 61 00:02:37,590 --> 00:02:40,490 And so this is where S-Q-L, or SQL, came in. 62 00:02:40,490 --> 00:02:43,700 And SQL was something we introduced in CS50 as a language 63 00:02:43,700 --> 00:02:45,800 for interacting with databases, a language that 64 00:02:45,800 --> 00:02:48,320 made it easy to request data from databases, 65 00:02:48,320 --> 00:02:51,800 to insert data into databases, to update data as well. 66 00:02:51,800 --> 00:02:53,450 And SQL is a language. 67 00:02:53,450 --> 00:02:57,110 And there are a number of different databases that use this language 68 00:02:57,110 --> 00:02:59,310 and so there are a good number of these. 69 00:02:59,310 --> 00:03:04,130 But a handful that are probably the most popular include My-S-Q-L, or MySQL, 70 00:03:04,130 --> 00:03:05,810 PostgreSQL, and SQLite. 71 00:03:05,810 --> 00:03:08,630 SQLite is the version that we were using in CS50. 72 00:03:08,630 --> 00:03:13,640 SQLite is a SQL database that is entirely based in a file. 73 00:03:13,640 --> 00:03:15,870 Your whole database was stored inside of a file. 74 00:03:15,870 --> 00:03:19,080 So for CS50 finance, for instance, all your database information 75 00:03:19,080 --> 00:03:21,790 was stored inside of a file called finance.db. 76 00:03:21,790 --> 00:03:26,090 And SQLite is a fairly simple, lightweight database. 77 00:03:26,090 --> 00:03:29,720 But it's not so good if you want to think about potentially scaling 78 00:03:29,720 --> 00:03:32,752 a database to be larger, if you need to start managing permissions. 79 00:03:32,752 --> 00:03:35,210 If you need to be able to have much more efficient queries, 80 00:03:35,210 --> 00:03:38,489 or if you're trying to scale something to a production level sort of thing 81 00:03:38,489 --> 00:03:41,030 you want anyone to be able to use and access on the internet, 82 00:03:41,030 --> 00:03:43,760 generally SQLite is not recommended for that. 83 00:03:43,760 --> 00:03:46,850 And so you'll generally want to use a database that scales a bit better, 84 00:03:46,850 --> 00:03:51,320 that is able to handle the load that a web application might bring. 85 00:03:51,320 --> 00:03:55,100 And MySQL and PostgreSQL are the sorts of databases 86 00:03:55,100 --> 00:03:56,690 that you might use for that. 87 00:03:56,690 --> 00:04:00,080 And so the database we're going to be using over the course of CS50 Beyond 88 00:04:00,080 --> 00:04:03,680 is going to be a PostgreSQL database, which is just a different database that 89 00:04:03,680 --> 00:04:05,430 also uses the SQL language. 90 00:04:05,430 --> 00:04:07,670 So you'll notice there are some slight differences. 91 00:04:07,670 --> 00:04:09,650 And we're using this database in particular 92 00:04:09,650 --> 00:04:12,880 because it'll be very easy to deploy Heroku, the service 93 00:04:12,880 --> 00:04:15,260 we introduced a couple of days ago, to design and deploy 94 00:04:15,260 --> 00:04:16,790 a web application to the internet. 95 00:04:16,790 --> 00:04:20,360 Offers free Postgres databases along with any of your applications. 96 00:04:20,360 --> 00:04:23,810 And that's going to make it very easy for us to set up a Postgres database 97 00:04:23,810 --> 00:04:25,710 and then to be able to insert data into it. 98 00:04:25,710 --> 00:04:28,160 And so just know that there are a number of different types of databases 99 00:04:28,160 --> 00:04:29,750 that all use the SQL language. 100 00:04:29,750 --> 00:04:30,977 We used SQLite in CS50. 101 00:04:30,977 --> 00:04:32,810 We're going to be moving away from that now, 102 00:04:32,810 --> 00:04:35,360 because we're thinking about if we actually want to start building web 103 00:04:35,360 --> 00:04:37,310 applications, deploying them to the internet, what 104 00:04:37,310 --> 00:04:38,860 are the types of databases we're going to use. 105 00:04:38,860 --> 00:04:41,210 And a SQLite database is just not going to work quite as 106 00:04:41,210 --> 00:04:44,120 well for those sorts of purposes. 107 00:04:44,120 --> 00:04:46,176 So much like in any programming language, 108 00:04:46,176 --> 00:04:48,800 where we have different data types for different types of data, 109 00:04:48,800 --> 00:04:51,270 SQL likewise also has data types. 110 00:04:51,270 --> 00:04:54,310 These are the data types you might commonly seen in PostgreSQL. 111 00:04:54,310 --> 00:04:56,697 And different databases will have different types of some 112 00:04:56,697 --> 00:04:58,280 of the stuff you're going to see here. 113 00:04:58,280 --> 00:05:02,150 Differs a little bit from SQLite which you might be familiar with from CS50. 114 00:05:02,150 --> 00:05:04,617 INTEGER, as you might imagine, just stores an integer. 115 00:05:04,617 --> 00:05:06,450 A DECIMAL number is a floating point number, 116 00:05:06,450 --> 00:05:09,200 something that might have digits after the decimal point. 117 00:05:09,200 --> 00:05:11,700 SERIAL is a special type in Postgres. 118 00:05:11,700 --> 00:05:15,840 That's the equivalent of what the auto-incrementing feature of SQLite 119 00:05:15,840 --> 00:05:16,440 was. 120 00:05:16,440 --> 00:05:19,590 A SERIAL variable is like an integer that will automatically 121 00:05:19,590 --> 00:05:20,590 increment every time. 122 00:05:20,590 --> 00:05:23,850 So it'll start at 1, then go to 2, then 3, then 4, so on and so forth. 123 00:05:23,850 --> 00:05:27,240 And you'll often use that for the ID column of your table 124 00:05:27,240 --> 00:05:29,340 where, if you recall, the ID column of our table 125 00:05:29,340 --> 00:05:32,100 we used as our primary key, the primary way via which 126 00:05:32,100 --> 00:05:33,960 we would identify rows in the table. 127 00:05:33,960 --> 00:05:36,810 And usually that's going to end up being SERIAL. 128 00:05:36,810 --> 00:05:40,470 VARCHAR is used for a variable number of characters, like a string of text, 129 00:05:40,470 --> 00:05:41,380 for instance. 130 00:05:41,380 --> 00:05:43,590 A TIMESTAMP keeps track of a date and time. 131 00:05:43,590 --> 00:05:46,170 A BOOLEAN value could just be true or false. 132 00:05:46,170 --> 00:05:48,600 And then an ENUM is generally going to be 133 00:05:48,600 --> 00:05:50,940 one of a fixed finite number of values. 134 00:05:50,940 --> 00:05:53,239 So if you have a couple of possible options, 135 00:05:53,239 --> 00:05:56,280 and you only want to store one of those possible options in the database, 136 00:05:56,280 --> 00:05:59,540 an ENUM might be the type that you'll choose for that purpose. 137 00:05:59,540 --> 00:06:03,220 And so we'll have an opportunity to play around with that in just a moment. 138 00:06:03,220 --> 00:06:06,030 So how do we create a table in SQL? 139 00:06:06,030 --> 00:06:09,210 Well we'll use a command called CREATE TABLE. 140 00:06:09,210 --> 00:06:11,176 And in Postgres, at least, a CREATE TABLE query 141 00:06:11,176 --> 00:06:13,050 might look something along the lines of this. 142 00:06:13,050 --> 00:06:16,590 This is SQL code to actually create a table programmatically. 143 00:06:16,590 --> 00:06:20,015 So we're saying CREATE TABLE flights and inside of this flights table, 144 00:06:20,015 --> 00:06:22,140 there are a couple of things that we want to store. 145 00:06:22,140 --> 00:06:24,630 We want to store an ID, since every table probably 146 00:06:24,630 --> 00:06:26,850 should have an ID column such that we can uniquely 147 00:06:26,850 --> 00:06:29,400 identify any row in that table. 148 00:06:29,400 --> 00:06:32,081 The type of the ID column is going to be SERIAL. 149 00:06:32,081 --> 00:06:34,080 In other words, it's going to auto increment one 150 00:06:34,080 --> 00:06:35,288 after another, after another. 151 00:06:35,288 --> 00:06:38,130 I don't have to tell the table what ID to give a flight. 152 00:06:38,130 --> 00:06:41,310 It's just going to automatically give it a flight ID. 153 00:06:41,310 --> 00:06:42,730 It's going to be a primary key. 154 00:06:42,730 --> 00:06:44,730 In other words, this is the primary way by which 155 00:06:44,730 --> 00:06:46,540 I'm going to identify a flight. 156 00:06:46,540 --> 00:06:49,647 And if some other table wants to reference the flight, then by default, 157 00:06:49,647 --> 00:06:51,230 it's going to reference it by that ID. 158 00:06:51,230 --> 00:06:53,526 And we'll see an example of that later. 159 00:06:53,526 --> 00:06:55,400 What else do we need to store about a flight? 160 00:06:55,400 --> 00:06:57,830 We need to store the origin of the flight which 161 00:06:57,830 --> 00:07:01,490 is going to be VARCHAR, just some string, the destination of that flight, 162 00:07:01,490 --> 00:07:04,900 and the duration of the flight, which is going to be an integer. 163 00:07:04,900 --> 00:07:07,550 And so let's actually try to create this table now, 164 00:07:07,550 --> 00:07:09,560 by creating a Postgres database and trying 165 00:07:09,560 --> 00:07:11,530 to put this information in there. 166 00:07:11,530 --> 00:07:13,260 And so I'll show you an example of that. 167 00:07:13,260 --> 00:07:15,560 Now there are a number of ways to set up a Postgres database. 168 00:07:15,560 --> 00:07:17,270 I could set it up locally on my computer. 169 00:07:17,270 --> 00:07:20,090 You can install and download Postgres onto your computer, 170 00:07:20,090 --> 00:07:23,062 in order to run a Postgres server on your computer locally. 171 00:07:23,062 --> 00:07:24,770 But the easiest way for us to get started 172 00:07:24,770 --> 00:07:27,860 with this is probably just going to be to go ahead and go to Heroku, 173 00:07:27,860 --> 00:07:31,560 and try and create a Heroku application and add a database to it. 174 00:07:31,560 --> 00:07:34,790 So I'll go ahead and go to New and Create New App much as I did before 175 00:07:34,790 --> 00:07:37,834 in Heroku, and I'm going to call this Brian-flights-- 176 00:07:37,834 --> 00:07:40,250 see if that's available, all right, great, it's available. 177 00:07:40,250 --> 00:07:42,920 I'll create the app. 178 00:07:42,920 --> 00:07:45,170 And, all right, I've created a new Heroku application. 179 00:07:45,170 --> 00:07:48,420 And now what I'd like to do is add on a database to it. 180 00:07:48,420 --> 00:07:51,020 And so if I go to the Overview tab here on Heroku, 181 00:07:51,020 --> 00:07:53,120 I'll see installed add-ons, which is an area where 182 00:07:53,120 --> 00:07:57,050 I can add some additional features to my web application. 183 00:07:57,050 --> 00:08:00,620 And in add-ons, I'm going to do a search for Postgres. 184 00:08:00,620 --> 00:08:02,000 And here's Heroku Postgres. 185 00:08:02,000 --> 00:08:04,520 This is Heroku's own service for providing you 186 00:08:04,520 --> 00:08:07,230 with the PostgreSQL database. 187 00:08:07,230 --> 00:08:08,637 I'll click on that. 188 00:08:08,637 --> 00:08:10,220 There are a number of different tiers. 189 00:08:10,220 --> 00:08:12,710 The free tier of the databases is all we'll need for now. 190 00:08:12,710 --> 00:08:14,812 The free tier will give me 10,000 rows that I 191 00:08:14,812 --> 00:08:16,520 can insert into this database, which will 192 00:08:16,520 --> 00:08:18,350 be more than enough for our purposes. 193 00:08:18,350 --> 00:08:22,160 And we'll go ahead and click Provision. 194 00:08:22,160 --> 00:08:26,840 And all right, we've been able to create a new Postgres database. 195 00:08:26,840 --> 00:08:33,690 And if I click on Heroku Postgres now, we'll load it. 196 00:08:33,690 --> 00:08:37,320 And if I go into the settings, I'll be able to see the database 197 00:08:37,320 --> 00:08:38,969 credentials for this database. 198 00:08:38,969 --> 00:08:41,010 And so the database credentials for this database 199 00:08:41,010 --> 00:08:44,495 is information that I'm going to need if I ever want to access that database. 200 00:08:44,495 --> 00:08:47,370 So it gives me the host name for where the database is being stored-- 201 00:08:47,370 --> 00:08:49,290 it's being stored on Amazon servers-- 202 00:08:49,290 --> 00:08:53,550 the name of the database, the username, the port number, and a password 203 00:08:53,550 --> 00:08:55,170 for being able to access the database. 204 00:08:55,170 --> 00:08:58,810 Databases like PostgreSQL have user management. 205 00:08:58,810 --> 00:09:01,620 You have a username and password, such that it's possible 206 00:09:01,620 --> 00:09:04,830 if you're maintaining the database to create multiple different users, each 207 00:09:04,830 --> 00:09:07,038 of which has their own credentials, and each of which 208 00:09:07,038 --> 00:09:09,917 might have different permissions or access to the database. 209 00:09:09,917 --> 00:09:12,000 And so there are a number of ways you can actually 210 00:09:12,000 --> 00:09:14,460 try to access this database. 211 00:09:14,460 --> 00:09:18,810 If you install PostgreSQL locally onto your computer, you can-- 212 00:09:18,810 --> 00:09:23,179 let me shut this down a little bit. 213 00:09:23,179 --> 00:09:24,970 If you install Postgres onto your computer, 214 00:09:24,970 --> 00:09:28,510 you can run a command like PSQL for PostgreSQL, 215 00:09:28,510 --> 00:09:33,910 followed by that database URL, and then you're going to be inside of a terminal 216 00:09:33,910 --> 00:09:35,590 where you can begin to run queries. 217 00:09:35,590 --> 00:09:38,960 I could just type, Create Table Here, and that will work just fine. 218 00:09:38,960 --> 00:09:43,450 So I could actually take this code, Create Table Flights, and just go ahead 219 00:09:43,450 --> 00:09:47,470 and paste it in to Postgres here, and press Return. 220 00:09:47,470 --> 00:09:48,500 And if all goes well-- 221 00:09:48,500 --> 00:09:54,800 222 00:09:54,800 --> 00:09:55,310 OK, great. 223 00:09:55,310 --> 00:09:56,610 It created the table. 224 00:09:56,610 --> 00:10:01,020 And if I do backslash dT is the command to list all the tables-- 225 00:10:01,020 --> 00:10:03,350 looks like internet connection is a little bit slow-- 226 00:10:03,350 --> 00:10:06,170 but I can see that, great, I have a Flights Table there now. 227 00:10:06,170 --> 00:10:08,390 I'm going to go ahead and drop table flights 228 00:10:08,390 --> 00:10:11,300 to delete that table so I can show you a different way 229 00:10:11,300 --> 00:10:15,260 to create the table if you'd like to without using the command line. 230 00:10:15,260 --> 00:10:17,780 There are a number of services that you can use to interact 231 00:10:17,780 --> 00:10:20,180 with Postgres in a graphical way. 232 00:10:20,180 --> 00:10:23,280 Much in the way that we had phpLiteAdmin in CS50 233 00:10:23,280 --> 00:10:26,210 to be able to interact with a SQLite database. 234 00:10:26,210 --> 00:10:30,290 CS50 offers one such service that we host called Adminer. 235 00:10:30,290 --> 00:10:33,740 So if you go to Adminer.cs50.io, you'll be 236 00:10:33,740 --> 00:10:36,790 taken to a user interface that looks a little bit like this 237 00:10:36,790 --> 00:10:39,164 that will let you log in to a Postgres database 238 00:10:39,164 --> 00:10:42,080 by providing information about the server, the username, the password, 239 00:10:42,080 --> 00:10:43,170 and the database. 240 00:10:43,170 --> 00:10:46,880 So if I go ahead and do this, take the server name, the host name, 241 00:10:46,880 --> 00:10:51,260 paste that in there, take the username, paste that into username, 242 00:10:51,260 --> 00:10:54,890 take the password, paste that into a password, and take the database name 243 00:10:54,890 --> 00:11:00,500 and paste that into database, I can go ahead and log into this database. 244 00:11:00,500 --> 00:11:04,220 And what I'll see when I log in is an interface very much like phpLiteAdmin, 245 00:11:04,220 --> 00:11:07,490 an interface where I can begin to execute queries, 246 00:11:07,490 --> 00:11:11,600 and I can look at my data and manage my data in that way as well. 247 00:11:11,600 --> 00:11:14,670 So in order to create a table, there are a number of ways to do this. 248 00:11:14,670 --> 00:11:16,962 I could just click the Create Table button, which 249 00:11:16,962 --> 00:11:19,670 will take me to a graphical interface where I can name the table, 250 00:11:19,670 --> 00:11:22,340 give it columns, much as you may have done in CS50. 251 00:11:22,340 --> 00:11:24,334 I could also just type raw SQL commands. 252 00:11:24,334 --> 00:11:26,000 And so that's what I'm going to do here. 253 00:11:26,000 --> 00:11:28,970 Just type SQL Command over here on the left hand side, which 254 00:11:28,970 --> 00:11:31,520 is going to open up a place where I can just run SQL queries, 255 00:11:31,520 --> 00:11:33,200 any queries that I want to run. 256 00:11:33,200 --> 00:11:36,680 And in this case, I'm just going to take this Create Table query, 257 00:11:36,680 --> 00:11:41,050 paste it in here, and go ahead and execute that query. 258 00:11:41,050 --> 00:11:41,550 All right. 259 00:11:41,550 --> 00:11:43,766 So it said, Query Executed OK. 260 00:11:43,766 --> 00:11:45,640 And now if I look here on the left hand side, 261 00:11:45,640 --> 00:11:48,870 you'll see that I do now have this table, called Flights. 262 00:11:48,870 --> 00:11:52,350 And that table, called Flights, has an ID, it has an origin, 263 00:11:52,350 --> 00:11:55,760 it has a destination, and it has a duration. 264 00:11:55,760 --> 00:11:58,580 So long story short, I went to Heroku, set up a new application, 265 00:11:58,580 --> 00:12:00,756 attached a Postgres database to it. 266 00:12:00,756 --> 00:12:02,630 That gave me access to all of the credentials 267 00:12:02,630 --> 00:12:05,570 for the database, username, and password, and host name, and such. 268 00:12:05,570 --> 00:12:07,904 Then you can access that database in any number of ways. 269 00:12:07,904 --> 00:12:09,903 If you have Postgres installed on your computer, 270 00:12:09,903 --> 00:12:11,960 you can access the database via the command line. 271 00:12:11,960 --> 00:12:14,690 Otherwise you can just use the web interface using something 272 00:12:14,690 --> 00:12:17,900 like Adminer.cs50.io to log into the database, 273 00:12:17,900 --> 00:12:20,370 and then to be able to manipulate it. 274 00:12:20,370 --> 00:12:22,200 Questions about anything so far? 275 00:12:22,200 --> 00:12:22,836 Yeah? 276 00:12:22,836 --> 00:12:26,804 AUDIENCE: Is there invalid [INAUDIBLE] to varchar columns? 277 00:12:26,804 --> 00:12:31,764 Like, [INAUDIBLE] for the kind of string [INAUDIBLE],, 278 00:12:31,764 --> 00:12:34,492 but like an emoji, or something along those lines? 279 00:12:34,492 --> 00:12:38,720 Are there things that varchar can't force into into a string? 280 00:12:38,720 --> 00:12:41,660 SPEAKER: Are there things that varchar can't force into a string? 281 00:12:41,660 --> 00:12:43,670 There might be binary data types. 282 00:12:43,670 --> 00:12:46,580 You probably wouldn't be able to store an image for instance, 283 00:12:46,580 --> 00:12:49,670 there are different types for dealing with binary data that are not just 284 00:12:49,670 --> 00:12:52,211 plain strings, and so you'll probably want to pick and choose 285 00:12:52,211 --> 00:12:54,335 the type appropriately for that. 286 00:12:54,335 --> 00:12:54,835 Yeah? 287 00:12:54,835 --> 00:12:57,860 AUDIENCE: Can you go through the IDE to get to the [INAUDIBLE]?? 288 00:12:57,860 --> 00:13:00,440 SPEAKER: Can you go through the IDE? 289 00:13:00,440 --> 00:13:02,992 There may be a way, depending on the version. 290 00:13:02,992 --> 00:13:05,700 But generally speaking, this is just accessible via the internet, 291 00:13:05,700 --> 00:13:07,910 so you'll just go to a URL and you'll be able to access it. 292 00:13:07,910 --> 00:13:08,410 Yeah? 293 00:13:08,410 --> 00:13:13,330 AUDIENCE: I'm not too sure how the [INAUDIBLE] Adminer [INAUDIBLE] 294 00:13:13,330 --> 00:13:15,626 related to the Heroku. 295 00:13:15,626 --> 00:13:16,750 SPEAKER: Oh, good question. 296 00:13:16,750 --> 00:13:20,290 What's the relationship between Adminer here on CS50 and the Heroku database? 297 00:13:20,290 --> 00:13:23,740 Heroku has set up the database for me, and is effectively 298 00:13:23,740 --> 00:13:24,640 hosting the database. 299 00:13:24,640 --> 00:13:26,556 So if you took a look at the host name, you'll 300 00:13:26,556 --> 00:13:29,470 see the database is actually being hosted on Amazon servers. 301 00:13:29,470 --> 00:13:33,430 And all Adminer.cs50.io is doing, is it's connecting me 302 00:13:33,430 --> 00:13:36,100 to a database located somewhere else on the internet, 303 00:13:36,100 --> 00:13:39,760 and I'm interacting with it via this graphical interface. 304 00:13:39,760 --> 00:13:44,770 And so the database itself is being hosted on Amazon servers somewhere, 305 00:13:44,770 --> 00:13:48,280 but so long as I know the credentials, like what URL to access, what username, 306 00:13:48,280 --> 00:13:51,340 what password, I can access that database on Amazon servers 307 00:13:51,340 --> 00:13:52,420 in any number of ways. 308 00:13:52,420 --> 00:13:55,900 I can access it via the command line, as you saw me do a moment ago, 309 00:13:55,900 --> 00:14:00,280 or you can access it via any number of database browser applications, of which 310 00:14:00,280 --> 00:14:03,234 Adminer is one example of it, whereby you just type in the credentials 311 00:14:03,234 --> 00:14:05,650 for the database and that will connect you to the database 312 00:14:05,650 --> 00:14:08,161 and let you interact with it online. 313 00:14:08,161 --> 00:14:08,660 Yeah? 314 00:14:08,660 --> 00:14:11,570 AUDIENCE: So do you solve the database locally? 315 00:14:11,570 --> 00:14:15,450 Would you still be able to do [INAUDIBLE] 316 00:14:15,450 --> 00:14:17,397 317 00:14:17,397 --> 00:14:20,480 SPEAKER: So if you have the command line application, the PSQL application 318 00:14:20,480 --> 00:14:23,271 that I was using a moment ago, you can run that on the command line 319 00:14:23,271 --> 00:14:25,790 to access any database on the internet by just 320 00:14:25,790 --> 00:14:29,480 typing in the URL for that particular database, which 321 00:14:29,480 --> 00:14:30,980 is what you saw me do a moment ago. 322 00:14:30,980 --> 00:14:34,640 It's also possible to not host your database on Heroku servers, 323 00:14:34,640 --> 00:14:36,841 to just host the database locally on your computer, 324 00:14:36,841 --> 00:14:39,590 and then it's not going to be accessible by anyone on the internet 325 00:14:39,590 --> 00:14:41,410 unless you open it up to make it so. 326 00:14:41,410 --> 00:14:45,054 327 00:14:45,054 --> 00:14:46,220 All right, we'll keep going. 328 00:14:46,220 --> 00:14:49,970 329 00:14:49,970 --> 00:14:50,470 All right. 330 00:14:50,470 --> 00:14:53,735 So we saw the Create Table Query, creating a table called Flights. 331 00:14:53,735 --> 00:14:56,110 When you create a table, there are additional constraints 332 00:14:56,110 --> 00:14:57,590 that you can put on those tables. 333 00:14:57,590 --> 00:14:59,620 And so you may have seen some of these in CS50. 334 00:14:59,620 --> 00:15:02,090 We can specify not null on a column. 335 00:15:02,090 --> 00:15:05,980 So if I said, origin is a varchar and it's not null, 336 00:15:05,980 --> 00:15:08,830 that would mean that the column cannot be empty, 337 00:15:08,830 --> 00:15:10,450 so there needs to be something there. 338 00:15:10,450 --> 00:15:13,960 Unique is another keyword you might use, if something needs to be unique. 339 00:15:13,960 --> 00:15:16,540 So you might imagine, if users are signing up for a website, 340 00:15:16,540 --> 00:15:20,260 maybe the email column is going to be a unique column, because you only 341 00:15:20,260 --> 00:15:24,610 want each email address to be associated with at most one account, for example. 342 00:15:24,610 --> 00:15:26,790 Primary key is one that we saw before. 343 00:15:26,790 --> 00:15:29,650 Default is something you can use to specify what the default 344 00:15:29,650 --> 00:15:31,430 value of a column should be. 345 00:15:31,430 --> 00:15:36,880 So for instance, if you wanted to say that the default value of some integer 346 00:15:36,880 --> 00:15:39,460 should be 0, or 1, or whatever number you want it to be, 347 00:15:39,460 --> 00:15:42,430 you can specify some integer column and then say, default 0 348 00:15:42,430 --> 00:15:44,560 to mean the default value of the column is 0. 349 00:15:44,560 --> 00:15:48,070 If I don't specify a value, here's what the default value should be. 350 00:15:48,070 --> 00:15:50,140 And then check is an additional one that you 351 00:15:50,140 --> 00:15:54,030 can use to add a constraint to a column that is Boolean logic. 352 00:15:54,030 --> 00:15:58,360 I could say, I want a column called age that is an integer. 353 00:15:58,360 --> 00:16:01,026 And then I could say, check age greater than 0. 354 00:16:01,026 --> 00:16:04,150 In other words, make sure that the age that I'm inserting into the database 355 00:16:04,150 --> 00:16:05,140 is positive. 356 00:16:05,140 --> 00:16:08,410 I don't want to allow invalid input into the database like negative ages, 357 00:16:08,410 --> 00:16:09,440 for example. 358 00:16:09,440 --> 00:16:13,270 And so you can use check to enforce certain constraints on part 359 00:16:13,270 --> 00:16:16,930 of the database itself, so the database will manage those constraints 360 00:16:16,930 --> 00:16:18,339 automatically for you. 361 00:16:18,339 --> 00:16:20,380 So additional constraints that you can use there. 362 00:16:20,380 --> 00:16:20,993 Yeah? 363 00:16:20,993 --> 00:16:24,444 AUDIENCE: If you [INAUDIBLE] data on the check [INAUDIBLE] 364 00:16:24,444 --> 00:16:26,560 are you going to have an error message? 365 00:16:26,560 --> 00:16:28,801 SPEAKER: You'll get an error message, yeah. 366 00:16:28,801 --> 00:16:29,300 Yeah? 367 00:16:29,300 --> 00:16:31,508 AUDIENCE: Can you use regular expressions with check? 368 00:16:31,508 --> 00:16:32,940 Or is just like, [INAUDIBLE]? 369 00:16:32,940 --> 00:16:35,062 SPEAKER: You can use regular expressions, maybe. 370 00:16:35,062 --> 00:16:37,020 You'll have to check the documentation on that. 371 00:16:37,020 --> 00:16:37,860 I'm not 100% sure. 372 00:16:37,860 --> 00:16:40,962 373 00:16:40,962 --> 00:16:42,920 All right, so we've created the table, now we'd 374 00:16:42,920 --> 00:16:45,590 like to insert data into the table, to actually add information 375 00:16:45,590 --> 00:16:46,460 into the table. 376 00:16:46,460 --> 00:16:50,660 The syntax for this looks pretty much the same between PostgreSQL and SQLite. 377 00:16:50,660 --> 00:16:52,640 You'll use syntax that looks like this. 378 00:16:52,640 --> 00:16:57,020 In Postgres, the single quotes are how you define strings, not double quotes. 379 00:16:57,020 --> 00:16:59,930 And we'll say something like, insert into, 380 00:16:59,930 --> 00:17:01,642 meaning add something into a table. 381 00:17:01,642 --> 00:17:03,350 Which table do I want to add things into? 382 00:17:03,350 --> 00:17:05,569 I want to add things into the Flights Table. 383 00:17:05,569 --> 00:17:08,720 Next, in parentheses come, what are the column 384 00:17:08,720 --> 00:17:10,700 names that I want to add things into? 385 00:17:10,700 --> 00:17:13,520 I want to add which particular values, I want to add an origin, 386 00:17:13,520 --> 00:17:17,450 I want to add a destination, and I want to add a duration. 387 00:17:17,450 --> 00:17:20,270 Now which column on my table is missing from these column names 388 00:17:20,270 --> 00:17:21,300 that I specified? 389 00:17:21,300 --> 00:17:22,419 AUDIENCE: ID. 390 00:17:22,419 --> 00:17:23,210 SPEAKER: ID, great. 391 00:17:23,210 --> 00:17:24,793 And why do I not need to specify that? 392 00:17:24,793 --> 00:17:26,170 AUDIENCE: [INAUDIBLE] 393 00:17:26,170 --> 00:17:28,170 SPEAKER: It will automatically increment, great. 394 00:17:28,170 --> 00:17:31,378 It automatically adds to the ID, so I don't need to worry about inserting it. 395 00:17:31,378 --> 00:17:34,079 And if I had other columns that had default values for instance, 396 00:17:34,079 --> 00:17:36,370 I wouldn't necessarily need to include them here either 397 00:17:36,370 --> 00:17:38,680 if I wanted them to just take on their default values. 398 00:17:38,680 --> 00:17:42,310 I only include the columns that I actually want to insert values for. 399 00:17:42,310 --> 00:17:45,520 Because after the keyword VALUES in all capital letters comes, 400 00:17:45,520 --> 00:17:48,610 what are the actual values I want to insert into this database? 401 00:17:48,610 --> 00:17:52,730 I want to insert New York as the origin, London as the destination, 402 00:17:52,730 --> 00:17:57,000 and 415 minutes as the duration for that particular flight. 403 00:17:57,000 --> 00:17:59,605 And so that that query will allow me to do that insert 404 00:17:59,605 --> 00:18:03,790 So I'll show you an example of that using Adminer's interface, at least 405 00:18:03,790 --> 00:18:04,940 for now. 406 00:18:04,940 --> 00:18:07,760 I'll go ahead and go to SQL commands. 407 00:18:07,760 --> 00:18:12,500 And here, you could write something like, insert into flights. 408 00:18:12,500 --> 00:18:16,259 Origin, destination, and-- what was the last one-- duration. 409 00:18:16,259 --> 00:18:18,050 And then insert particular values, and I'll 410 00:18:18,050 --> 00:18:22,582 say, OK, New York, and London, and 415. 411 00:18:22,582 --> 00:18:24,290 And I can add multiple different flights. 412 00:18:24,290 --> 00:18:25,914 I'll go ahead and run a couple queries. 413 00:18:25,914 --> 00:18:29,330 Instead of New York, let's say Shanghai. 414 00:18:29,330 --> 00:18:35,750 And this flight is going to go to Paris, and that's going to take 760 minutes. 415 00:18:35,750 --> 00:18:37,730 And I'll add one more. 416 00:18:37,730 --> 00:18:43,580 And we'll say this flight is going to go from Istanbul to Turkey, 417 00:18:43,580 --> 00:18:46,340 and it's going to be 700 minutes. 418 00:18:46,340 --> 00:18:51,122 So just adding some sample data that I can then use in order 419 00:18:51,122 --> 00:18:52,330 to insert a bunch of flights. 420 00:18:52,330 --> 00:18:55,160 So I say, OK, these queries have all executed OK. 421 00:18:55,160 --> 00:18:58,310 One row affected, meaning I've added a row into these databases. 422 00:18:58,310 --> 00:19:01,115 If I go into Flights and go to Select Data-- 423 00:19:01,115 --> 00:19:03,740 and of course there are a bunch of different database browsers, 424 00:19:03,740 --> 00:19:07,040 Adminer is just one of them, but they all operate in fairly similar ways. 425 00:19:07,040 --> 00:19:11,150 I can see here, I've got my table where I've got three rows, each of which 426 00:19:11,150 --> 00:19:15,170 has a unique idea, each one has an origin origin, a destination, 427 00:19:15,170 --> 00:19:18,590 and a duration associated with them as well. 428 00:19:18,590 --> 00:19:22,170 Just for the sake of giving myself a little more data to work with, 429 00:19:22,170 --> 00:19:24,800 I'm going to go into-- 430 00:19:24,800 --> 00:19:31,257 insert-- I'm going to go ahead and grab three extra rows and just insert those. 431 00:19:31,257 --> 00:19:33,590 So I'll go ahead and run three more insert queries, just 432 00:19:33,590 --> 00:19:35,173 to give myself a little bit more data. 433 00:19:35,173 --> 00:19:38,080 That'll make things more interesting in just a moment. 434 00:19:38,080 --> 00:19:42,750 And so now if I look at my Flights Table, Select Data, all right, great. 435 00:19:42,750 --> 00:19:45,540 I've got six rows inside the database now. 436 00:19:45,540 --> 00:19:49,560 437 00:19:49,560 --> 00:19:51,814 Questions about inserting data into a database. 438 00:19:51,814 --> 00:19:53,730 This much at least should be review from CS50. 439 00:19:53,730 --> 00:19:55,969 440 00:19:55,969 --> 00:19:58,760 All right, we'll briefly go through selecting data from a database. 441 00:19:58,760 --> 00:20:01,135 Again, this is mostly review from CS50, but good practice 442 00:20:01,135 --> 00:20:02,930 if you haven't used SQL in a little while. 443 00:20:02,930 --> 00:20:04,780 The basic way to select things from a database 444 00:20:04,780 --> 00:20:06,530 is a query that looks something like this. 445 00:20:06,530 --> 00:20:08,470 Select star from flights. 446 00:20:08,470 --> 00:20:10,600 Flights is the table I want to select things from. 447 00:20:10,600 --> 00:20:12,640 Star means I want to select every column, 448 00:20:12,640 --> 00:20:15,010 I care about getting all the columns from the database. 449 00:20:15,010 --> 00:20:16,330 And that might not always be true. 450 00:20:16,330 --> 00:20:18,579 It might be more efficient to only ask for the columns 451 00:20:18,579 --> 00:20:19,788 that you actually care about. 452 00:20:19,788 --> 00:20:22,662 But if I want to select all the columns from the flight's table, what 453 00:20:22,662 --> 00:20:24,160 I'll get back is all of the data. 454 00:20:24,160 --> 00:20:26,560 Every row in the database is going to come back to me, 455 00:20:26,560 --> 00:20:29,281 for this particular table at least. 456 00:20:29,281 --> 00:20:31,030 But maybe I don't care about the duration, 457 00:20:31,030 --> 00:20:33,010 maybe I don't care about the ID, I only care 458 00:20:33,010 --> 00:20:36,400 about knowing, OK, where flights coming from and where are they going to? 459 00:20:36,400 --> 00:20:38,920 I can revise the query to look a little something like this. 460 00:20:38,920 --> 00:20:42,640 Select the origin, the destination from the flight table, 461 00:20:42,640 --> 00:20:44,980 just specifying the columns that I want to select. 462 00:20:44,980 --> 00:20:47,590 And rather than selecting all of the columns from the table, 463 00:20:47,590 --> 00:20:50,770 that's going to result in just selecting the two columns that I actually 464 00:20:50,770 --> 00:20:54,580 care about, the origin column and the destination column. 465 00:20:54,580 --> 00:20:57,200 Still here I'm selecting every single row, 466 00:20:57,200 --> 00:21:00,370 so you can also add what's called a WHERE clause to specify 467 00:21:00,370 --> 00:21:03,010 which rows you actually want to select. 468 00:21:03,010 --> 00:21:06,580 So you can write something like, select star from flights where ID equals 3. 469 00:21:06,580 --> 00:21:08,950 In other words, just select flight number three and that 470 00:21:08,950 --> 00:21:11,110 will only return all of the columns-- 471 00:21:11,110 --> 00:21:12,760 as indicated by the star-- 472 00:21:12,760 --> 00:21:17,999 from the table row that has an ID of three. 473 00:21:17,999 --> 00:21:20,290 All of this at least so far should be review from CS50, 474 00:21:20,290 --> 00:21:21,581 but questions before I move on? 475 00:21:21,581 --> 00:21:24,620 476 00:21:24,620 --> 00:21:26,564 Queries can return multiple roles as well. 477 00:21:26,564 --> 00:21:29,480 If I did a query that looked a little something like this, select star 478 00:21:29,480 --> 00:21:31,940 from flights where the origin is equal to New York, 479 00:21:31,940 --> 00:21:35,450 well that's going to select only the flights where the origin column has 480 00:21:35,450 --> 00:21:39,262 a value of New York, and that will return just those two, for instance. 481 00:21:39,262 --> 00:21:40,970 And it doesn't just have to be a quality. 482 00:21:40,970 --> 00:21:44,690 You can use other Boolean expressions here as well, greater than, less than. 483 00:21:44,690 --> 00:21:46,850 For instance, I could say, select all the flights 484 00:21:46,850 --> 00:21:49,460 who have a duration of greater than some number of minutes, 485 00:21:49,460 --> 00:21:51,376 and what I'll get back is all of the rows that 486 00:21:51,376 --> 00:21:55,070 match that query, only the longest flights, for example. 487 00:21:55,070 --> 00:21:57,050 And because they're Boolean expressions, you 488 00:21:57,050 --> 00:22:00,500 can combine them in order to combine multiple Boolean expressions 489 00:22:00,500 --> 00:22:01,850 to create more complex queries. 490 00:22:01,850 --> 00:22:04,280 I could say something like, select start from flights 491 00:22:04,280 --> 00:22:08,300 where the destination is Paris and the duration is greater than 500, 492 00:22:08,300 --> 00:22:11,300 and that will only give me flights that have a destination to Paris 493 00:22:11,300 --> 00:22:14,780 and that are longer than 500 minutes, even if there are other flights that 494 00:22:14,780 --> 00:22:17,116 are going to Paris, even if there are other flights that 495 00:22:17,116 --> 00:22:17,990 have a long duration. 496 00:22:17,990 --> 00:22:20,610 497 00:22:20,610 --> 00:22:22,920 Of course if you change the and to an or, that will 498 00:22:22,920 --> 00:22:24,630 create a different Boolean expression. 499 00:22:24,630 --> 00:22:27,330 That will return to you anything that either is going to Paris, 500 00:22:27,330 --> 00:22:28,984 or is a long flight, or both. 501 00:22:28,984 --> 00:22:30,150 And that's totally fine too. 502 00:22:30,150 --> 00:22:33,624 503 00:22:33,624 --> 00:22:35,040 This is all stuff you saw in CS50. 504 00:22:35,040 --> 00:22:37,915 There are a couple other types of queries that we didn't see in CS50. 505 00:22:37,915 --> 00:22:40,710 So things like, if I want for a column to be 506 00:22:40,710 --> 00:22:44,754 in some possible set of possible values as opposed to one particular value, 507 00:22:44,754 --> 00:22:46,920 I might have a query that looks something like this. 508 00:22:46,920 --> 00:22:49,860 Select star from flights where the origin is in-- 509 00:22:49,860 --> 00:22:52,504 and then this tuple of New York and Lima. 510 00:22:52,504 --> 00:22:55,170 In other words, I want the origin to be one of those two cities, 511 00:22:55,170 --> 00:22:57,336 but I don't really care which, just return to me all 512 00:22:57,336 --> 00:22:59,470 of the rows that match either one. 513 00:22:59,470 --> 00:23:03,690 And so we can do a query like that in order to get those results. 514 00:23:03,690 --> 00:23:06,890 And sometimes I want to be able to perform a search, 515 00:23:06,890 --> 00:23:10,730 but I don't quite yet have the entire column name. 516 00:23:10,730 --> 00:23:12,509 Like maybe I want to be able to implement 517 00:23:12,509 --> 00:23:14,300 some sort of auto-complete feature, whereby 518 00:23:14,300 --> 00:23:17,690 if I start typing the name of a city, my airline application is 519 00:23:17,690 --> 00:23:19,924 able to figure out what flight I'm referring to, 520 00:23:19,924 --> 00:23:21,590 even if I don't type in the whole thing. 521 00:23:21,590 --> 00:23:23,687 Or you might imagine that search engines do things 522 00:23:23,687 --> 00:23:26,270 like this, where you can type in a word, and even if it's only 523 00:23:26,270 --> 00:23:28,460 a partial match of something, it will ultimately 524 00:23:28,460 --> 00:23:30,287 result in matching that particular row. 525 00:23:30,287 --> 00:23:32,120 And so to do that, you might use syntax that 526 00:23:32,120 --> 00:23:34,220 looks a little something like this. 527 00:23:34,220 --> 00:23:38,210 Select star from flights where origin is like-- 528 00:23:38,210 --> 00:23:43,530 keyword like-- and then %a%. 529 00:23:43,530 --> 00:23:46,310 The percent sign here is a wildcard, basically means, 530 00:23:46,310 --> 00:23:48,020 this could be anything. 531 00:23:48,020 --> 00:23:52,250 And so what I'm here saying is, the origin has an a, 532 00:23:52,250 --> 00:23:56,220 and before that could be anything, and after that could be anything. 533 00:23:56,220 --> 00:23:59,660 In other words, get me any flight where the origin has the letter a in it 534 00:23:59,660 --> 00:24:03,497 somewhere, and I'll get back all the flights that have an a in it somewhere. 535 00:24:03,497 --> 00:24:05,330 So you imagine an application where I'm just 536 00:24:05,330 --> 00:24:09,140 typing in where I want my flight to be going from, I type the letter a, 537 00:24:09,140 --> 00:24:12,000 we might run a query like this, return the possible results, 538 00:24:12,000 --> 00:24:14,450 and those are going to be the matches for all the flights 539 00:24:14,450 --> 00:24:17,840 whose origin matches a particular expression like that. 540 00:24:17,840 --> 00:24:21,605 541 00:24:21,605 --> 00:24:22,230 Yeah, question? 542 00:24:22,230 --> 00:24:24,360 AUDIENCE: Is that case sensitive? 543 00:24:24,360 --> 00:24:26,460 SPEAKER: Is that case sensitive? 544 00:24:26,460 --> 00:24:29,552 It's probably going to depend a little bit on the database implementation. 545 00:24:29,552 --> 00:24:31,760 There are some databases that are not case sensitive, 546 00:24:31,760 --> 00:24:33,718 and I can't remember exactly which one this is. 547 00:24:33,718 --> 00:24:37,872 548 00:24:37,872 --> 00:24:40,830 In addition to that, there are some functions that you can use as well. 549 00:24:40,830 --> 00:24:42,720 I can sum up rows, I can count the number 550 00:24:42,720 --> 00:24:45,000 of rows that match something, get the min, or the max, 551 00:24:45,000 --> 00:24:46,350 or the average of something. 552 00:24:46,350 --> 00:24:50,500 And so I'll show you some examples of that as well. 553 00:24:50,500 --> 00:24:54,990 So let's go ahead and go into SQL Command where I can run a SQL command. 554 00:24:54,990 --> 00:24:57,240 And the basic command is just, select star 555 00:24:57,240 --> 00:25:03,710 from flights, which I can type in there, click Execute, and the result of that 556 00:25:03,710 --> 00:25:05,060 is going to be-- 557 00:25:05,060 --> 00:25:07,070 all right, all of the rows, as I can see here. 558 00:25:07,070 --> 00:25:09,770 Select star from flights just returns me everything. 559 00:25:09,770 --> 00:25:16,760 I could also say something like, all right, SELECT COUNT star from flights. 560 00:25:16,760 --> 00:25:18,860 Count star-- star just meaning all the columns, 561 00:25:18,860 --> 00:25:21,470 and count just meaning count these things-- 562 00:25:21,470 --> 00:25:23,330 means, get me just the number of flights, 563 00:25:23,330 --> 00:25:26,330 if I just care about the number of flights that have a certain property. 564 00:25:26,330 --> 00:25:30,260 I press Execute and I get back a table whose column is count 565 00:25:30,260 --> 00:25:35,760 and just has a single row, and that rows value is 6, for example. 566 00:25:35,760 --> 00:25:37,620 I could add to this query, SELECT COUNT star 567 00:25:37,620 --> 00:25:43,500 from flights where the origin equals New York, for example, to just get 568 00:25:43,500 --> 00:25:46,640 the number of flights whose origin is New York. 569 00:25:46,640 --> 00:25:48,050 Press Execute. 570 00:25:48,050 --> 00:25:50,170 And all right, what I get back is 2. 571 00:25:50,170 --> 00:25:53,671 And now, you can imagine that this is not necessary for your web application, 572 00:25:53,671 --> 00:25:54,170 right? 573 00:25:54,170 --> 00:25:57,260 If you wanted to know the number of flights that are going from New York, 574 00:25:57,260 --> 00:26:00,982 you could just query for all the flights that are from New York in Python, 575 00:26:00,982 --> 00:26:03,440 for example, as you might have done in finance or something 576 00:26:03,440 --> 00:26:07,570 similar, and then just count the number of rows that come back. 577 00:26:07,570 --> 00:26:15,620 But why might this be preferable, using built in SQL functions? 578 00:26:15,620 --> 00:26:16,120 Yeah? 579 00:26:16,120 --> 00:26:17,354 AUDIENCE: A little faster? 580 00:26:17,354 --> 00:26:19,020 SPEAKER: Yeah, it's a little bit faster. 581 00:26:19,020 --> 00:26:20,936 The database is only returning the information 582 00:26:20,936 --> 00:26:22,670 I actually care about, just the count. 583 00:26:22,670 --> 00:26:24,770 And it's not returning the information I don't care about. 584 00:26:24,770 --> 00:26:27,470 If I don't care about where any of these New York flights are going, 585 00:26:27,470 --> 00:26:30,511 and I just want to know how many results there are, then all I need to do 586 00:26:30,511 --> 00:26:31,880 is return this count. 587 00:26:31,880 --> 00:26:33,950 And so you can do this for other things as well. 588 00:26:33,950 --> 00:26:37,700 I could say, all right, give me the average duration 589 00:26:37,700 --> 00:26:39,342 from the Flights Table. 590 00:26:39,342 --> 00:26:42,050 In other words, take all the flights, take their duration column, 591 00:26:42,050 --> 00:26:44,820 and just go ahead and give me the average duration. 592 00:26:44,820 --> 00:26:48,000 And I can press Execute, and all right. 593 00:26:48,000 --> 00:26:52,077 The average duration is 501 and 2/3 minutes, for example. 594 00:26:52,077 --> 00:26:53,910 I can add to that, like the average duration 595 00:26:53,910 --> 00:26:57,090 of flights to New York or whatnot, in order 596 00:26:57,090 --> 00:26:59,660 to add arbitrary logic to my queries as well. 597 00:26:59,660 --> 00:27:03,659 598 00:27:03,659 --> 00:27:04,950 Questions about selecting data? 599 00:27:04,950 --> 00:27:09,230 600 00:27:09,230 --> 00:27:09,730 All right. 601 00:27:09,730 --> 00:27:12,819 Updating data, again, very similar to what we saw in CS50, 602 00:27:12,819 --> 00:27:14,110 it's just going to be a review. 603 00:27:14,110 --> 00:27:16,026 The syntax looks a little something like this. 604 00:27:16,026 --> 00:27:18,150 You update the flight name, called Flights-- 605 00:27:18,150 --> 00:27:20,020 or the table name called Flights. 606 00:27:20,020 --> 00:27:24,280 Setting a column to a particular value, so setting duration of 430 607 00:27:24,280 --> 00:27:26,002 where a particular condition is true. 608 00:27:26,002 --> 00:27:28,210 So the New York to London flight, I want that to have 609 00:27:28,210 --> 00:27:33,040 a duration of 430, for example. 610 00:27:33,040 --> 00:27:35,542 And deleting flights, again, just a review from CS50 611 00:27:35,542 --> 00:27:37,000 looks a little something like this. 612 00:27:37,000 --> 00:27:39,200 DELETE FROM, name of the table. 613 00:27:39,200 --> 00:27:41,530 And then if I just had DELETE FROM Flights, 614 00:27:41,530 --> 00:27:44,770 that would delete every flight, which is not what I want to do, hopefully. 615 00:27:44,770 --> 00:27:46,300 What I want to do is specify a condition, 616 00:27:46,300 --> 00:27:48,133 delete from flights where the destination is 617 00:27:48,133 --> 00:27:51,210 Tokyo, if for some reason I didn't want any of my flights to go to Tokyo. 618 00:27:51,210 --> 00:27:58,562 You could do a command like that in order to get that result. 619 00:27:58,562 --> 00:27:59,770 Questions on anything so far? 620 00:27:59,770 --> 00:28:02,540 621 00:28:02,540 --> 00:28:05,540 There are other clauses that you can add to your queries as well. 622 00:28:05,540 --> 00:28:08,270 If I say, select star from Flights, LIMIT 5, 623 00:28:08,270 --> 00:28:11,720 for example, that will just get me five flights instead of all of the flights, 624 00:28:11,720 --> 00:28:14,840 if I only wanted to get a subset of them. 625 00:28:14,840 --> 00:28:18,370 ORDER BY is a way that you can sort what order the results come back in. 626 00:28:18,370 --> 00:28:22,250 And I could say, select star from Flights, ORDER BY duration, and then 627 00:28:22,250 --> 00:28:25,280 ASC for ascending, or DESC for descending, 628 00:28:25,280 --> 00:28:29,170 in order to get all of the flights sorted by their duration, for instance. 629 00:28:29,170 --> 00:28:30,170 And I can combine these. 630 00:28:30,170 --> 00:28:34,820 If I said, select star from flights, ORDER BY duration, and then LIMIT 5, 631 00:28:34,820 --> 00:28:37,520 I'll just get the five shortest flights, for example, 632 00:28:37,520 --> 00:28:39,530 because I'm sorting them in order of duration 633 00:28:39,530 --> 00:28:41,480 and then just getting the first five rows. 634 00:28:41,480 --> 00:28:43,640 So that works as well. 635 00:28:43,640 --> 00:28:46,500 GROUP BY is a way of grouping results together. 636 00:28:46,500 --> 00:28:48,890 So if I want to count how many flights are 637 00:28:48,890 --> 00:28:51,290 coming from each of the various different locations, 638 00:28:51,290 --> 00:28:54,470 you can group things by their origin as well, for instance. 639 00:28:54,470 --> 00:29:02,270 So I could say something like, select origin from Flights. 640 00:29:02,270 --> 00:29:05,540 And then if I say, GROUP BY origin, what that's going to do 641 00:29:05,540 --> 00:29:08,720 is it's going to take all of the rows that have the same origin and group 642 00:29:08,720 --> 00:29:11,707 them together into the same result. 643 00:29:11,707 --> 00:29:13,540 And so instead of just selecting origin, I'm 644 00:29:13,540 --> 00:29:15,890 going to select origin and count star. 645 00:29:15,890 --> 00:29:19,320 Recall that count star is going to count how many rows return 646 00:29:19,320 --> 00:29:21,360 for a particular query. 647 00:29:21,360 --> 00:29:24,485 And so I'll show you what this looks like, and then we'll look at the query 648 00:29:24,485 --> 00:29:26,010 and see why it works. 649 00:29:26,010 --> 00:29:27,090 And we get, OK. 650 00:29:27,090 --> 00:29:30,510 Shanghai, 1, New York, 2, Istanbul, 1, Lima, 1, Moscow, 1. 651 00:29:30,510 --> 00:29:33,570 And what that's basically doing is telling me how many 652 00:29:33,570 --> 00:29:36,630 flights have any particular origin. 653 00:29:36,630 --> 00:29:38,730 What I was doing is, by saying GROUP BY origin, 654 00:29:38,730 --> 00:29:41,670 I took all of the rows, grouped them by their origin, 655 00:29:41,670 --> 00:29:44,280 then I'm selecting just the name of the origin, 656 00:29:44,280 --> 00:29:47,999 and then the count of how many rows are there in that grouping. 657 00:29:47,999 --> 00:29:50,040 You could do the same thing with other functions. 658 00:29:50,040 --> 00:29:52,415 If I wanted to find the average length of a flight coming 659 00:29:52,415 --> 00:29:55,039 from any particular city, you could run the same query in order 660 00:29:55,039 --> 00:29:56,640 to get that information as well. 661 00:29:56,640 --> 00:30:00,120 And so grouped by you can use in order to collect results together 662 00:30:00,120 --> 00:30:04,064 in order to get more interesting queries there too. 663 00:30:04,064 --> 00:30:04,980 Questions on anything? 664 00:30:04,980 --> 00:30:10,970 665 00:30:10,970 --> 00:30:12,250 All right. 666 00:30:12,250 --> 00:30:18,810 Let's go on then and talk a little bit briefly about foreign keys. 667 00:30:18,810 --> 00:30:21,990 And so where SQL gets very powerful and where we really 668 00:30:21,990 --> 00:30:26,502 start to use it a lot more is in terms of relationships between tables, 669 00:30:26,502 --> 00:30:28,710 again, something that you saw in CS50, but that we'll 670 00:30:28,710 --> 00:30:30,100 explore a little more today. 671 00:30:30,100 --> 00:30:34,267 Well, we might have a flights table, but that flights table maybe 672 00:30:34,267 --> 00:30:35,850 wants to store additional information. 673 00:30:35,850 --> 00:30:38,760 In addition to just storing the origin and the destination, 674 00:30:38,760 --> 00:30:42,960 maybe we also want to store information like what the airport 675 00:30:42,960 --> 00:30:44,790 code is for that particular airport. 676 00:30:44,790 --> 00:30:47,430 So I could do something like this, have an origin, 677 00:30:47,430 --> 00:30:50,370 have the origin airport code, have the destination, 678 00:30:50,370 --> 00:30:55,440 have the destination airport code, and just add columns to my existing table. 679 00:30:55,440 --> 00:30:57,110 But this might not be the best design. 680 00:30:57,110 --> 00:30:58,860 So what I want you to start doing today is 681 00:30:58,860 --> 00:31:00,401 start thinking about database design. 682 00:31:00,401 --> 00:31:02,100 How should your tables be organized? 683 00:31:02,100 --> 00:31:03,600 Should you have multiple tables? 684 00:31:03,600 --> 00:31:05,850 What fields belong with which tables? 685 00:31:05,850 --> 00:31:08,280 And you imagine that if I start doing things like this, 686 00:31:08,280 --> 00:31:10,864 my database very quickly is going to start to get complicated. 687 00:31:10,864 --> 00:31:13,571 I'm going to run into situations where, OK, maybe now I also want 688 00:31:13,571 --> 00:31:16,260 to store the latitude and longitude of each of these airports, 689 00:31:16,260 --> 00:31:20,250 and now I'm adding more and more columns to this table. 690 00:31:20,250 --> 00:31:22,860 And so what I could do instead is separate things 691 00:31:22,860 --> 00:31:26,010 out into a different table, partition the database in order 692 00:31:26,010 --> 00:31:28,740 to separate out information that I want to store separately. 693 00:31:28,740 --> 00:31:33,840 Just have a locations table where every location has the location code and also 694 00:31:33,840 --> 00:31:38,580 the name of that particular airport, or the location where that airport is. 695 00:31:38,580 --> 00:31:43,364 And then my Flights Table is going to look like what? 696 00:31:43,364 --> 00:31:45,780 What would the flights table look like if I have locations 697 00:31:45,780 --> 00:31:47,902 stored in a structure like this? 698 00:31:47,902 --> 00:31:48,860 What columns do I need? 699 00:31:48,860 --> 00:31:49,070 Yeah? 700 00:31:49,070 --> 00:31:52,395 AUDIENCE: Well, you wouldn't need the-- if you had a flights table separately, 701 00:31:52,395 --> 00:31:54,770 you wouldn't need the whole [INAUDIBLE] airport. 702 00:31:54,770 --> 00:31:58,207 [INAUDIBLE] Or you could reverse it, right? 703 00:31:58,207 --> 00:31:58,790 SPEAKER: Yeah. 704 00:31:58,790 --> 00:32:02,540 You don't need to store now both the airport code and the name. 705 00:32:02,540 --> 00:32:05,480 You can just pick one column of this, ideally a unique column-- 706 00:32:05,480 --> 00:32:07,760 and the default choice is going to be the ID-- 707 00:32:07,760 --> 00:32:11,420 to say, if I know that the flight is going from location ID 1 708 00:32:11,420 --> 00:32:14,720 to location ID 2, then I know it's a flight from New York to Shanghai, 709 00:32:14,720 --> 00:32:15,395 for example. 710 00:32:15,395 --> 00:32:17,270 And so my flight's table can effectively just 711 00:32:17,270 --> 00:32:20,540 turn into this, whereby I have an ID for the flight, and then 712 00:32:20,540 --> 00:32:22,820 rather than storing the origin and the destination, 713 00:32:22,820 --> 00:32:26,690 I store the origin ID and the destination ID, 714 00:32:26,690 --> 00:32:29,400 and then the duration for that flight as well. 715 00:32:29,400 --> 00:32:31,847 And then if I wanted to store passengers on flights, 716 00:32:31,847 --> 00:32:33,680 I could do something along the lines of this 717 00:32:33,680 --> 00:32:35,720 where I store a whole bunch of people's names, 718 00:32:35,720 --> 00:32:38,620 and then just store what flight number they're associated with. 719 00:32:38,620 --> 00:32:40,370 And so the SQL gets very powerful when you 720 00:32:40,370 --> 00:32:45,380 begin to reference other tables by their ID, or by their primary key, 721 00:32:45,380 --> 00:32:48,046 and then be able to relate them together. 722 00:32:48,046 --> 00:32:49,420 And so we'll take a look at this. 723 00:32:49,420 --> 00:32:52,120 724 00:32:52,120 --> 00:32:57,290 I will now go ahead and create a Passengers Table. 725 00:32:57,290 --> 00:33:04,630 And if I go in to joins.SQL in just a moment, let me select some rows. 726 00:33:04,630 --> 00:33:06,680 I'm going to run a couple queries. 727 00:33:06,680 --> 00:33:09,160 I'm going to create a table called Passengers, where 728 00:33:09,160 --> 00:33:10,960 Passengers is going to have an ID. 729 00:33:10,960 --> 00:33:12,850 It's going to have a name of the passenger 730 00:33:12,850 --> 00:33:15,755 and it's going to have a flight ID, which is going to be an integer. 731 00:33:15,755 --> 00:33:18,130 And interestingly here, I added to the end of this query. 732 00:33:18,130 --> 00:33:21,820 I added flight ID integer references flights. 733 00:33:21,820 --> 00:33:24,190 In other words, I'm saying that this flight ID 734 00:33:24,190 --> 00:33:29,020 column is going to be a column that is going to reference some other table. 735 00:33:29,020 --> 00:33:31,630 And in particular, it's referencing the Flights Table. 736 00:33:31,630 --> 00:33:37,240 This flight ID is going to refer to the ID of the Flights Table. 737 00:33:37,240 --> 00:33:39,170 And now I can insert data into this table. 738 00:33:39,170 --> 00:33:41,920 I can add a whole bunch of names in the flights 739 00:33:41,920 --> 00:33:45,730 that they're associated with so that I can add passengers to this flight 740 00:33:45,730 --> 00:33:47,890 as well. 741 00:33:47,890 --> 00:33:51,400 So I'll go ahead and click Execute here. 742 00:33:51,400 --> 00:33:53,540 And all right, I've created a Passengers Table 743 00:33:53,540 --> 00:33:57,899 and have added a whole bunch of passengers into my database. 744 00:33:57,899 --> 00:33:58,940 Questions before I go on? 745 00:33:58,940 --> 00:34:06,160 746 00:34:06,160 --> 00:34:06,660 All right. 747 00:34:06,660 --> 00:34:10,170 So the next thing that I'll want to do is somehow join this information 748 00:34:10,170 --> 00:34:11,040 together. 749 00:34:11,040 --> 00:34:14,070 I have data stored separately now, I've got a Flights Table, 750 00:34:14,070 --> 00:34:15,690 and I've got a Passenger Table. 751 00:34:15,690 --> 00:34:18,239 But in many cases, I'll want to be able to access 752 00:34:18,239 --> 00:34:19,710 that information collectively. 753 00:34:19,710 --> 00:34:23,219 Like, I want to know, Alice, where is her flight departing from? 754 00:34:23,219 --> 00:34:25,320 And I want to know the origin of Alice's flight. 755 00:34:25,320 --> 00:34:27,570 And that information is not stored in a single table. 756 00:34:27,570 --> 00:34:31,650 It's stored in two tables, because Alice is stored in the Passengers Table, 757 00:34:31,650 --> 00:34:35,340 and her flight is being stored in a Flights Table separately, 758 00:34:35,340 --> 00:34:38,110 and I'd like to be able to relate that information together. 759 00:34:38,110 --> 00:34:41,909 And so there are a number of different types of joins that I can do. 760 00:34:41,909 --> 00:34:43,679 The default kind of JOIN that we'll do is 761 00:34:43,679 --> 00:34:45,960 what's called an INNER JOIN, which is basically 762 00:34:45,960 --> 00:34:48,960 just taking all of the rows that match between two tables 763 00:34:48,960 --> 00:34:52,020 and only returning their intersection, the results that 764 00:34:52,020 --> 00:34:53,670 have then come together. 765 00:34:53,670 --> 00:34:55,980 LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN 766 00:34:55,980 --> 00:34:59,220 are different types of joins that work in slightly different ways. 767 00:34:59,220 --> 00:35:01,920 We're not going to worry about them too much because INNER JOINS will usually 768 00:35:01,920 --> 00:35:05,003 take care of what we need to do, but just introducing yourself to the idea 769 00:35:05,003 --> 00:35:08,350 that there are other types of queries for other different situations, 770 00:35:08,350 --> 00:35:09,600 depending on your use case. 771 00:35:09,600 --> 00:35:11,790 Might be interesting, as well. 772 00:35:11,790 --> 00:35:15,420 But the syntax for what this is going to look like is going to look like this. 773 00:35:15,420 --> 00:35:18,200 774 00:35:18,200 --> 00:35:25,187 I would like to select information about all the passengers 775 00:35:25,187 --> 00:35:27,520 and where they're going from and where they're going to. 776 00:35:27,520 --> 00:35:30,100 So if I were to just select star from passengers, 777 00:35:30,100 --> 00:35:33,550 just select all the data in the Passengers Table and click Execute 778 00:35:33,550 --> 00:35:36,190 here, what I get is something like this. 779 00:35:36,190 --> 00:35:39,730 A whole bunch of IDs, names of the passengers, and what flights 780 00:35:39,730 --> 00:35:41,344 they're on. 781 00:35:41,344 --> 00:35:43,510 But I would like to get not just what flight they're 782 00:35:43,510 --> 00:35:46,720 on, but their origin and their destination, where they're going from 783 00:35:46,720 --> 00:35:48,520 and where they're going to. 784 00:35:48,520 --> 00:35:50,710 So I'm going to modify this query slightly. 785 00:35:50,710 --> 00:35:54,190 I want to select the passengers name, but I also 786 00:35:54,190 --> 00:35:59,770 want to select their origin and their destination. 787 00:35:59,770 --> 00:36:03,880 But unfortunately, I can't just select this from the Flight's Table, 788 00:36:03,880 --> 00:36:06,370 and I can't just select this from the Passengers Table, 789 00:36:06,370 --> 00:36:10,340 because none of those two tables contain all of this information. 790 00:36:10,340 --> 00:36:13,330 The Passengers Table contains things like the name, 791 00:36:13,330 --> 00:36:16,690 but the Flights Table contains things like origin and destination, 792 00:36:16,690 --> 00:36:17,990 for example. 793 00:36:17,990 --> 00:36:21,270 And so what I want to do is, I want to join these tables together. 794 00:36:21,270 --> 00:36:25,030 And I'm going to explicitly say, INNER JOIN these tables together. 795 00:36:25,030 --> 00:36:30,670 And I want to join at the Flights Table with the Passengers Table. 796 00:36:30,670 --> 00:36:32,760 So I want to join two flights together. 797 00:36:32,760 --> 00:36:35,010 And then in the query I need to specify, what 798 00:36:35,010 --> 00:36:36,970 is the relationship between these two tables? 799 00:36:36,970 --> 00:36:40,846 How is it that these two tables are connected to each other? 800 00:36:40,846 --> 00:36:43,470 And the way that these two tables are connected to each other-- 801 00:36:43,470 --> 00:36:45,690 going onto a new line just for clarity here, 802 00:36:45,690 --> 00:36:48,030 though it doesn't need to be on a new line. 803 00:36:48,030 --> 00:36:50,940 I'm going to join these on what condition? 804 00:36:50,940 --> 00:36:58,250 Well, passengers.flight_id, so Passengers is the name of the table, 805 00:36:58,250 --> 00:37:00,240 .flight_id is the column. 806 00:37:00,240 --> 00:37:03,950 So this column, the flight ID column of the passengers table, 807 00:37:03,950 --> 00:37:07,675 needs to equal what? 808 00:37:07,675 --> 00:37:11,045 What should this be equivalent to if I want to join these things together? 809 00:37:11,045 --> 00:37:12,470 [INTERPOSING VOICES] 810 00:37:12,470 --> 00:37:13,790 SPEAKER: Flights.id, great. 811 00:37:13,790 --> 00:37:18,110 Flights, again, is the name of a table, and ID is the name of a column. 812 00:37:18,110 --> 00:37:21,050 And so what I'm saying here is that only join things together 813 00:37:21,050 --> 00:37:25,190 if the flight ID of the passenger matches up with the ID of the flight. 814 00:37:25,190 --> 00:37:28,294 I want to merge those things together. 815 00:37:28,294 --> 00:37:29,960 So I'll go ahead and click Execute here. 816 00:37:29,960 --> 00:37:32,820 817 00:37:32,820 --> 00:37:36,740 And what I get is a new table that's returned back 818 00:37:36,740 --> 00:37:39,740 to me that is exactly the information that I want. 819 00:37:39,740 --> 00:37:42,770 I've joined the information from two different tables, the Passengers 820 00:37:42,770 --> 00:37:46,460 Table and the Flights Table, and I have a list of passengers, 821 00:37:46,460 --> 00:37:49,040 and also their origin and their destination, 822 00:37:49,040 --> 00:37:50,810 all returned to me in a single query. 823 00:37:50,810 --> 00:37:51,310 Yeah? 824 00:37:51,310 --> 00:37:54,262 AUDIENCE: If you have two tables that you're trying to [INAUDIBLE] 825 00:37:54,262 --> 00:38:00,658 and there happen to be [INAUDIBLE] of a column that both tables have, 826 00:38:00,658 --> 00:38:04,102 can you just not reference that name [INAUDIBLE]?? 827 00:38:04,102 --> 00:38:07,644 What happens if you select only [INAUDIBLE]?? 828 00:38:07,644 --> 00:38:08,810 SPEAKER: Excellent question. 829 00:38:08,810 --> 00:38:11,630 So what happens if I'm doing this select query 830 00:38:11,630 --> 00:38:14,480 and two tables have a column that have the same name? 831 00:38:14,480 --> 00:38:16,160 How do I deal with that situation? 832 00:38:16,160 --> 00:38:20,180 You'll just have to disambiguate it to tell Postgres which one you actually 833 00:38:20,180 --> 00:38:20,772 mean. 834 00:38:20,772 --> 00:38:23,480 And so you would do something like, rather than just select name, 835 00:38:23,480 --> 00:38:25,850 I would say select passengers.name. 836 00:38:25,850 --> 00:38:28,946 Meaning, from the Passengers Table, select the name column. 837 00:38:28,946 --> 00:38:31,070 If you don't include it, it's just going to default 838 00:38:31,070 --> 00:38:35,810 to assuming that there's only going to be one of them and just returning that. 839 00:38:35,810 --> 00:38:38,860 But you can be more explicit about it and specify the table name as well. 840 00:38:38,860 --> 00:38:39,020 Yeah? 841 00:38:39,020 --> 00:38:41,270 AUDIENCE: Will this table automatically upgrade as you 842 00:38:41,270 --> 00:38:44,132 add more information to other tables? 843 00:38:44,132 --> 00:38:46,090 SPEAKER: Would this table automatically update? 844 00:38:46,090 --> 00:38:48,250 This table is only computed one time. 845 00:38:48,250 --> 00:38:51,700 When I run this query, it generates this table and gives it back to me, 846 00:38:51,700 --> 00:38:53,110 the person running the query. 847 00:38:53,110 --> 00:38:56,230 And so if I update the table, I'll need to run the query again, 848 00:38:56,230 --> 00:38:58,960 and then it will be updated with the latest information. 849 00:38:58,960 --> 00:39:00,329 Correct. 850 00:39:00,329 --> 00:39:00,829 Yeah? 851 00:39:00,829 --> 00:39:02,870 AUDIENCE: If you're trying to do it the other way 852 00:39:02,870 --> 00:39:11,680 around, with the passengers [INAUDIBLE],, would it just only [INAUDIBLE]?? 853 00:39:11,680 --> 00:39:15,340 SPEAKER: If you're trying to get the passengers map to the flight IDs? 854 00:39:15,340 --> 00:39:20,140 What you'd probably get is you'd get one row for each pairing of passenger 855 00:39:20,140 --> 00:39:20,830 and flight. 856 00:39:20,830 --> 00:39:22,540 So a flight would appear multiple times. 857 00:39:22,540 --> 00:39:24,290 And that's actually what's happening here, 858 00:39:24,290 --> 00:39:26,350 that we have this flight, New York to London, 859 00:39:26,350 --> 00:39:30,323 showing up multiple times because both Alice and Bob are on it. 860 00:39:30,323 --> 00:39:30,864 AUDIENCE: OK. 861 00:39:30,864 --> 00:39:34,264 862 00:39:34,264 --> 00:39:35,180 SPEAKER: Other things? 863 00:39:35,180 --> 00:39:40,800 864 00:39:40,800 --> 00:39:42,660 All right. 865 00:39:42,660 --> 00:39:46,720 We'll do a couple other things about SQL queries. 866 00:39:46,720 --> 00:39:48,640 One thing that's just good to be aware of 867 00:39:48,640 --> 00:39:52,960 is the idea that you can create an index on a table for a particular column. 868 00:39:52,960 --> 00:39:55,510 This is particularly useful as you begin to scale a database. 869 00:39:55,510 --> 00:40:00,640 And an index on a table is very similar to an index on like a book, 870 00:40:00,640 --> 00:40:01,210 for instance. 871 00:40:01,210 --> 00:40:03,844 If you want to be able to find something more easily in a book, 872 00:40:03,844 --> 00:40:06,760 you can add an index to the book whereby in order to look something up 873 00:40:06,760 --> 00:40:08,770 in the book, you just go immediately to the index, 874 00:40:08,770 --> 00:40:11,200 find the thing you're looking for, and that tells you where to go. 875 00:40:11,200 --> 00:40:12,533 A table can have the same thing. 876 00:40:12,533 --> 00:40:17,320 If you add an index to a column, like the origin column of our Flights Table, 877 00:40:17,320 --> 00:40:20,140 then I can very easily look up a flight by its origin. 878 00:40:20,140 --> 00:40:24,070 I just need to look up the origin in the index and immediately select that data. 879 00:40:24,070 --> 00:40:25,810 So the create index query-- 880 00:40:25,810 --> 00:40:29,020 adding an index to your table will make your select queries faster. 881 00:40:29,020 --> 00:40:34,120 It will make it faster to query data from columns that you have an index on. 882 00:40:34,120 --> 00:40:36,190 So why then, given that information, do you 883 00:40:36,190 --> 00:40:39,070 think we might not just want to create an index on every column? 884 00:40:39,070 --> 00:40:44,644 885 00:40:44,644 --> 00:40:47,060 We add an index to a book, makes it easier to select data. 886 00:40:47,060 --> 00:40:48,270 Why not just add it to every column? 887 00:40:48,270 --> 00:40:48,740 Yeah? 888 00:40:48,740 --> 00:40:50,300 AUDIENCE: [INAUDIBLE] bigger [INAUDIBLE]?? 889 00:40:50,300 --> 00:40:51,590 SPEAKER: Makes the database bigger, for sure. 890 00:40:51,590 --> 00:40:53,480 I need to store more information now because in addition 891 00:40:53,480 --> 00:40:56,000 to storing all the data, I also need to store the index. 892 00:40:56,000 --> 00:40:56,810 Great. 893 00:40:56,810 --> 00:40:58,430 Why else? 894 00:40:58,430 --> 00:40:59,466 It's a good reason. 895 00:40:59,466 --> 00:41:00,590 There are other things too. 896 00:41:00,590 --> 00:41:00,830 Yeah? 897 00:41:00,830 --> 00:41:02,786 AUDIENCE: Just like, plug in indexes can't 898 00:41:02,786 --> 00:41:06,209 have entities on every single [INAUDIBLE] book talks about. 899 00:41:06,209 --> 00:41:10,935 Maybe if you have an index on too many things, it actually [INAUDIBLE]?? 900 00:41:10,935 --> 00:41:12,810 SPEAKER: It could make things more difficult. 901 00:41:12,810 --> 00:41:14,130 So certainly you're getting into a good idea 902 00:41:14,130 --> 00:41:15,838 here, that there are trade-offs involved. 903 00:41:15,838 --> 00:41:19,320 And so in particular, one trade-off of creating an index 904 00:41:19,320 --> 00:41:23,370 is that now we need to keep updating that index any time your data changes. 905 00:41:23,370 --> 00:41:25,860 If I insert a row, delete a row, update a row, 906 00:41:25,860 --> 00:41:29,580 I might need to update the index in order to reflect these new values. 907 00:41:29,580 --> 00:41:31,530 And so while creating an index on a column 908 00:41:31,530 --> 00:41:34,740 makes select queries much faster, you can select data much more readily, much 909 00:41:34,740 --> 00:41:37,590 more easily, it does actually slow down certain queries. 910 00:41:37,590 --> 00:41:41,340 In particular, slowing down updates, and inserts, and deletes, 911 00:41:41,340 --> 00:41:43,950 because we also need to in addition to changing the data, 912 00:41:43,950 --> 00:41:45,840 also change the index as well. 913 00:41:45,840 --> 00:41:48,780 So just introducing that and those trade-offs, in case 914 00:41:48,780 --> 00:41:50,921 you see them in the future. 915 00:41:50,921 --> 00:41:52,670 One final type of query that I'll show you 916 00:41:52,670 --> 00:41:55,610 are nested queries, which might prove useful sometimes. 917 00:41:55,610 --> 00:41:58,460 So imagine that, OK, here I have my Flights Table, it's got origin, 918 00:41:58,460 --> 00:42:00,620 it's got destination, it's got a duration. 919 00:42:00,620 --> 00:42:03,290 And I have my Passengers Table, it's got an ID, 920 00:42:03,290 --> 00:42:07,770 it's got the name of the passenger, and it's got what flight they're on. 921 00:42:07,770 --> 00:42:10,730 Here is a more sophisticated GROUP BY query. 922 00:42:10,730 --> 00:42:15,195 Select flight_id FROM passengers, GROUP BY flight_ id HAVING-- 923 00:42:15,195 --> 00:42:18,320 and this a key word I mentioned, but we didn't really show or demonstrate-- 924 00:42:18,320 --> 00:42:21,940 COUNT star greater than 1. 925 00:42:21,940 --> 00:42:22,502 OK. 926 00:42:22,502 --> 00:42:25,460 Let's look at that query, try and tease it apart, and someone translate 927 00:42:25,460 --> 00:42:28,650 for me in English, what do you think this query is doing? 928 00:42:28,650 --> 00:42:31,800 929 00:42:31,800 --> 00:42:37,830 Selecting the flight ID from passengers, GROUP BY flight ID, HAVING 930 00:42:37,830 --> 00:42:39,300 COUNT star greater than 1. 931 00:42:39,300 --> 00:42:41,341 I'll show you what the table actually looks like. 932 00:42:41,341 --> 00:42:43,380 Here's what the Passengers Table looks like. 933 00:42:43,380 --> 00:42:47,904 What happens if I run this query on the Passengers Table? 934 00:42:47,904 --> 00:42:49,320 What type of question am I asking? 935 00:42:49,320 --> 00:42:52,757 936 00:42:52,757 --> 00:42:55,703 Yeah? 937 00:42:55,703 --> 00:42:58,649 AUDIENCE: If I say GROUP BY flight ID HAVING COUNT greater than 1, 938 00:42:58,649 --> 00:43:04,541 then it would group the flight IDs that have more than one flight ID. 939 00:43:04,541 --> 00:43:07,605 So it would be 1, and 2, and 6. 940 00:43:07,605 --> 00:43:08,230 SPEAKER: Great. 941 00:43:08,230 --> 00:43:10,688 We're going to get 1, 2, and 6, because what I'm doing here 942 00:43:10,688 --> 00:43:13,480 is I'm grouping all this table by the flight ID column, 943 00:43:13,480 --> 00:43:16,360 just putting them in groups based on their flight ID. 944 00:43:16,360 --> 00:43:21,520 And then I'm selecting only the ones that have count star greater than 1. 945 00:43:21,520 --> 00:43:24,610 In other words, when I count the number of rows, there are more than one 946 00:43:24,610 --> 00:43:25,310 of those rows. 947 00:43:25,310 --> 00:43:26,680 So what I'll get back is a table that just 948 00:43:26,680 --> 00:43:28,360 looks a little something like this. 949 00:43:28,360 --> 00:43:30,790 Flight ID 1, 2, and 6. 950 00:43:30,790 --> 00:43:35,747 These are the flight IDs that have more than one passenger on them. 951 00:43:35,747 --> 00:43:37,580 Feel OK about that query and how that works? 952 00:43:37,580 --> 00:43:39,663 Again, it's one of the more sophisticated queries. 953 00:43:39,663 --> 00:43:42,247 Not one we saw in CS50, but it's something potentially useful. 954 00:43:42,247 --> 00:43:42,746 Yeah? 955 00:43:42,746 --> 00:43:44,480 AUDIENCE: What was the meaning of count? 956 00:43:44,480 --> 00:43:49,160 SPEAKER: Count just counts the number of rows that are returned by the query. 957 00:43:49,160 --> 00:43:52,610 So the number of rows that have a flight ID of 6 958 00:43:52,610 --> 00:43:56,090 is two, and that's greater than one, and so 6 is included. 959 00:43:56,090 --> 00:44:00,840 But the number of rows that have a flight ID of 4 is only one. 960 00:44:00,840 --> 00:44:02,900 And so 4 is not included in my result set. 961 00:44:02,900 --> 00:44:06,050 962 00:44:06,050 --> 00:44:09,390 And so nested queries allow me to take a query 963 00:44:09,390 --> 00:44:11,900 and put it inside of another query, which 964 00:44:11,900 --> 00:44:14,010 SQL allows as much as you want to do. 965 00:44:14,010 --> 00:44:16,530 So I can take this query, again, this query that 966 00:44:16,530 --> 00:44:20,520 returns to me just the flights that have more than one passenger, 967 00:44:20,520 --> 00:44:23,190 and I can just put that into another query. 968 00:44:23,190 --> 00:44:26,795 I can say, SELECT star FROM flights WHERE id 969 00:44:26,795 --> 00:44:30,840 is IN, and then that big query that we just did, 970 00:44:30,840 --> 00:44:34,890 that query that returned to us flight numbers 1, 2, and 6. 971 00:44:34,890 --> 00:44:36,540 And so what is this query going to do? 972 00:44:36,540 --> 00:44:41,397 973 00:44:41,397 --> 00:44:41,897 Yeah? 974 00:44:41,897 --> 00:44:44,819 AUDIENCE: [INAUDIBLE] 975 00:44:44,819 --> 00:44:47,932 976 00:44:47,932 --> 00:44:48,640 SPEAKER: Exactly. 977 00:44:48,640 --> 00:44:51,056 We're going to get all the flight information, the origin, 978 00:44:51,056 --> 00:44:54,500 the destination, and the duration for only the flights that 979 00:44:54,500 --> 00:44:56,319 have more than one passenger. 980 00:44:56,319 --> 00:44:58,610 And so if you wanted to ask a question, like all right, 981 00:44:58,610 --> 00:45:00,985 here's my table of flights, I don't want all the flights, 982 00:45:00,985 --> 00:45:03,890 I only want the flights that have more than one passenger. 983 00:45:03,890 --> 00:45:05,960 Or perhaps a more realistic situation, select 984 00:45:05,960 --> 00:45:08,600 all of the flights that are over capacity, for example, 985 00:45:08,600 --> 00:45:11,183 you could run a query that looks a little something like this, 986 00:45:11,183 --> 00:45:14,232 get back only the results from the table that you actually want. 987 00:45:14,232 --> 00:45:16,190 And so depending on the question, these queries 988 00:45:16,190 --> 00:45:18,150 might become a little more sophisticated. 989 00:45:18,150 --> 00:45:20,960 But by using GROUP By, having nesting queries within each other, 990 00:45:20,960 --> 00:45:24,290 you can begin to construct a SQL query that will match exactly what it 991 00:45:24,290 --> 00:45:25,480 is that you're going for. 992 00:45:25,480 --> 00:45:27,857 993 00:45:27,857 --> 00:45:29,190 Questions about anything so far? 994 00:45:29,190 --> 00:45:32,990 995 00:45:32,990 --> 00:45:33,490 All right. 996 00:45:33,490 --> 00:45:34,900 I would be remiss if I didn't talk about a couple 997 00:45:34,900 --> 00:45:36,910 of vulnerabilities that come about with SQL. 998 00:45:36,910 --> 00:45:38,500 These are things that we talked about in CS50, 999 00:45:38,500 --> 00:45:41,380 but just reintroducing them so they're at the forefront of your mind 1000 00:45:41,380 --> 00:45:45,190 as you go about thinking about writing programs that interact with databases. 1001 00:45:45,190 --> 00:45:49,150 The first of which we talked about in CS50 is SQL injection. 1002 00:45:49,150 --> 00:45:50,500 And what is SQL injection again? 1003 00:45:50,500 --> 00:45:55,809 1004 00:45:55,809 --> 00:45:56,600 What is the threat? 1005 00:45:56,600 --> 00:45:57,560 Yeah, Chris? 1006 00:45:57,560 --> 00:46:00,045 AUDIENCE: [INAUDIBLE] 1007 00:46:00,045 --> 00:46:01,667 1008 00:46:01,667 --> 00:46:02,250 SPEAKER: Yeah. 1009 00:46:02,250 --> 00:46:06,930 The user trying to input SQL into some string input in the application, 1010 00:46:06,930 --> 00:46:11,460 such that they can try and run malicious potentially SQL code on your database. 1011 00:46:11,460 --> 00:46:13,555 So imagine a login system with a username 1012 00:46:13,555 --> 00:46:15,930 and password feel that look a little something like this, 1013 00:46:15,930 --> 00:46:20,430 and a SELECT query where we have SELECT star FROM users WHERE the username is, 1014 00:46:20,430 --> 00:46:23,640 and then plugging in the username there, and the password is, and then 1015 00:46:23,640 --> 00:46:25,580 plugging in the password there. 1016 00:46:25,580 --> 00:46:28,260 Whereby if someone logs in normally with just username Alice, 1017 00:46:28,260 --> 00:46:33,420 password 12345, then we take the names Alice and 12345, and we plug them in. 1018 00:46:33,420 --> 00:46:36,120 Alice in the user name, 12345 in the password. 1019 00:46:36,120 --> 00:46:39,389 And this might be a way via which we select data from the database. 1020 00:46:39,389 --> 00:46:41,430 In reality, you also want to hash your passwords, 1021 00:46:41,430 --> 00:46:43,890 but that's a separate matter. 1022 00:46:43,890 --> 00:46:47,732 But imagine some malicious hacker decides to type in something 1023 00:46:47,732 --> 00:46:48,940 a little something like this. 1024 00:46:48,940 --> 00:46:53,590 Username is hacker and the password is this sort of odd string, 1025 00:46:53,590 --> 00:47:01,200 1' OR, and then '1 equals, and then '1, without a closing single quote, which 1026 00:47:01,200 --> 00:47:02,430 seems strange. 1027 00:47:02,430 --> 00:47:05,640 But when that gets inserted into the query, 1028 00:47:05,640 --> 00:47:08,010 we end up plugging in hacker as the username. 1029 00:47:08,010 --> 00:47:11,970 And when we plug in exactly what he typed as the password, 1030 00:47:11,970 --> 00:47:15,270 we get SELECT star FROM users where username equals hacker and password 1031 00:47:15,270 --> 00:47:19,350 equals '1' or '1' equals '1'. 1032 00:47:19,350 --> 00:47:21,390 And this is always, of course, going to be true. 1033 00:47:21,390 --> 00:47:23,220 And the result of this is that this query 1034 00:47:23,220 --> 00:47:26,820 could result in giving a user access to an account that they shouldn't have. 1035 00:47:26,820 --> 00:47:30,390 It could result in someone being able to access data, modify data, delete data. 1036 00:47:30,390 --> 00:47:32,972 And so this threat is definitely very real. 1037 00:47:32,972 --> 00:47:35,430 And in a moment when we actually take a look at Python code 1038 00:47:35,430 --> 00:47:38,388 for being able to interact with databases, we'll look at different ways 1039 00:47:38,388 --> 00:47:41,520 that we can prevent this and libraries that can help us try and prevent 1040 00:47:41,520 --> 00:47:42,780 injection attacks, as well. 1041 00:47:42,780 --> 00:47:43,280 Yeah? 1042 00:47:43,280 --> 00:47:45,680 AUDIENCE: Do you need to [INAUDIBLE] or no? 1043 00:47:45,680 --> 00:47:49,520 Like, could you just [INAUDIBLE]? 1044 00:47:49,520 --> 00:47:52,850 SPEAKER: Well, the idea here is that the query that we're running already 1045 00:47:52,850 --> 00:47:54,900 has password equals something. 1046 00:47:54,900 --> 00:47:58,010 And so we have to come up with something to fill in here 1047 00:47:58,010 --> 00:48:00,889 that will make this query always true. 1048 00:48:00,889 --> 00:48:02,930 And so one of the easiest ways to do that is just 1049 00:48:02,930 --> 00:48:05,660 compare two things that you know to be true. 1050 00:48:05,660 --> 00:48:10,550 And so even if the password is wrong, this half of the OR query is true, 1051 00:48:10,550 --> 00:48:14,210 and so the whole OR query is true. 1052 00:48:14,210 --> 00:48:17,150 You can make the whole expression true if just half of the expression 1053 00:48:17,150 --> 00:48:17,903 is true as well. 1054 00:48:17,903 --> 00:48:19,352 AUDIENCE: Do you mean like the [INAUDIBLE]?? 1055 00:48:19,352 --> 00:48:21,767 Or could you put like, WHERE password '1' equals '1'? 1056 00:48:21,767 --> 00:48:23,710 And wouldn't that [INAUDIBLE]? 1057 00:48:23,710 --> 00:48:27,290 SPEAKER: So password refers to a column in the database. 1058 00:48:27,290 --> 00:48:30,040 So this password is some password that's stored in the database 1059 00:48:30,040 --> 00:48:32,020 that the hacker presumably doesn't know. 1060 00:48:32,020 --> 00:48:34,330 And so they wouldn't be able to necessarily guess 1061 00:48:34,330 --> 00:48:36,902 a password that is equal to the password in the database. 1062 00:48:36,902 --> 00:48:38,860 If they could, they would just log in normally. 1063 00:48:38,860 --> 00:48:43,180 And so this OR query guarantees that whatever it is that they're typing in 1064 00:48:43,180 --> 00:48:44,985 is going to be a true expression. 1065 00:48:44,985 --> 00:48:47,930 1066 00:48:47,930 --> 00:48:48,430 All right. 1067 00:48:48,430 --> 00:48:51,930 And the other vulnerability, or at least security potential threat 1068 00:48:51,930 --> 00:48:55,680 that you need to be aware of, something that can happen is a race condition. 1069 00:48:55,680 --> 00:48:57,660 Race conditions are what happens if you aren't 1070 00:48:57,660 --> 00:49:01,922 too careful about how different interactions with the SQL 1071 00:49:01,922 --> 00:49:04,380 database in this case are going to interact with each other 1072 00:49:04,380 --> 00:49:05,796 if they're happening concurrently. 1073 00:49:05,796 --> 00:49:08,850 Two things that are happening at potentially the same time. 1074 00:49:08,850 --> 00:49:12,930 Imagine, for example, that someone has a bank account and you run a query, 1075 00:49:12,930 --> 00:49:15,420 SELECT the balance FROM the bank account. 1076 00:49:15,420 --> 00:49:17,964 And that gives you, OK, the balance is $100. 1077 00:49:17,964 --> 00:49:20,880 And then you say, OK, I have $100 in the bank account, let me go ahead 1078 00:49:20,880 --> 00:49:24,420 and withdraw the $100 from the bank account, updating the bank, 1079 00:49:24,420 --> 00:49:28,050 setting the balance to be balance minus $100 for this particular user, 1080 00:49:28,050 --> 00:49:30,840 and then we'll give them the $100. 1081 00:49:30,840 --> 00:49:33,100 And that interaction seems fine. 1082 00:49:33,100 --> 00:49:37,050 But what happens if the user tries to open up two windows on their computer, 1083 00:49:37,050 --> 00:49:39,000 or go to two different ATMs, and try and do 1084 00:49:39,000 --> 00:49:42,244 this same exact thing at the same time? 1085 00:49:42,244 --> 00:49:44,910 Now a computer, of course, can only process one thing at a time. 1086 00:49:44,910 --> 00:49:48,500 So these threads, so to speak, might cross over each other. 1087 00:49:48,500 --> 00:49:51,690 And so what can happen there after the bank balance goes to 0 in this case? 1088 00:49:51,690 --> 00:49:55,140 Let's imagine a situation where on one side of things, 1089 00:49:55,140 --> 00:49:59,340 in one window at one ATM, we ask, OK, how much is in the bank? 1090 00:49:59,340 --> 00:50:01,680 And we say, OK, there's $100 in the bank. 1091 00:50:01,680 --> 00:50:04,260 And at the same time or slightly after this point, 1092 00:50:04,260 --> 00:50:07,380 someone else is asking, OK, how much balance is in the bank? 1093 00:50:07,380 --> 00:50:09,630 And based on the current value of the bank account, 1094 00:50:09,630 --> 00:50:11,610 we're also going to return $100. 1095 00:50:11,610 --> 00:50:15,510 And so now both sides of this, both running threads, so to speak, 1096 00:50:15,510 --> 00:50:18,450 think that there is $100 in the bank that they can withdraw. 1097 00:50:18,450 --> 00:50:22,710 Which is true, there is $100 in the bank that can be withdrawn. 1098 00:50:22,710 --> 00:50:24,840 But both sides think this, and so this side 1099 00:50:24,840 --> 00:50:28,590 runs the query, OK, update the bank, set the balance to be balance minus $100, 1100 00:50:28,590 --> 00:50:31,380 give the user $100, that changes to zero. 1101 00:50:31,380 --> 00:50:34,650 And as far as this side is concerned, there is still $100 in the bank, 1102 00:50:34,650 --> 00:50:36,570 and so it can run the same query, and now 1103 00:50:36,570 --> 00:50:39,720 you're left with a bank account that has negative $100. 1104 00:50:39,720 --> 00:50:41,996 So that's a potential issue to deal with as well. 1105 00:50:41,996 --> 00:50:45,120 And the way that we generally deal with those problems is through something 1106 00:50:45,120 --> 00:50:46,930 called SQL transactions. 1107 00:50:46,930 --> 00:50:51,270 Which is, rather than allow for different queries to intervene, 1108 00:50:51,270 --> 00:50:55,950 just group a whole bunch of queries together into a single transaction, 1109 00:50:55,950 --> 00:50:58,740 and then when you're done with all those queries, commit them. 1110 00:50:58,740 --> 00:51:00,930 And that will guarantee that all of those queries 1111 00:51:00,930 --> 00:51:04,800 are run uninterrupted, that no other queries can intervene and get 1112 00:51:04,800 --> 00:51:05,632 in the way of that. 1113 00:51:05,632 --> 00:51:07,590 And so in a moment when we turn to Python code, 1114 00:51:07,590 --> 00:51:09,420 we'll see where we're going to run queries, 1115 00:51:09,420 --> 00:51:12,030 but we're not actually going to run them on the database 1116 00:51:12,030 --> 00:51:13,470 until we decide to commit them. 1117 00:51:13,470 --> 00:51:17,220 To say, take all these queries, group them together into this transaction, 1118 00:51:17,220 --> 00:51:18,520 and run this transaction now. 1119 00:51:18,520 --> 00:51:22,900 1120 00:51:22,900 --> 00:51:24,134 All right. 1121 00:51:24,134 --> 00:51:25,300 So now we've introduced SQL. 1122 00:51:25,300 --> 00:51:27,160 Some of this was review from CS50. 1123 00:51:27,160 --> 00:51:30,460 But now let's turn our attention to how to actually write Python code that 1124 00:51:30,460 --> 00:51:31,684 interacts with the database. 1125 00:51:31,684 --> 00:51:34,600 And the library we're going to be using for this is called SQLAlchemy. 1126 00:51:34,600 --> 00:51:36,974 We're going to move away from the CS50 SQL library, which 1127 00:51:36,974 --> 00:51:39,580 you may have used in CS50 Finance, or in final projects 1128 00:51:39,580 --> 00:51:41,200 to work on databases of your own. 1129 00:51:41,200 --> 00:51:43,750 And just take a look at Python code you can write in order 1130 00:51:43,750 --> 00:51:46,560 to interact with the database. 1131 00:51:46,560 --> 00:51:57,190 And so let's start with an example that will just list all of the flights that 1132 00:51:57,190 --> 00:52:02,270 are currently inside of the database. 1133 00:52:02,270 --> 00:52:09,160 So right here, we have a list.py And I'm importing some functions 1134 00:52:09,160 --> 00:52:10,114 from SQLAlchemy. 1135 00:52:10,114 --> 00:52:12,280 No need to worry too much about what all this means. 1136 00:52:12,280 --> 00:52:14,140 But the important thing here is that I need 1137 00:52:14,140 --> 00:52:18,160 a database engine that's going to allow me to interact with the database. 1138 00:52:18,160 --> 00:52:20,200 And when I create that database engine, I 1139 00:52:20,200 --> 00:52:24,340 need to tell Python what database to use. 1140 00:52:24,340 --> 00:52:27,370 And in order to do that, I'm going to need the URL for the database. 1141 00:52:27,370 --> 00:52:30,190 And the URL for the database, at least in Postgres's case, 1142 00:52:30,190 --> 00:52:34,270 is this thing here under URI, this long URL 1143 00:52:34,270 --> 00:52:37,750 is the URL for my Postgres database. 1144 00:52:37,750 --> 00:52:40,680 It's got all the credentials that I need. 1145 00:52:40,680 --> 00:52:44,790 And I could have just said, create engine, and then in parentheses 1146 00:52:44,790 --> 00:52:48,030 plugged in that exact URL as a string right there. 1147 00:52:48,030 --> 00:52:49,740 You can do that. 1148 00:52:49,740 --> 00:52:51,990 But oftentimes for security reasons, it's 1149 00:52:51,990 --> 00:52:55,470 best not to, best not to put your database credentials, usernames, 1150 00:52:55,470 --> 00:52:57,854 and passwords, and whatnot inside of your code. 1151 00:52:57,854 --> 00:53:00,270 Because then anyone with access to your GitHub repository, 1152 00:53:00,270 --> 00:53:03,780 for instance, can see exactly what the credentials for your database are. 1153 00:53:03,780 --> 00:53:06,330 They can get into your database, they can see all your data. 1154 00:53:06,330 --> 00:53:10,650 So what I'm doing here is putting-- 1155 00:53:10,650 --> 00:53:12,780 I'm going to put the database name into what's 1156 00:53:12,780 --> 00:53:15,180 called an environment variable, a variable that 1157 00:53:15,180 --> 00:53:17,487 just exists on my computer system. 1158 00:53:17,487 --> 00:53:19,320 And what I'm telling the Python program here 1159 00:53:19,320 --> 00:53:22,980 is rather than just literally plugging in the string for the database 1160 00:53:22,980 --> 00:53:25,620 into my Python program, I'm saying, I'm going 1161 00:53:25,620 --> 00:53:29,040 to store the database URL inside of an environment variable 1162 00:53:29,040 --> 00:53:33,070 that I'm going to call database_URL, but you could call it anything you want. 1163 00:53:33,070 --> 00:53:35,400 And then I want the Python program to just draw 1164 00:53:35,400 --> 00:53:37,260 the URL from the environment variable. 1165 00:53:37,260 --> 00:53:39,600 Read that environment variable into memory 1166 00:53:39,600 --> 00:53:43,930 and then use it, such that I don't need to put the name of the database 1167 00:53:43,930 --> 00:53:47,030 and the credentials for the database literally in my code. 1168 00:53:47,030 --> 00:53:50,870 And so in order to set that environment variable on a Mac or on Linux, 1169 00:53:50,870 --> 00:53:55,430 you'll write something like Export database_URL equals-- 1170 00:53:55,430 --> 00:53:57,930 and then I'll just type in the credentials for the database. 1171 00:53:57,930 --> 00:54:01,860 If you're using Windows, you'll use the word Set instead of the word Export. 1172 00:54:01,860 --> 00:54:06,440 So I've exported the database URL, set that environment variable in my system, 1173 00:54:06,440 --> 00:54:08,490 and now what is this code actually doing? 1174 00:54:08,490 --> 00:54:10,260 Well, let's look at the main function. 1175 00:54:10,260 --> 00:54:14,040 We're saying flights equals, and then db.execute 1176 00:54:14,040 --> 00:54:16,680 is my way of executing a database query. 1177 00:54:16,680 --> 00:54:21,450 Select origin destination duration from flights, and then FETCH ALL 1178 00:54:21,450 --> 00:54:24,190 is my way of saying, get me all of the results back from this. 1179 00:54:24,190 --> 00:54:26,273 SQLAlchemy offers a number of different functions. 1180 00:54:26,273 --> 00:54:29,040 You can do FETCH ONE to just get you a single result back 1181 00:54:29,040 --> 00:54:31,350 from the database as a single entry. 1182 00:54:31,350 --> 00:54:34,650 FETCH ALL is going to return to me a Python list of all of the results 1183 00:54:34,650 --> 00:54:37,020 that came back. 1184 00:54:37,020 --> 00:54:40,800 And then I can say, OK, for flight in flights for each one of these flights, 1185 00:54:40,800 --> 00:54:45,240 just loop over them, go ahead and print out this formatted string, 1186 00:54:45,240 --> 00:54:47,580 plugging in flight.origin. 1187 00:54:47,580 --> 00:54:51,240 I'm using dot notation to say, get at the origin property of the flight, 1188 00:54:51,240 --> 00:54:53,970 and get at the destination property of the flight, 1189 00:54:53,970 --> 00:54:56,190 and get at the duration property of the flight, 1190 00:54:56,190 --> 00:54:59,297 and just plug it in in those values. 1191 00:54:59,297 --> 00:55:01,630 And so I'll run the program so you can see how it works, 1192 00:55:01,630 --> 00:55:04,713 and then we'll go back to the code if you want to take another look at it. 1193 00:55:04,713 --> 00:55:10,540 I run Python list.py And what I get is, all right, here are all of the rows 1194 00:55:10,540 --> 00:55:14,440 inside of my Flights database that I've formatted using format strings in order 1195 00:55:14,440 --> 00:55:17,720 to display to the user. 1196 00:55:17,720 --> 00:55:19,980 I'll show you the code for that again. 1197 00:55:19,980 --> 00:55:20,911 Take a look at it. 1198 00:55:20,911 --> 00:55:21,660 Yeah, [INAUDIBLE]? 1199 00:55:21,660 --> 00:55:25,971 AUDIENCE: Why do you use FETCH ALL if you don't [INAUDIBLE] FETCH ALL 1200 00:55:25,971 --> 00:55:26,930 [INAUDIBLE]? 1201 00:55:26,930 --> 00:55:29,210 SPEAKER: Why do you need FETCH ALL? 1202 00:55:29,210 --> 00:55:31,850 In this particular case, you might not. 1203 00:55:31,850 --> 00:55:34,559 It changes a little bit the type of data that gets returned, 1204 00:55:34,559 --> 00:55:37,475 although both types of data I think are things that you can loop over. 1205 00:55:37,475 --> 00:55:40,141 So you might be able to get away without FETCH ALL in this case. 1206 00:55:40,141 --> 00:55:41,520 Yeah? 1207 00:55:41,520 --> 00:55:45,662 AUDIENCE: Do the environment variables disappear any time the terminal 1208 00:55:45,662 --> 00:55:47,199 window quits out or closes? 1209 00:55:47,199 --> 00:55:47,740 SPEAKER: Yes. 1210 00:55:47,740 --> 00:55:49,690 If you close your terminal window and reopen it, 1211 00:55:49,690 --> 00:55:51,200 that will reset your environment variables. 1212 00:55:51,200 --> 00:55:53,960 So you'll probably need to set the environment variable again. 1213 00:55:53,960 --> 00:55:57,439 There are ways of setting it up so that every time you 1214 00:55:57,439 --> 00:55:59,980 open a new terminal it will automatically set the environment 1215 00:55:59,980 --> 00:56:01,410 variables appropriately. 1216 00:56:01,410 --> 00:56:04,190 If you're using Bash on a Mac or on Linux for example, 1217 00:56:04,190 --> 00:56:06,280 you can add something to your Bash profile, 1218 00:56:06,280 --> 00:56:09,420 and that will automatically run the setting of environment variables 1219 00:56:09,420 --> 00:56:12,364 anytime you open a new Bash Shell, anytime you open a terminal window. 1220 00:56:12,364 --> 00:56:14,530 And we can help you work on that during project time 1221 00:56:14,530 --> 00:56:16,670 if you want to set that up. 1222 00:56:16,670 --> 00:56:17,170 Yeah? 1223 00:56:17,170 --> 00:56:18,628 AUDIENCE: Sorry, just following up. 1224 00:56:18,628 --> 00:56:21,500 So if you were pushing something to Heroku, 1225 00:56:21,500 --> 00:56:25,329 you wouldn't want to set the variables [INAUDIBLE] anything 1226 00:56:25,329 --> 00:56:27,301 is being [INAUDIBLE] closed down. 1227 00:56:27,301 --> 00:56:30,770 So you'd want to set it up with the Bash profile, like [INAUDIBLE]?? 1228 00:56:30,770 --> 00:56:31,210 SPEAKER: Oh, good question. 1229 00:56:31,210 --> 00:56:32,170 Question about Heroku. 1230 00:56:32,170 --> 00:56:35,990 So my database credentials here are stored on my computer. 1231 00:56:35,990 --> 00:56:39,520 So what happens when I push my application to Heroku, 1232 00:56:39,520 --> 00:56:41,830 how is Heroku going to know the database credentials? 1233 00:56:41,830 --> 00:56:44,200 Well you can set environment variables in Heroku 1234 00:56:44,200 --> 00:56:46,009 in the settings for the application. 1235 00:56:46,009 --> 00:56:47,800 So if I go to Brian Flights, my application 1236 00:56:47,800 --> 00:56:52,740 here on Heroku, and go into Settings, and then go into here, 1237 00:56:52,740 --> 00:56:56,490 config vars, which is just basically the environment variables. 1238 00:56:56,490 --> 00:56:59,970 If I reveal them, one of them by default is already 1239 00:56:59,970 --> 00:57:03,900 going to be database URL that is going to be equal to that Postgres database. 1240 00:57:03,900 --> 00:57:06,970 That gets set up automatically when I create a Heroku Postgres database. 1241 00:57:06,970 --> 00:57:08,886 But if you wanted other environment variables, 1242 00:57:08,886 --> 00:57:11,910 you could add them here as well in order to add environment variables 1243 00:57:11,910 --> 00:57:13,451 to the Heroku application. 1244 00:57:13,451 --> 00:57:16,680 1245 00:57:16,680 --> 00:57:17,250 Other things? 1246 00:57:17,250 --> 00:57:20,380 1247 00:57:20,380 --> 00:57:20,880 All right. 1248 00:57:20,880 --> 00:57:26,162 So that was listing all the things inside of a database. 1249 00:57:26,162 --> 00:57:28,120 Let's look at inserting things into a database. 1250 00:57:28,120 --> 00:57:30,250 Maybe I want to insert them from a file. 1251 00:57:30,250 --> 00:57:32,560 And a file type that we introduced in CS50 1252 00:57:32,560 --> 00:57:35,650 were CSV files, Comma Separated Values. 1253 00:57:35,650 --> 00:57:38,290 And those files looked a little something like this. 1254 00:57:38,290 --> 00:57:41,190 This is a file called Flights.CSV. 1255 00:57:41,190 --> 00:57:44,770 And very simply, this is basically a very simple spreadsheet almost, 1256 00:57:44,770 --> 00:57:46,930 where each row is a row in the spreadsheet. 1257 00:57:46,930 --> 00:57:49,420 And the columns of the spreadsheet are separated by commas. 1258 00:57:49,420 --> 00:57:53,050 You could open this up in Apple Numbers, or Microsoft Excel, or Google Sheets, 1259 00:57:53,050 --> 00:57:55,180 and it would format nicely as a spreadsheet. 1260 00:57:55,180 --> 00:58:01,142 But basically, each row is origin, destination, duration. 1261 00:58:01,142 --> 00:58:03,850 And this is just some text file that I've created on my computer. 1262 00:58:03,850 --> 00:58:07,870 And I would like to take this CSV file and take each row 1263 00:58:07,870 --> 00:58:11,170 and insert it into my database. 1264 00:58:11,170 --> 00:58:13,820 So how might I go about doing that? 1265 00:58:13,820 --> 00:58:19,180 Well, one thing that we'll use for here is Python's CSV module, 1266 00:58:19,180 --> 00:58:21,340 which is a module built into Python that makes it 1267 00:58:21,340 --> 00:58:24,370 very easy to just loop over a CSV file. 1268 00:58:24,370 --> 00:58:29,800 And so if I open up import.py. 1269 00:58:29,800 --> 00:58:32,290 This is going to do the importing. 1270 00:58:32,290 --> 00:58:35,620 And let's go ahead and take a look at the main function. 1271 00:58:35,620 --> 00:58:39,520 The first thing that I do here on 911 is open up flights.csv. 1272 00:58:39,520 --> 00:58:41,200 I'm taking the CSV file and opening it. 1273 00:58:41,200 --> 00:58:44,890 OPEN is a built in function in Python that just opens up a file. 1274 00:58:44,890 --> 00:58:50,170 And up at the top, I've imported the CSV module such that on line 12, I can say, 1275 00:58:50,170 --> 00:58:55,450 csv.reader_f in order to generate an object that is going to basically read 1276 00:58:55,450 --> 00:59:00,160 this CSV file for me one line at a time, an object that I can loop over and loop 1277 00:59:00,160 --> 00:59:03,660 over every line in this file. 1278 00:59:03,660 --> 00:59:05,440 So I'm going to loop over the file. 1279 00:59:05,440 --> 00:59:09,400 And I'm going to say, for origin, destination, duration 1280 00:59:09,400 --> 00:59:11,310 in this CSV reader. 1281 00:59:11,310 --> 00:59:14,200 And I'm using origin, destination, duration 1282 00:59:14,200 --> 00:59:17,860 because I know that in my CSV file, this file is formatted 1283 00:59:17,860 --> 00:59:20,980 as origin, destination, duration. 1284 00:59:20,980 --> 00:59:25,060 So I can specify, here's a variable name I want to apply to the first column, 1285 00:59:25,060 --> 00:59:27,565 here's a variable name I want to apply to the second column, 1286 00:59:27,565 --> 00:59:30,190 and here's a variable name I want to apply to the third column, 1287 00:59:30,190 --> 00:59:32,730 just going over the whole thing. 1288 00:59:32,730 --> 00:59:36,360 And now we're going to say, db.execute. 1289 00:59:36,360 --> 00:59:39,210 And we're going to say, INSERT into flights-- this is the table-- 1290 00:59:39,210 --> 00:59:42,420 these columns, origin, destination, duration. 1291 00:59:42,420 --> 00:59:46,930 And then values, here I'm using a little bit of different syntax, syntax 1292 00:59:46,930 --> 00:59:48,750 that might be familiar from CS50. 1293 00:59:48,750 --> 00:59:52,410 But rather than just literally plugging in the origin, and the destination, 1294 00:59:52,410 --> 00:59:54,540 and the duration there, which would open me up 1295 00:59:54,540 --> 00:59:56,820 to potential SQL injection attacks. 1296 00:59:56,820 --> 01:00:00,510 What I'm going to do instead is use this colon syntax to mean, 1297 01:00:00,510 --> 01:00:02,910 this is a placeholder for the origin, and this 1298 01:00:02,910 --> 01:00:05,100 is a placeholder for the destination, and this 1299 01:00:05,100 --> 01:00:07,050 is a placeholder for the duration. 1300 01:00:07,050 --> 01:00:12,810 And then I'm going to plug in values into those placeholders by also 1301 01:00:12,810 --> 01:00:17,790 providing to db.execute a Python dictionary that associates keys-- 1302 01:00:17,790 --> 01:00:21,520 which are the names of those placeholders, origin, destination, 1303 01:00:21,520 --> 01:00:22,860 and duration-- 1304 01:00:22,860 --> 01:00:26,452 with values, namely the variable names that I actually want to insert, 1305 01:00:26,452 --> 01:00:29,160 which also happen to be called origin, destination, and duration, 1306 01:00:29,160 --> 01:00:31,660 although they don't necessarily need to have the same names. 1307 01:00:31,660 --> 01:00:33,150 They could be called anything. 1308 01:00:33,150 --> 01:00:35,100 And what this is going to do is, SQLAlchemy 1309 01:00:35,100 --> 01:00:38,610 is going to plug in these variables into those place holders, 1310 01:00:38,610 --> 01:00:40,960 and it's automatically going to escape them. 1311 01:00:40,960 --> 01:00:44,280 In other words, make sure that no SQL injection attacks are possible. 1312 01:00:44,280 --> 01:00:46,320 So for security reasons, we'll always want 1313 01:00:46,320 --> 01:00:48,960 to use these place holders using the colon syntax 1314 01:00:48,960 --> 01:00:52,860 and then provide all of the variables that we want to plug in to the query 1315 01:00:52,860 --> 01:00:54,090 there. 1316 01:00:54,090 --> 01:00:55,920 Again, very similar to what we did in CS50. 1317 01:00:55,920 --> 01:00:57,840 The syntax of it is a little bit different 1318 01:00:57,840 --> 01:01:00,780 since we're no longer using the CS50 library. 1319 01:01:00,780 --> 01:01:04,999 And we'll see an even easier way to do this later this afternoon. 1320 01:01:04,999 --> 01:01:06,790 And then I'm just going to print a message. 1321 01:01:06,790 --> 01:01:08,650 I added a flight from origin to destination 1322 01:01:08,650 --> 01:01:09,981 lasting this number of minutes. 1323 01:01:09,981 --> 01:01:11,980 That's just a printout to me so I can keep track 1324 01:01:11,980 --> 01:01:13,690 of the progress of this application. 1325 01:01:13,690 --> 01:01:16,420 If I'm only importing a couple flights, it's probably not going to take long. 1326 01:01:16,420 --> 01:01:19,378 But you might imagine a situation where I'm importing thousands of rows 1327 01:01:19,378 --> 01:01:22,150 into a database, and then it might start to take a while. 1328 01:01:22,150 --> 01:01:24,430 And in fact, the morning project as you'll soon see, 1329 01:01:24,430 --> 01:01:28,030 involves trying to import a big CSV data set into a database. 1330 01:01:28,030 --> 01:01:31,230 And we'll take a look at that in just a moment. 1331 01:01:31,230 --> 01:01:35,330 Once I'm done doing all of those db.execute queries, at the very end 1332 01:01:35,330 --> 01:01:37,490 I have this line that says db.commit. 1333 01:01:37,490 --> 01:01:41,120 Meaning commit the results, group all these into a single transaction, 1334 01:01:41,120 --> 01:01:44,482 and just actually execute these queries in the database. 1335 01:01:44,482 --> 01:01:48,100 1336 01:01:48,100 --> 01:01:50,470 So now, OK, here is flights.csv. 1337 01:01:50,470 --> 01:01:59,301 If I run Python import.py, run this command, and all right. 1338 01:01:59,301 --> 01:02:02,300 We added a flight from Paris to New York lasting that number of minutes, 1339 01:02:02,300 --> 01:02:05,630 added a whole bunch of different flights in order 1340 01:02:05,630 --> 01:02:08,190 to get the result that we want. 1341 01:02:08,190 --> 01:02:12,020 And in fact, if I run Python list.py now to list all of the flights that are 1342 01:02:12,020 --> 01:02:15,920 inside of my Flights Table, press Return, now I have-- 1343 01:02:15,920 --> 01:02:16,790 if all goes well-- 1344 01:02:16,790 --> 01:02:17,900 even more flights than before. 1345 01:02:17,900 --> 01:02:19,900 Because I had all the flights I added originally 1346 01:02:19,900 --> 01:02:22,760 plus all of these other flights that I've now imported. 1347 01:02:22,760 --> 01:02:26,340 1348 01:02:26,340 --> 01:02:29,188 Questions about anything we've done so far? 1349 01:02:29,188 --> 01:02:29,688 Yeah? 1350 01:02:29,688 --> 01:02:33,239 AUDIENCE: Do you mind going back to the import.py? 1351 01:02:33,239 --> 01:02:33,780 SPEAKER: Yep. 1352 01:02:33,780 --> 01:02:35,293 Import.py is here. 1353 01:02:35,293 --> 01:02:39,550 AUDIENCE: I was wondering, do you have to specify anywhere 1354 01:02:39,550 --> 01:02:42,145 that you're not going to commit it until later? 1355 01:02:42,145 --> 01:02:46,075 Or is it just standard that all of your db.executes, 1356 01:02:46,075 --> 01:02:49,526 they don't run until it sees the db.commit? 1357 01:02:49,526 --> 01:02:52,670 SPEAKER: A db.execute command that modifies 1358 01:02:52,670 --> 01:02:56,070 the database, like inserting rows, updating rows, deleting rows, 1359 01:02:56,070 --> 01:02:58,904 will not actually run in the database until it's committed. 1360 01:02:58,904 --> 01:03:01,070 Now I could have done something like, OK, every time 1361 01:03:01,070 --> 01:03:03,410 I execute, just immediately commit right afterwards 1362 01:03:03,410 --> 01:03:06,216 and do that again, and again, and again for every flight, 1363 01:03:06,216 --> 01:03:07,340 but this is less efficient. 1364 01:03:07,340 --> 01:03:09,650 Because committing takes time, I actually to go to the database, 1365 01:03:09,650 --> 01:03:10,490 talk to it. 1366 01:03:10,490 --> 01:03:13,790 And so for efficiency sake, it's a little bit faster in this case 1367 01:03:13,790 --> 01:03:16,520 just to say, wait until I run all the queries I want to run 1368 01:03:16,520 --> 01:03:18,103 and just commit them at the same time. 1369 01:03:18,103 --> 01:03:22,210 Send them to the database together, make that one single transaction. 1370 01:03:22,210 --> 01:03:22,710 Yeah? 1371 01:03:22,710 --> 01:03:25,800 AUDIENCE: Is Reader a function? 1372 01:03:25,800 --> 01:03:29,250 SPEAKER: Reader is a function or a method, yeah, a function basically 1373 01:03:29,250 --> 01:03:33,180 inside of the CSV module that's built into Python 1374 01:03:33,180 --> 01:03:35,210 that just makes it easier to read CSV files. 1375 01:03:35,210 --> 01:03:36,984 It automatically parses it, automatically 1376 01:03:36,984 --> 01:03:39,150 separates things based on the comma so that we don't 1377 01:03:39,150 --> 01:03:40,921 need to worry about doing any of that. 1378 01:03:40,921 --> 01:03:41,420 Yeah? 1379 01:03:41,420 --> 01:03:46,077 AUDIENCE: What's a engine [INAUDIBLE] specific to SQLAlchemy? 1380 01:03:46,077 --> 01:03:48,160 SPEAKER: The engine is not specific to SQLAlchemy, 1381 01:03:48,160 --> 01:03:51,280 it's pretty common across database programs. 1382 01:03:51,280 --> 01:03:56,680 The engine you can think of as the entity in my Python code, 1383 01:03:56,680 --> 01:03:59,830 the object that's taking care of interacting with the database, that's 1384 01:03:59,830 --> 01:04:02,320 responsible for making requests to the database 1385 01:04:02,320 --> 01:04:04,660 and getting back responses to it. 1386 01:04:04,660 --> 01:04:06,760 And this scoped session as you'll soon see 1387 01:04:06,760 --> 01:04:08,804 when we get into the world of web applications, 1388 01:04:08,804 --> 01:04:11,470 this is specifically to make sure I can have multiple users that 1389 01:04:11,470 --> 01:04:13,540 are using my web application, each of which 1390 01:04:13,540 --> 01:04:18,550 is making their own database queries and has their own transactions 1391 01:04:18,550 --> 01:04:20,732 that they're executing on the database separately. 1392 01:04:20,732 --> 01:04:22,190 And we'll see that in a moment too. 1393 01:04:22,190 --> 01:04:24,640 Yeah? 1394 01:04:24,640 --> 01:04:28,160 AUDIENCE: What is the time efficiency of a database search like this? 1395 01:04:28,160 --> 01:04:31,640 Like, if you want to select everything from the database? 1396 01:04:31,640 --> 01:04:35,041 SPEAKER: So the efficiency of the database or the time it takes 1397 01:04:35,041 --> 01:04:37,040 is going to depend a little bit on the database. 1398 01:04:37,040 --> 01:04:39,742 But certainly databases like MySQL and PostgreSQL 1399 01:04:39,742 --> 01:04:42,950 are going to be able to handle things like this much more quickly than SQLite 1400 01:04:42,950 --> 01:04:45,884 can, which is not designed to handle as large scale database. 1401 01:04:45,884 --> 01:04:49,050 And it also is going to depend upon the amount of data and the type of data. 1402 01:04:49,050 --> 01:04:49,939 So it varies a lot. 1403 01:04:49,939 --> 01:04:51,730 And that's why in practice, a lot of people 1404 01:04:51,730 --> 01:04:53,438 will do benchmarking to actually just run 1405 01:04:53,438 --> 01:04:57,320 tests to see how much load can it handle, and in how much time, 1406 01:04:57,320 --> 01:05:01,090 and how fast are various different operations. 1407 01:05:01,090 --> 01:05:02,485 Were there other questions? 1408 01:05:02,485 --> 01:05:03,110 A couple, yeah? 1409 01:05:03,110 --> 01:05:06,599 AUDIENCE: How did you connect the Python files to the Heroku app? 1410 01:05:06,599 --> 01:05:09,140 SPEAKER: How did I connect the Python file to the Heroku app? 1411 01:05:09,140 --> 01:05:13,250 I do that here where I say, os.getend the DATABASE_URL. 1412 01:05:13,250 --> 01:05:16,940 In other words, get the environment variable called DATABASE_URL. 1413 01:05:16,940 --> 01:05:21,110 And then in my terminal, I have set the DATABASE_URL environment variable 1414 01:05:21,110 --> 01:05:25,520 via the Export keyword on Mac or Linux or the Set keyword 1415 01:05:25,520 --> 01:05:31,130 on Windows to set it equal to the big long URL that I got from Heroku. 1416 01:05:31,130 --> 01:05:33,200 I copied this URL, and that is the URL that's 1417 01:05:33,200 --> 01:05:36,420 going to connect me to the database that Heroku has created for me. 1418 01:05:36,420 --> 01:05:39,680 1419 01:05:39,680 --> 01:05:40,180 Yeah? 1420 01:05:40,180 --> 01:05:40,846 AUDIENCE: Sorry. 1421 01:05:40,846 --> 01:05:43,362 Is that DATABASE_URL the long URL online? 1422 01:05:43,362 --> 01:05:44,570 Or is it something different? 1423 01:05:44,570 --> 01:05:45,550 SPEAKER: It's a long one from online. 1424 01:05:45,550 --> 01:05:45,880 AUDIENCE: OK. 1425 01:05:45,880 --> 01:05:46,463 SPEAKER: Yeah. 1426 01:05:46,463 --> 01:05:51,370 1427 01:05:51,370 --> 01:05:52,010 All right. 1428 01:05:52,010 --> 01:05:54,380 I'll show you one other example of the types of scripts 1429 01:05:54,380 --> 01:05:57,795 that we can create just by the fact that we can now run queries on applications. 1430 01:05:57,795 --> 01:06:00,920 I'll show you passengers.py, and then we'll take a look at web applications 1431 01:06:00,920 --> 01:06:03,410 before we break for our morning project. 1432 01:06:03,410 --> 01:06:05,900 So passengers.py is just a program that's 1433 01:06:05,900 --> 01:06:08,990 going to allow me to see the passengers on a given flight. 1434 01:06:08,990 --> 01:06:11,630 The first thing that it does is it lists all the flights. 1435 01:06:11,630 --> 01:06:14,000 And we do that by selecting the ID, origin, 1436 01:06:14,000 --> 01:06:15,860 and destination from the Flights Table. 1437 01:06:15,860 --> 01:06:18,590 Just select all of that information from the Flight Table, 1438 01:06:18,590 --> 01:06:20,960 and then for each one of those flights, go ahead 1439 01:06:20,960 --> 01:06:25,430 and print it out, printing out various different properties for the flight. 1440 01:06:25,430 --> 01:06:29,030 Then I'll prompt the user to choose a flight using the input function to say, 1441 01:06:29,030 --> 01:06:30,710 type in a flight ID here. 1442 01:06:30,710 --> 01:06:33,810 Convert it to an integer, as we saw when we first introduced Python, 1443 01:06:33,810 --> 01:06:36,050 save it inside a flight ID. 1444 01:06:36,050 --> 01:06:38,840 And then I'm saying, OK, select everything 1445 01:06:38,840 --> 01:06:42,710 from a flight where the ID is this flight ID. 1446 01:06:42,710 --> 01:06:44,570 Fetch one, meaning just get me one. 1447 01:06:44,570 --> 01:06:47,111 I'm only trying to get one flight, I don't want many flights, 1448 01:06:47,111 --> 01:06:49,210 so just get me one thing. 1449 01:06:49,210 --> 01:06:51,950 It's possible that that flight ID doesn't exist. 1450 01:06:51,950 --> 01:06:54,972 So if the flight that comes back is none, in other words nothing, 1451 01:06:54,972 --> 01:06:56,680 then I'm going to print out error, sorry, 1452 01:06:56,680 --> 01:06:58,750 that's not actually a valid flight. 1453 01:06:58,750 --> 01:07:00,880 But otherwise, if it is a valid flight, I'll 1454 01:07:00,880 --> 01:07:02,620 go ahead and request the passengers. 1455 01:07:02,620 --> 01:07:05,530 Select name from the Passengers Table where 1456 01:07:05,530 --> 01:07:10,550 the flight ID is whatever the flight ID the user just typed in is. 1457 01:07:10,550 --> 01:07:12,890 Save that result in passengers. 1458 01:07:12,890 --> 01:07:16,794 Then loop over that list of passengers and print out the passenger's name. 1459 01:07:16,794 --> 01:07:18,710 If there are no passengers, print out the fact 1460 01:07:18,710 --> 01:07:20,520 that there are no passengers on the flight. 1461 01:07:20,520 --> 01:07:22,020 I went through that rather quickly. 1462 01:07:22,020 --> 01:07:24,920 But this code and all the code from today is online on the course website 1463 01:07:24,920 --> 01:07:27,586 if you want to take a look at it in order to experiment with it, 1464 01:07:27,586 --> 01:07:30,200 or use it in your own projects. 1465 01:07:30,200 --> 01:07:35,337 So if I run Python passengers.py now, it's going to show me all the flights. 1466 01:07:35,337 --> 01:07:37,420 All right, here are all the flights and their IDs. 1467 01:07:37,420 --> 01:07:40,810 If I type in flight number one, I want to see the passengers on that flight. 1468 01:07:40,810 --> 01:07:45,290 It'll show me which passengers are on that flight, Alice and Bob. 1469 01:07:45,290 --> 01:07:50,010 And that allows me to see the passengers that are currently on a given flight 1470 01:07:50,010 --> 01:07:54,150 by interacting with that database. 1471 01:07:54,150 --> 01:07:58,200 So last thing I'll show before we go on to the morning project 1472 01:07:58,200 --> 01:08:01,190 is to take a look at how we might combine this into a web application, 1473 01:08:01,190 --> 01:08:03,000 like a Flask web application. 1474 01:08:03,000 --> 01:08:08,810 I'll go ahead and run flask.run, open up the Flask URL. 1475 01:08:08,810 --> 01:08:12,220 And what I see here is a page where I can book a new flight. 1476 01:08:12,220 --> 01:08:14,945 In this dropdown, I see a list of all the possible flights. 1477 01:08:14,945 --> 01:08:17,319 And say I want to book a flight on the New York to London 1478 01:08:17,319 --> 01:08:21,060 flight, type in my name, click book flight. 1479 01:08:21,060 --> 01:08:24,359 [INAUDIBLE],, it says success, you successfully booked a flight. 1480 01:08:24,359 --> 01:08:28,810 And if I go back to passengers.py-- 1481 01:08:28,810 --> 01:08:37,517 go back-- passengers.py and go to flight number one and take a look at it. 1482 01:08:37,517 --> 01:08:40,100 All right, I now see that I am now registered for this flight. 1483 01:08:40,100 --> 01:08:42,740 I've been able to add myself to the database 1484 01:08:42,740 --> 01:08:45,279 by making some database queries. 1485 01:08:45,279 --> 01:08:48,200 What was the code that made that work? 1486 01:08:48,200 --> 01:08:51,802 You can take a look at it more closely if you'd like to in your own time, 1487 01:08:51,802 --> 01:08:53,510 but I'll briefly go over what's going on. 1488 01:08:53,510 --> 01:08:57,950 On the Index page, if I just go to the Home page, the first thing I do 1489 01:08:57,950 --> 01:08:59,420 is run SELECT star from flights. 1490 01:08:59,420 --> 01:09:02,899 Get me all of the flights, save it in a variable called Flights, 1491 01:09:02,899 --> 01:09:06,240 and pass that into index.html. 1492 01:09:06,240 --> 01:09:12,935 All index.html is going to do now is basically show me a select dropdown box 1493 01:09:12,935 --> 01:09:15,810 where for every flight in my list of flights I'm going to run a loop. 1494 01:09:15,810 --> 01:09:19,140 I want one dropdown option for each flight that I have. 1495 01:09:19,140 --> 01:09:23,399 I have an option whose value is going to be the flight's ID. 1496 01:09:23,399 --> 01:09:26,520 And the value is going to be important because when I submit this form, 1497 01:09:26,520 --> 01:09:29,250 it's going to use this ID as the value that I get. 1498 01:09:29,250 --> 01:09:31,439 I want the flight's ID to be the value. 1499 01:09:31,439 --> 01:09:34,109 And then what shows up to the user at least, the text, 1500 01:09:34,109 --> 01:09:36,860 is just flights origin to flights destination. 1501 01:09:36,860 --> 01:09:41,200 So I saw New York to London, for example. 1502 01:09:41,200 --> 01:09:44,290 I have a place for the passenger to type in their name, 1503 01:09:44,290 --> 01:09:46,986 and then I have a button that's going to book the flight. 1504 01:09:46,986 --> 01:09:48,819 And there are a bunch of classes here, these 1505 01:09:48,819 --> 01:09:51,250 are bootstrap classes to add bootstrap styling. 1506 01:09:51,250 --> 01:09:54,800 Just makes the page look a little bit nicer, if you might have noticed. 1507 01:09:54,800 --> 01:09:56,460 And so that's all that's going on here. 1508 01:09:56,460 --> 01:09:59,630 I've got a form with a dropdown, one option for each flight, 1509 01:09:59,630 --> 01:10:03,170 a place to type in a name, and then a button to book the flight. 1510 01:10:03,170 --> 01:10:07,530 When the flight's actually booked, it's going to go to this route, /book. 1511 01:10:07,530 --> 01:10:13,300 I first get the name of the user, then I try 1512 01:10:13,300 --> 01:10:17,470 to get the flight ID, making sure that it's an integer. 1513 01:10:17,470 --> 01:10:19,930 If it's not an integer, it's going to throw a value error, 1514 01:10:19,930 --> 01:10:21,221 so I'm catching that exception. 1515 01:10:21,221 --> 01:10:24,040 Remember, Python exception handlers from a couple days ago. 1516 01:10:24,040 --> 01:10:29,000 And I'm going to render an error page in case it's not actually a valid integer. 1517 01:10:29,000 --> 01:10:31,490 Then I'm going to make sure the flight actually exists, 1518 01:10:31,490 --> 01:10:34,460 SELECT star from flights where that ID is equal to the ID. 1519 01:10:34,460 --> 01:10:36,470 And if the flight doesn't exist, I'm going 1520 01:10:36,470 --> 01:10:38,660 to render an error page that says, sorry, 1521 01:10:38,660 --> 01:10:41,140 there's no flight that has that ID. 1522 01:10:41,140 --> 01:10:42,890 And as long as the flight does exist, I'll 1523 01:10:42,890 --> 01:10:45,890 go ahead and insert something into the Passengers Table, 1524 01:10:45,890 --> 01:10:49,310 give it a name and a flight ID with whatever name the user typed in 1525 01:10:49,310 --> 01:10:51,950 and whatever flight ID the user typed in. 1526 01:10:51,950 --> 01:10:55,675 Commit that result as a result and then just display the success page. 1527 01:10:55,675 --> 01:10:57,800 I know I went through that rather quickly, but just 1528 01:10:57,800 --> 01:10:59,060 to give you the general idea. 1529 01:10:59,060 --> 01:11:01,010 You can look to the application online if you 1530 01:11:01,010 --> 01:11:03,960 want a look at the specific syntax that's used there. 1531 01:11:03,960 --> 01:11:07,580 And so those are the capabilities that we get just 1532 01:11:07,580 --> 01:11:11,720 by adding a little bit of information about how to connect to and interact 1533 01:11:11,720 --> 01:11:13,457 with the database. 1534 01:11:13,457 --> 01:11:15,290 With not a whole lot more code, we can begin 1535 01:11:15,290 --> 01:11:18,390 to add a little more sophistication to this application as well. 1536 01:11:18,390 --> 01:11:23,500 I'll go ahead and go into Airline 1 and show you the types of things we can do. 1537 01:11:23,500 --> 01:11:25,810 We'll go to the main page. 1538 01:11:25,810 --> 01:11:28,870 And this page does the same thing as before, 1539 01:11:28,870 --> 01:11:34,090 but now I can also go to /flights, a different route that I just added 1540 01:11:34,090 --> 01:11:37,260 whereby a list all flights, and I can click on any one flight. 1541 01:11:37,260 --> 01:11:40,660 If I click on like New York to London, for example, then I'll 1542 01:11:40,660 --> 01:11:41,890 see flight details. 1543 01:11:41,890 --> 01:11:44,920 Origin is New York, destination is London, here's the duration, 1544 01:11:44,920 --> 01:11:47,110 and here are the passengers on the flight, 1545 01:11:47,110 --> 01:11:50,380 allowing myself the ability to interact with this database all the more 1546 01:11:50,380 --> 01:11:54,330 and see the results in a nicely formatted web page. 1547 01:11:54,330 --> 01:12:04,470 And in order to do that, all that I needed to do was in application.py, 1548 01:12:04,470 --> 01:12:06,300 add some additional routes. 1549 01:12:06,300 --> 01:12:10,410 I added a flights route that very simply just selects all the flights 1550 01:12:10,410 --> 01:12:13,050 and returns the flights.html page. 1551 01:12:13,050 --> 01:12:16,500 And then I added a specific flight route, slash flights, 1552 01:12:16,500 --> 01:12:19,020 slash integer flight_id. 1553 01:12:19,020 --> 01:12:22,830 This again, as we saw in Flask, was our variable URL 1554 01:12:22,830 --> 01:12:26,790 whereby we have slash flight slash 1, slash flight slash 2, slash 28, 1555 01:12:26,790 --> 01:12:30,150 or any flight ID whereby then we're going to try and request 1556 01:12:30,150 --> 01:12:33,420 that particular flight and then get all of the passengers 1557 01:12:33,420 --> 01:12:39,150 on that flight returning the results in an html page. 1558 01:12:39,150 --> 01:12:41,250 High level questions about any of that? 1559 01:12:41,250 --> 01:12:43,500 I know I didn't go through all the code, but feel free to take a look at it. 1560 01:12:43,500 --> 01:12:45,480 It's basically just a combination of the things 1561 01:12:45,480 --> 01:12:48,780 that we've been doing already, using db.execute to make database queries 1562 01:12:48,780 --> 01:12:52,020 and then using Flask and our ability to generate templates to display 1563 01:12:52,020 --> 01:12:54,202 information in a way that looks nicer. 1564 01:12:54,202 --> 01:12:56,740 1565 01:12:56,740 --> 01:12:57,240 All right. 1566 01:12:57,240 --> 01:13:00,364 So what I thought we'd do as our morning project, rather than dive into web 1567 01:13:00,364 --> 01:13:03,720 applications right away is just to get set up with Postgres, 1568 01:13:03,720 --> 01:13:07,440 and trying to import data into a database, a Postgres database 1569 01:13:07,440 --> 01:13:09,900 that we can host on Heroku, such that we can write 1570 01:13:09,900 --> 01:13:11,400 Python scripts to interact with it. 1571 01:13:11,400 --> 01:13:13,230 And if you'd like to, later on we can begin 1572 01:13:13,230 --> 01:13:15,450 to build web applications around it as well. 1573 01:13:15,450 --> 01:13:17,880 And so I've given you a sample dataset. 1574 01:13:17,880 --> 01:13:23,340 If we go to CS50.harvard.edu/beyond, go to the Beyond page, go to Lectures, 1575 01:13:23,340 --> 01:13:28,560 and go to Today, you'll see I've given you a CSV file called Books Data, 1576 01:13:28,560 --> 01:13:31,470 which is a CSV file of 5,000 books. 1577 01:13:31,470 --> 01:13:36,540 So a whole bunch of books ranging from Shakespeare up until modern books. 1578 01:13:36,540 --> 01:13:39,800 If I download this, you'll need to-- 1579 01:13:39,800 --> 01:13:42,780 might need to create a Dropbox account to be able to download it. 1580 01:13:42,780 --> 01:13:47,760 But you download books.csv, and you'll get a CSV file. 1581 01:13:47,760 --> 01:13:55,320 We go back into books, you'll get a books.csv file that 1582 01:13:55,320 --> 01:13:56,780 looks a little something like this. 1583 01:13:56,780 --> 01:14:00,770 It's a whole bunch of ISBN numbers, titles, authors, and years. 1584 01:14:00,770 --> 01:14:06,140 And what we're going to have you do is create a database on Heroku, 1585 01:14:06,140 --> 01:14:08,930 first by creating a Heroku application for these books, 1586 01:14:08,930 --> 01:14:11,720 and then setting up a new Heroku Postgrs database. 1587 01:14:11,720 --> 01:14:14,480 Decide on a database schema in which to store these books. 1588 01:14:14,480 --> 01:14:16,910 What are the tables, what are the columns going to be, 1589 01:14:16,910 --> 01:14:19,950 what are their types going to be, are there any constraints on them? 1590 01:14:19,950 --> 01:14:22,580 You should have one table for authors and one table for books, 1591 01:14:22,580 --> 01:14:25,130 but beyond that, we'll leave the flexibility there up to you. 1592 01:14:25,130 --> 01:14:28,610 And then write a program to take the data from the CSV file 1593 01:14:28,610 --> 01:14:30,636 and import it into the database. 1594 01:14:30,636 --> 01:14:32,510 You'll find that you can reference import.py, 1595 01:14:32,510 --> 01:14:34,426 the one that I used in order to import flights 1596 01:14:34,426 --> 01:14:36,427 into the database as a potential starting point, 1597 01:14:36,427 --> 01:14:38,884 although this one is going to be a little more complicated. 1598 01:14:38,884 --> 01:14:41,240 Because whereas the import.py from the flights example 1599 01:14:41,240 --> 01:14:44,120 just imported things into a single table, the Flights Table, 1600 01:14:44,120 --> 01:14:47,840 here you're going to want to import the author into the Authors Table, the book 1601 01:14:47,840 --> 01:14:49,940 title into the Books Table, probably. 1602 01:14:49,940 --> 01:14:53,630 And it's possible that there are multiple books by the same author, 1603 01:14:53,630 --> 01:14:56,780 for example, that's inside the CSV file, that you 1604 01:14:56,780 --> 01:14:59,090 can assume if there are two authors with the same name, 1605 01:14:59,090 --> 01:15:02,239 they are in fact the same author, at least for these purposes. 1606 01:15:02,239 --> 01:15:04,280 And then if you have time, write a Python program 1607 01:15:04,280 --> 01:15:07,310 that lets you type in a title of a book or the name of an author. 1608 01:15:07,310 --> 01:15:09,200 Or if you'd like to, like a partial title, 1609 01:15:09,200 --> 01:15:11,850 or a partial match of the author's name and returns 1610 01:15:11,850 --> 01:15:14,820 you all of those possible results. 1611 01:15:14,820 --> 01:15:17,180 And so you can definitely use the source code examples 1612 01:15:17,180 --> 01:15:19,439 that we use today as a starting point for that. 1613 01:15:19,439 --> 01:15:21,230 But that's going to be the morning project. 1614 01:15:21,230 --> 01:15:25,310 Whereby at the time that you're done with it, you should be able to say-- 1615 01:15:25,310 --> 01:15:30,520 and I'll go into Heroku, I set this up already on Brian Books. 1616 01:15:30,520 --> 01:15:34,090 Whereby if I go to Settings-- 1617 01:15:34,090 --> 01:15:44,420 or actually, we'll go to Heroku Postgres, and go to Settings, 1618 01:15:44,420 --> 01:15:47,330 and get the database credentials. 1619 01:15:47,330 --> 01:15:55,930 You should be able to go to Adminer, log in with the server name, the user name, 1620 01:15:55,930 --> 01:16:00,130 the password, and the database name. 1621 01:16:00,130 --> 01:16:04,120 And when you do all that, you'll find that you 1622 01:16:04,120 --> 01:16:10,000 can see for instance of the authors and data associated with those authors. 1623 01:16:10,000 --> 01:16:14,260 And then all of the books and data associated with those books as well, 1624 01:16:14,260 --> 01:16:16,630 along with some way of keeping track of the relationship 1625 01:16:16,630 --> 01:16:18,520 between the book and the author. 1626 01:16:18,520 --> 01:16:21,470 And so we'll leave it up to you to figure out how exactly to do that, 1627 01:16:21,470 --> 01:16:24,140 but we'll go ahead and break out into different rooms. 1628 01:16:24,140 --> 01:16:27,497 We don't have room 136 this morning, we'll get it back later this afternoon. 1629 01:16:27,497 --> 01:16:29,830 But if I could have this side of the room go to room 212 1630 01:16:29,830 --> 01:16:32,170 and everyone else can stay here in the auditorium, 1631 01:16:32,170 --> 01:16:33,961 we'll get started with the morning project. 1632 01:16:33,961 --> 01:16:37,500 Break for lunch at 12:30 and we'll reconvene at 2:00. 1633 01:16:37,500 --> 01:16:38,664