[MUSIC PLAYING] SPEAKER: All right, let's get started. So welcome back, everyone, to CS50 Beyond. So just to give everyone a little bit of a roadmap, we are here on day three of CS50 Beyond, which is the middle of the course. We spent the first half of the class talking about the building and designing of web applications using HTML, CSS. We took a look at Flask on the second day, looking at how to build web servers. And then we took a look at JavaScript, in order to write programs on the client side, in order to create more interactive user interfaces. 

Today, we'll continue our discussion of building web applications, in particular by talking about databases, which are going to be particularly important if you ever go around to trying to store information inside of a web application, managing users, or managing any kind of data in an application. We saw a little bit of this in CS50. We'll dive a little bit deeper, look at some other options for databases, look at some more advanced tools like ORMs that allow you to interact with databases a little more easily. 

And then in the latter half of the week, starting tomorrow, we'll introduce a framework called React. And React is a JavaScript-based framework that is built by Facebook that's designed to make it much easier in order to design and build interactive user interfaces. You might have found that yesterday, as you were writing JavaScript-based web applications, you were running into situations where you were writing a lot of code to do things that should have been fairly simple, writing code to update variables and change the way that the page looked based on the values of those variables. React is going to simplify that process, make it a whole lot easier, and you'll find that it's a very, very powerful tool to very easily build interactive user interfaces. So we'll take a look at that starting tomorrow. 

But today the main focus is going to be databases. And so we saw this in CS50, but the basic idea of what a database is, is you can think of it-- at least in our terms-- as a collection of tables. And a table looks something along these lines, or you have columns that each refer to a different category of data that you might want to store, and then rows, one for each entry inside of a database. 

And the example that I'm going to be using throughout this morning and this afternoon is, in particular, thinking about the types of data that I might want to store if I were building a database for a web application for an airline. For instance, managing the flights of that airline, services, and the passengers that are riding on planes on that airline. And so that's the data that we're going to be trying to keep track of, and we'll see examples that pertain to that. 

And so you might imagine that if we're trying to create a table to keep track of all the flights that an airline has, for instance, we might care about the origin of that flight, where it's going from, the destination of that flight, and then a duration in number of minutes long that that flight takes. And what we'd like to do is come up with some way of interacting with this database in order to make queries to it. 

And so this is where S-Q-L, or SQL, came in. And SQL was something we introduced in CS50 as a language for interacting with databases, a language that made it easy to request data from databases, to insert data into databases, to update data as well. And SQL is a language. And there are a number of different databases that use this language and so there are a good number of these. But a handful that are probably the most popular include My-S-Q-L, or MySQL, PostgreSQL, and SQLite. 

SQLite is the version that we were using in CS50. SQLite is a SQL database that is entirely based in a file. Your whole database was stored inside of a file. So for CS50 finance, for instance, all your database information was stored inside of a file called finance.db. And SQLite is a fairly simple, lightweight database. But it's not so good if you want to think about potentially scaling a database to be larger, if you need to start managing permissions. If you need to be able to have much more efficient queries, or if you're trying to scale something to a production level sort of thing you want anyone to be able to use and access on the internet, generally SQLite is not recommended for that. 

And so you'll generally want to use a database that scales a bit better, that is able to handle the load that a web application might bring. And MySQL and PostgreSQL are the sorts of databases that you might use for that. And so the database we're going to be using over the course of CS50 Beyond is going to be a PostgreSQL database, which is just a different database that also uses the SQL language. So you'll notice there are some slight differences. And we're using this database in particular because it'll be very easy to deploy Heroku, the service we introduced a couple of days ago, to design and deploy a web application to the internet. Offers free Postgres databases along with any of your applications. And that's going to make it very easy for us to set up a Postgres database and then to be able to insert data into it. 

And so just know that there are a number of different types of databases that all use the SQL language. We used SQLite in CS50. We're going to be moving away from that now, because we're thinking about if we actually want to start building web applications, deploying them to the internet, what are the types of databases we're going to use. And a SQLite database is just not going to work quite as well for those sorts of purposes. 

So much like in any programming language, where we have different data types for different types of data, SQL likewise also has data types. These are the data types you might commonly seen in PostgreSQL. And different databases will have different types of some of the stuff you're going to see here. Differs a little bit from SQLite which you might be familiar with from CS50. 

INTEGER, as you might imagine, just stores an integer. A DECIMAL number is a floating point number, something that might have digits after the decimal point. SERIAL is a special type in Postgres. That's the equivalent of what the auto-incrementing feature of SQLite was. A SERIAL variable is like an integer that will automatically increment every time. So it'll start at 1, then go to 2, then 3, then 4, so on and so forth. And you'll often use that for the ID column of your table where, if you recall, the ID column of our table we used as our primary key, the primary way via which we would identify rows in the table. And usually that's going to end up being SERIAL. 

VARCHAR is used for a variable number of characters, like a string of text, for instance. A TIMESTAMP keeps track of a date and time. A BOOLEAN value could just be true or false. And then an ENUM is generally going to be one of a fixed finite number of values. So if you have a couple of possible options, and you only want to store one of those possible options in the database, an ENUM might be the type that you'll choose for that purpose. And so we'll have an opportunity to play around with that in just a moment. 

So how do we create a table in SQL? Well we'll use a command called CREATE TABLE. And in Postgres, at least, a CREATE TABLE query might look something along the lines of this. This is SQL code to actually create a table programmatically. So we're saying CREATE TABLE flights and inside of this flights table, there are a couple of things that we want to store. 

We want to store an ID, since every table probably should have an ID column such that we can uniquely identify any row in that table. The type of the ID column is going to be SERIAL. In other words, it's going to auto increment one after another, after another. I don't have to tell the table what ID to give a flight. It's just going to automatically give it a flight ID. It's going to be a primary key. In other words, this is the primary way by which I'm going to identify a flight. And if some other table wants to reference the flight, then by default, it's going to reference it by that ID. And we'll see an example of that later. 

What else do we need to store about a flight? We need to store the origin of the flight which is going to be VARCHAR, just some string, the destination of that flight, and the duration of the flight, which is going to be an integer. 

And so let's actually try to create this table now, by creating a Postgres database and trying to put this information in there. And so I'll show you an example of that. Now there are a number of ways to set up a Postgres database. I could set it up locally on my computer. You can install and download Postgres onto your computer, in order to run a Postgres server on your computer locally. But the easiest way for us to get started with this is probably just going to be to go ahead and go to Heroku, and try and create a Heroku application and add a database to it. 

So I'll go ahead and go to New and Create New App much as I did before in Heroku, and I'm going to call this Brian-flights-- see if that's available, all right, great, it's available. I'll create the app. And, all right, I've created a new Heroku application. And now what I'd like to do is add on a database to it. And so if I go to the Overview tab here on Heroku, I'll see installed add-ons, which is an area where I can add some additional features to my web application. And in add-ons, I'm going to do a search for Postgres. And here's Heroku Postgres. This is Heroku's own service for providing you with the PostgreSQL database. I'll click on that. 

There are a number of different tiers. The free tier of the databases is all we'll need for now. The free tier will give me 10,000 rows that I can insert into this database, which will be more than enough for our purposes. And we'll go ahead and click Provision. 

And all right, we've been able to create a new Postgres database. And if I click on Heroku Postgres now, we'll load it. And if I go into the settings, I'll be able to see the database credentials for this database. 

And so the database credentials for this database is information that I'm going to need if I ever want to access that database. So it gives me the host name for where the database is being stored-- it's being stored on Amazon servers-- the name of the database, the username, the port number, and a password for being able to access the database. 

Databases like PostgreSQL have user management. You have a username and password, such that it's possible if you're maintaining the database to create multiple different users, each of which has their own credentials, and each of which might have different permissions or access to the database. And so there are a number of ways you can actually try to access this database. 

If you install PostgreSQL locally onto your computer, you can-- let me shut this down a little bit. If you install Postgres onto your computer, you can run a command like PSQL for PostgreSQL, followed by that database URL, and then you're going to be inside of a terminal where you can begin to run queries. I could just type, Create Table Here, and that will work just fine. 

So I could actually take this code, Create Table Flights, and just go ahead and paste it in to Postgres here, and press Return. And if all goes well-- OK, great. It created the table. And if I do backslash dT is the command to list all the tables-- looks like internet connection is a little bit slow-- but I can see that, great, I have a Flights Table there now. 

I'm going to go ahead and drop table flights to delete that table so I can show you a different way to create the table if you'd like to without using the command line. There are a number of services that you can use to interact with Postgres in a graphical way. Much in the way that we had phpLiteAdmin in CS50 to be able to interact with a SQLite database. 

CS50 offers one such service that we host called Adminer. So if you go to Adminer.cs50.io, you'll be taken to a user interface that looks a little bit like this that will let you log in to a Postgres database by providing information about the server, the username, the password, and the database. So if I go ahead and do this, take the server name, the host name, paste that in there, take the username, paste that into username, take the password, paste that into a password, and take the database name and paste that into database, I can go ahead and log into this database. 

And what I'll see when I log in is an interface very much like phpLiteAdmin, an interface where I can begin to execute queries, and I can look at my data and manage my data in that way as well. So in order to create a table, there are a number of ways to do this. I could just click the Create Table button, which will take me to a graphical interface where I can name the table, give it columns, much as you may have done in CS50. 

I could also just type raw SQL commands. And so that's what I'm going to do here. Just type SQL Command over here on the left hand side, which is going to open up a place where I can just run SQL queries, any queries that I want to run. And in this case, I'm just going to take this Create Table query, paste it in here, and go ahead and execute that query. 

All right. So it said, Query Executed OK. And now if I look here on the left hand side, you'll see that I do now have this table, called Flights. And that table, called Flights, has an ID, it has an origin, it has a destination, and it has a duration. 

So long story short, I went to Heroku, set up a new application, attached a Postgres database to it. That gave me access to all of the credentials for the database, username, and password, and host name, and such. Then you can access that database in any number of ways. 

If you have Postgres installed on your computer, you can access the database via the command line. Otherwise you can just use the web interface using something like Adminer.cs50.io to log into the database, and then to be able to manipulate it. Questions about anything so far? Yeah? 

AUDIENCE: Is there invalid [INAUDIBLE] to varchar columns? Like, [INAUDIBLE] for the kind of string [INAUDIBLE], but like an emoji, or something along those lines? Are there things that varchar can't force into into a string? 

SPEAKER: Are there things that varchar can't force into a string? There might be binary data types. You probably wouldn't be able to store an image for instance, there are different types for dealing with binary data that are not just plain strings, and so you'll probably want to pick and choose the type appropriately for that. Yeah? 

AUDIENCE: Can you go through the IDE to get to the [INAUDIBLE]? 

SPEAKER: Can you go through the IDE? There may be a way, depending on the version. But generally speaking, this is just accessible via the internet, so you'll just go to a URL and you'll be able to access it. Yeah? 

AUDIENCE: I'm not too sure how the [INAUDIBLE] Adminer [INAUDIBLE] related to the Heroku. 

SPEAKER: Oh, good question. What's the relationship between Adminer here on CS50 and the Heroku database? Heroku has set up the database for me, and is effectively hosting the database. So if you took a look at the host name, you'll see the database is actually being hosted on Amazon servers. 

And all Adminer.cs50.io is doing, is it's connecting me to a database located somewhere else on the internet, and I'm interacting with it via this graphical interface. And so the database itself is being hosted on Amazon servers somewhere, but so long as I know the credentials, like what URL to access, what username, what password, I can access that database on Amazon servers in any number of ways. 

I can access it via the command line, as you saw me do a moment ago, or you can access it via any number of database browser applications, of which Adminer is one example of it, whereby you just type in the credentials for the database and that will connect you to the database and let you interact with it online. Yeah? 

AUDIENCE: So do you solve the database locally? Would you still be able to do [INAUDIBLE] 

SPEAKER: So if you have the command line application, the PSQL application that I was using a moment ago, you can run that on the command line to access any database on the internet by just typing in the URL for that particular database, which is what you saw me do a moment ago. It's also possible to not host your database on Heroku servers, to just host the database locally on your computer, and then it's not going to be accessible by anyone on the internet unless you open it up to make it so. 

All right, we'll keep going. All right. So we saw the Create Table Query, creating a table called Flights. When you create a table, there are additional constraints that you can put on those tables. And so you may have seen some of these in CS50. 

We can specify not null on a column. So if I said, origin is a varchar and it's not null, that would mean that the column cannot be empty, so there needs to be something there. Unique is another keyword you might use, if something needs to be unique. So you might imagine, if users are signing up for a website, maybe the email column is going to be a unique column, because you only want each email address to be associated with at most one account, for example. 

Primary key is one that we saw before. Default is something you can use to specify what the default value of a column should be. So for instance, if you wanted to say that the default value of some integer should be 0, or 1, or whatever number you want it to be, you can specify some integer column and then say, default 0 to mean the default value of the column is 0. If I don't specify a value, here's what the default value should be. 

And then check is an additional one that you can use to add a constraint to a column that is Boolean logic. I could say, I want a column called age that is an integer. And then I could say, check age greater than 0. In other words, make sure that the age that I'm inserting into the database is positive. I don't want to allow invalid input into the database like negative ages, for example. And so you can use check to enforce certain constraints on part of the database itself, so the database will manage those constraints automatically for you. So additional constraints that you can use there. Yeah? 

AUDIENCE: If you [INAUDIBLE] data on the check [INAUDIBLE] are you going to have an error message? 

SPEAKER: You'll get an error message, yeah. Yeah? 

AUDIENCE: Can you use regular expressions with check? Or is just like, [INAUDIBLE]? 

SPEAKER: You can use regular expressions, maybe. You'll have to check the documentation on that. I'm not 100% sure. 

All right, so we've created the table, now we'd like to insert data into the table, to actually add information into the table. The syntax for this looks pretty much the same between PostgreSQL and SQLite. You'll use syntax that looks like this. In Postgres, the single quotes are how you define strings, not double quotes. And we'll say something like, insert into, meaning add something into a table. 

Which table do I want to add things into? I want to add things into the Flights Table. Next, in parentheses come, what are the column names that I want to add things into? I want to add which particular values, I want to add an origin, I want to add a destination, and I want to add a duration. Now which column on my table is missing from these column names that I specified? 

AUDIENCE: ID. 

SPEAKER: ID, great. And why do I not need to specify that? 

AUDIENCE: [INAUDIBLE] 

SPEAKER: It will automatically increment, great. It automatically adds to the ID, so I don't need to worry about inserting it. And if I had other columns that had default values for instance, I wouldn't necessarily need to include them here either if I wanted them to just take on their default values. I only include the columns that I actually want to insert values for. 

Because after the keyword VALUES in all capital letters comes, what are the actual values I want to insert into this database? I want to insert New York as the origin, London as the destination, and 415 minutes as the duration for that particular flight. And so that that query will allow me to do that insert 

So I'll show you an example of that using Adminer's interface, at least for now. I'll go ahead and go to SQL commands. And here, you could write something like, insert into flights. Origin, destination, and-- what was the last one-- duration. And then insert particular values, and I'll say, OK, New York, and London, and 415. And I can add multiple different flights. 

I'll go ahead and run a couple queries. Instead of New York, let's say Shanghai. And this flight is going to go to Paris, and that's going to take 760 minutes. And I'll add one more. And we'll say this flight is going to go from Istanbul to Turkey, and it's going to be 700 minutes. So just adding some sample data that I can then use in order to insert a bunch of flights. 

So I say, OK, these queries have all executed OK. One row affected, meaning I've added a row into these databases. If I go into Flights and go to Select Data-- and of course there are a bunch of different database browsers, Adminer is just one of them, but they all operate in fairly similar ways. I can see here, I've got my table where I've got three rows, each of which has a unique idea, each one has an origin origin, a destination, and a duration associated with them as well. 

Just for the sake of giving myself a little more data to work with, I'm going to go into-- insert-- I'm going to go ahead and grab three extra rows and just insert those. So I'll go ahead and run three more insert queries, just to give myself a little bit more data. That'll make things more interesting in just a moment. 

And so now if I look at my Flights Table, Select Data, all right, great. I've got six rows inside the database now. Questions about inserting data into a database. This much at least should be review from CS50. 

All right, we'll briefly go through selecting data from a database. Again, this is mostly review from CS50, but good practice if you haven't used SQL in a little while. The basic way to select things from a database is a query that looks something like this. Select star from flights. Flights is the table I want to select things from. Star means I want to select every column, I care about getting all the columns from the database. 

And that might not always be true. It might be more efficient to only ask for the columns that you actually care about. But if I want to select all the columns from the flight's table, what I'll get back is all of the data. Every row in the database is going to come back to me, for this particular table at least. 

But maybe I don't care about the duration, maybe I don't care about the ID, I only care about knowing, OK, where flights coming from and where are they going to? I can revise the query to look a little something like this. Select the origin, the destination from the flight table, just specifying the columns that I want to select. And rather than selecting all of the columns from the table, that's going to result in just selecting the two columns that I actually care about, the origin column and the destination column. 

Still here I'm selecting every single row, so you can also add what's called a WHERE clause to specify which rows you actually want to select. So you can write something like, select star from flights where ID equals 3. In other words, just select flight number three and that will only return all of the columns-- as indicated by the star-- from the table row that has an ID of three. All of this at least so far should be review from CS50, but questions before I move on? 

Queries can return multiple roles as well. If I did a query that looked a little something like this, select star from flights where the origin is equal to New York, well that's going to select only the flights where the origin column has a value of New York, and that will return just those two, for instance. And it doesn't just have to be a quality. You can use other Boolean expressions here as well, greater than, less than. 

For instance, I could say, select all the flights who have a duration of greater than some number of minutes, and what I'll get back is all of the rows that match that query, only the longest flights, for example. And because they're Boolean expressions, you can combine them in order to combine multiple Boolean expressions to create more complex queries. I could say something like, select start from flights where the destination is Paris and the duration is greater than 500, and that will only give me flights that have a destination to Paris and that are longer than 500 minutes, even if there are other flights that are going to Paris, even if there are other flights that have a long duration. 

Of course if you change the and to an or, that will create a different Boolean expression. That will return to you anything that either is going to Paris, or is a long flight, or both. And that's totally fine too. 

This is all stuff you saw in CS50. There are a couple other types of queries that we didn't see in CS50. So things like, if I want for a column to be in some possible set of possible values as opposed to one particular value, I might have a query that looks something like this. Select star from flights where the origin is in-- and then this tuple of New York and Lima. In other words, I want the origin to be one of those two cities, but I don't really care which, just return to me all of the rows that match either one. And so we can do a query like that in order to get those results. 

And sometimes I want to be able to perform a search, but I don't quite yet have the entire column name. Like maybe I want to be able to implement some sort of auto-complete feature, whereby if I start typing the name of a city, my airline application is able to figure out what flight I'm referring to, even if I don't type in the whole thing. Or you might imagine that search engines do things like this, where you can type in a word, and even if it's only a partial match of something, it will ultimately result in matching that particular row. 

And so to do that, you might use syntax that looks a little something like this. Select star from flights where origin is like-- keyword like-- and then %a%. The percent sign here is a wildcard, basically means, this could be anything. And so what I'm here saying is, the origin has an a, and before that could be anything, and after that could be anything. In other words, get me any flight where the origin has the letter a in it somewhere, and I'll get back all the flights that have an a in it somewhere. 

So you imagine an application where I'm just typing in where I want my flight to be going from, I type the letter a, we might run a query like this, return the possible results, and those are going to be the matches for all the flights whose origin matches a particular expression like that. Yeah, question? 

AUDIENCE: Is that case sensitive? 

SPEAKER: Is that case sensitive? It's probably going to depend a little bit on the database implementation. There are some databases that are not case sensitive, and I can't remember exactly which one this is. 

In addition to that, there are some functions that you can use as well. I can sum up rows, I can count the number of rows that match something, get the min, or the max, or the average of something. And so I'll show you some examples of that as well. 

So let's go ahead and go into SQL Command where I can run a SQL command. And the basic command is just, select star from flights, which I can type in there, click Execute, and the result of that is going to be-- all right, all of the rows, as I can see here. Select star from flights just returns me everything. I could also say something like, all right, SELECT COUNT star from flights. 

Count star-- star just meaning all the columns, and count just meaning count these things-- means, get me just the number of flights, if I just care about the number of flights that have a certain property. I press Execute and I get back a table whose column is count and just has a single row, and that rows value is 6, for example. I could add to this query, SELECT COUNT star from flights where the origin equals New York, for example, to just get the number of flights whose origin is New York. 

Press Execute. And all right, what I get back is 2. And now, you can imagine that this is not necessary for your web application, right? If you wanted to know the number of flights that are going from New York, you could just query for all the flights that are from New York in Python, for example, as you might have done in finance or something similar, and then just count the number of rows that come back. 

But why might this be preferable, using built in SQL functions? Yeah? 

AUDIENCE: A little faster? 

SPEAKER: Yeah, it's a little bit faster. The database is only returning the information I actually care about, just the count. And it's not returning the information I don't care about. If I don't care about where any of these New York flights are going, and I just want to know how many results there are, then all I need to do is return this count. 

And so you can do this for other things as well. I could say, all right, give me the average duration from the Flights Table. In other words, take all the flights, take their duration column, and just go ahead and give me the average duration. And I can press Execute, and all right. The average duration is 501 and 2/3 minutes, for example. I can add to that, like the average duration of flights to New York or whatnot, in order to add arbitrary logic to my queries as well. 

Questions about selecting data? All right. Updating data, again, very similar to what we saw in CS50, it's just going to be a review. The syntax looks a little something like this. 

You update the flight name, called Flights-- or the table name called Flights. Setting a column to a particular value, so setting duration of 430 where a particular condition is true. So the New York to London flight, I want that to have a duration of 430, for example. 

And deleting flights, again, just a review from CS50 looks a little something like this. DELETE FROM, name of the table. And then if I just had DELETE FROM Flights, that would delete every flight, which is not what I want to do, hopefully. What I want to do is specify a condition, delete from flights where the destination is Tokyo, if for some reason I didn't want any of my flights to go to Tokyo. You could do a command like that in order to get that result. 

Questions on anything so far? There are other clauses that you can add to your queries as well. If I say, select star from Flights, LIMIT 5, for example, that will just get me five flights instead of all of the flights, if I only wanted to get a subset of them. 

ORDER BY is a way that you can sort what order the results come back in. And I could say, select star from Flights, ORDER BY duration, and then ASC for ascending, or DESC for descending, in order to get all of the flights sorted by their duration, for instance. And I can combine these. If I said, select star from flights, ORDER BY duration, and then LIMIT 5, I'll just get the five shortest flights, for example, because I'm sorting them in order of duration and then just getting the first five rows. So that works as well. 

GROUP BY is a way of grouping results together. So if I want to count how many flights are coming from each of the various different locations, you can group things by their origin as well, for instance. So I could say something like, select origin from Flights. And then if I say, GROUP BY origin, what that's going to do is it's going to take all of the rows that have the same origin and group them together into the same result. 

And so instead of just selecting origin, I'm going to select origin and count star. Recall that count star is going to count how many rows return for a particular query. And so I'll show you what this looks like, and then we'll look at the query and see why it works. 

And we get, OK. Shanghai, 1, New York, 2, Istanbul, 1, Lima, 1, Moscow, 1. And what that's basically doing is telling me how many flights have any particular origin. What I was doing is, by saying GROUP BY origin, I took all of the rows, grouped them by their origin, then I'm selecting just the name of the origin, and then the count of how many rows are there in that grouping. 

You could do the same thing with other functions. If I wanted to find the average length of a flight coming from any particular city, you could run the same query in order to get that information as well. And so grouped by you can use in order to collect results together in order to get more interesting queries there too. Questions on anything? 

All right. Let's go on then and talk a little bit briefly about foreign keys. And so where SQL gets very powerful and where we really start to use it a lot more is in terms of relationships between tables, again, something that you saw in CS50, but that we'll explore a little more today. Well, we might have a flights table, but that flights table maybe wants to store additional information. 

In addition to just storing the origin and the destination, maybe we also want to store information like what the airport code is for that particular airport. So I could do something like this, have an origin, have the origin airport code, have the destination, have the destination airport code, and just add columns to my existing table. But this might not be the best design. 

So what I want you to start doing today is start thinking about database design. How should your tables be organized? Should you have multiple tables? What fields belong with which tables? 

And you imagine that if I start doing things like this, my database very quickly is going to start to get complicated. I'm going to run into situations where, OK, maybe now I also want to store the latitude and longitude of each of these airports, and now I'm adding more and more columns to this table. And so what I could do instead is separate things out into a different table, partition the database in order to separate out information that I want to store separately. 

Just have a locations table where every location has the location code and also the name of that particular airport, or the location where that airport is. And then my Flights Table is going to look like what? What would the flights table look like if I have locations stored in a structure like this? What columns do I need? Yeah? 

AUDIENCE: Well, you wouldn't need the-- if you had a flights table separately, you wouldn't need the whole [INAUDIBLE] airport. [INAUDIBLE] Or you could reverse it, right? 

SPEAKER: Yeah. You don't need to store now both the airport code and the name. You can just pick one column of this, ideally a unique column-- and the default choice is going to be the ID-- to say, if I know that the flight is going from location ID 1 to location ID 2, then I know it's a flight from New York to Shanghai, for example. And so my flight's table can effectively just turn into this, whereby I have an ID for the flight, and then rather than storing the origin and the destination, I store the origin ID and the destination ID, and then the duration for that flight as well. 

And then if I wanted to store passengers on flights, I could do something along the lines of this where I store a whole bunch of people's names, and then just store what flight number they're associated with. And so the SQL gets very powerful when you begin to reference other tables by their ID, or by their primary key, and then be able to relate them together. And so we'll take a look at this. 

I will now go ahead and create a Passengers Table. And if I go in to joins.SQL in just a moment, let me select some rows. I'm going to run a couple queries. I'm going to create a table called Passengers, where Passengers is going to have an ID. 

It's going to have a name of the passenger and it's going to have a flight ID, which is going to be an integer. And interestingly here, I added to the end of this query. I added flight ID integer references flights. In other words, I'm saying that this flight ID column is going to be a column that is going to reference some other table. And in particular, it's referencing the Flights Table. 

This flight ID is going to refer to the ID of the Flights Table. And now I can insert data into this table. I can add a whole bunch of names in the flights that they're associated with so that I can add passengers to this flight as well. 

So I'll go ahead and click Execute here. And all right, I've created a Passengers Table and have added a whole bunch of passengers into my database. Questions before I go on? 

All right. So the next thing that I'll want to do is somehow join this information together. I have data stored separately now, I've got a Flights Table, and I've got a Passenger Table. But in many cases, I'll want to be able to access that information collectively. Like, I want to know, Alice, where is her flight departing from? And I want to know the origin of Alice's flight. 

And that information is not stored in a single table. It's stored in two tables, because Alice is stored in the Passengers Table, and her flight is being stored in a Flights Table separately, and I'd like to be able to relate that information together. And so there are a number of different types of joins that I can do. 

The default kind of JOIN that we'll do is what's called an INNER JOIN, which is basically just taking all of the rows that match between two tables and only returning their intersection, the results that have then come together. LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN are different types of joins that work in slightly different ways. We're not going to worry about them too much because INNER JOINS will usually take care of what we need to do, but just introducing yourself to the idea that there are other types of queries for other different situations, depending on your use case. Might be interesting, as well. 

But the syntax for what this is going to look like is going to look like this. I would like to select information about all the passengers and where they're going from and where they're going to. So if I were to just select star from passengers, just select all the data in the Passengers Table and click Execute here, what I get is something like this. A whole bunch of IDs, names of the passengers, and what flights they're on. 

But I would like to get not just what flight they're on, but their origin and their destination, where they're going from and where they're going to. So I'm going to modify this query slightly. I want to select the passengers name, but I also want to select their origin and their destination. 

But unfortunately, I can't just select this from the Flight's Table, and I can't just select this from the Passengers Table, because none of those two tables contain all of this information. The Passengers Table contains things like the name, but the Flights Table contains things like origin and destination, for example. And so what I want to do is, I want to join these tables together. 

And I'm going to explicitly say, INNER JOIN these tables together. And I want to join at the Flights Table with the Passengers Table. So I want to join two flights together. And then in the query I need to specify, what is the relationship between these two tables? How is it that these two tables are connected to each other? 

And the way that these two tables are connected to each other-- going onto a new line just for clarity here, though it doesn't need to be on a new line. I'm going to join these on what condition? Well, passengers.flight_id, so Passengers is the name of the table, .flight_id is the column. So this column, the flight ID column of the passengers table, needs to equal what? 

What should this be equivalent to if I want to join these things together? 

[INTERPOSING VOICES] 

SPEAKER: Flights.id, great. Flights, again, is the name of a table, and ID is the name of a column. And so what I'm saying here is that only join things together if the flight ID of the passenger matches up with the ID of the flight. I want to merge those things together. 

So I'll go ahead and click Execute here. And what I get is a new table that's returned back to me that is exactly the information that I want. I've joined the information from two different tables, the Passengers Table and the Flights Table, and I have a list of passengers, and also their origin and their destination, all returned to me in a single query. Yeah? 

AUDIENCE: If you have two tables that you're trying to [INAUDIBLE] and there happen to be [INAUDIBLE] of a column that both tables have, can you just not reference that name [INAUDIBLE]? What happens if you select only [INAUDIBLE]? 

SPEAKER: Excellent question. So what happens if I'm doing this select query and two tables have a column that have the same name? How do I deal with that situation? You'll just have to disambiguate it to tell Postgres which one you actually mean. 

And so you would do something like, rather than just select name, I would say select passengers.name. Meaning, from the Passengers Table, select the name column. If you don't include it, it's just going to default to assuming that there's only going to be one of them and just returning that. But you can be more explicit about it and specify the table name as well. Yeah? 

AUDIENCE: Will this table automatically upgrade as you add more information to other tables? 

SPEAKER: Would this table automatically update? This table is only computed one time. When I run this query, it generates this table and gives it back to me, the person running the query. And so if I update the table, I'll need to run the query again, and then it will be updated with the latest information. Correct. Yeah? 

AUDIENCE: If you're trying to do it the other way around, with the passengers [INAUDIBLE], would it just only [INAUDIBLE]? 

SPEAKER: If you're trying to get the passengers map to the flight IDs? What you'd probably get is you'd get one row for each pairing of passenger and flight. So a flight would appear multiple times. 

And that's actually what's happening here, that we have this flight, New York to London, showing up multiple times because both Alice and Bob are on it. 

AUDIENCE: OK. 

SPEAKER: Other things? All right. We'll do a couple other things about SQL queries. 

One thing that's just good to be aware of is the idea that you can create an index on a table for a particular column. This is particularly useful as you begin to scale a database. And an index on a table is very similar to an index on like a book, for instance. 

If you want to be able to find something more easily in a book, you can add an index to the book whereby in order to look something up in the book, you just go immediately to the index, find the thing you're looking for, and that tells you where to go. A table can have the same thing. If you add an index to a column, like the origin column of our Flights Table, then I can very easily look up a flight by its origin. I just need to look up the origin in the index and immediately select that data. 

So the create index query-- adding an index to your table will make your select queries faster. It will make it faster to query data from columns that you have an index on. So why then, given that information, do you think we might not just want to create an index on every column? 

We add an index to a book, makes it easier to select data. Why not just add it to every column? Yeah? 

AUDIENCE: [INAUDIBLE] bigger [INAUDIBLE]? 

SPEAKER: Makes the database bigger, for sure. I need to store more information now because in addition to storing all the data, I also need to store the index. Great. Why else? It's a good reason. There are other things too. Yeah? 

AUDIENCE: Just like, plug in indexes can't have entities on every single [INAUDIBLE] book talks about. Maybe if you have an index on too many things, it actually [INAUDIBLE]? 

SPEAKER: It could make things more difficult. So certainly you're getting into a good idea here, that there are trade-offs involved. And so in particular, one trade-off of creating an index is that now we need to keep updating that index any time your data changes. 

If I insert a row, delete a row, update a row, I might need to update the index in order to reflect these new values. And so while creating an index on a column makes select queries much faster, you can select data much more readily, much more easily, it does actually slow down certain queries. In particular, slowing down updates, and inserts, and deletes, because we also need to in addition to changing the data, also change the index as well. So just introducing that and those trade-offs, in case you see them in the future. 

One final type of query that I'll show you are nested queries, which might prove useful sometimes. So imagine that, OK, here I have my Flights Table, it's got origin, it's got destination, it's got a duration. And I have my Passengers Table, it's got an ID, it's got the name of the passenger, and it's got what flight they're on. 

Here is a more sophisticated GROUP BY query. Select flight_id FROM passengers, GROUP BY flight_ id HAVING-- and this a key word I mentioned, but we didn't really show or demonstrate-- COUNT star greater than 1. OK. Let's look at that query, try and tease it apart, and someone translate for me in English, what do you think this query is doing? 

Selecting the flight ID from passengers, GROUP BY flight ID, HAVING COUNT star greater than 1. I'll show you what the table actually looks like. Here's what the Passengers Table looks like. What happens if I run this query on the Passengers Table? What type of question am I asking? Yeah? 

AUDIENCE: If I say GROUP BY flight ID HAVING COUNT greater than 1, then it would group the flight IDs that have more than one flight ID. So it would be 1, and 2, and 6. 

SPEAKER: Great. We're going to get 1, 2, and 6, because what I'm doing here is I'm grouping all this table by the flight ID column, just putting them in groups based on their flight ID. And then I'm selecting only the ones that have count star greater than 1. In other words, when I count the number of rows, there are more than one of those rows. 

So what I'll get back is a table that just looks a little something like this. Flight ID 1, 2, and 6. These are the flight IDs that have more than one passenger on them. 

Feel OK about that query and how that works? Again, it's one of the more sophisticated queries. Not one we saw in CS50, but it's something potentially useful. Yeah? 

AUDIENCE: What was the meaning of count? 

SPEAKER: Count just counts the number of rows that are returned by the query. So the number of rows that have a flight ID of 6 is two, and that's greater than one, and so 6 is included. But the number of rows that have a flight ID of 4 is only one. And so 4 is not included in my result set. 

And so nested queries allow me to take a query and put it inside of another query, which SQL allows as much as you want to do. So I can take this query, again, this query that returns to me just the flights that have more than one passenger, and I can just put that into another query. I can say, SELECT star FROM flights WHERE id is IN, and then that big query that we just did, that query that returned to us flight numbers 1, 2, and 6. 

And so what is this query going to do? Yeah? 

AUDIENCE: [INAUDIBLE] 

SPEAKER: Exactly. We're going to get all the flight information, the origin, the destination, and the duration for only the flights that have more than one passenger. And so if you wanted to ask a question, like all right, here's my table of flights, I don't want all the flights, I only want the flights that have more than one passenger. Or perhaps a more realistic situation, select all of the flights that are over capacity, for example, you could run a query that looks a little something like this, get back only the results from the table that you actually want. 

And so depending on the question, these queries might become a little more sophisticated. But by using GROUP By, having nesting queries within each other, you can begin to construct a SQL query that will match exactly what it is that you're going for. Questions about anything so far? 

All right. I would be remiss if I didn't talk about a couple of vulnerabilities that come about with SQL. These are things that we talked about in CS50, but just reintroducing them so they're at the forefront of your mind as you go about thinking about writing programs that interact with databases. The first of which we talked about in CS50 is SQL injection. And what is SQL injection again? What is the threat? Yeah, Chris? 

AUDIENCE: [INAUDIBLE] 

SPEAKER: Yeah. The user trying to input SQL into some string input in the application, such that they can try and run malicious potentially SQL code on your database. So imagine a login system with a username and password feel that look a little something like this, and a SELECT query where we have SELECT star FROM users WHERE the username is, and then plugging in the username there, and the password is, and then plugging in the password there. 

Whereby if someone logs in normally with just username Alice, password 12345, then we take the names Alice and 12345, and we plug them in. Alice in the user name, 12345 in the password. And this might be a way via which we select data from the database. In reality, you also want to hash your passwords, but that's a separate matter. 

But imagine some malicious hacker decides to type in something a little something like this. Username is hacker and the password is this sort of odd string, 1' OR, and then '1 equals, and then '1, without a closing single quote, which seems strange. But when that gets inserted into the query, we end up plugging in hacker as the username. 

And when we plug in exactly what he typed as the password, we get SELECT star FROM users where username equals hacker and password equals '1' or '1' equals '1'. And this is always, of course, going to be true. And the result of this is that this query could result in giving a user access to an account that they shouldn't have. It could result in someone being able to access data, modify data, delete data. And so this threat is definitely very real. 

And in a moment when we actually take a look at Python code for being able to interact with databases, we'll look at different ways that we can prevent this and libraries that can help us try and prevent injection attacks, as well. Yeah? 

AUDIENCE: Do you need to [INAUDIBLE] or no? Like, could you just [INAUDIBLE]? 

SPEAKER: Well, the idea here is that the query that we're running already has password equals something. And so we have to come up with something to fill in here that will make this query always true. And so one of the easiest ways to do that is just compare two things that you know to be true. And so even if the password is wrong, this half of the OR query is true, and so the whole OR query is true. You can make the whole expression true if just half of the expression is true as well. 

AUDIENCE: Do you mean like the [INAUDIBLE]? Or could you put like, WHERE password '1' equals '1'? And wouldn't that [INAUDIBLE]? 

SPEAKER: So password refers to a column in the database. So this password is some password that's stored in the database that the hacker presumably doesn't know. And so they wouldn't be able to necessarily guess a password that is equal to the password in the database. If they could, they would just log in normally. And so this OR query guarantees that whatever it is that they're typing in is going to be a true expression. 

All right. And the other vulnerability, or at least security potential threat that you need to be aware of, something that can happen is a race condition. Race conditions are what happens if you aren't too careful about how different interactions with the SQL database in this case are going to interact with each other if they're happening concurrently. Two things that are happening at potentially the same time. 

Imagine, for example, that someone has a bank account and you run a query, SELECT the balance FROM the bank account. And that gives you, OK, the balance is $100. And then you say, OK, I have $100 in the bank account, let me go ahead and withdraw the $100 from the bank account, updating the bank, setting the balance to be balance minus $100 for this particular user, and then we'll give them the $100. 

And that interaction seems fine. But what happens if the user tries to open up two windows on their computer, or go to two different ATMs, and try and do this same exact thing at the same time? Now a computer, of course, can only process one thing at a time. So these threads, so to speak, might cross over each other. And so what can happen there after the bank balance goes to 0 in this case? 

Let's imagine a situation where on one side of things, in one window at one ATM, we ask, OK, how much is in the bank? And we say, OK, there's $100 in the bank. And at the same time or slightly after this point, someone else is asking, OK, how much balance is in the bank? And based on the current value of the bank account, we're also going to return $100. 

And so now both sides of this, both running threads, so to speak, think that there is $100 in the bank that they can withdraw. Which is true, there is $100 in the bank that can be withdrawn. But both sides think this, and so this side runs the query, OK, update the bank, set the balance to be balance minus $100, give the user $100, that changes to zero. And as far as this side is concerned, there is still $100 in the bank, and so it can run the same query, and now you're left with a bank account that has negative $100. So that's a potential issue to deal with as well. 

And the way that we generally deal with those problems is through something called SQL transactions. Which is, rather than allow for different queries to intervene, just group a whole bunch of queries together into a single transaction, and then when you're done with all those queries, commit them. And that will guarantee that all of those queries are run uninterrupted, that no other queries can intervene and get in the way of that. 

And so in a moment when we turn to Python code, we'll see where we're going to run queries, but we're not actually going to run them on the database until we decide to commit them. To say, take all these queries, group them together into this transaction, and run this transaction now. All right. 

So now we've introduced SQL. Some of this was review from CS50. But now let's turn our attention to how to actually write Python code that interacts with the database. And the library we're going to be using for this is called SQLAlchemy. 

We're going to move away from the CS50 SQL library, which you may have used in CS50 Finance, or in final projects to work on databases of your own. And just take a look at Python code you can write in order to interact with the database. And so let's start with an example that will just list all of the flights that are currently inside of the database. 

So right here, we have a list.py And I'm importing some functions from SQLAlchemy. No need to worry too much about what all this means. But the important thing here is that I need a database engine that's going to allow me to interact with the database. 

And when I create that database engine, I need to tell Python what database to use. And in order to do that, I'm going to need the URL for the database. And the URL for the database, at least in Postgres's case, is this thing here under URI, this long URL is the URL for my Postgres database. It's got all the credentials that I need. 

And I could have just said, create engine, and then in parentheses plugged in that exact URL as a string right there. You can do that. But oftentimes for security reasons, it's best not to, best not to put your database credentials, usernames, and passwords, and whatnot inside of your code. Because then anyone with access to your GitHub repository, for instance, can see exactly what the credentials for your database are. They can get into your database, they can see all your data. 

So what I'm doing here is putting-- I'm going to put the database name into what's called an environment variable, a variable that just exists on my computer system. And what I'm telling the Python program here is rather than just literally plugging in the string for the database into my Python program, I'm saying, I'm going to store the database URL inside of an environment variable that I'm going to call database_URL, but you could call it anything you want. 

And then I want the Python program to just draw the URL from the environment variable. Read that environment variable into memory and then use it, such that I don't need to put the name of the database and the credentials for the database literally in my code. And so in order to set that environment variable on a Mac or on Linux, you'll write something like Export database_URL equals-- and then I'll just type in the credentials for the database. 

If you're using Windows, you'll use the word Set instead of the word Export. So I've exported the database URL, set that environment variable in my system, and now what is this code actually doing? Well, let's look at the main function. 

We're saying flights equals, and then db.execute is my way of executing a database query. Select origin destination duration from flights, and then FETCH ALL is my way of saying, get me all of the results back from this. SQLAlchemy offers a number of different functions. You can do FETCH ONE to just get you a single result back from the database as a single entry. FETCH ALL is going to return to me a Python list of all of the results that came back. 

And then I can say, OK, for flight in flights for each one of these flights, just loop over them, go ahead and print out this formatted string, plugging in flight.origin. I'm using dot notation to say, get at the origin property of the flight, and get at the destination property of the flight, and get at the duration property of the flight, and just plug it in in those values. 

And so I'll run the program so you can see how it works, and then we'll go back to the code if you want to take another look at it. I run Python list.py And what I get is, all right, here are all of the rows inside of my Flights database that I've formatted using format strings in order to display to the user. 

I'll show you the code for that again. Take a look at it. Yeah, [INAUDIBLE]? 

AUDIENCE: Why do you use FETCH ALL if you don't [INAUDIBLE] FETCH ALL [INAUDIBLE]? 

SPEAKER: Why do you need FETCH ALL? In this particular case, you might not. It changes a little bit the type of data that gets returned, although both types of data I think are things that you can loop over. So you might be able to get away without FETCH ALL in this case. Yeah? 

AUDIENCE: Do the environment variables disappear any time the terminal window quits out or closes? 

SPEAKER: Yes. If you close your terminal window and reopen it, that will reset your environment variables. So you'll probably need to set the environment variable again. There are ways of setting it up so that every time you open a new terminal it will automatically set the environment variables appropriately. 

If you're using Bash on a Mac or on Linux for example, you can add something to your Bash profile, and that will automatically run the setting of environment variables anytime you open a new Bash Shell, anytime you open a terminal window. And we can help you work on that during project time if you want to set that up. Yeah? 

AUDIENCE: Sorry, just following up. So if you were pushing something to Heroku, you wouldn't want to set the variables [INAUDIBLE] anything is being [INAUDIBLE] closed down. So you'd want to set it up with the Bash profile, like [INAUDIBLE]? 

SPEAKER: Oh, good question. Question about Heroku. So my database credentials here are stored on my computer. So what happens when I push my application to Heroku, how is Heroku going to know the database credentials? Well you can set environment variables in Heroku in the settings for the application. 

So if I go to Brian Flights, my application here on Heroku, and go into Settings, and then go into here, config vars, which is just basically the environment variables. If I reveal them, one of them by default is already going to be database URL that is going to be equal to that Postgres database. That gets set up automatically when I create a Heroku Postgres database. But if you wanted other environment variables, you could add them here as well in order to add environment variables to the Heroku application. 

Other things? All right. So that was listing all the things inside of a database. Let's look at inserting things into a database. 

Maybe I want to insert them from a file. And a file type that we introduced in CS50 were CSV files, Comma Separated Values. And those files looked a little something like this. 

This is a file called Flights.CSV. And very simply, this is basically a very simple spreadsheet almost, where each row is a row in the spreadsheet. And the columns of the spreadsheet are separated by commas. 

You could open this up in Apple Numbers, or Microsoft Excel, or Google Sheets, and it would format nicely as a spreadsheet. But basically, each row is origin, destination, duration. And this is just some text file that I've created on my computer. And I would like to take this CSV file and take each row and insert it into my database. 

So how might I go about doing that? Well, one thing that we'll use for here is Python's CSV module, which is a module built into Python that makes it very easy to just loop over a CSV file. And so if I open up import.py. This is going to do the importing. 

And let's go ahead and take a look at the main function. The first thing that I do here on 911 is open up flights.csv. I'm taking the CSV file and opening it. OPEN is a built in function in Python that just opens up a file. And up at the top, I've imported the CSV module such that on line 12, I can say, csv.reader_f in order to generate an object that is going to basically read this CSV file for me one line at a time, an object that I can loop over and loop over every line in this file. 

So I'm going to loop over the file. And I'm going to say, for origin, destination, duration in this CSV reader. And I'm using origin, destination, duration because I know that in my CSV file, this file is formatted as origin, destination, duration. So I can specify, here's a variable name I want to apply to the first column, here's a variable name I want to apply to the second column, and here's a variable name I want to apply to the third column, just going over the whole thing. 

And now we're going to say, db.execute. And we're going to say, INSERT into flights-- this is the table-- these columns, origin, destination, duration. And then values, here I'm using a little bit of different syntax, syntax that might be familiar from CS50. But rather than just literally plugging in the origin, and the destination, and the duration there, which would open me up to potential SQL injection attacks. What I'm going to do instead is use this colon syntax to mean, this is a placeholder for the origin, and this is a placeholder for the destination, and this is a placeholder for the duration. 

And then I'm going to plug in values into those placeholders by also providing to db.execute a Python dictionary that associates keys-- which are the names of those placeholders, origin, destination, and duration-- with values, namely the variable names that I actually want to insert, which also happen to be called origin, destination, and duration, although they don't necessarily need to have the same names. They could be called anything. 

And what this is going to do is, SQLAlchemy is going to plug in these variables into those place holders, and it's automatically going to escape them. In other words, make sure that no SQL injection attacks are possible. So for security reasons, we'll always want to use these place holders using the colon syntax and then provide all of the variables that we want to plug in to the query there. 

Again, very similar to what we did in CS50. The syntax of it is a little bit different since we're no longer using the CS50 library. And we'll see an even easier way to do this later this afternoon. 

And then I'm just going to print a message. I added a flight from origin to destination lasting this number of minutes. That's just a printout to me so I can keep track of the progress of this application. 

If I'm only importing a couple flights, it's probably not going to take long. But you might imagine a situation where I'm importing thousands of rows into a database, and then it might start to take a while. And in fact, the morning project as you'll soon see, involves trying to import a big CSV data set into a database. And we'll take a look at that in just a moment. 

Once I'm done doing all of those db.execute queries, at the very end I have this line that says db.commit. Meaning commit the results, group all these into a single transaction, and just actually execute these queries in the database. 

So now, OK, here is flights.csv. If I run Python import.py, run this command, and all right. We added a flight from Paris to New York lasting that number of minutes, added a whole bunch of different flights in order to get the result that we want. And in fact, if I run Python list.py now to list all of the flights that are inside of my Flights Table, press Return, now I have-- if all goes well-- even more flights than before. Because I had all the flights I added originally plus all of these other flights that I've now imported. 

Questions about anything we've done so far? Yeah? 

AUDIENCE: Do you mind going back to the import.py? 

SPEAKER: Yep. Import.py is here. 

AUDIENCE: I was wondering, do you have to specify anywhere that you're not going to commit it until later? Or is it just standard that all of your db.executes, they don't run until it sees the db.commit? 

SPEAKER: A db.execute command that modifies the database, like inserting rows, updating rows, deleting rows, will not actually run in the database until it's committed. Now I could have done something like, OK, every time I execute, just immediately commit right afterwards and do that again, and again, and again for every flight, but this is less efficient. Because committing takes time, I actually to go to the database, talk to it. 

And so for efficiency sake, it's a little bit faster in this case just to say, wait until I run all the queries I want to run and just commit them at the same time. Send them to the database together, make that one single transaction. Yeah? 

AUDIENCE: Is Reader a function? 

SPEAKER: Reader is a function or a method, yeah, a function basically inside of the CSV module that's built into Python that just makes it easier to read CSV files. It automatically parses it, automatically separates things based on the comma so that we don't need to worry about doing any of that. Yeah? 

AUDIENCE: What's a engine [INAUDIBLE] specific to SQLAlchemy? 

SPEAKER: The engine is not specific to SQLAlchemy, it's pretty common across database programs. The engine you can think of as the entity in my Python code, the object that's taking care of interacting with the database, that's responsible for making requests to the database and getting back responses to it. And this scoped session as you'll soon see when we get into the world of web applications, this is specifically to make sure I can have multiple users that are using my web application, each of which is making their own database queries and has their own transactions that they're executing on the database separately. And we'll see that in a moment too. Yeah? 

AUDIENCE: What is the time efficiency of a database search like this? Like, if you want to select everything from the database? 

SPEAKER: So the efficiency of the database or the time it takes is going to depend a little bit on the database. But certainly databases like MySQL and PostgreSQL are going to be able to handle things like this much more quickly than SQLite can, which is not designed to handle as large scale database. And it also is going to depend upon the amount of data and the type of data. 

So it varies a lot. And that's why in practice, a lot of people will do benchmarking to actually just run tests to see how much load can it handle, and in how much time, and how fast are various different operations. Were there other questions? A couple, yeah? 

AUDIENCE: How did you connect the Python files to the Heroku app? 

SPEAKER: How did I connect the Python file to the Heroku app? I do that here where I say, os.getend the DATABASE_URL. In other words, get the environment variable called DATABASE_URL. And then in my terminal, I have set the DATABASE_URL environment variable via the Export keyword on Mac or Linux or the Set keyword on Windows to set it equal to the big long URL that I got from Heroku. I copied this URL, and that is the URL that's going to connect me to the database that Heroku has created for me. Yeah? 

AUDIENCE: Sorry. Is that DATABASE_URL the long URL online? Or is it something different? 

SPEAKER: It's a long one from online. 

AUDIENCE: OK. 

SPEAKER: Yeah. All right. I'll show you one other example of the types of scripts that we can create just by the fact that we can now run queries on applications. I'll show you passengers.py, and then we'll take a look at web applications before we break for our morning project. 

So passengers.py is just a program that's going to allow me to see the passengers on a given flight. The first thing that it does is it lists all the flights. And we do that by selecting the ID, origin, and destination from the Flights Table. Just select all of that information from the Flight Table, and then for each one of those flights, go ahead and print it out, printing out various different properties for the flight. 

Then I'll prompt the user to choose a flight using the input function to say, type in a flight ID here. Convert it to an integer, as we saw when we first introduced Python, save it inside a flight ID. And then I'm saying, OK, select everything from a flight where the ID is this flight ID. Fetch one, meaning just get me one. I'm only trying to get one flight, I don't want many flights, so just get me one thing. 

It's possible that that flight ID doesn't exist. So if the flight that comes back is none, in other words nothing, then I'm going to print out error, sorry, that's not actually a valid flight. But otherwise, if it is a valid flight, I'll go ahead and request the passengers. 

Select name from the Passengers Table where the flight ID is whatever the flight ID the user just typed in is. Save that result in passengers. Then loop over that list of passengers and print out the passenger's name. If there are no passengers, print out the fact that there are no passengers on the flight. 

I went through that rather quickly. But this code and all the code from today is online on the course website if you want to take a look at it in order to experiment with it, or use it in your own projects. So if I run Python passengers.py now, it's going to show me all the flights. All right, here are all the flights and their IDs. 

If I type in flight number one, I want to see the passengers on that flight. It'll show me which passengers are on that flight, Alice and Bob. And that allows me to see the passengers that are currently on a given flight by interacting with that database. 

So last thing I'll show before we go on to the morning project is to take a look at how we might combine this into a web application, like a Flask web application. I'll go ahead and run flask.run, open up the Flask URL. And what I see here is a page where I can book a new flight. 

In this dropdown, I see a list of all the possible flights. And say I want to book a flight on the New York to London flight, type in my name, click book flight. [INAUDIBLE], it says success, you successfully booked a flight. And if I go back to passengers.py-- go back-- passengers.py and go to flight number one and take a look at it. All right, I now see that I am now registered for this flight. 

I've been able to add myself to the database by making some database queries. What was the code that made that work? You can take a look at it more closely if you'd like to in your own time, but I'll briefly go over what's going on. 

On the Index page, if I just go to the Home page, the first thing I do is run SELECT star from flights. Get me all of the flights, save it in a variable called Flights, and pass that into index.html. All index.html is going to do now is basically show me a select dropdown box where for every flight in my list of flights I'm going to run a loop. I want one dropdown option for each flight that I have. 

I have an option whose value is going to be the flight's ID. And the value is going to be important because when I submit this form, it's going to use this ID as the value that I get. I want the flight's ID to be the value. And then what shows up to the user at least, the text, is just flights origin to flights destination. So I saw New York to London, for example. 

I have a place for the passenger to type in their name, and then I have a button that's going to book the flight. And there are a bunch of classes here, these are bootstrap classes to add bootstrap styling. Just makes the page look a little bit nicer, if you might have noticed. 

And so that's all that's going on here. I've got a form with a dropdown, one option for each flight, a place to type in a name, and then a button to book the flight. When the flight's actually booked, it's going to go to this route, /book. I first get the name of the user, then I try to get the flight ID, making sure that it's an integer. If it's not an integer, it's going to throw a value error, so I'm catching that exception. 

Remember, Python exception handlers from a couple days ago. And I'm going to render an error page in case it's not actually a valid integer. Then I'm going to make sure the flight actually exists, SELECT star from flights where that ID is equal to the ID. 

And if the flight doesn't exist, I'm going to render an error page that says, sorry, there's no flight that has that ID. And as long as the flight does exist, I'll go ahead and insert something into the Passengers Table, give it a name and a flight ID with whatever name the user typed in and whatever flight ID the user typed in. Commit that result as a result and then just display the success page. 

I know I went through that rather quickly, but just to give you the general idea. You can look to the application online if you want a look at the specific syntax that's used there. And so those are the capabilities that we get just by adding a little bit of information about how to connect to and interact with the database. 

With not a whole lot more code, we can begin to add a little more sophistication to this application as well. I'll go ahead and go into Airline 1 and show you the types of things we can do. We'll go to the main page. And this page does the same thing as before, but now I can also go to /flights, a different route that I just added whereby a list all flights, and I can click on any one flight. 

If I click on like New York to London, for example, then I'll see flight details. Origin is New York, destination is London, here's the duration, and here are the passengers on the flight, allowing myself the ability to interact with this database all the more and see the results in a nicely formatted web page. And in order to do that, all that I needed to do was in application.py, add some additional routes. 

I added a flights route that very simply just selects all the flights and returns the flights.html page. And then I added a specific flight route, slash flights, slash integer flight_id. This again, as we saw in Flask, was our variable URL whereby we have slash flight slash 1, slash flight slash 2, slash 28, or any flight ID whereby then we're going to try and request that particular flight and then get all of the passengers on that flight returning the results in an html page. 

High level questions about any of that? I know I didn't go through all the code, but feel free to take a look at it. It's basically just a combination of the things that we've been doing already, using db.execute to make database queries and then using Flask and our ability to generate templates to display information in a way that looks nicer. 

All right. So what I thought we'd do as our morning project, rather than dive into web applications right away is just to get set up with Postgres, and trying to import data into a database, a Postgres database that we can host on Heroku, such that we can write Python scripts to interact with it. And if you'd like to, later on we can begin to build web applications around it as well. 

And so I've given you a sample dataset. If we go to CS50.harvard.edu/beyond, go to the Beyond page, go to Lectures, and go to Today, you'll see I've given you a CSV file called Books Data, which is a CSV file of 5,000 books. So a whole bunch of books ranging from Shakespeare up until modern books. If I download this, you'll need to-- might need to create a Dropbox account to be able to download it. 

But you download books.csv, and you'll get a CSV file. We go back into books, you'll get a books.csv file that looks a little something like this. It's a whole bunch of ISBN numbers, titles, authors, and years. And what we're going to have you do is create a database on Heroku, first by creating a Heroku application for these books, and then setting up a new Heroku Postgrs database. 

Decide on a database schema in which to store these books. What are the tables, what are the columns going to be, what are their types going to be, are there any constraints on them? You should have one table for authors and one table for books, but beyond that, we'll leave the flexibility there up to you. 

And then write a program to take the data from the CSV file and import it into the database. You'll find that you can reference import.py, the one that I used in order to import flights into the database as a potential starting point, although this one is going to be a little more complicated. Because whereas the import.py from the flights example just imported things into a single table, the Flights Table, here you're going to want to import the author into the Authors Table, the book title into the Books Table, probably. And it's possible that there are multiple books by the same author, for example, that's inside the CSV file, that you can assume if there are two authors with the same name, they are in fact the same author, at least for these purposes. 

And then if you have time, write a Python program that lets you type in a title of a book or the name of an author. Or if you'd like to, like a partial title, or a partial match of the author's name and returns you all of those possible results. And so you can definitely use the source code examples that we use today as a starting point for that. 

But that's going to be the morning project. Whereby at the time that you're done with it, you should be able to say-- and I'll go into Heroku, I set this up already on Brian Books. Whereby if I go to Settings-- or actually, we'll go to Heroku Postgres, and go to Settings, and get the database credentials. You should be able to go to Adminer, log in with the server name, the user name, the password, and the database name. 

And when you do all that, you'll find that you can see for instance of the authors and data associated with those authors. And then all of the books and data associated with those books as well, along with some way of keeping track of the relationship between the book and the author. And so we'll leave it up to you to figure out how exactly to do that, but we'll go ahead and break out into different rooms. 

We don't have room 136 this morning, we'll get it back later this afternoon. But if I could have this side of the room go to room 212 and everyone else can stay here in the auditorium, we'll get started with the morning project. Break for lunch at 12:30 and we'll reconvene at 2:00.