WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:04.910
[MUSIC PLAYING]

00:00:17.155 --> 00:00:18.780
CARTER ZENKE: Well, hello, one and all.

00:00:18.780 --> 00:00:22.890
And welcome back to CS50's
Introduction to Databases with SQL.

00:00:22.890 --> 00:00:24.360
My name is Carter Zenke.

00:00:24.360 --> 00:00:26.820
And last we left off, we
learned about relating--

00:00:26.820 --> 00:00:31.360
that is, how to have multiple tables
in our database for people, places,

00:00:31.360 --> 00:00:34.500
and things, and how to have them
relate to one another in the way

00:00:34.500 --> 00:00:36.580
you might do in the real world.

00:00:36.580 --> 00:00:38.550
Now, today, we'll take a step forward.

00:00:38.550 --> 00:00:41.370
And we'll talk about how to
put you in the driver's seat,

00:00:41.370 --> 00:00:45.870
designing your very own database
schemas to organize your data.

00:00:45.870 --> 00:00:49.890
Now, we'll pick up where we left off,
which is with this database of books.

00:00:49.890 --> 00:00:52.050
So we had this database.

00:00:52.050 --> 00:00:56.760
It was full of books that have been
longlisted for the International Booker

00:00:56.760 --> 00:00:57.450
Prize.

00:00:57.450 --> 00:01:01.740
To be longlisted means to be
nominated for some prize, let's say.

00:01:01.740 --> 00:01:05.190
So we had the past five years
of books in this database.

00:01:05.190 --> 00:01:10.170
And we worked on improving this database
over time from week 0 to week 1.

00:01:10.170 --> 00:01:13.080
Now, this week, we'll actually
take a look underneath the hood

00:01:13.080 --> 00:01:17.980
and see what commands we had used
to create these varied databases.

00:01:17.980 --> 00:01:21.660
So let's reveal now what
we had done all along.

00:01:21.660 --> 00:01:24.010
I'll go back to my terminal here.

00:01:24.010 --> 00:01:26.040
And if you remember,
I could use a command

00:01:26.040 --> 00:01:29.730
to open up a database file,
which was this command here,

00:01:29.730 --> 00:01:34.450
SQL sqlite3, and then the name
of the file I want to open.

00:01:34.450 --> 00:01:35.910
So let's try this.

00:01:35.910 --> 00:01:39.420
I'll go back to week 0, to
my week 0 folder like this.

00:01:39.420 --> 00:01:43.035
And I'll open up my long list
database, week0/longlist.db.

00:01:46.170 --> 00:01:47.430
I'll hit Enter.

00:01:47.430 --> 00:01:49.720
Now, I'm in my sqlite prompt.

00:01:49.720 --> 00:01:52.230
So I could work on
typing some SQL commands,

00:01:52.230 --> 00:01:58.030
SQL statements or queries inside
of this terminal prompt here.

00:01:58.030 --> 00:02:01.660
So if I want to get a feel for
what's inside this database,

00:02:01.660 --> 00:02:04.230
we saw I could use a
command called select.

00:02:04.230 --> 00:02:07.860
So I'll select now some
rows from this table.

00:02:07.860 --> 00:02:14.370
I'll say SELECT, let's say, the
title and also the author columns

00:02:14.370 --> 00:02:19.740
from my longlist table,
semicolon, and Enter.

00:02:19.740 --> 00:02:23.760
Now, I'll see all the titles
and all the authors that

00:02:23.760 --> 00:02:27.030
were inside of this long list table.

00:02:27.030 --> 00:02:29.250
But I only want to peek here.

00:02:29.250 --> 00:02:32.760
I only want to see roughly
what kind of data is inside.

00:02:32.760 --> 00:02:36.130
So I could probably
improve this command here.

00:02:36.130 --> 00:02:43.530
I could instead say SELECT the author
and title columns from longlist--

00:02:43.530 --> 00:02:49.470
from longlist, but limit now to
the first five rows we saw before.

00:02:49.470 --> 00:02:52.440
Semicolon, and I'll hit
Enter on this query.

00:02:52.440 --> 00:02:56.590
And now, I'll see only
the top five rows.

00:02:56.590 --> 00:03:02.860
So I'm able now to see what kind
of data is inside my database.

00:03:02.860 --> 00:03:08.790
But what I can't yet see is what
command was used to create this table

00:03:08.790 --> 00:03:11.950
and what kind of data could
be stored inside of it.

00:03:11.950 --> 00:03:16.140
So let's reveal now what was going
on underneath the hood all this time.

00:03:16.140 --> 00:03:20.040
I'll say, this new command, a
sqlite command-- not a SQL keyword,

00:03:20.040 --> 00:03:24.580
but sqlite command
called .schema, .schema.

00:03:24.580 --> 00:03:28.350
Now, if I hit Enter,
I'll see the following--

00:03:28.350 --> 00:03:30.840
the command, the
statement, the query that

00:03:30.840 --> 00:03:34.260
was used to create this longlist table.

00:03:34.260 --> 00:03:37.980
And notice how I have many
columns inside this table.

00:03:37.980 --> 00:03:43.480
I have an ISBN column, a title
column, an author column, and so on.

00:03:43.480 --> 00:03:47.340
And each column seems to
have some kind of data

00:03:47.340 --> 00:03:52.080
that could be stored inside of it
like text, or integers, or real,

00:03:52.080 --> 00:03:56.270
or floating point values, or
decimals, if you're familiar.

00:03:56.270 --> 00:04:01.510
So this is how we created the
very first version of longlist.db.

00:04:01.510 --> 00:04:04.165
But let's also see how
we created the second.

00:04:04.165 --> 00:04:09.190
So I'll type .quit to leave
this version of longlist.db.

00:04:09.190 --> 00:04:12.650
And now, let me open up the
next version we had created,

00:04:12.650 --> 00:04:13.910
which is more relational.

00:04:13.910 --> 00:04:17.350
It had tables inside of it that
could relate to one another.

00:04:17.350 --> 00:04:21.100
So I'm going to type
sqlite3, then long--

00:04:21.100 --> 00:04:26.320
sqlite3, and go to week 1,
and then type longlist.db.

00:04:26.320 --> 00:04:27.700
I'll hit Enter.

00:04:27.700 --> 00:04:31.720
Now, I'm on my next
version of longlist.db.

00:04:31.720 --> 00:04:32.960
Well, what could I do?

00:04:32.960 --> 00:04:36.050
I could type SELECT and look
at some of the tables in here

00:04:36.050 --> 00:04:38.200
to see what kind of data is inside.

00:04:38.200 --> 00:04:44.350
I could perhaps say SELECT maybe
the names from the authors table

00:04:44.350 --> 00:04:48.220
here, from the authors
table, and hit semicolon.

00:04:48.220 --> 00:04:54.280
Now, I'll see all the names of authors
that are inside of the authors table.

00:04:54.280 --> 00:04:55.930
I could do the same thing for books.

00:04:55.930 --> 00:04:58.150
I could maybe look at
the titles of books.

00:04:58.150 --> 00:05:05.350
I could say SELECT title from the
books table, semicolon, and Enter.

00:05:05.350 --> 00:05:10.490
Now, I see all of the titles
that are inside my books table.

00:05:10.490 --> 00:05:14.230
But what I haven't seen yet is
the schema of this database.

00:05:14.230 --> 00:05:17.080
It is the way it is
organized and the commands

00:05:17.080 --> 00:05:19.700
that were used to create these tables.

00:05:19.700 --> 00:05:21.730
So let me work on that now.

00:05:21.730 --> 00:05:24.460
I'll clear my terminal using Control-L.

00:05:24.460 --> 00:05:28.090
And now, let me type .schema again.

00:05:28.090 --> 00:05:34.420
I'll type .schema to see what commands
were used to create this database.

00:05:34.420 --> 00:05:37.630
Hit Enter, and I can see--

00:05:37.630 --> 00:05:39.670
I mean, there are quite
a lot of commands here.

00:05:39.670 --> 00:05:42.610
Now, if this feels overwhelming, I
mean, I'd be right there with you.

00:05:42.610 --> 00:05:45.610
This is a lot of commands
to parse through and read.

00:05:45.610 --> 00:05:49.850
So there's probably a
better way to do this.

00:05:49.850 --> 00:05:55.660
And one way to try is to type .schema,
and then give it some table name.

00:05:55.660 --> 00:05:58.930
Let's say, want to understand the
schema for just the books table,

00:05:58.930 --> 00:06:00.770
like just that for now.

00:06:00.770 --> 00:06:04.060
So I'll say .schema, and
then the table name--

00:06:04.060 --> 00:06:05.720
books, in this case.

00:06:05.720 --> 00:06:07.840
Then I can hit Enter.

00:06:07.840 --> 00:06:11.500
Now, I'll see the
schema, the organization,

00:06:11.500 --> 00:06:15.860
the command that we used to create
the books table, in this case.

00:06:15.860 --> 00:06:18.250
And notice again, we
have several columns--

00:06:18.250 --> 00:06:22.420
ID, ISBN, title,
publisher ID, and so on.

00:06:22.420 --> 00:06:26.380
Each one has their own kind of
data they could support or take

00:06:26.380 --> 00:06:30.590
into this column like
integers, text, and so on.

00:06:30.590 --> 00:06:33.700
So again, what we'll do
today is have you all

00:06:33.700 --> 00:06:36.640
learn how to write your
very own create table

00:06:36.640 --> 00:06:41.050
commands to build your very own
databases that represent what you

00:06:41.050 --> 00:06:44.360
want to represent in the real world.

00:06:44.360 --> 00:06:47.860
So let me exit this prompt here.

00:06:47.860 --> 00:06:51.370
And let me propose that we'll
have a bit of a design challenge

00:06:51.370 --> 00:06:55.600
today to actually try to represent a
real world entity with some database

00:06:55.600 --> 00:06:56.240
here.

00:06:56.240 --> 00:06:59.050
And if you're not
already familiar, Boston

00:06:59.050 --> 00:07:02.110
is perhaps famous for being
among the first cities

00:07:02.110 --> 00:07:05.560
to have a subway system
in the United States.

00:07:05.560 --> 00:07:09.730
So here is a picture from the
late 1800s of a subway being

00:07:09.730 --> 00:07:11.950
built in Boston's city streets.

00:07:11.950 --> 00:07:14.170
Underneath the streets
here, there would be

00:07:14.170 --> 00:07:18.100
trolley cars that would go and
transport people across Boston.

00:07:18.100 --> 00:07:20.410
Here's another picture of
a trolley actually working

00:07:20.410 --> 00:07:21.790
underneath the streets.

00:07:21.790 --> 00:07:23.920
So people would go down underneath.

00:07:23.920 --> 00:07:25.360
They would hop on a trolley.

00:07:25.360 --> 00:07:29.950
They'd be able to different parts of
Boston, perhaps from Harvard to MIT

00:07:29.950 --> 00:07:33.550
or downtown up to, let's say,
Braintree or down to Braintree,

00:07:33.550 --> 00:07:36.070
which is more south of
Boston, for example.

00:07:36.070 --> 00:07:39.280
One of the famous stops
is the Park Street stop,

00:07:39.280 --> 00:07:41.530
which is right down in
the middle of Boston,

00:07:41.530 --> 00:07:45.010
one of the central hubs
of this subway system.

00:07:45.010 --> 00:07:50.480
And now, these photos are all from,
let's say, the early 1900s, late 1800s,

00:07:50.480 --> 00:07:51.230
and so on.

00:07:51.230 --> 00:07:54.070
But the subway has gotten a
lot more modern since then.

00:07:54.070 --> 00:07:59.000
And actually, now, we have several lines
that span the entire city and beyond.

00:07:59.000 --> 00:08:02.740
So here, we have the Red Line, of
which Harvard and MIT are a part.

00:08:02.740 --> 00:08:05.800
We have the green line, which
brings you kind of west to east--

00:08:05.800 --> 00:08:09.680
the blue line, the
orange line, and so on.

00:08:09.680 --> 00:08:14.790
So many more lines and stations
have been added to this system.

00:08:14.790 --> 00:08:18.460
It's a big design challenge to
represent all of these stations,

00:08:18.460 --> 00:08:24.560
all of these lines, and all of these
people who might ride this subway, too.

00:08:24.560 --> 00:08:31.220
So the question then becomes, how
can we create a schema for this data?

00:08:31.220 --> 00:08:36.350
And again, by schema, we mean what
kinds of tables should we have,

00:08:36.350 --> 00:08:40.610
what kinds of columns might those
tables have, and what kind of data

00:08:40.610 --> 00:08:44.159
should we put in each of
those columns, for instance.

00:08:44.159 --> 00:08:46.320
So let me propose this.

00:08:46.320 --> 00:08:49.790
Let's say we start just first
with names and stations.

00:08:49.790 --> 00:08:53.750
So Charlie here, our very
first rider on this system,

00:08:53.750 --> 00:08:56.810
is going to be at the
Kendall at MIT station.

00:08:56.810 --> 00:08:58.910
So this is what this
table represents now.

00:08:58.910 --> 00:09:01.370
But what more could we have?

00:09:01.370 --> 00:09:05.750
Well, we might also want to have maybe
what Charlie is doing at that station.

00:09:05.750 --> 00:09:09.050
Maybe he's entering the
station, for instance.

00:09:09.050 --> 00:09:11.390
And if you're familiar
with the subway system,

00:09:11.390 --> 00:09:17.550
you often have to pay to get onto a
train or get onto the station itself.

00:09:17.550 --> 00:09:23.570
So let's say Charlie pays some fare
to enter into the Kendall/MIT station.

00:09:23.570 --> 00:09:26.900
Well, back in the mid 1900s,
the fare was only about a dime.

00:09:26.900 --> 00:09:27.800
It was $0.10.

00:09:27.800 --> 00:09:33.820
So we'll say Charlie paid $0.10
to enter the Kendall/MIT station.

00:09:33.820 --> 00:09:36.140
And now, this seems pretty good.

00:09:36.140 --> 00:09:41.300
But if I am the Transit Authority,
the person who runs the subway system,

00:09:41.300 --> 00:09:45.890
I probably want to know, does Charlie
have enough money to get on the train?

00:09:45.890 --> 00:09:48.730
And if so I want to make sure
that, OK, well, Charlie actually

00:09:48.730 --> 00:09:50.590
could get on this train.

00:09:50.590 --> 00:09:53.420
So let's say not only
does Charlie pay a fare.

00:09:53.420 --> 00:09:56.570
He has some remaining
balance afterwards.

00:09:56.570 --> 00:10:00.160
So Charlie here has gotten
onto the Kendall/MIT stop.

00:10:00.160 --> 00:10:05.330
He's paid the fare of
$0.10 and has $0.05 left.

00:10:05.330 --> 00:10:08.750
OK, so here's a bit of a table.

00:10:08.750 --> 00:10:11.030
We probably add more information to it.

00:10:11.030 --> 00:10:15.230
Let's say Charlie then leaves
at the Jamaica Plain stop.

00:10:15.230 --> 00:10:18.650
And the fare to leave is
about a nickel, $0.05.

00:10:18.650 --> 00:10:21.410
And now, Charlie has no cents left over.

00:10:21.410 --> 00:10:27.680
So again, Charlie paid $0.10 to get on,
had $0.05 left, paid $0.05 to get off,

00:10:27.680 --> 00:10:31.930
and now has no remaining
balance here anymore.

00:10:31.930 --> 00:10:32.440
OK.

00:10:32.440 --> 00:10:34.040
So that's Charlie's story.

00:10:34.040 --> 00:10:35.860
Let's look at Alice, though.

00:10:35.860 --> 00:10:38.530
Let's say Alice gets
on at the Harvard stop.

00:10:38.530 --> 00:10:42.310
They too pay $0.10 to get
on at the Harvard stop.

00:10:42.310 --> 00:10:45.370
And they have a remaining
balance of $0.20.

00:10:45.370 --> 00:10:49.510
Alice will go, let's say, to Park
Street, get off at Park Street,

00:10:49.510 --> 00:10:51.160
pay the nickel to leave.

00:10:51.160 --> 00:10:55.430
And now, they'll have a
balance of $0.15 at the end.

00:10:55.430 --> 00:10:56.620
Let's go to Bob.

00:10:56.620 --> 00:10:59.260
Bob enters the Alewife station.

00:10:59.260 --> 00:11:00.940
They pay $0.10.

00:11:00.940 --> 00:11:02.800
They have remaining balance of $0.30.

00:11:02.800 --> 00:11:08.337
And let's say they leave at Park Street
and have a fare of $0.10 to leave

00:11:08.337 --> 00:11:09.670
because it's a further distance.

00:11:09.670 --> 00:11:14.490
Now, they'll have a running
balance of $0.20 overall.

00:11:14.490 --> 00:11:18.840
So this table is OK, I might admit.

00:11:18.840 --> 00:11:23.100
I mean, last time, we learned about
having what we called primary keys

00:11:23.100 --> 00:11:24.340
and foreign keys.

00:11:24.340 --> 00:11:26.220
So it seems like that's missing here.

00:11:26.220 --> 00:11:28.020
Let's go ahead and add that here.

00:11:28.020 --> 00:11:32.940
I'll give each row a unique ID, so
I can know who entered, who exited,

00:11:32.940 --> 00:11:35.580
and so on, give that a unique ID here.

00:11:35.580 --> 00:11:41.950
But I might even say that this table
could really be improved substantially.

00:11:41.950 --> 00:11:49.410
And I want to ask you what redundancies
or inefficiencies do you see here?

00:11:49.410 --> 00:11:52.680
If we're trying to represent
riders and stations,

00:11:52.680 --> 00:11:56.990
what can we improve about this design?

00:11:56.990 --> 00:12:01.740
AUDIENCE: So probably the redundancy
will be the names and the stations,

00:12:01.740 --> 00:12:02.240
too.

00:12:02.240 --> 00:12:06.760
For example, if Charlie
will go to the train daily,

00:12:06.760 --> 00:12:10.875
then he will become most
of the names in the data.

00:12:10.875 --> 00:12:12.250
CARTER ZENKE: Yeah, a good point.

00:12:12.250 --> 00:12:13.450
If I'm hearing what you're
saying, [? Loren, ?]

00:12:13.450 --> 00:12:16.420
let me show you some examples
that I highlighted here.

00:12:16.420 --> 00:12:20.290
One example could be, to your
point, about these names.

00:12:20.290 --> 00:12:24.370
These names seem to be telling
us the name of a person.

00:12:24.370 --> 00:12:27.640
But here, we have only three names--

00:12:27.640 --> 00:12:29.680
Charlie, Alice, and Bob.

00:12:29.680 --> 00:12:32.170
Well, my name is Carter.

00:12:32.170 --> 00:12:35.110
And what if somebody
else named Carter also

00:12:35.110 --> 00:12:37.810
tried to get on and leave at some stop?

00:12:37.810 --> 00:12:39.820
Well, I wouldn't be able
to know which Carter

00:12:39.820 --> 00:12:44.210
was which or which Charlie was which,
which Alice was which, and so on.

00:12:44.210 --> 00:12:47.920
So we probably need a way to represent
people and their names a little better

00:12:47.920 --> 00:12:50.220
here, too.

00:12:50.220 --> 00:12:55.870
What other ideas do we have for how
to improve the design of this table?

00:12:55.870 --> 00:13:00.330
AUDIENCE: Yes, I think we can have
a singular ID for a singular person.

00:13:00.330 --> 00:13:02.998
That way, we'll be better able
to track their activities.

00:13:02.998 --> 00:13:03.790
CARTER ZENKE: Nice.

00:13:03.790 --> 00:13:06.555
So we probably have
an ID for each person,

00:13:06.555 --> 00:13:10.830
a bit what we learned about last week,
putting people in their own table

00:13:10.830 --> 00:13:14.890
and giving them their own
unique ID, a primary key.

00:13:14.890 --> 00:13:16.380
Let's show that here.

00:13:16.380 --> 00:13:18.150
I'll go to some slides.

00:13:18.150 --> 00:13:21.345
And I'll pick out one
that shows us just riders.

00:13:21.345 --> 00:13:23.220
So to your point,
[? Soqanya, ?] we could try

00:13:23.220 --> 00:13:25.830
to have maybe a table for just riders.

00:13:25.830 --> 00:13:29.130
And maybe to simplify, this
table has only two columns.

00:13:29.130 --> 00:13:32.850
It has a column for ID
and a column for name.

00:13:32.850 --> 00:13:38.250
So here, we have Charlie, Alice,
and Bob all in their own table.

00:13:38.250 --> 00:13:42.270
Well, let me propose to you, we
could do the same thing for stations.

00:13:42.270 --> 00:13:44.760
Let's say we have a
table of stations now.

00:13:44.760 --> 00:13:48.030
And we give each one
their very own ID as well,

00:13:48.030 --> 00:13:50.400
our own primary key for this table.

00:13:50.400 --> 00:13:52.540
We have Harvard,
Kendall, and Park Street.

00:13:52.540 --> 00:13:57.130
We can differentiate between
them using their IDs here.

00:13:57.130 --> 00:14:00.000
So a few improvements could be made.

00:14:00.000 --> 00:14:04.560
And as we're making these improvements,
splitting one table into many,

00:14:04.560 --> 00:14:08.010
and debating what kind of
data to store in each table,

00:14:08.010 --> 00:14:12.180
the process we're going through
is one called normalizing.

00:14:12.180 --> 00:14:14.640
We're normalizing our data.

00:14:14.640 --> 00:14:18.210
To normalize means to reduce
redundancies, effectively,

00:14:18.210 --> 00:14:19.530
to take a table--

00:14:19.530 --> 00:14:22.230
take one table, for instance,
split up into multiple,

00:14:22.230 --> 00:14:26.550
and have each entity be
part of its very own table.

00:14:26.550 --> 00:14:31.590
Some academics in the world have named
different normal forms, quote unquote.

00:14:31.590 --> 00:14:36.120
There's like first normal form,
second normal form, third normal form,

00:14:36.120 --> 00:14:39.420
this progression of making your
day more and more efficient.

00:14:39.420 --> 00:14:41.160
You can use those as heuristics.

00:14:41.160 --> 00:14:43.770
But end of the day, a few
principles might apply.

00:14:43.770 --> 00:14:48.750
First, take your entities, like
in this case stations and riders,

00:14:48.750 --> 00:14:51.840
and put them each in their own table.

00:14:51.840 --> 00:14:56.760
And if you add more data, make sure that
if I were to add a column, let's say,

00:14:56.760 --> 00:15:00.780
to riders, it is only a
data point about riders--

00:15:00.780 --> 00:15:05.190
not about stations, not about
fares, only about riders.

00:15:05.190 --> 00:15:08.220
And that process can help
us make a data set that

00:15:08.220 --> 00:15:14.050
is more dynamic, more easy to reproduce,
and more easy to write queries on.

00:15:14.050 --> 00:15:18.090
So that is the process
here of normalizing.

00:15:18.090 --> 00:15:18.960
OK.

00:15:18.960 --> 00:15:25.830
So if we have now riders and stations,
we want to represent them in our table.

00:15:25.830 --> 00:15:28.050
Well, we could use what
we learned about relating

00:15:28.050 --> 00:15:30.960
last week to ask, how
could we actually represent

00:15:30.960 --> 00:15:33.790
these riders and these stations?

00:15:33.790 --> 00:15:36.640
So let's say here I can
have riders and stations.

00:15:36.640 --> 00:15:40.330
I want to make sure that I have the
right relationship between them.

00:15:40.330 --> 00:15:42.360
Well, if you're familiar
with subways, we

00:15:42.360 --> 00:15:45.870
might say that a rider
goes to one station.

00:15:45.870 --> 00:15:50.070
And this big T here is the symbol
for a station here in Boston,

00:15:50.070 --> 00:15:52.890
for the T's that we
call it, for the subway.

00:15:52.890 --> 00:15:56.010
So a rider might go to one station.

00:15:56.010 --> 00:15:58.530
But of course, that might
not be the full picture.

00:15:58.530 --> 00:16:01.660
A rider also gets off
at some station on.

00:16:01.660 --> 00:16:06.990
So a rider could be associated with
not just one station, but multiple.

00:16:06.990 --> 00:16:10.380
And if you're familiar, at least with
any subway system or the Boston one,

00:16:10.380 --> 00:16:12.660
too, it can often get pretty busy.

00:16:12.660 --> 00:16:16.890
And so riders might not just go
to, of course, one station or two.

00:16:16.890 --> 00:16:22.750
Stations could also have multiple riders
that are on a particular station here.

00:16:22.750 --> 00:16:28.290
So to recap, one rider might be
associated with more than one station.

00:16:28.290 --> 00:16:33.030
They might get on at this first
one and get off at this later one.

00:16:33.030 --> 00:16:37.350
But each station could presumably
have more than one rider.

00:16:37.350 --> 00:16:41.130
Each station here could have
rider A or rider B, the rider

00:16:41.130 --> 00:16:47.050
up here or the rider down below, and
even many more than that as well.

00:16:47.050 --> 00:16:50.550
So to put it in the
language of our ER diagrams,

00:16:50.550 --> 00:16:53.160
our entity relation
diagrams from last week,

00:16:53.160 --> 00:16:57.840
we could look at it bit like this,
where we have riders and stations.

00:16:57.840 --> 00:17:02.550
Riders visit stations, and
they're associated like this.

00:17:02.550 --> 00:17:07.540
A rider must have at least one
station associated with them.

00:17:07.540 --> 00:17:09.270
That's what this horizontal line means.

00:17:09.270 --> 00:17:13.140
If they aren't at a station, they
aren't really a rider, right?

00:17:13.140 --> 00:17:16.960
A rider though could have many
stations associated with them.

00:17:16.960 --> 00:17:19.470
That's what this three
prongs down here means.

00:17:19.470 --> 00:17:22.440
They could have one,
two, three, four, they

00:17:22.440 --> 00:17:26.290
could have many stations
they get on and get off of.

00:17:26.290 --> 00:17:31.030
Now a station could have
anywhere between zero riders,

00:17:31.030 --> 00:17:35.290
if it's maybe out of commission or
isn't very popular, upwards to many.

00:17:35.290 --> 00:17:39.730
It could have two, three, four, five,
even hundreds of riders associated

00:17:39.730 --> 00:17:42.070
with this particular station.

00:17:42.070 --> 00:17:47.350
So here is our entity relation
diagram for these particular riders

00:17:47.350 --> 00:17:50.030
and these stations here.

00:17:50.030 --> 00:17:55.660
So let me ask, what questions do we have
on these relationships between riders

00:17:55.660 --> 00:17:59.700
and stations and how to
design this table so far?

00:17:59.700 --> 00:18:01.450
AUDIENCE: Then I want
to ask that you have

00:18:01.450 --> 00:18:06.370
used the same ID for stations
and riders so that maybe

00:18:06.370 --> 00:18:08.895
give us a problem in coding?

00:18:08.895 --> 00:18:10.520
CARTER ZENKE: Yeah, a good observation.

00:18:10.520 --> 00:18:14.950
So you might have noticed that in the
riders table and in the stations table,

00:18:14.950 --> 00:18:17.140
I gave the same kind of ID.

00:18:17.140 --> 00:18:19.602
Like I had one, two,
three for each of them.

00:18:19.602 --> 00:18:21.310
And let me just show
you that again here.

00:18:21.310 --> 00:18:24.950
I'll come back to some slides,
and I'll show you again,

00:18:24.950 --> 00:18:30.730
the riders table where we had
Charlie, Alice, and Bob, ID 1, 2, 3.

00:18:30.730 --> 00:18:31.870
Same for the stations.

00:18:31.870 --> 00:18:36.730
We had stations Harvard,
Kendall, Park Street, ID 1, 2, 3.

00:18:36.730 --> 00:18:39.950
And to your question,
isn't that a problem?

00:18:39.950 --> 00:18:43.540
Well, I would argue in
this case, it's not,

00:18:43.540 --> 00:18:48.460
so long as we keep clear that
these IDs are for stations

00:18:48.460 --> 00:18:50.800
and these IDs are for riders.

00:18:50.800 --> 00:18:54.550
And we'll see how to do that
using our SQL keywords later on.

00:18:54.550 --> 00:18:59.500
But again, so long as we have an
ID just for our riders and an ID

00:18:59.500 --> 00:19:02.770
just for our stations, we
can keep these separate

00:19:02.770 --> 00:19:05.560
even if they might have the same values.

00:19:05.560 --> 00:19:07.990
But a great question here.

00:19:07.990 --> 00:19:10.440
Let's take just one more.

00:19:10.440 --> 00:19:12.870
AUDIENCE: Regarding the
entity relationship diagram,

00:19:12.870 --> 00:19:17.340
how is it possible for a station to
have a possibility of zero riders,

00:19:17.340 --> 00:19:21.303
but riders must compulsorily
have at least one station.

00:19:21.303 --> 00:19:22.720
CARTER ZENKE: Yeah, good question.

00:19:22.720 --> 00:19:25.470
So this might be up to you
and how you formulate it,

00:19:25.470 --> 00:19:28.380
but for me, let me show
that diagram again.

00:19:28.380 --> 00:19:32.280
I'll go back to over here.

00:19:32.280 --> 00:19:36.540
In my mind, to be a rider,
you have to visit a station.

00:19:36.540 --> 00:19:39.780
If you aren't visiting a station,
you aren't really a rider, right?

00:19:39.780 --> 00:19:43.740
Now presumably, there are stations
that were built but aren't really being

00:19:43.740 --> 00:19:46.560
used right now or aren't
really in service yet.

00:19:46.560 --> 00:19:49.230
That could be a station
that has no visitors.

00:19:49.230 --> 00:19:50.910
So you could argue--

00:19:50.910 --> 00:19:54.090
let's make sure every station
has at least one rider

00:19:54.090 --> 00:19:56.910
and every rider may or may
not have to visit a station.

00:19:56.910 --> 00:19:59.640
For that I would say, we
could probably reasonably

00:19:59.640 --> 00:20:03.300
disagree there and talk about how we
could represent the diagram here, too.

00:20:03.300 --> 00:20:09.940
But a great observation and a good
other critique of this system here.

00:20:09.940 --> 00:20:11.150
All right.

00:20:11.150 --> 00:20:16.000
So let's now turn to representing
this in our database.

00:20:16.000 --> 00:20:19.180
I'll go back to my
computer and we'll learn

00:20:19.180 --> 00:20:23.440
about this new SQL
keyword, SQL statement,

00:20:23.440 --> 00:20:26.200
this one called Create Table.

00:20:26.200 --> 00:20:31.400
Create table allows us to, as the name
suggests, create a brand new table.

00:20:31.400 --> 00:20:34.450
So let's do just that
in our new database

00:20:34.450 --> 00:20:37.480
to represent riders and stations.

00:20:37.480 --> 00:20:42.820
I'll go into my terminal, and I
want to make a brand new database.

00:20:42.820 --> 00:20:49.120
I'll call this one mbta.db, because
mbta stands for the Massachusetts Bay

00:20:49.120 --> 00:20:52.630
Transportation Authority, the people
who run the subway, essentially.

00:20:52.630 --> 00:20:57.610
So I'll do sqlite3,
mbta.db, hit Enter, and I'll

00:20:57.610 --> 00:21:04.220
type Y to say yes, I want to
create this brand new database.

00:21:04.220 --> 00:21:09.010
Now if I type dot schema, I see nothing.

00:21:09.010 --> 00:21:10.150
But that's expected.

00:21:10.150 --> 00:21:11.530
I don't have any tables yet.

00:21:11.530 --> 00:21:13.870
I have nothing inside this database.

00:21:13.870 --> 00:21:17.350
It's up to me to create
these tables myself.

00:21:17.350 --> 00:21:22.240
So what I'll do is clear my terminal,
and let's start first with riders.

00:21:22.240 --> 00:21:23.890
I might create a table for riders.

00:21:23.890 --> 00:21:26.440
I'll say, create table.

00:21:26.440 --> 00:21:29.500
And now I have to give
that table some name.

00:21:29.500 --> 00:21:31.960
I might call it riders here.

00:21:31.960 --> 00:21:36.310
And then in parentheses,
like this, I can specify what

00:21:36.310 --> 00:21:39.500
columns should be part of this table.

00:21:39.500 --> 00:21:40.540
So let's start first.

00:21:40.540 --> 00:21:44.320
I'll hit Enter here and
continue this query.

00:21:44.320 --> 00:21:47.770
Now, all by convention, I'll
just indent four spaces--

00:21:47.770 --> 00:21:52.760
one, two, three, four, and I'll
give an ID to each of these riders

00:21:52.760 --> 00:21:54.430
as we saw before.

00:21:54.430 --> 00:21:58.180
I'll say ID here is one of my columns.

00:21:58.180 --> 00:22:04.300
Now to create a new column, I'll follow
this up with a comma and hit Enter.

00:22:04.300 --> 00:22:08.350
I'll again, by convention, for
style, just indent four spaces.

00:22:08.350 --> 00:22:09.730
And what's my next column?

00:22:09.730 --> 00:22:12.160
Perhaps a name for these riders.

00:22:12.160 --> 00:22:17.530
I'll give this column the name,
name, and I'll leave it at that.

00:22:17.530 --> 00:22:22.660
Once I'm done adding columns, I
no longer need to have a comma.

00:22:22.660 --> 00:22:26.320
I could simply close out
this query, this statement.

00:22:26.320 --> 00:22:29.740
I could hit Enter here,
say close in parentheses

00:22:29.740 --> 00:22:33.400
to close the top parentheses
here, semicolon, hit Enter.

00:22:33.400 --> 00:22:36.730
And now nothing seems to happen.

00:22:36.730 --> 00:22:38.330
But that's actually a good sign.

00:22:38.330 --> 00:22:41.290
So let me type dot schema,
hit Enter, and I'll

00:22:41.290 --> 00:22:44.650
see the result of that statement before.

00:22:44.650 --> 00:22:48.820
Create table, if it doesn't
already exist, riders, and riders

00:22:48.820 --> 00:22:53.590
has inside of it two
columns, ID and name.

00:22:53.590 --> 00:22:54.740
OK, let's keep going.

00:22:54.740 --> 00:22:56.800
Let's make one for stations, too.

00:22:56.800 --> 00:22:59.380
I'll clear my terminal
and I'll say, create me

00:22:59.380 --> 00:23:02.740
a table called stations and include--

00:23:02.740 --> 00:23:04.330
actually, not station.

00:23:04.330 --> 00:23:09.140
If you ever want to, let's say
fix this kind of table here,

00:23:09.140 --> 00:23:14.500
let me try closing the
parentheses, hit semicolon, Enter.

00:23:14.500 --> 00:23:15.850
I'll get a syntax error.

00:23:15.850 --> 00:23:17.050
I can restart.

00:23:17.050 --> 00:23:22.180
I'll do Control L. Now I'll do
create table, stations, plural.

00:23:22.180 --> 00:23:24.250
Open parentheses, Enter.

00:23:24.250 --> 00:23:25.870
I'll indent by four spaces.

00:23:25.870 --> 00:23:27.430
One, two, three, four.

00:23:27.430 --> 00:23:31.870
And now, I'll similarly include
an ID for each of these stations.

00:23:31.870 --> 00:23:36.220
I'll say ID comma, and then
what all should I have?

00:23:36.220 --> 00:23:40.840
Well, stations tend to have a
name, like the Kendall MIT station,

00:23:40.840 --> 00:23:43.480
the Harvard station,
the Park Street Station.

00:23:43.480 --> 00:23:48.760
So I'll say, I'll give each of
these their very own name, comma.

00:23:48.760 --> 00:23:50.770
What else do stations have?

00:23:50.770 --> 00:23:54.220
Well, they might also have
a line that they're on.

00:23:54.220 --> 00:23:58.700
Let's say it's the red line, or the
blue line, or the green line and so on.

00:23:58.700 --> 00:24:03.720
I'll have a space for them to write
in their line that they're a part of.

00:24:03.720 --> 00:24:06.160
OK, and maybe I'll leave it
at that to keep it simple.

00:24:06.160 --> 00:24:09.490
I'll say, stations have
an ID, a name and a line.

00:24:09.490 --> 00:24:10.750
Now I'll close this out.

00:24:10.750 --> 00:24:15.430
I'll say end parentheses, semicolon,
hit Enter, and nothing seems to happen.

00:24:15.430 --> 00:24:20.770
But if I type dot schema, I'll
now see my riders and my stations

00:24:20.770 --> 00:24:24.870
tables inside of my database.

00:24:24.870 --> 00:24:26.760
Now one more step here.

00:24:26.760 --> 00:24:31.920
We have riders, we have stations,
but we have to relate to them.

00:24:31.920 --> 00:24:34.800
We have to relate them using
this many to many relationships,

00:24:34.800 --> 00:24:37.420
as we saw last week.

00:24:37.420 --> 00:24:40.950
So let me try making
a table to implement

00:24:40.950 --> 00:24:43.110
this many to many relationship.

00:24:43.110 --> 00:24:47.400
And if you remember, we might call
this kind of table a junction table,

00:24:47.400 --> 00:24:50.040
an associative entity, a join table.

00:24:50.040 --> 00:24:53.730
It has a lot of names, but
it looks a bit like this.

00:24:53.730 --> 00:24:58.710
I'll create this new table to
represent, let's say, visits.

00:24:58.710 --> 00:25:01.110
A rider visits a station.

00:25:01.110 --> 00:25:03.780
So I'll call this table visits.

00:25:03.780 --> 00:25:09.840
And inside, I'll make sure it
has two columns, one for a rider

00:25:09.840 --> 00:25:17.130
ID to represent a rider, and
one, let's say, for a station ID,

00:25:17.130 --> 00:25:19.020
to represent a station.

00:25:19.020 --> 00:25:24.630
Now when I see a rider ID next
to a station ID in this table,

00:25:24.630 --> 00:25:27.870
I'll know the rider with
that certain ID visited

00:25:27.870 --> 00:25:30.180
the station with that certain ID.

00:25:30.180 --> 00:25:31.500
So I'll close this out.

00:25:31.500 --> 00:25:36.330
I'll say end, parentheses here,
semicolon, Enter, and finally, clear

00:25:36.330 --> 00:25:38.940
my terminal, type dot schema.

00:25:38.940 --> 00:25:45.000
And I can see, I have riders, stations,
and visits between riders and stations

00:25:45.000 --> 00:25:48.630
in this associative entity, this
junction table or a joined table.

00:25:48.630 --> 00:25:51.990
Up to you what you might
want to call it in this case.

00:25:51.990 --> 00:25:55.840
Now what questions do we have?

00:25:55.840 --> 00:25:58.480
AUDIENCE: Why we have
not use the primary key

00:25:58.480 --> 00:26:00.563
and secondary key in this table?

00:26:00.563 --> 00:26:01.730
CARTER ZENKE: Good question.

00:26:01.730 --> 00:26:03.605
So we're going to get
there in just a minute.

00:26:03.605 --> 00:26:07.120
But if I look back at my terminal
here, my schema, I'll see,

00:26:07.120 --> 00:26:09.340
I really just have column names.

00:26:09.340 --> 00:26:13.840
And we saw before and we typed
dot schema on our longlist.db,

00:26:13.840 --> 00:26:16.150
we had more than just column names.

00:26:16.150 --> 00:26:19.400
We had column names, we
had perhaps data types,

00:26:19.400 --> 00:26:21.807
we had primary keys and foreign keys.

00:26:21.807 --> 00:26:23.390
So we'll get to that in just a minute.

00:26:23.390 --> 00:26:28.300
But suffice to say for now, we're
going to keep improving this over time.

00:26:28.300 --> 00:26:30.580
Let's take one more.

00:26:30.580 --> 00:26:33.300
AUDIENCE: Is it required to put spaces--

00:26:33.300 --> 00:26:37.422
the four spaces indents, or
that's just for the visual look?

00:26:37.422 --> 00:26:38.880
CARTER ZENKE: Yeah, great question.

00:26:38.880 --> 00:26:43.260
Is it required to have these four
spaces before each column name.

00:26:43.260 --> 00:26:46.720
And in fact, no, it's not, but
it makes the code more readable.

00:26:46.720 --> 00:26:49.590
So I could put this all in one line--

00:26:49.590 --> 00:26:51.270
I shouldn't, but I could.

00:26:51.270 --> 00:26:55.620
And if I have instead this new
line followed by four spaces,

00:26:55.620 --> 00:27:00.390
I can make this more readable for
myself and for my colleagues, too.

00:27:00.390 --> 00:27:02.060
Good question.

00:27:02.060 --> 00:27:07.430
OK, so to our earlier
point, there are things

00:27:07.430 --> 00:27:09.680
that are missing from this schema.

00:27:09.680 --> 00:27:13.800
Like, we have column names,
but as we saw before,

00:27:13.800 --> 00:27:16.550
we should ideally
specify what kind of data

00:27:16.550 --> 00:27:20.870
should be able to go into
each of these columns.

00:27:20.870 --> 00:27:24.990
And for that, we'll need some
new ideas to talk about here.

00:27:24.990 --> 00:27:30.950
So let's focus now on this idea
of data types and storage classes.

00:27:30.950 --> 00:27:36.680
Data types and storage classes are
two very similar but distinct IDs,

00:27:36.680 --> 00:27:40.230
and they're distinct in a way
we'll talk about in just a minute.

00:27:40.230 --> 00:27:45.770
Now SQLite has five storage
classes, five kind of storage--

00:27:45.770 --> 00:27:49.520
kind of type, so to speak,
of values that can hold.

00:27:49.520 --> 00:27:51.920
So let's talk about the first
one, null, for instance.

00:27:51.920 --> 00:27:54.230
Null in this case means nothing.

00:27:54.230 --> 00:27:57.170
There's nothing that actually
is inside of this value.

00:27:57.170 --> 00:27:59.990
It's kind of a central value
to mean nothing is here.

00:27:59.990 --> 00:28:04.700
Integer means a whole number,
like 1, 2, 3, 4, or 5.

00:28:04.700 --> 00:28:10.880
Real talks about decimals like floating
points like 1.2 or 2.4 and so on.

00:28:10.880 --> 00:28:12.740
Text is used for characters.

00:28:12.740 --> 00:28:19.740
And blob, kind of a funny one, blob
stands for Binary Large Object,

00:28:19.740 --> 00:28:24.570
and it represents the data exactly
as I give it to this value.

00:28:24.570 --> 00:28:30.800
If I tell it to store 101010, it
will store exactly 101010 in binary.

00:28:30.800 --> 00:28:34.220
So useful for storing, in this
case like images and video files

00:28:34.220 --> 00:28:37.160
and audio files, things
that have some structure

00:28:37.160 --> 00:28:40.030
we don't want to mess around with.

00:28:40.030 --> 00:28:44.290
Now let's focus on this
idea of a storage class.

00:28:44.290 --> 00:28:50.830
These, I'll say, are storage classes
and not data types in SQLite.

00:28:50.830 --> 00:28:58.390
Now a storage class like integer can
comprise, can hold several data types.

00:28:58.390 --> 00:29:01.480
Notice how there are
seven different data

00:29:01.480 --> 00:29:05.380
types that could be stored under
this integer storage class.

00:29:05.380 --> 00:29:10.960
We have a 6-byte integer, 2-byte
integer, a 8 and 0 and so on.

00:29:10.960 --> 00:29:13.750
It could be any of
these particular types,

00:29:13.750 --> 00:29:19.330
but each of these types under the
umbrella of this integer storage class.

00:29:19.330 --> 00:29:23.740
And SQLite itself will
take care of making sure

00:29:23.740 --> 00:29:25.750
it uses the appropriate data type.

00:29:25.750 --> 00:29:30.910
Like if I give a very large number,
like let's say 4 billion or 6 billion

00:29:30.910 --> 00:29:35.200
or even bigger than that, it
will probably use a longer--

00:29:35.200 --> 00:29:39.010
that is, a bigger byte integer
to store that kind of value.

00:29:39.010 --> 00:29:42.610
If I give it a smaller
one, like 1, 2, 3, or 4,

00:29:42.610 --> 00:29:47.140
it will probably use a 1-byte
or a 2-byte integer for that.

00:29:47.140 --> 00:29:50.560
But SQLite's idea is
that I, as a programmer,

00:29:50.560 --> 00:29:54.910
shouldn't have to care if I use an
8-byte or a 1-byte or a 2-byte integer,

00:29:54.910 --> 00:29:58.180
I just care that I'm using
integers, whole numbers,

00:29:58.180 --> 00:30:03.670
and they give me a storage class to
use any of these up to their choice

00:30:03.670 --> 00:30:05.960
here as well.

00:30:05.960 --> 00:30:10.930
Now let's look at a few examples of
values in SQLite that we could store.

00:30:10.930 --> 00:30:14.590
Well, we have perhaps the
red line as some text.

00:30:14.590 --> 00:30:16.690
And because this is
characters, it's quoted,

00:30:16.690 --> 00:30:22.330
we could use the text storage class to
represent this particular value here.

00:30:22.330 --> 00:30:25.630
We could have maybe an image,
and to the earlier point,

00:30:25.630 --> 00:30:29.350
we could say, well, this image
might be best represented

00:30:29.350 --> 00:30:33.490
using a blob, a binary large
object, to keep all of these pixels

00:30:33.490 --> 00:30:36.790
exactly as they are in this image.

00:30:36.790 --> 00:30:40.330
But we do get some choice,
some interesting design

00:30:40.330 --> 00:30:43.720
challenges when we look
at the idea of fares.

00:30:43.720 --> 00:30:50.230
So let's say to our point earlier,
fares are $0.10 back in the 1950s or so.

00:30:50.230 --> 00:30:55.960
Well, $0.10 we could store as an
integer, which seems just fine.

00:30:55.960 --> 00:30:58.480
But this could get confused.

00:30:58.480 --> 00:31:00.790
I'm talking about dollars here or cents?

00:31:00.790 --> 00:31:03.205
Maybe it would be better,
let's say, if I did this.

00:31:03.205 --> 00:31:07.610
A dollar sign, 0.10.

00:31:07.610 --> 00:31:09.680
And what might that be stored as?

00:31:09.680 --> 00:31:11.630
Well, probably text, right?

00:31:11.630 --> 00:31:14.840
I could say this dollar
sign isn't really a number,

00:31:14.840 --> 00:31:18.140
but now I have to include it, so I'll
say this will be quoted, essentially,

00:31:18.140 --> 00:31:22.358
as dollar sign 0.10.

00:31:22.358 --> 00:31:23.900
Now there's some downsides here, too.

00:31:23.900 --> 00:31:25.550
Like let's say I have--

00:31:25.550 --> 00:31:26.840
I want to add these up.

00:31:26.840 --> 00:31:28.580
Well, I can't add up text.

00:31:28.580 --> 00:31:33.140
Like, what does it mean to say dollar
sign 0.10 plus dollar sign 0.20.

00:31:33.140 --> 00:31:35.630
I can't do math with text.

00:31:35.630 --> 00:31:41.090
So maybe it'll be better after all if
I used a real or a decimal like this,

00:31:41.090 --> 00:31:43.970
0.10.

00:31:43.970 --> 00:31:46.880
But I mean even here, you'll
run into some problems.

00:31:46.880 --> 00:31:50.780
If you are familiar with how the
number is represented in binary,

00:31:50.780 --> 00:31:54.380
you might know that decimal
values or floating point values

00:31:54.380 --> 00:31:57.020
can't be perfectly,
precisely represented.

00:31:57.020 --> 00:32:04.850
And if I talk about 0.10, the
computer might store 0.10000056789.

00:32:04.850 --> 00:32:09.450
It could get very wonky out to the many,
many decimal digits down below here.

00:32:09.450 --> 00:32:13.460
So trade offs and challenges overall.

00:32:13.460 --> 00:32:15.080
Let's look at these three, though.

00:32:15.080 --> 00:32:17.480
I have the first one
to store as an integer.

00:32:17.480 --> 00:32:23.000
I'm trying to store fares here,
second one as text, and the third one

00:32:23.000 --> 00:32:27.320
as a floating point or
a real in this case.

00:32:27.320 --> 00:32:30.350
Let me ask for some
opinions here, which one

00:32:30.350 --> 00:32:36.297
would you use and why, for trying
represent fares in this case?

00:32:36.297 --> 00:32:37.130
AUDIENCE: Thank you.

00:32:37.130 --> 00:32:41.700
I prefer using integers
because of course,

00:32:41.700 --> 00:32:44.690
I need to get the
calculation very accurately.

00:32:44.690 --> 00:32:45.830
That's my point of view.

00:32:45.830 --> 00:32:50.030
Well, sometimes I can
use float but, you know,

00:32:50.030 --> 00:32:52.310
like you said before,
it can get very wonky

00:32:52.310 --> 00:32:59.815
if I really need that kind of precision,
I don't really recommend using floats.

00:32:59.815 --> 00:33:01.190
CARTER ZENKE: Yeah, a good point.

00:33:01.190 --> 00:33:04.820
So if I go back to some slides here,
you might argue for the integer

00:33:04.820 --> 00:33:07.940
because you know you can
precisely represent integers.

00:33:07.940 --> 00:33:12.290
And let's say I want to add up
fares over a lot, a lot of riders.

00:33:12.290 --> 00:33:15.020
This might be useful for me because
I know that each number will

00:33:15.020 --> 00:33:16.820
be perfectly, precisely represented.

00:33:16.820 --> 00:33:20.720
I can do lots of big kind of
math with this number here.

00:33:20.720 --> 00:33:24.260
To your point, this decimal
might kind of, as you said,

00:33:24.260 --> 00:33:26.840
get wonky later on towards
the later decimal points.

00:33:26.840 --> 00:33:30.980
I might get some unexpected
results if we add up these overall.

00:33:30.980 --> 00:33:36.380
Let me ask, though, are there any
proponents of this floating point

00:33:36.380 --> 00:33:40.120
value or a real value?

00:33:40.120 --> 00:33:42.930
AUDIENCE: So I think the--

00:33:42.930 --> 00:33:48.810
I think the float is the
number of, for example,

00:33:48.810 --> 00:33:56.190
for each pair, like the
answers like truncation

00:33:56.190 --> 00:33:58.875
probably suggest by the comments there.

00:33:58.875 --> 00:34:00.250
CARTER ZENKE: Yeah, a good point.

00:34:00.250 --> 00:34:02.083
So if you talk about
using this float value,

00:34:02.083 --> 00:34:06.491
I mean, one thing we could say for
it is that this decimal could be--

00:34:06.491 --> 00:34:09.449
it's more accurate to say, like,
where you're working with dollars now,

00:34:09.449 --> 00:34:13.259
and we could have maybe $0.10,
which is only 0.1 of a dollar.

00:34:13.259 --> 00:34:15.030
I totally hear that point as well.

00:34:15.030 --> 00:34:17.280
And the point we're making
here is that they're really

00:34:17.280 --> 00:34:20.580
just trade offs among
these data storage classes

00:34:20.580 --> 00:34:23.699
to use, whether you're using
integers or real values,

00:34:23.699 --> 00:34:27.730
it just depends on your use case
and what you're designing for.

00:34:27.730 --> 00:34:30.900
So be sure to read up on trade
offs among these data types,

00:34:30.900 --> 00:34:35.420
and determine for yourself
which one should you best use.

00:34:35.420 --> 00:34:36.409
OK.

00:34:36.409 --> 00:34:41.090
So we have now these storage
classes to store values,

00:34:41.090 --> 00:34:47.210
and it turns out that columns can be
made to store certain storage classes

00:34:47.210 --> 00:34:49.460
or prioritize certain classes.

00:34:49.460 --> 00:34:53.750
And the key distinction here
is that columns in SQLite

00:34:53.750 --> 00:34:57.440
don't always store one particular type.

00:34:57.440 --> 00:35:01.250
Instead, they have type
affinities, meaning

00:35:01.250 --> 00:35:07.280
that they'll try to convert some value
you insert into a given cell or given

00:35:07.280 --> 00:35:11.660
row to the type they
have the affinity for.

00:35:11.660 --> 00:35:16.490
Now there are, let's say five
type affinities in SQLite--

00:35:16.490 --> 00:35:17.270
text.

00:35:17.270 --> 00:35:19.610
Columns can be of the
type affinity text,

00:35:19.610 --> 00:35:22.520
meaning they store just
characters at the end of the day.

00:35:22.520 --> 00:35:27.950
There's also numeric, which stores
either integer values or real values,

00:35:27.950 --> 00:35:31.010
depending on which one
seems best to convert to.

00:35:31.010 --> 00:35:33.980
You have integer type affinity, which
means it can store whole numbers.

00:35:33.980 --> 00:35:36.990
Real, to store floating
points or decimal values.

00:35:36.990 --> 00:35:39.120
And we have blob here
again, our old friend

00:35:39.120 --> 00:35:42.030
to store binary exactly as we get it.

00:35:42.030 --> 00:35:44.750
So whereas before we were
talking about storage classes--

00:35:44.750 --> 00:35:47.090
those are associated
with individual values--

00:35:47.090 --> 00:35:51.210
type affinities are now associated
with individual columns.

00:35:51.210 --> 00:35:55.460
So let's see some example of
how this might work in SQLite.

00:35:55.460 --> 00:35:59.000
Let's say I have this
table of fares, and we've

00:35:59.000 --> 00:36:02.720
decided to store fares as integers.

00:36:02.720 --> 00:36:09.470
Well, if I said that this column called
amount has the affinity for the text

00:36:09.470 --> 00:36:14.300
storage class, what would happen
is if I insert this integer, 10,

00:36:14.300 --> 00:36:16.490
it would look a bit like this later on.

00:36:16.490 --> 00:36:20.250
It would be converted to text,
it would be quoted in a sense,

00:36:20.250 --> 00:36:24.240
to represent-- it's now been
converted to some set of characters.

00:36:24.240 --> 00:36:27.050
Let's say I insert this value 25.

00:36:27.050 --> 00:36:31.250
Well, 25 has a storage class
right now of an integer.

00:36:31.250 --> 00:36:33.560
It is a whole number.

00:36:33.560 --> 00:36:38.480
But if I insert this into a
column that has the text affinity,

00:36:38.480 --> 00:36:44.250
it will be converted into, in this
case, text at the end of the day.

00:36:44.250 --> 00:36:45.920
Let's do the opposite.

00:36:45.920 --> 00:36:51.270
Let's say I have my fare as a
string, some text in this case.

00:36:51.270 --> 00:36:54.690
I want to insert it into
this column called amount,

00:36:54.690 --> 00:36:58.820
but now amount has the
integer type affinity.

00:36:58.820 --> 00:37:03.830
Well, if I insert 10, quote
unquote, into the column amount,

00:37:03.830 --> 00:37:10.850
I'll get back not 10, the text, but 10
the integer, because again, amount--

00:37:10.850 --> 00:37:16.310
this column here-- has an affinity
for the integer storage class.

00:37:16.310 --> 00:37:17.210
Let's try this.

00:37:17.210 --> 00:37:21.650
25, some text again I'll
insert it into this table.

00:37:21.650 --> 00:37:26.690
Now I'll have 25 as an integer.

00:37:26.690 --> 00:37:31.300
So this is how SQLite allows
us to give certain columns

00:37:31.300 --> 00:37:33.610
an affinity for certain
types, that they'll

00:37:33.610 --> 00:37:38.860
try to store values of that type, so
long as we insert values it could be

00:37:38.860 --> 00:37:41.900
feasibly converted to that type here.

00:37:41.900 --> 00:37:45.820
So let's go back to our schema
and try to improve it now,

00:37:45.820 --> 00:37:50.860
to use not just column names,
but also type affinities to store

00:37:50.860 --> 00:37:53.140
certain data of a certain type.

00:37:53.140 --> 00:37:58.000
Go back to my computer here, and
let's improve this once more.

00:37:58.000 --> 00:38:00.910
So I'll go over to my table.

00:38:00.910 --> 00:38:05.890
And now, I probably want
to improve the design here.

00:38:05.890 --> 00:38:08.680
And often, if I want to
improve this, I might just

00:38:08.680 --> 00:38:10.910
need to erase what I've already done.

00:38:10.910 --> 00:38:16.210
So let me introduce this new keyword,
this new statement called Drop Table.

00:38:16.210 --> 00:38:19.460
To drop a table means to delete
it, to remove it, effectively.

00:38:19.460 --> 00:38:23.080
So let me try doing this for
riders, stations, and visits.

00:38:23.080 --> 00:38:28.840
I'll type drop table
writers, semicolon, Enter.

00:38:28.840 --> 00:38:34.240
Nothing seems to happen, but if I
type dot schema, well, riders is gone.

00:38:34.240 --> 00:38:40.960
I'll try drop table stations, then
semicolon, hit Enter, and type,

00:38:40.960 --> 00:38:42.760
let's say dot schema again.

00:38:42.760 --> 00:38:44.320
No more stations.

00:38:44.320 --> 00:38:49.030
I'll try drop table
visits, semicolon, Enter,

00:38:49.030 --> 00:38:52.510
and then dot schema, and our table--

00:38:52.510 --> 00:38:54.160
our database is really gone.

00:38:54.160 --> 00:38:57.280
There are no more tables inside of it.

00:38:57.280 --> 00:39:02.370
So let me propose that instead
of working inside of the SQLite

00:39:02.370 --> 00:39:05.730
prompt, like typing out again, and
again, and again, create table riders,

00:39:05.730 --> 00:39:07.650
create table stations,
create table visits,

00:39:07.650 --> 00:39:10.650
let me be more efficient
about this and create myself

00:39:10.650 --> 00:39:14.790
a schema file that I could
reuse throughout this lesson

00:39:14.790 --> 00:39:18.870
and also later on while I'm
working on this database on my own.

00:39:18.870 --> 00:39:23.850
To do that, let me quit my
SQLite prompt here, and let me

00:39:23.850 --> 00:39:27.300
type something like code schema dot sql.

00:39:27.300 --> 00:39:31.620
I'm just creating this
file called schema.sql.

00:39:31.620 --> 00:39:36.060
Now a dot SQL file allows
me to type in SQL keywords

00:39:36.060 --> 00:39:38.790
and have them be syntax
highlighted, so I know

00:39:38.790 --> 00:39:42.040
what's going on inside of this file.

00:39:42.040 --> 00:39:44.130
So let's just try this once more.

00:39:44.130 --> 00:39:49.170
I'll type Create table
riders, and inside I'll

00:39:49.170 --> 00:39:53.820
say it has the ID column
of what type affinity?

00:39:53.820 --> 00:39:58.270
Well, IDs are whole numbers, so
perhaps integer in this case.

00:39:58.270 --> 00:40:02.160
I could say, ID has the
integer type affinity.

00:40:02.160 --> 00:40:05.010
Now let me say the
riders also have a name,

00:40:05.010 --> 00:40:07.710
and how could names be best represented?

00:40:07.710 --> 00:40:09.210
Maybe text, right?

00:40:09.210 --> 00:40:10.120
Characters here.

00:40:10.120 --> 00:40:12.750
So I'll say name and text.

00:40:12.750 --> 00:40:14.850
Now I'll include a
semicolon to say, this

00:40:14.850 --> 00:40:17.820
is the end of my create table statement.

00:40:17.820 --> 00:40:21.630
And before, remember how
I had to kind of air out

00:40:21.630 --> 00:40:25.560
or I had to like, backspace and
so on to improve the design?

00:40:25.560 --> 00:40:28.890
Here I can literally just point
and click and edit this file

00:40:28.890 --> 00:40:30.630
to improve my schema.

00:40:30.630 --> 00:40:35.130
And I'll later on apply
this in my database using

00:40:35.130 --> 00:40:37.418
a command that we'll see a bit later.

00:40:37.418 --> 00:40:38.460
So let's keep going here.

00:40:38.460 --> 00:40:44.040
I'll say create table stations,
and inside of the stations table,

00:40:44.040 --> 00:40:48.330
I'll make sure it has an
ID column of type integer,

00:40:48.330 --> 00:40:51.960
a name column that
probably stores some text,

00:40:51.960 --> 00:40:57.270
and a line column that also
stores some text, where name

00:40:57.270 --> 00:41:00.130
is the name of the station
like Kendell MIT, Harvard,

00:41:00.130 --> 00:41:05.640
and line is blue line or green line,
what line it's part of in our subway.

00:41:05.640 --> 00:41:07.290
Let me try now visits.

00:41:07.290 --> 00:41:08.850
I'll say, create table visits.

00:41:08.850 --> 00:41:13.620
And then I'll do rider ID,
which has what type affinity?

00:41:13.620 --> 00:41:15.030
Probably integer.

00:41:15.030 --> 00:41:19.680
And then I'll do station ID, which
has this same type of affinity.

00:41:19.680 --> 00:41:23.070
It's relying on integers,
whole numbers here for IDs.

00:41:23.070 --> 00:41:25.450
So my colon to finish this statement.

00:41:25.450 --> 00:41:29.040
Now this is my schema as a whole.

00:41:29.040 --> 00:41:31.830
I have riders, stations, and visits.

00:41:31.830 --> 00:41:36.760
But now I want to apply
this schema to my database.

00:41:36.760 --> 00:41:37.780
So what could I do?

00:41:37.780 --> 00:41:39.330
I could reopen it, let's say.

00:41:39.330 --> 00:41:43.510
I'll do sqlite3 mbta.db in my terminal.

00:41:43.510 --> 00:41:47.370
And now I want to read
in this schema.sql file,

00:41:47.370 --> 00:41:51.340
just run the commands that
are inside of this file.

00:41:51.340 --> 00:41:52.660
So what could I do?

00:41:52.660 --> 00:41:59.040
I could say dot read schema.sql,
where dot read is a command that

00:41:59.040 --> 00:42:01.860
says take whatever file you
give me, like schema.sql

00:42:01.860 --> 00:42:08.070
and read it into this database, running
any SQL keywords you come across there.

00:42:08.070 --> 00:42:09.450
I'll hit Enter.

00:42:09.450 --> 00:42:11.640
And now nothing seems to happen.

00:42:11.640 --> 00:42:17.658
But if I type dot schema, I
now see my schema improved.

00:42:17.658 --> 00:42:19.950
And this is helpful for me,
because what I could do now

00:42:19.950 --> 00:42:23.580
is I could just edit my schema.sql
file and rerun it and rerun

00:42:23.580 --> 00:42:29.770
it to make sure I now have these
tables being improved over time.

00:42:29.770 --> 00:42:30.730
OK.

00:42:30.730 --> 00:42:37.720
So this, then, is our new
representation of our database.

00:42:37.720 --> 00:42:42.280
We have riders, of course,
their own entity, and stations.

00:42:42.280 --> 00:42:47.170
They have an ID and a name, and
stations have an ID, a name, and a line.

00:42:47.170 --> 00:42:52.810
We've also now included these type
affinities-- integer, text, integer,

00:42:52.810 --> 00:42:56.320
text, to tell SQL what
kinds of storage classes

00:42:56.320 --> 00:43:00.490
could be put inside of
each of these columns.

00:43:00.490 --> 00:43:03.580
Now before we keep
improving this, let me ask,

00:43:03.580 --> 00:43:10.780
what questions do we have on these
storage classes and type affinities?

00:43:10.780 --> 00:43:12.400
AUDIENCE: That you would use--

00:43:12.400 --> 00:43:17.870
when you were creating the
table, the table was not in line.

00:43:17.870 --> 00:43:21.340
So when we search for
the authors of books,

00:43:21.340 --> 00:43:25.982
so it comes with a perfect table, so how
can we make a perfect table in the SQL?

00:43:25.982 --> 00:43:27.940
CARTER ZENKE: Yeah, do
you mind clarifying what

00:43:27.940 --> 00:43:30.160
you mean by like, the perfect table?

00:43:30.160 --> 00:43:33.760
AUDIENCE: So I mean that it
was arranged in something

00:43:33.760 --> 00:43:36.443
like in boxes, block boxes.

00:43:36.443 --> 00:43:37.610
CARTER ZENKE: Good question.

00:43:37.610 --> 00:43:41.770
So before we able to see
the results of our queries

00:43:41.770 --> 00:43:46.660
inside some boxes in our terminal,
and that is actually a mode of SQLite,

00:43:46.660 --> 00:43:51.370
I think type like dot mode table to
see your results in that version.

00:43:51.370 --> 00:43:54.937
Here though, we have no
data inside of our tables,

00:43:54.937 --> 00:43:56.770
so I can't really select
anything from them.

00:43:56.770 --> 00:44:01.300
Like, if I go to my
terminal here and I try

00:44:01.300 --> 00:44:04.690
to select some data
from this riders table,

00:44:04.690 --> 00:44:10.510
let me say select star from rider's,
semicolon, I won't get anything back.

00:44:10.510 --> 00:44:15.040
Next week, though, we'll see how to
insert, and add, and update, and delete

00:44:15.040 --> 00:44:17.990
data inside of these
tables, at which point,

00:44:17.990 --> 00:44:21.460
you could write, select star
from riders and see some data

00:44:21.460 --> 00:44:23.560
you've inserted yourself.

00:44:23.560 --> 00:44:25.120
Great question.

00:44:25.120 --> 00:44:27.580
Let's take one more here.

00:44:27.580 --> 00:44:28.360
AUDIENCE: Yes.

00:44:28.360 --> 00:44:33.130
I would like to know if you have
a column of the type boolean.

00:44:33.130 --> 00:44:36.730
CARTER ZENKE: Yeah, do we have a
Boolean type affinity, let's say.

00:44:36.730 --> 00:44:39.850
So here, we don't, at least in SQLite.

00:44:39.850 --> 00:44:45.610
Some other DBMS', Database Management
Systems might have bool or Boolean,

00:44:45.610 --> 00:44:47.110
true or false, right?

00:44:47.110 --> 00:44:48.700
Let me show you this instead.

00:44:48.700 --> 00:44:52.840
If I go to my terminal, I can
see-- if I type dot schema,

00:44:52.840 --> 00:44:56.320
I haven't used Boolean, there's
no need for me in this case,

00:44:56.320 --> 00:44:58.150
but I have used integer.

00:44:58.150 --> 00:45:02.350
And integer for SQLite can
kind of serve the same purpose.

00:45:02.350 --> 00:45:06.550
I could have the integer 0
or the integer 1 to be true

00:45:06.550 --> 00:45:09.880
or to be false or true, respectively,
0 being false and true being 1,

00:45:09.880 --> 00:45:11.140
I believe in this case.

00:45:11.140 --> 00:45:13.970
But good question.

00:45:13.970 --> 00:45:14.750
OK.

00:45:14.750 --> 00:45:19.190
So to the earlier question,
like, we've improved our tables.

00:45:19.190 --> 00:45:21.950
We now have type
affinities for our columns.

00:45:21.950 --> 00:45:26.490
But we don't yet have this
ID we talked about last week,

00:45:26.490 --> 00:45:29.450
which was primary keys and foreign keys.

00:45:29.450 --> 00:45:34.280
This idea of trying to uniquely
represent each item in our own table

00:45:34.280 --> 00:45:36.770
using primary keys and
trying to reference

00:45:36.770 --> 00:45:40.800
that primary key from some
other table using foreign keys.

00:45:40.800 --> 00:45:43.820
So let's try to work on that now.

00:45:43.820 --> 00:45:48.980
And for this, we'll need this
new ID called a table constraint.

00:45:48.980 --> 00:45:54.500
In SQLite, you can apply what's called
a constraint to your entire table.

00:45:54.500 --> 00:45:59.300
A constraint means that some
values have to be a certain way.

00:45:59.300 --> 00:46:04.340
Like let's say for a primary
key, primary keys must be unique.

00:46:04.340 --> 00:46:07.730
They can't repeat, and they
must be, at least in our case,

00:46:07.730 --> 00:46:12.990
is going to be integers, to be able
to quickly add on to them over time.

00:46:12.990 --> 00:46:16.280
Similarly, for foreign
keys, well, a constraint

00:46:16.280 --> 00:46:19.220
is that if you have a
foreign key, you better

00:46:19.220 --> 00:46:22.850
find that value in some
other table, otherwise

00:46:22.850 --> 00:46:26.250
you violated this constraint
to having a foreign key.

00:46:26.250 --> 00:46:29.940
So we have two kinds of table
constraints, among others,

00:46:29.940 --> 00:46:33.500
but two of these are
primary key and foreign key.

00:46:33.500 --> 00:46:36.800
And we can apply these to
our table by applying them

00:46:36.800 --> 00:46:42.030
underneath the columns we tend to
say will be inside of our table.

00:46:42.030 --> 00:46:43.850
Let's try these two here now.

00:46:43.850 --> 00:46:47.840
So we come back to my terminal here so
we implement our very own primary key

00:46:47.840 --> 00:46:49.730
and foreign key constraints.

00:46:49.730 --> 00:46:52.920
We'll go back to my SQLite
terminal and clear my screen,

00:46:52.920 --> 00:46:55.670
and let's then pull
up our schema.sql file

00:46:55.670 --> 00:46:58.130
so we can keep modifying our schema.

00:46:58.130 --> 00:47:01.610
I can now see I have the riders
table, stations, and visits,

00:47:01.610 --> 00:47:06.080
and I have some columns that could
be primary keys or foreign keys,

00:47:06.080 --> 00:47:08.850
but I need to declare them as such.

00:47:08.850 --> 00:47:12.120
So here in the riders table,
what was our primary key?

00:47:12.120 --> 00:47:13.520
Well, it was ID.

00:47:13.520 --> 00:47:16.520
Every rider should have their
own, unique ID that should not

00:47:16.520 --> 00:47:19.460
be duplicated across any two riders.

00:47:19.460 --> 00:47:24.410
So to ensure that constraint is applied,
I could follow this up with a comma

00:47:24.410 --> 00:47:28.770
and then say, primary
key ID, just like this.

00:47:28.770 --> 00:47:32.690
Now ID is the primary key
of this rider's table.

00:47:32.690 --> 00:47:36.450
I can go down to stations and
ask, what was my primary key?

00:47:36.450 --> 00:47:40.520
Well, similarly, it was ID,
this ID column on line 8.

00:47:40.520 --> 00:47:45.590
So I'll type a comma followed
up with primary key ID,

00:47:45.590 --> 00:47:49.170
and now that ID column is a primary key.

00:47:49.170 --> 00:47:52.580
It has that constraint
applied in stations.

00:47:52.580 --> 00:47:56.900
But now if I get down to visits,
we'll have a few more options.

00:47:56.900 --> 00:48:01.700
Visits here actually doesn't have
its own ID column that I created.

00:48:01.700 --> 00:48:05.750
I instead have a rider ID
and a station ID column.

00:48:05.750 --> 00:48:07.110
So a few options here.

00:48:07.110 --> 00:48:10.730
One option is actually to
make a joint primary key.

00:48:10.730 --> 00:48:17.630
I could have a primary key composed of
two columns, both rider ID and station

00:48:17.630 --> 00:48:18.170
ID.

00:48:18.170 --> 00:48:20.780
If I applied that
constraint, that would mean

00:48:20.780 --> 00:48:24.110
that if I were to ever insert
a row that had the same rider

00:48:24.110 --> 00:48:26.990
ID and the same station
ID as another row,

00:48:26.990 --> 00:48:29.810
I would trigger a constraint violation.

00:48:29.810 --> 00:48:33.080
Every row that has a
rider ID and a station ID

00:48:33.080 --> 00:48:36.860
has to be unique in their
combination of those two values.

00:48:36.860 --> 00:48:39.770
To write that kind of scenario,
I could follow this up

00:48:39.770 --> 00:48:43.190
and I could say similarly,
the primary key of this table

00:48:43.190 --> 00:48:48.410
is not just rider ID like
this, but it's also station ID.

00:48:48.410 --> 00:48:52.080
Now this is a joint
primary key constraint.

00:48:52.080 --> 00:48:54.110
But if we think about
this logically, I mean,

00:48:54.110 --> 00:48:57.620
it kind of stands to reason that
somebody would visit a station more

00:48:57.620 --> 00:49:01.850
than once, and I don't want to make
sure that every combination of rider

00:49:01.850 --> 00:49:03.420
and station ID should be unique.

00:49:03.420 --> 00:49:06.600
I want people to able to visit
a station more than once.

00:49:06.600 --> 00:49:09.900
So maybe not the best design for this
table, but I could certainly use it.

00:49:09.900 --> 00:49:12.530
In other cases or other contexts.

00:49:12.530 --> 00:49:15.410
One other option would
be to do it before

00:49:15.410 --> 00:49:19.670
and to have maybe the ID
column here of type integer,

00:49:19.670 --> 00:49:23.670
and then down below, make that
our primary key, a bit like this.

00:49:23.670 --> 00:49:27.140
And now visits has its own ID column.

00:49:27.140 --> 00:49:33.395
But actually, SQLite by default will
give me its very own primary key,

00:49:33.395 --> 00:49:35.810
one called row ID.

00:49:35.810 --> 00:49:39.170
It's implicit, I can't see it,
but I actually could query it.

00:49:39.170 --> 00:49:41.810
I could query it for
row ID, all one word,

00:49:41.810 --> 00:49:46.070
and get back a unique primary key for
this table SQLite has automatically

00:49:46.070 --> 00:49:47.950
created for me.

00:49:47.950 --> 00:49:50.880
Now we have seen the
primary key options,

00:49:50.880 --> 00:49:52.420
what are our foreign key options?

00:49:52.420 --> 00:49:55.180
Well, it seems like
rider ID and station ID

00:49:55.180 --> 00:49:59.320
are the foreign keys of this table
where rider ID references the ID

00:49:59.320 --> 00:50:05.020
column in rider's, and station ID
references the ID column in stations.

00:50:05.020 --> 00:50:08.350
So to codify that, to make
that a reality in our schema,

00:50:08.350 --> 00:50:11.540
I could follow this up
with the foreign key.

00:50:11.540 --> 00:50:14.260
The foreign key of
this table is rider ID,

00:50:14.260 --> 00:50:19.690
and it references the rider's
table and the ID column

00:50:19.690 --> 00:50:22.840
inside of it using this syntax here.

00:50:22.840 --> 00:50:24.530
Now I could keep going.

00:50:24.530 --> 00:50:26.840
I could say I have more
than one foreign key.

00:50:26.840 --> 00:50:29.500
I also have a foreign key--

00:50:29.500 --> 00:50:32.020
foreign key called station ID.

00:50:32.020 --> 00:50:34.150
And that references,
like we said before,

00:50:34.150 --> 00:50:37.670
the stations table and the
ID column inside of it.

00:50:37.670 --> 00:50:40.060
So now here I have my completed schema.

00:50:40.060 --> 00:50:43.780
I have a primary key for the tables
I've declared an explicit column

00:50:43.780 --> 00:50:47.800
for a primary key, and now I
also have foreign key constraints

00:50:47.800 --> 00:50:51.710
for those columns that
should be foreign keys.

00:50:51.710 --> 00:50:55.570
So now let me ask, what questions
do we have on the schema

00:50:55.570 --> 00:50:59.508
or on primary keys and foreign keys?

00:50:59.508 --> 00:51:00.300
AUDIENCE: So, yeah.

00:51:00.300 --> 00:51:03.570
I just noticed that
whenever before we hadn't

00:51:03.570 --> 00:51:06.210
added the affinities and
the keys, we were not

00:51:06.210 --> 00:51:09.340
applying commas after each column name.

00:51:09.340 --> 00:51:11.295
So what is the difference there?

00:51:11.295 --> 00:51:12.670
CARTER ZENKE: Yeah, a good catch.

00:51:12.670 --> 00:51:15.128
Let me kind of show you what
this looks like in my terminal

00:51:15.128 --> 00:51:17.310
so you can see it live.

00:51:17.310 --> 00:51:20.430
You, I think, had noticed
that before we had,

00:51:20.430 --> 00:51:25.110
let's say, this primary key
ID constraint in riders,

00:51:25.110 --> 00:51:26.740
we had done something like this.

00:51:26.740 --> 00:51:28.290
Let me just copy paste that.

00:51:28.290 --> 00:51:30.450
And we had removed this last column--

00:51:30.450 --> 00:51:32.190
or comma from the name column.

00:51:32.190 --> 00:51:33.090
Is that right?

00:51:33.090 --> 00:51:38.290
And if that's the case, well, it's
just convention, just style here.

00:51:38.290 --> 00:51:41.100
So if I want to keep
adding some constraint

00:51:41.100 --> 00:51:45.000
or like a new line to my table,
I should include a comma.

00:51:45.000 --> 00:51:51.720
Here, this name column was the last
portion of my table I had specified.

00:51:51.720 --> 00:51:55.770
I have this column called name
that has type affinity text, right?

00:51:55.770 --> 00:51:58.590
But now, if I add this
new constraint, we'll

00:51:58.590 --> 00:52:03.360
have to follow it up with-- follow it
up after a comma from this new column

00:52:03.360 --> 00:52:04.290
here.

00:52:04.290 --> 00:52:10.230
Notice now, this constraint,
primary key ID, is the last--

00:52:10.230 --> 00:52:12.810
let's say attribute of
my table I specified.

00:52:12.810 --> 00:52:15.870
I no longer need to include
a comma at the end of it.

00:52:15.870 --> 00:52:20.430
So whatever is the last portion,
I should not have a comma after,

00:52:20.430 --> 00:52:23.460
but everything else, I should.

00:52:23.460 --> 00:52:25.800
Let's take one more question here, too.

00:52:25.800 --> 00:52:29.813
AUDIENCE: Would it be OK for the visits
table to have an ID column as well?

00:52:29.813 --> 00:52:31.230
CARTER ZENKE: Yeah, good question.

00:52:31.230 --> 00:52:35.490
Would it be OK for the visits
table to have an ID column as well.

00:52:35.490 --> 00:52:36.450
It certainly would be.

00:52:36.450 --> 00:52:39.870
We could define our very own
primary key for this table, too.

00:52:39.870 --> 00:52:42.210
So let me go back and show
you how that could work.

00:52:42.210 --> 00:52:44.880
I'll go to my visits table here.

00:52:44.880 --> 00:52:49.440
And I could try to add my own
primary key to this table.

00:52:49.440 --> 00:52:54.870
I could say ID, make a new column
here, make it a type affinity integer,

00:52:54.870 --> 00:52:55.900
like this.

00:52:55.900 --> 00:52:56.700
Let me scroll up.

00:52:56.700 --> 00:53:00.090
And now let me add some new constraint.

00:53:00.090 --> 00:53:03.450
I could say, because I've
made my very own primary key,

00:53:03.450 --> 00:53:06.310
I'll say primary key ID.

00:53:06.310 --> 00:53:12.510
Now, this table has a primary
key that I've created called ID.

00:53:12.510 --> 00:53:15.310
And this will be in place
of SQLite, so that it

00:53:15.310 --> 00:53:21.730
would have made called row ID
itself but hidden from my own view.

00:53:21.730 --> 00:53:24.450
OK, let's keep going then.

00:53:24.450 --> 00:53:27.570
We've seen table constraints,
we've seen type affinities,

00:53:27.570 --> 00:53:30.540
but we could probably do
more to improve the design

00:53:30.540 --> 00:53:32.740
of this table or this database.

00:53:32.740 --> 00:53:35.850
So let's introduce one
more kind of constraint,

00:53:35.850 --> 00:53:38.740
this one called a column constraint.

00:53:38.740 --> 00:53:42.900
So whereas a table constraint
applies to the table as a whole,

00:53:42.900 --> 00:53:47.160
a column constraint applies to
a particular column, let's say.

00:53:47.160 --> 00:53:51.930
Maybe I want to column to have
certain data inside of it.

00:53:51.930 --> 00:53:54.720
Maybe I want to make sure it
doesn't have null values, and so on.

00:53:54.720 --> 00:53:57.690
I could do that with column constraints.

00:53:57.690 --> 00:54:00.510
There are four in SQLite--

00:54:00.510 --> 00:54:07.440
check, default, not null, and unique,
and each one does something different.

00:54:07.440 --> 00:54:11.340
Check allows me to make my very
own check, like check to be sure

00:54:11.340 --> 00:54:14.250
that this amount is greater than 0.

00:54:14.250 --> 00:54:17.250
Or I could use default.
Default means if I

00:54:17.250 --> 00:54:20.310
don't supply a value
when I add a new row,

00:54:20.310 --> 00:54:23.640
it'll just use a default value instead.

00:54:23.640 --> 00:54:29.730
Not null means I can't insert null
or empty values into this column.

00:54:29.730 --> 00:54:31.680
And in fact, it's required.

00:54:31.680 --> 00:54:37.770
Unique means, I want to make sure
that every row in this column

00:54:37.770 --> 00:54:39.390
is a unique value.

00:54:39.390 --> 00:54:42.370
It doesn't appear twice in my data set.

00:54:42.370 --> 00:54:46.440
So let's try applying a few
of these to our schema here.

00:54:46.440 --> 00:54:48.270
Go back to my terminal.

00:54:48.270 --> 00:54:50.920
And now let me check out this.

00:54:50.920 --> 00:54:55.020
Well, I could try applying
the not null constraint

00:54:55.020 --> 00:54:59.190
when I know I want a column
to be required, effectively.

00:54:59.190 --> 00:55:01.530
Now where could I best apply that?

00:55:01.530 --> 00:55:05.790
Maybe I could apply that to
the name column in stations,

00:55:05.790 --> 00:55:07.870
like stations must have a name.

00:55:07.870 --> 00:55:11.140
So I'll say, the name
column cannot be null.

00:55:11.140 --> 00:55:14.310
It cannot be empty in here.

00:55:14.310 --> 00:55:17.280
Line also should probably be not null.

00:55:17.280 --> 00:55:19.710
A station must be part of some line.

00:55:19.710 --> 00:55:21.780
I can't have an empty value for line.

00:55:21.780 --> 00:55:26.240
So I'll say, too, this
should be not null.

00:55:26.240 --> 00:55:28.790
Now I could apply this up at name.

00:55:28.790 --> 00:55:31.610
I could say, riders
must have a name, too.

00:55:31.610 --> 00:55:32.310
Let me try that.

00:55:32.310 --> 00:55:34.220
I'll say text not null.

00:55:34.220 --> 00:55:35.630
Or I could leave it optional.

00:55:35.630 --> 00:55:39.020
I could say, maybe text
just on its own and let

00:55:39.020 --> 00:55:42.680
riders choose to supply a name or not.

00:55:42.680 --> 00:55:48.650
Now the question here is, should I apply
not null to my primary key columns,

00:55:48.650 --> 00:55:53.300
like ID not null or ID not null here?

00:55:53.300 --> 00:55:56.420
You might think that you
should, for thoroughness sake.

00:55:56.420 --> 00:56:01.940
Well it turns out that when you apply
the primary key table constraint down

00:56:01.940 --> 00:56:06.440
below here, this already ensures that
there are several constraints applied

00:56:06.440 --> 00:56:12.800
to this particular column called ID,
among them being that ID cannot be

00:56:12.800 --> 00:56:13.350
null.

00:56:13.350 --> 00:56:17.480
So no need to duplicate this and say
that this ID cannot be null when I

00:56:17.480 --> 00:56:23.240
already have it specified down
below that ID is a primary key.

00:56:23.240 --> 00:56:25.520
Let me check others here.

00:56:25.520 --> 00:56:29.800
You might also think, could I do
it for rider ID and station ID?

00:56:29.800 --> 00:56:32.470
Should I include not null here?

00:56:32.470 --> 00:56:36.490
Rider ID, not null,
station ID, not null.

00:56:36.490 --> 00:56:39.250
Well, that would be a good
thought, but again, we're

00:56:39.250 --> 00:56:43.540
taken care of by our table constraint
using our foreign key here.

00:56:43.540 --> 00:56:48.520
Again, this constraint will
say, if rider ID doesn't already

00:56:48.520 --> 00:56:53.380
exist in the ID column of rider's,
I can't insert that value.

00:56:53.380 --> 00:56:58.090
And we could probably presume that
if rider's ID is a primary key,

00:56:58.090 --> 00:57:01.130
well, null will not be
part of this column,

00:57:01.130 --> 00:57:06.430
and therefore, I already can't insert
null for rider ID or station ID.

00:57:06.430 --> 00:57:08.860
This would be in this case, redundant.

00:57:08.860 --> 00:57:12.910
So not null is good when you
have columns that are neither

00:57:12.910 --> 00:57:15.910
primary keys nor foreign
keys and you want

00:57:15.910 --> 00:57:20.010
to make sure that they have-- they
always have a value, that they

00:57:20.010 --> 00:57:22.170
are never null, effectively.

00:57:25.010 --> 00:57:27.320
So that is not null.

00:57:27.320 --> 00:57:28.730
So we could keep going here.

00:57:28.730 --> 00:57:34.190
We also had one called unique
that makes sure every value,

00:57:34.190 --> 00:57:37.100
every row in this column is unique.

00:57:37.100 --> 00:57:38.820
Where could we apply this?

00:57:38.820 --> 00:57:42.140
I could try to apply it, let's
say, to the name of a station.

00:57:42.140 --> 00:57:44.820
Like, station should have unique names.

00:57:44.820 --> 00:57:47.600
So I'll say not null and unique.

00:57:47.600 --> 00:57:50.220
Now this column has two constraints.

00:57:50.220 --> 00:57:51.680
The first, not null.

00:57:51.680 --> 00:57:53.720
It should always have a value.

00:57:53.720 --> 00:57:55.190
The second, unique.

00:57:55.190 --> 00:57:58.220
The value shouldn't repeat
throughout this column.

00:57:58.220 --> 00:58:01.100
Line I might leave
without this constraint.

00:58:01.100 --> 00:58:04.370
I could imagine two stations being
on the same line, like both on blue,

00:58:04.370 --> 00:58:07.170
I'll allow that in this case.

00:58:07.170 --> 00:58:12.560
Now again, we could try to apply unique
to our primary keys or our foreign keys

00:58:12.560 --> 00:58:15.920
as I just did here,
but it's already taken

00:58:15.920 --> 00:58:19.520
care of for us using this
primary key constraint.

00:58:19.520 --> 00:58:24.050
A primary key, again, is always
going to be unique and never null.

00:58:24.050 --> 00:58:29.720
So we'll take advantage of that already
using our primary key and foreign key

00:58:29.720 --> 00:58:32.490
constraints here.

00:58:32.490 --> 00:58:33.270
OK.

00:58:33.270 --> 00:58:36.180
So we've seen unique and not null.

00:58:36.180 --> 00:58:39.510
And I might argue, we're at the
point where this schema seems

00:58:39.510 --> 00:58:44.400
to be fairly optimized, at least using
our column constraints, our table

00:58:44.400 --> 00:58:47.560
constraints, our type
affinities and so on.

00:58:47.560 --> 00:58:55.440
So let's ask then, what questions do
we have on not null and unique, if any?

00:58:55.440 --> 00:58:58.770
AUDIENCE: So basically, to recap
if I understood correctly, it's

00:58:58.770 --> 00:59:01.620
not precisely about not
null and unique but about

00:59:01.620 --> 00:59:06.060
the concept of the key,
labeling key immediately

00:59:06.060 --> 00:59:10.135
gives the attribute of not null,
unique, and to be referenced, right?

00:59:10.135 --> 00:59:11.260
CARTER ZENKE: That is true.

00:59:11.260 --> 00:59:14.640
So when you use a primary key
or a foreign key constraint,

00:59:14.640 --> 00:59:17.880
there are other constraints that
go along with that constraint.

00:59:17.880 --> 00:59:23.410
A primary key, for instance, must not
be null, it must be unique, and so on.

00:59:23.410 --> 00:59:25.290
So it would be redundant
to apply that again,

00:59:25.290 --> 00:59:29.010
to say that this primary key
should be unique or not null.

00:59:29.010 --> 00:59:31.160
Good clarification there.

00:59:31.160 --> 00:59:32.000
OK.

00:59:32.000 --> 00:59:36.590
So I think we're at the point where
this schema is pretty well set for us,

00:59:36.590 --> 00:59:42.380
and we're going to need to think about
how we use this in the real world.

00:59:42.380 --> 00:59:48.630
Like, if this is our schema here,
we have riders and stations, well,

00:59:48.630 --> 00:59:49.940
what could we do?

00:59:49.940 --> 00:59:52.340
I mean, riders tend to--

00:59:52.340 --> 00:59:54.710
I mean, they could
register for the subway.

00:59:54.710 --> 00:59:56.900
And riders, well, they
do visit stations.

00:59:56.900 --> 00:59:59.360
But I think if we applied
this to the real world,

00:59:59.360 --> 01:00:02.630
we'd see that this isn't
quite how it's actually done.

01:00:02.630 --> 01:00:05.510
Like, riders don't really
register that often.

01:00:05.510 --> 01:00:08.570
If a rider who's new to the city comes
in, they want to ride the subway,

01:00:08.570 --> 01:00:10.760
they should be able to ride, too.

01:00:10.760 --> 01:00:15.110
So it turns out at least here in
Boston, the MBTA doesn't really

01:00:15.110 --> 01:00:20.360
track riders, per se, but they do
track what we call CharlieCards.

01:00:20.360 --> 01:00:23.780
If you want to ride a subway,
you get a CharlieCard.

01:00:23.780 --> 01:00:26.330
A CharlieCard allows you
access to the subway,

01:00:26.330 --> 01:00:28.910
keeps track of your fare,
your balance and so on,

01:00:28.910 --> 01:00:31.730
and allows you to swipe
in to certain stations.

01:00:31.730 --> 01:00:33.800
So when we come back
from a break here, we'll

01:00:33.800 --> 01:00:37.550
see how we can actually implement
these CharlieCards instead of riders

01:00:37.550 --> 01:00:40.460
to make our system more efficient
and more in line with what

01:00:40.460 --> 01:00:42.140
happens in the real world.

01:00:42.140 --> 01:00:45.220
We'll see you all in just a few.

01:00:45.220 --> 01:00:46.940
And we're back.

01:00:46.940 --> 01:00:49.750
So we saw last time, this
challenge of designing

01:00:49.750 --> 01:00:53.080
a system for the Massachusetts
Bay Transportation Authority

01:00:53.080 --> 01:00:55.540
to represent riders
and stations-- people

01:00:55.540 --> 01:00:58.480
who get on a subway to go around Boston.

01:00:58.480 --> 01:01:03.460
But as we also saw, we
learned that the MBTA doesn't

01:01:03.460 --> 01:01:09.100
keep track of riders themselves, they
keep track of CharlieCards, this card

01:01:09.100 --> 01:01:11.650
that a rider might
carry in their pocket.

01:01:11.650 --> 01:01:17.230
And they can scan this card at a station
to enter or even to exit in some cases,

01:01:17.230 --> 01:01:21.650
to make sure that their fare is
counted as they enter that station.

01:01:21.650 --> 01:01:27.520
So let's think now how to improve our
schema to represent not just riders,

01:01:27.520 --> 01:01:31.540
like me, But CharlieCards
themselves people might carry around

01:01:31.540 --> 01:01:34.120
when they enter the subway station.

01:01:34.120 --> 01:01:37.120
Well, we saw before we
had riders and cards.

01:01:37.120 --> 01:01:43.030
But our goal now is to remove riders
from the picture, focus only on cards.

01:01:43.030 --> 01:01:47.470
Well cards, as we've seen, might
make us swipe at a station.

01:01:47.470 --> 01:01:51.790
If I enter Harvard station,
I might swipe my CharlieCard

01:01:51.790 --> 01:01:54.380
to enter that station.

01:01:54.380 --> 01:01:59.270
And we could see that a card would
have maybe many swipes involved.

01:01:59.270 --> 01:02:04.520
Like if I swipe at Harvard, I might also
swipe it MIT or swipe it Park Street,

01:02:04.520 --> 01:02:05.810
and so on.

01:02:05.810 --> 01:02:10.410
We could see that a swipe can only
happen at a single station at a time,

01:02:10.410 --> 01:02:10.910
though.

01:02:10.910 --> 01:02:14.990
Like if I swipe at Park Street, I'm
only swiping at Park Street here,

01:02:14.990 --> 01:02:20.050
and similarly, a swipe might
only involve one, single card.

01:02:20.050 --> 01:02:23.040
So if we think about these
entities and how they relate,

01:02:23.040 --> 01:02:27.810
we could also think about what kinds of
columns you have inside of each entity.

01:02:27.810 --> 01:02:30.730
In this case, I would argue we
have something a bit like this.

01:02:30.730 --> 01:02:36.150
We could say that a card has a ID
for its primary key, in this case.

01:02:36.150 --> 01:02:42.360
Similarly, this card makes a swipe,
and this swipe has, itself, an ID,

01:02:42.360 --> 01:02:46.260
as well as a type, some time
that that swipe happened,

01:02:46.260 --> 01:02:49.480
and an amount or a
transaction that is involved.

01:02:49.480 --> 01:02:54.090
So for example, let's say I swipe
in at the Harvard Square Station.

01:02:54.090 --> 01:02:59.880
That type of swipe might be to enter
the station at some certain time.

01:02:59.880 --> 01:03:03.510
Now associated with that
swipe is also some amount

01:03:03.510 --> 01:03:07.320
in dollars that happened to
be subtracted from my card.

01:03:07.320 --> 01:03:10.620
Like let's say the fare
nowadays is $2.40, well,

01:03:10.620 --> 01:03:15.830
that amount is subtracted
from my card from this swipe.

01:03:15.830 --> 01:03:20.270
Now, of course, I do all of this at
a station, which has our same columns

01:03:20.270 --> 01:03:23.630
from before, ID, name, and line.

01:03:23.630 --> 01:03:28.130
So a similar idea now, but we're
replacing riders with cards

01:03:28.130 --> 01:03:31.190
and adding more information
to these visits.

01:03:31.190 --> 01:03:34.940
There are instead swipes that could
be maybe entering the station,

01:03:34.940 --> 01:03:39.240
exiting the station or just adding
funds to my balance, for instance.

01:03:39.240 --> 01:03:41.690
So let's see how we could
improve our schema now

01:03:41.690 --> 01:03:47.720
using some new SQL keywords to alter
our tables and add some new ones, too.

01:03:47.720 --> 01:03:49.790
I'll go back to my computer.

01:03:49.790 --> 01:03:54.560
And let's see how we could try to
alter the table we've already created.

01:03:54.560 --> 01:03:58.910
We already have a riders table, a
visits table and a stations table,

01:03:58.910 --> 01:04:02.450
but we could learn some new statements,
some new queries to actually

01:04:02.450 --> 01:04:06.030
update and alter these tables as well.

01:04:06.030 --> 01:04:08.100
The first one as we
saw a little bit before

01:04:08.100 --> 01:04:11.270
is this one called drop table,
arguably the most dramatic thing

01:04:11.270 --> 01:04:14.100
you can do to table, is just
drop it, delete it like this.

01:04:14.100 --> 01:04:18.080
So let's try just deleting the
riders table from our database.

01:04:18.080 --> 01:04:21.140
I'll go back to my--

01:04:21.140 --> 01:04:24.050
go back to my database here.

01:04:24.050 --> 01:04:29.370
I'll type sqlite mbta.db to
pull up this database again.

01:04:29.370 --> 01:04:34.160
And now if I type dot schema, I'll see
I have a riders, stations, and visits

01:04:34.160 --> 01:04:34.790
table.

01:04:34.790 --> 01:04:38.510
But no longer-- I don't want
to have the riders table.

01:04:38.510 --> 01:04:39.990
I want to remove that.

01:04:39.990 --> 01:04:45.200
So I'll say, drop table
riders, semicolon.

01:04:45.200 --> 01:04:46.880
Now I'll hit enter.

01:04:46.880 --> 01:04:48.650
No more riders table.

01:04:48.650 --> 01:04:52.830
If I type dot schema,
that's gone for my database.

01:04:52.830 --> 01:04:54.990
Well, what could I do now?

01:04:54.990 --> 01:04:58.740
I've dropped the table, but
I'd still need to maybe update

01:04:58.740 --> 01:05:00.660
visits to instead be swipes.

01:05:00.660 --> 01:05:02.850
I could probably leave stations as is.

01:05:02.850 --> 01:05:06.360
But I want to update a
table or alter its schema,

01:05:06.360 --> 01:05:09.370
I can use some new commands as well.

01:05:09.370 --> 01:05:12.960
I'll use this one called alter table.

01:05:12.960 --> 01:05:16.300
Alter table looks a bit like this.

01:05:16.300 --> 01:05:22.320
I can use the alter table statement
here and give it some table name.

01:05:22.320 --> 01:05:27.990
Like let's say alter table visits,
and then inside this visits table,

01:05:27.990 --> 01:05:30.370
I get to ask, what do I want to do?

01:05:30.370 --> 01:05:33.030
Do I want to, let say, rename the table?

01:05:33.030 --> 01:05:34.240
I could do that.

01:05:34.240 --> 01:05:37.890
I could also decide to add
a column, rename a column,

01:05:37.890 --> 01:05:40.440
or drop a column altogether.

01:05:40.440 --> 01:05:43.530
Let's try first looking at rename to.

01:05:43.530 --> 01:05:47.790
I want to rename this visits
table to a swipes table,

01:05:47.790 --> 01:05:51.510
representing not just a visit to
the station, but a swipe of a card.

01:05:51.510 --> 01:05:53.340
So let's try this one here.

01:05:53.340 --> 01:05:57.240
I'll go back to my computer
and I'll go back to SQLite.

01:05:57.240 --> 01:06:01.740
And I'll say, I no longer want
visits to be called visits.

01:06:01.740 --> 01:06:06.330
I ideally want visits to
actually be called swipes.

01:06:06.330 --> 01:06:07.590
So let me try this.

01:06:07.590 --> 01:06:13.500
I'll say alter table,
as we saw before, visits

01:06:13.500 --> 01:06:18.180
rename to swipes, like this, semicolon.

01:06:18.180 --> 01:06:19.770
Now I'll hit enter.

01:06:19.770 --> 01:06:24.212
And now if I type dot
schema, again, I see--

01:06:24.212 --> 01:06:25.170
oops, kind of strange--

01:06:25.170 --> 01:06:28.500
I'll hit dot schema
again, I'll see swipes,

01:06:28.500 --> 01:06:32.730
no longer called visits,
but now called swipes.

01:06:32.730 --> 01:06:36.390
What we saw-- I'd ideally like
to add a type of the swipe,

01:06:36.390 --> 01:06:40.200
maybe I'm entering the
station, maybe I'm exiting.

01:06:40.200 --> 01:06:43.770
Maybe I'm just adding funds
or depositing some funds.

01:06:43.770 --> 01:06:47.940
So let me try adding, let's
say, a new column to swipes.

01:06:47.940 --> 01:06:51.630
I'll say alter table, add column--

01:06:51.630 --> 01:06:57.910
alter table, swipes, I'm going to
name that table, swipes, like this.

01:06:57.910 --> 01:07:00.840
And then let's add a column--

01:07:00.840 --> 01:07:03.510
add a column called type.

01:07:03.510 --> 01:07:06.600
And this will have the
type affinity text.

01:07:06.600 --> 01:07:08.580
I'll hit semicolon, enter.

01:07:08.580 --> 01:07:12.060
And now if I type dot schema, I'll see--

01:07:12.060 --> 01:07:16.820
well, I have a column called a ttpe.

01:07:16.820 --> 01:07:18.270
So clearly, I made a typo here.

01:07:18.270 --> 01:07:20.570
I had rider ID, station ID.

01:07:20.570 --> 01:07:26.120
This new column down below called ttpe,
I kind of want to fix this, right?

01:07:26.120 --> 01:07:28.430
I don't want ttpe, I want type.

01:07:28.430 --> 01:07:31.760
So probably good chance
to use my rename column.

01:07:31.760 --> 01:07:34.250
I'll come back here and I'll try that.

01:07:34.250 --> 01:07:37.610
I'll instead do alter table--

01:07:37.610 --> 01:07:43.160
alter table, swipes, and I
will rename a column now.

01:07:43.160 --> 01:07:51.170
I'll say rename the column, ttpe
to type, spelled correctly now.

01:07:51.170 --> 01:07:55.220
Hit semicolon, clear my
terminal, type dot schema.

01:07:55.220 --> 01:08:01.070
And now I see over here that type
has been added to my table of swipes.

01:08:01.070 --> 01:08:06.170
I see rider ID, station ID, and
now a new column called type.

01:08:06.170 --> 01:08:11.630
So through alter table, we go ahead
and add new columns, rename them even,

01:08:11.630 --> 01:08:15.650
or if I wanted to just
drop the column altogether.

01:08:15.650 --> 01:08:19.160
Let's say I add this column
type and I change my mind,

01:08:19.160 --> 01:08:20.580
I don't want it here anymore.

01:08:20.580 --> 01:08:24.260
I go back to my computer and I could
try dropping a particular column.

01:08:24.260 --> 01:08:28.850
Let me try in this case,
alter table, swipes.

01:08:28.850 --> 01:08:33.859
And now let me drop
column type, semicolon.

01:08:33.859 --> 01:08:34.640
Hit enter.

01:08:34.640 --> 01:08:40.529
And now if I type dot schema,
I'm back to where I began.

01:08:40.529 --> 01:08:45.330
So these are the commands we can use
to alter our tables, to improve them

01:08:45.330 --> 01:08:48.450
if you make a mistake during
our first create table command,

01:08:48.450 --> 01:08:52.930
or if you want to add more
functionality later down the line.

01:08:52.930 --> 01:08:58.050
So ideally, I could keep using alter
table, add table, create table, drop

01:08:58.050 --> 01:08:58.930
table, and so on.

01:08:58.930 --> 01:09:01.290
But what I want to do here
is just start from scratch.

01:09:01.290 --> 01:09:04.029
Like I have stations
and swipes and so on.

01:09:04.029 --> 01:09:07.899
Why don't I just go back to my
schema file and use that instead?

01:09:07.899 --> 01:09:11.910
So what I'll do is I'll drop
table for stations and--

01:09:11.910 --> 01:09:12.779
oops, semicolon.

01:09:12.779 --> 01:09:17.490
I will drop table for
swipes now, semicolon.

01:09:17.490 --> 01:09:22.439
I'll type dot schema, and
now I'll see nothing in here.

01:09:22.439 --> 01:09:26.760
I'll quit and I'll type code schema.sql.

01:09:26.760 --> 01:09:29.970
Let me just start from
scratch using this schema.sql.

01:09:29.970 --> 01:09:35.790
So we no longer want to have
riders, we only want to have cards.

01:09:35.790 --> 01:09:38.670
So I could just rename this table here.

01:09:38.670 --> 01:09:39.930
I'll call it cards.

01:09:39.930 --> 01:09:41.609
Create table, cards.

01:09:41.609 --> 01:09:47.520
Now cards don't have a name, they
only have some unique ID in this case.

01:09:47.520 --> 01:09:50.460
Now I'll leave stations just as it is.

01:09:50.460 --> 01:09:51.870
This seems perfectly fine to me.

01:09:51.870 --> 01:09:55.800
Stations have an ID, a name and
a line with these constraints

01:09:55.800 --> 01:09:58.690
applied to the name and line columns.

01:09:58.690 --> 01:10:00.720
But now the visits table.

01:10:00.720 --> 01:10:04.500
Well, the visits table is no longer
a visit to the station per se,

01:10:04.500 --> 01:10:07.570
it's more a swipe of the
card at that station.

01:10:07.570 --> 01:10:10.950
So let's now say visits becomes swipes.

01:10:10.950 --> 01:10:15.370
And among these new columns
to add are the following.

01:10:15.370 --> 01:10:18.690
I want to have not just
an ID for each swipe,

01:10:18.690 --> 01:10:22.230
not an ID for rider ID
and station ID and so on,

01:10:22.230 --> 01:10:26.310
but also I want to have a type of swipe.

01:10:26.310 --> 01:10:30.870
Am I entering the station,
exiting, or depositing some funds?

01:10:30.870 --> 01:10:34.980
So I'll say the type of this
is going to be a new column,

01:10:34.980 --> 01:10:40.200
and the data type, this whole
will be text, enter, exit,

01:10:40.200 --> 01:10:42.900
or deposit for funds.

01:10:42.900 --> 01:10:45.270
Now let me try another column, two.

01:10:45.270 --> 01:10:47.700
I'll include a date time.

01:10:47.700 --> 01:10:49.260
A date time is like a timestamp.

01:10:49.260 --> 01:10:51.810
What time did this
swipe actually happen?

01:10:51.810 --> 01:10:54.570
And I'll make this type numeric.

01:10:54.570 --> 01:11:00.650
Numeric can store all kinds of dates
and times for me in this table.

01:11:00.650 --> 01:11:02.950
Now let me add one final column.

01:11:02.950 --> 01:11:04.990
This one will be an amount.

01:11:04.990 --> 01:11:09.910
I'll also use numeric for
this kind of column here.

01:11:09.910 --> 01:11:13.150
And I'll say that this
column called amount

01:11:13.150 --> 01:11:18.500
can store, in this case integers or
real numbers, like floats and so on.

01:11:18.500 --> 01:11:24.310
I'll probably decide on that when I
actually add some data to this table.

01:11:24.310 --> 01:11:28.650
So here we've updated our schema to
represent that diagram we saw before.

01:11:28.650 --> 01:11:31.380
I have cards, I have
stations, and I have

01:11:31.380 --> 01:11:36.240
swipes that have some type associated
with them, enter, exit, deposit,

01:11:36.240 --> 01:11:39.750
some date, time, and some
amount that was charged to me

01:11:39.750 --> 01:11:41.340
while I made this swipe.

01:11:41.340 --> 01:11:44.070
Either I added some funds, in
which case amount is positive,

01:11:44.070 --> 01:11:50.000
or I subtracted some funds, entering
exiting, in this case from the station.

01:11:50.000 --> 01:11:50.990
All right.

01:11:50.990 --> 01:11:53.390
So I have these tables now.

01:11:53.390 --> 01:11:56.750
And now I want to probably
apply some of those same column

01:11:56.750 --> 01:11:58.490
constraints we saw before.

01:11:58.490 --> 01:12:02.840
Like here, it's fine, but I also want
to make sure I'm not adding some data

01:12:02.840 --> 01:12:05.280
that I don't want to add to this table.

01:12:05.280 --> 01:12:09.230
So I could go back to my old
friends, these column constraints.

01:12:09.230 --> 01:12:14.450
And we saw before, we had default
and not null, unique and check.

01:12:14.450 --> 01:12:20.400
We've used not null and unique,
but we haven't check or default.

01:12:20.400 --> 01:12:22.970
So let's start using more
than just not null and unique

01:12:22.970 --> 01:12:27.720
and also focus on check and default,
what they can do for us here.

01:12:27.720 --> 01:12:29.670
I'll go back to my schema.

01:12:29.670 --> 01:12:32.970
And let me just make sure that
I'm making all the columns that I

01:12:32.970 --> 01:12:35.610
want to be required actually required.

01:12:35.610 --> 01:12:39.990
I'll go into my swipes table
and I'll say that type--

01:12:39.990 --> 01:12:41.700
I mean, this should be required.

01:12:41.700 --> 01:12:46.380
I should know whether I entered,
exited, or deposited some funds.

01:12:46.380 --> 01:12:48.750
So I'll say text not null.

01:12:48.750 --> 01:12:52.590
Similarly, for the timestamp,
the time this swipe happened,

01:12:52.590 --> 01:12:54.970
I want that to be not null as well.

01:12:54.970 --> 01:12:57.660
I must know what time I swiped the card.

01:12:57.660 --> 01:13:00.270
And also, it makes sense
for there to always be

01:13:00.270 --> 01:13:02.820
some amount associated with this swipe.

01:13:02.820 --> 01:13:07.450
Either I added some funds to my balance
or I removed some funds overall.

01:13:07.450 --> 01:13:11.260
So I'll make this not null as well.

01:13:11.260 --> 01:13:14.910
Well, let's see what we
could do here with default.

01:13:14.910 --> 01:13:18.900
Default gives me some default
value, some initial value

01:13:18.900 --> 01:13:25.320
to add to this column if I don't
specify what kind of value to add,

01:13:25.320 --> 01:13:27.840
what be good for date time.

01:13:27.840 --> 01:13:30.390
Date time here is, again, the timestamp.

01:13:30.390 --> 01:13:35.080
The time I swiped this card to
enter, let's say, Harvard station.

01:13:35.080 --> 01:13:40.210
Well, if I want this to
always have the current time,

01:13:40.210 --> 01:13:43.690
I could use default. I
could say the default

01:13:43.690 --> 01:13:48.220
value for this column is
this special value here,

01:13:48.220 --> 01:13:52.160
CURRENT TIMESTAMP, in all caps.

01:13:52.160 --> 01:13:57.470
CURRENT TIMESTAMP will show me the
year, the month, the day, the hour,

01:13:57.470 --> 01:14:02.600
the minute, the second, all in one
value, and insert that into my table.

01:14:02.600 --> 01:14:08.330
So as soon as I add new row, if I don't
supply a time for this datetime column,

01:14:08.330 --> 01:14:15.860
I'll instead get back the current time
exactly as it is represented in SQLite.

01:14:15.860 --> 01:14:17.760
Now what can I do further than this?

01:14:17.760 --> 01:14:21.750
I could also try to add
a-- my very own check.

01:14:21.750 --> 01:14:26.780
Maybe I want to make sure that the
amounts here are never equal to zero.

01:14:26.780 --> 01:14:31.220
Like, nobody should ever be able to
make a $0 transaction or $0 swipe.

01:14:31.220 --> 01:14:33.860
They're always being charged
some money or they're always

01:14:33.860 --> 01:14:36.050
depositing some money, in this case.

01:14:36.050 --> 01:14:41.540
So I could say, amount here has
my very own check on this column.

01:14:41.540 --> 01:14:45.980
And inside check, I can actually write
my very own expression to check for.

01:14:45.980 --> 01:14:51.170
I could say, for example,
amount is not equal to zero.

01:14:51.170 --> 01:14:54.500
Using those same operators
we saw back in week zero,

01:14:54.500 --> 01:15:01.810
this will ensure that any value inside
amount will not be equal to zero.

01:15:01.810 --> 01:15:03.760
Let's try also for type.

01:15:03.760 --> 01:15:06.730
I mean, type can only have a few values.

01:15:06.730 --> 01:15:11.170
We saw enter, exit, deposit some funds.

01:15:11.170 --> 01:15:16.060
I could make sure that only those values
are included inside my type column.

01:15:16.060 --> 01:15:20.500
I could say check that type
is in some list of values.

01:15:20.500 --> 01:15:23.560
Going back to week one
here, we talk about in.

01:15:23.560 --> 01:15:31.460
I could say, maybe the type is
in enter exit or is in deposit.

01:15:31.460 --> 01:15:34.570
So now when I have this
table called swipes,

01:15:34.570 --> 01:15:38.740
I'm representing what I'm actually
doing when I go to Harvard station.

01:15:38.740 --> 01:15:42.633
I have here a visit for
myself, my very own ID

01:15:42.633 --> 01:15:44.050
here, which will update in second.

01:15:44.050 --> 01:15:47.290
I have a station ID where
I'm actually going to visit.

01:15:47.290 --> 01:15:52.270
I have a type that I'm going to
use to enter, exit, or deposit

01:15:52.270 --> 01:15:54.100
some funds at this station.

01:15:54.100 --> 01:15:56.710
I'm doing it at a certain
time, and I have an amount

01:15:56.710 --> 01:16:00.200
associated with this transaction.

01:16:00.200 --> 01:16:02.500
Now, there's one thing
to fix here, which

01:16:02.500 --> 01:16:07.040
is that we're still talking about
riders inside our swipes table.

01:16:07.040 --> 01:16:08.360
So let's fix that here, too.

01:16:08.360 --> 01:16:12.460
I'll go back to my computer,
and let's try fixing this.

01:16:12.460 --> 01:16:19.120
I have a rider ID inside of my swipes
table, but no longer do I have riders,

01:16:19.120 --> 01:16:20.680
I have cards.

01:16:20.680 --> 01:16:25.780
So let me say that this is now a card
ID, and down below in my foreign key,

01:16:25.780 --> 01:16:32.650
I'll say that this card ID column
references the ID column in cards.

01:16:32.650 --> 01:16:35.140
And I think this should
represent everything

01:16:35.140 --> 01:16:39.800
I want to represent about
swipes at the station.

01:16:39.800 --> 01:16:44.690
So let me ask now what questions
we have about this new schema

01:16:44.690 --> 01:16:47.270
and the constraints we've applied.

01:16:47.270 --> 01:16:50.210
AUDIENCE: I wonder, how
could you delete or drop

01:16:50.210 --> 01:16:56.510
the table of riders when
you use ID as a foreign key?

01:16:56.510 --> 01:17:00.128
I try to do that, but I got an error.

01:17:00.128 --> 01:17:03.170
CARTER ZENKE: Yeah, so you're getting
into some more advanced stuff here.

01:17:03.170 --> 01:17:06.500
And suffice to say for now,
my foreign key constraints

01:17:06.500 --> 01:17:09.830
aren't actually being checked
right now, but yours might be.

01:17:09.830 --> 01:17:14.090
If you try to drop a table that
actually has some data that's

01:17:14.090 --> 01:17:17.900
referenced by a foreign key, SQLite
will warn you, perhaps to tell you,

01:17:17.900 --> 01:17:21.840
you can't do that because this ID is
referenced from this table over there.

01:17:21.840 --> 01:17:26.780
So in that case, best to delete
the value that has that foreign key

01:17:26.780 --> 01:17:30.470
and then proceed with dropping
that table altogether.

01:17:30.470 --> 01:17:31.910
Good question there.

01:17:31.910 --> 01:17:34.630
Yeah, let's take one more.

01:17:34.630 --> 01:17:43.570
AUDIENCE: How might this syntax send
the other SQL languages, same as SQLite

01:17:43.570 --> 01:17:46.580
languages, how much is syntax same?

01:17:46.580 --> 01:17:48.080
CARTER ZENKE: Yeah, a good question.

01:17:48.080 --> 01:17:52.000
So here we're using the SQLite
database management system.

01:17:52.000 --> 01:17:55.060
It is similar to but
distinct in some ways

01:17:55.060 --> 01:17:57.820
from others like MySQL or PostgreSQL.

01:17:57.820 --> 01:18:00.220
I would say that most of
what you're doing here

01:18:00.220 --> 01:18:04.822
could also be ported to MySQL and
PostgreSQL with a few exceptions

01:18:04.822 --> 01:18:07.030
that you might need to treat
on a case by case basis.

01:18:07.030 --> 01:18:10.220
In fact, the developers
of SQLite built things

01:18:10.220 --> 01:18:14.800
so that it would be easy to port their
database schemas to another schema

01:18:14.800 --> 01:18:18.140
like MySQL or PostgreSQL as well.

01:18:18.140 --> 01:18:21.500
But good question there.

01:18:21.500 --> 01:18:23.960
All right, let's take
one more question here.

01:18:23.960 --> 01:18:30.260
AUDIENCE: Imagine if we put in the
ID, we don't put any data type,

01:18:30.260 --> 01:18:32.090
it's going to give us--

01:18:32.090 --> 01:18:37.032
in this schema, it's going
to give us a text or--

01:18:37.032 --> 01:18:38.490
CARTER ZENKE: Yeah, great question.

01:18:38.490 --> 01:18:42.980
So you're asking if I didn't tell
SQLite what kind of type affinity

01:18:42.980 --> 01:18:46.220
a column had, what type of
affinity would it actually have?

01:18:46.220 --> 01:18:47.360
A great question.

01:18:47.360 --> 01:18:52.460
In this case, by default, SQLite
gives the numeric type affinity,

01:18:52.460 --> 01:18:56.250
where numeric can store
integers or real values.

01:18:56.250 --> 01:18:59.180
But if you give it like a string
of text, like let's say red line,

01:18:59.180 --> 01:19:04.980
it will store that for you, too,
kind of non-intuitively, but it will.

01:19:04.980 --> 01:19:08.310
But the default type affinity
is numeric in this case,

01:19:08.310 --> 01:19:10.340
if you don't otherwise specify.

01:19:10.340 --> 01:19:11.900
Good question.

01:19:11.900 --> 01:19:17.410
OK, let's come back then and let's focus
on wrapping up on a final few pieces

01:19:17.410 --> 01:19:17.910
here.

01:19:17.910 --> 01:19:20.810
So we've seen some
table constraints, which

01:19:20.810 --> 01:19:23.360
we apply primary keys and foreign keys.

01:19:23.360 --> 01:19:26.360
We saw column constraints,
where we could make sure

01:19:26.360 --> 01:19:28.970
that certain values were
given to us through not null.

01:19:28.970 --> 01:19:31.730
We could also make sure that the--

01:19:31.730 --> 01:19:33.925
let's say the value is
in some list of values,

01:19:33.925 --> 01:19:37.460
they're using check or making
sure that it's not some value,

01:19:37.460 --> 01:19:39.320
also using check as well.

01:19:39.320 --> 01:19:44.930
Default allows us to specify a given
value for every new insertion of rows

01:19:44.930 --> 01:19:45.560
here.

01:19:45.560 --> 01:19:49.550
And this is actually
pretty important to have.

01:19:49.550 --> 01:19:53.750
Not just schemas that have
column names and let's

01:19:53.750 --> 01:19:57.320
say type affinities as
well, but also constraints,

01:19:57.320 --> 01:20:00.210
make sure the data we
insert works well for us.

01:20:00.210 --> 01:20:02.960
And there's actually a
story behind this person

01:20:02.960 --> 01:20:07.820
who's on the CharlieCard, this
person who's on the subway.

01:20:07.820 --> 01:20:11.420
His name is Charlie, and he's
perhaps the most famous subway

01:20:11.420 --> 01:20:13.880
rider in all of Boston.

01:20:13.880 --> 01:20:17.900
Back in the 19-- let's
say 1900s, the band

01:20:17.900 --> 01:20:21.290
called The Kingston Trio wrote a
song about this man named Charlie.

01:20:21.290 --> 01:20:26.030
Charlie, supposedly, got on at the
Kendall Square station where MIT is,

01:20:26.030 --> 01:20:27.950
and he made for Jamaica Plain.

01:20:27.950 --> 01:20:30.320
But once he got to Jamaica
Plane, the conductor

01:20:30.320 --> 01:20:34.040
asked him for one more nickel, and
well, he didn't have that nickel.

01:20:34.040 --> 01:20:36.860
So he got stuck on the train
for years, and years, and years.

01:20:36.860 --> 01:20:38.750
He couldn't get off of the subway.

01:20:38.750 --> 01:20:42.532
So keep in mind, Charlie, when you're
writing your own database constraints,

01:20:42.532 --> 01:20:44.240
making sure that if
you get on the train,

01:20:44.240 --> 01:20:46.610
you're able to get off of
it at some point, right?

01:20:46.610 --> 01:20:49.080
Don't end up like Charlie in this case.

01:20:49.080 --> 01:20:53.720
So with this in mind, you're able
to design your very own database

01:20:53.720 --> 01:20:56.840
schemas that keep not just
certain columns involved,

01:20:56.840 --> 01:20:59.210
but also type affinities
for those columns, types--

01:20:59.210 --> 01:21:01.790
those data types-- types
the columns can store.

01:21:01.790 --> 01:21:04.940
You're also able to apply constraints
to those columns to make sure

01:21:04.940 --> 01:21:07.130
that the data you're
inserting is that you actually

01:21:07.130 --> 01:21:09.380
want to have in that column.

01:21:09.380 --> 01:21:13.100
Next time, we'll focus on actually
adding data to our columns,

01:21:13.100 --> 01:21:16.260
actually write data to a
database file to insert,

01:21:16.260 --> 01:21:18.720
update, and delete that data altogether.

01:21:18.720 --> 01:21:22.480
So with that in mind,
we'll see you next time.