WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:00.998
[CROWD MURMURING]

00:00:00.998 --> 00:00:03.992
[MUSIC PLAYING]

00:00:24.980 --> 00:00:27.710
DAVID MALAN: All right,
this is CS50's Introduction

00:00:27.710 --> 00:00:29.030
to Programming with Python.

00:00:29.030 --> 00:00:33.500
My name is David Malan, and this is
our week on File I/O, Input and Output

00:00:33.500 --> 00:00:34.100
of files.

00:00:34.100 --> 00:00:37.020
So up until now, most every
program we've written just

00:00:37.020 --> 00:00:39.800
stores all the information
that it collects in memory--

00:00:39.800 --> 00:00:43.910
that is, in variables or inside of the
program itself, a downside of which

00:00:43.910 --> 00:00:46.520
is that, as soon as the program
exits, anything you typed in,

00:00:46.520 --> 00:00:49.220
anything that you did
with that program is lost.

00:00:49.220 --> 00:00:53.240
Now, with files, of course, on your Mac
or PC, you can hang on to information

00:00:53.240 --> 00:00:53.960
long term.

00:00:53.960 --> 00:00:56.180
And File I/O within the
context of programming

00:00:56.180 --> 00:01:00.170
is all about writing code that can
read from, that is load information

00:01:00.170 --> 00:01:04.709
from, or write to, that is save
information to, files themselves.

00:01:04.709 --> 00:01:06.980
So let's see if we can't
transition then from only

00:01:06.980 --> 00:01:10.130
using memory and variables and
the like to actually writing

00:01:10.130 --> 00:01:14.150
code that saves some files for us
and, therefore, data persistently.

00:01:14.150 --> 00:01:18.050
Well, to do this, let me propose that
we first consider a familiar data

00:01:18.050 --> 00:01:21.830
structure, a familiar type of variable
that we've seen before, that of a list.

00:01:21.830 --> 00:01:24.890
And using lists, we've been able
to store more than one piece

00:01:24.890 --> 00:01:26.180
of information in the past.

00:01:26.180 --> 00:01:28.620
Using one variable, we
typically store one value.

00:01:28.620 --> 00:01:31.950
But if that variable is a list,
we can store multiple values.

00:01:31.950 --> 00:01:34.890
Unfortunately, lists are stored
in the computer's memory.

00:01:34.890 --> 00:01:38.390
And so once your program exits, even
the contents of those disappear.

00:01:38.390 --> 00:01:40.920
But let's at least give
ourselves a starting point.

00:01:40.920 --> 00:01:42.440
So I'm over here in VS Code.

00:01:42.440 --> 00:01:45.020
And I'm going to go ahead and
create a simple program using

00:01:45.020 --> 00:01:49.790
code of names.py, a program that
just collects people's names,

00:01:49.790 --> 00:01:51.230
students' names, if you will.

00:01:51.230 --> 00:01:53.330
And I'm going to do it
super simply initially

00:01:53.330 --> 00:01:56.390
in a manner consistent with what we've
done in the past to get user input

00:01:56.390 --> 00:01:57.560
and print it back out.

00:01:57.560 --> 00:02:01.910
I'm going to say something like this,
name equals input, quote/unquote,

00:02:01.910 --> 00:02:03.170
what's your name?

00:02:03.170 --> 00:02:06.350
Thereby storing in a
variable called name

00:02:06.350 --> 00:02:08.690
the return value of input, as always.

00:02:08.690 --> 00:02:11.060
And as always, I'm going
to go ahead and very simply

00:02:11.060 --> 00:02:14.090
print out a nice f string
that says, hello, comma,

00:02:14.090 --> 00:02:17.720
and then, in curly braces, name to
print out Hello, David, hello, world,

00:02:17.720 --> 00:02:20.060
whoever happens to be using the program.
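
NOTE A minimal sketch of the first version of names.py as described so far.
The helper name greeting is hypothetical, introduced only so the example
runs without waiting on the keyboard; the lecture calls input and print directly.
```python
def greeting(name):
    # Build the same f-string the lecture prints.
    return f"hello, {name}"
# In names.py itself, the name would come from the keyboard:
#     name = input("What's your name? ")
#     print(greeting(name))
print(greeting("David"))
```
Typing David at the prompt in the interactive version should give the same line.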

00:02:20.060 --> 00:02:23.060
Let me go ahead and run this just to
remind myself what I should expect.

00:02:23.060 --> 00:02:26.750
And if I run python of names.py and
hit Enter, type in my name like David,

00:02:26.750 --> 00:02:29.520
of course, I now see
Hello, comma, David.

00:02:29.520 --> 00:02:32.720
Suppose, though, that we wanted to
add support not just for one name,

00:02:32.720 --> 00:02:35.870
but multiple names-- maybe three
names for the sake of discussion

00:02:35.870 --> 00:02:39.740
so that we can begin to accumulate
some amount of information

00:02:39.740 --> 00:02:42.080
in the program, such
that it's really going

00:02:42.080 --> 00:02:46.190
to be a downside if we keep throwing
it away once the program exits.

00:02:46.190 --> 00:02:49.430
Well, let me go back into
names.py up here at top.

00:02:49.430 --> 00:02:52.820
Let me proactively give myself a
variable, this time called names,

00:02:52.820 --> 00:02:53.510
plural.

00:02:53.510 --> 00:02:55.570
And set it equal to an empty list.

00:02:55.570 --> 00:02:58.820
Recall that the square bracket notation,
especially if nothing's inside of it,

00:02:58.820 --> 00:03:03.140
just means, give me an empty list
that we can add things to over time.

00:03:03.140 --> 00:03:04.790
Well, what do we want to add to it?

00:03:04.790 --> 00:03:07.130
Well, let's add three
names, each from the user.

00:03:07.130 --> 00:03:11.930
And let me say something like
this, for underscore in range of 3,

00:03:11.930 --> 00:03:16.160
let me go ahead and prompt the
user with the input function

00:03:16.160 --> 00:03:18.050
and getting their name in this variable.

00:03:18.050 --> 00:03:25.400
And then using list syntax, I can
say, names.append name to that list.

00:03:25.400 --> 00:03:28.370
And now I have, in that
list, that given name--

00:03:28.370 --> 00:03:30.200
1, 2, 3 of them.

00:03:30.200 --> 00:03:32.780
Another point to note is, I
could use a variable here,

00:03:32.780 --> 00:03:34.280
like i, which is conventional.

00:03:34.280 --> 00:03:37.640
But if I'm not actually using i
explicitly on any subsequent lines,

00:03:37.640 --> 00:03:40.730
I might as well just use underscore,
which is a Pythonic convention.

00:03:40.730 --> 00:03:43.790
And actually, if I want to clean
this up a little bit right now,

00:03:43.790 --> 00:03:46.610
notice that my name
variable doesn't really

00:03:46.610 --> 00:03:48.830
need to exist because
I'm assigning it a value

00:03:48.830 --> 00:03:50.360
and then immediately appending it.

00:03:50.360 --> 00:03:54.440
Well, I could tighten this up further
by just getting rid of that variable

00:03:54.440 --> 00:03:59.300
altogether and just appending
immediately the return value of input.
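
NOTE A sketch of the loop just described, appending the return value of
input directly. The function collect_names and its ask parameter are
assumptions added so the example can be run with simulated answers;
by default ask is the built-in input, as in the lecture.
```python
def collect_names(count, ask=input):
    # Start with an empty list and add one name per iteration.
    names = []
    for _ in range(count):
        # The loop variable is unused, hence the underscore.
        names.append(ask("What's your name? "))
    return names
# Simulate three typed answers instead of waiting on the keyboard.
answers = iter(["Hermione", "Harry", "Ron"])
names = collect_names(3, ask=lambda prompt: next(answers))
print(names)
```
Calling collect_names(3) with no second argument would prompt the user three times.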

00:03:59.300 --> 00:04:01.888
I think we could go both
ways in terms of design here.

00:04:01.888 --> 00:04:04.430
On the one hand, it's a pretty
short line, and it's readable.

00:04:04.430 --> 00:04:06.950
On the other hand, if I were to
eventually change this phrase

00:04:06.950 --> 00:04:08.950
to be not what's your
name but something longer,

00:04:08.950 --> 00:04:11.390
we might want to break it
out again into two lines.

00:04:11.390 --> 00:04:13.310
But for now, I think
it's pretty readable.

00:04:13.310 --> 00:04:17.180
Now later in the program, let's just go
ahead and print out those same names,

00:04:17.180 --> 00:04:20.540
but let's sort them alphabetically
so that it makes sense

00:04:20.540 --> 00:04:24.510
to be gathering them all together,
then sorting them, and printing them.

00:04:24.510 --> 00:04:25.580
So how can I do that?

00:04:25.580 --> 00:04:28.490
Well, in Python, the simplest
way to sort a list in a loop

00:04:28.490 --> 00:04:30.170
is probably to do something like this.

00:04:30.170 --> 00:04:32.780
For name in names--

00:04:32.780 --> 00:04:33.410
but wait.

00:04:33.410 --> 00:04:34.910
Let's sort the names first.

00:04:34.910 --> 00:04:36.920
Recall that there's a
function called sorted

00:04:36.920 --> 00:04:40.050
which will return a sorted
version of that list.

00:04:40.050 --> 00:04:44.960
Now let's go ahead and print out an
f string that says, again, hello,

00:04:44.960 --> 00:04:47.623
bracket, name, close quotes.

00:04:47.623 --> 00:04:49.290
All right, let me go ahead and run this.

00:04:49.290 --> 00:04:52.910
So Python of names.py,
and let me go ahead

00:04:52.910 --> 00:04:54.590
and type in a few names this time.

00:04:54.590 --> 00:04:56.090
How about Hermione?

00:04:56.090 --> 00:04:57.680
How about Harry?

00:04:57.680 --> 00:04:58.940
How about Ron?

00:04:58.940 --> 00:05:02.190
And notice that they're not
quite in alphabetical order.

00:05:02.190 --> 00:05:04.910
But when I hit Enter
and that loop kicks in,

00:05:04.910 --> 00:05:07.520
it's going to print out, hello,
Harry, hello, Hermione, hello,

00:05:07.520 --> 00:05:10.310
Ron, in sorted order.
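
NOTE A sketch of the sorted loop, with the three names hard-coded here
(an assumption) rather than typed in.
```python
names = ["Hermione", "Harry", "Ron"]
# sorted returns a new, alphabetized list; names itself is left unchanged.
for name in sorted(names):
    print(f"hello, {name}")
```
The original list keeps its insertion order, so sorting here affects only what is printed.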

00:05:10.310 --> 00:05:13.730
But of course, now, if I run this
program again, all of the names

00:05:13.730 --> 00:05:14.420
are lost.

00:05:14.420 --> 00:05:16.235
And if this is a bigger
program than this,

00:05:16.235 --> 00:05:18.110
that might actually be
pretty painful to have

00:05:18.110 --> 00:05:21.090
to re-input the same information
again, and again, and again.

00:05:21.090 --> 00:05:23.780
Wouldn't it be nice, like
most any program today

00:05:23.780 --> 00:05:26.240
on a phone, or a laptop,
or desktop, or cloud

00:05:26.240 --> 00:05:30.330
to be able to save this
information somehow instead?

00:05:30.330 --> 00:05:32.360
And that's where File I/O comes in.

00:05:32.360 --> 00:05:33.890
And that's where files come in.

00:05:33.890 --> 00:05:37.910
They are a way of storing information
persistently on your own phone, or Mac,

00:05:37.910 --> 00:05:42.020
or PC, or some cloud server's disk
so that they're there when you

00:05:42.020 --> 00:05:44.010
come back and run the program again.

00:05:44.010 --> 00:05:50.030
So how can we go about saving all three
of these names in a file as opposed

00:05:50.030 --> 00:05:52.627
to having to type them again and again?

00:05:52.627 --> 00:05:54.710
Let me go ahead and simplify
this file and, again,

00:05:54.710 --> 00:05:57.050
give myself just a single
variable called name,

00:05:57.050 --> 00:06:01.890
and set the return value of
input equal to that variable.

00:06:01.890 --> 00:06:04.550
So what's your name, as
before, quote/unquote.

00:06:04.550 --> 00:06:08.540
And now let me go ahead, and let me
do something more with this value.

00:06:08.540 --> 00:06:11.750
Instead of just adding it to a list
or printing it immediately out,

00:06:11.750 --> 00:06:14.030
let's save the value
of the person's name

00:06:14.030 --> 00:06:15.950
that's just been typed in to a file.

00:06:15.950 --> 00:06:17.600
Well, how do we go about doing that?

00:06:17.600 --> 00:06:20.600
Well, in Python, there's this function
called open whose purpose in life

00:06:20.600 --> 00:06:25.320
is to do just that, to open a file,
but to open it up programmatically

00:06:25.320 --> 00:06:28.580
so that you, the programmer, can
actually read information from it

00:06:28.580 --> 00:06:30.440
or write information to it.

00:06:30.440 --> 00:06:33.560
So open is like the programmer's
equivalent of double clicking

00:06:33.560 --> 00:06:35.480
on an icon on your Mac or PC.

00:06:35.480 --> 00:06:37.580
But it's a programmer's
technique because it's

00:06:37.580 --> 00:06:40.070
going to allow you to
specify exactly what you want

00:06:40.070 --> 00:06:42.980
to read from or write to that file.

00:06:42.980 --> 00:06:45.440
Formally, its documentation
is here, and you'll

00:06:45.440 --> 00:06:48.037
see that its usage is
relatively straightforward.

00:06:48.037 --> 00:06:50.870
It minimally just requires the name
of the file that we want to open

00:06:50.870 --> 00:06:53.700
and, optionally, how we want to open it.

00:06:53.700 --> 00:06:57.650
So let me go back to VS Code here,
and let me propose now that I do this.

00:06:57.650 --> 00:07:01.190
I'm going to go ahead and call
this function called open, passing

00:07:01.190 --> 00:07:05.150
in an argument for names.txt, which
is the name of the file I would

00:07:05.150 --> 00:07:07.400
like to store all of these names in.

00:07:07.400 --> 00:07:08.750
I could call it anything I want.

00:07:08.750 --> 00:07:10.670
But because it's going
to be just text, it's

00:07:10.670 --> 00:07:13.280
conventional to call it something.txt.

00:07:13.280 --> 00:07:15.590
But I'm also going to
tell the open function

00:07:15.590 --> 00:07:18.150
that I plan to write to this file.

00:07:18.150 --> 00:07:21.530
So as a second argument to open, I'm
going to put literally, quote/unquote,

00:07:21.530 --> 00:07:25.160
w, for Write, and that's
going to tell open to open

00:07:25.160 --> 00:07:28.070
the file in a way that's going to
allow me to change the content.

00:07:28.070 --> 00:07:29.960
And better yet, if it
doesn't even exist yet,

00:07:29.960 --> 00:07:32.030
it's going to create the file for me.

00:07:32.030 --> 00:07:35.540
Now, open returns what's
called a file handle,

00:07:35.540 --> 00:07:39.020
a special value that allows me
to access that file subsequently.

00:07:39.020 --> 00:07:42.560
So I'm going to go ahead and assign
it to a variable like file.

00:07:42.560 --> 00:07:45.020
And now I'm going to go
ahead and, quite simply,

00:07:45.020 --> 00:07:47.640
write this person's name to that file.

00:07:47.640 --> 00:07:52.790
So I'm going to literally type file,
which is the variable linking to that

00:07:52.790 --> 00:07:57.230
file, .write, which is a function
otherwise known as a method that comes

00:07:57.230 --> 00:08:00.920
with open files that allows me
to write that name to the file.

00:08:00.920 --> 00:08:03.500
And then lastly, I'm
quite simply going

00:08:03.500 --> 00:08:07.310
to go ahead and say, file.close,
which will close and effectively save

00:08:07.310 --> 00:08:08.092
the file.
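
NOTE A sketch of the three lines described here: open, write, close.
Two assumptions for the sake of a runnable example: the name is
hard-coded instead of coming from input, and names.txt lives in a
temporary directory rather than the current folder.
```python
import os
import tempfile
# Assumption: a throwaway directory stands in for the working folder.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
name = "Hermione"  # the lecture gets this from input("What's your name? ")
file = open(path, "w")  # "w" opens for writing and creates the file if needed
file.write(name)  # writes exactly these characters, nothing more
file.close()  # flushes the data and releases the file
```
After these lines run, the file contains the single word Hermione.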

00:08:08.092 --> 00:08:11.300
So these three lines of code here are
essentially the programmer's equivalent

00:08:11.300 --> 00:08:13.820
to double clicking an
icon on your Mac or PC,

00:08:13.820 --> 00:08:16.760
making some changes in Microsoft
Word or some other program,

00:08:16.760 --> 00:08:18.020
and going to File, Save.

00:08:18.020 --> 00:08:21.560
We're doing that all in code
with just these three lines here.

00:08:21.560 --> 00:08:24.210
Well, let's see, now, how this works.

00:08:24.210 --> 00:08:30.440
Let me go ahead now and run
python of names.py and Enter.

00:08:30.440 --> 00:08:31.740
Let's type in a name.

00:08:31.740 --> 00:08:34.789
I'll type in Hermione, Enter.

00:08:34.789 --> 00:08:37.370
All right, where did she end up?

00:08:37.370 --> 00:08:41.630
Well, let me go ahead now
and type code of names.txt,

00:08:41.630 --> 00:08:43.850
which is a file that
happens now to exist

00:08:43.850 --> 00:08:45.950
because I opened it in write mode.

00:08:45.950 --> 00:08:49.700
And if I open this in a tab,
we'll see there is Hermione.

00:08:49.700 --> 00:08:52.520
Well, let's go ahead and
run names.py once more.

00:08:52.520 --> 00:08:57.290
I'm going to go ahead and run python
of names.py, Enter, and this time,

00:08:57.290 --> 00:08:58.760
I'll type in Harry.

00:08:58.760 --> 00:09:00.590
Let me go ahead and
run it one more time.

00:09:00.590 --> 00:09:02.480
And this time, I'll type in Ron.

00:09:02.480 --> 00:09:07.010
And now let me go up to names.txt,
where, hopefully, I'll see all three

00:09:07.010 --> 00:09:08.570
of them here.

00:09:08.570 --> 00:09:09.650
But no.

00:09:09.650 --> 00:09:12.350
I've just actually seen Ron.

00:09:12.350 --> 00:09:16.250
What might explain what
happened to Hermione and Harry,

00:09:16.250 --> 00:09:19.040
even though I'm pretty sure I
ran the program three times,

00:09:19.040 --> 00:09:24.170
and I definitely wrote the code
that writes their name to that file?

00:09:24.170 --> 00:09:26.425
What's going on here, do you think?

00:09:26.425 --> 00:09:28.550
AUDIENCE: I think because
we're not appending them,

00:09:28.550 --> 00:09:30.650
we should append the names.

00:09:30.650 --> 00:09:34.430
Since we are writing directly,
it is erasing the old content,

00:09:34.430 --> 00:09:40.605
and it is replacing with the last
set of characters that we mentioned.

00:09:40.605 --> 00:09:41.480
DAVID MALAN: Exactly.

00:09:41.480 --> 00:09:44.240
Unfortunately, quote/unquote
w is a little dangerous.

00:09:44.240 --> 00:09:46.160
Not only will it create
the file for you,

00:09:46.160 --> 00:09:49.250
it will also recreate the
file for you every time you

00:09:49.250 --> 00:09:50.610
open the file in that mode.

00:09:50.610 --> 00:09:52.940
So if you open the file
once and write Hermione,

00:09:52.940 --> 00:09:54.478
that worked just fine, as we saw.

00:09:54.478 --> 00:09:57.020
But if you do it again for Harry,
if you do it again for Ron,

00:09:57.020 --> 00:09:58.100
the code is working.

00:09:58.100 --> 00:10:02.240
But each time, it's opening the file and
recreating it with brand-new contents,

00:10:02.240 --> 00:10:04.940
so we had one version with Hermione,
and one version with Harry,

00:10:04.940 --> 00:10:06.650
and one final version with Ron.
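
NOTE A sketch of why only Ron survives. Each pass through the loop
stands in for one separate run of names.py (an assumption made so that
a single script can show the effect); the temporary path is also assumed.
```python
import os
import tempfile
path = os.path.join(tempfile.mkdtemp(), "names.txt")
for name in ["Hermione", "Harry", "Ron"]:
    # Opening in "w" mode empties the file before anything is written.
    file = open(path, "w")
    file.write(name)
    file.close()
# Read the file back to see what is left.
file = open(path)
contents = file.read()
file.close()
print(contents)
```
Only the last name written remains, because every open in "w" mode starts the file over.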

00:10:06.650 --> 00:10:09.500
But ideally, I think we
probably want to be appending,

00:10:09.500 --> 00:10:11.960
as Vishal says, each of
those names to the file,

00:10:11.960 --> 00:10:15.630
not just clobbering-- that is,
overwriting the file each time.

00:10:15.630 --> 00:10:16.520
So how can I do this?

00:10:16.520 --> 00:10:18.500
It's actually a relatively easy fix.

00:10:18.500 --> 00:10:20.610
Let me go ahead and do this as follows.

00:10:20.610 --> 00:10:23.630
I'm going to first remove
the old version of names.txt.

00:10:23.630 --> 00:10:26.550
And now I'm going to
change my code to do this.

00:10:26.550 --> 00:10:29.840
I'm going to change the w,
quote/unquote, to just a,

00:10:29.840 --> 00:10:32.990
quote/unquote-- a for Append,
which means to add to the bottom,

00:10:32.990 --> 00:10:34.940
to the bottom, to the
bottom, again and again.

00:10:34.940 --> 00:10:39.320
Now let me go ahead and rerun
python of names.py, Enter.

00:10:39.320 --> 00:10:41.990
I'll again start from
scratch with Hermione

00:10:41.990 --> 00:10:44.090
because I'm creating the file new.

00:10:44.090 --> 00:10:49.700
Notice that if I now do code
of names.txt, Enter, we do

00:10:49.700 --> 00:10:51.170
see that Hermione is back.

00:10:51.170 --> 00:10:54.590
So after removing the
file, it did get recreated,

00:10:54.590 --> 00:10:56.670
even though I'm using
append, which is good.

00:10:56.670 --> 00:11:00.380
But now let's see what happens
when I go back to my terminal.

00:11:00.380 --> 00:11:03.260
And this time, I run
python of names.py again--

00:11:03.260 --> 00:11:04.850
this time, typing in Harry.

00:11:04.850 --> 00:11:06.720
And let me run it one more time--

00:11:06.720 --> 00:11:08.120
this time, typing in Ron.

00:11:08.120 --> 00:11:10.850
So hopefully, this time, in
that second tab, names.txt,

00:11:10.850 --> 00:11:13.670
I should now see all three of them.

00:11:13.670 --> 00:11:17.030
But, but, but, but this
doesn't look ideal.

00:11:17.030 --> 00:11:21.213
What have I clearly done wrong?

00:11:21.213 --> 00:11:23.630
Something tells me, even though
all three names are there,

00:11:23.630 --> 00:11:26.180
it's not going to be easy to
read those back unless you

00:11:26.180 --> 00:11:29.300
know where each name ends and begins.

00:11:29.300 --> 00:11:33.200
AUDIENCE: The English
format is not correct.

00:11:33.200 --> 00:11:35.510
The English format is not correct.

00:11:35.510 --> 00:11:36.620
It's incorrect.

00:11:36.620 --> 00:11:38.540
It's concatenating them.

00:11:38.540 --> 00:11:40.910
DAVID MALAN: It is.

00:11:40.910 --> 00:11:43.070
Well, it appears to be concatenating.

00:11:43.070 --> 00:11:46.280
But technically speaking, it's
just appending to the file--

00:11:46.280 --> 00:11:48.710
first Hermione, then Harry, then Ron.

00:11:48.710 --> 00:11:50.840
It has the effect of
combining them back to back,

00:11:50.840 --> 00:11:52.298
but it's not concatenating, per se.

00:11:52.298 --> 00:11:53.690
It really is just appending.

00:11:53.690 --> 00:11:55.370
Let's go to another hand here.

00:11:55.370 --> 00:11:58.100
What really have I done wrong?

00:11:58.100 --> 00:12:01.010
Or equivalently, how might I fix?

00:12:01.010 --> 00:12:05.000
It would be nice if there were some
kind of gaps between each of the names,

00:12:05.000 --> 00:12:07.460
so we could read them more cleanly.

00:12:07.460 --> 00:12:08.210
AUDIENCE: Hello.

00:12:08.210 --> 00:12:13.160
We should add a new line
before we write new name.

00:12:13.160 --> 00:12:13.910
DAVID MALAN: Good.

00:12:13.910 --> 00:12:15.470
We want to add a new line ourselves.

00:12:15.470 --> 00:12:19.430
So whereas print by default, recall,
always outputs, automatically,

00:12:19.430 --> 00:12:20.990
a line ending of backslash n.

00:12:20.990 --> 00:12:24.410
Unless we override it with the
named parameter called end,

00:12:24.410 --> 00:12:25.640
write does not do that.

00:12:25.640 --> 00:12:26.810
Write takes you literally.

00:12:26.810 --> 00:12:29.120
And if you say write
Hermione, that's it.

00:12:29.120 --> 00:12:30.680
You're getting the H through the e.

00:12:30.680 --> 00:12:33.740
If you say, write Harry,
you get the H through the y.

00:12:33.740 --> 00:12:36.810
You don't get any extra
new lines automatically.

00:12:36.810 --> 00:12:40.760
So if you want to have a new line
at the end of each of these names,

00:12:40.760 --> 00:12:42.150
we've got to do that manually.

00:12:42.150 --> 00:12:46.350
So let me, again, close names.txt,
and let me remove the current file.

00:12:46.350 --> 00:12:48.200
And let me go back up to my code here.

00:12:48.200 --> 00:12:49.920
And I can fix this in
any number of ways,

00:12:49.920 --> 00:12:51.712
but I'm just going to
go ahead and do this.

00:12:51.712 --> 00:12:55.700
I'm going to write out an f string
that contains name and backslash

00:12:55.700 --> 00:12:56.522
n at the end.

00:12:56.522 --> 00:12:57.980
We could do this in different ways.

00:12:57.980 --> 00:13:00.952
We could manually print just the
new line or some other technique,

00:13:00.952 --> 00:13:04.160
but I'm going to go ahead and use my f
strings, as I'm in the habit of doing,

00:13:04.160 --> 00:13:07.290
and just print the name and
the new line all at once.
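
NOTE A sketch of append mode with an explicit newline. As an assumption,
each pass through the loop stands in for one run of names.py, and the
file is kept in a temporary directory.
```python
import os
import tempfile
path = os.path.join(tempfile.mkdtemp(), "names.txt")
for name in ["Hermione", "Harry", "Ron"]:
    # "a" adds to the end of the file and creates it if it is missing.
    file = open(path, "a")
    # write adds no line ending of its own, so include one explicitly.
    file.write(f"{name}\n")
    file.close()
file = open(path)
contents = file.read()
file.close()
print(contents, end="")
```
Without the backslash n in the f-string, the same loop would leave HermioneHarryRon on a single line.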

00:13:07.290 --> 00:13:11.150
I'm going to go ahead now and down to my
terminal window, run python of names.py

00:13:11.150 --> 00:13:12.230
again, Enter.

00:13:12.230 --> 00:13:13.790
We'll type in Hermione.

00:13:13.790 --> 00:13:15.890
I'm going to run it
again, type in Harry.

00:13:15.890 --> 00:13:18.500
I'm going to type it
again and this time, Ron.

00:13:18.500 --> 00:13:22.430
Now I'm going to run code of
names.txt and open that file.

00:13:22.430 --> 00:13:25.730
And now it looks like the
file is a bit cleaner.

00:13:25.730 --> 00:13:28.130
Indeed, I have each of
the names on its own line

00:13:28.130 --> 00:13:32.810
as well as a line ending, which
ensures that we can separate one

00:13:32.810 --> 00:13:33.750
from the other.

00:13:33.750 --> 00:13:38.030
Now, if I were writing code, I
bet I could parse, that is, read

00:13:38.030 --> 00:13:39.950
the previous file by
looking at differences

00:13:39.950 --> 00:13:41.727
between lowercase and uppercase letters.

00:13:41.727 --> 00:13:43.310
But that's going to get messy quickly.

00:13:43.310 --> 00:13:46.640
Generally speaking, when storing
data long-term in a file,

00:13:46.640 --> 00:13:50.750
you should probably do it somehow
cleanly, like doing one name at a time.

00:13:50.750 --> 00:13:52.662
Well, let's now go
back, and I'll propose

00:13:52.662 --> 00:13:54.620
that this code is now
working correctly, but we

00:13:54.620 --> 00:13:56.300
can design it a little bit better.

00:13:56.300 --> 00:14:00.410
It turns out that it's all too easy
when writing code to sometimes forget

00:14:00.410 --> 00:14:01.460
to close files.

00:14:01.460 --> 00:14:03.770
And sometimes, this isn't
necessarily a big deal.

00:14:03.770 --> 00:14:05.450
But sometimes, it can create problems.

00:14:05.450 --> 00:14:08.210
Files could get corrupted or
accidentally deleted or the like,

00:14:08.210 --> 00:14:09.990
depending on what happens in your code.

00:14:09.990 --> 00:14:14.660
So it turns out that you don't strictly
need to call close on the file yourself

00:14:14.660 --> 00:14:16.550
if you take another approach instead.

00:14:16.550 --> 00:14:21.950
More Pythonic when manipulating
files is to do this,

00:14:21.950 --> 00:14:25.370
to introduce this other
keyword called, quite simply,

00:14:25.370 --> 00:14:29.220
with that allows you to
specify that, in this context,

00:14:29.220 --> 00:14:33.030
I want you to open and
automatically close some file.

00:14:33.030 --> 00:14:34.520
So how do we use with?

00:14:34.520 --> 00:14:35.970
It simply looks like this.

00:14:35.970 --> 00:14:37.430
Let me go back to my code here.

00:14:37.430 --> 00:14:39.320
I've gotten rid of the close line.

00:14:39.320 --> 00:14:41.360
And I'm now just going
to say this instead.

00:14:41.360 --> 00:14:44.240
Instead of saying, file
equals open, I'm going

00:14:44.240 --> 00:14:48.290
to say, with open, then the
same arguments as before,

00:14:48.290 --> 00:14:51.860
and somewhat curiously, I'm going to
put the variable at the end of the line.

00:14:51.860 --> 00:14:52.400
Why?

00:14:52.400 --> 00:14:54.080
That's just the way this is done.

00:14:54.080 --> 00:14:56.840
You say, with, you call
the function in question,

00:14:56.840 --> 00:15:00.320
and then you say as and specify the
name of the variable that should

00:15:00.320 --> 00:15:03.110
be assigned the return value of open.

00:15:03.110 --> 00:15:05.870
Then I'm going to go ahead and
indent the line underneath so

00:15:05.870 --> 00:15:08.330
that the line of code
that's writing the name

00:15:08.330 --> 00:15:12.770
is now in the context of this with
statement, which just ensures that,

00:15:12.770 --> 00:15:15.560
automatically, if I had
more code in this file

00:15:15.560 --> 00:15:19.970
down below no longer indented, the
file would be automatically closed

00:15:19.970 --> 00:15:22.130
as soon as line 4 is done executing.
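
NOTE A sketch of the with form. The hard-coded name and the temporary
path are assumptions; the closed attribute is used only to show that the
file really has been closed once the indented block ends.
```python
import os
import tempfile
path = os.path.join(tempfile.mkdtemp(), "names.txt")
name = "Draco"  # the lecture gets this from input("What's your name? ")
with open(path, "a") as file:
    # Inside the block the file is open and can be written to.
    file.write(f"{name}\n")
# Back at the outer indentation level, the file is already closed.
print(file.closed)
```
No explicit call to close is needed; leaving the indented block closes the file, even if an error occurs inside it.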

00:15:22.130 --> 00:15:24.050
So it doesn't change
what has just happened,

00:15:24.050 --> 00:15:26.900
but it does automate the process
of at least closing things for us

00:15:26.900 --> 00:15:31.490
just to ensure I don't forget and
so that something doesn't go wrong.

00:15:31.490 --> 00:15:35.630
But suppose, now, that I wanted
to read these names from the file.

00:15:35.630 --> 00:15:38.580
All I've done thus far is write
code that writes names to the file.

00:15:38.580 --> 00:15:41.720
But let's assume, now, that we have
all of these names in the file.

00:15:41.720 --> 00:15:43.880
And heck, let's go
ahead and add one more.

00:15:43.880 --> 00:15:47.270
Let me go ahead and run this one
more time-- python of names.py.

00:15:47.270 --> 00:15:49.680
And let's add in Draco to the mix.

00:15:49.680 --> 00:15:52.100
So now that we have all
four of these names here,

00:15:52.100 --> 00:15:54.650
how might we want to read them back?

00:15:54.650 --> 00:15:57.203
Well, let me propose that
we go into names.py now,

00:15:57.203 --> 00:15:59.120
or we could create another
program altogether.

00:15:59.120 --> 00:16:02.660
But I'm going to keep reusing the same
name just to keep us focused on this.

00:16:02.660 --> 00:16:07.850
And now I'm going to write code that
reads an existing file with Hermione,

00:16:07.850 --> 00:16:10.550
Harry, Ron, and Draco together.

00:16:10.550 --> 00:16:11.802
And how do I do this?

00:16:11.802 --> 00:16:13.010
Well, it's similar in spirit.

00:16:13.010 --> 00:16:15.605
I'm going to start this
time with with open,

00:16:15.605 --> 00:16:18.230
and then the first argument is
going to be the name of the file

00:16:18.230 --> 00:16:19.910
that I want to open, as before.

00:16:19.910 --> 00:16:23.780
And I'm going to open it, this time,
in read mode-- quote/unquote, r.

00:16:23.780 --> 00:16:27.360
And to read a file just means
to load it, not to save it.

00:16:27.360 --> 00:16:30.462
And I'm going to name
the return value file.

00:16:30.462 --> 00:16:31.670
And now I'm going to do this.

00:16:31.670 --> 00:16:33.462
And there's a number
of ways I can do this,

00:16:33.462 --> 00:16:37.100
but one way to read all of the lines
from the file at once would be this.

00:16:37.100 --> 00:16:39.230
Let me declare a variable called lines.

00:16:39.230 --> 00:16:42.680
Let me access that file and
call a function or a method that

00:16:42.680 --> 00:16:44.730
comes with it called readlines.

00:16:44.730 --> 00:16:47.720
So if you read the documentation
on File I/O in Python,

00:16:47.720 --> 00:16:51.740
you'll see that open files come with
a special method whose purpose in life

00:16:51.740 --> 00:16:56.550
is to read all the lines from the
file and return them to me as a list.

00:16:56.550 --> 00:16:59.750
So what this line 2 is doing is
it's reading all of the lines

00:16:59.750 --> 00:17:03.230
from that file, storing them
in a variable called lines.

00:17:03.230 --> 00:17:05.839
Now, suppose I want to iterate
over all of those lines

00:17:05.839 --> 00:17:07.760
and print out each of those names.

00:17:07.760 --> 00:17:12.349
For line in lines, this is just
a standard for loop in Python.

00:17:12.349 --> 00:17:13.880
Lines is a list.

00:17:13.880 --> 00:17:16.760
Line is the variable that
will be automatically set

00:17:16.760 --> 00:17:17.930
to each of those lines.

00:17:17.930 --> 00:17:22.609
Let me go ahead and print out
something like, oh, hello, comma,

00:17:22.609 --> 00:17:25.750
and then I'll print out the line itself.

00:17:25.750 --> 00:17:30.790
All right, so let me go to my terminal
window, run python of names.py now--

00:17:30.790 --> 00:17:34.360
I have not deleted names.txt,
so it still contains all four

00:17:34.360 --> 00:17:38.590
of those names-- and hit
Enter, and OK, it's not bad,

00:17:38.590 --> 00:17:41.290
but it's a little ugly here.

00:17:41.290 --> 00:17:42.430
What's going on?

00:17:42.430 --> 00:17:45.940
When I ran names.py, it's saying
Hello to Hermione, to Harry, to Ron,

00:17:45.940 --> 00:17:46.540
to Draco.

00:17:46.540 --> 00:17:50.640
But there's these gaps
now between the lines.

00:17:50.640 --> 00:17:53.100
What explains that symptom?

00:17:53.100 --> 00:17:55.230
If nothing else, it just looks ugly.

00:17:55.230 --> 00:17:57.360
AUDIENCE: It happens
because in the text file,

00:17:57.360 --> 00:18:01.620
we have new line symbols
in between those names,

00:18:01.620 --> 00:18:05.850
and the print always adds
another new line at the end.

00:18:05.850 --> 00:18:08.695
So you use the same symbol twice.

00:18:08.695 --> 00:18:09.570
DAVID MALAN: Perfect.

00:18:09.570 --> 00:18:12.460
And here's a good example of
a bug, a mistake in a program.

00:18:12.460 --> 00:18:14.760
But if you just think about
those first principles,

00:18:14.760 --> 00:18:18.103
like, how do each of the lines
of code work that I'm using?

00:18:18.103 --> 00:18:21.270
You should be able to reason, exactly
as Ripal did there, to say that, all right,

00:18:21.270 --> 00:18:24.450
well, one of those new lines is
coming from the file after each name.

00:18:24.450 --> 00:18:26.760
And then, of course, print,
all of these weeks later,

00:18:26.760 --> 00:18:29.370
is still giving us for
free that extra new line.
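
NOTE A sketch of the readlines version and the doubled newline. The
setup lines that create names.txt in a temporary directory are an
assumption, standing in for the file built up earlier.
```python
import os
import tempfile
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
with open(path, "r") as file:
    # readlines returns a list with one string per line of the file.
    lines = file.readlines()
# Each string still ends with the newline stored in the file.
print(lines)
for line in lines:
    # print adds its own newline, so a blank line follows each name.
    print("hello,", line)
```
Printing the list itself makes the trailing backslash n on every element visible.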

00:18:29.370 --> 00:18:31.530
So there's a couple possible solutions.

00:18:31.530 --> 00:18:34.110
I could certainly do this,
which we've done in the past,

00:18:34.110 --> 00:18:38.040
and pass in a named argument
to print, like end="".

00:18:38.040 --> 00:18:39.330
And that's fine.

00:18:39.330 --> 00:18:41.730
I would argue a little better
than that might actually

00:18:41.730 --> 00:18:46.530
be to do this, to strip off of the
end of the line the actual new line

00:18:46.530 --> 00:18:50.370
itself so that print is handling the
printing of everything, the person's

00:18:50.370 --> 00:18:52.050
name as well as the new line.

00:18:52.050 --> 00:18:55.500
But you're just stripping off what
is really just an implementation

00:18:55.500 --> 00:18:56.700
detail in the file.

00:18:56.700 --> 00:19:01.420
We chose to use new lines in my text
file to separate one name from another.

00:19:01.420 --> 00:19:05.040
So arguably, it should be a
little cleaner in terms of design

00:19:05.040 --> 00:19:07.740
to strip that off and
then let print print out

00:19:07.740 --> 00:19:09.283
what is really just now a name.

00:19:09.283 --> 00:19:10.950
But that's ultimately a design decision.

00:19:10.950 --> 00:19:14.340
The effect is going to
be exactly the same.
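
NOTE A minimal sketch of the two fixes just described. The list below is an
assumption standing in for the lines read from names.txt; option_1 and
option_2 are hypothetical names used only to compare the results.
```python
# Assumption: these strings mimic the lines read from names.txt, new line included.
lines = ["Hermione\n", "Harry\n", "Ron\n", "Draco\n"]
# Option 1: keep the file's new line and tell print not to add its own.
for line in lines:
    print("hello,", line, end="")
# Option 2: strip the file's new line and let print supply the only one.
for line in lines:
    print("hello,", line.rstrip())
# Either way, each greeting ends in exactly one new line.
option_1 = ["hello, " + line for line in lines]
option_2 = ["hello, " + line.rstrip() + "\n" for line in lines]
```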

00:19:14.340 --> 00:19:18.540
Well, if I'm going to open this
file and read all the lines

00:19:18.540 --> 00:19:21.870
and then iterate over all of those
lines and print them each out,

00:19:21.870 --> 00:19:23.910
I could actually combine
this into one thing

00:19:23.910 --> 00:19:26.130
because, right now, I'm
doing twice as much work.

00:19:26.130 --> 00:19:30.300
I'm reading all of the lines, then I'm
iterating over all of the lines just

00:19:30.300 --> 00:19:32.140
to print out each of them.

00:19:32.140 --> 00:19:34.770
Well, in Python, with files,
you can actually do this.

00:19:34.770 --> 00:19:37.060
I'm going to erase
almost all of these lines

00:19:37.060 --> 00:19:39.960
now, keeping only the with statement at top.

00:19:39.960 --> 00:19:45.960
And inside of this with statement, I'm
going to say this, for line in file,

00:19:45.960 --> 00:19:50.872
go ahead and print out, quote/unquote,
hello, comma, and then line.rstrip.

00:19:50.872 --> 00:19:53.830
So I'm going to take the approach of
stripping off the end of the line.

00:19:53.830 --> 00:19:57.130
But notice how elegant
this is, so to speak.

00:19:57.130 --> 00:19:59.320
I've opened the file in line 1.

00:19:59.320 --> 00:20:01.860
And if I want to iterate
over every line in the file,

00:20:01.860 --> 00:20:05.280
I don't have to very
explicitly read all the lines,

00:20:05.280 --> 00:20:06.900
then iterate over all of the lines.

00:20:06.900 --> 00:20:08.440
I can combine this into one thought.

00:20:08.440 --> 00:20:11.407
In Python, you can simply
say, for line in file,

00:20:11.407 --> 00:20:14.490
and that's going to have the effect
of giving you a for loop that iterates

00:20:14.490 --> 00:20:18.240
over every line in the file, one
at a time, and on each iteration,

00:20:18.240 --> 00:20:22.110
updating the value of this
variable line to be Hermione,

00:20:22.110 --> 00:20:24.990
then Harry, then Ron, then Draco.
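
NOTE A sketch of the for-line-in-file loop described here. To keep it
self-contained it first writes a small names.txt into a temporary folder,
which is an assumption; the seen list is hypothetical and exists only so the
result can be inspected.
```python
import os
import tempfile
# Assumption: write a small names.txt to a temporary folder so this runs on its own.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
seen = []  # hypothetical list, kept only so the result can be inspected
with open(path, "r") as file:
    for line in file:  # one line per iteration, no readlines() call needed
        name = line.rstrip()  # drop the trailing new line from the file
        seen.append(name)
        print("hello,", name)
```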

00:20:24.990 --> 00:20:28.080
So this, again, is one of the
appealing aspects of Python

00:20:28.080 --> 00:20:32.140
is that it reads rather like English--
for line in file, print this.

00:20:32.140 --> 00:20:35.190
It's a little more compact
when written this way.

00:20:35.190 --> 00:20:38.580
Well, what if, though, I don't
want quite this behavior?

00:20:38.580 --> 00:20:42.450
Because notice now, if I run
python of names.py, it's correct.

00:20:42.450 --> 00:20:45.060
I'm seeing each of the names
and each of the hellos,

00:20:45.060 --> 00:20:47.320
and there's no extra spaces in between.

00:20:47.320 --> 00:20:52.440
But just to be difficult, I'd really
like us to be sorting these hellos.

00:20:52.440 --> 00:20:56.610
Really, I'd like to see Draco first,
then Harry, then Hermione, then Ron,

00:20:56.610 --> 00:20:58.890
no matter what order
they appear in the file.

00:20:58.890 --> 00:21:02.127
So I could go in, of course, to the
file and manually change the file.

00:21:02.127 --> 00:21:03.960
But if that file is
changing over time based

00:21:03.960 --> 00:21:06.203
on who is typing their
name into the program,

00:21:06.203 --> 00:21:07.620
that's not really a good solution.

00:21:07.620 --> 00:21:10.412
In code, I should be able to load
the file, no matter what it looks

00:21:10.412 --> 00:21:12.930
like, and just sort it all at once.

00:21:12.930 --> 00:21:17.100
Now, here is a reason to
not do what I've just done.

00:21:17.100 --> 00:21:21.510
I can't iterate over each line
in the file and print it out

00:21:21.510 --> 00:21:23.550
but sort everything in advance.

00:21:23.550 --> 00:21:27.750
Logically, if I'm looking at each line
one at a time and printing it out,

00:21:27.750 --> 00:21:29.310
it's too late to sort.

00:21:29.310 --> 00:21:32.970
I really need to read all of the
lines first without printing them,

00:21:32.970 --> 00:21:34.990
sort them, then print them.

00:21:34.990 --> 00:21:38.110
So we have to take a step back in
order to add now this new feature.

00:21:38.110 --> 00:21:39.340
So how can I do this?

00:21:39.340 --> 00:21:42.030
Well, let me combine
some ideas from before.

00:21:42.030 --> 00:21:44.310
Let me go ahead and
start fresh with this.

00:21:44.310 --> 00:21:48.330
Let me give myself a list called
names, and assign it an empty list,

00:21:48.330 --> 00:21:52.140
just so I have a variable in which
to accumulate all of these lines.

00:21:52.140 --> 00:21:56.550
And now let me open the file with
open, quote/unquote, names.txt.

00:21:56.550 --> 00:21:58.840
And it turns out, I can
tighten this up a little bit.

00:21:58.840 --> 00:22:00.960
It turns out, if you're
opening a file to read it,

00:22:00.960 --> 00:22:03.420
you don't need to
specify, quote/unquote, r.

00:22:03.420 --> 00:22:05.130
That is the implicit default.

00:22:05.130 --> 00:22:08.160
So you can tighten things up
by just saying, open names.txt.

00:22:08.160 --> 00:22:10.680
And you'll be able to read
the file but not write it.
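
NOTE A small sketch of that default, assuming a names.txt created in a
temporary folder. The variables mode, first, and wrote are hypothetical and
only record what happens.
```python
import io
import os
import tempfile
# Assumption: create names.txt in a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
with open(path) as file:  # no "r" given, so reading is the implicit default
    mode = file.mode
    first = file.readline().rstrip()
    try:
        file.write("Draco\n")  # writing is not allowed in read mode
        wrote = True
    except io.UnsupportedOperation:
        wrote = False
print(mode, first, wrote)
```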

00:22:10.680 --> 00:22:13.590
I'm going to give myself a
variable called file, as before.

00:22:13.590 --> 00:22:17.730
I am going to iterate over the file
in the same way, for line in file.

00:22:17.730 --> 00:22:21.450
But instead of printing each
line, I'm going to do this.

00:22:21.450 --> 00:22:25.170
I'm going to take my names
list and append to it.

00:22:25.170 --> 00:22:27.930
And this is appending
to a list in memory,

00:22:27.930 --> 00:22:30.617
not appending to the file itself.

00:22:30.617 --> 00:22:32.700
I'm going to go ahead and
append the current line,

00:22:32.700 --> 00:22:35.400
but I'm going to strip off
the new line at the end

00:22:35.400 --> 00:22:39.600
so that all I'm adding to this list
is each of the students' names.

00:22:39.600 --> 00:22:42.660
Now I can use that familiar
technique from before.

00:22:42.660 --> 00:22:46.740
Let me go outside of this with statement
because now I've read the entire file,

00:22:46.740 --> 00:22:47.310
presumably.

00:22:47.310 --> 00:22:50.238
So by the time I'm done
with lines 4 and 5,

00:22:50.238 --> 00:22:52.530
again, and again, and again,
for each line in the file,

00:22:52.530 --> 00:22:53.610
I'm done with the file.

00:22:53.610 --> 00:22:54.390
It can close.

00:22:54.390 --> 00:22:57.870
I now have all of the students'
names in this list variable.

00:22:57.870 --> 00:22:58.890
Let me do this.

00:22:58.890 --> 00:23:04.110
For name in, not just
names, but the sorted names,

00:23:04.110 --> 00:23:08.250
using our Python function sorted,
which does just that, and do print,

00:23:08.250 --> 00:23:10.950
quote/unquote, with an
f string, hello, comma,

00:23:10.950 --> 00:23:13.780
and now I'll plug in bracket name.

00:23:13.780 --> 00:23:15.700
So now, what have I done?

00:23:15.700 --> 00:23:18.060
I'm creating a list
at the beginning, just

00:23:18.060 --> 00:23:20.010
so I have a place to gather my data.

00:23:20.010 --> 00:23:23.910
I then, on lines 3 through 5, iterate
over the file from top to bottom,

00:23:23.910 --> 00:23:27.000
reading in each line, one at a
time, stripping off the new line

00:23:27.000 --> 00:23:29.200
and adding just the
student's name to this list.

00:23:29.200 --> 00:23:32.280
And the reason I'm doing
that is so that on line 7,

00:23:32.280 --> 00:23:35.850
I can sort all of those names,
now that they're all in memory,

00:23:35.850 --> 00:23:37.450
and print them in order.

00:23:37.450 --> 00:23:40.720
I need to load them all into
memory before I can sort them.

00:23:40.720 --> 00:23:42.720
Otherwise, I'd be printing
them out prematurely,

00:23:42.720 --> 00:23:45.240
and Draco would end up
last instead of first.
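
NOTE Roughly what names.py looks like at this point, as a sketch. The
temporary names.txt written first is an assumption so the example runs on
its own.
```python
import os
import tempfile
# Assumption: create names.txt in a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
names = []  # a place to accumulate every name before sorting
with open(path) as file:
    for line in file:
        names.append(line.rstrip())  # appends to the list in memory, not to the file
# The file is closed by now; sort the names in memory and print them.
for name in sorted(names):
    print(f"hello, {name}")
```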

00:23:45.240 --> 00:23:48.720
So let me go ahead in my terminal
window and run python of names.py

00:23:48.720 --> 00:23:50.280
now, and hit Enter.

00:23:50.280 --> 00:23:51.360
And there we go.

00:23:51.360 --> 00:23:54.900
The same list of four hellos,
but now they're sorted.

00:23:54.900 --> 00:23:56.460
And this is a very common technique.

00:23:56.460 --> 00:23:58.710
When dealing with files
and information more

00:23:58.710 --> 00:24:03.300
generally, if you want to change that
data in some way, like sorting it,

00:24:03.300 --> 00:24:06.690
creating some kind of variable at
the top of your program, like a list,

00:24:06.690 --> 00:24:10.620
adding or appending information to
it just to collect it in one place,

00:24:10.620 --> 00:24:14.070
and then do something interesting
with that collection, that list,

00:24:14.070 --> 00:24:16.140
is exactly what I've done here.

00:24:16.140 --> 00:24:18.840
Now, I should note that if we
just want to sort the file,

00:24:18.840 --> 00:24:21.960
we can actually do this even more
simply in Python, particularly

00:24:21.960 --> 00:24:25.980
by not bothering with this names
list, nor the second for loop.

00:24:25.980 --> 00:24:28.690
And let me go ahead and, instead,
just do more simply this.

00:24:28.690 --> 00:24:31.020
Let me go ahead and tell
Python that we want the file

00:24:31.020 --> 00:24:34.050
itself to be sorted using
that same sorted function,

00:24:34.050 --> 00:24:36.015
but this time on the file itself.

00:24:36.015 --> 00:24:38.640
And then inside of that for loop,
let's just go ahead and print

00:24:38.640 --> 00:24:42.300
right away our hello, comma,
followed by the line itself,

00:24:42.300 --> 00:24:46.110
but still stripping off of the
end of it any white space therein.

00:24:46.110 --> 00:24:48.330
If we go ahead and run
this same program now

00:24:48.330 --> 00:24:51.660
with python of names.py and hit
Enter, we get the same result.

00:24:51.660 --> 00:24:53.550
But of course, it's a lot more compact.
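
NOTE A sketch of the more compact version, passing the file itself to
sorted. The temporary names.txt and the greeted list are assumptions added
so the example is runnable and checkable.
```python
import os
import tempfile
# Assumption: create names.txt in a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
greeted = []  # hypothetical list, kept only so the order can be inspected
with open(path) as file:
    for line in sorted(file):  # sorted reads every line, then returns them in order
        greeted.append(line.rstrip())
        print("hello,", line.rstrip())
```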

00:24:53.550 --> 00:24:55.950
But for the sake of
discussion, let's assume

00:24:55.950 --> 00:24:59.850
that we do actually want to potentially
make some changes to the data

00:24:59.850 --> 00:25:00.870
as we iterate over it.

00:25:00.870 --> 00:25:03.210
So let me undo those
changes, leave things as is.

00:25:03.210 --> 00:25:06.240
Whereby now, we'll continue to
accumulate all of the names first

00:25:06.240 --> 00:25:08.910
into a list, maybe do something
to them, maybe forcing them

00:25:08.910 --> 00:25:13.365
to uppercase or lowercase or the like,
and then sort and print out each item.

00:25:13.365 --> 00:25:15.240
Let me pause and see if
there's any questions

00:25:15.240 --> 00:25:21.180
now on File I/O reading or writing or
now accumulating all of these values

00:25:21.180 --> 00:25:22.138
in some list.

00:25:22.138 --> 00:25:22.680
AUDIENCE: Hi.

00:25:22.680 --> 00:25:25.920
Is there a way to sort the files--

00:25:25.920 --> 00:25:29.490
instead if you want it from
alphabetically from A to Z,

00:25:29.490 --> 00:25:32.490
is there a way to
reverse it from Z to A?

00:25:32.490 --> 00:25:35.460
Is there a little extension that
you can add to the end to do that?

00:25:35.460 --> 00:25:37.680
Or would you have to
create a new function?

00:25:37.680 --> 00:25:40.560
DAVID MALAN: If you wanted to
reverse the contents of the file?

00:25:40.560 --> 00:25:43.920
AUDIENCE: Yeah, so if you, instead
of sorting them from A to Z

00:25:43.920 --> 00:25:47.640
in ascending order, if you
wanted them in descending order,

00:25:47.640 --> 00:25:49.470
is there an extension for that?

00:25:49.470 --> 00:25:50.790
DAVID MALAN: There is, indeed.

00:25:50.790 --> 00:25:53.313
And as always, the
documentation is your friend.

00:25:53.313 --> 00:25:55.980
So if the goal is to sort them,
not in alphabetical order, which

00:25:55.980 --> 00:25:58.410
is the default, but maybe
reverse alphabetical order,

00:25:58.410 --> 00:26:01.660
you can take a look, for instance, at
the formal Python documentation there.

00:26:01.660 --> 00:26:03.540
And what you'll see is this summary.

00:26:03.540 --> 00:26:06.870
You'll see that the sorted function
takes the first argument, generally

00:26:06.870 --> 00:26:08.160
known as an iterable.

00:26:08.160 --> 00:26:11.100
And something that's iterable
means that you can iterate over it.

00:26:11.100 --> 00:26:13.620
That is you can loop over
it one thing at a time.

00:26:13.620 --> 00:26:17.520
What the rest of this line here means
is that you can specify a key, like,

00:26:17.520 --> 00:26:19.600
how you want to sort it,
but more on that later.

00:26:19.600 --> 00:26:22.200
But this last named
parameter here is reverse.

00:26:22.200 --> 00:26:25.140
And by default, per the
documentation, it's false.

00:26:25.140 --> 00:26:28.560
It will not be reversed by default.
But if we change that to true,

00:26:28.560 --> 00:26:29.650
I bet we can do that.

00:26:29.650 --> 00:26:32.350
So let me go back to VS
Code here and do just that.

00:26:32.350 --> 00:26:34.590
Let me go ahead and pass
in a second argument

00:26:34.590 --> 00:26:38.970
to sorted in addition to this
iterable, which is my names list--

00:26:38.970 --> 00:26:42.120
iterable, again, in the sense
that it can be looped over.

00:26:42.120 --> 00:26:47.740
And let me pass in reverse=True,
thereby overriding the default of false.

00:26:47.740 --> 00:26:49.830
Let me now run python of names.py.

00:26:49.830 --> 00:26:53.410
And now Ron's at the top,
and Draco's at the bottom.
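
NOTE A sketch of the reverse parameter in use. The names list is typed in
directly here, as an assumption, rather than read from names.txt.
```python
# Assumption: the names are written out directly instead of being read from the file.
names = ["Hermione", "Harry", "Ron", "Draco"]
# reverse=True overrides the default of False, giving Z-to-A order.
descending = sorted(names, reverse=True)
for name in descending:
    print(f"hello, {name}")
```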

00:26:53.410 --> 00:26:56.490
So there, too, whenever you have a
question like that moving forward,

00:26:56.490 --> 00:26:58.650
consider, what does
the documentation say?

00:26:58.650 --> 00:27:01.290
And see if there's a germ of an
idea there because, odds are,

00:27:01.290 --> 00:27:03.480
if you have some problem,
odds are, some programmer

00:27:03.480 --> 00:27:05.910
before you has had the same question.

00:27:05.910 --> 00:27:07.320
Other thoughts?

00:27:07.320 --> 00:27:11.130
AUDIENCE: Can we limit the
number or numbers of names?

00:27:11.130 --> 00:27:15.812
And the second question, can we
find a specific name in the list?

00:27:15.812 --> 00:27:17.520
DAVID MALAN: Really
good question, can we

00:27:17.520 --> 00:27:19.270
limit the number of
the names in the file?

00:27:19.270 --> 00:27:20.730
And can we find a specific one?

00:27:20.730 --> 00:27:22.380
We absolutely could.

00:27:22.380 --> 00:27:25.500
If we were to write code,
we could, for instance,

00:27:25.500 --> 00:27:29.580
open the file first, count how
many lines are already there,

00:27:29.580 --> 00:27:32.250
and then if there's too
many already, we could just

00:27:32.250 --> 00:27:35.760
exit with sys.exit or some other
message to indicate to the user

00:27:35.760 --> 00:27:37.290
that, sorry, the class is full.

00:27:37.290 --> 00:27:40.500
As for finding someone
specifically, absolutely.

00:27:40.500 --> 00:27:44.490
You could imagine opening the file,
iterating over it with a for loop

00:27:44.490 --> 00:27:46.620
again and again and then
adding a conditional.

00:27:46.620 --> 00:27:51.397
Like, if the current line equals equals
Harry, then we found the chosen one.

00:27:51.397 --> 00:27:52.980
And you can print something like that.

00:27:52.980 --> 00:27:55.590
So you can absolutely combine
these ideas with previous ideas,

00:27:55.590 --> 00:27:58.470
like conditionals, to
ask those same questions.
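
NOTE One way those two ideas might look in code. This is only a sketch: the
helper names count_lines and contains, the LIMIT of 4, and the temporary
names.txt are all assumptions, not something from the lecture's own code.
```python
import os
import sys
import tempfile
# Assumption: create names.txt in a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
def count_lines(path):
    """Hypothetical helper: count how many lines are already in the file."""
    with open(path) as file:
        return sum(1 for _ in file)
def contains(path, target):
    """Hypothetical helper: report whether any line equals the target name."""
    with open(path) as file:
        for line in file:
            if line.rstrip() == target:
                return True
    return False
LIMIT = 4  # hypothetical class size
if count_lines(path) > LIMIT:
    sys.exit("Sorry, the class is full")
if contains(path, "Harry"):
    print("Found Harry")
```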

00:27:58.470 --> 00:28:02.160
How about one other
question on File I/O?

00:28:02.160 --> 00:28:08.670
AUDIENCE: So I just thought about this
function, like read all the lines.

00:28:08.670 --> 00:28:14.280
And it looks like it
separates all the lines

00:28:14.280 --> 00:28:17.520
by this special character, backslash n.

00:28:17.520 --> 00:28:24.480
But it looks like we don't need
that character, and we always strip it.

00:28:24.480 --> 00:28:28.920
And it looks like some
bad design or function.

00:28:28.920 --> 00:28:33.910
Why wouldn't we just strip
it inside this function?

00:28:33.910 --> 00:28:35.410
DAVID MALAN: A really good question.

00:28:35.410 --> 00:28:40.140
So we are, in my examples
thus far, using rstrip

00:28:40.140 --> 00:28:43.290
to strip from the end of the
line all of this white space.

00:28:43.290 --> 00:28:45.000
You might not want to do that.

00:28:45.000 --> 00:28:49.560
In this case, I am stripping it away
because I know that each of those lines

00:28:49.560 --> 00:28:51.000
isn't some generic line of text.

00:28:51.000 --> 00:28:55.050
Each line really represents a
name that I have put there myself.

00:28:55.050 --> 00:28:58.320
I'm using the new line just to
separate one value from another.

00:28:58.320 --> 00:29:00.600
In other scenarios, you
might very well want

00:29:00.600 --> 00:29:03.990
to keep that line ending because
it's a very long series of text,

00:29:03.990 --> 00:29:06.240
or a paragraph, or something
like that, where you want

00:29:06.240 --> 00:29:07.740
to keep it distinct from the others.

00:29:07.740 --> 00:29:09.150
But it's just a convention.

00:29:09.150 --> 00:29:13.950
We have to use something, presumably,
to separate one chunk of text

00:29:13.950 --> 00:29:14.700
from another.

00:29:14.700 --> 00:29:18.870
There are other functions in Python
that will, in fact, handle the removal

00:29:18.870 --> 00:29:20.490
of that white space for you.

00:29:20.490 --> 00:29:22.590
Readlines, though, does
literally that, though.

00:29:22.590 --> 00:29:25.110
It reads all of the lines as is.
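
NOTE A sketch contrasting readlines with one alternative that drops the line
endings, read followed by the str method splitlines. Choosing splitlines is
an assumption here; the lecture does not name a specific function.
```python
import os
import tempfile
# Assumption: create names.txt in a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\nDraco\n")
with open(path) as file:
    raw = file.readlines()  # each string still ends with its new line
with open(path) as file:
    clean = file.read().splitlines()  # line endings are removed for you
print(raw)
print(clean)
```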

00:29:25.110 --> 00:29:28.780
Well, allow me to turn our attention
back to where we left off here,

00:29:28.780 --> 00:29:33.450
which is just names to propose that,
with names.txt, we have an ability,

00:29:33.450 --> 00:29:36.690
it seems, to store each of these
names pretty straightforwardly.

00:29:36.690 --> 00:29:39.750
But what if we wanted to keep
track of other information as well?

00:29:39.750 --> 00:29:42.700
Suppose that we wanted
to store information,

00:29:42.700 --> 00:29:47.550
including a student's name
and their house at Hogwarts,

00:29:47.550 --> 00:29:50.230
be it Gryffindor, or
Slytherin, or something else.

00:29:50.230 --> 00:29:52.770
Well, where do we go about putting that?

00:29:52.770 --> 00:29:55.020
Hermione lives in Gryffindor,
so we could do something

00:29:55.020 --> 00:29:56.520
like this in our text file.

00:29:56.520 --> 00:29:58.980
Harry lives in Gryffindor,
so we could do that.

00:29:58.980 --> 00:30:01.170
Ron lives in Gryffindor,
so we could do that.

00:30:01.170 --> 00:30:03.900
And Draco lives in Slytherin,
so we could do that.

00:30:03.900 --> 00:30:06.600
But I worry here--

00:30:06.600 --> 00:30:09.990
but I worry now that we're mixing
apples and oranges, so to speak.

00:30:09.990 --> 00:30:11.220
Some lines are names.

00:30:11.220 --> 00:30:12.610
Some lines are houses.

00:30:12.610 --> 00:30:15.870
So this probably isn't the best
design, if only because it's confusing,

00:30:15.870 --> 00:30:17.010
or it's ambiguous.

00:30:17.010 --> 00:30:19.470
So maybe what we could
do is adopt a convention.

00:30:19.470 --> 00:30:22.140
And indeed, this is, in fact,
what a lot of programmers do.

00:30:22.140 --> 00:30:26.190
They change this file not to
be names.txt, but instead, let

00:30:26.190 --> 00:30:28.860
me create a new file called names.csv.

00:30:28.860 --> 00:30:31.650
CSV stands for Comma-Separated Values.

00:30:31.650 --> 00:30:35.490
And it's a very common convention to
store multiple pieces of information

00:30:35.490 --> 00:30:37.860
that are related in the same file.

00:30:37.860 --> 00:30:41.250
And so to do this, I'm going to
separate each of these types of data,

00:30:41.250 --> 00:30:44.400
not with another new line,
but simply with a comma.

00:30:44.400 --> 00:30:46.860
I'm going to keep each
student on their own line,

00:30:46.860 --> 00:30:49.980
but I'm going to separate the
information about each student using

00:30:49.980 --> 00:30:51.340
a comma instead.

00:30:51.340 --> 00:30:54.600
And so now we sort of have a
two-dimensional file, if you will.

00:30:54.600 --> 00:30:56.830
Row by row, we have our students.

00:30:56.830 --> 00:30:59.510
But if you think of these
commas as representing a column,

00:30:59.510 --> 00:31:02.760
even though it's not perfectly straight
because of the lengths of these names,

00:31:02.760 --> 00:31:05.310
it's a little jagged.

00:31:05.310 --> 00:31:07.950
You can think of these commas
as representing a column.

00:31:07.950 --> 00:31:11.190
And it turns out, these
CSV files are very commonly

00:31:11.190 --> 00:31:14.700
used when you use something like
Microsoft Excel, Apple Numbers,

00:31:14.700 --> 00:31:17.550
or Google Spreadsheets, and you
want to export the data to share

00:31:17.550 --> 00:31:20.160
with someone else as a CSV file.

00:31:20.160 --> 00:31:23.460
Or conversely, if you
want to import a CSV

00:31:23.460 --> 00:31:25.860
file into your preferred
spreadsheet software,

00:31:25.860 --> 00:31:29.590
like Excel, or Numbers, or Google
Spreadsheets, you can do that as well.

00:31:29.590 --> 00:31:33.150
So CSV is a very common,
very simple text format

00:31:33.150 --> 00:31:37.290
that just separates values with
commas and different types of values,

00:31:37.290 --> 00:31:39.280
ultimately, with new lines as well.

00:31:39.280 --> 00:31:42.210
Let me go ahead and run
code of students.csv

00:31:42.210 --> 00:31:44.520
to create a brand-new file
that's initially empty.

00:31:44.520 --> 00:31:48.820
And we'll add to it those same names
but also some other information as well.

00:31:48.820 --> 00:31:52.860
So if I now have this new file,
students.csv, inside of which

00:31:52.860 --> 00:31:56.370
is one column of names, so to
speak, and one column of houses,

00:31:56.370 --> 00:32:00.540
how do I go about changing my code
to read not just those names but also

00:32:00.540 --> 00:32:03.240
those names and houses so that
they're not all on one line--

00:32:03.240 --> 00:32:06.970
we somehow have access to
both types of value separately?

00:32:06.970 --> 00:32:11.340
Well, let me go ahead and create a
new program here called students.py.

00:32:11.340 --> 00:32:13.950
And in this program,
let's go about reading,

00:32:13.950 --> 00:32:17.610
not a text file, per se, but a
specific type of text file, a CSV,

00:32:17.610 --> 00:32:19.800
a Comma-Separated Values file.

00:32:19.800 --> 00:32:22.200
And to do this, I'm going to
use similar code as before.

00:32:22.200 --> 00:32:26.897
I'm going to say with open,
quote/unquote, students.csv.

00:32:26.897 --> 00:32:28.980
I'm not going to bother
specifying, quote/unquote,

00:32:28.980 --> 00:32:30.670
r because, again, that's the default.

00:32:30.670 --> 00:32:33.390
But I'm going to give myself
a variable name of file.

00:32:33.390 --> 00:32:36.150
And then in this file, I'm
going to go ahead and do this.

00:32:36.150 --> 00:32:41.220
For line in file, as before, and
now I have to be a bit clever here.

00:32:41.220 --> 00:32:45.180
Let me go back to students.csv,
looking at this file,

00:32:45.180 --> 00:32:47.940
and it seems that on my
loop on each iteration,

00:32:47.940 --> 00:32:51.000
I'm going to get access
to the whole line of text.

00:32:51.000 --> 00:32:52.920
I'm not going to
automatically get access

00:32:52.920 --> 00:32:55.170
to just Hermione or just Gryffindor.

00:32:55.170 --> 00:32:58.960
Recall that the loop is going to
give me each full line of text.

00:32:58.960 --> 00:33:01.590
So logically, what would
you propose that we

00:33:01.590 --> 00:33:05.520
do inside of a for loop that's
reading a whole line of text at once,

00:33:05.520 --> 00:33:08.490
but we now want to get access
to the individual values,

00:33:08.490 --> 00:33:11.670
like Hermione and Gryffindor,
Harry and Gryffindor?

00:33:11.670 --> 00:33:14.160
How do we go about
taking one line of text

00:33:14.160 --> 00:33:16.740
and gaining access to those
individual values, do you think?

00:33:16.740 --> 00:33:20.040
Just instinctively, even if you're not
sure what the name of the functions

00:33:20.040 --> 00:33:20.820
would be.

00:33:20.820 --> 00:33:24.810
AUDIENCE: You can access it as you
would as if you were using a dictionary,

00:33:24.810 --> 00:33:26.195
like using a key and value.

00:33:26.195 --> 00:33:29.070
DAVID MALAN: So ideally, we would
access it using a key and value.

00:33:29.070 --> 00:33:32.100
But at this point in the story,
all we have is this loop,

00:33:32.100 --> 00:33:35.580
and this loop is giving me one
line of text at a time.

00:33:35.580 --> 00:33:36.570
I'm the programmer now.

00:33:36.570 --> 00:33:37.470
I have to solve this.

00:33:37.470 --> 00:33:39.480
There is no dictionary yet in question.

00:33:39.480 --> 00:33:41.760
How about another suggestion here?

00:33:41.760 --> 00:33:45.818
AUDIENCE: So you can somehow split
the two words based on the comma?

00:33:45.818 --> 00:33:47.610
DAVID MALAN: Yeah, even
if you're not quite

00:33:47.610 --> 00:33:49.940
sure what function is going
to do this, intuitively,

00:33:49.940 --> 00:33:51.690
you want to take this
whole line of text--

00:33:51.690 --> 00:33:55.320
Hermione, comma, Gryffindor, Harry,
comma, Gryffindor, and so forth--

00:33:55.320 --> 00:33:58.253
and split that line into
two pieces, if you will.

00:33:58.253 --> 00:34:00.420
And it turns out wonderfully,
the function we'll use

00:34:00.420 --> 00:34:03.780
is actually called split that
can split on any characters,

00:34:03.780 --> 00:34:06.100
but you can tell it
what character to use.

00:34:06.100 --> 00:34:09.633
So I'm going to go back into
students.py, and inside of this loop,

00:34:09.633 --> 00:34:11.050
I'm going to go ahead and do this.

00:34:11.050 --> 00:34:12.540
I'm going to take the current line.

00:34:12.540 --> 00:34:17.159
I'm going to remove the white space at
the end, as always, using rstrip here.

00:34:17.159 --> 00:34:19.260
And then whatever the
result of that is, I'm

00:34:19.260 --> 00:34:23.250
going to now call split
and, quote/unquote, comma.

00:34:23.250 --> 00:34:27.330
So the split function or
method comes with strings.

00:34:27.330 --> 00:34:31.570
Strs in Python-- any str
has this method built-in.

00:34:31.570 --> 00:34:36.659
And if you pass in an argument, like a
comma, what this split function will do

00:34:36.659 --> 00:34:41.880
is split that current string into 1,
2, 3, maybe more pieces by looking

00:34:41.880 --> 00:34:46.530
for that character again and again.

00:34:46.530 --> 00:34:48.540
Ultimately, split is
going to return to us

00:34:48.540 --> 00:34:51.570
a list of all of the
individual parts to the left

00:34:51.570 --> 00:34:53.260
and to the right of those commas.

00:34:53.260 --> 00:34:55.949
So I can give myself a
variable called row here.

00:34:55.949 --> 00:34:57.360
And this is a common paradigm.

00:34:57.360 --> 00:35:01.390
When you know you're iterating
over a file, specifically a CSV,

00:35:01.390 --> 00:35:04.500
it's common to think of
each line of it as being

00:35:04.500 --> 00:35:09.790
a row and each of the values therein
separated by commas as columns,

00:35:09.790 --> 00:35:10.570
so to speak.

00:35:10.570 --> 00:35:13.170
So I'm going to deliberately
name my variable row, just

00:35:13.170 --> 00:35:14.880
to be consistent with that convention.

00:35:14.880 --> 00:35:17.430
And now what do I want to print?

00:35:17.430 --> 00:35:19.140
Well, I'm going to go
ahead and say this.

00:35:19.140 --> 00:35:26.250
Print, how about the following, an f
string that starts with curly braces--

00:35:26.250 --> 00:35:29.610
well, how do I get access to
the first thing in that row?

00:35:29.610 --> 00:35:31.590
Well, the row is going
to have how many parts?

00:35:31.590 --> 00:35:35.580
Two, because if I'm splitting on
commas, and there's one comma per line,

00:35:35.580 --> 00:35:37.980
that's going to give me a
left part and a right part,

00:35:37.980 --> 00:35:41.100
like Hermione and Gryffindor,
Harry and Gryffindor.

00:35:41.100 --> 00:35:45.820
When I have a list like row, how do
I get access to individual values?

00:35:45.820 --> 00:35:47.320
Well, I can do this.

00:35:47.320 --> 00:35:50.310
I can say, row, bracket, 0.

00:35:50.310 --> 00:35:52.920
And that's going to go to the
first element of the list, which

00:35:52.920 --> 00:35:54.720
should hopefully be the student's name.

00:35:54.720 --> 00:35:57.240
Then after that, I'm
going to say, is in,

00:35:57.240 --> 00:36:01.830
and I'm going to have another curly
brace here for row, bracket, 1.

00:36:01.830 --> 00:36:03.705
And then I'm going to
close my whole quote.

00:36:03.705 --> 00:36:05.580
So it looks a little
cryptic at first glance.

00:36:05.580 --> 00:36:09.660
But most of this is just f string syntax
with curly braces to plug in values.

00:36:09.660 --> 00:36:11.430
And what values am I plugging in?

00:36:11.430 --> 00:36:15.210
Well, row, again, is a list, and
it has two elements, presumably--

00:36:15.210 --> 00:36:19.030
Hermione in one and Gryffindor
in the other, and so forth.

00:36:19.030 --> 00:36:22.440
So bracket 0 is the first
element because, remember,

00:36:22.440 --> 00:36:25.050
we start indexing at 0 in Python.

00:36:25.050 --> 00:36:27.520
And 1 is going to be the second element.

00:36:27.520 --> 00:36:30.330
So let me go ahead and run
this now and see what happens--

00:36:30.330 --> 00:36:35.880
python of students.py, Enter.

00:36:35.880 --> 00:36:37.993
And we see Hermione is in Gryffindor.

00:36:37.993 --> 00:36:38.910
Harry's in Gryffindor.

00:36:38.910 --> 00:36:39.960
Ron is in Gryffindor.

00:36:39.960 --> 00:36:41.970
And Draco is in Slytherin.

00:36:41.970 --> 00:36:48.180
So we have now implemented our own
code from scratch that actually parses,

00:36:48.180 --> 00:36:53.010
that is, reads and interprets
a CSV file ultimately here.
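
NOTE A sketch of students.py as described so far. The students.csv written
to a temporary folder and the sentences list are assumptions, added so the
example runs and can be checked on its own.
```python
import os
import tempfile
# Assumption: create students.csv in a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w") as file:
    file.write("Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n")
sentences = []  # hypothetical list, kept only so the output can be inspected
with open(path) as file:
    for line in file:
        row = line.rstrip().split(",")  # e.g. ["Hermione", "Gryffindor"]
        sentences.append(f"{row[0]} is in {row[1]}")
for sentence in sentences:
    print(sentence)
```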

00:36:53.010 --> 00:36:55.390
Now, let me pause to see
if there's any questions.

00:36:55.390 --> 00:36:59.080
But we'll make this even easier
to read in just a moment.

00:36:59.080 --> 00:37:03.090
Any questions on what we've just
done here by splitting by comma?

00:37:03.090 --> 00:37:08.610
AUDIENCE: So my question is, can we
edit any line of code any time we want?

00:37:08.610 --> 00:37:13.620
Or the only option that we
have is to append the lines?

00:37:13.620 --> 00:37:18.780
Or let's say, we want to,
let's say, change Harry's house

00:37:18.780 --> 00:37:22.500
to Slytherin or some other house.

00:37:22.500 --> 00:37:24.250
DAVID MALAN: Yeah, a
really good question.

00:37:24.250 --> 00:37:28.740
What if you want to, in Python,
change a line in the file and not just

00:37:28.740 --> 00:37:30.130
append to the end?

00:37:30.130 --> 00:37:32.290
You would have to implement
that logic yourself.

00:37:32.290 --> 00:37:35.880
So for instance, you could
imagine now opening the file

00:37:35.880 --> 00:37:39.660
and reading all of the contents
in, then maybe iterating over

00:37:39.660 --> 00:37:40.650
each of those lines.

00:37:40.650 --> 00:37:43.830
And as soon as you see that the
current name equals equals Harry,

00:37:43.830 --> 00:37:47.100
you could maybe change
his house to Slytherin.

00:37:47.100 --> 00:37:51.030
And then it would be up to you,
though, to write all of those changes

00:37:51.030 --> 00:37:52.060
back to the file.

00:37:52.060 --> 00:37:54.360
So in that case, you might
want to, in simplest form,

00:37:54.360 --> 00:37:56.610
read the file once and let it close.

00:37:56.610 --> 00:38:00.300
Then open it again, but open for
writing, and change the whole file.

00:38:00.300 --> 00:38:04.770
It's not really possible or easy to go
in and change just part of the file,

00:38:04.770 --> 00:38:05.760
though you can do it.

00:38:05.760 --> 00:38:09.630
It's easier to actually read the whole
file, make your changes in memory,

00:38:09.630 --> 00:38:11.100
then write the whole file out.

00:38:11.100 --> 00:38:13.920
But for larger files where
that might be quite slow,

00:38:13.920 --> 00:38:16.200
you can be more clever than that.
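
NOTE
A minimal sketch of the read, modify, then rewrite approach described above. The temporary directory and the way the sample rows are written out are assumptions made so the example is self-contained; only students.csv, Harry, and Slytherin come from the lecture.
```python
import os
import tempfile
# Assumption: create a small students.csv in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w") as file:
    file.write("Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n")
# Read the whole file into memory, changing Harry's house along the way.
rows = []
with open(path) as file:
    for line in file:
        name, house = line.rstrip().split(",")
        if name == "Harry":
            house = "Slytherin"
        rows.append(f"{name},{house}\n")
# Reopen the file for writing (which truncates it) and write every row back.
with open(path, "w") as file:
    file.writelines(rows)
# Read it once more to confirm the change.
with open(path) as file:
    updated = file.read()
print(updated)
```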

00:38:16.200 --> 00:38:19.980
Well, let me propose now that we clean
this up a little bit because I actually

00:38:19.980 --> 00:38:23.370
think this is a little cryptic to
read-- row, bracket, 0, row, bracket,

00:38:23.370 --> 00:38:27.090
1-- it's not that well-written
at the moment, I would say.

00:38:27.090 --> 00:38:32.050
But it turns out that when you have
a variable that's a list like row,

00:38:32.050 --> 00:38:35.250
you don't have to throw all of
those variables into a list.

00:38:35.250 --> 00:38:38.580
You can actually unpack
that whole sequence at once.

00:38:38.580 --> 00:38:42.630
That is to say, if you know that a
function like split returns a list,

00:38:42.630 --> 00:38:45.090
but you know in advance
that it's going to return

00:38:45.090 --> 00:38:48.330
two values in a list,
the first and the second,

00:38:48.330 --> 00:38:51.750
you don't have to throw them all into
a variable that itself is a list.

00:38:51.750 --> 00:38:55.840
You can actually unpack them
simultaneously into two variables,

00:38:55.840 --> 00:38:57.630
doing name, comma, house.

00:38:57.630 --> 00:39:01.680
So this is a nice Python technique
to not only create, but assign,

00:39:01.680 --> 00:39:05.580
automatically, in parallel,
two variables at once,

00:39:05.580 --> 00:39:06.880
rather than just one.

00:39:06.880 --> 00:39:10.230
So this will have the effect of
putting the name, Hermione, in the left,

00:39:10.230 --> 00:39:12.360
and it will have the effect
of putting Gryffindor,

00:39:12.360 --> 00:39:14.040
the house, in the right variable.

00:39:14.040 --> 00:39:15.643
And we now no longer have a row.

00:39:15.643 --> 00:39:18.810
We can now make our code a little more
readable by now literally just saying

00:39:18.810 --> 00:39:22.020
name down here and, for
instance, house down here.

00:39:22.020 --> 00:39:25.020
So just a little more readable,
even though, functionally, the code

00:39:25.020 --> 00:39:28.430
now is exactly the same.
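
NOTE
A small sketch comparing the indexed version with the unpacked version described above. The hard-coded line is an assumption standing in for one line read from students.csv.
```python
# Assumption: one line as it would be read from students.csv.
line = "Hermione,Gryffindor\n"
# Indexing version: keep the list and refer to its elements by position.
row = line.rstrip().split(",")
indexed = f"{row[0]} is in {row[1]}"
# Unpacking version: assign both elements to named variables at once.
name, house = line.rstrip().split(",")
unpacked = f"{name} is in {house}"
print(unpacked)
```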

00:39:28.430 --> 00:39:30.470
All right, so this now works.

00:39:30.470 --> 00:39:34.070
And I'll confirm as much by just running
it once more-- python of students.py,

00:39:34.070 --> 00:39:34.580
Enter.

00:39:34.580 --> 00:39:37.340
And we see that the text is as intended.

00:39:37.340 --> 00:39:39.590
But suppose, for the sake
of discussion, that I'd

00:39:39.590 --> 00:39:42.650
like to sort this list of output.

00:39:42.650 --> 00:39:46.310
I'd like to say hello, again, to
Draco first, then hello to Harry,

00:39:46.310 --> 00:39:47.960
then Hermione, then Ron.

00:39:47.960 --> 00:39:49.770
How can I go about doing this?

00:39:49.770 --> 00:39:52.520
Well, let's take some inspiration
from the previous example, where

00:39:52.520 --> 00:39:57.680
we were only dealing with names and,
instead, do it with these full phrases.

00:39:57.680 --> 00:39:59.480
So and so is in house.

00:39:59.480 --> 00:40:01.080
Well, let me go ahead and do this.

00:40:01.080 --> 00:40:05.660
I'm going to go ahead and start from scratch
and give myself a list called students,

00:40:05.660 --> 00:40:07.370
equal to an empty list, initially.

00:40:07.370 --> 00:40:14.060
And then with open students.csv as file,
I'm going to go ahead and say this--

00:40:14.060 --> 00:40:16.405
for line in file.

00:40:16.405 --> 00:40:19.280
And then below this, I'm going to
do exactly as before-- name, comma,

00:40:19.280 --> 00:40:23.240
house equals the current line, stripping
off the white space at the end,

00:40:23.240 --> 00:40:24.840
splitting it on a comma--

00:40:24.840 --> 00:40:26.670
so that's the exact same as before.

00:40:26.670 --> 00:40:32.180
But this time, before I go
about printing the sentence,

00:40:32.180 --> 00:40:34.370
I'm going to store it
temporarily in a list

00:40:34.370 --> 00:40:38.010
so that I can accumulate all of these
sentences and then sort them later.

00:40:38.010 --> 00:40:39.380
So let me go ahead and do this.

00:40:39.380 --> 00:40:42.770
Students, which is my list, .append--

00:40:42.770 --> 00:40:45.320
let me append the actual
sentence I want to show

00:40:45.320 --> 00:40:46.820
on the screen-- so another f string.

00:40:46.820 --> 00:40:50.640
So name is in house, just as before.

00:40:50.640 --> 00:40:52.520
But notice, I'm not
printing that sentence.

00:40:52.520 --> 00:40:56.600
I'm appending it to my list--
not a file, but to my list.

00:40:56.600 --> 00:40:58.050
Why am I doing this?

00:40:58.050 --> 00:41:00.140
Well, just because, as
before, I want to do this.

00:41:00.140 --> 00:41:04.070
For student in the
sorted students, I want

00:41:04.070 --> 00:41:07.590
to go ahead and print
out student, like this.

00:41:07.590 --> 00:41:11.900
Well, let me go ahead and run python
of students.py, and hit Enter now.

00:41:11.900 --> 00:41:14.713
And I think we'll see,
indeed, Draco is now first.

00:41:14.713 --> 00:41:15.380
Harry is second.

00:41:15.380 --> 00:41:16.310
Hermione is third.

00:41:16.310 --> 00:41:18.380
And Ron is fourth.
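
NOTE
A sketch of the sentence-sorting version just described. As an assumption, io.StringIO stands in for open("students.csv") so the example runs without a file on disk; the rows mirror the ones read aloud earlier.
```python
import io
# Assumption: in-memory text standing in for the contents of students.csv.
csv_text = "Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n"
students = []
with io.StringIO(csv_text) as file:
    for line in file:
        name, house = line.rstrip().split(",")
        # Accumulate the finished sentence instead of printing it right away.
        students.append(f"{name} is in {house}")
# Sorting the sentences works only because each sentence starts with the name.
for student in sorted(students):
    print(student)
```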

00:41:18.380 --> 00:41:21.980
But this is arguably a
little sloppy, right?

00:41:21.980 --> 00:41:25.490
It seems a little hackish that
I'm constructing these sentences.

00:41:25.490 --> 00:41:29.150
And even though I technically
want to sort by name,

00:41:29.150 --> 00:41:32.490
I'm technically sorting by
these whole English sentences.

00:41:32.490 --> 00:41:33.530
So it's not wrong.

00:41:33.530 --> 00:41:36.590
It's achieving the intended
result, but it's not really

00:41:36.590 --> 00:41:39.480
well designed because I'm just
getting lucky that English

00:41:39.480 --> 00:41:40.730
is reading from left to right.

00:41:40.730 --> 00:41:43.700
And therefore, when I print
this out, it's sorting properly.

00:41:43.700 --> 00:41:46.760
It would be better, really, to come
up with a technique for sorting

00:41:46.760 --> 00:41:50.600
by the students' names, not
by some English sentence

00:41:50.600 --> 00:41:53.360
that I've constructed here on line 6.

00:41:53.360 --> 00:41:57.200
So to achieve this, I'm going to
need to make my life more complicated

00:41:57.200 --> 00:41:57.980
for a moment.

00:41:57.980 --> 00:42:02.330
And I'm going to need to collect
information about each student

00:42:02.330 --> 00:42:04.950
before I bother
assembling that sentence.

00:42:04.950 --> 00:42:06.750
So let me propose that we do this.

00:42:06.750 --> 00:42:09.960
Let me go ahead and undo
these last few lines of code

00:42:09.960 --> 00:42:14.480
so that we currently have two
variables, name and house, each of which

00:42:14.480 --> 00:42:16.560
has the student's name and
house, respectively.

00:42:16.560 --> 00:42:19.130
And we still have our
global variable, students.

00:42:19.130 --> 00:42:20.360
But let me do this.

00:42:20.360 --> 00:42:22.610
Recall that Python
supports dictionaries.

00:42:22.610 --> 00:42:25.770
And dictionaries are just
collections of keys and values.

00:42:25.770 --> 00:42:28.160
So you can associate
something with something else,

00:42:28.160 --> 00:42:32.000
like, a name with Hermione,
like, a house with Gryffindor.

00:42:32.000 --> 00:42:33.660
That really is a dictionary.

00:42:33.660 --> 00:42:34.610
So let me do this.

00:42:34.610 --> 00:42:39.950
Let me temporarily create a dictionary
that stores this association of name

00:42:39.950 --> 00:42:40.950
with house.

00:42:40.950 --> 00:42:42.240
Let me go ahead and do this.

00:42:42.240 --> 00:42:45.950
Let me say that the student here is
going to be represented initially

00:42:45.950 --> 00:42:46.908
by an empty dictionary.

00:42:46.908 --> 00:42:49.575
And just like you can create an
empty list with square brackets,

00:42:49.575 --> 00:42:51.990
you can create an empty
dictionary with curly braces.

00:42:51.990 --> 00:42:57.050
So give me an empty dictionary that
will soon have two keys, name and house.

00:42:57.050 --> 00:42:58.140
How do I do that?

00:42:58.140 --> 00:43:01.070
Well, I could do it this
way-- student, open bracket,

00:43:01.070 --> 00:43:05.870
name equals the student's name
that we got from the line.

00:43:05.870 --> 00:43:10.490
Student, bracket, house equals the
house that we got from the line.

00:43:10.490 --> 00:43:14.450
And now I'm going to append
to the students list--

00:43:14.450 --> 00:43:17.660
plural-- that particular student.

00:43:17.660 --> 00:43:18.920
Now, why have I done this?

00:43:18.920 --> 00:43:21.060
I've admittedly made my
code more complicated.

00:43:21.060 --> 00:43:23.870
It's more lines of code,
but I've now collected

00:43:23.870 --> 00:43:27.560
all of the information I have
about students while still keeping

00:43:27.560 --> 00:43:29.960
track-- what's a name, what's a house.

00:43:29.960 --> 00:43:34.100
The list, meanwhile, has all of the
students' names and houses together.

00:43:34.100 --> 00:43:35.630
Now, why have I done this?

00:43:35.630 --> 00:43:38.150
Well, let me, for the moment,
just do something simple.

00:43:38.150 --> 00:43:43.220
Let me do for student in students,
and let me very simply now say, print

00:43:43.220 --> 00:43:48.980
the following f string, the
current student with this name

00:43:48.980 --> 00:43:53.390
is in this current student's house.

00:43:53.390 --> 00:43:55.460
And now notice one detail.

00:43:55.460 --> 00:43:59.390
Inside of this f string, I'm
using my curly braces, as always.

00:43:59.390 --> 00:44:03.590
I'm using, inside of those curly braces,
the name of a variable, as always.

00:44:03.590 --> 00:44:07.970
But then I'm using not bracket 0 or
1 because these are dictionaries now,

00:44:07.970 --> 00:44:08.840
not lists.

00:44:08.840 --> 00:44:16.090
But why am I using single quotes to
surround house and to surround name?

00:44:16.090 --> 00:44:25.850
Why single quotes inside of this
f string to access those keys?

00:44:25.850 --> 00:44:30.960
AUDIENCE: Yes, because you have
double quotes in that line 12.

00:44:30.960 --> 00:44:34.222
And so you have to tell
Python to differentiate.

00:44:34.222 --> 00:44:35.930
DAVID MALAN: Exactly,
because I'm already

00:44:35.930 --> 00:44:39.620
using double quotes outside of the
f string, if I want to put quotes

00:44:39.620 --> 00:44:41.750
around any strings on
the inside, which I do

00:44:41.750 --> 00:44:44.810
need to do for dictionaries
because, recall, when you index

00:44:44.810 --> 00:44:47.570
into a dictionary, you don't
use numbers like lists--

00:44:47.570 --> 00:44:49.100
0, 1, 2, onward--

00:44:49.100 --> 00:44:51.760
you, instead, use strings,
which need to be quoted.

00:44:51.760 --> 00:44:53.510
But if you're already
using double quotes,

00:44:53.510 --> 00:44:55.820
it's easiest to then use
single quotes on the inside,

00:44:55.820 --> 00:44:59.360
so Python doesn't get confused
about what lines up with what.
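
NOTE
A sketch of the list-of-dictionaries version with single quotes inside a double-quoted f string. The two hard-coded lines are assumptions standing in for the file. Python 3.12 and later relax the quoting restriction, but single quotes inside double quotes work on every version.
```python
students = []
# Assumption: two sample lines instead of reading students.csv.
for line in ["Hermione,Gryffindor", "Harry,Gryffindor"]:
    name, house = line.split(",")
    student = {}
    student["name"] = name
    student["house"] = house
    students.append(student)
sentences = []
for student in students:
    # Double quotes delimit the f string, so the dictionary keys use single quotes.
    sentences.append(f"{student['name']} is in {student['house']}")
    print(sentences[-1])
```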

00:44:59.360 --> 00:45:02.120
So at the moment, when
I run this program,

00:45:02.120 --> 00:45:04.130
it's going to print out those hellos.

00:45:04.130 --> 00:45:05.990
But they're not yet sorted.

00:45:05.990 --> 00:45:10.340
In fact, what I now have
is a list of dictionaries,

00:45:10.340 --> 00:45:12.110
and nothing is yet sorted.

00:45:12.110 --> 00:45:14.540
But let me tighten up the
code too to point out that it

00:45:14.540 --> 00:45:16.340
doesn't need to be quite as verbose.

00:45:16.340 --> 00:45:20.210
If you're in the habit of creating an
empty dictionary, like this on line 6,

00:45:20.210 --> 00:45:23.480
and then immediately putting
in two keys, name and house,

00:45:23.480 --> 00:45:26.315
with the values name and
house, respectively, you

00:45:26.315 --> 00:45:27.690
can actually do this all at once.

00:45:27.690 --> 00:45:29.870
So let me show you a
slightly different syntax.

00:45:29.870 --> 00:45:30.920
I can do this.

00:45:30.920 --> 00:45:34.550
Give me a variable called student,
and let me use curly braces

00:45:34.550 --> 00:45:35.760
on the right-hand side here.

00:45:35.760 --> 00:45:38.780
But instead of leaving them empty,
let's just define those keys

00:45:38.780 --> 00:45:40.070
and those values now.

00:45:40.070 --> 00:45:45.620
Quote/unquote name will be name, and
quote/unquote house will be house.

00:45:45.620 --> 00:45:49.850
This achieves the exact same effect
in one line instead of three.

00:45:49.850 --> 00:45:53.692
It creates a new non-empty
dictionary containing a name key,

00:45:53.692 --> 00:45:55.400
the value of which is
the student's name,

00:45:55.400 --> 00:45:58.610
and a house key, the value of
which is the student's house.
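
NOTE
A short sketch showing that the one-line dictionary literal builds the same dictionary as the three-line version. The sample values are assumptions for illustration.
```python
name, house = "Hermione", "Gryffindor"
# Three-line version: start empty, then add each key.
verbose = {}
verbose["name"] = name
verbose["house"] = house
# One-line version: define the keys and values inside the curly braces.
student = {"name": name, "house": house}
print(student == verbose)
```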

00:45:58.610 --> 00:45:59.870
Nothing else needs to change.

00:45:59.870 --> 00:46:03.955
That will still just work so that if
I, again, run python of students.py,

00:46:03.955 --> 00:46:06.080
I'm still seeing those
greetings, but they're still

00:46:06.080 --> 00:46:08.960
not quite actually sorted.

00:46:08.960 --> 00:46:12.290
Well, what might I go about
doing here in order to--

00:46:12.290 --> 00:46:15.410
what could I do to
improve upon this further?

00:46:15.410 --> 00:46:19.850
Well, we need some mechanism
now of sorting those students.

00:46:19.850 --> 00:46:22.820
But unfortunately, you can't do this.

00:46:22.820 --> 00:46:28.413
We can't sort all of the students now
because those students are not names

00:46:28.413 --> 00:46:29.330
like they were before.

00:46:29.330 --> 00:46:31.310
They aren't sentences
like they were before.

00:46:31.310 --> 00:46:34.400
Each of the students is a
dictionary, and it's not obvious

00:46:34.400 --> 00:46:37.830
how you would sort a
dictionary inside of a list.

00:46:37.830 --> 00:46:40.280
So ideally, what do we want to do?

00:46:40.280 --> 00:46:45.440
If at the moment we hit line 9, we
have a list of all of these students,

00:46:45.440 --> 00:46:48.620
and inside of that list is
one dictionary per student,

00:46:48.620 --> 00:46:52.040
and each of those dictionaries
has two keys, name and house,

00:46:52.040 --> 00:46:57.050
wouldn't it be nice if there were a way
in code to tell Python, sort this list

00:46:57.050 --> 00:46:59.960
by looking at this key
in each dictionary?

00:46:59.960 --> 00:47:03.830
Because that would give us the ability
to sort either by name, or even

00:47:03.830 --> 00:47:07.800
by house, or even by any other
field that we add to that file.

00:47:07.800 --> 00:47:09.980
So it turns out, we can do this.

00:47:09.980 --> 00:47:14.000
We can tell the sorted function
not just to reverse things or not.

00:47:14.000 --> 00:47:16.250
It takes another positional--

00:47:16.250 --> 00:47:19.520
it takes another named
parameter called key,

00:47:19.520 --> 00:47:23.990
where you can specify what key
should be used in order to sort

00:47:23.990 --> 00:47:25.370
some list of dictionaries.

00:47:25.370 --> 00:47:27.410
And I'm going to
propose that we do this.

00:47:27.410 --> 00:47:31.940
I'm going to first define a function--
temporarily, for now-- called get_name.

00:47:31.940 --> 00:47:35.090
And this function's purpose
in life, given a student,

00:47:35.090 --> 00:47:38.480
is to, quite simply,
return the student's name

00:47:38.480 --> 00:47:40.500
from that particular dictionary.

00:47:40.500 --> 00:47:43.910
So if student is a dictionary,
this is going to return literally

00:47:43.910 --> 00:47:45.470
the student's name, and that's it.

00:47:45.470 --> 00:47:48.530
That's the sole purpose
of this function in life.

00:47:48.530 --> 00:47:50.120
What do I now want to do?

00:47:50.120 --> 00:47:52.670
Well, now that I have a
function that, given a student,

00:47:52.670 --> 00:47:56.130
will return to me the
student's name, I can do this.

00:47:56.130 --> 00:47:59.630
I can change sorted to
say, use a key that's

00:47:59.630 --> 00:48:03.350
equal to whatever the
return value of get_name is.

00:48:03.350 --> 00:48:05.810
And this now is a feature of Python.

00:48:05.810 --> 00:48:12.300
Python allows you to pass functions
as arguments into other functions.

00:48:12.300 --> 00:48:14.180
So get_name is a function.

00:48:14.180 --> 00:48:15.710
Sorted is a function.

00:48:15.710 --> 00:48:22.610
And I'm passing in get_name to sorted
as the value of that key parameter.

00:48:22.610 --> 00:48:24.540
Now, why am I doing that?

00:48:24.540 --> 00:48:26.600
Well, if you think of
the get_name function,

00:48:26.600 --> 00:48:30.080
it's just a block of code that
will get the name of a student.

00:48:30.080 --> 00:48:33.410
That's handy because that's the
capability that sorted needs.

00:48:33.410 --> 00:48:36.470
When given a list of students,
each of which is a dictionary,

00:48:36.470 --> 00:48:38.990
sorted needs to know, how do
I get the name of the student?

00:48:38.990 --> 00:48:40.882
In order to do alphabetical
sorting for you.

00:48:40.882 --> 00:48:42.590
The authors of Python
didn't know that we

00:48:42.590 --> 00:48:44.880
were going to be creating
students here in this class,

00:48:44.880 --> 00:48:47.540
so they couldn't have anticipated
writing code in advance

00:48:47.540 --> 00:48:51.770
that specifically sorts on a field
called student, let alone called name,

00:48:51.770 --> 00:48:53.150
let alone house.

00:48:53.150 --> 00:48:54.950
So what did they do?

00:48:54.950 --> 00:48:57.590
They instead built into
the sorted function

00:48:57.590 --> 00:49:01.490
this named parameter key that
allows us, all these years later,

00:49:01.490 --> 00:49:06.060
to tell their function sorted how
to sort this list of dictionaries.

00:49:06.060 --> 00:49:07.910
So now watch what happens.

00:49:07.910 --> 00:49:11.540
If I run python of
students.py and hit Enter,

00:49:11.540 --> 00:49:14.150
I now have a sorted list of output.

00:49:14.150 --> 00:49:14.810
Why?

00:49:14.810 --> 00:49:17.750
Because now that list
of dictionaries has all

00:49:17.750 --> 00:49:20.570
been sorted by the student's name.
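
NOTE
A sketch of sorting the list of dictionaries by passing get_name as the key argument to sorted. The list is written out by hand here as an assumption, rather than read from students.csv.
```python
# Assumption: the dictionaries that would have been built from students.csv.
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]
def get_name(student):
    # Given one student dictionary, return the string to sort by.
    return student["name"]
# Pass the function itself, without parentheses, so sorted can call it.
ordered = sorted(students, key=get_name)
for student in ordered:
    print(f"{student['name']} is in {student['house']}")
```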

00:49:20.570 --> 00:49:22.020
I can further do this.

00:49:22.020 --> 00:49:24.840
If, as before, we want to reverse
the whole thing by saying reverse

00:49:24.840 --> 00:49:26.740
equals true, we can do that too.

00:49:26.740 --> 00:49:28.980
Let me rerun python of
students.py, and hit Enter.

00:49:28.980 --> 00:49:29.880
Now it's reversed.

00:49:29.880 --> 00:49:32.610
Now it's Ron, then
Hermione, Harry, and Draco.

00:49:32.610 --> 00:49:34.590
But we can do something
different as well.

00:49:34.590 --> 00:49:39.150
What if I want to sort, for
instance, by house name reversed?

00:49:39.150 --> 00:49:40.230
I could do this.

00:49:40.230 --> 00:49:43.110
I could change this function
from get_name to get_house.

00:49:43.110 --> 00:49:46.320
I could change the implementation
up here to be get_house.

00:49:46.320 --> 00:49:49.660
And I can return not the student's
name but the student's house.

00:49:49.660 --> 00:49:56.250
And so now notice, if I run python
of students.py, Enter, notice now

00:49:56.250 --> 00:49:59.730
it is sorted by house in reverse order.

00:49:59.730 --> 00:50:02.400
Slytherin is first, and then Gryffindor.

00:50:02.400 --> 00:50:07.110
If I get rid of the reverse but keep
the get_house and rerun this program,

00:50:07.110 --> 00:50:09.390
now it's sorted by house.

00:50:09.390 --> 00:50:11.970
Gryffindor is first,
and Slytherin is last.
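
NOTE
A sketch of the variations just shown: reversing the sort by name, then sorting by house with and without reverse. The hand-written list is an assumption standing in for the data read from students.csv.
```python
# Assumption: the dictionaries that would have been built from students.csv.
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]
def get_name(student):
    return student["name"]
def get_house(student):
    return student["house"]
# Same key as before, but in descending order.
by_name_reversed = sorted(students, key=get_name, reverse=True)
# Swap the key function to sort on a different field.
by_house = sorted(students, key=get_house)
by_house_reversed = sorted(students, key=get_house, reverse=True)
for student in by_house_reversed:
    print(f"{student['name']} is in {student['house']}")
```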

00:50:11.970 --> 00:50:15.990
And the upside now of this is, because
I'm using this list of dictionaries

00:50:15.990 --> 00:50:19.620
and keeping the students data
together until the last minute

00:50:19.620 --> 00:50:21.780
when I'm finally doing
the printing, I now

00:50:21.780 --> 00:50:25.800
have full control over the information
itself, and I can sort by this or that.

00:50:25.800 --> 00:50:29.100
I don't have to construct those
sentences in advance, like I

00:50:29.100 --> 00:50:31.587
rather hackishly did the first time.

00:50:31.587 --> 00:50:32.670
All right, that was a lot.

00:50:32.670 --> 00:50:36.000
Let me pause here to see
if there are questions.

00:50:36.000 --> 00:50:40.050
AUDIENCE: So when we are
sorting the files, every time,

00:50:40.050 --> 00:50:48.090
should we use the loops, or a text
dictionary, or any kind of list?

00:50:48.090 --> 00:50:55.440
Can we sort by just sorting, not
looping or any kind of stuff?

00:50:55.440 --> 00:50:58.890
DAVID MALAN: A good question,
and the short answer is, with Python

00:50:58.890 --> 00:51:00.630
alone, you're the programmer.

00:51:00.630 --> 00:51:01.890
You need to do the sorting.

00:51:01.890 --> 00:51:05.160
With libraries and other
techniques, absolutely.

00:51:05.160 --> 00:51:08.100
You can do more of this
automatically because someone else

00:51:08.100 --> 00:51:09.180
has written that code.

00:51:09.180 --> 00:51:12.420
What we're doing at the moment is doing
everything from scratch ourselves.

00:51:12.420 --> 00:51:15.045
But absolutely, with other
functions or libraries, some of this

00:51:15.045 --> 00:51:18.120
could be made more easily done.

00:51:18.120 --> 00:51:20.590
Some of this could be made easier.

00:51:20.590 --> 00:51:23.400
Other questions on this technique here?

00:51:23.400 --> 00:51:28.050
AUDIENCE: If equal to the
return value of the function,

00:51:28.050 --> 00:51:36.152
can it be equal to just
a variable or a value?

00:51:36.152 --> 00:51:37.110
DAVID MALAN: Well, yes.

00:51:37.110 --> 00:51:39.240
It should equal a value.

00:51:39.240 --> 00:51:42.630
And I should clarify, actually,
since this was not obvious.

00:51:42.630 --> 00:51:46.950
So when you pass in a function
like get_name or get_house

00:51:46.950 --> 00:51:49.620
to the sorted function
as the value of key,

00:51:49.620 --> 00:51:55.830
that function is automatically
called by the sorted function for you

00:51:55.830 --> 00:51:58.740
on each of the dictionaries in the list.

00:51:58.740 --> 00:52:02.250
And it uses the return value
of get_name or get_house

00:52:02.250 --> 00:52:07.080
to decide what strings to actually
use to compare in order to decide

00:52:07.080 --> 00:52:09.150
which is alphabetically correct.

00:52:09.150 --> 00:52:12.120
So this function, which
you pass just by name, you

00:52:12.120 --> 00:52:14.790
do not pass in
parentheses at the end, is

00:52:14.790 --> 00:52:18.690
called by the sorted function
in order to figure out for you

00:52:18.690 --> 00:52:21.790
how to compare these same values.
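
NOTE
A sketch that makes the automatic calls visible: this get_name records each call in a list before returning the name. The calls list is an assumption added only for illustration; it is not part of the lecture's code.
```python
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]
calls = []
def get_name(student):
    # Record that sorted called this function for this student.
    calls.append(student["name"])
    return student["name"]
# We never call get_name ourselves; sorted calls it once per dictionary.
ordered = sorted(students, key=get_name)
print(calls)
print([student["name"] for student in ordered])
```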

00:52:21.790 --> 00:52:25.230
AUDIENCE: How can we
use nested dictionaries?

00:52:25.230 --> 00:52:28.920
I have read about nested dictionaries.

00:52:28.920 --> 00:52:31.500
What is the difference
between nested dictionaries

00:52:31.500 --> 00:52:34.380
and the dictionary inside a list?

00:52:34.380 --> 00:52:35.460
I think it is that.

00:52:35.460 --> 00:52:36.930
DAVID MALAN: Sure.

00:52:36.930 --> 00:52:39.280
So we are using a list of dictionaries.

00:52:39.280 --> 00:52:39.780
Why?

00:52:39.780 --> 00:52:42.450
Because each of those
dictionaries represents a student.

00:52:42.450 --> 00:52:45.270
And a student has a name and a
house, and we want to, I claim,

00:52:45.270 --> 00:52:46.782
maintain that association.

00:52:46.782 --> 00:52:49.740
And it's a list of students because
we've got multiple students-- four,

00:52:49.740 --> 00:52:50.580
in this case.

00:52:50.580 --> 00:52:54.570
You could create a structure that
is a dictionary of dictionaries.

00:52:54.570 --> 00:52:56.700
But I would argue, it just
doesn't solve a problem.

00:52:56.700 --> 00:52:58.367
I don't need a dictionary of dictionaries.

00:52:58.367 --> 00:53:00.660
I need a list of
key-value pairs right now.

00:53:00.660 --> 00:53:01.800
That's all.

00:53:01.800 --> 00:53:05.460
So let me propose, if we go
back to students.py here,

00:53:05.460 --> 00:53:10.140
and we revert back to the approach
where we have get_name as the function,

00:53:10.140 --> 00:53:14.700
both used and defined here, and that
function returns the student's name,

00:53:14.700 --> 00:53:19.920
what happens, to be clear, is that the
sorted function will use the value

00:53:19.920 --> 00:53:22.020
of key-- get_name, in this case--

00:53:22.020 --> 00:53:25.890
calling that function on
every dictionary in the list

00:53:25.890 --> 00:53:27.540
that it's supposed to sort.

00:53:27.540 --> 00:53:30.930
And that function,
get_name, returns the string

00:53:30.930 --> 00:53:33.600
that sorted will actually
use to decide whether things

00:53:33.600 --> 00:53:36.630
go in this order, left-right,
or in this order, right-left.

00:53:36.630 --> 00:53:39.790
It alphabetizes these things
based on that return value.

00:53:39.790 --> 00:53:43.020
So notice that I'm not calling
the function get_name here

00:53:43.020 --> 00:53:43.920
with parentheses.

00:53:43.920 --> 00:53:47.340
I'm passing it in only by its
name so that the sorted function

00:53:47.340 --> 00:53:50.520
can call that get_name function for me.

00:53:50.520 --> 00:53:53.940
Now, it turns out, as always,
if you're defining something,

00:53:53.940 --> 00:53:57.750
be it a variable or, in this case, a
function, and then immediately using

00:53:57.750 --> 00:54:01.530
it but never, once again, needing
the name of that function,

00:54:01.530 --> 00:54:04.950
like, get_name, we can actually
tighten this code up further.

00:54:04.950 --> 00:54:06.300
I can actually do this.

00:54:06.300 --> 00:54:09.180
I can get rid of the get_name
function altogether,

00:54:09.180 --> 00:54:12.750
just like I could get rid of a
variable that isn't strictly necessary.

00:54:12.750 --> 00:54:16.350
And instead of passing key,
the name of a function,

00:54:16.350 --> 00:54:19.680
I can actually pass key
what's called a lambda

00:54:19.680 --> 00:54:22.410
function, which is an anonymous
function, a function that

00:54:22.410 --> 00:54:23.460
just has no name.

00:54:23.460 --> 00:54:24.000
Why?

00:54:24.000 --> 00:54:27.150
Because you don't need to give it a name
if you're only going to call it in one

00:54:27.150 --> 00:54:27.690
place.

00:54:27.690 --> 00:54:30.220
And the syntax for this in
Python is a little weird.

00:54:30.220 --> 00:54:35.100
But if I do key equals literally
the word lambda, then something

00:54:35.100 --> 00:54:37.560
like student, which is
the name of the parameter

00:54:37.560 --> 00:54:41.550
I expect this function to take, and
then I don't even type the Return key.

00:54:41.550 --> 00:54:45.150
I instead just say,
student, bracket, name.

00:54:45.150 --> 00:54:47.620
So what am I doing here with my code?

00:54:47.620 --> 00:54:52.560
This code here that I've highlighted
is equivalent to the get_name function

00:54:52.560 --> 00:54:54.270
I implemented a moment ago.

00:54:54.270 --> 00:54:56.320
The syntax is admittedly
a little different.

00:54:56.320 --> 00:54:57.330
I don't use def.

00:54:57.330 --> 00:54:59.580
I didn't even give it
a name, like get_name.

00:54:59.580 --> 00:55:03.850
I, instead, am using this other keyword
in Python called lambda, which says,

00:55:03.850 --> 00:55:06.660
hey, Python, here comes a
function, but it has no name.

00:55:06.660 --> 00:55:07.650
It's anonymous.

00:55:07.650 --> 00:55:10.050
That function takes a parameter.

00:55:10.050 --> 00:55:11.520
I could call it anything I want.

00:55:11.520 --> 00:55:12.580
I'm calling it student.

00:55:12.580 --> 00:55:13.080
Why?

00:55:13.080 --> 00:55:16.230
Because this function
that's passed in as key

00:55:16.230 --> 00:55:20.010
is called on every one of
the students in that list,

00:55:20.010 --> 00:55:22.200
every one of the
dictionaries in that list.

00:55:22.200 --> 00:55:24.990
What do I want this
anonymous function to return?

00:55:24.990 --> 00:55:28.560
Well, given a student, I want
to index into that dictionary

00:55:28.560 --> 00:55:32.910
and access their name so that the
string Hermione, and Harry, and Ron,

00:55:32.910 --> 00:55:34.900
and Draco is ultimately returned.

00:55:34.900 --> 00:55:37.680
And that's what the sorted
function uses to decide

00:55:37.680 --> 00:55:42.450
how to sort these bigger dictionaries
that have other keys, like house,

00:55:42.450 --> 00:55:43.600
as well.

00:55:43.600 --> 00:55:47.640
So if I now go back to my terminal
window and run python of students.py,

00:55:47.640 --> 00:55:52.140
it still seems to work the same, but
it's arguably a little better design

00:55:52.140 --> 00:55:55.110
because I didn't waste lines of code
by defining some other function,

00:55:55.110 --> 00:55:57.180
calling it in one and only one place.

00:55:57.180 --> 00:56:00.948
I've done it all sort of
in one breath, if you will.
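
NOTE
A sketch of the lambda version described above, which replaces the separately defined get_name function. The hand-written list is an assumption standing in for the data read from students.csv.
```python
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]
# The lambda takes one parameter, student, and evaluates to student["name"].
ordered = sorted(students, key=lambda student: student["name"])
for student in ordered:
    print(f"{student['name']} is in {student['house']}")
```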

00:56:00.948 --> 00:56:03.990
All right, let me pause here to see
if there are any questions specifically

00:56:03.990 --> 00:56:10.470
about lambda, or anonymous functions,
and this tightening up of the code.

00:56:10.470 --> 00:56:14.850
AUDIENCE: I have a question, like
whether we could define lambda twice.

00:56:14.850 --> 00:56:17.040
DAVID MALAN: You can use lambda twice.

00:56:17.040 --> 00:56:19.890
You can create as many anonymous
functions as you'd like.

00:56:19.890 --> 00:56:22.710
And you generally use them
in contexts like this,

00:56:22.710 --> 00:56:25.390
where you want to pass
to some other function

00:56:25.390 --> 00:56:27.960
a function that itself
does not need a name.

00:56:27.960 --> 00:56:30.570
So you can absolutely use
it in more than one place.

00:56:30.570 --> 00:56:32.460
I just have only one use case for it.

00:56:32.460 --> 00:56:36.390
How about one other question on lambda
or anonymous functions specifically?

00:56:36.390 --> 00:56:43.900
AUDIENCE: What if our lambda would
take more than one line, for example?

00:56:43.900 --> 00:56:45.900
DAVID MALAN: Sure, if
your lambda function takes

00:56:45.900 --> 00:56:48.070
multiple parameters, that is fine.

00:56:48.070 --> 00:56:52.350
You can simply specify commas followed
by the names of those parameters,

00:56:52.350 --> 00:56:55.960
maybe x and y or so forth,
after the name student.

00:56:55.960 --> 00:56:58.080
So here too, lambda
looks a little different

00:56:58.080 --> 00:57:00.255
from def in that you
don't have parentheses,

00:57:00.255 --> 00:57:02.880
you don't have the keyword def,
you don't have a function name.

00:57:02.880 --> 00:57:05.080
But ultimately, they
achieve that same effect.

00:57:05.080 --> 00:57:08.940
They create a function anonymously
and allow you to pass it in,

00:57:08.940 --> 00:57:11.020
for instance, as some value here.
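
NOTE
A sketch of a lambda with more than one parameter. The use of functools.reduce is an assumption chosen for illustration, since it expects a two-parameter function; it does not appear in the lecture. Note that a lambda's body is limited to a single expression, so logic that needs several statements calls for def instead.
```python
from functools import reduce
# A lambda with two parameters, x and y, separated by a comma.
total = reduce(lambda x, y: x + y, [1, 2, 3, 4])
print(total)
```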

00:57:11.020 --> 00:57:14.040
So let's now change
students.csv to contain

00:57:14.040 --> 00:57:17.700
not students' houses at Hogwarts,
but their homes where they grew up.

00:57:17.700 --> 00:57:21.120
So Draco, for instance,
grew up in Malfoy Manor.

00:57:21.120 --> 00:57:24.090
Ron grew up in The Burrow.

00:57:24.090 --> 00:57:29.640
Harry grew up in Number
Four, Privet Drive.

00:57:29.640 --> 00:57:33.117
And according to the internet, no
one knows where Hermione grew up.

00:57:33.117 --> 00:57:35.950
The movies apparently took certain
liberties with where she grew up.

00:57:35.950 --> 00:57:37.658
So for this purpose,
we're actually going

00:57:37.658 --> 00:57:40.900
to remove Hermione because it is
unknown exactly where she was born.

00:57:40.900 --> 00:57:43.030
So we still have some three students.

00:57:43.030 --> 00:57:47.550
But if anyone can spot
the potential problem now,

00:57:47.550 --> 00:57:49.738
how might this be a bad thing?

00:57:49.738 --> 00:57:51.780
Well, let's go and try
and run our own code here.

00:57:51.780 --> 00:57:53.940
Let me go back to students.py here.

00:57:53.940 --> 00:57:56.340
And let me propose that I
just change my semantics

00:57:56.340 --> 00:57:59.640
because I'm now not thinking about
Hogwarts houses but the students'

00:57:59.640 --> 00:58:00.158
own homes.

00:58:00.158 --> 00:58:01.950
So I'm just going to
change some variables.

00:58:01.950 --> 00:58:06.000
I'm going to change this house
to a home, this house to a home,

00:58:06.000 --> 00:58:07.500
as well as this one here.

00:58:07.500 --> 00:58:09.720
I'm still going to sort
the students by name,

00:58:09.720 --> 00:58:13.950
but I'm going to say that they're not
in a house, but rather, from a home.

00:58:13.950 --> 00:58:17.460
So I've just changed the names of my
variables and my grammar in English

00:58:17.460 --> 00:58:20.400
here, ultimately, to print
out that, for instance, Harry

00:58:20.400 --> 00:58:23.860
is from Number Four,
Privet Drive, and so forth.

00:58:23.860 --> 00:58:25.800
But let's see what
happens here when I run

00:58:25.800 --> 00:58:30.930
python of this version of students.py,
having changed students.csv

00:58:30.930 --> 00:58:33.360
to contain those homes and not houses.

00:58:33.360 --> 00:58:34.854
Enter.

00:58:34.854 --> 00:58:40.770
Huh, our first value error, like
the program just doesn't work.

00:58:40.770 --> 00:58:43.340
What might explain this value error?

00:58:43.340 --> 00:58:45.920
The explanation of
which rather cryptically

00:58:45.920 --> 00:58:48.410
is, too many values to unpack.

00:58:48.410 --> 00:58:52.520
And the line in question is
this one involving split.

00:58:52.520 --> 00:58:57.230
How, all of a sudden, after all of
these successful runs of this program,

00:58:57.230 --> 00:59:00.260
did line 5 suddenly now break?

00:59:00.260 --> 00:59:04.100
AUDIENCE: In the line in
students.csv, you have three values.

00:59:04.100 --> 00:59:07.842
There's a line that you have
three values and in students.

00:59:07.842 --> 00:59:09.800
DAVID MALAN: Yeah, I
spent a lot of time trying

00:59:09.800 --> 00:59:12.800
to figure out where every
student should be from so that we

00:59:12.800 --> 00:59:14.540
could create this problem for us.

00:59:14.540 --> 00:59:16.940
And wonderfully, like, the
first sentence of the book

00:59:16.940 --> 00:59:19.070
is Number Four, Privet Drive.

00:59:19.070 --> 00:59:23.160
And so the fact that address has
a comma in it is problematic.

00:59:23.160 --> 00:59:23.660
Why?

00:59:23.660 --> 00:59:27.200
Because you and I decided sometime
ago to just standardize on commas--

00:59:27.200 --> 00:59:33.530
CSV, Comma-Separated
Values-- to denote the--

00:59:33.530 --> 00:59:37.800
we standardized on commas in order
to delineate one value from another.

00:59:37.800 --> 00:59:41.720
And if we have commas grammatically
in the student's home,

00:59:41.720 --> 00:59:44.750
we're clearly confusing
it as this special symbol.

00:59:44.750 --> 00:59:47.690
And the split function
is now, for just Harry,

00:59:47.690 --> 00:59:50.870
trying to split it into
three values, not just two.

00:59:50.870 --> 00:59:53.660
And that's why there's
too many values to unpack

00:59:53.660 --> 00:59:57.920
because we're only trying to assign
two variables, name and house.
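
NOTE
Editor's sketch of the failure described here, assuming Harry's row is stored unquoted. The variable names line, parts, and message are illustrative; the exact wording of the error text can vary between Python versions.
```python
# Harry's row as it might appear in students.csv, with an unquoted comma in the home.
line = "Harry,Number Four, Privet Drive"
parts = line.rstrip().split(",")
print(parts)  # three pieces, because the address itself contains a comma
try:
    name, home = parts  # three values cannot be unpacked into two variables
except ValueError as error:
    message = str(error)
    print("ValueError:", message)
```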

00:59:57.920 --> 00:59:59.460
Now, what could we do here?

00:59:59.460 --> 01:00:02.120
Well, we could just change
our approach, for instance.

01:00:02.120 --> 01:00:08.540
One paradigm that is not uncommon is
to use something a little less common,

01:00:08.540 --> 01:00:10.130
like a vertical bar.

01:00:10.130 --> 01:00:13.550
So I could go in and change all
of my commas to vertical bars.

01:00:13.550 --> 01:00:15.710
That, too, could eventually
come back to bite us

01:00:15.710 --> 01:00:18.410
in that if my file eventually
has vertical bars somewhere,

01:00:18.410 --> 01:00:19.520
it might still break.

01:00:19.520 --> 01:00:21.530
So maybe that's not the best approach.

01:00:21.530 --> 01:00:23.370
I could maybe do something like this.

01:00:23.370 --> 01:00:25.880
I could escape the data,
as I've done in the past.

01:00:25.880 --> 01:00:30.230
And maybe I could put quotes
around any English string

01:00:30.230 --> 01:00:32.300
that itself contains a comma.

01:00:32.300 --> 01:00:33.230
And that's fine.

01:00:33.230 --> 01:00:36.350
I could do that, but then
my code, students.py,

01:00:36.350 --> 01:00:40.250
is going to have to change too
because I can't just naively split on

01:00:40.250 --> 01:00:41.240
a comma now.

01:00:41.240 --> 01:00:43.760
I'm going to have to
be smarter about it.

01:00:43.760 --> 01:00:45.710
I'm going to have to
take into account split

01:00:45.710 --> 01:00:48.800
only on the commas that
are not inside of quotes.
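
NOTE
Editor's sketch of why quoting alone is not enough with a naive split, assuming the home is wrapped in double quotes. The sample line is illustrative.
```python
# Quoting the home does not help str.split, which knows nothing about quotes.
line = 'Harry,"Number Four, Privet Drive"'
pieces = line.split(",")
# Still three pieces, and the quote characters leak into the data.
print(pieces)
```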

01:00:48.800 --> 01:00:51.260
And oh, it's getting complicated fast.

01:00:51.260 --> 01:00:53.810
And at this point, you need to
take a step back and consider,

01:00:53.810 --> 01:00:57.320
you know what, if we're having this
problem, odds are, many other people

01:00:57.320 --> 01:00:59.420
before us have had this same problem.

01:00:59.420 --> 01:01:02.750
It is incredibly common
to store data in files.

01:01:02.750 --> 01:01:06.420
It is incredibly common to
use CSV files specifically.

01:01:06.420 --> 01:01:07.740
And so you know what.

01:01:07.740 --> 01:01:10.760
Why don't we see if there's
a library in Python that

01:01:10.760 --> 01:01:14.690
exists to read and/or write CSV files?

01:01:14.690 --> 01:01:16.910
Rather than reinvent
the wheel, so to speak,

01:01:16.910 --> 01:01:20.540
let's see if we can write better code by
standing on the shoulders of others who

01:01:20.540 --> 01:01:22.610
have come before us--
programmers past--

01:01:22.610 --> 01:01:26.090
and actually use their code to do
the reading and writing of CSVs,

01:01:26.090 --> 01:01:30.210
so we can focus on the part of our
problem that you and I care about.

01:01:30.210 --> 01:01:32.930
So let's propose that we
go back to our code here

01:01:32.930 --> 01:01:35.960
and see how we might
use the CSV library.

01:01:35.960 --> 01:01:40.370
Indeed, within Python, there
is a module called CSV.

01:01:40.370 --> 01:01:43.010
The documentation for
it is at this URL here

01:01:43.010 --> 01:01:44.720
in Python's official documentation.

01:01:44.720 --> 01:01:49.040
But there's a few functions that are
pretty readily accessible if we just

01:01:49.040 --> 01:01:49.940
dive right in.

01:01:49.940 --> 01:01:52.050
And let me propose that we do this.

01:01:52.050 --> 01:01:53.840
Let me go back to my code here.

01:01:53.840 --> 01:01:58.370
And instead of re-inventing this wheel
and reading the file line by line,

01:01:58.370 --> 01:02:02.390
and splitting on commas, and dealing
now with quotes, and Privet Drives,

01:02:02.390 --> 01:02:04.640
and so forth, let's do this instead.

01:02:04.640 --> 01:02:10.010
At the start of my program, let me
go up and import the CSV module.

01:02:10.010 --> 01:02:12.530
Let's use this library
that someone else has

01:02:12.530 --> 01:02:16.130
written that's dealing with all of
these corner cases, if you will.

01:02:16.130 --> 01:02:18.980
I'm still going to give myself
a list, initially empty,

01:02:18.980 --> 01:02:20.630
in which to store all these students.

01:02:20.630 --> 01:02:23.930
But I'm going to change my approach
here now just a little bit.

01:02:23.930 --> 01:02:28.220
When I open this file with
with, let me go in here

01:02:28.220 --> 01:02:30.080
and change this a little bit.

01:02:30.080 --> 01:02:33.620
I'm going to go in
here now and say this.

01:02:33.620 --> 01:02:38.630
Reader equals csv.reader,
passing in file as input.

01:02:38.630 --> 01:02:42.230
So it turns out, if you read the
documentation for the CSV module,

01:02:42.230 --> 01:02:45.650
it comes with a function called
reader whose purpose in life

01:02:45.650 --> 01:02:50.450
is to read a CSV file for you and
figure out, where are the commas, where

01:02:50.450 --> 01:02:53.450
are the quotes, where are all
the potential corner cases,

01:02:53.450 --> 01:02:55.380
and just deal with them for you.

01:02:55.380 --> 01:02:57.860
You can override certain
defaults or assumptions in case

01:02:57.860 --> 01:03:00.260
you're using not a comma,
but a pipe or something else.

01:03:00.260 --> 01:03:02.910
But by default, I think
it's just going to work.
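
NOTE
Editor's sketch of overriding the default separator, assuming pipe-separated data. csv.reader accepts a delimiter keyword argument; io.StringIO stands in for an opened file so the example is self-contained.
```python
import csv
import io
# Hypothetical pipe-separated data standing in for a file on disk.
file = io.StringIO("Harry|Number Four, Privet Drive\nRon|The Burrow\n")
# delimiter tells the reader which character separates the columns.
reader = csv.reader(file, delimiter="|")
rows = list(reader)
print(rows)
```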

01:03:02.910 --> 01:03:07.070
Now, how do I iterate over a
reader and not the raw file itself?

01:03:07.070 --> 01:03:08.060
It's almost the same.

01:03:08.060 --> 01:03:10.220
The library allows you still to do this.

01:03:10.220 --> 01:03:13.220
For each row in the reader--

01:03:13.220 --> 01:03:15.890
so you're not iterating
over the file directly now.

01:03:15.890 --> 01:03:18.020
You're iterating over the
reader, which is, again,

01:03:18.020 --> 01:03:22.130
going to handle all of the parsing
of commas, and new lines, and more.

01:03:22.130 --> 01:03:25.070
For each row in the reader,
what am I going to do?

01:03:25.070 --> 01:03:27.080
Well, at the moment,
I'm going to do this.

01:03:27.080 --> 01:03:32.060
I'm going to append to my students list
the following dictionary, a dictionary

01:03:32.060 --> 01:03:36.680
that has a name whose value is
the current row's first column,

01:03:36.680 --> 01:03:41.240
and whose house, or rather,
home now is the row's second

01:03:41.240 --> 01:03:41.870
column.

01:03:41.870 --> 01:03:45.890
Now, it's worth noting that the
reader for each line in the file,

01:03:45.890 --> 01:03:47.480
indeed, returns to me a row.

01:03:47.480 --> 01:03:50.210
But it returns to me a
row that's a list, which

01:03:50.210 --> 01:03:52.310
is to say that the first
element of that list

01:03:52.310 --> 01:03:54.560
is going to be the
student's name, as before.

01:03:54.560 --> 01:03:59.030
The second element of that list is
going to be the student's home, as now

01:03:59.030 --> 01:03:59.810
before.

01:03:59.810 --> 01:04:02.430
But if I want to access
each of those elements,

01:04:02.430 --> 01:04:04.310
remember that lists are 0 indexed.

01:04:04.310 --> 01:04:07.490
We start counting at 0 and then
1, rather than 1 and then 2.

01:04:07.490 --> 01:04:10.380
So if I want to get at the student's
name, I use row, bracket, 0.

01:04:10.380 --> 01:04:13.130
And if I want to get at the student's
home, I use row, bracket, 1.
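
NOTE
Editor's sketch of the indexing approach just described. io.StringIO stands in for the opened students.csv, and the quoted home for Harry is an assumption about how the file was saved.
```python
import csv
import io
students = []
# Stand-in for the opened students.csv; Harry's home is quoted because it contains a comma.
file = io.StringIO('Harry,"Number Four, Privet Drive"\nRon,The Burrow\nDraco,Malfoy Manor\n')
reader = csv.reader(file)
for row in reader:
    # Each row is a list: index 0 is the name, index 1 is the home.
    students.append({"name": row[0], "home": row[1]})
print(students[0])
```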

01:04:13.130 --> 01:04:17.060
But in my for loop, we can do
that same unpacking as before.

01:04:17.060 --> 01:04:21.030
If I know the CSV is only
going to have two columns,

01:04:21.030 --> 01:04:25.280
I could even do this--
for name, home in reader.

01:04:25.280 --> 01:04:27.710
And now I don't need
to use list notation.

01:04:27.710 --> 01:04:32.360
I can unpack things all at once
and say, name here, and home here.

01:04:32.360 --> 01:04:35.270
The rest of my code can stay
exactly the same because,

01:04:35.270 --> 01:04:36.890
what am I doing now on line 8?

01:04:36.890 --> 01:04:39.770
I'm still constructing the
same dictionary as before,

01:04:39.770 --> 01:04:42.050
albeit for homes instead of houses.

01:04:42.050 --> 01:04:45.200
And I'm grabbing those values
now, not from the file itself

01:04:45.200 --> 01:04:47.062
and my use of split, but the reader.

01:04:47.062 --> 01:04:48.770
And again, what the
reader is going to do

01:04:48.770 --> 01:04:51.320
is figure out, where are those
commas, where are the quotes?

01:04:51.320 --> 01:04:53.700
And just solve that problem for you.

01:04:53.700 --> 01:04:57.560
So let me go now down to my terminal
window and run python of students.py,

01:04:57.560 --> 01:04:58.400
and hit Enter.

01:04:58.400 --> 01:05:04.040
And now we see successfully, sorted no
less, that Draco is from Malfoy Manor.

01:05:04.040 --> 01:05:07.250
Harry is from Number
Four, comma, Privet Drive.

01:05:07.250 --> 01:05:09.950
And Ron is from The Burrow.
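
NOTE
Editor's sketch of the whole program as described, not a copy of the on-screen code. It assumes a two-column students.csv, written here to a temporary folder so the example is self-contained. Passing newline="" to open is what the csv documentation recommends and is an addition not mentioned in the lecture.
```python
import csv
import os
import tempfile
# Write a throwaway students.csv so the sketch does not depend on an existing file.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w", newline="") as file:
    file.write('Harry,"Number Four, Privet Drive"\nRon,The Burrow\nDraco,Malfoy Manor\n')
students = []
with open(path, newline="") as file:
    reader = csv.reader(file)
    # Unpack each two-column row directly into name and home.
    for name, home in reader:
        students.append({"name": name, "home": home})
ordered = sorted(students, key=lambda student: student["name"])
lines = [f"{student['name']} is from {student['home']}" for student in ordered]
for line in lines:
    print(line)
```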

01:05:09.950 --> 01:05:17.420
Questions now on this technique of using
CSV reader from that CSV module, which,

01:05:17.420 --> 01:05:20.990
again, is just getting us out of the
business of reading each line ourselves

01:05:20.990 --> 01:05:23.330
and reading each of those
commas and splitting?

01:05:23.330 --> 01:05:27.500
AUDIENCE: So my questions are
related to something in the past.

01:05:27.500 --> 01:05:31.670
I recognize that you are
reading a file every time--

01:05:31.670 --> 01:05:39.080
well, we assume that we have the CSV
file to hand already in this case.

01:05:39.080 --> 01:05:44.540
Is it possible to make a
file readable and writable?

01:05:44.540 --> 01:05:50.960
So in this case, you could
write such stuff to the file,

01:05:50.960 --> 01:05:53.510
but then at the same
time, you could have

01:05:53.510 --> 01:05:57.590
another function that reads through
the file and does changes to it

01:05:57.590 --> 01:05:58.257
as you go along?

01:05:58.257 --> 01:05:59.757
DAVID MALAN: A really good question.

01:05:59.757 --> 01:06:01.070
And the short answer is, yes.

01:06:01.070 --> 01:06:05.000
However, historically, the mental model
for a file is that of a cassette tape.

01:06:05.000 --> 01:06:08.300
Years ago, not really in use
anymore, but cassette tapes

01:06:08.300 --> 01:06:10.830
are sequential whereby they
start at the beginning,

01:06:10.830 --> 01:06:12.747
and if you want to get
to the end, you kind of

01:06:12.747 --> 01:06:14.690
have to unwind the tape
to get to that point.

01:06:14.690 --> 01:06:18.307
The closest analog nowadays would be
something like Netflix or any streaming

01:06:18.307 --> 01:06:21.140
service, where there's a scrubber
that you have to go left to right.

01:06:21.140 --> 01:06:22.910
You can't just jump there or jump there.

01:06:22.910 --> 01:06:24.450
You don't have random access.

01:06:24.450 --> 01:06:27.290
So the problem with files, if
you want to read and write them,

01:06:27.290 --> 01:06:31.010
you or some library needs to keep
track of where you are in the file

01:06:31.010 --> 01:06:34.200
so that if you're reading from the
top and then you write at the bottom,

01:06:34.200 --> 01:06:37.170
and you want to start reading again,
you seek back to the beginning.

01:06:37.170 --> 01:06:39.045
So it's not something
we'll do here in class.

01:06:39.045 --> 01:06:41.360
It's more involved, but
it's absolutely doable.
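
NOTE
Editor's sketch of reading and writing through one file handle, which the lecture mentions but does not demonstrate. It assumes a small temporary file; the mode "r+" and the seek method are standard Python, and the sample rows are illustrative.
```python
import os
import tempfile
# Throwaway file so the sketch does not touch a real students.csv.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w") as file:
    file.write("Ron,The Burrow\n")
# "r+" opens an existing file for both reading and writing.
with open(path, "r+") as file:
    first_pass = file.read()       # reading moves the position to the end
    file.seek(0, os.SEEK_END)      # be explicit about where the write goes
    file.write("Draco,Malfoy Manor\n")
    file.seek(0)                   # rewind before reading again
    second_pass = file.read()
print(second_pass)
```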

01:06:41.360 --> 01:06:44.402
For our purposes, we'll generally
recommend, read the file.

01:06:44.402 --> 01:06:46.610
And then if you want to
change it, write it back out,

01:06:46.610 --> 01:06:49.880
rather than trying to make more
piecemeal changes, which is good

01:06:49.880 --> 01:06:53.480
if, though, the file is massive,
and it would just be very expensive

01:06:53.480 --> 01:06:55.680
time-wise to change the whole thing.

01:06:55.680 --> 01:06:59.690
Other questions on this CSV reader?

01:06:59.690 --> 01:07:05.170
AUDIENCE: It's possible to
write a paragraph in that file?

01:07:05.170 --> 01:07:06.170
DAVID MALAN: Absolutely.

01:07:06.170 --> 01:07:09.590
Right now, I'm writing very small
strings, just names or houses,

01:07:09.590 --> 01:07:10.460
as I did before.

01:07:10.460 --> 01:07:15.730
But you can absolutely write as
much text as you want, indeed.

01:07:15.730 --> 01:07:18.040
Other questions on CSV reader?

01:07:18.040 --> 01:07:22.780
AUDIENCE: Can a user
choose a key himself?

01:07:22.780 --> 01:07:26.920
Like, input key will be a name or code.

01:07:26.920 --> 01:07:29.950
DAVID MALAN: So short answer,
yes, we could absolutely

01:07:29.950 --> 01:07:32.680
write a program that
prompts the user for a name

01:07:32.680 --> 01:07:34.240
and a home, a name and a home.

01:07:34.240 --> 01:07:35.740
And we could write out those values.

01:07:35.740 --> 01:07:38.770
And in a moment, we'll see how
you can write to a CSV file.

01:07:38.770 --> 01:07:44.530
For now, I'm assuming, as the programmer
who created students.csv, that I

01:07:44.530 --> 01:07:46.270
know what the columns are going to be.

01:07:46.270 --> 01:07:48.770
And therefore, I'm naming
my variables accordingly.

01:07:48.770 --> 01:07:53.470
However, this is a good segue to one
final feature of reading CSVs, which

01:07:53.470 --> 01:07:57.520
is that you don't have to rely
on either getting a row as a list

01:07:57.520 --> 01:08:00.520
and using bracket 0 or
bracket 1, and you don't have

01:08:00.520 --> 01:08:02.500
to unpack things manually in this way.

01:08:02.500 --> 01:08:05.260
We could actually be
smarter and start storing

01:08:05.260 --> 01:08:08.500
the names of these columns
in the CSV file itself.

01:08:08.500 --> 01:08:12.310
And in fact, if any of you have ever
opened a spreadsheet file before, be it

01:08:12.310 --> 01:08:16.210
in Excel, Apple Numbers, Google
Spreadsheets or the like, odds are,

01:08:16.210 --> 01:08:20.149
you've noticed that the first row,
very frequently, is a little different.

01:08:20.149 --> 01:08:22.270
It actually is boldface
sometimes, or it actually

01:08:22.270 --> 01:08:26.710
contains the names of those columns,
the names of those attributes below.

01:08:26.710 --> 01:08:27.939
And we can do this here.

01:08:27.939 --> 01:08:30.580
In students.csv, I
don't have to just keep

01:08:30.580 --> 01:08:32.830
assuming that the
student's name is first

01:08:32.830 --> 01:08:34.840
and that the student's home is second.

01:08:34.840 --> 01:08:39.010
I can explicitly bake that
information into the file just

01:08:39.010 --> 01:08:41.950
to reduce the probability
of mistakes down the road.

01:08:41.950 --> 01:08:46.810
I can literally use the first row of
this file and say, name, comma, home.

01:08:46.810 --> 01:08:50.622
So notice that name is not
literally someone's name,

01:08:50.622 --> 01:08:52.330
and home is not
literally someone's home.

01:08:52.330 --> 01:08:57.050
It is literally the words, name
and home, separated by comma.

01:08:57.050 --> 01:09:01.630
And if I now go back into
students.py and don't use CSV reader,

01:09:01.630 --> 01:09:04.540
but instead, I use a
dictionary reader, I

01:09:04.540 --> 01:09:09.290
can actually treat my CSV file even
more flexibly, not just for this,

01:09:09.290 --> 01:09:10.630
but for other examples too.

01:09:10.630 --> 01:09:11.740
Let me do this.

01:09:11.740 --> 01:09:14.380
Instead of using a
CSV reader, let me use

01:09:14.380 --> 01:09:19.870
a CSV dict reader, which will now
iterate over the file top to bottom,

01:09:19.870 --> 01:09:24.250
loading in each line of text
not as a list of columns

01:09:24.250 --> 01:09:26.712
but as a dictionary of columns.

01:09:26.712 --> 01:09:28.420
What's nice about this
is that it's going

01:09:28.420 --> 01:09:32.200
to give me automatic access
now to those columns' names.

01:09:32.200 --> 01:09:35.470
I'm going to revert to just
saying, for row in reader,

01:09:35.470 --> 01:09:38.319
and now I'm going to
append a name and a home.

01:09:38.319 --> 01:09:41.890
But how am I going to get
access to the current row's

01:09:41.890 --> 01:09:44.740
name and the current row's home?

01:09:44.740 --> 01:09:48.790
Well, earlier, I used bracket 0 for
the first and bracket 1 for the second

01:09:48.790 --> 01:09:50.800
when I was using a reader.

01:09:50.800 --> 01:09:52.569
A reader returns lists.

01:09:52.569 --> 01:09:57.920
A dict reader or dictionary reader
returns dictionaries, one at a time.

01:09:57.920 --> 01:10:01.210
And so if I want to access
the current row's name,

01:10:01.210 --> 01:10:03.400
I can say, row, quote/unquote, name.

01:10:03.400 --> 01:10:06.790
I can say here for home,
row, quote/unquote, home.

01:10:06.790 --> 01:10:09.220
And I now have access
to those same values.
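
NOTE
Editor's sketch of the dictionary reader just described, assuming a header row of name,home. io.StringIO stands in for the opened students.csv.
```python
import csv
import io
students = []
# The first line is the header row naming the columns.
file = io.StringIO('name,home\nHarry,"Number Four, Privet Drive"\nRon,The Burrow\nDraco,Malfoy Manor\n')
reader = csv.DictReader(file)
for row in reader:
    # Each row is a dictionary keyed by the header names.
    students.append({"name": row["name"], "home": row["home"]})
print(students[0])
```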

01:10:09.220 --> 01:10:12.130
The only change I had to make,
to be clear, was in my CSV file,

01:10:12.130 --> 01:10:16.060
I had to include, on the
very first row, little hints

01:10:16.060 --> 01:10:17.830
as to what these columns are.

01:10:17.830 --> 01:10:21.220
And if I now run this code, I
think it should behave pretty much

01:10:21.220 --> 01:10:23.080
the same-- python of students.py.

01:10:23.080 --> 01:10:25.000
And indeed, we get the same sentences.

01:10:25.000 --> 01:10:29.950
But now my code is more robust
against changes in this data.

01:10:29.950 --> 01:10:34.270
If I were to open the CSV file in
Excel, or Google Spreadsheets, or Apple

01:10:34.270 --> 01:10:37.272
Numbers, and for whatever reason
change the columns around,

01:10:37.272 --> 01:10:39.730
maybe this is a file that you're
sharing with someone else,

01:10:39.730 --> 01:10:42.850
and just because, they decide
to sort things differently left

01:10:42.850 --> 01:10:46.390
to right by moving the columns
around, previously, my code

01:10:46.390 --> 01:10:50.020
would have broken because I was
assuming that name is always first,

01:10:50.020 --> 01:10:51.940
and home is always second.

01:10:51.940 --> 01:10:53.800
But if I did this--

01:10:53.800 --> 01:10:57.490
be it manually in one of those
programs or here-- home, comma, name,

01:10:57.490 --> 01:10:59.530
and suppose, I reversed all of this.

01:10:59.530 --> 01:11:04.600
The home comes first, followed by
Harry, The Burrow, then by Ron,

01:11:04.600 --> 01:11:08.020
and then lastly, Malfoy
Manor, then Draco,

01:11:08.020 --> 01:11:10.285
notice that my file is
now completely flipped.

01:11:10.285 --> 01:11:12.910
The first column is now the
second, and the second's the first.

01:11:12.910 --> 01:11:17.950
But I took care to update the
header of that file, the first row.

01:11:17.950 --> 01:11:21.070
Notice my Python code, I'm
not going to touch it at all.

01:11:21.070 --> 01:11:24.940
I'm going to rerun python of
students.py, and hit Enter.

01:11:24.940 --> 01:11:26.830
And it still just works.

01:11:26.830 --> 01:11:29.890
And this, too, is an example
of coding defensively.

01:11:29.890 --> 01:11:32.530
What if someone changes your
CSV file, your data file?

01:11:32.530 --> 01:11:33.830
Ideally, that won't happen.

01:11:33.830 --> 01:11:37.840
But even if it does now, because
I'm using a dictionary reader that's

01:11:37.840 --> 01:11:42.490
going to infer from that first row
for me what the columns are called,

01:11:42.490 --> 01:11:44.350
my code just keeps working.
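
NOTE
Editor's sketch of the column-order point, assuming the header row is updated along with the data. The helper name load is hypothetical and the sample rows are illustrative.
```python
import csv
import io
def load(text):
    """Return a list of name/home dictionaries from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "home": row["home"]} for row in reader]
# Same data, with the two columns in opposite orders.
original = load('name,home\nHarry,"Number Four, Privet Drive"\nRon,The Burrow\n')
flipped = load('home,name\n"Number Four, Privet Drive",Harry\nThe Burrow,Ron\n')
print(original == flipped)
```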

01:11:44.350 --> 01:11:47.990
And so it keeps getting, if
you will, better and better.

01:11:47.990 --> 01:11:50.920
Any questions now on this approach?

01:11:50.920 --> 01:11:54.008
AUDIENCE: Yeah, what is the importance
of new line in the CSV file?

01:11:54.008 --> 01:11:56.800
DAVID MALAN: What's the importance
of the new line in the CSV file?

01:11:56.800 --> 01:11:58.270
It's partly a convention.

01:11:58.270 --> 01:12:00.670
In the world of text
files, we humans have just

01:12:00.670 --> 01:12:04.810
been, for decades, in the habit
of storing data line by line.

01:12:04.810 --> 01:12:06.370
It's visually convenient.

01:12:06.370 --> 01:12:09.400
It's just easy to extract
from the file because you just

01:12:09.400 --> 01:12:10.450
look for the new lines.

01:12:10.450 --> 01:12:14.800
So the new line just separates
some data from some other data.

01:12:14.800 --> 01:12:17.710
We could use any other
symbol on the keyboard,

01:12:17.710 --> 01:12:21.250
but it's just common to hit Enter to
just move the data to the next line.

01:12:21.250 --> 01:12:22.810
Just a convention.

01:12:22.810 --> 01:12:23.710
Other questions?

01:12:23.710 --> 01:12:28.010
AUDIENCE: It seems to be working
fine if you just have name and home.

01:12:28.010 --> 01:12:32.155
I'm wondering what will happen
if you want to put in more data.

01:12:34.750 --> 01:12:40.115
Say, you wanted to add a house
to both the name and the home.

01:12:40.115 --> 01:12:43.240
DAVID MALAN: Sure, if you wanted to
add the house back-- so if I go in here

01:12:43.240 --> 01:12:47.980
and add house last, and I go here
and say, Gryffindor for Harry,

01:12:47.980 --> 01:12:53.890
Gryffindor for Ron, and Slytherin
for Draco, now I have three columns,

01:12:53.890 --> 01:12:57.010
effectively, if you will-- home
on the left, name in the middle,

01:12:57.010 --> 01:13:00.640
house on the right, each separated
by commas with weird things,

01:13:00.640 --> 01:13:03.610
like Number Four, comma,
Privet Drive still quoted.

01:13:03.610 --> 01:13:07.540
Notice, if I go back to students.py,
and I don't change the code at all

01:13:07.540 --> 01:13:11.230
and run python of students.py,
it still just works.

01:13:11.230 --> 01:13:14.140
And this is what's so powerful
about a dictionary reader.

01:13:14.140 --> 01:13:15.730
It can change over time.

01:13:15.730 --> 01:13:17.620
It can have more and more columns.

01:13:17.620 --> 01:13:20.290
Your existing code is
not going to break.

01:13:20.290 --> 01:13:23.500
Your code would break, would be
much more fragile, so to speak,

01:13:23.500 --> 01:13:26.860
if you were making assumptions like, the
first column's always going to be name.

01:13:26.860 --> 01:13:28.810
The second column is
always going to be house.

01:13:28.810 --> 01:13:32.590
Things will break fast if
those assumptions break down--

01:13:32.590 --> 01:13:34.750
so not a problem in this case.
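
NOTE
Editor's sketch contrasting the two approaches once a third column is added. The three-column text is illustrative, and fragile_ok is a hypothetical flag recording whether positional unpacking survived.
```python
import csv
import io
text = 'home,name,house\n"Number Four, Privet Drive",Harry,Gryffindor\nThe Burrow,Ron,Gryffindor\n'
# Keyed access simply ignores the extra house column.
students = [{"name": row["name"], "home": row["home"]} for row in csv.DictReader(io.StringIO(text))]
# Positional unpacking assumes exactly two columns, so it fails on three.
try:
    for name, home in csv.reader(io.StringIO(text)):
        pass
    fragile_ok = True
except ValueError:
    fragile_ok = False
print(students, fragile_ok)
```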

01:13:34.750 --> 01:13:37.720
Well, let me propose that,
besides reading CSVs,

01:13:37.720 --> 01:13:40.960
let's at least take a peek at
how we might write a CSV too.

01:13:40.960 --> 01:13:44.410
If you're writing a program in which you
want to store not just students' names,

01:13:44.410 --> 01:13:48.920
but maybe their homes as well in a file,
how can we keep adding to this file?

01:13:48.920 --> 01:13:52.460
Let me go ahead and delete
the contents of students.csv

01:13:52.460 --> 01:13:56.300
and just re-add a single
simple row, name, comma, home,

01:13:56.300 --> 01:14:00.530
so as to anticipate inserting more
names and homes into this file.

01:14:00.530 --> 01:14:03.780
And then let me go to students.py,
and let me just start fresh

01:14:03.780 --> 01:14:05.600
so as to write out data this time.

01:14:05.600 --> 01:14:07.730
I'm still going to go
ahead and import csv.

01:14:07.730 --> 01:14:11.870
I'm going to go ahead now and
prompt the user for their name-- so

01:14:11.870 --> 01:14:15.410
name equals input, quote/unquote, What's your name?

01:14:15.410 --> 01:14:18.170
And I'm going to go ahead and
prompt the user for their home--

01:14:18.170 --> 01:14:23.780
so home equals input,
quote/unquote, Where's your home?

01:14:23.780 --> 01:14:26.000
Now I'm going to go
ahead and open the file,

01:14:26.000 --> 01:14:29.090
but this time for writing
instead of reading, as follows--

01:14:29.090 --> 01:14:32.900
with open, quote/unquote, students.csv.

01:14:32.900 --> 01:14:35.210
I'm going to open it in
append mode so that I

01:14:35.210 --> 01:14:38.210
keep adding more and more
students and homes to the file,

01:14:38.210 --> 01:14:40.820
rather than just overwriting
the entire file itself.

01:14:40.820 --> 01:14:43.250
And I'm going to use a
variable name of file.

01:14:43.250 --> 01:14:46.460
I'm then going to go ahead and give
myself a variable called writer,

01:14:46.460 --> 01:14:49.790
and I'm going to set it equal to
the return value of another function

01:14:49.790 --> 01:14:53.060
in the CSV module called csv.writer.

01:14:53.060 --> 01:14:59.600
And that writer function takes as its
sole argument the file variable there.

01:14:59.600 --> 01:15:01.460
Now I'm going to go
ahead and just do this.

01:15:01.460 --> 01:15:04.220
I'm going to say,
writer.writerow, and I'm

01:15:04.220 --> 01:15:09.020
going to pass into writerow the line
that I want to write to the file

01:15:09.020 --> 01:15:10.470
specifically as a list.

01:15:10.470 --> 01:15:13.890
So I'm going to give this a
list of name, comma, home,

01:15:13.890 --> 01:15:16.140
which, of course, are the
contents of those variables.

01:15:16.140 --> 01:15:18.170
Now I'm going to go
ahead and save the file.

01:15:18.170 --> 01:15:22.220
I'm going to go ahead and rerun
python of students.py, hit Enter.

01:15:22.220 --> 01:15:23.270
And what's your name?

01:15:23.270 --> 01:15:28.870
Well, let me go ahead and type in
Harry as my name and Number Four,

01:15:28.870 --> 01:15:31.690
comma, Privet Drive, Enter.

01:15:31.690 --> 01:15:34.750
Now notice, that input
itself did have a comma.

01:15:34.750 --> 01:15:37.450
And so if I go to my
CSV file now, notice

01:15:37.450 --> 01:15:40.090
that it's automatically
been quoted for me so

01:15:40.090 --> 01:15:41.860
that subsequent reads
from this file don't

01:15:41.860 --> 01:15:46.007
confuse that comma with the actual
comma between Harry and his home.
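
NOTE
Editor's sketch of appending rows with csv.writer. To stay runnable without typing, the hypothetical helper add_student takes the name and home as arguments in place of the two input prompts, and the file lives in a temporary folder. newline="" follows the csv documentation's recommendation.
```python
import csv
import os
import tempfile
# Start from a throwaway students.csv that holds only the header row.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w", newline="") as file:
    file.write("name,home\n")
def add_student(name, home):
    """Append one row; stands in for the lecture's two input() prompts."""
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, home])
add_student("Harry", "Number Four, Privet Drive")
add_student("Ron", "The Burrow")
with open(path, newline="") as file:
    contents = file.read()
# The home containing a comma is quoted by the writer; the other is left bare.
print(contents)
```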

01:15:46.007 --> 01:15:48.340
Well, let me go ahead and run
it a couple of more times.

01:15:48.340 --> 01:15:51.340
Let me go ahead and rerun
python of students.py.

01:15:51.340 --> 01:15:55.300
Let me go ahead and input this time
Ron and his home as The Burrow.

01:15:55.300 --> 01:15:58.210
Let's go back to students.csv
to see what it looks like.

01:15:58.210 --> 01:16:02.140
Now we see Ron, comma, The Burrow has
been added automatically to the file.

01:16:02.140 --> 01:16:03.520
And let's do one more--

01:16:03.520 --> 01:16:06.190
python of students.py, Enter.

01:16:06.190 --> 01:16:10.900
Let's go ahead and give Draco's name and
his home, which would be Malfoy Manor,

01:16:10.900 --> 01:16:11.590
Enter.

01:16:11.590 --> 01:16:14.200
And if we go back to
students.csv, now, we

01:16:14.200 --> 01:16:15.940
see that Draco is in the file itself.

01:16:15.940 --> 01:16:19.060
And the library took care of not
only writing each of those rows,

01:16:19.060 --> 01:16:20.140
per the function's name.

01:16:20.140 --> 01:16:23.710
It also handled the escaping,
so to speak, of any strings

01:16:23.710 --> 01:16:27.018
that themselves contained a
comma, like Harry's own home.

01:16:27.018 --> 01:16:28.810
Well, it turns out,
there's yet another way

01:16:28.810 --> 01:16:32.920
we could implement this same program
without having to worry about precisely

01:16:32.920 --> 01:16:35.650
that order again and again
and just passing in a list.

01:16:35.650 --> 01:16:39.580
It turns out, if we're keeping track
of what's the name and what's the home,

01:16:39.580 --> 01:16:42.100
we could use something like
a dictionary to associate

01:16:42.100 --> 01:16:43.580
those keys with those values.

01:16:43.580 --> 01:16:46.720
So let me go ahead and back up and
remove these students from the file,

01:16:46.720 --> 01:16:49.660
leaving only the header row
again-- name, comma, home.

01:16:49.660 --> 01:16:51.550
And let me go over to students.py.

01:16:51.550 --> 01:16:54.130
And this time, instead
of using CSV writer,

01:16:54.130 --> 01:16:57.010
I'm going to go ahead
and use csv.DictWriter,

01:16:57.010 --> 01:16:58.900
which is a dictionary
writer, that's going

01:16:58.900 --> 01:17:00.890
to open the file in much the same way.

01:17:00.890 --> 01:17:04.840
But rather than write a
row as this list of name,

01:17:04.840 --> 01:17:08.050
comma, home, what I'm now
going to do is as follows.

01:17:08.050 --> 01:17:11.950
I'm going to first output
an actual dictionary,

01:17:11.950 --> 01:17:14.550
the first key of which
is name, colon, and then

01:17:14.550 --> 01:17:17.050
the value thereof is going to
be the name that was typed in.

01:17:17.050 --> 01:17:19.468
And I'm going to pass in a
key of home, quote/unquote,

01:17:19.468 --> 01:17:22.010
the value of which, of course,
is the home that was typed in.

01:17:22.010 --> 01:17:24.520
But with DictWriter,
I do need to give it

01:17:24.520 --> 01:17:29.440
a hint as to the order in which those
columns are when writing it out so

01:17:29.440 --> 01:17:33.530
that, subsequently, they could be
read, even if those orderings change.

01:17:33.530 --> 01:17:36.070
Let me go ahead and pass
in fieldnames, which

01:17:36.070 --> 01:17:39.460
is a second argument to
DictWriter, equals, and then

01:17:39.460 --> 01:17:41.890
a list of the actual
columns that I know are

01:17:41.890 --> 01:17:45.340
in this file, which, of
course, are name, comma, home.

01:17:45.340 --> 01:17:47.410
Both times, in quotes
because that's, indeed,

01:17:47.410 --> 01:17:50.200
the string names of the
columns, so to speak,

01:17:50.200 --> 01:17:52.390
that I intend to write to in that file.

01:17:52.390 --> 01:17:55.340
All right, now let me go ahead
and go to my terminal window,

01:17:55.340 --> 01:17:57.190
run python of students.py.

01:17:57.190 --> 01:17:59.860
This time, I'll type
in Harry's name again.

01:17:59.860 --> 01:18:05.170
I'll, again, type in Number
Four, comma, Privet Drive, Enter.

01:18:05.170 --> 01:18:07.360
Let's now go back to students.csv.

01:18:07.360 --> 01:18:11.380
And voila, Harry is back in the file,
and it's properly escaped or quoted.

01:18:11.380 --> 01:18:14.830
I'm sure that if we do this
again with Ron and The Burrow,

01:18:14.830 --> 01:18:20.320
and let's go ahead and run it a
third time with Draco and Malfoy Manor,

01:18:20.320 --> 01:18:21.100
Enter.

01:18:21.100 --> 01:18:22.810
Let's go back to students.csv.

01:18:22.810 --> 01:18:26.200
And via this dictionary
writer, we now have all three

01:18:26.200 --> 01:18:27.530
of those students as well.

01:18:27.530 --> 01:18:31.480
So whereas with csv.writer,
the onus is on us

01:18:31.480 --> 01:18:34.270
to pass in a list of all
of the values that we

01:18:34.270 --> 01:18:37.870
want to put from left to right, with
a dictionary writer, technically,

01:18:37.870 --> 01:18:39.760
they could be in any
order in the dictionary.

01:18:39.760 --> 01:18:43.120
In fact, I could have just
as correctly done this,

01:18:43.120 --> 01:18:45.640
passing in home followed by name.

01:18:45.640 --> 01:18:46.720
But it's a dictionary.

01:18:46.720 --> 01:18:50.322
And so the ordering in this case does
not matter so long as the key is there

01:18:50.322 --> 01:18:51.280
and the value is there.

01:18:51.280 --> 01:18:55.660
And because I have passed in fieldnames
as the second argument to DictWriter,

01:18:55.660 --> 01:18:59.410
it ensures that the library
knows exactly which column

01:18:59.410 --> 01:19:02.920
contains name or home, respectively.

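NOTE
A minimal sketch of the DictWriter approach described above, assuming
students.csv already contains the header row name,home. The helper name
append_student is illustrative only and not from the lecture, and the
values are passed in rather than read with input().
```python
import csv
def append_student(path, name, home):
    # Append mode; newline="" lets the csv module manage line endings.
    with open(path, "a", newline="") as file:
        # fieldnames fixes the column order in the file.
        writer = csv.DictWriter(file, fieldnames=["name", "home"])
        # The keys are deliberately given as home, then name: order in the dict does not matter.
        writer.writerow({"home": home, "name": name})
```
Calling append_student("students.csv", "Harry", "Number Four, Privet Drive")
should add a row whose home is quoted, since that value contains a comma.
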
01:19:02.920 --> 01:19:07.300
Are there any questions now on
dictionary reading, dictionary writing,

01:19:07.300 --> 01:19:10.480
or CSVs more generally?

01:19:10.480 --> 01:19:14.200
AUDIENCE: Is there any
specific situation for me

01:19:14.200 --> 01:19:17.110
to use a single quotation
or double quotation?

01:19:17.110 --> 01:19:20.980
Because after the print,
we use single quotation

01:19:20.980 --> 01:19:24.220
to represent the key of the dictionary.

01:19:24.220 --> 01:19:30.363
But after the reading or writing,
we use the double quotation.

01:19:30.363 --> 01:19:31.780
DAVID MALAN: It's a good question.

01:19:31.780 --> 01:19:36.340
In Python, you can generally use double
quotes, or you can use single quotes.

01:19:36.340 --> 01:19:37.430
And it doesn't matter.

01:19:37.430 --> 01:19:40.660
You should just be self-consistent
so that stylistically your code

01:19:40.660 --> 01:19:42.340
looks the same all throughout.

01:19:42.340 --> 01:19:45.610
Sometimes, though, it is
necessary to alternate.

01:19:45.610 --> 01:19:49.870
If you're already using double quotes,
as I was earlier for a long f-string,

01:19:49.870 --> 01:19:52.780
but inside that f-string,
I was interpolating

01:19:52.780 --> 01:19:55.240
the values of some variables
using curly braces,

01:19:55.240 --> 01:19:57.760
and those variables were dictionaries.

01:19:57.760 --> 01:20:02.230
And in order to index into a
dictionary, you use square brackets

01:20:02.230 --> 01:20:03.370
and then quotes.

01:20:03.370 --> 01:20:05.690
But if you're already using
double quotes out here,

01:20:05.690 --> 01:20:09.250
you should generally use single
quotes here, or vice versa.

01:20:09.250 --> 01:20:12.683
But otherwise, I'm in the habit
of using double quotes everywhere.

01:20:12.683 --> 01:20:15.100
Others are in the habit of
using single quotes everywhere.

01:20:15.100 --> 01:20:20.676
It only matters sometimes if one
might be confused for the other.

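NOTE
A small sketch of the quoting situation described above, using a made-up
student dictionary. Before Python 3.12, reusing the outer quote character
inside the braces of an f-string is a syntax error, so alternating is the
portable choice.
```python
student = {"name": "Harry", "home": "Number Four, Privet Drive"}
# Double quotes delimit the f-string, so the dictionary keys inside use single quotes.
line = f"{student['name']} is in {student['home']}"
print(line)
```
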
01:20:20.676 --> 01:20:24.200
Other questions on dictionary
writing or reading?

01:20:24.200 --> 01:20:30.790
AUDIENCE: Yeah, my question is, can we
use multiple CSV files in any program?

01:20:30.790 --> 01:20:31.790
DAVID MALAN: Absolutely.

01:20:31.790 --> 01:20:33.830
You can use as many
CSV files as you want.

01:20:33.830 --> 01:20:37.190
And it's just one of the formats
that you can use to save data.

01:20:37.190 --> 01:20:40.910
Other questions on CSVs or File I/O?

01:20:40.910 --> 01:20:43.110
AUDIENCE: Thanks for taking my question.

01:20:43.110 --> 01:20:49.580
So when you're reading from
the file as a dictionary,

01:20:49.580 --> 01:20:52.910
you had the fields called.

01:20:52.910 --> 01:20:55.280
When you're reading, couldn't
you just call the row?

01:20:55.280 --> 01:21:03.830
In the previous version of the students.py
file, when you're reading each row,

01:21:03.830 --> 01:21:07.490
you were splitting out
the fields by name.

01:21:10.370 --> 01:21:13.310
Yeah, so when you're appending
to the students list,

01:21:13.310 --> 01:21:20.200
couldn't you just call for row
in reader, students.append(row),

01:21:20.200 --> 01:21:22.340
rather than naming each of the fields?

01:21:22.340 --> 01:21:23.690
DAVID MALAN: Oh, very clever.

01:21:23.690 --> 01:21:28.880
Short answer, yes, insofar
as DictReader returns

01:21:28.880 --> 01:21:32.480
one dictionary at a time,
when you loop over it,

01:21:32.480 --> 01:21:34.550
row is already going to be a dictionary.

01:21:34.550 --> 01:21:38.060
So yes, you could actually
get away with doing this.

01:21:38.060 --> 01:21:41.510
And the effect would really
be the same in this case.

01:21:41.510 --> 01:21:42.620
Good observation.

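NOTE
A sketch of the shortcut just discussed, assuming a CSV file whose first
row is a header. The function name load_students is illustrative only.
```python
import csv
def load_students(path):
    students = []
    with open(path, newline="") as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Each row is already a dictionary keyed by the header row, so append it as is.
            students.append(row)
    return students
```
One difference worth noting: appending the row unchanged keeps every
column in the file, whereas naming each field keeps only the ones listed.
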
01:21:42.620 --> 01:21:46.100
How about one more question on CSVs?

01:21:46.100 --> 01:21:51.260
AUDIENCE: Yeah, when reading in
CSVs from my past work with data,

01:21:51.260 --> 01:21:53.550
a lot of things can go wrong.

01:21:53.550 --> 01:21:57.170
I don't know if it's a fair question
that you can answer in a few sentences.

01:21:57.170 --> 01:22:04.472
But are there any best practices to
double check that no mistakes occurred?

01:22:04.472 --> 01:22:06.180
DAVID MALAN: It's a
really good question.

01:22:06.180 --> 01:22:10.730
And I would say, in general, if
you're using code to generate the CSVs

01:22:10.730 --> 01:22:14.330
and to read the CSVs, and
you're using a good library,

01:22:14.330 --> 01:22:16.080
theoretically, nothing should go wrong.

01:22:16.080 --> 01:22:20.960
It should be 100% correct if
the libraries are 100% correct.

01:22:20.960 --> 01:22:22.850
You and I tend to be the problem.

01:22:22.850 --> 01:22:27.110
When you let a human touch the CSV,
or when Excel, or Apple Numbers,

01:22:27.110 --> 01:22:29.030
or some other tool's
involved that might not

01:22:29.030 --> 01:22:30.980
be aligned with your
code's expectations,

01:22:30.980 --> 01:22:33.500
things then, yes, can break.

01:22:33.500 --> 01:22:37.100
The goal-- sometimes, honestly,
the solution is manual fixes.

01:22:37.100 --> 01:22:40.610
You go in and fix the CSV, or
you have a lot of error checking,

01:22:40.610 --> 01:22:44.450
or you have a lot of try/except just
to tolerate mistakes in the data.

01:22:44.450 --> 01:22:47.900
But generally, I would say, if
you're using CSV or any file format

01:22:47.900 --> 01:22:50.990
internally to a program
to both read and write it,

01:22:50.990 --> 01:22:52.580
you shouldn't have concerns there.

01:22:52.580 --> 01:22:55.190
You and I, the humans,
are the problem, generally

01:22:55.190 --> 01:22:59.000
speaking-- and not the programmers,
the users of those files, instead.

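NOTE
One possible way to tolerate mistakes in the data with try/except, as
mentioned above. This is only a sketch: the function name load_valid_rows,
the required columns, and the rule that blank fields are errors are all
assumptions made for illustration.
```python
import csv
def load_valid_rows(path, required=("name", "home")):
    good, bad = [], []
    with open(path, newline="") as file:
        for row in csv.DictReader(file):
            try:
                # A short row leaves missing fields as None, so .strip() raises AttributeError.
                cleaned = {key: row[key].strip() for key in required}
                if not all(cleaned.values()):
                    raise ValueError("empty field")
            except (KeyError, AttributeError, ValueError):
                # Set the damaged row aside for manual fixing rather than crashing.
                bad.append(row)
            else:
                good.append(cleaned)
    return good, bad
```
The rows collected in bad could then be printed or logged so that a human
can go in and fix the CSV by hand.
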
01:22:59.000 --> 01:23:02.930
All right, allow me to propose that
we leave CSVs behind but to note

01:23:02.930 --> 01:23:04.850
that they're not the
only file format you

01:23:04.850 --> 01:23:07.310
can use in order to read or write data.

01:23:07.310 --> 01:23:10.760
In fact, they're a popular format,
as are just raw text files--

01:23:10.760 --> 01:23:11.690
.txt files.

01:23:11.690 --> 01:23:14.210
But you can store data,
really, any way that you want.

01:23:14.210 --> 01:23:16.730
We've just picked CSVs
because it's representative

01:23:16.730 --> 01:23:18.800
of how you might read
and write from a file

01:23:18.800 --> 01:23:22.910
and do so in a structured way, where
you can somehow have multiple keys,

01:23:22.910 --> 01:23:26.930
multiple values all in the same file
without having to resort to what would

01:23:26.930 --> 01:23:29.160
be otherwise known as a binary file.

01:23:29.160 --> 01:23:32.750
So a binary file is a file that's
really just zeros and ones.

01:23:32.750 --> 01:23:36.890
And they can be laid out in any
pattern you might want, particularly

01:23:36.890 --> 01:23:39.080
if you want to store
not textual information,

01:23:39.080 --> 01:23:43.200
but maybe graphical, or audio,
or video information as well.

01:23:43.200 --> 01:23:45.560
So it turns out that
Python is really good

01:23:45.560 --> 01:23:48.320
when it comes to having libraries
for, really, everything.

01:23:48.320 --> 01:23:50.660
And in fact, there's a
popular library called

01:23:50.660 --> 01:23:55.340
Pillow that allows you to
navigate image files as well

01:23:55.340 --> 01:23:57.980
and to perform operations
on image files.

01:23:57.980 --> 01:24:00.230
You can apply filters, a la Instagram.

01:24:00.230 --> 01:24:02.670
You can animate them as well.

01:24:02.670 --> 01:24:05.900
And so what I thought we'd do is
leave behind text files for now

01:24:05.900 --> 01:24:08.150
and tackle one more
demonstration, this time,

01:24:08.150 --> 01:24:13.290
focusing on this particular
library and image files instead.

01:24:13.290 --> 01:24:16.250
So let me propose that we
go over here to VS Code

01:24:16.250 --> 01:24:19.910
and create a program, ultimately,
that creates an animated GIF.

01:24:19.910 --> 01:24:23.225
These things are everywhere nowadays
in the form of memes, and animations,

01:24:23.225 --> 01:24:24.350
and stickers, and the like.

01:24:24.350 --> 01:24:27.380
And an animated GIF is
really just an image file

01:24:27.380 --> 01:24:29.840
that has multiple images inside of it.

01:24:29.840 --> 01:24:34.790
And your computer or your phone shows
you those images, one after another,

01:24:34.790 --> 01:24:37.820
sometimes on an endless
loop, again and again.

01:24:37.820 --> 01:24:41.480
And so long as there's enough images,
it creates the illusion of animation

01:24:41.480 --> 01:24:44.600
because your mind and mine kind
of fill in the gaps visually

01:24:44.600 --> 01:24:47.630
and just assumes that if something
is moving, even though you're only

01:24:47.630 --> 01:24:51.230
seeing one frame per second,
or some sequence thereof,

01:24:51.230 --> 01:24:52.730
it looks like an animation.

01:24:52.730 --> 01:24:55.700
So it's like a simplistic
version of a video file.

01:24:55.700 --> 01:25:00.710
Well, let me propose that we start
with maybe a couple of costumes

01:25:00.710 --> 01:25:02.600
from another popular
programming language.

01:25:02.600 --> 01:25:05.780
And let me go ahead and open up
my first costume here, number 1.

01:25:05.780 --> 01:25:09.260
So suppose here that this is a costume
or, really, just a static image

01:25:09.260 --> 01:25:11.150
here, costume1.gif.

01:25:11.150 --> 01:25:14.600
And it's just a static picture
of a cat, no movement at all.

01:25:14.600 --> 01:25:18.770
Let me go ahead now and open
up a second one, costume2.gif,

01:25:18.770 --> 01:25:20.910
that looks a little bit different.

01:25:20.910 --> 01:25:23.510
Notice-- and I'll go back
and forth-- this cat's legs

01:25:23.510 --> 01:25:27.530
are a little bit aligned differently
so that this was version 1,

01:25:27.530 --> 01:25:29.570
and this was version 2.

01:25:29.570 --> 01:25:32.150
Now, these cats come from a
programming language from MIT

01:25:32.150 --> 01:25:34.490
called Scratch that allows
you, very graphically,

01:25:34.490 --> 01:25:36.410
to animate all this and more.

01:25:36.410 --> 01:25:41.600
But we'll use just these two static
images, costume1 and costume2

01:25:41.600 --> 01:25:44.660
to create our own animated
GIF that, after this, you

01:25:44.660 --> 01:25:48.800
could text to a friend or message
them, much like any meme online.

01:25:48.800 --> 01:25:52.270
Well, let me propose that we
create this animated GIF, not

01:25:52.270 --> 01:25:54.770
by just using some off-the-shelf
program that we downloaded,

01:25:54.770 --> 01:25:56.450
but by writing our own code.

01:25:56.450 --> 01:25:59.630
Let me go ahead and
run code of costumes.py

01:25:59.630 --> 01:26:02.090
and create our very own
program that's going to take,

01:26:02.090 --> 01:26:07.460
as input, two or even more image files
and then generate an animated GIF

01:26:07.460 --> 01:26:12.230
from them by essentially creating this
animated GIF by toggling back and forth

01:26:12.230 --> 01:26:14.627
endlessly between those two images.

01:26:14.627 --> 01:26:15.960
Well, how am I going to do this?

01:26:15.960 --> 01:26:19.520
Well, let's assume that this will
be a program called costumes.py that

01:26:19.520 --> 01:26:22.280
expects two command line
arguments, the names

01:26:22.280 --> 01:26:26.490
of the files, the individual costumes
that we want to animate back and forth.

01:26:26.490 --> 01:26:29.060
So to do that, I'm going to
import sys so that we ultimately

01:26:29.060 --> 01:26:31.190
have access to sys.argv.

01:26:31.190 --> 01:26:35.090
I'm then, from this Pillow library,
going to import support for images

01:26:35.090 --> 01:26:35.750
specifically.

01:26:35.750 --> 01:26:41.520
So from PIL import Image-- capital I,
as per the library's documentation.

01:26:41.520 --> 01:26:44.270
Now I'm going to give myself
an empty list called images,

01:26:44.270 --> 01:26:48.230
just so I have a list in which to store
one, or two, or more of these images.

01:26:48.230 --> 01:26:50.150
And now let me do this.

01:26:50.150 --> 01:26:56.540
For each argument in sys.argv, I'm
going to go ahead and create a new image

01:26:56.540 --> 01:27:03.650
variable, set it equal to this
Image.open function, passing in arg.

01:27:03.650 --> 01:27:05.030
Now, what is this doing?

01:27:05.030 --> 01:27:07.400
I'm proposing that,
eventually, I want to be

01:27:07.400 --> 01:27:10.190
able to run python of
costumes.py, and then

01:27:10.190 --> 01:27:14.330
as command line argument, specify
costume1.gif, space, costume2.gif.

01:27:14.330 --> 01:27:18.740
So I want to take in those file names
from the command line as my arguments.

01:27:18.740 --> 01:27:20.370
So what am I doing here?

01:27:20.370 --> 01:27:25.670
Well, I'm iterating over sys.argv, all of
the words in my command line arguments.

01:27:25.670 --> 01:27:27.620
I'm creating a variable
called image, and I'm

01:27:27.620 --> 01:27:30.200
passing to this function,
Image.open from the Pillow

01:27:30.200 --> 01:27:32.330
library, that specific argument.

01:27:32.330 --> 01:27:35.810
And that library is essentially
going to open that image

01:27:35.810 --> 01:27:38.960
in a way that gives me a lot of
functionality for manipulating it,

01:27:38.960 --> 01:27:40.040
like animating.

01:27:40.040 --> 01:27:48.180
Now I'm going to go ahead and append to
my images list that particular image.

01:27:48.180 --> 01:27:48.840
And that's it.

01:27:48.840 --> 01:27:51.890
So this loop's purpose in life is
just to iterate over the command line

01:27:51.890 --> 01:27:55.310
arguments and open those
images using this library.

01:27:55.310 --> 01:27:57.783
The last line is pretty straightforward.

01:27:57.783 --> 01:27:58.700
I'm going to say this.

01:27:58.700 --> 01:28:02.120
I'm going to grab the first of those
images, which is going to be in my list

01:28:02.120 --> 01:28:05.870
at location 0, and I'm
going to save it to disk.

01:28:05.870 --> 01:28:08.060
That is, I'm going to save this file.

01:28:08.060 --> 01:28:10.730
Now, in the past when we
used CSVs or text files,

01:28:10.730 --> 01:28:12.590
I had to do the file opening.

01:28:12.590 --> 01:28:15.340
I had to do the file writing,
maybe even the closing.

01:28:15.340 --> 01:28:17.090
I don't need to do
that with this library.

01:28:17.090 --> 01:28:20.750
The Pillow library takes care of the
opening, the closing, and the saving

01:28:20.750 --> 01:28:23.000
for me by just calling save.

01:28:23.000 --> 01:28:24.780
I'm going to call this save function.

01:28:24.780 --> 01:28:27.740
And just to leave space, because I
have a number of arguments to pass,

01:28:27.740 --> 01:28:29.780
I'm going to move to
another line so it fits.

01:28:29.780 --> 01:28:33.290
I'm going to pass in the name of
the file that I want to create,

01:28:33.290 --> 01:28:34.730
costumes.gif--

01:28:34.730 --> 01:28:37.310
that will be the name
of my animated GIF.

01:28:37.310 --> 01:28:41.510
I'm going to tell this library
to save all of the frames

01:28:41.510 --> 01:28:44.870
that I pass to it-- so the first
costume, the second costume, and even

01:28:44.870 --> 01:28:46.190
more if I gave them.

01:28:46.190 --> 01:28:49.220
I'm going to then append
to this first image--

01:28:49.220 --> 01:28:55.310
the images 0-- the following
images, equals this list of images.

01:28:55.310 --> 01:28:57.650
And this is a bit clever,
but I'm going to do this.

01:28:57.650 --> 01:29:01.640
I want to append the next
image there, images[1].

01:29:01.640 --> 01:29:05.180
And now I want to specify a
duration of 200 milliseconds

01:29:05.180 --> 01:29:08.730
for each of these frames, and
I want this to loop forever.

01:29:08.730 --> 01:29:12.170
And if you specify
loop=0, that is time 0,

01:29:12.170 --> 01:29:15.620
it means it's just not going to
loop a finite number of times,

01:29:15.620 --> 01:29:18.080
but an infinite number of times instead.

01:29:18.080 --> 01:29:20.210
And I need to do one other thing.

01:29:20.210 --> 01:29:24.740
Recall that sys.argv
contains not just the words I

01:29:24.740 --> 01:29:29.960
typed after my program's name, but
what else does sys.argv contain?

01:29:29.960 --> 01:29:33.710
If you think back to our discussion
of command line arguments,

01:29:33.710 --> 01:29:38.240
what else is sys.argv besides
the words I'm about to type,

01:29:38.240 --> 01:29:41.510
like costume1.gif and costume2?

01:29:41.510 --> 01:29:45.530
AUDIENCE: Yeah, so we'll actually
get the original name of the program

01:29:45.530 --> 01:29:48.053
we want to run, the costumes.py.

01:29:48.053 --> 01:29:50.720
DAVID MALAN: Indeed, we'll get
the original name of the program,

01:29:50.720 --> 01:29:53.270
costumes.py in this case,
which is not a GIF, obviously.

01:29:53.270 --> 01:29:57.230
So remember that using slices
in Python, we can do this.

01:29:57.230 --> 01:30:01.670
If sys.argv is a list, and we want to
get a slice of that list, everything

01:30:01.670 --> 01:30:05.330
after the first element, we
can do 1, colon, which says,

01:30:05.330 --> 01:30:10.220
start at location 1, not 0, and
take a slice all the way to the end.

01:30:10.220 --> 01:30:12.620
So give me everything
except the first thing

01:30:12.620 --> 01:30:16.700
in that list, which, to McKenzie's
point, is the name of the program.

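NOTE
A sketch of the program as described so far, assuming the Pillow library is
installed. Wrapping the steps in a function named make_gif with an output
parameter is an illustrative choice, not the lecture's layout, and
append_images=images[1:] generalizes the lecture's images[1] to any number
of frames.
```python
from PIL import Image
def make_gif(paths, output="costumes.gif"):
    # Open each file named on the command line as a Pillow image.
    images = [Image.open(path) for path in paths]
    # Save the first frame and append the rest to it.
    images[0].save(
        output,
        save_all=True,
        append_images=images[1:],
        duration=200,  # milliseconds per frame
        loop=0,  # 0 means repeat forever
    )
```
In a script, this could be called as make_gif(sys.argv[1:]) after
import sys, so that the program's own name in sys.argv[0] is skipped.
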
01:30:16.700 --> 01:30:19.980
Now, if I haven't made any
mistakes, let's see what happens.

01:30:19.980 --> 01:30:22.880
I'm going to run python of
costumes.py, and now I'm

01:30:22.880 --> 01:30:25.400
going to specify the two
images that I want to animate--

01:30:25.400 --> 01:30:30.290
so costume1.gif and costume2.gif.

01:30:30.290 --> 01:30:32.240
What is the code now going to do?

01:30:32.240 --> 01:30:34.520
Well, to recap, we're
using the sys library

01:30:34.520 --> 01:30:36.380
to access those command line arguments.

01:30:36.380 --> 01:30:39.140
We're using the Pillow
library to treat those files

01:30:39.140 --> 01:30:42.680
as images and with all the functionality
that comes with that library.

01:30:42.680 --> 01:30:46.490
I'm using this images list just to
accumulate all of these images, one

01:30:46.490 --> 01:30:48.110
at a time from the command line.

01:30:48.110 --> 01:30:52.520
And in lines 7 through 9, I'm just
using a loop to iterate over all of them

01:30:52.520 --> 01:30:56.750
and just add them to this list
after opening them with the library.

01:30:56.750 --> 01:31:00.170
And the last step, which is really just
one line of code broken onto three so

01:31:00.170 --> 01:31:02.990
that it all fits, I'm going
to save the first image,

01:31:02.990 --> 01:31:07.340
but I'm asking the library to
append this other image to it

01:31:07.340 --> 01:31:09.550
as well-- not bracket 0, but bracket 1.

01:31:09.550 --> 01:31:12.010
And if I had more, I could
express those as well.

01:31:12.010 --> 01:31:14.260
I want to save all of
these files together.

01:31:14.260 --> 01:31:17.680
I want to pause 200
milliseconds-- a fifth of a second

01:31:17.680 --> 01:31:18.940
in between each frame.

01:31:18.940 --> 01:31:21.860
And I want it to loop
infinitely many times.

01:31:21.860 --> 01:31:27.520
So now if I cross my fingers
as always, hit Enter,

01:31:27.520 --> 01:31:30.710
nothing bad happened, and that's
almost always a good thing.

01:31:30.710 --> 01:31:38.480
Let me now run code of costumes.gif
to open up in VS Code the final image.

01:31:38.480 --> 01:31:42.610
And what I think I should
see is a very happy cat.

01:31:42.610 --> 01:31:43.510
And indeed.

01:31:43.510 --> 01:31:47.320
So now we've seen not only that we can
read and write files that are textual,

01:31:47.320 --> 01:31:51.405
but also that we can read and now write
files that are binary, zeros and ones.

01:31:51.405 --> 01:31:52.780
We've just scratched the surface.

01:31:52.780 --> 01:31:54.790
This is using the library called Pillow.

01:31:54.790 --> 01:31:58.120
But ultimately, this is going to give
us the ability to read and write files

01:31:58.120 --> 01:31:59.240
however we want.

01:31:59.240 --> 01:32:03.340
So we've now seen that via File I/O, we
can manipulate not just textual files,

01:32:03.340 --> 01:32:06.790
be it TXT files, or CSVs, but
even binary files as well.

01:32:06.790 --> 01:32:08.840
In this case, they happen to be images.

01:32:08.840 --> 01:32:11.950
But if we dived in deeper, we
could explore audio, and video,

01:32:11.950 --> 01:32:15.400
and so much more all by way of these
simple primitives, this ability,

01:32:15.400 --> 01:32:18.250
somehow, to read and write files.

01:32:18.250 --> 01:32:19.460
That's it for now.

01:32:19.460 --> 01:32:21.840
We'll see you next time.