1
00:00:00,000 --> 00:00:02,620
[Week 7, Continued]
2
00:00:02,620 --> 00:00:05,090
[David J. Malan, Harvard University]
3
00:00:05,090 --> 00:00:07,780
[This is CS50.] [CS50.TV]
4
00:00:07,780 --> 00:00:09,810
All right. Welcome back. This is CS50,
5
00:00:09,810 --> 00:00:12,100
and this is the end of week 7.
6
00:00:12,100 --> 00:00:15,460
So one of these stupid little things that goes around the Internet
7
00:00:15,460 --> 00:00:24,080
and we slurped up, and it should now make a little bit of geeky sense to you.
8
00:00:24,080 --> 00:00:28,330
Well, it was funnier to this guy than it was to you guys.
9
00:00:28,330 --> 00:00:32,619
Speaking of, well, guys,
10
00:00:32,619 --> 00:00:42,550
today is Nate's birthday.
11
00:00:42,550 --> 00:00:46,630
To give you a sense of just how good Nate and I are
12
00:00:46,630 --> 00:00:50,140
at web development based on Monday's class and based now on this,
13
00:00:50,140 --> 00:00:53,170
I thought I'd pull up Nate's home page, if you haven't seen it yet.
14
00:00:53,170 --> 00:00:57,020
This here is Nate's HTML.
15
00:00:57,020 --> 00:00:59,380
So see his source code if you'd like to see how to do this, and Nate,
16
00:00:59,380 --> 00:01:02,250
if we could embarrass you just briefly, the staff got you a little something
17
00:01:02,250 --> 00:01:06,080
if you'd like to share some dessert with some of the kids in the class here.
18
00:01:06,080 --> 00:01:10,150
If you'd like to come on down.
19
00:01:10,150 --> 00:01:14,350
You all applaud and are very nice, but no one is sitting anywhere near Nate,
20
00:01:14,350 --> 00:01:17,560
for some reason, in that back zone.
21
00:01:17,560 --> 00:01:24,020
So perhaps you can find some folks to enjoy these with.
22
00:01:24,020 --> 00:01:33,380
Happy Birthday, Nate.
23
00:01:33,380 --> 00:01:37,660
>> Additional hellos: We showed a couple clips from our CS50x students.
24
00:01:37,660 --> 00:01:39,710
If you would like to see who else it is in the world
25
00:01:39,710 --> 00:01:41,850
that's following along, you can head to this URL,
26
00:01:41,850 --> 00:01:45,780
where Joseph, one of our TFs, has put together a montage of sorts
27
00:01:45,780 --> 00:01:50,290
of everyone who has been submitting these videos, among them Rick Astley.
28
00:01:50,290 --> 00:01:53,010
And if you scroll through these, it's really quite inspiring
29
00:01:53,010 --> 00:01:56,890
to see the diversity of countries and cities from which people are hailing.
30
00:01:56,890 --> 00:02:00,830
So if you'd like to take a look at that, that will be up through the end of the semester.
31
00:02:00,830 --> 00:02:05,370
Today we continue our look at the Web, web programming, HTML and the like,
32
00:02:05,370 --> 00:02:08,280
and we also have lunch coming up this Friday
33
00:02:08,280 --> 00:02:11,360
if you would like to join, particularly if you have not done so before.
34
00:02:11,360 --> 00:02:13,630
This Friday's theme will be Nate's birthday,
35
00:02:13,630 --> 00:02:15,700
so if you would like to have birthday lunch with Nate
36
00:02:15,700 --> 00:02:17,500
and others, some of our friends from industry,
37
00:02:17,500 --> 00:02:19,300
please head to that URL there.
38
00:02:19,300 --> 00:02:22,510
Space, as always, is limited. Also, if you've forgotten,
39
00:02:22,510 --> 00:02:26,460
realize that next week is the deadline for problem set 4's scavenger hunt,
40
00:02:26,460 --> 00:02:30,070
whereby after recovering all of those JPEGs from card.raw,
41
00:02:30,070 --> 00:02:32,880
you and your section mates, if you would like, can try photographing
42
00:02:32,880 --> 00:02:36,100
as many of the computer scientists from that memory card as possible,
43
00:02:36,100 --> 00:02:39,070
and you and your section will then win a fabulous prize.
44
00:02:39,070 --> 00:02:44,470
Refer back to pset 4's specification as to what to submit and by when.
45
00:02:44,470 --> 00:02:47,650
Also, if you would like to have your handiwork immortalized
46
00:02:47,650 --> 00:02:51,400
on the course's website and its history of apparel,
47
00:02:51,400 --> 00:02:54,010
know that you are welcome now to start submitting designs
48
00:02:54,010 --> 00:02:57,180
for this year's T-shirts and sweatshirts and the like.
49
00:02:57,180 --> 00:02:59,200
We'll do our best to include as many as we can,
50
00:02:59,200 --> 00:03:01,440
but we'll have some members of the staff review all of the designs
51
00:03:01,440 --> 00:03:04,180
to make sure they're consistent with the specifications,
52
00:03:04,180 --> 00:03:07,500
and we then pick generally a handful of them to be exhibited.
53
00:03:07,500 --> 00:03:10,620
So if you are the design type, just know that the requirements
54
00:03:10,620 --> 00:03:14,030
for graphics are PNG, at least 200 DPI;
55
00:03:14,030 --> 00:03:16,520
they shouldn't be more than 4000 x 4000 pixels,
56
00:03:16,520 --> 00:03:19,010
and no more than 10 MB, but you're welcome to use things like
57
00:03:19,010 --> 00:03:22,430
Photoshop or GIMP or various graphics programs,
58
00:03:22,430 --> 00:03:24,590
whatever you have at your disposal.
59
00:03:24,590 --> 00:03:28,280
>> Also on the horizon is the final project. The final project really is the climax of 50,
60
00:03:28,280 --> 00:03:30,560
whereby of all the assignments in the course,
61
00:03:30,560 --> 00:03:33,170
it's your opportunity really to do your own thing.
62
00:03:33,170 --> 00:03:35,280
And that can be simply to do something for fun,
63
00:03:35,280 --> 00:03:38,160
it can be to solve some pressing problem your student group has,
64
00:03:38,160 --> 00:03:40,980
for some new website, some new collection mechanism for data.
65
00:03:40,980 --> 00:03:43,420
It can be a mobile application for Android, for iOS.
66
00:03:43,420 --> 00:03:46,030
Really, the sky is the limit, and over the next few weeks,
67
00:03:46,030 --> 00:03:50,900
as we transition from C to these higher-level languages like PHP and JavaScript,
68
00:03:50,900 --> 00:03:55,150
you'll find yourself increasingly familiarized with some real-world techniques,
69
00:03:55,150 --> 00:03:57,800
some real-world tools, and to supplement that,
70
00:03:57,800 --> 00:04:00,170
know that the course has a history of seminars,
71
00:04:00,170 --> 00:04:02,880
whereby over the next several weeks, some of the teaching staff
72
00:04:02,880 --> 00:04:06,160
and friends of ours from on campus will offer optional seminars
73
00:04:06,160 --> 00:04:08,540
which go above and beyond what's typically done in section
74
00:04:08,540 --> 00:04:11,090
to introduce you to things like Android programming,
75
00:04:11,090 --> 00:04:13,450
to introduce you to things like iOS programming
76
00:04:13,450 --> 00:04:15,950
or more advanced web-development techniques.
77
00:04:15,950 --> 00:04:17,970
There's a whole history of these already online.
78
00:04:17,970 --> 00:04:25,000
If you go to cs50.net/seminars, we've been doing this for quite some years,
79
00:04:25,000 --> 00:04:28,740
and you'll see that archived here with PDFs and videos and the like
80
00:04:28,740 --> 00:04:33,090
are several dozen videos of seminars.
81
00:04:33,090 --> 00:04:37,380
Last year, for instance, we had a seminar on acing your technical interviews,
82
00:04:37,380 --> 00:04:40,980
if you're actually looking to go off and do an internship or full-time gig.
83
00:04:40,980 --> 00:04:43,450
Windows mobile development, Android development, the Google Maps API,
84
00:04:43,450 --> 00:04:47,700
CSS, developing for the BlackBerry, Emacs.
85
00:04:47,700 --> 00:04:52,610
Really, you are welcome to take a look at any of these seminars at your convenience.
86
00:04:52,610 --> 00:04:57,080
And we'll be holding some new ones this semester, as well.
87
00:04:57,080 --> 00:04:59,020
>> So what is ahead with the final project?
88
00:04:59,020 --> 00:05:01,090
Well, first, even though this date is somewhat imminent,
89
00:05:01,090 --> 00:05:06,460
this is really just an opportunity to start thinking about the final project quite realistically.
90
00:05:06,460 --> 00:05:10,550
We know only the beginnings of some of what we'll still be covering in the course--
91
00:05:10,550 --> 00:05:13,470
HTML, PHP and the like--but you're all familiar with the Web,
92
00:05:13,470 --> 00:05:16,270
and I bias this conversation toward the Web only because
93
00:05:16,270 --> 00:05:18,380
most people end up doing Web-based final projects,
94
00:05:18,380 --> 00:05:20,260
but that is by no means requisite.
95
00:05:20,260 --> 00:05:22,260
Using C is fine, Objective-C, Java,
96
00:05:22,260 --> 00:05:25,350
any other language you might know or want to know is quite fine.
97
00:05:25,350 --> 00:05:29,370
But to get the juices flowing initially, we'll expect the submission of a preproposal
98
00:05:29,370 --> 00:05:33,520
which, per the PDF on the website, which is now at cs50.net,
99
00:05:33,520 --> 00:05:36,080
and at the top left you'll see final project
100
00:05:36,080 --> 00:05:38,920
is the specification for the final project,
101
00:05:38,920 --> 00:05:41,470
and in there are details on the preproposal and the like.
102
00:05:41,470 --> 00:05:44,760
It pretty much boils down to an email to your teaching fellow
103
00:05:44,760 --> 00:05:48,450
just to strike up a conversation with him or her about what you're thinking.
104
00:05:48,450 --> 00:05:52,510
On projects.cs50.net is a repository of ideas from folks on campus
105
00:05:52,510 --> 00:05:54,480
if you're struggling to come up with some idea,
106
00:05:54,480 --> 00:06:01,140
and manual.cs50.net/apis is a repository of links to APIs.
107
00:06:01,140 --> 00:06:06,710
>> What, though, is an API?
108
00:06:06,710 --> 00:06:09,790
What's an API? I've said it at least twice,
109
00:06:09,790 --> 00:06:12,640
according to the transcripts of the past several weeks.
110
00:06:12,640 --> 00:06:17,050
What's that? [Student, unintelligible]
111
00:06:17,050 --> 00:06:19,340
>>Okay, good. So something programming interface.
112
00:06:19,340 --> 00:06:22,710
Application programming interface, and this can take several forms,
113
00:06:22,710 --> 00:06:25,850
but what this really boils down to is code
114
00:06:25,850 --> 00:06:29,660
that someone else has written or data that someone else has collected
115
00:06:29,660 --> 00:06:33,670
that is made available to you in some programmatic way.
116
00:06:33,670 --> 00:06:36,630
You can write code in C, PHP, Python, Ruby,
117
00:06:36,630 --> 00:06:38,760
whatever your language of choice typically is,
118
00:06:38,760 --> 00:06:42,240
and you can somehow build upon someone else's functionality
119
00:06:42,240 --> 00:06:44,440
or someone else's data set.
120
00:06:44,440 --> 00:06:47,210
For instance, if I go to this link here,
121
00:06:47,210 --> 00:06:50,750
you'll see a pair of links on the subsequent page
122
00:06:50,750 --> 00:06:56,093
whereby we have CS50's own APIs, which are very Harvard-centric, and then third-party APIs.
123
00:06:56,930 --> 00:06:59,300
Among the third-party APIs are really useful things
124
00:06:59,300 --> 00:07:01,780
like being able to send SMS's to people,
125
00:07:01,780 --> 00:07:04,690
being able to receive SMS text messages from people.
126
00:07:04,690 --> 00:07:08,160
And things like that that you might have no idea how to implement yourself,
127
00:07:08,160 --> 00:07:10,440
but thanks to services, some free and some commercial,
128
00:07:10,440 --> 00:07:14,000
you can build atop those and do something of interest to you.
129
00:07:14,000 --> 00:07:16,990
Among CS50's APIs are these campus-centric things like
130
00:07:16,990 --> 00:07:21,480
Harvard courses, energy, events, food, maps, news, tweets, and Shuttleboy's own,
131
00:07:21,480 --> 00:07:23,940
and these are APIs that look a little something like this.
132
00:07:23,940 --> 00:07:26,990
>> Let me pull up the HarvardFood API.
133
00:07:26,990 --> 00:07:30,620
If you've ever been to HUDS's website, you've probably been there
134
00:07:30,620 --> 00:07:35,410
to just see what's for dinner or to see what the hours are for some d-hall.
135
00:07:35,410 --> 00:07:38,000
Well, it's not particularly easy to navigate,
136
00:07:38,000 --> 00:07:41,100
and so what we did some time ago was we wrote software--
137
00:07:41,100 --> 00:07:47,270
it happens to be in PHP--that actually screen scrapes the entirety of HUDS's website.
138
00:07:47,270 --> 00:07:51,400
To screen scrape something means to write a program in a language like PHP
139
00:07:51,400 --> 00:07:55,270
that pretends to be a browser, even though you might run it at a command prompt,
140
00:07:55,270 --> 00:07:58,180
that pretends to be a browser, connects to a website,
141
00:07:58,180 --> 00:08:01,480
downloads its HTML, the language in which it's written,
142
00:08:01,480 --> 00:08:04,300
and then reads it, or more specifically, parses it
143
00:08:04,300 --> 00:08:06,140
top to bottom, left to right.
144
00:08:06,140 --> 00:08:08,870
And what we did was we wrote our code in such a way that
145
00:08:08,870 --> 00:08:12,910
any time we saw something in that HTML that looked like something on the menu,
146
00:08:12,910 --> 00:08:16,470
like hamburger, we would then import that into our own database.
147
00:08:16,470 --> 00:08:20,410
And any time we saw nutritional content, we would import that into our own database.
148
00:08:20,410 --> 00:08:23,090
And what we did was leverage the fact that HUDS's website,
149
00:08:23,090 --> 00:08:27,280
even though it might be a bit of a challenge for us humans to navigate
150
00:08:27,280 --> 00:08:32,559
underneath the hood, all of the HTML is generated by their own computer programs.
151
00:08:32,559 --> 00:08:35,159
So all of their HTML, even though it might look messy,
152
00:08:35,159 --> 00:08:38,026
like most websites underneath the hood, it follows a pattern.
153
00:08:38,260 --> 00:08:40,799
So we just spent a couple hours figuring out that pattern
154
00:08:40,799 --> 00:08:44,240
so that in the end, we throw away all of the messy HTML,
155
00:08:44,240 --> 00:08:47,340
all of the aesthetics of bold facing and italics and the like,
156
00:08:47,340 --> 00:08:52,350
and what we are then able to do is expose that same data.
157
00:08:52,350 --> 00:08:54,870
For instance, in this way.
158
00:08:54,870 --> 00:08:56,840
So we, according to the documentation here,
159
00:08:56,840 --> 00:08:59,190
have informed the world that if you request a URL
160
00:08:59,190 --> 00:09:03,310
that looks like this, food.cs50.net/something,
161
00:09:03,310 --> 00:09:07,220
and you provide certain parameters, which we'll talk about today,
162
00:09:07,220 --> 00:09:11,780
like end-date time, start-date time, meal, and so forth,
163
00:09:11,780 --> 00:09:14,090
what our servers will return to you, for instance,
164
00:09:14,090 --> 00:09:18,740
is a CSV file, comma-separated values, like an Excel file,
165
00:09:18,740 --> 00:09:23,140
containing everything for breakfast on this particular date in March of last year
166
00:09:23,140 --> 00:09:25,450
when I happened to write up this documentation.
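A minimal sketch of the screen-scraping step described above, in Python rather than the PHP the staff used. Everything specific here is an assumption for illustration: the sample markup, the `menu-item` class name, and the `MenuScraper` and `scrape_menu` names are hypothetical, and a real scraper would first download the page and match whatever pattern the real site happens to follow.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded dining-hall page.
SAMPLE_HTML = """
<html><body>
  <b>Lunch</b>
  <span class="menu-item">Hamburger</span>
  <i>served daily</i>
  <span class="menu-item">Veggie Burger</span>
</body></html>
"""


class MenuScraper(HTMLParser):
    """Collects the text of every element marked class="menu-item"."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if ("class", "menu-item") in attrs:
            self.in_item = True

    def handle_endtag(self, tag):
        # Any close tag ends the current item (items are not nested here).
        self.in_item = False

    def handle_data(self, data):
        # Keep only text found inside a menu item; skip pure whitespace.
        if self.in_item and data.strip():
            self.items.append(data.strip())


def scrape_menu(html):
    """Parse the HTML top to bottom and return the menu items found."""
    scraper = MenuScraper()
    scraper.feed(html)
    return scraper.items


if __name__ == "__main__":
    print(scrape_menu(SAMPLE_HTML))
```

The bold and italic text is thrown away and only the items that fit the pattern survive, which is the same idea as importing just the menu data into a database.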
167
00:09:25,450 --> 00:09:27,870
>> For those familiar, CSV is not the only file format.
168
00:09:27,870 --> 00:09:30,610
There's another format that's all the more versatile
169
00:09:30,610 --> 00:09:32,670
called JSON, JavaScript Object Notation.
170
00:09:32,670 --> 00:09:34,770
The data can come back in that format.
171
00:09:34,770 --> 00:09:38,110
So the takeaway here is that whether you dive into this API
172
00:09:38,110 --> 00:09:41,170
or any other of CS50's or anything out there on the Internet,
173
00:09:41,170 --> 00:09:45,560
or not at all, realize that the world has increasingly started to standardize
174
00:09:45,560 --> 00:09:47,670
how machines intercommunicate.
175
00:09:47,670 --> 00:09:50,660
We use standard data formats like CSV or JSON.
176
00:09:50,660 --> 00:09:54,320
And what this means for you is you can write the interesting part of a program
177
00:09:54,320 --> 00:09:56,580
that lets your user search a dining hall menu,
178
00:09:56,580 --> 00:10:00,010
that lets them create lists of favorites that lets them get text alerts
179
00:10:00,010 --> 00:10:02,480
when their favorite meal is about to be served in some d-hall
180
00:10:02,480 --> 00:10:07,090
by using someone else's data sets and building on top of their APIs.
181
00:10:07,090 --> 00:10:13,600
So more on that in the form of seminars and the documentation that you have here online.
182
00:10:13,600 --> 00:10:16,450
So those, then, are APIs.
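To make the idea of parameters and standard data formats concrete, here is a small sketch in Python. The endpoint (`example.com/menus`), the parameter names (`meal`, `date`, `output`), the date, and the sample responses are all made up for illustration; the real API's documentation defines the actual URL and parameters. No network request is made.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://example.com/menus"


def build_url(meal, date, output):
    """Return a request URL with the parameters encoded in the query string."""
    return BASE_URL + "?" + urlencode({"meal": meal, "date": date, "output": output})


def parse_csv(text):
    """Turn CSV text (first row is the header) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Turn JSON text into Python lists and dicts."""
    return json.loads(text)


# Made-up responses standing in for what a server might send back.
SAMPLE_CSV = "name,meal\nScrambled Eggs,BREAKFAST\nOatmeal,BREAKFAST\n"
SAMPLE_JSON = (
    '[{"name": "Scrambled Eggs", "meal": "BREAKFAST"},'
    ' {"name": "Oatmeal", "meal": "BREAKFAST"}]'
)

if __name__ == "__main__":
    print(build_url("BREAKFAST", "2012-03-21", "csv"))
    # Both formats carry the same records once parsed.
    print(parse_csv(SAMPLE_CSV) == parse_json(SAMPLE_JSON))
```

Either format ends up as the same list of records once parsed, which is why a program can focus on the interesting part and not on how the data was collected.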
183
00:10:16,450 --> 00:10:18,900
>> That brings us back to HTML. Quick recap.
184
00:10:18,900 --> 00:10:22,920
What is HTML?
185
00:10:22,920 --> 00:10:25,000
[Student, unintelligible] >>Good. HyperText Markup Language.
186
00:10:25,000 --> 00:10:31,300
Someone else, what is HyperText Markup Language?
187
00:10:31,300 --> 00:10:37,340
HyperText Markup Language.
188
00:10:37,340 --> 00:10:40,330
Okay. So HTML, HyperText.
189
00:10:40,330 --> 00:10:43,100
HyperText just refers to the Web, for the most part.
190
00:10:43,100 --> 00:10:45,730
Markup means that it's not actually a programming language, HTML.
191
00:10:45,730 --> 00:10:48,120
It's not a language that you can express logic in.
192
00:10:48,120 --> 00:10:50,710
It doesn't have loops. It doesn't have conditions.
193
00:10:50,710 --> 00:10:52,820
It doesn't have functions, per se.
194
00:10:52,820 --> 00:10:56,680
Rather, it has these things called tags, or, more properly, elements.
195
00:10:56,680 --> 00:10:59,970
And those elements have start tags and end tags,
196
00:10:59,970 --> 00:11:04,300
or open tags and closed tags, and what those tags generally mean for a browser is,
197
00:11:04,300 --> 00:11:09,270
start doing something and then stop doing something, though there are exceptions to that.
198
00:11:09,270 --> 00:11:12,480
Sometimes it's just "put a line break here," for instance.
199
00:11:12,480 --> 00:11:15,150
And we saw examples of that the other day, between bold facing,
200
00:11:15,150 --> 00:11:17,430
line breaks, and then a couple of other tags.
201
00:11:17,430 --> 00:11:19,880
So HTML is the language in which web pages are written.
202
00:11:19,880 --> 00:11:23,760
So if I go to something like Google.com
203
00:11:23,760 --> 00:11:26,180
and pull up just their home page,
204
00:11:26,180 --> 00:11:29,690
recall that if you right click or control click
205
00:11:29,690 --> 00:11:32,140
and look at view page source, typically
206
00:11:32,140 --> 00:11:34,420
it's a complete mess these days underneath the hood, but that's because
207
00:11:34,420 --> 00:11:38,170
computers don't care about white space, so this doesn't have to look pretty.
208
00:11:38,170 --> 00:11:40,240
But if we zoom in on parts of it,
209
00:11:40,240 --> 00:11:43,460
notice that Chrome, just to be nice, has color coded things.
210
00:11:43,460 --> 00:11:48,460
Indeed, this is the very first tag that we saw in a web page.
211
00:11:48,460 --> 00:11:51,750
And again, HTML 5, the latest version of this language,
212
00:11:51,750 --> 00:11:53,830
does have this thing at the beginning,
213
00:11:53,830 --> 00:11:57,820
<!DOCTYPE html>,
214
00:11:57,820 --> 00:12:03,580
but that's just sort of a standard that says, "Hey world, here comes an HTML file in version 5."
215
00:12:03,580 --> 00:12:08,920
>> The interesting part begins here. So <html>, and we saw two main children
216
00:12:08,920 --> 00:12:11,640
of the HTML element last time.
217
00:12:11,640 --> 00:12:14,630
What were those two main children?
218
00:12:14,630 --> 00:12:17,170
Head and body, just like the guy with the tattoo a moment ago.
219
00:12:17,170 --> 00:12:19,640
There's two portions of a web page, head and body,
220
00:12:19,640 --> 00:12:23,750
and recall, then, that perhaps the simplest web page we could make looks like this.
221
00:12:23,750 --> 00:12:27,460
And I've indented it just to be kind of neat and tidy with my code,
222
00:12:27,460 --> 00:12:30,710
but what's really important here is that there is some hierarchy to this.
223
00:12:30,710 --> 00:12:35,420
And any tag that I've opened I have closed and that there's therefore this symmetry
224
00:12:35,420 --> 00:12:38,300
to all of the markup that I've created.
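The hierarchy and symmetry just described can be checked mechanically. The sketch below, in Python, holds a minimal page as a string and verifies that every tag opened is closed in the reverse order. The `NestingChecker` and `is_balanced` names are hypothetical, and the check is deliberately simple: it does not account for elements such as `<br>` that have no close tag.

```python
from html.parser import HTMLParser

# A minimal page, assumed to resemble the one shown in lecture.
SIMPLE_PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>hello, world</title>
  </head>
  <body>
    hello, world
  </body>
</html>
"""


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack, mirroring what the indentation suggests."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A close tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(html):
    """Return True if every opened tag is closed, innermost first."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.balanced and not checker.stack


if __name__ == "__main__":
    print(is_balanced(SIMPLE_PAGE))
    print(is_balanced("<html><head></html>"))
```

Browsers are far more forgiving than this checker, but writing markup that passes such a check is what keeps a page's structure predictable.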
225
00:12:38,300 --> 00:12:41,620
So last time we started writing web pages on my own laptop.
226
00:12:41,620 --> 00:12:45,470
I opened up TextEdit, I saved the file as hello.html,
227
00:12:45,470 --> 00:12:50,190
I then dragged the file onto my browser, and voila, I had a page on the Internet.
228
00:12:50,190 --> 00:12:53,110
Now, it's not quite the case; I had a page on my hard drive,
229
00:12:53,110 --> 00:12:58,260
and I was literally the only person in the world who would see that web page in a browser.
230
00:12:58,260 --> 00:13:00,670
>> So today, we introduce an actual web server
231
00:13:00,670 --> 00:13:02,750
and the notion of actually serving content on the Internet
232
00:13:02,750 --> 00:13:04,970
and how this all starts to fit together.
233
00:13:04,970 --> 00:13:08,350
So it turns out that all this time in the CS50 appliance
234
00:13:08,350 --> 00:13:11,590
you have had a web server on your computer.
235
00:13:11,590 --> 00:13:16,560
We have, in fairness, only used it for gedit, for Clang, for GDB and the like,
236
00:13:16,560 --> 00:13:21,000
but also installed by us for you in the appliance is a web server,
237
00:13:21,000 --> 00:13:23,940
and that web server happens to be free and open source,
238
00:13:23,940 --> 00:13:26,580
one of the most popular ones in the world, called Apache.
239
00:13:26,580 --> 00:13:31,340
Its more technical name is httpd, the d being for daemon here,
240
00:13:31,340 --> 00:13:34,110
which is a technical word for a server.
241
00:13:34,110 --> 00:13:38,690
So installed in the CS50 appliance is a web server, and what does that mean?
242
00:13:38,690 --> 00:13:43,740
Well, a web server is, conceptually, some server on the Internet that serves up web content.
243
00:13:43,740 --> 00:13:48,630
When asked for a file, it spits out the HTML that composes that file, and voila.
244
00:13:48,630 --> 00:13:51,370
You see some website's home page.
245
00:13:51,370 --> 00:13:54,970
But a server is, more precisely, a piece of software.
246
00:13:54,970 --> 00:13:59,190
It doesn't have to be on a physical machine, it just has to be a piece of software running.
247
00:13:59,190 --> 00:14:01,980
So the CS50 Appliance, of course, is a piece of software,
248
00:14:01,980 --> 00:14:04,270
even though it's sort of pretending to be a machine.
249
00:14:04,270 --> 00:14:06,960
It's pretending to be a computer inside of a computer,
250
00:14:06,960 --> 00:14:11,140
but that just means that the appliance can certainly run things like web servers.
251
00:14:11,140 --> 00:14:13,260
It can actually run email servers.
252
00:14:13,260 --> 00:14:16,440
We could run an instant messaging server in the appliance if we wanted to,
253
00:14:16,440 --> 00:14:20,780
and indeed, we do run one other type of server, known as a database server, MySQL.
254
00:14:20,780 --> 00:14:22,620
But more on that next week.
255
00:14:22,620 --> 00:14:26,400
This means that I can actually visit web pages
256
00:14:26,400 --> 00:14:30,480
inside of my appliance by using a browser inside the appliance
257
00:14:30,480 --> 00:14:33,600
or even on my own laptop, my Mac or my PC.
258
00:14:33,600 --> 00:14:37,780
So what does this mean? It turns out that any time you're running a Linux computer,
259
00:14:37,780 --> 00:14:40,910
its nickname is "localhost."
260
00:14:40,910 --> 00:14:43,370
It doesn't have a domain name because we haven't bought a domain name
261
00:14:43,370 --> 00:14:46,590
for something like the appliance, so its default name is localhost.
262
00:14:46,590 --> 00:14:50,470
>> But in order to get the appliance to start serving up web pages,
263
00:14:50,470 --> 00:14:52,270
we have to create them first.
264
00:14:52,270 --> 00:14:55,200
So let's do that. Let me go into a terminal window here
265
00:14:55,200 --> 00:14:58,190
and notice that I'm at my typical John Harvard prompt.
266
00:14:58,190 --> 00:15:01,670
Let me go ahead and type ls, and we'll see some familiar things from this semester,
267
00:15:01,670 --> 00:15:04,580
desktop, downloads, dropbox and so forth,
268
00:15:04,580 --> 00:15:07,540
but now we start turning our attention to a couple.
269
00:15:07,540 --> 00:15:11,530
On many Linux web servers there's this folder called public_html,
270
00:15:11,530 --> 00:15:15,630
but we're going to skip that one for now and focus on this, vhosts.
271
00:15:15,630 --> 00:15:18,850
Anyone know what a vhost is?
272
00:15:18,850 --> 00:15:21,110
Just stupid jargon for virtual host,
273
00:15:21,110 --> 00:15:23,850
and what this means is that on a typical server
274
00:15:23,850 --> 00:15:26,810
you can actually host multiple websites.
275
00:15:26,810 --> 00:15:31,500
You can buy a domain name like foo.com, and you can host it on a server.
276
00:15:31,500 --> 00:15:36,100
But you can also buy bar.com and host it on the same server.
277
00:15:36,100 --> 00:15:40,250
The reason being, browsers are smart enough to inform the server
278
00:15:40,250 --> 00:15:45,880
when a user is requesting some webpage, what domain name the user wants the homepage for.
279
00:15:45,880 --> 00:15:48,760
So what's nice about this is you don't need one physical server
280
00:15:48,760 --> 00:15:52,040
or one CS50 appliance for every website you might want to create.
281
00:15:52,040 --> 00:15:55,520
You can use the same server and develop a hundred different websites.
282
00:15:55,520 --> 00:15:58,770
And indeed, if you are a person trying to start a website,
283
00:15:58,770 --> 00:16:02,100
whether for fun or for business, typically you'll go out on the Internet
284
00:16:02,100 --> 00:16:04,650
and you'll pay someone ten bucks a month, a hundred dollars a month,
285
00:16:04,650 --> 00:16:06,670
to host your website for you.
286
00:16:06,670 --> 00:16:11,060
And the way that works is they are charging other people
287
00:16:11,060 --> 00:16:13,160
ten bucks a month or a hundred bucks a month
288
00:16:13,160 --> 00:16:17,200
to host other people's websites on their same server.
289
00:16:17,200 --> 00:16:20,740
The reason they can do that is because of this feature called vhosts,
290
00:16:20,740 --> 00:16:23,790
but more on that when it comes time for final projects.
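Here is a rough sketch, in Python, of how one server can tell sites apart: the browser names the domain it wants in a `Host` header, and the server looks that name up. This is not how Apache is actually configured; the `VHOSTS` table, the folder paths, and the function names are hypothetical and only illustrate the lookup.

```python
# Hypothetical mapping from domain name to the folder holding that site's files.
VHOSTS = {
    "foo.com": "/home/jharvard/vhosts/foo.com/html",
    "bar.com": "/home/jharvard/vhosts/bar.com/html",
}

# Hypothetical fallback when the requested name is unknown or missing.
DEFAULT_ROOT = "/home/jharvard/vhosts/localhost/html"


def parse_host(request):
    """Pull the Host header's value out of a raw HTTP request string."""
    # Skip the request line, then read headers until the blank line.
    for line in request.split("\r\n")[1:]:
        if line == "":
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional :port suffix and normalize the case.
            return value.strip().split(":")[0].lower()
    return None


def document_root(request):
    """Choose which site's folder should answer this request."""
    return VHOSTS.get(parse_host(request), DEFAULT_ROOT)


if __name__ == "__main__":
    print(document_root("GET / HTTP/1.1\r\nHost: bar.com\r\n\r\n"))
```

Because the lookup is just a table, adding a hundred more sites to the same machine means adding a hundred more entries, not a hundred more servers.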
291
00:16:23,790 --> 00:16:28,360
>> For now, let's just dive in there. So cd vhosts, and if I type ls now,
292
00:16:28,360 --> 00:16:31,370
notice that there's a folder in there called localhost.
293
00:16:31,370 --> 00:16:33,440
That's because, by default, the appliance figures
294
00:16:33,440 --> 00:16:36,160
you're only ever going to run one website on an appliance.
295
00:16:36,160 --> 00:16:38,970
This isn't really the real world; it's not a real-world web server.
296
00:16:38,970 --> 00:16:41,690
So let me go into localhost, and now we'll see in there
297
00:16:41,690 --> 00:16:44,290
one last directory called html.
298
00:16:44,290 --> 00:16:47,080
So it's a little deep, the hierarchy, but if and when
299
00:16:47,080 --> 00:16:51,230
you decide to start developing multiple websites over the next n months or years,
300
00:16:51,230 --> 00:16:54,370
this kind of folder structure tends to be helpful.
301
00:16:54,370 --> 00:16:56,560
Now let's go into html as I just did,
302
00:16:56,560 --> 00:16:59,010
type ls, and nothing is there.
303
00:16:59,010 --> 00:17:01,390
So now let's go ahead and do this. Let me open up Chrome
304
00:17:01,390 --> 00:17:07,300
inside of the appliance, and let me go to http://localhost.
305
00:17:07,300 --> 00:17:14,440
So literally the name for my appliance, enter, and I get index of /.
306
00:17:14,440 --> 00:17:18,290
This isn't really showing me anything of interest,
307
00:17:18,290 --> 00:17:23,400
but it turns out that what we're seeing is that folder, html.
308
00:17:23,400 --> 00:17:25,770
There's nothing inside that folder right now,
309
00:17:25,770 --> 00:17:28,750
so instead, what I'm going to have to do is first create a file.
310
00:17:28,750 --> 00:17:33,530
Create an HTML file like we did on Monday, but this time put it inside of the appliance.
311
00:17:33,530 --> 00:17:36,830
For those of you who are trying to follow along with laptops now,
312
00:17:36,830 --> 00:17:42,040
let me do one aside that'll be covered in the web-based pset,
313
00:17:42,040 --> 00:17:44,280
but in order to get this to work for the very first time,
314
00:17:44,280 --> 00:17:49,830
you're going to have to run this command: sudo service httpd start.
315
00:17:49,830 --> 00:17:52,670
And this, again, will be repeated in the last pset,
316
00:17:52,670 --> 00:17:55,460
but if you're playing along at home now, the web server
317
00:17:55,460 --> 00:17:58,660
is turned off in the appliance, and that's so that it doesn't sap up RAM
318
00:17:58,660 --> 00:18:01,960
and memory for 7 weeks out of the semester when we don't need it.
319
00:18:01,960 --> 00:18:05,190
So you need to run this command once, and you'll get an output like that.
320
00:18:05,190 --> 00:18:07,920
Then you should be able to play along here.
321
00:18:07,920 --> 00:18:10,330
Now let's go back into this folder.
322
00:18:10,330 --> 00:18:12,770
This folder is empty, so let me start creating a file,
323
00:18:12,770 --> 00:18:16,360
gedit hello.html.
324
00:18:16,360 --> 00:18:20,930
>> All right. Gedit is open, as usual. Let me do doctype, html,
325
00:18:20,930 --> 00:18:25,270
html, let me get ahead of myself and start closing my tags in advance.
326
00:18:25,270 --> 00:18:28,380
Now I have the head. Let me go ahead and close the head,
327
00:18:28,380 --> 00:18:32,450
let me now do the title of the page, hello world like last time,
328
00:18:32,450 --> 00:18:34,790
close title, now let me do a body.
329
00:18:34,790 --> 00:18:38,130
In here I'll say hello, world with some exclams
330
00:18:38,130 --> 00:18:40,550
to make clear that it's a different string.
331
00:18:40,550 --> 00:18:45,800
Close body, and now let me go ahead and File, Save.
332
00:18:45,800 --> 00:18:48,470
Let me go back to my terminal window, and if I type ls,
333
00:18:48,470 --> 00:18:51,830
I should, presumably, see hello.html. And I do.
334
00:18:51,830 --> 00:18:55,070
So now let's go back to my browser, click reload,
335
00:18:55,070 --> 00:18:58,930
and you can see we are indeed inside of this html folder.
336
00:18:58,930 --> 00:19:02,310
I'm not seeing a web page yet; this is Apache, the web server,
337
00:19:02,310 --> 00:19:04,670
just showing me a listing of the contents of this directory.
338
00:19:04,670 --> 00:19:08,260
Just like Mac OS or Windows would typically do on your own local hard drive.
339
00:19:08,260 --> 00:19:12,730
So if I want to see this web page, I can click this little link here, hello.html,
340
00:19:12,730 --> 00:19:15,160
and indeed, that's what I was expecting to see.
341
00:19:15,160 --> 00:19:18,080
Now, again, this is not a URL that any of you can visit right now,
342
00:19:18,080 --> 00:19:20,760
because for you, localhost, if you have a laptop here,
343
00:19:20,760 --> 00:19:23,050
it is referring to your own instance of the appliance.
344
00:19:23,050 --> 00:19:25,900
This is on my own personal appliance,
345
00:19:25,900 --> 00:19:29,080
but it's kind of dumb to have
346
00:19:29,080 --> 00:19:34,480
a user like myself click on hello.html to actually see the contents of this page.
347
00:19:34,480 --> 00:19:42,590
It turns out that web servers like Apache let you have a default file for any web server.
348
00:19:42,590 --> 00:19:44,640
Notice here we have hello.html.
349
00:19:44,640 --> 00:19:48,410
What's the command in Linux to rename a file?
350
00:19:48,410 --> 00:19:50,870
>> mv, for move. So let me do that,
351
00:19:50,870 --> 00:19:55,870
and let me rename hello.html to index.html.
352
00:19:55,870 --> 00:19:58,610
Let me type ls to confirm it's now been renamed.
353
00:19:58,610 --> 00:20:03,250
Now this is going to--if I go back to localhost,
354
00:20:03,250 --> 00:20:06,710
notice now that I'm automatically seeing that web page.
355
00:20:06,710 --> 00:20:11,740
This is identical to my actually doing /index.html,
356
00:20:11,740 --> 00:20:14,740
but the nice thing now is that the web server's figuring
357
00:20:14,740 --> 00:20:18,830
oh, if you have a file that, by human conventions, is called index.html,
358
00:20:18,830 --> 00:20:21,200
let me show the user that file by default
359
00:20:21,200 --> 00:20:25,290
rather than some stupid directory listing which is not at all user-friendly.
360
00:20:25,290 --> 00:20:28,900
Indeed, most websites you visit on the Internet don't have a list of files to click on,
361
00:20:28,900 --> 00:20:34,040
they just show you the content. So that's how we can do that, index.html.
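The default-file behavior described above can be sketched roughly as follows. This is only an illustration in Python, not Apache's actual code (Apache's real behavior is governed by its DirectoryIndex setting); the function name `resolve` and the temporary directory standing in for the html folder are assumptions made for the example.

```python
import os
import tempfile


def resolve(docroot, url_path, default="index.html"):
    """Return what a very simple server might send back for url_path."""
    target = os.path.join(docroot, url_path.lstrip("/"))
    if os.path.isdir(target):
        candidate = os.path.join(target, default)
        if os.path.isfile(candidate):
            return ("file", candidate)  # serve the default file
        return ("listing", sorted(os.listdir(target)))  # fall back to a listing
    return ("file", target)


# Demo in a throwaway directory (a stand-in for the html folder).
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "hello.html"), "w").close()
    before = resolve(root, "/")
    # Equivalent of: mv hello.html index.html
    os.rename(os.path.join(root, "hello.html"), os.path.join(root, "index.html"))
    after = resolve(root, "/")

print(before)
print(after[0], os.path.basename(after[1]))
```

Before the rename the sketch falls back to a directory listing; afterward it picks index.html, which mirrors what the browser showed.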
362
00:20:34,040 --> 00:20:37,000
So this is all fun and good, but this is a pretty simple web page.
363
00:20:37,000 --> 00:20:41,640
Let me go ahead and open up index.html in my vhosts,
364
00:20:41,640 --> 00:20:47,620
localhost, html directory, and let's add something of greater interest.
365
00:20:47,620 --> 00:20:56,120
So there's hello world; let's instead say "This is CS50, Harvard College's . . ."
366
00:20:56,120 --> 00:21:00,000
So the beginning of the course catalog description of some sort there.
367
00:21:00,000 --> 00:21:03,780
Now if I reload, I should see this in my home page.
368
00:21:03,780 --> 00:21:09,560
Okay, and I do see that, but suppose that I want to now list some more content in this file.
369
00:21:09,560 --> 00:21:15,160
I could go down here and say prerequisites none,
370
00:21:15,160 --> 00:21:18,740
although some of you are probably like, "Ha ha ha, no prerequisites."
371
00:21:18,740 --> 00:21:24,320
But--officially. So reload, and now we have the same quirk that we saw last time.
372
00:21:24,320 --> 00:21:26,240
But why is that? It was a simple fix.
373
00:21:26,240 --> 00:21:31,440
Why is this page broken?
374
00:21:31,440 --> 00:21:34,170
[Student, unintelligible] >>Yeah, we've solved this before
375
00:21:34,170 --> 00:21:37,440
by explicitly telling the browser "put a line break here."
376
00:21:37,440 --> 00:21:39,440
And that's because, again, a browser's only going to do
377
00:21:39,440 --> 00:21:42,610
explicitly what the markup language tells it to do,
378
00:21:42,610 --> 00:21:45,730
so even though you might have hit Enter once or twice or even ten times,
379
00:21:45,730 --> 00:21:49,870
it's going to combine that all into a single space, just by convention.
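As a rough illustration of that whitespace rule, here is a small Python sketch. It only approximates what a browser does with ordinary text (it also trims leading and trailing whitespace), and the function name `collapse_whitespace` is made up for this example.

```python
def collapse_whitespace(text):
    # str.split() with no arguments splits on any run of spaces, tabs, or
    # newlines, so joining with one space collapses each run to a single space.
    return " ".join(text.split())


source = "This is CS50.\n\n\n\nPrerequisites: none."
rendered = collapse_whitespace(source)
print(rendered)
```

However many times Enter was pressed in the source, the two sentences end up separated by one space, which is why an explicit br tag is needed to force a line break.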
380
00:21:49,870 --> 00:21:52,770
So if you really want a line break, you have to use the br tag,
381
00:21:52,770 --> 00:21:56,840
and now notice, like Monday, I put the / inside of this tag,
382
00:21:56,840 --> 00:22:00,090
only because this just doesn't feel right
383
00:22:00,090 --> 00:22:02,990
to start a line break then stop it with nothing in between.
384
00:22:02,990 --> 00:22:07,740
>> So the convention in HTML is to open and close a tag simultaneously.
385
00:22:07,740 --> 00:22:11,050
As an aside, you'll see a lot of websites and books not doing that.
386
00:22:11,050 --> 00:22:14,240
It is correct either to do it or not to do it, but we would argue
387
00:22:14,240 --> 00:22:17,430
that design-wise and stylistically, this is just better
388
00:22:17,430 --> 00:22:20,540
because then every tag is both opened and closed somehow.
389
00:22:20,540 --> 00:22:23,370
So now let's save and reload. Go back to the browser, okay.
390
00:22:23,370 --> 00:22:26,680
Now we're making some progress, but it's not quite enough.
391
00:22:26,680 --> 00:22:33,210
Let's go ahead and start typing in some longer body of text.
392
00:22:33,210 --> 00:22:40,610
So let's say, "A quick brown fox jumps over a lazy dog."
393
00:22:40,610 --> 00:22:42,700
And now let me just copy and paste this a few times
394
00:22:42,700 --> 00:22:45,040
so that we have a paragraph of text.
395
00:22:45,040 --> 00:22:47,780
Let me go back over here. So it's not looking very good.
396
00:22:47,780 --> 00:22:50,000
I do have a line break, so it's okay,
397
00:22:50,000 --> 00:22:52,140
but now, once we're getting to the point of having a web page
398
00:22:52,140 --> 00:22:55,640
that has lots of content and not just single lines to demonstrate HTML,
399
00:22:55,640 --> 00:22:58,570
we can start to think of these things as actual paragraphs.
400
00:22:58,570 --> 00:23:01,590
And we can start to structure our web page a little more cleanly.
401
00:23:01,590 --> 00:23:05,120
And indeed, what I can do is go up here inside of my body tag,
402
00:23:05,120 --> 00:23:09,400
and you know what, if "This is CS50. . ." really demarks the beginning of a paragraph,
403
00:23:09,400 --> 00:23:11,310
well, let's tag it as such.
404
00:23:11,310 --> 00:23:13,570
Let me indent the text; just by convention, let me say
405
00:23:13,570 --> 00:23:15,710
that this paragraph ends here,
406
00:23:15,710 --> 00:23:18,320
and then rather than do this line break, let me just say
407
00:23:18,320 --> 00:23:23,300
that this belongs there and as a new paragraph,
408
00:23:23,300 --> 00:23:27,610
and I'll just quickly indent by just clobbering all of this stuff.
409
00:23:27,610 --> 00:23:30,660
>> So now we have an indented paragraph there,
410
00:23:30,660 --> 00:23:33,510
and now our markup is starting to get a little more
411
00:23:33,510 --> 00:23:37,070
semantically consistent with what we're trying to do.
412
00:23:37,070 --> 00:23:40,130
We have a paragraph, so let's call it a paragraph with the p tag.
413
00:23:40,130 --> 00:23:43,370
We have a second paragraph, so let's call it a paragraph with the p tag.
414
00:23:43,370 --> 00:23:45,850
And now, what the browser will typically do
415
00:23:45,850 --> 00:23:48,490
is just like in an English book or essay,
416
00:23:48,490 --> 00:23:51,280
where you typically see some line breaks between paragraphs.
417
00:23:51,280 --> 00:23:53,720
Browsers will do that for you automatically.
418
00:23:53,720 --> 00:23:56,680
So now we have two paragraphs and we can continue this.
419
00:23:56,680 --> 00:23:58,770
But, of course, on the Web, when you have bodies of text,
420
00:23:58,770 --> 00:24:01,370
it's not typically just huge blobs of text.
421
00:24:01,370 --> 00:24:04,040
There are often hyperlinks in there.
422
00:24:04,040 --> 00:24:07,250
So if we want to, for instance, include some links there,
423
00:24:07,250 --> 00:24:10,760
suppose what might be of interest in whatever web page I'm creating here is--
424
00:24:10,760 --> 00:24:12,780
let me go to Google.com,
425
00:24:12,780 --> 00:24:16,540
and let me search for a quick brown fox.
426
00:24:16,540 --> 00:24:22,150
Go to Google images, and, how about--this is cute.
427
00:24:22,150 --> 00:24:27,420
We'll go with this. So here we have a quick brown fox jumping over a lazy dog.
428
00:24:27,420 --> 00:24:30,560
So what I'm going to do here, just for the sake of demonstration,
429
00:24:30,560 --> 00:24:32,950
is suppose that this image was on my server,
430
00:24:32,950 --> 00:24:35,240
and I had been creating these images.
431
00:24:35,240 --> 00:24:38,720
What I just did was right click or control click on the image,
432
00:24:38,720 --> 00:24:42,370
and what you'll see in most browsers is a little menu--
433
00:24:42,370 --> 00:24:48,800
stop doing that--a little menu that allows you to choose copy link location or copy URL.
434
00:24:48,800 --> 00:24:52,750
So let me go back now to my HTML, and suppose that I want
435
00:24:52,750 --> 00:24:56,420
to hyperlink this to another web page.
436
00:24:56,420 --> 00:24:58,640
>> What was the tag called for that?
437
00:24:58,640 --> 00:25:01,650
[Student, unintelligible] >>Yeah. So a href, for hypertext reference.
438
00:25:01,650 --> 00:25:04,660
Let me go ahead and paste that in.
439
00:25:04,660 --> 00:25:07,290
It's a pretty long URL, so let me zoom back out.
440
00:25:07,290 --> 00:25:09,950
Close brackets, so now notice I'm way over here
441
00:25:09,950 --> 00:25:11,960
because that URL happened to be pretty long.
442
00:25:11,960 --> 00:25:15,180
Let me scroll over here to the end of quick brown fox,
443
00:25:15,180 --> 00:25:18,830
and then let me close this tag with </a>,
444
00:25:18,830 --> 00:25:21,280
where I only closed the name of the tag.
445
00:25:21,280 --> 00:25:24,470
Now let me go ahead and save that file, reload the web page,
446
00:25:24,470 --> 00:25:27,880
and now, by default, that's going to be underlined in blue for me,
447
00:25:27,880 --> 00:25:31,980
but indeed, I can now click on this and voila. There's that image.
448
00:25:31,980 --> 00:25:33,990
And it didn't have to be an image; it could have linked
449
00:25:33,990 --> 00:25:36,270
to some other random website on the Internet.
450
00:25:36,270 --> 00:25:39,610
I could do this, for instance, with CS50, so one last example here.
451
00:25:39,610 --> 00:25:42,730
"This is CS50" might make sense to go a href =
452
00:25:42,730 --> 00:25:50,340
http://www.cs50.net, close quote, close anchor.
453
00:25:50,340 --> 00:25:53,990
So now that's an even shorter URL, and this time we're not going to link to an image.
454
00:25:53,990 --> 00:25:57,880
We're instead going to link to another page.
455
00:25:57,880 --> 00:25:59,840
Now, we have an image here.
456
00:25:59,840 --> 00:26:02,970
I feel like we can do a little better than just linking to an image.
457
00:26:02,970 --> 00:26:05,760
What if we want to actually embed it in our own web page?
458
00:26:05,760 --> 00:26:09,290
>> Well, what I can do here is, rather than link to this graphic,
459
00:26:09,290 --> 00:26:14,690
let me instead cut the URL, and we'll get rid of that hyperlink and clean this up.
460
00:26:14,690 --> 00:26:17,190
And we'll go down here and get rid of this.
461
00:26:17,190 --> 00:26:20,910
We don't really need all these sentences now, so let me shorten the page a little bit.
462
00:26:20,910 --> 00:26:24,530
And then down here, let me go ahead in a new paragraph,
463
00:26:24,530 --> 00:26:30,100
say I don't want text now; I want an image whose source is going to be that URL.
464
00:26:30,100 --> 00:26:33,100
An image, like a line break, is either there or it's not.
465
00:26:33,100 --> 00:26:35,900
So let me immediately close that tag.
466
00:26:35,900 --> 00:26:39,440
Let me go ahead now and close the paragraph that I'm inside,
467
00:26:39,440 --> 00:26:43,010
and if all goes well with hello, world, if I reload now,
468
00:26:43,010 --> 00:26:45,520
I, indeed, see right inside my own web page an image.
469
00:26:45,520 --> 00:26:48,570
So now we have an image tag, an anchor tag and the like,
470
00:26:48,570 --> 00:26:51,320
and for good measure, let me do one other thing that's often neglected
471
00:26:51,320 --> 00:26:55,900
on websites these days: Let's provide some descriptive text for this image
472
00:26:55,900 --> 00:26:58,090
for people who are on a mobile device
473
00:26:58,090 --> 00:27:00,640
and therefore might not be able to download this image very quickly,
474
00:27:00,640 --> 00:27:03,310
for people who are blind and might not be able to see the image
475
00:27:03,310 --> 00:27:06,480
but they might have a screen reader that can tell them what this image is of.
476
00:27:06,480 --> 00:27:09,100
And to do that, there is another attribute for image tags
477
00:27:09,100 --> 00:27:11,290
called alt, for alternative text.
478
00:27:11,290 --> 00:27:14,650
And what I can do here is say, "This is a quick brown fox."
479
00:27:14,650 --> 00:27:17,650
So that even if the human can't see the image on the screen,
480
00:27:17,650 --> 00:27:20,560
he or she can at least hear, as with some piece of software,
481
00:27:20,560 --> 00:27:23,080
what actually is there on the screen.
482
00:27:23,080 --> 00:27:25,040
>> That won't change the aesthetics of the page,
483
00:27:25,040 --> 00:27:27,640
but it is certainly good practice for users.
484
00:27:27,640 --> 00:27:31,760
All right, let's leave this web page in its current form,
485
00:27:31,760 --> 00:27:33,890
but let's see if we can't now introduce
486
00:27:33,890 --> 00:27:36,210
some better approaches to writing these web pages,
487
00:27:36,210 --> 00:27:39,980
some lessons that are going to serve us well as our pages get more and more complex.
488
00:27:39,980 --> 00:27:42,220
What we're not going to do over the next few weeks
489
00:27:42,220 --> 00:27:46,810
is walk you through all of the several dozen HTML tags that there are.
490
00:27:46,810 --> 00:27:49,800
Much like in Scratch back in week 0, it probably will suffice
491
00:27:49,800 --> 00:27:52,120
to give a high-level overview of some of the concepts,
492
00:27:52,120 --> 00:27:54,530
a quick tour of some of the blocks you were probably able,
493
00:27:54,530 --> 00:27:58,240
pretty comfortably, to navigate on your own, the various puzzle pieces.
494
00:27:58,240 --> 00:28:00,460
And that's going to happen again in HTML, most likely,
495
00:28:00,460 --> 00:28:04,320
whereby there's ample resources on the Web that we'll point you at,
496
00:28:04,320 --> 00:28:06,920
various textbooks, if you prefer to read a textbook,
497
00:28:06,920 --> 00:28:10,560
that will walk you through all of the various things you can do with HTML,
498
00:28:10,560 --> 00:28:16,100
but really, we have seen thus far in HTML most of the fundamental concepts.
499
00:28:16,100 --> 00:28:19,900
We have the notion of tags being opened, tags being closed.
500
00:28:19,900 --> 00:28:22,100
Some tags that are both opened and closed
501
00:28:22,100 --> 00:28:24,620
in the sense that they're empty; there should be nothing inside of them
502
00:28:24,620 --> 00:28:27,490
like an image tag or a line break, which are just there.
503
00:28:27,490 --> 00:28:32,330
We also looked already at the notion of an attribute, like alt or src (short for source).
504
00:28:32,330 --> 00:28:36,410
Notice that these words tend, by convention, to be short and succinct.
505
00:28:36,410 --> 00:28:39,140
>> We do not have discretion over what these things are called;
506
00:28:39,140 --> 00:28:42,060
someone else who invented HTML came up with these names.
507
00:28:42,060 --> 00:28:44,710
So you just have to start to know or look up, any time you need them,
508
00:28:44,710 --> 00:28:47,160
what the names are for these tags and attributes.
509
00:28:47,160 --> 00:28:49,510
In the case of these attributes, attributes generally
510
00:28:49,510 --> 00:28:52,900
modify the behavior of some tag.
511
00:28:52,900 --> 00:28:55,710
In this case, the source attribute tells the image tag
512
00:28:55,710 --> 00:28:57,940
what the source of the image should be.
513
00:28:57,940 --> 00:29:04,460
The href attribute tells the anchor tag what it should actually be linking to.
514
00:29:04,460 --> 00:29:06,800
But in terms of the structure of a web page, even though Facebook
515
00:29:06,800 --> 00:29:09,680
and Google and the like look like a complete mess
516
00:29:09,680 --> 00:29:12,560
underneath the hood at first glance, if you start to read through it
517
00:29:12,560 --> 00:29:16,950
more methodically, they all follow this basic, basic structure.
518
00:29:16,950 --> 00:29:19,660
But we can improve the stylization of these things.
519
00:29:19,660 --> 00:29:24,180
So let me go to some examples that I prepared in advance.
520
00:29:24,180 --> 00:29:27,280
Let me go ahead and copy them from another folder here
521
00:29:27,280 --> 00:29:29,380
and put them into this directory.
522
00:29:29,380 --> 00:29:32,210
In advance, what I did was prepare a few files:
523
00:29:32,210 --> 00:29:35,670
search0, search1, search2, and search3 and 4.
524
00:29:35,670 --> 00:29:38,740
Let me go ahead and open up the first of those files,
525
00:29:38,740 --> 00:29:42,570
and let's see if we can't begin to create our own search engine.
526
00:29:42,570 --> 00:29:46,530
At the top of this file, as is usually the case in class, just a bunch of comments.
527
00:29:46,530 --> 00:29:49,760
In HTML, though, the means by which you start a comment
528
00:29:49,760 --> 00:29:55,640
is <!--.
529
00:29:55,640 --> 00:29:59,800
When you're ready to stop that comment, you can do -->.
530
00:29:59,800 --> 00:30:02,380
So everything at the top in blue is just a comment.
531
00:30:02,380 --> 00:30:04,620
>> This is my doctype declaration, which again,
532
00:30:04,620 --> 00:30:07,080
you can just copy and paste on faith, for now.
533
00:30:07,080 --> 00:30:10,410
This just tells the browser, "Here comes some HTML5."
534
00:30:10,410 --> 00:30:13,600
Below that, on line 14, is the first of my actual tags,
535
00:30:13,600 --> 00:30:16,900
and this just says, as before, here comes some HTML,
536
00:30:16,900 --> 00:30:19,460
here comes the head of my page, here comes the title,
537
00:30:19,460 --> 00:30:23,900
and then, conversely, that's it for the title, that's it for the head.
538
00:30:23,900 --> 00:30:26,460
Here now comes the body of my page.
539
00:30:26,460 --> 00:30:31,040
So a couple new tags now: h1 stands for heading 1.
540
00:30:31,040 --> 00:30:33,850
There's a tradition in HTML from many years back
541
00:30:33,850 --> 00:30:37,990
of having different sizes of text.
542
00:30:37,990 --> 00:30:41,980
And back in the day, h1 meant, generally, just big and bold.
543
00:30:41,980 --> 00:30:45,860
But there's also h2, which is big but not quite as big and bold.
544
00:30:45,860 --> 00:30:49,320
There's h3, which is kind of big but not nearly as big and bold,
545
00:30:49,320 --> 00:30:52,380
and so forth, all the way down to h6.
546
00:30:52,380 --> 00:30:55,550
These days, though, h1, h2, and h3 are really meant
547
00:30:55,550 --> 00:30:57,980
to have more semantic meaning to them,
548
00:30:57,980 --> 00:31:01,100
whereby h1 is really a heading: the heading of a web page,
549
00:31:01,100 --> 00:31:04,210
the heading of a column or something like that of text.
550
00:31:04,210 --> 00:31:09,030
So I've deliberately said <h1>CS50 search</h1>
551
00:31:09,030 --> 00:31:12,640
to specify that this is really the heading, the title of my page.
552
00:31:12,640 --> 00:31:14,850
Not the title in the title bar sense,
553
00:31:14,850 --> 00:31:18,960
but the title that you actually see in the web page itself, in the body.
554
00:31:18,960 --> 00:31:20,990
Now this, you can probably guess what it is,
555
00:31:20,990 --> 00:31:23,110
even though we have a few new pieces of syntax.
556
00:31:23,110 --> 00:31:25,930
This is a form. So the web really gets interesting
557
00:31:25,930 --> 00:31:28,770
when websites take input from users.
558
00:31:28,770 --> 00:31:31,700
In this class, in the problem set on web programming,
559
00:31:31,700 --> 00:31:33,880
we're not going to make a website, per se,
560
00:31:33,880 --> 00:31:37,570
with static content that shows photographs that you've taken,
561
00:31:37,570 --> 00:31:40,010
or this is my resume, and things about me,
562
00:31:40,010 --> 00:31:42,450
because those things are relatively easy to put together.
563
00:31:42,450 --> 00:31:44,400
It's hard to make things beautiful on the Web,
564
00:31:44,400 --> 00:31:46,390
but at least putting up content is pretty trivial.
565
00:31:46,390 --> 00:31:49,380
But things get really interesting when someone can visit your website
566
00:31:49,380 --> 00:31:52,260
and provide input and can fill out forms,
567
00:31:52,260 --> 00:31:55,800
can check off checkboxes and can interact with your website.
568
00:31:55,800 --> 00:31:57,780
And indeed, probably every website you care about
569
00:31:57,780 --> 00:32:00,710
these days, in any detail, is somehow interactive.
570
00:32:00,710 --> 00:32:03,110
Facebook, Google, and the like, that take user input
571
00:32:03,110 --> 00:32:05,100
and produce customized output.
572
00:32:05,100 --> 00:32:07,780
>> So let's start to do that now. Let's transition now
573
00:32:07,780 --> 00:32:11,150
from just using HTML for markup of static content
574
00:32:11,150 --> 00:32:14,790
to using it instead as a delivery mechanism for dynamic content.
575
00:32:14,790 --> 00:32:17,350
And toward that end, let's implement our own search engine.
576
00:32:17,350 --> 00:32:20,820
Let's do it as follows. Here's the form tag.
577
00:32:20,820 --> 00:32:24,090
The action attribute specifies that when the user fills out this form
578
00:32:24,090 --> 00:32:28,400
with their keyboard, it will be submitted to this URL here.
579
00:32:28,400 --> 00:32:31,230
So I'm kind of cheating. It's going to take us a little longer
580
00:32:31,230 --> 00:32:33,780
than one class to implement the whole search engine,
581
00:32:33,780 --> 00:32:35,880
so we'll just do the front end, so to speak.
582
00:32:35,880 --> 00:32:38,650
We'll do the part that lets the user search, and we'll sort of punt to Google
583
00:32:38,650 --> 00:32:40,950
the hard part of finding search results,
584
00:32:40,950 --> 00:32:43,520
but, specifically, I'm going to talk to Google's web server
585
00:32:43,520 --> 00:32:46,710
using one of two very popular methods.
586
00:32:46,710 --> 00:32:50,000
One being get, another, that we'll eventually see, being post,
587
00:32:50,000 --> 00:32:52,660
although there are others that are less often used.
588
00:32:52,660 --> 00:32:56,440
So get just conjures up the idea of I want to get some content, get some search results.
589
00:32:56,440 --> 00:32:58,440
This, you can perhaps guess what this does.
590
00:32:58,440 --> 00:33:01,900
This is some kind of input; it's, in fact, going to look like a text field,
591
00:33:01,900 --> 00:33:05,200
and the name of that input, the name of that variable, so to speak,
592
00:33:05,200 --> 00:33:08,610
is going to be q for query by convention.
593
00:33:08,610 --> 00:33:11,700
And again, the type of this input is not going to be a checkbox;
594
00:33:11,700 --> 00:33:13,890
it's not going to be a menu; it's going to be a text field
595
00:33:13,890 --> 00:33:18,060
as denoted by this attribute here, and this text box,
596
00:33:18,060 --> 00:33:20,680
like a line break, is either there or not.
597
00:33:20,680 --> 00:33:24,480
So we have an empty element with the slash inside that tag.
598
00:33:24,480 --> 00:33:28,050
Then I'm going to put a line break, and you can, perhaps, guess what this is going to do.
599
00:33:28,050 --> 00:33:30,210
This is another sort of form input.
600
00:33:30,210 --> 00:33:32,350
>> This one's going to be used for submitting the form.
601
00:33:32,350 --> 00:33:36,140
So this is going to be the big button that the user can click to submit the form,
602
00:33:36,140 --> 00:33:40,800
and the label on that button is going to be "CS50 Search."
603
00:33:40,800 --> 00:33:44,170
Close form, close body, close HTML.
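For reference, the form just described might look roughly like the markup held in the string below. This is a reconstruction from the spoken description, not the actual search0.html, and the action URL in particular is an assumption, since the transcript never spells it out. The Python around it simply reads the form back with the standard library's HTMLParser to show which pieces the browser has to work with.

```python
from html.parser import HTMLParser

# Reconstructed from the description above; the action URL is assumed.
FORM = """
<form action="http://www.google.com/search" method="get">
    <input name="q" type="text"/>
    <br/>
    <input type="submit" value="CS50 Search"/>
</form>
"""


class FormReader(HTMLParser):
    """Collect the form's own attributes and those of each input inside it."""

    def __init__(self):
        super().__init__()
        self.form = {}
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <input .../> also arrive here by default.
        if tag == "form":
            self.form = dict(attrs)
        elif tag == "input":
            self.inputs.append(dict(attrs))


reader = FormReader()
reader.feed(FORM)
print(reader.form)
print(reader.inputs)
```

The first input carries the name q, so it is the one whose value gets submitted; the second is only the button.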
604
00:33:44,170 --> 00:33:46,280
Let's see what we have in the form of this web page.
605
00:33:46,280 --> 00:33:48,260
So let me go to my browser,
606
00:33:48,260 --> 00:33:50,360
let me go, still, to localhost.
607
00:33:50,360 --> 00:33:54,650
This is still index.html, so if I want to see this file called search0,
608
00:33:54,650 --> 00:33:59,710
I can simply do /search0.html, Enter--
609
00:33:59,710 --> 00:34:01,880
and the first of my mistakes.
610
00:34:01,880 --> 00:34:04,400
What's going on? I clearly don't have permission
611
00:34:04,400 --> 00:34:06,430
to access this file, for some reason.
612
00:34:06,430 --> 00:34:10,170
But that's because, unlike the work we've done thus far in C,
613
00:34:10,170 --> 00:34:14,340
where the programs you write are assumed to be runnable by you,
614
00:34:14,340 --> 00:34:17,590
executable by you, that's not really the case on the Web,
615
00:34:17,590 --> 00:34:21,010
whereby sometimes you might want to create files on a server,
616
00:34:21,010 --> 00:34:23,310
but you don't want the whole world to be able to see them.
617
00:34:23,310 --> 00:34:25,469
Rather, you want the world to see some files
618
00:34:25,469 --> 00:34:27,730
but not others, just for privacy's sake.
619
00:34:27,730 --> 00:34:30,730
So it's more of an opt-in basis when you're doing things on the Web.
620
00:34:30,730 --> 00:34:32,810
And so let me actually type ls here,
621
00:34:32,810 --> 00:34:37,440
and you see the files I have, but recall that if I do ls -l for long,
622
00:34:37,440 --> 00:34:41,520
I'll get a longer listing that gives me some more details about these files
623
00:34:41,520 --> 00:34:45,139
that are now, really, for the first time relevant to us.
624
00:34:45,139 --> 00:34:47,840
Notice that on the far right are the names of my files,
625
00:34:47,840 --> 00:34:50,690
and then the time at which they were last modified or copied.
626
00:34:50,690 --> 00:34:54,370
This number here is what? Do you recall?
627
00:34:54,370 --> 00:34:56,400
The size in bytes, how big the file is.
628
00:34:56,400 --> 00:34:59,520
>> So I seem to have some kind of logo in here that's bigger than all the other files.
629
00:34:59,520 --> 00:35:03,610
This is who I am, this is what I am and what group I'm in.
630
00:35:03,610 --> 00:35:07,430
But then, over here on the left is a bit of a cryptic sequence,
631
00:35:07,430 --> 00:35:10,040
and we talked, I think, briefly about this in the past,
632
00:35:10,040 --> 00:35:12,050
but this has to do with permissions.
633
00:35:12,050 --> 00:35:14,020
And even if that's a little hazy,
634
00:35:14,020 --> 00:35:17,270
rw probably means read and write.
635
00:35:17,270 --> 00:35:22,560
So it turns out that these dashes denote different sets of permissions for different people.
636
00:35:22,560 --> 00:35:24,730
And the pattern is, essentially, as follows.
637
00:35:24,730 --> 00:35:27,650
When you see a sequence of dashes here, they look as follows.
638
00:35:27,650 --> 00:35:30,450
There's a dash, then there's three more dashes,
639
00:35:30,450 --> 00:35:33,390
then there's another three, then there's another three.
640
00:35:33,390 --> 00:35:36,800
The first one is either a dash or it's a d for directory.
641
00:35:36,800 --> 00:35:40,220
So that one's pretty easy. If it's a folder, it says d, otherwise it's a hyphen.
642
00:35:40,220 --> 00:35:44,080
There's a couple other cases, but for now we'll just care about files and directories.
643
00:35:44,080 --> 00:35:48,090
These next three dashes--and I've artificially inserted the spaces.
644
00:35:48,090 --> 00:35:50,490
They were, obviously, not there when we saw them a moment ago.
645
00:35:50,490 --> 00:35:52,900
These are the file owner's permissions,
646
00:35:52,900 --> 00:35:55,840
and recall from a second ago that it was read and write.
647
00:35:55,840 --> 00:35:58,560
That was because I, as the person who created this file a moment ago,
648
00:35:58,560 --> 00:36:01,250
I, just by default, on a Linux computer,
649
00:36:01,250 --> 00:36:03,910
have the ability to continue reading and writing that file.
650
00:36:03,910 --> 00:36:07,170
>> So the operating system just gives me rw automatically.
651
00:36:07,170 --> 00:36:10,840
The middle ones relate to my group, that of students,
652
00:36:10,840 --> 00:36:14,590
which is sort of meaningless on the appliance because I'm the only person using the appliance.
653
00:36:14,590 --> 00:36:16,620
So let me just wave my hands at that for now.
654
00:36:16,620 --> 00:36:19,190
But the last ones are most important for the Web.
655
00:36:19,190 --> 00:36:21,580
This is everyone else in the world, and the fact
656
00:36:21,580 --> 00:36:24,600
that that is --- means that no one else in the world
657
00:36:24,600 --> 00:36:26,680
has any permissions to this file.
658
00:36:26,680 --> 00:36:29,180
Clearly a problem, so I need to fix this
659
00:36:29,180 --> 00:36:33,830
by somehow giving the world what? Read and write?
660
00:36:33,830 --> 00:36:35,850
That's probably dumb, right? I don't want anyone on the Web
661
00:36:35,850 --> 00:36:38,530
to go to visit my page and somehow change that file,
662
00:36:38,530 --> 00:36:40,800
even though they really couldn't with an HTML file,
663
00:36:40,800 --> 00:36:44,110
but just in principle, probably just want them to be able to read it.
664
00:36:44,110 --> 00:36:47,910
What does it mean to read it? It doesn't mean they're going to care about the actual HTML,
665
00:36:47,910 --> 00:36:51,820
but the browser needs to be able to parse that markup language,
666
00:36:51,820 --> 00:36:53,720
top to bottom, left to right.
667
00:36:53,720 --> 00:36:57,990
So someone on the Web needs to be able to read it, so I minimally need to give it r.
668
00:36:57,990 --> 00:37:00,240
I can do this in a few different ways, but perhaps
669
00:37:00,240 --> 00:37:03,080
the simplest is to run this command here.
670
00:37:03,080 --> 00:37:10,860
chmod, change mode, then a+r, so all, everyone in the world, plus read,
671
00:37:10,860 --> 00:37:13,830
and then the name of the file, search0.html.
672
00:37:13,830 --> 00:37:18,310
>> Now if I do ls -l again, notice that that file has changed,
673
00:37:18,310 --> 00:37:21,440
and indeed, I've turned on r for everyone.
674
00:37:21,440 --> 00:37:23,350
I've also turned it on for my group, but that's fine,
675
00:37:23,350 --> 00:37:27,150
because if I turned it on for everyone, my group is a subset of that.
676
00:37:27,150 --> 00:37:31,480
So that's fine too. This just means the computer has now made it readable.
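The same change can be sketched in Python on a Linux or Mac system, as a rough stand-in for running chmod a+r at the prompt. The temporary file here is an assumption standing in for search0.html, and `split_mode` is a helper made up for this example to break the permission string into the four groups described earlier.

```python
import os
import stat
import tempfile


def split_mode(s):
    # file type, owner permissions, group permissions, everyone else
    return s[0], s[1:4], s[4:7], s[7:10]


# A throwaway file standing in for search0.html.
fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)
os.chmod(path, 0o600)  # owner can read and write; group and world get nothing
before = stat.filemode(os.stat(path).st_mode)

# Equivalent of "chmod a+r": turn on the read bit for owner, group, and world.
mode = stat.S_IMODE(os.stat(path).st_mode)
os.chmod(path, mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
after = stat.filemode(os.stat(path).st_mode)
os.remove(path)

print(before, split_mode(before))
print(after, split_mode(after))
```

Only the read bits change: the owner keeps rw, and both the group and everyone else go from --- to r--.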
677
00:37:31,480 --> 00:37:34,430
Now let me go back to my browser, click reload.
678
00:37:34,430 --> 00:37:36,330
Ah-ha. We now have CS50 Search.
679
00:37:36,330 --> 00:37:39,830
I've zoomed in a little artificially--pretty hideous search engine.
680
00:37:39,830 --> 00:37:41,930
But let's see if it actually works.
681
00:37:41,930 --> 00:37:45,880
First, let me do a quick sanity check, let me control click and view page source.
682
00:37:45,880 --> 00:37:50,780
Notice that within Chrome, we're now seeing the same HTML that I myself created.
683
00:37:50,780 --> 00:37:55,420
Don't get confused here, though. I can't start changing the code here,
684
00:37:55,420 --> 00:37:59,420
because the browser has a read-only view of this code.
685
00:37:59,420 --> 00:38:06,060
The browser has just asked localhost for a file called search0.html.
686
00:38:06,060 --> 00:38:09,490
It is now pure coincidence that the appliance
687
00:38:09,490 --> 00:38:13,480
happens to be on the same computer as my browser.
688
00:38:13,480 --> 00:38:20,470
I could, equivalently, have typed in www.facebook.com/search0.html,
689
00:38:20,470 --> 00:38:23,830
and if Facebook had a file called that, I would then be seeing their HTML.
690
00:38:23,830 --> 00:38:27,360
And, of course, I can't change the file that comes back from Facebook, either.
691
00:38:27,360 --> 00:38:29,360
So now we're sort of blurring the lines.
692
00:38:29,360 --> 00:38:32,130
The appliance is both a server, serving up web pages,
693
00:38:32,130 --> 00:38:34,870
but it's also a client in the sense that I'm using a browser
694
00:38:34,870 --> 00:38:37,630
to actually talk to that server.
695
00:38:37,630 --> 00:38:39,610
So let's see if my Google search engine works.
696
00:38:39,610 --> 00:38:44,930
Let me go ahead and search for quick brown fox, Enter.
697
00:38:44,930 --> 00:38:47,540
And voila, I now have my own search engine.
698
00:38:47,540 --> 00:38:51,460
>> But how does this work?
699
00:38:51,460 --> 00:38:55,380
Bit of a stretch, but--and now you can't see, precisely, the part that's of interest.
700
00:38:55,380 --> 00:38:57,370
Notice what happens.
701
00:38:57,370 --> 00:39:00,430
Notice the URL. It turns out that that method,
702
00:39:00,430 --> 00:39:02,780
called get, is super simple.
703
00:39:02,780 --> 00:39:10,270
When you specify in a form that you want to 'get' results from some server,
704
00:39:10,270 --> 00:39:13,200
what it's going to do is take whatever you typed into the form
705
00:39:13,200 --> 00:39:15,290
and put it in the URL.
706
00:39:15,290 --> 00:39:18,580
It's going to standardize how it gets put into the URL as follows.
707
00:39:18,580 --> 00:39:22,290
Notice that this is the URL that was the value of my action attribute.
708
00:39:22,290 --> 00:39:24,730
That's where I wanted the form to end up.
709
00:39:24,730 --> 00:39:26,950
But then notice this question mark.
710
00:39:26,950 --> 00:39:30,230
This is a convention on the Web whereby to provide user input
711
00:39:30,230 --> 00:39:35,320
to a website, you append to the URL a question mark,
712
00:39:35,320 --> 00:39:38,330
and then you have a whole bunch of key-value pairs.
713
00:39:38,330 --> 00:39:42,380
The name of a key, otherwise known as a parameter in the Web,
714
00:39:42,380 --> 00:39:46,380
then you have an equal sign, then you have the value of that parameter.
715
00:39:46,380 --> 00:39:49,810
So it's essentially a variable name and a variable value,
716
00:39:49,810 --> 00:39:54,250
but those variables' names and values came from the HTML form.
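As a rough sketch of that convention, and not something shown in lecture, Python's standard urllib.parse module can split a URL at the question mark and recover the key-value pairs. The URL and the parameter names q and hl below are made up for illustration; when there is more than one pair, they are separated by ampersands.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL for illustration; "q" and "hl" are assumed parameter names.
url = "http://www.example.com/search?q=fox&hl=en"

parts = urlsplit(url)
print(parts.path)   # the part before the question mark
print(parts.query)  # everything after the question mark

# Each key=value pair is separated from the next by an ampersand.
params = dict(parse_qsl(parts.query))
print(params)
```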
717
00:39:54,250 --> 00:39:56,250
Why are the pluses there, do you think?
718
00:39:56,250 --> 00:39:59,340
Because I did not type + in between my words.
719
00:39:59,340 --> 00:40:01,430
[Student, unintelligible]
720
00:40:01,430 --> 00:40:05,080
>>Yeah, it's just for spacing. Odds are, whenever you've seen a URL,
721
00:40:05,080 --> 00:40:07,320
there's never any spaces in it, if only because
722
00:40:07,320 --> 00:40:09,440
if there were, you couldn't really copy and paste it
723
00:40:09,440 --> 00:40:12,700
into an IM or into an email because it would break.
724
00:40:12,700 --> 00:40:15,420
You want the whole thing to be one contiguous string of characters.
725
00:40:15,450 --> 00:40:18,450
>> So the browser is smart enough to realize, uh-uh.
726
00:40:18,450 --> 00:40:22,610
Don't just put a space there. Let me encode the space in some standard way.
727
00:40:22,610 --> 00:40:25,170
One of the conventions for doing so is to have the browser
728
00:40:25,170 --> 00:40:29,350
automatically put a + where you would otherwise have a space.
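To see the same encoding outside a browser, here is a minimal sketch using Python's standard library, assuming the form field is named q as in the search example. urlencode applies the plus-for-space convention, and parse_qs reverses it.

```python
from urllib.parse import urlencode, parse_qs

# Encode a form field the way a browser does for a GET submission.
query = urlencode({"q": "quick brown fox"})
print(query)  # q=quick+brown+fox

# Decoding turns each + back into a space; values come back as lists
# because a key may legally appear more than once.
decoded = parse_qs(query)
print(decoded)  # {'q': ['quick brown fox']}
```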
729
00:40:29,350 --> 00:40:32,140
So now, notice Google has been kind of user-friendly.
730
00:40:32,140 --> 00:40:34,380
I certainly did not create this web page,
731
00:40:34,380 --> 00:40:37,200
but they have prepopulated their own text field
732
00:40:37,200 --> 00:40:39,490
with what, precisely, I typed in.
733
00:40:39,490 --> 00:40:43,090
Suppose I want to search for something else, like a lazy dog.
734
00:40:43,090 --> 00:40:45,340
I can just type this here, re-search.
735
00:40:45,340 --> 00:40:47,730
Notice that the URL changes up here,
736
00:40:47,730 --> 00:40:51,390
but notice then that I can actually search for anything I want
737
00:40:51,390 --> 00:40:53,610
just by understanding how URLs work.
738
00:40:53,610 --> 00:40:56,840
I could do lazy cat, Enter,
739
00:40:56,840 --> 00:41:01,370
and notice now I'm getting a very lazy--should we? I feel like we should.
740
00:41:01,370 --> 00:41:09,900
I get a very lazy cat.
741
00:41:09,900 --> 00:41:11,930
All right. This is one of the stupidest things we've done.
742
00:41:11,930 --> 00:41:17,160
But that is a lazy cat.
743
00:41:17,160 --> 00:41:19,730
Anyhow, what's the key takeaway here?
744
00:41:19,730 --> 00:41:22,830
Now we're sort of playing in the world of HTTP.
745
00:41:22,830 --> 00:41:26,050
HTML is just this markup language, open tag, close tag,
746
00:41:26,050 --> 00:41:29,490
that tells a browser how to render content on a web page.
747
00:41:29,490 --> 00:41:32,850
But when you start transmitting data across the Internet
748
00:41:32,850 --> 00:41:36,290
between web browser and server, that's where this protocol
749
00:41:36,290 --> 00:41:39,370
known as HyperText Transfer Protocol takes over.
750
00:41:39,370 --> 00:41:42,630
This is the sort of human convention; when Sam and I shook hands on Monday,
751
00:41:42,630 --> 00:41:48,300
starting a connection and then closing a connection, same idea here.
752
00:41:48,300 --> 00:41:53,100
How are Google's results coming back to me?
753
00:41:53,100 --> 00:41:55,290
How is my form submission going to Google?
754
00:41:55,290 --> 00:41:58,160
Well, recall from the other day that what's really going on
755
00:41:58,160 --> 00:42:02,150
underneath the hood when you request a web page is
756
00:42:02,150 --> 00:42:04,860
your browser is sending a somewhat cryptic message like
757
00:42:04,860 --> 00:42:09,510
GET / HTTP/1.1 for the default home page.
758
00:42:09,510 --> 00:42:13,000
>> Or, in this case, because I specifically requested earlier
759
00:42:13,000 --> 00:42:17,340
search0.html, this then would be the somewhat-cryptic message
760
00:42:17,340 --> 00:42:20,040
that my browser sends to the appliance.
761
00:42:20,040 --> 00:42:23,090
Or, in this case of Google, what's actually sent
762
00:42:23,090 --> 00:42:33,740
is a request to /search, and then ?q=lazy+cat, with a plus in place of the space.
763
00:42:33,740 --> 00:42:36,790
So this message that I, the human, am never typing,
764
00:42:36,790 --> 00:42:40,620
but is being sent by my browser, this is how HTTP happens.
765
00:42:40,620 --> 00:42:43,240
This is the equivalent of our having shaken hands.
766
00:42:43,240 --> 00:42:46,320
This is the request, and the server's about to send a response.
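For a sense of what that message looks like as text, here is a minimal sketch that only builds the request string and does not send it. The helper name build_get_request is hypothetical, and the Connection: close header is an assumption added for completeness; a real browser sends many more headers.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text of a bare-bones HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path with query string, version
        f"Host: {host}",         # which site on that server we want
        "Connection: close",     # assumed extra header, not from the lecture
        "",                      # a blank line ends the headers
        "",
    ]
    # HTTP separates lines with carriage return plus line feed.
    return "\r\n".join(lines)


request = build_get_request("www.google.com", "/search?q=lazy+cat")
print(request)
```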
767
00:42:46,320 --> 00:42:48,560
So let's take a look at this underneath the hood.
768
00:42:48,560 --> 00:42:55,320
As before, we can open up this special field in a browser.
769
00:42:55,320 --> 00:42:58,720
View Page, Inspect Element.
770
00:42:58,720 --> 00:43:01,550
So under Inspect Element, notice that what's happened in Chrome,
771
00:43:01,550 --> 00:43:04,160
and IE and Firefox have similar mechanisms,
772
00:43:04,160 --> 00:43:07,370
we have these developer tools accessible to us.
773
00:43:07,370 --> 00:43:09,630
Normal people do not use these tabs.
774
00:43:09,630 --> 00:43:11,940
But we, now, are interested in what's going on
775
00:43:11,940 --> 00:43:13,890
underneath the hood at the network level.
776
00:43:13,890 --> 00:43:16,130
So if I pull up the network level here,
777
00:43:16,130 --> 00:43:18,510
let me go ahead and expand this window,
778
00:43:18,510 --> 00:43:21,840
open up this entry here, and look at the headers.
779
00:43:21,840 --> 00:43:26,010
So what happens when I request a file from a web server
780
00:43:26,010 --> 00:43:29,410
is my browser sends a whole bunch of things.
781
00:43:29,410 --> 00:43:32,390
And let me view source. So under request headers,
782
00:43:32,390 --> 00:43:35,250
and this is just Chrome showing me some diagnostic output,
783
00:43:35,250 --> 00:43:37,340
sort of like a debugger of some sort,
784
00:43:37,340 --> 00:43:40,500
notice that what I've highlighted here is precisely what
785
00:43:40,500 --> 00:43:47,060
Chrome is sending to the server in order to request a file called search0.html.
786
00:43:47,060 --> 00:43:50,160
It is telling the server what it thinks its name is,
787
00:43:50,160 --> 00:43:52,210
thanks to this Host: field, then there's some
788
00:43:52,210 --> 00:43:56,950
pretty esoteric stuff in here, like something to do with dates and times,
789
00:43:56,950 --> 00:43:59,720
something to do with the languages that the browser understands,
790
00:43:59,720 --> 00:44:02,850
but the really important lines are these first two here.
791
00:44:02,850 --> 00:44:05,490
>> What does the server respond with? Well, if we scroll down here
792
00:44:05,490 --> 00:44:08,510
and view source of this thing, notice that the server
793
00:44:08,510 --> 00:44:13,700
has responded with a somewhat cryptic message as well, 304 Not Modified.
794
00:44:13,700 --> 00:44:16,030
That's a little strange; let me actually try to fix this.
795
00:44:16,030 --> 00:44:18,670
Let me hold down Shift and click Reload up here
796
00:44:18,670 --> 00:44:22,460
to force the browser to actually make this request as if for the first time.
797
00:44:22,460 --> 00:44:25,700
Then let me zoom in, and we'll see now that the server's response,
798
00:44:25,700 --> 00:44:28,950
because I held Shift, is 200 OK.
799
00:44:28,950 --> 00:44:31,170
So you've probably never seen the number 200
800
00:44:31,170 --> 00:44:33,300
in the context of the Web, but what numbers
801
00:44:33,300 --> 00:44:36,760
have you sometimes seen unexpectedly from a server?
802
00:44:36,760 --> 00:44:42,010
404, file not found; 403, forbidden; 500, server error.
803
00:44:42,010 --> 00:44:44,890
So there are these numeric codes that the world uses in the Web
804
00:44:44,890 --> 00:44:47,870
to signify errors, just like C functions
805
00:44:47,870 --> 00:44:51,030
can return errors and main can return exit codes.
806
00:44:51,030 --> 00:44:54,160
200, though, you rarely see because it means all is well.
807
00:44:54,160 --> 00:44:59,000
And 304 you probably never see because what is it signifying?
808
00:44:59,000 --> 00:45:03,330
That nothing has--let's see if we can simulate this again--
809
00:45:03,330 --> 00:45:07,170
Oh, now it's not cooperating. 304 said not modified,
810
00:45:07,170 --> 00:45:09,170
so why was the server even responding?
811
00:45:09,170 --> 00:45:12,550
Well, for efficiency, a web server automatically for you,
812
00:45:12,550 --> 00:45:16,570
if the file hasn't changed, it won't retransmit the whole HTML file.
813
00:45:16,570 --> 00:45:19,150
It'll just tell the browser it hasn't changed.
814
00:45:19,150 --> 00:45:21,220
Just use the copy you already have.
815
00:45:21,220 --> 00:45:22,650
So there's this notion of caching on the Web
816
00:45:22,650 --> 00:45:25,840
for performance, so that you don't waste time and waste bandwidth
817
00:45:25,840 --> 00:45:29,160
downloading files again and again unnecessarily.
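The status codes mentioned here have official names, which Python's standard http module happens to list. This is only a sketch for looking them up; the grouping into "error" and "not an error" by comparing against 400 is a simplification of the official classes.

```python
from http import HTTPStatus

# Look up the official phrase for each code mentioned in lecture.
phrases = {}
for code in (200, 304, 403, 404, 500):
    status = HTTPStatus(code)
    phrases[code] = status.phrase
    # Codes of 400 and above signal errors; 200 and 304 do not.
    kind = "error" if code >= 400 else "not an error"
    print(code, status.phrase, f"({kind})")
```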
818
00:45:29,160 --> 00:45:31,460
>> But this web page, now, was super-simple,
819
00:45:31,460 --> 00:45:34,980
and it only showed me the HTML that came back.
820
00:45:34,980 --> 00:45:40,940
Let's actually use the network tab now to do a Google search like quick brown fox.
821
00:45:40,940 --> 00:45:43,010
Let me then click CS50 Search,
822
00:45:43,010 --> 00:45:46,950
and now, notice in the bottom here a whole bunch of stuff came back
823
00:45:46,950 --> 00:45:49,900
because when I visit a real website like Google.com,
824
00:45:49,900 --> 00:45:53,520
they have images, they have text, they have a language called JavaScript there.
825
00:45:53,520 --> 00:45:55,940
So every row in this table down here
826
00:45:55,940 --> 00:46:01,490
represents something that Google spit out in response to my single request.
827
00:46:01,490 --> 00:46:04,160
The one I care about, though, is this first one.
828
00:46:04,160 --> 00:46:08,420
And if I go to the search request, click View Source here,
829
00:46:08,420 --> 00:46:11,300
notice that, indeed, the cryptic message that my browser sent
830
00:46:11,300 --> 00:46:15,010
to Google was these two lines here,
831
00:46:15,010 --> 00:46:18,420
followed by some arcane information down here which we'll ignore for now.
832
00:46:18,420 --> 00:46:20,890
But notice, too, what Chrome is pretty handy with,
833
00:46:20,890 --> 00:46:24,540
it's also showing me the query string that was sent in.
834
00:46:24,540 --> 00:46:27,410
So rather than show me this, which was literally sent,
835
00:46:27,410 --> 00:46:30,800
if I view it decoded, Chrome, just for debugging purposes,
836
00:46:30,800 --> 00:46:34,270
for developers like us, it's just showing me a human-friendly version of--
837
00:46:34,270 --> 00:46:36,390
that is not how you spell fox, apparently.
838
00:46:36,390 --> 00:46:40,520
I'm just noticing this now--but it's showing you what I, apparently, typed.
839
00:46:40,520 --> 00:46:45,340
Meanwhile, the response that came back from the server is again 200 OK.
840
00:46:45,340 --> 00:46:47,930
But included in that response, of course,
841
00:46:47,930 --> 00:46:51,920
if we actually view the page's HTML--
842
00:46:51,920 --> 00:46:55,440
sorry, this is a little keyboard shortcut gone awry today.
843
00:46:55,440 --> 00:46:59,020
>> I'll deal with this later. So if we actually view the page's source,
844
00:46:59,020 --> 00:47:02,990
which I can do down here by clicking response,
845
00:47:02,990 --> 00:47:10,080
this is what was actually spit back, in addition to that cryptic 200 OK message from the server.
846
00:47:10,080 --> 00:47:12,520
A little cryptic, but where is all this coming from?
847
00:47:12,520 --> 00:47:15,570
Well, let's do one other thing here. Another somewhat cryptic command,
848
00:47:15,570 --> 00:47:20,530
but this one's kind of neat in that it reveals to us exactly what's going on underneath the hood.
849
00:47:20,530 --> 00:47:22,530
So I'm back on my Mac here, I have connected
850
00:47:22,530 --> 00:47:25,980
via a program called SSH, Secure Shell, to another server
851
00:47:25,980 --> 00:47:28,940
because most of Harvard's computers block the command we're about to run
852
00:47:28,940 --> 00:47:31,640
because there's this command on some servers called traceroute
853
00:47:31,640 --> 00:47:34,810
that allows you to trace the route between points a and b,
854
00:47:34,810 --> 00:47:37,020
and thus far we've been taking completely for granted
855
00:47:37,020 --> 00:47:40,170
that I can type in Google.com and somehow get data back
856
00:47:40,170 --> 00:47:43,530
from halfway across the country or halfway across the world.
857
00:47:43,530 --> 00:47:45,810
With traceroute we can actually dive in a little deeper
858
00:47:45,810 --> 00:47:49,370
as to how the Internet works, and see what's going on underneath the hood.
859
00:47:49,370 --> 00:47:54,440
So let's go ahead and arbitrarily trace a route to, say, Stanford.edu,
860
00:47:54,440 --> 00:47:57,150
which is across the country, and hit Enter.
861
00:47:57,150 --> 00:47:59,380
This command can be super fast or super slow,
862
00:47:59,380 --> 00:48:02,010
but what we're seeing now, line by line,
863
00:48:02,010 --> 00:48:08,060
is every one of the steps or hops between us and Palo Alto, or Stanford,
864
00:48:08,060 --> 00:48:11,010
where they have their web server.
865
00:48:11,010 --> 00:48:16,600
So what does each of these lines represent more concretely, though?
866
00:48:16,600 --> 00:48:19,100
A piece of jargon from the Internet? [Student, unintelligible]
867
00:48:19,100 --> 00:48:21,570
>>What's that? [Student, unintelligible]
868
00:48:21,570 --> 00:48:25,390
>>Oh, so there are times, but what does each row--what do I mean by hop?
869
00:48:25,390 --> 00:48:29,140
>> Well, there are these things on the Internet called routers.
870
00:48:29,140 --> 00:48:33,020
And routers, as the name suggests, route information from point a to point b.
871
00:48:33,020 --> 00:48:36,920
But there are several points beyond a and b.
872
00:48:36,920 --> 00:48:40,010
There's c and d and e and f between row 1,
873
00:48:40,010 --> 00:48:43,480
which happens to be my computer's IP address,
874
00:48:43,480 --> 00:48:46,890
or my numeric address, which uniquely identifies my computer,
875
00:48:46,890 --> 00:48:50,300
and step 15, which is actually the sixth web server,
876
00:48:50,300 --> 00:48:54,640
apparently, which I'm inferring from this, or version 6 of their web server at Stanford.
877
00:48:54,640 --> 00:48:56,680
But what's kind of neat is, we can see the path
878
00:48:56,680 --> 00:49:00,480
that my 0's and 1's are taking from my computer to Stanford.
879
00:49:00,480 --> 00:49:02,500
So step 1 is my own computer's address.
880
00:49:02,500 --> 00:49:05,760
Every computer on the Internet has a unique identifier that looks like this.
881
00:49:05,760 --> 00:49:08,150
Number.number.number.number.
882
00:49:08,150 --> 00:49:10,370
Somewhere on this campus, probably in the science center,
883
00:49:10,370 --> 00:49:16,780
is a router called Core Gateway 2 -te83, whatever that means,
884
00:49:16,780 --> 00:49:20,590
so this is one of Harvard's big fancy routers that routes a lot of their traffic.
885
00:49:20,590 --> 00:49:24,640
Here's another of Harvard's routers, this one is Border Gateway,
886
00:49:24,640 --> 00:49:28,310
border meaning it's probably on the periphery of campus somewhere.
887
00:49:28,480 --> 00:49:32,790
Then there's nox one, row 4, which is Northern Crossroads,
888
00:49:32,790 --> 00:49:35,070
which is a big ISP, Internet service provider,
889
00:49:35,070 --> 00:49:37,740
that places like Harvard connect up to.
890
00:49:37,740 --> 00:49:40,760
But then things get a little interesting in line 6.
891
00:49:40,760 --> 00:49:45,960
Where are my bits all of a sudden? Kansas.
892
00:49:45,960 --> 00:49:49,300
The world has a habit of using airport codes in a lot of these things,
893
00:49:49,300 --> 00:49:52,900
or at least abbreviations for states or cities,
894
00:49:52,900 --> 00:49:56,490
so it looks like, in just 60 ms,
895
00:49:56,490 --> 00:49:59,420
a packet of information, 0's and 1's from my laptop
896
00:49:59,420 --> 00:50:03,210
got all the way to Kansas, and again, in 60 ms.
897
00:50:03,210 --> 00:50:08,180
>> Moreover, after Kansas, they took a tour through Houston, probably,
898
00:50:08,180 --> 00:50:10,140
as suggested by the name of this server.
899
00:50:10,140 --> 00:50:13,310
So just as a server on the Internet must have a numeric address,
900
00:50:13,310 --> 00:50:18,360
it can also, optionally, have a slightly more human-friendly address that humans came up with.
901
00:50:18,360 --> 00:50:20,510
Now, in step 8, we don't know what this is.
902
00:50:20,510 --> 00:50:22,550
Sometimes routers just kind of ignore you,
903
00:50:22,550 --> 00:50:25,010
and they just don't answer the questions, so that's fine.
904
00:50:25,010 --> 00:50:29,290
The one after step 8 is apparently where? L.A.
905
00:50:29,290 --> 00:50:35,290
Notice in only 78 ms, what takes us humans like 6+ hours to do physically,
906
00:50:35,290 --> 00:50:40,110
takes packets of information on the Internet 78 ms to travel that far.
907
00:50:40,110 --> 00:50:45,890
Step 10 is in L.A. as well, and step 11 seems to have gone north, up near Stanford.
908
00:50:45,890 --> 00:50:48,750
This is their boundary router, or border router.
909
00:50:48,750 --> 00:50:51,240
A couple steps at Stanford that are ignoring us,
910
00:50:51,240 --> 00:50:55,610
and lastly, we reach the web server in just 87 ms.
911
00:50:55,610 --> 00:50:57,760
Now, all of these numbers, as an aside,
912
00:50:57,760 --> 00:51:00,640
just tell you how long it takes for data to get from me
913
00:51:00,640 --> 00:51:03,530
to each of these routers, and it's not cumulative.
914
00:51:03,530 --> 00:51:06,960
What this program does is it first sends a message, essentially, to the first router.
915
00:51:06,960 --> 00:51:09,490
Then one to the second router; then one to the third router,
916
00:51:09,490 --> 00:51:12,610
measuring each time. So in theory, these times will be growing
917
00:51:12,610 --> 00:51:14,860
or at least pretty close to one another,
918
00:51:14,860 --> 00:51:18,090
and, indeed, the ones that are right here on campus are super-small.
919
00:51:18,090 --> 00:51:20,820
As soon as you start going across the country, it takes data
920
00:51:20,820 --> 00:51:24,830
a little longer to travel, closer to 100 ms, give or take.
921
00:51:24,830 --> 00:51:28,330
But let's go the other direction now. How about Cambridge University in the UK?
922
00:51:28,330 --> 00:51:32,540
Let me instead run traceroute of www.cam for Cambridge,
923
00:51:32,540 --> 00:51:36,710
.ac for academic, .uk, and hit Enter here.
924
00:51:36,710 --> 00:51:38,830
That was pretty damn fast.
925
00:51:38,830 --> 00:51:43,300
My data literally went to Cambridge, England, in that split second of time.
926
00:51:43,300 --> 00:51:45,340
>> So let's see the path that it took.
927
00:51:45,340 --> 00:51:47,520
Harvard, Harvard, Harvard, Northern Crossroads,
928
00:51:47,520 --> 00:51:52,690
which is an ISP, and then this is Northern Crossroads, and then bam.
929
00:51:52,690 --> 00:51:58,320
What is in between steps 6 and 7, router 6 and 7?
930
00:51:58,320 --> 00:52:02,040
The Atlantic Ocean. And we're inferring this from the fact that
931
00:52:02,040 --> 00:52:06,530
we go from 20 ms here to 80 ms here.
932
00:52:06,530 --> 00:52:10,050
So something took 60 ms, give or take, to get over.
933
00:52:10,050 --> 00:52:12,910
And that was probably a big body of water.
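The same inference can be sketched in a few lines of Python. The list of times below is hypothetical; only the jump from roughly 20 ms to roughly 80 ms between hops 6 and 7 comes from the output discussed here.

```python
# Hypothetical round-trip times in milliseconds, one per hop (hop 1 first).
# Only the 20 ms to 80 ms jump between hops 6 and 7 is from the lecture.
rtt_ms = [1, 1, 2, 5, 12, 20, 80, 88]

biggest_hop, biggest_jump = 0, 0
for i in range(1, len(rtt_ms)):
    # Difference between this hop's time and the previous hop's time.
    jump = rtt_ms[i] - rtt_ms[i - 1]
    if jump > biggest_jump:
        # List index i corresponds to hop number i + 1.
        biggest_hop, biggest_jump = i + 1, jump

print(f"largest jump: {biggest_jump} ms arriving at hop {biggest_hop}")
```

Real traceroute times are noisy, so a later hop can report a smaller time than an earlier one; a sketch like this only points at where the long-distance link probably is.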
934
00:52:12,910 --> 00:52:15,250
What goes on after that? Well, here we are in London,
935
00:52:15,250 --> 00:52:18,860
just 88 ms later. More London, more London,
936
00:52:18,860 --> 00:52:21,730
not sure where this is, but we'll assume it's outside of London,
937
00:52:21,730 --> 00:52:26,390
Cambridge here, and finally we--literally, University of Cambridge
938
00:52:26,390 --> 00:52:29,500
.something.net, and then, finally, in line 16,
939
00:52:29,500 --> 00:52:31,720
their web server is apparently called Scorpius
940
00:52:31,720 --> 00:52:35,500
underneath the hood, even though we know it as www.
941
00:52:35,500 --> 00:52:38,790
Kind of mind-blowing, I think. The first time I ever did this, it totally blew my mind.
942
00:52:38,790 --> 00:52:41,670
Unfortunately, Harvard blocks this kind of traffic, typically, on the network.
943
00:52:41,670 --> 00:52:44,340
So you can't do it super easily.
944
00:52:44,340 --> 00:52:48,500
Realize, though, this here is possible.
945
00:52:48,500 --> 00:52:53,630
All right. Let's take our 5-minute break here. We'll come back and dive in deeper.
946
00:52:53,630 --> 00:53:00,850
So we are back, and we've kind of ambled about in a few different directions here.
947
00:53:00,850 --> 00:53:03,700
So let's summarize exactly what's been going on here.
948
00:53:03,700 --> 00:53:07,990
We started the conversation talking about this language called HTML.
949
00:53:07,990 --> 00:53:10,680
Again, not a programming language. It's just a markup language
950
00:53:10,680 --> 00:53:15,490
that is largely about aesthetics and structuring of content in the form of a webpage.
951
00:53:15,490 --> 00:53:19,220
But HTML, therefore, needs some kind of mechanism
952
00:53:19,220 --> 00:53:22,870
for traveling between web browser and server.
953
00:53:22,870 --> 00:53:28,360
HTML therefore sort of rides on top of this other language,
954
00:53:28,360 --> 00:53:31,280
or more properly, a protocol, known as HTTP.
955
00:53:31,280 --> 00:53:33,730
>> And HTTP, as we've seen it thus far,
956
00:53:33,730 --> 00:53:37,140
is kind of analogous to this human convention of shaking hands.
957
00:53:37,140 --> 00:53:39,940
When a browser wants to request a page from a server,
958
00:53:39,940 --> 00:53:43,450
it sends that "get" request from browser to server,
959
00:53:43,450 --> 00:53:48,040
and then the server responds with a number like 200, all is okay,
960
00:53:48,040 --> 00:53:53,290
as well as the HTML or some bad number like 404, file not found.
961
00:53:53,290 --> 00:53:58,220
But meanwhile, HTTP itself isn't the Internet, per se.
962
00:53:58,220 --> 00:54:01,550
HTTP is just a service, a feature of the Internet
963
00:54:01,550 --> 00:54:05,530
much like Gchat is another service, much like email is another service.
964
00:54:05,530 --> 00:54:09,180
There's all sorts of things we can do on the Internet.
965
00:54:09,180 --> 00:54:12,670
HTTP is just one of those applications.
966
00:54:12,670 --> 00:54:17,210
So on top of--HTTP is on top of something else
967
00:54:17,210 --> 00:54:21,750
which we didn't mention by name, you might have heard of by name, TCP/IP.
968
00:54:21,750 --> 00:54:25,160
So the story we just told there is all about
969
00:54:25,160 --> 00:54:28,720
how data travels from point a to point b.
970
00:54:28,720 --> 00:54:30,950
And in this case, we saw at a very low level
971
00:54:30,950 --> 00:54:33,060
router to router to router to router,
972
00:54:33,060 --> 00:54:35,390
how the data is actually being transmitted.
973
00:54:35,390 --> 00:54:40,510
But along the way, it is going to encounter various impediments.
974
00:54:40,510 --> 00:54:43,770
Besides these routers, there are things called firewalls on the Internet,
975
00:54:43,770 --> 00:54:46,680
and so data, such as that we were just transmitting
976
00:54:46,680 --> 00:54:49,720
from me to Stanford, from me to Cambridge,
977
00:54:49,720 --> 00:54:54,560
is sent to, at this level, something called an IP address.
978
00:54:54,560 --> 00:54:57,340
We saw this a moment ago, and an IP address
979
00:54:57,340 --> 00:55:02,480
is just a numeric address of the form w.x.y.z,
980
00:55:02,480 --> 00:55:08,070
where each of these is between, give or take, 0 and 255,
981
00:55:08,070 --> 00:55:10,080
though you can't quite use all of those numbers.
982
00:55:10,080 --> 00:55:14,220
But each of these place holders is a number between 0 and 255.
983
00:55:14,220 --> 00:55:16,820
So an IP address these days is 32 bits.
984
00:55:16,820 --> 00:55:20,780
>> Now, that gives us how many possible IP addresses in the world?
985
00:55:20,780 --> 00:55:24,420
Roughly 4 billion, because with 32 bits we can count
986
00:55:24,420 --> 00:55:27,760
up to 2 to the 32nd power, which is a little over 4 billion.
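The arithmetic can be checked directly. In this sketch the address 192.0.2.1 is a made-up example from a range reserved for documentation, and the 128-bit figure anticipates the newer version of IP.

```python
import ipaddress

# Four numbers of 8 bits each make 32 bits in total.
total_v4 = 2 ** 32
print(total_v4)  # 4294967296

# A dotted address is just a 32-bit number written one byte at a time.
addr = ipaddress.IPv4Address("192.0.2.1")  # hypothetical example address
as_number = int(addr)
print(as_number == (192 << 24) + (0 << 16) + (2 << 8) + 1)  # True

# With 128 bits the space is vastly larger.
total_v6 = 2 ** 128
print(total_v6)
```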
987
00:55:27,760 --> 00:55:30,160
So that's a lot of IP addresses, but you might have read,
988
00:55:30,160 --> 00:55:32,410
or you might now notice in the popular press,
989
00:55:32,410 --> 00:55:36,020
a push toward a new version of IP called IPv6.
990
00:55:36,020 --> 00:55:38,290
Right now we're using version 4.
991
00:55:38,290 --> 00:55:41,060
There really hasn't been a version 5, we're just jumping right to 6.
992
00:55:41,060 --> 00:55:46,760
Version 6 is going to use 128 bits for IP addresses, which is freaking huge.
993
00:55:46,760 --> 00:55:49,430
We should not run out for quite some time now,
994
00:55:49,430 --> 00:55:52,980
but we have begun to run out of version 4 IP addresses,
995
00:55:52,980 --> 00:55:56,110
because all of us have not only things like laptops and desktops,
996
00:55:56,110 --> 00:55:58,700
a lot of us have phones, a lot of us have other devices
997
00:55:58,700 --> 00:56:01,600
like TiVo and the like that have IP addresses themselves.
998
00:56:01,600 --> 00:56:03,720
Harvard itself has tens of thousands of computers.
999
00:56:03,720 --> 00:56:07,970
So the world is genuinely running out of IP addresses, at least of this form.
1000
00:56:07,970 --> 00:56:10,340
So over the next few years, you are going to see the addresses
1001
00:56:10,340 --> 00:56:12,870
on your own computers probably slowly change
1002
00:56:12,870 --> 00:56:16,740
as more and more companies and universities start to support the newer version.
1003
00:56:16,740 --> 00:56:22,770
But an IP address is not sufficient for computer a to request data from computer b.
1004
00:56:22,770 --> 00:56:24,950
Because computer b could be a server,
1005
00:56:24,950 --> 00:56:27,600
and a server, as I mentioned earlier, can do bunches of things.
1006
00:56:27,600 --> 00:56:29,940
It can host web pages, it can be an email server,
1007
00:56:29,940 --> 00:56:32,310
it can be a Skype server, it can be a Gchat server.
1008
00:56:32,310 --> 00:56:35,870
>> All these different services that can be provided on a server
1009
00:56:35,870 --> 00:56:38,330
could all, physically, be on the same machine.
1010
00:56:38,330 --> 00:56:40,380
So in addition to IP addresses,
1011
00:56:40,380 --> 00:56:43,250
the world has things called ports on the Internet.
1012
00:56:43,250 --> 00:56:47,830
A port is just a number; so there is a unique number for HTTP.
1013
00:56:47,830 --> 00:56:50,280
Its number is 80.
1014
00:56:50,280 --> 00:56:55,870
HTTP also uses number 443, but more specifically, for encrypted HTTPS.
1015
00:56:55,870 --> 00:57:00,030
Whenever you see the s, for secure, that's using a different number.
1016
00:57:00,030 --> 00:57:06,580
There are other numbers, like 25, used for something called SMTP, otherwise known as email.
1017
00:57:06,580 --> 00:57:09,620
There's port 22 for SSH,
1018
00:57:09,620 --> 00:57:11,850
and there's a whole bunch of other ports out there.
1019
00:57:11,850 --> 00:57:14,460
Now, we humans rarely see these numbers.
1020
00:57:14,460 --> 00:57:21,970
However, when you type in an address like http://www.facebook.com,
1021
00:57:21,970 --> 00:57:26,560
the browser is secretly inserting 80, because you're using HTTP.
1022
00:57:26,560 --> 00:57:30,630
If you, instead, type HTTPS, it's secretly inserting 443.
1023
00:57:30,630 --> 00:57:35,180
And we can kind of see this manually if I pull up a browser
1024
00:57:35,180 --> 00:57:41,850
and go to http://www.facebook.com:80.
1025
00:57:41,850 --> 00:57:44,550
Therefore explicitly citing not just the name of the website
1026
00:57:44,550 --> 00:57:47,650
but the port that I want to talk to, and hit Enter.
1027
00:57:47,650 --> 00:57:50,170
Notice it disappears, because the browser assumes,
1028
00:57:50,170 --> 00:57:53,360
oh, 80, I'm not even going to bother showing that to you.
1029
00:57:53,360 --> 00:57:56,400
But the reason for this is that if I actually wanted to send someone an email,
1030
00:57:56,400 --> 00:58:02,340
I would really be sending it to them on port 25, that being SMTP.
1031
00:58:02,340 --> 00:58:04,890
A bit of an oversimplification, but some of you have friends
1032
00:58:04,890 --> 00:58:09,290
who actually work at Facebook, and they, similarly, have servers that receive email.
1033
00:58:09,290 --> 00:58:12,610
>> Any time you send an email, what Gmail is doing for you
1034
00:58:12,610 --> 00:58:14,960
or Outlook or whatever program you use,
1035
00:58:14,960 --> 00:58:19,270
it's sort of secretly inserting that number as well, 25 in that case.
1036
00:58:19,270 --> 00:58:24,490
It's this combination of IP address and number that uniquely identifies
1037
00:58:24,490 --> 00:58:29,190
a computer on the Internet and a specific service on that computer.
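Here is a minimal sketch of the rule the browser is applying, assuming only the two schemes discussed. The names DEFAULT_PORTS and effective_port are hypothetical helpers, not part of any browser or library.

```python
from urllib.parse import urlsplit

# Well-known ports for the two schemes mentioned in lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a URL refers to, filling in the default if omitted."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # the user wrote :port explicitly
    return DEFAULT_PORTS[parts.scheme]  # otherwise infer it from the scheme


print(effective_port("http://www.facebook.com"))     # 80
print(effective_port("http://www.facebook.com:80"))  # 80
print(effective_port("https://www.facebook.com"))    # 443
```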
1038
00:58:29,190 --> 00:58:33,460
Now, of course, most of us have probably never typed manually an IP address.
1039
00:58:33,460 --> 00:58:37,340
Maybe you have in the appliance, but in the real world, not so much.
1040
00:58:37,340 --> 00:58:42,750
Why do we not type IP addresses into browsers?
1041
00:58:42,750 --> 00:58:45,860
It would work, in fact, we can see this; let me show you
1042
00:58:45,860 --> 00:58:50,000
one other command that should work most anywhere on Harvard's campus on a Mac or a PC.
1043
00:58:50,000 --> 00:58:53,970
There's this command called nslookup, name server lookup.
1044
00:58:53,970 --> 00:58:59,960
If I look up www.cnn.com, it turns out that CNN has--oh, interesting.
1045
00:58:59,960 --> 00:59:03,180
CNN has started using Amazon Web Services.
1046
00:59:03,180 --> 00:59:06,380
You might know of cloud computing; Amazon's one of the big players in cloud computing.
1047
00:59:06,380 --> 00:59:10,240
What I just did was I said, "Give me the address of CNN's web server,"
1048
00:59:10,240 --> 00:59:14,090
but it turns out that CNN's web server is managed by Amazon,
1049
00:59:14,090 --> 00:59:16,030
Amazon Web Services, this suggests.
1050
00:59:16,030 --> 00:59:19,680
And the address of that server is this here.
1051
00:59:19,680 --> 00:59:22,350
So I'm not sure if this will work, because they didn't used to use Amazon.
1052
00:59:22,350 --> 00:59:32,830
But let's try this; http://, IP address, Enter, and--
1053
00:59:32,830 --> 00:59:35,690
is it going to work?
1054
00:59:35,690 --> 00:59:39,280
Yes. It is going to work. Internet is super slow today.
1055
00:59:39,280 --> 00:59:43,680
But, in a moment, you will see some news story.
1056
00:59:43,680 --> 00:59:48,360
There we go. Bank of America's being sued. All right.
1057
00:59:48,360 --> 00:59:54,000
>> This is because this IP address just happens to be synonymous with www.cnn.com.
1058
00:59:54,000 --> 00:59:59,920
Of course, it would be horrible marketing to say, visit us on the Web at 50.112.94.127.
1059
00:59:59,920 --> 01:00:02,370
You'd never remember. So even these days you might recall things
1060
01:00:02,370 --> 01:00:07,210
like 1-800-COLLECT or mnemonics the world came up with for phone numbers.
1061
01:00:07,210 --> 01:00:09,540
Which, before cell phones, were rather hard to remember
1062
01:00:09,540 --> 01:00:11,800
until you could just type it in and forget about it.
1063
01:00:11,800 --> 01:00:15,730
So the Web, too, has this convention of names and IP addresses,
1064
01:00:15,730 --> 01:00:17,770
and there are these things out there called DNS servers,
1065
01:00:17,770 --> 01:00:23,870
domain name system servers, that translate IP addresses into names and vice versa.
1066
01:00:23,870 --> 01:00:26,340
So that's what's going on underneath the hood.
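[Editor's sketch] The two-way translation described above can be modeled, very roughly, as a lookup table. This is only a toy under stated assumptions: the table name RECORDS and both function names are made up for illustration, and the single entry is the address read aloud in lecture, which may well have changed since.

```python
# Toy name table: a stand-in for the records a DNS server keeps.
# The one entry is the address mentioned in lecture; real records change over time.
RECORDS = {"www.cnn.com": "50.112.94.127"}


def name_to_ip(name):
    """Forward lookup: host name -> IP address (None if unknown)."""
    return RECORDS.get(name.lower())


def ip_to_name(ip):
    """Reverse lookup: IP address -> host name (None if unknown)."""
    for name, address in RECORDS.items():
        if address == ip:
            return name
    return None
```

On a real machine, Python's socket.gethostbyname("www.cnn.com") asks the system's resolver instead of a hard-coded table, much as nslookup does.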
1067
01:00:26,340 --> 01:00:29,540
In the end, we have TCP/IP, which is this very low-level protocol
1068
01:00:29,540 --> 01:00:32,570
that, really, just gets 0's and 1's across the Internet,
1069
01:00:32,570 --> 01:00:36,030
and it does so by putting them into a virtual envelope,
1070
01:00:36,030 --> 01:00:38,820
if you will, and writing on the outside of the envelope
1071
01:00:38,820 --> 01:00:43,930
the IP address of the destination, as well as the numeric port number
1072
01:00:43,930 --> 01:00:47,520
of the service on that destination that it wants to talk to.
1073
01:00:47,520 --> 01:00:51,060
Meanwhile, on the envelope there's also something known as a return address,
1074
01:00:51,060 --> 01:00:55,600
which is your IP address, so that when CNN gets a packet of information from you,
1075
01:00:55,600 --> 01:00:58,710
opens this virtual envelope, sees that you want the home page,
1076
01:00:58,710 --> 01:01:04,630
it knows from the sender part of this virtual envelope whom to send the HTML back to.
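[Editor's sketch] The virtual envelope can be pictured as a small record with a destination, a return address, and the contents. The class and field names below are invented for illustration. The src_port field is an assumption that goes beyond what the lecture mentions: real TCP also carries a source port so the reply can reach the right program on your machine. The client address 10.0.0.5 is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    src_ip: str    # return address: who sent this
    src_port: int  # assumption: not mentioned in lecture, but real TCP has one
    dst_ip: str    # destination address
    dst_port: int  # service on the destination, e.g. 80 for the Web
    payload: str   # the contents of the envelope


def reply_to(request, payload):
    """Build the response envelope by swapping sender and receiver."""
    return Envelope(
        src_ip=request.dst_ip,
        src_port=request.dst_port,
        dst_ip=request.src_ip,
        dst_port=request.src_port,
        payload=payload,
    )


# Hypothetical request for a home page, then the server's reply.
request = Envelope("10.0.0.5", 49152, "50.112.94.127", 80, "GET /")
response = reply_to(request, "<html>...</html>")
```

The server never needs to be told separately who asked: the return address on the request is all it uses to address the response.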
1077
01:01:04,630 --> 01:01:07,470
So let's take a look at this in a little more detail.
1078
01:01:07,470 --> 01:01:11,370
This is from a company called Ericsson, from a few years back.
1079
01:01:11,370 --> 01:01:14,780
And they took some liberties with how the Internet actually works,
1080
01:01:14,780 --> 01:01:18,920
but it paints a much more visual picture than mere chalk up here.
1081
01:01:18,920 --> 01:01:26,690
So I give you "A Bit of the Internet."
1082
01:02:26,660 --> 01:02:29,840
>> [Narrator] For the first time in history,
1083
01:02:29,840 --> 01:02:35,260
people and machinery are working together, realizing a dream.
1084
01:02:35,260 --> 01:02:38,910
A uniting force that knows no geographical boundaries.
1085
01:02:38,910 --> 01:02:43,230
Without regard to race, creed, or color.
1086
01:02:43,230 --> 01:02:47,770
A new era where communication truly brings people together.
1087
01:02:47,770 --> 01:02:50,070
This is
1088
01:02:50,070 --> 01:02:54,980
The Dawn of the Net.
1089
01:02:54,980 --> 01:03:04,640
Want to know how it works? Click here to begin your journey into the Net.
1090
01:03:04,640 --> 01:03:07,890
Now, exactly what happened when you clicked on that link?
1091
01:03:07,890 --> 01:03:10,150
You started a flow of information.
1092
01:03:10,150 --> 01:03:13,310
This information travels down into your own personal mailroom
1093
01:03:13,310 --> 01:03:18,500
where Mr. IP packages it, labels it, and sends it on its way.
1094
01:03:18,500 --> 01:03:20,960
Each packet is limited in its size.
1095
01:03:20,960 --> 01:03:23,880
The mail room must decide how to divide the information
1096
01:03:23,880 --> 01:03:26,070
and how to package it.
1097
01:03:26,070 --> 01:03:29,550
Now, the package needs a label containing important information
1098
01:03:29,550 --> 01:03:35,570
such as sender's address, receiver's address, and the type of packet it is.
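[Editor's sketch] The mailroom's job of dividing information into size-limited, labeled packets might look something like this. The function names and the dictionary layout are illustrative only, and the "seq" field is an assumption added so the pieces can be put back in order; the film's label lists only the sender, the receiver, and the packet type.

```python
def packetize(data, sender, receiver, max_size):
    """Split data (bytes) into labeled packets of at most max_size bytes each."""
    packets = []
    for seq, start in enumerate(range(0, len(data), max_size)):
        packets.append({
            "sender": sender,
            "receiver": receiver,
            "type": "data",
            "seq": seq,  # assumption: lets the receiver restore the order
            "payload": data[start:start + max_size],
        })
    return packets


def reassemble(packets):
    """Rebuild the original data, even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)
```

For example, packetize(b"hello, world", "10.0.0.5", "50.112.94.127", 5) yields three packets, and reassemble recovers the original bytes from them in any arrival order.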
1099
01:03:51,700 --> 01:03:54,980
Because this particular packet is going out onto the Internet,
1100
01:03:54,980 --> 01:03:57,720
it also gets an address for the proxy server,
1101
01:03:57,720 --> 01:04:01,520
which has a special function, as we'll see later.
1102
01:04:01,520 --> 01:04:06,650
The packet is now launched onto your local area network, or LAN.
1103
01:04:06,650 --> 01:04:10,160
This network is used to connect all the local computers,
1104
01:04:10,160 --> 01:04:15,900
routers, printers, et cetera, for information exchange within the physical walls of the building.
1105
01:04:15,900 --> 01:04:20,290
The LAN is a pretty uncontrolled place, and, unfortunately,
1106
01:04:20,290 --> 01:04:23,950
accidents can happen.
1107
01:04:31,190 --> 01:04:34,710
The highway of the LAN is packed with all types of information.
1108
01:04:34,710 --> 01:04:38,900
These are IP packets, Novell packets, AppleTalk packets.
1109
01:04:38,900 --> 01:04:41,270
They're going against traffic, as usual.
1110
01:04:41,270 --> 01:04:44,260
The local router reads the address and, if necessary,
1111
01:04:44,260 --> 01:04:48,520
lifts the packet on to another network.
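[Editor's sketch] The router's decision, read the address and move the packet to another network only if necessary, comes down to asking whether the destination belongs to the local network. The network 192.168.1.0/24 and the function name below are hypothetical; the check itself uses Python's standard ipaddress module.

```python
import ipaddress

# Hypothetical local network; a real router learns this from its configuration.
LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")


def next_hop(dst_ip):
    """Decide whether a packet stays on the LAN or is lifted onto another network."""
    if ipaddress.ip_address(dst_ip) in LOCAL_NETWORK:
        return "deliver locally"
    return "forward to another network"
```

A packet for 192.168.1.42 would stay on the LAN, while one addressed to 50.112.94.127 would be handed onward.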
1112
01:04:48,520 --> 01:04:54,270
Ah, the router. A symbol of control in a seemingly disorganized world.
1113
01:04:54,270 --> 01:05:05,480
[Router mumbling and talking to itself]
1114
01:05:05,480 --> 01:05:10,030
>> [Narrator] There he is, systematic, uncaring, methodical,
1115
01:05:10,030 --> 01:05:14,150
conservative, and sometimes not quite up to speed.
1116
01:05:14,150 --> 01:05:17,680
But at least he is exact, for the most part.
1117
01:05:32,270 --> 01:05:36,820
As the packets leave the router, they make their way into the corporate intranet
1118
01:05:36,820 --> 01:05:40,830
and head for the router switch.
1119
01:05:40,830 --> 01:05:46,250
A bit more efficient than the router, the router switch plays fast and loose with IP packets,
1120
01:05:46,250 --> 01:05:48,920
deftly routing them along their way.
1121
01:05:48,920 --> 01:05:52,130
A digital "pinball wizard," if you will.
1122
01:05:52,130 --> 01:06:04,270
[Router switch talking to itself]
1123
01:06:09,830 --> 01:06:12,150
[Narrator] As packets arrive at their destination,
1124
01:06:12,150 --> 01:06:14,740
they're picked up by the network interface,
1125
01:06:14,740 --> 01:06:18,040
ready to be sent to the next level.
1126
01:06:18,040 --> 01:06:21,010
In this case, the proxy.
1127
01:06:21,010 --> 01:06:25,040
The proxy is used by many companies as sort of a middle man
1128
01:06:25,040 --> 01:06:27,630
in order to lessen the load on the Internet connection
1129
01:06:27,630 --> 01:06:32,240
and for security reasons, as well.
1130
01:06:32,240 --> 01:06:38,750
As you can see, the packets are all of various sizes depending upon their content.
1131
01:06:55,210 --> 01:07:01,890
The proxy opens the packet and looks for the web address or URL.
1132
01:07:01,890 --> 01:07:04,950
Depending upon whether the address is acceptable,
1133
01:07:04,950 --> 01:07:08,000
the packet is sent on to the Internet.
1134
01:07:13,890 --> 01:07:19,630
There are, however, some addresses which do not meet with the approval of the proxy.
1135
01:07:19,630 --> 01:07:25,680
That is to say, corporate or management guidelines.
1136
01:07:25,680 --> 01:07:30,580
These are summarily dealt with.
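[Editor's sketch] The proxy's check can be approximated as: pull the host out of the URL and compare it against a policy list. The list contents and the function name are hypothetical; urlparse is from Python's standard library.

```python
from urllib.parse import urlparse

# Hypothetical policy list standing in for "corporate or management guidelines."
BLOCKED_HOSTS = {"blocked.example"}


def proxy_allows(url):
    """Return True if the URL names a host that is not on the blocked list."""
    host = urlparse(url).hostname
    return host is not None and host not in BLOCKED_HOSTS
```

Anything without a recognizable host is rejected here as well, a conservative choice made for this sketch rather than something the film specifies.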
1137
01:07:30,580 --> 01:07:32,410
We'll have none of that.
1138
01:07:32,410 --> 01:07:36,350
For those who make it, it's on the road again.
1139
01:07:46,850 --> 01:07:53,310
>> Next up, the firewall.
1140
01:07:53,310 --> 01:07:57,410
The corporate firewall serves two purposes.
1141
01:07:57,410 --> 01:08:02,420
It prevents some rather nasty things from the Internet from coming into the intranet,
1142
01:08:02,420 --> 01:08:10,280
and it can also prevent sensitive corporate information from being sent out onto the Internet.
1143
01:08:10,280 --> 01:08:12,980
Once through the firewall, a router picks up the packet
1144
01:08:12,980 --> 01:08:18,180
and places it onto a much narrower road, or bandwidth, as we say.
1145
01:08:18,180 --> 01:08:23,720
Obviously, the road is not broad enough to take them all.
1146
01:08:23,720 --> 01:08:29,319
Now, you might wonder what happens to all those packets which don't make it along the way.
1147
01:08:29,319 --> 01:08:32,270
Well, when Mr. IP doesn't receive an acknowledgement
1148
01:08:32,270 --> 01:08:35,000
that a packet has been received in due time,
1149
01:08:35,000 --> 01:08:39,890
he simply sends a replacement packet.
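[Editor's sketch] Sending a replacement when no acknowledgement arrives can be sketched as a retry loop. In real networks it is TCP, not IP itself, that tracks acknowledgements and retransmits; the film simplifies here. The function names, the attempt limit, and the lossy channel below are all invented for illustration.

```python
def send_reliably(packet, channel, max_attempts=5):
    """Resend until acknowledged; return how many attempts it took."""
    for attempt in range(1, max_attempts + 1):
        if channel(packet):  # True means an acknowledgement came back
            return attempt
    raise TimeoutError("no acknowledgement after %d attempts" % max_attempts)


def make_lossy_channel(drops):
    """Hypothetical channel that loses the first `drops` packets, then delivers."""
    state = {"remaining": drops}

    def channel(packet):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False  # packet lost, so no acknowledgement
        return True

    return channel
```

With a channel that drops two packets, send_reliably succeeds on the third attempt. A real implementation would also wait for a timeout between attempts, which is omitted here.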
1150
01:08:39,890 --> 01:08:44,760
We are now ready to enter the world of the Internet.
1151
01:08:44,760 --> 01:08:49,370
A spiderweb of interconnected networks which span our entire globe.
1152
01:08:49,370 --> 01:08:56,050
Here, routers and switches establish links between networks.
1153
01:08:56,050 --> 01:08:59,200
Now, the Net is an entirely different environment than you'll find
1154
01:08:59,200 --> 01:09:01,569
within the protective walls of your LAN.
1155
01:09:01,569 --> 01:09:04,060
Out here, it's the Wild West.
1156
01:09:04,060 --> 01:09:06,359
Plenty of space, plenty of opportunities,
1157
01:09:06,359 --> 01:09:09,760
plenty of things to explore and places to go.
1158
01:09:09,760 --> 01:09:12,760
Thanks to very little control and regulation,
1159
01:09:12,760 --> 01:09:18,300
new ideas find fertile soil to push the envelope of their possibilities.
1160
01:09:18,300 --> 01:09:22,330
But because of this freedom, certain dangers also lurk.
1161
01:09:22,330 --> 01:09:27,000
You'll never know when you'll meet the dreaded ping of death,
1162
01:09:27,000 --> 01:09:29,890
a special version of a normal request ping,
1163
01:09:29,890 --> 01:09:35,720
which some idiot thought up to mess up unsuspecting hosts.
1164
01:09:35,720 --> 01:09:39,130
The path our packets take may be via satellite,
1165
01:09:39,130 --> 01:09:43,090
telephone lines, wireless, or even transoceanic cable.
1166
01:09:43,090 --> 01:09:46,520
They don't always take the fastest or shortest routes possible,
1167
01:09:46,520 --> 01:09:50,290
but they will get there eventually.
1168
01:09:50,290 --> 01:09:55,230
Maybe that's why it's sometimes called "The World Wide Wait."
1169
01:09:55,230 --> 01:09:57,980
But when everything is working smoothly,
1170
01:09:57,980 --> 01:10:03,800
you can circumvent the globe five times over at the drop of a hat, literally.
1171
01:10:03,800 --> 01:10:08,230
And all for the cost of a local call or less.
1172
01:10:08,230 --> 01:10:15,070
Near the end of our destination, we'll find another firewall.
1173
01:10:15,070 --> 01:10:18,420
>> Depending upon your perspective as a data packet,
1174
01:10:18,420 --> 01:10:23,730
the firewall could be a bastion of security or a dreaded adversary.
1175
01:10:23,730 --> 01:10:28,530
It all depends on which side you're on and what your intentions are.
1176
01:10:28,530 --> 01:10:34,990
The firewall is designed to let in only those packets that meet its criteria.
1177
01:10:34,990 --> 01:10:39,360
This firewall is operating on ports 80 and 25.
1178
01:10:39,360 --> 01:10:46,630
All attempts to enter through other ports are closed for business.
1179
01:10:57,660 --> 01:11:03,480
Port 25 is used for mail packets,
1180
01:11:03,480 --> 01:11:10,720
while port 80 is the entrance for packets from the Internet to the web server.
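[Editor's sketch] A firewall that admits traffic only on ports 25 and 80 amounts to a membership test on the destination port. The names below are illustrative, and a real firewall weighs much more than the port number.

```python
# The two open ports named in the film, with the services behind them.
ALLOWED_PORTS = {25: "mail", 80: "web"}


def firewall_admits(dst_port):
    """Return True only for packets addressed to an open port."""
    return dst_port in ALLOWED_PORTS
```

So a packet bound for port 80 gets through to the web server, while one bound for, say, port 22 is turned away.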
1181
01:11:10,720 --> 01:11:15,080
Inside the firewall, packets are screened more thoroughly.
1182
01:11:15,080 --> 01:11:17,970
Some packets make it easily through customs,
1183
01:11:17,970 --> 01:11:21,420
while others look just a bit dubious.
1184
01:11:21,420 --> 01:11:24,060
Now, the firewall officer is not easily fooled,
1185
01:11:24,060 --> 01:11:32,120
such as when this ping of death packet tries to disguise itself as a normal ping packet.
1186
01:11:32,120 --> 01:11:37,520
[Firewall officer talking to packets]
1187
01:11:37,520 --> 01:11:40,510
[Narrator] For those packets lucky enough to make it this far,
1188
01:11:40,510 --> 01:11:45,730
the journey is almost over.
1189
01:11:45,730 --> 01:11:52,130
It's just a line up on the interface to be taken up into the web server.
1190
01:11:52,130 --> 01:11:55,440
Nowadays, a web server can run on many things,
1191
01:11:55,440 --> 01:11:59,230
from a mainframe to a web cam to the computer on your desk.
1192
01:11:59,230 --> 01:12:01,720
Why not your refrigerator?
1193
01:12:01,720 --> 01:12:04,870
With the proper setup, you can find out if you have the makings
1194
01:12:04,870 --> 01:12:08,390
for Chicken Cacciatore, or if you have to go shopping.
1195
01:12:08,390 --> 01:12:11,760
Remember, this is the dawn of the Net.
1196
01:12:11,760 --> 01:12:17,310
Almost anything is possible.
1197
01:12:17,310 --> 01:12:20,440
One by one, the packets are received,
1198
01:12:20,440 --> 01:12:26,320
opened, and unpacked.
1199
01:12:26,320 --> 01:12:31,200
The information they contain, that is, your request for information,
1200
01:12:31,200 --> 01:12:34,830
is sent on to the web server application.
1201
01:12:41,540 --> 01:12:47,140
The packet itself is recycled,
1202
01:12:47,140 --> 01:12:57,570
ready to be used again, and filled with your requested information,
1203
01:12:57,570 --> 01:13:03,340
addressed, and sent out on its way back to you.
1204
01:13:03,340 --> 01:13:13,250
Back past the firewall, routers, and on through to the Internet.
1205
01:13:13,250 --> 01:13:21,020
Back through your corporate firewall
1206
01:13:21,020 --> 01:13:24,180
and onto your interface,
1207
01:13:24,180 --> 01:13:31,180
ready to supply your web browser with the information you've requested.
1208
01:13:31,180 --> 01:13:39,840
That is, this film.
1209
01:13:39,840 --> 01:13:43,550
Pleased with their efforts, and trusting in a better world,
1210
01:13:43,550 --> 01:13:50,250
our trusty data packets ride off blissfully into the sunset of another day,
1211
01:13:50,250 --> 01:13:56,880
knowing fully they have served their masters well.
1212
01:13:56,880 --> 01:14:02,560
Now, isn't that a happy ending?
1213
01:14:02,560 --> 01:14:07,040
[Malan] Okay, that's enough. We'll see you next week.
1214
01:14:07,040 --> 01:14:10,040
[CS50.TV]