1 00:00:00,000 --> 00:00:02,620 [Week 7, Continued] 2 00:00:02,620 --> 00:00:05,090 [David J. Malan, Harvard University] 3 00:00:05,090 --> 00:00:07,780 [This is CS50.] [CS50.TV] 4 00:00:07,780 --> 00:00:09,810 All right. Welcome Back. This is CS50, 5 00:00:09,810 --> 00:00:12,100 and this is the end of week 7. 6 00:00:12,100 --> 00:00:15,460 So one of these stupid little things that goes around the Internet 7 00:00:15,460 --> 00:00:24,080 and we slurped up, and it should now make a little bit of geeky sense to you. 8 00:00:24,080 --> 00:00:28,330 Well, it was funnier to this guy than it was to you guys. 9 00:00:28,330 --> 00:00:32,619 Speaking of, well, guys, 10 00:00:32,619 --> 00:00:42,550 today is Nate's birthday. 11 00:00:42,550 --> 00:00:46,630 To give you a sense of just how good Nate and I are 12 00:00:46,630 --> 00:00:50,140 at web development based on Monday's class and based now on this, 13 00:00:50,140 --> 00:00:53,170 I thought I'd pull up Nate's home page, if you haven't seen it yet. 14 00:00:53,170 --> 00:00:57,020 This here is Nate's HTML. 15 00:00:57,020 --> 00:00:59,380 So see his source code if you'd like to see how to do this, and Nate, 16 00:00:59,380 --> 00:01:02,250 if we could embarrass you just briefly, the staff got you a little something 17 00:01:02,250 --> 00:01:06,080 if you'd like to share some dessert with some of the kids in the class here. 18 00:01:06,080 --> 00:01:10,150 If you'd like to come on down. 19 00:01:10,150 --> 00:01:14,350 You all applaud and are very nice, but no one is sitting anywhere near Nate, 20 00:01:14,350 --> 00:01:17,560 for some reason, in that back zone. 21 00:01:17,560 --> 00:01:24,020 So perhaps you can find some folks to enjoy these with. 22 00:01:24,020 --> 00:01:33,380 Happy Birthday, Nate. 23 00:01:33,380 --> 00:01:37,660 >> Additional hellos: We showed a couple clips from our CS50x students. 
24 00:01:37,660 --> 00:01:39,710 If you would like to see who else it is in the world 25 00:01:39,710 --> 00:01:41,850 that's following along, you can head to this URL, 26 00:01:41,850 --> 00:01:45,780 where Joseph, one of our TFs, has put together a montage of sorts 27 00:01:45,780 --> 00:01:50,290 of everyone who has been submitting these videos, among them Rick Astley. 28 00:01:50,290 --> 00:01:53,010 And if you scroll through these, it's really quite inspiring 29 00:01:53,010 --> 00:01:56,890 to see the diversity of countries and cities from which people are hailing. 30 00:01:56,890 --> 00:02:00,830 So if you'd like to take a look at that, that will be up through the end of the semester. 31 00:02:00,830 --> 00:02:05,370 Today we continue our look at the Web, web programming, HTML and the like, 32 00:02:05,370 --> 00:02:08,280 and we also have lunch coming up this Friday 33 00:02:08,280 --> 00:02:11,360 if you would like, and particularly, have not done so before. 34 00:02:11,360 --> 00:02:13,630 This Friday's theme will be Nate's birthday, 35 00:02:13,630 --> 00:02:15,700 so if you would like to have birthday lunch with Nate 36 00:02:15,700 --> 00:02:17,500 and others, some of our friends from industry, 37 00:02:17,500 --> 00:02:19,300 please head to that URL there. 38 00:02:19,300 --> 00:02:22,510 Space, as always, is limited. Also, if you've forgotten, 39 00:02:22,510 --> 00:02:26,460 realize that next week is the deadline for problem set 4's scavenger hunt, 40 00:02:26,460 --> 00:02:30,070 whereby after recovering all of those JPEGs from card.raw, 41 00:02:30,070 --> 00:02:32,880 you and your section mates, if you would like, can try photographing 42 00:02:32,880 --> 00:02:36,100 as many of the computer scientists from that memory card as possible, 43 00:02:36,100 --> 00:02:39,070 and you and your section will then win a fabulous prize. 44 00:02:39,070 --> 00:02:44,470 Refer back to pset 4's specification as to what to submit and by when. 
45 00:02:44,470 --> 00:02:47,650 Also, if you would like to have your handiwork immortalized 46 00:02:47,650 --> 00:02:51,400 on the course's website and its history of apparel, 47 00:02:51,400 --> 00:02:54,010 know that you are welcome now to start submitting designs 48 00:02:54,010 --> 00:02:57,180 for this year's T-shirts and sweatshirts and the like. 49 00:02:57,180 --> 00:02:59,200 We'll do our best to include as many as we can, 50 00:02:59,200 --> 00:03:01,440 but we'll have some members of the staff review all of the designs 51 00:03:01,440 --> 00:03:04,180 to make sure they're consistent with the specifications, 52 00:03:04,180 --> 00:03:07,500 and we then pick generally a handful of them to be exhibited. 53 00:03:07,500 --> 00:03:10,620 So if you are the design type, just know that the requirements 54 00:03:10,620 --> 00:03:14,030 for graphics are PNG, at least 200 DPI; 55 00:03:14,030 --> 00:03:16,520 they shouldn't be more than 4000 x 4000 pixels, 56 00:03:16,520 --> 00:03:19,010 and no more than 10 MB, but you're welcome to use things like 57 00:03:19,010 --> 00:03:22,430 Photoshop or GIMP or various graphics programs, 58 00:03:22,430 --> 00:03:24,590 whatever you have at your disposal. 59 00:03:24,590 --> 00:03:28,280 >> Also on the horizon is the final project. The final project really is the climax of 50, 60 00:03:28,280 --> 00:03:30,560 whereby of all the assignments in the course, 61 00:03:30,560 --> 00:03:33,170 it's your opportunity really to do your own thing. 62 00:03:33,170 --> 00:03:35,280 And that can be simply to do something for fun, 63 00:03:35,280 --> 00:03:38,160 it can be to solve some pressing problem your student group has, 64 00:03:38,160 --> 00:03:40,980 for some new website, some new collection mechanism for data. 65 00:03:40,980 --> 00:03:43,420 It can be a mobile application for Android, for iOS. 
66 00:03:43,420 --> 00:03:46,030 Really, the sky is the limit, and over the next few weeks, 67 00:03:46,030 --> 00:03:50,900 as we transition from C to these higher-level languages like PHP and JavaScript, 68 00:03:50,900 --> 00:03:55,150 you'll find yourself increasingly familiarized with some real-world techniques, 69 00:03:55,150 --> 00:03:57,800 some real-world tools, and to supplement that, 70 00:03:57,800 --> 00:04:00,170 know that the course has a history of seminars, 71 00:04:00,170 --> 00:04:02,880 whereby over the next several weeks, some of the teaching staff 72 00:04:02,880 --> 00:04:06,160 and friends of ours from on campus will offer optional seminars 73 00:04:06,160 --> 00:04:08,540 which go above and beyond what's typically done in section 74 00:04:08,540 --> 00:04:11,090 to introduce you to things like Android programming, 75 00:04:11,090 --> 00:04:13,450 to introduce you to things like iOS programming 76 00:04:13,450 --> 00:04:15,950 or more advanced web-development techniques. 77 00:04:15,950 --> 00:04:17,970 There's a whole history of these already online. 78 00:04:17,970 --> 00:04:25,000 If you go to cs50.net/seminars, we've been doing this for quite some years, 79 00:04:25,000 --> 00:04:28,740 and you'll see that archived here with PDFs and videos and the like 80 00:04:28,740 --> 00:04:33,090 are several dozen videos of seminars. 81 00:04:33,090 --> 00:04:37,380 Last year, for instance, we had a seminar on acing your technical interviews, 82 00:04:37,380 --> 00:04:40,980 if you're actually looking to go off and do an internship or full-time gig. 83 00:04:40,980 --> 00:04:43,450 Windows mobile development, Android development, Google Maps, 84 00:04:43,450 --> 00:04:47,700 API, CSS, developing for the BlackBerry, Emacs. 85 00:04:47,700 --> 00:04:52,610 Really, you are welcome to take a look at any of these seminars at your convenience. 86 00:04:52,610 --> 00:04:57,080 And we'll be holding some new ones this semester, as well. 
87 00:04:57,080 --> 00:04:59,020 >> So what is ahead with the final project? 88 00:04:59,020 --> 00:05:01,090 Well, first, even though this date is somewhat imminent, 89 00:05:01,090 --> 00:05:06,460 this is really just an opportunity to start thinking about the final project quite realistically. 90 00:05:06,460 --> 00:05:10,550 We know only the beginnings of some of what we'll still be covering in the course-- 91 00:05:10,550 --> 00:05:13,470 HTML, PHP and the like--but you're all familiar with the Web, 92 00:05:13,470 --> 00:05:16,270 and I bias this conversation toward the Web only because 93 00:05:16,270 --> 00:05:18,380 most people end up doing Web-based final projects, 94 00:05:18,380 --> 00:05:20,260 but that is by no means requisite. 95 00:05:20,260 --> 00:05:22,260 Using C is fine, Objective-C, Java, 96 00:05:22,260 --> 00:05:25,350 any other language you might know or want to know is quite fine. 97 00:05:25,350 --> 00:05:29,370 But to get the juices flowing initially, we'll expect the submission of a preproposal 98 00:05:29,370 --> 00:05:33,520 which, per the PDF on the website, which is now at cs50.net, 99 00:05:33,520 --> 00:05:36,080 and at the top left you'll see final project 100 00:05:36,080 --> 00:05:38,920 is the specification for the final project, 101 00:05:38,920 --> 00:05:41,470 and in there are details on the preproposal and the like. 102 00:05:41,470 --> 00:05:44,760 It pretty much boils down to an email to your teaching fellow 103 00:05:44,760 --> 00:05:48,450 just to strike up a conversation with him or her about what you're thinking. 104 00:05:48,450 --> 00:05:52,510 On projects.cs50.net is a repository of ideas from folks on campus 105 00:05:52,510 --> 00:05:54,480 if you're struggling to come up with some idea, 106 00:05:54,480 --> 00:06:01,140 and manual.cs50.net/apis is a repository of links to APIs. 107 00:06:01,140 --> 00:06:06,710 >> What, though, is an API? 108 00:06:06,710 --> 00:06:09,790 What's an API? 
I've said it at least twice, 109 00:06:09,790 --> 00:06:12,640 according to the transcripts of the past several weeks. 110 00:06:12,640 --> 00:06:17,050 What's that? [Student, unintelligible] 111 00:06:17,050 --> 00:06:19,340 >>Okay, good. So something programming interface. 112 00:06:19,340 --> 00:06:22,710 Application programming interface, and this can take several forms, 113 00:06:22,710 --> 00:06:25,850 but what this really boils down to is code 114 00:06:25,850 --> 00:06:29,660 that someone else has written or data that someone else has collected 115 00:06:29,660 --> 00:06:33,670 that is made available to you in some programmatic way. 116 00:06:33,670 --> 00:06:36,630 You can write code in C, PHP, Python, Ruby, 117 00:06:36,630 --> 00:06:38,760 whatever your language of choice typically is, 118 00:06:38,760 --> 00:06:42,240 and you can somehow build upon someone else's functionality 119 00:06:42,240 --> 00:06:44,440 or someone else's data set. 120 00:06:44,440 --> 00:06:47,210 For instance, if I go to this link here, 121 00:06:47,210 --> 00:06:50,750 and you'll see a pair of links on the subsequent page 122 00:06:50,750 --> 00:06:56,093 whereby we have CS50's own APIs, which are very Harvard-centric, and then third-party APIs. 123 00:06:56,930 --> 00:06:59,300 Among the third-party APIs are really useful things 124 00:06:59,300 --> 00:07:01,780 like being able to send SMS's to people, 125 00:07:01,780 --> 00:07:04,690 being able to receive SMS text messages from people. 126 00:07:04,690 --> 00:07:08,160 And things like that that you might have no idea how to implement yourself, 127 00:07:08,160 --> 00:07:10,440 but thanks to services, some free and some commercial, 128 00:07:10,440 --> 00:07:14,000 you can build atop those and do something of interest to you. 
129 00:07:14,000 --> 00:07:16,990 Among CS50's APIs are these campus-centric things like 130 00:07:16,990 --> 00:07:21,480 Harvard courses, energy, events, food, maps, news, tweets, and Shuttleboy's own, 131 00:07:21,480 --> 00:07:23,940 and these are APIs that look a little something like this. 132 00:07:23,940 --> 00:07:26,990 >> Let me pull up the HarvardFood API. 133 00:07:26,990 --> 00:07:30,620 If you've ever been to HUDS's website, you've probably been there 134 00:07:30,620 --> 00:07:35,410 to just see what's for dinner or to see what the hours are for some d-hall. 135 00:07:35,410 --> 00:07:38,000 Well, it's not particularly easy to navigate, 136 00:07:38,000 --> 00:07:41,100 and so what we did some time ago was we wrote software-- 137 00:07:41,100 --> 00:07:47,270 it happens to be in PHP--that actually screen scrapes the entirety of HUDS's website. 138 00:07:47,270 --> 00:07:51,400 To screen scrape something means to write a program in a language like PHP 139 00:07:51,400 --> 00:07:55,270 that pretends to be a browser, even though you might run it at a command prompt, 140 00:07:55,270 --> 00:07:58,180 that pretends to be a browser, connects to a website, 141 00:07:58,180 --> 00:08:01,480 downloads its HTML, the language in which it's written, 142 00:08:01,480 --> 00:08:04,300 and then reads it, or more specifically, parses it 143 00:08:04,300 --> 00:08:06,140 top to bottom, left to right. 144 00:08:06,140 --> 00:08:08,870 And what we did was we wrote our code in such a way that 145 00:08:08,870 --> 00:08:12,910 any time we saw something in that HTML that looked like something on the menu, 146 00:08:12,910 --> 00:08:16,470 like hamburger, we would then import that into our own database. 147 00:08:16,470 --> 00:08:20,410 And any time we saw nutritional content, we would import that into our own database. 
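To make the screen-scraping idea a bit more concrete, here is a minimal sketch, written in Python rather than the PHP mentioned above. The HTML snippet, the `item` class name, and the `MenuScraper` helper are all made up for illustration; the real dining-services markup and the real scraper are not shown in this lecture. The sketch parses the markup top to bottom and keeps only the things that look like menu items.

```python
from html.parser import HTMLParser

# Hypothetical markup: an assumption standing in for a real menu page.
SAMPLE_HTML = """
<html><body>
  <b>Dinner</b>
  <ul>
    <li class="item">Hamburger</li>
    <li class="item">Veggie Burger</li>
    <li class="note">Hours: 5-7pm</li>
  </ul>
</body></html>
"""


class MenuScraper(HTMLParser):
    """Collects the text of every <li class="item"> element it sees."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "li" and ("class", "item") in attrs:
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Keep text only while inside a menu item; skip pure whitespace.
        if self.in_item and data.strip():
            self.items.append(data.strip())


def scrape_menu(html):
    """Return the menu item names found in the given HTML string."""
    scraper = MenuScraper()
    scraper.feed(html)
    return scraper.items


print(scrape_menu(SAMPLE_HTML))
```

A real scraper would first download the page and then store the results in a database; both steps are left out here so the pattern-matching part stays visible.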
148 00:08:20,410 --> 00:08:23,090 And what we did was leverage the fact that HUDS's website, 149 00:08:23,090 --> 00:08:27,280 even though it might be a bit of a challenge for us humans to navigate, 150 00:08:27,280 --> 00:08:32,559 underneath the hood, all of the HTML is generated by their own computer programs. 151 00:08:32,559 --> 00:08:35,159 So all of their HTML, even though it might look messy, 152 00:08:35,159 --> 00:08:38,026 like most websites underneath the hood, it follows a pattern. 153 00:08:38,260 --> 00:08:40,799 So we just spent a couple hours figuring out that pattern 154 00:08:40,799 --> 00:08:44,240 so that in the end, we throw away all of the messy HTML, 155 00:08:44,240 --> 00:08:47,340 all of the aesthetics of bold facing and italics and the like, 156 00:08:47,340 --> 00:08:52,350 and what we are then able to do is expose that same data. 157 00:08:52,350 --> 00:08:54,870 For instance, in this way. 158 00:08:54,870 --> 00:08:56,840 So we, according to the documentation here, 159 00:08:56,840 --> 00:08:59,190 have informed the world that if you request a URL 160 00:08:59,190 --> 00:09:03,310 that looks like this, food.cs50.net/something, 161 00:09:03,310 --> 00:09:07,220 and you provide certain parameters, which we'll talk about today, 162 00:09:07,220 --> 00:09:11,780 like end-date time, start-date time, meal, and so forth, 163 00:09:11,780 --> 00:09:14,090 what our servers will return to you, for instance, 164 00:09:14,090 --> 00:09:18,740 is a CSV file, comma separated values like an Excel file, 165 00:09:18,740 --> 00:09:23,140 containing everything for breakfast on this particular date in March of last year 166 00:09:23,140 --> 00:09:25,450 when I happened to write up this documentation. 167 00:09:25,450 --> 00:09:27,870 >> For those familiar, CSV is not the only file format. 168 00:09:27,870 --> 00:09:30,610 There's another format that's all the more versatile 169 00:09:30,610 --> 00:09:32,670 called JSON, JavaScript Object Notation. 
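As a rough sketch of what using such an API might look like, the Python below builds a request URL from a few parameters and then reads a CSV response, converting it to JSON along the way. Everything specific here is an assumption for illustration: the base URL, the parameter names (`meal`, `sdt`, `edt`, `output`), the column names, and the sample rows are invented, and the real API's documentation should be consulted for the actual ones.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the lecture only shows "food.cs50.net/something".
BASE_URL = "http://food.example.com/menus"


def build_url(meal, start, end, output="csv"):
    """Assemble a query string from (assumed) parameter names."""
    params = {"meal": meal, "sdt": start, "edt": end, "output": output}
    return BASE_URL + "?" + urlencode(params)


# Made-up response body, standing in for what a server might send back.
SAMPLE_CSV = (
    "date,meal,name\n"
    "2012-03-01,BREAKFAST,Oatmeal\n"
    "2012-03-01,BREAKFAST,Scrambled Eggs\n"
)


def parse_csv(text):
    """Turn CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_csv(SAMPLE_CSV)
print(build_url("BREAKFAST", "2012-03-01", "2012-03-01"))
print(json.dumps(rows, indent=2))  # the same data, expressed as JSON
```

The point of the sketch is only that both formats carry the same rows; which one to request is mostly a matter of what the consuming program finds easier to read.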
170 00:09:32,670 --> 00:09:34,770 The data can come back in that format. 171 00:09:34,770 --> 00:09:38,110 So the takeaway here is that whether you dive into this API 172 00:09:38,110 --> 00:09:41,170 or any other of CS50's or anything out there on the Internet, 173 00:09:41,170 --> 00:09:45,560 or not at all, realize that the world has increasingly started to standardize 174 00:09:45,560 --> 00:09:47,670 how machines intercommunicate. 175 00:09:47,670 --> 00:09:50,660 We use standard data formats like CSV or JSON. 176 00:09:50,660 --> 00:09:54,320 And what this means for you is you can write the interesting part of a program 177 00:09:54,320 --> 00:09:56,580 that lets your user search a dining hall menu, 178 00:09:56,580 --> 00:10:00,010 that lets them create lists of favorites that lets them get text alerts 179 00:10:00,010 --> 00:10:02,480 when their favorite meal is about to be served in some d-hall 180 00:10:02,480 --> 00:10:07,090 by using someone else's data sets and building on top of their APIs. 181 00:10:07,090 --> 00:10:13,600 So more on that in the form of seminars and the documentation that you have here online. 182 00:10:13,600 --> 00:10:16,450 So those, then, are APIs. 183 00:10:16,450 --> 00:10:18,900 >> That brings us back to HTML. Quick recap. 184 00:10:18,900 --> 00:10:22,920 What is HTML? 185 00:10:22,920 --> 00:10:25,000 [Student, unintelligible] >>Good. HyperText Markup Language. 186 00:10:25,000 --> 00:10:31,300 Someone else, what is Hypertext Markup Language? 187 00:10:31,300 --> 00:10:37,340 HyperText Markup Language. 188 00:10:37,340 --> 00:10:40,330 Okay. So HTML, HyperText. 189 00:10:40,330 --> 00:10:43,100 HyperText just refers to the Web, for the most part. 190 00:10:43,100 --> 00:10:45,730 Markup means that it's not actually a programming language, HTML. 191 00:10:45,730 --> 00:10:48,120 It's not a language that you can express logic in. 192 00:10:48,120 --> 00:10:50,710 It doesn't have loops. It doesn't have conditions. 
193 00:10:50,710 --> 00:10:52,820 It doesn't have functions, per se. 194 00:10:52,820 --> 00:10:56,680 Rather, it has these things called tags, or, more properly, elements. 195 00:10:56,680 --> 00:10:59,970 And those elements have start tags and end tags, 196 00:10:59,970 --> 00:11:04,300 or open tags and closed tags, and what those tags generally mean for a browser is, 197 00:11:04,300 --> 00:11:09,270 start doing something and then stop doing something, though there are exceptions to that. 198 00:11:09,270 --> 00:11:12,480 Sometimes it's just "put a line break here," for instance. 199 00:11:12,480 --> 00:11:15,150 And we saw examples of that the other day, between bold facing, 200 00:11:15,150 --> 00:11:17,430 line breaks, and then a couple of other tags. 201 00:11:17,430 --> 00:11:19,880 So HTML is the language in which web pages are written. 202 00:11:19,880 --> 00:11:23,760 So if I go to something like Google.com 203 00:11:23,760 --> 00:11:26,180 and pull up just their home page, 204 00:11:26,180 --> 00:11:29,690 recall that if you right click or control click 205 00:11:29,690 --> 00:11:32,140 and look at view page source, typically 206 00:11:32,140 --> 00:11:34,420 it's a complete mess these days underneath the hood, but that's because 207 00:11:34,420 --> 00:11:38,170 computers don't care about white space, so this doesn't have to look pretty. 208 00:11:38,170 --> 00:11:40,240 But if we zoom in on parts of it, 209 00:11:40,240 --> 00:11:43,460 notice that Chrome, just to be nice, has color coded things. 210 00:11:43,460 --> 00:11:48,460 Indeed, this is the very first tag that we saw in a web page. 211 00:11:48,460 --> 00:11:51,750 And again, HTML 5, the latest version of this language, 212 00:11:51,750 --> 00:11:53,830 does have this thing at the beginning, 213 00:11:53,830 --> 00:11:57,820 <!DOCTYPE html>, 214 00:11:57,820 --> 00:12:03,580 but that's just sort of a standard that says hey world, here comes an HTML file in version 5. 
215 00:12:03,580 --> 00:12:08,920 >> The interesting part begins here. So [...] 216 00:12:08,920 --> 00:12:11,640 [...] of the HTML elements last time. 217 00:12:11,640 --> 00:12:14,630 What were those two main children? 218 00:12:14,630 --> 00:12:17,170 Head and body, just like the guy with the tattoo a moment ago. 219 00:12:17,170 --> 00:12:19,640 There's two portions of a web page, head and body, 220 00:12:19,640 --> 00:12:23,750 and recall, then, that perhaps the simplest web page we could make looks like this. 221 00:12:23,750 --> 00:12:27,460 And I've indented it just to be kind of neat and tidy with my code, 222 00:12:27,460 --> 00:12:30,710 but what's really important here is that there is some hierarchy to this. 223 00:12:30,710 --> 00:12:35,420 And any tag that I've opened I have closed and that there's therefore this symmetry 224 00:12:35,420 --> 00:12:38,300 to all of the markup that I've created. 225 00:12:38,300 --> 00:12:41,620 So last time we started writing web pages on my own laptop. 226 00:12:41,620 --> 00:12:45,470 I opened up TextEdit, I saved the file as hello.html, 227 00:12:45,470 --> 00:12:50,190 I then dragged the file onto my browser, and voila, I had a page on the Internet. 228 00:12:50,190 --> 00:12:53,110 Now, it's not quite the case; I had a page on my hard drive, 229 00:12:53,110 --> 00:12:58,260 and I was literally the only person in the world who would see that web page in a browser. 230 00:12:58,260 --> 00:13:00,670 >> So today, we introduce an actual web server 231 00:13:00,670 --> 00:13:02,750 and the notion of actually serving content on the Internet 232 00:13:02,750 --> 00:13:04,970 and how this all starts to fit together. 233 00:13:04,970 --> 00:13:08,350 So it turns out that all this time in the CS50 appliance 234 00:13:08,350 --> 00:13:11,590 you have had a web server on your computer. 
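For reference, the simplest page described above, with a head and a body nested inside the html element, might look like the string in this Python sketch. The sketch also checks the symmetry being described, namely that every tag that is opened is later closed in the right order. The `BalanceChecker` and `is_balanced` names are made up for illustration, and the check is deliberately simplistic: it would reject a bare `<br>` that a browser happily accepts.

```python
from html.parser import HTMLParser

# A minimal page in the spirit of the one shown in lecture.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>hello, world</title>
  </head>
  <body>
    hello, world
  </body>
</html>
"""


class BalanceChecker(HTMLParser):
    """Tracks open tags on a stack and notes any mismatched close tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A close tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(html):
    """Return True if every opened tag is closed, innermost first."""
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.balanced and not checker.stack


print(is_balanced(PAGE))
```

The stack mirrors the hierarchy: title sits inside head, head inside html, and each one has to be closed before its parent is.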
235 00:13:11,590 --> 00:13:16,560 We have, in fairness, only used it for gedit, for Clang, for GDB and the like, 236 00:13:16,560 --> 00:13:21,000 but also installed by us for you in the appliance is a web server, 237 00:13:21,000 --> 00:13:23,940 and that web server happens to be free and open source, 238 00:13:23,940 --> 00:13:26,580 one of the most popular ones in the world, called Apache. 239 00:13:26,580 --> 00:13:31,340 Its more technical name is HTTPd, the d being for daemon here, 240 00:13:31,340 --> 00:13:34,110 which is a technical word for a server. 241 00:13:34,110 --> 00:13:38,690 So installed in the CS50 appliance is a web server, and what does that mean? 242 00:13:38,690 --> 00:13:43,740 Well, a web server is, conceptually, some server on the Internet that serves up web content. 243 00:13:43,740 --> 00:13:48,630 When asked for a file, it spits out the HTML that composes that file, and voila. 244 00:13:48,630 --> 00:13:51,370 You see some website's home page. 245 00:13:51,370 --> 00:13:54,970 But a server is, more precisely, a piece of software. 246 00:13:54,970 --> 00:13:59,190 It doesn't have to be on a physical machine, it just has to be a piece of software running. 247 00:13:59,190 --> 00:14:01,980 So the CS50 Appliance, of course, is a piece of software, 248 00:14:01,980 --> 00:14:04,270 even though it's sort of pretending to be a machine. 249 00:14:04,270 --> 00:14:06,960 It's pretending to be a computer inside of a computer, 250 00:14:06,960 --> 00:14:11,140 but that just means that the appliance can certainly run things like web servers. 251 00:14:11,140 --> 00:14:13,260 It can actually run email servers. 252 00:14:13,260 --> 00:14:16,440 We could run an instant messaging server in the appliance if we wanted to, 253 00:14:16,440 --> 00:14:20,780 and indeed, we do run one other type of server, known as a database server, MySQL. 254 00:14:20,780 --> 00:14:22,620 But more on that next week. 
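To underline that a web server is just a piece of software, here is a small sketch using Python's built-in `http.server` module instead of Apache. It is not how the appliance is configured; it only shows a program that, when asked for a page, spits out some HTML. The `HelloHandler` class and `fetch_once` helper are illustrative names, and the server listens only on the local machine, on whatever free port the operating system hands out.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = (
    b"<!DOCTYPE html><html><head><title>hello</title></head>"
    b"<body>hello, world</body></html>"
)


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # stay quiet instead of logging each request


def fetch_once():
    """Start the server, request "/" from it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        body = connection.getresponse().read()
        connection.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


print(fetch_once().decode())
```

Here the client and the server happen to live in the same program, but the exchange is the same one a browser has with Apache.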
255 00:14:22,620 --> 00:14:26,400 This means that I can actually visit web pages 256 00:14:26,400 --> 00:14:30,480 inside of my appliance by using a browser inside the appliance 257 00:14:30,480 --> 00:14:33,600 or even on my own laptop, my Mac or my PC. 258 00:14:33,600 --> 00:14:37,780 So what does this mean? It turns out that any time you're running a Linux computer, 259 00:14:37,780 --> 00:14:40,910 its nickname is "localhost." 260 00:14:40,910 --> 00:14:43,370 It doesn't have a domain name because we haven't bought a domain name 261 00:14:43,370 --> 00:14:46,590 for something like the appliance, so its default name is localhost. 262 00:14:46,590 --> 00:14:50,470 >> But in order to get the appliance to start serving up web pages, 263 00:14:50,470 --> 00:14:52,270 we have to create them first. 264 00:14:52,270 --> 00:14:55,200 So let's do that. Let me go into a terminal window here 265 00:14:55,200 --> 00:14:58,190 and notice that I'm at my typical John Harvard prompt. 266 00:14:58,190 --> 00:15:01,670 Let me go ahead and type ls, and we'll see some familiar things from this semester, 267 00:15:01,670 --> 00:15:04,580 desktop, downloads, dropbox and so forth, 268 00:15:04,580 --> 00:15:07,540 but now we start turning our attention to a couple. 269 00:15:07,540 --> 00:15:11,530 On many Linux web servers there's this folder called public_html, 270 00:15:11,530 --> 00:15:15,630 but we're going to skip that one for now and focus on this, vhosts. 271 00:15:15,630 --> 00:15:18,850 Anyone know what a vhost is? 272 00:15:18,850 --> 00:15:21,110 Just stupid jargon for virtual host, 273 00:15:21,110 --> 00:15:23,850 and what this means is that on a typical server 274 00:15:23,850 --> 00:15:26,810 you can actually host multiple websites. 275 00:15:26,810 --> 00:15:31,500 You can buy a domain name like foo.com, and you can host it on a server. 276 00:15:31,500 --> 00:15:36,100 But you can also buy bar.com and host it on the same server. 
277 00:15:36,100 --> 00:15:40,250 The reason being, browsers are smart enough to inform the server 278 00:15:40,250 --> 00:15:45,880 when a user is requesting some webpage, what domain name the user wants the homepage for. 279 00:15:45,880 --> 00:15:48,760 So what's nice about this is you don't need one physical server 280 00:15:48,760 --> 00:15:52,040 or one CS50 appliance for every website you might want to create. 281 00:15:52,040 --> 00:15:55,520 You can use the same server and develop a hundred different websites. 282 00:15:55,520 --> 00:15:58,770 And indeed, if you are a person trying to start a website, 283 00:15:58,770 --> 00:16:02,100 whether for fun or for business, typically you'll go out on the Internet 284 00:16:02,100 --> 00:16:04,650 and you'll pay someone ten bucks a month, a hundred dollars a month, 285 00:16:04,650 --> 00:16:06,670 to host your website for you. 286 00:16:06,670 --> 00:16:11,060 And the way that works is they are charging other people 287 00:16:11,060 --> 00:16:13,160 ten bucks a month or a hundred bucks a month 288 00:16:13,160 --> 00:16:17,200 to host other people's websites on their same server. 289 00:16:17,200 --> 00:16:20,740 The reason they can do that is because of this feature called vhosts, 290 00:16:20,740 --> 00:16:23,790 but more on that when it comes time for final projects. 291 00:16:23,790 --> 00:16:28,360 >> For now, let's just dive in there. So cd vhosts, and if I type ls now, 292 00:16:28,360 --> 00:16:31,370 notice that there's a folder in there called localhost. 293 00:16:31,370 --> 00:16:33,440 That's because, by default, the appliance figures 294 00:16:33,440 --> 00:16:36,160 you're only ever going to run one website on an appliance. 295 00:16:36,160 --> 00:16:38,970 This isn't really the real world; it's not a real-world web server. 296 00:16:38,970 --> 00:16:41,690 So let me go into localhost, and now we'll see in there 297 00:16:41,690 --> 00:16:44,290 one last directory called HTML. 
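The way a browser "informs the server" which site it wants is the Host header in its HTTP request. As a rough sketch of the idea behind virtual hosts, the Python below reads that header from a raw request and picks a folder to serve from. The folder paths, the `jharvard` home directory, and the `VHOSTS` table are assumptions for illustration; Apache's actual configuration looks different.

```python
# Hypothetical mapping from domain name to the folder holding that site.
VHOSTS = {
    "foo.com": "/home/jharvard/vhosts/foo.com/html",
    "bar.com": "/home/jharvard/vhosts/bar.com/html",
}

# Fallback when the requested name is unknown (assumed layout).
DEFAULT_ROOT = "/home/jharvard/vhosts/localhost/html"


def parse_host(raw_request):
    """Extract the host name from the Host header of a raw HTTP request."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop any ":port" suffix and normalize case.
            return value.strip().split(":")[0].lower()
    return None


def document_root(raw_request):
    """Choose which site's folder should answer this request."""
    return VHOSTS.get(parse_host(raw_request), DEFAULT_ROOT)


request = "GET / HTTP/1.1\r\nHost: bar.com\r\n\r\n"
print(document_root(request))
```

Two requests that arrive at the same machine can thus be answered from different folders, which is what lets one server host many sites.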
298 00:16:44,290 --> 00:16:47,080 So it's a little deep, the hierarchy, but if and when 299 00:16:47,080 --> 00:16:51,230 you decide to start developing multiple websites over the next n months or years, 300 00:16:51,230 --> 00:16:54,370 this kind of folder structure tends to be helpful. 301 00:16:54,370 --> 00:16:56,560 Now let's go into HTML as I just did, 302 00:16:56,560 --> 00:16:59,010 type ls, and nothing is there. 303 00:16:59,010 --> 00:17:01,390 So now let's go ahead and do this. Let me open up Chrome 304 00:17:01,390 --> 00:17:07,300 inside of the appliance, and let me go to http://localhost. 305 00:17:07,300 --> 00:17:14,440 So literally the name for my appliance, enter, and I get index of /. 306 00:17:14,440 --> 00:17:18,290 This isn't really showing me anything of interest, 307 00:17:18,290 --> 00:17:23,400 but it turns out that what we're seeing is that folder, HTML. 308 00:17:23,400 --> 00:17:25,770 There's nothing inside that folder right now, 309 00:17:25,770 --> 00:17:28,750 so instead, what I'm going to have to do is first create a file. 310 00:17:28,750 --> 00:17:33,530 Create an HTML file like we did on Monday, but this time put it inside of the appliance. 311 00:17:33,530 --> 00:17:36,830 For those of you who are trying to follow along with laptops now, 312 00:17:36,830 --> 00:17:42,040 let me do one aside that'll be covered in the web-based pset, 313 00:17:42,040 --> 00:17:44,280 but in order to get this to work for the very first time, 314 00:17:44,280 --> 00:17:49,830 you're going to have to run this command: sudo service httpd start. 315 00:17:49,830 --> 00:17:52,670 And this, again, will be repeated in the last pset, 316 00:17:52,670 --> 00:17:55,460 but if you're playing along at home now, the web server 317 00:17:55,460 --> 00:17:58,660 is turned off in the appliance, and that's so that it doesn't sap up RAM 318 00:17:58,660 --> 00:18:01,960 and memory for 7 weeks out of the semester when we don't need it. 
319 00:18:01,960 --> 00:18:05,190 So you need to run this command once, and you'll get an output like that. 320 00:18:05,190 --> 00:18:07,920 Then you should be able to play along here. 321 00:18:07,920 --> 00:18:10,330 Now let's go back into this folder. 322 00:18:10,330 --> 00:18:12,770 This folder is empty, so let me start creating a file, 323 00:18:12,770 --> 00:18:16,360 gedit hello.html. 324 00:18:16,360 --> 00:18:20,930 >> All right. Gedit is open, as usual. Let me do doctype, html, 325 00:18:20,930 --> 00:18:25,270 html, let me get ahead of myself and start closing my tags in advance. 326 00:18:25,270 --> 00:18:28,380 Now I have the head. Let me go ahead and close the head, 327 00:18:28,380 --> 00:18:32,450 let me now do the title of the page, hello world like last time, 328 00:18:32,450 --> 00:18:34,790 close title, now let me do a body. 329 00:18:34,790 --> 00:18:38,130 In here I'll say hello, world with some exclams 330 00:18:38,130 --> 00:18:40,550 to make clear that it's a different string. 331 00:18:40,550 --> 00:18:45,800 Close body, and now let me go ahead and File, Save. 332 00:18:45,800 --> 00:18:48,470 Let me go back to my terminal window, and if I type ls, 333 00:18:48,470 --> 00:18:51,830 I should, presumably, see hello.html. And I do. 334 00:18:51,830 --> 00:18:55,070 So now let's go back to my browser, click reload, 335 00:18:55,070 --> 00:18:58,930 and you can see we are indeed inside of this HTML folder. 336 00:18:58,930 --> 00:19:02,310 I'm not seeing a web page yet; this is Apache, the web server, 337 00:19:02,310 --> 00:19:04,670 just showing me the list contents of this directory. 338 00:19:04,670 --> 00:19:08,260 Just like Mac OS or Windows would typically do on your own local hard drive. 339 00:19:08,260 --> 00:19:12,730 So if I want to see this web page, I can click this little link here, hello.html, 340 00:19:12,730 --> 00:19:15,160 and indeed, that's what I was expecting to see. 
341 00:19:15,160 --> 00:19:18,080 Now, again, this is not a URL that any of you can visit right now, 342 00:19:18,080 --> 00:19:20,760 because for you, localhost, if you have a laptop here, 343 00:19:20,760 --> 00:19:23,050 it is referring to your own instance of the appliance. 344 00:19:23,050 --> 00:19:25,900 This is on my own personal appliance, 345 00:19:25,900 --> 00:19:29,080 but this is kind of dumb for me to have, to have 346 00:19:29,080 --> 00:19:34,480 a user like myself click on hello.html to actually see the contents of this page. 347 00:19:34,480 --> 00:19:42,590 It turns out that web servers like Apache let you have a default file for any web server. 348 00:19:42,590 --> 00:19:44,640 Notice here we have hello.html. 349 00:19:44,640 --> 00:19:48,410 What's the command in Linux to rename a file? 350 00:19:48,410 --> 00:19:50,870 >> mv, for move. So let me do that, 351 00:19:50,870 --> 00:19:55,870 and let me rename hello.html to index.html. 352 00:19:55,870 --> 00:19:58,610 Let me type ls to confirm it's now been renamed. 353 00:19:58,610 --> 00:20:03,250 Now this is going to--if I go back to localhost, 354 00:20:03,250 --> 00:20:06,710 notice now that I'm automatically seeing that web page. 355 00:20:06,710 --> 00:20:11,740 This is identical to my actually doing /index.html, 356 00:20:11,740 --> 00:20:14,740 but the nice thing now is that the web server's figuring 357 00:20:14,740 --> 00:20:18,830 oh, if you have a file that, by human conventions, is called index.html, 358 00:20:18,830 --> 00:20:21,200 let me show the user that file by default 359 00:20:21,200 --> 00:20:25,290 rather than some stupid directory listing which is not at all user-friendly. 360 00:20:25,290 --> 00:20:28,900 Indeed, most websites you visit on the Internet don't have a list of files to click on, 361 00:20:28,900 --> 00:20:34,040 they just show you the content. So that's how we can do that, index.html. 
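The default-file behavior just described can be sketched in a few lines of Python. This is only an illustration of the decision a server makes when asked for a folder, not Apache's actual code; the `resolve` and `demo` functions are made-up names, and the demo works in a temporary folder rather than in the appliance's html directory.

```python
import os
import tempfile


def resolve(directory, index_name="index.html"):
    """Decide what to show for a folder: its index file or a listing."""
    candidate = os.path.join(directory, index_name)
    if os.path.isfile(candidate):
        return ("file", candidate)
    return ("listing", sorted(os.listdir(directory)))


def demo():
    """Show the behavior before and after renaming hello.html."""
    with tempfile.TemporaryDirectory() as root:
        hello = os.path.join(root, "hello.html")
        with open(hello, "w") as handle:
            handle.write("hello, world")
        before = resolve(root)  # no index.html yet, so we get a listing
        # The equivalent of: mv hello.html index.html
        os.rename(hello, os.path.join(root, "index.html"))
        kind, path = resolve(root)  # now the index file is served
        return before, (kind, os.path.basename(path))


print(demo())
```

Before the rename, the only sensible answer is a list of files; after it, the server has a page to show by default.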
362 00:20:34,040 --> 00:20:37,000 So this is all fun and good, but this is a pretty simple web page. 363 00:20:37,000 --> 00:20:41,640 Let me go ahead and open up index.html in my vhosts, 364 00:20:41,640 --> 00:20:47,620 localhost, html directory, and let's add something of greater interest. 365 00:20:47,620 --> 00:20:56,120 So there's hello world; let's instead say "This is CS50, Harvard College's . . ." 366 00:20:56,120 --> 00:21:00,000 So the beginning of the course catalog description of some sort there. 367 00:21:00,000 --> 00:21:03,780 Now if I reload, I should see this in my home page. 368 00:21:03,780 --> 00:21:09,560 Okay, and I do see that, but suppose that I want to now list some more content in this file. 369 00:21:09,560 --> 00:21:15,160 I could go down here and say prerequisites none, 370 00:21:15,160 --> 00:21:18,740 although some of you are probably like, "Ha ha ha, no prerequisites." 371 00:21:18,740 --> 00:21:24,320 But--officially. So reload, and now we have the same quirk that we saw last time. 372 00:21:24,320 --> 00:21:26,240 But why is that? It was a simple fix. 373 00:21:26,240 --> 00:21:31,440 Why is this page broken? 374 00:21:31,440 --> 00:21:34,170 [Student, unintelligible] >>Yeah, we've solved this before 375 00:21:34,170 --> 00:21:37,440 by explicitly telling the browser "put a line break here." 376 00:21:37,440 --> 00:21:39,440 And that's because, again, a browser's only going to do 377 00:21:39,440 --> 00:21:42,610 explicitly what the markup language tells it to do, 378 00:21:42,610 --> 00:21:45,730 so even though you might have hit Enter once or twice or even ten times, 379 00:21:45,730 --> 00:21:49,870 it's going to combine that all into a single space, just by convention.
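To make the whitespace rule concrete, here is a small Python model of it. This is only an approximation: real browsers have more rules (preformatted text, for example), and the function name `collapse_whitespace` and the sample string are invented for this sketch.

```python
import re


def collapse_whitespace(text):
    """Replace every run of spaces, tabs, and newlines with a single space."""
    return re.sub(r"\s+", " ", text).strip()


# Three presses of Enter in the source file still come out as one space.
source = "This is CS50.\n\n\nPrerequisites: none."
print(collapse_whitespace(source))  # This is CS50. Prerequisites: none.
```

Under this model, no amount of blank lines in the file changes the output, which is why an explicit tag is needed to get a visible break.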
380 00:21:49,870 --> 00:21:52,770 So if you really want a line break, you have to use the br tag, 381 00:21:52,770 --> 00:21:56,840 and now notice, like Monday, I put the / inside of this tag, 382 00:21:56,840 --> 00:22:00,090 only because this just doesn't feel right 383 00:22:00,090 --> 00:22:02,990 to start a line break then stop it with nothing in between. 384 00:22:02,990 --> 00:22:07,740 >> So the convention in HTML is to open and close a tag simultaneously. 385 00:22:07,740 --> 00:22:11,050 As an aside, you'll see a lot of websites and books not doing that. 386 00:22:11,050 --> 00:22:14,240 It is correct to do it or not to do it, but we would argue 387 00:22:14,240 --> 00:22:17,430 that design-wise and stylistically, this is just better 388 00:22:17,430 --> 00:22:20,540 because then every tag is both opened and closed somehow. 389 00:22:20,540 --> 00:22:23,370 So now let's save and reload. Go back to the browser, okay. 390 00:22:23,370 --> 00:22:26,680 Now we're making some progress, but it's not quite enough. 391 00:22:26,680 --> 00:22:33,210 Let's go ahead and start typing in some longer body of text. 392 00:22:33,210 --> 00:22:40,610 So let's say, "A quick brown fox jumps over a lazy dog." 393 00:22:40,610 --> 00:22:42,700 And now let me just copy and paste this a few times 394 00:22:42,700 --> 00:22:45,040 so that we have a paragraph of text. 395 00:22:45,040 --> 00:22:47,780 Let me go back over here. So it's not looking very good. 396 00:22:47,780 --> 00:22:50,000 I do have a line break, so it's okay, 397 00:22:50,000 --> 00:22:52,140 but now, once we're getting to the point of having a web page 398 00:22:52,140 --> 00:22:55,640 that has lots of content and not just single lines to demonstrate HTML, 399 00:22:55,640 --> 00:22:58,570 we can start to think of these things as actual paragraphs. 400 00:22:58,570 --> 00:23:01,590 And we can start to structure our web page a little more cleanly.
401 00:23:01,590 --> 00:23:05,120 And indeed, what I can do is go up here inside of my body tag, 402 00:23:05,120 --> 00:23:09,400 and you know what, if "This is CS50. . ." really demarks the beginning of a paragraph, 403 00:23:09,400 --> 00:23:11,310 well, let's tag it as such. 404 00:23:11,310 --> 00:23:13,570 Let me indent the text; just by convention, let me say 405 00:23:13,570 --> 00:23:15,710 that this paragraph ends here, 406 00:23:15,710 --> 00:23:18,320 and then rather than do this line break, let me just say 407 00:23:18,320 --> 00:23:23,300 that this belongs there and as a new paragraph, 408 00:23:23,300 --> 00:23:27,610 and I'll just quickly indent by just clobbering all of this stuff. 409 00:23:27,610 --> 00:23:30,660 >> So now we have an indented paragraph there, 410 00:23:30,660 --> 00:23:33,510 and now our markup is starting to get a little more 411 00:23:33,510 --> 00:23:37,070 semantically consistent with what we're trying to do. 412 00:23:37,070 --> 00:23:40,130 We have a paragraph, so let's call it a paragraph with the p tag. 413 00:23:40,130 --> 00:23:43,370 We have a second paragraph, so let's call it a paragraph with the p tag. 414 00:23:43,370 --> 00:23:45,850 And now, what the browser will typically do 415 00:23:45,850 --> 00:23:48,490 is just like in an English book or essay, 416 00:23:48,490 --> 00:23:51,280 where you typically see some line breaks between paragraphs. 417 00:23:51,280 --> 00:23:53,720 Browsers will do that for you automatically. 418 00:23:53,720 --> 00:23:56,680 So now we have two paragraphs and we can continue this. 419 00:23:56,680 --> 00:23:58,770 But, of course, on the Web, when you have bodies of text, 420 00:23:58,770 --> 00:24:01,370 it's not typically just huge blobs of text. 421 00:24:01,370 --> 00:24:04,040 There are often hyperlinks in there. 
422 00:24:04,040 --> 00:24:07,250 So if we want to, for instance, include some links there, 423 00:24:07,250 --> 00:24:10,760 suppose what might be of interest in whatever web page I'm creating here is-- 424 00:24:10,760 --> 00:24:12,780 let me go to Google.com, 425 00:24:12,780 --> 00:24:16,540 and let me search for a quick brown fox. 426 00:24:16,540 --> 00:24:22,150 Go to Google images, and, how about--this is cute. 427 00:24:22,150 --> 00:24:27,420 We'll go with this. So here we have a quick brown fox jumping over a lazy dog. 428 00:24:27,420 --> 00:24:30,560 So what I'm going to do here, just for the sake of demonstration, 429 00:24:30,560 --> 00:24:32,950 is suppose that this image was on my server, 430 00:24:32,950 --> 00:24:35,240 and I had been creating these images. 431 00:24:35,240 --> 00:24:38,720 What I just did was right click or control click on the image, 432 00:24:38,720 --> 00:24:42,370 and what you'll see in most browsers is a little menu-- 433 00:24:42,370 --> 00:24:48,800 stop doing that--a little menu that allows you to choose copy link location or copy URL. 434 00:24:48,800 --> 00:24:52,750 So let me go back now to my HTML, and suppose that I want 435 00:24:52,750 --> 00:24:56,420 to hyperlink this to another web page. 436 00:24:56,420 --> 00:24:58,640 >> What was the tag called for that? 437 00:24:58,640 --> 00:25:01,650 [Student, unintelligible] >>Yeah. So a href for hyper reference. 438 00:25:01,650 --> 00:25:04,660 Let me go ahead and paste that in. 439 00:25:04,660 --> 00:25:07,290 It's a pretty long URL, so let me zoom back out. 440 00:25:07,290 --> 00:25:09,950 Close brackets, so now notice I'm way over here 441 00:25:09,950 --> 00:25:11,960 because that URL happened to be pretty long. 442 00:25:11,960 --> 00:25:15,180 Let me scroll over here to the end of quick brown fox, 443 00:25:15,180 --> 00:25:18,830 and then let me close this tag with </a>, 444 00:25:18,830 --> 00:25:21,280 where I only closed the name of the tag.
445 00:25:21,280 --> 00:25:24,470 Now let me go ahead and save that file, reload the web page, 446 00:25:24,470 --> 00:25:27,880 and now, by default, that's going to be underlined in blue for me, 447 00:25:27,880 --> 00:25:31,980 but indeed, I can now click on this and voila. There's that image. 448 00:25:31,980 --> 00:25:33,990 And it didn't have to be an image; it could have linked 449 00:25:33,990 --> 00:25:36,270 to some other random website on the Internet. 450 00:25:36,270 --> 00:25:39,610 I could do this, for instance, with CS50, so one last example here. 451 00:25:39,610 --> 00:25:42,730 "This is CS50" might make sense to go a href = 452 00:25:42,730 --> 00:25:50,340 http://www.cs50.net, close quote, close anchor. 453 00:25:50,340 --> 00:25:53,990 So now that's an even shorter URL, and this time we're not going to link to an image. 454 00:25:53,990 --> 00:25:57,880 We're instead going to link to another page. 455 00:25:57,880 --> 00:25:59,840 Now, we have an image here. 456 00:25:59,840 --> 00:26:02,970 I feel like we can do a little better than just linking to an image. 457 00:26:02,970 --> 00:26:05,760 What if we want to actually embody it in our own web page? 458 00:26:05,760 --> 00:26:09,290 >> Well, what I can do here is, rather than link to this graphic, 459 00:26:09,290 --> 00:26:14,690 let me instead cut the URL, and we'll get rid of that hyperlink and clean this up. 460 00:26:14,690 --> 00:26:17,190 And we'll go down here and get rid of this. 461 00:26:17,190 --> 00:26:20,910 We don't really need all these sentences now, so let me shorten the page a little bit. 462 00:26:20,910 --> 00:26:24,530 And then down here, let me go ahead in a new paragraph, 463 00:26:24,530 --> 00:26:30,100 say I don't want text now; I want an image whose source is going to be that URL. 464 00:26:30,100 --> 00:26:33,100 An image, like a line break, is either there or it's not. 465 00:26:33,100 --> 00:26:35,900 So let me immediately close that tag. 
466 00:26:35,900 --> 00:26:39,440 Let me go ahead now and close the paragraph that I'm inside, 467 00:26:39,440 --> 00:26:43,010 and if all goes well with hello, world, if I reload now, 468 00:26:43,010 --> 00:26:45,520 I, indeed, see right inside my own web page an image. 469 00:26:45,520 --> 00:26:48,570 So now we have an image tag, an anchor tag and the like, 470 00:26:48,570 --> 00:26:51,320 and for good measure, let me do one other thing that's often neglected 471 00:26:51,320 --> 00:26:55,900 on websites these days: Let's provide some descriptive text for this image 472 00:26:55,900 --> 00:26:58,090 for people who are on a mobile device 473 00:26:58,090 --> 00:27:00,640 and therefore might not be able to download this image very quickly, 474 00:27:00,640 --> 00:27:03,310 for people who are blind and might not be able to see the image 475 00:27:03,310 --> 00:27:06,480 but they might have a screen reader that can tell them what this image is of. 476 00:27:06,480 --> 00:27:09,100 And to do that, there is another attribute for image tags 477 00:27:09,100 --> 00:27:11,290 called alt, for alternative text. 478 00:27:11,290 --> 00:27:14,650 And what I can do here is say, "This is a quick brown fox." 479 00:27:14,650 --> 00:27:17,650 So that even if the human can't see the image on the screen, 480 00:27:17,650 --> 00:27:20,560 he or she can at least hear, as with some piece of software, 481 00:27:20,560 --> 00:27:23,080 what actually is there on the screen. 482 00:27:23,080 --> 00:27:25,040 >> That won't change the aesthetics of the page, 483 00:27:25,040 --> 00:27:27,640 but it is certainly good practice for users. 
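As a quick illustration of the src and alt attributes, the sketch below feeds a one-line page to Python's standard `html.parser` module and prints the attributes it finds. The image URL (example.com/fox.jpg) and the class name `AttributeLister` are placeholders for this example, not what was used in lecture.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


# Hypothetical markup: a paragraph containing one empty img element.
markup = '<p><img alt="This is a quick brown fox." src="http://example.com/fox.jpg"/></p>'

lister = AttributeLister()
lister.feed(markup)
for tag, attrs in lister.seen:
    print(tag, attrs)
```

A screen reader would work from the same information: it cannot see the picture, but it can read the value of alt aloud.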
484 00:27:27,640 --> 00:27:31,760 All right, let's leave this web page in its current form, 485 00:27:31,760 --> 00:27:33,890 but let's see if we can't now introduce 486 00:27:33,890 --> 00:27:36,210 some better approaches to writing these web pages, 487 00:27:36,210 --> 00:27:39,980 some lessons that are going to serve us well as our pages get more and more complex. 488 00:27:39,980 --> 00:27:42,220 What we're not going to do over the next few weeks 489 00:27:42,220 --> 00:27:46,810 is walk you through all of the several dozen HTML tags that there are. 490 00:27:46,810 --> 00:27:49,800 Much like in Scratch back in week 0, it probably will suffice 491 00:27:49,800 --> 00:27:52,120 to give a high-level overview of some of the concepts, 492 00:27:52,120 --> 00:27:54,530 a quick tour of some of the blocks you were probably able, 493 00:27:54,530 --> 00:27:58,240 pretty comfortably, to navigate on your own, the various puzzle pieces. 494 00:27:58,240 --> 00:28:00,460 And that's going to happen again in HTML, most likely, 495 00:28:00,460 --> 00:28:04,320 whereby there's ample resources on the Web that we'll point you at, 496 00:28:04,320 --> 00:28:06,920 various textbooks, if you prefer to read a textbook, 497 00:28:06,920 --> 00:28:10,560 that will walk you through all of the various things you can do with HTML, 498 00:28:10,560 --> 00:28:16,100 but really, we have seen thus far in HTML most of the fundamental concepts. 499 00:28:16,100 --> 00:28:19,900 We have the notion of tags being opened, tags being closed. 500 00:28:19,900 --> 00:28:22,100 Some tags that are both opened and closed 501 00:28:22,100 --> 00:28:24,620 in the sense that they're empty; there should be nothing inside of them 502 00:28:24,620 --> 00:28:27,490 like an image tag or a line break, which are just there. 503 00:28:27,490 --> 00:28:32,330 We also looked already at the notion of an attribute, like alt or source. 
504 00:28:32,330 --> 00:28:36,410 Notice that these words tend, by convention, to be short and succinct. 505 00:28:36,410 --> 00:28:39,140 >> We do not have discretion over what these things are called; 506 00:28:39,140 --> 00:28:42,060 someone else who invented HTML came up with these names. 507 00:28:42,060 --> 00:28:44,710 So you just have to start to know or look up, any time you need them, 508 00:28:44,710 --> 00:28:47,160 what the names are for these tags and attributes. 509 00:28:47,160 --> 00:28:49,510 In the case of these attributes, attributes generally 510 00:28:49,510 --> 00:28:52,900 modify the behavior of some tag. 511 00:28:52,900 --> 00:28:55,710 In this case, the source attribute tells the image tag 512 00:28:55,710 --> 00:28:57,940 what the source of the image should be. 513 00:28:57,940 --> 00:29:04,460 The href attribute tells the anchor tag what it should actually be linking to. 514 00:29:04,460 --> 00:29:06,800 But in terms of the structure of a web page, even though Facebook 515 00:29:06,800 --> 00:29:09,680 and Google and the like look like a complete mess 516 00:29:09,680 --> 00:29:12,560 underneath the hood at first glance, if you start to read through it 517 00:29:12,560 --> 00:29:16,950 more methodically, they all follow this basic, basic structure. 518 00:29:16,950 --> 00:29:19,660 But we can improve the stylization of these things. 519 00:29:19,660 --> 00:29:24,180 So let me go to some examples that I prepared in advance. 520 00:29:24,180 --> 00:29:27,280 Let me go ahead and copy them from another folder here 521 00:29:27,280 --> 00:29:29,380 and put them into this directory. 522 00:29:29,380 --> 00:29:32,210 In advance, what I did was prepare a few files: 523 00:29:32,210 --> 00:29:35,670 search0, search1, search2, and search3 and 4. 524 00:29:35,670 --> 00:29:38,740 Let me go ahead and open up the first of those files, 525 00:29:38,740 --> 00:29:42,570 and let's see if we can't begin to create our own search engine. 
526 00:29:42,570 --> 00:29:46,530 At the top of this file, as is usually the case in class, just a bunch of comments. 527 00:29:46,530 --> 00:29:49,760 In HTML, though, the means by which you start a comment 528 00:29:49,760 --> 00:29:55,640 is <!--. 529 00:29:55,640 --> 00:29:59,800 When you're ready to stop that comment, you can do -->. 530 00:29:59,800 --> 00:30:02,380 So everything at the top in blue is just a comment. 531 00:30:02,380 --> 00:30:04,620 >> This is my doctype declaration, which again, 532 00:30:04,620 --> 00:30:07,080 you can just copy and paste on faith, for now. 533 00:30:07,080 --> 00:30:10,410 This just tells the browser, "Here comes some HTML 5." 534 00:30:10,410 --> 00:30:13,600 Below that, on line 14, is the first of my actual tags, 535 00:30:13,600 --> 00:30:16,900 and this just says, as before, here comes some HTML, 536 00:30:16,900 --> 00:30:19,460 here comes the head of my page, here comes the title, 537 00:30:19,460 --> 00:30:23,900 and then, conversely, that's it for the title, that's it for the head. 538 00:30:23,900 --> 00:30:26,460 Here now comes the body of my page. 539 00:30:26,460 --> 00:30:31,040 So a couple new tags now: h1 stands for heading 1. 540 00:30:31,040 --> 00:30:33,850 There's a tradition in HTML from many years back 541 00:30:33,850 --> 00:30:37,990 of having different sizes of text. 542 00:30:37,990 --> 00:30:41,980 And back in the day, h1 meant, generally, just big and bold. 543 00:30:41,980 --> 00:30:45,860 But there's also h2, which is big but not quite as big and bold. 544 00:30:45,860 --> 00:30:49,320 There's h3, which is kind of big but not nearly as big and bold, 545 00:30:49,320 --> 00:30:52,380 and so forth, all the way down to h6.
546 00:30:52,380 --> 00:30:55,550 These days, though, h1, h2, and h3 are really meant 547 00:30:55,550 --> 00:30:57,980 to have more semantic meaning to them, 548 00:30:57,980 --> 00:31:01,100 whereby h1 is really a heading: the heading of a web page, 549 00:31:01,100 --> 00:31:04,210 the heading of a column or something like that of text. 550 00:31:04,210 --> 00:31:09,030 So I've deliberately said

<h1>CS50 search</h1>
551 00:31:09,030 --> 00:31:12,640 to specify that this is really the heading, the title of my page. 552 00:31:12,640 --> 00:31:14,850 Not the title in the title bar sense, 553 00:31:14,850 --> 00:31:18,960 but the title that you actually see in the web page itself, in the body. 554 00:31:18,960 --> 00:31:20,990 Now this, you can probably guess what it is, 555 00:31:20,990 --> 00:31:23,110 even though we have a few new pieces of syntax. 556 00:31:23,110 --> 00:31:25,930 This is a form. So the web really gets interesting 557 00:31:25,930 --> 00:31:28,770 when websites take input from users. 558 00:31:28,770 --> 00:31:31,700 In this class, in the problem set on web programming, 559 00:31:31,700 --> 00:31:33,880 we're not going to make a website, per se, 560 00:31:33,880 --> 00:31:37,570 with static content that shows photographs that you've taken, 561 00:31:37,570 --> 00:31:40,010 or this is my resume, and things about me, 562 00:31:40,010 --> 00:31:42,450 because those things are relatively easy to put together. 563 00:31:42,450 --> 00:31:44,400 It's hard to make things beautiful on the Web, 564 00:31:44,400 --> 00:31:46,390 but at least putting up content is pretty trivial. 565 00:31:46,390 --> 00:31:49,380 But things get really interesting when someone can visit your website 566 00:31:49,380 --> 00:31:52,260 and provide input and can fill out forms, 567 00:31:52,260 --> 00:31:55,800 can check off checkboxes and can interact with your website. 568 00:31:55,800 --> 00:31:57,780 And indeed, probably every website you care about 569 00:31:57,780 --> 00:32:00,710 these days, in any detail, is somehow interactive. 570 00:32:00,710 --> 00:32:03,110 Facebook, Google, and the like, that take user input 571 00:32:03,110 --> 00:32:05,100 and produce customized output. 572 00:32:05,100 --> 00:32:07,780 >> So let's start to do that now.
Let's transition now 573 00:32:07,780 --> 00:32:11,150 from just using HTML for markup of static content 574 00:32:11,150 --> 00:32:14,790 to using it instead as a delivery mechanism for dynamic content. 575 00:32:14,790 --> 00:32:17,350 And toward that end, let's implement our own search engine. 576 00:32:17,350 --> 00:32:20,820 Let's do it as follows. Here's the form tag. 577 00:32:20,820 --> 00:32:24,090 The action attribute specifies that when the user fills out this form 578 00:32:24,090 --> 00:32:28,400 with their keyboard, it will be submitted to this URL here. 579 00:32:28,400 --> 00:32:31,230 So I'm kind of cheating. It's going to take us a little longer 580 00:32:31,230 --> 00:32:33,780 than one class to implement the whole search engine, 581 00:32:33,780 --> 00:32:35,880 so we'll just do the front end, so to speak. 582 00:32:35,880 --> 00:32:38,650 We'll do the part that lets the user search, and we'll sort of punt to Google 583 00:32:38,650 --> 00:32:40,950 the hard part of finding search results, 584 00:32:40,950 --> 00:32:43,520 but, specifically, I'm going to talk to Google's web server 585 00:32:43,520 --> 00:32:46,710 using one of two very popular methods. 586 00:32:46,710 --> 00:32:50,000 One being get, another, that we'll eventually see, being post, 587 00:32:50,000 --> 00:32:52,660 although there are others that are less often used. 588 00:32:52,660 --> 00:32:56,440 So get just conjures up the idea of I want to get some content, get some search results. 589 00:32:56,440 --> 00:32:58,440 This, you can perhaps guess what this does. 590 00:32:58,440 --> 00:33:01,900 This is some kind of input; it's, in fact, going to look like a text field, 591 00:33:01,900 --> 00:33:05,200 and the name of that input, the name of that variable, so to speak, 592 00:33:05,200 --> 00:33:08,610 is going to be q for query by convention.
593 00:33:08,610 --> 00:33:11,700 And again, the type of this input is not going to be a checkbox; 594 00:33:11,700 --> 00:33:13,890 it's not going to be a menu; it's going to be a text field 595 00:33:13,890 --> 00:33:18,060 as denoted by this attribute here, and this text box, 596 00:33:18,060 --> 00:33:20,680 like a line break, is either there or not. 597 00:33:20,680 --> 00:33:24,480 So we have an empty element with the slash inside that tag. 598 00:33:24,480 --> 00:33:28,050 Then I'm going to put a line break, and you can, perhaps, guess what this is going to do. 599 00:33:28,050 --> 00:33:30,210 This is another sort of form input. 600 00:33:30,210 --> 00:33:32,350 >> This one's going to be used for submitting the form. 601 00:33:32,350 --> 00:33:36,140 So this is going to be the big button that the user can click to submit the form, 602 00:33:36,140 --> 00:33:40,800 and the label on that button is going to be "CS50 Search." 603 00:33:40,800 --> 00:33:44,170 Close form, close body, close HTML. 604 00:33:44,170 --> 00:33:46,280 Let's see what we have in the form of this web page. 605 00:33:46,280 --> 00:33:48,260 So let me go to my browser, 606 00:33:48,260 --> 00:33:50,360 let me go, still, to localhost. 607 00:33:50,360 --> 00:33:54,650 This is still index.html, so if I want to see this file called search0, 608 00:33:54,650 --> 00:33:59,710 I can simply do /search0.html, Enter-- 609 00:33:59,710 --> 00:34:01,880 and the first of my mistakes. 610 00:34:01,880 --> 00:34:04,400 What's going on? I clearly don't have permission 611 00:34:04,400 --> 00:34:06,430 to access this file, for some reason. 
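The form being described can be pictured roughly as the markup in the string below. This is a reconstruction from the spoken description, not a copy of search0.html: the exact action URL (http://www.google.com/search) and the attribute order are assumptions. The Python around it uses the standard `html.parser` module to pull out the pieces the lecture names: the action, the method, and the two inputs. The class name `FormReader` is invented for the sketch.

```python
from html.parser import HTMLParser

# Assumed reconstruction of the form; the action URL is a guess based on the
# later remark that the request goes to /search on Google's server.
FORM = """
<form action="http://www.google.com/search" method="get">
    <input name="q" type="text"/>
    <br/>
    <input type="submit" value="CS50 Search"/>
</form>
"""


class FormReader(HTMLParser):
    """Collect the form's attributes and the attributes of each input."""

    def __init__(self):
        super().__init__()
        self.form = {}
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form = dict(attrs)
        elif tag == "input":
            self.inputs.append(dict(attrs))


reader = FormReader()
reader.feed(FORM)
print(reader.form["method"], reader.form["action"])
for field in reader.inputs:
    print(field)
```

The text field carries a name (q) because its value has to be sent somewhere; the submit button carries a value only because that is the label drawn on the button.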
612 00:34:06,430 --> 00:34:10,170 But that's because, unlike the work we've done thus far in C, 613 00:34:10,170 --> 00:34:14,340 where the programs you write are assumed to be runnable by you, 614 00:34:14,340 --> 00:34:17,590 executable by you, that's not really the case on the Web, 615 00:34:17,590 --> 00:34:21,010 whereby sometimes you might want to create files on a server, 616 00:34:21,010 --> 00:34:23,310 but you don't want the whole world to be able to see them. 617 00:34:23,310 --> 00:34:25,469 Rather, you want the world to see some files 618 00:34:25,469 --> 00:34:27,730 but not others, just for privacy's sake. 619 00:34:27,730 --> 00:34:30,730 So it's more of an opt-in basis when you're doing things on the Web. 620 00:34:30,730 --> 00:34:32,810 And so let me actually type ls here, 621 00:34:32,810 --> 00:34:37,440 and you see the files I have, but recall that if I do ls -l for long, 622 00:34:37,440 --> 00:34:41,520 I'll get a longer listing that gives me some more details about these files 623 00:34:41,520 --> 00:34:45,139 that are now, really, for the first time relevant to us. 624 00:34:45,139 --> 00:34:47,840 Notice that on the far right are the names of my files, 625 00:34:47,840 --> 00:34:50,690 and then the time at which they were last modified or copied. 626 00:34:50,690 --> 00:34:54,370 This number here is what? Do you recall? 627 00:34:54,370 --> 00:34:56,400 The size in bytes, how big the file is. 628 00:34:56,400 --> 00:34:59,520 >> So I seem to have some kind of logo in here that's bigger than all the other files. 629 00:34:59,520 --> 00:35:03,610 This is who I am, this is what I am and what group I'm in. 630 00:35:03,610 --> 00:35:07,430 But then, over here on the left is a bit of cryptic sequence, 631 00:35:07,430 --> 00:35:10,040 and we talked, I think, briefly about this in the past, 632 00:35:10,040 --> 00:35:12,050 but this has to do with permissions. 
633 00:35:12,050 --> 00:35:14,020 And even if that's a little hazy, 634 00:35:14,020 --> 00:35:17,270 RW probably means read and write. 635 00:35:17,270 --> 00:35:22,560 So it turns out that these dashes denote different sets of permissions for different people. 636 00:35:22,560 --> 00:35:24,730 And the pattern is, essentially, as follows. 637 00:35:24,730 --> 00:35:27,650 When you see a sequence of dashes here, they look as follows. 638 00:35:27,650 --> 00:35:30,450 There's a dash, then there's three more dashes, 639 00:35:30,450 --> 00:35:33,390 then there's another three, then there's another three. 640 00:35:33,390 --> 00:35:36,800 The first one is either a dash or it's a d for directory. 641 00:35:36,800 --> 00:35:40,220 So that one's pretty easy. If it's a folder, it says d, otherwise it's a hyphen. 642 00:35:40,220 --> 00:35:44,080 There's a couple other cases, but for now we'll just care about files and directories. 643 00:35:44,080 --> 00:35:48,090 These next three dashes--and I've artificially inserted the spaces. 644 00:35:48,090 --> 00:35:50,490 They were, obviously, not there when we saw them a moment ago. 645 00:35:50,490 --> 00:35:52,900 These are the file owner's permissions, 646 00:35:52,900 --> 00:35:55,840 and recall from a second ago that it was read and write. 647 00:35:55,840 --> 00:35:58,560 That was because I, as the person who created this file a moment ago, 648 00:35:58,560 --> 00:36:01,250 I, just by default, on a Linux computer, 649 00:36:01,250 --> 00:36:03,910 have the ability to continue reading and writing that file. 650 00:36:03,910 --> 00:36:07,170 >> So the operating system just gives me RW automatically. 651 00:36:07,170 --> 00:36:10,840 The middle ones relate to my group, that of students, 652 00:36:10,840 --> 00:36:14,590 which is sort of meaningless on the appliance because I'm the only person using the appliance. 653 00:36:14,590 --> 00:36:16,620 So let me just wave my hands at that for now. 
654 00:36:16,620 --> 00:36:19,190 But the last ones are most important for the Web. 655 00:36:19,190 --> 00:36:21,580 This is everyone else in the world, and the fact 656 00:36:21,580 --> 00:36:24,600 that that is --- means that no one else in the world 657 00:36:24,600 --> 00:36:26,680 has any permissions to this file. 658 00:36:26,680 --> 00:36:29,180 Clearly a problem, so I need to fix this 659 00:36:29,180 --> 00:36:33,830 by somehow giving the world what? Read and write? 660 00:36:33,830 --> 00:36:35,850 That's probably dumb, right? I don't want anyone on the Web 661 00:36:35,850 --> 00:36:38,530 to go to visit my page and somehow change that file, 662 00:36:38,530 --> 00:36:40,800 even though they really couldn't with an HTML file, 663 00:36:40,800 --> 00:36:44,110 but just in principle, probably just want them to be able to read it. 664 00:36:44,110 --> 00:36:47,910 What does it mean to read it? It doesn't mean they're going to care about the actual HTML, 665 00:36:47,910 --> 00:36:51,820 but the browser needs to be able to parse that markup language, 666 00:36:51,820 --> 00:36:53,720 top to bottom, left to right. 667 00:36:53,720 --> 00:36:57,990 So someone on the Web needs to be able to read it, so I minimally need to give it r. 668 00:36:57,990 --> 00:37:00,240 I can do this in a few different ways, but perhaps 669 00:37:00,240 --> 00:37:03,080 the simplest is to run this command here. 670 00:37:03,080 --> 00:37:10,860 chmod, for change mode, then a+r, so all, everyone in the world, plus read, 671 00:37:10,860 --> 00:37:13,830 and then the name of the file, search0.html. 672 00:37:13,830 --> 00:37:18,310 >> Now if I do ls -l again, notice that that file has changed, 673 00:37:18,310 --> 00:37:21,440 and indeed, I've turned on r for everyone. 674 00:37:21,440 --> 00:37:23,350 I've also turned it on for my group, but that's fine, 675 00:37:23,350 --> 00:37:27,150 because if I turned it on for everyone, my group is a subset of that.
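The before-and-after permission strings can be reproduced with Python's standard `stat` module. The starting mode below (read and write for the owner only) is an assumption that matches what was described on screen. The bitwise OR with 0o444 is a stand-in for what `chmod a+r` does, namely switching on the read bit for owner, group, and everyone else.

```python
import stat

# Assumed starting point: a regular file readable and writable by its owner only.
before = stat.S_IFREG | 0o600

# Turn on the read bit in all three groups, which is the effect of `chmod a+r`.
after = before | 0o444

print(stat.filemode(before))  # -rw-------
print(stat.filemode(after))   # -rw-r--r--

# The first character is d for a directory rather than a dash.
print(stat.filemode(stat.S_IFDIR | 0o755))  # drwxr-xr-x
```

Reading the ten characters left to right gives the file type, then the owner's three permissions, then the group's, then the world's, which is the grouping drawn on the board.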
676 00:37:27,150 --> 00:37:31,480 So that's fine too. This just means the computer has now made it readable. 677 00:37:31,480 --> 00:37:34,430 Now let me go back to my browser, click reload. 678 00:37:34,430 --> 00:37:36,330 Ah-ha. We now have CS50 Search. 679 00:37:36,330 --> 00:37:39,830 I've zoomed in a little artificially--pretty hideous search engine. 680 00:37:39,830 --> 00:37:41,930 But let's see if it actually works. 681 00:37:41,930 --> 00:37:45,880 First, let me do a quick sanity check, let me control click and view page source. 682 00:37:45,880 --> 00:37:50,780 Notice that within Chrome, we're now seeing the same HTML that I myself created. 683 00:37:50,780 --> 00:37:55,420 Don't get confused here, though. I can't start changing the code here, 684 00:37:55,420 --> 00:37:59,420 because the browser has a read-only view of this code. 685 00:37:59,420 --> 00:38:06,060 The browser has just asked localhost for a file called search0.html. 686 00:38:06,060 --> 00:38:09,490 It is now pure coincidence that the appliance 687 00:38:09,490 --> 00:38:13,480 happens to be on the same computer as my browser. 688 00:38:13,480 --> 00:38:20,470 I could just have, equivalently, have typed in www.facebook.com/search0.html, 689 00:38:20,470 --> 00:38:23,830 and if Facebook had a file called that, I would then be seeing their HTML. 690 00:38:23,830 --> 00:38:27,360 And, of course, I can't change the file that comes back from Facebook, either. 691 00:38:27,360 --> 00:38:29,360 So now we're sort of blurring the lines. 692 00:38:29,360 --> 00:38:32,130 The appliance is both a server, serving up web pages, 693 00:38:32,130 --> 00:38:34,870 but it's also a client in the sense that I'm using a browser 694 00:38:34,870 --> 00:38:37,630 to actually talk to that server. 695 00:38:37,630 --> 00:38:39,610 So let's see if my Google search engine works. 696 00:38:39,610 --> 00:38:44,930 Let me go ahead and search for quick brown fox, Enter. 
697 00:38:44,930 --> 00:38:47,540 And voila, I now have my own search engine. 698 00:38:47,540 --> 00:38:51,460 >> But how does this work? 699 00:38:51,460 --> 00:38:55,380 Bit of a stretch, but--and now you can't see, precisely, the part that's of interest. 700 00:38:55,380 --> 00:38:57,370 Notice what happens. 701 00:38:57,370 --> 00:39:00,430 Notice the URL. It turns out that that method, 702 00:39:00,430 --> 00:39:02,780 called get, is super simple. 703 00:39:02,780 --> 00:39:10,270 When you specify in a form that you want to 'get' results from some server, 704 00:39:10,270 --> 00:39:13,200 what it's going to do is take whatever you typed into the form 705 00:39:13,200 --> 00:39:15,290 and put it in the URL. 706 00:39:15,290 --> 00:39:18,580 It's going to standardize how it gets put into the URL as follows. 707 00:39:18,580 --> 00:39:22,290 Notice that this is the URL that was the value of my action attribute. 708 00:39:22,290 --> 00:39:24,730 That's where I wanted the form to end up. 709 00:39:24,730 --> 00:39:26,950 But then notice this question mark. 710 00:39:26,950 --> 00:39:30,230 This is a convention on the Web whereby to provide user input 711 00:39:30,230 --> 00:39:35,320 to a website, you append to the URL a question mark, 712 00:39:35,320 --> 00:39:38,330 and then you have a whole bunch of key-value pairs. 713 00:39:38,330 --> 00:39:42,380 The name of a key, otherwise known as a parameter in the Web, 714 00:39:42,380 --> 00:39:46,380 then you have an equal sign, then you have the value of that parameter. 715 00:39:46,380 --> 00:39:49,810 So it's essentially a variable name and a variable value, 716 00:39:49,810 --> 00:39:54,250 but those variables' names and values came from the HTML form. 717 00:39:54,250 --> 00:39:56,250 Why are the pluses there, do you think? 718 00:39:56,250 --> 00:39:59,340 Because I did not type + in between my words. 
719 00:39:59,340 --> 00:40:01,430 [Student, unintelligible] 720 00:40:01,430 --> 00:40:05,080 >>Yeah, it's just for spacing. Odds are, whenever you've seen a URL, 721 00:40:05,080 --> 00:40:07,320 there's never any spaces in it, if only because 722 00:40:07,320 --> 00:40:09,440 if there were, you couldn't really copy and paste it 723 00:40:09,440 --> 00:40:12,700 into an IM or into an email because it would break. 724 00:40:12,700 --> 00:40:15,420 You want the whole thing to be one contiguous string of characters. 725 00:40:15,450 --> 00:40:18,450 >> So the browser is smart enough to realize, uh-uh. 726 00:40:18,450 --> 00:40:22,610 Don't just put a space there. Let me encode the space in some standard way. 727 00:40:22,610 --> 00:40:25,170 One of the conventions for doing so is to have the browser 728 00:40:25,170 --> 00:40:29,350 automatically put a + where you would otherwise have a space. 729 00:40:29,350 --> 00:40:32,140 So now, notice Google has been kind of user-friendly. 730 00:40:32,140 --> 00:40:34,380 I certainly did not create this web page, 731 00:40:34,380 --> 00:40:37,200 but they have prepopulated their own text field 732 00:40:37,200 --> 00:40:39,490 with what, precisely, I typed in. 733 00:40:39,490 --> 00:40:43,090 Suppose I want to search for something else, like a lazy dog. 734 00:40:43,090 --> 00:40:45,340 I can just type this here, re-search. 735 00:40:45,340 --> 00:40:47,730 Notice that the URL changes up here, 736 00:40:47,730 --> 00:40:51,390 but notice then that I can actually search for anything I want 737 00:40:51,390 --> 00:40:53,610 just by understanding how URLs work. 738 00:40:53,610 --> 00:40:56,840 I could do lazy cat, Enter, 739 00:40:56,840 --> 00:41:01,370 and notice now I'm getting a very lazy--should we? I feel like we should. 740 00:41:01,370 --> 00:41:09,900 I get a very lazy cat. 741 00:41:09,900 --> 00:41:11,930 All right. This is one of the stupidest things we've done. 
742 00:41:11,930 --> 00:41:17,160 But that is a lazy cat. 743 00:41:17,160 --> 00:41:19,730 Anyhow, what's the key takeaway here? 744 00:41:19,730 --> 00:41:22,830 Now we're sort of playing in the world of HTTP. 745 00:41:22,830 --> 00:41:26,050 HTML is just this markup language, open tag, close tag, 746 00:41:26,050 --> 00:41:29,490 that tells a browser how to render content on a web page. 747 00:41:29,490 --> 00:41:32,850 But when you start transmitting data across the Internet 748 00:41:32,850 --> 00:41:36,290 between web browser and server, that's where this protocol 749 00:41:36,290 --> 00:41:39,370 known as HyperText Transfer Protocol takes over. 750 00:41:39,370 --> 00:41:42,630 This is the sort of human convention; when Sam and I shook hands on Monday, 751 00:41:42,630 --> 00:41:48,300 starting a connection and then closing a connection, same idea here. 752 00:41:48,300 --> 00:41:53,100 How are Google's results coming back to me? 753 00:41:53,100 --> 00:41:55,290 How is my form submission going to Google? 754 00:41:55,290 --> 00:41:58,160 Well, recall from the other day that what's really going on 755 00:41:58,160 --> 00:42:02,150 underneath the hood when you request a web page is 756 00:42:02,150 --> 00:42:04,860 your browser is sending a somewhat cryptic message like 757 00:42:04,860 --> 00:42:09,510 GET / HTTP/1.1 for the default home page. 758 00:42:09,510 --> 00:42:13,000 >> Or, in this case, because I specifically requested earlier 759 00:42:13,000 --> 00:42:17,340 search0.html, this then would be the somewhat-cryptic message 760 00:42:17,340 --> 00:42:20,040 that my browser sends to the appliance. 761 00:42:20,040 --> 00:42:23,090 Or, in this case of Google, what's actually sent 762 00:42:23,090 --> 00:42:33,740 is a request to /search, and then ?q=lazy cat, with a plus there. 
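As a rough sketch of what the browser does with a GET form, here is the same encoding done by hand. It assumes Python's standard urllib.parse module as a stand-in for the browser; the helper name build_get_request and the Host value www.google.com are made up for illustration and do not come from the lecture.

```python
from urllib.parse import urlencode, parse_qs


def build_get_request(path, params, host):
    """Build the text of a simple HTTP/1.1 GET request (illustration only)."""
    # urlencode joins key=value pairs with & and encodes each space as +
    query = urlencode(params)
    return f"GET {path}?{query} HTTP/1.1\r\nHost: {host}\r\n\r\n"


# The form field is named q and the user typed "lazy cat".
request = build_get_request("/search", {"q": "lazy cat"}, "www.google.com")
print(request.splitlines()[0])  # the request line the browser would send

# A server reverses the encoding to recover what was typed into the form.
decoded = parse_qs("q=lazy+cat")
print(decoded)
```

The first print shows the request line with the + in place of the space, and the second shows the parameter name mapped back to the original text.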
763 00:42:33,740 --> 00:42:36,790 So this message that I, the human, am never typing, 764 00:42:36,790 --> 00:42:40,620 but is being sent by my browser, this is how HTTP happens. 765 00:42:40,620 --> 00:42:43,240 This is the equivalent of our having shaken hands. 766 00:42:43,240 --> 00:42:46,320 This is the request, and the server's about to send a response. 767 00:42:46,320 --> 00:42:48,560 So let's take a look at this underneath the hood. 768 00:42:48,560 --> 00:42:55,320 As before, we can open up this special field in a browser. 769 00:42:55,320 --> 00:42:58,720 View Page, Inspect Elements. 770 00:42:58,720 --> 00:43:01,550 So under Inspect Element, notice that what's happened in Chrome, 771 00:43:01,550 --> 00:43:04,160 and IE and Firefox have similar mechanisms, 772 00:43:04,160 --> 00:43:07,370 we have these developer tools accessible to us. 773 00:43:07,370 --> 00:43:09,630 Normal people do not use these tabs. 774 00:43:09,630 --> 00:43:11,940 But we, now, are interested in what's going on 775 00:43:11,940 --> 00:43:13,890 underneath the hood at the network level. 776 00:43:13,890 --> 00:43:16,130 So if I pull up the network level here, 777 00:43:16,130 --> 00:43:18,510 let me go ahead and expand this window, 778 00:43:18,510 --> 00:43:21,840 open up this entry here, and look at the headers. 779 00:43:21,840 --> 00:43:26,010 So what happens when I request a file from a web server 780 00:43:26,010 --> 00:43:29,410 is my browser sends a whole bunch of things. 781 00:43:29,410 --> 00:43:32,390 And let me view source. So under request headers, 782 00:43:32,390 --> 00:43:35,250 and this is just Chrome showing me some diagnostic output, 783 00:43:35,250 --> 00:43:37,340 sort of like a debugger of some sort, 784 00:43:37,340 --> 00:43:40,500 notice that what I've highlighted here is precisely what 785 00:43:40,500 --> 00:43:47,060 Chrome is sending to the server in order to request a file called search0.html. 
786 00:43:47,060 --> 00:43:50,160 It is telling the server what it thinks its name is, 787 00:43:50,160 --> 00:43:52,210 thanks to this host colon field, then there's some 788 00:43:52,210 --> 00:43:56,950 pretty esoteric stuff in here, like something to do with dates and times, 789 00:43:56,950 --> 00:43:59,720 something to do with the languages that the browser understands, 790 00:43:59,720 --> 00:44:02,850 but the really important lines are these first two here. 791 00:44:02,850 --> 00:44:05,490 >> What does the server respond with? Well, if we scroll down here 792 00:44:05,490 --> 00:44:08,510 and view source of this thing, notice that the server 793 00:44:08,510 --> 00:44:13,700 has responded with a somewhat cryptic message as well, 304 not modified. 794 00:44:13,700 --> 00:44:16,030 That's a little strange; let me actually try to fix this. 795 00:44:16,030 --> 00:44:18,670 Let me hold down Shift and click Reload up here 796 00:44:18,670 --> 00:44:22,460 to force the browser to actually make this request for the first time. 797 00:44:22,460 --> 00:44:25,700 Then let me zoom in, and we'll see now that the server's response, 798 00:44:25,700 --> 00:44:28,950 because I held Shift, is 200 OK. 799 00:44:28,950 --> 00:44:31,170 So you've probably never seen the number 200 800 00:44:31,170 --> 00:44:33,300 in the context of the Web, but what numbers 801 00:44:33,300 --> 00:44:36,760 have you sometimes seen unexpectedly from a server? 802 00:44:36,760 --> 00:44:42,010 404, file not found; 403, forbidden; 500, server error. 803 00:44:42,010 --> 00:44:44,890 So there are these numeric codes that the world uses in the Web 804 00:44:44,890 --> 00:44:47,870 to signify errors, just like C functions 805 00:44:47,870 --> 00:44:51,030 can return errors and main can return exit codes. 806 00:44:51,030 --> 00:44:54,160 200, though, you rarely see because it means all is well. 807 00:44:54,160 --> 00:44:59,000 And 304 you probably never see because what is it signifying? 
808 00:44:59,000 --> 00:45:03,330 That nothing has--let's see if we can simulate this again-- 809 00:45:03,330 --> 00:45:07,170 Oh, now it's not cooperating. 304 said not modified, 810 00:45:07,170 --> 00:45:09,170 so why was the server even responding? 811 00:45:09,170 --> 00:45:12,550 Well, for efficiency, a web server automatically for you, 812 00:45:12,550 --> 00:45:16,570 if the file hasn't changed, it won't retransmit the whole HTML file. 813 00:45:16,570 --> 00:45:19,150 It'll just tell the browser it hasn't changed. 814 00:45:19,150 --> 00:45:21,220 Just use the copy you already have. 815 00:45:21,220 --> 00:45:22,650 So there's this notion of caching on the Web 816 00:45:22,650 --> 00:45:25,840 for performance, so that you don't waste time and waste bandwidth 817 00:45:25,840 --> 00:45:29,160 downloading files again and again unnecessarily. 818 00:45:29,160 --> 00:45:31,460 >> But this web page, now, was super-simple, 819 00:45:31,460 --> 00:45:34,980 and it only showed me the HTML that came back. 820 00:45:34,980 --> 00:45:40,940 Let's actually use the network tab now to do a Google search like quick brown fox. 821 00:45:40,940 --> 00:45:43,010 Let me then click CS50 Search, 822 00:45:43,010 --> 00:45:46,950 and now, notice in the bottom here a whole bunch of stuff came back 823 00:45:46,950 --> 00:45:49,900 because when I visit a real website like Google.com, 824 00:45:49,900 --> 00:45:53,520 they have images, they have text, they have a language called JavaScript there. 825 00:45:53,520 --> 00:45:55,940 So every row in this table down here 826 00:45:55,940 --> 00:46:01,490 represents something that Google spit out in response to my single request. 827 00:46:01,490 --> 00:46:04,160 The one I care about, though, is this first one. 
828 00:46:04,160 --> 00:46:08,420 And if I go to the search, request, click View Source here, 829 00:46:08,420 --> 00:46:11,300 notice that, indeed, the cryptic message that my browser sent 830 00:46:11,300 --> 00:46:15,010 to Google was these two lines here, 831 00:46:15,010 --> 00:46:18,420 followed by some arcane information down here which we'll ignore for now. 832 00:46:18,420 --> 00:46:20,890 But notice, too, what Chrome is pretty handy with, 833 00:46:20,890 --> 00:46:24,540 it's also showing me the query string that was sent in. 834 00:46:24,540 --> 00:46:27,410 So rather than show me this, which was literally sent, 835 00:46:27,410 --> 00:46:30,800 if I view it decoded, Chrome, just for debugging purposes, 836 00:46:30,800 --> 00:46:34,270 for developers like us, it's just showing me a human-friendly version of-- 837 00:46:34,270 --> 00:46:36,390 that is not how you spell fox, apparently. 838 00:46:36,390 --> 00:46:40,520 I'm just noticing this now--but it's showing you what I, apparently, typed. 839 00:46:40,520 --> 00:46:45,340 Meanwhile, the response that came back from the server is again 200 OK. 840 00:46:45,340 --> 00:46:47,930 But included in that response, of course, 841 00:46:47,930 --> 00:46:51,920 if we actually view the page's HTML-- 842 00:46:51,920 --> 00:46:55,440 sorry, this is a little keyboard shortcut gone awry today. 843 00:46:55,440 --> 00:46:59,020 >> I'll deal with this later. So if we actually view the page's source, 844 00:46:59,020 --> 00:47:02,990 which I can do down here by clicking response, 845 00:47:02,990 --> 00:47:10,080 this is what was actually spit back, in addition to that cryptic 200 OK message from the server. 846 00:47:10,080 --> 00:47:12,520 A little cryptic, but where is all this coming from? 847 00:47:12,520 --> 00:47:15,570 Well, let's do one other thing here. 
Another somewhat cryptic command, 848 00:47:15,570 --> 00:47:20,530 but this one's kind of neat in that it reveals to us exactly what's going on underneath the hood. 849 00:47:20,530 --> 00:47:22,530 So I'm back on my Mac here, I have connected 850 00:47:22,530 --> 00:47:25,980 via a program called SSH, Secure Shell, to another server 851 00:47:25,980 --> 00:47:28,940 because most of Harvard's computers block the command we're about to run 852 00:47:28,940 --> 00:47:31,640 because there's this command on some servers called traceroute 853 00:47:31,640 --> 00:47:34,810 that allows you to trace the route between points a and b, 854 00:47:34,810 --> 00:47:37,020 and thus far we've been taking completely for granted 855 00:47:37,020 --> 00:47:40,170 that I can type in Google.com and somehow get data back 856 00:47:40,170 --> 00:47:43,530 from halfway across the country or halfway across the world. 857 00:47:43,530 --> 00:47:45,810 With traceroute we can actually dive in a little deeper 858 00:47:45,810 --> 00:47:49,370 as to how the Internet works, and see what's going on underneath the hood. 859 00:47:49,370 --> 00:47:54,440 So let's go ahead and arbitrarily trace a route to, say, Stanford.edu, 860 00:47:54,440 --> 00:47:57,150 which is across the country, and hit Enter. 861 00:47:57,150 --> 00:47:59,380 This command can be super fast or super slow, 862 00:47:59,380 --> 00:48:02,010 but what we're seeing now, line by line, 863 00:48:02,010 --> 00:48:08,060 is every one of the steps or hops between us and Palo Alto, or Stanford, 864 00:48:08,060 --> 00:48:11,010 where they have their web server. 865 00:48:11,010 --> 00:48:16,600 So what does each of these lines represent more concretely, though? 866 00:48:16,600 --> 00:48:19,100 A piece of jargon from the Internet? [Student, unintelligible] 867 00:48:19,100 --> 00:48:21,570 >>What's that? [Student, unintelligible] 868 00:48:21,570 --> 00:48:25,390 >>Oh, so there are times, but what does each row--what do I mean by hop? 
869 00:48:25,390 --> 00:48:29,140 >> Well, there are these things on the Internet called routers. 870 00:48:29,140 --> 00:48:33,020 And routers, as the name suggests, route information from point a to point b. 871 00:48:33,020 --> 00:48:36,920 But there are several points beyond a and b. 872 00:48:36,920 --> 00:48:40,010 There's c and d and e and f between row 1, 873 00:48:40,010 --> 00:48:43,480 which happens to be my computer's IP address, 874 00:48:43,480 --> 00:48:46,890 or my numeric address, which uniquely identifies my computer, 875 00:48:46,890 --> 00:48:50,300 and step 15, which is actually the sixth web server, 876 00:48:50,300 --> 00:48:54,640 apparently, which I'm inferring from this, or version 6 of their web server at Stanford. 877 00:48:54,640 --> 00:48:56,680 But what's kind of neat is, we can see the path 878 00:48:56,680 --> 00:49:00,480 that my 0's and 1's are taking from my computer to Stanford. 879 00:49:00,480 --> 00:49:02,500 So step 1 is my own computer's address. 880 00:49:02,500 --> 00:49:05,760 Every computer on the Internet has a unique identifier that looks like this. 881 00:49:05,760 --> 00:49:08,150 Number.number.number.number. 882 00:49:08,150 --> 00:49:10,370 Somewhere on this campus, probably in the science center, 883 00:49:10,370 --> 00:49:16,780 is a router called Core Gateway 2 -te83, whatever that means, 884 00:49:16,780 --> 00:49:20,590 so this is one of Harvard's big fancy routers that routes a lot of their traffic. 885 00:49:20,590 --> 00:49:24,640 Here's another of Harvard's routers, this one is Border Gateway, 886 00:49:24,640 --> 00:49:28,310 border meaning it's probably on the periphery of campus somewhere. 887 00:49:28,480 --> 00:49:32,790 Then there's nox one, row 4, which is Northern Crossroads, 888 00:49:32,790 --> 00:49:35,070 which is a big ISP, Internet service provider, 889 00:49:35,070 --> 00:49:37,740 that places like Harvard connect up to. 
890 00:49:37,740 --> 00:49:40,760 But then things get a little interesting in line 6. 891 00:49:40,760 --> 00:49:45,960 Where are my bits all of a sudden? Kansas. 892 00:49:45,960 --> 00:49:49,300 The world has a habit of using airport codes in a lot of these things, 893 00:49:49,300 --> 00:49:52,900 or at least abbreviations for states or cities, 894 00:49:52,900 --> 00:49:56,490 so it looks like, in just 60 ms, 895 00:49:56,490 --> 00:49:59,420 a packet of information, 0's and 1's from my laptop 896 00:49:59,420 --> 00:50:03,210 got all the way to Kansas, and again, in 60 ms. 897 00:50:03,210 --> 00:50:08,180 >> Moreover, after Kansas, they took a tour through Houston, probably, 898 00:50:08,180 --> 00:50:10,140 as suggested by the name of this server. 899 00:50:10,140 --> 00:50:13,310 So just as a server on the Internet must have a numeric address, 900 00:50:13,310 --> 00:50:18,360 it can also, optionally, have a slightly more human-friendly address that humans came up with. 901 00:50:18,360 --> 00:50:20,510 Now, in step 8, we don't know what this is. 902 00:50:20,510 --> 00:50:22,550 Sometimes routers just kind of ignore you, 903 00:50:22,550 --> 00:50:25,010 and they just don't answer the questions, so that's fine. 904 00:50:25,010 --> 00:50:29,290 The one after step 8 is apparently where? L.A. 905 00:50:29,290 --> 00:50:35,290 Notice in only 78 ms, what takes us humans like 6+ hours to do physically, 906 00:50:35,290 --> 00:50:40,110 takes packets of information on the Internet 78 ms to travel that far. 907 00:50:40,110 --> 00:50:45,890 Step 10 is in L.A. as well, and step 11 seems to have gone north, up near Stanford. 908 00:50:45,890 --> 00:50:48,750 This is their boundary router, or border router. 909 00:50:48,750 --> 00:50:51,240 A couple steps at Stanford that are ignoring us, 910 00:50:51,240 --> 00:50:55,610 and lastly, we reach the web server in just 87 ms. 
911 00:50:55,610 --> 00:50:57,760 Now, all of these numbers, as an aside, 912 00:50:57,760 --> 00:51:00,640 just tell you how long it takes for data to get from me 913 00:51:00,640 --> 00:51:03,530 to each of these routers, and it's not cumulative. 914 00:51:03,530 --> 00:51:06,960 What this program does is it first sends a message, essentially, to the first router. 915 00:51:06,960 --> 00:51:09,490 Then one to the second router; then one to the third router, 916 00:51:09,490 --> 00:51:12,610 measuring each time. So in theory, these times will be growing 917 00:51:12,610 --> 00:51:14,860 or at least pretty close to one another, 918 00:51:14,860 --> 00:51:18,090 and, indeed, the ones that are right here on campus are super-small. 919 00:51:18,090 --> 00:51:20,820 As soon as you start going across the country, it takes data 920 00:51:20,820 --> 00:51:24,830 a little longer to travel, closer to 100 ms, give or take. 921 00:51:24,830 --> 00:51:28,330 But let's go the other direction now. How about Cambridge University in the UK? 922 00:51:28,330 --> 00:51:32,540 Let me instead run traceroute of www.cam for Cambridge, 923 00:51:32,540 --> 00:51:36,710 .ac for academic, .uk, and hit Enter here. 924 00:51:36,710 --> 00:51:38,830 That was pretty damn fast. 925 00:51:38,830 --> 00:51:43,300 My data literally went to Cambridge, England, in that split second of time. 926 00:51:43,300 --> 00:51:45,340 >> So let's see the path that it took. 927 00:51:45,340 --> 00:51:47,520 Harvard, Harvard, Harvard, Northern Crossroads, 928 00:51:47,520 --> 00:51:52,690 which is an ISP, and then this is Northern Crossroads, and then bam. 929 00:51:52,690 --> 00:51:58,320 What is in between steps 6 and 7, router 6 and 7? 930 00:51:58,320 --> 00:52:02,040 The Atlantic Ocean. And we're inferring this from the fact that 931 00:52:02,040 --> 00:52:06,530 we go from 20 ms here to 80 ms here. 932 00:52:06,530 --> 00:52:10,050 So something took 60 ms, give or take, to get over.
933 00:52:10,050 --> 00:52:12,910 And that was probably a big body of water. 934 00:52:12,910 --> 00:52:15,250 What goes on after that? Well, here we are in London, 935 00:52:15,250 --> 00:52:18,860 just 88 ms later. More London, more London, 936 00:52:18,860 --> 00:52:21,730 not sure where this is, but we'll assume it's outside of London, 937 00:52:21,730 --> 00:52:26,390 Cambridge here, and finally we--literally, University of Cambridge 938 00:52:26,390 --> 00:52:29,500 .something.net, and then, finally, in line 16, 939 00:52:29,500 --> 00:52:31,720 their web server is apparently called Scorpius 940 00:52:31,720 --> 00:52:35,500 underneath the hood, even though we know it as www. 941 00:52:35,500 --> 00:52:38,790 Kind of mind-blowing, I think. The first time I ever did this, it totally blew my mind. 942 00:52:38,790 --> 00:52:41,670 Unfortunately, Harvard blocks this kind of traffic, typically, on the network. 943 00:52:41,670 --> 00:52:44,340 So you can't do it super easily. 944 00:52:44,340 --> 00:52:48,500 Realize, though, this here is possible. 945 00:52:48,500 --> 00:52:53,630 All right. Let's take our 5-minute break here. We'll come back and dive in deeper. 946 00:52:53,630 --> 00:53:00,850 So we are back, and we've kind of ambled about in a few different directions here. 947 00:53:00,850 --> 00:53:03,700 So let's summarize exactly what's been going on here. 948 00:53:03,700 --> 00:53:07,990 We started the conversation talking about this language called HTML. 949 00:53:07,990 --> 00:53:10,680 Again, not a programming language. It's just a markup language 950 00:53:10,680 --> 00:53:15,490 that is largely about aesthetics and structuring of content in the form of a webpage. 951 00:53:15,490 --> 00:53:19,220 But HTML, therefore, needs some kind of mechanism 952 00:53:19,220 --> 00:53:22,870 for traveling between web browser and server. 
953 00:53:22,870 --> 00:53:28,360 HTML therefore sort of rides on top of this other language, 954 00:53:28,360 --> 00:53:31,280 or more properly, a protocol, known as HTTP. 955 00:53:31,280 --> 00:53:33,730 >> And HTTP, as we've seen it thus far, 956 00:53:33,730 --> 00:53:37,140 is kind of analogous to this human convention of shaking hands. 957 00:53:37,140 --> 00:53:39,940 When a browser wants to request a page from a server, 958 00:53:39,940 --> 00:53:43,450 it sends that "get" request from browser to server, 959 00:53:43,450 --> 00:53:48,040 and then the server responds with a number like 200, all is okay, 960 00:53:48,040 --> 00:53:53,290 as well as the HTML or some bad number like 404, file not found. 961 00:53:53,290 --> 00:53:58,220 But meanwhile, HTTP itself isn't the Internet, per se. 962 00:53:58,220 --> 00:54:01,550 HTTP is just a service, a feature of the Internet 963 00:54:01,550 --> 00:54:05,530 much like G chat is another service, much like email is another service. 964 00:54:05,530 --> 00:54:09,180 There's all sorts of things we can do on the Internet. 965 00:54:09,180 --> 00:54:12,670 HTTP is just one of those applications. 966 00:54:12,670 --> 00:54:17,210 So on top of--HTTP is on top of something else 967 00:54:17,210 --> 00:54:21,750 which we didn't mention by name, you might have heard of by name, TCP/IP. 968 00:54:21,750 --> 00:54:25,160 So the story we just told there is all about 969 00:54:25,160 --> 00:54:28,720 how data travels from point a to point b. 970 00:54:28,720 --> 00:54:30,950 And in this case, we saw at a very low level 971 00:54:30,950 --> 00:54:33,060 router to router to router to router, 972 00:54:33,060 --> 00:54:35,390 how the data is actually being transmitted. 973 00:54:35,390 --> 00:54:40,510 But along the way, it is going to encounter various impediments. 
974 00:54:40,510 --> 00:54:43,770 Besides these routers, there are things called firewalls on the Internet, 975 00:54:43,770 --> 00:54:46,680 and so data, such as that we were just transmitting 976 00:54:46,680 --> 00:54:49,720 from me to Stanford, from me to Cambridge, 977 00:54:49,720 --> 00:54:54,560 is sent to, at this level, something called an IP address. 978 00:54:54,560 --> 00:54:57,340 We saw this a moment ago, and an IP address 979 00:54:57,340 --> 00:55:02,480 is just a numeric address of the form w.x.y.z, 980 00:55:02,480 --> 00:55:08,070 where each of these is between, give or take, 0 and 255, 981 00:55:08,070 --> 00:55:10,080 though you can't quite use all of those numbers. 982 00:55:10,080 --> 00:55:14,220 But each of these place holders is a number between 0 and 255. 983 00:55:14,220 --> 00:55:16,820 So an IP address these days is 32 bits. 984 00:55:16,820 --> 00:55:20,780 >> Now, that gives us how many possible IP addresses in the world? 985 00:55:20,780 --> 00:55:24,420 Roughly 4 billion, because any time we're counting in powers of 2 986 00:55:24,420 --> 00:55:27,760 all the way up to 32 of something, that usually gives us 4 billion. 987 00:55:27,760 --> 00:55:30,160 So that's a lot of IP addresses, but you might have read, 988 00:55:30,160 --> 00:55:32,410 or you might now notice in the popular press, 989 00:55:32,410 --> 00:55:36,020 a push toward a new version of IP called IPv6. 990 00:55:36,020 --> 00:55:38,290 Right now we're using version 4. 991 00:55:38,290 --> 00:55:41,060 There really hasn't been a version 5, we're just jumping right to 6. 992 00:55:41,060 --> 00:55:46,760 Version 6 is going to use 128 bits for IP addresses, which is freaking huge.
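To make the arithmetic concrete, here is a small sketch of the address counts and of how a dotted w.x.y.z address maps onto one 32-bit number. The helper name dotted_to_int is hypothetical, and 192.0.2.1 is just an arbitrary example address reserved for documentation, not one from the lecture.

```python
import ipaddress

ipv4_total = 2 ** 32    # 32-bit addresses: 4294967296, roughly 4 billion
ipv6_total = 2 ** 128   # 128-bit addresses
print(ipv4_total)
print(ipv6_total)


def dotted_to_int(dotted):
    """Pack the four 0-255 fields of w.x.y.z into a single 32-bit integer."""
    fields = [int(part) for part in dotted.split(".")]
    if len(fields) != 4 or any(f < 0 or f > 255 for f in fields):
        raise ValueError(f"not a dotted-quad address: {dotted!r}")
    value = 0
    for field in fields:
        value = value * 256 + field  # each field contributes 8 bits
    return value


example = "192.0.2.1"  # arbitrary example address (an assumption)
print(dotted_to_int(example))
# The standard library's ipaddress module performs the same conversion.
print(int(ipaddress.ip_address(example)))
```

Both of the last two lines print the same integer, which is one way to see that the four numbers are just a human-friendly spelling of 32 bits.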
993 00:55:46,760 --> 00:55:49,430 We should not run out for quite some time now, 994 00:55:49,430 --> 00:55:52,980 but we have begun to run out of version 4 IP addresses, 995 00:55:52,980 --> 00:55:56,110 because all of us have not only things like laptops and desktops, 996 00:55:56,110 --> 00:55:58,700 a lot of us have phones, a lot of us have other devices 997 00:55:58,700 --> 00:56:01,600 like TiVo and the like that have IP addresses themselves. 998 00:56:01,600 --> 00:56:03,720 Harvard itself has tens of thousands of computers. 999 00:56:03,720 --> 00:56:07,970 So the world is genuinely running out of IP addresses, at least of this form. 1000 00:56:07,970 --> 00:56:10,340 So over the next few years, you are going to see the addresses 1001 00:56:10,340 --> 00:56:12,870 on your own computers probably slowly change 1002 00:56:12,870 --> 00:56:16,740 as more and more companies and universities start to support the newer version. 1003 00:56:16,740 --> 00:56:22,770 But an IP address is not sufficient for computer a to request data from computer b. 1004 00:56:22,770 --> 00:56:24,950 Because computer b could be a server, 1005 00:56:24,950 --> 00:56:27,600 and a server, as I mentioned earlier, can do bunches of things. 1006 00:56:27,600 --> 00:56:29,940 It can host web pages, it can be an email server, 1007 00:56:29,940 --> 00:56:32,310 it can be a Skype server, it can be a G chat server. 1008 00:56:32,310 --> 00:56:35,870 >> All these different services that can be provided on a server 1009 00:56:35,870 --> 00:56:38,330 could all, physically, be on the same machine. 1010 00:56:38,330 --> 00:56:40,380 So in addition to IP addresses, 1011 00:56:40,380 --> 00:56:43,250 the world has things called ports on the Internet. 1012 00:56:43,250 --> 00:56:47,830 A port is just a number; so there is a unique number for HTTP. 1013 00:56:47,830 --> 00:56:50,280 Its number is 80. 1014 00:56:50,280 --> 00:56:55,870 HTTP also uses number 443, but more specifically, for encrypted HTTPS. 
1015 00:56:55,870 --> 00:57:00,030 Whenever you see the s, for secure, that's using a different number. 1016 00:57:00,030 --> 00:57:06,580 There are other numbers, like 25, used for something called SMTP, otherwise known as email. 1017 00:57:06,580 --> 00:57:09,620 There's something called 22 for SSH, 1018 00:57:09,620 --> 00:57:11,850 and there's a whole bunch of other ports out there. 1019 00:57:11,850 --> 00:57:14,460 Now, we humans rarely see these numbers. 1020 00:57:14,460 --> 00:57:21,970 However, when you type in an address like http://www.facebook.com, 1021 00:57:21,970 --> 00:57:26,560 the browser is secretly inserting 80, because you're using HTTP. 1022 00:57:26,560 --> 00:57:30,630 If you, instead, type HTTPS, it's secretly inserting 443. 1023 00:57:30,630 --> 00:57:35,180 And we can kind of see this manually if I pull up a browser 1024 00:57:35,180 --> 00:57:41,850 and go to http://www.facebook.com:80. 1025 00:57:41,850 --> 00:57:44,550 Therefore explicitly citing not just the name of the website 1026 00:57:44,550 --> 00:57:47,650 but the port that I want to talk to, and hit Enter. 1027 00:57:47,650 --> 00:57:50,170 Notice it disappears, because the browser assumes, 1028 00:57:50,170 --> 00:57:53,360 oh, 80, I'm not even going to bother showing that to you. 1029 00:57:53,360 --> 00:57:56,400 But the reason for this is that if I actually wanted to send someone an email, 1030 00:57:56,400 --> 00:58:02,340 I would really be sending it to them on port 25, that being SMTP. 1031 00:58:02,340 --> 00:58:04,890 A bit of an oversimplification, but some of you have friends 1032 00:58:04,890 --> 00:58:09,290 who actually work at Facebook, and they, similarly, have servers that receive email. 1033 00:58:09,290 --> 00:58:12,610 >> Any time you send an email, what Gmail is doing for you 1034 00:58:12,610 --> 00:58:14,960 or Outlook or whatever program you use, 1035 00:58:14,960 --> 00:58:19,270 it's sort of secretly inserting that number as well, 25 in that case.
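The "secretly inserting" step can be sketched in a few lines. This is only an approximation of what a browser does: the DEFAULT_PORTS table and the helper name effective_port are assumptions written for this example, using the two port numbers quoted above.

```python
from urllib.parse import urlsplit

# Default ports quoted in the lecture; this hand-written table is an
# assumption for illustration, not something a browser exposes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit :port suffix
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.facebook.com"))      # no port typed
print(effective_port("https://www.facebook.com"))     # no port typed
print(effective_port("http://www.facebook.com:80"))   # port typed explicitly
```

The first and third URLs resolve to the same port, which is why the browser feels free to hide the :80 after you press Enter.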
1036 00:58:19,270 --> 00:58:24,490 It's this combination of IP address and number that uniquely identifies 1037 00:58:24,490 --> 00:58:29,190 a computer on the Internet and a specific service on that computer. 1038 00:58:29,190 --> 00:58:33,460 Now, of course, most of us have probably never typed manually an IP address. 1039 00:58:33,460 --> 00:58:37,340 Maybe you have in the appliance, but in the real world, not so much. 1040 00:58:37,340 --> 00:58:42,750 Why do we not type IP addresses into browsers? 1041 00:58:42,750 --> 00:58:45,860 It would work, in fact, we can see this; let me show you 1042 00:58:45,860 --> 00:58:50,000 one other command that should work most anywhere on Harvard's campus on a Mac or a PC. 1043 00:58:50,000 --> 00:58:53,970 There's this command called nslookup, name server lookup. 1044 00:58:53,970 --> 00:58:59,960 If I look up www.cnn.com, it turns out that CNN has--oh, interesting. 1045 00:58:59,960 --> 00:59:03,180 CNN has started using Amazon web services. 1046 00:59:03,180 --> 00:59:06,380 You might know of cloud computing; Amazon's one of the big players in cloud computing. 1047 00:59:06,380 --> 00:59:10,240 What I just did was I said, "Give me the address of CNN's web server," 1048 00:59:10,240 --> 00:59:14,090 but it turns out that CNN's web server is managed by Amazon, 1049 00:59:14,090 --> 00:59:16,030 Amazon web services, this suggests. 1050 00:59:16,030 --> 00:59:19,680 And the address of that server is this here. 1051 00:59:19,680 --> 00:59:22,350 So I'm not sure if this will work, because they didn't used to use Amazon. 1052 00:59:22,350 --> 00:59:32,830 But let's try this; http://, IP address, Enter, and-- 1053 00:59:32,830 --> 00:59:35,690 is it going to work? 1054 00:59:35,690 --> 00:59:39,280 Yes. It is going to work. Internet is super slow today. 1055 00:59:39,280 --> 00:59:43,680 But, in a moment, you will see some news story. 1056 00:59:43,680 --> 00:59:48,360 There we go. Bank of America's being sued. All right. 
1057 00:59:48,360 --> 00:59:54,000 >> This is because this IP address just happens to be synonymous with www.cnn.com. 1058 00:59:54,000 --> 00:59:59,920 Of course, it would be horrible marketing to say, visit us on the Web at 50.112.94.127. 1059 00:59:59,920 --> 01:00:02,370 You'd never remember. So even these days you might recall things 1060 01:00:02,370 --> 01:00:07,210 like 1-800-COLLECT or mnemonics the world came up with for phone numbers. 1061 01:00:07,210 --> 01:00:09,540 Which, before cell phones, were rather hard to remember 1062 01:00:09,540 --> 01:00:11,800 until you could just type it in and forget about it. 1063 01:00:11,800 --> 01:00:15,730 So the Web, too, has this convention of names and IP addresses, 1064 01:00:15,730 --> 01:00:17,770 and there are these things out there called DNS servers, 1065 01:00:17,770 --> 01:00:23,870 Domain Name System servers, that translate IP addresses into names and vice versa. 1066 01:00:23,870 --> 01:00:26,340 So that's what's going on underneath the hood. 1067 01:00:26,340 --> 01:00:29,540 In the end, we have TCP/IP, which is this very low-level protocol 1068 01:00:29,540 --> 01:00:32,570 that, really, just gets 0's and 1's across the Internet, 1069 01:00:32,570 --> 01:00:36,030 and it does so by putting them into a virtual envelope, 1070 01:00:36,030 --> 01:00:38,820 if you will, and writing on the outside of the envelope 1071 01:00:38,820 --> 01:00:43,930 the IP address of the destination, as well as the numeric port number 1072 01:00:43,930 --> 01:00:47,520 of the service on that destination that it wants to talk to.
1073 01:00:47,520 --> 01:00:51,060 Meanwhile, on the envelope there's also something known as a return address, 1074 01:00:51,060 --> 01:00:55,600 which is your IP address, so that when CNN gets a packet of information from you, 1075 01:00:55,600 --> 01:00:58,710 opens this virtual envelope, sees that you want the home page, 1076 01:00:58,710 --> 01:01:04,630 it knows from the sender part of this virtual envelope whom to send the HTML back to. 1077 01:01:04,630 --> 01:01:07,470 So let's take a look at this in a little more detail. 1078 01:01:07,470 --> 01:01:11,370 This is from a company called Ericsson, from a few years back. 1079 01:01:11,370 --> 01:01:14,780 And they took some liberties with how the Internet actually works, 1080 01:01:14,780 --> 01:01:18,920 but it paints a much more visual picture than mere chalk up here. 1081 01:01:18,920 --> 01:01:26,690 So I give you "A Bit of the Internet." 1082 01:02:26,660 --> 01:02:29,840 >> [Narrator] For the first time in history, 1083 01:02:29,840 --> 01:02:35,260 people and machinery are working together, realizing a dream. 1084 01:02:35,260 --> 01:02:38,910 A uniting force that knows no geographical boundaries. 1085 01:02:38,910 --> 01:02:43,230 Without regard to race, creed, or color. 1086 01:02:43,230 --> 01:02:47,770 A new era where communication truly brings people together. 1087 01:02:47,770 --> 01:02:50,070 This is 1088 01:02:50,070 --> 01:02:54,980 The Dawn of the Net. 1089 01:02:54,980 --> 01:03:04,640 Want to know how it works? Click here to begin your journey into the Net. 1090 01:03:04,640 --> 01:03:07,890 Now, exactly what happened when you clicked on that link? 1091 01:03:07,890 --> 01:03:10,150 You started a flow of information. 1092 01:03:10,150 --> 01:03:13,310 This information travels down into your own personal mailroom 1093 01:03:13,310 --> 01:03:18,500 where Mr. IP packages it, labels it, and sends it on its way. 1094 01:03:18,500 --> 01:03:20,960 Each packet is limited in its size.
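The size limit on each packet can be sketched as follows. This is a simplification, assuming a made-up limit and a sequence number on each piece so the receiver can put them back in order; the function names are hypothetical and real networks do considerably more.

```python
def split_into_packets(data, max_size):
    """Cut data into numbered chunks of at most max_size bytes each."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [
        (sequence, data[start:start + max_size])
        for sequence, start in enumerate(range(0, len(data), max_size))
    ]


def reassemble(packets):
    """Put the chunks back together, whatever order they arrived in."""
    return b"".join(chunk for _, chunk in sorted(packets))


packets = split_into_packets(b"hello, world", 5)
```

Because each chunk carries its own number, reassemble still recovers the original bytes if the packets arrive out of order.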
1095 01:03:20,960 --> 01:03:23,880 The mail room must decide how to divide the information 1096 01:03:23,880 --> 01:03:26,070 and how to package it. 1097 01:03:26,070 --> 01:03:29,550 Now, the package needs a label containing important information 1098 01:03:29,550 --> 01:03:35,570 such as sender's address, receiver's address, and the type of packet it is. 1099 01:03:51,700 --> 01:03:54,980 Because this particular packet is going out onto the Internet, 1100 01:03:54,980 --> 01:03:57,720 it also gets an address for the proxy server, 1101 01:03:57,720 --> 01:04:01,520 which has a special function, as we'll see later. 1102 01:04:01,520 --> 01:04:06,650 The packet is now launched onto your local area network, or LAN. 1103 01:04:06,650 --> 01:04:10,160 This network is used to connect all the local computers, 1104 01:04:10,160 --> 01:04:15,900 routers, printers, et cetera, for information exchange within the physical walls of the building. 1105 01:04:15,900 --> 01:04:20,290 The LAN is a pretty uncontrolled place, and, unfortunately, 1106 01:04:20,290 --> 01:04:23,950 accidents can happen. 1107 01:04:31,190 --> 01:04:34,710 The highway of the LAN is packed with all types of information. 1108 01:04:34,710 --> 01:04:38,900 These are IP packets, Novell packets, AppleTalk packets. 1109 01:04:38,900 --> 01:04:41,270 They're going against traffic, as usual. 1110 01:04:41,270 --> 01:04:44,260 The local router reads the address and, if necessary, 1111 01:04:44,260 --> 01:04:48,520 lifts the packet on to another network. 1112 01:04:48,520 --> 01:04:54,270 Ah, the router. A symbol of control in a seemingly disorganized world. 1113 01:04:54,270 --> 01:05:05,480 [Router mumbling and talking to itself] 1114 01:05:05,480 --> 01:05:10,030 >> [Narrator] There he is, systematic, uncaring, methodical, 1115 01:05:10,030 --> 01:05:14,150 conservative, and sometimes not quite up to speed. 1116 01:05:14,150 --> 01:05:17,680 But at least he is exact, for the most part.
1117 01:05:32,270 --> 01:05:36,820 As the packets leave the router, they make their way into the corporate Intranet 1118 01:05:36,820 --> 01:05:40,830 and head for the router switch. 1119 01:05:40,830 --> 01:05:46,250 A bit more efficient than the router, the router switch plays fast and loose with IP packets, 1120 01:05:46,250 --> 01:05:48,920 deftly routing them along their way. 1121 01:05:48,920 --> 01:05:52,130 A digital "pinball wizard," if you will. 1122 01:05:52,130 --> 01:06:04,270 [Router switch talking to itself] 1123 01:06:09,830 --> 01:06:12,150 [Narrator] As packets arrive at their destination, 1124 01:06:12,150 --> 01:06:14,740 they're picked up by the network interface, 1125 01:06:14,740 --> 01:06:18,040 ready to be sent to the next level. 1126 01:06:18,040 --> 01:06:21,010 In this case, the proxy. 1127 01:06:21,010 --> 01:06:25,040 The proxy is used by many companies as sort of a middle man 1128 01:06:25,040 --> 01:06:27,630 in order to lessen the load on the Internet connection 1129 01:06:27,630 --> 01:06:32,240 and for security reasons, as well. 1130 01:06:32,240 --> 01:06:38,750 As you can see, the packets are all of various sizes depending upon their content. 1131 01:06:55,210 --> 01:07:01,890 The proxy opens the packet and looks for the web address or URL. 1132 01:07:01,890 --> 01:07:04,950 Depending upon whether the address is acceptable, 1133 01:07:04,950 --> 01:07:08,000 the packet is sent on to the Internet. 1134 01:07:13,890 --> 01:07:19,630 There are, however, some addresses which do not meet with the approval of the proxy. 1135 01:07:19,630 --> 01:07:25,680 That is to say, corporate or management guidelines. 1136 01:07:25,680 --> 01:07:30,580 These are summarily dealt with. 1137 01:07:30,580 --> 01:07:32,410 We'll have none of that. 1138 01:07:32,410 --> 01:07:36,350 For those who make it, it's on the road again. 1139 01:07:46,850 --> 01:07:53,310 >> Next up, the firewall.
1140 01:07:53,310 --> 01:07:57,410 The corporate firewall serves two purposes. 1141 01:07:57,410 --> 01:08:02,420 It prevents some rather nasty things from the Internet from coming in to the Intranet, 1142 01:08:02,420 --> 01:08:10,280 and it can also prevent sensitive corporate information from being sent out onto the Internet. 1143 01:08:10,280 --> 01:08:12,980 Once through the firewall, a router picks up the packet 1144 01:08:12,980 --> 01:08:18,180 and places it onto a much narrower road, or bandwidth, as we say. 1145 01:08:18,180 --> 01:08:23,720 Obviously, the road is not broad enough to take them all. 1146 01:08:23,720 --> 01:08:29,319 Now, you might wonder what happens to all those packets which don't make it along the way. 1147 01:08:29,319 --> 01:08:32,270 Well, when Mr. IP doesn't receive an acknowledgement 1148 01:08:32,270 --> 01:08:35,000 that a packet has been received in due time, 1149 01:08:35,000 --> 01:08:39,890 he simply sends a replacement packet. 1150 01:08:39,890 --> 01:08:44,760 We are now ready to enter the world of the Internet. 1151 01:08:44,760 --> 01:08:49,370 A spiderweb of interconnected networks which span our entire globe. 1152 01:08:49,370 --> 01:08:56,050 Here, routers and switches establish links between networks. 1153 01:08:56,050 --> 01:08:59,200 Now, the Net is an entirely different environment than you'll find 1154 01:08:59,200 --> 01:09:01,569 within the protective walls of your LAN. 1155 01:09:01,569 --> 01:09:04,060 Out here, it's the Wild West. 1156 01:09:04,060 --> 01:09:06,359 Plenty of space, plenty of opportunities, 1157 01:09:06,359 --> 01:09:09,760 plenty of things to explore and places to go. 1158 01:09:09,760 --> 01:09:12,760 Thanks to very little control and regulation, 1159 01:09:12,760 --> 01:09:18,300 new ideas find fertile soil to push the envelope of their possibilities. 1160 01:09:18,300 --> 01:09:22,330 But because of this freedom, certain dangers also lurk. 
1161 01:09:22,330 --> 01:09:27,000 You'll never know when you'll meet the dreaded ping of death, 1162 01:09:27,000 --> 01:09:29,890 a special version of a normal request ping, 1163 01:09:29,890 --> 01:09:35,720 which some idiot thought up to mess up unsuspecting hosts. 1164 01:09:35,720 --> 01:09:39,130 The path our packets take may be via satellite, 1165 01:09:39,130 --> 01:09:43,090 telephone lines, wireless, or even transoceanic cable. 1166 01:09:43,090 --> 01:09:46,520 They don't always take the fastest or shortest routes possible, 1167 01:09:46,520 --> 01:09:50,290 but they will get there eventually. 1168 01:09:50,290 --> 01:09:55,230 Maybe that's why it's sometimes called "The World Wide Wait." 1169 01:09:55,230 --> 01:09:57,980 But when everything is working smoothly, 1170 01:09:57,980 --> 01:10:03,800 you can circumvent the globe five times over at the drop of a hat, literally. 1171 01:10:03,800 --> 01:10:08,230 And all for the cost of a local call or less. 1172 01:10:08,230 --> 01:10:15,070 Near the end of our destination, we'll find another firewall. 1173 01:10:15,070 --> 01:10:18,420 >> Depending upon your perspective as a data packet, 1174 01:10:18,420 --> 01:10:23,730 the firewall could be a bastion of security or a dreaded adversary. 1175 01:10:23,730 --> 01:10:28,530 It all depends on which side you're on and what your intentions are. 1176 01:10:28,530 --> 01:10:34,990 The firewall is designed to let in only those packets that meet its criteria. 1177 01:10:34,990 --> 01:10:39,360 This firewall is operating on ports 80 and 25. 1178 01:10:39,360 --> 01:10:46,630 All attempts to enter through other ports are closed for business. 1179 01:10:57,660 --> 01:11:03,480 Port 25 is used for mail packets, 1180 01:11:03,480 --> 01:11:10,720 while port 80 is the entrance for packets from the Internet to the web server. 1181 01:11:10,720 --> 01:11:15,080 Inside the firewall, packets are screened more thoroughly. 
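The port rule the narrator just described can be sketched in Python. This is a toy model under stated assumptions, not a real firewall: the OPEN_PORTS table and the admit and deliver helpers are hypothetical names, and only the two port numbers come from the film.

```python
# Ports this firewall leaves open, and the service behind each one.
OPEN_PORTS = {
    25: "mail server",  # mail packets
    80: "web server",   # packets from the Internet to the web server
}


def admit(dst_port):
    """Return True only if the packet is headed for an open port."""
    return dst_port in OPEN_PORTS


def deliver(dst_port):
    """Name the service that receives the packet, or None if it is dropped."""
    return OPEN_PORTS.get(dst_port)
```

A packet addressed to port 80 is handed to the web server, while one addressed to any port outside the table is turned away.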
1182 01:11:15,080 --> 01:11:17,970 Some packets make it easily through customs, 1183 01:11:17,970 --> 01:11:21,420 while others look just a bit dubious. 1184 01:11:21,420 --> 01:11:24,060 Now, the firewall officer is not easily fooled, 1185 01:11:24,060 --> 01:11:32,120 such as when this ping of death packet tries to disguise itself as a normal ping packet. 1186 01:11:32,120 --> 01:11:37,520 [Firewall officer talking to packets] 1187 01:11:37,520 --> 01:11:40,510 [Narrator] For those packets lucky enough to make it this far, 1188 01:11:40,510 --> 01:11:45,730 the journey is almost over. 1189 01:11:45,730 --> 01:11:52,130 It's just a line up on the interface to be taken up into the web server. 1190 01:11:52,130 --> 01:11:55,440 Nowadays, a web server can run on many things, 1191 01:11:55,440 --> 01:11:59,230 from a mainframe to a web cam to the computer on your desk. 1192 01:11:59,230 --> 01:12:01,720 Why not your refrigerator? 1193 01:12:01,720 --> 01:12:04,870 With the proper setup, you can find out if you have the makings 1194 01:12:04,870 --> 01:12:08,390 for Chicken Cacciatore, or if you have to go shopping. 1195 01:12:08,390 --> 01:12:11,760 Remember, this is the dawn of the Net. 1196 01:12:11,760 --> 01:12:17,310 Almost anything is possible. 1197 01:12:17,310 --> 01:12:20,440 One by one, the packets are received, 1198 01:12:20,440 --> 01:12:26,320 opened, and unpacked. 1199 01:12:26,320 --> 01:12:31,200 The information they contain, that is, your request for information, 1200 01:12:31,200 --> 01:12:34,830 is sent on to the web server application. 1201 01:12:41,540 --> 01:12:47,140 The packet itself is recycled, 1202 01:12:47,140 --> 01:12:57,570 ready to be used again, and filled with your requested information, 1203 01:12:57,570 --> 01:13:03,340 addressed, and sent out on its way back to you. 1204 01:13:03,340 --> 01:13:13,250 Back past the firewall, routers, and on through to the Internet. 
1205 01:13:13,250 --> 01:13:21,020 Back through your corporate firewall 1206 01:13:21,020 --> 01:13:24,180 and onto your interface, 1207 01:13:24,180 --> 01:13:31,180 ready to supply your web browser with the information you've requested. 1208 01:13:31,180 --> 01:13:39,840 That is, this film. 1209 01:13:39,840 --> 01:13:43,550 Pleased with their efforts, and trusting in a better world, 1210 01:13:43,550 --> 01:13:50,250 our trusty data packets ride off blissfully into the sunset of another day, 1211 01:13:50,250 --> 01:13:56,880 knowing fully they have served their masters well. 1212 01:13:56,880 --> 01:14:02,560 Now, isn't that a happy ending? 1213 01:14:02,560 --> 01:14:07,040 [Malan] Okay, that's enough. We'll see you next week. 1214 01:14:07,040 --> 01:14:10,040 [CS50.TV]