[MUSIC PLAYING] SPEAKER 1: All right, welcome back to CS50. This is the end of week eight, and almost Halloween. Tomorrow night's office hours will be the scariest ones yet, and not because of Halloween. But on that note, do realize that problem set six, the spell checking problem set, is renowned to be, for many students, the most challenging, certainly among the C problem sets, and really, in general. And I mention this only because this is the week where a lot of people get particularly stressed with just trying to get the damn spell checker to work. And the one thing I would encourage you is that, as you'll see today, and on Monday, we begin to hit this peak this week where, now, things become a little more familiar, a little more accessible, as we transition from a command line environment in C to a web based environment in PHP. And so I'd encourage you, even if you're really at your wit's end in trying to get the p set to work, if that's indeed the place you're at, or find yourself at, do try to power through it. Because I do think you'll be quite pleased, and quite proud of yourself, if you really end that portion of the course, the C portion, on that high, if stressful, note. So that's not to scare. That's just meant to encourage you to stay up that extra hour in order to get the spell checking working. And if you do, realize that this is optional, entirely. But we have the so-called big board that went live this morning. As of this morning, I was atop the big board, which is a measurement of how much RAM and how much running time your program speller requires. But I've since been displaced. I'm now the unlucky number 13. And what you'll see here is, David Kaufman, and Lauren, and Adam, and Jason, and others are now atop the big board. If you look over there at the right, all of us have really good implementations of size at least-- returning the number of words in the dictionary. 
And in each of these columns, you'll see how much RAM each of our implementations is using, how much running time it's taking to execute load, versus check, versus size and unload, and then, the total running time. So just to reassure Elmer, and Patrick, and Linda, and everyone else who comes after you, there's absolutely no shame in being toward the bottom of the big board. If anything, that means you got it working, and it's correct, but it's not necessarily as efficient, space or time-wise, as it might be. So, totally optional. But it's meant to be a carrot of sorts so that when you're working on your p set, you're so proud of yourself, you got it working, you post to the big board, you've got a really good number, you go to dinner, you come back, and your roommate has edged you out on the big board. Well, it's time, at that point, to go back to the drawing board so as to re-challenge the big board. If you look at the spec, the instructions for interfacing with the big board are now posted. So a couple of heads-ups-- one, the pre-proposal for the final project is due this coming Monday. See the spec on the course's website for what that means. It's really just a casual but thought-provoking email between you and your TF, really just to get things started, the conversation started, even though most of you have never even written a web page before, don't even know what you might, how you might, implement your final project. Go on faith that you'll know how to do quite a few more things in a few weeks. So just begin this process, per the spec, of exploring possible ideas. Also, what we'd invite you to do is-- we have a tradition, for many years now, in the course, of hosting this-- store.cs50.net. Everything's sold at cost. And it's really just an opportunity to wear CS50, if you would like to do that, at course's end. For instance, there are such things as the t-shirts that you might have seen going around campus, sweatshirts.
And then, we also invite students to submit designs to be immortalized in the CS50 store. For instance, one of last year's favorites that will, perhaps, now resonate with you is this one here. Very popular item. So if you would like to participate in this, we'll put up a form soon, at cs50.net/design, to which you can upload an image that you've made in Illustrator, or Photoshop, or some similar program. And if you're familiar with these kinds of specifications, we want it to be a PNG image, at least 200 dots per inch, and fewer than that many pixels, and under 10 megabytes. For more details, just email the course's heads at heads@cs50.net if you would like to partake in this. All right, so today, no more C. So we begin to pull back the layers of the internet, the web, and how you can actually start writing software for this different environment. So in particular, let's ask, first, the question of-- let me get us to our familiar drawing app over here. Let me pose the question of, how does the internet work? [? STUDENT: Magic. ?] SPEAKER 1: Magic. OK. Good answer. So we'll start there today, and see if we can't make it a little less magical within the hour. Let's try to tell it in the context of a story. So you're fans of going to facebook.com, or reddit.com, or whatever these days. And so what's really happening when you type in something like facebook.com, and hit Enter, in Chrome, or Firefox, or IE, or Safari, or whatever browser you're actually using? Can we tell this story, maybe sentence by sentence? What's one of the first things that happens when you hit Enter, after typing facebook.com? [? STUDENT: Your ?] computer makes an HTTP request. SPEAKER 1: OK. So your computer makes-- we'll call it-- an HTTP request. Now what does that mean? Well, all of us have probably seen or typed, for years now, H-T-T-P, often followed by colon, slash, slash. So what is that? Well, HTTP is HyperText Transfer Protocol.
And that's just a fancy way of saying, it's the language that web browsers, like Chrome and others, and web servers, like facebook.com, speak to one another. And it's a fairly simple, English-oriented language. It's almost like pseudocode. And it's a way of a client, as we'll call it-- a browser-- communicating with the server. And just like in a restaurant, when you, the client, sit down at a table and then order something off of the menu from the server, that server's going to bring you back something, whatever it is you requested. Same in the computer world. A browser-- a client-- is going to make a request, and then, hopefully, get back something from the server. And that something is, at a high level, the web page. At a slightly lower level, it's a file written in another language called HTML-- HyperText Markup Language. But more on that in just a moment. So HyperText Transfer Protocol-- HTTP-- that's the protocol that browser and server use. Well, what is a protocol, exactly? Well, you can think of it as a language. But if I reach out to our audience here, a normal thing for us humans to do is, when we greet someone, I say, hi, my name is David. [? STUDENT: Hi, ?] my name is Dipty. SPEAKER 1: "Hi, my name is Dipty," she replies. And so we've had this fairly arbitrary interaction of shaking hands, as is often the human convention in most countries. And that's a protocol, right? I sort of initiated it by extending my hand, rather awkwardly, on the stage of Sanders here. She realized, oh, I've gotten a request for a hand apparently. And so she responded to that request by actually acknowledging it. An acknowledgment, ACK, is actually a term very common in the world of networking, for a server acknowledging the client. Then, we sort of completed that transaction, and awkwardness over. So that's really what's happening underneath the hood as well. Let me do this a little more technically under the hood. I'm going to go over here to a terminal window.
This terminal window happens to be on my Mac, but you could do the same kind of thing in the CS50 Appliance. And I'm actually going to use a program that we won't really use for much at all this semester. But it's called Telnet. Back in the day, Telnet was the program that you used to connect to a remote server, to check your mail or to do something like that. For now, we're going to use this old school program, Telnet, to pretend to be a browser. And I'm going to go ahead and do the following-- let me increase my font size. And I'm going to say, Telnet to the server called www.facebook.com, but specifically, Telnet to port 80. We'll come back to this. But for now, know that most services on the internet are identified uniquely by some number. In this case, it's 80. Now most of you have probably never typed 80 before. But in reality, if I go to a browser and pull up, for instance, http://www.facebook.com/-- that's auto-complete, that's not my history-- all right, so now, we go to colon 80 slash. So I claim that even though you've probably never typed this before, with the colon 80 after facebook.com, hopefully, it's still going to work. And indeed, it goes to facebook.com. So it turns out that 80 has been implicit. None of us humans have had to type that for years. Because browsers, by default, just assume that the number you want to use when calling up a server, so to speak, is, in fact, 80. Because long story short, servers can do way more than just serve up web pages. They can respond to instant messages. They can send emails. There are lots of services that can run on a single server. So these numbers-- in this case, 80-- uniquely identify one of those services, which is HTTP, the web protocol that a server might actually support. But I can simulate this request now, textually, using this old school Telnet program.
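To make the "80 has been implicit" point concrete, here is a minimal Python sketch, not anything a browser literally runs, that works out which port a client would connect to for a given URL. The names `DEFAULT_PORTS` and `effective_port` are hypothetical; `urllib.parse.urlsplit` is the standard-library URL parser, and the 443 entry for HTTPS is the number mentioned later in this lecture.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: the ports a client assumes when a URL omits one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when there is no explicit ":<number>" after the host.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.facebook.com/"))     # 80 (implicit)
print(effective_port("http://www.facebook.com:80/"))  # 80 (explicit)
```

Typing `:80` explicitly and leaving it out resolve to the same port, which is why both URLs in the demo reach the same page.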
So I'm going to essentially now pretend to be a browser and speak HTTP by sending, with my keyboard, exactly the commands that Chrome just knew how to send for me magically. So I'm going to go ahead and hit Enter. Notice that it's trying 31.13.69.32. What is that? So it's an IP address. Now even if you're not too familiar with the intricacies of those, you probably have a general sense that these things exist. And an IP address-- Internet Protocol address-- is just a unique identifier for a computer on the internet. This is a bit of an oversimplification for the moment. But every computer on the internet has a unique IP address, much like every house in, say, the US has a unique postal address, something like 123 Main Street, in Anytown, USA. So something like that. And that, too, is an oversimplification. But these addresses that we have in the postal world and these addresses that we have in the computer world uniquely identify servers so that when you send a message to them over the internet, or when you put a letter in an old school mailbox-- postal mail-- the service knows how to get that request, or that letter, to the intended recipient. Now my computer, somehow, has just figured out that Facebook's unique IP is 31.13.69.32. In fact, that can probably change. Facebook probably has multiple IP addresses, because they absolutely have more than one server. But that's happened for us magically. In fact, the internal secret name of the server I've apparently connected to is star.c10r.facebook.com, whatever that is. It's just whatever the system administrator at Facebook decided to call this particular server that I was somewhat randomly sent to. So now if my connection hasn't timed out, I'm going to pretend to be that browser. I'm going to say get space forward slash space. And I'm going to pretend to be speaking HTTP version 1.1, which is the one that most browsers use.
And I'm specifically going to mention to the server, by the way, I want the website known to the world as facebook.com. Enter, Enter. And now, notice what's happened. The server, the waiter, has responded to my order, or my request, with another textual message. Now again, in the world of browsers like Chrome and Safari, you wouldn't see this, as the human. Microsoft and Google just hide these details from us. But Facebook has responded with an answer, also in the language HTTP. Notice there's a code here, 302, which actually has special significance by convention. Found, so that's at least promising. But apparently Facebook is telling me, mm-mm, you don't want what you asked for. You instead want today's special, which is facebook.com/unsupportedbrowser. So at a high level, what does Facebook appear to be doing here? It's redirecting me. So Facebook doesn't like the fact that I'm pretending to be this other browser. And so it's redirecting me to some website. I'm actually curious, now, what this thing looks like. Let me go over to that in Chrome so we can see what they want me to see. So now they've actually sent me back to Facebook because they've realized, oh, you do have a supported browser. We're not even going to show you that page. So let's go ahead and see if we can't fix this. I'm going to have to cheat a little bit. And more on this in the weeks to come. But I'm going to do one thing here. And I'll explain this before long. Give me just a moment to cheat, and wow you. So let me get this. OK. I'll explain what I'm doing in just a moment. I'm going to go ahead and cancel this connection, and try this again. Get slash HTTP 1.1 host www.facebook.com user-agent. OK. Now I have pretended to be Chrome. So it turns out that when a browser sends a request to a server, it's just the honor system. If I say I'm Chrome, Facebook will assume I'm Chrome. And the means by which I identified myself as Chrome is by this atrociously long string. 
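What gets typed into Telnet here can be sketched as a tiny Python function that builds the same request text. This is a simplified illustration, assuming only the three pieces typed in the demo (the request line, `Host`, and an optional `User-Agent`); `build_get_request` is a hypothetical name, and real browsers send many more headers.

```python
def build_get_request(host, path="/", user_agent=None):
    """Return the bytes a client sends for a simple HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if user_agent is not None:
        # The honor system: the server takes this string at face value.
        lines.append(f"User-Agent: {user_agent}")
    # HTTP specifies CR LF line endings, and an empty line ends the headers;
    # that empty line corresponds to the second "Enter" in the Telnet demo.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_get_request("www.facebook.com"))
# b'GET / HTTP/1.1\r\nHost: www.facebook.com\r\n\r\n'
```

Sending it over the network would look roughly like the commented lines below (not run here, since they need network access, and whatever the server sends back may differ from what was shown in lecture):

```python
# import socket
# with socket.create_connection(("www.mit.edu", 80)) as s:
#     s.sendall(build_get_request("www.mit.edu"))
#     print(s.recv(4096).decode("latin-1"))
```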
Essentially, all the browser manufacturers in the world have decided, well, this version of this browser on this operating system will have a user-agent string that looks like that crazy mess there. And Mozilla is in there for historical reasons. But notice how much information I'm leaking to facebook.com without even logging in. I'm telling Mark that it's a Mac that I'm using. I'm telling him that it's an Intel based Mac running Mac OS 10.8.5. As an aside, this information is going to every website that you visit with your browser. Pretty innocuous so far, but it gets a little juicier. Notice that, if we read far enough, I'm using Chrome version 30.0.1599.101. But now, notice that the response is not as bad as it was before. Where is Facebook telling me to go now? It's telling me, again, the website-- it's telling me it's moved permanently. Well, where the heck did Facebook go? Yeah, so it's a subtle difference. But notice, here, that the website has actually relocated to HTTPS. So long story short, this is one way that Facebook is enforcing that I actually end up at the secure version of their website, the one that's using encryption-- more complex than the encryption we talked about for p set two, but encryption nonetheless. Now at this point it gets hard for me to spoof their web request using Telnet. Because if they're telling me to use SSL-- the HTTPS prefix is what that implies-- if they're telling me to use cryptography, there's no way I'm going to manually encrypt my message in front of all of you here, and try to figure out how to do that. It's just going to get much more complex. But that's what the browser is doing for you. Let's see if we can't do this a little more simply, then, with a website that's not expecting us to be as secure. Let's go to, say, harvard.edu on port 80. Enter. All right, so get slash HTTP 1.1. And what does this first slash mean? Just to be clear, why do I keep typing that? 
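To see how much a user-agent string leaks, here is a small Python sketch that pulls the operating system and Chrome version out of one. The `UA` value is an assumed example of the general shape described above (Intel Mac, OS X 10.8.5, Chrome 30.0.1599.101), not the exact string pasted in the demo. `describe_user_agent` is a hypothetical helper, and its regular expressions only handle this one pattern.

```python
import re

# Assumed example string, shaped like the one described in lecture.
UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36")


def describe_user_agent(ua):
    """Extract the Mac OS X version and Chrome version, if present."""
    info = {}
    os_match = re.search(r"Mac OS X (\d+(?:_\d+)*)", ua)
    if os_match:
        # User-agent strings write the OS version with underscores.
        info["os"] = "Mac OS X " + os_match.group(1).replace("_", ".")
    chrome_match = re.search(r"Chrome/([\d.]+)", ua)
    if chrome_match:
        info["chrome"] = chrome_match.group(1)
    return info


print(describe_user_agent(UA))
# {'os': 'Mac OS X 10.8.5', 'chrome': '30.0.1599.101'}
```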
Well normally, when you type a URL-- and unfortunately, browsers usually hide this these days-- normally, when you go to harvard.edu, that URL officially does end in a slash. Because a single slash denotes what part of the hard drive? The root of the hard drive. We in the Appliance haven't really had to think about this, because we're always in John Harvard's folder. But his folder's in another folder. And that folder's in the root of the Appliance's hard drive, so to speak, even though it's virtual. So a single slash like this means the root of the hard drive. It's like C colon backslash, or it's the root of your volume, on Mac OS. But Chrome, and other browsers these days, have gotten user-friendly, and they hide that slash altogether. But that's all that means in my textual message-- give me the root of harvard.edu's homepage, that is, the default page itself. So let me go ahead and hit Enter. Let me remind the host that I want www.harvard.edu, just in case there are other websites living on the same physical server. OK. Harvard got a little impatient with me. So let's do this again, faster. Get slash HTTP 1.1 host www.harvard.edu user-agent-- I'm guessing our servers don't care as much about this-- Enter, Enter. Whew. Oh damn it, bad request. OK. So what's going on here-- hello, harvard.edu. Why is it doing the-- interesting. Oh, OK. So what Harvard's now doing-- and we're going to quickly veer off of this path, because it's going to get tedious quickly-- notice that Harvard is actually compressing its response to me, which isn't ideal. Because I, apparently, as a human, don't know how to decompress bits that have been sent to me compressed. And they're being shown as garbage there, because they're zeros and ones, but they're not ASCII characters. They're patterns of zeros and ones that have been compressed to take up less space. So very quickly, let me see if I can recover here. Let's try, maybe, another campus altogether.
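The lecture doesn't say which compression scheme Harvard's server used. gzip is a common one for HTTP responses, so this Python sketch assumes gzip to show why compressed bytes print as garbage but still decompress back to readable HTML. The `html` string here is made up for illustration.

```python
import gzip

# Made-up page; any HTML would do.
html = b"<html><head><title>Example</title></head><body>Hello</body></html>"
compressed = gzip.compress(html)

# gzip output starts with the bytes 0x1f 0x8b, which are not printable ASCII,
# so a human reading raw bytes in a terminal sees gibberish.
print(compressed[:2])                       # b'\x1f\x8b'
print(gzip.decompress(compressed) == html)  # True
```

A browser does the decompression step automatically. Someone typing into Telnet has no such help.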
mit.edu get slash HTTP slash 1.1 host www.mit.edu user-agent colon there. Thank you, MIT. OK. So here we have a web page. So this is the language known as HTML-- HyperText Markup Language. I'm simply scrolling back up in time to get to the very tip top of this page. And notice how MIT has responded to my request. 200 is good. 200 means everything is literally OK. And that's a status code that we humans really never see, in a good way. Because it means all is well. Notice that MIT is informing me, hey, the server we're running is called Apache, which is a very popular open source free web server. They're running, apparently, UNIX, which is an operating system like Linux. Notice that they apparently updated their web page at 4:00 a.m., Greenwich Mean Time. Notice a couple of other details. They're returning, to me, text/html. So we'll see what that means in just a moment. They've apparently given me 14,717 bytes worth of HTML. And some other, more esoteric information is in there. But this is where it gets interesting. This is how you make a web page. This is how you make a web page whose title in the tab, in your browser, is MIT hyphen Massachusetts Institute of Technology. And indeed, if we go back to Chrome and visit www.mit.edu, notice that, indeed, in the title up here, is MIT dash Massachusetts Institute dot, dot, dot. And now notice, too, if I right click or control click on the desktop here, and go to View Page Source-- at least in Chrome, though every browser does this via some means-- here is that same file. It happens to be color coded, or syntax highlighted. But just like with your C code that was not colorized by you, it was colorized by gedit, similarly is Chrome just making this prettier to read. But this is the stuff that we'll soon be writing. So that's the endgame. The server has responded with that information, just like you responded with your hand for our handshake. But what else has to be going on in between those steps? 
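Here is a Python sketch that splits a response like MIT's into its status code, reason phrase, and headers. `SAMPLE_200` is an abridged, hypothetical response that reuses only the fields mentioned above (200 OK, Apache, text/html, 14,717 bytes), and `SAMPLE_302` mimics the shape of Facebook's earlier redirect. Neither is a verbatim capture, and `parse_response_head` is a hypothetical helper that ignores the body and many edge cases.

```python
SAMPLE_200 = ("HTTP/1.1 200 OK\r\n"
              "Server: Apache\r\n"
              "Content-Type: text/html\r\n"
              "Content-Length: 14717\r\n"
              "\r\n"
              "<html>...</html>")

SAMPLE_302 = ("HTTP/1.1 302 Found\r\n"
              "Location: http://www.facebook.com/unsupportedbrowser\r\n"
              "\r\n")


def parse_response_head(raw):
    """Return (status code, reason phrase, headers) from a raw HTTP response."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


code, reason, headers = parse_response_head(SAMPLE_200)
print(code, reason, headers["content-length"])  # 200 OK 14717
```

For a redirect such as 302 or 301, the `location` header carries the URL that the browser should request next.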
Well, when I type in, in this last case, www.mit.edu and hit Enter, we know it's talking to port 80 automatically, port just being that number. But where did the IP address go? How is my computer figuring out what the IP address of mit.edu is? Well, it turns out, in this world, there are things called DNS servers. And let me go ahead and draw a quick picture over here. And this'll just sketch out, in rough terms, what's going on. So we'll pretend like this is my laptop here, in Sanders. And it has Wi-Fi, so it's connected wirelessly to something. What's it actually connected to? Well, somewhere in here, there's something on the wall with some antennas. And that's called an access point-- AP. Wireless access point, wireless router-- call it whatever you want. But they're all over campus, with those little antennas. Ours are made by Cisco, typically. And so somehow, my computer is talking to that wireless access point, somewhere here in Sanders, or downstairs, or outside. Meanwhile, this thing has a lot of physical wires going to, probably, the Science Center, which we'll draw like this. It doesn't actually look like that. That actually looks a lot better. So the Science Center has a whole bunch of computers inside of it that are somehow physically connected to all of these access points on campus. And those physical computers, we'll call routers, or gateways. A router, as its name suggests, has as its purpose in life to route information. It takes some bits, from a computer, as input, and figures out to where those bits should be sent. So in the case of my request for mit.edu, it's actually pretty easy. My request comes in from my browser, over Wi-Fi, to the access point, then, via some cable, into a router in the Science Center. And somehow, the router in the Science Center figures out that MIT is that way. And it's going to move those bits forward, it's going to route those bits, down the road, down Mass Ave., to MIT.
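The lecture leaves "somehow figures out that MIT is that way" abstract. One common technique real routers use is a forwarding table searched by longest-prefix match, and this Python sketch swaps that technique in for illustration. Every entry in `FORWARDING_TABLE` is hypothetical, built from private and documentation address ranges rather than Harvard's or MIT's actual ranges, and `next_hop` is a hypothetical name.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, where to send the bits).
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream ISP"),           # default route
    (ipaddress.ip_network("10.0.0.0/8"), "campus access points"),
    (ipaddress.ip_network("198.51.100.0/24"), "down Mass Ave toward MIT"),
]


def next_hop(dst_ip):
    """Pick the most specific (longest-prefix) route that contains dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in FORWARDING_TABLE if addr in net]
    # The default route 0.0.0.0/0 matches every IPv4 address, so matches is non-empty.
    _best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop


print(next_hop("198.51.100.7"))  # down Mass Ave toward MIT
print(next_hop("203.0.113.9"))   # upstream ISP
```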
But how did my computer know what the IP address even was? Well it turns out that somewhere in here there are servers-- and I'm going to draw it fairly abstractly-- called DNS servers-- Domain Name System. These are not routers. These are different types of servers whose purpose in life is to translate host names, like www.mit.edu, to IP addresses, like 1.2.3.4. So DNS servers do exactly that. You can think of them as having a big database, or really, like a big Excel file with two columns. One is host names, one is IP addresses. And they just convert one to the other, in either direction. Now in reality, it's a little more complex than that. But that's how my computer, my random Mac or PC on this table here, knows what the unique identifier is for www.mit.edu, or Facebook, or harvard.edu, for that matter. But of course, there's the entirety of Mass Ave here. And then, we get to MIT-- this drawing is actually more compelling. That'll be MIT. And so they, too, have some servers. And they somehow have a wired, or wireless, connection to Harvard. And of course, we can go much farther down the road than MIT, and talk to most any computer in the world. But let's see if we can't see that. Let me go back to my Terminal window for just a moment. And let's assume that I figured out what the IP address is for mit.edu, like Telnet figured it out before, and my browser can clearly figure it out for me. And I'm going to run another program, in this Terminal window, called traceroute, tracing the route from here-- literally, this table-- to www.mit.edu. Let's see what happens. Let me actually shrink the font size. Oop. No, I wanted to surprise you. OK. So here we go. Let me go ahead and run this here. And what I was seeing a moment ago, and we're seeing again now, is this output-- traceroute www.mit.edu. Notice, in the first line, this program indeed figured out that MIT's IP address is this number here. And now, what's going on between us and them?
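The "big Excel file with two columns" picture can be sketched as a toy Python class that looks up in either direction. This is only the mental model from the lecture, not how real DNS works (real DNS is distributed and hierarchical). `ToyDNS` is a hypothetical class. 1.2.3.4 is the lecture's placeholder, not MIT's real address, and 31.13.69.32 is simply what Telnet reported for Facebook during the demo, which may have changed since.

```python
class ToyDNS:
    """A two-column host-name/IP table that can be searched in either direction."""

    def __init__(self, records):
        # records maps host name -> IP address; DNS names are case-insensitive.
        self.by_name = {name.lower(): ip for name, ip in records.items()}
        self.by_ip = {ip: name for name, ip in self.by_name.items()}

    def lookup(self, hostname):
        return self.by_name.get(hostname.lower())

    def reverse_lookup(self, ip):
        return self.by_ip.get(ip)


dns = ToyDNS({"www.mit.edu": "1.2.3.4", "www.facebook.com": "31.13.69.32"})
print(dns.lookup("WWW.MIT.EDU"))          # 1.2.3.4
print(dns.reverse_lookup("31.13.69.32"))  # www.facebook.com
```

For a real lookup, Python's standard library offers `socket.gethostbyname("www.mit.edu")`, which asks the operating system's resolver and so needs network access.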
So this line here, in row one, and this line here, in row two, and then, row three-- what do each of these lines probably represent? Locations, points, sure. They're called hops, conceptually. But physically, what are they? They're routers. We only have, really, one piece of hardware here to talk about thus far. They're routers. So this thing here-- crazy name-- but this is probably machine room, MR, in the Science Center. It's a gateway, aka router. This is just some unique number that someone came up with for it. And it's within harvard.edu. And that's the IP address of that router that's, again, probably in the Science Center, based on its name. This second row represents another router that doesn't have a nickname apparently-- a host name-- it just has an IP address. So long story short, to get data from points A to B, there's more than just Harvard's router, and MIT's router, and Google's router, and Facebook's router. There are dozens, hundreds, thousands of routers between any point A and any point B on the internet. But typically, you can get data from one point to another in fewer than 30 hops. In other words, you only have to hand the data to 30 or fewer such routers. And it's typically many fewer than that. Well, let's see what happens here. In row three, we hit a router called core Science Center gateway something or other. In row 4, we have border gateway-- these are just cryptic acronyms-- also within harvard.edu. Here's another border gateway. And then, all of a sudden, whoa, we seem to be in New York City. So it turns out-- and I'm inferring only from the host name. This could be misleading. It could be down the road. It's tough to say-- but this can be used as a revelation that the shortest distance between two points on the internet is not necessarily a straight line.
If we think of shortest as the quickest path, the least congested path, it is quite possible-- though we can't be sure-- that the data is traveling a decent distance between rows five and six. Now unfortunately MIT, or someone, got a little self-defensive, and they've started ignoring our requests. Those routers have been configured to ignore requests of the form who are you, who are you, who are you. So let's see if we can't do this with someone more cooperative. So Stanford has a nice tradition of having a little more openness. So let's see what happens here. Again, pretty cryptic. But we start, again, in the machine room in the Science Center, in row one. So that's good. Most of the servers did reply, including Stanford. So notice we went from the machine room in the Science Center, to some anonymous router elsewhere, to another Science Center gateway, to a border gateway, and then, to something here-- nox.org. This is the Northern Crossroads, a very popular peering point where lots of cables, lots of ISPs-- internet service providers-- connect into. Here's another nameless IP here. Here's another such server. But this is interesting. Where is the router in row eight, probably? So it's probably in Washington, DC. And I can kind of corroborate that hypothesis this time. Because how long did it take us to go from the Science Center to this router in row seven? Well, these milliseconds measurements on the right hand side here are estimates of that time. There are three of them because the program, traceroute, tries every router three times, just so you can get a visual average of the numbers. But it apparently takes six milliseconds to get to row seven's router. But how fast can, apparently, you travel, if you are a bit, between Boston and Washington DC? 14 milliseconds is as long as it takes for that instant message, for that email, for that web page request to travel between here and Washington DC. 
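Reading traceroute output the way the lecture does, by averaging each hop's three timings and looking for the big jumps, can be sketched in Python. The samples in `hops` are hypothetical, chosen only to resemble the 6, 14, and 47 millisecond figures discussed here. They are not the actual demo output.

```python
from statistics import mean

# Hypothetical round-trip samples in milliseconds, three probes per hop.
hops = {
    7: [6.0, 5.5, 6.5],
    8: [14.0, 13.5, 14.5],
    10: [47.0, 46.5, 47.5],
}

# Average the three probes per hop, as the eye does when reading the output.
averages = {hop: mean(samples) for hop, samples in hops.items()}

# Compare consecutive listed hops; a large jump hints at a long physical link.
ordered = sorted(averages)
jumps = {(a, b): averages[b] - averages[a] for a, b in zip(ordered, ordered[1:])}
biggest = max(jumps, key=jumps.get)

print(averages)  # {7: 6.0, 8: 14.0, 10: 47.0}
print(biggest)   # (8, 10)
```

As in the lecture, a jump in latency only corroborates a guess based on a router's name. It doesn't prove where that router physically sits.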
If I go further, to router number 10, what city am I apparently in now? So, Houston. And this is corroborated by the jump in time. It's really slow to get to Houston. It takes 47 milliseconds to get from Boston to Houston in this case. And if we look further, LAX-- looks like we're getting to Stanford sort of this way, by going through LA. But I'm inferring that from LAX. The geeks tend to use airport codes for router names here. And this is kind of consistent with that assumption. 82 milliseconds. Then, we apparently go to another LAX, another LA router, and then, some nameless one, and then finally, a cryptic name on Stanford's network, or close thereto, stanford.edu, is 90 milliseconds away, or 6-plus hours by plane. So this is how fast data travels on the internet. And it's something we absolutely take for granted these days. When you're having some Gchat with someone, and the messages are just appearing, consider just how fast that's happening. And visually, it's indeed happening at that kind of rate. So between points one and 18, in this case, there are things besides routers. What are some machines on the internet that can block traffic from getting through? STUDENT: Firewalls. SPEAKER 1: So, firewalls. And we have personal firewalls such that your own Mac or PC can keep traffic in or out. Harvard has firewalls. MIT presumably has firewalls. And Stanford does, as do all of the internet service providers who own these routers in between points A and B. But did you ever stop to consider, or care, how a firewall works? Well already, we have the basic building blocks with which to engineer that answer. If you were a firewall-- and let's suppose that you are somewhere between point A and point B. A cable is coming into you, and going out of you. So you have the technological ability to look at all of the envelopes of information that are flowing between you and the other person.
In other words, those get messages I was manually typing, you can think of them as writing a quick note to someone, putting the IP address of the recipient, and the port number of the recipient, on this envelope, then, writing your own IP address and your own port number in the top left hand corner like you would a letter. Then, you send it out wirelessly. And it somehow travels, through routers, through wires, wirelessly, down the road to MIT. So if you're a firewall, how do you stop that from happening? What would you do if your next p set was implement a firewall? How do I stop all Harvard people from ever talking to MIT people again? [? STUDENT: You ?] reverse the letter. SPEAKER 1: You what? [? STUDENT: Reverse ?] the letter early. SPEAKER 1: Reverse the letter-- what do you mean? [? STUDENT: Send ?] it back to the sender. SPEAKER 1: Send it back. OK. So you could reject the virtual envelope, sort of by doing return to sender somehow. So sure, that's what we want to achieve. But let's dive a little deeper. How do I do that? If the input to this problem-- if I'm the firewall, and I'm effectively standing between points A and B, and I am a middle man that gets to look inside of this envelope, and then decide whether to send it back to Harvard or to allow it to continue, what is it I, the firewall, am going to want to look at? I think I heard it here. [? STUDENT: Where it's ?] coming from. SPEAKER 1: Where it's coming from. So if the source IP address-- the little number up here-- is an IP address belonging to Harvard-- and I can actually know that with high probability. Most of Harvard's IP addresses start with 140.247 dot something dot something, or 128.103 dot something dot something. Harvard owns those chunks of IP addresses. Well, if I see one of those IP addresses as the sender, I can just send it back. In reality, the internet doesn't bother wasting time sending the bits back. It just literally drops the packet by deleting it, effectively.
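The source-address check described here fits in a few lines of Python. This is a minimal sketch that assumes "140.247 dot something dot something" and "128.103 dot something dot something" correspond to the /16 blocks written below. `BLOCKED_SOURCE_NETS` and `should_forward` are hypothetical names, and 203.0.113.5 is a documentation-range address standing in for "anyone else".

```python
import ipaddress

# Assumed CIDR form of the two Harvard blocks mentioned in lecture.
BLOCKED_SOURCE_NETS = [
    ipaddress.ip_network("140.247.0.0/16"),
    ipaddress.ip_network("128.103.0.0/16"),
]


def should_forward(src_ip):
    """Return False (silently drop) if the sender falls inside a blocked range."""
    addr = ipaddress.ip_address(src_ip)
    return not any(addr in net for net in BLOCKED_SOURCE_NETS)


print(should_forward("140.247.10.20"))  # False: looks like a Harvard sender, drop it
print(should_forward("203.0.113.5"))    # True: some other sender, pass it along
```

Returning `False` stands for simply deleting the packet, matching the point above that real firewalls drop packets rather than mail them back.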
So what else could I look at, though? Suppose that I want to let people at Harvard visit mit.edu, and pull up websites, and watch videos at MIT, and the like. But I don't want humans at Harvard emailing anyone at MIT. How could I allow traffic from Harvard to MIT, via the web, but disallow something like an email? [? STUDENT: The ?] port number. SPEAKER 1: A port number-- that's the only other ingredient we have. We have IP address, which we just leveraged, or we have port number, where 80, we said, uniquely identifies web traffic. Now I wouldn't expect you to know this-- some of you might already know from familiarity-- what's a number that's used for email, usually? It's often 25. 25 refers to SMTP, the Simple Mail Transfer Protocol, which you might have had to set up at some point, if you're using Eudora, or Outlook, or something like that. It's just another number-- 25. Telnet, which we were using before, uses 23. FTP-- file transfer protocol, if you've ever heard of that one-- uses 21. HTTPS, the secure version of HTTP, which we'll come back to before long, uses 443. So the world has a whole bunch of numbers that correlate packets-- rather, correlate services to those actual numbers. So that's all a firewall is doing. It's taking a look inside this virtual envelope, and then deciding yea or nay to forward along, based on those ingredients. Now what could someone at Harvard do to get past this firewall, then? If you want to be able to send a message to MIT but not be detected, well, you could spoof your IP address, and just somehow be fancy enough, know how to write C code, and write your own network program that changes the from address. The problem is you can absolutely send data anonymously, but if you want to get any kind of reply, like see MIT's homepage, obviously, this address needs to be correct. Otherwise, you can say anything you want, but you're not going to hear back from them. But this is just one of the kinds of attacks that we can send.
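A port-based rule ("web yes, email no") can be sketched in Python the same way. The port numbers come from the lecture. Everything else is hypothetical: the `Envelope` fields mirror the "virtual envelope" metaphor rather than a real packet format, and 198.51.100.7 is a documentation-range address standing in for an MIT server.

```python
from dataclasses import dataclass

# Well-known ports mentioned in lecture.
WELL_KNOWN_PORTS = {21: "FTP", 23: "Telnet", 25: "SMTP", 80: "HTTP", 443: "HTTPS"}
BLOCKED_PORTS = {25}  # hypothetical policy: no email, everything else allowed


@dataclass
class Envelope:
    src_ip: str
    dst_ip: str
    dst_port: int


def decide(envelope):
    """Forward or drop based only on the destination port."""
    service = WELL_KNOWN_PORTS.get(envelope.dst_port, "unknown")
    if envelope.dst_port in BLOCKED_PORTS:
        return f"drop ({service})"
    return f"forward ({service})"


web = Envelope("140.247.10.20", "198.51.100.7", 80)
mail = Envelope("140.247.10.20", "198.51.100.7", 25)
print(decide(web))   # forward (HTTP)
print(decide(mail))  # drop (SMTP)
```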
But it turns out when we send these messages-- and let's do an example of this. It turns out, if I have a message that I want to send, it's not just sent in one envelope. For efficiency's sake, especially when the files you're requesting or the responses you're getting are particularly large, TCP/IP-- Transmission Control Protocol / Internet Protocol, which is just a fancy name for what the networking software on computers does-- takes a message like this and cuts it up into fragments-- let's say four fragments. And if I now cut this up into here, cut this up into here, what my computer is then going to do is take one fragment and put it in an envelope. All right, and let me get a-- let's see. It's going to take one. It's going to take another envelope, and it's going to put the second part of this message in here. All right. It's going to take the third part, put it in here. Maybe next time we'll just do two parts. And we'll take the fourth part, and put it in here. And what, now, has to be written on these envelopes-- which we'll pretend to do, for time's sake, and not actually write out? What needs to be written on each of these four envelopes, with my message to someone? STUDENT: The order. SPEAKER 1: So, the order. I need not only the IP addresses and the port numbers, as we just discussed; I now need a sequence number of some sort to say, this is packet one, this is two, this is three, this is four. And this is actually useful, because the internet, it turns out, is actually pretty unreliable. Routers can get congested. Cables can get overwhelmed with bits-- an oversimplification-- such that routers just have to drop packets. In other words, if the internet is really congested, you might get only three out of those four packets. But if you have a unique identifier on each of them, you'll know that you're missing packet number four of four. So you can ask the guy at the other end to resend it.
But assuming that doesn't happen, let's see what might happen. So if I want to send a message to-- who would like to receive my message from the internet? How about someone closer up front. Brian, is it? All right. You stay there. I'm going to send it to you. And the thing about the internet is that the packets might not even follow the same path. So here I go. I am sending a message, fragment one of four. Be a router. Just let other people deal with it. There you go. We'll give this to you, and we'll give this to you. And we'll see how quickly-- how many milliseconds it takes to get this message to Brian. Everyone gets to participate today. All right. Brian has one, and two. If someone wants to be-- STUDENT: All four. SPEAKER 1: He has all four. So no one chose to drop a packet. That's cool. That's fine. So Brian now has all four. If you want to go ahead and reassemble those for us. I know, we're pretending. So for time's sake-- we have four. So, OK, open one of them. OK. That's one fourth of my message to you. Now, open the second. This may be funny, in the end, only to me and Brian. All right, you've got two. So in the meantime, we physically did this with the scissors, but all it takes to fragment these things in a computer is to send some of the bits in one packet, in one virtual envelope, some of the bits in the other, some in another, and some in a fourth, and then let the computer decide, based on those numbers, in what order to concatenate them. And Brian's, maybe, the only one that can see this. The message I sent to Brian-- because of course, the internet is filled with these-- is-- yes. So that's the message. And Brian can hang on to that now. So it took, obviously, a while to do this. But that's really what happens, much like routing data through the audience in this way. But there are, again, a number of points-- routers, firewalls, and other such things-- between points A and B.
And rather than just tell the story verbally, I thought I'd pull up this video that some friends of ours from Ericsson put together years back, which explains how this all works. And it's about 10 or so minutes long. So let's give you, now, Warriors of the Net. [MUSIC PLAYING] NARRATOR: For the first time in history, people and machinery are working together, realizing a dream-- a uniting force that knows no geographical boundaries, without regard to race, creed, or color-- a new era where communication truly brings people together. This is the dawn of the net. Want to know how it works? Click here to begin your journey into the net. Now exactly what happened when you clicked on that link? You started a flow of information. This information travels down into your own personal mail room, where Mr. IP packages it, labels it, and sends it on its way. Each packet is limited in its size. The mail room must decide how to divide the information, and how to package it. Now the package needs a label containing important information such as sender's address, receiver's address, and the type of packet it is. Because this particular packet is going out onto the internet, it also gets an address for the proxy server, which has a special function, as we'll see later. The packet is now launched onto your local area network, or LAN. This network is used to connect all the local computers, routers, printers, et cetera for information exchange within the physical walls of the building. The LAN is a pretty uncontrolled place, and unfortunately, accidents can happen. The highway of the LAN is packed with all types of information. These are IP packets, Novell packets, AppleTalk packets-- they're going against traffic, as usual. The local router reads the address and, if necessary, lifts the packet onto another network. Ah, the router-- a symbol of control in a seemingly disorganized world. ROUTER: Whoops, sorry about that. Let's put this one here, this one here. This moves here.
This one moves here. I don't like this one. Let's move this one. This one goes here. [INAUDIBLE] Put another jangle here. Let's put this one here. Nah, I'll go with that. Let's put that one here. NARRATOR: There he is-- systematic, uncaring, methodical, conservative, and sometimes, not quite up to speed. But at least he is exact, for the most part. ROUTER: Put that one over there. That one goes there, that one goes there, and this one goes there. Well, another one goes there. That goes here. [INAUDIBLE] NARRATOR: As the packets leave the router, they make their way into the corporate intranet and head for the router switch. A bit more efficient than the router, the router switch plays fast and loose with IP packets, deftly routing them along their way-- a digital pinball wizard, if you will. ROUTER SWITCH: Here we go. Here comes another one. And it's another. Watch this, mom. Here it goes. Whoop, around the back. Hey, in there, in there. Over to the left. Over to the right. Over to the left. Over to the right. You got it. Here it comes. He shoots, he scores. It's going. Hey Wayne, watch out, here comes another one. Oh, here we go. NARRATOR: As packets arrive at their destination, they're picked up by the network interface, ready to be sent to the next level-- in this case, the proxy. The proxy is used by many companies as sort of a middle man in order to lessen the load on their internet connection, and for security reasons as well. As you can see, the packets are all of various sizes, depending upon their content. The proxy opens the packet and looks for the web address, or URL. Depending upon whether the address is acceptable, the packet is sent on to the internet. There are, however, some addresses which do not meet with the approval of the proxy-- that is to say, corporate or management guidelines. These are summarily dealt with. We'll have none of that. For those who make it, it's on the road again. Next up, the firewall. 
The corporate firewall serves two purposes. It prevents some rather nasty things from the internet from coming into the intranet, and it can also prevent sensitive corporate information from being sent out onto the internet. Once through the firewall, a router picks up the packet and places it onto a much narrower road, or bandwidth, as we say. Obviously, the road is not broad enough to take them all. Now you might wonder what happens to all those packets which don't make it along the way. Well, when Mr. IP doesn't receive an acknowledgement that a packet has been received in due time, he simply sends a replacement packet. We are now ready to enter the world of the internet, a spider web of interconnected networks which span our entire globe. Here, routers and switches establish links between networks. Now the net is an entirely different environment than you'll find within the protective walls of your LAN. Out here, it's the Wild West-- plenty of space, plenty of opportunities, plenty of things to explore, and places to go. Thanks to very little control and regulation, new ideas find fertile soil to push the envelope of their possibilities. But because of this freedom, certain dangers also lurk. You'll never know when you'll meet the dreaded ping of death, a special version of a normal request ping which some idiot thought up to mess up unsuspecting hosts. The path our packets take may be via satellite, telephone lines, wireless, or even trans-oceanic cable. They don't always take the fastest, or shortest, routes possible. But they will get there eventually. Maybe that's why it's sometimes called the world wide wait. But when everything is working smoothly, you can circumnavigate the globe five times over at the drop of a hat, literally-- and all for the cost of a local call, or less. Near the end of our destination, we'll find another firewall. Depending upon your perspective as a data packet, the firewall could be a bastion of security, or a dreaded adversary.
It all depends on which side you're on and what your intentions are. The firewall is designed to let in only those packets that meet its criteria. This firewall is operating on ports 80 and 25. All attempts to enter through other ports are closed for business. Port 25 is used for mail packets, while port 80 is the entrance for packets from the internet to the web server. Inside the firewall, packets are screened more thoroughly. Some packets make it easily through customs, while others look just a bit dubious. The firewall officer is not easily fooled, such as when this ping of death packet tries to disguise itself as a normal ping packet. FIREWALL: Next. OK. Go on. That's OK. No problem. Have a nice day. Be out here. Bye. NARRATOR: For those packets lucky enough to make it this far, the journey is almost over. It's just a lineup on the interface to be taken up into the web server. Nowadays a web server can run on many things, from a mainframe, to a webcam, to the computer on your desk. Or why not your refrigerator? With the proper setup, you can find out if you have the makings for chicken cacciatore, or if you have to go shopping. Remember, this is the dawn of the net. Almost anything's possible. One by one, the packets are received, opened, and unpacked. The information they contain-- that is, your request for information-- is sent on to the web server application. The packet itself is recycled, ready to be used again, and filled with your requested information, addressed, and sent out, on its way back to you, back past the firewall, routers, and on through to the internet, back through your corporate firewall, and on to your interface, ready to supply your web browser with the information you requested-- that is, this film. Pleased with their efforts and trusting in a better world, our trusty data packets ride off blissfully into the sunset of another day, knowing fully that they have served their masters well. Now isn't that a happy ending?
SPEAKER 1: That, then, is how the internet works. Through problem set seven, you'll better understand this, and you'll learn a bit of HTML, PHP, and more. More on that in the specification that will go out on Friday. And we will see you on Monday.