[MUSIC PLAYING] 

DAVID J. MALAN: So I just wanted to assuage, too. I would echo exactly what Scaz said about institutional memory. CS50 has been around for some 20 years at Harvard. And the reality is, from the seniors on down, there is annually reassurance that the freshmen, the sophomores, and the juniors and also the seniors taking CS50, that you end up doing fine. 

The reality is, students do not fail CS50. In fact, in the rare instances where we've had Es or Fs, it's really been because of extenuating circumstances, whether it's medical or personal. Ds are incredibly uncommon as well. And I can say comfortably, though we typically don't disclose statistics, but given that there is no institutional memory here whatsoever, a majority of students in CS50 do end up getting A range grades. A significant chunk end up ending up in the B range too. 

So even though you might be equating in your mind threes with 60% and therefore Ds, or Cs, or the like, it really does not line up with the reality. In fact, we mean exactly what we say at the beginning of the term that so many students in CS50, both in Cambridge and here in New Haven, have never taken a CS course before. And what indeed ultimately matters is where you end up in week 12 relative to yourself in week zero. Now we have multiple tracks in the course as you know-- less comfortable, more comfortable, somewhere in between. And indeed, when you get statistics on this week's quiz, don't be discouraged if, especially if you feel that you're around the mean or below the mean or the median, especially since we don't necessarily take all those demographics into account mid-semester with the grading statistics. 

In other words, we know statistically every year that students who are less comfortable, do a little worse on the quiz. And students who are more comfortable do a little better on the quiz. But per that promise in the syllabus and also in the first week of lectures, we take all of that into account. 

Indeed, at year's end, what we end up doing is normalizing all scores across sections, both in Cambridge and now here in New Haven, which means taking into account the disparate styles, the disparate harshness, the different sort of personalities that the individual TAs have here and in Cambridge so that you're not at a disadvantage even if you just happen to have had a TF or a TA who's been a little tougher on you in your mind. 

Two, we take into account comfort level and actual background, or lack thereof, when taking quiz scores into account. So those two are factored in. And at the end of the day, because it's always the case that a student ended up in a less comfy section when he or she really belonged in an in-between or vice versa, everything is so incredibly individualized. Indeed, you will get annoyed at us at the end of the term when we are late submitting your grades because what Scaz, and Jason, and Andy, and I, and the team will have done in Cambridge is literally have hundreds of emails back and forth with all hundred of the course's TAs, here and in Cambridge, asking them what they think of all of their students based on a draft of the grades. And everything thereafter is incredibly individualized. So to the extent we get to know you in office hours, sections, and more, all of that too is taken into account. 

So though we tend to use this five point scale, please, detach yourself from the assumption that a three is indeed a 60%. It is meant to be good. And the teaching assistants are charged at term start to try to keep scores in the twos, and threes, and fours range so that we actually have room to grow. And we actually have a yardstick by which we can give you useful feedback as to how you're doing and how you're progressing. So please do take that to heart. 

Are there any questions I can help address or concerns I can help assuage? Or promises I can try to keep? No? OK. 

All right. So with that said, this is CS50. This is the start of week six here in New Haven. Let's begin with a brief dimming of the lights to set the stage for today's content. [VIDEO PLAYBACK] [MUSIC PLAYING] -He came with a message. With a protocol all his own. He came to a world of cool firewalls, uncaring routers, and dangers far worse than death. He's fast, he's strong, he's TCP/IP. And he's got your address. Warriors of the net. 

[END PLAYBACK] DAVID J. MALAN: All right. This is CS50. This is the start of week six. And this is the start of our look at the internet and web programming. And, perhaps most excitingly, today marks the transition for us from our command line world of C to the web based world of PHP, and HTML, and CSS, and SQL, and JavaScript, and so much more that is on the horizon. 

But first, it has come to our attention in walking across campus that there is a certain bathroom here in New Haven called the Harvard room, which is a little greyed out here. But indeed, someone went to the time and expense of etching in Harvard room on this here room. Thank you for that. I can't say we have an analogue in Cambridge yet, but I think we have a little project for ourselves now when we go back. So thank you for that. 

So a quick look back at where we left off last week and where you're going this coming week with problem set five. So in problem set five, you'll be challenged to implement a spellchecker. And to do that, you'll be handed a pretty big text file with like 140,000 English words. And you'll be challenged to decide on a data structure with which you want to load all of those words into memory, and into RAM, and then implement a few functions, one of which is going to be check. Whereby when passed an argument, a word, your function check simply is going to have to say true or false, this is a word in the dictionary. 

But you're going to have some design discretion and challenges when it comes to implementing that. In the simplest implementation, you could certainly implement a spellchecker in the underlying dictionary with what kind of data structure? You just need to store a whole bunch of strings in memory? What's the go-to answer from week two perhaps?

AUDIENCE: Array.

DAVID J. MALAN: You can use an array. And that's not all that bad. But you don't necessarily know in advance how big of an array you're going to need, if you don't know the file necessarily in advance. So you're going to have to use a little bit of trickery like malloc, like we started using. Or we could address that concern by using what other data structure that's been sort of a marginal enhancement on an array?

AUDIENCE: Linked list.

DAVID J. MALAN: Like a linked list, wherein we get some dynamism. But there's a little more expense. We have pointers to maintain. And you've not yet coded this up, but there's definitely going to be a little more complexity than just using square brackets and jumping around an array. 

But an array's running time, if you're searching for a word, might be log of n. But again, it might be a little non-trivial to build up that array not knowing the size in advance. A linked list though, if you just store a bunch of strings in a linked list, what's your upper bound on running time going to be to search for or check a word in that list? 

AUDIENCE: n. 

DAVID J. MALAN: Yeah, big O of n or linear because in the worst case, the word is like a Z word all the way at the end. And because of a linked list, because those arrows by default, in a singly linked list, only go from one direction to the other, you can't jump around. You have to follow all of them. 
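To make that comparison concrete, here is a minimal sketch in Python (rather than the C used in the problem set). The four-word dictionary, the `Node` class, and both function names are made up for illustration; the point is only that a sorted array can be binary searched, while a singly linked list has to be walked one node at a time.

```python
from bisect import bisect_left


class Node:
    """One node of a singly linked list: a word plus a pointer to the next node."""

    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


def check_sorted_array(words, target):
    """Binary search over a sorted list: roughly log2(n) comparisons."""
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


def check_linked_list(head, target):
    """Follow the arrows one node at a time; returns (found, steps taken)."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.word == target:
            return True, steps
        node = node.next
    return False, steps


# Hypothetical tiny dictionary, already sorted.
words = ["apple", "banana", "cherry", "zebra"]

# Build a linked list in the same order by prepending in reverse.
head = None
for w in reversed(words):
    head = Node(w, head)

print(check_sorted_array(words, "zebra"))  # True
print(check_linked_list(head, "zebra"))    # (True, 4): the Z word sits at the very end
```

With 140,000 words instead of four, the linked-list walk could take up to 140,000 steps in the worst case, which is the big O of n behavior described above.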

So we proposed at the end of last week, week five, that there are better ways. And in fact, the holy grail would really be constant time whereby when you want to look up a word, you get an instant answer irrespective of how many words are already in your dictionary. 

This is an artist's rendition of what you might call a hash table. And a hash table is kind of a nice amalgam of an array-- drawn vertically here, just because-- and then a linked list-- drawn horizontally here. And the hash table can be implemented in bunches of ways. This excerpt from a textbook happens to use these people's birth dates as the means by which it's deciding where to put someone's name. So this is a dictionary if you will of names. And in order to expedite putting names into this data structure, they look at, apparently, these people's birth dates with respect to a month. 

So it's 1 to 31. And forget about February and corner cases like that. And if your birthday is on January 1, or February 1, or December 1, you're going to end up at the very first chain up top. If your birth date is like the 25th of a month, you're going to end up at bucket number 25. And if there's already someone there in any of those locations, what you start doing with these linked lists is stitching them together so that you can have an arbitrary number of people, or anything, at that location. 

So you have kind of a mix of constant time for hashing. And to hash something means to take as input like a person, or his or her name, or his or her birth date, and then decide on some output based on that, like looking at their birthday and outputting one through 31. 

So then you might have a bit of linear time, but in reality, and as in the case of problem set five, we're not going to be worrying in P set five so much about asymptotic running time, like the theoretical slowness with which an algorithm might run. We're going to care about the actual number of seconds and the actual amount of memory, the actual number of bytes of memory you're using. So frankly, having one huge chain of like a million people is pretty damn slow if you're searching for a name in a list of size million. 

But what if you divide that list up into 31 parts? Searching 1/31 of that super long list, in reality, is certainly going to be faster. Asymptotically, it's the same thing. You're just dividing by a constant factor. And recall that we throw those things away. But in reality, it's going to be 31 times faster. And that's what we're going to start to leverage in P set five. 
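The birthday-based hash table described above might be sketched like this in Python. This is an illustration under assumptions, not the problem set's required design: the names and birth dates are invented, and ordinary Python lists stand in for the linked-list chains.

```python
NUM_BUCKETS = 31  # one bucket per possible day of the month


def hash_birthday(day_of_month):
    """Hash function: map a day of the month (1-31) to a bucket index (0-30)."""
    return (day_of_month - 1) % NUM_BUCKETS


class HashTable:
    def __init__(self):
        # Each bucket is a chain; a Python list stands in for a linked list here.
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, name, day_of_month):
        """Constant-time hash, then append to that bucket's chain."""
        self.buckets[hash_birthday(day_of_month)].append(name)

    def check(self, name, day_of_month):
        """Search only the one chain the hash points at, not every name."""
        return name in self.buckets[hash_birthday(day_of_month)]


table = HashTable()
table.insert("Alice", 1)    # hypothetical people and birth dates
table.insert("Bob", 25)
table.insert("Carol", 25)   # collision with Bob: chained in the same bucket

print(table.check("Carol", 25))  # True
print(table.check("Carol", 1))   # False
print(table.buckets[24])         # ['Bob', 'Carol']
```

If the names are spread evenly, each chain holds about 1/31 of them, which is where the practical speedup comes from even though the asymptotic bound is unchanged.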

So P set five too also proposes that you consider a slightly more sophisticated data structure called a trie. And a trie is just a tree like data structure. But instead of having little circles or rectangles as we keep drawing for nodes, it actually has entire arrays for its nodes. And even though this is a bit abstract here to look at, Zamyla in the P set walk through will walk you through in more detail on this. This is a data structure that rather cleverly might have each node being an array of size 26, A through Z or zero through 25. And when you want to insert a person's name into this data structure or find him or her, what you do, if the name is like Maxwell, M-A-X-W-E-L-L, you first look at M. And then you jump to the corresponding M location in the first array. You then jump to A, the first location in the next array, following the arrows. Then X, then W, then E, then L, then L, and then maybe some special end character, some sentinel that says a word stops here. 

And what's nice about this-- and keep in mind that the picture here, notice how edges of every array are cut off. That's just because this thing would be massive and horrific to look at on the screen. So it's excerpted. What's nice about this approach is that if there's a million names already in this data structure, how many steps does it take me to insert Maxwell? M-A-X-W-E-L-L-- like seven-ish steps to insert or look for Maxwell. 

Suppose there's a trillion names in this data structure. How many steps does it take me to look for Maxwell? M-A-X-- still seven. 

And therein lies the so-called constant time. If we assume that words are certainly bounded by 20 characters, or 46 characters, or some reasonably small integer, then it's effectively a constant. And so insertion and searching a trie is super fast. Of course, we never get anything for free. And even though you probably haven't dived into P set five yet, what price are we probably paying to get that greater efficiency time wise? 

AUDIENCE: Memory. 

DAVID J. MALAN: Memory, right? I mean, we've not drawn the whole picture here. This excerpt from the textbook hasn't drawn all of the arrays. There's a huge amount of memory and just null pointers that aren't being used. So it's a trade off. And it'll be left to you in P set five to decide on which way you want to go. 
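A rough Python sketch of the trie idea follows, assuming words contain only the letters A through Z and using a boolean flag in place of the "special end character." The class and function names are hypothetical. The last function counts unused pointers to show the memory cost just mentioned.

```python
ALPHABET_SIZE = 26  # A through Z, i.e. indices 0 through 25


class TrieNode:
    def __init__(self):
        self.children = [None] * ALPHABET_SIZE  # one slot per letter
        self.is_word = False                    # marks "a word stops here"


def index_of(letter):
    """Map 'a'/'A' to 0, 'b'/'B' to 1, ..., 'z'/'Z' to 25."""
    return ord(letter.lower()) - ord("a")


def insert(root, word):
    """One step per letter, regardless of how many words are already stored."""
    node = root
    for letter in word:
        i = index_of(letter)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word = True


def check(root, word):
    """Follow one pointer per letter; fail as soon as a pointer is missing."""
    node = root
    for letter in word:
        node = node.children[index_of(letter)]
        if node is None:
            return False
    return node.is_word


def count_null_pointers(node):
    """Count unused slots across the whole trie (the memory price)."""
    total = 0
    for child in node.children:
        if child is None:
            total += 1
        else:
            total += count_null_pointers(child)
    return total


root = TrieNode()
insert(root, "Maxwell")

print(check(root, "maxwell"))     # True
print(check(root, "max"))         # False: a prefix, but never inserted as a word
print(count_null_pointers(root))  # 201 unused slots to store a single 7-letter name
```

Storing one name here creates eight nodes of 26 slots each, and only seven of those 208 slots point anywhere.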

Now this idea of hashing, as an aside, is actually super prevalent. So to hash a value means, quite simply, to take something as input and produce an output. So a hash function is just an algorithm. 

And generally, a hash function's purpose in life is to take something as input and produce a number as output, like the number one through 31 or A through Z, zero through 25. So it takes a complex input and shrinks it down to something that's a little more useful and manageable. 

And so it turns out a very popular function that the security world and the human world's been using for years is called SHA1. This is a pretty fancy mathematical formula that does essentially that. 

You take a really big chunk of zeros and ones-- that could be a megabyte long, a gigabyte long-- and it shrinks it down to just a few bits, a few bits, so that you have a number like one through 31, or A through Z. But in reality, it's a little bigger than just A through Z. 
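As a small illustration, Python's standard `hashlib` module exposes SHA-1, and the output is always 160 bits however large the input is. The two inputs below are arbitrary, and the final line, which squeezes the digest into a 1-to-31 bucket, is only a toy to connect back to the birthday example rather than something SHA-1 is normally used for.

```python
import hashlib

short_input = b"a"
long_input = b"a" * 1_000_000  # about a megabyte of data

short_digest = hashlib.sha1(short_input).digest()
long_digest = hashlib.sha1(long_input).digest()

# Both digests are 20 bytes (160 bits), regardless of input size.
print(len(short_digest) * 8, len(long_digest) * 8)

# Different inputs give different digests.
print(short_digest != long_digest)

# Toy only: reduce the big number to a bucket from 1 through 31.
bucket = int.from_bytes(short_digest, "big") % 31 + 1
print(1 <= bucket <= 31)
```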

Unfortunately, we're on the cusp of what someone playfully called the SHAppening whereby the world is about to end in probably a few months time because researchers, just this past week, published a report that contrary to what security researchers have thought for some time, by just spending about, what was it, I think it was $175,000-- a lot of money, but not beyond the reach of particularly bad bad guys, or particularly bad countries-- $175,000 could buy you a lot of rented server space in the cloud. And we'll come back to the cloud before long. But it just means renting server space on like Microsoft's servers, or Google's, or Amazon's, or the like where you can pay by the minute to use someone else's computers. 

And it turns out if you can pay someone else to borrow their computers and run code that you've written on it and use pretty fancy mathematics, you can essentially figure out how someone's hash function is working, and given its output, reverse engineer what its input is. And for today's purposes, suffice it to say, this is bad. Because SHA1 and hash functions like it are super commonly used in security applications, encrypted connections on the web, bank transactions, cellular encryption for your cell phones, and the like. And so any time someone finds a way to reverse engineer one of these technologies or break it, bad things can happen. 

Now the world already knew this. This was foreseeable. And the world has since moved from SHA1 to SHA256, which is just a fancy way of saying they use more bits. And in fact, even CS50's own website upgraded last year to-- not that we face all this many threats trying to get at the PDFs and whatnot-- but CS50's website uses the bigger hash function, which means that we will be safe. So all of your PDFs will be safe, but not necessarily your money or anything particularly private or personal to you. So check out that URL if you'd like some additional details. 

So problem set five is indeed on the horizon. Quiz one is this coming Wednesday. But do take advantage of office hours, both tonight and tomorrow. And also take advantage of office hours, if you're available, right after this. The staff and I'll stick around and do more casual Q&A in addition to tonight. And let me strongly note here, for those of us here in New Haven-- so it's absolutely per Scaz's remarks felt, I'm sure, like a bit of an uphill struggle. And by reputation, if you haven't learned already or heard from some friends at Harvard, now here is some new institutional memory. P set five kind of sort of tends to be the hardest in CS50, or the most challenging for most students. 

But what that means is that we're almost at the top of this hill. And I really do mean this. It's the most challenging, but it's also the most rewarding in that unlike most every other introductory computer science course in the US that we know of, most students do not finish an intro course having already implemented things like trees, and tries, and hash tables, and the like. 

And so I do hope, and we do hope that you'll have an enormous sense of satisfaction even if the week or two via which you get to that satisfaction does feel a little bit like this. But let me reassure, we only have four P sets left. So sort of that top is in sight. 

On the other side of it, trust us, it's just rolling hills and clouds. And shall we say, puppies are on the other side. So you just have to hang in there a little longer. I mean, indeed as we start to transition into the world of web programming, you'll find that things become-- this is adorable actually. OK, we'll post this URL later. You'll find too that we're reaching sort of a plateau where everything is indeed still sophisticated and challenging by design, but you're not going to feel like we are perpetually going up this hill. So take some comfort in that. 

So without further ado, let's start to make this marked transition in the semester to the world of the web, and really the world with which all of us are more familiar. We've got internet devices in our pockets, on our desks, in our backpacks, and the like. How does all of this work? And how can we start writing code that's not super arcane and in some blinking text prompt that none of your friends or family are ever going to want to interact with, but something you can put on their phones, or on their web browsers, or on any devices with which they interact. 

So here is someone's home. And inside of this home is a couple of laptops, a couple of old school desktop computers, something called a router or hub in the middle, and then some kind of cable modem or DSL modem. And then there's the internet, generally drawn as a cloud up there in the sky. 

So this picture, though a little sort of dated, certainly captures what most of you probably have in your homes, or effectively what all of you have in your dorm rooms, or apartments, or the like. 

So what is actually going on when you try to use the internet today? So every computer on the internet, it turns out, needs to have a unique address, much like we in the real world need a postal address, like 51 Prospect Street, New Haven, Connecticut, or 33 Oxford Street, Cambridge, Massachusetts. So do computers on the internet need a way of uniquely addressing themselves. 

That is so that when one computer wants to talk to another, it can send a message and inform the recipient to whom it should send the response back. So it just makes sort of intuitive sense perhaps that everything have an address of some sort. 

But how do you get an address? Well, if you get here on campus, or you go home and you turn on your laptop or desktop computer, and either plug it in or connect to Wi-Fi, it turns out that there's a special server on most networks called a DHCP server. Doesn't really matter what this stands for, but it's dynamic host configuration protocol, which is just a fancy way of saying, this is a computer that either Yale has, or Harvard has, or Comcast has, or Verizon has, or your company has, whose purpose in life, when it hears someone newly added to the network, is to say here, use this address. 

So we humans don't have to hard code into our computers what our unique address is. We just turn it on, open the lid, and somehow this server on the local network just tells me that my address is 51 Prospect Street, or 33 Oxford Street, or the like. 
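To give a feel for that handing-out of addresses, here is a toy sketch in Python. It is not the real DHCP protocol (which involves discover, offer, request, and acknowledge messages and timed leases); the class name, the `192.168.1.0/29` network, and the device identifiers are all assumptions made up for the example.

```python
import ipaddress


class ToyDhcpServer:
    """Hands each newly seen device the next free address on the local network."""

    def __init__(self, network):
        # Usable host addresses on the network, in ascending order.
        self.free = list(ipaddress.ip_network(network).hosts())
        self.leases = {}  # device identifier -> assigned address

    def request(self, device_id):
        """Return the device's address, assigning a new one on first contact."""
        if device_id not in self.leases:
            # pop(0) raises IndexError once the pool is exhausted.
            self.leases[device_id] = self.free.pop(0)
        return str(self.leases[device_id])


server = ToyDhcpServer("192.168.1.0/29")  # hypothetical home network

print(server.request("laptop"))   # 192.168.1.1
print(server.request("phone"))    # 192.168.1.2
print(server.request("laptop"))   # 192.168.1.1 again: same device, same lease
```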

Now it's not going to be so verbose as that. Rather what I'm going to get is a numeric address called an IP address. IP meaning internet protocol. And odds are by this time in your life, you probably heard or seen the word IP, or generally thrown it around perhaps. But in fact, it's pretty straightforward a thing. 

An IP address is just a dotted decimal number, which means it's something dot something dot something dot something. And each of those somethings happens to be a number between 0 and 255. 

So based on five plus weeks of CS50, if these numbers each range from 0 to 255, how many bits is each of those number signs? 

AUDIENCE: Eight. 

DAVID J. MALAN: It's got to be eight. So in total, how many bits is an IP address?

AUDIENCE: 32. 

DAVID J. MALAN: So 32. 8 plus 8 plus 8 plus 8 is 32. How many total IP addresses can there be in the world?

AUDIENCE: 4 billion.

DAVID J. MALAN: So roughly four billion because that's 2 to the 32nd power. And if you can't sort of grok that in your mind, just know that 32-bit values can be as big as 4 billion if it's all positive values. So that means there's 4 billion possible IP addresses in the world. 
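The arithmetic can be checked with a short Python sketch. The function name `dotted_to_int` is made up for illustration; it packs the four 8-bit numbers of a dotted-decimal address into a single 32-bit value.

```python
def dotted_to_int(address):
    """Pack the four 0-255 numbers of an IPv4 address into one 32-bit integer."""
    value = 0
    for part in address.split("."):
        value = (value << 8) | int(part)  # shift left 8 bits, then add the next number
    return value


print(dotted_to_int("0.0.0.1"))          # 1
print(dotted_to_int("0.0.1.0"))          # 256
print(dotted_to_int("255.255.255.255"))  # 4294967295, the largest 32-bit value
print(2 ** 32)                           # 4294967296 possible addresses, roughly 4 billion
```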

And funny story, we're kind of running out of them. And in fact it's a huge problem in that the world also saw this problem coming, but hasn't necessarily responded to it in the most rapid way possible. And indeed, once you've finished CS50 and started paying attention in the tech world, you'll see this is very commonly thematic. 

For instance, if we go really old school nowadays, Y2K. That wasn't really a surprise. Like everyone knew for 1,000 years that that was-- more than a thousand years-- that that was eventually going to happen. And yet, we responded to it very much at the last minute. And that's happening again. So today we'll talk about IP version 4. But know that the world is finally getting around to upgrading to something called IPv6, which instead of 32-bit addresses, uses-- anyone want to take a guess, how many bits? 

AUDIENCE: 64? 

DAVID J. MALAN: Good guess, but no. We're finally trying to get ahead of the curve. 

AUDIENCE: 128.

DAVID J. MALAN: 128, which is a freaking huge number of IP addresses, because that's like times 2, times 2, times 2, a lot of times twos up from 4 billion. 
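To put a number on "a lot of times twos," here is a short Python check. Going from 32 bits to 128 bits is 96 additional doublings.

```python
ipv4_total = 2 ** 32    # number of possible 32-bit addresses
ipv6_total = 2 ** 128   # number of possible 128-bit addresses
doublings = 128 - 32    # how many "times 2" steps separate them

print(doublings)                                    # 96
print(ipv6_total == ipv4_total * 2 ** doublings)    # True
print(ipv6_total)  # 340282366920938463463374607431768211456
```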

So if curious. It turns out-- and I just googled this to find this out-- Yale computers, here at Yale, tend to start with these numbers-- 130.132 dot something, and 128.36 dot something. But there's certainly exceptions across the board depending on what department and building and campus you're on. Harvard tends to have 140.247, or 128.103. And generally this is useless information, but it's something you might notice now. When you start poking around settings on your computers, you might start to notice these kinds of patterns before long. 

But when you're at home and have an Apple AirPort, or a Linksys device, or a D-Link, or whatever it is your parents or siblings installed in your house, well what you probably have is what's called a private IP address. And these were actually a nice, temporary solution to the problem of running short on IP addresses. 

And what you can do with home networks, typically-- and frankly, even Yale and Harvard are starting to do this in different areas-- is you can give a whole bunch of computers one IP address so long as you put a special device in front of them, something called a router, or it can be called a proxy or any number of other things. But a certain device that has that one IP address. And then behind that device, within a building, within a house or an apartment, can be any number of computers, all of which have an IP address that start with one of these digits here. And so long as that computer knows how to convert the public address to the private address, everything can sort of work as expected. 

But the converse of this is that if you're at home and you have a sibling, and both of you are visiting some website, that website does not know if it's you or your sibling visiting the website, because you appear to be the same person because all of your data is going through that router or that central point. 
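A simplified Python sketch of that idea follows. The `HomeRouter` class, the port numbers, and the public address `203.0.113.7` (taken from a range reserved for documentation) are assumptions for illustration, and real routers track more state than this. The standard `ipaddress` module is used only to check which addresses count as private.

```python
import ipaddress

# Private addresses are reserved for use behind a router; public ones are not.
print(ipaddress.ip_address("192.168.1.10").is_private)    # True
print(ipaddress.ip_address("172.28.204.129").is_private)  # True
print(ipaddress.ip_address("8.8.8.8").is_private)         # False


class HomeRouter:
    """Toy translator between many private addresses and one public address."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 50000
        self.table = {}  # public port -> (private ip, private port)

    def outbound(self, private_ip, private_port):
        """Rewrite an outgoing connection so it appears to come from the router."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def inbound(self, public_port):
        """Look up which computer behind the router a reply belongs to."""
        return self.table[public_port]


router = HomeRouter("203.0.113.7")
you = router.outbound("192.168.1.10", 51000)
sibling = router.outbound("192.168.1.11", 51000)

print(you)                    # ('203.0.113.7', 50000)
print(sibling)                # ('203.0.113.7', 50001)
print(router.inbound(50001))  # ('192.168.1.11', 51000)
```

From the website's side, both connections come from the same IP address, which is why it cannot tell you and your sibling apart by address alone.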

But enough on these lower level details. Let's take a look at how IP addresses sometimes come up perhaps in the media and how we can now start to ruin, frankly, even more shows for you. If we could dim the lights for a few seconds. 

[VIDEO PLAYBACK] 

-It's a 32-bit IPv4 address. 

-IP, as in internet-- 

-Private network, Tamia's private network. She's so amazing.

-Come on Charlie.

-It's a mirror IP address. She's letting us watch what she's doing in real time. 

[END PLAYBACK] 

DAVID J. MALAN: OK. So a few problems with this. So one, what we're looking at here on the screen is code written in a language called Objective-C, which is kind of a successor to the C language that we're doing. This has absolutely nothing to do with programming. In fact, as best I can tell, this is a drawing program that someone downloaded from the internet somehow involving crayons. 

Perhaps less egregious is that this IP address, valid or invalid? 

AUDIENCE: Invalid. 

DAVID J. MALAN: Invalid, because 275 is, of course, not between 0 and 255. That too is probably OK though, because you don't want a bunch of crazy people who are like pausing TV on their TiVos and then visiting the IP to see if there's actually something there. So that one's a little less egregious. But realize that too is sort of all around us. 
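That validity rule is easy to express in code. Below is a minimal Python sketch; the function name is made up, and the sample address containing 275 is hypothetical rather than the exact one shown on screen.

```python
def is_valid_ipv4(address):
    """True if the text is four dot-separated numbers, each between 0 and 255."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each piece must consist of plain ASCII digits only.
        if not (part.isascii() and part.isdigit()):
            return False
        if not 0 <= int(part) <= 255:
            return False
    return True


print(is_valid_ipv4("130.132.1.1"))   # True
print(is_valid_ipv4("275.3.6.28"))    # False: 275 does not fit in eight bits
print(is_valid_ipv4("1.2.3"))         # False: only three numbers
```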

So of course, none of us ever really type numeric addresses into our browsers. It would be kind of a bad thing if Google, to visit Google, you had to go to 123.46.57.89. And the whole world had to just remember that. And frankly, we've kind of seen this issue before. Back in the day when people didn't have cell phones and contact lists, and companies actually still-- actually, I guess companies still have 800 numbers and the like-- but you generally see numbers advertised as 1-800-COLLECT, C-O-L-L-E-C-T. Because no one can really remember, when seeing an advertisement on a bus or billboard, what someone's number is, but they can probably, with higher probability, remember a word. 

So we adopted the same kind of system in the world of the internet whereby there's a domain name system so that we humans can type google.com, facebook.com, yale.edu, harvard.edu, and let the computers figure out what the corresponding IP address is for a given name. 

And the way you do this in the real world is that for $10 a year, maybe $50 a year, you can buy a domain name, or really rent a domain name. And then whoever you're paying to rent that domain name, you tell them who in the world knows what your IP address is. And we won't go into these particulars, but many of you might want, for final projects, to actually sign up for your own web hosting company, either for free or for a few dollars per month. Some of you might want to buy, for a few dollars, your own domain name, just for fun or to start a business or a personal site or the like. 

And realize that all of that will ultimately boil down to you telling the world what your server's IP address is. And then these DNS servers actually take care of informing the rest of the world. So all a DNS server has, in short, inside of its memory is like the equivalent of a Google spreadsheet or an Excel spreadsheet with at least two columns, one of which has names, like harvard.edu, and yale.edu, and google.com. And the other column has the corresponding IP address or IP addresses. And we can actually see this. So on my Mac-- and you can do this on Windows computers as well-- if I open up a terminal window here, quite like the one in CS50 IDE, most computers have a command called nslookup, name server look up. And if I type something in like yale.edu and hit Enter, what I should see if my network cooperates as it did for multiple tests before class began-- let's try google.com. Of course now nothing's working. That's great. All right, stand by for one moment. nslookup google.com. 

Well, let's see if the actual internet-- no. That's what happened. Oh my god, all right. The Wi-Fi broke. 

Hey, want to know what my IP address is? All right. YaleSecure. This is how you troubleshoot things as a computer scientist. We turn the Wi-Fi off. OK. 

And actually, Scaz, do you mind logging us into the secure one? Otherwise more tests are-- OK, thank you Yale-- or is about to break. I want to go on YaleSecure. Oh, and maybe we'll be OK. Maybe we're back. And that's how, as a computer scientist, you fix a computer. [APPLAUSE] All right. So where I was within this so-called terminal window, and if I do nslookup yale.edu, there we go. So I get back first the IP address of the DNS server that my laptop is using. So in addition to a DHCP server that we talked about a moment ago telling my laptop what my IP address is, that DHCP server also tells me what DNS server to use. Otherwise I would have to manually type this in. 

But that's not all that interesting. What I care about is that this is the IP address of Yale's website apparently. So in fact, let's try this. Let me go up into a browser and go to http://, and then that IP address, and hit Enter. And let us see. That is how else you can visit Yale's websites. Now it's not all that memorable. Like, the pre-frosh probably aren't going to remember this particular address if told to visit there after visiting. But it does seem to work. And so DNS really just allows us to have much more human friendly addresses. But they don't necessarily just yield one answer. 

In fact, when you're a really big tech company, you probably want to have lots of servers. And even this is misleading. So Yale probably doesn't have just one web server. Google probably doesn't have just 10 or so web servers. Google especially probably has thousands of web servers around the world that can respond to requests from people like us. 

But they also use a technology called load balancing, which long story short, has just a few devices in the world spreading the load across more servers. So it's kind of like a spider web if you will dispatching the requests. But for now, all that's interesting for today is that a domain name like google.com even can have multiple IP addresses like that. 
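The two-column "spreadsheet" picture of DNS can be sketched as a Python dictionary. Everything in the table below is invented for illustration: the names use the reserved `example` domains and the addresses come from ranges set aside for documentation. The second function shows how a real lookup could be made with the standard `socket` module; it is defined but not called here because it needs a working network connection.

```python
import socket

# Toy DNS table: column one is the name, column two is one or more IP addresses.
dns_table = {
    "example.edu": ["192.0.2.10"],
    "example.com": ["198.51.100.1", "198.51.100.2", "198.51.100.3"],
}


def toy_lookup(name):
    """Return every address listed for a name, or an empty list if unknown."""
    return dns_table.get(name.lower(), [])


def real_lookup(name):
    """Ask the system's configured DNS server; requires network access."""
    _hostname, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


print(toy_lookup("example.edu"))       # ['192.0.2.10']
print(len(toy_lookup("EXAMPLE.COM")))  # 3: one name, several addresses
print(toy_lookup("nowhere.example"))   # []
```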

But how does all of our data actually get back and forth then in the end? Well, it turns out that there's these things called routers on the internet. And what is a router to the extent that you know already? And I've used the word a couple times in the context of a home, but in simple terms, what does a router do? Give me just a guess based on its name? 

AUDIENCE: So a road or a path?

DAVID J. MALAN: So it's a road or a path. So a route is a road or path, absolutely. And a router, so a device that actually routes information, would move data between points A and B. 

And so in fact-- and this is perhaps when you Google depictions of routers on the world, all you get are cheesy marketing diagrams. And so this is sort of the most representative one I could find that looked mildly interesting. Each of these dots or glimmers of hope around the world represents a router. And each of them has a line between some other router. 

Because indeed, there are thousands, probably millions of routers around the world, some of which are in our homes and on our campuses, but a lot of which are owned by big companies and are interconnected so that if I want to send some data from here at Yale back home to Cambridge, Yale probably doesn't have a single cable, certainly, going directly to Harvard. And Yale doesn't have a single cable going to MIT, or to Stanford, or to Berkeley, or to Google, or any number of destinations. 

Rather, Yale, and Harvard, and everyone else on the internet does have one or more routers connected to it, maybe on the periphery of campus. So that when my data wants to leave Yale's campus, it goes to that nearest router, as depicted by one of these dots. And then that router figures out whether to send it this way, or this way, or this way, or this way based on another table in its memory, another Excel file or Google spreadsheet that in one column says, if your IP address starts with the number one, go this way. If your IP address starts with a number two, go that way. And so you can break it down numerically to have the router sending data every which way. 
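That table-in-memory idea can be sketched in Python as follows. This mirrors the simplified description above, keyed on the first number of the address; real routers match on binary prefixes of the address rather than on the first decimal number. The link names and the default route are hypothetical.

```python
# Toy routing table: first number of the destination address -> outgoing link.
ROUTES = {
    "1": "link A (this way)",
    "2": "link B (that way)",
}
DEFAULT_ROUTE = "link to upstream router"  # used when no row matches


def next_hop(destination):
    """Pick the outgoing link for a packet based on its destination address."""
    first_number = destination.split(".")[0]
    return ROUTES.get(first_number, DEFAULT_ROUTE)


print(next_hop("1.2.3.4"))       # link A (this way)
print(next_hop("2.200.0.9"))     # link B (that way)
print(next_hop("140.247.0.1"))   # link to upstream router
```

Each router along a path makes only this local decision and passes the data on, which is why no single cable from one campus to another is needed.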

And we can kind of see this as well. Let's go ahead into this terminal window again, and let me go ahead and trace the route to, let's say, www.mit.edu, which is a couple hundred miles away. That was really damn fast. 

So what just happened? So in just seven steps, and in just four milliseconds, I sent data over the internet from here at Yale to MIT. Each of these rows, you can perhaps guess now represents what? 

AUDIENCE: A router. 

DAVID J. MALAN: A router. So indeed, it looks like there's about seven or so routers, or six routers in between me physically in Yale's law school here and MIT's website over there. And what we can glean from this is as follows-- and let me clean it up. I'm going to rerun it with a command line argument of -q 1 to just say, just give me one query. By default, traceroute does three. And that's why we saw bunches of numbers. I want to see fewer numbers just to keep the output cleaner. And let's see what happens. 

So for whatever reason, someone at Yale thought it would be funny to call your default router arubacentral, which is on vlan or virtual LAN, virtual local area network 30-- so you probably have at least 29 others-- router.net.yale.internal. And .internal here is kind of a fake top level domain meant to be used just on campus. And notice the corresponding IP address of that router, wherever it is here on campus, is 172.28.204.129. And it took 36 milliseconds to go from here to there. 

Funny story. We'll get back to that in just a moment. But now the second router-- to which arubacentral apparently has some kind of physical connection most likely-- the humans didn't bother naming it. The Yale humans didn't bother naming it because it's inside of your network it seems. And so it just has an IP address. 

But then a third router here on Yale's network that's probably a little farther away still is called cen10g whatever that is asr.net.yale.internal. And it too has an IP address. 

Now why are these numbers kind of fluctuating? 2.9, 1.4, 36? Routers get busy. And they get congested and backed up. There's thousands of people on this campus using the internet right now. There's a hundred people in this room using the internet right now. 

And so what's happening is that the routers might get congested. And so those times might fluctuate a little bit. So that's why they don't necessarily increase straightforwardly. 

But things get kind of interesting in step four. Apparently between Yale and step four is another hop. And where is the router in step four probably? 

AUDIENCE: [INAUDIBLE] 

DAVID J. MALAN: JFK maybe, maybe at the airport. But for whatever reason, system administrators, so geeks that run servers for years have named routers after the nearest airport code. So JFK probably means it's just somewhere in New York, maybe in Manhattan or one of the boroughs. nyc2 denotes, presumably, another router that's somewhere in New York. 

I don't quite know where row six is here, router number six. qwest.net is a big ISP, internet service provider, that provides internet connectivity to big places like Yale and others. And then this last one, it looks like MIT doesn't even have their own website in Cambridge necessarily, but rather they've outsourced their website, or at least the physical servers, to a company called Akamai. And Akamai actually is right down the road from MIT in Cambridge it turns out. 

But realize too that even though you're going to www.mit.edu, we could really be sent anywhere in the world. 

And let's see somewhere else in the world. Let me go ahead and clear this screen and instead trace the route, just once, so query one, to www.cnn.co.jp, the Japanese home page for CNN, the news site. And if I hit Enter now, let's see what happens. We're again starting at arubacentral. We're then going to the nameless router, a few more. So it took 12 hops to get to Japan this time. And let's see what we can glean. 

So same hop, same hop. Slightly different now. This one's interesting. So I'm guessing here, stamford1 is a few towns away in Connecticut also. These routers in row six and seven don't have names. But this is kind of amazing. 

So what seems to be between the routers in step seven and eight? And why do you say as much? Yeah? 

AUDIENCE: Ocean. 

DAVID J. MALAN: Probably an ocean. We know that's true like, intuitively, right? But we can confirm as much kind of sort of empirically why? What has changed between rows seven and eight? 

It took a lot more time to go to whatever this nameless router seven is, probably somewhere in the continental US, to step eight, which is probably somewhere in Japan based on the domain name of .jp there. And so those additional hundred something milliseconds or 90 or so milliseconds is the result of our data going over a pretty large body of water. 
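The reasoning above, looking for the hop where the time suddenly jumps, can be sketched in Python. The round-trip times below are invented for illustration and are not the numbers shown on screen in lecture; `biggest_jump` is a hypothetical helper name.

```python
# Hypothetical round-trip times in milliseconds, one per hop.
# These values are made up; they are not the lecture's actual output.
hop_times_ms = [36.0, 2.9, 1.4, 4.1, 9.8, 10.2, 11.0, 105.3, 106.0]


def biggest_jump(times):
    """Return (hop_number, increase_ms) for the largest rise over the prior hop.

    Hop numbers are 1-based, matching the rows of traceroute output.
    """
    jumps = [(i + 1, times[i] - times[i - 1]) for i in range(1, len(times))]
    return max(jumps, key=lambda pair: pair[1])


hop, increase = biggest_jump(hop_times_ms)
print(f"Largest jump arrives at hop {hop}: +{increase:.1f} ms")
```

With these made-up numbers the largest increase lands on hop eight, which is the kind of signal that suggests a long physical link such as an undersea cable.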

Now curiously, it seems that maybe that cable goes across the whole US. If we're actually going over the West Coast to get to Japan, it's kind of the long way if we go the other way. So it's not entirely clear what's going on physically. But the fact that every additional hop indeed took markedly longer than every other, it's pretty good confirmation that CNN's Japanese web server is probably indeed in Japan. And it's certainly farther away than MIT has been. And it's worth noting too, your data is not necessarily going to travel the shortest possible distance. In fact, if you play around with traceroute at home just picking random websites, you might find that just to send an email or to visit a website that's here in New Haven, sometimes your data might first take a detour, go down to DC, and then come back up. And that's just because of the dynamic routing decisions that these computers are making. 

Now just for fun, the production team trimmed one of these videos for us to just be a little more succinct. But to give us a quick sense here-- and we can leave the lights on-- as to just how much cabling is actually carrying all of our data. 

[VIDEO PLAYBACK] 

[MUSIC PLAYING] 

[END PLAYBACK] 

DAVID J. MALAN: All networking videos have cool sounding music apparently. So that's to get just a sense of just how much has been going on underneath the hood. 

But let's look at a slightly lower level now at what data is actually traversing those lines, and even going wirelessly in a room like this. 

So it turns out when you request a web page, or send an e-mail, or receive a web page, or an e-mail, or a Gchat message, or a Facebook message, or the like, that is not just one big chunk of bits flowing wirelessly through the air or electronically on a wire. Rather, that request or response is generally chunked up into separate pieces. 

So in other words, when you have a request to make of another computer, or you get back a response from another computer-- like suppose, for instance, if unfamiliar-- as too many people seem to be these days-- if unfamiliar with this-- not this fellow-- this fellow. So suppose this is a message that I want to send to someone in back. Who in the very back would like to receive a picture of Rick Astley today? OK, what's your name? 

AUDIENCE: Cole. 

DAVID J. MALAN: What is it? 

AUDIENCE: Cole. 

DAVID J. MALAN: Holt? H-O? 

AUDIENCE: C-O-L-E. 

DAVID J. MALAN: C-O-L-E, Cole. Sorry. C-O-L-E. All right. So if I want to send Cole this picture here, you know this is kind of a big picture, right? This could be a few kilobytes, a few megabytes, especially if it's high resolution. And I don't really want to stop everyone else from using the internet just while I send this really big, high quality picture of Rick Astley throughout the room. I'd like your data to continue to traverse the network and the Wi-Fi as well. 

And so it makes sense-- and this is recoverable electronically, not so much in the real world. Actually, this is going to have multiple meanings if you take my audio out. So if I tear this in half like this here, this now can travel the internet more efficiently, because it's a smaller piece. So with lower probability is it going to collide with someone else's traffic on the internet. 

And so what your computer indeed does when you want to send a message to Cole is it chunks up a message like this into smaller pieces, fragments so to speak. And then it puts them inside of what we'll call sort of virtual envelopes. 

So I have four paper envelopes here. And I've pre-numbered them, one, two, three, and four. And what I'm going to do on the front of this, just like a normal mailing, is I'm going to put Cole's name there. And then at the top, I'm going to put my name there, David, so that the first such packet I'm sending out there on the internet looks a little something like this, the salient characteristics of which are that it has a to address, a from address, and also a number, so that that hopefully is sufficient information for Cole to reconstruct this message. 

So let me do the same here, the same here, and the same here, writing his name in the To field on all of them. And then let's go ahead and put these pictures inside. 

So here is one packet that's ready to go. Here is another packet that's ready to go. Here is a third packet that's ready to go. And here is a fourth packet that's ready to go. 
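The envelope idea can be sketched in Python: split a message into numbered pieces, each carrying a to address, a from address, and a number, then put them back in order on arrival. The `Packet` class and the `fragment` and `reassemble` functions are names invented for this sketch, and it assumes a non-empty message and a positive number of pieces. Real IP packets carry more fields than this.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    to: str         # recipient (in practice an IP address, not a name)
    sender: str     # "from" is a Python keyword, so this field is named sender
    number: int     # position of this fragment, starting at 1
    total: int      # how many fragments make up the whole message
    payload: bytes  # this fragment's share of the data


def fragment(message: bytes, to: str, sender: str, pieces: int) -> list[Packet]:
    """Split a non-empty message into at most `pieces` roughly equal packets."""
    size = -(-len(message) // pieces)  # ceiling division
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(to, sender, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks, start=1)]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message regardless of the order the packets arrived in."""
    ordered = sorted(packets, key=lambda p: p.number)
    return b"".join(p.payload for p in ordered)


packets = fragment(b"a picture of Rick Astley", "Cole", "David", 4)
shuffled = [packets[2], packets[0], packets[3], packets[1]]
print(reassemble(shuffled))  # b'a picture of Rick Astley'
```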

And now what's interesting about how the internet in reality works is that even though I've got four packets, all of which are destined for the same location, they're not necessarily going to traverse the same route. And so even though I might hand these packets off to the nearest router let's say, if you would like to send them every which way, let's see what actually happens, the goal of which is to get them ultimately to Cole. And indeed, they're already not necessarily taking the same direction. And that's fine. This is a little awkward and Oprah style today. 

And now let me deliberately take that one back. And now Cole, if you'd like to reassemble it as best you can. Of course, we can all guess what the conclusion here is going to be. You're going to have 3/4 of Rick Astley in just a moment. And what though is the implication of that? You want to try to hold it up? We do have one camera pointed at you if you'd like to pose with Rick Astley over here. There we go. Lovely. 

But you seem to be missing a fragment of Rick Astley. So it turns out that the internet is generally driven by not just IP, but in fact we heard at the very beginning of lecture in that video-- and you've probably seen this acronym more often-- what really is the protocol you tend to hear about? 

AUDIENCE: TCP/IP. 

DAVID J. MALAN: TCP/IP, which is just a combination of two protocols, one called IP. Which again, is just the set of conventions via which we address every computer in the internet. And then TCP, which serves another purpose. 

TCP is a protocol that you typically use in conjunction with IP, that among other things, guarantees delivery. In fact, TCP is the protocol that would notice that one of the packets apparently didn't get to Cole, because he seems to be missing number four out of four. And so what TCP, a protocol does, is it tells Cole, hey Cole, if you receive only three out of four packets, tell me which one you are missing, essentially, and then my purpose in life should be to retransmit that. 

And so if I too, the sender, am using TCP, I should then create a new packet-- not this wrinkled one here-- retransmit just this piece of it, so that ultimately Cole has a complete souvenir, if nothing else. But so that ultimately the data actually gets to its correct destination. 
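As a rough sketch of that idea in Python, the receiver can work out which numbered envelopes never arrived and ask for only those again. This is a deliberate simplification with an invented helper name, `missing_numbers`; real TCP numbers bytes rather than whole envelopes and relies on acknowledgments and timers.

```python
def missing_numbers(received: set[int], total: int) -> list[int]:
    """Return the packet numbers in 1..total that have not arrived yet."""
    return [n for n in range(1, total + 1) if n not in received]


# Cole holds envelopes 1, 2, and 3 of 4, so only number 4 needs resending.
to_retransmit = missing_numbers({1, 2, 3}, total=4)
print(to_retransmit)  # [4]
```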

But unfortunately, writing Cole's name on the front isn't sufficient, per se. And really, I wouldn't write Cole's name, but probably his IP address on the envelope. And I wouldn't write David. I'd write my IP address on the envelope so that the computers can actually communicate back and forth. But it turns out that computers can do way more than serve up pictures of Rick Astley. They can also send and receive emails, chat messages. They can do things like file transfers, and any number of other tools you use on the internet, servers can do these days. 

And just because a company, or a school, or a person wants to have a web server, and an email server, and a chat server, does not mean you need three computers. You can have just one computer running multiple services, so to speak. 

And so when Cole receives a message like that, how does his computer know whether to show that picture in his browser, or in Gchat, or in Facebook Messenger, or in any number of other tools? 

So it turns out also on that envelope is an additional piece of information known as a port number. And a port number is just a number indeed, but it uniquely identifies not the computer, but the service. And there's bunches of these. So it turns out that in the world, humans have decided on a few such conventions, some of which are these. So there's something called File Transfer Protocol. It's pretty dated. It's completely insecure. A lot of people still use it. And it uses port number 21. In other words, if sending a file via FTP, the envelope would have not only the sender and the receiver's IP address, it would also have the number 21 so that the receiving computer knows oh, this is a file, not an email or a chat message. 

25 is SMTP. How many of you have ever used SMTP? Wrong. Almost all of you have. If you've ever used email, you've used SMTP, simple mail transfer protocol, which is just a fancy way of saying, this is the type of computer or service that sends your email outbound. 

And if you've ever seen acronyms like POP, or IMAP, and there's a few others, those are for receiving email, typically. That just means it's a different service. It's software that someone wrote that sends to or listens on a specific port number so that it doesn't confuse emails with some other type of data. 

Now the web is HTTP, which is number 80, and also port 443. And in fact, even though we humans fortunately don't have to do this, any time you visit a website like http://www.yale.edu, the browser is just being kind of helpful in that it's assuming that you want numeric port 80. We already know that DNS can figure out what the IP address is of www.yale.edu. But the computer is just going to infer that you want port 80 because you're using Chrome, or IE, or some other browser. But I could technically do colon 80. And then I can explicitly tell my browser, send a packet or more of information to www.yale.edu requesting today's home page. But specifically, address it to Yale's IP at port 80 so that I actually get back Yale's web server. 

Now it immediately disappears because browsers just decide that we don't need to confuse humans by having yet more arcane information like colon 80. And frankly, browsers like Chrome don't even show you HTTP anymore, or the colon, or the slash slash, or the trailing slash, in some sense because they're trying to make things simpler for users. In another sense, it's just kind of a user experience thing-- let's get rid of some of the clutter. But it's hiding some of these underlying details. 

And in fact, none of us probably ever type http anymore. You just type in something like www.harvard.edu. And again, Chrome infers that you want HTTP. But there are other protocols that we could certainly be using. 
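The way a browser fills in the port when the URL leaves it out can be sketched with Python's standard `urllib.parse` module. The `DEFAULT_PORTS` table and the `port_for` helper are names made up for this sketch; the port numbers themselves are the conventional ones mentioned above, with 443 being the port used for HTTPS.

```python
from urllib.parse import urlsplit

# Conventional port numbers for a few protocols mentioned in lecture.
DEFAULT_PORTS = {"ftp": 21, "smtp": 25, "http": 80, "https": 443}


def port_for(url: str) -> int:
    """Return the explicit port in the URL, else the default for its scheme."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not spell out a port.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(port_for("http://www.yale.edu/"))     # 80
print(port_for("http://www.yale.edu:80/"))  # 80
print(port_for("https://www.yale.edu/"))    # 443
```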

So given all of this, if you now sort of put on the so-called engineering hat, how do things called firewalls work? So you're probably generally familiar with the firewall, not so much in the physical sense. So back in the day, and still to this day, if you've got like strip malls for instance that have a lot of stores, generally the walls in between individual stores or shops are firewalls in the sense that they have special insulation so that if a fire breaks out in one shop, it doesn't necessarily spread to the shop next door. 

The computer world also has firewalls that do something different. What does a firewall do? Yeah? 

AUDIENCE: Basically they cut off connection if they encounter something like, for example, they have number of id statements. And if something happens, they cut the connection. Like if this malicious attack [INAUDIBLE] your computer, or-- 

DAVID J. MALAN: OK good. Yeah, and in fact you're even going a little farther in describing something that might be called an intrusion detection system, or IDS for short, whereby you actually have rules defined. And if you do start to see suspicious behavior, you try to put an end to it. 

And a firewall, frankly, at a networking level, is even dumber and simpler than that, generally. And there's different types of firewalls in the world. But the ones that operate at the level we're talking today-- IP and TCP-- work even more straightforwardly. 

For instance, if you were Yale system administrators, or Harvard system administrators, or some Big Brother at some company, and you wanted to prevent all of your students or all of your employees from going to facebook.com, all you have to do is make sure that all of their network traffic, first of all, goes through a special device. Let's call it a firewall. 

And that's fine, because you can make your router the same thing as a firewall if you put the same kind of software on the same machine. So if all of your students or employees traffic is going through this central firewall, how would we block people from going to facebook.com, for instance? What would the system administrator have to do? Anyone else? Let's try to go around. 

AUDIENCE: [INAUDIBLE] 

DAVID J. MALAN: Say that again? 

AUDIENCE: It should just get caught up inside the system. So just put Facebook into 127.0.0-- 

DAVID J. MALAN: Oh, interesting. So you can actually then hack your DNS system. This is indeed a way you could do this whereby any time a Yale student pulls up www.facebook.com, all of us here today on campus are using Yale's DNS server, because Yale's DHCP server gave us that address. So yeah, you could kind of break things or break convention by just saying, yeah, facebook.com's address is fake, is 1.2.3.4, which is not actually legitimate. Or maybe it's 278. whatever was in the TV show a moment ago so that none of us can actually visit facebook.com. 

So suppose Yale did that. Suppose Yale wanted to keep you out of facebook.com. And therefore, they changed the DNS settings to give you a bogus IP address for facebook.com. How do you respond? Technically, not-- oh, now everyone wants to participate. OK, yeah. 

AUDIENCE: You just type in the actual IP address of Facebook. 

DAVID J. MALAN: OK, good. So we could just type in the actual IP address of Facebook, much like I did with Yale's website. And if the Facebook server is configured to support that, it should indeed work. It's a minor pain in the neck, because now we have to remember some random 32-bit value, but that could work. What else could you do? Yeah. 

AUDIENCE: You could change those settings [INAUDIBLE]. 

DAVID J. MALAN: Yeah, you could even change your DNS settings. So in fact this is actually pretty useful, frankly, if you're in an airport, or if you're in a cafe, or something that has flaky internet whereby sometimes the DNS server just stops working. So even I occasionally do this, not for malicious, I want to use Facebook purposes, but really because I seem to have a network connection, but nothing is working. And so one of the first things I try-- and you can do this on Windows too-- but on my Mac, if I go to Network. And I choose my Wi-Fi connection. And I go to Advanced. And I go to DNS. These are the three IP addresses that Yale is giving me for three DNS servers. The purpose then is for me to try any one of these to resolve addresses. 

But I can override these by doing a plus. And anyone want to propose a DNS server? 

AUDIENCE: 8.8.8.8? 

DAVID J. MALAN: Oh, you're amazing. Yes, 8.8.8.8. So Google, bless their hearts, bought the IP address 8.8.8.8, because it kind of looks like Gs probably, and it's easy to remember. But indeed, now I have configured my computer to use Google's DNS server. 

So now if I go to yale.edu, it's still going to work. But I'm not using Yale's DNS servers anymore. And if I go to facebook.com, all of those look ups are going to go through Google. 

So on the one hand, I've cleverly circumvented the local system administrators just by understanding how networking works. But I'm paying a price. Nothing is free. What have I just given up? What have I just given up? All of you clever people who have been using 8.8.8.8, because it's cool or solves problems, what have you been doing all this time? 

AUDIENCE: Traveling farther? 

DAVID J. MALAN: Maybe traveling farther, because Google's probably not quite as close as the server down the street. But more worrisomely. Yeah? 

AUDIENCE: So now Google knows where you're going. 

DAVID J. MALAN: Google knows literally every website you are visiting, because you are literally asking them, hey Google, can you translate yale.edu for me? Or hey Google, can you translate this other website address for me into an IP address. And so they're-- I have no idea what you're talking about. And so they know everything about you. So realize that this is a free service with a purpose from their perspective as well. But it can certainly get you out of a bind. 

Now just to address one other issue that often comes up among students, especially when traveling internationally in certain countries like China, where there indeed is a Great Firewall of China whereby the government there blocks quite a bit of traffic at different levels. You don't have to just block traffic at the level we're talking here, DNS or otherwise, you can block it at other levels. 

And in fact, just to be clear, a firewall can operate even more simply than just having the system administrators change DNS settings. A firewall, a device in between us and the rest of the world, could just block any outgoing requests to the IP address for Facebook on port 80, or the IP address for harvard.edu, or the IP address of anything. So a firewall can look at your envelopes' IP addresses and even port numbers, and if Yale wanted to, it could just stop all of us from even using FTP anymore, which would probably be a good thing because it is indeed an insecure protocol. Yale could even stop us from visiting the entirety of the web just by blocking all port traffic on number 80 as well. So that might be another way. And there's even fancier ways as well. 
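A firewall of this simple kind amounts to a list of rules checked against each envelope's destination address and port. The Python sketch below is an assumption-laden illustration: the `Rule` class, the `BLOCKED` list, and the `allowed` function are invented names, and 203.0.113.10 is a placeholder from an address range reserved for documentation, not Facebook's actual address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    dest_ip: Optional[str]    # None means "any destination address"
    dest_port: Optional[int]  # None means "any destination port"


# Hypothetical block list. The IP address is a documentation placeholder.
BLOCKED = [
    Rule(dest_ip="203.0.113.10", dest_port=80),  # one site's web traffic
    Rule(dest_ip=None, dest_port=21),            # all FTP, to any address
]


def allowed(dest_ip: str, dest_port: int) -> bool:
    """Return False if any rule matches the packet's destination."""
    for rule in BLOCKED:
        ip_matches = rule.dest_ip is None or rule.dest_ip == dest_ip
        port_matches = rule.dest_port is None or rule.dest_port == dest_port
        if ip_matches and port_matches:
            return False
    return True


print(allowed("203.0.113.10", 80))   # False
print(allowed("203.0.113.10", 443))  # True
print(allowed("198.51.100.7", 21))   # False
```

Note that the second call is allowed: a rule that names only port 80 does nothing about port 443, which is one reason real firewall rule sets grow quickly.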

But when you're traveling abroad for instance, or if you're in an internet cafe, or if you're anywhere where there's blockages or threats, what can you do? Well, if you go down the street to Starbucks or you travel in an airport, generally you can just hop on the Wi-Fi by choosing like, JFK Wi-Fi or LaGuardia Wi-Fi, or Logan Airport Wi-Fi, or what not. And it's not encrypted, right? There's no padlock icon. And you're probably not prompted for a username and password. You're just prompted with some stupid form to say like, I agree to use this only for 30 minutes, or something like that. 

But there's no encryption between you and Starbucks' Wi-Fi access point, the things with the antennas on the wall. There's no encryption between you and the airport's Wi-Fi signals. 

And so technically, that creepy person sitting a few seats down from you in Starbucks or at the airport could be, with the right software, watching all of your wireless traffic on his or her laptop. It's not that hard to put a laptop into what's called promiscuous mode, which as the name suggests, means you're kind of loose with the rules. And it just listens not only for traffic meant for it, but also to everyone else's traffic within range. 

And by that logic, it can see all of the packets of information you're receiving. And if those packets aren't encrypted, you are putting yourself at risk of your emails, or your messages, or anything else getting exposed. 

So even if you're not abroad but you're just in Starbucks, or you're on some random person's Wi-Fi that's not encrypted, a VPN is a good thing. A VPN is a virtual private network. And it's a technology that allows you to have an encrypted, a scrambled connection-- fancier than Caesar or Vigenere-- between your laptop, or your phone, or your desktop, and a server elsewhere, like a server on Yale's campus. 

And if you're traveling abroad-- and in fact, you find this in hotels all the time. And especially as aspiring computer scientists where you guys might, as geeks, want to use ports other than 80, and ports other than 443-- and in fact for problem set six, we are going to play with multiple TCP ports just by choice-- a lot of hotels, and shops, and networks just block that kind of stuff because they somewhat naively, or ignorantly, just think that no one needs those other ports. 

And so by using a VPN can you circumvent those kinds of restrictions, because what a VPN does is it allows you at Starbucks, or the airport, or anywhere in the world to connect encryptedly to yale.edu, to some server here on campus, and then tunnel, so to speak, all of your traffic from wherever you are through Yale, at which point it then goes to its final destination. 

But by encrypting it, you avoid any of these kinds of filters or the imposition that some local network has imposed. And plus, you have a much more robust defense against creepy people around you who might be trying to listen in on your traffic. There could still be creepy people here back home at Yale watching your traffic as it comes out of the VPN, but at least you've pushed the threat farther away. And it's here too, a trade off. 

Now of course, if you are in China or even in the cafe, and you're tunneling all your traffic through Yale, what price are we paying perhaps? 

AUDIENCE: Speed. 

DAVID J. MALAN: Speed, right? There's got to be some math or some fanciness involved in the actual encryption. There could be thousands of miles of distance or thousands of miles of cables between you and Yale. And it's really bad if you're in China, for instance, and you want to visit a website in China. And so your data is going to the US, and then back to China just because you're encrypting it through this tunnel. 

But it solves technical and work problems alike. But it all boils down to these very simple ideas. And Harvard, for those curious, has one here as well, at vpn.harvard.edu, which operates just like Yale's. 

So with all that said, why is this whole network useful? And what can we start doing with it? Well, let's make this now more real. This is the acronym with which most of us are probably super familiar-- HTTP-- which stands for hyper text transfer protocol. And this just means this is the language, the protocol that web browsers and web servers speak. 

The P in HTTP is indeed a protocol. And a protocol is just a set of conventions. We've seen IP-- internet protocol-- TCP-- transmission control protocol-- and HTTP. But what is this stupid thing of a protocol? It's just a set of conventions. 

So if I sort of come down here, and I want to greet you. I would say hi, my name is David. 

AUDIENCE: Luis. 

DAVID J. MALAN: Luis. We have this stupid human convention of shaking hands here. But that's a protocol, right? I extended my hand. Luis extended his hand. We did this. And then complete, done. 

And that's exactly the same spirit of a computer protocol where as in HTTP, what happens is this. If you are the computer on the left here, and there is some web server there on the right. And the computer on the left wants to request information from that server. It's kind of a bi-directional operation. The browser on the left asks for some web page. The server on the right responds with some web page. And we'll see what form those take in just a moment. 

And it turns out that those computers-- that browser and server, or client and server, so to speak. Much like a restaurant where the client is asking for something, and the server is bringing him or her something-- get is kind of the operative word. Literally inside of the envelope that my browser sends from here to a web server is the word get. Like I want to get today's news. I want to get my Facebook news feed, or I want to get some page from the server. 

Specifically, this is what's going on inside of that envelope. So I, with Cole, essentially sent Cole a response. If you imagine that Cole actually wanted a picture of Rick Astley, he might have sent me a request similar in spirit to this. Inside of his envelope to me, where I'm now playing the role of Google, would be a request that literally says, get, and then a forward slash-- and you've probably seen forward slashes in URLs before. It just means give me the default page, the default Rick Astley picture in this case. 

And by the way, Cole speaks the language HTTP version 1.1, or the protocol 1.1. And it turns out there's an older version 1.0. But computers tend to use 1.1. 

The second line is a useful thing that we'll come back to perhaps before long. But it's just a specification to me, the recipient, that the thing I want is www.google.com. Because it's very possible these days for dozens, hundreds of websites with different domain names to all live on the same server. It's not going to be true so much in Google's case. But in a smaller company's case, could absolutely be. So Cole is just kind of putting in the envelope, by the way, when this reaches your IP address on port 80, just be sure that you know I want www.google.com, not some other random website on the same server. 
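The two lines just described can be written out as the literal text a browser puts inside the envelope. The Python sketch below builds that text; `build_request` is a helper name made up here, and real browsers send several more header lines than these two.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes.

    Each line ends with a carriage return and newline, and a blank line
    marks the end of the request.
    """
    lines = [
        f"GET {path} HTTP/1.1",  # the method, the page wanted, the version
        f"Host: {host}",         # which website on this server is wanted
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("www.google.com").decode("ascii"))
```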

What I then respond to Cole with, at the end of the day, is a picture. But atop that picture inside of the envelope is actually some text, where I say, OK. I speak HTTP version 1.1 also. 200. Which is a status code that most of us have probably never seen, because it means OK. And this is good, because it means I am responding successfully to Cole's request. 

What numbers have you probably seen on the web that are not OK? 

AUDIENCE: 404. 

DAVID J. MALAN: 404-- file not found. So indeed, any time you've seen one of those annoying file not found errors, because the web page is dead, or because you mistyped a URL, that just means that the little envelope that your computer received from the server contained a message HTTP 1.1, 404-- not found. That file or that request you made is not found. 

Moreover, inside of the envelope typically is this line, content type. Sometimes it's HTML, something we'll soon see. Sometimes it's a JPEG. Sometimes it's a GIF. Sometimes it's a movie file, an audio file, any number of things. So inside of the envelope is just a little hint as to what I am receiving. There's other status codes too, some of which we'll explore in P set six, and you'll stumble across in P set seven and/or eight. But some here, like 404 we've seen. Forbidden, 403, means like the permissions are wrong, like you haven't kind of configured it correctly. 301 and 302, we rarely see visually. But they mean redirect. Any time you've gone to one URL and you've been magically sent somewhere else, that's because the server has sent back an envelope containing the number 301 or 302, and the URL that it wants your browser to go to instead. 

500 is horrible. You'll see it before long, probably in P set six or P set seven. And it generally means there's some bug in your code, because indeed we'll be writing code that responds to web requests. And you've just got some error in logic or syntax, and the server can't handle it. 
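The text at the top of a response can be pulled apart in a few lines of Python. In this sketch `parse_head` is an invented helper and the example response is made up for illustration; the status phrases printed at the end come from Python's standard `http.HTTPStatus` enumeration.

```python
from http import HTTPStatus


def parse_head(head: str):
    """Split a response head into (status code, reason, headers dict)."""
    lines = head.split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# A made-up response head, similar in spirit to the one described above.
example = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
code, reason, headers = parse_head(example)
print(code, reason, headers["content-type"])  # 200 OK text/html

# The status codes mentioned in lecture, with their standard phrases.
for number in (200, 301, 302, 403, 404, 500):
    print(number, HTTPStatus(number).phrase)
```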

So let's see how we can now leverage and understand these requests as follows. If I go to, let's say, google.com. Let me go to www.google.com. And for demonstration's sake, let's see, I need to go to Settings here. I'm going to go to Search Settings. And Google has increasingly annoying features, but useful features. So Google has this thing like instant results where you start typing, and automatically things start appearing. And that's all fine and technically useful, and we'll understand before long how this works. But for now, I'm turning off instant results, because I want my browser to sort of work old school so that I can see what's going on. 

So now I'm back here. And I want to search for cats. And notice I'm seeing some suggestions, some very benign suggestions thankfully. And now if I hit Enter, let's see what happens. 

So there are some cats. And the top hit is on Wikipedia. But today we care about the technology up here. So the URL to which I've been sent is this here. And there's some stuff I don't really understand. 

So I'm going to go ahead, because I kind of know how Google works, and I'm going to distill this URL into its simplest form. And now I'm going to hit Enter again. And it still works. I have a page of results all about cats. 

But notice the simplicity of my URL. It turns out this is how much of the web works. The web is just a whole bunch of computers running software that take input. It's not get string style input. It's not command line arguments like we're used to. They take input, these web servers, by way of the URLs quite often. And any time you've searched for something, any time you've logged into Facebook, any time you've done anything interactive with a web page, what you're doing is effectively submitting a form, so to speak-- text boxes, check boxes, little circles, and whatnot that send information from you to the server. 

And it turns out that the web server knows to look at that URL and parse it, like look at it character by character looking for anything interesting after a question mark. Because after a question mark, it turns out, is going to come a bunch of key value pairs. I mean key=value. And then if there's multiple-- maybe an ampersand, some other key=value, ampersand, key=value. 

So we've kind of seen this idea before where something has a value. It's just a new format here. And I just know, by convention, Google uses q for query. And then if I want to search for dogs, I can manually search for dogs like that. And then I'm apparently getting some search results involving dogs. 
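As a rough sketch of what "looking after the question mark" might involve, the Python below pulls key=value pairs out of a URL by hand and then with the standard library's `urllib.parse`. The second pair, `foo=bar`, is invented for illustration, and this is not a claim about how Google's servers actually do it.

```python
from urllib.parse import parse_qs, urlsplit


def parse_query_by_hand(url):
    """Return the key=value pairs that follow the question mark as a dict."""
    # Everything after the first "?" is the query string.
    _, _, query = url.partition("?")
    pairs = {}
    # Pairs are separated by ampersands; each pair is key=value.
    for pair in query.split("&"):
        if pair:
            key, _, value = pair.partition("=")
            pairs[key] = value
    return pairs


url = "https://www.google.com/search?q=cats&foo=bar"
print(parse_query_by_hand(url))

# The standard library does the same job and also handles escaping.
# It maps each key to a list, since a key may appear more than once.
print(parse_qs(urlsplit(url).query))
```

Changing `q=cats` to `q=dogs` in the URL changes only the value stored under the key `q`, which mirrors editing the address bar by hand.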

So that seems to be interesting. And indeed, what's going on underneath the hood is this. Let me do this. This is a-- let's see. Let me go back over here for just a moment. 

We'll see that there's other ways to submit information. So if I'm logging into Facebook, or Gmail, or any other popular website, it seems kind of bad if whatever I typed into the search box ends up in my URL, in my browser's address bar. Why? Why is that mildly worrisome? Yeah? 

AUDIENCE: Type in a password. 

DAVID J. MALAN: Yeah. So what if what I've typed in is my password? I kind of don't want it so obviously visible in my browser's address bar. One, because my annoying roommate tends to watch over my shoulder, and he or she can now see, even though it was bullets when I'm typing it in, little circles. Now it's in my address bar. 

Moreover, what's true about stuff you tend to type in the address bar? 

AUDIENCE: [INAUDIBLE] 

DAVID J. MALAN: What's that? 

AUDIENCE: It gets sent out. 

DAVID J. MALAN: It gets sent out. And also, it gets remembered. Because the next time you type things up there, often it autocompletes and it remembers what you've typed before. And so there's this veritable history that your sibling, or your roommate, or whoever can walk through to pretty much see every website you visited because it's logged in that address bar. 

Moreover, suppose you want to upload a photo to Facebook. How in the world are you going to put a photo in a URL? 

Well it turns out you can do it in some way, but it's certainly non-obvious. And so there's this other way of sending information in an envelope, not via a GET, but via something called POST. And in theory, it looks pretty much the same. Instead of the word GET, we say POST, and then the same kind of format. 

For instance, this is a screenshot of what it might look like if I try logging into Facebook, which sends me to a file called login.php, which is actually still to this day named as such. It's the same filename Mark gave to it many years ago. It is the program he wrote in PHP via which users can login to the website. 

But you need to send some additional input. And rather than it going after the file name as it did before with cats-- q=cats-- it can go lower in the request, deeper inside of the envelope if you will where no one can see it, and where it does not end up in the user's browser bar, and therefore is not remembered for people to snoop around. 

And so here my email address and my fake password actually go. And if Facebook is using not HTTP, but HTTPS, this will all be encrypted, scrambled, a la Caesar or Vigenere, but more fancily so that no one can actually see this request. 
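To compare the two kinds of envelope side by side, here is a small Python sketch that builds the text of a GET and of a POST. The host www.example.com, the field names `email` and `password`, and the values are hypothetical stand-ins, not Facebook's actual form fields.

```python
from urllib.parse import urlencode


def build_get(host, path, params):
    """With GET, the key=value pairs ride along in the URL after a '?'."""
    return (
        f"GET {path}?{urlencode(params)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )


def build_post(host, path, params):
    """With POST, the same pairs go below the headers, in the request body."""
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # urlencode produces plain ASCII, so characters and bytes match here.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


get = build_get("www.google.com", "/search", {"q": "cats"})
# Hypothetical login form fields and values, for illustration only.
post = build_post(
    "www.example.com",
    "/login.php",
    {"email": "malan@example.com", "password": "12345"},
)
print(get)
print(post)
```

In the POST, the first line names only `/login.php`, so nothing sensitive lands in the address bar or the browser's history. Without HTTPS, though, the body still travels as readable text.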

And so indeed, any time you have a URL that starts with HTTPS, it just means it's encrypted. But at the end of the day, what's actually inside of these envelopes? This was super low level. And fortunately, we're not going to necessarily have to go so low level every time to start writing interesting software. We can start to take the ideas of week one through five, assume that there is now this infrastructure that lets us write software that operates on the web, and it's going to allow us this coming week to start looking at something called HTML. This is the stuff that is even deeper inside of the envelope, but it's the stuff we're going to start writing. And it's the stuff, more interestingly, that we're going to write programs to start generating automatically so that our websites are not hard coded, but take input and produce output. 

This is perhaps the simplest web page you can make in the world. I can indeed open up something stupid like TextEdit on my Mac, which just gives me a simple text window like this. PC users have Notepad.exe, which is very similar in spirit. 

And I can literally type out this-- DOCTYPE HTML, which looks a little cryptic. But we'll come back to that. HTML, with these weird angled brackets and slashes, inside of which now I'm going to say here comes the head of my web page. Inside of that, I just know, and you'll soon know, that I can put the title of my web page. And then below the head of the web page is going to go the so-called body of the web page. And I'm just indenting just like in C to kind of keep things nicely readable stylistically. And now I'm going to save this as a file on my desktop, called hello.html. 

And I'm going to tell it yes, use HTML. Don't change it to .txt, even though all this is a text file, just like a C program written with a text editor. Although not in CS50 IDE at the moment, just here on my Mac. 

And if I now go to my desktop, you'll see hello.html. If I double click this, it will open Chrome. And even though this file happens to live on my desktop, that is perhaps the simplest web page I could make. 

Notice that the title of the tab way up top is hello world. The body of the web page is indeed hello world. And all I've done to get to this point is implement, or is write a new language, called HTML. It's not a programming language like C. There's not going to be conditions, and loops, and functions. It's a markup language, in which case you just tell the receiving program what you want to do. This means hey browser, here comes an HTML page. Hey browser, here comes the head of my page. Hey browser, here comes the body of my page. Hey browser, that's it for the body. That's it for the HTML page. 
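For reference, the page described here can be reconstructed roughly as follows. This sketch uses a short Python script to write the file instead of typing it into TextEdit. The exact spacing and the helper name `write_page` are assumptions, since the transcript only describes the markup verbally.

```python
from pathlib import Path

# The markup as described: a doctype, then html wrapping a head (with a
# title) and a body. Indentation is only for readability, as in C.
HELLO_HTML = """<!DOCTYPE html>

<html>
    <head>
        <title>hello world</title>
    </head>
    <body>
        hello world
    </body>
</html>
"""


def write_page(path):
    """Save the page as a plain text file and return its path."""
    page = Path(path)
    page.write_text(HELLO_HTML, encoding="utf-8")
    return page


# Writes hello.html into the current directory; opening that file in a
# browser should show "hello world" in both the tab title and the page.
page = write_page("hello.html")
print("wrote", page)
```

Every opening tag has a matching closing tag with a slash, which is the "that's it for the body, that's it for the HTML page" part of the conversation with the browser.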

And with those simple definitions alone, we'll soon see that one, we can represent this as a tree. But more on that later. So this will all interconnect to our most recent data structures. Two, we'll introduce this stupid joke. This is an actual tattoo that this guy had on his neck. It's probably funny the first week or two, and thereafter, maybe not so much. 

But HTML, and even the web page I just made, super mind numbingly disappointing-- just saying hello world in black text on a white background. Surely we can do much better. And we'll do so by introducing another language called CSS. This too not a programming language-- no loops, and conditions, or for loops, but really, just syntax by which we can say, make this text big. Make this text small. Right align it. Left align it. Make it pink. Make it purple. Make it blue. Or do any number of other visual effects. And so we'll see how to start stylizing web pages so that they look in a manner closer to what we want. 

And lastly, we have indeed ruined perhaps much of TV and film for you. I thought we'd end here with our final seconds on a final clip that shows you how hacking on the internet works. If we could dim the lights one final time. 

[VIDEO PLAYBACK] 

-No way. I'm getting hacked. 

-Okorsky? 

-No-- no, this is major. They've already burned through the NCIS public firewall. 

-Well, isolate the node and dump them on the other side of the router. 

-I'm trying. It's moving too fast. 

-Oh, this is not good. They're using our connection [INAUDIBLE] this database. Sever it. 

-I can't. It's a point attack. He or she is only going after my machine. 

-It's not possible. There's DOD level nine encryption. It would take months to get-- 

-Hey, what is that? A video game? 

-No Tony, we're getting hacked. 

-If they get in Abby's computer, the entire NCIS network is next. 

-I can't stop him. Do something McGee. 

-I've never seen code like this. 

-Oh. 

-Where's it go? Abby? 

-I didn't do anything. I thought you did. 

-No. 

-I did. 

[END PLAYBACK] 

DAVID J. MALAN: The best part is two people typing on the keyboard at the same time. 

So that's it for CS50. We'll stick around for office hours. And we'll see you next time. 

[MUSIC PLAYING - "SEINFELD THEME"] 

This is CS50. I don't want to be a pirate. 

SPEAKER 2: Yarr David. It is a fine doublet you be wearing. Lot of luff in that puff.