SPEAKER: Let's talk about another protocol-- the Hypertext Transfer Protocol, or HTTP. So we've talked about IP and TCP in previous videos. And those are protocols that dictate how information moves from machine to machine and from program to program or service to service via the internet, via this network of routers and machines. 

But that's usually not the entire picture, right? Usually when data is received, say, for example, an email via TCP port 25 or a web page request via port 80, there's a system of rules there for the program to process what it has just received. And HTTP is an example of just such a protocol. 

HTTP is the only application layer protocol that we're going to talk about. But it is another set of rules dictating how information is to be transmitted and processed via the internet. In particular, HTTP specifies exactly how one must make a request for a web page and exactly how a server, a machine that hosts web pages, delivers that information back to clients. 

So this protocol doesn't actually have anything to do with how information moves from point A to point B. It's basically the rules of engagement for working with a web page, similar to when somebody waves their hand at you, you're supposed to wave back. That's sort of a conventional human protocol. HTTP just says, if you want to request a web page, make sure your format looks like this-- sort of like formatting a business letter, for example. And the response will similarly come according to this protocol.

There are other application layer protocols that we're not going to talk about in videos. But these include things like the File Transfer Protocol, FTP; the Simple Mail Transfer Protocol, SMTP, for sending emails; the Data Distribution Service; the Remote Desktop Protocol, RDP, which is used if you want to remotely access your computer from another computer; and XMPP, which is frequently known as Jabber, so this is a protocol for using chat services. And there are many, many, many others. 

So every time you're using a service, the service is expecting information to be received-- a request to be received-- in a very particular format and is required to return information back in a very particular format as well. 

So let's go back to our illustration of us wanting to talk to the internet. So we're happy, and we want to go to cats.com, right? So if we're just talking to cats.com, we might say something like hey, can I see your home page? And cats.com will probably respond, yeah, sure. Here you go. So that's a human sort of ask-and-answer. 

What does that look like in HTTP? Well, it actually kind of translates pretty cleanly to something like this. We might say GET / HTTP/1.1, followed by Host: cats.com. So basically what I'm doing here is asking for the web page www.cats.com/. We usually omit the slash nowadays, but that would just mean cats.com's homepage. 

Oh, and by the way, I'm going to be using HTTP version 1.1 to communicate with you. That's sort of analogous to saying, like, by the way, I'm going to be speaking in French, or by the way, I'm going to be speaking in English. That's just the format of the protocol. There's also 1.0, which is not commonly used anymore. So I'm speaking HTTP 1.1, and I would like www.cats.com/. Please get that for me. 

And then there's other information, too-- the dot, dot, dot there, which is information about who you are so cats.com would know where to send it. But these are the two sort of critical parts at the very beginning of an HTTP request-- just like when you start a letter you say, dear, blank. This is very similar in spirit to that. 
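As a rough illustration of those two critical parts, here is a minimal Python sketch that assembles the text of such a request. The function name build_get_request is made up for this example, and the sketch assumes the usual HTTP/1.1 wire conventions: each line ends with a carriage return plus newline, and a blank line ends the headers. The extra dot, dot, dot headers are left out.

```python
def build_get_request(host: str, target: str = "/", version: str = "HTTP/1.1") -> str:
    """Return the text of a minimal HTTP GET request (hypothetical helper)."""
    request_line = f"GET {target} {version}"   # method, request target, HTTP version
    host_header = f"Host: {host}"              # which site we are talking to
    # Each line ends with CRLF; the final blank line marks the end of the headers.
    return f"{request_line}\r\n{host_header}\r\n\r\n"


# The homepage request from the cats.com example.
print(build_get_request("cats.com"))
```

Sending this text over a TCP connection to port 80 is what a browser does on your behalf; the sketch only builds the message and does not touch the network.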

And if cats.com is going to say, oh, sure, here you go, they might respond like this-- I also speak HTTP 1.1. Your request is approved, 200 OK. What you're about to receive is HTML, and then dot, dot, dot, some extra information. And at the very bottom of the response is actually the HTML, the markup language, the content of cats.com's homepage. 

So HTTP/1.1-- I acknowledge your request was accepted via HTTP 1.1. Your request was approved. I can give you what you want, 200 OK. You're about to receive HTML. And then here's the HTML that you requested. 
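To make the shape of that response concrete, here is a small Python sketch that splits a response into its status line, its headers, and the HTML at the bottom. The names ParsedResponse and parse_response and the sample HTML are placeholders invented for this example, and a real response would carry more headers than the single Content-Type line assumed here.

```python
from typing import Dict, NamedTuple


class ParsedResponse(NamedTuple):
    version: str
    status: int
    reason: str
    headers: Dict[str, str]
    body: str


def parse_response(raw: str) -> ParsedResponse:
    """Split a raw HTTP response into status line, headers, and body."""
    # A blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: HTTP version, numeric code, then the reason text.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return ParsedResponse(version, int(code), reason, headers, body)


# Placeholder response in the spirit of the cats.com example.
example = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Cats!</body></html>"
)
print(parse_response(example))
```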

But sometimes our requests don't always go quite according to plan. Can I see your cats.html page? Well, what if they say, we don't have a cats.html page, which seems kind of unrealistic because they're cats.com. You'd think they would have cats.html. But OK. So this is sort of the conventional human interaction we've now had with cats.com. How does that translate? 

This might be something familiar to you. Our request looks exactly the same, except instead of getting slash we're now getting /cats.html. So now what basically this entire request is saying is please give me www.cats.com/cats.html. So the host and the middle part of that top line there indicate precisely what page I am asking for. But cats.com in this case isn't going to be able to respond positively. They don't know what we're talking about. And so this is something you might have seen before-- HTTP/1.1 404 Not Found. I couldn't find what you were asking for. By the way, I'm going to give you back some HTML, and usually that HTML is the content of some 404 page. And in the case of cats.com, it's probably some cute cats in a basket with a sad 404 face next to them, because you're going to be sad when you don't get the page that you were looking for. 
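The two outcomes so far, 200 OK for the homepage and 404 Not Found for cats.html, can be sketched from the server's side in a few lines of Python. This is a toy, not how a real web server is written: the PAGES table, the respond function, and the HTML strings are all invented for illustration.

```python
# Hypothetical table of the pages this toy server has. Only the homepage exists.
PAGES = {"/": "<html><body>Welcome to cats.com</body></html>"}

NOT_FOUND_PAGE = "<html><body>404: no such page</body></html>"


def respond(target: str, pages: dict) -> str:
    """Return the full response text for a GET of the given request target."""
    if target in pages:
        status, body = "200 OK", pages[target]
    else:
        # Even a failure comes back with some HTML: the 404 page.
        status, body = "404 Not Found", NOT_FOUND_PAGE
    return f"HTTP/1.1 {status}\r\nContent-Type: text/html\r\n\r\n{body}"


print(respond("/", PAGES))
print(respond("/cats.html", PAGES))
```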

That's kind of the basics of what HTTP requests look like. They're really similar to how we would make the same interaction in just human conventions-- asking for something and getting it back, or writing a letter and expecting a response letter in a particular format. That's pretty much what HTTP is just canonicalizing for all devices that wish to access web pages, hypertext transfers. 

So a line of the form method, request-target, HTTP version is called an HTTP request line. It's usually the first thing that is transmitted as part of an HTTP request. It's sort of like, as I said, saying dear, blank at the top of your letter. They know that you're writing them a letter. So this is very similar to the server saying, I know that they're making an HTTP request and this is the particular format they're asking for. 

HTTP version is probably always going to be HTTP/1.1. 1.0 also exists but isn't really used anymore. For purposes of CS50, GET is probably always what you're going to be using when you're actually making direct HTTP requests. But POST is another option that we're not going to talk about right now. And then request-target is what page on the host's server you would like to get. As I said, that host name is a separate line, usually the second line of the overall request. And so taken together, the host name and the request target specify a specific resource being sought. In our 404 example a second ago, I was asking again for www.cats.com, cats.com being the host. And in my request line, I said /cats.html. That was my request target. So overall I was asking for the contents or the resource located at www.cats.com/cats.html. 
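Here is a short Python sketch of how the request line and the host line combine into one resource. The function name requested_resource is hypothetical, and the sketch assumes a well-formed request whose first line has exactly three space-separated parts.

```python
def requested_resource(raw_request: str) -> str:
    """Combine the Host header and the request target into one resource name."""
    lines = raw_request.split("\r\n")
    # Request line: method, request target, HTTP version.
    method, target, version = lines[0].split(" ")
    host = None
    for line in lines[1:]:
        if not line:          # blank line: end of the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    if host is None:
        raise ValueError("request has no Host line")
    return host + target


request = "GET /cats.html HTTP/1.1\r\nHost: www.cats.com\r\n\r\n"
print(requested_resource(request))
```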

And then based on whether the resource exists and whether the server can deliver the resource pursuant to the client's request, you might get various status codes back. Some of these status codes you've seen because they're part of the response. Some of them, like 200 OK, are probably pretty silent. You've probably never seen a page respond 200 OK. You just get the page. It's not like a 404 error, which is usually pretty clear. You usually see that it says 404. 

So let's talk about what some of those status codes might be. Again, when the server responds to us, they're going to respond with the HTTP version and a status. The version is usually HTTP/1.1. What are these status codes going to be? Well, we might get a success. So in the success category, we might get code 200 with the text OK. What does this mean? Well, everything is good. You made a valid request. Here's a valid response. I was able to deliver exactly what you wanted. 

Sometimes you might get other things that you won't notice right away but are somewhat failures. They're called redirections. There's two common ones here. 301 Moved Permanently-- what this basically means is the page is now at a new location. It will live there forever. And most browsers will automatically redirect you. So you'll never really see a 301, either, unless you're using a really out-of-date browser, possibly, because part of the dot, dot, dot of the 301 response also tells you where the new page is. And so most browsers will just redirect you there, assuming that you want to go there. 

Sometimes you'll also get 302 Found. And this one you actually might still see occasionally. Sometimes pages move temporarily. So the response isn't telling the browser to permanently change the request to something else any time it sees you make it. So you might see 302 Found, which basically says this page lives somewhere else. But it's not going to live there forever. It will eventually probably go back to where you think it is. 
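The part of a redirect response that says where the page now lives is the Location header; the talk doesn't name it, but that is the standard HTTP header for this. Here is a minimal Python sketch of what a browser might do with it. The names REDIRECTS and redirect_target and the example.com address are illustrative assumptions.

```python
# Only the two redirections discussed here; other 3xx codes are ignored.
REDIRECTS = {301: "permanent", 302: "temporary"}


def redirect_target(raw_response: str):
    """Return (kind, new_location) for a 301 or 302 response, else None."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    code = int(lines[0].split(" ", 2)[1])   # second field of the status line
    if code not in REDIRECTS:
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return REDIRECTS[code], value.strip()
    return None   # a redirect without a Location line gives us nowhere to go


moved = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://www.example.com/new\r\n\r\n"
print(redirect_target(moved))
```

A browser would then make a fresh request to the returned location, and for the permanent kind it may remember the new address for next time.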

Then you'll get things like client errors. So these are ones you've probably seen, now. You probably haven't seen the 200s or the 300s, but you're probably familiar with the 400s, and the 500s as well, which we'll talk about in a second. 

You might see 401 Unauthorized. Usually this means you're trying to access a page, but you haven't logged in. So you try and go to some profile or something on Facebook, or you're at work and you're trying to access something on your work's internet, but you're not logged in. You can't see the page. You might get a 401 Unauthorized, which means we probably will be able to satisfy this request, but first you need to log in to do so. 

Conversely, you might get 403 Forbidden, which is it doesn't really matter if you're logged in or not. This request isn't allowed. The resource exists on the server. But you are not allowed to access it. This is usually internal files that live on the server for various reasons but are not intended to be accessed from the outside world, and so they are forbidden. They live there. I'm not saying I can't find it. But I'm saying I cannot give it to you. And it doesn't matter if you're logged in or not. And then of course, the very common 404 Not Found. The file doesn't exist on the server. I would like to satisfy your request, but I can't. 

You also sometimes see server errors, the most common generally being 500 Internal Server Error, which doesn't actually tell you anything at all about what has gone wrong. But it's not actually you making a mistake in your request. It's actually the server failing to deliver on the request somehow. So 500 is the general response. 
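The categories line up with the first digit of the code: 2 for success, 3 for redirection, 4 for client error, 5 for server error. As a quick sketch, Python's standard library already knows the official text for each code through http.HTTPStatus. The describe function and the CATEGORIES table are names made up for this example.

```python
from http import HTTPStatus

# Category by first digit, using the names from this discussion.
CATEGORIES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return the code, its standard text, and its category."""
    phrase = HTTPStatus(code).phrase          # raises ValueError for an unknown code
    category = CATEGORIES.get(code // 100, "other")
    return f"{code} {phrase} ({category})"


for code in (200, 301, 302, 401, 403, 404, 500):
    print(describe(code))
```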

You'll also see something like Service Unavailable, which I believe is code 503. And Gateway Timeout-- if you've ever had a page just sit there loading and loading and loading, and you never know if it's going to load, and then eventually it just gives up, that's a 504 Gateway Timeout. The server wanted to execute your request, but something went wrong on the server side-- not on your side-- to cause that to be a problem.

Now, we could end the story here, but what I'm actually going to do now is I'm going to open up my browser and show you how you might be able to see some of these status codes even if you don't generally see them. And we're going to do that by taking a look at some developer tools. 

All right. So here I am now in my browser window. And I want to learn a little bit more about these HTTP requests. Certainly we know when something goes wrong, we get a 404. We've all seen that. We don't need to illustrate that. But what are some other ones? And how would we see these requests in action? 

So first thing I'm going to do is open up Developer Tools. So Developer Tools are built into most modern browsers and allow us to see things that we don't otherwise see-- some extra information sort of being transmitted underneath our web requests. I'm using Google Chrome here. And to open Developer Tools in Chrome, you just hit F12, and it's going to open it up on the side. Once I type the request, I'll zoom in so we can see what's going on here. But what I'm going to do in my browser bar is-- and I'll zoom in over here-- I will make a request to www.google.com. We've all probably made this request before. I'm going to hit Enter. 

Now, over here in my Developer Tools, I've chosen the Network tab. And you notice a lot of things here. Look at these-- 200 OK, 200 OK, some of these status codes coming up. I don't know why I'm getting 302 Found. I didn't realize I'd see that one. But basically notice that pretty much, in terms of my Google request-- I made a very simple request for Google's page. And in the process of delivering my request, Google has apparently made a lot of other requests on my behalf. 

But I've made a GET request for Google's page and I'm getting a lot of 200 OKs. I'm not seeing 200 OK on my screen, but I'm getting a lot of requests that have been made. One more that I'm pretty sure is going to work is-- for those of you who are really old-school, you may know that Facebook was not always at Facebook.com. In its early days it was at www.thefacebook.com. They apparently could not get access to Facebook.com for quite a while. 

And so what I'm expecting here is to get information. And we'll see if this pans out. What I'm expecting here is to get information that Facebook has moved permanently from thefacebook.com to Facebook.com. So I'm expecting somewhere near the top of my requests over in my Developer Tools to get a 301 notification that Facebook has moved permanently. Again, I won't see 301 on my browser screen. And because it's a 301, it's a permanent move. My browser, being that it's a modern browser, is probably going to redirect me to Facebook.com anyway. But let's see what happens. 

And now I'm going to go to thefacebook.com. And yep, there it is right at the top. It went away, but it was there. Let me scroll up here. Right here at the top. I made a request to thefacebook.com, and I'm getting a response that this page has moved permanently. And then 307 here is an internal redirect. And so this is what has actually moved me to the much more familiar www.facebook.com. 

So these response codes do still happen, even if we don't see them. I'm not going to illustrate 401, 403, 404, because you've probably seen those at various points. And 500, we'd have to get lucky to get a 500, because we don't know what servers are currently down anywhere. But these codes do exist, and there is a way to access them even if we don't see them firsthand on our systems. I'm Doug Lloyd. This is CS50.