DOUG LLOYD: If you watched our internet primer video, I left a bit of a cliffhanger by talking about the internet and how it's a system of protocols. Well, let's talk about the first of those protocols that actually comprises the internet. And interestingly enough, it's called the Internet Protocol, which we usually refer to as IP. So the internet, as I said, is an interconnected network, an internet, which is really just several networks woven together and agreeing somehow to communicate with one another. What is this somehow I'm talking about? Well, this is the Internet Protocol. This dictates how information is transmitted from point A to point B. And this is sort of a condition of joining the internet: agreeing to follow this protocol when information needs to be moved from point A to point B. So at the very end of that internet primer video, I showed this image of what the internet was. And on a small scale, this is actually probably pretty accurate. This might be how three networks actually talk to each other. But it's a bit misleading. And the reason it's a bit misleading is because-- if I just number the networks for the sake of convenience here and we get rid of everything else and just focus on the networks-- it's a bit misleading because it implies that all three networks have a connection to one another. One is connected to two. Two is connected to three. And three is connected to one. And when I talk about a connection here, I'm talking about a physical, wired connection. We do have wireless. But it's really impractical for data to be transmitted wirelessly over a large scale. And so at some point, we really do rely on wired technology-- telephone wires, fiber optic wires, various technologies that are physically connecting point A to point B. And on a small scale like this-- this might be accurate, but as the image gets a little bigger, let's now imagine we have six different networks. 
If that's true, now we have something like this for every network to be connected to every other network. And if you look, every network has five arrows connected to it. So everything is connected to every other network. We only have six networks here, and already look at how much wiring we have to employ, right? And the internet consists of a lot more than six networks. We can't afford to wire each network to each other network, especially considering some of these networks span oceans, right? If we're trying to connect to a network in Asia or in Europe, we're going to have to span an entire ocean. We're going to need to use wires at some point, but we want to minimize the number of wires we actually use. We don't want to send a million wires across the ocean, because they cost millions of dollars apiece to lay down. And so rapidly, we wouldn't be able to afford the internet anymore. So we have to have another way for every network to talk to every other network or else we have pieces of the internet that are disconnected from other pieces of the internet. And that's not what we want. But we don't want to have them all wired together. And this is where routers come back into play. We can use routers in the following way. What if instead of every network being physically connected to every other network, we had these intermediary pieces, where the networks were connected to these intermediaries, which are connected to a few networks. So instead of having one connect to two, three, four, five, six, maybe one connects to a router, which maybe connects to one or two of those networks, but also maybe connects to other routers, which also will connect to those other networks. And the router's job is-- it contains information called a routing table that dictates where do I go if I see a particular IP address? If I see an IP address starting with four, I'm going to go this way. If I see an IP address starting with a 12, I'm going to go that way. 
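As a rough illustration of that lookup, here is a minimal Python sketch. It follows the simplification used in this video, where only the first number of the address decides the direction. The link names and the default link are made up for the example and are not part of any real router.

```python
# Hypothetical routing table: first number of the address -> outgoing link.
# The link names are invented purely for illustration.
ROUTING_TABLE = {4: "this-way", 12: "that-way"}

# Assumption: anything not listed is handed to some default link.
DEFAULT_LINK = "default-link"


def first_number(ip_address: str) -> int:
    """Return the first dotted number of an address, e.g. 12 for '12.0.0.1'."""
    return int(ip_address.split(".")[0])


def choose_link(ip_address: str) -> str:
    """Pick the outgoing link for an address by looking only at its first number."""
    return ROUTING_TABLE.get(first_number(ip_address), DEFAULT_LINK)


print(choose_link("4.10.20.30"))  # address starting with 4
print(choose_link("12.0.0.1"))    # address starting with 12
print(choose_link("7.7.7.7"))     # not in the table, so the default link
```

The point of the sketch is only that the router needs a small table and a general direction, not a wire to every destination.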
We don't need to be connected physically to network number four or network number 12 in this example. We just know generally where we want to go. And if you think about it, this is sort of similar to the concept of recursion that we talked about when we were talking about it in C. I'm not going to connect you to exactly where you want to go. I'm just going to move you one step closer to where you want to go. And I'll let somebody else deal with solving the rest of the problem. I'll just solve this little piece of the problem and defer the rest of it to somebody else. So routing information is actually kind of similar to recursion. If that's a concept that you understand well, maybe that analogy would help. So let's take a look at this networking example again and assume that, again, we're going to use those same six networks, one through six. So let's just say that every IP address on network one starts with one dot something. And we'll say that there's some other thing that deals with how all the systems are connected to network one. We just care about connecting all of those networks together in an internet. So every device that is connected to network one has an IP address that starts with one dot and then three other numbers. This is a generalization of the way things actually work. It's quite a bit more precise than this. But this should give you a general idea of what the Internet Protocol is actually doing. So this was the diagram we had before. This was the system that was not sustainable. Even with six, this might be OK. But if we get to 10 or 20 or 50, we're going to be laying a lot of wires. And 50 is still also not even the tip of the iceberg as to the number of networks we have. So this model is unsustainable. We can't stick with this. So let's instead adopt this model where we get rid of all the wires between the networks and we add routers. So these yellow boxes represent routers. 
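To put rough numbers on why the fully wired model does not scale: if every one of n networks is wired directly to every other, each has n - 1 connections, and counting each wire once gives n(n - 1)/2 wires in total. The short Python sketch below just evaluates that formula for the network counts mentioned here. The function name is made up for the example.

```python
def full_mesh_links(n: int) -> int:
    """Number of direct wires needed so that each of n networks touches every other."""
    # Each network has n - 1 connections; every wire is shared by two networks.
    return n * (n - 1) // 2


for n in (3, 6, 10, 20, 50):
    print(n, "networks ->", full_mesh_links(n), "direct wires")
# 3 networks -> 3 direct wires
# 6 networks -> 15 direct wires
# 10 networks -> 45 direct wires
# 20 networks -> 190 direct wires
# 50 networks -> 1225 direct wires
```

The count grows roughly with the square of the number of networks, which is why the wiring becomes unaffordable so quickly.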
And their job is to move information generally closer to where it's supposed to go. And maybe these are the connections that these networks have. And maybe these are the tables that are built into the routers. So if we just start by looking at network one, for example, basically what it says is if I ever see an address that doesn't start with a one-- that's what the exclamation point one or the bang one there, not one-- I'm going to pass it off to a router. And from there, the router can make a decision. The router says if I see a one, I'm going to move to network number one. That's the green arrow heading to the left out of that top left box. If I see a two-- that's the arrow sort of heading to the top right towards the purple network-- if I see an IP address starting with a two, I'm going to go towards the two network. If I see a three, a four, a five, or a six-- that's that red arrow coming out of the top left router-- I'm not connected to three, four, five, or six. But I know somebody who is or who's a little bit closer to there. So I'm just going to say, every time I see an IP address starting with three, four, five, or six, I'm just going to send it to that router. So I'll move it a little closer to where it's supposed to go and let that router deal with the problem. And as you can see-- if you wanted to pause here and trace-- you can get to every other point in the network from wherever you are. All six networks can still connect to every other network but they're not physically connected anymore. There are now these intermediate steps. Now, of course there's a trade-off in speed, right? If one was directly connected to six, we wouldn't have to go through two routers along the way. So we may be able to get the connection a little bit faster. But maybe that trade-off is worth it, right? If it's going to be so expensive in terms of actual cost, dollars and cents, to physically wire all these networks together, maybe a little bit of a slowdown in speed is OK. 
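For anyone who wants to do that trace without the diagram, here is a small Python sketch of it. The topology is an assumption reconstructed from the spoken description: a top left router attached to networks one and two, a lower left router attached to three and six, and a right router attached to four and five. The router names, the return routes back toward network one, and the sample addresses are all illustrative.

```python
# Assumed topology, reconstructed from the spoken description of the diagram.
# Each router maps the first number of a destination address to a next hop,
# which is either a network ("net5") or another router ("lower_left").
ROUTERS = {
    "top_left": {1: "net1", 2: "net2",
                 3: "lower_left", 4: "lower_left", 5: "lower_left", 6: "lower_left"},
    "lower_left": {3: "net3", 6: "net6", 4: "right", 5: "right",
                   1: "top_left", 2: "top_left"},
    "right": {4: "net4", 5: "net5",
              1: "lower_left", 2: "lower_left", 3: "lower_left", 6: "lower_left"},
}

# The router each network hands its non-local traffic to (the "not one" rule).
GATEWAY = {1: "top_left", 2: "top_left", 3: "lower_left",
           6: "lower_left", 4: "right", 5: "right"}


def trace(source_network: int, destination_ip: str) -> list:
    """Return the hops a message takes, moving one step closer at each router."""
    dest = int(destination_ip.split(".")[0])
    path = [f"net{source_network}"]
    if dest == source_network:
        return path  # local delivery, no router involved
    hop = GATEWAY[source_network]
    while hop in ROUTERS:  # keep forwarding until we land on a network
        path.append(hop)
        hop = ROUTERS[hop][dest]
    path.append(hop)
    return path


print(trace(1, "6.0.0.1"))
# ['net1', 'top_left', 'lower_left', 'net6']
print(trace(1, "5.188.109.14"))
# ['net1', 'top_left', 'lower_left', 'right', 'net5']
```

No single router knows the whole path. Each one only answers "which neighbor is closer?", which is the recursion-like behavior described earlier.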
We can tolerate that. So again, in that example we were just talking about, none of the networks directly connect to each other at all. There could have been-- maybe in that example we could have made it so that maybe network one and two were directly connected. And that would be OK. Some networks are physically connected to other networks. But they're not all connected to each other. They rely on the routers-- in this particular example-- to distribute the communication from point A to point B. On a small scale-- like what we're talking about here-- this configuration actually might be more inefficient than just having direct connections. But on a large scale, we can scale the system a lot better. It's really going to reduce our cost of network infrastructure to have intermediary routers whose job it is to move traffic from the sender to the receiver, from point A to point B, as opposed to wiring everybody together. So let's take a look at an example of information traveling using this Internet Protocol. Let's say that I am physically located at IP 1.208.12.37 so I exist somewhere on the one network. And I want to send a message to you. And you're on the five network at 5.188.109.14. Your IP address specifically doesn't matter, but in this particular example we're talking about this generalization of what the internet protocol is all about. You're on the five network, and I'm on the one network. As you can see, we're not connected to each other at all. So I start out. And I want to send you a message. And so somehow I communicate that message to the router. The router is the one that actually has the IP address. And it's looking at where it's supposed to go. We're going to five dot something. So now I'm going to start using my-- or the router, rather, is going to start using its routing table to pass information along. It sees that five is not one, so it says I'm going to pass it to this guy. Then this guy has to make a decision. Where am I going to go? 
Well, it's not a one, so I'm not going to move to the one network. And it's not a two. I'm not going to move to the two network. It starts with a five. I'm not connected to five, this router says. And so I'm just going to pass it off to-- I'm going to go down this path. This is where threes and fours and fives and sixes go. And I'll let that guy deal with it. I'll get it a little closer to where it's supposed to go. I know it's supposed to go in that general direction. But maybe that guy can deal with it. OK. So that guy looks. He says, OK, this IP address starts with a five. Well, I'm connected to three and to six, so I can't get the message directly where it needs to go. But that other router over there, I know if I send it fours and fives, it can handle those. So it passes it along down the path. And then this router says, well, I'm connected to networks four and five. So, yes, I can help you. I'll take your IP address that starts with a five. I'll give it to the five network. The five network will do some work on its end and give the message to you. And now we've successfully transmitted a message from me to you using the Internet Protocol. Again, very generalized for purposes of illustration as to what's happening. But that's pretty much how the Internet Protocol works. The routers know generally where to send it and will send it one step along the way, getting it closer and closer to its destination until one router is physically connected to the network or the address or whatever in question and gives it there. Now, in general, except for really, really small, small messages, it's not going to send it as one big chunk of data. If I'm sending you an email-- a very long email, say-- it's not going to take that entire email, bundle it up in a ball or a package or whatever, and send that entire thing down the network. First of all, sending information along the network is expensive. It does add up. 
And the larger the chunk, the more costly it is to move every step of the way. And if there's somehow a slowdown and then there's this giant-- sort of like if you're driving on the highway and there's this giant truck kind of blocking the way and you can't get around it on either lane because it's kind of spread out. It slows everybody else down behind it. But small cars, if they were all small cars, they might be able to move around, if that analogy sort of helps a little bit. So one big block in the system can really slow everybody else down. And so what IP is going to do is split this data into packets. It's going to take this big email or FTP transfer or a file transfer, or maybe I'm making a request to a web server because I want a picture of a cat. And it's going to take that request or that email or that file and break it up into many pieces and send all of the pieces separately. So in fact, I'm filling the highway with a lot of small cars, which can all move instead of a big truck that might, if something goes wrong, throttle the traffic for everybody else. Another side effect of this is if there's some sort of catastrophic failure and something goes wrong and the packet gets dropped. Something has failed and the message can't be communicated. The router maybe had too much stuff going in. It couldn't juggle everything. And so it just literally dropped it. That's sort of the analogy, right? It's got a lot of things going on. It's passing information from point A to point B. We're not the only two people on the internet, so it has to process a lot of traffic. And if it doesn't have enough hands and it can't figure out what it's doing, it might just drop something. So it can do something else. It's got too much going on. If we had our message as one huge block and that was what got dropped, now we have to send the message again. And we are now possibly causing traffic again. And we run the risk of that huge block being dropped again. 
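The cost difference can be put in rough numbers with a small Python sketch. It splits a message into fixed-size pieces and compares what has to be sent again if one piece is dropped versus the whole message. The message size, the packet size of 1,000 bytes, and the index of the dropped packet are arbitrary choices for the example, and the function name is made up. This is not how any real network stack is written.

```python
def split_into_packets(data: bytes, size: int) -> list:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


message = b"x" * 10_000  # stand-in for a very long email
packets = split_into_packets(message, 1_000)  # packet size chosen arbitrarily

dropped = 3  # pretend the packet at index 3 never arrived

resend_one_packet = len(packets[dropped])  # bytes to resend with packets
resend_whole_message = len(message)        # bytes to resend as one big block

print(len(packets), resend_one_packet, resend_whole_message)
# 10 1000 10000

# Putting the pieces back together in order recovers the original message.
assert b"".join(packets) == message
```

In this toy case a single drop costs one tenth as much to repair when the message travels as packets.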
But if the data's been broken up into packets and we drop one of those, it's a lot less costly to send that packet one more time as opposed to the entire thing one more time. So IP is responsible for getting information from point A to point B and also breaking the information into small pieces so that the network isn't overly taxed. IP is also known as a connectionless protocol. There's not necessarily a defined path from the sender to the receiver or vice versa. Now, in this example we've talked about, there actually is only one way to get to every network. So in this particular illustration, there actually is a defined path from point A to point B. But we can change that by just making one modification to the two routers on the left by adding this condition to the routing tables. Now notice that from the top left router, there are actually two ways to deal with a four or a five IP address. It can go down to the lower left router, or it can go to the right, to the right router. It has multiple options. And this is actually kind of a good thing because it makes our network more responsive. If for example-- it's sort of like a GPS. If you've ever been driving on the highway and suddenly your GPS warns you that traffic is ahead, you want to avoid it if you can. And so you can recalculate your route. And a router, in addition to having information about where packets should go or where data should go, also has sort of this general pulse on the state of its local network. What's going to happen if I send it down this path versus this path? And so in light of heavy traffic situations on the network, maybe things will get routed a more inefficient way or a more generally inefficient way, because if we go the regular way, there's going to be a lot of traffic. The highway is completely jammed. So maybe what we'll do is instead take side roads, which ordinarily would take a lot more time, but no one's really using those side roads. 
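Here is a minimal Python sketch of that choice, assuming the top left router now has two possible next hops for addresses starting with four or five. The router names and the idea of a single "congestion" number per neighbor are simplifications invented for the example. How real routers measure and share this information is not covered here.

```python
# Assumed: after the modification, the top left router has two ways
# to forward addresses that start with 4 or 5.
OPTIONS = {
    4: ["lower_left", "right"],
    5: ["lower_left", "right"],
}


def pick_next_hop(dest_first_number: int, congestion: dict) -> str:
    """Choose the candidate next hop that currently looks least busy."""
    candidates = OPTIONS[dest_first_number]
    # Hypothetical metric: a lower congestion value means a clearer road.
    return min(candidates, key=lambda hop: congestion[hop])


# The usual road is jammed, so take the side road.
print(pick_next_hop(5, {"lower_left": 0.9, "right": 0.2}))  # right

# Traffic has cleared, so the usual road wins again.
print(pick_next_hop(5, {"lower_left": 0.1, "right": 0.2}))  # lower_left
```

Because the decision is made per packet, two packets from the same message can leave the same router in different directions.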
And so we can route our packets that way. So not every packet of a big chunk of data might take the same path from the beginning to the end. And our network becomes a lot more responsive if our routing tables allow for there to be multiple options for where to go. We're not depending on that one truck moving out of the way. We can get off the highway at the next exit and take a different path. And so the Internet Protocol sort of does a little bit of that, too. So that's the basics of the Internet Protocol. But there's one more issue to deal with, which is what happens if we do drop a packet? How do we know we're going to send that packet again? Right? Well, Internet Protocol doesn't guarantee delivery. We're going to be depending on another protocol to deal with that called Transmission Control Protocol, TCP. And we're going to talk about Transmission Control Protocol in the next video. I'm Doug Lloyd. This is CS50.